I've got two data sets, both of companies responding to a questionnaire. The first set of 15 has a characteristic that the second set of 27 does not.
The distribution is skewed: the whole sample population displayed a high degree of skew, and the sample was reasonably heavy tailed. The first 200 companies' employee figures produce a mean of 67.89 and a median of 57.5, indicating positive skew; the coefficient of skewness is 1.05 and the coefficient of kurtosis is 0.50. The 42 respondents' figures for number of employees produce a mean of 67.29 and a median of 61.5, again indicating positive skew; the coefficient of skewness for the respondents is 1.23 and the coefficient of kurtosis is 1.68.
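For reference, this is roughly how the shape statistics quoted above can be computed. It is a minimal sketch assuming the plain moment-based definitions (divisor n, kurtosis reported as excess over 3); spreadsheet functions apply small-sample adjustments and will give slightly different values. The function name `shape_summary` and the `employees` list are made up for illustration and are not the survey data.

```python
import statistics

def shape_summary(values):
    """Return mean, median, moment skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Central moments with divisor n (population form, no Bessel correction).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": median,
        "skewness": m3 / m2 ** 1.5,            # > 0 means a longer right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }

# Hypothetical employee counts, NOT the questionnaire data.
employees = [12, 20, 25, 31, 40, 48, 55, 60, 75, 90, 140, 260]
summary = shape_summary(employees)
print(summary)
```

A mean above the median together with a positive skewness coefficient is the pattern described for both the 200 companies and the 42 respondents.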
My calculation of the variances shows a significant between-sample effect at the 0.05 significance level: the F test statistic of 2.74 exceeds the 5% critical value of 2.1 and approaches the 1% critical value of 2.9. The F test on variances therefore shows that the observed variance ratio is too large to support the null hypothesis that the two population variances do not differ.
Questions (data below):
A) Am I right so far, or does the skewness / non-normal distribution invalidate a variance analysis?
B) Does the fact that the population is skewed matter, or should I correct using Bessel's correction?
C) If I need to correct, how should I do it?
F test on variances: 2.7438
Degrees of freedom 1 (greater): 14
Degrees of freedom 2 (lesser): 26
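As a cross-check on the table lookup, the upper-tail probability for the statistic and degrees of freedom listed above can be computed directly. This is a minimal sketch in plain Python, assuming the usual F distribution under normality; the helper names (`_betacf`, `betainc`, `f_upper_tail`) are my own and the incomplete beta function is evaluated with a standard continued fraction rather than a statistics library.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-12):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd coefficients of the continued fraction, applied in turn.
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use whichever form of the continued fraction converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_upper_tail(f_stat, df1, df2):
    """P(F >= f_stat) for an F distribution with (df1, df2) degrees of freedom."""
    x = df2 / (df2 + df1 * f_stat)
    return betainc(df2 / 2.0, df1 / 2.0, x)

# Values taken from the figures above.
p_one_sided = f_upper_tail(2.7438, 14, 26)
print(f"one-sided p = {p_one_sided:.4f}")
```

The one-sided probability should fall between 0.01 and 0.05, consistent with the statistic lying between the two critical values. If the larger variance was put in the numerator purely by convention, a two-sided p-value would be twice this figure. None of this addresses whether the normality assumption behind the F distribution holds for skewed data.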