Views expressed in these public forums are not endorsed by
Drexel University or The Math Forum.


Kate J. | Posts: 177 | Registered: 6/9/11


Minimum data set size required for Kruskal-Wallis test?
Posted: May 22, 2013 9:11 PM


I'm attempting to perform the nonparametric Kruskal-Wallis test (the kruskalwallis() function) on 3 sets of data generated from 3 different testing conditions. In the past, I've successfully performed this test on large data sets from a different project (100+ members in each set). Currently, each of my sets has only 3 to 5 values. (I'm always comparing sets of equal size.)
The problem: even though I'm reusing code that successfully performed Kruskal-Wallis analysis on the larger data sets, when I run the same analysis on my current, much smaller data sets, I get error messages. I'm wondering: is there a minimum set size required to perform Kruskal-Wallis analysis?
Here is my code:
dataSetA = [21.4 27.2 31.8];
dataSetB = [54.0 57.0 59.4];
dataSetC = [30.6 48.2 35.2];
myData = [dataSetA dataSetB dataSetC];
[p,table,stats] = kruskalwallis(myData)
c1 = multcompare(stats)
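In case it helps anyone spot my mistake, here is my (possibly wrong) understanding of the input format: kruskalwallis(X) treats each *column* of a matrix X as a separate group, so concatenating everything into one long row vector may be making every single value its own group. A sketch of what I suspect I should be doing instead (either a column-per-condition matrix, or an explicit group-label vector):

```matlab
% My guess at the intended layout: one COLUMN per testing condition,
% since kruskalwallis(X) treats the columns of a matrix X as the groups.
dataSetA = [21.4 27.2 31.8];
dataSetB = [54.0 57.0 59.4];
dataSetC = [30.6 48.2 35.2];

myData = [dataSetA' dataSetB' dataSetC'];   % 3-by-3 matrix, one column per condition
[p, table, stats] = kruskalwallis(myData);
c1 = multcompare(stats);

% Equivalent alternative (if I understand the docs): keep the row vector
% but supply a grouping vector as the second argument.
% myDataVec = [dataSetA dataSetB dataSetC];
% groups    = [1 1 1 2 2 2 3 3 3];
% [p, table, stats] = kruskalwallis(myDataVec, groups);
```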
The plot that is generated contains only a single boxplot instead of 3 (I know that a boxplot of only 3 values is dicey...), and here is the MATLAB screen output:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
p =
     1

table =
    'Source'     'SS'     'df'    'MS'     'Chisq'    'Prob>Chisq'
    'Columns'    [  0]    [14]    [  0]    [    0]    [         1]
    'Error'      [279]    [ 0]    [NaN]    []         []
    'Total'      [279]    [14]    []       []         []

stats =
       gnames: '1'
            n: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
       source: 'kruskalwallis'
    meanranks: 8
         sumt: 12

Note: Intervals can be used for testing but are not simultaneous confidence intervals.
??? Subscripted assignment dimension mismatch.

Error in ==> multcompare>makeM at 564
MM(:,2) = sqrt(diag(gcov));

Error in ==> multcompare at 475
[M,MM,hh] = makeM(gmeans, gcov, crit, gnames, mname, dodisp);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Since collecting this particular data is time-consuming, it would be good to get an idea of approximately how large my data sets need to be for this type of analysis to work (it *is* possible for me to collect more, if necessary); otherwise, I should consider other forms of statistical analysis.
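One back-of-envelope consideration I've been mulling over (my own reasoning, not something from the documentation): with samples this small, the number of distinct ways the ranks can be divided among the groups limits how small a p-value can possibly get, no matter how well separated the groups are. For example, with 3 groups of 3:

```matlab
% Rough count of distinct rank arrangements for 3 groups of 3 observations:
% the ranks 1..9 can be split into three unlabeled-size-3 groups in
% 9!/(3!*3!*3!) ways, so an exact permutation test can never report a
% p-value smaller than a few parts in that total.
nPerGroup = 3;
nGroups   = 3;
nArrangements = factorial(nPerGroup*nGroups) / factorial(nPerGroup)^nGroups
% I make this 362880 / 216 = 1680 arrangements.
```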
Thanks in advance for your insights!



