Topic: Minimum data set size required for Kruskal-Wallis test?
Replies: 2   Last Post: May 24, 2013 8:18 PM

 Kate J. Posts: 177 Registered: 6/9/11
Minimum data set size required for Kruskal-Wallis test?
Posted: May 22, 2013 9:11 PM

I'm attempting to perform the nonparametric Kruskal-Wallis test (kruskalwallis() function) on 3 sets of data generated from 3 different testing conditions. In the past, I've successfully performed this test on large data sets from a different project (with 100+ members in each set). Currently, each of my sets only has 3 to 5 values. (I'm always comparing sets of equal size.)

The problem: the same code that worked on the larger data sets produces error messages when I run it on my current, much smaller data sets. Is there a minimum set size required to perform Kruskal-Wallis analysis?

Here is my code:

dataSetA = [21.4 27.2 31.8];
dataSetB = [54.0 57.0 59.4];
dataSetC = [30.6 48.2 35.2];

myData = [dataSetA dataSetB dataSetC];
[p,table,stats] = kruskalwallis(myData)
c1 = multcompare(stats)

The plot that is generated contains only a single boxplot instead of 3 (I know that a boxplot of only 3 values is dicey...), and here is the MATLAB screen output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
p = 1

table =
    'Source'     'SS'     'df'    'MS'     'Chi-sq'    'Prob>Chi-sq'
    'Columns'    [  0]    [14]    [  0]    [     0]    [          1]
    'Error'      [279]    [ 0]    [NaN]          []               []
    'Total'      [279]    [14]       []          []               []

stats =
       gnames: '1'
            n: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
       source: 'kruskalwallis'
    meanranks: 8
         sumt: 12

Note: Intervals can be used for testing but are not simultaneous confidence intervals.
??? Subscripted assignment dimension mismatch.

Error in ==> multcompare>makeM at 564
MM(:,2) = sqrt(diag(gcov));

Error in ==> multcompare at 475
[M,MM,hh] = makeM(gmeans, gcov, crit, gnames, mname, dodisp);

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
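
One reading of the output above: stats.n lists a single observation per group, which suggests that every value was treated as its own group rather than as a member of one of the 3 sets. That would be consistent with kruskalwallis treating each column of a matrix input as a separate group, since a row vector built with [dataSetA dataSetB dataSetC] has one value per column. A minimal sketch, assuming that this is the cause and that the Statistics Toolbox is available, places each set in its own column instead (the 'off' display options are only there to suppress the figures):

```matlab
% Same values as in the post, one row vector per testing condition.
dataSetA = [21.4 27.2 31.8];
dataSetB = [54.0 57.0 59.4];
dataSetC = [30.6 48.2 35.2];

% Transpose each set so that it becomes a column: myData is 3-by-3,
% and each column holds one group.
myData = [dataSetA' dataSetB' dataSetC'];

% Assumption: kruskalwallis treats every column of a matrix as a group.
[p, table, stats] = kruskalwallis(myData, [], 'off');
disp(stats.n)   % should now show 3 observations in each of 3 groups

% Pairwise comparisons based on the mean ranks.
c1 = multcompare(stats, 'display', 'off');
```

For sets of unequal size, kruskalwallis also accepts a single vector of values together with a grouping variable of the same length.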

Since collecting this particular data is time-consuming, it would help to know approximately how large my data sets need to be for this type of analysis to work (it *is* possible for me to collect more, if necessary); otherwise, I should consider other forms of statistical analysis.
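
On the sample-size question, a rough feasibility check is to ask how small the p-value can get when the 3 sets do not overlap at all. The sketch below assumes 3 groups of equal size n, no ties, and the chi-square approximation with 2 degrees of freedom (for which the upper tail probability is exp(-H/2)). That approximation is coarse for samples this small, so the numbers are indicative only. The variable names are illustrative.

```matlab
k = 3;                          % number of groups (testing conditions)
for n = 3:5                     % number of values per group
    N = k * n;                  % total number of observations
    % Best case: group 1 holds ranks 1..n, group 2 the next n, and so on.
    ranks = reshape(1:N, n, k); % one column per group
    R = sum(ranks, 1);          % rank sum of each group
    % Kruskal-Wallis statistic, without tie correction.
    H = 12 / (N * (N + 1)) * sum(R.^2 / n) - 3 * (N + 1);
    % Upper tail of a chi-square distribution with k-1 = 2 degrees of freedom.
    pMin = exp(-H / 2);
    fprintf('n = %d: largest H = %.3f, smallest approximate p = %.4f\n', n, H, pMin);
end
```

For n = 3 this gives H = 7.2, so under these assumptions a p-value below 0.05 is attainable in principle even with 3 values per set, at least when the sets are completely separated.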