Math Forum » Discussions » sci.math.* » sci.stat.math.independent

Topic: Kolmogorov–Smirnov / Lilliefors test, small samples

Replies: 5   Last Post: Aug 11, 2013 5:23 PM

AndyHancock

Posts: 23
Registered: 5/22/12

Posted: Jul 23, 2013 10:31 PM

I've been reading up on the Kolmogorov-Smirnov (KS) and Lilliefors (LF) tests. I realize there are other tests, but I'm just trying to understand a subtlety of the KS/LF test from an academic perspective. The test statistic is the maximum absolute difference between two CDFs. In a typical usage scenario, one of the two CDFs is a reference distribution, often a theoretical and/or hypothesized distribution, while the other is an empirical CDF (EDF) from a sample. For small samples, the EDF is visibly staircase shaped, with the left end of each step being the closed end of an interval and the right end being the open end. The thresholds for rejection are tabulated for various significance levels and sample sizes. The LF thresholds are generated by Monte Carlo simulation, and they take into account the fact that the test statistic tends to be smaller when the parameters of the reference distribution are estimated from the data sample.

Whew. OK, that's all I know.

Now for the question. Let's call F0(x) the reference CDF and F1(x) the EDF to be tested against F0(x). Let the (absolute) difference be deltaCDF(x). Then the test statistic is the max of deltaCDF(x) over x. For small sample sizes, F1(x) has distinct steps. Many tests and visualizations evaluate a metric only at the points of the data sample. If that is done for the KS/LF tests, then deltaCDF(x) is evaluated only at x-values where the sample contains data. That would correspond to the closed end (left end) of each staircase step. However, it is possible for deltaCDF(x) to increase toward the right end of each staircase step: wherever F0(x) lies above F1(x), F0(x) keeps rising while F1(x) stays flat. So it is possible for the test statistic max[deltaCDF(x)] to exceed a selected threshold without the analyst knowing about it.
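To make the concern concrete, here is a minimal Python sketch that computes the statistic two ways: once using only the EDF value at each data point (the closed left end of a step), and once also using the EDF value just below each data point (the open right end of the previous step). The function names `ks_statistics` and `uniform_cdf` are my own, not from any library, and the sketch assumes a continuous reference CDF and no tied observations.

```python
def uniform_cdf(x):
    """Reference CDF F0 for Uniform(0, 1); a hypothetical stand-in chosen for easy arithmetic."""
    return min(1.0, max(0.0, x))


def ks_statistics(sample, cdf):
    """Return (d_at_points, d_sup) for a sample against a reference CDF.

    d_at_points: max of |F1(x_i) - F0(x_i)| using only the EDF value AT each
                 data point (closed left end of each step).
    d_sup:       also compares F0(x_i) with the EDF value just BELOW x_i
                 (open right end of the previous step), which gives the true
                 supremum of deltaCDF(x) when F0 is continuous.
    Assumes no tied observations.
    """
    xs = sorted(sample)
    n = len(xs)
    d_at_points = 0.0
    d_sup = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf(x)
        at_step = abs(i / n - f0)          # EDF has just jumped to i/n
        below_step = abs(f0 - (i - 1) / n)  # EDF was still (i-1)/n just left of x
        d_at_points = max(d_at_points, at_step)
        d_sup = max(d_sup, at_step, below_step)
    return d_at_points, d_sup


if __name__ == "__main__":
    # Two observations bunched near 1: the EDF stays at 0 until x = 0.9,
    # so the gap just left of 0.9 is much larger than the gap at 0.9 itself.
    print(ks_statistics([0.9, 0.95], uniform_cdf))
```

For the two-point sample above, the data-points-only value is about 0.4 while the value including the open ends is about 0.9, so the two conventions can differ substantially for small n.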

Is this actually a problem? I mean, theoretically it seems to be. However, if each tabulated threshold is arrived at by compiling countless cases in which max[deltaCDF(x)] is determined only at x-values in the data sample, then the theory becomes irrelevant.
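One way to probe this is to simulate thresholds under both conventions and compare them. The following is only a rough sketch, assuming a normal reference distribution with mean and standard deviation estimated from each sample (the Lilliefors setting); the function names, the sample size, the replication count, and the seed are all my own arbitrary choices. It does not show how any published table was actually generated.

```python
import random
from statistics import NormalDist, mean, stdev


def lilliefors_stats(sample):
    """Return (d_at_points, d_sup) against a normal CDF fitted to the sample.

    d_at_points uses only the EDF value at each data point; d_sup also uses
    the EDF value just below each data point. Assumes no tied observations.
    """
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))  # parameters estimated from the data
    d_at_points = 0.0
    d_sup = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = fitted.cdf(x)
        at_step = abs(i / n - f0)
        below_step = abs(f0 - (i - 1) / n)
        d_at_points = max(d_at_points, at_step)
        d_sup = max(d_sup, at_step, below_step)
    return d_at_points, d_sup


def simulate_thresholds(n, reps, alpha, seed=0):
    """Monte Carlo estimate of the (1 - alpha) quantile of each statistic
    under the null hypothesis that the data are normal."""
    rng = random.Random(seed)
    pts, sups = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d_pts, d_sup = lilliefors_stats(sample)
        pts.append(d_pts)
        sups.append(d_sup)
    pts.sort()
    sups.sort()
    k = min(reps - 1, int((1.0 - alpha) * reps))  # index of the empirical quantile
    return pts[k], sups[k]


if __name__ == "__main__":
    thr_pts, thr_sup = simulate_thresholds(n=10, reps=2000, alpha=0.05)
    print("threshold, data points only:  ", thr_pts)
    print("threshold, including open ends:", thr_sup)
```

Because the data-points-only statistic can never exceed the one that includes the open ends, its simulated threshold can never be larger. A statistic and a threshold computed under the same convention are self-consistent; mixing conventions is where the mismatch would arise.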


