Search All of the Math Forum:
Views expressed in these public forums are not endorsed by
Drexel University or The Math Forum.
|
|
|
|
Computing tables for use with Peirce's Criterion - in 1855 and 2008
Posted:
Oct 5, 2008 9:41 AM
|
|
In 1852 Peirce described his criterion for identifying outliers in a sample from an otherwise normal distribution (1). In 1855 Gould, having being applying the criterion professionally, published a set of tables of critical deviations together with a description of the algorithm he had used to calculate them (2). The calculations were done manually, aided only by log-tables and some special function tables he managed to get his hands on!
Gould's first two tables list the (squares of the) critical deviation x as a function of N, the number of values in the sample, and k, the number of outliers being tested for. These tables differ with respect the value for m, the 'number of unknown quantities', set to 1 and 2, respectively.
Below I have tried to be consistent with Gould's notation. Here are the exceptions:
Gould Here -------------------------------- Lower case n k psi x erfc(x/sqrt(2)) --------------------------------
Next, two equations are formulated from which the critical deviation can be found (x>0).
Gould's equation D can be stated:
R = R1(x) = exp((x^2-1)/2) erfc(x/sqrt(2))
Defining (Gould's equation B):
Q^N = k^k (N-k)^(N-k) / N^N
Gould's equation A' can be combined with the unlabeled one just above equation C into:
R = R2(x) = Q^(N/k) [(N-m-k x^2)/(N-m-k)]^(-(N-k)/(2k))
Now a numerical solution for x can be found after equating R1 and R2.
Gould's algorithm was based on the following pseudoconstant strategy (3), the initial value for R being my own suggestion:
R = 0.25 <Find x from: R2(x)=R> repeat R = R1(x) oldx = x <Find x from: R2(x)=R> until abs(x-oldx)<Eps
Although this works quite well on a computer (with some extra termination logistics), much faster convergence can be achieved through root-finding by Newton's Method. I currently use the algorithm below (suitable for double precision floating point). Some loop-invariant code have been kept explicit for the sake of readability.
---------------------------------------------------------------------- function PeirceDeviation(N,k,m) // Peirce's critical deviation for N samples, k potential outliers and m // 'unknown quantities' (usually 1).
if N-m-k<=0 then return(0) // result undefined
LnQN = k*Ln(k) + (N-k)*Ln(N-k) - N*Ln(N)
x = 1
repeat x = min(x, sqrt((N-m)/k) - 1e-10)
// R1(x) and R2(x) R1 = exp((x*x-1)/2)*erfc(x/sqrt(2)) R2 = exp( (LnQN - 0.5*(N-k)*ln((N-m-k*x*x)/(N-m-k)))/k )
// R1'(x) and R2'(x) R1d = x*R1 - sqrt(2/pi/exp(1)) R2d = x*(N-k)/(N-m-k*x*x)*R2
// Newton oldx = x x = oldx - (R1-R2)/(R1d-R2d) until abs(x-oldx) < N * 2E-16
return(x) ----------------------------------------------------------------------
Gould did not list x-values below 1, and when glancing down the tables along the edge of the filled-in cells, one is tempted to ask if something mathematically interesting can be said about the special case x=1. For that purpose we shall accept non-integer values for N and/or k.
Setting R1(1)=R2(1) we get:
erfc(1/sqrt(2)) = Q^(N/k) = k [(N-k)^(N-k) / N^N]^(1/k)
or, defining the ratio C = k/N:
erfc(1/sqrt(2)) = C (1-C)^(1/C-1)
So, x=1 implies a fixed k/N ratio, which does not depend on m. C must be estimated numerically; here it is to 30 decimal places:
C = 0.58974 36745 10713 07248 06181 37574 ...
which just happens to be rather close to the simple fraction:
23/39 = 0.58974 35897 43589 ...
1. Peirce, B.: Criterion for the rejection of doubtful observations. Astronomical Journal, vol. 2, iss. 45, p. 161-163 (1852). http://adsabs.harvard.edu/abs/1852AJ......2..161P http://adsabs.harvard.edu/abs/1852AJ......2..176P (erratum)
2. Gould, B.A.: On Peirce's Criterion for the Rejection of Doubtful Observations, with tables for facilitating its application. Astronomical Journal, vol. 4, iss. 83, p. 81-87 (1855). http://adsabs.harvard.edu/abs/1855AJ......4...81G
3. Acton, F.S.: Real Computing Made Real: Preventing Errors in Scientific and Engineering Calculations. Princeton University Press (1996).
Best regards,
Knud Thomsen Geologist, Denmark
|
|
|
|