Date: Oct 5, 2008 9:41 AM Author: sales@kt-algorithms.com Subject: Computing tables for use with Peirce's Criterion - in 1855 and 2008 In 1852 Peirce described his criterion for identifying outliers in a

sample from an otherwise normal distribution (1).

In 1855 Gould, having being applying the criterion professionally,

published a set of tables of critical deviations together with a

description of the algorithm he had used to calculate them (2). The

calculations were done manually, aided only by log-tables and some

special function tables he managed to get his hands on!

Gould's first two tables list the (squares of the) critical deviation

x as a function of N, the number of values in the sample, and k, the

number of outliers being tested for. These tables differ with respect

the value for m, the 'number of unknown quantities', set to 1 and 2,

respectively.

Below I have tried to be consistent with Gould's notation.

Here are the exceptions:

Gould Here

--------------------------------

Lower case n k

psi x erfc(x/sqrt(2))

--------------------------------

Next, two equations are formulated from which the critical deviation

can be found (x>0).

Gould's equation D can be stated:

R = R1(x) = exp((x^2-1)/2) erfc(x/sqrt(2))

Defining (Gould's equation B):

Q^N = k^k (N-k)^(N-k) / N^N

Gould's equation A' can be combined with the unlabeled one just above

equation C into:

R = R2(x) = Q^(N/k) [(N-m-k x^2)/(N-m-k)]^(-(N-k)/(2k))

Now a numerical solution for x can be found after equating R1 and R2.

Gould's algorithm was based on the following pseudoconstant strategy

(3), the initial value for R being my own suggestion:

R = 0.25

<Find x from: R2(x)=R>

repeat

R = R1(x)

oldx = x

<Find x from: R2(x)=R>

until abs(x-oldx)<Eps

Although this works quite well on a computer (with some extra

termination logistics), much faster convergence can be achieved

through root-finding by Newton's Method.

I currently use the algorithm below (suitable for double precision

floating point). Some loop-invariant code have been kept explicit for

the sake of readability.

----------------------------------------------------------------------

function PeirceDeviation(N,k,m)

// Peirce's critical deviation for N samples, k potential outliers and

m

// 'unknown quantities' (usually 1).

if N-m-k<=0 then return(0) // result undefined

LnQN = k*Ln(k) + (N-k)*Ln(N-k) - N*Ln(N)

x = 1

repeat

x = min(x, sqrt((N-m)/k) - 1e-10)

// R1(x) and R2(x)

R1 = exp((x*x-1)/2)*erfc(x/sqrt(2))

R2 = exp( (LnQN - 0.5*(N-k)*ln((N-m-k*x*x)/(N-m-k)))/k )

// R1'(x) and R2'(x)

R1d = x*R1 - sqrt(2/pi/exp(1))

R2d = x*(N-k)/(N-m-k*x*x)*R2

// Newton

oldx = x

x = oldx - (R1-R2)/(R1d-R2d)

until abs(x-oldx) < N * 2E-16

return(x)

----------------------------------------------------------------------

Gould did not list x-values below 1, and when glancing down the tables

along the edge of the filled-in cells, one is tempted to ask if

something mathematically interesting can be said about the special

case x=1.

For that purpose we shall accept non-integer values for N and/or k.

Setting R1(1)=R2(1) we get:

erfc(1/sqrt(2)) = Q^(N/k) = k [(N-k)^(N-k) / N^N]^(1/k)

or, defining the ratio C = k/N:

erfc(1/sqrt(2)) = C (1-C)^(1/C-1)

So, x=1 implies a fixed k/N ratio, which does not depend on m.

C must be estimated numerically; here it is to 30 decimal places:

C = 0.58974 36745 10713 07248 06181 37574 ...

which just happens to be rather close to the simple fraction:

23/39 = 0.58974 35897 43589 ...

1. Peirce, B.: Criterion for the rejection of doubtful observations.

Astronomical Journal, vol. 2, iss. 45, p. 161-163 (1852).

http://adsabs.harvard.edu/abs/1852AJ......2..161P

http://adsabs.harvard.edu/abs/1852AJ......2..176P (erratum)

2. Gould, B.A.: On Peirce's Criterion for the Rejection of Doubtful

Observations,

with tables for facilitating its application.

Astronomical Journal, vol. 4, iss. 83, p. 81-87 (1855).

http://adsabs.harvard.edu/abs/1855AJ......4...81G

3. Acton, F.S.: Real Computing Made Real: Preventing Errors in

Scientific and

Engineering Calculations. Princeton University Press (1996).

Best regards,

Knud Thomsen

Geologist, Denmark