"Paul" wrote in message news:email@example.com...
I've found conflicting information about the degrees of freedom to use in the chi-square distribution when estimating failure rate from the number of failures seen over a specified period of time. To be sure, the lower MTBF (upper failure rate) always uses 2n+2, where n is the number of failures. However, the upper MTBF (lower failure rate) is shown as using both 2n and 2n+2, depending on the source. I haven't found an online explanation of exactly how the chi-square distribution enters into the calculation (other than http://www.weibull.com/hotwire/issue116/relbasics116.htm, which I'm still chewing on). So I haven't been able to determine whether 2n or 2n+2 is correct from first principles at this point. Based on the reasoning in the above weibull.com page, however, I am inclined to believe that the degrees of freedom should be 2n because we're talking about the two tails of the *same* distribution for upper and lower limits. But this leaves the mystery of why 2n+2 shows up frequently. Is the reason for this straightforward enough to explain via this newsgroup?
At first sight, there seem to be two valid approaches:
(1) treat the problem as having observed n intervals between failures, each drawn from an exponential distribution. The last interval (after the last failure) comes from the same exponential distribution but is cut short by the end of the test, and it is not independent of the rest, since the sum of all the intervals must be the total "specified period of time".
(2) treat the problem as having observed n, where n is a realisation of a Poisson random variable.
These lead to two different estimates of the rate: the first is based on the time to the last failure, the second on the total observation period. There are then two different confidence intervals. The first uses a chi-squared distribution with 2n degrees of freedom, because 2 x rate x (time to the nth failure) is the sum of n independent realisations from a chi-squared distribution with 2 degrees of freedom. The second can also lead to a chi-squared distribution, but for an indirect reason: the cumulative distribution functions of the Poisson and chi-squared distributions are related. For a Poisson variable N with mean mu, P(N <= n) equals the upper-tail probability at 2*mu of a chi-squared distribution with 2n+2 degrees of freedom, which is where 2n+2 comes from. However, http://en.wikipedia.org/wiki/Poisson_distribution#Confidence_interval indicates that the two limits in this approach use different degrees of freedom: 2n for the lower limit on the rate and 2n+2 for the upper limit.
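As a rough numerical illustration of the two intervals, here is a minimal Python sketch that uses only the standard library. The function names (poisson_cdf, chi2_ppf_even, rate_interval_fixed_time, rate_interval_fixed_failures) and the example figures are made up for this post, not taken from any package. The chi-squared quantile is found by bisection on the Poisson/chi-squared relation, which only works for even degrees of freedom (all that is needed here).

```python
import math

def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu); 0.0 when n < 0."""
    if n < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return min(total, 1.0)

def chi2_cdf_even(x, df):
    """Chi-squared CDF for even df, using
    P(chi2_df <= x) = 1 - P(N <= df/2 - 1) with N ~ Poisson(x/2)."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    return 1.0 - poisson_cdf(df // 2 - 1, x / 2.0)

def chi2_ppf_even(p, df):
    """Chi-squared quantile (even df) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf_even(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate_interval_fixed_time(n, total_time, conf=0.95):
    """Approach (2): n is Poisson over a fixed observation period.
    Lower rate limit uses 2n df, upper rate limit uses 2n+2 df."""
    alpha = 1.0 - conf
    lower = 0.0 if n == 0 else chi2_ppf_even(alpha / 2, 2 * n) / (2 * total_time)
    upper = chi2_ppf_even(1 - alpha / 2, 2 * n + 2) / (2 * total_time)
    return lower, upper

def rate_interval_fixed_failures(n, time_to_last_failure, conf=0.95):
    """Approach (1): n fixed, time to the nth failure is random.
    Both rate limits use 2n df."""
    if n < 1:
        raise ValueError("need at least one failure")
    alpha = 1.0 - conf
    lower = chi2_ppf_even(alpha / 2, 2 * n) / (2 * time_to_last_failure)
    upper = chi2_ppf_even(1 - alpha / 2, 2 * n) / (2 * time_to_last_failure)
    return lower, upper

if __name__ == "__main__":
    # Hypothetical example: 5 failures, unit observation time.
    print(rate_interval_fixed_time(5, 1.0))
    print(rate_interval_fixed_failures(5, 1.0))
```

With 5 failures in unit time, approach (2) should give rate limits of roughly 1.62 and 11.67 at 95% confidence, while approach (1) with the same time figure gives the same lower limit but a smaller upper limit, since it uses 2n rather than 2n+2 degrees of freedom there. MTBF limits are the reciprocals, with upper and lower swapped.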
It seems likely that approach (2) uses more statistical information than approach (1), and hence that the basic estimate in (2) is to be preferred. The fact that approach (2) apparently uses two different degrees of freedom might be thought of as just a mathematical artefact, or as a result of using n+1 exponential observations (in the form of the total interval length) that are not independent, as an adjustment from the result for n independent observations.
However, approach (1) is clearly an invalid representation of the experiment, since n is not fixed in advance, although it might still give a valid inference if one argues conditionally on the observed n. Approach (2) is also able to give a reasonable answer for the case where n=0. There are presumably alternative ways of treating the problem that apply when the failure time distribution is not exponential, but these must lead to different results, as n would then not be Poisson.
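The n=0 case can be worked out by hand under the Poisson treatment. As a sketch, assume a one-sided upper limit at confidence level 1-alpha and a total observation time T (the symbols alpha, T and lambda_U are introduced here only for illustration). The upper limit is the rate at which seeing no failures at all would have probability alpha:

```latex
P(N = 0 \mid \lambda_U) = e^{-\lambda_U T} = \alpha
\quad\Longrightarrow\quad
\lambda_U = \frac{-\ln \alpha}{T} = \frac{\chi^2_{1-\alpha;\,2}}{2T}
```

The last equality holds because a chi-squared distribution with 2 degrees of freedom has CDF 1 - exp(-x/2), so its (1-alpha) quantile is -2 ln(alpha). This is the 2n+2 formula with n=0; at alpha = 0.05 it gives an upper rate limit of about 3/T. A formula with 2n degrees of freedom has nothing to offer here, since 2n = 0.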