Probability Density Functions
Date: 07/23/2003 at 14:13:15
From: Kevin
Subject: Probability Density Functions

My boss has given me the responsibility of taking a set of data, finding the correct distribution that goes with it, and then drawing the PDF onto the histogram chart. Unfortunately, I'm not sure what steps are needed to go from the data to a known distribution. Since I'm using various sets of data that change every time, I can't send in an example.

The most frustrating part is that, since I don't have a statistics background, I don't know where to begin. I've heard phrases like shape parameter, scale parameter, Pearson distribution, goodness-of-fit test, and method of moments, but they don't make any sense to me. I've done extensive research in textbooks, on the Internet, and with professors at the local university; however, I still don't have a clue where to begin.

Below are the PDFs for the distributions:

Exponential: lambda * Exp(-1 * lambda * x)
Gamma: [x^(a - 1) * Exp(-(x/B))] / [Gamma(a) * B^a]
Normal: Exp[(-(x-u)^2) / (2 * (a^2))] / [a * (2 * PI)^0.5]
Rayleigh: [x * Exp(-0.5 * (x / a)^2)] / a^2
Weibull: [a * (x^(a-1)) * Exp(-(x/B)^a)] / (B^a)

For the Normal curve I've learned the various parameters that go into the function; however, none of the other curves is as well documented. What I need is a guiding voice to tell me where to begin and how to go about finding out which distribution the data falls under. Finding out how to calculate the shape and scale parameters where applicable would be nice as well.
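To make the formulas in the question concrete, here is a minimal Python sketch that writes each PDF exactly as given, with parameter names taken from the question (lam for lambda; u for the Normal mean; a as the shape parameter, or as the standard deviation of the Normal and the parameter of the Rayleigh; B as the scale parameter). The helpers `integrate` and `histogram_height` are hypothetical names added for illustration: the first is a rough numerical check that each density integrates to about 1, and the second uses the usual approximation for overlaying a density on a count histogram (number of observations times bin width times the PDF). The example also checks that the Rayleigh form with parameter a equals a Weibull with shape 2 and scale a * sqrt(2).

```python
import math

# Parameter names follow the question: lam is lambda, u is the Normal mean,
# a is the shape (or the Normal standard deviation / Rayleigh parameter),
# and B is the scale.


def exponential_pdf(x, lam):
    """lam * exp(-lam * x), defined for x >= 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def gamma_pdf(x, a, B):
    """x^(a-1) * exp(-x/B) / (Gamma(a) * B^a); shape a, scale B, x > 0."""
    if x <= 0:
        return 0.0
    return x ** (a - 1) * math.exp(-x / B) / (math.gamma(a) * B ** a)


def normal_pdf(x, u, a):
    """exp(-(x-u)^2 / (2 a^2)) / (a * sqrt(2 pi)); mean u, std. dev. a."""
    return math.exp(-((x - u) ** 2) / (2 * a ** 2)) / (a * math.sqrt(2 * math.pi))


def rayleigh_pdf(x, a):
    """x * exp(-0.5 * (x/a)^2) / a^2, defined for x >= 0."""
    return x * math.exp(-0.5 * (x / a) ** 2) / a ** 2 if x >= 0 else 0.0


def weibull_pdf(x, a, B):
    """a * x^(a-1) * exp(-(x/B)^a) / B^a; shape a, scale B, x > 0."""
    if x <= 0:
        return 0.0
    return a * x ** (a - 1) * math.exp(-((x / B) ** a)) / B ** a


def histogram_height(pdf, x, n, bin_width):
    """Approximate height of a density on a count histogram of n values."""
    return n * bin_width * pdf(x)


def integrate(f, lo, hi, steps=20000):
    """Midpoint-rule integral of f over [lo, hi]; a rough sanity check only."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    # A density should integrate to approximately 1.
    print(integrate(lambda x: gamma_pdf(x, 2.5, 1.5), 0, 60))
    # Rayleigh(a) coincides with Weibull(shape 2, scale a * sqrt(2)).
    a = 1.3
    print(rayleigh_pdf(0.7, a), weibull_pdf(0.7, 2, a * math.sqrt(2)))
```

Evaluating `histogram_height` at the bin midpoints, with fitted parameters plugged in, gives points for the curve to draw on the histogram chart; for wide bins, n times the difference of the CDF across each bin is more accurate than the midpoint approximation.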
Date: 07/23/2003 at 15:07:36
From: Doctor George
Subject: Re: Probability Density Functions

Hi Kevin,

Thanks for writing to Doctor Math.

As you suspect, you have a learning curve to climb. The most common thing to do is to fit a Pearson distribution to the data. The basic method-of-moments technique compares the first four moments of your data with the moments of the curves in the Pearson family. The third and fourth moments, related to skewness and kurtosis, are the basis for determining which member of the Pearson family to use. The first two moments (the mean and variance) tell you how to shift and scale the distribution to fit your data. I strongly recommend that you get your hands on Norman Johnson's book, _Systems of Frequency Curves_.

The Rayleigh distribution is a special case of the Weibull (a Weibull with shape parameter 2), and neither of them is in the Pearson family. If you have a reason to fit those particular distributions, they must be fitted separately.

The Pearson Type IV is numerically difficult to handle. It is common to substitute the Johnson Su distribution for it. You will find it in Johnson's book as well.

A key question is what information you hope to gain from the fitted distribution. Drawing inferences from the fitted distribution is not always as good an idea as it initially seems (though it may make for impressive presentations). The fitted distribution may turn out to have little to do with your actual data in some critical respect.

There is a substantial amount of code that you will have to write. Many of the building blocks are available for free through the NIST website. You may also want to construct random number generators for the various members of the Pearson family to help when testing.

If need be, there are commercial software packages that do this kind of task. Whether or not you can integrate with them effectively is another issue.

Write again if I can give you more help.

- Doctor George, The Math Forum
  http://mathforum.org/dr.math/
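The moment-based selection described in the answer can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the procedure from Johnson's book: it computes divide-by-n sample moments, forms beta1 (squared skewness) and beta2 (kurtosis, equal to 3 for a normal), and applies Pearson's kappa criterion, one standard way of turning the third and fourth moments into a choice of Pearson type. The function names and the tolerance are hypothetical illustrative choices, and the Johnson Su substitute for Type IV is not implemented. `fit_gamma_moments` shows the simplest case of getting shape and scale from the mean and variance, and the simulated data come from Python's built-in `random.gammavariate`, in the spirit of using random number generators for testing.

```python
import random


def sample_moments(data):
    """Return mean, variance, skewness and kurtosis (divide-by-n moments).

    Kurtosis here is not "excess" kurtosis: it equals 3 for a normal.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def pearson_type(beta1, beta2, tol=1e-6):
    """Classify (beta1, beta2) with Pearson's kappa criterion.

    beta1 is the squared skewness, so its sign (the curve's orientation)
    is discarded; beta2 is the kurtosis.
    """
    if abs(beta1) < tol:  # symmetric curves
        if abs(beta2 - 3) < tol:
            return "normal"
        return "II" if beta2 < 3 else "VII"
    d = 2 * beta2 - 3 * beta1 - 6
    if abs(d) < tol:  # the gamma (Type III) line
        return "III"
    kappa = beta1 * (beta2 + 3) ** 2 / (4 * (4 * beta2 - 3 * beta1) * d)
    if kappa < 0:
        return "I"
    if abs(kappa - 1) < tol:
        return "V"
    return "IV" if kappa < 1 else "VI"


def fit_gamma_moments(data):
    """Two-parameter gamma by moments: shape a = mean^2/var, scale B = var/mean."""
    mean, var, _, _ = sample_moments(data)
    return mean ** 2 / var, var / mean


if __name__ == "__main__":
    # A gamma curve with shape 4 has beta1 = 1 and beta2 = 4.5,
    # which lies on the Type III line 2*beta2 - 3*beta1 - 6 = 0.
    print(pearson_type(1.0, 4.5))  # III
    # Simulated gamma data (shape 3, scale 2) to exercise the fitting code.
    rng = random.Random(2003)
    data = [rng.gammavariate(3.0, 2.0) for _ in range(20000)]
    mean, var, skew, kurt = sample_moments(data)
    print("beta1 =", skew ** 2, "beta2 =", kurt)
    print("gamma shape, scale:", fit_gamma_moments(data))
```

Because sample estimates of beta1 and beta2 are noisy, real data will almost never land exactly on the boundary types (normal, II, III, V, VII); in practice the estimates would be compared with a tolerance that reflects the sample size.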
© 1994- The Math Forum at NCTM. All rights reserved.