### Probability Density Functions

```
Date: 07/23/2003 at 14:13:15
From: Kevin
Subject: Probability Density Functions

My boss has given me the responsibility of taking a set of data,
finding the correct distribution that goes with it, and then drawing
the PDF onto the histogram chart. Unfortunately I'm not sure what
steps are needed to go from the data to a known distribution. Since
the data sets I work with change every time, I can't send in a single
example.

The most frustrating part is that since I don't have a statistics
background I don't know where to begin. I've heard phrases like
shape parameter, scale parameter, Pearson Distribution, Goodness of
Fit test and Method of Moments, but they don't make any sense to me.
I've done extensive research within textbooks, the Internet, and
through Profs at the local university; however, I still don't have a
clue where to begin.

Below are the PDFs for the various distributions.

Exponential: lambda * Exp(-1 * lambda * x)
Gamma:       [x^(a - 1) * Exp(-(x / B))] / [Gamma(a) * B^a]
Normal:      Exp[(-(x - u)^2) / (2 * (a^2))] / [a * (2 * PI)^0.5]
Rayleigh:    [x * Exp(-0.5 * (x / a)^2)] / a^2
Weibull:     [a * (x^(a - 1)) * Exp(-(x / B)^a)] / (B^a)

For the Normal curve I've learned the various parameters that go into
the function; however, none of the other curves is as well documented.

What I need is a guiding voice to tell me where to begin and how to
go about finding out which distribution the data falls under.
Finding out how to calculate the shape and scale variables where
applicable would be nice as well.
```
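
The five densities listed in the question can be written out directly. The following is a minimal Python sketch, assuming the parameter letters used in the post (`lam`, `a`, `B`, `u`); the function names and the `histogram_overlay` helper are illustrative and not part of the original exchange. To draw a density on a histogram of counts, the density values are multiplied by the number of data points and the bin width, which is what the helper does.

```python
import math

def exponential_pdf(x, lam):
    # lambda * Exp(-lambda * x), supported on x >= 0
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def gamma_pdf(x, a, B):
    # shape a, scale B; this sketch returns 0 outside x > 0
    if x <= 0:
        return 0.0
    return x ** (a - 1) * math.exp(-x / B) / (math.gamma(a) * B ** a)

def normal_pdf(x, u, a):
    # mean u, standard deviation a
    return math.exp(-((x - u) ** 2) / (2 * a ** 2)) / (a * math.sqrt(2 * math.pi))

def rayleigh_pdf(x, a):
    # scale a, supported on x >= 0
    if x < 0:
        return 0.0
    return x * math.exp(-0.5 * (x / a) ** 2) / a ** 2

def weibull_pdf(x, a, B):
    # shape a, scale B; this sketch returns 0 outside x > 0
    if x <= 0:
        return 0.0
    return a * x ** (a - 1) * math.exp(-((x / B) ** a)) / B ** a

def histogram_overlay(pdf, xs, n_points, bin_width):
    # Scale density values to the count axis of a histogram:
    # expected count per bin is roughly pdf(x) * n_points * bin_width.
    return [pdf(x) * n_points * bin_width for x in xs]

# Hypothetical usage: 200 observations, bins of width 0.5,
# overlaying a gamma density with shape 2 and scale 1.
xs = [0.5 * i for i in range(1, 9)]
curve = histogram_overlay(lambda x: gamma_pdf(x, 2.0, 1.0), xs,
                          n_points=200, bin_width=0.5)
```

With these definitions, the gamma density with shape 1 reduces to the exponential density with lambda = 1/B, and the Weibull density with shape 2 and scale `a * sqrt(2)` reduces to the Rayleigh density with scale `a`.
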

```
Date: 07/23/2003 at 15:07:36
From: Doctor George
Subject: Re: Probability Density Functions

Hi Kevin,

Thanks for writing to Doctor Math.

As you suspect, you have a learning curve to climb. The most common
thing to do is to fit a Pearson distribution to the data. The basic
method of moments technique compares the first four moments of your
data with the moments of the curves in the Pearson family.

The third and fourth moments, which determine skewness and kurtosis,
are the basis for deciding which member of the Pearson family to use.
The first two moments (mean and variance) tell you how to shift and
scale the distribution.

I strongly recommend that you get your hands on W. P. Elderton and
N. L. Johnson's book, _Systems of Frequency Curves_.

The Rayleigh distribution is a special case of the Weibull (shape
parameter 2), and neither of them is in the Pearson family. If you
have a reason to fit those particular distributions, they must be
fitted separately.

The Pearson Type IV is numerically difficult to handle. It is common
to substitute the Johnson S_U distribution for it. You will find it
in the same book.

A key question is what information you hope to gain from the fitted
distribution. Drawing inferences from the fitted distribution is not
always as good an idea as it initially seems (though it may make for
impressive presentations). It may be that the distribution will have
little to do with your actual data in some critical respect.

There is a substantial amount of code that you will have to write.
Many of the building blocks are available for free through the NIST
website. You may also want to construct random number generators for
the various members of the Pearson family to help when testing.

If need be, there are commercial software packages that do this kind
of task. Whether or not you can integrate with them effectively is
another issue.

Write again if I can give you more help.

- Doctor George, The Math Forum
http://mathforum.org/dr.math/
```
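
The moment-based approach described in the reply can be sketched in Python as follows. This is an illustration under stated assumptions, not Doctor George's own procedure: the function names are hypothetical, the sample moments use the simple 1/n form, the type selection uses the classical Pearson kappa criterion as it is usually stated, and the tolerance `tol` is an arbitrary choice that real data would need tuned. The parameter estimates match moments for the exponential, gamma, normal, and Rayleigh densities from the question; the Weibull has no closed-form moment solution and needs a numerical root-finder, so it is left out.

```python
import math

def sample_moments(data):
    """Return mean, variance, beta1 (squared skewness) and beta2 (kurtosis)."""
    n = len(data)
    mean = sum(data) / n
    m2, m3, m4 = (sum((x - mean) ** k for x in data) / n for k in (2, 3, 4))
    return mean, m2, m3 ** 2 / m2 ** 3, m4 / m2 ** 2

def pearson_kappa(beta1, beta2):
    """Pearson's criterion; infinite on the Type III (gamma) line."""
    denom = 4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
    return math.inf if denom == 0 else beta1 * (beta2 + 3) ** 2 / denom

def pearson_type(beta1, beta2, tol=0.05):
    """Rough type selection; tol is an assumed tolerance, not a standard value."""
    if beta1 < tol:                                # effectively symmetric
        if abs(beta2 - 3) < tol:
            return "normal"
        return "II" if beta2 < 3 else "VII"
    if abs(2 * beta2 - 3 * beta1 - 6) < tol:       # gamma line
        return "III"
    kappa = pearson_kappa(beta1, beta2)
    if kappa < 0:
        return "I"
    if abs(kappa - 1) < tol:
        return "V"
    return "IV" if kappa < 1 else "VI"

def moment_estimates(mean, var):
    """Method-of-moments parameters, using the symbols from the question.

    Assumes positive data (mean > 0) for the exponential, gamma and Rayleigh.
    """
    return {
        "exponential": {"lambda": 1 / mean},
        "gamma": {"a": mean ** 2 / var, "B": var / mean},
        "normal": {"u": mean, "a": math.sqrt(var)},
        "rayleigh": {"a": mean * math.sqrt(2 / math.pi)},
    }

# Hypothetical usage on a tiny data set.
mean, var, beta1, beta2 = sample_moments([1, 2, 3, 4, 5])
family = pearson_type(beta1, beta2)
params = moment_estimates(mean, var)
```

Skewness and kurtosis estimates are noisy for small samples, so the selected type can change from one data set to the next even when the underlying process does not. That is one practical reason to treat conclusions drawn from the fitted curve with caution.
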
Associated Topics:
College Statistics
