Topic: A guess of the Probability density function from percentile values
Replies: 17   Last Post: Jan 2, 2011 10:06 AM

 clvickson@gmail.com Posts: 32 Registered: 1/12/09
Re: A guess of the Probability density function from percentile values
Posted: Dec 27, 2010 1:49 PM

On Dec 27, 2:17 am, Allamarein <matteo.diplom...@gmail.com> wrote:
> I have just posted a similar thread in another group.
> I hope to be more lucky here.
>
> I know three percentile values.
> Let's say they are:
> 95% 82.1
> 50% 80.3
> 5%  77.8
>
> I would to get a Probability density function.
> I presume I would guess the shape of this curve.
> Since these data refer to a scientific measuring, I would find a t-
> Student or a Gaussian distribution that is consistent with the
> previous percentiles.
>
> Any suggestions?

Assuming a normal (Gaussian) distribution
with mean 'a' and standard deviation 'b', you want to find 'a' and
'b'. The cumulative distribution function F(x) has the form F(x) =
Phi((x-a)/b), where Phi(z) = Pr{ N(0,1) <= z} = standard normal
cumulative distribution; this function is widely tabulated and readily
available in standard software and even on scientific hand-held
calculators. You have F(77.8) = 0.05, F(80.3) = 0.50 and F(82.1) =
0.95. Unfortunately, these data are inconsistent! The problem is that
Phi(0) = 1/2 (so the mean is a = 80.3), while Phi(-1.644853627) = 0.05
and Phi(1.644853627) = 0.95. Thus, in the normal distribution, the 5%
and 95% percentiles should lie equally far from the 50% percentile, but
in your case the spacing between 5% and 50% is 80.3 - 77.8 = 2.5, while
the spacing between 50% and 95% is 82.1 - 80.3 = 1.8.
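
A quick numerical check of this inconsistency, as a minimal sketch in
Python using only the standard library's statistics.NormalDist (the
variable names are illustrative, not part of the original question).
Each half of the spread implies its own value of 'b'; for exactly
normal data the two values would agree.

```python
from statistics import NormalDist

# Percentile levels and the values reported in the question.
data = {0.05: 77.8, 0.50: 80.3, 0.95: 82.1}

std_normal = NormalDist()  # mean 0, standard deviation 1
z95 = std_normal.inv_cdf(0.95)  # about 1.644853627; by symmetry z05 = -z95

lower_gap = data[0.50] - data[0.05]  # about 2.5
upper_gap = data[0.95] - data[0.50]  # about 1.8

# For a normal distribution both gaps would equal z95 * b.
b_from_lower = lower_gap / z95
b_from_upper = upper_gap / z95

print(f"z95 = {z95:.9f}")
print(f"b implied by 5%-50% gap:  {b_from_lower:.4f}")
print(f"b implied by 50%-95% gap: {b_from_upper:.4f}")
```

The two implied values of 'b' differ (roughly 1.52 versus 1.09), which
is the inconsistency described above.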

So, now you have a problem: a normal distribution will not fit your
data. At this point you have several options: (1) change the
distribution---preferably to one having three parameters, because you
have three conditions to fit; (2) use a normal distribution but keep
only two of the three items of data (giving three different answers,
depending on which two out of three you keep); or (3) decide on some
other type of "best fit", such as least-squares or least absolute
deviation--- in other words, look for parameters (a,b) that minimize
[F(77.8)-0.05]^2 + [F(80.3)-0.50]^2 + [F(82.1) - 0.95]^2 or that
minimize |F(77.8)-0.05| + |F(80.3)- 0.50| + |F(82.1) - 0.95|. The
first one is a nonlinear optimization problem that can be solved using
standard software such as the Solver tool in EXCEL. The second one can
be turned into a problem of minimizing a linear function subject to
nonlinear inequality constraints, which can also be solved in EXCEL,
for example. Note that you will not have an exact fit to your data,
but that may not matter because if your data are really the result of
measurement (as you state) they are inaccurate anyway.
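
For the least-squares version of option (3), here is a rough sketch in
Python rather than EXCEL. It is not the Solver method; it swaps in a
simple derivative-free compass search written from scratch so that it
needs only the standard library. The function names, the starting
guess and the step sizes are my own assumptions.

```python
from statistics import NormalDist

# Percentile data from the question: (x, target probability).
points = [(77.8, 0.05), (80.3, 0.50), (82.1, 0.95)]

def sum_sq(params):
    """Sum of squared differences between F(x) and the target probabilities."""
    a, b = params
    if b <= 0:
        return float("inf")  # standard deviation must be positive
    dist = NormalDist(a, b)
    return sum((dist.cdf(x) - p) ** 2 for x, p in points)

def compass_search(f, start, step=0.5, tol=1e-9):
    """Derivative-free minimization: try +/- step along each coordinate,
    and halve the step whenever no move improves the objective."""
    point, best = list(start), f(start)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = point.copy()
                trial[i] += delta
                value = f(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2
    return point, best

# Start from the median and a rough spread (an arbitrary but plausible guess).
(a_fit, b_fit), sse = compass_search(sum_sq, [80.3, 1.3])
print(f"a = {a_fit:.4f}, b = {b_fit:.4f}, sum of squares = {sse:.6f}")
```

The fitted 'a' should come out slightly below the median 80.3 and the
fitted 'b' between the two values implied by the separate halves of
the data. The residual sum of squares stays above zero, since no
normal distribution matches all three percentiles exactly.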

If you choose option (2), keeping only F(80.3) = 0.50 and F(82.1) =
0.95 (for example), you have a = 80.3 (because in the normal, the mean
is the 50th percentile) and (82.1 - 80.3)/b = 1.8/b = 1.644853627 - 0,
so b = 1.8/1.644853627 =~= 1.094. You would get different answers
if you chose 5% and 50% or 5% and 95%.
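
To see how much the answer depends on which pair is kept, here is a
small Python sketch (standard library only; fit_two_points is a helper
name I made up). It solves the two equations (x1 - a)/b = z1 and
(x2 - a)/b = z2 for each of the three pairs.

```python
from statistics import NormalDist

def fit_two_points(p1, x1, p2, x2):
    """Solve (x1 - a)/b = z1 and (x2 - a)/b = z2 for (a, b)."""
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    b = (x2 - x1) / (z2 - z1)
    a = x1 - b * z1
    return a, b

fits = {
    "50% and 95%": fit_two_points(0.50, 80.3, 0.95, 82.1),
    "5% and 50%": fit_two_points(0.05, 77.8, 0.50, 80.3),
    "5% and 95%": fit_two_points(0.05, 77.8, 0.95, 82.1),
}
for label, (a, b) in fits.items():
    print(f"{label}: a = {a:.3f}, b = {b:.3f}")
```

The first pair reproduces a = 80.3, b =~= 1.094; the 5%/50% pair gives
the same mean with b about 1.520, and the 5%/95% pair gives a about
79.95 with b about 1.307.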

R.G. Vickson
