Re: A guess of the Probability density function from percentile values
Posted: Dec 27, 2010 1:49 PM
On Dec 27, 2:17 am, Allamarein <matteo.diplom...@gmail.com> wrote:
> I have just posted a similar thread in another group.
> I hope to be more lucky here.
>
> I know three percentile values.
> Let's say they are:
> 95% 82.1
> 50% 80.3
> 5% 77.8
>
> I would to get a Probability density function.
> I presume I would guess the shape of this curve.
> Since these data refer to a scientific measuring, I would find a
> t-Student or a Gaussian distribution that is consistent with the
> previous percentiles.
>
> Any suggestions?

To expand on Astanoff's answer: if your density function is normal with mean 'a' and standard deviation 'b', you want to find 'a' and 'b'. The cumulative distribution function has the form F(x) = Phi((x-a)/b), where Phi(z) = Pr{N(0,1) <= z} is the standard normal cumulative distribution function; it is widely tabulated and readily available in standard software and even on scientific hand-held calculators. You have F(77.8) = 0.05, F(80.3) = 0.50 and F(82.1) = 0.95. Unfortunately, these data are inconsistent with a normal distribution! The problem is that Phi(0) = 1/2 (so the mean is a = 80.3), while Phi(-1.644853627) = 0.05 and Phi(1.644853627) = 0.95. Thus, in the normal distribution, the 5% and 95% percentiles must lie at equal distances (1.644853627*b) below and above the 50% percentile, but in your case the spacing between 5% and 50% is 80.3 - 77.8 = 2.5, while the spacing between 50% and 95% is 82.1 - 80.3 = 1.8.
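
If you would rather check this numerically than from tables, here is a small Python sketch. It is only an illustration: it assumes Python 3.8 or later (for statistics.NormalDist), and the variable names are mine, not anything from the original question.

```python
from statistics import NormalDist

# Standard normal quantiles for the 5% and 95% levels.
std = NormalDist()            # mean 0, standard deviation 1
z05 = std.inv_cdf(0.05)       # about -1.644853627
z95 = std.inv_cdf(0.95)       # about +1.644853627

# Spacings of the reported percentiles around the reported median.
lower_gap = 80.3 - 77.8       # median minus 5% point, about 2.5
upper_gap = 82.1 - 80.3       # 95% point minus median, about 1.8

# A normal distribution needs the two gaps to be equal (both z05*b in size).
consistent = abs(lower_gap - upper_gap) < 1e-9

print("z(5%) =", z05, " z(95%) =", z95)
print("lower gap =", lower_gap, " upper gap =", upper_gap)
print("consistent with a normal distribution:", consistent)
```

The last line reports False for these three values, which is the inconsistency described above.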
So, now you have a problem: a normal distribution will not fit your data. At this point you have several options: (1) change the distribution, preferably to one having three parameters, because you have three conditions to fit; (2) use a normal distribution but keep only two of the three items of data (giving three different answers, depending on which two out of three you keep); or (3) decide on some other type of "best fit", such as least-squares or least absolute deviation. In other words, look for parameters (a,b) that minimize [F(77.8)-0.05]^2 + [F(80.3)-0.50]^2 + [F(82.1)-0.95]^2 or that minimize |F(77.8)-0.05| + |F(80.3)-0.50| + |F(82.1)-0.95|. The least-squares problem is a nonlinear optimization problem that can be solved using standard software such as the Solver tool in EXCEL. The least-absolute-deviation problem can be turned into one of minimizing a linear function subject to nonlinear inequality constraints (minimize t1 + t2 + t3 subject to -t1 <= F(77.8)-0.05 <= t1, and similarly for the other two terms), which can also be solved in EXCEL, for example. Note that you will not have an exact fit to your data, but that may not matter, because if your data are really the result of measurement (as you state) they are inaccurate anyway.
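
For anyone without EXCEL at hand, here is a rough Python sketch of the least-squares version of option (3). To be clear about the assumptions: this is not the Solver tool, just a very simple derivative-free "compass search" that I have swapped in so the example needs nothing beyond the standard library; the function names (sse, compass_search), the starting point (80.3, 1.3), the initial step and the tolerance are all arbitrary choices of mine.

```python
from statistics import NormalDist

# (x, p) pairs: the value x is reported as the 100*p-th percentile.
data = [(77.8, 0.05), (80.3, 0.50), (82.1, 0.95)]

def sse(a, b):
    """Sum of squared errors [F(x) - p]^2 for a normal with mean a, sd b."""
    F = NormalDist(a, b).cdf
    return sum((F(x) - p) ** 2 for x, p in data)

def compass_search(f, a, b, step=0.5, tol=1e-8):
    """Crude derivative-free minimizer: try +/- step in each coordinate,
    accept any move that lowers f, and halve the step when none does."""
    best = f(a, b)
    while step > tol:
        improved = False
        for da, db in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            na, nb = a + da, b + db
            if nb <= 0:          # the standard deviation must stay positive
                continue
            val = f(na, nb)
            if val < best:
                a, b, best = na, nb, val
                improved = True
        if not improved:
            step /= 2.0
    return a, b, best

a_ls, b_ls, err = compass_search(sse, 80.3, 1.3)
print("least-squares fit: a =", a_ls, " b =", b_ls, " sum of squares =", err)
```

Any general-purpose nonlinear optimizer should land on essentially the same (a,b); the point is only that the objective is cheap to write down once F is available.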
If you choose option (2), keeping only F(80.3) = 0.50 and F(82.1) = 0.95 (for example), you have a = 80.3 (because in the normal, the mean is the 50th percentile) and (82.1 - 80.3)/b = 1.8/b = 1.644853627 - 0, so b = 1.8/1.644853627 =~= 1.094. You would get different answers if you chose 5% and 50% or 5% and 95%.
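
Here is a short Python sketch of option (2) for all three possible pairs, again assuming Python 3.8 or later for statistics.NormalDist. The helper name fit_two is hypothetical; it just solves the two equations (x1 - a)/b = z1 and (x2 - a)/b = z2 for a and b.

```python
from itertools import combinations
from statistics import NormalDist

# (p, x) pairs: x is the reported 100*p-th percentile.
data = [(0.05, 77.8), (0.50, 80.3), (0.95, 82.1)]
std = NormalDist()  # standard normal, used for the quantiles z = Phi^{-1}(p)

def fit_two(p1, x1, p2, x2):
    """Normal (a, b) matching two percentiles exactly."""
    z1, z2 = std.inv_cdf(p1), std.inv_cdf(p2)
    b = (x2 - x1) / (z2 - z1)   # standard deviation
    a = x1 - b * z1             # mean
    return a, b

# One fit for each way of keeping two of the three data items.
fits = {}
for (p1, x1), (p2, x2) in combinations(data, 2):
    fits[(p1, p2)] = fit_two(p1, x1, p2, x2)
    a, b = fits[(p1, p2)]
    print(f"keep {p1:.0%} and {p2:.0%}: a = {a:.4f}, b = {b:.4f}")
```

Keeping 50% and 95% reproduces a = 80.3, b =~= 1.094; keeping 5% and 50% gives a = 80.3, b =~= 1.520; keeping 5% and 95% gives a = 79.95, b =~= 1.307.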
R.G. Vickson