Topic: probability density function identification
Replies: 9   Last Post: May 25, 2013 5:27 AM

 David Jones Posts: 80 Registered: 2/9/12
Re: probability density function identification
Posted: May 21, 2013 3:46 AM

"Rich Ulrich" wrote in message
news:i51mp8trgggnsum94uh4dngdb07partm3t@4ax.com...

On Mon, 20 May 2013 21:43:18 -0700 (PDT), Fern
<priyan.fernando@gmail.com> wrote:

>Hi,
>
>I have a question on trying to reverse engineer the probability density
>function from which a set of numbers was generated. My setup is the
>following:
>
>1) I have two probability density functions, both of which have domains
>within [0,1]:
>a) Beta (4,2) distribution
>b) Uniform (0.358060,0.975273) distribution
>
>2) Note that the parameters of the Uniform distribution have been carefully
>selected so that it has the same mean and variance as the Beta
>distribution.
>
>3) From each distribution we generate 50 numbers.
>
>4) We then sum these random numbers separately (for the beta and uniform)
>and the values are placed as elements in two vectors (RandBeta and
>RandUnif).
>
>5) We repeat steps 3-4 until the vectors RandBeta and RandUnif have 20,000
>elements each.
>
>In light of the Central Limit Theorem (which would hold for summing
>variates drawn from the two distributions above) my question is whether it
>is possible to examine the vectors RandBeta and RandUnif (without knowing
>which is which) and determine which was generated from the Beta pdf and
>which from the Uniform pdf?
>
>Thanks!
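
For anyone wanting to reproduce the setup in steps 1-5, here is a minimal sketch using only Python's standard `random` module. The function names (`beta_mean_var`, `matching_uniform`, `simulate_sums`) and the seeds are my own illustrative choices, not anything from the original post. It also recomputes the uniform endpoints from the Beta(4,2) mean and variance as a check on step 2.

```python
import math
import random

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def matching_uniform(mean, var):
    """Endpoints of the uniform distribution with the given mean and variance."""
    half_width = math.sqrt(3.0 * var)  # from var = (hi - lo)^2 / 12
    return mean - half_width, mean + half_width

def simulate_sums(draw, n_terms=50, n_sums=20000, seed=None):
    """Return n_sums values, each the sum of n_terms independent draws."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(n_terms)) for _ in range(n_sums)]

mean, var = beta_mean_var(4, 2)
lo, hi = matching_uniform(mean, var)  # should reproduce the quoted endpoints

# Hypothetical counterparts of the RandBeta and RandUnif vectors.
rand_beta = simulate_sums(lambda rng: rng.betavariate(4, 2), seed=1)
rand_unif = simulate_sums(lambda rng: rng.uniform(lo, hi), seed=2)
```

Dividing each element by 50 turns the sums into the averages discussed in the replies.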

Selecting between two choices is not much reverse engineering.

The question is whether a sample of 20,000 is large enough to
detect the differences in distributions based on sampling
the averages of 50 uniforms vs. 50 beta(4,2). Testing would depend
on moments of higher order than the first and second.

<<snip>>
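
To put rough numbers on the remark about higher-order moments, here is a sketch that works out the theoretical skewness and excess kurtosis of an average of 50 draws from each source. It relies on two standard facts: skewness of an i.i.d. sum or average scales as 1/sqrt(n), and excess kurtosis as 1/n. The standard errors sqrt(6/N) and sqrt(24/N) are the usual large-sample approximations for near-normal data, so treat them as a guide only. All function and variable names are illustrative.

```python
import math

def beta_skewness(a, b):
    """Skewness of a Beta(a, b) distribution."""
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))

def beta_excess_kurtosis(a, b):
    """Excess kurtosis of a Beta(a, b) distribution."""
    num = 6 * ((a - b) ** 2 * (a + b + 1) - a * b * (a + b + 2))
    den = a * b * (a + b + 2) * (a + b + 3)
    return num / den

def sample_skewness(xs):
    """Moment-based sample skewness, for applying to an observed vector."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

n_terms, n_sums = 50, 20000

skew_beta_avg = beta_skewness(4, 2) / math.sqrt(n_terms)
skew_unif_avg = 0.0                    # the uniform is symmetric
kurt_beta_avg = beta_excess_kurtosis(4, 2) / n_terms
kurt_unif_avg = -1.2 / n_terms         # excess kurtosis of any uniform is -6/5

se_skew = math.sqrt(6 / n_sums)        # approximate, assumes near-normality
se_kurt = math.sqrt(24 / n_sums)       # approximate, assumes near-normality

print("skewness:", skew_beta_avg, skew_unif_avg, "approx SE:", se_skew)
print("excess kurtosis:", kurt_beta_avg, kurt_unif_avg, "approx SE:", se_kurt)
```

Under these approximations the skewness of the beta averages is several standard errors away from zero, while the two kurtosis values differ by less than one standard error. That suggests the third moment, not the fourth, is where a moment-based test would have a chance.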

It is not clear that moments would be useful. In this context the ranges of
possible values for the two averages are (0,1) and (0.358060,0.975273),
so that as soon as a value of the average outside the range
(0.358060,0.975273) occurs you know that the original distribution must have
been the Beta(4,2). Of course the probability of such an outcome from the
Beta(4,2) average distribution might be too small for this to have much
chance of happening within 20000 samples, but it perhaps indicates the way
to go, which to me seems to be to look at the tail behaviour.
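
As a rough check on how small that probability might be, here is a sketch that uses a plain normal approximation for the average of 50 Beta(4,2) draws. The normal approximation is crude this far into the tails, so the result is at best an order-of-magnitude indication, not the true probability. The constant and function names are illustrative.

```python
import math

N_TERMS, N_SAMPLES = 50, 20000
LO, HI = 0.358060, 0.975273            # uniform endpoints from the original post

beta_mean = 4 / 6                      # mean of Beta(4,2)
beta_var = 4 * 2 / (6 ** 2 * 7)        # variance of Beta(4,2)
sd_avg = math.sqrt(beta_var / N_TERMS) # standard deviation of the average

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Distance of each uniform endpoint from the mean, in SDs of the average.
z_lower = (beta_mean - LO) / sd_avg
z_upper = (HI - beta_mean) / sd_avg

# ASSUMPTION: normal approximation, unreliable this far out in the tails.
p_outside = normal_upper_tail(z_lower) + normal_upper_tail(z_upper)
expected_hits = N_SAMPLES * p_outside

print(z_lower, z_upper, p_outside, expected_hits)
```

Both endpoints sit about 12 standard deviations from the mean of the average. Even allowing for a very poor approximation, an out-of-range average among 20000 looks extremely unlikely, which points toward comparing tail shapes instead of waiting for an impossible value.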

For the uniform case, there are certainly analytical expressions for the
distribution function of the average. There may not be a corresponding
analytical expression for the beta distribution, but there are possibilities
of finding the distribution function numerically. If neither of these
appeals, there are still possibilities of proceeding if the OP is prepared to
generate samples known to be from one or other of the two sources. A
suggestion would be to construct appropriate log-survivor plots for the two
tails and to see how the sample versions of these compare to either the
known distributions (if possible), or to repeated samplings from the two
sources. The repeated-sampling approach would at least give an idea of how
much separation of the cases there can be in a sample of 20000 values.
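
A minimal sketch of the empirical side of the log-survivor suggestion, using only the standard library. Here "log-survivor" is taken to mean the log of the proportion of sample values at or beyond a given value, computed separately for each tail. The helper names, the choice of the 200 most extreme points, and the seeds are my own assumptions. The (x, log proportion) pairs can be passed to any plotting tool.

```python
import math
import random

def averages(draw, n_terms=50, n_avgs=20000, seed=None):
    """Return n_avgs values, each the average of n_terms independent draws."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(n_terms)) / n_terms
            for _ in range(n_avgs)]

def log_survivor(sample, tail="upper", n_points=200):
    """(x, log tail proportion) pairs for the n_points most extreme values.

    For tail="upper" the proportion is that of values >= x;
    for tail="lower" it is that of values <= x (assuming no ties).
    """
    xs = sorted(sample, reverse=(tail == "upper"))
    n = len(xs)
    # The k-th most extreme value has k sample points at or beyond it.
    return [(x, math.log(k / n)) for k, x in enumerate(xs[:n_points], start=1)]

LO, HI = 0.358060, 0.975273  # uniform endpoints from the original post

beta_avgs = averages(lambda rng: rng.betavariate(4, 2), seed=1)
unif_avgs = averages(lambda rng: rng.uniform(LO, HI), seed=2)

for tail in ("lower", "upper"):
    b = log_survivor(beta_avgs, tail)
    u = log_survivor(unif_avgs, tail)
    print(tail, "most extreme beta avg:", b[0][0],
          "most extreme uniform avg:", u[0][0])
```

Rerunning `averages` with different seeds and overlaying the resulting curves gives the repeated-sampling picture: if the curves from the two sources overlap heavily in both tails, a single sample of 20000 will not separate them by this route.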

David Jones
