Topic: probability density function identification
Replies: 9   Last Post: May 25, 2013 5:27 AM

David Jones

Re: probability density function identification
Posted: May 21, 2013 3:46 AM



"Rich Ulrich" wrote in message
news:i51mp8trgggnsum94uh4dngdb07partm3t@4ax.com...

On Mon, 20 May 2013 21:43:18 -0700 (PDT), Fern
<priyan.fernando@gmail.com> wrote:

>Hi,
>
>I have a question on trying to reverse engineer the probability density
>function from which a set of numbers were generated. My setup is the
>following:
>
>1) I have two probability density functions, both of whose domain is
>bounded in [0,1]:
>a) Beta (4,2) distribution
>b) Uniform (0.358060,0.975273) distribution
>
>2) Note that the parameters of the Uniform distribution have been carefully
>selected so that it has the same mean and variance as the Beta
>distribution.
>
>3) From each distribution we generate 50 numbers.
>
>4) We then sum these random numbers separately (for the beta and uniform)
>and the values are placed as elements in two vectors (RandBeta and
>RandUnif).
>
>5) We repeat steps 3-4 until the vectors RandBeta and RandUnif have 20,000
>elements each.
>
>In light of the Central Limit Theorem (which would hold for summing
>variates drawn from the two distributions above) my question is whether it
>is possible to examine the vectors RandBeta and RandUnif (without knowing
>which is which) and determine which was generated from the Beta pdf and
>which from the Uniform pdf?
>
>Thanks!
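
A minimal sketch of the setup described in steps 1-5, using only the Python standard library. The function names are illustrative and not from the original post; rand_beta and rand_unif stand in for the vectors RandBeta and RandUnif. It first checks that the stated Uniform parameters do reproduce the Beta (4,2) mean and variance, then builds the two vectors of sums. The vector length is cut to 2,000 here to keep the run short; pass n_sums=20000 for the full setup.

```python
import random

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

def uniform_mean_var(lo, hi):
    """Mean and variance of a Uniform(lo, hi) distribution."""
    return (lo + hi) / 2, (hi - lo) ** 2 / 12

def simulate_sums(draw, n_terms=50, n_sums=20000, seed=0):
    """Return n_sums values, each the sum of n_terms draws from draw(rng)."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(n_terms)) for _ in range(n_sums)]

LO, HI = 0.358060, 0.975273  # Uniform parameters quoted in the post

print(beta_mean_var(4, 2))       # theoretical Beta(4,2) mean and variance
print(uniform_mean_var(LO, HI))  # should agree to about six decimals

# Shortened run (2,000 sums instead of 20,000) so the sketch finishes quickly.
rand_beta = simulate_sums(lambda rng: rng.betavariate(4, 2), n_sums=2000)
rand_unif = simulate_sums(lambda rng: rng.uniform(LO, HI), n_sums=2000)
```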


Selecting between two choices is not much reverse engineering.

The question is whether a sample of 20,000 is large enough to
detect the differences in distributions based on sampling
the averages of 50 uniforms vs. 50 beta(4,2). Testing would depend
on moments of higher order than the first and second.
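
As a rough illustration of the higher-moment idea: the skewness of a sum of n independent draws is the skewness of one draw divided by sqrt(n), and the Uniform has zero skewness while the Beta (4,2) does not. The sketch below computes the theoretical skewness of a sum of 50 Beta (4,2) variates and compares it with the approximate standard error of a sample skewness, sqrt(6/N), which is a large-sample normal approximation and an assumption here. The function names are illustrative only.

```python
import math

def beta_skewness(a, b):
    """Skewness of a single Beta(a, b) variate."""
    return 2 * (b - a) * math.sqrt(a + b + 1) / ((a + b + 2) * math.sqrt(a * b))

def sample_skewness(xs):
    """Moment-based sample skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

n_terms, n_sums = 50, 20000

# Skewness shrinks by sqrt(n_terms) when n_terms iid variates are summed.
skew_sum_beta = beta_skewness(4, 2) / math.sqrt(n_terms)

# Approximate standard error of a sample skewness under near-normality.
se_skew = math.sqrt(6 / n_sums)

print(skew_sum_beta, se_skew, skew_sum_beta / se_skew)
```

Under these assumptions the Beta sums have a skewness of roughly -0.066 against a standard error of about 0.017, so applying sample_skewness to each vector and picking the one with the clearly negative value is at least a plausible test. How reliable it is in practice would need checking by repeated simulation.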

<<snip>>

It is not clear that moments would be useful. In this context the ranges of
possible values for the two averages are (0,1) and (0.358060,0.975273),
so that as soon as a value of the average outside the range
(0.358060,0.975273) occurs you know that the original distribution must have
been the Beta (4,2). Of course the probability of such an outcome from the
Beta (4,2) average distribution might be too small for this to have much
chance of happening within 20000 samples, but it perhaps indicates the way
to go, which to me seems to be to look at the tail behaviour.
For the uniform case, there are certainly analytical expressions for the
distribution function of the average. There may not be a corresponding
analytical expression for the beta distribution, but there are possibilities
of finding the distribution function numerically. If neither of these
appeals, there are still possibilities of proceeding if the OP is prepared to
generate samples known to be from one or other of the two sources. A
suggestion would be to construct appropriate log-survivor plots for the two
tails and to see how the sample versions of these compare to either the
known distributions (if possible), or to repeated samplings from the two
sources. The repeated-sampling approach would at least give an idea of how
much separation of the cases there can be in a sample of 20000 values.
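
One possible way to get the points for such plots, as a sketch only: the empirical log-survivor function for the upper tail and the empirical log-distribution function for the lower tail. The plotting positions used here, (n - i)/n and (i + 1)/n with i counted from zero, are one convention among several and are an assumption, as are the function names.

```python
import math

def log_survivor_points(xs):
    """(x, log P(X >= x)) pairs from the sample, for the upper tail."""
    ordered = sorted(xs)
    n = len(ordered)
    # The i-th smallest value (i from 0) has n - i sample values at or above it.
    return [(x, math.log((n - i) / n)) for i, x in enumerate(ordered)]

def log_cdf_points(xs):
    """(x, log P(X <= x)) pairs from the sample, for the lower tail."""
    ordered = sorted(xs)
    n = len(ordered)
    # The i-th smallest value (i from 0) has i + 1 sample values at or below it.
    return [(x, math.log((i + 1) / n)) for i, x in enumerate(ordered)]

# Tiny illustration with made-up numbers.
print(log_survivor_points([3.0, 1.0, 2.0, 4.0]))
print(log_cdf_points([3.0, 1.0, 2.0, 4.0]))
```

For the repeated-sampling comparison, the same two functions could be applied to several freshly generated vectors from each known source and the resulting curves overlaid on those of the unlabelled vectors.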

David Jones



