Topic: A "plausible range" for a random variable
Replies: 9   Last Post: Jun 11, 2013 7:42 PM

 Richard Ulrich Posts: 2,961 Registered: 12/13/04
Re: A "plausible range" for a random variable
Posted: Jun 7, 2013 1:31 PM

On Fri, 7 Jun 2013 09:30:09 -0700 (PDT), Sergio Rossi
<deltaquattro@gmail.com> wrote:

>Hi,
>
>I run a Monte Carlo simulation of a black-box code, i.e., I assign probability distributions to the code inputs and obtain a Monte Carlo sample of the output variable Y. Y doesn't have to be Gaussian, because the input distributions aren't necessarily Gaussian, and even if they were, the output depends nonlinearly on the inputs.
>
>My bosses asked me to give them a "plausible range" for the variable Y. Trying to rephrase this question in a statistical framework, I thought about finding a lower bound L and an upper bound U for Y such that p(L<=Y<=U) is equal to, say, 95%. In practice, that's percentile estimation. For example, if I were to set L=-inf, then U would be precisely the 95th percentile of the distribution of Y, so the problem would become estimating the 95th percentile of Y.
>Questions:
>1. Is there a preferred way to select L and U? I don't think so, since I don't know what the distribution of Y is. So I was thinking of just selecting two percentiles "symmetrical about the median", such that p(L<=Y<=U) = alpha. For example, if alpha = .95, I just choose L as the 2.5th percentile and U as the 97.5th percentile.
>2. How do I estimate L and U? I know I could just load my samples into R and use the bootstrap. However, I'd prefer to also have an analytical formula, for a variety of reasons. I have fairly large samples (usually N ~= 2000), so I guess there should be some expression for the confidence intervals of percentiles based on the CLT. Can you post it?
>
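The bootstrap route described in the question can be sketched as follows. This is only an illustration: the sample y here is a made-up stand-in for the Monte Carlo output, and the percentile-bootstrap CI is one of several bootstrap CI variants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the Monte Carlo output sample Y (illustrative only:
# a nonlinear function of a non-Gaussian input).
x = rng.exponential(scale=2.0, size=2000)
y = np.log1p(x) ** 2

# Point estimates of the "plausible range": percentiles symmetric
# about the median, here for alpha = 0.95.
lower, upper = np.percentile(y, [2.5, 97.5])

# Bootstrap: resample y with replacement, recompute each percentile,
# and take the 2.5th/97.5th percentiles of the bootstrap distribution
# as a 95% CI for each endpoint.
boot = rng.choice(y, size=(5000, y.size), replace=True)
boot_lower = np.percentile(boot, 2.5, axis=1)
boot_upper = np.percentile(boot, 97.5, axis=1)
ci_lower = np.percentile(boot_lower, [2.5, 97.5])  # 95% CI for L
ci_upper = np.percentile(boot_upper, [2.5, 97.5])  # 95% CI for U
```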

1)
Selecting lower and upper limits L and U, as you note, is
usually done symmetrically (when not one-tailed). That is done
mainly for lack of any good reason to outweigh the equal
emphasis on each end.

Statistical estimation theory occasionally looks at the
"narrowest" CI. That is *the* important characteristic
of one-tailed tests, determining UMP (Uniformly Most
Powerful). Because tails can be asymmetrical, no two-tailed
test is UMP.

Decision theory would suggest that you apply a loss-function
to determine what degree of asymmetry might apply -- I
was intrigued, long ago, by the suggestion that the "power" of
standard research might be improved by splitting the conventional
5% into 4% at the "expected" end and 1% at the other end,
for a gain in general power without losing the right to report
stronger effects in the opposite direction. I read that at least
30 years ago, so you can see that the idea never caught on.

2)
A parametric approach to L and U for extreme values is not
going to be at all efficient. What is used for estimation is what
your bootstrapping would converge to: the CI for L (or U)
based on rank order in the original sample.
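That rank-order interval can be computed without any distributional assumption on Y, because the number of observations below the true p-th quantile is Binomial(n, p). A sketch (scipy is assumed available; the function name is mine):

```python
from scipy.stats import binom

def percentile_rank_ci(n, p, conf=0.95):
    """Ranks (j, k) such that the interval of order statistics
    (Y_(j), Y_(k)) covers the p-th quantile with roughly `conf`
    probability.  The count of observations below the true quantile
    is Binomial(n, p), so we invert its CDF at each tail."""
    alpha = 1 - conf
    j = int(binom.ppf(alpha / 2, n, p))          # lower rank
    k = int(binom.ppf(1 - alpha / 2, n, p)) + 1  # upper rank
    return j, k

# For N = 2000 and the 2.5th percentile (point estimate: rank 50):
print(percentile_rank_ci(2000, 0.025))
```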

A Poisson consideration gives a good approximation for small
proportions. Applied to your N = 2000 and 2.5%, it works as follows.

Rank 50 is the point estimate of L (0.025 x 2000 = 50). The +/- 2 SD
range for a Poisson count k can be estimated as
( (sqrt(k) - 1)^2, (sqrt(k) + 1)^2 ).

The square root of 50 is about 7.07; (7.07 - 1)^2 is about 37, and
(7.07 + 1)^2 is about 65. So, approximately, the CI for L runs
from the 37th to the 65th order statistic of the sample.
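The quick rule above can be checked numerically. The comparison against k +/- 2*sqrt(k) is my addition, to show the two normal-style approximations nearly agree:

```python
import math

k = 50  # expected rank of the 2.5th percentile when N = 2000

# The quick rule: +/- 2 SD for a Poisson count k is approximately
# ((sqrt(k) - 1)^2, (sqrt(k) + 1)^2), since sqrt of a Poisson count
# has standard deviation close to 1/2.
low = (math.sqrt(k) - 1) ** 2
high = (math.sqrt(k) + 1) ** 2
print(round(low), round(high))  # -> 37 65

# Compare with the direct normal approximation k +/- 2*sqrt(k):
print(round(k - 2 * math.sqrt(k)), round(k + 2 * math.sqrt(k)))  # -> 36 64
```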

--
Rich Ulrich