Re: A "plausible range" for a random variable
Posted: Jun 7, 2013 1:31 PM
On Fri, 7 Jun 2013 09:30:09 -0700 (PDT), Sergio Rossi <deltaquattro@gmail.com> wrote:
>Hi,
>
>I run a Monte Carlo simulation of a black box code, i.e., I assign probability distributions to the code inputs and I obtain a Monte Carlo sample of the output variable Y. Y doesn't have to be Gaussian, because the input distributions aren't necessarily Gaussian, and even if they were, the output depends nonlinearly on the inputs.
>
>My bosses asked me to give them a "plausible range" for the variable Y. Trying to rephrase this question in a statistical framework, I thought about finding a lower bound L and an upper bound U for Y such that p(L<=Y<=U) equals, say, 95%. In practice, that's percentile estimation. For example, if I were to set L=-inf, then U would be precisely the 95th percentile of the distribution of Y, so the problem would become estimating the 95th percentile of Y.
>
>Questions:
>1. Is there a preferred way to select L and U? I don't think so, since I don't know the distribution of Y. So I was thinking of just selecting two percentiles "symmetrical about the median", such that p(L<=Y<=U) = alpha. For example, if alpha = .95, I just choose L as the 2.5th percentile and U as the 97.5th percentile.
>2. How do I estimate L and U? I know I could just load my samples into R and use the bootstrap. However, I'd prefer to also have an analytical formula, for a variety of reasons. I have fairly large samples (usually N ~= 2000), so I guess there should be some expression for the confidence intervals of percentiles, based on the CLT. Can you post them?
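Before getting to your questions: the bootstrap you mention is only a few lines of R. A minimal sketch, assuming your Monte Carlo output sample sits in a vector y (faked here with rnorm) and using B resamples; the names are placeholders, not anything canonical:

  set.seed(1)
  y <- rnorm(2000)            # stand-in for your actual Monte Carlo sample of Y
  probs <- c(0.025, 0.975)    # symmetric 95% "plausible range"
  B <- 2000                   # number of bootstrap resamples

  point_est <- quantile(y, probs)                 # point estimates of L and U
  boot_q <- replicate(B, quantile(sample(y, replace = TRUE), probs))
  ci_L <- quantile(boot_q[1, ], c(0.025, 0.975))  # bootstrap CI for L
  ci_U <- quantile(boot_q[2, ], c(0.025, 0.975))  # bootstrap CI for U

Now, to your questions.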
1) Selecting the lower and upper limits, L and U, is usually done symmetrically (when the interval is not one-tailed), as you noticed. That is done mainly for lack of any good reason to outweigh the equal emphasis on each end.
Statistical estimation theory occasionally looks at the "narrowest" CI. That is *the* important characteristic for one-tailed tests, where it determines the UMP (Uniformly Most Powerful) test. Because the tails can be asymmetrical, no two-tailed test is UMP.
Decision theory would suggest applying a loss function to determine what degree of asymmetry is appropriate. I was intrigued, long ago, by the suggestion that the "power" of standard research might be improved by splitting the conventional 5% into 4% at the "expected" end and 1% at the other end -- a gain in general power without giving up the right to report stronger effects in the opposite direction. I read that at least 30 years ago, so you can see that the idea never caught on.
2) A parametric approach to L and U for extreme values is not going to be at all efficient. What is used for estimation is what your bootstrapping would converge to: a CI for L (or U) based on rank order in the original sample.
The Poisson distribution gives a good approximation for counts of small proportions. Applied to your case, N=2000 and 2.5%, it goes as follows.
Rank 50 (that is, 2.5% of 2000) is the point estimate of L. For a Poisson count k, the approximate +/- 2 SD range is ( (sqrt(k) - 1)^2, (sqrt(k) + 1)^2 ).
The square root of 50 is about 7.07; squaring 6.07 gives about 37, and squaring 8.07 gives about 65. So the approximate CI on the rank is (37, 65): the 37th and 65th smallest values in your sample bracket L.
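In R, that whole recipe is just order statistics. A minimal sketch, with variable names of my own choosing and y again standing in for your sample:

  n <- 2000
  y <- rnorm(n)                       # stand-in for your Monte Carlo sample, as above
  p <- 0.025
  k <- round(n * p)                   # rank 50; the 50th smallest value estimates L

  lo_rank <- round((sqrt(k) - 1)^2)   # about 37
  hi_rank <- round((sqrt(k) + 1)^2)   # about 65

  y_sorted <- sort(y)
  L_hat <- y_sorted[k]                     # point estimate of the 2.5th percentile
  L_ci  <- y_sorted[c(lo_rank, hi_rank)]   # approximate CI for L
  # U is handled the same way from the top of the sample, i.e. at rank n - k + 1.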
-- Rich Ulrich