
Re: A "plausible range" for a random variable
Posted: Jun 7, 2013 1:31 PM


On Fri, 7 Jun 2013 09:30:09 -0700 (PDT), Sergio Rossi <deltaquattro@gmail.com> wrote:
>Hi,
>
>I run a Monte Carlo simulation of a black-box code, i.e., I assign probability distributions to the code inputs and I obtain a Monte Carlo sample of the output variable Y. Y doesn't have to be Gaussian, because the input distributions aren't necessarily Gaussian, and even if they were, the output depends nonlinearly on the inputs.
>
>My bosses asked me to give them a "plausible range" for the variable Y. Trying to rephrase this question in a statistical framework, I thought about finding a lower bound L and an upper bound U for Y, such that p(L<=Y<=U) equals, say, 95%. In practice, that's percentile estimation. For example, if I were to set L=-inf, then U would be precisely the 95th percentile of the distribution of Y, so the problem would become to estimate the 95th percentile of Y.
>
>Questions:
>1. Is there a preferred way to select L and U? I don't think so, since I don't know the distribution of Y. So I was thinking to just select two percentiles "symmetrical about the median", such that p(L<=Y<=U) = alpha. For example, if alpha = .95, I just choose L as the 2.5th percentile and U as the 97.5th percentile.
>2. How do I estimate L and U? I know I could just load my samples into R and use the bootstrap. However, I'd prefer to also have an analytical formula, for a variety of reasons. I have fairly large samples (usually N ~= 2000), so I guess that there should be some expression for the confidence intervals of percentiles, based on the CLT. Can you post them?
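[An illustrative sketch, not part of the original exchange: the bootstrap approach mentioned in question 2 can be done in a few lines of Python with numpy. The gamma sample here is only a stand-in for the Monte Carlo output Y.]

```python
# Sketch: percentile-bootstrap CI for the 2.5th and 97.5th percentiles
# of a Monte Carlo sample. Assumes numpy; the gamma data is illustrative.
import numpy as np

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.0, scale=1.0, size=2000)  # stand-in for the MC output Y

def bootstrap_ci(data, q, n_boot=2000, level=0.95):
    """Percentile-bootstrap CI for the q-th quantile of `data`."""
    n = len(data)
    idx = rng.integers(0, n, size=(n_boot, n))     # resample indices with replacement
    stats = np.quantile(data[idx], q, axis=1)      # quantile of each resample
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi

print("L (2.5th pct):", np.quantile(y, 0.025), bootstrap_ci(y, 0.025))
print("U (97.5th pct):", np.quantile(y, 0.975), bootstrap_ci(y, 0.975))
```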
1) Selecting lower and upper limits, L and U, as you notice, is usually done symmetrically (if not one-tailed). That is done mainly for lack of any other good reason to outweigh the equal emphasis on each end.
Statistical estimation theory occasionally looks at the "narrowest" CI. That is *the* important characteristic of one-tailed tests, determining UMP (Uniformly Most Powerful). Because tails can be asymmetrical, no two-tailed test is UMP.
Decision theory would suggest that you apply a loss function to determine what degree of asymmetry might apply. I was intrigued, long ago, by the suggestion that the "power" of standard research might be improved by splitting the conventional 5% into 4% at the "expected" end and 1% at the other end, for a gain in general power without losing the right to report stronger effects in the opposite direction. I read that at least 30 years ago, so you can see that the idea never caught on.
2) A parametric approach to L and U for extreme values is not going to be at all efficient. What is used for estimation is what your bootstrapping would converge to: the CI for L (or U) based on rank order in the original sample.
A Poisson consideration gives a good approximation for small proportions. Applied to your case, N=2000 and 2.5%, it goes as follows.
Rank 50 is the point estimate of L. The +/- 2 SD range for a Poisson count can be estimated as ( (Sqrt(L) - 1)^2, (Sqrt(L) + 1)^2 ).
The square root of 50 is about 7.07; squaring 6.07 and 8.07 gives, approximately, (37, 65) as the CI of ranks for L.
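[A quick check of that square-root arithmetic, not part of the original post. The rule works because the SD of Sqrt(Poisson count) is roughly 1/2, so +/- 2 SD on the square-root scale is +/- 1.]

```python
# Check the (sqrt(L) - 1)^2, (sqrt(L) + 1)^2 range for L = 50.
import math

L = 50
lo = (math.sqrt(L) - 1) ** 2
hi = (math.sqrt(L) + 1) ** 2
print(round(lo), round(hi))  # -> 37 65
```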
--
Rich Ulrich

