On Sat, 25 May 2013 12:43:38 +0200, Cristiano <cristiapi@NSgmail.com> wrote:
>On 25/05/2013 5:50, Rich Ulrich wrote:
>>> Yes, I do that, but to be more precise:
>>> 1) Draw 7; compute skewness;
>>> 2) if skewness < 0 discard the value, else save.
>>
>> Depending on what you mean by "discard,"
>
>Uh? What I mean? Discard is discard; I mean discard.
>You can take a look here:
>http://www.thefreedictionary.com/discard
>"To throw away; reject."
>
>> this might introduce some unknown bias. Do you keep the count?
>> There will never be *exactly* 50% of the sample with
>> skewness less than 0.
>
>Sure, but where's the problem?
Do you count the discarded trials? "Throw away; reject" implies that you would collect 100k values that are all positive and treat them as the full sample, which is clearly wrong. If you adapted by sampling until you had 50k positive values, you will be off by however far the actual fraction of positives is from 50%.
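Here is a minimal Python sketch of the counting point, using only the standard library. The function names, seed, N=25 and 20,000 trials are illustrative assumptions, not Cristiano's actual code. Negative values are discarded as in step 2, but the loop still counts every trial, and the upper 5% cutoff is ranked against that total. The "naive" value ranks against twice the number kept, which is what treating the kept values as exactly half of the trials amounts to.

```python
import random


def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5, moments taken about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def upper_cutoffs(n, reps, tail=0.05, seed=1):
    """Return (counted, naive, n_kept) upper-tail cutoffs for sample skewness.

    Negative values are discarded, but every trial is still counted.
    counted: k-th largest kept value, with k = tail * reps (all trials).
    naive:   same idea, but k = tail * 2 * n_kept, i.e. it assumes the
             kept values are exactly half of the trials.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(reps):
        g = skewness([rng.gauss(0.0, 1.0) for _ in range(n)])
        if g >= 0:              # discard negatives; the loop still counts them
            kept.append(g)
    kept.sort(reverse=True)     # largest first
    k_counted = round(tail * reps)
    k_naive = round(tail * 2 * len(kept))
    return kept[k_counted - 1], kept[k_naive - 1], len(kept)


counted, naive, n_kept = upper_cutoffs(n=25, reps=20000)
print(f"kept {n_kept} of 20000; counted {counted:.4f}, naive {naive:.4f}")
```

The two ranks agree only when the number kept is (almost) exactly half of the trials. Otherwise the naive cutoff sits at a shifted rank, which is the error described above.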
>
>>> The reason to discard skewness < 0 is that I need to calculate only a
>>> critical value for the skewness (the distribution must be exactly
>>> symmetrical); if I get 5th percentile = -0.123 and 95th percentile =
>>> .124, which critical value should I take?
Technically, you were talking about a two-tailed test, and assuming (very rationally) that it has symmetrical tails. The correct two-tailed limit is then the absolute value that rejects a total of 10% of the trials. For any particular randomization, more of those rejections will come from one end than from the other. Okay. You get an improved answer by using both ends together, instead of using either end (or both) separately.
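This Python sketch compares the pooled two-tailed cutoff with separate tails. It assumes normal samples of size N=25, 20,000 trials, and the 10% two-tailed level implied by the 5th/95th percentile example. Function names and parameter values are illustrative. `statistics.quantiles` with `n=100` returns the 1st through 99th percentiles, so alpha and alpha/2 are assumed to be whole percentages.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5, moments taken about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def tail_cutoffs(n, reps, alpha=0.10, seed=2):
    """Two-tailed cutoffs for sample skewness at total level alpha.

    pooled:       |g| value exceeded by alpha of ALL trials (both ends together)
    lower, upper: alpha/2 and 1 - alpha/2 quantiles of g, taken separately
    alpha and alpha/2 are assumed to be whole percentages (0.10 -> 10%, 5%).
    """
    rng = random.Random(seed)
    gs = [skewness([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(reps)]
    # statistics.quantiles(..., n=100) returns the 1st..99th percentiles,
    # so the p-th percentile is at index p - 1.
    pct_abs = statistics.quantiles([abs(g) for g in gs], n=100)
    pct = statistics.quantiles(gs, n=100)
    pooled = pct_abs[round((1 - alpha) * 100) - 1]
    lower = pct[round(alpha / 2 * 100) - 1]
    upper = pct[round((1 - alpha / 2) * 100) - 1]
    return pooled, lower, upper


pooled, lower, upper = tail_cutoffs(n=25, reps=20000)
print(f"pooled |g| cutoff {pooled:.4f}; separate tails {lower:.4f} and {upper:.4f}")
```

The pooled value typically falls between the magnitudes of the two separate cutoffs. It uses every trial for the single number that a symmetric two-tailed test needs.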
>>
>> As you say, the distribution *ought* to be exactly symmetrical.
>>
>> The lower limit provides a second value based on 100,000
>> replications. (1) Why ignore it? (2) If there were some bias
>> in your RNG that these computations brought out, it would be
>> important to know it.
>
>The RNG I use doesn't have any bias.
I expect that that is (nearly) true. But I expect that a professional RNG creator or tester would never make that statement without some qualification, such as "that would be detected in an experiment like this one."
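One cheap cross-check of the kind that looking at both tails provides is sketched below in Python. It assumes a symmetric null distribution and simply counts how many trials give negative skewness. Under symmetry that count is Binomial(reps, 1/2), so a z-score far outside about plus or minus 3 would point to a problem in the generator or the code. It only detects gross asymmetry and is not a general RNG test. The names, seeds and trial counts are illustrative assumptions. The exponential draw is included only to show what a gross asymmetry looks like; it does not model any real RNG defect.

```python
import math
import random


def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5, moments taken about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def sign_check(n, reps, draw, seed=3):
    """Count trials with negative sample skewness and return (count, z).

    Under a symmetric null the count is Binomial(reps, 1/2), with mean
    reps / 2 and standard deviation sqrt(reps) / 2.
    """
    rng = random.Random(seed)
    negatives = sum(
        1 for _ in range(reps) if skewness([draw(rng) for _ in range(n)]) < 0
    )
    z = (negatives - reps / 2) / (math.sqrt(reps) / 2)
    return negatives, z


neg_norm, z_norm = sign_check(25, 10000, lambda r: r.gauss(0.0, 1.0))
# Deliberately skewed draws, only to show what a gross asymmetry looks like.
neg_exp, z_exp = sign_check(25, 5000, lambda r: r.expovariate(1.0))
print(f"normal: {neg_norm} negative, z = {z_norm:.2f}")
print(f"exponential: {neg_exp} negative, z = {z_exp:.2f}")
```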
>I checked that using properly designed tests and I check the simulation
>using a properly designed generator.
>
>> (3) When you compute 10 or 20 cut-offs,
>> you can compute a pragmatic standard error, to go along with
>> the theoretical one (based on ranks around the 5% cutoff).
>>
>> Back when computers were 1000 times slower than today, I was
>> reading some computer science literature. Cutting an eight-hour
>> monte carlo job in half would have been a worth-while benefit of
>> using both ends of the distribution, even without the cross-check
>> on validity (from actually *looking* at both values). No reader
>> would have complained.
>
>I don't have any problem in using both tails, but does it make any sense?
>We already know that the critical values for the 5th and 95th percentile
>*must* be exactly the same.
>For example, using both tails I get:
> 0.05 -.82306 +/- 2.75e-4
> 0.95 .82311 +/- 2.73e-4
>(+/- indicates the confidence interval)
>The p-value have to come from a 2-sided test; there should be only one
>critical value. Where's the sense in using -.82306 and .82311?
Here's a minor puzzle for me. Earlier, you were referring to the same two-tailed limits (I think) as being about 1.2, not 0.82. Oh, well.
Anyway. As someone else suggested, the easiest check readily available seems to be this: check what you get for N=25, since that is where the usual tables start.
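Below is a Python sketch of that N=25 check. It also computes the "pragmatic" standard error from the quoted text: split the replications into 20 batches, compute the pooled two-tailed cutoff in each batch, and use the spread of the 20 values. Several things are assumptions: the batch count, the 2,000 trials per batch, the 10% level, and the g1 definition (moments about the sample mean). Compare the result with whichever table you are checking against.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5, moments taken about the sample mean."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def batched_cutoff(n, batches=20, per_batch=2000, alpha=0.10, seed=4):
    """Pooled two-tailed |g| cutoff in each batch, then mean and batch SE."""
    rng = random.Random(seed)
    cutoffs = []
    for _ in range(batches):
        abs_g = sorted(
            abs(skewness([rng.gauss(0.0, 1.0) for _ in range(n)]))
            for _ in range(per_batch)
        )
        # value with alpha * per_batch trials above it in this batch
        cutoffs.append(abs_g[round((1 - alpha) * per_batch) - 1])
    estimate = statistics.mean(cutoffs)
    batch_se = statistics.stdev(cutoffs) / math.sqrt(batches)
    return estimate, batch_se


estimate, batch_se = batched_cutoff(n=25)
print(f"N=25, 10% two-tailed: |g| > {estimate:.4f} (batch SE {batch_se:.4f})")
```

The mean of the batch cutoffs is close to, but not identical with, the cutoff computed from all 40,000 trials at once. Its main use here is the empirical standard error, which can be set beside the rank-based theoretical one.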
>
>> That all being said -- I don't know why your results don't agree
>> with the page you cite. Before I looked at the page, I
>> wondered at potential differences in definitions of "skewness".
>
>If it's not too much trouble, you just need to click the links to see
>that the pages show also the formulas.
>
>> However, they seem very explicit in what is being computed.
>> You *would* get slightly different results if you don't compute
>> the moments around the observed means for each set (but
>> assumed zero).
>
>I calculated the above critical values using mean= 0, while when I
>calculate them using the sample mean I get:
> 0.05 -.81661 +/- 2.79e-4
> 0.95 .81637 +/- 2.74e-4
>There are significant differences, but the values are very far away from
>those tabulated in that site.
>How that can be possible?
>I'm not interested in using the values in the site, but I need to
>understand whether my simulation works fine.
>
>If someone can confirm that the following procedure is good, I can stop
>asking and I can start the simulation:
>
>1) Randomly draw N normally (or uniformly) distributed numbers
>2) compute the skewness (or the kurtosis)
>2a) [if skewness < 0 discard the value, else save]
>3) Repeat many times
>4) calculate the p-th percentile of the saved skewness (or kurtosis)
>5) Repeat until the confidence interval for the p-th percentile is "good".
>
>[I can calculate when "good" is good.]
>
>Step 2a: for the kurtosis I need 2 critical values, but for the skewness
>do I really need 2 critical values?
>
>Cristiano
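The quoted exchange turns on which definition of skewness a table uses and on where the moments are centered. A Python sketch that computes several common variants from the same simulated samples may help separate those effects. The variants are: g1 = m3/m2^(3/2) with moments about the sample mean; m3/s^3 with s using the n-1 divisor; the adjusted coefficient G1 = g1*sqrt(n(n-1))/(n-2); and g1 with moments about the known mean 0. Which of these the cited page uses is not stated in this thread, so matching any of them to its formulas is an assumption to check. Names, seed and trial count are illustrative.

```python
import math
import random
import statistics


def central_moments(xs, center):
    """Second and third moments about `center`, with divisor n."""
    n = len(xs)
    d = [x - center for x in xs]
    return sum(v * v for v in d) / n, sum(v ** 3 for v in d) / n


def skew_variants(xs):
    """Several common skewness definitions for one sample."""
    n = len(xs)
    m2, m3 = central_moments(xs, sum(xs) / n)  # about the sample mean
    g1 = m3 / m2 ** 1.5
    z2, z3 = central_moments(xs, 0.0)          # about the known mean 0
    return {
        "g1": g1,
        "m3/s^3": g1 * ((n - 1) / n) ** 1.5,           # s uses the n - 1 divisor
        "G1": g1 * math.sqrt(n * (n - 1)) / (n - 2),   # adjusted coefficient
        "g1 (mean 0)": z3 / z2 ** 1.5,
    }


def pooled_cutoffs(n, reps=20000, alpha=0.10, seed=5):
    """Pooled two-tailed |skew| cutoff for each variant, from the same samples."""
    rng = random.Random(seed)
    values = {}
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for name, v in skew_variants(sample).items():
            values.setdefault(name, []).append(abs(v))
    idx = round((1 - alpha) * 100) - 1  # alpha assumed to be a whole percent
    return {name: statistics.quantiles(v, n=100)[idx] for name, v in values.items()}


cut = pooled_cutoffs(25)
for name, c in cut.items():
    print(f"{name:12s} |skew| > {c:.4f}")
```

The first three variants are fixed multiples of one another, so their cutoffs differ by the same factors. For N=25, the G1 cutoff is larger than the g1 cutoff by sqrt(25*24)/23, about 1.065, and the m3/s^3 cutoff is smaller by (24/25)^1.5, about 0.941. Centering at 0 instead of the sample mean changes the statistic itself, so no fixed factor applies.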