Views expressed in these public forums are not endorsed by
Drexel University or The Math Forum.


Luis A. Afonso
Posts: 4,699
From: Lisbon (Portugal)
Registered: 2/16/05


Bootstrap and NHST
Posted: May 12, 2013 10:38 AM


Now that computers make fast data processing routine, computer-intensive methods based on random simulation, such as Permutation, Bootstrap and Jackknife, have gained importance relative to parametric procedures. The slow mechanical calculators of limited capacity used in Fisher's time, a hundred years ago, bear no comparison with modern computers, even ordinary PCs. Furthermore, because many procedures lack a Theoretical Probabilistic basis, people base their treatments on numerical procedures, sometimes only naively intuitive, that computers can readily carry out.
__0__Directional aiming
I intend to analyse the work of Bradley Efron and his coauthors, mainly the papers from the 1980s, in order to get an idea of how the classical methodology compares with this new standard, fully adopted since then.
__1__One-mean Bootstrap
A common, simple Decision problem occurs when a Distribution mean is to be bounded by a two-tailed interval with probability 1-alpha, using a normal sample of size n, namely:

xhat +/- T(n-1, alpha/2)*sqrt(ssd/(n*(n-1)))     (1)

where xhat = observed mean, ssd = sum of squared deviations about xhat, and T(n-1, alpha/2) = Student Distribution fractile leaving alpha/2 in the upper tail, n-1 df.

It is worth noting that, once the size is chosen, we are dealing with two sample variables, xhat and ssd, with different distributions, normal and chi-square. More precisely, the Distribution variance sigmasq is bounded by:

ssd/Chi0 <= sigmasq <= ssd/Chi1     (2)

where Chi0 = Chi(n-1, alpha/2) and Chi1 = Chi(n-1, 1-alpha/2), while xhat follows a Normal Distribution.
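As an illustration of interval (1), here is a minimal Python sketch. It takes the standard error as sqrt(ssd/(n*(n-1))), as the ORCHID program does, and reuses the critical value T(99, 0.025) = 1.9842 tabulated there for n = 100. The function name mean_interval and the fixed seed are assumptions made for this example only.

```python
import math
import random

def mean_interval(sample, t_crit):
    """Return the (left, right) bounds of interval (1) for the mean."""
    n = len(sample)
    xhat = sum(sample) / n                       # observed mean
    ssd = sum((x - xhat) ** 2 for x in sample)   # sum of squared deviations about xhat
    half_width = t_crit * math.sqrt(ssd / (n * (n - 1)))
    return xhat - half_width, xhat + half_width

# Hypothetical N(0,1) source sample of size 100; the seed is only for repeatability.
rng = random.Random(2013)
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]

# 1.9842 is the value the ORCHID listing stores for n = 100 (99 df, alpha = 0.05).
left, right = mean_interval(sample, 1.9842)
print(f"left = {left:.4f}   right = {right:.4f}")
```

The interval is centred on xhat by construction; only its half-width depends on ssd and on the tabulated fractile.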
In Parametric Statistics a two-tailed *symmetric* 1-alpha C.I. is such that there is only alpha/2 probability that the parameter's value lies below the left bound and alpha/2 that it lies above the right bound.

Bootstrap
Given an i.i.d. sample of size n, the source S, a bootstrap sample (B-sample) is obtained by drawing n items at random, with replacement, from S. Therefore a B-sample can show repeated items from the source, any one of them up to n times. It is claimed, and certain theoretical results tend to confirm, that asymptotically the full set of B-samples can be thought of as equal to the Population of samples obtained from the full Distribution when drawn one by one. Among the multitude of people whose professional work is Data Processing there is at least one, me, who truly hates the term asymptotic and avoids, if possible, using such techniques. They are irreplaceable for Deductive Purposes but ominous for practical finite data.
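The resampling step just described can be sketched in Python as follows. This is only an illustration: the function name bootstrap_sample, the toy source 1..10 and the seed are assumptions, chosen so that repeated items are easy to see.

```python
import random
from collections import Counter

def bootstrap_sample(source, rng):
    """Draw len(source) items from source at random, with replacement."""
    n = len(source)
    return [source[rng.randrange(n)] for _ in range(n)]

rng = random.Random(1)          # fixed seed, for repeatability only
source = list(range(1, 11))     # toy source S of size n = 10
b_sample = bootstrap_sample(source, rng)

# Count how often each source item was drawn into the B-sample.
counts = Counter(b_sample)
print("B-sample:", sorted(b_sample))
print("distinct items:", len(counts), "  largest repeat count:", max(counts.values()))
```

Because the draws are made with replacement, some source items usually appear several times in a B-sample while others do not appear at all.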
The first conclusion about the dispersion is that interval (1) contains a fraction of the Bootstrap means that goes from 0.954 to 0.951 as the size of the N(0,1) source samples goes from n=100 to n=400. The results are shown below: the fraction inside the interval, together with the fractions below the left bound and above the right one. For each size 5 sources were obtained and 400,000 Bootstrap samples synthesized. (Program <orchid>)
____________below____inside___above___
_n=400_____
____________0.0246___0.9509___0.0245__
____________0.0243___0.9511___0.0246__
____________0.0243___0.9512___0.0245__
____________0.0255___0.9506___0.0239__
____________0.0248___0.9500___0.0252__
_n=350_____
____________0.0238___0.9516___0.0246__
____________0.0252___0.9507___0.0242__
____________0.0247___0.9508___0.0246__
____________0.0242___0.9520___0.0238__
____________0.0239___0.9520___0.0242__
_n=300_____
____________0.0246___0.9515___0.0239__
____________0.0245___0.9522___0.0234__
____________0.0242___0.9514___0.0244__
____________0.0241___0.9525___0.0234__
____________0.0243___0.9508___0.0249__
_n=250_____
____________0.0241___0.9516___0.0243__
____________0.0250___0.9511___0.0239__
____________0.0245___0.9515___0.0240__
____________0.0251___0.9515___0.0234__
____________0.0250___0.9512___0.0238__
_n=200_____
____________0.0233___0.9518___0.0249__
____________0.0245___0.9522___0.0232__
____________0.0235___0.9524___0.0241__
____________0.0243___0.9516___0.0241__
____________0.0239___0.9518___0.0243__
_n=150_____
____________0.0243___0.9524___0.0233__
____________0.0247___0.9530___0.0223__
____________0.0243___0.9517___0.0240__
____________0.0234___0.9530___0.0235__
____________0.0240___0.9531___0.0229__
_n=100_____
____________0.0237___0.9541___0.0222__
____________0.0232___0.9535___0.0233__
____________0.0229___0.9539___0.0232__
____________0.0237___0.9539___0.0225__
____________0.0223___0.9539___0.0238__
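The experiment behind these figures can be sketched compactly in Python. This is a hedged re-expression of the ORCHID program, not a replacement for it: the function name bootstrap_fractions, the seed and the reduced count of 2,000 B-samples (instead of 400,000) are assumptions made to keep the run short, so its output will be noisier than the tabulated values.

```python
import math
import random

# Student critical values T(n-1, 0.025), copied from the ORCHID listing.
T_CRIT = {100: 1.9842, 150: 1.9760, 200: 1.9720, 250: 1.9695,
          300: 1.9679, 350: 1.9668, 400: 1.9659}

def bootstrap_fractions(n, n_boot, rng):
    """Fractions of Bootstrap means below, inside and above interval (1)."""
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]   # one N(0,1) source
    xhat = sum(source) / n
    ssd = sum((x - xhat) ** 2 for x in source)
    half_width = T_CRIT[n] * math.sqrt(ssd / (n * (n - 1)))
    left, right = xhat - half_width, xhat + half_width

    below = above = 0
    for _ in range(n_boot):
        # Mean of one B-sample: n draws with replacement from the source.
        m = sum(rng.choices(source, k=n)) / n
        if m < left:
            below += 1
        elif m > right:
            above += 1
    inside = n_boot - below - above   # ties with a bound count as inside
    return below / n_boot, inside / n_boot, above / n_boot

rng = random.Random(12345)            # fixed seed, for repeatability only
below, inside, above = bootstrap_fractions(100, 2000, rng)
print(f"n=100   {below:.4f}   {inside:.4f}   {above:.4f}")
```

Raising n_boot towards 400,000 and repeating over several sources should bring the three fractions closer to the pattern of the table, at the cost of a much longer run.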
The problem will not have been sufficiently treated until we compare the Dispersion of the Bootstrap mean values with that provided by Parametric methods.
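One possible starting point for that comparison is sketched here in Python, under stated assumptions. For a fixed source, the standard deviation of the Bootstrap mean is sqrt(ssd)/n, because each draw is taken with replacement from n fixed values whose variance is ssd/n. The parametric standard error used in (1) is sqrt(ssd/(n*(n-1))). The variable names, the seed and the count of 4,000 B-samples are illustrative choices only.

```python
import math
import random
import statistics

rng = random.Random(7)                 # fixed seed, for repeatability only
n, n_boot = 100, 4000

source = [rng.gauss(0.0, 1.0) for _ in range(n)]
xhat = sum(source) / n
ssd = sum((x - xhat) ** 2 for x in source)

# Parametric standard error of the mean, as used in interval (1).
parametric_se = math.sqrt(ssd / (n * (n - 1)))

# Simulated dispersion of the Bootstrap means.
boot_means = [sum(rng.choices(source, k=n)) / n for _ in range(n_boot)]
boot_sd = statistics.pstdev(boot_means)

# Dispersion of the Bootstrap mean for this source, without simulation.
exact_boot_sd = math.sqrt(ssd) / n

print(f"parametric s.e.      = {parametric_se:.5f}")
print(f"bootstrap s.d. (sim) = {boot_sd:.5f}")
print(f"bootstrap s.d.       = {exact_boot_sd:.5f}")
print(f"ratio                = {exact_boot_sd / parametric_se:.5f}")
```

The last ratio equals sqrt((n-1)/n), so under these assumptions the Bootstrap means are slightly less dispersed than the parametric standard error suggests, and the gap closes as n grows.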
Luis A. Afonso
REM "ORCHID"
CLS : COLOR 15
DEFDBL A-Z: PRINT " ORCHID n= 100(50)400 "
RANDOMIZE TIMER
REM
DIM X(400)
REM Student fractiles T(n-1, 0.025) for n = 100, 150, ..., 400
T(1) = 1.9842: T(2) = 1.976: T(3) = 1.972: T(4) = 1.9695
T(5) = 1.9679: T(6) = 1.9668: T(7) = 1.9659
REM ask again until n is one of 100, 150, ..., 400
5 INPUT " size = "; n
kii = 2 * n / 100 - 1
IF INT(kii) <> kii THEN GOTO 5
IF kii > 7 THEN GOTO 5
IF kii < 1 THEN GOTO 5
INPUT " how many= "; all
pi = 4 * ATN(1)
COLOR 7: PRINT : PRINT
REM N(0,1) source sample by Box-Muller; 1 - RND avoids LOG(0)
msource = 0: sssource = 0
FOR i = 1 TO n
  aa = SQR(-2 * LOG(1 - RND))
  X(i) = 0 + 1 * aa * COS(2 * pi * RND)
  msource = msource + X(i) / n
  sssource = sssource + X(i) * X(i)
NEXT i
REM bounds of interval (1)
ssd = sssource - n * msource * msource
st = SQR(ssd / (n * (n - 1)))
left = msource - st * T(kii)
right = msource + st * T(kii)
REM Bootstrap: resample the source with replacement
FOR rpt = 1 TO all
  m = 0
  FOR i = 1 TO n
    gg = INT(n * RND) + 1
    m = m + X(gg) / n
  NEXT i
  REM checking the Bmean
  IF m < left THEN lft = lft + 1
  IF m > right THEN rgt = rgt + 1
  IF m > left AND m < right THEN u = u + 1
  LOCATE 12, 43
  PRINT USING " ######### "; all - rpt
  LOCATE 14, 40
  PRINT USING "#.#### "; lft / rpt; u / rpt; rgt / rpt
NEXT rpt
PRINT USING "##.#### "; T(kii)
END



