Topic: Bootstrap and NHST
Replies: 0

Luis A. Afonso Posts: 4,758 From: Lisbon (Portugal) Registered: 2/16/05
Posted: May 12, 2013 10:38 AM

Bootstrap and NHST

Because fast computer processing of data is now routine, computer-intensive methods based on random simulation, such as Permutation tests, the Bootstrap, the Jackknife, etc., have gained importance relative to parametric procedures. The slow, limited mechanical calculators of Fisher's time, one hundred years ago, bear no comparison with modern computers, even PCs. Furthermore, the lack of a theoretical probabilistic basis for many procedures leads people to rely on numerical treatments, sometimes only naively intuitive, that computers are well able to carry out.

__0__Aim

I intend to analyse the work of Bradley Efron and his co-authors, mainly the papers from the 1980s, in order to get an idea of how the classical methodology compares with this new standard, fully adopted since then.

__1__One-mean Bootstrap
A common, simple decision problem arises when the mean of a Distribution is to be bounded by a two-tailed interval with probability 1 - alpha, using a normal sample of size n, namely:

xhat +/- T(n-1, alpha/2)*sqrt(ssd/(n*(n-1))) (1)
xhat = observed mean,
ssd = sum of squared deviations about xhat,
T(n-1, alpha/2) = Student Distribution alpha/2 fractile (quantile), n-1 df.
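
As an illustration only, interval (1) can be computed as in the short Python sketch below. It is not the author's code: the function name t_interval and the fixed seed are hypothetical, the standard error is taken as sqrt(ssd/(n*(n-1))) as in the ORCHID listing at the end of the post, and the critical value 1.9842 is the one that listing uses for n = 100.

```python
import math
import random

def t_interval(sample, t_crit):
    """Return the (left, right) bounds of interval (1) for the mean."""
    n = len(sample)
    xhat = sum(sample) / n                          # observed mean
    ssd = sum((x - xhat) ** 2 for x in sample)      # sum of squared deviations about xhat
    half = t_crit * math.sqrt(ssd / (n * (n - 1)))  # half-width, as in the ORCHID listing
    return xhat - half, xhat + half

rng = random.Random(1)                              # fixed seed, only for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(100)]  # N(0,1) sample, n = 100
left, right = t_interval(sample, 1.9842)            # 1.9842 = T(1) in the listing (99 df)
print(left, right)
```

The critical value is passed in as a number because the Python standard library has no Student quantile function; any table or statistics package could supply it.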

It is worth noting that, once the size is chosen, we are dealing with two sample statistics, xhat and ssd, with different distributions: xhat is normal and ssd/sigmasq is chi-square with n-1 df. More precisely, the Distribution variance sigmasq is bounded by:

ssd/Chi0 <= sigmasq<= ssd/Chi1 (2)
Chi0= Chi(n-1, alpha/2)
Chi1= Chi(n-1, 1-alpha/2),
while xhat follows a Normal Distribution.
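
For readers who want the intermediate step, bound (2) follows from the chi-square distribution of ssd/sigmasq. The LaTeX sketch below assumes that Chi(n-1, p) denotes the value exceeded with probability p (an upper-tail reading that the post does not state explicitly), so that Chi0 is the larger of the two values.

```latex
% Assumption: \chi^2_{n-1;\,p} is the point exceeded with probability p.
\frac{\mathrm{ssd}}{\sigma^2} \sim \chi^2_{n-1}
\;\Longrightarrow\;
P\!\left( \chi^2_{n-1;\,1-\alpha/2} \le \frac{\mathrm{ssd}}{\sigma^2} \le \chi^2_{n-1;\,\alpha/2} \right) = 1-\alpha
\;\Longrightarrow\;
\frac{\mathrm{ssd}}{\chi^2_{n-1;\,\alpha/2}} \le \sigma^2 \le \frac{\mathrm{ssd}}{\chi^2_{n-1;\,1-\alpha/2}}
```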

In Parametric Statistics a two-tailed *symmetric* 1-alpha C.I. is such that there is only probability alpha/2 that the parameter value lies below the left bound and alpha/2 that it lies above the right bound.

Bootstrap
Given an i.i.d. sample of size n, the source S, a bootstrap sample (Bsample) is obtained by drawing n items at random, with replacement, from the n items of S. Therefore any given source item can appear in a Bsample anywhere from 0 to n times. It is claimed, and certain theoretical results tend to confirm, that asymptotically the full set of Bsamples (the Bset) can be regarded as equivalent to the Population of samples obtained by drawing one by one from the full Distribution.
Among the multitude of people whose professional work is Data Processing there is at least one, me, who truly hates the term asymptotic and avoids, if possible, using such techniques. They are irreplaceable for Deductive Purposes but ominous for practical finite data.
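
The resampling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's program: the helper name bootstrap_sample, the seed, and the use of integer labels in place of real observations are all assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(source, rng):
    """Draw len(source) items from source at random, with replacement."""
    return rng.choices(source, k=len(source))

rng = random.Random(2)          # fixed seed, only for reproducibility
source = list(range(10))        # integer labels stand in for the n observations
bsample = bootstrap_sample(source, rng)
counts = Counter(bsample)       # how many times each source item was drawn
print(sorted(bsample))
print("distinct items:", len(counts), "largest multiplicity:", max(counts.values()))
```

Counting the multiplicities makes the point of the paragraph visible: some source items are repeated in a Bsample and others are left out.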

The first conclusion about the dispersion is that interval (1) contains a fraction of the Bootstrap means that goes from 0.954 to 0.951 as the size of the N(0,1) source samples goes from n=100 to 400. The results are shown below: each row gives, for one source, the fraction of Bootstrap means below the left bound, inside the interval, and above the right bound. For each size, 5 sources were obtained and 400,000 Bootstrap samples synthesized.
(Program <orchid>)

_n=400_____
____________0.0246___0.9509___0.0245__
____________0.0243___0.9511___0.0246__
____________0.0243___0.9512___0.0245__
____________0.0255___0.9506___0.0239__
____________0.0248___0.9500___0.0252__

_n=350_____
____________0.0238___0.9516___0.0246__
____________0.0252___0.9507___0.0242__
____________0.0247___0.9508___0.0246__
____________0.0242___0.9520___0.0238__
____________0.0239___0.9520___0.0242__

_n=300_____
____________0.0246___0.9515___0.0239__
____________0.0245___0.9522___0.0234__
____________0.0242___0.9514___0.0244__
____________0.0241___0.9525___0.0234__
____________0.0243___0.9508___0.0249__

_n=250_____
____________0.0241___0.9516___0.0243__
____________0.0250___0.9511___0.0239__
____________0.0245___0.9515___0.0240__
____________0.0251___0.9515___0.0234__
____________0.0250___0.9512___0.0238__

_n=200_____
____________0.0233___0.9518___0.0249__
____________0.0245___0.9522___0.0232__
____________0.0235___0.9524___0.0241__
____________0.0243___0.9516___0.0241__
____________0.0239___0.9518___0.0243__

_n=150_____
____________0.0243___0.9524___0.0233__
____________0.0247___0.9530___0.0223__
____________0.0243___0.9517___0.0240__
____________0.0234___0.9530___0.0235__
____________0.0240___0.9531___0.0229__

_n=100_____
____________0.0237___0.9541___0.0222__
____________0.0232___0.9535___0.0233__
____________0.0229___0.9539___0.0232__
____________0.0237___0.9539___0.0225__
____________0.0223___0.9539___0.0238__
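
The experiment tabulated above can be reproduced in outline with the Python sketch below. It follows the logic of the ORCHID listing at the end of the post but is not a line-by-line port: the function name bootstrap_fractions, the seed, and the small number of replications (2,000 instead of 400,000, to keep the run short) are assumptions of this example.

```python
import math
import random

def bootstrap_fractions(n, t_crit, n_boot, rng):
    """Fractions of bootstrap means below, inside and above interval (1)."""
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]   # one N(0,1) source sample
    xhat = sum(source) / n
    ssd = sum((x - xhat) ** 2 for x in source)
    half = t_crit * math.sqrt(ssd / (n * (n - 1)))
    left, right = xhat - half, xhat + half
    below = inside = above = 0
    for _ in range(n_boot):
        m = sum(rng.choices(source, k=n)) / n          # mean of one bootstrap sample
        if m < left:
            below += 1
        elif m > right:
            above += 1
        else:
            inside += 1
    return below / n_boot, inside / n_boot, above / n_boot

rng = random.Random(3)                                 # fixed seed, only for reproducibility
fractions = bootstrap_fractions(100, 1.9842, 2000, rng)
print("below, inside, above:", fractions)
```

With so few replications the three fractions are noisier than the tabulated ones; raising n_boot brings them closer at the cost of run time.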

The problem will not be sufficiently treated until we compare the dispersion of the Bootstrap mean values with the dispersion provided by Parametric methods.
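
As a rough check on the tabulated fractions, and only as a sketch under a stated assumption, note that for a fixed source the Bootstrap mean has mean xhat and variance ssd/n^2, because each resampled item has variance ssd/n. If the Bootstrap mean is further assumed to be approximately normal (an approximation, not an exact result), the fraction falling inside an interval of half-width T*sqrt(ssd/(n*(n-1))) is about 2*Phi(T*sqrt(n/(n-1))) - 1. The Python lines below evaluate this with the critical values from the ORCHID listing; the names T_CRIT and predicted_inside are hypothetical.

```python
import math
from statistics import NormalDist

# Critical values T(1)..T(7) copied from the ORCHID listing, keyed by sample size n.
T_CRIT = {100: 1.9842, 150: 1.976, 200: 1.972, 250: 1.9695,
          300: 1.9679, 350: 1.9668, 400: 1.9659}

def predicted_inside(n, t_crit):
    """Approximate fraction of bootstrap means inside the t interval (normal approximation)."""
    z = t_crit * math.sqrt(n / (n - 1))   # half-width in units of the bootstrap standard deviation
    return 2.0 * NormalDist().cdf(z) - 1.0

predicted = {n: predicted_inside(n, t) for n, t in T_CRIT.items()}
for n, p in predicted.items():
    print(n, round(p, 4))
```

Under this approximation the predicted fraction decreases slowly with n, from roughly 0.954 at n = 100 to roughly 0.951 at n = 400, which is the range reported above.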

Luis A. Afonso

REM "ORCHID"
CLS : COLOR 15
DEFDBL A-Z: PRINT " ORCHID n= 100(50)400 "
RANDOMIZE TIMER
REM
DIM X(400)
REM T(k) = two-sided 95% Student t critical value, df = n - 1, n = 50 * (k + 1)
T(1) = 1.9842: T(2) = 1.976: T(3) = 1.972: T(4) = 1.9695
T(5) = 1.9679: T(6) = 1.9668: T(7) = 1.9659
REM
5 INPUT " size = "; n
REM kii indexes T(): n = 100, 150, ..., 400 gives kii = 1, 2, ..., 7
kii = 2 * n / 100 - 1
IF INT(kii) <> kii THEN GOTO 5
IF kii > 7 THEN GOTO 5
IF kii < 1 THEN GOTO 5
INPUT " how many= "; all
pi = 4 * ATN(1)
COLOR 7: PRINT : PRINT
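REM source sample: n values from N(0,1) by the Box-Muller method
msource = 0: sssource = 0
FOR i = 1 TO n
REM 1 - RND lies in (0, 1], which avoids LOG(0)
aa = SQR(-2 * LOG(1 - RND))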
X(i) = 0 + 1 * aa * COS(2 * pi * RND)
msource = msource + X(i) / n
sssource = sssource + X(i) * X(i)
NEXT i
ssd = sssource - n * msource * msource
st = SQR(ssd / (n * (n - 1)))
left = msource - st * T(kii)
right = msource + st * T(kii)
REM bootstrap: resample n items with replacement, repeated "all" times
FOR rpt = 1 TO all
m = 0
FOR i = 1 TO n
gg = INT(n * RND) + 1
m = m + X(gg) / n
NEXT i
REM classify the Bmean against the bounds of interval (1)
IF m < left THEN lft = lft + 1
IF m > right THEN rgt = rgt + 1
IF m > left AND m < right THEN u = u + 1
LOCATE 12, 43
PRINT USING " ######### "; all - rpt
LOCATE 14, 40
PRINT USING "#.#### "; lft / rpt; u / rpt; rgt / rpt
REM
NEXT rpt
PRINT USING "##.#### "; T(kii)
END