Date: Dec 14, 2012 3:27 PM
Author: Ray Koopman
Subject: Re: Thanks for the terminological/methodological corrections, and also for the ref to gnuplot.

On Dec 14, 5:46 am, djh <> wrote:
> Thanks as always for your patience with my gobbledy-gook.
> I. Terminological matters:
> A. Yes - I meant the term ueSlope to denote the "coefficient of u*e"
> in c on (e,u,u*e,u^2).
> B. What I meant by this particular piece of gobbledy-gook:
> "From each computation of ueSlope on (ubar, ebar) we have a pair of
> slopes with a pair of associated probabilities ..."
> was:
> "From each execution of ueSlope on (ubar, ebar), we obtain a pair of
> coefficients for the ubar,ebar predictors and a pair of probabilities
> associated with these coefficients".

So there are two different regressions. First you regress c on
(e,u,e*u,u^2) and take the means of e and u and the coefficient of
e*u. You do that for each cell in the 5-way (Fold x Set x Subset x
Method x Length) table in which there are 15 or more observations.
Then you regress the coefficient on the two means, but over what set
of cases? For instance, do you do it over Length for each cell in a 4-
way (Fold x Set x Subset x Method) table?
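As a rough illustration of the first regression only, here is a minimal
NumPy sketch. The data layout (a dict keyed by cell), the function name
first_stage, and the inclusion of an intercept are assumptions made for
the example, not details taken from your setup.

```python
import numpy as np

MIN_OBS = 15  # cells with fewer observations are skipped


def first_stage(cells):
    """Regress c on (e, u, e*u, u^2) within each cell.

    `cells` is a hypothetical layout: a dict mapping a cell key such as
    (Fold, Set, Subset, Method, Length) to a tuple (c, e, u) of
    equal-length 1-D arrays. An intercept is assumed.
    Returns, per retained cell, the means of e and u and the e*u coefficient.
    """
    results = {}
    for key, (c, e, u) in cells.items():
        c, e, u = (np.asarray(a, dtype=float) for a in (c, e, u))
        if c.size < MIN_OBS:
            continue  # too few observations in this cell
        # Design matrix columns: intercept, e, u, e*u, u^2
        X = np.column_stack([np.ones_like(e), e, u, e * u, u ** 2])
        coef, *_ = np.linalg.lstsq(X, c, rcond=None)
        results[key] = {
            "ebar": e.mean(),
            "ubar": u.mean(),
            "ue_coef": coef[3],  # coefficient of e*u
        }
    return results
```

The second regression would then take ue_coef as the DV and (ubar, ebar)
as predictors over whatever set of cells answers the question above.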

> Further, I assumed that each of these probabilities IS generated by
> the protocol you gave in your post of 12/7 at 5:27:
> ************************
> Let
>
>      estimated coefficient of the extra predictor in the second model
> t = ------------------------------------------------------------------.
>        estimated standard error of that estimated coefficient
>
> Refer t to the t-distribution with df = n-k, where n = # of
> observations, and k = # of coefficients in the second model. (If the
> model has an intercept then k = # of predictors + 1.)
> ***********************
> But if this is not correct, please straighten me out here so that I
> know (for general purposes) how to calculate coefficient-associated
> probabilities.

The question is what to plug into the denominator of the expression
for t. Most regression programs use the SEP times the square root of
the appropriate diagonal element of the inverse of the X'X matrix.
That assumes that the second model is correct and that the cases are
a random sample. But if
a) you don't want to assume that the second model is correct, and
b) the cases are not a random sample, and
c) you have an estimate of the SE of the DV for each case, then
there is another formula for the denominator of the t.
Let A = inverse[X'X], and W = AX'. X is n by k, so W is k by n.
The vector of regression coefficients is Wy, where y is the n-vector
of observations of the DV. Let U be the matrix whose elements are the
squares of the elements of W, and let v be the n-vector containing
the squares of the SEs of the elements of y. Then Uv is a k-vector
containing the variances of the regression coefficients, and the
square roots of those values are the SEs of the coefficients.
The df of the SE of the j'th coefficient is

     (sum_i Uji vi)^2
-------------------------, where fi is the df of vi.
  sum_i (Uji vi)^2 / fi

I think all of conditions a, b, and c hold in your case.
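The matrix recipe above can be sketched in NumPy as follows. This is only
an illustration: the function name and argument names are hypothetical,
and the df line uses the usual Satterthwaite approximation for a linear
combination of independent variance estimates, which is an assumption
about the intended df formula.

```python
import numpy as np


def coef_se_from_case_variances(X, y, se_y, df_y):
    """Coefficients, SEs, and approximate df from per-case SEs of the DV.

    X    : n-by-k design matrix (include a column of ones for an intercept)
    y    : n-vector of observations of the DV
    se_y : n-vector of SEs of the elements of y
    df_y : n-vector of the df behind each squared SE (the fi)
    All names here are hypothetical, chosen for this sketch.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    v = np.asarray(se_y, dtype=float) ** 2   # v: squared SEs of y
    f = np.asarray(df_y, dtype=float)        # f: df of each v

    A = np.linalg.inv(X.T @ X)               # A = inverse[X'X], k by k
    W = A @ X.T                              # W = AX', k by n
    b = W @ y                                # regression coefficients Wy
    U = W ** 2                               # elementwise squares of W
    var_b = U @ v                            # Uv: variances of coefficients
    se_b = np.sqrt(var_b)                    # SEs of the coefficients

    # Satterthwaite-type df for each coefficient (assumed form):
    # (sum_i Uji vi)^2 / sum_i (Uji vi)^2 / fi
    df_b = var_b ** 2 / ((U * v) ** 2 / f).sum(axis=1)
    return b, se_b, df_b
```

Each t is then b[j] / se_b[j], referred to the t-distribution with
df_b[j] degrees of freedom instead of n-k.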

> II. Methodological matter:
> You wrote:
> "Regardless of the answers to my previous questions, you can't split
> naturally paired p's, sort them, re-pair them, and then compare the re-
> paired p's -- which you shouldn't compare in the first place, even
> without the shuffling, because p-values are NOT effect sizes."
> OK - thanks for explaining that.
> Since I want to concentrate right now on your request for linearity
> checks of Aubque_Se on L (and also the question of the nature of the
> plots which you raised in your post of 12/13 @ 1108pm), I will not
> pursue this matter further at the moment.

So far all the plots look pretty linear, with just a hint of
positive curvature, but it's hard to say because the SEs are themselves
heteroscedastic. But I do have one request: please explain the labels
AuqSE, AubuSE, AubquSE, AubeSE, AubqeSE. I know that A is an average,
and SE is a standard error, but the interior letters are a mystery.

> But after the checks are done, I would like to return to the question
> of the correct way to investigate whether there is a MoSS and/or
> Subset effect involving "ueSlope on (ubar,ebar)" (which I'll now refer
> to as ueCoeff on (ubar,ebar), as per your terminological correction
> noted in IA above.)

That would be easier to read if you said "the regression of ueSlope on
(ubar,ebar)" instead of just "ueSlope on (ubar,ebar)" or "ueCoeff on
(ubar,ebar)". In general, please include the phrase "the regression

> And if you can see right off that there is some necessary relationship
> between the regression ueCoeff on (ubar,ebar) and the regression
> Aubque on L, I'd be very curious to know what it is, inasmuch as I
> think the data regarding ueCoeff on (ubar,ebar) will show a MoSS and/
> or Subset effect when it is analyzed correctly.
> Thanks as always for your patience.