On Nov 19, 5:02 am, djh <halitsk...@att.net> wrote:
> Thanks as always for taking the time to construct that formalization.
>
> I have a question about your conclusion:
>
> "So all you're suggesting is to look at some linear transformations
> of the centroids of the points at each L. That can never tell you
> anything about the within-L relations or if/how they change with L."
>
> But I will cycle back and ask it after you've had a chance to review
> (and hopefully comment) on my posts of 11/18 at 3:30 pm (as emended by
> my post of 11/18 at 10:28pm) concerning a median-based approach to
> classifying data as uL or uH.
>
> What I fear is that after you've read these two posts, you'll tell me
> that one can't have uL and uH as parameters of the design and at the
> same time, classify observations as uL or uH based on medians
> calculated from the values of u in a given
> (set,method,subset,fold,LenInt), e.g. (1,N,S,a1,3) or (2,R,C,b47,12).
>
> So, whenever you've had a chance to read the two posts, I'd very much
> appreciate your comments on the (non-)legitimacy of the proposed
> median-based approach to uL and uH.
You're currently splitting u at a fixed value of 1, right? But you want to change the rules so that each cell has its own cutoffs, which would be the medians of the old Lo and Hi values in that cell, so that you would ignore half the data in each cell?
I don't like it. I've never liked doing the regressions separately for u < 1 and u >= 1 (or u <= 1 and u > 1 -- I forget which way u = 1 goes). If you think the regression function changes as a function of u, then you should model that dependence. The usual first step is to take the dependence to be linear, which would add u^2 to the regression of c on u, u & u*e to the regression of c on e, and u*e & u^2 to the multiple regression of c on e & u.
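Here is a minimal sketch of what "modeling the dependence" looks like for a single cell, assuming ordinary least squares via NumPy. The helper names `fit_ols` and `design`, and the simulated data, are hypothetical and only illustrate the idea: letting each coefficient vary linearly with u is what produces the extra u, u*e, and u^2 columns.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients and residual df for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X.shape[0] - X.shape[1]

def design(e, u, model):
    """Design matrix for one of the three regressions, augmented so that
    the coefficients can change linearly with u (hypothetical helper)."""
    one = np.ones_like(u)
    if model == "c~u":        # c on u, plus u^2
        cols = [one, u, u**2]
    elif model == "c~e":      # c on e, plus u and u*e
        cols = [one, e, u, u * e]
    elif model == "c~e+u":    # c on e & u, plus u*e and u^2
        cols = [one, e, u, u * e, u**2]
    else:
        raise ValueError(f"unknown model: {model}")
    return np.column_stack(cols)

# Simulated, noise-free cell of 40 observations in which the slope of c
# on e is 2 + 3u, i.e. it changes linearly with u.
rng = np.random.default_rng(0)
n = 40
e = rng.normal(size=n)
u = rng.uniform(0.5, 1.5, size=n)
c = 1.0 + (2.0 + 3.0 * u) * e + 0.5 * u

beta, df = fit_ols(design(e, u, "c~e"), c)
print(np.round(beta, 6), df)  # coefficients close to [1, 2, 0.5, 3]; 36 residual df
```

The u*e coefficient (3 here) estimates how fast the slope on e changes with u, using all the data in the cell at once instead of comparing two separately fitted halves.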
That would cost you only 1 or 2 df in each cell (1 for the regression of c on u, 2 for each of the other two). The current split created a new factor, u-level, by cutting each cell in two, which on average halved all the cell n's. The new rules would cut those n's in half again.
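To put rough numbers on that, here is a back-of-the-envelope sketch for the multiple regression of c on e & u in one hypothetical cell of 48 observations. It assumes the split at u = 1 yields two equal halves and that the median cutoffs leave a quarter of the cell in each remaining piece; real cells will differ.

```python
# Hypothetical cell size; regression of c on e & u (3 parameters),
# or 5 parameters once u*e and u^2 are added.
n = 48
schemes = {
    # name: (number of separate fits, observations per fit, parameters per fit)
    "one fit, add u*e and u^2": (1, n, 5),
    "split at u = 1 (assumed equal halves)": (2, n // 2, 3),
    "median cutoffs (half the data dropped)": (2, n // 4, 3),
}

resid_df = {}
for name, (fits, n_fit, p) in schemes.items():
    resid_df[name] = n_fit - p
    print(f"{name}: {fits} fit(s), n = {n_fit}, residual df = {n_fit - p} per fit")
```

Under these assumptions, each separate fit drops from 43 residual df to 21 with the current split, and to 9 with the median cutoffs.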
The only potential advantage I can see to splitting is that the analyses would be less dependent on the assumption that the error variance does not depend on the level of u.