On May 20, 4:59 pm, djh <halitsk...@att.net> wrote:
> You wrote:
>
> ********
> Start learning R. Jackknifing requires combining the results of n+1
> analyses, where n is the number of segments. Here's how it's done.
>
> Let b denote the vector of estimated coefficients (including the
> intercept) you get from the usual analysis. For i = 1,...,n, let
> b'i denote the vector of coefficients you get when you omit the
> comparisons involving segment i, and let b"i = n*b - (n-1)*b'i.
> For a 7-predictor model you would end up with an n x 8 matrix B".
> The trick is to treat B" as if its rows were independent estimates
> of the coefficients. The column means are reduced-bias estimates
> of the coefficients, and 1/n times the (inferential) covariance
> matrix is an empirical estimate of the covariance matrix of the
> estimated coefficients.
>
> I suppose you could do all that via the interface to John's program,
> but it would certainly be awkward.
> *******
>
> I understand what you mean about "awkward" - since my i's are large,
> they imply a manually un-doable number of runs into John's calculator.
>
> But let me first try to understand something more basic about what you
> wrote that I presently don't understand.
>
> The 7-predictor model right now is generally run on 32 cells,
> resulting in 32 (n0,n1) pairs for 32 different input sets
> S1,...Sk,...,S32, each with its own set of input segments that can be
> thought of as being labelled this way:
>
> s1v1,...,siv1,...,snv1 ("v" means subscript)
> ...
> s1vk,...,sivk,...,snvk
> ...
> s1v32,...,siv32,...,snv32
>
> If we were dealing with just one of these 32 sets of input segments, I
> would understand exactly what you mean about running nvk times,
> omitting in each run the comparisons for segment sivk from ivk=1 to
> ivk=nvk.
>
> But since we run the 7-predictor model on all 32 sets of segments
> simultaneously, I don't at all understand what you mean. In
> particular, it doesn't make any sense to me that you want me to run
> the model nvk times on just one cell Sk of the 32. How can the model
> generate coefficients for just one cell?
>
> When you described this originally, I thought that what was necessary
> was simply to get nvk (n0,n1) pairs for each cell 1...k...32, where
> each nvk pair results from omitting results involving a different sivk
> in s1vk,...,sivk,...,snvk.
>
> But now it seems that you meant something entirely different, and I'm
> hoping I've conveyed the source of my confusion well enough for you to
> help me through it.
>
> Very sorry to be so dense, Ray ... I know the demand it places on your
> time and patience ...
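For reference, the pseudo-value recipe quoted above can be sketched in a few lines. This is only an illustration in Python, not the actual 7-predictor analysis: the function name `jackknife`, the `fit` callback, and the toy data are all made up here, and the "model" is just a sample mean so the arithmetic is easy to check by hand.

```python
from statistics import mean

def jackknife(units, fit):
    """Pseudo-value jackknife over a list of units (e.g. segments).

    fit(list_of_units) is a hypothetical callback that returns a list
    of coefficients; it stands in for whatever analysis is really run.
    Returns (reduced-bias estimates, estimated covariance matrix).
    """
    n = len(units)
    b = fit(units)                                  # b: the usual analysis
    pseudo = []                                     # rows of B"
    for i in range(n):
        b_i = fit(units[:i] + units[i + 1:])        # b'i: unit i omitted
        pseudo.append([n * full - (n - 1) * loo     # b"i = n*b - (n-1)*b'i
                       for full, loo in zip(b, b_i)])
    p = len(b)
    est = [mean(row[j] for row in pseudo) for j in range(p)]  # column means
    # Sample covariance of the rows of B" (divisor n-1), times 1/n.
    cov = [[sum((row[j] - est[j]) * (row[k] - est[k]) for row in pseudo)
            / ((n - 1) * n)
            for k in range(p)]
           for j in range(p)]
    return est, cov

def fit_mean(xs):
    """Toy 'model' with a single coefficient: the sample mean."""
    return [mean(xs)]

data = [2.0, 4.0, 4.0, 6.0, 9.0]                    # made-up numbers
est, cov = jackknife(data, fit_mean)
```

With a sample mean as the "model", each pseudo-value is just the corresponding data point, so the column mean is the ordinary mean and the covariance is the usual squared standard error. That makes it a convenient check before plugging in a real fitting routine.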
Please correct any errors in the following. I need to be sure that I have it right before I try to explain the jackknife.
1. A segment is compared only to other segments in the same length interval, never to a segment in a different length interval. However, for reasons that don't matter at this point, not all possible pairs of segments in the same interval are compared.
2. Whether or not a pair matches depends on only that pair, regardless of what other comparisons are (or are not) done or the outcomes of those comparisons.
3. c,L,e,u are properties of individual segments, but x1 & x2 are properties of pairs. The values of c,L,e,u in the two driver regressions are the averages of (the logs of) the values of the segments in the pair.
4. The values of L used in the logistic regression depend on only the endpoints of the intervals, regardless of the actual values of L of the segments that are included in the analysis.
5. If a segment is omitted from the analysis then nothing outside that segment's length interval will change. In that segment's interval, all pairs involving that segment, and only pairs involving that segment, will be removed from the set of pairs that are compared, and no new pairs will be added to that set. Both of the driver regressions may change, which may change the (x1,x2) values of some pairs, which in turn may change the four w values and (n0,n1) pairs.
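To make item 5 concrete, here is a minimal Python sketch of what happens to the set of compared pairs when one segment is dropped. The segment names, interval labels, and pair list are invented for illustration, and the sketch covers only the bookkeeping of pairs; the refitting of the driver regressions and the recomputation of (x1,x2), the w values, and (n0,n1) are not shown.

```python
# Hypothetical toy data: segment id -> length-interval label.
interval_of = {"s1": "A", "s2": "A", "s3": "A", "s4": "B", "s5": "B"}

# Pairs actually compared; not every possible pair need appear (item 1).
compared = [("s1", "s2"), ("s2", "s3"), ("s1", "s3"), ("s4", "s5")]

def omit_segment(pairs, seg):
    """Remove every pair involving seg, and add no new pairs (item 5)."""
    return [pair for pair in pairs if seg not in pair]

def pairs_by_interval(pairs, interval_of):
    """Group pairs by length interval, checking item 1 along the way."""
    grouped = {}
    for a, b in pairs:
        # Item 1: both members of a pair lie in the same interval.
        assert interval_of[a] == interval_of[b]
        grouped.setdefault(interval_of[a], []).append((a, b))
    return grouped

before = pairs_by_interval(compared, interval_of)
after = pairs_by_interval(omit_segment(compared, "s2"), interval_of)
```

Here omitting `s2` removes two pairs from interval A and leaves interval B untouched, which is the behavior item 5 describes. Only interval A would then need to be rerun.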