dependent variable values missing not at random
Posted: May 1, 2009 3:56 AM


Hi,
I have a data set where the response is time to residential development for tax parcels (some parcels are never developed, so there is censoring) and the predictors are assorted variables measured on the landscape (e.g., topographic slope). I would like to use these data to conduct a survival analysis.

The problem is that some parcels (fewer than 5%) have no recorded year of residential development, so I cannot compute 'survival times' for them. The earlier a house was built, the less likely it is that anyone knows the date of building, so the missingness is probably nonrandom and correlated with date of building, and hence with survival time.

I am most interested in testing hypotheses about parameter estimates in this case. I've read some of the literature on missing data, but the case where dependent values are missing not at random does not seem to be well covered. What are some potential ways to handle this situation?

I thought of substituting in the values that would create the most bias and/or variance in the regression coefficients, in order to set a bound. If my nulls are rejected under these conditions, then I can be fairly certain (can I put a probability on it?) that the complete data set would also reject them. Any ideas?

Thanks,
Seth
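The bounding idea in the post can be sketched roughly as follows. This is only a minimal illustration under strong assumptions that are not in the original question: a constant-hazard (exponential) survival model, a single binary predictor `x` (say, steep versus flat slope), made-up records, and a known window `[t_min, t_max]` in which every missing development time must fall. All function and variable names are hypothetical. Under this toy model the log hazard ratio is monotone in each imputed time, so filling the missing times at the window's endpoints gives the extreme estimates.

```python
import math

def log_hazard_ratio(records):
    """Exponential-model log hazard ratio for group x=1 versus x=0.

    Each record is (time, event, x). The maximum-likelihood hazard for
    a group is (number of events) / (total observed time).
    """
    events = {0: 0, 1: 0}
    exposure = {0: 0.0, 1: 0.0}
    for time, event, x in records:
        events[x] += event
        exposure[x] += time
    rate0 = events[0] / exposure[0]
    rate1 = events[1] / exposure[1]
    return math.log(rate1 / rate0)

def bound_log_hazard_ratio(records, t_min, t_max):
    """Worst-case (lower, upper) bounds on the log hazard ratio.

    Records with time=None are developed parcels whose development
    time is unknown but assumed to lie in [t_min, t_max].
    """
    def impute(fill_group1, fill_group0):
        filled = []
        for time, event, x in records:
            if time is None:
                time = fill_group1 if x == 1 else fill_group0
            filled.append((time, event, x))
        return filled

    # Short times in group 1 and long times in group 0 push the ratio up.
    upper = log_hazard_ratio(impute(t_min, t_max))
    # The reverse assignment pushes the ratio down.
    lower = log_hazard_ratio(impute(t_max, t_min))
    return lower, upper

# Hypothetical data: (years to development, 1=developed / 0=censored, x).
records = [
    (5.0, 1, 1), (8.0, 1, 1), (None, 1, 1), (20.0, 0, 1),
    (12.0, 1, 0), (None, 1, 0), (20.0, 0, 0), (20.0, 0, 0),
]
lower, upper = bound_log_hazard_ratio(records, t_min=1.0, t_max=20.0)
print(f"log hazard ratio lies in [{lower:.3f}, {upper:.3f}]")
```

If the whole interval lies on one side of zero, the sign of the estimated effect does not depend on how the missing times are filled in. These are worst-case bounds on the point estimate only; they ignore sampling variability and so do not by themselves attach a probability to the conclusion. With a continuous predictor or a Cox model the extreme imputations are no longer simply the endpoints, so a search over candidate fills would be needed.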



