Views expressed in these public forums are not endorsed by
Drexel University or The Math Forum.



Missing data: density estimation and regression
Posted: Oct 1, 2012 7:46 PM


Hi everybody,
I have a dataset that I will use to build a binary classifier, and about 24% of the values of one numerical variable (Age) are missing. I'd like to compare some alternatives for filling in the null values. I'm considering the following:
1. Discard the records with missing data.
2. Fill the null values with the mean age (computed over all records that have an Age value).
3. I plotted a histogram of Age, and its shape looks similar to a chi-square distribution. I want to estimate the parameters of the chi-square distribution by maximum likelihood and then draw random values from it to fill the null values. (I think it would be difficult to build a regression model with a good fit, e.g. a linear regression predicting Age from the other variables.)
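The first two options are short enough to sketch directly. Below is a minimal Python sketch (an R version would look much the same); the toy records, the (age, label) layout and the function names `drop_missing` and `mean_impute` are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical toy records: each is (age, label); None marks a missing Age.
records = [(23.0, 1), (None, 0), (35.0, 1), (41.0, 0), (None, 1), (29.0, 0)]

def drop_missing(rows):
    """Option 1: complete-case analysis, keep only rows whose Age is observed."""
    return [r for r in rows if r[0] is not None]

def mean_impute(rows):
    """Option 2: replace each missing Age with the mean of the observed ages."""
    observed = [age for age, _ in rows if age is not None]
    fill = mean(observed)
    return [(fill if age is None else age, label) for age, label in rows]

complete = drop_missing(records)
imputed = mean_impute(records)
print(len(complete))   # 4 rows survive
print(imputed[1][0])   # 32.0, the mean of 23, 35, 41 and 29
```

Option 1 throws away about a quarter of the rows (and their labels), while option 2 keeps every row but gives all imputed records the same Age.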
My question is: how can I estimate the parameters of the chi-square distribution by maximum likelihood? (I've done something similar for other distributions.) I want to know the estimators.
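The chi-square distribution has a single parameter, the degrees of freedom k. For positive observations x_1, ..., x_n the log-likelihood is l(k) = (k/2 - 1) Σ log x_i - Σ x_i / 2 - n[(k/2) log 2 + log Γ(k/2)], and setting its derivative to zero gives ψ(k/2) = (1/n) Σ log x_i - log 2, where ψ is the digamma function. That equation has no closed-form solution, so the estimate has to be found numerically. The sketch below (Python, standard library only; the function name `chisq_mle` and the golden-section search are illustrative choices, not a specific package's method) maximizes l(k), which is concave in k.

```python
import math
import random

def chisq_mle(xs, tol=1e-8):
    """Maximum-likelihood estimate of the chi-square degrees of freedom k.

    The log-likelihood is concave in k, so a golden-section search on a
    bracket that contains the maximizer converges to the MLE.
    """
    if any(x <= 0 for x in xs):
        raise ValueError("chi-square data must be strictly positive")
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)  # sufficient statistic for k
    sum_x = sum(xs)

    def loglik(k):
        half = k / 2.0
        return ((half - 1.0) * sum_log - sum_x / 2.0
                - n * (half * math.log(2.0) + math.lgamma(half)))

    # Generous bracket: for positive data the maximizer lies inside it.
    a, b = 1e-6, 2.0 * max(xs) + 10.0
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = loglik(c), loglik(d)
    while b - a > tol:
        if fc > fd:   # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = loglik(c)
        else:         # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = loglik(d)
    return (a + b) / 2.0

# Usage on synthetic data: chi-square(30) is Gamma(shape=15, scale=2).
random.seed(42)
sample = [random.gammavariate(15.0, 2.0) for _ in range(5000)]
print(chisq_mle(sample))  # should be close to 30
```

One caveat: a chi-square(k) variable has mean k and variance 2k, so a pure chi-square fit ties the spread of Age to its mean. If that does not match the histogram, the gamma distribution (chi-square is the special case with scale 2) adds a free scale parameter.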
Could the third option be the worst of all, because filling in random "predicted" values increases the variance of Age? Do you have any suggestions?
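The variance question can be checked with a small simulation. This sketch assumes, hypothetically, that the degrees of freedom K = 30 are already known and that values are missing completely at random, with about 24% missing as in the post. It compares the variance of the Age column after option 2 (mean imputation) and after option 3 (random draws from the fitted chi-square).

```python
import random
from statistics import mean, pvariance

random.seed(0)
K = 30.0  # hypothetical fitted degrees of freedom
observed = [random.gammavariate(K / 2.0, 2.0) for _ in range(7600)]  # ~76% observed
n_missing = 2400                                                     # ~24% missing

# Option 2: every missing value gets the same number, the observed mean.
mean_filled = observed + [mean(observed)] * n_missing

# Option 3: each missing value is an independent draw from chi-square(K).
draw_filled = observed + [random.gammavariate(K / 2.0, 2.0) for _ in range(n_missing)]

print(pvariance(observed), pvariance(mean_filled), pvariance(draw_filled))
```

Adding copies of the mean leaves the sum of squared deviations unchanged while enlarging the count, so the mean-filled variance is 7600/10000 of the observed variance: option 2 shrinks the variance. Random draws from a well-fitting model keep it close to the observed variance instead of inflating it, although, being independent of the other variables, they carry no information about the class label for the imputed rows.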
I'm using R, but any pointers would be useful! Thanks in advance!
Hernán



