I have a dataset that I will use to build a binary classifier, and about 24% of the values of one numerical variable (Age) are missing. I'd like to compare some alternatives for filling in the null values. I'm considering the following:
1. Discard records with missing data
2. Fill null values with the mean of Age (computed over all the records that have an Age value)
3. I plotted a histogram of Age and its shape looks similar to a chi-square distribution. I want to estimate the parameters of a chi-square distribution by maximum likelihood, then draw random values from the fitted distribution to fill the null values
(I think it would be difficult to build a regression model with a good fit, such as a linear regression that predicts Age from the other variables.)
My question is: how can I estimate the parameters of the chi-square distribution? I've done something similar using maximum likelihood for other distributions, and I'd like to know the estimators.
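To make the question concrete, here is a minimal sketch of the numerical approach I have in mind. It is written in Python with only the standard library rather than R, just so it is easy to run anywhere. It assumes Age follows a plain chi-square distribution with a single parameter, the degrees of freedom k, and no scale parameter, which may well be too restrictive for real ages. The names `chi2_loglik` and `fit_chi2_df`, the search bounds, and the sample ages are all hypothetical and only for illustration. As far as I can tell there is no closed-form estimator: setting the derivative of the log-likelihood to zero gives digamma(k/2) = mean(log(x_i / 2)), which has to be solved numerically, so the sketch simply maximizes the log-likelihood directly.

```python
import math

def chi2_loglik(k, xs):
    """Log-likelihood of chi-square degrees of freedom k for positive data xs."""
    half_k = k / 2.0
    return sum(
        (half_k - 1.0) * math.log(x) - x / 2.0
        - half_k * math.log(2.0) - math.lgamma(half_k)
        for x in xs
    )

def fit_chi2_df(xs, lo=1e-3, hi=1000.0, tol=1e-8):
    """Maximize the log-likelihood over k by golden-section search.

    The log-likelihood is concave in k (lgamma is convex), so a
    bracketing search on [lo, hi] finds the maximum. The bounds lo
    and hi are arbitrary assumptions, not tuned to any dataset.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if chi2_loglik(c, xs) > chi2_loglik(d, xs):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

# Made-up observed ages (not my real data); all values must be positive.
observed_ages = [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 14.0, 4.0, 58.0]
k_hat = fit_chi2_df(observed_ages)
print(k_hat)
```

If the single-parameter chi-square is too rigid, the same search idea would carry over to a gamma distribution with both shape and scale free, but that is a different model from the one I describe above.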
Could option 3 be the worst of the three, because it increases the variance of Age through the 'predicted' values? Do you have any suggestions?
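One way I could check the variance concern is a small simulation like the one below, again a sketch in standard-library Python rather than R. It uses simulated data, not my dataset: `assumed_df`, the sample sizes, and the seed are hypothetical values chosen only for illustration. It compares the sample variance of Age after mean filling (option 2) and after filling with random draws (option 3). The one thing I am sure of is that mean filling shrinks the sample variance, since the filled values add no spread around the mean; how the random draws behave is what the printout is meant to show.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

assumed_df = 30.0  # hypothetical degrees of freedom, not estimated from real data
n_total = 1000
n_missing = 240    # roughly 24% missing, as in my dataset

def draw_chi2(k):
    """Chi-square(k) draw, using the fact that it is Gamma(shape=k/2, scale=2)."""
    return random.gammavariate(k / 2.0, 2.0)

# Simulated "observed" ages.
observed = [draw_chi2(assumed_df) for _ in range(n_total - n_missing)]

# Option 2: fill every missing value with the observed mean.
mean_filled = observed + [statistics.mean(observed)] * n_missing

# Option 3: fill missing values with random draws from the assumed distribution.
draw_filled = observed + [draw_chi2(assumed_df) for _ in range(n_missing)]

var_observed = statistics.variance(observed)
var_mean_filled = statistics.variance(mean_filled)
var_draw_filled = statistics.variance(draw_filled)
print(var_observed, var_mean_filled, var_draw_filled)
```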
I'm using R, but any pointer would be useful!