The Math Forum

Topic: Missing data: density estimation and regression
Posted: Oct 1, 2012 7:46 PM

Hi everybody,

I have a dataset that I will use to build a binary classifier, and about 24% of the values of one numerical variable, Age, are missing. I'd like to compare some alternatives for filling in the missing values. I'm considering the following:

1. Discard records with missing data

2. Fill the missing values with the mean Age (computed over all records where Age is present)

3. I plotted a histogram of Age and its shape looks similar to a chi-square distribution. I want to estimate the parameters of the chi-square distribution by maximum likelihood, then draw random values from the fitted distribution to fill the missing values

(I think it would be difficult to build a regression model with a good fit, such as a linear regression that predicts Age from the other variables.)

My question is: how can I estimate the chi-square parameters? (I've done something similar using maximum likelihood for other distributions.) I would like to know the estimators.
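
For the standard one-parameter chi-square with k degrees of freedom, the likelihood equation can be written out as below. This is only a sketch: it assumes the observed Age values x_1, ..., x_n are positive and are used unscaled (adding a scale parameter would turn this into a gamma fit instead).

```latex
\begin{align}
f(x; k) &= \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, \qquad x > 0 \\
\ell(k) &= \left(\frac{k}{2} - 1\right) \sum_{i=1}^{n} \ln x_i
           - \frac{1}{2} \sum_{i=1}^{n} x_i
           - \frac{n k}{2} \ln 2
           - n \ln \Gamma\!\left(\frac{k}{2}\right) \\
\frac{d\ell}{dk} &= \frac{1}{2} \sum_{i=1}^{n} \ln x_i
           - \frac{n}{2} \ln 2
           - \frac{n}{2}\, \psi\!\left(\frac{k}{2}\right) = 0 \\
\Rightarrow \quad \psi\!\left(\frac{\hat{k}}{2}\right) &= \frac{1}{n} \sum_{i=1}^{n} \ln \frac{x_i}{2}
\end{align}
```

Here \psi is the digamma function. The last equation has no closed-form solution, so \hat{k} has to be found numerically; since E[X] = k, the sample mean is a natural starting value (it is also the method-of-moments estimate).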

Could the third option be the worst of all, because it increases the variance of Age through the 'predicted' values? Do you have any suggestions?
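
One way to look at the variance question is to try options 2 and 3 side by side on synthetic data. The following is a minimal sketch in Python (standard library only) rather than R. The synthetic Age values, the true value of 30 degrees of freedom, the sample size, and the helper names `chi2_loglik` and `fit_chi2_df` are all hypothetical, chosen only for illustration. The fit uses a plain ternary search, which works here because the chi-square log-likelihood is concave in k.

```python
import math
import random
import statistics


def chi2_loglik(k, xs):
    """Log-likelihood of a chi-square(k) model for positive observations xs."""
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)
    return ((k / 2 - 1) * sum_log - sum(xs) / 2
            - n * (k / 2) * math.log(2) - n * math.lgamma(k / 2))


def fit_chi2_df(xs, iters=200):
    """Maximum-likelihood estimate of k by ternary search.

    The log-likelihood is concave in k, so ternary search converges to
    the maximum. The upper bound is a crude assumption, not a tight one.
    """
    lo, hi = 1e-6, max(10.0, 10.0 * max(xs))
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if chi2_loglik(m1, xs) < chi2_loglik(m2, xs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


random.seed(0)

# Hypothetical data: 2000 "ages" from chi-square(30), with 24% treated as missing.
# A chi-square(k) variable is a gamma with shape k/2 and scale 2.
true_k = 30
ages = [random.gammavariate(true_k / 2, 2) for _ in range(2000)]
n_missing = int(0.24 * len(ages))
observed = ages[n_missing:]

# Option 2: fill every missing value with the observed mean.
mean_filled = observed + [statistics.mean(observed)] * n_missing

# Option 3: fit k by maximum likelihood, then fill with random draws.
k_hat = fit_chi2_df(observed)
draw_filled = observed + [random.gammavariate(k_hat / 2, 2)
                          for _ in range(n_missing)]

var_observed = statistics.variance(observed)
var_mean = statistics.variance(mean_filled)
var_draw = statistics.variance(draw_filled)

print("estimated k:", k_hat)
print("variance, observed only:", var_observed)
print("variance, mean-filled:  ", var_mean)
print("variance, draw-filled:  ", var_draw)
```

On data like this, mean imputation shrinks the sample variance of Age, because the filled values add nothing to the sum of squared deviations while the sample size grows. Random draws from the fitted distribution keep the variance close to that of the observed values.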

I'm using the R environment, but any pointers would be useful!

Thanks in advance!

