

root
Posts: 57
Registered: 8/26/09


Re: Data Mining, Data Partitioning, Prediction and Inference
Posted: Mar 11, 2011 11:44 AM


slutsky_fan <matttbogard@gmail.com> wrote:
> Typically to deal with generalization error, when building forecasts
> or predictive models, data will be partitioned at least into training
> and validation data sets. What if I'm primarily just concerned with
> making inferences about certain variables in the model (evaluating
> directions of coefficients, odds ratios etc.)? Wouldn't I actually be
> better off NOT PARTITIONING the data and using the whole data set to
> get better coefficient estimates?
>
> In other words, If I've just developed a predictive model, and all I
> want from it are predicted probabilities, I should probably partition
> the data and validate my results. But then, when it comes to making
> actual inferences about the relationships between the variables I
> should probably rebuild the model on a complete nonpartitioned data
> set. Is this a good or bad methodology?
>
> Thanks.

In my experience the data are grouped into three classes:

1. those known to have some property
2. those known not to have that property
3. all the rest, for which it is not known whether they have the property

The "training" is done using groups 1 and 2, and the trained model is then applied to group 3.
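
To make the three-group workflow concrete, here is a minimal sketch in Python. The post does not say which model is used, so the nearest-centroid rule below is only an assumed stand-in, and the data values and the names train_centroids, predict, group1, group2 and group3 are all hypothetical illustrations.

```python
from statistics import mean

def train_centroids(positives, negatives):
    """Fit a nearest-centroid rule from the two labeled groups (assumed model)."""
    def centroid(rows):
        # Average each feature column across the rows of one group.
        return [mean(col) for col in zip(*rows)]
    return centroid(positives), centroid(negatives)

def predict(model, row):
    """Return True if row lies closer to the 'has property' centroid."""
    pos_c, neg_c = model
    d_pos = sum((x - c) ** 2 for x, c in zip(row, pos_c))
    d_neg = sum((x - c) ** 2 for x, c in zip(row, neg_c))
    return d_pos < d_neg

# Made-up illustrative data: two features per observation.
group1 = [[2.0, 3.0], [3.0, 3.5], [2.5, 4.0]]      # known to have the property
group2 = [[-1.0, 0.0], [0.0, -0.5], [-0.5, 0.5]]   # known not to have it
group3 = [[2.2, 3.1], [-0.3, 0.2]]                 # status unknown

# "Training" uses groups 1 and 2 only; the result is applied to group 3.
model = train_centroids(group1, group2)
labels = [predict(model, row) for row in group3]
print(labels)
```

Note that in this sketch every labeled observation goes into training, so nothing is held back to check how well the rule generalizes; that is the trade-off the original question raises.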



