slutsky_fan <email@example.com> wrote:
> Typically to deal with generalization error, when building forecasts
> or predictive models, data will be partitioned at least into training
> and validation data sets. What if I'm primarily just concerned with
> making inferences about certain variables in the model (evaluating
> directions of co-efficients, odds ratios etc.)? Wouldn't I actually be
> better off NOT PARTITIONING the data and using the whole data set to
> get better co-efficient estimates?
>
> In other words, If I've just developed a predictive model, and all I
> want from it are predicted probabilities, I should probably partition
> the data and validate my results. But then, when it comes to making
> actual inferences about the relationships between the variables I
> should probably re-build the model on a complete non-partitioned data
> set. Is this a good or bad methodology?
>
> Thanks.
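The two-stage approach described in the question can be sketched roughly as follows. This is only an illustration under stated assumptions: a single-predictor logistic regression fitted by Newton-Raphson in plain Python, on simulated data whose true intercept (-0.5) and slope (1.0) are made up for the example. `fit_logistic` and `predict` are hypothetical helpers, not library functions. The sketch fits on one half, checks the hit rate on the other half, then refits on all the data and compares the slope's standard error.

```python
import math
import random


def fit_logistic(xs, ys, iters=25):
    """Hypothetical helper: fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x)))
    by Newton-Raphson. Returns (b0, b1, se_b1), where se_b1 is the
    standard error of the slope from the inverse Fisher information."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the 2x2 Fisher information
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)  # sqrt of the (b1, b1) entry of I^{-1}
    return b0, b1, se_b1


def predict(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Simulated data (assumption): true intercept -0.5, true slope 1.0.
random.seed(0)
N = 2000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]
ys = [1 if random.random() < predict(-0.5, 1.0, x) else 0 for x in xs]

# Stage 1: fit on a training half, check predictions on the validation half.
split = N // 2
b0_tr, b1_tr, se_tr = fit_logistic(xs[:split], ys[:split])
hits = sum((predict(b0_tr, b1_tr, x) >= 0.5) == (y == 1)
           for x, y in zip(xs[split:], ys[split:]))
acc = hits / (N - split)

# Stage 2: refit on all the data for inference on the coefficient.
b0_all, b1_all, se_all = fit_logistic(xs, ys)

print(f"validation accuracy (training-half fit): {acc:.3f}")
print(f"training half: b1 = {b1_tr:.3f}, SE = {se_tr:.3f}, "
      f"odds ratio = {math.exp(b1_tr):.2f}")
print(f"full data:     b1 = {b1_all:.3f}, SE = {se_all:.3f}, "
      f"odds ratio = {math.exp(b1_all):.2f}")
```

With about twice as many observations, the full-data standard error should come out near 1/sqrt(2) of the training-half one, which is the gain the question is after. Those standard errors assume the model specification was fixed before looking at the data.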
In my experience the data are grouped into three classes:

1. those known to have some property;
2. those known not to have that property;
3. all the rest, for which having or not having the property is not known.
The "training" is done using groups 1 and 2, and the trained model is then applied to group 3.
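A minimal sketch of this three-group scheme, assuming a single numeric predictor and a logistic model (the post does not say which model is used, so logistic regression is a stand-in). The x values are hypothetical, and `fit_logistic` is an illustrative helper, not a library call. Groups 1 and 2 supply the labels 1 and 0, and the fitted model then produces a score for each case in group 3.

```python
import math


def fit_logistic(xs, ys, iters=25):
    """Hypothetical helper: Newton-Raphson fit of
    P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # Fisher information entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical predictor values for each group.
group1 = [2.1, 1.7, 2.8, 0.9, 1.5, 3.0, 0.4]    # known to have the property
group2 = [0.2, -0.5, 1.1, -1.3, 0.6, -0.2, 1.8]  # known not to have it
group3 = [-1.0, 0.8, 2.5]                        # status unknown

# Train on groups 1 and 2 only (labels 1 and 0).
xs = group1 + group2
ys = [1] * len(group1) + [0] * len(group2)
b0, b1 = fit_logistic(xs, ys)

# Apply the fitted model to group 3.
scores = [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in group3]
for x, s in zip(group3, scores):
    print(f"x = {x:5.2f}  estimated P(has property) = {s:.3f}")
```

The labeled groups overlap on x, so the maximum-likelihood fit is finite; with perfectly separated groups the Newton iterations would diverge.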