


Re: Lin. regression, probability that a sample belongs to the data set?
Posted: Aug 14, 2014 9:13 PM


On Thursday, August 14, 2014 10:33:08 PM UTC+12, Aino wrote:
> Hmm.. This is what I mean. Where would the basic linear discriminant analysis place the one single line to separate these two groups?:
Great illustration, thanks. Now it is obvious that you have a nonlinear discrimination problem. Specifically, you need a multiplicative X*Y "interaction" term.
> I'll have to look into the "2-predictor logistic regression".
Here too, you will need an interaction term.
> > A simple thought experiment shows that looking for "closeness to regression lines" can't be a very good solution in general.
> This I need to think about.. Hmm.. At least I'm fortunate enough that both groups in my data have X's in the same range.
Never mind. Your data look quite different from what I was imagining. Even the Y's look to be in pretty much the same range.
> > Well, I suspect there is a lot about your application that I am misunderstanding. It might help to see a plot of the points from the two samples (plotted in different colors) on an X versus Y scattergram.
> Here is an artist's view of one of the data sets.
Thanks, that helps me a lot. It really does appear that the best discriminator will involve an X*Y term. For example, a logistic regression model with an X*Y term correctly classifies about 88% of the cases in your "artist's view" data set. (But I cheated and did that in SPSS, sorry.)
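To make the role of the X*Y term concrete, here is a minimal sketch in plain Python (not the SPSS run mentioned above). Everything in it is an assumption for illustration: the data are synthetic points whose group depends on the sign of X*Y, not the "artist's view" data, and the helper names (`features`, `fit_logistic`, `accuracy`) and the simple gradient-ascent fit are made up for this example. It fits the same logistic model with and without the product term and compares the training accuracy.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def features(x, y, interaction):
    # Intercept, main effects, and optionally the X*Y product term.
    f = [1.0, x, y]
    if interaction:
        f.append(x * y)
    return f

def fit_logistic(rows, labels, steps=1000, rate=1.0):
    # Plain batch gradient ascent on the mean log-likelihood.
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for f, t in zip(rows, labels):
            err = t - sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            for j, fj in enumerate(f):
                grad[j] += err * fj
        w = [wi + rate * g / n for wi, g in zip(w, grad)]
    return w

def prob_group1(w, f):
    # Fitted probability that a case with feature vector f is in group 1.
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)))

def accuracy(w, rows, labels):
    hits = sum((prob_group1(w, f) >= 0.5) == (t == 1)
               for f, t in zip(rows, labels))
    return hits / len(labels)

# Synthetic stand-in data (an assumption): group 1 where X*Y > 0.
rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
labels = [1 if x * y > 0 else 0 for x, y in points]

results = {}
weights = {}
for interaction in (False, True):
    rows = [features(x, y, interaction) for x, y in points]
    weights[interaction] = fit_logistic(rows, labels)
    results[interaction] = accuracy(weights[interaction], rows, labels)

print("accuracy without X*Y:", results[False])
print("accuracy with X*Y:   ", results[True])

# Per-case probability split for one new point, using the interaction model.
p_new = prob_group1(weights[True], features(0.5, 0.5, True))
print("Pr(group 1 | X=0.5, Y=0.5):", p_new)
```

On data like this, no single straight line in the (X, Y) plane separates the groups, so the model without the product term should do little better than chance, while the model with it should classify most cases correctly. The accuracy on real data will of course depend on how cleanly the groups follow that pattern.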
> However, regardless of what method I should use, I would very much like to solve the problem of how to determine the probabilities of an individual data point belonging to group=1 and group=2.
Are you thinking of those two probabilities as being separate quantities or necessarily summing to 1? If the latter, logistic regression gives you such a probability split for each case, depending on the values of the predictor variables. If you go back to the regression residual approach, you could also use Bayes' theorem, starting with the z-score of the residual from each model to get Pr(D | group 1) and Pr(D | group 2).
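A rough sketch of that Bayes' theorem route, under assumptions that are not stated in the thread: the residuals of each group's regression are taken to be normally distributed with a known residual standard deviation, and the prior group probabilities default to 50/50. The function names and the numbers in the example are hypothetical.

```python
import math

def normal_density(residual, sd):
    # Density of the observed Y under one group's regression model,
    # computed from the z-score of its residual (normality is assumed).
    z = residual / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_group1(y, pred1, sd1, pred2, sd2, prior1=0.5):
    # Bayes' theorem:
    # Pr(group 1 | D) = prior1 * Pr(D | group 1)
    #                   / (prior1 * Pr(D | group 1) + prior2 * Pr(D | group 2))
    like1 = normal_density(y - pred1, sd1)  # Pr(D | group 1), as a density
    like2 = normal_density(y - pred2, sd2)  # Pr(D | group 2), as a density
    numerator = prior1 * like1
    return numerator / (numerator + (1.0 - prior1) * like2)

# Hypothetical case: observed Y = 3.0 at some X, where the group 1 line
# predicts 3.2 and the group 2 line predicts 5.0, both with residual SD 0.5.
p1 = posterior_group1(3.0, 3.2, 0.5, 5.0, 0.5)
print("Pr(group 1 | D) =", p1)
print("Pr(group 2 | D) =", 1.0 - p1)
```

Each density is divided by its own residual SD, so the two likelihoods stay comparable when the two regressions have different scatter. The two posterior probabilities sum to 1 by construction; if a point lies extremely far from both lines, both densities can underflow to zero and the ratio is undefined, which this sketch does not guard against.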



