The Math Forum





Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Lin. regression, probability that a sample belongs to the data set?
Replies: 6   Last Post: Aug 14, 2014 9:13 PM

Jeff Miller

Re: Lin. regression, probability that a sample belongs to the data set?
Posted: Aug 14, 2014 9:13 PM

On Thursday, August 14, 2014 10:33:08 PM UTC+12, Aino wrote:
> Hmm.. This is what I mean. Where would the basic linear discriminant analysis place the one single line to separate these two groups?:

Great illustration, thanks. Now it is obvious that you have a nonlinear discrimination problem. Specifically, you need a multiplicative X*Y "interaction" term.
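
To make the point about the X*Y term concrete, here is a rough sketch in plain Python. The data are entirely made up (two groups scattered around crossing lines y = x and y = -x), not the data from the plot, and the helper names `make_points` and `accuracy` are just for illustration. It only compares two simple linear scores (X alone, Y alone) against the product score X*Y, so it illustrates the idea rather than proving anything about every possible line.

```python
import random

random.seed(0)

# Hypothetical data, NOT the poster's: points scattered around a line through the origin.
def make_points(slope, n=200):
    points = []
    for _ in range(n):
        x = random.uniform(-1.0, 1.0)
        y = slope * x + random.gauss(0.0, 0.1)
        points.append((x, y))
    return points

group1 = make_points(+1.0)  # Y rises with X
group2 = make_points(-1.0)  # Y falls with X

def accuracy(score):
    """Fraction classified correctly when score > 0 is read as 'group 1'."""
    hits = sum(score(x, y) > 0 for x, y in group1)
    hits += sum(score(x, y) <= 0 for x, y in group2)
    return hits / (len(group1) + len(group2))

acc_x = accuracy(lambda x, y: x)        # split on X alone
acc_y = accuracy(lambda x, y: y)        # split on Y alone
acc_xy = accuracy(lambda x, y: x * y)   # split on the interaction term

print("X only:", acc_x, " Y only:", acc_y, " X*Y:", acc_xy)
```

On data shaped like this, the X-only and Y-only splits should hover near chance, while the sign of X*Y separates the groups well except close to the crossing point.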

> I'll have to look into the "2-predictor logistic regression".
Here too, you will need an interaction term.

> > A simple thought experiment shows that looking for "closeness to regression lines" can't be a very good solution in general.

> This I need to think about.. Hmm.. At least I'm fortunate enough that both groups in my data have X's in the same range.
Never mind. Your data look quite different from what I was imagining. Even the Y's look to be in pretty much the same range.

> > Well, I suspect there is a lot about your application that I am misunderstanding. It might help to see a plot of the points from the two samples (plotted in different colors) on an X versus Y scattergram.
> Here is an artist's view of one of the data sets.

Thanks, that helps me a lot. It really does appear that the best discriminator will involve an X*Y term. For example, a logistic regression model with an X*Y term correctly classifies about 88% of the cases in your "artist's view" data set. (But I cheated and did that in SPSS, sorry.)

> However, regardless of what method I should use, I would very much like to solve the problem of how to determine the probabilities of an individual data point belonging to group=1 and group=2.

Are you thinking of those two probabilities as being separate quantities or necessarily summing to 1? If the latter, logistic regression gives you such a probability split for each case, depending on the values of the predictor variables.
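
As a minimal sketch of that probability split, here is a logistic regression with an X*Y term fitted by plain gradient descent in standard-library Python. Everything here is an assumption for illustration: the synthetic data (groups around y = x and y = -x), the learning rate, the iteration count, and the names `features` and `group_probabilities`. It is not the SPSS model mentioned above and will not reproduce its numbers.

```python
import math
import random

random.seed(1)

def sigmoid(t):
    # Written in two branches to avoid overflow for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

# Hypothetical data: label 1 for points near y = x, label 0 for points near y = -x.
data = []
for _ in range(200):
    x = random.uniform(-1.0, 1.0)
    data.append((x, x + random.gauss(0.0, 0.1), 1))
    x = random.uniform(-1.0, 1.0)
    data.append((x, -x + random.gauss(0.0, 0.1), 0))

def features(x, y):
    # Intercept, both predictors, and the X*Y interaction term.
    return (1.0, x, y, x * y)

weights = [0.0, 0.0, 0.0, 0.0]
rate = 0.5  # assumed step size for batch gradient descent
for _ in range(500):
    grad = [0.0, 0.0, 0.0, 0.0]
    for x, y, label in data:
        f = features(x, y)
        p = sigmoid(sum(w * fi for w, fi in zip(weights, f)))
        for j in range(4):
            grad[j] += (p - label) * f[j]
    weights = [w - rate * g / len(data) for w, g in zip(weights, grad)]

def group_probabilities(x, y):
    """Return (Pr(group 1), Pr(group 2)) for one point; the two sum to 1."""
    p1 = sigmoid(sum(w * fi for w, fi in zip(weights, features(x, y))))
    return p1, 1.0 - p1

print(group_probabilities(0.8, 0.8))   # point near y = x
print(group_probabilities(0.8, -0.8))  # point near y = -x
```

The second probability is simply one minus the first, which is what "necessarily summing to 1" means in this model.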

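If you go back to the regression residual approach, you could also use Bayes' theorem: use the z-score of the point's residual from each group's regression model to get the likelihoods Pr(D|group 1) and Pr(D|group 2), then combine those with prior probabilities for the two groups to get the probability of each group given the data point.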
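
Here is one way the Bayes calculation might look, sketched in plain Python under assumptions that are mine rather than anything established in this thread: residuals are treated as normally distributed around each group's fitted line, the priors are equal, and the intercepts, slopes, and residual standard deviations in `models` are invented numbers. The function name `posterior_group_probs` is also just illustrative.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def posterior_group_probs(x, y, models, priors):
    """models holds one (intercept, slope, residual_sd) tuple per group.

    Returns Pr(group | point) for each group via Bayes' theorem,
    assuming normal residuals around each group's regression line.
    """
    weighted = []
    for (intercept, slope, sd), prior in zip(models, priors):
        z = (y - (intercept + slope * x)) / sd   # z-score of the residual
        likelihood = normal_pdf(z) / sd          # Pr(D | group) as a density
        weighted.append(prior * likelihood)
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical fitted lines: group 1 is y = x, group 2 is y = -x, both with sd 0.1.
models = [(0.0, 1.0, 0.1), (0.0, -1.0, 0.1)]
priors = [0.5, 0.5]  # assumed equal prior probabilities

probs = posterior_group_probs(0.5, 0.45, models, priors)
print(probs)
```

A point that sits close to one line and many standard deviations from the other gets a posterior near 1 for the closer group; a point equally far from both lines (in z-score terms, with equal residual spreads) gets a split that just reflects the priors.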




© The Math Forum at NCTM 1994-2017. All Rights Reserved.