


Re: Lin. regression, probability that a sample belongs to the data set?
Posted: Aug 12, 2014 9:03 AM


"Aino" <aino.tietavainen@removeThis.helsinki.fi> wrote in message <lscp64$nqi$1@newscl01ah.mathworks.com>...
> Hi all.
>
> I have a simple linear regression with x and y data. Now, if I take a sample, say (x1, y1), how do I get some probability that the sample belongs to the regressed data at hand?
>
> In another words, it is possible (somehow..) to get for example 95% prediction bounds/intervals to the regressed data, but how do I do the opposite, how do I get the "percentage" for a certain (x1, y1)?
>
> The bigger picture (for those who are interested): I have two sets of data and two regression lines, and I have to decide to which data set the sample belongs to. Linear discriminant analysis is not an option here, but anything "ANCOVA with unequal slopes" would be interesting.
>
So given a linear regression, you can compute the uncertainty around the line at any point x. This takes the form of a normal distribution in y, with its mean at the value predicted by the line and a variance that depends on x. The variance is smallest near the mean of the x data and largest near the ends of the line, of course.
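For reference, here is the usual textbook form of that variance for a new observation at x1. The symbols are my own notation, not anything from the question: a and b are the fitted intercept and slope, n is the number of data points, \bar{x} is the mean of the x data, and s^2 is the residual variance.

```latex
\hat{y}(x_1) = a + b\,x_1, \qquad
s^2 = \frac{1}{n-2}\sum_{i=1}^{n}\bigl(y_i - a - b\,x_i\bigr)^2, \qquad
\sigma_{\mathrm{pred}}^2(x_1) = s^2\left(1 + \frac{1}{n} + \frac{(x_1-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}\right)
```

The (x_1 - \bar{x})^2 term is what makes the variance grow toward the ends of the line.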
So for a given (x,y) pair, each line gives you a normal distribution. Use the normal CDF to convert the distance of y from the predicted value into a probability score, for example a two-sided tail probability. You will get a different probability for each line, of course, so the line with the better score "wins".
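A minimal sketch of that scoring in Python, assuming ordinary least squares and the standard prediction variance for a new observation. The names fit_line and membership_score and the data are made up for illustration. With few data points a t distribution with n - 2 degrees of freedom would be more exact than the normal CDF used here.

```python
import math
from statistics import NormalDist


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns the quantities the score needs."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return {"n": n, "x_mean": x_mean, "sxx": sxx,
            "slope": slope, "intercept": intercept, "s2": sse / (n - 2)}


def membership_score(fit, x1, y1):
    """Two-sided normal tail probability of y1 given the line's prediction at x1."""
    y_hat = fit["intercept"] + fit["slope"] * x1
    # Prediction variance for a new observation; grows away from x_mean.
    pred_var = fit["s2"] * (1 + 1 / fit["n"]
                            + (x1 - fit["x_mean"]) ** 2 / fit["sxx"])
    z = abs(y1 - y_hat) / math.sqrt(pred_var)
    return 2 * (1 - NormalDist().cdf(z))


# Hypothetical data: two sets with clearly different lines.
fit_a = fit_line([1, 2, 3, 4, 5], [3.1, 4.9, 7.2, 8.8, 11.0])
fit_b = fit_line([1, 2, 3, 4, 5], [9.2, 7.9, 7.1, 5.8, 5.0])

sample = (4.0, 9.0)
score_a = membership_score(fit_a, *sample)
score_b = membership_score(fit_b, *sample)
print("A" if score_a > score_b else "B", score_a, score_b)
```

The score is 1 when the sample sits exactly on the line and falls toward 0 as it moves away, so comparing the two scores picks the line the sample is more consistent with.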
A quick search online turns up at least a few sites with enough information to do the computations, here:
http://science.widener.edu/svb/stats/regress.html
or here:
http://www.mpiahd.mpg.de/~calj/statistical_methods_ss2013/lectures/05_regression.pdf
Should be easy enough.
John



