Best Fit for Linear Data
Date: Mon, 07 Aug 1995 16:09:00 +0000 (GMT)
From: Steve Bullman 75902517/01509 282517
Subject: stats analysis problem

Dear Dr. Math,

You may be able to help with a problem that I've had for a couple of years now.

If you have a set of data, (x1, y1), ..., (xn, yn), with a known linear relationship between x and y (y = mx + c), then: I was taught at university that the best straight line fit MUST go through (xbar, ybar), where xbar is the average of all x values and ybar is the average of all y values.

I used this last year in a piece of analysis and my supervisor's supervisor crossed it out in big red pen and wrote "Rubbish" all over it. However, try as I might, I can't find a maths/stats text book that says which is correct - all I've got are my hand-written notes used during revision for the exams. I've consulted two "tame" mathematicians - one proved it was true, the other proved it was false!

I'm confused! Help would be much appreciated (a text reference for the definitive answer would be absolutely brilliant!)

Thanks in anticipation
Steve Bullman
From: Doctor Ken
Subject: Re: stats analysis problem

Hello there!

Well, my inclination is to say that this isn't true. Here's why.

Let's say we have a simple linear relationship y = m x. We'll use the method of least squares to find out the best value for m. Let (xk, yk) be the set of experimental values. Since the distance between the real point (xk, yk) and the expected point (xk, m xk) is |m xk - yk|, the sum of the squares of these distances is

    S = (m x1 - y1)^2 + ... + (m xn - yn)^2.

To find the value of m for which this is minimal, differentiate with respect to m and set what you get equal to zero:

    dS/dm = 2 x1 (m x1 - y1) + ... + 2 xn (m xn - yn)

    x1 (m x1 - y1) + ... + xn (m xn - yn) = 0

    m (x1^2 + ... + xn^2) = x1 y1 + ... + xn yn

    m = (x1 y1 + ... + xn yn)/(x1^2 + ... + xn^2)

So now we've found what the best value for m is in our equation y = m x.

Now let's apply this to an example. Let x1=1; x2=2; x3=5; x4=6.6; y1=3; y2=90; y3=9; y4=2. Then the equation y = m x becomes y = 3.279 x, xbar becomes 3.65, and ybar becomes 26. But 26 isn't 3.279 * 3.65 (which is about 11.97). So I guess I'd say that it's not true.

However, there is some subjectivity going on here. I mean, the method of least squares is just one way to say which is the "best" approximation. So I wouldn't say that you're wrong, just that this method doesn't give you something that goes through (xbar, ybar).

-Doctor Ken
The Geometry Forum
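The arithmetic in this reply can be checked numerically. The following is a minimal Python sketch, not part of the original exchange; the function name slope_through_origin is made up for illustration. It applies the formula m = (x1 y1 + ... + xn yn)/(x1^2 + ... + xn^2) to the four example points and compares m * xbar with ybar.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope for the model y = m x (no intercept term).

    Hypothetical helper: m = sum(x*y) / sum(x*x), as derived in the reply.
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / sum_xx


# The example data from the reply.
xs = [1, 2, 5, 6.6]
ys = [3, 90, 9, 2]

m = slope_through_origin(xs, ys)
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)

print("m      =", round(m, 3))
print("xbar   =", round(xbar, 3))
print("ybar   =", round(ybar, 3))
# For the through-origin model, m * xbar need not equal ybar.
print("m*xbar =", round(m * xbar, 3))
```

With this data the fitted line y = m x gives a value near 12 at x = xbar, well short of ybar = 26, so the through-origin fit misses the point (xbar, ybar).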
From: Anonymous
Date: Fri, 5 Jul 1996
Subject: Answer to "regression" problem in Ask Dr. Math

I am a statistician working in California. I love the Ask Dr. Math resource and review it periodically for useful answers to neat problems. Recently, I noticed a response from "Doctor Ken -- The Geometry Forum" that had an error in it which I would like to correct (perhaps others have already pointed it out).

The question was this: Does the "best-fit" linear regression function go through the point (X-bar, Y-bar)?

The correct answer is "Yes," when the model is specified as Y=mx + c and "best" means the "least squares" fit. Ken came up with the opposite answer because he considered the model Y=mx. This model is commonly referred to as "regression through the origin." In the case of regression through the origin, the best fit line does not go through the point (X-bar, Y-bar).

The calculus of minimization will lead to the conclusion that the best fit line for the model Y=mx + c does indeed pass through the point defined by the mean of X and the mean of Y for the set of points being analyzed.

Regards from a major fan.

- Lawrence C. Larsen
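The minimization step mentioned here can be spelled out. For S = (m x1 + c - y1)^2 + ... + (m xn + c - yn)^2, setting dS/dc = 0 gives (m x1 + c - y1) + ... + (m xn + c - yn) = 0, and dividing by n gives m * xbar + c = ybar. The Python sketch below is an illustration only, not part of the original exchange; the function name fit_line is made up. It solves the two normal equations directly from the raw sums, without using the means, and then checks that the fitted line passes through (xbar, ybar) for the four points used earlier in this thread.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m x + c from the two normal equations.

    Hypothetical helper. Solves
        m * sum(x^2) + c * sum(x) = sum(x*y)
        m * sum(x)   + c * n      = sum(y)
    by Cramer's rule; assumes the x values are not all equal.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2
    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y * sum_xx - sum_x * sum_xy) / denom
    return slope, intercept


# Same example data as in the earlier reply.
data_x = [1, 2, 5, 6.6]
data_y = [3, 90, 9, 2]

slope, intercept = fit_line(data_x, data_y)
mean_x = sum(data_x) / len(data_x)
mean_y = sum(data_y) / len(data_y)

print("slope, intercept =", round(slope, 3), round(intercept, 3))
# Value of the fitted line at x = mean_x, compared with mean_y.
print("line at mean_x   =", round(slope * mean_x + intercept, 6))
print("mean_y           =", round(mean_y, 6))
```

The slope and intercept are computed from the sums alone, so the agreement between the last two printed values is a genuine check and is not built into the formulas by construction.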
© 1994- The Math Forum at NCTM. All rights reserved.