Best Fit for Linear Data

```
Date: Mon, 07 Aug 1995 16:09:00 +0000 (GMT)
From: Steve Bullman 75902517/01509 282517
Subject: stats analysis problem

Dear Dr. Math,

You may be able to help with a problem that I've had for a couple of years now.

If you have a set of data, (x1,y1), ..., (xn,yn), with a known linear
relationship between x & y (y = mx + c), then:

I was taught at university that the best straight line fit MUST go through
xbar,ybar (xbar is the average of all x values, ybar is the average of all
y values).

I used this last year in a piece of analysis and my supervisor's supervisor
crossed it out in big red pen and wrote "Rubbish" all over it.

However, try as I might, I can't find a maths/stats text book that says which
is correct - all I've got are my hand-written notes used during revision for
the exams.

I've consulted two "tame" mathematicians - one proved it was true, the other
proved it was false!

I'm confused!  Help would be much appreciated (a text reference for the
definitive answer would be absolutely brilliant!)

Thanks in anticipation

Steve Bullman
```

```
From: Doctor Ken
Subject: Re: stats analysis problem

Hello there!

Well, my inclination is to say that this isn't true.  Here's why.  Let's say
we have a simple linear relationship y = m x.  We'll use the method of
least squares to find out the best value for m.

Let (xk, yk), k = 1, ..., n, be the experimental values.  Since the vertical
distance between the real point (xk, yk) and the expected point (xk, m xk)
is |m xk - yk|, the sum of the squares of these distances is
S = (m x1 - y1)^2 + ... + (m xn - yn)^2.
To find the value of m for which this is minimal, differentiate with respect
to m and set what you get equal to zero:

dS/dm = 2 x1 (m x1 - y1) + ... + 2 xn (m xn - yn)
x1 (m x1 - y1) + ... + xn (m xn - yn) = 0
m (x1^2 + ... + xn^2) = x1 y1 + ... + xn yn
m = (x1 y1 + ... + xn yn)/(x1^2 + ... + xn^2)

So now we've found what the best value for m is in our equation y = m x.

Now let's apply this to an example.  Let x1=1; x2=2; x3=5; x4=6.6;
y1=3; y2=90; y3=9; y4=2.  Then the equation y = m x becomes y = 3.279 x,
xbar becomes 3.65, and ybar becomes 26.  But 3.279 * 3.65 is about 11.97,
not 26.  So I guess I'd say that it's not true.

However, there is some subjectivity going on here.  I mean, the method of
least squares is just one way to say which is the "best" approximation.
So I wouldn't say that you're wrong, just that this method doesn't give you
something that goes through xbar, ybar.

-Doctor Ken
The Geometry Forum
```
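
The arithmetic in Doctor Ken's example can be checked with a short script. This is only a sketch of the through-origin calculation described in his message; the function name `slope_through_origin` is made up for illustration and does not come from the original post.

```python
# Least-squares slope for the through-origin model y = m x, using the
# closed form from the message above: m = sum(x*y) / sum(x^2).
xs = [1, 2, 5, 6.6]
ys = [3, 90, 9, 2]


def slope_through_origin(xs, ys):
    """Return the m that minimizes sum((m*x - y)^2) over the data."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


m = slope_through_origin(xs, ys)
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)

print(f"m = {m:.3f}")                            # m = 3.279
print(f"xbar = {xbar:.2f}, ybar = {ybar:.2f}")   # xbar = 3.65, ybar = 26.00
print(f"m * xbar = {m * xbar:.3f}")              # m * xbar = 11.968
```

For this data the line y = m x gives about 11.97 at x = xbar, so it misses (xbar, ybar), which matches the numbers quoted in the message.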

```
From: Anonymous
Date: Fri, 5 Jul 1996

I am a statistician working in California.  I love the Ask Dr. Math resource
and review it periodically for useful answers to neat problems.

Recently, I noticed a response from "Doctor Ken -- The Geometry Forum" that
had an error in it which I would like to correct (perhaps others have
already done so).  The question was: does the best-fit straight line go
through the point (X-bar, Y-bar)?

The correct answer is "Yes," when the model is specified as Y=mx + c and
"best" means the "least squares" fit.

Ken came up with the opposite answer because he considered the model Y=mx.
This model is commonly referred to as "regression through the origin."  In
the case of regression through the origin, the best fit line does not go
through the point (X-bar, Y-bar).

The calculus of minimization will lead to the conclusion that the best fit
line for the model Y=mx + c does indeed pass through the point defined by
the mean of X and the mean of Y for the set of points being analyzed.

Regards from a major fan.

- Lawrence C. Larsen
```
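
The minimization step that the last message alludes to can be written out briefly. The following is a sketch of the standard least-squares argument for the model y = mx + c, in the notation of the thread; the symbols S, n, and the bars for the means are the usual conventions, not something spelled out in the original messages.

```latex
% Sum of squared vertical residuals for the model y = m x + c
S(m, c) = \sum_{k=1}^{n} (m x_k + c - y_k)^2

% At the minimum, the partial derivative with respect to c vanishes
\frac{\partial S}{\partial c} = 2 \sum_{k=1}^{n} (m x_k + c - y_k) = 0

% Expand the sum and divide by n
m \sum_{k=1}^{n} x_k + n c = \sum_{k=1}^{n} y_k
\quad\Longrightarrow\quad
m \bar{x} + c = \bar{y}
```

The last line says that the point (xbar, ybar) satisfies y = mx + c for the least-squares values of m and c. The through-origin model y = mx has no c to differentiate with respect to, so this condition is not available there, which is why the two answers in the thread differ.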
Associated Topics:
College Statistics
