Associated Topics || Dr. Math Home || Search Dr. Math

### Best-fitting Line to a Number of Points

```
Date: 04/25/2001 at 06:59:09
From: Demetrios Halazonetis
Subject: Best-fitting line to a number of points

I have a number of points on a plane and want to find a line that
best-fits through the points. I want to minimize the sum of squares
of the distances of each point from the line. I know that the distance
of a point from the line is given by Ax+By+C, if the equation of
the line is normalized (A^2+B^2 = 1). So, the squared distance is
(Ax+By+C)^2.

I tried to calculate the derivative of the sum of these squared
distances and set it to zero, in order to find A, B, and C, but got
lost in the math. Can you help? I would think that this is a solved
problem but could not find anything on the Internet.

I thought I could take partial derivatives with respect to A, B, and
C, and solve the resulting system of equations, but I don't know if
this is allowed, given that A and B are not independent variables,
so I replaced B with SQRT(1-A^2) and got the partial derivatives with
respect to A and C. I could not solve the system. Also, I am troubled
SQRT will give only the positive number.

Thanks,
Demetrios Halazonetis
```

```
Date: 04/25/2001 at 16:44:58
From: Doctor Rob
Subject: Re: Best-fitting line to a number of points

Thanks for writing to Ask Dr. Math, Demetrios.

It would be better to write

y = a[0] + a[1]*x,

and to minimize the sum

N
F(a[0],a[1]) = SUM (a[0]+a[1]*x[n]-y[n])^2.
n=1

This will work for all but vertical lines.  The details of how
to do this are given on the following page from the Dr. Math
Archives:

Interpolation Polynomials and Linear Regression
http://mathforum.org/dr.math/problems/cathala10.9.98.html

Write again if you have further questions.

- Doctor Rob, The Math Forum
http://mathforum.org/dr.math/
```

```
Date: 04/26/2001 at 04:31:24
From: Demetrios Halazonetis
Subject: Re: Best-fitting line to a number of points

Thanks for your suggestion, but what you propose is the well-known
solution to linear regression. It is not what I am looking for,
because it minimizes the vertical distance of each point from the
line. I want to minimize the perpendicular distance of each point from
the line, which will give different results.

I found this page, which seems to point to the right direction, but I
have to test it:

Least Squares 3D Circle
http://mathforum.org/dr.math/problems/yoder8.13.97.html

I also found this page on the Net, which uses a different approach:

Perpendicular Regression Of A Line - MathPages, Kevin Brown
http://mathpages.com/home/kmath110.htm

I will write some code and see what works best.

Thanks,
Demetrios Halazonetis
```

```
Date: 04/26/2001 at 12:02:33
From: Doctor Rob
Subject: Re: Best-fitting line to a number of points

Let the line sought be A*x + B*y + C = 0, with no restriction on A, B,
and C, except not both of A and B are zero. Then the distance from a
point (x[n],y[n]) to this line is given by

d[n] = |A*x[n]+B*y[n]+C|/sqrt(A^2+B^2).

Then the sum of the squares of the distances of the r points from the
line is

F(A,B,C) = SUM d[n]^2
= SUM (A*x[n]+B*y[n]+C)^2/(A^2+B^2).

Here and henceforth, the sums run over the range 1 <= n <= r.

Now take the partial derivatives of F with respect to A, B, and C, and
set them equal to zero. This will give you a set of three simultaneous
nonlinear equations in A, B, and C, and you need to solve them. They
will be polynomial, of degree 3. When I did this, the equations turned
out to be

SUM (A*x[n]+B*y[n]+C)*(A*C-B^2*x[n]+A*B*y[n]) = 0,
SUM (A*x[n]+B*y[n]+C)*(B*C+A*B*x[n]-A^2*y[n]) = 0,
SUM A*x[n]+B*y[n]+C = 0.

These can be rewritten as polynomials in A, B, and C with coefficients
involving

r = SUM 1,
s = SUM x[n],
t = SUM y[n],
u = SUM x[n]^2,
v = SUM x[n]*y[n],
w = SUM y[n]^2,

which are quantities you can compute from your data points.

-v*A^2*B-s*A^2*C+(u-w)*A*B^2-2*t*A*B*C-r*A*C^2+v*B^3+s*B^2*C = 0,
v*A^3+(w-u)*A^2*B+t*A^2*C-v*A*B^2-2*s*A*B*C-t*B^2*C-r*B*C^2 = 0,
s*A+t*B+r*C = 0.

C can be eliminated using the last equation, which is the same as

C = -(s*A+t*B)/r.

That will leave you with two cubic equations in two unknowns A and B.

They turn out to be equivalent to the single quadratic equation

(r*v-s*t)*A^2 + (s^2-t^2-r*u+r*w)*A*B - (r*v-s*t)*B^2 = 0.
A^2 + [(s^2-t^2-r*u+r*w)/(r*v-s*t)]*A*B - B^2 = 0.

Now unless r*v - s*t = 0, you can solve this equation for -A/B, the
slope of the line. Then you can find C/B, too, and thus the equation
of the line. You will get two distinct solutions. One is the minimum
you seek, and the other is a saddle point of F(A,B,C). You can tell
which is the minimum by evaluating F(A,B,C) at each one and choosing
the solution that gives you the smaller value.

If r*v = s*t, but the coefficient s^2-t^2-r*u+r*w is nonzero, then the
minimum will be achieved for either A = 0 or B = 0. Figure out C for
both of these possibilities, and substitute them into F(A,B,C). The
one that gives you the lesser value is your solution.

If r*v = s*t and s^2-t^2-r*u+r*w = 0, then any line through the center
of mass (s/r,t/r) will give the same value of F as any other, and all
are tied for being the best.

- Doctor Rob, The Math Forum
http://mathforum.org/dr.math/
```

```
Date: 04/26/2001 at 14:27:44
From: Demetrios Halazonetis
Subject: Re: Best-fitting line to a number of points

Thanks, I think this will work. It was not as simple as I thought.
I already tried the approach described in
http://mathpages.com/home/kmath110.htm    and it also works fine. The
math is simpler.

Now I have to figure out the way to do the same in 3D (fit a plane to
a number of points). I am not sure that the method described in Least
Squares 3D Circle at

http://mathforum.org/dr.math/problems/yoder8.13.97.html

is easy. If the problem in 2D involves a quadratic equation, the 3D
will most probably be a cubic. Anyway, that is for later.

Thank you very much for your help!

Demetris
```

```
Date: 04/30/2001 at 14:37:26
From: Doctor Rob
Subject: Re: Best-fitting line to a number of points

When I repeated this calculation for a plane, the equations got very
complicated very quickly. They started with three cubics and one
linear equation in A, B, C, and D. The coefficients involve SUM x^2,
SUM y^2, SUM z^2, SUM x*y, SUM x*z, SUM y*z, SUM x, SUM y, SUM z, and
SUM 1. The linear equation I used to eliminate D. Then I worked pretty
hard with the help of Mathematica(TM) to eliminate C from the
remaining equations. I ended up with an equation in (-A/B) of degree 8
with *VERY* complicated coefficients. It could be symbolically
factored into a linear part, a cubic part, and a quartic part.  Of the
eight solutions, seven were saddle points and one was the minimum
sought. I also found conditions on the data for the solution to be
unique, also very complicated.

I am willing to believe that my computations could be erroneous, and
that the actual situation is not that bad, but I would also be willing
to believe that that is exactly what happens.

What I learned was that doing those computations symbolically is
probably a mistake, but doing them numerically in any particular

- Doctor Rob, The Math Forum
http://mathforum.org/dr.math/
```

```Date: 03/19/2003 at 15:06:48
From: Brian Shier
Subject: Best-fitting line to a number of points

Finding a best fit line or plane by minimizing perpendicular
distances is a problem in "orthogonal distance regression." An
Internet search will turn up plenty of material.

Once it is recognized that the best fit line or plane contains the
centroid of the data, the sum of squared distances (in either case)
can be rewritten involving a Rayleigh quotient that uses the
covariance matrix of the data (shifted by the centroid). The
Rayleigh quotient is minimized and maximized by the eigen vectors of
the covariance matrix that correspond to its smallest and largest
eigen values. If the data is 3D, the minimizing eigen vector will be
the normal vector of the best fit plane, and the maximizing eigen
vector will be the direction vector of the best fit line.
Interestingly, the best fit line is on the best fit plane.

In practice the vectors are computed using the Singular Value
Decomposition (SVD). A good college text in linear algebra would
have the needed information on the Rayleigh quotient and the SVD.

This is admittedly complex, but it is the cleanest approach. The
down side is that computing the SVD is not something to be done by
hand, and it is not an easy task to program.

The case of a best fit 2D line can be handled without all of this
complexity. If the best fit line is written as

(x - xbar) sin(theta) - (y - ybar) cos(theta) = 0,

then the sum of squared distances can be minimized as a function of
theta. The slope of the best fit line is then tan(minimizing_theta).
This is a little easier than working with y = mx + b.

There is a lot of detail beneath the surface here. I tried to only
give enough for those who have the background and interest to pursue
it. The problem is simple to describe, but not so simple to solve
beyond the case of the 2D line.

-Brian
```
Associated Topics:
College Analysis
College Euclidean Geometry
College Non-Euclidean Geometry
College Statistics

Search the Dr. Math Library:

 Find items containing (put spaces between keywords):   Click only once for faster results: [ Choose "whole words" when searching for a word like age.] all keywords, in any order at least one, that exact phrase parts of words whole words

Submit your own question to Dr. Math
Math Forum Home || Math Library || Quick Reference || Math Forum Search