Best-fitting Line to a Number of PointsDate: 04/25/2001 at 06:59:09 From: Demetrios Halazonetis Subject: Best-fitting line to a number of points I have a number of points on a plane and want to find a line that best-fits through the points. I want to minimize the sum of squares of the distances of each point from the line. I know that the distance of a point from the line is given by Ax+By+C, if the equation of the line is normalized (A^2+B^2 = 1). So, the squared distance is (Ax+By+C)^2. I tried to calculate the derivative of the sum of these squared distances and set it to zero, in order to find A, B, and C, but got lost in the math. Can you help? I would think that this is a solved problem but could not find anything on the Internet. I thought I could take partial derivatives with respect to A, B, and C, and solve the resulting system of equations, but I don't know if this is allowed, given that A and B are not independent variables, so I replaced B with SQRT(1-A^2) and got the partial derivatives with respect to A and C. I could not solve the system. Also, I am troubled about this replacement, because B may be positive or negative, but the SQRT will give only the positive number. Thanks, Demetrios Halazonetis Date: 04/25/2001 at 16:44:58 From: Doctor Rob Subject: Re: Best-fitting line to a number of points Thanks for writing to Ask Dr. Math, Demetrios. It would be better to write y = a[0] + a[1]*x, and to minimize the sum N F(a[0],a[1]) = SUM (a[0]+a[1]*x[n]-y[n])^2. n=1 This will work for all but vertical lines. The details of how to do this are given on the following page from the Dr. Math Archives: Interpolation Polynomials and Linear Regression http://mathforum.org/dr.math/problems/cathala10.9.98.html Write again if you have further questions. - Doctor Rob, The Math Forum http://mathforum.org/dr.math/ Date: 04/26/2001 at 04:31:24 From: Demetrios Halazonetis Subject: Re: Best-fitting line to a number of points Thanks for your suggestion, but what you propose is the well-known solution to linear regression. It is not what I am looking for, because it minimizes the vertical distance of each point from the line. I want to minimize the perpendicular distance of each point from the line, which will give different results. I found this page, which seems to point to the right direction, but I have to test it: Least Squares 3D Circle http://mathforum.org/dr.math/problems/yoder8.13.97.html I also found this page on the Net, which uses a different approach: Perpendicular Regression Of A Line - MathPages, Kevin Brown http://mathpages.com/home/kmath110.htm I will write some code and see what works best. Thanks, Demetrios Halazonetis Date: 04/26/2001 at 12:02:33 From: Doctor Rob Subject: Re: Best-fitting line to a number of points Thanks for writing back. How about this approach? Let the line sought be A*x + B*y + C = 0, with no restriction on A, B, and C, except not both of A and B are zero. Then the distance from a point (x[n],y[n]) to this line is given by d[n] = |A*x[n]+B*y[n]+C|/sqrt(A^2+B^2). Then the sum of the squares of the distances of the r points from the line is F(A,B,C) = SUM d[n]^2 = SUM (A*x[n]+B*y[n]+C)^2/(A^2+B^2). Here and henceforth, the sums run over the range 1 <= n <= r. Now take the partial derivatives of F with respect to A, B, and C, and set them equal to zero. This will give you a set of three simultaneous nonlinear equations in A, B, and C, and you need to solve them. They will be polynomial, of degree 3. When I did this, the equations turned out to be SUM (A*x[n]+B*y[n]+C)*(A*C-B^2*x[n]+A*B*y[n]) = 0, SUM (A*x[n]+B*y[n]+C)*(B*C+A*B*x[n]-A^2*y[n]) = 0, SUM A*x[n]+B*y[n]+C = 0. These can be rewritten as polynomials in A, B, and C with coefficients involving r = SUM 1, s = SUM x[n], t = SUM y[n], u = SUM x[n]^2, v = SUM x[n]*y[n], w = SUM y[n]^2, which are quantities you can compute from your data points. -v*A^2*B-s*A^2*C+(u-w)*A*B^2-2*t*A*B*C-r*A*C^2+v*B^3+s*B^2*C = 0, v*A^3+(w-u)*A^2*B+t*A^2*C-v*A*B^2-2*s*A*B*C-t*B^2*C-r*B*C^2 = 0, s*A+t*B+r*C = 0. C can be eliminated using the last equation, which is the same as C = -(s*A+t*B)/r. That will leave you with two cubic equations in two unknowns A and B. They turn out to be equivalent to the single quadratic equation (r*v-s*t)*A^2 + (s^2-t^2-r*u+r*w)*A*B - (r*v-s*t)*B^2 = 0. A^2 + [(s^2-t^2-r*u+r*w)/(r*v-s*t)]*A*B - B^2 = 0. Now unless r*v - s*t = 0, you can solve this equation for -A/B, the slope of the line. Then you can find C/B, too, and thus the equation of the line. You will get two distinct solutions. One is the minimum you seek, and the other is a saddle point of F(A,B,C). You can tell which is the minimum by evaluating F(A,B,C) at each one and choosing the solution that gives you the smaller value. If r*v = s*t, but the coefficient s^2-t^2-r*u+r*w is nonzero, then the minimum will be achieved for either A = 0 or B = 0. Figure out C for both of these possibilities, and substitute them into F(A,B,C). The one that gives you the lesser value is your solution. If r*v = s*t and s^2-t^2-r*u+r*w = 0, then any line through the center of mass (s/r,t/r) will give the same value of F as any other, and all are tied for being the best. - Doctor Rob, The Math Forum http://mathforum.org/dr.math/ Date: 04/26/2001 at 14:27:44 From: Demetrios Halazonetis Subject: Re: Best-fitting line to a number of points Thanks, I think this will work. It was not as simple as I thought. I already tried the approach described in http://mathpages.com/home/kmath110.htm and it also works fine. The math is simpler. Now I have to figure out the way to do the same in 3D (fit a plane to a number of points). I am not sure that the method described in Least Squares 3D Circle at http://mathforum.org/dr.math/problems/yoder8.13.97.html is easy. If the problem in 2D involves a quadratic equation, the 3D will most probably be a cubic. Anyway, that is for later. Thank you very much for your help! Demetris Date: 04/30/2001 at 14:37:26 From: Doctor Rob Subject: Re: Best-fitting line to a number of points Thanks for the reply, Demetrios. When I repeated this calculation for a plane, the equations got very complicated very quickly. They started with three cubics and one linear equation in A, B, C, and D. The coefficients involve SUM x^2, SUM y^2, SUM z^2, SUM x*y, SUM x*z, SUM y*z, SUM x, SUM y, SUM z, and SUM 1. The linear equation I used to eliminate D. Then I worked pretty hard with the help of Mathematica(TM) to eliminate C from the remaining equations. I ended up with an equation in (-A/B) of degree 8 with *VERY* complicated coefficients. It could be symbolically factored into a linear part, a cubic part, and a quartic part. Of the eight solutions, seven were saddle points and one was the minimum sought. I also found conditions on the data for the solution to be unique, also very complicated. I am willing to believe that my computations could be erroneous, and that the actual situation is not that bad, but I would also be willing to believe that that is exactly what happens. What I learned was that doing those computations symbolically is probably a mistake, but doing them numerically in any particular case isn't bad at all. - Doctor Rob, The Math Forum http://mathforum.org/dr.math/ Date: 03/19/2003 at 15:06:48 From: Brian Shier Subject: Best-fitting line to a number of points Finding a best fit line or plane by minimizing perpendicular distances is a problem in "orthogonal distance regression." An Internet search will turn up plenty of material. Once it is recognized that the best fit line or plane contains the centroid of the data, the sum of squared distances (in either case) can be rewritten involving a Rayleigh quotient that uses the covariance matrix of the data (shifted by the centroid). The Rayleigh quotient is minimized and maximized by the eigen vectors of the covariance matrix that correspond to its smallest and largest eigen values. If the data is 3D, the minimizing eigen vector will be the normal vector of the best fit plane, and the maximizing eigen vector will be the direction vector of the best fit line. Interestingly, the best fit line is on the best fit plane. In practice the vectors are computed using the Singular Value Decomposition (SVD). A good college text in linear algebra would have the needed information on the Rayleigh quotient and the SVD. This is admittedly complex, but it is the cleanest approach. The down side is that computing the SVD is not something to be done by hand, and it is not an easy task to program. The case of a best fit 2D line can be handled without all of this complexity. If the best fit line is written as (x - xbar) sin(theta) - (y - ybar) cos(theta) = 0, then the sum of squared distances can be minimized as a function of theta. The slope of the best fit line is then tan(minimizing_theta). This is a little easier than working with y = mx + b. There is a lot of detail beneath the surface here. I tried to only give enough for those who have the background and interest to pursue it. The problem is simple to describe, but not so simple to solve beyond the case of the 2D line. -Brian |
Search the Dr. Math Library: |
[Privacy Policy] [Terms of Use]
Ask Dr. Math^{TM}
© 1994- The Math Forum at NCTM. All rights reserved.
http://mathforum.org/dr.math/