Least Squares Regression for Quadratic Curve Fitting
Date: 02/27/2008 at 14:56:07
From: Rodo
Subject: Curve fitting

I have the following table of values:

   x     y
  31     0
  27    -1
  23    -3
  19    -5
  15    -7
  11   -10
   7   -15
   3   -25

I would like to find a function to interpolate all integer values of x between 0 and 31. I plotted the above values and got what looks like a square root curve. So I was thinking that maybe it's OK to interpolate the other settings from a fitted equation like P = a*sqrt(b*PA_LEVEL) + c. However, I was playing with the values and the equation, and I can't find how to calculate the values of a, b, and c. How can I calculate them? Using three values from the table, like (x,y) = (31,0), (19,-5), (3,-25), I was trying to solve for a, b, and c in the equation P = a*sqrt(b*PA_LEVEL) + c.
Date: 02/27/2008 at 18:18:51
From: Doctor Vogler
Subject: Re: Curve fitting

Hi Rodo,

Thanks for writing to Dr. Math. Actually, you only need one of a and b, because you can just move the "a" inside the square root, or pull the "b" out. It is more usual to invert the equation and solve for a, b, and c in

  y = a*x^2 + b*x + c.

The simplest kind of fitting is least-squares regression. Least-squares linear regression is very common, and least-squares quadratic regression is not very different. It gives a good approximation, and it has the very nice property that you can solve the equations once and then use these formulas for a, b, and c. See also

  Wikipedia: Least Squares
    http://en.wikipedia.org/wiki/Least_squares

and

  Wikipedia: Linear Least Squares
    http://en.wikipedia.org/wiki/Linear_least_squares

(Note that Wikipedia is using "linear" in the sense that a, b, and c are linear, whereas I was using "linear" and "quadratic" in the sense that x is linear or quadratic. Both of these cases are linear in the Wikipedia sense. Nonlinear in the Wikipedia sense would be something like y = a*cos(b*x), because the parameter b is inside the cosine.)

The idea is to choose the parabola that minimizes the sum of the squares of the vertical distances between your data points and your parabola. That is, you have n = 8 points (x_i, y_i) for i=1 to i=8, and you want the sum

  sum_{i=1}^{n} (a*x_i^2 + b*x_i + c - y_i)^2

to be as small as possible. (We square so that points below don't cancel points above. We use the square instead of the absolute value because it gives a nicer solution. This has the effect--which might be good or bad, depending on your problem--of making it more important to get faraway points reasonably close than to get nearby points right on.)
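To make the quantity being minimized concrete, here is a minimal Python sketch of that sum for a candidate parabola. The function name sum_squared_error and the two trial parabolas are made up for illustration; only the eight data points come from the question.

```python
def sum_squared_error(a, b, c, points):
    """Sum over all points of (a*x^2 + b*x + c - y)^2."""
    return sum((a * x**2 + b * x + c - y) ** 2 for x, y in points)


# The eight (x, y) points from the table in the question.
points = [(31, 0), (27, -1), (23, -3), (19, -5),
          (15, -7), (11, -10), (7, -15), (3, -25)]

# Two arbitrary trial parabolas (illustrative guesses, not fitted values).
# The one with the smaller sum is the better fit in the least-squares sense.
print(sum_squared_error(0.0, 1.0, -25.0, points))
print(sum_squared_error(0.0, 0.0, 0.0, points))
```

Least-squares regression looks for the a, b, and c that make this number as small as possible.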
Now we pull out our calculus tricks, and we solve as follows:

  (1) first multiply out the square
  (2) split the sum into lots of smaller sums
  (3) take derivatives with respect to the unknowns a, b, c
  (4) set these equal to zero and solve for a, b, and c

The result is a, b, and c given by formulas involving sums of values from your data points. Importantly, the form of the formulas does not depend on the data, so you can derive the formulas once and then just plug in the numbers when you have them.

So let's solve the general (linear) least-squares quadratic regression problem.

(1) Multiply out the square

  (a*x_i^2 + b*x_i + c - y_i)^2
    = a^2*x_i^4 + b^2*x_i^2 + c^2 + y_i^2
      + 2ab*x_i^3 + 2ac*x_i^2 + 2bc*x_i
      - 2a*x_i^2*y_i - 2b*x_i*y_i - 2c*y_i

(2) Split the sum

  sum_{i=1}^{n} (a*x_i^2 + b*x_i + c - y_i)^2
    = a^2 * sum_{i=1}^{n} x_i^4 + (b^2 + 2ac) * sum_{i=1}^{n} x_i^2 + c^2 * n
      + sum_{i=1}^{n} y_i^2 + 2ab * sum_{i=1}^{n} x_i^3 + 2bc * sum_{i=1}^{n} x_i
      - 2a * sum_{i=1}^{n} x_i^2*y_i - 2b * sum_{i=1}^{n} x_i*y_i - 2c * sum_{i=1}^{n} y_i

So now we'll use the notation Sjk to mean the sum of x_i^j*y_i^k. (Note that S00 = n, the number of data points you have.) Therefore, we can write the sum as

  a^2*S40 + (b^2 + 2ac)*S20 + c^2*S00 + S02
    + 2ab*S30 + 2bc*S10 - 2a*S21 - 2b*S11 - 2c*S01.

(3) Take derivatives

The local minimum for this function is going to be where the derivatives with respect to a, b, and c (treating the data points and therefore the sums Sjk as constants) are all zero.
The derivatives are:

  (with respect to a)  2a*S40 + 2c*S20 + 2b*S30 - 2*S21
  (with respect to b)  2b*S20 + 2a*S30 + 2c*S10 - 2*S11
  (with respect to c)  2a*S20 + 2c*S00 + 2b*S10 - 2*S01

(4) Solve

Now we solve the system of simultaneous equations

  2a*S40 + 2c*S20 + 2b*S30 - 2*S21 = 0
  2b*S20 + 2a*S30 + 2c*S10 - 2*S11 = 0
  2a*S20 + 2c*S00 + 2b*S10 - 2*S01 = 0

which we can also write in matrix notation (after dividing by 2 for simplification) as

  [ S40  S30  S20 ] [ a ]   [ S21 ]
  [ S30  S20  S10 ] [ b ] = [ S11 ]
  [ S20  S10  S00 ] [ c ]   [ S01 ]

Now we can use Cramer's Rule to give a, b, and c as formulas in these Sjk values. They all have the same denominator:

  a = (S01*S10*S30 - S11*S00*S30 - S01*S20^2
       + S11*S10*S20 + S21*S00*S20 - S21*S10^2)
    / (S00*S20*S40 - S10^2*S40 - S00*S30^2
       + 2*S10*S20*S30 - S20^3)

  b = (S11*S00*S40 - S01*S10*S40 + S01*S20*S30
       - S21*S00*S30 - S11*S20^2 + S21*S10*S20)
    / (S00*S20*S40 - S10^2*S40 - S00*S30^2
       + 2*S10*S20*S30 - S20^3)

  c = (S01*S20*S40 - S11*S10*S40 - S01*S30^2
       + S11*S20*S30 + S21*S10*S30 - S21*S20^2)
    / (S00*S20*S40 - S10^2*S40 - S00*S30^2
       + 2*S10*S20*S30 - S20^3)

Now all you have to do is take your data (your eight points) and evaluate the various sums

  Sj0 = sum_{i=1}^{n} x_i^j

for j = 0 through 4, and

  Sj1 = sum_{i=1}^{n} x_i^j*y_i

for j = 0 through 2. Then you substitute into the formulas for a, b, and c, and you are done!

If you have any questions about this or need more help, please write back and show me what you have been able to do, and I will try to offer further suggestions.

- Doctor Vogler, The Math Forum
  http://mathforum.org/dr.math/
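The recipe above (evaluate the sums, then substitute into the three formulas) can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the names power_sum and fit_quadratic are invented here, and the check for a zero denominator is an added safeguard that the reply does not discuss. The data are the eight points from the question.

```python
# The eight data points, copied from the table in the question.
xs = [31, 27, 23, 19, 15, 11, 7, 3]
ys = [0, -1, -3, -5, -7, -10, -15, -25]


def power_sum(xs, ys, j, k):
    """Return Sjk, the sum of x_i^j * y_i^k over all data points."""
    return sum(x**j * y**k for x, y in zip(xs, ys))


def fit_quadratic(xs, ys):
    """Return (a, b, c) for y = a*x^2 + b*x + c via the Cramer's Rule formulas."""
    S00 = power_sum(xs, ys, 0, 0)  # equals n, the number of points
    S10 = power_sum(xs, ys, 1, 0)
    S20 = power_sum(xs, ys, 2, 0)
    S30 = power_sum(xs, ys, 3, 0)
    S40 = power_sum(xs, ys, 4, 0)
    S01 = power_sum(xs, ys, 0, 1)
    S11 = power_sum(xs, ys, 1, 1)
    S21 = power_sum(xs, ys, 2, 1)

    # Common denominator: the determinant of the 3x3 matrix of sums.
    den = (S00*S20*S40 - S10**2*S40 - S00*S30**2
           + 2*S10*S20*S30 - S20**3)
    if den == 0:
        # Added safeguard (an assumption of this sketch, not from the reply).
        raise ValueError("the system has no unique solution for this data")

    a = (S01*S10*S30 - S11*S00*S30 - S01*S20**2
         + S11*S10*S20 + S21*S00*S20 - S21*S10**2) / den
    b = (S11*S00*S40 - S01*S10*S40 + S01*S20*S30
         - S21*S00*S30 - S11*S20**2 + S21*S10*S20) / den
    c = (S01*S20*S40 - S11*S10*S40 - S01*S30**2
         + S11*S20*S30 + S21*S10*S30 - S21*S20**2) / den
    return a, b, c


a, b, c = fit_quadratic(xs, ys)
print("a =", a, " b =", b, " c =", c)

# Evaluate the fitted parabola at every integer x from 0 to 31.
for x in range(32):
    print(x, a * x**2 + b * x + c)
```

Passing the two lists in the other order, fit_quadratic(ys, xs), fits x as a quadratic in y instead, using the same formulas.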
© 1994- The Math Forum at NCTM. All rights reserved.