

Least Squares Regression for Quadratic Curve Fitting
Date: 02/27/2008 at 14:56:07
From: Rodo
Subject: Curve fitting
I have the following table of values:
   x     y
  31     0
  27    -1
  23    -3
  19    -5
  15    -7
  11   -10
   7   -15
   3   -25
I would like to find a function to interpolate all integer values of
x between 0 and 31. I plotted the values above and got what looks like
a square root curve, so I was thinking it might be OK to interpolate
the other settings from a fitted equation like
P = a*sqrt(b*PA_LEVEL)+c. However, after playing with the values and
the equation, I can't figure out how to calculate a, b, and c. How can
I calculate them? I was trying to solve for a, b, and c in the
equation P = a*sqrt(b*PA_LEVEL)+c using three points from the table,
such as (x,y) = (31,0), (19,-5), (3,-25).
Date: 02/27/2008 at 18:18:51
From: Doctor Vogler
Subject: Re: Curve fitting
Hi Rodo,
Thanks for writing to Dr. Math. Actually, you only need one of a and
b, because you can just move the "a" inside the square root, or pull
the "b" out. It is more usual to invert the equation and solve for a,
b, and c in
y = a*x^2 + b*x + c.
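Spelling out the inversion (my own elaboration of the remark above,
with x standing for PA_LEVEL and P for the output, and assuming
a^2*b is nonzero): squaring removes the square root and leaves x as a
quadratic function of P.

$$P - c = a\sqrt{b\,x} \;\Longrightarrow\; (P - c)^2 = a^2 b\,x \;\Longrightarrow\; x = \frac{1}{a^2 b}\,P^2 - \frac{2c}{a^2 b}\,P + \frac{c^2}{a^2 b}$$

After renaming the variables, this has the shape y = a*x^2 + b*x + c
with new constants. A general quadratic is slightly more flexible than
the inverted square-root model: writing the inverted model as
x = A*P^2 + B*P + C, its coefficients always satisfy B^2 = 4AC.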
The simplest kind of fitting is least-squares regression.
Least-squares linear regression is very common, and least-squares
quadratic regression is not very different. It gives a good
approximation, and it has the very nice property that you can solve
the equations once and then use the resulting formulas for a, b, and c.
See also
Wikipedia: Least Squares
http://en.wikipedia.org/wiki/Least_squares
and
Wikipedia: Linear Least Squares
http://en.wikipedia.org/wiki/Linear_least_squares
(Note that Wikipedia is using "linear" in the sense that the equation
is linear in the parameters a, b, and c, whereas I was using "linear"
and "quadratic" in the sense that the equation is linear or quadratic
in x. Both of these cases are linear in the Wikipedia sense. Nonlinear
in the Wikipedia sense would be something like y = a*cos(b*x), because
the parameter b is inside the cosine.)
The idea is to choose the parabola that minimizes the sum of the
squares of the vertical distances between your data points and your
parabola. That is, you have n = 8 points (x_i, y_i) for i=1 to i=8,
and you want the sum
$$\sum_{i=1}^{n} \left(a x_i^2 + b x_i + c - y_i\right)^2$$
to be as small as possible. (We square so that points below the curve
don't cancel points above it. We use the square instead of the
absolute value because the squared error is differentiable in a, b,
and c, so minimizing it leads to a system of linear equations. This
has the effect, which might be good or bad depending on your problem,
of making it more important to get faraway points reasonably close
than to get nearby points right on.)
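The quantity being minimized can be written as a small Python
function. This is a minimal sketch; the name sum_of_squares and the
list-of-tuples representation of the points are my own choices, not
anything from the discussion above. Every choice of a, b, c gets a
score, and the least-squares parabola is the choice with the smallest
score.

```python
def sum_of_squares(a, b, c, points):
    """Sum of squared vertical distances from each (x, y) point to y = a*x^2 + b*x + c."""
    total = 0
    for x, y in points:
        residual = a * x**2 + b * x + c - y  # signed vertical distance (above or below)
        total += residual**2                 # squaring stops opposite signs from cancelling
    return total


# Rodo's eight points, written as (x, y) pairs in the order of the table.
points = [(31, 0), (27, -1), (23, -3), (19, -5), (15, -7), (11, -10), (7, -15), (3, -25)]
print(sum_of_squares(0, 1, -25, points))  # score of one arbitrary candidate parabola
```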
Now we pull out our calculus tricks, and we solve as follows:
(1) first multiply out the square
(2) split the sum into lots of smaller sums
(3) take derivatives with respect to the unknowns a, b, c
(4) set these equal to zero and solve for a, b, and c
The result is a, b, and c given by formulas involving sums of values
from your data points. Importantly, the form of the formulas does not
depend on the particular data, so you can derive the formulas once and
then just plug in the numbers when you have them. So let's solve the
general (linear) least-squares quadratic regression problem.
(1) Multiply out the square
(a*x_i^2 + b*x_i + c - y_i)^2 =
a^2*x_i^4 + b^2*x_i^2 + c^2 + y_i^2 + 2ab*x_i^3 +
2ac*x_i^2 + 2bc*x_i - 2a*x_i^2*y_i - 2b*x_i*y_i - 2c*y_i
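One way to double-check the expansion in step (1) is to compare both
sides at random integer inputs; with Python's exact integers the two
sides must agree exactly. This is only a spot check, not a proof, and
the helper names are hypothetical.

```python
import random


def squared_residual(a, b, c, x, y):
    # Left-hand side of step (1): the square before multiplying out.
    return (a * x**2 + b * x + c - y) ** 2


def expanded(a, b, c, x, y):
    # Right-hand side of step (1), term by term.
    return (a**2 * x**4 + b**2 * x**2 + c**2 + y**2 + 2*a*b * x**3
            + 2*a*c * x**2 + 2*b*c * x - 2*a * x**2 * y - 2*b * x * y - 2*c * y)


random.seed(1)
for _ in range(1000):
    args = [random.randint(-50, 50) for _ in range(5)]  # a, b, c, x, y
    assert squared_residual(*args) == expanded(*args), args
print("expansion matches at 1000 random integer points")
```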
(2) Split the sum
$$\begin{aligned}
\sum_{i=1}^{n} \left(a x_i^2 + b x_i + c - y_i\right)^2 ={}& a^2 \sum_{i=1}^{n} x_i^4 + (b^2 + 2ac) \sum_{i=1}^{n} x_i^2 + c^2 n \\
&+ \sum_{i=1}^{n} y_i^2 + 2ab \sum_{i=1}^{n} x_i^3 + 2bc \sum_{i=1}^{n} x_i \\
&- 2a \sum_{i=1}^{n} x_i^2 y_i - 2b \sum_{i=1}^{n} x_i y_i - 2c \sum_{i=1}^{n} y_i
\end{aligned}$$
So now we'll use the notation Sjk to mean the sum of x_i^j*y_i^k.
(Note that S00 = n, the number of data points you have.) Therefore,
we can write the sum as
a^2*S40 + (b^2 + 2ac)*S20 + c^2*S00 + S02 + 2ab*S30
+ 2bc*S10 - 2a*S21 - 2b*S11 - 2c*S01.
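In code, the Sjk notation becomes a single helper, and the rewritten
sum can be checked against the direct definition. A minimal sketch;
S, sum_via_S, and sum_direct are hypothetical names, and the
coefficients in the check are arbitrary test values.

```python
def S(j, k, points):
    """S_jk = sum of x_i**j * y_i**k over all points (so S(0, 0, points) == n)."""
    return sum(x**j * y**k for x, y in points)


def sum_via_S(a, b, c, points):
    # The sum of squares written entirely in terms of the S_jk values.
    return (a**2 * S(4, 0, points) + (b**2 + 2*a*c) * S(2, 0, points)
            + c**2 * S(0, 0, points) + S(0, 2, points) + 2*a*b * S(3, 0, points)
            + 2*b*c * S(1, 0, points) - 2*a * S(2, 1, points)
            - 2*b * S(1, 1, points) - 2*c * S(0, 1, points))


def sum_direct(a, b, c, points):
    # The original definition, for comparison.
    return sum((a * x**2 + b * x + c - y) ** 2 for x, y in points)


points = [(31, 0), (27, -1), (23, -3), (19, -5), (15, -7), (11, -10), (7, -15), (3, -25)]
assert S(0, 0, points) == len(points)
assert sum_via_S(2, -3, 5, points) == sum_direct(2, -3, 5, points)
```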
(3) Take derivatives
The minimum of this function is where the derivatives with respect to
a, b, and c (treating the data points, and therefore the sums Sjk, as
constants) are all zero. Because the sum of squares is a convex
quadratic function of a, b, and c, this critical point is the global
minimum. The derivatives are:
(with respect to a)
2a*S40 + 2c*S20 + 2b*S30 - 2*S21
(with respect to b)
2b*S20 + 2a*S30 + 2c*S10 - 2*S11
(with respect to c)
2a*S20 + 2c*S00 + 2b*S10 - 2*S01
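The three derivative formulas can be spot-checked numerically. Because
the sum of squares is quadratic in each of a, b, and c, a central
difference with step 1 reproduces the derivative exactly:
f(a+1) - f(a-1) = 2*f'(a). The sketch below relies on that fact with
integer inputs, so the comparison is exact; the function names and
test values are hypothetical.

```python
def sum_of_squares(a, b, c, points):
    return sum((a * x**2 + b * x + c - y) ** 2 for x, y in points)


def gradient(a, b, c, points):
    """Partial derivatives of the sum of squares, using the formulas from step (3)."""
    def S(j, k):
        return sum(x**j * y**k for x, y in points)
    d_da = 2*a*S(4, 0) + 2*c*S(2, 0) + 2*b*S(3, 0) - 2*S(2, 1)
    d_db = 2*b*S(2, 0) + 2*a*S(3, 0) + 2*c*S(1, 0) - 2*S(1, 1)
    d_dc = 2*a*S(2, 0) + 2*c*S(0, 0) + 2*b*S(1, 0) - 2*S(0, 1)
    return d_da, d_db, d_dc


points = [(31, 0), (27, -1), (23, -3), (19, -5), (15, -7), (11, -10), (7, -15), (3, -25)]
a, b, c = 1, -2, 3  # arbitrary test coefficients
d_da, d_db, d_dc = gradient(a, b, c, points)
# Exact central differences (valid because the function is quadratic in each variable).
assert sum_of_squares(a + 1, b, c, points) - sum_of_squares(a - 1, b, c, points) == 2 * d_da
assert sum_of_squares(a, b + 1, c, points) - sum_of_squares(a, b - 1, c, points) == 2 * d_db
assert sum_of_squares(a, b, c + 1, points) - sum_of_squares(a, b, c - 1, points) == 2 * d_dc
```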
(4) Solve
Now we solve the system of simultaneous equations
2a*S40 + 2c*S20 + 2b*S30 - 2*S21 = 0
2b*S20 + 2a*S30 + 2c*S10 - 2*S11 = 0
2a*S20 + 2c*S00 + 2b*S10 - 2*S01 = 0
which we can also write in matrix notation (after dividing by 2 for
simplification) as
[ S40 S30 S20 ] [ a ] [ S21 ]
[ S30 S20 S10 ] [ b ] = [ S11 ]
[ S20 S10 S00 ] [ c ] [ S01 ]
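Before turning to Cramer's Rule, note that the matrix system can also
be solved numerically. The sketch below builds the 3x3 matrix from the
Sjk sums and solves it by Gauss-Jordan elimination, which is a
different (but equivalent) technique from the Cramer's Rule formulas;
exact Fraction arithmetic is my choice to avoid rounding, and all
names are hypothetical.

```python
from fractions import Fraction


def normal_matrix(points):
    """Build the 3x3 matrix and right-hand side of the system for (a, b, c)."""
    def S(j, k):
        return sum(Fraction(x) ** j * Fraction(y) ** k for x, y in points)
    M = [[S(4, 0), S(3, 0), S(2, 0)],
         [S(3, 0), S(2, 0), S(1, 0)],
         [S(2, 0), S(1, 0), S(0, 0)]]
    rhs = [S(2, 1), S(1, 1), S(0, 1)]
    return M, rhs


def solve(M, rhs):
    """Gauss-Jordan elimination on the augmented matrix [M | rhs]."""
    n = len(M)
    A = [list(row) + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and swap it into place.
        pivot = next((i for i in range(col, n) if A[i][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: need at least three distinct x values")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for i in range(n):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [u - factor * v for u, v in zip(A[i], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Points lying exactly on y = 2x^2 - 3x + 1, so the solution should be a=2, b=-3, c=1.
points = [(0, 1), (1, 0), (2, 3), (3, 10)]
print(solve(*normal_matrix(points)))
```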
Now we can use Cramer's Rule to give a, b, and c as formulas in these
Sjk values. They all have the same denominator:
a = (S01*S10*S30 - S11*S00*S30 - S01*S20^2
+ S11*S10*S20 + S21*S00*S20 - S21*S10^2)
/(S00*S20*S40 - S10^2*S40 - S00*S30^2 + 2*S10*S20*S30 - S20^3)
b = (S11*S00*S40 - S01*S10*S40 + S01*S20*S30
- S21*S00*S30 - S11*S20^2 + S21*S10*S20)
/(S00*S20*S40 - S10^2*S40 - S00*S30^2 + 2*S10*S20*S30 - S20^3)
c = (S01*S20*S40 - S11*S10*S40 - S01*S30^2
+ S11*S20*S30 + S21*S10*S30 - S21*S20^2)
/(S00*S20*S40 - S10^2*S40 - S00*S30^2 + 2*S10*S20*S30 - S20^3)
Now all you have to do is take your data (your eight points) and
evaluate the various sums
$$S_{j0} = \sum_{i=1}^{n} x_i^j$$
for j = 0 through 4, and
$$S_{j1} = \sum_{i=1}^{n} x_i^j y_i$$
for j = 0 through 2. Then you substitute into the formulas for a, b,
and c, and you are done!
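Putting it together, here is a sketch that evaluates the sums for
Rodo's eight points and substitutes them into the Cramer's Rule
formulas above. Which column plays the role of x_i is an assumption:
following the suggestion to invert the square-root model, it uses the
table's y values (the P values) as x_i and the table's x values (the
levels) as y_i. The sketch uses floating-point arithmetic, and
fit_quadratic is a hypothetical name.

```python
def fit_quadratic(points):
    """Least-squares a, b, c for y = a*x^2 + b*x + c, via the Cramer's Rule formulas."""
    def S(j, k):
        return float(sum(x**j * y**k for x, y in points))
    S00, S10, S20, S30, S40 = (S(j, 0) for j in range(5))
    S01, S11, S21 = (S(j, 1) for j in range(3))
    den = S00*S20*S40 - S10**2*S40 - S00*S30**2 + 2*S10*S20*S30 - S20**3
    if den == 0:
        raise ValueError("need at least three distinct x values")
    a = (S01*S10*S30 - S11*S00*S30 - S01*S20**2 + S11*S10*S20 + S21*S00*S20 - S21*S10**2) / den
    b = (S11*S00*S40 - S01*S10*S40 + S01*S20*S30 - S21*S00*S30 - S11*S20**2 + S21*S10*S20) / den
    c = (S01*S20*S40 - S11*S10*S40 - S01*S30**2 + S11*S20*S30 + S21*S10*S30 - S21*S20**2) / den
    return a, b, c


# Rodo's table as (level, P) pairs, reversed to (P, level) as described above.
table = [(31, 0), (27, -1), (23, -3), (19, -5), (15, -7), (11, -10), (7, -15), (3, -25)]
points = [(p, level) for level, p in table]
a, b, c = fit_quadratic(points)
print(f"level ~ {a:.6f}*P^2 + {b:.6f}*P + {c:.6f}")
```

To read off P for a given level, you would then solve this quadratic
for P, picking the root that lies in the range of the data; reversing
the pairs instead fits P directly as a quadratic in the level.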
If you have any questions about this or need more help, please write
back and show me what you have been able to do, and I will try to
offer further suggestions.
- Doctor Vogler, The Math Forum
http://mathforum.org/dr.math/


Ask Dr. Math™
© 1994-2008 The Math Forum