Ask Dr. Math - Questions and Answers from our Archives

Least Squares Regression for Quadratic Curve Fitting

Date: 02/27/2008 at 14:56:07
From: Rodo
Subject: Curve fitting

I have the following table of values
 x      y
31      0
27     -1
23     -3
19     -5
15     -7
11    -10
 7    -15
 3    -25

I would like to find a function to interpolate all integer values
of x between 0 and 31.  I plotted the above values and got what looks
like a square root curve.  So I was thinking that maybe it's OK to
interpolate the other settings from a fitted equation like P =
a*sqrt(b*PA_LEVEL)+c.

However, I was playing with the values and the equation and I can't
figure out how to calculate the values of a, b, and c.  How can I
calculate them?

Using 3 values from the table like (x,y)=(31,0),(19,-5),(3,-25) I was
trying to solve for a, b, c using the equation P = a*sqrt(b*PA_LEVEL)+c.



Date: 02/27/2008 at 18:18:51
From: Doctor Vogler
Subject: Re: Curve fitting

Hi Rodo,

Thanks for writing to Dr. Math.  Actually, you only need one of a and
b, because you can just move the "a" inside the square root, or pull
the "b" out.  It is more usual to invert the equation and solve for a,
b, and c in

  y = a*x^2 + b*x + c.

The simplest kind of fitting is least-squares regression.  
Least-squares linear regression is very common, and least-squares
quadratic regression is not very different.  It gives a good
approximation, and it has the very nice property that you can solve
the equations once and then reuse the resulting formulas for a, b, and c.

See also

  Wikipedia: Least Squares
    http://en.wikipedia.org/wiki/Least_squares 

and

  Wikipedia: Linear Least Squares
    http://en.wikipedia.org/wiki/Linear_least_squares 

(Note that Wikipedia is using "linear" in the sense that a, b, and c
are linear, whereas I was using "linear" and "quadratic" in the sense
that x is linear or quadratic.  Both of these cases are linear in the
Wikipedia sense.  Nonlinear in the Wikipedia sense would be something
like y = a*cos(b*x), because the parameter b is inside the cosine.)
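
To make the distinction concrete, here is a small Python sketch (the
function names and the trial numbers are my own, chosen only for
illustration).  A model that is linear in its parameters satisfies
f(p + q) = f(p) + f(q) when you add two parameter sets; the cosine
model does not.

```python
import math

def quadratic(params, x):
    """y = a*x^2 + b*x + c: linear in the parameters a, b, c."""
    a, b, c = params
    return a * x**2 + b * x + c

def cosine(params, x):
    """y = a*cos(b*x): NOT linear, because b sits inside the cosine."""
    a, b = params
    return a * math.cos(b * x)

def add(p, q):
    """Add two parameter tuples component by component."""
    return tuple(u + v for u, v in zip(p, q))

x = 1.0  # arbitrary sample point

# Quadratic model: adding parameter sets adds the predictions.
p, q = (1.0, 2.0, 3.0), (0.5, -1.0, 4.0)
print(quadratic(add(p, q), x), quadratic(p, x) + quadratic(q, x))

# Cosine model: the two numbers printed here differ.
r, s = (1.0, 1.0), (1.0, 1.0)
print(cosine(add(r, s), x), cosine(r, x) + cosine(s, x))
```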

The idea is to choose the parabola that minimizes the sum of the
squares of the vertical distances between your data points and your
parabola.  That is, you have n = 8 points (x_i, y_i) for i=1 to i=8,
and you want the sum

   n
  sum (a*x_i^2 + b*x_i + c - y_i)^2
  i=1

to be as small as possible.  (We square so that points below don't
cancel points above.  We use the square instead of the absolute value
because it gives a nicer solution.  This has the effect--which might
be good or bad, depending on your problem--of making it more important 
to get faraway points reasonably close than to get nearby points right 
on.)
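
As a rough illustration, the quantity being minimized can be computed
like this in Python.  The function name is my own, and the two trial
parabolas are arbitrary guesses, not fitted values.

```python
# Data points from the table in the question.
xs = [31, 27, 23, 19, 15, 11, 7, 3]
ys = [0, -1, -3, -5, -7, -10, -15, -25]

def sum_squared_error(a, b, c):
    """Sum over all data points of (a*x^2 + b*x + c - y)^2."""
    return sum((a * x**2 + b * x + c - y) ** 2 for x, y in zip(xs, ys))

# Two arbitrary trial curves; the smaller result is the better fit
# in the least-squares sense.
print(sum_squared_error(0.0, 0.0, 0.0))    # the flat line y = 0
print(sum_squared_error(0.0, 1.0, -30.0))  # the straight line y = x - 30
```

Least-squares regression looks for the a, b, and c that make this
number as small as it can possibly be.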

Now we pull out our calculus tricks, and we solve as follows:

  (1) first multiply out the square
  (2) split the sum into lots of smaller sums
  (3) take derivatives with respect to the unknowns a, b, c
  (4) set these equal to zero and solve for a, b, and c

The result is a, b, and c given by formulas involving sums of values
from your data points.  Importantly, the form of the formulas does not
depend on the particular data, so you can derive the formulas once and
then just plug in the numbers when you have them.  So let's solve the
general (linear) least-squares quadratic regression problem.

(1) Multiply out the square

  (a*x_i^2 + b*x_i + c - y_i)^2 =
    a^2*x_i^4 + b^2*x_i^2 + c^2 + y_i^2 + 2ab*x_i^3 + 
    2ac*x_i^2 + 2bc*x_i - 2a*x_i^2*y_i - 2b*x_i*y_i - 2c*y_i

(2) Split the sum

   n
  sum (a*x_i^2 + b*x_i + c - y_i)^2 =
  i=1

       n                       n
  a^2 sum x_i^4 + (b^2 + 2ac) sum x_i^2 + c^2 * n
      i=1                     i=1

     n               n               n
  + sum y_i^2 + 2ab sum x_i^3 + 2bc sum x_i
    i=1             i=1             i=1

        n                  n                n
  - 2a sum x_i^2*y_i - 2b sum x_i*y_i - 2c sum y_i
       i=1                i=1              i=1

So now we'll use the notation Sjk to mean the sum of x_i^j*y_i^k. 
(Note that S00 = n, the number of data points you have.)  Therefore,
we can write the sum as

  a^2*S40 + (b^2 + 2ac)*S20 + c^2*S00 + S02 + 2ab*S30
     + 2bc*S10 - 2a*S21 - 2b*S11 - 2c*S01.
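
If you want to convince yourself that nothing was lost in steps (1)
and (2), a short Python check along these lines should do.  The helper
names S, direct, and expanded are my own, and the trial coefficients
are arbitrary integers so that the arithmetic stays exact.

```python
# Data points from the table in the question.
xs = [31, 27, 23, 19, 15, 11, 7, 3]
ys = [0, -1, -3, -5, -7, -10, -15, -25]

def S(j, k):
    """Sjk = sum of x_i^j * y_i^k over the data points."""
    return sum(x**j * y**k for x, y in zip(xs, ys))

def direct(a, b, c):
    """The original sum of squared vertical distances."""
    return sum((a * x**2 + b * x + c - y) ** 2 for x, y in zip(xs, ys))

def expanded(a, b, c):
    """The same sum, rewritten in terms of the Sjk values."""
    return (a**2 * S(4, 0) + (b**2 + 2*a*c) * S(2, 0) + c**2 * S(0, 0)
            + S(0, 2) + 2*a*b * S(3, 0) + 2*b*c * S(1, 0)
            - 2*a * S(2, 1) - 2*b * S(1, 1) - 2*c * S(0, 1))

# Arbitrary integer trial coefficients (not fitted values).
for trial in [(1, -2, 3), (0, 1, -30), (-2, 5, 7)]:
    print(trial, direct(*trial) == expanded(*trial))
```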

(3) Take derivatives

The minimum of this function is going to be where the
derivatives with respect to a, b, and c (treating the data points and
therefore the sums Sjk as constants) are all zero.  The derivatives are:

(with respect to a)

  2a*S40 + 2c*S20 + 2b*S30 - 2*S21

(with respect to b)

  2b*S20 + 2a*S30 + 2c*S10 - 2*S11

(with respect to c)

  2a*S20 + 2c*S00 + 2b*S10 - 2*S01
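
One way to double-check these three derivatives is to compare them
with central differences.  Because the sum is quadratic in each of a,
b, and c, a central difference gives the derivative exactly, so with
exact fractions the two should agree.  This is only a sanity-check
sketch; the names E, gradient, and central_difference are my own.

```python
from fractions import Fraction

# Data points from the table in the question.
xs = [31, 27, 23, 19, 15, 11, 7, 3]
ys = [0, -1, -3, -5, -7, -10, -15, -25]

def S(j, k):
    """Sjk = sum of x_i^j * y_i^k over the data points."""
    return sum(x**j * y**k for x, y in zip(xs, ys))

def E(a, b, c):
    """Sum of squared vertical distances for the parabola (a, b, c)."""
    return sum((a * x**2 + b * x + c - y) ** 2 for x, y in zip(xs, ys))

def gradient(a, b, c):
    """The three derivative formulas, in the order (a, b, c)."""
    return (2*a*S(4, 0) + 2*c*S(2, 0) + 2*b*S(3, 0) - 2*S(2, 1),
            2*b*S(2, 0) + 2*a*S(3, 0) + 2*c*S(1, 0) - 2*S(1, 1),
            2*a*S(2, 0) + 2*c*S(0, 0) + 2*b*S(1, 0) - 2*S(0, 1))

def central_difference(a, b, c, h=Fraction(1, 2)):
    """(E(p + h) - E(p - h)) / (2h) in each parameter, computed exactly."""
    return ((E(a + h, b, c) - E(a - h, b, c)) / (2 * h),
            (E(a, b + h, c) - E(a, b - h, c)) / (2 * h),
            (E(a, b, c + h) - E(a, b, c - h)) / (2 * h))

# Arbitrary integer trial point (not the fitted values).
print(gradient(1, -2, 3) == central_difference(1, -2, 3))
```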

(4) Solve

Now we solve the system of simultaneous equations

  2a*S40 + 2c*S20 + 2b*S30 - 2*S21 = 0
  2b*S20 + 2a*S30 + 2c*S10 - 2*S11 = 0
  2a*S20 + 2c*S00 + 2b*S10 - 2*S01 = 0

which we can also write in matrix notation (after dividing by 2 and
moving the terms without a, b, or c to the right-hand side) as

  [ S40  S30  S20 ] [ a ]   [ S21 ]
  [ S30  S20  S10 ] [ b ] = [ S11 ]
  [ S20  S10  S00 ] [ c ]   [ S01 ]
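
If you would rather let a program solve this system than work out
closed formulas, here is one possible sketch using Gauss-Jordan
elimination with exact fractions.  Note that this swaps in elimination
as the solution method, and solve3 is my own helper, not a library
function.

```python
from fractions import Fraction

# Data points from the table in the question.
xs = [31, 27, 23, 19, 15, 11, 7, 3]
ys = [0, -1, -3, -5, -7, -10, -15, -25]

def S(j, k):
    """Sjk = sum of x_i^j * y_i^k over the data points."""
    return sum(x**j * y**k for x, y in zip(xs, ys))

def solve3(M, rhs):
    """Solve a 3x3 system exactly by Gauss-Jordan elimination.

    Raises StopIteration if the matrix is singular.
    """
    A = [[Fraction(v) for v in row] + [Fraction(r)]
         for row, r in zip(M, rhs)]
    for col in range(3):
        # Bring a row with a nonzero entry in this column into place.
        pivot = next(r for r in range(col, 3) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        # Clear this column in every other row.
        for r in range(3):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [A[r][3] for r in range(3)]

M = [[S(4, 0), S(3, 0), S(2, 0)],
     [S(3, 0), S(2, 0), S(1, 0)],
     [S(2, 0), S(1, 0), S(0, 0)]]
rhs = [S(2, 1), S(1, 1), S(0, 1)]

a, b, c = solve3(M, rhs)
print(float(a), float(b), float(c))
```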

Now we can use Cramer's Rule to give a, b, and c as formulas in these
Sjk values.  They all have the same denominator:

  a = (S01*S10*S30 - S11*S00*S30 - S01*S20^2
       + S11*S10*S20 + S21*S00*S20 - S21*S10^2)
    /(S00*S20*S40 - S10^2*S40 - S00*S30^2 + 2*S10*S20*S30 - S20^3)

  b = (S11*S00*S40 - S01*S10*S40 + S01*S20*S30
       - S21*S00*S30 - S11*S20^2 + S21*S10*S20)
    /(S00*S20*S40 - S10^2*S40 - S00*S30^2 + 2*S10*S20*S30 - S20^3)

  c = (S01*S20*S40 - S11*S10*S40 - S01*S30^2
       + S11*S20*S30 + S21*S10*S30 - S21*S20^2)
    /(S00*S20*S40 - S10^2*S40 - S00*S30^2 + 2*S10*S20*S30 - S20^3)

Now all you have to do is take your data (your eight points) and
evaluate the various sums

         n
  Sj0 = sum x_i^j
        i=1

for j = 0 through 4, and

         n
  Sj1 = sum x_i^j*y_i
        i=1

for j = 0 through 2.  Then you substitute into the formulas for a, b,
and c, and you are done!
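
Putting it all together, the whole recipe might look like this in
Python.  This is only a sketch: fit_quadratic is my own name for the
helper, and exact fractions are used so that rounding does not get in
the way.  The last loop evaluates the fitted parabola at every integer
x from 0 through 31, which is what the question asked for.

```python
from fractions import Fraction

def fit_quadratic(xs, ys):
    """Return (a, b, c) for the least-squares parabola y = a*x^2 + b*x + c,
    using the Cramer's Rule formulas in terms of the sums Sjk."""
    def S(j, k):
        return sum(x**j * y**k for x, y in zip(xs, ys))

    S00, S10, S20, S30, S40 = (S(j, 0) for j in range(5))
    S01, S11, S21 = (S(j, 1) for j in range(3))

    den = (S00*S20*S40 - S10**2*S40 - S00*S30**2
           + 2*S10*S20*S30 - S20**3)
    num_a = (S01*S10*S30 - S11*S00*S30 - S01*S20**2
             + S11*S10*S20 + S21*S00*S20 - S21*S10**2)
    num_b = (S11*S00*S40 - S01*S10*S40 + S01*S20*S30
             - S21*S00*S30 - S11*S20**2 + S21*S10*S20)
    num_c = (S01*S20*S40 - S11*S10*S40 - S01*S30**2
             + S11*S20*S30 + S21*S10*S30 - S21*S20**2)
    return (Fraction(num_a, den), Fraction(num_b, den),
            Fraction(num_c, den))

# Data points from the table in the question.
xs = [31, 27, 23, 19, 15, 11, 7, 3]
ys = [0, -1, -3, -5, -7, -10, -15, -25]

a, b, c = fit_quadratic(xs, ys)
print("a =", float(a), " b =", float(b), " c =", float(c))

# Evaluate the fitted parabola at every integer x from 0 through 31.
for x in range(32):
    print(x, round(float(a * x**2 + b * x + c), 2))
```

The formulas need at least three different x values; otherwise the
denominator is zero and the division fails.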

If you have any questions about this or need more help, please write
back and show me what you have been able to do, and I will try to
offer further suggestions.

- Doctor Vogler, The Math Forum
  http://mathforum.org/dr.math/ 
Associated Topics:
College Statistics

Ask Dr. Math™
© 1994-2013 The Math Forum
http://mathforum.org/dr.math/