Best Curve Fit
Date: 01/30/2003 at 18:22:47
From: Brian
Subject: Finding the equation of a series of points on a graph

I have a series of points on a graph that form a piecewise equation that "steps" down as x increases. I need to create an equation from these points that approximates values through the step, i.e. a natural log graph. I have run across several questions that present this type of scenario.

For example: My piecewise equation is representative of a "tiered" billing system, where, for example, 0-5000 units cost x dollars, 5001-10000 units cost y dollars (where y < x), 10001-20000 units cost j ... and so forth.

The question is, is it possible to come up with an equation that will approximate the cost of x units at any given point based on the number of units?

I hope this helps to explain it, and I appreciate your help in advance.

Thanks!
Brian
Date: 01/30/2003 at 20:02:39
From: Doctor Tom
Subject: Re: Finding the equation of a series of points on a graph

Hi Brian,

The problem you are solving has the general name "regression." For example, if you wanted to find the best-fitting straight line to your points, the problem would be one of "linear regression," and so on.

In general, you have to commit ahead of time to the sort of curve (or straight line) to which you're trying to fit your data. Then what you usually try to do is minimize the error between the points and the curve by adjusting the curve parameters.

For a linear regression, this is easy, since any line (except a vertical one) has the equation y = mx + b. You want to adjust m and b to minimize the total error. If you were trying to fit an exponential like y = e^(ax), you only have one parameter, a, that you adjust to minimize the error. If you want to fit the best possible parabola (opening up or down), then y = ax^2 + bx + c, and you can adjust a, b and c to minimize your error.

Let me show you how this is done for a straight line; the same method works for other curves, with some modification depending on the family of curves in which you seek a best fit.

The line's equation is y = mx + b. Suppose your points are (x1, y1), (x2, y2), ... (xn, yn). Then if you put x1 into the equation, you get y1* = mx1 + b. Your error for the first point is the difference between y1* and y1, and similarly for the other points.

The usual error measure takes (y1* - y1)^2, the square of the vertical distance between your data point and the point lying exactly on the curve y = mx + b. It's squared for a couple of reasons: (1) it's never negative, and (2) it has nice differentiability properties. This is called a "least squares fit."

I can thus write the error as (y1 - mx1 - b)^2 for the first data point. The error due to the second point is (y2 - mx2 - b)^2, and so on. You obtain the total error by adding the error terms for all n of your points. Call the total error E, or, better, E(m, b), since E depends on m and b.
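As a minimal sketch of the total error just described, the following Python function adds up the squared vertical distances for a candidate line. The function name total_squared_error and the sample points are hypothetical, chosen only for illustration; they are not from the original question.

```python
def total_squared_error(points, m, b):
    """Return E(m, b): the sum of (y - m*x - b)^2 over all data points."""
    return sum((y - m * x - b) ** 2 for x, y in points)


# Hypothetical sample data lying exactly on the line y = 2x + 1.
points = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]

# A line through every point has zero total error.
exact = total_squared_error(points, 2.0, 1.0)

# A poorer candidate line gives a larger total error.
worse = total_squared_error(points, 1.5, 1.0)
```

Trying different values of m and b by hand and comparing the results shows what "adjusting the parameters to minimize the error" means in practice.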
Now, you want to minimize E(m, b) by finding suitable values for m and b. Well, that's easy: just differentiate E with respect to m and set the result to zero. Similarly, differentiate E with respect to b and set that to zero. The common solution for m and b is the pair (m, b) that minimizes the error and gives the best straight-line fit.

It may look frightening at first to take derivatives, but remember: all of the x1, x2, ... and y1, y2, ... are just constants, so their derivatives are zero. Only m and b vary.

To fit the best parabola, the same method works, but the error term would look like E(a, b, c). You'd have to differentiate with respect to each of the three variables and set all of those to zero. The common solution (you will have three equations and three unknowns) will be a, b, c such that the parabola y = ax^2 + bx + c minimizes the error between itself and your data points.

- Doctor Tom, The Math Forum
  http://mathforum.org/dr.math/
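For the straight-line case, setting the two derivatives of E(m, b) to zero gives two linear equations in m and b:

    m*Sum(x^2) + b*Sum(x) = Sum(x*y)
    m*Sum(x)   + b*n      = Sum(y)

The Python sketch below solves this pair directly. It is one possible way to carry out the method, not a definitive implementation; the function name fit_line and the sample data are hypothetical.

```python
def fit_line(points):
    """Least squares fit of y = m*x + b, solving dE/dm = 0 and dE/db = 0."""
    n = len(points)
    sx = sum(x for x, _ in points)        # Sum(x)
    sy = sum(y for _, y in points)        # Sum(y)
    sxx = sum(x * x for x, _ in points)   # Sum(x^2)
    sxy = sum(x * y for x, y in points)   # Sum(x*y)

    denom = n * sxx - sx * sx
    if denom == 0:
        # All x values are equal (or there are no points): the best line
        # would be vertical, which y = m*x + b cannot represent.
        raise ValueError("need at least two distinct x values")

    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    return m, b


# Hypothetical data points, for illustration only.
points = [(1, 2.0), (2, 3.0), (3, 5.0), (4, 4.0)]
m, b = fit_line(points)
```

The parabola case follows the same pattern, except that three derivatives produce three linear equations in a, b and c.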
© 1994- The Math Forum at NCTM. All rights reserved.