Modelling Discrete Data with Exponential Functions
Date: 5/14/96 at 10:5:15
From: Anonymous
Subject: Modelling Discrete Data with Exponential Functions

My questions are a part of an assignment for 12th grade mathematics.

1) I am stuck trying to fit this exponential function to a set of three discrete data points, (5,55), (60,20), (150,1):

   g(x) = De^(kx) + E

where D, k, and E are real, arbitrary constants.

The assignment is based on modelling a function to pass through three discrete points. We are to assess how closely the model relates to other discrete points between the points that the model intercepts, e.g. (30,37), (45,27), (75,14), (90,10).

I can find g(x) to fit discrete data that have evenly spaced x-values, e.g. (15,50), (60,20), (105,6). To do it, I have used simultaneous equations to find a value for D, k, and E, but the method only works with evenly spaced data on the horizontal axis.

My method for finding a function to fit the unevenly spaced co-ordinates is as follows:

   55 = De^(5k) + E        (1)
   20 = De^(60k) + E       (2)
    1 = De^(150k) + E      (3)

Find E from (2)/(1):

   20 - E
   ------ = e^(55k)
   55 - E

   20 - E = 55e^(55k) - Ee^(55k)

   20 - 55e^(55k) = E(1 - e^(55k))

       20 - 55e^(55k)
   E = --------------      (4)
        1 - e^(55k)

I then find another expression for E in the same way, using (3)/(2):

        1 - 20e^(90k)
   E = --------------      (5)
        1 - e^(90k)

By equating these two expressions, and expanding and simplifying, I get:

   19 = 54e^(55k) - 35e^(145k)

Here, I am stuck. I am not sure if this is the kind of expression I should be looking for to take the natural logarithm of both sides, and then solve to find k. What's more, I don't know the steps to find the natural log of the above expression. When I try:

   ln(19) = ln(54) + 55k - (ln(35) + 145k)

I get different values for E when substituting k back into expressions (4) and (5). So I guess I'm not doing the log thing right. What should be my next step?

2) I would also appreciate some suggestions on how I should go about measuring "closeness". That is, how well g(x) fits the discrete data set, compared to the hyperbolic function:

            A
   f(x) = ----- + C
          x + B

Again, A, B, and C are real, arbitrary constants.

3) Is it appropriate to use integration, and compare the area under the straight line between discrete points to that of the function which passes through the points? Or is it more applicable to simply measure the vertical distance between discrete points and the model functions?

I would also appreciate any suggestions as to how I might extend the investigation in some way. If it helps, I have included the complete set of discrete data:

     x  :  y
   ----------
     5    55
    15    50
    30    37
    45    27
    60    20
    75    14
    90    10
   105     6
   150     1

The data must be modelled by f(x) and g(x).

4) Finally, does an algorithm exist to find f(x) and g(x) to fit more than three sets of discrete data? Is it possible, say, to find a model that passes through all the above co-ordinates?

Thank you very much for your time,
Rohan Wilson.
Date: 12/11/96 at 01:05:15
From: Doctor Rob
Subject: Re: Modelling Discrete Data with Exponential Functions

You are correct in your guess that you are not doing the log thing right. The formulae you use for logs involve only products, quotients, and powers. There is no simple formula for ln(a - b), which is the log of a difference.

Actually, I would deviate from your path at the point where you found that

   20 - E
   ------ = e^(55k)        (6)
   55 - E

and

    1 - E
   ------ = e^(90k)        (7)
   20 - E

At this point I would eliminate k, not E, by raising (6) to the 18th power and (7) to the 11th power [because 18 = 90/GCD(90,55) and 11 = 55/GCD(90,55)], so that we get:

   (20 - E)^18               (1 - E)^11
   ----------- = e^(990k) = -----------      (8)
   (55 - E)^18              (20 - E)^11

Now we can cross-multiply, expand, and gather like terms to obtain a polynomial equation for E:

   (20 - E)^29 = (1 - E)^11 (55 - E)^18

Unfortunately, although the E^29 terms cancel, the polynomial still has degree 28, so it has to be solved numerically. Fortunately it has only two real roots; the other 13 pairs of complex conjugate roots are spurious. I used Newton's method to approximate the two real roots. These I used to compute e^(55k) using (6), but only one gave a positive result. The other value of E I rejected as spurious. Then I found the value of k, and used (1) to find D. You can fill in the details. I got the approximate resulting values

   E = -4.9387,  k = -0.015944,  D = 64.913.

As a check, these values give g(150) = 1 to within rounding, as (3) requires.

2) There are two useful measures of how well a function fits points. The first is to sum over all the points the squares of the differences between the function values and the point values, then take the square root of the sum. The other is to sum over all the points the absolute values of the differences between the function values and the point values. Both can tell you how well a function fits. The smaller these measures are, the closer the fit. The one using squares will tend to make large deviations penalize you more than the one using absolute values.
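The steps above can be checked numerically. The following is only a minimal sketch, not the method used in the reply: instead of Newton's method on the expanded polynomial, it applies plain bisection to the logarithm of (8), searching only E < 1 (an assumption that keeps both ratios (6) and (7) positive). It then evaluates the two measures of fit at the intermediate points listed in the question. All function and variable names (`fit_three_points`, `root_sum_squares`, `sum_abs`, `CHECK_POINTS`) are illustrative.

```python
import math

# The three points the model g(x) = D*e^(k*x) + E must pass through.
P1, P2, P3 = (5, 55), (60, 20), (150, 1)

# Intermediate points from the question, used to judge the fit.
CHECK_POINTS = [(30, 37), (45, 27), (75, 14), (90, 10)]


def log_form_of_8(E):
    """90*ln(ratio 6) - 55*ln(ratio 7): zero exactly where (8) holds.

    This is the log of (8) scaled by GCD(90,55) = 5. Valid only for
    E < 1, where both ratios are positive.
    """
    r6 = (P2[1] - E) / (P1[1] - E)  # equals e^(55k)
    r7 = (P3[1] - E) / (P2[1] - E)  # equals e^(90k)
    return (P3[0] - P2[0]) * math.log(r6) - (P2[0] - P1[0]) * math.log(r7)


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def fit_three_points():
    """Return (D, k, E) so that g passes through P1, P2 and P3."""
    E = bisect(log_form_of_8, -20.0, 0.0)  # bracket chosen by inspection
    k = math.log((P2[1] - E) / (P1[1] - E)) / (P2[0] - P1[0])  # from (6)
    D = (P1[1] - E) / math.exp(k * P1[0])                      # from (1)
    return D, k, E


def g(x, D, k, E):
    return D * math.exp(k * x) + E


def root_sum_squares(model, points):
    """First measure: square root of the sum of squared differences."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in points))


def sum_abs(model, points):
    """Second measure: sum of absolute differences."""
    return sum(abs(model(x) - y) for x, y in points)


D, k, E = fit_three_points()


def model(x):
    return g(x, D, k, E)


print("D =", D, " k =", k, " E =", E)
print("values at the three fitted points:", [model(x) for x, _ in (P1, P2, P3)])
print("root of sum of squares:", root_sum_squares(model, CHECK_POINTS))
print("sum of absolute values:", sum_abs(model, CHECK_POINTS))
```

The same two measure functions can be applied to f(x) once A, B, and C are known, which gives a like-for-like comparison of the two models.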
Probably in a case like this, where the fit is rather good already, the two measures would yield the same decision as to which function was a better fit, but one would have to compute the actual numbers to be sure.

3) Areas are another measure of goodness of fit, provided you don't have a large positive area cancelling a large negative area. Probably an improvement would be to integrate the absolute value of the difference between the function and the piecewise linear function you describe in your question.

The fallacy in using the piecewise linear function is that you are giving weight in your decision process to parts of the x-axis where you actually have no information whatsoever about the function you are trying to approximate. There may be long intervals in which you have no points at all, yet they are contributing heavily to this integral measure. It is definitely better to use the vertical distances at the points you know.

Extending this investigation might be done in many ways. Try these measures of goodness-of-fit, and see whether they work to give you the same decision about which functions fit better. Try other collections of points to see if you can construct ones which work especially well or especially badly with each measure.

4) With only three free parameters, D, E, and k, if you had more than three conditions like (1), (2), and (3), you would have an overdetermined set of equations, and would not be able, in general, to make the function pass through all the points. This is similar to having three or more linear equations in two unknowns, which corresponds geometrically to three or more lines in the plane. A common solution would be a point through which all the lines pass. In general three or more lines are not concurrent, but you might get lucky!

Glad to help. Write back if we can do more.

-Doctor Rob, The Math Forum
 Check out our web site! http://mathforum.org/dr.math/