Modelling Discrete Data with Exponential Functions

```
Date: 5/14/96 at 10:5:15
From: Anonymous
Subject: Modelling Discrete Data with Exponential Functions

My questions are a part of an assignment for 12th grade mathematics.

1) I am stuck trying to fit this exponential function to a set of
three discrete data points; (5,55), (60,20), (150,1):

g(x) = De^(kx) + E     where D, k, and E are real, arbitrary constants.

The assignment is based on modelling a function to pass through three
discrete points. We are to assess how closely the model relates to
other discrete points between the points that the model intercepts.
eg: (30,37) (45,27) (75,14) (90,10).

I can find g(x) to fit discrete data that have evenly spaced x-values,
eg:  (15,50)  (60,20)  (105,6). To do it, I have used simultaneous
equations to find a value for D,k, and E, but the method only works
with evenly spaced data on the horizontal axis.

My method for finding a function to fit the unevenly spaced
co-ordinates is as follows:

55 = De^(5k) + E             (1)

20 = De^(60k) + E            (2)

 1 = De^(150k) + E           (3)

Find E from (2)/(1):

20 - E
------  =  e^(55k)
55 - E

20 - E = 55e^(55k) - Ee^(55k)

20 - 55e^(55k) = E(1 - e^(55k))

      20 - 55e^(55k)
E  =  --------------                (4)
        1 - e^(55k)

I then find another expression for E in the same way, using (3)/(2):

      1 - 20e^(90k)
E  =  -------------                 (5)
       1 - e^(90k)

By equating these two expressions, and expanding and simplifying, I
get:

19 = 54e^(55k) - 35e^(145k)

Here, I am stuck. I am not sure if this is the kind of expression I
should be looking for to take the natural logarithm of both sides, and
then solve to find k.  What's more, I don't know the
steps to find the natural log of the above expression. When I try:

ln(19) = ln(54) + 55k - (ln(35) + 145k)

I get different values for E, when substituting k back into
expressions (4) and (5). So I guess I'm not doing the log thing right.
What should be my next step?

2) I would also appreciate some suggestions on how I should go about
measuring "closeness". That is, how well g(x) fits the discrete data
set, compared to the hyperbolic function:

          A
f(x) = ------- + C
        x + B

Again, A, B, and C are real, arbitrary constants.

3) Is it appropriate to use integration, and compare the area under
the straight line between discrete points to that of the function
which passes through the points?  Or is it more applicable to simply
measure the vertical distance between discrete points and the model
functions? I would also appreciate any suggestions as to how I might
extend the investigation in some way.

If it helps, I have included the complete set of discrete data:

  x     y
----------
  5    55
 15    50
 30    37
 45    27
 60    20
 75    14
 90    10
105     6
150     1

The data must be modelled by f(x) and g(x).

4) Finally, does an algorithm exist to find f(x) and g(x) to fit more
than three sets of discrete data? Is it possible, say, to find a
model that passes through all the above co-ordinates?

Thank you very much for your time,

Rohan Wilson.
```
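
For the evenly spaced case mentioned in the question, the simultaneous equations have a closed-form solution: with equal spacing h, the ratios (y1-E)/(y0-E) and (y2-E)/(y1-E) both equal e^(kh), so (y1-E)^2 = (y0-E)(y2-E), which is linear in E. The sketch below is only an illustration of that idea, not the method used in the assignment; the function name `fit_evenly_spaced` is made up for this example.

```python
import math

def fit_evenly_spaced(p0, p1, p2):
    """Fit g(x) = D*e^(k*x) + E through three points with equal x-spacing.

    Hypothetical helper for illustration only.
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    h = x1 - x0
    if not math.isclose(x2 - x1, h):
        raise ValueError("x-values must be evenly spaced")
    # (y1-E)^2 = (y0-E)(y2-E) is linear in E once the E^2 terms cancel.
    E = (y0 * y2 - y1 ** 2) / (y0 + y2 - 2 * y1)
    ratio = (y1 - E) / (y0 - E)      # this ratio equals e^(k*h)
    k = math.log(ratio) / h
    D = (y0 - E) / math.exp(k * x0)
    return D, k, E

# The evenly spaced example from the question.
D, k, E = fit_evenly_spaced((15, 50), (60, 20), (105, 6))
print(D, k, E)
```

The cancellation of the E^2 terms relies on the equal spacing; with spacings of 55 and 90 the two ratios are different powers of e^k, which is why the same trick does not carry over to (5,55), (60,20), (150,1).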

```
Date: 12/11/96 at 01:05:15
From: Doctor Rob
Subject: Re: Modelling Discrete Data with Exponential Functions

You are correct in your guess that you are not doing the log thing
right.  The formulae you use for logs involve only products,
quotients, and powers.  There is no simple formula for ln(a - b),
which is the log of a difference.

Actually, I would deviate from your path at the point where you found
that

20 - E
------  =  e^(55k)         (6)
55 - E

and

 1 - E
------  =  e^(90k)         (7)
20 - E

At this point I would eliminate k, not E, by raising (6) to the 18th
power, and (7) to the 11th power [because 18 = 90/GCD(90,55) and
11 = 55/GCD(90,55)], so that we get:

(20-E)^18                (1-E)^11
---------  =  e^(990k) = ---------         (8)
(55-E)^18                (20-E)^11

Now we can cross-multiply, expand, and gather like terms to obtain a
polynomial equation for E:

   (20-E)^29 = (1-E)^11 (55-E)^18

The E^29 terms cancel, so the polynomial has degree 28. Fortunately it
has only two real roots; the other 13 pairs of complex conjugate roots
are spurious. Newton's method will approximate the two real roots. Use
each one to compute e^(55k) from (6): only the root with E < 1 gives a
positive result. The other root lies between 20 and 55, makes e^(55k)
negative, and must be rejected as spurious. Then find the value of k
from (6), and use (1) to find D.

You can fill in the details.  The approximate resulting values are
E = -4.939, k = -0.01594, D = 64.91. As a check, substituting these
into (3) gives g(150) = 1.00, as required.

2) There are two useful measures of how well a function fits points.
The first is to sum, over all the points, the squares of the
differences between the function values and the point values, then
take the square root of the sum. The other is to sum, over all the
points, the absolute values of those differences. The smaller either
measure is, the closer the fit. The one using squares penalizes large
deviations more heavily than the one using absolute values. In a case
like this, where the fit is already rather good, they would probably
yield the same decision as to which function is the better fit, but
one would have to compute the actual numbers to be sure.

3) Areas are another measure of goodness of fit, provided you don't
have a large positive area cancelling a large negative area. An
improvement would probably be to integrate the absolute value of the
difference between the function and the piecewise linear function you
describe in your question.

The fallacy in using the piecewise linear function is that you are
giving weight in your decision process to parts of the x-axis where
you actually have no information whatsoever about the function you
are trying to approximate. There may be long intervals in which you
have no points at all, yet they are contributing heavily to this
integral measure. It is definitely better to use the vertical
distances at the points you know.

Extending this investigation might be done in many ways. Try these
measures of goodness-of-fit, and see whether they work to give you the
same decision about which functions fit better.  Try other collections
of points to see if you can construct ones which work especially well
or especially badly with each measure.

4) With only three free parameters, D, E, and k, if you had more than
three conditions like (1), (2), and (3), you would have an
overdetermined set of equations, and would not be able, in general, to
make the function pass through all the points. This is similar to
having three or more linear equations in two unknowns, which
corresponds geometrically to three or more lines in the plane. A
common solution would be a point through which all the lines pass. In
general three or more lines are not concurrent, but you might get
lucky!

Glad to help.  Write back if we can do more.

-Doctor Rob,  The Math Forum
Check out our web site!  http://mathforum.org/dr.math/
```
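
The elimination of k in (6) and (7), and the two vertical-distance measures from part 2, can be sketched numerically as follows. This is an illustration under stated assumptions rather than the procedure used in the answer: it replaces the degree-raising step and Newton's method with bisection on the condition that (6) and (7) imply the same k, and it assumes the wanted root lies in the bracket -25 < E < 1, which is specific to these three points. The names `k_mismatch` and `solve_E` are made up for this example.

```python
import math

# Full data set from the question, and the three points g must pass through.
DATA = [(5, 55), (15, 50), (30, 37), (45, 27), (60, 20),
        (75, 14), (90, 10), (105, 6), (150, 1)]
P1, P2, P3 = (5, 55), (60, 20), (150, 1)

def k_mismatch(E):
    """k implied by (6) minus k implied by (7); zero at a valid E."""
    (x1, y1), (x2, y2), (x3, y3) = P1, P2, P3
    k_a = math.log((y2 - E) / (y1 - E)) / (x2 - x1)
    k_b = math.log((y3 - E) / (y2 - E)) / (x3 - x2)
    return k_a - k_b

def solve_E(lo=-25.0, hi=1.0 - 1e-9, iterations=200):
    """Bisection on k_mismatch. The bracket is an assumption for this data."""
    f_lo = k_mismatch(lo)
    if f_lo * k_mismatch(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = k_mismatch(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

E = solve_E()
k = math.log((P2[1] - E) / (P1[1] - E)) / (P2[0] - P1[0])   # from (6)
D = (P1[1] - E) / math.exp(k * P1[0])                       # from (1)

def g(x):
    return D * math.exp(k * x) + E

# Vertical distances at the known points, as recommended in part 3.
residuals = [g(x) - y for x, y in DATA]
root_sum_squares = math.sqrt(sum(r * r for r in residuals))
sum_abs = sum(abs(r) for r in residuals)

print("D, k, E:", D, k, E)
print("g at the three fitted points:", [g(x) for x, _ in (P1, P2, P3)])
print("root of sum of squares:", root_sum_squares)
print("sum of absolute values:", sum_abs)
```

Printing g at the three fitted points is a quick substitution check on any candidate values of D, k, and E. The same two measures can be computed for f(x) once A, B, and C are known, and the smaller value indicates the closer fit.
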
Associated Topics:
High School Calculus
