The Math Forum

Ask Dr. Math - Questions and Answers from our Archives
Associated Topics || Dr. Math Home || Search Dr. Math

Newton-Raphson Method

Date: 06/24/2009 at 10:17:12
From: William
Subject: Failure of Newton-Raphson

Are there any equations that cannot be solved using the Newton-Raphson
method (irrespective of the initial estimate)?  I thought 
0 = arctan(x) might not be solvable but one just needs to choose an
initial value within about 1.39 of the root.  I tried 0 = cube root of
x and it seems that Newton-Raphson always diverges away from the root
irrespective of the initial estimate (other than x = 0 of course).  Is
this correct?

I am struggling with proving algebraically that Newton-Rhapson 
always fails with 0 = cube root of x.  I would also like to solve what
the critical value (for the first estimate) is for solving arctan(x) =
0 using Newton-Raphson (i.e. when does it diverge away and when does 
it converge to the root). 

Date: 06/25/2009 at 04:46:29
From: Doctor Jacques
Subject: Re: Failure of Newton-Raphson

Hi William,

It is correct that you cannot solve x^(1/3) = 0 by Newton-Raphson: the
derivative of x^(1/3) is x^(-2/3)/3, and the iteration formula becomes:

  x' = x - x^(1/3)/(x^(-2/3)/3)
     = x - 3x
     = 2x

and it is clear that this diverges if you start with any non-zero
value of x.

I don't know if it is possible to give a precise set of rules for the
convergence of the N-R method; however, we can explore some related
ideas and try to shed some light on the subject.

It may be useful to study first another iterative method.  Assume that
we want to solve an equation of the form:

  x = g(x)               [1]

We might try to start with an initial guess x[0], and compute
successive approximations by the formula:

  x[n+1] = g(x[n])       [2]

If the sequence x[k] converges to some limit a, then a is a solution
of [1], i.e. a = g(a).  For example, if we want to solve the equation
x = sqrt(x) (near 1), and if we start with an initial guess of 2, this
procedure gives the sequence of values:


which converges (very slowly) to 1.  The problem is that the procedure
may not converge.  For example, if you try it with the equation:

  x = 2x - 1

(whose solution is x = 1), and start with an initial value close to 1,
like 1.1, you will notice that the sequence x[n] diverges.

The interesting fact about the equation [1] is that it has a nice
geometrical interpretation: each solution is an intersection of the
curve y = g(x) and the line y = x.  The algorithm [2] also has a
geometrical interpretation.

We start with an initial value x[0], and find the corresponding point
on the curve y = g(x).  From that point, we draw a horizontal line to
the 45 line y = x.  The x-coordinate of the intersection is the next
approximation x[1] (try it, it is not very easy to draw this in an
e-mail...).  You will see that, if the sequence converges, you get
either a staircase pattern or a "square spiral", depending on whether
g(x) is increasing or decreasing.  If the sequence diverges, you get
the same pattern in reverse, i.e., the points get further apart from
the solution.

To find a sufficient condition for convergence, we draw two lines at
(+/-) 45 from the solution point (a, g(a)).  One of these lines is 
y = x, the other one is perpendicular and has equation 
y - g(a) = -(x - a).  These lines divide the plane in four regions. We
call the left and right regions "good" and the top and bottom regions

The main result is that, if there is a neighborhood of a such that the
graph of y = g(x) is entirely contained in the "good" region in that
neighborhood, and if you start with an initial value in the
neighborhood, the sequence x[n] will converge to a.  Note that this is
only a sufficient condition: if the condition is not satisfied,
anything can happen.  (If there is a neighborhood of a such that the
graph of y = g(x) is entirely in the "bad" region in that
neighborhood, then the sequence cannot converge "smoothly" to a,
although you might still stumble on the solution by accident,
depending on the behavior of g(x) outside the neighborhood).

This can be proved rigorously (it is not very hard), but it can be
seen easily by looking at the graph.  In doing so, you will also
notice that the convergence will be faster if the graph of g is
closer to the horizontal.

Now, it is a standard result of analysis (a variation of Rolle's
theorem) that a _sufficient_ condition for the graph to be contained
in the good region is that there exists a positive real number L < 1,
such that |g'(x)| <= L in the neighborhood in question, where |g'(x)|
is the absolute value of the derivative of g(x).  (Note that it is not
enough to have |g'(x)| < 1 : L must be fixed).  To summarize:

  If there is a neighborhood of a such that |g'(a)| < L in that
  neighborhood, where L is positive and < 1, then the procedure [2]
  will converge to a for any initial value in the neighborhood in

Note that this also is merely a sufficient condition.  Using a
previous remark, you will also expect that the convergence will be
faster if L is smaller.

Given an equation:

  f(x) = 0              [3]

there may be many ways to convert it to the form [1]; some of these
ways will lead to convergent sequences, and some will lead to
divergent sequences; in addition, the speed of convergence may be
different depending on the form you select.  For example, the equation
x = sqrt(x) that we encountered previously can also be written as
x = x^2.  In this latter form, the algorithm [2] will never converge
to the root x = 1 : if the initial value is > 1, the sequence diverges
and if the initial value is < 1, the sequence converges to the other
root x = 0.

One possible way to convert an equation [3] to the form [1] is to
write it as:

  g(x) = x + a*f(x)        [4]

which means that we take g(x) = x + a*f(x) for some constant a.  As we
have seen, the convergence will be faster if we can make |g'(x)| as
small as possible in some neighborhood of the root.  We have:

  g'(x) = 1 + a*f'(x)   [5]

If the derivative f'(x) is approximately equal to some constant k in
the vicinity of the root, a good choice for a would be -1/k, since
this will make g' approximately 0.  If we guess k as f'(x), this gives
the iteration formula:

  x[n+1] = g(x[n])
         = x[n] + a*f(x[n])
         = x[n] - f(x[n])/f'(x[n])   [6]

which is Newton's formula (at least...).

We can now reverse the argument.  Starting with Newton's formula, we
write it in the form [2], with:

  g(x) = x - f(x)/f'(x)

Note that (at least, if f'(x) is not zero), any solution of x = g(x)
is a solution of f(x) = 0.

As we have seen, this will give a convergent sequence if (but not only
if) there is a neighborhood of the solution a and a positive
L < 1 such that |g'(x)| < L in that neighborhood.  We compute g'(x):

  g'(x) = 1 - ((f'(x)^2) - f(x)f"(x))/(f'(x)^2)
        = f(x)f"(x)/(f'(x)^2)         [7]

and the sequence will converge if there is a neighborhood of a such
that this is between -L and +L, for some L < 1.  If f' and f" are
sufficiently well-behaved, there is no problem finding such a
neighborhood, since, for x close to the root, f(x) will be very small.

The problem with the equation x^(1/3) = 0 is that we have:

  f(x) = x^(1/3)
  f'(x) = x^(-2/3)/3
  f"(x) = (-2/9)x^(-5/3)
  g'(x) = f(x)f"(x)/f'(x)^2 = -2

and this is < -1 for all x, which means that we cannot conclude that
the sequence is convergent.  In fact, as noted before, this implies
that the sequence will always be divergent, because the inequality
holds for all x.

Intuitively, this comes from the fact that f"(x) tends to infinity
faster than f'(x)^2.  We have:

  f"(x)/f'(x)^2 = -2x^(-1/3)

On the positive side, there are cases where such an approach can lead
to exceptionally fast convergence.  See, for example:

  An Iterative Method of Calculating Pi 

Please feel free to write back if you want to discuss this further.

- Doctor Jacques, The Math Forum 
Associated Topics:
College Calculus
High School Calculus
High School Functions

Search the Dr. Math Library:

Find items containing (put spaces between keywords):
Click only once for faster results:

[ Choose "whole words" when searching for a word like age.]

all keywords, in any order at least one, that exact phrase
parts of words whole words

Submit your own question to Dr. Math

[Privacy Policy] [Terms of Use]

Math Forum Home || Math Library || Quick Reference || Math Forum Search

Ask Dr. MathTM
© 1994- The Math Forum at NCTM. All rights reserved.