Date: 06/24/2009 at 10:17:12 From: William Subject: Failure of Newton-Raphson Are there any equations that cannot be solved using the Newton-Raphson method (irrespective of the initial estimate)? I thought 0 = arctan(x) might not be solvable but one just needs to choose an initial value within about 1.39 of the root. I tried 0 = cube root of x and it seems that Newton-Raphson always diverges away from the root irrespective of the initial estimate (other than x = 0 of course). Is this correct? I am struggling with proving algebraically that Newton-Rhapson always fails with 0 = cube root of x. I would also like to solve what the critical value (for the first estimate) is for solving arctan(x) = 0 using Newton-Raphson (i.e. when does it diverge away and when does it converge to the root).
Date: 06/25/2009 at 04:46:29 From: Doctor Jacques Subject: Re: Failure of Newton-Raphson Hi William, It is correct that you cannot solve x^(1/3) = 0 by Newton-Raphson: the derivative of x^(1/3) is x^(-2/3)/3, and the iteration formula becomes: x' = x - x^(1/3)/(x^(-2/3)/3) = x - 3x = 2x and it is clear that this diverges if you start with any non-zero value of x. I don't know if it is possible to give a precise set of rules for the convergence of the N-R method; however, we can explore some related ideas and try to shed some light on the subject. It may be useful to study first another iterative method. Assume that we want to solve an equation of the form: x = g(x)  We might try to start with an initial guess x, and compute successive approximations by the formula: x[n+1] = g(x[n])  If the sequence x[k] converges to some limit a, then a is a solution of , i.e. a = g(a). For example, if we want to solve the equation x = sqrt(x) (near 1), and if we start with an initial guess of 2, this procedure gives the sequence of values: 2.0000 1.4142 1.1892 1.0905 1.0443 which converges (very slowly) to 1. The problem is that the procedure may not converge. For example, if you try it with the equation: x = 2x - 1 (whose solution is x = 1), and start with an initial value close to 1, like 1.1, you will notice that the sequence x[n] diverges. The interesting fact about the equation  is that it has a nice geometrical interpretation: each solution is an intersection of the curve y = g(x) and the line y = x. The algorithm  also has a geometrical interpretation. We start with an initial value x, and find the corresponding point on the curve y = g(x). From that point, we draw a horizontal line to the 45° line y = x. The x-coordinate of the intersection is the next approximation x (try it, it is not very easy to draw this in an e-mail...). You will see that, if the sequence converges, you get either a staircase pattern or a "square spiral", depending on whether g(x) is increasing or decreasing. If the sequence diverges, you get the same pattern in reverse, i.e., the points get further apart from the solution. To find a sufficient condition for convergence, we draw two lines at (+/-) 45° from the solution point (a, g(a)). One of these lines is y = x, the other one is perpendicular and has equation y - g(a) = -(x - a). These lines divide the plane in four regions. We call the left and right regions "good" and the top and bottom regions "bad". The main result is that, if there is a neighborhood of a such that the graph of y = g(x) is entirely contained in the "good" region in that neighborhood, and if you start with an initial value in the neighborhood, the sequence x[n] will converge to a. Note that this is only a sufficient condition: if the condition is not satisfied, anything can happen. (If there is a neighborhood of a such that the graph of y = g(x) is entirely in the "bad" region in that neighborhood, then the sequence cannot converge "smoothly" to a, although you might still stumble on the solution by accident, depending on the behavior of g(x) outside the neighborhood). This can be proved rigorously (it is not very hard), but it can be seen easily by looking at the graph. In doing so, you will also notice that the convergence will be faster if the graph of g is closer to the horizontal. Now, it is a standard result of analysis (a variation of Rolle's theorem) that a _sufficient_ condition for the graph to be contained in the good region is that there exists a positive real number L < 1, such that |g'(x)| <= L in the neighborhood in question, where |g'(x)| is the absolute value of the derivative of g(x). (Note that it is not enough to have |g'(x)| < 1 : L must be fixed). To summarize: If there is a neighborhood of a such that |g'(a)| < L in that neighborhood, where L is positive and < 1, then the procedure  will converge to a for any initial value in the neighborhood in question. Note that this also is merely a sufficient condition. Using a previous remark, you will also expect that the convergence will be faster if L is smaller. Given an equation: f(x) = 0  there may be many ways to convert it to the form ; some of these ways will lead to convergent sequences, and some will lead to divergent sequences; in addition, the speed of convergence may be different depending on the form you select. For example, the equation x = sqrt(x) that we encountered previously can also be written as x = x^2. In this latter form, the algorithm  will never converge to the root x = 1 : if the initial value is > 1, the sequence diverges and if the initial value is < 1, the sequence converges to the other root x = 0. One possible way to convert an equation  to the form  is to write it as: g(x) = x + a*f(x)  which means that we take g(x) = x + a*f(x) for some constant a. As we have seen, the convergence will be faster if we can make |g'(x)| as small as possible in some neighborhood of the root. We have: g'(x) = 1 + a*f'(x)  If the derivative f'(x) is approximately equal to some constant k in the vicinity of the root, a good choice for a would be -1/k, since this will make g' approximately 0. If we guess k as f'(x), this gives the iteration formula: x[n+1] = g(x[n]) = x[n] + a*f(x[n]) = x[n] - f(x[n])/f'(x[n])  which is Newton's formula (at least...). We can now reverse the argument. Starting with Newton's formula, we write it in the form , with: g(x) = x - f(x)/f'(x) Note that (at least, if f'(x) is not zero), any solution of x = g(x) is a solution of f(x) = 0. As we have seen, this will give a convergent sequence if (but not only if) there is a neighborhood of the solution a and a positive L < 1 such that |g'(x)| < L in that neighborhood. We compute g'(x): g'(x) = 1 - ((f'(x)^2) - f(x)f"(x))/(f'(x)^2) = f(x)f"(x)/(f'(x)^2)  and the sequence will converge if there is a neighborhood of a such that this is between -L and +L, for some L < 1. If f' and f" are sufficiently well-behaved, there is no problem finding such a neighborhood, since, for x close to the root, f(x) will be very small. The problem with the equation x^(1/3) = 0 is that we have: f(x) = x^(1/3) f'(x) = x^(-2/3)/3 f"(x) = (-2/9)x^(-5/3) g'(x) = f(x)f"(x)/f'(x)^2 = -2 and this is < -1 for all x, which means that we cannot conclude that the sequence is convergent. In fact, as noted before, this implies that the sequence will always be divergent, because the inequality holds for all x. Intuitively, this comes from the fact that f"(x) tends to infinity faster than f'(x)^2. We have: f"(x)/f'(x)^2 = -2x^(-1/3) On the positive side, there are cases where such an approach can lead to exceptionally fast convergence. See, for example: An Iterative Method of Calculating Pi http://mathforum.org/library/drmath/view/65244.html Please feel free to write back if you want to discuss this further. - Doctor Jacques, The Math Forum http://mathforum.org/dr.math/
Search the Dr. Math Library:
Ask Dr. MathTM
© 1994-2013 The Math Forum