CS227-Scientific Computing Lecture 6: Nonlinear Equations
A Financial Problem You invest $100 a month in an interest-bearing account. You make 60 deposits, and one month after the last deposit (5 years after the first deposit) you withdraw the money in the account. You would like to have $8000. What must the rate of interest on the account be in order to achieve this?
A Financial Problem Let's solve the problem in general, with
r : monthly rate of interest
d : monthly deposit
n : number of periods
M : goal
A Financial Problem Account balance after 0 months: d. Account balance after 1 month (before the second deposit): (1+r)d. Account balance after 2 months (before the third deposit): (1+r)^2 d + (1+r)d. Account balance after n months:

d((1+r)^n + (1+r)^(n-1) + ... + (1+r)) = d(1+r) sum_{j=0}^{n-1} (1+r)^j = (geometric series) d(1+r)[(1+r)^n - 1]/r.
A Financial Problem So we have a nonlinear equation:

d(1+r)[(1+r)^n - 1]/r = M.

With our original values d = 100, n = 60, M = 8000, and dividing through by 100, we have to solve the nonlinear equation

(1+r)[(1+r)^60 - 1]/r - 80 = 0.
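As a sanity check on the closed form, one can simulate the account month by month and compare. A Python sketch (the function names are mine, not from the course materials):

```python
def balance_closed_form(d, r, n):
    # The closed form derived above: d(1+r)((1+r)^n - 1)/r
    return d * (1 + r) * ((1 + r) ** n - 1) / r

def balance_simulated(d, r, n):
    # Deposit d at the start of each month; interest accrues once per month.
    b = 0.0
    for _ in range(n):
        b = (b + d) * (1 + r)
    return b

print(balance_closed_form(100, 0.01, 60))
print(balance_simulated(100, 0.01, 60))
```

The two agree to rounding error, which confirms the geometric-series derivation.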
Truncation error and rate of convergence of iterative methods The solutions to linear equations and linear systems can be expressed exactly as sums, products and quotients of their coefficients. All the error present in calculating solutions this way is due to roundoff error. In contrast, nonlinear equations usually cannot have their solutions expressed this way. Instead, solution methods typically generate a sequence of approximations that converge to the root. So in addition to the roundoff error, there is also truncation error: If you cut the iterative process off after s steps, how close are you to the correct answer? How fast does the process converge to the root?
A simple idea: Bisection Theorem (Intermediate Value Theorem). If f is a continuous function throughout the interval a ≤ x ≤ b, and f(a)f(b) < 0, then there is some c, with a < c < b, such that f(c) = 0.
Bisection Method So if we start with an interval [a, b] that brackets a root of f, we can split it in two by computing the midpoint c = (a+b)/2. One of the two subintervals [a, c] or [c, b] will bracket a root, and we can continue subdividing until the width of the bracketing interval is as small as we desire. In the figure (not reproduced here), each step replaces the bracketing interval by whichever half still brackets the root.
Rate of Convergence of Bisection After k steps, we have the root trapped between two numbers that are (b - a)/2^k apart, so we get roughly one additional bit of precision for each evaluation of the function f. As long as we start with a pair of numbers that brackets a root, and as long as f is continuous, this is foolproof. Note that we have not discussed how to find brackets (sometimes a plot, or careful thinking about the function itself, can help). There is a risk that the initial interval has more than one root of f. And the rate of convergence is rather slow.
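Since the bracket halves at each step, the number of steps needed to shrink it below a tolerance tol is ceil(log2((b - a)/tol)). A one-line check (Python, illustrative):

```python
import math

def bisection_steps(a, b, tol):
    # Number of halvings needed so that (b - a) / 2**k <= tol.
    return math.ceil(math.log2((b - a) / tol))

print(bisection_steps(0.0, 1.0, 1e-6))       # 20 halvings for a unit interval
print(bisection_steps(0.001, 0.33, 1e-12))   # 39 for the interest-rate bracket
```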
Implementation of Bisection The bisection function posted on the course website shows a robust implementation in MATLAB. Note, first of all, the use of function handles as arguments, as well as the flexibility that allows us to apply this to a function of several variables, fixing the values of all but the first variable and solving for the first. Since evaluating the function f is likely to be the most time-consuming step, the code is written so that f is evaluated only once in each pass through the while loop. Note also that the function returns a lot of information: the final values of the endpoints of the bracketing interval, as well as the values of f at these endpoints. You can, of course, just call it with a single output argument and get an approximation to the root.
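The course's MATLAB source is not reproduced here; the following is a minimal Python sketch of the same design — a single f-evaluation per pass and multiple return values — with the extra fixed-argument feature omitted:

```python
def bisection(f, a, b, tol=1e-12, maxit=200):
    """Find a root of f in [a, b]; f(a) and f(b) must differ in sign.

    Returns the final bracketing endpoints and the values of f there,
    mirroring the multiple outputs described above.
    """
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(maxit):
        if b - a <= tol:
            break
        c = 0.5 * (a + b)
        fc = f(c)            # the only f-evaluation in each pass
        if fa * fc <= 0:
            b, fb = c, fc    # the root lies in [a, c]
        else:
            a, fa = c, fc    # the root lies in [c, b]
    return a, b, fa, fb

# e.g. the cube root of 2:
a, b, fa, fb = bisection(lambda x: x**3 - 2, 1.0, 2.0)
print(a)
```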
Implementation of Bisection Here is the function bisection applied to our interest rate problem. The depositor is putting in a total of $6000. If he earned 33% interest in the last month, he would get the desired $8000 in one period, so we can use 0.33 as an upper bound. We might like to use 0 as a lower bound, but our function is written in a form that is not defined at r = 0, so we need to use a very small positive value — let's say 0.001 (one tenth of one percent interest per month, which is surely too small).

>> F=@(r,d,n,M)d*(1+r)*((1+r)^n-1)/r-M;
>> [x,y,r,s]=bisection(F,0.001,0.33,100,60,8000)
x = 0.009072994445666
y = 0.009072994445666
r = -1.809894456528127e-10
s = 0.910827217623591e-11

So the answer is about 0.91% monthly interest, which is close to 11% annual interest.
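The same computation can be cross-checked outside MATLAB. A Python sketch with an inline bisection loop, using the same function F and the same bracket [0.001, 0.33]:

```python
def F(r, d=100, n=60, M=8000):
    # d(1+r)((1+r)^n - 1)/r - M, the balance equation from the slides
    return d * (1 + r) * ((1 + r) ** n - 1) / r - M

a, b = 0.001, 0.33          # F(a) < 0 < F(b), so the bracket is valid
for _ in range(60):
    c = 0.5 * (a + b)
    if F(a) * F(c) <= 0:
        b = c
    else:
        a = c
r = 0.5 * (a + b)
print(r)                    # about 0.009073, i.e. 0.91% monthly interest
```

(For brevity this loop re-evaluates F at the endpoint each pass; the course implementation avoids that.)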
Newton's Method The pretty idea here is to guess a value that is close to the root, then follow the tangent line at that point until it crosses the x-axis. This should give a closer approximation to the root, and we can iterate the procedure.
Newton's Method Call the initial guess x_0, and the subsequent approximations x_1, x_2, etc. The equation of the tangent line to the graph of f at (x_i, f(x_i)) is

y = f'(x_i)(x - x_i) + f(x_i),

so, setting y = 0 at x = x_{i+1}, we have

0 = f'(x_i)(x_{i+1} - x_i) + f(x_i),

or

x_{i+1} = x_i - f(x_i)/f'(x_i).
Example For instance, let us try to find a solution to the 5th-degree polynomial equation x^5 + 2x - 2 = 0. We begin with a plot, which suggests a starting guess of x_0 = 1.
Example-continued The Newton's method iteration is

x_{i+1} = x_i - (x_i^5 + 2x_i - 2)/(5x_i^4 + 2).

Let's try this out:

>> G=@(x)x-((x^5+2*x-2)/(5*x^4+2));
>> x=1;
>> x=G(x)
x = 0.857142857142857
>> x=G(x)
x = 0.819484893762504
>> x=G(x)
x = 0.817476251723243
>> x=G(x)
x = 0.817471019036304
>> x=G(x)
x = 0.817471019000967
>> x=G(x)
x = 0.817471019000967
Example-continued In successive iterations we get 1, 2, 5, 9 correct decimal digits, and the answer stabilizes after the 5th iterate. So the convergence is very fast.
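The same iteration can be reproduced in Python (a sketch, not the course code):

```python
def G(x):
    # One Newton step for f(x) = x**5 + 2*x - 2.
    return x - (x**5 + 2*x - 2) / (5*x**4 + 2)

x = 1.0
for _ in range(8):
    x = G(x)
print(x)   # about 0.817471019000967, matching the MATLAB session
```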
Rate of convergence of Newton's Method How closely does the tangent line to the graph of f at a approximate f(x) for x close to a? Taylor's Theorem (for degree 2):

f(x) = f(a) + f'(a)(x - a) + (f''(c)/2)(x - a)^2

for some c between a and x. So if we take x to be a root x* of f and a = x_i, we get

0 = f(x_i) + f'(x_i)(x* - x_i) + (f''(c)/2)(x* - x_i)^2,

so

x_{i+1} - x* = (f''(c)/(2f'(x_i))) (x_i - x*)^2.
Rate of convergence of Newton's Method Roughly speaking, this means that if ε_i represents the absolute error at the i-th iteration, then

ε_{i+1} ≈ (|f''(x*)| / (2|f'(x*)|)) ε_i^2.

So, if you start out close enough to x*, the number of correct digits roughly doubles at each iteration. (Quadratic convergence.) This is very fast. But there are lots of caveats: If you don't start out close enough to a root, the iterates may fail to converge altogether. If f'(x*) is zero, or close to zero, the convergence may be very slow. Furthermore, the method requires you to know the derivative of f. If the values of f are only tabulated, this may be unavailable. Even if it is available, you have to evaluate both f and f' at each iteration, which means that there is extra work to do.
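For the example f(x) = x^5 + 2x - 2 above, the constant |f''(x*)|/(2|f'(x*)|) is about 1.29, and the ratios ε_{i+1}/ε_i^2 of successive Newton errors approach it. A quick numerical check (a Python sketch; the "exact" root is taken to be the final Newton iterate):

```python
def f(x):   return x**5 + 2*x - 2
def df(x):  return 5*x**4 + 2
def d2f(x): return 20*x**3

# Newton iterates starting from x0 = 1
xs = [1.0]
for _ in range(6):
    xs.append(xs[-1] - f(xs[-1]) / df(xs[-1]))
root = xs[-1]

errs = [abs(x - root) for x in xs[:5]]                   # errors at iterates 0..4
ratios = [errs[i + 1] / errs[i] ** 2 for i in range(4)]  # should approach C
C = abs(d2f(root)) / (2 * abs(df(root)))                 # about 1.29
print(ratios[-1], C)
```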
Rapidly convergent methods that do not require the derivative. Secant method: Start out with two guesses x_0 and x_1 that bracket the root. At each subsequent step, set x_{i+1} to be the point where the line segment joining the points (x_{i-1}, f(x_{i-1})) and (x_i, f(x_i)) crosses the x-axis:

x_{i+1} = x_i - f(x_i)(x_i - x_{i-1})/(f(x_i) - f(x_{i-1})).
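A sketch of the secant iteration in Python (illustrative; it reuses the earlier example f(x) = x^5 + 2x - 2, with a guard against dividing by zero once the iterates have converged):

```python
def f(x):
    return x**5 + 2*x - 2

# Secant method: replace Newton's tangent line by the line through
# the two most recent points.
x0, x1 = 0.0, 1.0                # f(0) = -2 and f(1) = 1 bracket the root
for _ in range(20):
    f0, f1 = f(x0), f(x1)
    if f1 == f0 or x1 == x0:     # converged to machine precision
        break
    x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
print(x1)
```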
Rapidly convergent methods that do not require the derivative When things are working right, the rate of convergence of the secant method is much faster than linear, but not as fast as quadratic. There are some of the same issues as with Newton's method: Poor choices of initial values can take you farther and farther from the root.
Rapidly convergent methods that do not require the derivative The industrial-strength method used in MATLAB combines several strategies. For most rapid convergence it keeps track of the three previous points x_{i-2}, x_{i-1}, x_i. It then uses quadratic interpolation to find a parabola through the three points (x_{i-2}, f(x_{i-2})), (x_{i-1}, f(x_{i-1})), (x_i, f(x_i)). The catch is, it does this backwards, interchanging the roles of the x- and y-coordinates. So the resulting parabola is oriented with its axis parallel to the x-axis, and thus intersects the x-axis at one point, which is x_{i+1}. This is called inverse quadratic interpolation. In cases where inverse quadratic interpolation won't work (e.g., two of the y-coordinates are the same), or the method appears to be wandering farther from a root, the algorithm will take a step of the secant method or bisection instead.
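The core inverse quadratic interpolation step can be sketched directly: interpolate x as a function of y through the three points and evaluate at y = 0. This is only the interpolation step (my own illustrative code, again on f(x) = x^5 + 2x - 2); the safeguards that fall back to secant or bisection are omitted:

```python
def f(x):
    return x**5 + 2*x - 2

def iqi_step(f, x0, x1, x2):
    # Lagrange interpolation with the roles of x and y interchanged,
    # evaluated at y = 0; the result is the next iterate.
    y0, y1, y2 = f(x0), f(x1), f(x2)
    return (x0 * y1 * y2 / ((y0 - y1) * (y0 - y2))
            + x1 * y0 * y2 / ((y1 - y0) * (y1 - y2))
            + x2 * y0 * y1 / ((y2 - y0) * (y2 - y1)))

x0, x1, x2 = 0.5, 1.0, 0.9
for _ in range(8):
    if len({f(x0), f(x1), f(x2)}) < 3:   # IQI needs three distinct y-values
        break
    x0, x1, x2 = x1, x2, iqi_step(f, x0, x1, x2)
print(x2)
```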
fzero The basic syntax is

x = fzero(function_handle, guess)
x = fzero(function_handle, [left_bracket, right_bracket])

but as usual, there are many options. For instance, type

x = fzero(function_handle, guess, optimset('Display','iter'))

to get an idea of what fzero is doing.