Solving Nonlinear Equations

Learning objectives

Set up a problem with one parameter
Solve a problem with one parameter

Root of a Function

Consider a function $f : \mathbb{R} \to \mathbb{R}$. The point $x \in \mathbb{R}$ is called a root of $f$ if $f(x) = 0$.

Solution of an Equation

Finding the values of $x$ for which $f(x) = 0$ is useful for many applications, but a more general task is to find the values of $x$ for which $f(x) = y$. The same techniques used to find the root of a function can be used to solve an equation by manipulating the function like so:

$$\tilde{f}(x) = f(x) - y = 0$$

The new function $\tilde{f}(x)$ has a root at the solution to the original equation $f(x) = y$.
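
As a small illustration of this shift, the sketch below builds $\tilde{f}$ from a function and a target value $y$. The helper name `shifted` and the choice of $f(x) = x^3 - x - 1$ with $y = 5$ are assumptions made only for this example.

```python
def f(x):
    # Example function, assumed here for illustration
    return x * x * x - x - 1


def shifted(func, y):
    """Return f_tilde, where f_tilde(x) = func(x) - y."""
    def f_tilde(x):
        return func(x) - y
    return f_tilde


f_tilde = shifted(f, 5.0)

# f(2) = 8 - 2 - 1 = 5, so x = 2 solves f(x) = 5 and is a root of f_tilde
print(f_tilde(2.0))  # 0.0
```

Any root-finding routine can then be applied to `f_tilde` in place of `f`.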

Definition of Jacobian Matrix

Given $\boldsymbol{f} : \mathbb{R}^n \to \mathbb{R}^n$ we define the Jacobian matrix $\mathbf{J}_f$ as:

$$\mathbf{J}_f(\boldsymbol{x}) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \ldots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \ldots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}$$
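
To make the definition concrete, consider the hypothetical function $\boldsymbol{f}(x_1, x_2) = (x_1^2 + x_2, \; x_1 x_2)$, which is not from the text above and is chosen only for illustration. Its Jacobian has rows $(2 x_1, 1)$ and $(x_2, x_1)$. The sketch below compares that hand-derived matrix with a forward-difference approximation; the names `jacobian_exact` and `jacobian_fd` and the step size `eps` are assumptions.

```python
def f(x):
    # Hypothetical example: f(x1, x2) = (x1^2 + x2, x1 * x2)
    x1, x2 = x
    return [x1 * x1 + x2, x1 * x2]


def jacobian_exact(x):
    # Entry (i, j) is the partial derivative of f_i with respect to x_j
    x1, x2 = x
    return [[2 * x1, 1.0],
            [x2, x1]]


def jacobian_fd(func, x, eps=1e-6):
    """Approximate the Jacobian column by column with forward differences."""
    fx = func(x)
    n = len(x)
    J = [[0.0] * n for _ in range(len(fx))]
    for j in range(n):
        x_step = list(x)
        x_step[j] += eps          # perturb only the j-th input
        f_step = func(x_step)
        for i in range(len(fx)):
            J[i][j] = (f_step[i] - fx[i]) / eps
    return J


point = [1.0, 2.0]
print(jacobian_exact(point))
print(jacobian_fd(f, point))
```

The two printed matrices should agree to several decimal places; the finite-difference version is only an approximation.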

Solving One Equation

Linear equations are trivial to solve, as are quadratic equations if you have the quadratic formula memorized. However, equations involving polynomials of higher degree or non-polynomial functions are much more difficult to solve. The simplest approach for these types of equations is to use an iterative root-finding technique.

We will try out the following techniques using the function:

$$f(x) = x^3 - x - 1$$

Bisection Method

The bisection method is the simplest root-finding technique.

Algorithm

The algorithm for bisection is analogous to binary search:

Take two points, $a$ and $b$, on each side of the root such that $f(a)$ and $f(b)$ have opposite signs.
Calculate the midpoint $c = \frac{a + b}{2}$.
Evaluate $f(c)$ and use $c$ to replace either $a$ or $b$, keeping the signs of the endpoints opposite.

With this algorithm we halve the length of the interval known to contain the root at each iteration. We can repeat this process until the length of the interval is less than the tolerance to which we want to know the root.
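
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a library implementation: the function name `bisection`, the default tolerance, and the iteration cap are assumptions. Note that the function values at the endpoints are stored and reused, so each pass through the loop evaluates `f` only once.

```python
def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Approximate a root of f in [a, b], assuming f(a) and f(b) differ in sign."""
    fa, fb = f(a), f(b)
    if fa == 0:
        return a
    if fb == 0:
        return b
    if (fa < 0) == (fb < 0):
        raise ValueError("f(a) and f(b) must have opposite signs")

    for _ in range(max_iter):
        c = (a + b) / 2              # midpoint of the current interval
        if abs(b - a) < tol:         # interval is short enough
            return c
        fc = f(c)                    # the only new evaluation this iteration
        if fc == 0:
            return c
        if (fa < 0) != (fc < 0):     # sign change lies in [a, c]
            b, fb = c, fc
        else:                        # sign change lies in [c, b]
            a, fa = c, fc
    return (a + b) / 2


def f(x):
    return x * x * x - x - 1


root = bisection(f, 1.0, 2.0)
print(root)  # close to 1.3247 for this f
```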

Computational Cost

Conceptually, the bisection method uses 2 function evaluations at each iteration. However, at each step one of $a$ or $b$ stays the same. So, at each iteration (after the first), one of $f(a)$ or $f(b)$ was already computed during the previous iteration. Therefore, the bisection method requires only one new function evaluation per iteration. Depending on how costly the function is to evaluate, this can be a significant cost savings.

Convergence

Bisection method has linear convergence, with a constant of 1/2.
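
Because each iteration halves the bracketing interval, its length after $n$ iterations is $(b - a)/2^n$. The sketch below uses this to estimate how many iterations are needed to shrink the interval to at most a given tolerance. The function name `bisection_iterations` and the tolerance values are illustrative assumptions.

```python
import math


def bisection_iterations(a, b, tol):
    """Smallest n such that (b - a) / 2**n <= tol."""
    return max(0, math.ceil(math.log2(abs(b - a) / tol)))


# Starting from [1, 2], about 20 halvings bring the interval below 1e-6,
# since 2**20 = 1048576 is the first power of two above 1e6.
print(bisection_iterations(1.0, 2.0, 1e-6))  # 20
```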

Drawbacks

The bisection method requires us to know a little about our function. Specifically, $f(x)$ must be continuous and we must have an interval $[a, b]$ such that

$$\mathrm{sgn}(f(a)) = -\mathrm{sgn}(f(b)).$$

Then, by the intermediate value theorem, we know that there must be a root in the interval $[a, b]$.

This restriction means that the bisection method cannot solve for the root of $x^2$, as it never crosses the x-axis and becomes negative.
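
The sign condition is easy to check before starting. The sketch below, with the hypothetical helper name `brackets_root`, shows that no interval around zero satisfies it for $x^2$, while $[1, 2]$ does for $x^3 - x - 1$.

```python
def brackets_root(f, a, b):
    """True if f(a) and f(b) are nonzero and have opposite signs."""
    fa, fb = f(a), f(b)
    return (fa < 0 < fb) or (fb < 0 < fa)


def square(x):
    return x * x


def cubic(x):
    return x * x * x - x - 1


print(brackets_root(square, -1.0, 1.0))  # False: both endpoint values are positive
print(brackets_root(cubic, 1.0, 2.0))    # True: f(1) = -1 and f(2) = 5
```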

Example

From the graph above, we can see that $f(x)$ has a root somewhere between 1 and 2. It is difficult to tell exactly what the root is, but we can use the bisection method to approximate it. Specifically, we can set $a = 1$ and $b = 2$.

Iteration 1

$$\begin{aligned} a &= 1 \\ b &= 2 \\ c &= \frac{a + b}{2} = \frac{3}{2} = 1.5 \end{aligned}$$

$$\begin{aligned} f(a) &= f(1) = 1^3 - 1 - 1 = -1 \\ f(b) &= f(2) = 2^3 - 2 - 1 = 5 \\ f(c) &= f(1.5) = 1.5^3 - 1.5 - 1 = 0.875 \end{aligned}$$

Since $f(b)$ and $f(c)$ are both positive, we will replace $b$ with $c$ and further narrow our interval.

Iteration 2

$$\begin{aligned} a &= 1 \\ b &= 1.5 \\ c &= \frac{a + b}{2} = \frac{2.5}{2} = 1.25 \end{aligned}$$

$$\begin{aligned} f(a) &= f(1) = -1 \\ f(b) &= f(1.5) = 0.875 \\ f(c) &= f(1.25) = 1.25^3 - 1.25 - 1 = -0.296875 \end{aligned}$$

Since $f(a)$ and $f(c)$ are both negative, we will replace $a$ with $c$ and further narrow our interval.

Note that as described above, we didn’t need to recalculate $f(a)$ or $f(b)$ as we had already calculated them during the previous iteration. Reusing these values can be a significant cost savings.

Iteration 3

$$\begin{aligned} a &= 1.25 \\ b &= 1.5 \\ c &= \frac{a + b}{2} = \frac{1.25 + 1.5}{2} = 1.375 \end{aligned}$$

$$\begin{aligned} f(a) &= f(1.25) = -0.296875 \\ f(b) &= f(1.5) = 0.875 \\ f(c) &= f(1.375) = 1.375^3 - 1.375 - 1 = 0.224609375 \end{aligned}$$

…

Iteration $n$

When running the code for bisection method given below, the resulting approximate root determined is 1.324717957244502. With bisection, we can approximate the root to a desired tolerance (the value above is for the default tolerances).

Code

The following Python code calls SciPy’s bisect method:

import scipy.optimize as opt

def f(x):
    return x**3 - x - 1

root = opt.bisect(f, a=1, b=2)

Newton’s Method

The Newton-Raphson Method (a.k.a. Newton’s Method) uses a Taylor series approximation of the function to find an approximate solution. Specifically, it takes the first 2 terms:

$$f(x_k + h) \approx f(x_k) + f'(x_k) h$$

Algorithm

Starting with the Taylor series above, we can find the root of this new function like so:

$$\begin{aligned} f(x_k) + f'(x_k) h &= 0 \\ h &= -\frac{f(x_k)}{f'(x_k)} \end{aligned}$$

This value of $h$ can now be used to find a value of $x$ closer to the root of $f$:

$$x_{k+1} = x_k + h = x_k - \frac{f(x_k)}{f'(x_k)}$$

Geometrically, $(x_{k+1}, 0)$ is the intersection of the x-axis and the tangent of the graph at $(x_k, f(x_k))$.

By repeating this procedure, we can get closer and closer to the actual root.
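
The update $x_{k+1} = x_k - f(x_k)/f'(x_k)$ can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the function name `newton`, the stopping rule (step size below `tol`), and the iteration cap are choices made for this example and are not taken from any library.

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x) / f'(x) until the step is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        slope = fprime(x)
        if slope == 0:
            raise ZeroDivisionError("derivative is zero; tangent never hits the x-axis")
        h = -f(x) / slope        # root of the linear (tangent) model
        x = x + h
        if abs(h) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")


def f(x):
    return x * x * x - x - 1


def fprime(x):
    return 3 * x * x - 1


root = newton(f, fprime, x0=1.0)
print(root)  # close to 1.3247 for this f
```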

Computational Cost

With Newton’s method, at each iteration we must evaluate both $f(x)$ and $f'(x)$.

Convergence

Typically, Newton’s Method has quadratic convergence.
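
Quadratic convergence means that, once the iterate is close to the root, the error is roughly squared (up to a constant factor) at every step. The sketch below tracks the error for $f(x) = x^3 - x - 1$ starting from $x_0 = 1$. The value `ROOT` is a reference approximation of the root used only to measure the error, and the number of iterations is an arbitrary choice.

```python
def f(x):
    return x * x * x - x - 1


def fprime(x):
    return 3 * x * x - 1


# Reference approximation of the root, used only to measure the error
ROOT = 1.324717957244746

x = 1.0
errors = [abs(x - ROOT)]
for _ in range(5):
    x = x - f(x) / fprime(x)     # one Newton step
    errors.append(abs(x - ROOT))

for k, err in enumerate(errors):
    print(k, err)
```

In the printed errors, the number of correct digits roughly doubles from one step to the next after the first couple of iterations.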

Drawbacks

Although Newton’s Method converges quickly, the additional cost of evaluating the derivative makes each iteration slower to compute. Many functions are not easily differentiable, so Newton’s Method is not always possible. Even in cases when it is possible to evaluate the derivative, it may be quite costly.

Convergence only works well if you are already close to the root. Specifically, if started too far from the root Newton’s method may not converge at all.
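
A well-known illustration of this failure uses a different function, $g(x) = x^3 - 2x + 2$ (not the running example of this page), with the starting point $x_0 = 0$. The iterates bounce between 0 and 1 forever instead of approaching the root. The names `g` and `gprime` below are assumptions for this sketch.

```python
def g(x):
    return x * x * x - 2 * x + 2


def gprime(x):
    return 3 * x * x - 2


x = 0.0
iterates = [x]
for _ in range(6):
    x = x - g(x) / gprime(x)     # Newton step
    iterates.append(x)

# From 0: g = 2, g' = -2, so the step lands on 1.
# From 1: g = 1, g' = 1, so the step lands back on 0.
print(iterates)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```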

Example

We will need the following equations:

$$\begin{aligned} f(x) &= x^3 - x - 1 \\ f'(x) &= 3x^2 - 1 \end{aligned}$$

Iteration 1

From the graph above, we can see that the root is somewhere near $x = 1$. We will use this as our starting position, $x_0$.

$$\begin{aligned} x_1 &= x_0 - \frac{f(x_0)}{f'(x_0)} \\ &= 1 - \frac{f(1)}{f'(1)} \\ &= 1 - \frac{1^3 - 1 - 1}{3 \cdot 1^2 - 1} \\ &= 1 + \frac{1}{2} \\ &= 1.5 \end{aligned}$$

Iteration 2

$$\begin{aligned} x_2 &= x_1 - \frac{f(x_1)}{f'(x_1)} \\ &= 1.5 - \frac{f(1.5)}{f'(1.5)} \\ &= 1.5 - \frac{1.5^3 - 1.5 - 1}{3 \cdot 1.5^2 - 1} \\ &= 1.5 - \frac{0.875}{5.75} \\ &= 1.3478260869565217 \end{aligned}$$

Iteration 3

$$\begin{aligned} x_3 &= x_2 - \frac{f(x_2)}{f'(x_2)} \\ &= 1.3478260869565217 - \frac{f(1.3478260869565217)}{f'(1.3478260869565217)} \\ &= 1.3478260869565217 - \frac{1.3478260869565217^3 - 1.3478260869565217 - 1}{3 \cdot 1.3478260869565217^2 - 1} \\ &= 1.3478260869565217 - \frac{0.10068217309114824}{4.449905482041588} \\ &= 1.325200398950907 \end{aligned}$$

As you can see, Newton’s Method is already converging significantly faster than the Bisection Method.

…

Iteration $n$

When running the code for Newton’s method given below, the resulting approximate root determined is 1.324717957244746.

Code

The following Python code calls SciPy’s newton method:

import scipy.optimize as opt


def f(x):
    return x**3 - x - 1

def fprime(x):
    return 3 * x**2 - 1

root = opt.newton(f, x0=1, fprime=fprime)

Secant Method

Like Newton’s Method, secant method uses the Taylor Series to find the solution. However, you may not always be able to take the derivative of a function. Secant method gets around this by approximating the derivative as:

$$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$$

Algorithm

The steps involved in the Secant Method are identical to those of the Newton Method, with the derivative replaced by an approximation for the slope of the tangent.
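
Substituting the slope approximation into the Newton update gives $x_{k+1} = x_k - f(x_k) \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}$. The sketch below implements this in plain Python. The function name `secant`, the stopping rule (step size below `tol`), and the iteration cap are assumptions made for this example. The previous function value is kept between iterations, so each pass through the loop evaluates `f` only once.

```python
def secant(f, x_prev, x, tol=1e-12, max_iter=50):
    """Secant iteration from two starting guesses, x_prev and x."""
    f_prev, f_x = f(x_prev), f(x)
    for _ in range(max_iter):
        if f_x == f_prev:
            raise ZeroDivisionError("secant line is horizontal; cannot continue")
        slope = (f_x - f_prev) / (x - x_prev)   # stands in for f'(x)
        x_new = x - f_x / slope                 # Newton-style update
        if abs(x_new - x) < tol:
            return x_new
        # Shift the pair of points forward, reusing the known value f(x)
        x_prev, f_prev = x, f_x
        x, f_x = x_new, f(x_new)
    raise RuntimeError("secant iteration did not converge")


def f(x):
    return x * x * x - x - 1


root = secant(f, x_prev=2.0, x=1.0)
print(root)  # close to 1.3247 for this f
```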

Computational Cost

Similar to bisection, although secant method conceptually requires 2 function evaluations per iteration, one of the function evaluations will have been computed in the previous iteration and can be reused. So, secant method requires 1 new function evaluation per iteration (after the first iteration).

Convergence

Secant method has superlinear convergence.

More specifically, the rate of convergence $r$ is:

$$r = \frac{1 + \sqrt{5}}{2} \approx 1.618$$

This happens to be the golden ratio.

Drawbacks

This technique has many of the same drawbacks as Newton’s Method, but does not require a derivative. It does not converge as quickly as Newton’s Method. It also requires two starting guesses near the root.

Example

Let’s start with $x_0 = 1$ and $x_{-1} = 2$.

Iteration 1

First, find an approximation for the derivative (slope):

$$\begin{aligned} f'(x_0) &\approx \frac{f(x_0) - f(x_{-1})}{x_0 - x_{-1}} \\ &= \frac{f(1) - f(2)}{1 - 2} \\ &= \frac{(1^3 - 1 - 1) - (2^3 - 2 - 1)}{1 - 2} \\ &= \frac{(-1) - (5)}{1 - 2} \\ &= 6 \end{aligned}$$

Then, use this for Newton’s Method:

$$\begin{aligned} x_1 &= x_0 - \frac{f(x_0)}{f'(x_0)} \\ &= 1 - \frac{f(1)}{f'(1)} \\ &= 1 - \frac{1^3 - 1 - 1}{6} \\ &= 1 + \frac{1}{6} \\ &= 1.1666666666666667 \end{aligned}$$

Iteration 2

$$
\begin{aligned}
f'(x_1) &\approx \frac{f(x_1) - f(x_0)}{x_1 - x_0} \\
&= \frac{f(1.1666666666666667) - f(1)}{1.1666666666666667 - 1} \\
&= \frac{(1.1666666666666667^3 - 1.1666666666666667 - 1) - (1^3 - 1 - 1)}{1.1666666666666667 - 1} \\
&= \frac{(-0.5787037037037035) - (-1)}{1.1666666666666667 - 1} \\
&= 2.5277777777777777
\end{aligned}
$$

$$
\begin{aligned}
x_2 &= x_1 - \frac{f(x_1)}{f'(x_1)} \\
&= 1.1666666666666667 - \frac{f(1.1666666666666667)}{f'(1.1666666666666667)} \\
&= 1.1666666666666667 - \frac{1.1666666666666667^3 - 1.1666666666666667 - 1}{2.5277777777777777} \\
&= 1.1666666666666667 - \frac{-0.5787037037037035}{2.5277777777777777} \\
&= 1.3956043956043955
\end{aligned}
$$

Iteration 3

$$
\begin{aligned}
f'(x_2) &\approx \frac{f(x_2) - f(x_1)}{x_2 - x_1} \\
&= \frac{f(1.3956043956043955) - f(1.1666666666666667)}{1.3956043956043955 - 1.1666666666666667} \\
&= \frac{(1.3956043956043955^3 - 1.3956043956043955 - 1) - (1.1666666666666667^3 - 1.1666666666666667 - 1)}{1.3956043956043955 - 1.1666666666666667} \\
&= \frac{(0.3226305152401032) - (-0.5787037037037035)}{1.3956043956043955 - 1.1666666666666667} \\
&= 3.9370278683465503
\end{aligned}
$$

$$
\begin{aligned}
x_3 &= x_2 - \frac{f(x_2)}{f'(x_2)} \\
&= 1.3956043956043955 - \frac{f(1.3956043956043955)}{f'(1.3956043956043955)} \\
&= 1.3956043956043955 - \frac{1.3956043956043955^3 - 1.3956043956043955 - 1}{3.9370278683465503} \\
&= 1.3956043956043955 - \frac{0.3226305152401032}{3.9370278683465503} \\
&= 1.3136566609098987
\end{aligned}
$$

…

Iteration $n$

Running the Secant Method code given below yields the approximate root 1.324717957244753.

Code

SciPy’s newton method serves double-duty. If given a function $f$ and a first derivative $f'$, it will use Newton’s Method. If it is not given a derivative, it will instead use the Secant Method to approximate it:

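import scipy.optimize as opt

def f(x):
    # Function whose root we want: f(x) = x^3 - x - 1
    return x**3 - x - 1

# No derivative (fprime) is supplied, so newton uses the Secant Method
root = opt.newton(f, x0=1)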
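
To see where the hand-worked iterates come from, the secant update can also be written out directly. The following is a minimal sketch, not SciPy’s implementation: the helper name `secant`, its tolerance, and its iteration cap are assumptions made for illustration. It starts from $x_{-1} = 2$ and $x_0 = 1$, as in the example above.

```python
def f(x):
    return x**3 - x - 1


def secant(f, x_prev, x_curr, tol=1e-12, max_iter=50):
    """Secant iteration (illustrative helper); returns the iterates x_1, x_2, ..."""
    iterates = []
    for _ in range(max_iter):
        f_prev, f_curr = f(x_prev), f(x_curr)
        # Finite-difference slope standing in for f'(x_k)
        slope = (f_curr - f_prev) / (x_curr - x_prev)
        x_next = x_curr - f_curr / slope
        iterates.append(x_next)
        # Stop once the step is negligible
        if abs(x_next - x_curr) < tol:
            break
        x_prev, x_curr = x_curr, x_next
    return iterates


iterates = secant(f, x_prev=2.0, x_curr=1.0)
for k, x in enumerate(iterates[:3], start=1):
    print(f"x_{k} = {x}")
print("approximate root:", iterates[-1])
```

When no second starting point is passed, `opt.newton` picks one of its own close to `x0`, so its intermediate iterates will generally differ from the hand-worked ones even though the final root agrees.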

Solving Many Equations

Similar to root-finding in 1 dimension, we can also perform root-finding for multiple equations in $n$ dimensions. Mathematically, we are trying to solve $\boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{0}$ for $\boldsymbol{f} : \mathbb{R}^n \to \mathbb{R}^n$. In other words, $\boldsymbol{f}(\boldsymbol{x})$ is now a vector-valued function:

$$
\boldsymbol{f}(\boldsymbol{x}) = \begin{bmatrix} f_1(\boldsymbol{x}) \\ \vdots \\ f_n(\boldsymbol{x}) \end{bmatrix} = \begin{bmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_n(x_1, \dots, x_n) \end{bmatrix}
$$

If we are instead looking for the solution to $\boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{y}$, we can rework our function like so:

$$
\tilde{\boldsymbol{f}}(\boldsymbol{x}) = \boldsymbol{f}(\boldsymbol{x}) - \boldsymbol{y} = \boldsymbol{0}
$$

We can think of each equation as a function that describes a surface. We are looking for vectors that describe the intersection of these surfaces.
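
The reworking of $\boldsymbol{f}(\boldsymbol{x}) = \boldsymbol{y}$ into a root-finding problem can be sketched in a few lines. The function `f`, the right-hand side `target`, and the starting guess below are hypothetical choices made only for illustration; `opt.root` is used as a black-box solver here.

```python
import numpy as np
import scipy.optimize as opt


def f(xvec):
    # Hypothetical vector-valued function f: R^2 -> R^2
    x, y = xvec
    return np.array([x + 2*y, x**2 + 4*y**2])


target = np.array([2.0, 4.0])  # the right-hand side y in f(x) = y


def f_tilde(xvec):
    # Roots of f_tilde are exactly the solutions of f(x) = target
    return f(xvec) - target


sol = opt.root(f_tilde, x0=[1.0, 1.0])
print(sol.x)
print(f(sol.x))  # should be close to target
```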

Newton’s Method

The multi-dimensional equivalent of Newton’s Method involves approximating a function as:

$$
\boldsymbol{f}(\boldsymbol{x} + \boldsymbol{s}) \approx \boldsymbol{f}(\boldsymbol{x}) + \mathbf{J}_f(\boldsymbol{x}) \boldsymbol{s}
$$

where $\mathbf{J}_f$ is the Jacobian matrix of $\boldsymbol{f}$.

By setting this to $\boldsymbol{0}$ and rearranging, we get:

$$
\begin{aligned}
\mathbf{J}_f(\boldsymbol{x}) \boldsymbol{s} &= -\boldsymbol{f}(\boldsymbol{x}) \qquad (1) \\
\boldsymbol{s} &= -\mathbf{J}_f(\boldsymbol{x})^{-1} \boldsymbol{f}(\boldsymbol{x})
\end{aligned}
$$

Note that in practice we would not actually invert the Jacobian, but would instead solve the linear system in $(1)$ to determine the step.
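
As a small illustration of that remark, the step can be computed by solving the linear system and, for comparison only, by forming the inverse. The matrix `J` and vector `fx` below are sample values chosen for illustration, not taken from any library.

```python
import numpy as np

# Jacobian and function value at a sample point (illustrative numbers)
J = np.array([[1.0, 2.0],
              [2.0, 8.0]])
fx = np.array([1.0, 1.0])

# Preferred: solve J s = -f(x) directly
s_solve = np.linalg.solve(J, -fx)

# For comparison only: form the inverse explicitly, then multiply
s_inv = -np.linalg.inv(J) @ fx

print(s_solve)
print(np.allclose(s_solve, s_inv))
```

Both routes give the same step for this small system. For larger systems, a factorization-based solve is generally cheaper and less sensitive to rounding than forming the inverse.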

Algorithm

Similar to the way we solved for $x_{k+1}$ in 1 dimension, we can solve for:

$\boldsymbol{x}_{k+1} = \boldsymbol{x}_k + \boldsymbol{s}_k$ where $\boldsymbol{s}_k$ is determined by solving the linear system $\mathbf{J}_f(\boldsymbol{x}_k) \boldsymbol{s}_k = -\boldsymbol{f}(\boldsymbol{x}_k)$.
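
The update above can be sketched as a short loop. This is a minimal illustration under stated assumptions: the helper name `newton_system`, its stopping rule, and its iteration cap are hypothetical, and the test system (a line and an ellipse in the plane, the same one worked through in the example below) is used only to exercise the loop.

```python
import numpy as np


def newton_system(f, Jf, x0, tol=1e-12, max_iter=50):
    """Newton's method for f(x) = 0; returns the list [x_0, x_1, ...]."""
    x = np.asarray(x0, dtype=float)
    iterates = [x]
    for _ in range(max_iter):
        # Solve J_f(x_k) s_k = -f(x_k) rather than inverting the Jacobian
        s = np.linalg.solve(Jf(x), -f(x))
        x = x + s
        iterates.append(x)
        # Stop once the step is negligible
        if np.linalg.norm(s) < tol:
            break
    return iterates


# Illustrative system: a line and an ellipse in the plane
def f(xvec):
    x, y = xvec
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])


def Jf(xvec):
    x, y = xvec
    return np.array([[1.0, 2.0], [2*x, 8*y]])


iterates = newton_system(f, Jf, x0=[1.0, 1.0])
for k, xk in enumerate(iterates[:4]):
    print(k, xk)
print("approximate root:", iterates[-1])
```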

Drawbacks

Just like in 1D, Newton’s Method only converges locally. It may also be expensive to compute $\mathbf{J}_f$ at each iteration, and we must solve a linear system at each iteration.
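
Local convergence means the outcome depends on the starting guess. The sketch below assumes a small two-equation system (a line and an ellipse, chosen for illustration) and a bare-bones `newton` helper with a fixed iteration count; both are assumptions rather than part of any library. Two different starting points end at two different roots, and a starting point where the Jacobian is singular yields no step at all.

```python
import numpy as np


def f(xvec):
    x, y = xvec
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])


def Jf(xvec):
    x, y = xvec
    return np.array([[1.0, 2.0], [2*x, 8*y]])


def newton(x0, n_iter=20):
    # Fixed number of plain Newton steps (no safeguards)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x + np.linalg.solve(Jf(x), -f(x))
    return x


root_a = newton([1.0, 1.0])    # ends near (0, 1)
root_b = newton([3.0, -1.0])   # ends near (2, 0)
print(root_a, root_b)

# The determinant of this Jacobian is 8y - 4x, which vanishes when x = 2y,
# so the linear system has no unique solution at such a starting point
try:
    newton([2.0, 1.0])
    singular = False
except np.linalg.LinAlgError:
    singular = True
print("singular Jacobian at start:", singular)
```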

Example

Let’s find a root for:

$$
\boldsymbol{f}(x, y) = \begin{bmatrix} x + 2y - 2 \\ x^2 + 4y^2 - 4 \end{bmatrix}
$$

The corresponding Jacobian and inverse Jacobian are:

$$
\mathbf{J}_f(\boldsymbol{x}) = \begin{bmatrix} 1 & 2 \\ 2x & 8y \end{bmatrix}
$$

$$
\mathbf{J}_f^{-1} = \frac{1}{x - 2y} \begin{bmatrix} -2y & \frac{1}{2} \\ \frac{x}{2} & -\frac{1}{4} \end{bmatrix}
$$

In this example, as the Jacobian is a $2 \times 2$ matrix with a simple inverse, we work explicitly with the inverse, even though we would not explicitly compute the inverse for a real problem.
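
For reference, the inverse stated above follows from the standard formula for the inverse of a $2 \times 2$ matrix (swap the diagonal entries, negate the off-diagonal entries, and divide by the determinant):

$$
\begin{aligned}
\mathbf{J}_f(\boldsymbol{x})^{-1}
&= \frac{1}{(1)(8y) - (2)(2x)} \begin{bmatrix} 8y & -2 \\ -2x & 1 \end{bmatrix} \\
&= \frac{1}{4(2y - x)} \begin{bmatrix} 8y & -2 \\ -2x & 1 \end{bmatrix} \\
&= \frac{1}{x - 2y} \begin{bmatrix} -2y & \frac{1}{2} \\ \frac{x}{2} & -\frac{1}{4} \end{bmatrix}
\end{aligned}
$$

The inverse exists only where $x \neq 2y$, since the determinant $8y - 4x$ vanishes on that line.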

Iteration 1

Let’s start at $\boldsymbol{x}_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

$$
\begin{aligned}
\boldsymbol{x}_1 &= \boldsymbol{x}_0 - \mathbf{J}_f(\boldsymbol{x}_0)^{-1} \boldsymbol{f}(\boldsymbol{x}_0) \\
&= \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \frac{1}{1 - 2} \begin{bmatrix} -2 & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{4} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} -1.5 \\ 0.25 \end{bmatrix} \\
&= \begin{bmatrix} -0.5 \\ 1.25 \end{bmatrix}
\end{aligned}
$$

Iteration 2

$$
\begin{aligned}
\boldsymbol{x}_2 &= \boldsymbol{x}_1 - \mathbf{J}_f(\boldsymbol{x}_1)^{-1} \boldsymbol{f}(\boldsymbol{x}_1) \\
&= \begin{bmatrix} -0.5 \\ 1.25 \end{bmatrix} - \frac{1}{-0.5 - 2.5} \begin{bmatrix} -2.5 & \frac{1}{2} \\ -\frac{1}{4} & -\frac{1}{4} \end{bmatrix} \begin{bmatrix} 0 \\ 2.5 \end{bmatrix} \\
&= \begin{bmatrix} -0.5 \\ 1.25 \end{bmatrix} + \frac{1}{3} \begin{bmatrix} 1.25 \\ -0.625 \end{bmatrix} \\
&= \begin{bmatrix} -0.08333333 \\ 1.04166667 \end{bmatrix}
\end{aligned}
$$

Iteration 3

$$
\begin{aligned}
\boldsymbol{x}_3 &= \boldsymbol{x}_2 - \mathbf{J}_f(\boldsymbol{x}_2)^{-1} \boldsymbol{f}(\boldsymbol{x}_2) \\
&= \begin{bmatrix} -0.08333333 \\ 1.04166667 \end{bmatrix} - \frac{1}{-0.08333333 - 2.08333334} \begin{bmatrix} -2.08333334 & 0.5 \\ -0.041666665 & -0.25 \end{bmatrix} \begin{bmatrix} 9.99999993922529 \cdot 10^{-9} \\ 0.34722224944444413 \end{bmatrix} \\
&= \begin{bmatrix} -0.08333333 \\ 1.04166667 \end{bmatrix} + \frac{1}{2.1666666699999997} \begin{bmatrix} 0.1736111 \\ -0.08680556 \end{bmatrix} \\
&= \begin{bmatrix} -0.00320513 \\ 1.00160256 \end{bmatrix}
\end{aligned}
$$

…

Iteration $n$

When running the code given below, the resulting approximate root is $\begin{bmatrix} -2.74060567 \cdot 10^{-16} & 1 \end{bmatrix}^\top$.

Code

import numpy as np
import scipy.optimize as opt


def f(xvec):
    x, y = xvec
    return np.array([
        x + 2*y - 2,
        x**2 + 4*y**2 - 4
    ])

def Jf(xvec):
    x, y = xvec
    return np.array([
        [1, 2],
        [2*x, 8*y]
    ])

# Pass the analytic Jacobian so the solver does not have to approximate it
sol = opt.root(f, x0=[1, 1], jac=Jf)
root = sol.x
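
It is usually worth checking what the solver reports before trusting `sol.x`. The sketch below repeats the setup so that it runs on its own and then inspects a few fields of the returned result object; treat it as an illustration rather than a complete diagnostic.

```python
import numpy as np
import scipy.optimize as opt


def f(xvec):
    x, y = xvec
    return np.array([x + 2*y - 2, x**2 + 4*y**2 - 4])


def Jf(xvec):
    x, y = xvec
    return np.array([[1, 2], [2*x, 8*y]])


sol = opt.root(f, x0=[1, 1], jac=Jf)

print(sol.success)   # whether the solver reports convergence
print(sol.message)   # solver's termination message
print(sol.x)         # approximate root
print(f(sol.x))      # residual, ideally close to zero
```

With no `method` argument, `opt.root` uses `hybr`, a modified Powell hybrid method from MINPACK, so its intermediate iterates need not match the hand-worked Newton steps above.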

Review Questions

See this review link

ChangeLog

2017-12-02 Erin Carrier ecarrie2@illinois.edu: adds review questions, adds a little more cost information, a few other minor fixes
2017-12-25 Adam Stewart adamjs5@illinois.edu: first complete draft
2017-10-17 Luke Olson lukeo@illinois.edu: outline
