
Section 6.5 Newton's Method

Newton's method looks for an optimal point \(x^*\) of \(f(x)\) by searching for a zero of the gradient vector field \(\nabla_x f\text{.}\) Its formulation, like in the 1-dim case, comes from the Taylor expansion:
\begin{equation*} \nabla_{y}f \simeq \nabla_{x} f + Hess_{x}f\cdot(y-x) \end{equation*}
so, in order to have \(\nabla_{y}f=0\text{,}\) the best bet is to choose \(y\) so that
\begin{equation*} y = x - (Hess_{x}f)^{-1}\cdot(\nabla_{x} f) \end{equation*}
Like in the 1-dim case, Newton's method converges in a single step when \(f(x)\) is quadratic. This is not very surprising: when \(f(x)=\frac{1}{2}x^TAx+b^Tx+c\) with \(A\) symmetric and invertible, the gradient is \(\nabla_{x}f=Ax+b\) and the Hessian \(Hess_{x}f\) is constant and equal to \(A\text{,}\) so using Newton's method amounts to solving the linear system \(Ax=-b\text{.}\)
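Indeed, writing out a single Newton step from an arbitrary initial point \(x_0\) makes this explicit:
\begin{equation*} x_1 = x_0 - (Hess_{x_0}f)^{-1}\cdot(\nabla_{x_0}f) = x_0 - A^{-1}(Ax_0+b) = -A^{-1}b\text{,} \end{equation*}
which does not depend on \(x_0\) and is exactly the solution of \(Ax=-b\text{,}\) i.e. the unique critical point of \(f\text{.}\)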

The algorithm:
  1. select an initial point \(x\text{;}\)
  2. select a threshold \(\epsilon>0\text{;}\)
  3. until \(\|\nabla_{x}f\|<\epsilon\text{,}\) update \(x\to x-(Hess_{x}f)^{-1}\cdot(\nabla_{x}f)\text{;}\)
  4. print \(x\text{.}\)

A concrete implementation. Consider again the optimization problem solved numerically in the previous section, whose minimum is achieved at \((x,y) = (-1,4)\text{:}\)
\begin{equation*} f(x,y)=5x^2+4xy+y^2-6x-4y+15 \end{equation*}
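A minimal sketch of how such an implementation can look in MATLAB/Octave (the variable names, initial point, and threshold below are arbitrary choices; the gradient and the Hessian of this quadratic are entered by hand):

% Newton's method for f(x,y) = 5x^2 + 4xy + y^2 - 6x - 4y + 15
grad = @(v) [10*v(1) + 4*v(2) - 6; 4*v(1) + 2*v(2) - 4];  % gradient of f
H = [10 4; 4 2];                 % Hessian of f (constant, since f is quadratic)
x = [0; 0];                      % initial point
epsilon = 1e-8;                  % threshold
while norm(grad(x)) >= epsilon
    x = x - H \ grad(x);         % Newton step: solve H*d = grad(x) instead of inverting H
end
disp(x')                         % prints -1 4, the minimum point

Note the backslash operator: solving the linear system \(Hd=\nabla_{x}f\) directly is cheaper and numerically more stable than computing \(H^{-1}\text{.}\) Since \(f\) is quadratic, the loop exits after a single step.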
A second example. This example uses MATLAB's symbolic capabilities to avoid entering the gradient and the Hessian of the function by hand. Since Octave does not support symbolic calculations, you need to run this code in MATLAB.
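A sketch of how the symbolic version can look (again, the concrete function, initial point, and iteration cap are arbitrary choices):

% Symbolic Newton's method (requires MATLAB's Symbolic Math Toolbox)
syms x y
f = 5*x^2 + 4*x*y + y^2 - 6*x - 4*y + 15;
g = gradient(f, [x y]);          % symbolic gradient
H = hessian(f, [x y]);           % symbolic Hessian
v = [0; 0];                      % initial point
epsilon = 1e-8;                  % threshold
for k = 1:100                    % cap the number of iterations as a safeguard
    gv = double(subs(g, [x y], v.'));
    if norm(gv) < epsilon, break; end
    Hv = double(subs(H, [x y], v.'));
    v = v - Hv \ gv;
end
disp(v')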
Convergence

Does this method always converge? Short answer: no. Nicer answer: yes, provided that we start "close enough". More precisely:

Theorem. Let \(x^*\) be a zero of \(\nabla f\) such that \(Hess_{x}f\) is invertible at \(x^*\) and depends continuously on \(x\) near it. Then there is a neighborhood \(U\) of \(x^*\) such that, for every initial point \(x_0\in U\text{,}\) Newton's method converges to \(x^*\text{.}\)

Remark 1. The theorem above says nothing about initial points that are farther away: starting from those, the method may or may not converge.

Remark 2. This method just looks for zeros of the gradient, so there is no guarantee that the point it converges to is a minimum: it could be a saddle point or even a maximum!
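For instance, take \(f(x,y)=x^2-y^2\text{:}\) its gradient is \((2x,-2y)\) and its Hessian is the constant invertible matrix \(\begin{pmatrix}2&0\\0&-2\end{pmatrix}\text{,}\) so from any initial point a single Newton step lands on
\begin{equation*} \begin{pmatrix}x\\y\end{pmatrix}-\begin{pmatrix}2&0\\0&-2\end{pmatrix}^{-1}\begin{pmatrix}2x\\-2y\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}\text{,} \end{equation*}
which is a saddle point of \(f\text{,}\) not a minimum.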

Rate of Convergence

Close to the minimum, Newton's method converges quadratically: the error at each step is of the order of the square of the error at the previous one. We show this fact in dimension 1, where the method looks for a zero \(x^*\) of a function \(f\) via the iteration \(x_{k+1}=x_k-f(x_k)/f'(x_k)\text{.}\) Assume \(f'(x^*)\neq0\text{.}\) Expanding \(f\) in Taylor series about \(x_k\) and using \(f(x^*)=0\) gives \(f'(x_k)(x_k-x^*)-f(x_k)=\frac{f''(x_k)}{2}(x_k-x^*)^2+O(|x_k-x^*|^3)\text{,}\) so that

\begin{align*} |x_{k+1}-x^*|&=\left|x_k-x^*-\frac{f(x_k)}{f'(x_k)}\right|\\ &=\left|\frac{1}{f'(x_k)}\right|\cdot\left|f'(x_k)(x_k-x^*)-f(x_k)\right|\\ &=\left|\frac{1}{f'(x_k)}\right|\cdot\left|\frac{f''(x_k)}{2}(x_k-x^*)^2+O(|x_k-x^*|^3)\right|\\ &\leq M|x_k-x^*|^2 \end{align*}

for every \(x_k\) close enough to \(x^*\text{,}\) where \(M\) is any constant larger than \(\left|\frac{f''(x^*)}{2f'(x^*)}\right|\text{.}\)
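The squaring of the error can be watched directly. The following MATLAB/Octave snippet (a small illustration; the function and the initial point are arbitrary choices) runs the 1-dim iteration on \(f(x)=x^2-2\text{,}\) whose positive zero is \(\sqrt{2}\text{:}\)

% Quadratic convergence in action: Newton's method for f(x) = x^2 - 2
f = @(x) x^2 - 2;
df = @(x) 2*x;                   % derivative of f
x = 1;                           % initial point
for k = 1:5
    x = x - f(x)/df(x);          % 1-dim Newton step
    fprintf('%d  %.16f  %.2e\n', k, x, abs(x - sqrt(2)));
end

Starting from \(x=1\text{,}\) the errors of the first four iterates are about \(10^{-1}\text{,}\) \(10^{-3}\text{,}\) \(10^{-6}\) and \(10^{-12}\text{:}\) the number of exact digits roughly doubles at each step, as predicted.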