Section 6.5 Newton's Method
Newton's method looks for an optimal point \(x^*\) of \(f(x)\) by looking for a zero of the gradient vector field \(\nabla_x f\text{.}\) Its formulation, like in the 1-dim case, comes from Taylor's expansion:
\begin{equation*}
\nabla_{y}f \simeq \nabla_{x} f + Hess_{x}f\cdot(y-x)
\end{equation*}
so, in order to have \(\nabla_{y}f=0\text{,}\) the best bet is choosing \(y\) so that
\begin{equation*}
y = x - (Hess_{x}f)^{-1}\cdot(\nabla_{x} f)
\end{equation*}
Like in the 1-dim case, Newton's method converges in a single step when \(f(x)\) is quadratic -- this is not very surprising: when \(f(x)=x^TAx+b^Tx+C\text{,}\) its Hessian \(Hess_{x}f\) is constant and equal to \(A\) and to use Newton's method we need to evaluate \(A^{-1}\text{,}\) which amounts to solving the linear system \(Ax=-b\text{.}\)
The algorithm: - select an initial point \(x\text{;}\)
- select a threshold \(\epsilon>0\text{;}\)
- until \(\|\nabla_{x}f\|<\epsilon\text{:}\) \(x\to x-(Hess_{x}f)^{-1}\cdot(\nabla_{x}f\)).
- print \(x\text{.}\)
\begin{equation*}
f(x,y)=5x^2+4xy+y^2-6x-4y+15
\end{equation*}
A second example. This example uses MATLAB's symbolic capabilities in order to avoid to input by hand the gradient and the Hessian of the function. Since Octave does not support symbolic calculations, you need to run this code on MATLAB.
Convergence Does this method always converge? Short answer: No. Nicer answer: Yes, provided that we start "close enough".
Theorem 6.5.1.
Suppose that \(f(x)\) is smooth and that \(\nabla_{x^*}f=0\text{.}\) Then there is some distance \(\epsilon>0\) such that the Newton method converges to \(x^*\) for every initial point \(x_0\) whithin \(\epsilon\) from \(x^*\text{.}\)Theorem 6.5.2.
When it does converge, Newton's method converges quadratically.
\begin{equation*}
|x_{k+1}-x^*|=\left|x_k-x^*-\frac{f(x_k)}{f'(x_k)}\right|=
\end{equation*}
\begin{equation*}
=\left|\frac{1}{f'(x_k)}\right|\cdot|f'(x_k)(x_k-x^*)-f(x_k)|\simeq
\end{equation*}
\begin{equation*}
= \left|\frac{1}{f'(x_k)}\right|\cdot|f''(x_k)(x_k-x^*)^2+O(|x_k-x^*|^3)|\leq
\end{equation*}
\begin{equation*}
\leq M|x_k-x^*|^2.
\end{equation*}