Section 6.5 Newton's Method
Newton's method looks for an optimal point \(x^*\) of \(f(x)\) by looking for a zero of the gradient vector field \(\nabla_x f\text{.}\) Its formulation, like in the 1-dim case, comes from Taylor's expansion:
\begin{equation*}
\nabla_{y}f \simeq \nabla_{x} f + Hess_{x}f\cdot(y-x)
\end{equation*}
so, in order to have \(\nabla_{y}f=0\text{,}\) the best bet is choosing \(y\) so that
\begin{equation*}
y = x - (Hess_{x}f)^{-1}\cdot(\nabla_{x} f)
\end{equation*}
Like in the 1-dim case, Newton's method converges in a single step when \(f(x)\) is quadratic -- this is not very surprising: when \(f(x)=x^TAx+\boldsymbol{b}^Tx+C\text{,}\) its Hessian \(Hess_{x}f\) is constant and equal to \(A\) and to use Newton's method we need to evaluate \(A^{-1}\text{,}\) which amounts to solving the linear system \(Ax=-b\text{.}\)
The algorithm: - select an initial point \(x\text{;}\)
- select a threshold \(\epsilon>0\text{;}\)
- until \(\|\nabla_{x}f\|<\epsilon\text{:}\) \(x\to x-(Hess_{x}f)^{-1}\cdot(\nabla_{x}f\)).
- print \(x\text{.}\)
- Using the same tolerance of \(10^{-5}\text{,}\) we get the minimum of the quadratic map in a single step...
- ... and the minimum of the quartic map in 6 steps.
- The case of the Rosenbrock's function is tough even for Newton's method. If we lower the tolerance to \(10^{-3}\text{,}\) then the method converges to the minimum in almost 4000 steps. Not an optimal result, but at least we finally get convergence.
Theorem 6.5.1.
Suppose that \(f(x)\) is smooth and that \(\nabla_{x^*}f=0\text{.}\) Then there is some distance \(\epsilon>0\) such that the Newton method converges to \(x^*\) for every initial point \(x_0\) whithin \(\epsilon\) from \(x^*\text{.}\)Theorem 6.5.2.
When it does converge, Newton's method converges quadratically.
\begin{equation*}
|x_{k+1}-x^*|=\left|x_k-x^*-\frac{f(x_k)}{f'(x_k)}\right|=
\end{equation*}
\begin{equation*}
=\left|\frac{1}{f'(x_k)}\right|\cdot|f'(x_k)(x_k-x^*)-f(x_k)|\simeq
\end{equation*}
\begin{equation*}
= \left|\frac{1}{f'(x_k)}\right|\cdot|f''(x_k)(x_k-x^*)^2+O(|x_k-x^*|^3)|\leq
\end{equation*}
\begin{equation*}
\leq M|x_k-x^*|^2.
\end{equation*}