
Section 6.3 Gradient methods

Recall that, at each point \(\boldsymbol{x}\text{,}\) the gradient \(\nabla_\boldsymbol{x}f\) points in the direction of maximal growth of \(f\text{.}\) A natural choice is therefore to set
\begin{equation*} \boldsymbol{d}_k = - \nabla_\boldsymbol{x_k}f \end{equation*}
More generally, one can set
\begin{equation*} \boldsymbol{d}_k = - C_k\nabla_\boldsymbol{x_k}f \end{equation*}
as long as each \(C_k\) is a positive-definite matrix, namely \(C_k\) is symmetric and all of its eigenvalues are positive.
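
To see why positive-definiteness is the natural requirement, note that whenever \(\nabla_\boldsymbol{x_k}f\neq 0\) we have
\begin{equation*} \boldsymbol{d}_k\cdot\nabla_\boldsymbol{x_k}f = -\left(\nabla_\boldsymbol{x_k}f\right)^T C_k\,\nabla_\boldsymbol{x_k}f < 0\text{,} \end{equation*}
so \(\boldsymbol{d}_k\) always makes an obtuse angle with the gradient and is therefore a descent direction.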

Subsection 6.3.1 How to choose \(d_k\text{?}\)

Steepest descent method

\(\boldsymbol{d}_k = - \nabla_\boldsymbol{x_k}f\)

Newton method

\(\boldsymbol{d}_k = - \operatorname{Hess}_{\boldsymbol{x_k}}(f)^{-1}\,\nabla_\boldsymbol{x_k}f\)

This last method is the \(n\)-dimensional analogue of the Newton method we already met in dimension 1.
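
To make the two choices concrete, here is a minimal Python sketch (not part of the original text): it assumes the user supplies functions grad(x) and hess(x) returning the gradient and Hessian of \(f\text{,}\) and it uses the constant step size \(\alpha_k=1\) purely for illustration; step sizes are discussed in the next subsection.

```python
import numpy as np

def steepest_descent_direction(grad, x):
    # d_k = -grad f(x_k)
    return -grad(x)

def newton_direction(grad, hess, x):
    # d_k = -Hess f(x_k)^{-1} grad f(x_k); solve the linear system
    # rather than forming the inverse explicitly.
    return np.linalg.solve(hess(x), -grad(x))

# Example: f(x, y) = (x - 1)^2 + 10*(y + 2)^2, minimized at (1, -2)
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])

x = np.array([5.0, 5.0])
for k in range(50):
    d = newton_direction(grad, hess, x)   # or steepest_descent_direction(grad, x)
    x = x + 1.0 * d                        # constant step size alpha_k = 1
print(x)                                   # close to the minimizer (1, -2)
```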

Subsection 6.3.2 How to choose \(\alpha_k\text{?}\)

There are many possible choices, below are a few simple ones:

  • \(\hbox{constant step size: }\alpha_k=s\)
    • might not converge if \(s\) is too large, might be slow if \(s\) is too small.
  • \(\hbox{diminishing step size: }\alpha_k\to0,\hbox{ e.g. }\alpha_k=1/k\)
    • \(\alpha_k\) should not converge to 0 too fast, or the method might be slow.
  • \(\hbox{line search: }\alpha_k\hbox{ minimizes }g_k(\alpha)=f(x_k+\alpha d_k)\)
    • can use 1-dimensional root-finding methods (applied to \(g_k'(\alpha)=0\)) to get \(\alpha_k\text{.}\)
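
As an illustration only, the three rules above might be written as small Python helpers; the line-search variant below is a crude stand-in that samples \(g_k\) on a grid of candidate values of \(\alpha\) instead of calling a proper 1-dimensional solver.

```python
import numpy as np

def constant_step(k, s=0.1):
    # alpha_k = s for every k
    return s

def diminishing_step(k):
    # alpha_k = 1/k (with k starting at 1), so alpha_k -> 0
    return 1.0 / k

def crude_line_search(f, x, d, alphas=np.linspace(0.0, 1.0, 101)):
    # Approximately minimize g_k(alpha) = f(x_k + alpha * d_k)
    # by brute force over a grid of candidate step sizes.
    values = [f(x + a * d) for a in alphas]
    return alphas[int(np.argmin(values))]
```

In practice the grid would be replaced by one of the 1-dimensional methods mentioned above, e.g. bisection or Newton's method applied to \(g_k'(\alpha)=0\text{.}\)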

Subsection 6.3.3 Stopping Criteria

There are several possible stopping criteria for these methods.

Let \(\epsilon>0\) be a given threshold:

  • \(\|\nabla_{x_k} f\|<\epsilon\) (we got close enough to a critical point);
  • \(|f(x_{k+1})-f(x_k)|<\epsilon\) (the value of \(f\) stabilized);
  • \(\|x_{k+1}-x_k\| < \epsilon\) (iterates stabilized);
  • \(\displaystyle\frac{|f(x_{k+1})-f(x_k)|}{\max\{1,|f(x_k)|\}}<\epsilon\) (the relative change in \(f\) is stabilizing);
  • \(\displaystyle\frac{\|x_{k+1}-x_k\|}{\max\{1,\|x_k\|\}}<\epsilon\) (the relative change in \(x\) is stabilizing).
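
For concreteness, a basic descent loop might combine the first three criteria as follows; this is again a sketch under the same assumptions as before, namely that f and grad are supplied by the user and that a constant step size is used.

```python
import numpy as np

def gradient_descent(f, grad, x0, alpha=0.1, eps=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        x_new = x - alpha * grad(x)
        # Stopping criteria from the list above.
        if np.linalg.norm(grad(x_new)) < eps:      # near a critical point
            return x_new
        if abs(f(x_new) - f(x)) < eps:             # value of f stabilized
            return x_new
        if np.linalg.norm(x_new - x) < eps:        # iterates stabilized
            return x_new
        x = x_new
    return x
```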