
Section 6.3 Gradient methods

Recall that, at each point \(\boldsymbol{x}\text{,}\) the gradient \(\nabla_\boldsymbol{x}f\) points in the direction of maximal growth of \(f\text{.}\) A natural choice is therefore to set
\begin{equation*} \boldsymbol{d}_k = - \nabla_\boldsymbol{x_k}f \end{equation*}
More generally, one can set
\begin{equation*} \boldsymbol{d}_k = - C_k\nabla_\boldsymbol{x_k}f \end{equation*}
as long as each \(C_k\) is a positive-definite matrix, namely \(C_k\) is symmetric and all of its eigenvalues are positive.
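
To see why positive-definiteness is the natural requirement, note that whenever \(\nabla_\boldsymbol{x_k}f\neq 0\) we have
\begin{equation*} \boldsymbol{d}_k\cdot\nabla_\boldsymbol{x_k}f = -\left(\nabla_\boldsymbol{x_k}f\right)^T C_k\,\nabla_\boldsymbol{x_k}f < 0\text{,} \end{equation*}
so \(\boldsymbol{d}_k\) always makes an obtuse angle with the gradient and is therefore a descent direction.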

Subsection 6.3.1 How to choose \(d_k\text{?}\)

Steepest descent method

\(\boldsymbol{d}_k = - \nabla_\boldsymbol{x_k}f\)

Newton method

\(\boldsymbol{d}_k = - \operatorname{Hess}_{\boldsymbol{x_k}}(f)^{-1}\,\nabla_\boldsymbol{x_k}f\)

This last method is the \(n\)-dimensional analogue of the Newton method we already met in dimension 1.
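
To make the two choices concrete, here is a minimal Python sketch (not part of the original text): it assumes the user supplies functions grad(x) and hess(x) returning the gradient and Hessian of \(f\text{,}\) and it uses the constant step size \(\alpha_k=1\) purely for illustration; step sizes are discussed in the next subsection.

```python
import numpy as np

def steepest_descent_direction(grad, x):
    # d_k = -grad f(x_k)
    return -grad(x)

def newton_direction(grad, hess, x):
    # d_k = -Hess f(x_k)^{-1} grad f(x_k); solve the linear system
    # rather than forming the inverse explicitly.
    return np.linalg.solve(hess(x), -grad(x))

# Example: f(x, y) = (x - 1)^2 + 10*(y + 2)^2, minimized at (1, -2)
grad = lambda x: np.array([2 * (x[0] - 1), 20 * (x[1] + 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])

x = np.array([5.0, 5.0])
for k in range(50):
    d = newton_direction(grad, hess, x)   # or steepest_descent_direction(grad, x)
    x = x + 1.0 * d                        # constant step size alpha_k = 1
print(x)                                   # close to the minimizer (1, -2)
```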

Subsection 6.3.2 How to choose \(\alpha_k\text{?}\)

There are many possible choices, below are a few simple ones:

  • \(\hbox{constant step size: }\alpha_k=s\)
    • might not converge if \(s\) is too large, might be slow if \(s\) is too small.
  • \(\hbox{diminishing step size: }\alpha_k\to0,\hbox{ e.g. }\alpha_k=1/k\)
    • \(\alpha_k\) should not converge to 0 too fast, or the method might be slow.
  • \(\hbox{line search: }\alpha_k\hbox{ minimizes }g_k(\alpha)=f(x_k+\alpha d_k)\)
    • can use 1-dimensional root-finding methods (applied to \(g_k'(\alpha)=0\)) to get \(\alpha_k\text{.}\)
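
As an illustration only, the three rules above might be written as small Python helpers; the line-search variant below is a crude stand-in that samples \(g_k\) on a grid of candidate values of \(\alpha\) instead of calling a proper 1-dimensional solver.

```python
import numpy as np

def constant_step(k, s=0.1):
    # alpha_k = s for every k
    return s

def diminishing_step(k):
    # alpha_k = 1/k (with k starting at 1), so alpha_k -> 0
    return 1.0 / k

def crude_line_search(f, x, d, alphas=np.linspace(0.0, 1.0, 101)):
    # Approximately minimize g_k(alpha) = f(x_k + alpha * d_k)
    # by brute force over a grid of candidate step sizes.
    values = [f(x + a * d) for a in alphas]
    return alphas[int(np.argmin(values))]
```

In practice the grid would be replaced by one of the 1-dimensional methods mentioned above, e.g. bisection or Newton's method applied to \(g_k'(\alpha)=0\text{.}\)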

Subsection 6.3.3 Stopping Criteria

There are several possible stopping criteria for these methods.

Let \(\epsilon>0\) be a given threshold:

  • \(\|\nabla_{x_k} f\|<\epsilon\) (we got close enough to a critical point);
  • \(|f(x_{k+1})-f(x_k)|<\epsilon\) (the value of \(f\) stabilized);
  • \(\|x_{k+1}-x_k\| < \epsilon\) (iterates stabilized);
  • \(\displaystyle\frac{|f(x_{k+1})-f(x_k)|}{\max\{1,|f(x_k)|\}}<\epsilon\) (the relative change in \(f\) is stabilizing);
  • \(\displaystyle\frac{\|x_{k+1}-x_k\|}{\max\{1,\|x_k\|\}}<\epsilon\) (the relative change in \(x\) is stabilizing).
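
For concreteness, a basic descent loop might combine the first three criteria as follows; this is again a sketch under the same assumptions as before, namely that f and grad are supplied by the user and that a constant step size is used.

```python
import numpy as np

def gradient_descent(f, grad, x0, alpha=0.1, eps=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        x_new = x - alpha * grad(x)
        # Stopping criteria from the list above.
        if np.linalg.norm(grad(x_new)) < eps:      # near a critical point
            return x_new
        if abs(f(x_new) - f(x)) < eps:             # value of f stabilized
            return x_new
        if np.linalg.norm(x_new - x) < eps:        # iterates stabilized
            return x_new
        x = x_new
    return x
```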