Section 6.3 Gradient methods
Recall that the gradient \(\nabla_\boldsymbol{x}f\) points, at each point \(\boldsymbol{x}\text{,}\) in the direction of maximal growth of \(f\text{.}\) It is therefore a good choice to set
\begin{equation*}
\boldsymbol{d}_k = - \nabla_\boldsymbol{x_k}f
\end{equation*}
More generally, one can set
\begin{equation*}
\boldsymbol{d}_k = - C_k\nabla_\boldsymbol{x_k}f
\end{equation*}
as long as every \(C_k\) is a positive-definite matrix, namely \(C_k\) is symmetric and all of its eigenvalues are positive for every \(k\text{.}\)
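This requirement guarantees that \(\boldsymbol{d}_k\) is a descent direction: whenever \(\nabla_\boldsymbol{x_k}f\neq 0\text{,}\)
\begin{equation*}
\nabla_\boldsymbol{x_k}f\cdot\boldsymbol{d}_k = -\left(\nabla_\boldsymbol{x_k}f\right)^T C_k\,\nabla_\boldsymbol{x_k}f < 0,
\end{equation*}
so \(f\) decreases along \(\boldsymbol{d}_k\) for all sufficiently small step sizes.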
Subsection 6.3.1 How to choose \(d_k\text{?}\)
- Steepest descent method: \(\boldsymbol{d}_k = - \nabla_\boldsymbol{x_k}f\)
- Newton method: \(\boldsymbol{d}_k = - Hess_{x_k}(f)^{-1}(\nabla_\boldsymbol{x_k}f)\)

This last method is the \(n\)-dimensional analogue of the Newton method we already met in dimension 1.
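To make these formulas concrete, here is a minimal NumPy sketch of the two direction choices. It is only an illustration: the helper functions grad_f and hess_f (returning the gradient and Hessian of \(f\)) are assumed to be supplied by the user.

```python
import numpy as np

def steepest_descent_direction(grad_f, x):
    """d_k = -grad f(x_k): the steepest descent direction."""
    return -grad_f(x)

def newton_direction(grad_f, hess_f, x):
    """d_k = -Hess f(x_k)^{-1} grad f(x_k): the Newton direction."""
    # Solve the linear system instead of forming the inverse explicitly.
    return -np.linalg.solve(hess_f(x), grad_f(x))

# Example on f(x, y) = x^2 + 3y^2 (gradient and Hessian written by hand).
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])
hess_f = lambda x: np.array([[2.0, 0.0], [0.0, 6.0]])
x0 = np.array([1.0, 1.0])
print(steepest_descent_direction(grad_f, x0))   # [-2. -6.]
print(newton_direction(grad_f, hess_f, x0))     # [-1. -1.]
```

Note that the Newton direction is computed by solving the linear system \(Hess_{x_k}(f)\,\boldsymbol{d}_k=-\nabla_\boldsymbol{x_k}f\) rather than by inverting the Hessian, which is cheaper and numerically more stable.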
Subsection 6.3.2 How to choose \(\alpha_k\text{?}\)
There are many possible choices; below are a few simple ones (a code sketch follows the list):
- Constant step size: \(\alpha_k=s\text{.}\) Might not converge if \(s\) is too large, might be slow if \(s\) is too small.
- Diminishing step size: \(\alpha_k\to0\text{,}\) e.g. \(\alpha_k=1/k\text{.}\) Should not converge to 0 too fast or the method might be slow.
- Line search: \(\alpha_k\) minimizes \(g_k(\alpha)=f(x_k+\alpha d_k)\text{.}\) One can apply 1-dimensional root-finding methods to \(g_k'(\alpha)=0\text{,}\) or a 1-dimensional minimization method directly to \(g_k\text{,}\) to get \(\alpha_k\text{.}\)
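The following sketch illustrates the three step-size rules. It assumes SciPy is available; the helper names, the default \(s=0.1\text{,}\) and the search interval \([0,\alpha_{\max}]\) for the line search are illustrative choices, not prescriptions from the text.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def constant_step(k, s=0.1):
    """alpha_k = s for every k."""
    return s

def diminishing_step(k):
    """alpha_k = 1/k for k >= 1, so alpha_k -> 0."""
    return 1.0 / k

def line_search_step(f, x, d, alpha_max=1e3):
    """alpha_k minimizing g_k(alpha) = f(x_k + alpha * d_k) on [0, alpha_max],
    found with a 1-dimensional minimizer."""
    g = lambda alpha: f(x + alpha * d)
    return minimize_scalar(g, bounds=(0.0, alpha_max), method="bounded").x
```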
Subsection 6.3.3 Stopping Criteria
There are several possible stopping criteria for these methods. Let \(\epsilon>0\) be a given threshold:
- \(\|\nabla_{x_k} f\|<\epsilon\) (we got close enough to a critical point);
- \(|f(x_{k+1})-f(x_k)|<\epsilon\) (the value of \(f\) stabilized);
- \(\|x_{k+1}-x_k\| < \epsilon\) (iterates stabilized);
- \(\displaystyle\frac{|f(x_{k+1})-f(x_k)|}{\max\{1,|f(x_k)|\}}<\epsilon\) (the relative change in \(f\) is stabilizing);
- \(\displaystyle\frac{\|x_{k+1}-x_k\|}{\max\{1,\|x_k\|\}}<\epsilon\) (the relative change in \(x\) is stabilizing).
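Combining a direction, a step-size rule, and stopping criteria gives a complete method. Below is a minimal sketch of steepest descent with a constant step size; the function and parameter names are illustrative. It stops on the gradient-norm criterion or on the relative change of the iterates, with a cap on the number of iterations as a safeguard.

```python
import numpy as np

def gradient_descent(grad_f, x0, step=0.1, eps=1e-6, max_iter=10_000):
    """Steepest descent with a constant step size.

    Stops when ||grad f(x_k)|| < eps (close to a critical point), when the
    relative change of the iterates drops below eps, or after max_iter steps.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < eps:            # gradient criterion
            break
        x_new = x + step * (-g)                # x_{k+1} = x_k + alpha_k d_k
        if np.linalg.norm(x_new - x) / max(1.0, np.linalg.norm(x)) < eps:
            x = x_new                          # iterates stabilized
            break
        x = x_new
    return x

# Example: f(x, y) = x^2 + 3y^2 has its minimum at the origin.
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])
print(gradient_descent(grad_f, x0=[1.0, 1.0]))  # approximately [0, 0]
```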