Section 6.6 Conjugate Gradient
A shortcoming of the steepest descent method for solving \(Ax=b\) is that it tends to "zig-zag": it often returns to the same (or almost the same) direction many times. The method would be much faster if we could find a set of directions that avoids this. A natural fix is to choose the sequence of directions \(\boldsymbol{p}_{k+1}\) to be "\(A\)-orthogonal", namely
\begin{equation*}
\boldsymbol{p}_{k+1}^T A\boldsymbol{p}_j=0\,,\qquad j=0,\dots,k,
\end{equation*}
starting with the usual steepest descent direction \(\boldsymbol{p}_0=-\nabla_{\boldsymbol{x}_0}f\text{.}\) The advantage of this choice, coupled with the line search, is that once a direction has been explored it is (in exact arithmetic!) never visited again, so that the algorithm lands on the minimum in at most a number of steps equal to the size of the matrix \(A\text{.}\)
Remark 6.6.1.
In exact arithmetic, hence, the Conjugate Gradient algorithm is not iterative but direct. In practice, though, it is iterative anyway, for two main reasons:
- because of round-off errors, the directions \(\boldsymbol{p}_{k}\) are never exactly \(A\)-orthogonal;
- when we apply the method to non-quadratic functions, it is not exact anyway.
The algorithm starts from an initial guess \(\boldsymbol{x}_0\) by setting
\begin{equation*}
\boldsymbol{p}_{0}=\boldsymbol{r}_{0}=\boldsymbol{b} -A\boldsymbol{x}_{0}
\end{equation*}
and then we use recursively the following formulae:
\begin{equation*}
\begin{aligned}
\alpha_k&=\frac{\|\boldsymbol{r}_{k}\|^2}{\|\boldsymbol{p}_{k}\|^2_A}=\frac{\boldsymbol{r}^T_{k}\boldsymbol{r}_{k}}{\boldsymbol{p}^T_{k}A\boldsymbol{p}_{k}}\\
\boldsymbol{x}_{k+1}&=\boldsymbol{x}_{k} + \alpha_k\boldsymbol{p}_{k}\\
\boldsymbol{r}_{k+1}&=\boldsymbol{b}-A\boldsymbol{x}_{k+1}=\boldsymbol{r}_{k}-\alpha_kA\boldsymbol{p}_{k} \\
\beta_k&=-\frac{\langle \boldsymbol{r}_{k+1},\boldsymbol{p}_{k}\rangle_A}{\|\boldsymbol{p}_{k}\|^2_A}
=\frac{\|\boldsymbol{r}_{k+1}\|^2}{\|\boldsymbol{r}_{k}\|^2}\\
\boldsymbol{p}_{k+1}&=\boldsymbol{r}_{k+1}+\beta_k\boldsymbol{p}_{k}\\
\end{aligned}
\end{equation*}
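The two claims made above, that the directions are pairwise \(A\)-orthogonal and that the residual vanishes after at most \(n\) steps, can be checked numerically. The sketch below runs the recursion on a small symmetric positive definite system of our choosing (the matrix and right-hand side are illustrative, not from the text) and verifies both properties up to round-off.

```python
# Numerical check of the recursion above on a 3x3 SPD system (illustrative data).

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# A hypothetical 3x3 symmetric positive definite matrix and right-hand side.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

x = [0.0, 0.0, 0.0]
r = list(b)          # r0 = b - A x0  (= b, since x0 = 0)
p = list(r)          # p0 = r0
directions = []

for _ in range(len(b)):                                   # at most n = 3 steps
    Ap = matvec(A, p)
    alpha = dot(r, r) / dot(p, Ap)                        # alpha_k
    x = [xi + alpha * pi for xi, pi in zip(x, p)]         # x_{k+1}
    r_new = [ri - alpha * Api for ri, Api in zip(r, Ap)]  # r_{k+1}
    beta = dot(r_new, r_new) / dot(r, r)                  # beta_k
    directions.append(p)
    p = [ri + beta * pi for ri, pi in zip(r_new, p)]      # p_{k+1}
    r = r_new

# p_i^T A p_j = 0 for i != j (up to round-off) ...
for i in range(len(directions)):
    for j in range(i + 1, len(directions)):
        assert abs(dot(directions[i], matvec(A, directions[j]))) < 1e-9

# ... and the residual vanishes after n steps.
assert all(abs(ri) < 1e-9 for ri in r)
print("A-orthogonality and n-step termination verified")
```

In exact arithmetic the off-diagonal products \(\boldsymbol{p}_i^TA\boldsymbol{p}_j\) would be exactly zero; in floating point they are merely tiny, which is the first of the two reasons listed in the remark above.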
A concrete implementation. Below we find the minimum of our usual quadratic function
\begin{equation*}
f(x,y)=5x^2+4xy+y^2-6x-4y+15
\end{equation*}
with this method.
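A minimal pure-Python sketch of such an implementation follows; the helper names (`matvec`, `dot`, `conjugate_gradient`) are ours. Minimizing this \(f\) amounts to solving \(A\boldsymbol{x}=\boldsymbol{b}\) with \(A\) the Hessian of \(f\) and \(\boldsymbol{b}\) chosen so that \(\nabla f=A\boldsymbol{x}-\boldsymbol{b}\).

```python
# Conjugate Gradient applied to f(x, y) = 5x^2 + 4xy + y^2 - 6x - 4y + 15.
# grad f = (10x + 4y - 6, 4x + 2y - 4), so A = [[10, 4], [4, 2]], b = [6, 4].

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=50):
    x = list(x0)
    r = [bi - Axi for bi, Axi in zip(b, matvec(A, x))]        # r0 = b - A x0
    p = list(r)                                               # p0 = r0
    for _ in range(max_iter):
        if dot(r, r) < tol ** 2:
            break
        Ap = matvec(A, p)
        alpha = dot(r, r) / dot(p, Ap)                        # alpha_k
        x = [xi + alpha * pi for xi, pi in zip(x, p)]         # x_{k+1}
        r_new = [ri - alpha * Api for ri, Api in zip(r, Ap)]  # r_{k+1}
        beta = dot(r_new, r_new) / dot(r, r)                  # beta_k
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]      # p_{k+1}
        r = r_new
    return x

A = [[10, 4], [4, 2]]
b = [6, 4]
x = conjugate_gradient(A, b, x0=[0, 0])
print(x)  # close to [-1, 4], the minimizer of f
```

Since \(A\) is a \(2\times 2\) symmetric positive definite matrix, the method reaches the minimizer \((-1,4)\) in two iterations, up to round-off.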