Section 6.6 Conjugate Gradient
A shortcoming of the steepest descent method for solving \(Ax=b\) is that it tends to "zig-zag": it often returns to the same (or almost the same) direction many times. The method would be much faster if we could find a set of directions that avoids this. A natural fix is to choose the sequence of directions \(\boldsymbol{p}_{k+1}\) to be "\(A\)-orthogonal", namely
\begin{equation*}
\boldsymbol{p}_{k+1}^T A\boldsymbol{p}_j=0\,,\qquad j=0,\dots,k,
\end{equation*}
starting with the usual steepest descent direction \(\boldsymbol{p}_0=-\nabla_{\boldsymbol{x}_0}f\text{.}\) The advantage of this choice, coupled with the line search, is that once a direction has been explored it is (in exact arithmetic!) never visited again, so that the algorithm lands on the minimum in at most a number of steps equal to the size of the matrix \(A\text{.}\)
Remark 6.6.1.
In exact arithmetic, hence, the Conjugate Gradient algorithm is not iterative but direct. In practice, though, it is iterative anyway, for two main reasons:
- because of round-off errors, the directions \(\boldsymbol{p}_{k}\) are never exactly \(A\)-orthogonal;
- when we apply the method to non-quadratic functions, it is not exact anyway.
The algorithm starts from an initial guess \(\boldsymbol{x}_0\) by setting
\begin{equation*}
\boldsymbol{p}_{0}=\boldsymbol{r}_{0}=\boldsymbol{b} -A\boldsymbol{x}_{0}
\end{equation*}
and then we use recursively the following formulae:
\begin{equation*}
\begin{aligned}
\alpha_k&=\frac{\|\boldsymbol{r}_{k}\|^2}{\|\boldsymbol{p}_{k}\|^2_A}=\frac{\boldsymbol{r}^T_{k}\boldsymbol{r}_{k}}{\boldsymbol{p}^T_{k}A\boldsymbol{p}_{k}}\\
\boldsymbol{x}_{k+1}&=\boldsymbol{x}_{k} + \alpha_k\boldsymbol{p}_{k}\\
\boldsymbol{r}_{k+1}&=\boldsymbol{b}-A\boldsymbol{x}_{k+1}=\boldsymbol{r}_{k}-\alpha_kA\boldsymbol{p}_{k} \\
\beta_k&=-\frac{\langle \boldsymbol{r}_{k+1},\boldsymbol{p}_{k}\rangle_A}{\|\boldsymbol{p}_{k}\|^2_A}
=\frac{\|\boldsymbol{r}_{k+1}\|^2}{\|\boldsymbol{r}_{k}\|^2}\\
\boldsymbol{p}_{k+1}&=\boldsymbol{r}_{k+1}+\beta_k\boldsymbol{p}_{k}\\
\end{aligned}
\end{equation*}
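The two claims made above, that the directions are pairwise \(A\)-orthogonal and that the residual vanishes after at most \(n\) steps, can be checked numerically. The sketch below runs the recursion on a small symmetric positive definite system of our choosing (the matrix and right-hand side are illustrative, not from the text) and verifies both properties up to round-off.

```python
# Numerical check of the recursion above on a 3x3 SPD system (illustrative data).

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# A hypothetical 3x3 symmetric positive definite matrix and right-hand side.
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

x = [0.0, 0.0, 0.0]
r = list(b)          # r0 = b - A x0  (= b, since x0 = 0)
p = list(r)          # p0 = r0
directions = []

for _ in range(len(b)):                                   # at most n = 3 steps
    Ap = matvec(A, p)
    alpha = dot(r, r) / dot(p, Ap)                        # alpha_k
    x = [xi + alpha * pi for xi, pi in zip(x, p)]         # x_{k+1}
    r_new = [ri - alpha * Api for ri, Api in zip(r, Ap)]  # r_{k+1}
    beta = dot(r_new, r_new) / dot(r, r)                  # beta_k
    directions.append(p)
    p = [ri + beta * pi for ri, pi in zip(r_new, p)]      # p_{k+1}
    r = r_new

# p_i^T A p_j = 0 for i != j (up to round-off) ...
for i in range(len(directions)):
    for j in range(i + 1, len(directions)):
        assert abs(dot(directions[i], matvec(A, directions[j]))) < 1e-9

# ... and the residual vanishes after n steps.
assert all(abs(ri) < 1e-9 for ri in r)
print("A-orthogonality and n-step termination verified")
```

In exact arithmetic the off-diagonal products \(\boldsymbol{p}_i^TA\boldsymbol{p}_j\) would be exactly zero; in floating point they are merely tiny, which is the first of the two reasons listed in the remark above.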
A concrete implementation. Below we find the minimum of our usual quadratic function
\begin{equation*}
f(x,y)=5x^2+4xy+y^2-6x-4y+15
\end{equation*}
with this method.
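A minimal pure-Python sketch of such an implementation follows; the helper names (`matvec`, `dot`, `conjugate_gradient`) are ours. Minimizing this \(f\) amounts to solving \(A\boldsymbol{x}=\boldsymbol{b}\) with \(A\) the Hessian of \(f\) and \(\boldsymbol{b}\) chosen so that \(\nabla f=A\boldsymbol{x}-\boldsymbol{b}\).

```python
# Conjugate Gradient applied to f(x, y) = 5x^2 + 4xy + y^2 - 6x - 4y + 15.
# grad f = (10x + 4y - 6, 4x + 2y - 4), so A = [[10, 4], [4, 2]], b = [6, 4].

def matvec(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(A, b, x0, tol=1e-12, max_iter=50):
    x = list(x0)
    r = [bi - Axi for bi, Axi in zip(b, matvec(A, x))]        # r0 = b - A x0
    p = list(r)                                               # p0 = r0
    for _ in range(max_iter):
        if dot(r, r) < tol ** 2:
            break
        Ap = matvec(A, p)
        alpha = dot(r, r) / dot(p, Ap)                        # alpha_k
        x = [xi + alpha * pi for xi, pi in zip(x, p)]         # x_{k+1}
        r_new = [ri - alpha * Api for ri, Api in zip(r, Ap)]  # r_{k+1}
        beta = dot(r_new, r_new) / dot(r, r)                  # beta_k
        p = [ri + beta * pi for ri, pi in zip(r_new, p)]      # p_{k+1}
        r = r_new
    return x

A = [[10, 4], [4, 2]]
b = [6, 4]
x = conjugate_gradient(A, b, x0=[0, 0])
print(x)  # close to [-1, 4], the minimizer of f
```

Since \(A\) is a \(2\times 2\) symmetric positive definite matrix, the method reaches the minimizer \((-1,4)\) in two iterations, up to round-off.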