Section 4.5 Error analysis
How reliable is the numerical solution provided by the LU method with pivoting? Can we find bounds for the distance of the numerical solution from the exact one?
Norms. In order to answer these questions we need to recall the concept of norm of a vector/matrix.
Definition 4.5.1.
A norm on a linear space \(\Bbb R^n\) is a map \(\|\cdot\|:\Bbb R^n\to[0,\infty)\) such that:
- \(\|v\|\geq0\) and \(\|v\|=0\) if and only if \(v=0\text{;}\)
- \(\|\lambda v\|=|\lambda|\cdot\|v\|\) for all scalars \(\lambda\in\Bbb R\text{;}\)
- \(\|v+w\|\leq\|v\|+\|w\|\) (triangle inequality).
The most commonly used norms on \(\Bbb R^n\) are:
- \(\|v\|_1 = \displaystyle\sum_{i=1}^n |v^i|\text{;}\) (see Taxicab norm)
- \(\|v\|_2 = \sqrt{\displaystyle\sum_{i=1}^n |v^i|^2}\text{;}\) (the usual Euclidean norm)
- \(\|v\|_\infty = \displaystyle\max_{1\leq i\leq n}|v^i|\text{.}\) (Chebyshev distance)
Every norm on \(\Bbb R^n\) induces a norm on \(n\times n\) matrices, defined by
\begin{equation*}
\|A\| = \max_{\|v\|=1}\|Av\|.
\end{equation*}
When the underlying vector space norm is the Euclidean one, the corresponding matrix norm is called the spectral norm: when \(A\) is symmetric, its spectral norm \(\|A\|_{sp}\) equals the largest modulus of its eigenvalues; when \(A\) is not symmetric, \(\|A\|^2_{sp}=\|AA^*\|_{sp}\text{,}\) where \(A^*\) is the transpose of \(A\text{.}\) Notice that, since \((AB)^*=B^*A^*\text{,}\) the matrix \(AA^*\) is symmetric for any \(A\text{.}\) A very important property relating norms and products of matrices is that, for any matrix norm induced by a vector norm,
\begin{equation*}
\|AB\|\leq\|A\|\cdot\|B\|.
\end{equation*}
We say that these norms are submultiplicative. Throughout this section, we will use submultiplicative norms.
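As a quick check of these definitions, the norms above can be computed in MATLAB/Octave with the built-in norm command. The vector and matrices below are arbitrary examples chosen just for illustration.
v = [3; -4];
norm(v,1)      % Taxicab norm: |3|+|-4| = 7
norm(v,2)      % Euclidean norm: sqrt(9+16) = 5
norm(v,Inf)    % Chebyshev norm: max(|3|,|-4|) = 4
A = [2 1; 0 3];
B = [1 -1; 2 0];
norm(A)                        % spectral norm of A
norm(A*B) <= norm(A)*norm(B)   % submultiplicativity: evaluates to 1 (true)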
Error propagation. Let \(x\) be the solution of \(Ax=b\) and let \(\hat x=x+\Delta x\) be the solution of the perturbed system \(A\hat x=b+\Delta b\text{.}\) Since \(A\) is linear,
\begin{equation*}
A\hat x = A(x+\Delta x)=Ax+A\Delta x=b+\Delta b,
\end{equation*}
so \(A\Delta x = \Delta b\) and therefore
\begin{equation*}
\Delta x = A^{-1}\Delta b.
\end{equation*}
Hence, since \(\|b\|=\|Ax\|\leq\|A\|\cdot\|x\|\text{,}\)
\begin{equation*}
\|x\|\geq\frac{\|b\|}{\|A\|}
\end{equation*}
and
\begin{equation*}
\|\Delta x\|\leq\|A^{-1}\|\cdot\|\Delta b\|,
\end{equation*}
so that
\begin{equation*}
\frac{\|\Delta x\|}{\|x\|}\leq\|A^{-1}\|\cdot\|\Delta b\|\frac{\|A\|}{\|b\|},
\end{equation*}
namely
\begin{equation*}
\frac{\|\Delta x\|}{\|x\|}\leq \|A\|\cdot\|A^{-1}\|\frac{\|\Delta b\|}{\|b\|}.
\end{equation*}
More generally, one can prove that
\begin{equation*}
\frac{\|\Delta x\|}{\|x\|}\leq\|A\|\cdot\|A^{-1}\|\left(\frac{\|\Delta b\|}{\|b\|}+\frac{\|\Delta A\|}{\|A\|}\right),
\end{equation*}
namely the relative error in the solution is bounded by the sum of the relative errors in the coefficients of the equation, rescaled by the factor \(\|A\|\cdot\|A^{-1}\|\text{.}\)
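The bound above can be verified numerically. Here is a minimal MATLAB/Octave sketch, with an arbitrary matrix and a perturbation of the right hand side only (so that \(\Delta A=0\)): the relative error always stays below the bound.
A  = [4 1; 1 3];
b  = [1; 2];
db = 1e-6*[1; -1];   % small perturbation of the right hand side
x  = A\b;            % unperturbed solution
xh = A\(b + db);     % perturbed solution
rel_err   = norm(xh - x)/norm(x)
rel_bound = norm(A)*norm(inv(A))*norm(db)/norm(b)   % k(A)*||db||/||b||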
Conditioning. The quantity
\begin{equation*}
k(A) = \|A\|\cdot\|A^{-1}\|
\end{equation*}
is called the conditioning of \(A\text{.}\) Clearly \(k(\Bbb I_n)=1\) in any induced norm and, for every invertible matrix \(A\text{,}\)
\begin{equation*}
1=\|{\Bbb I}_n\| = \|AA^{-1}\|\leq\|A\|\cdot\|A^{-1}\|=k(A).
\end{equation*}
so that \(k(A)\geq1\text{.}\) For a diagonal matrix with diagonal elements \(d_i\text{,}\)
\begin{equation*}
k(A)=\max_i|d_i|/\min_i|d_i|\text{.}
\end{equation*}
In the spectral norm, the conditioning of a symmetric matrix equals the ratio between the largest and the smallest modulus of its eigenvalues; for a general matrix, it equals the ratio between its largest and smallest singular values. Maps that stretch a lot in one direction and shrink a lot in another have a large conditioning number, since in the plane of those two directions the map looks almost like a degenerate map sending the whole plane onto a line. In MATLAB/Octave, the conditioning of a matrix A can be obtained with the command cond(A). We use this command in the code below to evaluate the conditioning of the Hilbert matrices \(H^{(n)}\) defined by
\begin{equation*}
H^{(n)}_{ij}=\frac{1}{i+j-1},\;\;i,j=1,\dots,n.
\end{equation*}
These matrices are known to have, as \(n\) increases, a largest eigenvalue that stays of moderate size and a smallest eigenvalue that goes to zero very quickly. As a consequence, their conditioning grows very fast. In MATLAB/Octave, \(H^{(n)}\) is returned by the command hilb(n).
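Below is a minimal version of such code; the range of \(n\) is an arbitrary choice.
% Conditioning of the Hilbert matrices H^(n) for n = 2,...,12
for n = 2:12
  H = hilb(n);    % n-th Hilbert matrix
  fprintf('n = %2d   cond(H) = %.3e\n', n, cond(H));
end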
Residual. The residual of a numerical solution \(\hat x\) of a linear system \(Ax=b\) is
\begin{equation*}
r = b-A\hat x.
\end{equation*}
Notice that, if there were no error and \(\hat x\) coincided with the exact solution \(x\text{,}\) then \(r=0\) since \(Ax=b\text{.}\) In principle the residual is a good estimator of the quality of the solution, but only when the conditioning of the matrix \(A\) is not too large! Indeed, by definition, \(r = b-A\hat x=-A\Delta x\) and therefore \(\Delta x=-A^{-1}r\text{,}\) so that
\begin{equation*}
\|\Delta x\|\leq\|A^{-1}\|\cdot\|r\|,\;\;\|x\|\geq\frac{\|b\|}{\|A\|}\text{.}
\end{equation*}
Hence
\begin{equation*}
\frac{\|\Delta x\|}{\|x\|}\leq k(A)\frac{\|r\|}{\|b\|}
\end{equation*}
so that, if \(k(A)\) is very large, we can get very imprecise solutions even in the presence of very small residuals. We illustrate how things can go wrong in a concrete case. Let us solve, with the LU decomposition method, the linear system
\begin{equation*}
\begin{pmatrix}
0.913&0.659\\
0.457&0.330\\
\end{pmatrix}\cdot
\begin{pmatrix}
x\\
y\\
\end{pmatrix}
=
\begin{pmatrix}
0.254\\
0.127\\
\end{pmatrix},
\end{equation*}
in our \(D_3\) floating point system. Note that the exact solution of the system is
\begin{equation*}
x=1, y=-1.
\end{equation*}
Applying the LU decomposition, we find that
\begin{equation*}
L =
\begin{pmatrix}
1&0\\
0.501&1\\
\end{pmatrix},\;\;
U =
\begin{pmatrix}
0.913&0.659\\
0&-1.59\cdot10^{-4}\\
\end{pmatrix}
\end{equation*}
Remark 4.5.2.
Note that the 0 in the \(U\) matrix does not come from the actual difference
\begin{equation*}
0.457-0.501\cdot0.913.
\end{equation*}
Floating point calculations almost never produce an exact zero! We put the zero there "by hand" because we know that this would be the result in exact arithmetic, and without those zeros the LU method simply could not work. Of course, this is a source of instability for sensitive matrices, namely matrices whose conditioning is large.
Solving the lower triangular system associated to \(L\) by forward substitution, we are left with the upper triangular system
\begin{equation*}
\begin{pmatrix}
0.913&0.659\\
0&-1.59\cdot10^{-4}\\
\end{pmatrix}
\cdot
\begin{pmatrix}
x\\
y\\
\end{pmatrix}
=
\begin{pmatrix}
0.254\\
-2.54\cdot10^{-4}\\
\end{pmatrix}.
\end{equation*}
The system above has solution
\begin{equation*}
\hat y=\frac{-2.54}{-1.59}=1.60,\;\hat x=\frac{0.254-0.659\cdot1.60}{0.913}=-0.872,
\end{equation*}
quite far from the correct one! Can we tell from the residual how bad this solution is? The answer is actually negative. The residual of this numerical solution is
\begin{equation*}
\begin{pmatrix}
0.254\\
0.127\\
\end{pmatrix}
-
\begin{pmatrix}
0.913&0.659\\
0.457&0.330\\
\end{pmatrix}
\cdot
\begin{pmatrix}
-0.872\\
1.60\\
\end{pmatrix}
=
\begin{pmatrix}
-4.26\cdot10^{-3}\\
-2.50\cdot10^{-3}\\
\end{pmatrix}
\end{equation*}
The Taxicab norm of this residual is
\begin{equation*}
\|r\|=6.76\cdot10^{-3}
\end{equation*}
but the numerical solution is quite far from the exact one \(x=1,y=-1\)! Note also that points much closer to the solution, such as \(x=0.999\text{,}\) \(y=-1.01\text{,}\) have a residual with a larger norm:
\begin{equation*}
\begin{pmatrix}
0.254\\
0.127\\
\end{pmatrix}
-
\begin{pmatrix}
0.913&0.659\\
0.457&0.330\\
\end{pmatrix}
\cdot
\begin{pmatrix}
0.999\\
-1.01\\
\end{pmatrix}
=
\begin{pmatrix}
7.50\cdot10^{-3}\\
3.76\cdot10^{-3}\\
\end{pmatrix} ,
\end{equation*}
whose Taxicab norm is
\begin{equation*}
\|r\|=1.13\cdot10^{-2}\text{,}
\end{equation*}
about twice the residual of the badly wrong numerical solution above. The reason for this counterintuitive behavior is that \(A\) is ill-conditioned:
\begin{equation*}
k(A)\simeq10^4.
\end{equation*}
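These values can be checked directly in MATLAB/Octave: cond gives the conditioning, while eig returns the eigenvalues and eigenvectors used in the discussion below.
A = [0.913 0.659; 0.457 0.330];
cond(A)           % of order 1e4
[V, D] = eig(A)   % eigenvalues roughly 1.24 and 1.0e-4; the eigenvector of the
                  % small eigenvalue is parallel to (-0.7, 1) up to normalization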
So... why does this happen, and what does it mean? The matrix \(A\) has one eigenvalue of about 1 while the other is about \(10^{-4}\text{.}\) An eigenvector relative to the second eigenvalue is approximately \((-0.7,1)\text{.}\) This means that every vector with direction close to \((-0.7,1)\) is scaled by \(A\) by a factor of about \(10^{-4}\text{,}\) namely it is shrunk almost to zero. Now,
\begin{equation*}
x-\hat x
\simeq
\begin{pmatrix}
1.87\\
-2.60\\
\end{pmatrix}
\simeq
-2.6
\begin{pmatrix}
-0.7\\
1\\
\end{pmatrix}
\end{equation*}
Hence in this case the residual vector is very small just by coincidence: \(x-\hat x\) is almost an eigenvector of the smallest eigenvalue of \(A\text{,}\) which happens to be much smaller than the largest one. If instead we pick "by hand" a point \(\hat x\) close to the solution, then \(x-\hat x\) is far from parallel to \((-0.7,1)\text{,}\) and its residual is small solely because \(\Delta x\) itself is small. Of course, in general we do not know the solution in advance, so when \(k(A)\) is large we cannot easily tell which of the two situations we are in. Finally, you can use the code in this section to solve the system in double precision. You will notice that this time the numerical solution is quite close to the exact one, thanks to the much higher precision of the calculations. On the other hand, one could build matrices with a conditioning so large that even in double precision the error would be rather large. In other words: always check the conditioning of your matrix before solving a linear system!
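As a final sketch, here is one possible way to carry out the computation in double precision in MATLAB/Octave, checking the conditioning first; the built-in backslash solver is used rather than an explicit LU decomposition.
A = [0.913 0.659; 0.457 0.330];
b = [0.254; 0.127];
cond(A)         % about 1e4: the matrix is ill-conditioned
x = A\b         % double precision solution, very close to (1, -1)
r = b - A*x     % residual, at the level of machine precision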