Section 4.5 Error analysis

How reliable is the numerical solution provided by the LU method with pivoting? Can we find bounds for the distance of the numerical solution from the exact one?

The answer is positive, but in order to express these bounds we need to be able to evaluate the "size" of a vector and of a matrix. We do this through a tool, defined below, called a norm.

Norms. We recall below how to define norms (i.e. lengths) of vectors and matrices.
Definition 4.5.1.
A norm on a linear space \(\Bbb R^n\) is a map \(\|\cdot\|:\Bbb R^n\to[0,\infty)\) such that:
  1. \(\|v\|\geq0\) and \(\|v\|=0\) if and only if \(v=0\text{;}\)
  2. \(\|\lambda v\|=|\lambda|\cdot\|v\|\) for all scalars \(\lambda\in\Bbb R\text{;}\)
  3. \(\|v+w\|\leq\|v\|+\|w\|\) (triangle inequality).
Examples:

  1. \(\|v\|_1 = \displaystyle\sum_{i=1}^n |v^i|\text{;}\) (see Taxicab norm)
  2. \(\|v\|_2 = \sqrt{\displaystyle\sum_{i=1}^n |v^i|^2}\text{;}\) (the usual Euclidean norm)
  3. \(\|v\|_\infty = \displaystyle\max_{1\leq i\leq n}|v^i|\text{.}\) (Chebyshev distance)
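
In NumPy these three norms can be computed with the function numpy.linalg.norm; here is a minimal sketch, with an arbitrarily chosen vector:

import numpy as np

v = np.array([3.0, -4.0, 1.0])      # an arbitrary sample vector

print(np.linalg.norm(v, 1))         # Taxicab norm: 3 + 4 + 1 = 8
print(np.linalg.norm(v, 2))         # Euclidean norm: sqrt(9 + 16 + 1) ≈ 5.10
print(np.linalg.norm(v, np.inf))    # Chebyshev norm: max(3, 4, 1) = 4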

The norm of a vector with endpoints \(A\) and \(B\) is also the distance between point \(A\) and point \(B\text{,}\) and the concept of distance depends on the environment. For instance, if points \(A\) and \(B\) are two points in an ice rink, the most suitable concept of distance is definitely the Euclidean one: one just moves from \(A\) to \(B\) on the straight line joining them. If the two points mark two buildings in Manhattan, though, then to go from \(A\) to \(B\) one cannot move diagonally (you'd cross the buildings) but only either horizontally or vertically, so one should use the Taxicab norm. But if \(A\) marks a King piece on a chessboard, which can also move diagonally, then the number of moves it takes the King to get to \(B\) is given by the Chebyshev distance between \(A\) and \(B\text{!}\)

Notice that, if a sequence of vectors goes to zero in one norm of \(\Bbb R^n\text{,}\) then it goes to zero in all the other ones, so from the point of view of limits it does not matter which norm one chooses.

The space of all \(n\times n\) matrices is a linear space and so the norms above apply to matrices as well. For matrices, though, there is a special class of norms that are induced by the underlying vector norm in the following way:
\begin{equation*} \|A\| = \max_{\|v\|=1}\|Av\|. \end{equation*}
Notice that, with this definition,
\begin{equation*} \|Av\|\leq\|A\|\cdot\|v\| \end{equation*}
for all vectors \(v\text{.}\)

In case the underlying vector norm is the Euclidean one, the corresponding matrix norm is called the spectral norm: when \(A\) is symmetric, its spectral norm \(\|A\|_{sp}\) is the largest modulus of its eigenvalues; when \(A\) is not symmetric, then \(\|A\|^2_{sp}=\|AA^*\|_{sp}\text{,}\) where \(A^*\) is the transpose of \(A\text{.}\) Notice that, since \((AB)^*=B^*A^*\text{,}\) the matrix \(AA^*\) is symmetric for any \(A\text{.}\)
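
As a quick numerical illustration (a sketch with an arbitrary random matrix), one can check with NumPy that the spectral norm coincides with the square root of the largest eigenvalue of \(AA^*\) and that \(\|Av\|\leq\|A\|\cdot\|v\|\text{:}\)

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))     # an arbitrary non-symmetric matrix
v = rng.standard_normal(4)

spec = np.linalg.norm(A, 2)                      # spectral norm of A
lam  = np.max(np.linalg.eigvalsh(A @ A.T))       # largest eigenvalue of A A^*
print(spec, np.sqrt(lam))                        # the two numbers coincide
print(np.linalg.norm(A @ v) <= spec * np.linalg.norm(v))   # True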

A very important property relating norms and products of matrices is that, in case of any norm induced by the underlying vector norm,
\begin{equation*} \|AB\|\leq\|A\|\cdot\|B\|. \end{equation*}
We say that these norms are submultiplicative. Throughout this section, we will use submultiplicative norms.
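
The property is easy to verify numerically as well; a sketch with two arbitrary random matrices:

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

for p in (1, 2):                                 # induced 1-norm and spectral norm
    lhs = np.linalg.norm(A @ B, p)
    rhs = np.linalg.norm(A, p) * np.linalg.norm(B, p)
    print(p, lhs <= rhs)                         # True in both norms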

Error propagation in linear systems. Suppose you want to solve numerically the linear system
\begin{equation*} Av=b. \end{equation*}
First, notice that we have to input the entries of \(A\) and the components of \(b\text{,}\) and this alone will entail an error, since most numbers cannot be represented exactly in either single or double precision floating point systems. This means that the system we are really solving numerically is rather
\begin{equation*} (A+\Delta A)v=b+\Delta b. \end{equation*}
To keep the discussion as simple as possible, let us assume now that \(A\)'s entries are represented exactly, namely that \(\Delta A=0\text{.}\)

Denote by \(v_e\) the solution of \(Av=b\) and by \(v_n\) the solution of \(Av=b+\Delta b\) and set \(\Delta v=v_n-v_e\text{,}\) so that
\begin{equation*} v_n = v_e + \Delta v. \end{equation*}
In other words, \(v_e\) is the exact solution of the original system while \(v_n\) is the exact solution of the "floating point version" of the system. Notice that the actual solution that any algorithm will find won't be exactly \(v_n\) because of the floating point errors involved in the operations needed to get the result. In the following, we assume that these errors are negligible and so we assume that \(v_n\) is actually our numerical solution.

Since \(A\) is linear,
\begin{equation*} b+\Delta b = A v_n = A(v_e+\Delta v) = Av_e+A\Delta v, \end{equation*}
so \(A\Delta v = \Delta b\) and therefore
\begin{equation*} \Delta v = A^{-1}\Delta b. \end{equation*}

Hence, from \(Av_e=b\text{,}\) taking norms and using \(\|b\|=\|Av_e\|\leq\|A\|\cdot\|v_e\|\text{,}\) we get that
\begin{equation*} \|v_e\|\geq\frac{\|b\|}{\|A\|} \end{equation*}
and, from \(A\Delta v = \Delta b\text{,}\) we get that
\begin{equation*} \|\Delta v\|\leq\|A^{-1}\|\cdot\|\Delta b\|. \end{equation*}
From these two inequalities we get that
\begin{equation*} \frac{\|\Delta v\|}{\|v_e\|}\leq\|A^{-1}\|\cdot\|\Delta b\|\frac{\|A\|}{\|b\|}, \end{equation*}
namely
\begin{equation*} \frac{\|\Delta v\|}{\|v_e\|}\leq \|A\|\cdot\|A^{-1}\|\frac{\|\Delta b\|}{\|b\|}. \end{equation*}
More generally, when there is an error also on the entries of \(A\text{,}\) one can prove that
\begin{equation*} \frac{\|\Delta v\|}{\|v_e\|}\leq\|A\|\cdot\|A^{-1}\|\left(\frac{\|\Delta b\|}{\|b\|}+\frac{\|\Delta A\|}{\|A\|}\right), \end{equation*}
namely, the relative error in the solution is bounded by the sum of the relative errors in the coefficients of the system, rescaled by the factor \(\|A\|\cdot\|A^{-1}\|\text{.}\)
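
The inequality is easy to observe in practice. The sketch below (the matrix, right-hand side and size of the perturbation are arbitrary choices) solves a system twice, with and without a small perturbation of \(b\text{,}\) and compares the relative error on the solution with the bound \(\|A\|\cdot\|A^{-1}\|\,\|\Delta b\|/\|b\|\) in the spectral norm:

import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])          # an arbitrary test matrix
b = np.array([1.0, 2.0, 3.0])

v_e = np.linalg.solve(A, b)              # solution of the unperturbed system

rng = np.random.default_rng(2)
db = 1e-6 * rng.standard_normal(3)       # small perturbation of the right-hand side
v_n = np.linalg.solve(A, b + db)         # solution of the perturbed system

factor    = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)   # ||A|| * ||A^{-1}||
rel_err   = np.linalg.norm(v_n - v_e) / np.linalg.norm(v_e)
rel_bound = factor * np.linalg.norm(db) / np.linalg.norm(b)
print(rel_err, rel_bound, rel_err <= rel_bound)   # the bound holds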

Conditioning. The quantity
\begin{equation*} k(A) = \|A\|\cdot\|A^{-1}\| \end{equation*}
is called the conditioning of \(A\text{.}\)

Clearly \(k(\Bbb I_n)=1\) in any induced norm. Moreover, for every invertible matrix \(A\text{,}\)
\begin{equation*} 1=\|{\Bbb I}_n\| = \|AA^{-1}\|\leq\|A\|\cdot\|A^{-1}\|=k(A). \end{equation*}
For a diagonal matrix with diagonal elements \(d_i\text{,}\)
\begin{equation*} k(A)=\frac{\max_i|d_i|}{\min_i|d_i|}\text{.} \end{equation*}
More generally, in the spectral norm the conditioning of a symmetric matrix equals the ratio between the largest and the smallest modulus of its eigenvalues; for a general matrix, it equals the ratio between its largest and smallest singular values. Maps that stretch a lot in one direction and shrink a lot in some other one have a large conditioning, since in the plane of those two directions the map looks like a degenerate map that sends the whole plane into a line.

The conditioning of a matrix \(A\) in Python can be obtained with the NumPy function numpy.linalg.cond(A). We use this function in the code below to evaluate the conditioning of the Hilbert matrices \(H^{(n)}\) defined by
\begin{equation*} H^{(n)}_{ij}=\frac{1}{i+j-1},\;\;i,j=1,\dots,n. \end{equation*}
These matrices are known to have, as \(n\) increases, a largest eigenvalue that is not particularly large and a smallest eigenvalue that goes to zero very fast. As a consequence, their conditioning grows very fast. In Python, \(H^{(n)}\) is returned by the SciPy function scipy.linalg.hilbert(n).
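
A minimal version of such a script could look as follows (assuming NumPy and SciPy are available):

import numpy as np
from scipy.linalg import hilbert

# Conditioning of the Hilbert matrices H^(n) for increasing size n
for n in range(2, 13):
    H = hilbert(n)                       # the n x n Hilbert matrix
    print(n, np.linalg.cond(H))          # grows extremely fast with n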

Residual. The residual of a numerical solution \(v_n\) of a linear system is
\begin{equation*} r = b-Av_n. \end{equation*}
Notice that, if there were no error and \(v_n\) coincided with the solution \(v_e\text{,}\) then \(r=0\) since \(Av_n=b\text{.}\)

In principle, the residual is a good estimator of the quality of the solution but only when the conditioning of the matrix \(A\) is not too large!

Indeed, by definition, \(r = b-Av_n = A(v_e-v_n) = -A\Delta v\text{,}\) and therefore \(\Delta v=-A^{-1}r\text{,}\) so that
\begin{equation*} \|\Delta v\|\leq\|A^{-1}\|\cdot\|r\|,\;\;\|v_e\|\geq\frac{\|b\|}{\|A\|}\text{.} \end{equation*}
Hence
\begin{equation*} \frac{\|\Delta v\|}{\|v_e\|}\leq k(A)\frac{\|r\|}{\|b\|} \end{equation*}
so that, if \(k(A)\) is very large, we can have very imprecise solutions even in presence of very small residuals.

We illustrate how things can go bad in a concrete case. Let us solve, with the LU decomposition method, the linear system
\begin{equation*} \begin{pmatrix} 0.913&0.659\\ 0.457&0.330\\ \end{pmatrix}\cdot \begin{pmatrix} x\\ y\\ \end{pmatrix} = \begin{pmatrix} 0.254\\ 0.127\\ \end{pmatrix}, \end{equation*}
in our \(D_3\) floating point system (numbers with three significant digits).

Note that the exact solution of the system is
\begin{equation*} x_e=1, y_e=-1. \end{equation*}

Applying the LU decomposition, we find that
\begin{equation*} L = \begin{pmatrix} 1&0\\ 0.501&1\\ \end{pmatrix},\;\; U = \begin{pmatrix} 0.913&0.659\\ 0&-1.59\cdot10^{-4}\\ \end{pmatrix} \end{equation*}
Remark 4.5.2.
Note that the 0 in the \(U\) matrix does not come from the actual difference
\begin{equation*} 0.457-0.501\cdot0.913. \end{equation*}
Floating point calculations almost never produce an exact zero! We put the zero there "by hand" because we know that this would be the result in exact arithmetic, and without those zeros the LU method just could not work. Of course this is a source of instability for sensitive matrices, namely matrices whose conditioning is large.
Hence we are left with the system
\begin{equation*} \begin{pmatrix} 0.913&0.659\\ 0&-1.59\cdot10^{-4}\\ \end{pmatrix} \cdot \begin{pmatrix} x\\ y\\ \end{pmatrix} = \begin{pmatrix} 0.254\\ -2.54\cdot10^{-4}\\ \end{pmatrix}. \end{equation*}
The system above has solution
\begin{equation*} y_n=\frac{-2.54\cdot10^{-4}}{-1.59\cdot10^{-4}}=1.60,\;\; x_n=\frac{0.254-0.659\cdot1.60}{0.913}=-0.872, \end{equation*}
quite far from the correct one!

Can we tell how bad this solution is from the residual? Unfortunately, the answer is negative. The residual of this numerical solution is
\begin{equation*} \begin{pmatrix} 0.254\\ 0.127\\ \end{pmatrix} - \begin{pmatrix} 0.913&0.659\\ 0.457&0.330\\ \end{pmatrix} \cdot \begin{pmatrix} -0.872\\ 1.60\\ \end{pmatrix} = \begin{pmatrix} -4.26\cdot10^{-3}\\ -2.50\cdot10^{-3}\\ \end{pmatrix} \end{equation*}
The norm of this residual in the Taxicab norm is
\begin{equation*} \|r\|=6.76\cdot10^{-3} \end{equation*}
but the numerical solution is quite far from the exact one \(x_e=1,y_e=-1\)!

Note also that points much closer to the solution, like \(x=0.999\text{,}\) \(y=-1.01\text{,}\) have a residual with a larger norm:
\begin{equation*} \begin{pmatrix} 0.254\\ 0.127\\ \end{pmatrix} - \begin{pmatrix} 0.913&0.659\\ 0.457&0.330\\ \end{pmatrix} \cdot \begin{pmatrix} 0.999\\ -1.01\\ \end{pmatrix} = \begin{pmatrix} 7.50\cdot10^{-3}\\ 3.76\cdot10^{-3}\\ \end{pmatrix} , \end{equation*}
whose Taxicab norm is
\begin{equation*} \|r\|=1.13\cdot10^{-2}\text{,} \end{equation*}
about twice the residual of the quite wrong numerical solution above.

The reason for this rather counterintuitive behavior is that \(A\) is ill-conditioned:
\begin{equation*} k(A)\simeq10^4. \end{equation*}
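
These numbers are easy to double-check in double precision; a sketch with NumPy, using the data of the example:

import numpy as np

A = np.array([[0.913, 0.659],
              [0.457, 0.330]])
b = np.array([0.254, 0.127])

v_n    = np.array([-0.872, 1.60])        # numerical solution found in D_3
v_good = np.array([0.999, -1.01])        # a point much closer to the exact solution

print(np.linalg.norm(b - A @ v_n, 1))    # about 6.8e-3: small residual, bad solution
print(np.linalg.norm(b - A @ v_good, 1)) # about 1.1e-2: larger residual, better point
print(np.linalg.cond(A))                 # about 1.2e4: A is ill-conditioned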
So... why does this happen, and what does it mean?

The matrix \(A\) has one eigenvalue of about \(1\text{,}\) while the second is of about \(10^{-4}\text{.}\) An eigenvector corresponding to the second eigenvalue is approximately \((-0.7,1)\text{.}\) This means that every vector whose direction is close to \((-0.7,1)\) gets shrunk by \(A\) by a factor of about \(10^{-4}\text{.}\) Now,
\begin{equation*} \begin{pmatrix} x_e\\ y_e\\ \end{pmatrix} - \begin{pmatrix} x_n\\ y_n\\ \end{pmatrix} \simeq \begin{pmatrix} 1.87\\ -2.60\\ \end{pmatrix} \simeq -2.6 \begin{pmatrix} -0.7\\ 1\\ \end{pmatrix} \end{equation*}
Hence in this case the residual vector is very small just by coincidence: \(\begin{pmatrix} x_e\\ y_e\\ \end{pmatrix} - \begin{pmatrix} x_n\\ y_n\\ \end{pmatrix}\) is almost an eigenvector of the smallest eigenvalue of \(A\text{,}\) which happens to be much smaller than the largest one. If instead we choose "by hand" a point \(\begin{pmatrix} x_n\\ y_n\\ \end{pmatrix}\) close to the exact solution, as we did above, then \(\begin{pmatrix} x_e\\ y_e\\ \end{pmatrix} - \begin{pmatrix} x_n\\ y_n\\ \end{pmatrix}\) is far from parallel to \((-0.7,1)\text{,}\) and the residual is small simply because \(\Delta v\) itself is small. Of course, in general we do not know the solution in advance, and so, when \(k(A)\) is large, we cannot easily tell which of the two situations we are in.

Finally, you can use the code below to solve the system in double precision. You will notice that this time the numerical solution is pretty close to the exact one, thanks to the much higher precision of the calculations. On the other hand, one could build matrices with a conditioning so large that even in double precision the error would be rather large.
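
A minimal double-precision check (a sketch using numpy.linalg.solve, which performs an LU factorization with partial pivoting in double precision):

import numpy as np

A = np.array([[0.913, 0.659],
              [0.457, 0.330]])
b = np.array([0.254, 0.127])

print(np.linalg.solve(A, b))             # very close to the exact solution (1, -1)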

In other words, always check the conditioning of your matrix before solving a system numerically!