
Section 4.1 Linear spaces and linear maps

Vector spaces. In order to talk about linear functions/maps, we first need to introduce the concept of vector space. Vector spaces are also called linear spaces precisely because they provide the environment where linear maps live.

The simplest examples of vector spaces are the sets of all ordered \(n\)-tuples of real numbers
\begin{equation*} (x_1,x_2,\dots,x_n), \end{equation*}
where \(n\) can be any positive integer. Each \(n\)-tuple is called a vector. These sets are usually denoted by \(\Bbb R^n\text{,}\) so that \(\Bbb R^1\) is the line, \(\Bbb R^2\) is the plane, and so on. The number \(n\) is called the dimension of the space.

There are two basic natural operations defined on \(\Bbb R^n\text{:}\)
  1. given two vectors \((x_1,\dots,x_n)\) and \((y_1,\dots,y_n)\text{,}\) we can sum them component-wise:
    \begin{equation} (x_1,\dots,x_n)+(y_1,\dots,y_n)=(x_1+y_1,\dots,x_n+y_n)\label{eq-vsum}\tag{4.1.1} \end{equation}
  2. given a real number \(r\) and a vector \((x_1,\dots,x_n)\text{,}\) we can multiply the vector by the number:
    \begin{equation} r\cdot(x_1,\dots,x_n) = (r\cdot x_1,\dots,r\cdot x_n).\label{eq-vmul}\tag{4.1.2} \end{equation}
These operations satisfy several properties that they inherit from the corresponding properties of real numbers: the sum is commutative and associative, there exists a zero vector (the vector whose components are all zeros), and each vector has a unique opposite vector; the product is associative and distributive, and multiplying by 1 does not change the vector.
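For instance, these two operations can be carried out with NumPy arrays as follows (a minimal illustration; the values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # a vector of R^3
y = np.array([4.0, 5.0, 6.0])   # another vector of R^3
r = 2.5                         # a scalar

print(x + y)   # component-wise sum (4.1.1): [5. 7. 9.]
print(r * x)   # scalar multiplication (4.1.2): [2.5 5.  7.5]
```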

In finite dimension -- and this is the only relevant case in Numerical Analysis -- every vector space is isomorphic to \(\Bbb R^n\text{,}\) so the elementary examples above are essentially all there is.

More abstractly, a vector space \(V\) is a set of elements (called vectors) on which two operations are defined: a sum, namely an operation denoted by + that satisfies the usual properties of the addition of real numbers, and a multiplication by a scalar (in our case the scalars are real numbers).

The sum satisfies the following properties:
  1. \(v+u=u+v\text{;}\)
  2. \(v+(u+w)=(v+u)+w\text{;}\)
  3. there is a vector denoted by 0 such that \(v+0=0+v=v\text{;}\)
  4. each \(v\) has an opposite vector, denoted by \(-v\text{,}\) such that \(v+(-v)=0\text{.}\)

The scalar multiplication satisfies the following properties:
  1. \(s\cdot(t\cdot v) = (s\cdot t)\cdot v\text{;}\)
  2. \(s\cdot(u+v)=s\cdot u+s\cdot v\text{;}\)
  3. \((s+t)\cdot v=s\cdot v+t\cdot v\text{;}\)
  4. \(1\cdot v=v\text{;}\)

Dimension of a vector space. Each vector space is characterized by its dimension. Saying that \(V\) has dimension \(n\) means precisely that there are \(n\) vectors \(e_1,e_2,\dots,e_n\text{,}\) called a base of \(V\text{,}\) such that each vector \(v\) of \(V\) can be written in a unique way as
\begin{equation*} v = v_1e_1+v_2e_2+\dots+v_ne_n, \end{equation*}
where \(v_1,v_2,\dots,v_n\) are real numbers and are called the components of \(v\) in that base.

Hence, once a base of \(V\) is given, each vector can be represented by an \(n\)-tuple of real numbers: \((v_1,v_2,\dots,v_n)\text{.}\) Notice that this is exactly what a Python array is: any array v is precisely an ordered collection of \(n\) elements v[0],v[1],...,v[n-1] (notice that, as in most programming languages, Python starts counting from 0 rather than from 1).
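A tiny illustration (the use of NumPy here is just one possible choice):

```python
import numpy as np

v = np.array([3.0, -1.0, 0.5, 2.0])   # the components of a vector of R^4
print(v[0], v[3])                     # first and last components: 3.0 2.0
print(len(v))                         # number of components: 4
```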

Incidentally, the argument above shows that the example of the vector space \(\Bbb R^n\) given earlier is fundamental: each vector space of dimension \(n\) is actually isomorphic to \(\Bbb R^n\text{.}\) In some sense, there is just one vector space for each finite dimension \(n\) (in infinite dimension this is not the case).

Column vectors. In mathematics, vectors are usually represented by columns rather than by rows. In Python, this difference is not noticeable unless one defines an array of \(n\) elements as a \(1\times n\) matrix. In this case, the transpose (see the code below) is an \(n\times 1\) matrix.
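The code mentioned above is, in a minimal NumPy version, the following (names and values are just for illustration):

```python
import numpy as np

v   = np.array([1.0, 2.0, 3.0])      # plain array: shape (3,), no row/column distinction
row = np.array([[1.0, 2.0, 3.0]])    # a 1x3 matrix, i.e. a row vector
col = row.T                          # its transpose: a 3x1 matrix, i.e. a column vector

print(v.shape)     # (3,)
print(row.shape)   # (1, 3)
print(col.shape)   # (3, 1)
```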
As mentioned above, in principle we can always think of a vector space of dimension \(n\) just as the set of all \(n\)-tuples of real numbers \((v_1,v_2,\dots,v_n)\text{,}\) so that the sum and scalar multiplication are given by (4.1.1) and (4.1.2).

We need to remember, though, that the "true" object is the vector and not its components. If we change the base of \(V\text{,}\) the components of the vectors change although the vectors themselves don't.

Linear functions. So, what does it mean that \(f\) is linear? It means that \(f\) is, in some sense, transparent with respect to these two operations, namely
  1. \(\displaystyle f(u+v)=f(u)+f(v)\)
  2. \(\displaystyle f(s\cdot v)=s\cdot f(v)\)

The simplest case is when \(f\) is a linear function of one variable:
\begin{equation*} f(x) = f(x\cdot1)=x\cdot f(1)=c\cdot x, \end{equation*}
where \(c=f(1)\) is some real constant. Hence, \(f(x)\) is a monomial of degree 1.

When \(f\) is a function of two variables \((x,y)\text{,}\) then
\begin{align*} f(x,y)&=f(x\cdot(1,0)+y\cdot(0,1))\\ &=f(x\cdot(1,0))+f(y\cdot(0,1))\\ &=x\cdot f(1,0)+y\cdot f(0,1)\\ &=ax+by, \end{align*}
where \(a=f(1,0)\) and \(b=f(0,1)\text{.}\)

In matrix notation,
\begin{equation*} f(x,y) = \begin{pmatrix}a&b\cr\end{pmatrix}\begin{pmatrix}x\cr y\cr\end{pmatrix}. \end{equation*}
In the code below we evaluate the function
\begin{equation*} f(x,y)=x+2y=\begin{pmatrix}1&2\cr\end{pmatrix}\begin{pmatrix}x\cr y\cr\end{pmatrix} \end{equation*}
at \(x=-2, y=4\text{:}\)
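A minimal NumPy version of that evaluation (the original interactive cell may differ) is:

```python
import numpy as np

a = np.array([1.0, 2.0])     # the row (1 2) representing f
v = np.array([-2.0, 4.0])    # the point (x, y) = (-2, 4)

print(a @ v)   # f(-2, 4) = 1*(-2) + 2*4 = 6.0
```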
In case of a linear function in \(n\) variables,
\begin{equation*} f(x_1,\dots,x_n) = a_1x_1+\dots+a_nx_n=\begin{pmatrix}a_1&\dots&a_n\cr\end{pmatrix}\begin{pmatrix}x_1\cr \vdots\cr x_n\cr\end{pmatrix}. \end{equation*}

Linear maps. Any linear map \(f\) from an \(n\)-dimensional vector space to an \(m\)-dimensional vector space can be thought of as a (column) vector of \(m\) linear functions \(f^1,\dots,f^m\) of \(n\) variables. In matrix notation, such a stack of \(m\) rows, each an \(n\)-dimensional row vector, is called an \(m\times n\) matrix:
\begin{align*} f(x_1,\dots,x_n) &= (a_{11}x_1+\dots+a_{1n}x_n,\dots,a_{m1}x_1+\dots+a_{mn}x_n)\\ &=\begin{pmatrix}a_{11}&\dots&a_{1n}\cr\vdots&\ddots&\vdots\cr a_{m1}&\dots&a_{mn}\cr\end{pmatrix}\begin{pmatrix}x_1\cr \vdots\cr x_n\cr\end{pmatrix}. \end{align*}
Throughout this book, we will only consider linear maps from a vector space into itself, namely the case \(m=n\text{.}\) The corresponding matrices are, not surprisingly, called square matrices.
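In NumPy, applying a linear map to a vector is just a matrix-vector product; for instance (arbitrary sample values):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])    # a 2x2 (square) matrix: a linear map from R^2 to itself
v = np.array([1.0, -1.0])

print(A @ v)   # the image f(v) = (2*1 + 1*(-1), 0*1 + 3*(-1)) = [1. -3.]
```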

Determinant. We start by discussing the case of a linear map from the plane \(\Bbb R^2\) to itself. Take a linear map \(f\) from the plane to itself and a base \(\{e_1,e_2\}\) of \(\Bbb R^2\text{.}\) Let us choose the units so that the parallelogram having \(e_1\) and \(e_2\) as a pair of sides has area equal to 1. What happens to the area of the parallelogram having \(f(e_1)\) and \(f(e_2)\) as a pair of sides?

Let's use components. With respect to the base \(\{e_1,e_2\}\) itself, clearly \(e_1=(1,0)\) and \(e_2=(0,1)\text{,}\) namely the parallelogram is a square of side 1. The map \(f\) is represented by a \(2\times2\) matrix so that
\begin{equation*} f(x,y) = (ax+by,cx+dy) = \begin{pmatrix}a&b\cr c&d\cr\end{pmatrix}\begin{pmatrix}x\cr y\cr\end{pmatrix}. \end{equation*}
Hence
\begin{equation*} f(e_1) = f(1,0) = \begin{pmatrix}a&b\cr c&d\cr\end{pmatrix}\begin{pmatrix}1\cr 0\cr\end{pmatrix} = \begin{pmatrix}a\cr c\cr\end{pmatrix} \end{equation*}
and
\begin{equation*} f(e_2) = f(0,1) = \begin{pmatrix}a&b\cr c&d\cr\end{pmatrix}\begin{pmatrix}0\cr 1\cr\end{pmatrix} = \begin{pmatrix}b\cr d\cr\end{pmatrix}. \end{equation*}
Recall now the vector product in 3 dimensions: the area of the parallelogram with sides \(\begin{pmatrix}a\cr c\cr\end{pmatrix}\) and \(\begin{pmatrix}b\cr d\cr\end{pmatrix}\) is the length of the vector product
\begin{equation*} \begin{pmatrix}a\cr c\cr 0\cr\end{pmatrix}\times\begin{pmatrix}b\cr d\cr 0\cr\end{pmatrix} = \begin{pmatrix}0\cr 0\cr ad-bc\cr\end{pmatrix}, \end{equation*}
namely it is equal to the absolute value of \(ad-bc\text{.}\)

The quantity \(\det f = ad-bc\) is therefore an important number attached to a linear map and it is called its determinant. It measures how much a linear map distorts areas. This fact is true not only in the plane but in all dimensions, with areas replaced by volumes.
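As a quick check (sample values only), NumPy's built-in determinant agrees with \(ad-bc\) and with the vector product computed above:

```python
import numpy as np

a, b, c, d = 2.0, 1.0, 0.5, 3.0
A = np.array([[a, b],
              [c, d]])

print(a*d - b*c)                            # 5.5
print(np.linalg.det(A))                     # 5.5 (up to rounding)
print(np.cross([a, c, 0.0], [b, d, 0.0]))   # [0. 0. 5.5]
```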

Evaluating directly the determinant of general matrices is impossible. The determinant looks like a very important quantity, but its evaluation presents a big problem. Indeed, the expression for the determinant of an \(n\times n\) matrix contains \(n!\) summands. Recall that the factorial grows even faster than exponentially...

Let's look at a simple concrete example. As of June 2023, the fastest supercomputer (using about 9 million cores) is able to perform about \(1.2\times10^{18}\) floating-point operations per second. Here is the number of years it would need just to add up \(30!\simeq2.6\times10^{32}\) summands:
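A minimal version of that computation (the original cell is not reproduced here) is:

```python
import math

flops_per_second = 1.2e18        # fastest supercomputer as of June 2023
summands = math.factorial(30)    # about 2.65e32 terms in the determinant formula
seconds = summands / flops_per_second
years = seconds / (365 * 24 * 3600)
print(years)                     # roughly 7 million years
```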
Yes, it takes more than 7 million years!! Notice that in problems coming from the numerical analysis of Partial Differential Equations one might have to deal with matrices that are \(1000\times1000\text{.}\) Clearly the direct calculation is out of the question.

Inverse of a matrix. It turns out that the determinant of a map also appears in the expression of the inverse of a linear map, and so of a matrix. A direct calculation shows that, if
\begin{equation*} A = \begin{pmatrix}a&b\cr c&d\cr\end{pmatrix}, \end{equation*}
then
\begin{equation*} A^{-1} = \frac{1}{\det A}\begin{pmatrix}d&-b\cr -c&a\cr\end{pmatrix}. \end{equation*}
Clearly, if \(\det A=0\text{,}\) the inverse is not well defined. This should not be surprising: if the square of side 1 is sent to a parallelogram of zero area, it means that the two vectors \((1,0)\) and \((0,1)\) are sent into two vectors belonging to the same line. Hence all vectors of the plane are sent by \(A\) to a single line and clearly the map cannot be inverted.
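The formula above is easy to verify numerically; for instance (arbitrary sample matrix):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])
det = A[0, 0]*A[1, 1] - A[0, 1]*A[1, 0]            # ad - bc = 5.5

inv_formula = np.array([[ A[1, 1], -A[0, 1]],
                        [-A[1, 0],  A[0, 0]]]) / det

print(inv_formula)          # the formula above
print(np.linalg.inv(A))     # NumPy's inverse: the same matrix, up to rounding
```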

Solving linear systems. Why do we care so much about inverting matrices? Well, any linear system, such as
\begin{equation*} \begin{aligned} 2x+3y-z&=1\cr \phantom{2x+}y+3z&=0\cr 7x+y-2z&=5\cr \end{aligned} \end{equation*}
amounts to solving a matrix equation, in this case
\begin{equation*} \begin{pmatrix}2&3&-1\cr 0&1&\phantom{-}3\cr 7&1&-2\cr\end{pmatrix} \cdot \begin{pmatrix}x\cr y\cr z\cr\end{pmatrix} = \begin{pmatrix}1\cr 0\cr 5\cr\end{pmatrix}. \end{equation*}
In abstract terms, the equation above looks like
\begin{equation*} A\cdot v=b. \end{equation*}
When \(A\) is invertible, clearly the solution is
\begin{equation*} v = A^{-1}b. \end{equation*}
In other words, finding the inverse of a matrix is conceptually equivalent to solving a system. Since, though, the direct evaluation of the inverse is unfeasible in practice, we will have to find some other way. This is exactly the topic of the following sections.
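For reference, the small system above can be solved numerically as follows (NumPy's solver does not form the inverse explicitly; the ideas behind it are exactly the subject of the next sections):

```python
import numpy as np

A = np.array([[2.0, 3.0, -1.0],
              [0.0, 1.0,  3.0],
              [7.0, 1.0, -2.0]])
b = np.array([1.0, 0.0, 5.0])

v = np.linalg.solve(A, b)   # solves A v = b
print(v)                    # [0.75 -0.15 0.05]
print(A @ v)                # reproduces b: [1. 0. 5.]
```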