Section 4.1 Linear spaces and linear maps
Vector spaces. In order to be able to talk about linear functions/maps we need first to introduce the concept of vector space. Vector spaces are often called also linear spaces exactly because they provide the environment where linear maps live. In short, a vector space \(V\) is a set of elements (called vectors) on which it is defined a sum, namely an operation denoted by + that satisfies the usual properties of addition of real numbers, and a multiplication by a scalar (in our case scalars are real numbers). The sum satisfies the following properties:- \(v+u=u+v\text{;}\)
- \(v+(u+w)=(v+u)+w\text{;}\)
- there is a vector denoted by 0 such that \(v+0=0+v=v\text{;}\)
- each \(v\) has an opposite vector, denoted by \(-v\text{,}\) such that \(v+(-v)=0\text{.}\)
The multiplication by a scalar satisfies the following properties:
- \(s\cdot(t\cdot v) = (s\cdot t)\cdot v\text{;}\)
- \(s\cdot(u+v)=s\cdot u+s\cdot v\text{;}\)
- \((s+t)\cdot v=s\cdot v+t\cdot v\text{;}\)
- \(1\cdot v=v\text{.}\)
The fundamental example of a vector space is \(\bR^n\text{,}\) the set of all \(n\)-tuples of real numbers, where the sum and the multiplication by a scalar are defined componentwise:
\begin{equation}
(x_1,\dots,x_n)+(y_1,\dots,y_n)=(x_1+y_1,\dots,x_n+y_n)\label{eq-vsum}\tag{4.1.1}
\end{equation}
\begin{equation}
s(x_1,\dots,x_n)=(s x_1,\dots,s x_n)\label{eq-vmul}\tag{4.1.2}
\end{equation}
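MATLAB arrays follow exactly these componentwise rules. A minimal illustration (the particular vectors are our choice):
\begin{verbatim}
u = [1 2 3];
v = [4 5 6];
u + v     % componentwise sum, as in (4.1.1): [5 7 9]
2*u       % multiplication by the scalar 2, as in (4.1.2): [2 4 6]
\end{verbatim}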
Dimension of a vector space. Each vector space is characterized by its dimension. Saying that \(V\) has dimension \(n\) means precisely that there are \(n\) vectors \(e_1,e_2,\dots,e_n\text{,}\) called a base of \(V\text{,}\) such that each vector \(v\) of \(V\) can be written in a unique way as
\begin{equation*}
v = v_1e_1+v_2e_2+\dots+v_ne_n,
\end{equation*}
where \(v_1,v_2,\dots,v_n\) are real numbers called the components of \(v\) in that base. Hence, once a base of \(V\) is given, each vector can be represented by an \(n\)-tuple of real numbers \((v_1,v_2,\dots,v_n)\text{.}\) Notice that this is exactly what a MATLAB array is: any array v is precisely an ordered collection of \(n\) elements v(1),v(2),...,v(n). Incidentally, the argument above shows that the example of the vector space \(\bR^n\) is fundamental: every vector space of dimension \(n\) is actually isomorphic to \(\bR^n\text{.}\) In some sense, there is just one vector space for each finite dimension \(n\) (in infinite dimensions this is not the case).
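For instance, with the standard base of \(\bR^2\) this correspondence is immediate (a minimal sketch; the components 3 and 2 are an arbitrary choice):
\begin{verbatim}
e1 = [1 0]; e2 = [0 1];   % the standard base of R^2
v = 3*e1 + 2*e2           % the vector with components (3,2): [3 2]
\end{verbatim}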
Column vectors. In mathematics, vectors are usually represented by columns rather than by rows. The operator that, in MATLAB, exchanges rows with columns in a vector v (and, as we will see, in a matrix) is the ' symbol.
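For instance (the particular vector is our choice):
\begin{verbatim}
v = [1 2 3]   % a 1x3 row vector
w = v'        % its transpose, a 3x1 column vector
\end{verbatim}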
As mentioned above, in principle we can always think of a vector space of dimension \(n\) just as the set of all \(n\)-tuples of real numbers \((v_1,v_2,\dots,v_n)\text{,}\) where the sum and scalar multiplication are given by (4.1.1) and (4.1.2). We need to remember, though, that the "true" object is the vector and not its components: if we change the base of \(V\text{,}\) the components of vectors change although the vectors themselves do not.
Linear functions. So, what does it mean that \(f\) is linear? It means that \(f\) is, in some sense, transparent with respect to these two operations, namely
- \(\displaystyle f(u+v)=f(u)+f(v)\)
- \(\displaystyle f(s\cdot v)=s\cdot f(v)\)
Consider first a linear function of a single real variable \(x\text{.}\) Then
\begin{equation*}
f(x) = f(x\cdot1)=x\cdot f(1)=c\cdot x,
\end{equation*}
where \(c=f(1)\) is some real constant. Hence, \(f(x)\) is a monomial of degree 1. When \(f\) is a function of two variables \((x,y)\text{,}\) then
\begin{gather*}
f(x,y)=\\
=f(x\cdot(1,0)+y\cdot(0,1))=\\
=f(x\cdot(1,0))+f(y\cdot(0,1))=\\
=x\cdot f(1,0)+y\cdot f(0,1)=\\
=ax+by,
\end{gather*}
where \(a=f(1,0)\) and \(b=f(0,1)\text{.}\) In matrix notation,
\begin{equation*}
f(x,y) = \begin{pmatrix}a&b\cr\end{pmatrix}\begin{pmatrix}x\cr y\cr\end{pmatrix}.
\end{equation*}
Notice that the product between the row vector and the column vector is a matrix product, and therefore in MATLAB we must use the * symbol for it. In the code below we evaluate, at \(x=-2, y=4\text{,}\) the function
\begin{equation*}
f(x,y)=x+2y=\begin{pmatrix}1&2\cr\end{pmatrix}\begin{pmatrix}x\cr y\cr\end{pmatrix}
\end{equation*}
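A minimal version of that evaluation (variable names are our choice):
\begin{verbatim}
a = [1 2];     % row vector of coefficients
v = [-2; 4];   % column vector with x = -2, y = 4
f = a*v        % matrix product: 1*(-2) + 2*4 = 6
\end{verbatim}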
In the case of a linear function of \(n\) variables,
\begin{equation*}
f(x_1,\dots,x_n) = a_1x_1+\dots+a_nx_n=\begin{pmatrix}a_1&\dots&a_n\cr\end{pmatrix}\begin{pmatrix}x_1\cr \vdots\cr x_n\cr\end{pmatrix}.
\end{equation*}
Linear maps. Any linear map \(f\) from an \(n\)-dimensional vector space to an \(m\)-dimensional vector space can be thought of as a (column) vector of \(m\) linear functions \(f^1,\dots,f^m\) of \(n\) variables. In matrix notation, the stack of the \(m\) corresponding \(n\)-dimensional row vectors of coefficients is called an \(m\times n\) matrix:
\begin{gather*}
f(x_1,\dots,x_n) = (a_{11}x_1+\dots+a_{1n}x_n,\dots,a_{m1}x_1+\dots+a_{mn}x_n)=\\
=\begin{pmatrix}a_{11}&\dots&a_{1n}\cr\vdots&&\vdots\cr a_{m1}&\dots&a_{mn}\cr\end{pmatrix}\begin{pmatrix}x_1\cr \vdots\cr x_n\cr\end{pmatrix}.
\end{gather*}
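In MATLAB such a map is just a matrix, applied with the * product. A minimal sketch with \(m=2\) and \(n=3\) (the entries are our choice):
\begin{verbatim}
A = [1 0 2; -1 3 1];   % a 2x3 matrix: a linear map from R^3 to R^2
x = [1; 1; 1];         % a vector of R^3, written as a column
A*x                    % its image in R^2: [3; 3]
\end{verbatim}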
Throughout this book, we will only consider linear maps from a vector space into itself, namely the case \(m=n\text{.}\) The corresponding matrices are, not surprisingly, called square matrices.
Determinant. We start by discussing the case of a linear map from the plane \(\bR^2\) to itself. Take a linear map \(f\) from the plane to itself and a base \(\{e_1,e_2\}\) of \(\bR^2\text{.}\) Let us choose the units so that the parallelogram having \(e_1\) and \(e_2\) as a pair of sides has surface area equal to 1. What happens to the area of the parallelogram having \(f(e_1)\) and \(f(e_2)\) as a pair of sides? Let us use components. With respect to this base, clearly \(e_1=(1,0)\) and \(e_2=(0,1)\text{,}\) namely the parallelogram is a square of side 1. The map \(f\) is represented by a \(2\times2\) matrix, so that
\begin{equation*}
f(x,y) = (ax+by,cx+dy) = \begin{pmatrix}a&b\cr c&d\cr\end{pmatrix}\begin{pmatrix}x\cr y\cr\end{pmatrix}.
\end{equation*}
Hence
\begin{equation*}
f(e_1) = f(1,0) = \begin{pmatrix}a&b\cr c&d\cr\end{pmatrix}\begin{pmatrix}1\cr 0\cr\end{pmatrix} = \begin{pmatrix}a\cr c\cr\end{pmatrix}
\end{equation*}
and
\begin{equation*}
f(e_2) = f(0,1) = \begin{pmatrix}a&b\cr c&d\cr\end{pmatrix}\begin{pmatrix}0\cr 1\cr\end{pmatrix} = \begin{pmatrix}b\cr d\cr\end{pmatrix}.
\end{equation*}
Recall now the vector product in 3 dimensions: the area of the parallelogram with sides \(\begin{pmatrix}a\cr c\cr\end{pmatrix}\) and \(\begin{pmatrix}b\cr d\cr\end{pmatrix}\) is the length of the vector product
\begin{equation*}
\begin{pmatrix}a\cr c\cr 0\cr\end{pmatrix}\times\begin{pmatrix}b\cr d\cr 0\cr\end{pmatrix} = \begin{pmatrix}0\cr 0\cr ad-bc\cr\end{pmatrix},
\end{equation*}
namely it is equal to the absolute value of \(ad-bc\text{.}\) The quantity \(\det f = ad-bc\) is therefore an important number attached to a linear map, and it is called its determinant. It is a measure of how much a linear map distorts areas. This fact holds not only in the plane but in every dimension.
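In MATLAB the determinant is computed by the built-in det function. A small check of the argument above (the matrix is an arbitrary choice):
\begin{verbatim}
A = [2 1; 1 3];          % an illustrative 2x2 matrix
det(A)                   % ad - bc = 2*3 - 1*1 = 5
cross([2 1 0], [1 3 0])  % vector product of the columns: [0 0 5]
\end{verbatim}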
Evaluating the determinant of general matrices directly is impossible. The determinant looks like a very important quantity, but its evaluation presents a big problem: the expression for the determinant of an \(n\times n\) matrix contains \(n!\) summands, and the factorial grows even faster than exponentially. Let us make a simple concrete example. The current fastest supercomputer is able to perform about \(0.5\times10^{18}\) floating-point operations per second. Summing \(30!\simeq2.6\times10^{32}\) terms at that speed takes at least about 16 million years, as the computation below shows. Notice that in problems coming from the numerical analysis of Partial Differential Equations one might have to deal with matrices that are \(1000\times1000\text{;}\) clearly the direct calculation is out of the question.
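A minimal version of that computation (the figures are those quoted above):
\begin{verbatim}
flops = 0.5e18;              % floating-point operations per second
terms = factorial(30);       % about 2.65e32 summands
secs  = terms/flops;         % seconds needed just for the additions
years = secs/(3600*24*365)   % about 1.7e7, i.e. ~16 million years
\end{verbatim}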
Inverse of a matrix. It turns out that the determinant of a map also appears in the expression of the inverse of a linear map, and so of a matrix. A direct calculation shows that, if
\begin{equation*}
A = \begin{pmatrix}a&b\cr c&d\cr\end{pmatrix},
\end{equation*}
then
\begin{equation*}
A^{-1} = \frac{1}{\det A}\begin{pmatrix}d&-b\cr -c&a\cr\end{pmatrix}.
\end{equation*}
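In MATLAB the inverse is computed by the built-in inv function; we can check it against the \(2\times2\) formula above (the matrix is an arbitrary invertible choice):
\begin{verbatim}
A = [1 2; 3 4];
inv(A)               % built-in inverse: [-2 1; 1.5 -0.5]
[4 -2; -3 1]/det(A)  % the 2x2 formula above, with det(A) = -2
\end{verbatim}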
Clearly, if \(\det A=0\text{,}\) the inverse is not well defined. This should not be surprising: if the square of side 1 is sent to a parallelogram of zero area, it means that the two vectors \((1,0)\) and \((0,1)\) are sent into two vectors belonging to the same line. Hence all vectors of the plane are sent by \(A\) to a single line and clearly the map cannot be inverted.
Solving linear systems. Why do we care so much about inverting matrices? Well, any linear system, such as
\begin{equation*}
\begin{aligned}
2x+3y-z&=1\cr
\phantom{2x+}y+3z&=0\cr
7x+y-2z&=5\cr
\end{aligned}
\end{equation*}
amounts to solving a matrix equation, in this case
\begin{equation*}
\begin{pmatrix}2&3&-1\cr 0&1&\phantom{-}3\cr 7&1&-2\cr\end{pmatrix}\cdot\begin{pmatrix}x\cr y\cr z\cr\end{pmatrix}=\begin{pmatrix}1\cr 0\cr 5\cr\end{pmatrix}.
\end{equation*}
In abstract terms, the equation above looks like
\begin{equation*}
A\cdot v=b.
\end{equation*}
When \(A\) is invertible, clearly the solution is
\begin{equation*}
v = A^{-1}b.
\end{equation*}
In other words, finding the inverse of a matrix is conceptually equivalent to solving a linear system. Since, though, the direct evaluation of the inverse is impossible in practice, we will have to find some other way. This will be exactly the topic of the following sections.
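As a preview, MATLAB solves such systems with the backslash operator, which deliberately avoids computing \(A^{-1}\) explicitly (a minimal sketch with the system above):
\begin{verbatim}
A = [2 3 -1; 0 1 3; 7 1 -2];
b = [1; 0; 5];
v = A\b    % solves A*v = b without forming inv(A)
\end{verbatim}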