
Section 5.2 The Hahn-Banach Theorem

Once we have a linear space \(X\text{,}\) it is natural to consider linear equations. A linear equation has the form
\begin{equation*} \eta(v)=c, \end{equation*}
where \(\eta:X\to\bR\) (resp. \(\eta:X\to\bC\) in the case of complex scalars) is a linear map (also called a covector) and \(c\in\bR\) (resp. \(c\in\bC\)). Recall that the set of all covectors is denoted by \(X^*\text{.}\) In finite dimension, given a basis \(e_1,\dots,e_n\) of \(X\text{,}\) we can write \(v = v^1 e_1 + \dots + v^n e_n\) and the equation reads
\begin{equation*} \eta_1 v^1 + \dots + \eta_n v^n = c, \end{equation*}
where we set \(\eta_i=\eta(e_i)\) for all \(i=1,\dots,n\text{.}\)

Of course, such an equation is always solvable as long as \(\eta\neq0\text{.}\) Things get more interesting when we consider more than one equation. Consider a collection \(\eta^{(i)}, i\in I\) of covectors and a corresponding collection \(c^{(i)}, i\in I\) of scalars. Is the system
\begin{equation*} \eta^{(i)}(v) = c^{(i)},\qquad i\in I \end{equation*}
solvable?

Now, recall that the elements of \(X\) can be considered as linear maps on \(X^*\text{,}\) namely that each \(v\in X\) determines uniquely a map \(v:X^*\to\bR\) defined by \(v(\eta) = \eta(v)\text{.}\) Then, after switching the roles of vectors and covectors, when \(X\) is reflexive, one can reformulate the system above as follows: given a collection of vectors \(v^{(i)}\text{,}\) find a covector \(\eta\in X^*\) such that
\begin{equation*} \eta(v^{(i)}) = c^{(i)},\qquad i\in I. \end{equation*}
There is an obvious obstruction to the existence of \(\eta\text{,}\) namely the \(c^{(i)}\) must be compatible with the fact that \(\eta\) is a function and therefore cannot have more than one value on any vector. Hence, given two distinct real numbers \(c,d\text{,}\) the pair of equations \(\eta(v)=c, \eta(v)=d\) has no solution.

The Hahn-Banach theorem shows that the obstruction above is the only one to the existence of a solution. The proof consists of a finite-dimensional inductive step, which therefore solves the problem for any Banach space with a Schauder basis. For all other cases, the claim is proved by a standard application of Zorn's Lemma, which we will discuss at some length below.
Definition 5.2.1.
Given a real vector space \(X\text{,}\) a sublinear function on \(X\) is a function \(p:X\to\bR\) such that:
  1. \(p(\lambda v)=\lambda p(v)\) for every \(v\in X\) and \(\lambda\geq0\text{;}\)
  2. \(p(v+u)\leq p(v)+p(u)\) for every \(v,u\in X\text{.}\)
We say that a function \(q:X\to\bR\) is dominated by \(p\) (or \(p\)-dominated) if \(q(v)\leq p(v)\) for every \(v\in X\text{.}\)
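As a quick illustration, assume \(X=\bR^2\) with coordinates \(v=(v^1,v^2)\text{;}\) the symbols \(p_1,p_2,p_3\) are introduced only for this example. Each of the following is a sublinear function on \(X\text{:}\)
\begin{equation*} p_1(v)=\|v\|,\qquad p_2(v)=\max\{v^1,v^2\},\qquad p_3(v)=\eta(v)\text{ for a fixed }\eta\in X^*. \end{equation*}
For \(p_2\text{,}\) positive homogeneity holds because \(\max\{\lambda a,\lambda b\}=\lambda\max\{a,b\}\) for \(\lambda\geq0\text{,}\) and subadditivity follows from
\begin{equation*} \max\{v^1+u^1,v^2+u^2\}\leq\max\{v^1,v^2\}+\max\{u^1,u^2\}. \end{equation*}
Note that \(p_2(-e_1-e_2)=-1\lt 0\text{,}\) so a sublinear function need not be a norm. The covector \(\eta(v)=v^1\) is dominated by \(p_2\text{,}\) since \(v^1\leq\max\{v^1,v^2\}\text{.}\)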
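Let \(M\) be a linear subspace of \(X\text{,}\) let \(\eta\in M^*\) be dominated by \(p\) on \(M\text{,}\) and let \(v\in X\setminus M\text{.}\) For each \(b\in\bR\text{,}\) the linear functional
\begin{equation*} \eta_b(m\oplus \lambda v) = \eta(m) + \lambda b \end{equation*}
is an extension of \(\eta\) to \(M\oplus\bR v\text{,}\) that is, an element of \((M\oplus\bR v)^*\) that restricts to \(\eta\) on \(M\text{.}\) We need to prove that there is a \(b\) such that \(\eta_b\) is dominated by \(p\text{.}\)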

For all \(m,n\in M\text{,}\) the linearity of \(\eta\text{,}\) the domination of \(\eta\) by \(p\) on \(M\) and the subadditivity of \(p\) give
\begin{equation*} \eta(m) - \eta(n) = \eta(m-n) \leq p(m-n) = p(m+v-v-n)\leq p(m+v) + p(-v-n). \end{equation*}
Rearranging, we get
\begin{equation*} -p(-n-v)-\eta(n)\leq p(m+v) - \eta(m)\text{ for all }m,n\in M \end{equation*}
and so
\begin{equation*} \sup_{n\in M} \left(-p(-n-v)-\eta(n)\right) \leq \inf_{m\in M} \left(p(m+v) - \eta(m)\right). \end{equation*}
We claim that, for any \(b\) such that
\begin{equation*} \sup_{n\in M} \left(-p(-n-v)-\eta(n)\right) \leq b \leq \inf_{m\in M} \left(p(m+v) - \eta(m)\right), \end{equation*}
\(\eta_b\) is dominated by \(p\text{.}\)

Indeed, for any \(\lambda\neq0\) and \(m\in M\text{,}\)
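\begin{equation*} -p\left(-\frac{m}{\lambda}-v\right)-\eta\left(\frac{m}{\lambda}\right)\leq b\leq p\left(\frac{m}{\lambda}+v\right) - \eta\left(\frac{m}{\lambda}\right). \end{equation*}
Assume \(\lambda>0\text{.}\) Then, by the positive homogeneity of \(p\) and the linearity of \(\eta\text{,}\)
\begin{equation*} p\left(\frac{m}{\lambda}+v\right) - \eta\left(\frac{m}{\lambda}\right) = \frac{1}{\lambda}\left(p(m+\lambda v) - \eta(m)\right), \end{equation*}
so that
\begin{equation*} \lambda b\leq p(m+\lambda v) - \eta(m),\quad\text{that is,}\quad \eta_b(m\oplus\lambda v)\leq p(m+\lambda v), \end{equation*}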
and similarly for \(\lambda<0\text{.}\)
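
For completeness, here is a sketch of the case \(\lambda\lt 0\text{,}\) using only the left-hand inequality above. Since \(-\frac{1}{\lambda}>0\text{,}\) positive homogeneity of \(p\) gives
\begin{equation*} -p\left(-\frac{m}{\lambda}-v\right) = -p\left(-\frac{1}{\lambda}(m+\lambda v)\right) = \frac{1}{\lambda}\,p(m+\lambda v), \end{equation*}
so the left-hand inequality reads
\begin{equation*} \frac{1}{\lambda}\left(p(m+\lambda v)-\eta(m)\right)\leq b. \end{equation*}
Multiplying by \(\lambda\lt 0\) reverses the inequality and yields
\begin{equation*} p(m+\lambda v)-\eta(m)\geq\lambda b,\quad\text{that is,}\quad \eta_b(m\oplus\lambda v)=\eta(m)+\lambda b\leq p(m+\lambda v). \end{equation*}
For \(\lambda=0\) the claim reduces to \(\eta(m)\leq p(m)\text{,}\) which holds because \(\eta\) is dominated by \(p\) on \(M\text{.}\)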

Zorn's Lemma. Often in Mathematics one has to prove the existence of some maximal element in a partially ordered set. Recall that a set is partially ordered if it is equipped with a relation \(\leq\) that is reflexive (\(a\leq a\)), antisymmetric (\(a\leq b\) and \(b\leq a\) implies \(a=b\)) and transitive (\(a\leq b\) and \(b\leq c\) implies \(a\leq c\)).

An example is the set of all linearly independent subsets of a vector space, where the symbol \(\leq\) stands for "is a subset of". A maximal element of such a set is necessarily a basis of the space, so proving its existence means proving that each vector space has a basis (this is called a Hamel basis, a concept different from that of a Schauder basis!).

In order to prove the existence of such elements, one would need to use transfinite induction, namely induction on ordinal numbers rather than on the natural numbers. Note that, while all countably infinite sets have the same cardinality, they can have very different order types. To understand the difference, it is useful to consider sequences on the line. Take the countable sets \(C_1=\{1-\frac{1}{n}:n\in\bN\}\cup\{1\}\) and \(C_2=\{1-\frac{1}{n}:n\in\bN\}\cup\{1\}\cup\{2-\frac{1}{n}:n\in\bN\}\cup\{2\}\text{.}\) Both have the same cardinality but, as ordered sets, they correspond to different ordinals. In \(C_1\) and \(C_2\text{,}\) the ordinal of \(1-\frac{1}{n}\) is \(n\) for every \(n\in\bN\) and the ordinal of 1 is \(\omega\text{,}\) which is the least ordinal larger than every finite ordinal. In \(C_2\text{,}\) the ordinal of \(2-\frac{1}{n}\) is \(\omega+n-1\) for every \(n\in\bN\) and the ordinal of 2 is \(\omega+\omega\) (also written \(\omega\cdot2\)). Similarly, one can have sequences with ordinals that reach powers of \(\omega\) and so on. In order to prove properties that are indexed by ordinals, one needs transfinite induction. Zorn's Lemma is there precisely to help avoid transfinite induction on many occasions.

We do not prove the Lemma here, but we provide three equivalent statements of it.

Definition 5.2.3.
Let \(P\) be a partially ordered set. A subset \(C\) of \(P\) is a chain if any two elements of \(C\) are comparable. An upper bound of a subset \(Q\) of \(P\) is an element \(\bar q\in P\) such that \(q\leq \bar q\) for all \(q\in Q\text{.}\)
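
As a small illustration, with a poset chosen only for this example, let \(P\) be the collection of all subsets of \(\{1,2,3\}\) ordered by inclusion. Then
\begin{equation*} C=\{\emptyset,\{1\},\{1,2\}\} \end{equation*}
is a chain, since \(\emptyset\subset\{1\}\subset\{1,2\}\text{,}\) and \(\{1,2\}\) is an upper bound of \(C\text{.}\) On the other hand,
\begin{equation*} Q=\{\{1\},\{2\}\} \end{equation*}
is not a chain, because neither \(\{1\}\subset\{2\}\) nor \(\{2\}\subset\{1\}\) holds. Its upper bounds in \(P\) are \(\{1,2\}\) and \(\{1,2,3\}\text{.}\)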

Now we can prove the Hahn-Banach theorem.
The set \(H\) of all \(p\)-dominated linear extensions of \(\eta\) is partially ordered by the relation of "being an extension of". We claim that each chain \(C\subset H\) has an upper bound. Indeed, define \(\bar M=\cup_{\eps\in C}\dom(\eps)\) and set \(\bar\eta(v)=\eps(v)\) for any \(\eps\in C\) such that \(v\in\dom\eps\text{.}\) Since any two elements of \(C\) are comparable, \(\bar\eta\) is well defined, \(\bar M\) is a linear subspace, and \(\bar\eta\) is linear and dominated by \(p\text{.}\) Then \(\bar\eta\) is an upper bound for \(C\text{.}\) By Zorn's Lemma, this implies that \(H\) has a maximal element \(\eta'\text{.}\) Such an element must be defined on the whole of \(X\text{,}\) since otherwise we could extend it further with the Lemma above, contradicting maximality. Hence, \(\eta'\) is the extension in the claim of the theorem.

The following corollaries give us fundamental information on dual spaces.

The first result is the generalization of the following observation. Consider the vector space \(\bR^n\) and let \(v\in\bR^n\text{.}\) Usually we represent such vectors as columns. Covectors, namely elements of \((\bR^n)^*\text{,}\) are usually represented as rows, so that the action of a covector \(\eta\) on a vector is given simply by the matrix product:
\begin{equation*} \eta(v) = \begin{pmatrix}\eta_1\amp\dots\amp\eta_n\end{pmatrix} \begin{pmatrix}v^1\\\dots\\v^n\\\end{pmatrix} = \eta_1v^1+\dots+\eta_nv^n. \end{equation*}
In particular, this shows that, for each
\begin{equation*} v=\begin{pmatrix}v^1\\\dots\\v^n\\\end{pmatrix}, \end{equation*}
there is a covector
\begin{equation*} \eps_v=\begin{pmatrix}v^1\amp\dots\amp v^n\end{pmatrix} \end{equation*}
with the following properties:
\begin{equation*} \eps_v(v)=\|v\|^2,\qquad\|\eps_v\|=\|v\|. \end{equation*}
Hence, for \(v\neq0\text{,}\) by setting \(\eta_v=\eps_v/\|v\|\text{,}\) we see that
\begin{equation*} \eta_v(v)=\|v\|,\qquad\|\eta_v\|=1. \end{equation*}
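As a concrete check, assuming the Euclidean norm on \(\bR^2\) (the numbers are chosen only for this example), take
\begin{equation*} v=\begin{pmatrix}3\\4\end{pmatrix},\qquad \|v\|=\sqrt{3^2+4^2}=5. \end{equation*}
Then
\begin{equation*} \eps_v=\begin{pmatrix}3\amp4\end{pmatrix},\qquad \eps_v(v)=3\cdot3+4\cdot4=25=\|v\|^2, \end{equation*}
and
\begin{equation*} \eta_v=\begin{pmatrix}\tfrac{3}{5}\amp\tfrac{4}{5}\end{pmatrix},\qquad \eta_v(v)=\tfrac{9}{5}+\tfrac{16}{5}=5=\|v\|,\qquad \|\eta_v\|=\sqrt{\tfrac{9}{25}+\tfrac{16}{25}}=1. \end{equation*}
Here \(\|\eta_v\|=1\) reflects the Cauchy-Schwarz inequality \(|\eta_v(w)|\leq\|w\|\text{,}\) with equality at \(w=v\text{.}\)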
The next corollary shows that a linear functional with the same properties exists in the dual of every normed vector space.
Define \(\eps(v)=\|v\|\) and extend \(\eps\) by linearity to \(\span\{v\}\text{.}\) Then, for every \(\lambda\in\bR\text{,}\)
\begin{equation*} \eps(\lambda v) = \lambda \|v\| \leq |\lambda|\cdot\|v\| = \|\lambda v\|, \end{equation*}
namely \(\eps\) is dominated by the norm of \(X\text{,}\) which is sublinear. Hence, by the Hahn-Banach theorem, there is an \(\eta\in X^*\) that extends \(\eps\) and is dominated by the norm of \(X\) as well. This means that
\begin{equation*} \eta(v)=\|v\|\text{ and }\eta(w)\leq\|w\|\text{ for all }w\in X. \end{equation*}
Applying the second inequality to \(-w\) gives \(|\eta(w)|\leq\|w\|\) for all \(w\in X\text{.}\) Thus,
\begin{equation*} \|\eta\| = \sup_{w\in X, w\neq0}\frac{|\eta(w)|}{\|w\|} = \frac{|\eta(v)|}{\|v\|} = 1. \end{equation*}

The following corollary extends the one above to multidimensional subspaces. The proof follows the same argument and so we omit it.

The following corollary shows that \(X^*\) separates points in \(X\text{.}\)
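Assume \(\eta(x) = \eta(y)\) for all \(\eta\in X^*\text{,}\) but \(x\neq y\text{.}\) Suppose first that \(x\) and \(y\) are linearly independent. Then on \(\span\{x,y\}\) we can define a linear functional \(\eps(\alpha x +\beta y) = \alpha \|x\|\text{.}\) Note that
\begin{equation*} p(\alpha x +\beta y) = \|\alpha x\| \end{equation*}
is a sublinear function on \(\span\{x,y\}\) and
\begin{equation*} \eps(\alpha x +\beta y) = \alpha \|x\| \leq p(\alpha x +\beta y). \end{equation*}
Moreover, since \(\span\{x,y\}\) is finite-dimensional, \(\eps\) is bounded and hence dominated by a multiple of the norm of \(X\text{,}\) which is sublinear on the whole \(X\text{.}\) Hence, by the Hahn-Banach Theorem, we can extend \(\eps\) to a linear form \(\eta\in X^*\text{.}\) Since \(\eps(x) = \|x\|\neq \eps(y) =0\text{,}\) then \(\eta(x)\neq \eta(y)\text{,}\) contradicting the assumption. If instead \(x\) and \(y\) are linearly dependent, Corollary 5.2.8 applied to \(x-y\neq0\) gives an \(\eta\in X^*\) with \(\eta(x)-\eta(y)=\|x-y\|\neq0\text{,}\) again a contradiction. Hence \(x=y\text{.}\)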

An important consequence of the result above is the following.

Define \(J:X\to X^{**}\text{,}\) \(v\mapsto J_v\text{,}\) by \(J_v(\eta) = \eta(v)\text{.}\)
By the corollary above, \(J\) is injective. Moreover,
\begin{equation*} \|J_v\|_{X^{**}} = \sup_{\|\eta\|_{X^*}=1}|J_v(\eta)|. \end{equation*}
Since \(|J_v(\eta)|\leq \|\eta\|_{X^*}\|v\|_X,\) this means that
\begin{equation*} \|J_v\|_{X^{**}}\leq\|v\|_X. \end{equation*}
By Corollary 5.2.8, for every \(v\in X\) there is an \(\eta\in X^*\) such that \(\eta(v)=\|v\|_X\) and \(\|\eta\|_{X^*}=1\text{,}\) so \(\|J_v\|_{X^{**}}=\|v\|_X\text{.}\)

Definition 5.2.13.
We say that \(X\) is reflexive if the map \(J\) is surjective.
We conclude the section by going back to the motivating question of the Hahn-Banach theorem.
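Let \(M=\span\{v^{(1)},\dots,v^{(n)}\}\) and define \(\eta(\sum_{i=1}^n\lambda_i v^{(i)}) = \sum_{i=1}^n\lambda_i c^{(i)}\text{.}\) This is well defined provided the \(c^{(i)}\) are compatible with the linear relations among the \(v^{(i)}\text{,}\) namely \(\sum_{i=1}^n\lambda_i v^{(i)}=0\) implies \(\sum_{i=1}^n\lambda_i c^{(i)}=0\) (this holds automatically when the \(v^{(i)}\) are linearly independent). Then \(\eta\in M^*\) and, since \(M\) is finite-dimensional, \(\eta\) is bounded, so it is dominated by \(\|\eta\|_{M^*}\|\cdot\|_X\text{,}\) which is sublinear on \(X\text{.}\) Hence, there is a \(\bar\eta\in X^*\) that extends \(\eta\text{.}\)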
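
A small worked example, with vectors and scalars chosen only for illustration, shows how linear relations among the \(v^{(i)}\) constrain the \(c^{(i)}\text{.}\) In \(X=\bR^2\) take
\begin{equation*} v^{(1)}=e_1,\qquad v^{(2)}=e_2,\qquad v^{(3)}=e_1+e_2,\qquad c^{(1)}=1,\qquad c^{(2)}=2. \end{equation*}
The only covector solving the first two equations is \(\eta=\begin{pmatrix}1\amp2\end{pmatrix}\text{,}\) and linearity forces
\begin{equation*} \eta(v^{(3)})=\eta(e_1)+\eta(e_2)=1+2=3. \end{equation*}
Hence the system is solvable if and only if \(c^{(3)}=3\text{;}\) for \(c^{(3)}=4\text{,}\) say, there is no solution, because \(\eta\) would need to take two different values on \(v^{(3)}\text{.}\)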