Functional Analysis

Alexander Strohmaier
May 16, 2013

Contents

1 Introduction and preliminaries
2 Motivation
3 Topological Spaces
    3.1 Compactness
    3.2 Completeness
4 Banach Spaces
    4.1 Dense subsets and separable Banach spaces
5 Linear maps between Banach spaces and the dual space
    5.1 The Hahn-Banach theorem
    5.2 Topologies on Banach spaces and Operator Topologies
6 $L^p$-spaces and their duals
7 Adjoint operator
8 Other topological vector spaces
    8.1 The space $C_0^\infty(\mathbb{R}^n)$
    8.2 The space $C^\infty(\mathbb{R}^n)$
    8.3 The space $\mathcal{S}(\mathbb{R}^n)$
9 Fourier transforms
10 Hilbert Spaces
    10.1 Orthonormal Bases
    10.2 Fourier transforms of periodic functions
11 Compact operators
12 Compact operators in Hilbert spaces and spectral theory
    12.1 Trace class operators

Introduction and preliminaries

These are lecture notes taken from the course MAD203 Functional Analysis at Loughborough
University. Please note that there might be many typos. You are welcome to point them out
to me if you find any. Please also note that the lecture notes cannot be a substitute for the
lecture.

Motivation

The word functional usually means a function on a set or space of functions. Here is an
example. Let us look at an elastic string that is fixed between two points at distance $L$. The
amplitude of the vibration of the string at position $x$ and time $t$ defines a function
\[ f : [0, L] \times \mathbb{R} \to \mathbb{R}, \qquad (x, t) \mapsto f(x, t). \]
The so-called action functional $\mathcal{L}$ is then given by
\[ \mathcal{L} = \int_0^T \int_0^L \left( \left| \frac{\partial f}{\partial x} \right|^2 (x, t) - \left| \frac{\partial f}{\partial t} \right|^2 (x, t) \right) dx \, dt. \]
The equations of motion for that string can be obtained by finding extremal points of this
functional and in this way one obtains the wave equation
\[ \frac{\partial^2 f}{\partial x^2} - \frac{\partial^2 f}{\partial t^2} = 0. \]
Here is another example of a functional that wants to be minimized. Suppose we have a
2-dimensional domain $D$ with nice (smooth) boundary $\partial D$. We imagine a membrane over this
domain with a fixed amplitude at the boundary. We are asking what the
membrane will look like in the interior. The theory of elastic media gives us a first order
approximation to the energy of that membrane
\[ \int_D \left( \left| \frac{\partial f}{\partial x} \right|^2 (x, y) + \left| \frac{\partial f}{\partial y} \right|^2 (x, y) \right) dx \, dy. \]
We are therefore looking for a minimum $f$ of this functional on the space of differentiable
functions satisfying $f|_{\partial D} = g$, where $g$ is a given function on the boundary. It was a classical
problem to prove that such a minimizer exists and is unique. This problem was solved by
methods of functional analysis.
As we can see, analysis on function spaces looks at similar questions as classical analysis,
namely questions of existence of minima and maxima and their uniqueness, questions of continuity,
or solutions to equations. However, this is not an applied course, as we will develop here the
abstract theory of functional analysis. Applications will accompany us throughout the course
and we will meet them in examples. But the main objective of the course is to develop the
abstract theory and the terminology necessary to understand and solve such problems, and also to
provide the language necessary to read modern textbooks of mathematics and research articles.
Our main guiding example will be Fourier transforms and generalized Fourier transforms on
compact spaces, but also integral operators and their spectral theory. Since there is some basic
language necessary to treat examples we will come to them in the middle of the course.

Topological Spaces

I would like to review here some notions you have already met in previous courses.
Definition 3.1. A metric space $(X, d)$ is a set $X$ together with a map $d$, the metric,
\[ d : X \times X \to \mathbb{R}, \qquad (x, y) \mapsto d(x, y) \]
such that
• $d(x, y) \ge 0$,
• $d(x, y) = d(y, x)$,
• $d(x, y) \le d(x, z) + d(z, y)$,
• $d(x, y) = 0 \Leftrightarrow x = y$.
The open ball of radius $\epsilon > 0$ around the point $x \in X$ is defined as
\[ B_\epsilon(x) := \{ y \in X \mid d(x, y) < \epsilon \}. \]
A subset $U \subset X$ in a metric space is called open if for every point $x \in U$ there is an open ball
$B_\epsilon(x)$ which is in $U$, i.e. $B_\epsilon(x) \subset U$. The set of open sets
\[ \mathcal{T} = \{ U \subset X \mid U \text{ is open} \} \]
is called the topology of the space $X$ induced by the metric. It satisfies the following properties:
• $\emptyset$ is open,
• $X$ is open,
• the union of any collection of open sets is again open,
• the intersection of a finite number of open sets is again open.
In general a set $X$ together with a system of subsets $\mathcal{T}$ that satisfies the above is called a topological
space. The collection of open sets is called the topology. A set in this system is called open
by definition. If we are dealing with a metric space we think of its topology as the one defined
above. Here again is the abstract definition.
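Definition 3.2. A topological space is a set $X$ together with a set of subsets $\mathcal{T} \subset \mathcal{P}(X)$ such
that
• $\emptyset \in \mathcal{T}$,
• $X \in \mathcal{T}$,
• the union of any collection of sets in $\mathcal{T}$ is again in $\mathcal{T}$,
• the intersection of a finite number of sets in $\mathcal{T}$ is again in $\mathcal{T}$.
A subset $U \in \mathcal{T}$ is called open and we say that $\mathcal{T}$ is the set of open sets for the topology.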
Loosely speaking the topology says something about neighborhood relations of points in $X$
but without saying anything about their distance. A topology is enough to declare when a
sequence of points $x_n$ converges to a point $x$. We say that $O$ is an open neighborhood of
$x \in X$ if $O$ is open and contains the point $x$.
Definition 3.3. Let $X$ be a topological space. Then we say that a sequence $x_n$ converges to
$x$ if for every open neighborhood $U$ of $x$ there are only finitely many indices $n$ for which $x_n$ is not in $U$. In
other words: for every open neighborhood $U$ of $x$ there exists an $N$ such that for every $n > N$
we have $x_n \in U$.
Continuity makes sense for maps between any topological spaces.
Definition 3.4. Let $X$ and $Y$ be topological spaces. Then a map $f : X \to Y$ is called
continuous if the pre-image of any open set is open. In other words, $f^{-1}(O)$ is open for all
open $O \subset Y$.
Here are a few more conventions. In order to see that they make sense it is advisable to
look at an example of a subset in $\mathbb{R}^2$ that is neither open nor closed.
Definition 3.5.
• A set is called closed if its complement is open.
• The closure $\overline{Y}$ of a subset $Y \subset X$ is the smallest closed set containing $Y$, or equivalently
the intersection of all closed sets containing $Y$.
• The interior $\operatorname{int} Y$ of $Y$ is the largest open set contained in $Y$, or equivalently, $(\overline{Y^c})^c$.
• A subset $Y \subset X$ is called dense if its closure coincides with $X$, or in other words $\overline{Y} = X$.
• The boundary $\partial Y$ of a subset $Y \subset X$ is defined as $\overline{Y} \setminus \operatorname{int}(Y)$.
If $Y \subset X$ is a subset of a topological space $X$ then the intersections $U \cap Y$ of all open sets $U$ in
$X$ with $Y$ form a topology on $Y$, the so-called relative topology.
Let us look at the example of different topologies on the space of continuous functions $C[0, 1]$
on the interval $[0, 1]$. The topology of pointwise convergence is defined in such a way that
sequences of functions $f_n$ converge to $f$ if and only if $f_n(x)$ converges to $f(x)$ for all $x \in [0, 1]$.
An open neighborhood basis of $f$ is given by the finite intersections of the sets
\[ U_{c,x} = \{ g \mid |g(x) - f(x)| < c \}, \qquad c > 0, \; x \in [0, 1], \]
and a set $U$ in this topology is open if every point $f \in U$ admits a finite intersection
$U_{c_1,x_1} \cap \dots \cap U_{c_k,x_k}$ of such sets that is contained in $U$. It is then not difficult to check that convergence
in this topology means pointwise convergence. Note that this topology cannot be obtained
from a metric.
Other topologies can be constructed from different metrics on $C[0, 1]$. For example we could
define a metric
\[ d(f, g) = \sup_x |f(x) - g(x)| \]
or
\[ d(f, g) = \int_0^1 |f(x) - g(x)| \, dx. \]
These metrics give rise to different topologies and thus, to different notions of convergence.
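Here is a small numerical sketch in Python of how the two metrics can disagree (it is not part of the original notes). It takes the illustrative sequence $f_n(x) = x^n$ and approximates both distances to the zero function on a uniform grid. The grid size and the helper names `sup_dist`, `l1_dist`, `zero` and `power` are assumptions made only for this example.

```python
# Illustrative sketch: compare the sup metric and the integral metric on C[0, 1]
# for the sequence f_n(x) = x**n and the zero function.

def sup_dist(f, g, samples=10_000):
    """Approximate sup_x |f(x) - g(x)| on a uniform grid of [0, 1]."""
    return max(abs(f(i / samples) - g(i / samples)) for i in range(samples + 1))

def l1_dist(f, g, samples=10_000):
    """Approximate the integral of |f - g| over [0, 1] with the trapezoidal rule."""
    values = [abs(f(i / samples) - g(i / samples)) for i in range(samples + 1)]
    return (sum(values) - 0.5 * (values[0] + values[-1])) / samples

def zero(x):
    """The zero function on [0, 1]."""
    return 0.0

def power(n):
    """Return the function x -> x**n."""
    return lambda x: x ** n

for n in (1, 10, 100):
    # Columns: n, sup distance to 0, integral distance to 0.
    print(n, sup_dist(power(n), zero), l1_dist(power(n), zero))
```

The sup distance stays equal to 1 (it is attained at $x = 1$), while the integral distance is $1/(n+1)$ up to discretisation error. So this sequence converges to the zero function in the second metric but not in the first.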
A subset $Y \subset X$ of a topological space is again a topological space in its own right by taking
as the open sets in $Y$ all sets of the form $U \cap Y$, where $U$ is any open set in $X$. This topology
is called the induced topology.

3.1 Compactness

An open cover of a topological space $X$ is a family of open sets $(U_\alpha)_{\alpha \in I}$ such that
\[ \bigcup_{\alpha \in I} U_\alpha = X. \]
A subcover of a cover is a subset $I' \subset I$ of the index set such that $(U_\alpha)_{\alpha \in I'}$ is still a cover.
Definition 3.6. A topological space $X$ is called compact if every open cover has a finite
subcover.
Definition 3.7. A topological space $X$ is called sequentially compact if every sequence $(x_n)_{n \in \mathbb{N}}$
in $X$ has a convergent subsequence.
We know from sequences and series that in a bounded closed set in $\mathbb{R}^n$ every sequence has a
convergent subsequence. Therefore, bounded closed sets in $\mathbb{R}^n$ are sequentially compact.
Lemma 3.8. Let $X$ be a sequentially compact metric space. Then for every $\epsilon > 0$ there exist
finitely many points $x_1, \dots, x_n$ such that $\{ B_\epsilon(x_i) \mid i = 1, \dots, n \}$ is a cover.
Proof. Suppose this were not the case. Then there would exist an $\epsilon > 0$ such that for any
finite number of points $x_1, \dots, x_n$ the collection of balls $B_\epsilon(x_i)$ does not cover, i.e.
\[ \bigcup_{i=1}^n B_\epsilon(x_i) \neq X. \]
Starting with $n = 1$ and then inductively adding points that are in the complement of
$\bigcup_{i=1}^n B_\epsilon(x_i)$ we end up with an infinite sequence of points $x_i$ such that $d(x_i, x_k) \ge \epsilon$ for $i \neq k$. This
sequence certainly does not have a convergent subsequence, in contradiction with the sequential compactness of $X$.
Theorem 3.9. A metric space $X$ is compact if and only if it is sequentially compact.
Proof. We show the two directions separately.
Compactness implies sequential compactness: Suppose that $X$ is compact and let $(x_i)_{i \in \mathbb{N}}$
be a sequence. We want to show that it has a convergent subsequence. Suppose $(x_i)$ did not
have a convergent subsequence. Then no point $x$ is an accumulation point of the sequence.
That means that for every point $x_i$ there is an $\epsilon_i > 0$ such that there are only finitely many
indices $k$ with $x_k \in B_{\epsilon_i}(x_i)$. Take such a ball for every $x_i$. Then, these balls together with the open set
$X \setminus \bigcup_i \{x_i\}$ cover $X$ (note that $\bigcup_i \{x_i\}$ is closed as it does not have accumulation points). The
existence of a finite subcover then implies that the sequence has only finitely many terms, which is
a contradiction.
Sequential compactness implies compactness: This implication is quite tricky. The proof
is by contradiction. Let us assume our space is sequentially compact and there exists a cover
$(U_\alpha)$ that does not have a finite subcover. By the above lemma there are finitely many points
$x_i$ such that the balls $B_1(x_i)$ form a cover. Each of the balls $B_1(x_i)$ is covered by the $U_\alpha$ as well. Since our
cover does not have a finite subcover, one of the balls $B_1(x_i)$ cannot be covered by finitely many of the $U_\alpha$.
Denote the relevant point $x_i$ by $z_1$. Again there are finitely many points $x_i$ such that the balls $B_{1/2}(x_i)$
form a cover. They also cover $B_1(z_1)$. In the same way as before there is one of the $x_i$
such that $B_1(z_1) \cap B_{1/2}(x_i)$ cannot be covered by a finite subcover of $(U_\alpha)$. Call that point $z_2$.
Continuing like this we construct a sequence of points $z_i$ such that none of the sets
\[ \bigcap_{i=1}^N B_{1/i}(z_i) = B_1(z_1) \cap \dots \cap B_{1/N}(z_N) \]
can be covered by a finite subcover of $(U_\alpha)$. By assumption the sequence $z_i$ has a convergent
subsequence. Say $z$ is the limit of that subsequence. Since $(U_\alpha)$ is an open cover the point $z$
is contained in one of the $U_\alpha$ and of course that means that a whole ball around $z$ is contained
in $U_\alpha$. Therefore, there exists an $N$ such that $B_{1/N}(z_N)$ is a subset of $U_\alpha$. This means that
\[ \bigcap_{i=1}^N B_{1/i}(z_i) = B_1(z_1) \cap \dots \cap B_{1/N}(z_N) \]
is a subset of $U_\alpha$. Thus, there is a subcover consisting of one element. This is a contradiction
as we constructed the sequence of balls in such a way that these sets cannot be covered by a
finite number of the $U_\alpha$.

3.2 Completeness

A sequence $(x_n)$ in a metric space $(X, d)$ is called a Cauchy sequence if for every $\epsilon > 0$ there
exists an $N > 0$ such that
\[ \forall n, m > N : \; d(x_n, x_m) < \epsilon. \]
A metric space is called complete if every Cauchy sequence converges.
Suppose we have a metric space that is not complete. Then we can add points to that space
in such a way that the resulting space is complete, in the same way as we can add points to
the incomplete space $\mathbb{Q}$ to get the complete space $\mathbb{R}$. For every metric space there exists
a complete metric space such that the original metric space is a dense subset. I will describe
here briefly the process of completion. Suppose that $(V, d)$ is a metric space (for instance a normed vector space). Denote by
$C_V$ the set of all Cauchy sequences in $V$. Now we call two Cauchy sequences $(a_n)$ and $(b_n)$
equivalent, and write $a \sim b$, if and only if
\[ d(a_n, b_n) \to 0. \]
The set of equivalence classes $C_V / \sim$ is our new space. It is also a metric space with distance
\[ d((a), (b)) = \lim_{k \to \infty} d(a_k, b_k). \]
Clearly it does not matter which representative we choose to compute the distance. The so
obtained space will be denoted by $\overline{V}$ and is actually complete. Suppose we have a Cauchy
sequence in $\overline{V}$ consisting of classes of constant sequences. Then, this Cauchy sequence has a limit, namely the class of the diagonal
sequence. We have the following theorem.
Theorem 3.10. If $(X, d)$ is a metric space then there exists a complete metric space $(\overline{X}, \overline{d})$
such that $X$ is a dense subset of $\overline{X}$ and the metric on $X$ is the restriction of the metric on $\overline{X}$.
The construction with equivalence classes of Cauchy sequences gives rise to such a complete
space. It is called the abstract completion of $X$.
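The remark about $\mathbb{Q}$ and $\mathbb{R}$ can be made concrete with a short Python sketch (not part of the original notes). It computes Newton iterates for $x^2 = 2$ in exact rational arithmetic. The choice of this particular sequence and the helper name `newton_sqrt2` are assumptions made for illustration. The consecutive gaps shrink rapidly, as one expects from a Cauchy sequence, but no iterate squares to 2, so the limit is missing from $\mathbb{Q}$.

```python
# Illustrative sketch: a sequence of rationals whose limit (sqrt(2)) is not rational.
from fractions import Fraction

def newton_sqrt2(steps):
    """Return the first `steps` Newton iterates for x**2 = 2 as exact rationals."""
    x = Fraction(2)
    iterates = []
    for _ in range(steps):
        iterates.append(x)
        x = (x + 2 / x) / 2  # Newton step, stays inside Q
    return iterates

xs = newton_sqrt2(6)
# Distances between consecutive terms of the sequence.
gaps = [abs(xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
for x, gap in zip(xs[1:], gaps):
    # Columns: iterate (as float), gap to the previous iterate, whether x*x equals 2.
    print(float(x), float(gap), x * x == 2)
```

In the language of the construction above, the equivalence class of this sequence is the new point that the completion adds to $\mathbb{Q}$.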

Banach Spaces

A normed (real or complex) vector space is a vector space together with a norm, that is a
map
\[ \| \cdot \| : V \to \mathbb{R} \]
such that
• $\|\lambda v\| = |\lambda| \|v\|$,
• $\|v + w\| \le \|v\| + \|w\|$,
• $\|v\| \ge 0$,
• $\|v\| = 0$ if and only if $v = 0$.
If only the first three properties hold then we call the map a semi-norm.
If we have a normed space then the map
\[ d(x, y) = \|x - y\| \]
will define a metric on that space.
Definition 4.1. A complete normed space is called a Banach space.
This is a very important concept. Here is an example why.

Theorem 4.2. Let $V$ be a Banach space and suppose $(a_n)$ is a sequence in $V$ such that
\[ \sum_{n=1}^\infty \|a_n\| \]
converges in $\mathbb{R}$. Then the sequence
\[ b_n = \sum_{k=1}^n a_k \]
converges in $V$, in other words the series
\[ \sum_{n=1}^\infty a_n \]
converges in $V$.
Proof. By assumption
\[ \sum_{k=1}^n \|a_k\| \]
converges, so it is a Cauchy sequence. This means for any $\epsilon > 0$ there is an $N$ such that for
every $M > N$:
\[ \sum_{k=N}^M \|a_k\| < \epsilon. \]
Therefore also
\[ \left\| \sum_{k=N}^M a_k \right\| \le \sum_{k=N}^M \|a_k\| < \epsilon. \]
This in turn means that
\[ \sum_{k=1}^n a_k \]
is a Cauchy sequence and by completeness it converges.


Theorem 4.3. Let $X$ be a set and $B(X)$ be the space of bounded functions on $X$. Then
$B(X)$ is a Banach space with norm
\[ \|f\|_\infty = \sup_{x \in X} |f(x)|. \]
Proof.
Step 1: Suppose that $f_n$ is a Cauchy sequence of functions. That means for every $\epsilon > 0$ there
exists $N > 0$ such that
\[ \sup_x |f_n(x) - f_m(x)| < \epsilon \]
for all $n, m > N$. In particular this is true for every $x$ and therefore for every $x$ the sequence
$f_n(x)$ is a Cauchy sequence in $\mathbb{R}$ (or $\mathbb{C}$). Thus, the pointwise limit exists and defines a function
$f(x)$.
Step 2: It remains to show that $f$ is bounded and that $f_n$ converges to $f$ in the norm. For
this we note that for $\epsilon > 0$ and every $x$ we can find $N(x)$ such that for all $n > N(x)$ we have
\[ |f_n(x) - f(x)| < \epsilon/2. \]
On the other hand we know that $f_n$ is a Cauchy sequence. That means we can choose $M$
such that
\[ |f_n(x) - f_m(x)| < \epsilon/2 \]
for all $n, m \ge M$. Here $M$ is independent of $x$. Thus,
\[ |f_n(x) - f(x)| \le |f_n(x) - f_m(x)| + |f_m(x) - f(x)|. \]
But this is true for all $m$, and we can choose $m$ depending on $x$ large enough so that always
\[ |f_n(x) - f(x)| \le |f_n(x) - f_m(x)| + |f_m(x) - f(x)| < \epsilon/2 + \epsilon/2 = \epsilon. \]
Therefore, we have shown that for all $n > M$, independently of $x$,
\[ |f_n(x) - f(x)| < \epsilon. \]
This shows in particular that $f$ is bounded since we might choose $\epsilon = 1$ and obtain
\[ |f(x)| \le |f_n(x)| + 1 \le C \]
for some $n$, using that $f_n$ is a bounded function. The above also shows convergence, as the
constant $M$ was independent of $x$ so that we have
\[ \|f_n - f\|_\infty = \sup_x |f_n(x) - f(x)| \le \epsilon \]
for all $n > M$.


Theorem 4.4. Let $X$ be a topological space. Then the space $C_b(X)$ of bounded continuous
functions with norm
\[ \|f\|_\infty = \sup_{x \in X} |f(x)| \]
is a Banach space.
Proof. Suppose that $f_n$ is a Cauchy sequence in $C_b(X)$. Then by the previous theorem the
sequence converges uniformly to a bounded function $f$ and we only need to show that $f$ is
continuous. Suppose $I$ is an open subset of $\mathbb{R}$ (or $\mathbb{C}$). We need to prove that $f^{-1}(I)$ is open.
So it is enough to show that every point $x \in X$ with $f(x) \in I$ has an open neighborhood $U$
such that $f(U) \subset I$. Take the point $f(x) \in I$. Since $I$ is open there is an $\epsilon$-ball around $f(x)$
which is contained in $I$. Now choose $N$ such that
\[ |f_N(y) - f(y)| < \epsilon/2 \]
for all $y \in X$. Next take the ball $B_{\epsilon/2}(f(x))$. Its pre-image under $f_N$ is open because $f_N$ is
continuous. The condition $y \in f_N^{-1}(B_{\epsilon/2}(f(x)))$ is by definition equivalent to
\[ |f_N(y) - f(x)| < \epsilon/2. \]
This condition is of course true for $y = x$ by the above. Moreover,
\[ |f(y) - f(x)| \le |f_N(y) - f(x)| + |f_N(y) - f(y)| < |f_N(y) - f(x)| + \epsilon/2 \]
so that $y \in f_N^{-1}(B_{\epsilon/2}(f(x)))$ implies that $f(y) \in B_\epsilon(f(x))$. Thus, the open set $f_N^{-1}(B_{\epsilon/2}(f(x)))$
is an open neighborhood of $x$ which is contained in $f^{-1}(I)$. Thus $f^{-1}(I)$ is open.
Since every continuous function on a compact space is bounded we immediately get the following.
Corollary 4.5. Let $X$ be a compact topological space. Then the set of continuous functions
$C(X)$ with norm
\[ \|f\|_\infty = \sup_{x \in X} |f(x)| \]
is a Banach space.
Example 4.6. The set of continuously differentiable functions on $[0, 1]$ with norm
\[ \|f\|_\infty = \sup_{x \in [0, 1]} |f(x)| \]
is not a Banach space.
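One way to see why completeness fails in Example 4.6 is the following numerical sketch in Python (not part of the original notes). The functions $f_n(x) = \sqrt{(x - 1/2)^2 + 1/n}$ are an assumed choice for illustration. They are smooth on $[0, 1]$ and converge uniformly to $|x - 1/2|$, which is not differentiable at $x = 1/2$. The sequence is therefore Cauchy in the sup norm, but its limit lies outside the space.

```python
# Illustrative sketch: smooth functions converging uniformly to a function with a kink.
import math

def f(n, x):
    """The smooth function sqrt((x - 1/2)**2 + 1/n) (an assumed example sequence)."""
    return math.sqrt((x - 0.5) ** 2 + 1.0 / n)

def limit(x):
    """The uniform limit |x - 1/2|, which is not differentiable at x = 1/2."""
    return abs(x - 0.5)

def sup_error(n, samples=1000):
    """Approximate sup_x |f_n(x) - limit(x)| on a uniform grid of [0, 1]."""
    return max(abs(f(n, i / samples) - limit(i / samples)) for i in range(samples + 1))

for n in (1, 100, 10_000):
    # Columns: n, approximate sup distance to the limit, and 1/sqrt(n).
    print(n, sup_error(n), 1 / math.sqrt(n))
```

The sup distance equals $1/\sqrt{n}$, attained at $x = 1/2$, so it tends to zero.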


Example 4.7. The set of continuously differentiable functions on $[0, 1]$ with norm
\[ \|f\|_{C^1} = \sup_{x \in [0, 1]} \left( |f(x)| + |f'(x)| \right) \]
is a Banach space. This space is denoted by $C^1([0, 1])$.

Example 4.8. The set of $k$-times continuously differentiable functions on $[0, 1]$ with norm
\[ \|f\|_{C^k} = \sum_{m=0}^k \sup_{x \in [0, 1]} |f^{(m)}(x)| \]
is a Banach space. This space is denoted by $C^k([0, 1])$.


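Example 4.9. Since sequences are just functions on $\mathbb{N}$ the set
\[ \ell^\infty(\mathbb{N}) = \{ (a_n)_{n \in \mathbb{N}} \mid \exists C > 0 \; \forall n : |a_n| \le C \} \]
of bounded sequences is a Banach space with norm
\[ \|(a_n)\|_\infty = \sup_n |a_n|. \]
Here is another example of a Banach space. Namely, for $1 \le p < \infty$ denote by
\[ \ell^p(\mathbb{N}) = \left\{ (a_n)_{n \in \mathbb{N}} \;\middle|\; \sum_{n=1}^\infty |a_n|^p < \infty \right\}. \]
Then this is a Banach space with norm
\[ \|(a_n)\|_p = \left( \sum_{n=1}^\infty |a_n|^p \right)^{1/p}. \]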

Example 4.10. More generally, suppose that $(X, \mu)$ is a measure space and $1 \le p < \infty$. The set of measurable
functions $f$ on $X$ with
\[ \left( \int |f|^p \, d\mu \right)^{1/p} < \infty \]
is not a Banach space with this expression as norm, because there are non-zero functions that are non-zero only on a set
of measure zero, so the expression is only a semi-norm. Let us therefore identify functions $f$ and $g$ if they differ only on a set of measure
zero. In this way one obtains a Banach space which is called $L^p(X)$ and its norm is given by
\[ \|f\|_p = \left( \int |f|^p \, d\mu \right)^{1/p}. \]
Example 4.11. Suppose again that $(X, \mu)$ is a measure space. The set of measurable functions
$f$ on $X$ with
\[ \operatorname{ess\,sup}_x |f(x)| = \inf \{ C > 0 \mid \mu(X \setminus f^{-1}([-C, C])) = 0 \} < \infty \]
is not a Banach space for the same reason as in the previous example. If we again
identify functions if they differ only on a set of measure zero we obtain the Banach space $L^\infty(X)$
with norm
\[ \|f\|_\infty = \operatorname{ess\,sup}_x |f(x)|. \]

4.1 Dense subsets and separable Banach spaces

A subset $S$ in a topological space $X$ is called dense if $\overline{S} = X$. The space is called separable
if there exists a countable dense subset. Dense sets are important because every element in
the space can be approximated by elements in the dense set. A famous case of density is the
Theorem of Stone-Weierstrass. It gives an easy to check condition that implies density in the
space of continuous functions on a compact space.
Theorem 4.12 (Stone-Weierstrass). Suppose that $X$ is a compact topological space and let
$C(X, \mathbb{R})$ be the Banach space of real valued continuous functions on $X$ with norm $\| \cdot \|_\infty$.
Suppose that $A \subset C(X, \mathbb{R})$ is a unital subalgebra of $C(X, \mathbb{R})$, i.e.
• $A$ is a linear subspace,
• $1 \in A$,
• $A \cdot A \subset A$, or in other words $f, g \in A$ implies that also $f \cdot g \in A$.
Suppose furthermore that $A$ separates points, i.e. for any two $x, y \in X$ with $x \neq y$ there exists
a function $f \in A$ such that $f(x) \neq f(y)$. Then, $A$ is dense in $C(X, \mathbb{R})$.
This is a harmless sounding theorem. It states that a subset of $C(X, \mathbb{R})$ which is closed under
algebraic operations and separates points is automatically dense. Its consequences are striking.
Before we prove this theorem let us look at some of them.
Corollary 4.13 (Weierstrass approximation theorem). The space of polynomials $\mathbb{R}[x]$ is dense
in $C([a, b], \mathbb{R})$ for any compact interval $[a, b]$ in the $\| \cdot \|_\infty$ norm.
In other words, any continuous function on $[a, b]$ can be approximated with arbitrary accuracy by a
polynomial.
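The Weierstrass approximation theorem can be watched at work with a small Python sketch (not part of the original notes). It uses Bernstein polynomials, which are one classical explicit construction of approximating polynomials on $[0, 1]$ and are not the route taken in the proof given in these notes. The test function $|x - 1/2|$, the grid size, and the helper names `bernstein` and `sup_error` are assumptions made for illustration.

```python
# Illustrative sketch: Bernstein polynomials approximating a continuous function on [0, 1].
from math import comb

def bernstein(f, n):
    """Return the n-th Bernstein polynomial of f as a Python function."""
    coeffs = [f(k / n) * comb(n, k) for k in range(n + 1)]
    return lambda x: sum(c * x ** k * (1 - x) ** (n - k) for k, c in enumerate(coeffs))

def sup_error(f, p, samples=200):
    """Approximate sup_x |f(x) - p(x)| on a uniform grid of [0, 1]."""
    return max(abs(f(i / samples) - p(i / samples)) for i in range(samples + 1))

def kink(x):
    """A continuous but non-differentiable test function."""
    return abs(x - 0.5)

for n in (10, 50, 200):
    # Columns: polynomial degree n, approximate sup-norm error.
    print(n, sup_error(kink, bernstein(kink, n)))
```

The printed sup-norm errors decrease as the degree grows, although slowly for a function with a kink.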
Corollary 4.14. The space of polynomials $\mathbb{R}[x_1, \dots, x_n]$ is dense in $C(K, \mathbb{R})$ for any compact
subset $K$ of $\mathbb{R}^n$ in the $\| \cdot \|_\infty$ norm.
This is the higher dimensional version of the above theorem and states that a continuous
function of $n$ variables can be approximated by polynomials in $n$ variables.
Corollary 4.15. Let $C(S^1, \mathbb{R})$ be the space of continuous functions on the unit circle, or,
equivalently, the space of continuous $2\pi$-periodic real valued functions on $\mathbb{R}$. Then the finite linear span
of the set
\[ \bigcup_{m \in \mathbb{N}} \{ 1, \sin(mx), \cos(mx) \} \]
is dense in $C(S^1, \mathbb{R})$.

The Stone-Weierstrass theorem is actually a consequence of the following theorem by Stone.
Here is some notation first for two functions $f$ and $g$ on a topological space:
\[ f \wedge g = \min\{f, g\}, \qquad f \vee g = \max\{f, g\}. \]
Note that if $f$ and $g$ are continuous, then so are $f \wedge g$ and $f \vee g$.
Theorem 4.16 (Stone). Let $X$ be a compact topological space and suppose that there is a
subset $A$ of $C(X, \mathbb{R})$ such that
• $A$ is closed under the operations $\wedge$ and $\vee$, this means $f, g \in A$ implies $f \wedge g \in A$ and
$f \vee g \in A$,
• for any pair of points $x \neq y$ and numbers $a, b \in \mathbb{R}$ there is a function $f \in A$ such that
$f(x) = a$ and $f(y) = b$,
• for every point $x$ and every real number $a$ there is an $f \in A$ with $f(x) = a$.
Then, $A$ is dense in $C(X, \mathbb{R})$ in the topology induced by the norm $\| \cdot \|_\infty$ (the uniform topology).
Proof. We need to prove that any function $g \in C(X, \mathbb{R})$ can be approximated by elements in $A$. Let $\epsilon > 0$. For each
two points $x, y$ choose a function $f_{x,y} \in A$ such that $f_{x,y}(x) = g(x)$ and $f_{x,y}(y) = g(y)$. Such
a function exists by our hypothesis for every pair of points. Now the sets
\[ O_{x,y} = \{ z \in X \mid f_{x,y}(z) < g(z) + \epsilon \} \]
are open and form a cover of $X$ even if we fix $x$. This is because $O_{x,y}$ contains both $x$ and $y$.
Now find a finite subcover for each fixed $x$. That is, there are finitely many points $y_1, \dots, y_n$
such that the sets $O_{x,y_i}$ form an open cover. Now define the function
\[ f_x = f_{x,y_1} \wedge \dots \wedge f_{x,y_n}. \]
By hypothesis $f_x$ is in $A$ and it has the property that
\[ f_x(z) < g(z) + \epsilon \]
but now for all $z$. Moreover, $f_x(x) = g(x)$. Again, the sets
\[ O_x = \{ z \in X \mid f_x(z) > g(z) - \epsilon \} \]
are an open cover and therefore there is a finite subcover. This means there are finitely many
points $x_1, \dots, x_k$ such that the sets $O_{x_i}$ form an open cover of $X$. Now the function
\[ f = f_{x_1} \vee \dots \vee f_{x_k} \]
is in $A$ and satisfies
\[ g(x) - \epsilon < f(x) < g(x) + \epsilon \]
for all $x$, or in other words $\|f - g\|_\infty \le \epsilon$.
Sketch of proof of the Stone-Weierstrass theorem: Let $B$ be the closure of $A$ in $C(X, \mathbb{R})$.
Then also $B$ will be a unital point separating subalgebra of $C(X, \mathbb{R})$.
Step 1: If $f$ is non-negative and in $B$, so is $\sqrt{f}$. To see this note that it is enough to show
this for $0 \le f < 1$ because in case $f \neq 0$ we can compute $\sqrt{f}$ by
\[ \sqrt{f} = \sqrt{2 \|f\|} \, \sqrt{\frac{1}{2 \|f\|} f}. \]
Now the Taylor series $\sum_{k=0}^\infty a_k x^k$ for $\sqrt{1 - x}$ converges absolutely and uniformly on any interval
$[0, 1 - \delta)$ with $\delta > 0$. Therefore, the series
\[ \sum_{k=0}^\infty a_k (1 - f - \delta)^k \]
converges in the Banach space $B$ because all the partial sums are actually in $B$ as $B$ is a
subalgebra. The limit of this series is of course $\sqrt{f + \delta}$. If we let $\delta$ go to zero, we can see
that also $\sqrt{f} \in B$. This works because
\[ \left| \sqrt{f(x) + \delta} - \sqrt{f(x)} \right| = \delta \left( \sqrt{f(x) + \delta} + \sqrt{f(x)} \right)^{-1} \le \sqrt{\delta}, \]
so that the approximation is uniform.
Step 2: Since $|f| = \sqrt{f^2}$ we have that $f \in B$ implies $|f| \in B$. Since moreover,
\[ f \wedge g = \frac{f + g}{2} - \frac{|f - g|}{2}, \qquad f \vee g = \frac{f + g}{2} + \frac{|f - g|}{2}, \]
we conclude from this that $B$ is closed under the operations $\wedge$ and $\vee$.
Step 3: Assume that $x \neq y$ are points in $X$ and assume that $a, b$ are real numbers. Then, by
assumption there is an element $f$ in $B$ such that
\[ f(x) \neq f(y). \]
Since $B$ is a subspace that contains the constant functions the function
\[ g(z) = a + (b - a) \frac{f(x) - f(z)}{f(x) - f(y)} \]
is also in $B$ and it satisfies $g(x) = a$ and $g(y) = b$.
Step 4: Since the constant function $a$ is
contained in $B$, for every $x \in X$ and $a \in \mathbb{R}$ we can find a function $f$ in $B$ with $f(x) = a$.
Simply take the constant function $a$.
Final Step: As we can see all the conditions of Stone's theorem are satisfied and therefore $B$
is dense in $C(X, \mathbb{R})$. Since $B$ is closed in $C(X, \mathbb{R})$ this means that $B = C(X, \mathbb{R})$. Thus, $A$ is
dense in $C(X, \mathbb{R})$.
Corollary 4.17 (Stone-Weierstrass (complex version)). Suppose that $X$ is a compact topological space and let $C(X, \mathbb{C})$ be the Banach space of complex valued continuous functions on
$X$ with norm $\| \cdot \|_\infty$. Suppose that $A \subset C(X, \mathbb{C})$ is a unital $*$-subalgebra of $C(X, \mathbb{C})$, i.e.
• $A$ is a linear subspace,
• $1 \in A$,
• $A \cdot A \subset A$, or in other words $f, g \in A$ implies that also $f \cdot g \in A$,
• if $f \in A$ then also $\overline{f} \in A$.
Suppose furthermore that $A$ separates points, i.e. for any two $x, y \in X$ with $x \neq y$ there exists
a function $f \in A$ such that $f(x) \neq f(y)$. Then, $A$ is dense in $C(X, \mathbb{C})$.
Corollary 4.18. The linear span of the set $\{ e^{i m \theta} \mid m \in \mathbb{Z} \}$ is dense in $C(S^1, \mathbb{C})$.
The Stone-Weierstrass theorem also says something about the separability of certain Banach spaces.
Remember what it means for a topological space to be separable.
Definition 4.19. A topological space $X$ is called separable if there exists a countable dense
subset of $X$.
Suppose $K \subset \mathbb{R}^n$ is compact. Then the space of polynomials with real coefficients in $n$ variables is dense in the space of continuous functions $C(K)$. Of course every polynomial with
real coefficients may be approximated by one with rational coefficients. Thus the set of rational
polynomials $\mathbb{Q}[x_1, \dots, x_n]$ is dense in $C(K)$. Moreover, the space of rational polynomials is a
countable set. In this way one obtains
Corollary 4.20. Let $K$ be a compact subset of $\mathbb{R}^n$. Then the Banach space $C(K)$ is separable.
Suppose $K$ is endowed with the Borel $\sigma$-algebra. Then we will probably see later on in the
course that the space $C(K)$ is dense in $L^p(K)$. Thus we also have
Corollary 4.21. Let $K$ be a compact subset of $\mathbb{R}^n$. Then the Banach space $L^p(K)$ is separable
for $1 \le p < \infty$.
Since $\mathbb{R}^n$ is the countable union of compact sets here is one more step.
Corollary 4.22. The Banach space $L^p(\mathbb{R}^n)$ is separable for $1 \le p < \infty$.

Linear maps between Banach spaces and the dual space

To simplify notation let us denote by $\mathbb{K}$ the relevant field. That is, $\mathbb{K}$ is either the field of
reals $\mathbb{R}$ or complex numbers $\mathbb{C}$.
Definition 5.1. Suppose $V$ and $W$ are normed vector spaces. A linear map $L : V \to W$ is
said to be bounded if there exists a number $C > 0$ such that
\[ \|L(v)\| \le C \|v\| \]
for all $v \in V$, or equivalently
\[ \|L(v)\| \le C \]
for all $v \in \overline{B_1(0)} = \{ v \in V \mid \|v\| \le 1 \}$.
The two statements are equivalent since we can divide this inequality by the norm of $v$ if $v$ is
nonzero. Then, using linearity of $L$ the first inequality is obviously equivalent to
\[ \left\| L\left( \frac{1}{\|v\|} v \right) \right\| \le C \]
for all $v \neq 0$ and of course $\frac{1}{\|v\|} v$
is in the closed unit ball (it is actually on its boundary). This
means that the map is bounded when we regard it as a map from the unit ball. Our first
observation is a very important one. It says that for a linear map boundedness and continuity
are equivalent.

Proposition 5.2. A linear map between two normed vector spaces is bounded if and only if it
is continuous.
Proof. First suppose that $L$ is not bounded. Then, for every $n$ there exists a $v_n$ such that
\[ \|L(v_n)\| > n \|v_n\|. \]
With $e_n = \frac{1}{\|v_n\|} v_n$ we have
\[ \|L(e_n)\| > n \|e_n\| = n. \]
But this means that $\|L(\frac{1}{n} e_n)\| > 1$ even though $\frac{1}{n} e_n$ converges to zero in the norm. Thus, $L$
is not continuous. On the other hand suppose $L$ is bounded. Then it follows from $\|v - w\| < \delta$
that
\[ \|L(v) - L(w)\| = \|L(v - w)\| \le C \|v - w\| < C \delta. \]
So $L$ is clearly continuous because for given $\epsilon > 0$ we can choose $\delta = C^{-1} \epsilon$ and obtain the
required estimate that guarantees continuity.
Let us denote the space of bounded linear maps from $V$ to $W$ by $B(V, W)$. If $W = V$ we will
simply write $B(V)$ instead of $B(V, V)$. If $W$ is the field $\mathbb{K}$ (either $\mathbb{C}$ or $\mathbb{R}$ depending on whether
$V$ is a real or a complex Banach space) then we will call elements in $B(V, \mathbb{K})$ functionals. The
space $V^* = B(V, \mathbb{K})$, the space of linear continuous functionals on $V$, is called the topological
dual space, or sometimes simply the dual space when no confusion can arise. If $V$ and
$W$ are normed vector spaces then we call elements in $B(V, W)$ bounded operators from $V$
to $W$. Elements in $B(V)$ are called operators on $V$. Of course the space $B(V, W)$ is also a
vector space. In the following we will write simply $Av$ for $A(v)$, thinking of the operator $A$ as
acting to the right and assigning a vector $Av$ to the vector $v$. So suppose $A \in B(V, W)$ is an
operator. Then there exists a positive number $C > 0$ such that
\[ \|Av\| \le C \|v\| \]
for all $v \in V$. Let us denote the infimum over all such numbers by $\|A\|$. This means
\[ \|A\| = \sup \{ \|Av\| \mid v \in V, \; \|v\| = 1 \}. \]
It is now not very difficult to check that with this norm the space $B(V, W)$ is a normed vector
space in its own right.
Theorem 5.3. Suppose $V$ is a normed vector space and $W$ is a Banach space. Then the space
$B(V, W)$ is a Banach space with norm
\[ \|L\| = \sup \{ \|Lv\| \mid v \in V, \; \|v\| = 1 \}. \]
Proof. We need to show completeness of $B(V, W)$. Suppose $A_n$ is a Cauchy sequence of
operators in $B(V, W)$. This means in particular that for every $v \in V$ the sequence $w_n = A_n(v)$
is a Cauchy sequence in $W$ and since $W$ is a Banach space this Cauchy sequence converges
to an element $w$. Let us define the map $A : V \to W$ by $Av = w$. Then the map $A$ is obviously
linear and the only thing we need to prove is that it is bounded and the limit of $A_n$ in $B(V, W)$.
What follows is now essentially the same argument as in the proof of Theorem 4.3. Since $A_n$ is a Cauchy sequence
we have
\[ \|A_n - A_m\| \to 0 \]
and we also have for every $v$ with $\|v\| = 1$ that
\[ \|A_n v - Av\| \le \|A_n v - A_m v\| + \|A_m v - Av\| \]
where the first term can be made arbitrarily small independently of $v$ and $m$ because $A_m$ is
a Cauchy sequence. The second term can be made small independently of $v$ because we can
make $m$ dependent on $v$ in the same way as in the proof of Theorem 4.3. Therefore
\[ \sup_{\|v\| = 1} \|A_n v - Av\| \to 0 \]
or in other words
\[ \|A_n - A\| \to 0. \]

Here is another principle that is very important in functional analysis. If we have an operator
which is defined only on a dense set we can ask whether or not it has a continuous extension
to the whole space. For bounded linear maps this can always be done.
Theorem 5.4. Suppose $V$ and $W$ are Banach spaces and let $S \subset V$ be a dense subspace.
Suppose that $L : S \to W$ is a bounded linear map. Then there exists a unique bounded map
$\overline{L} : V \to W$ such that $\overline{L}|_S = L$.
Proof. Since $S$ is dense any element in $V$ can be approximated by elements in $S$. That is, for
every $v \in V$ there is a sequence $s_n \in S$ that converges to $v$. Then $L(s_n)$ is a Cauchy sequence
because
\[ \|L s_n - L s_m\| \le \|L\| \|s_n - s_m\| \]
and therefore $L s_n$ converges to a point $w$. Let us define the map $\overline{L}$ by
\[ \overline{L} v = w. \]
Then, clearly, $\overline{L}$ extends $L$ and it is quite easy to check that it is linear. Moreover, $\overline{L}$ is bounded
with the same norm as $L$. To see this note that if $s_n \to v$ and $\|v\| = 1$ then also $\|s_n\| \to 1$, so that
$\|\overline{L} v\| = \lim_n \|L s_n\| \le \lim_n \|L\| \|s_n\| = \|L\|$.
Corollary 5.5. If $V$ is a normed vector space and $W$ a Banach space then any bounded operator
$A \in B(V, W)$ has a unique continuous extension to a bounded operator $\overline{A} \in B(\overline{V}, W)$ from
the abstract completion of $V$ to $W$.

Example 5.6. If $g \in L^\infty([0, 1])$ then the linear functional $\varphi_g : C([0, 1]) \to \mathbb{R}$ defined by
\[ \varphi_g(f) = \int_0^1 g(x) f(x) \, dx \]
is a bounded linear functional and its norm is given by
\[ \|\varphi_g\| = \|g\|_1. \]
Example 5.7. Suppose that $k \in C([0, 1] \times [0, 1])$ is a continuous function. Then the map
\[ T : C([0, 1]) \to C([0, 1]), \quad f \mapsto T f, \qquad (T f)(x) = \int_0^1 k(x, y) f(y) \, dy \]
is a bounded linear operator in $B(C([0, 1]))$. Its norm is equal to
\[ \|T\| = \sup_{x \in [0, 1]} \int_0^1 |k(x, y)| \, dy \]
and therefore
\[ \|T\| \le \|k\|_\infty. \]
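A finite-dimensional analogue of the norm formula in Example 5.7 can be checked directly with a Python sketch (not part of the original notes). Replacing $[0, 1]$ by finitely many points and the integral by a sum turns the kernel into a matrix acting on vectors with the sup norm, and the formula becomes the maximal absolute row sum. The matrix `K` below is a hypothetical example, and the helper names are assumptions made for illustration.

```python
# Illustrative sketch: discrete analogue of ||T|| = sup_x int |k(x, y)| dy.
from itertools import product

def apply(matrix, v):
    """Matrix-vector product, the discrete version of (T f)(x) = int k(x, y) f(y) dy."""
    return [sum(a * x for a, x in zip(row, v)) for row in matrix]

def sup_norm(v):
    """Sup norm of a vector."""
    return max(abs(x) for x in v)

def max_row_sum(matrix):
    """Discrete version of sup_x int |k(x, y)| dy."""
    return max(sum(abs(a) for a in row) for row in matrix)

def norm_by_sign_vectors(matrix):
    """Maximise ||K v|| over all vectors with entries +1 or -1 (each has sup norm 1)."""
    n = len(matrix[0])
    return max(sup_norm(apply(matrix, v)) for v in product((1, -1), repeat=n))

# Hypothetical kernel matrix, chosen only for this example.
K = [[1.0, -2.0, 0.5],
     [0.0, 3.0, -1.0],
     [2.0, 2.0, 2.0]]

print(max_row_sum(K), norm_by_sign_vectors(K))
```

Testing only sign vectors is enough here because $v \mapsto \|Kv\|_\infty$ is convex, so its maximum over the unit ball of the sup norm is attained at an extreme point, that is, at a vector with entries $\pm 1$.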
Theorem 5.8. If $A \in B(V_1, V_2)$ and $B \in B(V_2, V_3)$ then also $BA \in B(V_1, V_3)$ and we have
\[ \|BA\| \le \|B\| \|A\|. \]
In particular $B(V)$ is an algebra for any Banach space $V$.
Proof. Clearly,
\[ \|BAv\| \le \|B\| \|Av\| \le \|B\| \|A\| \|v\|, \]
and the statement of the theorem follows from this.
Theorem 5.9. Suppose that $V$ is a Banach space and $A \in B(V)$ is an operator with $\|A\| < 1$.
Then
\[ \mathrm{id} - A \]
is invertible and its inverse is bounded and given by the convergent Neumann series
\[ (\mathrm{id} - A)^{-1} = \sum_{k=0}^\infty A^k. \]
Moreover,
\[ \|(\mathrm{id} - A)^{-1}\| \le \frac{1}{1 - \|A\|}. \]
Proof. First let us prove that the Neumann series converges. By Theorem 4.2 we only need
to show that
\[ \sum_{k=0}^\infty \|A^k\| \]
converges. But since $\|A^k\| \le \|A\|^k$ this follows from the fact that
\[ \sum_{k=0}^\infty \|A\|^k \]
converges, namely to $(1 - \|A\|)^{-1}$. Next note that if
\[ B = \sum_{k=0}^\infty A^k \]
then $(\mathrm{id} - A) B = B (\mathrm{id} - A)$ is equal to
\[ \sum_{k=0}^\infty A^k - \sum_{k=1}^\infty A^k = \mathrm{id}. \]
The bound for the norm of $(\mathrm{id} - A)^{-1}$ follows from the triangle inequality applied to the Neumann series.
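Example 5.10. Suppose $k \in C([0, 1] \times [0, 1])$ and $f \in C([0, 1])$. Then, for small enough $|\lambda|$ (for instance $|\lambda| \, \|T\| < 1$ with $T$ as in Example 5.7), the Fredholm integral equation in $\phi$
\[ \phi(x) = f(x) + \lambda \int_0^1 k(x, y) \phi(y) \, dy \]
has a unique continuous solution given by
\[ \phi(x) = f(x) + \sum_{k=1}^{\infty} \lambda^k (T^k f)(x), \]
where
\[ (T^k f)(x) = \int_0^1 \cdots \int_0^1 k(x, y_1) k(y_1, y_2) \cdots k(y_{k-1}, y_k) f(y_k) \, dy_1 \cdots dy_k . \]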
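The Neumann series of Example 5.10 can be tried out on a discretised version of the integral equation. This is a minimal sketch, assuming a midpoint-rule discretisation; the function name `solve_fredholm_neumann`, the grid size and the number of terms are hypothetical choices made for illustration only.

```python
def solve_fredholm_neumann(kernel, f, lam, n=100, terms=50):
    """Approximate the solution of phi = f + lam * T phi on [0, 1].

    T is the integral operator with the given kernel, replaced here by a
    midpoint-rule matrix on n nodes. The solution is approximated by the
    truncated Neumann series f + sum_{k=1}^{terms} lam^k T^k f.
    """
    h = 1.0 / n
    xs = [(i + 0.5) * h for i in range(n)]
    fv = [f(x) for x in xs]
    # Quadrature weight h is folded into the matrix entries.
    K = [[kernel(x, y) * h for y in xs] for x in xs]

    phi = fv[:]   # partial sum, starting with the k = 0 term f
    term = fv[:]  # current term lam^k T^k f
    for _ in range(terms):
        term = [lam * sum(row[j] * term[j] for j in range(n)) for row in K]
        phi = [p + t for p, t in zip(phi, term)]
    return xs, phi


# Illustrative data: k(x, y) = x * y, f(x) = x, lam = 1.
xs, phi = solve_fredholm_neumann(lambda x, y: x * y, lambda x: x, 1.0)
```

For this particular kernel the equation can be solved by hand: $\phi(x) = 3x/(3 - \lambda)$, so for $\lambda = 1$ the computed values should be close to $1.5\,x$.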

5.1 The Hahn-Banach theorem

The main question we would like to ask here is whether there are enough bounded linear functionals on a Banach space. For example, we may ask whether for any two points in a Banach space there is a linear functional that separates the two points. The answer is always yes, and it is the Hahn-Banach theorem that takes care of such questions.


Theorem 5.11 (Hahn-Banach (real version)). Let $V$ be a real vector space and $p : V \to \mathbb{R}$ a semi-norm. Suppose $U \subset V$ is a linear subspace of $V$ and let $\ell : U \to \mathbb{R}$ be a linear functional on $U$ such that
\[ \forall v \in U : \quad \ell(v) \le p(v). \]
Then there exists a linear extension $L : V \to \mathbb{R}$, $L|_U = \ell$, such that
\[ \forall v \in V : \quad L(v) \le p(v). \]
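Proof. The proof requires Zorn's lemma. Let us show first that we can always go one dimension higher. So assume that $\dim V/U = 1$. This means that the union of $U$ with one vector $v_0 \notin U$ spans $V$. So every $v \in V$ can be uniquely written as
\[ v = u + \lambda v_0, \]
where $u \in U$ and $\lambda \in \mathbb{R}$. The linear functionals extending $\ell$ are then all of the form
\[ L_r(v) = \ell(u) + \lambda r \]
and we ask if we can choose $r$ in such a way that $L_r(v) \le p(v)$, or in other words
\[ \ell(u) + \lambda r \le p(u + \lambda v_0). \]
By assumption this is certainly true if $\lambda = 0$. For $\lambda > 0$ this condition (required for all $u \in U$) is equivalent to
\[ \lambda r \le p(u + \lambda v_0) - \ell(u) \]
\[ \Leftrightarrow \quad r \le p(u/\lambda + v_0) - \ell(u/\lambda) \]
\[ \Leftrightarrow \quad r \le \inf_{v \in U} \bigl( p(v + v_0) - \ell(v) \bigr), \]
and for negative $\lambda$ it is equivalent to
\[ \lambda r \le p(u + \lambda v_0) - \ell(u) \]
\[ \Leftrightarrow \quad r \ge -p(-u/\lambda - v_0) - \ell(u/\lambda) \]
\[ \Leftrightarrow \quad r \ge \sup_{w \in U} \bigl( \ell(w) - p(w - v_0) \bigr). \]
Therefore there is such an $r$ for all $\lambda$ if and only if for all $v, w \in U$
\[ \ell(w) - p(w - v_0) \le p(v + v_0) - \ell(v), \]
which is equivalent to
\[ \ell(w) + \ell(v) \le p(w - v_0) + p(v + v_0). \]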


But this follows from the triangle inequality
\[ \ell(v) + \ell(w) = \ell(v + w) \le p(v + w) \le p(v + v_0) + p(w - v_0). \]
This shows that we can always go one dimension higher. We may now apply Zorn's lemma to the partially ordered set of extensions of $\ell$. For two extensions $(U_1, L_1)$ and $(U_2, L_2)$ we say that $(U_2, L_2) \ge (U_1, L_1)$ if $U_1 \subset U_2$ and $L_2|_{U_1} = L_1$. Clearly a totally ordered subset $(U_\alpha, L_\alpha)$ has an upper bound $(U_b, L_b)$, obtained by simply setting $U_b = \bigcup_\alpha U_\alpha$ and defining the extension there by $L_b(v) = L_\alpha(v)$ if $v \in U_\alpha$. By Zorn's lemma there is a maximal element $(U_m, L_m)$. Maximality now implies that $U_m = V$, since otherwise we could extend this functional one dimension higher.
Theorem 5.12 (Hahn-Banach (complex version)). Let $V$ be a complex vector space and $p : V \to \mathbb{R}$ a semi-norm. Suppose $U \subset V$ is a linear subspace of $V$ and let $\ell : U \to \mathbb{C}$ be a linear functional on $U$ such that
\[ \forall v \in U : \quad |\ell(v)| \le p(v). \]
Then there exists a linear extension $L : V \to \mathbb{C}$, $L|_U = \ell$, such that
\[ \forall v \in V : \quad |L(v)| \le p(v). \]
Proof. We first note a basic fact. Namely, if $h : V \to \mathbb{R}$ is a real linear functional on a complex vector space (regarded as a real vector space), then the functional
\[ \tilde h(v) = h(v) - i h(iv) \]
is complex linear and of course $\Re(\tilde h) = h$. So let us take the real linear functional $\Re(\ell)$. We can apply the real version of the Hahn-Banach theorem to obtain a real linear extension $h$ of $\Re(\ell)$. Then we define a complex linear extension by $L = \tilde h$ as above. This clearly satisfies
\[ \Re L(v) \le p(v), \]
but by linearity of $L$ this implies
\[ |L(v)| \le p(v), \]
since we can multiply $v$ by $\exp(-i \arg L(v))$.
Theorem 5.13 (Hahn-Banach (continuation version)). Let $V$ be a real or complex Banach space and $U$ a subspace. Then every continuous linear functional $\ell$ on $U$ can be extended to a continuous linear functional $L$ on $V$ with the same norm, i.e. $\|\ell\| = \|L\|$.


For example, applying the Hahn-Banach theorem to the functional $\lambda v \mapsto \lambda$ defined on the one dimensional subspace generated by $v$ gives the following.
Corollary 5.14. Suppose that $V$ is a Banach space and $v \in V$ with $\|v\| = 1$. Then there exists a bounded linear functional $\varphi$ on $V$ such that $\|\varphi\| = 1$ and $\varphi(v) = 1$.
If we take two linearly independent vectors $v_1, v_2 \in V$ and define the functional on the span of $v_1$ and $v_2$ as $\lambda_1 v_1 + \lambda_2 v_2 \mapsto \lambda_1$, we obtain
Corollary 5.15. Suppose that $V$ is a Banach space and $v_1, v_2$ are linearly independent vectors in $V$. Then there is a bounded linear functional $\ell$ on $V$ such that $\ell(v_1) = 1$ and $\ell(v_2) = 0$.

5.2 Topologies on Banach spaces and Operator Topologies

Of course every Banach space comes automatically equipped with its norm topology. In this
section we will describe other interesting topologies on Banach spaces.
Suppose $V$ is a Banach space.
• The norm topology is defined by open balls $B_\epsilon(w) = \{ v \mid \|v - w\| < \epsilon \}$. This means that a sequence $v_n$ converges to $v$ if and only if $\|v_n - v\|$ converges to $0$. We say then that this sequence converges in the norm, or is a norm convergent sequence.
• The weak topology is generated by sets of the form $C_{\epsilon,\ell}(w) = \{ v \mid |\ell(v - w)| < \epsilon \}$, where $\ell$ is in $V^*$ with $\|\ell\| = 1$. Another way of saying this is that the topology is defined by the family of semi-norms
\[ p_\ell(v) = |\ell(v)| \]
where $\ell$ ranges over all linear functionals with norm $1$. A sequence $v_n$ converges to $v$ in this topology if and only if for all $\ell \in V^*$ with $\|\ell\| = 1$ we have
\[ \ell(v_n) \to \ell(v). \]
Remark 5.16. Topologies that are defined by a family of semi-norms $p_\alpha$ are called locally convex topologies because the sets $\{ x \mid p_\alpha(x) < \epsilon \}$ are convex and open neighborhoods of zero. One usually requires the condition that
\[ (\forall \alpha : p_\alpha(x) = 0) \Rightarrow x = 0. \]
This is important since it makes the topology a Hausdorff topology. This means that two different points always have non-intersecting open neighborhoods, which prevents a convergent sequence from having two different limits. If the family of semi-norms is countable then the topology is actually equivalent to a metric topology with metric
\[ d(x, y) = \sum_{k=1}^{\infty} 2^{-k} \, \frac{p_k(x - y)}{1 + p_k(x - y)} . \]

Note that the Hahn-Banach theorem guarantees that the weak topology on a Banach space
is Hausdorff because any two points may be separated by semi-norms.
On the dual space there are yet more topologies. So let $V$ be a Banach space and $V^*$ be its dual space.
• There is of course the usual norm topology on $V^*$, as it is a Banach space in its own right.
• Then there is also the weak topology on $V^*$, as it is a Banach space in its own right.
• The so-called weak-$*$-topology is the topology of pointwise convergence. It is defined as the locally convex topology generated by the family of semi-norms
\[ p_x(v) = |v(x)|, \]
where $x \in V$. Thus, a sequence $(v_n)$ in $V^*$ converges in the weak-$*$-topology to $v \in V^*$ if and only if
\[ v_n(x) \to v(x) \]
for all $x \in V$.
To make the situation even more complicated we also have different topologies on the space of operators $B(V, W)$ if $V$ and $W$ are Banach spaces.
• There is of course the usual norm topology on $B(V, W)$, as it is a Banach space in its own right.
• The strong operator topology is the locally convex topology defined by the semi-norms
\[ p_v(A) = \|Av\|, \]
where $v \in V$. Thus, a sequence of operators $A_n \in B(V, W)$ converges to $A \in B(V, W)$ strongly (or in the strong operator topology) if and only if
\[ \|A_n v - A v\| \to 0 \]
for all $v \in V$, or in other words if and only if $A_n v$ converges to $Av$ in the norm for every $v$.


• The weak operator topology is the locally convex topology defined by the semi-norms
\[ p_{v,w}(A) = |w(Av)|, \]
where $v \in V$ and $w \in W^*$. Thus, a sequence of operators $A_n \in B(V, W)$ converges to $A \in B(V, W)$ weakly (or in the weak operator topology) if and only if $A_n v$ converges to $Av$ weakly for every $v \in V$. This is the case if and only if
\[ w(A_n v) \to w(Av) \]
for every $w \in W^*$ and $v \in V$.

Lp-spaces and their duals

Remember that $\ell^p = \ell^p(\mathbb{N})$ is the space of sequences $x_n$ such that
\[ \Bigl( \sum_{n=1}^{\infty} |x_n|^p \Bigr)^{1/p} < \infty, \]
and the $\ell^p$-norm of $x$ is the above finite number.


Suppose now that $1 \le p < \infty$ and that
\[ \frac{1}{p} + \frac{1}{q} = 1 \]
(with the convention $\frac{1}{\infty} = 0$). Hölder's inequality states that for $s \in \ell^q$ and $t \in \ell^p$ the series
\[ \sum_{n=1}^{\infty} s_n t_n \]
converges as well. So we have a dual pairing between these spaces. In other words we can define a map
\[ T : \ell^q \to (\ell^p)^*, \tag{1} \]
\[ (T s)(t) = \sum_{n=1}^{\infty} s_n t_n . \tag{2} \]
As we will see in the Problem session this map is an isometric isomorphism. So the dual of $\ell^p$ can be naturally identified with $\ell^q$. In particular the dual of $\ell^2$ is equal to $\ell^2$ itself. Remember that $\ell^p$ is just $L^p(\mathbb{N}, \mu)$ if $\mu$ is the counting measure. The above is true for any $\sigma$-finite measure, but we will not prove this here.


Theorem 6.1. Let $1 \le p < \infty$, $\frac{1}{p} + \frac{1}{q} = 1$ and suppose $(\Omega, \mu)$ is a $\sigma$-finite measure space ($\Omega$ is a countable union of sets of finite measure). Then the map
\[ T : L^q(\Omega, \mu) \to (L^p(\Omega, \mu))^*, \tag{3} \]
\[ (T f)(g) = \int f g \, d\mu \tag{4} \]
is an isometric isomorphism.
Note that this is not true any more if we put $q = 1$ and $p = \infty$. The dual of $L^\infty$ is not $L^1$. To see this note that
\[ f \mapsto f(0) \]
is a bounded linear functional on the space $C([-1, 1])$. By Hahn-Banach we can extend it to a bounded linear functional on $L^\infty(-1, 1)$. But such a functional cannot come from an $L^1$ function (see Problem Sheet). The same is true for the space $\ell^\infty$, as the existence of Banach limits shows.
A Banach space $V$ is called reflexive if $V^{**} = V$. As we have seen now, $L^p$ spaces are reflexive if $1 < p < \infty$.

Adjoint operator

If $T$ is a bounded operator from a Banach space $V$ to a Banach space $W$ then the adjoint $T^*$ of that operator is the operator from $W^*$ to $V^*$ defined by
\[ (T^* y)(v) = y(T v) \]
where $y \in W^*$. To see that $T^*$ is bounded, suppose $y$ has norm $1$. Then,
\[ |(T^* y)(v)| = |y(T v)| \le \|T v\| \le \|T\| \, \|v\|, \]
or in other words
\[ \|T^*\| \le \|T\|. \]
By the theorem of Hahn-Banach
\[ \|T\| = \sup_{\|v\|=1} \sup_{\|y\|=1} |y(T v)| = \sup_{\|y\|=1} \|T^* y\| = \|T^*\|, \]
so that the norms are actually equal.


Example 7.1. Let $T$ be the left shift operator on the space $\ell^p$, where $1 \le p < \infty$:
\[ T(x_1, x_2, \ldots) = (x_2, x_3, \ldots). \]
Then $T^*$ is the right shift operator on $\ell^q = (\ell^p)^*$,
\[ T^*(x_1, x_2, x_3, \ldots) = (0, x_1, x_2, \ldots), \]
because
\[ s(T x) = \sum_{k=1}^{\infty} s_k x_{k+1} = s_1 x_2 + s_2 x_3 + \ldots = (T^* s)(x). \]
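The duality between the two shifts in Example 7.1 can be checked on finitely supported sequences. This is a small sketch, assuming sequences are stored as Python lists of equal length that are zero beyond the stored entries; the helper names are hypothetical.

```python
def left_shift(x):
    """T(x1, x2, ...) = (x2, x3, ...), padded with a trailing zero."""
    return x[1:] + [0]


def right_shift(s):
    """T*(s1, s2, ...) = (0, s1, s2, ...), truncated to the stored length."""
    return [0] + s[:-1]


def pairing(s, x):
    """The dual pairing s(x) = sum of s_n * x_n."""
    return sum(a * b for a, b in zip(s, x))


s = [1, -2, 3, 5]
x = [4, 0, 7, -1]
lhs = pairing(s, left_shift(x))   # s(T x)
rhs = pairing(right_shift(s), x)  # (T* s)(x)
```

Both sides are the same sum $s_1 x_2 + s_2 x_3 + s_3 x_4$, so `lhs` and `rhs` agree.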

Other topological vector spaces

As we saw locally convex topologies on vector spaces may be defined by a family of semi-norms.
We would like to give here further examples of topological function spaces that are important
in applications.

8.1 The space $C_0^\infty(\mathbb{R}^n)$

This is the space of compactly supported smooth functions. That is, a function $f$ is by definition in $C_0^\infty(\mathbb{R}^n)$ if it is infinitely often differentiable and vanishes outside a compact set. The smallest closed set outside of which $f$ vanishes is called the support of $f$ and is denoted by $\mathrm{supp} f$. The topology on $C_0^\infty(\mathbb{R}^n)$ is a locally convex topology that can most easily be described by the convergence of sequences. Namely, a sequence $f_n$ in $C_0^\infty(\mathbb{R}^n)$ converges to $f$ in $C_0^\infty(\mathbb{R}^n)$ if and only if there exists a compact set $K$ such that $\mathrm{supp}(f_n) \subset K$ and $\mathrm{supp}(f) \subset K$ and moreover
\[ \partial^\alpha f_n \to \partial^\alpha f \]
uniformly on $K$ for every multi-index $\alpha$. That is, the function and all its derivatives converge uniformly. The topological dual of this space is the space of distributions $\mathcal{D}'(\mathbb{R}^n)$. These are the linear functionals $\phi$ on $C_0^\infty(\mathbb{R}^n)$ such that for every compact set $K \subset \mathbb{R}^n$ there is a $k \in \mathbb{N}$ and a positive constant $C$ such that
\[ |\phi(f)| \le C \sum_{|\alpha| \le k} \sup_{x \in K} |\partial^\alpha f(x)|, \quad \forall f \in C_0^\infty(K). \]

8.2 The space $C^\infty(\mathbb{R}^n)$

This is the space of smooth (infinitely often differentiable) functions. We endow it with the
topology of uniform convergence of all derivatives on compact subsets.


8.3 The space $\mathcal{S}(\mathbb{R}^n)$

This is the space of Schwartz functions. These are functions that together with all their derivatives decay faster than the inverse of any polynomial. This means $\mathcal{S}(\mathbb{R}^n)$ is the space of smooth functions $f$ such that for any multi-indices $\alpha, \beta$ we have
\[ \sup_x |x^\alpha \partial^\beta f(x)| < \infty. \]
If we take the best constants as semi-norms, that is
\[ p_{\alpha,\beta}(f) = \sup_x |x^\alpha \partial^\beta f(x)|, \]
this becomes a locally convex topological vector space with a countable family of semi-norms which is complete (a Fréchet space). The dual space is the space of Schwartz distributions $\mathcal{S}'(\mathbb{R}^n)$. These are the linear functionals $\phi$ on $\mathcal{S}(\mathbb{R}^n)$ such that there exist $C > 0$ and $N > 0$ with
\[ |\phi(f)| \le C \sum_{|\alpha| + |\beta| \le N} \sup_x |x^\alpha \partial^\beta f(x)| . \]

Example 8.1. Any function $f \in L^1_{loc}(\mathbb{R}^n)$ defines a distribution $\phi_f$ in $\mathcal{D}'(\mathbb{R}^n)$ by
\[ \phi_f(g) = \int_{\mathbb{R}^n} f(x) g(x) \, dx. \]
It is not difficult to see that the map $L^1_{loc}(\mathbb{R}^n) \to \mathcal{D}'(\mathbb{R}^n)$ is injective, so we can naturally understand $L^1_{loc}(\mathbb{R}^n)$ as a subset of $\mathcal{D}'(\mathbb{R}^n)$.
Example 8.2. The so-called Dirac delta function is the distribution $\delta$ in $\mathcal{S}'(\mathbb{R}^n)$ defined by
\[ \delta(g) = g(0). \]
Similarly, the Dirac delta function centered at $x_0 \in \mathbb{R}^n$ is the distribution defined by
\[ \delta_{x_0}(g) = g(x_0). \]
Note that multiplication and differentiation induce continuous operators on these spaces. The adjoint maps are then of course defined on the space of distributions.
Definition 8.3. Let $\phi$ be a distribution in either $\mathcal{D}'(\mathbb{R}^n)$ or $\mathcal{S}'(\mathbb{R}^n)$. Then the distributional derivative $\partial^\alpha \phi$ is defined as
\[ (\partial^\alpha \phi)(g) := \phi\bigl( (-1)^{|\alpha|} \partial^\alpha g \bigr). \]
Integration by parts shows that if $\phi$ is given by a smooth function then the distributional derivative coincides with the ordinary derivative. The new thing is now that any distribution is differentiable arbitrarily many times in the distributional sense.


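Theorem 8.4. Suppose that $\phi$ is a distribution in $\mathcal{D}'(\mathbb{R})$ such that $\phi' = 0$. Then $\phi$ is constant, i.e. there is $a \in \mathbb{C}$ such that
\[ \phi = a. \]
Proof. $\phi' = 0$ means that $\phi(f') = 0$ for any test function $f \in C_0^\infty(\mathbb{R})$. If $\int g(x) \, dx = 0$ then $h(x) = \int_{-\infty}^{x} g(t) \, dt$ is also a test function and $g = h'$. Therefore, if $\psi \in C_0^\infty(\mathbb{R})$ is such that $\int \psi(x) \, dx = 1$ then
\[ 0 = \phi\Bigl( g(x) - \psi(x) \int g(t) \, dt \Bigr) = \phi(g) - \phi(\psi) \int g(x) \, dx. \]
Therefore,
\[ \phi(g) = a \int g(x) \, dx, \]
where $a = \phi(\psi)$.
By induction one easily shows that if $\phi^{(k+1)} = 0$ then $\phi$ is a polynomial of degree at most $k$.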
Example 8.5. Let $H$ be the step function defined by $H(x) = 1$ for $x \ge 0$ and $H(x) = 0$ for $x < 0$. Then $H$ is not differentiable in the ordinary sense at $0$. However, in the distributional sense we have
\[ H' = \delta, \]
since
\[ H'(f) = -\int H(x) f'(x) \, dx = -\int_0^\infty f'(x) \, dx = f(0). \]
We may also multiply a distribution $\phi$ by a smooth function $f$ (in the case of Schwartz distributions the function and all its derivatives have to be polynomially bounded):
\[ (f \phi)(g) := \phi(f g). \]
Summarizing, we have the following natural inclusions:
\[ C_0^\infty(\mathbb{R}^n) \subset \mathcal{S}(\mathbb{R}^n) \subset \mathcal{S}'(\mathbb{R}^n) \subset \mathcal{D}'(\mathbb{R}^n) \]
and
\[ C^\infty(\mathbb{R}^n) \subset \mathcal{D}'(\mathbb{R}^n). \]
In order to talk about convergence we endow the spaces of distributions $\mathcal{S}'(\mathbb{R}^n)$ and $\mathcal{D}'(\mathbb{R}^n)$ with their weak-$*$-topologies. This means that if $\phi_n$ is a sequence of distributions in $\mathcal{S}'(\mathbb{R}^n)$ then $\phi_n$ converges in this topology to $\phi$ if and only if
\[ \phi_n(f) \to \phi(f) \]
for all $f \in \mathcal{S}(\mathbb{R}^n)$. Similarly, if $\phi_n$ is a sequence of distributions in $\mathcal{D}'(\mathbb{R}^n)$ then $\phi_n$ converges in this topology to $\phi$ if and only if
\[ \phi_n(f) \to \phi(f) \]
for all $f \in \mathcal{D}(\mathbb{R}^n) = C_0^\infty(\mathbb{R}^n)$.

Figure 1: Graph of $f(x)$
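Example 8.6. Let $f \in C_0^\infty(\mathbb{R})$ be the function
\[ f(x) = \begin{cases} e^{-\frac{1}{1 - x^2}} & \text{for } x \in (-1, 1), \\ 0 & \text{for } x \notin (-1, 1). \end{cases} \]
The graph of $f(x)$ is plotted in Figure 1. Let $c$ be the integral
\[ c = \int f(x) \, dx. \]
Then, the function
\[ g(x) = \frac{1}{c} f(x) \]
has support equal to $[-1, 1]$, is non-negative, and
\[ \int g(x) \, dx = 1. \]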
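If we define $g_n(x) = n g(nx)$ then the sequence $g_n$ consists of non-negative functions with support in $[-1/n, 1/n]$ and we have
\[ \int g_n(x) \, dx = 1. \]
From this it already follows that $g_n$ converges in the weak-$*$-topology to the Dirac $\delta$-distribution. Indeed, suppose $\psi$ is a test function in $\mathcal{S}(\mathbb{R})$. Then
\[ \int g_n(x) \psi(x) \, dx = \int_{-1/n}^{1/n} g_n(x) \psi(x) \, dx = \int_{-1/n}^{1/n} g_n(x) \psi(0) \, dx + \int_{-1/n}^{1/n} g_n(x) \bigl( \psi(x) - \psi(0) \bigr) \, dx. \]
For the second term we have
\[ \Bigl| \int_{-1/n}^{1/n} g_n(x) \bigl( \psi(x) - \psi(0) \bigr) \, dx \Bigr| \le \sup_{|x| < 1/n} |\psi(x) - \psi(0)|, \]
which goes to zero as $n \to \infty$ because $\psi$ is continuous. The first term is equal to $\psi(0)$ and therefore,
\[ \lim_{n \to \infty} \int g_n(x) \psi(x) \, dx = \psi(0), \]
or in other words $g_n \to \delta$ in the weak-$*$-topology.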
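The convergence in Example 8.6 can be observed numerically. The sketch below is only an illustration under stated assumptions: integrals are approximated by a midpoint rule, and the test function `psi` in the usage lines is an arbitrary smooth, rapidly decaying choice.

```python
import math


def bump(x):
    """The function f from Example 8.6: exp(-1/(1-x^2)) on (-1, 1), else 0."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def integrate(func, a, b, steps=4000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


C = integrate(bump, -1.0, 1.0)  # numerical value of the constant c


def g_n(n, x):
    """g_n(x) = n * g(n x) with g = f / c."""
    return n * bump(n * x) / C


def pair_with(n, psi):
    """Approximate the integral of g_n * psi over the support [-1/n, 1/n]."""
    return integrate(lambda x: g_n(n, x) * psi(x), -1.0 / n, 1.0 / n)


psi = lambda x: math.exp(-x * x) * (1.0 + x)  # psi(0) = 1
errors = [abs(pair_with(n, psi) - psi(0.0)) for n in (5, 50)]
```

The two entries of `errors` shrink as $n$ grows, in line with $g_n \to \delta$.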


More generally we say a sequence of functions $g_k \in \mathcal{S}(\mathbb{R}^n)$ is a $\delta$-family if and only if
• $\int_{\mathbb{R}^n} g_k(x) \, dx = 1$,
• $g_k(x) \to 0$ pointwise for all $x \ne 0$,
• $\limsup_k \int_{\mathbb{R}^n} |g_k(x)| \, dx < \infty$,
and in this case we know that $g_k \to \delta$ in the weak-$*$-topology by the same argument as above. We already saw one example of a $\delta$-family. Similarly, if $f(x)$ is a non-negative function of compact support such that
\[ \int f(x) \, dx = 1 \]
then $g_k(x) = k^n f(kx)$ is a $\delta$-family. A more concrete example is the family of Gaussian distributions
\[ g_\epsilon(x) = \frac{1}{(\sqrt{2 \pi \epsilon^2})^n} \, e^{-\frac{|x|^2}{2 \epsilon^2}}, \]
which is a $\delta$-family as $\epsilon \to 0$ (to obtain a sequence one could choose $\epsilon = 1/k$).
Now suppose $f$ is a smooth function and $g$ is a smooth function with compact support. We define the convolution product of $f$ and $g$ by
\[ (f * g)(x) = \int_{\mathbb{R}^n} f(x - t) g(t) \, dt. \]
We would like to define the convolution of a distribution in $\mathcal{D}'(\mathbb{R}^n)$ with a function in $C_0^\infty(\mathbb{R}^n)$. The above integral can be re-written as
\[ \int_{\mathbb{R}^n} f(x - t) g(t) \, dt = \int_{\mathbb{R}^n} f(t) g(x - t) \, dt = \langle f, \tilde g_x \rangle, \]
where $\tilde g_x(t) = g(x - t)$ is the function that is obtained from $g$ by a reflection and a shift by $x$. But this makes sense also for distributions. So we may define the convolution product of a distribution $f$ with a compactly supported smooth function $g$ by $(f * g)(x) = f(\tilde g_x)$. The result will be a smooth function. This can be used to obtain the following density result, as one can show that the sequence $f * g_n$ converges in the weak-$*$-topology to $f$.
Theorem 8.7. The space $C_0^\infty(\mathbb{R}^n)$ is (sequentially) dense in $\mathcal{S}'(\mathbb{R}^n)$ and $\mathcal{D}'(\mathbb{R}^n)$, and the distributional derivatives as well as multiplication by functions are the unique continuous extensions of these maps on $C_0^\infty(\mathbb{R}^n)$.


Fourier transforms

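As we already saw, the Fourier transform $\mathcal{F}$,
\[ \mathcal{F}(f)(\xi) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} f(x) e^{-i \langle \xi, x \rangle} \, dx, \]
defines a continuous map from $L^1(\mathbb{R}^n)$ to $L^\infty(\mathbb{R}^n)$. We will write $\hat f$ for the Fourier transform $\mathcal{F} f$.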
Remarkably the Fourier transform of a Schwartz function is again a Schwartz function.
Proposition 9.1. The Fourier transform F is a continuous linear map from S(Rn ) to S(Rn ).
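Proof. Since each function $f \in \mathcal{S}(\mathbb{R}^n)$ decays faster than any negative power of $|x|$, for any multi-index $\alpha$ the function $x^\alpha f(x)$ is absolutely integrable. This proves that $\hat f$ is smooth and its derivatives are given by
\[ \partial_\xi^\alpha \hat f(\xi) = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} f(x) \, \partial_\xi^\alpha e^{-i \langle \xi, x \rangle} \, dx = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} f(x) (-ix)^\alpha e^{-i \langle \xi, x \rangle} \, dx = \widehat{(-ix)^\alpha f}(\xi). \]
In the same way one gets
\[ \xi^\beta \partial_\xi^\alpha \hat f(\xi) = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} f(x) (-ix)^\alpha \xi^\beta e^{-i \langle \xi, x \rangle} \, dx = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} f(x) (-ix)^\alpha (i \partial_x)^\beta e^{-i \langle \xi, x \rangle} \, dx \]
and after integration by parts
\[ \xi^\beta \partial_\xi^\alpha \hat f(\xi) = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} \bigl( (-i \partial_x)^\beta (-ix)^\alpha f(x) \bigr) e^{-i \langle \xi, x \rangle} \, dx. \]
If $f \in \mathcal{S}(\mathbb{R}^n)$ then the right hand side is finite for all $\alpha$ and $\beta$. Moreover,
\[ |\xi^\beta \partial_\xi^\alpha \hat f(\xi)| \le (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} \bigl| (-i \partial_x)^\beta (-ix)^\alpha f(x) \bigr| \, dx. \]
The right hand side does not depend on $\xi$ any more, and the map
\[ f \mapsto \int_{\mathbb{R}^n} \bigl| (-i \partial_x)^\beta (-ix)^\alpha f(x) \bigr| \, dx \]
can be bounded by the semi-norms of $\mathcal{S}(\mathbb{R}^n)$ after application of the product rule. Therefore, $\mathcal{F}$ is continuous as a map from $\mathcal{S}(\mathbb{R}^n)$ to $\mathcal{S}(\mathbb{R}^n)$.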

Figure 2: Graph of $f(x)$ and its Fourier transform


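As we saw in the proof, the Fourier transform intertwines differentiation and multiplication by $x$.
Proposition 9.2. If $\alpha$ is a multi-index and $f \in \mathcal{S}(\mathbb{R}^n)$ then
\[ \partial_\xi^\alpha (\mathcal{F} f) = (-i)^{|\alpha|} \mathcal{F}(x^\alpha f), \]
\[ \mathcal{F}(\partial_x^\alpha f) = (i)^{|\alpha|} \xi^\alpha \mathcal{F}(f). \]
Proposition 9.3. Let $g(x) = e^{i \langle \xi_0, x \rangle} f(x)$. Then $\hat g(\xi) = \hat f(\xi - \xi_0)$.
Lemma 9.4. Let $g \in \mathcal{S}(\mathbb{R}^n)$ be the standard Gaussian
\[ g(x) = \frac{1}{(2\pi)^{n/2}} e^{-\|x\|^2/2}, \]
where $\|x\| = \sqrt{\langle x, x \rangle}$ is the euclidean norm. Then,
\[ \hat g(\xi) = \frac{1}{(2\pi)^{n/2}} e^{-\|\xi\|^2/2}. \]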

Proof. Let us first prove the case $n = 1$. Then $g$ is the unique solution to the ordinary differential equation
\[ g'(x) + x g(x) = 0 \]
with initial condition $g(0) = \frac{1}{\sqrt{2\pi}}$. By Prop. 9.2 the function $\hat g$ therefore satisfies the differential equation
\[ i \xi \hat g(\xi) + i \hat g'(\xi) = 0, \]
which is the same differential equation. Now of course
\[ \hat g(0) = \frac{1}{\sqrt{2\pi}} \int g(x) \, dx = \frac{1}{\sqrt{2\pi}}. \]
This means $\hat g$ satisfies the same first order differential equation with the same initial condition. Therefore, $\mathcal{F} g = g$, which proves the Lemma for $n = 1$. The case $n > 1$ is easily reduced to the case $n = 1$, as $g$ is a product of Gaussian functions
\[ g(x) = \frac{1}{(2\pi)^{n/2}} e^{-x_1^2/2} \cdots e^{-x_n^2/2} \]
and therefore the integral factorizes and we get
\[ \hat g(\xi) = \frac{1}{(2\pi)^{n}} \int \cdots \int e^{-x_1^2/2} \cdots e^{-x_n^2/2} \, e^{-i x_1 \xi_1} \cdots e^{-i x_n \xi_n} \, dx_1 \cdots dx_n = \frac{1}{(2\pi)^{n/2}} e^{-\xi_1^2/2} \cdots e^{-\xi_n^2/2} = \frac{1}{(2\pi)^{n/2}} e^{-\|\xi\|^2/2}. \]

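A simple change of variables shows that the Fourier transform of
\[ g_a(x) = \frac{1}{(2\pi)^{n/2}} e^{-\frac{a^2 \|x\|^2}{2}} \]
is given by
\[ \hat g_a(\xi) = \frac{1}{a^n} \frac{1}{(2\pi)^{n/2}} e^{-\frac{\|\xi\|^2}{2 a^2}}. \]
Note that as $a \to 0$ this converges to $\delta$ in the weak-$*$-topology on $\mathcal{S}'(\mathbb{R}^n)$. So it is quite interesting that as $a \to 0$ the function $g_a$ spreads out more and more and converges to the constant $\frac{1}{(2\pi)^{n/2}}$, while its Fourier transform concentrates more and more near $0$ and converges to $\delta$.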
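Lemma 9.4 and the scaling formula for $g_a$ can be checked numerically in dimension $n = 1$. The following sketch assumes the normalisation $(2\pi)^{-1/2} \int f(x) e^{-i \xi x} \, dx$ used in these notes and approximates the integral by a midpoint rule on a truncated interval; the cut-off `L` and the step count are illustrative choices.

```python
import cmath
import math


def fourier_1d(f, xi, L=12.0, steps=6000):
    """Approximate (2*pi)^(-1/2) * integral of f(x) e^{-i xi x} dx over [-L, L]."""
    h = 2.0 * L / steps
    total = 0j
    for i in range(steps):
        x = -L + (i + 0.5) * h
        total += f(x) * cmath.exp(-1j * xi * x)
    return total * h / math.sqrt(2.0 * math.pi)


def gauss(x):
    """Standard Gaussian g for n = 1."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def gauss_a(a):
    """The rescaled function g_a(x) = (2*pi)^(-1/2) * exp(-a^2 x^2 / 2)."""
    return lambda x: math.exp(-a * a * x * x / 2.0) / math.sqrt(2.0 * math.pi)


# The transform of g at xi = 1 should be close to g(1).
value = fourier_1d(gauss, 1.0)
```

For $a = 2$ the same routine reproduces $\hat g_a(\xi) = \frac{1}{a} (2\pi)^{-1/2} e^{-\xi^2/(2a^2)}$ up to quadrature error.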
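Lemma 9.5. If $f, g \in \mathcal{S}(\mathbb{R}^n)$, then
\[ \int (\mathcal{F} f)(x) g(x) \, dx = \int f(x) (\mathcal{F} g)(x) \, dx. \]
Proof. This is an easy consequence of Fubini's theorem. Namely,
\[ \int \hat f(\xi) g(\xi) \, d\xi = \frac{1}{(2\pi)^{n/2}} \int \int f(x) e^{-i \langle \xi, x \rangle} g(\xi) \, dx \, d\xi = \int f(x) \hat g(x) \, dx. \]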

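Lemma 9.6. For any $f \in \mathcal{S}(\mathbb{R}^n)$ we have
\[ (\mathcal{F} \mathcal{F} f)(x) = f(-x). \]
Proof. We apply Lemma 9.5 to the function $f$ and the function $e^{-i \langle x, \xi \rangle} g_a(x)$, where $g_a(x) = \frac{1}{(2\pi)^{n/2}} e^{-\frac{a^2 \|x\|^2}{2}}$. Then, by Proposition 9.3,
\[ \int \hat f(x) e^{-i \langle x, \xi \rangle} g_a(x) \, dx = \int f(x) \hat g_a(x + \xi) \, dx = \int f(x - \xi) \hat g_a(x) \, dx. \]
Now, as $a \to 0$, the left hand side converges to $(\mathcal{F} \mathcal{F} f)(\xi)$ and the right hand side converges to $f(-\xi)$.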
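Theorem 9.7. The Fourier transform is invertible and continuous as a map from $\mathcal{S}(\mathbb{R}^n)$ to $\mathcal{S}(\mathbb{R}^n)$. Its inverse is given by
\[ (\mathcal{F}^{-1} f)(x) = \frac{1}{(2\pi)^{n/2}} \int f(\xi) e^{+i \langle x, \xi \rangle} \, d\xi. \]
As Lemma 9.5 shows, for Schwartz functions $g$ and $f$ we have
\[ \hat g(f) = g(\hat f), \]
if we understand $g$ as a Schwartz distribution. Therefore, the adjoint map $\mathcal{F}^* : \mathcal{S}'(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$ coincides with $\mathcal{F}$ on the dense subset $\mathcal{S}(\mathbb{R}^n)$. We conclude:
Theorem 9.8. The Fourier transform $\mathcal{F} : \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$ has a unique continuous extension to a map (denoted here for convenience by the same letter) $\mathcal{F} : \mathcal{S}'(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$. This map coincides with the adjoint of $\mathcal{F}$. That is, the Fourier transform $\hat F \in \mathcal{S}'(\mathbb{R}^n)$ of a distribution $F \in \mathcal{S}'(\mathbb{R}^n)$ may be defined as
\[ \hat F(\psi) = F(\hat \psi). \]
The Fourier transform on $L^1(\mathbb{R}^n)$ as defined earlier coincides with the restriction of this map to $L^1(\mathbb{R}^n) \subset \mathcal{S}'(\mathbb{R}^n)$.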
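Example 9.9. Let us calculate the Fourier transform of the constant function $1$. Since $1$ is not in $L^1(\mathbb{R}^n)$ the Fourier transform is only defined in the sense of distributions because the integral
\[ \int e^{-i \langle \xi, x \rangle} \, dx \]
does not converge. If $\psi$ is a test function in $\mathcal{S}(\mathbb{R}^n)$, then
\[ \hat 1(\psi) = 1(\hat \psi) = \int_{\mathbb{R}^n} \hat \psi(\xi) \, d\xi = (2\pi)^{n/2} \psi(0). \]
Or in other words
\[ \hat 1 = (2\pi)^{n/2} \delta. \]
Since $(\mathcal{F} \mathcal{F} f)(x) = f(-x)$ we get
\[ \hat \delta = \frac{1}{(2\pi)^{n/2}} \, 1. \]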


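By duality Proposition 9.2 is also valid for distributions. That is:
Proposition 9.10. If $\alpha$ is a multi-index and $F \in \mathcal{S}'(\mathbb{R}^n)$ then
\[ \partial_\xi^\alpha (\mathcal{F} F) = (-i)^{|\alpha|} \mathcal{F}(x^\alpha F), \]
\[ \mathcal{F}(\partial_x^\alpha F) = (i)^{|\alpha|} \xi^\alpha \mathcal{F}(F). \]
Exercise 9.11. Prove that if $f, g \in L^1(\mathbb{R}^n)$, then
\[ \widehat{f * g}(\xi) = (2\pi)^{n/2} \hat f(\xi) \hat g(\xi). \]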
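For two functions $f, g$ in $L^2(\mathbb{R}^n)$ we denote their $L^2$-scalar product by
\[ \langle f, g \rangle := \int_{\mathbb{R}^n} \overline{f(x)} g(x) \, dx. \]
Note that for $f, g \in \mathcal{S}(\mathbb{R}^n)$, using $\overline{\mathcal{F}(f)(x)} = \mathcal{F}(\bar f)(-x)$, Lemma 9.5 and Lemma 9.6,
\[ \langle \hat f, \hat g \rangle = \int_{\mathbb{R}^n} \overline{\hat f(x)} \hat g(x) \, dx = \int_{\mathbb{R}^n} \overline{\mathcal{F}(f)(x)} \, \hat g(x) \, dx = \int_{\mathbb{R}^n} \mathcal{F}(\bar f)(-x) \, \hat g(x) \, dx = \int_{\mathbb{R}^n} \overline{f(x)} \, (\mathcal{F} \mathcal{F} g)(-x) \, dx = \int_{\mathbb{R}^n} \overline{f(x)} g(x) \, dx = \langle f, g \rangle. \]
Since $\|f\|_2^2 = \langle f, f \rangle$ we have in particular $\|\hat f\|_2 = \|f\|_2$. Since $\mathcal{S}(\mathbb{R}^n)$ is dense in $L^2(\mathbb{R}^n)$, we have therefore proved:
Theorem 9.12. The Fourier transform $\mathcal{F}$ is an isometric isomorphism from $L^2(\mathbb{R}^n)$ to $L^2(\mathbb{R}^n)$.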

10 Hilbert Spaces

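Hilbert spaces are Banach spaces that have an extra structure. Namely, their norm comes from an inner product. Let $K$ be the field $\mathbb{R}$ or $\mathbb{C}$.
Definition 10.1. Let $V$ be a $K$-vector space. Then a map
\[ \langle \cdot, \cdot \rangle : V \times V \to K \]
is called an inner product (sometimes scalar product) if
(1) $\langle v, \lambda_1 w_1 + \lambda_2 w_2 \rangle = \lambda_1 \langle v, w_1 \rangle + \lambda_2 \langle v, w_2 \rangle$ for all $v, w_1, w_2 \in V$ and $\lambda_1, \lambda_2 \in K$,
(2) $\langle v, w \rangle = \overline{\langle w, v \rangle}$ for all $v, w \in V$,
(3) $\langle v, v \rangle > 0$ for all $v \in V$ with $v \ne 0$.
In case $K = \mathbb{R}$ the second condition means that the scalar product is symmetric. Note that as a direct consequence of these properties one also has
\[ \langle \lambda v, w \rangle = \bar\lambda \langle v, w \rangle, \]
\[ \langle v_1 + v_2, w \rangle = \langle v_1, w \rangle + \langle v_2, w \rangle. \]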
A finite dimensional example is the standard inner product on $\mathbb{R}^n$ given by
\[ \langle x, y \rangle = \sum_{j=1}^{n} x_j y_j \]
or on $\mathbb{C}^n$ given by
\[ \langle x, y \rangle = \sum_{j=1}^{n} \bar x_j y_j. \]
Another example is the space $\ell^2$ with scalar product given by
\[ \langle x, y \rangle = \sum_{j=1}^{\infty} \bar x_j y_j. \]
If $(X, \mu)$ is a measure space then also
\[ \langle f, g \rangle = \int_X \overline{f(x)} g(x) \, d\mu(x) \]
is an inner product on $L^2(X, \mu)$.
A vector space with an inner product is called a pre-Hilbert space.
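Theorem 10.2 (Cauchy-Schwarz Inequality). Let $V$ be a pre-Hilbert space. Then for all $v, w \in V$:
\[ |\langle v, w \rangle|^2 \le \langle v, v \rangle \langle w, w \rangle \]
with equality if and only if $v$ and $w$ are linearly dependent.
Proof. If one of the vectors is $0$ both sides are equal to $0$ and there is nothing left to show. We can therefore safely assume that both vectors are non-zero. Consider the vector $v - \lambda w$. Then
\[ 0 \le \langle v - \lambda w, v - \lambda w \rangle = \langle v, v \rangle - \bar\lambda \langle w, v \rangle - \lambda \langle v, w \rangle + |\lambda|^2 \langle w, w \rangle. \]
Now choose $\lambda = \frac{\langle w, v \rangle}{\langle w, w \rangle}$. Then this becomes
\[ \langle v, v \rangle - 2 |\langle w, v \rangle|^2 \langle w, w \rangle^{-1} + |\langle w, v \rangle|^2 \langle w, w \rangle^{-1} \ge 0. \]
This means of course
\[ \langle v, v \rangle \langle w, w \rangle - |\langle w, v \rangle|^2 \ge 0, \]
with equality if and only if $v - \lambda w = 0$.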
The Cauchy-Schwarz inequality implies that $\|v\| := \sqrt{\langle v, v \rangle}$ satisfies the triangle inequality and thus defines a norm:
\[ \|v + w\|^2 = \langle v + w, v + w \rangle = \langle v, v \rangle + \langle w, w \rangle + 2 \Re \langle v, w \rangle \le \|v\|^2 + \|w\|^2 + 2 \|v\| \|w\| = (\|v\| + \|w\|)^2. \]
Note that the inner product can be recovered from the norm by
\[ \langle v, w \rangle = \frac{1}{4} \bigl( \|v + w\|^2 - \|v - w\|^2 \bigr) \]
if $V$ is a real vector space and from
\[ \langle v, w \rangle = \frac{1}{4} \bigl( \|v + w\|^2 - \|v - w\|^2 - i \|v + iw\|^2 + i \|v - iw\|^2 \bigr) \]
if $V$ is a complex inner product space (with the inner product linear in the second argument, as in Definition 10.1). One can show (and this will not be done here) that a norm on a normed vector space comes from a scalar product if and only if the parallelogram identity
\[ \|v + w\|^2 + \|v - w\|^2 = 2 (\|v\|^2 + \|w\|^2) \]
holds.
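The complex polarization identity depends on which argument of the inner product is conjugate linear, so a quick numerical check is a useful safeguard. The sketch below assumes the convention of Definition 10.1 (linear in the second argument, conjugate linear in the first), for which the identity reads $\langle v, w \rangle = \frac{1}{4} ( \|v + w\|^2 - \|v - w\|^2 - i \|v + iw\|^2 + i \|v - iw\|^2 )$; the vectors are arbitrary illustrative data in $\mathbb{C}^3$.

```python
def inner(v, w):
    """Standard inner product on C^n, conjugate linear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(v, w))


def norm_sq(v):
    """Squared norm induced by the inner product."""
    return inner(v, v).real


def polarization(v, w):
    """Recover <v, w> from norms only, for the convention used in inner()."""
    combo = lambda s: [a + s * b for a, b in zip(v, w)]
    return 0.25 * (norm_sq(combo(1)) - norm_sq(combo(-1))
                   - 1j * norm_sq(combo(1j)) + 1j * norm_sq(combo(-1j)))


v = [1 + 2j, -1j, 3 + 0j]
w = [2 - 1j, 4 + 0j, 1j]
difference = abs(polarization(v, w) - inner(v, w))
```

With an inner product that is linear in the first argument instead, the signs of the two imaginary terms are swapped.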
Definition 10.3. A Hilbert space is an inner product space which is complete in the norm defined by the inner product.
Definition 10.4. Let $V$ be an inner product space. Then we say $v$ and $w$ are orthogonal and write $v \perp w$ if and only if $\langle v, w \rangle = 0$. We say two subsets $U, W \subset V$ are orthogonal and write $U \perp W$ if and only if $u \perp w$ for all $u \in U$ and $w \in W$. The orthogonal complement $W^\perp$ of a subset $W \subset V$ is defined as
\[ W^\perp = \{ v \in V \mid \forall w \in W : v \perp w \}. \]


This of course means that $W^\perp$ is the largest subset that is orthogonal to $W$. The following properties are immediate from the definition and the fact that the map
\[ v \mapsto \langle w, v \rangle \]
is continuous for each $w$ (this follows from the Cauchy-Schwarz inequality).
• $W^\perp$ is always a closed subspace,
• $W \subset (W^\perp)^\perp$,
• $W^\perp = (\overline{\mathrm{span}\, W})^\perp$,
• Pythagoras: $v \perp w \Rightarrow \|v + w\|^2 = \|v\|^2 + \|w\|^2$.


Theorem 10.5. Let $H$ be a Hilbert space and $V \subset H$ be a convex closed subset. Suppose that $x \in H$. Then there exists precisely one $v \in V$ such that
\[ \|v - x\| = \inf_{w \in V} \|w - x\|. \]
Proof. Of course we can assume that $x \notin V$ since otherwise the statement is trivial (simply choose $v = x$). We can also assume without loss of generality that $x = 0$ by simply shifting by $x$. Now let $v_n$ be a sequence in $V$ such that
\[ \|v_n\| \to \inf_{w \in V} \|w\| = d. \]
Note that $d$ is the distance of $0$ from $V$. By the parallelogram identity
\[ \|v_n + v_m\|^2 + \|v_n - v_m\|^2 = 2 (\|v_n\|^2 + \|v_m\|^2) \to 4 d^2 \]
as $n, m \to \infty$. But $\| \frac{v_n + v_m}{2} \|^2 \ge d^2$ because by assumption $V$ is convex and therefore $\frac{v_n + v_m}{2} \in V$. Therefore,
\[ \|v_n - v_m\| \to 0 \]
as $m, n \to \infty$. Since $v_n$ is a Cauchy sequence it converges in $H$, and since $V$ is closed the limit $v$ is in $V$. By continuity of the norm we have
\[ \|v\| = d. \]
Uniqueness follows from this construction as well, because if $w$ and $v$ are two such points then it follows by the above that the alternating sequence $(v, w, v, w, \ldots)$ is Cauchy, which implies that $v = w$.


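Theorem 10.6. Suppose that $V \subset H$ is a closed subspace in a Hilbert space, and let $x \in H$. Then there is a unique element $v \in V$ such that
\[ \forall w \in V : \quad \langle v - x, w \rangle = 0. \]
This element is also the unique element satisfying
\[ \|x - v\| = \inf_{w \in V} \|x - w\|. \]
Proof. Let us show that for $v \in V$ the conditions
\[ \forall w \in V : \quad \langle v - x, w \rangle = 0 \]
and
\[ \|x - v\| = \inf_{w \in V} \|x - w\| \]
are equivalent. Suppose
\[ \langle v - x, w \rangle = 0 \]
for all $w \in V$. Then, for $w \in V$,
\[ \|x - w\|^2 = \|x - v - (w - v)\|^2 = \|x - v\|^2 + \|w - v\|^2 \ge \|x - v\|^2. \]
Conversely suppose that
\[ \|x - v\| \le \|x - v + t w\| \]
for all $w \in V$ and all $t$ in the field. Therefore
\[ \|x - v\|^2 + |t|^2 \|w\|^2 + 2 \Re( t \langle x - v, w \rangle ) \ge \|x - v\|^2. \]
Replacing $t$ by $\epsilon t$ with $\epsilon > 0$, dividing by $\epsilon$ and letting $\epsilon \to 0$, we obtain for all $t$
\[ 2 \Re( t \langle x - v, w \rangle ) \ge 0, \]
which implies $\langle x - v, w \rangle = 0$. The rest follows from the previous theorem.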
Remember that a projection is a linear map $P : H \to H$ with the property that $P^2 = P$.
Corollary 10.7. Let $V$ be a closed subspace of a Hilbert space $H$. Then there exists a unique projection $P_V$ such that
\[ \mathrm{rg}(P_V) = V, \qquad \mathrm{rg}(1 - P_V) = V^\perp. \]


Simply define $P_V(x)$ to be the unique element $w \in V$ such that $x - w$ is orthogonal to $V$. It is not difficult to check that this map is indeed linear and satisfies the above properties. Conversely, if the above properties are true then obviously $(1 - P_V)(x)$ is orthogonal to $V$ and therefore $P_V(x)$ has to be equal to $w$. If $V$ is non-zero, then $\|P_V\| = 1$.
Corollary 10.8. If $V$ is a closed subspace in a Hilbert space then
\[ V = (V^\perp)^\perp. \]
Proof. Simply from $P_{(V^\perp)^\perp} = 1 - P_{V^\perp} = 1 - (1 - P_V) = P_V$.
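Theorem 10.9 (Riesz representation theorem). Let $H$ be a Hilbert space. Then the map
\[ \Phi : H \to H^*, \qquad \Phi(v)(w) = \langle v, w \rangle \]
is bijective, isometric and conjugate linear. In other words, for every $\varphi \in H^*$ there is a unique vector $v \in H$ such that $\varphi(\cdot) = \langle v, \cdot \rangle$ and $\|v\| = \|\varphi\|$.
Proof. Obviously the map is conjugate linear, and it is isometric because
\[ \sup_{\|w\|=1} |\Phi(v)(w)| = \sup_{\|w\|=1} |\langle v, w \rangle| = \|v\|, \]
where the supremum is attained for the vector $w = \frac{1}{\|v\|} v$ in case $v \ne 0$. It follows that $\Phi$ is injective. It remains to show surjectivity. So suppose $\varphi \in H^*$ is a non-zero continuous linear functional and assume without loss of generality that $\|\varphi\| = 1$. Since $\varphi$ is continuous the kernel $V$ of $\varphi$ is a closed subspace and therefore there exists a unique orthogonal projection $p_V$ onto $V$. Then $1 - p_V$ is not zero (the kernel is closed and not equal to $H$ since $\varphi \ne 0$) and its range $V^\perp$ has dimension one. To see this, take a non-zero vector $v$ in $V^\perp$ and define $w = (\varphi(v))^{-1} v$. Then, of course, $\varphi(w) = 1$. For any other vector $u \in V^\perp$ we have
\[ \varphi(u - \varphi(u) w) = 0, \]
so $u - \varphi(u) w$ is in $V$ and in $V^\perp$ and therefore vanishes. So $u$ is a multiple of $w$ and therefore $V^\perp$ is one-dimensional and spanned by $w$. This means that a vector $u \in H$ can be uniquely written as
\[ u = \lambda w + u_0 \]
where $u_0 = p_V u \in \ker \varphi$. Since $|\varphi(u)| = |\lambda|$ and $\|u\|^2 = |\lambda|^2 \|w\|^2 + \|u_0\|^2$, the normalisation $\|\varphi\| = 1$ gives $\|w\| = 1$. So
\[ \varphi(u) = \lambda \varphi(w) = \lambda = \langle w, u \rangle, \]
which means that $\varphi = \Phi(w)$.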


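Example 10.10. As we saw before, $L^2(\mathbb{R})$ with scalar product
\[ \langle f, g \rangle = \int_{\mathbb{R}} \overline{f(x)} g(x) \, dx \]
is a Hilbert space. The function $f(x) = 1/(1 + x^2)$ is in $L^2(\mathbb{R})$ and an easy calculation shows that
\[ \|f\|^2 = \pi/2, \]
so that
\[ \sqrt{\frac{2}{\pi}} \, \frac{1}{1 + x^2} \]
is a vector of length one. The corresponding linear map
\[ g \mapsto \int_{\mathbb{R}} \sqrt{\frac{2}{\pi}} \, \frac{1}{1 + x^2} \, g(x) \, dx \]
is linear, has norm one and is represented by this unit vector. The kernel is exactly the orthogonal complement of $f$, which is the set of vectors $g$ such that
\[ \int_{\mathbb{R}} \sqrt{\frac{2}{\pi}} \, \frac{1}{1 + x^2} \, g(x) \, dx = 0. \]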
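Example 10.11. Consider the space $L^2(S^1)$ of functions on the unit circle with scalar product
\[ \langle f, g \rangle = \int_0^{2\pi} \overline{f(\theta)} g(\theta) \, d\theta. \]
Again this space is a Hilbert space. The map $m$ that sends a function to its mean value
\[ m(f) = \frac{1}{2\pi} \int_0^{2\pi} f(x) \, dx \]
is continuous. It is represented by the constant function $\frac{1}{2\pi}$ because of course
\[ m(f) = \frac{1}{2\pi} \int_0^{2\pi} 1 \cdot f(x) \, dx = \langle \tfrac{1}{2\pi}, f \rangle. \]
So $\|m\| = \frac{1}{\sqrt{2\pi}}$ because $\|1\| = \sqrt{2\pi}$.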

For Banach spaces the adjoint of an operator $A : V \to V$ is an operator on the dual space, $A^* : V^* \to V^*$. For Hilbert spaces the map $\Phi : H \to H^*$ identifies the spaces $H$ and $H^*$, and we can therefore understand the adjoint of an operator $A : H \to H$ on a Hilbert space $H$ as an operator on the same Hilbert space: if $A \in B(H)$ is a bounded operator then we define its Hilbert space adjoint $A^* \in B(H)$ by
\[ \langle w, A \, \cdot \rangle = \langle A^* w, \cdot \rangle, \]
so that we have, by definition, for all $v, w \in H$
\[ \langle w, A v \rangle = \langle A^* w, v \rangle. \]
Note that we use the same symbol for the adjoint and the Hilbert space adjoint because it will be clear from the context which adjoint we mean. Also these two definitions are closely related, as they coincide once we identify $H$ with $H^*$ using the map $\Phi$. Indeed, if $\Phi(v)$ is the functional represented by $v$, then its image under the Banach space adjoint is the functional represented by $A^* v$.

10.1 Orthonormal Bases

Definition 10.12. A subset $S \subset H$ in a Hilbert space $H$ is called orthonormal if
• $\|v\| = 1$ for all $v \in S$,
• $\langle v, w \rangle = 0$ for all $w, v \in S$ with $v \ne w$.
If $S$ is a maximal orthonormal set then $S$ is called an orthonormal basis. Sometimes an orthonormal basis is also called a complete orthonormal system.
Theorem 10.13 (Theorem of Pythagoras). Suppose $S$ is an orthonormal set and $\{e_1, \ldots, e_n\} \subset S$. Then
\[ \Bigl\| \sum_{i=1}^{n} \lambda_i e_i \Bigr\|^2 = \sum_{i=1}^{n} |\lambda_i|^2. \]
Proof. Exercise.
Theorem 10.14. An orthonormal set $S$ is an orthonormal basis if and only if $\mathrm{span}\, S$ is dense in $H$.
Proof. Suppose that $\mathrm{span}\, S$ is dense in $H$. Then $(S^\perp)^\perp = H$ and therefore $S^\perp = \{0\}$. So $S$ is maximal because we cannot add another non-zero vector that is orthogonal to $S$. Conversely suppose that $S$ is maximal. Then $S^\perp = \{0\}$, as otherwise we could add a vector in $S^\perp$ with norm one to $S$ and make it larger. Therefore $\overline{\mathrm{span}\, S} = (S^\perp)^\perp = H$.
Example 10.15. For the Hilbert space $L^2(S^1)$ with the inner product of Example 10.11 the set
\[ \Bigl\{ \frac{1}{\sqrt{2\pi}} e^{ik\theta} \;\Big|\; k \in \mathbb{Z} \Bigr\} \]
is orthonormal. Since its span is dense it is an orthonormal basis.


Theorem 10.16 (Gram-Schmidt procedure). Let $\{v_n \mid n \in \mathbb{N}\}$ be a linearly independent subset
of a Hilbert space $H$. Then there exists an orthonormal set $S$ such that
\[ \operatorname{span} S = \operatorname{span}\{v_n \mid n \in \mathbb{N}\}. \]
Proof. Put $e_1 = \frac{v_1}{\|v_1\|}$. Then
\[ w_2 = v_2 - \langle e_1, v_2 \rangle e_1 \]
is orthogonal to $e_1$ as one easily verifies by direct computation. Put $e_2 = \frac{w_2}{\|w_2\|}$. Now proceed
inductively
\[ w_{k+1} = v_{k+1} - \sum_{i=1}^k \langle e_i, v_{k+1} \rangle e_i, \qquad e_{k+1} = \frac{1}{\|w_{k+1}\|} w_{k+1}. \]
By construction $\{e_1, e_2, \ldots\}$ is an orthonormal set with the same span as $\{v_n \mid n \in \mathbb{N}\}$.
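The inductive step of the proof can be tried out in finite dimensions. The following is a minimal sketch in $\mathbb{C}^n$, assuming the convention used in these notes that the inner product is conjugate-linear in the first and linear in the second argument. The function names and the sample vectors are illustrative choices and not part of the lecture.

```python
import math

def inner(v, w):
    """Inner product on C^n: conjugate-linear in v, linear in w."""
    return sum(a.conjugate() * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(inner(v, v).real)

def gram_schmidt(vectors):
    """Orthonormalise a linearly independent list of vectors in C^n."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(e, v)                      # the coefficient <e_i, v_{k+1}>
            w = [wj - c * ej for wj, ej in zip(w, e)]
        n = norm(w)                              # non-zero by linear independence
        basis.append([wj / n for wj in w])
    return basis

# Hypothetical input: three linearly independent vectors in C^3.
vs = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
es = gram_schmidt([[complex(x) for x in v] for v in vs])
```

Evaluating `inner(es[i], es[j])` for all pairs should give the identity matrix up to rounding errors.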
Example 10.17. In the Hilbert space $L^2[-1, 1]$ the set of monomials $x^n$, $n \in \mathbb{N}_0$, is a countable linearly independent set. If one carries out the Gram-Schmidt procedure one obtains
\[ e_n(x) = \sqrt{n + \tfrac{1}{2}} \, P_n(x), \qquad P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n} (x^2 - 1)^n. \]
The polynomials
\[ P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \tfrac{1}{2}\left(3x^2 - 1\right), \quad P_3(x) = \tfrac{1}{2}\left(5x^3 - 3x\right), \ldots \]
are called Legendre polynomials and appear in a lot of other contexts. Again, as we started
with a set with dense span, this set is an orthonormal basis.
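The first few Legendre polynomials can be recovered by carrying out the Gram-Schmidt step on the monomials with exact rational arithmetic. This is only a sketch: polynomials are stored as coefficient lists (lowest degree first), the normalisation step is replaced by the rescaling $P_n(1) = 1$ to avoid square roots, and all function names are illustrative assumptions.

```python
from fractions import Fraction

def poly_inner(p, q):
    """<p, q> = integral of p(x) q(x) over [-1, 1] for real coefficient lists."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:                  # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

def orthogonalise_monomials(count):
    """Gram-Schmidt on 1, x, x^2, ... without the final normalisation."""
    result = []
    for n in range(count):
        v = [Fraction(0)] * n + [Fraction(1)]     # the monomial x^n
        w = list(v)
        for q in result:
            c = poly_inner(q, v) / poly_inner(q, q)
            padded = q + [Fraction(0)] * (len(w) - len(q))
            w = [wi - c * qi for wi, qi in zip(w, padded)]
        result.append(w)
    return result

def rescale_to_legendre(p):
    """Rescale so that the value at x = 1 equals 1."""
    at_one = sum(p)
    return [c / at_one for c in p]

legendre = [rescale_to_legendre(p) for p in orthogonalise_monomials(4)]
```

Here `legendre[2]` and `legendre[3]` are the coefficient lists of $\tfrac12(3x^2-1)$ and $\tfrac12(5x^3-3x)$, and `poly_inner(legendre[n], legendre[n])` equals $2/(2n+1)$, which is where the factor $\sqrt{n + 1/2}$ in $e_n$ comes from.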
Theorem 10.18 (Bessel Inequality). Suppose that $S = \{e_1, e_2, e_3, \ldots\}$ is a countable orthonormal set and let $v \in H$. Then
\[ \sum_{i=1}^\infty |\langle v, e_i \rangle|^2 \le \|v\|^2. \]
Proof. For any $N > 0$ define
\[ v_N = v - \sum_{i=1}^N \langle e_i, v \rangle e_i. \]
Then $v_N$ is orthogonal to $e_1, \ldots, e_N$. By the theorem of Pythagoras
\[ \|v\|^2 = \|v_N\|^2 + \sum_{i=1}^N |\langle v, e_i \rangle|^2 \ge \sum_{i=1}^N |\langle v, e_i \rangle|^2. \]
Since this is true for all $N$ the theorem follows.


Theorem 10.19. Let $S = \{e_1, e_2, e_3, \ldots\}$ be a countable or finite orthonormal set in a Hilbert
space $H$. Then for any $v \in H$ the series
\[ \sum_{i=1}^\infty \langle e_i, v \rangle e_i \]
converges and the operator $P$ defined by
\[ P v = \sum_{i=1}^\infty \langle e_i, v \rangle e_i \]
coincides with the unique orthogonal projection onto the closure of the span of $S$.
Proof. The partial sums form a Cauchy sequence because
\[ \Big\| \sum_{i=M}^N \langle e_i, v \rangle e_i \Big\|^2 = \sum_{i=M}^N |\langle e_i, v \rangle|^2 \]
and the partial sums of the latter are Cauchy because of the Bessel inequality. So this indeed defines a linear
operator $P$. It remains to check that $P$ is the orthogonal projection onto $(S^\perp)^\perp$. First note
that $P v$ is indeed in $(S^\perp)^\perp$ because of course if $w \in S^\perp$ then
\[ \langle P v, w \rangle = \sum_{i=1}^\infty \langle v, e_i \rangle \langle e_i, w \rangle = 0. \]
On the other hand $v - P v$ is in $S^\perp$ as
\[ \langle P v, e_k \rangle = \sum_{i=1}^\infty \langle v, e_i \rangle \langle e_i, e_k \rangle = \langle v, e_k \rangle \]
and consequently
\[ \langle v - P v, e_k \rangle = 0. \]
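A small numerical illustration of the projection formula and of the Bessel inequality, using a finite orthonormal set in $\mathbb{R}^3$. The vectors `e1`, `e2` and `v` are arbitrary sample data chosen for this sketch.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

s = 1 / math.sqrt(2)
e1 = [s, s, 0.0]          # orthonormal set S = {e1, e2}
e2 = [0.0, 0.0, 1.0]
v = [3.0, 1.0, 2.0]

# Coefficients <e_i, v> and the projection P v = sum_i <e_i, v> e_i.
coeffs = [dot(e, v) for e in (e1, e2)]
Pv = [sum(c * e[j] for c, e in zip(coeffs, (e1, e2))) for j in range(3)]

# v - P v should be orthogonal to every element of S.
residual = [a - b for a, b in zip(v, Pv)]

# Left hand side of the Bessel inequality.
bessel_sum = sum(c * c for c in coeffs)
```

In this example `Pv` is approximately $(2, 2, 2)$, the residual is approximately $(1, -1, 0)$, and `bessel_sum` is approximately $12$, which is smaller than $\|v\|^2 = 14$ by exactly the squared norm of the residual.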


Theorem 10.20. In any Hilbert space $H$ there exists an orthonormal basis. If $H$ is separable
and infinite dimensional then this orthonormal basis is countable.
Proof. The existence is an immediate consequence of Zorn's lemma. If $H$ is separable choose a countable linearly independent set with dense span. Then simply apply the Gram-Schmidt procedure to obtain
an orthonormal basis.
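Corollary 10.21. Suppose that $H$ is a separable Hilbert space and let $S = \{e_1, e_2, \ldots\}$ be an
orthonormal basis. Then
• for all $v \in H$: $v = \sum_{i=1}^\infty \langle e_i, v \rangle e_i$,
• for all $v \in H$: $\|v\|^2 = \sum_{i=1}^\infty |\langle e_i, v \rangle|^2$ (Parseval equality),
• for all $v, w \in H$: $\langle v, w \rangle = \sum_{i=1}^\infty \langle v, e_i \rangle \langle e_i, w \rangle$,
• $S^\perp = \{0\}$.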
Remember that we call an operator $A \in B(H)$ self-adjoint if
\[ A^* = A. \]
If $U : H_1 \to H_2$ is an isometry, that is
\[ \|U v\| = \|v\|, \]
then it also preserves scalar products since the scalar product can be recovered from the norm.
This means
\[ \langle U v, U w \rangle = \langle v, U^* U w \rangle = \langle v, w \rangle. \]
This implies that $U^* U = 1$. Moreover, $P = U U^*$ is then a projection,
\[ (U U^*)^2 = U U^* U U^* = U U^*, \]
and since
\[ P^* = (U U^*)^* = U U^* = P \]
it is orthogonal. This means that $\ker P$ is orthogonal to $\operatorname{rg} P$. Indeed, if $v \in \ker P$ and $w \in \operatorname{rg} P$
then
\[ \langle v, w \rangle = \langle (1 - P) v, P w \rangle = \langle P^* (1 - P) v, w \rangle, \]
but since $P^* = P$ we have $P^*(1 - P) = P - P^2 = P - P = 0$. A map which is isometric and
onto is called unitary. For a map to be unitary is equivalent to
\[ U^* U = 1, \qquad U U^* = 1. \]
We say that two Hilbert spaces are isomorphic if there is a unitary map between them. The
above corollary shows that any separable infinite dimensional Hilbert space is isomorphic to $\ell^2$.
Theorem 10.22 (Riesz-Fischer). $L^2([0, 1])$ is isomorphic (unitarily equivalent) to $\ell^2$.

10.2 Fourier transforms of periodic functions

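As we already saw
\[ \Big\{ e_k = \frac{1}{\sqrt{2\pi}} e^{ik\theta} \;\Big|\; k \in \mathbb{Z} \Big\} \]
is an orthonormal basis in the space $L^2(S^1)$. Therefore, we can identify $L^2(S^1)$ with $\ell^2$ via the
map
\[ f \mapsto \hat f, \qquad \hat f_k = \langle e_k, f \rangle = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} f(\theta) e^{-ik\theta} \, d\theta. \]
The inverse is then of course given by
\[ f(\theta) = \frac{1}{\sqrt{2\pi}} \sum_{k \in \mathbb{Z}} \hat f_k e^{ik\theta}. \]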
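The coefficients $\hat f_k = \langle e_k, f \rangle$ can be approximated numerically. The sketch below replaces the integral by an equidistant Riemann sum, which is an assumption of this example (it is accurate for trigonometric polynomials of low degree but is not a general-purpose method), and uses $f = \cos$ as a sample function.

```python
import cmath
import math

def fourier_coefficient(f, k, samples=256):
    """Approximate (2 pi)^(-1/2) * integral_0^{2 pi} f(t) exp(-i k t) dt by a Riemann sum."""
    h = 2 * math.pi / samples
    total = sum(f(j * h) * cmath.exp(-1j * k * j * h) for j in range(samples))
    return total * h / math.sqrt(2 * math.pi)

# Sample function f(theta) = cos(theta): only the coefficients k = 1 and k = -1 are non-zero.
coeffs = {k: fourier_coefficient(math.cos, k) for k in range(-3, 4)}

# Parseval: the sum of |f_k|^2 should reproduce ||f||^2 = integral of cos^2 = pi.
parseval_sum = sum(abs(c) ** 2 for c in coeffs.values())
```

For this sample function the two non-zero coefficients are both close to $\sqrt{\pi/2}$, so that `parseval_sum` is close to $\pi$.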
Remark 10.23. Note that this converges only in $L^2$ and does not need to converge pointwise.
This map is of course the Fourier transform of a $2\pi$-periodic function. Let us define the space
of Schwartz functions $S(\mathbb{Z})$ on $\mathbb{Z}$ as the space of sequences $x_k$ such that for every $N > 0$
\[ \sup_k |x_k| (1 + |k|)^N < \infty. \]
Theorem 10.24. The space $S(\mathbb{Z})$ is the image of $C^\infty(S^1)$ under the Fourier transform. This
means a function $f \in L^2(S^1)$ is smooth if and only if
\[ \hat f \in S(\mathbb{Z}). \]


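Proof. First assume $a \in S(\mathbb{Z})$. Then in particular $k^N a_k$ is in $\ell^1$ (as well as in $\ell^2$) for all $N \ge 0$. Hence,
\[ f(\theta) = \frac{1}{\sqrt{2\pi}} \sum_{k \in \mathbb{Z}} a_k e^{ik\theta} \]
converges uniformly and so does
\[ \Big( \frac{d}{d\theta} \Big)^N f(\theta) = \frac{1}{\sqrt{2\pi}} \sum_{k \in \mathbb{Z}} (ik)^N a_k e^{ik\theta}. \]
As a consequence limit and differentiation can be exchanged and $f$ is a smooth function. Conversely, assume $f$ is smooth. Then
\[ f(\theta) = \frac{1}{\sqrt{2\pi}} \sum_{k \in \mathbb{Z}} \hat f_k e^{ik\theta} \]
and, since the derivatives of $f$ are again in $L^2(S^1)$,
\[ \Big( \frac{d}{d\theta} \Big)^N f(\theta) = \frac{1}{\sqrt{2\pi}} \sum_{k \in \mathbb{Z}} (ik)^N \hat f_k e^{ik\theta}. \]
This implies that $k^N \hat f_k$ is in $\ell^2$ for any $N$. Since sequences in $\ell^2$ are bounded, $\hat f \in S(\mathbb{Z})$.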

11 Compact operators

Definition 11.1. Let $V$ and $W$ be Banach spaces and let $T \in B(V, W)$ be a bounded linear operator.
Then $T$ is called compact if the image of the unit ball in $V$ under $T$ has compact closure,
that is if $\overline{T B_1(V)}$ is compact.
First note that $T B_1$ is bounded, namely it is a subset of $B_{\|T\|}$. So if $W$ is finite dimensional
then any bounded operator is compact. For the same reason $T$ is automatically compact if
the range of $T$ is finite dimensional. However, in infinite dimensional Banach spaces bounded
and closed sets are not necessarily compact. So the identity operator on $V$ is compact if and
only if $V$ is finite dimensional. The following proposition is completely evident if we remember
that compactness and sequential compactness are equivalent in Banach spaces.
Proposition 11.2. An operator $T \in B(V, W)$ is compact if and only if for any bounded
sequence $v_n$ in $V$ there is a subsequence $v_{n(k)}$ such that $T v_{n(k)}$ converges.
Theorem 11.3. The set $K(V, W)$ of compact operators in $B(V, W)$ is a closed subspace.


Proof. Let us prove first that $K(V, W)$ is a subspace. The operator $0$ is compact since a point
$0 \in W$ is compact. If $T$ is compact then also $\lambda T$ is compact as one can easily see by inspecting
the above proposition. Suppose $T, S \in K(V, W)$ and let $v_n$ be a bounded sequence. Then $T v_n$
has a convergent subsequence. Call this subsequence $T \tilde v_n$. Then $S \tilde v_n$ also has a convergent
subsequence, $S v_n'$. Therefore, $v_n'$ is a subsequence such that
\[ (T + S) v_n' = T v_n' + S v_n' \]
converges. Thus $T + S$ is compact as well and therefore $K(V, W)$ is a linear subspace. The
more tricky part is to show that $K(V, W)$ is closed. That is we need to prove that the limit
of a sequence of compact operators is again compact. So let $T_n$ be a sequence of compact
operators that converges in the operator norm to an operator $T$. Let $v_n$ be a bounded sequence
in $V$. Then choose a subsequence $v_n^1$ such that $T_1 v_n^1$ converges. Then choose a subsequence
$v_n^2$ of $v_n^1$ such that $T_2 v_n^2$ converges. Continue in this way to obtain a sequence of subsequences
$v_n^k$. It will have the property that $T_l v_n^k$ converges for all $l \le k$. The diagonal sequence $\tilde v_n = v_n^n$
will then be a subsequence of $v_n$ with the property that $T_m \tilde v_n$ converges for all $m$. It remains
to show that $T \tilde v_n$ converges as well. For this it is enough to check that $T \tilde v_n$ is a Cauchy
sequence:
\[ \|T \tilde v_n - T \tilde v_m\| \le C \|T_k - T\| + \|T_k \tilde v_n - T_k \tilde v_m\|. \]
Since this is true for any $k$ we can make the right hand side smaller than any $\epsilon > 0$ for $n, m > N$
by choosing first $k$ and then $N$ large enough. The constant $C$ appears here because the sequence $\tilde v_n$ is
bounded and therefore $\|\tilde v_n - \tilde v_m\| \le C$ for some $C > 0$.
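The Cauchy estimate at the end of the proof can be spelled out as follows. This is a sketch of the intermediate steps, writing $\tilde v_n$ for the diagonal subsequence and $\epsilon > 0$ for the target accuracy; the splitting into two halves of $\epsilon$ is one possible choice.

```latex
\begin{align*}
\|T \tilde v_n - T \tilde v_m\|
  &= \|(T - T_k)(\tilde v_n - \tilde v_m) + T_k(\tilde v_n - \tilde v_m)\| \\
  &\le \|T - T_k\| \, \|\tilde v_n - \tilde v_m\| + \|T_k \tilde v_n - T_k \tilde v_m\| \\
  &\le C \|T - T_k\| + \|T_k \tilde v_n - T_k \tilde v_m\|.
\end{align*}
% Step 1: fix k so large that C \|T - T_k\| < \epsilon / 2.
% Step 2: for this fixed k the sequence T_k \tilde v_n converges, so there is N with
%         \|T_k \tilde v_n - T_k \tilde v_m\| < \epsilon / 2 for all n, m > N.
% Together: \|T \tilde v_n - T \tilde v_m\| < \epsilon for all n, m > N.
```

The order matters: $k$ is chosen first and depends only on $\epsilon$, and $N$ is chosen afterwards and depends on both $\epsilon$ and $k$.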
Remark 11.4. It is in general no longer true for weak and strong limits of compact
operators that they are compact.
Theorem 11.5. Suppose $V_1, V_2, V_3, V_4$ are Banach spaces, $A \in K(V_2, V_3)$ is compact and $B \in
B(V_3, V_4)$ and $C \in B(V_1, V_2)$ are bounded. Then also $AC$ and $BA$ are compact.
Proof. That $AC$ is compact follows from the fact that the image of a bounded set under $C$
is again bounded. Since the image of a compact set under a continuous map is compact also
$BA$ is compact.
This means that the product of a compact operator with a bounded operator is again compact.
In particular the compact operators $K(V)$ in a Banach space $V$ form a two-sided ideal in $B(V)$.
Example 11.6. In the Banach space $\ell^p(\mathbb{Z})$ with $p \ge 1$ consider the operator of multiplication
$T$ by $\frac{1}{k^2 + 1}$,
\[ (T x)_k = \frac{1}{k^2 + 1} x_k. \]
Then as we have already seen $T$ is bounded and has norm 1. But $T$ is also compact for the
following reason. Define $T_n$ by
\[ (T_n x)_k = \begin{cases} \frac{1}{k^2 + 1} x_k & |k| < n, \\ 0 & |k| \ge n. \end{cases} \]
Then $T_n$ has finite rank, namely $2n - 1$, and is therefore compact. Moreover,
\[ \|T - T_n\| = \frac{1}{n^2 + 1} \]
and thus $T$ is in the norm a limit of compact operators and is therefore compact.
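The norm of a multiplication operator on $\ell^p$ is the supremum of the absolute values of its multipliers, so $\|T - T_n\|$ is the supremum of $\frac{1}{k^2+1}$ over $|k| \ge n$. The sketch below evaluates this on a finite window $n \le |k| \le$ `k_max`; the cut-off `k_max` is an assumption of the example and is harmless here because the multipliers decrease in $|k|$.

```python
def multiplier(k):
    """The k-th diagonal entry of T."""
    return 1.0 / (k * k + 1)

def tail_norm(n, k_max=10_000):
    """Supremum of the multipliers over n <= |k| <= k_max, i.e. the norm of T - T_n on this window."""
    return max(multiplier(k) for k in range(n, k_max + 1))

# Norms of T - T_n for a few values of n; they agree with 1 / (n^2 + 1).
norms = {n: tail_norm(n) for n in (1, 2, 5, 10)}
```

The values in `norms` decrease to zero, which is the norm convergence $T_n \to T$ used in the example.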
Example 11.7. In the same way as in the above example one shows that the operator of
multiplication in $\ell^p$ by a function $f(k)$ is compact as soon as $\lim_{k \to \pm\infty} |f(k)| = 0$.
Example 11.8. Let $F \in L^2(\mathbb{R} \times \mathbb{R})$ be a square integrable function on $\mathbb{R}^2$. Define the operator
$T : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ by
\[ (T f)(x) = \int_{\mathbb{R}} F(x, y) f(y) \, dy. \]
Then $T$ is bounded and $\|T\| \le \|F\|_{L^2}$ because, by the Cauchy-Schwarz inequality,
\[ \|T f\|^2 = \int \Big| \int F(x, y) f(y) \, dy \Big|^2 dx \le \int \Big( \int |F(x, y)|^2 \, dy \Big) \|f\|^2 \, dx = \int\!\!\int |F(x, y)|^2 \, dx \, dy \, \|f\|^2 = \|F\|_{L^2}^2 \|f\|^2. \]
Let $A_i$ and $B_i$ be a finite collection of sets of finite measure in $\mathbb{R}$. Then
\[ F(x, y) = \sum_i a_i \chi_{A_i}(x) \chi_{B_i}(y) \qquad (5) \]
is square integrable. Namely, $\chi_{A_i}(x) \chi_{B_i}(y)$ is the characteristic function of the rectangle $A_i \times B_i$.
The corresponding operator $T$ has finite rank in this case because the image consists of
functions that are in the span of the $\chi_{A_i}$. Therefore, if $F(x, y)$ is of the form (5) the operator $T$
is compact. On the other hand every square integrable function is in $L^2$ the limit of such step
functions. Thus, for any $F \in L^2(\mathbb{R} \times \mathbb{R})$ the operator $T$ is compact as it is a limit of finite
rank operators.
Example 11.9. The above example works for other measure spaces too. For example any
function $F \in L^2([0, 1]^2)$ induces a compact operator from $L^2([0, 1])$ to $L^2([0, 1])$.

Example 11.10. If $F \in C([0, 1]^2)$ then the integral operator $T : C([0, 1]) \to C([0, 1])$,
\[ (T f)(x) = \int_0^1 F(x, y) f(y) \, dy, \]
is compact. This can be seen again by approximating $F$ uniformly by polynomials.


Example 11.11. In $\ell^2(\mathbb{Z})$ define the projection $p_n$ as the orthogonal projection onto the span of the
$2n + 1$ standard basis vectors $e_k$ with $|k| \le n$. This means
\[ (p_n x)_k = \begin{cases} x_k & |k| \le n, \\ 0 & |k| > n. \end{cases} \]
Then in the strong topology of operators we have
\[ p_n \to \mathrm{id}. \]
Note that this is only a strong limit and not a limit in the norm. Even though the operators $p_n$ are
all compact the limit $\mathrm{id}$ is not compact.
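The difference between strong and norm convergence in this example can be seen numerically. In the sketch below sequences are stored as dictionaries `k -> x_k` with finite support, which is an assumption made so that they fit in memory; the sample vector $x_k = 1/(1+|k|)$ is cut off at $|k| \le 2000$.

```python
import math

def tail_norm(x, n):
    """||x - p_n x|| for a finitely supported sequence x given as a dict k -> x_k."""
    return math.sqrt(sum(abs(val) ** 2 for k, val in x.items() if abs(k) > n))

# For a fixed vector the tails go to zero: this is strong convergence p_n x -> x.
x = {k: 1.0 / (1 + abs(k)) for k in range(-2000, 2001)}
strong = [tail_norm(x, n) for n in (1, 10, 100, 1000)]

# The unit vector e_{n+1} is annihilated by p_n, so ||(id - p_n) e_{n+1}|| = 1 for every n.
# Hence ||id - p_n|| >= 1 and there is no convergence in the operator norm.
uniform = [tail_norm({n + 1: 1.0}, n) for n in (1, 10, 100, 1000)]
```

The list `strong` decreases towards zero while every entry of `uniform` equals 1.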

12 Compact operators in Hilbert spaces and spectral theory

In this section we have a closer look at compact operators in Hilbert spaces. Some of the
results in this section are also true more generally for Banach spaces. If you would like to know
more on what is valid in Banach spaces and what is valid only in Hilbert spaces please consult
the literature.
Proposition 12.1. Suppose that $T$ is a compact operator in a Hilbert space $H$. Then any
eigenspace for a non-zero eigenvalue $\lambda \neq 0$ is finite dimensional. This means, if $\lambda \neq 0$ then
\[ \dim \ker(T - \lambda \, \mathrm{id}) < \infty. \]
Proof. Let $P$ be the orthogonal projection onto the eigenspace with eigenvalue $\lambda$. Then,
\[ P T P = \lambda P. \]
Since $T$ is compact, so is $P T P$ and therefore also $\lambda P$ is compact. So if $\lambda \neq 0$ then also $P$
is compact. As $P$ is an orthogonal projection it acts as the identity on its range, so the range of $P$, which is equal to the eigenspace, is
finite dimensional.
Proposition 12.2. If $T$ is a self-adjoint operator in a Hilbert space $H$ then all eigenvalues of
$T$ are real.
Proof. Let $v \in H$ be an eigenvector with eigenvalue $\lambda \in \mathbb{C}$. Then
\[ \lambda \|v\|^2 = \langle v, T v \rangle = \langle T v, v \rangle = \bar\lambda \|v\|^2, \]
and thus $\lambda = \bar\lambda$.
Proposition 12.3. Let $T$ be a self-adjoint operator in a Hilbert space $H$. Then eigenvectors
belonging to different eigenspaces are orthogonal.
Proof. Let $v_1$ and $v_2$ be eigenvectors with eigenvalues $\lambda_1$ and $\lambda_2$ with $\lambda_1 \neq \lambda_2$. Then
\[ \langle v_1, v_2 \rangle = \frac{1}{\lambda_1 - \lambda_2} \big( \langle T v_1, v_2 \rangle - \langle v_1, T v_2 \rangle \big) = 0. \]

Proposition 12.4. Let $T$ be a compact self-adjoint operator in a Hilbert space. Then the only
possible accumulation point for the set of eigenvalues of $T$ is $0$, i.e. there are only finitely
many eigenvalues with modulus greater than $\epsilon$ for any $\epsilon > 0$.
Proof. It is enough to show the following statement. For any $\epsilon > 0$ there are only finitely
many eigenvalues with $|\lambda| > \epsilon$. Suppose by contradiction that there are infinitely many eigenvalues with $|\lambda_i| > \epsilon$. Choose a countably infinite set $\lambda_k$ consisting of such eigenvalues and choose
corresponding eigenvectors $v_k$ of norm 1. Let $V$ be the closure of the span of $\{v_1, v_2, \ldots\}$.
Since all $v_k$ are orthogonal and linearly independent (they are eigenvectors of $T$ for different
eigenvalues) $V$ is infinite dimensional and $v_k$ is an orthonormal basis in $V$. Now note that
$v_k = T \lambda_k^{-1} v_k$ and thus all $v_k$ are in the image under $T$ of the ball $B_{1/\epsilon}$. By compactness of
$T$ the sequence $v_k$ has a convergent subsequence. That is a contradiction as an orthonormal sequence can not
have a convergent subsequence because for $n \neq m$
\[ \|v_n - v_m\|^2 = 2. \]

Proposition 12.5. Let $T$ be a compact self-adjoint operator in a Hilbert space $H \neq \{0\}$. Then
either $\|T\|$ or $-\|T\|$ is an eigenvalue for $T$.
Proof. We can assume that $\|T\| > 0$ since otherwise $T = 0$ and then any vector is an
eigenvector with eigenvalue $0$. By multiplying $T$ by $\|T\|^{-1}$ we can assume that $\|T\| = 1$. Then
there exists a sequence $v_n \in H$ such that $\|v_n\| = 1$ and
\[ \|T v_n\| \to 1. \]
Then we also know that $\|T^2\| \le 1$ and thus
\[ 1 \ge \|T^2 v_n\| = \|T^2 v_n\| \|v_n\| \ge |\langle T^2 v_n, v_n \rangle| = \|T v_n\|^2 \to 1. \]
So also $\|T^2 v_n\| \to 1$. By compactness of $T$ the sequence $T v_n$ has a convergent subsequence
$w_n$ which converges to $w$. The above means $\|w\| = 1$ and $\|T w\| = 1$. Then
\[ |\langle T^2 w, w \rangle| \le \|T^2 w\| \le \|T^2\| \le 1, \qquad |\langle T^2 w, w \rangle| = \langle T w, T w \rangle = 1, \]
so we have equality in the Cauchy-Schwarz inequality and thus $T^2 w$ and $w$ are linearly dependent. This means there is a constant $\lambda$ such that
\[ T^2 w = \lambda w. \]
Since $\langle T^2 w, w \rangle = 1$ we have $\lambda = 1$ and thus $T^2 w = w$. Since either $T w + w$ or $T w - w$ is
non-zero we have found an eigenvector with eigenvalue plus or minus one as
\[ T(T w + w) = T w + w, \qquad T(T w - w) = -(T w - w). \]

Theorem 12.6 (Spectral Theorem for self-adjoint operators). Let $T$ be a compact self-adjoint
operator in a Hilbert space $H$. Then there exists an orthonormal basis consisting of eigenvectors. If $H$ is separable then this basis is countable, $\{e_1, e_2, e_3, \ldots\}$, and $T$ can be expressed
as
\[ T = \sum_{i=1}^\infty \lambda_i \langle e_i, \cdot \rangle e_i \]
and the sum converges in the operator norm. Here $\lambda_i$ is the eigenvalue of the eigenvector $e_i$.
Proof. Let $V$ be the closure of the span of all eigenvectors. Since the eigenspaces are orthogonal we can
construct an orthonormal basis in $V$ that consists of eigenvectors by choosing an orthonormal
basis in each eigenspace and taking the union of all these bases. It remains to show that $V$ is
the whole space. For this we need to show that $V^\perp = \{0\}$. Since $T$ maps eigenvectors into
eigenvectors it leaves the space $V$ invariant. Since it is self-adjoint it also leaves $V^\perp$ invariant.
Indeed if $v$ is orthogonal to $V$ then
\[ \langle T v, w \rangle = \langle v, T w \rangle = 0 \]
for all $w \in V$ and therefore also $T v$ is orthogonal to $V$. We restrict the operator to $V^\perp$ and
regard it as an operator $S$ from $V^\perp$ to $V^\perp$. Then $S$ is compact and self-adjoint. Then, by Proposition 12.5, either $V^\perp = \{0\}$ or $S$
has an eigenvector $u \in V^\perp$. Since $u$ is also an eigenvector for $T$ we have $u \in V$ and $u \in V^\perp$
and thus $u = 0$ in contradiction to $u$ being an eigenvector. Thus $V^\perp = \{0\}$ and we have
proved that there exists an orthonormal basis consisting of eigenvectors.
If $H$ is separable this basis must be countable and on the finite linear span of the basis vectors
$T$ is of course given by
\[ T = \sum_{i=1}^\infty \lambda_i \langle e_i, \cdot \rangle e_i. \]
Since $0$ is the only accumulation point of the sequence $\lambda_i$ this means that $\lambda_i \to 0$ and thus
the series converges in the operator norm. Thus the limit is a bounded operator and since
it coincides with $T$ on a dense subset it has to coincide with $T$.
The sum
\[ T = \sum_{i=1}^\infty \lambda_i \langle e_i, \cdot \rangle e_i \]
writes $T$ as a linear combination of projections onto the eigenspaces and is called the spectral
decomposition of $T$.
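In finite dimensions every operator is compact, so the spectral decomposition can be checked by hand on a symmetric matrix. The following sketch uses the sample matrix with rows $(2, 1)$ and $(1, 2)$, whose eigenvalues are $3$ and $1$ with eigenvectors $(1, 1)/\sqrt{2}$ and $(1, -1)/\sqrt{2}$; the matrix and the test vector are illustrative choices.

```python
import math

T = [[2.0, 1.0], [1.0, 2.0]]
s = 1 / math.sqrt(2)
# Pairs (lambda_i, e_i) with e_i an orthonormal basis of eigenvectors of T.
eigenpairs = [(3.0, [s, s]), (1.0, [s, -s])]

def apply_matrix(matrix, v):
    """Ordinary matrix-vector product T v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]

def spectral_apply(pairs, v):
    """Evaluate sum_i lambda_i <e_i, v> e_i for real vectors."""
    out = [0.0] * len(v)
    for lam, e in pairs:
        c = sum(a * b for a, b in zip(e, v))     # the coefficient <e_i, v>
        out = [o + lam * c * ej for o, ej in zip(out, e)]
    return out

v = [1.0, -2.0]
direct = apply_matrix(T, v)
via_spectrum = spectral_apply(eigenpairs, v)
```

Both `direct` and `via_spectrum` give the vector $(0, -3)$ up to rounding errors.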
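Example 12.7. The solution operator
\[ T = \Big( -i \frac{d}{d\theta} + \frac{1}{2} \Big)^{-1} \]
in $L^2(S^1)$ to the problem
\[ \Big( -i \frac{d}{d\theta} + \frac{1}{2} \Big) f = g \]
was shown in the tutorials to be compact and self-adjoint. In fact the Fourier transform
provides us with the spectral decomposition of $T$. Namely, $e_k = \frac{1}{\sqrt{2\pi}} e^{ik\theta}$ is an orthonormal
basis consisting of eigenvectors of $T$ with eigenvalues
\[ \frac{1}{k + \frac{1}{2}} \]
and the spectral decomposition of $T$ is
\[ T = \sum_{k \in \mathbb{Z}} \frac{1}{k + \frac{1}{2}} \langle e_k, \cdot \rangle e_k. \]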

Example 12.8. Let $K_t$ be the solution operator to the heat equation, that sends the function $\phi$
to the solution $\psi(\cdot, t)$ of the heat equation
\[ \frac{\partial \psi}{\partial t}(\theta, t) = \frac{\partial^2 \psi}{\partial \theta^2}(\theta, t) \]
at time $t > 0$ with initial condition
\[ \psi(\theta, 0) = \phi(\theta). \]
Then one easily checks that
\[ \psi(t) = \sum_{k \in \mathbb{Z}} e^{-k^2 t} \langle e_k, \phi \rangle e_k \]
is the solution. Of course $K_t$ is a compact operator for each fixed $t > 0$ since $e^{-k^2 t}$
goes to zero as $k$ goes to plus or minus infinity. The above formula is the spectral resolution
of this operator.
Suppose that $k \in C([0, 1]^2)$ is continuous and satisfies
\[ k(x, y) = \overline{k(y, x)}. \]
Then the integral operator $K : L^2([0, 1]) \to L^2([0, 1])$ with kernel $k$ is self-adjoint because
\[ \langle f, K g \rangle = \int_0^1 \!\! \int_0^1 \overline{f(x)} \, k(x, y) g(y) \, dx \, dy = \int_0^1 \!\! \int_0^1 \overline{k(y, x) f(x)} \, g(y) \, dx \, dy = \langle K f, g \rangle. \]
By the spectral theorem there exists an orthonormal basis $\{e_i\}$ consisting of eigenvectors. The
eigenvectors with non-zero eigenvalue are continuous functions as they satisfy
\[ e_i(x) = \lambda_i^{-1} \int_0^1 k(x, y) e_i(y) \, dy \]
and the right hand side is obviously continuous. So in the sense of operators we can approximate the operator $K$ by finite rank operators with kernels
\[ k_n(x, y) = \sum_{i=1}^n \lambda_i e_i(x) \overline{e_i(y)}. \]

12.1 Trace class operators

Suppose that $T$ is a self-adjoint compact operator with eigenvalues $\lambda_n$ and suppose in addition that
\[ \sum_n |\lambda_n| < \infty. \]
Such an operator is called a trace-class operator and the number
\[ \operatorname{tr} T = \sum_n \lambda_n \]
is called its trace. More generally if $T$ is not necessarily self-adjoint then $T$ is called a trace-class
operator if
\[ \sum_k \sqrt{\mu_k} < \infty \]
where $\mu_k$ are the eigenvalues of $T^* T$.


Theorem 12.9. Let $T$ be a self-adjoint trace class operator in a separable Hilbert space and
let $\{e_k\}$ be an orthonormal basis (not necessarily consisting of eigenvectors!). Then
\[ \sum_k \langle e_k, T e_k \rangle \]
converges absolutely and equals $\operatorname{tr} T$. If $T$ is a positive compact self-adjoint operator and the
above sum converges, then $T$ is trace class.
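Proof. By the spectral theorem we have
\[ \langle e_k, T e_k \rangle = \sum_m \lambda_m \langle v_m, e_k \rangle \langle e_k, v_m \rangle \]
where $\{v_m\}$ is an orthonormal basis consisting of eigenvectors. Thus
\[ \sum_k \langle e_k, T e_k \rangle = \sum_{m, k} \lambda_m \langle v_m, e_k \rangle \langle e_k, v_m \rangle. \]
This converges absolutely if and only if $\sum_m \lambda_m$ converges absolutely as
\[ \sum_k \langle v_m, e_k \rangle \langle e_k, v_m \rangle = \|v_m\|^2 = 1 \]
and we get
\[ \sum_k \langle e_k, T e_k \rangle = \sum_m \lambda_m = \operatorname{tr} T. \]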
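The independence of the sum $\sum_k \langle e_k, T e_k \rangle$ from the choice of orthonormal basis can be illustrated in $\mathbb{R}^2$. The sketch below uses a sample symmetric matrix with eigenvalues $3$ and $1$ and evaluates the sum in several rotated orthonormal bases; the matrix and the rotation angles are arbitrary choices made for this example.

```python
import math

T = [[2.0, 1.0], [1.0, 2.0]]      # eigenvalues 3 and 1, so tr T = 4

def quad(matrix, e):
    """<e, T e> for a real vector e."""
    Te = [sum(row[j] * e[j] for j in range(len(e))) for row in matrix]
    return sum(a * b for a, b in zip(e, Te))

def rotated_basis(angle):
    """An orthonormal basis of R^2 obtained by rotating the standard basis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [-s, c]]

# The sum over an orthonormal basis, for three different bases.
traces = [sum(quad(T, e) for e in rotated_basis(a)) for a in (0.0, 0.3, 1.0)]
```

All entries of `traces` agree with the sum of the eigenvalues, $3 + 1 = 4$, up to rounding errors.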

Theorem 12.10. Suppose that $k \in C([0, 1]^2)$ satisfies
\[ k(x, y) = \overline{k(y, x)} \]
and suppose furthermore that
\[ \frac{\partial}{\partial x} k(x, y) \in L^2([0, 1]^2). \]
Then the integral operator $K : L^2([0, 1]) \to L^2([0, 1])$ is trace-class and
\[ \operatorname{tr} K = \int_0^1 k(x, x) \, dx. \]

Proof. One shows that this operator is a product of two Hilbert-Schmidt operators.
Example 12.11. The operator defined by the kernel k(x, y) =

x + y is trace class.

The second half of the proof does in fact not use that $k(x, y)$ is differentiable, but uses only
continuity. Thus we also have
Theorem 12.12. Suppose that $k \in C([0, 1]^2)$ satisfies
\[ k(x, y) = \overline{k(y, x)}. \]
Suppose furthermore that the integral operator $K : L^2([0, 1]) \to L^2([0, 1])$ is trace class. Then
\[ \operatorname{tr} K = \int_0^1 k(x, x) \, dx. \]
