
CAAM 335: MATRIX ANALYSIS

Course Notes: Spring 2011


Preface
Bellman has called matrix theory the arithmetic of higher mathematics. Under the influence of Bellman and Kalman, engineers and scientists have found in matrix theory a language for representing and analyzing multivariable systems. Our goal in these notes is to demonstrate the role of matrices in the modeling of physical systems and the power of matrix theory in the analysis and synthesis of such systems.
Beginning with modeling of structures in static equilibrium we focus on the linear nature of the relationship between relevant state variables and express these relationships as simple matrix-vector products. For example, the voltage drops across the resistors in a network are linear combinations of the potentials at each end of each resistor. Similarly, the current through each resistor is assumed to be a linear function of the voltage drop across it. And, finally, at equilibrium, a linear combination (in minus out) of the currents must vanish at every node in the network. In short, the vector of currents is a linear transformation of the vector of voltage drops which is itself a linear transformation of the vector of potentials. A linear transformation of n numbers into m numbers is accomplished by multiplying the vector of n numbers by an m-by-n matrix. Once we have learned to spot the ubiquitous matrix-vector product we move on to the analysis of the resulting linear systems of equations. We accomplish this by stretching your knowledge of three-dimensional space. That is, we ask what does it mean that the m-by-n matrix X transforms R^n (real n-dimensional space) into R^m? We shall visualize this transformation by splitting both R^n and R^m each into two smaller spaces between which the given X behaves in very manageable ways. An understanding of this splitting of the ambient spaces into the so called four fundamental subspaces of X permits one to answer virtually every question that may arise in the study of structures in static equilibrium.
In the second half of the notes we argue that matrix methods are equally effective in the modeling and analysis of dynamical systems. Although our modeling methodology adapts easily to dynamical problems we shall see, with respect to analysis, that rather than splitting the ambient spaces we shall be better served by splitting X itself. The process is analogous to decomposing a complicated signal into a sum of simple harmonics oscillating at the natural frequencies of the structure under investigation. For we shall see that (most) matrices may be written as weighted sums of matrices of very special type. The weights are the eigenvalues, or natural frequencies, of the matrix while the component matrices are projections composed from simple products of eigenvectors. Our approach to the eigendecomposition of matrices requires a brief exposure to the beautiful field of Complex Variables. This foray has the added benefit of permitting us a more careful study of the Laplace Transform, another fundamental tool in the study of dynamical systems.
Steve Cox
Contents
1. Matrix Methods for Electrical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1. Nerve Fibers and the Strang Quartet
2. Example 1
3. Example 2
4. Exercises
2. Matrix Methods for Mechanical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1. Introduction
2. A Fiber Chain
3. A Small Planar Truss
4. The General Planar Truss
5. Exercises
3. The Column and Null Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1. Introduction
2. The Column Space
3. The Null Space
4. A Blend of Theory and Example
5. Two More Examples
6. Concluding Remarks
7. Exercises
4. The Fundamental Theorem of Linear Algebra . . . . . . . . . . . . . . . . . . . . . . 45
1. Introduction
2. The Row Space
3. The Left Null Space
4. Exercises
5. Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
1. Introduction
2. The Normal Equations
3. Applying Least Squares to the Biaxial Test Problem
4. Projections
5. Exercises
6. Matrix Methods for Dynamical Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
1. Introduction
2. Nerve Fibers and the Dynamic Strang Quartet
3. The Laplace Transform
4. The Inverse Laplace Transform
5. The Backward Euler Method
6. Exercises
7. Complex Numbers and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
1. Complex Numbers, Vectors and Matrices
2. Complex Functions
3. Complex Differentiation
4. Exercises
8. Complex Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
1. Introduction
2. Cauchy's Theorem
3. The Residue Theorem
4. The Inverse Laplace Transform
5. Exercises
9. The Eigenvalue Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
1. Introduction
2. The Resolvent
3. The Partial Fraction Expansion of the Resolvent
4. The Spectral Representation
5. Examples
6. The Characteristic Polynomial
7. Exercises
10. The Spectral Representation of a Symmetric Matrix . . . . . . . . . . . . . . . 107
1. Introduction
2. The Spectral Representation
3. Gram–Schmidt Orthogonalization
4. The diagonalization of a symmetric matrix
5. Exercises
11. The Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
1. Introduction
2. The SVD in Image Compression
3. Exercises
12. Matrix Methods for Biochemical Networks . . . . . . . . . . . . . . . . . . . . . . . . 128
1. Introduction
1. Matrix Methods for Electrical Systems
1.1. Nerve Fibers and the Strang Quartet
We wish to confirm, by example, the prefatory claim that matrix algebra is a useful means of organizing (stating and solving) multivariable problems. In our first such example we investigate the response of a nerve fiber to a constant current stimulus. Ideally, a nerve fiber is simply a cylinder of radius a and length ℓ that conducts electricity both along its length and across its lateral membrane. Though we shall, in subsequent chapters, delve more deeply into the biophysics, here, in our first outing, we shall stick to its purely resistive properties. The latter are expressed via two quantities: ρ_i, the resistivity, in Ω cm, of the cytoplasm that fills the cell, and ρ_m, the resistivity, in Ω cm^2, of the cell's lateral membrane.
Figure 1.1. A 3 compartment model of a nerve cell.
Although current surely varies from point to point along the fiber it is hoped that these variations are regular enough to be captured by a multicompartment model. By that we mean that we choose a number N and divide the fiber into N segments each of length ℓ/N. Denoting a segment's axial resistance by
R_i = \frac{\rho_i\,\ell/N}{\pi a^2}
and membrane resistance by
R_m = \frac{\rho_m}{2\pi a\,\ell/N}
we arrive at the lumped circuit model of figure 1.1. For a fiber in culture we may assume a constant extracellular potential, e.g., zero. We accomplish this by connecting and grounding the extracellular nodes, see figure 1.2.
Figure 1.2. A rudimentary circuit model.
This figure also incorporates the exogenous disturbance, a current stimulus between ground and the left end of the fiber. Our immediate goal is to compute the resulting currents through each resistor and the potential at each of the nodes. Our long-range goal is to provide a modeling methodology that can be used across the engineering and science disciplines. As an aid to computing the desired quantities we give them names. With respect to figure 1.3 we label the vector of potentials
x = [x_1 x_2 x_3 x_4]
and vector of currents
y = [y_1 y_2 y_3 y_4 y_5 y_6].
We have also (arbitrarily) assigned directions to the currents as a graphical
aid in the consistent application of the basic circuit laws.
Figure 1.3. The fully dressed circuit model.
We incorporate the circuit laws in a modeling methodology that takes the form of a Strang* Quartet:
(S1) Express the voltage drops via e = Ax.
(S2) Express Ohm's Law via y = Ge.
(S3) Express Kirchhoff's Current Law via A^T y = f.
(S4) Combine the above into A^T GAx = f.
* G. Strang, Introduction to Applied Mathematics, Wellesley-Cambridge Press, 1986
The A in (S1) is the node-edge adjacency matrix; it encodes the network's connectivity. The G in (S2) is the diagonal matrix of edge conductances; it encodes the physics of the network. The f in (S3) is the vector of current sources; it encodes the network's stimuli. The culminating A^T GA in (S4) is the symmetric matrix whose inverse, when applied to f, reveals the vector of potentials, x. In order to make these ideas our own we must work many, many examples.
1.2. Example 1
With respect to the circuit of figure 1.3, in accordance with step (S1), we express the six potential differences (always tail minus head)
e_1 = x_1 - x_2, \quad e_2 = x_2, \quad e_3 = x_2 - x_3, \quad e_4 = x_3, \quad e_5 = x_3 - x_4, \quad e_6 = x_4.
Such long, tedious lists cry out for matrix representation, to wit
e = Ax \quad\text{where}\quad A = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
Step (S2), Ohm's law, states that the current along an edge is equal to the potential drop across the edge divided by the resistance of the edge. In our case,
y_j = e_j/R_i, \ j = 1, 3, 5 \quad\text{and}\quad y_j = e_j/R_m, \ j = 2, 4, 6,
or, in matrix notation, y = Ge where
G = \begin{bmatrix} 1/R_i & 0 & 0 & 0 & 0 & 0 \\ 0 & 1/R_m & 0 & 0 & 0 & 0 \\ 0 & 0 & 1/R_i & 0 & 0 & 0 \\ 0 & 0 & 0 & 1/R_m & 0 & 0 \\ 0 & 0 & 0 & 0 & 1/R_i & 0 \\ 0 & 0 & 0 & 0 & 0 & 1/R_m \end{bmatrix}.
Step (S3), Kirchhoff's Current Law, states that the sum of the currents into each node must be zero. In our case
i_0 - y_1 = 0, \quad y_1 - y_2 - y_3 = 0, \quad y_3 - y_4 - y_5 = 0, \quad y_5 - y_6 = 0,
or, in matrix terms, By = f where
B = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & -1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & -1 & 1 \end{bmatrix} \quad\text{and}\quad f = [i_0\ 0\ 0\ 0]^T.
Turning back the page we recognize in B the transpose of A. Calling it such, we recall our main steps
e = Ax, \quad y = Ge, \quad\text{and}\quad A^T y = f.
On substitution of the first two into the third we arrive, in accordance with (S4), at
A^T GAx = f.   (1.1)
This is a system of four equations for the 4 unknown potentials, x_1 through x_4. As you know, the system (1.1) may have either 1, 0, or infinitely many solutions, depending on f and A^T GA. We shall devote chapters 3 and 4 to an unraveling of the previous sentence. For now, we cross our fingers and solve by invoking the Matlab program, fib1.m, suspended from the course page.
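Since fib1.m itself is not reproduced in these notes, here is a minimal sketch, in its spirit, of how one might assemble and solve (1.1) for an N compartment fiber. The parameter values below are placeholders and not necessarily those used to produce the figures.

  N = 64;                          % number of compartments, so N+1 nodes
  ell = 1;  a = 1e-4;              % fiber length and radius (cm); placeholder values
  rho_i = 0.3;  rho_m = 1e4;       % resistivities; placeholder values
  i0 = 1e-5;                       % current stimulus; placeholder value
  Ri = rho_i*(ell/N)/(pi*a^2);     % axial resistance of one compartment
  Rm = rho_m/(2*pi*a*(ell/N));     % membrane resistance of one compartment
  A = zeros(2*N, N+1);             % node-edge adjacency matrix
  g = zeros(2*N, 1);               % edge conductances
  for k = 1:N
    A(2*k-1, k)   =  1;            % axial edge, tail at node k
    A(2*k-1, k+1) = -1;            %   and head at node k+1
    A(2*k,   k+1) =  1;            % membrane edge at node k+1
    g(2*k-1) = 1/Ri;  g(2*k) = 1/Rm;
  end
  G = diag(g);
  f = [i0; zeros(N,1)];            % the stimulus enters at the left end
  x = (A'*G*A)\f;                  % steps (S1)-(S4) in one line
  y = G*A*x;                       % the edge currents
  z = linspace(0, ell, N+1);
  plot(z, 1000*x), xlabel('z (cm)'), ylabel('x (mV)')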
Figure 1.4. Results of a 64 compartment simulation: the potential x (mV) along the fiber versus z (cm).
Figure 1.5. Results of a 64 compartment simulation: the axial and membrane currents y (μA) along the fiber versus z (cm).
This program is a bit more ambitious than the above in that it allows us to specify the number of compartments and that rather than just spewing the x and y values it plots them as a function of distance along the fiber. We note that, as expected, everything tapers off with distance from the source and that the axial current is significantly greater than the membrane, or leakage, current.
1.3. Example 2
We have seen in the previous section how a current source may produce a potential difference across a cell's membrane. We note that, even in the absence of electrical stimuli, there is always a difference in potential between the inside and outside of a living cell. In fact, this difference is the biologist's definition of living. Life is maintained by the fact that the cell's interior is rich in potassium ions, K^+, and poor in sodium ions, Na^+, while in the exterior medium it is just the opposite. These concentration differences beget potential differences under the guise of the Nernst potentials
E_{Na} = \frac{RT}{F}\log\left(\frac{[Na]_o}{[Na]_i}\right) \quad\text{and}\quad E_K = \frac{RT}{F}\log\left(\frac{[K]_o}{[K]_i}\right)
where R is the gas constant, T is temperature, and F is the Faraday constant. Associated with these potentials are membrane resistances ρ_{m,Na} and ρ_{m,K} that together produce the ρ_m above via
1/\rho_m = 1/\rho_{m,Na} + 1/\rho_{m,K},
and produce the aforementioned rest potential
E_m = \rho_m\,(E_{Na}/\rho_{m,Na} + E_K/\rho_{m,K}).
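As a quick numerical illustration of the two formulas above (the resistivity and Nernst values here are placeholders, not the values used later in fib2.m):

  rho_mNa = 5e4;  rho_mK = 1e4;             % hypothetical membrane resistivities (Ohm cm^2)
  E_Na = 56;      E_K   = -77;              % hypothetical Nernst potentials (mV)
  rho_m = 1/(1/rho_mNa + 1/rho_mK);         % the parallel combination
  E_m = rho_m*(E_Na/rho_mNa + E_K/rho_mK)   % a weighted average, lying between E_K and E_Na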
With respect to our old circuit model, each compartment now sports a battery
in series with its membrane resistance.
Figure 1.6. Circuit model with resting potentials.
Revisiting steps (S1-4) we note that in (S1) the even numbered voltage drops are now
e_2 = x_2 - E_m, \quad e_4 = x_3 - E_m \quad\text{and}\quad e_6 = x_4 - E_m.
We accommodate such things by generalizing (S1) to
(S1') Express the voltage drops as e = Ax - b where b is the vector of batteries.
No changes are necessary for (S2) and (S3). The final step now reads,
(S4') Combine (S1'), (S2) and (S3) to produce A^T GAx = A^T Gb + f.
Returning to figure 1.6 we note that
b = E_m [0\ 1\ 0\ 1\ 0\ 1]^T \quad\text{and}\quad A^T Gb = (E_m/R_m) [0\ 1\ 1\ 1]^T.
This requires only minor changes to our old code. The new program is called
fib2.m and results of its use are indicated in the next two figures.
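A minimal sketch of the required modification, assuming the set-up of the fib1.m sketch above (the value of E_m is again a placeholder):

  Em = -68e-3;                 % resting potential (V); placeholder value
  b = zeros(2*N, 1);
  b(2:2:end) = Em;             % a battery in series with each membrane resistor
  x = (A'*G*A)\(A'*G*b + f);   % (S4'): A'GAx = A'Gb + f
  y = G*(A*x - b);             % (S1') and (S2)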
Figure 1.7. Results of a 64 compartment simulation with batteries: the potential x (mV) along the fiber versus z (cm).
Figure 1.8. Results of a 64 compartment simulation with batteries: the axial and membrane currents y (μA) along the fiber versus z (cm).
1.4. Exercises
[1] In order to refresh your matrix-vector multiply skills please calculate, by hand, the product A^T GA in the 3 compartment case and write out the 4 equations in (1.1). The second equation should read
(-x_1 + 2x_2 - x_3)/R_i + x_2/R_m = 0.   (1.2)
[2] We began our discussion with the hope that a multicompartment model could indeed adequately capture the fiber's true potential and current profiles. In order to check this one should run fib1.m with increasing values of N until one can no longer detect changes in the computed potentials.
(a) Please run fib1.m with N = 8, 16, 32 and 64. Plot all of the potentials on the same (use hold) graph, using different line types for each. (You may wish to alter fib1.m so that it accepts N as an argument).
Let us now interpret this convergence. The main observation is that the difference equation, (1.2), approaches a differential equation. We can see this by noting that
dz \equiv \ell/N
acts as a spatial step size and that x_k, the potential at the kth node, is approximately the value of the true potential at (k - 1)dz. In a slight abuse of notation, we denote the latter
x((k-1)\,dz).
Applying these conventions to (1.2) and recalling the definitions of R_i and R_m we see (1.2) become
\frac{\pi a^2}{\rho_i}\,\frac{-x(0) + 2x(dz) - x(2dz)}{dz} + \frac{2\pi a\,dz}{\rho_m}\,x(dz) = 0,
or, after multiplying through by ρ_m/(πa dz),
\frac{a\rho_m}{\rho_i}\,\frac{-x(0) + 2x(dz) - x(2dz)}{dz^2} + 2x(dz) = 0.
We note that a similar equation holds at each node (save the ends) and that as N → ∞ and therefore dz → 0 we arrive at
\frac{d^2x(z)}{dz^2} - \frac{2\rho_i}{a\rho_m}\,x(z) = 0.   (1.3)
(b) With μ ≡ 2ρ_i/(aρ_m) show that
x(z) = \alpha\sinh(\sqrt{\mu}\,z) + \beta\cosh(\sqrt{\mu}\,z)   (1.4)
satisfies (1.3) regardless of α and β.
We shall determine α and β by paying attention to the ends of the fiber. At the near end we find
\frac{\pi a^2}{\rho_i}\,\frac{x(0) - x(dz)}{dz} = i_0,
which, as dz → 0, becomes
\frac{dx(0)}{dz} = -\frac{\rho_i i_0}{\pi a^2}.   (1.5)
At the far end, we interpret the condition that no axial current may leave the last node to mean
\frac{dx(\ell)}{dz} = 0.   (1.6)
(c) Substitute (1.4) into (1.5) and (1.6) and solve for α and β and write out the final x(z).
(d) Substitute into x the ℓ, a, ρ_i and ρ_m values used in fib1.m, plot the resulting function (using, e.g., ezplot) and compare this to the plot achieved in part (a).
2. Matrix Methods for Mechanical Systems
2.1. Introduction
We move now from the electrical to mechanical prospection of tissue. In
this application, one applies traction to the edges of a square sample of planar
tissue and seeks to identify, from measurement of the resulting deformation,
regions of increased hardness or stiffness.
2.2. A Fiber Chain
As a precursor to the biaxial problem let us first consider the uniaxial case. We connect 3 masses (nodes) with four springs (fibers) between two immobile walls,
Figure 2.1. A fiber chain.
apply forces at the masses and measure the associated displacement. More precisely, we suppose that a horizontal force, f_j, is applied to each m_j, and produces a displacement x_j, with the sign convention that rightward means positive. The bars at the ends of the figure indicate rigid supports incapable of movement. The k_j denote the respective spring stiffnesses. Regarding units, we measure f_j in Newtons (N) and x_j in meters (m) and so stiffness, k_j, is measured in (N/m). In fact each stiffness is a parameter composed of both material and geometric quantities. In particular,
k_j = \frac{Y_j a_j}{L_j}   (2.1)
where Y_j is the fiber's Young's modulus (N/m^2), a_j is the fiber's cross-sectional area (m^2) and L_j is the fiber's (reference) length (m).
The analog of potential difference is here elongation. If e_j denotes the elongation of the jth spring then naturally,
e_1 = x_1, \quad e_2 = x_2 - x_1, \quad e_3 = x_3 - x_2, \quad\text{and}\quad e_4 = -x_3,
or, in matrix terms,
e = Ax \quad\text{where}\quad A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}.
We note that e_j is positive when the spring is stretched and negative when compressed. The analog of Ohm's Law is here Hooke's Law: the restoring force in a spring is proportional to its elongation. We call this constant of proportionality the stiffness, k_j, of the spring, and denote the restoring force by y_j. Hooke's Law then reads, y_j = k_j e_j, or, in matrix terms
y = Ke \quad\text{where}\quad K = \begin{bmatrix} k_1 & 0 & 0 & 0 \\ 0 & k_2 & 0 & 0 \\ 0 & 0 & k_3 & 0 \\ 0 & 0 & 0 & k_4 \end{bmatrix}.
The analog of Kirchhoff's Current Law is here typically called force balance. More precisely, equilibrium is synonymous with the fact that the net force acting on each mass must vanish. In symbols,
y_1 - y_2 - f_1 = 0, \quad y_2 - y_3 - f_2 = 0, \quad\text{and}\quad y_3 - y_4 - f_3 = 0,
or, in matrix terms, By = f where
f = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 \end{bmatrix}.
As in the previous section we recognize in B the transpose of A. Gathering our three important steps
e = Ax   (2.2)
y = Ke   (2.3)
A^T y = f   (2.4)
we arrive, via direct substitution, at an equation for x. Namely
A^T y = f \ \Rightarrow\ A^T Ke = f \ \Rightarrow\ A^T KAx = f.
Assembling A^T KA we arrive at the final system
\begin{bmatrix} k_1 + k_2 & -k_2 & 0 \\ -k_2 & k_2 + k_3 & -k_3 \\ 0 & -k_3 & k_3 + k_4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}.   (2.5)
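One may confirm (2.5) in Matlab by assembling A^T KA directly; a minimal sketch, with sample stiffnesses:

  A = [1 0 0; -1 1 0; 0 -1 1; 0 0 -1];   % the elongation matrix above
  k = [1 2 3 4];                         % sample stiffnesses k_1,...,k_4
  K = diag(k);
  S = A'*K*A                             % should reproduce the pattern in (2.5)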
Although Matlab solves such systems with ease our aim here is to develop a deeper understanding of Gaussian Elimination and so we proceed by hand. This aim is motivated by a number of important considerations. First, not all linear systems have unique solutions. A careful look at Gaussian Elimination will provide the general framework for not only classifying those systems that possess unique solutions but also for providing detailed diagnoses of those systems that lack solutions or possess too many.
In Gaussian Elimination one first uses linear combinations of preceding rows to eliminate nonzeros below the main diagonal and then solves the resulting triangular system via back-substitution. To firm up our understanding let us take up the case where each k_j = 1 and so (2.5) takes the form
\begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_3 \end{bmatrix}   (2.6)
We eliminate the (2, 1) (row 2, column 1) element by implementing
new row 2 = old row 2 + (1/2) row 1,   (2.7)
bringing
\begin{bmatrix} 2 & -1 & 0 \\ 0 & 3/2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 + f_1/2 \\ f_3 \end{bmatrix}
We eliminate the current (3, 2) element by implementing
new row 3 = old row 3 + (2/3) row 2,   (2.8)
bringing the upper-triangular system
Ux = g,   (2.9)
or, more precisely,
\begin{bmatrix} 2 & -1 & 0 \\ 0 & 3/2 & -1 \\ 0 & 0 & 4/3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 + f_1/2 \\ f_3 + 2f_2/3 + f_1/3 \end{bmatrix}   (2.10)
One now simply reads off
x_3 = (f_1 + 2f_2 + 3f_3)/4.
This in turn permits the solution of the second equation
x_2 = 2(x_3 + f_2 + f_1/2)/3 = (f_1 + 2f_2 + f_3)/2,
and, in turn,
x_1 = (x_2 + f_1)/2 = (3f_1 + 2f_2 + f_3)/4.
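These formulas are easily checked against Matlab for any particular load; a minimal sketch:

  S = [2 -1 0; -1 2 -1; 0 -1 2];
  f = [1; 2; 3];                    % any sample load
  x_hand = [(3*f(1)+2*f(2)+f(3))/4; (f(1)+2*f(2)+f(3))/2; (f(1)+2*f(2)+3*f(3))/4];
  norm(S\f - x_hand)                % should vanish, up to roundoff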
One must say that Gaussian Elimination has succeeded here. For, regardless of the actual elements of f we have produced an x for which A^T KAx = f.
Although Gaussian Elimination remains the most efficient means for solving systems of the form Sx = f it pays, at times, to consider alternate means. At the algebraic level, suppose that there exists a matrix that undoes multiplication by S in the sense that multiplication by 2^{-1} undoes multiplication by 2. The matrix analog of 2^{-1} 2 = 1 is
S^{-1} S = I
where I denotes the identity matrix (all zeros except the ones on the diagonal). We call S^{-1} the inverse of S, or S inverse for short. Its value stems from watching what happens when it is applied to each side of Sx = f. Namely,
Sx = f \ \Rightarrow\ S^{-1}Sx = S^{-1}f \ \Rightarrow\ Ix = S^{-1}f \ \Rightarrow\ x = S^{-1}f.
Hence, to solve Sx = f for x it suffices to multiply f by the inverse of S. Let us now consider how one goes about computing S^{-1}. In general this takes a little more than twice the work of Gaussian Elimination, for we interpret
SS^{-1} = I
as n (the size of S) applications of Gaussian elimination, with f running through the n columns of the identity matrix. The bundling of these n applications into one is known as the Gauss-Jordan method. Let us demonstrate it on the S appearing in (2.6). We first augment S with I.
\left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ -1 & 2 & -1 & 0 & 1 & 0 \\ 0 & -1 & 2 & 0 & 0 & 1 \end{array}\right]
We then eliminate down, being careful to address each of the 3 f vectors. This produces
\left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & 3/2 & -1 & 1/2 & 1 & 0 \\ 0 & 0 & 4/3 & 1/3 & 2/3 & 1 \end{array}\right]
Now, rather than simple back-substitution we instead eliminate up. Eliminating first the (2, 3) element we find
\left[\begin{array}{rrr|rrr} 2 & -1 & 0 & 1 & 0 & 0 \\ 0 & 3/2 & 0 & 3/4 & 3/2 & 3/4 \\ 0 & 0 & 4/3 & 1/3 & 2/3 & 1 \end{array}\right]
Now eliminating the (1, 2) element we achieve
\left[\begin{array}{rrr|rrr} 2 & 0 & 0 & 3/2 & 1 & 1/2 \\ 0 & 3/2 & 0 & 3/4 & 3/2 & 3/4 \\ 0 & 0 & 4/3 & 1/3 & 2/3 & 1 \end{array}\right]
In the final step we scale each row in order that the matrix on the left takes on the form of the identity. This requires that we multiply row 1 by 1/2, row 2 by 2/3 and row 3 by 3/4, with the result
\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 3/4 & 1/2 & 1/4 \\ 0 & 1 & 0 & 1/2 & 1 & 1/2 \\ 0 & 0 & 1 & 1/4 & 1/2 & 3/4 \end{array}\right].
Now in this transformation of S into I we have, ipso facto, transformed I to S^{-1}, i.e., the matrix that appears on the right upon applying the method of Gauss-Jordan is the inverse of the matrix that began on the left. In this case,
S^{-1} = \begin{bmatrix} 3/4 & 1/2 & 1/4 \\ 1/2 & 1 & 1/2 \\ 1/4 & 1/2 & 3/4 \end{bmatrix}.
One should check that S^{-1}f indeed coincides with the x computed above.
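For instance, a short check in Matlab, with any sample load f:

  S = [2 -1 0; -1 2 -1; 0 -1 2];
  Sinv = [3/4 1/2 1/4; 1/2 1 1/2; 1/4 1/2 3/4];
  f = [1; 2; 3];
  norm(Sinv*S - eye(3))     % confirms Sinv is indeed the inverse
  norm(Sinv*f - S\f)        % coincides with the x computed above, up to roundoff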
Not all matrices possess inverses. Those that do are called invertible or nonsingular. For example
\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
is singular.
Some matrices can be inverted by inspection. An important class of such matrices is in fact latent in the process of Gaussian Elimination itself. To begin, we build the elimination matrix that enacts the elementary row operation spelled out in (2.7),
E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Do you see that this matrix (when applied from the left to S) leaves rows 1 and 3 unsullied but adds half of row 1 to row 2? This ought to be undone by simply subtracting half of row 1 from row 2, i.e., by application of
E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1/2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Please confirm that E_1^{-1} E_1 is indeed I. Similarly, the matrix analogs of (2.8) and its undoing are
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2/3 & 1 \end{bmatrix} \quad\text{and}\quad E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2/3 & 1 \end{bmatrix}
Again, please confirm that E_2 E_2^{-1} = I. Now we may express the reduction of S to U (recall (2.9)) as
E_2 E_1 S = U
and the subsequent reconstitution by
S = LU, \quad\text{where}\quad L = E_1^{-1} E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1/2 & 1 & 0 \\ 0 & -2/3 & 1 \end{bmatrix}
One speaks of this representation as the LU decomposition of S. We have just observed that the inverse of a product is the product of the inverses in reverse order. Do you agree that S^{-1} = U^{-1}L^{-1}? And what do you think of the statement S^{-1} = A^{-1}K^{-1}(A^T)^{-1}?
LU decomposition is the preferred method of solution for the large linear systems that occur in practice. The decomposition is implemented in MATLAB as
[L U] = lu(S);
and in fact lies at the heart of MATLAB's backslash command. To diagram its use, we write Sx = f as LUx = f and recognize that the latter is nothing more than a pair of triangular problems:
Lc = f \quad\text{and}\quad Ux = c,
that may be solved by forward and backward substitution respectively. This representation achieves its greatest advantage when one is asked to solve Sx = f over a large class of f vectors. For example, if we wish to steadily increase the force, f_2, on mass 2, and track the resulting displacement we would be well served by
[L U] = lu(S);
f = [1 1 1]';
for j=1:100,
  f(2) = f(2) + j/100;
  x = U \ (L \ f);
  plot(x,'o')
end
You are correct in pointing out that we could have also just precomputed the
inverse of S and then sequentially applied it in our for loop. The use of the
inverse is, in general, considerably more costly in terms of both memory and
operation counts. The exercises will give you a chance to see this for yourself.
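A minimal way to get a feel for the cost difference on a single larger system (the size n here is arbitrary):

  n = 2000;                                                            % arbitrary size
  S = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);  % the k_j = 1 chain, scaled up
  f = ones(n,1);
  tic, Si = inv(S);  x1 = Si*f;       t_inv = toc
  tic, [L,U] = lu(S); x2 = U\(L\f);   t_lu = toc

Exercise [4] below asks you to carry out a systematic comparison of this kind.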
2.3. A Small Planar Truss
It turns out that singular matrices are typical in the biaxial testing problem that opened the section. As our initial step into the world of such planar structures let us consider the simple truss in the figure below.
Figure 2.2. A simple swing.
We denote by x_1 and x_2 the respective horizontal and vertical displacements of m_1 (positive is right and down). Similarly, f_1 and f_2 will denote the associated components of force. The corresponding displacements and forces at m_2 will be denoted by x_3, x_4 and f_3, f_4. In computing the elongations of the three springs we shall make reference to their unstretched lengths, L_1, L_2, and L_3.
Now, if spring 1 connects (0, -L_1) to (0, 0) when at rest and (0, -L_1) to (x_1, x_2) when stretched then its elongation is simply
e_1 = \sqrt{x_1^2 + (x_2 + L_1)^2} - L_1.   (2.11)
The price one pays for moving to higher dimensions is that lengths are now expressed in terms of square roots. The upshot is that the elongations are not linear combinations of the end displacements as they were in the uniaxial case. If we presume however that the loads and stiffnesses are matched in the sense that the displacements are small compared with the original lengths then we may effectively ignore the nonlinear contribution in (2.11). In order to make this precise we need only recall the Taylor development of \sqrt{1+t} about t = 0, i.e.,
\sqrt{1+t} = 1 + t/2 + O(t^2)
where the latter term signifies the remainder. With regard to e_1 this allows
e_1 = \sqrt{x_1^2 + x_2^2 + 2x_2 L_1 + L_1^2} - L_1
    = L_1\sqrt{1 + (x_1^2 + x_2^2)/L_1^2 + 2x_2/L_1} - L_1
    = L_1 + (x_1^2 + x_2^2)/(2L_1) + x_2 + L_1 O\big(((x_1^2 + x_2^2)/L_1^2 + 2x_2/L_1)^2\big) - L_1
    = x_2 + (x_1^2 + x_2^2)/(2L_1) + L_1 O\big(((x_1^2 + x_2^2)/L_1^2 + 2x_2/L_1)^2\big).
If we now assume that
(x_1^2 + x_2^2)/(2L_1) \ \text{is small compared to}\ x_2   (2.12)
then, as the O term is even smaller, we may neglect all but the first terms in the above and so arrive at
e_1 = x_2.
To take a concrete example, if L_1 is one meter and x_1 and x_2 are each one centimeter then x_2 is one hundred times (x_1^2 + x_2^2)/(2L_1).
With regard to the second spring, arguing as above, its elongation is (approximately) its stretch along its initial direction. As its initial direction is horizontal, its elongation is just the difference of the respective horizontal end displacements, namely,
e_2 = x_3 - x_1.
Finally, the elongation of the third spring is (approximately) the difference of its respective vertical end displacements, i.e.,
e_3 = x_4.
We encode these three elongations in
e = Ax \quad\text{where}\quad A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
Hooke's law is an elemental piece of physics and is not perturbed by our leap from uniaxial to biaxial structures. The upshot is that the restoring force in each spring is still proportional to its elongation, i.e., y_j = k_j e_j where k_j is the stiffness of the jth spring. In matrix terms,
y = Ke \quad\text{where}\quad K = \begin{bmatrix} k_1 & 0 & 0 \\ 0 & k_2 & 0 \\ 0 & 0 & k_3 \end{bmatrix}.
Balancing horizontal and vertical forces at m_1 brings
-y_2 - f_1 = 0 \quad\text{and}\quad y_1 - f_2 = 0,
while balancing horizontal and vertical forces at m_2 brings
y_2 - f_3 = 0 \quad\text{and}\quad y_3 - f_4 = 0.
We assemble these into
By = f \quad\text{where}\quad B = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
and recognize, as expected, that B is nothing more than A^T. Putting the pieces together, we find that x must satisfy Sx = f where
S = A^T KA = \begin{bmatrix} k_2 & 0 & -k_2 & 0 \\ 0 & k_1 & 0 & 0 \\ -k_2 & 0 & k_2 & 0 \\ 0 & 0 & 0 & k_3 \end{bmatrix}.
Applying one step of Gaussian Elimination brings
\begin{bmatrix} k_2 & 0 & -k_2 & 0 \\ 0 & k_1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & k_3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_1 + f_3 \\ f_4 \end{bmatrix}
and back substitution delivers
x_4 = f_4/k_3, \quad 0 = f_1 + f_3, \quad x_2 = f_2/k_1, \quad x_1 - x_3 = f_1/k_2.
.
The second of these is remarkable in that it contains no components of x.
Instead, it provides a condition on f. In mechanical terms, it states that
there can be no equilibrium unless the horizontal forces on the two masses are
23
equal and opposite. Of course one could have observed this directly from the
layout of the truss. In modern, threedimensional structures with thousands
of members meant to shelter or convey humans one should not however be
satised with the visual integrity of the structure. In particular, one desires
a detailed description of all loads that can, and, especially, all loads that can
not, be equilibrated by the proposed truss. In algebraic terms, given a matrix
S one desires a characterization of (1) all those f for which Sx = f possesses a
solution and (2) all those f for which Sx = f does not possess a solution. We
provide such a characterization in Chapter 3 in our discussion of the column
space of a matrix.
Supposing now that f_1 + f_3 = 0 we note that although the system above is consistent it still fails to uniquely determine the four components of x. In particular, it specifies only the difference between x_1 and x_3. As a result both
x = \begin{bmatrix} f_1/k_2 \\ f_2/k_1 \\ 0 \\ f_4/k_3 \end{bmatrix} \quad\text{and}\quad x = \begin{bmatrix} 0 \\ f_2/k_1 \\ -f_1/k_2 \\ f_4/k_3 \end{bmatrix}
satisfy Sx = f. In fact, one may add to either an arbitrary multiple of
z \equiv \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}   (2.13)
and still have a solution of Sx = f. Searching for the source of this lack of uniqueness we observe some redundancies in the columns of S. In particular, the third is simply the opposite of the first. As S is simply A^T KA we recognize that the original fault lies with A, where again, the first and third columns are opposites. These redundancies are encoded in z in the sense that
Az = 0.
Interpreting this in mechanical terms, we view z as a displacement and Az as the resulting elongation. In Az = 0 we see a nonzero displacement producing zero elongation. One says in this case that the truss deforms without doing any work and speaks of z as an unstable mode. Again, this mode could have been observed by a simple glance at Figure 2.2. Such is not the case for more complex structures and so the engineer seeks a systematic means by which all unstable modes may be identified. We shall see in Chapter 3 that these modes are captured by the null space of A.
From Sz = 0 one easily deduces that S is singular. More precisely, if S^{-1} were to exist then S^{-1}Sz would equal S^{-1}0, i.e., z = 0, contrary to (2.13). As a result, Matlab will fail to solve Sx = f even when f is a force that the truss can equilibrate. One way out is to use the pseudoinverse, as we shall see below.
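For this small truss one can already try the pseudoinverse; a minimal sketch, with all stiffnesses set to one and a sample load obeying f_1 + f_3 = 0:

  k1 = 1; k2 = 1; k3 = 1;
  A = [0 1 0 0; -1 0 1 0; 0 0 0 1];
  K = diag([k1 k2 k3]);
  S = A'*K*A;
  f = [1; 0; -1; 0];     % note f(1) + f(3) = 0, so equilibrium is possible
  x = pinv(S)*f          % one solution; x + t*[1;0;1;0] also solves Sx = f for every t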
2.4. The General Planar Truss
Finally, let us return to something that resembles the mechanical prospection problem advertised in the introduction. In the figure below we offer a crude mechanical model of a planar tissue, say, e.g., an excised sample of the wall of a vein.
Elastic fibers, numbered 1-20, meet at nodes, numbered 1-9. We limit our observation to the motion of the nodes by denoting the horizontal and vertical displacements of node j by x_{2j-1} and x_{2j} respectively. Retaining the convention that down and right are positive we note that the elongation of fiber 1 is
e_1 = x_2 - x_8
while that of fiber 3 is
e_3 = x_3 - x_1.
Figure 2.3. A crude tissue model.
As fibers 2 and 4 are neither vertical nor horizontal their elongations, in terms of nodal displacements, are not so easy to read off. This is more a nuisance than an obstacle however, for recalling our earlier discussion, the elongation is approximately just the stretch along its undeformed axis. With respect to fiber 2, as it makes the angle -π/4 with respect to the positive horizontal axis, we find
e_2 = (x_9 - x_1)\cos(-\pi/4) + (x_{10} - x_2)\sin(-\pi/4) = (x_9 - x_1 + x_2 - x_{10})/\sqrt{2}.
Similarly, as fiber 4 makes the angle -3π/4 with respect to the positive horizontal axis, its elongation is
e_4 = (x_7 - x_3)\cos(-3\pi/4) + (x_8 - x_4)\sin(-3\pi/4) = (x_3 - x_7 + x_4 - x_8)/\sqrt{2}.
These are both direct applications of the general formula
e_j = (x_{2n-1} - x_{2m-1})\cos(\theta_j) + (x_{2n} - x_{2m})\sin(\theta_j)   (2.14)
for fiber j, as depicted in the figure below, connecting node m to node n and making the angle θ_j with the positive horizontal axis when node m is assumed to lie at the point (0, 0). The reader should check that our expressions for e_1 and e_3 indeed conform to this general formula and that e_2 and e_4 agree with one's intuition. For example, visual inspection of the specimen suggests that fiber 2 can not be supposed to stretch (i.e., have positive e_2) unless x_9 > x_1 and/or x_2 > x_{10}. Does this jibe with (2.14)?
Figure 2.4. Elongation of a generic bar, see (2.14).
Applying (2.14) to each of the remaining fibers we arrive at e = Ax where A is 20-by-18, one row for each fiber, and one column for each degree of freedom. For systems of such size with such a well defined structure one naturally hopes to automate the construction. We have done just that in the accompanying M-file and diary. The M-file begins with a matrix of raw data that anyone with a protractor could have keyed in directly from Figure 2.3. More precisely, the data matrix has a row for each fiber and each row consists of the starting and ending node numbers and the angle the fiber makes with the positive horizontal axis. This data is precisely what (2.14) requires in order to know which columns of A receive the proper cos or sin. The final A matrix is displayed in the diary.
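Since the M-file itself is not reproduced here, the following is a minimal sketch of how such an assembly might go. The data rows below are placeholders; the true 20-row matrix for Figure 2.3 lives in the accompanying M-file.

  dat = [1 2 0; 2 3 0; 1 5 -pi/4];    % placeholder rows: [start node, end node, angle] per fiber
  k   = ones(size(dat,1), 1);         % fiber stiffnesses
  nnode = 9;
  A = zeros(size(dat,1), 2*nnode);
  for j = 1:size(dat,1)
    m = dat(j,1);  n = dat(j,2);  th = dat(j,3);
    A(j, 2*n-1) =  cos(th);  A(j, 2*n) =  sin(th);   % node n terms of (2.14)
    A(j, 2*m-1) = -cos(th);  A(j, 2*m) = -sin(th);   % node m terms of (2.14)
  end
  S = A'*diag(k)*A;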
The next two steps are now familiar. If K denotes the diagonal matrix of fiber stiffnesses and f denotes the vector of nodal forces then y = Ke and A^T y = f and so one must solve Sx = f where S = A^T KA. In this case there is an entire three-dimensional class of z for which Az = 0 and therefore Sz = 0. The three indicates that there are three independent unstable modes of the specimen, e.g., two translations and a rotation. As a result S is singular and x = S\f in matlab will get us nowhere. The way out is to recognize that S has 18 - 3 = 15 stable modes and that if we restrict S to act only in these directions then it should be invertible. We will begin to make these notions precise in Chapter 4 on the Fundamental Theorem of Linear Algebra.
Figure 2.5. The solid (dashed) circles correspond to the nodal positions before (after) the application of the traction force, f.
For now let us note that every matrix possesses such a pseudo-inverse and that it may be computed in Matlab via the pinv command. On supposing the fiber stiffnesses to each be one and the edge traction to be of the form f = [1 1 0 1 1 1 1 0 0 0 1 0 1 1 0 1 1 1]^T, we arrive at x via x=pinv(S)*f and refer to Fig. 2.5 for its graphical representation.
2.5. Exercises
[1] With regard to figure 2.1,
(i) Derive the A and K matrices resulting from the removal of the fourth spring and assemble S = A^T KA.
(ii) Compute S^{-1}, by hand via Gauss-Jordan, and compute L and U where S = LU by hand via the composition of elimination matrices and their inverses. Assume throughout that k_1 = k_2 = k_3 = k.
(iii) Use the result of (ii) with the load f = [0 0 F]^T to solve Sx = f by hand two ways, i.e., x = S^{-1}f and Lc = f and Ux = c.
[2] With regard to figure 2.2,
(i) Derive the A and K matrices resulting from the addition of a fourth (diagonal) fiber that runs from the top of fiber one to the second mass and assemble S = A^T KA.
(ii) Compute S^{-1}, by hand via Gauss-Jordan, and compute L and U where S = LU by hand via the composition of elimination matrices and their inverses. Assume throughout that k_1 = k_2 = k_3 = k_4 = k.
(iii) Use the result of (ii) with the load f = [0 0 F 0]^T to solve Sx = f by hand two ways, i.e., x = S^{-1}f and Lc = f and Ux = c.
[3] Generalize figure 2.3 to the case of 16 nodes connected by 42 fibers. Introduce one stiff (say k = 100) fiber and show how to detect it by properly choosing f. Submit your well-documented M-file as well as the plots, similar to Figure 2.5, from which you conclude the presence of a stiff fiber.
[4] We generalize figure 2.3 to permit ever finer meshes. In particular, with reference to the figure below we assume N(N-1) nodes where the horizontal and vertical fibers each have length 1/N while the diagonal fibers have length \sqrt{2}/N. The top row of fibers is anchored to the ceiling.
(Figure: the mesh for exercise [4], with nodes numbered 1 through N(N-1) and fibers numbered 1 through (N-1)(4N-3).)
(i) Write and test a matlab function S=bignet(N) that accepts the odd number N and produces the stiffness matrix S = A^T KA. As a check on your work we offer a spy plot of A when N = 5. Your K matrix should reflect the fiber lengths as spelled out in (2.1). You may assume Y_j a_j = 1 for each fiber. The sparsity of A also produces a sparse S. In order to exploit this, please use S=sparse(S) as the final line in bignet.m.
(Figure: spy(A) when N = 5; nz = 179.)
(ii) Write and test a driver called bigrun that generates S for N = 5 : 4 : 29 and for each N solves Sx = f two ways for 100 choices of f. In particular, f is a steady downward pull on the bottom set of nodes, with a continual increase on the pull at the center node. This can be done via
f=zeros(size(S,1),1); f(2:2:2*N) = 1e-3/N;
for j=1:100,
  f(N+1) = f(N+1) + 1e-4/N;
This construction should be repeated twice. In the first scenario, precompute S^{-1} via inv and then apply x = S^{-1}f in the j loop. In the second scenario precompute L and U and then apply x = U\(L\f) in the j loop. In both cases use tic and toc to time each for loop and so produce a graph of the form
(Figure: elapsed time (s) versus degrees of freedom for the inv and lu approaches.)
Submit your well documented code, a spy plot of S when N = 9, and a time comparison (it will vary with memory and cpu) like that shown above.
3. The Column and Null Spaces
3.1. Introduction
We take a closer look at the issues of existence and uniqueness of solutions
to Sx = f raised in the previous section.
3.2. The Column Space
We begin with the simple geometric interpretation of matrix-vector multiplication. Namely, the multiplication of the n-by-1 vector x by the m-by-n matrix S produces a linear combination of the columns of S. More precisely, if s_j denotes the jth column of S, then
Sx = [s_1\ s_2\ \cdots\ s_n]\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 s_1 + x_2 s_2 + \cdots + x_n s_n.   (3.1)
The picture I wish to place in your mind's eye is that Sx lies in the plane spanned by the columns of S. This plane occurs so frequently that we find it useful to distinguish it with a
Definition 1. The column space of the m-by-n matrix S is simply the span of its columns, i.e.,
R(S) \equiv \{Sx : x \in R^n\}.
This is a subset of R^m. The letter R stands for range.
For example, let us recall the S matrix associated with Figure 2.2. Its column space is
R(S) = \left\{ x_1 [k_2\ 0\ {-k_2}\ 0]^T + x_2 [0\ k_1\ 0\ 0]^T + x_3 [-k_2\ 0\ k_2\ 0]^T + x_4 [0\ 0\ 0\ k_3]^T : x \in R^4 \right\}.
As the first and third columns are colinear we may write
R(S) = \left\{ x_1 [k_2\ 0\ {-k_2}\ 0]^T + x_2 [0\ k_1\ 0\ 0]^T + x_3 [0\ 0\ 0\ k_3]^T : x \in R^3 \right\}.
As the remaining three columns are linearly independent we may go no further. We recognize then R(S) as a three dimensional subspace of R^4. In order to use these ideas with any confidence we must establish careful definitions of subspace, independence, and dimension.
A subspace is a natural generalization of line and plane. Namely, it is any set that is closed under vector addition and scalar multiplication. More precisely,
Definition 2. A subset M of R^d is a subspace of R^d when
(1) p + q ∈ M whenever p ∈ M and q ∈ M, and
(2) tp ∈ M whenever p ∈ M and t ∈ R.
Let us confirm now that R(S) is indeed a subspace. Regarding (1), if p ∈ R(S) and q ∈ R(S) then p = Sx and q = Sy for some x and y. Hence, p + q = Sx + Sy = S(x + y), i.e., (p + q) ∈ R(S). With respect to (2), tp = tSx = S(tx) so tp ∈ R(S).
This establishes that every column space is a subspace. The converse is also true. Every subspace is the column space of some matrix. To make sense of this we should more carefully explain what we mean by span.
Definition 3. A collection of vectors {s_1, s_2, ..., s_n} in a subspace M is said to span M when M = R(S) where S = [s_1 s_2 ... s_n].
We shall be interested in how a subspace is situated in its ambient space. We shall have occasion to speak of complementary subspaces and even the sum of two subspaces. Let's take care of the latter right now.
Definition 4. If M and Q are subspaces of the same ambient space, R^d, we define their direct sum
M \oplus Q \equiv \{p + q : p \in M \ \text{and}\ q \in Q\}
as the union of all possible sums of vectors from M and Q.
Do you see how R^3 may be written as the direct sum of R^1 and R^2?
3.3. The Null Space
Definition 5. The null space of an m-by-n matrix S is the collection of those vectors in R^n that S maps to the zero vector in R^m. More precisely,
N(S) \equiv \{x \in R^n : Sx = 0\}.
Let us confirm that N(S) is in fact a subspace. If both x and y lie in N(S) then Sx = Sy = 0 and so S(x + y) = 0. In addition, S(tx) = tSx = 0 for every t ∈ R.
As an example we remark that the null space of the S matrix associated with Figure 2.2 is
N(S) = \left\{ t [1\ 0\ 1\ 0]^T : t \in R \right\},
a line in R^4.
The null space answers the question of uniqueness of solutions to Sx = f. For, if Sx = f and Sy = f then S(x - y) = Sx - Sy = f - f = 0 and so (x - y) ∈ N(S). Hence, a solution to Sx = f will be unique if, and only if, N(S) = {0}.
Recalling (3.1) we note that if x ∈ N(S) and x ≠ 0, say, e.g., x_1 ≠ 0, then Sx = 0 takes the form
s_1 = -\sum_{j=2}^{n} \frac{x_j}{x_1}\, s_j.
That is, the first column of S may be expressed as a linear combination of the remaining columns of S. Hence, one may determine the (in)dependence of a set of vectors by examining the null space of the matrix whose columns are the vectors in question.
Definition 6. The vectors {s_1, s_2, ..., s_n} are said to be linearly independent if N(S) = {0} where S = [s_1 s_2 ... s_n].
As lines and planes are described as the set of linear combinations of one or two generators, so too subspaces are most conveniently described as the span of a few basis vectors.
Definition 7. A collection of vectors {s_1, s_2, ..., s_n} in a subspace M is a basis for M when the matrix S = [s_1 s_2 ... s_n] satisfies
(1) M = R(S), and
(2) N(S) = {0}.
The first stipulates that the columns of S span M while the second requires the columns of S to be linearly independent.
3.4. A Blend of Theory and Example
Let us compute bases for the null and column spaces of the adjacency matrix associated with the ladder below.
Figure 3.1. An unstable ladder?
The ladder has 8 bars and 4 nodes, so 8 degrees of freedom. Continuing to denote the horizontal and vertical displacements of node j by x_{2j-1} and x_{2j} we arrive at the A matrix
A = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0
\end{bmatrix}
To determine a basis for R(A) we must find a way to discard its dependent columns. A moment's reflection reveals that columns 2 and 6 are colinear, as are columns 4 and 8. We seek, of course, a more systematic means of uncovering these, and perhaps other less obvious, dependencies. Such dependencies are more easily discerned from the row reduced form
A_{red} = \mathrm{rref}(A) = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
Recall that rref performs the elementary row operations necessary to eliminate all nonzeros below the diagonal. For those who can't stand to miss any of the action I recommend rrefmovie.
Each nonzero row of A_red is called a pivot row. The first nonzero in each row of A_red is called a pivot. Each column that contains a pivot is called a pivot column. On account of the staircase nature of A_red we find that there are as many pivot columns as there are pivot rows. In our example there are six of each and, again on account of the staircase nature, the pivot columns are the linearly independent columns of A_red. One now asks how this might help us distinguish the independent columns of A. For, although the rows of A_red are linear combinations of the rows of A no such thing is true with respect to the columns. The answer is: pay attention only to the indices of the pivot columns. In our example, columns {1, 2, 3, 4, 5, 7} are the pivot columns. In general
Proposition 1. Suppose A is m-by-n. If columns {c_j : j = 1, ..., r} are the pivot columns of A_red then columns {c_j : j = 1, ..., r} of A constitute a basis for R(A).
Proof: Note that the pivot columns of A_red are, by construction, linearly independent. Suppose, however, that columns {c_j : j = 1, ..., r} of A are linearly dependent. In this case there exists a nonzero x ∈ R^n for which Ax = 0 and
x_k = 0, \quad k \notin \{c_j : j = 1, ..., r\}.   (3.2)
Now Ax = 0 necessarily implies that A_red x = 0, contrary to the fact that columns {c_j : j = 1, ..., r} are the pivot columns of A_red. (The implication Ax = 0 ⇒ A_red x = 0 follows from the fact that we may read row reduction as a sequence of linear transformations of A. If we denote the product of these transformations by T then TA = A_red and you see why Ax = 0 ⇒ A_red x = 0. The reverse implication follows from the fact that each of our row operations is reversible, or, in the language of the land, invertible.)
We now show that the span of columns {c_j : j = 1, ..., r} of A indeed coincides with R(A). This is obvious if r = n, i.e., if all of the columns are linearly independent. If r < n there exists a q ∉ {c_j : j = 1, ..., r}. Looking back at A_red we note that its qth column is a linear combination of the pivot columns with indices not exceeding q. Hence, there exists an x satisfying A_red x = 0 with x_q = 1 and every other nonpivot component equal to zero. This x then necessarily satisfies Ax = 0. This states that the qth column of A is a linear combination of columns {c_j : j = 1, ..., r} of A.
Let us now exhibit a basis for N(A). We exploit the already mentioned fact that N(A) = N(A_red). Regarding the latter, we partition the elements of x into so called pivot variables,
\{x_{c_j} : j = 1, ..., r\}
and free variables
\{x_k : k \notin \{c_j : j = 1, ..., r\}\}.
There are evidently n - r free variables. For convenience, let us denote these in the future by
\{x_{c_j} : j = r+1, ..., n\}.
One solves A_red x = 0 by expressing each of the pivot variables in terms of the nonpivot, or free, variables. In the example above, x_1, x_2, x_3, x_4, x_5 and
x_7 are pivot while x_6 and x_8 are free. Solving for the pivot in terms of the free we find
x_7 = 0, \quad x_5 = 0, \quad x_4 = x_8, \quad x_3 = 0, \quad x_2 = x_6, \quad x_1 = 0,
or, written as a vector,
x = x_6 [0\ 1\ 0\ 0\ 0\ 1\ 0\ 0]^T + x_8 [0\ 0\ 0\ 1\ 0\ 0\ 0\ 1]^T,   (3.3)
where x_6 and x_8 are free. As x_6 and x_8 range over all real numbers the x above traces out a plane in R^8. This plane is precisely the null space of A and (3.3) describes a generic element as the linear combination of two basis vectors. Compare this to what matlab returns when faced with null(A,'r').
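For instance, with A keyed in as above, a minimal check reads:

  A = [ 1  0  0  0  0  0  0  0
       -1  0  1  0  0  0  0  0
        0  0 -1  0  0  0  0  0
        0 -1  0  0  0  1  0  0
        0  0  0 -1  0  0  0  1
        0  0  0  0  1  0  0  0
        0  0  0  0 -1  0  1  0
        0  0  0  0  0  0 -1  0 ];
  rref(A)          % reproduces A_red and reveals the pivot columns {1,2,3,4,5,7}
  null(A,'r')      % the 'rational' basis for N(A); compare its columns with (3.3)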
Abstracting these calculations we arrive at
Proposition 2. Suppose that A is m-by-n with pivot indices {c_j : j = 1, ..., r} and free indices {c_j : j = r+1, ..., n}. A basis for N(A) may be constructed of n - r vectors {z_1, z_2, ..., z_{n-r}} where z_k, and only z_k, possesses a nonzero in its c_{r+k} component.
With respect to our ladder the free indices are c_7 = 6 and c_8 = 8. You still may be wondering what R(A) and N(A) tell us about the ladder that we did not already know. Regarding R(A) the answer will come in the next chapter. The null space calculation however has revealed two independent motions against which the ladder does no work! Do you see that the two vectors in (3.3) encode rigid vertical motions of bars 4 and 5 respectively? As each of these lies in the null space of A the associated elongation is zero. Can you square this with the ladder as pictured in figure 3.1? I hope not, for vertical motion of bar 4 must stretch bars 1, 2, 6 and 7. How does one resolve this (apparent) contradiction?
3.5. A Couple More Examples
We compute bases for the column and null spaces of
A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
Subtracting the first row from the second lands us at
A_{red} = \begin{bmatrix} 1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
hence both rows are pivot rows and columns 1 and 2 are pivot columns. Proposition 1 then informs us that the first two columns of A, namely
\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}   (3.4)
comprise a basis for R(A). In this case, R(A) = R^2.
Regarding N(A) we express each row of A_red x = 0 as the respective pivot variable in terms of the free. More precisely, x_1 and x_2 are pivot variables and x_3 is free and A_red x = 0 reads
x_1 + x_2 = 0
-x_2 + x_3 = 0
Working from the bottom up we find
x_2 = x_3 \quad\text{and}\quad x_1 = -x_3
and hence every vector in the null space is of the form
x = x_3 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}.
In other words
N(A) = \left\{ x_3 [-1\ 1\ 1]^T : x_3 \in R \right\} \quad\text{and}\quad [-1\ 1\ 1]^T
constitutes a basis for N(A).
Let us stretch this example a bit and see what happens. In particular, we append a new column and arrive at
B = \begin{bmatrix} 1 & 1 & 0 & 2 \\ 1 & 0 & 1 & 3 \end{bmatrix}.
The column space of A was already the whole space and so adding a column changes, with respect to R(A), nothing. That is, R(B) = R(A) and (3.4) is a basis for R(B).
Regarding N(B) we again subtract the first row from the second,
B_{red} = \begin{bmatrix} 1 & 1 & 0 & 2 \\ 0 & -1 & 1 & 1 \end{bmatrix}
and identify x_1 and x_2 as pivot variables and x_3 and x_4 as free. We see that B_red x = 0 means
x_1 + x_2 + 2x_4 = 0
-x_2 + x_3 + x_4 = 0
or, equivalently,
x_2 = x_3 + x_4 \quad\text{and}\quad x_1 = -x_3 - 3x_4
and so
N(B) = \left\{ x_3 [-1\ 1\ 1\ 0]^T + x_4 [-3\ 1\ 0\ 1]^T : x_3 \in R,\ x_4 \in R \right\} \quad\text{and}\quad \left\{ [-1\ 1\ 1\ 0]^T, [-3\ 1\ 0\ 1]^T \right\}
constitutes a basis for N(B).
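These hand calculations are easily checked; a minimal sketch:

  A = [1 1 0; 1 0 1];
  B = [1 1 0 2; 1 0 1 3];
  rref(A), rref(B)     % the fully reduced forms (rref also scales the pivots to 1)
  null(A,'r')          % compare with the basis for N(A) found above
  null(B,'r')          % compare with the basis for N(B)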
3.6. Concluding Remarks
The number of pivots, r, of an m-by-n matrix A appears to be an important indicator. We shall refer to it from now on as the rank of A. Our canonical bases for R(A) and N(A) possess r and n - r elements respectively. The number of elements in a basis for a subspace is typically called the dimension of the subspace.
3.7. Exercises
[1] Which of the following subsets of R^3 are actually subspaces? Check both conditions in definition 2 and show your work.
(a) All vectors whose first component x_1 = 0.
(b) All vectors whose first component x_1 = 1.
(c) All vectors whose first two components obey x_1 x_2 = 0.
(d) The vector (0, 0, 0).
(e) All linear combinations of the pair (1, 1, 0) and (2, 0, 1).
(f) All vectors for which x_3 - x_2 + 3x_1 = 0.
[2] I encourage you to use rref and null for the following.
(i) Add a diagonal crossbar between nodes 3 and 2 in Figure 3.1 and compute bases for the column and null spaces of the new adjacency matrix. As this crossbar fails to stabilize the ladder we shall add one more bar.
(ii) To the 9 bar ladder of (i) add a diagonal cross bar between node 1 and the left end of bar 6. Compute bases for the column and null spaces of the new adjacency matrix.
[3] We wish to show that N(A) = N(A^T A) regardless of A.
(i) We first take a concrete example. Report the findings of null when applied to A and A^T A for the A matrix associated with Figure 3.1.
(ii) For arbitrary A show that N(A) ⊆ N(A^T A), i.e., that if Ax = 0 then A^T Ax = 0.
(iii) For arbitrary A show that N(A^T A) ⊆ N(A), i.e., that if A^T Ax = 0 then Ax = 0. (Hint: if A^T Ax = 0 then x^T A^T Ax = 0 and this says something about ‖Ax‖; recall that ‖y‖^2 ≡ y^T y.)
[4] Suppose that A is m-by-n and that N(A) = R^n. Argue that A must be the zero matrix.
[5] Suppose that {s_1, ..., s_n} and {t_1, ..., t_m} are both bases for the subspace M. Prove that m = n and hence that our notion of dimension makes sense.
4. The Fundamental Theorem of Linear Algebra
4.1. Introduction
The previous chapter, in a sense, only told half of the story. In particular, an m-by-n matrix A maps R^n into R^m and its null space lies in R^n and its column space lies in R^m. Having seen examples where R(A) was a proper subspace of R^m one naturally asks about what is left out. Similarly, one wonders about the subspace of R^n that is complementary to N(A). These questions are answered by the column space and null space of A^T.
4.2. The Row Space
As the columns of A^T are simply the rows of A we call R(A^T) the row space of A. More precisely
Definition 1. The row space of the m-by-n matrix A is simply the span of its rows, i.e.,
R(A^T) \equiv \{A^T y : y \in R^m\}.
This is a subspace of R^n.
Regarding a basis for R(A^T) we recall that the rows of A_red ≡ rref(A) are merely linear combinations of the rows of A and hence
R(A^T) = R((A_{red})^T).
Recalling that the pivot rows of A_red are linearly independent and that all remaining rows of A_red are zero leads us to
Proposition 1. Suppose A is m-by-n. The pivot rows of A_red constitute a basis for R(A^T).
As there are r pivot rows of A_red we find that the dimension of R(A^T) is r. Recalling Proposition 2.2 we find the dimensions of N(A) and R(A^T) to be complementary, i.e., they sum to the dimension of the ambient space, n. Much more in fact is true. Let us compute the dot product of an arbitrary element x ∈ R(A^T) and z ∈ N(A). As x = A^T y for some y we find
x^T z = (A^T y)^T z = y^T A z = 0.
This states that every vector in R(A^T) is perpendicular to every vector in N(A).
Let us test this observation on the A matrix stemming from the unstable ladder of §3.4. Recall that
z_1 = [0\ 1\ 0\ 0\ 0\ 1\ 0\ 0]^T \quad\text{and}\quad z_2 = [0\ 0\ 0\ 1\ 0\ 0\ 0\ 1]^T
constitute a basis for N(A) while the pivot rows of A_red are
x_1 = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]^T, \quad x_2 = [0\ 1\ 0\ 0\ 0\ {-1}\ 0\ 0]^T, \quad x_3 = [0\ 0\ 1\ 0\ 0\ 0\ 0\ 0]^T,
and
x_4 = [0\ 0\ 0\ 1\ 0\ 0\ 0\ {-1}]^T, \quad x_5 = [0\ 0\ 0\ 0\ 1\ 0\ 0\ 0]^T, \quad x_6 = [0\ 0\ 0\ 0\ 0\ 0\ 1\ 0]^T.
Indeed, each z_j is perpendicular to each x_k. As a result,
\{z_1, z_2, x_1, x_2, x_3, x_4, x_5, x_6\}
comprises a set of 8 linearly independent vectors in R^8. These vectors then necessarily span R^8. For, if they did not, there would exist nine linearly independent vectors in R^8! In general, we find
Fundamental Theorem of Linear Algebra (Preliminary). Suppose A is m-by-n and has rank r. The row space, R(A^T), and the null space, N(A), are respectively r and n - r dimensional subspaces of R^n. Each x ∈ R^n may be uniquely expressed in the form
x = x_R + x_N, \quad\text{where}\quad x_R \in R(A^T) \ \text{and}\ x_N \in N(A).   (4.1)
Recalling Definition 4 from Chapter 3 we may interpret (4.1) as
R^n = R(A^T) \oplus N(A).
As the constituent subspaces have been shown to be orthogonal we speak of R^n as the orthogonal direct sum of R(A^T) and N(A).
4.3. The Left Null Space
The Fundamental Theorem will more than likely say that R^m = R(A) ⊕ N(A^T). In fact, this is already in the preliminary version. To coax it out we realize that there was nothing special about the choice of letters used. Hence, if B is p-by-q then the preliminary version states that R^q = R(B^T) ⊕ N(B). As a result, letting B = A^T, p = n and q = m, we find indeed R^m = R(A) ⊕ N(A^T). That is, the left null space, N(A^T), is the orthogonal complement of the column space, R(A). The word left stems from the fact that A^T y = 0 is equivalent to y^T A = 0, where y acts on A from the left.
In order to compute a basis for N(A^T) we merely mimic the construction of the previous section. Namely, we compute (A^T)_red and then solve for the pivot variables in terms of the free ones.
With respect to the A matrix associated with the unstable ladder of §3.4, we find
A^T = \begin{bmatrix}
1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{bmatrix}
and
(A^T)_{red} = \mathrm{rref}(A^T) = \begin{bmatrix}
1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
48
We recognize the rank of A^T to be 6, with pivot and free indices
  {1, 2, 4, 5, 6, 7} and {3, 8}
respectively. Solving (A^T)_red x = 0 for the pivot variables in terms of the free we find
  x_7 = x_8, x_6 = x_8, x_5 = 0, x_4 = 0, x_2 = x_3, x_1 = x_3,
or in vector form,
  x = x_3 [1 1 1 0 0 0 0 0]^T + x_8 [0 0 0 0 0 1 1 1]^T.
These two vectors constitute a basis for N(A^T) and indeed they are both orthogonal to every column of A. We have now exhibited means by which one may assemble bases for the four fundamental subspaces. In the process we have established
Fundamental Theorem of Linear Algebra. Suppose A is m-by-n and has rank r. One has the orthogonal direct sums
  R^n = R(A^T) ⊕ N(A)  and  R^m = R(A) ⊕ N(A^T)
where the dimensions are
  dim R(A) = dim R(A^T) = r
  dim N(A) = n - r
  dim N(A^T) = m - r.
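Each of these constructions is easy to check numerically. The following is a minimal Matlab sketch (our own illustration; the small matrix A and the use of orth and null in place of the pivot-row construction are choices made here, not taken from the text) that assembles bases for the four subspaces and confirms the two orthogonality relations.
  % bases for the four fundamental subspaces of A
  A = [1 1 1; 1 0 1];           % any m-by-n matrix will do here
  r = rank(A);
  RowBasis  = orth(A');         % r columns spanning R(A^T)
  NullBasis = null(A);          % n-r columns spanning N(A)
  ColBasis  = orth(A);          % r columns spanning R(A)
  LeftNull  = null(A');         % m-r columns spanning N(A^T)
  norm(RowBasis'*NullBasis)     % zero: R(A^T) is perpendicular to N(A)
  norm(ColBasis'*LeftNull)      % zero: R(A) is perpendicular to N(A^T)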
We shall see many applications of this fundamental theorem. Perhaps one of the most common is the use of the orthogonality of R(A) and N(A^T) in the characterization of those b for which an x exists for which Ax = b. There are many instances for which R(A) is quite large and unwieldy while N(A^T) is small and therefore simpler to grasp. As an example, consider the (n - 1)-by-n first order difference matrix with -1 on the diagonal and 1 on the super diagonal,
  A = [ -1  1  0  ⋯  0
         0 -1  1  ⋯  0
               ⋱  ⋱
         0  ⋯  0 -1  1 ].
It is not difficult to see that N(A^T) = {0} and so R(A), being its orthogonal complement, is the entire space, R^(n-1). That is, for each b ∈ R^(n-1) there exists an x ∈ R^n such that Ax = b. The uniqueness of such an x is decided by N(A). We recognize this latter space as the span of the vector of ones. But you already knew that adding a constant to a function does not change its derivative.
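The difference-matrix claims are also easily tested in Matlab; here is a small sketch of our own (the choice n = 5 and the use of pinv are ours).
  n = 5;
  A = -eye(n-1,n) + [zeros(n-1,1) eye(n-1)];   % first order difference matrix
  null(A')                       % empty, so N(A^T) = {0} and R(A) = R^(n-1)
  null(A)                        % a single column, a multiple of the ones vector
  b = (1:n-1)';                  % any b in R^(n-1) is attainable
  x = pinv(A)*b;                 % one particular solution
  norm(A*x - b)                  % (numerically) zero
  norm(A*(x + 7*ones(n,1)) - b)  % still zero: adding a constant changes nothing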
4.4. Exercises
[1] True or false: support your answer.
  (i) If A is square then R(A) = R(A^T).
  (ii) If A and B have the same four fundamental subspaces then A = B.
[2] Construct bases (by hand) for the four subspaces associated with
  A = [ 1 1 1
        1 0 1 ].
Also provide a careful sketch of these subspaces.
[3] Show that if AB = 0 then R(B) ⊆ N(A).
[4] Why is there no matrix whose row space and null space both contain the vector [1 1 1]^T?
[5] Write down a matrix with the required property or explain why no such matrix exists.
  (a) Column space contains [1 0 0]^T and [0 0 1]^T while row space contains [1 1]^T and [1 2]^T.
  (b) Column space has basis [1 1 1]^T while null space has basis [1 2 1]^T.
  (c) Column space is R^4 while row space is R^3.
[6] One often constructs matrices via outer products, e.g., given an n-by-1 vector v let us consider A = vv^T.
  (a) Show that v is a basis for R(A).
  (b) Show that N(A) coincides with all vectors perpendicular to v.
  (c) What is the rank of A?
5. Least Squares
5.1. Introduction
We learned in the previous chapter that Ax = b need not possess a solution when the number of rows of A exceeds its rank, i.e., r < m. As this situation arises quite often in practice, typically in the guise of more equations than unknowns, we establish a rationale for the absurdity Ax = b.
5.2. The Normal Equations
The goal is to choose x such that Ax is as close as possible to b. Measuring closeness in terms of the sum of the squares of the components we arrive at the least squares problem of minimizing
  ‖Ax - b‖² ≡ (Ax - b)^T (Ax - b)   (5.1)
over all x ∈ R^n. The path to the solution is illuminated by the Fundamental Theorem. More precisely, we write
  b = b_R + b_N  where  b_R ∈ R(A) and b_N ∈ N(A^T).
On noting that (i) (Ax - b_R) ∈ R(A) for every x ∈ R^n and (ii) R(A) ⊥ N(A^T) we arrive at the Pythagorean Theorem
  ‖Ax - b‖² = ‖Ax - b_R - b_N‖² = ‖Ax - b_R‖² + ‖b_N‖².   (5.2)
It is now clear from (5.2) that the best x is the one that satisfies
  Ax = b_R.   (5.3)
As b_R ∈ R(A) this equation indeed possesses a solution. We have yet however to specify how one computes b_R given b. Although an explicit expression for b_R, the so called orthogonal projection of b onto R(A), in terms of A and b is within our grasp we shall, strictly speaking, not require it. To see this, let us note that if x satisfies (5.3) then
  Ax - b = Ax - b_R - b_N = -b_N.   (5.4)
As b_N is no more easily computed than b_R you may claim that we are just going in circles. The practical information in (5.4) however is that (Ax - b) ∈ N(A^T), i.e., A^T(Ax - b) = 0, i.e.,
  A^T Ax = A^T b.   (5.5)
As A^T b ∈ R(A^T) regardless of b this system, often referred to as the normal equations, indeed has a solution. This solution is unique so long as the columns of A^T A are linearly independent, i.e., so long as N(A^T A) = {0}. Recalling Chapter 2, Exercise 2, we note that this is equivalent to N(A) = {0}. We summarize our findings in
Theorem 1. The set of x ∈ R^n for which the misfit ‖Ax - b‖² is smallest is composed of those x for which
  A^T Ax = A^T b.
There is always at least one such x. There is exactly one such x iff N(A) = {0}.
As a concrete example, suppose with reference to the figure below that
  A = [ 1 1
        0 1
        0 0 ]   and   b = [ 1
                            1
                            1 ].
Figure 5.1. The decomposition of b into b_R ∈ R(A) and b_N.
As b ∉ R(A) there is no x such that Ax = b. Indeed,
  ‖Ax - b‖² = (x_1 + x_2 - 1)² + (x_2 - 1)² + 1 ≥ 1,
with the minimum uniquely attained at
  x = [ 0
        1 ],
in agreement with the unique solution of (5.5), for
  A^T A = [ 1 1
            1 2 ]   and   A^T b = [ 1
                                    2 ].
We now recognize, a posteriori, that
  b_R = Ax = [ 1
               1
               0 ]
is the orthogonal projection of b onto the column space of A.
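For the record, this small example can be confirmed in a few lines of Matlab (our own check, not part of the original text).
  A = [1 1; 0 1; 0 0];  b = [1; 1; 1];
  x = (A'*A) \ (A'*b)          % the normal equations (5.5) give x = [0; 1]
  A*x                          % b_R = [1; 1; 0], as claimed
  norm(A*x - b)^2              % the smallest attainable misfit, 1
  A\b                          % backslash on a rectangular A returns the same x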
5.3. Applying Least Squares to the Biaxial Test Problem
We shall formulate the identification of the 20 fiber stiffnesses in Figure 2.3 as a least squares problem. We envision loading, f, the 9 nodes and measuring the associated 18 displacements, x. From knowledge of x and f we wish to infer the components of K = diag(k) where k is the vector of unknown fiber stiffnesses. The first step is to recognize that
  A^T KAx = f
may be written as
  Bk = f  where  B = A^T diag(Ax).   (5.6)
Though conceptually simple this is not of great use in practice, for B is 18-by-20 and hence (5.6) possesses many solutions. The way out is to compute k as the result of more than one experiment. We shall see that, for our small sample, 2 experiments will suffice.
To be precise, we suppose that x^(1) is the displacement produced by loading f^(1) while x^(2) is the displacement produced by loading f^(2). We then piggyback the associated pieces in
  B = [ A^T diag(Ax^(1))
        A^T diag(Ax^(2)) ]   and   f = [ f^(1)
                                         f^(2) ].
This B is 36-by-20 and so the system Bk = f is overdetermined and hence ripe for least squares.
We proceed then to assemble B and f. We suppose f^(1) and f^(2) to correspond to horizontal and vertical stretching
  f^(1) = [1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0]^T
  f^(2) = [0 1 0 1 0 1 0 0 0 0 0 0 0 1 0 1 0 1]^T
respectively. For the purpose of our example we suppose that each k_j = 1 except k_8 = 5. We assemble A^T KA as in Chapter 2 and solve
  A^T KAx^(j) = f^(j)
with the help of the pseudoinverse. In order to impart some reality to this problem we taint each x^(j) with 10 percent noise prior to constructing B. Please see the attached M-file for details. Regarding
  B^T Bk = B^T f
we note that Matlab solves this system when presented with k=B\f when B is rectangular. We have plotted the results of this procedure in the figure below.
Figure 5.2. Results of a successful biaxial test (recovered fiber stiffness versus fiber number).
The stiff fiber is readily identified.
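The assembly just described can be sketched in a few lines of Matlab. The sketch below is schematic rather than a transcript of the attached M-file: it assumes the truss matrix A and the loads f1 and f2 of this section are already in the workspace, and the variable names and the multiplicative form of the noise are our own choices.
  ktrue = ones(20,1);  ktrue(8) = 5;       % the stiffness profile of the example
  x1 = pinv(A'*diag(ktrue)*A)*f1;          % displacements from experiment 1
  x2 = pinv(A'*diag(ktrue)*A)*f2;          % displacements from experiment 2
  x1 = x1.*(1 + 0.1*randn(size(x1)));      % taint each with 10 percent noise
  x2 = x2.*(1 + 0.1*randn(size(x2)));
  B = [A'*diag(A*x1); A'*diag(A*x2)];      % stack the two experiments, as in (5.6)
  f = [f1; f2];
  k = B\f;                                 % least squares estimate of the stiffnesses
  bar(k)                                   % compare with ktrue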
5.4. Projections
From an algebraic point of view (5.5) is an elegant reformulation of the least squares problem. Though easy to remember it unfortunately obscures the geometric content, suggested by the word projection, of (5.4). As projections arise frequently in many applications we pause here to develop them more carefully.
With respect to the normal equations we note that if N(A) = {0} then
  x = (A^T A)^(-1) A^T b
and so the orthogonal projection of b onto R(A) is
  b_R = Ax = A(A^T A)^(-1) A^T b.   (5.7)
Defining
  P = A(A^T A)^(-1) A^T,   (5.8)
(5.7) takes the form b_R = Pb. Commensurate with our notion of what a projection should be we expect that P map vectors not in R(A) onto R(A) while leaving vectors already in R(A) unscathed. More succinctly, we expect that Pb_R = b_R, i.e., PPb = Pb. As the latter should hold for all b ∈ R^m we expect that
  P² = P.   (5.9)
With respect to (5.8) we find that indeed
  P² = A(A^T A)^(-1) A^T A(A^T A)^(-1) A^T = A(A^T A)^(-1) A^T = P.
We also note that the P in (5.8) is symmetric. We dignify these properties through
Definition 1. A matrix P that satisfies P² = P is called a projection. A symmetric projection is called an orthogonal projection.
We have taken some pains to motivate the use of the word projection. You may be wondering however what symmetry has to do with orthogonality. We explain this in terms of the tautology
  b = Pb + (I - P)b.
Now, if P is a projection then so too is (I - P). Moreover, if P is symmetric then the dot product of b's two constituents is
  (Pb)^T (I - P)b = b^T P^T (I - P)b = b^T (P - P²)b = b^T 0b = 0,
i.e., Pb is orthogonal to (I - P)b.
As examples of nonorthogonal projections we offer
  [ 1 0        [  1    0    0
    1 0 ]  and   -1/2  0    0
                 -1/4 -1/2  1 ].
Finally, let us note that the central formula, P = A(A^T A)^(-1) A^T, is even a bit more general than advertised. It has been billed as the orthogonal projection onto the column space of A. The need often arises however for the orthogonal projection onto some arbitrary subspace M. The key to using the old P is simply to realize that every subspace is the column space of some matrix. More precisely, if
  {x_1, . . . , x_m}
is a basis for M then clearly if these x_j are placed into the columns of a matrix called A then R(A) = M. For example, if M is the line through [1 1]^T then
  P = [ 1   (1/2) [1 1] = (1/2) [ 1 1
        1 ]                      1 1 ]
is orthogonal projection onto M.
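The same two lines serve for any subspace M once a basis is in hand. A small Matlab illustration (our own; the plane below is an arbitrary choice):
  A = [1 0; 1 1; 1 2];          % columns span a plane M in R^3
  P = A*((A'*A)\A');            % orthogonal projection onto M = R(A)
  norm(P*P - P)                 % P is a projection
  norm(P - P')                  % and it is symmetric, hence orthogonal
  b = [1; 0; 0];
  A'*(b - P*b)                  % the residual b - Pb is perpendicular to M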
5.5. Exercises
[1] A steel beam was stretched to lengths ℓ = 6, 7, and 8 feet under applied forces of f = 1, 2, and 4 tons. Assuming Hooke's law, ℓ - L = cf, find its compliance, c, and original length, L, by least squares.
[2] With regard to the example of §5.3 note that, due to the random generation of the noise that taints the displacements, one gets a different answer every time the code is invoked.
  (i) Write a loop that invokes the code a statistically significant number of times and submit bar plots of the average fiber stiffness and its standard deviation for each fiber, along with the associated M-file.
  (ii) Experiment with various noise levels with the goal of determining the level above which it becomes difficult to discern the stiff fiber. Carefully explain your findings.
[3] Find the matrix that projects R^3 onto the line spanned by [1 0 1]^T.
[4] Find the matrix that projects R^3 onto the plane spanned by [1 0 1]^T and [1 1 1]^T.
[5] If P is the projection of R^m onto a k-dimensional subspace M, what is the rank of P and what is R(P)?
6. Matrix Methods for Dynamical Systems
6.1. Introduction
Up to this point we have largely been concerned with (i) deriving linear systems of algebraic equations (from considerations of static equilibrium) and (ii) the solution of such systems via Gaussian elimination.
In this section we hope to begin to persuade the reader that our tools extend in a natural fashion to the class of dynamic processes. More precisely, we shall argue that (i) Matrix Algebra plays a central role in the derivation of mathematical models of dynamical systems and that, with the aid of the Laplace transform in an analytical setting or the Backward Euler method in the numerical setting, (ii) Gaussian elimination indeed produces the solution.
6.2. Nerve Fibers and the Dynamic Strang Quartet
A nerve fiber's natural electrical stimulus is not direct current but rather a short burst of current, the so-called nervous impulse. In such a dynamic environment the cell's membrane behaves not only like a leaky conductor but also like a charge separator, or capacitor.
Figure 6.1. An RC model of a nerve fiber, with stimulus i0, cell body elements C_cb and R_cb, membrane elements C_m, R_m, E_m, axial resistors R_i, node potentials x_1, x_2, x_3 and edge currents y_1, . . . , y_8.
The typical value of a cell's membrane capacitance is
  c = 1 μF/cm²
where μF denotes microFarad. The capacitance of a single compartment is therefore
  C_m = 2πa(ℓ/N)c
and runs parallel to each R_m, see figure 6.1. This figure also differs from figure 1.4 in that it possesses two edges to the left of the stimuli. These edges serve to mimic that portion of the stimulus current that is shunted by the cell body. If A_cb denotes the surface area of the cell body then its capacitance and resistance are
  C_cb = A_cb c  and  R_cb = ρ_m/A_cb
respectively. We ask now how the static Strang Quartet of chapter one should be augmented. Regarding (S1) we proceed as before. The voltage drops are
  e_1 = x_1, e_2 = x_1 - E_m, e_3 = x_1 - x_2, e_4 = x_2,
  e_5 = x_2 - E_m, e_6 = x_2 - x_3, e_7 = x_3, e_8 = x_3 - E_m,
and so
  e = b - Ax  where  b = -E_m [0 1 0 0 1 0 0 1]^T  and  A = [ -1  0  0
                                                              -1  0  0
                                                              -1  1  0
                                                               0 -1  0
                                                               0 -1  0
                                                               0 -1  1
                                                               0  0 -1
                                                               0  0 -1 ].
In (S2) we must now augment Ohm's law with the voltage-current law obeyed by a capacitor, namely: the current through a capacitor is proportional to the time rate of change of the potential across it. This yields, (denoting d/dt by ′),
  y_1 = C_cb e_1′, y_2 = e_2/R_cb, y_3 = e_3/R_i, y_4 = C_m e_4′,
  y_5 = e_5/R_m, y_6 = e_6/R_i, y_7 = C_m e_7′, y_8 = e_8/R_m,
or, in matrix terms,
  y = Ge + Ce′
where
  G = diag(0, 1/R_cb, 1/R_i, 0, 1/R_m, 1/R_i, 0, 1/R_m)  and
  C = diag(C_cb, 0, 0, C_m, 0, 0, C_m, 0)
are the conductance and capacitance matrices.
As Kirchhoff's Current Law is insensitive to the type of device occupying an edge, step (S3) proceeds exactly as above,
  i_0 - y_1 - y_2 - y_3 = 0,  y_3 - y_4 - y_5 - y_6 = 0,  y_6 - y_7 - y_8 = 0,
or, in matrix terms,
  A^T y = -f  where  f = [i_0 0 0]^T.
Step (S4) remains one of assembling,
  A^T y = -f ⇒ A^T(Ge + Ce′) = -f ⇒ A^T(G(b - Ax) + C(b′ - Ax′)) = -f,
becomes
  A^T CAx′ + A^T GAx = A^T Gb + f + A^T Cb′.   (6.1)
This is the general form of the potential equations for an RC circuit. It presumes of the user knowledge of the initial value of each of the potentials,
  x(0) = X.   (6.2)
Regarding the circuit of figure 6.1 we find
  A^T CA = [ C_cb  0    0
             0     C_m  0
             0     0    C_m ],
  A^T GA = [ G_cb + G_i   -G_i           0
             -G_i          2G_i + G_m   -G_i
             0            -G_i           G_i + G_m ],
  A^T Gb = E_m [ G_cb
                 G_m
                 G_m ]   and   A^T Cb′ = [ 0
                                           0
                                           0 ],
and an initial (rest) potential of
  x(0) = E_m [1 1 1]^T.
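These matrices are easily assembled and checked in Matlab. The sketch below is ours, with arbitrary illustrative parameter values (not those of the M-files referenced later); it confirms in particular that the rest potential solves the static problem when the stimulus is absent.
  Em = 70; Gcb = 1; Gi = 2; Gm = 0.5; Ccb = 1; Cm = 1;   % made-up values
  A = [-1 0 0; -1 0 0; -1 1 0; 0 -1 0; 0 -1 0; 0 -1 1; 0 0 -1; 0 0 -1];
  G = diag([0 Gcb Gi 0 Gm Gi 0 Gm]);
  C = diag([Ccb 0 0 Cm 0 0 Cm 0]);
  b = -Em*[0 1 0 0 1 0 0 1]';
  A'*C*A                        % diag(Ccb, Cm, Cm)
  A'*G*A                        % the tridiagonal conductance matrix above
  (A'*G*A) \ (A'*G*b)           % returns Em*[1;1;1], the rest potential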
We shall now outline two modes of attack on such problems. The Laplace Transform is an analytical tool that produces exact, closed-form, solutions for small tractable systems and therefore offers insight into how larger systems should behave. The Backward-Euler method is a technique for solving a discretized (and therefore approximate) version of (6.1). It is highly flexible, easy to code, and works on problems of great size. Both the Backward-Euler and Laplace Transform methods require, at their core, the algebraic solution of a linear system of equations. In deriving these methods we shall find it more convenient to proceed from the generic system
  x′ = Bx + g.   (6.3)
With respect to our fiber problem
  B = -(A^T CA)^(-1) A^T GA
    = [ -(G_cb + G_i)/C_cb    G_i/C_cb              0
         G_i/C_m             -(2G_i + G_m)/C_m      G_i/C_m
         0                    G_i/C_m              -(G_i + G_m)/C_m ]   (6.4)
and
  g = (A^T CA)^(-1)(A^T Gb + f) = [ (G_cb E_m + i_0)/C_cb
                                     E_m G_m/C_m
                                     E_m G_m/C_m ].
6.3. The Laplace Transform
The Laplace Transform is typically credited with taking dynamical problems into static problems. Recall that the Laplace Transform of the function h is
  (Lh)(s) ≡ ∫_0^∞ e^(-st) h(t) dt,
where s is a complex variable. We shall soon take a 2 chapter dive into the theory of complex variables and functions. But for now let us proceed calmly and confidently and follow Matlab's lead. For example
  >> syms t
  >> laplace(exp(t))
  ans = 1/(s-1)
  >> laplace(t*exp(-t))
  ans = 1/(s+1)^2
The Laplace Transform of a matrix of functions is simply the matrix of Laplace transforms of the individual elements. For example
  L[ e^t        = [ 1/(s - 1)
     t e^(-t) ]     1/(s + 1)² ].
Now, in preparing to apply the Laplace transform to (6.3) we write it as
  Lx′ = L(Bx + g)   (6.5)
and so must determine how L acts on derivatives and sums. With respect to the latter it follows directly from the definition that
  L(Bx + g) = LBx + Lg = BLx + Lg.   (6.6)
Regarding its effect on the derivative we find, on integrating by parts, that
  Lx′ = ∫_0^∞ e^(-st) x′(t) dt = x(t)e^(-st) |_0^∞ + s ∫_0^∞ e^(-st) x(t) dt.
Supposing that x and s are such that x(t)e^(-st) → 0 as t → ∞ we arrive at
  Lx′ = sLx - x(0).   (6.7)
Now, upon substituting (6.6) and (6.7) into (6.5) we find
  sLx - x(0) = BLx + Lg,
which is easily recognized to be a linear system for Lx, namely
  (sI - B)Lx = Lg + x(0).   (6.8)
The only thing that distinguishes this system from those encountered since chapter 1 is the presence of the complex variable s. This complicates the mechanical steps of Gaussian Elimination or the Gauss-Jordan Method but the methods indeed apply without change. Taking up the latter method, we write
  Lx = (sI - B)^(-1)(Lg + x(0)).
The matrix (sI - B)^(-1) is typically called the resolvent of B at s. We turn to Matlab for its symbolic calculation. For example,
  >> B = [2 -1;-1 2]
  >> R = inv(s*eye(2)-B)
  R =
  [ (s-2)/(s*s-4*s+3), -1/(s*s-4*s+3)]
  [ -1/(s*s-4*s+3), (s-2)/(s*s-4*s+3)]
We note that (sI - B)^(-1) is well defined except at the roots of the quadratic, s² - 4s + 3. This quadratic is the determinant of (sI - B) and is often referred to as the characteristic polynomial of B. Its roots are called the eigenvalues of B.
As a second example let us take the B matrix of (6.4) with the parameter choices specified in fib3.m, namely
  B = [ -0.135   0.125   0
         0.5    -1.01    0.5
         0       0.5    -0.51 ]   (6.9)
The associated (sI - B)^(-1) is a bit bulky, please run fib3.m, so we display here only the denominator of each term, i.e.,
  s³ + 1.655s² + 0.4078s + 0.0039.   (6.10)
Assuming a current stimulus of the form i_0(t) = t³ exp(-t/6)/10000, and E_m = 0, brings
  (Lg)(s) = [ 0.191/(s + 1/6)⁴
              0
              0 ]
and so (6.10) persists in
  Lx = (sI - B)^(-1) Lg
     = 0.191 / ((s + 1/6)⁴ (s³ + 1.655s² + 0.4078s + 0.0039)) [ s² + 1.5s + 0.27
                                                                0.5s + 0.26
                                                                0.2497 ].
Now comes the rub. A simple linear solve (or inversion) has left us with the Laplace transform of x. The accursed No Free Lunch Theorem informs us that we shall have to do some work in order to recover x from Lx.
6.4. The Inverse Laplace Transform
In coming sections we shall establish that the inverse Laplace transform of a function h is
  (L^(-1)h)(t) = (1/2π) ∫_(-∞)^(+∞) e^((c+iy)t) h(c + iy) dy,   (6.11)
where i ≡ √(-1) and the real number c is chosen so that all of the singularities of h lie to the left of the line of integration. With the inverse Laplace transform one may express the solution of (6.3) as
  x(t) = L^(-1)((sI - B)^(-1)(Lg + x(0))).   (6.12)
As an example, let us take the first component of Lx, namely
  Lx_1(s) = 0.19(s² + 1.5s + 0.27) / ((s + 1/6)⁴ (s³ + 1.655s² + 0.4078s + 0.0039)).
The singularities, or poles, are the points s at which Lx_1(s) blows up. These are clearly the roots of its denominator, namely
  -1/100,  -329/400 ± √73/16  and  -1/6.   (6.13)
All four being negative it suffices to take c = 0 and so the integration in (6.11) proceeds up the imaginary axis. We don't suppose the reader to have already encountered integration in the complex plane but hope that this example might provide the motivation necessary for a brief overview of such. Before that however we note that Matlab has digested the calculus we wish to develop. Referring again to fib3.m for details we note that the ilaplace command produces
  x_1(t) = 211.35 exp(-t/100)
           - (0.0554t³ + 4.5464t² + 1.085t + 474.19) exp(-t/6)
           + exp(-329t/400)(262.842 cosh(√73 t/16) + 262.836 sinh(√73 t/16))
Figure 6.2. The 3 potentials (x, in mV) associated with figure 6.1, plotted against t (ms).
The other potentials, see the figure above, possess similar expressions. Please note that each of the poles of Lx_1 appear as exponents in x_1 and that the coefficients of the exponentials are polynomials whose degrees are determined by the orders of the respective poles.
6.5. The Backward-Euler Method
Where in the previous section we tackled the derivative in (6.3) via an integral transform we pursue in this section a much simpler strategy, namely, replace the derivative with a finite difference quotient. That is, one chooses a small dt and replaces (6.3) with
  (x̃(t) - x̃(t - dt))/dt = Bx̃(t) + g(t).   (6.14)
The utility of (6.14) is that it gives a means of solving for x̃ at the present time, t, from knowledge of x̃ in the immediate past, t - dt. For example, as x̃(0) = x(0) is supposed known we write (6.14) as
  (I/dt - B)x̃(dt) = x̃(0)/dt + g(dt).
Solving this for x̃(dt) we return to (6.14) and find
  (I/dt - B)x̃(2dt) = x̃(dt)/dt + g(2dt)
and solve for x̃(2dt). The general step from past to present,
  x̃(jdt) = (I/dt - B)^(-1)(x̃((j - 1)dt)/dt + g(jdt)),   (6.15)
is repeated until some desired final time, Tdt, is reached. This equation has been implemented in fib3.m with dt = 1 and B and g as above. The resulting x̃ (run fib3 yourself!) is indistinguishable from figure 6.2.
Comparing the two representations, (6.12) and (6.15), we see that they both produce the solution to the general linear system of ordinary differential equations, (6.3), by simply inverting a shifted copy of B. The former representation is hard but exact while the latter is easy but approximate. Of course we should expect the approximate solution, x̃, to approach the exact solution, x, as the time step, dt, approaches zero. To see this let us return to (6.15) and assume, for now, that g ≡ 0. In this case, one can reverse the above steps and arrive at the representation
  x̃(jdt) = ((I - dtB)^(-1))^j x(0).   (6.16)
Now, for a fixed time t we suppose that dt = t/j and ask whether
  x(t) = lim_(j→∞) ((I - (t/j)B)^(-1))^j x(0).
This limit, at least when B is one-by-one, yields the exponential
  x(t) = exp(Bt)x(0),
clearly the correct solution to (6.3). A careful explication of the matrix exponential and its relationship to (6.12) will have to wait until we have mastered the inverse Laplace transform.
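Before leaving this section we note that (6.15) is only a few lines of Matlab. The loop below is our own minimal version, not a transcript of fib3.m: it uses the B of (6.9), a constant stand-in for the forcing term g, and arbitrary choices of dt and final time.
  B  = [-0.135 0.125 0; 0.5 -1.01 0.5; 0 0.5 -0.51];   % from (6.9)
  g  = [1; 0; 0];               % a constant stand-in for the true, time-varying g
  dt = 1;  T = 400;
  x  = zeros(3,1);              % rest initial condition (E_m = 0)
  M  = eye(3)/dt - B;           % the shifted matrix (I/dt - B) of (6.15)
  X  = zeros(3,T);
  for j = 1:T
    x = M \ (x/dt + g);         % one linear solve per time step
    X(:,j) = x;
  end
  plot(dt*(1:T), X')            % the three potentials versus time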
6.6. Exercises
[1] Compute, without the aid of a machine, the Laplace transforms of e^t and te^(-t). Show ALL of your work.
[2] Extract from fib3.m analytical expressions for x_2 and x_3.
[3] Use eig to compute the eigenvalues of B as given in (6.9). Use det to compute the characteristic polynomial of B. Use roots to compute the roots of this characteristic polynomial. Compare these to the results of eig. How does Matlab compute the roots of a polynomial? (type help roots for the answer).
[4] Adapt the Backward Euler portion of fib3.m so that one may specify an arbitrary number of compartments, as in fib1.m. Submit your well documented M-file along with a plot of x_1 and x_10 versus time (on the same well labeled graph) for a nine compartment fiber of length ℓ = 1 cm.
[5] Derive (6.16) from (6.15) by working backwards toward x(0). Along the way you should explain why (I/dt - B)^(-1)/dt = (I - dtB)^(-1).
[6] Show, for scalar B, that ((1 - (t/j)B)^(-1))^j → exp(Bt) as j → ∞. Hint: By definition
  ((1 - (t/j)B)^(-1))^j = exp(j log(1/(1 - (t/j)B)));
now use L'Hopital's rule to show that j log(1/(1 - (t/j)B)) → Bt.
7. Complex Numbers, Vectors, Matrices and Functions
7.1. Complex Numbers, Vectors and Matrices
A complex number is simply a pair of real numbers. In order to stress however that the two arithmetics differ we separate the two real pieces by the symbol +i. More precisely, each complex number, z, may be uniquely expressed by the combination x + iy where x and y are real and i denotes √(-1). We call x the real part and y the imaginary part of z. We now summarize the main rules of complex arithmetic. If
  z_1 = x_1 + iy_1  and  z_2 = x_2 + iy_2
then
  z_1 + z_2 ≡ (x_1 + x_2) + i(y_1 + y_2)
  z_1 z_2 ≡ (x_1 + iy_1)(x_2 + iy_2) = (x_1x_2 - y_1y_2) + i(x_1y_2 + x_2y_1)
  z̄_1 ≡ x_1 - iy_1
  z_1/z_2 ≡ (z_1 z̄_2)/(z_2 z̄_2) = ((x_1x_2 + y_1y_2) + i(x_2y_1 - x_1y_2))/(x_2² + y_2²)
  |z_1| ≡ √(z_1 z̄_1) = √(x_1² + y_1²).
In addition to the Cartesian representation z = x + iy one also has the polar form
  z = |z|(cos θ + i sin θ),  where θ ∈ (-π, π] and
  θ = atan2(y, x) ≡ { π/2,               if x = 0, y > 0,
                      -π/2,              if x = 0, y < 0,
                      arctan(y/x),       if x > 0,
                      arctan(y/x) + π,   if x < 0, y ≥ 0,
                      arctan(y/x) - π,   if x < 0, y < 0.
This form is especially convenient with regards to multiplication. More precisely,
  z_1 z_2 = |z_1||z_2|{(cos θ_1 cos θ_2 - sin θ_1 sin θ_2) + i(cos θ_1 sin θ_2 + sin θ_1 cos θ_2)}
          = |z_1||z_2|{cos(θ_1 + θ_2) + i sin(θ_1 + θ_2)}.
As a result,
  z^n = |z|^n (cos(nθ) + i sin(nθ)).
A complex vector (matrix) is simply a vector (matrix) of complex numbers. Vector and matrix addition proceed, as in the real case, from elementwise addition. The dot or inner product of two complex vectors requires, however, a little modification. This is evident when we try to use the old notion to define the length of a complex vector. To wit, note that if
  z = [ 1 + i
        1 - i ]
then
  z^T z = (1 + i)² + (1 - i)² = 1 + 2i - 1 + 1 - 2i - 1 = 0.
Now length should measure the distance from a point to the origin and should only be zero for the zero vector. The fix, as you have probably guessed, is to sum the squares of the magnitudes of the components of z. This is accomplished by simply conjugating one of the vectors. Namely, we define the length of a complex vector via
  ‖z‖ = √(z̄^T z).   (7.1)
In the example above this produces
  √(|1 + i|² + |1 - i|²) = √4 = 2.
As each real number is the conjugate of itself, this new definition subsumes its real counterpart.
The notion of magnitude also gives us a way to define limits and hence will permit us to introduce complex calculus. We say that the sequence of complex numbers, {z_n : n = 1, 2, . . .}, converges to the complex number z_0 and write
  z_n → z_0  or  z_0 = lim_(n→∞) z_n,
when, presented with any ε > 0 one can produce an integer N for which |z_n - z_0| < ε when n ≥ N. As an example, we note that (i/2)^n → 0.
7.2. Complex Functions
A complex function is merely a rule for assigning certain complex numbers to other complex numbers. The simplest (nonconstant) assignment is the identity function f(z) ≡ z. Perhaps the next simplest function assigns to each number its square, i.e., f(z) ≡ z². As we decomposed the argument of f, namely z, into its real and imaginary parts, we shall also find it convenient to partition the value of f, z² in this case, into its real and imaginary parts. In general, we write
  f(x + iy) = u(x, y) + iv(x, y)
where u and v are both real-valued functions of two real variables. In the case that f(z) ≡ z² we find
  u(x, y) = x² - y²  and  v(x, y) = 2xy.
With the tools of the previous section we may produce complex polynomials
  f(z) = z^m + c_(m-1)z^(m-1) + ⋯ + c_1 z + c_0.
We say that such an f is of degree m. We shall often find it convenient to represent polynomials as the product of their factors, namely
  f(z) = (z - λ_1)^(m_1)(z - λ_2)^(m_2) ⋯ (z - λ_h)^(m_h).   (7.2)
Each λ_j is a root of f of degree m_j. Here h is the number of distinct roots of f. We call λ_j a simple root when m_j = 1. In chapter 6 we observed the appearance of ratios of polynomials or so called rational functions. Suppose
  q(z) = f(z)/g(z)
is rational, that f is of order at most m - 1 while g is of order m with the simple roots {λ_1, . . . , λ_m}. It should come as no surprise that such a q should admit a Partial Fraction Expansion
  q(z) = Σ_(j=1)^m q_j/(z - λ_j).
One uncovers the q_j by first multiplying each side by (z - λ_j) and then letting z tend to λ_j. For example, if
  1/(z² + 1) = q_1/(z + i) + q_2/(z - i)   (7.3)
then multiplying each side by (z + i) produces
  1/(z - i) = q_1 + q_2(z + i)/(z - i).
Now, in order to isolate q_1 it is clear that we should set z = -i. So doing we find q_1 = i/2. In order to find q_2 we multiply (7.3) by (z - i) and then set z = i. So doing we find q_2 = -i/2, and so
  1/(z² + 1) = (i/2)/(z + i) + (-i/2)/(z - i).   (7.4)
Returning to the general case, we encode the above in the simple formula
  q_j = lim_(z→λ_j) (z - λ_j)q(z).   (7.5)
You should be able to use this to confirm that
  z/(z² + 1) = (1/2)/(z + i) + (1/2)/(z - i).   (7.6)
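Matlab's residue command recovers both of these expansions numerically. A quick check of our own (note that the ordering of the returned poles may differ from the order written above):
  [r1, p1] = residue(1, [1 0 1])      % 1/(z^2+1): residue -i/2 at z = i, i/2 at z = -i
  [r2, p2] = residue([1 0], [1 0 1])  % z/(z^2+1): residue 1/2 at each of z = i, z = -i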
Recall that the resolvent we met in Chapter 6 was in fact a matrix of rational functions. Now, the partial fraction expansion of a matrix of rational functions is simply the matrix of partial fraction expansions of each of its elements. This is easier done than said. For example, the resolvent of
  B = [ 0 1
       -1 0 ]
is
  (zI - B)^(-1) = 1/(z² + 1) [ z 1
                              -1 z ]
                = 1/(z + i) [ 1/2   i/2     + 1/(z - i) [ 1/2  -i/2
                             -i/2   1/2 ]                 i/2   1/2 ].   (7.7)
The first line comes from either Gauss-Jordan by hand or via the symbolic toolbox in Matlab. More importantly, the second line is simply an amalgamation of (7.3) and (7.4). Complex matrices have finally entered the picture. We shall devote all of Chapter 10 to uncovering the remarkable properties enjoyed by the matrices that appear in the partial fraction expansion of (zI - B)^(-1). Have you noticed that, in our example, the two matrices are each projections, that they sum to I, and that their product is 0? Could this be an accident?
In Chapter 6 we were confronted with the complex exponential when considering the Laplace Transform. By analogy to the real exponential we define
  e^z ≡ Σ_(n=0)^∞ z^n/n!
and find that
  e^(iθ) = 1 + iθ + (iθ)²/2 + (iθ)³/3! + (iθ)⁴/4! + ⋯
         = (1 - θ²/2 + θ⁴/4! - ⋯) + i(θ - θ³/3! + θ⁵/5! - ⋯)
         = cos θ + i sin θ.
With this observation, the polar form is now simply z = |z|e^(iθ). One may just as easily verify that
  cos θ = (e^(iθ) + e^(-iθ))/2  and  sin θ = (e^(iθ) - e^(-iθ))/(2i).
These suggest the definitions, for complex z, of
  cos z ≡ (e^(iz) + e^(-iz))/2  and  sin z ≡ (e^(iz) - e^(-iz))/(2i).
As in the real case the exponential enjoys the property that
  e^(z_1+z_2) = e^(z_1) e^(z_2)
and in particular
  e^(x+iy) = e^x e^(iy) = e^x cos y + ie^x sin y.
Finally, the inverse of the complex exponential is the complex logarithm,
  ln z ≡ ln(|z|) + iθ,  for z = |z|e^(iθ).
One finds that ln(-1 + i) = ln √2 + i3π/4.
7.3. Complex Differentiation
The complex function f is said to be differentiable at z_0 if
  lim_(z→z_0) (f(z) - f(z_0))/(z - z_0)
exists, by which we mean that
  (f(z_n) - f(z_0))/(z_n - z_0)
converges to the same value for every sequence {z_n} that converges to z_0. In this case we naturally call the limit f′(z_0).
Example: The derivative of z² is 2z.
  lim_(z→z_0) (z² - z_0²)/(z - z_0) = lim_(z→z_0) (z - z_0)(z + z_0)/(z - z_0) = 2z_0.
Example: The exponential is its own derivative.
  lim_(z→z_0) (e^z - e^(z_0))/(z - z_0) = e^(z_0) lim_(z→z_0) (e^(z-z_0) - 1)/(z - z_0) = e^(z_0) lim_(z→z_0) Σ_(n=0)^∞ (z - z_0)^n/(n + 1)! = e^(z_0).
Example: The real part of z is not a differentiable function of z.
We show that the limit depends on the angle of approach. First, when z_n → z_0 on a line parallel to the real axis, e.g., z_n = x_0 + 1/n + iy_0, we find
  lim_(n→∞) (x_0 + 1/n - x_0)/(x_0 + 1/n + iy_0 - (x_0 + iy_0)) = 1,
while if z_n → z_0 in the imaginary direction, e.g., z_n = x_0 + i(y_0 + 1/n), then
  lim_(n→∞) (x_0 - x_0)/(x_0 + i(y_0 + 1/n) - (x_0 + iy_0)) = 0.
This last example suggests that when f is differentiable a simple relationship must bind its partial derivatives in x and y.
Proposition 7.1. If f is differentiable at z_0 then
  f′(z_0) = ∂f/∂x (z_0) = -i ∂f/∂y (z_0).
Proof: With z = x + iy_0,
  f′(z_0) = lim_(z→z_0) (f(z) - f(z_0))/(z - z_0) = lim_(x→x_0) (f(x + iy_0) - f(x_0 + iy_0))/(x - x_0) = ∂f/∂x (z_0).
Alternatively, when z = x_0 + iy then
  f′(z_0) = lim_(z→z_0) (f(z) - f(z_0))/(z - z_0) = lim_(y→y_0) (f(x_0 + iy) - f(x_0 + iy_0))/(i(y - y_0)) = -i ∂f/∂y (z_0).
In terms of the real and imaginary parts of f this result brings the Cauchy-Riemann equations
  ∂u/∂x = ∂v/∂y  and  ∂v/∂x = -∂u/∂y.   (7.8)
Regarding the converse proposition we note that when f has continuous partial derivatives in a region obeying the Cauchy-Riemann equations then f is in fact differentiable in the region.
We remark that with no more energy than that expended on their real cousins one may uncover the rules for differentiating complex sums, products, quotients, and compositions.
As one important application of the derivative let us attempt to expand in partial fractions a rational function whose denominator has a root with degree larger than one. As a warm-up let us try to find q_{1,1} and q_{1,2} in the expansion
  (z + 2)/(z + 1)² = q_{1,1}/(z + 1) + q_{1,2}/(z + 1)².
Arguing as above it seems wise to multiply through by (z + 1)² and so arrive at
  z + 2 = q_{1,1}(z + 1) + q_{1,2}.   (7.9)
On setting z = -1 this gives q_{1,2} = 1. With q_{1,2} computed (7.9) takes the simple form z + 1 = q_{1,1}(z + 1) and so q_{1,1} = 1 as well. Hence
  (z + 2)/(z + 1)² = 1/(z + 1) + 1/(z + 1)².
This latter step grows more cumbersome for roots of higher degree. Let us consider
  (z + 2)²/(z + 1)³ = q_{1,1}/(z + 1) + q_{1,2}/(z + 1)² + q_{1,3}/(z + 1)³.
The first step is still correct: multiply through by the factor at its highest degree, here 3. This leaves us with
  (z + 2)² = q_{1,1}(z + 1)² + q_{1,2}(z + 1) + q_{1,3}.   (7.10)
Setting z = -1 again produces the last coefficient, here q_{1,3} = 1. We are left however with one equation in two unknowns. Well, not really one equation, for (7.10) is to hold for all z. We exploit this by taking two derivatives, with respect to z, of (7.10). This produces
  2(z + 2) = 2q_{1,1}(z + 1) + q_{1,2}  and  2 = 2q_{1,1}.
The latter of course needs no comment. We derive q_{1,2} from the former by setting z = -1. This example will permit us to derive a simple expression for the partial fraction expansion of the general proper rational function q = f/g where g has h distinct roots {λ_1, . . . , λ_h} of respective degrees {m_1, . . . , m_h}. We write
  q(z) = Σ_(j=1)^h Σ_(k=1)^(m_j) q_{j,k}/(z - λ_j)^k   (7.11)
and note, as above, that q_{j,k} is the coefficient of (z - λ_j)^(m_j - k) in the rational function r_j(z) ≡ q(z)(z - λ_j)^(m_j). Hence, q_{j,k} may be computed by setting z = λ_j in the ratio of the (m_j - k)th derivative of r_j to (m_j - k)!. That is,
  q_{j,k} = lim_(z→λ_j) (1/(m_j - k)!) d^(m_j - k)/dz^(m_j - k) {(z - λ_j)^(m_j) q(z)}.   (7.12)
As a second example, let us take
  B = [ 1 0 0
        1 3 0
        0 1 1 ]   (7.13)
and compute the Φ_{j,k} matrices in the expansion
  (zI - B)^(-1) = [ 1/(z-1)            0               0
                    1/((z-1)(z-3))     1/(z-3)         0
                    1/((z-1)²(z-3))    1/((z-1)(z-3))  1/(z-1) ]
                = 1/(z - 1) Φ_{1,1} + 1/(z - 1)² Φ_{1,2} + 1/(z - 3) Φ_{2,1}.
The only challenging term is the (3,1) element. We write
  1/((z - 1)²(z - 3)) = q_{1,1}/(z - 1) + q_{1,2}/(z - 1)² + q_{2,1}/(z - 3).
It follows from (7.12) that
  q_{1,1} = (1/(z - 3))′(1) = -1/4  and  q_{1,2} = (1/(z - 3))(1) = -1/2   (7.14)
and
  q_{2,1} = (1/(z - 1)²)(3) = 1/4.   (7.15)
It now follows that
  (zI - B)^(-1) = 1/(z - 1) [  1    0    0
                              -1/2  0    0
                              -1/4 -1/2  1 ]
                + 1/(z - 1)² [  0   0 0
                                0   0 0
                               -1/2 0 0 ]
                + 1/(z - 3) [ 0    0   0
                              1/2  1   0
                              1/4  1/2 0 ].   (7.16)
In closing, let us remark that the method of partial fraction expansions has been implemented in Matlab. In fact, (7.14) and (7.15) follow from the single command
  [r,p,k]=residue([0 0 0 1],[1 -5 7 -3]).
The first input argument is Matlab-speak for the polynomial f(z) = 1 while the second argument corresponds to the denominator
  g(z) = (z - 1)²(z - 3) = z³ - 5z² + 7z - 3.
7.4. Exercises
[1] Express |e^z| in terms of x and/or y.
[2] Confirm that e^(ln z) = z and ln e^z = z.
[3] Find the real and imaginary parts of cos z and sin z. Express your answers in terms of regular and hyperbolic trigonometric functions.
[4] Show that cos² z + sin² z = 1.
[5] With z^w ≡ e^(w ln z) for complex z and w compute √i.
[6] Verify that sin z and cos z satisfy the Cauchy-Riemann equations (7.8) and use the proposition to evaluate their derivatives.
[7] Submit a Matlab diary documenting your use of residue in the partial fraction expansion of the resolvent of
  B = [ 2 0 0
        1 4 0
        0 1 2 ].
8. Complex Integration
8.1. Introduction
Our main goal is a better understanding of the partial fraction expansion of a given resolvent. With respect to the example that closed the last chapter, see (7.13)-(7.16), we found
  (zI - B)^(-1) = 1/(z - λ_1) P_1 + 1/(z - λ_1)² D_1 + 1/(z - λ_2) P_2
where the P_j and D_j enjoy the amazing properties
  BP_1 = P_1B = λ_1P_1 + D_1  and  BP_2 = P_2B = λ_2P_2,   (8.1)
  P_1 + P_2 = I,  P_1² = P_1,  P_2² = P_2,  and  D_1² = 0,   (8.2)
  P_1D_1 = D_1P_1 = D_1  and  P_2D_1 = D_1P_2 = 0.   (8.3)
In order to show that this always happens, i.e., that it is not a quirk produced by the particular B in (7.13), we require a few additional tools from the theory of complex variables. In particular, we need the fact that partial fraction expansions may be carried out through complex integration.
8.2. Cauchy's Theorem
We shall be integrating complex functions over complex curves. Such a curve is parametrized by one complex valued or, equivalently, two real valued, function(s) of a real parameter (typically denoted by t). More precisely,
  C ≡ {z(t) = x(t) + iy(t) : t_1 ≤ t ≤ t_2}.
For example, if x(t) = y(t) = t from t_1 = 0 to t_2 = 1, then C is the line segment joining 0 + i0 to 1 + i.
We now define
  ∫_C f(z) dz ≡ ∫_(t_1)^(t_2) f(z(t)) z′(t) dt.
For example, if C = {t + it : 0 ≤ t ≤ 1} as above and f(z) = z̄ then
  ∫_C z̄ dz = ∫_0^1 (t - it)(1 + i) dt = ∫_0^1 {(t + t) + i·0} dt = 1,
while if C is the unit circle {e^(it) : 0 ≤ t ≤ 2π} then
  ∫_C z dz = ∫_0^(2π) e^(it) i e^(it) dt = i ∫_0^(2π) e^(i2t) dt = i ∫_0^(2π) {cos(2t) + i sin(2t)} dt = 0.
Remaining with the unit circle but now integrating f(z) = 1/z we find
  ∫_C z^(-1) dz = ∫_0^(2π) e^(-it) i e^(it) dt = 2πi.
We generalize this calculation to arbitrary (integer) powers over arbitrary circles. More precisely, for integer m and fixed complex a we integrate (z - a)^m over
  C(a, r) ≡ {a + re^(it) : 0 ≤ t ≤ 2π},
the circle of radius r centered at a. We find
  ∫_(C(a,r)) (z - a)^m dz = ∫_0^(2π) (a + re^(it) - a)^m rie^(it) dt
                          = ir^(m+1) ∫_0^(2π) e^(i(m+1)t) dt
                          = ir^(m+1) ∫_0^(2π) {cos((m + 1)t) + i sin((m + 1)t)} dt
                          = 2πi if m = -1, and 0 otherwise,   (8.4)
regardless of the size of r!
When integrating more general functions it is often convenient to express the integral in terms of its real and imaginary parts. More precisely
  ∫_C f(z) dz = ∫_C {u(x, y) + iv(x, y)}{dx + i dy}
              = ∫_C {u(x, y) dx - v(x, y) dy} + i ∫_C {u(x, y) dy + v(x, y) dx}
              = ∫_a^b {u(x(t), y(t))x′(t) - v(x(t), y(t))y′(t)} dt + i ∫_a^b {u(x(t), y(t))y′(t) + v(x(t), y(t))x′(t)} dt.
The second line should invoke memories of
Green's Theorem. If C is a closed curve and M and N are continuously differentiable real-valued functions on C_in, the region enclosed by C, then
  ∫_C {M dx + N dy} = ∫∫_(C_in) (∂N/∂x - ∂M/∂y) dx dy.
Applying this to the situation above, we find, so long as C is closed, that
  ∫_C f(z) dz = -∫∫_(C_in) (∂v/∂x + ∂u/∂y) dx dy + i ∫∫_(C_in) (∂u/∂x - ∂v/∂y) dx dy.
At first glance it appears that Green's Theorem only serves to muddy the waters. Recalling the Cauchy-Riemann equations however we find that each of these double integrals is in fact identically zero! In brief, we have proven
Cauchy's Theorem. If f is differentiable on and in the closed curve C then
  ∫_C f(z) dz = 0.
Strictly speaking, in order to invoke Green's Theorem we require not only that f be differentiable but that its derivative in fact be continuous. This however is simply a limitation of our simple mode of proof; Cauchy's Theorem is true as stated.
This theorem, together with (8.4), permits us to integrate every proper rational function. More precisely, if q = f/g where f is a polynomial of degree at most m - 1 and g is an mth degree polynomial with h distinct zeros at {λ_j}_(j=1)^h with respective multiplicities of {m_j}_(j=1)^h we found that
  q(z) = Σ_(j=1)^h Σ_(k=1)^(m_j) q_{j,k}/(z - λ_j)^k.   (8.5)
Observe now that if we choose r_j so small that λ_j is the only zero of g encircled by C_j ≡ C(λ_j, r_j) then by Cauchy's Theorem
  ∫_(C_j) q(z) dz = Σ_(k=1)^(m_j) q_{j,k} ∫_(C_j) 1/(z - λ_j)^k dz.
In (8.4) we found that each, save the first, of the integrals under the sum is in fact zero. Hence
  ∫_(C_j) q(z) dz = 2πi q_{j,1}.   (8.6)
With q_{j,1} in hand, say from (7.12) or residue, one may view (8.6) as a means for computing the indicated integral. The opposite reading, i.e., that the integral is a convenient means of expressing q_{j,1}, will prove just as useful. With that in mind, we note that the remaining residues may be computed as integrals of the product of q and the appropriate factor. More precisely,
  ∫_(C_j) q(z)(z - λ_j)^(k-1) dz = 2πi q_{j,k}.   (8.7)
One may be led to believe that the precision of this result is due to the very special choice of curve and function. We shall see ...
8.3. The Residue Theorem
After (8.6) and (8.7) perhaps the most useful consequence of Cauchy's Theorem is the
Curve Replacement Lemma. Suppose that C_2 is a closed curve that lies inside the region encircled by the closed curve C_1. If f is differentiable in the annular region outside C_2 and inside C_1 then
  ∫_(C_1) f(z) dz = ∫_(C_2) f(z) dz.
Proof: With reference to the figure below we introduce two vertical segments and define the closed curves C_3 = abcda (where the bc arc is clockwise and the da arc is counter-clockwise) and C_4 = adcba (where the ad arc is counter-clockwise and the cb arc is clockwise). By merely following the arrows we learn that
  ∫_(C_1) f(z) dz = ∫_(C_2) f(z) dz + ∫_(C_3) f(z) dz + ∫_(C_4) f(z) dz.
As Cauchy's Theorem implies that the integrals over C_3 and C_4 each vanish, we have our result.
Figure 8.1. The Curve Replacement Lemma (the nested curves C_1, C_2 and the points a, b, c, d).
This Lemma says that in order to integrate a function it suffices to integrate it over regions where it is singular, i.e., nondifferentiable.
Let us apply this reasoning to the integral
  ∫_C z/((z - λ_1)(z - λ_2)) dz
where C encircles both λ_1 and λ_2 as depicted in the cartoon below. We find that
  ∫_C z/((z - λ_1)(z - λ_2)) dz = ∫_(C_1) z dz/((z - λ_1)(z - λ_2)) + ∫_(C_2) z dz/((z - λ_1)(z - λ_2)).
Developing the integrand in partial fractions we find
  ∫_(C_1) z dz/((z - λ_1)(z - λ_2)) = λ_1/(λ_1 - λ_2) ∫_(C_1) dz/(z - λ_1) + λ_2/(λ_2 - λ_1) ∫_(C_1) dz/(z - λ_2) = 2πi λ_1/(λ_1 - λ_2).
Similarly,
  ∫_(C_2) z dz/((z - λ_1)(z - λ_2)) = 2πi λ_2/(λ_2 - λ_1).
Putting things back together we find
  ∫_C z/((z - λ_1)(z - λ_2)) dz = 2πi (λ_1/(λ_1 - λ_2) + λ_2/(λ_2 - λ_1)) = 2πi.   (8.8)
Figure 8.2. Concentrating on the poles: the contour C encircling λ_1 and λ_2 is replaced by one that squeezes down onto the poles; as the two sides of the neck cancel we arrive at the two contours C_1 and C_2.
We may view (8.8) as a special instance of integrating a rational function around a curve that encircles all of the zeros of its denominator. In particular, recalling (8.5) and (8.6), we find
  ∫_C q(z) dz = Σ_(j=1)^h Σ_(k=1)^(m_j) ∫_(C_j) q_{j,k}/(z - λ_j)^k dz = 2πi Σ_(j=1)^h q_{j,1}.   (8.9)
To take a slightly more complicated example let us integrate f(z)/(z - a) over some closed curve C inside of which f is differentiable and a resides. Our Curve Replacement Lemma now permits us to claim that
  ∫_C f(z)/(z - a) dz = ∫_(C(a,r)) f(z)/(z - a) dz.
It appears that one can go no further without specifying f. The alert reader however recognizes that the integral over C(a, r) is independent of r and so proceeds to let r → 0, in which case z → a and f(z) → f(a). Computing the integral of 1/(z - a) along the way we are led to the hope that
  ∫_C f(z)/(z - a) dz = f(a)2πi.
In support of this conclusion we note that
  ∫_(C(a,r)) f(z)/(z - a) dz = ∫_(C(a,r)) { f(z)/(z - a) + f(a)/(z - a) - f(a)/(z - a) } dz
                             = f(a) ∫_(C(a,r)) 1/(z - a) dz + ∫_(C(a,r)) (f(z) - f(a))/(z - a) dz.
Now the first term is f(a)2πi regardless of r while, as r → 0, the integrand of the second term approaches f′(a) and the region of integration approaches the point a. Regarding this second term, as the integrand remains bounded as the perimeter of C(a, r) approaches zero the value of the integral must itself be zero.
This result is typically known as
Cauchy's Integral Formula. If f is differentiable on and in the closed curve C then
  f(a) = 1/(2πi) ∫_C f(z)/(z - a) dz   (8.10)
for each a lying inside C.
The consequences of such a formula run far and deep. We shall delve into only one or two. First, we note that, as a does not lie on C, the right hand side is a perfectly smooth function of a. Hence, differentiating each side, we find
  f′(a) = 1/(2πi) ∫_C f(z)/(z - a)² dz   (8.11)
for each a lying inside C. Applying this reasoning n times we arrive at a formula for the nth derivative of f at a,
  d^n f/da^n (a) = n!/(2πi) ∫_C f(z)/(z - a)^(1+n) dz   (8.12)
for each a lying inside C. The upshot is that once f is shown to be differentiable it must in fact be infinitely differentiable. As a simple extension let us consider
  1/(2πi) ∫_C f(z)/((z - λ_1)(z - λ_2)²) dz
where f is still assumed differentiable on and in C and that C encircles both λ_1 and λ_2. By the curve replacement lemma this integral is the sum
  1/(2πi) ∫_(C_1) f(z)/((z - λ_1)(z - λ_2)²) dz + 1/(2πi) ∫_(C_2) f(z)/((z - λ_1)(z - λ_2)²) dz
where λ_j now lies in only C_j. As f(z)/(z - λ_2)² is well behaved in C_1 we may use (8.10) to conclude that
  1/(2πi) ∫_(C_1) f(z)/((z - λ_1)(z - λ_2)²) dz = f(λ_1)/(λ_1 - λ_2)².
Similarly, as f(z)/(z - λ_1) is well behaved in C_2 we may use (8.11) to conclude that
  1/(2πi) ∫_(C_2) f(z)/((z - λ_1)(z - λ_2)²) dz = d/da [ f(a)/(a - λ_1) ] |_(a=λ_2).
These calculations can be read as a concrete instance of
The Residue Theorem. If g is a polynomial with roots {λ_j}_(j=1)^h of degree {m_j}_(j=1)^h and C is a closed curve encircling each of the λ_j and f is differentiable on and in C then
  ∫_C f(z)/g(z) dz = 2πi Σ_(j=1)^h res(λ_j)
where
  res(λ_j) = lim_(z→λ_j) 1/(m_j - 1)! d^(m_j-1)/dz^(m_j-1) { (z - λ_j)^(m_j) f(z)/g(z) }
is called the residue of f/g at λ_j.
One of the most important instances of this theorem is the formula for
8.4. The Inverse Laplace Transform
If q is a rational function with poles {λ_j}_(j=1)^h then the inverse Laplace transform of q is
  (L^(-1)q)(t) ≡ 1/(2πi) ∫_C q(z)e^(zt) dz   (8.13)
where C is a curve that encloses each of the poles of q. As a result
  (L^(-1)q)(t) = Σ_(j=1)^h res(λ_j).   (8.14)
Let us put this lovely formula to the test. We take our examples from chapter 6. Let us first compute the inverse Laplace Transform of
  q(z) = 1/(z + 1)².
According to (8.14) it is simply the residue of q(z)e^(zt) at z = -1, i.e.,
  res(-1) = lim_(z→-1) d/dz e^(zt) = te^(-t).
This closes the circle on the example begun in §6.3 and continued in exercise 6.1.
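The symbolic toolbox agrees; a quick check of our own:
  syms s t
  ilaplace(1/(s+1)^2)       % returns t*exp(-t), as computed above
  laplace(t*exp(-t))        % and, going the other way, 1/(s+1)^2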
For our next example we recall
  Lx_1(s) = 0.19(s² + 1.5s + 0.27) / ((s + 1/6)⁴ (s³ + 1.655s² + 0.4078s + 0.0039))
from chapter 6. Using numden, sym2poly and residue, see fib4.m for details, returns
  r_1 = [  0.0029            and  p_1 = [ -1.3565
           262.8394                       -0.2885
          -474.1929                       -0.1667
          -1.0857                         -0.1667
          -9.0930                         -0.1667
          -0.3326                         -0.1667
           211.3507 ]                     -0.0100 ].
92
You will be asked in the exercises to show that this indeed jibes with the
x
1
(t) = 211.35 exp(t/100)
(.0554t
3
+ 4.5464t
2
+ 1.085t + 474.19) exp(t/6)+
exp(329t/400)(262.842 cosh(

73t/16) + 262.836 sinh(

73t/16))
achieved in chapter 6 via ilaplace.
8.5. Exercises
[1] Let us confirm the representation (8.7) in the matrix case. More precisely, if R(z) ≡ (zI - B)^(-1) is the resolvent associated with B then (8.7) states that
  R(z) = Σ_(j=1)^h Σ_(k=1)^(m_j) Φ_{j,k}/(z - λ_j)^k
where
  Φ_{j,k} = 1/(2πi) ∫_(C_j) R(z)(z - λ_j)^(k-1) dz.   (8.15)
Compute the Φ_{j,k} per (8.15) for the B in (7.13). Confirm that they agree with those appearing in (7.16).
[2] Use (8.14) to compute the inverse Laplace transform of 1/(s² + 2s + 2).
[3] Use the result of the previous exercise to solve, via the Laplace transform, the differential equation
  x′(t) + x(t) = e^(-t) sin t,  x(0) = 0.
Hint: Take the Laplace transform of each side.
[4] Explain how one gets from r_1 and p_1 to x_1(t).
[5] Compute, as in fib4.m, the residues of Lx_2(s) and Lx_3(s) and confirm that they give rise to the x_2(t) and x_3(t) you derived in Chapter 6.
9. The Eigenvalue Problem
9.1. Introduction
Harking back to chapter 6 we labeled the complex number λ an eigenvalue of B if λI - B was not invertible. In order to find such λ one has only to find those s for which (sI - B)^(-1) is not defined. To take a concrete example we note that if
  B = [ 1 1 0
        0 1 0
        0 0 2 ]   (9.1)
then
  (sI - B)^(-1) = 1/((s - 1)²(s - 2)) [ (s - 1)(s - 2)   s - 2            0
                                        0                (s - 1)(s - 2)   0
                                        0                0                (s - 1)² ]   (9.2)
and so λ_1 = 1 and λ_2 = 2 are the two eigenvalues of B. Now, to say that λ_jI - B is not invertible is to say that its columns are linearly dependent, or, equivalently, that the null space N(λ_jI - B) contains more than just the zero vector. We call N(λ_jI - B) the jth eigenspace and call each of its nonzero members a jth eigenvector. The dimension of N(λ_jI - B) is referred to as the geometric multiplicity of λ_j. With respect to B above we compute N(λ_1I - B) by solving (I - B)x = 0, i.e.,
  [ 0 -1  0     [ x_1     [ 0
    0  0  0   ·   x_2   =   0
    0  0 -1 ]     x_3 ]     0 ].
Clearly
  N(λ_1I - B) = {c[1 0 0]^T : c ∈ R}.
Arguing along the same lines we also find
  N(λ_2I - B) = {c[0 0 1]^T : c ∈ R}.
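A quick Matlab confirmation of these two eigenspaces (our own check):
  B = [1 1 0; 0 1 0; 0 0 2];
  null(1*eye(3) - B)        % a multiple of [1 0 0]'
  null(2*eye(3) - B)        % a multiple of [0 0 1]'
  [V,D] = eig(B)            % V has three columns but only two independent directions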
That B is 3-by-3 but possesses only 2 linearly independent eigenvectors, much like our patient on the front cover, suggests that matrices can not necessarily be judged by the number of their eigenvectors. After a little probing one might surmise that B's condition is related to the fact that λ_1 is a double pole of (sI - B)^(-1). In order to flesh out that remark and find a proper replacement for the missing eigenvector we must take a much closer look at the resolvent.
9.2. The Resolvent
One means by which to come to grips with R(s) is to treat it as the matrix analog of the scalar function
  1/(s - b).   (9.3)
This function is a scaled version of the even simpler function 1/(1 - z). This latter function satisfies the identity (just multiply across by 1 - z to check it)
  1/(1 - z) = 1 + z + z² + ⋯ + z^(n-1) + z^n/(1 - z)   (9.4)
for each positive integer n. Furthermore, if |z| < 1 then z^n → 0 as n → ∞ and so (9.4) becomes, in the limit,
  1/(1 - z) = Σ_(n=0)^∞ z^n,
the familiar geometric series. Returning to (9.3) we write
  1/(s - b) = (1/s)/(1 - b/s) = 1/s + b/s² + ⋯ + b^(n-1)/s^n + (b^n/s^n)·1/(s - b),
and hence, so long as |s| > |b| we find,
  1/(s - b) = (1/s) Σ_(n=0)^∞ (b/s)^n.
This same line of reasoning may be applied in the matrix case. That is,
  (sI - B)^(-1) = s^(-1)(I - B/s)^(-1) = 1/s + B/s² + ⋯ + B^(n-1)/s^n + (B^n/s^n)(sI - B)^(-1),   (9.5)
and hence, so long as |s| > ‖B‖ where ‖B‖ is the magnitude of the largest eigenvalue of B, we find
  (sI - B)^(-1) = s^(-1) Σ_(n=0)^∞ (B/s)^n.   (9.6)
Although (9.6) is indeed a formula for the resolvent you may, regarding computation, not find it any more attractive than the Gauss-Jordan method. We view (9.6) however as an analytical rather than computational tool. More precisely, it facilitates the computation of integrals of R(s). For example, if C_ρ is the circle of radius ρ centered at the origin and ρ > ‖B‖ then
  ∫_(C_ρ) (sI - B)^(-1) ds = Σ_(n=0)^∞ B^n ∫_(C_ρ) s^(-1-n) ds = 2πi I.   (9.7)
This result is essential to our study of the eigenvalue problem. As are the two resolvent identities. Regarding the first we deduce from the simple observation
  (s_2I - B)^(-1) - (s_1I - B)^(-1) = (s_2I - B)^(-1)(s_1I - B - s_2I + B)(s_1I - B)^(-1)
that
  R(s_2) - R(s_1) = (s_1 - s_2)R(s_2)R(s_1).   (9.8)
The second identity is simply a rewriting of
  (sI - B)(sI - B)^(-1) = (sI - B)^(-1)(sI - B) = I,
namely,
  BR(s) = R(s)B = sR(s) - I.   (9.9)
9.3. The Partial Fraction Expansion of the Resolvent
The Gauss-Jordan method informs us that R will be a matrix of rational functions with a common denominator. In keeping with the notation of the previous chapters we assume the denominator to have the h distinct roots, {λ_j}_(j=1)^h with associated multiplicities {m_j}_(j=1)^h.
Now, assembling the partial fraction expansions of each element of R we arrive at
  R(s) = Σ_(j=1)^h Σ_(k=1)^(m_j) R_{j,k}/(s - λ_j)^k   (9.10)
where, recalling equation (8.7), R_{j,k} is the matrix
  R_{j,k} = 1/(2πi) ∫_(C_j) R(z)(z - λ_j)^(k-1) dz.   (9.11)
To take a concrete example, with respect to (9.1) and (9.2) we find
  R_{1,1} = [ 1 0 0     R_{1,2} = [ 0 1 0     and  R_{2,1} = [ 0 0 0
              0 1 0                 0 0 0                      0 0 0
              0 0 0 ],              0 0 0 ],                   0 0 1 ].
One notes immediately that these matrices enjoy some amazing properties. For example
  R_{1,1}² = R_{1,1},  R_{2,1}² = R_{2,1},  R_{1,1}R_{2,1} = 0,  and  R_{1,2}² = 0.   (9.12)
We now show that this is no accident. We shall find it true in general as a consequence of (9.11) and the first resolvent identity.
Proposition 9.1. R_{j,1}² = R_{j,1}.
Proof: Recall that the C_j appearing in (9.11) is any circle about λ_j that neither touches nor encircles any other root. Suppose that C_j and C_j′ are two such circles and C_j′ encloses C_j. Now
  R_{j,1} = 1/(2πi) ∫_(C_j) R(z) dz = 1/(2πi) ∫_(C_j′) R(z) dz
and so
  R_{j,1}² = 1/(2πi)² ∫_(C_j) R(z) dz ∫_(C_j′) R(w) dw
           = 1/(2πi)² ∫_(C_j) ∫_(C_j′) R(z)R(w) dw dz
           = 1/(2πi)² ∫_(C_j) ∫_(C_j′) (R(z) - R(w))/(w - z) dw dz
           = 1/(2πi)² { ∫_(C_j) R(z) ∫_(C_j′) 1/(w - z) dw dz - ∫_(C_j′) R(w) ∫_(C_j) 1/(w - z) dz dw }
           = 1/(2πi) ∫_(C_j) R(z) dz = R_{j,1}.
We used the first resolvent identity, (9.8), in moving from the second to the third line. In moving from the fourth to the fifth we used only
  ∫_(C_j′) 1/(w - z) dw = 2πi  and  ∫_(C_j) 1/(w - z) dz = 0.   (9.13)
The latter integrates to zero because C_j does not encircle w.
Recalling Definition 5.1 that matrices that equal their squares are projections we adopt the abbreviation
  P_j ≡ R_{j,1}.
With respect to the product P_jP_k, for j ≠ k, the calculation runs along the same lines. The difference comes in (9.13) where, as C_j lies completely outside of C_k, both integrals are zero. Hence,
Proposition 9.2. If j ≠ k then P_jP_k = 0.
Along the same lines we define
  D_j ≡ R_{j,2}
and prove
Proposition 9.3. If 1 ≤ k ≤ m_j - 1 then D_j^k = R_{j,k+1}, while D_j^(m_j) = 0.
Proof: For k and ℓ greater than or equal to one,
  R_{j,k+1} R_{j,ℓ+1} = 1/(2πi)² ∫_(C_j) R(z)(z - λ_j)^k dz ∫_(C_j′) R(w)(w - λ_j)^ℓ dw
                      = 1/(2πi)² ∫_(C_j) ∫_(C_j′) R(z)R(w)(z - λ_j)^k (w - λ_j)^ℓ dw dz
                      = 1/(2πi)² ∫_(C_j) ∫_(C_j′) (R(z) - R(w))/(w - z) (z - λ_j)^k (w - λ_j)^ℓ dw dz
                      = 1/(2πi)² ∫_(C_j) R(z)(z - λ_j)^k ∫_(C_j′) (w - λ_j)^ℓ/(w - z) dw dz
                        - 1/(2πi)² ∫_(C_j′) R(w)(w - λ_j)^ℓ ∫_(C_j) (z - λ_j)^k/(w - z) dz dw
                      = 1/(2πi) ∫_(C_j) R(z)(z - λ_j)^(k+ℓ) dz = R_{j,k+ℓ+1},
because
  ∫_(C_j′) (w - λ_j)^ℓ/(w - z) dw = 2πi(z - λ_j)^ℓ  and  ∫_(C_j) (z - λ_j)^k/(w - z) dz = 0.   (9.14)
With k = ℓ = 1 we have shown R_{j,2}² = R_{j,3}, i.e., D_j² = R_{j,3}. Similarly, with k = 1 and ℓ = 2 we find R_{j,2}R_{j,3} = R_{j,4}, i.e., D_j³ = R_{j,4}, and so on. Finally, at k = m_j we find
  D_j^(m_j) = R_{j,m_j+1} = 1/(2πi) ∫_(C_j) R(z)(z - λ_j)^(m_j) dz = 0
by Cauchy's Theorem.
With this we now have the sought after expansion
  R(z) = Σ_(j=1)^h { 1/(z - λ_j) P_j + Σ_(k=1)^(m_j - 1) 1/(z - λ_j)^(k+1) D_j^k },   (9.15)
along with verification of a number of the properties laid out in (8.1)-(8.3). In the case that each eigenvalue is simple, i.e., m_j = 1, we arrive at the compact representation
  (zI - B)^(-1) = Σ_(j=1)^n P_j/(z - λ_j).   (9.16)
As a quick application of this we return to a curious limit that arose in our discussion of Backward Euler. Namely, we would like to evaluate
  lim_(k→∞) (I - (t/k)B)^(-k).
To find this we note that (zI - B)^(-1) = (I - B/z)^(-1)/z and so (9.16) may be written
  (I - B/z)^(-1) = Σ_(j=1)^n zP_j/(z - λ_j) = Σ_(j=1)^n P_j/(1 - λ_j/z).
If we now set z = k/t, where k is a positive integer, and use the fact that the P_j are projections that annihilate one another then we arrive first at
  (I - (t/k)B)^(-k) = Σ_(j=1)^n P_j/(1 - (t/k)λ_j)^k
and so, in the limit find
  lim_(k→∞) (I - (t/k)B)^(-k) = Σ_(j=1)^n exp(λ_j t)P_j
and naturally refer to this limit as exp(Bt).
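This limit, and the name exp(Bt), can be sampled numerically; the following check is ours, using a 2-by-2 example of our own choosing.
  B = [0 1; -1 0];  t = 1;
  for k = [10 100 1000]
    norm(inv(eye(2) - (t/k)*B)^k - expm(B*t))   % shrinks as k grows
  end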
9.4. The Spectral Representation
With just a little bit more work we shall arrive at a similar expansion for B itself. We begin by applying the second resolvent identity, (9.9), to P_j. More precisely, we note that (9.9) implies that
  BP_j = P_jB = 1/(2πi) ∫_(C_j) (zR(z) - I) dz
             = 1/(2πi) ∫_(C_j) zR(z) dz
             = 1/(2πi) ∫_(C_j) R(z)(z - λ_j) dz + λ_j/(2πi) ∫_(C_j) R(z) dz
             = D_j + λ_jP_j.   (9.17)
Summing this over j we find
  B Σ_(j=1)^h P_j = Σ_(j=1)^h λ_jP_j + Σ_(j=1)^h D_j.   (9.18)
We can go one step further, namely the evaluation of the first sum. This stems from (9.7) where we integrated R(s) over a circle C_ρ where ρ > ‖B‖. The connection to the P_j is made by the residue theorem. More precisely,
  ∫_(C_ρ) R(z) dz = 2πi Σ_(j=1)^h P_j.
Comparing this to (9.7) we find
  Σ_(j=1)^h P_j = I,   (9.19)
and so (9.18) takes the form
  B = Σ_(j=1)^h λ_jP_j + Σ_(j=1)^h D_j.   (9.20)
It is this formula that we refer to as the Spectral Representation of B. To the numerous connections between the P_j and D_j we wish to add one more. We first write (9.17) as
  (B - λ_jI)P_j = D_j   (9.21)
and then raise each side to the m_j power. As P_j^(m_j) = P_j and D_j^(m_j) = 0 we find
  (B - λ_jI)^(m_j) P_j = 0.   (9.22)
For this reason we call the range of P_j the jth generalized eigenspace, call each of its nonzero members a jth generalized eigenvector and refer to the dimension of R(P_j) as the algebraic multiplicity of λ_j. With regard to the example with which we began the chapter we note that although it has only two linearly independent eigenvectors the span of the associated generalized eigenspaces indeed fills out R³, i.e., R(P_1) ⊕ R(P_2) = R³. One may view this as a consequence of P_1 + P_2 = I, or, perhaps more concretely, as appending the generalized first eigenvector [0 1 0]^T to the original two eigenvectors [1 0 0]^T and [0 0 1]^T. In still other words, the algebraic multiplicities sum to the ambient dimension (here 3), while the sum of geometric multiplicities falls short (here 2).
9.5. Examples
We take a look back at our previous examples in light of the results of the previous 2 sections. With respect to the rotation matrix
  B = [ 0 1
       -1 0 ]
we recall, see (8.6), that
  R(s) = 1/(s² + 1) [ s 1
                     -1 s ]
       = 1/(s - i) [ 1/2  -i/2     + 1/(s + i) [ 1/2   i/2
                     i/2   1/2 ]                -i/2   1/2 ]
       = 1/(s - λ_1) P_1 + 1/(s - λ_2) P_2,   (9.23)
and so
  B = λ_1P_1 + λ_2P_2 = i [ 1/2  -i/2     - i [ 1/2   i/2
                            i/2   1/2 ]        -i/2   1/2 ].
From m_1 = m_2 = 1 it follows that R(P_1) and R(P_2) are actual (as opposed
to generalized) eigenspaces. These column spaces are easily determined. In
particular, R(P_1) is the span of

e_1 = \begin{pmatrix} 1 \\ i \end{pmatrix}

while R(P_2) is the span of

e_2 = \begin{pmatrix} 1 \\ -i \end{pmatrix}.

To recapitulate, from partial fraction expansion one can read off the projections,
from which one can read off the eigenvectors. The reverse direction,
producing projections from eigenvectors, is equally worthwhile. We laid the
groundwork for this step in Chapter 5. In particular, (5.8) stipulates that

P_1 = e_1 (e_1^T e_1)^{-1} e_1^T  and  P_2 = e_2 (e_2^T e_2)^{-1} e_2^T.

As e_1^T e_1 = e_2^T e_2 = 0 these formulas can not possibly be correct. Returning
to Chapter 5 we realize that it was, perhaps implicitly, assumed that all
quantities were real. At root is the notion of the length of a complex vector.
It is not the square root of the sum of squares of its components but rather
the square root of the sum of squares of the magnitudes of its components.
That is, recalling that the magnitude of a complex quantity z is \sqrt{\bar{z} z},

\|e_1\|^2 = \bar{e}_1^T e_1  rather than  \|e_1\|^2 = e_1^T e_1.

Yes, we have had this discussion before, recall (7.1). The upshot of all of
this is that, when dealing with complex vectors and matrices, one should
conjugate before every transpose. Matlab (of course) does this automatically,
i.e., the prime symbol, ', conjugates and transposes simultaneously. As we
have used the prime to denote differentiation we instead use the asterisk to
denote conjugate transpose, i.e.,

x^* \equiv \bar{x}^T.
All this suggests that the desired projections are more likely

P_1 = e_1 (e_1^* e_1)^{-1} e_1^*  and  P_2 = e_2 (e_2^* e_2)^{-1} e_2^*.    (9.24)

Please check that (9.24) indeed jibes with (9.23).
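One way to carry out that check is in Matlab, where the prime already performs the conjugate transpose. The lines below are a minimal sketch, using the sign conventions adopted above for B, e_1 and e_2.

B  = [0 1; -1 0];
e1 = [1; 1i];   e2 = [1; -1i];
P1 = e1*((e1'*e1)\e1');        % e_1 (e_1^* e_1)^{-1} e_1^*
P2 = e2*((e2'*e2)\e2');
norm(P1 + P2 - eye(2))         % the projections sum to I
norm(1i*P1 - 1i*P2 - B)        % and lambda_1 P_1 + lambda_2 P_2 recovers B

Both norms vanish, up to roundoff, in agreement with (9.23).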
9.6. The Characteristic Polynomial
Our understanding of eigenvalues, as poles of the resolvent, has permitted
us to exploit the beautiful field of complex integration and arrive at a (hopefully)
clear understanding of the Spectral Representation. It is time however
to take note of alternate paths to this fundamental result. In fact the most
common understanding of eigenvalues is not via poles of the resolvent but
rather as zeros of the characteristic polynomial. Of course an eigenvalue is an
eigenvalue and mathematics will not permit contradictory definitions. The
starting point is as above - an eigenvalue of the n-by-n matrix B is a value of
s for which (sI - B) is not invertible. Of the many tests for invertibility let
us here recall that a matrix is not invertible if and only if it has a zero pivot. If we
denote the jth pivot of B by p_j(B) we may define the determinant of B as

\det(B) \equiv \prod_{j=1}^{n} p_j(B).

For example, if

B = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

then \det(B) = ad - bc. More to the point,

\det(sI - B) = (s - a)(s - d) - bc = s^2 - (a + d)s + (ad - bc).

This is a degree-2 (quadratic) polynomial for the 2-by-2 matrix B. The formula
that defines the determinant of a general n-by-n matrix, though complicated,
always involves an alternating sum of products of n elements of B, and hence

p(s) \equiv \det(sI - B)

is a polynomial of degree n. It is called the characteristic polynomial of B.

Since the zeros of p are the points where \det(sI - B) = 0, those zeros are
precisely the points where sI - B is not invertible, i.e., the eigenvalues of B.
The Fundamental Theorem of Algebra tells us that a degree-n polynomial has
no more than n roots, and thus an n-by-n matrix can never have more than
n distinct eigenvalues. In the language of Section 9.3, we must have h \le n.
Let us look at a few examples. First, consider the matrix from Section 9.5,

B = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.

The simple formula for the 2-by-2 determinant leads to

\det(sI - B) = s^2 + 1 = (s - i)(s + i),

confirming that the eigenvalues of B are \lambda_1 = i and \lambda_2 = -i.

For larger matrices we turn to MATLAB. Its poly function directly delivers
the coefficients of the characteristic polynomial. Returning to the 3-by-3
example that opened this chapter, we find

>> B = [1 1 0; 0 1 0; 0 0 2]
>> p = poly(B)
p = 1 -4 5 -2
>> roots(p)
    2.0000
    1.0000 + 0.0000i
    1.0000 - 0.0000i

which means p(s) = s^3 - 4s^2 + 5s - 2 = (s - 1)^2 (s - 2), which is none other
than the common denominator in our resolvent (9.2).
9.7. Exercises

[1] Argue as in Proposition 9.1 that if j \ne k then D_j P_k = P_j D_k = 0.

[2] Argue from (9.21) and P_j^2 = P_j that D_j P_j = P_j D_j = D_j.

[3] The two previous exercises come in very handy when computing powers
of matrices. For example, suppose B is 4-by-4, that h = 2 and m_1 = m_2 = 2.
Use the spectral representation of B together with the first two
exercises to arrive at simple formulas for B^2 and B^3.

[4] Compute the spectral representation of the circulant matrix

B = \begin{pmatrix} 2 & 8 & 6 & 4 \\ 4 & 2 & 8 & 6 \\ 6 & 4 & 2 & 8 \\ 8 & 6 & 4 & 2 \end{pmatrix}.

Carefully label all eigenvalues, eigenprojections and eigenvectors.

[5] Explain why the algebraic multiplicity of \lambda_j is greater than or equal to
its geometric multiplicity and why these two multiplicities coincide if and
only if m_j = 1.
10. The Spectral Representation of a Symmetric Matrix

10.1. Introduction

Our goal is to show that if B is symmetric then

  - each \lambda_j is real,
  - each P_j is symmetric, and
  - each D_j vanishes.

Let us begin with an example. The resolvent of

B = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}

is

R(s) = \frac{1}{s(s-3)} \begin{pmatrix} s-2 & 1 & 1 \\ 1 & s-2 & 1 \\ 1 & 1 & s-2 \end{pmatrix}
     = \frac{1}{s} \begin{pmatrix} 2/3 & -1/3 & -1/3 \\ -1/3 & 2/3 & -1/3 \\ -1/3 & -1/3 & 2/3 \end{pmatrix}
       + \frac{1}{s-3} \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}
     = \frac{1}{s - \lambda_1} P_1 + \frac{1}{s - \lambda_2} P_2,

and so indeed each of the bullets holds true. With each of the D_j falling by
the wayside you may also expect that the respective geometric and algebraic
multiplicities coincide.

10.2. The Spectral Representation

We begin with

Proposition 10.1. If B is real and B = B^T then the eigenvalues of B are real.
Proof: We suppose that \lambda and x comprise an eigenpair of B, i.e., Bx = \lambda x,
and show that \lambda = \bar{\lambda}. On conjugating each side of this eigen equation we find

Bx = \lambda x  and  B\bar{x} = \bar{\lambda}\bar{x},    (10.1)

where we have used the fact that B is real. We now take the inner product
of the first with \bar{x} and the second with x and find

\bar{x}^T B x = \lambda \|x\|^2  and  x^T B \bar{x} = \bar{\lambda} \|x\|^2.    (10.2)

The left hand side of the first is a scalar and so equal to its transpose,

\bar{x}^T B x = (\bar{x}^T B x)^T = x^T B^T \bar{x} = x^T B \bar{x},    (10.3)

where in the last step we used B = B^T. Combining (10.2) and (10.3) we see

\lambda \|x\|^2 = \bar{\lambda} \|x\|^2.

Finally, as \|x\| > 0 we may conclude that \lambda = \bar{\lambda}.
We next establish

Proposition 10.2. If B is real and B = B^T then each eigenprojection, P_j,
and each eigennilpotent, D_j, is real and symmetric.

Proof: From the fact (please prove it) that the transpose of the inverse is the
inverse of the transpose we learn that

\{(sI - B)^{-1}\}^T = \{(sI - B)^T\}^{-1} = (sI - B)^{-1},

i.e., the resolvent of a symmetric matrix is symmetric. Hence

P_j^T = \left( \frac{1}{2\pi i} \int_{C_j} (sI - B)^{-1}\,ds \right)^T
      = \frac{1}{2\pi i} \int_{C_j} \{(sI - B)^{-1}\}^T\,ds
      = \frac{1}{2\pi i} \int_{C_j} (sI - B)^{-1}\,ds = P_j.
By the same token, we find that each D_j^T = D_j. The elements of P_j and D_j
are real because they are residues of real functions evaluated at real poles.

It remains to establish

Proposition 10.3. The zero matrix is the only real symmetric nilpotent matrix.

Proof: Suppose that D is n-by-n and D = D^T and D^m = 0 for some positive
integer m. We show that D^{m-1} = 0 by showing that every vector lies in its
null space. To wit, if x \in R^n then

\|D^{m-1} x\|^2 = x^T (D^{m-1})^T D^{m-1} x = x^T D^{m-1} D^{m-1} x = x^T D^{m-2} D^{m} x = 0.

As D^{m-1} x = 0 for every x it follows (recall Exercise 3.3) that D^{m-1} = 0.
Continuing in this fashion we find D^{m-2} = 0 and so, eventually, D = 0.
We have now established

Proposition 10.4. If B is real and symmetric then

B = \sum_{j=1}^{h} \lambda_j P_j    (10.4)

where the \lambda_j are real and the P_j are real orthogonal projections that sum to
the identity and whose pairwise products vanish.

One indication that things are simpler when using the spectral representation is

B^{100} = \sum_{j=1}^{h} \lambda_j^{100} P_j.    (10.5)
As this holds for all powers it even holds for power series. As a result,

\exp(B) = \sum_{j=1}^{h} \exp(\lambda_j) P_j.

It is also extremely useful in attempting to solve

Bx = b

for x. Replacing B by its spectral representation and b by Ib or, more to the
point, by \sum_j P_j b, we find

\sum_{j=1}^{h} \lambda_j P_j x = \sum_{j=1}^{h} P_j b.

Multiplying through by P_1 gives \lambda_1 P_1 x = P_1 b, or P_1 x = P_1 b / \lambda_1. Multiplying
through by the subsequent P_j's gives P_j x = P_j b / \lambda_j. Hence,

x = \sum_{j=1}^{h} P_j x = \sum_{j=1}^{h} \frac{1}{\lambda_j} P_j b.    (10.6)
We clearly run in to trouble when one of the eigenvalues vanishes. This, of
course, is to be expected. For a zero eigenvalue indicates a nontrivial null
space which signifies dependencies in the columns of B and hence the lack of
a unique solution to Bx = b.

Another way in which (10.6) may be viewed is to note that, when B is
symmetric, (9.15) takes the form

(zI - B)^{-1} = \sum_{j=1}^{h} \frac{1}{z - \lambda_j} P_j.
Now if 0 is not an eigenvalue we may set z = 0 in the above and arrive at

B^{-1} = \sum_{j=1}^{h} \frac{1}{\lambda_j} P_j.    (10.7)

Hence, the solution to Bx = b is

x = B^{-1} b = \sum_{j=1}^{h} \frac{1}{\lambda_j} P_j b,

as in (10.6). With (10.7) we have finally reached a point where we can begin
to define an inverse even for matrices with dependent columns, i.e., with a
zero eigenvalue. We simply exclude the offending term in (10.7). Supposing
that \lambda_h = 0 we define the pseudoinverse of B to be

B^{+} \equiv \sum_{j=1}^{h-1} \frac{1}{\lambda_j} P_j.
Let us now see whether it is deserving of its name. More precisely, when
b \in R(B) we would expect that x = B^{+} b indeed satisfies Bx = b. Well,

B B^{+} b = B \sum_{j=1}^{h-1} \frac{1}{\lambda_j} P_j b
          = \sum_{j=1}^{h-1} \frac{1}{\lambda_j} B P_j b
          = \sum_{j=1}^{h-1} \frac{1}{\lambda_j} \lambda_j P_j b
          = \sum_{j=1}^{h-1} P_j b.

It remains to argue that the latter sum really is b. We know that

b = \sum_{j=1}^{h} P_j b  and  b \in R(B).

The latter informs us that b \perp N(B^T). As B = B^T we have in fact that
b \perp N(B). As P_h is nothing but orthogonal projection onto N(B) it follows
that P_h b = 0 and so B(B^{+} b) = b, that is, x = B^{+} b is a solution to Bx = b.
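These manipulations are easy to test on the 3-by-3 all-ones example of this chapter, whose only nonzero eigenvalue is 3 with projection ones(3)/3. The following lines are a small sketch of such a check; pinv is Matlab's built-in pseudoinverse.

B     = ones(3);
P     = ones(3)/3;                 % projection onto the eigenspace of lambda = 3
Bplus = P/3;                       % the zero eigenvalue's term is simply dropped
norm(Bplus - pinv(B))              % agrees with Matlab's pinv
b = B*randn(3,1);                  % a right hand side drawn from R(B)
norm(B*(Bplus*b) - b)              % and B^+ b then indeed solves Bx = b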
The representation (10.5) is unarguably terse and in fact is often written
out in terms of individual eigenvectors. Let us see how this is done. Note
that if x \in R(P_1) then x = P_1 x and so

Bx = B P_1 x = \sum_{j=1}^{h} \lambda_j P_j P_1 x = \lambda_1 P_1 x = \lambda_1 x,

i.e., x is an eigenvector of B associated with \lambda_1. Similarly, every (nonzero)
vector in R(P_j) is an eigenvector of B associated with \lambda_j.
Next let us demonstrate that each element of R(P_j) is orthogonal to each
element of R(P_k) when j \ne k. If x \in R(P_j) and y \in R(P_k) then

x^T y = (P_j x)^T P_k y = x^T P_j P_k y = 0.

With this we note that if \{x_{j,1}, x_{j,2}, \dots, x_{j,n_j}\} constitutes a basis for R(P_j)
then in fact the union of such bases,

\{x_{j,p} : 1 \le j \le h, \ 1 \le p \le n_j\},    (10.8)

forms a linearly independent set. Notice now that this set has

\sum_{j=1}^{h} n_j

elements. That these dimensions indeed sum to the ambient dimension, n,
follows directly from the fact that the underlying P_j sum to the n-by-n identity
matrix. We have just proven

Proposition 10.5. If B is real and symmetric and n-by-n then B has a set
of n linearly independent eigenvectors.
Getting back to a more concrete version of (10.5) we now assemble matrices
from the individual bases,

E_j \equiv [x_{j,1} \ x_{j,2} \ \cdots \ x_{j,n_j}],

and note, once again, that P_j = E_j (E_j^T E_j)^{-1} E_j^T, and so

B = \sum_{j=1}^{h} \lambda_j E_j (E_j^T E_j)^{-1} E_j^T.    (10.9)

I understand that you may feel a little underwhelmed with this formula. If we
work a bit harder we can remove the presence of the annoying inverse. What
I mean is that it is possible to choose a basis for each R(P_j) for which each of
the corresponding E_j satisfy E_j^T E_j = I. As this construction is fairly general
let us devote a separate section to it.
10.3. Gram-Schmidt Orthogonalization

Suppose that M is an m-dimensional subspace with basis

\{x_1, \dots, x_m\}.

We transform this into an orthonormal basis

\{q_1, \dots, q_m\}

for M via

(1) Set y_1 = x_1 and q_1 = y_1 / \|y_1\|.

(2) y_2 = x_2 minus the projection of x_2 onto the line spanned by q_1. That is,

    y_2 = x_2 - q_1 (q_1^T q_1)^{-1} q_1^T x_2 = x_2 - q_1 q_1^T x_2.

    Set q_2 = y_2 / \|y_2\| and Q_2 = [q_1 \ q_2].

(3) y_3 = x_3 minus the projection of x_3 onto the plane spanned by q_1 and q_2.
    That is,

    y_3 = x_3 - Q_2 (Q_2^T Q_2)^{-1} Q_2^T x_3 = x_3 - q_1 q_1^T x_3 - q_2 q_2^T x_3.

    Set q_3 = y_3 / \|y_3\| and Q_3 = [q_1 \ q_2 \ q_3].

Continue in this fashion through step

(m) y_m = x_m minus its projection onto the subspace spanned by the columns
    of Q_{m-1}. That is,

    y_m = x_m - Q_{m-1} (Q_{m-1}^T Q_{m-1})^{-1} Q_{m-1}^T x_m = x_m - \sum_{j=1}^{m-1} q_j q_j^T x_m.

    Set q_m = y_m / \|y_m\|.
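For the record, here is a bare-bones Matlab rendering of steps (1) through (m). The function name is ours; it assumes the columns of its input are linearly independent and takes no precautions against roundoff, so it is a sketch of the idea rather than a substitute for the orth command introduced a little later.

function Q = gramschmidt(X)
% orthonormalize the (independent) columns of X via steps (1)-(m)
[n, m] = size(X);
Q = zeros(n, m);
for j = 1:m
    y = X(:,j) - Q(:,1:j-1)*(Q(:,1:j-1)'*X(:,j));   % subtract the projections
    Q(:,j) = y/norm(y);                             % normalize
end

Applied to the three vectors of the example that follows, gramschmidt returns the identity matrix, just as the hand computation predicts.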
To take a simple example, let us orthogonalize the following basis for R^3,

x_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
x_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad
x_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.

(1) q_1 = y_1 = x_1.

(2) y_2 = x_2 - q_1 q_1^T x_2 = [0 \ 1 \ 0]^T, and so q_2 = y_2.

(3) y_3 = x_3 - q_1 q_1^T x_3 - q_2 q_2^T x_3 = [0 \ 0 \ 1]^T, and so q_3 = y_3.

We have arrived at

q_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
q_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad
q_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
Once the idea is grasped the actual calculations are best left to a machine.
Matlab accomplishes this via the orth command. Its implementation is a bit
more sophisticated than a blind run through our steps (1) through (m). As a
result, there is no guarantee that it will return the same basis. For example,

>> X = [1 1 1; 0 1 1; 0 0 1];
>> Q = orth(X)
Q =
    0.7370   -0.5910    0.3280
    0.5910    0.3280   -0.7370
    0.3280    0.7370    0.5910

This ambiguity does not bother us, for one orthogonal basis is as good as
another. Let us put this into practice, via (10.9).
10.4. The Diagonalization of a Symmetric Matrix

By choosing an orthonormal basis \{q_{j,k} : 1 \le k \le n_j\} for each R(P_j) and
collecting the basis vectors in

Q_j = [q_{j,1} \ q_{j,2} \ \cdots \ q_{j,n_j}]

we find that

P_j = Q_j Q_j^T = \sum_{k=1}^{n_j} q_{j,k} q_{j,k}^T.

As a result, the spectral representation (10.9) takes the form

B = \sum_{j=1}^{h} \lambda_j Q_j Q_j^T = \sum_{j=1}^{h} \lambda_j \sum_{k=1}^{n_j} q_{j,k} q_{j,k}^T.    (10.10)

This is the spectral representation in perhaps its most detailed dress. There
exists, however, still another form! It is a form that you are likely to see in
future engineering courses and is achieved by assembling the Q_j into a single
n-by-n orthonormal matrix

Q = [Q_1 \ \cdots \ Q_h].

Having orthonormal columns it follows that Q^T Q = I. Q being square, it
follows in addition that Q^T = Q^{-1}. Now

B q_{j,k} = \lambda_j q_{j,k}

may be encoded in matrix terms via

BQ = Q\Lambda    (10.11)

where \Lambda is the n-by-n diagonal matrix whose first n_1 diagonal terms are \lambda_1,
whose next n_2 diagonal terms are \lambda_2, and so on. That is, each \lambda_j is repeated
according to its multiplicity. Multiplying each side of (10.11), from the right,
by Q^T we arrive at

B = Q \Lambda Q^T.    (10.12)

Because one may just as easily write

Q^T B Q = \Lambda    (10.13)

one says that Q diagonalizes B.
Because one may just as easily write
Q
T
BQ = (10.13)
one says that Q diagonalizes B.
Let us return the our example
B =
_
_
1 1 1
1 1 1
1 1 1
_
_
of the last chapter. Recall that the eigenspace associated with
1
= 0 had
e
1,1
=
_
_
1
1
0
_
_
and e
1,2
=
_
_
1
0
1
_
_
for a basis. Via GramSchmidt we may replace this with
q
1,1
=
1

2
_
_
1
1
0
_
_
and q
1,2
=
1

6
_
_
1
1
2
_
_
.
117
Normalizing the vector associated with
2
= 3 we arrive at
q
2,1
=
1

3
_
_
1
1
1
_
_
,
and hence
Q = [q
1
1
q
2
1
q
2
] =
1

6
_
_

3 1

3 1

2
0 2

2
_
_
and =
_
_
0 0 0
0 0 0
0 0 3
_
_
.
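It is worth confirming the last display in Matlab. The following check is a minimal sketch under the sign conventions chosen above; eig would of course produce an equally valid (but possibly different) orthonormal Q.

B = ones(3);
Q = [ 1/sqrt(2)  1/sqrt(6)  1/sqrt(3)
     -1/sqrt(2)  1/sqrt(6)  1/sqrt(3)
      0         -2/sqrt(6)  1/sqrt(3)];
L = diag([0 0 3]);
norm(Q'*Q - eye(3))            % the columns are orthonormal
norm(Q*L*Q' - B)               % and B = Q Lambda Q^T, as in (10.12)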
10.5. Exercises

[1] The stiffness matrix associated with the unstable swing of figure 2.2 is

S = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.

(i) Find the three distinct eigenvalues, \lambda_1 = 1, \lambda_2 = 2, \lambda_3 = 0, along with
their associated eigenvectors e_{1,1}, e_{1,2}, e_{2,1}, e_{3,1}, and projection matrices,
P_1, P_2, P_3. What are the respective geometric multiplicities?

(ii) Use the folk theorem that states that in order to transform a matrix it suffices
to transform its eigenvalues to arrive at a guess for S^{1/2}. Show that your
guess indeed satisfies S^{1/2} S^{1/2} = S. Does your S^{1/2} have anything in
common with the element-wise square root of S?

(iii) Show that R(P_3) = N(S).

(iv) Assemble

S^{+} = \frac{1}{\lambda_1} P_1 + \frac{1}{\lambda_2} P_2

and check your result against pinv(S) in Matlab.

(v) Use S^{+} to solve Sx = f where f = [0 \ 1 \ 0 \ 2]^T and carefully draw before
and after pictures of the unloaded and loaded swing.

(vi) It can be very useful to sketch each of the eigenvectors in this fashion.
In fact, a movie is the way to go. Please run the Matlab truss demo
by typing truss and view all 12 of the movies. Please sketch the 4
eigenvectors of (i) by showing how they deform the swing.
11. The Singular Value Decomposition

11.1. Introduction

The singular value decomposition is, in a sense, the spectral representation
of a rectangular matrix. Of course if A is m-by-n and m \ne n then it
does not make sense to speak of the eigenvalues of A. We may, however, rely
on the previous section to give us relevant spectral representations of the two
symmetric matrices

A^T A  and  A A^T.

That these two matrices together may indeed tell us everything about A can
be gleaned from

N(A^T A) = N(A),  N(A A^T) = N(A^T),
R(A^T A) = R(A^T),  and  R(A A^T) = R(A).

You have proven the first of these in a previous exercise. The proof of the
second is identical. The row and column space results follow from the first
two via orthogonality.

On the spectral side, we shall now see that the eigenvalues of A A^T and
A^T A are nonnegative and that their nonzero eigenvalues coincide. Let us first
confirm this on the A matrix associated with the unstable swing (see figure 2.2),

A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.    (11.1)

The respective products are

A A^T = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}  and
A^T A = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Analysis of the first is particularly simple. Its null space is clearly just the
zero vector while \lambda_1 = 2 and \lambda_2 = 1 are its eigenvalues. Their geometric
multiplicities are n_1 = 1 and n_2 = 2. In A^T A we recognize the S matrix from
exercise [11.1] and recall that its eigenvalues are \lambda_1 = 2, \lambda_2 = 1, and \lambda_3 = 0
with multiplicities n_1 = 1, n_2 = 2, and n_3 = 1. Hence, at least for this A, the
eigenvalues of A A^T and A^T A are nonnegative and their nonzero eigenvalues
coincide. In addition, the geometric multiplicities of the nonzero eigenvalues
sum to 3, the rank of A.

Proposition 1. The eigenvalues of A A^T and A^T A are nonnegative. Their
nonzero eigenvalues, including geometric multiplicities, coincide. The geometric
multiplicities of the nonzero eigenvalues sum to the rank of A.
Proof: If A^T A x = \lambda x then x^T A^T A x = \lambda x^T x, i.e., \|Ax\|^2 = \lambda \|x\|^2, and so
\lambda \ge 0. A similar argument works for A A^T.

Now suppose that \lambda_j > 0 and that \{x_{j,k}\}_{k=1}^{n_j} constitutes an orthonormal
basis for the eigenspace R(P_j). Starting from

A^T A x_{j,k} = \lambda_j x_{j,k}    (11.2)

we find, on multiplying through (from the left) by A, that

A A^T A x_{j,k} = \lambda_j A x_{j,k},

i.e., \lambda_j is an eigenvalue of A A^T with eigenvector A x_{j,k}, so long as A x_{j,k} \ne 0.
It follows from the first paragraph of this proof that \|A x_{j,k}\| = \sqrt{\lambda_j}, which,
by hypothesis, is nonzero. Hence,

y_{j,k} \equiv \frac{A x_{j,k}}{\sqrt{\lambda_j}}, \quad 1 \le k \le n_j,    (11.3)
is a collection of unit eigenvectors of A A^T associated with \lambda_j. Let us now
show that these vectors are orthonormal for fixed j. For i \ne k,

y_{j,i}^T y_{j,k} = \frac{1}{\lambda_j} x_{j,i}^T A^T A x_{j,k} = x_{j,i}^T x_{j,k} = 0.

We have now demonstrated that if \lambda_j > 0 is an eigenvalue of A^T A of geometric
multiplicity n_j then it is an eigenvalue of A A^T of geometric multiplicity at
least n_j. Reversing the argument, i.e., generating eigenvectors of A^T A from
those of A A^T, we find that the geometric multiplicities must indeed coincide.

Regarding the rank statement, we discern from (11.2) that if \lambda_j > 0
then x_{j,k} \in R(A^T A). The union of these vectors indeed constitutes a basis
for R(A^T A), for anything orthogonal to each of these x_{j,k} necessarily lies
in the eigenspace corresponding to a zero eigenvalue, i.e., in N(A^T A). As
R(A^T A) = R(A^T) it follows that \dim R(A^T A) = r and hence the n_j, for
\lambda_j > 0, sum to r.
Let us now gather together some of the separate pieces of the proof. For
starters, we order the eigenvalues of A^T A from high to low,

\lambda_1 > \lambda_2 > \cdots > \lambda_h,

and write

A^T A = X \Lambda_n X^T    (11.4)

where

X = [X_1 \ \cdots \ X_h],  where  X_j = [x_{j,1} \ \cdots \ x_{j,n_j}],

and \Lambda_n is the n-by-n diagonal matrix with \lambda_1 in the first n_1 slots, \lambda_2 in the
next n_2 slots, etc. Similarly

A A^T = Y \Lambda_m Y^T    (11.5)

where

Y = [Y_1 \ \cdots \ Y_h],  where  Y_j = [y_{j,1} \ \cdots \ y_{j,n_j}],

and \Lambda_m is the m-by-m diagonal matrix with \lambda_1 in the first n_1 slots, \lambda_2 in the
next n_2 slots, etc. The y_{j,k} were defined in (11.3) under the assumption that
\lambda_j > 0. If \lambda_j = 0 let Y_j denote an orthonormal basis for N(A A^T). Finally,
call

\sigma_j = \sqrt{\lambda_j}

and let \Sigma denote the m-by-n diagonal matrix with \sigma_1 in the first n_1 slots and
\sigma_2 in the next n_2 slots, etc. Notice that

\Sigma^T \Sigma = \Lambda_n  and  \Sigma \Sigma^T = \Lambda_m.    (11.6)

Now recognize that (11.3) may be written

A x_{j,k} = \sigma_j y_{j,k}

and that this is simply the column by column rendition of

AX = Y\Sigma.

As X X^T = I we may multiply through (from the right) by X^T and arrive at
the singular value decomposition of A,

A = Y \Sigma X^T.    (11.7)
Let us confirm this on the A matrix in (11.1). We have

\Lambda_4 = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
and
X = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & \sqrt{2} & 0 & 0 \\ -1 & 0 & 0 & 1 \\ 0 & 0 & \sqrt{2} & 0 \end{pmatrix}

and

\Lambda_3 = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
and
Y = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

Hence,

\Sigma = \begin{pmatrix} \sqrt{2} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}    (11.8)

and so A = Y \Sigma X^T says that A should coincide with

\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \sqrt{2} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 0 & -1/\sqrt{2} & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} & 0 \end{pmatrix}.

This indeed agrees with A. It also agrees (up to sign changes in the columns
of X) with what one receives upon typing [Y,SIG,X] = svd(A) in Matlab.
You now ask what we get for our troubles. I express the first dividend as
a proposition that looks to me like a quantitative version of the fundamental
theorem of linear algebra.

Proposition 2. If Y \Sigma X^T is the singular value decomposition of A then

(i) The rank of A, call it r, is the number of nonzero elements in \Sigma.

(ii) The first r columns of X constitute an orthonormal basis for R(A^T). The
last n - r columns of X constitute an orthonormal basis for N(A).

(iii) The first r columns of Y constitute an orthonormal basis for R(A). The
last m - r columns of Y constitute an orthonormal basis for N(A^T).
Let us now solve Ax = b with the help of the pseudoinverse of A. You
know the right thing to do, namely reciprocate all of the nonzero singular
values. Because m is not necessarily n we must also be careful with dimensions.
To be precise, let \Sigma^{+} denote the n-by-m matrix whose first n_1 diagonal
elements are 1/\sigma_1, whose next n_2 diagonal elements are 1/\sigma_2, and so on. In
the case that \lambda_h = 0, set the final n_h diagonal elements of \Sigma^{+} to zero. Now
one defines the pseudo-inverse of A to be

A^{+} \equiv X \Sigma^{+} Y^T.

Taking the A of (11.1) we find

\Sigma^{+} = \begin{pmatrix} 1/\sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}

and so

A^{+} = \begin{pmatrix} 1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 & 0 \\ -1/\sqrt{2} & 0 & 0 & 1/\sqrt{2} \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1/\sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 0 & -1/2 & 0 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}
in agreement with what appears from pinv(A). Let us now investigate the
sense in which A^{+} is the inverse of A. Suppose that b \in R^m and that we wish
to solve Ax = b. We suspect that A^{+} b should be a good candidate. Observe
now that

(A^T A) A^{+} b = X \Lambda_n X^T X \Sigma^{+} Y^T b      by (11.4)
              = X \Lambda_n \Sigma^{+} Y^T b              because X^T X = I
              = X \Sigma^T \Sigma \Sigma^{+} Y^T b        by (11.6)
              = X \Sigma^T Y^T b                          because \Sigma^T \Sigma \Sigma^{+} = \Sigma^T
              = A^T b                                     by (11.7),

that is, A^{+} b satisfies the least-squares problem A^T A x = A^T b.
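A quick numerical confirmation of this last claim, using the A of (11.1) with the sign convention adopted above and a randomly chosen right hand side, might read

A = [0 1 0 0; -1 0 1 0; 0 0 0 1];
b = randn(3,1);
x = pinv(A)*b;                 % Matlab's pseudoinverse, i.e., X*Sigplus*Y'
norm(A'*A*x - A'*b)            % the normal equations hold up to roundoff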
11.2. The SVD in Image Compression

Most applications of the SVD are manifestations of the folk theorem, "The
singular vectors associated with the largest singular values capture the essence
of the matrix." This is most easily seen when applied to gray scale images. For
example, the jpeg associated with the image in the top left of figure 11.2 is a
262-by-165 matrix. Such a matrix-image is read, displayed and decomposed by

M = imread('jb.jpg'); imagesc(M); colormap(gray);
[Y,S,X] = svd(double(M));

The singular values lie on the diagonal of S and are arranged in decreasing
order. We see their rapid decline in Figure 11.1.
(Semilog plot of singular value against index omitted.)
Figure 11.1. The singular values of John Brown.
We now experiment with quantitative versions of the folk theorem. In particular,
we examine the result of keeping but the first k singular vectors and
values. That is, we construct

Ak = Y(:,1:k)*S(1:k,1:k)*X(:,1:k)';

for decreasing values of k.

Figure 11.2. The results of imagesc(Ak) for, starting at the top left, k = 165,
64, 32 and moving right, and then starting at the bottom left and moving
right, k = 24, 20, 16.
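A quantitative way to see the folk theorem at work is to measure the error of the rank-k approximation: in the 2-norm this error is exactly the first neglected singular value. With M, Y, S and X as in the lines above, one may check this with the small sketch below (not needed for the figures themselves).

k  = 32;
Ak = Y(:,1:k)*S(1:k,1:k)*X(:,1:k)';
norm(double(M) - Ak) - S(k+1,k+1)     % zero up to roundoff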
11.3. Exercises

[1] Suppose that A is m-by-n and b is m-by-1. Set x^{+} = A^{+} b and suppose
x satisfies A^T A x = A^T b. Prove that \|x^{+}\| \le \|x\|. (Hint: decompose
x = x_R + x_N into its row space and null space components. Likewise
x^{+} = x^{+}_R + x^{+}_N. Now argue that x_R = x^{+}_R and x^{+}_N = 0 and recognize that
you are almost home.)

[2] Experiment with compressing the image of Cartman hung from the course
page. Submit figures corresponding to keeping 20, 10 and 5 singular
values. Note: Cartman is really a color file, so after saving it to
your directory as cart.jpg and entering matlab you might say
M = imread('cart.jpg') and then M = M(:,:,1).

[3] This assignment will focus on an application of the SVD in sound processing.
The goal is to remove the additive noise (hiss) from a sound signal
by altering the spectrum of the corresponding observation matrix. While
the previous sentence might have sounded somewhat intimidating, there
is no reason to panic, as we will guide you slowly and carefully through
the entire procedure. First, some theory.

The Hankel Matrix. In image compression, one can think of the individual
image pixels as the basic building blocks of the signal. In audio
applications, these building blocks are sound samples. The noisy signal y is
a vector of T sound samples, where each sample is represented by a number
between -1 and 1, i.e.

y = \begin{pmatrix} y(1) \\ y(2) \\ \vdots \\ y(T) \end{pmatrix}

with -1 \le y(i) \le 1, for i = 1, ..., T. For simplicity, we will assume that the
total number of samples T is an odd number. How can we process a vector
using the SVD? Well, we can't. We need to construct a matrix that in some
sense contains the information about the signal y. As is often done in signal
processing, we define the m-by-m symmetric Hankel matrix Y, where m = (T+1)/2,
as follows:

Y = \begin{pmatrix} y(1) & y(2) & \cdots & y(m) \\ y(2) & y(3) & \cdots & y(m+1) \\ \vdots & \vdots & \ddots & \vdots \\ y(m) & y(m+1) & \cdots & y(T) \end{pmatrix}
We can now compute the SVD of the matrix Y and alter its singular values in
order to remove the noise. It should be noted that the Hankel matrix Y is quite
often chosen to be rectangular of size n-by-m, with n \ge m and n + m = T + 1.
We choose a square matrix to ease the Matlab implementation.

SVD computation. We compute the SVD of Y:

Y = U \Sigma V^T.

Noise removal. It makes sense to assume that the main features of the
signal will be associated with the large singular values of Y (recall that the
same was true for the image compression problem). Similarly, the smaller
singular values are associated with the part of the signal that is supposed to
contain only noise. Our objective therefore is to get rid of the smaller singular
values. We define the m-by-m matrix

\Sigma_{NR} = \mathrm{diag}\bigl(\sigma(1), \sigma(2), \dots, \sigma(K), 0, \dots, 0\bigr),

where \sigma(1), \dots, \sigma(K) are the first K singular values of Y. We now set

\tilde{X} = U \Sigma_{NR} V^T.
Restoring the Hankel Structure. How do we get the cleaned-up
signal (let's call it x) back from the matrix \tilde{X}? The matrix \tilde{X} is clearly no
longer a Hankel matrix, so we have to work some more. The Hankel structure
can be recovered by constructing a new matrix X where every element from an
antidiagonal of \tilde{X} is replaced by the average value along that antidiagonal.
Consider the example

\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 5 & 7 & 8 \\ 3 & 7 & 9 & 10 \\ 4 & 8 & 10 & 12 \end{pmatrix}
\longrightarrow
\begin{pmatrix} 1 & 2 & 11/3 & 11/2 \\ 2 & 11/3 & 11/2 & 25/3 \\ 11/3 & 11/2 & 25/3 & 10 \\ 11/2 & 25/3 & 10 & 12 \end{pmatrix}

This procedure finally leads us to the Hankel matrix

X = \begin{pmatrix} x(1) & x(2) & \cdots & x(m) \\ x(2) & x(3) & \cdots & x(m+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(m) & x(m+1) & \cdots & x(T) \end{pmatrix}

from which we can read off the noise-free signal

x = \begin{pmatrix} x(1) \\ x(2) \\ \vdots \\ x(T) \end{pmatrix}
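Before the formal assignment, here is a minimal Matlab sketch of the procedure just outlined, to fix the ideas. It assumes a noisy column vector y of odd length and a truncation index K are already in the workspace; the variable names are ours, and no claim is made that this is the most efficient implementation (the antidiagonal averaging in particular can be vectorized).

m  = (length(y)+1)/2;
Y  = hankel(y(1:m), y(m:end));          % the m-by-m Hankel matrix built from y
[U, Sig, V] = svd(Y);
Sig(K+1:end, K+1:end) = 0;              % keep only the first K singular values
Xt = U*Sig*V';                          % the matrix called X-tilde above
x  = zeros(2*m-1, 1);                   % recover x by averaging antidiagonals
for k = 1:2*m-1
    x(k) = mean(diag(flipud(Xt), k-m)); % the k-th antidiagonal of Xt
end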
Assignment. In short, you are now asked to implement the outlined
noise-reduction procedure in Matlab. On the course web site, you will find the
noisy signals voice_noisy, eguitar_noisy, tune_noisy, and chord_noisy, in
the .wav format. Submit the following:

- Your entire Matlab code.

- Plots of the noisy signals y and the noise-free signals x (i.e. your results)
for all four example signals.

- Report the quantities \|x\| for all four example signals.

- Make sure that your results make sense. Listen to the obtained signals!
Which example signal yields the best denoising result?

- Comment on the computational efficiency of the SVD algorithm. In
particular, construct random square matrices A (use A=randn( , )) of
sizes 100-by-100, 200-by-200, 400-by-400, 800-by-800, and 1600-by-1600, and
record the svd(A) run-times for each case. Use the tic; toc; Matlab
timing procedures.

IMPORTANT hints:

- Use K = 200 of the first singular values of Y to construct the clean signals.

- Use [y,fs,nbits] = wavread(filename); to load the noisy signal y
from the above files. Use wavwrite(x,fs,nbits,filename); to store
the noise-free signal x into new files. You can play the files associated
with sounds x and y using Windows Media Player under Windows or
Xine under UNIX or Linux.

- Type help hankel in Matlab. This should help with the setup of the
Hankel matrix Y.

- Do the chord example at the very end. Due to its size, it might take
several hours to denoise (here, you will be computing the SVD of a
2857-by-2857 matrix!).
12. Matrix Methods for Biochemical Networks

12.1. Introduction

We derive and solve linear programming problems stemming from flux
balance subject to thermodynamic constraints. With respect to the figure
below suppose that we have a steady flow into metabolite 1 and wish to
accumulate as much as possible of metabolite 6.
(Network diagram omitted: metabolites m_1 through m_6 joined by fluxes v_1 through v_8.)
Figure 12.1. A metabolic network.
There are many pathways from m_1 to m_6 and our task is to discover or
design one that produces the greatest yield. Along the way we must obey flux
balance and energy considerations. The former simply means that flux into
an internal metabolite must balance flux out and that each of the fluxes has
an energetically preferred direction. More precisely, each v_j \ge 0 and

-v_1 - v_2 - v_3 + 1 = 0
-v_1 + v_2 + v_5 - v_6 - v_7 = 0
 v_2 - v_3 + v_4 - v_5 - v_8 = 0
 v_1 + v_6 - v_7 = 0
 v_3 - v_4 - v_8 = 0

or

Sv = b

where

S = \begin{pmatrix}
-1 & -1 & -1 &  0 &  0 &  0 &  0 &  0 \\
-1 &  1 &  0 &  0 &  1 & -1 & -1 &  0 \\
 0 &  1 & -1 &  1 & -1 &  0 &  0 & -1 \\
 1 &  0 &  0 &  0 &  0 &  1 & -1 &  0 \\
 0 &  0 &  1 & -1 &  0 &  0 &  0 & -1
\end{pmatrix}
and
b = \begin{pmatrix} -1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}

and our objective is to solve

\max_{v \in P} \ v_7 + v_8

where

P \equiv \{v : Sv = b, \ v \ge 0\}.

As with many things, this is a one-liner in Matlab,

v = linprog(-f,[],[],S,b,zeros(8,1),[])

where f = [0 0 0 0 0 0 1 1]^T encodes the objective function (the minus sign
because linprog minimizes). Matlab returns the answer

v = [0 \ 1 \ 0 \ 0 \ 1 \ 1 \ 1 \ 0]^T.    (12.1)
Can you see it? Our interest of course is in getting under the hood and figuring
out what linprog is really up to. One way is to attempt to decompose P. We
will do this with the help of S^{+}, the pseudoinverse of S, and E, a matrix whose
q columns comprise a basis for N(S). Now

P = \{v = S^{+} b + Ey : y \in P_{red}\}

where

P_{red} = \{y \in R^q : Ey \ge -S^{+} b\}.

In this light I ask that you see that P_{red} is the intersection of half spaces
(for each inequality in Ey \ge -S^{+} b holds on one side of the hyperplane where
Ey = -S^{+} b). In the case of figure 12.1 we find
S^{+} b = \begin{pmatrix} 0.3125 \\ 0.3750 \\ 0.3125 \\ 0.1250 \\ 0.0000 \\ -0.1250 \\ 0.1875 \\ 0.1875 \end{pmatrix}
and
E = \begin{pmatrix}
-1 &  1 &  0 \\
 0 &  1 &  1 \\
 1 & -2 & -1 \\
 1 & -2 & -2 \\
 0 &  1 & -1 \\
 1 &  0 &  0 \\
 0 &  1 &  0 \\
 0 &  0 &  1
\end{pmatrix}

and so P_{red} may be seen with our eyes (figure 12.2). Do you recognize it as
the intersection of halfspaces? Where 2 of these intersect we have an edge.
Where 3 intersect we find a vertex.

With the reduced polytope representation also comes the reduced cost,

f^T v = f^T (S^{+} b + Ey) = f^T S^{+} b + y^T E^T f.

The first term in the last expression is completely determined by the network,
while the latter is determined by the y values in P_{red}. For the problem at
hand,
y^T E^T f = y_2 + y_3.

Do you see that this achieves its maximum on P_{red} at precisely the first vertex,
y = [1.1250 \ 0.8125 \ -0.1875]^T, and do you see that this gives the v in (12.1)?
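The reduced description is easy to reproduce in Matlab. The sketch below assumes the sign conventions of the S, b and E displayed above; null(S,'r') returns a rational basis for N(S) that matches this E for this particular S, though in general any basis of N(S) would serve.

S = [-1 -1 -1  0  0  0  0  0
     -1  1  0  0  1 -1 -1  0
      0  1 -1  1 -1  0  0 -1
      1  0  0  0  0  1 -1  0
      0  0  1 -1  0  0  0 -1];
b  = [-1 0 0 0 0]';
f  = [0 0 0 0 0 0 1 1]';
vp = pinv(S)*b;                 % the particular solution S^+ b
E  = null(S,'r');               % basis for N(S); free variables v_6, v_7, v_8
E'*f                            % the reduced cost vector, here [0 1 1]'
y  = [1.1250; 0.8125; -0.1875]; % the maximizing vertex
v  = vp + E*y                   % recovers the optimal flux of (12.1)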
(Two 3-dimensional renderings of the polytope in the (y_1, y_2, y_3) coordinates omitted.)
Figure 12.2. The reduced polytope associated with the network in figure
12.1 with hidden lines hidden (left) and revealed (right). We have labeled the
5 vertices in the right panel.
It is a fact that every linear objective function attains its maximum at one
or more vertices. The Simplex Method therefore starts at a vertex and then
moves along edges of increasing value until it hits a maximal vertex. Let us
presume we have a starting vertex and see how best to proceed. For example,

v^0 = [1 \ 2 \ 1 \ 0 \ 0 \ 0 \ 1 \ 1]^T / 4

will do. Its nonzero elements are termed basic and its zero elements are
termed nonbasic. They provide a natural decomposition of S as

S_B = \begin{pmatrix}
-1 & -1 & -1 &  0 &  0 \\
-1 &  1 &  0 & -1 &  0 \\
 0 &  1 & -1 &  0 & -1 \\
 1 &  0 &  0 & -1 &  0 \\
 0 &  0 &  1 &  0 & -1
\end{pmatrix}
and
S_N = \begin{pmatrix}
 0 &  0 &  0 \\
 0 &  1 & -1 \\
 1 & -1 &  0 \\
 0 &  0 &  1 \\
-1 &  0 &  0
\end{pmatrix}
in particular, S_B is just the basic columns of S and S_N is the nonbasic
columns of S. A similar decomposition produces f_B and f_N. The simplex
method is a systematic iterative procedure for exchanging indices in B and
N until no improvement can be found. During an exchange we will speak of
one variable that enters B and one variable that leaves B. We shall follow
Chvatal in implementing

(1) Solve S_B^T y = f_B.

(2) Choose an entering column, a = S_N(:,e), by requiring that y^T a < f_N(e).

(3) Solve S_B d = a.

(4) Find the largest t such that v_B - td \ge 0. The variable for which equality
holds will now leave.

(5) Set the value of the entering variable to t and replace the values v_B of
the basic variables by v_B - td. Replace the leaving column of S_B by
the entering column. (A compact Matlab rendering of one such exchange
appears below.)
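Steps (1) through (5) translate almost line for line into Matlab. The following is a minimal sketch of a single exchange, assuming S, the objective column f, the current vertex v (a column vector) and the index sets B and N are in the workspace; ties in the ratio test and unbounded edges are not handled.

y   = S(:,B)'\f(B);                       % step (1)
red = f(N)' - y'*S(:,N);                  % entering test: y'*a < f_N(e)
e   = find(red > 0, 1);                   % step (2)
if isempty(e)
    disp('no candidate for entering basis; the vertex is optimal')
else
    a = S(:,N(e));
    d = S(:,B)\a;                         % step (3)
    pos = find(d > 0);
    [t, i] = min(v(B(pos))./d(pos));      % step (4): ratio test
    v(B) = v(B) - t*d;                    % step (5)
    v(N(e)) = t;
    [B(pos(i)), N(e)] = deal(N(e), B(pos(i)));   % swap entering and leaving
end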
Let us examine each of these steps for the example at hand. What follows
is a spruced-up matlab diary.

v = [1 2 1 0 0 0 1 1]'/4;
B = [1 2 3 7 8];
N = [4 5 6];

Iteration 1

y = S(:,B)'\f(B);
y'*S(:,N) = 1/2  0  -1/2
f(N)      = 0    0   0
a = S(:,N(3));
d = S(:,B)\a = 3/4  -1/2  -1/4  -1/4  -1/4
vB = v(B)    = 1/4   1/2   1/4   1/4   1/4
t = 1/3;
B = [6 2 3 7 8];  N = [4 5 1];
tmp = vB - t*d;
vB = tmp;  vB(1) = t;
vB = 1/3  2/3  1/3  1/3  1/3

Iteration 2

y = S(:,B)'\f(B) = -2/3  -1/2  -1/6  -1/2  -5/6
y'*S(:,N) = 2/3  -1/3  2/3
f(N)      = 0     0    0
a = S(:,N(2));
d = S(:,B)\a = -2/3  -1/3  1/3  -2/3  1/3
vB = 1/3  2/3  1/3  1/3  1/3
t = 1;
tmp = vB - t*d;
vB = tmp;  vB(3) = t;
vB = 1  1  1  1  0
B = [6 2 5 7 8];  N = [4 3 1];

Iteration 3

y = S(:,B)'\f(B);
y'*S(:,N) = 0  1  1
f(N)      = 0  0  0

no candidate for entering basis