
Applied Linear Algebra Refresher Course

Karianne Bergen
kbergen@stanford.edu
Institute for Computational and Mathematical Engineering,
Stanford University

September 16-19, 2013


Course Topics

Matrix and vector properties
Matrix algebra
Linear transformations
Solving linear systems
Orthogonality
Eigenvectors and eigenvalues
Matrix decompositions


Vectors and Matrices


Scalars, vectors and matrices

Scalar: a single quantity or measurement that is invariant under coordinate rotations (e.g. mass, volume, temperature). Scalars are denoted by Greek letters: $\alpha, \beta, \gamma \in \mathbb{R}$.

Vector: an ordered collection of scalars (e.g. displacement, acceleration, force):
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad x_i \in \mathbb{R}, \quad i = 1, \dots, n$$
$x, y, z \in \mathbb{R}^n$ (lower case) denotes a vector with $n$ entries.

All vectors are column vectors unless otherwise noted. Row vectors are denoted by $x^T = [\, x_1 \ x_2 \ \cdots \ x_n \,]$.

For simplicity, we will restrict ourselves to real numbers. The generalization to the complex case can be found in any linear algebra text.

... Scalars, vectors and matrices

Matrix: a two-dimensional collection of scalars.
$$A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \qquad A^T = \begin{bmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_m^T \end{bmatrix}$$

$A \in \mathbb{R}^{m \times n}$ is a matrix with $m$ rows and $n$ columns. Matrices are denoted by upper-case letters: $A, B, \dots \in \mathbb{R}^{m \times n}$.

Operations on vectors
Scalar multiplication:
$$(\alpha x)_i = \alpha x_i, \qquad \alpha x = \alpha \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} \alpha x_1 \\ \alpha x_2 \\ \vdots \\ \alpha x_n \end{bmatrix}$$

Vector addition/subtraction:
$$(x + y)_i = x_i + y_i, \qquad x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$

Inner product (dot product, scalar product):
$$x^T y = \sum_{i=1}^n x_i y_i = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$

Other common notation for the inner product: $\langle x, y \rangle$, $x \cdot y$.

Vector norms

How do we measure the length or size of a vector?

A vector norm is a function $f : \mathbb{R}^n \to \mathbb{R}_+$ that satisfies the following properties for all $\alpha \in \mathbb{R}$ and $u, v \in \mathbb{R}^n$:

Scale invariance: $f(\alpha v) = |\alpha|\, f(v)$

Triangle inequality: $f(u + v) \le f(u) + f(v)$

Positivity: $f(v) = 0 \iff v = 0$

Commonly used vector norms


1-norm (taxi-cab norm):
$$\|v\|_1 = \sum_{i=1}^n |v_i|$$

2-norm (Euclidean norm):
$$\|v\|_2 = \left( \sum_{i=1}^n v_i^2 \right)^{1/2} = \sqrt{v^T v}$$

infinity-norm (max norm):
$$\|v\|_\infty = \max_i |v_i|$$

These are examples of the $p$-norm, for $p = 1, 2$, and $\infty$:
$$\|v\|_p = \left( \sum_{i=1}^n |v_i|^p \right)^{1/p}, \qquad p \ge 1$$
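As a quick numerical check of the definitions above, here is a minimal NumPy sketch (the vector `v` is just an illustrative value); `numpy.linalg.norm` computes the $p$-norms directly.

```python
import numpy as np

v = np.array([3.0, 6.0, 2.0])  # example vector

# p-norms computed from the definitions above
norm1 = np.sum(np.abs(v))        # 1-norm: sum of absolute values
norm2 = np.sqrt(v @ v)           # 2-norm: sqrt(v^T v)
norm_inf = np.max(np.abs(v))     # infinity-norm: largest |v_i|

# the same norms via numpy.linalg.norm
assert norm1 == np.linalg.norm(v, 1)
assert np.isclose(norm2, np.linalg.norm(v, 2))
assert norm_inf == np.linalg.norm(v, np.inf)
print(norm1, norm2, norm_inf)    # 11.0 7.0 6.0
```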

Questions?


Mini-Quiz 1:

Compute the 1-norm, 2-norm, and infinity-norm of the following vector:
$$v = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}$$

Solution (Q1):
1-norm:
$$\|v\|_1 = \sum_{i=1}^3 |v_i| = 3 + 6 + 2 = 11$$
2-norm:
$$\|v\|_2 = \left( \sum_{i=1}^3 v_i^2 \right)^{1/2} = \sqrt{3^2 + 6^2 + 2^2} = \sqrt{9 + 36 + 4} = \sqrt{49} = 7$$
$\infty$-norm:
$$\|v\|_\infty = \max_i |v_i| = \max\{3, 6, 2\} = 6$$

Cauchy-Schwarz Inequality

The inner product satisfies the Cauchy-Schwarz inequality:
$$|x^T y| \le \|x\|_2 \, \|y\|_2 \qquad \text{for } x, y \in \mathbb{R}^n,$$
with equality when $y$ is a scalar multiple of $x$.

This inequality arises in many applications, including:
- Geometry: triangle inequality
- Physics: Heisenberg uncertainty principle
- Statistics: Cramér-Rao lower bound

Inner Product and Orthogonality

Orthogonal vectors: vectors $u$ and $v$ are orthogonal ($u \perp v$) if their inner product is zero:
$$u^T v = 0 \iff u \perp v$$

Orthonormal vectors: vectors $u$ and $v$ are orthonormal if they are orthogonal and both are normalized to length 1:
$$u \perp v \quad \text{and} \quad \|u\| = 1, \ \|v\| = 1$$

The orthogonal complement of a set of vectors $V$, denoted $V^\perp$, is the set of vectors $x$ such that $x^T v = 0$ for all $v \in V$.

Inner Product and Projections

Projection: the inner product gives us a way to compute the component of a vector $v$ along any direction $u$:
$$\mathrm{proj}_u v = \left( \frac{u^T v}{\|u\|} \right) \frac{u}{\|u\|}$$

When $u$ is normalized to unit length, $\|u\| = 1$, this becomes
$$\mathrm{proj}_u v = (u^T v)\, u$$
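The formula translates directly into code. A minimal NumPy sketch (the vectors `u` and `v` are illustrative values):

```python
import numpy as np

def project(v, u):
    """Orthogonal projection of v onto the line spanned by u."""
    return (u @ v) / (u @ u) * u   # (u^T v / ||u||^2) u

u = np.array([1.0, 1.0])
v = np.array([2.0, 0.0])
p = project(v, u)                  # component of v along u
r = v - p                          # remainder, orthogonal to u
print(p, r, u @ r)                 # [1. 1.] [ 1. -1.] 0.0
```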

Questions?


Mini-Quiz 2:

Give an algebraic proof of the triangle inequality
$$\|x + y\|_2 \le \|x\|_2 + \|y\|_2$$
using the Cauchy-Schwarz inequality. (Hint: expand $\|x + y\|^2$.)

Solution (Q2):
Give an algebraic proof of the triangle inequality $\|x + y\|_2 \le \|x\|_2 + \|y\|_2$ using the Cauchy-Schwarz inequality. (Hint: expand $\|x + y\|^2$.)

Solution:
$$\begin{aligned}
\|x + y\|^2 &= (x + y)^T (x + y) \\
&= x^T x + 2\,x^T y + y^T y \\
&= \|x\|^2 + \|y\|^2 + 2\,x^T y \\
&\le \|x\|^2 + \|y\|^2 + 2\,|x^T y| \\
&\le \|x\|^2 + \|y\|^2 + 2\,\|x\|\|y\| \qquad \text{(by Cauchy-Schwarz)} \\
&= (\|x\| + \|y\|)^2
\end{aligned}$$
Taking square roots of both sides gives the result.

Operations with Matrices


Scalar multiplication:
$$(\alpha A)_{ij} = \alpha A_{ij}, \qquad \alpha \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \alpha a_{11} & \cdots & \alpha a_{1n} \\ \vdots & \ddots & \vdots \\ \alpha a_{m1} & \cdots & \alpha a_{mn} \end{bmatrix}$$

Matrix addition:
$$(A + B)_{ij} = A_{ij} + B_{ij}, \qquad A + B = \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix}, \qquad A, B \in \mathbb{R}^{m \times n}$$

Matrix-vector multiplication
Matrix-vector multiplication:
$$(Ax)_i = \sum_{j=1}^n A_{ij} x_j, \qquad A \in \mathbb{R}^{m \times n}, \ x \in \mathbb{R}^n$$

Matrix-vector product as a linear combination of matrix columns:
$$Ax = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n$$

Matrix-vector product as a vector of scalar products with the rows $\tilde a_i^T$ of $A$:
$$Ax = \begin{bmatrix} \tilde a_1^T \\ \tilde a_2^T \\ \vdots \\ \tilde a_m^T \end{bmatrix} x = \begin{bmatrix} \tilde a_1^T x \\ \tilde a_2^T x \\ \vdots \\ \tilde a_m^T x \end{bmatrix}$$
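The two viewpoints give the same product, which is easy to confirm numerically; a minimal NumPy sketch with an illustrative matrix:

```python
import numpy as np

A = np.array([[4.0, 0.0, 2.0],
              [1.0, 5.0, 3.0]])
x = np.array([1.0, 5.0, 2.0])

# row view: entry i is the inner product of row i with x
row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])

# column view: weighted sum of the columns of A
col_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(row_view, A @ x)
assert np.allclose(col_view, A @ x)
print(A @ x)  # [ 8. 32.]
```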

Matrix-matrix multiplication
Matrix product:
$$(AB)_{ij} = \sum_{k=1}^p A_{ik} B_{kj}, \qquad A \in \mathbb{R}^{m \times p}, \ B \in \mathbb{R}^{p \times n}$$

Matrix products in terms of inner products:
$$(AB)_{ij} = \tilde a_i^T b_j, \qquad A = \begin{bmatrix} \tilde a_1^T \\ \tilde a_2^T \\ \vdots \\ \tilde a_m^T \end{bmatrix}, \quad B = \begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix}$$

Since inner products are only defined for vectors of the same length, the matrix product requires that $\tilde a_i, b_j \in \mathbb{R}^p$ for all $i, j$.

... Matrix-matrix multiplication


Matrix products in terms of matrix-vector products:
$$AB = A \begin{bmatrix} b_1 & b_2 & \cdots & b_n \end{bmatrix} = \begin{bmatrix} Ab_1 & Ab_2 & \cdots & Ab_n \end{bmatrix}$$
$$AB = \begin{bmatrix} \tilde a_1^T \\ \tilde a_2^T \\ \vdots \\ \tilde a_m^T \end{bmatrix} B = \begin{bmatrix} \tilde a_1^T B \\ \tilde a_2^T B \\ \vdots \\ \tilde a_m^T B \end{bmatrix}$$

Matrix-matrix multiplication can be expressed in terms of the concatenation of matrix-vector products.

For $A \in \mathbb{R}^{k \times l}$ and $B \in \mathbb{R}^{m \times n}$, the product $AB$ is only defined if $l = m$, and the product $BA$ is only defined if $k = n$.

Properties of Matrix-matrix multiplication


Matrix multiplication is associative:
$$A(BC) = (AB)C$$
Matrix multiplication is distributive:
$$A(B + C) = AB + AC$$
Matrix multiplication is, in general, not commutative:
$$AB \ne BA$$
For example, let $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$; then
$$AB = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \ne \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = BA.$$

Vector Outer Product


Using the definition of matrix multiplication, we can define the vector outer product:
$$xy^T = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix} = \begin{bmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ x_2 y_1 & x_2 y_2 & \cdots & x_2 y_n \\ \vdots & \vdots & \ddots & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{bmatrix}$$
$$\left( xy^T \right)_{ij} = x_i y_j, \qquad x \in \mathbb{R}^m, \ y \in \mathbb{R}^n, \ xy^T \in \mathbb{R}^{m \times n}$$

The outer product $xy^T$ of vectors $x$ and $y$ is a matrix, and is defined even if the vectors are not the same length.

Questions?


Mini-Quiz 3:
Which of the following vector and matrix operations are well-defined?

1. $\alpha x$  2. $\alpha A$  3. $x + y$  4. $x^T y$  5. $x^T z$  6. $z x^T$  7. $Az$  8. $z^T A y$

$$\alpha = 5, \quad x = \begin{bmatrix} 1 \\ 5 \\ 2 \end{bmatrix}, \quad y = \begin{bmatrix} 4 \\ 0 \\ 1 \end{bmatrix}, \quad z = \begin{bmatrix} 3 \\ 0 \end{bmatrix}, \quad A = \begin{bmatrix} 4 & 0 & 2 \\ 1 & 5 & 3 \end{bmatrix}$$

Solution (Q3):
Which of the following vector and matrix operations are well-defined?

1. $\alpha x$ — Yes
2. $\alpha A$ — Yes
3. $x + y$ — Yes (both in $\mathbb{R}^3$)
4. $x^T y$ — Yes
5. $x^T z$ — No ($x \in \mathbb{R}^3$, $z \in \mathbb{R}^2$ have different lengths)
6. $z x^T$ — Yes (the outer product is defined for vectors of any lengths)
7. $Az$ — No ($A$ has 3 columns but $z \in \mathbb{R}^2$)
8. $z^T A y$ — Yes ($1 \times 2$ times $2 \times 3$ times $3 \times 1$ gives a scalar)

Transpose of a Matrix
Transpose: the transpose of matrix $A \in \mathbb{R}^{m \times n}$ is the matrix $A^T \in \mathbb{R}^{n \times m}$ formed by exchanging the rows and columns of $A$:
$$(A^T)_{ij} = A_{ji}$$
Properties:
$$(A^T)^T = A, \qquad (A + B)^T = A^T + B^T, \qquad (AB)^T = B^T A^T, \qquad (\alpha B)^T = \alpha (B^T)$$

Trace of a Matrix
Trace: the trace of a square matrix $A \in \mathbb{R}^{n \times n}$ is the sum of its diagonal entries:
$$\mathrm{tr}(A) = \sum_{i=1}^n a_{ii}$$
The trace is a linear map and has the following properties:
- $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$
- $\mathrm{tr}(\alpha A) = \alpha\, \mathrm{tr}(A)$
- $\mathrm{tr}(A) = \mathrm{tr}(A^T)$
- $\mathrm{tr}(AB) = \mathrm{tr}(BA)$

The trace of a matrix product functions similarly to the inner product of vectors:
$$\mathrm{tr}(A^T B) = \sum_{i,j} A_{ij} B_{ij}, \qquad A, B \in \mathbb{R}^{m \times n}$$

Matrix norms
Induced norm (operator norm) corresponding to vector norm $\|\cdot\|$:
$$\|A\| = \max_{\|x\| = 1} \|Ax\|$$
- $\|A\|_1$: maximum absolute column sum
- $\|A\|_\infty$: maximum absolute row sum
- $\|A\|_2 = \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}(A)$ (spectral norm)

Frobenius norm:
$$\|A\|_F = \sqrt{ \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 } = \sqrt{ \mathrm{trace}(A^T A) }$$
The Frobenius norm is equivalent to $\|\mathrm{vec}(A)\|_2$.
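A minimal NumPy sketch checking the column-sum and row-sum characterizations against `numpy.linalg.norm` (the matrix is illustrative):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

col_sum = np.abs(A).sum(axis=0).max()   # ||A||_1: max absolute column sum
row_sum = np.abs(A).sum(axis=1).max()   # ||A||_inf: max absolute row sum
spectral = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())  # ||A||_2

assert col_sum == np.linalg.norm(A, 1)
assert row_sum == np.linalg.norm(A, np.inf)
assert np.isclose(spectral, np.linalg.norm(A, 2))
print(col_sum, row_sum, spectral, np.linalg.norm(A, 'fro'))
```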

Geometric Interpretation of Matrix Norms

diagram from Trefethen and Bau.



Properties of Matrix Norms

Matrix norms satisfy the same properties as vector norms:
- Scale invariance: $\|\alpha A\| = |\alpha| \|A\|$
- Triangle inequality: $\|A + B\| \le \|A\| + \|B\|$
- Positivity: $\|A\| \ge 0$, and $\|A\| = 0$ only if $A = 0$

Some matrix norms, including induced norms and the Frobenius norm, also satisfy the submultiplicative property:
$$\|Ax\| \le \|A\| \|x\|, \qquad \text{and} \qquad \|AB\| \le \|A\| \|B\|$$

A few words on: Vector and Matrix Norms


Vector norms give us a way to measure how big a vector is.

The 2-norm (Euclidean norm) is most common: it has a nice relationship with the inner product, $\|v\|_2^2 = v^T v$, and fits with our usual geometric notions of length and distance.

The choice of norm may depend on the application. Let $y \in \mathbb{R}^n$ be a set of $n$ measurements, and let $\hat y \in \mathbb{R}^n$ be the estimates for $y$ from a model. Let $r = y - \hat y$ be the difference between the measured values and the model estimates. We want to optimize our model to make $r$ as small as possible, but how should we measure "small"?
- $\|r\|_\infty$: use this if no error should exceed a prescribed value.
- $\|r\|_2$: use this if large errors are bad, but small errors are OK.
- $\|r\|_1$: use this norm if you want your measurement and estimate to match exactly for as many samples as possible.

Matrix norms are an extension of vector norms to matrices.

Questions?


Mini-Quiz 4:
Determine whether each statement given below is True or False:
1. There exist matrices $A, B \ne 0$ such that $AB = 0$.
2. The trace is invariant under cyclic permutations: $\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$.
3. The definition of the induced norm allows us to use different vector norms on the numerator and denominator: $\|A\|_{\alpha,\beta} = \max_{\|x\|_\beta = 1} \|Ax\|_\alpha$, and $\|A\|_{\infty,1}$ is equal to the largest element of $A$ in absolute value.

Solutions (Q4):
Determine whether each statement given below is True or False:

There exist matrices $A, B \ne 0$ such that $AB = 0$.
TRUE: For example, when $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} -2 & 4 \\ 1 & -2 \end{bmatrix}$.

The trace is invariant under cyclic permutations: $\mathrm{tr}(ABC) = \mathrm{tr}(CAB) = \mathrm{tr}(BCA)$.
TRUE. Use the property that $\mathrm{tr}(XY) = \mathrm{tr}(YX)$:
- To show that $\mathrm{tr}(ABC) = \mathrm{tr}(CAB)$, let $X = AB$ and $Y = C$.
- To show that $\mathrm{tr}(ABC) = \mathrm{tr}(BCA)$, let $X = A$ and $Y = BC$.

Solutions (Q4):
Determine whether each statement given below is True or False:

The definition of the induced norm allows us to use different vector norms on the numerator and denominator, $\|A\|_{\alpha,\beta} = \max_{\|x\|_\beta = 1} \|Ax\|_\alpha$, and $\|A\|_{\infty,1}$ is equal to the largest element of $A$ in absolute value.
TRUE:
$$\|A\|_{\infty,1} = \max_{\|x\|_1 = 1} \|Ax\|_\infty = \max_{\|x\|_1 = 1} \, \max_i |\tilde a_i^T x| = \max_{i,j} |a_{ij}|.$$

Special Matrices: Diagonal, Identity


Diagonal matrix: a matrix is diagonal if the only non-zero entries are on the matrix diagonal:
$$D_{ij} = \begin{cases} d_i & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}, \qquad D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}$$

Identity matrix: the identity $I_n \in \mathbb{R}^{n \times n}$ is a diagonal matrix with 1s along the diagonal (denoted by $I$ when the size is unambiguous):
$$I_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j \end{cases}, \qquad I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$
$$A I_n = I_m A = A \ \text{ for rectangular } A \in \mathbb{R}^{m \times n}, \qquad AI = IA = A \ \text{ for square } A \in \mathbb{R}^{n \times n}$$

Special Matrices: Triangular


Triangular matrix: a matrix is lower/upper triangular if all of the entries above/below the diagonal are zero:
$$L = \begin{bmatrix} l_{11} & & & \\ l_{21} & l_{22} & & \\ \vdots & \vdots & \ddots & \\ l_{m1} & l_{m2} & \cdots & l_{mn} \end{bmatrix}, \qquad U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ & u_{22} & \cdots & u_{2n} \\ & & \ddots & \vdots \\ & & & u_{mn} \end{bmatrix}$$
- The sum of two upper triangular matrices is upper triangular.
- The product of two upper triangular matrices is upper triangular.
- The inverse of an invertible upper triangular matrix is upper triangular.

Special Matrices: Symmetric, Orthogonal


Symmetric matrix: a square matrix is symmetric if it is equal to its own transpose:
$$A = A^T \in \mathbb{R}^{n \times n}, \qquad A_{ij} = A_{ji}$$

Orthogonal (unitary) matrix: a square matrix $U \in \mathbb{R}^{n \times n}$ is orthogonal (unitary) if it satisfies
$$U^T U = U U^T = I.$$
The columns of an orthogonal matrix $U$ are orthonormal.

Questions?


Mini-Quiz 5:
Determine whether each statement given below is True or False:
1. For any $A \in \mathbb{R}^{n \times n}$, $A^T A$ is symmetric.
2. For any $A \in \mathbb{R}^{n \times n}$, $A - A^T$ is symmetric.
3. Invariance under unitary multiplication: for any vector $x \in \mathbb{R}^m$ and unitary $Q \in \mathbb{R}^{m \times m}$, we have $\|Qx\|_2 = \|x\|_2$.
4. Let $D = \mathrm{diag}(d_1, \dots, d_n) \in \mathbb{R}^{n \times n}$ be any diagonal matrix; then $DA = AD$ for any matrix $A \in \mathbb{R}^{n \times n}$.

Solution (Q5):

Determine whether each statement given below is True or False:

For any $A \in \mathbb{R}^{n \times n}$, $A^T A$ is symmetric.
TRUE: $(A^T A)^T = A^T (A^T)^T = A^T A$.

For any $A \in \mathbb{R}^{n \times n}$, $A - A^T$ is symmetric.
FALSE: For example, when $A = \begin{bmatrix} 1 & 1 \\ -1 & 0 \end{bmatrix}$, $\ A - A^T = \begin{bmatrix} 0 & 2 \\ -2 & 0 \end{bmatrix}$.

Solution (Q5):
Invariance under unitary multiplication: for any vector $x \in \mathbb{R}^m$ and unitary $Q \in \mathbb{R}^{m \times m}$, we have $\|Qx\|_2 = \|x\|_2$.
TRUE: $\|Qx\|_2^2 = (Qx)^T (Qx) = x^T Q^T Q x = x^T x = \|x\|_2^2$.

Similarly, for the matrix 2-norm and Frobenius norm, we have:
$$\|QA\|_2^2 = \lambda_{\max}\big((QA)^T (QA)\big) = \lambda_{\max}(A^T Q^T Q A) = \lambda_{\max}(A^T A) = \|A\|_2^2$$
$$\|QA\|_F^2 = \mathrm{tr}\big((QA)^T (QA)\big) = \mathrm{tr}(A^T A) = \|A\|_F^2$$

Solution (Q5):
Determine whether each statement given below is True or False:

Let $D = \mathrm{diag}(d_1, \dots, d_n) \in \mathbb{R}^{n \times n}$ be any diagonal matrix; then $DA = AD$ for any matrix $A \in \mathbb{R}^{n \times n}$.
FALSE. $DA$ rescales the rows of $A$ while $AD$ rescales the columns of $A$:
$$DA = \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} \begin{bmatrix} \tilde a_1^T \\ \vdots \\ \tilde a_n^T \end{bmatrix} = \begin{bmatrix} d_1 \tilde a_1^T \\ \vdots \\ d_n \tilde a_n^T \end{bmatrix} \ \ne \ \begin{bmatrix} d_1 a_1 & \cdots & d_n a_n \end{bmatrix} = \begin{bmatrix} a_1 & \cdots & a_n \end{bmatrix} \begin{bmatrix} d_1 & & \\ & \ddots & \\ & & d_n \end{bmatrix} = AD$$

Vector Spaces


Linear Transformations
Linear transformation: $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if for all $x, y \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}$ the following properties hold:
- $T(x + y) = T(x) + T(y)$
- $T(\alpha x) = \alpha\, T(x)$

Theorem: Let $A \in \mathbb{R}^{m \times n}$ be a matrix and define the function $T_A(x) = Ax$. Then $T_A : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation.
- Matrix multiplication is a linear function.

Theorem: Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be linear; then there exists a matrix $A$ such that $T(x) = Ax$.
- There is a matrix for every linear function.

Linear Combinations
Let $v_1, v_2, \dots, v_r \in \mathbb{R}^m$ be a set of vectors. Then any vector $v$ which can be written in the form
$$v = \alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_r v_r, \qquad \alpha_i \in \mathbb{R}, \ i = 1, \dots, r$$
is a linear combination of the vectors $v_1, v_2, \dots, v_r$.

Span: the set of all linear combinations of vectors $v_1, \dots, v_r \in \mathbb{R}^m$ is called the span of $\{v_1, \dots, v_r\}$.
- $\mathrm{span}\{v_1, \dots, v_r\}$ is always a subspace of $\mathbb{R}^m$.
- If $S = \mathrm{span}\{v_1, \dots, v_r\}$, then $S$ is spanned by $v_1, v_2, \dots, v_r$.

A set $S$ is called a subspace if it is closed under vector addition and scalar multiplication:
- if $x, y \in S$ and $\alpha \in \mathbb{R}$, then $x + y \in S$ and $\alpha x \in S$.

Linear Independence
Linear dependence: a set of vectors $v_1, v_2, \dots, v_r$ is linearly dependent if there exists a set of scalars $\alpha_1, \alpha_2, \dots, \alpha_r \in \mathbb{R}$ with at least one $\alpha_i \ne 0$ such that
$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_r v_r = 0.$$
That is, a set of vectors is linearly dependent if one of the vectors in the set can be written as a linear combination of one or more other vectors in the set.

Linear independence: a set of vectors $v_1, v_2, \dots, v_r$ is linearly independent if it is not linearly dependent. That is,
$$\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_r v_r = 0 \implies \alpha_1 = \cdots = \alpha_r = 0$$

Questions?


Mini-Quiz 6:
Are the following sets of vectors linearly independent or linearly dependent?
1. $\{a, b, c\}$  2. $\{b, c, d\}$  3. $\{a, b, d\}$  4. $\{a, b, c, d\}$
$$a = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad c = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad d = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$$

Solution (Q6):
Are the following sets of vectors linearly independent or linearly dependent?
1. $\{a, b, c\}$ — independent
2. $\{b, c, d\}$ — dependent: $b - 2c = d$
3. $\{a, b, d\}$ — independent
4. $\{a, b, c, d\}$ — dependent: there are at most 3 linearly independent vectors in $\mathbb{R}^3$
$$a = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad b = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad c = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad d = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$$

Basis and Dimension


A basis for a subspace $S$ is a linearly independent set of vectors that span $S$.
- The basis for a subspace $S$ is not unique, but all bases for the subspace contain the same number of vectors.
- For example, consider the case where $S = \mathbb{R}^2$:
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\} \quad \text{and} \quad \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}$$
are both bases that span $\mathbb{R}^2$.

The dimension of the subspace $S$, $\dim(S)$, is the number of linearly independent vectors in the basis for $S$ (in the example above, $\dim(S) = 2$).

Theorem (unique representation): if vectors $v_1, \dots, v_n$ are a basis for subspace $S$, then every vector in $S$ can be uniquely represented as a linear combination of these basis vectors.

Range and Nullspace


The range (column space, image) of a matrix $A \in \mathbb{R}^{m \times n}$, denoted by $R(A)$, is the set of all linear combinations of the columns of $A$:
$$R(A) := \{ Ax \mid x \in \mathbb{R}^n \}, \qquad R(A) \subseteq \mathbb{R}^m$$

The nullspace (kernel) of a matrix $A \in \mathbb{R}^{m \times n}$, denoted by $N(A)$, is the set of vectors $z$ such that $Az = 0$:
$$N(A) := \{ z \in \mathbb{R}^n \mid Az = 0 \}, \qquad N(A) \subseteq \mathbb{R}^n$$

The range and nullspace of $A^T$ are called the row space and left nullspace of $A$.
- These four subspaces of $A$ are intrinsic to $A$ and do not depend on the choice of basis.

Rank of a Matrix

The column rank of $A$, denoted $\mathrm{rank}(A)$, is the dimension of $R(A)$. The row rank of $A$ is the dimension of $R(A^T)$.

Rank of a matrix: the column rank of a matrix is always equal to the row rank, and therefore we refer to this value as the rank of $A$.

A matrix $A \in \mathbb{R}^{m \times n}$ is full rank if $\mathrm{rank}(A) = \min\{m, n\}$.

Fundamental Theorem of Linear Algebra


Fundamental theorem of linear algebra:
1. The nullspace of $A$ is the orthogonal complement of the row space: $N(A) = \left( R(A^T) \right)^\perp$
2. The left nullspace of $A$ is the orthogonal complement of the column space: $N(A^T) = \left( R(A) \right)^\perp$

Corollary (rank-nullity):
$$\dim\left( R(A) \right) + \dim\left( N(A) \right) = n, \qquad A \in \mathbb{R}^{m \times n}$$

Questions?


Mini-Quiz 7:

Give the range $R(\cdot)$ and nullspace $N(\cdot)$ for the matrices
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 2}, \qquad B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \in \mathbb{R}^{2 \times 3}.$$
What are their dimensions, $\dim(R(\cdot))$ and $\dim(N(\cdot))$?

Solution (Q7):
Give the range $R(\cdot)$ and nullspace $N(\cdot)$ for the matrices
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{3 \times 2}, \qquad B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \in \mathbb{R}^{2 \times 3}.$$
What are their dimensions, $\dim(R(\cdot))$ and $\dim(N(\cdot))$?

Solution:
$$R(A) = \left\{ \alpha \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} 0 \\ 2 \\ 0 \end{bmatrix} \,\middle|\, \alpha, \beta \in \mathbb{R} \right\}, \qquad N(A) = \{0\}$$
$$R(B) = \left\{ \alpha \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} 0 \\ 1 \end{bmatrix} \,\middle|\, \alpha, \beta \in \mathbb{R} \right\} = \mathbb{R}^2, \qquad N(B) = \left\{ \gamma \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \,\middle|\, \gamma \in \mathbb{R} \right\}$$
$$\dim(R(A)) = 2, \quad \dim(N(A)) = 0, \quad \dim(R(B)) = 2, \quad \dim(N(B)) = 1$$

Solving Linear Systems


Nonsingular Matrix
Theorem: a matrix $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ has full rank if and only if it maps no two distinct vectors to the same vector:
$$Ax = Ay \implies x = y.$$

Singular matrix: a square matrix $A \in \mathbb{R}^{n \times n}$ that is not full rank.
Nonsingular matrix: a square matrix $A \in \mathbb{R}^{n \times n}$ of full rank.

If $A$ is nonsingular, we can uniquely express any vector $y \in \mathbb{R}^n$ as $y = Ax$ for some unique $x \in \mathbb{R}^n$. If we choose the vectors $x_i$ such that $Ax_i = e_i$, $i = 1, \dots, n$, we can write
$$AX = A \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} Ax_1 & \cdots & Ax_n \end{bmatrix} = \begin{bmatrix} e_1 & \cdots & e_n \end{bmatrix} = I$$

Matrix Inverse
Matrix inverse: the inverse of a square matrix $A \in \mathbb{R}^{n \times n}$ is the matrix $B \in \mathbb{R}^{n \times n}$ such that
$$AB = BA = I, \qquad \text{where } I \text{ is the identity matrix.}$$
- The inverse of $A$ is denoted by $A^{-1}$.
- Any square, nonsingular matrix $A$ has a unique inverse $A^{-1}$ satisfying $AA^{-1} = A^{-1}A = I$.
- $(AB)^{-1} = B^{-1} A^{-1}$ (assuming both inverses exist).
- $(A^T)^{-1} = (A^{-1})^T$

If $Ax = b$, then $A^{-1} b$ gives the vector of coefficients in the linear combination of the columns of $A$ that yields $b$:
$$b = Ax = x_1 a_1 + \cdots + x_n a_n, \qquad x = A^{-1} b = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$

Invertible Matrix Theorem

Let $A \in \mathbb{R}^{n \times n}$ be a square matrix; then the following statements are equivalent:
- $A$ is invertible; $A$ is nonsingular.
- There exists a matrix $A^{-1}$ such that $AA^{-1} = I_n = A^{-1}A$.
- $Ax = b$ has exactly one solution for each $b \in \mathbb{R}^n$.
- $Az = 0$ has only the trivial solution $z = 0$, i.e. $N(A) = \{0\}$.
- The columns of $A$ are linearly independent.
- $A$ is full rank, i.e. $\mathrm{rank}(A) = n$.
- $\det(A) \ne 0$.
- $0$ is not an eigenvalue of $A$.

Questions?


Mini-Quiz 8:

If $A, B \in \mathbb{R}^{n \times n}$ are two invertible matrices, which of the following formulas are necessarily true?
1. $A + B$ is invertible and $(A + B)^{-1} = A^{-1} + B^{-1}$.
2. $(A + B)^2 = A^2 + 2AB + B^2$.
3. $(ABA^{-1})^3 = AB^3A^{-1}$.
4. $A^{-1}B$ is invertible and $(A^{-1}B)^{-1} = B^{-1}A$.

Solutions (Q8):
If $A, B \in \mathbb{R}^{n \times n}$ are two invertible matrices, which of the following formulas are necessarily true?

$A + B$ is invertible and $(A + B)^{-1} = A^{-1} + B^{-1}$.
FALSE: Let
$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad A^{-1} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}, \quad B^{-1} = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}, \quad A + B = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
Then
$$(A + B)^{-1} = \begin{bmatrix} \tfrac{2}{3} & -\tfrac{1}{3} \\ -\tfrac{1}{3} & \tfrac{2}{3} \end{bmatrix} \ \ne \ A^{-1} + B^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}.$$

$(A + B)^2 = A^2 + 2AB + B^2$.
FALSE: $(A + B)^2 = A^2 + AB + BA + B^2$, and in general $AB \ne BA$.

Solutions (Q8):
If $A, B \in \mathbb{R}^{n \times n}$ are two invertible matrices, which of the following formulas are necessarily true?

$(ABA^{-1})^3 = AB^3A^{-1}$. TRUE:
$$(ABA^{-1})^3 = (ABA^{-1})(ABA^{-1})(ABA^{-1}) = AB(A^{-1}A)B(A^{-1}A)BA^{-1} = AB^3A^{-1}$$

$A^{-1}B$ is invertible and $(A^{-1}B)^{-1} = B^{-1}A$. TRUE:
$$(A^{-1}B)(B^{-1}A) = A^{-1}(BB^{-1})A = A^{-1}A = I_n$$

Systems of Linear Equations


Example: Find values $x_1, x_2 \in \mathbb{R}$ that satisfy
$$2x_1 + x_2 = 7 \qquad \text{and} \qquad x_1 - 3x_2 = -7$$
(for example, if we want to find the intersection point of two lines).

This can be written as a matrix equation:
$$\begin{bmatrix} 2 & 1 \\ 1 & -3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 7 \\ -7 \end{bmatrix} \qquad \text{or} \qquad Ax = b,$$
where $A = \begin{bmatrix} 2 & 1 \\ 1 & -3 \end{bmatrix}$, $x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, and $b = \begin{bmatrix} 7 \\ -7 \end{bmatrix}$.

Linear Systems
One of the fundamental problems in linear algebra: given $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, find $x \in \mathbb{R}^n$ such that
$$Ax = b.$$
The set of solutions of a linear system can behave in three ways:
1. The system has one unique solution.
2. The system has infinitely many solutions.
3. The system has no solution.

For a solution $x$ to exist, we must have $b \in R(A)$.
- If $b \in R(A)$ and $\mathrm{rank}(A) = n$, then there is a unique solution (usually square systems: $m = n$).
- If $b \in R(A)$ and $\mathrm{rank}(A) < n$ (i.e. $N(A) \ne \{0\}$), then there are infinitely many solutions (usually underdetermined systems: $m < n$).
- If $b \notin R(A)$, then there is no solution (usually overdetermined systems: $m > n$).

Solution Set of Linear Systems

diagram from http://oak.ucc.nau.edu/jws8/3equations3unknowns.html



Gaussian Elimination (row reduction)


How do we solve $Ax = b$, $A \in \mathbb{R}^{n \times n}$?

For some types of matrices, $Ax = b$ is easy to solve:
- Diagonal matrices: the equations are independent, so $x_i = b_i / a_{ii}$ for all $i$.
- Upper/lower triangular matrices: back/forward substitution.

Gaussian elimination: transform the system $Ax = b$ into an upper triangular system $Ux = y$ using elementary row operations.

The following are elementary row operations that can be applied to the augmented system $[\, A \mid b \,]$ to introduce zeros below the diagonal without changing the solution set $x$:
1. Multiply a row by a scalar.
2. Add scalar multiples of one row to another.
3. Permute rows.

Step 1: For $A \in \mathbb{R}^{3 \times 3}$, $b \in \mathbb{R}^3$, form the augmented system $[\, A \mid b \,]$.

Step 2: Compute the row echelon form using row operations $L_1, L_2$, introducing zeros below the diagonal one column at a time:
$$[\, A \mid b \,] \ \xrightarrow{\ L_1\ } \ \begin{bmatrix} \times & \times & \times & \mid & \times \\ 0 & + & + & \mid & + \\ 0 & + & + & \mid & + \end{bmatrix} \ \xrightarrow{\ L_2\ } \ \begin{bmatrix} \times & \times & \times & \mid & \times \\ 0 & + & + & \mid & + \\ 0 & 0 & + & \mid & + \end{bmatrix} = [\, U \mid y \,]$$

Step 3: Solve the triangular system $Ux = y$ by back substitution: $x = U^{-1} y$.
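A minimal NumPy sketch of the procedure above, without pivoting and for illustration only (practical codes add pivoting, discussed below):

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b by row reduction to Ux = y, then back substitution.
    Assumes no zero pivots are encountered (no pivoting, for clarity)."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):                 # eliminate below pivot k
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # multiplier
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):         # back substitution
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0], [1.0, -3.0]])
b = np.array([7.0, -7.0])
print(gaussian_elimination(A, b))  # [2. 3.], matches np.linalg.solve(A, b)
```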

Questions?


Mini-Quiz 9:
Determine the lower triangular matrices $L_i$ that correspond to the following elementary row operations:

$L_1$ such that $L_1 A = A_1$:
$$L_1 \begin{bmatrix} 2 & 8 & 4 \\ 2 & 5 & 1 \\ 4 & 10 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 5 & 1 \\ 4 & 10 & -1 \end{bmatrix}$$
$L_2$ such that $L_2 A_1 = A_2$:
$$L_2 \begin{bmatrix} 1 & 4 & 2 \\ 2 & 5 & 1 \\ 4 & 10 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 4 & 10 & -1 \end{bmatrix}$$
$L_3$ such that $L_3 A_2 = A_3$:
$$L_3 \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 4 & 10 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 0 & -6 & -9 \end{bmatrix}$$

Solution (Q9):
1. $L_1 A = A_1$:
$$\begin{bmatrix} \tfrac12 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 8 & 4 \\ 2 & 5 & 1 \\ 4 & 10 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 2 \\ 2 & 5 & 1 \\ 4 & 10 & -1 \end{bmatrix}$$
2. $L_2 A_1 = A_2$:
$$\begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 4 & 2 \\ 2 & 5 & 1 \\ 4 & 10 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 4 & 10 & -1 \end{bmatrix}$$
3. $L_3 A_2 = A_3$:
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -4 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 4 & 10 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 0 & -6 & -9 \end{bmatrix}$$
Combined, $L_3 L_2 L_1$ such that $L_3 L_2 L_1 A = A_3$:
$$\begin{bmatrix} \tfrac12 & 0 & 0 \\ -1 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 8 & 4 \\ 2 & 5 & 1 \\ 4 & 10 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 2 \\ 0 & -3 & -3 \\ 0 & -6 & -9 \end{bmatrix}$$

Gaussian Elimination and LU Decomposition


We can think of the elementary row operations applied to $A$ as linear transformations, which can be represented as a series of lower triangular matrices $L_1, \dots, L_{n-1}$:
$$L_{n-1} \cdots L_1 A = U \qquad \text{or} \qquad L^{-1} A = U, \quad L = L_1^{-1} \cdots L_{n-1}^{-1},$$
and rearranging yields $A = LU$.

If we can factor $A$ as $A = LU$, then we can solve the system $Ax = b$ by solving two triangular systems:
- Solve $Ly = b$ using forward substitution.
- Solve $Ux = y$ using backward substitution.

Gaussian elimination implicitly computes the LU factorization.
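A minimal SciPy sketch of this two-triangular-solve strategy (the matrix and right-hand side are illustrative; `scipy.linalg.lu_factor` returns a pivoted LU factorization, consistent with the partial pivoting discussed next):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0,  8.0,  4.0],
              [2.0,  5.0,  1.0],
              [4.0, 10.0, -1.0]])
b = np.array([1.0, 2.0, 3.0])

lu, piv = lu_factor(A)      # PA = LU, stored compactly
x = lu_solve((lu, piv), b)  # forward + backward substitution
assert np.allclose(A @ x, b)
print(x)
```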

Partial Pivoting

Pure Gaussian elimination cannot be used to solve general linear systems because we may encounter a zero on the diagonal during the process, e.g.
$$A = \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix}.$$
In practical implementations of the algorithm, the rows of the matrix must be permuted to avoid division by zero.

Partial pivoting swaps rows if an entry below the diagonal of the current column is larger in absolute value than the current diagonal entry (pivot element).

Permutations
The reordering, or permutation, of the rows in partial pivoting can be represented as a linear transformation.

Permutation matrix: a square, binary matrix $P \in \mathbb{R}^{n \times n}$ that has exactly one entry 1 in each row and each column.
- Left-multiplication of a matrix $A$ by a permutation matrix reorders the rows of $A$, while right-multiplication reorders the columns of $A$.

For example, let
$$P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \qquad A = \begin{bmatrix} 1 & 2 & 3 \\ 10 & 20 & 30 \\ 100 & 200 & 300 \end{bmatrix};$$
then the row and column permutations are, respectively:
$$PA = \begin{bmatrix} 10 & 20 & 30 \\ 100 & 200 & 300 \\ 1 & 2 & 3 \end{bmatrix}, \qquad AP = \begin{bmatrix} 3 & 1 & 2 \\ 30 & 10 & 20 \\ 300 & 100 & 200 \end{bmatrix}.$$

LU Decomposition
Let $A \in \mathbb{R}^{n \times n}$ be a square matrix; then the LU decomposition factors $A$ into the product of a lower triangular matrix $L \in \mathbb{R}^{n \times n}$ and an upper triangular matrix $U \in \mathbb{R}^{n \times n}$:
$$A = LU.$$
When partial pivoting is used to permute the rows, we obtain a decomposition of the form $L_{n-1} P_{n-1} \cdots L_1 P_1 A = U$, which can be written in the form $L^{-1} P A = U$ (see Trefethen and Bau, Chapter 21, for details).

In general, any square matrix $A \in \mathbb{R}^{n \times n}$ (singular or nonsingular) has a factorization
$$PA = LU,$$
where $P$ is a permutation matrix.

Cholesky and Positive Definiteness


Positive definite matrix: a matrix $A \in \mathbb{R}^{n \times n}$ is positive definite if
$$z^T A z > 0 \quad \text{for every } z \ne 0, \qquad \text{often denoted by } A \succ 0.$$
Example:
$$A = \begin{bmatrix} 4 & -2 \\ -2 & 10 \end{bmatrix}, \qquad x^T A x = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 4 & -2 \\ -2 & 10 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 4x_1^2 - 4 x_1 x_2 + 10 x_2^2 = (2x_1 - x_2)^2 + 9 x_2^2 > 0 \quad \forall x \ne 0$$

- Positive semi-definite matrix: $z^T A z \ge 0$ for every $z \in \mathbb{R}^n$.
- Negative definite matrix: $z^T A z < 0$ for every $z \ne 0$.

... Cholesky and Positive Definiteness


A symmetric matrix $A$ is positive definite ($A \succ 0$) if and only if there exists a unique lower triangular matrix $L$ such that
$$A = L L^T.$$
This factorization is called the Cholesky decomposition of $A$. The Cholesky algorithm, a modified version of Gaussian elimination, is used to compute the factor $L$.
- Basic idea: in Gaussian elimination we use row operations to introduce zeros below the diagonal in each column. To maintain symmetry, we can apply the same operations to the columns of the matrix to introduce zeros in the first row:
$$A \ \xrightarrow{\ L_1 A\ } \ \begin{bmatrix} \times & \times & \times \\ 0 & + & + \\ 0 & + & + \end{bmatrix} \ \xrightarrow{\ (L_1 A) L_1^T\ } \ \begin{bmatrix} \times & 0 & 0 \\ 0 & + & + \\ 0 & + & + \end{bmatrix} = A_2 = L_1 A L_1^T$$
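A minimal NumPy sketch using the example matrix above; `numpy.linalg.cholesky` returns the lower triangular factor:

```python
import numpy as np

A = np.array([[ 4.0, -2.0],
              [-2.0, 10.0]])   # symmetric positive definite

L = np.linalg.cholesky(A)      # lower triangular, A = L L^T
assert np.allclose(L @ L.T, A)
print(L)
# [[ 2.  0.]
#  [-1.  3.]]
```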

Questions?


Mini-Quiz 10:
Quadratic forms: a function $q(x) : \mathbb{R}^n \to \mathbb{R}$ is a quadratic form if it is a linear combination of functions of the form $x_i x_j$. A quadratic form can be written as
$$q(x) = x^T A x,$$
where $A \in \mathbb{R}^{n \times n}$ is symmetric.

1. Let $B \in \mathbb{R}^{m \times n}$. Show that $q(x) = \|Bx\|^2$ is a quadratic form, find $A$ such that $q(x) = x^T A x$, and determine the definiteness of $A$.
2. Given $A, B$, find the permutation matrix $P$ such that $PBP = A$:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 3 & 2 \\ 7 & 9 & 8 \\ 4 & 6 & 5 \end{bmatrix}$$

Solution (Q10):
Let $B \in \mathbb{R}^{m \times n}$. Show that $q(x) = \|Bx\|^2$ is a quadratic form, find $A$ such that $q(x) = x^T A x$, and determine the definiteness of $A$.
Solution:
$$q(x) = \|Bx\|^2 = (Bx)^T (Bx) = x^T B^T B x = x^T A x, \qquad \text{where } A = B^T B.$$
$A$ is positive semi-definite, since $q(x) = \|Bx\|^2 \ge 0$ for all $x \in \mathbb{R}^n$.

Given $A, B$, find the permutation matrix $P$ such that $PBP = A$:
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 3 & 2 \\ 7 & 9 & 8 \\ 4 & 6 & 5 \end{bmatrix}, \quad \text{and} \quad P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$$
($PB$ swaps the last two rows of $B$, and $(PB)P$ swaps the last two columns, which recovers $A$.)

Orthogonalization


Orthogonal (Unitary) Matrices


Recall that two vectors $x, y \in \mathbb{R}^n$ are orthogonal if $x^T y = 0$, and a set of vectors $\{u_1, \dots, u_n\}$ is orthonormal if
$$u_i^T u_j = \begin{cases} 1 & \text{for } i = j \\ 0 & \text{for } i \ne j \end{cases}, \qquad \|u_i\| = 1 \ \ \forall i.$$

Orthogonal (unitary) matrix: a matrix $Q \in \mathbb{R}^{n \times n}$ is orthogonal if its columns $q_1, \dots, q_n \in \mathbb{R}^n$ form an orthonormal set:
$$Q^T Q = \begin{bmatrix} q_1^T \\ \vdots \\ q_n^T \end{bmatrix} \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix} = I_n$$
- If $Q$ is orthogonal, then $Q^{-1} = Q^T$ (the inverse is easy to compute!).
- Orthogonal transformations preserve lengths and angles, and represent rotations or reflections:
$$\|Qx\|_2 = \sqrt{x^T Q^T Q x} = \|x\|_2$$

Projectors

Recall our notation for the projection of vector $v$ onto $u$:
$$\mathrm{proj}_u v = \left( \frac{u^T v}{\|u\|} \right) \frac{u}{\|u\|}$$
We can generalize the idea to projections onto subspaces of $\mathbb{R}^n$.

Projector: a square matrix $P \in \mathbb{R}^{n \times n}$ is a projector if $P = P^2$.
- For $v \in \mathbb{R}^n$, $Pv$ represents the projection of $v$ into the range of $P$.
- If $v \in R(P)$ (i.e. $\exists x : v = Px$), then $Pv = P^2 x = Px = v$.

Orthogonal Projectors

If $x \in \mathbb{R}^m$ and $A \in \mathbb{R}^{m \times n}$ is a matrix whose columns span a subspace of $\mathbb{R}^m$, then $x$ can be decomposed as
$$x = x_\parallel + x_\perp, \qquad \text{where } x_\parallel \in R(A) \text{ and } x_\perp \in R(A)^\perp.$$
$x_\parallel$ is the orthogonal projection of $x$ onto $R(A)$.
- $x_\parallel$ is the vector in $R(A)$ closest to $x$, in the sense that
$$\|x - x_\parallel\| < \|x - v\| \qquad \forall v \in R(A), \ v \ne x_\parallel.$$

... Orthogonal Projectors


An orthogonal projector is a projector for which the range and nullspace are orthogonal complements.
- An orthogonal projector satisfies $P^T = P$.
- If $u \in \mathbb{R}^n$ is a unit vector, then $P_u = uu^T$ is the orthogonal projection onto the line containing $u$.

If $u_1, \dots, u_k$ are an orthonormal basis for subspace $V$ and $A = [\, u_1 \ \cdots \ u_k \,]$, then the projection onto $V$ is
$$P_A = A A^T,$$
or equivalently:
$$\mathrm{proj}_A x = P_A x = (u_1^T x) u_1 + \cdots + (u_k^T x) u_k = \mathrm{proj}_{u_1} x + \cdots + \mathrm{proj}_{u_k} x$$

Questions?


Mini-Quiz 11:
1. Show that if $P$ is an orthogonal projector, then $I - 2P$ is orthogonal.
Hint: $P^T = P$ for orthogonal projectors and $P^2 = P$ for all projectors. $Q$ is orthogonal if $Q^T Q = I$.

2. Find the matrix for the orthogonal projector $P_A$ onto $R(A)$:
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
Hint: The columns of $A$ are orthogonal, but not orthonormal, because the first column does not have length 1. The projection onto $R(Q)$ is $P_Q = QQ^T$ if the columns of $Q$ are orthonormal.

Solution (Q11):
1. Show that if $P$ is an orthogonal projector, then $I - 2P$ is orthogonal. Solution:
$$(I - 2P)^T (I - 2P) = I - 2P^T - 2P + 4P^T P = I - 4P + 4P^2 = I - 4P + 4P = I$$
using $P^T = P$ and then $P^2 = P$.

2. Find the orthogonal projector $P_A$ onto $R(A)$ for $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}$. Solution: normalizing the first column gives
$$R(A) = R(Q), \quad Q = \begin{bmatrix} \tfrac{1}{\sqrt 2} & 0 \\ 0 & 1 \\ \tfrac{1}{\sqrt 2} & 0 \end{bmatrix}, \qquad P_A = P_Q = QQ^T = \begin{bmatrix} \tfrac12 & 0 & \tfrac12 \\ 0 & 1 & 0 \\ \tfrac12 & 0 & \tfrac12 \end{bmatrix}$$

Gram-Schmidt Orthogonalization
Gram-Schmidt algorithm: a method for computing an orthonormal basis $\{q_1, \dots, q_n\}$ for the columns of $A = [\, a_1, \dots, a_n \,] \in \mathbb{R}^{m \times n}$. Each vector is orthogonalized with respect to the previous vectors and then normalized:
$$v_1 = a_1, \qquad q_1 = \frac{v_1}{\|v_1\|}$$
$$v_2 = a_2 - \mathrm{proj}_{q_1} a_2, \qquad q_2 = \frac{v_2}{\|v_2\|}$$
$$\vdots$$
$$v_k = a_k - \sum_{i=1}^{k-1} \mathrm{proj}_{q_i} a_k, \qquad q_k = \frac{v_k}{\|v_k\|}$$

In matrix form: $A = QR$, where
$$Q = [\, q_1, \dots, q_n \,] \ \text{has orthonormal columns, and} \quad R_{ij} = \begin{cases} q_i^T a_j & \text{if } i \le j \\ 0 & \text{if } i > j \end{cases}$$
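A minimal NumPy sketch of classical Gram-Schmidt as written above (for illustration; in floating point, modified Gram-Schmidt or Householder QR, as used by `numpy.linalg.qr`, is preferred for stability):

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt: returns Q with orthonormal columns and
    upper triangular R such that A = QR (A assumed full column rank)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].copy()
        for i in range(k):                  # subtract projections onto q_1..q_{k-1}
            R[i, k] = Q[:, i] @ A[:, k]
            v -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

A = np.random.rand(5, 3)
Q, R = gram_schmidt(A)
assert np.allclose(Q.T @ Q, np.eye(3))      # orthonormal columns
assert np.allclose(Q @ R, A)                # A = QR
```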

QR Decomposition
In the process of orthogonalizing the columns of matrix $A$, the Gram-Schmidt algorithm computes the QR decomposition of $A$.

QR decomposition: any matrix $A \in \mathbb{R}^{m \times n}$ can be decomposed as
$$A = QR,$$
where $Q \in \mathbb{R}^{m \times m}$ is an orthogonal matrix, and $R \in \mathbb{R}^{m \times n}$ is an upper triangular matrix.

For rectangular $A \in \mathbb{R}^{m \times n}$, $m \ge n$:
$$A = QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1 R_1,$$
where $R_1 \in \mathbb{R}^{n \times n}$ is upper triangular, and $Q_1 \in \mathbb{R}^{m \times n}$ and $Q_2 \in \mathbb{R}^{m \times (m-n)}$ both have orthonormal columns.

QR Decomposition for Solving Linear Systems

Consider the linear system $Ax = b$, where $A \in \mathbb{R}^{n \times n}$ is a square, nonsingular matrix and $b \in \mathbb{R}^n$.
- The QR decomposition expresses the matrix $A = QR$ as the product of an orthogonal matrix $Q$ ($QQ^T = I_n$) and upper triangular $R$:
$$Ax = b \implies QRx = b \implies Rx = Q^T b$$
- $y = Q^T b$ can be easily computed, and $Rx = y$ is a triangular system that can be solved by back substitution.
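A minimal NumPy/SciPy sketch of this solve strategy on an illustrative system:

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[2.0, 1.0], [1.0, -3.0]])
b = np.array([7.0, -7.0])

Q, R = np.linalg.qr(A)            # A = QR
y = Q.T @ b                       # apply Q^T
x = solve_triangular(R, y)        # back substitution on Rx = y
assert np.allclose(A @ x, b)
print(x)  # [2. 3.]
```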

Questions?


Mini-Quiz 12:

Use Gram-Schmidt to compute an orthonormal basis for the columns of $A = [\, a_1 \ a_2 \ a_3 \,] \in \mathbb{R}^{3 \times 3}$.

Hint: All three columns are normalized to length 1. The first and second columns are orthogonal, and the second and third columns are orthogonal. The projection operator for normalized vectors is $\mathrm{proj}_u v = (u^T v)\, u$.

Solution (Q12):
Use Gram-Schmidt to compute an orthonormal basis for the columns of $A = [\, a_1 \ a_2 \ a_3 \,]$.

Since $a_1$ and $a_2$ are already orthonormal, Gram-Schmidt leaves them unchanged:
$$q_1 = a_1, \qquad q_2 = a_2.$$
Since $a_2^T a_3 = 0$, only the component of $a_3$ along $q_1$ must be removed:
$$v_3 = a_3 - \mathrm{proj}_{q_1} a_3 - \mathrm{proj}_{q_2} a_3 = a_3 - (a_1^T a_3)\, a_1, \qquad q_3 = \frac{v_3}{\|v_3\|}.$$

Least Squares Problems


Least Squares

We want to solve $Ax = b$, but what do we do if $b \notin R(A)$ (i.e. there does not exist a vector $x$ such that $Ax = b$)?
- For example, let $A \in \mathbb{R}^{m \times n}$, $m > n$, be a tall-and-skinny (overdetermined) matrix. Then for most $b \in \mathbb{R}^m$, there is no solution $x \in \mathbb{R}^n$ such that $Ax = b$.

Least squares problem: define the residual $r = b - Ax$ and find the vector $x$ that minimizes
$$\|r\|_2^2 = \|b - Ax\|_2^2.$$

... Least Squares

We can decompose any vector $b \in \mathbb{R}^m$ into components $b = b_1 + b_2$, with $b_1 \in R(A)$ and $b_2 \in N(A^T)$.

Since $b_2$ is in the orthogonal complement of $R(A)$, we obtain the following expression for the residual norm:
$$\|r\|_2^2 = \|b_1 - Ax + b_2\|_2^2 = \|b_1 - Ax\|_2^2 + \|b_2\|_2^2,$$
which is minimized when $Ax = b_1$ and $r = b_2 \in N(A^T)$.

Normal Equations

The least squares solution $x$ occurs when $r = b_2 \in N(A^T)$, or equivalently $A^T r = A^T (b - Ax) = 0$.

Rearranging this expression gives the normal equations:
$$A^T A x = A^T b$$
- If $A$ has full column rank, then $\tilde A = A^T A$ is invertible. Thus $\tilde A x = \tilde b$ (where $\tilde b = A^T b$) has a unique solution $x$.

Least Squares and Orthogonal Projections


If $x^\star \in \mathbb{R}^n$ is the least squares solution to $Ax = b$, then $Ax^\star$ is the orthogonal projection of $b$ onto $R(A)$:
$$x^\star = \arg\min_x \|b - Ax\| \quad \iff \quad \|b - Ax^\star\| \le \|b - Ax\| \ \ \forall x \in \mathbb{R}^n \quad \iff \quad Ax^\star = P_A b$$
($P_A$ is the projection onto $R(A)$.)
$$b - Ax^\star \in R(A)^\perp = N(A^T) \implies A^T (b - Ax^\star) = 0 \implies A^T A x^\star = A^T b$$
Combining the solution $x^\star = (A^T A)^{-1} A^T b$ and $Ax^\star = P_A b$ gives the matrix for the orthogonal projection onto $R(A)$:
$$P_A = A (A^T A)^{-1} A^T$$

Questions?


Mini-Quiz 13
Find the least squares solution of
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$$
by solving $Ax = b_1$ or $A^T A x = A^T b$.

Hints:
$$R(A) = \left\{ \alpha \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \,\middle|\, \alpha, \beta \in \mathbb{R} \right\}, \qquad N(A^T) = \left\{ \gamma \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \,\middle|\, \gamma \in \mathbb{R} \right\},$$
$$b = 1 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 1.5 \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} + 0.5 \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}$$

Solution (Q13):
Find the least squares solution of
$$\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}.$$
Solution:
$$b = b_1 + b_2, \quad b_1 = \begin{bmatrix} 1 \\ 1.5 \\ 1.5 \end{bmatrix}, \quad b_2 = \begin{bmatrix} 0 \\ 0.5 \\ -0.5 \end{bmatrix}, \qquad A^T A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}, \quad A^T b = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$
Either route gives the same answer:
$$Ax = b_1: \ \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1.5 \\ 1.5 \end{bmatrix} \implies \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1.5 \end{bmatrix}$$
$$A^T A x = A^T b: \ \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \implies \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1.5 \end{bmatrix}$$

Least Squares via QR


The normal equations $A^T A x = A^T b$ can be rewritten using the QR factorization $A = QR$:
$$A^T A x = A^T b \implies R^T Q^T Q R x = R^T Q^T b \implies R^T R x = R^T Q^T b \implies R x = Q^T b$$
Solution using the QR decomposition:
- compute the QR factorization of $A$: $A = QR$
- form $y = Q^T b$
- solve $Rx = y$ by back substitution

Least Squares via Cholesky

$M = A^T A$ is symmetric and positive definite, and can be rewritten in terms of the Cholesky factorization $M = LL^T$.
Solution using the Cholesky factorization:
- form $M = A^T A$
- compute the Cholesky factorization of $M$: $M = LL^T$
- form $y = A^T b$
- solve $Lz = y$ by forward substitution
- solve $L^T x = z$ by back substitution
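A minimal NumPy/SciPy sketch comparing the two routes on the Mini-Quiz 13 system (both should agree with `numpy.linalg.lstsq`):

```python
import numpy as np
from scipy.linalg import solve_triangular, cho_factor, cho_solve

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
b = np.array([1.0, 2.0, 1.0])

# via QR: Rx = Q^T b
Q, R = np.linalg.qr(A)                    # thin QR, A = Q R
x_qr = solve_triangular(R, Q.T @ b)

# via Cholesky on the normal equations: (L L^T) x = A^T b
c, low = cho_factor(A.T @ A)
x_chol = cho_solve((c, low), A.T @ b)

x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x_qr, x_ref) and np.allclose(x_chol, x_ref)
print(x_qr)  # [1.  1.5]
```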

Matrix Calculus
Vector-by-scalar and scalar-by-vector derivatives:
$$\alpha \in \mathbb{R}, \ x \in \mathbb{R}^n: \qquad \frac{\partial x}{\partial \alpha} = \begin{bmatrix} \frac{\partial x_1}{\partial \alpha} \\ \frac{\partial x_2}{\partial \alpha} \\ \vdots \\ \frac{\partial x_n}{\partial \alpha} \end{bmatrix}, \qquad \frac{\partial \alpha}{\partial x} = \begin{bmatrix} \frac{\partial \alpha}{\partial x_1} \\ \frac{\partial \alpha}{\partial x_2} \\ \vdots \\ \frac{\partial \alpha}{\partial x_n} \end{bmatrix}$$

Vector-by-vector derivatives:
$$y \in \mathbb{R}^m, \ x \in \mathbb{R}^n: \qquad \frac{\partial y}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}$$

... Matrix Calculus

Vector-by-vector identities (see http://en.wikipedia.org/wiki/Matrix_calculus for more properties):
$$\frac{\partial (x^T x)}{\partial x} = 2x, \qquad \frac{\partial (b^T x)}{\partial x} = b, \qquad \frac{\partial (Ax)}{\partial x} = A^T$$
$$\frac{\partial (x^T A x)}{\partial x} = \left( A + A^T \right) x = 2Ax \ \text{ if } A = A^T$$

Derivative of a quadratic:
$$\frac{\partial}{\partial x} \|b - Ax\|_2^2 = \frac{\partial}{\partial x} \left( x^T A^T A x - 2 b^T A x + b^T b \right) = 2 \left( A^T A x - A^T b \right)$$
- Setting this expression equal to zero gives the normal equations:
$$\frac{\partial}{\partial x} \|b - Ax\|_2^2 = 0 \implies A^T A x = A^T b$$

Eigenvalues and Eigenvectors


Eigenvalues and Eigenvectors

For any square matrix $A \in \mathbb{R}^{n \times n}$, there is at least one scalar $\lambda$ and a corresponding vector $v \ne 0$ such that
$$Av = \lambda v \qquad \text{or equivalently} \qquad (A - \lambda I) v = 0.$$
The scalar $\lambda$ is an eigenvalue of $A$ and $v$ is an eigenvector corresponding to the eigenvalue $\lambda$.

Geometric interpretation of Eigenvectors

Figure: Under the transformation matrix $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, the directions of vectors parallel to $v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ (blue) and $v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ (purple) are preserved.

diagram from wikipedia.org/wiki/Eigenvalues_and_eigenvectors

... Eigenvalues and Eigenvectors


The set of distinct eigenvalues of $A$ is called the spectrum of $A$ and is denoted $\lambda(A)$:
$$\lambda(A) := \{ \lambda \in \mathbb{R} \mid A - \lambda I \text{ is singular} \}$$
The magnitude of the largest eigenvalue in absolute value is called the spectral radius of $A$:
$$\rho(A) = \max_{\lambda_i \in \lambda(A)} |\lambda_i|.$$
The set of all eigenvectors associated with an eigenvalue $\lambda$ forms the eigenspace, $E_\lambda = N(A - \lambda I)$, associated with $\lambda$.

Similar Matrices
Two matrices $A$ and $B$ are similar if they share the same eigenvalues.
- Similar matrices represent the same linear transformation under two different bases.
- If $P$ is a nonsingular matrix, then $B = P^{-1} A P$ is similar to $A$. $P$ is called a similarity transformation or change of basis matrix.
- $A$ and $B$ do not in general have the same eigenvectors: if $v$ is an eigenvector of $A$, then $P^{-1} v$ is an eigenvector of $B$.

Given a square matrix $A \in \mathbb{R}^{n \times n}$, we wish to reduce it to its simplest form by means of a similarity transformation.

Diagonalizable Matrices
Diagonalizable matrix: a matrix $A \in \mathbb{R}^{n \times n}$ that is similar to a diagonal matrix, i.e. there exists an invertible matrix $P$ such that
$$P^{-1} A P = D, \qquad \text{where } D \text{ is diagonal.}$$
- A matrix is diagonalizable if and only if it has $n$ linearly independent eigenvectors.
- Matrices with $n$ distinct eigenvalues are diagonalizable.

Real symmetric matrices are diagonalizable by unitary matrices.
- Matrices are diagonalizable by unitary matrices if and only if they are normal (a matrix is normal if it satisfies $A^T A = A A^T$).

Eigendecomposition

Let $A \in \mathbb{R}^{n \times n}$ be a square, diagonalizable matrix; then $A$ can be factorized as
$$A = V \Lambda V^{-1},$$
where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal matrix whose elements are the eigenvalues of $A$, and $V$ is a square matrix whose $i$th column $v_i$ is the eigenvector corresponding to the eigenvalue $\lambda_i = \Lambda_{ii}$.
- Note: this decomposition does not exist for every square matrix $A$; e.g. $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ cannot be diagonalized.
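A minimal NumPy sketch verifying $A = V \Lambda V^{-1}$ on the symmetric example from the figure above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam, V = np.linalg.eig(A)                  # eigenvalues and eigenvectors
Lam = np.diag(lam)
assert np.allclose(V @ Lam @ np.linalg.inv(V), A)   # A = V Λ V^{-1}
print(lam)  # [3. 1.]
```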

Spectral Theorem

Any symmetric matrix $A = A^T \in \mathbb{R}^{n \times n}$ has $n$ (not necessarily distinct) eigenvalues, and $A$ can be decomposed with the symmetric eigenvalue decomposition:
$$A = \sum_{i=1}^n \lambda_i u_i u_i^T = U \Lambda U^T,$$
where $U$ is orthogonal and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is diagonal.

Properties of Eigenvalues
The trace of $A \in \mathbb{R}^{n \times n}$ is equal to the sum of its eigenvalues:
$$\mathrm{tr}(A) = \sum_{i=1}^n \lambda_i$$
The rank of a diagonalizable $A \in \mathbb{R}^{n \times n}$ is equal to the number of non-zero eigenvalues of $A$.

If $A \in \mathbb{R}^{n \times n}$ is non-singular with eigenvalue $\lambda_i \ne 0$, then $\tfrac{1}{\lambda_i}$ is an eigenvalue of $A^{-1}$.

If $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite ($A = A^T$ and $z^T A z > 0$ for all $z \ne 0$), then all of its eigenvalues are positive.

Questions?


Mini-Quiz 14

Show that if $\lambda$ is an eigenvalue of a nonsingular matrix $A$ with corresponding eigenvector $v$, then $v$ is also an eigenvector of $A^{-1}$ with eigenvalue $\lambda^{-1}$.
Hint: we know that $Av = \lambda v$ and we want to show that $A^{-1} v = \lambda^{-1} v$.

Solution (Q14)
Show that if $\lambda$ is an eigenvalue of a nonsingular matrix $A$ with corresponding eigenvector $v$, then $v$ is also an eigenvector of $A^{-1}$ with eigenvalue $\lambda^{-1}$.
$$Av = \lambda v \implies A^{-1}(Av) = A^{-1}(\lambda v) \implies v = \lambda \left( A^{-1} v \right) \implies \lambda^{-1} v = A^{-1} v$$
(Since $A$ is nonsingular, $\lambda \ne 0$, so we may divide by $\lambda$.)

Solving Eigenvalue Problems

Solving the eigenvalue problem $Av = \lambda v$ is a fundamentally more difficult problem than solving a linear system $Ax = b$.

Solving the eigenvalue problem for an $n \times n$ matrix can be reduced to solving for the roots of an $n$th degree polynomial. For $n \ge 5$, no closed-form expression exists for the roots of an arbitrary $n$th degree polynomial given its coefficients.

All algorithms that solve eigenvalue problems for matrices of arbitrary size are iterative. This contrasts with linear systems, which have algorithms that are guaranteed to produce the solution in a finite number of steps.

Solving Small Eigenvalue Problems


Recall that if $Av = \lambda v$, then $(A - \lambda I)v = 0$.
- Since $(A - \lambda I)$ has a nonzero vector, $v$, in its nullspace, it is singular (non-invertible).
- Therefore finding the eigenvalues of $A$ is equivalent to finding the values $\lambda$ for which $(A - \lambda I)$ is singular.

For a $2 \times 2$ matrix
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad \text{the matrix inverse is} \qquad A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
The quantity $ad - bc$ is called the determinant of $A$, $\det(A)$, and the inverse exists if and only if $\det(A) \ne 0$.

Therefore, for our $2 \times 2$ eigenvalue problem, we want to find the eigenvalues $\lambda$ such that $\det(A - \lambda I) = 0$:
$$(A - \lambda I) = \begin{bmatrix} (a - \lambda) & b \\ c & (d - \lambda) \end{bmatrix}, \qquad \det(A - \lambda I) = (a - \lambda)(d - \lambda) - bc = 0$$

... Solving Small Eigenvalue Problems


Example:
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad (A - \lambda I) = \begin{bmatrix} (2 - \lambda) & 1 \\ 1 & (2 - \lambda) \end{bmatrix}$$
$$\det(A - \lambda I) = (2 - \lambda)(2 - \lambda) - 1 \cdot 1 = \lambda^2 - 4\lambda + 3$$
This is a 2nd degree polynomial (i.e. a quadratic). We want to find the roots of this polynomial, $\lambda_1$ and $\lambda_2$, which will satisfy $\det(A - \lambda_i I) = 0$:
$$\lambda^2 - 4\lambda + 3 = 0 \implies (\lambda - 3)(\lambda - 1) = 0 \implies \lambda_1 = 3, \ \lambda_2 = 1$$

Power Method for Finding Eigenvectors


One method for finding the eigenvector, $v_1$, corresponding to the largest eigenvalue, $\lambda_1$, of $A \in \mathbb{R}^{n \times n}$ is called the power method: pick any vector $z \in \mathbb{R}^n$ and compute the sequence
$$\left\{ Az, A^2 z, A^3 z, A^4 z, \dots, A^k z \right\};$$
then $A^k z$ converges toward the direction of $v_1$ as $k \to \infty$.

$z$ can be written as a linear combination of the (linearly independent) eigenvectors of $A$:
$$z = \alpha_1 v_1 + \cdots + \alpha_n v_n.$$
Therefore
$$Az = A(\alpha_1 v_1 + \cdots + \alpha_n v_n) = \alpha_1 A v_1 + \cdots + \alpha_n A v_n = \alpha_1 \lambda_1 v_1 + \cdots + \alpha_n \lambda_n v_n,$$
and iterating gives
$$A^k z = \alpha_1 \lambda_1^k v_1 + \cdots + \alpha_n \lambda_n^k v_n.$$
Since $\lambda_1$ is the largest eigenvalue, $\lambda_1^k$ will dominate the sum, so $A^k z$ will converge toward the direction of $v_1$ (a sketch follows below).
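A minimal NumPy sketch of power iteration with normalization at each step (the iteration count and random starting vector are illustrative; convergence assumes $|\lambda_1| > |\lambda_2|$ and that $z$ has a component along $v_1$):

```python
import numpy as np

def power_method(A, num_iters=100):
    """Estimate the dominant eigenpair of A by power iteration."""
    z = np.random.rand(A.shape[0])
    for _ in range(num_iters):
        z = A @ z
        z /= np.linalg.norm(z)        # normalize to avoid overflow
    lam = z @ A @ z                   # Rayleigh quotient estimate of lambda_1
    return lam, z

A = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, v = power_method(A)
print(lam, v)  # approx 3.0 and v parallel to [1, 1]/sqrt(2)
```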

The Singular Value Decomposition

Singular Value Decomposition


Theorem (Singular Value Decomposition): every matrix $A \in \mathbb{R}^{m \times n}$ has a decomposition, called the singular value decomposition (SVD), of the form
$$A = U \Sigma V^T = \sum_{i=1}^r \sigma_i u_i v_i^T,$$
where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are unitary matrices and $\Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{m \times n}$ is a diagonal matrix with $\Sigma_r = \mathrm{diag}(\sigma_1, \dots, \sigma_r)$ and $r \le \min(m, n)$, $r = \mathrm{rank}(A)$.

The positive numbers $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ are called the singular values of $A$ and are uniquely determined.

Geometric Interpretation of the SVD

The image of the unit sphere under a matrix is a hyper-ellipse.

diagram: http://people.sc.fsu.edu/~jburkardt/latex/fsu_2006/svd.png

Singular Values
The number of non-zero singular values $r$ is the rank of the matrix.

The singular values are also related to a class of matrix norms.
Spectral norm (induced by the Euclidean vector norm):
$$\|A\|_2 = \sqrt{\lambda_{\max}(A^T A)} = \sigma_{\max}(A).$$
Frobenius norm:
$$\|A\|_F = \sqrt{ \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 } = \sqrt{ \mathrm{trace}(A^T A) } = \sqrt{ \sum_{i=1}^r \sigma_i^2 }$$

Singular Values and Eigenvalues


The singular value decomposition of $A = U \Sigma V^T \in \mathbb{R}^{m \times n}$ is related to the eigenvalue decomposition of the symmetric matrix $A^T A \in \mathbb{R}^{n \times n}$:
$$A^T A = (V \Sigma U^T)(U \Sigma V^T) = V \Sigma^2 V^T = V \Lambda V^T,$$
and similarly $AA^T = U \Sigma^2 U^T = U \tilde\Lambda U^T$.
$$\sigma_i(A) = \sqrt{\lambda_i(A^T A)} = \sqrt{\lambda_i(A A^T)}, \qquad i = 1, \dots, r$$
- The singular values of $A$ are the square roots of the eigenvalues of both $A^T A$ and $AA^T$.
- The right singular vectors, $V$, of $A$ are the eigenvectors of $A^T A$.
- The left singular vectors, $U$, of $A$ are the eigenvectors of $AA^T$.

Singular Vectors
The columns of $U = [u_1, \dots, u_m]$ and $V = [v_1, \dots, v_n]$ are called the left and right singular vectors of $A$, respectively.

The columns of $U$ and $V$ form orthonormal sets of vectors, which can be regarded as orthonormal bases. The singular vectors provide a convenient way to represent the bases of the fundamental subspaces of the matrix $A$. If $A \in \mathbb{R}^{m \times n}$ is a matrix of rank $r$, then
$$R(A) = \mathrm{span}\{u_1, \dots, u_r\}, \qquad N(A^T) = \mathrm{span}\{u_{r+1}, \dots, u_m\},$$
$$R(A^T) = \mathrm{span}\{v_1, \dots, v_r\}, \qquad N(A) = \mathrm{span}\{v_{r+1}, \dots, v_n\}.$$

Low-rank Approximation via the SVD


From the definition of the SVD, any matrix $A \in \mathbb{R}^{m \times n}$ of rank $r$ can be written as a sum of rank-one matrices:
$$A = U \Sigma V^T = \sum_{i=1}^r \sigma_i u_i v_i^T.$$
Theorem (low-rank approximation): Let $A \in \mathbb{R}^{m \times n}$ and $0 \le k \le r$; then
$$\min_{B : \, \mathrm{rank}(B) = k} \|A - B\|_2 = \sigma_{k+1},$$
where the minimum is attained by $B^\star = A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$.

Application: image compression.
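A minimal NumPy sketch of the truncated SVD $A_k$ (a grayscale image would be passed in as a 2-D array; the random matrix here is just a stand-in):

```python
import numpy as np

def low_rank_approx(A, k):
    """Best rank-k approximation A_k = sum_{i<=k} sigma_i u_i v_i^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k] @ Vt[:k, :]

A = np.random.rand(50, 40)
s = np.linalg.svd(A, compute_uv=False)
for k in (1, 5, 10):
    Ak = low_rank_approx(A, k)
    # Eckart-Young: the 2-norm error equals the (k+1)st singular value
    assert np.isclose(np.linalg.norm(A - Ak, 2), s[k])
```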

Image Compression using SVD

(a) rank-1 approximation (18,456 bytes, ≈ 0.6% of original)
(b) rank-2 approximation (36,912 bytes, ≈ 1.1% of original)
(c) rank-5 approximation (92,280 bytes, ≈ 2.8% of original)

Image Compression using SVD

(a) rank-10 approximation (184,560 bytes, ≈ 5.6% of original)
(b) rank-20 approximation (369,120 bytes, ≈ 11.1% of original)
(c) rank-40 approximation (738,240 bytes, ≈ 22.3% of original)

Image Compression using SVD

(a) original image (3,317,760 bytes)

Questions?


Mini-Quiz 15
The SVD is useful in many linear algebra proofs, since it decomposes a matrix into orthogonal and diagonal factors, which have nice properties.

1. Let $A = U \Sigma V^T \in \mathbb{R}^{m \times n}$. Show that
$$\|A\|_2 = \sigma_{\max}(A) = \max_{\|x\| = 1, \ \|y\| = 1} y^T A x.$$
Hint: Use the SVD and the unitary invariance of the Euclidean vector norm ($\|Ux\| = \|x\|$).

2. Polar decomposition: show how any square matrix $A = U \Sigma V^T \in \mathbb{R}^{n \times n}$ can be written as
$$A = QS, \qquad Q \ \text{orthogonal}, \ S \ \text{symmetric positive semi-definite}.$$
Hint: $V \Sigma V^T$ is symmetric positive semi-definite.

Solution (Q15)
Let $A = U \Sigma V^T$. Show that $\|A\|_2 = \sigma_{\max}(A) = \max_{\|x\|=1, \|y\|=1} y^T A x$.
Solution:
$$\max_{\|x\|=1, \|y\|=1} y^T A x = \max_{\|x\|=1, \|y\|=1} y^T (U \Sigma V^T) x = \max_{\|V^T x\|=1, \|U^T y\|=1} (U^T y)^T \Sigma (V^T x) = \max_{\|\tilde x\|=1, \|\tilde y\|=1} \tilde y^T \Sigma \tilde x = \max_i \Sigma_{ii} = \sigma_{\max}(A)$$
(the substitution $\tilde x = V^T x$, $\tilde y = U^T y$ is valid because $\|V^T x\| = \|x\|$ and $\|U^T y\| = \|y\|$).

Show how any square matrix $A = U \Sigma V^T$ can be written as $A = QS$, with $Q$ orthogonal and $S$ symmetric positive semi-definite.
Solution:
$$A = U \Sigma V^T = U I_n \Sigma V^T = U (V^T V) \Sigma V^T = (U V^T)(V \Sigma V^T)$$
$$A = QS, \qquad \text{where } Q = U V^T \text{ and } S = V \Sigma V^T.$$

The End!


References and Acknowledgements


1. Numerical Linear Algebra by Lloyd N. Trefethen & David Bau III.
2. Matrix Computations by Gene H. Golub & Charles F. Van Loan.
3. Linear Algebra and its Applications by Gilbert Strang.
4. Linear Algebra with Applications by Otto Bretscher.
5. www.Wikipedia.org and Mathworld.Wolfram.com

Thanks to former ICME Refresher Course instructors Yuekai Sun, Nicole Taheri, and Milinda Lakkam, whose slides served as useful references in creating this presentation.
Thanks to Carlos Sing-Long and Anil Damle for their feedback on this presentation.
