
Matrix Analysis for Scientists & Engineers

Alan J. Laub
University of California
Davis, California
Copyright © 2005 by the Society for Industrial and Applied Mathematics.

10 9 8 7 6 5 4 3 2 1

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the publisher. For information, write to the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

MATLAB® is a registered trademark of The MathWorks, Inc. For MATLAB product information, please contact The MathWorks, Inc., 3 Apple Hill Drive, Natick, MA 01760-2098 USA, 508-647-7000, Fax: 508-647-7101, info@mathworks.com, www.mathworks.com

Mathematica is a registered trademark of Wolfram Research, Inc.

Mathcad is a registered trademark of Mathsoft Engineering & Education, Inc.

Library of Congress Cataloging-in-Publication Data

Laub, Alan J., 1948-
    Matrix analysis for scientists and engineers / Alan J. Laub.
        p. cm.
    Includes bibliographical references and index.
    ISBN 0-89871-576-8 (pbk.)
    1. Matrices. 2. Mathematical analysis. I. Title.

QA188.L38 2005
512.9'434-dc22          2004059962

About the cover: The original artwork featured on the cover was created by freelance artist Aaron Tallon of Philadelphia, PA. Used by permission.

SIAM is a registered trademark.
To my wife, Beverley
(who captivated me in the UBC math library nearly forty years ago)
Contents

Preface

1   Introduction and Review
    1.1   Some Notation and Terminology
    1.2   Matrix Arithmetic
    1.3   Inner Products and Orthogonality
    1.4   Determinants

2   Vector Spaces
    2.1   Definitions and Examples
    2.2   Subspaces
    2.3   Linear Independence
    2.4   Sums and Intersections of Subspaces

3   Linear Transformations
    3.1   Definition and Examples
    3.2   Matrix Representation of Linear Transformations
    3.3   Composition of Transformations
    3.4   Structure of Linear Transformations
    3.5   Four Fundamental Subspaces

4   Introduction to the Moore-Penrose Pseudoinverse
    4.1   Definitions and Characterizations
    4.2   Examples
    4.3   Properties and Applications

5   Introduction to the Singular Value Decomposition
    5.1   The Fundamental Theorem
    5.2   Some Basic Properties
    5.3   Row and Column Compressions

6   Linear Equations
    6.1   Vector Linear Equations
    6.2   Matrix Linear Equations
    6.3   A More General Matrix Linear Equation
    6.4   Some Useful and Interesting Inverses

7   Projections, Inner Product Spaces, and Norms
    7.1   Projections
          7.1.1   The four fundamental orthogonal projections
    7.2   Inner Product Spaces
    7.3   Vector Norms
    7.4   Matrix Norms

8   Linear Least Squares Problems
    8.1   The Linear Least Squares Problem
    8.2   Geometric Solution
    8.3   Linear Regression and Other Linear Least Squares Problems
          8.3.1   Example: Linear regression
          8.3.2   Other least squares problems
    8.4   Least Squares and Singular Value Decomposition
    8.5   Least Squares and QR Factorization

9   Eigenvalues and Eigenvectors
    9.1   Fundamental Definitions and Properties
    9.2   Jordan Canonical Form
    9.3   Determination of the JCF
          9.3.1   Theoretical computation
          9.3.2   On the +1's in JCF blocks
    9.4   Geometric Aspects of the JCF
    9.5   The Matrix Sign Function

10  Canonical Forms
    10.1  Some Basic Canonical Forms
    10.2  Definite Matrices
    10.3  Equivalence Transformations and Congruence
          10.3.1  Block matrices and definiteness
    10.4  Rational Canonical Form

11  Linear Differential and Difference Equations
    11.1  Differential Equations
          11.1.1  Properties of the matrix exponential
          11.1.2  Homogeneous linear differential equations
          11.1.3  Inhomogeneous linear differential equations
          11.1.4  Linear matrix differential equations
          11.1.5  Modal decompositions
          11.1.6  Computation of the matrix exponential
    11.2  Difference Equations
          11.2.1  Homogeneous linear difference equations
          11.2.2  Inhomogeneous linear difference equations
          11.2.3  Computation of matrix powers
    11.3  Higher-Order Equations

12  Generalized Eigenvalue Problems
    12.1  The Generalized Eigenvalue/Eigenvector Problem
    12.2  Canonical Forms
    12.3  Application to the Computation of System Zeros
    12.4  Symmetric Generalized Eigenvalue Problems
    12.5  Simultaneous Diagonalization
          12.5.1  Simultaneous diagonalization via SVD
    12.6  Higher-Order Eigenvalue Problems
          12.6.1  Conversion to first-order form

13  Kronecker Products
    13.1  Definition and Examples
    13.2  Properties of the Kronecker Product
    13.3  Application to Sylvester and Lyapunov Equations

Bibliography

Index
Preface

This book is intended to be used as a text for beginning graduate-level (or even senior-level) students in engineering, the sciences, mathematics, computer science, or computational science who wish to be familiar with enough matrix analysis that they are prepared to use its tools and ideas comfortably in a variety of applications. By matrix analysis I mean linear algebra and matrix theory together with their intrinsic interaction with and application to linear dynamical systems (systems of linear differential or difference equations). The text can be used in a one-quarter or one-semester course to provide a compact overview of much of the important and useful mathematics that, in many cases, students meant to learn thoroughly as undergraduates, but somehow didn't quite manage to do. Certain topics that may have been treated cursorily in undergraduate courses are treated in more depth and more advanced material is introduced. I have tried throughout to emphasize only the more important and "useful" tools, methods, and mathematical structures. Instructors are encouraged to supplement the book with specific application examples from their own particular subject area.
The choice of topics covered in linear algebra and matrix theory is motivated both by applications and by computational utility and relevance. The concept of matrix factorization is emphasized throughout to provide a foundation for a later course in numerical linear algebra. Matrices are stressed more than abstract vector spaces, although Chapters 2 and 3 do cover some geometric (i.e., basis-free or subspace) aspects of many of the fundamental notions. The books by Meyer [18], Noble and Daniel [20], Ortega [21], and Strang [24] are excellent companion texts for this book. Upon completion of a course based on this text, the student is then well-equipped to pursue, either via formal courses or through self-study, follow-on topics on the computational side (at the level of [7], [11], [23], or [25], for example) or on the theoretical side (at the level of [12], [13], or [16], for example).
Prerequisites for using this text are quite modest: essentially just an understanding of calculus and definitely some previous exposure to matrices and linear algebra. Basic concepts such as determinants, singularity of matrices, eigenvalues and eigenvectors, and positive definite matrices should have been covered at least once, even though their recollection may occasionally be "hazy." However, requiring such material as prerequisite permits the early (but "out-of-order" by conventional standards) introduction of topics such as pseudoinverses and the singular value decomposition (SVD). These powerful and versatile tools can then be exploited to provide a unifying foundation upon which to base subsequent topics. Because tools such as the SVD are not generally amenable to "hand computation," this approach necessarily presupposes the availability of appropriate mathematical software on a digital computer. For this, I highly recommend MATLAB® although other software such as Mathematica® or Mathcad® is also excellent. Since this text is not intended for a course in numerical linear algebra per se, the details of most of the numerical aspects of linear algebra are deferred to such a course.
The presentation of the material in this book is strongly influenced by computational issues for two principal reasons. First, "real-life" problems seldom yield to simple closed-form formulas or solutions. They must generally be solved computationally and it is important to know which types of algorithms can be relied upon and which cannot. Some of the key algorithms of numerical linear algebra, in particular, form the foundation upon which rests virtually all of modern scientific and engineering computation. A second motivation for a computational emphasis is that it provides many of the essential tools for what I call "qualitative mathematics." For example, in an elementary linear algebra course, a set of vectors is either linearly independent or it is not. This is an absolutely fundamental concept. But in most engineering or scientific contexts we want to know more than that. If a set of vectors is linearly independent, how "nearly dependent" are the vectors? If they are linearly dependent, are there "best" linearly independent subsets? These turn out to be much more difficult problems and frequently involve research-level questions when set in the context of the finite-precision, finite-range floating-point arithmetic environment of most modern computing platforms.
Some of the applications of matrix analysis mentioned briefly in this book derive from the modern state-space approach to dynamical systems. State-space methods are now standard in much of modern engineering where, for example, control systems with large numbers of interacting inputs, outputs, and states often give rise to models of very high order that must be analyzed, simulated, and evaluated. The "language" in which such models are conveniently described involves vectors and matrices. It is thus crucial to acquire a working knowledge of the vocabulary and grammar of this language. The tools of matrix analysis are also applied on a daily basis to problems in biology, chemistry, econometrics, physics, statistics, and a wide variety of other fields, and thus the text can serve a rather diverse audience. Mastery of the material in this text should enable the student to read and understand the modern language of matrices used throughout mathematics, science, and engineering.
While prerequisites for this text are modest, and while most material is developed from basic ideas in the book, the student does require a certain amount of what is conventionally referred to as "mathematical maturity." Proofs are given for many theorems. When they are not given explicitly, they are either obvious or easily found in the literature. This is ideal material from which to learn a bit about mathematical proofs and the mathematical maturity and insight gained thereby. It is my firm conviction that such maturity is neither encouraged nor nurtured by relegating the mathematical aspects of applications (for example, linear algebra for elementary state-space theory) to an appendix or introducing it "on-the-fly" when necessary. Rather, one must lay a firm foundation upon which subsequent applications and perspectives can be built in a logical, consistent, and coherent fashion.
I have taught this material for many years, many times at UCSB and twice at UC Davis, and the course has proven to be remarkably successful at enabling students from disparate backgrounds to acquire a quite acceptable level of mathematical maturity and rigor for subsequent graduate studies in a variety of disciplines. Indeed, many students who completed the course, especially the first few times it was offered, remarked afterward that if only they had had this course before they took linear systems, or signal processing, or estimation theory, etc., they would have been able to concentrate on the new ideas they wanted to learn, rather than having to spend time making up for deficiencies in their background in matrices and linear algebra. My fellow instructors, too, realized that by requiring this course as a prerequisite, they no longer had to provide as much time for "review" and could focus instead on the subject at hand. The concept seems to work.

— AJL, June 2004


Chapter 1

Introduction and Review

1.1 Some Notation and Terminology

We begin with a brief introduction to some standard notation and terminology to be used throughout the text. This is followed by a review of some basic notions in matrix analysis and linear algebra.

The following sets appear frequently throughout subsequent chapters:

1. $\mathbb{R}^n$ = the set of $n$-tuples of real numbers represented as column vectors. Thus, $x \in \mathbb{R}^n$ means
$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix},$$
where $x_i \in \mathbb{R}$ for $i \in \underline{n}$. Henceforth, the notation $\underline{n}$ denotes the set $\{1, \ldots, n\}$.

Note: Vectors are always column vectors. A row vector is denoted by $y^T$, where $y \in \mathbb{R}^n$ and the superscript $T$ is the transpose operation. That a vector is always a column vector rather than a row vector is entirely arbitrary, but this convention makes it easy to recognize immediately throughout the text that, e.g., $x^T y$ is a scalar while $x y^T$ is an $n \times n$ matrix.

2. $\mathbb{C}^n$ = the set of $n$-tuples of complex numbers represented as column vectors.

3. $\mathbb{R}^{m \times n}$ = the set of real (or real-valued) $m \times n$ matrices.

4. $\mathbb{R}_r^{m \times n}$ = the set of real $m \times n$ matrices of rank $r$. Thus, $\mathbb{R}_n^{n \times n}$ denotes the set of real nonsingular $n \times n$ matrices.

5. $\mathbb{C}^{m \times n}$ = the set of complex (or complex-valued) $m \times n$ matrices.

6. $\mathbb{C}_r^{m \times n}$ = the set of complex $m \times n$ matrices of rank $r$.

We now classify some of the more familiar "shaped" matrices. A matrix $A \in \mathbb{R}^{n \times n}$ (or $A \in \mathbb{C}^{n \times n}$) is

• diagonal if $a_{ij} = 0$ for $i \neq j$.

• upper triangular if $a_{ij} = 0$ for $i > j$.

• lower triangular if $a_{ij} = 0$ for $i < j$.

• tridiagonal if $a_{ij} = 0$ for $|i - j| > 1$.

• pentadiagonal if $a_{ij} = 0$ for $|i - j| > 2$.

• upper Hessenberg if $a_{ij} = 0$ for $i - j > 1$.

• lower Hessenberg if $a_{ij} = 0$ for $j - i > 1$.

Each of the above also has a "block" analogue obtained by replacing scalar components in the respective definitions by block submatrices. For example, if $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, and $C \in \mathbb{R}^{m \times m}$, then the $(m + n) \times (m + n)$ matrix $\begin{bmatrix} A & B \\ 0 & C \end{bmatrix}$ is block upper triangular.
The transpose of a matrix $A$ is denoted by $A^T$ and is the matrix whose $(i,j)$th entry is the $(j,i)$th entry of $A$, that is, $(A^T)_{ij} = a_{ji}$. Note that if $A \in \mathbb{R}^{m \times n}$, then $A^T \in \mathbb{R}^{n \times m}$. If $A \in \mathbb{C}^{m \times n}$, then its Hermitian transpose (or conjugate transpose) is denoted by $A^H$ (or sometimes $A^*$) and its $(i,j)$th entry is $(A^H)_{ij} = \bar{a}_{ji}$, where the bar indicates complex conjugation; i.e., if $z = \alpha + j\beta$ ($j = i = \sqrt{-1}$), then $\bar{z} = \alpha - j\beta$. A matrix $A$ is symmetric if $A = A^T$ and Hermitian if $A = A^H$. We henceforth adopt the convention that, unless otherwise noted, an equation like $A = A^T$ implies that $A$ is real-valued while a statement like $A = A^H$ implies that $A$ is complex-valued.

Remark 1.1. While $\sqrt{-1}$ is most commonly denoted by $i$ in mathematics texts, $j$ is the more common notation in electrical engineering and system theory. There is some advantage to being conversant with both notations. The notation $j$ is used throughout the text but reminders are placed at strategic locations.
Example 1.2.

1. $A = \begin{bmatrix} 5 & 7 \\ 7 & 2 \end{bmatrix}$ is symmetric (and Hermitian).

2. $A = \begin{bmatrix} 5 & 7+j \\ 7+j & 2 \end{bmatrix}$ is complex-valued symmetric but not Hermitian.

3. $A = \begin{bmatrix} 5 & 7+j \\ 7-j & 2 \end{bmatrix}$ is Hermitian (but not symmetric).

Transposes of block matrices can be defined in an obvious way. For example, it is easy to see that if $A_{ij}$ are appropriately dimensioned subblocks, then
$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}^T = \begin{bmatrix} A_{11}^T & A_{21}^T \\ A_{12}^T & A_{22}^T \end{bmatrix}.$$
1.2 Matrix Arithmetic
It is assumed that the reader is familiar with the fundamental notions of matrix addition, multiplication of a matrix by a scalar, and multiplication of matrices.

A special case of matrix multiplication occurs when the second matrix is a column vector $x$, i.e., the matrix-vector product $Ax$. A very important way to view this product is to interpret it as a weighted sum (linear combination) of the columns of $A$. That is, suppose
$$A = [a_1, \ldots, a_n] \in \mathbb{R}^{m \times n} \text{ with } a_i \in \mathbb{R}^m \quad \text{and} \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$
Then
$$Ax = x_1 a_1 + \cdots + x_n a_n \in \mathbb{R}^m.$$
The importance of this interpretation cannot be overemphasized. As a numerical example, take $A = \begin{bmatrix} 9 & 8 & 7 \\ 6 & 5 & 4 \end{bmatrix}$, $x = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}$. Then we can quickly calculate dot products of the rows of $A$ with the column $x$ to find $Ax = \begin{bmatrix} 50 \\ 32 \end{bmatrix}$, but this matrix-vector product can also be computed via
$$3 \cdot \begin{bmatrix} 9 \\ 6 \end{bmatrix} + 2 \cdot \begin{bmatrix} 8 \\ 5 \end{bmatrix} + 1 \cdot \begin{bmatrix} 7 \\ 4 \end{bmatrix}.$$
For large arrays of numbers, there can be important computer-architecture-related advantages to preferring the latter calculation method.
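For readers who like to follow along with software (the Preface recommends MATLAB; the short sketch below uses Python with NumPy as one alternative and is purely illustrative), the two views of the product $Ax$ can be compared directly:

```python
import numpy as np

A = np.array([[9.0, 8.0, 7.0],
              [6.0, 5.0, 4.0]])
x = np.array([3.0, 2.0, 1.0])

# Row-oriented view: each entry of Ax is a dot product of a row of A with x.
row_view = np.array([A[i, :] @ x for i in range(A.shape[0])])

# Column-oriented view: Ax is a weighted sum (linear combination) of the columns of A.
col_view = sum(x[j] * A[:, j] for j in range(A.shape[1]))

print(row_view)                      # [50. 32.]
print(col_view)                      # [50. 32.]
print(np.allclose(row_view, A @ x))  # True
```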
For matrix multiplication, suppose $A \in \mathbb{R}^{m \times n}$ and $B = [b_1, \ldots, b_p] \in \mathbb{R}^{n \times p}$ with $b_i \in \mathbb{R}^n$. Then the matrix product $AB$ can be thought of as above, applied $p$ times:
$$AB = A[b_1, \ldots, b_p] = [Ab_1, \ldots, Ab_p].$$

There is also an alternative, but equivalent, formulation of matrix multiplication that appears frequently in the text and is presented below as a theorem. Again, its importance cannot be overemphasized. It is deceptively simple and its full understanding is well rewarded.

Theorem 1.3. Let $U = [u_1, \ldots, u_n] \in \mathbb{R}^{m \times n}$ with $u_i \in \mathbb{R}^m$ and $V = [v_1, \ldots, v_n] \in \mathbb{R}^{p \times n}$ with $v_i \in \mathbb{R}^p$. Then
$$U V^T = \sum_{i=1}^{n} u_i v_i^T \in \mathbb{R}^{m \times p}.$$
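The outer-product formulation of Theorem 1.3 is equally easy to check numerically; a minimal NumPy sketch, illustrative only, is:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 4, 3, 5
U = rng.standard_normal((m, n))   # columns u_1, ..., u_n in R^m
V = rng.standard_normal((p, n))   # columns v_1, ..., v_n in R^p

# Sum of rank-one outer products u_i v_i^T, i = 1, ..., n.
outer_sum = sum(np.outer(U[:, i], V[:, i]) for i in range(n))

print(np.allclose(U @ V.T, outer_sum))  # True: U V^T equals the sum of outer products
```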

If matrices $C$ and $D$ are compatible for multiplication, recall that $(CD)^T = D^T C^T$ (or $(CD)^H = D^H C^H$). This gives a dual to the matrix-vector result above. Namely, if $C \in \mathbb{R}^{m \times n}$ has row vectors $c_j^T \in \mathbb{R}^{1 \times n}$, and is premultiplied by a row vector $y^T \in \mathbb{R}^{1 \times m}$, then the product can be written as a weighted linear sum of the rows of $C$ as follows:
$$y^T C = y_1 c_1^T + \cdots + y_m c_m^T \in \mathbb{R}^{1 \times n}.$$
Theorem 1.3 can then also be generalized to its "row dual." The details are left to the reader.

1.3 Inner Products and Orthogonality

For vectors $x, y \in \mathbb{R}^n$, the Euclidean inner product (or inner product, for short) of $x$ and $y$ is given by
$$\langle x, y \rangle := x^T y = \sum_{i=1}^{n} x_i y_i.$$
Note that the inner product is a scalar.

If $x, y \in \mathbb{C}^n$, we define their complex Euclidean inner product (or inner product, for short) by
$$\langle x, y \rangle_c := x^H y = \sum_{i=1}^{n} \bar{x}_i y_i.$$
Note that $\langle x, y \rangle_c = \overline{\langle y, x \rangle_c}$, i.e., the order in which $x$ and $y$ appear in the complex inner product is important. The more conventional definition of the complex inner product is $\langle x, y \rangle_c = y^H x = \sum_{i=1}^{n} x_i \bar{y}_i$ but throughout the text we prefer the symmetry with the real case.
Example 1.4. Let $x = \begin{bmatrix} 1 \\ j \end{bmatrix}$ and $y = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. Then
$$\langle x, y \rangle_c = \begin{bmatrix} 1 \\ j \end{bmatrix}^H \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 & -j \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1 - 2j,$$
while
$$\langle y, x \rangle_c = \begin{bmatrix} 1 \\ 2 \end{bmatrix}^H \begin{bmatrix} 1 \\ j \end{bmatrix} = \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ j \end{bmatrix} = 1 + 2j,$$
and we see that, indeed, $\langle x, y \rangle_c = \overline{\langle y, x \rangle_c}$.
Note that $x^T x = 0$ if and only if $x = 0$ when $x \in \mathbb{R}^n$ but that this is not true if $x \in \mathbb{C}^n$. What is true in the complex case is that $x^H x = 0$ if and only if $x = 0$. To illustrate, consider the nonzero vector $x$ above. Then $x^T x = 0$ but $x^H x = 2$.
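A short NumPy sketch, again purely illustrative, makes the distinction between $x^T x$ and $x^H x$ concrete:

```python
import numpy as np

x = np.array([1.0 + 0j, 1j])   # the nonzero complex vector x = [1, j]^T
y = np.array([1.0 + 0j, 2.0])

print(x.T @ x)          # 0j       : x^T x = 1 + j^2 = 0 even though x != 0
print(np.conj(x) @ x)   # (2+0j)   : x^H x = 2, the squared Euclidean length
print(np.conj(x) @ y)   # (1-2j)   : <x, y>_c = x^H y
print(np.conj(y) @ x)   # (1+2j)   : <y, x>_c, the complex conjugate of <x, y>_c
```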
Two nonzero vectors $x, y \in \mathbb{R}^n$ are said to be orthogonal if their inner product is zero, i.e., $x^T y = 0$. Nonzero complex vectors are orthogonal if $x^H y = 0$. If $x$ and $y$ are orthogonal and $x^T x = 1$ and $y^T y = 1$, then we say that $x$ and $y$ are orthonormal. A matrix $A \in \mathbb{R}^{n \times n}$ is an orthogonal matrix if $A^T A = A A^T = I$, where $I$ is the $n \times n$ identity matrix. The notation $I_n$ is sometimes used to denote the identity matrix in $\mathbb{R}^{n \times n}$ (or $\mathbb{C}^{n \times n}$). Similarly, a matrix $A \in \mathbb{C}^{n \times n}$ is said to be unitary if $A^H A = A A^H = I$. Clearly an orthogonal or unitary matrix has orthonormal rows and orthonormal columns. There is no special name attached to a nonsquare matrix $A \in \mathbb{R}^{m \times n}$ (or $\in \mathbb{C}^{m \times n}$) with orthonormal rows or columns.

1.4 Determinants

It is assumed that the reader is familiar with the basic theory of determinants. For $A \in \mathbb{R}^{n \times n}$ (or $A \in \mathbb{C}^{n \times n}$) we use the notation $\det A$ for the determinant of $A$. We list below some of the more useful properties of determinants. Note that this is not a minimal set, i.e., several properties are consequences of one or more of the others.
1. If $A$ has a zero row or if any two rows of $A$ are equal, then $\det A = 0$.

2. If $A$ has a zero column or if any two columns of $A$ are equal, then $\det A = 0$.

3. Interchanging two rows of $A$ changes only the sign of the determinant.

4. Interchanging two columns of $A$ changes only the sign of the determinant.

5. Multiplying a row of $A$ by a scalar $\alpha$ results in a new matrix whose determinant is $\alpha \det A$.

6. Multiplying a column of $A$ by a scalar $\alpha$ results in a new matrix whose determinant is $\alpha \det A$.

7. Multiplying a row of $A$ by a scalar and then adding it to another row does not change the determinant.

8. Multiplying a column of $A$ by a scalar and then adding it to another column does not change the determinant.

9. $\det A^T = \det A$ ($\det A^H = \overline{\det A}$ if $A \in \mathbb{C}^{n \times n}$).

10. If $A$ is diagonal, then $\det A = a_{11} a_{22} \cdots a_{nn}$, i.e., $\det A$ is the product of its diagonal elements.

11. If $A$ is upper triangular, then $\det A = a_{11} a_{22} \cdots a_{nn}$.

12. If $A$ is lower triangular, then $\det A = a_{11} a_{22} \cdots a_{nn}$.

13. If $A$ is block diagonal (or block upper triangular or block lower triangular), with square diagonal blocks $A_{11}, A_{22}, \ldots, A_{nn}$ (of possibly different sizes), then $\det A = \det A_{11} \det A_{22} \cdots \det A_{nn}$.

14. If $A, B \in \mathbb{R}^{n \times n}$, then $\det(AB) = \det A \det B$.

15. If $A \in \mathbb{R}_n^{n \times n}$, then $\det(A^{-1}) = \frac{1}{\det A}$.

16. If $A \in \mathbb{R}_n^{n \times n}$ and $D \in \mathbb{R}^{m \times m}$, then $\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det A \, \det(D - C A^{-1} B)$.

Proof: This follows easily from the block LU factorization
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & 0 \\ C A^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ 0 & D - C A^{-1} B \end{bmatrix}.$$
17. If $A \in \mathbb{R}^{n \times n}$ and $D \in \mathbb{R}_m^{m \times m}$, then $\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det D \, \det(A - B D^{-1} C)$.

Proof: This follows easily from the block UL factorization
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & B D^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} A - B D^{-1} C & 0 \\ C & D \end{bmatrix}.$$

Remark 1.5. The factorization of a matrix $A$ into the product of a unit lower triangular matrix $L$ (i.e., lower triangular with all 1's on the diagonal) and an upper triangular matrix $U$ is called an LU factorization; see, for example, [24]. Another such factorization is UL where $U$ is unit upper triangular and $L$ is lower triangular. The factorizations used above are block analogues of these.
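As a computational aside (illustrative only), SciPy's lu routine computes the row-pivoted variant of this factorization, $A = PLU$ with a permutation matrix $P$; together with properties 11-14 above it gives a convenient way to evaluate determinants:

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[4.0, 3.0, 2.0],
              [2.0, 4.0, 1.0],
              [6.0, 1.0, 5.0]])

P, L, U = lu(A)   # A = P @ L @ U, with L unit lower triangular and U upper triangular

print(np.allclose(A, P @ L @ U))      # True
print(np.allclose(np.diag(L), 1.0))   # True: L has 1's on its diagonal
# By properties 11-14, det A is det P (= +/-1) times the product of U's diagonal.
print(np.linalg.det(A), np.linalg.det(P) * np.prod(np.diag(U)))
```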
Remark 1.6. The matrix $D - C A^{-1} B$ is called the Schur complement of $A$ in $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$. Similarly, $A - B D^{-1} C$ is the Schur complement of $D$ in $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$.
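A numerical spot-check of properties 16 and 17 (an illustrative NumPy sketch only) might look as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # shift keeps A nonsingular here
D = rng.standard_normal((m, m)) + m * np.eye(m)   # shift keeps D nonsingular here
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))

M = np.block([[A, B], [C, D]])
det_via_A = np.linalg.det(A) * np.linalg.det(D - C @ np.linalg.inv(A) @ B)  # property 16
det_via_D = np.linalg.det(D) * np.linalg.det(A - B @ np.linalg.inv(D) @ C)  # property 17

print(np.allclose(np.linalg.det(M), det_via_A))  # True
print(np.allclose(np.linalg.det(M), det_via_D))  # True
```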

EXERCISES

1. If $A \in \mathbb{R}^{n \times n}$ and $\alpha$ is a scalar, what is $\det(\alpha A)$? What is $\det(-A)$?

2. If $A$ is orthogonal, what is $\det A$? If $A$ is unitary, what is $\det A$?

3. Let $x, y \in \mathbb{R}^n$. Show that $\det(I - x y^T) = 1 - y^T x$.

4. Let $U_1, U_2, \ldots, U_k \in \mathbb{R}^{n \times n}$ be orthogonal matrices. Show that the product $U = U_1 U_2 \cdots U_k$ is an orthogonal matrix.

5. Let $A \in \mathbb{R}^{n \times n}$. The trace of $A$, denoted $\mathrm{Tr} A$, is defined as the sum of its diagonal elements, i.e., $\mathrm{Tr} A = \sum_{i=1}^{n} a_{ii}$.

   (a) Show that the trace is a linear function; i.e., if $A, B \in \mathbb{R}^{n \times n}$ and $\alpha, \beta \in \mathbb{R}$, then $\mathrm{Tr}(\alpha A + \beta B) = \alpha \mathrm{Tr} A + \beta \mathrm{Tr} B$.

   (b) Show that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$, even though in general $AB \neq BA$.

   (c) Let $S \in \mathbb{R}^{n \times n}$ be skew-symmetric, i.e., $S^T = -S$. Show that $\mathrm{Tr} S = 0$. Then either prove the converse or provide a counterexample.

6. A matrix $A \in \mathbb{R}^{n \times n}$ is said to be idempotent if $A^2 = A$.

   (a) Show that the matrix $A = \frac{1}{2} \begin{bmatrix} 2\cos^2\theta & \sin 2\theta \\ \sin 2\theta & 2\sin^2\theta \end{bmatrix}$ is idempotent for all $\theta$.

   (b) Suppose $A \in \mathbb{R}^{n \times n}$ is idempotent and $A \neq I$. Show that $A$ must be singular.
Chapter 2

Vector Spaces

In this chapter we give a brief review of some of the basic concepts of vector spaces. The emphasis is on finite-dimensional vector spaces, including spaces formed by special classes of matrices, but some infinite-dimensional examples are also cited. An excellent reference for this and the next chapter is [10], where some of the proofs that are not given here may be found.

2.1 Definitions and Examples

Definition 2.1. A field is a set $\mathbb{F}$ together with two operations $+, \cdot : \mathbb{F} \times \mathbb{F} \to \mathbb{F}$ such that

(A1) $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$ for all $\alpha, \beta, \gamma \in \mathbb{F}$.

(A2) there exists an element $0 \in \mathbb{F}$ such that $\alpha + 0 = \alpha$ for all $\alpha \in \mathbb{F}$.

(A3) for all $\alpha \in \mathbb{F}$, there exists an element $(-\alpha) \in \mathbb{F}$ such that $\alpha + (-\alpha) = 0$.

(A4) $\alpha + \beta = \beta + \alpha$ for all $\alpha, \beta \in \mathbb{F}$.

(M1) $\alpha \cdot (\beta \cdot \gamma) = (\alpha \cdot \beta) \cdot \gamma$ for all $\alpha, \beta, \gamma \in \mathbb{F}$.

(M2) there exists an element $1 \in \mathbb{F}$ such that $\alpha \cdot 1 = \alpha$ for all $\alpha \in \mathbb{F}$.

(M3) for all $\alpha \in \mathbb{F}$, $\alpha \neq 0$, there exists an element $\alpha^{-1} \in \mathbb{F}$ such that $\alpha \cdot \alpha^{-1} = 1$.

(M4) $\alpha \cdot \beta = \beta \cdot \alpha$ for all $\alpha, \beta \in \mathbb{F}$.

(D) $\alpha \cdot (\beta + \gamma) = \alpha \cdot \beta + \alpha \cdot \gamma$ for all $\alpha, \beta, \gamma \in \mathbb{F}$.

Axioms (A1)-(A3) state that $(\mathbb{F}, +)$ is a group and an abelian group if (A4) also holds. Axioms (M1)-(M4) state that $(\mathbb{F} \setminus \{0\}, \cdot)$ is an abelian group.

Generally speaking, when no confusion can arise, the multiplication operator "$\cdot$" is not written explicitly.


Example 2.2.

1. $\mathbb{R}$ with ordinary addition and multiplication is a field.

2. $\mathbb{C}$ with ordinary complex addition and multiplication is a field.

3. $\mathbb{R}_a[x]$ = the field of rational functions in the indeterminate $x$
$$= \left\{ \frac{\alpha_0 + \alpha_1 x + \cdots + \alpha_p x^p}{\beta_0 + \beta_1 x + \cdots + \beta_q x^q} : \alpha_i, \beta_i \in \mathbb{R};\ p, q \in \mathbb{Z}^+ \right\},$$
where $\mathbb{Z}^+ = \{0, 1, 2, \ldots\}$, is a field.

4. $\mathbb{R}_r^{m \times n}$ = {$m \times n$ matrices of rank $r$ with real coefficients} is clearly not a field since, for example, (M1) does not hold unless $m = n$. Moreover, $\mathbb{R}_n^{n \times n}$ is not a field either since (M4) does not hold in general (although the other 8 axioms hold).

Definition 2.3. A vector space over a field $\mathbb{F}$ is a set $V$ together with two operations $+ : V \times V \to V$ and $\cdot : \mathbb{F} \times V \to V$ such that

(V1) $(V, +)$ is an abelian group.

(V2) $(\alpha \cdot \beta) \cdot v = \alpha \cdot (\beta \cdot v)$ for all $\alpha, \beta \in \mathbb{F}$ and for all $v \in V$.

(V3) $(\alpha + \beta) \cdot v = \alpha \cdot v + \beta \cdot v$ for all $\alpha, \beta \in \mathbb{F}$ and for all $v \in V$.

(V4) $\alpha \cdot (v + w) = \alpha \cdot v + \alpha \cdot w$ for all $\alpha \in \mathbb{F}$ and for all $v, w \in V$.

(V5) $1 \cdot v = v$ for all $v \in V$ ($1 \in \mathbb{F}$).

A vector space is denoted by $(V, \mathbb{F})$ or, when there is no possibility of confusion as to the underlying field, simply by $V$.

Remark 2.4. Note that $+$ and $\cdot$ in Definition 2.3 are different from the $+$ and $\cdot$ in Definition 2.1 in the sense of operating on different objects in different sets. In practice, this causes no confusion and the $\cdot$ operator is usually not even written explicitly.

Example 2.5.

1. $(\mathbb{R}^n, \mathbb{R})$ with addition defined by
$$\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{bmatrix}$$
and scalar multiplication defined by
$$\alpha \cdot \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} \alpha x_1 \\ \vdots \\ \alpha x_n \end{bmatrix}$$
is a vector space. Similar definitions hold for $(\mathbb{C}^n, \mathbb{C})$.

2. $(\mathbb{R}^{m \times n}, \mathbb{R})$ is a vector space with addition defined by
$$A + B = \begin{bmatrix} \alpha_{11} + \beta_{11} & \alpha_{12} + \beta_{12} & \cdots & \alpha_{1n} + \beta_{1n} \\ \alpha_{21} + \beta_{21} & \alpha_{22} + \beta_{22} & \cdots & \alpha_{2n} + \beta_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha_{m1} + \beta_{m1} & \alpha_{m2} + \beta_{m2} & \cdots & \alpha_{mn} + \beta_{mn} \end{bmatrix}$$
and scalar multiplication defined by
$$\gamma A = \begin{bmatrix} \gamma \alpha_{11} & \gamma \alpha_{12} & \cdots & \gamma \alpha_{1n} \\ \gamma \alpha_{21} & \gamma \alpha_{22} & \cdots & \gamma \alpha_{2n} \\ \vdots & \vdots & & \vdots \\ \gamma \alpha_{m1} & \gamma \alpha_{m2} & \cdots & \gamma \alpha_{mn} \end{bmatrix}.$$

3. Let $(V, \mathbb{F})$ be an arbitrary vector space and $\mathcal{D}$ be an arbitrary set. Let $\phi(\mathcal{D}, V)$ be the set of functions $f$ mapping $\mathcal{D}$ to $V$. Then $\phi(\mathcal{D}, V)$ is a vector space with addition defined by
$$(f + g)(d) = f(d) + g(d) \ \text{ for all } d \in \mathcal{D} \text{ and for all } f, g \in \phi$$
and scalar multiplication defined by
$$(\alpha f)(d) = \alpha f(d) \ \text{ for all } \alpha \in \mathbb{F}, \text{ for all } d \in \mathcal{D}, \text{ and for all } f \in \phi.$$

Special Cases:

(a) $\mathcal{D} = [t_0, t_1]$, $(V, \mathbb{F}) = (\mathbb{R}^n, \mathbb{R})$, and the functions are piecewise continuous $=: (PC[t_0, t_1])^n$ or continuous $=: (C[t_0, t_1])^n$.

(b) $\mathcal{D} = [t_0, +\infty)$, $(V, \mathbb{F}) = (\mathbb{R}^n, \mathbb{R})$, etc.
4. Let $A \in \mathbb{R}^{n \times n}$. Then $\{x(t) : \dot{x}(t) = A x(t)\}$ is a vector space (of dimension $n$).

2.2 Subspaces

Definition 2.6. Let $(V, \mathbb{F})$ be a vector space and let $W \subseteq V$, $W \neq \emptyset$. Then $(W, \mathbb{F})$ is a subspace of $(V, \mathbb{F})$ if and only if $(W, \mathbb{F})$ is itself a vector space or, equivalently, if and only if $(\alpha w_1 + \beta w_2) \in W$ for all $\alpha, \beta \in \mathbb{F}$ and for all $w_1, w_2 \in W$.

Remark 2.7. The latter characterization of a subspace is often the easiest way to check or prove that something is indeed a subspace (or vector space); i.e., verify that the set in question is closed under addition and scalar multiplication. Note, too, that since $0 \in \mathbb{F}$, this implies that the zero vector must be in any subspace.

Notation: When the underlying field is understood, we write $W \subseteq V$, and the symbol $\subseteq$, when used with vector spaces, is henceforth understood to mean "is a subspace of." The less restrictive meaning "is a subset of" is specifically flagged as such.

Example 2.8.

1. Consider $(V, \mathbb{F}) = (\mathbb{R}^{n \times n}, \mathbb{R})$ and let $W = \{A \in \mathbb{R}^{n \times n} : A \text{ is symmetric}\}$. Then $W \subseteq V$.

Proof: Suppose $A_1, A_2$ are symmetric. Then it is easily shown that $\alpha A_1 + \beta A_2$ is symmetric for all $\alpha, \beta \in \mathbb{R}$.

2. Let $W = \{A \in \mathbb{R}^{n \times n} : A \text{ is orthogonal}\}$. Then $W$ is not a subspace of $\mathbb{R}^{n \times n}$.

3. Consider $(V, \mathbb{F}) = (\mathbb{R}^2, \mathbb{R})$ and for each $v \in \mathbb{R}^2$ of the form $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ identify $v_1$ with the $x$-coordinate in the plane and $v_2$ with the $y$-coordinate. For $\alpha, \beta \in \mathbb{R}$, define
$$W_{\alpha,\beta} = \left\{ v : v = \begin{bmatrix} c \\ \alpha c + \beta \end{bmatrix} ;\ c \in \mathbb{R} \right\}.$$
Then $W_{\alpha,\beta}$ is a subspace of $V$ if and only if $\beta = 0$. As an interesting exercise, sketch $W_{2,1}$, $W_{2,0}$, $W_{\frac{1}{2},1}$, and $W_{\frac{1}{2},0}$. Note, too, that the vertical line through the origin (i.e., $\alpha = \infty$) is also a subspace.

All lines through the origin are subspaces. Shifted subspaces $W_{\alpha,\beta}$ with $\beta \neq 0$ are called linear varieties.

Henceforth, we drop the explicit dependence of a vector space on an underlying field. Thus, $V$ usually denotes a vector space with the underlying field generally being $\mathbb{R}$ unless explicitly stated otherwise.

Definition 2.9. If $\mathcal{R}$ and $\mathcal{S}$ are vector spaces (or subspaces), then $\mathcal{R} = \mathcal{S}$ if and only if $\mathcal{R} \subseteq \mathcal{S}$ and $\mathcal{S} \subseteq \mathcal{R}$.

Note: To prove two vector spaces are equal, one usually proves the two inclusions separately: An arbitrary $r \in \mathcal{R}$ is shown to be an element of $\mathcal{S}$ and then an arbitrary $s \in \mathcal{S}$ is shown to be an element of $\mathcal{R}$.

2.3 Linear Independence

Let $X = \{v_1, v_2, \ldots\}$ be a nonempty collection of vectors $v_i$ in some vector space $V$.

Definition 2.10. $X$ is a linearly dependent set of vectors if and only if there exist $k$ distinct elements $v_1, \ldots, v_k \in X$ and scalars $\alpha_1, \ldots, \alpha_k$ not all zero such that
$$\alpha_1 v_1 + \cdots + \alpha_k v_k = 0.$$
$X$ is a linearly independent set of vectors if and only if for any collection of $k$ distinct elements $v_1, \ldots, v_k$ of $X$ and for any scalars $\alpha_1, \ldots, \alpha_k$,
$$\alpha_1 v_1 + \cdots + \alpha_k v_k = 0 \ \text{ implies } \ \alpha_1 = 0, \ldots, \alpha_k = 0.$$



Example 2.11.

1. Let $V = \mathbb{R}^3$. Then the set of standard unit vectors $\{e_1, e_2, e_3\}$ is a linearly independent set. Why? However, a set of three vectors $\{v_1, v_2, v_3\}$ is a linearly dependent set whenever the vectors satisfy a nontrivial relation such as $2 v_1 - v_2 + v_3 = 0$.

2. Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times m}$. Then consider the rows of $e^{tA} B$ as vectors in $(C[t_0, t_1])^m$ (recall that $e^{tA}$ denotes the matrix exponential, which is discussed in more detail in Chapter 11). Independence of these vectors turns out to be equivalent to a concept called controllability, to be studied further in what follows.

Let $v_i \in \mathbb{R}^n$, $i \in \underline{k}$, and consider the matrix $V = [v_1, \ldots, v_k] \in \mathbb{R}^{n \times k}$. The linear dependence of this set of vectors is equivalent to the existence of a nonzero vector $a \in \mathbb{R}^k$ such that $Va = 0$. An equivalent condition for linear dependence is that the $k \times k$ matrix $V^T V$ is singular. If the set of vectors is independent, and there exists $a \in \mathbb{R}^k$ such that $Va = 0$, then $a = 0$. An equivalent condition for linear independence is that the matrix $V^T V$ is nonsingular.
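As a computational aside (an illustrative NumPy sketch; in floating-point arithmetic one would normally judge "nearness" to dependence with the rank or the SVD of Chapter 5 rather than an exact singularity test), the $V^T V$ criterion can be checked as follows:

```python
import numpy as np

# Columns of V are the vectors v_1, ..., v_k in R^n.
V_indep = np.array([[1.0, 1.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])
V_dep = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])   # second column is twice the first

for V in (V_indep, V_dep):
    gram = V.T @ V                               # the k x k matrix V^T V
    independent = np.linalg.matrix_rank(V) == V.shape[1]
    print(np.linalg.det(gram), independent)      # det(V^T V) != 0 exactly when independent
```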
Definition 2.12. Let $X = \{v_1, v_2, \ldots\}$ be a collection of vectors $v_i \in V$. Then the span of $X$ is defined as
$$\mathrm{Sp}(X) = \mathrm{Sp}\{v_1, v_2, \ldots\} = \{v : v = \alpha_1 v_1 + \cdots + \alpha_k v_k ;\ \alpha_i \in \mathbb{F},\ v_i \in X,\ k \in \mathbb{N}\},$$
where $\mathbb{N} = \{1, 2, \ldots\}$.

Example 2.13. Let $V = \mathbb{R}^n$ and define
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \ldots, \quad e_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.$$
Then $\mathrm{Sp}\{e_1, e_2, \ldots, e_n\} = \mathbb{R}^n$.

Definition 2.14. A set of vectors $X$ is a basis for $V$ if and only if

1. $X$ is a linearly independent set (of basis vectors), and

2. $\mathrm{Sp}(X) = V$.

Example 2.15. $\{e_1, \ldots, e_n\}$ is a basis for $\mathbb{R}^n$ (sometimes called the natural basis).

Now let $b_1, \ldots, b_n$ be a basis (with a specific order associated with the basis vectors) for $V$. Then for all $v \in V$ there exists a unique $n$-tuple $\{\xi_1, \ldots, \xi_n\}$ such that
$$v = \xi_1 b_1 + \cdots + \xi_n b_n = B x,$$
where
$$B = [b_1, \ldots, b_n], \quad x = \begin{bmatrix} \xi_1 \\ \vdots \\ \xi_n \end{bmatrix}.$$

Definition 2.16. The scalars $\{\xi_i\}$ are called the components (or sometimes the coordinates) of $v$ with respect to the basis $\{b_1, \ldots, b_n\}$ and are unique. We say that the vector $x$ of components represents the vector $v$ with respect to the basis $B$.

Example 2.17. In $\mathbb{R}^n$,
$$v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = v_1 e_1 + v_2 e_2 + \cdots + v_n e_n.$$
We can also determine components of $v$ with respect to another basis. For example, while
$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1 \cdot e_1 + 2 \cdot e_2,$$
with respect to the basis
$$\left\{ \begin{bmatrix} -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}$$
we have
$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 3 \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix} + 4 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
To see this, write
$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} = x_1 \cdot \begin{bmatrix} -1 \\ 2 \end{bmatrix} + x_2 \cdot \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 2 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
Then
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 2 & -1 \end{bmatrix}^{-1} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}.$$
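Numerically, the components with respect to a new basis are obtained by solving a linear system; a minimal NumPy sketch (illustrative only, solving rather than explicitly inverting $B$) is:

```python
import numpy as np

B = np.array([[-1.0,  1.0],
              [ 2.0, -1.0]])   # columns are the new basis vectors b_1, b_2
v = np.array([1.0, 2.0])

x = np.linalg.solve(B, v)      # components of v with respect to {b_1, b_2}
print(x)                       # [3. 4.]
print(np.allclose(B @ x, v))   # True: v = 3*b_1 + 4*b_2
```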
Theorem 2.18. The number of elements in a basis of a vector space is independent of the particular basis considered.

Definition 2.19. If a basis $X$ for a vector space $V$ ($\neq 0$) has $n$ elements, $V$ is said to be n-dimensional or have dimension $n$ and we write $\dim(V) = n$ or $\dim V = n$. For consistency, and because the $0$ vector is in any vector space, we define $\dim(0) = 0$. A vector space $V$ is finite-dimensional if there exists a basis $X$ with $n < +\infty$ elements; otherwise, $V$ is infinite-dimensional.

Thus, Theorem 2.18 says that $\dim(V)$ = the number of elements in a basis.

Example 2.20.

1. $\dim(\mathbb{R}^n) = n$.

2. $\dim(\mathbb{R}^{m \times n}) = mn$.

Note: Check that a basis for $\mathbb{R}^{m \times n}$ is given by the $mn$ matrices $E_{ij}$; $i \in \underline{m}$, $j \in \underline{n}$, where $E_{ij}$ is a matrix all of whose elements are $0$ except for a $1$ in the $(i,j)$th location. The collection of $E_{ij}$ matrices can be called the "natural basis matrices."

3. $\dim(C[t_0, t_1]) = +\infty$.

4. $\dim\{A \in \mathbb{R}^{n \times n} : A = A^T\} = \frac{1}{2} n(n + 1)$.

(To see why, determine $\frac{1}{2} n(n + 1)$ symmetric basis matrices.)

5. $\dim\{A \in \mathbb{R}^{n \times n} : A \text{ is upper (lower) triangular}\} = \frac{1}{2} n(n + 1)$.

2.4    Sums and Intersections of Subspaces

Definition 2.21. Let (V, F) be a vector space and let R, S ⊆ V. The sum and intersection of R and S are defined respectively by:

1. R + S = {r + s : r ∈ R, s ∈ S}.

2. R ∩ S = {v : v ∈ R and v ∈ S}.

Theorem 2.22.

1. R + S ⊆ V (in general, R1 + ... + Rk =: Σ_{i=1}^{k} Ri ⊆ V, for finite k).

2. R ∩ S ⊆ V (in general, ∩_{α∈A} Rα ⊆ V for an arbitrary index set A).

Remark 2.23. The union of two subspaces, R ∪ S, is not necessarily a subspace.

Definition 2.24. T = R ⊕ S is the direct sum of R and S if

1. R ∩ S = 0, and

2. R + S = T (in general, Ri ∩ (Σ_{j≠i} Rj) = 0 and Σ_i Ri = T).

The subspaces R and S are said to be complements of each other in T.

Remark 2.25. The complement of R (or S) is not unique. For example, consider V = R^2 and let R be any line through the origin. Then any other distinct line through the origin is a complement of R. Among all the complements there is a unique one orthogonal to R. We discuss more about orthogonal complements elsewhere in the text.
Theorem 2.26. Suppose T = R ⊕ S. Then

1. every t ∈ T can be written uniquely in the form t = r + s with r ∈ R and s ∈ S.

2. dim(T) = dim(R) + dim(S).

Proof: To prove the first part, suppose an arbitrary vector t ∈ T can be written in two ways as t = r1 + s1 = r2 + s2, where r1, r2 ∈ R and s1, s2 ∈ S. Then r1 - r2 = s2 - s1. But r1 - r2 ∈ R and s2 - s1 ∈ S. Since R ∩ S = 0, we must have r1 = r2 and s1 = s2, from which uniqueness follows.

The statement of the second part is a special case of the next theorem.  □
Theorem 2.27. For arbitrary subspaces R, S of a vector space V,

    dim(R + S) = dim(R) + dim(S) - dim(R ∩ S).

Example 2.28. Let U be the subspace of upper triangular matrices in R^{n x n} and let L be the subspace of lower triangular matrices in R^{n x n}. Then it may be checked that U + L = R^{n x n} while U ∩ L is the set of diagonal matrices in R^{n x n}. Using the fact that dim{diagonal matrices} = n, together with Examples 2.20.2 and 2.20.5, one can easily verify the validity of the formula given in Theorem 2.27.
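A small numerical sketch in Python (assuming NumPy) of this verification, computing the dimensions as ranks of flattened spanning sets:

    import numpy as np

    n = 3
    E = lambda i, j: np.eye(n)[:, [i]] @ np.eye(n)[[j], :]    # natural basis matrix E_ij

    upper = [E(i, j).flatten() for i in range(n) for j in range(n) if i <= j]
    lower = [E(i, j).flatten() for i in range(n) for j in range(n) if i >= j]

    dim_U  = np.linalg.matrix_rank(np.vstack(upper))           # n(n+1)/2 = 6
    dim_L  = np.linalg.matrix_rank(np.vstack(lower))           # n(n+1)/2 = 6
    dim_UL = np.linalg.matrix_rank(np.vstack(upper + lower))   # dim(U + L) = n^2 = 9
    print(dim_U + dim_L - dim_UL)                              # dim(U ∩ L) = n = 3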
Example 2.29. Let (V, F) = (R^{n x n}, R), let R be the set of skew-symmetric matrices in R^{n x n}, and let S be the set of symmetric matrices in R^{n x n}. Then V = R ⊕ S.

Proof: This follows easily from the fact that any A ∈ R^{n x n} can be written in the form

    A = (1/2)(A + A^T) + (1/2)(A - A^T).

The first matrix on the right-hand side above is in S while the second is in R.
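A minimal Python/NumPy sketch of this decomposition:

    import numpy as np

    A = np.random.rand(4, 4)
    S = 0.5 * (A + A.T)          # symmetric part
    R = 0.5 * (A - A.T)          # skew-symmetric part

    print(np.allclose(A, S + R))     # True
    print(np.allclose(S, S.T))       # True
    print(np.allclose(R, -R.T))      # True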

EXERCISES

1. Suppose {v1, ..., vk} is a linearly dependent set. Then show that one of the vectors must be a linear combination of the others.

2. Let x1, x2, ..., xk ∈ R^n be nonzero mutually orthogonal vectors. Show that {x1, ..., xk} must be a linearly independent set.

3. Let v1, ..., vn be orthonormal vectors in R^n. Show that Av1, ..., Avn are also orthonormal if and only if A ∈ R^{n x n} is orthogonal.

4. Consider the vectors v1 = [2, 1]^T and v2 = [3, 1]^T. Prove that v1 and v2 form a basis for R^2. Find the components of the vector v = [4, 1]^T with respect to this basis.

5. Let P denote the set of polynomials of degree less than or equal to two of the form p0 + p1 x + p2 x^2, where p0, p1, p2 ∈ R. Show that P is a vector space over R. Show that the polynomials 1, x, and 2x^2 - 1 are a basis for P. Find the components of the polynomial 2 + 3x + 4x^2 with respect to this basis.

6. Prove Theorem 2.22 (for the case of two subspaces R and S only).

7. Let P^n denote the vector space of polynomials of degree less than or equal to n, and of the form p(x) = p0 + p1 x + ... + pn x^n, where the coefficients pi are all real. Let PE denote the subspace of all even polynomials in P^n, i.e., those that satisfy the property p(-x) = p(x). Similarly, let PO denote the subspace of all odd polynomials, i.e., those satisfying p(-x) = -p(x). Show that P^n = PE ⊕ PO.

8. Repeat Example 2.28 using instead the two subspaces T of tridiagonal matrices and U of upper triangular matrices.
Chapter 3

Linear Transformations

3.1    Definition and Examples

We begin with the basic definition of a linear transformation (or linear map, linear function, or linear operator) between two vector spaces.

Definition 3.1. Let (V, F) and (W, F) be vector spaces. Then L : V → W is a linear transformation if and only if

    L(α v1 + β v2) = α L v1 + β L v2  for all α, β ∈ F and for all v1, v2 ∈ V.

The vector space V is called the domain of the transformation L, while W, the space into which it maps, is called the co-domain.

Example 3.2.

1. Let F = R and take V = W = PC[t0, +∞). Define L : PC[t0, +∞) → PC[t0, +∞) by

       v(t) ↦ w(t) = (Lv)(t) = ∫_{t0}^{t} e^{-(t-τ)} v(τ) dτ.

2. Let F = R and take V = W = R^{m x n}. Fix M ∈ R^{m x m}. Define L : R^{m x n} → R^{m x n} by

       X ↦ Y = LX = MX.

3. Let F = R and take V = P^n = {p(x) = a0 + a1 x + ... + an x^n : ai ∈ R} and W = P^{n-1}. Define L : V → W by Lp = p', where ' denotes differentiation with respect to x.


3.2    Matrix Representation of Linear Transformations

Linear transformations between vector spaces with specific bases can be represented conveniently in matrix form. Specifically, suppose L : (V, F) → (W, F) is linear and further suppose that {vi, i ∈ {1, ..., n}} and {wj, j ∈ {1, ..., m}} are bases for V and W, respectively. Then the ith column of A = Mat L (the matrix representation of L with respect to the given bases for V and W) is the representation of L vi with respect to {wj, j ∈ {1, ..., m}}. In other words,

    A = [a1, ..., an] ∈ R^{m x n}

represents L since

    L vi = a1i w1 + ... + ami wm = W ai,

where W = [w1, ..., wm] and ai = [a1i, ..., ami]^T is the ith column of A. Note that A = Mat L depends on the particular bases for V and W. This could be reflected by subscripts, say, in the notation, but this is usually not done.

The action of L on an arbitrary vector v ∈ V is uniquely determined (by linearity) by its action on a basis. Thus, if v = ξ1 v1 + ... + ξn vn = V x (where v, and hence x, is arbitrary), then

    L V x = L v = ξ1 L v1 + ... + ξn L vn
          = ξ1 W a1 + ... + ξn W an
          = W A x.

Thus, L V = W A since x was arbitrary.

When V = R^n, W = R^m and {vi, i ∈ {1, ..., n}}, {wj, j ∈ {1, ..., m}} are the usual (natural) bases, the equation L V = W A becomes simply L = A. We thus commonly identify A as a linear transformation with its matrix representation, i.e.,

    A : R^n → R^m.

Thinking of A both as a matrix and as a linear transformation from R^n to R^m usually causes no confusion. Change of basis then corresponds naturally to appropriate matrix multiplication.
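A small Python/NumPy sketch of Mat L for the differentiation operator of Example 3.2.3, with respect to the monomial bases {1, x, ..., x^n} and {1, x, ..., x^(n-1)} (an illustrative choice of bases):

    import numpy as np

    n = 3
    # Column i of A holds the components of L(x^i) = i*x^(i-1)
    # with respect to the basis {1, x, ..., x^(n-1)} of the co-domain.
    A = np.zeros((n, n + 1))
    for i in range(1, n + 1):
        A[i - 1, i] = i

    # Differentiate p(x) = 2 + 3x + 4x^2 (coefficients in increasing powers of x).
    p = np.array([2.0, 3.0, 4.0, 0.0])
    print(A @ p)    # [3. 8. 0.] , i.e., p'(x) = 3 + 8x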
3.3. Composition
3.3. Composition of Transformations
ofTransformations 19
19

3.3    Composition of Transformations

Consider three vector spaces U, V, and W and transformations B from U to V and A from V to W. Then we can define a new transformation C as follows:

    U --B--> V --A--> W,    C = AB.

The above diagram illustrates the composition of transformations C = AB. Note that in most texts, the arrows above are reversed as follows:

    W <--A-- V <--B-- U.

However, it might be useful to prefer the former since the transformations A and B appear in the same order in both the diagram and the equation. If dim U = p, dim V = n, and dim W = m, and if we associate matrices with the transformations in the usual way, then composition of transformations corresponds to standard matrix multiplication. That is, we have C_{m x p} = A_{m x n} B_{n x p}. The above is sometimes expressed componentwise by the formula

    c_ij = Σ_{k=1}^{n} a_ik b_kj.

Two Special Cases:

Inner Product: Let x, y ∈ R^n. Then their inner product is the scalar

    x^T y = Σ_{i=1}^{n} x_i y_i.

Outer Product: Let x ∈ R^m, y ∈ R^n. Then their outer product is the m x n matrix

    x y^T = [x_i y_j].

Note that any rank-one matrix A ∈ R^{m x n} can be written in the form A = x y^T above (or x y^H if A ∈ C^{m x n}). A rank-one symmetric matrix can be written in the form x x^T (or x x^H).
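A brief Python/NumPy sketch contrasting the two products and checking that an outer product has rank one:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, 5.0, 6.0])

    print(x @ y)                          # inner product x^T y = 32.0
    A = np.outer(x, y)                    # outer product x y^T, a 3 x 3 matrix
    print(np.linalg.matrix_rank(A))       # 1, as expected for a rank-one matrix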

3.4    Structure of Linear Transformations

Let A : V → W be a linear transformation.

Definition 3.3. The range of A, denoted R(A), is the set {w ∈ W : w = Av for some v ∈ V}. Equivalently, R(A) = {Av : v ∈ V}. The range of A is also known as the image of A and denoted Im(A).

The nullspace of A, denoted N(A), is the set {v ∈ V : Av = 0}. The nullspace of A is also known as the kernel of A and denoted Ker(A).

Theorem 3.4. Let A : V → W be a linear transformation. Then

1. R(A) ⊆ W.

2. N(A) ⊆ V.

Note that N(A) and R(A) are, in general, subspaces of different spaces.

Theorem 3.5. Let A ∈ R^{m x n}. If A is written in terms of its columns as A = [a1, ..., an], then

    R(A) = Sp{a1, ..., an}.

Proof: The proof of this theorem is easy, essentially following immediately from the definition.  □

Remark 3.6. Note that in Theorem 3.5 and throughout the text, the same symbol (A) is used to denote both a linear transformation and its matrix representation with respect to the usual (natural) bases. See also the last paragraph of Section 3.2.

Definition 3.7. Let {v1, ..., vk} be a set of nonzero vectors vi ∈ R^n. The set is said to be orthogonal if vi^T vj = 0 for i ≠ j and orthonormal if vi^T vj = δij, where δij is the Kronecker delta defined by

    δij = 1 if i = j,  and δij = 0 if i ≠ j.

Example 3.8.

1. {[1, 2]^T, [2, -1]^T} is an orthogonal set.

2. {[1/√5, 2/√5]^T, [2/√5, -1/√5]^T} is an orthonormal set.

3. If {v1, ..., vk} with vi ∈ R^n is an orthogonal set, then {v1/√(v1^T v1), ..., vk/√(vk^T vk)} is an orthonormal set.

Definition 3.9. Let S ⊆ R^n. Then the orthogonal complement of S is defined as the set

    S^⊥ = {v ∈ R^n : v^T s = 0 for all s ∈ S}.

Example 3.10. Let

    S = Sp{[3, 5, 7]^T, [-4, 1, 1]^T}.

Then it can be shown that

    S^⊥ = Sp{[2, 31, -23]^T}.

Working from the definition, the computation involved is simply to find all nontrivial (i.e., nonzero) solutions of the system of equations

    3x1 + 5x2 + 7x3 = 0,
    -4x1 + x2 + x3 = 0.

Note that there is nothing special about the two vectors in the basis defining S being orthogonal. Any set of vectors will do, including dependent spanning vectors (which would, of course, then give rise to redundant equations).
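A Python/NumPy sketch of this computation, finding the nullspace of the matrix whose rows span S:

    import numpy as np

    # Rows span S; S-perp is the nullspace of this matrix.
    M = np.array([[ 3.0, 5.0, 7.0],
                  [-4.0, 1.0, 1.0]])

    _, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > 1e-12))
    null_basis = Vt[rank:]          # right singular vectors for the zero singular values
    print(null_basis)               # one row, proportional to [2, 31, -23]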

Theorem 3.11. Let R, S ⊆ R^n. Then

1. S^⊥ is a subspace of R^n.

2. S ⊕ S^⊥ = R^n.

3. (S^⊥)^⊥ = S.

4. R ⊆ S if and only if S^⊥ ⊆ R^⊥.

5. (R + S)^⊥ = R^⊥ ∩ S^⊥.

6. (R ∩ S)^⊥ = R^⊥ + S^⊥.

Proof: We prove and discuss only item 2 here. The proofs of the other results are left as exercises. Let {v1, ..., vk} be an orthonormal basis for S and let x ∈ R^n be an arbitrary vector. Set

    x1 = Σ_{i=1}^{k} (x^T vi) vi,
    x2 = x - x1.

Then x1 ∈ S and, since

    x2^T vj = x^T vj - x1^T vj = x^T vj - x^T vj = 0,

we see that x2 is orthogonal to v1, ..., vk and hence to any linear combination of these vectors. In other words, x2 is orthogonal to any vector in S. We have thus shown that S + S^⊥ = R^n. We also have that S ∩ S^⊥ = 0 since the only vector s ∈ S orthogonal to everything in S (i.e., including itself) is 0.

It is also easy to see directly that, when we have such direct sum decompositions, we can write vectors in a unique way with respect to the corresponding subspaces. Suppose, for example, that x = x1 + x2 = x1' + x2', where x1, x1' ∈ S and x2, x2' ∈ S^⊥. Then (x1' - x1)^T (x2' - x2) = 0 by definition of S^⊥. But then (x1' - x1)^T (x1' - x1) = 0 since x2' - x2 = -(x1' - x1) (which follows by rearranging the equation x1 + x2 = x1' + x2'). Thus, x1 = x1' and x2 = x2'.  □
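A brief Python/NumPy sketch of the construction used in the proof (projecting x onto an orthonormal basis of S and checking that the remainder is orthogonal to S; the columns below span the S of Example 3.10):

    import numpy as np

    # Orthonormal basis for S (columns of V), obtained here via QR for convenience.
    V, _ = np.linalg.qr(np.array([[3.0, -4.0],
                                  [5.0,  1.0],
                                  [7.0,  1.0]]))
    x = np.array([1.0, 2.0, 3.0])

    x1 = V @ (V.T @ x)      # sum of (x^T v_i) v_i, the component in S
    x2 = x - x1             # the component in S-perp
    print(np.allclose(V.T @ x2, 0))   # True: x2 is orthogonal to every basis vector of S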
Theorem 3.12. Let A : R^n → R^m. Then

1. N(A)^⊥ = R(A^T). (Note: This holds only for finite-dimensional vector spaces.)

2. R(A)^⊥ = N(A^T). (Note: This also holds for infinite-dimensional vector spaces.)
Proof: To prove the first part, take an arbitrary x ∈ N(A). Then Ax = 0 and this is equivalent to y^T Ax = 0 for all y. But y^T Ax = (A^T y)^T x. Thus, Ax = 0 if and only if x is orthogonal to all vectors of the form A^T y, i.e., x ∈ R(A^T)^⊥. Since x was arbitrary, we have established that N(A)^⊥ = R(A^T).

The proof of the second part is similar and is left as an exercise.  □
Definition 3.13. Let A : R^n → R^m. Then {v ∈ R^n : Av = 0} is sometimes called the right nullspace of A. Similarly, {w ∈ R^m : w^T A = 0} is called the left nullspace of A. Clearly, the right nullspace is N(A) while the left nullspace is N(A^T).
Theorem 3.12 and part 2 of Theorem 3.11 can be combined to give two very fundamental and useful decompositions of vectors in the domain and co-domain of a linear transformation A. See also Theorem 2.26.

Theorem 3.14 (Decomposition Theorem). Let A : R^n → R^m. Then

1. every vector v in the domain space R^n can be written in a unique way as v = x + y, where x ∈ N(A) and y ∈ N(A)^⊥ = R(A^T) (i.e., R^n = N(A) ⊕ R(A^T)).

2. every vector w in the co-domain space R^m can be written in a unique way as w = x + y, where x ∈ R(A) and y ∈ R(A)^⊥ = N(A^T) (i.e., R^m = R(A) ⊕ N(A^T)).

This key theorem becomes very easy to remember by carefully studying and understanding Figure 3.1 in the next section.

3.5    Four Fundamental Subspaces

Consider a general matrix A ∈ R_r^{m x n}. When thought of as a linear transformation from R^n to R^m, many properties of A can be developed in terms of the four fundamental subspaces R(A), R(A)^⊥, N(A), and N(A)^⊥. Figure 3.1 makes many key properties seem almost obvious and we return to this figure frequently both in the context of linear transformations and in illustrating concepts such as controllability and observability.

Figure 3.1. Four fundamental subspaces.
Definition 3.15. Let V and W be vector spaces and let A : V → W be a linear transformation.

1. A is onto (also called epic or surjective) if R(A) = W.

2. A is one-to-one or 1-1 (also called monic or injective) if N(A) = 0. Two equivalent characterizations of A being 1-1 that are often easier to verify in practice are the following:

   (a) A v1 = A v2 ⟹ v1 = v2.

   (b) v1 ≠ v2 ⟹ A v1 ≠ A v2.

Definition 3.16. Let A : R^n → R^m. Then rank(A) = dim R(A). This is sometimes called the column rank of A (maximum number of independent columns). The row rank of A is
dim R(A^T) (maximum number of independent rows). The dual notion to rank is the nullity of A, sometimes denoted nullity(A) or corank(A), and is defined as dim N(A).
Theorem 3.17.
3.17. Let A :: R n
~ R
]Rn ->
m
]Rm.. Then dim K(A) dimA/'(A) ± . (Note:
R(A) = dimNCA)-L. (Note: Since
1 TT
A/^A) = 7l(A
N(A)-L" = R(A ),), this theorem is sometimes colloquially
colloquially stated "row rank of
of A == column
of A.")
rank of A.")

Proof: J\f(A)~L ~
Proof: Define a linear transformation T : N(A)-L —>•R(A)
7£(A)byby

Tv = Av for all v E N(A)-L.

Clearly T is 1-1 (since A/"(T) N(T) = To see that T is also onto, take any W
= 0). To w eE R(A).
7£(A). Then
by definition there is a vector xx Ee ]Rn R" such that AxAx = — w.
w. Write xx = Xl x\ + X2,
X2, where
1
x\ Ee A/^A)
Xl N(A)-L - andx2
and jc2 eE A/"(A).
N(A). Then AjtiAXI = W u; = TXI
r*i since Xl N(A)-L.1. The last equality
*i eE A/^A)-
shows that T T is onto. We thus have that dim dim7?.(A)
R(A) = = dimN(A)-L
dimA/^A^ since it is easily shown
1
{ui, ...
that if {VI, . . . ,, viv} abasis
r } is a forA/'CA)
basis for N(A)-L,, then {TVI, . . . ,, Tv
{Tv\, ... Tvrr]} is aabasis 7?.(A). Finally, if
basis for R(A). if
we apply this and several previous results, the following following string of equalities follows follows easily:
"column rank of A"
"column rank(A) =
A" = rank(A) = dim R(A) =
dim7e(A) dim A/^A)1 =
= dimN(A)-L = dim
dim7l(A
R(AT) T
) = rank(A r ) ==
= rank(AT)
"row rank of of A." D0

The following corollary is immediate. Like the theorem, it is a statement about equality of dimensions; the subspaces themselves are not necessarily in the same vector space.

Corollary 3.18. Let A : R^n → R^m. Then dim N(A) + dim R(A) = n, where n is the dimension of the domain of A.

Proof: From Theorems 3.11 and 3.17 we see immediately that

    n = dim N(A) + dim N(A)^⊥
      = dim N(A) + dim R(A).  □
For completeness, we include here a few miscellaneous results about ranks of sums and products of matrices.

Theorem 3.19. Let A, B ∈ R^{n x n}. Then

1. 0 ≤ rank(A + B) ≤ rank(A) + rank(B).

2. rank(A) + rank(B) - n ≤ rank(AB) ≤ min{rank(A), rank(B)}.

3. nullity(B) ≤ nullity(AB) ≤ nullity(A) + nullity(B).

4. If B is nonsingular, rank(AB) = rank(BA) = rank(A) and N(BA) = N(A).
Part 4 of Theorem 3.19 suggests looking at the general problem of the four fundamental subspaces of matrix products. The basic results are contained in the following easily proved theorem.

Theorem 3.20. Let A ∈ R^{m x n}, B ∈ R^{n x p}. Then

1. R(AB) ⊆ R(A).

2. N(AB) ⊇ N(B).

3. R((AB)^T) ⊆ R(B^T).

4. N((AB)^T) ⊇ N(A^T).

The next theorem is closely related to Theorem 3.20 and is also easily proved. It is extremely useful in the text that follows, especially when dealing with pseudoinverses and linear least squares problems.

Theorem 3.21. Let A ∈ R^{m x n}. Then

1. R(A) = R(AA^T).

2. R(A^T) = R(A^T A).

3. N(A) = N(A^T A).

4. N(A^T) = N(AA^T).

We now characterize 1-1 and onto transformations and provide characterizations in terms of rank and invertibility.

Theorem 3.22. Let A : R^n → R^m. Then

1. A is onto if and only if rank(A) = m (A has linearly independent rows or is said to have full row rank; equivalently, AA^T is nonsingular).

2. A is 1-1 if and only if rank(A) = n (A has linearly independent columns or is said to have full column rank; equivalently, A^T A is nonsingular).

Proof: Proof of part 1: If A is onto, dim R(A) = m = rank(A). Conversely, let y ∈ R^m be arbitrary. Let x = A^T (AA^T)^{-1} y ∈ R^n. Then y = Ax, i.e., y ∈ R(A), so A is onto.

Proof of part 2: If A is 1-1, then N(A) = 0, which implies that dim N(A)^⊥ = n = dim R(A^T), and hence dim R(A) = n by Theorem 3.17. Conversely, suppose A x1 = A x2. Then A^T A x1 = A^T A x2, which implies x1 = x2 since A^T A is invertible. Thus, A is 1-1.  □
Definition 3.23. A : V → W is invertible (or bijective) if and only if it is 1-1 and onto.

Note that if A is invertible, then dim V = dim W. Also, A : R^n → R^n is invertible or nonsingular if and only if rank(A) = n.

Note that in the special case when A ∈ R_n^{n x n}, the transformations A, A^T, and A^{-1} are all 1-1 and onto between the two spaces N(A)^⊥ and R(A). The transformations A^T and A^{-1} have the same domain and range but are in general different maps unless A is orthogonal. Similar remarks apply to A and A^{-T}.

If a linear transformation is not invertible, it may still be right or left invertible. Definitions of these concepts are followed by a theorem characterizing left and right invertible transformations.

Definition 3.24. Let A : V → W. Then

1. A is said to be right invertible if there exists a right inverse transformation A^{-R} : W → V such that A A^{-R} = I_W, where I_W denotes the identity transformation on W.

2. A is said to be left invertible if there exists a left inverse transformation A^{-L} : W → V such that A^{-L} A = I_V, where I_V denotes the identity transformation on V.

Theorem 3.25. Let A : V → W. Then

1. A is right invertible if and only if it is onto.

2. A is left invertible if and only if it is 1-1.

Moreover, A is invertible if and only if it is both right and left invertible, i.e., both 1-1 and onto, in which case A^{-1} = A^{-R} = A^{-L}.

Note: From Theorem 3.22 we see that if A : R^n → R^m is onto, then a right inverse is given by A^{-R} = A^T (AA^T)^{-1}. Similarly, if A is 1-1, then a left inverse is given by A^{-L} = (A^T A)^{-1} A^T.
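A short Python/NumPy sketch of these formulas, using a fat matrix with full row rank and a tall matrix with full column rank:

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0]])              # 2 x 3, full row rank (onto)
    A_R = A.T @ np.linalg.inv(A @ A.T)           # right inverse: A^T (A A^T)^{-1}
    print(np.allclose(A @ A_R, np.eye(2)))       # True

    B = A.T                                      # 3 x 2, full column rank (1-1)
    B_L = np.linalg.inv(B.T @ B) @ B.T           # left inverse: (B^T B)^{-1} B^T
    print(np.allclose(B_L @ B, np.eye(2)))       # True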

Theorem 3.26. Let A : V → V.

1. If there exists a unique right inverse A^{-R} such that A A^{-R} = I, then A is invertible.

2. If there exists a unique left inverse A^{-L} such that A^{-L} A = I, then A is invertible.

Proof: We prove the first part and leave the proof of the second to the reader. Notice the following:

    A (A^{-R} + A^{-R} A - I) = A A^{-R} + A A^{-R} A - A
                              = I + I A - A    since A A^{-R} = I
                              = I.

Thus, (A^{-R} + A^{-R} A - I) must be a right inverse and, therefore, by uniqueness it must be the case that A^{-R} + A^{-R} A - I = A^{-R}. But this implies that A^{-R} A = I, i.e., that A^{-R} is a left inverse. It then follows from Theorem 3.25 that A is invertible.  □
Example 3.27.

1. Let A = [1  2] : R^2 → R^1. Then A is onto. (Proof: Take any α ∈ R^1; then one can always find v ∈ R^2 such that [1  2][v1, v2]^T = α.) Obviously A has full row rank (= 1) and A^{-R} = [-1, 1]^T is a right inverse. Also, it is clear that there are infinitely many right inverses for A. In Chapter 6 we characterize all right inverses of a matrix by characterizing all solutions of the linear matrix equation AR = I.

2. Let A = [1, 2]^T : R^1 → R^2. Then A is 1-1. (Proof: The only solution to 0 = Av = [1, 2]^T v is v = 0, whence N(A) = 0 so A is 1-1.) It is now obvious that A has full column rank (= 1) and A^{-L} = [3  -1] is a left inverse. Again, it is clear that there are infinitely many left inverses for A. In Chapter 6 we characterize all left inverses of a matrix by characterizing all solutions of the linear matrix equation LA = I.
3. The matrix

       A = [1  1; 2  1; 3  1],

   when considered as a linear transformation on R^3, is neither 1-1 nor onto. We give below bases for its four fundamental subspaces.

EXERCISES

1. Let

       A = [[~8 5;3 i) J4

   and consider A as a linear transformation mapping R^3 to R^2. Find the matrix representation of A with respect to the bases

       {[lHHU]}

   of R^3 and

       {[il[~J}

   of R^2.
of E .
nx
2. Consider the
2. Consider the vector space R
vector space ]Rnxn" over
over E, let S denote
]R, let denote the
the subspace
subspace of
of symmetric
symmetric
matrices, and
matrices, and let
let 7£ denote the
R denote the subspace
subspace of of skew-symmetric
skew-symmetric matrices.
matrices. For
For matrices
matrices
nx
Y Ee E
X, Y ]Rnxn " define their inner product by (X, (X, Y) Tr(XTr F).
= Tr(X
y) = Y). Show that, with
respect to
respect to this
this inner
inner product,
product, R —SS^.
'R, = J. .

3. Consider the differentiation operator L defined in Example 3.2.3. Is L 1-1? Is L onto?

4. Prove Theorem 3.4.

5. Prove Theorem 3.11.4.

6. Prove Theorem 3.12.2.

7. Determine bases for the four fundamental subspaces of the matrix

A=[~2 5~ 5~ ~].
3

8. Suppose A ∈ R^{m x n} has a left inverse. Show that A^T has a right inverse.

9. Let A = [1  0; 2  0]. Determine N(A) and R(A). Are they equal? Is this true in general? If this is true in general, prove it; if not, provide a counterexample.

10. Suppose A ∈ R_9^{9 x 48}. How many linearly independent solutions can be found to the homogeneous linear system Ax = 0?
11. Modify Figure 3.1 to illustrate the four fundamental subspaces associated with A^T ∈ R^{n x m} thought of as a transformation from R^m to R^n.
Chapter 4

Introduction to the Moore-Penrose Pseudoinverse

In this chapter we give a brief introduction to the Moore-Penrose pseudoinverse, a generalization of the inverse of a matrix. The Moore-Penrose pseudoinverse is defined for any matrix and, as is shown in the following text, brings great notational and conceptual clarity to the study of solutions to arbitrary systems of linear equations and linear least squares problems.

4.1    Definitions and Characterizations

Consider a linear transformation A : X → Y, where X and Y are arbitrary finite-dimensional vector spaces. Define a transformation T : N(A)^⊥ → R(A) by

    T x = A x  for all x ∈ N(A)^⊥.

Then, as noted in the proof of Theorem 3.17, T is bijective (1-1 and onto), and hence we can define a unique inverse transformation T^{-1} : R(A) → N(A)^⊥. This transformation can be used to give our first definition of A^+, the Moore-Penrose pseudoinverse of A. Unfortunately, the definition neither provides nor suggests a good computational strategy for determining A^+.

Definition 4.1. With A and T as defined above, define a transformation A^+ : Y → X by

    A^+ y = T^{-1} y1,

where y = y1 + y2 with y1 ∈ R(A) and y2 ∈ R(A)^⊥. Then A^+ is the Moore-Penrose pseudoinverse of A.
Although X and Y were arbitrary vector spaces above, let us henceforth consider the case X = R^n and Y = R^m. We have thus defined A^+ for all A ∈ R_r^{m x n}. A purely algebraic characterization of A^+ is given in the next theorem, which was proved by Penrose in 1955; see [22].


Theorem 4.2. Let A ∈ R_r^{m x n}. Then G = A^+ if and only if

(P1) AGA = A.

(P2) GAG = G.

(P3) (AG)^T = AG.

(P4) (GA)^T = GA.

Furthermore, A^+ always exists and is unique.
Note that the inverse of a nonsingular matrix satisfies all four Penrose properties. Also, a right or left inverse satisfies no fewer than three of the four properties. Unfortunately, as with Definition 4.1, neither the statement of Theorem 4.2 nor its proof suggests a computational algorithm. However, the Penrose properties do offer the great virtue of providing a checkable criterion in the following sense. Given a matrix G that is a candidate for being the pseudoinverse of A, one need simply verify the four Penrose conditions (P1)-(P4). If G satisfies all four, then by uniqueness, it must be A^+. Such a verification is often relatively straightforward.

Example 4.3. Consider A = [1, 2]^T. Verify directly that A^+ = [1/5  2/5] satisfies (P1)-(P4). Note that other left inverses (for example, A^{-L} = [3  -1]) satisfy properties (P1), (P2), and (P4) but not (P3).
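A small Python/NumPy sketch of this checkable criterion for the example above:

    import numpy as np

    A = np.array([[1.0], [2.0]])          # the 2 x 1 matrix of Example 4.3
    G = np.array([[0.2, 0.4]])            # candidate pseudoinverse [1/5  2/5]

    print(np.allclose(A @ G @ A, A),           # (P1)
          np.allclose(G @ A @ G, G),           # (P2)
          np.allclose((A @ G).T, A @ G),       # (P3)
          np.allclose((G @ A).T, G @ A))       # (P4)  -> all True

    L = np.array([[3.0, -1.0]])           # another left inverse; it fails (P3)
    print(np.allclose((A @ L).T, A @ L))       # False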
Still another characterization of A^+ is given in the following theorem, whose proof can be found in [1, p. 19]. While not generally suitable for computer implementation, this characterization can be useful for hand calculation of small examples.

Theorem 4.4. Let A ∈ R_r^{m x n}. Then

    A^+ = lim_{δ→0} (A^T A + δ^2 I)^{-1} A^T                          (4.1)
        = lim_{δ→0} A^T (A A^T + δ^2 I)^{-1}.                         (4.2)
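A brief Python/NumPy sketch of the limit characterization (4.1), comparing the regularized expression for a small δ with NumPy's built-in pseudoinverse:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [0.0, 1.0]])
    delta = 1e-6

    approx = np.linalg.solve(A.T @ A + delta**2 * np.eye(2), A.T)   # (A^T A + d^2 I)^{-1} A^T
    print(np.allclose(approx, np.linalg.pinv(A), atol=1e-6))        # True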

4.2    Examples

Each of the following can be derived or verified by using the above definitions or characterizations.

Example 4.5. A^+ = A^T (A A^T)^{-1} if A is onto (independent rows) (A is right invertible).

Example 4.6. A^+ = (A^T A)^{-1} A^T if A is 1-1 (independent columns) (A is left invertible).

Example 4.7. For any scalar a,

    a^+ = 1/a if a ≠ 0,   and a^+ = 0 if a = 0.

Example 4.8. For any vector v ∈ R^n,

    v^+ = v^T / (v^T v) if v ≠ 0,   and v^+ = 0 if v = 0.

Example 4.9.
[~ ~ r=[ 0 ~l
Example 4.10.

    [1  1; 1  1]^+ = [1/4  1/4; 1/4  1/4].

4.3    Properties and Applications

This section presents some miscellaneous useful results on pseudoinverses. Many of these are used in the text that follows.
Theorem 4.11. Let A ∈ R^{m x n} and suppose U ∈ R^{m x m}, V ∈ R^{n x n} are orthogonal (M is orthogonal if M^T = M^{-1}). Then

    (U A V)^+ = V^T A^+ U^T.

Proof: For the proof, simply verify that the expression above does indeed satisfy each of the four Penrose conditions.  □
Theorem 4.12. Let S ∈ R^{n x n} be symmetric with U^T S U = D, where U is orthogonal and D is diagonal. Then S^+ = U D^+ U^T, where D^+ is again a diagonal matrix whose diagonal elements are determined according to Example 4.7.
Theorem 4.13. For all A ∈ R^{m x n},

1. A^+ = (A^T A)^+ A^T = A^T (A A^T)^+.

2. (A^T)^+ = (A^+)^T.

Proof: Both results can be proved using the limit characterization of Theorem 4.4. The proof of the first result is not particularly easy and does not even have the virtue of being especially illuminating. The interested reader can consult the proof in [1, p. 27]. The proof of the second result (which can also be proved easily by verifying the four Penrose conditions) is as follows:

    (A^T)^+ = lim_{δ→0} (A A^T + δ^2 I)^{-1} A
            = lim_{δ→0} [A^T (A A^T + δ^2 I)^{-1}]^T
            = [lim_{δ→0} A^T (A A^T + δ^2 I)^{-1}]^T
            = (A^+)^T.  □

Note that by combining Theorems 4.12 and 4.13 we can, in theory at least, compute the Moore-Penrose pseudoinverse of any matrix (since A A^T and A^T A are symmetric). This turns out to be a poor approach in finite-precision arithmetic, however (see, e.g., [7], [11], [23]), and better methods are suggested in the text that follows.

Theorem 4.11 is suggestive of a "reverse-order" property for pseudoinverses of products of matrices such as exists for inverses of products. Unfortunately, in general,

    (AB)^+ ≠ B^+ A^+.

As an example, consider A = [0  1] and B = [1, 1]^T. Then

    (AB)^+ = 1^+ = 1

while

    B^+ A^+ = [1/2  1/2] [0, 1]^T = 1/2.

However, necessary and sufficient conditions under which the reverse-order property does hold are known and we quote a couple of moderately useful results for reference.
Theorem 4.14. (AB)^+ = B^+ A^+ if and only if

1. R(B B^T A^T) ⊆ R(A^T)

and

2. R(A^T A B) ⊆ R(B).

Proof: For the proof, see [9].  □

Theorem 4.15. (AB)^+ = B1^+ A1^+, where B1 = A^+ A B and A1 = A B1 B1^+.

Proof: For the proof, see [5].  □
Theorem 4.16. If A ∈ R_r^{n x r}, B ∈ R_r^{r x m}, then (AB)^+ = B^+ A^+.

Proof: Since A ∈ R_r^{n x r}, A^+ = (A^T A)^{-1} A^T, whence A^+ A = I_r. Similarly, since B ∈ R_r^{r x m}, we have B^+ = B^T (B B^T)^{-1}, whence B B^+ = I_r. The result then follows by taking B1 = B, A1 = A in Theorem 4.15.  □

The following theorem gives some additional useful properties of pseudoinverses.

Theorem 4.17. For all A ∈ R^{m x n},

1. (A^+)^+ = A.

2. (A^T A)^+ = A^+ (A^T)^+,  (A A^T)^+ = (A^T)^+ A^+.

3. R(A^+) = R(A^T) = R(A^+ A) = R(A^T A).

4. N(A^+) = N(A A^+) = N((A A^T)^+) = N(A A^T) = N(A^T).

5. If A is normal, then A^k A^+ = A^+ A^k and (A^k)^+ = (A^+)^k for all integers k > 0.

Note: Recall that A ∈ R^{n x n} is normal if A A^T = A^T A. For example, if A is symmetric, skew-symmetric, or orthogonal, then it is normal. However, a matrix can be none of the preceding but still be normal, such as

    A = [a  b; -b  a]

for scalars a, b ∈ R.
The next theorem is fundamental to facilitating a compact and unifying approach to studying the existence of solutions of (matrix) linear equations and linear least squares problems.

Theorem 4.18. Suppose A ∈ R^{n x p}, B ∈ R^{n x m}. Then R(B) ⊆ R(A) if and only if A A^+ B = B.

Proof: Suppose R(B) ⊆ R(A) and take arbitrary x ∈ R^m. Then Bx ∈ R(B) ⊆ R(A), so there exists a vector y ∈ R^p such that Ay = Bx. Then we have

    Bx = Ay = A A^+ A y = A A^+ B x,

where one of the Penrose properties is used above. Since x was arbitrary, we have shown that B = A A^+ B.

To prove the converse, assume that A A^+ B = B and take arbitrary y ∈ R(B). Then there exists a vector x ∈ R^m such that Bx = y, whereupon

    y = Bx = A A^+ B x ∈ R(A).  □
EXERCISES

1. Use Theorem 4.4 to compute the pseudoinverse
of \ 2 U ;].1 •
2

2. If x, y ∈ R^n, show that (x y^T)^+ = (x^T x)^+ (y^T y)^+ y x^T.
3. For A ∈ R^{m x n}, prove that R(A) = R(A A^T) using only definitions and elementary properties of the Moore-Penrose pseudoinverse.
4. For A ∈ R^{m x n}, prove that R(A^+) = R(A^T).
5. For A ∈ R^{p x n} and B ∈ R^{m x n}, show that N(A) ⊆ N(B) if and only if B A^+ A = B.
6. Let A ∈ R^{n x n}, B ∈ R^{n x m}, and D ∈ R^{m x m} and suppose further that D is nonsingular.

   (a) Prove or disprove that

           [A  AB; 0  D]^+ = [A^+  -A^+ A B D^{-1}; 0  D^{-1}].

   (b) Prove or disprove that

           [A  B; 0  D]^+ = [A^+  -A^+ B D^{-1}; 0  D^{-1}].
Chapter 5

Introduction to the Singular Value Decomposition

In this chapter we give a brief introduction to the singular value decomposition (SVD). We show that every matrix has an SVD and describe some useful properties and applications of this important matrix factorization. The SVD plays a key conceptual and computational role throughout (numerical) linear algebra and its applications.

5.1    The Fundamental Theorem

Theorem 5.1. Let A ∈ R_r^{m x n}. Then there exist orthogonal matrices U ∈ R^{m x m} and V ∈ R^{n x n} such that

    A = U Σ V^T,                                                      (5.1)

where Σ = [S  0; 0  0], S = diag(σ1, ..., σr) ∈ R^{r x r}, and σ1 ≥ ... ≥ σr > 0. More specifically, we have

    A = [U1  U2] [S  0; 0  0] [V1  V2]^T                              (5.2)
      = U1 S V1^T.                                                    (5.3)

The submatrix sizes are all determined by r (which must be ≤ min{m, n}), i.e., U1 ∈ R^{m x r}, U2 ∈ R^{m x (m-r)}, V1 ∈ R^{n x r}, V2 ∈ R^{n x (n-r)}, and the 0-subblocks in Σ are compatibly dimensioned.

Proof: Since A^T A ≥ 0 (A^T A is symmetric and nonnegative definite; recall, for example, [24, Ch. 6]), its eigenvalues are all real and nonnegative. (Note: The rest of the proof follows analogously if we start with the observation that A A^T ≥ 0 and the details are left to the reader as an exercise.) Denote the set of eigenvalues of A^T A by {σi^2, i ∈ {1, ..., n}} with σ1 ≥ ... ≥ σr > 0 = σ_{r+1} = ... = σn. Let {vi, i ∈ {1, ..., n}} be a set of corresponding orthonormal eigenvectors and let V1 = [v1, ..., vr], V2 = [v_{r+1}, ..., vn]. Letting S = diag(σ1, ..., σr), we can write A^T A V1 = V1 S^2. Premultiplying by V1^T gives V1^T A^T A V1 = V1^T V1 S^2 = S^2, the latter equality following from the orthonormality of the vi vectors. Pre- and postmultiplying by S^{-1} gives the equation

    S^{-1} V1^T A^T A V1 S^{-1} = I.                                  (5.4)


Turning now to the eigenvalue equations corresponding to the eigenvalues σ_{r+1}, ..., σn we have that A^T A V2 = V2 · 0 = 0, whence V2^T A^T A V2 = 0. Thus, A V2 = 0. Now define the matrix U1 ∈ R^{m x r} by U1 = A V1 S^{-1}. Then from (5.4) we see that U1^T U1 = I; i.e., the columns of U1 are orthonormal. Choose any matrix U2 ∈ R^{m x (m-r)} such that [U1  U2] is orthogonal. Then

    U^T A V = [U1^T A V1   U1^T A V2]      [U1^T A V1   0]
              [U2^T A V1   U2^T A V2]  =   [U2^T A V1   0]

since A V2 = 0. Referring to the equation U1 = A V1 S^{-1} defining U1, we see that U1^T A V1 = S and U2^T A V1 = U2^T U1 S = 0. The latter equality follows from the orthogonality of the columns of U1 and U2. Thus, we see that, in fact, U^T A V = [S  0; 0  0], and defining this matrix to be Σ completes the proof.  □

Definition 5.2. Let A = U Σ V^T be an SVD of A as in Theorem 5.1.

1. The set {σ1, ..., σr} is called the set of (nonzero) singular values of the matrix A and is denoted Σ(A). From the proof of Theorem 5.1 we see that σi(A) = λi^{1/2}(A^T A) = λi^{1/2}(A A^T). Note that there are also min{m, n} - r zero singular values.

2. The columns of U are called the left singular vectors of A (and are the orthonormal eigenvectors of A A^T).

3. The columns of V are called the right singular vectors of A (and are the orthonormal eigenvectors of A^T A).

Remark 5.3. The analogous complex case in which A ∈ C_r^{m x n} is quite straightforward. The decomposition is A = U Σ V^H, where U and V are unitary and the proof is essentially identical, except for Hermitian transposes replacing transposes.

Remark 5.4. Note that U and V can be interpreted as changes of basis in both the domain and co-domain spaces with respect to which A then has a diagonal matrix representation. Specifically, let C denote A thought of as a linear transformation mapping R^n to R^m. Then rewriting A = U Σ V^T as A V = U Σ we see that Mat C is Σ with respect to the bases {v1, ..., vn} for R^n and {u1, ..., um} for R^m (see the discussion in Section 3.2). See also Remark 5.16.

Remark 5.5. The singular value decomposition is not unique. For example, an examination of the proof of Theorem 5.1 reveals that

• any orthonormal basis for N(A) can be used for V2.

• there may be nonuniqueness associated with the columns of V1 (and hence U1) corresponding to multiple σi's.

• any U2 can be used so long as [U1  U2] is orthogonal.

• columns of U and V can be changed (in tandem) by sign (or multiplier of the form e^{jθ} in the complex case).

What is unique, however, is the matrix Σ and the span of the columns of U1, U2, V1, and V2 (see Theorem 5.11). Note, too, that a "full SVD" (5.2) can always be constructed from a "compact SVD" (5.3).

Remark 5.6. Computing an SVD by working directly with the eigenproblem for A^T A or A A^T is numerically poor in finite-precision arithmetic. Better algorithms exist that work directly on A via a sequence of orthogonal transformations; see, e.g., [7], [11], [25].

Example 5.7.

    A = [1  0; 0  1] = U I U^T,

where U is an arbitrary 2 x 2 orthogonal matrix, is an SVD.

Example 5.8.

    A = [1  0; 0  -1] = [cos θ  sin θ; sin θ  -cos θ] [1  0; 0  1] [cos θ  -sin θ; sin θ  cos θ],

where θ is arbitrary, is an SVD.

Example 5.9.

    A = [1  1; 2  2; 2  2]
      = [1/3  -2√5/5  -2√5/15; 2/3  √5/5  -4√5/15; 2/3  0  √5/3] [3√2  0; 0  0; 0  0] [√2/2  √2/2; √2/2  -√2/2]
      = [1/3; 2/3; 2/3] (3√2) [√2/2  √2/2]

is an SVD.
Example 5.10. Let A ∈ R^{n x n} be symmetric and positive definite. Let V be an orthogonal matrix of eigenvectors that diagonalizes A, i.e., V^T A V = Λ > 0. Then A = V Λ V^T is an SVD of A.

A factorization U Σ V^T of an m x n matrix A qualifies as an SVD if U and V are orthogonal and Σ is an m x n "diagonal" matrix whose diagonal elements in the upper left corner are positive (and ordered). For example, if A = U Σ V^T is an SVD of A, then V Σ^T U^T is an SVD of A^T.

5.2    Some Basic Properties

Theorem 5.11. Let A ∈ R^{m x n} have a singular value decomposition A = U Σ V^T. Using the notation of Theorem 5.1, the following properties hold:

1. rank(A) = r = the number of nonzero singular values of A.

2. Let U = [u1, ..., um] and V = [v1, ..., vn]. Then A has the dyadic (or outer product) expansion

       A = Σ_{i=1}^{r} σi ui vi^T.                                    (5.5)

3. The singular vectors satisfy the relations

       A vi = σi ui,                                                   (5.6)
       A^T ui = σi vi                                                  (5.7)

   for i ∈ {1, ..., r}.

4. Let U1 = [u1, ..., ur], U2 = [u_{r+1}, ..., um], V1 = [v1, ..., vr], and V2 = [v_{r+1}, ..., vn]. Then

   (a) R(U1) = R(A) = N(A^T)^⊥.

   (b) R(U2) = R(A)^⊥ = N(A^T).

   (c) R(V1) = N(A)^⊥ = R(A^T).

   (d) R(V2) = N(A) = R(A^T)^⊥.

Remark 5.12. Part 4 of the above theorem provides a numerically superior method for finding (orthonormal) bases for the four fundamental subspaces compared to methods based on, for example, reduction to row or column echelon form. Note that each subspace requires knowledge of the rank r. The relationship to the four fundamental subspaces is summarized nicely in Figure 5.1.

Remark 5.13. The elegance of the dyadic decomposition (5.5) as a sum of outer products and the key vector relations (5.6) and (5.7) explain why it is conventional to write the SVD as A = UΣVᵀ rather than, say, A = UΣV.
Theorem 5.14. Let A ∈ ℝ^{m×n} have a singular value decomposition A = UΣVᵀ as in Theorem 5.1. Then
$$A^+ = V \Sigma^+ U^T, \qquad (5.8)$$
where
$$\Sigma^+ = \begin{bmatrix} S^{-1} & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{n \times m},$$

Figure 5.1. SVD and the four fundamental subspaces.

with the 0-subblocks appropriately sized. Furthermore, if we let the columns of U and V be as defined in Theorem 5.11, then
$$A^+ = V_1 S^{-1} U_1^T \qquad (5.9)$$
$$= \sum_{i=1}^{r} \frac{1}{\sigma_i}\, v_i u_i^T. \qquad (5.10)$$

Proof: The proof follows easily by verifying the four Penrose conditions. □
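The following sketch (not from the text; the random rank-deficient test matrix is an assumption) forms A⁺ from a compact SVD as in Theorem 5.14 and checks the four Penrose conditions numerically.

```python
# Illustrative sketch: A^+ = V1 S^{-1} U1^T and the four Penrose conditions.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 3))   # 5 x 3, rank 2

U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))
U1, V1, S = U[:, :r], Vt[:r, :].T, np.diag(s[:r])

A_pinv = V1 @ np.linalg.inv(S) @ U1.T

print(np.allclose(A @ A_pinv @ A, A))            # A A^+ A = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))  # A^+ A A^+ = A^+
print(np.allclose((A @ A_pinv).T, A @ A_pinv))   # (A A^+)^T = A A^+
print(np.allclose((A_pinv @ A).T, A_pinv @ A))   # (A^+ A)^T = A^+ A
print(np.allclose(A_pinv, np.linalg.pinv(A)))    # agrees with numpy's pinv
```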
+
Remark 5.15. Note that none of the expressions above quite qualifies as an SVD of A⁺ if we insist that the singular values be ordered from largest to smallest. However, a simple reordering accomplishes the task:
$$A^+ = \sum_{i=1}^{r} \frac{1}{\sigma_{r+1-i}}\, v_{r+1-i}\, u_{r+1-i}^T. \qquad (5.11)$$
This can also be written in matrix terms by using the so-called reverse-order identity matrix (or exchange matrix) P = [e_r, e_{r−1}, ..., e₂, e₁], which is clearly orthogonal and symmetric.

Then
$$A^+ = (V_1 P)(P S^{-1} P)(P U_1^T)$$
is the matrix version of (5.11). A "full SVD" can be similarly constructed.

Remark 5.16. Recall the linear transformation T used in the proof of Theorem 3.17 and in Definition 4.1. Since T is determined by its action on a basis, and since {v₁, ..., v_r} is a basis for N(A)⊥, then T can be defined by T vᵢ = σᵢ uᵢ, i ∈ {1, ..., r}. Similarly, since {u₁, ..., u_r} is a basis for R(A), then T⁻¹ can be defined by T⁻¹ uᵢ = (1/σᵢ) vᵢ, i ∈ {1, ..., r}. From Section 3.2, the matrix representation for T with respect to the bases {v₁, ..., v_r} and {u₁, ..., u_r} is clearly S, while the matrix representation for the inverse linear transformation T⁻¹ with respect to the same bases is S⁻¹.

5.3 Row and Column Compressions

Row compression

Let A ∈ ℝ^{m×n} have an SVD given by (5.1). Then
$$U^T A = \Sigma V^T = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^T \\ V_2^T \end{bmatrix} = \begin{bmatrix} S V_1^T \\ 0 \end{bmatrix} \in \mathbb{R}^{m \times n}.$$

Notice that N(A) = N(UᵀA) = N(SV₁ᵀ) and the matrix SV₁ᵀ ∈ ℝ^{r×n} has full row rank. In other words, premultiplication of A by Uᵀ is an orthogonal transformation that "compresses" A by row transformations. Such a row compression can also be accomplished by orthogonal row transformations performed directly on A to reduce it to the form $\begin{bmatrix} R \\ 0 \end{bmatrix}$, where R is upper triangular. Both compressions are analogous to the so-called row-reduced echelon form which, when derived by a Gaussian elimination algorithm implemented in finite-precision arithmetic, is not generally as reliable a procedure.

Column compression

Again, let A ∈ ℝ^{m×n} have an SVD given by (5.1). Then
$$A V = U \Sigma = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} U_1 S & 0 \end{bmatrix} \in \mathbb{R}^{m \times n}.$$

This time, notice that R(A) = R(AV) = R(U₁S) and the matrix U₁S ∈ ℝ^{m×r} has full column rank. In other words, postmultiplication of A by V is an orthogonal transformation that "compresses" A by column transformations. Such a compression is analogous to the

so-called column-reduced echelon form, which is not generally a reliable procedure when performed by Gauss transformations in finite-precision arithmetic. For details, see, for example, [7], [11], [23], [25].

EXERCISES

1. Let X ∈ ℝ^{m×n}. If XᵀX = 0, show that X = 0.

2. Prove Theorem 5.1 starting from the observation that AAᵀ ≥ 0.

3. Let A ∈ ℝ^{n×n} be symmetric but indefinite. Determine an SVD of A.

4. Let x ∈ ℝ^m, y ∈ ℝ^n be nonzero vectors. Determine an SVD of the matrix A ∈ ℝ^{m×n} defined by A = xyᵀ.

5. Determine SVDs of the matrices

(a) $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$

(b)
[ ~l
6. Let A ∈ ℝ^{m×n} and suppose W ∈ ℝ^{m×m} and Y ∈ ℝ^{n×n} are orthogonal.

(a) Show that A and WAY have the same singular values (and hence the same rank).

(b) Suppose that W and Y are nonsingular but not necessarily orthogonal. Do A and WAY have the same singular values? Do they have the same rank?
7. Let A ∈ ℝ^{n×n} be nonsingular. Use the SVD to determine a polar factorization of A, i.e., A = QP where Q is orthogonal and P = Pᵀ > 0. Note: this is analogous to the polar form z = re^{iθ} of a complex scalar z (where i = j = √−1).
Chapter 6

Linear Equations

In this chapter we examine existence and uniqueness of solutions of systems of linear equations. General linear systems of the form
$$AX = B; \quad A \in \mathbb{R}^{m \times n}, \; B \in \mathbb{R}^{m \times k}, \qquad (6.1)$$
are studied and include, as a special case, the familiar vector system
$$Ax = b; \quad A \in \mathbb{R}^{n \times n}, \; b \in \mathbb{R}^{n}. \qquad (6.2)$$

6.1 Vector Linear Equations

We begin with a review of some of the principal results associated with vector linear systems.

Theorem 6.1. Consider the system of linear equations
$$Ax = b; \quad A \in \mathbb{R}^{m \times n}, \; b \in \mathbb{R}^{m}. \qquad (6.3)$$

1. There exists a solution to (6.3) if and only if b ∈ R(A).

2. There exists a solution to (6.3) for all b ∈ ℝ^m if and only if R(A) = ℝ^m, i.e., A is onto; equivalently, there exists a solution if and only if rank([A, b]) = rank(A), and this is possible only if m ≤ n (since m = dim R(A) = rank(A) ≤ min{m, n}).

3. A solution to (6.3) is unique if and only if N(A) = 0, i.e., A is 1-1.

4. There exists a unique solution to (6.3) for all b ∈ ℝ^m if and only if A is nonsingular; equivalently, A ∈ ℝ^{m×m} and A has neither a 0 singular value nor a 0 eigenvalue.

5. There exists at most one solution to (6.3) for all b ∈ ℝ^m if and only if the columns of A are linearly independent, i.e., N(A) = 0, and this is possible only if m ≥ n.

6. There exists a nontrivial solution to the homogeneous system Ax = 0 if and only if rank(A) < n.

Proof: The proofs are straightforward and can be consulted in standard texts on linear algebra. Note that some parts of the theorem follow directly from others. For example, to prove part 6, note that x = 0 is always a solution to the homogeneous system. Therefore, we must have the case of a nonunique solution, i.e., A is not 1-1, which implies rank(A) < n by part 3. □

6.2 Matrix Linear Equations

In this section we present some of the principal results concerning existence and uniqueness of solutions to the general matrix linear system (6.1). Note that the results of Theorem 6.1 follow from those below for the special case k = 1, while results for (6.2) follow by specializing even further to the case m = n.

Theorem 6.2 (Existence). The matrix linear equation
$$AX = B; \quad A \in \mathbb{R}^{m \times n}, \; B \in \mathbb{R}^{m \times k}, \qquad (6.4)$$
has a solution if and only if R(B) ⊆ R(A); equivalently, a solution exists if and only if AA⁺B = B.

Proof: The subspace inclusion criterion follows essentially from the definition of the range of a matrix. The matrix criterion is Theorem 4.18. □
mxn mxk +
Theorem 6.3. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k} and suppose that AA⁺B = B. Then any matrix of the form
$$X = A^+ B + (I - A^+ A)\, Y, \quad \text{where } Y \in \mathbb{R}^{n \times k} \text{ is arbitrary,} \qquad (6.5)$$
is a solution of
$$AX = B. \qquad (6.6)$$
Furthermore, all solutions of (6.6) are of this form.

Proof: To verify that (6.5) is a solution, premultiply by A:
$$AX = AA^+ B + A(I - A^+ A)\, Y$$
$$= B + (A - AA^+ A)\, Y \quad \text{by hypothesis}$$
$$= B \quad \text{since } AA^+ A = A \text{ by the first Penrose condition.}$$
That all solutions are of this form can be seen as follows. Let Z be an arbitrary solution of (6.6), i.e., AZ = B. Then we can write
$$Z = A^+ A Z + (I - A^+ A) Z$$
$$= A^+ B + (I - A^+ A) Z$$
and this is clearly of the form (6.5). □

Remark 6.4. When A is square and nonsingular, A⁺ = A⁻¹ and so (I − A⁺A) = 0. Thus, there is no "arbitrary" component, leaving only the unique solution X = A⁻¹B.

Remark 6.5. It can be shown that the particular solution X = A⁺B is the solution of (6.6) that minimizes Tr XᵀX. (Tr(·) denotes the trace of a matrix; recall that Tr XᵀX = Σᵢ,ⱼ x²ᵢⱼ.)

Theorem 6.6 (Uniqueness). A solution of the matrix linear equation
$$AX = B; \quad A \in \mathbb{R}^{m \times n}, \; B \in \mathbb{R}^{m \times k}, \qquad (6.7)$$
is unique if and only if A⁺A = I; equivalently, (6.7) has a unique solution if and only if N(A) = 0.

Proof: The first equivalence is immediate from Theorem 6.3. The second follows by noting that A⁺A = I can occur only if r = n, where r = rank(A) (recall r ≤ n). But rank(A) = n if and only if A is 1-1 or N(A) = 0. □
Example 6.7. Suppose A ∈ ℝ^{n×n}. Find all solutions of the homogeneous system Ax = 0.
Solution:
$$x = A^+ 0 + (I - A^+ A)\, y$$
$$= (I - A^+ A)\, y,$$
where y ∈ ℝ^n is arbitrary. Hence, there exists a nonzero solution if and only if A⁺A ≠ I. This is equivalent to either rank(A) = r < n or A being singular. Clearly, if there exists a nonzero solution, it is not unique.
Computation: Since y is arbitrary, it is easy to see that all solutions are generated from a basis for R(I − A⁺A). But if A has an SVD given by A = UΣVᵀ, then it is easily checked that I − A⁺A = V₂V₂ᵀ and R(V₂V₂ᵀ) = R(V₂) = N(A).
Example 6.8. Characterize all right inverses of a matrix A ∈ ℝ^{m×n}; equivalently, find all solutions R of the equation AR = I_m. Here, we write I_m to emphasize the m × m identity matrix.
Solution: There exists a right inverse if and only if R(I_m) ⊆ R(A) and this is equivalent to AA⁺I_m = I_m. Clearly, this can occur if and only if rank(A) = r = m (since r ≤ m) and this is equivalent to A being onto (A⁺ is then a right inverse). All right inverses of A are then of the form
$$R = A^+ I_m + (I_n - A^+ A)\, Y$$
$$= A^+ + (I - A^+ A)\, Y,$$
where Y ∈ ℝ^{n×m} is arbitrary. There is a unique right inverse if and only if A⁺A = I (N(A) = 0), in which case A must be invertible and R = A⁻¹.
Example 6.9. Consider the system of linear first-order difference equations
$$x_{k+1} = A x_k + B u_k \qquad (6.8)$$

with A ∈ ℝ^{n×n} and B ∈ ℝ^{n×m} (n ≥ 1, m ≥ 1). The vector x_k in linear system theory is known as the state vector at time k while u_k is the input (control) vector. The general solution of (6.8) is given by
$$x_k = A^k x_0 + \sum_{j=0}^{k-1} A^{k-1-j} B u_j \qquad (6.9)$$
$$= A^k x_0 + [B, AB, \ldots, A^{k-1} B] \begin{bmatrix} u_{k-1} \\ u_{k-2} \\ \vdots \\ u_0 \end{bmatrix} \qquad (6.10)$$
for k ≥ 1. We might now ask the question: Given x₀ = 0, does there exist an input sequence {u_j}_{j=0}^{k-1} such that x_k takes an arbitrary value in ℝ^n? In linear system theory, this is a question of reachability. Since m ≥ 1, from the fundamental Existence Theorem, Theorem 6.2, we see that (6.8) is reachable if and only if
$$\mathcal{R}([B, AB, \ldots, A^{n-1} B]) = \mathbb{R}^n$$
or, equivalently, if and only if
$$\operatorname{rank}\,[B, AB, \ldots, A^{n-1} B] = n.$$


A related question is the following: Given an arbitrary initial vector x₀, does there exist an input sequence {u_j}_{j=0}^{n-1} such that x_n = 0? In linear system theory, this is called controllability. Again from Theorem 6.2, we see that (6.8) is controllable if and only if
$$\mathcal{R}(A^n) \subseteq \mathcal{R}([B, AB, \ldots, A^{n-1} B]).$$

Clearly, reachability always implies controllability and, if A is nonsingular, controllability and reachability are equivalent. The matrices
$$A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
provide an example of a system that is controllable but not reachable.
The above are standard conditions with analogues for continuous-time models (i.e., linear differential equations). There are many other algebraically equivalent conditions.

Example 6.10. We now introduce an output vector y_k to the system (6.8) of Example 6.9 by appending the equation
$$y_k = C x_k + D u_k \qquad (6.11)$$
with C ∈ ℝ^{p×n} and D ∈ ℝ^{p×m} (p ≥ 1). We can then pose some new questions about the overall system that are dual in the system-theoretic sense to reachability and controllability.
The answers are cast in terms that are dual in the linear algebra sense as well. The condition dual to reachability is called observability: When does knowledge of {u_j}_{j=0}^{n-1} and {y_j}_{j=0}^{n-1} suffice to determine (uniquely) x₀? As a dual to controllability, we have the notion of reconstructibility: When does knowledge of {u_j}_{j=0}^{n-1} and {y_j}_{j=0}^{n-1} suffice to determine (uniquely) x_n? The fundamental duality result from linear system theory is the following:

(A, B) is reachable [controllable] if and only if (Aᵀ, Bᵀ) is observable [reconstructible].

To derive a condition for observability, notice that
$$y_k = C A^k x_0 + \sum_{j=0}^{k-1} C A^{k-1-j} B u_j + D u_k. \qquad (6.12)$$
Thus,
$$\begin{bmatrix} y_0 - D u_0 \\ y_1 - C B u_0 - D u_1 \\ \vdots \\ y_{n-1} - \sum_{j=0}^{n-2} C A^{n-2-j} B u_j - D u_{n-1} \end{bmatrix}
= \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} x_0. \qquad (6.13)$$
Let v denote the (known) vector on the left-hand side of (6.13) and let R denote the matrix on the right-hand side. Then, by definition, v ∈ R(R), so a solution exists. By the fundamental Uniqueness Theorem, Theorem 6.6, the solution is then unique if and only if N(R) = 0, or, equivalently, if and only if
$$\mathcal{N}\left( \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} \right) = 0.$$
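The observability test can be carried out by forming the matrix R of (6.13) explicitly. The following sketch (not from the text; the example A and C are assumptions chosen for illustration) checks that R has trivial null space, i.e., full column rank.

```python
# Illustrative sketch: the observability matrix [C; CA; ...; CA^{n-1}].
import numpy as np

A = np.array([[0.5, 1.0],
              [0.0, 0.9]])
C = np.array([[1.0, 0.0]])     # single output; assumed example data
n = A.shape[0]

R = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])
print(R)
print(np.linalg.matrix_rank(R) == n)   # True here: x0 is uniquely determined
```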

6.3 A More General Matrix Linear Equation

Theorem 6.11. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{m×q}, and C ∈ ℝ^{p×q}. Then the equation
$$AXC = B \qquad (6.14)$$
has a solution if and only if AA⁺BC⁺C = B, in which case the general solution is of the form
$$X = A^+ B C^+ + Y - A^+ A\, Y\, C C^+, \qquad (6.15)$$
where Y ∈ ℝ^{n×p} is arbitrary.

A compact matrix criterion for uniqueness of solutions to (6.14) requires the notion of the Kronecker product of matrices for its statement. Such a criterion (CC⁺ ⊗ A⁺A = I) is stated and proved in Theorem 13.27.
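A numerical illustration of Theorem 6.11 follows (not from the text; the consistent test data and the random Y are constructed for the purpose), using the general solution formula (6.15).

```python
# Illustrative sketch: solving AXC = B via X = A^+ B C^+ + Y - A^+ A Y C C^+.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))   # 4 x 3, rank 2
C = rng.standard_normal((2, 5))
X_true = rng.standard_normal((3, 2))
B = A @ X_true @ C                                               # guarantees solvability

Ap, Cp = np.linalg.pinv(A), np.linalg.pinv(C)
print(np.allclose(A @ Ap @ B @ Cp @ C, B))          # solvability test

Y = rng.standard_normal((3, 2))                      # arbitrary
X = Ap @ B @ Cp + Y - Ap @ A @ Y @ C @ Cp
print(np.allclose(A @ X @ C, B))
```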

6.4 Some Useful and Interesting Inverses

In many applications, the coefficient matrices of interest are square and nonsingular. Listed below is a small collection of useful matrix identities, particularly for block matrices, associated with matrix inverses. In these identities, A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, C ∈ ℝ^{m×n}, and D ∈ ℝ^{m×m}. Invertibility is assumed for any component or subblock whose inverse is indicated. Verification of each identity is recommended as an exercise for the reader.
reader.

1. (A + BDC)⁻¹ = A⁻¹ − A⁻¹B(D⁻¹ + CA⁻¹B)⁻¹CA⁻¹.
This result is known as the Sherman–Morrison–Woodbury formula. It has many applications (and is frequently "rediscovered") including, for example, formulas for the inverse of a sum of matrices such as (A + D)⁻¹ or (A⁻¹ + D⁻¹)⁻¹. It also yields very efficient "updating" or "downdating" formulas in expressions such as (A + xxᵀ)⁻¹ (with symmetric A ∈ ℝ^{n×n} and x ∈ ℝ^n) that arise in optimization theory. (A numerical check of this formula appears after this list.)

2. $\begin{bmatrix} I & B \\ 0 & I \end{bmatrix}^{-1} = \begin{bmatrix} I & -B \\ 0 & I \end{bmatrix}$.

3. $\begin{bmatrix} I & B \\ 0 & -I \end{bmatrix}^{-1} = \begin{bmatrix} I & B \\ 0 & -I \end{bmatrix}$, $\quad \begin{bmatrix} I & 0 \\ C & -I \end{bmatrix}^{-1} = \begin{bmatrix} I & 0 \\ C & -I \end{bmatrix}$.
Both of these matrices satisfy the matrix equation X² = I from which it is obvious that X⁻¹ = X. Note that the positions of the I and −I blocks may be exchanged.

4. $\begin{bmatrix} A & B \\ 0 & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & -A^{-1}BD^{-1} \\ 0 & D^{-1} \end{bmatrix}$.

5. $\begin{bmatrix} A & 0 \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & 0 \\ -D^{-1}CA^{-1} & D^{-1} \end{bmatrix}$.

6. $\begin{bmatrix} I & B \\ C & I + CB \end{bmatrix}^{-1} = \begin{bmatrix} I + BC & -B \\ -C & I \end{bmatrix}$.

7. $\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1}BECA^{-1} & -A^{-1}BE \\ -ECA^{-1} & E \end{bmatrix}$,
where E = (D − CA⁻¹B)⁻¹ (E is the inverse of the Schur complement of A). This result follows easily from the block LU factorization in property 16 of Section 1.4.

8. $\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} F & -FBD^{-1} \\ -D^{-1}CF & D^{-1} + D^{-1}CFBD^{-1} \end{bmatrix}$,
where F = (A − BD⁻¹C)⁻¹. This result follows easily from the block UL factorization in property 17 of Section 1.4.

EXERCISES

1. As in Example 6.8, characterize all left inverses of a matrix A ∈ ℝ^{m×n}.

2. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{m×k} and suppose A has an SVD as in Theorem 5.1. Assuming R(B) ⊆ R(A), characterize all solutions of the matrix linear equation
$$AX = B$$
in terms of the SVD of A.

3. Let x, y ∈ ℝ^n and suppose further that xᵀy ≠ 1. Show that
$$(I - x y^T)^{-1} = I - \frac{1}{x^T y - 1}\, x y^T.$$
4. Let x, y ∈ ℝ^n and suppose further that xᵀy ≠ 1. Show that
$$\begin{bmatrix} I & x \\ y^T & 1 \end{bmatrix}^{-1} = \begin{bmatrix} I + c\, x y^T & -c\, x \\ -c\, y^T & c \end{bmatrix},$$
where c = 1/(1 − xᵀy).
5. Let A ∈ ℝ^{n×n} be nonsingular and let A⁻¹ have columns c₁, ..., c_n and individual elements γᵢⱼ. Assume that γⱼᵢ ≠ 0 for some i and j. Show that the matrix B = A − (1/γⱼᵢ) eᵢeⱼᵀ (i.e., A with 1/γⱼᵢ subtracted from its (ij)th element) is singular.
Hint: Show that cᵢ ∈ N(B).

6. As in Example 6.10, check directly that the condition for reconstructibility takes the form
$$\mathcal{N}\left( \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} \right) \subseteq \mathcal{N}(A^n).$$
Chapter 7

Projections, Inner Product Spaces, and Norms

7.1 Projections

Definition 7.1. Let V be a vector space with V = X ⊕ Y. By Theorem 2.26, every v ∈ V has a unique decomposition v = x + y with x ∈ X and y ∈ Y. Define P_{X,Y} : V → X ⊆ V by
$$P_{X,Y}\, v = x \quad \text{for all } v \in V.$$
P_{X,Y} is called the (oblique) projection on X along Y.

Figure 7.1 displays the projection of v on both X and Y in the case V = ℝ².

Figure 7.1. Oblique projections.

Theorem 7.2. P_{X,Y} is linear and P²_{X,Y} = P_{X,Y}.

Theorem 7.3. A linear transformation P is a projection if and only if it is idempotent, i.e., P² = P. Also, P is a projection if and only if I − P is a projection. In fact, P_{Y,X} = I − P_{X,Y}.

Proof: Suppose P is a projection, say on X along Y (using the notation of Definition 7.1).


Let v ∈ V be arbitrary. Then Pv = P(x + y) = Px = x. Moreover, P²v = PPv = Px = x = Pv. Thus, P² = P. Conversely, suppose P² = P. Let X = {v ∈ V : Pv = v} and Y = {v ∈ V : Pv = 0}. It is easy to check that X and Y are subspaces. We now prove that V = X ⊕ Y. First note that if v ∈ X, then Pv = v. If v ∈ Y, then Pv = 0. Hence if v ∈ X ∩ Y, then v = 0. Now let v ∈ V be arbitrary. Then v = Pv + (I − P)v. Let x = Pv, y = (I − P)v. Then Px = P²v = Pv = x so x ∈ X, while Py = P(I − P)v = Pv − P²v = 0 so y ∈ Y. Thus, V = X ⊕ Y and the projection on X along Y is P. Essentially the same argument shows that I − P is the projection on Y along X. □
Definition 7.4. In the special case where Y = X⊥, P_{X,X⊥} is called an orthogonal projection and we then use the notation P_X = P_{X,X⊥}.

Theorem 7.5. P ∈ ℝ^{n×n} is the matrix of an orthogonal projection (onto R(P)) if and only if P² = P = Pᵀ.
L
Proof: Let P be an orthogonal projection (on X, say, along X⊥) and let x, y ∈ ℝ^n be arbitrary. Note that (I − P)x = (I − P_{X,X⊥})x = P_{X⊥,X}x by Theorem 7.3. Thus, (I − P)x ∈ X⊥. Since Py ∈ X, we have (Py)ᵀ(I − P)x = yᵀPᵀ(I − P)x = 0. Since x and y were arbitrary, we must have Pᵀ(I − P) = 0. Hence Pᵀ = PᵀP = P, with the second equality following since PᵀP is symmetric. Conversely, suppose P is a symmetric projection matrix and let x be arbitrary. Write x = Px + (I − P)x. Then xᵀPᵀ(I − P)x = xᵀP(I − P)x = 0. Thus, since Px ∈ R(P), then (I − P)x ∈ R(P)⊥ and P must be an orthogonal projection. □

7.1.1 The four fundamental orthogonal projections

Using the notation of Theorems 5.1 and 5.11, let A ∈ ℝ^{m×n} with SVD A = UΣVᵀ = U₁SV₁ᵀ. Then
$$P_{\mathcal{R}(A)} = AA^+ = U_1 U_1^T = \sum_{i=1}^{r} u_i u_i^T,$$
$$P_{\mathcal{R}(A)^\perp} = I - AA^+ = U_2 U_2^T = \sum_{i=r+1}^{m} u_i u_i^T,$$
$$P_{\mathcal{N}(A)} = I - A^+ A = V_2 V_2^T = \sum_{i=r+1}^{n} v_i v_i^T,$$
$$P_{\mathcal{N}(A)^\perp} = A^+ A = V_1 V_1^T = \sum_{i=1}^{r} v_i v_i^T$$
are easily checked to be (unique) orthogonal projections onto the respective four fundamental subspaces.
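The following sketch (not from the text; the random rank-deficient test matrix is an assumption) builds the four projections above from a computed SVD and confirms the characterization P = Pᵀ = P² of Theorem 7.5.

```python
# Illustrative sketch: the four fundamental orthogonal projections via the SVD.
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))   # 4 x 3, rank 2
r = 2

U, s, Vt = np.linalg.svd(A)
U1, U2 = U[:, :r], U[:, r:]
V1, V2 = Vt[:r, :].T, Vt[r:, :].T

P_RA  = U1 @ U1.T        # projection onto R(A)      (= A A^+)
P_RAp = U2 @ U2.T        # projection onto R(A)-perp (= I - A A^+)
P_NA  = V2 @ V2.T        # projection onto N(A)      (= I - A^+ A)
P_NAp = V1 @ V1.T        # projection onto N(A)-perp (= A^+ A)

Ap = np.linalg.pinv(A)
print(np.allclose(P_RA, A @ Ap), np.allclose(P_NAp, Ap @ A))
print(np.allclose(P_RA, P_RA.T), np.allclose(P_RA @ P_RA, P_RA))   # P = P^T = P^2
```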

Example 7.6. Determine the orthogonal projection of a vector v ∈ ℝ^n on another nonzero vector w ∈ ℝ^n.
Solution: Think of the vector w as an element of the one-dimensional subspace R(w). Then the desired projection is simply
$$P_{\mathcal{R}(w)}\, v = w w^+ v = \frac{w w^T}{w^T w}\, v \quad \text{(using Example 4.8)} = \left( \frac{w^T v}{w^T w} \right) w.$$
Moreover, the vector z that is orthogonal to w and such that v = Pv + z is given by z = P_{R(w)⊥} v = (I − P_{R(w)})v = v − (wᵀv/wᵀw)w. See Figure 7.2. A direct calculation shows that z and w are, in fact, orthogonal:
$$w^T z = w^T v - \left( \frac{w^T v}{w^T w} \right) w^T w = 0.$$

Figure 7.2. Orthogonal projection on a "line."

Example 7.7. Recall the proof of Theorem 3.11. There, {v₁, ..., v_k} was an orthonormal basis for a subspace S of ℝ^n. An arbitrary vector x ∈ ℝ^n was chosen and a formula for x₁ appeared rather mysteriously. The expression for x₁ is simply the orthogonal projection of x on S. Specifically,
$$x_1 = P_{\mathcal{S}}\, x = \sum_{i=1}^{k} (v_i^T x)\, v_i.$$

Example 7.8. Recall the diagram of the four fundamental subspaces. The indicated direct sum decompositions of the domain ℝ^n and co-domain ℝ^m are given easily as follows. Let x ∈ ℝ^n be an arbitrary vector. Then
$$x = P_{\mathcal{N}(A)^\perp}\, x + P_{\mathcal{N}(A)}\, x$$
$$= A^+ A x + (I - A^+ A) x$$
$$= V_1 V_1^T x + V_2 V_2^T x \quad (\text{recall } V V^T = I).$$

Similarly, let y ∈ ℝ^m be an arbitrary vector. Then
$$y = P_{\mathcal{R}(A)}\, y + P_{\mathcal{R}(A)^\perp}\, y$$
$$= AA^+ y + (I - AA^+) y$$
$$= U_1 U_1^T y + U_2 U_2^T y \quad (\text{recall } U U^T = I).$$

Example 7.9. Let
$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}.$$
Then
$$A^+ = \begin{bmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \\ 0 & 0 \end{bmatrix}$$
and we can decompose the vector [2 3 4]ᵀ uniquely into the sum of a vector in N(A)⊥ and a vector in N(A), respectively, as follows:
$$\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = A^+ A x + (I - A^+ A) x$$
$$= \begin{bmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}
+ \begin{bmatrix} 1/2 & -1/2 & 0 \\ -1/2 & 1/2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$$
$$= \begin{bmatrix} 5/2 \\ 5/2 \\ 0 \end{bmatrix} + \begin{bmatrix} -1/2 \\ 1/2 \\ 4 \end{bmatrix}.$$
7.2 Inner Product Spaces

Definition 7.10. Let V be a vector space over ℝ. Then ⟨·, ·⟩ : V × V → ℝ is a real inner product if

1. ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 if and only if x = 0.

2. ⟨x, y⟩ = ⟨y, x⟩ for all x, y ∈ V.

3. ⟨x, αy₁ + βy₂⟩ = α⟨x, y₁⟩ + β⟨x, y₂⟩ for all x, y₁, y₂ ∈ V and for all α, β ∈ ℝ.

Example 7.11. Let V = ℝ^n. Then ⟨x, y⟩ = xᵀy is the "usual" Euclidean inner product or dot product.

Example 7.12. Let V = ℝ^n. Then ⟨x, y⟩_Q = xᵀQy, where Q = Qᵀ > 0 is an arbitrary n × n positive definite matrix, defines a "weighted" inner product.

Definition 7.13. If A ∈ ℝ^{m×n}, then Aᵀ ∈ ℝ^{n×m} is the unique linear transformation or map such that ⟨x, Ay⟩ = ⟨Aᵀx, y⟩ for all x ∈ ℝ^m and for all y ∈ ℝ^n.
55

It is easy to check that, with this more "abstract" definition of transpose, and if the (i, j)th element of A is aᵢⱼ, then the (i, j)th element of Aᵀ is aⱼᵢ. It can also be checked that all the usual properties of the transpose hold, such as (AB)ᵀ = BᵀAᵀ. However, the definition above allows us to extend the concept of transpose to the case of weighted inner products in the following way. Suppose A ∈ ℝ^{m×n} and let ⟨·, ·⟩_Q and ⟨·, ·⟩_R, with Q and R positive definite, be weighted inner products on ℝ^m and ℝ^n, respectively. Then we can define the "weighted transpose" A^# as the unique map that satisfies
$$\langle x, Ay \rangle_Q = \langle A^\# x, y \rangle_R \quad \text{for all } x \in \mathbb{R}^m \text{ and for all } y \in \mathbb{R}^n.$$
By Example 7.12 above, we must then have xᵀQAy = xᵀ(A^#)ᵀRy for all x, y. Hence we must have QA = (A^#)ᵀR. Taking transposes (of the usual variety) gives AᵀQ = RA^#. Since R is nonsingular, we find
$$A^\# = R^{-1} A^T Q.$$

We can also generalize the notion of orthogonality (xᵀy = 0) to Q-orthogonality (Q is a positive definite matrix). Two vectors x, y ∈ ℝ^n are Q-orthogonal (or conjugate with respect to Q) if ⟨x, y⟩_Q = xᵀQy = 0. Q-orthogonality is an important tool used in studying conjugate direction methods in optimization theory.

Definition 7.14. Let V be a vector space over ℂ. Then ⟨·, ·⟩ : V × V → ℂ is a complex inner product if

1. ⟨x, x⟩ ≥ 0 for all x ∈ V and ⟨x, x⟩ = 0 if and only if x = 0.

2. ⟨x, y⟩ = $\overline{\langle y, x \rangle}$ for all x, y ∈ V.

3. ⟨x, αy₁ + βy₂⟩ = α⟨x, y₁⟩ + β⟨x, y₂⟩ for all x, y₁, y₂ ∈ V and for all α, β ∈ ℂ.
Remark 7.15. We could use the notation ⟨·, ·⟩_ℂ to denote a complex inner product, but if the vectors involved are complex-valued, the complex inner product is to be understood. Note, too, from part 2 of the definition, that ⟨x, x⟩ must be real for all x.

Remark 7.16. Note from parts 2 and 3 of Definition 7.14 that we have
$$\langle \alpha x_1 + \beta x_2, y \rangle = \bar{\alpha} \langle x_1, y \rangle + \bar{\beta} \langle x_2, y \rangle.$$

Remark 7.17. The Euclidean inner product of x, y ∈ ℂ^n is given by
$$\langle x, y \rangle = \sum_{i=1}^{n} \bar{x}_i y_i = x^H y.$$
The conventional definition of the complex Euclidean inner product is ⟨x, y⟩ = yᴴx but we use its complex conjugate xᴴy here for symmetry with the real case.

Remark 7.18. A weighted inner product can be defined as in the real case by ⟨x, y⟩_Q = xᴴQy, for arbitrary Q = Qᴴ > 0. The notion of Q-orthogonality can be similarly generalized to the complex case.

Definition 7.19. A vector space (V, F) endowed with a specific inner product is called an inner product space. If F = ℂ, we call V a complex inner product space. If F = ℝ, we call V a real inner product space.

Example 7.20.

1. Check that V = ℝ^{n×n} with the inner product ⟨A, B⟩ = Tr AᵀB is a real inner product space. Note that other choices are possible since by properties of the trace function, Tr AᵀB = Tr BᵀA = Tr ABᵀ = Tr BAᵀ.

2. Check that V = ℂ^{n×n} with the inner product ⟨A, B⟩ = Tr AᴴB is a complex inner product space. Again, other choices are possible.

Definition 7.21. Let V be an inner product space. For v ∈ V, we define the norm (or length) of v by ‖v‖ = √⟨v, v⟩. This is called the norm induced by ⟨·, ·⟩.

Example 7.22.

1. If V = ℝ^n with the usual inner product, the induced norm is given by ‖v‖ = (Σᵢ₌₁ⁿ vᵢ²)^{1/2}.

2. If V = ℂ^n with the usual inner product, the induced norm is given by ‖v‖ = (Σᵢ₌₁ⁿ |vᵢ|²)^{1/2}.

Theorem 7.23. Let P be an orthogonal projection on an inner product space V. Then ‖Pv‖ ≤ ‖v‖ for all v ∈ V.

Proof: Since P is an orthogonal projection, P² = P = P^#. (Here, the notation P^# denotes the unique linear transformation that satisfies ⟨Pu, v⟩ = ⟨u, P^#v⟩ for all u, v ∈ V. If this seems a little too abstract, consider V = ℝ^n (or ℂ^n), where P^# is simply the usual Pᵀ (or Pᴴ).) Hence ⟨Pv, v⟩ = ⟨P²v, v⟩ = ⟨Pv, P^#v⟩ = ⟨Pv, Pv⟩ = ‖Pv‖² ≥ 0. Now I − P is also a projection, so the above result applies and we get
$$0 \le \langle (I - P)v, v \rangle = \langle v, v \rangle - \langle Pv, v \rangle = \|v\|^2 - \|Pv\|^2,$$
from which the theorem follows. □

Definition 7.24. The norm induced on an inner product space by the "usual" inner product is called the natural norm.

In case V = ℂ^n or V = ℝ^n, the natural norm is also called the Euclidean norm. In the next section, other norms on these vector spaces are defined. A converse to the above procedure is also available. That is, given a norm defined by ‖x‖ = √⟨x, x⟩, an inner product can be defined via the following.

Theorem 7.25 (Polarization Identity).

1. For x, y ∈ ℝ^n, an inner product is defined by
$$\langle x, y \rangle = x^T y = \frac{\|x + y\|^2 - \|x - y\|^2}{4} = \frac{\|x + y\|^2 - \|x\|^2 - \|y\|^2}{2}.$$

2. For x, y ∈ ℂ^n, an inner product is defined by
$$\langle x, y \rangle = x^H y = \frac{\|x + y\|^2 - \|x - y\|^2}{4} + j\, \frac{\|x - jy\|^2 - \|x + jy\|^2}{4},$$
where j = i = √−1.

7.3 Vector Norms

Definition 7.26. Let (V, F) be a vector space. Then ‖·‖ : V → ℝ is a vector norm if it satisfies the following three properties:

1. ‖x‖ ≥ 0 for all x ∈ V and ‖x‖ = 0 if and only if x = 0.

2. ‖αx‖ = |α| ‖x‖ for all x ∈ V and for all α ∈ F.

3. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ V.
(This is called the triangle inequality, as seen readily from the usual diagram illustrating the sum of two vectors in ℝ².)

Remark 7.27. It is convenient in the remainder of this section to state results for complex-valued vectors. The specialization to the real case is obvious.

Definition 7.28. A vector space (V, F) is said to be a normed linear space if and only if there exists a vector norm ‖·‖ : V → ℝ satisfying the three conditions of Definition 7.26.

Example 7.29.

1. For x ∈ ℂ^n, the Hölder norms, or p-norms, are defined by
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}, \quad 1 \le p < +\infty.$$
Special cases:
(a) ‖x‖₁ = Σᵢ₌₁ⁿ |xᵢ| (the "Manhattan" norm).
(b) ‖x‖₂ = (Σᵢ₌₁ⁿ |xᵢ|²)^{1/2} = (xᴴx)^{1/2} (the Euclidean norm).
(c) ‖x‖_∞ = maxᵢ |xᵢ| = lim_{p→+∞} ‖x‖_p.
(The second equality is a theorem that requires proof.)

2. Some weighted p-norms:
(a) ‖x‖_{1,D} = Σᵢ₌₁ⁿ dᵢ|xᵢ|, where dᵢ > 0.
(b) ‖x‖_{2,Q} = (xᴴQx)^{1/2}, where Q = Qᴴ > 0 (this norm is more commonly denoted ‖·‖_Q).

3. On the vector space (C[t₀, t₁], ℝ), define the vector norm
$$\|f\| = \max_{t_0 \le t \le t_1} |f(t)|.$$
On the vector space ((C[t₀, t₁])^n, ℝ), define the vector norm
$$\|f\|_\infty = \max_{t_0 \le t \le t_1} \|f(t)\|_\infty.$$

Theorem 7.30 (Hölder Inequality). Let x, y ∈ ℂ^n. Then
$$|x^H y| \le \|x\|_p\, \|y\|_q, \qquad \frac{1}{p} + \frac{1}{q} = 1.$$

A particular case of the Hölder inequality is of special interest.

Theorem 7.31 (Cauchy–Bunyakovsky–Schwarz Inequality). Let x, y ∈ ℂ^n. Then
$$|x^H y| \le \|x\|_2\, \|y\|_2,$$
with equality if and only if x and y are linearly dependent.

Proof: Consider the matrix [x y] ∈ ℂ^{n×2}. Since
$$[x \;\; y]^H [x \;\; y] = \begin{bmatrix} x^H x & x^H y \\ y^H x & y^H y \end{bmatrix}$$
is a nonnegative definite matrix, its determinant must be nonnegative. In other words, 0 ≤ (xᴴx)(yᴴy) − (xᴴy)(yᴴx). Since yᴴx = $\overline{x^H y}$, we see immediately that |xᴴy| ≤ ‖x‖₂‖y‖₂. □
D

Note: This is not the classical algebraic proof of the Cauchy–Bunyakovsky–Schwarz (C-B-S) inequality (see, e.g., [20, p. 217]). However, it is particularly easy to remember.

Remark 7.32. The angle θ between two nonzero vectors x, y ∈ ℂ^n may be defined by cos θ = |xᴴy| / (‖x‖₂‖y‖₂), 0 ≤ θ ≤ π/2. The C-B-S inequality is thus equivalent to the statement |cos θ| ≤ 1.

Remark 7.33. Theorem 7.31 and Remark 7.32 are true for general inner product spaces.
x
Remark 7.34. The norm ‖·‖₂ is unitarily invariant, i.e., if U ∈ ℂ^{n×n} is unitary, then ‖Ux‖₂ = ‖x‖₂ (Proof: ‖Ux‖₂² = xᴴUᴴUx = xᴴx = ‖x‖₂²). However, ‖·‖₁ and ‖·‖_∞

are not unitarily invariant. Similar remarks apply to the unitary invariance of norms of real vectors under orthogonal transformation.

Remark 7.35. If x, y ∈ ℂ^n are orthogonal, then we have the Pythagorean Identity
$$\|x \pm y\|_2^2 = \|x\|_2^2 + \|y\|_2^2,$$
the proof of which follows easily from ‖z‖₂² = zᴴz.

Theorem 7.36. All norms on ℂ^n are equivalent; i.e., there exist constants c₁, c₂ (possibly depending on n) such that
$$c_1 \|x\|_\alpha \le \|x\|_\beta \le c_2 \|x\|_\alpha \quad \text{for all } x \in \mathbb{C}^n.$$

Example 7.37. For x ∈ ℂ^n, the following inequalities are all tight bounds; i.e., there exist vectors x for which equality holds:
‖x‖₁ ≤ √n ‖x‖₂,  ‖x‖₁ ≤ n ‖x‖_∞;
‖x‖₂ ≤ ‖x‖₁,  ‖x‖₂ ≤ √n ‖x‖_∞;
‖x‖_∞ ≤ ‖x‖₁,  ‖x‖_∞ ≤ ‖x‖₂.
Finally, we conclude this section with a theorem about convergence of vectors. Convergence of a sequence of vectors to some limit vector can be converted into a statement about convergence of real numbers, i.e., convergence in terms of vector norms.

Theorem 7.38. Let ‖·‖ be a vector norm and suppose v, v⁽¹⁾, v⁽²⁾, ... ∈ ℂ^n. Then
$$\lim_{k \to +\infty} v^{(k)} = v \quad \text{if and only if} \quad \lim_{k \to +\infty} \|v^{(k)} - v\| = 0.$$

7.4 Matrix Norms

In this section we introduce the concept of matrix norm. As with vectors, the motivation for using matrix norms is to have a notion of either the size of or the nearness of matrices. The former notion is useful for perturbation analysis, while the latter is needed to make sense of "convergence" of matrices. Attention is confined to the vector space (ℝ^{m×n}, ℝ) since that is what arises in the majority of applications. Extension to the complex case is straightforward and essentially obvious.

Definition 7.39. ‖·‖ : ℝ^{m×n} → ℝ is a matrix norm if it satisfies the following three properties:

1. ‖A‖ ≥ 0 for all A ∈ ℝ^{m×n} and ‖A‖ = 0 if and only if A = 0.

2. ‖αA‖ = |α| ‖A‖ for all A ∈ ℝ^{m×n} and for all α ∈ ℝ.

3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ for all A, B ∈ ℝ^{m×n}.
(As with vectors, this is called the triangle inequality.)

Example 7.40. Let A ∈ ℝ^{m×n}. Then the Frobenius norm (or matrix Euclidean norm) is defined by
$$\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2 \right)^{1/2}
= \left( \sum_{i=1}^{r} \sigma_i^2(A) \right)^{1/2}
= \left( \operatorname{Tr}(A^T A) \right)^{1/2}
= \left( \operatorname{Tr}(A A^T) \right)^{1/2}$$
(where r = rank(A)).
Example 7.41. Let A ∈ ℝ^{m×n}. Then the matrix p-norms are defined by
$$\|A\|_p = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_p} = \max_{\|x\|_p = 1} \|Ax\|_p.$$
The following three special cases are important because they are "computable." Each is a theorem and requires a proof.

1. The "maximum column sum" norm is
$$\|A\|_1 = \max_{j} \left( \sum_{i=1}^{m} |a_{ij}| \right).$$

2. The "maximum row sum" norm is
$$\|A\|_\infty = \max_{i} \left( \sum_{j=1}^{n} |a_{ij}| \right).$$

3. The spectral norm is
$$\|A\|_2 = \lambda_{\max}^{1/2}(A^T A) = \lambda_{\max}^{1/2}(A A^T) = \sigma_1(A).$$
Note: ‖A⁺‖₂ = 1/σ_r(A), where r = rank(A).
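The three "computable" norms can be checked directly. The sketch below (not from the text; the small test matrix is an assumption) compares the column-sum, row-sum, and singular-value characterizations with numpy's built-in matrix norms.

```python
# Illustrative sketch: the 1-, infinity-, and 2-norms of Example 7.41.
import numpy as np

A = np.array([[1., -2., 3.],
              [4.,  0., -1.]])

one_norm = np.abs(A).sum(axis=0).max()              # maximum column sum
inf_norm = np.abs(A).sum(axis=1).max()              # maximum row sum
two_norm = np.linalg.svd(A, compute_uv=False)[0]    # largest singular value

print(np.isclose(one_norm, np.linalg.norm(A, 1)))
print(np.isclose(inf_norm, np.linalg.norm(A, np.inf)))
print(np.isclose(two_norm, np.linalg.norm(A, 2)))
```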


Example 7.42. Let A ∈ ℝ^{m×n}. The Schatten p-norms are defined by
$$\|A\|_{S,p} = \left( \sigma_1^p + \cdots + \sigma_r^p \right)^{1/p}.$$
Some special cases of Schatten p-norms are equal to norms defined previously. For example, ‖·‖_{S,2} = ‖·‖_F and ‖·‖_{S,∞} = ‖·‖₂. The norm ‖·‖_{S,1} is often called the trace norm.
Example 7.43. Let A ∈ ℝ^{m×n}. Then "mixed" norms can also be defined by
$$\|A\|_{p,q} = \max_{x \ne 0} \frac{\|Ax\|_p}{\|x\|_q}.$$

Example 7.44. The "matrix analogue of the vector 1-norm," ‖A‖_s = Σᵢ,ⱼ |aᵢⱼ|, is a norm.
The concept of a matrix norm alone is not altogether useful since it does not allow us to estimate the size of a matrix product AB in terms of the sizes of A and B individually.

Notice that this difficulty did not arise for vectors, although there are analogues for, e.g., inner products or outer products of vectors. We thus need the following definition.

Definition 7.45. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{n×k}. Then the norms ‖·‖_α, ‖·‖_β, and ‖·‖_γ are mutually consistent if ‖AB‖_α ≤ ‖A‖_β ‖B‖_γ. A matrix norm ‖·‖ is said to be consistent if ‖AB‖ ≤ ‖A‖ ‖B‖ whenever the matrix product is defined.
Example
Example 7.46.
7.46.
1. ||II·• ||/7
1. and II ||. •II ||pp for
II F and for all
all pp are
are consistent
consistent matrix
matrixnorms.
norms.
2.
2. The "mixed"
"mixed" norm
norm
IIAxll1
II· 11 100
,
= max - - = max laijl
x;60 Ilx 1100 i,j

is
is aa matrix
matrix norm
norm but
but it
it is
is not consistent. For
not consistent. For example,
example, take
take A =B
A = = [:
B = \\ J1. Then
:]. Then
||Afl|| li00 = 2while||A||
IIABIII,oo li00 ||B|| 1>00 = 1.
2 while IIAIII,ooIlBIII,oo l.

The $p$-norms are examples of matrix norms that are subordinate to (or induced by)
a vector norm, i.e.,

$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|} = \max_{\|x\| = 1} \|Ax\|$$

(or, more generally, $\|A\|_{p,q} = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_q}$). For such subordinate norms, also called
operator norms, we clearly have $\|Ax\| \leq \|A\| \, \|x\|$. Since $\|ABx\| \leq \|A\| \, \|Bx\| \leq \|A\| \, \|B\| \, \|x\|$,
it follows that all subordinate norms are consistent.

Theorem 7.47. There exists a vector $x^*$ such that $\|Ax^*\| = \|A\| \, \|x^*\|$ if the matrix norm is
subordinate to the vector norm.

Theorem 7.48. If $\| \cdot \|_m$ is a consistent matrix norm, there exists a vector norm $\| \cdot \|_v$
consistent with it, i.e., $\|Ax\|_v \leq \|A\|_m \|x\|_v$.

Not every consistent matrix norm is subordinate to a vector norm. For example,
consider $\| \cdot \|_F$. Then $\|Ax\|_2 \leq \|A\|_F \|x\|_2$, so $\| \cdot \|_2$ is consistent with $\| \cdot \|_F$, but there does
not exist a vector norm $\| \cdot \|$ such that $\|A\|_F$ is given by $\max_{x \neq 0} \frac{\|Ax\|}{\|x\|}$.

Useful Results

The following miscellaneous results about matrix norms are collected for future reference.
The interested reader is invited to prove each of them as an exercise.

1. $\|I_n\|_p = 1$ for all $p$, while $\|I_n\|_F = \sqrt{n}$.

2. For $A \in \mathbb{R}^{n \times n}$, the following inequalities are all tight, i.e., there exist matrices $A$ for
which equality holds:

$$\|A\|_1 \leq \sqrt{n} \, \|A\|_2, \qquad \|A\|_1 \leq n \, \|A\|_\infty, \qquad \|A\|_1 \leq \sqrt{n} \, \|A\|_F;$$
$$\|A\|_2 \leq \sqrt{n} \, \|A\|_1, \qquad \|A\|_2 \leq \sqrt{n} \, \|A\|_\infty, \qquad \|A\|_2 \leq \|A\|_F;$$
$$\|A\|_\infty \leq n \, \|A\|_1, \qquad \|A\|_\infty \leq \sqrt{n} \, \|A\|_2, \qquad \|A\|_\infty \leq \sqrt{n} \, \|A\|_F;$$
$$\|A\|_F \leq \sqrt{n} \, \|A\|_1, \qquad \|A\|_F \leq \sqrt{n} \, \|A\|_2, \qquad \|A\|_F \leq \sqrt{n} \, \|A\|_\infty.$$

3. For $A \in \mathbb{R}^{m \times n}$,

$$\max_{i,j} |a_{ij}| \leq \|A\|_2 \leq \sqrt{mn} \, \max_{i,j} |a_{ij}| .$$

4. The norms $\| \cdot \|_F$ and $\| \cdot \|_2$ (as well as all the Schatten $p$-norms, but not necessarily
other $p$-norms) are unitarily invariant; i.e., for all $A \in \mathbb{R}^{m \times n}$ and for all orthogonal
matrices $Q \in \mathbb{R}^{m \times m}$ and $Z \in \mathbb{R}^{n \times n}$, $\|QAZ\|_\alpha = \|A\|_\alpha$ for $\alpha = 2$ or $F$.
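For illustration only (this check is not part of the original text), the unitary invariance of the 2- and F-norms can be verified numerically. The sketch below assumes NumPy and builds orthogonal $Q$ and $Z$ from QR factorizations of random matrices.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 3))

    # Orthogonal Q (4x4) and Z (3x3) from QR factorizations of random matrices
    Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
    Z, _ = np.linalg.qr(rng.standard_normal((3, 3)))

    for ord_ in (2, 'fro'):
        # ||Q A Z|| should equal ||A|| for the unitarily invariant norms
        assert np.isclose(np.linalg.norm(Q @ A @ Z, ord_),
                          np.linalg.norm(A, ord_))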

Convergence

The following theorem uses matrix norms to convert a statement about convergence of a
sequence of matrices into a statement about the convergence of an associated sequence of
scalars.

Theorem 7.49. Let $\| \cdot \|$ be a matrix norm and suppose $A, A^{(1)}, A^{(2)}, \ldots \in \mathbb{R}^{m \times n}$. Then

$$\lim_{k \to +\infty} A^{(k)} = A \quad \text{if and only if} \quad \lim_{k \to +\infty} \|A^{(k)} - A\| = 0 .$$

EXERCISES

1. If $P$ is an orthogonal projection, prove that $P^+ = P$.

2. Suppose $P$ and $Q$ are orthogonal projections and $P + Q = I$. Prove that $P - Q$
must be an orthogonal matrix.

3. Prove that $I - A^+ A$ is an orthogonal projection. Also, prove directly that $V_2 V_2^T$ is an
orthogonal projection, where $V_2$ is defined as in Theorem 5.1.

4. Suppose that a matrix $A \in \mathbb{R}^{m \times n}$ has linearly independent columns. Prove that the
orthogonal projection onto the space spanned by these column vectors is given by the
matrix $P = A(A^T A)^{-1} A^T$.

5. Find the (orthogonal) projection of the vector $[2 \;\; 3 \;\; 4]^T$ onto the subspace of $\mathbb{R}^3$
spanned by the plane $3x - y + 2z = 0$.

6. Prove that $\mathbb{R}^{n \times n}$ with the inner product $(A, B) = \mathrm{Tr} \, A^T B$ is a real inner product
space.

7. Show that the matrix norms $\| \cdot \|_2$ and $\| \cdot \|_F$ are unitarily invariant.

8. Definition: Let $A \in \mathbb{R}^{n \times n}$ and denote its set of eigenvalues (not necessarily distinct)
by $\{\lambda_1, \ldots, \lambda_n\}$. The spectral radius of $A$ is the scalar

$$\rho(A) = \max_i |\lambda_i| .$$

Let
A=[~14 0
12
~].
5
Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$, and $\rho(A)$.
9. Let

$$A = \begin{bmatrix} 8 & 1 & 6 \\ 3 & 5 & 7 \\ 4 & 9 & 2 \end{bmatrix}.$$
Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$, $\|A\|_\infty$, and $\rho(A)$. (An $n \times n$ matrix, all of whose
columns and rows as well as main diagonal and antidiagonal sum to $s = n(n^2 + 1)/2$,
is called a "magic square" matrix. If $M$ is a magic square matrix, it can be proved
that $\|M\|_p = s$ for all $p$.)

10. Let $A = x y^T$, where both $x, y \in \mathbb{R}^n$ are nonzero. Determine $\|A\|_F$, $\|A\|_1$, $\|A\|_2$,
and $\|A\|_\infty$ in terms of $\|x\|_\alpha$ and/or $\|y\|_\beta$, where $\alpha$ and $\beta$ take the value 1, 2, or $\infty$ as
appropriate.
Chapter 8

Linear Least Squares
Problems

8.1 The Linear Least Squares Problem

Problem: Suppose $A \in \mathbb{R}^{m \times n}$ with $m \geq n$ and $b \in \mathbb{R}^m$ is a given vector. The linear least
squares problem consists of finding an element of the set

$$\mathcal{X} = \{ x \in \mathbb{R}^n : \rho(x) = \|Ax - b\|_2 \text{ is minimized} \}.$$

Solution: The set $\mathcal{X}$ has a number of easily verified properties:

1. A vector $x \in \mathcal{X}$ if and only if $A^T r = 0$, where $r = b - Ax$ is the residual associated
with $x$. The equations $A^T r = 0$ can be rewritten in the form $A^T A x = A^T b$ and the
latter form is commonly known as the normal equations, i.e., $x \in \mathcal{X}$ if and only if
$x$ is a solution of the normal equations. For further details, see Section 8.2.

2. A vector $x \in \mathcal{X}$ if and only if $x$ is of the form

$$x = A^+ b + (I - A^+ A) y, \quad \text{where } y \in \mathbb{R}^n \text{ is arbitrary}. \tag{8.1}$$

To see why this must be so, write the residual $r$ in the form

$$r = (b - P_{\mathcal{R}(A)} b) + (P_{\mathcal{R}(A)} b - Ax).$$

Now, $(P_{\mathcal{R}(A)} b - Ax)$ is clearly in $\mathcal{R}(A)$, while

$$(b - P_{\mathcal{R}(A)} b) = (I - P_{\mathcal{R}(A)}) b = P_{\mathcal{R}(A)^\perp} b \in \mathcal{R}(A)^\perp,$$

so these two vectors are orthogonal. Hence,

$$\|r\|_2^2 = \|b - Ax\|_2^2 = \|b - P_{\mathcal{R}(A)} b\|_2^2 + \|P_{\mathcal{R}(A)} b - Ax\|_2^2$$

from the Pythagorean identity (Remark 7.35). Thus, $\|Ax - b\|_2^2$ (and hence $\rho(x) = \|Ax - b\|_2$)
assumes its minimum value if and only if

$$Ax = P_{\mathcal{R}(A)} b = A A^+ b, \tag{8.2}$$


and this equation always has a solution since $A A^+ b \in \mathcal{R}(A)$. By Theorem 6.3, all
solutions of (8.2) are of the form

$$x = A^+ A A^+ b + (I - A^+ A) y = A^+ b + (I - A^+ A) y,$$

where $y \in \mathbb{R}^n$ is arbitrary. The minimum value of $\rho(x)$ is then clearly equal to

$$\|b - P_{\mathcal{R}(A)} b\|_2 = \|(I - A A^+) b\|_2 \leq \|b\|_2,$$

the last inequality following by Theorem 7.23.

3. $\mathcal{X}$ is convex. To see why, consider two arbitrary vectors $x_1 = A^+ b + (I - A^+ A) y$
and $x_2 = A^+ b + (I - A^+ A) z$ in $\mathcal{X}$. Let $\theta \in [0, 1]$. Then the convex combination
$\theta x_1 + (1 - \theta) x_2 = A^+ b + (I - A^+ A)(\theta y + (1 - \theta) z)$ is clearly in $\mathcal{X}$.

4. $\mathcal{X}$ has a unique element $x^*$ of minimal 2-norm. In fact, $x^* = A^+ b$ is the unique vector
that solves this "double minimization" problem, i.e., $x^*$ minimizes the residual $\rho(x)$
and is the vector of minimum 2-norm that does so. This follows immediately from
convexity or directly from the fact that all $x \in \mathcal{X}$ are of the form (8.1) and

$$\|x\|_2^2 = \|A^+ b\|_2^2 + \|(I - A^+ A) y\|_2^2 \geq \|A^+ b\|_2^2,$$

which follows since the two vectors are orthogonal.

5. There is a unique solution to the least squares problem, i.e., $\mathcal{X} = \{x^*\} = \{A^+ b\}$, if
and only if $A^+ A = I$ or, equivalently, if and only if $\mathrm{rank}(A) = n$.
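As an illustration only (the rank-deficient test matrix and right-hand side below are hypothetical, not from the text), the characterization (8.1) can be checked numerically with NumPy: every vector of the form $A^+b + (I - A^+A)y$ gives the same minimal residual, and $A^+b$ has the smallest 2-norm among them.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((6, 3)) @ np.array([[1., 0., 1.],
                                                [0., 1., 1.],
                                                [0., 0., 0.]])   # rank 2, so the solution is not unique
    b = rng.standard_normal(6)

    A_pinv = np.linalg.pinv(A)
    x_star = A_pinv @ b                      # minimum 2-norm least squares solution

    # Any x = A^+ b + (I - A^+ A) y is also a least squares solution ...
    y = rng.standard_normal(3)
    x_other = x_star + (np.eye(3) - A_pinv @ A) @ y

    # ... with the same residual, but larger (or equal) 2-norm.
    assert np.isclose(np.linalg.norm(A @ x_star - b),
                      np.linalg.norm(A @ x_other - b))
    assert np.linalg.norm(x_star) <= np.linalg.norm(x_other) + 1e-12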

Just as for the solution of linear equations, we can generalize the linear least squares
problem to the matrix case.

Theorem 8.1. Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{m \times k}$. The general solution to

$$\min_{X \in \mathbb{R}^{n \times k}} \|AX - B\|_2$$

is of the form

$$X = A^+ B + (I - A^+ A) Y,$$

where $Y \in \mathbb{R}^{n \times k}$ is arbitrary. The unique solution of minimum 2-norm or F-norm is
$X = A^+ B$.

Remark 8.2. Notice that solutions of the linear least squares problem look exactly the
same as solutions of the linear system $AX = B$. The only difference is that in the case
of linear least squares solutions, there is no "existence condition" such as $\mathcal{R}(B) \subseteq \mathcal{R}(A)$.
If the existence condition happens to be satisfied, then equality holds and the least squares

residual is 0. Of all solutions that give a residual of 0, the unique solution $X = A^+ B$ has
minimum 2-norm or F-norm.

Remark 8.3. If we take $B = I_m$ in Theorem 8.1, then $X = A^+$ can be interpreted as
saying that the Moore–Penrose pseudoinverse of $A$ is the best (in the matrix 2-norm sense)
matrix such that $AX$ approximates the identity.

Remark 8.4. Many other interesting and useful approximation results are available for the
matrix 2-norm (and F-norm). One such is the following. Let $A \in \mathbb{R}^{m \times n}_r$ with SVD

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T .$$

Then a best rank $k$ approximation to $A$ for $1 \leq k \leq r$, i.e., a solution to

$$\min_{M \in \mathbb{R}^{m \times n}_k} \|A - M\|_2 ,$$

is given by

$$M_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T .$$

The special case in which $m = n$ and $k = n - 1$ gives a nearest singular matrix to $A \in \mathbb{R}^{n \times n}_n$.
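A brief numerical sketch of this best rank-$k$ approximation (illustrative only; the test matrix is hypothetical) using NumPy's SVD. The 2-norm error of the truncated sum equals the $(k+1)$st singular value.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((5, 4))
    k = 2

    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Best rank-k approximation M_k = sum_{i=1}^{k} sigma_i u_i v_i^T
    M_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # The 2-norm error equals sigma_{k+1}
    assert np.isclose(np.linalg.norm(A - M_k, 2), s[k])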

8.2 Geometric Solution

Looking at the schematic provided in Figure 8.1, it is apparent that minimizing $\|Ax - b\|_2$
is equivalent to finding the vector $x \in \mathbb{R}^n$ for which $p = Ax$ is closest to $b$ (in the Euclidean
norm sense). Clearly, $r = b - Ax$ must be orthogonal to $\mathcal{R}(A)$. Thus, if $Ay$ is an arbitrary
vector in $\mathcal{R}(A)$ (i.e., $y$ is arbitrary), we must have

$$0 = (Ay)^T (b - Ax) = y^T A^T (b - Ax) = y^T (A^T b - A^T A x).$$

Since $y$ is arbitrary, we must have $A^T b - A^T A x = 0$ or $A^T A x = A^T b$.

Special case: If $A$ is full (column) rank, then $x = (A^T A)^{-1} A^T b$.

8.3 Linear Regression and Other Linear Least Squares
Problems

8.3.1 Example: Linear regression

Suppose we have $m$ measurements $(t_1, y_1), \ldots, (t_m, y_m)$ for which we hypothesize a linear
(affine) relationship

$$y = \alpha t + \beta \tag{8.3}$$

Figure 8.1. Projection of $b$ on $\mathcal{R}(A)$.

for certain constants $\alpha$ and $\beta$. One way to solve this problem is to find the line that best fits
the data in the least squares sense; i.e., with the model (8.3), we have

$$y_1 = \alpha t_1 + \beta + \delta_1,$$
$$y_2 = \alpha t_2 + \beta + \delta_2,$$
$$\vdots$$
$$y_m = \alpha t_m + \beta + \delta_m,$$

where $\delta_1, \ldots, \delta_m$ are "errors" and we wish to minimize $\delta_1^2 + \cdots + \delta_m^2$. Geometrically, we
are trying to find the best line that minimizes the (sum of squares of the) distances from the
given data points. See, for example, Figure 8.2.

Figure 8.2. Simple linear regression.

Note that distances are measured in the vertical sense from the points to the line (as
indicated, for example, for the point $(t_1, y_1)$). However, other criteria are possible. For
example, one could measure the distances in the horizontal sense, or the perpendicular
distance from the points to the line could be used. The latter is called total least squares.
Instead of 2-norms, one could also use 1-norms or $\infty$-norms. The latter two are computationally

much more difficult to handle, and thus we present only the more tractable 2-norm case in
the text that follows.

The $m$ "error equations" can be written in matrix form as

$$y = Ax + \delta,$$

where

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}, \quad
A = \begin{bmatrix} t_1 & 1 \\ \vdots & \vdots \\ t_m & 1 \end{bmatrix}, \quad
x = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}, \quad
\delta = \begin{bmatrix} \delta_1 \\ \vdots \\ \delta_m \end{bmatrix}.$$

We then want to solve the problem

$$\min_x \delta^T \delta = \min_x (Ax - y)^T (Ax - y)$$

or, equivalently,

$$\min_x \|\delta\|_2^2 = \min_x \|Ax - y\|_2^2 . \tag{8.4}$$

Solution: $x = \left[ \begin{smallmatrix} \alpha \\ \beta \end{smallmatrix} \right]$ is a solution of the normal equations $A^T A x = A^T y$ where, for the
special form of the matrices above, we have

$$A^T A = \begin{bmatrix} \sum_i t_i^2 & \sum_i t_i \\ \sum_i t_i & m \end{bmatrix}$$

and

$$A^T y = \begin{bmatrix} \sum_i t_i y_i \\ \sum_i y_i \end{bmatrix}.$$

The solution for the parameters $\alpha$ and $\beta$ can then be written

$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix} = (A^T A)^{-1} A^T y
= \frac{1}{m \sum_i t_i^2 - \left( \sum_i t_i \right)^2}
\begin{bmatrix} m & -\sum_i t_i \\ -\sum_i t_i & \sum_i t_i^2 \end{bmatrix}
\begin{bmatrix} \sum_i t_i y_i \\ \sum_i y_i \end{bmatrix}.$$

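A small numerical sketch of this regression computation (the data points below are made up for illustration, not taken from the text). It forms $A$, solves the normal equations, and checks the answer against NumPy's least squares routine.

    import numpy as np

    t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])   # hypothetical measurements

    A = np.column_stack([t, np.ones_like(t)])  # rows [t_i, 1]

    # Solve the normal equations A^T A x = A^T y for x = [alpha, beta]
    alpha, beta = np.linalg.solve(A.T @ A, A.T @ y)

    # Same answer from the library least squares solver
    x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
    assert np.allclose([alpha, beta], x_lstsq)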
8.3.2 Other least squares problems

Suppose the hypothesized model is not the linear equation (8.3) but rather is of the form

$$y = f(t) = c_1 \phi_1(t) + \cdots + c_n \phi_n(t). \tag{8.5}$$

In (8.5) the $\phi_i(t)$ are given (basis) functions and the $c_i$ are constants to be determined to
minimize the least squares error. The matrix problem is still (8.4), where we now have

$$A = \begin{bmatrix} \phi_1(t_1) & \cdots & \phi_n(t_1) \\ \vdots & & \vdots \\ \phi_1(t_m) & \cdots & \phi_n(t_m) \end{bmatrix}, \quad
x = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}.$$

An important special case of (8.5) is least squares polynomial approximation, which
corresponds to choosing $\phi_i(t) = t^{i-1}$, $i \in \underline{n}$, although this choice can lead to computational

difficulties because of numerical ill conditioning for large $n$. Numerically better approaches
are based on orthogonal polynomials, piecewise polynomial functions, splines, etc.

The key feature in (8.5) is that the coefficients $c_i$ appear linearly. The basis functions
$\phi_i$ can be arbitrarily nonlinear. Sometimes a problem in which the $c_i$'s appear nonlinearly
can be converted into a linear problem. For example, if the fitting function is of the form
$y = f(t) = c_1 e^{c_2 t}$, then taking logarithms yields the equation $\log y = \log c_1 + c_2 t$. Then
defining $\tilde{y} = \log y$, $\tilde{c}_1 = \log c_1$, and $\tilde{c}_2 = c_2$ results in a standard linear least squares
problem.
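The log-transformation trick can be sketched numerically as follows (illustrative only; the synthetic data and variable names are not from the text).

    import numpy as np

    rng = np.random.default_rng(3)
    t = np.linspace(0.0, 2.0, 20)
    y = 1.5 * np.exp(0.8 * t) * np.exp(0.05 * rng.standard_normal(20))  # c1 e^{c2 t} with noise

    # Fit log y = log c1 + c2 t as a linear least squares problem
    A = np.column_stack([np.ones_like(t), t])
    log_c1, c2 = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
    c1 = np.exp(log_c1)
    print(c1, c2)   # approximately 1.5 and 0.8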

8.4 Least Squares and Singular Value Decomposition

In the numerical linear algebra literature (e.g., [4], [7], [11], [23]), it is shown that solution
of linear least squares problems via the normal equations can be a very poor numerical
method in finite-precision arithmetic. Since the standard Kalman filter essentially amounts
to sequential updating of normal equations, it can be expected to exhibit such poor numerical
behavior in practice (and it does). Better numerical methods are based on algorithms that
work directly and solely on $A$ itself rather than $A^T A$. Two basic classes of algorithms are
based on SVD and QR (orthogonal–upper triangular) factorization, respectively. The former
is much more expensive but is generally more reliable and offers considerable theoretical
insight.

In this section we investigate solution of the linear least squares problem

$$\min_x \|Ax - b\|_2, \quad A \in \mathbb{R}^{m \times n}, \; b \in \mathbb{R}^m, \tag{8.6}$$

via the SVD. Specifically, we assume that $A$ has an SVD given by $A = U \Sigma V^T = U_1 S V_1^T$
as in Theorem 5.1. We now note that

$$\begin{aligned}
\|Ax - b\|_2^2 &= \|U \Sigma V^T x - b\|_2^2 \\
&= \|\Sigma V^T x - U^T b\|_2^2 \quad \text{since } \| \cdot \|_2 \text{ is unitarily invariant} \\
&= \|\Sigma z - c\|_2^2 \quad \text{where } z = V^T x, \; c = U^T b \\
&= \left\| \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} - \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|_2^2 \\
&= \left\| \begin{bmatrix} S z_1 - c_1 \\ -c_2 \end{bmatrix} \right\|_2^2 .
\end{aligned}$$

The last equality follows from the fact that if $v = \left[ \begin{smallmatrix} v_1 \\ v_2 \end{smallmatrix} \right]$, then $\|v\|_2^2 = \|v_1\|_2^2 + \|v_2\|_2^2$ (note
that orthogonality is not what is used here; the subvectors can have different lengths). This
explains why it is convenient to work above with the square of the norm rather than the
norm. As far as the minimization is concerned, the two are equivalent. In fact, the last
quantity above is clearly minimized by taking $z_1 = S^{-1} c_1$. The subvector $z_2$ is arbitrary,
while the minimum value of $\|Ax - b\|_2^2$ is $\|c_2\|_2^2$.

Now transform back to the original coordinates:

$$\begin{aligned}
x &= V z \\
&= [V_1 \;\; V_2] \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} \\
&= V_1 z_1 + V_2 z_2 \\
&= V_1 S^{-1} c_1 + V_2 z_2 \\
&= V_1 S^{-1} U_1^T b + V_2 z_2 .
\end{aligned}$$

The last equality follows from

$$c = U^T b = \begin{bmatrix} U_1^T b \\ U_2^T b \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$

Note that since $z_2$ is arbitrary, $V_2 z_2$ is an arbitrary vector in $\mathcal{R}(V_2) = \mathcal{N}(A)$. Thus, $x$ has
been written in the form $x = A^+ b + (I - A^+ A) y$, where $y \in \mathbb{R}^n$ is arbitrary. This agrees,
of course, with (8.1).

The minimum value of the least squares residual is

$$\|c_2\|_2 = \|U_2^T b\|_2 ,$$

and we clearly have that

minimum least squares residual is 0 $\iff$ $b$ is orthogonal to all vectors in $U_2$
$\iff$ $b$ is orthogonal to all vectors in $\mathcal{R}(A)^\perp$
$\iff$ $b \in \mathcal{R}(A)$.

Another expression for the minimum residual is $\|(I - A A^+) b\|_2$. This follows easily since
$\|(I - A A^+) b\|_2^2 = \|U_2 U_2^T b\|_2^2 = b^T U_2 U_2^T U_2 U_2^T b = b^T U_2 U_2^T b = \|U_2^T b\|_2^2$.

Finally, an important special case of the linear least squares problem is the
so-called full-rank problem, i.e., $A \in \mathbb{R}^{m \times n}_n$. In this case the SVD of $A$ is given by
$A = U \Sigma V^T = [U_1 \;\, U_2] \left[ \begin{smallmatrix} S \\ 0 \end{smallmatrix} \right] V_1^T$, and there is thus "no $V_2$ part" to the solution.

8.5 Least Squares and QR Factorization

In this section, we again look at the solution of the linear least squares problem (8.6) but this
time in terms of the QR factorization. This matrix factorization is much cheaper to compute
than an SVD and, with appropriate numerical enhancements, can be quite reliable.

To simplify the exposition, we add the simplifying assumption that $A$ has full column
rank, i.e., $A \in \mathbb{R}^{m \times n}_n$. It is then possible, via a sequence of so-called Householder or Givens
transformations, to reduce $A$ in the following way. A finite sequence of simple orthogonal
row transformations (of Householder or Givens type) can be performed on $A$ to reduce it
to triangular form. If we label the product of such orthogonal row transformations as the
orthogonal matrix $Q^T \in \mathbb{R}^{m \times m}$, we have

$$Q^T A = \begin{bmatrix} R \\ 0 \end{bmatrix}, \tag{8.7}$$

where $R \in \mathbb{R}^{n \times n}_n$ is upper triangular. Now write $Q = [Q_1 \;\; Q_2]$, where $Q_1 \in \mathbb{R}^{m \times n}$ and
$Q_2 \in \mathbb{R}^{m \times (m-n)}$. Both $Q_1$ and $Q_2$ have orthonormal columns. Multiplying through by $Q$
in (8.7), we see that

$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \tag{8.8}$$
$$\phantom{A} = [Q_1 \;\; Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix}$$
$$\phantom{A} = Q_1 R. \tag{8.9}$$

Any of (8.7), (8.8), or (8.9) are variously referred to as QR factorizations of $A$. Note that
(8.9) is essentially what is accomplished by the Gram–Schmidt process, i.e., by writing
$A R^{-1} = Q_1$ we see that a "triangular" linear combination (given by the coefficients of
$R^{-1}$) of the columns of $A$ yields the orthonormal columns of $Q_1$.
Now note that

$$\|Ax - b\|_2^2 = \|Q^T A x - Q^T b\|_2^2 \quad \text{since } \| \cdot \|_2 \text{ is unitarily invariant}$$
$$\phantom{\|Ax - b\|_2^2} = \left\| \begin{bmatrix} R \\ 0 \end{bmatrix} x - \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} \right\|_2^2 .$$

The last quantity above is clearly minimized by taking $x = R^{-1} c_1$ and the minimum residual
is $\|c_2\|_2$. Equivalently, we have $x = R^{-1} Q_1^T b = A^+ b$ and the minimum residual is $\|Q_2^T b\|_2$.
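A numerical sketch of the QR-based solution $x = R^{-1} Q_1^T b$ (illustrative only; the matrix is hypothetical and has full column rank).

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((6, 3))
    b = rng.standard_normal(6)

    Q1, R = np.linalg.qr(A)               # "economy" QR: A = Q1 R
    x = np.linalg.solve(R, Q1.T @ b)      # x = R^{-1} Q1^T b

    assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])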

EXERCISES

1. For $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and any $y \in \mathbb{R}^n$, check directly that $(I - A^+ A) y$ and $A^+ b$
are orthogonal vectors.

2. Consider the following set of measurements $(x_i, y_i)$:

$$(1, 2), \quad (2, 1), \quad (3, 3).$$

(a) Find the best (in the 2-norm sense) line of the form $y = \alpha x + \beta$ that fits this
data.

(b) Find the best (in the 2-norm sense) line of the form $x = \alpha y + \beta$ that fits this
data.

3. Suppose $q_1$ and $q_2$ are two orthonormal vectors and $b$ is a fixed vector, all in $\mathbb{R}^n$.

(a) Find the optimal linear combination $\alpha q_1 + \beta q_2$ that is closest to $b$ (in the 2-norm
sense).

(b) Let $r$ denote the "error vector" $b - \alpha q_1 - \beta q_2$. Show that $r$ is orthogonal to
both $q_1$ and $q_2$.
4. Find all solutions of the linear least squares problem

$$\min_x \|Ax - b\|_2$$
when A = [ ~
5. Consider the problem of finding the minimum 2-norm solution of the linear least
squares problem

$$\min_x \|Ax - b\|_2$$
when A = [~ ~ ] and b = [ !1 The solution is

(a) Consider a perturbation $E_1$ of $A$ whose only nonzero entry is a small positive
number $\delta$. Solve the perturbed version of the above problem,

$$\min_y \|A_1 y - b\|_2 ,$$

where $A_1 = A + E_1$. What happens to $\|x^* - y\|_2$ as $\delta$ approaches 0?

(b) Now consider a perturbation $E_2$ of $A$, again with a single small positive entry $\delta$
(in a different position). Solve the perturbed problem

$$\min_z \|A_2 z - b\|_2 ,$$

where $A_2 = A + E_2$. What happens to $\|x^* - z\|_2$ as $\delta$ approaches 0?
6. Use the four Penrose conditions and the fact that $Q_1$ has orthonormal columns to
verify that if $A \in \mathbb{R}^{m \times n}_n$ can be factored in the form (8.9), then $A^+ = R^{-1} Q_1^T$.

7. Let $A \in \mathbb{R}^{n \times n}$, not necessarily nonsingular, and suppose $A = QR$, where $Q$ is
orthogonal. Prove that $A^+ = R^+ Q^T$.
Chapter 9

Eigenvalues and
Eigenvectors

9.1 Fundamental Definitions and Properties

Definition 9.1. A nonzero vector $x \in \mathbb{C}^n$ is a right eigenvector of $A \in \mathbb{C}^{n \times n}$ if there exists
a scalar $\lambda \in \mathbb{C}$, called an eigenvalue, such that

$$Ax = \lambda x. \tag{9.1}$$

Similarly, a nonzero vector $y \in \mathbb{C}^n$ is a left eigenvector corresponding to an eigenvalue $\mu$ if

$$y^H A = \mu y^H. \tag{9.2}$$

By taking Hermitian transposes in (9.1), we see immediately that $x^H$ is a left eigen-
vector of $A^H$ associated with $\bar{\lambda}$. Note that if $x$ [$y$] is a right [left] eigenvector of $A$, then
so is $\alpha x$ [$\alpha y$] for any nonzero scalar $\alpha \in \mathbb{C}$. One often-used scaling for an eigenvector is
$\alpha = 1/\|x\|$ so that the scaled eigenvector has norm 1. The 2-norm is the most common
norm used for such scaling.

Definition 9.2. The polynomial $\pi(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial
of $A$. (Note that the characteristic polynomial can also be defined as $\det(\lambda I - A)$. This
results in at most a change of sign and, as a matter of convenience, we use both forms
throughout the text.)

The following classical theorem can be very useful in hand calculation. It can be
proved easily from the Jordan canonical form to be discussed in the text to follow (see, for
example, [21]) or directly using elementary properties of inverses and determinants (see,
for example, [3]).

Theorem 9.3 (Cayley–Hamilton). For any $A \in \mathbb{C}^{n \times n}$, $\pi(A) = 0$.

Example 9.4. Let $A$ be a $2 \times 2$ matrix with characteristic polynomial $\pi(\lambda) = \lambda^2 + 2\lambda - 3$.
It is an easy exercise to verify that $\pi(A) = A^2 + 2A - 3I = 0$.

It can be proved from elementary properties of determinants that if $A \in \mathbb{C}^{n \times n}$, then
$\pi(\lambda)$ is a polynomial of degree $n$. Thus, the Fundamental Theorem of Algebra says that


$\pi(\lambda)$ has $n$ roots, possibly repeated. These roots, as solutions of the determinant equation

$$\pi(\lambda) = \det(A - \lambda I) = 0, \tag{9.3}$$

are the eigenvalues of $A$ and imply the singularity of the matrix $A - \lambda I$, and hence further
guarantee the existence of corresponding nonzero eigenvectors.

Definition 9.5. The spectrum of $A \in \mathbb{C}^{n \times n}$ is the set of all eigenvalues of $A$, i.e., the set of
all roots of its characteristic polynomial $\pi(\lambda)$. The spectrum of $A$ is denoted $\Lambda(A)$.

Let the eigenvalues of $A \in \mathbb{C}^{n \times n}$ be denoted $\lambda_1, \ldots, \lambda_n$. Then if we write (9.3) in the
form

$$\pi(\lambda) = \det(A - \lambda I) = (\lambda_1 - \lambda) \cdots (\lambda_n - \lambda) \tag{9.4}$$

and set $\lambda = 0$ in this identity, we get the interesting fact that $\det(A) = \lambda_1 \lambda_2 \cdots \lambda_n$ (see
also Theorem 9.25).

If $A \in \mathbb{R}^{n \times n}$, then $\pi(\lambda)$ has real coefficients. Hence the roots of $\pi(\lambda)$, i.e., the
eigenvalues of $A$, must occur in complex conjugate pairs.

Example 9.6. Let $\alpha, \beta \in \mathbb{R}$ and let $A = \left[ \begin{smallmatrix} \alpha & \beta \\ -\beta & \alpha \end{smallmatrix} \right]$. Then $\pi(\lambda) = \lambda^2 - 2\alpha\lambda + \alpha^2 + \beta^2$ and
$A$ has eigenvalues $\alpha \pm \beta j$ (where $j = i = \sqrt{-1}$).

If $A \in \mathbb{R}^{n \times n}$, then there is an easily checked relationship between the left and right
eigenvectors of $A$ and $A^T$ (take Hermitian transposes of both sides of (9.2)). Specifically, if
$y$ is a left eigenvector of $A$ corresponding to $\lambda \in \Lambda(A)$, then $y$ is a right eigenvector of $A^T$
corresponding to $\bar{\lambda} \in \Lambda(A)$. Note, too, that by elementary properties of the determinant,
we always have $\Lambda(A) = \Lambda(A^T)$, but that $\Lambda(A) = \Lambda(\bar{A})$ only if $A \in \mathbb{R}^{n \times n}$.

Definition 9.7. If $\lambda$ is a root of multiplicity $m$ of $\pi(\lambda)$, we say that $\lambda$ is an eigenvalue of $A$
of algebraic multiplicity $m$. The geometric multiplicity of $\lambda$ is the number of associated
independent eigenvectors $= n - \mathrm{rank}(A - \lambda I) = \dim \mathcal{N}(A - \lambda I)$.

If $\lambda \in \Lambda(A)$ has algebraic multiplicity $m$, then $1 \leq \dim \mathcal{N}(A - \lambda I) \leq m$. Thus, if
we denote the geometric multiplicity of $\lambda$ by $g$, then we must have $1 \leq g \leq m$.

Definition 9.8. A matrix $A \in \mathbb{R}^{n \times n}$ is said to be defective if it has an eigenvalue whose
geometric multiplicity is not equal to (i.e., less than) its algebraic multiplicity. Equivalently,
$A$ is said to be defective if it does not have $n$ linearly independent (right or left) eigenvectors.

From the Cayley–Hamilton Theorem, we know that $\pi(A) = 0$. However, it is pos-
sible for $A$ to satisfy a lower-order polynomial. For example, if $A = \left[ \begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix} \right]$, then $A$ sat-
isfies $(\lambda - 1)^2 = 0$. But it also clearly satisfies the smaller degree polynomial equation
$(\lambda - 1) = 0$.

Definition 9.9. The minimal polynomial of $A \in \mathbb{R}^{n \times n}$ is the polynomial $\alpha(\lambda)$ of least
degree such that $\alpha(A) = 0$.

It can be shown that $\alpha(\lambda)$ is essentially unique (unique if we force the coefficient
of the highest power of $\lambda$ to be $+1$, say; such a polynomial is said to be monic and we
generally write $\alpha(\lambda)$ as a monic polynomial throughout the text). Moreover, it can also be

shown that $\alpha(\lambda)$ divides every nonzero polynomial $\beta(\lambda)$ for which $\beta(A) = 0$. In particular,
$\alpha(\lambda)$ divides $\pi(\lambda)$.

There is an algorithm to determine $\alpha(\lambda)$ directly (without knowing eigenvalues and as-
sociated eigenvector structure). Unfortunately, this algorithm, called the Bezout algorithm,
is numerically unstable.

Example 9.10. The above definitions are illustrated below for a series of matrices, each
of which has an eigenvalue 2 of algebraic multiplicity 4, i.e., $\pi(\lambda) = (\lambda - 2)^4$. We denote
the geometric multiplicity by $g$.

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^4 \text{ and } g = 1.$$

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^3 \text{ and } g = 2.$$

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2)^2 \text{ and } g = 3.$$

$$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \text{ has } \alpha(\lambda) = (\lambda - 2) \text{ and } g = 4.$$

At this point, one might speculate that $g$ plus the degree of $\alpha$ must always be five.
Unfortunately, such is not the case. The matrix

$$A = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix}$$

has $\alpha(\lambda) = (\lambda - 2)^2$ and $g = 2$.

Theorem 9.11. Let $A \in \mathbb{C}^{n \times n}$ and let $\lambda_i$ be an eigenvalue of $A$ with corresponding right
eigenvector $x_i$. Furthermore, let $y_j$ be a left eigenvector corresponding to any $\lambda_j \in \Lambda(A)$
such that $\lambda_j \neq \lambda_i$. Then $y_j^H x_i = 0$.

Proof: Since $Ax_i = \lambda_i x_i$,

$$y_j^H A x_i = \lambda_i y_j^H x_i. \tag{9.5}$$

Similarly, since $y_j^H A = \lambda_j y_j^H$,

$$y_j^H A x_i = \lambda_j y_j^H x_i. \tag{9.6}$$

Subtracting (9.6) from (9.5), we find $0 = (\lambda_i - \lambda_j) y_j^H x_i$. Since $\lambda_i - \lambda_j \neq 0$, we must have
$y_j^H x_i = 0$. $\Box$

The proof of Theorem 9.11 is very similar to two other fundamental and important
results.

Theorem 9.12. Let $A \in \mathbb{C}^{n \times n}$ be Hermitian, i.e., $A = A^H$. Then all eigenvalues of $A$ must
be real.

Proof: Suppose $(\lambda, x)$ is an arbitrary eigenvalue/eigenvector pair such that $Ax = \lambda x$. Then

$$x^H A x = \lambda x^H x. \tag{9.7}$$

Taking Hermitian transposes in (9.7) yields $x^H A^H x = \bar{\lambda} x^H x$. Using the fact that $A$ is
Hermitian, we have that $\bar{\lambda} x^H x = \lambda x^H x$. However, since $x$ is an eigenvector, we have
$x^H x \neq 0$, from which we conclude $\bar{\lambda} = \lambda$, i.e., $\lambda$ is real. $\Box$

Theorem 9.13. Let $A \in \mathbb{C}^{n \times n}$ be Hermitian and suppose $\lambda$ and $\mu$ are distinct eigenvalues
of $A$ with corresponding right eigenvectors $x$ and $z$, respectively. Then $x$ and $z$ must be
orthogonal.

Proof: Premultiply the equation $Ax = \lambda x$ by $z^H$ to get $z^H A x = \lambda z^H x$. Take the Hermitian
transpose of this equation and use the facts that $A$ is Hermitian and $\lambda$ is real to get $x^H A z =
\lambda x^H z$. Premultiply the equation $Az = \mu z$ by $x^H$ to get $x^H A z = \mu x^H z = \lambda x^H z$. Since
$\lambda \neq \mu$, we must have that $x^H z = 0$, i.e., the two vectors must be orthogonal. $\Box$

Let us now return to the general case.

Theorem 9.14. Let $A \in \mathbb{C}^{n \times n}$ have distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ with corresponding
right eigenvectors $x_1, \ldots, x_n$. Then $\{x_1, \ldots, x_n\}$ is a linearly independent set. The same
result holds for the corresponding left eigenvectors.

Proof: For the proof see, for example, [21, p. 118]. $\Box$

If $A \in \mathbb{C}^{n \times n}$ has distinct eigenvalues, and if $\lambda_i \in \Lambda(A)$, then by Theorem 9.11, $x_i$ is
orthogonal to all $y_j$'s for which $j \neq i$. However, it cannot be the case that $y_i^H x_i = 0$ as
well, or else $x_i$ would be orthogonal to $n$ linearly independent vectors (by Theorem 9.14)
and would thus have to be 0, contradicting the fact that it is an eigenvector. Since $y_i^H x_i \neq 0$
for each $i$, we can choose the normalization of the $x_i$'s, or the $y_i$'s, or both, so that $y_i^H x_i = 1$
for $i \in \underline{n}$.
Theorem 9.15. Let $A \in \mathbb{C}^{n \times n}$ have distinct eigenvalues $\lambda_1, \ldots, \lambda_n$ and let the correspond-
ing right eigenvectors form a matrix $X = [x_1, \ldots, x_n]$. Similarly, let $Y = [y_1, \ldots, y_n]$
be the matrix of corresponding left eigenvectors. Furthermore, suppose that the left and
right eigenvectors have been normalized so that $y_i^H x_i = 1$, $i \in \underline{n}$. Finally, let $\Lambda =
\mathrm{diag}(\lambda_1, \ldots, \lambda_n) \in \mathbb{R}^{n \times n}$. Then $A x_i = \lambda_i x_i$, $i \in \underline{n}$, can be written in matrix form as

$$AX = X\Lambda \tag{9.8}$$

while $y_i^H x_j = \delta_{ij}$, $i \in \underline{n}$, $j \in \underline{n}$, is expressed by the equation

$$Y^H X = I. \tag{9.9}$$

These matrix equations can be combined to yield the following matrix factorizations:

$$X^{-1} A X = \Lambda = Y^H A X \tag{9.10}$$

and

$$A = X \Lambda X^{-1} = X \Lambda Y^H = \sum_{i=1}^{n} \lambda_i x_i y_i^H. \tag{9.11}$$

Example 9.16. Let

5
-3
2 -3
-2
-4
~ ].

Then $\pi(\lambda) = \det(A - \lambda I) = -(\lambda^3 + 4\lambda^2 + 9\lambda + 10) = -(\lambda + 2)(\lambda^2 + 2\lambda + 5)$, from
which we find $\Lambda(A) = \{-2, \, -1 \pm 2j\}$. We can now find the right and left eigenvectors
corresponding to these eigenvalues.

For $\lambda_1 = -2$, solve the $3 \times 3$ linear system $(A - (-2)I)x_1 = 0$ to get

Note that one component of $x_1$ can be set arbitrarily, and this then determines the other two
(since $\dim \mathcal{N}(A - (-2)I) = 1$). To get the corresponding left eigenvector $y_1$, solve the
linear system $y_1^H (A + 2I) = 0$ to get

This time we have chosen the arbitrary scale factor for $y_1$ so that $y_1^H x_1 = 1$.

For $\lambda_2 = -1 + 2j$, solve the linear system $(A - (-1 + 2j)I)x_2 = 0$ to get

3+ j ]
X2 =[ 3 ~/ .

Solve the linear system $y_2^H (A - (-1 + 2j)I) = 0$ and normalize $y_2$ so that $y_2^H x_2 = 1$ to get

For $\lambda_3 = -1 - 2j$, we could proceed to solve linear systems as for $\lambda_2$. However, we
can also note that $x_3 = \bar{x}_2$ and $y_3 = \bar{y}_2$. To see this, use the fact that $\lambda_3 = \bar{\lambda}_2$ and simply
conjugate the equation $Ax_2 = \lambda_2 x_2$ to get $A\bar{x}_2 = \bar{\lambda}_2 \bar{x}_2$. A similar argument yields the result
for left eigenvectors.

Now define the matrix $X$ of right eigenvectors:

3+j 3- j ]
3-j 3+j .
-2 -2
It is then easy to verify that

.!.=.L !.±1
4 4
l+j .!.=.L
4 4

Other results in Theorem 9.15 can also be verified. For example,

$$X^{-1} A X = \Lambda = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -1 + 2j & 0 \\ 0 & 0 & -1 - 2j \end{bmatrix}.$$

Finally, note that we could have solved directly only for $x_1$ and $x_2$ (and $x_3 = \bar{x}_2$). Then,
instead of determining the $y_i$'s directly, we could have found them instead by computing
$X^{-1}$ and reading off its rows.
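For illustration (not part of the original example), the factorizations in Theorem 9.15 can be checked numerically with NumPy; the matrix below is a hypothetical stand-in with distinct eigenvalues.

    import numpy as np

    A = np.array([[ 2., -1.,  0.],
                  [ 1.,  3.,  1.],
                  [ 0.,  1., -2.]])   # hypothetical matrix with distinct eigenvalues

    lam, X = np.linalg.eig(A)          # columns of X are right eigenvectors
    Y_H = np.linalg.inv(X)             # rows of X^{-1} are the normalized left eigenvectors y_i^H

    # X^{-1} A X = Lambda and A = sum_i lambda_i x_i y_i^H
    assert np.allclose(Y_H @ A @ X, np.diag(lam))
    assert np.allclose(sum(lam[i] * np.outer(X[:, i], Y_H[i, :]) for i in range(3)), A)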

Example 9.17. Let
A = [-~ -~ ~] .
o -3
3
Then $\pi(\lambda) = \det(A - \lambda I) = -(\lambda^3 + 8\lambda^2 + 19\lambda + 12) = -(\lambda + 1)(\lambda + 3)(\lambda + 4)$,
from which we find $\Lambda(A) = \{-1, -3, -4\}$. Proceeding as in the previous example, it is
straightforward to compute

X~[~ -i ]
I
0
-I

and
and
x-,~q ~ y'
1 2 1
3 0 -3 ]
2 -2 2

We also have $X^{-1} A X = \Lambda = \mathrm{diag}(-1, -3, -4)$, which is equivalent to the dyadic expan-
sion

$$A = \sum_{i=1}^{3} \lambda_i x_i y_i^H$$

~(-I)[ ~ W~ ~l+(-3)[ j ][~ 0 -~l

+(-4) [ -; ] [~ 1
- -
3 ~J
I I I I I I

~ (-I) [
l
I I
(; 3 (;
2 0 -2 3 -3 3
I
3
I
2
3
I
I
3
I
J+ (-3) [ 0 0
I
0
0
I
]+ (-4) [ -3
I

I
I
3
I
-3
I

I
(; 3 (; -2 2 3 -3 3

Theorem 9.18.
Theorem 9.18. Eigenvalues (but not eigenvectors) are invariant under a similarity trans-
formation $T$.

Proof: Suppose $(\lambda, x)$ is an eigenvalue/eigenvector pair such that $Ax = \lambda x$. Then, since $T$
is nonsingular, we have the equivalent statement $(T^{-1} A T)(T^{-1} x) = \lambda (T^{-1} x)$, from which
the theorem statement follows. For left eigenvectors we have a similar statement, namely
$y^H A = \lambda y^H$ if and only if $(T^H y)^H (T^{-1} A T) = \lambda (T^H y)^H$. $\Box$
x
Remark 9.19. If $f$ is an analytic function (e.g., $f(x)$ is a polynomial, or $e^x$, or $\sin x$,
or, in general, representable by a power series $\sum_{n=0}^{\infty} a_n x^n$), then it is easy to show that
the eigenvalues of $f(A)$ (defined as $\sum_{n=0}^{\infty} a_n A^n$) are $f(\lambda)$, but $f(A)$ does not necessarily
have all the same eigenvectors (unless, say, $A$ is diagonalizable). For example, $A = \left[ \begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix} \right]$
has only one right eigenvector corresponding to the eigenvalue 0, but $A^2 = \left[ \begin{smallmatrix} 0 & 0 \\ 0 & 0 \end{smallmatrix} \right]$ has two
independent right eigenvectors associated with the eigenvalue 0. What is true is that the
eigenvalue/eigenvector pair $(\lambda, x)$ maps to $(f(\lambda), x)$ but not conversely.
The following theorem is useful when solving systems of linear differential equations.
Details of how the matrix exponential $e^{tA}$ is used to solve the system $\dot{x} = Ax$ are the subject
of Chapter 11.

Theorem 9.20. Let $A \in \mathbb{R}^{n \times n}$ and suppose $X^{-1} A X = \Lambda$, where $\Lambda$ is diagonal. Then

$$e^{tA} = X e^{t\Lambda} X^{-1} = \sum_{i=1}^{n} e^{\lambda_i t} x_i y_i^H.$$

Proof: Starting from the definition, we have

$$e^{tA} = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!} = \sum_{k=0}^{\infty} \frac{t^k (X \Lambda X^{-1})^k}{k!}
= X \left( \sum_{k=0}^{\infty} \frac{t^k \Lambda^k}{k!} \right) X^{-1}
= X e^{t\Lambda} X^{-1} = \sum_{i=1}^{n} e^{\lambda_i t} x_i y_i^H. \quad \Box$$

The following corollary is immediate from the theorem upon setting $t = 1$.

Corollary 9.21. If $A \in \mathbb{R}^{n \times n}$ is diagonalizable with eigenvalues $\lambda_i$, $i \in \underline{n}$, and right
eigenvectors $x_i$, $i \in \underline{n}$, then $e^A$ has eigenvalues $e^{\lambda_i}$, $i \in \underline{n}$, and the same eigenvectors.
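A numerical sketch of Theorem 9.20 (illustrative only; the diagonalizable test matrix below is hypothetical), comparing the eigendecomposition formula with SciPy's matrix exponential.

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[ 0.,  1.],
                  [-2., -3.]])   # hypothetical diagonalizable matrix (eigenvalues -1, -2)
    t = 0.7

    lam, X = np.linalg.eig(A)                    # A X = X diag(lam)
    E = X @ np.diag(np.exp(lam * t)) @ np.linalg.inv(X)

    assert np.allclose(E.real, expm(t * A))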
There are extensions to Theorem 9.20 and Corollary 9.21 for any function that is
analytic on the spectrum of $A$, i.e., $f(A) = X f(\Lambda) X^{-1} = X \mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_n)) X^{-1}$.

It is desirable, of course, to have a version of Theorem 9.20 and its corollary in which
$A$ is not necessarily diagonalizable. It is necessary first to consider the notion of Jordan
canonical form, from which such a result is then available and presented later in this chapter.

9.2 Jordan Canonical Form

Theorem 9.22.

1. Jordan Canonical Form (JCF): For all $A \in \mathbb{C}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n \in \mathbb{C}$
(not necessarily distinct), there exists $X \in \mathbb{C}^{n \times n}_n$ such that

$$X^{-1} A X = J = \mathrm{diag}(J_1, \ldots, J_q), \tag{9.12}$$

where each of the Jordan block matrices $J_1, \ldots, J_q$ is of the form

$$J_i = \begin{bmatrix} \lambda_i & 1 & & & \\ & \lambda_i & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda_i & 1 \\ & & & & \lambda_i \end{bmatrix} \tag{9.13}$$

and $\sum_{i=1}^{q} k_i = n$.

2. Real Jordan Canonical Form: For all $A \in \mathbb{R}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$ (not
necessarily distinct), there exists $X \in \mathbb{R}^{n \times n}_n$ such that

$$X^{-1} A X = J = \mathrm{diag}(J_1, \ldots, J_q), \tag{9.14}$$

where each of the Jordan block matrices $J_1, \ldots, J_q$ is of the form

$$J_i = \begin{bmatrix} \lambda_i & 1 & & \\ & \ddots & \ddots & \\ & & \lambda_i & 1 \\ & & & \lambda_i \end{bmatrix}$$

in the case of real eigenvalues $\lambda_i \in \Lambda(A)$, and

$$J_i = \begin{bmatrix} M_i & I_2 & & \\ & \ddots & \ddots & \\ & & M_i & I_2 \\ & & & M_i \end{bmatrix},$$

where $M_i = \left[ \begin{smallmatrix} \alpha_i & \beta_i \\ -\beta_i & \alpha_i \end{smallmatrix} \right]$ and $I_2 = \left[ \begin{smallmatrix} 1 & 0 \\ 0 & 1 \end{smallmatrix} \right]$, in the case of complex conjugate eigenvalues
$\alpha_i \pm j\beta_i \in \Lambda(A)$.

Proof: For the proof see, for example, [21, pp. 120–124]. $\Box$

Transformations like $T = \left[ \begin{smallmatrix} 1 & -j \\ 1 & j \end{smallmatrix} \right]$ allow us to go back and forth between a real JCF
and its complex counterpart:

$$T^{-1} \begin{bmatrix} \alpha + j\beta & 0 \\ 0 & \alpha - j\beta \end{bmatrix} T = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix} = M.$$

For nontrivial Jordan blocks, the situation is only a bit more complicated.
complicated. With

~ -~]
1 -j
o o
-j 1 o 0 '
o o -j 1
84 Chapter
Chapter 9. Eigenvalues and
9. Eigenvalues and Eigenvectors
Eigenvectors

it is easily checked that

$$T^{-1} \begin{bmatrix} \alpha + j\beta & 1 & 0 & 0 \\ 0 & \alpha + j\beta & 0 & 0 \\ 0 & 0 & \alpha - j\beta & 1 \\ 0 & 0 & 0 & \alpha - j\beta \end{bmatrix} T
= \begin{bmatrix} M & I_2 \\ 0 & M \end{bmatrix}.$$

Definition 9.23. The characteristic polynomials of the Jordan blocks defined in Theorem
9.22 are called the elementary divisors or invariant factors of $A$.

Theorem 9.24. The characteristic polynomial of a matrix is the product of its elementary
divisors. The minimal polynomial of a matrix is the product of the elementary divisors of
highest degree corresponding to distinct eigenvalues.

Theorem 9.25. Let $A \in \mathbb{C}^{n \times n}$ with eigenvalues $\lambda_1, \ldots, \lambda_n$. Then

1. $\det(A) = \prod_{i=1}^{n} \lambda_i$.

2. $\mathrm{Tr}(A) = \sum_{i=1}^{n} \lambda_i$.

Proof:

1. From Theorem 9.22 we have that $A = X J X^{-1}$. Thus,
$\det(A) = \det(X J X^{-1}) = \det(J) = \prod_{i=1}^{n} \lambda_i$.

2. Again, from Theorem 9.22 we have that $A = X J X^{-1}$. Thus,
$\mathrm{Tr}(A) = \mathrm{Tr}(X J X^{-1}) = \mathrm{Tr}(J X^{-1} X) = \mathrm{Tr}(J) = \sum_{i=1}^{n} \lambda_i$. $\Box$
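A quick numerical check of these two facts (illustrative only, with a hypothetical test matrix):

    import numpy as np

    A = np.array([[1., 2., 0.],
                  [3., 4., 1.],
                  [0., 1., 5.]])

    lam = np.linalg.eigvals(A)

    assert np.isclose(np.prod(lam), np.linalg.det(A))
    assert np.isclose(np.sum(lam), np.trace(A))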

Example 9.26. Suppose $A \in \mathbb{R}^{7 \times 7}$ is known to have $\pi(\lambda) = (\lambda - 1)^4 (\lambda - 2)^3$ and
$\alpha(\lambda) = (\lambda - 1)^2 (\lambda - 2)^2$. Then $A$ has two possible JCFs (not counting reorderings of the
diagonal blocks):

$$J^{(1)} = \begin{bmatrix} 1 & 1 & & & & & \\ & 1 & & & & & \\ & & 1 & & & & \\ & & & 1 & & & \\ & & & & 2 & 1 & \\ & & & & & 2 & \\ & & & & & & 2 \end{bmatrix} \quad \text{and} \quad
J^{(2)} = \begin{bmatrix} 1 & 1 & & & & & \\ & 1 & & & & & \\ & & 1 & 1 & & & \\ & & & 1 & & & \\ & & & & 2 & 1 & \\ & & & & & 2 & \\ & & & & & & 2 \end{bmatrix}.$$

Note that $J^{(1)}$ has elementary divisors $(\lambda - 1)^2$, $(\lambda - 1)$, $(\lambda - 1)$, $(\lambda - 2)^2$, and $(\lambda - 2)$,
while $J^{(2)}$ has elementary divisors $(\lambda - 1)^2$, $(\lambda - 1)^2$, $(\lambda - 2)^2$, and $(\lambda - 2)$.

Example 9.27. Knowing $\pi(\lambda)$, $\alpha(\lambda)$, and $\mathrm{rank}(A - \lambda_i I)$ for distinct $\lambda_i$ is not sufficient to
determine the JCF of $A$ uniquely. The matrices

a 0 0 0 0 0 a 0 0 0 0 0
0 a 0 0 0 0 0 a 0 0 0 0
0 0 a 0 0 0 0 0 0 a 0 0 0 0
Al= 0 0 0 a 0 0 A2 = 0 0 0 a 0 0
0 0 0 0 a 0 0 0 0 0 0 a 0
0 0 0 0 0 a 1 0 0 0 0 0 a 0
0 0 0 0 0 0 a 0 0 0 0 0 0 a

7r(A.) =
both have rr(A) = (A
(A.- —a)7, a(A.)== (A(A.- —
a) ,a(A) a) and
a)\ , andrank(A
rank(A- — al) ==4, 4,i.e.,
al) i.e.,three
threeeigen-
eigen-
vectors.

9.3 Determination of
Determination of the
the JCF
JCF
lxn
The first critical item of information in determining the JCF of a matrix A A Ee W
]R.nxn is its
number of eigenvectors. For each distinct eigenvalue Ai, A,,, the associated
associated number of linearly
independent right
independent right (or left) eigenvectors
eigenvectors is given by dim A^(A -— A.,7)
dimN(A A;l) = n -— rank(A -— A;l).
A.(7).
The straightforward
The straightforward case
case is, of course,
is, of course, when
when Ai
X,- is simple, i.e.,
is simple, of algebraic
i.e., of algebraic multiplicity 1; it
multiplicity 1; it
then has precisely one eigenvector. The more interesting (and difficult) case occurs when
A,
Ai is of algebraic multiplicity
multiplicity greater than one. For example, suppose suppose

A =
[3 2
0
o
3
0 n
U2 I]
Then
Then

A-3I= o 0
o 0

has rank 1, so the eigenvalue 3 has two eigenvectors associated If we let [~l
associated with it. If [^i ~2 &]T
£2 ~3]T
denote aa solution
denote solution to
to the
the linear
linear system
system (A
(A -— 3/)£ = 0,
3l)~ = 0, we
we find 2£2 +
that 2~2
find that +£~33== 0O.. Thus,
Thus, both
both

are eigenvectors
are eigenvectors (and
(and are
are independent).
independent). To get aa third
To get third vector JC3 such
vector X3 such that X = [Xl
that X [x\ KJ_ XT,]
X2 X3]
reduces A to JCF, we need the notion of principal vector.
Definition 9.28. Let A
Definition 9.28. c
xn
A Ee C"nxn (or R" x
"). Then xX is a right principal
]R.nxn). principal vector of degree
degree k
X Ee A
associated with A (A) ifand
A(A) if and only
only if(A XI)kx == 00 and
if (A -- ulx and(A(A -- AI)k-l ^ o.
U}k~lxx i= 0.
Remark 9.29.
Remark 9.29.

1. An analogous definition holds for a left


left principal vector of degree k.
k.
86 Chapter
Chapter 9. Eigenvalues and
9. Eigenvalues and Eigenvectors
Eigenvectors

2. The phrase "of


"of grade k" is often
often used synonymously
synonymously with "of
"of degree k."
3. Principal vectors are sometimes also called generalized eigenvectors, but the latter
generalized eigenvectors,
term will be assigned a much different
different meaning in Chapter 12.
4. The case kk =
= 1 corresponds to the "usual" eigenvector.
eigenvector.

5. A right (or left) principal vector of degree kk is associated with a Jordan block J;
S. ji of
of
dimension k or larger.

9.3.1
9.3.1 Theoretical
Theoretical computation
computation
To motivate the development of a procedure for determining
determining principal vectors, consider a
2x2 i]. Denote
Jordan block{h[~0 h1.
2 x 2 Jordan Denote by (1)
by xx(l) and
and x (2)
x(2) the
the two
two columns
columns of
of aa matrix
matrix XX eE R 2 2
lR~X2
,x
that reduces a matrix A A to this JCF. Then the
JCF. Then theequation
equation AXAX == XXJ canbe
J can bewritten
written

A [x(l) x(2)] = [x(l) X(2)] [~ ~ J.


Ax(1) =
The first column yields the equation Ax(!) x (1) is a right
hx(1) which simply says that x(!)
= AX(!),
(2)
eigenvector. The second
second column yields the following equation for x
x(2),, the principal vector
of degree
of degree 2:
(A - A/)x(2) = x(l). (9.17)
z (2) w
If we
If we premultiply
premultiply (9.17) by AI), we
(A -- XI),
by (A we find (A-- A1)2
find (A X I ) x(2)
x ==(A XI)x ==O.0.Thus,
(A-- A1)X(l) Thus,
the definition of principal vector is satisfied.
x
This suggests a "general" procedure. First, determine all eigenvalues of A eE R" lR nxn"
c
nxn ).
(or C ). Then for each distinct X A eE A (A) perform the following:
A(A) following:

1. Solve
(A - A1)X(l) = O.

This step finds all the eigenvectors (i.e., principal vectors of degree 1) associated with
I) associated
A. The number of
X. of eigenvectors depends on the rank of AI. For example, if
of A -— XI. if
rank(A — - XI)
A/) == n -— 1, there is only one eigenvector. If
If the algebraic multiplicity of
multiplicity of
XA is greater than its geometric multiplicity, principal
principal vectors still need
need to be computed
from succeeding steps.
(1)
2. For each independent jc
x(l),, solve

(A - A1)x(2) = x(l).

The number of linearly independent solutions at this step depends on the rank of of
(A
(A — X I ) 2 . If, for example, this rank is nn -— 2, there are two linearly independent
- uf.
solutions to the homogeneous equation (A AI)22x^
(A -— XI) o. One of these solutions
x (2) = 0. solutions
(l)
is, of course, xx(l) (^ (1= 0), (A -- 'A1)
0), since (A 22 ( l )
X I ) xx(l) = (A - XI)0
= (A o. The
AI)O = 0. othersolution
The other solution
is the desired principal vector of degree 2. (It may be necessary necessary to take a linear
(1)
combination of of jc
x(l) vectors to get
get a right-hand
right-hand side that is in 7£(A
R(A — - XI). See, for
AI). See,
example, Exercise 7.)
9.3. Determination
9.3. Determination of
of the
the JCF
JCF 87

3. For
3. For each
each independent x(2) from
independent X(2) from step
step 2,
2, solve
solve

(A - AI)x(3) = x(2).

4. Continue
4. Continue inin this
this way until the
way until the total
total number
number of
of independent
independent eigenvectors
eigenvectors and
and principal
principal
vectors is
vectors is equal
equal to
to the
the algebraic
algebraic multiplicity
multiplicity of
of A.
A.

Unfortunately,
Unfortunately, this natural-looking procedure
this natural-looking procedure cancan fail
fail to
to find all Jordan
find all Jordan vectors.
vectors. For
For
more extensive treatments,
more extensive treatments, see,
see, for
for example,
example, [20]
[20] and
and [21].
[21]. Determination
Determination of of eigenvectors
eigenvectors
and principal
and principal vectors is obviously
vectors is obviously very
very tedious for anything
tedious for anything beyond simple problems
beyond simple problems (n = 22
(n =
or 3,
or 3, say).
say). Attempts
Attempts to to do
do such
such calculations
calculations in in finite-precision
finite-precision floating-point
floating-point arithmetic
arithmetic
generally prove
generally prove unreliable.
unreliable. There
There are
are significant
significant numerical
numerical difficulties
difficulties inherent
inherent in
in attempting
attempting
to compute
to compute aa JCF,
JCF, and
and the
the interested
interested student
student is
is strongly
strongly urged
urged to
to consult
consult the
the classical
classical and
and very
very
readable [8]
readable [8] to
to learn
learn why.
why. Notice
Notice that
that high-quality
high-quality mathematical
mathematical software
software such
such as
as MATLAB
MATLAB
does
does not offer aa jcf
not offer command, although
j cf command, although aa jordan command is
j ardan command available in
is available in MATLAB's
MATLAB'S
Symbolic Toolbox.
Symbolic Toolbox.
kxk
Theorem 9.30.
Theorem 9.30. Suppose
Suppose AA Ee C
Ckxk has an
has an eigenvalue A,ofofalgebraic
eigenvalue A algebraicmultiplicity
multiplicitykkand
and
suppose
suppose further rank(A -— AI)
that rank(A
further that AI) == kk -— 1.
1. Let
Let X x ( l ) , ...
= [[x(l),
X = x(k)], where
. . . ,, X(k)], where the chain of
the chain of
vectors x(i)
vectors x(i) is
is constructed
constructed as
as above.
above. Then
Then

Theorem 9.31.
Theorem (x (1) , ...
9.31. {x(l), x (k) } is
. . . ,, X(k)} is aa linearly
linearly independent
independent set.
set.

Theorem 9.32.
Theorem 9.32. Principal vectors associated
Principal vectors associated with
with different
different Jordan blocks are
Jordan blocks are linearly
linearly inde-
inde-
pendent.
pendent.

Example 9.33.
Example 9.33. Let
Let

A=[~0 01 2; ] .
The eigenvalues
The eigenvalues ofof A
A are AI =
are A1 = I, h2 =
1, A2 = 1,
1, and h3 =
and A3 = 2.
2. First,
First, find the eigenvectors
find the eigenvectors associated
associated
with the distinct
with the distinct eigenvalues
eigenvalues 11 and
and 2.2.
(A ,(1)=
2I)x~1)
(A --2/)x3(1) = 00 yields
yields
88 Chapter 9. Eigenvalues and Eigenvectors

(1)
(A-- 11)x?J
(A l/)x, ==00 yields
yields

To find
To find aa principal
principal vector of degree
vector of degree 22 associated
associated with
with the
the multiple
multiple eigenvalue
eigenvalue 1,
1, solve
solve
l/)x,(2)2) == xiI)
(A -– 1I)xl
(A x, (1)to
toeet
get

x,
(2)
=[ 0~ ] .
Now
Now let
let

xl" xl"] ~ [ ~
l
0 5
X = [xiI) 1 3
0
Then itit is
Then is easy
easy to
to check
check that
that

X-'~U -i
0
1
0
-5 ]
and X-lAX =[ I
~ 1
0 n
9.3.2
9.3.2 On the +1 's
's in JCF
JCF blocks
In this subsection
In this subsection we show that
we show that the
the nonzero superdiagonal elements
nonzero superdiagonal elements of
of aa JCF
JCF need not be
need not be
11's's but
but can
can be
be arbitrary
arbitrary -— so so long
long as
as they
they are
are nonzero.
nonzero. For
For the
the sake
sake of
of definiteness,
defmiteness, we
we
consider below
consider below the case of
the case of aa single
single Jordan
Jordan block, but the
block, but the result clearly holds
result clearly holds for any JCF.
for any JCF.
nxn
SupposedAA€E RjRnxn
Suppose and
and

Let D
Let diag(d1, ...
D = diag(d" be aa nonsingular
. . . ,, ddnn)) be nonsingular "scaling"
"scaling" matrix.
matrix. Then
Then
A 4l. 0 0
d,

0 )... !b. 0
d,

A
D-'(X-' AX)D = D-' J D = j =
dn - I
dn -
0
2

dn
A- dn - I

0 0 )...
9.4.
9.4. Geometric
Geometric Aspects
Aspects of
of the
the JCF
JCF 89

Appropriate choice of the di 's then yields any desired nonzero superdiagonal elements.
di's
This result can also be interpreted
interpreted in terms of the matrix X = = [x\,..., eigenvectors
xnn]] of eigenvectors
[x[, ... ,x
and principal
and principal vectors that reduces
vectors that reduces A A to
to its
its JCF. Specifically, Jj is
lCF. Specifically, is obtained
obtained from
from A A via the
via the
similarity
similarity transformation
transformation XDXD = \d\x\,..., dnxn}.
[d[x[, ... , dnxn].
In
In aa similar
similar fashion,
fashion, the reverse-order identity
the reverse-order matrix (or
identity matrix (or exchange matrix)
exchange matrix)
0 0 I
0
p = pT = p-[ = (9.18)

0 1
I 0 0
can be used to
to put the superdiagonal
superdiagonal elements
elements in
in the subdiagonal instead
instead if that is desired:
desired:
A I 0 0 A 0 0
0 A 0 A 0

p-[ A 0 1 A
p=
0
A I A 0
0 0 A 0 0 A

9.4
9.4 Geometric Aspects
Geometric of the
Aspects of the JCF
JCF
The matrix XX that reduces a matrix A E e IR" X
jH.nxn (or Cnxn
"(or nxn
c
)) totoaalCF
JCFprovides
providesaachange
changeofofbasis
basis
with respect to
with respect to which
which the
the matrix
matrix is diagonal or
is diagonal or block
block diagonal.
diagonal. It It is
is thus natural to
thus natural to expect
expect an
an
associated direct
associated direct sum decomposition of
sum decomposition of jH.n.
R. Such
Such aa decomposition
decomposition is is given
given in
in the
the following
following
theorem.
x
9.34. Suppose
Theorem 9.34. Suppose A Ee R" " has characteristic polynomial
jH.nxn
n(A) = (A - A[)n) ... (A - Amtm

and minimal polynomial


a(A) = (A - A[)V) '" (A - Am)Vm
with A-i, . . . ,, A.
AI, ... distinct. Then
Ammdistinct. Then
jH.n = N(A - AlIt) E6 ... E6 N(A - AmItm
= N (A - A1I) v) E6 ... E6 N (A - Am I) Vm .
Note that dimM(A AJ)Viw =
dimN(A -— A.,/) = «,-.
ni.
Definition 9.35. Let
Definition 9.35. Let V be a vector space over FIF and suppose A : V —>•
and suppose --+ V is a linear
transformation. A
transformation. subspace S
A subspace c V
S ~ is A-invariant
V is if AS
A -invariant if AS ~c S,
S, where
where AS is defined
AS is as the
defined as the
set {As : ss eE S}.
set {As: S}.
90 Chapter 9.
Chapter 9. Eigenvalues
Eigenvalues and
and Eigenvectors
Eigenvectors

If V is taken to be ]Rn
If R" over Rand R"xxk* is a matrix whose columns SI,
R, and SS Ee ]Rn s\,..., s/t
... , Sk
span aa k-dimensional
span /^-dimensional subspace
subspace S,
<S,i.e.,
i.e.,R(S)
K(S) == S, thenS <S
<S,then is isA-invariant
A-invariantififand
andonly
onlyififthere
there
kxk
exists M EeR ]Rkxk such that
AS = SM. (9.19)
/th columns of each side of (9.19):
This follows easily by comparing the ith

Example 9.36.
Example 9.36. The
The equation Ax = A*
equation Ax AX =
= xx A defining aa right
A defining right eigenvector
eigenvector xx of
of an
an eigenvalue
eigenvalue
XA says that *x spans an A-invariant subspace (of dimension one).
9.37. Suppose X block diagonalizes A, i.e.,
Example 9.37.

X-I AX = [~ J2
].

Rewriting in the form

~ J,
we have that
we have that A
AXA,i = X;li,
A", /,,i /== 1,2,
1, 2,sosothe
thecolumns
columnsofofXiA,span
spanananA-invariant
A-mvanantsubspace.
subspace.
Theorem 9.38. E"x".
9.38. Suppose A Ee ]Rnxn.

7.
1. Let
Let p(A) «o/ +
= CloI
peA) = o?i A +
+ ClIA + '"• • •+ <xqAq be
+ ClqAq be aa polynomial
polynomial in A. Then
in A. Then N(p(A))
N(p(A)) and
and
7£(p(A)) A-invariant.
R(p(A)) are A-invariant.

A -invariant if
2. S is A-invariant ifSS1-1. is AATT-invariant.
only if
if and only
Theorem 9.39.
Theorem 9.39. If F such that V =
If V is a vector space over IF = N\ ® ...
NI EB • • • EB
0 Nmm, , where each
A// is A-invariant,
N; is then aa basis
A-invariant, then basis for V can
for V can be
be chosen with respect
chosen with respect to which A
to which A hashas aa block
block
diagonal representation.
diagonal representation.
The Jordan
The Jordan canonical
canonical form form isis aa special
special case
case ofof the above theorem.
the above theorem. If If A A has
has distinct
distinct
eigenvalues A,,-
eigenvalues Ai as in Theorem 9.34, 9.34, we could choose bases for N(A N(A — - A.,-/)"'
Ai/)n, by SVD, for
example (note
example (note that
that the
the power
power ni n, could
could bebe replaced
replaced by v,). We
by Vi). would then
We would then get
get aa block
block diagonal
diagonal
representation for
representation for A
A with
with full blocks rather
full blocks rather than
than the
the highly structured Jordan
highly structured blocks. Other
Jordan blocks. Other
such "canonical"
such "canonical" forms
forms areare discussed
discussed in text that
in text that follows.
follows.
Suppose A" X == [Xl[ X .....
i , . . . ,Xm] R"nxnisissuch
Xm] Ee]R~xn suchthat
thatX-I
X^AXAX ==diag(J1, diag(7i,...,
... , Jm where
Jm),),where
each Ji
each Ji = diag(/,i,..., Jik,)
= diag(JiI,"" //*,.) andand each
each /,* is aa Jordan
Jik is Jordan block
block corresponding
corresponding to to Ai
A, Ee A(A).
A(A).
We could also use other block diagonal decompositions (e.g., via SVD), but we restrict our
attention
attention here to only
here to only the
the Jordan
Jordan blockblock case.
case. Note that A
Note that AXiA", == Xi
A*,- J/,,i , so
so by
by (9.19)
(9.19) the
the columns
columns
of A",
of (i.e., the
Xi (i.e., the eigenvectors
eigenvectors and and principal vectors associated
principal vectors associated withwith Ai) A.,) span
span an an A-invariant
A-invariant
subspace of]Rn.
of W.
Finally, we return to the problem of developing a formula formula for ee'l AA in the case that A A
x
nxn T
is not necessarily
is not diagonalizable. Let
necessarily diagonalizable. Let Yi7, E€ <eC" "' , bebe aa Jordan
Jordan basisbasis for for N
N(A (AT -— A.,/)"'.
A;lt.
Equivalently, partition
Equivalently, partition
9.5.
9.S. The Matrix Sign
The Matrix Sign Function
Function 91
91

compatibly. Then
compatibly. Then

A = XJX- I = XJy H
= [XI, ... , Xm] diag(JI, ... , Jm) [YI , ••• , Ym]H
m
H
= LX;JiYi .
i=1

In a similar fashion we can compute


m
etA = LXietJ;YiH,
i=1

which
which is
is aa useful
useful formula
formula when
when used in conjunction
used in conjunction with
with the
the result
result

A 0 0 eAt teAt .lt 2 e At


2!

0 A 0 eAt teAt
exp t A 0 0 0 eAt

1
0 0 A 0 0

for a k x k Jordan block


block 7, A == Ai.
associated with an eigenvalue A.
Ji associated A.,.

9.5
9.5 The Matrix
The Matrix Sign
Sign Function
Function
In this section
section we give a very brief
brief introduction to an interesting
interesting and useful
useful matrix function
function
called the matrix
called the sign function.
matrix sign function. It
It is
is aa generalization
generalization of
of the sign (or
the sign (or signum)
signum) of
of aa scalar.
scalar. A
A
survey of the matrix sign function and some of its applications can be found in [15].
Definition
Definition 9.40. E C with Re(z) ^f= O.
9.40. Let z E 0. Then the sign of
of z is defined
defined by

Re(z) {+1 ifRe(z) > 0,


sgn(z) = IRe(z) I = -1 ifRe(z) < O.

x
Definition 9.41. Suppose A E
Definition 9.41. e C" " has no eigenvalues on the imaginary axis, and let
cnxn

be aa Jordan
be Jordan canonical
canonicalform
form for
for A, with
with N
N containing
containing all
all Jordan
Jordan blocks
blocks corresponding
corresponding to
to the
the
eigenvalues of A in
eigenvalues of in the
the left
left half-plane
half-plane and
and P
P containing
containing all
all Jordan
Jordan blocks
blocks corresponding
corresponding toto
eigenvalues in
eigenvalues in the
the right
right half-plane.
half-plane. Then the sign
Then the sign of
of A, denoted sgn(A),
A, denoted sgn(A), is
is given
given by
by

sgn(A) = X [ -/ 0] 0 / X
-I
,
92
92 Chapter 9.
Chapter 9. Eigenvalues
Eigenvalues and
and Eigenvectors
Eigenvectors

where the negative and positive of the same dimensions as N and p,


positive identity matrices are of P,
respectively.
There are
There are other
other equivalent
equivalent definitions
definitions of
of the
the matrix sign function,
matrix sign function, but
but the one given
the one given
here is
here is especially
especially useful
useful in
in deriving
deriving many
many ofof its
its key
key properties. The JCF
properties. The JCF definition
definition of
of the
the
matrix sign function does not generally
generally lend itself
itself to reliable computation on a finite-word-
length digital computer. In fact, its reliable numerical calculation
calculation is an interesting topic in
its own right.
We state
We state some
some of the more
of the more useful properties of
useful properties of the matrix sign
the matrix sign function
function as
as theorems.
theorems.
Their straightforward proofs are left
Their left to the exercises.
exercises.
Theorem 9.42.
Theorem 9.42. Suppose A Ee C"nxnx
e
" has no eigenvalues on the imaginary axis, and let
S== sgn(A).
sgn(A). Then the following
following hold:
1. S is diagonalizable with eigenvalues equal to del.
± 1.
2. S2 =
2. S2 = I.
I.
3.
3. AS = SA.
AS = SA.
4. sgn(AH) =
4. sgn(A") = (sgn(A»H.
(sgn(A))".
l x
5. sgn(T-
5. sgn(T-1AT) T-lsgn(A)TforallnonsingularT
AT) = T-1sgn(A)T enxn
foralinonsingularT Ee C" "..
6. sgn(cA)
6. = sgn(c)
sgn(cA) = sgn(c) sgn(A)
sgn(A)/or
for all nonzero real scalars c.
c.
x
Theorem 9.43. Suppose A Ee e
Theorem 9.43. C"
nxn
" has no eigenvalues on the imaginary axis, and let
S= sgn(A). Then the following
— sgn(A). following hold:
1.
I. 7l(S — /) is an A-invariant
R(S -l) left half-plane
A-invariant subspace corresponding to the left half-plane eigenvalues
of A (the
of negative invariant
(the negative invariant subspace).
subspace).
R(S + l) is an A-invariant
2. R(S+/) A -invariant subspace corresponding to the right half-plane
half-plane eigenvalues
(the positive
of A (the
of invariant subspace).
positive invariant

3. negA ==
3. negA = (l
(/ -— S)
S)/2
/2 is a projection
projection onto the negative invariant subspace of A.
subspace of
4. posA == (/ +
= (l + S)/2 is a projection onto the positive
positive invariant subspace of
of A.
A.

EXERCISES
EXERCISES

1. Let A
1. e nxn
A Ee Cnxn have distinct
distinct eigenvalues AI, ...,, X
),.1> ••• right eigen-
),.nn with corresponding right eigen-
vectors
vectors Xi, and left
... ,,xXnn and
Xl, ... left eigenvectors
eigenvectors Yl,
y\, ••.
..., , Yn, respectively. Let
yn, respectively. Let v Ee en
C" be
be an
an
arbitrary vector.
arbitrary vector. Show
Show that
that vv can
can be
be expressed
expressed (uniquely)
(uniquely) as as aa linear
linear combination
combination

of the right eigenvectors. Find the appropriate expression


expression for v as a linear combination
of the left eigenvectors as well.
Exercises 93
93

x H
2. Suppose
2. Suppose A
A E rc nxn
€ C" " is
is skew-Hermitian,
skew-Hermitian, i.e.,
i.e., A
AH = —A.
= Prove that
-A. Prove all eigenvalues
that all eigenvalues of
of
aa skew-Hermitian
skew-Hermitian matrix must be pure imaginary.
matrix must be pure imaginary.
x
3. Suppose
3. Suppose A rc nxn
A Ee C" " is Hermitian. Let
is Hermitian. Let A A be
be an
an eigenvalue of A
eigenvalue of A with corresponding
with corresponding
right eigenvector x.
right eigenvector Show that
x. Show that xx is also aa left
is also left eigenvector
eigenvector for
for A.
A. Prove
Prove the
the same
same result
result
if
if A is skew-Hermitian.
A is skew-Hermitian.
5x5
4. Suppose a matrix A E€ lR.
R5x5 has eigenvalues {2,
{2, 2, 2, 2, 3}.
3}. Determine all possible
JCFs for
JCFs for A.
A.

5. Determine the eigenvalues,


5. eigenvalues, right eigenvectors
eigenvectors and
and right principal vectors if
if necessary,
and (real) JCFs of the following matrices:

2 -1 ]
(a) [ 1 0 '

6. Determine
Determine the
the JCFs of the
the following
following matrices:
matrices:

n
6. JCFs of

<a) Uj -2
-1
2 =n
7.
7. Let
Let

A = [H -1]· 1
2 2"
Find
Find aa nonsingular
nonsingular matrix
matrix X such that
X such that X
X-IAX = J,J, where
AX = where JJ is
is the
the JCF
JCF

J=[~0 0~ 1~].
r
Hint: Use[—
Hint: Use[-11 11 -— l] I]T as an
an eigenvector. The vectors [0 [0 1 -If— l] r and[l 0]r
and[1 0 of
(2) (1)
are both eigenvectors, but then the equation (A
(A -— /)jc
I)x(2) = x x(1) can't be solved.

8. Show
8. Show that all right
that all eigenvectors of
right eigenvectors of the
the Jordan
Jordan block
block matrix
matrix in Theorem 9.30
in Theorem 9.30 must be
must be
multiples of
multiples of el
e\ eE R*.
lR. k . Characterize all left
Characterize all left eigenvectors.
eigenvectors.
x T
9. Let AA eE R"
lR.nxn" be of the form A = xy
A = xyT,, where x,
x, y E R"
y e lR.n are nonzero vectors with
TT
O. Determine
xx yy = 0. Determine thethe JCF
JCF of
of A.
A.
xn T
10. Let
10. Let A
A eE R"
lR. nxn be of the
be of the form
form A A = / + xy
= 1+ xyT,, where
where x, y e
x, y E R" are nonzero
lR. n are nonzero vectors
vectors
TT
with = 0.
with x yy = O. Determine
Determine the JCF of
the JCF of A.
A.

11. Suppose a matrix A A Ee RlR. 16x


16x 16
16 has
has 16 eigenvalues
eigenvalues atat 00 and its
its JCF
JCF consists
consists of
of a single
single
Jordan block of the form
Jordan form specified
specified in Theorem 9.22.
9.22. Suppose
Suppose the small number 10- 10~16
16

is added
is added to
to the
the (16,1)
(16,1) element
element of of J.
J. What
What are
are the
the eigenvalues
eigenvalues of of this
this slightly perturbed
slightly perturbed
matrix?
matrix?
94 Chapter 9.
Chapter 9. Eigenvalues
Eigenvalues and
and Eigenvectors
Eigenvectors

12. Show
12. Show that
that every
every matrix
matrix A A E e R" x
jRnxn " can
can be
be factored
factored in in the
the form
form A A = SIS2, where Si
Si$2, where SI
and
and £2
S2 are real symmetric
are real symmetric matrices
matrices andand one
one of
of them,
them, say say S1, is nonsingular.
Si, is nonsingular.
Hint: Suppose A
Hint: Suppose A == XXl ~ l is
J XX-I is aa reduction
reduction ofof A to JCF
A to JCF andand suppose
suppose we we can
can construct
construct
the "symmetric
the "symmetric factorization"
factorization" of of 1. Then A =
J. Then = (X i X T ) ( X ~ T T S2X-I)
( X SSIXT)(X- S2X~l) would
would be the
required symmetric
required symmetric factorization
factorization of of A. Thus, it
A. Thus, it suffices
suffices to to prove
prove the
the result
result for
for the
the
JCF. The
JCF. The transformation
transformation P in (9.18)
P in (9.18) isis useful.
useful.
x
13. Prove
13. Prove that
that every
every matrix
matrix A Ee W " is
jRn xn is similar
similar to
to its
its transpose
transpose and
and determine
determine aa similarity
similarity
transformation explicitly.
transformation explicitly.
Hint: Use the
Hint: Use the factorization
factorization in
in the
the previous
previous exercise.
exercise.

14. Consider
14. Consider the
the block
block upper triangular matrix
upper triangular matrix

A _ [ All Al2 ]
- 0 A22 '
xn kxk
where A Ee M" and A
jRnxn and All e R
n E 1 ::s:
jRkxk with 1 < k < ::s: n. Suppose A
n. Suppose and that we
^ 0 and
u =1=
Al2 we
want to
want to block diagonalize A
block diagonalize via the
A via similarity transformation
the similarity transformation

R*x <«-*), i.e.,


X Ee IRkx(n-k),
where X

T-IAT = [A011 0 ]
A22 .

Find aa matrix
Find matrix equation
equation that
that X must satisfy
X must satisfy for
for this
this to
to be
be possible. If nn =
possible. If = 22 and
and kk =
= 1,
1,
what can
what can you say further,
you say further, in
in terms
terms of
of AU and A
All and 22, about
A22, about when
when the
the equation
equation for
for X is
is
solvable?
solvable?

15. Prove
15. Theorem 9.42.
Prove Theorem 9.42.

16. Prove
16. Prove Theorem
Theorem 9.43.
9.43.
17. Suppose
17. Suppose AA Ee C"xn en
has all
xn has all its
its eigenvalues
eigenvalues in
in the
the left
left half-plane.
half-plane. Prove
Prove that
that
sgn(A) =
sgn(A) = -1.
-/.
Chapter 10
Chapter 10

Canonical Forms
Canonical Forms

10.1
10.1 Some Basic
Some Basic Canonical
Canonical Forms
Forms
Problem:
Problem: Let V and
Let V and WW be
be vector spaces and
vector spaces and suppose
suppose A A :: V
V —>•
---+ W
W is
is aa linear
linear transformation.
transformation.
Find
Find bases
bases in V and
in V and W W with
with respect
respect to to which
which Mat
Mat A A has
has aa "simple
"simple form" or "canonical
form" or "canonical
form."
form." In
In matrix
matrix terms,
terms, if
if A IR mxn
A eE R mxn
,, find R™ xm and
find P eE lR;;:xm and QQ eE R n xn
lR~xn
n such that
such P AQ has
that PAQ has aa
"canonical
"canonical form."
form." The
The transformation
transformation A A M»
f--+ PAQ
P AQ isis called
called an equivalence; it
an equivalence; is called
it is called an
an
orthogonal
orthogonal equivalence
equivalence if P and
if P and Q areare orthogonal
orthogonal matrices.
matrices.
Remark 10.1. We
Remark 10.1. can also
We can also consider
consider thethe case
case A emmxn
A eE C
xn
and unitary
and unitary equivalence
equivalence if P and
if P and
<2
Q are
are unitary.
unitary.
Two special cases
Two special cases are
are of
of interest:
interest:

1. If
1. If W = V
V and
and <2 P"11,, the
Q == p- thetransformation
transformation AAf--+ PAP" 1 isiscalled
H>PAP-I similarity.
calledaasimilarity.
T T
If W =
2. If = VV and if Q = P
and if pT is orthogonal, the
is orthogonal, the transformation
transformation A
A i-»
f--+ PAP
P ApT is called
is called
an orthogonal
an similarity (or
orthogonal similarity (or unitary similarity in
unitary similarity in the
the complex
complex case).
case).

The
The following
following results
results areare typical
typical ofof what
what can can be achieved under
be achieved under aa unitary similarity. If
unitary similarity. If
A = A
A = AHH E C"xxn
6 en " has eigenvalues AI,
has eigenvalues . . . ,, A
AI, ... An,n, then there exists
then there exists aa unitary
unitary matrix
matrix U£7 such
suchthat
that
UHHAU
U AU =— D, D, where
where D diag(A.j,...,
D == diag(AJ, A. n ). This
... , An). This is
is proved
proved in in Theorem
Theorem 10.2.
10.2. What
What other
other
matrices
matrices are
are "diagonalizable"
"diagonalizable" under under unitary similarity? The
unitary similarity? The answer
answer isis given
given in Theorem
in Theorem
10.9, where
10.9, where it it is
is proved
proved that
that aa general
general matrix
matrix A A eE e x
C"nxn" is
is unitarily similar to
unitarily similar to aa diagonal
diagonal
H H
matrix if and
matrix if and only
only if
if it
it is
is normal
normal (i.e.,
(i.e., AA
AA H = = A A). Normal
AHA). Normal matrices
matrices include
include Hermitian,
Hermitian,
skew-Hermitian,
skew-Hermitian, and and unitary
unitary matrices
matrices (and (and their
their "real"
"real" counterparts: symmetric, skew-
counterparts: symmetric, skew-
symmetric,
symmetric, andand orthogonal,
orthogonal, respectively),
respectively), as as well
well asas other
other matrices
matrices that
that merely satisfy the
merely satisfy the
definition,
definition, such
such asas A
A= = [[_~ a
!]
_ b ^1 for for real scalars aa and
real scalars and b.
h. IfIf aa matrix
matrix A A is
is not
not normal,
normal, the
the
most "diagonal" we
most "diagonal" we can get is
can get is the
the JCF
JCF described
described in in Chapter
Chapter 9. 9.
x
Theorem 10.2.
Theorem 10.2. Let A = = AHH eE C"en xn " have (real) eigenvalues A.I, . . . ,,An.
AI, ... Xn. Then there
HH
exists
exists aa unitary
unitary matrix
matrix X
X such
such that
that X
X AX AX == D
D= = diag(A.j, . . . ,, X
diag(Al, ... n) (the
An) (the columns
columns ofX
of X are
are
orthonormal for A).
eigenvectors for
orthonormal eigenvectors A).

95
95
96
96 Chapter 10.
Chapter 10. Canonical
Canonical Forms
Forms

Proof: Let x\
Proof' XI be a right eigenvector
eigenvector corresponding
corresponding to X\,AI, and normalize it such that x~
xf*x\ =
XI =
1. Then there
1. Then exist n
there exist n -— 11 additional
additional vectors
vectors xX2, . . . ,, xXnn such
2, ... such that
that X = (XI,
[x\,...,
... , xn] =
xn] =
[x\ X22]] is unitary. Now
[XI

XHAX =[ xH
I
XH ] A [XI
2
X 2] =[ x~Axl
XfAxl
X~AX2
XfAX 2 ]
X~AX2
=[ Al
0 XfAX 2 ] (10.1)

=[ Al
0 XfAX z
0
l (10.2)

In (l0.1)
In (10.1) we have used
we have used the fact that
the fact that Ax\
AXI = = AIXI.
k\x\. When When combined
combined with
with the
the fact
fact that
that
x~
x"xiXI =
= 1,1, we get A-i
Al remaining in the (l,I)-block.
(l,l)-block. We also get 0 in the (2,l)-block
(2, I)-block by
noting that x\ orthogonal to all vectors in X
XI is orthogonal 2. In (10.2), we get 0 in the (l,2)-block
Xz. (l,2)-block by
H
noting that X AX is Hermitian. The proof
XH AX proof is completed easily by induction
induction upon noting
that the (2,2)-block
(2,2)-block must have eigenvalues A2, ... , A.
X2,..., An.n . D
0

Given a unit vector x\ XI Ee JRn, ]R"X("-1) such that X


E", the construction of X2z Ee JRnx(n-l) X— =
[x\ X22]] is orthogonal
[XI frequently required. The construction can actually be performed
orthogonal is frequently
quite easily by means of Householder
Householder (or Givens) transformations
transformations as in the proof
proof of the
following general
following general result.
result.
10.3. Let
Theorem 10.3. X\ E
Let XI e CCnxknxk
have orthonormal
have orthonormal columns
columns and
and suppose
suppose VU is
is a unitary
a unitary
matrix such
matrix such that
that V
UX\
XI == [\ ~],
0 1, where
where RR €E Ckxk
kxk
is
is upper
upper triangular.
triangular. Write V HH = [U\
Write U [VI U Vz]]
2

with Ui €C
VI E Cnxk
nxk
. Then [XI
[Xi V
U2]] is unitary.
Proof: Let
Proof: Let X\
X I = [x\,..., Xk]. Construct
[XI, ... ,xd. Construct aa sequence
sequence of
of Householder
Householder matrices (also known
matrices (also known
as elementary reflectors) H\,..., Hkk in the usual way (see below) such that
HI, ... , H

Hk ... HdxI, ... , xd = [ ~ l


where R is upper triangular (and nonsingular since x\, ..., , Xk
XI, ... U=
Xk are orthonormal). Let V =
H k...H
Hk'" v. Then
HI. UH =
Then VH = /HI'"
/,-•• H
Hkk and
and

Then x^U H
2 =
X i U2 = 0 (i(/ E
€ ~)
k) means that xXif is orthogonal to each of the n —
- kk columns of V2.
U2.
But the latter are orthonormal since they are the last n -— kk rows of the unitary matrix U.U.
Thus, f/2] is unitary.
[Xi U2]
Thus. [XI unitary. D
0

called for in Theorem 10.2 is then a special case of Theorem


The construction called Theorem 10.3 10.3
for kk = 1.
1. We illustrate the construction of the necessary Householder matrix for kk — = 1.
For simplicity, XI be denoted by [~I,
simplicity, we consider the real case. Let the unit vector x\ [£i, .. %n]T.
. . ,. ,, ~nf.
10.1. Some
10.1. Some Basic
Basic Canonical Forms
Canonical Forms 97

Then
Then the
the necessary
necessary Householder
Householder matrix matrix needed
needed forfor the
the construction
construction of is given
X^2 is
of X given byby
+ TT r
U = I -—2uu+
2uu = I — u-^UU
- +uuu , where u
, where u = [t-\ 1, £2, • • •» £«] - It can
[';1 ± 1, ';2, ... , ';nf. It can easily be checked
checked
that U
U is symmetric
symmetric and U UTTUU = = UU22 == I,
I, so U U is orthogonal.
orthogonal. To see that U effects the
U effects
necessary
necessary compression
compression of
of jci,
Xl, it is easily
it is easily verified
verified that
that U u TTU
u == 2± 2£i and
± 2';1 u TTX\
and U Xl == 11 ±
± £1.
';1.
Thus,

Further details on Householder matrices, including the choice of sign and the complex case,
can be consulted
consulted in standard
standard numerical
numerical linear algebra texts such as [7],
linear algebra [7], [11],
[11], [23],
[23], [25].
[25].
The real version of Theorem 10.2
10.2isisworth
worthstating
statingseparately
separately since
sinceititisisapplied
appliedfre-
fre-
quently
quently in
in applications.
applications.
10.4. Let
Theorem 10.4. A = A
Let A AT T
eE E nxn
jRnxn have
have eigenvalues
eigenvalues k\, ... ,X
AI, ... n. Then
, An. Then there exists an
there exists an
lxn
orthogonal matrix X eE W (whose columns are orthonormal eigenvectors of
jRn xn (whose of A) such that
T
X
XT AX
AX =
= D
D== diag(Xi,
diag(Al, ....
. . , An).
X n ).
Note that Theorem 10.4 implies that a symmetric matrix A
A (with the obvious analogue
from
from Theorem 10.2for
Theorem 10.2 forHermitian
Hermitian matrices)
matrices) can
canbe
bewritten
written

n
T
A = XDX = LAiXiXT, (10.3)
i=1

often called the spectral


which is often spectral representation of A. In fact, A in (10.3) is actually a
weighted sum of orthogonal projections P, Pi (onto the one-dimensional
one-dimensional eigenspaces
eigenspaces corre-
sponding
sponding to
to the
the A., 's),i.e.,
Ai'S), i.e.,
n

A = LAiPi,
i=l

where
where P, = PUM
Pi = —xxiXt
PR(x;) = ixf ==xxixT
ixj since
sincexjxTxi =1.1.
Xi —
The following pair of theorems form the theoretical
theoretical foundation of the double-Francis-
double-Francis-
QR algorithm used to compute matrix eigenvalues in a numerically stable and reliable way.
98 Chapter 10.
Chapter 10. Canonical Forms
Canonical Forms

Theorem 10.5 (Schur).


Theorem 10.5
H
(Schur). Let cnxn
A eE C"
Let A x
". . Then
Then there
there exists
exists a
a unitary
unitary matrix U such
matrix U such that
that
U H AU
U AU == T,
T, where
where TT is
is upper
upper triangular.
triangular.
Proof: The proof
Proof: The proof of
of this
this theorem
theorem is
is essentially
essentially the
the same as that
same as that of Theorem 10.2
of Theorem lO.2 except
except that
that
in
in this
this case
case (using
(using the
the notation U rather
notation U rather than X) the
than X) (l,2)-block wf AU2
the (l,2)-block AU2 is ur
is not
not 0.
O. D
0

In
In the
the case
case of
of A
A E IRn xxn
e R" ",, it
it is
is thus
thus unitarily similar to
unitarily similar to an
an upper
upper triangular
triangular matrix,
matrix, but
but
if A has
if A has aa complex
complex conjugate
conjugate pairpair of
of eigenvalues,
eigenvalues, then complex arithmetic
then complex arithmetic isis clearly
clearly needed
needed
to
to place
place such
such eigenvalues
eigenValues on on the
the diagonal
diagonal of T. However,
of T. However, thethe next
next theorem
theorem shows
shows that
that every
every
xn
A
A eE WIRnxn is
is also
also orthogonally
orthogonally similar
similar (i.e.,
(i.e., real
real arithmetic)
arithmetic) to quasi-upper-triangular
to aa quasi-upper-triangular
matrix.
matrix. A quasi-upper-triangular matrix
A quasi-upper-triangular matrix is
is block
block upper
upper triangular
triangular with
with 11 xx 11 diagonal
diagonal
blocks corresponding to
blocks corresponding to its
its real
real eigenvalues
eigenvalues andand 2x2
2 x 2 diagonal
diagonal blocks corresponding to
blocks corresponding to its
its
complex
complex conjugate
conjugate pairs
pairs of
of eigenvalues.
eigenvalues.
10.6 (Murnaghan-Wintner). Let
Theorem 10.6 Let AA E IR n xxn.
e R" ". Then
Then there
there exists
exists an orthogonal
an orthogonal
T
matrix U such
matrix U such that U AU
that U T
= S,
AU = S, where
where S
S is quasi-upper-triangular.
is quasi-upper-triangular.
Definition 10.7. The
Definition 10.7. The triangular
triangular matrix
matrix TT in
in Theorem 10.5 is
Theorem 10.5 called aa Schur
is called Schur canonical
canonical
form
form or
or Schur
Schur form. The quasi-upper-triangular
fonn. The quasi-upper-triangular matrix S in
matrix S in Theorem
Theorem 10.610.6 is
is called real
called aa real
Schur canonical
Schur canonical form
form oror real
real Schur
Schur form (RSF). The
fonn (RSF). The columns
columns of of aa unitary [orthogonal]
unitary [orthogonal}
matrix
matrix UU that
that reduces
reduces a
a matrix to [real]
matrix to [real} Schur
Schur form
fonn are
are called Schur vectors.
called Schur vectors.
Example 10.8.
10.8. The
The matrix
matrix

is
is in
in RSF.
RSF. Its
Its real
real JCF
JCF is
s~ [ -20
-2 5
4
0 n
n
is

h[ -1
1

0 0
1

Note
Note that
that only
only the
the first
first Schur
Schur vector
vector (and
(and then
then only
only ifif the corresponding first
the corresponding first eigenvalue
eigenvalue
is
is real
real if is orthogonal)
U is
if U orthogonal) is is an
an eigenvector.
eigenvector. However,
However, what
what is is true,
true, and sufficient for
and sufficient for virtually
virtually
all
all applications
applications (see,
(see, for
for example, [17]), is
example, [17]), is that
that the
the first Schur vectors
first k Schur vectors span
span the same A-
the same
invariant subspace as
invariant subspace as the eigenvectors corresponding
the eigenvectors corresponding to to the
the first eigenvalues along
first k eigenvalues along the
the
diagonal
diagonal of of TT (or S).
(or S).
While
While every
every matrix
matrix cancan bebe reduced
reduced toto Schur form (or
Schur form (or RSF),
RSF), itit is
is of
of interest
interest to know
to know
when
when we can go
we can go further and reduce
further and reduce aa matrix
matrix via
via unitary similarity to
unitary similarity to diagonal
diagonal form. The
form. The
following
following theorem
theorem answers
answers this
this question.
question.
A eE c
x
Theorem 10.9.
Theorem 10.9. A A matrix
matrix A C"nxn " is
is unitarily
unitarily similar
similar to
to a
a diagonal
diagonal matrix
matrix ifif and
and only
only if
if
H HH
A
A is
is normal
normal (i.e.,
(i.e., A
AHAA == AAAA ).).
Proof: Suppose
Proof: U is
Suppose U is aa unitary
unitary matrix
matrix such
such that UHH AU
that U D, where
= D,
AU = D is
where D is diagonal.
diagonal. Then
Then

AAH = U VUHU VHU H = U DDHU H == U DH DU H == AH A


so
so A is normal.
A is normal.
10.2. Definite
10.2. Matrices
Definite Matrices 99

Conversely, suppose AA is normal and let U UHHAU


U be a unitary matrix such that U T,
A U = T,
where T
T is an upper triangular matrix (Theorem
(Theorem 10.5). Then

It is then a routine exercise to show that T


It T must, in fact, be diagonal. D
0

10.2
10.2 Definite Matrices
Definite Matrices
xn
Definition 10.10. A
Definition 10.10. A symmetric
symmetric matrix
matrix A
A E Wnxn
e lR. is

1. positive definite if
positive definite if and
and only ifxxTTAx
only if > 0Qfor
Ax > nonzero xx G
all nonzero
for all Wn1.. We
E lR. We write
write A > 0.
A > O.

2. nonnegative definite (or positive semidefinite) if


(or positive if and only if
and only x TT Ax
if X > 0 for
Ax :::: all
for all
nonzero xx Ee lR.
nonzero W. • We
n
write A
We write > 0.
A :::: O.

negative definite if
3. negative - A is positive
if—A definite. We
positive definite. write A
We write < 0.
A < O.

4. nonpositive definite (or


(or negative
negative semidefinite) if
if—-A
A is nonnegative
nonnegative definite.
definite. We
We
write A
write < 0.
A ~ O.

Also, if
Also, if A and B
A and are symmetric
B are symmetric matrices,
matrices, we we write
write AA > B if
> B if and if AA -— BB >> 0 or
only if
and only or
B
B —- AA << 0.
O. Similarly,
Similarly, we
we write
write A > B
A :::: B ifif and
and only if A —
only ifA - B>QorB - A
B :::: 0 or B — A < ~ 0.
O.

Remark 10.11.
Remark 10.11. If A e
A Ee C" nxn
x
" is Hermitian, all the above definitions hold except that
superscript H
superscript //ss replace T s. Indeed, this is generally true for all results in the remainder of
Ts. of
this section that may be stated in the real case for simplicity.

Remark 10.12. If
If a matrix is neither
neither definite nor semidefinite, indefinite.
semidefinite, it is said to be indefinite.

Theorem 10.13. Let


Let A = A
A = AHH
Cnxn
eE e nxn
with eigenvalues
with eigenvalues X > A
AI{ :::: > ...
A22 :::: • • • ::::
> A An.n. Thenfor
Then for all
all
x eC",
E en,

Proof:
Proof: Let U U be a unitary matrix that diagonalizes
diagonalizes A 10.2. Furthermore,
A as in Theorem 10.2. Furthermore,
let yv = U UHHx, CM, and denote the components of y by
x, where x is an arbitrary vector in en,
€ n. Then
j]i, ii En.
11;, Then
n

x HAx = (U HX)H U H AU(U Hx) = yH Dy = LA; 111;12.


;=1

But clearly
n

LA; 11'/;12 ~ AlyH Y = AIX HX


;=1
100
100 Chapter 10.
Chapter 10. Canonical
Canonical Forms
Forms

and
and
n

LAillJilZ::: AnyHy = An xHx ,


i=l

from which the theorem follows. D


0
Remark
Remark 10.14. The ratio ^^ XHHAx for A == A AHH
E eC
<= nxn
nxn
nonzerox jcEeen
andnonzero
and C"isiscalled
calledthe
the
x x
Rayleigh quotient of of jc.
x. Theorem
Theorem 1O.l310.13 provides
provides upper (AO(A 1) and lower (An)
(A.w) bounds for
the Rayleigh quotient. If A = = A H
AH eE C" x
enxn " is positive definite, Xx HHAx > 0 for all nonzero
Ax >
x E
E C",soO
en, so 0 < XAnn <::::: •••
... <
::::: A.I.
AI.
I
x H
Corollary 10.15. Let
Corollary A e
Let A enxn
E C" ". . Then
Then IIAII2
\\A\\2 =
=^ m(A
Ar1ax(AH A}.
A).
Proof: For all x €E en
Proof: C" we have

Let jc
Let x be
be an
an eigenvector
eigenvector corresponding
corresponding to
to X (AHHA).
max(A
Amax A). Then 111~~1~22 = ^^(A"
Then ^pjp Ar1ax (A HA),
A), whence
whence

IIAxll2 ! H
IIAliz = max - - = Amax{A A). 0
xfO IIxll2

Definition
Definition 10.16. A principal submatrix of an nxn
submatrixofan n x n matrix A is the (n — k)x(n
-k) x (n — k) matrix
-k)
that remains by deleting k rows and the corresponding k columns. A leading principal
submatrix of
of order n —
- k is obtained
obtained by deleting the last k rows and
and columns.
x
Theorem 10.17. A symmetric matrix A eE E" " is positive
~nxn definite ififand
positive definite only ififany
and only any of
ofthe
the
following
following three equivalent
equivalent conditions hold:

1. The determinants of all leading principal


determinants of principal submatrices of
of A are positive.
positive.

2. All
All eigenvalues
eigenvalues of
of A
A are positive.
are positive.

3. A can be written in the form


form M
MTT
~n xxn
M, where M eE R" " is nonsingular.
x
Theorem 10.18. A symmetric matrix A €E R" ~n xn definite if
" is nonnegative definite if and only if
and only if any
of
of the following
following three equivalent
equivalent conditions hold:

of all
1. The determinants of all principal
principal submatrices
submatrices of
of A are nonnegative.

2.
2. All eigenvalues of
All eigenvalues of A
A are nonnegative.
are nonnegaTive.
T ix
3. A
3. can be
A can be written in the
wrirren in form M
[he/orm MT M,
M, where M 6
where M E R
IRb<n" and
and kk >
~ rank(A) — rank(M).
ranlc(A) "" ranlc(M).

Remark 10.19. Note


R.@mllrk 10.19. that the determinants of all principal "ubm!ltriC[!!l
Not@th!ltthl!dl!termin!lntl:ofnllprincip!ll eubmatrioesmu"tmuetbB
bQnonnBgmivB
nonnogativo
in Theorem 10.18.1, not just those of the leading principal submatrices. For example,
Theorem 10.18.1,
consider
consider the
the matrix
matrix A
A —= [[~0 _l~].
1. The
The determinant
determinant ofof the
the 1x1
I x 1 leading submatrix is
leading submatrix is 0
0 and
and
the determinant
determinant of the 2x2
2 x 2 leading submatrix is also 0 (cf. Theorem
0 (cf. Theorem 10.17).
10.17). However, the
10.2. Definite
10.2. Definite Matrices
Matrices 101
101

principal submatrix consisting


principal submatrix of the
consisting of (2,2) element
the (2,2) element is,
is, in
in fact,
fact, negative
negative and
and A is nonpositive
is nonpositive
definite.
Remark 10.20. The
Remark 10.20. The factor
factor M in Theorem
M in 10.18.3 is
Theorem 10.18.3 is not
not unique.
unique. For example, if
For example, if

then
then M can be
M can be

[1 0], [ fz
-ti
o
o l [~~ 0]
v'3 0
0 , ...

Recall
Recall that
that A > B
A :::: if the
B if the matrix
matrix A
A -— B
B is
is nonnegative definite. The
nonnegative definite. The following
following
theorem is useful
theorem is useful in "comparing" symmetric
in "comparing" symmetric matrices.
matrices. Its
Its proof is straightforward
proof is straightforward from
from
basic
basic definitions.
definitions.
nxn
Theorem 10.21. Let A, B eE R
jRnxn be symmetric.

nxm T
1. 1f
If A ::::
>BandMe
Band M E R
jRnxm,, then M
MT AM > MT
AM :::: MTBM.
BM.
nxm T
2. If
2. Ifj A
A>>B and M eE R
Band jR~xm,
m , then M
MT AM
AM> M. TBM.
> MT BM.

The following standard


standard theorem
theorem is stated
stated without proof (see, for
proof (see, for example,
example, [16, [16,p.p.
xn
nxn
181]). It concerns
181]). concerns the the notion
notion ofof the
the "square
"square root"
root" of
of aa matrix.
matrix. That
That is,
is, if
if A E € lR.
E" ,,we
wesay
say
nx
that
that S Ee R squareroot
jRn xn"isisa asquare rootofofAAififS2S2 =—A.A. InIngeneral,
general,matrices
matrices(both
(bothsymmetric
symmetricand and
nonsymmetric)
nonsymmetric) havehave infinitely
infinitely many
many square
square roots.
roots. For example, if
For example, if A == lz,/2, any
any matrix
matrix S ofof
the "
form [ c e
°*
[COSO _ s 9 .

Sino]] is a square
the 10rm ssinOe _ ccosOe IS a square root. root.
x
nxn
Theorem
Theorem 10.22.10.22. Let AA Ee lR.
R" "be
benonnegative
nonnegativedefinite.
definite. Then
ThenAAhas
hasaaunique
uniquenonnegative
nonnegative
definite
definite square root S. S. Moreover, SA = = AS rankS =
AS and rankS = rank
rankAA (and hence S
S is positive
definite ifif A is positive
definite definite).
positive definite).
A stronger form
A stronger form of the third
of the third characterization in Theorem
characterization in Theorem 10.17
10.17 is
is available and is
available and is
known as
known as the Cholesky factorization.
the Cholesky factorization. It
It is
is stated
stated and
and proved
proved below for the
below for the more
more general
general
Hermitian case.
Hermitian case.
Theorem 10.23. Let A eE c
Theorem 10.23. nxn
<Cnxn be Hermitian and positive definite. Then there exists a
positive definite.
unique nonsingular
nonsingular lower triangular matrix L with positive
positive diagonal elements such that
H
A== LL
LLH. .
Proof: The
Proof: The proof
proof is
is by
by induction.
induction. The case n =
The case = 1 is trivially true.
is trivially true. Write
Write the
the matrix
matrix A
A in
in
the
the form
form

By our
By our induction
induction hypothesis,
hypothesis, assume
assume the
the result
result is
is true
true for matrices of
for matrices of order
order n -— 11 so
so that B
that B
may be written as
as B =
= L\L^,
L1Lf, where L\ C1-""1^""^ is nonsingular and
Ll eE c(n-l)x(n-l) and lower triangular
102
102 Chapter 10.
Chapter 10. Canonical
Canonical Forms
Forms

with positive diagonal elements. It


It remains to prove that we can write the n x n matrix A
in the
in form
the form

b ] = [Lc J
ann
0 ]
a
[Lf0 c
a
J,
where a is positive. Performing the indicated matrix multiplication
multiplication and equating the cor-
responding submatrices, we see that we
we see we must have L\c =b
L IC = and a
b and =C
nn =
ann cH
H
cC + aa22.• Clearly
by c = C,lb.
c is given simply by Substituting in
L^b. Substituting in the
the expression involving
involving a, we we find
find
a22 = ann -— bbHL\
= ann H
LIH L\lb =
L11b ann -— bbHH B-1b
= ann B~lb (= the Schur complement of B B in A). But we
A). But
know that

o < det(A) = det [ ~ b ] = det(B) det(a nn _ b H B-1b).


ann

H l
det(fi) >
Since det(B) > 0, we must have a - bH
nn —b
ann B b >
B-1b > 0.
O. Choosing
Choosing aa to be
be the positive square
root of
of «„„ H
ann -— bb B~ l
B-1b b completes
completes the proof. D
0

10.3
10.3 Equivalence Transformations
Equivalence Transformations and
and Congruence
Congruence
Theorem 10.24. Let A €E C™* 71
c;,xn. . Then C™xm
exist matrices P Ee C:
Then there exist xm
and Q eE C" x
n " such
c~xn such
that
that

PAQ=[~ ~l (l0.4)

Proof: A classical proof


Proof: proof can be consulted in, for example, [21,
[21,p.p.131].
131].Alternatively,
Alternatively,
suppose A has an SVD of the form (5.2) in its complex version. Then

[
S-l
o 0 ] [
I
U
Uf
]
AV =
[I0 0 ]
0 .

Take P =[ S~ 'f [I ] and Q = V to complete the proof. 0

Note that the greater freedom afforded


afforded by the equivalence transformation of Theorem
10.24, as opposed to the more restrictive situation of a similarity transformation, yields a
far "simpler" canonical form (10.4). However, numerical procedures
procedures for computing such
an equivalence directly via, say, Gaussian or elementary row and column operations, are
generally unreliable. The numerically preferred equivalence is, of course, the unitary
unitary equiv-
alence known as the SVD. However, the SVD is relatively expensive to compute and other
canonical forms exist that are intermediate between (l0.4)
(10.4) and the SVD; see, for example
[7, Ch. 5], [4, Ch. 2]. Two such forms are stated here. They are more stably computable
than (lOA)
(10.4) and more efficiently
efficiently computable than a full SVD. Many similar results are also
available.
available.
10.3. Equivalence
10.3. Equivalence Transformations and Congruence
Transformations and Congruence 103
103

x
Theorem
Theorem 10.25
10.25 (Complete Orthogonal Decomposition).
(Complete Orthogonal Decomposition). Let A Ee C™
Let A e~xn.". Then
Then there
there exist
exist
mxm nxn
unitary matrices
unitary matrices UU eE C
e
mxm
and V Ee C
and V e
nxn
such
such that
that

(10.5)

where €,rrxr is
R Ee e;xr
where R is upper (or lower)
upper (or lower) triangular
triangular with
with positive
positive diagonal elements.
diagonal elements.
Proof: For the
Proof: For the proof, see [4].
proof, see [4]. D
0
x mxm
Theorem
Theorem 10.26.
10.26. Let
Let AA eE C™
e~xn.". Then
Then there
there exists
exists a
a unitary
unitary matrix Q Ee C
matrix Q e mxm and
and aa
x
permutation
permutation matrix Fl e C" "
IT E en xn such that

QAIT = [~ ~ l (10.6)

r xr rx( r)
where R E
E C
e;xr upper triangular and S eE C
r is upper " is arbitrary
erx(n-r) arbitrary but in general
general nonzero.
nonzero.
Proof: For the
Proof: For the proof, see [4].
proof, see [4]. D
0
Remark 10.27.
Remark 10.27. When A
When A has full column
has full column rankrank but is "near"
but is "near" aa rank
rank deficient matrix,
deficient matrix,
various rank
various revealing QR
rank revealing QR decompositions
decompositions are are available
available that
that can sometimes detect
can sometimes detect such
such
phenomena
phenomena atat aa cost
cost considerably
considerably less
less than
than aa full
full SVD.
SVD. Again,
Again, see [4] for
see [4] for details.
details.
nxn n xn H
Definition 10.28. Let A eE C
Definition 10.28. X e
e nxn and X E C The transformation A i->
n . The
e~xn. H- XXH AX
AX is called
congruence. Note
aa congruence. Note that
that aa congruence
congruence is
is aa similarity
similarity if
if and
and only
only if
ifXX is
is unitary.
unitary.
Note
Note that congruence preserves
that congruence preserves the
the property
property of
of being
being Hermitian; i.e., if
Hermitian; i.e., if A is Hermitian,
A is Hermitian,
then XHH AX
then X AX isis also
also Hermitian. It is
Hermitian. It is of interest to
of interest to ask
ask what other properties
what other properties ofof aa matrix
matrix are
are
preserved
preserved under
under congruence. It turns
congruence. It out that
turns out that the
the principal
principal property
property so
so preserved
preserved is is the
the sign
sign
of each
of each eigenvalue.
eigenvalue.
H x
Definition 10.29. Let
Definition 10.29. Let A
A ==A AH eE C"
e
nxn
" and
and let
let 7t,
rr, v, and £
v, and ~ denote the numbers
denote the numbers of positive,
of positive,
negative, and
negative, and zero eigenvalues,
eigenvalues, respectively,
respectively, of
of A. Then the inertia
A. Then of A is
inertia of is the
the triple of
of
In(A) = (rr,
numbers In(A)
numbers v, n
(n, v, £). The signature of
The signature is given by sig(A)
of A is sig(A) == nrr -— v.
v.
Example 10.30.
Example 10.30.

l.In[!
o
1 o
o
0
0]
00
-10 =(2,1,1).
0
2. If A
2. If AH Ee Ce nnxn
A = A" x
" , ,t hthen
e n AA>
> 00 if
if and
and only if In
only if (A) =
In(A) (n,0,0).
= (n, 0, 0).

3. If In(A) (TT, v,
In(A) = (rr, n, then
v, £), then rank(A) rr + v.
rank(A) = n v.
Theorem
Theorem 10.31
10.31 (Sylvester's
(Sylvester's Law Inertia). Let A = A HHE
of Inertia).
Law of Cnxn
e en xn and X e
E C
n xn
e~ nxn.. Then
H
In(A) == In(X
In(A) ln(X AX).
H AX).

Proof: For
Proof: For the
the proof, see, for
proof, see, example, [21,
for example, [21, p. 134]. D
p. 134]. D
Theorem 10.31
Theorem 10.31guarantees
guaranteesthat
thatrank
rankand
andsignature
signatureofofa amatrix
matrixare
arepreserved
preservedunder
under
congruence. We
congruence. We then
then have
have the following.
the following.
104
104 Chapter 10. Canonical
Chapter 10. Canonical Forms
Forms

Theorem 10.32. Let A = A AHH


eE c xn
nxn
C" In(A) =
with In(A) = (jt,
(Jr, v, £). Then there exists a matrix
v, O.
xn H
X e C"n
X E c~xn such that X AX = diag(l, ....
XH AX = diag(1, . . ,, 1,
I, -1,...,
-I, ... , -1,
-1,0,0, ....
. . ,0),
, 0),where
wherethe
thenumber
number of
of
1's's is
is 7i, the number of — l's is v,
Jr, the number of -I 's is v, and the number 0/0 's is (,.
the numberofO's is~.

Proof: Let AI
Proof: AI,, ...
. . . ,, X
Anw denote the eigenvalues of of A and order them such that the first TT
Jr are
positive, the next v are negative, and the final £~ are 0. O. By Theorem
Theorem 10.2 there exists a unitary
matrix VU such that VH UHAU AV = diag(Ai, ...
= diag(AI, . . . ,, An).
A w ). Define
Define the
thenn xx nnmatrix
matrix

vv = diag(I/~, ... , I/~, 1/.f-Arr+I' ... , I/.f-Arr+v, I, ... ,1).


X =V
Then it is easy to check that X W yields the desired
U VV desired result. D
0

10.3.1
10.3.1 Block matrices and definiteness
T
Theorem 10.33. Suppose A =
=AAT and D
D= DT. Then
= DT.

if and only ifeither


ifand if either A
A> °
> 0 and
and D BT A~
D -- BT A-Il B > 0, or D
> 0, > 0 and
D > and A -- BD^B T
BD- I BT > 0.
> O.
Proof: The proof
Proof: proof follows by considering, for example, the congruence

B ]
D ~
[I
0
_A-I B
I
JT [ A
BT ~ ][ ~
The details are straightforward and are left
left to the reader. D
0
Remark
Remark 10.34. Note the symmetric Schur complements of A (or D) in the theorem.
Theorem 10.35. Suppose A = AT
AT D =D
and D T
. Then
DT.

B ] >
D - °
if and only if
if ifA>0,
A:::: 0, AA +
AA+B = B,
B = and D
B. and BT A
D -- BT +
B > 0.
A+B:::: o.
Proof: Consider the congruence with
Proof: Consider

and proceed as in the proof


proof of Theorem
Theorem 10.33. D
0

10.4
10.4 Rational Canonical
Rational Canonical Form
Form
One final canonical form to be mentioned is the rational
rational canonical form.
10.4. Rational
10.4. Rational Canonical
Canonical Form
Form 105
105

n x
Definition
Definition 10.36. A
A matrix
matrix A e lR
A E M" Xn" is
is said to be
said to if its minimal
be nonderogatory ifits minimal polynomial
polynomial
and
and characteristic
characteristic polynomial
polynomial are the same
are the or, equivalently,
same or; equivalently, if
if its
its Jordan
Jordan canonical form
canonical form
has only one
has only one block associated with
block associated with each distinct eigenvalue.
each distinct eigenvalue.
Suppose A
Suppose Wnxn
A EE lR xn
is
is aa nonderogatory matrix and
nonderogatory matrix and suppose
suppose its
its characteristic polyno-
characteristic polyno-
mial is 7r(A)
n(A) = A" (a0 +
An -— (ao alA + ... +
+ «A a n _iA n ~')- Then
+ an_IAn-I). Then it
it can
can be
be shown
shown (see
(see [12])
[12]) that
that A
A
is similar to
is similar to aa matrix
matrix of
of the
the form
form
o o o
o 0
(10.7)
o
o o

Definition 10.37.
Definition 10.37. AA matrix
matrix A lRnx
A eE E nxn" of
of the
the form
form (10.7)
(10.7) is
is called a companion
called a cornpanion matrix
rnatrix or
or
is said
is said to in companion
be in
to be form.
cornpanion forrn.
Companion matrices also appear in the literature in several equivalent forms. To illustrate, consider the companion matrix

\[ \begin{bmatrix} 0 & 0 & 0 & a_0 \\ 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \end{bmatrix}. \tag{10.8} \]

This matrix is a special case of a matrix in lower Hessenberg form. Using the reverse-order identity similarity P given by (9.18), A is easily seen to be similar to the following matrix in upper Hessenberg form:

\[ \begin{bmatrix} a_3 & 1 & 0 & 0 \\ a_2 & 0 & 1 & 0 \\ a_1 & 0 & 0 & 1 \\ a_0 & 0 & 0 & 0 \end{bmatrix}. \tag{10.9} \]

Moreover, since a matrix is similar to its transpose (see exercise 13 in Chapter 9), the following are also companion matrices similar to the above:

\[ \begin{bmatrix} a_3 & a_2 & a_1 & a_0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ a_0 & a_1 & a_2 & a_3 \end{bmatrix}. \tag{10.10} \]

Notice that in all cases a companion matrix is nonsingular if and only if a_0 ≠ 0. In fact, the inverse of a nonsingular companion matrix is again in companion form. For example,

\[ \begin{bmatrix} 0 & 0 & 0 & a_0 \\ 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \end{bmatrix}^{-1} = \begin{bmatrix} -a_1/a_0 & 1 & 0 & 0 \\ -a_2/a_0 & 0 & 1 & 0 \\ -a_3/a_0 & 0 & 0 & 1 \\ 1/a_0 & 0 & 0 & 0 \end{bmatrix}, \tag{10.11} \]

with a similar result for companion matrices of the form (10.10).

If a companion matrix of the form (10.7) is singular, i.e., if a_0 = 0, then its pseudoinverse can still be computed. Let a ∈ R^{n−1} denote the vector [a_1, a_2, ..., a_{n−1}]^T and let c = 1/(1 + a^T a). Then it is easily verified that

\[ \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & a_1 \\ 0 & 1 & \cdots & 0 & a_2 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & a_{n-1} \end{bmatrix}^{+} = \begin{bmatrix} 0 & I - c\,aa^T \\ 0 & c\,a^T \end{bmatrix}. \]

Note that I − caa^T = (I + aa^T)^{-1}, and hence the pseudoinverse of a singular companion matrix is not a companion matrix unless a = 0.
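The pseudoinverse formula above is easy to verify numerically. The sketch below (NumPy assumed available; n and the coefficients a_1, ..., a_{n−1} are arbitrary choices) compares it with a direct pseudoinverse computation.

```python
# Quick numerical check of the pseudoinverse of a singular companion matrix.
import numpy as np

n = 5
a = np.array([2.0, -1.0, 0.5, 3.0])           # a_1, ..., a_{n-1}; a_0 = 0
A = np.zeros((n, n))
A[1:, :-1] = np.eye(n - 1)                    # subdiagonal of ones
A[1:, -1] = a                                 # last column (a_0 = 0 in row 0)

c = 1.0 / (1.0 + a @ a)
Aplus = np.zeros((n, n))
Aplus[:-1, 1:] = np.eye(n - 1) - c * np.outer(a, a)
Aplus[-1, 1:] = c * a

print(np.allclose(Aplus, np.linalg.pinv(A)))  # True
```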
Companion matrices have many other interesting properties, among which, and perhaps surprisingly, is the fact that their singular values can be found in closed form; see [14].
Theorem 10.38. Let σ_1 ≥ σ_2 ≥ ··· ≥ σ_n be the singular values of the companion matrix A in (10.7). Let a = a_1^2 + a_2^2 + ··· + a_{n−1}^2 and γ = 1 + a_0^2 + a. Then

\[ \sigma_1^2 = \tfrac{1}{2}\left(\gamma + \sqrt{\gamma^2 - 4a_0^2}\right), \]
\[ \sigma_i^2 = 1 \quad \text{for } i = 2, 3, \ldots, n-1, \]
\[ \sigma_n^2 = \tfrac{1}{2}\left(\gamma - \sqrt{\gamma^2 - 4a_0^2}\right). \]

If a_0 ≠ 0, the largest and smallest singular values can also be written in the equivalent form

Remark 10.39. Explicit formulas for all the associated right and left singular vectors can also be derived easily.
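The closed-form singular values of Theorem 10.38 can be checked directly against a numerical SVD. The following sketch assumes NumPy is available; the coefficients are arbitrary illustrative values.

```python
# Closed-form singular values of a companion matrix versus numpy.linalg.svd.
import numpy as np

a = np.array([0.3, -2.0, 1.5, 0.7, -0.4])     # a_0, a_1, ..., a_{n-1}; n = 5
n = len(a)
A = np.zeros((n, n))
A[1:, :-1] = np.eye(n - 1)
A[:, -1] = a                                  # companion matrix of form (10.7)

gamma = 1.0 + a[0] ** 2 + np.sum(a[1:] ** 2)
disc = np.sqrt(gamma ** 2 - 4.0 * a[0] ** 2)
closed = np.sqrt(np.concatenate(([0.5 * (gamma + disc)],
                                 np.ones(n - 2),
                                 [0.5 * (gamma - disc)])))

print(np.allclose(np.linalg.svd(A, compute_uv=False), closed))   # True
```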
If A ∈ R^{n×n} is derogatory, i.e., has more than one Jordan block associated with at least one eigenvalue, then it is not similar to a companion matrix of the form (10.7). However, it can be shown that a derogatory matrix is similar to a block diagonal matrix, each of whose diagonal blocks is a companion matrix. Such matrices are said to be in rational canonical form (or Frobenius canonical form). For details, see, for example, [12].

Companion matrices appear frequently in the control and signal processing literature, but unfortunately they are often very difficult to work with numerically. Algorithms to reduce an arbitrary matrix to companion form are numerically unstable. Moreover, companion matrices are known to possess many undesirable numerical properties. For example, in general and especially as n increases, their eigenstructure is extremely ill conditioned, nonsingular ones are nearly singular, stable ones are nearly unstable, and so forth [14].

Companion matrices and rational canonical forms are generally to be avoided in floating-point computation.
Remark 10.40. Theorem 10.38 yields some understanding of why difficult numerical behavior might be expected for companion matrices. For example, when solving linear systems of equations of the form (6.2), one measure of numerical sensitivity is κ_p(A) = ||A||_p ||A^{-1}||_p, the so-called condition number of A with respect to inversion and with respect to the matrix p-norm. If this number is large, say O(10^k), one may lose up to k digits of precision. In the 2-norm, this condition number is the ratio of largest to smallest singular values which, by the theorem, can be determined explicitly as

\[ \kappa_2(A) = \frac{\gamma + \sqrt{\gamma^2 - 4a_0^2}}{2|a_0|}. \]
It is easy to show that γ/(2|a_0|) ≤ κ_2(A) ≤ γ/|a_0|, and when a_0 is small or γ is large (or both), then κ_2(A) ≈ γ/|a_0|. It is not unusual for γ to be large for large n. Note that explicit formulas for κ_1(A) and κ_∞(A) can also be determined easily by using (10.11).

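The rapid growth of κ_2 described in Remark 10.40 is easy to observe experimentally. The sketch below (NumPy assumed available; the classic polynomial with roots 1, ..., n is an arbitrary but standard choice) shows how badly conditioned companion matrices become as n grows.

```python
# Conditioning of companion matrices as n grows (illustrative sketch).
import numpy as np

for n in (5, 10, 15, 20):
    coeffs = np.poly(np.arange(1, n + 1))      # monic polynomial with roots 1..n
    C = np.diag(np.ones(n - 1), -1)            # form (10.7): subdiagonal of ones
    C[:, -1] = -coeffs[::-1][:-1]              # last column: a_0, ..., a_{n-1}
    print(n, np.linalg.cond(C))                # condition number grows explosively
```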
EXERCISES

1. Show that if a triangular matrix is normal, then it must be diagonal.

2. Prove that if A ∈ R^{n×n} is normal, then N(A) = N(A^T).

3. Let A ∈ C^{n×n} and define ρ(A) = max_{λ ∈ Λ(A)} |λ|. Then ρ(A) is called the spectral radius of A. Show that if A is normal, then ρ(A) = ||A||_2. Show that the converse is true if n = 2.

4. Let A ∈ C^{n×n} be normal with eigenvalues λ_1, ..., λ_n and singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_n ≥ 0. Show that σ_i(A) = |λ_i(A)| for i ∈ \underline{n}.

5. Use the reverse-order identity matrix P introduced in (9.18) and the matrix U in Theorem 10.5 to find a unitary matrix Q that reduces A ∈ C^{n×n} to lower triangular form.

6. Let A = [ · ] ∈ C^{2×2}. Find a unitary matrix U such that [ · ].

7. If A ∈ R^{n×n} is positive definite, show that A^{-1} must also be positive definite.

8. Suppose A ∈ R^{n×n} is positive definite. Is [A  I; I  A^{-1}] ≥ 0?

9. Let R, S ∈ R^{n×n} be symmetric. Show that [R  I; I  S] > 0 if and only if S > 0 and R > S^{-1}.

10. Find the inertia of the following matrices:

(a) [ · ],  (b) [−2  1+j; 1−j  −2],  (c) [ · ],  (d) [−1  1+j; 1−j  −1].
Chapter 11

Linear Differential and Difference Equations

11.1 Differential Equations

In this section we study solutions of the linear homogeneous system of differential equations

\[ \dot{x}(t) = A x(t); \quad x(t_0) = x_0 \in \mathbb{R}^n \tag{11.1} \]

for t ≥ t_0. This is known as an initial-value problem. We restrict our attention in this chapter only to the so-called time-invariant case, where the matrix A ∈ R^{n×n} is constant and does not depend on t. The solution of (11.1) is then known always to exist and be unique. It can be described conveniently in terms of the matrix exponential.

Definition 11.1. For all A ∈ R^{n×n}, the matrix exponential e^A ∈ R^{n×n} is defined by the power series

\[ e^A = \sum_{k=0}^{+\infty} \frac{1}{k!} A^k. \tag{11.2} \]

The series (11.2) can be shown to converge for all A (it has radius of convergence equal to +∞). The solution of (11.1) involves the matrix

\[ e^{tA} = \sum_{k=0}^{+\infty} \frac{t^k}{k!} A^k, \tag{11.3} \]

which thus also converges for all A and uniformly in t.
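In practice e^{tA} is rarely computed from the definition; a library routine is used instead. The sketch below (assuming SciPy is available; the matrix, t, and the truncation order are arbitrary choices) compares a crude truncation of the series (11.3) with scipy.linalg.expm.

```python
# Truncated series (11.3) versus a library matrix exponential (illustrative sketch).
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
t = 0.7

S = np.zeros_like(A)
term = np.eye(2)
for k in range(1, 30):                 # partial sums of sum_k (tA)^k / k!
    S = S + term
    term = term @ (t * A) / k

print(np.max(np.abs(S - expm(t * A))))   # small; expm is the reliable route
```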

11.1.1 Properties of the matrix exponential

1. e^0 = I.
Proof: This follows immediately from Definition 11.1 by setting A = 0.

2. For all A ∈ R^{n×n}, (e^A)^T = e^{A^T}.
Proof: This follows immediately from Definition 11.1 and linearity of the transpose.


3. For all A ∈ R^{n×n} and for all t, τ ∈ R, e^{(t+τ)A} = e^{tA} e^{τA} = e^{τA} e^{tA}.
Proof: Note that

\[ e^{(t+\tau)A} = I + (t+\tau)A + \frac{(t+\tau)^2}{2!} A^2 + \cdots \]

and

\[ e^{tA} e^{\tau A} = \left(I + tA + \frac{t^2}{2!}A^2 + \cdots\right)\left(I + \tau A + \frac{\tau^2}{2!}A^2 + \cdots\right). \]

Compare like powers of A in the above two equations and use the binomial theorem on (t + τ)^k.

4. For all A, B ∈ R^{n×n} and for all t ∈ R, e^{t(A+B)} = e^{tA} e^{tB} = e^{tB} e^{tA} if and only if A and B commute, i.e., AB = BA.
Proof: Note that

\[ e^{t(A+B)} = I + t(A+B) + \frac{t^2}{2!}(A+B)^2 + \cdots \]

and

\[ e^{tA} e^{tB} = \left(I + tA + \frac{t^2}{2!}A^2 + \cdots\right)\left(I + tB + \frac{t^2}{2!}B^2 + \cdots\right), \]

while

\[ e^{tB} e^{tA} = \left(I + tB + \frac{t^2}{2!}B^2 + \cdots\right)\left(I + tA + \frac{t^2}{2!}A^2 + \cdots\right). \]

Compare like powers of t in the first equation and the second or third and use the binomial theorem on (A + B)^k and the commutativity of A and B.

5. For all A ∈ R^{n×n} and for all t ∈ R, (e^{tA})^{-1} = e^{-tA}.
Proof: Simply take τ = −t in property 3.

6. Let L denote the Laplace transform and L^{-1} the inverse Laplace transform. Then for all A ∈ R^{n×n} and for all t ∈ R,

(a) L{e^{tA}} = (sI − A)^{-1}.
(b) L^{-1}{(sI − A)^{-1}} = e^{tA}.

Proof: We prove only (a). Part (b) follows similarly.

\[ \mathcal{L}\{e^{tA}\} = \int_0^{+\infty} e^{-st} e^{tA}\, dt = \int_0^{+\infty} e^{t(-sI)} e^{tA}\, dt \]
\[ = \int_0^{+\infty} e^{t(A-sI)}\, dt \quad \text{since } A \text{ and } (-sI) \text{ commute} \]
\[ = \int_0^{+\infty} \sum_{i=1}^n e^{(\lambda_i - s)t} x_i y_i^H\, dt \quad \text{assuming } A \text{ is diagonalizable} \]
\[ = \sum_{i=1}^n \left[\int_0^{+\infty} e^{(\lambda_i - s)t}\, dt\right] x_i y_i^H \]
\[ = \sum_{i=1}^n \frac{1}{s - \lambda_i}\, x_i y_i^H \quad \text{assuming } \operatorname{Re} s > \operatorname{Re}\lambda_i \text{ for } i \in \underline{n} \]
\[ = (sI - A)^{-1}. \]

The matrix (sI − A)^{-1} is called the resolvent of A and is defined for all s not in Λ(A). Notice in the proof that we have assumed, for convenience, that A is diagonalizable. If this is not the case, the scalar dyadic decomposition can be replaced by

\[ e^{t(A-sI)} = \sum_{i=1}^m X_i e^{t(J_i - sI)} Y_i^H \]

using the JCF. All succeeding steps in the proof then follow in a straightforward way.
way.
7. For all A ∈ R^{n×n} and for all t ∈ R, (d/dt)(e^{tA}) = A e^{tA} = e^{tA} A.
Proof: Since the series (11.3) is uniformly convergent, it can be differentiated term by term, from which the result follows immediately. Alternatively, the formal definition

\[ \frac{d}{dt}\left(e^{tA}\right) = \lim_{\Delta t \to 0} \frac{e^{(t+\Delta t)A} - e^{tA}}{\Delta t} \]

can be employed as follows. For any consistent matrix norm,

\[ \left\| \frac{e^{(t+\Delta t)A} - e^{tA}}{\Delta t} - A e^{tA} \right\| = \left\| \frac{1}{\Delta t}\left(e^{tA} e^{\Delta t A} - e^{tA}\right) - A e^{tA} \right\| \]
\[ = \left\| \frac{1}{\Delta t}\left(e^{\Delta t A} - I\right) e^{tA} - A e^{tA} \right\| \]
\[ = \left\| \left(\Delta t\, A + \frac{(\Delta t)^2}{2!} A^2 + \cdots\right)\frac{1}{\Delta t}\, e^{tA} - A e^{tA} \right\| \]
\[ = \left\| \left(A e^{tA} + \frac{\Delta t}{2!} A^2 e^{tA} + \cdots\right) - A e^{tA} \right\| \]
\[ = \left\| \left(\frac{\Delta t}{2!} A^2 + \frac{(\Delta t)^2}{3!} A^3 + \cdots\right) e^{tA} \right\| \]
\[ \leq \Delta t\, \|A^2\|\, \|e^{tA}\| \left(\frac{1}{2!} + \frac{\Delta t}{3!}\|A\| + \frac{(\Delta t)^2}{4!}\|A\|^2 + \cdots\right) \]
\[ \leq \Delta t\, \|A^2\|\, \|e^{tA}\| \left(1 + \Delta t\,\|A\| + \frac{(\Delta t)^2}{2!}\|A\|^2 + \cdots\right) \]
\[ = \Delta t\, \|A^2\|\, \|e^{tA}\|\, e^{\Delta t \|A\|}. \]

For fixed t, the right-hand side above clearly goes to 0 as Δt goes to 0. Thus, the limit exists and equals A e^{tA}. A similar proof yields the limit e^{tA} A, or one can use the fact that A commutes with any polynomial of A of finite degree and hence with e^{tA}.

11.1.2 Homogeneous linear differential equations

Theorem 11.2. Let A ∈ R^{n×n}. The solution of the linear homogeneous initial-value problem

\[ \dot{x}(t) = A x(t); \quad x(t_0) = x_0 \in \mathbb{R}^n \tag{11.4} \]

for t ≥ t_0 is given by

\[ x(t) = e^{(t-t_0)A} x_0. \tag{11.5} \]

Proof: Differentiate (11.5) and use property 7 of the matrix exponential to get ẋ(t) = A e^{(t−t_0)A} x_0 = A x(t). Also, x(t_0) = e^{(t_0−t_0)A} x_0 = x_0 so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.5) is the solution of (11.4). □
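Formula (11.5) is easy to exercise numerically. The sketch below (SciPy assumed available; A, x_0, and the time interval are arbitrary choices) evaluates it with the matrix exponential and cross-checks against a general-purpose ODE solver.

```python
# Solving (11.4) with expm and checking against solve_ivp (illustrative sketch).
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
x0 = np.array([1.0, -1.0])
t0, t1 = 0.0, 2.0

x_exact = expm((t1 - t0) * A) @ x0                      # formula (11.5)

sol = solve_ivp(lambda t, x: A @ x, (t0, t1), x0, rtol=1e-10, atol=1e-12)
print(x_exact, sol.y[:, -1])                            # the two agree closely
```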

11.1.3 Inhomogeneous linear differential equations

Theorem 11.3. Let A ∈ R^{n×n}, B ∈ R^{n×m} and let the vector-valued function u be given and, say, continuous. Then the solution of the linear inhomogeneous initial-value problem

\[ \dot{x}(t) = A x(t) + B u(t); \quad x(t_0) = x_0 \in \mathbb{R}^n \tag{11.6} \]

for t ≥ t_0 is given by the variation of parameters formula

\[ x(t) = e^{(t-t_0)A} x_0 + \int_{t_0}^t e^{(t-s)A} B u(s)\, ds. \tag{11.7} \]

Proof: Differentiate (11.7) and again use property 7 of the matrix exponential. The general formula

\[ \frac{d}{dt}\int_{p(t)}^{q(t)} f(x, t)\, dx = \int_{p(t)}^{q(t)} \frac{\partial f(x, t)}{\partial t}\, dx + f(q(t), t)\,\frac{dq(t)}{dt} - f(p(t), t)\,\frac{dp(t)}{dt} \]

is used to get ẋ(t) = A e^{(t−t_0)A} x_0 + ∫_{t_0}^t A e^{(t−s)A} B u(s) ds + B u(t) = A x(t) + B u(t). Also, x(t_0) = e^{(t_0−t_0)A} x_0 + 0 = x_0 so, by the fundamental existence and uniqueness theorem for ordinary differential equations, (11.7) is the solution of (11.6). □
Remark 11.4. The proof above simply verifies the variation of parameters formula by direct differentiation. The formula can be derived by means of an integrating factor "trick" as follows. Premultiply the equation ẋ − Ax = Bu by e^{−tA} to get

\[ \frac{d}{dt}\left(e^{-tA} x(t)\right) = e^{-tA} B u(t). \tag{11.8} \]

Now integrate (11.8) over the interval [t_0, t]:

\[ \int_{t_0}^t \frac{d}{ds}\left(e^{-sA} x(s)\right) ds = \int_{t_0}^t e^{-sA} B u(s)\, ds. \]

Thus,

\[ e^{-tA} x(t) - e^{-t_0 A} x(t_0) = \int_{t_0}^t e^{-sA} B u(s)\, ds \]

and hence

\[ x(t) = e^{(t-t_0)A} x_0 + \int_{t_0}^t e^{(t-s)A} B u(s)\, ds. \]
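The variation of parameters formula (11.7) can also be evaluated directly by quadrature. The sketch below (SciPy assumed available; A, B, u, x_0 are arbitrary choices) does this and compares with an ODE solver.

```python
# Evaluating (11.7) by quadrature and checking against solve_ivp (sketch).
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec, solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
u = lambda t: np.array([np.sin(t)])
x0 = np.array([1.0, 0.0])
t0, t1 = 0.0, 2.0

integral, _ = quad_vec(lambda s: expm((t1 - s) * A) @ B @ u(s), t0, t1)
x_vp = expm((t1 - t0) * A) @ x0 + integral               # formula (11.7)

sol = solve_ivp(lambda t, x: A @ x + B @ u(t), (t0, t1), x0,
                rtol=1e-10, atol=1e-12)
print(x_vp, sol.y[:, -1])                                # agree closely
```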

11.1.4 Linear matrix differential equations

Matrix-valued initial-value problems also occur frequently. The first is an obvious generalization of Theorem 11.2, and the proof is essentially the same.

Theorem 11.5. Let A ∈ R^{n×n}. The solution of the matrix linear homogeneous initial-value problem

\[ \dot{X}(t) = A X(t); \quad X(t_0) = C \in \mathbb{R}^{n\times n} \tag{11.9} \]

for t ≥ t_0 is given by

\[ X(t) = e^{(t-t_0)A} C. \tag{11.10} \]

In the matrix case, we can have coefficient matrices on both the right and left. For convenience, the following theorem is stated with initial time t_0 = 0.

Theorem 11.6. Let A ∈ R^{n×n}, B ∈ R^{m×m}, and C ∈ R^{n×m}. Then the matrix initial-value problem

\[ \dot{X}(t) = A X(t) + X(t) B; \quad X(0) = C \tag{11.11} \]

has the solution X(t) = e^{tA} C e^{tB}.

Proof: Differentiate e^{tA} C e^{tB} with respect to t and use property 7 of the matrix exponential. The fact that X(t) satisfies the initial condition is trivial. □
Corollary 11.7. Let A, C ∈ R^{n×n}. Then the matrix initial-value problem

\[ \dot{X}(t) = A X(t) + X(t) A^T; \quad X(0) = C \tag{11.12} \]

has the solution X(t) = e^{tA} C e^{tA^T}.

When C is symmetric in (11.12), X(t) is symmetric and (11.12) is known as a Lyapunov differential equation. The initial-value problem (11.11) is known as a Sylvester differential equation.
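The solution of the Sylvester differential equation (11.11) can be checked with a finite-difference approximation of the derivative. The sketch below assumes SciPy is available; A, B, C, t, and the step size are arbitrary choices.

```python
# Checking X(t) = e^{tA} C e^{tB} for the Sylvester differential equation (sketch).
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[-0.3, 0.2], [0.0, -0.8]])
C = np.array([[1.0, 0.0], [2.0, -1.0]])

X = lambda t: expm(t * A) @ C @ expm(t * B)

t, h = 1.3, 1e-6
dX = (X(t + h) - X(t - h)) / (2 * h)                 # central difference
print(np.max(np.abs(dX - (A @ X(t) + X(t) @ B))))    # small (finite-difference accuracy)
```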

11.1.5 Modal decompositions

Let A ∈ R^{n×n} and suppose, for convenience, that it is diagonalizable (if A is not diagonalizable, the rest of this subsection is easily generalized by using the JCF and the decomposition A = Σ X_i J_i Y_i^H as discussed in Chapter 9). Then the solution x(t) of (11.4) can be written

\[ x(t) = e^{(t-t_0)A} x_0 = \left(\sum_{i=1}^n e^{\lambda_i (t-t_0)} x_i y_i^H\right) x_0 = \sum_{i=1}^n \left(y_i^H x_0\, e^{\lambda_i (t-t_0)}\right) x_i. \]

The λ_i's are called the modal velocities and the right eigenvectors x_i are called the modal directions. The decomposition above expresses the solution x(t) as a weighted sum of its modal velocities and directions.

This modal decomposition can be expressed in a different looking but identical form if we write the initial condition x_0 as a weighted sum of the right eigenvectors x_0 = Σ_{i=1}^n α_i x_i. Then

\[ x(t) = \sum_{i=1}^n \left(\alpha_i e^{\lambda_i (t-t_0)}\right) x_i. \]

In the last equality we have used the fact that y_i^H x_j = δ_{ij}.

Similarly, in the inhomogeneous case we can write

\[ \int_{t_0}^t e^{(t-s)A} B u(s)\, ds = \sum_{i=1}^n \left(\int_{t_0}^t e^{\lambda_i (t-s)} y_i^H B u(s)\, ds\right) x_i. \]
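The modal decomposition can be reproduced numerically from an eigendecomposition. The sketch below (SciPy assumed available; A, x_0, and t are arbitrary, with A diagonalizable) rebuilds x(t) mode by mode and compares with e^{tA}x_0.

```python
# Reconstructing x(t) from its modal decomposition (illustrative sketch).
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])     # eigenvalues -1 and -2
x0 = np.array([1.0, 1.0])
t = 1.5

lam, X = np.linalg.eig(A)                    # columns of X: right eigenvectors x_i
Y = np.linalg.inv(X).conj().T                # columns of Y: left eigenvectors y_i
modes = sum((Y[:, i].conj() @ x0) * np.exp(lam[i] * t) * X[:, i]
            for i in range(len(lam)))

print(np.max(np.abs(modes - expm(t * A) @ x0)))   # essentially zero
```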

11.1.6 Computation of the matrix exponential

JCF method

Let A ∈ R^{n×n} and suppose X ∈ R_n^{n×n} is such that X^{-1} A X = J, where J is a JCF for A. Then

\[ e^{tA} = e^{t X J X^{-1}} = X e^{tJ} X^{-1} = \begin{cases} \displaystyle\sum_{i=1}^n e^{\lambda_i t} x_i y_i^H & \text{if } A \text{ is diagonalizable,} \\[2ex] \displaystyle\sum_{i=1}^m X_i e^{t J_i} Y_i^H & \text{in general.} \end{cases} \]

If A is diagonalizable, it is then easy to compute e^{tA} via the formula e^{tA} = X e^{tJ} X^{-1} since e^{tJ} is simply a diagonal matrix.

In the more general case, the problem clearly reduces simply to the computation of the exponential of a Jordan block. To be specific, let J_i ∈ C^{k×k} be a Jordan block of the form

\[ J_i = \begin{bmatrix} \lambda & 1 & & 0 \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{bmatrix} = \lambda I + N. \]

Clearly λI and N commute. Thus, e^{tJ_i} = e^{t\lambda I} e^{tN} by property 4 of the matrix exponential. The diagonal part is easy: e^{t\lambda I} = diag(e^{λt}, ..., e^{λt}). But e^{tN} is almost as easy since N is nilpotent of degree k.

Definition 11.8. A matrix M ∈ R^{n×n} is nilpotent of degree (or index, or grade) p if M^p = 0, while M^{p−1} ≠ 0.

For the matrix N defined above, it is easy to check that while N has 1's along only its first superdiagonal (and 0's elsewhere), N² has 1's along only its second superdiagonal, and so forth. Finally, N^{k−1} has a 1 in its (1, k) element and has 0's everywhere else, and N^k = 0. Thus, the series expansion of e^{tN} is finite, i.e.,

\[ e^{tN} = I + tN + \frac{t^2}{2!}N^2 + \cdots + \frac{t^{k-1}}{(k-1)!}N^{k-1} = \begin{bmatrix} 1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{k-1}}{(k-1)!} \\ 0 & 1 & t & \cdots & \frac{t^{k-2}}{(k-2)!} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & & 1 & t \\ 0 & \cdots & & 0 & 1 \end{bmatrix}. \]

Thus,

\[ e^{tJ_i} = \begin{bmatrix} e^{\lambda t} & t e^{\lambda t} & \frac{t^2}{2!}e^{\lambda t} & \cdots & \frac{t^{k-1}}{(k-1)!}e^{\lambda t} \\ 0 & e^{\lambda t} & t e^{\lambda t} & \cdots & \frac{t^{k-2}}{(k-2)!}e^{\lambda t} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & & e^{\lambda t} & t e^{\lambda t} \\ 0 & \cdots & & 0 & e^{\lambda t} \end{bmatrix}. \]

In the case when λ is complex, a real version of the above can be worked out.

Example 11.9. Let A = [−4  4; −1  0]. Then Λ(A) = {−2, −2} and

\[ e^{tA} = X e^{tJ} X^{-1} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \exp\!\left(t \begin{bmatrix} -2 & 1 \\ 0 & -2 \end{bmatrix}\right) \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} e^{-2t} & t e^{-2t} \\ 0 & e^{-2t} \end{bmatrix} \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}. \]
Interpolation method

This method is numerically unstable in finite-precision arithmetic but is quite effective for hand calculation in small-order problems. The method is stated and illustrated for the exponential function but applies equally well to other functions.

Given A ∈ R^{n×n} and f(λ) = e^{tλ}, compute f(A) = e^{tA}, where t is a fixed scalar. Suppose the characteristic polynomial of A can be written as π(λ) = Π_{i=1}^m (λ − λ_i)^{n_i}, where the λ_i's are distinct. Define

\[ g(\lambda) = \alpha_0 + \alpha_1 \lambda + \cdots + \alpha_{n-1}\lambda^{n-1}, \]

where α_0, ..., α_{n−1} are n constants that are to be determined. They are, in fact, the unique solution of the n equations:

\[ g^{(k)}(\lambda_i) = f^{(k)}(\lambda_i); \quad k = 0, 1, \ldots, n_i - 1, \; i \in \underline{m}. \]

Here, the superscript (k) denotes the kth derivative with respect to λ. With the α_i's then known, the function g is known and f(A) = g(A). The motivation for this method is the Cayley-Hamilton Theorem, Theorem 9.3, which says that all powers of A greater than n − 1 can be expressed as linear combinations of A^k for k = 0, 1, ..., n − 1. Thus, all the terms of order greater than n − 1 in the power series for e^{tA} can be written in terms of these lower-order powers as well. The polynomial g gives the appropriate linear combination.

Example 11.10. Let A ∈ R^{3×3} with f(λ) = e^{tλ} and π(λ) = −(λ + 1)³, so m = 1 and n_1 = 3. Let g(λ) = α_0 + α_1 λ + α_2 λ². Then the three equations for the α_i's are given by

\[ g(-1) = f(-1) \;\Longrightarrow\; \alpha_0 - \alpha_1 + \alpha_2 = e^{-t}, \]
\[ g'(-1) = f'(-1) \;\Longrightarrow\; \alpha_1 - 2\alpha_2 = t e^{-t}, \]
\[ g''(-1) = f''(-1) \;\Longrightarrow\; 2\alpha_2 = t^2 e^{-t}. \]

Solving for the α_i's, we find

\[ \alpha_0 = e^{-t} + t e^{-t} + \tfrac{1}{2}t^2 e^{-t}, \quad \alpha_1 = t e^{-t} + t^2 e^{-t}, \quad \alpha_2 = \tfrac{1}{2}t^2 e^{-t}. \]

Thus, f(A) = e^{tA} = g(A) = α_0 I + α_1 A + α_2 A².

Example 11.11. Let A = [−4  4; −1  0] and f(λ) = e^{tλ}. Then π(λ) = (λ + 2)² so m = 1 and n_1 = 2. Let g(λ) = α_0 + α_1 λ. Then the defining equations for the α_i's are given by

\[ g(-2) = f(-2) \;\Longrightarrow\; \alpha_0 - 2\alpha_1 = e^{-2t}, \]
\[ g'(-2) = f'(-2) \;\Longrightarrow\; \alpha_1 = t e^{-2t}. \]

Solving for the α_i's, we find

\[ \alpha_0 = e^{-2t} + 2t e^{-2t}, \qquad \alpha_1 = t e^{-2t}. \]

Thus,

\[ f(A) = e^{tA} = g(A) = \alpha_0 I + \alpha_1 A = (e^{-2t} + 2t e^{-2t})\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + t e^{-2t}\begin{bmatrix} -4 & 4 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} e^{-2t} - 2t e^{-2t} & 4t e^{-2t} \\ -t e^{-2t} & e^{-2t} + 2t e^{-2t} \end{bmatrix}. \]

Other methods

1. Use e^{tA} = L^{-1}{(sI − A)^{-1}} and techniques for inverse Laplace transforms. This is quite effective for small-order problems, but general nonsymbolic computational techniques are numerically unstable since the problem is theoretically equivalent to knowing precisely a JCF.

2. Use Padé approximation. There is an extensive literature on approximating certain nonlinear functions by rational functions. The matrix analogue yields e^A ≈ D^{-1}(A) N(A), where D(A) = δ_0 I + δ_1 A + ··· + δ_p A^p and N(A) = ν_0 I + ν_1 A + ··· + ν_q A^q. Explicit formulas are known for the coefficients of the numerator and denominator polynomials of various orders. Unfortunately, a Padé approximation for the exponential is accurate only in a neighborhood of the origin; in the matrix case this means when ||A|| is sufficiently small. This can be arranged by scaling A, say, by multiplying it by 1/2^k for sufficiently large k and using the fact that e^A = (e^{(1/2^k)A})^{2^k}. Numerical loss of accuracy can occur in this procedure from the successive squarings.

3. Reduce A to (real) Schur form S via the unitary similarity U and use e^A = U e^S U^H and successive recursions up the superdiagonals of the (quasi) upper triangular matrix e^S.

4. Many methods are outlined in, for example, [19]. Reliable and efficient computation of matrix functions such as e^A and log(A) remains a fertile area for research.

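The scaling-and-squaring idea in item 2 above is easy to prototype. The sketch below is only an illustration of the idea: a truncated series stands in for a true Padé approximant, the scaling rule is an arbitrary choice, and production codes such as scipy.linalg.expm should be preferred.

```python
# Bare-bones scaling-and-squaring sketch for e^A (illustration only).
import numpy as np
from scipy.linalg import expm

def expm_scale_square(A, terms=8):
    k = max(0, int(np.ceil(np.log2(max(np.linalg.norm(A, 1), 1e-16)))) + 1)
    B = A / 2**k                          # now ||B|| is small
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for j in range(1, terms):             # crude local approximation of e^B
        term = term @ B / j
        E = E + term
    for _ in range(k):                    # undo the scaling: e^A = (e^B)^(2^k)
        E = E @ E
    return E

A = np.array([[-4.0, 4.0], [-1.0, 0.0]])
print(np.max(np.abs(expm_scale_square(A) - expm(A))))   # small
```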
11.2 Difference Equations

In this section we outline solutions of discrete-time analogues of the linear differential equations of the previous section. Linear discrete-time systems, modeled by systems of difference equations, exhibit many parallels to the continuous-time differential equation case, and this observation is exploited frequently.

11.2.1 Homogeneous linear difference equations

Theorem 11.12. Let A ∈ R^{n×n}. The solution of the linear homogeneous system of difference equations

\[ x_{k+1} = A x_k \tag{11.13} \]

for k ≥ 0 is given by

\[ x_k = A^k x_0, \quad k \geq 0. \tag{11.14} \]

Proof: The proof is almost immediate upon substitution of (11.14) into (11.13). □
Remark 11.13. Again, we restrict our attention only to the so-called time-invariant case, where the matrix A in (11.13) is constant and does not depend on k. We could also consider an arbitrary "initial time" k_0, but since the system is time-invariant, and since we want to keep the formulas "clean" (i.e., no double subscripts), we have chosen k_0 = 0 for convenience.

11.2.2 Inhomogeneous linear difference equations

Theorem 11.14. Let A ∈ R^{n×n}, B ∈ R^{n×m} and suppose {u_k}_{k=0}^{+∞} is a given sequence of m-vectors. Then the solution of the inhomogeneous initial-value problem

\[ x_{k+1} = A x_k + B u_k; \quad x_0 \in \mathbb{R}^n \text{ given} \tag{11.15} \]

is given by

\[ x_k = A^k x_0 + \sum_{j=0}^{k-1} A^{k-j-1} B u_j, \quad k \geq 0. \tag{11.16} \]

Proof: The proof is again almost immediate upon substitution of (11.16) into (11.15). □
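The closed-form solution (11.16) can be cross-checked against direct iteration of the recursion. The sketch below assumes NumPy is available; A, B, the input sequence, and x_0 are arbitrary choices.

```python
# Formula (11.16) versus direct iteration of x_{k+1} = A x_k + B u_k (sketch).
import numpy as np

A = np.array([[0.5, 1.0], [0.0, -0.3]])
B = np.array([[0.0], [1.0]])
x0 = np.array([1.0, 2.0])
u = [np.array([np.cos(0.2 * j)]) for j in range(10)]

x = x0.copy()
for j in range(10):                            # direct iteration
    x = A @ x + B @ u[j]

k = 10                                         # formula (11.16)
closed = np.linalg.matrix_power(A, k) @ x0 + sum(
    np.linalg.matrix_power(A, k - j - 1) @ B @ u[j] for j in range(k))

print(np.max(np.abs(x - closed)))              # essentially zero
```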

11.2.3 Computation of matrix powers

It is clear that solution of linear systems of difference equations involves computation of A^k. One solution method, which is numerically unstable but sometimes useful for hand calculation, is to use z-transforms, by analogy with the use of Laplace transforms to compute a matrix exponential. One definition of the z-transform of a sequence {g_k} is

\[ \mathcal{Z}\left(\{g_k\}_{k=0}^{+\infty}\right) = \sum_{k=0}^{+\infty} g_k z^{-k}. \]

Assuming |z| > max_{λ ∈ Λ(A)} |λ|, the z-transform of the sequence {A^k} is then given by

\[ \mathcal{Z}\left(\{A^k\}\right) = \sum_{k=0}^{+\infty} z^{-k}A^k = I + \frac{1}{z}A + \frac{1}{z^2}A^2 + \cdots = (I - z^{-1}A)^{-1} = z(zI - A)^{-1}. \]

Methods based on the JCF are sometimes useful, again mostly for small-order problems. Assume that A ∈ R^{n×n} and let X ∈ R_n^{n×n} be such that X^{-1} A X = J, where J is a JCF for A. Then

\[ A^k = (X J X^{-1})^k = X J^k X^{-1} = \begin{cases} \displaystyle\sum_{i=1}^n \lambda_i^k x_i y_i^H & \text{if } A \text{ is diagonalizable,} \\[2ex] \displaystyle\sum_{i=1}^m X_i J_i^k Y_i^H & \text{in general.} \end{cases} \]

If A is diagonalizable, it is then easy to compute A^k via the formula A^k = X J^k X^{-1} since J^k is simply a diagonal matrix.

In the general case, the problem again reduces to the computation of the power of a Jordan block. To be specific, let J_i ∈ C^{p×p} be a Jordan block of the form

\[ J_i = \begin{bmatrix} \lambda & 1 & & 0 \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{bmatrix}. \]

Writing J_i = λI + N and noting that λI and the nilpotent matrix N commute, it is then straightforward to apply the binomial theorem to (λI + N)^k and verify that

\[ J_i^k = \begin{bmatrix} \lambda^k & k\lambda^{k-1} & \binom{k}{2}\lambda^{k-2} & \cdots & \binom{k}{p-1}\lambda^{k-p+1} \\ 0 & \lambda^k & k\lambda^{k-1} & \cdots & \binom{k}{p-2}\lambda^{k-p+2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda^k & k\lambda^{k-1} \\ 0 & \cdots & 0 & 0 & \lambda^k \end{bmatrix}. \]

The symbol \(\binom{k}{q}\) has the usual definition of \(\frac{k!}{q!(k-q)!}\) and is to be interpreted as 0 if k < q. In the case when λ is complex, a real version of the above can be worked out.
Example 11.15. Let A = [−4  4; −1  0]. Then

\[ A^k = X J^k X^{-1} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} (-2)^k & k(-2)^{k-1} \\ 0 & (-2)^k \end{bmatrix} \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} (-2)^{k-1}(-2-2k) & k(-2)^{k+1} \\ -k(-2)^{k-1} & (-2)^{k-1}(2k-2) \end{bmatrix}. \]

Basic analogues of other methods such as those mentioned in Section 11.1.6 can also be derived for the computation of matrix powers, but again no universally "best" method exists. For an erudite discussion of the state of the art, see [11, Ch. 18].

11.3 Higher-Order Equations

It is well known that a higher-order (scalar) linear differential equation can be converted to a first-order linear system. Consider, for example, the initial-value problem

\[ y^{(n)}(t) + a_{n-1}y^{(n-1)}(t) + \cdots + a_1\dot{y}(t) + a_0 y(t) = \phi(t) \tag{11.17} \]

with φ(t) a given function and n initial conditions

\[ y(0) = c_0, \; \dot{y}(0) = c_1, \; \ldots, \; y^{(n-1)}(0) = c_{n-1}. \tag{11.18} \]


Here, y^{(m)} denotes the mth derivative of y with respect to t. Define a vector x(t) ∈ R^n with components x_1(t) = y(t), x_2(t) = ẏ(t), ..., x_n(t) = y^{(n−1)}(t). Then

\[ \dot{x}_1(t) = x_2(t) = \dot{y}(t), \]
\[ \dot{x}_2(t) = x_3(t) = \ddot{y}(t), \]
\[ \vdots \]
\[ \dot{x}_{n-1}(t) = x_n(t) = y^{(n-1)}(t), \]
\[ \dot{x}_n(t) = y^{(n)}(t) = -a_0 y(t) - a_1\dot{y}(t) - \cdots - a_{n-1}y^{(n-1)}(t) + \phi(t) = -a_0 x_1(t) - a_1 x_2(t) - \cdots - a_{n-1}x_n(t) + \phi(t). \]

These equations can then be rewritten as the first-order linear system

\[ \dot{x}(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-1} \end{bmatrix} x(t) + \begin{bmatrix} 0 \\ \vdots \\ 0 \\ \phi(t) \end{bmatrix}. \tag{11.19} \]

The initial conditions take the form x(0) = c = [c_0, c_1, ..., c_{n−1}]^T. Note that det(λI − A) = λ^n + a_{n−1}λ^{n−1} + ··· + a_1λ + a_0. However, the companion matrix A in (11.19) possesses many nasty numerical properties for even moderately sized n and, as mentioned before, is often well worth avoiding, at least for computational purposes.
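The conversion to the first-order system (11.19) is exactly what one codes in practice when handing a higher-order equation to an ODE solver. The sketch below assumes SciPy is available; the coefficients, forcing term, and initial data are arbitrary illustrative choices.

```python
# Converting a higher-order scalar ODE to the first-order system (11.19) (sketch).
import numpy as np
from scipy.integrate import solve_ivp

a = np.array([2.0, 3.0, 1.0])          # y''' + 1*y'' + 3*y' + 2*y = phi(t)
phi = lambda t: np.sin(t)
c = np.array([1.0, 0.0, -1.0])         # y(0), y'(0), y''(0)

n = len(a)
A = np.zeros((n, n))
A[:-1, 1:] = np.eye(n - 1)             # superdiagonal of ones
A[-1, :] = -a                          # last row: -a_0, ..., -a_{n-1}

def rhs(t, x):
    return A @ x + np.concatenate((np.zeros(n - 1), [phi(t)]))

sol = solve_ivp(rhs, (0.0, 5.0), c, rtol=1e-9, atol=1e-12)
print(sol.y[0, -1])                    # y(5); the first component of x is y
```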
A similar procedure holds for the conversion of a higher-order difference equation, with n initial conditions, into a linear first-order difference equation with (vector) initial condition.

EXERCISES

1. Let P ∈ R^{n×n} be a projection. Show that e^P ≈ I + 1.718P.

2. Suppose x, y ∈ R^n and let A = xy^T. Further, let α = x^T y. Show that e^{tA} = I + g(t, α)xy^T, where

\[ g(t, \alpha) = \begin{cases} \frac{1}{\alpha}\left(e^{\alpha t} - 1\right) & \text{if } \alpha \neq 0, \\ t & \text{if } \alpha = 0. \end{cases} \]

3. Let

\[ A = \begin{bmatrix} I & X \\ 0 & -I \end{bmatrix}, \]

where X ∈ R^{m×n} is arbitrary. Show that

\[ e^A = \begin{bmatrix} eI & (\sinh 1)\, X \\ 0 & e^{-1} I \end{bmatrix}. \]

4. Let K denote the skew-symmetric matrix

\[ \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix}, \]

where I_n denotes the n × n identity matrix. A matrix A ∈ R^{2n×2n} is said to be Hamiltonian if K^{-1}A^T K = −A and to be symplectic if K^{-1}A^T K = A^{-1}.

(a) Suppose H is Hamiltonian and let λ be an eigenvalue of H. Show that −λ must also be an eigenvalue of H.
(b) Suppose S is symplectic and let λ be an eigenvalue of S. Show that 1/λ must also be an eigenvalue of S.
(c) Suppose that H is Hamiltonian and S is symplectic. Show that S^{-1}HS must be Hamiltonian.
(d) Suppose H is Hamiltonian. Show that e^H must be symplectic.

5. Let α, β ∈ R and

\[ A = \begin{bmatrix} \alpha & \beta \\ -\beta & \alpha \end{bmatrix}. \]

Then show that

\[ e^{tA} = \begin{bmatrix} e^{\alpha t}\cos\beta t & e^{\alpha t}\sin\beta t \\ -e^{\alpha t}\sin\beta t & e^{\alpha t}\cos\beta t \end{bmatrix}. \]

6. Find a general expression for [ · ].

7. Find e^{tA} when A = [ · ].

8. Let [ · ].

(a) Solve the differential equation ẋ = Ax; x(0) = [ · ].
(b) Solve the differential equation ẋ = Ax + b; x(0) = [ · ].

9. Consider the initial-value problem

\[ \dot{x}(t) = A x(t); \quad x(0) = x_0 \]

for t ≥ 0. Suppose that A ∈ R^{n×n} is skew-symmetric and let α = ||x_0||_2. Show that ||x(t)||_2 = α for all t > 0.

10. Consider the n × n matrix initial-value problem

\[ \dot{X}(t) = A X(t) - X(t) A; \quad X(0) = C. \]

Show that the eigenvalues of the solution X(t) of this problem are the same as those of C for all t.

11. The year is 2004 and there are three large "free trade zones" in the world: Asia (A), Europe (E), and the Americas (R). Suppose certain multinational companies have total assets of $40 trillion of which $20 trillion is in E and $20 trillion is in R. Each year half of the Americas' money stays home, a quarter goes to Europe, and a quarter goes to Asia. For Europe and Asia, half stays home and half goes to the Americas.

(a) Find the matrix M that gives

\[ \begin{bmatrix} A \\ E \\ R \end{bmatrix}_{\text{year } k+1} = M \begin{bmatrix} A \\ E \\ R \end{bmatrix}_{\text{year } k}. \]

(b) Find the eigenvalues and right eigenvectors of M.
(c) Find the distribution of the companies' assets at year k.
(d) Find the limiting distribution of the $40 trillion as the universe ends, i.e., as k → +∞ (i.e., around the time the Cubs win a World Series).

(Exercise adapted from Problem 5.3.11 in [24].)

12. (a) Find the solution of the initial-value problem

\[ \ddot{y}(t) + 2\dot{y}(t) + y(t) = 0; \quad y(0) = 1, \; \dot{y}(0) = 0. \]

(b) Consider the difference equation

\[ z_{k+2} + 2z_{k+1} + z_k = 0. \]

If z_0 = 1 and z_1 = 2, what is the value of z_{1000}? What is the value of z_k in general?
Chapter 12

Generalized Eigenvalue Problems

12.1 The Generalized Eigenvalue/Eigenvector Problem

In this chapter we consider the generalized eigenvalue problem

\[ A x = \lambda B x, \]

where A, B ∈ C^{n×n}. The standard eigenvalue problem considered in Chapter 9 obviously corresponds to the special case that B = I.
Definition 12.1. A nonzero vector x ∈ C^n is a right generalized eigenvector of the pair (A, B) with A, B ∈ C^{n×n} if there exists a scalar λ ∈ C, called a generalized eigenvalue, such that

\[ A x = \lambda B x. \tag{12.1} \]

Similarly, a nonzero vector y ∈ C^n is a left generalized eigenvector corresponding to an eigenvalue λ if

\[ y^H A = \lambda\, y^H B. \tag{12.2} \]
When the context is such that no confusion can arise, the adjective "generalized" is usually dropped. As with the standard eigenvalue problem, if x [y] is a right [left] eigenvector, then so is αx [αy] for any nonzero scalar α ∈ C.

Definition 12.2. The matrix A − λB is called a matrix pencil (or pencil of the matrices A and B).

As with the standard eigenvalue problem, eigenvalues for the generalized eigenvalue problem occur where the matrix pencil A − λB is singular.

Definition 12.3. The polynomial π(λ) = det(A − λB) is called the characteristic polynomial of the matrix pair (A, B). The roots of π(λ) are the eigenvalues of the associated generalized eigenvalue problem.

Remark 12.4. When A, B ∈ R^{n×n}, the characteristic polynomial is obviously real, and hence nonreal eigenvalues must occur in complex conjugate pairs.
Remark 12.5. If B = I (or in general when B is nonsingular), then π(λ) is a polynomial of degree n, and hence there are n eigenvalues associated with the pencil A − λB. However, when B ≠ I, in particular, when B is singular, there may be 0, k ∈ \underline{n}, or infinitely many eigenvalues associated with the pencil A − λB. For example, suppose

\[ A = \begin{bmatrix} 1 & 0 \\ 0 & \alpha \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 0 \\ 0 & \beta \end{bmatrix}, \tag{12.3} \]

where α and β are scalars. Then the characteristic polynomial is

\[ \det(A - \lambda B) = (1 - \lambda)(\alpha - \beta\lambda) \]

and there are several cases to consider.

Case 1: α ≠ 0, β ≠ 0. There are two eigenvalues, 1 and α/β.

Case 2: α = 0, β ≠ 0. There are two eigenvalues, 1 and 0.

Case 3: α ≠ 0, β = 0. There is only one eigenvalue, 1 (of multiplicity 1).

Case 4: α = 0, β = 0. All λ ∈ C are eigenvalues since det(A − λB) ≡ 0.

Definition 12.6. If det(A − λB) is not identically zero, the pencil A − λB is said to be regular; otherwise, it is said to be singular.

Note that if N(A) ∩ N(B) ≠ 0, the associated matrix pencil is singular (as in Case 4 above).
Associated with any matrix pencil A − λB is a reciprocal pencil B − μA and corresponding generalized eigenvalue problem. Clearly the reciprocal pencil has eigenvalues μ = 1/λ. It is instructive to consider the reciprocal pencil associated with the example in Remark 12.5. With A and B as in (12.3), the characteristic polynomial is

\[ \det(B - \mu A) = (1 - \mu)(\beta - \alpha\mu) \]

and there are again four cases to consider.

Case 1: α ≠ 0, β ≠ 0. There are two eigenvalues, 1 and β/α.

Case 2: α = 0, β ≠ 0. There is only one eigenvalue, 1 (of multiplicity 1).

Case 3: α ≠ 0, β = 0. There are two eigenvalues, 1 and 0.

Case 4: α = 0, β = 0. All μ ∈ C are eigenvalues since det(B − μA) ≡ 0.
At least for the case of regular pencils, it is apparent where the "missing" eigenvalues have gone in Cases 2 and 3. That is to say, there is a second eigenvalue "at infinity" for Case 3 of A − λB, with its reciprocal eigenvalue being 0 in Case 3 of the reciprocal pencil B − μA. A similar reciprocal symmetry holds for Case 2.

While there are applications in system theory and control where singular pencils appear, only the case of regular pencils is considered in the remainder of this chapter. Note that A and/or B may still be singular. If B is singular, the pencil A − λB always has fewer than n eigenvalues. If B is nonsingular, the pencil A − λB always has precisely n eigenvalues, since the generalized eigenvalue problem is then easily seen to be equivalent to the standard eigenvalue problem B^{-1}Ax = λx (or AB^{-1}w = λw). However, this turns out to be a very poor numerical procedure for handling the generalized eigenvalue problem if B is even moderately ill conditioned with respect to inversion. Numerical methods that work directly on A and B are discussed in standard textbooks on numerical linear algebra; see, for example, [7, Sec. 7.7] or [25, Sec. 6.7].
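In practice one calls a QZ-based routine that works on A and B directly. The sketch below (SciPy assumed available; A and B are arbitrary, with B nonsingular here) contrasts this with the numerically inferior reduction to B^{-1}A mentioned above.

```python
# Generalized eigenproblem via a QZ-based routine versus B^{-1}A (sketch).
import numpy as np
from scipy.linalg import eig

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])

lam, V = eig(A, B)                      # generalized eigenvalues/eigenvectors
print(np.sort(lam.real))                # [1.5, 2.0] for this example

lam2 = np.linalg.eigvals(np.linalg.solve(B, A))
print(np.sort(lam2.real))               # same values here, but avoid this route
                                        # when B is ill conditioned or singular
```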
12.2 Canonical Forms

Just as for the standard eigenvalue problem, canonical forms are available for the generalized eigenvalue problem. Since the latter involves a pair of matrices, we now deal with equivalencies rather than similarities, and the first theorem deals with what happens to eigenvalues and eigenvectors under equivalence.
Theorem 12.7. Let A, B, Q, Z ∈ C^{n×n} with Q and Z nonsingular. Then

1. the eigenvalues of the problems A − λB and QAZ − λQBZ are the same (the two problems are said to be equivalent).

2. if x is a right eigenvector of A − λB, then Z^{-1}x is a right eigenvector of QAZ − λQBZ.

3. if y is a left eigenvector of A − λB, then Q^{-H}y is a left eigenvector of QAZ − λQBZ.

Proof:

1. det(QAZ − λQBZ) = det[Q(A − λB)Z] = det Q det Z det(A − λB). Since det Q and det Z are nonzero, the result follows.

2. The result follows by noting that (A − λB)x = 0 if and only if Q(A − λB)Z(Z^{-1}x) = 0.

3. Again, the result follows easily by noting that y^H(A − λB) = 0 if and only if (Q^{-H}y)^H Q(A − λB)Z = 0. □

The first canonical form is an analogue of Schur's Theorem and forms, in fact, the theoretical foundation for the QZ algorithm, which is the generally preferred method for solving the generalized eigenvalue problem; see, for example, [7, Sec. 7.7] or [25, Sec. 6.7].

Theorem 12.8. Let A, B ∈ C^{n×n}. Then there exist unitary matrices Q, Z ∈ C^{n×n} such that

\[ Q A Z = T_\alpha, \qquad Q B Z = T_\beta, \]

where T_α and T_β are upper triangular.

By Theorem 12.7, the eigenvalues of the pencil A − λB are then the ratios of the diagonal elements of T_α to the corresponding diagonal elements of T_β, with the understanding that a zero diagonal element of T_β corresponds to an infinite generalized eigenvalue.

There is also an analogue of the Murnaghan-Wintner Theorem for real matrices.
Theorem 12.9. Let A, B ∈ R^{n×n}. Then there exist orthogonal matrices Q, Z ∈ R^{n×n} such that

\[ Q A Z = S, \qquad Q B Z = T, \]

where T is upper triangular and S is quasi-upper-triangular.

When S has a 2 × 2 diagonal block, the 2 × 2 subpencil formed with the corresponding 2 × 2 diagonal subblock of T has a pair of complex conjugate eigenvalues. Otherwise, real eigenvalues are given as above by the ratios of diagonal elements of S to corresponding elements of T.

There is also an analogue of the Jordan canonical form called the Kronecker canonical form (KCF). A full description of the KCF, including analogues of principal vectors and so forth, is beyond the scope of this book. In this chapter, we present only statements of the basic theorems and some examples. The first theorem pertains only to "square" regular pencils, while the full KCF in all its generality applies also to "rectangular" and singular pencils.

Theorem 12.10. Let A, B ∈ C^{n×n} and suppose the pencil A − λB is regular. Then there exist nonsingular matrices P, Q ∈ C^{n×n} such that

\[ P(A - \lambda B)Q = \begin{bmatrix} J & 0 \\ 0 & I \end{bmatrix} - \lambda \begin{bmatrix} I & 0 \\ 0 & N \end{bmatrix}, \]

where J is a Jordan canonical form corresponding to the finite eigenvalues of A − λB and N is a nilpotent matrix of Jordan blocks associated with 0 and corresponding to the infinite eigenvalues of A − λB.
Example 12.11. The matrix pencil

\[ \begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]

with characteristic polynomial (λ − 2)² has a finite eigenvalue 2 of multiplicity 2 and three infinite eigenvalues.

Theorem 12.12 (Kronecker Canonical Form). Let A, B ∈ C^{m×n}. Then there exist nonsingular matrices P ∈ C^{m×m} and Q ∈ C^{n×n} such that

\[ P(A - \lambda B)Q = \mathrm{diag}\left(L_{l_1}, \ldots, L_{l_s},\; L_{r_1}^T, \ldots, L_{r_t}^T,\; J - \lambda I,\; I - \lambda N\right), \]

where N is nilpotent, both N and J are in Jordan canonical form, and L_k is the (k + 1) × k bidiagonal pencil

\[ L_k = \begin{bmatrix} -\lambda & 0 & \cdots & 0 \\ 1 & -\lambda & & \vdots \\ 0 & 1 & \ddots & 0 \\ \vdots & & \ddots & -\lambda \\ 0 & \cdots & 0 & 1 \end{bmatrix}. \]

The l_i are called the left minimal indices while the r_i are called the right minimal indices. Left or right minimal indices can take the value 0.
Example 12.13. Consider a 13 × 12 block diagonal matrix whose diagonal blocks include the pencil

\[ \begin{bmatrix} -\lambda & 0 \\ 1 & -\lambda \\ 0 & 1 \end{bmatrix}. \]

Such a matrix is in KCF. The first block of zeros actually corresponds to L_0, L_0, L_0, L_0^T, L_0^T, where each L_0 has "zero columns" and one row, while each L_0^T has "zero rows" and one column. The second block is L_1 while the third block is L_2. The next two blocks correspond to

\[ J = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \]

while the nilpotent matrix N in this example is

\[ N = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}. \]

Just as sets of eigenvectors span A-invariant subspaces in the case of the standard eigenproblem (recall Definition 9.35), there is an analogous geometric concept for the generalized eigenproblem.
Definition 12.14. Let A, B ∈ R^{n×n} and suppose the pencil A − λB is regular. Then V is a deflating subspace if

\[ \dim(A\mathcal{V} + B\mathcal{V}) = \dim\mathcal{V}. \tag{12.4} \]

Just as in the standard eigenvalue case, there is a matrix characterization of deflating subspace. Specifically, suppose S ∈ R^{n×k} is a matrix whose columns span a k-dimensional subspace S of R^n, i.e., R(S) = S. Then S is a deflating subspace for the pencil A − λB if and only if there exists M ∈ R^{k×k} such that

\[ A S = B S M. \tag{12.5} \]
If B = I, then (12.4) becomes dim(AV + V) = dim V, which is clearly equivalent to AV ⊆ V. Similarly, (12.5) becomes AS = SM as before. If the pencil is not regular, there is a concept analogous to deflating subspace called a reducing subspace.

12.3    Application to the Computation of System Zeros

Consider the linear system

    x' = Ax + Bu,
    y  = Cx + Du

with A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, C ∈ ℝ^{p×n}, and D ∈ ℝ^{p×m}. This linear time-invariant state-space model is often used in multivariable control theory, where x (= x(t)) is called the state vector, u is the vector of inputs or controls, and y is the vector of outputs or observables. For details, see, for example, [26].
In general, the (finite) zeros of this system are given by the (finite) complex numbers z where the "system pencil"

    [ A - zI   B ]
    [   C      D ]                                           (12.6)

drops rank. In the special case p = m, these values are the generalized eigenvalues of the (n + m) x (n + m) pencil.
Example 12.15. Let

    A = [ -4  -3 ],    B = [ 3 ],    C = [ 1  2 ],    D = 0.
        [  2   1 ]         [ 1 ]

Then the transfer matrix (see [26]) of this system is

    g(s) = C(sI - A)^{-1}B + D = (5s + 14)/(s^2 + 3s + 2),

which clearly has a zero at -2.8. Checking the finite eigenvalues of the pencil (12.6), we find the characteristic polynomial to be

    det [ A - λI   B ] = 5λ + 14,
        [   C      D ]

which has a root at -2.8.
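Since p = m here, the zero can be computed directly as the finite generalized eigenvalue of the system pencil (12.6). The following NumPy/SciPy sketch is an added illustration using the data of Example 12.15; it is not code from the original text.

import numpy as np
from scipy.linalg import eig

A = np.array([[-4.0, -3.0], [2.0, 1.0]])
B = np.array([[3.0], [1.0]])
C = np.array([[1.0, 2.0]])
D = np.array([[0.0]])

# System pencil [[A, B], [C, D]] - z * [[I, 0], [0, 0]]  (cf. (12.6))
P = np.block([[A, B], [C, D]])
Q = np.block([[np.eye(2), np.zeros((2, 1))], [np.zeros((1, 3))]])

z = eig(P, Q, right=False)
print(z[np.isfinite(z)])   # the single finite zero, approximately -2.8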
The method of finding system zeros via a generalized eigenvalue problem also works well for general multi-input, multi-output systems. Numerically, however, one must be careful first to "deflate out" the infinite zeros (infinite eigenvalues of (12.6)). This is accomplished by computing a certain unitary equivalence on the system pencil that then yields a smaller generalized eigenvalue problem with only finite generalized eigenvalues (the finite zeros).
The connection between system zeros and the corresponding system pencil is non-trivial. However, we offer some insight below into the special case of a single-input,

single-output system. Specifically, let B = b ∈ ℝ^n, C = c^T ∈ ℝ^{1×n}, and D = d ∈ ℝ. Furthermore, let g(s) = c^T(sI - A)^{-1}b + d denote the system transfer function (matrix), and assume that g(s) can be written in the form

    g(s) = ν(s)/π(s),

where π(s) is the characteristic polynomial of A, and ν(s) and π(s) are relatively prime (i.e., there are no "pole/zero cancellations").
Suppose z ∈ ℂ is such that

    [ A - zI   b ]
    [   c^T    d ]

is singular. Then there exists a nonzero solution to

    [ A - zI   b ] [ x ]   [ 0 ]
    [   c^T    d ] [ y ] = [ 0 ]

or

    (A - zI)x + by = 0,                                      (12.7)
    c^T x + dy = 0.                                          (12.8)

Assuming z is not an eigenvalue of A (i.e., no pole/zero cancellations), then from (12.7) we get

    x = -(A - zI)^{-1}by.                                    (12.9)

Substituting this in (12.8), we have

    -c^T(A - zI)^{-1}by + dy = 0,

or g(z)y = 0 by the definition of g. Now y ≠ 0 (else x = 0 from (12.9)). Hence g(z) = 0, i.e., z is a zero of g.

12.4    Symmetric Generalized Eigenvalue Problems

A very important special case of the generalized eigenvalue problem

    Ax = λBx                                                 (12.10)

for A, B ∈ ℝ^{n×n} arises when A = A^T and B = B^T > 0. For example, the second-order system of differential equations

    M x'' + Kx = 0,

where M is a symmetric positive definite "mass matrix" and K is a symmetric "stiffness matrix," is a frequently employed model of structures or vibrating systems and yields a generalized eigenvalue problem of the form (12.10).
Since B is positive definite it is nonsingular. Thus, the problem (12.10) is equivalent to the standard eigenvalue problem B^{-1}Ax = λx. However, B^{-1}A is not necessarily symmetric.
Example 12.16. Let

    A = [ 1  3 ],    B = [ 2  1 ].
        [ 3  2 ]         [ 1  1 ]

Then

    B^{-1}A = [ -2  1 ],
              [  5  1 ]

which is not symmetric. Nevertheless, the eigenvalues of B^{-1}A are always real (and are approximately 2.1926 and -3.1926 in Example 12.16).
Theorem 12.17. Let A, B ∈ ℝ^{n×n} with A = A^T and B = B^T > 0. Then the generalized eigenvalue problem

    Ax = λBx

has n real eigenvalues, and the n corresponding right eigenvectors can be chosen to be orthogonal with respect to the inner product <x, y>_B = x^T By. Moreover, if A > 0, then the eigenvalues are also all positive.
Proof: Since B > 0, it has a Cholesky factorization B = LL^T, where L is nonsingular (Theorem 10.23). Then the eigenvalue problem

    Ax = λBx = λLL^T x

can be rewritten as the equivalent problem

    L^{-1}AL^{-T}(L^T x) = λ(L^T x).                         (12.11)

Letting C = L^{-1}AL^{-T} and z = L^T x, (12.11) can then be rewritten as

    Cz = λz.                                                 (12.12)

Since C = C^T, the eigenproblem (12.12) has n real eigenvalues, with corresponding eigenvectors z_1, ..., z_n satisfying

    z_i^T z_j = δ_{ij}.

Then x_i = L^{-T}z_i, i = 1, ..., n, are eigenvectors of the original generalized eigenvalue problem and satisfy

    <x_i, x_j>_B = x_i^T B x_j = (z_i^T L^{-1})(LL^T)(L^{-T}z_j) = δ_{ij}.

Finally, if A = A^T > 0, then C = C^T > 0, so the eigenvalues are positive.  □
Example 12.18. The Cholesky factor for the matrix B in Example 12.16 is

    L = [  √2       0   ].
        [ 1/√2    1/√2  ]

Then it is easily checked that

    C = L^{-1}AL^{-T} = [ 0.5   2.5 ],
                        [ 2.5  -1.5 ]

whose eigenvalues are approximately 2.1926 and -3.1926 as expected.
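In practice one does not form B^{-1}A explicitly; library routines work from the Cholesky reduction used in the proof of Theorem 12.17. The following sketch is an added illustration (not from the text) using scipy.linalg.eigh, which accepts the pair (A, B) directly, and it reproduces the numbers of Examples 12.16 and 12.18.

import numpy as np
from scipy.linalg import eigh, cholesky

A = np.array([[1.0, 3.0], [3.0, 2.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])

# Symmetric-definite generalized problem Ax = lambda*Bx.
w, X = eigh(A, B)
print(w)                   # approximately -3.1926 and 2.1926, all real

# B-orthogonality of the eigenvectors: X^T B X = I.
print(np.round(X.T @ B @ X, 10))

# Explicit Cholesky reduction C = L^{-1} A L^{-T} as in the proof of Theorem 12.17.
L = cholesky(B, lower=True)
Cmat = np.linalg.solve(L, np.linalg.solve(L, A).T).T
print(Cmat)                # [[0.5, 2.5], [2.5, -1.5]], cf. Example 12.18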
The material of this section can, of course, be generalized easily to the case where A and B are Hermitian, but since real-valued matrices are commonly used in most applications, we have restricted our attention to that case only.

12.5    Simultaneous Diagonalization

Recall that many matrices can be diagonalized by a similarity. In particular, normal matrices can be diagonalized by a unitary similarity. It turns out that in some cases a pair of matrices (A, B) can be simultaneously diagonalized by the same matrix. There are many such results and we present only a representative (but important and useful) theorem here. Again, we restrict our attention only to the real case, with the complex case following in a straightforward way.
Theorem 12.19 (Simultaneous Reduction to Diagonal Form). Let A, B ∈ ℝ^{n×n} with A = A^T and B = B^T > 0. Then there exists a nonsingular matrix Q such that

    Q^T A Q = D    and    Q^T B Q = I,

where D is diagonal. In fact, the diagonal elements of D are the eigenvalues of B^{-1}A.

Proof: Let B = LL^T be the Cholesky factorization of B and set C = L^{-1}AL^{-T}. Since C is symmetric, there exists an orthogonal matrix P such that P^T C P = D, where D is diagonal. Let Q = L^{-T}P. Then

    Q^T A Q = P^T L^{-1} A L^{-T} P = P^T C P = D

and

    Q^T B Q = P^T L^{-1}(LL^T)L^{-T} P = P^T P = I.

Finally, since QDQ^{-1} = QQ^T AQQ^{-1} = L^{-T}PP^T L^{-1}A = L^{-T}L^{-1}A = B^{-1}A, we have Λ(D) = Λ(B^{-1}A).  □
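A direct transcription of this construction, included here as an illustrative sketch under the same assumptions A = A^T and B = B^T > 0 (it is not code from the text), is:

import numpy as np
from scipy.linalg import cholesky, eigh

def simultaneous_diag(A, B):
    """Return nonsingular Q with Q^T A Q diagonal and Q^T B Q = I."""
    L = cholesky(B, lower=True)                        # B = L L^T
    C = np.linalg.solve(L, np.linalg.solve(L, A).T).T  # C = L^{-1} A L^{-T}
    d, P = eigh(C)                                     # P^T C P = diag(d)
    Q = np.linalg.solve(L.T, P)                        # Q = L^{-T} P
    return Q, d

A = np.array([[1.0, 3.0], [3.0, 2.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])
Q, d = simultaneous_diag(A, B)
print(np.round(Q.T @ A @ Q, 10))   # diag(d)
print(np.round(Q.T @ B @ Q, 10))   # identity
print(np.sort(d), np.sort(np.linalg.eigvals(np.linalg.solve(B, A))))  # same spectrum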

Note that Q is not in general orthogonal, so it does not preserve eigenvalues of A and B individually. However, it does preserve the eigenvalues of A - λB. This can be seen directly. Let Ã = Q^T A Q and B̃ = Q^T B Q. Then B̃^{-1}Ã = Q^{-1}B^{-1}Q^{-T}Q^T A Q = Q^{-1}B^{-1}AQ.
Theorem 12.19 is very useful for reducing many statements about pairs of symmetric matrices to "the diagonal case." The following is typical.
Theorem 12.20. Let A, B ∈ ℝ^{n×n} be positive definite. Then A ≥ B if and only if B^{-1} ≥ A^{-1}.

Proof: By Theorem 12.19, there exists a nonsingular Q ∈ ℝ^{n×n} such that Q^T A Q = D and Q^T B Q = I, where D is diagonal. Now D > 0 by Theorem 10.31. Also, since A ≥ B, by Theorem 10.21 we have that Q^T A Q ≥ Q^T B Q, i.e., D ≥ I. But then D^{-1} ≤ I (this is trivially true since the two matrices are diagonal). Thus, QD^{-1}Q^T ≤ QQ^T, i.e., A^{-1} ≤ B^{-1}.  □

12.5.1    Simultaneous diagonalization via SVD

There are situations in which forming C = L^{-1}AL^{-T} as in the proof of Theorem 12.19 is numerically problematic, e.g., when L is highly ill conditioned with respect to inversion. In such cases, simultaneous reduction can also be accomplished via an SVD. To illustrate, let

us assume that both A and B are positive definite. Further, let A = L_A L_A^T and B = L_B L_B^T be Cholesky factorizations of A and B, respectively. Compute the SVD

    L_B^{-1} L_A = UΣV^T,                                    (12.13)

where Σ ∈ ℝ^{n×n} is diagonal. Then the matrix Q = L_B^{-T}U performs the simultaneous diagonalization. To check this, note that

    Q^T A Q = U^T L_B^{-1}(L_A L_A^T)L_B^{-T} U
            = U^T UΣV^T VΣU^T U
            = Σ^2

while

    Q^T B Q = U^T L_B^{-1}(L_B L_B^T)L_B^{-T} U
            = U^T U
            = I.
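The SVD-based reduction translates into code the same way. The sketch below is an added illustration (assuming both A and B are symmetric positive definite; the test matrices are arbitrary, not from the text):

import numpy as np
from scipy.linalg import cholesky

def simultaneous_diag_svd(A, B):
    """Q = L_B^{-T} U, where L_B^{-1} L_A = U Sigma V^T; then
    Q^T A Q = Sigma^2 and Q^T B Q = I."""
    LA = cholesky(A, lower=True)
    LB = cholesky(B, lower=True)
    U, s, Vt = np.linalg.svd(np.linalg.solve(LB, LA))
    Q = np.linalg.solve(LB.T, U)
    return Q, s

A = np.array([[4.0, 1.0], [1.0, 3.0]])
B = np.array([[2.0, 1.0], [1.0, 1.0]])
Q, s = simultaneous_diag_svd(A, B)
print(np.round(Q.T @ A @ Q, 10))   # diag(s**2)
print(np.round(Q.T @ B @ Q, 10))   # identity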
Remark 12.21. The SVD in (12.13) can be computed without explicitly forming the indicated matrix product or the inverse by using the so-called generalized singular value decomposition (GSVD). Note that the singular values of L_B^{-1}L_A can be found from the eigenvalue problem

    L_B^{-1} L_A L_A^T L_B^{-T} x = λx.                      (12.14)

Letting x = L_B^T z we see that (12.14) can be rewritten in the form L_A L_A^T z = λ L_B x = λ L_B L_B^T z, which is thus equivalent to the generalized eigenvalue problem

    L_A L_A^T z = λ L_B L_B^T z.                             (12.15)

The problem (12.15) is called a generalized singular value problem and algorithms exist to solve it (and hence equivalently (12.13)) via arithmetic operations performed only on L_A and L_B separately, i.e., without forming the products L_A L_A^T or L_B L_B^T explicitly; see, for example, [7, Sec. 8.7.3]. This is analogous to finding the singular values of a matrix M by operations performed directly on M rather than by forming the matrix M^T M and solving the eigenproblem M^T M x = λx.
Remark 12.22. Various generalizations of the results in Remark 12.21 are possible, for example, when A = A^T ≥ 0. The case when A is symmetric but indefinite is not so straightforward, at least in real arithmetic. For example, A can be written as A = PDP^T, where D is diagonal and P is orthogonal, but in writing A = PD̃D̃P^T = PD̃(PD̃)^T with D̃ diagonal, D̃ may have pure imaginary elements.

12.6    Higher-Order Eigenvalue Problems

Consider the second-order system of differential equations

    M q'' + C q' + Kq = 0,                                   (12.16)

where q(t) ∈ ℝ^n and M, C, K ∈ ℝ^{n×n}. Assume for simplicity that M is nonsingular. Suppose, by analogy with the first-order case, that we try to find a solution of (12.16) of the form q(t) = e^{λt} p, where the n-vector p and scalar λ are to be determined. Substituting in (12.16) we get

    λ^2 e^{λt} Mp + λ e^{λt} Cp + e^{λt} Kp = 0

or, since e^{λt} ≠ 0,

    (λ^2 M + λC + K) p = 0.

To get a nonzero solution p, we thus seek values of λ for which the matrix λ^2 M + λC + K is singular. Since the determinantal equation

    0 = det(λ^2 M + λC + K) = (det M) λ^{2n} + ...

yields a polynomial of degree 2n, there are 2n eigenvalues for the second-order (or quadratic) eigenvalue problem λ^2 M + λC + K.
A special case of (12.16) arises frequently in applications: M = I, C = 0, and K = K^T. Suppose K has eigenvalues

    μ_1 ≥ ... ≥ μ_r ≥ 0 > μ_{r+1} ≥ ... ≥ μ_n.

Let ω_k = |μ_k|^{1/2}. Then the 2n eigenvalues of the second-order eigenvalue problem λ^2 I + K are

    ± jω_k,   k = 1, ..., r;
    ± ω_k,    k = r + 1, ..., n.

If r = n (i.e., K = K^T ≥ 0), then all solutions of q'' + Kq = 0 are oscillatory.

12.6.1    Conversion to first-order form

Let x_1 = q and x_2 = q'. Then (12.16) can be written as a first-order system (with block companion matrix)

    x' = [     0          I     ] x,
         [ -M^{-1}K   -M^{-1}C  ]

where x(t) ∈ ℝ^{2n}. If M is singular, or if it is desired to avoid the calculation of M^{-1} because M is too ill conditioned with respect to inversion, the second-order problem (12.16) can still be converted to the first-order generalized linear system

    [ I  0 ] x' = [  0   I ] x.
    [ 0  M ]      [ -K  -C ]
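A small numerical sketch of this linearization follows; it is an added illustration with arbitrary test data (not from the text), assuming NumPy/SciPy.

import numpy as np
from scipy.linalg import eig

n = 3
rng = np.random.default_rng(0)
M = np.eye(n)
C = rng.standard_normal((n, n))
K = rng.standard_normal((n, n))

# First-order generalized form  [[I,0],[0,M]] x' = [[0,I],[-K,-C]] x
E = np.block([[np.eye(n), np.zeros((n, n))], [np.zeros((n, n)), M]])
F = np.block([[np.zeros((n, n)), np.eye(n)], [-K, -C]])
lam = eig(F, E, right=False)

# Each of the 2n values satisfies det(lam^2 M + lam C + K) = 0 (up to roundoff).
print(sorted(abs(np.linalg.det(l**2 * M + l * C + K)) for l in lam))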
Many other first-order realizations are possible. Some can be useful when M, C, and/or K have special symmetry or skew-symmetry properties that can be exploited.
Higher-order analogues of (12.16) involving, say, the kth derivative of q, lead naturally to higher-order eigenvalue problems that can be converted to first-order form using a kn x kn block companion matrix analogue of (11.19). Similar procedures hold for the general kth-order difference equation, which can be converted to various first-order systems of dimension kn.

EXERCISES

1. Suppose A ∈ ℝ^{n×n} and D ∈ ℝ^{m×m} is nonsingular. Show that the finite generalized eigenvalues of the pencil

       [ A  B ]     [ I  0 ]
       [ C  D ] - λ [ 0  0 ]

   are the eigenvalues of the matrix A - BD^{-1}C.

2. Let F, G ∈ ℂ^{n×n}. Show that the nonzero eigenvalues of FG and GF are the same.
   Hint: An easy "trick proof" is to verify that the matrices

       [ FG  0 ]        [ 0   0  ]
       [ G   0 ]  and   [ G   GF ]

   are similar via the similarity transformation

       [ I  F ]
       [ 0  I ].

3. Let F ∈ ℂ^{n×m}, G ∈ ℂ^{m×n}. Are the nonzero singular values of FG and GF the same?

4. Suppose A ∈ ℝ^{n×n}, B ∈ ℝ^{n×m}, and C ∈ ℝ^{m×n}. Show that the generalized eigenvalues of the pencils

       [ A  B ]     [ I  0 ]
       [ C  0 ] - λ [ 0  0 ]

   and

       [ A + BF + GC   B ]     [ I  0 ]
       [      C        0 ] - λ [ 0  0 ]

   are identical for all F ∈ ℝ^{m×n} and all G ∈ ℝ^{n×m}.
   Hint: Consider the equivalence

       [ I  G ] [ A - λI  B ] [ I  0 ]
       [ 0  I ] [   C     0 ] [ F  I ].

   (A similar result is also true for "nonsquare" pencils. In the parlance of control theory, such results show that zeros are invariant under state feedback or output injection.)

5. Another family of simultaneous diagonalization problems arises when it is desired that the simultaneous diagonalizing transformation Q operates on matrices A, B ∈ ℝ^{n×n} in such a way that Q^{-1}AQ^{-T} and Q^T BQ are simultaneously diagonal. Such a transformation is called contragredient. Consider the case where both A and B are positive definite with Cholesky factorizations A = L_A L_A^T and B = L_B L_B^T, respectively, and let UΣV^T be an SVD of L_B^T L_A.

   (a) Show that Q = L_A V Σ^{-1/2} is a contragredient transformation that reduces both A and B to the same diagonal matrix.
   (b) Show that Q^{-1} = Σ^{-1/2} U^T L_B^T.
   (c) Show that the eigenvalues of AB are the same as those of Σ^2 and hence are positive.
Chapter 13

Kronecker Products

13.1    Definition and Examples

Definition 13.1. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q}. Then the Kronecker product (or tensor product) of A and B is defined as the matrix

              [ a_{11}B  ...  a_{1n}B ]
    A ⊗ B  =  [    :              :   ]  ∈ ℝ^{mp×nq}.        (13.1)
              [ a_{m1}B  ...  a_{mn}B ]

Obviously, the same definition holds if A and B are complex-valued matrices. We restrict our attention in this chapter primarily to real-valued matrices, pointing out the extension to the complex case only where it is not obvious.
Example 13.2.

1. Let A = [ 1  2  3 ]  and  B = [ 2  1 ].  Then
           [ 3  2  1 ]           [ 2  3 ]

       A ⊗ B = [  B  2B  3B ] = [ 2  1  4  2  6  3 ]
               [ 3B  2B   B ]   [ 2  3  4  6  6  9 ]
                                [ 6  3  4  2  2  1 ]
                                [ 6  9  4  6  2  3 ]

   Note that B ⊗ A ≠ A ⊗ B.

2. For any B ∈ ℝ^{p×q},  I_2 ⊗ B = [ B  0 ].
                                    [ 0  B ]

   Replacing I_2 by I_n yields a block diagonal matrix with n copies of B along the diagonal.

3. Let B be an arbitrary 2 x 2 matrix. Then

       B ⊗ I_2 = [ b_{11}    0     b_{12}    0    ]
                 [   0     b_{11}    0     b_{12} ]
                 [ b_{21}    0     b_{22}    0    ]
                 [   0     b_{21}    0     b_{22} ]

   The extension to arbitrary B and I_n is obvious.

4. Let x ∈ ℝ^m, y ∈ ℝ^n. Then

       x ⊗ y = [x_1 y^T, ..., x_m y^T]^T = [x_1 y_1, ..., x_1 y_n, x_2 y_1, ..., x_m y_n]^T ∈ ℝ^{mn}.

5. Let x ∈ ℝ^m, y ∈ ℝ^n. Then x ⊗ y^T = x y^T = y^T ⊗ x.
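These identities are easy to experiment with numerically; numpy.kron implements (13.1) directly. The following quick check of the first example is an added illustration, not part of the original text.

import numpy as np

A = np.array([[1, 2, 3], [3, 2, 1]])
B = np.array([[2, 1], [2, 3]])

print(np.kron(A, B))                                   # the 4 x 6 matrix A ⊗ B above
print(np.array_equal(np.kron(A, B), np.kron(B, A)))    # False: B ⊗ A != A ⊗ B

x = np.array([1, 2])
y = np.array([3, 4, 5])
print(np.kron(x, y))                                   # [3 4 5 6 8 10], i.e. [x1*y, x2*y]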

13.2    Properties of the Kronecker Product

Theorem 13.3. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{r×s}, C ∈ ℝ^{n×p}, and D ∈ ℝ^{s×t}. Then

    (A ⊗ B)(C ⊗ D) = AC ⊗ BD    (∈ ℝ^{mr×pt}).              (13.2)

Proof: Simply verify that the (i, j) block of (A ⊗ B)(C ⊗ D) is

    Σ_{k=1}^n a_{ik} c_{kj} BD,

which is precisely the (i, j) block of AC ⊗ BD.  □

Theorem 13.4. For all A and B, (A ⊗ B)^T = A^T ⊗ B^T.

Proof: For the proof, simply verify using the definitions of transpose and Kronecker product.  □

Corollary 13.5. If A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m} are symmetric, then A ⊗ B is symmetric.

Theorem 13.6. If A and B are nonsingular, (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}.

Proof: Using Theorem 13.3, simply note that (A ⊗ B)(A^{-1} ⊗ B^{-1}) = I ⊗ I = I.  □
Theorem 13.7. If A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m} are normal, then A ⊗ B is normal.

Proof:

    (A ⊗ B)^T(A ⊗ B) = (A^T ⊗ B^T)(A ⊗ B)     by Theorem 13.4
                      = A^T A ⊗ B^T B          by Theorem 13.3
                      = AA^T ⊗ BB^T            since A and B are normal
                      = (A ⊗ B)(A ⊗ B)^T       by Theorem 13.3.  □

Corollary 13.8. If A ∈ ℝ^{n×n} is orthogonal and B ∈ ℝ^{m×m} is orthogonal, then A ⊗ B is orthogonal.

Example 13.9. Let

    A = [  cos θ   sin θ ]    and    B = [  cos φ   sin φ ].
        [ -sin θ   cos θ ]               [ -sin φ   cos φ ]

Then it is easily seen that A is orthogonal with eigenvalues e^{±jθ} and B is orthogonal with eigenvalues e^{±jφ}. The 4 x 4 matrix A ⊗ B is then also orthogonal with eigenvalues e^{±j(θ+φ)} and e^{±j(θ-φ)}.

Theorem 13.10. Let A ∈ ℝ^{m×n} have a singular value decomposition U_A Σ_A V_A^T and let B ∈ ℝ^{p×q} have a singular value decomposition U_B Σ_B V_B^T. Then

    (U_A ⊗ U_B)(Σ_A ⊗ Σ_B)(V_A ⊗ V_B)^T

yields a singular value decomposition of A ⊗ B (after a simple reordering of the diagonal elements of Σ_A ⊗ Σ_B and the corresponding right and left singular vectors).

Corollary 13.11. Let A ∈ ℝ^{m×n} have singular values σ_1 ≥ ... ≥ σ_r > 0 and let B ∈ ℝ^{p×q} have singular values τ_1 ≥ ... ≥ τ_s > 0. Then A ⊗ B (or B ⊗ A) has rs singular values σ_1 τ_1 ≥ ... ≥ σ_r τ_s > 0 and

    rank(A ⊗ B) = (rank A)(rank B) = rank(B ⊗ A).

Theorem 13.12. Let A ∈ ℝ^{n×n} have eigenvalues λ_i, i = 1, ..., n, and let B ∈ ℝ^{m×m} have eigenvalues μ_j, j = 1, ..., m. Then the mn eigenvalues of A ⊗ B are

    λ_1 μ_1, ..., λ_1 μ_m, λ_2 μ_1, ..., λ_2 μ_m, ..., λ_n μ_m.

Moreover, if x_1, ..., x_p are linearly independent right eigenvectors of A corresponding to λ_1, ..., λ_p (p ≤ n), and z_1, ..., z_q are linearly independent right eigenvectors of B corresponding to μ_1, ..., μ_q (q ≤ m), then x_i ⊗ z_j ∈ ℝ^{mn} are linearly independent right eigenvectors of A ⊗ B corresponding to λ_i μ_j, i = 1, ..., p, j = 1, ..., q.

Proof: The basic idea of the proof is as follows:

    (A ⊗ B)(x ⊗ z) = Ax ⊗ Bz
                    = λx ⊗ μz
                    = λμ(x ⊗ z).  □
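A quick numerical confirmation of Theorem 13.12 (an added illustration with random test matrices, not from the text):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

lam = np.linalg.eigvals(A)
mu = np.linalg.eigvals(B)
products = np.sort_complex(np.array([l * m for l in lam for m in mu]))
kron_eigs = np.sort_complex(np.linalg.eigvals(np.kron(A, B)))
print(np.allclose(products, kron_eigs))   # True: eigenvalues of A ⊗ B are the products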

If A and B are diagonalizable in Theorem 13.12, we can take p = n and q = m and thus get the complete eigenstructure of A ⊗ B. In general, if A and B have Jordan form decompositions given by P^{-1}AP = J_A and Q^{-1}BQ = J_B, respectively, then we get the following Jordan-like structure:

    (P ⊗ Q)^{-1}(A ⊗ B)(P ⊗ Q) = (P^{-1} ⊗ Q^{-1})(A ⊗ B)(P ⊗ Q)
                                = (P^{-1}AP) ⊗ (Q^{-1}BQ)
                                = J_A ⊗ J_B.

Note that J_A ⊗ J_B, while upper triangular, is generally not quite in Jordan form and needs further reduction (to an ultimate Jordan form that also depends on whether or not certain eigenvalues are zero or nonzero).
A Schur form for A ⊗ B can be derived similarly. For example, suppose P and Q are unitary matrices that reduce A and B, respectively, to Schur (triangular) form, i.e., P^H AP = T_A and Q^H BQ = T_B (and similarly if P and Q are orthogonal similarities reducing A and B to real Schur form). Then

    (P ⊗ Q)^H(A ⊗ B)(P ⊗ Q) = (P^H ⊗ Q^H)(A ⊗ B)(P ⊗ Q)
                             = (P^H AP) ⊗ (Q^H BQ)
                             = T_A ⊗ T_B.
Corollary 13.13. Let A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m}. Then

1. Tr(A ⊗ B) = (Tr A)(Tr B) = Tr(B ⊗ A).

2. det(A ⊗ B) = (det A)^m (det B)^n = det(B ⊗ A).

Definition 13.14. Let A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m}. Then the Kronecker sum (or tensor sum) of A and B, denoted A ⊕ B, is the mn x mn matrix (I_m ⊗ A) + (B ⊗ I_n). Note that, in general, A ⊕ B ≠ B ⊕ A.
Example 13.15.

1. Let

       A = [ 1  2  3 ]               B = [ 2  1 ].
           [ 3  2  1 ]     and           [ 2  3 ]
           [ 1  1  4 ]

   Then

       A ⊕ B = (I_2 ⊗ A) + (B ⊗ I_3)

             = [ 1  2  3  0  0  0 ]   [ 2  0  0  1  0  0 ]
               [ 3  2  1  0  0  0 ]   [ 0  2  0  0  1  0 ]
               [ 1  1  4  0  0  0 ] + [ 0  0  2  0  0  1 ]
               [ 0  0  0  1  2  3 ]   [ 2  0  0  3  0  0 ]
               [ 0  0  0  3  2  1 ]   [ 0  2  0  0  3  0 ]
               [ 0  0  0  1  1  4 ]   [ 0  0  2  0  0  3 ]

   The reader is invited to compute B ⊕ A = (I_3 ⊗ B) + (A ⊗ I_2) and note the difference with A ⊕ B.
2. Recall the real JCF

       J = [ M  I              ]
           [    M  I           ]
           [       .   .        ]   ∈ ℝ^{2k×2k},
           [            M  I    ]
           [               M    ]

   where M = [  α  β ]. Define
             [ -β  α ]

       E_k = [ 0  1           ]
             [    0  1        ]
             [       .   .     ]   ∈ ℝ^{k×k}.
             [            0  1 ]
             [               0 ]

   Then J can be written in the very compact form J = (I_k ⊗ M) + (E_k ⊗ I_2) = M ⊕ E_k.
Theorem 13.16. Let A ∈ ℝ^{n×n} have eigenvalues λ_i, i = 1, ..., n, and let B ∈ ℝ^{m×m} have eigenvalues μ_j, j = 1, ..., m. Then the Kronecker sum A ⊕ B = (I_m ⊗ A) + (B ⊗ I_n) has mn eigenvalues

    λ_1 + μ_1, ..., λ_1 + μ_m, λ_2 + μ_1, ..., λ_2 + μ_m, ..., λ_n + μ_m.

Moreover, if x_1, ..., x_p are linearly independent right eigenvectors of A corresponding to λ_1, ..., λ_p (p ≤ n), and z_1, ..., z_q are linearly independent right eigenvectors of B corresponding to μ_1, ..., μ_q (q ≤ m), then z_j ⊗ x_i ∈ ℝ^{mn} are linearly independent right eigenvectors of A ⊕ B corresponding to λ_i + μ_j, i = 1, ..., p, j = 1, ..., q.

Proof: The basic idea of the proof is as follows:

    [(I_m ⊗ A) + (B ⊗ I_n)](z ⊗ x) = (z ⊗ Ax) + (Bz ⊗ x)
                                    = (z ⊗ λx) + (μz ⊗ x)
                                    = (λ + μ)(z ⊗ x).  □
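Again this is easy to verify numerically; the following check is an added illustration, not part of the original text.

import numpy as np

def kron_sum(A, B):
    """Kronecker sum (I_m ⊗ A) + (B ⊗ I_n) as in Definition 13.14."""
    n, m = A.shape[0], B.shape[0]
    return np.kron(np.eye(m), A) + np.kron(B, np.eye(n))

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

sums = np.sort_complex(np.array([l + m for m in np.linalg.eigvals(B)
                                 for l in np.linalg.eigvals(A)]))
print(np.allclose(np.sort_complex(np.linalg.eigvals(kron_sum(A, B))), sums))  # True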

If A and B are diagonalizable in Theorem 13.16, we can take p = n and q = m and thus get the complete eigenstructure of A ⊕ B. In general, if A and B have Jordan form decompositions given by P^{-1}AP = J_A and Q^{-1}BQ = J_B, respectively, then

    [(Q ⊗ I_n)(I_m ⊗ P)]^{-1}[(I_m ⊗ A) + (B ⊗ I_n)][(Q ⊗ I_n)(I_m ⊗ P)]
      = [(I_m ⊗ P)^{-1}(Q ⊗ I_n)^{-1}][(I_m ⊗ A) + (B ⊗ I_n)][(Q ⊗ I_n)(I_m ⊗ P)]
      = [(I_m ⊗ P^{-1})(Q^{-1} ⊗ I_n)][(I_m ⊗ A) + (B ⊗ I_n)][(Q ⊗ I_n)(I_m ⊗ P)]
      = (I_m ⊗ J_A) + (J_B ⊗ I_n)

is a Jordan-like structure for A ⊕ B.
B.
A Schur form for A ⊕ B can be derived similarly. Again, suppose P and Q are unitary matrices that reduce A and B, respectively, to Schur (triangular) form, i.e., P^H AP = T_A and Q^H BQ = T_B (and similarly if P and Q are orthogonal similarities reducing A and B to real Schur form). Then

    [(Q ⊗ I_n)(I_m ⊗ P)]^H[(I_m ⊗ A) + (B ⊗ I_n)][(Q ⊗ I_n)(I_m ⊗ P)] = (I_m ⊗ T_A) + (T_B ⊗ I_n),

where [(Q ⊗ I_n)(I_m ⊗ P)] = (Q ⊗ P) is unitary by Theorem 13.3 and Corollary 13.8.

13.3    Application to Sylvester and Lyapunov Equations

In this section we study the linear matrix equation

    AX + XB = C,                                             (13.3)

where A ∈ ℝ^{n×n}, B ∈ ℝ^{m×m}, and C ∈ ℝ^{n×m}. This equation is now often called a Sylvester equation in honor of J.J. Sylvester, who studied general linear matrix equations of the form

    Σ_{i=1}^k A_i X B_i = C.

A special case of (13.3) is the symmetric equation

    AX + XA^T = C                                            (13.4)

obtained by taking B = A^T. When C is symmetric, the solution X ∈ ℝ^{n×n} is easily shown also to be symmetric and (13.4) is known as a Lyapunov equation. Lyapunov equations arise naturally in stability theory.
The first important question to ask regarding (13.3) is, When does a solution exist? By writing the matrices in (13.3) in terms of their columns, it is easily seen by equating the ith columns that

    Ax_i + Xb_i = c_i = Ax_i + Σ_{j=1}^m b_{ji} x_j.

These equations can then be rewritten as the mn x mn linear system

    [ A + b_{11}I     b_{21}I    ...     b_{m1}I   ] [ x_1 ]   [ c_1 ]
    [    b_{12}I    A + b_{22}I  ...     b_{m2}I   ] [ x_2 ]   [ c_2 ]
    [       :             :                 :      ] [  :  ] = [  :  ]     (13.5)
    [    b_{1m}I       b_{2m}I   ...  A + b_{mm}I  ] [ x_m ]   [ c_m ]

The coefficient matrix in (13.5) clearly can be written as the Kronecker sum (I_m ⊗ A) + (B^T ⊗ I_n). The following definition is very helpful in completing the writing of (13.5) as an "ordinary" linear system.

Definition 13.17. Let c_i ∈ ℝ^n denote the columns of C ∈ ℝ^{n×m} so that C = [c_1, ..., c_m]. Then vec(C) is defined to be the mn-vector formed by stacking the columns of C on top of one another, i.e.,

    vec(C) = [ c_1 ]
             [ c_2 ]
             [  :  ]  ∈ ℝ^{mn}.
             [ c_m ]

Using Definition 13.17, the linear system (13.5) can be rewritten in the form

    [(I_m ⊗ A) + (B^T ⊗ I_n)]vec(X) = vec(C).                (13.6)

There exists a unique solution to (13.6) if and only if [(I_m ⊗ A) + (B^T ⊗ I_n)] is nonsingular. But [(I_m ⊗ A) + (B^T ⊗ I_n)] is nonsingular if and only if it has no zero eigenvalues. From Theorem 13.16, the eigenvalues of [(I_m ⊗ A) + (B^T ⊗ I_n)] are λ_i + μ_j, where λ_i ∈ Λ(A), i = 1, ..., n, and μ_j ∈ Λ(B), j = 1, ..., m. We thus have the following theorem.
Theorem 13.18. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{m×m}, and C ∈ ℝ^{n×m}. Then the Sylvester equation

    AX + XB = C                                              (13.7)

has a unique solution if and only if A and -B have no eigenvalues in common.
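The vec formulation (13.6) translates directly into a dense linear solve, which is convenient for small problems and for checking other methods. The following sketch is an added illustration (not from the text), assuming NumPy.

import numpy as np

def sylvester_vec(A, B, C):
    """Solve AX + XB = C via the Kronecker/vec formulation (13.6).
    vec stacks columns, hence the Fortran ('F') ordering below."""
    n, m = A.shape[0], B.shape[0]
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten(order="F"))
    return x.reshape((n, m), order="F")

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((4, 3))
X = sylvester_vec(A, B, C)
print(np.allclose(A @ X + X @ B, C))   # True when A and -B share no eigenvalues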
Sylvester equations of the form (13.3) (or symmetric Lyapunov equations of the form (13.4)) are generally not solved using the mn x mn "vec" formulation (13.6). The most commonly preferred numerical algorithm is described in [2]. First A and B are reduced to (real) Schur form. An equivalent linear system is then solved in which the triangular form of the reduced A and B can be exploited to solve successively for the columns of a suitably transformed solution matrix X. Assuming that, say, n ≥ m, this algorithm takes only O(n^3) operations rather than the O(n^6) that would be required by solving (13.6) directly with Gaussian elimination. A further enhancement to this algorithm is available in [6] whereby the larger of A or B is initially reduced only to upper Hessenberg rather than triangular Schur form.
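In SciPy this Schur-based approach is essentially what scipy.linalg.solve_sylvester provides, so in practice one simply calls the library routine. The usage example below is an added illustration.

import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 50))
B = rng.standard_normal((40, 40))
C = rng.standard_normal((50, 40))

X = solve_sylvester(A, B, C)               # solves AX + XB = C via Schur forms
print(np.linalg.norm(A @ X + X @ B - C))   # small residual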
The next few theorems are classical. They culminate in Theorem 13.24, one of many elegant connections between matrix theory and stability theory for differential equations.

Theorem 13.19. Let A ∈ ℝ^{n×n}, B ∈ ℝ^{m×m}, and C ∈ ℝ^{n×m}. Suppose further that A and B are asymptotically stable (a matrix is asymptotically stable if all its eigenvalues have real parts in the open left half-plane). Then the (unique) solution of the Sylvester equation

    AX + XB = C                                              (13.8)

can be written as

    X = -∫_0^{+∞} e^{tA} C e^{tB} dt.                        (13.9)

Proof: Since A and B are stable, λ_i(A) + λ_j(B) ≠ 0 for all i, j, so there exists a unique solution to (13.8) by Theorem 13.18. Now integrate the differential equation X' = AX + XB (with X(0) = C) on [0, +∞):

    lim_{t→+∞} X(t) - X(0) = A ∫_0^{+∞} X(t) dt + (∫_0^{+∞} X(t) dt) B.    (13.10)

results of Section 11.1.6, it can be shown easily that lim elA = lim elB =
Using the results = O.0.
r—>+oo
1-->+00 t—v+oo
1 .... +00

Hence, using the solution XX((t) = elACe


t) = lB from Theorem
etACetB Theorem 11.6, we have that lim XX((t) t) =
— 0.
O.
t~+x
/—<-+3C

Substituting in (13.10) we have

-C = A (1+ 00

elACe lB dt) + (1+ 00

elACe lB dt) B

{+oo
and so X
and so X = -1o elACe lB dt satisfies (13.8). o

Remark 13.20. An equivalent condition for the existence of a unique solution to AX + XB = C is that [[A, C], [0, -B]] be similar to [[A, 0], [0, -B]] (via the similarity [[I, -X], [0, I]]).

Theorem 13.21. Let A, C ∈ ℝ^{n×n}. Then the Lyapunov equation

    AX + XA^T = C                                            (13.11)

has a unique solution if and only if A and -A^T have no eigenvalues in common. If C is symmetric and (13.11) has a unique solution, then that solution is symmetric.

Remark 13.22. If the matrix A ∈ ℝ^{n×n} has eigenvalues λ_1, ..., λ_n, then -A^T has eigenvalues -λ_1, ..., -λ_n. Thus, a sufficient condition that guarantees that A and -A^T have no common eigenvalues is that A be asymptotically stable. Many useful results exist concerning the relationship between stability and Lyapunov equations. Two basic results due to Lyapunov are the following, the first of which follows immediately from Theorem 13.19.

Theorem 13.23. Let A, C ∈ ℝ^{n×n} and suppose further that A is asymptotically stable. Then the (unique) solution of the Lyapunov equation

    AX + XA^T = C

can be written as

    X = -∫_0^{+∞} e^{tA} C e^{tA^T} dt.                      (13.12)

Theorem 13.24. A matrix A ∈ ℝ^{n×n} is asymptotically stable if and only if there exists a positive definite solution to the Lyapunov equation

    AX + XA^T = C,                                           (13.13)

where C = C^T < 0.

Proof: Suppose A is asymptotically stable. By Theorems 13.21 and 13.23 a solution to (13.13) exists and takes the form (13.12). Now let v be an arbitrary nonzero vector in ℝ^n. Then

    v^T X v = -∫_0^{+∞} v^T e^{tA} C e^{tA^T} v dt.

Since -C > 0 and e^{tA} is nonsingular for all t, the integrand above is positive. Hence v^T X v > 0 and thus X is positive definite.
Conversely, suppose X = X^T > 0 and let λ ∈ Λ(A) with corresponding left eigenvector y. Then

    0 > y^H C y = y^H A X y + y^H X A^T y
                = (λ + λ̄) y^H X y.

Since y^H X y > 0, we must have λ + λ̄ = 2 Re λ < 0. Since λ was arbitrary, A must be asymptotically stable.  □
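Theorem 13.24 is easy to exercise numerically. The sketch below is an added illustration: it builds a stable A by shifting a random matrix, solves the Lyapunov equation with C = -I < 0 using scipy.linalg.solve_continuous_lyapunov (which solves AX + XA^H = Q), and checks that the solution is positive definite.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(5)
n = 5
M = rng.standard_normal((n, n))
A = M - (abs(np.linalg.eigvals(M).real).max() + 1.0) * np.eye(n)  # shift: Re(lambda) < 0

X = solve_continuous_lyapunov(A, -np.eye(n))        # AX + XA^T = -I, i.e. C = -I < 0
print(np.linalg.norm(A @ X + X @ A.T + np.eye(n)))  # residual ~ 0
print(np.linalg.eigvalsh((X + X.T) / 2) > 0)        # all True: X is positive definite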
Remark 13.25. The Lyapunov equation AX + XA^T = C can also be written using the vec notation in the equivalent form

    [(I ⊗ A) + (A ⊗ I)]vec(X) = vec(C).

A subtle point arises when dealing with the "dual" Lyapunov equation A^T X + XA = C. The equivalent "vec form" of this equation is

    [(I ⊗ A^T) + (A^T ⊗ I)]vec(X) = vec(C).

However, the complex-valued equation A^H X + XA = C is equivalent to

    [(I ⊗ A^H) + (A^T ⊗ I)]vec(X) = vec(C).

The vec operator has many useful properties, most of which derive from one key result.
Theorem 13.26. For any three matrices A, B, and C for which the matrix product ABC is defined,

    vec(ABC) = (C^T ⊗ A)vec(B).

Proof: The proof follows in a fairly straightforward fashion either directly from the definitions or from the fact that vec(xy^T) = y ⊗ x.  □
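A one-line numerical check of this identity (added here as an illustration; it underlies all of the vec manipulations that follow):

import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((5, 2))

vec = lambda M: M.flatten(order="F")          # stack columns
lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
print(np.allclose(lhs, rhs))                  # True: vec(ABC) = (C^T ⊗ A) vec(B)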

An immediate application is to the derivation of existence and uniqueness conditions for the solution of the simple Sylvester-like equation introduced in Theorem 6.11.

Theorem 13.27. Let A ∈ ℝ^{m×n}, B ∈ ℝ^{p×q}, and C ∈ ℝ^{m×q}. Then the equation

    AXB = C                                                  (13.14)

has a solution X ∈ ℝ^{n×p} if and only if AA⁺CB⁺B = C, in which case the general solution is of the form

    X = A⁺CB⁺ + Y - A⁺AYBB⁺,                                  (13.15)

where Y ∈ ℝ^{n×p} is arbitrary. The solution of (13.14) is unique if BB⁺ ⊗ A⁺A = I.

Proof: Write (13.14) as

    (B^T ⊗ A)vec(X) = vec(C)                                 (13.16)

by Theorem 13.26. This "vector equation" has a solution if and only if

    (B^T ⊗ A)(B^T ⊗ A)⁺ vec(C) = vec(C).

It is a straightforward exercise to show that (M ⊗ N)⁺ = M⁺ ⊗ N⁺. Thus, (13.16) has a solution if and only if

    vec(C) = (B^T ⊗ A)((B⁺)^T ⊗ A⁺)vec(C)
           = [(B⁺B)^T ⊗ AA⁺]vec(C)
           = vec(AA⁺CB⁺B)

and hence if and only if AA⁺CB⁺B = C.
The general solution of (13.16) is then given by

    vec(X) = (B^T ⊗ A)⁺ vec(C) + [I - (B^T ⊗ A)⁺(B^T ⊗ A)]vec(Y),

where Y is arbitrary. This equation can then be rewritten in the form

    vec(X) = ((B⁺)^T ⊗ A⁺)vec(C) + [I - (BB⁺)^T ⊗ A⁺A]vec(Y)

or, using Theorem 13.26,

    X = A⁺CB⁺ + Y - A⁺AYBB⁺.

The solution is clearly unique if BB⁺ ⊗ A⁺A = I.  □

EXERCISES

1. For any two matrices A and B for which the indicated matrix product is defined, show that (vec(A))^T(vec(B)) = Tr(A^T B). In particular, if B ∈ ℝ^{n×n}, then Tr(B) = vec(I_n)^T vec(B).

2. Prove that for all matrices A and B, (A ⊗ B)⁺ = A⁺ ⊗ B⁺.

3. Show that the equation AXB = C has a solution for all C if A has full row rank and B has full column rank. Also, show that a solution, if it exists, is unique if A has full column rank and B has full row rank. What is the solution in this case?

4. Show that the general linear equation

       Σ_{i=1}^k A_i X B_i = C

   can be written in the form

       [B_1^T ⊗ A_1 + ... + B_k^T ⊗ A_k]vec(X) = vec(C).

5. Let x ∈ ℝ^m and y ∈ ℝ^n. Show that x^T ⊗ y = y x^T.

6. Let A ∈ ℝ^{n×n} and B ∈ ℝ^{m×m}.

   (a) Show that ||A ⊗ B||_2 = ||A||_2 ||B||_2.
   (b) What is ||A ⊗ B||_F in terms of the Frobenius norms of A and B? Justify your answer carefully.
   (c) What is the spectral radius of A ⊗ B in terms of the spectral radii of A and B? Justify your answer carefully.

7. Let A, B ∈ ℝ^{n×n}.

   (a) Show that (I ⊗ A)^k = I ⊗ A^k and (B ⊗ I)^k = B^k ⊗ I for all integers k.
   (b) Show that e^{I⊗A} = I ⊗ e^A and e^{B⊗I} = e^B ⊗ I.
   (c) Show that the matrices I ⊗ A and B ⊗ I commute.
   (d) Show that

       e^{A⊕B} = e^{(I⊗A)+(B⊗I)} = e^B ⊗ e^A.

   (Note: This result would look a little "nicer" had we defined our Kronecker sum the other way around. However, Definition 13.14 is conventional in the literature.)

8. Consider the Lyapunov matrix equation (13.11) with

       A = [ 1   0 ]
           [ 0  -1 ]

   and C the symmetric matrix

       C = [ 2   0 ].
           [ 0  -2 ]

   Clearly

       X_s = [ 1  0 ]
             [ 0  1 ]

   is a symmetric solution of the equation. Verify that

       X_ns = [  1  1 ]
              [ -1  1 ]

   is also a solution and is nonsymmetric. Explain in light of Theorem 13.21.

9. Block Triangularization: Let

       S = [ A  B ],
           [ C  D ]

   where A ∈ ℝ^{n×n} and D ∈ ℝ^{m×m}. It is desired to find a similarity transformation of the form

       T = [ I  0 ]
           [ X  I ]

   such that T^{-1}ST is block upper triangular.

   (a) Show that S is similar to

       [ A + BX     B    ]
       [   0      D - XB ]

   if X satisfies the so-called matrix Riccati equation

       C - XA + DX - XBX = 0.

   (b) Formulate a similar result for block lower triangularization of S.

10. Block Diagonalization: Let

       S = [ A  B ],
           [ 0  D ]

   where A ∈ ℝ^{n×n} and D ∈ ℝ^{m×m}. It is desired to find a similarity transformation of the form

       T = [ I  Y ]
           [ 0  I ]

   such that T^{-1}ST is block diagonal.

   (a) Show that S is similar to

       [ A  0 ]
       [ 0  D ]

   if Y satisfies the Sylvester equation

       AY - YD = -B.

   (b) Formulate a similar result for block diagonalization of

       S = [ A  0 ].
           [ C  D ]
Bibliography

[1] Albert, A., Regression and the Moore-Penrose Pseudoinverse, Academic Press, New York, NY, 1972.

[2] Bartels, R.H., and G.W. Stewart, "Algorithm 432. Solution of the Matrix Equation AX + XB = C," Comm. ACM, 15(1972), 820-826.

[3] Bellman, R., Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New York, NY, 1970.

[4] Björck, Å., Numerical Methods for Least Squares Problems, SIAM, Philadelphia, PA, 1996.

[5] Cline, R.E., "Note on the Generalized Inverse of the Product of Matrices," SIAM Rev., 6(1964), 57-58.

[6] Golub, G.H., S. Nash, and C. Van Loan, "A Hessenberg-Schur Method for the Problem AX + XB = C," IEEE Trans. Autom. Control, AC-24(1979), 909-913.

[7] Golub, G.H., and C.F. Van Loan, Matrix Computations, Third Edition, Johns Hopkins Univ. Press, Baltimore, MD, 1996.

[8] Golub, G.H., and J.H. Wilkinson, "Ill-Conditioned Eigensystems and the Computation of the Jordan Canonical Form," SIAM Rev., 18(1976), 578-619.

[9] Greville, T.N.E., "Note on the Generalized Inverse of a Matrix Product," SIAM Rev., 8(1966), 518-521 [Erratum, SIAM Rev., 9(1967), 249].

[10] Halmos, P.R., Finite-Dimensional Vector Spaces, Second Edition, Van Nostrand, Princeton, NJ, 1958.

[11] Higham, N.J., Accuracy and Stability of Numerical Algorithms, Second Edition, SIAM, Philadelphia, PA, 2002.

[12] Horn, R.A., and C.R. Johnson, Matrix Analysis, Cambridge Univ. Press, Cambridge, UK, 1985.

[13] Horn, R.A., and C.R. Johnson, Topics in Matrix Analysis, Cambridge Univ. Press, Cambridge, UK, 1991.
[14] Kenney, C., and A.J. Laub, "Controllability and Stability Radii for Companion Form Systems," Math. of Control, Signals, and Systems, 1(1988), 361-390.

[15] Kenney, C.S., and A.J. Laub, "The Matrix Sign Function," IEEE Trans. Autom. Control, 40(1995), 1330-1348.

[16] Lancaster, P., and M. Tismenetsky, The Theory of Matrices, Second Edition with Applications, Academic Press, Orlando, FL, 1985.

[17] Laub, A.J., "A Schur Method for Solving Algebraic Riccati Equations," IEEE Trans. Autom. Control, AC-24(1979), 913-921.

[18] Meyer, C.D., Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, PA, 2000.

[19] Moler, C.B., and C.F. Van Loan, "Nineteen Dubious Ways to Compute the Exponential of a Matrix," SIAM Rev., 20(1978), 801-836.

[20] Noble, B., and J.W. Daniel, Applied Linear Algebra, Third Edition, Prentice-Hall, Englewood Cliffs, NJ, 1988.

[21] Ortega, J., Matrix Theory. A Second Course, Plenum, New York, NY, 1987.

[22] Penrose, R., "A Generalized Inverse for Matrices," Proc. Cambridge Philos. Soc., 51(1955), 406-413.

[23] Stewart, G.W., Introduction to Matrix Computations, Academic Press, New York, NY, 1973.

[24] Strang, G., Linear Algebra and Its Applications, Third Edition, Harcourt Brace Jovanovich, San Diego, CA, 1988.

[25] Watkins, D.S., Fundamentals of Matrix Computations, Second Edition, Wiley-Interscience, New York, 2002.

[26] Wonham, W.M., Linear Multivariable Control. A Geometric Approach, Third Edition, Springer-Verlag, New York, NY, 1985.
Index
A-invariant subspace, 89
    matrix characterization of, 90
algebraic multiplicity, 76
angle between vectors, 58
basis, 11
    natural, 12
block matrix, 2
    definiteness of, 104
    diagonalization, 150
    inverse of, 48
    LU factorization, 5
    triangularization, 149
C^n, 1
C^{m×n}, 1
C_r^{m×n}, 1
Cauchy-Bunyakovsky-Schwarz Inequality, 58
Cayley-Hamilton Theorem, 75
chain
    of eigenvectors, 87
characteristic polynomial
    of a matrix, 75
    of a matrix pencil, 125
Cholesky factorization, 101
co-domain, 17
column
    rank, 23
    vector, 1
companion matrix
    inverse of, 105
    pseudoinverse of, 106
    singular values of, 106
    singular vectors of, 106
complement
    of a subspace, 13
    orthogonal, 21
congruence, 103
conjugate transpose, 2
contragredient transformation, 137
controllability, 46
defective, 76
degree
    of a principal vector, 85
determinant, 4
    of a block matrix, 5
    properties of, 4–6
dimension, 12
direct sum
    of subspaces, 13
domain, 17
eigenvalue, 75
    invariance under similarity transformation, 81
elementary divisors, 84
equivalence transformation, 95
    orthogonal, 95
    unitary, 95
equivalent generalized eigenvalue problems, 127
equivalent matrix pencils, 127
exchange matrix, 39, 89
exponential of a Jordan block, 91, 115
exponential of a matrix, 81, 109
    computation of, 114–118
    inverse of, 110
    properties of, 109–112
field, 7
four fundamental subspaces, 23
function of a matrix, 81
generalized eigenvalue, 125
generalized real Schur form, 128

generalized Schur form, 127
generalized singular value decomposition, 134
geometric multiplicity, 76
Hölder Inequality, 58
Hermitian transpose, 2
higher-order difference equations
    conversion to first-order form, 121
higher-order differential equations
    conversion to first-order form, 120
higher-order eigenvalue problems
    conversion to first-order form, 136
i, 2
idempotent, 6, 51
identity matrix, 4
inertia, 103
initial-value problem, 109
    for higher-order equations, 120
    for homogeneous linear difference equations, 118
    for homogeneous linear differential equations, 112
    for inhomogeneous linear difference equations, 119
    for inhomogeneous linear differential equations, 112
inner product
    complex, 55
    complex Euclidean, 4
    Euclidean, 4, 54
    real, 54
    usual, 54
    weighted, 54
invariant factors, 84
inverses
    of block matrices, 47
j, 2
Jordan block, 82
Jordan canonical form (JCF), 82
Kronecker canonical form (KCF), 129
Kronecker delta, 20
Kronecker product, 139
    determinant of, 142
    eigenvalues of, 141
    eigenvectors of, 141
    products of, 140
    pseudoinverse of, 148
    singular values of, 141
    trace of, 142
    transpose of, 140
Kronecker sum, 142
    eigenvalues of, 143
    eigenvectors of, 143
    exponential of, 149
leading principal submatrix, 100
left eigenvector, 75
left generalized eigenvector, 125
left invertible, 26
left nullspace, 22
left principal vector, 85
linear dependence, 10
linear equations
    characterization of all solutions, 44
    existence of solutions, 44
    uniqueness of solutions, 45
linear independence, 10
linear least squares problem, 65
    general solution of, 66
    geometric solution of, 67
    residual of, 65
    solution via QR factorization, 71
    solution via singular value decomposition, 70
    statement of, 65
    uniqueness of solution, 66
linear regression, 67
linear transformation, 17
    co-domain of, 17
    composition of, 19
    domain of, 17
    invertible, 25
    left invertible, 26
    matrix representation of, 18
    nonsingular, 25
    nullspace of, 20
    range of, 20
    right invertible, 26
LU factorization, 6
    block, 5
Lyapunov differential equation, 113
Lyapunov equation, 144
    and asymptotic stability, 146
    integral form of solution, 146
    symmetry of solution, 146
    uniqueness of solution, 146
matrix
    asymptotically stable, 145
    best rank k approximation to, 67
    companion, 105
    defective, 76
    definite, 99
    derogatory, 106
    diagonal, 2
    exponential, 109
    Hamiltonian, 122
    Hermitian, 2
    Householder, 97
    indefinite, 99
    lower Hessenberg, 2
    lower triangular, 2
    nearest singular matrix to, 67
    nilpotent, 115
    nonderogatory, 105
    normal, 33, 95
    orthogonal, 4
    pentadiagonal, 2
    quasi-upper-triangular, 98
    sign of a, 91
    square root of a, 101
    symmetric, 2
    symplectic, 122
    tridiagonal, 2
    unitary, 4
    upper Hessenberg, 2
    upper triangular, 2
matrix exponential, 81, 91, 109
matrix norm, 59
    1-, 60
    2-, 60
    ∞-, 60
    p-, 60
    consistent, 61
    Frobenius, 60
    induced by a vector norm, 61
    mixed, 60
    mutually consistent, 61
    relations among, 61
    Schatten, 60
    spectral, 60
    subordinate to a vector norm, 61
    unitarily invariant, 62
matrix pencil, 125
    equivalent, 127
    reciprocal, 126
    regular, 126
    singular, 126
matrix sign function, 91
minimal polynomial, 76
monic polynomial, 76
Moore-Penrose pseudoinverse, 29
multiplication
    matrix-matrix, 3
    matrix-vector, 3
Murnaghan-Wintner Theorem, 98
negative definite, 99
negative invariant subspace, 92
nonnegative definite, 99
    criteria for, 100
nonpositive definite, 99
norm
    induced, 56
    natural, 56
normal equations, 65
normed linear space, 57
nullity, 24
nullspace, 20
    left, 22
    right, 22
observability, 46
one-to-one (1-1), 23
    conditions for, 25
onto, 23
    conditions for, 25

orthogonal
    complement, 21
    matrix, 4
    projection, 52
    subspaces, 14
    vectors, 4, 20
orthonormal
    vectors, 4, 20
outer product, 19
    and Kronecker product, 140
    exponential of, 121
    pseudoinverse of, 33
    singular value decomposition of, 41
    various matrix norms of, 63
pencil
    equivalent, 127
    of matrices, 125
    reciprocal, 126
    regular, 126
    singular, 126
Penrose theorem, 30
polar factorization, 41
polarization identity, 57
positive definite, 99
    criteria for, 100
positive invariant subspace, 92
power (kth) of a Jordan block, 120
powers of a matrix
    computation of, 119–120
principal submatrix, 100
projection
    oblique, 51
    on four fundamental subspaces, 52
    orthogonal, 52
pseudoinverse, 29
    four Penrose conditions for, 30
    of a full-column-rank matrix, 30
    of a full-row-rank matrix, 30
    of a matrix product, 32
    of a scalar, 31
    of a vector, 31
    uniqueness, 30
    via singular value decomposition, 38
Pythagorean Identity, 59
Q-orthogonality, 55
QR factorization, 72
R^n, 1
R^{m×n}, 1
R_r^{m×n}, 1
R_n^{n×n}, 1
range, 20
range inclusion
    characterized by pseudoinverses, 33
rank, 23
    column, 23
    row, 23
rank-one matrix, 19
rational canonical form, 104
Rayleigh quotient, 100
reachability, 46
real Schur canonical form, 98
real Schur form, 98
reciprocal matrix pencil, 126
reconstructibility, 46
regular matrix pencil, 126
residual, 65
resolvent, 111
reverse-order identity matrix, 39, 89
right eigenvector, 75
right generalized eigenvector, 125
right invertible, 26
right nullspace, 22
right principal vector, 85
row
    rank, 23
    vector, 1
Schur canonical form, 98
    generalized, 127
Schur complement, 6, 48, 102, 104
Schur Theorem, 98
Schur vectors, 98
second-order eigenvalue problem, 135
    conversion to first-order form, 135
Sherman-Morrison-Woodbury formula, 48
signature, 103
similarity transformation, 95
    and invariance of eigenvalues, 81
    orthogonal, 95
    unitary, 95
simple eigenvalue, 85
simultaneous diagonalization, 133
    via singular value decomposition, 134
singular matrix pencil, 126
singular value decomposition (SVD), 35
    and bases for four fundamental subspaces, 38
    and pseudoinverse, 38
    and rank, 38
    characterization of a matrix factorization as, 37
    dyadic expansion, 38
    examples, 37
    full vs. compact, 37
    fundamental theorem, 35
    nonuniqueness, 36
singular values, 36
singular vectors
    left, 36
    right, 36
span, 11
spectral radius, 62, 107
spectral representation, 97
spectrum, 76
subordinate norm, 61
subspace, 9
    A-invariant, 89
    deflating, 129
    reducing, 130
subspaces
    complements of, 13
    direct sum of, 13
    equality of, 10
    four fundamental, 23
    intersection of, 13
    orthogonal, 14
    sum of, 13
Sylvester differential equation, 113
Sylvester equation, 144
    integral form of solution, 145
    uniqueness of solution, 145
Sylvester's Law of Inertia, 103
symmetric generalized eigenvalue problem, 131
total least squares, 68
trace, 6
transpose, 2
    characterization by inner product, 54
    of a block matrix, 2
triangle inequality
    for matrix norms, 59
    for vector norms, 57
unitarily invariant
    matrix norm, 62
    vector norm, 58
variation of parameters, 112
vec
    of a matrix, 145
    of a matrix product, 147
vector norm, 57
    1-, 57
    2-, 57
    ∞-, 57
    p-, 57
    equivalent, 59
    Euclidean, 57
    Manhattan, 57
    relations among, 59
    unitarily invariant, 58
    weighted, 58
    weighted p-, 58
vector space, 8
    dimension of, 12
vectors, 1
    column, 1
    linearly dependent, 10
    linearly independent, 10
    orthogonal, 4, 20
    orthonormal, 4, 20
    row, 1
    span of a set of, 11
zeros
    of a linear dynamical system, 130
