Diagonalisation
Eigenvectors, diagonalisation of a matrix, orthogonal diagonalisation of symmetric matrices
Reading
As in the previous chapter, there is no specific essential reading for this chapter. It
is essential that you do some reading, but the topics discussed in this chapter are
adequately covered in many texts on linear algebra. The list below gives examples of
relevant reading. (For full publication details, see Chapter 1.)
Leon, S.J., Linear Algebra with Applications. Chapter 6, sections 6.1 and 6.3.
Simon, C.P. and Blume, L., Mathematics for Economists. Chapter 23, sections
23.1, 23.7.
Introduction
One of the most useful techniques in applications of matrices and linear algebra is
diagonalisation. Before discussing this, we have to look at the topic of eigenvalues
and eigenvectors. We shall explore a number of applications of diagonalisation in
the next chapter of the guide.
Definitions
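Suppose that $A$ is a square matrix. The number $\lambda$ is an eigenvalue of $A$ if there is a non-zero vector $x$ such that $Ax = \lambda x$. Any non-zero vector $x$ for which this equation holds is called an eigenvector for eigenvalue $\lambda$ or an eigenvector of $A$ corresponding to eigenvalue $\lambda$.
The equation $Ax = \lambda x$ has a non-zero solution $x$ precisely when $A - \lambda I$ is not invertible, so the eigenvalues are the solutions of $|A - \lambda I| = 0$. The polynomial $|A - \lambda I|$ is the characteristic polynomial of $A$.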
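Example: Let
\[
A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}.
\]
Then
\[
A - \lambda I = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1-\lambda & 1 \\ 2 & 2-\lambda \end{pmatrix}
\]
and the characteristic polynomial is
\[
\begin{aligned}
|A - \lambda I| &= \begin{vmatrix} 1-\lambda & 1 \\ 2 & 2-\lambda \end{vmatrix} \\
&= (1-\lambda)(2-\lambda) - 2 \\
&= \lambda^2 - 3\lambda + 2 - 2 = \lambda^2 - 3\lambda.
\end{aligned}
\]
So the eigenvalues are the solutions of $\lambda^2 - 3\lambda = 0$. To solve this, one could use either the formula for the solutions to a quadratic, or simply observe that the equation is $\lambda(\lambda - 3) = 0$ with solutions $\lambda = 0$ and $\lambda = 3$. Hence the eigenvalues of $A$ are 0 and 3.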
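The calculation in this example can be sketched in code. For a $2 \times 2$ matrix with rows $(a, b)$ and $(c, d)$, the characteristic polynomial is $\lambda^2 - (a + d)\lambda + (ad - bc)$, so the eigenvalues follow from the quadratic formula. The function name `eigenvalues_2x2` is our own, chosen for illustration, and this sketch assumes the eigenvalues are real.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of the matrix with rows (a, b) and (c, d).

    Solves lambda^2 - (a + d)*lambda + (a*d - b*c) = 0.
    """
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        raise ValueError("the eigenvalues are not real")
    root = math.sqrt(disc)
    return ((trace - root) / 2, (trace + root) / 2)

# The matrix of the example: rows (1, 1) and (2, 2).
print(eigenvalues_2x2(1, 1, 2, 2))  # (0.0, 3.0)
```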
We have seen that the eigenvalues are 0 and 3. To find an eigenvector for eigenvalue 0 we solve the system $(A - 0I)x = 0$: that is, $Ax = 0$, or
\[
\begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
This could be solved using row operations. (Note that it cannot be solved by using inverse matrices since $A$ is not invertible. In fact, inverse matrix techniques or Cramer's rule will never be of use here since $\lambda$ being an eigenvalue means that $A - \lambda I$ is not invertible.) However, we can solve this fairly directly just by looking at the equations. We have to solve
\[
x_1 + x_2 = 0, \quad 2x_1 + 2x_2 = 0.
\]
Clearly both equations are equivalent. From either one, we obtain $x_1 = -x_2$. We can choose $x_2$ to be any number we like. Let's take $x_2 = 1$; then we need $x_1 = -x_2 = -1$. It follows that an eigenvector for 0 is
\[
x = \begin{pmatrix} -1 \\ 1 \end{pmatrix}.
\]
The choice $x_2 = 1$ was arbitrary; we could have chosen any non-zero number, so, for example, the following are eigenvectors for 0:
\[
\begin{pmatrix} -2 \\ 2 \end{pmatrix}, \quad \begin{pmatrix} -5.2 \\ 5.2 \end{pmatrix}.
\]
There are infinitely many eigenvectors for 0: for each $\alpha \neq 0$,
\[
\begin{pmatrix} -\alpha \\ \alpha \end{pmatrix}
\]
is an eigenvector for 0. But be careful not to think that you can choose $\alpha = 0$; for then $x$ becomes the zero vector, and this is never an eigenvector, simply by definition.
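To find an eigenvector for 3, we solve $(A - 3I)x = 0$, which is
\[
\begin{pmatrix} -2 & 1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
This is equivalent to the equations
\[
-2x_1 + x_2 = 0, \quad 2x_1 - x_2 = 0,
\]
which are together equivalent to the single equation $x_2 = 2x_1$. If we choose $x_1 = 1$, we obtain the eigenvector
\[
x = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.
\]
(Again, any non-zero scalar multiple of this vector is also an eigenvector for eigenvalue 3.)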
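A claimed eigenvector is easy to check directly from the definition: compute $Ax$ and compare it with $\lambda x$, remembering that the zero vector never counts. The following is a minimal sketch of that check; the helper names `mat_vec` and `is_eigenvector` are ours, and integer entries are used so that the comparison is exact.

```python
def mat_vec(A, x):
    """Product of the matrix A (a list of rows) with the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_eigenvector(A, x, lam):
    """True if x is non-zero and A x equals lam * x."""
    if all(xi == 0 for xi in x):
        return False  # the zero vector is never an eigenvector
    return mat_vec(A, x) == [lam * xi for xi in x]

A = [[1, 1], [2, 2]]
print(is_eigenvector(A, [-1, 1], 0))  # True
print(is_eigenvector(A, [-2, 2], 0))  # True: a non-zero multiple also works
print(is_eigenvector(A, [1, 2], 3))   # True
print(is_eigenvector(A, [0, 0], 0))   # False
```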
We illustrated with a $2 \times 2$ example just for simplicity, but you should be able to work with $3 \times 3$ matrices. We give three such examples.
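Example: Let
\[
A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}.
\]
Expanding the determinant by the first row, the characteristic polynomial is
\[
|A - \lambda I| = \begin{vmatrix} 4-\lambda & 0 & 4 \\ 0 & 4-\lambda & 4 \\ 4 & 4 & 8-\lambda \end{vmatrix} = (4-\lambda)\left[(4-\lambda)(8-\lambda) - 16\right] - 16(4-\lambda).
\]
Now, we notice that each of the two terms in this expression has $4 - \lambda$ as a factor, so instead of expanding everything, we take $4 - \lambda$ out as a common factor, obtaining
\[
|A - \lambda I| = (4-\lambda)\left[(4-\lambda)(8-\lambda) - 32\right] = (4-\lambda)(\lambda^2 - 12\lambda) = (4-\lambda)\lambda(\lambda - 12).
\]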
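It follows that the eigenvalues are 4, 0, 12. (The characteristic polynomial will not always factorise so easily. Here it was simple because of the common factor $(4 - \lambda)$. The next example is more difficult.) To find an eigenvector for 4, we have to solve the equation $(A - 4I)x = 0$, that is,
\[
\begin{pmatrix} 0 & 0 & 4 \\ 0 & 0 & 4 \\ 4 & 4 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
Of course, we could use row operations, but the system is simple enough to solve straight away. The equations are
\[
4x_3 = 0, \quad 4x_3 = 0, \quad 4x_1 + 4x_2 + 4x_3 = 0,
\]
so $x_3 = 0$ and $x_1 = -x_2$. Choosing $x_1 = 1$, we obtain the eigenvector $(1, -1, 0)^T$.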
Activity 3.1 Determine eigenvectors for 0 and 12. You should find that for $\lambda = 0$, your eigenvector is a non-zero multiple of
\[
\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}.
\]
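Example: Let
\[
A = \begin{pmatrix} -3 & -1 & -2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}.
\]
Given that $-1$ is an eigenvalue of $A$, find all the eigenvalues of $A$.
We calculate the characteristic polynomial of $A$:
\[
\begin{aligned}
|A - \lambda I| &= \begin{vmatrix} -3-\lambda & -1 & -2 \\ 1 & -1-\lambda & 1 \\ 1 & 1 & -\lambda \end{vmatrix} \\
&= (-3-\lambda)\begin{vmatrix} -1-\lambda & 1 \\ 1 & -\lambda \end{vmatrix} - (-1)\begin{vmatrix} 1 & 1 \\ 1 & -\lambda \end{vmatrix} - 2\begin{vmatrix} 1 & -1-\lambda \\ 1 & 1 \end{vmatrix} \\
&= (-3-\lambda)(\lambda^2 + \lambda - 1) + (-\lambda - 1) - 2(2 + \lambda) \\
&= -\lambda^3 - 4\lambda^2 - 5\lambda - 2.
\end{aligned}
\]
Now, the fact that $-1$ is an eigenvalue means that $-1$ is a solution of the equation $|A - \lambda I| = 0$, which means that $(\lambda - (-1))$, that is, $(\lambda + 1)$, is a factor of the characteristic polynomial $|A - \lambda I|$. So this characteristic polynomial can be written in the form
\[
(\lambda + 1)(a\lambda^2 + b\lambda + c).
\]
Clearly we must have $a = -1$ and $c = -2$ to obtain the correct $\lambda^3$ term and the correct constant. Given this, $b = -3$. In other words, the characteristic polynomial is
\[
(\lambda + 1)(-\lambda^2 - 3\lambda - 2) = -(\lambda + 1)(\lambda^2 + 3\lambda + 2) = -(\lambda + 1)(\lambda + 2)(\lambda + 1).
\]
That is, $|A - \lambda I| = -(\lambda + 1)^2(\lambda + 2)$. The eigenvalues are the solutions to $|A - \lambda I| = 0$, so they are $\lambda = -1$ and $\lambda = -2$. Note that in this case, there are only two eigenvalues (or, the eigenvalue $-1$ is repeated, or has multiplicity 2, as it is sometimes said).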
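When one root of the characteristic polynomial is known, the remaining quadratic factor can also be found mechanically by synthetic division rather than by matching coefficients. The sketch below is one way to do this; the function name `divide_by_linear` is ours, and the cubic used is $-\lambda^3 - 4\lambda^2 - 5\lambda - 2$ with the known root $-1$.

```python
def divide_by_linear(coeffs, root):
    """Divide a polynomial by (lambda - root) using synthetic division.

    coeffs lists the coefficients from the highest power down.
    Returns (quotient coefficients, remainder).
    """
    out = []
    carry = 0
    for c in coeffs:
        carry = carry * root + c
        out.append(carry)
    return out[:-1], out[-1]

# -lambda^3 - 4*lambda^2 - 5*lambda - 2 divided by (lambda + 1)
quotient, remainder = divide_by_linear([-1, -4, -5, -2], -1)
print(quotient, remainder)  # [-1, -3, -2] 0
```

A zero remainder confirms that $-1$ really is a root, and the quotient coefficients are the values $a = -1$, $b = -3$, $c = -2$ of the quadratic factor.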
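Example: Let
\[
A = \begin{pmatrix} 3 & -1 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & 3 \end{pmatrix}.
\]
Then (check this!), the characteristic polynomial is $-\lambda^3 + 8\lambda^2 - 20\lambda + 16$. This factorises (check!) as
\[
-(\lambda - 2)(\lambda - 2)(\lambda - 4),
\]
so the eigenvalues are 2 and 4. There are only two eigenvalues in this case. (We sometimes say that the eigenvalue 2 is repeated or has multiplicity 2, because $(\lambda - 2)^2$ is a factor of the characteristic polynomial.) To find an eigenvector for $\lambda = 4$, we have to solve the equation $(A - 4I)x = 0$, that is,
\[
\begin{pmatrix} -1 & -1 & 1 \\ 0 & -2 & 0 \\ 1 & -1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
The equations are
\[
\begin{aligned}
-x_1 - x_2 + x_3 &= 0 \\
-2x_2 &= 0 \\
x_1 - x_2 - x_3 &= 0,
\end{aligned}
\]
so $x_2 = 0$ and $x_1 = x_3$. Choosing $x_3 = 1$, we get the eigenvector
\[
\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}.
\]
For $\lambda = 2$, we have to solve the equation $(A - 2I)x = 0$, that is,
\[
\begin{pmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
This system is equivalent to the single equation $x_1 - x_2 + x_3 = 0$. (Convince yourself!) Choosing $x_3 = 1$ and $x_2 = 0$ we have $x_1 = -1$, so we obtain the eigenvector
\[
\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.
\]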
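The "check this!" steps for a $3 \times 3$ characteristic polynomial can be sketched in code. The sketch uses the standard identity $|A - \lambda I| = -\lambda^3 + (\operatorname{tr} A)\lambda^2 - m\lambda + |A|$, where $m$ is the sum of the three principal $2 \times 2$ minors of $A$. This identity is not derived in the text, and the function names are our own.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c

def char_poly_3x3(A):
    """Coefficients of det(A - lambda*I), highest power of lambda first."""
    (a, b, c), (d, e, f), (g, h, i) = A
    trace = a + e + i
    minors = det2(e, f, h, i) + det2(a, c, g, i) + det2(a, b, d, e)
    det = a * det2(e, f, h, i) - b * det2(d, f, g, i) + c * det2(d, e, g, h)
    return [-1, trace, -minors, det]

def evaluate(coeffs, value):
    """Evaluate a polynomial (highest power first) by Horner's rule."""
    result = 0
    for c in coeffs:
        result = result * value + c
    return result

A = [[3, -1, 1], [0, 2, 0], [1, -1, 3]]
p = char_poly_3x3(A)
print(p)                               # [-1, 8, -20, 16]
print(evaluate(p, 2), evaluate(p, 4))  # 0 0
```

Both 2 and 4 give the value zero, in agreement with the factorisation $-(\lambda - 2)^2(\lambda - 4)$.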
Complex eigenvalues
Although we shall only deal in this subject with real matrices (that is, matrices whose entries are real numbers), it is possible for such real matrices to have complex eigenvalues. This is not something you have to spend much time on, but you have to be aware of it. We briefly describe complex numbers.¹ The complex numbers are based on the complex number $i$, which is defined to be the square root of $-1$. (Of course, no such real number exists.) Any complex number $z$ can be written in the form $z = a + bi$ where $a, b$ are real numbers. We call $a$ the real part and $b$ the imaginary part of $z$. Of course, any real number is a complex number, since $a = a + 0i$. The following example shows a $2 \times 2$ matrix with complex eigenvalues, and it also demonstrates how to deal with complex numbers.
¹ For more discussion of complex numbers, see Appendix A3 of Simon and Blume.
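Example: Consider the matrix
\[
A = \begin{pmatrix} 1 & -1 \\ 9 & 1 \end{pmatrix}.
\]
We shall see that it has complex eigenvalues. First, the characteristic polynomial $|A - \lambda I|$ is $(1 - \lambda)^2 + 9 = \lambda^2 - 2\lambda + 10$. (Check this!) Using the formula for the roots of a quadratic equation, the eigenvalues are
\[
\frac{2 \pm \sqrt{-36}}{2}.
\]
Now
\[
\sqrt{-36} = \sqrt{(36)(-1)} = \sqrt{36}\sqrt{-1} = 6\sqrt{-1} = 6i.
\]
So the eigenvalues are the complex numbers $1 + 3i$ and $1 - 3i$. Let's proceed with finding eigenvectors. To find an eigenvector for $1 + 3i$, we solve $(A - (1 + 3i)I)x = 0$, which is
\[
\begin{pmatrix} -3i & -1 \\ 9 & -3i \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
This is equivalent to the equations
\[
-3i x_1 - x_2 = 0, \quad 9x_1 - 3i x_2 = 0.
\]
But the second equation is just $3i$ times the first, so both are equivalent. Taking the second, we see that $x_1 = (i/3)x_2$. So an eigenvector is (taking $x_2 = 3$) $(i, 3)^T$. For $\lambda = 1 - 3i$, we end up solving the system
\[
\begin{pmatrix} 3i & -1 \\ 9 & 3i \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\]
and an eigenvector is $(-i, 3)^T$.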
You should be aware, then, that even though we are not dealing with matrices that
have complex numbers as their entries, the possibility still exists that eigenvalues
(and eigenvectors) will involve complex numbers. However, if a matrix is symmetric
(that is, it equals its transpose), then it certainly has real eigenvalues. This useful
fact, which we shall prove later, is important when we consider quadratic forms in
the next chapter.
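Complex eigenvalues can be handled in code with the same quadratic formula, provided the square root is taken over the complex numbers. The sketch below uses Python's standard `cmath` module. It takes the matrix of the example to have rows $(1, -1)$ and $(9, 1)$, a sign pattern consistent with the characteristic polynomial $\lambda^2 - 2\lambda + 10$; the function name is ours.

```python
import cmath

def eigenvalues_2x2_complex(a, b, c, d):
    """Eigenvalues (possibly complex) of the matrix with rows (a, b), (c, d)."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)

lam1, lam2 = eigenvalues_2x2_complex(1, -1, 9, 1)
print(lam1, lam2)  # (1+3j) (1-3j)

# Check that (i, 3)^T is an eigenvector for 1 + 3i: A x should equal lam1 * x.
A = [[1, -1], [9, 1]]
x = [1j, 3]
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
print(Ax == [lam1 * xi for xi in x])  # True
```

Python writes the imaginary unit as `j` rather than $i$.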
Diagonalisation of a square matrix
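Square matrices $A$ and $B$ are similar if there is an invertible matrix $P$ such that $P^{-1}AP = B$. The matrix $A$ is diagonalisable if it is similar to a diagonal matrix: that is, if there is an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.
Suppose that the $n \times n$ matrix $A$ is diagonalisable, with $P^{-1}AP = D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, and let the columns of $P$ be $v_1, \ldots, v_n$. Multiplying on the left by $P$ gives $AP = PD$. Now,
\[
AP = A(v_1 \ \ldots \ v_n) = (Av_1 \ \ldots \ Av_n)
\]
and
\[
PD = (v_1 \ \ldots \ v_n) \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} = (\lambda_1 v_1 \ \ldots \ \lambda_n v_n).
\]
So this means that
\[
Av_1 = \lambda_1 v_1, \quad Av_2 = \lambda_2 v_2, \quad \ldots, \quad Av_n = \lambda_n v_n.
\]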
The fact that $P^{-1}$ exists means that none of the vectors $v_i$ is the zero vector. So this means that (for $i = 1, 2, \ldots, n$) $\lambda_i$ is an eigenvalue of $A$ and $v_i$ is a corresponding eigenvector. Since $P$ has an inverse, these eigenvectors are linearly independent. Therefore, $A$ has $n$ linearly independent eigenvectors. Conversely, if $A$ has $n$ linearly independent eigenvectors, then the matrix $P$ whose columns are these eigenvectors will be invertible, and we will have $P^{-1}AP = D$ where $D$ is a diagonal matrix with entries equal to the eigenvalues of $A$. We have therefore established the following result: an $n \times n$ matrix $A$ is diagonalisable if and only if it has $n$ linearly independent eigenvectors.
There is a more sophisticated way to think about this result, in terms of change of basis and matrix representations of linear transformations. Suppose that $T$ is the linear transformation corresponding to $A$, so that $T(x) = Ax$ for all $x$. Suppose that $A$ has a set of $n$ linearly independent eigenvectors $B = \{x_1, x_2, \ldots, x_n\}$, corresponding (respectively) to the eigenvalues $\lambda_1, \ldots, \lambda_n$. Since this is a linearly independent set of size $n$ in $\mathbb{R}^n$, $B$ is a basis for $\mathbb{R}^n$. By Theorem 2.18, the matrix of $T$ with respect to $B$ is
\[
A_T[B, B] = ([T(x_1)]_B \ \ldots \ [T(x_n)]_B).
\]
But $T(x_i) = Ax_i = \lambda_i x_i$, so the coordinate vector of $T(x_i)$ with respect to $B$ is
\[
[T(x_i)]_B = (0, \ldots, 0, \lambda_i, 0, \ldots, 0)^T,
\]
which has $\lambda_i$ in entry $i$ and all other entries zero. Therefore
\[
A_T[B, B] = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) = D.
\]
But the change of basis results also tell us that $A_T[B, B] = P^{-1} A_T P$, where
\[
P = (x_1 \ \ldots \ x_n)
\]
and $A_T$ is the matrix representing $T$, which in this case is simply $A$ itself. We therefore see that
\[
P^{-1}AP = A_T[B, B] = D,
\]
and so the matrix $P$ diagonalises $A$.
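Example: Consider again the matrix
\[
A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}.
\]
We have seen that it has three distinct eigenvalues 0, 4, 12, and that eigenvectors corresponding to eigenvalues 4, 0, 12 are (in that order)
\[
\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}.
\]
We form the matrix $P$ whose columns are these eigenvectors:
\[
P = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 1 & 1 \\ 0 & -1 & 2 \end{pmatrix}.
\]
Then, according to the theory, $P$ should have an inverse, and we should have $P^{-1}AP = D = \mathrm{diag}(4, 0, 12)$. To check that this is true, we could calculate $P^{-1}$ and evaluate the product. The inverse may be calculated using either elementary row operations or determinants. (Matrix inversion is not part of this subject: however, it is part of the pre-requisite subject Mathematics for economists. You should therefore know how to invert a matrix.)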
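The check described here, calculating $P^{-1}$ and evaluating the product $P^{-1}AP$, can be sketched in code. The sketch inverts $P$ by elementary row operations (Gauss-Jordan elimination) using exact fractions, so no rounding is involved. The helper names `mat_mul` and `inverse` are ours, and the matrices are those of the example with eigenvalues 4, 0, 12.

```python
from fractions import Fraction

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse(M):
    """Inverse by Gauss-Jordan elimination; raises ValueError if M is singular."""
    n = len(M)
    # Augment M with the identity matrix, working in exact fractions.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

A = [[4, 0, 4], [0, 4, 4], [4, 4, 8]]
P = [[1, 1, 1], [-1, 1, 1], [0, -1, 2]]  # eigenvectors for 4, 0, 12 as columns
D = mat_mul(mat_mul(inverse(P), A), P)
print(D == [[4, 0, 0], [0, 0, 0], [0, 0, 12]])  # True
```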
vectors of the form $\begin{pmatrix} r \\ r \end{pmatrix}$ as $r$ runs through all non-zero real numbers. So the eigenvectors are precisely the non-zero scalar multiples of the fixed vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Any two eigenvectors are therefore multiples of each other and hence form a linearly dependent set. In other words, there are not two linearly independent eigenvectors, and the matrix is not diagonalisable.
The following result is useful. It shows that if a matrix has $n$ different eigenvalues then it is diagonalisable.²
Theorem 3.2 Eigenvectors corresponding to different eigenvalues are linearly independent. So if an $n \times n$ matrix has $n$ different eigenvalues, then it has a set of $n$ linearly independent eigenvectors and is therefore diagonalisable.
² For a proof, see Ostaszewski, Mathematics in Economics, Section 7.4.
It is not, however, necessary for the eigenvalues to be distinct. What is needed for
diagonalisation is a set of n linearly independent eigenvectors, and this can happen
even when there is a repeated eigenvalue (that is, when there are fewer than n
different eigenvalues). The following example illustrates this.
Example: We considered the matrix
\[
A = \begin{pmatrix} 3 & -1 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & 3 \end{pmatrix}
\]
above, and we saw that it has only two eigenvalues, 4 and 2. If we want to diagonalise it, we need to find three linearly independent eigenvectors. We found that an eigenvector corresponding to $\lambda = 4$ is
\[
\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},
\]
and that, for $\lambda = 2$, the eigenvectors are given by the non-zero solutions to the system consisting of just the single equation $x_1 - x_2 + x_3 = 0$. Above, we simply wanted to find an eigenvector, but now we want to find two which, together with the eigenvector for $\lambda = 4$, form a linearly independent set. Now, the system for the eigenvectors corresponding to $\lambda = 2$ has just one equation and is therefore of rank 1; it follows that the solution set is two-dimensional. Let's see exactly what the general solution looks like. We have $x_1 = x_2 - x_3$, and $x_2, x_3$ can be chosen independently of each other. Setting $x_3 = r$ and $x_2 = s$, we see that the general solution is
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} s - r \\ s \\ r \end{pmatrix} = s \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + r \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad (r, s \in \mathbb{R}).
\]
This shows that the solution space (the eigenspace, as it is called in this instance) is spanned by the two linearly independent vectors
\[
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.
\]
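These two vectors, together with the eigenvector $(1, 0, 1)^T$ for $\lambda = 4$, form a linearly independent set. So the matrix has three linearly independent eigenvectors, even though two of them correspond to the same eigenvalue. The matrix is therefore diagonalisable. We may take
\[
P = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}.
\]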
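A quick way to confirm that a candidate $P$ diagonalises $A$ without inverting anything is to check two things: that $|P| \neq 0$ (so the columns are linearly independent), and that $AP = PD$, which is $P^{-1}AP = D$ multiplied on the left by $P$. The sketch below does this for the matrix with rows $(3, -1, 1)$, $(0, 2, 0)$, $(1, -1, 3)$, taking the columns of $P$ to be eigenvectors for 4, 2, 2 in that order. The helper names are ours.

```python
def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[3, -1, 1], [0, 2, 0], [1, -1, 3]]
P = [[1, 1, -1], [0, 1, 0], [1, 0, 1]]
D = [[4, 0, 0], [0, 2, 0], [0, 0, 2]]

print(det3(P))                         # 2, non-zero, so P is invertible
print(mat_mul(A, P) == mat_mul(P, D))  # True
```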
Orthogonal diagonalisation of symmetric matrices
The matrix we considered above,
\[
A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix},
\]
is symmetric: that is, its transpose $A^T$ is equal to itself. It turns out that such matrices are always diagonalisable. They are, furthermore, diagonalisable in a special way. A matrix $P$ is orthogonal if $P^T P = P P^T = I$: that is, if $P$ has inverse $P^T$. A matrix $A$ is said to be orthogonally diagonalisable if there is an orthogonal matrix $P$ such that $P^T AP = D$ where $D$ is a diagonal matrix. Note that $P^T = P^{-1}$, so $P^T AP = P^{-1}AP$. The argument given above shows that the columns of $P$ must be $n$ linearly independent eigenvectors of $A$. But the condition that $P^T P = I$ means something else, as we now discuss.
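Suppose that the columns of $P$ are $x_1, x_2, \ldots, x_n$, so that $P = (x_1 \ x_2 \ \ldots \ x_n)$. Then the rows of $P^T$ are $x_1^T, x_2^T, \ldots, x_n^T$:
\[
P^T = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix}.
\]
Calculating the matrix product $P^T P$, we find that the $(i, j)$-entry of $P^T P$ is $x_i^T x_j$. But, since $P^T P = I$, we must have
\[
x_i^T x_i = 1 \ (i = 1, 2, \ldots, n), \quad x_i^T x_j = 0 \ (i \neq j).
\]
So, not only must any two of these eigenvectors be orthogonal, but each must have length 1.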
We shall discuss orthogonality in more detail in the next chapter. For the moment,
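we have the following result: if the matrix $A$ is symmetric, then eigenvectors of $A$ corresponding to different eigenvalues are orthogonal.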
Proof Suppose that $\lambda$ and $\mu$ are any two different eigenvalues of $A$ and that $x, y$ are corresponding eigenvectors. Then $Ax = \lambda x$ and $Ay = \mu y$. The trick in this proof is to find two different expressions for the product $x^T Ay$ (which then must, of course, be equal to each other). Note that the matrix product $x^T Ay$ is a $1 \times 1$ matrix or, equivalently, a number. First, since $Ay = \mu y$, we have
\[
x^T Ay = x^T(\mu y) = \mu x^T y.
\]
But also, since $Ax = \lambda x$, we have $(Ax)^T = (\lambda x)^T = \lambda x^T$. Now, for any matrices $M, N$, $(MN)^T = N^T M^T$, so $(Ax)^T = x^T A^T$. But $A^T = A$ (because $A$ is symmetric), so $x^T A = \lambda x^T$ and hence
\[
x^T Ay = \lambda x^T y.
\]
We therefore have two different expressions for $x^T Ay$: it equals $\mu x^T y$ and $\lambda x^T y$. Hence,
\[
\mu x^T y = \lambda x^T y,
\]
or
\[
(\mu - \lambda) x^T y = 0.
\]
But since $\lambda \neq \mu$ (they are different eigenvalues), we have $\mu - \lambda \neq 0$. We deduce, therefore, that $x^T y = 0$. But this says precisely that $x$ and $y$ are orthogonal, which is exactly what we wanted to prove.
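The role of symmetry in this argument can be seen numerically. In the sketch below, which uses matrices from the earlier examples, eigenvectors for two different eigenvalues of a symmetric matrix have inner product zero, whereas for a non-symmetric matrix they need not. The helper names `dot` and `mat_vec` are ours.

```python
def dot(u, v):
    """Inner product u^T v of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def mat_vec(A, x):
    """Product of the matrix A (a list of rows) with the vector x."""
    return [dot(row, x) for row in A]

# Symmetric matrix: eigenvectors for the different eigenvalues 4 and 12.
S = [[4, 0, 4], [0, 4, 4], [4, 4, 8]]
x, y = [1, -1, 0], [1, 1, 2]
assert mat_vec(S, x) == [4 * v for v in x]
assert mat_vec(S, y) == [12 * v for v in y]
print(dot(x, y))  # 0: orthogonal

# Non-symmetric matrix: eigenvectors for the different eigenvalues 0 and 3.
N = [[1, 1], [2, 2]]
u, w = [-1, 1], [1, 2]
assert mat_vec(N, u) == [0 * v for v in u]
assert mat_vec(N, w) == [3 * v for v in w]
print(dot(u, w))  # 1: not orthogonal
```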
The result just presented shows that if an $n \times n$ symmetric matrix has exactly $n$ different eigenvalues then any $n$ corresponding eigenvectors are orthogonal to one another. Since we may take the eigenvectors to have length 1, this shows that the matrix is orthogonally diagonalisable. The following result makes this precise: if the matrix $A$ is symmetric, then $A$ is orthogonally diagonalisable; that is, there is an orthogonal matrix $P$ such that $P^T AP$ is a diagonal matrix.
(Note that we have only shown here that symmetric matrices with $n$ different eigenvalues are orthogonally diagonalisable, but it turns out that all symmetric matrices are orthogonally diagonalisable.)
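Example: Consider again the symmetric matrix
\[
A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix},
\]
which has eigenvalues 4, 0, 12 with corresponding eigenvectors
\[
\begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}.
\]
Activity 3.3 Convince yourself that any two of these three eigenvectors are orthogonal.
Now, these eigenvectors are not of length 1. For example, the first one has length $\sqrt{1^2 + (-1)^2 + 0^2} = \sqrt{2}$. If we divide each entry of it by $\sqrt{2}$, we will indeed obtain an eigenvector of length 1:
\[
\begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \end{pmatrix}.
\]
We can similarly normalise the other two vectors, obtaining
\[
\begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ -1/\sqrt{3} \end{pmatrix}, \quad \begin{pmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}.
\]
We now form the matrix $P$ whose columns are these normalised eigenvectors:
\[
P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6} \\ -1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6} \\ 0 & -1/\sqrt{3} & 2/\sqrt{6} \end{pmatrix}.
\]
Then $P$ is orthogonal and $P^T AP = D = \mathrm{diag}(4, 0, 12)$.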
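The normalisation step and the two properties we want of $P$ can be sketched numerically. The code below normalises the eigenvectors $(1, -1, 0)^T$, $(1, 1, -1)^T$, $(1, 1, 2)^T$ of the symmetric matrix with rows $(4, 0, 4)$, $(0, 4, 4)$, $(4, 4, 8)$, places them as the columns of $P$, and checks that $P^T P = I$ and $P^T AP = \mathrm{diag}(4, 0, 12)$. Because square roots are involved, the comparison is made up to a small tolerance; the helper names and the tolerance value are our own choices.

```python
import math

def normalise(v):
    """Divide a vector by its length so that the result has length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def transpose(M):
    return [list(row) for row in zip(*M)]

def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def is_close(X, Y, tol=1e-9):
    """True if all corresponding entries of X and Y differ by at most tol."""
    return all(abs(x - y) <= tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

A = [[4, 0, 4], [0, 4, 4], [4, 4, 8]]
eigenvectors = [[1, -1, 0], [1, 1, -1], [1, 1, 2]]   # for eigenvalues 4, 0, 12

P = transpose([normalise(v) for v in eigenvectors])  # normalised eigenvectors as columns
PT = transpose(P)

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
D = [[4, 0, 0], [0, 0, 0], [0, 0, 12]]
print(is_close(mat_mul(PT, P), identity))       # True
print(is_close(mat_mul(mat_mul(PT, A), P), D))  # True
```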
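Example: Let
\[
A = \begin{pmatrix} 7 & 0 & 9 \\ 0 & 2 & 0 \\ 9 & 0 & 7 \end{pmatrix}.
\]
Note that $A$ is symmetric. We find an orthogonal matrix $P$ such that $P^T AP$ is a diagonal matrix. The characteristic polynomial of $A$ is
\[
\begin{aligned}
|A - \lambda I| &= \begin{vmatrix} 7-\lambda & 0 & 9 \\ 0 & 2-\lambda & 0 \\ 9 & 0 & 7-\lambda \end{vmatrix} \\
&= (2-\lambda)\left[(7-\lambda)(7-\lambda) - 81\right] \\
&= (2-\lambda)(\lambda^2 - 14\lambda - 32) \\
&= (2-\lambda)(\lambda - 16)(\lambda + 2),
\end{aligned}
\]
where we have expanded the determinant using the middle row. So the eigenvalues are $2, 16, -2$. An eigenvector for $\lambda = 2$ is given by
\[
5x + 9z = 0, \quad 9x + 5z = 0.
\]
This means $x = z = 0$. So we may take $(0, 1, 0)^T$. This already has length 1 so there is no need to normalise it. (Recall that we need three eigenvectors which are of length 1.) For $\lambda = -2$ we find that an eigenvector is $(1, 0, -1)^T$ (or some multiple of this). To normalise (that is, to make it of length 1), we divide by its length, which is $\sqrt{2}$, obtaining $(1/\sqrt{2})(1, 0, -1)^T$. For $\lambda = 16$, we find a normalised eigenvector is $(1/\sqrt{2})(1, 0, 1)^T$. It follows that if we let
\[
P = \begin{pmatrix} 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 1 & 0 & 0 \\ 0 & -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix},
\]
then $P$ is orthogonal and $P^T AP = D = \mathrm{diag}(2, -2, 16)$.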
Learning outcomes
This chapter has discussed eigenvalues and eigenvectors and the very important technique of diagonalisation. We shall see in the next chapter how useful a technique diagonalisation is.
At the end of this chapter and the relevant reading, you should be able to:
Question 3.2 Prove that the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is not diagonalisable.
Question 3.5 Let
\[
A = \begin{pmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{pmatrix}.
\]
Find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.
Question 3.7 Suppose that $A$ is a real diagonalisable matrix and that all the eigenvalues of $A$ are non-negative. Prove that there is a matrix $B$ such that $B^2 = A$.
Question 3.1 The characteristic polynomial is $-\lambda^3 + 14\lambda^2 - 48\lambda$, which is easily factorised as $-\lambda(\lambda - 6)(\lambda - 8)$. So the eigenvalues are 0, 6, 8. Corresponding eigenvectors, respectively, are calculated to be (and non-zero multiples of)
Question 3.2 You can check that the only eigenvalue is 1 and that the corresponding
eigenvectors are all the scalar multiples of (1, 0)T . So there cannot be two linearly
independent eigenvectors, and hence the matrix is not diagonalisable.
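The reasoning in this answer amounts to saying that the eigenspace for the only eigenvalue is one-dimensional. For a $2 \times 2$ matrix this dimension is $2$ minus the rank of $A - \lambda I$, which can be sketched as follows. The function name is ours, and the rank test used (determinant first, then whether any entry is non-zero) is specific to the $2 \times 2$ case.

```python
def eigenspace_dimension_2x2(A, lam):
    """Dimension of the solution space of (A - lam*I)x = 0 for a 2x2 matrix A."""
    a, b = A[0][0] - lam, A[0][1]
    c, d = A[1][0], A[1][1] - lam
    if a * d - b * c != 0:
        rank = 2      # invertible: lam is not an eigenvalue
    elif any(v != 0 for v in (a, b, c, d)):
        rank = 1
    else:
        rank = 0
    return 2 - rank

A = [[1, 1], [0, 1]]
print(eigenspace_dimension_2x2(A, 1))                 # 1: not diagonalisable
print(eigenspace_dimension_2x2([[1, 0], [0, 1]], 1))  # 2: the identity matrix
```

The identity matrix also has 1 as its only eigenvalue, but there the eigenspace is two-dimensional, so a repeated eigenvalue alone does not prevent diagonalisation.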
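Question 3.3 Denote the set described by $W$. First, $0 \in W$. Suppose now that $x, y$ are in $W$ and that $\alpha \in \mathbb{R}$. We need to show that $x + y$ and $\alpha x$ are also in $W$. We know that $Ax = \lambda x$ and $Ay = \lambda y$, so
\[
A(x + y) = Ax + Ay = \lambda x + \lambda y = \lambda(x + y)
\]
and
\[
A(\alpha x) = \alpha(Ax) = \alpha(\lambda x) = \lambda(\alpha x),
\]
so $x + y$ and $\alpha x$ are indeed in $W$.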
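polynomial of $A$ is $p(\lambda) = -\lambda^3 + 2\lambda^2 + \lambda - 2$. Since $\lambda = 1$ is a root, we know that $(\lambda - 1)$ is a factor. Factorising, we obtain
\[
p(\lambda) = (\lambda - 1)(-\lambda^2 + \lambda + 2) = -(\lambda - 1)(\lambda - 2)(\lambda + 1),
\]
so the other eigenvalues are $\lambda = 2, -1$. Corresponding eigenvectors are, respectively, $(1, 1, 1)^T$ and $(0, -2, 1)^T$. We may therefore take
\[
P = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & -2 \\ 1 & 1 & 1 \end{pmatrix}, \quad D = \mathrm{diag}(1, 2, -1).
\]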
Question 3.5 This is slightly more complicated since there are not 3 distinct eigenvalues. The eigenvalues turn out to be 1 and 2, with 2 occurring twice. An eigenvector for 1 is $(-2, 1, 1)^T$. We need to find a set of 3 linearly independent eigenvectors, so we need another two coming from the eigenspace corresponding to 2. You should find that the eigenspace for $\lambda = 2$ is two-dimensional and has a basis consisting of $(-1, 0, 1)^T$ and $(0, 1, 0)^T$. These two vectors together with $(-2, 1, 1)^T$ do indeed form a linearly independent set. Therefore we may take
\[
P = \begin{pmatrix} -2 & -1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad D = \mathrm{diag}(1, 2, 2).
\]