
Statistics 910, #7

Eigenvectors and Eigenvalues of Stationary Processes


Overview
1. Toeplitz matrices
2. Szegő's theorem
3. Circulant matrices
4. Matrix Norms
5. Approximation via circulants

Toeplitz and circulant matrices


Toeplitz matrix A banded, square matrix $\Gamma_n$ (subscript $n$ for the $n \times n$ matrix) with elements $[\Gamma_n]_{jk} = \gamma_{j-k}$,

\Gamma_n = \begin{pmatrix}
\gamma_0 & \gamma_1 & \gamma_2 & \cdots & \gamma_{1-n} \\
\gamma_1 & \gamma_0 & \gamma_1 & \cdots & \gamma_{2-n} \\
\gamma_2 & \gamma_1 & \gamma_0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \gamma_1 \\
\gamma_{n-1} & \gamma_{n-2} & \cdots & \gamma_1 & \gamma_0
\end{pmatrix} \qquad (1)
Symmetry Toeplitz matrices don't have to be symmetric or real-valued, but ours will be since we'll set $\gamma_h = \gamma_{-h} = \mathrm{Cov}(X_{t+h}, X_t)$ for some stationary process $X_t$. From now on, $\Gamma_n$ is the covariance matrix of a stationary process.
Assumptions We will assume that the covariances are absolutely summable,

\sum_h |\gamma_h| < \infty .

Breadth The results shown here are in the form that's most accessible, without searching for too much generality. All of these extend more generally to other types of sequences, such as those that are square summable, and to other matrices that need not be symmetric.


Szegő's Theorem
Fourier transform Since the covariances are absolutely summable, it is (relatively) easy to show that we can define a continuous function from the $\gamma_h$,

f(\omega) = \sum_{h=-\infty}^{\infty} \gamma_h\, e^{-i 2\pi \omega h}, \qquad -\tfrac{1}{2} \le \omega < \tfrac{1}{2} . \qquad (2)

If the $\gamma_h$ are the covariances of a stationary process, then $f(\omega)$ is known as the spectral density function or spectrum of the process. (See Sections 4.1-4.3 in SS.)
Inverse transform The Fourier transform is invertible in the sense that we can recover the sequence of covariances from the spectral density function $f(\omega)$ by integration,

\gamma_h = \int_{-1/2}^{1/2} f(\omega)\, e^{i 2\pi \omega h}\, d\omega . \qquad (3)

Heuristically, the expression for the variance $\gamma_0 = \int_{-1/2}^{1/2} f(\omega)\, d\omega$ suggests that the spectral density decomposes the variance of the process into a continuous mix of frequencies.
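To make the transform pair concrete, here is a minimal numerical sketch (not part of the original notes). It assumes an AR(1) process with parameter phi = 0.6 and unit innovation variance, my own illustrative choice, for which both the covariances and the spectral density are known in closed form, and it checks the inversion formula (3) by a Riemann sum over frequencies.

```python
import numpy as np

# Hypothetical AR(1) example (illustrative choice, not from the notes):
#   gamma_h = sigma2 * phi**|h| / (1 - phi**2)
#   f(omega) = sigma2 / (1 - 2*phi*cos(2*pi*omega) + phi**2)
phi, sigma2 = 0.6, 1.0

def gamma(h):
    return sigma2 * phi ** abs(h) / (1 - phi ** 2)

def f(omega):
    return sigma2 / (1 - 2 * phi * np.cos(2 * np.pi * omega) + phi ** 2)

# Recover gamma_h from f via the inverse transform (3), approximating the
# integral over [-1/2, 1/2) by an average over a fine grid of frequencies.
omega = np.linspace(-0.5, 0.5, 20001, endpoint=False)
for h in range(4):
    approx = np.mean(f(omega) * np.exp(1j * 2 * np.pi * omega * h)).real
    print(h, round(gamma(h), 6), round(approx, 6))   # the two columns agree closely
```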
Szegő's theorem Define the eigenvalues of $\Gamma_n$ as

\lambda_{n,0},\ \lambda_{n,1},\ \ldots,\ \lambda_{n,n-1} .

These are all positive if $\Gamma_n$ is positive definite (as we often require or assume). Szegő's theorem shows that we can use the spectrum to approximate various functions of the eigenvalues. (An interesting question in the analysis of Toeplitz matrices in general is what happens when $\Gamma_n$ is not full rank.)
Let $G$ denote an arbitrary continuous function. Then

\lim_{n\to\infty} \frac{1}{n} \sum_{h=0}^{n-1} G(\lambda_{n,h}) = \int_{-1/2}^{1/2} G\big( f(\omega) \big)\, d\omega . \qquad (4)

That is, we get to replace a sum by an integral. The sum resembles a Riemann approximation to an integral, as if $\lambda_{n,h} \approx f(h/n)$.
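A quick numerical sketch of (4), using the same hypothetical AR(1) example as above (my choice of process, not from the notes) and $G(x) = x^2$; the eigenvalue average and the integral of $G(f)$ roughly agree, with the gap shrinking as $n$ grows.

```python
import numpy as np
from scipy.linalg import toeplitz

phi, sigma2, n = 0.6, 1.0, 400
gamma = sigma2 * phi ** np.arange(n) / (1 - phi ** 2)   # gamma_0, ..., gamma_{n-1}
lam = np.linalg.eigvalsh(toeplitz(gamma))                # eigenvalues of Gamma_n

omega = np.linspace(-0.5, 0.5, 20001, endpoint=False)
f = sigma2 / (1 - 2 * phi * np.cos(2 * np.pi * omega) + phi ** 2)

G = lambda x: x ** 2
print(np.mean(G(lam)))   # (1/n) sum_h G(lambda_{n,h})
print(np.mean(G(f)))     # approximates the integral of G(f(omega)) over [-1/2, 1/2)
```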


Basic examples
Trace The trace of a square matrix is the sum of the diagonal values
of the matrix, which equals the sum of the eigenvalues of the
matrix.
\frac{1}{n}\,\mathrm{trace}(\Gamma_n) = \frac{1}{n} \sum_h \lambda_{n,h} \;\longrightarrow\; \int_{-1/2}^{1/2} f(\omega)\, d\omega .

Though it follows directly from (3) that $\gamma_0 = \int_{-1/2}^{1/2} f(\omega)\, d\omega$, it is also a consequence of (4).

Determinant The product of the eigenvalues of a matrix.

\log |\Gamma_n|^{1/n} = \frac{1}{n} \sum_h \log \lambda_{n,h} \;\longrightarrow\; \int_{-1/2}^{1/2} \log f(\omega)\, d\omega .
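As a numerical check on the determinant example (again the hypothetical AR(1) from above, my own choice), the normalized log determinant settles down to $\int \log f(\omega)\, d\omega$, which equals $\log \sigma^2 = 0$ for this process.

```python
import numpy as np
from scipy.linalg import toeplitz

phi, sigma2 = 0.6, 1.0
for n in (50, 200, 800):
    gamma = sigma2 * phi ** np.arange(n) / (1 - phi ** 2)
    sign, logdet = np.linalg.slogdet(toeplitz(gamma))   # stable log-determinant
    print(n, logdet / n)                                # tends to log(sigma2) = 0 as n grows
```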
Prediction application Here's a surprising application of Szegő's theorem to prove a theorem of Kolmogorov. If $X_t$ is a stationary process, how well is it possible to predict $X_{n+1}$ linearly from its $n$ predecessors? In particular, what is

V_n = \min_a E\,(X_{n+1} - a_0 X_n - a_1 X_{n-1} - \cdots - a_{n-1} X_1)^2 \; ?

In the Gaussian case, the problem is equivalent to finding the conditional variance of $X_{n+1}$ given $X_{1:n}$. Szegő's theorem provides an elegant answer:

\lim_{n\to\infty} V_n = \exp\left( \int_{-1/2}^{1/2} \log f(\omega)\, d\omega \right) . \qquad (5)

The answer follows from applying (4) to the ratio of determinants, once you note that

V_n = \frac{|\Gamma_{n+1}|}{|\Gamma_n|} . \qquad (6)
(Long aside: To see that (6) holds, covariance manipulations of the multivariate normal distribution show you that the variance of the scalar r.v. $Y$ given the vector r.v. $X$ is $\mathrm{Var}(Y|X) = \sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}$, where the variance matrix is partitioned as

\Sigma = \mathrm{Var}\begin{pmatrix} Y \\ X \end{pmatrix} = \begin{pmatrix} \sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} .

To find the determinant $|\Sigma|$, postmultiply by a matrix with 1s on the diagonal, obtaining

|\Sigma| = \left| \Sigma \begin{pmatrix} 1 & 0 \\ -\Sigma_{xx}^{-1}\Sigma_{xy} & I \end{pmatrix} \right| = (\sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy})\,|\Sigma_{xx}| .

The expression (6) follows by taking $Y = X_{n+1}$ and $X = X_{1:n}$, so that $|\Gamma_{n+1}| = V_n\,|\Gamma_n|$. Note that the regression vector is $\beta = \Sigma_{xx}^{-1}\Sigma_{xy}$.)
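A small numerical sketch of the aside (the AR(1) covariances are my own illustrative choice, not from the notes): it checks the determinant factorization $|\Sigma| = (\sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy})\,|\Sigma_{xx}|$ with $Y = X_{n+1}$ and $X = X_{1:n}$, which is exactly the content of (6).

```python
import numpy as np
from scipy.linalg import toeplitz

phi, sigma2, n = 0.6, 1.0, 20
gamma = sigma2 * phi ** np.arange(n + 1) / (1 - phi ** 2)
Sigma = toeplitz(gamma)                    # covariance of (X_{n+1}, X_n, ..., X_1)
syy, Syx, Sxx = Sigma[0, 0], Sigma[0, 1:], Sigma[1:, 1:]

cond_var = syy - Syx @ np.linalg.solve(Sxx, Syx)   # Var(X_{n+1} | X_{1:n}) = V_n
print(np.linalg.det(Sigma))
print(cond_var * np.linalg.det(Sxx))               # same number, so V_n = |Sigma| / |Sigma_xx|
print(cond_var)                                    # equals sigma2 here: an AR(1) needs only one lag
```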

Now back to the prediction problem. From Szegő's theorem, $\frac{1}{n+1}\sum_j \log\lambda_{n+1,j} \approx \frac{1}{n}\sum_j \log\lambda_{n,j}$ since both approximate the same integral; that is, $|\Gamma_{n+1}|^{1/(n+1)} \approx |\Gamma_n|^{1/n}$. Now plug in: $\log V_n = \log|\Gamma_{n+1}| - \log|\Gamma_n| \approx (n+1)\int \log f - n \int \log f = \int_{-1/2}^{1/2} \log f(\omega)\, d\omega$, and (5) follows.
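To see the limit (5) in action, here is a hypothetical MA(1) example (theta = 0.8, sigma2 = 1, my own choice): for an MA(1), $\exp(\int \log f) = \sigma^2$, and the determinant ratio $V_n$ in (6) decreases toward that value as $n$ grows.

```python
import numpy as np
from scipy.linalg import toeplitz

theta, sigma2 = 0.8, 1.0
for n in (2, 5, 20, 80):
    gamma = np.zeros(n + 1)
    gamma[0], gamma[1] = (1 + theta ** 2) * sigma2, theta * sigma2   # MA(1) covariances
    Vn = np.linalg.det(toeplitz(gamma)) / np.linalg.det(toeplitz(gamma[:n]))
    print(n, Vn)    # ratios decrease toward exp(int log f) = sigma2 = 1
```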

Circulant matrices
Circulant matrix is a special type of Toeplitz matrix constructed by rotating a vector $c_0, c_1, \ldots, c_{n-1}$, say, cyclically by one position to fill successive rows of a matrix,

C_n = \begin{pmatrix}
c_0 & c_1 & c_2 & \cdots & c_{n-1} \\
c_{n-1} & c_0 & c_1 & \cdots & c_{n-2} \\
c_{n-2} & c_{n-1} & c_0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & c_1 \\
c_1 & c_2 & \cdots & c_{n-1} & c_0
\end{pmatrix} \qquad (7)

Circulant matrices are an important type of Toeplitz matrix for our purposes (they have other uses as well!) because we can easily find their eigenvectors and eigenvalues.
Eigenvectors These are relatively easy to find from the basic definition of an eigenvector and a great guess for the answer. An eigenvector $u$ of $C_n$ satisfies $C_n u = \lambda u$, which gives a system of $n$ equations:

c_0 u_0 + c_1 u_1 + \cdots + c_{n-1} u_{n-1} = \lambda u_0
c_{n-1} u_0 + c_0 u_1 + \cdots + c_{n-2} u_{n-1} = \lambda u_1
\vdots
c_1 u_0 + c_2 u_1 + \cdots + c_0 u_{n-1} = \lambda u_{n-1}


The equation in the $r$th row is, in modular form,

\text{row } r: \qquad \sum_{j=0}^{n-r-1} c_j u_{j+r} + \sum_{j=n-r}^{n-1} c_j u_{(j+r) \bmod n} = \lambda u_r ,

or in more conventional form as

\text{row } r: \qquad \sum_{j=0}^{n-r-1} c_j u_{j+r} + \sum_{j=n-r}^{n-1} c_j u_{j+r-n} = \lambda u_r .

Guess $u_j = \rho^j$ (maybe reasonable to guess this if you have been studying differential equations). Then

\sum_{j=0}^{n-r-1} c_j \rho^{j+r} + \sum_{j=n-r}^{n-1} c_j \rho^{j+r-n} = \lambda \rho^r .

If $\rho^n = 1$, it works! (Then $\rho^{j+r-n} = \rho^{j+r}$, so the two sums combine into $\big(\sum_{j=0}^{n-1} c_j \rho^j\big)\rho^r$, which matches the right side with $\lambda = \sum_j c_j \rho^j$.) These are the $n$ roots of unity; any $\rho = \exp(i 2\pi k/n)$ works for $k = 0, 1, \ldots, n-1$. The eigenvectors have the form $u' = (\rho^0, \rho^1, \rho^2, \ldots, \rho^{n-1})$:

u_r' = \big(1,\; e^{i 2\pi r/n},\; e^{i 2\pi 2r/n},\; \ldots,\; e^{i 2\pi (n-1)r/n}\big)

as seen in the discrete Fourier transform.
Eigenvalues These are the discrete Fourier transforms of the sequence that defines the circulant,

\lambda_{n,k} = \sum_{j=0}^{n-1} c_j\, e^{i 2\pi j k/n} .

If the $c_j = \gamma_j$, then we've got the first $n$ terms of the sum that defines the spectral density (2).
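A small sketch (with an arbitrary symmetric sequence of my own choosing) confirming that the DFT-style sum above reproduces the eigenvalues of the circulant; note that scipy.linalg.circulant builds the matrix by columns, so its transpose matches the row convention of (7) (the two coincide here because the sequence is symmetric).

```python
import numpy as np
from scipy.linalg import circulant

c = np.array([2.0, 0.7, 0.1, 0.1, 0.7])      # symmetric: c_j = c_{n-j}, like gamma_j + gamma_{n-j}
n = len(c)
Cn = circulant(c).T                          # rows are cyclic shifts of c, as in (7)

# Eigenvalues from the sum  lambda_k = sum_j c_j exp(i 2 pi j k / n)
dft = np.array([np.sum(c * np.exp(2j * np.pi * np.arange(n) * k / n)) for k in range(n)])

print(np.sort(np.linalg.eigvalsh(Cn)))       # eigenvalues of the circulant ...
print(np.sort(dft.real))                     # ... match the (real, since c is symmetric) DFT values
```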
Implications for covariance matrices We can now anticipate the results. Asymptotically in $n$, the vectors that define the discrete Fourier transform are eigenvectors of every covariance matrix. The eigenvalues are then the transform coefficients of the covariances, namely values of the spectral density function. If we define a unitary matrix $U = (u_0, u_1, \ldots, u_{n-1})$ from the (normalized) eigenvectors $u_j$, then we obtain

U^* \Gamma_n U \approx \mathrm{diag}\big( f(\omega_j) \big), \qquad \omega_j = j/n ,

where $U^*$ denotes the conjugate transpose.


To confirm these guesses, we need to show that a circulant matrix provides a good approximation to the covariance matrix, good enough so that we can use the results for circulants when describing covariance matrices. To describe what we mean by a good approximation, we need a way to measure the distance between matrices, a norm.
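Anticipating the display above, here is a rough numerical sketch (hypothetical AR(1) example, my own choice) of $U^*\Gamma_n U \approx \mathrm{diag}(f(\omega_j))$: the diagonal of the transformed matrix tracks the spectral density at the Fourier frequencies, and each off-diagonal entry is small, with both errors shrinking as $n$ grows.

```python
import numpy as np
from scipy.linalg import toeplitz

phi, sigma2, n = 0.6, 1.0, 256
gamma = sigma2 * phi ** np.arange(n) / (1 - phi ** 2)
Gamma_n = toeplitz(gamma)

idx = np.arange(n)
U = np.exp(2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)   # unitary DFT matrix
D = U.conj().T @ Gamma_n @ U

f = sigma2 / (1 - 2 * phi * np.cos(2 * np.pi * idx / n) + phi ** 2)
print(np.max(np.abs(np.diag(D).real - f)))         # diagonal tracks f(j/n); error shrinks like 1/n
print(np.max(np.abs(D - np.diag(np.diag(D)))))     # off-diagonal entries are small as well
```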

Matrix norms
Norms A norm on a vector space $V$ is any function $\|x\|$ for $x \in V$ for which ($\alpha \in \mathbb{R}$)
1. $\|x\| > 0$ for $x \neq 0$.
2. $\|\alpha x\| = |\alpha|\,\|x\|$
3. $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality)
Operator norm Define the operator norm of a square matrix $T$ as the maximum ratio of quadratic forms,

\|T\| = \max_x \frac{x' T x}{x' x} = \max_j \lambda_j ,

where the $\lambda_j$ are the eigenvalues (singular values) of $T$.


Hilbert-Schmidt or weak norm For an $n \times n$ matrix $T$,

|T|^2 = n^{-1} \sum_{j,k} |t_{jk}|^2 = n^{-1}\,\mathrm{trace}(T'T) = n^{-1} \sum_j \lambda_j^2 .

Connections It is indeed a weaker norm,

|T|^2 \le \|T\|^2 = \lambda_{\max}^2 .

The two allow you to handle products,

|TS| \le \|T\|\,|S| .
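A quick sketch with random symmetric matrices (my own example, not from the notes) illustrating the two inequalities just quoted; for our symmetric, nonnegative definite matrices the operator norm is simply the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6)); T = A @ A.T       # symmetric, nonnegative definite
B = rng.normal(size=(6, 6)); S = B @ B.T

def op_norm(M):                                # ||M|| = largest eigenvalue (symmetric case)
    return np.max(np.abs(np.linalg.eigvalsh(M)))

def weak_norm(M):                              # |M|^2 = (1/n) sum_{j,k} m_{jk}^2
    return np.sqrt(np.sum(M ** 2) / M.shape[0])

print(weak_norm(T), op_norm(T))                      # |T| <= ||T||
print(weak_norm(T @ S), op_norm(T) * weak_norm(S))   # |TS| <= ||T|| |S|
```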


Approximation via circulants


An approximation Consider the following circulant that approximates $\Gamma_n$, obtained by flipping the covariances and running them both ways:

G_n = \begin{pmatrix}
\gamma_0 & \gamma_1 + \gamma_{n-1} & \gamma_2 + \gamma_{n-2} & \cdots & \gamma_{n-1} + \gamma_1 \\
\gamma_{n-1} + \gamma_1 & \gamma_0 & \gamma_1 + \gamma_{n-1} & \cdots & \gamma_{n-2} + \gamma_2 \\
\gamma_{n-2} + \gamma_2 & \gamma_{n-1} + \gamma_1 & \gamma_0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \gamma_1 + \gamma_{n-1} \\
\gamma_1 + \gamma_{n-1} & \gamma_2 + \gamma_{n-2} & \cdots & \gamma_{n-1} + \gamma_1 & \gamma_0
\end{pmatrix} \qquad (8)
The norm of the difference $\Gamma_n - G_n$ is a familiar type of sum,

|\Gamma_n - G_n|^2 = \frac{2}{n} \sum_{h=1}^{n-1} (n-h)\,\gamma_{n-h}^2 \qquad \text{(reverse the sum)}

= \frac{2}{n} \sum_{h=1}^{n-1} h\,\gamma_h^2 = 2 \sum_{h=1}^{n-1} \frac{h}{n}\,\gamma_h^2 .

Since we assume that $\sum_j |\gamma_j| < \infty$, the sum converges to 0 (the weights $h/n$ are bounded by 1 and vanish for each fixed $h$, while the tail $\sum_{h>m} \gamma_h^2$ can be made arbitrarily small), so

\lim_{n\to\infty} |\Gamma_n - G_n| = 0 .
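A numerical sketch of this convergence for the hypothetical AR(1) example used earlier (my own illustrative choice): the weak-norm distance between $\Gamma_n$ and the circulant $G_n$ of (8) shrinks as $n$ grows.

```python
import numpy as np
from scipy.linalg import toeplitz, circulant

phi, sigma2 = 0.6, 1.0

def weak_norm(M):
    return np.sqrt(np.sum(M ** 2) / M.shape[0])

for n in (25, 100, 400):
    gamma = sigma2 * phi ** np.arange(n) / (1 - phi ** 2)
    c = gamma.copy()
    c[1:] += gamma[:0:-1]                      # c_j = gamma_j + gamma_{n-j}, as in (8)
    print(n, weak_norm(toeplitz(gamma) - circulant(c)))   # decreases toward 0
```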

Eigenvalues of close matrices If two matrices are close in the weak norm, then the averages of their eigenvalues are close. In particular, given two matrices $A$ and $B$ with eigenvalues $\alpha_j$ and $\beta_j$, then

\Big| n^{-1} \sum_j \alpha_j - n^{-1} \sum_j \beta_j \Big| \le |A - B| .

Proof: If $D = A - B$, then

\sum_j \alpha_j - \sum_j \beta_j = \mathrm{trace}(A) - \mathrm{trace}(B) = \mathrm{trace}(D) .

Now use Cauchy-Schwarz in the form

\Big( \sum_j a_j \Big)^2 = \Big( \sum_j a_j \cdot 1 \Big)^2 \le \Big( \sum_j a_j^2 \Big)\, n \qquad (9)

to show that

|\mathrm{trace}(D)|^2 = \Big| \sum_j d_{jj} \Big|^2 \le n \sum_j d_{jj}^2 \le n \sum_{j,k} d_{jk}^2 = n^2 |D|^2 .

Dividing by $n^2$ and taking square roots gives the claim.

The loose bounds in the proof suggest you can do a lot better. In fact,
you can move the absolute value inside the sum (so that the actual
eigenvalues are getting close, not just on average).
Powers of eigenvalues A messier argument produces a similar result. Given two matrices $A$ and $B$ with $|A - B| \to 0$ and eigenvalues $\alpha_j$ and $\beta_j$, then for any power $s$,

\lim_{n\to\infty} n^{-1} \sum_j \big( \alpha_j^s - \beta_j^s \big) = 0 .

Extension Now that we've found that powers of the eigenvalues converge for close matrices in the limit, it's not hard to get the result for a continuous function. The argument is a common one: polynomials can approximate any continuous function $g$. The result implies that

\lim_{n\to\infty} n^{-1} \sum_j \big( g(\alpha_j) - g(\beta_j) \big) = 0 .
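Putting the pieces together, a final sketch (same hypothetical AR(1) as before, with $g = \log$, both my own choices): the averaged eigenvalue functionals of $\Gamma_n$ and of the circulant $G_n$ move together as $n$ grows, which is the route to Szegő's theorem.

```python
import numpy as np
from scipy.linalg import toeplitz, circulant

phi, sigma2 = 0.6, 1.0
for n in (25, 100, 400):
    gamma = sigma2 * phi ** np.arange(n) / (1 - phi ** 2)
    c = gamma.copy(); c[1:] += gamma[:0:-1]             # circulant approximation (8)
    a = np.linalg.eigvalsh(toeplitz(gamma))             # eigenvalues of Gamma_n
    b = np.linalg.eigvalsh(circulant(c))                # eigenvalues of G_n (values of f near j/n)
    print(n, abs(np.mean(np.log(a)) - np.mean(np.log(b))))   # difference shrinks toward 0
```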
