Chapter 10

Iterative Methods for Linear Systems

10.1 Classical Iterative Methods

10.1.1 Introduction


The methods discussed so far for solving systems of linear equations Ax = b have been direct methods, based on matrix factorization and yielding the solution in a fixed finite number of operations. Iterative methods, on the other hand, start from an initial approximation, which is successively improved until a sufficiently accurate solution is obtained. The idea of solving systems of linear equations by iterative methods dates at least back to Gauss (1823). Before the advent of computers the iterative methods used were usually noncyclic relaxation methods, guided at each step by the sizes of the residuals of the current solution. When in the 1950s computers replaced desk calculators an intense development of cyclic iterative methods started.

Iterative methods are used most often for the solution of very large linear systems. Basic iterative methods work directly with the original matrix A and only need extra storage for a few vectors. Since A is involved only in terms of matrix-vector products there is usually no need even to store the matrix A. Such methods are particularly useful for sparse systems, which typically arise in the solution of boundary value problems of partial differential equations by finite difference or finite element methods. The matrices involved can be huge, sometimes involving several million unknowns. The LU factors of matrices in such applications typically contain orders of magnitude more nonzero elements than A itself. Hence, because of the storage and number of arithmetic operations required, direct methods may become far too costly to use. This is true in particular for problems arising from three-dimensional simulations in, e.g., reservoir modeling, mechanical engineering, and electric circuit simulation. However, in some areas, e.g., structural engineering, which typically yield very ill-conditioned matrices, direct methods are still preferred.

Preconditioned iterative methods can be viewed as a compromise between a direct and an iterative solution method. General purpose techniques for constructing preconditioners have made iterative methods successful in many industrial applications.

10.1.2 Relaxation Methods

Consider a linear system of equations Ax = b, where A is square and nonsingular. All iterative methods assume that an initial approximation x^{(0)} is given (e.g., we could take x^{(0)} = 0) and by some rule compute a sequence of approximations x^{(1)}, x^{(2)}, .... The iterative methods used before the age of high speed computers were usually rather unsophisticated. We describe here two classical, so called, relaxation methods.

Assume that A has nonzero diagonal entries, i.e., a_{ii} ≠ 0, i = 1, 2, ..., n. If A is symmetric, positive definite this is necessarily the case. Otherwise, since A is nonsingular, the equations can always be reordered so that this is true. In component form the system can then be written

x_i = \frac{1}{a_{ii}} \Big( b_i - \sum_{j=1, j≠i}^{n} a_{ij} x_j \Big), \qquad i = 1, 2, ..., n.   (10.1.1)

In a (minor) step of the iteration we pick one equation, say the ith, and then adjust the ith component of x^{(k)} so that this equation becomes exactly satisfied. Hence, given x^{(k)} we compute

x_i^{(k+1)} = x_i^{(k)} + \frac{1}{a_{ii}} r_i^{(k)}, \qquad r_i^{(k)} = b_i - \sum_{j=1}^{n} a_{ij} x_j^{(k)}.   (10.1.2)

In the days of "hand" computation one picked an equation with a large residual and went through the equations in an irregular manner. This is less efficient when using a computer, and here one usually performs these adjustments for i = 1, 2, ..., n, in a cyclic fashion. The resulting iterative method is called the method of simultaneous displacements or Jacobi's method. Note that all components of x can be updated simultaneously and the result does not depend on the sequencing of the equations.

The method of successive displacements or Gauss–Seidel's method¹ differs from the Jacobi method by using new values x_j^{(k+1)} as soon as they are available, as follows:

x_i^{(k+1)} = x_i^{(k)} + \frac{1}{a_{ii}} r_i^{(k)}, \qquad r_i^{(k)} = b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i}^{n} a_{ij} x_j^{(k)}, \qquad i = 1, 2, ..., n.   (10.1.3)
Here the components are successively updated and the sequencing of the equations will influence the result.

Since each new value x_i^{(k+1)} can immediately replace x_i^{(k)} in storage, the storage for the unknowns in the Gauss–Seidel method is halved compared to Jacobi's method. For both methods the amount of work required in each iteration step is comparable in complexity to the multiplication of A by a vector, i.e., proportional to the number of nonzero elements in A. By construction it follows that if lim_{k→∞} x^{(k)} = x, then x satisfies (10.1.1) and therefore the system Ax = b.

¹ It was noted by G. Forsythe that Gauss nowhere mentioned this method and Seidel never advocated using it!

Example 10.1.1. Consider the symmetric, positive definite linear system Ax = b with

A = \begin{pmatrix} 4 & -1 & -1 & 0 \\ -1 & 4 & 0 & -1 \\ -1 & 0 & 4 & -1 \\ 0 & -1 & -1 & 4 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}.
With the Jacobi method we get the following sequence of approximations:

 k     x_1^{(k)}    x_2^{(k)}    x_3^{(k)}    x_4^{(k)}
 1     0.25         0.5          0            0.25
 2     0.375        0.625        0.125        0.375
 3     0.4375       0.6875       0.1875       0.4375
 4     0.46875      0.71875      0.21875      0.46875
 5     0.48438      0.73438      0.23438      0.48438
 ...   ...          ...          ...          ...
       0.5          0.75         0.25         0.5

If we instead use the Gauss–Seidel method we obtain:

 k     x_1^{(k)}    x_2^{(k)}    x_3^{(k)}    x_4^{(k)}
 1     0.25         0.5625       0.0625       0.40625
 2     0.40625      0.70312      0.20312      0.47656
 3     0.47656      0.73828      0.23828      0.49414
 4     0.49414      0.74707      0.24707      0.49854
 5     0.49854      0.74927      0.24927      0.49963
 ...   ...          ...          ...          ...

In both cases the iteration converges, but rather slowly. Note that the convergence with Gauss–Seidel's method is, in this example, about twice as fast as with the Jacobi method. This will be shown in Section 10.3.1 to be true for an important class of matrices, so called consistently ordered matrices; see Def. 10.3.4. However, there are examples for which the Gauss–Seidel method diverges and the Jacobi method converges, and vice versa.
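The two sweeps of Example 10.1.1 are easy to reproduce numerically. The following MATLAB/Octave sketch (a minimal illustration; the variable names and the choice of five iterations are ours) runs Jacobi and Gauss–Seidel from x^{(0)} = 0 and prints the first component, which should agree with the tables above:

  % Jacobi and Gauss-Seidel sweeps for the 4x4 matrix of Example 10.1.1
  A = [4 -1 -1 0; -1 4 0 -1; -1 0 4 -1; 0 -1 -1 4];
  b = [1; 2; 0; 1];
  n = length(b);
  xJ = zeros(n,1);  xGS = zeros(n,1);
  for k = 1:5
    % Jacobi: all components are updated from the old iterate
    xJnew = xJ;
    for i = 1:n
      xJnew(i) = (b(i) - A(i,:)*xJ + A(i,i)*xJ(i)) / A(i,i);
    end
    xJ = xJnew;
    % Gauss-Seidel: new values are used as soon as they are available
    for i = 1:n
      xGS(i) = (b(i) - A(i,:)*xGS + A(i,i)*xGS(i)) / A(i,i);
    end
    fprintf('k=%d  Jacobi x1=%.5f   Gauss-Seidel x1=%.5f\n', k, xJ(1), xGS(1));
  end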
It was noted early that more rapid convergence could often be achieved by making the correction

x_i^{(k+1)} = x_i^{(k)} + ω \frac{r_i^{(k)}}{a_{ii}},

with ω > 1 (over-relaxation) or ω < 1 (under-relaxation).


Another classical iterative method is Richardson's method,² defined by

x^{(k+1)} = x^{(k)} + ω_k (b - A x^{(k)}), \qquad k = 0, 1, 2, ...,   (10.1.4)

where ω_k > 0 are parameters to be chosen. It follows easily from (10.1.4) that the residual r^{(k)} = b - A x^{(k)} and the error satisfy the recursions

r^{(k+1)} = (I - ω_k A) r^{(k)}, \qquad x^{(k+1)} - x = (I - ω_k A)(x^{(k)} - x).

In the special case that ω = ω_k we have

x^{(k)} - x = (I - ωA)^k (x^{(0)} - x).

If A has a nonzero diagonal it can be scaled to have all diagonal elements equal to 1. In this case Richardson's method with ω = 1 is equivalent to Jacobi's method. The convergence of the three methods introduced here will be studied in Section 10.2.

10.1.3 A Model Problem

The biggest source of large linear systems is partial differential equations. An equation which is often encountered is Poisson's equation, where u(x, y) satisfies

\frac{∂²u}{∂x²} + \frac{∂²u}{∂y²} = f, \qquad (x, y) ∈ Ω = (0, 1) × (0, 1).

On the boundary ∂Ω we assume u(x, y) to be prescribed. We will frequently use this as a model problem.

To approximate the solution we impose a uniform square mesh of side h = 1/n on Ω. Taking f = 0 (Laplace equation) and approximating the second derivatives by symmetric difference quotients gives a difference equation

\frac{1}{h²} ( u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4u_{ij} ) = 0, \qquad 0 < i, j < n,

for the unknown values u_{ij} at interior mesh points; see Section 13.1.2. If the mesh points are enumerated line by line (the so-called "natural ordering") and a vector u is formed of the function values, the difference equation can then be written in matrix form as

Au = h² b, \qquad u = (u_1, u_2, ..., u_{n-1}),

where u_i is the vector of unknowns in the ith line and the matrix A is symmetric, with at most five nonzero elements in each row.
It can be verified that the nonzero elements of A lie on five diagonals. In block form the matrix can be written as

A = \begin{pmatrix} 2I+T & -I & & \\ -I & 2I+T & \ddots & \\ & \ddots & \ddots & -I \\ & & -I & 2I+T \end{pmatrix} ∈ R^{(n-1)² × (n-1)²},   (10.1.5)

² L. F. Richardson [1910].

where T is symmetric tridiagonal,

T = \begin{pmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{pmatrix} ∈ R^{(n-1) × (n-1)}.   (10.1.6)

Taking n = 3 we obtain the matrix used in Example 10.1.1.
It can be shown that in the Cholesky factorization A = LL^T nearly all zero elements within the outermost nonzero diagonals of L will be filled in. Hence L contains about n³ nonzero elements compared to only about 5n² in A.

To compute the Cholesky factorization of a symmetric band matrix of order n and (half) bandwidth w requires approximately ½nw² flops (see Algorithm 6.4.6). For the matrix A in (10.1.5) the dimension is n² and the bandwidth equals n. Hence about ½n⁴ flops are needed for the factorization. This can be compared to the 5n² flops needed per iteration, e.g., with Jacobi's method.

The above shows that for the model problem direct methods use O(n²) flops and about O(n) storage per grid point. This disadvantage becomes even more accentuated if a three-dimensional problem is considered. For the Laplace equation in the unit cube a similar study shows that for solving n³ unknowns we need about ½n⁷ flops and about n⁵ storage. When n grows this quickly becomes infeasible. However, basic iterative methods still require only about 7n³ flops per iteration.

We still have not discussed the number of iterations needed to get acceptable accuracy. It turns out that this will depend on the condition number of the matrix. We now show that for the Laplace equation considered above this condition number will be about h^{-2}, independent of the dimension of the problem.

Lemma 10.1.1. Let T = trid(c, a, b) ∈ R^{n×n} be a tridiagonal matrix with constant diagonals, and assume that a, b, c are real and bc > 0. Then the n eigenvalues of T are given by

λ_i = a + 2\sqrt{bc} \cos\frac{iπ}{n+1}, \qquad i = 1, ..., n.

Further, the jth component of the eigenvector v_i corresponding to λ_i is

v_{ij} = (c/b)^{j/2} \sin\frac{ijπ}{n+1}, \qquad j = 1, ..., n.

From Lemma 10.1.1 it follows that the eigenvalues of T are λ_i = 2(1 + \cos(iπ/n)), i = 1, ..., n-1. In particular we have that

λ_max = 2(1 + \cos(π/n)) ≈ 4, \qquad λ_min = 2(1 - \cos(π/n)) ≈ (π/n)².

Hence the condition number of T is approximately equal to 4n²/π².


The matrix A = 4(I - L - U) in (10.1.5) can be written in terms of the Kronecker product (see Section 6.2.3) as

A = (I ⊗ T) + (T ⊗ I),

i.e., A is the Kronecker sum of T and T. It follows that the (n-1)² eigenvalues of A are (λ_i + λ_j), i, j = 1, ..., n-1, and hence the condition number of A is the same as for T. The same conclusion can be shown to hold for a three-dimensional problem.
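The Kronecker-sum structure makes it easy to form A and to check these spectral statements numerically. The following MATLAB/Octave sketch (the mesh parameter n = 10 is our own choice) builds A and compares the condition numbers of T and A with the estimate 4n²/π²:

  % Model problem matrix A = (I (x) T) + (T (x) I) and its condition number
  n  = 10;                                   % mesh parameter, h = 1/n (our choice)
  e  = ones(n-1,1);
  T  = spdiags([-e 2*e -e], -1:1, n-1, n-1); % T = tridiag(-1,2,-1) of order n-1
  I  = speye(n-1);
  A  = kron(I,T) + kron(T,I);                % Kronecker sum, order (n-1)^2
  lamT = eig(full(T));  lamA = eig(full(A));
  fprintf('cond(T) = %.2f   4n^2/pi^2 = %.2f\n', max(lamT)/min(lamT), 4*n^2/pi^2);
  fprintf('cond(A) = %.2f\n', max(lamA)/min(lamA));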
10.2 Stationary Iterative Methods

The Jacobi, Gauss–Seidel, and Richardson methods are all special cases of a class of iterative methods, the general form of which is

M x^{(k+1)} = N x^{(k)} + b, \qquad k = 0, 1, ....   (10.2.1)

Here

A = M - N   (10.2.2)

is a splitting of the coefficient matrix A with M nonsingular. If the iteration (10.2.1) converges, i.e., lim_{k→∞} x^{(k)} = x, then Mx = Nx + b and it follows from (10.2.2) that the limit vector x solves the linear system Ax = b. For the iteration to be practical, it must be easy to solve linear systems with matrix M. This is the case, for example, if M is chosen to be triangular.
The iteration (10.2.1) is equivalent to

x^{(k+1)} = B x^{(k)} + c, \qquad k = 0, 1, ...,   (10.2.3)

where

B = M^{-1} N = I - M^{-1} A, \qquad c = M^{-1} b.

An iteration of the form (10.2.3) is called a (one-step) stationary iterative method, and B the iteration matrix. (In a non-stationary method the iteration matrix B depends on k.) Subtracting the equation x = Bx + c from (10.2.3) we obtain the recurrence formula

x^{(k+1)} - x = B (x^{(k)} - x)   (10.2.4)

for the errors in successive approximations.


Richardson's method (10.1.4) can, for fixed ω_k = ω, be written in the form (10.2.1) with

M = ω^{-1} I, \qquad N = ω^{-1} I - A,

so that the iteration matrix is B = I - ωA.
To write the Jacobi and Gauss–Seidel methods in the form of one-step stationary iterative methods we introduce the standard splitting

A = D_A - L_A - U_A,   (10.2.5)

where D_A = diag(a_{11}, ..., a_{nn}),

L_A = -\begin{pmatrix} 0 & & & \\ a_{21} & 0 & & \\ \vdots & \ddots & \ddots & \\ a_{n1} & \cdots & a_{n,n-1} & 0 \end{pmatrix}, \qquad U_A = -\begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ & 0 & \ddots & \vdots \\ & & \ddots & a_{n-1,n} \\ & & & 0 \end{pmatrix},
and L_A and U_A are strictly lower and upper triangular, respectively. Assuming that D_A > 0, we can also write

D_A^{-1} A = I - L - U, \qquad L = D_A^{-1} L_A, \qquad U = D_A^{-1} U_A.

With these notations the Jacobi method (10.1.2) can be written D_A x^{(k+1)} = (L_A + U_A) x^{(k)} + b, or

x^{(k+1)} = (L + U) x^{(k)} + c, \qquad c = D_A^{-1} b.   (10.2.6)

The Gauss–Seidel method (10.1.3) becomes (D_A - L_A) x^{(k+1)} = U_A x^{(k)} + b, or equivalently

x^{(k+1)} = (I - L)^{-1} U x^{(k)} + c, \qquad c = (I - L)^{-1} D_A^{-1} b.

Hence these methods are special cases of one-step stationary iterative methods, and correspond to the matrix splittings

Jacobi:        M = D_A,        N = L_A + U_A,
Gauss–Seidel:  M = D_A - L_A,  N = U_A.

The iteration matrices for the Jacobi and Gauss–Seidel methods are

B_J = D_A^{-1}(L_A + U_A) = L + U, \qquad B_{GS} = (D_A - L_A)^{-1} U_A = (I - L)^{-1} U.
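For small test matrices it is convenient to form these iteration matrices explicitly and check that their spectral radii are below one. A sketch in MATLAB/Octave, using the 4×4 matrix of Example 10.1.1 (our own test case):

  % Standard splitting A = D - L_A - U_A and the Jacobi/Gauss-Seidel iteration matrices
  A  = [4 -1 -1 0; -1 4 0 -1; -1 0 4 -1; 0 -1 -1 4];
  D  = diag(diag(A));
  LA = -tril(A,-1);            % strictly lower part, sign convention A = D - LA - UA
  UA = -triu(A, 1);
  BJ  = D \ (LA + UA);         % Jacobi iteration matrix
  BGS = (D - LA) \ UA;         % Gauss-Seidel iteration matrix
  rhoJ  = max(abs(eig(BJ)));   % = cos(pi/3) = 0.5 for this matrix
  rhoGS = max(abs(eig(BGS)));  % = rhoJ^2 = 0.25
  fprintf('rho(BJ) = %.4f, rho(BGS) = %.4f\n', rhoJ, rhoGS);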

10.2.1 Convergence Analysis

The iterative method (10.2.3) is called convergent if the sequence {x^{(k)}}_{k=1,2,...} converges for all initial vectors x^{(0)}. Of fundamental importance in the study of convergence of stationary iterative methods are conditions for a sequence of powers of a matrix to converge to the null matrix. For this we need some results from the theory of eigenvalues of matrices.

In Section 6.2.2 we introduced the spectral radius of a matrix A as the nonnegative number

ρ(A) = \max_{1≤i≤n} |λ_i(A)|.

We have the following important result:

Theorem 10.2.1. A given matrix B ∈ R^{n×n} is said to be convergent if ρ(B) < 1. It holds that

\lim_{k→∞} B^k = 0  ⟺  ρ(B) < 1.   (10.2.7)
Proof. We will show that the following four conditions are equivalent:

(i) lim_{k→∞} B^k = 0,
(ii) lim_{k→∞} B^k x = 0, ∀x ∈ C^n,
(iii) ρ(B) < 1,
(iv) ‖B‖ < 1 for at least one matrix norm.

For any vector x we have the inequality ‖B^k x‖ ≤ ‖B^k‖ ‖x‖, which shows that (i) implies (ii). If ρ(B) ≥ 1, then there is an eigenvector x ∈ C^n such that Bx = λx with |λ| ≥ 1. Then the sequence B^k x = λ^k x, k = 1, 2, ..., is not convergent when k → ∞, and hence (ii) implies (iii). By Theorem 10.2.9 (see Section 10.2.4), given a number ε > 0 there exists a consistent matrix norm ‖·‖, depending on B and ε, such that

‖B‖ < ρ(B) + ε.

Therefore (iv) follows from (iii). Finally, by applying the inequality ‖B^k‖ ≤ ‖B‖^k, we see that (iv) implies (i).

The following necessary and sufficient criterion for convergence of a stationary iterative method follows from Theorem 10.2.1.

Theorem 10.2.2. The stationary iterative method x^{(k+1)} = Bx^{(k)} + c is convergent for all initial vectors x^{(0)} if and only if ρ(B) < 1, where ρ(B) is the spectral radius of B.

Proof. From the recurrence (10.2.4) it follows that

x^{(k)} - x = B^k (x^{(0)} - x).   (10.2.8)

Hence x^{(k)} converges for all initial vectors x^{(0)} if and only if lim_{k→∞} B^k = 0. The theorem now follows from Theorem 10.2.1.

The problem of obtaining the spectral radius of B is usually no less difficult than solving the linear system. Hence the following upper bound is useful when trying to prove convergence.

Lemma 10.2.3.
For any matrix A ∈ R^{n×n} and for any consistent matrix norm we have

ρ(A) ≤ ‖A‖.

Proof. Let λ be an eigenvalue of A such that |λ| = ρ(A). Then Ax = λx, x ≠ 0, and taking norms

ρ(A)‖x‖ = |λ| ‖x‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖.

Since ‖x‖ > 0, we can divide the inequality by ‖x‖ and the result follows.

From Lemma 10.2.3 it follows that a sufficient condition for convergence of the iterative method is that ‖B‖ < 1, for some matrix norm.

Usually, we are not only interested in convergence, but also in the rate of convergence. Since by (10.2.8) the error at step k, e^{(k)} = x^{(k)} - x, satisfies e^{(k)} = B^k e^{(0)}, we have for any consistent pair of norms

‖e^{(k)}‖ ≤ ‖B^k‖ ‖e^{(0)}‖ ≤ ‖B‖^k ‖e^{(0)}‖.

Thus, to reduce the norm of the error by a given factor δ < 1, it suffices to perform k iterations, where k is the smallest integer for which ‖B^k‖ ≤ δ. Taking logarithms we obtain the condition

k ≥ \frac{\log(1/δ)}{R_k(B)}, \qquad R_k(B) = -\frac{1}{k} \log ‖B^k‖.

An expression for the asymptotic rate follows from the relation

ρ(B) = \lim_{k→∞} (‖B^k‖)^{1/k},

which holds for any consistent matrix norm. This is a non-trivial result, but it can be proved by using the Jordan normal form; see Problem 10.2.4.
This motivates the following definition:

Definition 10.2.4. Assume that the iterative method (10.2.3) is convergent. For any matrix norm ‖·‖ we define the average rate of convergence by

R_k(B) = -\frac{1}{k} \log ‖B^k‖.   (10.2.9)

The corresponding asymptotic rate of convergence is given by

R_∞(B) = \lim_{k→∞} R_k(B) = -\log ρ(B).

10.2.2 Convergence of Some Classical Methods

For fixed ω_k = ω we have seen that Richardson's method is a stationary iterative method with iteration matrix B = I - ωA ∈ R^{n×n}.

Theorem 10.2.5.
Assume that the eigenvalues λ_i of A are all real and satisfy

0 < a ≤ λ_i ≤ b, \qquad i = 1, ..., n.

Then Richardson's method is convergent for 0 < ω < 2/b.

Proof. The eigenvalues of B = I - ωA are μ_i = 1 - ωλ_i and thus satisfy 1 - ωb ≤ μ_i ≤ 1 - ωa. If 1 - ωa < 1 and 1 - ωb > -1, then |μ_i| < 1 for all i and the method is convergent. Since a > 0 the first condition is satisfied for all ω > 0, while the second is true if ω < 2/b.

Assuming that a = λ_min and b = λ_max, which value of ω will minimize the spectral radius

ρ(B) = \max\{|1 - ωa|, |1 - ωb|\} ?

It is left as an exercise to show that the optimal ω satisfies 1 - ωa = ωb - 1. (Hint: plot the graphs of |1 - ωa| and |1 - ωb| for ω ∈ (0, 2/b).) Hence ω_opt = 2/(b + a), for which

ρ(B) = \frac{b - a}{b + a} = \frac{κ - 1}{κ + 1} = 1 - \frac{2}{κ + 1},

where κ = b/a is the condition number of A. Note that for large values of κ the rate of convergence with ω_opt is proportional to κ^{-1}. This illustrates a typical fact for iterative methods: ill-conditioned systems in general require more work to achieve a certain accuracy!
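A small numerical check of the optimal Richardson parameter, assuming only that the extreme eigenvalues a and b are known (the values below are the extreme eigenvalues of T for n = 10, taken from the model problem; the grid search is our own device):

  % Optimal parameter for stationary Richardson iteration: rho(B) = (kappa-1)/(kappa+1)
  a = 0.0979; b = 3.902;                    % assumed extreme eigenvalues
  w = linspace(1e-3, 2/b, 500);
  rho = max(abs(1 - w*a), abs(1 - w*b));    % spectral radius of I - w*A for each w
  [rmin, j] = min(rho);
  kappa = b/a;
  fprintf('w_opt   = %.4f  (2/(a+b)       = %.4f)\n', w(j), 2/(a+b));
  fprintf('rho_min = %.4f  ((k-1)/(k+1)   = %.4f)\n', rmin, (kappa-1)/(kappa+1));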

Theorem 10.2.6. The Jacobi method is convergent if A is strictly row-wise diagonally dominant, i.e.,

|a_{ii}| > \sum_{j=1, j≠i}^{n} |a_{ij}|, \qquad i = 1, 2, ..., n.

Proof. For the Jacobi method the iteration matrix B_J = L + U has elements b_{ij} = -a_{ij}/a_{ii}, i ≠ j, b_{ij} = 0, i = j. From the assumption it then follows that

‖B_J‖_∞ = \max_{1≤i≤n} \sum_{j=1, j≠i}^{n} |a_{ij}|/|a_{ii}| < 1.

A similar result for strictly column-wise diagonally dominant matrices can be proved using ‖B_J‖_1. A slightly stronger convergence result than in Theorem 10.2.6 is of importance in applications. (Note that, e.g., the matrix A in (10.1.5) is not strictly diagonally dominant!) For irreducible matrices (see Def. 10.1.1) the row sum criterion in Theorem 10.2.6 can be sharpened.

Theorem 10.2.7. The Jacobi method is convergent if A is irreducible,

|a_{ii}| ≥ \sum_{j=1, j≠i}^{n} |a_{ij}|, \qquad i = 1, 2, ..., n,

and inequality holds for at least one row.

The column sum criterion can be similarly improved. The conditions in Theorems 10.2.6–10.2.7 are also sufficient for convergence of the Gauss–Seidel method, for which (I - L)B_{GS} = U. Consider the strictly row-wise diagonally dominant case and choose k so that

‖B_{GS}‖_∞ = ‖B_{GS}^T‖_1 = ‖B_{GS}^T e_k‖_1.

Then from B_{GS}^T e_k = B_{GS}^T L^T e_k + U^T e_k, we get

‖B_{GS}‖_∞ ≤ ‖B_{GS}‖_∞ ‖L^T e_k‖_1 + ‖U^T e_k‖_1.

Since A is strictly row-wise diagonally dominant we have ‖L^T e_k‖_1 + ‖U^T e_k‖_1 ≤ ‖B_J‖_∞ < 1, and it follows that

‖B_{GS}‖_∞ ≤ ‖U^T e_k‖_1 / (1 - ‖L^T e_k‖_1) < 1.

Hence the Gauss–Seidel method is convergent. The proof for the strictly column-wise diagonally dominant case is similar but estimates ‖B_{GS}‖_1.

Example 10.2.1. In Section 10.1.3 it was shown that the (n-1)² eigenvalues of the matrix A = (I ⊗ T) + (T ⊗ I) arising from the model problem are (λ_i + λ_j), i, j = 1, ..., n-1, where λ_i = 2(1 + \cos(iπ/n)). It follows that the eigenvalues of the corresponding Jacobi iteration matrix B_J = L + U = (1/4)(4I - A) are

μ_{ij} = \frac{1}{2}(\cos iπh + \cos jπh), \qquad i, j = 1, 2, ..., n-1,

where h = 1/n is the grid size. The spectral radius is obtained for i = j = 1,

ρ(B_J) = \cos(πh) ≈ 1 - \frac{1}{2}(πh)².

This means that the low frequency modes of the error are damped most slowly, whereas the high frequency modes are damped much more quickly.³ The same is true for the Gauss–Seidel method, for which

ρ(B_{GS}) = \cos²(πh) ≈ 1 - (πh)².

The corresponding asymptotic rates of convergence are R_∞(B_J) ≈ π²h²/2 and R_∞(B_{GS}) ≈ π²h². This explains the observation made in Example 10.1.1 that Gauss–Seidel's method converged twice as fast as Jacobi's method. However, for both methods the number of iterations is proportional to κ(A) for the model problem.

³ This is one of the basic observations used in the multigrid method, which uses a sequence of different meshes to efficiently damp all frequencies.

The rate of convergence of the Jacobi and Gauss–Seidel methods, as exhibited in the above example, is in general much too slow to make these methods of any practical use. In Section 10.3.1 we show how, with a simple modification of the Gauss–Seidel method, the rate of convergence can be improved by a factor of n for the model problem.

10.2.3 The Effect of Nonnormality and Finite Precision

While the spectral radius determines the asymptotic rate of growth of matrix powers, the norm will influence the initial behavior of the powers B^k. However, the norm of
a convergent matrix can for a nonnormal matrix be arbitrarily large. By the Schur normal form any matrix A is unitarily equivalent to an upper triangular matrix. Therefore, in exact arithmetic, it suffices to consider the case of an upper triangular matrix.

Consider the 2 × 2 convergent matrix

B = \begin{pmatrix} λ & α \\ 0 & μ \end{pmatrix}, \qquad 0 < μ ≤ λ < 1, \qquad α ≫ 1,   (10.2.10)

for which we have ‖B‖_2 ≫ ρ(B). Therefore, even though ‖B^k‖ → 0 as k → ∞, the spectral norms ‖B^k‖_2 will initially increase sharply! It is easily verified that

B^k = \begin{pmatrix} λ^k & β_k \\ 0 & μ^k \end{pmatrix}, \qquad β_k = \begin{cases} α \dfrac{λ^k - μ^k}{λ - μ} & \text{if } λ ≠ μ, \\ α k λ^{k-1} & \text{if } λ = μ. \end{cases}   (10.2.11)

Clearly the element β_k will grow initially. In the case that λ = μ the maximum of |β_k| will occur when k ≈ λ/(1 - λ). (See also Computer Exercise 1.)

For matrices of larger dimension the initial increase of ‖B^k‖ can be huge, as shown by the following example:


Example 10.2.2. Consider the iteration x^{(k+1)} = Bx^{(k)}, where B ∈ R^{20×20} is the bidiagonal matrix

B = \begin{pmatrix} 0.5 & 1 & & & \\ & 0.5 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0.5 & 1 \\ & & & & 0.5 \end{pmatrix}, \qquad x^{(0)} = (1, 1, ..., 1)^T.
Here ρ(B) = 0.5, and hence the iteration should converge to the exact solution of the equation (I - B)x = 0, which is x = 0. From Fig. 10.2.1 it is seen that ‖x^{(n)}‖_2 increases by almost a factor 10^5 until it starts to decrease after about 25 iterations! Although in the long run the norm is reduced by about a factor of 0.5 at each iteration, large intermediate values of x^{(n)} give rise to persistent rounding errors.

The curve in Figure 10.2.1 shows a large hump. This is a typical phenomenon in several other matrix problems and occurs also, e.g., when computing the matrix exponential e^{Bt}, when t → ∞.
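The hump is easy to reproduce. The following MATLAB/Octave sketch iterates with the bidiagonal matrix of Example 10.2.2, records the norm growth, and plots it (the number of iterations, 100, is our own choice):

  % The "hump": ||x^(k)||_2 first grows, then decays, although rho(B) = 0.5 < 1
  n = 20;
  B = diag(0.5*ones(n,1)) + diag(ones(n-1,1),1);   % bidiagonal, diag 0.5, superdiag 1
  x = ones(n,1);
  nrm = zeros(100,1);
  for k = 1:100
    x = B*x;
    nrm(k) = norm(x);
  end
  [mx, kmax] = max(nrm);
  fprintf('max ||x^(k)||_2 = %.3e reached at k = %d\n', mx, kmax);
  semilogy(1:100, nrm)          % reproduces the shape of Figure 10.2.1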
Figure 10.2.1. ‖x^{(n)}‖_2, where x^{(k+1)} = Bx^{(k)} and x^{(0)} = (1, 1, ..., 1)^T.

For the case when the iteration process is carried out in exact arithmetic we found a complete and simple mathematical theory of convergence for the iterates x^{(k)} of stationary iterative methods. According to Theorem 10.2.1 there is convergence for any x^{(0)} if and only if ρ(B) < 1, where ρ(B) is the spectral radius of B. The same condition is necessary and sufficient for lim_{k→∞} B^k = 0 to hold. In finite precision arithmetic the convergence behavior turns out to be more complex and less easy to analyze.

It may be thought that iterative methods are less affected by rounding errors than direct solution methods, because in iterative methods one continues to work with the original matrix instead of modifying it. In Section 6.6.6 we showed that the total effect of rounding errors in Gaussian elimination with partial pivoting usually is equivalent to perturbations in the elements of the original matrix of the order of machine roundoff. It is easy to verify that, in general, iterative methods cannot be expected to do much better than that!
Consider an iteration step with the Gauss–Seidel method performed in floating point arithmetic. Typically, in the first step an improved x_1 will be computed from the previous x_2, ..., x_n by

x_1 = fl\Big( \Big( b_1 - \sum_{j=2}^{n} a_{1j} x_j \Big)/a_{11} \Big) = \Big( b_1(1 + δ_1) - \sum_{j=2}^{n} a_{1j} x_j (1 + δ_j) \Big)/a_{11},

with the usual bounds for the δ_j, cf. Section 2.4.1. This can be interpreted as having performed an exact Gauss–Seidel step for a perturbed problem with elements b_1(1 + δ_1) and a_{1j}(1 + δ_j), j = 2, ..., n. The bounds for these perturbations are of the same order of magnitude as those for the perturbations in Gaussian elimination. The idea that we have worked with the original matrix is not correct.

Example 10.2.3. (J. H. Wilkinson) Consider the (ill-conditioned) system Ax = b, where

A = \begin{pmatrix} 0.96326 & 0.81321 \\ 0.81321 & 0.68654 \end{pmatrix}, \qquad b = \begin{pmatrix} 0.88824 \\ 0.74988 \end{pmatrix}.

The smallest singular value of A is 0.36 · 10^{-5}. This system is symmetric, positive definite and therefore the Gauss–Seidel method should converge, though slowly. Starting with x_1 = 0.33116, x_2 = 0.70000, the next approximation for x_1 is computed from the relation

x_1 = fl\big( (0.88824 - 0.81321 · 0.7)/0.96326 \big) = 0.33116

(working with five decimals). This would be an exact result if the element a_{11} were perturbed to be 0.963259..., but no progress is made towards the true solution x_1 = 0.39473..., x_2 = 0.62470.... The ill-conditioning has affected the computations adversely. Convergence is so slow that the modifications to be made in each step are less than 0.5 · 10^{-5}.
Stationary iterative methods may be badly affected by rounding errors for nonnormal matrices. We have seen that the "hump" phenomenon can make it possible for ‖x^{(k)}‖_2 to increase substantially, even when the iteration matrix B is convergent; see Example 10.2.2. In such a case cancellation will occur in the computation of the final solution, and a rounding error of size u max_k ‖x^{(k)}‖_2 remains, where u is the machine unit.

Moreover, for a nonnormal matrix B asymptotic convergence in finite precision arithmetic is no longer guaranteed even if the condition ρ(B) < 1 is true in exact arithmetic. This phenomenon is related to the fact that for a matrix of a high degree of nonnormality the spectrum can be extremely sensitive to perturbations. As shown above the computed iterate x^{(k)} will at best be the exact iterate corresponding to a perturbed matrix B + ΔB. Hence even though ρ(B) < 1 it may be that ρ(B + ΔB) is larger than one. To have convergence in finite precision arithmetic we need a stronger condition to hold, e.g.,

\max_{‖E‖_2 ≤ u‖B‖_2} ρ(B + E) < 1,

where u is the machine precision. (Compare the discussion of pseudospectra in Section 9.3.3.) The following rule of thumb has been suggested:

The iterative method with iteration matrix B can be expected to converge in finite precision arithmetic if the spectral radius computed via a backward stable eigensolver is less than 1.

This is an instance when an inexact result is more useful than the exact result!

10.2.4 Termination Criteria

An iterative method for solving a linear system Ax = b is not completely specified unless clearly defined criteria are given for when to stop the iterations. Ideally such criteria should identify when the error x - x^{(k)} is small enough and also detect if the error is no longer decreasing or is decreasing too slowly.

Normally a user would like to specify an absolute (or a relative) tolerance ε for the error, and stop as soon as

‖x - x^{(k)}‖ ≤ ε   (10.2.12)

is satisfied for some suitable vector norm ‖·‖. However, such a criterion cannot in general be implemented since x is unknown. Moreover, if the system to be solved is ill-conditioned, then because of roundoff the criterion (10.2.12) may never be satisfied.

Instead of (10.2.12) one can use a test on the residual vector r^{(k)} = b - Ax^{(k)}, which is computable, and stop when

‖r^{(k)}‖ ≤ ε (‖A‖ ‖x^{(k)}‖ + ‖b‖).

This is often replaced by the stricter criterion

‖r^{(k)}‖ ≤ ε ‖b‖,   (10.2.13)

but this may be difficult to satisfy in case ‖b‖ ≪ ‖A‖ ‖x‖. Although such residual based criteria are frequently used, it should be remembered that if A is ill-conditioned a small residual does not guarantee a small relative error in the approximate solution. Since x - x^{(k)} = A^{-1} r^{(k)}, (10.2.13) only guarantees that ‖x - x^{(k)}‖ ≤ ε ‖A^{-1}‖ ‖b‖, and this bound is attainable.

Another possibility is to base the stopping criterion on the Oettli–Prager backward error; see Theorem 6.6.4. The idea is then to compute the quantity

ω = \max_i \frac{|r_i^{(k)}|}{(E|x^{(k)}| + f)_i},   (10.2.14)

where E > 0 and f > 0, and stop when ω ≤ ε. It then follows from Theorem 6.6.4 that x^{(k)} is the exact solution to a perturbed linear system

(A + δA)x = b + δb, \qquad |δA| ≤ ωE, \qquad |δb| ≤ ωf.

We could in (10.2.14) take E = |A| and f = |b|, which corresponds to componentwise backward errors. However, it can be argued that for iterative methods a more suitable choice is to use a normwise backward error by setting

E = ‖A‖_∞ ee^T, \qquad f = ‖b‖_∞ e, \qquad e = (1, 1, ..., 1)^T.

This choice gives

ω = \frac{‖r^{(k)}‖_∞}{‖A‖_∞ ‖x^{(k)}‖_∞ + ‖b‖_∞}.
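The normwise backward-error test is cheap to evaluate during the iteration. A minimal MATLAB/Octave sketch of the stopping test (the function name and the tolerance argument are our own):

  % Normwise backward-error stopping test: omega <= eps_tol (cf. (10.2.14))
  function stop = converged(A, b, x, eps_tol)
    r     = b - A*x;
    omega = norm(r, inf) / (norm(A, inf)*norm(x, inf) + norm(b, inf));
    stop  = (omega <= eps_tol);
  end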

Review Questions

1. The standard discretization of the Laplace equation on a square with Dirichlet boundary conditions leads to a certain matrix A. Give this matrix in its block tridiagonal form.

2. What iterative method can be derived from the splitting A = M - N? How is a symmetrizable splitting defined?

3. Define the average and asymptotic rate of convergence for an iterative method x^{(k+1)} = Bx^{(k)} + c. Does the condition ρ(B) < 1 imply that the error norm ‖x - x^{(k)}‖_2 is monotonically decreasing? If not, give a counterexample.

4. Give at least two different criteria which are suitable for terminating an iterative method.

Problems

1. Let A ∈ R^{n×n} be a given nonsingular matrix, and X^{(0)} ∈ R^{n×n} an arbitrary matrix. Define a sequence of matrices by

X^{(k+1)} = X^{(k)} + X^{(k)}(I - AX^{(k)}), \qquad k = 0, 1, 2, ....

(a) Prove that lim_{k→∞} X^{(k)} = A^{-1} if and only if ρ(I - AX^{(0)}) < 1.
Hint: First show that I - AX^{(k+1)} = (I - AX^{(k)})².
(b) Use the iterations to compute the inverse A^{-1}, where

A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}, \qquad X^{(0)} = \begin{pmatrix} 1.9 & -0.9 \\ -0.9 & 0.9 \end{pmatrix}.

Verify that the rate of convergence is quadratic!

2. Consider the stationary iterative method

x^{(k+1)} = x^{(k)} + ωA^T(b - Ax^{(k)}),

where A ∈ R^{m×n} is a possibly rank-deficient matrix.
(a) Show that if rank(A) = n and 0 < ω < 2/σ_max²(A), then the iteration converges to the unique solution of the normal equations A^TAx = A^Tb.
(b) If rank(A) < n, then split the vector x^{(k)} into orthogonal components,

x^{(k)} = x_1^{(k)} + x_2^{(k)}, \qquad x_1^{(k)} ∈ R(A^T), \qquad x_2^{(k)} ∈ N(A).

Show that the orthogonal projection of x^{(k)} - x^{(0)} onto N(A) is zero. Conclude that in this case the iteration converges to the unique solution of the normal equations which minimizes ‖x - x^{(0)}‖_2.

3. Show that if for a stationary iterative method x^{(k+1)} = Bx^{(k)} + c it holds that ‖B‖ ≤ β < 1, and

‖x^{(k)} - x^{(k-1)}‖ ≤ (1 - β)ε/β,

then the error estimate ‖x - x^{(k)}‖ ≤ ε holds.

Computer Exercises

1. Let B be the 2 × 2 matrix in (10.2.10), and take λ = μ = 0.99, α = 4. Verify that ‖B^k‖_2 ≥ 1 for all k < 805!

2. Let B ∈ R^{20×20} be an upper bidiagonal matrix with diagonal elements equal to 0.025, 0.05, 0.075, ..., 0.5 and elements in the superdiagonal all equal to 5.
(a) Compute and plot α_k = ‖x^{(k)}‖_2/‖x^{(0)}‖_2, k = 0 : 100, where

x^{(k+1)} = Bx^{(k)}, \qquad x^{(0)} = (1, 1, ..., 1)^T.

Show that α_k > 10^{14} before it starts to decrease after 25 iterations. What is the smallest k for which ‖x^{(k)}‖_2 < ‖x^{(0)}‖_2?
(b) Compute the eigendecomposition B = XΛX^{-1} and determine the condition number κ = ‖X‖_2 ‖X^{-1}‖_2 of the transformation.

10.3 Successive Overrelaxation Methods

10.3.1 The SOR Method

It was shown by Young [20] that for a certain class of matrices a great improvement in the rate of convergence can be obtained by the simple means of introducing a relaxation parameter ω in the Gauss–Seidel method. This leads to the famous Successive Over Relaxation (SOR) method, which is derived from the Gauss–Seidel method by multiplying each increment x_i^{(k+1)} - x_i^{(k)} by ω,

x_i^{(k+1)} = x_i^{(k)} + ω \frac{1}{a_{ii}} \Big( b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{(k+1)} - \sum_{j=i}^{n} a_{ij} x_j^{(k)} \Big), \qquad i = 1, 2, ..., n,   (10.3.1)

where the expression in brackets is the current residual of the ith equation. For ω = 1, the SOR method reduces to the Gauss–Seidel method. The idea is now to choose ω so that the asymptotic rate of convergence is maximized. The SOR method can be written in matrix form as

x^{(k+1)} = x^{(k)} + ω \big( c + Lx^{(k+1)} - (I - U)x^{(k)} \big),

where c = D_A^{-1} b, or after rearranging

(I - ωL) x^{(k+1)} = [(1 - ω)I + ωU] x^{(k)} + ωc.

The iteration matrix for SOR therefore is

B_ω = (I - ωL)^{-1} [(1 - ω)I + ωU].   (10.3.2)
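In practice one never forms B_ω; a sweep is carried out componentwise as in (10.3.1). A minimal dense-storage MATLAB/Octave sketch of an SOR solver (the stopping test, loop bounds, and function name are our own choices):

  % SOR iteration (10.3.1): x_i := x_i + w*(b_i - sum_j a_ij*x_j)/a_ii, updated in place
  function [x, iter] = sor(A, b, w, x, tol, maxit)
    n = length(b);
    for iter = 1:maxit
      for i = 1:n
        r    = b(i) - A(i,:)*x;          % current residual of equation i
        x(i) = x(i) + w*r/A(i,i);
      end
      if norm(b - A*x) <= tol*norm(b), return; end
    end
  end

For ω = 1 this reduces to the Gauss–Seidel sweep of Section 10.1.2.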

We now consider the convergence of the SOR method and first show that only values of ω with 0 < ω < 2 are of interest.

Theorem 10.3.1.
Let B = L + U be any matrix with zero diagonal and let B_ω be the corresponding iteration matrix in the SOR method. Then we have

ρ(B_ω) ≥ |ω - 1|,   (10.3.3)

with equality only if all the eigenvalues of B_ω are of modulus |ω - 1|. Hence the SOR method can only converge for 0 < ω < 2.

Proof. Since the determinant of a triangular matrix equals the product of its diagonal elements we have

det(B_ω) = det(I - ωL)^{-1} det[(1 - ω)I + ωU] = (1 - ω)^n.

Also det(B_ω) = λ_1 λ_2 ⋯ λ_n, where λ_i are the eigenvalues of B_ω. It follows that

ρ(B_ω) = \max_{1≤i≤n} |λ_i| ≥ |1 - ω|,

with equality only if all the eigenvalues have modulus |ω - 1|.

The following theorem asserts that if A is a positive definite matrix, then the SOR method converges if 0 < ω < 2.

Theorem 10.3.2. For a symmetric positive definite matrix A we have

ρ(B_ω) < 1, \qquad ∀ω, \quad 0 < ω < 2.

Proof. We defer the proof to Theorem 10.5.4.

For an important class of matrices an explicit expression for the optimal value of ω can be given. We first introduce the class of matrices with property A.

Definition 10.3.3. The matrix A is said to have property A if there exists a permutation matrix P such that PAP^T has the form

\begin{pmatrix} D_1 & U_1 \\ L_1 & D_2 \end{pmatrix},   (10.3.4)

where D_1, D_2 are diagonal matrices.

Equivalently, the matrix A ∈ R^{n×n} has property A if the set {1, 2, ..., n} can be divided into two non-void complementary subsets S and T such that a_{ij} = 0 unless i = j, or i ∈ S, j ∈ T, or i ∈ T, j ∈ S.

Example 10.3.1. The tridiagonal matrix A below,

A = \begin{pmatrix} 2 & -1 & 0 & 0 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ 0 & 0 & -1 & 2 \end{pmatrix}, \qquad P^TAP = \begin{pmatrix} 2 & 0 & -1 & 0 \\ 0 & 2 & -1 & -1 \\ -1 & -1 & 2 & 0 \\ 0 & -1 & 0 & 2 \end{pmatrix},

has property A, and we can choose S = {1, 3}, T = {2, 4}. Permutation of columns 1 and 4 followed by a similar row permutation will give a matrix of the form above.
Definition 10.3.4. A matrix A with the decomposition A = D_A(I - L - U), D_A nonsingular, is said to be consistently ordered if the eigenvalues of

J(α) = αL + α^{-1}U, \qquad α ≠ 0,

are independent of α.

A matrix of the form (10.3.4) is consistently ordered, since

J(α) = -\begin{pmatrix} 0 & α^{-1} D_1^{-1} U_1 \\ α D_2^{-1} L_1 & 0 \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & αI \end{pmatrix} J(1) \begin{pmatrix} I & 0 \\ 0 & αI \end{pmatrix}^{-1};

the matrices J(α) and J(1) are similar and hence have the same eigenvalues. Similarly one can prove more generally that any block-tridiagonal matrix
A = \begin{pmatrix} D_1 & U_1 & & & \\ L_2 & D_2 & U_2 & & \\ & L_3 & \ddots & \ddots & \\ & & \ddots & \ddots & U_{n-1} \\ & & & L_n & D_n \end{pmatrix},

where the D_i are nonsingular diagonal matrices, is consistently ordered.

Theorem 10.3.5.
Let A = D_A(I - L - U) be a consistently ordered matrix. Then if μ is an eigenvalue of the Jacobi matrix so is -μ. Further, to any eigenvalue λ ≠ 0 of the SOR matrix B_ω, ω ≠ 0, there corresponds an eigenvalue μ of the Jacobi matrix, where

μ = \frac{λ + ω - 1}{ω λ^{1/2}}.   (10.3.5)

Proof. Since A is consistently ordered the matrix J(-1) = -L - U = -J(1) has the same eigenvalues as J(1). Hence if μ is an eigenvalue so is -μ. If λ is an eigenvalue of B_ω, then det(λI - B_ω) = 0, or since det(I - ωL) = 1 for all ω, using (10.3.2)

det[(I - ωL)(λI - B_ω)] = det[λ(I - ωL) - (1 - ω)I - ωU] = 0.

If ω ≠ 0 and λ ≠ 0 we can rewrite this in the form

det\Big( \frac{λ + ω - 1}{ω λ^{1/2}} I - (λ^{1/2} L + λ^{-1/2} U) \Big) = 0,

and since A is consistently ordered it follows that det(μI - (L + U)) = 0, where μ is given by (10.3.5). Hence μ is an eigenvalue of L + U.
(L + U ) = 0, where 

If we put ! = 1 in (10.3.1) we get  = 2 . Sin e ! = 1 orresponds to


the Gauss{Seidel method it follows that (BGS ) = (BJ )2 , whi h means that for
onsistently ordered matri es A the Gauss{Seidel method onverges twi e as fast as
the Ja obi method.
We now state an important result due to Young [20.

Theorem 10.3.6.
Let A be a consistently ordered matrix, and assume that the eigenvalues μ of B_J = L + U are real and ρ_J = ρ(B_J) < 1. Then the optimal relaxation parameter ω in SOR is given by

ω_opt = \frac{2}{1 + \sqrt{1 - ρ_J²}}.   (10.3.6)

For this optimal value we have

ρ(B_{ω_opt}) = ω_opt - 1.   (10.3.7)

Proof. (See also Young [21, Section 6.2].) We consider, for a given value of μ in the range 0 < μ ≤ ρ(L + U) < 1, the two functions of λ,

f_ω(λ) = \frac{λ + ω - 1}{ω}, \qquad g(λ, μ) = μ λ^{1/2}.

Figure 10.3.1. f_ω(λ) and g(λ, μ) as functions of λ (μ = 0.99, ω = ω_b = 1.7527).


Here f! () is a straight line passing through the points (1; 1) and (1 !; 0), and
g(; ) a parabola. The relation (10.3.5) an now be interpreted as the interse tion
of these two urves. For given  and ! we get for  the quadrati equation


2 + 2 (! 1)

1 2 2
 !  + (!
2

1)2 = 0:

(10.3.8)

whi h has two roots


1
1=2
1
1;2 = 2 !2 (! 1)  ! 2 !2 (! 1) :
2
4
The larger of these roots de reases with in reasing ! until eventually f! () be omes
a tangent to g(; ), when 2 !2 =4 (! 1) = 0 (see Fig. 10.2.1) Solving for the
root !  2 gives
2
1 (1 2 )1=2
p
:
=
!~ =
2
1=2
1 + 1 2
If ! > !~ , we get two omplex roots , whi h by the relation between roots and
oe ients in (10.3.8) satisfy

1 2 = (! 1)2 :
From this it follows that j1 j = j2 j = ! 1, 1 < !~ < ! < 2, and hen e the
minimum value of of maxi=1;2 ji j o urs for !~ . Sin e the parabola g(; (L + U )) is
the envelope of all the urves g(; ) for 0 <   (L + U ) < 1 the theorem follows.

Example 10.3.2. By (10.3.6), for SOR applied to the model problem ω_opt = 2/(1 + sin πh), giving

ρ(B_{ω_opt}) = ω_opt - 1 = \frac{1 - \sin πh}{1 + \sin πh} ≈ 1 - 2πh.   (10.3.9)

Note that ω_opt → 2 as n → ∞. The corresponding asymptotic rate of convergence is

R_∞(B_{ω_opt}) ≈ 2πh,

which shows that for the model problem the number of iterations is proportional to n for the SOR method.

In Table 10.3.1 we give the number of iterations required to reduce the norm of the initial error by a factor of 10^{-3}.

Table 10.3.1. Number of iterations needed to reduce the initial error by a factor of 10^{-3} for the model problem, as a function of n = 1/h.

 n              10     20      50     100      200
 Gauss–Seidel   69    279   1,749   6,998   27,995
 SOR            17     35      92     195      413
413
In pra ti e, the number J is seldom known a priori, and its a urate determination would be prohibitively expensive. However, for some model problems the
spe trum of the Ja obi iteration matrix is known. In the following we need the
result:
Figure 10.3.2. The spectral radius ρ(B_ω) as a function of ω (μ = 0.99, ω_b = 1.7527).
A simple scheme for estimating ω_opt is to initially perform a fixed number of iterations using ω = 1, i.e., with the Gauss–Seidel method, and attempt to measure the rate of convergence. The successive corrections satisfy

δ^{(n+1)} = B_{GS} δ^{(n)}, \qquad δ^{(n)} = x^{(n+1)} - x^{(n)}.

Hence after a sufficient number of iterations we have

ρ(B_J)² = ρ(B_{GS}) ≈ θ_n, \qquad θ_n = ‖δ^{(n+1)}‖_∞ / ‖δ^{(n)}‖_∞.

An estimate of ω_opt is then obtained by substituting this value into (10.3.6). A closer analysis shows, however, that the number of iterations needed to obtain a good estimate of ω_opt is comparable to the number of iterations needed to solve the original problem by SOR. The scheme can still be practical if one wishes to solve a number of systems involving the same matrix A. Several variations of this scheme have been developed; see Young [21, p. 210].

In more complicated cases when ρ_J is not known, we have to estimate ω_opt in the SOR method. From Fig. 10.3.2 we conclude that it is much better to overestimate ω_opt than to underestimate it.

10.3.2 The SSOR Method

As remarked above, the iteration matrix B_ω of the SOR method is not symmetric and its eigenvalues are not real. In fact, in case ω is chosen slightly larger than optimal (as recommended when ρ_J is not known) the extreme eigenvalues of B_ω lie on a circle in the complex plane. However, a symmetric version of SOR, the (SSOR) method of Sheldon (1955), can be constructed as follows. One iteration consists of two half iterations. The first half is the same as the SOR iteration. The second half iteration is the SOR method with the equations taken in reverse order. The SSOR method can be written in matrix form as

x^{(k+1/2)} = x^{(k)} + ω \big( c + Lx^{(k+1/2)} - (I - U)x^{(k)} \big),
x^{(k+1)} = x^{(k+1/2)} + ω \big( c + Ux^{(k+1)} - (I - L)x^{(k+1/2)} \big).

This method is due to Sheldon [1955]. The iteration matrix for SSOR is

S_ω = (I - ωU)^{-1}[(1 - ω)I + ωL](I - ωL)^{-1}[(1 - ω)I + ωU].

It can be shown that SSOR corresponds to a splitting with the matrix

M_ω = \frac{ω}{2 - ω} \Big( \frac{1}{ω} D_A - L_A \Big) D_A^{-1} \Big( \frac{1}{ω} D_A - U_A \Big).   (10.3.10)

If A is symmetric, positive definite then so is M_ω. In this case the SSOR method is convergent for all ω ∈ (0, 2). A proof of this is obtained by a simple modification of the proof of Theorem 10.5.4.

In contrast to the SOR method, the rate of convergence of SSOR is not very sensitive to the choice of ω, nor does it assume that A is consistently ordered. It can be shown that provided ρ(LU) < 1/4 a suitable value for ω is ω_b, where

ω_b = \frac{2}{1 + \sqrt{2(1 - ρ_J)}}, \qquad ρ(S_{ω_b}) ≤ \frac{1 - \sqrt{(1 - ρ_J)/2}}{1 + \sqrt{(1 - ρ_J)/2}}.

In particular, for the model problem in Section 10.1.3 it follows that

ρ(S_{ω_b}) ≤ \frac{1 - \sin(πh/2)}{1 + \sin(πh/2)} ≈ 1 - πh.

This is half the rate of convergence for SOR with ω_opt.
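One SSOR step is simply a forward SOR sweep followed by a backward sweep. A minimal dense-storage MATLAB/Octave sketch (the function name is our own):

  % One SSOR iteration: forward SOR sweep followed by a backward sweep
  function x = ssor_step(A, b, w, x)
    n = length(b);
    for i = 1:n                       % first half: ordinary SOR sweep
      x(i) = x(i) + w*(b(i) - A(i,:)*x)/A(i,i);
    end
    for i = n:-1:1                    % second half: equations in reverse order
      x(i) = x(i) + w*(b(i) - A(i,:)*x)/A(i,i);
    end
  end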


10.3.3 Block Iterative Methods

The basic iterative methods described so far can be generalized for block matrices A. Assume that

A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{pmatrix},

where the diagonal blocks are square and nonsingular. For this block matrix we consider the splitting

A = D_A - L_A - U_A, \qquad D_A = diag(A_{11}, A_{22}, ..., A_{nn}),

and where L_A and U_A are strictly lower and upper triangular. As before Jacobi's method can be written D_A x^{(k+1)} = (L_A + U_A)x^{(k)} + b, or with x partitioned conformally

A_{ii} \big( x_i^{(k+1)} - x_i^{(k)} \big) = b_i - \sum_{j=1}^{n} A_{ij} x_j^{(k)}, \qquad i = 1, ..., n.

Hence it is important that linear systems in the diagonal blocks A_{ii} can be solved efficiently.

Example 10.3.3. For the model problem in Section 10.1.3 the matrix A can naturally be written in the block form where the diagonal blocks A_{ii} = 2I + T are tridiagonal and nonsingular; see (10.1.5). The resulting systems can be solved with little overhead. Note that here the partitioning is such that x_i corresponds to the unknowns at the mesh points on the ith line. Hence block methods are in this context also known as "line" methods and the other methods as "point" methods.

Block versions of the Gauss–Seidel, SOR, and SSOR methods are developed similarly. For SOR we have

A_{ii} \big( x_i^{(k+1)} - x_i^{(k)} \big) = ω \Big( b_i - \sum_{j=1}^{i-1} A_{ij} x_j^{(k+1)} - \sum_{j=i}^{n} A_{ij} x_j^{(k)} \Big), \qquad i = 1, ..., n.

(Taking ω = 1 gives the Gauss–Seidel method.) Typically the rate of convergence is improved by a factor √2 compared to the point methods.
It can easily be verified that the SOR theory as developed in Theorems 10.3.2 and 10.3.5 is still valid in the block case. We have

B_ω = (I - ωL)^{-1}[(1 - ω)I + ωU],

where L = D_A^{-1}L_A and U = D_A^{-1}U_A. Let A be a consistently ordered matrix with nonsingular diagonal blocks A_{ii}, 1 ≤ i ≤ n. Assume that the block Jacobi matrix B_J has spectral radius ρ(B_J) < 1. Then the optimal value of ω in the SOR method is given by (10.3.6). Note that with the block splitting any block-tridiagonal matrix

A = \begin{pmatrix} D_1 & U_1 & & & \\ L_2 & D_2 & U_2 & & \\ & L_3 & \ddots & \ddots & \\ & & \ddots & \ddots & U_{n-1} \\ & & & L_n & D_n \end{pmatrix}

is consistently ordered; for the point methods this was true only in case the block diagonal matrices D_i, i = 1, ..., n, were diagonal. In particular we conclude that with the block splitting the matrix A in (10.1.5) is consistently ordered.

Review Questions

1. When is the matrix A reducible? Illustrate this property using the directed graph of A.

2. Let A = D_A(I - L - U), where D_A > 0. When is A said to have "property A"? When is A consistently ordered? How are these properties related to the SOR method?

3. For the model problem the asymptotic rate of convergence for the classical iterative methods is proportional to h^p, where h is the mesh size. Give the value of p for Jacobi, Gauss–Seidel, SOR and SSOR. (For the last two methods it is assumed that the optimal ω is used.)

Problems

1. (a) Show that if A is reducible so is A^T. Which of the following matrices are irreducible?

\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.

(b) Is it true that a matrix A, in which the elements take the values 0 and 1 only, is irreducible if and only if the non-decreasing matrix sequence (I + A)^k, k = 1, 2, 3, ..., becomes a full matrix for some value of k?

2. The matrix A in (10.1.5) is block-tridiagonal, but its diagonal blocks are not diagonal matrices. Show that in spite of this the matrix is consistently ordered.
Hint: Perform a similarity transformation with the diagonal matrix

D(α) = diag(D_1(α), D_2(α), ..., D_n(α)),

where D_1(α) = diag(1, α, ..., α^{n-1}), D_{i+1}(α) = αD_i(α), i = 1, 2, ..., n-1.

10.4 Convergence Acceleration

10.4.1 Nonstationary and Semi-iterative Methods


Consider the stationary iterative method

x(k+1) = x(k) + M 1 (b Ax(k) );


whi h orresponds to a matrix splitting A = M

B = M 1N = I

k = 0; 1; : : : ;

(10.4.1)

N , and iteration matrix

M 1A:

In this se tion we des ribe an important method for a elerating the onvergen e
of the method (10.4.1) provided it is symmetrizable.

De nition 10.4.1. The stationary iterative method (10:4:1) is said to be symmetrizable if there is a nonsingular matrix W su h that the matrix W (I B )W 1
is symmetri and positive de nite.
For a symmetrizable method the matrix I - B has real positive eigenvalues. A sufficient condition for a method to be symmetrizable is that both A and the splitting matrix M are symmetric, positive definite, since then there is a matrix W such that M = W^TW, and

W(I - B)W^{-1} = W M^{-1}A W^{-1} = W W^{-1} W^{-T} A W^{-1} = W^{-T} A W^{-1},

which again is positive definite.

Example 10.4.1. If A is positive definite then in the standard splitting (10.2.5) D_A > 0, and hence the Jacobi method is symmetrizable with W = D_A^{1/2}. From (10.3.10) it follows that also the SSOR method is symmetrizable.
It is often possible to find an associated method which will converge faster than the given method by taking a weighted arithmetic mean of the first k approximations generated by the method (10.4.1),

x̃^{(k)} = \sum_{i=0}^{k} α_{ki} x^{(i)}, \qquad \sum_{i=0}^{k} α_{ki} = 1, \qquad k = 0, 1, 2, ....   (10.4.2)

Such methods, introduced by Varga (1957), are non-stationary and called semi-iterative methods. It follows from the error equation x^{(k)} - x = B^k(x^{(0)} - x) that

x̃^{(k)} - x = \sum_{i=0}^{k} α_{ki}(x^{(i)} - x) = \sum_{i=0}^{k} α_{ki} B^i (x^{(0)} - x).

Introducing the generating polynomial we can write this

x̃^{(k)} - x = p_k(B)(x^{(0)} - x), \qquad p_k(λ) = \sum_{i=0}^{k} α_{ki} λ^i,   (10.4.3)

where p_k(λ) is a polynomial of degree k. Therefore this procedure is also known as polynomial acceleration. Note that from (10.4.2) it follows that p_k(1) = 1. In the special case that M = I we obtain for the residual r̃^{(k)} = b - Ax̃^{(k)}

r̃^{(k)} = A(x - x̃^{(k)}) = q_k(A) r^{(0)}, \qquad q_k(λ) = p_k(1 - λ),   (10.4.4)

where we have used that A = I - B. The polynomials q_k(λ) have the property that q_k(0) = 1 and are known as residual polynomials.

Example 10.4.2. Consider the non-stationary Richardson iteration,

x^{(i+1)} = x^{(i)} + ω_i(b - Ax^{(i)}), \qquad i = 0, 1, 2, ...

(cf. (10.1.4)). It is easily seen that the residual vector r^{(k)} = b - Ax^{(k)} satisfies

r^{(k)} = q_k(A) r^{(0)}, \qquad q_k(λ) = \prod_{i=0}^{k-1} (1 - ω_i λ).   (10.4.5)

Hence, by choosing a suitable sequence of parameters {ω_i}_{i=0}^{k-1} we can obtain any residual polynomial q_k(λ). If the spectrum of A is real, a < λ(A) < b, we would like |q_k(λ)| to be small in (a, b).

10.4.2 Chebyshev Acceleration

The most important case is Chebyshev acceleration, which we now develop. We assume that the eigenvalues {λ_i}_{i=1}^{n} of M^{-1}A are real and satisfy

0 < a ≤ λ_i < b.   (10.4.6)

From (10.4.3) we get the error estimate

‖x̃^{(k)} - x‖ ≤ ‖q_k(M^{-1}A)‖ ‖x^{(0)} - x‖.

If M^{-1}A is Hermitian, then ‖q_k(M^{-1}A)‖_2 = ρ(q_k(M^{-1}A)), and after k steps of the accelerated method the 2-norm of the error is reduced by at least a factor of

ρ(q_k(M^{-1}A)) = \max_i |q_k(λ_i)| ≤ \max_{λ∈[a,b]} |q_k(λ)|.

Therefore a suitable polynomial q_k is obtained by solving the minimization problem

\min_{q ∈ Π_k^1} \max_{λ∈[a,b]} |q(λ)|,

where Π_k^1 denotes the set of residual polynomials q_k of degree ≤ k such that q_k(0) = 1. By a similar argument as used in the proof of the minimax property of Chebyshev polynomials (see Section 9.3.4) the solution to the above minimization problem is given by the shifted and normalized Chebyshev polynomials

q_k(λ) = T_k(z(λ))/T_k(z(0)),   (10.4.7)


where T_k(z) is the Chebyshev polynomial of degree k and z(λ) the linear transformation which maps the interval λ ∈ [a, b] onto z ∈ [-1, 1]. Hence

z(λ) = \frac{b + a - 2λ}{b - a},   (10.4.8)

and

μ = z(0) = \frac{b + a}{b - a} = \frac{κ + 1}{κ - 1} > 1, \qquad κ = \frac{b}{a}.   (10.4.9)

Note that if M^{-1}A is symmetrizable, κ is the condition number of M^{-1}A.

Since |T_k(z)| ≤ 1, z ∈ [-1, 1], and μ > 1, we have

ρ(q_k(M^{-1}A)) ≤ 1/T_k(μ) < 1.


Hen e k iterations will redu e the error norm by at least a fa tor

Tk () = osh(k ) = 21 (ek + e

k )

where  = osh = (e + e )=2 or e =  + 2


after some simpli ation

= log

> 12 ek ;
1. We obtain using (10.4.9)

p + 1  2
p 1 > p :

p
(Verify the last inequality! See Problem 2.) Hen e Tk () > 12 e2k=  , and to redu e
the error norm at least by a fa tor of < 1 it su es to perform k iterations, where

2
1p
(10.4.10)
 log ;
2

Thus the number of iterations


p required for a ertain a ura y for the a elerated
method is proportional to  rather than . This is a great improvement! Note
that the matrix M an be interpreted as a pre onditioner. To in rease the rate
of onvergen e M should be hosen so that the onditioning of the matrix M 1 A
is improved.
Since the zeros of the Chebyshev polynomials T_k(z) are known it is possible to implement Chebyshev acceleration as follows (cf. Example 10.4.2). To perform N steps we compute

x^{(k+1)} = x^{(k)} + ω_k M^{-1}(b - Ax^{(k)}), \qquad k = 0, 1, ..., N - 1,   (10.4.11)

where

ω_k = \frac{2}{(b + a) - (b - a)\cos\big((k + \tfrac{1}{2})π/N\big)}, \qquad k = 0, 1, ..., N - 1.   (10.4.12)

After N steps the iterations can be repeated in a cyclic fashion. (Note that for N = 1 we retrieve the optimal ω for the stationary Richardson method derived in Section 10.2.) However, this scheme was shown by David Young to be very sensitive to rounding error effects unless N is small. The instability can be eliminated by using a certain reordering of the iteration parameters ω_k; see Computer Exercise 1. However, one disadvantage remains, namely, the number N has to be fixed in advance.

advan e.
The best way to ompute the ve tors x~(k) is instead based on the three term
re urren e relation for the Chebyshev polynomials. We have (see Se tion 9.3.4)
T0(z ) = 1,

T1 (z ) = zT0;

Tk+1 (z ) = 2zTk (z ) Tk 1 (z ); k  1:

(10.4.13)

By (10.4.7) Tk (z ()) = Tk ()qk (), and substituting M 1A for z in (10.4.13), we


obtain

Tk+1 ()qk+1 (M 1 A) = 2z (M 1A)Tk ()qk (M 1 A) Tk 1 ()qk 1 (M 1 A):


Multiplying by (~x(0)

x), using (10.4.3) and (10.4.8) we obtain




2
M 1 A Tk ()(~x(k) x) Tk 1 ()(~x(k 1) x):
Tk+1 ()(~x(k+1) x) = 2 I
b a
From (10.4.13) with z =  it then follows that for k  1
Tk+1 ()~x(k+1) = 2Tk ()~x(k)

4Tk () 1 (k)


M A(~x
b a

x) Tk 1 ()~x(k 1) :

Further, we have

M 1A(~x(k)

x) = M 1 (Ax~(k)

b) = M 1r(k) :

Substituting Tk 1 () = 2Tk ()+ Tk+1 () and dividing with Tk+1 () we obtain

x~(k+1) = x~(k 1) + k M 1 r(k) + !k (~x(k)

x~(k 1) );

k = 1; 2; : : : ;

where r(k) = b Ax~(k) , and with = 2=(b + a),

!k = 2

Tk ()
;
Tk+1 ()

k = !k ; k  1:

A similar al ulation for k = 0 gives x~(1) = x~(0) + M 1 r(0) . Dropping the tilde
this leads to the following algorithm:

Algorithm 10.4.1
The Chebyshev Semi-Iterative Method
Assume that the eigenvalues fi gni=1 of M 1 A are real and satisfy 0 < a  i < b.
Then

(0) M 1 r(0) ;
k = 0,
x(k+1) = xx(k +
(10.4.14)
1) + !k M 1 r(k) + x(k) x(k 1) ; k = 1; 2; : : : ;
where  = (b + a)=(b a), = 2=(b + a), and


!0 = 2; !k = 1

!k 1
42

; k = 1; 2; : : : :
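Algorithm 10.4.1 needs one multiplication by A and one application of M^{-1} per step. A direct MATLAB/Octave transcription, here with M = I (pure Chebyshev acceleration of Richardson's method); a_ and b_ denote the assumed eigenvalue bounds, and the function name is ours:

  % Chebyshev semi-iterative method (10.4.14) with M = I
  function x = chebsi(A, b, a_, b_, x0, nsteps)
    mu   = (b_ + a_)/(b_ - a_);
    beta = 2/(b_ + a_);
    xold = x0;
    x    = x0 + beta*(b - A*x0);              % step k = 0
    w    = 2;                                 % omega_0
    for k = 1:nsteps-1
      w    = 1/(1 - w/(4*mu^2));              % omega_k recursion
      r    = b - A*x;
      xnew = xold + w*(beta*r + x - xold);
      xold = x;  x = xnew;
    end
  end

For a general splitting, r would simply be replaced by M\r (or by a routine applying M^{-1}).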

A disadvantage of Chebyshev convergence acceleration is that knowledge of an interval [a, b] enclosing the (real) spectrum of M^{-1}A is needed. If this enclosure is too crude, then the process loses efficiency.
The eigenvalues of the iteration matrix B_{ω_opt} of the SOR method are all complex and have modulus |ω_opt - 1|. Therefore in this case convergence acceleration is of no use. (A precise formulation is given in Young [21, p. 375].) However, Chebyshev acceleration can be applied to the Jacobi and SSOR methods, with

M_J = D_A, \qquad M_ω = \frac{ω}{2 - ω} \Big( \frac{1}{ω} D_A - L_A \Big) D_A^{-1} \Big( \frac{1}{ω} D_A - U_A \Big),

respectively, as well as block versions of these methods, often with a substantial gain in convergence rate.

Review Questions

1. Consider an iterative method based on the splitting A = M - N. Give conditions on the eigenvalues of M^{-1}A which are sufficient for Chebyshev acceleration to be used. Express the asymptotic rate of convergence for the accelerated method in terms of the eigenvalue bounds.

Problems

1. Verify the recursion for ω_k in the Chebyshev semi-iterative method.

2. Show that

\log\frac{1 + s}{1 - s} = 2(s + s³/3 + s⁵/5 + ⋯), \qquad 0 ≤ s < 1,

and use this result to prove (10.4.10).

3. Assume that A is symmetric indefinite with its eigenvalues contained in the union of two intervals of equal length,

S = [a, b] ∪ [c, d], \qquad a < b < 0, \qquad 0 < c < d,

where d - c = b - a. Then the Chebyshev semi-iterative method cannot be applied directly to the system Ax = b. Consider instead the equivalent system

Bx = c̃, \qquad B = A(A - αI), \qquad c̃ = Ab - αb.

(a) Show that if α = d + a = b + c, then the eigenvalues of B are positive and real and contained in the interval [-bc, -ad].
(b) Show that the matrix B has condition number

κ(B) = \frac{d}{|b|} \cdot \frac{|a|}{c} = \frac{d}{|b|} \cdot \frac{d - c + |b|}{c}.

Use this to give estimates for the two special cases: (i) symmetric intervals with respect to the origin; (ii) the case when |b| ≪ c.


Computer Exercises

1. Let A be a matrix with real eigenvalues {λ_i}_{i=1}^{n}, 0 < a ≤ λ_i < b. Then the Chebyshev semi-iterative method for solving Ax = b can be implemented by the recursion (10.4.11)–(10.4.12). The instability of this scheme can be eliminated by using an ordering of the iteration parameters ω_k given by Lebedev and Finogenov. For N = 2^p this permutation ordering kappa is constructed by the following Matlab program:

  N = 2^p; int = 1;
  kappa = ones(1,N);
  for i = 1:p
    int = 2*int; ins = int + 1;
    for j = int/2:-1:1
      kappa(2*j)   = ins - kappa(j);
      kappa(2*j-1) = kappa(j);
    end
  end

Implement and test this method using the system Ax = b from the Laplace equation on the unit square, with A block tridiagonal,

A = tridiag(-I, T + 2I, -I) ∈ R^{n²×n²}, \qquad T = tridiag(-1, 2, -1) ∈ R^{n×n}.

Construct the right hand side so that the exact solution becomes x = (1, 1, ..., 1)^T. Let x^{(0)} = 0 as initial approximation and solve this problem using
• the implementation based on the three term recursion of the Chebyshev polynomials;
• the Richardson implementation with natural ordering of the parameters;
• the Richardson implementation with the Lebedev–Finogenov ordering of the parameters.
Take n = 50 and N = 128. Use the same number of iterations in all three implementations. List in each case the maximum norm of the error and the residual. Compare the results and draw conclusions!
10.5 Projection Methods

10.5.1 General Principles

Consider a linear system Ax = b, where A ∈ R^{n×n}. Suppose we want to find an approximate solution x̂ in a subspace K of dimension m. Then m independent conditions are needed to determine x̂. One way to obtain these is by requiring that the residual b - Ax̂ is orthogonal to a subspace L of dimension m, i.e.,

x̂ ∈ K, \qquad b - Ax̂ ⊥ L.   (10.5.1)

Many important classes of iterative methods can be interpreted as being projection methods in this general sense. The conditions (10.5.1) are often known as Petrov–Galerkin conditions.

We can obtain a matrix form of (10.5.1) by introducing basis vectors in the two subspaces. If we let

    K = R(U),   L = R(V),   (10.5.2)

where U = (u_1, \ldots, u_m), V = (v_1, \ldots, v_m), then we can write (10.5.1) as V^T(b - AUz) = 0 for some z \in R^m. Hence \hat{x} = Uz is obtained by solving the reduced system

    \hat{A}z = V^T b,   \hat{A} = V^T A U.   (10.5.3)

We usually have m \ll n, and when m is small this system can be solved by a direct method.

Example 10.5.1. Even though A is nonsingular the matrix \hat{A} may be singular. Take, e.g., m = 1, U = V = e_1, and

    A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.

Then \hat{A} = 0. Note that the matrix A here is symmetric, but not positive definite.

There are two important special cases in which the matrix \hat{A} can be guaranteed to be nonsingular.
1. Let A be symmetric positive definite (s.p.d.) and L = K. Then we can take V = U, and have \hat{A} = U^T A U. Clearly \hat{A} is s.p.d., and hence also nonsingular.
2. Let A be nonsingular and L = AK. Then we can take V = AU, and we get \hat{A} = U^T A^T A U. Here A^T A is s.p.d. and hence \hat{A} is nonsingular.
We now derive important optimality properties satisfied in these two special cases. For this purpose we first define an inner product and norm related to an s.p.d. matrix A.

Definition 10.5.1. For an s.p.d. matrix A we define a related A-inner product and A-norm by

    (u, v)_A = u^T A v,   \|u\|_A = (u^T A u)^{1/2}.   (10.5.4)

It is easily verified that \|u\|_A satisfies the conditions for a norm.

Lemma 10.5.2.
Let A be symmetric positive definite (s.p.d.) and consider the case L = K, (V = U). Then \hat{x} = U(U^T A U)^{-1} U^T b minimizes the A-norm of the error over all vectors x \in K, i.e., \hat{x} solves the problem

    \min_{x \in K} \|x - x^*\|_A,   x^* = A^{-1} b.   (10.5.5)

Proof. By (10.5.1) \hat{x} satisfies v^T(b - A\hat{x}) = 0 for all v \in K. Let \hat{e} = \hat{x} - x^* be the error in \hat{x}. Then for the error in \hat{x} + v, v \in K, we have e = \hat{e} + v, and

    \|e\|_A^2 = \hat{e}^T A \hat{e} + v^T A v + 2 v^T A \hat{e}.

But here the last term is zero because v^T A \hat{e} = v^T(A\hat{x} - b) = 0. It follows that \|e\|_A is minimized for v = 0.

A related result is obtained for the second case.

Lemma 10.5.3.
Let A be nonsingular and consider the case L = AK, (V = AU). Then \hat{x} = U(U^T A^T A U)^{-1} U^T A^T b minimizes the 2-norm of the residual over all vectors x \in K, i.e., it solves

    \min_{x \in K} \|b - Ax\|_2.   (10.5.6)

Proof. Using x = Uz we have \|b - Ax\|_2 = \|b - AUz\|_2, which is minimized when z satisfies the normal equations U^T A^T A U z = U^T A^T b. This gives the desired result.

In an iterative method a sequence of projection steps of the above form is often taken. Then we need to modify the above algorithms slightly so that they can start from a given approximation x_k.^4
If we let x = x_k + z, then z satisfies the system Az = r_k, where r_k = b - Ax_k. In step k we now apply the above projection method to this system. Hence we require that z \in K and that r_k - Az = b - A(x_k + z) \perp L. This gives the equations

    r_k = b - Ax_k,   \hat{z} = (V^T A U)^{-1} V^T r_k,   x_{k+1} = x_k + U\hat{z},   (10.5.7)

for computing the new approximation x_{k+1}. A generic projection algorithm is obtained by starting from some x_0 (e.g., x_0 = 0) and repeatedly performing (10.5.7) for a sequence of subspaces L = L_k, K = K_k, k = 1, 2, \ldots.
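The generic step (10.5.7) is easy to express in Matlab. The following is a minimal sketch, not taken from the text; the variable names (U, V, xk) are illustrative, and the columns of U and V are assumed to hold bases for K and L.

    % One generic projection step (10.5.7).
    rk   = b - A*xk;          % current residual
    Ahat = V'*(A*U);          % reduced matrix  V'*A*U  (m-by-m)
    z    = Ahat \ (V'*rk);    % solve the small reduced system directly
    xk1  = xk + U*z;          % new approximation x_{k+1}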

10.5.2 The One-Dimensional Case

The simplest case of a projection method is when m = 1. Then in step k we take L_k = span(v_k) and K_k = span(u_k). Starting from some x_0, we update x_k in the kth step by

    r_k = b - Ax_k,   \alpha_k = \frac{v_k^T r_k}{v_k^T A u_k},   x_{k+1} = x_k + \alpha_k u_k,   (10.5.8)

where we have to require that v_k^T A u_k \ne 0. By construction the new residual r_{k+1} is orthogonal to v_k. Note that r_k can be computed recursively from

    r_k = r_{k-1} - \alpha_{k-1} A u_{k-1}.   (10.5.9)

^4 In the rest of this chapter we will use vector notation, and x_k will denote the kth approximation and not the kth component of x.

This expression is obtained by multiplying x_k = x_{k-1} + \alpha_{k-1} u_{k-1} by A and using the definition r_k = b - Ax_k of the residual. Since A u_{k-1} is needed for computing \alpha_{k-1}, using the recursive residual saves one matrix-vector multiplication.
If A is s.p.d. we can take v_k = u_k, and the above formulas become

    r_k = b - Ax_k,   \alpha_k = \frac{u_k^T r_k}{u_k^T A u_k},   x_{k+1} = x_k + \alpha_k u_k.   (10.5.10)

In this case x_{k+1} minimizes the quadratic functional

    \phi(x) = \tfrac12\|x - x^*\|_A^2 = \tfrac12 (x - x^*)^T A (x - x^*)   (10.5.11)

over all vectors of the form x_k + \alpha u_k.
The vectors u_k are often called search directions. Expanding the function \phi(x_k + \alpha u_k) with respect to \alpha, we obtain

    \phi(x_k + \alpha u_k) = \phi(x_k) - \alpha\, u_k^T(b - Ax_k) + \tfrac12 \alpha^2 u_k^T A u_k.   (10.5.12)

Taking \alpha = \omega\alpha_k, where \alpha_k is given by (10.5.10), we obtain

    \phi(x_k + \omega\alpha_k u_k) = \phi(x_k) - \eta(\omega)\,\frac{1}{2}\,\frac{(u_k^T r_k)^2}{u_k^T A u_k},   \eta(\omega) = \omega(2 - \omega),   (10.5.13)

which is a quadratic function of \omega. In a projection step (\omega = 1) the line x_k + \alpha u_k is tangent to the ellipsoidal level surface \phi(x) = \phi(x_{k+1}), and \phi(x_k + \alpha_k u_k) < \phi(x_k) provided that u_k^T r_k \ne 0. More generally, if u_k^T r_k \ne 0 we have by symmetry that

    \phi(x_k + \omega\alpha_k u_k) < \phi(x_k),   0 < \omega < 2.

For the error in x_{k+1} = x_k + \omega\alpha_k u_k we have

    x^* - x_{k+1} = x^* - x_k - \omega\,\frac{u_k^T r_k}{u_k^T A u_k}\,u_k = \Bigl(I - \omega\,\frac{u_k u_k^T}{u_k^T A u_k}\,A\Bigr)(x^* - x_k).

This shows that the error in each step is transformed by a linear transformation.

Example 10.5.2. For the Gauss-Seidel method, in the ith minor step the ith component of the current approximation x_k is changed so that the ith equation becomes satisfied, i.e., we take

    x_k := x_k + \hat\alpha\, e_i,   e_i^T\bigl(b - A(x_k + \hat\alpha\, e_i)\bigr) = 0,

where e_i is the ith unit vector. Hence the Gauss-Seidel method is equivalent to a sequence of one-dimensional modifications where the search directions are chosen equal to the unit vectors in cyclic order e_1, \ldots, e_n, e_1, \ldots, e_n, \ldots.
This interpretation can be used to prove convergence of the Gauss-Seidel method (and more generally the SOR method) for the case when A is s.p.d.

Theorem 10.5.4.
If A is symmetric positive definite then the SOR method converges for 0 < \omega < 2 to the unique solution of Ax = b. In particular the Gauss-Seidel method, which corresponds to \omega = 1, converges.

Proof. In a minor step using search direction e_i the value of \phi will decrease unless e_i^T(b - Ax_k) = 0, i.e., unless x_k satisfies the ith equation. A major step consists of a sequence of n minor steps using the search directions e_1, \ldots, e_n. Since each minor step effects a linear transformation of the error y_k = x^* - x_k, in a major step it holds that y_{k+1} = K y_k for some matrix K. Here \|K y_k\|_A < \|y_k\|_A unless y_k is unchanged in all minor steps i = 1, \ldots, n, which would imply that y_k = 0. Therefore if y_k \ne 0, then \|K y_k\|_A < \|y_k\|_A, and thus \|K\|_A < 1. It follows that

    \|K^m y_0\|_A \le \|K\|_A^m \|y_0\|_A \to 0   as m \to \infty,

i.e., the iteration converges.
If we define the minor step as x := x + \omega\hat\alpha\, e_i, where \omega is a fixed relaxation factor, the convergence proof still holds. (We may even let \omega vary with i, although the proof assumes that \omega for the same i has the same value in all major steps.) This shows that the SOR method is convergent, and by Theorem 10.2.2 this is equivalent to \rho(B_\omega) < 1, 0 < \omega < 2.

We make two remarks about the convergence proof. First, it also holds if for the basis vectors \{e_i\}_{i=1}^n we substitute an arbitrary set of linearly independent vectors \{p_j\}_{j=1}^n. Second, if A is a positive diagonal matrix, then we obtain the exact solution by the Gauss-Seidel method after n minor steps. Similarly, if A assumes diagonal form after a coordinate transformation with P = (p_1, \ldots, p_n), i.e., if P^T A P = D, then the exact solution will be obtained in n steps using the search directions p_1, \ldots, p_n. Note that this condition is equivalent to the requirement that the vectors \{p_j\}_{j=1}^n be A-orthogonal, p_i^T A p_j = 0, i \ne j.

10.5.3 The Method of Steepest Descent

From the expansion (10.5.12) it is clear that the negative gradient of \phi(x) with respect to x equals -\nabla\phi(x) = b - Ax. Hence the direction in which the function \phi decreases most rapidly at the point x_k equals the residual r_k = b - Ax_k. The method of steepest descent is a one-dimensional projection method where we take v_k = u_k = r_k. Then

    r_k = b - Ax_k,   \alpha_k = \frac{r_k^T r_k}{r_k^T A r_k},   x_{k+1} = x_k + \alpha_k r_k,   (10.5.14)

and according to (10.5.13), when r_k \ne 0 we have \phi(x_{k+1}) < \phi(x_k).
We now derive an expression for the rate of convergence of the steepest descent method. Denoting the error in x_k by e_k = x_k - x^*, we have

    \|e_{k+1}\|_A^2 = e_{k+1}^T A e_{k+1} = -r_{k+1}^T e_{k+1} = -r_{k+1}^T(e_k + \alpha_k r_k) = -(r_k - \alpha_k A r_k)^T e_k = e_k^T A e_k - \alpha_k r_k^T r_k,

where we have used that r_{k+1}^T r_k = 0. Using the expression (10.5.14) for \alpha_k we obtain

    \|e_{k+1}\|_A^2 = \|e_k\|_A^2 \Bigl(1 - \frac{r_k^T r_k}{r_k^T A r_k}\,\frac{r_k^T r_k}{r_k^T A^{-1} r_k}\Bigr).   (10.5.15)
Assume that A is s.p.d. with eigenvalues 0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n. Then from a well-known inequality by Kantorovich it follows (see, e.g., Axelsson and Barker [1984, Sec. 1.2]) that for the method of steepest descent

    \|x^* - x_k\|_A \le \Bigl(\frac{1 - \kappa}{1 + \kappa}\Bigr)^k \|x^* - x_0\|_A,   \kappa = \lambda_1/\lambda_n.   (10.5.16)

It can also be shown that asymptotically this bound is sharp. Hence the asymptotic rate of convergence depends only on the extreme eigenvalues of A.
If the matrix A is ill-conditioned the level curves of \phi are very elongated hyper-ellipsoids. Then the successive iterates x_k, k = 0, 1, 2, \ldots, will zig-zag slowly towards the minimum x^* = A^{-1} b, as illustrated in Figure 10.5.1 for a two-dimensional case. Note that successive search directions are orthogonal.

Figure 10.5.1. Convergence of the steepest descent method.
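For concreteness, the iteration (10.5.14) can be written in a few lines of Matlab. This is a minimal sketch, not from the text; the function name, tolerance and iteration limit are illustrative choices.

    % Method of steepest descent (10.5.14) for s.p.d. A.
    function x = steepest_descent(A, b, x, tol, maxit)
      r = b - A*x;
      for k = 1:maxit
        if norm(r) <= tol, break; end
        Ar    = A*r;
        alpha = (r'*r)/(r'*Ar);   % alpha_k in (10.5.14)
        x     = x + alpha*r;      % search direction u_k = r_k
        r     = r - alpha*Ar;     % recursive residual (10.5.9)
      end
    end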


We now onsider the more general ase when u0 = p0 = r0 (i.e., the steepest des ent dire tion) and the sear h dire tion uk+1 = pk+1 is hosen as a linear
ombination of the negative gradient rk+1 and the previous sear h dire tion pk , i.e.,

pk+1 = rk+1 + k pk ; k = 0; 1; 2 : : : :

(10.5.17)

Here the parameter k remains to be determined. (Note that k = 0 gives the


method of steepest des ent.) From (10.5.13) we know that to get (xk+1 ) < (xk )
we must have pTk rk 6= 0. Repla ing (k + 1) by k in (10.5.17) and multiplying by rkT ,
we obtain
rkT pk = rkT rk + k 1 rkT pk 1 = rkT rk ;
(10.5.18)
sin e rk is orthogonal to pk 1 . It follows that rkT pk = 0 implies rk = 0 and thus
xk = A 1 b. Hen e unless xk is the solution, the next iteration step is always de ned,

\dqbjV
2002/
page 3

Chapter 10.

338

Iterative Methods

regardless of the value of the parameter k , and (xk+1 ) < (xk ). From (10.5.18)
we also obtain the alternative expression
k = (rkT rk )=(pTk Apk ):
(10.5.19)

Review Questions
1. Let \phi(x) = \frac12 x^T A x - x^T b, where A is symmetric positive definite, and consider the function \varphi(\alpha) = \phi(x_k + \alpha p_k), where p_k is a search direction and x_k the current approximation to x^* = A^{-1} b. For what value \alpha = \alpha_k is \varphi(\alpha) minimized? Show that for x_{k+1} = x_k + \alpha_k p_k it holds that b - Ax_{k+1} \perp p_k.
2. Show that minimizing the quadratic form \frac12 x^T A x - x^T b along the search directions p_i = e_i, i = 1, 2, \ldots, n, is equivalent to one step of the Gauss-Seidel method.
3. How are the search directions chosen in the method of steepest descent? What is the asymptotic rate of convergence of this method?

10.6 Krylov Subspace Methods

The Lanczos algorithm and the conjugate gradient algorithm of Hestenes and Stiefel are the most important examples of Krylov subspace methods. They were developed already in the early 1950s, but did not come into wide use until twenty years later. Combined with preconditioning techniques, these methods have since been developed into sophisticated solvers for large-scale linear systems. Today they are the standard methods for solving large, sparse, symmetric (or Hermitian) systems.

10.6.1 The Conjugate Gradient Method

We now consider projection methods where the subspaces K and L are chosen to be the sequence of Krylov subspaces K_{k+1}(r_0, A), k = 0, 1, 2, \ldots, where r_0 = b - Ax_0 and

    K_m(r_0, A) = span\{r_0, Ar_0, \ldots, A^{m-1} r_0\}.   (10.6.1)

This choice leads to one of the most important iterative methods for solving large symmetric positive definite systems, the conjugate gradient method.
If p_0 = r_0 and the recurrence (10.5.17) is used to generate p_{k+1}, then a simple induction argument shows that the vectors p_k and r_k both lie in K_{k+1}(r_0, A). In the conjugate gradient method the parameter \beta_k is chosen to make p_{k+1} A-orthogonal, or conjugate, to the previous search direction, i.e.,

    p_{k+1}^T A p_k = 0.   (10.6.2)

(A motivation for this choice is given in the second remark to Theorem 10.5.4.) Multiplying (10.5.17) by p_k^T A and using (10.6.2) it follows that

    \beta_k = -(p_k^T A r_{k+1})/(p_k^T A p_k).   (10.6.3)

We now prove the important result that this choice in fact makes p_{k+1} A-conjugate to all previous search directions!

Lemma 10.6.1.
In the conjugate gradient algorithm the residual vector r_k is orthogonal to all previous search directions and residual vectors,

    r_k^T p_j = 0,   j = 0, \ldots, k-1,   (10.6.4)

and the search directions are mutually A-conjugate,

    p_k^T A p_j = 0,   j = 0, \ldots, k-1.   (10.6.5)

Proof. We prove the relations (10.6.4) and (10.6.5) jointly by induction. Clearly r_k is orthogonal to the previous search direction p_{k-1}, and (10.6.2) shows that (10.6.5) holds for j = k-1. Hence these relations are certainly true for k = 1.
Assume now that the statements are true for some k \ge 1. Changing the index in (10.5.9) and taking the scalar product with p_j, 0 \le j < k, we get

    r_{k+1}^T p_j = r_k^T p_j - \alpha_k p_k^T A p_j.

By the induction hypothesis this is zero, and since r_{k+1}^T p_k = 0, it follows that (10.6.4) holds for k := k+1. Using equation (10.5.17), the induction hypothesis, equation (10.5.9), and then (10.5.17) again, we find for 0 < j < k

    p_{k+1}^T A p_j = r_{k+1}^T A p_j + \beta_k p_k^T A p_j = \alpha_j^{-1} r_{k+1}^T(r_j - r_{j+1})
                    = \alpha_j^{-1} r_{k+1}^T(p_j - \beta_{j-1} p_{j-1} - p_{j+1} + \beta_j p_j),

which is zero by equation (10.6.4). For j = 0 we use r_0 = p_0 in forming the last line of the equation. For j = k we use (10.6.2), which yields (10.6.5).

Since the vectors p_0, \ldots, p_{k-1} span the Krylov subspace K_k(r_0, A), equation (10.6.4) shows that r_k \perp K_k(r_0, A). This relation shows that the conjugate gradient method implements the projection method obtained by taking K = L = K_k(r_0, A). Hence from Lemma 10.5.2 we have the following global minimization property.

Theorem 10.6.2.
The vector x_k in the conjugate gradient method solves the minimization problem

    \min_x \phi(x) = \tfrac12\|x - x^*\|_A^2,   x - x_0 \in K_k(r_0, A).   (10.6.6)

From this property it follows directly that the "energy" norm \|x - x^*\|_A in the CG method is monotonically decreasing. It can also be shown that the error norm \|x^* - x_k\|_2 is monotonically decreasing (see Hestenes and Stiefel [9]).

Since the vectors r_0, \ldots, r_{k-1} also span the Krylov subspace K_k(r_0, A), the following orthogonality relations hold as well:

    r_k^T r_j = 0,   j = 0, \ldots, k-1.   (10.6.7)

Equation (10.6.7) ensures that in exact arithmetic the conjugate gradient method terminates after at most n steps. For suppose the contrary is true. Then r_k \ne 0, k = 0, 1, \ldots, n, and by (10.6.7) these n+1 nonzero vectors in R^n are mutually orthogonal and hence linearly independent, which is impossible. Hence the conjugate gradient method is in effect a direct method! However, as is now well known, roundoff errors spoil the finite termination property, and this aspect has little practical relevance.
From these relations we can conclude that the residuals r_0, r_1, \ldots, r_k are the same vectors as those obtained from the sequence r_0, Ar_0, \ldots, A^k r_0 by Gram-Schmidt orthogonalization. This gives a connection to the Lanczos process described in Section 10.8.4, which is further discussed below in Section 10.6.3. The vectors p_0, p_1, p_2, \ldots may be constructed similarly from the conjugacy relation (10.6.5).
An alternative expression for \beta_k is obtained by multiplying the recursive expression for the residual, r_{k+1} = r_k - \alpha_k A p_k, by r_{k+1}^T and using the orthogonality (10.6.7) to get r_{k+1}^T r_{k+1} = -\alpha_k r_{k+1}^T A p_k. Equations (10.5.19) and (10.6.3) then yield

    \beta_k = r_{k+1}^T r_{k+1}/(r_k^T r_k).

We observe that in this expression for \beta_k the matrix A is not needed. This property is important when the conjugate gradient method is extended to non-quadratic functionals.
We now summarize the conjugate gradient method. We have seen that there are alternative, mathematically equivalent formulas for computing r_k, \alpha_k and \beta_k. However, these are not equivalent with respect to accuracy, storage and computational work. A comparison tends to favor the following version:

Algorithm 10.6.1 The Conjugate Gradient Method

    r_0 = b - Ax_0;  p_0 = r_0;
    for k = 0, 1, 2, \ldots while \|r_k\|_2 > \epsilon do
        \alpha_k = (r_k, r_k)/(p_k, Ap_k);
        x_{k+1} = x_k + \alpha_k p_k;
        r_{k+1} = r_k - \alpha_k Ap_k;
        \beta_k = (r_{k+1}, r_{k+1})/(r_k, r_k);
        p_{k+1} = r_{k+1} + \beta_k p_k;
    end
Here the inner product used is (p, q) = p^T q. Four vectors x, r, p and Ap need to be stored. Each iteration step requires one matrix-vector product when forming Ap, two vector inner products and three scalar-vector products. A Matlab sketch of Algorithm 10.6.1 is given below.
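The following Matlab function is a minimal realization of Algorithm 10.6.1, not taken from the text; the function name, tolerance and iteration limit are illustrative.

    % Conjugate gradient method (Algorithm 10.6.1) for s.p.d. A.
    function x = cg(A, b, x, tol, maxit)
      r  = b - A*x;  p = r;
      rr = r'*r;
      for k = 1:maxit
        if sqrt(rr) <= tol, break; end
        Ap    = A*p;
        alpha = rr/(p'*Ap);
        x     = x + alpha*p;
        r     = r - alpha*Ap;      % recursive residual
        rrnew = r'*r;
        beta  = rrnew/rr;  rr = rrnew;
        p     = r + beta*p;
      end
    end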
By instead taking the inner product in the above algorithm to be (p, q) = p^T A q we obtain a related method that in each step minimizes the Euclidean norm of the residual over the same Krylov subspace. In this algorithm the vectors Ap_i, i = 0, 1, \ldots, are orthogonal. In addition, the residual vectors are required to be A-orthogonal, i.e., conjugate. Consequently this method is called the conjugate residual method. It requires one more vector of storage and one more vector update than the conjugate gradient method. Therefore, when applicable, the conjugate gradient method is usually preferred over the conjugate residual method.

10.6.2 Convergence of the Conjugate Gradient Method

In a Krylov subspace method the approximations are of the form x_k - x_0 \in K_k(r_0, A), k = 1, 2, \ldots. With r_k = b - Ax_k it follows that r_k - r_0 \in A K_k(r_0, A). Hence the residual vectors can be written

    r_k = q_k(A) r_0,

where q_k \in \tilde\Pi_k^1, the set of polynomials q_k of degree k with q_k(0) = 1. Since

    \phi(x) = \tfrac12 \|x - x^*\|_A^2 = \tfrac12 r^T A^{-1} r = \tfrac12 \|r\|_{A^{-1}}^2,

the optimality property in Theorem 10.6.2 can alternatively be stated as

    \|r_k\|_{A^{-1}}^2 = \min_{q_k \in \tilde\Pi_k^1} \|q_k(A) r_0\|_{A^{-1}}^2.   (10.6.8)
Denote by \{\lambda_i, v_i\}, i = 1, \ldots, n, the eigenvalues and eigenvectors of A. Since A is symmetric we can assume that the eigenvectors are orthonormal. Expanding the initial residual as

    r_0 = \sum_{i=1}^n \xi_i v_i,   (10.6.9)

we have for any q_k \in \tilde\Pi_k^1

    \|r_k\|_{A^{-1}}^2 \le \|q_k(A) r_0\|_{A^{-1}}^2 = r_0^T q_k(A)^T A^{-1} q_k(A) r_0 = \sum_{i=1}^n \xi_i^2 \lambda_i^{-1} q_k(\lambda_i)^2.

In particular, taking

    q_n(\lambda) = \Bigl(1 - \frac{\lambda}{\lambda_1}\Bigr)\Bigl(1 - \frac{\lambda}{\lambda_2}\Bigr)\cdots\Bigl(1 - \frac{\lambda}{\lambda_n}\Bigr),   (10.6.10)

we get \|r_n\|_{A^{-1}} = 0. This is an alternative proof that the CG method terminates after at most n steps in exact arithmetic.
If the eigenvalues of A are distinct, then q_n in (10.6.10) is the minimal polynomial of A (see Section 10.1.2). If A has only p distinct eigenvalues then the minimal
polynomial is of degree p, and CG converges in at most p steps for any vector r_0. Hence, CG is particularly effective when A has low rank! More generally, if the grade of r_0 with respect to A equals m, then only m steps are needed to obtain the exact solution. This will be the case if, e.g., in the expansion (10.6.9) \xi_i \ne 0 for only m different values of i.
We stress that the finite termination property of the CG method shown above is valid only in exact arithmetic. In practical applications we want to obtain a good approximate solution x_k in far fewer than n iterations. We now use the optimality property (10.6.8) to derive an upper bound for the rate of convergence of the CG method considered as an iterative method. Let the set S contain all the eigenvalues of A and assume that for some \tilde q_k \in \tilde\Pi_k^1 we have

    \max_{\lambda \in S} |\tilde q_k(\lambda)| \le M_k.

Then it follows that

    \|r_k\|_{A^{-1}}^2 \le M_k^2 \sum_{i=1}^n \xi_i^2 \lambda_i^{-1} = M_k^2 \|r_0\|_{A^{-1}}^2,

or

    \|x^* - x_k\|_A \le M_k \|x^* - x_0\|_A.   (10.6.11)

We now select a set S on the basis of some assumption regarding the eigenvalue distribution of A, and seek a polynomial \tilde q_k \in \tilde\Pi_k^1 such that M_k = \max_{\lambda \in S} |\tilde q_k(\lambda)| is small.
A simple choice is to take S = [\lambda_1, \lambda_n] and seek the polynomial \tilde q_k \in \tilde\Pi_k^1 which minimizes \max_{\lambda_1 \le \lambda \le \lambda_n} |q_k(\lambda)|. The solution to this problem is known to be a shifted and scaled Chebyshev polynomial of degree k; see the analysis in Section 10.4.2. It follows that

    \|x^* - x_k\|_A < 2\Bigl(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\Bigr)^k \|x^* - x_0\|_A,   (10.6.12)

where \kappa = \lambda_n(A)/\lambda_1(A). This estimate is the same as (10.4.10) for Chebyshev semi-iteration.
We note that the convergence of the conjugate residual method can be analyzed using a similar technique.

Example 10.6.1. For the model problem in Section 10.1.3 the extreme eigenvalues of \frac14 A are \lambda_{max} = 1 + \cos\pi h, \lambda_{min} = 1 - \cos\pi h. It follows that

    \kappa = \frac{1 + \cos\pi h}{1 - \cos\pi h} \approx \frac{1}{\sin^2(\pi h/2)} \approx \Bigl(\frac{2}{\pi h}\Bigr)^2.

For h = 1/100 the number of iterations needed to reduce the initial error by a factor of 10^{-3} is then bounded by

    k \le \tfrac12\sqrt{\kappa}\,\log\bigl(2\cdot 10^3\bigr) \approx 242.

This is about the same number of iterations as needed with SOR using \omega_{opt} to reduce the L_2-norm by the same factor. However, the conjugate gradient method is more general in that it does not require the matrix A to have "property A".
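The iteration count quoted in Example 10.6.1 is easy to reproduce numerically; the following Matlab lines are a small check of the estimate (an illustration only, not part of the example).

    % Rough iteration-count estimate for Example 10.6.1 with h = 1/100:
    % 2*((sqrt(kappa)-1)/(sqrt(kappa)+1))^k <= 1e-3  gives  k ~ (sqrt(kappa)/2)*log(2e3).
    h     = 1/100;
    kappa = (1 + cos(pi*h))/(1 - cos(pi*h));   % condition number of (1/4)A
    k     = 0.5*sqrt(kappa)*log(2e3)           % approximately 242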
The error estimate above tends to be pessimistic asymptotically. One often observes in practice a superlinear convergence for the conjugate gradient method. This can be explained theoretically for the case when there are gaps in the spectrum of A. Then, as the iterations proceed, the effect of the smallest and largest eigenvalues of A is eliminated, and the convergence then behaves according to a smaller "effective" condition number. This behavior, called superlinear convergence, is in contrast to the Chebyshev semi-iterative method, which takes only the extreme eigenvalues of the spectrum into account and for which the error estimate in Section 10.4.2 tends to be sharp asymptotically.
We have seen that, in exact arithmetic, the conjugate gradient algorithm produces the exact solution to a linear system Ax = b in at most n steps. However, in the presence of rounding errors the orthogonality relations in Lemma 10.6.1 will no longer be satisfied exactly. Indeed, orthogonality between residuals r_i and r_j, for |i - j| large, will usually be completely lost. Because of this, the finite termination property does not hold in practice. The algorithm's main use is instead as an iterative method for solving large, sparse, well-conditioned linear systems, using far fewer than n iterations.
The behavior of the conjugate gradient algorithm in finite precision is much more complex than in exact arithmetic. It has been observed that the bound (10.6.12) still holds to good approximation in finite precision. On the other hand, a good approximate solution may not be obtained after n iterations, even though a large drop in the error sometimes occurs after step n. It has been observed that the conjugate gradient algorithm in finite precision behaves like the exact algorithm applied to a larger linear system \hat A \hat x = \hat b, where the matrix \hat A has many eigenvalues distributed in tiny intervals about the eigenvalues of A. This means that \kappa(\hat A) \approx \kappa(A), which explains why the bound (10.6.12) still applies. It can also be shown that even in finite precision \|r_k\|_2 \to 0, where r_k is the recursively computed residual in the algorithm. (Note that the norm of the true residual \|b - Ax_k\|_2 cannot be expected to approach zero.) This means that a termination criterion \|r_k\|_2 \le \epsilon will eventually be satisfied even if \epsilon \ll u, where u is the machine precision.

Table 10.6.1. Maximum error for Example 10.6.2 using Chebyshev iteration with optimal parameters and the conjugate gradient algorithm.

    Iteration    Chebyshev         Conj. gradient
        1        1.6 x 10^{-2}     1.6 x 10^{-2}
        2        7.1 x 10^{-4}     6.5 x 10^{-4}
        3        1.1 x 10^{-5}     1.0 x 10^{-5}
        4        2.7 x 10^{-7}     1.0 x 10^{-7}
        5        4.3 x 10^{-9}     8.1 x 10^{-10}
        6        1.2 x 10^{-10}    5.7 x 10^{-12}

Example 10.6.2. Consider the elliptic equation

    \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + p(x, y)\,u = f(x, y),   p = \frac{6(x^2 + y^2)}{1 + \tfrac12(x^4 + y^4)},

0 < x, y < 1, and let the boundary conditions be determined so that

    u(x, y) = 2\bigl((x - 1/2)^2 + (y - 1/2)^2\bigr).

The Laplacian operator is approximated with 32 mesh points in each direction. In Table 10.6.1 we compare the maximum error using Chebyshev iteration with optimal parameters and the conjugate gradient algorithm. An initial estimate identically equal to zero is used.
It is seen that Algorithm 10.6.1 yields a smaller error without the need to estimate any parameters.

10.6.3 The Lanczos Formulation

In Section 9.9.4 we described the Lanczos process for a real symmetric matrix A. If this process can be carried out for k steps, starting with a vector v_1, it generates a symmetric tridiagonal matrix

    T_k = \begin{pmatrix}
            \alpha_1 & \beta_2  &             &              &          \\
            \beta_2  & \alpha_2 & \beta_3     &              &          \\
                     & \ddots   & \ddots      & \ddots       &          \\
                     &          & \beta_{k-1} & \alpha_{k-1} & \beta_k  \\
                     &          &             & \beta_k      & \alpha_k
          \end{pmatrix}

and a matrix V_k = (v_1, \ldots, v_k) with orthogonal columns spanning the Krylov subspace K_k(v_1, A) such that

    A V_k = V_k T_k + \beta_{k+1} v_{k+1} e_k^T.   (10.6.13)

In the context of solving the linear system Ax = b using Krylov subspaces, the appropriate choice of starting vector is

    \beta_1 v_1 = r_0 = b - Ax_0,   \beta_1 = \|r_0\|_2.

We write the kth approximation as x_k = x_0 + V_k y_k \in x_0 + K_k(r_0, A). Here y_k is determined by the condition that r_k = b - Ax_k is orthogonal to K_k(r_0, A), i.e., V_k^T r_k = 0. Using (10.6.13) we have

    r_k = r_0 - A V_k y_k = \beta_1 v_1 - V_k T_k y_k - \beta_{k+1} (e_k^T y_k) v_{k+1}.

Since V_k^T v_{k+1} = 0 and V_k^T v_1 = e_1, multiplying by V_k^T gives

    V_k^T r_k = 0 = \beta_1 e_1 - T_k y_k.   (10.6.14)

Hence y_k is obtained by solving the tridiagonal system

    T_k y_k = \beta_1 e_1,   (10.6.15)

and then x_k = x_0 + V_k y_k. Mathematically this gives the same sequence of approximations as generated by the CG method. Moreover, the columns of V_k equal the first k residual vectors in the conjugate gradient method, normalized to unit length.
The Lanczos process stops if \beta_{k+1} = 0, since then v_{k+1} is not defined. However, then we have A V_k = V_k T_k, and using (10.6.14),

    0 = r_k = \beta_1 v_1 - V_k T_k y_k = r_0 - A V_k y_k = r_0 - A(x_k - x_0).

It follows that Ax_k = b, i.e., x_k is an exact solution.
The recursion in the conjugate gradient method is obtained from (10.6.15) by computing the Cholesky factorization of T_k. This is always possible: suppose the Lanczos process stops for k = l \le n. Then, since A is positive definite, T_l = V_l^T A V_l is also positive definite. Thus T_k, k \le l, which is a principal submatrix of T_l, is also positive definite and its Cholesky factorization must exist.
So far we have discussed the Lanczos process in exact arithmetic. In practice, roundoff will cause the generated vectors to lose orthogonality. A possible remedy is to reorthogonalize each generated vector v_{k+1} against all previous vectors v_k, \ldots, v_1. This is, however, very costly both in terms of storage and operations. The effect of finite precision on the Lanczos method is the same as for the CG method; it slows down convergence, but fortunately does not prevent accurate approximations from being found!
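As an illustration of the Lanczos route to the solution, the following Matlab sketch generates V_k and T_k and then solves (10.6.15). It is a minimal sketch under the assumptions of this section (symmetric A, no reorthogonalization); the function name and the fixed number of steps k are illustrative choices, not from the text.

    % Lanczos process followed by the solve (10.6.15); no reorthogonalization.
    function x = lanczos_solve(A, b, x0, k)
      n  = length(b);
      r0 = b - A*x0;  beta1 = norm(r0);
      V = zeros(n,k);  alpha = zeros(k,1);  beta = zeros(k,1);
      V(:,1) = r0/beta1;
      for j = 1:k
        w = A*V(:,j);
        if j > 1, w = w - beta(j)*V(:,j-1); end
        alpha(j) = V(:,j)'*w;
        w = w - alpha(j)*V(:,j);
        if j < k
          beta(j+1) = norm(w);
          if beta(j+1) == 0, k = j; break; end   % process stops; x_k is exact
          V(:,j+1) = w/beta(j+1);
        end
      end
      T = diag(alpha(1:k)) + diag(beta(2:k),1) + diag(beta(2:k),-1);
      y = T \ (beta1*eye(k,1));      % tridiagonal system (10.6.15)
      x = x0 + V(:,1:k)*y;
    end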

10.6.4 Symmetric Indefinite Systems

For symmetric positive definite matrices A the conjugate gradient method computes iterates x_k that satisfy the minimization property

    \min_{x \in S_k} \|x^* - x\|_A,   S_k = x_0 + K_k(r_0, A).

In case A is symmetric but indefinite, \|\cdot\|_A is no longer a norm. Hence the standard conjugate gradient method may break down. This is also true for the conjugate residual method.
A Krylov subspace method for symmetric indefinite systems was given by Paige and Saunders [15]. Using the Lanczos basis V_k they seek approximations x_k = V_k y_k \in K_k(b, A) which are stationary values of \|x^* - x_k\|_A^2. These are given by the Galerkin condition

    V_k^T(b - A V_k y_k) = 0.

This leads again to the tridiagonal system (10.6.15). However, when A is indefinite, although the Lanczos process is still well defined, the Cholesky factorization of T_k may not exist. Moreover, it may happen that T_k is singular at certain steps, and then y_k is not defined.
If the Lanczos process stops for some k \le n then A V_k = V_k T_k. It follows that the eigenvalues of T_k are a subset of the eigenvalues of A, and thus if A is

nonsingular, so is T_k. Hence the problem with a singular T_k can only occur at intermediate steps.
To solve the tridiagonal system (10.6.15), Paige and Saunders suggest computing the LQ factorization

    T_k = \bar L_k Q_k,   Q_k^T Q_k = I,

where \bar L_k is lower triangular and Q_k orthogonal. Such a factorization always exists and can be computed by multiplying T_k with a sequence of plane rotations from the right,

    T_k G_{12} \cdots G_{k-1,k} = \bar L_k = \begin{pmatrix}
        \gamma_1   &          &          &          &               \\
        \delta_2   & \gamma_2 &          &          &               \\
        \epsilon_3 & \delta_3 & \gamma_3 &          &               \\
                   & \ddots   & \ddots   & \ddots   &               \\
                   &          & \epsilon_k & \delta_k & \bar\gamma_k
      \end{pmatrix}.

The rotation G_{k-1,k} is defined by elements c_{k-1} and s_{k-1}. The bar on the element \bar\gamma_k is used to indicate that \bar L_k differs from L_k, the k x k leading part of \bar L_{k+1}, in the (k, k) element only. In the next step the elements in G_{k,k+1} are given by

    \gamma_k = (\bar\gamma_k^2 + \beta_{k+1}^2)^{1/2},   c_k = \bar\gamma_k/\gamma_k,   s_k = \beta_{k+1}/\gamma_k.

Since the solution y_k of T_k y_k = \beta_1 e_1 will change fully with each increase in k, we write

    x_k = V_k y_k = (V_k Q_k^T) z_k = \bar W_k z_k,

and let

    \bar W_k = (w_1, \ldots, w_{k-1}, \bar w_k),   z_k = (\zeta_1, \ldots, \zeta_{k-1}, \bar\zeta_k) = Q_k y_k.

Here quantities without bars are unchanged when k increases, and \bar W_k can be updated together with T_k. The system (10.6.15) now becomes

    \bar L_k z_k = \beta_1 e_1,   \bar x_k = \bar W_k z_k.

This formulation allows the v_i and w_i to be formed and discarded one by one.
In implementing the algorithm we should note that \bar x_k need not be updated at each step, and that if \bar\gamma_k = 0, then z_k is not defined. Instead we update

    x_k^L = W_k z_k = x_{k-1}^L + \zeta_k w_k,

where L_k is used rather than \bar L_k. We can then always obtain \bar x_{k+1} when needed from

    \bar x_{k+1} = x_k^L + \bar\zeta_{k+1} \bar w_{k+1}.

This defines the SYMMLQ algorithm. In theory the algorithm will stop with \beta_{k+1} = 0, and then \bar x_k = x_k^L = x. In practice it has been observed that \beta_{k+1} will rarely be small, and some other stopping criterion based on the size of the residual must be used.
Paige and Saunders also derived an algorithm called MINRES, which is based on minimizing the Euclidean norm of the residual r_k. It should be noted that MINRES suffers more from poorly conditioned systems than SYMMLQ does.

Review Questions
1. Define the Krylov space K_j(b, A). Show that it is invariant under (i) scaling \alpha A, and (ii) translation A - sI. How is it affected by an orthogonal similarity transformation \bar A = V^T A V, \bar b = V^T b?
2. What minimization problems are solved by the conjugate gradient method? How can this property be used to derive an upper bound for the rate of convergence of the conjugate gradient method?
3. Let the symmetric matrix A have eigenvalues \lambda_i and orthonormal eigenvectors v_i, i = 1, \ldots, n. If only d < n eigenvalues are distinct, what is the maximum dimension of the Krylov space K_j(b, A)?

Problems
1. Let \lambda_i, v_i be an eigenvalue and eigenvector of the symmetric matrix A.
(a) Show that if v_i \perp b, then also v_i \perp K_j(b, A) for all j > 1.
(b) Show that if b is orthogonal to p eigenvectors, then the maximum dimension of K_j(b, A) is at most n - p. Deduce that the conjugate gradient method converges in at most n - p iterations.
2. Let A = I + B B^T \in R^{n \times n}, where B is of rank p. In exact arithmetic, how many iterations are at most needed to solve a system Ax = b with the conjugate gradient method?
3. Write down explicitly the conjugate residual method. Show that in this algorithm one needs to store the vectors x, r, Ar, p and Ap.
4. SYMMLQ is based on solving the tridiagonal system (10.6.15) using an LQ factorization of T_k. Derive an alternative algorithm which solves this system with Gaussian elimination with partial pivoting.

10.7 Nonsymmetric Problems

An ideal conjugate gradient-like method for nonsymmetric systems would be characterized by one of the properties (10.6.6) or (10.6.14). We would also like to be able to base the implementation on a short vector recursion. Unfortunately, it turns out that such an ideal method can essentially exist only for matrices of very special form. In particular, a two-term recursion like that in the CG method is only possible in case A either has a minimal polynomial of degree \le 1, or is Hermitian, or is of the form

    A = e^{i\theta}(B + \sigma I),   B = B^H,

where \theta and \sigma are real. Hence the class essentially consists of shifted and rotated Hermitian matrices.

10.7.1 The Normal Equations

When A \in R^{m \times n}, rank(A) = n, the normal equations of the first kind,

    A^T A x = A^T b,   (10.7.1)

give the conditions for the solution of the linear least squares problem

    \min_x \|Ax - b\|_2.   (10.7.2)

Similarly, when rank(A) = m, the normal equations of the second kind,

    A A^T y = b,   x = A^T y,   (10.7.3)

give the conditions for the solution of the minimum norm problem

    \min \|x\|_2,   Ax = b.   (10.7.4)

In particular, if A is nonsingular, then both systems are symmetric positive definite with solution x = A^{-1} b. Hence a natural extension of iterative methods for symmetric positive definite systems to general nonsingular, nonsymmetric linear systems Ax = b is to apply them to the normal equations of the first or second kind.
Because of the relation \kappa(A^T A) = \kappa(A A^T) = \kappa^2(A), the condition number is squared compared to the original system Ax = b. From the estimate (10.6.12) we note that this can lead to a substantial decrease in the rate of convergence.

10.7.2 Classical Iterative Methods

The non-stationary Richardson iteration applied to the normal equations A^T A x = A^T b can be written in the form

    x^{(k+1)} = x^{(k)} + \omega_k A^T\bigl(b - A x^{(k)}\bigr),   k = 1, 2, \ldots.

This method is often referred to as Landweber's method. It can be shown that this method is convergent provided that for some \epsilon > 0 it holds that

    0 < \epsilon \le \omega_k \le (2 - \epsilon)/\sigma_{max}^2(A)   for all k.

An important thing to notice in the implementation is that, to avoid numerical instability and fill-in, the matrix A^T A should not be explicitly computed.
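This point is visible in the following minimal Matlab sketch of Landweber's iteration (illustrative names and fixed step count; omega must satisfy the bound above):

    % Landweber iteration x^{(k+1)} = x^{(k)} + omega*A'*(b - A*x^{(k)}).
    function x = landweber(A, b, x, omega, maxit)
      for k = 1:maxit
        x = x + omega*(A'*(b - A*x));   % works with A and A' only; never forms A'*A
      end
    end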
The eigenvalues of the iteration matrix G = I - \omega A^T A equal

    \lambda_k(G) = 1 - \omega\sigma_k^2,   k = 1, \ldots, n,

where \sigma_k are the singular values of A. From this it can be shown that Richardson's method converges to the least squares solution x = A^\dagger b if

    x^{(0)} \in R(A^T),   0 < \omega < 2/\sigma_1^2(A).

Assume that all columns of A are nonzero, and let

    A = (a_1, \ldots, a_n) \in R^{m \times n},   d_j = a_j^T a_j = \|a_j\|_2^2 > 0.   (10.7.5)

In Jacobi's method a sequence of approximations

    x^{(k)} = \bigl(x_1^{(k)}, \ldots, x_n^{(k)}\bigr)^T,   k = 1, 2, \ldots,

is computed from

    x_j^{(k+1)} = x_j^{(k)} + a_j^T\bigl(b - A x^{(k)}\bigr)/d_j,   j = 1, 2, \ldots, n.   (10.7.6)

Jacobi's method (10.7.6) can be written in matrix form as

    x^{(k+1)} = x^{(k)} + D_A^{-1} A^T\bigl(b - A x^{(k)}\bigr),   (10.7.7)

where D_A = diag(d_1, \ldots, d_n) = diag(A^T A). Jacobi's method is symmetrizable, since

    D_A^{1/2}\bigl(I - D_A^{-1} A^T A\bigr) D_A^{-1/2} = I - D_A^{-1/2} A^T A D_A^{-1/2}.

Jacobi's method can also be used to solve the normal equations of the second kind (10.7.3). This method can be written in the form

    x^{(k+1)} = x^{(k)} + A^T D_A^{-1}\bigl(b - A x^{(k)}\bigr),   D_A = diag(A A^T).   (10.7.8)

The Gauss-Seidel method is a special case of the following class of residual reducing methods. Let p_j \notin N(A), j = 1, 2, \ldots, be a sequence of nonzero n-vectors and compute a sequence of approximations of the form

    x^{(j+1)} = x^{(j)} + \alpha_j p_j,   \alpha_j = p_j^T A^T\bigl(b - A x^{(j)}\bigr)/\|A p_j\|_2^2.   (10.7.9)

It is easily verified that r^{(j+1)} \perp A p_j, where r^{(j)} = b - A x^{(j)}, and hence

    \|r^{(j+1)}\|_2^2 = \|r^{(j)}\|_2^2 - |\alpha_j|^2 \|A p_j\|_2^2 \le \|r^{(j)}\|_2^2,

which shows that the class of methods (10.7.9) is residual reducing.
If A has linearly independent columns we obtain the Gauss-Seidel method for the normal equations by taking p_j in (10.7.9) equal to the unit vectors e_j in cyclic order. Then if A = (a_1, a_2, \ldots, a_n), we have A p_j = A e_j = a_j. An iteration step in the Gauss-Seidel method consists of n minor steps where we put z^{(1)} = x^{(k)}, and x^{(k+1)} = z^{(n+1)} is computed by

    z^{(j+1)} = z^{(j)} + e_j\, a_j^T r^{(j)}/d_j,   r^{(j)} = b - A z^{(j)},   (10.7.10)
j = 1, 2, \ldots, n. In the jth minor step only the jth component of z^{(j)} is changed, and hence the residual r^{(j)} can be cheaply updated. With r^{(1)} = b - A x^{(k)} we obtain the recursions

    z^{(j+1)} = z^{(j)} + \alpha_j e_j,   r^{(j+1)} = r^{(j)} - \alpha_j a_j,   \alpha_j = a_j^T r^{(j)}/d_j,   j = 1, \ldots, n.   (10.7.11)

Note that in the jth minor step only the jth column of A is accessed, and that the method can be implemented without forming the matrix A^T A explicitly; a Matlab sketch of one sweep is given below.
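The following minimal sketch performs one Gauss-Seidel sweep (10.7.11); the function name and calling convention are illustrative. It assumes r = b - A*x on entry and d(j) = norm(A(:,j))^2.

    % One Gauss-Seidel sweep (10.7.11) for the normal equations A'*A*x = A'*b.
    function [x, r] = gs_normal_sweep(A, x, r, d)
      n = size(A,2);
      for j = 1:n
        aj    = A(:,j);               % only the jth column of A is accessed
        alpha = (aj'*r)/d(j);
        x(j)  = x(j) + alpha;
        r     = r - alpha*aj;         % cheap residual update
      end
    end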

In contrast to the Jacobi method, the Gauss-Seidel method is not symmetrizable, and the ordering of the columns of A will influence the convergence.
The Jacobi method has the advantage over the Gauss-Seidel method that it is more easily adapted to parallel computation, since (10.7.8) just requires matrix-vector multiplications. Further, it does not require A to be stored (or generated) columnwise, since products of the form Ax and A^T r can conveniently be computed also if A can only be accessed by rows. In this case, if a_1^T, \ldots, a_m^T are the rows of A, then we have

    (Ax)_i = a_i^T x,   i = 1, \ldots, m,   A^T r = \sum_{i=1}^m a_i r_i.

That is, for Ax we use an inner product formulation, and for A^T r an outer product formulation.
The successive overrelaxation (SOR) method for the normal equations A^T A x = A^T b is obtained by introducing a relaxation parameter \omega in the Gauss-Seidel method (10.7.11),

    z^{(j+1)} = z^{(j)} + \alpha_j e_j,   r^{(j+1)} = r^{(j)} - \alpha_j a_j,   \alpha_j = \omega\, a_j^T r^{(j)}/d_j,   j = 1, \ldots, n.   (10.7.12)

The SOR method always converges when A^T A is positive definite and \omega satisfies 0 < \omega < 2. The SOR method shares with the Gauss-Seidel method the advantage of simplicity and small storage requirements.
The Gauss-Seidel method for solving the normal equations of the second kind can also be implemented without forming A A^T. We define a class of error reducing methods as follows. Let p_j \notin N(A^T), j = 1, 2, \ldots, be a sequence of nonzero m-vectors and compute approximations of the form

    x^{(j+1)} = x^{(j)} + \alpha_j A^T p_j,   \alpha_j = p_j^T\bigl(b - A x^{(j)}\bigr)/\|A^T p_j\|_2^2.   (10.7.13)

If the system Ax = b is consistent there is a unique solution x of minimum norm. If we denote the error by d^{(j)} = x - x^{(j)}, then by construction d^{(j+1)} \perp A^T p_j. Thus

    \|d^{(j+1)}\|_2^2 = \|d^{(j)}\|_2^2 - |\alpha_j|^2 \|A^T p_j\|_2^2 \le \|d^{(j)}\|_2^2,

i.e., this class of methods is error reducing.
We obtain the Gauss-Seidel method by taking p_j to be the unit vectors e_j in cyclic order. Then A^T p_j = a_{j:}, where a_{j:}^T is the jth row of A, and the iterative method (10.7.13) takes the form

    x^{(j+1)} = x^{(j)} + a_{j:}\bigl(b_j - a_{j:}^T x^{(j)}\bigr)/d_j,   j = 1, \ldots, m,   (10.7.14)

where now d_j = \|a_{j:}\|_2^2. In terms of the variable y in (10.7.3), the approximation y^{(j)} is updated by

    y^{(j+1)} = y^{(j)} + e_j\bigl(b_j - a_{j:}^T A^T y^{(j)}\bigr)/d_j,

and with x^{(j)} = A^T y^{(j)} and x^{(j+1)} = A^T y^{(j+1)} we recover (10.7.14). This shows that if we take x^{(0)} = A^T y^{(0)}, then for an arbitrary y^{(0)} this iteration is equivalent to the Gauss-Seidel method for (10.7.3).
The SOR method applied to the normal equations of the second kind is obtained by introducing an acceleration parameter \omega, i.e.,

    x^{(j+1)} = x^{(j)} + \omega\, a_{j:}\bigl(b_j - a_{j:}^T x^{(j)}\bigr)/d_j,   j = 1, \ldots, m.   (10.7.15)

10.7.3 The Conjugate Gradient Method

The implementation of CG applied to the normal equations of the first kind becomes as follows:

Algorithm 10.7.1 CGLS

    r_0 = b - Ax_0;  p_0 = s_0 = A^T r_0;
    for k = 0, 1, \ldots while \|r_k\|_2 > \epsilon do
        q_k = A p_k;
        \alpha_k = \|s_k\|_2^2 / \|q_k\|_2^2;
        x_{k+1} = x_k + \alpha_k p_k;
        r_{k+1} = r_k - \alpha_k q_k;
        s_{k+1} = A^T r_{k+1};
        \beta_k = \|s_{k+1}\|_2^2 / \|s_k\|_2^2;
        p_{k+1} = s_{k+1} + \beta_k p_k;
    end
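In Matlab, Algorithm 10.7.1 can be realized along the following lines (a minimal sketch; the function name, tolerance and iteration limit are illustrative):

    % CGLS (Algorithm 10.7.1); works with A and A' only, never forms A'*A.
    function x = cgls(A, b, x, tol, maxit)
      r = b - A*x;  s = A'*r;  p = s;
      ss = s'*s;
      for k = 1:maxit
        if norm(r) <= tol, break; end
        q     = A*p;
        alpha = ss/(q'*q);
        x     = x + alpha*p;
        r     = r - alpha*q;
        s     = A'*r;
        ssnew = s'*s;
        beta  = ssnew/ss;  ss = ssnew;
        p     = s + beta*p;
      end
    end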
Note that it is important for stability that the residuals r_k = b - A x_k, and not the residuals s_k = A^T(b - A x_k), are recurred. The method obtained by applying CG to the normal equations of the second kind is also known as Craig's method. It can only be used for consistent problems, i.e., when b \in R(A). It can also be used to compute the (unique) minimum norm solution of an underdetermined system, \min \|x\|_2 subject to Ax = b, where A \in R^{m \times n}, m < n. Craig's method (CGNE) can be implemented as follows:

Algorithm 10.7.2 CGNE

    r_0 = b - Ax_0;  p_0 = A^T r_0;
    for k = 0, 1, \ldots while \|r_k\|_2 > \epsilon do
        \alpha_k = \|r_k\|_2^2 / \|p_k\|_2^2;
        x_{k+1} = x_k + \alpha_k p_k;
        r_{k+1} = r_k - \alpha_k A p_k;
        \beta_k = \|r_{k+1}\|_2^2 / \|r_k\|_2^2;
        p_{k+1} = A^T r_{k+1} + \beta_k p_k;
    end

Both CGLS and CGNE generate iterates in the shifted Krylov subspace

    x_k \in x_0 + K_k(A^T r_0, A^T A).

From the minimization property we have for the iterates in CGLS

    \|x - x_k\|_{A^T A} = \|r - r_k\|_2 < 2\Bigl(\frac{\kappa - 1}{\kappa + 1}\Bigr)^k \|r - r_0\|_2,

where \kappa = \kappa(A) and r = b - Ax. Similarly, for CGNE we have

    \|y - y_k\|_{A A^T} = \|x - x_k\|_2 < 2\Bigl(\frac{\kappa - 1}{\kappa + 1}\Bigr)^k \|x - x_0\|_2.

For consistent problems the method CGNE should in general be preferred.
The main drawback of the two methods above is that they often converge very slowly, which is related to the fact that \kappa(A^T A) = \kappa^2(A). Note, however, that in some special cases both CGLS and CGNE may converge much faster than alternative methods. For example, when A is orthogonal then A^T A = A A^T = I and both methods converge in one step!

10.7.4 Least Squares and LSQR

As shown by Paige and Saunders, the Golub-Kahan bidiagonalization process developed in Section 10.9.4 can be used to develop methods related to CGLS and CGNE for solving the linear least squares problem (10.7.2) and the minimum norm problem (10.7.4), respectively.
To compute a sequence of approximate solutions to the least squares problem we start the recursion (10.9.19)-(10.9.20) by

    \beta_1 u_1 = b - Ax_0,   \alpha_1 v_1 = A^T u_1,

and for j = 1, 2, \ldots compute

    \beta_{j+1} u_{j+1} = A v_j - \alpha_j u_j,   (10.7.16)
    \alpha_{j+1} v_{j+1} = A^T u_{j+1} - \beta_{j+1} v_j,   (10.7.17)

where \alpha_{j+1} \ge 0 and \beta_{j+1} \ge 0 are determined so that \|u_{j+1}\|_2 = \|v_{j+1}\|_2 = 1.
After k steps we have computed orthogonal matrices

    V_k = (v_1, \ldots, v_k),   U_{k+1} = (u_1, \ldots, u_{k+1}),

and a rectangular lower bidiagonal matrix

    B_k = \begin{pmatrix}
            \alpha_1 &          &        &           \\
            \beta_2  & \alpha_2 &        &           \\
                     & \beta_3  & \ddots &           \\
                     &          & \ddots & \alpha_k  \\
                     &          &        & \beta_{k+1}
          \end{pmatrix} \in R^{(k+1)\times k}.   (10.7.18)

The recurrence relations (10.7.16)-(10.7.17) can be written in matrix form as

    \beta_1 U_{k+1} e_1 = b,   (10.7.19)

where e_1 denotes the first unit vector, and

    A V_k = U_{k+1} B_k,   A^T U_{k+1} = V_k B_k^T + \alpha_{k+1} v_{k+1} e_{k+1}^T.   (10.7.20)

We now seek an approximate solution x_k \in K_k = K_k(A^T b, A^T A). Since K_k = span(V_k), we can write

    x_k = x_0 + V_k y_k.   (10.7.21)

Multiplying the first equation in (10.7.20) by y_k we obtain A x_k = A V_k y_k = U_{k+1} B_k y_k, and then from (10.7.19)

    b - A x_k = U_{k+1} t_{k+1},   t_{k+1} = \beta_1 e_1 - B_k y_k.   (10.7.22)

Using the orthogonality of U_{k+1} and V_k, which holds in exact arithmetic, it follows that \|b - A x_k\|_2 is minimized over all x_k \in span(V_k) by taking y_k to be the solution of the least squares problem

    \min_{y_k} \|B_k y_k - \beta_1 e_1\|_2.   (10.7.23)

This forms the basis for the algorithm LSQR. Note the special form of the right-hand side, which holds because the starting vector was taken as b. Now x_k = V_k y_k solves \min_{x_k \in K_k} \|Ax - b\|_2, where K_k = K_k(A^T b, A^T A). Thus mathematically LSQR generates the same sequence of approximations as Algorithm 10.7.1, CGLS.
To solve (10.7.23) stably we need the QR factorization Q_k^T B_k = R_k. This can be computed by premultiplying B_k by a sequence of Givens rotations, which are also applied to the right-hand side e_1,

    G_{k,k+1} G_{k-1,k} \cdots G_{12}\,(B_k \;\; e_1) = \begin{pmatrix} R_k & d_k \\ 0 & \bar\phi_{k+1} \end{pmatrix}.

Here the rotation G_{j,j+1} is used to zero the element \beta_{j+1}. It is easily verified that R_k is an upper bidiagonal matrix. The least squares solution y_k and the norm of the corresponding residual are then obtained from

    R_k y_k = \beta_1 d_k,   \|b - A x_k\|_2 = \beta_1 |\bar\phi_{k+1}|.

Note that the whole vector y_k differs from y_{k-1}. An updating formula for x_k can be derived using an idea due to Paige and Saunders. With W_k = V_k R_k^{-1} we can write

    x_k = x_0 + V_k y_k = x_0 + \beta_1 V_k R_k^{-1} d_k = x_0 + \beta_1 W_k d_k
        = x_0 + \beta_1 (W_{k-1}, w_k)\begin{pmatrix} d_{k-1} \\ \phi_k \end{pmatrix} = x_{k-1} + \beta_1 \phi_k w_k.   (10.7.24)

Consider now the minimum norm problem for a consistent system Ax = b. Let L_k be the lower bidiagonal matrix formed by the first k rows of B_k,

    L_k = \begin{pmatrix}
            \alpha_1 &          &        &          \\
            \beta_2  & \alpha_2 &        &          \\
                     & \ddots   & \ddots &          \\
                     &          & \beta_k & \alpha_k
          \end{pmatrix} \in R^{k \times k}.   (10.7.25)

The relations (10.7.20) can now be rewritten as

    A V_k = U_k L_k + \beta_{k+1} u_{k+1} e_k^T,   A^T U_k = V_k L_k^T.   (10.7.26)

The iterates x_k in Craig's method can be computed as

    L_k z_k = \beta_1 e_1,   x_k = V_k z_k.   (10.7.27)

Using (10.7.26) and (10.7.16) it follows that the residual vector satisfies

    r_k = b - A V_k z_k = -\beta_{k+1} u_{k+1} (e_k^T z_k) = -\beta_{k+1} \zeta_k u_{k+1},

and hence U_k^T r_k = 0. It can be shown that if r_{k-1} \ne 0 then \zeta_k \ne 0. Hence the vectors z_k and x_k can be formed recursively using

    \zeta_k = -\frac{\beta_k}{\alpha_k}\,\zeta_{k-1},   x_k = x_{k-1} + \zeta_k v_k.

10.7.5 Arnoldi's Method and GMRES

A serious drawback of methods based on the normal equations is that they often converge very slowly, which is related to the fact that \kappa(A^T A) = \kappa^2(A). We now consider a method for general nonsymmetric systems based instead on the Arnoldi process (see Section 9.9.6). Given a starting vector v_1 this process computes an orthogonal basis for the Krylov subspace K_k(v_1, A). In exact arithmetic the result after k steps is a matrix V_k = (v_1, \ldots, v_k) with orthogonal columns, and a related Hessenberg matrix H_k = (h_{ij}) \in R^{k \times k}.
In the following implementation of the Arnoldi process the orthogonalization is performed by the modified Gram-Schmidt method.

Algorithm 10.7.3 The Arnoldi Process

    for j = 1 : k do
        z_j = A v_j;
        for i = 1 : j do
            h_{ij} = z_j^T v_i;
            z_j = z_j - h_{ij} v_i;
        end
        h_{j+1,j} = \|z_j\|_2;
        if |h_{j+1,j}| < \epsilon, break; end
        v_{j+1} = z_j / h_{j+1,j};
    end
By construction it is easily seen that if we take

    v_1 = r_0/\beta_1,   r_0 = b - Ax_0,   \beta_1 = \|r_0\|_2,

then in Arnoldi's process we have

    v_k \in span(r_0, A r_0, \ldots, A^{k-1} r_0) = K_k(r_0, A),

and the vectors v_1, \ldots, v_k form an orthogonal basis of the Krylov subspace K_k(r_0, A). Further we have

    A V_k = V_k H_k + h_{k+1,k} v_{k+1} e_k^T.

The Arnoldi process breaks down at step j if and only if the vector z_j vanishes. When this happens we have h_{k+1,k} = 0 and hence A V_k = V_k H_k, so that R(V_k) is a right invariant subspace of A.
In the generalized minimum residual (GMRES) method we seek at step j = k an approximate solution of the form

    x_k = x_0 + V_k y_k \in x_0 + K_k(r_0, A).   (10.7.28)

If we write the Arnoldi relation above as A V_k = V_{k+1} \bar H_k, where

    \bar H_k = \begin{pmatrix}
                 h_{11} & h_{12} & \cdots    & h_{1k}    \\
                 h_{21} & h_{22} & \cdots    & h_{2k}    \\
                        & \ddots & \ddots    & \vdots    \\
                        &        & h_{k,k-1} & h_{kk}    \\
                        &        &           & h_{k+1,k}
               \end{pmatrix} \in R^{(k+1)\times k},

then the residual r_k = b - A x_k can be expressed as

    r_k = r_0 - A V_k y_k = \beta_1 v_1 - V_{k+1} \bar H_k y_k = V_{k+1}\bigl(\beta_1 e_1 - \bar H_k y_k\bigr).   (10.7.29)

Since (in exact arithmetic) V_{k+1} has orthogonal columns, \|r_k\|_2 is minimized by taking y_k to be the solution of the least squares problem

    \min_{y_k} \|\beta_1 e_1 - \bar H_k y_k\|_2.   (10.7.30)

The QR factorization of the Hessenberg matrix \bar H_k can be obtained by a sequence of k plane rotations, and we write

    Q_k^T(\bar H_k \;\; e_1) = \begin{pmatrix} R_k & d_k \\ 0 & \bar\tau_{k+1} \end{pmatrix},   Q_k^T = G_{k,k+1} G_{k-1,k} \cdots G_{12},   (10.7.31)

where G_{j,j+1} is chosen to zero the subdiagonal element h_{j+1,j}. Then the solution to (10.7.30) is obtained from the triangular system

    R_k y_k = \beta_1 d_k.
Since \bar H_{k-1} determines the first k-1 Givens rotations, and \bar H_k is obtained from \bar H_{k-1} by adding the kth column, it is possible to update the QR factorization (10.7.31) at each step of the Arnoldi process. To derive the updating formulas for step j = k we write

    Q_k^T \bar H_k = G_{k,k+1} \begin{pmatrix} Q_{k-1}^T & 0 \\ 0 & 1 \end{pmatrix}
                     \begin{pmatrix} \bar H_{k-1} & h_k \\ 0 & h_{k+1,k} \end{pmatrix}
                   = G_{k,k+1} \begin{pmatrix} R_{k-1} & Q_{k-1}^T h_k \\ 0 & h_{k+1,k} \end{pmatrix}
                   = \begin{pmatrix} R_k \\ 0 \end{pmatrix},   (10.7.32)

where, since G_{k,k+1} does not affect R_{k-1},

    h_k = (h_{1k}, \ldots, h_{kk})^T.

The rotation G_{k,k+1} and the kth column of R_k are determined from

    R_k = \begin{pmatrix} R_{k-1} & r_k \\ 0 & \gamma_k \end{pmatrix},
    \quad Q_{k-1}^T h_k = G_{k-1,k} \cdots G_{12}\, h_k = \begin{pmatrix} r_k \\ \bar\gamma_k \end{pmatrix},
    \quad G_{k,k+1} \begin{pmatrix} \bar\gamma_k \\ h_{k+1,k} \end{pmatrix} = \begin{pmatrix} \gamma_k \\ 0 \end{pmatrix}.   (10.7.33)

For the residual r_k = b - A x_k it holds that

    \|r_k\|_2 = \beta_1 |\bar\tau_{k+1}|.   (10.7.34)

(Note that Q_k^T, which is a full upper Hessenberg matrix, should not be explicitly formed.)
Proceeding similarly with the right-hand side, we have

    Q_k^T e_1 = G_{k,k+1} \begin{pmatrix} d_{k-1} \\ \bar\tau_k \\ 0 \end{pmatrix}
              = \begin{pmatrix} d_{k-1} \\ \tau_k \\ \bar\tau_{k+1} \end{pmatrix}
              \equiv \begin{pmatrix} d_k \\ \bar\tau_{k+1} \end{pmatrix}.   (10.7.35)

(Note that the different dimensions of the unit vectors e_1 above are not indicated in the notation.) The first k-1 elements of Q_k^T e_1 are not affected.

The new approximate solution could be obtained from (10.7.28), but note that the whole vector y_k differs from y_{k-1}. Since \|r_k\|_2 = \beta_1|\bar\tau_{k+1}| is available without forming x_k, this expensive operation can be delayed until \|r_k\|_2 is small enough.
An updating formula for x_k can be derived using the same trick as in (10.7.24). There we set W_k R_k = V_k, which can be written

    (W_{k-1}, w_k) \begin{pmatrix} R_{k-1} & r_k \\ 0 & \gamma_k \end{pmatrix} = (V_{k-1}, v_k).

Equating the first block columns we get W_{k-1} R_{k-1} = V_{k-1}, which shows that the first k-1 columns of W_k equal W_{k-1}. Equating the last columns we get

    w_k = (v_k - W_{k-1} r_k)/\gamma_k,   (10.7.36)

which determines w_k. Note that if this formula is used we only need the last column of the matrix R_k. (We now need to save W_k but not R_k.)
The steps of the resulting GMRES algorithm can now be summarized as follows (a simplified Matlab sketch is given after the list):
1. Obtain the last column of \bar H_k from the Arnoldi process and apply the old rotations, g_k = G_{k-1,k} \cdots G_{12} h_k.
2. Determine the rotation G_{k,k+1} and the new column of R_k, i.e., r_k and \gamma_k, according to (10.7.33). This also determines \tau_k and \|r_k\|_2 = \beta_1|\bar\tau_{k+1}|.
3. Compute the new column w_k using (10.7.36) and x_k from (10.7.24).
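The sketch below is deliberately simpler than the updating scheme just described: it solves the small least squares problem (10.7.30) anew in every step with Matlab's backslash instead of updating the QR factorization, and stores all basis vectors. It is an illustration only; the function name and stopping rule are assumptions, not from the text.

    % Simple GMRES(m) step with modified Gram-Schmidt Arnoldi.
    function x = gmres_simple(A, b, x0, tol, m)
      r0 = b - A*x0;  beta1 = norm(r0);
      n = length(b);
      V = zeros(n, m+1);  H = zeros(m+1, m);
      V(:,1) = r0/beta1;
      for k = 1:m
        z = A*V(:,k);
        for i = 1:k
          H(i,k) = V(:,i)'*z;
          z = z - H(i,k)*V(:,i);
        end
        H(k+1,k) = norm(z);
        e1 = zeros(k+1,1);  e1(1) = beta1;
        y  = H(1:k+1,1:k) \ e1;              % least squares problem (10.7.30)
        if norm(e1 - H(1:k+1,1:k)*y) <= tol || H(k+1,k) == 0
          x = x0 + V(:,1:k)*y;  return;      % converged, or exact solution found
        end
        V(:,k+1) = z/H(k+1,k);
      end
      x = x0 + V(:,1:m)*y;                   % restart from x if not yet converged
    end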
In exact arithmetic GMRES cannot break down and will give the exact solution in at most n steps. If at some step the vector z_k in the Arnoldi process vanishes, then h_{k+1,k} = 0 and the vector v_{k+1} cannot be constructed. However, then A V_k = V_k H_k, where H_k is the matrix consisting of the first k rows of \bar H_k. This means that x_k = x_0 + V_k y_k, with y_k = H_k^{-1}\beta_1 e_1, is the exact solution. It follows that GMRES terminates in at most n steps.
Suppose that the matrix A is diagonalizable, A = X \Lambda X^{-1}, where \Lambda = diag(\lambda_i). Then, using the property that the GMRES approximations minimize the Euclidean norm of the residual r_k = b - A x_k over the Krylov subspace K_k(r_0, A), it can be shown that

    \frac{\|r_k\|_2}{\|r_0\|_2} \le \kappa_2(X)\,\min_{q_k} \max_{i=1,2,\ldots,n} |q_k(\lambda_i)|,   (10.7.37)

where q_k is a polynomial of degree \le k with q_k(0) = 1. The proof is similar to the convergence proof for the conjugate gradient method in Section 10.6.2. This result looks similar to that obtained for the symmetric case. However, because of the factor \kappa_2(X) in (10.7.37), the rate of convergence can no longer be deduced from the spectrum \{\lambda_i\} of the matrix A alone. In the special case that A is normal we have \kappa_2(X) = 1, and the convergence can be determined via the complex approximation problem

    \min_{q_k} \max_{i=1,2,\ldots,n} |q_k(\lambda_i)|,   q_k(0) = 1.

Because complex approximation problems are harder than real ones, no simple results are available even for this special case.
If GMRES is applied to a real symmetric system, it can be implemented with three-term recurrences, which avoids the necessity to store all basis vectors v_j. This leads to the method MINRES of Paige and Saunders mentioned in Section 10.6.4. (However, a version of GMRES which uses the three-term recurrence but stores all the Lanczos vectors has been shown to be less affected by rounding errors than MINRES.)
In practice the number of steps taken by GMRES must be limited because of memory and computational complexity. This can be achieved by restarting GMRES after every m iterations; we denote the corresponding algorithm GMRES(m). Note that the storage requirement grows linearly with m, and the computational requirement quadratically with m. In practice typically m \in [5, 20]. GMRES(m) cannot break down in exact arithmetic before the true solution has been produced. Unfortunately, it may never converge for m < n.

10.7.6 Lanczos Bi-orthogonalization

The GMRES method was based on an algorithm for reducing a nonsymmetric matrix to Hessenberg form by an orthogonal similarity. Another generalization, proposed by Lanczos in [11], is based on the reduction of A \in C^{n \times n} to tridiagonal form by a general similarity transformation. As discussed in Wilkinson [19, pp. 388-405], this reduction can be performed in two steps. First an orthogonal similarity is used to reduce A to lower Hessenberg form. Second, the appropriate elements in the lower triangular half are zeroed column by column using a sequence of similarity transformations by elimination matrices of the form in (6.3.15),

    H := (I - l_j e_j^T)\, H\, (I + l_j e_j^T),   j = 1, \ldots, n-1.

In this step row pivoting cannot be used, since this would destroy the lower Hessenberg structure. As a consequence, the reduction will fail if a zero pivot element is encountered. Hence the stability of this process cannot be guaranteed, and the reduction is in general not advisable.
Assume that A can be reduced to tridiagonal form

    W^T A V = T_n = \begin{pmatrix}
                      \alpha_1 & \beta_2  &          &          \\
                      \gamma_2 & \alpha_2 & \ddots   &          \\
                               & \ddots   & \ddots   & \beta_n  \\
                               &          & \gamma_n & \alpha_n
                    \end{pmatrix},

where V = (v_1, \ldots, v_n) and W = (w_1, \ldots, w_n), and W^T V = I. The two vector sequences then are bi-orthogonal, i.e.,

    w_i^T v_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}   (10.7.38)

Comparing columns in A V = V T_n and A^T W = W T_n^T we find (with v_0 = w_0 = 0) the recurrence relations

    \gamma_{k+1} v_{k+1} = \tilde v_{k+1} = (A - \alpha_k I) v_k - \beta_k v_{k-1},   (10.7.39)
    \beta_{k+1} w_{k+1} = \tilde w_{k+1} = (A^T - \alpha_k I) w_k - \gamma_k w_{k-1}.   (10.7.40)

Multiplying equation (10.7.39) by w_k^T and using the bi-orthogonality we have \alpha_k = w_k^T A v_k. To satisfy the bi-orthogonality relation (10.7.38) for i = j = k+1 it suffices to choose \gamma_{k+1} and \beta_{k+1} so that

    \gamma_{k+1} \beta_{k+1} = \tilde w_{k+1}^T \tilde v_{k+1}.

Hence there is some freedom in choosing these scale factors. We now have the following algorithm for generating the two sequences of vectors v_1, v_2, \ldots and w_1, w_2, \ldots:

Algorithm 10.7.4 The Lanczos Bi-orthogonalization Process
Let v_1 and w_1 be two vectors such that w_1^T v_1 = 1. The following algorithm computes, in exact arithmetic, after k steps a tridiagonal matrix T_k = trid(\gamma_j, \alpha_j, \beta_{j+1}) and two matrices W_k and V_k with bi-orthogonal columns spanning the Krylov subspaces K_k(v_1, A) and K_k(w_1, A^T):

    w_0 = v_0 = 0;  \beta_1 = \gamma_1 = 0;
    for j = 1, 2, \ldots
        \alpha_j = w_j^T A v_j;
        \tilde v_{j+1} = A v_j - \alpha_j v_j - \beta_j v_{j-1};
        \tilde w_{j+1} = A^T w_j - \alpha_j w_j - \gamma_j w_{j-1};
        \gamma_{j+1} = |\tilde w_{j+1}^T \tilde v_{j+1}|^{1/2};
        if \gamma_{j+1} = 0 then exit;
        \beta_{j+1} = (\tilde w_{j+1}^T \tilde v_{j+1})/\gamma_{j+1};
        v_{j+1} = \tilde v_{j+1}/\gamma_{j+1};
        w_{j+1} = \tilde w_{j+1}/\beta_{j+1};
    end

Note that if A = A^T, w_1 = v_1, and we take \gamma_k = \beta_k, then the two sequences generated will be identical. The process is then equivalent to the symmetric Lanczos process.
If we denote

    V_k = (v_1, \ldots, v_k),   W_k = (w_1, \ldots, w_k),

then we have W_k^T A V_k = T_k, and the recurrences of this process can be written in matrix form as

    A V_k = V_k T_k + \gamma_{k+1} v_{k+1} e_k^T,   (10.7.41)
    A^T W_k = W_k T_k^T + \beta_{k+1} w_{k+1} e_k^T.   (10.7.42)

There are two cases in which the above algorithm breaks down. The first occurs when either \tilde v_{k+1} or \tilde w_{k+1} (or both) is null. In this case it follows from (10.7.41)-(10.7.42) that an invariant subspace has been found: if \tilde v_{k+1} = 0, then A V_k = V_k T_k and R(V_k) is an A-invariant subspace; if \tilde w_{k+1} = 0, then A^T W_k = W_k T_k^T and R(W_k) is an A^T-invariant subspace. This is called regular termination. The second case, called serious breakdown by Wilkinson, occurs when \tilde w_{k+1}^T \tilde v_{k+1} = 0 with neither \tilde v_{k+1} nor \tilde w_{k+1} null.

10.7.7 Bi-conjugate Gradient Method and QMR

We now consider the use of the nonsymmetric Lanczos process for solving a linear system Ax = b. Let x_0 be an initial approximation. Take \beta_1 v_1 = r_0, \beta_1 = \|r_0\|_2, and w_1 = v_1. We seek an approximate solution x_k such that

    x_k - x_0 = V_k y_k \in K_k(r_0, A).

For the residual we then have

    r_k = b - A x_k = \beta_1 v_1 - A V_k y_k = \beta_1 v_1 - V_k T_k y_k - \gamma_{k+1} (e_k^T y_k) v_{k+1},

where y_k is determined so that the Galerkin condition r_k \perp K_k(w_1, A^T) is satisfied. Using (10.7.41) and the bi-orthogonality condition W_k^T V_k = I, this gives

    0 = W_k^T\bigl(\beta_1 v_1 - A V_k y_k\bigr) = \beta_1 e_1 - T_k y_k.

Hence, if the matrix T_k is nonsingular, y_k is determined by solving the tridiagonal system T_k y_k = \beta_1 e_1. If A is symmetric, this method becomes the SYMMLQ method, see Section 10.6.4. We remark again that this method can break down without producing a good approximate solution to Ax = b. In case of a serious breakdown it is necessary to restart from the beginning with a new starting vector r_0. As in SYMMLQ, the matrix T_k may be singular for some k.
The nonsymmetric Lanczos process is the basis of several iterative methods for nonsymmetric systems. The method can be written in a form more like the conjugate gradient algorithm, which is called the bi-conjugate gradient or BCG method; see, e.g., Barrett et al. [pp. 21-22]. This form is obtained from the Lanczos formulation by computing the LU decomposition of T_k. Hence the BCG method cannot proceed when a singular T_k occurs, and this is an additional cause of breakdown.
There are few results known about the convergence of BCG and related methods. In practice it has been observed that sometimes convergence can be as fast as for GMRES. However, the convergence behavior can often be very irregular,

and as remarked above breakdown o urs. Sometimes, breakdown an be avoided


by a restart at the iteration step immediately before the breakdown step.
A modi ation of BCG due to Sonneveld is CGS, whi h stands for \CG
squared". Sonneveld observed that by reorganizing the BCG algorithm in a ertain
way one an obtain an algorithm for whi h, with the same amount of work, the
errors and residuals satisfy

ek = qk2 (A)e0 ; rk = qk2 (A)r0 ;


where qk is the same polynomial as in BCG. Furthermore, no matrix-ve tor produ ts
with AT are required. CGS tends to onverge faster than BCG by a fa tor between 1
and 2, and is parti ularly attra tive in ase omputations with AT are impra ti al.
The residual norms in CGS may behave very errati ally and often a stabilized
version of this method Bi-CGSTAB is preferred.
Another related method can be developed as follows. After k steps of the
nonsymmetric Lanczos process we have from the relation (10.7.41) that

    A V_k = V_{k+1} T̂_k,    T̂_k = [ T_k ; γ_{k+1} e_k^T ],

where T̂_k is a (k+1) × k tridiagonal matrix. We can now proceed as was done in
developing GMRES. If we take β_1 v_1 = r_0, then the residual associated with an
approximate solution of the form x_k = x_0 + V_k y is given by

    b - A x_k = b - A(x_0 + V_k y) = r_0 - A V_k y
              = β_1 v_1 - V_{k+1} T̂_k y = V_{k+1} (β_1 e_1 - T̂_k y).        (10.7.43)

Hence the norm of the residual vector is

    ||b - A x_k||_2 = ||V_{k+1} (β_1 e_1 - T̂_k y)||_2.

If the matrix V_{k+1} had orthonormal columns then the residual norm would equal
||β_1 e_1 - T̂_k y||_2, as in GMRES, and a least squares solution in the Krylov subspace
could be obtained by solving

    min_y ||β_1 e_1 - T̂_k y||_2

for y_k and taking x_k = x_0 + V_k y_k. This is called the Quasi-Minimal Residual
(QMR) method.
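For a small sketch of the resulting computation (not from the text): once k Lanczos steps have produced V_k and the (k+1)-by-k matrix T̂_k, the QMR iterate can be obtained in MATLAB from the small least squares problem

    e1 = zeros(k+1,1);  e1(1) = 1;
    yk = That \ (beta1*e1);      % least squares solution of min ||beta1*e1 - That*y||
    xk = x0 + V*yk;

where That, V, beta1 and x0 hold T̂_k, V_k, β_1 = ||r_0||_2 and the starting vector (illustrative variable names). Backslash applied to a rectangular matrix returns the least squares solution.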
Recent surveys on progress in iterative methods for non-symmetric systems
are given by Freund, Golub and Nachtigal [4, 1991] and Golub and van der Vorst
[7]. There is a huge variety of methods to choose from. Unfortunately, in many
practical situations it is not clear what method to select. In general there is no best
method. In [14] examples are given which show that, depending on the linear system
to be solved, each method can be a clear winner or a clear loser! Hence insight into
the characteristics of the linear system is needed in order to discriminate between
methods. This is different from the symmetric case, where the rate of convergence
can be deduced from the spectral properties of the matrix alone.


Review Questions
1. What optimality property does the residual vector r_k = b - A x_k in the GMRES
   method satisfy? In what subspace does the vector r_k - r_0 lie?
2. In Lanczos bi-orthogonalization, bases for two different Krylov subspaces are computed.
   Which subspaces, and what property do these bases have?
3. (a) The bi-conjugate gradient (BCG) method is based on the reduction of A ∈ C^{n×n}
   to tridiagonal form by a general similarity transformation.
   (b) What are the main advantages and drawbacks of the BCG method compared to
   GMRES?
   (c) How are the approximations x_k defined in QMR?

Problems
1. Derive Algorithms CGLS and CGNE by applying the conjugate gradient algorithm
   to the normal equations A^T A x = A^T b and A A^T y = b, x = A^T y, respectively.
2. Consider using GMRES to solve the system Ax = b, where

       A = [ 0  1 ; -1  0 ],    b = (1, 1)^T,

   using x_0 = 0. Show that x_1 = 0, and that therefore GMRES(1) will never produce
   a solution.
10.8 Preconditioned Iterative Methods

The term "preconditioning" dates back to Turing in 1948, and is in general taken
to mean the transformation of a problem to a form that can more efficiently be
solved. In order to be effective, iterative methods must usually be combined with a
(nonsingular) preconditioning matrix M, which in some sense is an approximation
to A. The original linear system Ax = b is then transformed by considering the
left-preconditioned system

    M^{-1} A x = M^{-1} b,                         (10.8.1)

or the right-preconditioned system

    A M^{-1} u = b,    u = M x.                    (10.8.2)

The idea is to choose M so that the rate of convergence of the iterative method is
improved. Note that the product M^{-1}A (or A M^{-1}) should never be formed. The
preconditioned iteration is instead implemented by forming matrix-vector products
with A and M^{-1} separately. Since forming u = M^{-1}v for an arbitrary vector v is
equivalent to solving a linear system Mu = v, the inverse M^{-1} is not needed either.
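As a small illustration (a sketch, not from the text; A and M are assumed to be given sparse MATLAB matrices with M nonsingular), the action of the left-preconditioned operator can be coded as a function handle, so that the product M^{-1}A is never formed:

    applyLeftPrec = @(v) M \ (A*v);   % v -> M^{-1}(A v): one product with A, one solve with M
    z = applyLeftPrec(r);             % e.g., a preconditioned residual

Each application thus costs one matrix-vector product with A and one solve with M.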


Often the rate of convergence depends on the spectrum of the transformed
matrix. Since the eigenvalues of M^{-1}A and A M^{-1} are the same, we see that the
main difference between these two approaches is that the actual residual norm is
available in the right-preconditioned case.
If A is symmetric positive definite, the preconditioned system should also
have this property. In this case it is natural to consider a split preconditioner.
Let M = L L^T, where L is the Cholesky factor of M. Then we consider the preconditioned linear system

    Ã x̃ = b̃,    Ã = L^{-1} A L^{-T},    x̃ = L^T x,    b̃ = L^{-1} b,        (10.8.3)

and the spectrum of Ã is real. Note that the spectrum of Ã = L^{-1} A L^{-T} is the
same as that of L^{-T} L^{-1} A = M^{-1} A.

10.8.1 The Preconditioned CG Method

The conjugate gradient Algorithm 10.6.1 can be applied to linear systems Ax = b,
where A is symmetric positive definite. In this case it is natural to use a split
preconditioner and consider the system (10.8.3).
In implementing the preconditioned conjugate gradient method we need to
form matrix-vector products of the form t = Ãp = L^{-1}(A(L^{-T}p)). These can be
calculated by solving two linear systems and performing one matrix multiplication
with A:

    L^T q = p,    s = A q,    L t = s.

Thus, the extra work per step in using the preconditioner essentially is to solve two
linear systems with the matrices L^T and L, respectively.
The preconditioned algorithm will have recursions for the transformed variables and residual vectors x̃ = L^T x and r̃ = L^{-1}(b - Ax). It can be simplified by
reformulating it in terms of the original variables x and the residual r = b - Ax. It is
left as an exercise to show that if we let p_k = L^{-T} p̃_k, z_k = L^{-T} r̃_k, and

    M = L L^T,                                     (10.8.4)

we can obtain the following implementation of the preconditioned conjugate
gradient method:

Algorithm 10.8.1
Preconditioned Conjugate Gradient Method

    r_0 = b - A x_0;  p_0 = z_0 = M^{-1} r_0;
    for k = 0, 1, 2, ...  while ||r_k||_2 > ε do
        w = A p_k;
        α_k = (z_k, r_k)/(p_k, w);
        x_{k+1} = x_k + α_k p_k;
        r_{k+1} = r_k - α_k w;
        z_{k+1} = M^{-1} r_{k+1};
        β_k = (z_{k+1}, r_{k+1})/(z_k, r_k);
        p_{k+1} = z_{k+1} + β_k p_k;
    end
A surprising and important feature of this version is that it depends only on
the symmetric positive definite matrix M = L L^T.
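For concreteness, here is a minimal MATLAB sketch of Algorithm 10.8.1 (the function name and the arguments tol and maxit are illustrative assumptions; A and M are assumed symmetric positive definite and sparse):

    function x = pcg_sketch(A, b, M, x, tol, maxit)
      % Preconditioned conjugate gradient method; only solves with M are needed.
      r = b - A*x;
      z = M \ r;  p = z;
      rho = z'*r;
      for k = 1:maxit
        if norm(r) <= tol, break; end
        w = A*p;
        alpha = rho/(p'*w);
        x = x + alpha*p;
        r = r - alpha*w;
        z = M \ r;
        rho_new = z'*r;
        beta = rho_new/rho;  rho = rho_new;
        p = z + beta*p;
      end
    end

In practice the solve M \ r would be replaced by two triangular solves with stored (incomplete) Cholesky factors of M.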
The rate of convergence in the Ã-norm depends on κ(Ã), see (10.6.12). Note,
however, that

    ||x̃ - x̃_k||_Ã^2 = (x̃ - x̃_k)^T L^{-1} A L^{-T} (x̃ - x̃_k) = ||x - x_k||_A^2,

so the rate of convergence in the A-norm of the error in x also depends on κ(Ã). The
preconditioned conjugate gradient method will have rapid convergence if one or
both of the following conditions are satisfied:
(i)  M^{-1}A has a small condition number, or
(ii) M^{-1}A has only a few distinct eigenvalues.
For symmetric indefinite systems SYMMLQ can be combined with a positive
definite preconditioner M. To solve the symmetric indefinite system Ax = b the
preconditioner is regarded as having the form M = L L^T and SYMMLQ is implicitly
applied to the system

    L^{-1} A L^{-T} w = L^{-1} b.

The algorithm accumulates approximations to the solution x = L^{-T} w, without approximating w. A MATLAB implementation of this algorithm, which only requires
solves with M, is given by Gill et al. [5].

10.8.2 Preconditioned CGLS and CGNE

To precondition CGLS (Algorithm 10.7.3) it is natural to use a right preconditioner
S ∈ R^{n×n}, i.e., perform the transformation of variables

    min_y ||(A S^{-1}) y - b||_2,    S x = y.

(Note that for an inconsistent system Ax = b a left preconditioner would change
the problem.) If we apply CGLS to this problem and formulate the algorithm in
terms of the original variables x, we obtain the following algorithm:


Algorithm 10.8.2
Preconditioned CGLS

    r_0 = b - A x_0;  p_0 = s_0 = S^{-T}(A^T r_0);
    for k = 0, 1, ...  while ||r_k||_2 > ε do
        t_k = S^{-1} p_k;
        q_k = A t_k;
        α_k = ||s_k||_2^2 / ||q_k||_2^2;
        x_{k+1} = x_k + α_k t_k;
        r_{k+1} = r_k - α_k q_k;
        s_{k+1} = S^{-T}(A^T r_{k+1});
        β_k = ||s_{k+1}||_2^2 / ||s_k||_2^2;
        p_{k+1} = s_{k+1} + β_k p_k;
    end
For solving a consistent underdetermined system we can derive a preconditioned version of CGNE. Here it is natural to use a left preconditioner S, and apply
CGNE to the problem

    min ||x||_2,    S^{-1} A x = S^{-1} b,

i.e., the residual vectors are transformed. If the algorithm is formulated in terms of
the original residuals, the following algorithm results:

Algorithm 10.8.3
Preconditioned CGNE

    r_0 = b - A x_0;  z_0 = S^{-1} r_0;  p_0 = A^T (S^{-T} z_0);
    for k = 0, 1, ...  while ||r_k||_2 > ε do
        α_k = ||z_k||_2^2 / ||p_k||_2^2;
        x_{k+1} = x_k + α_k p_k;
        r_{k+1} = r_k - α_k A p_k;
        z_{k+1} = S^{-1} r_{k+1};
        β_k = ||z_{k+1}||_2^2 / ||z_k||_2^2;
        p_{k+1} = A^T (S^{-T} z_{k+1}) + β_k p_k;
    end


Algorithm PCCGLS still minimizes the error functional ||r̂ - r^{(k)}||_2, where
r = b - Ax, but over a different Krylov subspace

    x^{(k)} ∈ x^{(0)} + K_k,    K_k = K_k(S^{-1} S^{-T} A^T A,  S^{-1} S^{-T} A^T r_0).

Algorithm PCCGNE minimizes the error functional ||x̂ - x^{(k)}||_2 over the Krylov
subspace

    x^{(k)} ∈ x^{(0)} + K_k,    K_k = K_k(A^T S^{-T} S^{-1} A,  A^T S^{-T} S^{-1} r_0).

The rate of convergence for PCCGLS depends on κ(A S^{-1}), and for PCCGNE on
κ(S^{-1} A) = κ(A^T S^{-T}).

10.8.3 Preconditioned GMRES

For nonsymmetric linear systems there are two options for applying the preconditioner. We can use the left-preconditioned system (10.8.1) or the right-preconditioned system (10.8.2). (If A is almost symmetric positive definite, then a split
preconditioner might also be considered.) The changes to the GMRES algorithm
are small.
In the case of a left preconditioner M only the following changes in the
Arnoldi algorithm are needed. We start the recursion with

    r_0 = M^{-1}(b - A x_0),    β_1 = ||r_0||_2,    v_1 = r_0/β_1,

and define

    z_j = M^{-1} A v_j,    j = 1, 2, ..., k.

All computed residual vectors will be preconditioned residuals M^{-1}(b - A x_m). This
is a disadvantage, since most stopping criteria are based on the actual residuals
r_m = b - A x_m. In this left-preconditioned version the transformed residual
norm ||M^{-1}(b - Ax)||_2 will be minimized over all vectors of the form

    x_0 + K_m(M^{-1} r_0, M^{-1} A).                         (10.8.5)

In the right-preconditioned version of GMRES the actual residual vectors are
used, but the variables are transformed according to u = Mx (x = M^{-1}u). The
right-preconditioned algorithm can easily be modified to give the untransformed
solution. We have

    z_j = A M^{-1} v_j,    j = 1, 2, ..., k.

The kth approximation is x_k = x_0 + M^{-1} V_k y_k, where y_k solves

    min_{y_k} ||β_1 e_1 - H̄_k y_k||_2.

As before this can be written as

    x_k = x_{k-1} + β_1 τ_k M^{-1} w_k,    w_k = R_k y_k,

see (10.7.24).
In the right-preconditioned version the residual norm ||b - A M^{-1} u||_2 will be
minimized over all vectors of the form u_0 + K_m(r_0, A M^{-1}). However, this is
equivalent to minimizing ||b - Ax||_2 over all vectors of the form

    x_0 + M^{-1} K_m(r_0, A M^{-1}).                         (10.8.6)

Somewhat surprisingly, the two affine subspaces (10.8.5) and (10.8.6) are the same!
The jth vectors in the two Krylov subspaces are w_j = (M^{-1}A)^j M^{-1} r_0 and w̃_j =
M^{-1}(A M^{-1})^j r_0. By a simple induction proof it can be shown that M^{-1}(A M^{-1})^j =
(M^{-1}A)^j M^{-1}, and so w̃_j = w_j, j ≥ 0. Hence the left- and right-preconditioned
versions generate approximations in the same Krylov subspaces, and they differ
only with respect to which error norm is minimized.
For the case when A is diagonalizable, A = X Λ X^{-1}, where Λ = diag(λ_i), we
proved the error estimate

    ||r_k||_2 / ||r_0||_2  ≤  κ_2(X)  min_{q_k}  max_{i=1,2,...,n} |q_k(λ_i)|,        (10.8.7)

where q_k is a polynomial of degree ≤ k and q_k(0) = 1. Because of the factor κ_2(X)
in (10.8.7) the rate of convergence can no longer be deduced from the spectrum
{λ_i} of the matrix A alone. Since the spectra of M^{-1}A and A M^{-1} are the same,
we can expect that the convergence behavior will be similar if A is close to
normal.
10.9 Preconditioners

A preconditioner should typically satisfy the following conditions:

(i)   M^{-1}A = I + R, where ||R|| is small;
(ii)  linear systems of the form Mu = v should be easy to solve;
(iii) nz(M) ≈ nz(A).

Condition (i) implies fast convergence, (ii) that the arithmetic cost of preconditioning is reasonable, and (iii) that the storage overhead is not too large. Obviously
these conditions are contradictory and a compromise must be sought. For example,
taking M = A is optimal in the sense of (i), but obviously this choice is ruled out
by (ii).
The choice of preconditioner is strongly problem dependent and possibly the
most crucial component in the success of an iterative method! A preconditioner
which is expensive to compute may become viable if it is to be used many times,
as may be the case, e.g., when dealing with time-dependent or nonlinear problems.
It is also dependent on the architecture of the computing system. Preconditioners
that are efficient in a scalar computing environment may show poor performance
on vector and parallel machines.


10.9.1 Preconditioners from Matrix Splittings

The stationary iterative method

    x^{(k+1)} = x^{(k)} + M^{-1}(b - A x^{(k)}),    k = 0, 1, ...,        (10.9.1)

corresponds to a matrix splitting A = M - N, and the iteration matrix

    B = M^{-1} N = I - M^{-1} A.

The iteration (10.9.1) can be considered as a fixed point iteration applied to the
preconditioned system M^{-1}Ax = M^{-1}b. Hence, the basic iterative methods considered in Section 10.2 give simple examples of preconditioners.
The Jacobi and Gauss-Seidel methods are both special cases of one-step stationary iterative methods. Using the standard splitting A = D_A - L_A - U_A, where
D_A is diagonal and L_A and U_A are strictly lower and upper triangular, these methods
correspond to the matrix splittings

    M_J = D_A,    and    M_GS = D_A - L_A.

If A is symmetric positive definite then M_J = D_A > 0 and symmetric. However,
M_GS is lower triangular and unsymmetric.
The simplest choice related to this splitting is to take M = D_A. This
corresponds to a diagonal scaling of the rows of A, such that the scaled matrix
M^{-1}A = D_A^{-1} A has a unit diagonal. For s.p.d. matrices symmetry can be preserved by using a split preconditioner with L = L^T = D_A^{1/2}. In this case it can be
shown that this is close to the optimal diagonal preconditioning.

Lemma 10.9.1 (Van der Sluis [1969]).
Let A = D_A - L_A - L_A^T be a symmetric positive definite matrix. Then if A
has at most q nonzero elements in any row it holds that

    κ(D_A^{-1/2} A D_A^{-1/2})  ≤  q · min_{D > 0} κ(D A D).

Although diagonal scaling may give only a modest improvement in the rate of
convergence, it is trivial to implement and is therefore recommended even if no other
preconditioning is carried out.
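A two-line MATLAB sketch of this symmetric diagonal scaling (assuming A is a sparse s.p.d. matrix, so that its diagonal is positive; the variable names are illustrative, not from the text) is:

    n  = size(A,1);
    Ls = spdiags(sqrt(diag(A)), 0, n, n);   % L = L^T = D_A^{1/2}
    As = Ls \ A / Ls;                       % D_A^{-1/2} A D_A^{-1/2} has unit diagonal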
In Section 10.3.2 it was shown that for a symmetric matrix A the SSOR iteration method corresponds to a splitting with the matrix

    M_SSOR = 1/(ω(2 - ω)) (D_A - ω L_A) D_A^{-1} (D_A - ω U_A).

Since M_SSOR is given in the form of an LDL^T factorization it is easy to solve linear
systems involving this preconditioner. It also has the same sparsity as the original
matrix A. For 0 < ω < 2, if A is s.p.d. so is M_SSOR.
The performance of the SSOR splitting turns out to be fairly insensitive to
the choice of ω. For systems arising from second order boundary value problems,
e.g., the model problem studied previously, the original condition number κ(A) =
O(h^{-2}) can be reduced to κ(M^{-1}A) = O(h^{-1}). Taking ω = 1 is often close to
optimal. This corresponds to the symmetric Gauss-Seidel (SGS) preconditioner

    M_SGS = (D_A - L_A) D_A^{-1} (D_A - U_A).
All the above preconditioners satisfy conditions (ii) and (iii). The application
of the preconditioners involves only triangular solves and multiplication with a
diagonal matrix. They are all defined in terms of elements of the original matrix
A, and hence do not require extra storage. However, they may not be very effective
with respect to condition (i).
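As an illustration of how cheap these preconditioners are to apply, the following MATLAB sketch (assuming A is stored as a sparse matrix with nonzero diagonal; the names are not from the text) applies the SGS preconditioner by one forward solve, one diagonal scaling and one backward solve:

    n  = size(A,1);
    D  = spdiags(diag(A), 0, n, n);        % D_A
    Lo = tril(A);                          % D_A - L_A
    Up = triu(A);                          % D_A - U_A
    applySGS = @(r) Up \ (D*(Lo \ r));     % z = M_SGS^{-1} r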

10.9.2 Incomplete LU Factorizations

The SGS preconditioner has the form M_SGS = LU, where L = I - L_A D_A^{-1} is
unit lower triangular and U = D_A - U_A is upper triangular. To find out how well M_SGS
approximates A we form the defect matrix

    A - LU = (D_A - L_A - U_A) - (D_A - L_A) D_A^{-1} (D_A - U_A) = -L_A D_A^{-1} U_A.

An interesting question is whether we can find matrices L and U with the same
nonzero structure as above, but with a smaller defect matrix R = LU - A.
We now develop an important class of preconditioners obtained from so-called
incomplete LU factorizations of A. The idea is to compute a lower triangular
matrix L and an upper triangular matrix U with a prescribed sparsity structure
such that

    A = LU - R,

with R small. Such incomplete LU factorizations can be realized by performing
a modified Gaussian elimination on A, in which elements are allowed only in
specified places in the L and U matrices. We assume that these places (i, j) are
given by the index set

    P ⊂ P_n ≡ {(i, j) | 1 ≤ i, j ≤ n},

where the diagonal positions are always included in P. For example, we could take
P = P_A, the set of nonzero elements in A.
Algorithm 10.9.1
Incomplete LU Factorization

    for k = 1, ..., n-1
        for i = k+1, ..., n
            if (i, k) ∈ P then l_{ik} = a_{ik}/a_{kk};
            for j = k+1, ..., n
                if (k, j) ∈ P then a_{ij} = a_{ij} - l_{ik} a_{kj};
            end
        end
    end

The elimination consists of n-1 steps. In the kth step we first subtract from
the current matrix the elements in positions (i, k) and (k, i) not belonging to P and place
them in a defect matrix R_k. We then carry out the kth step of Gaussian elimination on the so
modified matrix. This process can be expressed as follows. Let A_0 = A and

    Ã_k = A_{k-1} + R_k,    A_k = L_k Ã_k,    k = 1, ..., n-1.

Applying this relation recursively we obtain

    A_{n-1} = L_{n-1} Ã_{n-1} = L_{n-1} A_{n-2} + L_{n-1} R_{n-1}
            = L_{n-1} L_{n-2} A_{n-3} + L_{n-1} L_{n-2} R_{n-2} + L_{n-1} R_{n-1}
            = L_{n-1} L_{n-2} ... L_1 A + L_{n-1} L_{n-2} ... L_1 R_1
              + ... + L_{n-1} L_{n-2} R_{n-2} + L_{n-1} R_{n-1}.

We further notice that since the first m-1 rows of R_m are zero it follows that
L_k R_m = R_m if k < m. Then by combining the above equations we find LU = A + R,
where

    U = A_{n-1},    L = (L_{n-1} L_{n-2} ... L_1)^{-1},    R = R_1 + R_2 + ... + R_{n-1}.

Algorithm 10.9.1 can be improved by noting that any elements in the resulting
(n-k) × (n-k) lower part of the reduced matrix not in P need not be carried
along and can also be included in the defect matrix R_k. This is achieved simply by
changing line five in the algorithm to

    if (k, j) ∈ P and (i, j) ∈ P then a_{ij} = a_{ij} - l_{ik} a_{kj};

In practice the matrix A is sparse and the algorithm should be specialized to take
this into account. In particular, a version where A is processed a row at a time is
more convenient for general sparse matrices. Such an algorithm can be derived by
interchanging the k and i loops in Algorithm 10.9.1.
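A minimal MATLAB sketch of the level 0 variant (P equal to the nonzero pattern of A, with the improved test above; dense loops are used for clarity rather than efficiency, and the function name is an illustrative assumption) is:

    function [L, U] = ilu0_sketch(A)
      % Incomplete LU factorization restricted to the sparsity pattern of A.
      n = size(A,1);
      P = (A ~= 0);                              % allowed positions
      for k = 1:n-1
        for i = k+1:n
          if P(i,k)
            A(i,k) = A(i,k)/A(k,k);              % multiplier l_ik, stored in place
            for j = k+1:n
              if P(i,j) && P(k,j)
                A(i,j) = A(i,j) - A(i,k)*A(k,j); % update only inside the pattern
              end
            end
          end
        end
      end
      L = tril(A,-1) + speye(n);                 % unit lower triangular factor
      U = triu(A);                               % upper triangular factor
    end

When (i, k) is not in the pattern the multiplier l_{ik} is zero, so the corresponding updates are simply skipped. The factors are then used as a preconditioner through two sparse triangular solves, z = U \ (L \ r).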

Example 10.9.1. For the model problem using a five-point approximation the
nonzero structure of the resulting matrix is given by

    P_A = {(i, j) | i - j = 0, ±1, ±n}.

Let us write A = LU + R, where

    L = L_{-n} + L_{-1} + L_0,    U = U_0 + U_1 + U_n,

where L_{-k} (and U_k) denote matrices with nonzero elements only in the kth lower
(upper) diagonal. By the rule for multiplication by diagonals (see Problem 6.1.6),

    A_k B_l = C_{k+l},    if |k + l| ≤ n - 1,

we can form the product

    LU = (L_{-n} + L_{-1} + L_0)(U_0 + U_1 + U_n)
       = (L_{-n} U_n + L_{-1} U_1 + L_0 U_0) + L_{-n} U_0 + L_{-1} U_0 + L_0 U_n + L_0 U_1 + R,

where R = L_{-n} U_1 + L_{-1} U_n. Hence the defect matrix R has nonzero elements only
in two extra diagonals.
A family of preconditioners can be derived by different choices of the set P.
The simplest choice is to take P equal to the sparsity pattern of A. This is called
a level 0 incomplete factorization. A level 1 incomplete factorization is obtained by
using the union of P and the pattern of the defect matrix R = LL^T - A. Higher
level incomplete factorizations are defined in a similar way.
An incomplete LU factorization may not exist even if A is nonsingular and
has an LU factorization. However, for some more restricted classes of matrices the
existence of incomplete factorizations can be guaranteed. Many matrices arising
from the discretization of partial differential equations have the following property.

Definition 10.9.2. A matrix A = (a_{ij}) is an M-matrix if a_{ij} ≤ 0 for i ≠ j, A is
nonsingular, and A^{-1} ≥ 0.

The following result was proved by Meijerink and van der Vorst [13, 1977].

Theorem 10.9.3.
If A is an M-matrix, there exist for every set P, such that (i, j) ∈ P for
i = j, uniquely defined lower and upper triangular matrices L and U with l_{ij} = 0
or u_{ij} = 0 if (i, j) ∉ P, such that the splitting A = LU - R is regular.
In case A is s.p.d., we define similarly an incomplete Cholesky factorization. Here the nonzero set P is assumed to be symmetric, i.e., if (i, j) ∈ P then
also (j, i) ∈ P. Positive definiteness of A alone is not sufficient to guarantee the
existence of an incomplete Cholesky factorization. This is because zero elements
may occur on the diagonal during the factorization.
For the case when A is a symmetric M-matrix, a variant of the above theorem
guarantees the existence, for each symmetric set P such that (i, j) ∈ P for i = j,
of a uniquely defined lower triangular matrix L, with l_{ij} = 0 if (i, j) ∉ P, such that
the splitting A = LL^T - R is regular. In particular the matrix arising from the
model problem is a symmetric M-matrix. A symmetric M-matrix is also called a
Stieltjes matrix.
An implementation of the incomplete Cholesky factorization in the general
case is given below.


Algorithm 10.9.2
Incomplete Cholesky Factorization

    for j = 1, 2, ..., n
        l_{jj} = ( a_{jj} - Σ_{k=1}^{j-1} l_{jk}^2 )^{1/2}
        for i = j+1, ..., n
            if (i, j) ∉ P then l_{ij} = 0
            else l_{ij} = ( a_{ij} - Σ_{k=1}^{j-1} l_{ik} l_{jk} ) / l_{jj}
        end
    end

10.9.3 Block Incomplete Factorizations

Many matrices arising from the discretization of multidimensional problems have a
block structure. For such matrices one can generalize the above idea and develop
block incomplete factorizations. In particular, we consider here a symmetric
positive definite block tridiagonal matrix of the form

        ( D_1   A_2^T                     )
        ( A_2   D_2    A_3^T              )
    A = (       A_3     .       .         )  =  D_A - L_A - L_A^T,        (10.9.2)
        (               .       .   A_N^T )
        (                      A_N   D_N  )

with square diagonal blocks D_i. For the model problem with the natural ordering
of mesh points we obtain this form with A_i = -I, D_i = tridiag(-1, 4, -1). If
systems with D_i can be solved efficiently, a simple choice of preconditioner is the
block diagonal preconditioner

    M = diag(D_1, D_2, ..., D_N).
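A short MATLAB sketch of applying this block diagonal preconditioner (assuming, for illustration, that the diagonal blocks are stored in a cell array Dblk and all have the same order m, so that r has length N*m; the names are not from the text):

    z = zeros(size(r));
    for i = 1:N
      idx = (i-1)*m + (1:m);       % indices of block i
      z(idx) = Dblk{i} \ r(idx);   % one solve with D_i per block
    end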
The case N = 2 is of special interest. For the system

    ( D_1   A_2^T ) ( x_1 )   ( b_1 )
    ( A_2   D_2   ) ( x_2 ) = ( b_2 ),                                    (10.9.3)

the block diagonal preconditioner gives a preconditioned matrix of the form

    M^{-1} A = (       I          D_1^{-1} A_2^T )
               ( D_2^{-1} A_2           I        ).

Note that this matrix is of the form (10.3.4) and therefore has property A. Suppose the conjugate gradient method is used with this preconditioner and a starting
approximation x_1^{(0)}. If we set

    x_2^{(0)} = D_2^{-1}(b_2 - A_2 x_1^{(0)}),

then the corresponding residual r_2^{(0)} = b_2 - A_2 x_1^{(0)} - D_2 x_2^{(0)} = 0. It can be shown that
in the following steps of the conjugate gradient method we will alternately have

    r_2^{(2k)} = 0,    r_1^{(2k+1)} = 0,    k = 0, 1, 2, ....

This can be used to save about half the work.


If we eliminate x_1 from the system (10.9.3) we obtain

    S x_2 = b_2 - A_2 D_1^{-1} b_1,    S = D_2 - A_2 D_1^{-1} A_2^T,        (10.9.4)

where S is the Schur complement of D_1 in A. If A is s.p.d., then S is also s.p.d.,
and hence the conjugate gradient method can be used to solve the system (10.9.4). This
process is called Schur complement preconditioning. Here it is not necessary
to form the Schur complement S, since we only need the effect of S on vectors. We
can save some computations by writing the residual of the system (10.9.4) as

    r_2 = (b_2 - D_2 x_2) - A_2 D_1^{-1}(b_1 - A_2^T x_2).

Note here that x_1 = D_1^{-1}(b_1 - A_2^T x_2) is available as an intermediate result. The
solution of the system D_1 x_1 = b_1 - A_2^T x_2 is cheap when, e.g., D_1 is tridiagonal. In
other cases this system may be solved in each step by an iterative method in an
inner iteration.
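A small MATLAB sketch of this (with D1, D2, A2, b1, b2 given as in (10.9.3) and x2 a current iterate; the variable names are illustrative assumptions) applies S to a vector and forms the residual without ever forming S:

    applyS = @(v) D2*v - A2*(D1 \ (A2'*v));   % effect of the Schur complement on v
    x1 = D1 \ (b1 - A2'*x2);                  % intermediate result, reused below
    r2 = (b2 - D2*x2) - A2*x1;                % residual of (10.9.4)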
We now describe a block incomplete Cholesky factorization due to Concus, Golub and Meurant [3], which has proved to be very useful. We assume in
the following that in (10.9.2) D_i is tridiagonal and A_i is diagonal, as in the model
problem. First recall from Section 6.4.6 that the exact block Cholesky factorization
of a symmetric positive definite block tridiagonal matrix can be written as

    A = (Σ + L_A) Σ^{-1} (Σ + L_A^T),

where L_A is the lower block triangular part of A, and Σ = diag(Σ_1, ..., Σ_N) is
obtained from the recursion

    Σ_1 = D_1,    Σ_i = D_i - A_i Σ_{i-1}^{-1} A_i^T,    i = 2, ..., N.

For the model problem, although D_1 is tridiagonal, Σ_1^{-1} and hence Σ_i, i ≥ 2, are
dense. Because of this the exact block Cholesky factorization is not useful.
Instead we consider computing an incomplete block factorization from

    Δ_1 = D_1,    Δ_i = D_i - A_i Λ_{i-1} A_i^T,    i = 2, ..., N.        (10.9.5)

Here, for each i, Λ_{i-1} is a sparse approximation to Δ_{i-1}^{-1}. The incomplete block
Cholesky factorization is then

    M = (Δ + L_A) Δ^{-1} (Δ + L_A^T),    Δ = diag(Δ_1, ..., Δ_N).


The corresponding defect matrix is R = M - A = diag(R_1, ..., R_N), where R_1 =
Δ_1 - D_1 = 0 and

    R_i = Δ_i - D_i + A_i Δ_{i-1}^{-1} A_i^T,    i = 2, ..., N.
We have assumed that the diagonal blocks D_i are diagonally dominant symmetric tridiagonal matrices. We now discuss the construction of an approximate
inverse of such a matrix

    T = tridiag(β_{i-1}, α_i, β_i),

where α_i > 0, i = 1, ..., n, and β_i < 0, i = 1, ..., n-1. A sparse approximation of
D_i^{-1} can be obtained as follows. First compute the Cholesky factorization T = LL^T,
where L is lower bidiagonal with diagonal elements γ_1, ..., γ_n and subdiagonal
elements δ_1, ..., δ_{n-1}.
It can be shown that the elements of the inverse T^{-1} = L^{-T}L^{-1} decrease strictly away
from the diagonal. This suggests that the matrix L^{-1}, which is lower triangular and
dense, is approximated by a banded lower triangular matrix L^{-1}(p), taking only
the first p+1 lower diagonals of the exact L^{-1}. Note that the elements of the matrix
L^{-1}, whose diagonal elements are 1/γ_1, ..., 1/γ_n, can be computed diagonal by
diagonal. For example, its first subdiagonal elements are

    η_i = -δ_i/(γ_i γ_{i+1}),    i = 1, ..., n-1.

For p = 0 we get a diagonal approximate inverse. For p = 1 the approximate
factor L^{-1}(1) is lower bidiagonal, and the approximate inverse L^{-T}(1) L^{-1}(1)
is a tridiagonal matrix. Since we have assumed that the A_i are diagonal matrices, the approximations Δ_i generated by (10.9.5) will in this case be tridiagonal.

10.9.4 Fast Direct Methods

For the solution of discretizations of some elliptic problems on a rectangular domain,
fast direct methods can be developed. For this approach to be valid we need to
make strong assumptions about the regularity of the system. It applies only to discretizations of problems with constant coefficients on a rectangular domain. However, if we have variable coefficients the fast solver may be used as a preconditioner
in the conjugate gradient method. Similarly, problems on an irregular domain may
be embedded in a rectangular domain, and again we may use a preconditioner based
on a fast solver.
Consider a linear system Ax = b, where A has the block tridiagonal form

        ( B  T           )         ( x_1 )        ( b_1 )
        ( T  B  T        )         ( x_2 )        ( b_2 )
    A = (    T  .    .   ),    x = (  :  ),   b = (  :  ),                (10.9.6)
        (       .    .  T)         (     )        (     )
        (           T  B )         ( x_q )        ( b_q )

where B, T ∈ R^{p×p}. We assume that the matrices B and T commute, i.e., BT =
TB. Then it follows that B and T have a common eigensystem, and we let

    Q^T B Q = Λ = diag(λ_i),    Q^T T Q = Ω = diag(ω_i).

The system Ax = b can then be written Cz = y, where

        ( Λ  Ω           )
        ( Ω  Λ  Ω        )
    C = (    Ω  .    .   ),
        (       .    .  Ω)
        (           Ω  Λ )

and x_j = Q z_j, y_j = Q^T b_j, j = 1, ..., q.
For the model problem with Dirichlet boundary conditions the eigenvalues and
eigenvectors are known. Furthermore, the multiplication of a vector by Q and Q^T
is efficiently obtained by the Fast Fourier Transform algorithm (see Section 9.6.3).
One fast algorithm then is as follows:

1. Compute
       y_j = Q^T b_j,    j = 1, ..., q.

2. Rearrange, taking one element from each vector y_j,

       ŷ_i = (y_{i1}, y_{i2}, ..., y_{iq})^T,    i = 1, ..., p,

   and solve by elimination the p tridiagonal systems

       Γ_i ẑ_i = ŷ_i,    i = 1, ..., p,

   where

             ( λ_i  ω_i                )
             ( ω_i  λ_i  ω_i           )
       Γ_i = (      ω_i   .     .      ),    i = 1, ..., p.
             (            .     .  ω_i)
             (               ω_i  λ_i )

3. Rearrange (inversely to step 2), taking one element from each ẑ_i,

       z_j = (ẑ_{1j}, ẑ_{2j}, ..., ẑ_{pj})^T,    j = 1, ..., q,

   and compute
       x_j = Q z_j,    j = 1, ..., q.
The fast Fourier transforms in steps 1 and 3 take O(n^2 log n) operations.
Solving the tridiagonal systems in step 2 takes only O(n^2) operations, and hence
for this step Gaussian elimination is superior to the FFT. In all, this algorithm uses only
O(n^2 log n) operations to compute the n^2 unknown components of x.
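The following MATLAB sketch illustrates the three steps for the model problem, where B = tridiag(-1, 4, -1), T = -I, and the right-hand side blocks b_j are stored as the columns of an n-by-n array F (the eigenvector matrix Q is formed explicitly for clarity; in practice the products with Q and Q^T would be carried out by fast sine transforms):

    n   = size(F,1);
    k   = (1:n)';
    Q   = sqrt(2/(n+1))*sin(pi*k*k'/(n+1));   % eigenvectors of tridiag(-1,4,-1)
    lam = 4 - 2*cos(pi*k/(n+1));              % eigenvalues of B
    om  = -1;                                 % eigenvalue of T = -I
    Y   = Q'*F;                               % step 1: y_j = Q'*b_j for all j
    Z   = zeros(n);
    for i = 1:n                               % step 2: p = n tridiagonal solves
      Gi = spdiags([om*ones(n,1), lam(i)*ones(n,1), om*ones(n,1)], -1:1, n, n);
      Z(i,:) = (Gi \ Y(i,:)')';               % row i of Y is yhat_i
    end
    X = Q*Z;                                  % step 3: x_j = Q*z_j

The columns of X are the blocks x_j of the solution.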
Review Questions
1. What is meant by preconditioning of a linear system Ax = b? What are the properties
   that a good preconditioner should have?
2. What is meant by an incomplete factorization of a matrix? Describe how incomplete
   factorizations can be used as preconditioners.
3. Describe two different transformations of a general nonsymmetric linear system Ax =
   b that allow the transformed system to be solved by the standard conjugate gradient
   method.

Problems
1. Consider square matrices of order n with nonzero elements only in the kth upper
   diagonal, i.e., of the form T_k with elements (T_k)_{i,i+k} = t_i, i = 1, ..., n-k, and all
   other elements zero, where k ≥ 0.
   Show the following rule for multiplication by diagonals:

       A_k B_l = C_{k+l}  if k + l ≤ n - 1,    and    A_k B_l = 0  otherwise,

   where the elements in C_{k+l} are a_1 b_{k+1}, ..., a_{n-k-l} b_{n-l}.
2. Let B be a symmetric positive definite M-matrix of the form

       B = ( B_1   C^T )
           ( C     B_2 )

   with B_1 and B_2 square. Show that the Schur complement S = B_2 - C B_1^{-1} C^T of B_1
   in B is a symmetric positive definite M-matrix.
3. The penta-diagonal matrix of the model problem has nonzero elements in positions
   P_A = {(i, j) | |i - j| = 0, 1, n}, which defines a level 0 incomplete factorization. Show
   that the level 1 incomplete factorization has two extra diagonals corresponding to
   |i - j| = n - 1.
4. The triangular solves needed when using an incomplete Cholesky factorization as a
   preconditioner are inherently sequential and difficult to vectorize. If the factors are
   normalized to be unit triangular, then the solution can be computed making use of
   one of the following expansions:

       (I - L)^{-1} = I + L + L^2 + L^3 + ...                 (Neumann expansion)
                    = (I + L)(I + L^2)(I + L^4) ...           (Euler expansion)

   Verify these expansions and prove that they are finite.

Computer Exercises

1. Let A be a symmetric positive definite matrix. An incomplete Cholesky preconditioner for A is obtained by neglecting elements in places (i, j) prescribed by a
   symmetric set

       P ⊂ P_n ≡ {(i, j) | 1 ≤ i, j ≤ n},

   where (i, j) ∈ P if i = j.
   (a) The simplest choice is to take P equal to the sparsity pattern of A, which for the
   model problem is P_A = {(i, j) | |i - j| = 0, 1, n}. This is called a level 0 incomplete
   factorization. A level 1 incomplete factorization is obtained by using the union of
   P_0 and the pattern of the defect matrix R = LL^T - A. Higher level incomplete
   factorizations are defined in a similar way.
   (b) Consider the model problem, where A is block tridiagonal,

       A = tridiag(-I, T + 2I, -I) ∈ R^{n^2 × n^2},    T = tridiag(-1, 2, -1) ∈ R^{n×n}.

   Show that A is an M-matrix and hence that an incomplete Cholesky factorization
   of A exists.
   (c) Write a MATLAB function which computes the level 0 incomplete Cholesky
   factor L_0 of A. (You should NOT write a general routine like that in the textbook,
   but an efficient routine using the special five-diagonal structure of A!) Implement also
   the preconditioned conjugate gradient method in MATLAB, and a function which
   solves L_0 L_0^T z = r by forward and backward substitution. Solve the model problem
   for n = 10 and 20 with and without preconditioning, plot the error norm ||x - x_k||_2,
   and compare the rates of convergence. Stop the iterations when the recursive residual
   is of the level of machine precision. Discuss your results!
   (d) Take the exact solution to be x = (1, 1, ..., 1, 1)^T. To investigate the influence
   of the preconditioner M = LL^T on the spectrum of M^{-1}A, do the following. For
   n = 10 plot the eigenvalues of A and of M^{-1}A for the level 0 and level 1 preconditioners.
   You may use, e.g., the built-in MATLAB functions to compute the eigenvalues;
   efficiency is not at a premium here. (To handle the level 1 preconditioner you need to
   generalize your incomplete Cholesky routine.)


Notes

Two classical texts on iterative methods for linear systems are Varga [18, 1962] and
Young [21, 1971]. Axelsson [1, 1994] is an excellent source of more modern material.
The book by Barrett et al. [2, 1994] gives a compact survey of iterative methods and
their implementation. An up to date treatment of iterative methods and advanced
preconditioners is given by Saad [17, 1996]. See also the available software listed in
Chapter 15.
For the case of a square matrix A this method was originally devised by
Kaczmarz [10, 1937]. We remark that Kaczmarz's method has been rediscovered
and used successfully in image reconstruction. In this context the method is known
as the unconstrained ART algorithm (algebraic reconstruction technique).
Krylov subspace iteration, which originated with the conjugate gradient method,
has been named one of the Top 10 Algorithms of the 20th century. The conjugate
gradient method was developed independently by E. Stiefel and M. R. Hestenes.
Further work was done at the Institute for Numerical Analysis, on the campus of
the University of California, Los Angeles (UCLA). This work was published in
1952 in the seminal paper [9], which has had a tremendous impact in scientific computing. In this paper the authors acknowledge cooperation with J. B. Rosser, G. E.
Forsythe, and L. Paige, who were working at the institute during this period. They
also mention that C. Lanczos had developed a closely related method (see Chapter
9).

References

[1] O. Axelsson. Iterative Solution Methods. Cambridge University Press, Cambridge, UK, 1994.
[2] R. Barrett, M. W. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, editors. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, PA, 1993.
[3] P. Concus, G. H. Golub, and G. Meurant. Block preconditioning for the conjugate gradient method. SIAM J. Sci. Statist. Comput., 6:220–252, 1985.
[4] R. W. Freund, G. H. Golub, and N. M. Nachtigal. Iterative solution of linear systems. Acta Numerica, 1:57–100, 1991.
[5] P. E. Gill, W. Murray, D. B. Ponceleon, and M. A. Saunders. Preconditioners for indefinite systems arising in optimization. SIAM J. Matrix Anal. Appl., 13:292–311, 1992.
[6] G. H. Golub and D. O'Leary. Some history of the conjugate gradient and Lanczos algorithms: 1948–1976. SIAM Review, 31:50–102, 1989.
[7] G. H. Golub and H. A. van der Vorst. Closer to the solution: iterative linear solvers. In I. S. Duff and G. A. Watson, editors, The State of the Art in Numerical Analysis, pages 63–92. Clarendon Press, Oxford, 1997.
[8] A. Greenbaum. Iterative Methods for Solving Linear Systems. SIAM, Philadelphia, PA, 1997.
[9] M. R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Standards, Sect. B, 49:409–436, 1952.
[10] S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Acad. Polon. Sciences et Lettres, pages 355–357, 1937.
[11] C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Nat. Bur. Standards, Sect. B, 45:255–282, 1950.
[12] L. Landweber. An iterative formula for Fredholm integral equations of the first kind. Amer. J. Math., 73:615–624, 1951.
[13] J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31:148–162, 1977.
[14] N. M. Nachtigal, S. C. Reddy, and L. N. Trefethen. How fast are nonsymmetric matrix iterations? SIAM J. Matrix Anal. Appl., 13:778–795, 1992.
[15] C. C. Paige and M. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12:617–629, 1975.
[16] J. K. Reid. On the method of conjugate gradients for the solution of large systems of linear equations. In J. K. Reid, editor, Large Sparse Sets of Linear Equations, pages 231–254. Academic Press, New York, 1971.
[17] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Co., Boston, MA, 1996.
[18] R. S. Varga. Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, NJ, 1962.
[19] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, UK, 1965.
[20] D. M. Young. Iterative methods for solving partial differential equations of elliptic type. PhD thesis, Harvard University, 1950.
[21] D. M. Young. Iterative Solution of Large Linear Systems. Academic Press, New York, 1971.
[22] D. M. Young. A historical overview of iterative methods. Computer Phys. Comm., 53:1–17, 1989.
