
BASIC ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS

by

Prabhat K. Jha, B.S.

Report

Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of

Master of Arts

The University of Texas at Austin, December 2003

BASIC ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS

APPROVED BY SUPERVISING COMMITTEE:

TODD ARBOGAST

OSCAR GONZALEZ

BASIC ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS

by

Prabhat K. Jha, M.A. The University of Texas at Austin, 2003

SUPERVISOR: TODD ARBOGAST

Direct methods such as Gaussian Elimination for solving linear systems compute the exact answer after a finite number of steps. Direct methods can be numerically expensive if the associated matrix is very large, as they require O(n^3) operations, where n is the dimension of the matrix. Iterative methods, on the other hand, generally do not produce the exact solution, but they decrease the error by some fraction at each step. In this report, we first describe simple iterative methods and their convergence properties, and then we describe the Conjugate Gradient method, which applies when the matrix is symmetric and positive definite.


Contents
1 Introduction
2 Simple Iterative Methods
  2.1 Jacobi Method
  2.2 Gauss-Seidel Method
  2.3 Successive Over-Relaxation (SOR) Method
3 Convergence of Simple Iterative Methods
4 Krylov Subspace Methods
  4.1 The Lanczos Algorithm
  4.2 Relation Between Krylov Subspaces and Lanczos Algorithm
  4.3 Motivation for the Conjugate Gradient (CG) Method
  4.4 Conjugate Gradient Method as a Krylov Subspace Method
  4.5 Convergence Properties of the CG Method
  4.6 Preconditioning
References
Vita

1 Introduction

One of the standard problems of linear algebra is solving a linear system of equations denoted by Ax = b, where A is a given n × n nonsingular real or complex matrix, b is an n-column vector and x is an unknown n-column vector. Direct methods such as Gaussian Elimination compute the exact answer after a finite number of steps. Direct methods can be numerically expensive if A is very large, as they require O(n^3) operations [1]. Iterative methods, on the other hand, generally do not produce the exact solution to the linear system, but they decrease the error by some fraction after each step. Iteration stops when the error is less than some specified tolerance. Iterative methods generally exploit the underlying mathematical or physical problems that give rise to the linear system. For example, a finite difference discretization of a partial differential equation (PDE) which arises in fluid mechanics, heat flow, etc., may lead to a very sparse matrix, and this structure can be exploited.

In this report, we first discuss the Jacobi, Gauss-Seidel, and SOR iterative methods, and we discuss the convergence criteria for these methods. Then we discuss a more efficient method. It is an example of a Krylov subspace method that projects an n-dimensional problem to a lower dimensional subspace called a Krylov subspace. The most widely used such method is the Conjugate Gradient method.

The work is organized as follows. We describe the Jacobi, Gauss-Seidel and SOR iterative methods in Section 2, and describe their convergence properties in Section 3. In Section 4, we move to the crux of this report, which is a Krylov subspace method. We describe a Lanczos algorithm to derive the Krylov subspace, and we consider a widely used Krylov subspace method for a symmetric and positive definite (SPD) matrix, the Conjugate Gradient method. We end this report by discussing aspects of preconditioning.

2 Simple Iterative Methods

The methods covered in this section belong to a family in which the nonsingular matrix A has non-zero diagonal entries and is split into matrices M and K such that A = M − K with M nonsingular. So the equation Ax = b becomes

    x = M^{-1}Kx + M^{-1}b,                                          (1)

or x = Rx + c, where R = M^{-1}K and c = M^{-1}b. We define our iteration by

    x_{m+1} = Rx_m + c.                                              (2)

For the iteration to be computationally practical, the splitting should be chosen such that M^{-1}Kx and M^{-1}b are easy to calculate.

2.1 Jacobi Method

The Jacobi iteration determines the j-th component of the next approximation to annihilate the j-th component of the residual vector, Ax − b. One step of the iteration can be written as follows.

Jacobi Algorithm.
for (j = 1; j <= n; j++)
    x_{m+1,j} = (1/a_{jj}) ( b_j − Σ_{k≠j} a_{jk} x_{m,k} )          (3)
end for

This iteration can be written in vector and matrix form as

    x_{m+1} = D^{-1}(L + U)x_m + D^{-1}b,                            (4)

where D is the diagonal of A, L is the strictly lower triangular part of A, and U is the strictly upper triangular part of A. This equality can be seen in terms of (2) with R = D^{-1}(L + U) and c = D^{-1}b, where the matrix A is split into M = D and K = L + U = D − A.
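As an illustration of (3)-(4), the following is a minimal Python/NumPy sketch of the Jacobi iteration. The function name, tolerance, and iteration cap are illustrative assumptions, not part of the report.

    import numpy as np

    def jacobi(A, b, tol=1e-8, max_iter=500):
        # Splitting A = M - K with M = D, the diagonal of A.
        D = np.diag(A)                       # vector of diagonal entries of A
        x = np.zeros_like(b, dtype=float)
        for _ in range(max_iter):
            # One sweep of (4): x_{m+1} = D^{-1}((L + U) x_m + b) with L + U = D - A.
            x_new = (b - A @ x + D * x) / D
            if np.linalg.norm(x_new - x) < tol:
                return x_new
            x = x_new
        return x

For a strictly diagonally dominant matrix such as A = [[4, 1], [1, 3]] with b = [1, 2], the iterates converge to the solution returned by np.linalg.solve(A, b), in agreement with the convergence criteria of Section 3.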

2.2 Gauss-Seidel Method

The Gauss-Seidel iteration, similar to the Jacobi iteration, corrects the j-th component of the current approximate solution. However, at the j-th step it uses the previously improved j − 1 components of the solution. So, one step of the Gauss-Seidel iteration can be written as follows.

Gauss-Seidel Algorithm.
for (j = 1; j <= n; j++)
    x_{m+1,j} = (1/a_{jj}) ( b_j − Σ_{k=1}^{j−1} a_{jk} x_{m+1,k} − Σ_{k=j+1}^{n} a_{jk} x_{m,k} )          (5)
end for

To write this in the form of (2), we rewrite the iteration as

    Σ_{k=1}^{j} a_{jk} x_{m+1,k} = − Σ_{k=j+1}^{n} a_{jk} x_{m,k} + b_j,                                    (6)

which is equivalent to

    (D − L)x_{m+1} = Ux_m + b,

or

    x_{m+1} = (D − L)^{-1}Ux_m + (D − L)^{-1}b.                                                             (7)

So in this case R = (D − L)^{-1}U and c = (D − L)^{-1}b, and A is split into M = D − L and K = U.
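A corresponding Python sketch of the componentwise sweep (5); as above, the function name and the stopping rule are illustrative assumptions.

    import numpy as np

    def gauss_seidel(A, b, tol=1e-8, max_iter=500):
        n = len(b)
        x = np.zeros(n)
        for _ in range(max_iter):
            x_old = x.copy()
            for j in range(n):
                # Use the already-updated components x[0:j] and the old components x_old[j+1:n].
                s = A[j, :j] @ x[:j] + A[j, j + 1:] @ x_old[j + 1:]
                x[j] = (b[j] - s) / A[j, j]
            if np.linalg.norm(x - x_old) < tol:
                break
        return x

Unlike Jacobi, the sweep can overwrite x in place, since each component only needs the components already computed in the current sweep and the remaining ones from the previous sweep.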

2.3 Successive Over-Relaxation (SOR) Method

The Successive Overrelaxation (SOR) method is derived by applying extrapolation to the Gauss-Seidel method. The Gauss-Seidel loop is improved by taking an appropriate weighted average of x_{m+1,j} and x_{m,j}. Thus in SOR, x_{m+1,j} = (1 − ω)x_{m,j} + ω x̃_{m+1,j}, where x̃_{m+1,j} is defined by (5) and ω is the extrapolation factor. So one step of the SOR method can be written as follows.

SOR Algorithm.
for (j = 1; j <= n; j++)
    x_{m+1,j} = (1 − ω)x_{m,j} + (ω/a_{jj}) ( b_j − Σ_{k=1}^{j−1} a_{jk} x_{m+1,k} − Σ_{k=j+1}^{n} a_{jk} x_{m,k} )          (8)
end for

Multiplying the equation by a_{jj} and rearranging, we get

    a_{jj} x_{m+1,j} + ω Σ_{k=1}^{j−1} a_{jk} x_{m+1,k} = (1 − ω) a_{jj} x_{m,j} − ω Σ_{k=j+1}^{n} a_{jk} x_{m,k} + ω b_j.   (9)

The iteration can be written in the form of (2) as

    x_{m+1} = (D − ωL)^{-1}((1 − ω)D + ωU)x_m + ω(D − ωL)^{-1}b.                                                            (10)

So in this case R = (D − ωL)^{-1}((1 − ω)D + ωU) and c = ω(D − ωL)^{-1}b.

Equation (10) is equivalent to the Gauss-Seidel method when ω = 1; ω > 1 is called overrelaxation and ω < 1 underrelaxation. A somewhat naive motivation for overrelaxation is that if the direction from x_m to x_{m+1} is a good direction towards convergence, then moving ω > 1 times further is better for convergence. We consider finding the optimal ω for some special cases in the next section, where we consider convergence criteria for these simple iterative methods.
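The SOR step (8) only adds the weighted average to the Gauss-Seidel sweep; a hedged Python sketch follows (ω = 1 recovers Gauss-Seidel, and the names are again illustrative).

    import numpy as np

    def sor(A, b, omega, tol=1e-8, max_iter=500):
        n = len(b)
        x = np.zeros(n)
        for _ in range(max_iter):
            x_old = x.copy()
            for j in range(n):
                s = A[j, :j] @ x[:j] + A[j, j + 1:] @ x_old[j + 1:]
                gs = (b[j] - s) / A[j, j]                     # the Gauss-Seidel value from (5)
                x[j] = (1 - omega) * x_old[j] + omega * gs    # the weighted average (8)
            if np.linalg.norm(x - x_old) < tol:
                break
        return x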

3 Convergence of Simple Iterative Methods

Before convergence criteria for these methods can be considered, we need to prove some lemmas and a theorem. We first define a few terms to be used in this section.

Strictly Diagonally Dominant Matrix: A matrix A is strictly diagonally dominant if |a_{ii}| > Σ_{j≠i} |a_{ij}| for all i (such a matrix is nonsingular).

Spectral Radius: The spectral radius of a matrix A, denoted by ρ(A), is the largest absolute value |λ| of all eigenvalues λ of A, i.e., the maximum taken over all eigenvalues of A in absolute value (modulus).

Positive Definite Matrix: A matrix A ∈ R^{n×n} is positive definite if x^T Ax > 0 for all nonzero x ∈ R^n.

Operator or Subordinate Matrix Norm: Let A be an n × n matrix and ||·|| be a vector norm. Then

    ||A|| ≡ max_{x≠0} ||Ax|| / ||x||

is called an operator norm or subordinate matrix norm. It is indeed a norm on R^{n×n}.

Infinity Norm: The infinity norm of a matrix A is defined as ||A||_∞ ≡ max_i Σ_j |a_{ij}|, i.e., the maximum absolute row sum.

Lemma 3.1. Let ||·|| be an operator norm. If ||R|| < 1, then x_{m+1} = Rx_m + c converges to x such that x = Rx + c (i.e., Ax = b) for any x_0 and any c (i.e., b).

Proof. At the (m+1)st iteration, we have

    x_{m+1} = Rx_m + c.                                              (11)

We know if x is the exact solution (such a solution exists because A is nonsingular and we split matrix A into matrices M and K where M is invertible), then x = Rx + c, so

    x_{m+1} − x = R(x_m − x) = R^{m+1}(x_0 − x).                     (12)

By the definition of an operator norm, we have

    ||x_{m+1} − x|| = ||R^{m+1}(x_0 − x)||                           (13)
                    ≤ ||R||^{m+1} ||x_0 − x||.                       (14)

Since ||R|| < 1, we see that the above converges to zero.

Lemma 3.2. For all operator norms, ρ(R) ≤ ||R||. For each ε > 0, there exists an operator norm ||·||_ε such that ||R||_ε ≤ ρ(R) + ε for all R.

Proof. The result is proved in [2], which uses the fact that every matrix has a Jordan canonical form. The existence of such a form is proved in [4].

Theorem 3.3. The iteration x_{m+1} = Rx_m + c converges to the solution of Ax = b for all x_0 and for all b if and only if ρ(R) < 1.

Proof. With the above two lemmas, the proof is straightforward. We prove the necessary condition by contradiction. Suppose ρ(R) ≥ 1 and choose x_0 − x to be an eigenvector of R with eigenvalue λ, where |λ| = ρ(R). Then x_{m+1} − x = R(x_m − x) = R^{m+1}(x_0 − x) = λ^{m+1}(x_0 − x), which will not approach zero if |λ| ≥ 1. To prove the sufficient condition, we use Lemma 3.2 to find an appropriate operator norm so that ||R||_ε < 1. Then by Lemma 3.1 we know that x_{m+1} = Rx_m + c converges for any x_0 and any c.

Lemma 3.4. Let A = B − C be a splitting of A such that B is nonsingular and B^T + C is positive definite; then ρ(B^{-1}C) < 1.

Proof. The lemma is proved in [6] and uses Stein's theorem, which says that if A is SPD and A − H^T AH is positive definite, then ρ(H) < 1. Thus it suffices to show that Q = A − (B^{-1}C)^T A B^{-1}C is positive definite. Since B^{-1}C = I − B^{-1}A, we have

    Q = (B^{-1}A)^T A + AB^{-1}A − (B^{-1}A)^T AB^{-1}A = (B^{-1}A)^T (B + B^T − A) B^{-1}A.

But B + B^T − A = B^T + C is positive definite by hypothesis and therefore Q is positive definite. Thus by Stein's theorem, ρ(B^{-1}C) < 1.

Based on Theorem 3.3, we conclude that the splitting of the original matrix A into M and K should be chosen so that:

i. Rx = M^{-1}Kx and c = M^{-1}b are easy to evaluate;
ii. ρ(R) is small.

These two goals conflict, so we strive to balance them. For example, if we choose M = I, the identity matrix, goal (i) is easy to achieve but ρ(R) is not guaranteed to be small. Convergence depends on the nature of the problem, but some of the widely applicable criteria for convergence are contained in the following theorem.

Theorem 3.5.
I. If A is strictly diagonally dominant, then both Jacobi and Gauss-Seidel converge;
II. If A is symmetric and positive definite (SPD), then Gauss-Seidel converges for any x_0;
III. The condition 0 < ω < 2 is necessary for convergence of SOR. If A is SPD, then 0 < ω < 2 is sufficient for convergence.

Proof. Criterion I: Let R_J be the coefficient matrix in the Jacobi method, and R_G be the coefficient matrix in the Gauss-Seidel method. If A is strictly diagonally dominant, we claim that the spectral radii of R_J and R_G are less than 1. If so, Theorem 3.3 implies that both Jacobi and Gauss-Seidel converge. For the Jacobi method, using Lemma 3.2 with the infinity norm,

    ρ(R_J) = ρ(D^{-1}(L + U)) ≤ ||D^{-1}(L + U)||_∞ = max_{1≤i≤n} ( Σ_{j≠i} |a_{ij}| ) / |a_{ii}| < 1.

For the Gauss-Seidel method, we prove the claim by contradiction. Consider an eigenvalue λ of R_G. Then

    0 = det(λI − R_G) = det(λI − (D − L)^{-1}U) = det((D − L)^{-1}(λ(D − L) − U)),

or

    det(λ(D − L) − U) = 0.                                           (15)

Now by contradiction assume |λ| ≥ 1. Since A is strictly diagonally dominant, so is λ(D − L) − U, which implies det(λ(D − L) − U) ≠ 0. This is a contradiction to (15), hence |λ| < 1 for all eigenvalues of R_G. So ρ(R_G) < 1.

Criterion II: The proof can be found in [3]. When the matrix is SPD, R_G can be written as (D − L)^{-1}L^T. Since A is positive definite, D is positive definite. Define a matrix G_1 to be D^{1/2} R_G D^{-1/2} and let L_1 = D^{-1/2} L D^{-1/2}. Then G_1 = (I − L_1)^{-1} L_1^T. Since R_G and G_1 are similar, they have the same eigenvalues. We verify that ρ(G_1) < 1 and hence prove the criterion. If G_1 x = λx with x^H x = 1, where x^H = x̄^T, then we have L_1^T x = λ(I − L_1)x and thus x^H L_1^T x = λ(1 − x^H L_1 x). Let a + bi = x^H L_1 x; then we have a − bi = x^H L_1^T x and

    |λ|^2 = | (a − bi) / (1 − a − bi) |^2 = (a^2 + b^2) / (1 − 2a + a^2 + b^2).

Since D^{-1/2} A D^{-1/2} = I − L_1 − L_1^T is positive definite, we have 0 < 1 − x^H L_1 x − x^H L_1^T x = 1 − 2a, which implies |λ| < 1.

Criterion III: We prove this criterion by proving that ρ(R_SOR) ≥ |ω − 1|, which implies 0 < ω < 2 is required for convergence. Recall R_SOR = (D − ωL)^{-1}((1 − ω)D + ωU). For convenience, we write R_SOR as (I − ωD^{-1}L)^{-1}((1 − ω)I + ωD^{-1}U). Let λ be any eigenvalue of R_SOR. Since det(I − ωD^{-1}L) = 1, the characteristic polynomial of R_SOR can be written as

    φ(λ) = det(λI − R_SOR) = det((I − ωD^{-1}L)(λI − R_SOR)) = det((λ + ω − 1)I − λωD^{-1}L − ωD^{-1}U).

Thus,

    |φ(0)| = ∏_{i=1}^{n} |λ_i(R_SOR)| = |det((ω − 1)I − ωD^{-1}U)| = |ω − 1|^n.

This implies that ρ(R_SOR) = max_i |λ_i| ≥ |ω − 1|. So, for SOR to converge, the condition 0 < ω < 2 is required.

To prove the sufficient condition, let B = ω^{-1}(D − ωL) and C = ω^{-1}[(1 − ω)D + ωL^T]. Note that A = B − C and R_SOR = B^{-1}C. To prove the result, it is sufficient to show that the hypothesis of Lemma 3.4 is satisfied. B is nonsingular, as D is positive definite and D − ωL is nonsingular. Moreover,

    B^T + C = B + B^T − A = (2/ω)D − L − L^T − D + L + L^T = ((2 − ω)/ω)D,

which, since 0 < ω < 2, is positive definite. It should be noted that SOR with ω = 1 is the Gauss-Seidel method, hence we have another proof of Criterion II.
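Theorems 3.3 and 3.5 can be checked numerically on small test problems by forming the iteration matrices R explicitly and computing their spectral radii; forming R is only affordable for small examples, and the test matrix, the value of ω, and the function name below are illustrative assumptions.

    import numpy as np

    def spectral_radii(A, omega=1.5):
        # Splittings: Jacobi M = D; Gauss-Seidel M = D - L; SOR M = (1/omega)(D - omega*L),
        # with the sign convention A = D - L - U used in Section 2.
        D = np.diag(np.diag(A))
        L = -np.tril(A, -1)
        U = -np.triu(A, 1)
        R_J = np.linalg.solve(D, L + U)
        R_GS = np.linalg.solve(D - L, U)
        R_SOR = np.linalg.solve(D - omega * L, (1 - omega) * D + omega * U)
        rho = lambda R: np.max(np.abs(np.linalg.eigvals(R)))
        return rho(R_J), rho(R_GS), rho(R_SOR)

    # Example: the SPD tridiagonal matrix of a 1-D Laplacian. All three radii are
    # below 1, so by Theorem 3.3 each of the three iterations converges for this matrix.
    A = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)
    print(spectral_radii(A))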


4 Krylov Subspace Methods

Krylov subspace methods project an n-dimensional problem onto a lower dimensional space called a Krylov subspace, which we will formally define in this section. These methods are computationally less expensive than direct methods, as most of the operations involved are matrix-vector multiplications. There are a variety of Krylov subspace methods suitable for different kinds of matrices, but in this report we focus on symmetric positive definite (SPD) matrices, and then the method is called the Conjugate Gradient Method (CG). Before we define a Krylov subspace, we explain the Lanczos algorithm, which reduces a matrix to symmetric tridiagonal form. The algorithm and its derivation can be found in [3] or [2].

4.1 The Lanczos Algorithm

Given an n × n matrix A and vector b, define a matrix K to be

    K = [b  Ab  A^2 b  ···  A^{n-1} b].

Our goal is to find an orthogonal matrix Q such that for all k, the leading k columns of Q and K span the same space. We would like to find a basis for this space. Assume for the moment that K is nonsingular. Let K = QR be the QR-decomposition [3] of K, which is a factorization of K into an orthogonal matrix Q, i.e., Q^{-1} = Q^T, and an upper triangular matrix R. Then the first column of Q, q_1, is a multiple of b. Moreover,

    K^{-1}AK = (R^{-1}Q^{-1})A(QR) = R^{-1}Q^T AQR.                  (16)

Note

    AK = [Ab  A^2 b  A^3 b  ···  A^n b] = K[e_2  e_3  ···  e_n  c] = KC,        (17)

where e_i is the i-th column of the identity matrix and c = K^{-1}A^n b. That is,

    K^{-1}AK = C =  [ 0  0  ···  0  c_1 ]
                    [ 1  0  ···  0  c_2 ]
                    [ 0  1  ···  0  c_3 ]
                    [ :  :   ·   :  :   ]
                    [ 0  0  ···  1  c_n ]

Thus C is in upper Hessenberg form (i.e., the matrix is zero below the first subdiagonal, a_{ij} = 0 if i > j + 1). From (16) we have C = R^{-1}Q^T AQR. By the QR-decomposition, we know both R and R^{-1} are upper triangular, so Q^T AQ = RCR^{-1} = T is also an upper Hessenberg matrix. To find the columns of Q, we see that AQ = QT. The j-th column of the matrix on both sides can be written as

    Aq_j = Σ_{i=1}^{j+1} t_{ij} q_i.                                 (18)

When A is symmetric, T is symmetric and therefore tridiagonal:

    T = [ α_1  β_1                           ]
        [ β_1  α_2  β_2                      ]
        [      β_2  ···  ···                 ]
        [           ···  α_{n-1}  β_{n-1}    ]
        [                β_{n-1}  α_n        ]

In this case, (18) is equivalent to

    Aq_j = β_{j-1} q_{j-1} + α_j q_j + β_j q_{j+1}.                  (19)

Multiplying by q_j^T on both sides and using the fact that the q_i are orthonormal, we have

    q_j^T A q_j = α_j.                                               (20)

If β_j ≠ 0, then

    q_{j+1} = z_j / β_j,                                             (21)

where

    z_j = (A − α_j I)q_j − β_{j-1} q_{j-1}.                          (22)

Thus, given a symmetric matrix A and a vector b, using (17) to (22) we can write the Lanczos algorithm for the partial reduction of A to symmetric tridiagonal form as follows.

Lanczos Algorithm.
q_1 = b/||b||_2, β_0 = 0, q_0 = 0
for j = 1 to k
    z = Aq_j
    α_j = q_j^T z
    z = z − α_j q_j − β_{j-1} q_{j-1}
    β_j = ||z||_2
    if β_j = 0, break
    q_{j+1} = z/β_j
end for

The algorithm fails if z = 0 for some j, but this does not cause difficulties for our use later in the CG method.
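The loop above translates almost line by line into Python. The sketch below (NumPy assumed; names illustrative) returns the orthonormal basis Q_k together with the entries α and β that define T_k, stopping early if β_j = 0.

    import numpy as np

    def lanczos(A, b, k):
        n = len(b)
        Q = np.zeros((n, k + 1))
        alpha = np.zeros(k)
        beta = np.zeros(k)
        Q[:, 0] = b / np.linalg.norm(b)
        q_prev, beta_prev = np.zeros(n), 0.0
        for j in range(k):
            z = A @ Q[:, j]
            alpha[j] = Q[:, j] @ z
            z = z - alpha[j] * Q[:, j] - beta_prev * q_prev
            beta[j] = np.linalg.norm(z)
            if beta[j] == 0.0:
                # The Krylov subspace has dimension j + 1; stop early.
                return Q[:, :j + 1], alpha[:j + 1], beta[:j]
            Q[:, j + 1] = z / beta[j]
            q_prev, beta_prev = Q[:, j], beta[j]
        # Q_k, the diagonal alpha_1..alpha_k and off-diagonal beta_1..beta_{k-1} of T_k.
        return Q[:, :k], alpha, beta[:k - 1]

In floating point the computed q_j gradually lose orthogonality, so practical implementations typically add some form of reorthogonalization; that issue is outside the scope of this sketch.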

4.2 Relation Between Krylov Subspaces and Lanczos Algorithm

The subspace spanned by [b, Ab, A^2 b, . . . , A^{k-1} b] is called a Krylov subspace and is denoted by K_k(A, b). With the q_i computed from the Lanczos algorithm, we also see that the space spanned by [b, Ab, A^2 b, . . . , A^{k-1} b] is the space spanned by [q_1, q_2, . . . , q_k]. Since the vectors q_i are also orthonormal, they form an orthonormal basis of the Krylov subspace. As shown in [3], K_k has dimension k if and only if the Lanczos algorithm computes q_k without quitting first, i.e., without making z zero. We would like to find an algorithm to solve Ax = b using the vectors computed by k steps of the Lanczos algorithm. If we can make k smaller than n, then the algorithm will be less computationally expensive than direct methods. After k iterations of the Lanczos algorithm, we can write the matrix Q as [Q_k Q_u], where Q_k = [q_1 . . . q_k] and Q_u = [q_{k+1} . . . q_n]. Hence the matrix T can be written as

    T = Q^T AQ = [Q_k Q_u]^T A [Q_k Q_u]
      = [ Q_k^T A Q_k   Q_k^T A Q_u ]  =  [ T_k       T_{ku} ]
        [ Q_u^T A Q_k   Q_u^T A Q_u ]     [ T_{ku}^T  T_u    ]

Here T_k is k × k, T_{ku} is k × (n − k) and T_u is (n − k) × (n − k). T_k is also the projection of the matrix A onto the Krylov subspace K_k. Note that T_k is also the leading k × k submatrix of T_j for all j ≥ k.

4.3 Motivation for the Conjugate Gradient (CG) Method

Suppose A ∈ R^{n×n} is SPD and consider the function

    φ(x) = (1/2) x^T Ax − x^T b.

The gradient is ∇φ(x) = Ax − b, so it follows that x = A^{-1}b is the unique minimizer of φ. Hence minimizing φ(x) is the same as finding a solution to Ax = b. One of the simplest strategies for minimizing φ is the method of steepest descent described in [3], which uses the fact that φ decreases most rapidly in the direction of the negative gradient

    −∇φ(x_k) = b − Ax_k = r_k,

the residual. But if the condition number of A, κ_2(A) = λ_max/λ_min (i.e., the ratio of the largest and the smallest eigenvalues of A), is large, then the speed of convergence is very slow, because the level curves of φ are very elongated hyperellipsoids. In steepest descent, we are forced to traverse back and forth across the curve, thus slowing progress towards the minimum point. Hence, although the algorithm is easily understood, it has very bad convergence properties, as it often chooses to minimize in the same few directions over and over again. To avoid this pitfall, we consider successive minimization of φ along a set of directions p_1, p_2, . . . which will be A-conjugate or A-orthogonal. A matrix P is called A-conjugate if P^T AP is a diagonal matrix, i.e., p_i^T A p_j = 0 when i ≠ j, where p_i is a column vector of P. These vectors will be called conjugate gradients, and we will show how to find them in the next section.
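Before moving on, here is a minimal steepest descent sketch for later comparison with CG (NumPy assumed; the exact line-search step length r^T r / (r^T A r), which minimizes φ along the direction r, is a standard fact not derived in this report).

    import numpy as np

    def steepest_descent(A, b, tol=1e-8, max_iter=10000):
        x = np.zeros_like(b, dtype=float)
        r = b - A @ x                      # residual = negative gradient of phi at x
        for _ in range(max_iter):
            if np.linalg.norm(r) < tol:
                break
            Ar = A @ r
            step = (r @ r) / (r @ Ar)      # exact minimizer of phi along the direction r
            x = x + step * r
            r = r - step * Ar
        return x

For ill-conditioned A the number of iterations grows roughly in proportion to κ_2(A), which is the slow zig-zagging behavior described above.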

4.4 Conjugate Gradient Method as a Krylov Subspace Method

Since we only have k vectors after k steps of the Lanczos algorithm, we would like to have the solution of Ax = b in the Krylov subspace K_k spanned by these vectors. Thus, if x_k is the best approximate solution, then

    x_k = Σ_{j=1}^{k} z_j q_j = Q_k z   for some z = (z_1, . . . , z_k)^T.

Here we define "best", when A is a SPD matrix, by the x_k that minimizes the A^{-1}-norm of the residual,

    ||r||_{A^{-1}} = (r^T A^{-1} r)^{1/2},

where r is the residual after k iterations. The way to minimize this norm is what Hestenes and Stiefel considered in 1952 [5], and they called their method

the Conjugate Gradient Method. Before we derive the CG method, we prove an important result that provides the solution to our problem in a nonconstructive way.

Theorem 4.1 (First CG Theorem). Using T_k obtained from the Lanczos method, if x_k = Q_k T_k^{-1} e_1 ||b||_2, where e_1 is the first column of the identity matrix, then Q_k^T r_k = 0, where r_k = b − Ax_k. This choice of x_k minimizes ||r_k||_{A^{-1}} over all x_k ∈ K_k. Moreover, r_k = ±||r_k||_2 q_{k+1}, i.e., r_k is parallel to q_{k+1}.

Proof. To make the proof less cluttered with subscripts, denote x_k, r_k, and Q_k as x, r, and Q, respectively. We first prove that Q^T r = 0, i.e., r is orthogonal to the k-th Krylov subspace. We know

    Q^T r = Q^T (b − Ax) = Q^T b − Q^T Ax.                           (23)

By the Lanczos algorithm, the first column of Q is b/||b||_2 and the rest of the columns are orthogonal to b, so Q^T b can be written as e_1 ||b||_2 and (23) as

    Q^T r = e_1 ||b||_2 − Q^T A (Q T^{-1} e_1 ||b||_2)
          = e_1 ||b||_2 − (Q^T AQ) T^{-1} e_1 ||b||_2
          = e_1 ||b||_2 − T T^{-1} e_1 ||b||_2
          = 0.

This proves that the residual vector is orthogonal to the Krylov subspace.

We now prove that x minimizes ||r||_{A^{-1}}. Let x̂ = x + Qz be a vector in K_k that minimizes this norm and let r̂ = b − Ax̂ = b − A(x + Qz) = r − AQz be the corresponding residual vector. We prove the assertion by proving that x = x̂. We have

    ||r̂||^2_{A^{-1}} = r̂^T A^{-1} r̂
                     = (r − AQz)^T A^{-1} (r − AQz)
                     = r^T A^{-1} r − (AQz)^T A^{-1} r − r^T A^{-1} (AQz) + (AQz)^T A^{-1} (AQz)
                     = ||r||^2_{A^{-1}} − z^T Q^T r − r^T Qz + ||AQz||^2_{A^{-1}}
                     = ||r||^2_{A^{-1}} + ||AQz||^2_{A^{-1}},

as Q^T r = 0. For r̂ to minimize the norm, we need AQz = 0, which happens if and only if z = 0, since A is nonsingular and Q has full column rank.

We revert to the subscript notation to prove the last assertion of the theorem, that r_k and q_{k+1} are parallel. Since x_k ∈ K_k, we must have r_k = b − Ax_k ∈ K_{k+1}. This implies r_k is a linear combination of the columns of Q_{k+1}, which span K_{k+1}. Since Q_k^T r_k = 0, q_{k+1} is the only column which is not orthogonal to r_k. Thus, we have r_k and q_{k+1} as parallel vectors.

Since T_k computed from the Lanczos algorithm is a SPD tridiagonal matrix, we can use the Cholesky decomposition to get T_k = L_k D_k L_k^T, where L_k is lower triangular and has ones on the diagonal. Since T_k is tridiagonal, so is L_k. The x_k in the First CG Theorem can be written as

    x_k = Q_k T_k^{-1} e_1 ||b||_2
        = Q_k L_k^{-T} D_k^{-1} L_k^{-1} e_1 ||b||_2
        = P̃_k y_k,

where P̃_k ≡ Q_k L_k^{-T} and y_k ≡ D_k^{-1} L_k^{-1} e_1 ||b||_2. To find x_k we need to find recurrence relations for P̃_k and y_k. We first prove that the matrix P̃_k is A-conjugate.

Lemma 4.2. P̃_k is A-conjugate.


Proof. Compute

    P̃_k^T A P̃_k = (Q_k L_k^{-T})^T A (Q_k L_k^{-T})
                 = L_k^{-1} (Q_k^T A Q_k) L_k^{-T}
                 = L_k^{-1} T_k L_k^{-T}
                 = L_k^{-1} (L_k D_k L_k^T) L_k^{-T}
                 = D_k,

where D_k is diagonal.

Before continuing, we note that L_k is unit lower bidiagonal with subdiagonal entries l_1, . . . , l_{k-1}, and that

    T_k = L_k D_k L_k^T = [ L_{k-1}            0 ] [ D_{k-1}    0  ] [ L_{k-1}^T   l_{k-1} e_{k-1} ]
                          [ l_{k-1} e_{k-1}^T  1 ] [ 0         d_k ] [ 0           1               ],

since the leading (k − 1) × (k − 1) submatrix of the tridiagonal matrix T_k is T_{k-1}, which also implies that L_{k-1} is the leading (k − 1) × (k − 1) submatrix of L_k.

The recurrence for P̃_k can be derived as follows. We know L_{k-1}^T is upper triangular, so its inverse is also upper triangular and forms the leading (k − 1) × (k − 1) submatrix of L_k^{-T}. Hence we can write

    P̃_k = Q_k L_k^{-T} = [ Q_{k-1}  q_k ] [ L_{k-1}^{-T}  * ] = [ Q_{k-1} L_{k-1}^{-T}   p̃_k ] = [ P̃_{k-1}   p̃_k ]
                                           [ 0             1 ]

(the entry marked * is not needed below). Thus P̃_{k-1} is identical to the leading n × (k − 1) submatrix of P̃_k. Hence, to find p̃_k, we use the relation P̃_k L_k^T = Q_k and equate the k-th columns on both sides to get

    p̃_k = q_k − l_{k-1} p̃_{k-1}.                                    (24)

The recurrence for y_k can be derived as follows. Let y_k = (η_1, . . . , η_k)^T. We first show that y_{k-1} is identical to the leading k − 1 entries of y_k:

    y_k = D_k^{-1} L_k^{-1} e_1 ||b||_2
        = [ D_{k-1}^{-1}    0      ] [ L_{k-1}^{-1}  0 ] e_1 ||b||_2
          [ 0           d_k^{-1}   ] [ *             1 ]
        = [ D_{k-1}^{-1} L_{k-1}^{-1} e_1 ||b||_2 ]  =  [ y_{k-1} ]
          [ η_k                                   ]     [ η_k     ].

Hence, our recurrence relation for x_k is

    x_k = P̃_k y_k = [ P̃_{k-1}   p̃_k ] [ y_{k-1} ] = P̃_{k-1} y_{k-1} + p̃_k η_k = x_{k-1} + p̃_k η_k.       (25)
                                       [ η_k     ]

Now we have a recurrence relation for the residual:

    r_k = b − Ax_k = b − (Ax_{k-1} + Ap̃_k η_k) = r_{k-1} − Ap̃_k η_k.                                      (26)

From the First CG Theorem, we know that q_k and r_{k-1} are parallel; choosing the sign of q_k appropriately, we may take

    q_k = r_{k-1} / ||r_{k-1}||_2.

Now define p_k parallel to p̃_k such that p̃_k = p_k / ||r_{k-1}||_2; we call these vectors the search directions for finding the solution to the linear system. Recurrence relations for r_k, x_k and p_k can be written as

    r_k = r_{k-1} − (η_k / ||r_{k-1}||_2) Ap_k ≡ r_{k-1} − ν_k Ap_k,                                       (27)
    x_k = x_{k-1} + (η_k / ||r_{k-1}||_2) p_k ≡ x_{k-1} + ν_k p_k,                                         (28)
    p_k = r_{k-1} + μ_k p_{k-1},   with μ_k = − (||r_{k-1}||_2 / ||r_{k-2}||_2) l_{k-1},                   (29)

where we have to find formulae for ν_k and μ_k. We multiply (29) by p_k^T A, which gives, since p_k and p_{k-1} are A-conjugate (Lemma 4.2),

    p_k^T A p_k = p_k^T A r_{k-1} + μ_k p_k^T A p_{k-1} = p_k^T A r_{k-1} = r_{k-1}^T A p_k,               (30)

as A is SPD. Now multiply (27) by r_{k-1}^T to see

    ν_k = r_{k-1}^T r_{k-1} / (r_{k-1}^T A p_k) = r_{k-1}^T r_{k-1} / (p_k^T A p_k),                       (31)

using Theorem 4.1 and (30). Similarly, to find a formula to calculate μ_k, we use (27) and multiply it by r_k^T, and we get a different expression for ν_k as

    ν_k = − r_k^T r_k / (r_k^T A p_k).                                                                     (32)

Using a derivation similar to that of (31), we multiply both sides of (29) by p_{k-1}^T A to get

    μ_k = − p_{k-1}^T A r_{k-1} / (p_{k-1}^T A p_{k-1}).                                                   (33)

Now, writing both (31) and (32) for ν_{k-1} and setting one equal to the other, we get

    r_{k-2}^T r_{k-2} / (p_{k-1}^T A p_{k-1}) = − r_{k-1}^T r_{k-1} / (r_{k-1}^T A p_{k-1}).               (34)

By rearranging terms in (34), we get

    − p_{k-1}^T A r_{k-1} / (p_{k-1}^T A p_{k-1}) = r_{k-1}^T r_{k-1} / (r_{k-2}^T r_{k-2}).               (35)

From (35) and (33), we get the simple recurrence relation

    μ_k = r_{k-1}^T r_{k-1} / (r_{k-2}^T r_{k-2}).                                                         (36)

So, we have simple recurrence relations for r_k, x_k and p_k, which lead to the implementation of the Conjugate Gradient algorithm as follows.

Conjugate Gradient Algorithm.
k = 0; x_0 = 0; r_0 = b; p_1 = b;
while (||r_k||_2 > Tolerance) repeat
    k = k + 1
    z = Ap_k
    ν_k = (r_{k-1}^T r_{k-1}) / (p_k^T z)
    x_k = x_{k-1} + ν_k p_k
    r_k = r_{k-1} − ν_k z
    μ_{k+1} = (r_k^T r_k) / (r_{k-1}^T r_{k-1})
    p_{k+1} = r_k + μ_{k+1} p_k
end loop

The efficiency of the CG method can be seen from the fact that it only requires storage of the four vectors r, x, p, and z, and we only need to do one matrix-vector product, two inner products, three SAXPY operations (i.e., adding a multiple of one vector to another), and a few scalar operations at each step.
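The algorithm above maps directly onto NumPy. The sketch below is an illustrative implementation (function name, tolerance, and the optional iteration cap are assumptions), using the recurrences (31) and (36) for ν and μ.

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-8, max_iter=None):
        n = len(b)
        max_iter = max_iter if max_iter is not None else n   # at most n steps in exact arithmetic
        x = np.zeros(n)
        r = b.copy()                       # r_0 = b, since x_0 = 0
        p = b.copy()                       # p_1 = b
        rs_old = r @ r
        for _ in range(max_iter):
            if np.sqrt(rs_old) < tol:
                break
            z = A @ p
            nu = rs_old / (p @ z)          # (31)
            x = x + nu * p
            r = r - nu * z
            rs_new = r @ r
            mu = rs_new / rs_old           # (36), shifted by one index
            p = r + mu * p
            rs_old = rs_new
        return x

Only one matrix-vector product (A @ p) and two inner products are needed per step, matching the operation count noted above.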

4.5 Convergence Properties of the CG Method

In this section, we find expressions that characterize the convergence of the CG method, and we discover that the distribution of the eigenvalues, hence the condition number of the matrix, plays an important role in the speed of convergence. We transform the convergence problem into the problem of finding approximating polynomials, namely Chebyshev polynomials. Properties of Chebyshev polynomials can be found in [2].

Assuming infinite arithmetic precision, we know from the previous section that r_i^T r_j = 0 for i ≠ j and that the search directions are mutually A-conjugate. We conclude that CG terminates after n iterations because r_n = 0. Each step of the CG method minimizes the A^{-1}-norm of the residual over all possible solutions x_k ∈ K_k(A, b) = span[b, Ab, . . . , A^{k-1}b]. In other words, x_k minimizes

    ||b − Az||^2_{A^{-1}} = (b − Az)^T A^{-1} (b − Az) = (x − z)^T A (x − z) ≡ f(z),

where z is in K_k(A, b). Now, z can be written as

    z = Σ_{j=0}^{k-1} γ_j A^j b = P_{k-1}(A) b = P_{k-1}(A) A x,

where P_{k-1} is a polynomial of degree k − 1. So,

    f(z) = [(I − P_{k-1}(A)A)x]^T A [(I − P_{k-1}(A)A)x] = (q_k(A)x)^T A (q_k(A)x) = x^T q_k(A) A q_k(A) x,

where q_k(ξ) ≡ 1 − ξ P_{k-1}(ξ) is a degree-k polynomial in ξ with q_k(0) = 1. Thus, for f(x_k) we have

    f(x_k) = min_{z ∈ K_k} f(z) = min_{q_k ∈ Q̃_k} x^T q_k(A) A q_k(A) x,                                   (37)

where Q̃_k is the set of all degree-k polynomials q_k such that q_k(0) = 1. Now, we use the eigendecomposition [3] of A = QΛQ^T to see

    q_k(A) A q_k(A) = q_k(QΛQ^T)(QΛQ^T)q_k(QΛQ^T) = Qq_k(Λ)Q^T QΛQ^T Qq_k(Λ)Q^T = Qq_k(Λ)Λq_k(Λ)Q^T,

since Q^T Q = I. Thus, with y = Q^T x,
    f(x_k) = min_{z ∈ K_k} f(z) = min_{q_k ∈ Q̃_k} Σ_{i=1}^{n} y_i^2 λ_i (q_k(λ_i))^2
           ≤ min_{q_k ∈ Q̃_k} max_{λ_i ∈ λ(A)} (q_k(λ_i))^2 Σ_{i=1}^{n} y_i^2 λ_i
           = min_{q_k ∈ Q̃_k} max_{λ_i ∈ λ(A)} (q_k(λ_i))^2 f(x_0),

since x_0 = 0 implies f(x_0) = x^T Ax = y^T Λy = Σ_{i=1}^{n} y_i^2 λ_i, where λ(A) is the set of all eigenvalues of A. This can be rearranged to

    f(x_k)/f(x_0) ≤ min_{q_k ∈ Q̃_k} max_{λ_i ∈ λ(A)} (q_k(λ_i))^2,

which implies

    ||r_k||_{A^{-1}} / ||r_0||_{A^{-1}} ≤ min_{q_k ∈ Q̃_k} max_{λ_i ∈ λ(A)} |q_k(λ_i)|.                      (38)

If all eigenvalues are evenly distributed in the interval [λ_min, λ_max], a reasonable approach is to minimize the maximum of |q_k| on this interval. This is achieved by the polynomial

    q_k(ξ) = T_k( (λ_max + λ_min − 2ξ) / (λ_max − λ_min) ) / T_k( (λ_max + λ_min) / (λ_max − λ_min) ),

where T_k(ξ) = (1/2)[ (ξ + √(ξ^2 − 1))^k + (ξ − √(ξ^2 − 1))^k ] is the Chebyshev polynomial. By the properties of Chebyshev polynomials, we know that |T_k(ξ)| ≤ 1 when |ξ| ≤ 1. Thus,

    ||r_k||_{A^{-1}} / ||r_0||_{A^{-1}} ≤ 1 / | T_k( (λ_max + λ_min) / (λ_max − λ_min) ) |
                                        = 1 / | T_k( (κ + 1) / (κ − 1) ) |
                                        = 1 / | T_k( 1 + 2/(κ − 1) ) |,                                      (39)

where κ = λ_max/λ_min is the condition number of A. So, if κ is near 1, then (39) is small and convergence is very fast. If κ is very large, then (39) is close to 1, hence the convergence is slow. Nevertheless, convergence is always achieved after n iterations.
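The bound (39) is easy to evaluate numerically. A small sketch (NumPy assumed) compares it with the commonly quoted weaker bound 2((√κ − 1)/(√κ + 1))^k, which follows from keeping only the growing term of T_k; this comparison is an illustration, not part of the report.

    import numpy as np

    def chebyshev_T(k, x):
        # T_k(x) = 0.5 * ((x + sqrt(x^2 - 1))^k + (x - sqrt(x^2 - 1))^k), valid for x >= 1.
        s = np.sqrt(x * x - 1.0)
        return 0.5 * ((x + s) ** k + (x - s) ** k)

    for kappa in (10.0, 100.0, 1000.0):
        for k in (5, 20, 50):
            bound_39 = 1.0 / chebyshev_T(k, 1.0 + 2.0 / (kappa - 1.0))
            simple = 2.0 * ((np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)) ** k
            print(f"kappa={kappa:g}  k={k}  bound (39): {bound_39:.3e}  simplified: {simple:.3e}")

Both quantities decrease like ((√κ − 1)/(√κ + 1))^k, so the number of iterations needed to reach a fixed tolerance grows roughly like √κ, which motivates the preconditioning of the next section.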

4.6 Preconditioning

We know that the convergence of the CG method depends on the condition number of the coefficient matrix. Preconditioning is a method that improves the convergence rate by lowering the condition number and/or increasing the eigenvalue clustering. This is achieved by solving the modified problem

    M^{-1}Ax = M^{-1}b,                                              (40)

where M is a SPD matrix which is easy to invert. If κ(M^{-1}A) < κ(A) or the eigenvalues of M^{-1}A are more clustered than those of A, then we will get a higher rate of convergence. Although M is a SPD matrix, M^{-1}A may not be SPD, which is required for the CG method. Thus, the solution is to consider factorizing M = QQ^T, which can be obtained by Cholesky factorization, and then solve the linear system Q^{-T}Q^{-1}Ax = Q^{-T}Q^{-1}b, which is equivalent to

    Q^{-1}AQ^{-T}x̂ = Q^{-1}b,                                        (41)

where x̂ = Q^T x. We claim (40) and (41) have the same solution and the coefficient matrices also have the same eigenvalues and eigenvectors. For the proof, let u be an eigenvector of M^{-1}A with λ as an eigenvalue. Then we see that

    Q^{-1}AQ^{-T}(Q^T u) = Q^{-1}Au = Q^T Q^{-T}Q^{-1}Au = Q^T (QQ^T)^{-1}Au = Q^T M^{-1}Au = λ Q^T u.

The preconditioned CG algorithm is derived by replacing the system matrix A with Q^{-1}AQ^{-T}, x with Q^T x and b with Q^{-1}b, along with substitutions of variables so that multiplication with the matrices Q and Q^T is avoided. We also do not need to form the preconditioning matrix M explicitly if we are able to apply M^{-1} to a vector implicitly. The Preconditioned CG algorithm is as follows.

Preconditioned Conjugate Gradient Algorithm.
k = 0; x_0 = 0; r_0 = b; p_1 = M^{-1}b; y_0 = M^{-1}r_0;
while (||r_k||_2 > Tolerance) repeat
    k = k + 1
    z = Ap_k
    ν_k = (y_{k-1}^T r_{k-1}) / (p_k^T z)
    x_k = x_{k-1} + ν_k p_k
    r_k = r_{k-1} − ν_k z
    y_k = M^{-1}r_k
    μ_{k+1} = (y_k^T r_k) / (y_{k-1}^T r_{k-1})
    p_{k+1} = y_k + μ_{k+1} p_k
end loop

Well-Known Preconditioners. The most commonly used preconditioners are Jacobi, incomplete factorization, block preconditioners, and domain decomposition [7]. Jacobi preconditioning uses the matrix M = diag(A). It has been shown to be useful if the diagonal elements differ considerably from one another; it is equivalent to scaling the quadratic form along the coordinate axes. Incomplete factorization preconditioning uses an approximation to A which is easy to invert. The goal is to do a fast but incomplete factorization of A; an example is to calculate only those elements of a Cholesky factor in places that are already non-zero in A. Block preconditioners exploit the fact that most algorithms, expressed in terms of the scalar matrix elements, have analogues involving block matrices. An example of a block preconditioner is a block-Jacobi preconditioner, which uses entire matrix blocks on its diagonal, which then need to be inverted to apply M^{-1}. Domain decomposition is used when A represents an equation on a physical region which can be broken up into disjoint (or slightly overlapping) subregions, and the equation can be solved on each subregion independently. Intuitively, preconditioning is an attempt to stretch the quadratic form to make it appear more spherical so that the eigenvalues are close to each other, and this often significantly improves the performance of CG.
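A Python sketch of the preconditioned algorithm, written so that only the action of M^{-1} on a vector is needed; the illustration at the bottom uses the Jacobi preconditioner M = diag(A). Function names and defaults are assumptions, not from the report.

    import numpy as np

    def preconditioned_cg(A, b, apply_Minv, tol=1e-8, max_iter=None):
        n = len(b)
        max_iter = max_iter if max_iter is not None else n
        x = np.zeros(n)
        r = b.copy()
        y = apply_Minv(r)                  # y_0 = M^{-1} r_0
        p = y.copy()                       # p_1 = M^{-1} b
        ry_old = r @ y
        for _ in range(max_iter):
            if np.linalg.norm(r) < tol:
                break
            z = A @ p
            nu = ry_old / (p @ z)
            x = x + nu * p
            r = r - nu * z
            y = apply_Minv(r)              # only the action of M^{-1} is required
            ry_new = r @ y
            mu = ry_new / ry_old
            p = y + mu * p
            ry_old = ry_new
        return x

    # Jacobi preconditioning: M = diag(A), so applying M^{-1} is a componentwise division.
    # x = preconditioned_cg(A, b, lambda r: r / np.diag(A))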


References
[1] A. K. Cline, Graduate class lecture notes in numerical linear algebra, Dept. of Computer Sciences, Univ. of Texas, Austin, Texas, 2001.
[2] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, Pennsylvania, 1997.
[3] G. H. Golub and C. F. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, Maryland, 1991.
[4] P. Halmos, Finite Dimensional Vector Spaces, Van Nostrand, New York, 1958.
[5] M. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Natl. Bur. Stand., 49 (1952), pp. 409-436.
[6] J. M. Ortega, Numerical Analysis: A Second Course, Academic Press, New York, 1972.
[7] L. N. Trefethen and D. Bau III, Numerical Linear Algebra, SIAM, Philadelphia, Pennsylvania, 1997.


Vita
Prabhat K. Jha was born in the village of Sisautiya in Nepal on May 12, 1976, the son of the late Mr. Vinod K. Jha and Mrs. Renu Jha. He graduated from Siddharth Vanashthali Science College, Kathmandu, Nepal in 1996 and received a B.S. in Computer Sciences and Mathematics from Fairmont State College, West Virginia, USA in 2001. He entered the Mathematics department of The University of Texas at Austin in August 2001, where he accumulated a wealth of graduate school experience apart from working as a teaching assistant.

Permanent Address: V.D.C. Sisautiya 1 Sarlahi, Nepal

This report was typed by the author.

