You are on page 1of 6

(C) 1992 Society for Industrial and Applied Mathematics

SIAM J. MATRIX ANAL. APPL.


Vol. 13, No. 1, pp. 41-45, January 1992

OO5

ON THE SUM OF THE LARGEST EIGENVALUES


OF A SYMMETRIC MATRIX*
MICHAEL L. OVERTON]"

AND

ROBERT S.

Dedicated to Gene Golub on the occasion

WOMERSLEY:I:

of his 60th birthday

Abstract. The sum of the largest k eigenvalues of a symmetric matrix has a well-known extremal property
that was given by Fan in 1949 [Proc. Nat.Acad. Sci., 35 (1949), pp. 652-655 ]. A simple proof of this property,
which seems to have been overlooked in the vast literature on the subject and its many generalizations, is
discussed. The key step is the observation, which is neither new nor well known, that the convex hull of the set
of projection matrices of rank k is the set of symmetric matrices with eigenvalues between 0 and and summing
to k. The connection with the well-known Birkhoff theorem on doubly stochastic matrices is also discussed.
This approach provides a very convenient characterization for the subdifferential of the eigenvalue sum, described
in a separate paper.

Key words, eigenvalue sum, convex hull, projection matrix, Fans theorem
AMS(MOS) subject classifications. 15A42, 15A45

Let A be an n by n real symmetric matrix, with eigenvalues


and a corresponding orthonormal set of eigenvectors ql,

where A Diag (X1,


was proved by Fan [Fan].
THEOREM 1.

qn,

thus

QTQ=I
A=QAQ r
X) and Q [ql, "", qn]. In 1949, the following theorem
k

max tr X tAX.
xTX=Ik

Here Ik is the identity matrix of order k, and hence X is a matrix whose columns are k
orthonormal vectors in n. (All matrices are assumed to be real, but extension to the
case where A is complex Hermitian is straightforward.)
In the case k l, the theorem reduces to the Rayleigh principle; in the case k n,
it states only that the trace of a square matrix is the sum of its eigenvalues. Cases <
k < n are natural generalizations of the Rayleigh principle and are also reminiscent of
the Courant-Fischer theorem, which first appeared in 1905 [Fis]. (The Courant and
Fischer versions differ in the infinite-dimensional case.) The Courant-Fischer theorem
may be stated succinctly as follows.
THEOREM 2.

Xk

max min v rX rAXv.

xTx= Ik vTv=

Received by the editors March 21, 1991; accepted for publication (in revised form) July 15, 1991.

f Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New
York, New York 10012 (overton@cs.nyu.edu). The work of this author was supported in part by National
Science Foundation grant CCR-9101640.
School of Mathematics, University of New South Wales, Kensington, New South Wales, Australia
(rsw@hydra.maths.oz.au). The work of this author was supported in part by U.S. Department of Energy contract
DEFG0288ER25053 while the author was visiting New York University.

41

42

MICHAEL L. OVERTON AND ROBERT S. WOMERSLEY

For both Theorems

and 2, it is immediately clear that the fight-hand side is greater


qk]. The proof of the Courant-Fischer
q,,
than or equal to the left by taking X
theorem is completed by the following argument: for any X, take v to be orthogonal to
the first k- rows of QrX; then vrX rAXv <= Xk. Completing the proof of Fans theorem
is somewhat harder, however. Clearly, letting Y QrX, the result is equivalent to the
following:
k

] Xi

max tr
yTy Ik

YrAY.

Since A is diagonal,
k

tr Y rA Y=

(2)

E E Xi y},
j=li=l

and one might suppose the rest of the proof to be straightforward. Only a few lines of
inequalities are in fact required, but deriving these is not a completely trivial exercise;
for the details, see Fans original proof. Obtaining Theorem as a consequence of Theorem
2 does not seem to be any easier. Another approach uses properties of doubly stochastic
matrices and will be described below. Therefore, although it is hard to imagine that
Rayleigh, Fischer, or Courant would have been surprised by Fans result, it is quite
plausible that they were not aware of it. Actually, Marshall and Olkin [MO point out
that Theorem is a special case of a much more general but less well known result of
von Neumann dating to 1937 [vN]:
ffiri

max

uTu InVTV= In

tr UBVC,

where ffl >->-- fin and r, >- >= rn are, respectively, the singular values of the n by
n nonsymmetric matrices B and C. See [MO, Chap. 20 for the proof.
There is a vast literature on various inequalities for sums and products of eigenvalues
of symmetric matrices; see particularly [BB, Chap. 2 ], [Bel, Chap. 8 ], [MM, Part II],
[Fri], and references therein. However, we are not aware that any of the many results
available in the literature are particularly relevant to the discussion here, except as
noted below.
The purpose of this note is to describe an easy but interesting result, which then
leads to a trivial proof of Fans theorem. This result, while not new, is so beautifully
simple that its obscurity is surprising.
THEOREM 3. Let

1 { yyr. yry= I }
and

ft2

W: W=

Wr, tr W=k,O<= W<=I}.

Here Y has dimension n by k, so that YY r is a projection matrix of order n and rank k,


and W has dimension n by n, with the last condition meaning that W and I W are
both positive semidefinite. Then ft2 is the convex hull of ft,, and 1 is the set of extreme
points

of ft2.

Theorem

(3)

then follows as a consequence, because


tr

YrAY=tr YYrA,

THE SUM OF THE LARGEST EIGENVALUES

43

and maximizing this linear function over Y Y r e "l is equivalent to maximizing it over
W e ft2. Since
tr WA

the fact that the maximum value is 2;


IV. Equivalently,

X Wii

i=1

ki follows trivially from the conditions on

Xi max tr VA,

(4)

Vfl2

2 if and only if W Q rVQ ft2.


We derived Theorem 3 before we found it in the literature. Our proof is as follows.

as V e

The fact that any convex combination of elements of ftl lies in 2 is immediate. Also,
using the spectral decomposition of W, which has eigenvalues lying between 0 and
that sum to k, it is clear that any element of 2 with rank greater than k is not an extreme
point. The only candidates for extreme points, then, are those with rank k, i.e., the
elements of 2. But it is not possible that some rank k elements are extreme points and
others are not, since the definition of ft2 does not in any way distinguish between different
rank k elements. Since a compact convex set must have extreme points, and is in fact
the convex hull of its extreme points, the proof is complete.
A slightly different proof of this theorem was given in 1971 by Fillmore and Williams
[FW ], a paper whose existence was recently brought to our attention by H. Woerdeman
and C.-K. Li. The paper is primarily concerned with the numerical range of a matrix,
which in the symmetric case is simply the line segment [X, ),1 ], together with generalizations of this notion, and is referenced in the subsequent literature on generalized
numerical ranges, e.g., [GS], [Poo]. However, [FW] does not seem to be well known
in the general linear algebra community. We do not know of an explicit statement of
Theorem 3 that appeared before 1971. S. Friedland has pointed out that the result may
be obtained, in fact in a more general form, by using Theorem 4 below in conjunction
with the classical technique of majorization [MO] and that, furthermore, the result is
related to a 1950 theorem ofLidskii [Kat, p. 145 ]; however, such approaches to Theorem
3 are considerably more complicated than the trivial proof just given. It seems very
surprising that Theorem 3 is so little known, especially given its resemblance to the
following famous theorem.
THEOREM 4. Let 23 be the set of n by n permutation matrices, and let 4 be the set
of n by n doubly stochastic matrices, i.e., nonnegative matrices whose row sums and
column sums are one. Then 4 is the convex hull of ft3, and f3 is the set of extreme points

of 4.
This theorem is usually attributed to Birkhoff, who discovered it in 1946, although
Chvfital [Chv, Chap. 20] notes that it was given by Krnig in 1936 [Krn, p. 381]. It has
been rediscovered, reproved in various ways, and generalized by many authors; see [MO,
Chap. 2] for an extensive discussion.
Theorem 4 has also been used as the basis for proving Fans theorem, e.g., [RV,
Chap. 6 and [MO, Chap. 20 ]. The former reference proves Theorem as a consequence
of some general inequalities proved using Theorem 4; the latter gives a more direct proof
as follows. We have already noted that the maximization objective in Theorem may
be written in the forms (2) and (3); another equivalent form is

(5)

XrZe,

44

MICHAEL L. OVERTON AND ROBERT S. WOMERSLEY

where e is the k-dimensional vector with all elements equal to one, X [)kl,
)kn] T,
and Z is the matrix of dimension n by k whose elements are defined by zo y}. Since
Y has orthonormal columns, the column sums of Z are one and the row sums are less
than or equal to one. Clearly, there exists an n by n doubly stochastic matrix whose first
k columns are the columns of Z (simply extend Y to a square orthogonal matrix).
Consequently, maximizing (5) over permissible values of Z cannot give a larger value
than maximizing

XrSf

(6)

over all n by n doubly stochastic matrices S, where f is the vector with first k elements
equal to one and last n k elements equal to zero. By Theorem 4, this is equivalent to
maximizing (6) over the permutation matrices, giving an upper bound of Z
Xi for
the maximization objective and completing the proof of Theorem 1. (The last step is
actually slightly different in [MO ], using majorization instead of Theorem 4.) Note, by
the way, that not all doubly stochastic matrices can be obtained using such a construction
([MO, Chap. 2]). This does not affect the proof since, as noted at the beginning of the
discussion, the lower bound for the maximization objective is immediate.
The parallels and differences in the two proofs of Theorem given above are quite
striking. The first proof used Theorem 3, writing the maximization over the nonconvex
set ft and noting that maximizing instead over its convex hull f2 led to the desired
conclusion. The second proof used Theorem 4, writing the maximization over the convex
set "4 and noting that maximizing instead over its extreme points 23 led to the desired
conclusion. (For a third proof, see [RW].)
It is a well-known consequence of Fans theorem that, because the left-hand side of
is the pointwise maximum of the linear functions on the fight-hand side, the sum of
the first k eigenvalues of a matrix is a convex function of its elements. The same property
is also deduced from (4). We show in [OW] that formulas for the subdifferential of the
eigenvalue sum may be derived from either of these two max formulations, but that the
latter is particularly useful, for a reason related to the discussion in the preceding paragraph;
ft2 is convex, while 3 is not. It follows from either derivation that the sum of the first k
eigenvalues is a smooth function of the matrix elements, i.e., the subdifferential reduces
to a gradient, if and only if),k > Xk / 1. Our interest in this subject arose from consideration
of minimizing sums of eigenvalues of matrices that are specified only in part; see [CDW
for some applications and [OW] for further details. For the important special case k
1, see [Ove]. Finally, [OW] also addresses extremal properties of sums of the largest
eigenvalues in absolute value, giving some interesting generalizations of the results discussed here.

REFERENCES

[Fis]

E.F. BECKENBACK AND R. BELLMAN, Inequalities, Springer-Verlag, New York, 1971.


R. BELLMAN, Introduction to Matrix Analysis, Second Edition, McGraw-Hill, New York, 1970.
V. CHVTAL, Linear Programming, W. H. Freeman, New York, 1980.
J. CULLUM, W. E. DONATH, AND P. WOLFE, The minimization of certain nondifferentiable sums of
eigenvalues of symmetric matrices, Math. Programming Stud., 3 (1975), pp. 35-55.
K. FAN, On a theorem of Weyl concerning the eigenvalues of linear transformations, Proc. Nat. Acad.
Sci. U.S.A., 35 (1949), pp. 652-655.
P.A. FILLMORE AND J. P. WILLIAMS, Some convexity theorems for matrices, Glasgow Math. J., 12
1971 ), pp. I10-117.
E. FISCHER, Ober quadratische Formen mit rellen Koeffizienten, Monatsh. Math. Physik, 16 (1905),

[Fri]

S. FRIEDLAND, Convex spectral functions, Linear and Multilinear Algebra, 9 1981 ), pp. 299-316.

BB
[Bel]
[Chv]
[CDW]

[Fan
[FW]

pp. 234-249.

THE SUM OF THE LARGEST EIGENVALUES

[GS]

[Kat]

45

M. GOLDBERG AND E. G. STRAUS, Elementary inclusion relations for generalized numerical ranges,
Linear Algebra Appl., 18 (1977), pp. 1-24.
T. ITO, A Short Introduction to Perturbation Theory for Linear Operators, Springer-Verlag, New
York, 1982.

K6n

[MM
MO

[Ove]
[OW]

[Poo]

D. KONIG, Theory ofFinite and Infinite Graphs. Birkh/iuser, Boston, 1990 (translation of Theorie der
endlichen und unendlichen Graphen, Akademische Verlagsgesellschaft Leipzig, 1936).
M. MARCUS AND H. MINC, A Survey of Matrix Theory and Matrix Inequalities, Allyn and Bacon,
Boston, 1964.
A.W. MARSHALL AND I. OLKIN, Inequalities: Theory ofMajorization and Its Applications, Academic
Press, New York, 1979.
M.L. OVEr,TON, Large-scale optimization ofeigenvalues, SIAM J. Optim., 1992, to appear.
M.L. OVEgTOr ArqD R. S. WOMWgSIV, Optimality conditions and duality theory for minimizing
sums of the largest eigenvalues of symmetric matrices, Computer Science Department Report
566, New York University, New York, NY, 1991; submitted to Math. Programming.
Y.-T. PooN, Another proof of a result of Westwick, Linear and Multilinear Algebra, 9 (1980), pp. 3537.

[RW]

[RV]
[vN]

F. RENDL AND H. WOLKOWICZ, Applications ofparametric programming and eigenvalue maximization


to the quadratic assignment problem, Math. Programming, to appear.
A.W. ROBERTS AND D. E. VARBERG, Convex Functions, Academic Press, New York, 1973.
J. VON NEUMANN, Some matrix inequalities and metrization of matrix space, Tomsk Univ. Rev.,
(1937), pp. 286-300; Reprinted in John v. Neumann: Collected Works, Vol. IV, A. H. Taub,
ed., Macmillan, New York, 1962, pp. 205-219.

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

You might also like