Applied Optimization
Volume 5
Series Editors:
Panos M. Pardalos
University of Florida, U.S.A.
Donald Hearn
University of Florida, U.S.A.
The titles published in this series are listed at the end of this volume.
Interior Point Methods
of Mathematical
Programming
Edited by
Tamas Terlaky
Delft University of Technology
PREFACE xv
5 INFEASIBLE-INTERIOR-POINT ALGORITHMS
Shinji Mizuno 159
5.1 Introduction 159
5.2 An IIP Algorithm Using a Path of Centers 161
5.3 Global Convergence 164
5.4 Polynomial Time Convergence 172
5.5 An IIP Algorithm Using a Surface of Centers 175
5.6 A Predictor-corrector Algorithm 178
5.7 Convergence Properties 181
5.8 Concluding Remarks 184
REFERENCES 185
6 IMPLEMENTATION OF INTERIOR-POINT
METHODS FOR LARGE SCALE LINEAR
PROGRAMS
Erling D. Andersen, Jacek Gondzio, Csaba Meszaros,
Xiaojie Xu 189
6.1 Introduction 190
6.2 The Primal-dual Algorithm 193
6.3 Self-dual Embedding 200
6.4 Solving the Newton Equations 204
6.5 Presolve 225
6.6 Higher Order Extensions 230
6.7 Optimal Basis Identification 235
6.8 Interior Point Software 240
6.9 Is All the Work Already Done? 243
6.10 Conclusions 244
REFERENCES 245
8 COMPLEMENTARITY PROBLEMS
Akiko Yoshise 297
8.1 Introduction 297
8.2 Monotone Linear Complementarity Problems 300
8.3 Newton's Method and the Path of Centers 308
8.4 Two Prototype Algorithms for the Monotone LCP 316
8.5 Computational Complexity of the Algorithms 332
9 SEMIDEFINITE PROGRAMMING
Motakuri V. Ramana, Panos M. Pardalos 369
9.1 Introduction 369
9.2 Geometry and Duality 370
9.3 Algorithms and Complexity 377
9.4 Applications 383
9.5 Concluding Remarks 390
REFERENCES 391
The primary goal of this book is to provide an introduction to the theory of Interior
Point Methods (IPMs) in Mathematical Programming. At the same time, we try to
present a quick overview of the impact of extensions of IPMs on smooth nonlinear
optimization and to demonstrate the potential of IPMs for solving difficult practical
problems.
The Simplex Method has dominated the theory and practice of mathematical pro-
gramming since 1947 when Dantzig discovered it. In the fifties and sixties several
attempts were made to develop alternative solution methods. At that time the prin-
cipal base of interior point methods was also developed, for example in the work of
Frisch (1955), Caroll (1961), Huard (1967), Fiacco and McCormick (1968) and Dikin
(1967). In 1972 Klee and Minty made explicit that in the worst case some variants
of the simplex method may require an exponential amount of work to solve Linear
Programming (LP) problems. This was at the time when complexity theory became
a topic of great interest. People started to classify mathematical programming prob-
lems as efficiently (in polynomial time) solvable and as difficult (NP-hard) problems.
For a while it remained open whether LP was solvable in polynomial time or not.
The breakthrough resolution of this problem was obtained by Khachijan (1979). His
analysis, based on the ellipsoid method, proved that LP and some special convex
programming problems are polynomially solvable. However, it soon became clear
that in spite of its theoretical efficiency, the ellipsoid method was not a serious
competitor of the simplex method in practice.
The publication of Karmarkar's paper (1984) initiated a new research area that is
now referred to as Interior Point Methods (IPMs). IPMs for LP not only have
better polynomial complexity than the ellipsoid method, but are also very efficient
This book is divided into three parts. Part I summarizes the basic techniques,
concepts and algorithmic variants of IPMs for linear programming. Part II is devoted
to specially structured and smooth convex programming problems, while Part III
illustrates some application areas. The authors of the different chapters are all
experts in the specific areas. The content of the thirteen chapters is briefly described
below.
Chapter 2, Affine Scaling Algorithms, gives a survey of the results concerning affine
scaling algorithms, introduced and first studied by I. I. Dikin in 1967. Conceptually
these algorithms are the simplest IPMs, being based on repeatedly optimizing a
linear function on a so-called Dikin ellipsoid inside the feasible region. The affine
scaling algorithms were rediscovered after 1984, and the first implementations of
IPMs were based on these methods. Unfortunately no polynomial complexity result
is available for affine scaling methods, and it is generally conjectured that such a
result is impossible. Even to prove global convergence without any non-degeneracy
assumption is quite difficult. This chapter surveys the state of the art results in the
area.
The author, T. Tsuchiya (The Institute of Statistical Mathematics, Tokyo, Japan) is well
known as the leading expert on affine scaling methods. He has contributed to virtually all
of the important results which led to global convergence proofs without non-degeneracy
assumptions.
Chapter 5, Infeasible Interior Point Methods, discusses the (for the time being, at
least) most practical IPMs. These algorithms require extending the concept of the
central path to infeasible solutions. Infeasible IPMs generate iterates that are infea-
sible for the equality constraints, but still require that the iterates stay in the interior
of the positive orthant. Optimality and feasibility are reached simultaneously. In-
feasibility of either the primal or the dual problem is detected by divergence of the
iterates.
This chapter is written by S. Mizuno (The Institute of Statistical Mathematics, Tokyo,
Japan) who has contributed to several different areas of IPMs. He was one of the first who
proposed primal-dual methods, made significant contributions to the theory of IPMs for
complementarity problems, and is one of the most active researchers on infeasible IPMs.
Chapter 6, Implementation Issues, discusses all the ingredients that are needed for
an efficient, robust implementation of IPMs for LP. After presenting a prototype
infeasible IPM, the chapter discusses preprocessing techniques, elements and algo-
rithms of sparse linear algebra, adaptive higher order methods, initialization, and
stopping strategies. The effects of centering, cross-over and basis identification
techniques are studied. Finally some open problems are presented.
The authors, E.D. Andersen (Denmark), J. Gondzio (Poland and Switzerland), Cs. Meszaros
(Hungary) and X. Xu (China and USA), are prominent members of the new generation of
people who have developed efficient, state-of-the-art optimization software. Each one has
his own high performance IPM code, and each code has its own strong points. Andersen's
code has the most advanced basis-identification and cross-over, Gondzio's code is the best
in preprocessing and Meszaros' has the most efficient and flexible implementation of sparse
linear algebra. Xu's code is based on the skew-symmetric embedding discussed in Chapter
1, and is therefore the most reliable in detecting unboundedness and infeasibilities.
ity problems. For this work A. Yoshise, together with her coauthors (including S. Mizuno,
the author of Chapter 5), received the Lancaster prize in 1993.
Chapter 12, Interior Point Methods for Global Optimization, indicates the potential
of IPMs in global optimization. As in the case of combinatorial optimization, most
problems in global optimization are NP-hard. Thus to expect polynomiality results
for such problems is not realistic. However, significant improvement in the quality
of the obtained (possibly) local solution and improved solution time are frequently
achieved. The paper presents potential reduction and affine scaling algorithms and
lower bounding techniques for general nonconvex quadratic problems, including some
classes of combinatorial optimization problems. It is easy to see that any nonlinear
problem with polynomial constraints can be transformed to such quadratic problems.
The authors P.M. Pardalos (University of Florida, Gainesville) and M.G.C. Resende (AT&T
Research) are recognized experts in optimization. Pardalos is known as a leading expert
in the field of global optimization and has written and/or edited over ten books in recent
years. Resende is responsible for pioneering work in implementing IPMs for LP, network
programming, combinatorial and global optimization problems.
Chapter 13, Interior Point Approaches for the VLSI Placement Problem, introduces
the reader to an extremely important application area of optimization. Several
optimization problems arise in VLSI (Very Large Scale Integration) chip design.
Here two new placement models are discussed that lead to sparse LP and sparse
convex quadratic programming problems respectively. The resulting problems are
solved by IPMs. Computational results solving some real placement problems are
presented.
A. Vannelli and his Ph.D. students A. Kennings and P. Chin are working at the Electrical
Engineering Department of the University of Waterloo, Waterloo, Canada. Vannelli is known
for his devoted pioneering work on applying exact optimization methods in VLSI design.
Acknowledgements
I would like to thank my close colleagues D. den Hertog, B. Jansen, E. de Klerk,
T. Luo, H. van Maaren, J. Mayer, A.J. Quist, C. Roos, J. Sturm, J.-Ph. Vial, J.P.
Warners, S. Zhang for their help and continuous support. These individuals have
provided countless useful discussions in the past years, have helped me to review
the chapters of this book, and have helped me with useful comments of all sorts. I
am also grateful to all the authors of this book for their cooperation and for their
excellent work, to John Martindale and his assistants (Kluwer Academic Publishers)
for their kind practical help, and to P. Pardalos, the managing editor of the series
"Applied Optimization" for his deep interest in modern optimization methods and
his constant encouragement. Professor Emil Klafszky (University of Technology,
Budapest, Hungary), my Ph.D. supervisor, had a profound personal influence on
my interest, taste and insight in linear and nonlinear programming. Without this
intellectual impulse I would probably never have become an active member of the
mathematical programming community. Finally, but most of all, I thank my wife
for all her love, patience, and support. Without her continuous support this book
would never have been completed.
Tamas Terlaky
May 1996,
Delft, The Netherlands
PART I
LINEAR PROGRAMMING
1
INTRODUCTION TO THE THEORY
OF INTERIOR POINT METHODS
Benjamin Jansen, Cornelis Roos,
Tamas Terlaky
Faculty of Technical Mathematics and Computer Science
Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands
ABSTRACT
We discuss the basic concepts of interior point methods for linear programming, viz., duality,
the existence of a strictly complementary solution, analytic centers and the central path with
its properties. To solve the initialization problem we give an embedding of the primal and
the dual problem in a skew-symmetric self-dual reformulation that has an obvious initial
interior point. Finally, we consider the topic of interior point based sensitivity analysis.
Key Words: theory, strictly complementary, central path, embedding, logarithmic barrier
function, potential function, sensitivity analysis
1.1.1 Introduction
It is not surprising that considering the theory of linear programming from an interior
point of view on the one hand, and the development and analysis of interior point
methods on the other, are intimately related. In fact, a similar interaction is well-
known for the simplex method. Megiddo [25] was the first to analyze the central path
in detail. Güler et al. [16] presented a complete duality theory for LP based on the
concepts of interior point methods, thereby making the field of interior point methods
for LP fully self-supporting. Kojima et al. [21] and Monteiro and Adler [28] used
Megiddo's results to propose the first primal-dual interior point method, forming
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 3-34.
© 1996 Kluwer Academic Publishers.
the basis for high-standard interior point codes such as CPLEX and OSL. The important
results in the theory of linear programming are weak and strong duality and the
existence of a strictly complementary solution (Goldman-Tucker's theorem [12]). In
this chapter we will derive these results using a skew-symmetric self-dual embedding
of the primal and the dual problem (the importance of self-duality was already
recognized in the early days of LP, e.g. Tucker [35]). An analogous reformulation
was proposed by Ye et al. [38] for a computational reason: the embedding allows an
obvious interior feasible point that need not be feasible to the original primal and
dual problems. Hence, a standard interior point method could be applied to it to derive
the best known complexity bound for an infeasible start interior point method. The
approach is also computationally efficient (see Xu et al. [37]) and very effective in
discovering primal and/or dual infeasibility. The skew-symmetric embedding we use
allows for an easy analysis.
Let us first introduce some notation and state the results mentioned above. Let
$c, x \in \mathbb{R}^n$, $b \in \mathbb{R}^m$ and let $A$ be an $m \times n$ matrix. The primal LP problem in standard
format is given by
$$(P) \qquad \min_x \ \{\, c^T x \ :\ Ax = b,\ x \ge 0 \,\},$$
and its dual by
$$(D) \qquad \max_{y,s} \ \{\, b^T y \ :\ A^T y + s = c,\ s \ge 0 \,\}.$$
The sets of feasible solutions of (P) and (D) are denoted by $\mathcal{P}$ and $\mathcal{D}$ respectively.
Problem (P) is called feasible if the set $\mathcal{P}$ is nonempty; if $\mathcal{P}$ is empty then (P) is
infeasible; if there is a sequence of feasible solutions for which the objective value
goes to minus infinity then (P) is said to be unbounded; analogous statements hold
for (D). We assume throughout that $A$ has full row rank. This implies that $y$ follows
from a given feasible $s \ge 0$ in a unique way, and we may identify a feasible solution
of (D) just by $s$. It is easy to check that for any primal feasible $x$ and dual feasible
$(y, s)$ it holds that $b^T y \le c^T x$, which is weak duality. The first theorem is the main result
in the theory of LP.
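Weak and strong duality can also be observed numerically. The sketch below is an illustration only: it uses scipy.optimize.linprog (an assumed tool, not part of the text) on arbitrary example data, solving (P) and its dual (D) separately and comparing objective values.

```python
# Numerical illustration of weak and strong duality for (P) and (D).
# The data A, b, c are an arbitrary small example; scipy is an assumed tool.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-3.0, -5.0, 0.0, 0.0])

# (P): min c^T x  s.t.  Ax = b, x >= 0.
primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 4, method="highs")

# (D): max b^T y  s.t.  A^T y + s = c, s >= 0;
# written as a minimization of -b^T y subject to A^T y <= c.
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2, method="highs")

x, y = primal.x, dual.x
s = c - A.T @ y                      # dual slack vector, s >= 0
assert np.all(s >= -1e-9)
assert b @ y <= c @ x + 1e-9         # weak duality: b^T y <= c^T x
assert abs(c @ x - b @ y) < 1e-7     # strong duality at the optimum
assert np.all(x + s > 1e-9)          # this pair is even strictly complementary
```

For this data the optimal value is -14, attained at $x^* = (3, 1, 0, 0)$ with dual slack $s^* = (0, 0, 2, 1)$; note that $x^* + s^* > 0$ componentwise.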
Theorem 1.1.1 (Strong duality) For (P) and (D) one of the following alternatives holds:
(i) Both (P) and (D) are feasible and there exist $x^* \in \mathcal{P}$ and $(y^*, s^*) \in \mathcal{D}$ such that $c^T x^* = b^T y^*$;
(ii) (P) is infeasible and (D) is unbounded;
(iii) (D) is infeasible and (P) is unbounded;
(iv) Both (P) and (D) are infeasible.
Theorem 1.1.2 (Strict complementarity) If (P) and (D) are feasible then there
exist $x^* \in \mathcal{P}$ and $(y^*, s^*) \in \mathcal{D}$ such that $(x^*)^T s^* = 0$ and $x_i^* + s_i^* > 0$, $i = 1, \dots, n$.
The solution $(x^*, s^*)$ is called strictly complementary.

The strict complementarity condition implies that for each index $i$ exactly one of $x_i^*$
and $s_i^*$ is zero, while the other is positive. This result was first shown in 1956 by
Goldman and Tucker [12].
In the next sections we give an elementary proof of the above fundamental theorems,
based on interior point ideas.
Consider the skew-symmetric self-dual LP problem
$$(SP) \qquad \min_x \ \{\, a^T x \ :\ Cx \ge -a,\ x \ge 0 \,\},$$
where $C$ is an $n \times n$ skew-symmetric matrix ($C^T = -C$) and $a \ge 0$. The skew-symmetry
of $C$ implies that for every $x$
$$x^T C x = 0. \qquad (1.1)$$
The associated dual program is given by
$$(SD) \qquad \max_y \ \{\, -a^T y \ :\ C^T y \le a,\ y \ge 0 \,\},$$
with $y \in \mathbb{R}^n$. Obviously the skew-symmetry of $C$ implies that the primal and dual
feasible sets are identical. The strong duality for these problems is easy.
Lemma 1.1.3 (SP) and (SD) are feasible and for both the zero vector is an optimal
solution.

Proof: Since $a \ge 0$ the zero vector is primal and dual feasible. For each primal
feasible $x$ it holds that
$$0 = x^T C x \ge -a^T x$$
by (1.1), so $a^T x \ge 0$; analogously $a^T y \ge 0$ for each dual feasible $y$. Hence the zero
vector is an optimal solution for (SP) and also for (SD). □

Corollary 1.1.4 Let $x$ be feasible for (SP) and define $s = Cx + a$. Then $x$ is optimal
if and only if $x^T s = 0$.

Proof: By (1.1) we have
$$x^T s = x^T (Cx + a) = a^T x. \qquad (1.2)$$
The statement follows from Lemma 1.1.3. □
Observe that (SP) is trivial from a computational point of view since an optimal
solution is readily available. However, the problem is interesting from a theoretical
point of view. To complete the duality theory of the skew-symmetric self-dual
problem (SP) we need to prove the existence of a strictly complementary solution.
Since (SP) and (SD) are identical it suffices to work just with the primal problem
(SP). The feasible region of (SP) will be denoted as $SP$. So
$$SP := \{\, (x, s) \ :\ Cx - s = -a,\ x \ge 0,\ s \ge 0 \,\}.$$
The set of positive vectors in $SP$ is denoted as $SP^0$:
$$SP^0 := \{\, (x, s) \ :\ Cx - s = -a,\ x > 0,\ s > 0 \,\}.$$
The set of optimal solutions of (SP) will be denoted by $SP^*$. As a consequence of
Corollary 1.1.4 we have
$$SP^* = \{\, (x, s) \ :\ Cx - s = -a,\ x^T s = 0,\ x \ge 0,\ s \ge 0 \,\}.$$
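The defining properties of (SP) are easy to verify numerically. The following sketch uses an arbitrary skew-symmetric $C$ and nonnegative $a$ (illustrative choices, not data from the text): it checks the identity $x^T C x = 0$ and the fact that every feasible point has a nonnegative objective value, so the zero vector is optimal, as in the proof of Lemma 1.1.3.

```python
# Numerical illustration of the problem (SP): min { a^T x : Cx >= -a, x >= 0 }
# with C skew-symmetric and a >= 0.  C and a are arbitrary illustrative data.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
C = M - M.T                                  # C^T = -C
a = np.abs(rng.standard_normal(4))           # a >= 0

# (1.1): the quadratic form of a skew-symmetric matrix vanishes.
for _ in range(100):
    v = rng.standard_normal(4)
    assert abs(v @ C @ v) < 1e-10

# The zero vector is feasible (C 0 = 0 >= -a since a >= 0) and optimal:
# every feasible x has a^T x >= 0, as in the proof of Lemma 1.1.3.
for _ in range(1000):
    x = np.abs(rng.standard_normal(4))       # x >= 0
    if np.all(C @ x >= -a):                  # feasible for (SP)
        s = C @ x + a                        # slack vector of x
        assert a @ x >= -1e-10               # objective never below 0
        assert abs(x @ s - a @ x) < 1e-9     # x^T s = a^T x, since x^T C x = 0
```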
We will need the following well-known result from elementary convex analysis, see
e.g. Rockafellar [29].

Lemma 1.1.5 Let $f$ be a differentiable convex function on an open convex domain.
Then $x$ is a minimizer of $f$ if and only if $\nabla f(x) = 0$.

We will also use the following straightforward lemma from calculus, denoting
$\mathbb{R}^n_{++} = \{\, x \in \mathbb{R}^n : x > 0 \,\}$.

Lemma 1.1.6 Let $\mu > 0$ and $p \in \mathbb{R}^n_{++}$ be given. Then the function
$h(x) = p^T x - \mu \sum_{i=1}^{n} \ln x_i$, where $x \in \mathbb{R}^n_{++}$, has a unique minimizer.
Proof: Let us introduce the following notation: $h(x) = \sum_{i=1}^{n} h_i(x_i)$, where
$h_i(x_i) := p_i x_i - \mu \ln x_i$. Let
$$\bar h_i(x_i) := h_i(x_i) - \mu + \mu \ln \mu - \mu \ln p_i = \mu \left( \frac{p_i x_i}{\mu} - \ln \frac{p_i x_i}{\mu} - 1 \right).$$
It easily follows that the functions $\bar h_i(x_i)$ are strictly convex and nonnegative on their
domain $(0, \infty)$; furthermore $\bar h_i(x_i) \to \infty$ as $x_i \to 0$ or $x_i \to \infty$. Hence all the level
sets of the functions $\bar h_i(x_i)$ are bounded, and bounded away from zero. Consider a
nonempty $r$-level set $\mathcal{L} := \{\, x : h(x) \le r \,\}$ of the function $h(x)$. Note that $\mathcal{L}$ is
nonempty if we take $r := h(x^{(0)})$ for some $x^{(0)} > 0$. For $x \in \mathcal{L}$ and for each $i$, we
have
$$\bar h_i(x_i) \le \sum_{j=1}^{n} \bar h_j(x_j) = \sum_{j=1}^{n} \left( h_j(x_j) - \mu + \mu \ln \mu - \mu \ln p_j \right) \le r - \sum_{j=1}^{n} \left( \mu - \mu \ln \mu + \mu \ln p_j \right).$$
Hence each coordinate $x_i$ stays in a bounded set that is bounded away from zero, so
$\mathcal{L}$ is compact. The continuous function $h$ therefore attains its minimum on $\mathcal{L}$, and
since $h$ is strictly convex the minimizer is unique. □
For any positive number $\mu > 0$, we define the function $f_\mu : \mathbb{R}^n_{++} \to \mathbb{R}$ by
$$f_\mu(x) := a^T x - \mu \sum_{i=1}^{n} \ln x_i - \mu \sum_{i=1}^{n} \ln (C_{i\cdot}\, x + a_i),$$
where $C_{i\cdot}$ denotes the $i$th row of $C$, and the function $g_\mu : \mathbb{R}^n_{++} \times \mathbb{R}^n_{++} \to \mathbb{R}$ by
$$g_\mu(x, s) := x^T s - \mu \sum_{i=1}^{n} \ln x_i - \mu \sum_{i=1}^{n} \ln s_i.$$
Note that $f_\mu(x) = g_\mu(x, s)$ for $(x, s) \in SP^0$ with $s = Cx + a$. The
function $f_\mu$ is called the logarithmic barrier function for (SP) with barrier parameter
$\mu$. Due to (1.2) the term $a^T x$ can equally well be replaced by $x^T s$, which shows that
$g_\mu(x, s)$ is symmetric in $x$ and $s$ on $SP^0$.
Lemma 1.1.7 Let $\mu > 0$. The following two statements are equivalent:
(i) The function $f_\mu(x)$ has a (unique) minimizer;
(ii) There exist $x, s \in \mathbb{R}^n$ such that
$$Cx - s = -a, \quad x \ge 0, \quad s \ge 0, \quad Xs = \mu e. \qquad (1.4)$$
Further, if one of the statements holds then $x$ minimizes $f_\mu$ if and only if $x$ and $s$
satisfy (1.4).

Proof: First note that whenever $(x, s)$ solves (1.4), then both $x$ and $s$ are positive,
due to the second equation. So the nonnegativity conditions for $x$ and $s$ in (1.4) can
equally well be replaced by requiring that $x$ and $s$ are positive. One easily checks that
$f_\mu(x)$ is strictly convex, and hence it has at most one minimizer. Since the domain
of $f_\mu$ is open, Lemma 1.1.5 applies and it follows that $f_\mu$ has $x$ as a minimizer if and
only if $\nabla f_\mu(x) = 0$, i.e.,
$$a - \mu X^{-1} e + \mu C S^{-1} e = 0,$$
where $X = \mathrm{diag}(x)$, $S = \mathrm{diag}(s)$, $s = Cx + a$ and $e$ is the all-one vector. Substituting
$a = s - Cx$ and rearranging, this can be written as $X^{-1} S w = C w$ with
$w := S^{-1}(Xs - \mu e)$, whence $w^T X^{-1} S w = w^T C w = 0$.
Since $C$ is skew-symmetric and the matrices $X^{-1} S$ and $S^{-1}$ are positive definite and
diagonal, the last equation holds if and only if $Xs = \mu e$. This proves the lemma. □
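A solution of system (1.4), i.e. a point on the central path of (SP), can be computed by applying Newton's method to the equations $Cx - s = -a$, $Xs = \mu e$. The sketch below is an illustration under assumed data: $C$, $a$ and the starting point are arbitrary choices (with $a$ selected so that the start is strictly feasible), not an algorithm from the text.

```python
# Sketch: compute a central path point of (SP) by damped Newton on (1.4).
# C, a and the start are arbitrary illustrative data.
import numpy as np

n, mu = 3, 0.5
M = np.array([[0., 1., 2.], [3., 0., 4.], [5., 6., 0.]])
C = M - M.T                                   # skew-symmetric: C^T = -C
a = np.array([6.0, 1.0, 0.5])                 # a >= 0, makes the start feasible

x = np.ones(n)
s = C @ x + a
assert np.all(s > 0), "starting point must be strictly feasible"

for _ in range(100):
    F = np.concatenate([C @ x - s + a, x * s - mu])   # residual of (1.4)
    if np.linalg.norm(F) < 1e-12:
        break
    # Jacobian of (1.4) with respect to (x, s)
    J = np.block([[C, -np.eye(n)],
                  [np.diag(s), np.diag(x)]])
    d = np.linalg.solve(J, -F)
    dx, ds = d[:n], d[n:]
    alpha = 1.0                               # damping keeps (x, s) positive
    while np.any(x + alpha * dx <= 0) or np.any(s + alpha * ds <= 0):
        alpha *= 0.5
    x, s = x + alpha * dx, s + alpha * ds

assert np.linalg.norm(x * s - mu) < 1e-8      # Xs = mu*e: on the central path
assert np.linalg.norm(C @ x - s + a) < 1e-8   # feasibility of the pair
```

The Jacobian is nonsingular whenever $x, s > 0$: eliminating $\Delta s$ leads to a system with matrix $X^{-1}S + C$, whose symmetric part $X^{-1}S$ is positive definite.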
Now assume that the set $SP^0$ is nonempty and let $(x^{(0)}, s^{(0)}) \in SP^0$. By (1.1) we
have for any $(x, s) \in SP$
$$(x - x^{(0)})^T (s - s^{(0)}) = (x - x^{(0)})^T C (x - x^{(0)}) = 0. \qquad (1.6)$$
Property (1.6) is known as the orthogonality property and is often used in pivoting
algorithms, see Terlaky and Zhang [34]. Equivalently it holds that
$$x^T s^{(0)} + s^T x^{(0)} = x^T s + (x^{(0)})^T s^{(0)}. \qquad (1.7)$$
Theorem 1.1.8 Let $\mu > 0$. The following three statements are equivalent:
(i) $SP^0$ is nonempty, i.e., (SP) has a strictly feasible solution;
(ii) the function $f_\mu$ has a (unique) minimizer;
(iii) the system (1.4) has a solution.

Proof: The equivalence of (ii) and (iii) is already contained in Lemma 1.1.7. Earlier
we noted the obvious fact that (iii) implies (i). So it suffices to show that (i) implies
(ii). Assuming (i), let $(x^{(0)}, s^{(0)}) \in SP^0$. Due to relation (1.7), minimizing $f_\mu(x)$ over
its domain is equivalent to minimizing $g_\mu(x, s)$ over $SP^0$. So the proof will be complete if
we show that $g_\mu$ has a minimizer in $SP^0$. Note that $g_\mu$ is defined on the intersection
of $\mathbb{R}^{2n}_{++}$ and an affine space. By the proof of Lemma 1.1.6 the level sets of $g_\mu$ are
bounded, hence $g_\mu$ has a (unique) minimizer. This completes the proof. □
In the remainder of this section, we will make the basic assumption that statement
(i) of Theorem 1.1.8 holds, namely that (SP) has a strictly feasible solution.

Assumption 1.1.9 $SP$ contains a vector $(x^{(0)}, s^{(0)}) > 0$, i.e., $SP^0$ is nonempty.

For each positive $\mu$ we will denote the minimizer of $f_\mu(x)$ as $x(\mu)$, and define
$s(\mu) := C x(\mu) + a$. The set $\{\, x(\mu) : \mu > 0 \,\}$ is called the central path of (SP). We now
prove that any section ($0 < \mu \le \bar\mu$) of the central path is bounded.

Lemma 1.1.10 Let $\bar\mu > 0$. The set $\{\, (x(\mu), s(\mu)) : 0 < \mu \le \bar\mu \,\}$ is bounded.
Proof: Let $(x^{(0)}, s^{(0)}) \in SP^0$. Using the orthogonality property (1.6) and the fact
that (1.4) holds with $x(\mu)$ we get for any $i$, $1 \le i \le n$,
$$x_i(\mu)\, s_i^{(0)} \le x(\mu)^T s^{(0)} + s(\mu)^T x^{(0)} = x(\mu)^T s(\mu) + (x^{(0)})^T s^{(0)} = n\mu + (x^{(0)})^T s^{(0)}.$$
This shows that $x_i(\mu) \le \left( n\bar\mu + (x^{(0)})^T s^{(0)} \right) / s_i^{(0)}$. So the set
$\{\, x(\mu) : 0 < \mu \le \bar\mu \,\}$ is bounded. The proof for $\{\, s(\mu) : 0 < \mu \le \bar\mu \,\}$ is similar. □
Theorem 1.1.11 If Assumption 1.1.9 holds, then there exists $(x^*, s^*) \in SP^*$ such
that $x^* + s^* > 0$.

Proof: Let $\{\mu_k\}$ be a positive sequence with $\mu_k \to 0$. By Lemma 1.1.10 the sequence
$(x(\mu_k), s(\mu_k))$ is bounded, so it has a subsequence converging to some $(x^*, s^*) \in SP$;
since $(x^*)^T s^* = \lim_k n\mu_k = 0$, the pair $(x^*, s^*)$ is optimal. Denote by $\sigma(x^*)$ and
$\sigma(s^*)$ the supports $\{\, i : x_i^* > 0 \,\}$ and $\{\, i : s_i^* > 0 \,\}$. Applying the orthogonality
property (1.6) to $(x(\mu_k), s(\mu_k))$ and $(x^*, s^*)$ gives
$$x(\mu_k)^T s^* + s(\mu_k)^T x^* = x(\mu_k)^T s(\mu_k) + (x^*)^T s^*.$$
Rearranging the terms of this equality, and noting that $x(\mu_k)^T s(\mu_k) = n\mu_k$ and
$(x^*)^T s^* = 0$, we arrive at
$$\sum_{i \in \sigma(x^*)} x_i^*\, s_i(\mu_k) + \sum_{i \in \sigma(s^*)} x_i(\mu_k)\, s_i^* = n\mu_k.$$
Dividing both sides by $\mu_k$ and recalling that $x_i(\mu_k)\, s_i(\mu_k) = \mu_k$, we obtain
$$\sum_{i \in \sigma(x^*)} \frac{x_i^*}{x_i(\mu_k)} + \sum_{i \in \sigma(s^*)} \frac{s_i^*}{s_i(\mu_k)} = n.$$
Letting $k \to \infty$ along the subsequence, the first sum becomes equal to the number of
nonzero coordinates in $x^*$ and the second sum to the number of nonzero coordinates
in $s^*$. Hence $|\sigma(x^*)| + |\sigma(s^*)| = n$, and since $x_i^* s_i^* = 0$ for all $i$ we conclude that
the optimal pair $(x^*, s^*)$ is strictly complementary. □
Observe that the proof of Theorem 1.1.11 shows that the central path has a subsequence
converging to an optimal solution. This suffices for proving the existence of a
strictly complementary solution. However, it can be shown that the central path is
an analytic curve and converges itself. The limiting behavior of the central path as
$\mu \to 0$ has long been an important subject in the research on interior point methods.
In the book by Fiacco and McCormick [7] the convergence of the path to an
optimal solution is investigated for general convex programming problems. McLinden
[24] considered the limiting behavior of the path for monotone complementarity
problems and introduced the idea for the proof-technique of Theorem 1.1.11, which
was later adapted by Güler and Ye [17]. Megiddo [25] extensively investigated the
properties of the central path, which motivated Monteiro and Adler [28] and Kojima
et al. [21] for research on primal-dual methods.
Lemma 1.1.12 If Assumption 1.1.9 holds then the central path converges to a
unique primal-dual feasible pair.

Proof: The proof very much resembles that of Theorem 1.1.11. Let $\bar x$ be optimal
in (SP) and $\bar y$ in (SD), with $\bar s := C \bar y + a$, and let $(x^*, s^*)$ be the accumulation
point of the central path as defined in Theorem 1.1.11. It easily follows that
$$\sum_{i \in \sigma(x^*)} \frac{\bar x_i}{x_i^*} + \sum_{i \in \sigma(s^*)} \frac{\bar s_i}{s_i^*} = n.$$
By the arithmetic-geometric mean inequality,
$$\left( \prod_{i \in \sigma(x^*)} \frac{\bar x_i}{x_i^*} \prod_{i \in \sigma(s^*)} \frac{\bar s_i}{s_i^*} \right)^{1/n} \le \frac{1}{n} \left( \sum_{i \in \sigma(x^*)} \frac{\bar x_i}{x_i^*} + \sum_{i \in \sigma(s^*)} \frac{\bar s_i}{s_i^*} \right) = 1.$$
This implies that $x^*$ maximizes the product $\prod_{i \in \sigma(x^*)} x_i$ and $s^*$ maximizes the
product $\prod_{i \in \sigma(s^*)} s_i$ over the optimal set. Hence the central path of (SP) has a unique
limit point. □
The proof of the lemma shows that the limit point of the central path solves an
optimization problem over the optimal set. Actually, we proved that the limit point
is the so-called analytic center of the optimal set.

Definition 1.1.13 (Analytic center) Let $D \subset \mathbb{R}^n$ be a bounded convex set. The
analytic center of $D$ is the unique minimizer of
$$\min_x \ \Big\{ -\sum_{i=1}^{n} \ln x_i \ :\ x \in D \Big\}.$$

The analytic center of bounded convex sets was introduced by Sonnevend [32] and
plays an important role in interior point methods. Note that the central path is the
set of analytic centers of the level sets.
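Definition 1.1.13 can be made concrete with a tiny computation. In the sketch below the bounded convex set $D = \{\, x \in \mathbb{R}^2 : x \ge 0,\ x_1 + x_2 \le 1 \,\}$ is an arbitrary illustrative choice, and scipy.optimize.minimize is an assumed tool; the analytic center minimizes $-\ln x_1 - \ln x_2$ over $D$, i.e. it maximizes the product $x_1 x_2$.

```python
# Sketch: analytic center of D = { x in R^2 : x >= 0, x1 + x2 <= 1 }.
# D and the solver are illustrative assumptions, not data from the text.
import numpy as np
from scipy.optimize import minimize

neg_log = lambda x: -np.sum(np.log(x))       # barrier objective of Def. 1.1.13
res = minimize(
    neg_log,
    x0=[0.2, 0.2],
    bounds=[(1e-9, None)] * 2,               # keep the iterates positive
    constraints=[{"type": "ineq", "fun": lambda x: 1.0 - x[0] - x[1]}],
)

# The product x1*x2 subject to x1 + x2 <= 1 is maximal at x1 = x2 = 1/2,
# so the analytic center of D is (0.5, 0.5).
assert np.allclose(res.x, [0.5, 0.5], atol=1e-4)
```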
For convenience, we describe how $(\bar P)$ is derived from (P) without increasing the
number of variables or constraints. Consider (P) and assume that $\mathrm{rank}(A) = m$
(otherwise the redundant constraints can easily be eliminated). Let $B$ be any basis of
$A$ and partition $A = [A_B, A_N]$, $c^T = [c_B^T, c_N^T]$ and $x^T = [x_B^T, x_N^T]$. Then
$Ax = b,\ x \ge 0$ can be written as $x_B + A_B^{-1} A_N x_N = A_B^{-1} b,\ x \ge 0$, or equivalently
$-A_B^{-1} A_N x_N \ge -A_B^{-1} b,\ x_N \ge 0$. Likewise
$c^T x = c_B^T x_B + c_N^T x_N = c_B^T A_B^{-1} b + (c_N^T - c_B^T A_B^{-1} A_N) x_N$,
hence (P) can be written equivalently in the symmetric form
$$\min_{x_N} \ \{\, (c_N^T - c_B^T A_B^{-1} A_N) x_N \ :\ -A_B^{-1} A_N x_N \ge -A_B^{-1} b,\ x_N \ge 0 \,\}.$$
Expressed in the form (P)-(D), it is easily seen that a pair $(x^*, y^*)$ is strictly
complementary if $x^*$ is feasible in (P), $y^*$ is feasible in (D) and moreover
$$(Ax^* - b)^T y^* = (c - A^T y^*)^T x^* = 0,$$
$$y^* + (Ax^* - b) > 0,$$
$$x^* + (c - A^T y^*) > 0.$$
It is worthwhile to note that if $x^{(0)}$ is strictly feasible for (P) and $r^{(0)} := A x^{(0)} - b$,
then we have $\bar b = 0$ by setting $\vartheta_0 = \tau_0 = 1$. Also, if $y^{(0)}$ is strictly feasible for (D)
and $u^{(0)} := c - A^T y^{(0)}$, then $\bar c = 0$ if $\vartheta_0 = \tau_0 = 1$. So, the vectors $\bar b$ and $\bar c$ measure
the infeasibility of the given vectors $x^{(0)}$, $r^{(0)}$, $y^{(0)}$ and $u^{(0)}$. We define the
embedding problem (SP) with these data.
Due to the selection of the parameters the positive solution $x = x^{(0)}$, $y = y^{(0)}$,
$\vartheta = \vartheta_0$, $\tau = \tau_0$ is feasible for (SP), and Assumption 1.1.9 holds. Also, the coefficients
in the objective function are nonnegative. Hence, the results of the previous section
apply to this problem, and we can derive the following result.
Theorem 1.1.14 For $(\bar P)$ and $(\bar D)$ one of the following alternatives holds:
(i) $(\bar P)$ and $(\bar D)$ are both feasible and there exists a strictly complementary solution $(\bar x^*, \bar y^*)$;
(ii) $(\bar P)$ is infeasible and $(\bar D)$ is unbounded;
(iii) $(\bar D)$ is infeasible and $(\bar P)$ is unbounded;
(iv) $(\bar P)$ and $(\bar D)$ are both infeasible.
Proof: Problem (SP) is skew-symmetric and self-dual, the objective has nonnegative
coefficients and Assumption 1.1.9 holds. Hence Theorem 1.1.11 guarantees the
existence of a strictly complementary solution $(x^*, y^*, \vartheta^*, \tau^*)$. By Lemma 1.1.3 we
also know that $\vartheta^* = 0$, since the objective coefficient of $\vartheta$ is positive while the
optimal value is zero. Two possibilities may occur. If $\tau^* > 0$,
then it is easily seen that $\bar x^* := x^*/\tau^*$ and $\bar y^* := y^*/\tau^*$ are feasible in $(\bar P)$ and $(\bar D)$
respectively, and that they constitute a strictly complementary pair. So case (i) holds.
On the other hand, if $\tau^* = 0$ then it follows that $A x^* \ge 0$, $x^* \ge 0$, $A^T y^* \le 0$, $y^* \ge 0$
and $b^T y^* - c^T x^* > 0$. If $b^T y^* > 0$ then $(\bar P)$ is infeasible, since by assuming that $\bar x$
is a primal feasible solution one has $0 \ge \bar x^T A^T y^* \ge b^T y^*$, which is a contradiction.
Also, it follows immediately that if $(\bar D)$ is feasible then it is unbounded in this case.
If $c^T x^* < 0$ then $(\bar D)$ is infeasible, since by assuming $\bar y$ to be a dual feasible solution
we have $0 \le \bar y^T A x^* \le c^T x^*$, which is a contradiction; also, $(\bar P)$ is unbounded if it is
feasible. If $b^T y^* > 0$ and $c^T x^* < 0$ then both $(\bar P)$ and $(\bar D)$ are infeasible, which can
be seen in just the same way. □
The proof reveals that the construction (SP) cannot always determine which of the
alternatives in the theorem actually applies. It is still an open question whether a
variant of this approach can be found that neither solves an additional feasibility
problem nor uses a 'big $M$' parameter, and still identifies exactly which of the four
alternatives holds for a given pair of LP problems. For now we only have the following corollary.

Corollary 1.1.15 Let $(x^*, y^*, \vartheta^*, \tau^*)$ be a strictly complementary solution of (SP).
If $\tau^* > 0$ then (i) of Theorem 1.1.14 applies; if $\tau^* = 0$ then one of (ii), (iii) or (iv)
holds.
Observe that there is ample freedom in the choice of the starting point. This is highly
attractive for warm-starting, when related but (slightly) perturbed LP problems
have to be solved.
1.2.1 Introduction
The merits of LP are nowadays well-established and it is widely accepted as a useful
tool in Operations Research and Management Science. In many companies this way
of modeling is used to solve various kinds of practical problems. Applications in-
clude transportation problems, production planning, investment decision problems,
blending problems, location and allocation problems, among many others. Often
use is made of some standard code, most of which use a version of Dantzig's sim-
plex method as solution procedure (for a recent survey we refer to [31]). Many LP
Theory of IPMs 15
packages do not only solve the problem at hand, but provide additional information
on the solution, in particular information on the sensitivity of the solution to cer-
tain changes in the data. This is referred to as sensitivity analysis or post optimal
analysis. This information can be of tremendous importance in practice, where pa-
rameter values may be estimates, where questions of type "What if... " are frequently
encountered, and where implementation of a specific solution may be difficult. Sen-
sitivity analysis serves as a tool for obtaining information about the bottlenecks and
degrees of freedom in the problem. Unfortunately, interpreting this information and
estimating its value is often difficult in practice; misuse is common, which may lead
to expensive mistakes (see e.g. Rubin and Wagner [30]). In the literature there are
several references where (often partially) the correct interpretation of sensitivity re-
sults is stressed. We mention Gal [8, 9], Ward and Wendell [36], Rubin and Wagner
[30], Greenberg [14], among others. The purpose of this section is threefold. Our
first objective is to convince the reader of a correct way of considering and applying
sensitivity analysis in LP. The important observation here is that knowledge of the
set of optimal solutions is needed, instead of knowing just one optimal solution. Sec-
ondly, we show that, contrary to a popular belief, sensitivity on the basis of interior
point methods is possible and even natural by using the optimal partition of the LP
problem. Research in this area was triggered by Adler and Monteiro [1] and Jansen
et al. [18] (see also Mehrotra and Monteiro [26]). Greenberg [15] has given some
examples where the interior approach has important practical influence. Thirdly, we
unify various viewpoints on sensitivity analysis, namely approaches using optimal
bases ('simplex approach'), optimal partitions ('interior approach'), or the optimal
value ('value approach'). This unification hinges on the fact that these are three
approaches by which the optimal set can be characterized.
This partition is called the optimal partition and denoted by $\pi = (B, N)$. Using the
optimal partition we may rewrite the primal and dual optimal sets as
$$\mathcal{P}^* = \{\, x \ :\ Ax = b,\ x_B \ge 0,\ x_N = 0 \,\},$$
$$\mathcal{D}^* = \{\, (y, s) \ :\ A^T y + s = c,\ s_B = 0,\ s_N \ge 0 \,\}.$$
Since we assume $A$ to have full rank we can identify any feasible $s \ge 0$ with a unique
$y$ such that $A^T y + s = c$, and vice versa; hence we will sometimes just use $y \in \mathcal{D}^*$
or $s \in \mathcal{D}^*$ instead of $(y, s) \in \mathcal{D}^*$.
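Given a strictly complementary primal-dual pair, the optimal partition $\pi = (B, N)$ can be read off by thresholding the supports of $x^*$ and $s^*$. The sketch below is illustrative only: the LP data are arbitrary, scipy.optimize.linprog is an assumed tool, and the thresholding is valid here because the computed pair happens to be strictly complementary (an interior point solver returns such a pair by the results of Section 1.1).

```python
# Sketch: reading off the optimal partition pi = (B, N) from a strictly
# complementary primal-dual pair.  LP data and tolerance are illustrative.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([-3.0, -5.0, 0.0, 0.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 4, method="highs")
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2, method="highs")
x = primal.x
s = c - A.T @ dual.x                         # dual slack vector

tol = 1e-7
B = {i for i in range(4) if x[i] > tol}      # indices with x_i^* > 0
N = {i for i in range(4) if s[i] > tol}      # indices with s_i^* > 0
# For a strictly complementary pair, B and N partition the index set.
assert B | N == {0, 1, 2, 3} and B & N == set()
```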
We will study the pair of LP problems (P) and (D) as their data vectors $b$
and $c$ change; the matrix $A$ will be constant throughout. Therefore, we index the
problems as $(P(b, c))$ and $(D(b, c))$. We denote the optimal value function by $z(b, c)$.
We will call the pair $(b, c)$ a feasible pair if the problems $(P(b, c))$ and $(D(b, c))$ are
both feasible. If $(P(b, c))$ is unbounded then we define $z(b, c) := -\infty$, and if its dual
$(D(b, c))$ is unbounded then we define $z(b, c) := \infty$. If both $(P(b, c))$ and $(D(b, c))$
are infeasible then $z(b, c)$ is undefined. Specifically, we are interested in the behavior
of the optimal value function as one parameter changes. Although this is a severe
restriction, it is common from both a theoretical and a computational point of view,
since the multi-parameter case is very hard (see e.g. Ward and Wendell [36] for a
practical approximative approach). So, let $\Delta b$ and $\Delta c$ be given perturbation vectors
and define
$$f(\beta) := z(b + \beta\, \Delta b,\ c), \qquad g(\gamma) := z(b,\ c + \gamma\, \Delta c).$$
In the next lemma we prove a well-known elementary fact on the optimal value
function.

Lemma 1.2.1 The optimal value function $f(\beta)$ is convex and piecewise linear in $\beta$,
while $g(\gamma)$ is concave and piecewise linear in $\gamma$.

Proof: By definition
$$f(\beta) = \max_y \ \{\, b(\beta)^T y \ :\ y \in \mathcal{D} \,\}.$$
If $f(\beta)$ has a finite value, the optimal value is attained at the analytic center of one
of the faces of $\mathcal{D}$ (cf. Lemma 1.1.12). Since the number of faces is finite it holds that
$$f(\beta) = \max_y \ \{\, b(\beta)^T y \ :\ y \in S \,\},$$
where $S$ is a finite set, viz. the set of analytic centers of the faces of $\mathcal{D}$. For each
$y \in S$ we have
$$b(\beta)^T y = b^T y + \beta\, \Delta b^T y,$$
which is linear in $\beta$. So $f(\beta)$ is the maximum of a finite set of linear functions, which
implies the first statement. The second can be shown similarly. □
The proof of the lemma is an 'interior point variation' of a well-known proof using
for S the vertices of D. The intervals for β (or γ) on which the optimal value function
f(β) (or g(γ)) is linear are called linearity intervals. The points where the slope of
the optimal value function changes are called breakpoints.
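As a concrete numerical illustration of linearity intervals and breakpoints, the following sketch (our own toy data, not taken from the text) evaluates f(β) = z(b + βΔb, c) for a tiny LP in standard form using SciPy; the problem is constructed so that f has a breakpoint at β = 1:

```python
import numpy as np
from scipy.optimize import linprog

# (P(b(beta), c)): min c^T x  s.t.  A x = b + beta*db,  x >= 0.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])
db = np.array([0.0, 1.0])          # perturbation vector (Delta b)
c = np.array([-1.0, 0.0, 0.0])

def f(beta):
    """Optimal value f(beta) = z(b + beta*db, c)."""
    res = linprog(c, A_eq=A, b_eq=b + beta * db, bounds=[(0, None)] * 3)
    return res.fun

# Here x1 is capped by min(2, 1 + beta), so f(beta) = -min(2, 1 + beta):
# slope -1 for beta <= 1, slope 0 for beta >= 1 (breakpoint at beta = 1).
values = {beta: f(beta) for beta in (0.0, 0.5, 1.0, 2.0)}
```

Evaluating f on a grid like this is only a diagnostic device, of course; the auxiliary problems developed below locate the breakpoints exactly.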
We give here four questions a typical user might ask once an LP problem has been
solved for a certain value of, say, β:

Question 1 At what rate does the optimal value change with a change in β?

Question 2 In what interval may β be varied such that this rate of change is constant?

Question 3 In what interval may β be varied such that the optimal solution of (D)
obtained from our solution procedure remains optimal?

Question 4 What happens to the optimal solution of (P) obtained from our solution
procedure?
Questions 1 and 2 clearly have an intimate connection with the optimal value func-
tion. It will take some analysis to show that the same is true for Questions 3 and
4. The answer to Question 1 must clearly be that the derivative (slope) of the op-
timal value function is the rate at which the optimal value changes. This rate of
change is called the shadow price (in case of a varying objective we speak of shadow
cost). However, if β is a breakpoint then we must distinguish between increasing
and decreasing β, since the rate of change is different in the two cases. Moreover, the
shadow price is constant on a linear piece of the optimal value function. Hence the
answer to Question 2 must be a linearity interval. One of the reasons that Questions
3 and 4 are more involved is that the answer depends on the type of solution that is
computed by the solution procedure.
The next two lemmas show that the set of optimal solutions of (D(b(β), c)) (de-
noted by D*_β) is constant on a linearity interval of f(β) and changes at its
breakpoints. Similar results can be obtained for variations in c and are therefore
omitted.

Lemma 1.2.2 If f(β) is linear on the interval [β1, β2] then the optimal set D*_β is
constant on (β1, β2).
Proof: Let β ∈ (β1, β2) be arbitrary and let y ∈ D*_β be arbitrary as well. Then
b(β̄)^T y ≤ f(β̄) for every β̄ ∈ [β1, β2], with equality at β̄ = β; since both sides are
linear in β̄ on this interval, all these inequalities are equalities and we obtain
f′(β) = (Δb)^T y, which in turn implies

b(β̄)^T y = f(β̄) for all β̄ ∈ [β1, β2].   (1.8)

Hence y ∈ D*_{β̄} for all β̄ ∈ [β1, β2]. From this we conclude that the sets D*_{β̄} are
constant for β̄ ∈ (β1, β2). □
Corollary 1.2.3 Let f(β) be linear on the interval [β1, β2] and denote D̄* := D*_β
for arbitrary β ∈ (β1, β2). Then D̄* ⊆ D*_{β1} and D̄* ⊆ D*_{β2}.

Lemma 1.2.4 Let β1 and β2 be such that D*_{β1} = D*_{β2} =: D̄*. Then D*_β = D̄* for all
β ∈ [β1, β2] and f(β) is linear on this interval.
Hence f(β) is linear on [β1, β2] and y ∈ D*_β for all β ∈ [β1, β2]. Hence D̄* is a subset
of the optimal set on (β1, β2). From Corollary 1.2.3 we know that the reverse inclusion
also holds; hence for all β ∈ (β1, β2) the optimal set equals D̄*. □
As we have seen in the proof of Lemma 1.2.2, the quantity (Δb)^T y is the same for
all y ∈ D*_β for β in a linearity interval. The next lemma shows that this property
distinguishes a linearity interval from a breakpoint. Gauvin [11] was one of the first¹
to show this result and to emphasize the need to discriminate between left and right
shadow prices, i.e., between decreasing and increasing the parameter.
Lemma 1.2.5 Let f′₋(β) and f′₊(β) be the left and right derivatives of f(·) at β.
Then

f′₊(β) = max_y { (Δb)^T y : y ∈ D*_β },  f′₋(β) = min_y { (Δb)^T y : y ∈ D*_β }.

Proof: We give the proof for f′₊(β); the one for f′₋(β) is similar. Let β̄ be in the
linearity interval just to the right of β and let ȳ ∈ D*_{β̄}. Then
Lemma 1.2.6 Let β1, β2 be two consecutive breakpoints of the optimal value func-
tion f(β). Let β̄ ∈ (β1, β2) and define D̄* := D*_{β̄}. Then

β1 = min_{β,x} { β : Ax = b + βΔb, x ≥ 0, x^T s = 0 ∀ s ∈ D̄* },
β2 = max_{β,x} { β : Ax = b + βΔb, x ≥ 0, x^T s = 0 ∀ s ∈ D̄* }.

¹Personal communication 1992; Gauvin's paper is not mentioned in the historical survey by Gal
[9].
Proof: We will only give the proof for the minimization problem. By Lemma
1.2.2, D̄* is the optimal set for all β ∈ (β1, β2). Observe that the minimization
problem is convex; let (β*, x*) be a solution to it. Obviously x* is also optimal in
(P(b(β*), c)) with optimal value (b + β*Δb)^T y for arbitrary y ∈ D̄*. Hence β* ≥ β1.
On the other hand, let x(β1) be optimal in (P(b(β1), c)). By Corollary 1.2.3 it holds
that (x(β1))^T s = 0 for all s ∈ D̄*. Hence the pair (β1, x(β1)) is feasible in the
minimization problem and we have β* ≤ β1. This completes the proof. □
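Lemma 1.2.6 can be tried out numerically. In the sketch below (an invented toy problem, not from the text), the dual optimal set D̄* of the linearity interval containing β = 0 is the single point y* = (0, −1) with slack s* = (0, 0, 1), so the condition x^T s = 0 for all s ∈ D̄* reduces to one linear equality, and the two LPs of the lemma return the endpoints of the linearity interval:

```python
import numpy as np
from scipy.optimize import linprog

# min c^T x s.t. Ax = b + beta*db, x >= 0; around beta = 0 the dual optimal
# set is the single point y* = (0, -1) with slack s* = (0, 0, 1), so the
# condition "x^T s = 0 for all s in the dual optimal set" reduces to x3 = 0.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])
db = np.array([0.0, 1.0])
s_star = np.array([0.0, 0.0, 1.0])

# Decision variables z = (x1, x2, x3, beta):  A x - beta*db = b,  s*^T x = 0.
A_eq = np.vstack([np.hstack([A, -db[:, None]]),
                  np.append(s_star, 0.0)])
b_eq = np.append(b, 0.0)
bounds = [(0, None)] * 3 + [(None, None)]   # x >= 0, beta free
obj = np.array([0.0, 0.0, 0.0, 1.0])        # optimize beta itself

beta1 = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun    # min beta
beta2 = -linprog(-obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun  # max beta
```

For this data the interval is [−1, 1]: beyond β = −1 the problem becomes infeasible, and at β = 1 the slope of f changes.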
Reconsidering the results obtained above, we see that the computation of linearity in-
tervals and shadow prices can be done unambiguously using optimal sets, contrary
to what is usually done by using just one optimal solution. Next we give three
approaches based on the use of optimal sets, motivated by three different but equiv-
alent ways of describing the optimal set. The first uses optimal partitions, the second
optimal values and the third (primal/dual) optimal bases.

For each β the corresponding optimal partition and a strictly complementary optimal
solution will be denoted by π_β = (B_β, N_β) and (x(β), y(β), s(β)), respectively.
Lemma 1.2.7 Let the value function f(β) be linear for β ∈ [β1, β2]. Then π_β is
independent of β for all β ∈ (β1, β2).

Proof: Follows immediately from Lemma 1.2.2 after the observation that the opti-
mal partition exactly identifies the optimal set. □
Let us assume that β = 0 and β = 1 are two consecutive breakpoints of the optimal
value function f(β). We will show that the optimal partition in the linearity interval
0 < β < 1 can be determined from the optimal partition at the breakpoint β = 0
by computing the right shadow price at β = 0. To this end we define the following
primal-dual pair of LP problems²:
Theorem 1.2.8 Let β ∈ (0, 1) be arbitrary. For the primal-dual pair of problems
(P^{Δb}_↦) and (D^{Δb}_↦) it holds:

(i) The optimal partition is (B_β, N_β);
(ii) y(β) is optimal in (D^{Δb}_↦);
(iii) The optimal value (Δb)^T y(β) is the right shadow price at β = 0.

²The arrow in the notation refers to the starting position and the direction of change. For
instance, ↦ means starting at the breakpoint and increasing the parameter; the other variants
denote starting at the other breakpoint, or in a linearity interval, and decreasing or increasing the
parameter.
Proof: Note that (ii) and (iii) follow from Lemma 1.2.5. Let 0 < β < 1 be arbitrary
and consider

(1.9)

Starting from the breakpoint at β = 1 and using the optimal partition (B_1, N_1), a
similar result can be obtained by using the primal-dual pair of LP problems given
by:
Theorem 1.2.9 Let β ∈ (0, 1) be arbitrary. For the primal-dual pair of problems
(P^{Δb}_↤) and (D^{Δb}_↤) it holds:

(i) The optimal partition is (B_β, N_β);
(ii) y(β) is optimal in (D^{Δb}_↤);
(iii) The value (Δb)^T y(β) is the left shadow price at β = 1.
Lemma 1.2.10 If β ∈ (0, 1) is arbitrary then it holds that (Δb)^T (y(β) − y(0)) > 0
and (Δb)^T (y(1) − y(β)) > 0.
Proof: Theorem 1.2.8 shows that maximizing (Δb)^T y over the dual optimal face
gives y(β) as an optimal solution, and (Δb)^T y(β) as the right shadow price. As a
consequence of Theorem 1.2.9, minimizing (Δb)^T y over the optimal face gives the
left shadow price at β = 0; let ȳ denote an optimal solution of this problem. Since
the value function f(β) has a breakpoint at β = 0, its left and right derivatives are
different at β = 0, so we conclude that (Δb)^T ȳ < (Δb)^T y(β). It follows that (Δb)^T y is
not constant on the dual optimal face. Since y(0) is an interior point of this face, we
conclude that (Δb)^T ȳ < (Δb)^T y(0) < (Δb)^T y(β), which implies the first result. An
analogous proof using β = 1 gives the second result. □
Now we consider the case that the optimal partition associated with some given linearity
interval is known. We will show that the breakpoints and the corresponding optimal
partitions can be found from the given partition and the perturbation vector Δb.
This is done by observing that we may write the problems in Lemma 1.2.6 as LP
problems.
Theorem 1.2.11 For the primal-dual pair of problems (P^{Δb}_←) and (D^{Δb}_←) it holds:

(i) The optimal partition is (B_{β⁻}, N_{β⁻});
(ii) x(β⁻) is optimal in (P^{Δb}_←);
(iii) The optimal value is β⁻.
Proof: Items (ii) and (iii) follow in fact from Lemma 1.2.6. The proof of (i) follows
the same line of reasoning as the proof of Theorem 1.2.8. We construct feasible
solutions for both problems and prove that these solutions are strictly complementary
with the correct partition. Since (y(0), s(0)) is optimal in (D(b(β⁻), c)) (Corollary
1.2.3), we obtain the inclusion N_0 ⊆ N_{β⁻}. This shows that

x̄ := x(β⁻),  β := β⁻

is feasible for (P^{Δb}_←). Next, we deduce from Lemma 1.2.10 that (Δb)^T (y(0) − y(β⁻))
is positive, so

ȳ := (y(β⁻) − y(0)) / ((Δb)^T (y(0) − y(β⁻))),   (1.10)

with corresponding slack s̄ := (s(β⁻) − s(0)) / ((Δb)^T (y(0) − y(β⁻))), is well defined.
Clearly (Δb)^T ȳ = −1. Furthermore, since (s(0))_{B_0} = 0 and s(β⁻) ≥ 0, it follows
that (s(0))_{B_0} − (s(β⁻))_{B_0} = −(s(β⁻))_{B_0} ≤ 0. So ȳ is feasible for the dual
problem. Since for i ∈ B_0 we have x̄_i > 0 if and only if i ∈ B_{β⁻}, and s̄_i = 0 if and
only if i ∈ B_{β⁻}, the given pair is strictly complementary with the partition
(B_{β⁻}, N_{β⁻}). This proves (i) and also (ii). To prove (iii), note that by the linearity
of the optimal value function on [β⁻, 0] it follows that

(b + β⁻Δb)^T y(β⁻) = (b + β⁻Δb)^T y(0),

or equivalently

b^T (y(β⁻) − y(0)) = β⁻ (Δb)^T (y(0) − y(β⁻)).   (1.11)

Multiplying (1.10) with b^T we obtain that the optimal value equals

b^T ȳ = b^T (y(β⁻) − y(0)) / ((Δb)^T (y(0) − y(β⁻))) = β⁻,

where the last equality follows from (1.11). □
The breakpoint β⁺ and the corresponding optimal partition can be found by solving
the pair of LP problems:

Theorem 1.2.12 For the primal-dual pair of problems (P^{Δb}_→) and (D^{Δb}_→) it holds:

(i) The optimal partition is (B_{β⁺}, N_{β⁺});
(ii) x(β⁺) is optimal in (P^{Δb}_→);
(iii) The optimal value is β⁺.
Analogous results hold when the objective vector c varies; they can be stated in
terms of the given partition and the perturbation Δc. The proofs are based on the
same idea as for their dual counterparts: one checks that some natural candidate
solutions for both problems are indeed feasible, and then shows that these solutions
are strictly complementary with the correct partition. Therefore, we state these
results without proofs. The discussion is facilitated by using

g(γ) = z(b, c + γΔc),

where b and c are such that the pair (b, c) is feasible. For each γ we will denote
the corresponding optimal partition by π_γ = (B_γ, N_γ) and strictly complementary
solutions by (x(γ), y(γ), s(γ)). We start with the case that the given partition belongs
to a breakpoint. Without loss of generality we assume again that γ = 0 and γ = 1
are two consecutive breakpoints of g(γ).
Theorem 1.2.13 Let γ ∈ (0, 1) be arbitrary. For the primal-dual pair of problems
(P^{Δc}_↦) and (D^{Δc}_↦) it holds:

(i) The optimal partition is (B_γ, N_γ);
(ii) x(γ) is optimal in (P^{Δc}_↦);
(iii) The optimal value (Δc)^T x(γ) is the right shadow cost at γ = 0.
A similar result can be obtained for the optimal partition at γ = 1. Defining the
pair of LP problems

one has

Theorem 1.2.14 Let γ ∈ (0, 1) be arbitrary. For the primal-dual pair of problems
(P^{Δc}_↤) and (D^{Δc}_↤) it holds:

(i) The optimal partition is (B_γ, N_γ);
(ii) x(γ) is optimal in (P^{Δc}_↤);
(iii) The optimal value (Δc)^T x(γ) is the left shadow cost at γ = 1.
Using these results we derive the following corollary.

Corollary 1.2.15 It holds that (Δc)^T (x(γ) − x(0)) < 0 and (Δc)^T (x(1) − x(γ)) < 0
for arbitrary γ ∈ (0, 1).
The last two results concern the determination of the size of the linearity interval and
the optimal partition in the breakpoints, given that the optimal partition associated
with the linearity interval is known. Assume that γ = 0 belongs to the linearity interval
under consideration, and that the surrounding breakpoints, if they exist, occur at
γ⁻ and γ⁺, respectively. We consider the following pair of problems:

(P^{Δc}_←)  max_x { −c^T x : Ax = 0, (Δc)^T x = 1, x_{N_0} ≥ 0 },
(D^{Δc}_←)  min_{γ,y,s} { γ : A^T y + s − γΔc = c, s_{B_0} = 0, s_{N_0} ≥ 0 }.
We now state:

Theorem 1.2.16 For the primal-dual pair of problems (P^{Δc}_←) and (D^{Δc}_←) it holds:

(i) The optimal partition is (B_{γ⁻}, N_{γ⁻});
(ii) y(γ⁻) is optimal in (D^{Δc}_←);
(iii) The optimal value is γ⁻.

Theorem 1.2.17 For the primal-dual pair of problems (P^{Δc}_→) and (D^{Δc}_→) it holds:

(i) The optimal partition is (B_{γ⁺}, N_{γ⁺});
(ii) y(γ⁺) is optimal in (D^{Δc}_→);
(iii) The optimal value is γ⁺.
1.2.4 Using Optimal Values
In Section 1.2.2 we showed that correct shadow prices and linearity intervals can be
obtained by solving appropriate LP problems over the optimal face of the original
primal or dual problem; that is, knowledge of the set of optimal solutions is needed
instead of just one solution. However, once the optimal value z* of the LP problem is
known, we can just as well describe the optimal faces as follows:

P* = { x : Ax = b, x ≥ 0, c^T x = z* },
D* = { (y, s) : A^T y + s = c, s ≥ 0, b^T y = z* }.
Replacing the description using optimal partitions by the description using the op-
timal value, the results in Section 1.2.2 remain valid. For instance, linearity intervals
of f(β) are computed by (cf. Lemma 1.2.6)

β1 = min_{β,x} { β : Ax − βΔb = b, x ≥ 0, c^T x = (b + βΔb)^T y* },

where y* ∈ D*. Similarly, left and right shadow prices are found by

f′₋(β) = min_{y,s} { (Δb)^T y : A^T y + s = c, s ≥ 0, (b + βΔb)^T y = f(β) },
f′₊(β) = max_{y,s} { (Δb)^T y : A^T y + s = c, s ≥ 0, (b + βΔb)^T y = f(β) }.
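As an illustration (our own toy data, not from the text): for a small LP the left and right shadow prices at a breakpoint can be computed exactly in this way, describing the dual optimal face by the optimal value alone:

```python
import numpy as np
from scipy.optimize import linprog

# min c^T x s.t. Ax = b(beta), x >= 0, with b(beta) = b + beta*db; the optimal
# value function f of this toy problem has a breakpoint at beta = 1.  The dual
# optimal face there is described by the optimal value alone:
#     D*_1 = { y : A^T y <= c, b(1)^T y = f(1) }.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])
db = np.array([0.0, 1.0])
c = np.array([-1.0, 0.0, 0.0])

b1 = b + 1.0 * db                                               # b(1)
z = linprog(c, A_eq=A, b_eq=b1, bounds=[(0, None)] * 3).fun     # f(1) = -2

free = [(None, None)] * 2
# f'_-(1) = min db^T y and f'_+(1) = max db^T y over the dual optimal face.
left = linprog(db, A_ub=A.T, b_ub=c, A_eq=[b1], b_eq=[z], bounds=free).fun
right = -linprog(-db, A_ub=A.T, b_ub=c, A_eq=[b1], b_eq=[z], bounds=free).fun
```

Here the left shadow price is −1 and the right one is 0, reflecting the change of slope at the breakpoint; a single dual solution could report either value.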
An advantage of this approach is that we do not need to know the optimal partition,
just the optimal value. In the literature few explicit references to this idea can be
found; see, e.g., Akgül [2], De Jong [19], Gondzio and Terlaky [13] and Mehrotra and
Monteiro [26]. Similar ideas appear in Magnanti and Wong [23], who use a subprob-
lem defined on the optimal set to compute certain cuts in Benders decomposition [5],
and in Terlaky [33], who considers marginal values in ℓp-programming.
f′₊(β) = max_{1≤k≤K} (Δb)^T y(k),  f′₋(β) = min_{1≤k≤K} (Δb)^T y(k),   (1.12)

where y(k), k = 1, ..., K, are the dual optimal basic solutions. After Gauvin [11],
analogous results have been derived in [2], [4], [14], [20].
The theory shows that in case of multiple optimal dual basic solutions (primal de-
generacy) one has to distinguish between the rates of change caused by
decreasing and by increasing the parameter. In this case, the widespread belief that
the shadow price is given by the dual value is not valid. Rubin and Wagner [30]
indicate the traps and give a number of tips for the correct interpretation of results of
the dual problem in practice. Analogously, shadow costs are not uniquely defined
in a breakpoint of the optimal value function g(γ) (cf. Greenberg [14]). This leads
to the introduction of left and right shadow costs, for which similar results can be
derived.
Linearity Intervals

The classical approach to sensitivity analysis is to pose the question in what interval
the objective coefficient c_j (or right-hand side element b_i) can vary such that the given
(computed) optimal basis B remains an optimal basis. To clarify this and other
approaches we consider the case of a varying right-hand side, and assume that Δb =
e(i). Hence we are interested in the problem (P(b(β), c)) and its dual. Let us denote
by T_B the interval for β on which B is an optimal basis. Since the dual basic solution,
determined by A^T y + s = c, s_B = 0, s_N ≥ 0, does not depend on β, the basis B
remains optimal as long as the associated primal basic solution stays nonnegative.
It is well known that T_B is indeed an interval, which can be computed at low cost
by twice computing m ratios and comparing them. The reason that this approach
may give different answers in different LP packages is explained by the degeneracy
apparent in the problem, whence the optimal basis might not be unique and/or the
optimal primal or dual solution might not be unique. In Section 1.2.2 it was shown
that the optimal set should be used. Using bases, this implies (by definition) that
the dual optimal bases are required. Let y* be the optimal dual basic solution for the
original problem; then we denote the set of dual optimal bases associated with y*
by S(y*). Ward and Wendell [36] introduce the optimal coefficient set of an optimal
dual solution y* of (P(b, c)) as

T(y*) := { β : y* is an optimal dual solution of (P(b(β), c)) }.
A similar definition is given by Mehrotra and Monteiro [26]. Let us also define

Lemma 1.2.18

(i) If y* is an optimal dual solution of (P(b, c)) then T(y*) = R(y*);
(ii) If y* is an optimal dual basic solution of (P(b, c)) then T(y*) = ∪_{B ∈ S(y*)} T_B.
A few remarks are in order. Item (ii) of the lemma was shown by Ward and Wendell
[36, Th. 17]. Note that the basis B used in its proof is dual feasible for (P(b, c)) but
not necessarily primal feasible. From Lemma 1.2.18 we may conclude that either
the optimal basic solution is only optimal in the breakpoint, or it corresponds to a
linearity interval of the optimal value function, in the sense that for each value of the
parameter in this interval this solution is an optimal solution of the corresponding
problem. If β = 0 is a breakpoint of f(β) then obviously there must exist more than
one optimal basic solution of (P(b, c)). The following lemma implies that whenever
the intersection of the optimal coefficient sets corresponding to different optimal basic
solutions is nontrivial, the sets coincide.
Lemma 1.2.19 Let y* and ȳ* be optimal dual basic solutions of (P(b, c)) and let
T(y*) ∩ T(ȳ*) ≠ {0}. Then T(y*) = T(ȳ*).
To the best of our knowledge, all commercial LP packages offering the possibility
of performing sensitivity analysis take the approach using one optimal basis, inde-
pendently of whether degeneracy is present or not; this approach is also standard
in textbooks, often without reference to degeneracy problems. Earlier attempts have
been made to circumvent the obvious shortcomings of the classical approach; see
e.g. [6, 20, 14, 8, 9]. They suggest computing the interval for β on which at least
one of the optimal bases associated with y* remains optimal. Obviously the overall
critical region given by such an approach is a union of intervals, each being one
on which an optimal basis remains optimal. This requires more computational effort,
since (possibly) all optimal bases have to be generated. Evans and Baker [6] suggest
solving a sequence of LP problems to find this interval. Knolmayer [20] proposes an
algorithm which does not need to generate all optimal bases associated with y*; how-
ever, the statement of his algorithm is neither quite clear nor complete. Gal [10] provides
a parametric algorithm, inspired by [22], that does not necessarily need all optimal
bases associated with y*. However, this approach still does not always generate the
complete linearity interval as desired.
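For completeness, the classical ratio test yielding T_B is easy to sketch (toy data of our own; here B is the optimal basis of a small nondegenerate problem, so the classical answer is unambiguous):

```python
import numpy as np

# T_B = { beta : A_B^{-1}(b + beta*db) >= 0 } for the optimal basis
# B = {1, 2} (columns of x1 and x2) of: min -x1 s.t. x1+x2 = 2, x1+x3 = 1.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])
db = np.array([0.0, 1.0])
basis = [0, 1]

xB = np.linalg.solve(A[:, basis], b)    # basic solution at beta = 0: (1, 1)
d = np.linalg.solve(A[:, basis], db)    # its rate of change in beta: (1, -1)

# x_B + beta*d >= 0 componentwise: one lower and one upper ratio test,
# each over (at most) m ratios.
lo = max((-xB[i] / d[i] for i in range(len(basis)) if d[i] > 0), default=-np.inf)
hi = min((-xB[i] / d[i] for i in range(len(basis)) if d[i] < 0), default=np.inf)
```

This returns T_B = [−1, 1]; under degeneracy, different optimal bases would give different (shorter) intervals, which is exactly the pitfall discussed above.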
Acknowledgements
The first author is supported by the Dutch Organization for Scientific Research
(NWO), grant 611-304-028. Currently he is working at Centre for Quantitative
Methods (CQM) B.V., Eindhoven, The Netherlands.
REFERENCES
[1] I. Adler and R.D.C. Monteiro. A geometric view of parametric linear program-
ming. Algorithmica, 8:161-176, 1992.
[2] M. Akgül. A note on shadow prices in linear programming. J. Opl. Res. Soc.,
35:425-431, 1984.
[3] E.D. Andersen and Y. Ye. Combining interior-point and pivoting algorithms for
linear programming. Technical Report, Department of Management Sciences,
University of Iowa, Iowa City, USA, 1994.
[4] D.C. Aucamp and D.I. Steinberg. The computation of shadow prices in linear
programming. J. Opl. Res. Soc., 33:557-565, 1982.
[5] J.F. Benders. Partitioning procedures for solving mixed variables programming
problems. Numerische Mathematik, 4:238-252, 1962.
[6] J.R. Evans and N.R. Baker. Degeneracy and the (mis)interpretation of sensi-
tivity analysis in linear programming. Decision Sciences, 13:348-354, 1982.
[7] A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Un-
constrained Minimization Techniques. John Wiley & Sons, New York, 1968.
(Reprint: Volume 4 of SIAM Classics in Applied Mathematics, SIAM Publica-
tions, Philadelphia, USA, 1990).
[8] T. Gal. Postoptimal analyses, parametric programming and related topics.
McGraw-Hill Inc., New York/Berlin, 1979.
[9] T. Gal. Shadow prices and sensitivity analysis in linear programming under
degeneracy, state-of-the-art survey. OR Spektrum, 8:59-71, 1986.
[10] T. Gal. Weakly redundant constraints and their impact on postoptimal analyses
in LP. Diskussionsbeitrag 151, FernUniversität Hagen, Hagen, Germany, 1990.
[11] J. Gauvin. Quelques précisions sur les prix marginaux en programmation lin-
éaire. INFOR, 18:68-73, 1980. (In French).
[12] A.J. Goldman and A.W. Tucker. Theory of linear programming. In H.W. Kuhn
and A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of
Mathematical Studies, No. 38, pages 53-97. Princeton University Press, Prince-
ton, New Jersey, 1956.
[13] J. Gondzio and T. Terlaky. A computational view of interior-point methods
for linear programming. In J. Beasley, editor, Advances in Linear and Integer
Programming. Oxford University Press, Oxford, UK, 1995.
[23] T.L. Magnanti and R.T. Wong. Accelerating Benders decomposition: algorith-
mic enhancement and model selection criteria. Operations Research, 29:464-484,
1981.
[24] L. McLinden. The analogue of Moreau's proximation theorem, with applications
to the nonlinear complementarity problem. Pacific Journal of Mathematics,
88:101-161, 1980.
[25] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo,
editor, Progress in Mathematical Programming: Interior Point and Related
Methods, pages 131-158. Springer Verlag, New York, 1989.
[26] S. Mehrotra and R.D.C. Monteiro. Parametric and range analysis for interior
point methods. Technical Report, Dept. of Systems and Industrial Engineering,
University of Arizona, Tucson, AZ, USA, 1992.
[27] S. Mehrotra and Y. Ye. Finding an interior point in the optimal face of linear
programs. Mathematical Programming, 62:497-515, 1993.
[28] R.D.C. Monteiro and I. Adler. Interior path following primal-dual algorithms:
Part I : Linear programming. Mathematical Programming, 44:27-41, 1989.
[29] R.T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New
Jersey, 1970.
[30] D.S. Rubin and H.M. Wagner. Shadow prices: tips and traps for managers and
instructors. Interfaces, 20:150-157, 1990.
[31] R. Sharda. Linear programming software for personal computers: 1992 survey.
OR/MS Today, pages 44-60, June 1992.
[32] G. Sonnevend. An "analytic center" for polyhedrons and new classes of global al-
gorithms for linear (smooth, convex) programming. In A. Prekopa, J. Szelezsan,
and B. Strazicky, editors, System Modelling and Optimization: Proceedings
of the 12th IFIP-Conference held in Budapest, Hungary, September 1985, vol-
ume 84 of Lecture Notes in Control and Information Sciences, pages 866-876.
Springer Verlag, Berlin, Germany, 1986.
[33] T. Terlaky. On ℓp-programming. European Journal of Operational Research,
22:70-100, 1985.
[34] T. Terlaky and S. Zhang. Pivot rules for linear programming: a survey on recent
theoretical developments. Annals of Operations Research, 46:203-233, 1993.
[35] A.W. Tucker. Dual systems of homogeneous linear relations. In H.W. Kuhn
and A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of
Mathematical Studies, No. 38, pages 3-18. Princeton University Press, Prince-
ton, New Jersey, 1956.
[36] J.E. Ward and R.E. Wendell. Approaches to sensitivity analysis in linear pro-
gramming. Annals of Operations Research, 27:3-38, 1990.
[37] X. Xu, P.-F. Hung, and Y. Ye. A simplified homogeneous and self-dual linear
programming algorithm and its implementation. Technical Report, Department
of Mathematics, University of Iowa, Iowa City, Iowa, USA, 1994.
[38] Y. Ye, M.J. Todd, and S. Mizuno. An O(√nL)-iteration homogeneous and
self-dual linear programming algorithm. Mathematics of Operations Research,
19:53-67, 1994.
2
AFFINE SCALING ALGORITHM
Takashi Tsuchiya
The Institute of Statistical Mathematics
Department of Prediction and Control
4-6-7 Minami-Azabu, Minato-ku, Tokyo 106 Japan
e-mail: tsuchiya@sun312.ism.ac.jp
ABSTRACT
The affine scaling algorithm, proposed by the Russian mathematician Dikin in 1967, is
the first interior point algorithm. The algorithm is simple and efficient, and is
known as the first interior point algorithm which suggested that an interior point algorithm
can outperform the existing simplex algorithm. The polynomiality status of the algorithm
is still an open question, but a number of papers have revealed its deep and beautiful
mathematical structures and their relations to other interior point algorithms. In this
paper we survey interesting convergence results on the affine scaling algorithm.
2.1 INTRODUCTION
The affine scaling algorithm is the first interior point method (IPM) for linear
programming (LP), proposed by Dikin in 1967 [12], and is known as one of the
simplest and most efficient interior point algorithms. The algorithm was rediscovered by
several researchers, including Barnes [7], Cavalier and Soyster [10], Karmarkar and
Ramakrishnan [29], Kortanek and Shi [30] and Vanderbei et al. [68], after Karmarkar
proposed his famous projective scaling algorithm in 1984 [28]. Each step of the
algorithm is (i) to construct an ellipsoid called "the Dikin ellipsoid," and (ii) to
take a step in the direction which minimizes the objective function over the ellipsoid
to obtain the next iterate. This simple procedure turned out to be efficient in solving
fairly large LP problems in a few dozen iterations [1, 2, 11, 21, 29, 32, 34, 35,
44, 50]. The algorithm is known as the first IPM which suggested that an IPM can
outperform the simplex algorithms for large problems [1, 2]. It was also used in
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 35-82.
© 1996 Kluwer Academic Publishers.
Due to its practical and theoretical importance, there have been a number of papers
which gradually revealed interesting mathematical structures of the affine scaling
algorithm [3, 7, 8, 13, 15, 16, 20, 23, 26, 31, 33, 47, 48, 59, 62, 63, 64, 65, 66, 67,
68, 71]. The purpose of this survey is to shed light on the mathematical structures of,
and convergence results for, the affine scaling algorithm.
In §3, we explain the affine scaling algorithm. We introduce the Dikin ellipsoid,
which is an inscribed ellipsoid in the feasible region of the (primal) LP problem.
This ellipsoid is centered at the current iterate and has certain invariance properties.
The search direction of the algorithm is defined as the direction which minimizes the
objective function over this ellipsoid. We introduce two versions of the algorithm: the
short-step version, which takes a step within the ellipsoid, and the long-step version,
which uses the ellipsoid only for deriving the search direction and moves a fraction
λ of the way to the boundary of the feasible region. We also
define a dual estimate, which is a reasonable estimate of an optimal solution of the
dual problem based on the current iterate.
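The step just described is easy to state in matrix form: with X = diag(x), the long-step iteration moves from x along d = −X²(c − A^T y), where y = (AX²A^T)⁻¹AX²c is the dual estimate, a fraction λ of the way to the boundary. The following is a minimal sketch of one iteration (our own code and toy data, not taken from the text):

```python
import numpy as np

def affine_scaling_step(A, b, c, x, lam):
    """One long-step affine scaling iteration for min c^T x, Ax = b, x > 0."""
    X2 = np.diag(x ** 2)                              # X^2, where X = diag(x)
    y = np.linalg.solve(A @ X2 @ A.T, A @ (X2 @ c))   # dual estimate
    s = c - A.T @ y                                   # dual slack estimate
    d = -X2 @ s                  # minimizing direction over the Dikin ellipsoid
    neg = d < 0
    if not neg.any():
        raise ValueError("objective is unbounded along d")
    step = lam * np.min(-x[neg] / d[neg])   # fraction lam of the way to the boundary
    return x + step * d, y, s

# Toy problem: min -x1  s.t.  x1 + x2 = 2,  x1 + x3 = 1,  x >= 0  (optimum -1).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])
c = np.array([-1.0, 0.0, 0.0])

x = np.array([0.5, 1.5, 0.5])               # strictly feasible interior start
for _ in range(50):
    x, y, s = affine_scaling_step(A, b, c, x, lam=2.0 / 3.0)
```

With λ = 2/3 the iterates of this toy instance stay strictly feasible and the objective value approaches the optimum −1.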
In §5, we observe some basic properties of the generated sequence, such as convergence
to a unique point, asymptotic linear convergence of the objective function values,
etc.
Then, in §6, we prove global convergence of the long-step algorithm under a nondegeneracy
assumption. The primal iterates and the dual estimates are shown to converge
to relative interior points of the optimal faces of the primal and the dual problem,
respectively, for 0 < λ < 1. This result is due to Dikin [13].
In §7, we deal with the global convergence results for the long-step algorithm obtained
by Dikin [15] and Tsuchiya and Muramatsu [65] without any nondegeneracy assumption.
In this case, for 0 < λ ≤ 2/3, we can show that the primal iterates converge to a
relative interior point of the optimal face of the primal problem, while the dual
estimates converge to the relative analytic center of the optimal face of the dual
problem. We do not give a complete analysis there but outline the idea. A local
potential function plays an important role in the analysis. We introduce this function
and illustrate how it is used to prove the main result mentioned above.
From §8 to §11, the topics are more related to the special case where the feasible
region is a polyhedral cone. (We call this case "homogeneous.") We start §8 by
explaining why it is important to study the algorithm applied to homogeneous prob-
lems. Then we show that the algorithm is nothing but a version of the Karmarkar
algorithm in this case, and observe that it is also directly connected to the
Newton method for obtaining the analytic center of a polyhedron. We give an alterna-
tive proof of the polynomiality of the Karmarkar algorithm based on this interpretation.
Interestingly, this proof was derived in the analysis of the global convergence of the
long-step affine scaling algorithm and plays a central role in the global convergence
proof of the affine scaling algorithm.
In §9, we revisit the global convergence analysis of the algorithm for general prob-
lems, and explain how the analysis of the homogeneous case enters the global
convergence analysis for general problems.

It was shown by Tsuchiya and Muramatsu that the bound λ = 2/3 in their global
convergence result [65] mentioned above is tight for ensuring convergence of the dual
estimates to the analytic center of the dual optimal face; even more strongly, Hall and
Vanderbei [26] showed that the dual estimates are not guaranteed to converge if λ > 2/3.
In §10, we explain why two-thirds is an upper bound for convergence of the dual
estimates, by using the relationship between the Newton method for the analytic
center and the homogeneous affine scaling algorithm exploited in §8. Then, in §11, as
another application of this relation, we show that a superlinearly convergent version of
the affine scaling algorithm is obtained by controlling the step-sizes carefully. These
results were obtained by Tsuchiya and Monteiro [64].
It is an interesting question what the largest step-size is that ensures global con-
vergence of the primal iterates of the affine scaling algorithm. As was mentioned
above, the primal iterates converge to an optimal solution if 0 < λ ≤ 2/3. In the fall
of 1993, Mascarenhas found an interesting instance where the algorithm fails to
converge to an optimal solution when λ = 0.999 [31]. In §12, we review this result,
and give a plausible explanation by, again, making use of the relationship between
the homogeneous affine scaling algorithm and the Newton method for the analytic
center.
Finally, in §13, we make concluding remarks, briefly reviewing several other interest-
ing topics which could not be included in this survey because of limitations of space and
time. We close the survey by suggesting several challenging open questions associated
with this algorithm.
Let us begin with some definitions, notation and a description of the duality theory for
linear programming. We cite [49] as a standard textbook for these basic results.
Given a vector v, ‖v‖, ‖v‖₁ and ‖v‖_∞ denote the Euclidean norm, 1-norm and ∞-norm
of v, respectively, and max[v] denotes the maximum component of v. For an index set
J, we denote by v_J the subvector of v composed of the elements associated with J.
Similarly, for a matrix M, M_J denotes the matrix composed of the columns of M
associated with the index set J. We define M_J^T ≡ (M_J)^T.
We define faces of P⁺ and D⁺. Let N and B be a pair of index sets. The pair
(N, B) is called a partition if N ∪ B = {1, ..., n} and N ∩ B = ∅. The set

P⁺_{(N₁,B₁)} ≡ { x ∈ P⁺ : x_{N₁} = 0 }   (2.4)

is referred to as the face of P⁺ determined by the partition (N₁, B₁). The set (2.3)
is called the relative interior of P⁺_{(N₁,B₁)}, and is written as P⁺°_{(N₁,B₁)}. A point which
belongs to P⁺°_{(N₁,B₁)} is referred to as a relative interior point of the face. As a special
case, we regard P⁺ as a face of P⁺ itself, where (N₁, B₁) is given by (∅, {1, ..., n}).

If P⁺_{(N₁,B₁)} is bounded, then there exists a unique point which minimizes the barrier
function

−∑_{i∈B₁} log x_i   (2.5)

over P⁺°_{(N₁,B₁)}. This special point is referred to as the "relative analytic center" of
the face. When (N₁, B₁) = (∅, {1, ..., n}), the relative analytic center is called the
"analytic center" of the feasible region P⁺ [51].
V^+_(N_2,B_2) ≡ {(y, s) | s_{B_2} = 0, (y, s) ∈ V^+}   (2.7)

is referred to as the face of V^+ determined by the partition (N_2, B_2). The set (2.6) is called the relative interior of V^+_(N_2,B_2), and is written as V^++_(N_2,B_2). As a special case, we regard V^+ as a face of V^+ itself, where (N_2, B_2) is given by ({1, ..., n}, ∅).
The relative analytic center of V^+_(N_2,B_2) is defined similarly as the relative interior point of V^++_(N_2,B_2) which minimizes the barrier function

-Σ_{i∈N_2} log s_i   (2.8)

associated with this face. When (N_2, B_2) = ({1, ..., n}, ∅), the relative analytic center is called the analytic center of the feasible region V^+.
The duality theory of linear programming concludes that the dual problem (2.2) is feasible and has an optimal solution if and only if the primal problem (2.1) has an optimal solution, and the optimal value of (2.2) is equal to the optimal value of (2.1). Now, we assume that (2.1) has an optimal solution. The points x and (y, s) are optimal solutions of (2.1) and (2.2), respectively, if and only if they satisfy the following complementarity condition:

(i) x ∈ P^+,  (y, s) ∈ V^+,   (ii) x_i s_i = 0  (i = 1, ..., n).   (2.9)
Furthermore, there always exists a pair of optimal solutions of (2.1) and (2.2) satisfying the following condition in addition to (i) and (ii).
(iii) x_i + s_i > 0  (i = 1, ..., n).   (2.10)

This condition means that, for each i, at least one of x_i and s_i is positive. Conditions (i)-(iii) are called the strict complementarity condition. Now, let x* and (y*, s*) be a pair of optimal solutions of (2.1) and (2.2) satisfying the strict complementarity condition, and let N* and B* be the index sets such that x*_i = 0 and s*_i > 0 for all i ∈ N*, and x*_i > 0 and s*_i = 0 for all i ∈ B*. By definition, (N*, B*) is a partition, and the optimal sets S_P of (2.1) and S_D of (2.2) are written as

S_P = P^+_(N*,B*) = {x ∈ P^+ | x_{N*} = 0},  S_D = V^+_(N*,B*) = {(y, s) ∈ V^+ | s_{B*} = 0}.   (2.11)

Thus, the optimal sets of (2.1) and (2.2) are completely characterized by N* and B*. If x* and (y*, s*) satisfy the strict complementarity condition, then x* and (y*, s*) are relative interior points of the optimal faces S_P and S_D, respectively.
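As a concrete illustration of conditions (i)-(iii) and the partition (2.11), the following sketch (our own toy LP, not an example from the text) verifies a strictly complementary optimal pair and reads off (N*, B*) from it:

```python
import numpy as np

# Toy standard-form LP (hypothetical): minimize x1
# subject to x1 + x2 = 1, x >= 0, with one dual variable y.
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 0.0])

x_star = np.array([0.0, 1.0])          # primal optimal solution
y_star = np.array([0.0])               # dual optimal solution
s_star = c - A.T @ y_star              # dual slacks: (1, 0)

# (i) feasibility, (ii) complementarity, (iii) strict complementarity
assert np.allclose(A @ x_star, b) and (x_star >= 0).all() and (s_star >= 0).all()
assert np.allclose(x_star * s_star, 0)
assert (x_star + s_star > 0).all()

# the partition (N*, B*) characterizing the optimal faces as in (2.11)
N_star = np.where(x_star == 0)[0]      # indices with x*_i = 0, s*_i > 0
B_star = np.where(x_star > 0)[0]       # indices with x*_i > 0, s*_i = 0
print(N_star, B_star)                  # [0] [1]
```

Here the primal optimal face is the single vertex {(0, 1)} and the dual optimal face is {(0, (1, 0))}, both completely determined by N* = {1} and B* = {2}.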
In the subsequent sections, we use the following standard conventions. The letter e denotes the vector of all ones with the appropriate dimension. For feasible solutions x and (y, s) of (2.1) and (2.2), we denote by X and S the diagonal matrices whose diagonal entries are x and s, respectively. An analogous rule applies to subvectors of x and s. We also extend this convention in an obvious way when we consider sequences {x^k} and {(y^k, s^k)}, etc. Finally, when f(x) is a function of x and {x^k} is a sequence, we occasionally use f^k as an abbreviation for f(x^k) as long as it does not cause confusion.
where the objective function is constant over the feasible region (we can easily check in advance whether this occurs), and we make Assumption 3 to obtain a closed form of the search direction. All of the results in this survey hold without Assumptions 2 and 3 after a simple modification. In the Appendix, we explain how to satisfy these requirements in implementation.
(2.17)
The ellipsoid E(x, μ) defined in (2.16) is referred to as the "Dikin ellipsoid." Now, we derive a closed form of the displacement vector and the search direction of the algorithm. The displacement vector of the algorithm when we take the step-size μ is given as the optimal solution of the following optimization problem.
When we regard u = X^{-1}x̃ as the variable, this problem becomes nothing but finding the minimum point of the linear function (Xc)^T u over the intersection between
(2.23)
Then, the iteration of the short-step affine scaling algorithm with the step μ is written as
(2.24)
In the short-step version, the next iterate is supposed to stay in the ellipsoid with radius μ ≤ 1. From the practical point of view, it is more efficient to move aggressively along the search direction to obtain a further reduction of the objective function value, as is seen from Fig. 1. Since the next iterate should remain an interior point, however, we move a fixed fraction λ ∈ (0,1) of the way to the boundary. The algorithm with this step-size choice was proposed by Vanderbei et al. [68] and Kortanek and Shi [30], and is called the long-step affine scaling algorithm. Most of the efficient implementations use this version with λ = 0.9 ~ 0.99 [1, 2, 11, 21, 32, 34, 35, 43, 44]. The iterative formula of the long-step affine scaling algorithm is written as follows:

x^+(x, λ) = x - λ d(x)/max[X^{-1}d(x)] = x - λ X P_{AX} Xc / max[P_{AX} Xc].   (2.25)

Note that the iteration is not well-defined when max[P_{AX} Xc] ≤ 0. However, since max[P_{AX} Xc] ≤ 0 implies that -d(x) ≥ 0 and c^T d(x) > 0, we have the following proposition, which means that we may terminate the iteration if this happens.
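The long-step iteration (2.25) is straightforward to sketch in code. The following Python fragment (our own illustration; the function name and the toy LP are hypothetical, not from the text) computes the projection P_{AX}Xc, takes the step, and signals termination when max[P_{AX}Xc] ≤ 0:

```python
import numpy as np

def affine_scaling_step(A, c, x, lam=0.9):
    """One long-step affine scaling iteration (2.25).

    x must be a strictly positive feasible point of Ax = b, x >= 0.
    Returns the next iterate, or None when max[P_{AX} Xc] <= 0
    (the case in which the iteration may be terminated).
    """
    X = np.diag(x)
    AX = A @ X
    Xc = X @ c
    # p = P_{AX} Xc: orthogonal projection of Xc onto the null space of AX
    y = np.linalg.solve(AX @ AX.T, AX @ Xc)
    p = Xc - AX.T @ y
    if p.max() <= 0:
        return None
    # move a fraction lam of the way to the boundary along -d(x) = -Xp
    return x - lam * (X @ p) / p.max()

# toy LP: minimize x1  s.t.  x1 + x2 = 1, x >= 0; unique optimum (0, 1)
A = np.array([[1.0, 1.0]])
c = np.array([1.0, 0.0])
x = np.array([0.5, 0.5])
for _ in range(50):
    x = affine_scaling_step(A, c, x, lam=0.9)
print(x)  # close to the optimal vertex (0, 1)
```

On this toy problem the iterates stay feasible and approach the optimal vertex (0, 1); the objective gap shrinks by the factor 1 − λ per iteration, consistent with the asymptotic reduction rate discussed later in this survey.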
Since the affine scaling algorithm generates its iterates only in the space of (2.1), it is important to obtain an estimate of an optimal solution for the dual problem (2.2). We define a quantity for this purpose, called the "dual estimate." As was explained in the previous section, it is well known that solving the pair of primal and dual problems is equivalent to finding a pair of a primal feasible solution x and a dual feasible solution (y, s) satisfying the complementarity condition (2.9). Based on this fact, let us construct a good estimate of an optimal solution for the dual problem. Given an interior feasible solution x, we seek (y, s) which is closest to a solution of (2.9) in a certain sense. If we give up satisfying the nonnegativity constraint on s in (2.9), a reasonable estimate of a dual optimal solution is given by the solution of the following least squares problem:

minimize_{(y,s)}  (1/2) Σ_i x_i^2 s_i^2   subject to  s = c - A^T y.   (2.26)

Its solution (y(x), s(x)) is given in closed form by

y(x) = (A X^2 A^T)^{-1} A X^2 c,   s(x) = c - A^T y(x).   (2.27)
The relation

d(x) = X^2 s(x)   (2.28)

holds between the search direction d(x) and the dual estimate. This relation is frequently used throughout the paper. The following interesting property holds for the dual estimate (y(x), s(x)).
Theorem 2.3.3 The dual estimate (y(x), s(x)) is bounded over P^++.
This result was first derived by Dikin [20] and rediscovered later by several other authors, including Vanderbei and Lagarias [67], Stewart [52] and Todd [58], and has theoretically interesting applications, e.g., [69].
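The dual estimate and the identity (2.28) are easy to check numerically. The sketch below (our own construction, with randomly generated data) computes (y(x), s(x)) from the weighted least squares problem and verifies d(x) = X²s(x) at a random interior point:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
x = rng.uniform(0.5, 2.0, n)           # an arbitrary strictly positive point

X = np.diag(x)
# dual estimate (2.27): normal equations of the weighted least squares (2.26)
y = np.linalg.solve(A @ X**2 @ A.T, A @ X**2 @ c)
s = c - A.T @ y
# search direction d(x) = X P_{AX} Xc
AX = A @ X
P = np.eye(n) - AX.T @ np.linalg.solve(AX @ AX.T, AX)
d = X @ (P @ (X @ c))
# the identity (2.28): d(x) = X^2 s(x)
print(np.allclose(d, X**2 @ s))  # True
```

The identity follows because X²s(x) = X(Xc − (AX)ᵀ(AX(AX)ᵀ)⁻¹AX·Xc) = X P_{AX}Xc = d(x).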
There is an interesting historical story about the dual estimate [17, 18]. Indeed, the least squares problem (2.26) was Dikin's starting point when he developed the affine scaling algorithm in 1967. In 1965, he was a postdoctoral fellow of Kantorovich, and they were carrying out some data analysis on the agricultural production of the former Soviet Union. Kantorovich asked Dikin to estimate the dual variables from the primal variables, which were already available as observed data. If we assume that these economic quantities are in an equilibrium state, the primal variables and the dual variables should satisfy the complementarity condition (2.9), and hence it would be reasonable to consider the following weighted least squares problem to estimate the dual variables (y, s):

minimize_{(y,s)}  (1/2) Σ_i x_i s_i^2   subject to  s = c - A^T y.   (2.29)
Note that (2.29) differs from (2.26) in that x_i is not squared. As a further development of this idea of Kantorovich, Dikin realized that it is more natural to use his dual estimate s(x), which has an invariance property. (The estimate from (2.29) depends on the scaling of the primal variables, while the dual estimate (2.26) by Dikin does not.) Furthermore, he noticed that -X^2 s(x) = -d(x) is a feasible descent direction for (2.1), and used it for solving linear programming problems. Thus, the way the affine scaling algorithm is usually explained is a little different from the way Dikin developed this method.
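The invariance property distinguishing (2.26) from (2.29) can be demonstrated directly. In the sketch below (our own construction; function names are ours), the primal variables are rescaled by a diagonal matrix D (x' = D⁻¹x, A' = AD, c' = Dc); Dikin's estimate is unchanged while the estimate from (2.29) is not:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 2, 5
A = rng.standard_normal((m, n))
c = rng.standard_normal(n)
x = rng.uniform(0.5, 2.0, n)

def y_dikin(A, c, x):
    # solution of (2.26): weights x_i^2
    X2 = np.diag(x**2)
    return np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)

def y_kantorovich(A, c, x):
    # solution of (2.29): weights x_i (not squared)
    X1 = np.diag(x)
    return np.linalg.solve(A @ X1 @ A.T, A @ X1 @ c)

# rescale the primal variables: x' = D^{-1} x, A' = A D, c' = D c
d = rng.uniform(0.5, 3.0, n)
D = np.diag(d)
A2, c2, x2 = A @ D, D @ c, x / d

print(np.allclose(y_dikin(A, c, x), y_dikin(A2, c2, x2)))              # True
print(np.allclose(y_kantorovich(A, c, x), y_kantorovich(A2, c2, x2)))  # generically False
```

The squared weights make the products AX and X c transform so that the normal equations of (2.26) are unchanged, which is exactly the invariance the text attributes to Dikin's estimate.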
The dual estimate is a quantity very similar to the shadow price in the simplex method. The iteration of the simplex method is stopped when the shadow price becomes nonnegative, recognizing that the primal iterate has reached an optimal solution. Then the shadow price is nothing but a dual optimal solution. In other words, the shadow price converges to an optimal solution of the dual problem while the primal sequence converges to an optimal solution of the primal. Analogously, we expect that the dual estimate converges to an optimal solution of the dual problem as the primal iterate converges to an optimal solution of the primal. Convergence of the dual estimates is important in the convergence theory of the affine scaling algorithm.
In order to solve a general LP problem with this algorithm, we have to convert the original problem into an equivalent problem satisfying Assumptions 1-3. There are two ways to do this: the Big-M method and the Phase I-Phase II method. We review them briefly in the Appendix.
Before concluding this section, we discuss how we define the affine scaling algorithm
for general form LP problems which contain free variables. The following proposition
is easy to see and hence its proof is left to the reader.
Note that the unboundedness of the objective function (both above and below) in
this proposition is easily checked once a problem is given. If the objective function is
bounded neither above nor below, the LP is meaningless. Therefore, this proposition
Here, T is an affine space. Let us assume that (2.30) has an interior feasible solution
s, i.e., s > O.
The Dikin ellipsoid at s is defined as
(2.31)
We apply this idea to the dual standard form problem (2.2). By using Assumption 3, we obtain the iterative formula of the long-step affine scaling algorithm with the step λ ∈ (0,1) for the dual standard form [2] as follows, by letting the cost vector in (2.30) be x̄, where x̄ is a solution of Ax = b, and T = {s | A^T y + s = c}:

y^+ = y + λ (A S^{-2} A^T)^{-1} b / max[S^{-1} A^T (A S^{-2} A^T)^{-1} b],   (2.33)

s^+ = s - λ A^T (A S^{-2} A^T)^{-1} b / max[S^{-1} A^T (A S^{-2} A^T)^{-1} b]
    = s - λ S(I - P_{AS^{-1}}) S x̄ / max[(I - P_{AS^{-1}}) S x̄].   (2.34)
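The second equality in (2.34) rests on the identity Aᵀ(AS⁻²Aᵀ)⁻¹b = S(I − P_{AS⁻¹})Sx̄ when b = Ax̄, which can be checked numerically. The sketch below uses randomly generated data (our own construction):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 5
A = rng.standard_normal((m, n))
xbar = rng.standard_normal(n)          # any solution of Ax = b defines b
b = A @ xbar
s = rng.uniform(0.5, 2.0, n)           # interior point of the dual form: s > 0

Sinv = np.diag(1.0 / s)
# form appearing in (2.33)-(2.34): A^T (A S^-2 A^T)^{-1} b
lhs = A.T @ np.linalg.solve(A @ Sinv @ Sinv @ A.T, b)
# equivalent form of (2.34): S (I - P_{AS^-1}) S xbar, where P_{AS^-1}
# is the orthogonal projection onto the null space of A S^-1
M = A @ Sinv
P = np.eye(n) - M.T @ np.linalg.solve(M @ M.T, M)
rhs = s * ((np.eye(n) - P) @ (s * xbar))
print(np.allclose(lhs, rhs))  # True
```

The identity follows from Aᵀ = S(AS⁻¹)ᵀ and A x̄ = (AS⁻¹)(Sx̄), so both sides equal S(AS⁻¹)ᵀ((AS⁻¹)(AS⁻¹)ᵀ)⁻¹(AS⁻¹)(Sx̄).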
Ax =b (2.35)
and
s = c-ATy. (2.36)
Recall that n is the dimension of x and s, and that n - m and m are the dimensions of the feasible polyhedra P^+ and V^+, respectively, under Assumptions 1 and 3. The dimensions of P^+ and V^+ are equal to the dimensions of P and V under Assumption 1. A solution x of Ax = b is called "degenerate" if more than n - m components of x become zero simultaneously. Similarly, we call a solution (y, s) of s = c - A^T y "degenerate" if more than m components of s become zero simultaneously. Now, we introduce nondegeneracy conditions on the affine spaces P and V and the polyhedra P^+ and V^+ [25].
Making the nondegeneracy assumption on P^+ (or V^+) for (2.1) (or (2.2)) is exactly the same as making a standard nondegeneracy assumption in the simplex method to prevent cycling. On the other hand, requiring nondegeneracy of V (or P) for (2.1) (or (2.2)) has something to do with the existence of a constant-cost face of the feasible region. That is, P^+ (or V^+) has no constant-cost face except for vertices under the nondegeneracy assumption on V (or P). See [70] on this point. Fig. 3 shows some examples of degenerate problems.
Remark. These conditions are completely symmetric with respect to the primal problem (2.1) and the dual problem (2.2) in the following sense. Given a standard form problem (2.1), we can convert it to a dual standard form problem like (2.2) by taking a basis. Let V^+_B be the feasible region of the converted dual standard form problem equivalent to (2.1). Then nondegeneracy of P^+ is equivalent to nondegeneracy of V^+_B. On the other hand, given a dual standard form problem (2.2), we may eliminate the free variable y to write it in standard form. Let P^+_B be the feasible region of the converted standard form problem equivalent to (2.2). Then nondegeneracy of V^+ is equivalent to nondegeneracy of P^+_B. The same can be said about the nondegeneracy conditions on the affine spaces P and V. We also mention that this definition of nondegeneracy uses Assumption 3. To extend it to the general case, we should replace "m and n - m" in the definition above by "the dimension of {s | s = c - A^T y} and {x | Ax = b}," respectively. •
One of the important conclusions of the nondegeneracy condition associated with (2.1) is the following proposition.
Figure 2.3 Examples of degenerate problems.
Now, we are ready to summarize the convergence results on the affine scaling algorithm for (2.1) in view of the nondegeneracy conditions. In 1974, Dikin [13] proved global convergence of the primal iterates and the dual estimates for the short-step version of the algorithm with μ = 1 when P^+ is nondegenerate. Unfortunately, Dikin's work was not known in the Western countries until 1988 [14]. It is worth noting that he even wrote a book on the affine scaling algorithm in 1980 with one of his colleagues [20].
Soon after Karmarkar [28] proposed the projective scaling algorithm in 1984, Barnes [7] proposed the short-step algorithm with 0 < μ < 1 in 1985 and proved global convergence of the primal iterates and the dual estimates when P^+ and V are nondegenerate, and Vanderbei et al. independently obtained the same result for a long-step version [68] with 0 < λ < 1.
The first global convergence proof of the affine scaling algorithm for (2.1) without any nondegeneracy assumption concerning P or P^+ was obtained by Tsuchiya [63] in 1989 for a short-step version with μ = 1/8, yet requiring a nondegeneracy condition on V. [This result by Tsuchiya was obtained for the affine scaling algorithm for (2.2). He proved global convergence under the nondegeneracy condition of P. When interpreted in terms of the affine scaling algorithm for (2.1), the corresponding nondegeneracy condition is the nondegeneracy condition of V.] In the same year, Tseng and Luo gave a global convergence proof without any nondegeneracy assumption but assuming that all the entries of A, b, c are integers with input size L, with a tiny step, i.e., μ = 2^{-L} [59]. Then, Tsuchiya [62] proved global convergence of the primal iterates without any nondegeneracy assumption with μ = 1/8 in 1990. The proofs by Tsuchiya made use of a local potential function which will be explained later in this survey. This idea of the local potential function is used in most of the global convergence proofs which do not require nondegeneracy conditions, except for the one by Tseng and Luo.
Finally, from the end of 1991 to the beginning of 1992, Dikin [15] and Tsuchiya and Muramatsu [65] independently succeeded in proving global convergence of the primal and the dual sequences for the long-step version without any nondegeneracy assumption, with the step-sizes 0 < λ ≤ 1/2 and 0 < λ ≤ 2/3, respectively. Dikin's work came out a bit earlier than Tsuchiya and Muramatsu's, while the latter result is a little better. In both papers, an important role was played by a paper by Dikin [16] which dealt with the homogeneous case with λ = 1/2. Tsuchiya and Muramatsu also proved that the asymptotic convergence rate of the objective function value approaches "exactly" 1 - λ.
In this paper, we do not give a complete global convergence proof for general cases. We recommend the papers by Monteiro, Tsuchiya and Wang [37] and by Saigal [45] for self-contained, elucidative proofs, which somewhat simplify the results in the original work by Tsuchiya [62, 63] and Tsuchiya and Muramatsu [65]. A recent textbook on linear programming by Saigal [48] is also recommended as literature giving an integrated, complete treatment of the affine scaling algorithm.
Theorem 2.5.1 There exists a positive constant ξ(A, c), determined by A and c, such that

r(x) ≡ c^T d(x) / (||c|| ||d(x)||) ≥ ξ > 0  holds for all x ∈ P^++.   (2.37)

Proof. We prove this by contradiction. If such a ξ does not exist, there exists a sequence {x^p} of interior feasible solutions such that r(x^p) → 0 as p → ∞. For each p, d̄(x^p) ≡ d(x^p)/||(X^p)^{-1} d(x^p)|| is an optimal solution of the optimization problem
Since this equation has a solution (e.g., d̄^q itself), there exists a solution d^q whose norm is bounded by ||d^q|| ≤ C_2(c^T d̄^q + Σ_{i∉I} |d̄^q_i|) ≤ C_2(1 + (n - |I|)C_1) c^T d̄^q, where C_2 only depends on A and c. Furthermore, since |d̄^q_i|/c^T d̄^q → ∞ for i ∈ I, we have ||d^q|| < |d̄^q_i| (i ∈ I) for all q sufficiently large. On the other hand, we have d^q_i = d̄^q_i (i ∉ I), hence we conclude that, for sufficiently large q,

(2.40)

Together with c^T d^q = c^T d̄^q, this is a contradiction to the fact that d̄^q is an optimal solution of (2.38). •
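Theorem 2.5.1 is easy to probe numerically: for a fixed (A, c), the ratio r(x) stays strictly positive over interior points spread across many orders of magnitude. The experiment below (our own illustration; it is evidence, not a proof) samples points approaching the boundary of the positive orthant:

```python
import numpy as np

A = np.array([[1.0, 1.0, -1.0]])
c = np.array([1.0, 2.0, 1.0])
n = A.shape[1]
rng = np.random.default_rng(3)

def r(x):
    # r(x) = c^T d(x) / (||c|| ||d(x)||) with d(x) = X P_{AX} Xc
    X = np.diag(x)
    AX = A @ X
    p = X @ c - AX.T @ np.linalg.solve(AX @ AX.T, AX @ (X @ c))
    d = X @ p
    return (c @ d) / (np.linalg.norm(c) * np.linalg.norm(d))

# sample positive points over many orders of magnitude, including points
# very close to the boundary of the positive orthant
samples = [rng.uniform(1e-6, 1e3, n) for _ in range(1000)]
vals = [r(x) for x in samples]
print(min(vals) > 0)  # True: r(x) stays strictly positive
```

Note that cᵀd(x) = ||P_{AX}Xc||² ≥ 0 always holds; the content of the theorem is the uniform positive lower bound ξ(A, c), which the sampled minimum only hints at.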
By using this theorem, we obtain the following properties of the search direction and
the generated sequence including convergence of the sequence to a unique point.
Theorem 2.5.2 Let {x^k} be the sequence of the primal iterates of the long-step affine scaling algorithm with λ ∈ (0,1). If {c^T x^k} is bounded below, then we have
4. The inequality
(2.41)
(2.45)
Together with (2.43) and (2.45), we see that d^k converges to zero, which proves the second statement of the lemma.
On the other hand, taking the summation of (2.44) with respect to k, we have, for any 0 ≤ K_1 < K_2,

ξ(A, c) ||c|| ||x^{K_2} - x^{K_1}|| ≤ ξ(A, c) ||c|| Σ_{k=K_1+1}^{K_2} ||x^k - x^{k-1}|| ≤ c^T x^{K_1} - c^T x^{K_2}.   (2.46)

From the second inequality, we see that {x^k} is a Cauchy sequence and hence converges to a unique point. This shows the third relation. The fourth relation is readily seen by letting K_2 → ∞ in (2.46). •
Thus, the sequence {x^k} converges to a unique point x^∞. Let (N, B) be the partition such that x^∞_N = 0 and x^∞_B > 0. We have the following proposition.
Proposition 2.5.3 Under the same assumptions as Theorem 2.5.2, the set S ≡ {x ∈ P^+ | x_N = 0} is a face of P^+ where the objective function is constant, and for any x ∈ P, the objective function is written as
(2.47)
where (ȳ, s̄) ∈ V.
Proof. Since ||X^k s(x^k)|| → 0 and x^∞_B > 0, we see that lim_{k→∞} (c - A^T y(x^k))_B = 0. This implies that there exists (ȳ, s̄) such that s̄ = c - A^T ȳ and c_B = (A_B)^T ȳ, so that s̄_B = 0. For any x ∈ P, we have
(2.48)
Theorem 2.5.4 Under the same assumptions as Theorem 2.5.2, c^T x^k converges linearly to c^T x^∞, where the reduction rate is at least 1 - λ/(2√n) asymptotically.
(2.50)
Then, we have
(2.51)
(2.54)
Theorem 2.6.1 Let {x^k} and {(y(x^k), s(x^k))} be the sequences of the primal iterates and the dual estimates generated by the long-step affine scaling algorithm for (2.1) with λ ∈ (0,1). If P^+ is nondegenerate and if (2.1) has an optimal solution, {x^k} and {(y^k, s^k)} converge to relative interior points of the primal optimal face and the dual optimal face, respectively.
Proof. Due to Theorem 2.5.2 and Proposition 2.4.1, the primal iterates and the dual estimates converge to x^∞ and (y^∞, s^∞). Now, we show that x^∞ and (y^∞, s^∞) satisfy the strict complementarity condition. (As we mentioned in §2, if x^∞ and (y^∞, s^∞) satisfy the strict complementarity condition, then they are relative interior points of the optimal faces.) Let (N, B) be the partition where x^∞_N = 0 and x^∞_B > 0. Since ||X^∞ s^∞|| = 0, we see that s^∞_B = 0.
First we show that s^∞_N ≥ 0, so that x^∞ and (y^∞, s^∞) satisfy the complementarity condition. If s^∞_N ≥ 0 fails, there exists an index i ∈ N with s^∞_i < 0. For sufficiently large k, we have s^k_i < 0 and hence, taking note that X s(x) = X^{-1} d(x) (cf. (2.28)),

x^{k+1}_i = x^k_i - λ ((X^k)^2 s^k)_i / max[X^k s^k] = x^k_i (1 - λ x^k_i s^k_i / max[X^k s^k]) > x^k_i,   (2.55)

which contradicts the fact that x^k_i converges to x^∞_i = 0.
Now, we show that x^∞ and (y^∞, s^∞) satisfy strict complementarity. Let J and J̄ be the index sets such that s^∞_i > 0 (i ∈ J) and s^∞_i = 0 (i ∈ J̄), respectively. It is enough to show that x^∞_i > 0 for all i ∈ J̄. To this end, we observe that
(2.60)
Theorem 2.7.1 (See [65].) If (2.1) has an optimal solution, the sequences {x^k} and {(y(x^k), s(x^k))} generated by the long-step affine scaling algorithm with 0 < λ ≤ 2/3 have the following properties.
This result was obtained by Tsuchiya and Muramatsu [65]. Slightly prior to this result, Dikin established statements 1 and 2 for 0 < λ ≤ 1/2 [15]. Surprisingly, the step-size 2/3 appearing in the theorem is tight for statements 2 and 3 to hold. See the papers [65] and [26] and §2.10 of this survey. As to statement 1, we do not know the upper bound of λ which ensures global convergence of the primal iterates of the affine scaling algorithm. Recently, Mascarenhas gave an interesting example where the sequence cannot converge to an optimal vertex when λ = 0.999 [31]. Terlaky and Tsuchiya [57] obtained an instance where the algorithm can fail with λ ≈ 0.92 by modifying his example. We deal with the example by Mascarenhas in §2.12.
Due to Theorem 2.5.2, we see that {x^k} converges to a unique point x^∞. We use the same notation as in §5. Recall that N and B are the index sets such that x^∞_N = 0 and x^∞_B > 0. The major tool for the proof is a local potential function which is defined as follows:
This potential function is an analogue of the Karmarkar potential function [28]. The function is called the local Karmarkar potential function and was first introduced in [63] for a global convergence analysis of the affine scaling algorithm. Observe that this local potential function is a homogeneous function in x_N. Furthermore, we have, due to (2.41) and the inequality between the arithmetic mean and the geometric mean, that
(2.63)

(c^T x^{k+1} - c^T x^∞) / (c^T x^k - c^T x^∞)   (2.64)

x^{k+1}_i / x^k_i = 1 - λ u^k_i / max[u^k],   (2.65)

the reduction of this potential function is written as:
Let

w^k_N = u^k_N - e/|N|.   (2.67)
The following theorem is a key result for the global convergence proof. Though we do not prove it here, we give a detailed explanation of the "heart" of the proof in §8 and §9.
(2.68)

lim_{k→∞} u^k_N = e/|N|,   lim_{k→∞} u^k_B = 0.   (2.69)
Proof of Theorem 2.7.1. The linear convergence rate of the objective function readily follows from (2.64) and Theorem 2.7.2.
Let t^k_i ≡ x^k_{N,i} / (s^{kT}_N x^k_N). By definition and (2.41), there exists a positive constant c such that

1/t^k_i ≥ (s^{kT}_N x^k_N) / ||x^k_N|| ≥ (c^T x^k - c^T x^∞) / ||x^k - x^∞|| ≥ c   (2.70)

for each i ∈ N. On the other hand, because of Theorem 2.7.2 and the linear convergence of s^{kT}_N x^k_N = c^T x^k - c^T x^∞, we see that exp(ψ_N(x^k)) = 1/(Π_i t^k_i) is bounded above. Together with (2.70), we see that (T^k_N)^{-1} e ≤ c̄e for some constant c̄, where T^k_N ≡ diag(t^k_i). By definition, we have s_i(x^k) = u^k_i / t^k_i (i ∈ N). Since u^k_N converges to e/|N| and ce ≤ (T^k_N)^{-1} e ≤ c̄e, we see that {s^k_N} is a bounded sequence whose accumulation points are strictly positive. Furthermore, since s^k_B converges to zero (cf. Theorem 2.5.5), we see that the limiting point x^∞ and every accumulation point (y^∞, s^∞) satisfy the strict complementarity condition. Thus, x^∞ is a relative interior point of the optimal face of (2.1).
of the dual optimal face. A necessary and sufficient condition for (y*, s*) to be the relative analytic center of the dual optimal face is that there exists x_B satisfying
(2.71)
We review the basic properties of the affine scaling algorithm applied to a homogeneous problem and exploit its close relationship to other interior point algorithms. The point raised in (i) will be discussed in the next section, based on the results in this section.
minimize_x  c^T x
subject to  Ax = 0, x ≥ 0.   (2.74)
Recall that we use P^+ to denote the feasible region of (2.74). There are three possibilities for the problem.
We apply the long-step affine scaling algorithm to this problem. We refer to this algorithm as "the homogeneous affine scaling algorithm." We assume that the feasible region has an interior point x such that c^T x > 0. (In cases 1 and 2, this condition is always satisfied under Assumption 1.) Furthermore, we assume that c^T x > 0 is satisfied at any interior feasible solution x under consideration unless otherwise stated.
(2.75)
s = c - A^T y,  s ≥ 0.   (2.76)
Proposition 2.8.1 The dual estimate s(x) cannot be strictly positive in cases 2 and 3.
Proof. If s(x) > 0 holds for some feasible solution x, then x = 0 and (y, s) = (y(x), s(x)) make a pair of primal-dual feasible solutions satisfying strict complementarity. This implies that x = 0 is the unique optimal solution of the problem (2.74), which cannot take place in cases 2 and 3. This completes the proof. •
Since s(x), u(x) and d(x) have the same sign pattern componentwise, we have the following corollary.
Proposition 2.8.3 The Karmarkar potential function is bounded below if and only if case 1 occurs, where the minimum value is attained along a line emanating from x = 0.
Proof. The proof is easy, taking note of the fact that {x | x ∈ P^+, c^T x = 1} is bounded if and only if case 1 occurs. •
The following theorem is crucial, and shows that the potential function decreases as long as λ ≤ 2/3 in the homogeneous affine scaling algorithm.
Theorem 2.8.4 (See [39, 65].) Let x be an interior feasible solution of (2.74) such that c^T x > 0, and let x^+ be the interior feasible solution such that c^T x^+ > 0 obtained by one iteration of the long-step affine scaling algorithm with 0 ≤ λ ≤ 2/3. Then, the reduction of the potential function is bounded above as follows:
(2.78)
where w = u - (1/n)e.
Proof. Let λ̄ = λ/max[u] and θ = nλ̄/(n - λ̄). Taking account of the relation u^T e = 1, we have

c^T x^+ / c^T x = 1 - λ̄||u||^2 = (λ̄/θ)(1 - θ||w||^2),   x^+_i / x_i = 1 - λ̄u_i = (λ̄/θ)(1 - θw_i)   (2.79)

(cf. (2.64) and (2.65)). On the other hand, since the relations 1/n ≤ max[u], 0 < λ ≤ 2/3 and 0 < 1 - λ||u||^2/max[u] ≤ 1 - λ max[u] hold, we see that
(2.80)
Now, we find an upper bound of ψ(x^+) - ψ(x) by using the following well-known inequalities [28].
Substituting these inequalities into (2.81) and taking note of the relation w^T e = 0, we have

ψ(x^+) - ψ(x) ≤ -nθ||w||^2 + θ^2||w||^2 / (2(1 - θ max[w])).   (2.84)

Since u^T e = 1, we have

max[u] = 1/n + max[w] ≥ 1/n + ||w||/n = (1 + ||w||)/n.   (2.85)

We substitute the definitions of w and θ into (2.84). Then, taking account of the relations (2.80) and (2.85) and that max[u] = max[w] + 1/n, we obtain the desired inequality (2.78). •
Based on this theorem, we obtain the following main result on the homogeneous
affine scaling algorithm.
Theorem 2.8.5 (See [39].) Let {x^k} be the sequence generated by the long-step affine scaling algorithm applied to (2.74) with the step 0 < λ ≤ 2/3. The following situations occur for {x^k}.
1. In case 1, we have lim_{k→∞} ψ(x^k) = min ψ(x) and lim_{k→∞} u(x^k) = e/n. Furthermore, the dual estimate {(y(x^k), s(x^k))} converges to the analytic center of the dual feasible region {(y, s) | s = c - A^T y ≥ 0}.
2. In case 2, we have lim_{k→∞} ψ(x^k) = -∞, where
and
(2.87)
3. In case 3, we have (2.86) and (2.87) as long as c^T x^k > 0, and c^T x^k < 0 holds after a finite number of iterations.
Proof. In case 1, we see that ψ(x) is bounded below by a constant due to Proposition 2.8.3. If λ ≤ 2/3, we have either (i) ψ^{k̄+1} - ψ^k̄ = 0 for some k̄, or (ii) ψ^{k+1} - ψ^k < 0 for all k. In case (i), we have u(x^k̄) = e/n due to Theorem 2.8.4. Since u(x^k̄) = (X^k̄)^{-1} d(x^k̄) by definition, this implies that d(x^k̄) is proportional to x^k̄, and so x^{k̄+1} is proportional to x^k̄ also. Due to the homogeneity property, for all k ≥ k̄, we see that x^k is proportional to x^k̄ and that u(x^k) - e/n = 0 holds recursively.
In case (ii), we have lim_{k→∞}(ψ^{k+1} - ψ^k) = 0, because, in view of Theorem 2.8.4 and Proposition 2.8.3, {ψ^k} is a monotonically decreasing sequence bounded below. This implies that lim_{k→∞} ||u(x^k) - e/n|| = 0, because lim_{k→∞}(ψ^{k+1} - ψ^k) = 0 holds only if ||u(x^k) - e/n|| → 0, by Theorem 2.8.4.
Thus, we have lim_{k→∞} u(x^k) = e/n in both cases, which implies that s(x^k) converges to the analytic center of {(y, s) | s = c - A^T y > 0}, as we see in a similar manner as in the proof of Theorem 2.7.1. (To see this, we put B = ∅ in the proof of Theorem 2.7.1 and use the fact that {ψ(x^k)} is bounded both below and above.)
The second statement, associated with case 2, is proved as follows. Due to Corollary 2.8.2 and e^T u^k = 1, we have max[u^k] ≥ 1/(n - 1). Maximizing the function on the right-hand side of the first inequality of (2.78) under the conditions that max[u] ≥ 1/(n - 1) and e^T u = 1, we obtain the result.
The proof of the former part of statement 3 is the same as the proof of statement 2. We omit the proof of the latter part. •
Recently, Dikin and Roos proved statement 1 for the sequence generated by the short-step version with μ = 1 [19]. This result seems interesting, because the step-size μ = 1 gives the original version of the affine scaling algorithm by Dikin [12] and hence has a special meaning.
Let g ≥ 0 be a nonzero vector, and consider the linear programming problem where the constraint g^T x = 1 is added to (2.74), i.e., we consider the following problem:

minimize_x  c^T x
subject to  Ax = 0, g^T x = 1, x ≥ 0.   (2.88)

In particular, if we choose g = e, we obtain the Karmarkar canonical form [28]. Note that a standard form problem is also readily converted into this form [22]. Let L be the input size of this problem. We assume that (i) a feasible point x^0 > 0 of (2.88) is available such that c^T x^0 > 0 and ψ(x^0) = O(nL); (ii) the optimal value is zero; (iii) the optimal set is bounded.
It is known that the setting above is general enough to solve any LP problem. Our objective is to find a feasible point of (2.88) where c^T x = 0. Intuitively, this is attained by decreasing ψ(x) to minus infinity. To explain this, for simplicity, consider the case where g = e. By using the inequality between the arithmetic mean and the geometric mean, we have, for any interior feasible point x,
From this relation, it is easy to see that x tends to an optimal solution of (2.88) when ψ(x) tends to minus infinity. A more precise argument shows that finding x̂ such that ψ(x̂) = -O(nL) is enough to obtain an exact optimal solution by rounding the approximate solution under assumptions (i)-(iii).
Proposition 2.8.6 The direction x̂^{k+1} - x̂^k is exactly the same as the search direction of the Karmarkar algorithm applied to (2.88).
We deal with the homogeneous problem (2.74). Now, let us consider the constant-cost hyperplane where the objective function value is one. We define the polyhedron Q^+ = {x | x ∈ P^+, c^T x = 1}. Under the assumption, Q^+ is nonempty. We consider the conic projection

v(x) = x / c^T x.   (2.92)

Observe that, when restricted to Q^+, the Karmarkar potential function ψ(x) substantially becomes the log barrier function

-Σ_i log x_i   (2.93)

associated with Q^+, because c^T x is constant there. We can relate the homogeneous affine scaling algorithm to the Newton method for finding the analytic center v* of Q^+, which minimizes the log barrier function. We denote by d^N(v) the Newton step toward the analytic center of Q^+ at a relative interior point v of Q^+. The following theorem shows that the "conic projection" of the search direction of the homogeneous affine scaling algorithm is just the Newton direction d^N(v).
Theorem 2.8.7 (See [64].) Let x ∈ P^+ be such that c^T x > 0, and let x^+(λ) be the point such that c^T x^+ > 0 obtained by moving in the homogeneous affine scaling direction with the step-size λ. Then, we have

v(x^+(λ)) = v(x) - (λℓ(u(x)) / (1 - λℓ(u(x)))) d^N(v(x)),   (2.94)

where

ℓ(u) = ||u||^2 / max[u].   (2.95)

In other words, v(x^+(λ)) coincides with the point obtained by making one iteration of the Newton method at v(x), with the step λℓ(u(x))/(1 - λℓ(u(x))), to obtain the analytic center of Q^+.
It is worth discussing a little more the implication of Theorem 2.8.7 in case 1, where (2.74) has a unique optimal solution, and when we take the step-size 0 < λ ≤ 2/3. In this case, Q^+ is bounded and hence the analytic center v* exists. We have the following proposition.
Proof. Due to Theorem 2.8.4, we see that ||u(x) - e/n|| → 0 holds when ψ(x) approaches its (unique) minimum. By using a similar argument as in Theorem 2.7.1 (see also the proof of the first statement of Theorem 2.8.5), conversely, we see that ||u(x) - e/n|| → 0 implies that ψ(x) approaches its minimum. Thus ||u(x) - e/n|| → 0 and the statement that ψ(x) approaches its minimum are equivalent. On the other hand, by definition, ||v(x) - v*|| → 0 is equivalent to ψ(x) approaching its minimum. The proposition readily follows from these two facts. •
Let {x^k} be the sequence generated by the homogeneous affine scaling algorithm with the step-size 0 < λ ≤ 2/3. It is readily seen from Theorem 2.8.5 that u^k → e/n, and this immediately implies that v(x^k) → v*. In view of Theorem 2.8.7, asymptotically
Theorem 2.8.9 If (2.74) has a unique optimal solution and 0 < λ ≤ 2/3, then v(x^k) converges to the analytic center of Q+. In particular, if λ = 1/2, its asymptotic convergence rate is quadratic.
Thus, the conic projection of the affine scaling direction for a homogeneous problem generates a Newton direction for the analytic center of Q+. This property gives some insight into the asymptotic behavior of the affine scaling algorithm, as will be discussed in §10 to §12.
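To make this concrete, the following sketch runs a textbook primal affine scaling iteration on a small homogeneous problem (minimize c^T x subject to Ax = 0, x ≥ 0) and tracks the conic projection v(x) = x/c^T x. The data and the boundary-fraction step rule are illustrative assumptions; the chapter's own u(x)-based step normalization may differ.

```python
import numpy as np

def affine_scaling_step(A, c, x, lam):
    """One textbook primal affine scaling step: move a fraction `lam`
    of the distance to the boundary along the scaled direction."""
    X2 = np.diag(x**2)
    # dual estimate y(x) solves A X^2 A^T y = A X^2 c
    y = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)
    r = c - A.T @ y              # reduced-cost estimate
    d = -(x**2) * r              # affine scaling direction; A @ d = 0
    neg = d < 0
    alpha = np.min(-x[neg] / d[neg]) if neg.any() else 1.0
    return x + lam * alpha * d

# a tiny homogeneous problem: min c^T x s.t. Ax = 0, x > 0
A = np.array([[1.0, 1.0, -2.0]])
c = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 1.0, 1.0])    # feasible: A @ x = 0

vs = []
for _ in range(30):
    x = affine_scaling_step(A, c, x, lam=0.5)
    vs.append(x / (c @ x))       # conic projection v(x) = x / c^T x

x_final, v_final = x, vs[-1]
```

With λ = 1/2, the successive projections v(x^k) settle down quickly, consistent with the convergence behavior described above.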
We are interested in the behavior of the algorithm in the final stage, i.e., in a sufficiently small neighborhood of the limiting point x^∞. It seems that the constraint x_N ≥ 0, which becomes active in the end, asymptotically plays a dominant role in determining the search direction compared with the remaining constraint x_B ≥ 0, which stays "far away" throughout the iterations. It then makes sense to consider the following LP problem, obtained by discarding the constraint x_B ≥ 0 from (2.1).
(2.96)
Since A_B x_B^∞ = b, by introducing a new variable z_B = x_B − x_B^∞, we see that the problem is equivalent to the following homogeneous LP problem with respect to x_N.
minimize XN (2.97)
This is a homogeneous problem in the x_N space. As we saw in the previous section, we can associate the Karmarkar potential function with this problem, which is exactly
On the other hand, we have the following theorem, which shows that the x_N-part d_N(x^k) of the affine scaling direction for (2.1) and its scaled version u_N(x^k) are very close to their counterparts for the homogeneous problem in the final stage of iterations.
We have the following plausible explanation for why the step-size two-thirds is sharp for obtaining convergence of the dual estimates [64]. The argument is based on Theorem 2.8.7.
We take up the homogeneous problem of §8, and use the same notation as in §8.3. We also assume that Case 1 occurs, i.e., that the problem (2.74) has a unique optimal solution, and apply the homogeneous affine scaling algorithm. Let {x^k} be the generated sequence with the step-size λ. Observe that the dual estimate s(x) is a (nonlinear) function of the direction v(x) = x/c^T x, as seen from its definition (2.26). If the projected iterate v(x^k) does not converge to a point in Q+, then it is unlikely that the associated dual estimate (y(x^k), s(x^k)) converges to a unique point.
Now we will show that λ > 2/3 results in non-convergence of the projected iterate v(x^k) to a unique point. Suppose that we take a step-size λ > 2/3 and that v(x^k) converges to an interior point of Q+. This implies that the limit point v^∞ should be the analytic center v* of Q+, since v* is the only interior point of Q+ where the Newton step d^N(v) = 0 holds. Since v^k → v* implies that u(x^k) → e/n due to Proposition 2.8.8, we have lim_{k→∞} ζ_k = λ/(1 − λ) = 1 (cf. (2.94)) when λ = 1/2, and lim_{k→∞} ζ_k = λ/(1 − λ) > 2 when λ > 2/3. In view of (2.94), this means that the iteration with step-size λ > 2/3 results in an overshooting Newton iteration in the space of v(x^k) with a step greater than two, which cannot converge to a unique point. This contradicts the assumption that v(x^k) converges. Thus, λ > 2/3 implies non-convergence of v(x^k), which is likely to result in non-convergence of s(x^k).
As was suggested in §9, most of the convergence results about the affine scaling algorithm for homogeneous problems have their analogues in the asymptotic behavior of the x_N-part of the sequence generated by the affine scaling algorithm for general problems. Therefore, it is plausible that an analogous result holds generally, that is, the dual estimate usually does not converge to a unique point if λ > 2/3 in the affine scaling algorithm. (This is not the case when the problem is nondegenerate, because s_N(x^k) converges even if the projected iterate v_N(x^k) does not converge. See Theorem 2.6.1.)
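For reference, a dual estimate of this kind can be computed as in the sketch below, which uses the standard affine scaling least-squares formula y(x) = (AX²Aᵀ)⁻¹AX²c and s(x) = c − Aᵀy(x); whether this matches the chapter's definition (2.26) exactly is an assumption, and the data are illustrative.

```python
import numpy as np

def dual_estimate(A, c, x):
    """Standard affine scaling dual estimate at an interior point x > 0:
    y(x) minimizes ||X(c - A^T y)||; s(x) is the induced dual slack."""
    X2 = np.diag(x**2)
    y = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)
    s = c - A.T @ y
    return y, s

A = np.array([[1.0, 1.0, -2.0]])
c = np.array([1.0, 2.0, 3.0])
x = np.array([1.0, 1.0, 1.0])
y, s = dual_estimate(A, c, x)
```

By construction the pair (y, s) is dual-feasible in the equality sense, A^T y + s = c, and satisfies the normal equations A X² s = 0.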
Now, suppose that the problem is homogeneous and has a unique optimal solution, and let us see how we can obtain a superlinearly convergent sequence to zero in this special
case. We use the same notation as in §8. In §8.3, we observed that λ = 1/2 implies quadratic convergence of the projected iterates v(x^k) to the analytic center of Q+.
If the projected iterate v(x^k) is sufficiently close to the analytic center v* of Q+, then ‖u(x^k) − e/n‖ is very small (cf. Proposition 2.8.8). In this case, since we have
On the other hand, in view of Theorem 2.8.7, if we take the step λ ≈ 1, the step-size ζ(u(x), λ) (≈ λ/(1 − λ) when u ≈ e/n) in (2.94) can be very large, and x^+ may no longer stay well-centered, in the sense that v(x^+) ≈ v* or u(x^+) ≈ e/n may fail to hold. Then we cannot expect to reduce c^T x drastically in the next step at x^+, even if we take another long step, because (2.98) holds only when u(x^+) ≈ e/n.
However, if we take the step λ = 1/2 at x^+ instead of taking a long step, it is possible to recover centrality by taking advantage of the quadratic convergence of v(x) to v*, which enables us again to take another long step to decrease c^T x sufficiently in the next step.
Based on this idea, we can prove two-step superlinear convergence of the affine scaling algorithm for homogeneous problems. Furthermore, the idea can be used to implement a superlinearly convergent affine scaling algorithm for general problems, because most of the convergence results about the homogeneous affine scaling algorithm hold asymptotically in the x_N-space in general cases. Indeed, Tsuchiya and Monteiro [64] were able to construct a two-step superlinearly convergent affine scaling algorithm by taking the steps λ = 1/2 and λ ≈ 1 alternately. Stimulated by this idea, Saigal [46] developed an affine scaling algorithm with a three-step quadratic convergence property.
homogeneous LP problem:
where β > 1. (Thus it has an edge (y_1, y_2, y_0) = t(1/(1 + β), 1/(1 + β), 1), t ≥ 0.)
This is a dual standard form problem (2.2) where we let
As was shown in (2.33), the iteration of the affine scaling algorithm for this problem
is as follows:
(2.101)
(We omit the iterative formula for s, which automatically follows from s = c − A^T y.) The point of his example is that it is homogeneous, symmetric with respect to y_1 and y_2, and has no optimal solution. Let T((y_1, y_2, y_0)) = (y_2, y_1, y_0). Due to homogeneity and symmetry, we can easily check the following relations:

y^+(μy, λ) = μ y^+(y, λ)  (for 0 < μ)  and  T(y^+(y, λ)) = y^+(T(y), λ).   (2.102)
Now, suppose that we could find an interior feasible solution ȳ such that

This means that the iterates initiated at ȳ approach zero, shrinking each of their components exactly by a factor of μ² every two iterations. In other words, the iterates initiated at ȳ with step-size λ converge to the origin, rather than diverging with y_0 driven to minus infinity. Mascarenhas found that such a point ȳ exists for λ = 0.999 by choosing an appropriate β.
This example is a homogeneous problem with no optimal solution. Now, we add one more constraint

y_0 ≥ −1   (2.105)

which is parallel to a hyperplane on which the objective function is constant. This problem is no longer homogeneous, and it has an optimal solution with optimal value −1. We can easily show that the search direction of the affine scaling algorithm for this modified problem is the same as for the original homogeneous one. Thus, we obtain the same result with this inhomogeneous problem [31, 57]. Namely, if we start the iteration from a solution of (2.103), the sequence converges to the nonoptimal vertex y = 0 and fails to find the optimal face, where y_0 = −1.
There is a simple explanation for why this inconvenience occurs in his example. We return to the homogeneous case, and introduce the conic projection v(y) = (y_1, y_2)/y_0 for y such that y_0 > 0. It is easily verified that the following proposition holds.
Proposition 2.12.1 Let y be an interior feasible solution of (2.99) such that y_0 > 0. Then y satisfies (2.103) if and only if

v(y^+(y, λ)) = T(v(y)).   (2.106)

{(y_1, y_2) | y_0 = 1, (y, s) ∈ D+} = {(y_1, y_2) | a_{1i} y_1 + a_{2i} y_2 + a_{0i} ≤ 0 (i = 1, ..., 4)}.   (2.107)
Obviously, v(y) ∈ Q+. The log barrier function for Q+ is defined as

−Σ_{i=1}^{4} log s_i = −Σ_{i=1}^{4} log[−(a_{1i} y_1 + a_{2i} y_2 + a_{0i})].   (2.108)
Recall that v(y) is the conic projection of an interior feasible solution onto the hyperplane on which the objective function is constant. We are analyzing the behavior of the iterates {y^k} by conically projecting them onto this hyperplane. This situation is exactly the same as the one we analyzed in §8.3. We have the following theorem, which is a dual standard form version of Theorem 2.8.7.
Theorem 2.12.2 v(y^+(y, λ)) − v(y) is proportional to the Newton direction d^N at v(y) to minimize the log barrier function (2.108) of Q+.
Now, we subtract v(y) from both sides of (2.106). Due to the theorem above and the definition of T, we obtain

where c_1, c_2 are scaling constants. Thus, in view of Theorem 2.12.2, we can characterize the set of initial points which generate sequences convergent to y = 0 for a certain λ as the set of points y satisfying
Theorem 2.12.3 There exists an instance of an LP problem for which the affine scaling algorithm with a step-size λ ≤ 0.92 fails to converge to an optimal solution.
each affine scaling trajectory to a straight line. Tanabe and Tsuchiya [56] observed that this structure is nicely interpreted in the framework of the information geometry of Amari and Nagaoka [5].
the second one is referred to as the second-order affine scaling algorithm. There are several convergence results obtained so far.
The convex quadratic programming problem (CQP) is the most direct extension of LP. Sun extended the global convergence result of Tseng and Luo for LP to the second-order algorithm [53]. Monteiro and Tsuchiya proved global convergence of the second-order algorithm for CQP without a nondegeneracy assumption, with the step-size up to 2/3, by extending the proof for LP [36].
On the other hand, Gonzaga and Carlos [24] proved global convergence of the first-order algorithm for a convex function under the assumption that P+ is nondegenerate. Recently, Monteiro and Wang proved global convergence of the second-order algorithm for convex and concave functions under the same nondegeneracy condition [38].
minimize_{(x,t)}  c^T x + Mt
subject to  Ax − t(Ax^0 − b) = b,  x ≥ 0,  t ≥ 0,   (2.111)

where x^0 is a positive vector. Obviously, this problem has the interior feasible solution (x, t) = (x^0, 1), and if M is sufficiently large, t is forced to 0 at its optimal solution. This means that (2.1) can be solved via (2.111) if M is sufficiently large. In this approach, we need to choose an appropriate M in advance. Ishihara and Kojima proposed a procedure for changing M adaptively while running the algorithm, so as to end up with a sufficiently large M. See [27].
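The enlarged problem (2.111) is easy to assemble in code. The sketch below builds the big-M data from (A, b, c), an arbitrary positive x^0, and a penalty value M, and checks that (x^0, 1) is feasible; the concrete numbers and the value of M are illustrative assumptions.

```python
import numpy as np

def big_m_embedding(A, b, c, x0, M):
    """Build the big-M problem (2.111): minimize c^T x + M t subject to
    A x - t (A x0 - b) = b, x >= 0, t >= 0."""
    col = -(A @ x0 - b)                    # coefficient column of the artificial variable t
    A_big = np.hstack([A, col.reshape(-1, 1)])
    c_big = np.concatenate([c, [M]])
    return A_big, c_big

A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 3.0]])
b = np.array([4.0, 5.0])
c = np.array([1.0, 1.0, 1.0])
x0 = np.array([1.0, 1.0, 1.0])             # any positive vector

A_big, c_big = big_m_embedding(A, b, c, x0, M=1e4)

xt0 = np.concatenate([x0, [1.0]])          # the point (x, t) = (x0, 1)
residual = A_big @ xt0 - b                 # vanishes: (x0, 1) is interior feasible
```

Since A_big @ (x0, 1) = A x0 − (A x0 − b) = b, the residual is exactly zero regardless of the data chosen.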
Since the optimal face of (2.112) is the feasible set of (2.1) and x^∞ is a relative interior point of the optimal face of (2.112), x^∞ is a relative interior point of the feasible region of (2.1). Then (2.1) is equivalent to the following problem

for which x^∞ is available as an initial interior feasible solution for the affine scaling method applied to this problem. We obtain the optimal solution of the original problem by solving (2.113) with the affine scaling algorithm.
REFERENCES
[1] Adler, I., Karmarkar, N., Resende, M., and Veiga, G., "Data structures and
programming techniques for the implementation of Karmarkar's algorithm,"
ORSA Journal on Computing, Vol. 1, No.2 (1989), pp. 84-106.
[2] Adler, I., Resende, M., Veiga, G., and Karmarkar, N., "An implementation of
Karmarkar's algorithm for linear programming," Mathematical Programming,
Vol. 44 (1989), pp. 297-335.
[3] Adler, I., and Monteiro, R. D. C., "Limiting behavior of the affine scaling con-
tinuous trajectories for linear programming problems," Mathematical Program-
ming, Vol. 50 (1990), pp. 29-51.
[4] Alizadeh, F., "Interior point methods in semidefinite programming with appli-
cations to combinatorial optimization," SIAM Journal on Optimization, Vol.5
(1995), pp.13-52.
[5] Amari, S.-I., "Differential-Geometrical Methods in Statistics," Lecture Notes in
Statistics, Vol. 28, Springer-Verlag, Berlin, 1985.
[6] Anstreicher, K., "Linear programming and the Newton barrier flow," Mathe-
matical Programming, Vol. 41 (1988), pp.367-373.
[7] Barnes, E. R., "A Variation on Karmarkar's algorithm for solving linear pro-
gramming problems," Mathematical Programming, Vol. 36 (1986), pp. 174-182.
[8] Bayer, D. A., and Lagarias, J. C., "The nonlinear geometry of linear program-
ming, I. Affine and projective trajectories," Transactions of the American Math-
ematical Society, Vol. 314, No.2 (1989), pp. 499-526.
[9] Bayer, D. A., and Lagarias, J. C., "The nonlinear geometry of linear program-
ming, II. Legendre transform coordinates and central trajectories," Transac-
tions of the American Mathematical Society, Vol. 314, No. 2 (1989), pp. 527-581.
[10] Cavalier, T. M., and Soyster, A. L., "Some computational experience and a
modification of the Karmarkar algorithm," The Pennsylvania State University,
ISME Working Paper 85-105, 1985.
[11] Cheng, Y.-C., Houck, D. J., Liu, J.-M., Meketon, M. S., Slutsman, L., Vanderbei,
R. J., and Wang, P., "The AT&T KORBX system," AT&T Technical Journal,
Vol. 68, No.3 (1989), pp. 7-19.
[12] Dikin, I. I., "Iterative solution of problems of linear and quadratic program-
ming," Soviet Mathematics Doklady, Vol. 8 (1967), pp. 674-675.
[13] Dikin, I. I., "O skhodimosti odnogo iteratsionnogo protsessa" (in Russian), Up-
ravlyaemye Sistemy, Vol. 12 (1974), pp. 54-60.
[14] Dikin, I. I., "Letter to the editor," Mathematical Programming, Vol. 41 (1988),
pp. 393-394.
[15] Dikin, I. I., "The convergence of dual variables," Technical Report, Siberian
Energy Institute, Irkutsk, Russia, December, 1991.
[16] Dikin, I. I., "Determining the interior point of a system of linear inequalities,"
Cybernetics and Systems Analysis, Vol. 28(1992), pp. 54-67.
[17] Dikin, I. I., "Affine scaling methods for linear programming," Research Memo-
randum No. 479, The Institute of Statistical Mathematics, Tokyo, Japan, June,
1993.
[18] Dikin, I. I., Private communication, 1993.
[19] Dikin, I. I., and Roos, C., "Convergence of the dual variables for the primal affine
scaling method with unit steps in the homogeneous case," Report No. 94-69,
Faculty of Technical Mathematics and Informatics, Delft University of Technol-
ogy, Delft, Netherlands, 1994.
[20] Dikin, I. I., and Zorkaltsev, V. I., "Iterativnoe Reshenie Zadach Matematich-
eskogo Programmirovaniya(Algoritmy Metoda Vnutrennikh Tochek)" (in Rus-
sian), Nauka, Novosibirsk, USSR, 1980.
[21] Gay, D., "Stopping tests that compute optimal solutions for interior-point linear
programming algorithms," Numerical Analysis Manuscript 89-11, AT&T Bell
Laboratories, Murray Hill, NJ, USA, 1989.
[22] Gonzaga, C. C., "Conical projection algorithms for linear programming," Math-
ematical Programming, Vol. 43 (1989), pp. 151-173.
[23] Gonzaga, C. C., "Convergence of the large step primal affine-scaling algorithm
for primal non-degenerate linear programs," Technical Report, Department of
Systems Engineering and Computer Sciences, COPPE-Federal University of Rio
de Janeiro, Brazil, 1990.
[24] Gonzaga, C. C., and Carlos, A., "A primal affine-scaling algorithm for linearly
constrained convex programs," Technical Report ES-238/90, Department of Systems
Engineering and Computer Science, COPPE-Federal University of Rio de
Janeiro, Brazil, December 1990.
[25] Güler, O., den Hertog, D., Roos, C., Terlaky, T., and Tsuchiya, T., "Degener-
acy in interior point methods for linear programming," Annals of Operations
Research, Vol. 47 (1993), pp. 107-138.
[26] Hall, L. A., and Vanderbei, R. J., "Two-thirds is sharp for affine scaling," Op-
erations Research Letters, Vol. 13 (1993), pp. 197-201.
[27] Ishihara, T., and Kojima, K., "On the big M in the affine scaling algorithm,"
Mathematical Programming, Vol. 62 (1993), pp. 85-94.
[28] Karmarkar, N., "A new polynomial-time algorithm for linear programming."
Combinatorica, Vol. 4, No.4 (1984), pp. 373-395.
[29] Karmarkar, N., and Ramakrishnan, K., "Further developments in the new
polynomial-time algorithm for linear programming," Talk given at ORSA/TIMS
National Meeting, Boston, MA, USA, April, 1985.
[30] Kortanek, K. 0., and Shi, M., "Convergence results and numerical experiments
on a linear programming hybrid algorithm," European Journal of Operations
Research, Vol.32 (1987), pp. 47-61.
[31] Mascarenhas, W. F., "The affine scaling algorithm fails for λ = 0.999," Technical
Report, Universidade Estadual de Campinas, Campinas S. P., Brazil, October,
1993.
[32] McShane, K. A., Monma, C. L., and Shanno, D. F., "An implementation of a
primal-dual interior point method for linear programming," ORSA Journal on
Computing, Vol. 1 (1989), pp. 70-83.
[33] Megiddo, N., and Shub, M., "Boundary behavior of interior point algorithms
for linear programming," Mathematics of Operations Research, Vol. 14, No.1
(1989), pp. 97-146.
[34] Mehrotra, S., "Implementations of affine scaling methods: approximate solutions
of systems of linear equations using preconditioned conjugate gradient methods,"
Technical Report, Department of Industrial Engineering and Management Sci-
ences, Northwestern University, Evanston, IL 60208, USA, 1989.
[35] Monma, C. L., and Morton, A. J., "Computational experience with a dual affine
variant of Karmarkar's method for linear programming," Operations Research
Letters, Vol. 6 (1987), pp. 261-267.
[36] Monteiro, R., and Tsuchiya, T., "Global convergence of the affine scaling algo-
rithm for convex quadratic programming," Research Memorandum, The Insti-
tute of Statistical Mathematics, Tokyo, Japan, March 1995.
[37] Monteiro, R., Tsuchiya, T., and Wang, Y., "A simplified global convergence
proof of the affine scaling algorithm," Annals of Operations Research, Vol. 47
(1993), pp. 443-482.
[38] Monteiro, R., and Wang, Y., "Trust region affine scaling algorithms for linearly
constrained convex and concave programs," Manuscript, School of Industrial
and Systems Engineering, Georgia Institute of Technology, Atlanta, USA, 1995.
[39] Muramatsu, M., and Tsuchiya, T., "Convergence analysis of the projective
scaling algorithm based on a long-step homogeneous affine scaling algorithm,"
Manuscript, September 1995. (To appear in Mathematical Programming. A re-
vised version of "A convergence analysis of a long-step variant of the projective
scaling algorithm," Research Memorandum No. 454, The Institute of Statistical
Mathematics, Tokyo, Japan, October 1992.)
[40] Muramatsu, M., and Tsuchiya, T., "Affine scaling method with an infeasi-
ble starting point," Research Memorandum No. 490, The Institute of Statistical
Mathematics, Tokyo, Japan, 1994.
[41] Muramatsu, M., and Tsuchiya, T., "Affine scaling method with an infeasi-
ble starting point: Convergence analysis under non degeneracy assumption,"
Manuscript, 1995. (To appear in Annals of Operations Research.)
[42] Nesterov, Yu., and Nemirovskiy, A., "Interior Point Polynomial Methods in
Convex Programming," SIAM Publications, Philadelphia, Pennsylvania, USA,
1994.
[43] Resende, M., Tsuchiya, T., and Veiga, G., "Identifying the optimal face of a
network linear program with a globally convergent interior point method," In
Large Scale Optimization: State of the Art (eds. W. W. Hager et al.), Kluwer
Academic Publishers, Netherlands, 1994.
[44] Resende, M., and Veiga, G., "An efficient implementation of a network interior
point method," Manuscript, AT&T Bell Laboratories, Murray Hill, NJ, USA,
March, 1992.
[45] Saigal, R., "A simple proof of primal affine scaling method," Technical Report,
Department of Industrial and Operations Engineering, University of Michigan,
Ann Arbor, MI 48109-2117, USA, March, 1993. (To appear in Annals of Operations
Research.)
[46] Saigal, R., "A three step quadratically convergent implementation of the primal
affine scaling method," Technical Report No.93-9, Department of Industrial and
Operations Engineering, University of Michigan, Ann Arbor, MI48109, USA,
1993.
[47] Saigal, R., "The primal power affine scaling method," Technical Report No.93-
21, Department of Industrial and Operations Engineering, University of Michi-
gan, Ann Arbor, MI 48109, USA, 1993. (To appear in Annals of Operations
Research.)
[48] Saigal, R., "Linear Programming: A Modern Integrated Analysis," Kluwer Aca-
demic Publishers, Netherlands, 1995.
[49] Schrijver, A., "Theory of Linear and Integer Programming." John Wiley & Sons,
Chichester, England, 1986.
[50] Sinha, L., Freedman, B., Karmarkar, N., Putcha, N., and Ramakrishnan, K.,
"Overseas network planning," Proceedings of "the Third International Network
Planning Symposium - Networks' 86" (IEEE Communications Society, held on
June 1-6, 1986, Tarpon Springs, Florida, USA), pp. 121-124.
[51] Sonnevend, G., "An "analytic centre" for polyhedrons and new classes of global
algorithms for linear (smooth, convex) programming," Lecture Notes in Control
and Information Sciences, Springer-Verlag, New York, Vol. 84, pp. 866-876,
1985.
[52] Stewart, G. W., "On scaled projections and pseudo inverses," Linear Algebra
and its Applications, Vol.112 (1989), pp.189-193.
[53] Sun, J., "A convergence proof for an affine-scaling algorithm for convex
quadratic programming without nondegeneracy assumptions," Mathematical
Programming, Vol.60 (1993), pp.69-79.
[54] Tanabe, K., "Center flattening transformation and a centered Newton method
for linear programming," Manuscript presented at MP seminar, the Operations
Research Society of Japan, July, 1987.
[55] Tanabe, K., "Differential geometry of Optimization" (in Japanese), Preliminary
issue of the Bulletin of the Japan Society for Industrial and Applied Mathemat-
ics, No.3 (1990), pp. 39-50.
[56] Tanabe, K., and Tsuchiya, T., "New geometry of linear programming" (in
Japanese), Mathematical Science, No.303 (1988), pp. 32-37.
[57] Terlaky, T., and Tsuchiya, T., "A note on Mascarenhas' counter-example about
global convergence of the affine scaling algorithm," Manuscript, March, 1996.
[58] Todd, M. J., "A Dantzig-Wolfe-like variant of Karmarkar's interior point method
for linear programming," Operations Research, Vol. 38(1990), pp.1006-1018.
[59] Tseng, P., and Luo., Z.-Q., "On the convergence of the affine-scaling algorithm,"
Mathematical Programming, Vol. 56 (1992), pp. 301-319.
[60] Tsuchiya, T., "On Yamashita's method and Freund's method for linear program-
ming" (in Japanese), Cooperative Research Report of the Institute of Statistical
Mathematics, Vol. 10 (1988), pp. 105-115.
[61] Tsuchiya, T., "Dual standard form linear programming problems and Kar-
markar's canonical form" (in Japanese), Lecture Note of the Research Institute
of Mathematical Sciences, Vol. 676 (1988), pp. 330-336.
[62] Tsuchiya, T., "Global convergence of the affine scaling method for degener-
ate linear programming problems," Mathematical Programming, Vol. 52 (1991),
pp. 377-404.
[63] Tsuchiya, T., "Global convergence property of the affine scaling method for
primal degenerate linear programming problems," Mathematics of Operations
Research, Vol. 17, No.3 (1992), pp. 527-557.
[64] Tsuchiya, T., and Monteiro, R. D. C., "Superlinear convergence of the affine
scaling algorithm." Technical Report, CRPC-92288, Center for Research on
Parallel Computation, Rice University, Houston, USA, November, 1992. (To
appear in Mathematical Programming.)
[65] Tsuchiya, T., and Muramatsu, M., "Global convergence of a long-step affine
scaling algorithm for degenerate linear programming problems," SIAM Journal
on Optimization, Vol. 5, No.3 (1995), pp.525-551.
[66] Tsuchiya, T., and Tanabe, K., "Local convergence properties of new methods in
linear programming," The Journal of the Operations Research Society of Japan,
Vol. 33, No.1 (1990), pp. 22-45.
[67] Vanderbei, R. J., and Lagarias, J. C., "I. I. Dikin's convergence result for the
affine-scaling algorithm," Contemporary Mathematics, Vol. 114 (1990), pp. 109-
119.
[68] Vanderbei, R. J., Meketon, M. S., and Freedman, B. A., "A modification of
Karmarkar's linear programming algorithm," Algorithmica, Vol. 1 (1986), pp.
395-407.
[69] Vavasis, S. T., and Ye, Y., "A primal-dual accelerated interior point method
whose running time depends only on A," Technical Report, Department of Com-
puter Science, Cornell University, December, 1994.
[70] Wang, Y., and Monteiro, R., "Nondegeneracy of polyhedra and linear pro-
grams," Manuscript, School of Industrial and Systems Engineering, Georgia
Institute of Technology, Atlanta, USA, 1994. (To appear in Computational Op-
timization and Applications.)
[71] Witzgall, C., Boggs, P. T., and Domich, P. D., "On the convergence behavior
of trajectories for linear programming," Contemporary Mathematics, Vol. 114
(1990), pp. 161-187.
3
TARGET-FOLLOWING METHODS
FOR LINEAR PROGRAMMING
Benjamin Jansen, Cornelis Roos,
Tamas Terlaky
Faculty of Technical Mathematics and Computer Science
Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands
ABSTRACT
We give a unifying approach to various primal-dual interior point methods by performing
the analysis in 'the space of complementary products', or v-space, which is closely related to
the use of weighted logarithmic barrier functions. We analyze central and weighted path-
following methods, Dikin-path-following methods, variants of a shifted barrier method
and the cone-affine scaling method, efficient centering strategies, and efficient strategies for
computing weighted centers.
3.1 INTRODUCTION
In this chapter we offer a general framework for the convergence analysis of primal-
dual interior point methods for linear programming (LP). This framework is general
enough to apply to very diverse existing methods and still yield simple convergence
proofs. The methods being analyzable in this context are called target-following.
These methods appeared to be closely related to the methods using a-sequences
developed by Mizuno [24, 25] for linear complementarity problems (LCPs).
(P)   min_x { c^T x : Ax = b, x ≥ 0 },
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 83-124.
© 1996 Kluwer Academic Publishers.
Ax = b,
A^T y + s = c,   (3.1)
xs = v²,

for v ∈ ℝ^n_{++} (i.e., v > 0). The basic result in the development and analysis of
target-following methods is contained in the following theorem, establishing a one-to-one correspondence between positive primal-dual pairs (x, s) and positive vectors in ℝ^n. The theorem was proved by McLinden [22] and Kojima et al. [20]; see also Güler et al. [11].
Theorem 3.1.1 Let there exist at least one positive primal-dual pair for (P) and (D). Then for each v ∈ ℝ^n_{++} there exists a unique positive primal-dual pair (x, s) such that x_i s_i = v_i², i = 1, ..., n, i.e., a pair solving system (3.1).
The existence of the solution follows from the observation that the given system is
the Karush-Kuhn-Tucker (KKT) system for minimizing the weighted logarithmic
barrier function
f(x, s; v) = x^T s − Σ_{i=1}^{n} v_i² ln x_i s_i   (3.2)
on the primal and dual set. We now define the v-space of a given LP problem as the
space of (the square roots of) the complementary products of positive primal-dual
pairs:
v^(0) = √(x^(0) s^(0)). Atkinson and Vaidya [1] discuss how the efficiency of Newton's method is affected by differences in the elements of a weight vector. They give a simple example demonstrating that when the ratio between the smallest and the largest weight decreases, the region where Newton's method converges gets smaller. Hence, a natural way of measuring the closeness of a point to the central path appears to be this ratio, which is denoted as

ω(v) := min(v) / max(v).
Note that 0 < ω(v) ≤ 1, with equality if and only if v is on the central path. To combine centering and improving complementarity we will be interested in trajectories whose image in the v-space passes through v^(0) and is tangent to the main diagonal at the origin of the positive orthant.
for certain values μ_0 > 0 and 0 ≤ θ_k ≤ 1, where k is the iteration number. A weighted-path-following algorithm has a given v^(0) > 0 and sets
However, the one-to-one correspondence between points in the v-space and positive primal-dual pairs (x, s) suggests that, to solve the LP problem, we can follow any sequence of targets {v^(k)} in the v-space for which e^T (v^(k))² tends to zero, and hence
leads to optimality. The same methodology can be used to solve other problems, such as computing weighted centers. Note that a target-sequence may consist of an infinite as well as a finite number of targets; a target-sequence can be predetermined, but it can also be constructed adaptively during the algorithm.
A(x + Δx) = b,
A^T(y + Δy) + s + Δs = c,
(x + Δx)(s + Δs) = v̄².
Applying Newton's method to this system, we remove the nonlinear term in the last equation and obtain the following relations for the displacements:

AΔx = 0,
A^T Δy + Δs = 0,   (3.4)
xΔs + sΔx = v̄² − v².
d := √(x s^{−1}).
d^{−1} x = d s = v.
The main property of the scaling is that it maps both x and s to the vector v; this property is extended to a nonlinear setting by Nesterov and Todd [28]. We also use d to rescale Δx and Δs:

p_x := d^{−1} Δx,   p_s := d Δs.
Note that the orthogonality of Δx and Δs implies that p_x and p_s are orthogonal as well. Thus, in the scaled space, the search directions p_x and p_s are orthogonal components of the vector

p_v := p_x + p_s.   (3.5)
By definition, we may write

AD p_x = 0,
DA^T Δy + p_s = 0,
p_x + p_s = v^{−1}(v̄² − v²) = p_v.

Note that p_x and p_s are simply the orthogonal decomposition of p_v into the nullspace of AD and the row space of AD, respectively; this decomposition is established by the scaling with d. We mention here that this is the last time the data A, b, c appear explicitly in this section, and that the data only come in via an initial starting point. This has the great advantage that from now on we work completely in the v-space.
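The decomposition can be carried out numerically as in the sketch below: it forms p_v for a small strictly feasible pair, splits it into p_s (row space of AD) and p_x (nullspace of AD) by orthogonal projection, and unscales to recover displacements satisfying (3.4). The data are illustrative.

```python
import numpy as np

# an arbitrary strictly feasible primal-dual pair (illustrative data)
A = np.array([[1.0, 1.0, 1.0]])
x = np.array([1.0, 2.0, 1.0])
s = np.array([2.0, 1.0, 1.0])
v = np.sqrt(x * s)                       # current point in v-space
v_bar = v * np.array([1.1, 1.0, 0.9])    # some target v-bar

d = np.sqrt(x / s)                       # scaling: d^{-1} x = d s = v
p_v = (v_bar**2 - v**2) / v              # right-hand side of the scaled system

# orthogonal decomposition of p_v: p_s in row(AD), p_x in null(AD)
M = A * d                                # the matrix AD (D = diag(d))
p_s = M.T @ np.linalg.solve(M @ M.T, M @ p_v)
p_x = p_v - p_s

# unscale to recover the Newton displacements of (3.4)
dx = d * p_x
ds = p_s / d
dy = np.linalg.lstsq(A.T, -ds, rcond=None)[0]
```

The recovered displacements satisfy all three equations of (3.4), and p_x ⟂ p_s by construction of the projection.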
q_v := p_x − p_s.
Note that the orthogonality of p_x and p_s implies that ‖q_v‖ = ‖p_v‖. We also have

p_x = (p_v + q_v)/2,
p_s = (p_v − q_v)/2,

whence

p_x p_s = (p_v² − q_v²)/4.   (3.6)
The product p_x p_s plays an important role in the analysis. It represents the second-order effect in the Newton step, which needs to be small to prove efficiency of Newton's method. Indeed, we have

Lemma 3.2.1 One has ‖p_x p_s‖_∞ ≤ ‖p_v‖²/4 and ‖p_x p_s‖ ≤ ‖p_v‖²/(2√2).
In the analysis of target-following algorithms we will need a measure for the proximity of the current iterate v to the current target v̄. For this purpose we introduce the following proximity measure:

δ(v; v̄) := ‖p_v‖ / (2 min(v̄)) = ‖v^{−1}(v̄² − v²)‖ / (2 min(v̄)).   (3.7)
We point out that this proximity measure is in the spirit of the Roos-Vial measure [30], and the primal-dual measures discussed in Jansen et al. [19]. Note that this measure is not symmetric in the iterate v and the target v̄. Defining

u := v̄ / v,   (3.8)
the measure can be rewritten as
δ(v; v̄) = ‖v^{−1}(v̄² − v²)‖ / (2 min(v̄)) = ‖v̄(u − u^{−1})‖ / (2 min(v̄)).   (3.9)
Let us indicate that if v̄² = μe for some positive μ, then this amounts to

which is, up to the factor 1/2, equal to the proximity measure used in [19]. A similar measure, namely

δ_M(v; v̄) := ‖v̄² − v²‖ / (2 min(v̄)),

was used by Mizuno [24, 25]. This proximity measure differs from ours by a factor involving
The next lemma is concerned with bounding these quantities. Moreover, our analysis will show that these quantities are very important for the proximity in the v-space.

Lemma 3.2.2 Let δ := δ(v; v̄) and u be as defined in (3.7) and (3.8). Then it holds that

1/ρ(δ) ≤ uᵢ ≤ ρ(δ),   i = 1, …, n,

where

ρ(δ) := δ + √(1 + δ²).   (3.10)

Proof. By the definition of δ we have for each i

−2uᵢδ ≤ 1 − uᵢ² ≤ 2uᵢδ,

or

uᵢ² − 2uᵢδ − 1 ≤ 0 ≤ uᵢ² + 2uᵢδ − 1.

One easily verifies that this is equivalent to

ρ(δ)⁻¹ ≤ uᵢ ≤ ρ(δ).

This proves the lemma. □
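The measure (3.7) and the bounds of Lemma 3.2.2 translate directly into a few lines of NumPy (an illustrative sketch with hypothetical data):

```python
import numpy as np

def delta(v, v_bar):
    """Proximity measure (3.7): ||v^{-1}(vbar^2 - v^2)|| / (2 min(vbar))."""
    p_v = (v_bar**2 - v**2) / v
    return np.linalg.norm(p_v) / (2 * v_bar.min())

def rho(d):
    """rho(delta) from (3.10)."""
    return d + np.sqrt(1 + d**2)

rng = np.random.default_rng(1)
v = rng.uniform(0.5, 2.0, 8)
v_bar = rng.uniform(0.5, 2.0, 8)
d = delta(v, v_bar)
u = v / v_bar    # the vector u of (3.8)
```

Lemma 3.2.2 then says every component of `u` lies between `1/rho(d)` and `rho(d)`.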
We proceed by investigating when the (full) Newton step to the target-point v̄ can be made without becoming infeasible, i.e., under which conditions the new iterates x⁺ := x + Δx and s⁺ := s + Δs are positive. The next lemma gives a simple condition on δ(v; v̄) which guarantees that this property is met after a Newton step.

Lemma 3.2.3 If ‖v̄⁻² p_x p_s‖_∞ < 1, the Newton step is feasible. This condition is satisfied if δ := δ(v; v̄) < 1.

Proof. Let 0 ≤ α ≤ 1 be a step length along the Newton direction. We define x(α) := x + αΔx and s(α) := s + αΔs. Then we have

x(α)s(α) = v² + α(v̄² − v²) + α² p_x p_s.   (3.11)

Letting α = 1 in (3.11) and denoting (v⁺)² = x⁺s⁺ we get the useful relation

(v⁺)² = v̄² + p_x p_s.   (3.12)
The following lemma shows that if the current iterate v is close enough to the target v̄, the Newton step ensures quadratic convergence of the proximity measure.

Lemma 3.2.4 Assume that δ := δ(v; v̄) < 1 and let v⁺ result from a Newton step at v with respect to v̄. Then one has

δ(v⁺; v̄) ≤ δ²/√(2(1 − δ²)).

Proof. From Lemma 3.2.3 we know that x⁺ and s⁺ are feasible. For the calculation of δ(v⁺; v̄) we need v⁺. From (3.12) and Lemma 3.2.1 we get

δ(v⁺; v̄)² = (1/(4 min(v̄)²)) ‖(v⁺)⁻¹(v̄² − (v⁺)²)‖²
          = (1/(4 min(v̄)²)) ‖(v⁺)⁻¹ p_x p_s‖²
          ≤ (1/(4 min(v̄)²)) ‖p_x p_s‖² / min(v⁺)².

Using ‖p_x p_s‖ ≤ ‖p_v‖²/(2√2) = √2 δ² min(v̄)² and min(v⁺)² ≥ (1 − δ²) min(v̄)², this yields

δ(v⁺; v̄)² ≤ δ⁴/(2(1 − δ²)),

which proves the lemma. □

For δ := δ(v; v̄) < √(2/3) it holds that δ(v⁺; v̄) < δ, implying convergence of the sequence of Newton steps, while for δ < 1/√2 it holds that δ(v⁺; v̄) < δ², guaranteeing quadratic convergence.
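The bound of Lemma 3.2.4 uses only (3.12) and the orthogonality of p_x and p_s, so it can be illustrated with a generic orthogonal split (a sketch with hypothetical data, not the algorithm itself):

```python
import numpy as np

def delta(v, v_bar):
    """Proximity measure (3.7)."""
    return np.linalg.norm((v_bar**2 - v**2) / v) / (2 * v_bar.min())

rng = np.random.default_rng(3)
n = 8
v_bar = np.ones(n)                          # target on the central path
v = 1.0 + 0.05 * rng.standard_normal(n)     # iterate close to the target
d = delta(v, v_bar)
assert d < 1                                # Lemma 3.2.3: the Newton step is feasible

p_v = (v_bar**2 - v**2) / v
B = rng.standard_normal((n, 3))
P = B @ np.linalg.solve(B.T @ B, B.T)       # a generic orthogonal split of p_v
p_x, p_s = P @ p_v, p_v - P @ p_v

v_plus = np.sqrt(v_bar**2 + p_x * p_s)      # (3.12): (v+)^2 = vbar^2 + p_x p_s
```

The new proximity `delta(v_plus, v_bar)` is then at most d²/√(2(1 − d²)), as the lemma asserts.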
The Newton step has another important consequence, namely that the duality gap after the step has the same value as the gap in the target v̄.

Lemma 3.2.5 Let the primal–dual feasible pair (x⁺, s⁺) be obtained from a full Newton step with respect to v̄. Then the corresponding duality gap achieves its target value, namely (x⁺)ᵀs⁺ = ‖v̄‖².

Proof. Recall from (3.12) that (v⁺)² = v̄² + p_x p_s. Hence, using the orthogonality of p_x and p_s, we may write

(x⁺)ᵀs⁺ = eᵀ(v⁺)² = eᵀv̄² + p_xᵀp_s = ‖v̄‖². □

This lemma has two important implications. First, if subsequent Newton steps would be taken with v̄ fixed, then the duality gap would remain constant. Furthermore, if we take only full Newton steps in an algorithm (as is typically done in short-step methods), the lemma implies that we do not have to bother about the duality gap in the iterates themselves: it suffices to consider the duality gap in the targets.
To complete the general results we will analyze the effect on the proximity measure of a Newton step followed by an update of the target. This is technically somewhat easier than analyzing the effect of an update of the target followed by a Newton step, since now we can just use p_s as defined before. Although the latter might seem more natural, both approaches are of course equivalent. We will do the analysis in a very general setting, such that in the sequel it will be an easy task to apply this theorem and derive polynomial complexity bounds for various applications.

Recall from (3.12) that (v⁺)² = v̄² + p_x p_s; using this in (3.13) and applying the triangle inequality we obtain

δ(v⁺; v̄⁺) ≤ (1/(2 min(v̄⁺))) ‖((v̄⁺)² − v̄²)/v⁺‖ + (1/(2 min(v̄⁺))) ‖p_x p_s/v⁺‖
          ≤ δ(v̄; v̄⁺) ‖v̄/v⁺‖_∞ + ‖p_v‖²/(4√2 min(v̄⁺) min(v⁺))
          ≤ δ(v̄; v̄⁺) ρ(δ(v⁺; v̄)) + δ² min(v̄)²/(√2 min(v̄⁺) min(v⁺))
          ≤ δ(v̄; v̄⁺) ρ(δ(v⁺; v̄)) + (min(v̄)/min(v̄⁺)) · δ²/√(2(1 − δ²)),

where the last inequality follows from (3.14). Finally, from Lemma 3.2.4 we obtain

δ(v⁺; v̄) ≤ δ²/√(2(1 − δ²)).

Substituting δ ≤ 1/2 yields δ²/√(2(1 − δ²)) ≤ 1/(2√6) and ρ(δ(v⁺; v̄)) ≤ √6/2. This gives the bound

δ(v⁺; v̄⁺) ≤ (√6/2) δ(v̄; v̄⁺) + (1/(2√6)) min(v̄)/min(v̄⁺). □
We will later apply this theorem several times in the following way. Given v close to a target v̄ such that δ(v; v̄) < 1/2, we need to determine a condition on the new target v̄⁺ such that v⁺ will be in the region of quadratic convergence around v̄⁺, in other words, such that δ(v⁺; v̄⁺) < 1/2. The theorem implies that this can be done by measuring the proximity δ(v̄; v̄⁺) between the targets and the ratio min(v̄)/min(v̄⁺).
3.3 APPLICATIONS
We will now apply the general ingredients from Section 3.2.2 to various primal-dual
algorithms found in the literature, and to some primal-dual variants of pure primal
or dual methods that appear in the literature. The reader should recall that the only
It is evident that
Lemma 3.3.1 Let v̄ be given and let ω = min(v̄)/max(v̄); using the target update v̄⁺ = √(1 − θ) v̄, we have

min(v̄)/min(v̄⁺) = 1/√(1 − θ)

and

δ(v̄; v̄⁺) = (1/(2√(1 − θ) min(v̄))) ‖((1 − θ)v̄² − v̄²)/v̄‖
         = (1/(2√(1 − θ) min(v̄))) θ‖v̄‖
         ≤ θ√n/(2√(1 − θ) ω).
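Both bounds of Lemma 3.3.1 can be checked numerically (an illustrative sketch with hypothetical data):

```python
import numpy as np

def delta(v, v_bar):
    """Proximity measure (3.7)."""
    return np.linalg.norm((v_bar**2 - v**2) / v) / (2 * v_bar.min())

rng = np.random.default_rng(4)
n = 16
v_bar = rng.uniform(0.5, 2.0, n)
omega = v_bar.min() / v_bar.max()
theta = omega / (3 * np.sqrt(n))           # the step size used in the text
v_plus = np.sqrt(1 - theta) * v_bar        # target update (vbar+) = sqrt(1-theta) vbar

ratio = v_bar.min() / v_plus.min()         # should equal 1/sqrt(1-theta)
prox = delta(v_bar, v_plus)                # proximity between consecutive targets
```

With this θ the proximity between targets stays below θ√n/(2√(1 − θ)ω), as the lemma asserts.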
Target following for LP 95
As is clear from the lemma, in the maximal step size we have to take ω into account. Combining Lemma 3.3.1 with Theorem 3.2.6 gives that δ(v⁺; v̄⁺) < 1/2 for θ = ω/(3√n). Since ‖v̄⁺‖² = (1 − θ)‖v̄‖², we get by Lemma 3.2.5 that the number of iterations required by the algorithm is O((√n/ω) ln((x⁽⁰⁾)ᵀs⁽⁰⁾/ε)). Note that for central-path-following methods ω = 1, so the complexity bound is negatively influenced by non-central starting points. The bound is in accordance with [3] for weighted path-following.
Predictor-corrector Methods
We will now analyze a predictor–corrector variant of the path-following algorithm. As above, we assume an initial v̄⁽⁰⁾ to be given. Let (x, s) be the current iterate. An 'iteration' of the algorithm consists of two steps: the predictor step, which is a step in the primal–dual affine-scaling direction, followed by a centering (corrector) step. Let the current iterate be (x, s), with target v̄ such that δ(v; v̄) ≤ 1/4. In the predictor step, the target is zero; using step size θ the new complementarity satisfies

(v⁺)² = (1 − θ)v² + θ²ΔxΔs.   (3.15)

For the corrector step, we specify a new target on the path determined by v̄⁽⁰⁾ as follows:

(v̄⁺)² := (eᵀ(v⁺)²/eᵀ(v̄⁽⁰⁾)²) (v̄⁽⁰⁾)².

We claim that there exists a 'sufficiently large' value for θ such that δ(v⁺; v̄⁺) ≤ 1/2. From the quadratic convergence result of Lemma 3.2.4 it then immediately follows that we can compute v⁺⁺ such that δ(v⁺⁺; v̄⁺) ≤ 1/4 in one Newton step towards v̄⁺. Lemma 3.2.5 implies that

(x⁺⁺)ᵀs⁺⁺ = ‖v̄⁺‖² = eᵀ(v⁺)² = (1 − θ)eᵀv².   (3.16)

Consequently, defining the next target v̄⁺⁺ on the path through v̄⁽⁰⁾ in the same way, it follows that δ(v⁺⁺; v̄⁺⁺) ≤ 1/4. It remains to prove the claim. The next lemma provides a lower bound on θ such that δ(v⁺; v̄⁺) ≤ 1/2; in practice we can compute the value θ for which δ(v⁺; v̄⁺) = 1/2, which will be (much) larger in general.
Lemma 3.3.2 Let the iterates and targets be as defined above. If δ(v; v̄) ≤ 1/4 and θ := ω(v)/(2√n), then δ(v⁺; v̄⁺) ≤ 1/2.

Proof. First, note that eᵀ(v̄⁺)² = (1 − θ)eᵀv². For the predictor step Δx, Δs as in (3.15), Lemma 3.2.1 implies

‖v⁻²ΔxΔs‖_∞ ≤ n/(4ω(v)²)   (3.17)

and

‖ΔxΔs‖ ≤ ‖v‖²/(2√2).   (3.18)

By definition,

δ(v⁺; v̄⁺) = (1/(2 min(v̄⁺))) ‖((v̄⁺)² − (v⁺)²)/v⁺‖,

where min(v̄⁺)² = (eᵀ(v⁺)²/eᵀ(v̄⁽⁰⁾)²) min((v̄⁽⁰⁾)²) and, by (3.15), (v⁺)² = v²(1 − θ) + θ²ΔxΔs. For the denominator we use

(v⁺)² = v²(1 − θ) + θ²ΔxΔs ≥ v²(1 − θ − θ²‖v⁻²ΔxΔs‖_∞),

and, by (3.17), for θ = ω(v)/(2√n),

1 − θ − θ²‖v⁻²ΔxΔs‖_∞ ≥ 1 − ω(v)/(2√n) − (ω(v)²/(4n)) · (n/(4ω(v)²)) ≥ 1 − 1/(2√2) − 1/16.

To estimate the remaining term we note that eᵀv² = eᵀv̄², while v̄² = α(v̄⁽⁰⁾)² for some 0 < α < 1, so that (eᵀv²/eᵀ(v̄⁽⁰⁾)²) min((v̄⁽⁰⁾)²) = min(v̄)². From Lemma 3.2.2 it follows that

min(v)/min(v̄) ≤ ρ(δ(v; v̄)).

Combining these bounds we obtain

δ(v⁺; v̄⁺) ≤ 1/3 + (4/3) · ρ(1/4)/(32√(1 − 1/(2√2))) < 1/2,

which proves the lemma. □
To obtain a complexity bound, we need to show that the step size can be bounded from below by a uniform constant. Let i, j be such that min(v) = vᵢ and max(v̄) = v̄ⱼ. Then

ω(v) = min(v)/max(v) = (vᵢ/v̄ᵢ)(v̄ᵢ/max(v))
     ≥ (1/ρ(δ(v; v̄))) min(v̄)/max(v)
     ≥ (1/ρ(δ(v; v̄))) min(v̄)/(ρ(δ(v; v̄)) max(v̄))
     = ω(v̄)/ρ(δ(v; v̄))² ≥ (3/5) ω(v̄⁽⁰⁾).

Together with (3.16) this shows that, to reach an ε-approximate solution, the algorithm requires at most O((√n/ω(v̄⁽⁰⁾)) ln(1/ε)) iterations. Observe that the estimate (3.17) essentially determines the complexity bound; in practice the actual value ‖v⁻²ΔxΔs‖ will determine the predictor step that can actually be taken.
Motivation

In Jansen et al. [17] the primal–dual Dikin affine-scaling direction at v is introduced by using the solution of the subproblem

min { 2vᵀΔv : ‖v⁻¹Δv‖ ≤ 1 },

defined in the v-space. This problem can be interpreted as finding the direction in the v-space that aims at a maximal decrease of the duality gap within the Dikin ellipsoid in the v-space. The solution Δv is given by −v³/‖v²‖. Let us now use the vector field of the primal–dual Dikin direction and its associated set of trajectories. The equation of the trajectory passing through v ∈ ℝⁿ₊₊ and tangent to the vector field is given by

Φ(t; v) = v/√(v²t + e),   t ≥ 0.   (3.19)

It holds that Φ(0; v) = v and, for t → ∞, Φ(t; v) tends to zero tangentially to the vector e.
Observe that the central path and the weighted paths discussed in Section 3.3.1 are
straight half-lines emanating from the origin. Contrary to these, the path defined
by (3.19) is a smooth curve connecting v and the origin, see Figure 3.1.
Figure 3.1 The central path, the weighted path through v⁽⁰⁾ and the Dikin-path through v⁽⁰⁾ in the v-space.
We first show that Φ(t; v) defines a path in the v-space, henceforth called the Dikin-path starting at v, and derive some interesting properties.

(ii) For any t ≥ 0 it holds that if vᵢ ≤ vⱼ then Φᵢ(t; v) ≤ Φⱼ(t; v);

(iii) For any t ≥ 0 it holds that ω(Φ(t; v)) ≥ ω(v).

Indeed,

ω(Φ(t; v)) = ω(v) √((max(v)²t + 1)/(min(v)²t + 1)) ≥ ω(v). □
o
Algorithms using such paths were introduced in Jansen et al. [18]. Note that such a path combines centering with moving towards optimality, as opposed to a weighted path. We stress that centering is very important in interior point methods. A sequence of iterates that approximates the central path (in the limit) will generate points converging to the analytic center of the optimal face, see Güler and Ye [12]. It is well known that this center is a strictly complementary solution, thereby defining the optimal partition of the indices that characterizes the optimal set, which is very useful in sensitivity analysis. Also, the asymptotic analysis of certain interior point methods uses the centering to prove superlinear or quadratic convergence of algorithms, see e.g., [10, 9].
Since we are interested in the behavior of Newton's method per iteration we just denote v̄ := v̄⁽ᵏ⁻¹⁾, v̄⁺ := v̄⁽ᵏ⁾ and t := tₖ. We also use ω := ω(v̄). Taking for Δx and Δs the displacements according to a full Newton step with respect to the target-point v̄, we can now formally state the algorithm as in Figure 3.2.
Input
  (x⁽⁰⁾, s⁽⁰⁾): the initial pair of interior feasible solutions;
  v⁽⁰⁾ := √(x⁽⁰⁾s⁽⁰⁾);
Parameters
  ε is the accuracy parameter;
  t is the step size (default value ω₀/(3v̄ₙ²√n));
begin
  x := x⁽⁰⁾; s := s⁽⁰⁾; v := √(xs);
  while xᵀs > ε do
    v̄ := v̄/√(v̄²t + e);
    compute (Δx, Δs) from (3.4);
    x := x + Δx;
    s := s + Δs;
  end
end.
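Since a full Newton step makes the duality gap equal to the target gap eᵀv̄² (Lemma 3.2.5), the progress of the outer loop can be sketched by tracking the target sequence alone. The following is an illustrative simulation only (hypothetical starting target; step-size formula as in the reconstruction of Lemma 3.3.4), not an LP solver:

```python
import numpy as np

v_bar = np.array([0.4, 1.0, 1.7, 2.2])   # hypothetical initial target v^(0)
eps = 1e-6
n = v_bar.size
iters = 0
while (v_bar**2).sum() > eps:            # e^T vbar^2 is the (target) duality gap
    omega = v_bar.min() / v_bar.max()
    t = omega / (3 * v_bar.max()**2 * np.sqrt(n))   # step size along the Dikin-path
    v_bar = v_bar / np.sqrt(v_bar**2 * t + 1.0)     # target update of Figure 3.2
    iters += 1
```

The gap decreases geometrically, and the contraction improves as ω grows along the path.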
From Section 3.2.2 it is clear that the only thing remaining in the analysis of a target-following method is to guarantee that a sufficiently large step size in the v-space can be taken, and to use this to compute the number of steps needed by the algorithm.
Specifically, we should check for which value of t the conditions of Theorem 3.2.6
hold.
Lemma 3.3.4 Let v̄⁺ result from a step along the Dikin-path with step size

t := ω/(3v̄ₙ²√n).

Then

min(v̄)/min(v̄⁺) ≤ 2/√3 and δ(v̄; v̄⁺) ≤ 1/6.

Proof. Using min(v̄) = v̄₁ and min(v̄⁺) = v̄₁⁺, the first bound follows from

min(v̄)/min(v̄⁺) = √(v̄₁²t + 1) = √(1 + ω³/(3√n)) ≤ 2/√3.   (3.20)

Furthermore,

δ(v̄; v̄⁺) = (1/(2 min(v̄⁺))) ‖((v̄⁺)² − v̄²)/v̄‖
         = (√(v̄₁²t + 1)/(2v̄₁)) ‖t v̄³/(v̄²t + e)‖
         ≤ (√(v̄₁²t + 1)/(2v̄₁)) (t v̄ₙ²/(v̄ₙ²t + 1)) √n v̄ₙ
         = 1/(6√(ω/(3√n) + 1)) < 1/6.

This completes the proof. □
Assuming δ(v; v̄) < 1/2, combining Theorem 3.2.6 with Lemma 3.3.4 shows that we can compute v⁺ in one Newton step such that δ(v⁺; v̄⁺) < 1/2. We proceed by considering the reduction of the duality gap in the algorithm. Recall from Lemma 3.2.5 that after a full Newton step the duality gap attains its target value, so we only need to consider the duality gaps eᵀv̄² resulting from successive target values. Using this, we prove the following theorem.
Theorem 3.3.5 Let (x⁽⁰⁾, s⁽⁰⁾) be a given initial point and let ω₀ := ω(v⁽⁰⁾). If the step size t has in every iteration the value ω/(3v̄ₙ²√n), then after at most

O((√n/ω₀³) ln((x⁽⁰⁾)ᵀs⁽⁰⁾/ε))

iterations the algorithm stops with a positive primal–dual pair (x*, s*) satisfying (x*)ᵀs* ≤ ε.

If, as before, the target-point at the beginning of some iteration is denoted by v̄ and at the end of the same iteration by v̄⁺, then we have

‖v̄⁺‖² ≤ ‖v̄‖²/(1 + ω³/(3√n)).

From the theorem we see that whenever (x⁽⁰⁾)ᵀs⁽⁰⁾ = O(1) and ω₀ = Ω(1), the target-following algorithm runs in O(√n ln(1/ε)) iterations. Unfortunately, whenever ω₀ is smaller than O(1) the complexity bound is heavily negatively influenced. We will later show how the bound can be improved by adjusting the analysis and using the fact that the proximity ω increases along the Dikin-path.
Next we consider an algorithm in which the targets themselves are determined by Dikin steps:

v̄⁺ := v̄ − α v̄³/‖v̄²‖ = v̄ (e − α v̄²/‖v̄²‖),

for some positive number α. Since we require v̄⁺ to be positive, it is well defined only if

α < α_max := ‖v̄²‖/max(v̄)².

Defining the step size θ by θ := α/α_max, we have 0 < θ < 1 and

v̄⁺ := v̄ − θ v̄³/max(v̄)² = v̄ (e − θ v̄²/max(v̄)²).   (3.21)

Note that each element of v̄⁺ is smaller than the corresponding element of v̄. This property is important, since the Newton process in the (x, s)-space forces equality between the duality gap and eᵀ(v̄⁺)², see Lemma 3.2.5. So the duality gap will be decreasing, and is bounded by

‖v̄‖(1 − θ) ≤ ‖v̄⁺‖ ≤ ‖v̄‖(1 − θ min(v̄)²/max(v̄)²) = ‖v̄‖(1 − θω̄²),   (3.22)
where ω̄ := ω(v̄). If we choose θ ≤ 1/3 then the Dikin step has two interesting properties, which are similar to the ones in Lemma 3.3.3: it preserves the ranking of the coordinates of v̄, and it causes the ratio ω̄ to increase monotonically. These results are summarized in the next lemmas.

Lemma 3.3.6 Assume that 0 < v̄₁ ≤ v̄₂ ≤ … ≤ v̄ₙ and let θ ≤ 1/3. Then

0 < v̄₁⁺ ≤ v̄₂⁺ ≤ … ≤ v̄ₙ⁺.

Thus it follows that v̄ᵢ⁺ ≤ v̄ⱼ⁺ for i ≤ j, with equality if and only if v̄ᵢ = v̄ⱼ. This proves the lemma. □
Remark. An alternative proof of Lemma 3.3.6 can be given using the function φ(t) = t(1 − θt²)/(1 − θ). Assuming without loss of generality that v̄ₙ = 1, v̄ᵢ⁺ = φ(v̄ᵢ) is the value after the Dikin step, where the maximal component of v̄⁺ is rescaled to 1. This function is monotonically increasing and concave for θ ≤ 1/3. ■
In the sequel we shall use θ ≤ 1/3; hence we may assume that the coordinates of v̄ are ranked as in Lemma 3.3.6. So v̄₁ is the smallest and v̄ₙ the largest element of v̄, and ω̄ = v̄₁/v̄ₙ.

Lemma 3.3.7 Assume that θ ≤ 1/3 and let ω̄⁺ := ω(v̄⁺). Then

ω̄⁺ = ((1 − θω̄²)/(1 − θ)) ω̄ ≥ ω̄,   (3.23)

and

1 − ω̄⁺ ≤ (1 − θω̄/(1 − θ)) (1 − ω̄).   (3.24)

Proof. Since θ ≤ 1/3, Lemma 3.3.6 implies that ω̄⁺ = v̄₁⁺/v̄ₙ⁺. Hence, from the definition of v̄₁⁺ and v̄ₙ⁺ we get (3.23). Furthermore,

1 − ω̄⁺ = 1 − ((1 − θω̄²)/(1 − θ)) ω̄ = (1 − θ − ω̄ + θω̄³)/(1 − θ)
       = ((1 − ω̄)(1 − θ(1 + ω̄ + ω̄²)))/(1 − θ)
       = (1 − θ(ω̄ + ω̄²)/(1 − θ)) (1 − ω̄)
       ≤ (1 − θω̄/(1 − θ)) (1 − ω̄). □
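The Dikin step (3.21) and the identities of Lemmas 3.3.6 and 3.3.7 can be checked directly (an illustrative sketch with hypothetical data):

```python
import numpy as np

theta = 1/3
v_bar = np.array([0.5, 0.9, 1.3, 2.0])     # sorted ascending, as in Lemma 3.3.6
v_plus = v_bar * (1 - theta * v_bar**2 / v_bar.max()**2)   # Dikin step (3.21)

omega = v_bar.min() / v_bar.max()
omega_plus = v_plus.min() / v_plus.max()
```

The ranking of the components is preserved, and ω̄⁺ satisfies (3.23) exactly.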
Remark. If we use a value θ > 1/3, the ranking of v̄ may not be preserved and the proof of Lemma 3.3.7 does not go through. However, it is still possible to prove the monotonicity of ω̄ for θ ≤ 1/2. We omit the proof since this property will not be used in the analysis. ■
Lemma 3.3.8 Let v̄⁺ result from a Dikin step at v̄ with step size θ ≤ 1/3 using (3.21). Then

min(v̄)/min(v̄⁺) ≤ 1/(1 − θ) and δ(v̄; v̄⁺) ≤ θ√n/((1 − θ)ω̄).

Proof. Since v̄⁺ < v̄ it holds that v̄⁺ + v̄ < 2v̄ ≤ 2v̄ₙe. Using also the definition of v̄⁺ we get

‖v̄⁻¹((v̄⁺)² − v̄²)‖ = ‖v̄⁻¹(v̄⁺ + v̄)(v̄⁺ − v̄)‖ ≤ 2v̄ₙ ‖v̄⁻¹ θv̄³/v̄ₙ²‖ = (2θ/v̄ₙ)‖v̄²‖ ≤ 2θ√n v̄ₙ.

Since min(v̄⁺) = v̄₁(1 − θω̄²) ≥ (1 − θ)v̄₁, division by 2 min(v̄⁺) yields the second bound, and the first bound follows as well. □
Theorem 3.3.9 Let (x⁽⁰⁾, s⁽⁰⁾) be a given initial point and let the step size θ be chosen in every iteration proportionally to ω̄/√n, in accordance with Lemma 3.3.8 and Theorem 3.2.6. Then after at most

O((√n/ω₀³) ln((x⁽⁰⁾)ᵀs⁽⁰⁾/ε))

iterations the algorithm stops with a positive primal–dual pair (x*, s*) satisfying (x*)ᵀs* ≤ ε.
Comparing Theorem 3.3.9 with Theorem 3.3.5 we see that this target-following algorithm has exactly the same complexity as the Dikin-path-following method analyzed before. Still, there is a major conceptual difference between the two algorithms, since one chooses its targets on one smooth path, while the other has targets on various Dikin-paths. Moreover, when starting at the same point in the v-space, a Dikin step as in the second algorithm moves the target closer to the central path than a step along the Dikin-path; this can be verified by comparing the values of ω̄⁺ in Lemma 3.3.3(iii) and Lemma 3.3.7.
The improved bound rests on the following observation: after at most

O((1/θ) ln(1/ω₀))

target updates using Dikin steps with step size θ we have ω̄² ≥ 1/2. Indeed, by (3.23), while ω̄² ≤ 1/2 each update increases ω̄ by a factor of at least 1 + θ/(2(1 − θ)). So ω̄² ≥ 1/2 will certainly hold if

(1 + θ/(2(1 − θ)))²ᵏ ω₀² ≥ 1/2,

or equivalently, if

2k ln(1 + θ/(2(1 − θ))) ≥ ln(1/(2ω₀²)).

Using ln(1 + t) > t/2 for t < 1, this will certainly be satisfied if

k ≥ (2(1 − θ)/θ) ln(1/(2ω₀²)).
Theorem 3.3.11 The algorithm tracing targets determined by Dikin steps requires at most

O(√n ((1/ω₀) ln(1/ω₀) + ln((x⁽⁰⁾)ᵀs⁽⁰⁾/ε)))

iterations.
Unfortunately, this complexity bound is not better than the one obtained for weighted path-following algorithms (see Ding and Li [3] or Section 3.3.1); still, the new algorithm has the advantage of generating, in theory and in practice, increasingly centered pairs. Let us define 'close to the central path' by requiring that the iterate is in the region of quadratic convergence of some point on the central path. We can relate 'closeness' to the value of ω as follows.

Lemma 3.3.12 If ω := ω(v) ≥ n/(n + 1), then there exists a target-point v̄ on the central path such that δ := δ(v; v̄) ≤ 1/√2.
Proof. For a target v̄ = √μ e on the central path we have δ(v; √μ e) = ½‖v/√μ − √μ v⁻¹‖. This measure is minimal for μ = ‖v‖/‖v⁻¹‖, with value

(1/√2) √(‖v‖‖v⁻¹‖ − n).

Hence we will have δ ≤ 1/√2 if

‖v‖‖v⁻¹‖ − n ≤ 1.

Using the bounds ‖v‖ ≤ √n max(v) and ‖v⁻¹‖ ≤ √n/min(v), this implies that it suffices to have

1/ω ≤ (n + 1)/n,

which implies the lemma. □
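The optimal choice of μ in the proof above is easy to verify numerically (an illustrative sketch with hypothetical data):

```python
import numpy as np

def delta(v, v_bar):
    """Proximity measure (3.7)."""
    return np.linalg.norm((v_bar**2 - v**2) / v) / (2 * v_bar.min())

v = np.array([0.8, 1.0, 1.1, 1.3])
n = v.size
mu_opt = np.linalg.norm(v) / np.linalg.norm(1/v)   # minimizing mu = ||v|| / ||v^{-1}||
d_opt = delta(v, np.sqrt(mu_opt) * np.ones(n))     # proximity to the central-path target
```

`d_opt` equals (1/√2)√(‖v‖‖v⁻¹‖ − n), and no other μ on a grid around `mu_opt` does better.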
The next lemma estimates the number of updates needed to reach a target with ω ≥ n/(n + 1). By (3.24), after k updates

1 − ω̄ₖ ≤ (1 − θω₀/(1 − θ))ᵏ (1 − ω₀) ≤ 1/(n + 1).

Taking logarithms and using ln(1 − t) ≤ −t for t < 1, we obtain that k should satisfy

k ≥ ((1 − θ)/(θω₀)) ln((n + 1)(1 − ω₀)).
It is left to the reader to verify the following lemmas, which can be proved similarly as in the case ν = 1.

Lemma 3.3.15 Let v̄⁺ result from v̄ by a target update using (3.27) with step size θ ≤ 1/(2ν + 1). Then
We find that the algorithm using ν-order scaling for the target update requires

O(√n ((1/ω₀) ln(1/ω₀) + ln((x⁽⁰⁾)ᵀs⁽⁰⁾/ε)))

iterations. Moreover, the ratio ω increases monotonically along the updates:

ω(v̄⁺) = v̄₁⁺/v̄ₙ⁺ ≥ ω(v̄).   (3.29)
Lemma 3.3.16 Let v̄ be given and let ω = min(v̄)/max(v̄); using the target update (3.28) we have

min(v̄)/min(v̄⁺) = 1/√θ

and

δ(v̄; v̄⁺) ≤ (1/(2√θ)) (1/ω − θ) √n.

Proof. The first equality is immediate from the definition of the update. For the second bound,

δ(v̄; v̄⁺) ≤ (1/(2√θ)) (max(v̄)/min(v̄) − θ) √n = (1/(2√θ)) (1/ω − θ) √n. □
Applying Theorem 3.2.6 with the bounds in Lemma 3.3.16, we can compute the maximal value of θ such that δ(v⁺; v̄⁺) ≤ 1/2 will hold, given δ(v; v̄) ≤ 1/2. Unfortunately, it appears that this cannot be done without requiring a condition on ω. If we require

1/ω ≤ 1 + 1/(5√n),

and choose θ = 1 − 1/(5√n), then it holds that δ(v⁺; v̄⁺) ≤ 1/2 and the algorithm has an O(√n ln(1/ε)) iteration complexity. Observe that even in a target-following framework the algorithm is required to stay in a small neighborhood of the central path.
Let x⁽⁰⁾ be given such that Ax⁽⁰⁾ = b, and define h and μ₀ such that x⁽⁰⁾ + μ₀h > 0. As Freund, we make the following assumption.

Assumption 3.3.17 The shift h is chosen such that for all dual feasible slacks s the condition ‖hs‖ ≤ √n holds.

Note that the assumption can be satisfied if the dual feasible region is bounded. Freund shows that when an approximation s̃ of the analytic center of the (bounded) dual feasible region is known, then the algorithm can be started with this approximation, the shift h = s̃⁻¹ and a suitable value for μ₀. The system to be (approximately) solved in an iteration is given by

Ax = b,   x + μh ≥ 0,
Aᵀy + s = c,   s ≥ 0,
(x + μh)s = μe.
While Freund's algorithm does not necessarily generate feasible dual iterates in each iteration, our primal–dual variant does. The main task is to estimate the effect of updating the target √μ e, for which we use the following distance measure.

Definition 3.3.18 The vectors x and s are called (μ, β)-approximate solutions if

Ax = b,   x + μh ≥ 0,
Aᵀy + s = c,   s ≥ 0,

and δ(x + μh, s; √μ e) ≤ β for some constant β < 1.
Lemma 3.3.19 Let x and s be (μ, 1/4)-approximate solutions and let μ⁺ = (1 − θ)μ for θ = 1/(16√n). Then we can compute (μ⁺, 1/4)-approximate solutions x⁺ and s⁺ with one Newton step.

Proof. We have

(xᵢ + μ⁺hᵢ)sᵢ = (xᵢ + μhᵢ)sᵢ + (μ⁺ − μ)hᵢsᵢ ≥ (3/5)μ − θμhᵢsᵢ,

so we can use the pair (x + μ⁺h, s) as starting point for Newton's method towards the new target √μ⁺ e. We first establish that this pair is still close to the current target √μ e.
We let the algorithm run until (x + μh)ᵀs ≤ ε; from the condition of approximate solutions it then follows that nμ ≤ 2ε. Hence after O(√n ln(1/ε)) iterations the algorithm has generated μ* and a pair (x*, s*) such that (x* + μ*h)ᵀs* ≤ ε and

x* = (x* + μ*h) − μ*h ≥ −μ*h ≥ −(2ε/n)‖h‖_∞ e.

Hence the pair (x*, s*) is an approximately feasible and approximately optimal solution if ε is chosen sufficiently small.
if min(v̄⁺) > max(v), then we set v̄⁺ = max(v)e, which is on the central path. The goal of the algorithm is to obtain a target vector which is a multiple of the all-one vector. Since

(max(v̄⁺)/min(v̄⁺))² ≤ (1/(1 + θ)) (max(v̄)/min(v̄))²,

or equivalently (ω̄⁺)² ≥ (1 + θ)ω̄², it follows that reaching this goal will require at most

O((1/θ) ln(1/ω₀))

iterations. The appropriate value of θ is determined from the following lemma.
Lemma 3.3.20 Let v̄ be given; using the target update (3.30) we have

min(v̄)/min(v̄⁺) ≤ 1 and δ(v̄; v̄⁺) ≤ θ√n/2.

Proof. If we are not at the last iteration then from (3.30) it follows that for any i

v̄ᵢ⁺ ≥ min(v̄);

when v̄⁺ = max(v)e at the last iteration we have v̄ᵢ⁺ ≥ min(v̄) as well, hence the first bound. Let J be the set of indices for which v̄ᵢ is increased. Then we have v̄ᵢ⁺ = v̄ᵢ for i ∉ J and

0 ≤ (v̄ᵢ⁺)² − v̄ᵢ² ≤ θ min(v̄)²   for i ∈ J.

Consequently,

δ(v̄; v̄⁺) ≤ (1/(2 min(v̄))) √n θ min(v̄)²/min(v̄) = θ√n/2. □
Combining this result with Theorem 3.2.6 gives that we can take θ = 1/(3√n) to have δ(v⁺; v̄⁺) < 1/2. So we obtain that the algorithm needs at most O(√n ln(1/ω₀)) iterations.

If we combine the above centering scheme with the standard primal–dual path-following algorithm, we obtain an algorithm for the LP problem needing at most

O(√n (ln(1/ω₀) + ln((x⁽⁰⁾)ᵀs⁽⁰⁾/ε)))   (3.31)

iterations, starting from any interior feasible point. This is done by first centering, and then working to optimality. Note that in the centering phase the duality gap in subsequent target points increases, but it is bounded by n·max(v⁽⁰⁾)².
where v = √(xs) as usual. We make the assumption that a (specific) point on or close to the central path is available. Note that we might use the centering algorithm of the previous subsection to find such a point. This problem has interesting special cases that are considered by Atkinson and Vaidya [1], Freund [6] and Goffin and Vial [7], namely to obtain the weighted analytic center of a polytope. If b = 0 and (x, y, s) is a solution to system (3.32), then y is the weighted analytic center of the dual space, if it is bounded; when c = 0 and (x, y, s) satisfies the given system, then x is the weighted analytic center of the primal space, if it is bounded.

We will first analyze an algorithm proposed by Mizuno [25], which is in a sense the dual of the algorithm for finding a center discussed in the previous subsection. Then we give a simplified analysis of the algorithm proposed by Atkinson and Vaidya [1] for computing weighted analytic centers. We extend their algorithm to the case of computing weighted primal and dual centers, i.e., to finding a solution of the system (3.32).
Lemma 3.3.21 Let v̄⁺ be obtained from v̄ by an update of the target using (3.33). Then

min(v̄)/min(v̄⁺) ≤ 1/√(1 − θ) and δ(v̄; v̄⁺) ≤ θ√n/(2√(1 − θ)).

Proof. The first bound is trivial. The components of v̄ that are decreased by a factor √(1 − θ) have not yet achieved their final value wᵢ. Since they all start with the same value, they have all been reduced by the same cumulated factor, and thus

v̄ᵢ⁺ = √(1 − θ) v̄ᵢ with v̄ᵢ = min(v̄).

So we have for all i that |(v̄ᵢ⁺)² − v̄ᵢ²| ≤ θ min(v̄)². Hence

δ(v̄; v̄⁺) ≤ (1/(2√(1 − θ) min(v̄))) √n θ min(v̄)²/min(v̄) = θ√n/(2√(1 − θ)). □

Using Theorem 3.2.6 this gives δ(v⁺; v̄⁺) < 1/2 for θ = 1/(3√n).
iterations to be performed is determined by the condition
=
So first we consider the case b O. Assuming that w 2 2:: e and w 2 integral, Atkinson
and Vaidya suggest to start with a target vector v<0) =
e, and to successively increase
the weights by the use of a scaling technique it la Edmonds and Karp [4]. The basic
idea is to recursively solve the given problem with all weights W[ replaced by the
maximum of 1 and Lw[!2J. Let p = Llog2max(w2)J. Then W[ can be written in
binary notation as
=
W[ f3i o f3h .. . f3i p '
where f3ij E {O, 1} for all i, j. Elements of the weight-vector w 2 which do not need p
digits for their binary description start by convention with a string of zeroes. Now,
at iteration k the target is defined by
- R. R.
( ;;;(v k))2 R.
i - P l o P S ! .. ojJ,1c'
From now on, we denote for ease of notation v̄ := v̄⁽ᵏ⁻¹⁾ and v̄⁺ := v̄⁽ᵏ⁾. Then the technique boils down to a scheme that updates v̄ᵢ in the following way:

(v̄ᵢ⁺)² = 1            if i ∈ I₁ = {i : βᵢ₀ = … = βᵢₖ = 0},
(v̄ᵢ⁺)² = 2v̄ᵢ²         if i ∈ I₂ = {i : βᵢₖ = 0, βᵢⱼ = 1 for some j < k},   (3.34)
(v̄ᵢ⁺)² = 2v̄ᵢ² + 1     if i ∈ I₃ = {i : βᵢₖ = 1}.

Observe that

i ∈ I₁ ⟹ v̄ᵢ⁺ = v̄ᵢ = 1.   (3.35)
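The bit-by-bit construction of the outer targets can be sketched as follows (hypothetical weights; each stage doubles the collected binary prefix and appends the next digit, while components whose digits are still all zero are kept at 1):

```python
w2 = [1, 5, 12, 9]                  # hypothetical integer weights w_i^2 >= 1
p = max(w2).bit_length() - 1        # p = floor(log2 max(w^2))

prefix = [0] * len(w2)
targets2 = [1] * len(w2)            # (vbar^(0))^2 = e
for k in range(p + 1):
    # append binary digit beta_{ik} of w_i^2 to the running prefix
    prefix = [2 * q + ((wi >> (p - k)) & 1) for q, wi in zip(prefix, w2)]
    targets2 = [max(1, q) for q in prefix]   # I_1-components stay at 1
```

After processing all p + 1 digits the targets equal the weights themselves.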
The number of updates in the outer target-sequence is p + 1 = O(log₂ max(w²)). We next need to compute the complexity of one outer update. This will be done by defining an inner target-sequence that leads from v̄ to v̄⁺. In [1] a pure dual algorithm is used, which means that doubling all weights does not change the position of the dual weighted center. Hence, the only Newton steps needed are those to get from 2v̄² to (v̄⁺)², which are quite close to each other. Let (x, s) and v̄ be given such that δ(v; v̄) < 1/2, where v := √(xs). Since b = 0, the inner sequence can be chosen geometrically: after j inner updates the relevant components of the target satisfy

(1 − α/(v̄ᵢ√n))ʲ (2v̄ᵢ²) ≤ 2v̄ᵢ² − 1,

which holds as soon as

j ≥ (v̄ᵢ√n/α) ln(2v̄ᵢ²/(2v̄ᵢ² − 1)).
Lemma 3.3.22 Define θ := α/√n. Let v̄⁽ʲ⁾ be obtained from v̄⁽ʲ⁻¹⁾ by an update of the target using (3.37) and (3.38). Then

min(v̄⁽ʲ⁻¹⁾)/min(v̄⁽ʲ⁾) ≤ 1/√(1 − θ) and δ(v̄⁽ʲ⁻¹⁾; v̄⁽ʲ⁾) ≤ 3α/(2√(1 − θ)).

Proof. Writing v̄ := v̄⁽ʲ⁻¹⁾, we have

δ(v̄⁽ʲ⁻¹⁾; v̄⁽ʲ⁾) ≤ (1/(2√(1 − θ) min(v̄))) (Σ_{i∈I₂∪I₃} ((v̄ᵢ²(1 − θᵢ) − v̄ᵢ²)/v̄ᵢ)²)^{1/2}
              ≤ (θ/(2√(1 − θ) min(v̄))) (Σ_{i∈I₃} (√(2v̄ᵢ² + 1)/v̄ᵢ)²)^{1/2}
              ≤ (θ/(2√(1 − θ) min(v̄))) · 3√n = 3α/(2√(1 − θ) min(v̄)).

Since min(v̄) ≥ 1 the lemma follows. □
θᵢ := 0 if i ∈ I₁, and θᵢ := α/(v̄ᵢ√n) if i ∈ I₂ ∪ I₃,

where α > 0 is a certain constant. Update v̄⁽ʲ⁾ for j ≥ 1 in the following way: (3.40)
Note that the proof of Lemma 3.3.22 is easily adapted to this sequence, and that its result remains the same. Using the condition that each component must reach its final value, and using the fact that v̄ᵢ ≥ 1, this implies that the number of updates must be of the order O(max(v̄)√n), so an upper bound expressed in the data is O(max(w)√n). The reader should notice that we have hereby shown that the algorithm has a total complexity of

O(max(w) √n log₂ max(w))

Newton steps. This is a factor max(w) worse than the result (3.39) above and in [1]. This difference can be explained by noticing that doubling all weights does not have any effect in a pure primal or dual method, but has quite an effect in a primal–dual method.
Acknowledgements
The first author is supported by the Dutch Organization for Scientific Research
(NWO), grant 611-304-028. Currently he is working at Centre for Quantitative
Methods (CQM) B.V., Eindhoven, The Netherlands.
REFERENCES
[1] D.S. Atkinson and P.M. Vaidya. A scaling technique for finding the weighted
analytic center of a polytope. Mathematical Programming, 57:163-192, 1992.
[2] D.A. Bayer and J.C. Lagarias. The nonlinear geometry of linear programming, Part I: Affine and projective scaling trajectories. Transactions of the American Mathematical Society, 314:499-526, 1989.
[3] J. Ding and T.Y. Li. An algorithm based on weighted logarithmic barrier func-
tions for linear complementarity problems. Arabian Journal for Science and
Engineering, 15:679-685, 1990.
[5] R.M. Freund. Theoretical efficiency of a shifted barrier function algorithm for linear programming. Linear Algebra and Its Applications, 152:19-41, 1991.
[7] J.-L. Goffin and J.-Ph. Vial. On the computation of weighted analytic centers and dual ellipsoids with the projective algorithm. Mathematical Programming, 60:81-92, 1993.
[8] C.C. Gonzaga. Path following methods for linear programming. SIAM Review,
34:167-227, 1992.
[9] C.C. Gonzaga. The largest step path following algorithm for monotone lin-
ear complementarity problems. Technical Report 94-07, Faculty of Technical
Mathematics and Computer Science, Delft University of Technology, Delft, The
Netherlands, 1994.
[10] C.C. Gonzaga and R.A. Tapia. On the quadratic convergence of the simplified
Mizuno-Todd-Ye algorithm for linear programming. Technical Report 92-41,
Dept. of Mathematical Sciences, Rice University, Houston, TX, USA, 1992.
[11] O. Güler, C. Roos, T. Terlaky, and J.-Ph. Vial. Interior point approach to the theory of linear programming. Cahiers de Recherche 1992.3, Faculté des Sciences Économiques et Sociales, Université de Genève, Geneva, Switzerland, 1992. (To appear in Management Science).
[12] O. Güler and Y. Ye. Convergence behavior of interior-point algorithms. Mathematical Programming, 60:215-228, 1993.
[13] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex
Programming, Algorithms and Complexity. Kluwer Publishers, Dordrecht, The
Netherlands, 1994.
[14] D. den Hertog, C. Roos, and T. Terlaky. A polynomial method of weighted centers for convex quadratic programming. Journal of Information & Optimization Sciences, 12:187-205, 1991.
[15] B. Jansen. Interior point techniques in optimization; complementarity, sensitiv-
ity and algorithms. PhD thesis, Faculty of Technical Mathematics and Computer
Science, Delft University of Technology, Delft, The Netherlands, 1995.
[16] B. Jansen, C. Roos, and T. Terlaky. A family of polynomial affine scaling al-
gorithms for positive semi-definite linear complementarity problems. Technical
Report 93-112, Faculty of Technical Mathematics and Computer Science, Delft
University of Technology, Delft, The Netherlands, 1993.
[17] B. Jansen, C. Roos, and T. Terlaky. A polynomial primal-dual Dikin-type
algorithm for linear programming. Technical Report 93-36, Faculty of Technical
Mathematics and Computer Science, Delft University of Technology, Delft, The
Netherlands, 1993. (To appear in Mathematics of Operations Research).
[18] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Primal-dual target-following algorithms for linear programming. Technical Report 93-107, Faculty of Technical Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands, 1993. (To appear in Annals of Operations Research).
[19] B. Jansen, C. Roos, T. Terlaky, and J.-Ph. Vial. Primal-dual algorithms for linear programming based on the logarithmic barrier method. Journal of Optimization Theory and Applications, 83:1-26, 1994.
[27] R.D.C. Monteiro and I. Adler. Interior path following primal-dual algorithms:
Part I : Linear programming. Mathematical Programming, 44:27-41, 1989.
[28] Y. Nesterov and M.J. Todd. Self-scaled barriers and interior-point methods
for convex programming. Technical Report 1091, School of OR and IE, Cor-
nell University, Ithaca, New York, USA, 1994. (To appear in Mathematics of
Operations Research).
[29] C. Roos and D. den Hertog. A polynomial method of weighted centers for linear
programming. Technical Report 89-13, Faculty of Technical Mathematics and
Computer Science, Delft University of Technology, Delft, The Netherlands, 1989.
[30] C. Roos and J.-Ph. Vial. A polynomial method of approximate centers for linear programming. Mathematical Programming, 54:295-305, 1992.
[31] J.F. Sturm and S. Zhang. An O(√n L) iteration bound primal-dual cone affine scaling algorithm. Technical Report TI 93-219, Tinbergen Institute, Erasmus University Rotterdam, 1993.
4
POTENTIAL REDUCTION
ALGORITHMS
Kurt M. Anstreicher
Department of Management Sciences
University of Iowa
Iowa City, IA 52242, USA
4.1 INTRODUCTION
Potential reduction algorithms have a distinguished role in the area of interior point
methods for mathematical programming. Karmarkar's [44] algorithm for linear pro-
gramming, whose announcement in 1984 initiated a torrent of research into interior
point methods, used three key ingredients: a non-standard linear programming for-
mulation, projective transformations, and a potential function with which to measure
the progress of the algorithm. It was quickly shown that the non-standard formula-
tion could be avoided, and evennally algorithms were developed that eliminated the
projective transformations, but retained the use of a potential function. It is then fair
to say that the only really essential element of Karmarkar's analysis was the potential
function. Further modifications to Karmarkar's original potential function gave rise
to potential reduction algorithms having the state-of-the-art theoretical complexity
of O(..,fiiL) iterations, to solve a standard form linear program with n variables, and
integer data with total bit size L. In the classical optimization literature, potential
reduction algorithms are most closely related to Huard's [39] "method of centres;"
see also Fiacco and McCormick [21, Section 7.2]. However, Karmarkar's use of a
potential function to facilitate a complexity, as opposed to convergence analysis, was
completely novel.
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 125-158.
© 1996 Kluwer Academic Publishers.
We provide (in the next section) the basic complexity arguments based on the primal and primal-dual potential functions. In the last section we describe various modifications and extensions of the algorithms.
Todd [78] has already written an excellent survey of potential reduction algorithms.
Compared to [78], this is a more introductory article that covers less material. For
the reader interested in a more technical discussion of the topics covered here, with a
greater emphasis on research issues and new extensions, we highly recommend [78].
For a discussion of path-following methods, the other major class of polynomial-
time interior point algorithms, we highly recommend the survey paper of Gonzaga
[37].
The primal potential function for LP is

    f(x, z) = q ln(c^T x − z) − Σ_{i=1}^n ln(x_i),

where q > n, x > 0 is feasible for LP, and z ≤ z* is a lower bound on the optimal objective value z*. A potential reduction algorithm generates a sequence (x^k, z^k), k ≥ 0, of interior points and lower bounds such that f(x^k, z^k) → −∞. The usual approach to analyzing such an algorithm is to show that on each iteration k it is possible to reduce f(·,·) by some uniform, positive amount δ. Note that for any x > 0,

    Σ_{i=1}^n ln(x_i) ≤ n ln(e^T x/n),

by the arithmetic-geometric mean inequality. If we assume that a decrease of at least δ occurs on each iteration, then after k iterations we immediately obtain

    ln(c^T x^k − z^k) ≤ f(x^0, z^0)/q − kδ/q + (n/q) ln(e^T x^k/n).    (4.1)

Clearly then if the solution sequence {x^k} is bounded, the "gap" c^T x^k − z^k will be driven to zero. We will next translate this observation into a precise complexity result for LP.
The usual complexity model for LP (see for example [65]) assumes that the data in LP is integral, and characterizes the performance of an algorithm in terms of the dimensions m and n, and the number of bits L required to encode the problem instance in binary. (The quantity L is commonly referred to as the size of LP.) A complete complexity analysis should bound the number of digits required in all computations carried out by the algorithm, but we ignore this issue here and consider only the number of arithmetic operations performed, and not the sizes of the numbers involved. We will use the well-known fact (see [65]) that if c^T x − z ≤ 2^{−2L} for a feasible solution x and lower bound z, then x can be "rounded" to an exact optimal solution of LP in O(m²n) operations. It is also well known that if LP has an optimal solution value z*, then −2^{O(L)} ≤ z* ≤ 2^{O(L)}.
To start, we assume that we are given an initial interior solution x^0 and lower bound z^0 such that f(x^0, z^0) ≤ O(qL). Later we will discuss the "initialization" problem of finding such a pair (x^0, z^0).
Theorem 4.2.1 Assume that the set of optimal solutions of LP is nonempty and bounded. Suppose that f(x^0, z^0) ≤ O(qL), and f(·,·) is reduced by δ on each iteration. Then after k = O(qL/δ) iterations, c^T x^k − z^k ≤ 2^{−2L}.
Proof We will show that ln(e^T x^k/n) ≤ O(L) for all k ≥ O(qL/δ), and therefore the theorem immediately follows from (4.1). For each iteration k define scalars

    λ_1^k = n/(e^T x^k),    λ_2^k = λ_1^k (c^T x^k − z^k),

and let ξ = n x^k/(e^T x^k) = λ_1^k x^k, so that e^T ξ = n. Exponentiating (4.1) then results in

    λ_2^k = λ_1^k (c^T x^k − z^k) ≤ 2^{O(L)} e^{−kδ/q} (e^T x^k/n)^{n/q − 1}.

It follows that for every k ≥ 0, (ξ, λ_1^k, λ_2^k) is a feasible solution for the linear programming problem:

    min   λ_1 + λ_2
    s.t.  Aξ − λ_1 b = 0
          e^T ξ = n                                    (4.2)
          c^T ξ − λ_1 z_max − λ_2 ≤ 0
          ξ ≥ 0,  λ_1 ≥ 0,  λ_2 ≥ 0,

where z_max = 2^{O(L)} is an upper bound for z*. Since the set of optimal solutions of LP is nonempty and bounded, the optimal objective value in (4.2) is strictly positive. Moreover the size of (4.2) is O(L), and therefore the optimal objective value is at least 2^{−O(L)} (see [65]). However, after k = O(qL/δ) iterations we must have either e^T x^k ≤ n, or λ_2^k < 2^{−O(L)}. It follows that for all k ≥ O(qL/δ), λ_1^k ≥ 2^{−O(L)}, and therefore ln(e^T x^k/n) ≤ O(L), as claimed. ∎
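The feasibility claim for (4.2), with λ_1 = n/(e^T x) and λ_2 = λ_1(c^T x − z), can be spot-checked numerically. A minimal sketch in which the problem data A, c, z, and z_max are made up for illustration:

```python
# Feasibility of (xi, lam1, lam2) in problem (4.2), on arbitrary test data.
A = [[1.0, 2.0, 1.0]]          # a single constraint row (illustrative)
x = [0.5, 1.0, 2.0]            # an interior point, x > 0
b = [sum(aij * xj for aij, xj in zip(A[0], x))]   # b = A x, so x is feasible
c = [3.0, -1.0, 2.0]
z = -5.0                        # a lower bound, with z <= z_max
z_max = 100.0

n = len(x)
lam1 = n / sum(x)
lam2 = lam1 * (sum(ci * xi for ci, xi in zip(c, x)) - z)
xi = [lam1 * xj for xj in x]

# A xi - lam1 * b = 0
residual = sum(aij * xij for aij, xij in zip(A[0], xi)) - lam1 * b[0]
assert abs(residual) < 1e-12
# e^T xi = n
assert abs(sum(xi) - n) < 1e-12
# c^T xi - lam1 * z_max - lam2 <= 0, since this equals lam1 * (z - z_max)
assert sum(ci * xij for ci, xij in zip(c, xi)) - lam1 * z_max - lam2 <= 1e-12
```

The third constraint reduces algebraically to λ_1(z − z_max) ≤ 0, which is why any z ≤ z_max works here.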
To provide a complete complexity result for LP we still need to deal with the issue of satisfying the assumptions of Theorem 4.2.1. This is quite simple, at least from a theoretical standpoint. For an arbitrary problem LP, with no assumptions whatsoever, consider the augmented problem:

    MLP:  min  c̄^T x̄
          s.t. Ā x̄ = b
               e^T x̄ ≤ M
               x̄ ≥ 0,

where x̄ ∈ R^{n+1}, and

    Ā = (A, b − Ae),    c̄ = (c; M).

It is then very well known (see for example [65]) that MLP is equivalent to LP for M = 2^{O(L)}, in that x* with e^T x* < M is an optimal solution for LP if and only if x̄ given by x̄_i = x*_i, i = 1, …, n, x̄_{n+1} = 0 is an optimal solution for MLP. (If the optimal solution to MLP has x̄_{n+1} > 0 then LP is infeasible. If the optimal solution to MLP has e^T x̄ = M then either LP is unbounded, or LP has an unbounded set of optimal solutions, and these cases can be distinguished by doubling M and solving MLP again.) The primal potential function can then be defined for MLP instead of LP, and it is easy to verify that for z^0 = −2^{O(L)}, x^0 = e, the assumptions of Theorem 4.2.1 are satisfied.
The primal-dual potential function for LP is

    F(x, s) = q ln(x^T s) − Σ_{i=1}^n ln(x_i s_i),

where q > n, x > 0 is feasible for LP, and s > 0 is feasible for LD. (By the latter we mean that there is a y ∈ R^m so that A^T y + s = c.) Note that for any such x and s, the arithmetic-geometric mean inequality implies

    F(x, s) ≥ (q − n) ln(x^T s) + n ln(n).

Theorem 4.2.2 Suppose that F(x^0, s^0) ≤ O((q − n)L), and F(·,·) is reduced by δ on each iteration. Then after k = O((q − n)L/δ) iterations, (x^k)^T s^k ≤ 2^{−2L}.
Remarks. The primal potential function was introduced by Karmarkar [44]. The exponentiated, or "multiplicative," form of the potential function was used by Iri and Imai [41], and was further studied by Imai [40]. The use of general values for q was suggested by Gonzaga [33]. The primal-dual potential function was introduced by Todd and Ye [80], and (in multiplicative form) by Tanabe [70]. See Ye, Todd, and Mizuno [91] and Jansen, Roos and Terlaky [42] for alternative "homogeneous self-dual" approaches to the initialization problem.
    HLP:  min  c̄^T x̄
          s.t. Ā x̄ = 0
               d^T x̄ = 1
               x̄ ≥ 0,

where

    Ā = (A X^k, −b),    c̄ = (X^k c; 0),    d = (0; 1) ∈ R^{n+1}.

One can think of obtaining HLP from LP by applying a transformation of variables x̄ = ((X^k)^{−1} x; 1), and then using the additional variable x̄_{n+1} to "homogenize" the original equality constraints of LP. Clearly HLP is equivalent to LP, and x̄ = e is feasible in HLP. The derivation of a step for LP is based on the transformed problem HLP. First we consider the issue of updating the lower bound. For any matrix B, let P_B denote the
orthogonal projection onto the nullspace of B. In the case that B has independent rows, we then have P_B = I − B^T (B B^T)^{−1} B.
Lemma 4.3.1 (Todd and Burrell [79]) Suppose that z ∈ R satisfies P_Ā(c̄ − z d) ≥ 0. Then z ≤ z*.

Proof Consider the dual of HLP:

    HLD:  max  z
          s.t. Ā^T y + d z ≤ c̄.

But P_Ā(c̄ − z d) = (c̄ − z d) − Ā^T y(z) for some y(z) ∈ R^m, so P_Ā(c̄ − z d) ≥ 0 implies that (y(z), z) is feasible in HLD. Then z ≤ z*, since LP and HLP have the same optimal objective value. ∎
Using Lemma 4.3.1 the lower bound z^k can be updated as follows. Let Z^k = {z ≥ z^k | P_Ā(c̄ − z d) ≥ 0}, and define z^{k+1} to be:

    z^{k+1} = max{z | z ∈ Z^k}.

Then z̄ = z^{k+1} ≤ z*, by Lemma 4.3.1, and moreover by construction P_Ā(c̄ − z̄ d) ≯ 0; that is, at least one component of P_Ā(c̄ − z̄ d) is nonpositive. Now let

    Δx = P_{(Ā; e^T)}(c̄ − z̄ d) = P_{e^T} P_Ā(c̄ − z̄ d) = P_Ā(c̄ − z̄ d) − ((c̄ − z̄ d)^T e/(n + 1)) e,

where we are using the fact that Ā e = 0. Since P_Ā(c̄ − z̄ d) ≯ 0, we then immediately have

    ||Δx|| ≥ ||Δx||_∞ ≥ (c̄ − z̄ d)^T e/(n + 1).    (4.4)

The next iterate in the transformed space is

    x̄' = e − α Δx/||Δx||,    (4.5)
where α > 0 is a steplength yet to be decided. Note that the resulting x̄' will satisfy the equality constraints Ā x̄ = 0 of HLP, but in general will fail to satisfy d^T x̄' = 1. In order to obtain a new point x^{k+1} which is feasible for LP, we employ a projective transformation

    x^{k+1} = X^k x̄'_{1:n} / x̄'_{n+1},    (4.6)

where x̄'_{1:n} denotes the first n components of x̄'.
Substituting (4.6) into the definition of f(·,·), with q = n + 1, for sufficiently small α > 0 we obtain

    f(x^{k+1}, z̄) − f(x^k, z̄) ≤ −α − Σ_{i=1}^{n+1} ln(1 − α Δx_i/||Δx||),    (4.7)

where the inequality uses (4.4), and the fact that ln(1 − t) ≤ −t for any 0 ≤ t < 1.
To obtain a bound on the potential decrease for Karmarkar's algorithm we need to bound the right-hand side of (4.7). One approach is to use the following well-known inequality.

Lemma 4.3.2 Suppose that u ∈ R^n with ||u||_∞ < 1. Then

    Σ_{i=1}^n ln(1 + u_i) ≥ e^T u − ||u||²/(2(1 − ||u||_∞)).

Proof For each i = 1, …, n the Taylor series expansion for ln(1 + u_i) results in

    ln(1 + u_i) = Σ_{j=1}^∞ (−1)^{j+1} u_i^j / j ≥ u_i − (1/2) Σ_{j=2}^∞ |u_i|^j = u_i − u_i²/(2(1 − |u_i|)).    (4.8)

The proof is completed by summing (4.8), and using |u_i| ≤ ||u||_∞ for each i. ∎
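The inequality of Lemma 4.3.2 is easy to confirm numerically; a quick randomized check in pure Python, with components drawn from (−0.9, 0.9):

```python
import math
import random

def lhs(u):
    """Left side of Lemma 4.3.2: sum of ln(1 + u_i)."""
    return sum(math.log(1.0 + ui) for ui in u)

def rhs(u):
    """Right side: e^T u - ||u||^2 / (2 (1 - ||u||_inf))."""
    norm2 = sum(ui * ui for ui in u)
    norm_inf = max(abs(ui) for ui in u)
    return sum(u) - norm2 / (2.0 * (1.0 - norm_inf))

random.seed(2)
for _ in range(1000):
    u = [random.uniform(-0.9, 0.9) for _ in range(6)]
    assert lhs(u) >= rhs(u) - 1e-12
```

The bound degrades as ||u||_∞ approaches 1, which is why the steplength restrictions α < 1 appear throughout the analysis.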
Theorem 4.3.3 Let x^{k+1} be given by (4.5)-(4.6), with α = .5, and let z^{k+1} be as above. Then f(x^k, z^k) − f(x^{k+1}, z^{k+1}) ≥ .25.

Proof We have

    f(x^k, z^k) − f(x^{k+1}, z^{k+1}) ≥ f(x^k, z^{k+1}) − f(x^{k+1}, z^{k+1})
                                      ≥ α + Σ_{i=1}^{n+1} ln(1 − α Δx_i/||Δx||)
                                      ≥ α − α²/(2(1 − α)),    (4.9)

where the first inequality uses z^{k+1} ≥ z^k, the second uses (4.7), and the third uses Lemma 4.3.2 and the fact that e^T Δx = 0. The proof is completed by substituting α = .5 into (4.9). ∎
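The right-hand side of (4.9) can be checked directly; a small sketch:

```python
import math

def potential_decrease_bound(alpha):
    """Right-hand side of (4.9): alpha - alpha^2 / (2 (1 - alpha)), 0 <= alpha < 1."""
    return alpha - alpha ** 2 / (2.0 * (1.0 - alpha))

# alpha = .5 gives the guaranteed decrease of .25 of Theorem 4.3.3
assert abs(potential_decrease_bound(0.5) - 0.25) < 1e-12

# the bound itself is maximized at alpha = 1 - 1/sqrt(3), with value 2 - sqrt(3)
best = potential_decrease_bound(1.0 - 1.0 / math.sqrt(3.0))
assert abs(best - (2.0 - math.sqrt(3.0))) < 1e-12
```

(The larger guaranteed decreases quoted in the remarks come from sharper estimates of the barrier terms, not from optimizing (4.9) alone.)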
Remarks. There are many papers that consider different aspects of Karmarkar's algorithm. One line of investigation concerns the potential decrease assured in Theorem 4.3.3. The decrease of .25 proved here can easily be improved to 1 − ln(2) ≈ .31 by sharper approximation of the logarithmic barrier terms. Muramatsu and Tsuchiya [59] show that using a "fixed fraction to the boundary" step, based on the "affine" direction P_Ā(c̄ − z̄ d), a decrease of about .41 is always possible. Anstreicher [3] and McDiarmid [48] independently proved that with exact linesearch of the potential function a decrease of approximately .7215 is always possible, and this bound is tight. Another interesting topic is the derivation of a lower bound for the worst-case complexity of the algorithm. Anstreicher [7] shows that using exact linesearch of the potential function, the algorithm may produce only an O(1) reduction in f(·,·) on every iteration, and may require Ω(n ln(1/ε)) iterations to reduce the gap c^T x^k − z^k to a factor ε < 1 of its initial value. Ji and Ye [43] elaborate further the analysis of [7]. Powell [66] shows that the iterates of Karmarkar's algorithm, with exact linesearch, may visit the neighborhoods of O(n) extreme points of the feasible region.
Anstreicher [2] and Steger [67] describe a "ball update" alternative to Todd and Burrell's [79] lower bound methodology. Shaw and Goldfarb [69] show that with a weakened version of the ball update, and short steps (α < 1), the projective algorithm can be viewed as a path following method and has a complexity of O(√n L) iterations. Anstreicher [2] describes a modification of the algorithm that assures monotonicity of the objective values {c^T x^k}. Anstreicher [10] describes a stronger monotonicity modification, and obtains a complexity of O(√n L) iterations using the
weakened ball updates, and step lengths based on the primal-dual potential function F(·,·). Goldfarb and Mehrotra [30], [31] modify the projective algorithm to allow for the use of inexact computation of the search direction Δx. Todd [71] considers the computation of lower bounds, and the search direction, for problems with special structure. Todd [72] and Ye [84] describe the construction of "dual ellipsoids" that contain all dual optimal solutions. In principle this procedure could be used to eliminate variables as the algorithm iterates, but Anstreicher [6] describes why the process fails in the presence of degeneracy. Todd [74] and Anstreicher and Watteyne [13] describe alternatives to the usual search direction, obtained via decomposition and projection onto a simplex, respectively. Computational results for Karmarkar's algorithm are reported in [13], and by Todd [73].
Asic et al. [14] consider the asymptotic behavior of the iterates in Karmarkar's algorithm using short steps (α < 1), while Megiddo and Shub [49] and Monteiro [53] examine properties of the continuous trajectories associated with the algorithm.
Bayer and Lagarias [15] explore connections between Karmarkar's algorithm and
Newton's method, Gill et al. [29] describe relationships between Karmarkar's al-
gorithm and logarithmic barrier methods, and Mitchell and Todd [51] relate Kar-
markar's method to the primal affine scaling algorithm. Freund [23], Gonzaga [35],
and Mitchell and Todd [52] consider the projective algorithm for more general prob-
lem formulations than that of LP. See also Freund [26] for a very general discussion
of the use of projective transformations.
It turns out that both of the above issues can be addressed by a method that is quite similar to Karmarkar's algorithm, but which avoids the use of a projective transformation on each step. Given a feasible interior point x^k, k ≥ 0, consider a transformed problem:

    L̄P:  min  c̄^T x̄
         s.t. Ā x̄ = b
              x̄ ≥ 0,

where now Ā = A X^k and c̄ = X^k c. Let L̄D denote the dual of L̄P. One can think of obtaining L̄P from LP by applying a simple re-scaling of the variables of the form

    x̄ = (X^k)^{−1} x.    (4.10)

Note that if x and x̄ are related by (4.10), then f(x, z) and f̄(x̄, z) differ by a constant which depends only on x^k. As a result, it suffices to analyze the decrease in f̄(·,·) starting at x̄ = e, z = z^k. To this end, let Δx be the projection of the gradient of f̄(·, z^k) at e onto the nullspace of Ā:

    Δx = P_Ā ∇f̄(e, z^k) = P_Ā ((q/(c̄^T e − z^k)) c̄ − e),    (4.11)

and define y' ∈ R^m by

    Ā^T y' = c̄ − ((c̄^T e − z^k)/q)(e + Δx).    (4.12)
Lemma 4.4.1 Let q = n + √n, and suppose that ||Δx|| ≤ η < 1. Then z^{k+1} = b^T y' satisfies z^k < z^{k+1} ≤ z*, and f(x^k, z^k) − f(x^k, z^{k+1}) ≥ (1 − η)√n.

Proof Clearly e + Δx > 0, so (4.12) implies that y' is feasible for the dual of L̄P, and therefore b^T y' ≤ z*. In addition,

    c̄^T e − b^T y' = ((c̄^T e − z^k)/q) e^T(e + Δx) ≤ ((n + η√n)/q)(c̄^T e − z^k),    (4.13)

and therefore

    q ln((c̄^T e − z^{k+1})/(c̄^T e − z^k)) ≤ q ln((n + η√n)/q).

Since q = n + √n, the claimed decrease follows using ln(1 + t) ≥ t/(1 + t) for t > −1. ∎
If instead ||Δx|| > η, the algorithm takes a primal step

    x̄' = e − α Δx/||Δx||,    (4.14)

where α > 0 is a step length yet to be decided. Following such a step, a new point x^{k+1} is defined by x^{k+1} = X^k x̄'.

Lemma 4.4.2 Let q = n + √n, and suppose that ||Δx|| ≥ η > 0. Then there is a steplength α so that f(x^k, z^k) − f(x^{k+1}, z^k) ≥ (1 + η) − √(1 + 2η) > 0.
Proof We have

    f(x^{k+1}, z^k) − f(x^k, z^k) = f̄(x̄', z^k) − f̄(e, z^k)
        ≤ q ln(1 − α c̄^T Δx/((c̄^T e − z^k)||Δx||)) + α e^T Δx/||Δx|| + α²/(2(1 − α))
        ≤ −α ||Δx|| + α²/(2(1 − α)) ≤ −α η + α²/(2(1 − α)),    (4.15)

where the first inequality uses Lemma 4.3.2, and the second uses ln(1 − t) ≤ −t for t < 1, together with (4.11). A straightforward calculus exercise shows that (4.15) is minimized at α = 1 − 1/√(1 + 2η), and substitution of this value into (4.15) completes the proof. ∎
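The calculus exercise at the end of the proof is easy to verify numerically; a small sketch:

```python
import math

def decrease_bound(alpha, eta):
    """Guaranteed decrease from (4.15): alpha*eta - alpha^2 / (2 (1 - alpha))."""
    return alpha * eta - alpha ** 2 / (2.0 * (1.0 - alpha))

def best_alpha(eta):
    """Optimal steplength from Lemma 4.4.2: 1 - 1/sqrt(1 + 2*eta)."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * eta)

for eta in [0.25, 0.5, 1.0, 2.0]:
    a = best_alpha(eta)
    guaranteed = (1.0 + eta) - math.sqrt(1.0 + 2.0 * eta)
    # the closed form matches the bound evaluated at the optimal steplength ...
    assert abs(decrease_bound(a, eta) - guaranteed) < 1e-12
    # ... and nearby steplengths do no better
    for da in (-0.01, 0.01):
        assert decrease_bound(a + da, eta) <= guaranteed + 1e-12
```

Note that the guaranteed decrease grows roughly linearly in η, which is why large values of ||Δx|| are desirable on primal steps.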
Taken together, Lemmas 4.4.1 and 4.4.2 immediately imply that for q = n + √n, an Ω(1) decrease in f(·,·) is always possible. As a result, the affine potential reduction algorithm is an O(nL) iteration method for LP. However, there is a striking asymmetry between Lemmas 4.4.1 and 4.4.2, since the former shows that in fact an Ω(√n) decrease occurs on steps where the lower bound is updated. In fact the affine potential reduction method, exactly as described above, can be shown to be an O(√n L) iteration algorithm by analyzing the algorithm using the symmetric primal-dual potential function F(·,·), instead of the primal potential function f(·,·).
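A single primal step of the method is easy to simulate in the scaled space; in this sketch the constraint row, cost vector, and lower bound are arbitrary illustrative data, and the one-row constraint matrix keeps the nullspace projection explicit:

```python
import math

# One primal step of the affine potential reduction method in the scaled
# space, where the current iterate is e (illustrative data, not from the text).
a = [1.0, 1.0, 1.0]        # scaled constraint row; steps must keep a^T x fixed
c = [2.0, 1.0, 0.0]        # scaled cost vector
z = 0.0                    # current lower bound, z < c^T e
n = len(c)
q = n + math.sqrt(n)

def f(x, z):
    """Primal potential: q ln(c^T x - z) - sum ln(x_i)."""
    gap = sum(ci * xi for ci, xi in zip(c, x)) - z
    return q * math.log(gap) - sum(math.log(xi) for xi in x)

e = [1.0] * n
gap0 = sum(c) - z
g = [q * ci / gap0 - 1.0 for ci in c]                 # gradient of f(., z) at e
proj = sum(ai * gi for ai, gi in zip(a, g)) / sum(ai * ai for ai in a)
dx = [gi - proj * ai for gi, ai in zip(g, a)]         # projection onto null(a^T)
norm = math.sqrt(sum(d * d for d in dx))

alpha = 1.0 - 1.0 / math.sqrt(1.0 + 2.0 * norm)       # steplength of Lemma 4.4.2
x_new = [1.0 - alpha * d / norm for d in dx]

decrease = f(e, z) - f(x_new, z)
guaranteed = (1.0 + norm) - math.sqrt(1.0 + 2.0 * norm)
assert decrease >= guaranteed - 1e-9                  # Lemma 4.4.2 in action
assert abs(sum(x_new) - n) < 1e-12                    # a^T x is preserved
```

The observed decrease is typically much larger than the guaranteed one, which matches the remarks on linesearch in practice.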
Suppose that x^k > 0 and s^k > 0 are feasible for LP and LD, respectively. Consider a linear transformation of the dual variables

    s̄ = X^k s.    (4.16)

Then for any x > 0 and s > 0, feasible in LP and LD, respectively, x̄ and s̄ from (4.10) and (4.16) are feasible in L̄P and L̄D, respectively, and moreover F(x, s) = F̄(x̄, s̄). As a result, it suffices to analyze the descent in F̄(·,·) starting at x̄ = e, s̄ = X^k s^k. Let Δx be as in (4.11), for z^k = b^T y^k, where A^T y^k + s^k = c. If ||Δx|| ≥ η, we continue to take a step as in (4.14), and let x^{k+1} = X^k x̄'.
Lemma 4.4.3 Let q = n + √n, and let Δx be as in (4.11), with z^k = b^T y^k. Suppose that ||Δx|| ≥ η > 0. Then there is a steplength α so that F(x^k, s^k) − F(x^{k+1}, s^k) ≥ (1 + η) − √(1 + 2η) > 0.

Proof The proof is identical to that of Lemma 4.4.2, using the fact that for any x, x^T s^k = c^T x − z^k. ∎
Next we turn to the case of ||Δx|| ≤ η. As before, we will use the fact that (4.12) provides a feasible solution for L̄D. Define

    s' = c̄ − Ā^T y' = ((c̄^T e − z^k)/q)(e + Δx).    (4.17)

We now require an analysis of the step from s̄^k to s' that includes the effect of the dual barrier terms in F̄(·,·).

Theorem 4.4.4 Suppose that ||Δx|| ≤ η. Let s' be as in (4.17), and let s^{k+1} = (X^k)^{−1} s'. Then F(x^k, s^k) − F(x^k, s^{k+1}) ≥ (1 − 2η)/(2 − 2η).
Proof A direct computation, expanding F̄(e, s̄^k) − F̄(e, s') and applying ln(1 + t) ≤ t for t > −1 (twice), together with the fact that ||Δx|| ≤ η, bounds the decrease from below. The proof is completed by noting that q = n + √n ≤ 2n for n ≥ 1. ∎
Lemma 4.4.3, and Theorems 4.4.4 and 4.2.2, imply that the affine potential reduction algorithm, using q = n + √n and η < .5, is an O(√n L) algorithm for LP. As with Karmarkar's algorithm, in practice a linesearch in α can also be used to improve the decrease in F(·,·) on primal steps.
Remarks. The affine potential reduction method based on f(·,·) was proposed by Gonzaga [33], who assumed that z^0 = z*. The lower bound logic based on (4.12) was suggested in [33], and fully developed by Freund [24]. Independently, Ye [85] devised the analysis based on F(·,·), which reduces the complexity of the algorithm to O(√n L) iterations. Ye [83] also describes an alternative O(√n L) iteration algorithm that uses F(·,·), but employs projective transformations as in Karmarkar's algorithm.

The lower bound, or dual variable, update based on (4.12) can be modified in several different ways. For example, in [24] the lower bound is increased to a value z^{k+1} so that following the bound update it is always the case that ||Δx|| ≥ η. As a result, updates of the lower bound (or dual solution) are immediately followed by primal steps. Gonzaga [36] considers a general procedure for the construction of lower bounds, and Mitchell [50] relates the construction in [36] to earlier results of Todd [72].
Anstreicher [9] describes a monotonicity modification for the affine potential reduction algorithm, and Ye [86] analyzes a variant that allows for column generation. Monteiro [58] considers the behavior of the continuous trajectories associated with the algorithm. Todd [77] describes analogs of potential reduction methods for semi-infinite linear programming. Anstreicher [11] devises an algorithm which is similar to the affine potential reduction for LD, but which employs a volumetric potential function

    q ln(z − b^T y) − (1/2) ln(det(A S^{−2} A^T)),

where s = c − A^T y > 0, q = O(m), and z > z*. The resulting algorithm has a complexity of O(m√n L) iterations. Using a potential function that combines the volumetric barrier with the usual logarithmic barrier, the algorithm's complexity is reduced to O(√(mn) L) iterations.
Suppose that x^k > 0 and s^k > 0 are feasible for LP and LD, respectively, and consider the primal-dual scaling

    x̄ = (X^k)^{−1/2}(S^k)^{1/2} x,
    s̄ = (X^k)^{1/2}(S^k)^{−1/2} s.    (4.20)

Then for any x feasible in LP, x̄ from (4.20) is feasible for a rescaled problem L̄P defined as in the previous section, but using the primal-dual scaling matrix (X^k)^{1/2}(S^k)^{−1/2} in place of X^k. Similarly if s is feasible for LD, then s̄ is feasible in L̄D, the dual of L̄P. Moreover, F(x, s) = F̄(x̄, s̄). Note that the transformation (4.20) maps both x^k and s^k to the vector v = (X^k)^{1/2}(S^k)^{1/2} e. As a result, it suffices to consider the reduction in F̄(·,·) starting at x̄ = s̄ = v. Note that ||v||² = (x^k)^T s^k.
We define directions

    Δx = P_Ā ((q/||v||²) v − V^{−1} e),
    Δs = (I − P_Ā)((q/||v||²) v − V^{−1} e),    (4.21)

where Ā = A(X^k)^{1/2}(S^k)^{−1/2} and V = Diag(v). Consider simultaneous primal and dual steps of the form:

    x̄' = v − (α/γ) Δx = V(e − (α/γ) V^{−1} Δx),
    s̄' = v − (α/γ) Δs = V(e − (α/γ) V^{−1} Δs),    (4.22)

where the scaling factor γ > 0 satisfies γ² = ||V^{−1} Δx||² + ||V^{−1} Δs||², and we are using the fact that Δx^T Δs = 0. Applying Lemma 4.3.2, and the fact that ln(1 − t) ≤ −t for t < 1, for sufficiently small α we obtain

    F̄(x̄', s̄') − F̄(v, v) ≤ −(α/γ)(Δx + Δs)^T ((q/||v||²) v − V^{−1} e) + α²/(2(1 − α))
                        = −(α/γ) ||(q/||v||²) v − V^{−1} e||² + α²/(2(1 − α)),    (4.23)

where the equality uses the fact that Δx + Δs = (q/||v||²) v − V^{−1} e. Now let v_min = min_i {v_i}. Then

    γ² ≤ (1/v_min²)(||Δx||² + ||Δs||²)
       = (1/v_min²) ||Δx + Δs||²
       = (1/v_min²) ||(q/||v||²) v − V^{−1} e||².    (4.24)

Combining (4.23) and (4.24) results in

    F̄(x̄', s̄') − F̄(v, v) ≤ −α v_min ||(q/||v||²) v − V^{−1} e|| + α²/(2(1 − α)).    (4.25)
To obtain an estimate for the decrease in F̄(·,·) for the primal-dual algorithm we require a bound for the linear term in (4.25). Such a bound is provided by the following lemma.

Lemma 4.5.1 [47, Lemma 2.5] Let v ∈ R^n, v > 0, and q = n + √n. Then

    v_min ||(q/||v||²) v − V^{−1} e|| ≥ √3/2.

Proof We have

    v_min² ||(q/||v||²) v − V^{−1} e||² = v_min² (||(n/||v||²) v − V^{−1} e||² + n/||v||²)
                                       ≥ ((n/||v||²) v_min² − 1)² + n v_min²/||v||²
                                       ≥ 3/4,

where the first equality uses the fact that v^T [V^{−1} e − (n/||v||²) v] = 0, and the final inequality follows since (1 − t)² + t ≥ 3/4 for every t. ∎
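Lemma 4.5.1 holds for every positive v; a quick randomized check:

```python
import math
import random

def lemma_451_lhs(v):
    """v_min * || (q/||v||^2) v - V^{-1} e || with q = n + sqrt(n)."""
    n = len(v)
    q = n + math.sqrt(n)
    nrm2 = sum(vi * vi for vi in v)
    w = [q * vi / nrm2 - 1.0 / vi for vi in v]
    return min(v) * math.sqrt(sum(wi * wi for wi in w))

random.seed(1)
for _ in range(1000):
    v = [random.uniform(0.05, 5.0) for _ in range(4)]
    assert lemma_451_lhs(v) >= math.sqrt(3.0) / 2.0 - 1e-12
```

The bound is dimension-independent, which is exactly what makes the Ω(1) per-iteration decrease of the primal-dual method possible.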
Theorem 4.5.2 Let q = n + √n, and consider the primal-dual steps defined as in (4.22). Let x^{k+1} = (X^k)^{1/2}(S^k)^{−1/2} x̄', s^{k+1} = (X^k)^{−1/2}(S^k)^{1/2} s̄'. Then there is a steplength α so that F(x^k, s^k) − F(x^{k+1}, s^{k+1}) ≥ .16.
The above total complexity bounds can be improved using a technique known as partial updating. Consider Karmarkar's algorithm, or the affine potential reduction method. Then the matrix to be formed and factorized on each iteration is of the form A(X^k)² A^T. The idea of partial updating is to instead maintain a factorization of a matrix A(X̄^k)² A^T, where x̄^k > 0 satisfies

    1/ρ ≤ x̄_i^k / x_i^k ≤ ρ,  i = 1, …, n,    (4.27)

and ρ > 1 is an O(1) constant. The computations required on each step are then modified to use the factorization of A(X̄^k)² A^T, instead of a factorization of A(X^k)² A^T. Following a step from x^k to x^{k+1}, the algorithm first sets x̄^{k+1} = x̄^k, and then "updates" any indices i which fail to satisfy (4.27) for k + 1, by setting x̄_i^{k+1} = x_i^{k+1}. Each such update produces a rank-one change in A(X̄^{k+1})² A^T, requiring an update of the factorization of A(X̄^{k+1})² A^T that can be performed in O(m²) operations. See for example Shanno [68] for details of updating a Cholesky factorization. Karmarkar [44], who introduced the technique, shows that when his algorithm uses partial updating the number of iterations is still O(nL), but the total number of updates required on all iterations is only O(n^{1.5} L). As a result, the complexity of Karmarkar's algorithm using partial updating is reduced to O(n^{1.5}(m²)L + n(mn)L) = O(m^{1.5} n² L). In the interior point literature the distinction between m and n is often ignored, in which case partial updating provides a factor-of-√n complexity improvement.
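The bookkeeping behind (4.27) is simple; a sketch in which ρ and the iterates are illustrative and `refresh` is a hypothetical helper (a real implementation would pair each refreshed index with a rank-one update of the Cholesky factor of A X̄² A^T):

```python
# Partial updating bookkeeping: keep an approximate scaling point xb
# satisfying (4.27), refreshing only the indices that drift out of range.
RHO = 1.5  # an illustrative O(1) constant rho > 1

def refresh(xb, x, rho=RHO):
    """Return the updated approximate point and the indices refreshed."""
    updated = []
    out = list(xb)
    for i, (xbi, xi) in enumerate(zip(xb, x)):
        ratio = xbi / xi
        if ratio > rho or ratio < 1.0 / rho:
            out[i] = xi          # this index would trigger a rank-one update
            updated.append(i)
    return out, updated

xb = [1.0, 1.0, 1.0, 1.0]
x_next = [1.1, 0.5, 2.0, 0.9]    # a hypothetical next iterate
xb_next, touched = refresh(xb, x_next)
assert touched == [1, 2]          # only the ratios 2.0 and 0.5 violate (4.27)
assert xb_next == [1.0, 0.5, 2.0, 1.0]
```

The point of the analysis is that, amortized over all iterations, few indices drift out of the ρ-band per step, so the O(m²) rank-one updates replace full O(m²n) refactorizations.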
We will not present the details of potential reduction algorithms that incorporate partial updating, but we will describe some results on the topic. A serious shortcoming of Karmarkar's original analysis of partial updating is that the complexity improvement requires that the algorithm take short steps (α < 1), instead of performing a linesearch of the potential function. This restriction makes the technique hopelessly impractical. Anstreicher [5] shows that with a simple safeguard, a linesearch can be performed when using partial updating, while still retaining the complexity improvement. Ye [85] describes a partial updating version of the affine potential reduction algorithm that reduces the total complexity to O(m^{1.5} n^{1.5} L) operations. However, the analysis of [85], like that in [44], requires that the algorithm take short steps. Anstreicher and Bosch [12] adapt the safeguarded linesearch of [5] to the affine potential reduction algorithm, resulting in an O(m^{1.5} n^{1.5} L) algorithm that can use linesearch to improve the reduction in F(·,·) on each iteration. Other partial updating variants of the affine potential reduction method are devised by Bosch [16], and Mizuno [53], [54].
Partial updating can also be applied to primal-dual algorithms, which are based on a primal-dual scaling matrix of the form (X^k)^{1/2}(S^k)^{−1/2}. Bosch and Anstreicher [17] devise an O(m^{1.5} n^{1.5} L) partial updating variant of the primal-dual potential reduction algorithm of [47], that allows for safeguarded linesearch of F(·,·) using unequal primal and dual step lengths.
Consider for example Karmarkar's algorithm. Instead of the step as in (4.5), define a step of the form

    x̄' = e − α Δx/||Δx||_∞.    (4.28)
Nesterov and Todd [62] suggest a similar "long step" analysis for the affine potential reduction algorithm based on f(·,·), with q = 2n. Let Δx be as in (4.11), and suppose that ||Δx||_∞ ≤ η < 1. Let z^{k+1} = b^T y', where y' is as in (4.12). It then follows easily that

    c̄^T e − z^{k+1} ≤ ((n + η n)/q)(c̄^T e − z^k) = ((1 + η)/2)(c̄^T e − z^k),    (4.30)

and also that

    f(x^k, z^k) − f(x^k, z^{k+1}) ≥ q ln(q/(n + η n)) = 2n ln(2/(1 + η)).

Thus updates of the lower bound now produce an Ω(n) decrease in f(·,·). Next consider the situation where ||Δx||_∞ > η. Instead of using the step as in (4.14), define

    x̄' = e − α Δx/||Δx||_∞.

Proceeding as in the proof of Lemma 4.4.2, we obtain

    f(x^{k+1}, z^k) − f(x^k, z^k) ≤ −α ((q/(c̄^T e − z^k)) c̄ − e)^T (Δx/||Δx||_∞) + α²(||Δx||/||Δx||_∞)²/(2(1 − α))
                                 = (||Δx||/||Δx||_∞)² (−α ||Δx||_∞ + α²/(2(1 − α)))
                                 ≤ (||Δx||/||Δx||_∞)² (−α η + α²/(2(1 − α))).    (4.31)

As in the case of Karmarkar's algorithm, (4.31) assures an Ω(1) decrease in f(·, z^k), but indicates that a much larger decrease will typically occur.
If one considers the affine potential reduction algorithm using F(·,·), with q = 2n, then the situation on primal steps, with ||Δx||_∞ ≥ η, is exactly as above. For dual steps, the effect on F(·,·) can easily be analyzed as in the proof of Theorem 4.4.4. The final result is that on a dual step, where ||Δx||_∞ ≤ η, F(x^k, s^k) − F(x^k, s^{k+1}) ≥ n(1 − 2η)/(2 − 2η), a decrease of exactly n times the bound of Theorem 4.4.4. However, with q = 2n there is essentially no reason to measure progress of the algorithm using F(·,·).
For a more extensive discussion of the use of "long steps" in potential reduction
methods see Nesterov [61], Nesterov and Todd [62], and Todd [78]. The latter also
describes a "long step" analysis for the primal-dual potential reduction algorithm.
It turns out that it is possible to retain the O(√n L) iteration complexity of the affine potential reduction algorithm while using "larger-step" dual updates. Consider q = n + ν√n, where ν = O(1). The analysis of descent in F(·,·) for primal and dual steps is then almost identical to the analysis with ν = 1, and the bounds provided by Lemma 4.4.3 and Theorem 4.4.4 continue to hold. By Theorem 4.2.2, the algorithm remains an O(√n L) iteration algorithm. However, the dual update will now result in

    (c^T x^k − z^{k+1})/(c^T x^k − z^k) ≤ (n + η√n)/(n + ν√n),

so large values of ν produce a better gap reduction on dual steps. In addition, following such a step one will tend to have a larger value for ||Δx||, resulting in a primal step with better potential decrease. This is the rationale behind the "large step dual update" of [34], although Gonzaga describes the dual update somewhat differently from the way we describe it here, and bases his complexity analysis on f(·,·) rather than F(·,·).
A "truly large" dual step update, with an Ω(1) reduction in the gap, is provided by using q = 2n. In this case the algorithm can also be analyzed using an infinity-norm parameterization of the primal step, as described above. Thus q = 2n produces truly-large-step dual updates, and allows for long primal steps, leading to a very substantial improvement in the practical performance of the algorithm.
Consider the problem

    min  c̄^T x̄
    s.t. Ā x̄ = b
         d̄^T x̄ = 0    (4.32)
         x̄ ≥ 0,

where x̄ ∈ R^{n+1},

    Ā = (A, b − A x^0),    c̄ = (c; 0),    d̄ = (0; 1),

and x^0 > 0. It is not assumed that A x^0 = b. Clearly (4.32) is equivalent to LP, and x̄^0 given by x̄_i^0 = x_i^0, i = 1, …, n, x̄^0_{n+1} = 1 is feasible for all of the constraints of (4.32) except the constraint d̄^T x̄ = 0. The approach of a Phase I - Phase II potential reduction algorithm is to simultaneously decrease the usual primal potential function f(·,·) based on (4.32), and also decrease a "Phase I" potential function:

    f̂(x̄) = q ln(d̄^T x̄) − Σ_{i=1}^{n+1} ln(x̄_i).
Freund [25] uses a "shifted barrier" approach to allow for the initialization of a potential reduction algorithm with an infeasible point. In [25] it is assumed that A x^0 = b, but that x^0 may have negative components. The usual potential function f(·,·) is replaced by a function of the form

    q ln(c^T x − z) − Σ_{i=1}^n ln(x_i + h_i(c^T x − z)),

where q = n + √n, and h > 0 is a "shift" vector such that x^0 + (c^T x^0 − z^0) h > 0. Similarly F(·,·) is replaced with a potential function that includes the shifted primal barrier terms. Algorithms based on these perturbed potential functions have complexities of O(nL) or O(√n L) iterations, under various assumptions regarding the dual feasible region.
Consider the Newton equations

    A Δx = b − A x^k
    A^T Δy + Δs = c − A^T y^k − s^k    (4.33)
    S^k Δx + X^k Δs = γ μ^k e − X^k S^k e,

where 0 ≤ γ ≤ 1, and μ^k = (x^k)^T s^k / n. (The use of γ = 0 results in the "primal-dual affine scaling," or "predictor," step, while γ = 1 gives a "centering," or "corrector," step.) The next point is of the form

    (x^{k+1}, y^{k+1}, s^{k+1}) = (x^k, y^k, s^k) + α(Δx, Δy, Δs),

for a step parameter α ≤ 1. Most algorithms based on (4.33) are of the path-following, or predictor-corrector, variety. However, Mizuno, Kojima and Todd [55] devise a potential reduction algorithm that uses directions from (4.33).
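The system (4.33) is just a square linear system in (Δx, Δy, Δs); the following sketch assembles and solves it for a tiny illustrative problem (m = 1, n = 2), using a hand-rolled elimination routine:

```python
# Assembling and solving the Newton system (4.33) for a tiny problem
# (m = 1, n = 2).  The data and the centering choice gamma are illustrative.
def solve(M, rhs):
    """Gaussian elimination with partial pivoting on a small dense system."""
    n = len(M)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

A_mat = [[1.0, 1.0]]                   # m = 1 constraint row
b, c = [2.0], [1.0, 2.0]
x, y, s = [1.5, 0.5], [0.5], [0.4, 1.6]   # strictly positive, possibly infeasible
gamma = 1.0                            # pure centering ("corrector") step
mu = (x[0] * s[0] + x[1] * s[1]) / 2.0

# Unknowns ordered (dx1, dx2, dy, ds1, ds2); rows are the three blocks of (4.33)
K = [
    [1.0, 1.0, 0.0, 0.0, 0.0],         # A dx          = b - A x
    [0.0, 0.0, 1.0, 1.0, 0.0],         # A^T dy + ds   = c - A^T y - s
    [0.0, 0.0, 1.0, 0.0, 1.0],
    [s[0], 0.0, 0.0, x[0], 0.0],       # S dx + X ds   = gamma mu e - X S e
    [0.0, s[1], 0.0, 0.0, x[1]],
]
rhs = [
    b[0] - (x[0] + x[1]),
    c[0] - y[0] - s[0],
    c[1] - y[0] - s[1],
    gamma * mu - x[0] * s[0],
    gamma * mu - x[1] * s[1],
]
d = solve(K, rhs)
# the computed step restores primal feasibility: A (x + dx) = b
assert abs((x[0] + d[0]) + (x[1] + d[1]) - b[0]) < 1e-9
# and satisfies the complementarity block
assert abs(s[0] * d[0] + x[0] * d[3] - rhs[3]) < 1e-9
```

Practical implementations eliminate Δs and Δx first and solve only an m × m normal-equations system, but the fully assembled form above makes the three blocks of (4.33) explicit.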
Consider the linear complementarity problem

    LCP:  s − M x = q
          s ≥ 0, x ≥ 0, x^T s = 0,

where M is an n × n matrix, and q ∈ R^n. It is well known that for appropriate choices of M (see for example [19]), LCP can be used to represent linear programming, convex quadratic programming, matrix games, and other problems. Many primal-dual algorithms for LP can be extended to LCP, under the assumption that M is a positive semidefinite (but not necessarily symmetric) matrix. In particular, the primal-dual potential reduction algorithm of Section 5 was originally devised as a method for LCP, and retains a complexity of O(√n L) iterations so long as M is positive semidefinite. See Kojima, Mizuno, and Yoshise [47] for details.
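One standard reduction, sketched here for an LP pair in inequality form (the data and the pairing of primal and dual variables are illustrative assumptions, not the text's construction), produces a skew-symmetric, hence positive semidefinite, M:

```python
# LP in the form max b^T y s.t. A^T y <= c, y >= 0, paired with its dual,
# yields a monotone LCP with the skew-symmetric matrix M below (data illustrative).
A = [[1.0, 2.0],
     [3.0, 1.0]]
m, n = 2, 2

def skew_block(A):
    """M = [[0, -A^T], [A, 0]], which satisfies u^T M u = 0 for all u."""
    top = [[0.0] * n + [-A[j][i] for j in range(m)] for i in range(n)]
    bot = [A[i][:] + [0.0] * m for i in range(m)]
    return top + bot

M = skew_block(A)
# positive semidefiniteness (here, u^T M u = 0 exactly) for a few test vectors
for u in ([1.0, 2.0, 3.0, 4.0], [-1.0, 0.5, 2.0, -3.0]):
    quad = sum(u[i] * M[i][j] * u[j] for i in range(4) for j in range(4))
    assert abs(quad) < 1e-12
```

Skew-symmetry gives u^T M u = 0 for every u, so M is (weakly) positive semidefinite and the O(√n L) complexity quoted above applies.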
The theory of LCP depends very heavily on the membership of M in various classes of matrices (for example, positive semidefinite matrices). Kojima et al. [45] discuss the application of interior point algorithms, including primal-dual potential reduction methods, to LCP problems with different types of M. Kojima, Megiddo, and Ye [46] analyze a potential reduction algorithm in the case that M is a P-matrix (that is, a matrix with positive principal minors), for which a solution to LCP always exists (see [19]). Ye [87] analyzes a potential reduction algorithm that obtains an approximate stationary point of a general LCP, and Ye [88] considers a potential reduction method for the related problem of approximating a Karush-Kuhn-Tucker point of a general quadratic programming problem. The last three references show that the potential reduction framework can be used to analyze algorithms that are not polynomial-time methods.
Consider a conic problem pair of the form

    CLP:  min  ⟨c, x⟩
          s.t. A x = b
               x ∈ K,

    CLD:  max  ⟨b, y⟩
          s.t. A* y + s = c
               s ∈ K*.

Strong duality holds between CLP and CLD if, for example, CLP and CLD both have feasible solutions which are interior to the cones K and K*, respectively. See [64] for more extensive duality results for these problems. Note that if X = R^n, Y = R^m, and K = R^n_+, the nonnegative orthant, then CLP is simply LP. It is shown in [64] that CLP actually provides a formulation for general convex programming.
In [64, Chapter 4] it is shown that Karmarkar's algorithm, and the affine potential reduction algorithm, can be extended to problems of the form CLP so long as the cone K possesses a θ-logarithmically-homogeneous barrier. The exact definition of such a barrier, and its properties, are beyond the scope of this article. We note here only that the complexities of algorithms for CLP depend on the parameter θ. For the usual LP problem, −Σ_{i=1}^n ln(x_i) is an n-logarithmically-homogeneous barrier for R^n_+.
Todd [78] gives a much more extensive discussion of Nesterov and Nemirovskii's [64]
generalization of potential reduction algorithms to CLP. The extension of a potential
reduction algorithm (specifically Ye's [83] projective potential reduction method) to
semidefinite programming was independently obtained by Alizadeh [1]. Nesterov and
Todd [62], [63] obtain an extension of the primal-dual potential reduction method
to problems of the form CLP where K and its barrier are self-scaled; see also [78]
for a summary of these results.
Acknowledgements
I would like to thank Rob Freund, Tamas Terlaky, Mike Todd, and Yinyu Ye for
their comments on a draft of this article.
REFERENCES
[1] F. Alizadeh, "Interior point methods in semidefinite programming with appli-
cations to combinatorial optimization," SIAM J. Opt. 5 (1995) 13-51.
[2] K.M. Anstreicher, "A monotonic projective algorithm for fractional linear pro-
gramming," Algorithmica 1 (1986) 483-498.
[3] K.M. Anstreicher, "The worst-case step in Karmarkar's algorithm," Math. Oper.
Res. 14 (1989) 294-302.
[4] K.M. Anstreicher, "A combined phase I-phase II projective algorithm for linear
programming," Math. Prog. 43 (1989) 209-223.
[5] K.M. Anstreicher, "A standard form variant, and safeguarded linesearch, for
the modified Karmarkar algorithm," Math. Prog. 47 (1990) 337-351.
[6] K.M. Anstreicher, "Dual ellipsoids and degeneracy in the projective algorithm
for linear programming," Contemporary Mathematics 114 (1990) 141-149.
[7] K.M. Anstreicher, "On the performance of Karmarkar's algorithm over a se-
quence of iterations," SIAM J. Opt. 1 (1991) 22-29.
[8] K.M. Anstreicher, "A combined phase I - phase II scaled potential algorithm
for linear programming," Math. Prog. 52 (1991) 429-439.
[9] K.M. Anstreicher, "On monotonicity in the scaled potential algorithm for linear
programming," Linear Algebra Appl. 152 (1991) 223-232.
[10] K.M. Anstreicher, "Strict monotonicity and improved complexity in the stan-
dard form projective algorithm for linear programming," Math. Prog. 62 (1993)
517-535.
[11] K.M. Anstreicher, "Large step volumetric potential reduction algorithms for
linear programming," to appear in Annals of O.R. (1996).
[12] K.M. Anstreicher and R.A. Bosch, "Long steps in an O(n^3 L) algorithm for
linear programming," Math. Prog. 54 (1992) 251-265.
[13] K.M. Anstreicher and P. Watteyne, "A family of search directions for Kar-
markar's algorithm," Operations Research 41 (1993) 759-767.
[14] M.D. Asic, V.V. Kovacevic-Vujcic, and M.D. Radosavljevcic-Nikolic, "A note
on limiting behavior of the projective and the affine rescaling algorithms,"
Contemporary Mathematics 114 (1990) 151-157.
[15] D. Bayer and J.C. Lagarias, "Karmarkar's linear programming algorithm and
Newton's method," Math. Prog. 50 (1991) 291-330.
[16] R.A. Bosch, "On Mizuno's rank one updating algorithm for linear program-
ming," SIAM J. Opt. 3 (1993) 861-867.
[17] R.A. Bosch and K.M. Anstreicher, "On partial updating in a potential reduction
linear programming algorithm of Kojima, Mizuno, and Yoshise," Algorithmica
9 (1993) 184-197.
[18] R.A. Bosch and K.M. Anstreicher, "A partial updating algorithm for linear
programs with many more variables than constraints," Optimization Methods
and Software 4 (1995) 243-257.
[19] R.W. Cottle, J.-S. Pang, and R.E. Stone, The Linear Complementarity Problem
(Academic Press, Boston, 1992).
[20] G. de Ghellinck and J.-Ph. Vial, "A polynomial Newton method for linear
programming," Algorithmica 1 (1986) 425-453.
[21] A.V. Fiacco and G.P. McCormick, Nonlinear Programming, Sequential Uncon-
strained Minimization Techniques, (John Wiley, New York, 1968); reprinted as
Classics in Applied Mathematics Vol. 4, (SIAM, Philadelphia, 1990).
[22] C. Fraley, "Linear updates for a single-phase projective method," O.R. Letters
9 (1990) 169-174.
[23] R.M. Freund, "An analog of Karmarkar's algorithm for inequality constrained
linear programs, with a 'new' class of projective transformations for centering
a polytope," O.R. Letters 7 (1988) 9-14.
[24] R.M. Freund, "Polynomial-time algorithms for linear programming based only
on primal scaling and projected gradients of a potential function," Math. Prog.
51 (1991) 203-222.
[25] R.M. Freund, "A potential-function reduction algorithm for solving a linear
program directly from an infeasible 'warm start'," Math. Prog. 52 (1991) 441-
466.
[27] R.M. Freund, "A potential reduction algorithm with user-specified phase 1-
phase II balance for solving a linear program from an infeasible warm start,"
SIAM J. Opt. 5 (1995) 247-268.
[28] D.M. Gay, "A variant of Karmarkar's linear programming algorithm for prob-
lems in standard form," Math. Prog. 37 (1987) 81-90.
[29] P. Gill, W. Murray, M. Saunders, J. Tomlin, and M. Wright, "On projected New-
ton barrier methods for linear programming and an equivalence to Karmarkar's
projective method," Math. Prog. 36 (1986) 183-209.
[32] C.C. Gonzaga, "Conical projection algorithms for linear programming," Math.
Prog. 43 (1989) 151-173.
[33] C.C. Gonzaga, "Polynomial affine algorithms for linear programming," Math.
Prog. 49 (1991) 7-21.
[34] C.C. Gonzaga, "Large-step path following methods for linear programming,
part II: potential reduction method," SIAM J. Opt. 1 (1991) 280-292.
[35] C.C. Gonzaga, "Interior point algorithms for linear programs with inequality
constraints," Math. Prog. 52 (1991) 209-225.
[36] C.C. Gonzaga, "On lower bound updates in primal potential reduction methods
for linear programming," Math. Prog. 52 (1991) 415-428.
[37] C.C. Gonzaga, "Path-following methods for linear programming," SIAM Review
34 (1992) 167-224.
[38] C.C. Gonzaga and M.J. Todd, "An O(√n L)-iteration large-step primal-dual
affine algorithm for linear programming," SIAM J. Opt. 2 (1992) 349-359.
[40] H. Imai, "On the convexity of the multiplicative version of Karmarkar's potential
function," Math. Prog. 40 (1988) 29-32.
[41] M. Iri and H. Imai, "A multiplicative barrier function method for linear pro-
gramming," Algorithmica 1 (1986) 455-482.
[42] B. Jansen, C. Roos, and T. Terlaky, "The theory of linear programming: skew
symmetric self-dual problems and the central path," Optimization 29 (1993)
225-233.
[43] J. Ji and Y. Ye, "A complexity analysis for interior-point algorithms based on
Karmarkar's potential function," SIAM J. Opt. 4 (1994) 512-520.
[46] M. Kojima, N. Megiddo, and Y. Ye, "An interior point potential reduction
algorithm for the linear complementarity problem," Math. Prog. 54 (1992) 267-
279.
[47] M. Kojima, S. Mizuno, and A. Yoshise, "An O(√n L) iteration potential re-
duction algorithm for linear complementarity problems," Math. Prog. 50 (1991)
331-342.
[48] C. McDiarmid, "On the improvement per iteration in Karmarkar's algorithm
for linear programming," Math. Prog. 46 (1990) 299-320.
[49] N. Megiddo and M. Shub, "Boundary behavior of interior point algorithms in
linear programming," Math. Oper. Res. 14 (1989), 97-146
[50] J.E. Mitchell, "Updating lower bounds when using Karmarkar's projective al-
gorithm for linear programming," JOTA 78 (1993) 127-142.
[51] J.E. Mitchell and M.J. Todd, "On the relationship between the search directions
in the affine and projective variants of Karmarkar's linear programming algo-
rithm," in Contributions to Operations Research and Economics: The Twentieth
Anniversary of CORE, B. Cornet and H. Tulkens, editors, MIT Press (Cam-
bridge, MA, 1989) 237-250.
[52] J.E. Mitchell and M.J. Todd, "A variant of Karmarkar's linear programming
algorithm for problems with some unrestricted variables," SIAM J. Matrix Anal.
Appl. 10 (1989) 30-38.
[53] S. Mizuno, "A rank one updating algorithm for linear programming," The Ara-
bian Journal for Science and Engineering 15 (1990) 671-677.
[56] S. Mizuno and A. Nagasawa, "A primal-dual affine scaling potential reduction
algorithm for linear programming," Math. Prog. 62 (1993) 119-131.
[57] R.D.C. Monteiro, "Convergence and boundary behavior of the projective scaling
trajectories for linear programming," Contemporary Mathematics 114 (1990)
213-229.
[58] R.D.C. Monteiro, "On the continuous trajectories for a potential reduction al-
gorithm for linear programming," Math. Oper. Res. 17 (1992) 225-253.
[59] M. Muramatsu and T. Tsuchiya, "A convergence analysis of a long-step variant
of the projective scaling algorithm," The Institute of Statistical Mathematics
(Tokyo, Japan, 1993); to appear in Math. Prog.
[60] A.S. Nemirovskii, "An algorithm of the Karmarkar type," Soviet Journal on
Computers and Systems Sciences 25 (1987) 61-74.
[66] M.J.D. Powell, "On the number of iterations of Karmarkar's algorithm for linear
programming," Math. Prog. 62 (1993) 153-197.
[67] A.E. Steger, "An extension of Karmarkar's algorithm for bounded linear pro-
gramming problems," M.S. Thesis, State University of New York (Stonybrook,
NY, 1985).
[68] D.F. Shanno, "Computing Karmarkar projections quickly," Math. Prog. 41
(1988) 61-71.
[69] D. Shaw and D. Goldfarb, "A path-following projective interior point method
for linear programming," SIAM J. Opt. 4 (1994) 65-85.
[70] K. Tanabe, "Centered Newton method for mathematical programming," Lecture
Notes in Control and Information Sciences 113 (Springer-Verlag, Berlin, 1988)
197-206.
[71] M.J. Todd, "Exploiting special structure in Karmarkar's linear programming
algorithm," Math. Prog. 41 (1988) 97-113.
[72] M.J. Todd, "Improved bounds and containing ellipsoids in Karmarkar's linear
programming algorithm," Mathematics of Operations Research 13 (1988) 650-
659.
[73] M.J. Todd, "The effects of degeneracy and null and unbounded variables on vari-
ants of Karmarkar's linear programming algorithm," in Large Scale Numerical
Optimization, T.F. Coleman and Y. Li, editors (SIAM, Philadelphia, 1990).
[75] M.J. Todd, "On Anstreicher's combined phase I-phase II projective algorithm
for linear programming," Math. Prog. 55 (1992) 1-15.
[76] M.J. Todd, "Combining phase I and phase II in a potential reduction algorithm
for linear programming," Math. Prog. 59 (1993) 133-150.
[79] M.J. Todd and B.P. Burrell, "An extension of Karmarkar's algorithm for linear
programming using dual variables," Algorithmica 1 (1986) 409-424.
[80] M.J. Todd and Y. Ye, "A centered projective algorithm for linear programming,"
Math. Oper. Res. 15 (1990) 508-529.
[83] Y. Ye, "A class of projective transformations for linear programming," SIAM
J. Comput. 19 (1990) 457-466.
[84] Y. Ye, "A 'build down' scheme for linear programming," Mathematical Pro-
gramming 46 (1990) 61-72.
[85] Y. Ye, "An O(n^3 L) potential reduction algorithm for linear programming,"
Math. Prog. 50 (1991) 239-258.
[86] Y. Ye, "A potential reduction algorithm allowing column generation," SIAM J.
Opt. 2 (1992), 7-20.
[87] Y. Ye, "A fully polynomial-time approximation algorithm for computing a sta-
tionary point of the general LCP," Math. Oper. Res. 18 (1993) 334-345.
[88] Y. Ye, "On the complexity of approximating a KKT point of quadratic pro-
gramming," Dept. of Management Sciences, University of Iowa (Iowa City, IA,
1995).
[91] Y. Ye, M.J. Todd, and S. Mizuno, "An O(√n L)-iteration homogeneous and
self-dual linear programming algorithm," Math. Oper. Res. 19 (1994) 53-67.
5
INFEASIBLE-INTERIOR-POINT
ALGORITHMS
Shinji Mizuno
Department of Prediction and Control,
The Institute of Statistical Mathematics,
Minato-ku, Tokyo 106, Japan
ABSTRACT
An interior-point algorithm whose initial point is not restricted to a feasible point is called
an infeasible-interior-point algorithm. Such an algorithm solves a given linear programming
problem directly, without recourse to any artificial problem, and therefore has a considerable
implementation advantage over a feasible-interior-point algorithm, which must start from a
feasible point. We introduce a primal-dual infeasible-interior-point algorithm and prove its
global convergence. When all the data of the linear programming problem are integers, the
algorithm terminates in polynomial time under some moderate conditions on the initial
point. We also introduce a predictor-corrector infeasible-interior-point algorithm, which
achieves better complexity and converges superlinearly.
5.1 INTRODUCTION
A linear programming problem is to find an optimal solution, which minimizes an
objective function under linear equality and inequality constraints. A point is called
feasible if it satisfies all the linear constraints, and called infeasible otherwise. A
point, which satisfies the inequality constraints but may not satisfy the equality con-
straints, is called interior. An interior-point algorithm solves the linear programming
problem by generating a sequence of interior points from an initial interior point. A
natural question is how to prepare the initial point.
159
T. Terlaky (ed.), Interior Point Methods ofMathematical Programming 159-187.
© 1996 Kluwer Academic Publishers.
160 CHAPTER 5
An interior-point algorithm whose initial point may not be feasible was introduced
by Lustig [6] and Tanabe [23]. The algorithm is a simple variant of a primal-dual
interior-point algorithm developed by Megiddo [10], Kojima et al. [4, 5], Monteiro
and Adler [17], and Tanabe [22]. For any linear programming problem, it is very
easy to get an initial interior point by using slack variables. Lustig et al. [7, 8] and
Marsten et al. [9] reported that such an algorithm was among the most efficient of
numerous algorithms in practice. In this paper, we call an algorithm which solves a
given linear programming problem directly, without using any artificial problem, and
generates a sequence of feasible or infeasible interior points from an arbitrary interior
point, an infeasible-interior-point algorithm, or simply an IIP algorithm.
minimize    c^T x,
subject to  Ax = b,  x ≥ 0,                                 (5.1)

where x ∈ R^n is an unknown vector. Assume that the rank of A is m and the system
Au = b has a solution u. The dual problem of (5.1) is defined as

maximize    b^T y,
subject to  A^T y + s = c,  s ≥ 0,                          (5.2)
The optimality conditions for (5.1) and (5.2) can be written as the system

Ax = b,  A^T y + s = c,  Xs = 0,  (x, s) ≥ 0,               (5.3)

where X := diag(x) denotes the diagonal matrix whose each diagonal element equals
the element x_i of x ∈ R^n. The problem of finding a solution (x, y, s) of (5.3)
is called a primal-dual linear programming problem. Conversely, if (x, y, s) is a
solution of the primal-dual problem, then x and (y, s) are optimal solutions of (5.1) and
(5.2) respectively. We call a point (x, y, s) interior if x > 0 and s > 0, and we call it
feasible if it satisfies Ax = b, A^T y + s = c, and (x, s) ≥ 0.
We introduce a path of centers for the primal-dual problem (5.3). The path runs
through the feasible region, and one of the end points is on the solution set of the
problem. The path is very important for understanding interior-point methods. For
Ax = b,  A^T y + s = c,  Xs = μe,  (x, s) > 0,              (5.4)

where e := (1, 1, ..., 1)^T ∈ R^n. Suppose that the problem (5.3) has an interior point.
Then for each μ > 0 the system (5.4) has a unique solution, which we denote by
(x(μ), y(μ), s(μ)). The center is clearly feasible. Let P_1 be the set of centers

(5.5)

Then we set μ^{k+1} := (x^{k+1})^T s^{k+1}/n, because this μ^{k+1} attains the minimum of the
residual for the third equality in (5.4), i.e.

||X^{k+1} s^{k+1} − μ^{k+1} e|| ≤ ||X^{k+1} s^{k+1} − μe||   for any μ > 0,

where || · || without subscript denotes the Euclidean norm. We also use ||x||_1 :=
Σ_{i=1}^n |x_i| and ||x||_∞ := max_i {|x_i|}. The algorithm is summarized as follows.
IIP Algorithms 163
Algorithm A: Let (x^0, y^0, s^0) be an initial interior point. Set k := 0 and μ^0 :=
(x^0)^T s^0/n.

Step 2: Compute the solution (Δx, Δy, Δs) of the system (5.5).

Step 3: Compute step sizes α_P and α_D and a next point (x^{k+1}, y^{k+1}, s^{k+1}) by (5.6).
Set μ^{k+1} := (x^{k+1})^T s^{k+1}/n.
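The Newton direction computed in Step 2 can be illustrated numerically. The following is a minimal, self-contained sketch (our own illustration, not the book's code; the dense solve and the tiny example LP are assumptions made here): it linearizes Ax = b, A^T y + s = c, Xs = μe at a possibly infeasible interior point and solves for (Δx, Δy, Δs).

```python
import numpy as np

def iip_newton_step(A, b, c, x, y, s, mu_target):
    """One Newton step of a primal-dual infeasible-interior-point method.

    Linearizes  Ax = b,  A^T y + s = c,  Xs = mu_target * e  at the
    current (possibly infeasible) point with x > 0, s > 0.
    """
    m, n = A.shape
    rp = b - A @ x                        # primal residual
    rd = c - A.T @ y - s                  # dual residual
    rc = mu_target * np.ones(n) - x * s   # centering residual
    # Assemble the full (unreduced) Newton system in (dx, dy, ds).
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A                         # A dx          = rp
    K[m:m + n, n:n + m] = A.T             # A^T dy + ds   = rd
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(s)            # S dx + X ds   = rc
    K[m + n:, n + m:] = np.diag(x)
    d = np.linalg.solve(K, np.concatenate([rp, rd, rc]))
    return d[:n], d[n:n + m], d[n + m:]

# Tiny example LP: min x1 + 2*x2  s.t.  x1 + x2 = 1,  x >= 0,
# started from an infeasible interior point (Ax != b, A^T y + s != c).
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x0, y0, s0 = np.ones(2), np.zeros(1), np.ones(2)
dx, dy, ds = iip_newton_step(A, b, c, x0, y0, s0, mu_target=0.1)
```

A full step (α_P = α_D = 1) removes the linear-constraint residuals entirely, which is exactly the factor (1 − α) appearing in Lemma 5.2.1 below; in Algorithm A the step sizes are damped so that (x, s) stays positive.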
We shall show that any iterate generated by Algorithm A lies on an affine subspace
which includes the initial point (x^0, y^0, s^0) and the feasible region of (5.3).
Lemma 5.2.1 Let {(x^k, y^k, s^k)} be a sequence generated by Algorithm A. We have
that

A(x^k + αΔx) − b = (1 − α)(Ax^k − b),
A^T(y^k + αΔy) + (s^k + αΔs) − c = (1 − α)(A^T y^k + s^k − c)

for each α. If we set θ_P^0 := 1 and θ_D^0 := 1 and compute θ_P^{k+1} := (1 − α_P)θ_P^k and
θ_D^{k+1} := (1 − α_D)θ_D^k at each iteration, then

Ax^k − b = θ_P^k (Ax^0 − b),                                 (5.7)
A^T y^k + s^k − c = θ_D^k (A^T y^0 + s^0 − c).               (5.8)

Proof. The first equality in the lemma follows from the first equation of the Newton
system (5.5), A Δx = −(Ax^k − b). Similarly we get the second equality in the lemma.
Since x^{k+1} = x^k + α_P Δx and (y^{k+1}, s^{k+1}) = (y^k, s^k) + α_D(Δy, Δs) in each
iteration, we can prove the latter assertion in the lemma by using induction. □
(5.9)

The set N contains the initial point (x^0, y^0, s^0) and includes the path P_1. We generate
a sequence {(x^k, y^k, s^k)} in the set N by Algorithm A. Then if (x^k)^T s^k → 0, the
sequence approaches the solution set of the primal-dual problem (5.3), because
||Ax^k − b|| → 0 and ||A^T y^k + s^k − c|| → 0. The condition Xs ≥ γμe in N assures that
any point in N is well separated from the boundary of the feasible region except for
the solution set.
We are ready to state a globally convergent IIP algorithm for solving the primal-dual
problem (5.3). The algorithm is a special case of Algorithm A.
Algorithm A1: Let (x^0, y^0, s^0) be an initial interior point. Choose the parameter
values ρ, ε, ε_P, ε_D, γ, ξ, and λ ∈ (0, 1). Set k := 0, θ_P^0 := 1, θ_D^0 := 1, and
μ^0 := (x^0)^T s^0/n.
Step 1: If the conditions in (5.9) hold true at the current iterate (x^k, y^k, s^k), then
output it as an approximate solution and stop. If

θ_P^k (x^0)^T s^k + θ_D^k (s^0)^T x^k
   > ρ(θ_P^k e^T x^0 + θ_D^k e^T s^0) + (x^k)^T s^k + θ_P^k θ_D^k (x^0)^T s^0,   (5.11)

then stop. Otherwise set μ' := λμ^k.
Step 2: Compute the solution (Δx, Δy, Δs) of the system (5.5).

Step 3: Compute

Let ᾱ be the value of α which attains the minimum. Choose any step sizes
α_P ≥ 0 and α_D ≥ 0 such that
The algorithm stops in two cases at Step 1. In the former case we get an approximate
solution, while in the latter case we detect infeasibility of the problem. If we
use ρ = ∞, the algorithm generates an infinite sequence of points unless we get an
approximate solution. In Theorem 5.3.1 below, we shall show that Algorithm A1
terminates in a finite number of iterations if the problem is feasible.
Theorem 5.3.1 Suppose that the parameters λ, ξ, and γ are independent of the
data. If ρ is finite, Algorithm A1 terminates in a finite number of iterations, which
depends on the initial point, a solution of Au = b and A^T v + w = c, ρ, ε, ε_P, ε_D, and
n. If the condition (5.11) holds true at some iteration k, the primal-dual problem
(5.3) has no solutions in B_ρ. If ρ = ∞ and the problem (5.3) is feasible, Algorithm
A1 terminates in a finite number of iterations, which depends on the initial point,
an optimal solution, ε, ε_P, ε_D, and n. If ρ = ∞ and Algorithm A1 generates an
infinite sequence, the sequence is unbounded and the problem (5.3) is infeasible.
(5.12)

for some η_1 > 0 and η_2 > 0 at each iteration of Algorithm A1. Define

α* := min { λ/η_1 , .5(1 − λ)/η_1 , (1 − γ)λ/(n η_2) }
and
lIP Algorithms 167
Using this equality, μ^k = (x^k)^T s^k/n, μ' = λμ^k, (5.12), and α ≤ .5(1 − λ)/η_1, we get
the inequality (5.13) as follows:

Since this inequality holds true for any α ∈ [0, α*], we also see that x_i^k + αΔx_i > 0
and s_i^k + αΔs_i > 0 by continuity with respect to α. Using Lemma 5.2.1, (5.15),
and α ≤ λ/η_1,
From Lemma 5.3.2, the iterates generated by Algorithm A1 are in N, and the step
size α computed at Step 3 is greater than or equal to α*, which does not depend
on k. From Step 3 of Algorithm A1 and Lemma 5.3.2, we see that

If α* is bounded away from 0 then (x^k)^T s^k → 0 as k → ∞. In order to obtain a
lower bound on α*, we estimate the magnitudes of η_1 and η_2 in the next three lemmas.
Proof. Suppose that (Δx, Δy, Δs) is expressed as in the lemma. Since ADQ = AD
and AD(I − Q) = 0, we see that

So (Δx, Δy, Δs) is the solution of (5.5). Since Q and I − Q are orthogonal projections,
we have that

where the norm ||X'|| of a diagonal matrix X' is equal to the maximum absolute value
of the diagonal elements. Since μ' = λμ^k and X^k s^k ≥ γμ^k e, we have ||(XS)^{−.5}|| ≤
1/√(γμ^k) and

||(XS)^{.5} e − μ'(XS)^{−.5} e|| ≤ ||(XS)^{.5} e|| + μ' ||(XS)^{−.5} e||
                               ≤ √(nμ^k) + λμ^k √n / √(γμ^k)
                               = (1 + λ/√γ) √(nμ^k).

Hence we have shown the inequality (5.16). Similarly we obtain the bound on ||DΔs||
as in (5.17). □
Since ||D^{−1}Δx|| and ||DΔs|| are bounded by (5.16) and (5.17), we shall obtain an
upper bound on the first term on the right-hand side of these inequalities.
Lemma 5.3.4 If the condition (5.11) holds true, the primal-dual problem (5.3) has
no solutions in B_ρ. If the condition (5.11) does not hold true at the k-th iteration then

Proof. Suppose that the primal-dual problem (5.3) has a solution (x*, y*, s*) in B_ρ,
that is, ||(x*, s*)||_∞ ≤ ρ. From Lemma 5.2.1,

and similarly

So we have that

which implies

So (5.11) does not hold true, and we have proved the first assertion.
By using Lemmas 5.3.3 and 5.3.4, we shall get the values of η_1 and η_2 defined in
Lemma 5.3.2.

Lemma 5.3.5 If the condition (5.11) does not hold true at the k-th iteration then
Proof of Theorem 5.3.1: Suppose that ρ is finite and the condition (5.11) does
not hold true throughout Algorithm A1. Then η defined in Lemma 5.3.5 is finite.
From Lemmas 5.3.2 and 5.3.5 we have that

Hence if

k > max { ln((x^0)^T s^0/ε), ln(||b||/(ξε_P)), ln(||c||/(ξε_D)) } / (.5α*(1 − λ)),   (5.21)

then the conditions (5.9) hold true. The right-hand side is finite and depends on the
point (u, v, w), the initial point (x^0, y^0, s^0), ρ, ε, ε_P, ε_D, and n.

As stated in Lemma 5.3.4, if the condition (5.11) holds true at some iteration, there
are no solutions of (5.3) in B_ρ.

If the primal-dual problem is feasible, there exists a solution (x*, y*, s*) of it. So
the condition (5.11) does not hold true throughout Algorithm A1 if ρ ≥ ρ' :=
||(x*, s*)||_∞. Hence Algorithm A1 terminates in a finite number of iterations, which
depends on (u, v, w) := (x*, y*, s*), (x^0, y^0, s^0), ρ', ε, ε_P, ε_D, and n, by the
same argument as above.

Suppose that Algorithm A1 generates an infinite sequence of points (x^k, y^k, s^k). If
μ^k → 0 then the algorithm terminates by the conditions in (5.9). So μ^k is bounded
away from 0. If (x^k, y^k, s^k) is bounded then x^k and s^k are bounded away from 0,
because x_i^k s_i^k ≥ γμ^k. Thus D and D^{−1} are bounded. Hence Δx and Δs are bounded by
Lemma 5.3.3. Therefore α* in Lemma 5.3.2 is bounded away from 0, that is, μ^k goes
to 0, and we have derived a contradiction. □
has a solution. It is well known that this system has a solution for ρ_0 = 2^L, so we
assume that ρ_0 ≤ 2^L. We may compute a smaller ρ_0 than 2^L by solving a simple
minimization problem without inequality constraints:
Lemma 5.4.2 Under the conditions in Theorem 5.4.1, if the condition (5.11) does
not hold true at the k-th iteration of Algorithm A1 then

where κ_1 and η are defined in Lemmas 5.3.4 and 5.3.5 respectively. The square of
the resulting expression is bounded above by 100n/γ. □
Proof of Theorem 5.4.1: From Lemmas 5.3.2, 5.3.5, and 5.4.2, we have that

(x^{k+1})^T s^{k+1} ≤ (1 − .5α*(1 − λ))(x^k)^T s^k

for

α* := min { γ(1 − λ)/(200n) , γ(1 − γ)λ/(100n^2 (1 + γ)) }.

Using the same argument as in the proof of Theorem 5.3.1, the number of iterations
of Algorithm A1 is bounded by the right-hand side of (5.21), which is O(n^2 L) from the
parameter values in Theorem 5.4.1 and the value of α* above. □
It is well known that if the initial point is feasible, Algorithm A1 requires at most
O(nL) iterations; see for example Mizuno et al. [15]. Here we shall give a sufficient
condition to achieve the O(nL) iteration complexity of Algorithm A1.

Theorem 5.4.3 Let δ > 0 be a constant independent of the data. Suppose that the
parameter values ξ, ε, ε_P, ε_D, and ρ are as in Theorem 5.4.1. For a given initial
point (x^0, y^0, s^0) ∈ N, if there exists a solution (u, v, w) of Au = b and A^T v + w = c
such that

(5.23)

then Algorithm A1 terminates in O(nL) iterations.

Note that if (x^0, y^0, s^0) = ρ(e, 0, e) in addition, this theorem easily follows from the
proof of Theorem 5.4.1.
Proof. From the condition (5.23), .5x^0 ≤ u ≤ 1.5x^0 and .5s^0 ≤ w ≤ 1.5s^0. Note that
the relation (5.18) holds true not only for the optimal solution (x*, y*, s*) but also
for the point (u, v, w). So we have that

.5(x^0)^T s^k + .5(s^0)^T x^k
   ≤ (θ_P^k x^0 + (1 − θ_P^k)u)^T s^k + (θ_D^k s^0 + (1 − θ_D^k)w)^T x^k
   ≤ (θ_P^k x^0 + (1 − θ_P^k)u)^T (θ_D^k s^0 + (1 − θ_D^k)w) + (x^k)^T s^k
   ≤ (1.5x^0)^T (1.5s^0) + (x^k)^T s^k
   ≤ 3.25(x^0)^T s^0.

Then

θ_P^k ||S^0(x^0 − u)|| + θ_D^k ||X^0(s^0 − w)||
   ≤ (δ/√n) θ_P^k ||S^0 x^0|| + (δ/√n) θ_D^k ||X^0 s^0||
   ≤ (δ/√n) max{θ_P^k, θ_D^k} ((s^k)^T x^0 + (x^k)^T s^0)
   ≤ (δ/√n) max{θ_P^k, θ_D^k} 6.5(x^0)^T s^0
   ≤ 6.5δ(x^k)^T s^k / √n,

where the last inequality follows from ξ = 1 and (5.20). From Lemma 5.3.3,

||D^{−1}Δx|| ≤ 6.5δ√(nμ^k)/√γ + (1 + λ/√γ)√(nμ^k)
           ≤ (2 + 6.5δ)√(nμ^k)/√γ.                          (5.24)

We have the same bound for ||DΔs||. Using the same argument as in the proof of
Lemma 5.3.5,

|Δx^T Δs| ≤ ((2 + 6.5δ)^2/γ)(x^k)^T s^k,
|Δx_i Δs_i − γ Δx^T Δs/n| ≤ ((1 + γ)(2 + 6.5δ)^2/γ)(x^k)^T s^k.

By Lemma 5.3.2, we have that

(x^{k+1})^T s^{k+1} ≤ (1 − .5α*(1 − λ))(x^k)^T s^k

for

α* := min { γ(1 − λ)/(2(2 + 6.5δ)^2) , γ(1 − γ)λ/((2 + 6.5δ)^2 (1 + γ)n) }.

Then the number of iterations of Algorithm A1 is bounded by the right-hand side of
(5.21), which is O(nL) from the parameter values in Theorem 5.4.3 and the value of
α* above. □
In Theorem 5.4.3, we obtained a condition on the initial point to achieve the O(nL)
iteration complexity of Algorithm A1. Mizuno [11] showed that a variant of Algorithm
A1 terminates in O(nL) iterations under the condition on the initial point given in
Theorem 5.4.1. The variant uses a predictor-corrector technique. Here we do not
present that variant; instead we introduce a predictor-corrector algorithm, which is a
variant of Algorithm B given in the next section, and we prove its O(nL) iteration
complexity in Sections 5.6 and 5.7.
(x, s) ≥ 0.                                                  (5.27)

Note that if θ = 0 then the problems (5.25), (5.26), and (5.27) coincide with the
original problems (5.1), (5.2), and (5.3) respectively, and if θ = 1 then the initial
point (x^0, y^0, s^0) is a feasible interior point of (5.27). A point (x^k, y^k, s^k) generated
by Algorithm A is a feasible point of the problem (5.27) if θ_P^k = θ and θ_D^k = θ.

Now we consider the feasibility of the primal-dual problem (5.27). It is easy to verify
that if (5.27) has interior points for two different parameter values θ_1 < θ_2, there
exists an ε' > 0 such that (5.27) has an interior point for any θ ∈ (θ_1 − ε', θ_2 + ε').
Hence the set of parameter values for which (5.27) has an interior point is an open
interval (θ_l, θ_u), where θ_l < 1 and θ_u > 1 may be −∞ and ∞ respectively. From the
definition, θ_l < 0 if and only if the original primal-dual problem (5.3) has an interior
point, and θ_l = 0 if and only if it has a feasible point but does not have an interior
point.
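The displayed problems (5.25)-(5.27) have not survived in this copy. Based on the residual relations used later (Lemma 5.6.1), the perturbed primal-dual system presumably has the following form, where b̄ and c̄ denote the initial residual vectors (a reconstruction, not the original display):

```latex
% Reconstruction of the perturbed primal-dual system (5.27);
% \bar b and \bar c are the initial residuals (an assumption here).
\bar b := b - A x^0, \qquad \bar c := c - A^T y^0 - s^0,
\qquad
\begin{aligned}
A x &= b - \theta \bar b,\\
A^T y + s &= c - \theta \bar c,\\
X s &= 0, \qquad (x, s) \ge 0 .
\end{aligned}
```

At θ = 1 the constraints are satisfied by (x^0, y^0, s^0), and at θ = 0 they reduce to those of (5.3), matching the remarks above.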
If the perturbed primal-dual problem (5.27) has an interior point, then centers of
it exist. For each θ ∈ (θ_l, θ_u) and μ > 0, the center (x(θ, μ), y(θ, μ), s(θ, μ)) of the
problem (5.27) is a solution of the system

(5.28)

The center (x(θ, μ), y(θ, μ), s(θ, μ)) exists uniquely for each θ ∈ (θ_l, θ_u) and μ > 0.
We define the set of parameters

The following properties of the set S were shown in Mizuno et al. [16].

Theorem 5.5.1 The set S of centers is a surface. Let {(θ^k, μ^k)} be a sequence in
T. When the primal-dual problem (5.3) has an interior point, the center (x(θ^k, μ^k),
y(θ^k, μ^k), s(θ^k, μ^k)) approaches the solution set of (5.3) if (θ^k, μ^k) ↓ (0, 0). When
(5.3) has a feasible point but not an interior point, (x(θ^k, μ^k), y(θ^k, μ^k), s(θ^k, μ^k))
approaches the solution set of (5.3) if (θ^k, μ^k) ↓ (0, 0) in such a way that μ^k/θ^k is
bounded. For any μ* > 0, if (θ^k, μ^k) approaches (θ_l, μ*) then ||(x(θ^k, μ^k), y(θ^k, μ^k),
s(θ^k, μ^k))|| is unbounded.
Outline of the proof: Since the centers are solutions of the system (5.28) for
(θ, μ) ∈ T, S is a surface by the implicit function theorem. Suppose that the
problem (5.3) has an interior point (x', y', s'), where x' > 0 and s' > 0. Then we
have that

(x^0)^T s(θ, μ) + (s^0)^T x(θ, μ) ≤ θ(x^0)^T s^0 + (1 − θ)((x^0)^T s' + (s^0)^T x') + nμ/θ.

So (x(θ, μ), y(θ, μ), s(θ, μ)) is bounded if μ/θ is bounded. Hence every cluster point
of (x(θ, μ), y(θ, μ), s(θ, μ)) is a solution of (5.3) if (θ, μ) ↓ (0, 0). If (θ^k, μ^k) goes to
(θ_l, μ*) with μ* > 0, an element of x(θ, μ) or s(θ, μ) goes to 0, which implies that
the other element is unbounded because x_i s_i → μ* > 0. □

Using the results in Theorem 5.5.1, we construct an algorithm for solving the primal-
dual problem (5.3). The algorithm generates a sequence of approximate points of
(x(θ^k, μ^k), y(θ^k, μ^k), s(θ^k, μ^k)) for (θ^k, μ^k) ∈ T which converges to (0, 0) as k → ∞,
if the primal-dual problem (5.3) is feasible.
Let (x^k, y^k, s^k) be a current iterate. For (θ, μ) := (θ', μ'), we compute the Newton
direction (Δx, Δy, Δs) of the system (5.28) at (x^k, y^k, s^k), that is, the solution of

(5.29)

While we have used different step sizes for the primal variables and for the dual
variables in Algorithm A, in the following algorithm we use a single step size for
all the variables, so that the iterates generated by the algorithm are feasible for the
perturbed primal-dual problem (5.27).
5.6 A PREDICTOR-CORRECTOR
ALGORITHM
We define a path on the surface S. Let μ^0 = (x^0)^T s^0/n. Define

From Theorem 5.5.1, the center (x(θ, μ), y(θ, μ), s(θ, μ)) on P_2 approaches the
solution set of the problem (5.3) as θ → 0 if the problem is feasible, and it diverges
as θ → θ_l > 0 if the problem is infeasible. We call P_2 a path of infeasible centers,
because it consists of infeasible points of (5.3) unless the initial point is feasible.

N'(β) := {(x, y, s) : x > 0, s > 0, ||Xs − μe|| ≤ βμ, μ = θμ^0, θ > 0,
          Ax = b − θb̄, A^T y + s = c − θc̄}.

This set N'(β) is much smaller than N because it uses the Euclidean norm ||Xs − μe||
to measure the closeness to the path. By generating a sequence of iterates in this
smaller neighborhood, we construct a theoretically better algorithm.
Algorithm B1: Set β_1 := .25 and β_2 := .5. Choose the parameter values ρ, ε, ε_P,
and ε_D. Let (x^0, y^0, s^0) be an initial interior point such that ||X^0 s^0 − μ^0 e|| ≤
β_1 μ^0 for μ^0 := (x^0)^T s^0/n. Set k := 0 and θ^0 := 1.

Step 1: If the conditions in (5.9) hold true at the current iterate (x^k, y^k, s^k) then
output it as an approximate solution and stop. If the condition (5.11) holds
true for θ_P^k = θ^k and θ_D^k = θ^k then stop.
Step 2: Compute the solution (Δx, Δy, Δs) of the system (5.29) for (θ', μ') = (0, 0)
at (x^k, y^k, s^k). Compute

ᾱ := max{α : (x^k, y^k, s^k) + α'(Δx, Δy, Δs) ∈ N'(β_2) for any α' ∈ [0, α)}.

Set

(x', y', s') := (x^k, y^k, s^k) + ᾱ(Δx, Δy, Δs),
θ^{k+1} := (1 − ᾱ)θ^k,
μ^{k+1} := θ^{k+1} μ^0.

Step 3: Compute the solution (Δx', Δy', Δs') of the system (5.29) for
(θ', μ') = (θ^{k+1}, μ^{k+1}) at (x', y', s').
Set (x^{k+1}, y^{k+1}, s^{k+1}) := (x', y', s') + (Δx', Δy', Δs').
In each loop of Algorithm B1, we compute two directions, so that one iteration
of Algorithm B1 corresponds to two iterations of Algorithm B. Step 2 is called a
predictor step and Step 3 is a corrector step. At the predictor step, we try
to decrease the values of θ^{k+1} and μ^{k+1} as much as possible subject to the condition
that the new iterate is in the neighborhood N'(β_2). Then at the corrector step,
we compute a point near the path of centers P_2. We shall show that the point
computed at the corrector step belongs to the smaller neighborhood N'(β_1).
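The predictor step size ᾱ of Step 2 can be located numerically. The sketch below is our own illustration (not the book's procedure): `beta` plays the role of β_2, the target μ shrinks by the factor (1 − α) along the predictor direction, and the largest admissible α is found by bisection under the assumption that the admissible step sizes form an interval containing 0, which holds for iterates close to the path.

```python
import numpy as np

def in_neighborhood(x, s, mu, beta):
    # Membership test for N'(beta): x, s > 0 and ||Xs - mu e|| <= beta * mu.
    return bool(np.all(x > 0) and np.all(s > 0)
                and np.linalg.norm(x * s - mu) <= beta * mu)

def predictor_step_size(x, s, dx, ds, mu, beta=0.5, tol=1e-10):
    """Approximate the largest alpha in [0, 1] keeping the predictor
    iterate in N'(beta), where the target shrinks as (1 - alpha) * mu.
    Assumes the admissible alphas form an interval containing 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_neighborhood(x + mid * dx, s + mid * ds, (1.0 - mid) * mu, beta):
            lo = mid
        else:
            hi = mid
    return lo

# Example: start exactly on the central path (Xs = mu e) and move along a
# made-up direction; the step is limited by the shrinking target mu.
x = np.ones(2); s = np.ones(2); mu = 1.0
dx = -0.1 * np.ones(2); ds = -0.1 * np.ones(2)
alpha = predictor_step_size(x, s, dx, ds, mu, beta=0.5)
```

With these numbers the condition ||X(α)s(α) − (1 − α)μe|| ≤ .5(1 − α)μ fails slightly beyond α ≈ 0.3, so the bisection returns a step of roughly that size.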
Lemma 5.6.1 For any k, (x^k, y^k, s^k) is a feasible point of (5.27) for θ = θ^k. More-
over, μ^k = (x^k)^T s^k/n and (x^k, y^k, s^k) ∈ N'(β_1) for β_1 = .25.
Proof. Suppose that the assertion of the lemma is true for k. We shall prove that it
is also true for k + 1. We have that

Ax' − b = Ax^k + ᾱ(−Ax^k + b) − b
        = −(1 − ᾱ)(−Ax^k + b)
        = −(1 − ᾱ)θ^k b̄
        = −θ^{k+1} b̄,

and similarly

A^T y' + s' − c = −θ^{k+1} c̄.

From these equalities and the step size ᾱ at Step 2, (x', y', s') is a feasible point of
(5.27) for θ = θ^{k+1}, and it is in N'(β_2) for β_2 = .5. By Step 3, (Δx', Δy', Δs') is a
solution of the system

A Δx' = −Ax' + b − θ^{k+1} b̄ = 0,
A^T Δy' + Δs' = −A^T y' − s' + c − θ^{k+1} c̄ = 0,
S' Δx' + X' Δs' = −X' s' + μ^{k+1} e.

Let D' := (X')^{.5}(S')^{−.5} for X' := diag(x') and S' := diag(s'). From the system of
equations above, we see that (Δx')^T Δs' = 0,

(x^{k+1})^T s^{k+1} = (x')^T s' + ((s')^T Δx' + (x')^T Δs') + (Δx')^T Δs'
                    = (x')^T s' + (−(x')^T s' + nμ^{k+1}) + 0
                    = nμ^{k+1},
and, for each i,
    |Δx'_i Δs'_i| ≤ (|s'_i Δx'_i| + |x'_i Δs'_i|)² / (4 x'_i s'_i)
                 ≤ (μ^{k+1} − x'_i s'_i)² / (4(1 − β₂)μ^{k+1}),
and
    ‖(X' + αΔX')(s' + αΔs') − μ^{k+1}e‖
      = ‖X's' + α(−X's' + μ^{k+1}e) + α²ΔX'Δs' − μ^{k+1}e‖
      ≤ (1 − α)‖X's' − μ^{k+1}e‖ + α²‖ΔX'Δs'‖
      ≤ .5(1 − α)μ^{k+1} + α²(√2/8)μ^{k+1}.
These relations imply that for each α ∈ [0, 1], x' + αΔx' > 0 and s' + αΔs' > 0,
and (x', y', s') + α(Δx', Δy', Δs') ∈ N'(.5(1 − α) + .25α²); in particular, we see that
(x^{k+1}, y^{k+1}, s^{k+1}) ∈ N'(β₁) for β₁ = .25. □
Since θ^{k+1} = (1 − α̂)θ^k at each iteration of Algorithm B1, we shall derive a lower bound
on α̂, and then prove the theorem.
Lemma 5.7.2 At each iteration of Algorithm B1, if ‖ΔX Δs‖ ≤ η₃ μ^k for some
η₃ > 0 then α̂ ≥ ᾱ, where ᾱ > 0 satisfies ᾱ²η₃ = (1 − ᾱ)(β₂ − β₁), so that
ᾱ ≥ 1/O(√η₃).
Proof. Since (x^k, y^k, s^k) ∈ N'(β₁) and (Δx, Δy, Δs) is the solution of (5.29) for
(θ', μ') := (0, 0), we have that
    A(x^k + αΔx) = Ax^k + α(−Ax^k + b)
                 = (1 − α)(b − θ^k b̄) + αb
                 = b − (1 − α)θ^k b̄,
similarly
    A^T(y^k + αΔy) + (s^k + αΔs) = c − (1 − α)θ^k c̄,
and
    ‖(X^k + αΔX)(s^k + αΔs) − (1 − α)μ^k e‖ = ‖(1 − α)(X^k s^k − μ^k e) + α²ΔXΔs‖
                                            ≤ (1 − α)β₁μ^k + α²η₃μ^k.
From the conditions above and the continuity with respect to α ∈ [0, ᾱ], we also have
that x^k + αΔx > 0 and s^k + αΔs > 0. Hence (x^k, y^k, s^k) + α(Δx, Δy, Δs) ∈ N'(β₂)
for any α ∈ [0, ᾱ], which implies α̂ ≥ ᾱ. □
Proof of Theorem 5.7.1: Since (x^k, y^k, s^k) ∈ N'(β₁) and μ^k = (x^k)^T s^k / n, it is
easy to see that (x^k, y^k, s^k) ∈ N if γ ≤ 1 − β₁. So the results in Lemmas 5.3.3 and
5.3.4 hold true for θ'_P := θ^k, θ'_D := θ^k, ε := 1, γ := 1 − β₁, μ' := 0, and λ := 0. If
IIP Algorithms 183
the condition (5.11) does not hold true at the k-th iterate then, from Lemmas 5.3.3 and
5.3.4, we obtain a bound on ‖D⁻¹Δx‖,
where κ₁ is defined in Lemma 5.3.4. We also have this bound for ‖DΔs‖. Hence
(5.31)
If ρ is finite then so is ‖ΔXΔs‖/μ^k. Hence from Lemma 5.7.2, the step size α̂ is
bounded away from 0, and the algorithm terminates in a finite number of iterations.
We can prove the other assertions in Theorem 5.7.1 as we have done in the proof of
Theorem 5.3.1. □
Theorem 5.7.3 Suppose that the parameter values ε, ε_P, ε_D, ρ, and the initial
point (x^0, y^0, s^0) are as in Theorem 5.4.1. Then Algorithm B1 terminates in O(nL)
iterations.
Outline of the proof: From (x^0, s^0) = ρ(e, e), (5.31), and Lemmas 5.4.2 and 5.7.2,
we obtain that α̂ is at least 1/O(n). Since θ^{k+1} = (1 − α̂)θ^k, the condition (5.30)
holds true for k = O(nL). Hence the number of iterations is bounded by O(nL). □
Theorem 5.7.4 Let δ > 0 be a constant independent of the data. Suppose that the
parameter values ε, ε_P, ε_D, and ρ are as in Theorem 5.4.1. For a given initial point
(x^0, y^0, s^0) ∈ N'(β₁), if there exists a solution (u, v, w) of Au = b and A^T v + w = c
such that (5.23) holds true, then Algorithm B1 terminates in O(√n L) iterations.
Outline of the proof: From (5.24) and the same bound for ‖DΔs‖, ‖ΔXΔs‖ is
O(n)μ^k. So α̂ is at least 1/O(√n) from Lemma 5.7.2. Since θ^{k+1} = (1 − α̂)θ^k, the
condition (5.30) holds true for k = O(√n L), and the number of iterations is bounded
by O(√n L). □
IIP algorithms solving only primal or only dual linear programming problems were pro-
posed by Freund [1] and Muramatsu and Tsuchiya [18]. The algorithm in [1] traces
a path of centers, which is a projection of P₂ onto the primal space, and uses a short
step size at each iteration. The algorithm in [18] is an extension of Dikin's affine
scaling algorithm, so that it is able to start from an infeasible interior point.
Although the IIP algorithms presented in this chapter use a large initial point or an
almost feasible initial point to achieve polynomiality, Freund's algorithm [1] can start
from a smaller initial point, and the number of iterations is bounded by O(n²L).
Mizuno et al. [14] proposed a potential reduction IIP algorithm which requires
O(n^{2.5}L) iterations. They also proposed a variant which requires O(nL) iterations.
The IIP algorithm presented by Mizuno and Jarre [13] is different from the others,
because it uses a projection on a convex set at each iteration, which may increase the
infeasibility.
REFERENCES
[1] R. Freund, "An infeasible-start algorithm for linear programming whose com-
plexity depends on the distance from the starting point to the optimal solution,"
Working paper 3559-93-MSA, Sloan School of Management, Massachusetts In-
stitute of Technology, USA (1993).
[2] N. Karmarkar, "A new polynomial-time algorithm for linear programming,"
Combinatorica 4 (1984) 373-395.
[3] M. Kojima, N. Megiddo, and S. Mizuno, "A primal-dual infeasible-interior-point
algorithm for linear programming," Mathematical Programming 61 (1993) 261-
280.
[4] M. Kojima, S. Mizuno, and A. Yoshise, "A primal-dual interior point algorithm
for linear programming," in: Progress in Mathematical Programming, Interior-
Point and Related Methods, ed. N. Megiddo (Springer-Verlag, New York, 1989)
29-47.
[5] M. Kojima, S. Mizuno, and A. Yoshise, "A polynomial-time algorithm for a class
of linear complementarity problems," Mathematical Programming 44 (1989) 1-26.
[6] I. J. Lustig, "Feasibility issues in a primal-dual interior-point method for linear
programming," Mathematical Programming 49 (1990/91) 145-162.
[7] I. J. Lustig, R. E. Marsten, and D. F. Shanno, "Computational experience with
a primal-dual interior point method for linear programming," Linear Algebra
and Its Applications 152 (1991) 191-222.
[8] I. J. Lustig, R. E. Marsten, and D. F. Shanno, "Interior point methods: com-
putational state of the art," ORSA Journal on Computing 6 (1994) 1-14.
[9] R. Marsten, R. Subramanian, M. Saltzman, I. J. Lustig, and D. Shanno, "In-
terior point methods for linear programming: Just call Newton, Lagrange, and
Fiacco and McCormick!," Interfaces 20 (1990) 105-116.
[10] N. Megiddo, "Pathways to the optimal set in linear programming," in: Progress
in Mathematical Programming, Interior-Point and Related Methods, ed. N.
Megiddo (Springer-Verlag, New York, 1989) 131-158.
[11] S. Mizuno, "Polynomiality of infeasible-interior-point algorithms for linear pro-
gramming," Mathematical Programming 67 (1994) 109-119.
[12] S. Mizuno, "A superlinearly convergent infeasible-interior-point algorithm for
geometrical LCP's without a strictly complementary condition," Preprint 214,
Mathematisches Institut der Universität Würzburg, Germany (1994).
[16] S. Mizuno, M. J. Todd, and Y. Ye, "A surface of analytic centers and infeasible-
interior-point algorithms for linear programming," Mathematics of Operations
Research 20 (1995) 52-67.
[18] M. Muramatsu and T. Tsuchiya, "An affine scaling method with an infeasible
starting point," Research Memorandum 490, The Institute of Statistical Math-
ematics, Tokyo (1993).
[19] F. A. Potra, "An infeasible interior-point predictor-corrector algorithm for linear
programming," Report No. 26, Department of Mathematics, The University of
Iowa, USA (1992).
[20] F. A. Potra, "A quadratically convergent predictor-corrector method for solving
linear programs from infeasible starting points," Mathematical Programming 67
(1994) 383-406.
[21] J. Stoer, "The complexity of an infeasible interior-point path-following method
for the solution of linear programs," Optimization Methods and Software 3
(1994) 1-12.
[22] K. Tanabe, "Centered Newton method for mathematical programming," in:
System Modeling and Optimization, eds. M. Iri and K. Yajima (Springer-Verlag,
New York, 1988) 197-206.
[23] K. Tanabe, "Centered Newton method for linear programming: Interior and
'exterior' point method' (Japanese)," in: New Methods for Linear Programming
3, ed. K. Tone, (The Institute of Statistical Mathematics, Tokyo, Japan, 1990)
98-100.
[24] S. Wright, "An infeasible-interior-point algorithm for linear complementarity
problems," Mathematical Programming 67 (1994) 29-52.
[26] Y. Zhang, "On the convergence of a class of infeasible interior-point methods for
the horizontal linear complementarity problem," SIAM Journal on Optimization
4 (1994) 208-227.
[27] Y. Zhang and D. Zhang, "Superlinear convergence of infeasible interior-point
methods for linear programming," Mathematical Programming 66 (1994) 361-
378.
6
IMPLEMENTATION OF
INTERIOR-POINT METHODS
FOR LARGE SCALE LINEAR
PROGRAMS
Erling D. Andersen¹, Jacek Gondzio²,
Csaba Meszaros³, Xiaojie Xu⁴
¹ Department of Management, Odense University,
Campusvej 55, DK-5230 Odense M, Denmark.
² Logilab, HEC Geneva, Section of Management Studies,
University of Geneva, 102 Bd Carl Vogt,
CH-1211 Geneva 4, Switzerland (on leave from the
Systems Research Institute, Polish Academy of Sciences,
Newelska 6, 01-447 Warsaw, Poland).
³ Department of Operations Research and Decision Support Systems,
Computer and Automation Research Institute,
Hungarian Academy of Sciences, Lagymanyosi u. 11, Budapest, Hungary.
⁴ Institute of Systems Science, Academia Sinica,
Beijing 100080, China.
ABSTRACT
In the past 10 years the interior point methods (IPM) for linear programming have gained
extraordinary interest as an alternative to the sparse simplex based methods. This has
initiated a fruitful competition between the two types of algorithms which has led to very
efficient implementations on both sides. The significant difference between interior point
and simplex based methods is reflected not only in the theoretical background but also in
the practical implementation. In this paper we give an overview of the most important
characteristics of advanced implementations of interior point methods. First, we present
the infeasible-primal-dual algorithm which is widely considered the most efficient general
purpose IPM. Our discussion includes various algorithmic enhancements of the basic
algorithm. The only shortcoming of the "traditional" infeasible-primal-dual algorithm is
its inability to detect a possible primal or dual infeasibility of the linear program. We
discuss how this problem can be solved with the homogeneous and self-dual model.
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 189-252.
© 1996 Kluwer Academic Publishers.
The practical efficiency of IPMs is highly dependent on the linear algebra used. Hence, we
discuss this subject in great detail. Finally we cover the related topics of preprocessing and
obtaining an optimal basic solution from the interior-point solution.
6.1 INTRODUCTION
As early as the late 1940s, almost at the same time when Dantzig presented the
famous simplex method, several researchers, including von Neumann (1947) [68],
Hoffman et al. (1953) [41] and Frisch (1955) [27], proposed interior-point algorithms
which traverse across the interior of the feasible region in an attempt to avoid the
combinatorial complexities of vertex-following algorithms.
However, the expensive computational steps they require, the possibility of numerical
instability in the calculations, and some discouraging experimental results led to a
consensus view that such algorithms would not be competitive with the simplex
method in practice.
In fact, it would have been very difficult to find serious discussion of any approach
other than the simplex method before 1984 when Karmarkar [46] presented a novel
interior point method, which, as he claimed, was able to solve large-scale linear
programs up to 50 times faster than the simplex method. Karmarkar's announcement
led to an explosion of interest in interior point methods (IPMs) among researchers
and practitioners.
Soon after Karmarkar's publication, Gill et al. [31] showed a formal relationship be-
tween the new interior point method and the classical logarithmic barrier method.
The barrier method is usually attributed to Frisch (1955) [27] and is formally studied
in Fiacco and McCormick [23] in the context of nonlinear optimization. Much re-
search has concentrated on the common theoretical foundations of linear and nonlin-
ear programming. A fundamental theme is the creation of continuously parametrized
families of approximate solutions that asymptotically converge to the exact solution.
A basic iteration of such a path-following algorithm consists of moving from one point
in a certain neighborhood of a path to another one called a target that preserves the
property of lying in the neighborhood of the path and is "near" to the exact solution.
In the past ten years several efficient implementations of interior point methods have
been developed. Lustig, Marsten and Shanno [54] have made particularly important
contributions in this area with their code OB1. Although implementations of
the simplex method have improved a lot in recent years [78, 9, 24], extensive
numerical tests (cf. [54]) have indicated conclusively that an efficient and robust
implementation of an interior point method can solve many large scale LP problems
substantially faster than the state-of-the-art simplex code.
The most efficient interior point method today is the infeasible-primal-dual algo-
rithm. Therefore in this chapter we discuss techniques used in an efficient and
robust implementation of the primal-dual method. Although the chapter focuses on
implementation techniques, some closely related theoretical issues are addressed as
well.
The practical success of any IPM implementation depends on the efficiency and the
reliability of the linear algebra kernel in it. We focus on these issues in Section 6.4.
The major work in a single iteration of any IPM consists of solving a set of linear
equations, the so-called Newton equation system. This system reduces in all IPMs
to the problem that is equivalent to an orthogonal projection of a vector on the
null space of the scaled linear operator. The diagonal scaling matrix depends on
the variant of the method used and it changes considerably in subsequent IPM
iterations. All general purpose IPM codes use a direct approach [19] to solve the
Newton equation system. The alternative, iterative methods, have not been used
as much due to difficulties in choosing a preconditioner. There are two competitive
direct approaches for solving the Newton equations: the augmented system approach
[6, 7] and the normal equations approach. The former requires factorization of a
symmetric indefinite matrix, the latter works with a smaller positive definite matrix.
In Section 6.4, we discuss both these approaches in detail, analyse their advantages
and point out some difficulties that arise in their implementation.
Interior point methods are now very reliable optimization tools. Sometimes only out
of inertia, the operations research community keeps using the simplex
method in applications that could undoubtedly benefit from the new interior point
technology. This is particularly important in those applications which require the
solution of very large linear programs (with tens or hundreds of thousands of constraints
and variables). We thus end the chapter with a brief guide to the interior point
software available nowadays. We shall list in Section 6.8 both commercial and ex-
perimental (research) LP codes based on interior point methods. Among the latter,
there exist very efficient programs that are public domain in the form of source code
and are competitive (in terms of speed) with the best commercial products.
Although the past ten years brought an enormous development of both the theory
and the implementations of IPMs, several issues still remain open. We shall address
them in Section 6.9 before giving our conclusions in Section 6.10.
6.2 THE PRIMAL-DUAL ALGORITHM
The algorithm generates iterates which are positive (i.e., interior with respect
to the inequality constraints) but do not necessarily satisfy the equality constraints.
Hence the name infeasible-interior-point primal-dual method. For the sake of brevity,
we call it the primal-dual algorithm.
The first theoretical results for this method are due to Megiddo [58] who proposed to
apply a logarithmic barrier method to the primal and the dual problems at the same
time. Independently, Kojima, Mizuno and Yoshise [49] developed the theoretical
background of this method and gave the first complexity results.
The first implementations [57, 16] showed great promise and encouraged further re-
search in this field. These implementations have been continuously improved and
have led to the development of several highly efficient LP codes. Today's computa-
tional practice of the primal-dual implementation follows [51, 53, 54, 62, 36].
The practical implementations of the primal-dual algorithm still differ a lot from the
theoretical algorithms with polynomial complexity since the latter give too much
importance to the worst-case analysis. This gap between theory and practice has
been closed recently by Kojima, Megiddo and Mizuno [48] who show that the primal-
dual algorithm with some safe-guards has good theoretical properties.
6.2.1 Fundamentals
Let us consider a primal linear programming problem
    minimize    c^T x
    subject to  Ax = b,                (6.1)
                x + s = u,
                x, s ≥ 0,
and its dual
    maximize    b^T y − u^T w
    subject to  A^T y − w + z = c,     (6.2)
                z, w ≥ 0.
With some abuse of mathematics, to derive the primal-dual algorithm one replaces the
nonnegativity constraints with logarithmic penalty terms, which gives the following
logarithmic barrier function
    c^T x − μ Σ_{j=1}^n (ln x_j + ln s_j).
The first order optimality conditions of the resulting barrier problem are
    Ax = b,
    x + s = u,
    A^T y + z − w = c,                 (6.4)
    XZe = μe,
    SWe = μe,
where X, S, Z and W are diagonal matrices with the elements x_j, s_j, z_j and w_j,
respectively, e is the n-vector of all ones, μ is a barrier parameter, and z = μX⁻¹e.
Let us observe that the first three of the above equations are linear and force primal
and dual feasibility of the solution. The last two equations are nonlinear and depend
on the barrier parameter μ. They become the complementarity conditions for μ = 0,
which together with the feasibility constraints provide optimality of the solutions.
The quantity g = x^T z + s^T w measures the error in the complementarity and is called
the complementarity gap. Note that for a feasible point, this value reduces to the usual
duality gap. For a μ-center, for example,
    g = 2μe^T e = 2nμ,                 (6.5)
and it vanishes at an optimal solution.
One iteration of the primal-dual algorithm makes one step of Newton's method
applied to the first order optimality conditions (6.4) with a given J-l and then J-l is
updated (usually decreased). The algorithm terminates when the infeasibility and
the complementarity gap are reduced below predetermined tolerances.
    [ A    0    0    0    0  ] [ Δx ]   [ ξ_b      ]
    [ I    0    I    0    0  ] [ Δy ]   [ ξ_u      ]
    [ 0    A^T  0    I   −I  ] [ Δs ] = [ ξ_c      ]      (6.6)
    [ Z    0    0    X    0  ] [ Δz ]   [ μe − XZe ]
    [ 0    0    W    0    S  ] [ Δw ]   [ μe − SWe ]
where
    ξ_b = b − Ax,
    ξ_u = u − x − s,
    ξ_c = c − A^T y − z + w
denote the violations of the primal and the dual constraints, respectively. We call
the linear system (6.6) the Newton equations system.
Note that the primal-dual method does not require feasibility of the solutions (ξ_b, ξ_u
and ξ_c might be nonzero) during the optimization process. Feasibility is attained
during the process as optimality is approached. It is easy to verify that if a step
of length one is made in the Newton direction (6.6), then feasibility is reached
immediately. This is seldom the case, as a smaller stepsize usually has to be chosen
(a damped Newton iteration is taken) to preserve positivity of x, s, z and w. If this is
the case and a stepsize α < 1 is applied, then the infeasibilities ξ_b, ξ_u and ξ_c are
reduced by a factor (1 − α).
Let us take a closer look at the Newton equation system. After elimination of
Δs, Δz and Δw, the system (6.6) reduces to the form
    [ −D⁻²  A^T ] [ Δx ]   [ r ]
    [   A    0  ] [ Δy ] = [ h ],      (6.8)
where
    D² = (X⁻¹Z + S⁻¹W)⁻¹,
    r = ξ_c − X⁻¹(μe − XZe) + S⁻¹(μe − SWe) − S⁻¹Wξ_u,    (6.9)
    h = ξ_b.
The solution of the reduced Newton equations system (6.8) is the computationally
most involved step of any interior point method. We shall discuss it in detail in
Section 6.4.
Once the system (6.8) has been solved, Δx and Δy are used to compute Δs, Δz and
Δw by (6.7). Next the maximum step sizes in the primal space (α_P) and the dual space (α_D)
are computed such that the nonnegativity of the variables is preserved. These step sizes
are slightly reduced by a factor α₀ < 1 to prevent hitting the boundary. Finally, a
new iterate is computed as follows
    x^{k+1} := x^k + α₀α_P Δx,
    s^{k+1} := s^k + α₀α_P Δs,
    y^{k+1} := y^k + α₀α_D Δy,         (6.10)
    z^{k+1} := z^k + α₀α_D Δz,
    w^{k+1} := w^k + α₀α_D Δw.
After making the step, the barrier parameter μ is updated and the process is
repeated.
From theory it is known that if the barrier parameter is only reduced slightly in each
iteration, it is possible to take long steps in the Newton direction. This implies fast
convergence of Newton's method, and all iterates stay close to the central path. In
practice it is not efficient to reduce the barrier parameter slightly in every iteration
and stay very close to the central path. (Recall that we want to find a solution where the
barrier parameter is zero.) On the other hand, it is not efficient to move too far away
from the central path and close to the boundary, because in that case the algorithm
might get stuck taking small steps in the Newton direction. Hence, convergence will
be painfully slow.
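As a sketch, the iteration described in this section can be condensed to a few lines for the standard-form special case min c^T x s.t. Ax = b, x ≥ 0 (the upper bounds u and the variables s, w are dropped for brevity); the fixed centering factor γ = 0.1, the all-ones starting point, and the tolerances are illustrative assumptions rather than the tuned heuristics of a production code.

```python
import numpy as np

def primal_dual_lp(A, b, c, gamma=0.1, alpha0=0.99995, tol=1e-8, maxit=500):
    m, n = A.shape
    x, y, z = np.ones(n), np.zeros(m), np.ones(n)   # infeasible starting point
    for _ in range(maxit):
        xi_b = b - A @ x                            # primal residual
        xi_c = c - A.T @ y - z                      # dual residual
        if max(np.linalg.norm(xi_b), np.linalg.norm(xi_c), x @ z) <= tol:
            break
        mu = x @ z / n
        r3 = gamma * mu - x * z                     # centering right-hand side
        # reduce the Newton equations to the normal equations A diag(x/z) A^T dy = h
        M = (A * (x / z)) @ A.T
        dy = np.linalg.solve(M, xi_b + A @ ((x * xi_c - r3) / z))
        dz = xi_c - A.T @ dy
        dx = (r3 - x * dz) / z
        # separate primal/dual ratio tests, damped by alpha0 and capped at 1
        aP = min([1.0] + [alpha0 * (-x[i] / dx[i]) for i in range(n) if dx[i] < 0])
        aD = min([1.0] + [alpha0 * (-z[i] / dz[i]) for i in range(n) if dz[i] < 0])
        x, y, z = x + aP * dx, y + aD * dy, z + aD * dz
    return x, y, z
```

Note how the primal and dual residuals contract by the factors (1 − α_P) and (1 − α_D) respectively, so feasibility is typically attained after a few iterations, well before complementarity.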
Starting point
The first difficulty arising in implementing the primal-dual method is the choice
of an initial solution. (Note that this problem is solved in an elegant way when a
homogeneous model is used, cf. Section 6.3.) One would like this point to be well
centered and to be as close to primal and dual feasibility as possible. Surprisingly,
points that are relatively close to the optimal solution (but are not well centered)
often lead to bad performance and/or numerical difficulties.
One popular choice is to compute the starting point from the auxiliary problem
    minimize    ϱ c^T x + ½(x^T x + s^T s)
    subject to  Ax = b,                (6.11)
                x + s = u,
where ϱ is a predetermined weight parameter. A solution of (6.11) can be given by
an explicit formula and can be computed at a cost comparable to a single interior
point iteration. It is supposed to minimize the norm of the primal solution (x, s) and
it promotes points that are better in the sense of the LP objective. As the solution
of (6.11) may have negative components in x and s, those negative components are
pushed towards positive values sufficiently bounded away from zero (all elements
smaller than δ are replaced by δ, say δ = 1). Independently, an initial dual solution
(y, z, w) is chosen similarly to satisfy y = 0 and the dual constraint (6.2). Again, all
elements of z and w smaller than δ are replaced by δ.
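A sketch of such a heuristic: assuming the auxiliary problem minimizes ϱc^T x + ½(x^T x + s^T s) subject to Ax = b, x + s = u (a guess at the precise form of (6.11); the chapter's objective may differ), the minimizer follows from one solve with AA^T, after which components are clipped at δ as described above.

```python
import numpy as np

def starting_point(A, b, c, u, rho=1.0, delta=1.0):
    # Closed-form minimizer of rho*c'x + 0.5*(x'x + s's)
    # subject to Ax = b, x + s = u  (an assumed variant of (6.11)).
    lam = np.linalg.solve(A @ A.T, 2.0 * b - A @ (u - rho * c))
    x = 0.5 * (A.T @ lam + u - rho * c)
    s = u - x
    x, s = np.maximum(x, delta), np.maximum(s, delta)   # push away from zero
    # dual start with y = 0; the clipping may leave (6.2) slightly violated,
    # which an infeasible-start method tolerates
    y = np.zeros(A.shape[0])
    z = np.maximum(c, delta)
    w = np.maximum(z - c, delta)
    return x, s, y, z, w
```

The cost is dominated by the single factorization of AA^T, which is indeed comparable to one interior point iteration.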
Stepsize
The simplest way to ensure that all iterates remain close to the central path is to
decrease the barrier parameter slowly in subsequent IPM iterations. This gave rise
to so-called short step methods, which are known to have nice theoretical properties
but also to demonstrate hopelessly slow convergence in practice.
In long step methods the barrier parameter is reduced much faster than what the
theory suggests. To preserve good convergence properties of this strategy the theory
requires that several Newton steps are computed within each primal-dual iteration
such that the new point is in a close neighborhood of the central path. In practice
this is ignored and only one Newton step is made before the barrier parameter is
reduced. A negative consequence is that the iterates cannot be kept close to the
central path. However, the computational practice shows that even if they remain
in a relatively large vicinity of the central path, the algorithm still converges fast.
where γ ∈ [0, 1]. The choice γ = 1 corresponds to a pure recentering step, while
the choice γ < 1 is expected to reduce the complementarity gap in the next
iterate. Indeed, if the iterates are feasible, the complementarity gap is guaranteed to
be reduced by a factor (1 − α(1 − γ)).
The choice of γ or, more generally, the choice of a point (a so-called target) to which
the next iterate will hopefully be taken is a crucial issue for the efficiency of the
primal-dual method. We shall discuss it in detail in Section 6.6.
Let us observe that current implementations use different stepsizes in the primal and
dual spaces. This implies that the infeasibility is reduced faster than if the same
stepsize was used. All implementations use a variant of the following strategy. First
the maximum possible stepsizes are computed by the formulae
    α_P := max{α > 0 : (x, s) + α(Δx, Δs) ≥ 0},
    α_D := max{α > 0 : (z, w) + α(Δz, Δw) ≥ 0},    (6.13)
and these step sizes are slightly reduced by a factor α₀ = 0.99995 to ensure that
the new point is strictly positive. Some codes use a smaller α₀ in those iterations in
which 0.99995 might be too aggressive. However, in most cases this aggressive choice
of α₀ seems to be the best.
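The ratio test (6.13), with the damping factor α₀ and a cap at step length one, fits in a few lines; the sample vectors below are illustrative.

```python
import numpy as np

def max_step(v, dv, alpha0=0.99995):
    """Largest step with v + alpha*dv >= 0, damped by alpha0 and capped at 1."""
    neg = dv < 0
    if not np.any(neg):
        return 1.0                       # no component blocks the step
    return min(1.0, alpha0 * float(np.min(-v[neg] / dv[neg])))

# the second component blocks the primal step at 0.5; the dual step is unblocked
aP = max_step(np.array([1.0, 0.5]), np.array([0.2, -1.0]))
aD = max_step(np.array([2.0, 1.0]), np.array([0.5, 1.0]))
```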
Stopping criteria
Interior point algorithms terminate when the first order optimality conditions (6.4)
are satisfied with some predetermined tolerance. In the case of the primal-dual
method, this translates to the following conditions imposed on the relative primal
and dual feasibility and the relative duality gap
    ‖Ax − b‖/(1 + ‖b‖) ≤ 10^{−p}  and  ‖x + s − u‖/(1 + ‖u‖) ≤ 10^{−p},    (6.14)
    ‖A^T y + z − w − c‖/(1 + ‖c‖) ≤ 10^{−p},    (6.15)
(6.16)
where p is the number of accurate digits in the solution. An 8-digit exact solution
(p = 8) is typically required in the literature.
Let us observe that conditions (6.14-6.16) depend strongly on the scaling of the
problem. In particular, the denominators of their left hand sides usually decrease
after scaling of the problem.
In practice, it is rare that condition (6.16) is satisfied while at the same time one of the
conditions (6.14) or (6.15) does not hold. The explanation of this phenomenon comes
from the analysis of the first order optimality conditions (6.4). Observe that the first
three equations, that impose primal and dual feasibility, are linear. They are thus
"easier" to satisfy for Newton's method than the last two equations that are nonlinear
and, additionally, change in subsequent interior point iterations. Consequently, the
most important and perhaps the only condition that really has to be checked is
(6.16).
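These tests translate directly into code. The exact form of the relative duality gap condition (6.16) is not reproduced in the text above, so the gap test below is an assumption in the same spirit as (6.14)-(6.15).

```python
import numpy as np

def converged(A, b, c, u, x, s, y, z, w, p=8):
    tol = 10.0 ** (-p)
    rel_pb = np.linalg.norm(A @ x - b) / (1 + np.linalg.norm(b))           # (6.14)
    rel_pu = np.linalg.norm(x + s - u) / (1 + np.linalg.norm(u))           # (6.14)
    rel_d = np.linalg.norm(A.T @ y + z - w - c) / (1 + np.linalg.norm(c))  # (6.15)
    # a (6.16)-style relative duality gap test (assumed form)
    rel_gap = abs(c @ x - (b @ y - u @ w)) / (1 + abs(c @ x))
    return max(rel_pb, rel_pu, rel_d, rel_gap) <= tol
```

Note that all four quantities are scale dependent, which is exactly the issue raised below about scaling of the problem.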
Complexity
At least at one point the theory is still far from the computational practice: in the
estimates of the worst-case complexity. The theoretical bound of O(√n log(1/ε))
iterations to obtain an ε-exact solution of an LP is still extremely pessimistic as,
in practice, the number of iterations is something like O(log n) or O(n^{1/4}). It is
rare that a current implementation of the primal-dual method uses more than 50
iterations to reach 10⁻⁸-optimality.
The first element is the choice of an initial solution. Even though the heuristic pre-
sented in the previous section works well in practice, it is scaling dependent and
there is no guarantee that it produces a well-centered point.
The second element is the lack of a reliable technique to detect infeasibility or un-
boundedness of the LP problem. The infeasibility or unboundedness of one of the
problems (6.1) and (6.2) usually manifests in a rapid growth of the primal or dual
objective function and immediately leads to numerical problems. This is really a
critical point in any implementation of the primal-dual algorithm.
The algorithm presented in this section removes both these drawbacks. It is based
on a skew-symmetric and self-dual artificial LP model first considered by Ye et al.
[90]. Somewhat later Jansen et al. [45] presented the skew-symmetric self-dual
model for a primal-dual pair in a symmetric form. Xu et al. [86, 87] considered a
homogeneous and self-dual linear feasibility (HLF) model that was in fact studied
already in the 1960s by Goldman and Tucker [33, 80]. Xu [84, 85] developed a large
step path following LP algorithm based on the HLF model and implemented it.
The main advantage of the algorithm is that it solves the LP problem without any reg-
ularity assumption concerning the existence of optimal, feasible, or interior feasible
solutions. If the problem is infeasible or unbounded, the algorithm correctly detects
the infeasibility of at least one of the primal and dual problems. Moreover, the
algorithm may start from any positive primal-dual pair, feasible or infeasible, near
the central ray of the positive orthant. Finally, even if the algorithm takes large
steps, it achieves O(√n L)-iteration complexity.
Compared to the primal-dual method from the previous section this algorithm has
only one disadvantage: it requires one additional solve with the factorization of the
Newton equation matrix in each iteration.
This linear feasibility system is homogeneous and has zero as its trivial solution.
The zero solution is of course not of interest, but LP theory tells us that a strictly
complementary solution exists to any linear program. Now the HLF model (6.19) is
an LP problem with zero objective function and a zero right hand side. Furthermore,
it is self-dual. Denote by z the slack vector for the second (inequality) constraint and
by K, the slack scalar for the third (inequality) constraint. By the skew-symmetric
and self-dual property, the complementary pairs are (x, z) and (τ, κ). A strictly
complementary solution for the HLF model satisfies (6.19) and
    Xz = 0,  τκ = 0,    (6.20)
    x + z > 0,  τ + κ > 0,
where X = diag(x). Let (y*, x*, τ*, z*, κ*) be a strictly complementary solution of
the HLF model. We can prove the following:
• If τ* > 0, then (y*/τ*, x*/τ*, z*/τ*) is an optimal strictly complementary so-
lution to (6.17) and (6.18).
• If τ* = 0, then κ* > 0, which implies that c^T x* − b^T y* < 0, i.e., at least one of
c^T x* and −b^T y* is strictly less than zero. If c^T x* < 0 then (6.18) is infeasible;
if −b^T y* < 0 then (6.17) is infeasible; and if both c^T x* < 0 and −b^T y* < 0,
then both (6.17) and (6.18) are infeasible.
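The two cases above can be turned directly into a post-processing routine; the tolerance used to decide whether τ* is "positive" is an implementation assumption, since in floating point one only has a nearly strictly complementary solution.

```python
import numpy as np

def interpret_hlf(b, c, y, x, tau, kappa, tol=1e-9):
    """Classify the outcome from a (near) strictly complementary HLF solution."""
    if tau > tol:
        # tau* > 0: scale back to an optimal primal-dual pair for (6.17)-(6.18)
        return "optimal", x / tau, y / tau
    # tau* = 0 forces kappa* > 0, i.e. c'x* - b'y* < 0
    dual_infeasible = (c @ x) < -tol       # c'x* < 0  certifies (6.18) infeasible
    primal_infeasible = (b @ y) > tol      # -b'y* < 0 certifies (6.17) infeasible
    return "infeasible", primal_infeasible, dual_infeasible
```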
For any (y, x > 0, τ > 0, z > 0, κ > 0), the feasibility residuals and the average
complementarity residual are defined as
    r_P = bτ − Ax,
    r_D = cτ − A^T y − z,
    r_G = c^T x − b^T y + κ,    (6.21)
    μ = (x^T z + τκ)/(n + 1),
respectively. Given (y^0, x^0 > 0, τ^0 > 0, z^0 > 0, κ^0 > 0), the following barrier problem
with a parameter λ defines a central path:
    minimize    z^T x + κτ − λμ^0 Σ_i (ln x_i + ln z_i) − λμ^0 (ln τ + ln κ)
    subject to  Ax − bτ = −λ r_P^0,
                −A^T y + cτ − z = −λ r_D^0,    (6.22)
                b^T y − c^T x − κ = −λ r_G^0,
where (r_P^0, r_D^0, r_G^0) and μ^0 are the initial residuals at (y^0, x^0 > 0, τ^0 > 0, z^0 > 0, κ^0 > 0).
As shown in Xu [87], it is essential to introduce the feasibility residual terms in the right
hand sides of (6.22). Along the central path, the feasibility and complementarity
residuals are reduced at the same rate and eventually converge to zero. The same
rate of reduction in the feasibility and complementarity residuals guarantees that the
limit point is a strictly complementary solution of the HLF model (6.19).
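The definitions in (6.21) translate directly into code:

```python
import numpy as np

def hlf_residuals(A, b, c, y, x, tau, z, kappa):
    # feasibility residuals and average complementarity residual, as in (6.21)
    rP = b * tau - A @ x
    rD = c * tau - A.T @ y - z
    rG = c @ x - b @ y + kappa
    mu = (x @ z + tau * kappa) / (x.size + 1)
    return rP, rD, rG, mu
```

Along the central path all four quantities shrink proportionally to λ, which is the "same rate of reduction" property noted above.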
By using the skew-symmetric property, the first order optimality conditions for the
barrier problem (6.22) are
    Ax − bτ = −λ r_P^0,
    −A^T y + cτ − z = −λ r_D^0,
    b^T y − c^T x − κ = −λ r_G^0,    (6.23)
    Xz = λμ^0 e,
    τκ = λμ^0,
    x, τ, z, κ > 0,
for λ ∈ (0, 1].
It is worth comparing this system with the analogous first order optimality con-
ditions (6.4) used in the primal-dual algorithm presented in the previous section.
Note, for example, that the conditions (6.23) define the central path even though the
model (6.19) does not have an interior point. This is important when highly degenerate
problems are solved. Indeed, for this reason it might be helpful to add feasibility
residuals into (6.4).
Analogously to the primal-dual algorithm, the search direction for the "infeasible"
path following algorithm is generated by applying Newton's method to (6.23). Ac-
tually, in each iteration the algorithm solves the following linear equation system for
the direction (Δy, Δx, Δτ, Δz, Δκ):
(6.24)
where (r_P^k, r_D^k, r_G^k) and μ^k are the residuals at the current point (y^k, x^k > 0, τ^k > 0, z^k > 0,
κ^k > 0) and γ ∈ [0, 1] is a chosen reduction rate of the barrier (or path) parameter.
Setting γ = 0 yields an affine direction, and setting γ = 1 yields a pure centering
direction. After the Newton direction has been computed, a stepsize is chosen, using
the same method as in the primal-dual algorithm, such that the new point is strictly
positive.
The algorithm continues until one of the following stopping criteria is satisfied.
If the step length is chosen such that the updated solution stays in a certain neigh-
borhood of the central path, then a worst case polynomial complexity result can
be established. Xu [84] restricted all iterates to stay within the intersection of an
∞-norm neighborhood and a large 2-norm neighborhood of the central path. In this
case, the implementation achieves O(√n L)-iteration complexity in the worst case.
Clearly the dimension of the Newton equation system solved by the homogeneous
algorithm is slightly larger than that of the corresponding system solved in the primal-
dual method; in fact, the dimension is increased by exactly one. The homogeneous
algorithm can be implemented such that the same factorization as in the primal-dual
method is computed in each iteration. However, the factorization must be used in one
more solve to compute the solution of the Newton equation system; see [86] for details.
      [ -D^{-2}   A^T ] [ Δx ]     [ r ]
      [    A       0  ] [ Δy ]  =  [ h ]        (6.25)
Implementation of IPMs for LP 205
It should be noted that all IPMs solve an identical system of linear equations. The
only difference is in the value of the diagonal matrix D2 and the right-hand side.
This is the reason why the comparison of different variants of interior point methods
is often simplified to a comparison of the number of iterations (Newton steps).
The linear system (6.25) can be solved using either direct or iterative methods. Itera-
tive methods, e.g., conjugate gradient algorithms, are not competitive in the general case
due to the difficulty of choosing a good and computationally cheap preconditioner.
Some success with iterative methods for special LP problems has been obtained; see
[71, 70].
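To make the preconditioning issue concrete, the following sketch (Python with NumPy; the data and the helper name `pcg` are invented for illustration, not taken from any of the codes cited above) applies preconditioned conjugate gradients to a small normal equations matrix A D^2 A^T with the simplest choice, a Jacobi (diagonal) preconditioner:

```python
import numpy as np

def pcg(M, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for M x = b, M symmetric positive definite."""
    x = np.zeros_like(b)
    r = b - M @ x
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Mp = M @ p
        alpha = rz / (p @ Mp)
        x += alpha * p
        r -= alpha * Mp
        if np.linalg.norm(r) < tol:
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 20))          # hypothetical constraint matrix
d2 = rng.uniform(0.1, 10.0, size=20)      # diagonal of D^2 at a current iterate
M = A @ np.diag(d2) @ A.T                 # normal equations matrix A D^2 A^T
b = rng.standard_normal(8)

y = pcg(M, b, precond=lambda r: r / np.diag(M))   # Jacobi (diagonal) preconditioner
print(np.linalg.norm(M @ y - b))
```

For a well-scaled D^2 this converges quickly; the practical difficulty is that near an optimum the entries of D^2 spread over many orders of magnitude and such cheap preconditioners deteriorate.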
Consequently, all state of the art implementations of general purpose IPMs use a
direct approach [19] to solve the Newton equations. We can be even more specific and
say that they all use some variant of the symmetric triangular LΛL^T decomposition,
where L is a lower triangular matrix and Λ is a block diagonal matrix with blocks of
dimension 1 or 2. To complete the discussion, let us mention an alternative direct
approach: the QR decomposition of A. Although this approach uses orthogonal
transformations and guarantees high accuracy of solutions, it cannot be used in
practice since it is prohibitively expensive.
Summing up, the only practicable approach to solve the Newton equations in general
purpose IPM codes is the LΛL^T decomposition. There exist numerous variants of
its implementation. They differ essentially in the restrictions imposed on the choice of
the pivot order and, from some perspective, they can all be viewed within the same
unifying framework that we shall present later in this section. We will be able to do
so after describing the two major alternative approaches. The first one
reduces (6.25) to the normal equations

      A D^2 A^T Δy = h + A D^2 r,        (6.26)

by pivoting on the diagonal elements of -D^{-2} in (6.25). The other approach
solves the augmented system (6.25) directly, without necessarily pivoting in the -D^{-2}
part first.
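The equivalence of the two approaches can be checked numerically. The sketch below (Python with NumPy; random data stands in for an actual IPM iterate) solves the augmented system (6.25) directly and then via the normal equations (6.26), recovering Δx from Δx = D^2 (A^T Δy - r):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 12
A = rng.standard_normal((m, n))
d2 = rng.uniform(0.5, 2.0, size=n)   # diagonal of D^2
r = rng.standard_normal(n)           # dual-side right hand side
h = rng.standard_normal(m)           # primal-side right hand side

# Augmented system (6.25): [[-D^-2, A^T], [A, 0]] [dx; dy] = [r; h]
K = np.block([[-np.diag(1.0 / d2), A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([r, h]))
dx_aug, dy_aug = sol[:n], sol[n:]

# Normal equations (6.26): eliminate dx = D^2 (A^T dy - r)
M = A @ np.diag(d2) @ A.T
dy = np.linalg.solve(M, h + A @ (d2 * r))
dx = d2 * (A.T @ dy - r)

print(np.allclose(dy, dy_aug), np.allclose(dx, dx_aug))
```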
Next we shall address some technical aspects of the implementation and its de-
pendency on the computer hardware. Due to the rapid changes in the computing
technology, a detailed discussion of the effect of computer hardware goes beyond
the scope of this book. We shall display, however, several important points where
different computer architectures influence the efficiency the most. Finally, we shall
discuss some issues of accuracy control within IPM implementations.
Assume that in the kth step of the Gaussian Elimination, the ith column of the
Schur complement contains c_i nonzero entries and its diagonal element becomes a
pivot. The kth step of elimination thus requires

      f_i = (1/2) (c_i - 1)^2        (6.27)

floating point operations (flops) to be executed. We exploit the fact that the de-
composed matrix A D^2 A^T is positive definite, so the pivot choice can be limited to
the diagonal elements. In fact, this choice has to be limited to diagonal elements
to preserve symmetry. Function f_i evaluates the computational effort and gives an
overestimate of the fill-in that can result from the elimination if the ith diagonal ele-
ment becomes a pivot (f_i is the Markowitz merit function [55] applied to a symmetric
matrix [79]).
The "best" pivot at step k, in the sense of the number of flops required to perform
the elimination step, is the one that minimizes f_i. Interpreting this process in terms
of the elimination graph [29], one can see that it is equivalent to choosing
the node in the graph which has the minimum degree (which gave this
heuristic its name). The minimum degree ordering algorithm can be implemented efficiently
both in terms of speed and storage requirements. For details, the reader is referred
to the excellent summary in [30].
Let us observe that, in general, function (6.27) considerably overestimates the ex-
pected number of fill-ins in a given iteration of the Gaussian Elimination because it
does not take into account the fact that in many positions of the predicted fill-in,
nonzero entries already exist. It is possible that another pivot candidate, although
more expensive in terms of (6.27), would produce less fill-in as the elimination step
would mainly update already existing nonzero entries of the Schur complement. The
minimum local fill-in ordering chooses such a pivot. Generally, the minimum local
fill-in algorithm produces a sparser factorization but at higher initial cost to obtain
the ordering [54], because the analysis that exactly predicts fill-in and chooses the
pivot producing its minimum number is very expensive.
Another efficient technique to determine the pivot order has been proposed in [65].
The method first selects a set of attractive pivot candidates and then, from this
smaller set, chooses the pivot that generates the minimal predicted fill-in.
Computational experience shows considerable improvement in speed without loss
in the quality of the ordering.
Numerical examples
To give the reader some rough idea about the advantages of the two competitive
ordering schemes, we shall compare their performance on a subset of medium scale
linear problems from the Netlib collection [28]. Table 6.1 collects the results of
this comparison. Abbreviations MDO and MFO in it denote the minimum degree
ordering and the minimum local fill-in ordering, respectively.
The first three columns of Table 6.1 contain the problem names and the times (in
seconds) of the analysis phase for the two orderings considered. The analysis time
includes the setup for the ordering (i.e. building a representation of AAT), the order-
ing time, and the time for building the nonzero patterns of the Cholesky factors. For
Table 6.1 Comparison of minimum degree (MDO) and minimum local fill-in
(MFO) orderings
Name        Analysis time     Nonzeros in L       Flops in thousands    Factorization time
            MDO     MFO       MDO       MFO       MDO       MFO         MDO       MFO
25fv47      0.50    1.38      32984     27219     1282      811         0.345     0.244
80bau3b     1.22    2.12      37730     34006     1171      893         0.424     0.361
bnl2        0.91    2.82      59437     56705     3860      3420        0.957     0.889
cycle       0.93    1.80      54682     39073     2004      920         0.565     0.305
d2q06c      1.89    5.74      135960    91614     11327     4752        2.693     1.308
degen3      20.77   13.33     119403    115681    7958      7403        2.312     2.198
dfl001      37.40   552.44    1632101   1445468   711739    547005      160.471   129.905
greenbea    2.21    2.11      47014     45507     907       842         0.379     0.341
grow22      0.21    0.51      8618      8590      157       156         0.064     0.055
maros-r7    6.70    47.49     510148    511930    70445     72568       15.730    15.945
pilot       5.67    25.18     191704    172264    24416     18956       5.704     4.731
pilot87     19.27   110.71    423656    389787    88504     75791       20.725    18.138
pilot-we    0.29    0.58      14904     13887     350       292         0.124     0.100
both algorithms, the ordering time is the dominating factor. The following columns
contain the number of nonzeros in the Cholesky factors produced by the two order-
ings and the number of flops (in thousands) needed to compute the factorization, including
the flops required by the computation of AA^T. The last two columns contain the average
time (in seconds) to execute one factorization on a SUN Sparc-10 workstation.
The results presented in Table 6.1 indicate that MDO is usually faster than MFO
(degen3 is one exception) but it usually produces denser Cholesky factors. Without
going into details, we note that on problems where the nonzeros of AAT are con-
centrated in a tight band near the diagonal (e.g.: grow22, maros-r7), MFO does
not offer any advantage over MDO. In contrast, on problems with "hard" structures
(e.g.: cycle, dfl001) MFO may be more efficient. Figure 6.1 shows the sparsity
patterns of the Cholesky factors obtained by the minimum degree and minimum
local fill-in orderings for the problem cycle, on which the largest difference between
the two heuristics has been observed.
Figure 6.1 Sparsity pattern with the MDO (left) and MFO (right) on problem
cycle
The normal equations approach shows uniformly good performance when applied
to the majority of linear programs. Unfortunately, it suffers from
two drawbacks.
Normal equations behave badly whenever the primal linear program contains free
variables. To transform the problem to the standard form (6.1), any free variable
has to be replaced with the difference of two nonnegative variables: x_F = x+ - x-.
The presence of logarithmic terms in the objective function causes very fast growth
of both split variables. Although their difference may be kept relatively close to the
optimal value of x_F, both x+ and x- tend to infinity. This results in a serious loss
of accuracy in (6.26). A remedy used in many IPM implementations is to prevent
excessive growth of x+ and x-.
A more serious drawback of the normal equations approach is that it suffers dramat-
ically from the presence of dense columns in A. The reason is that a dense column
in A with p nonzero elements creates a dense window of size p x p in the A D^2 A^T
matrix (subject to its symmetric row and column permutation). Assume that
      A = [ A1  A2 ],        (6.28)

where A1 ∈ R^{m x (n-k)} and A2 ∈ R^{m x k} are matrices built of sparse and dense columns,
respectively. Several techniques have been proposed to treat the A2 part separately.
The simplest one, due to Birge et al. [8], makes a choice between the factorizations
of the AA^T and A^T A matrices. The latter factorization easily accommodates dense
columns of A (dense rows of A^T). The approach clearly fails when A contains both
dense columns and dense rows.
Another possibility is the column splitting technique [35, 82]. It cuts a long column
into shorter pieces, introducing additional linking constraints. Unfortunately, it
works satisfactorily only for a small number of dense columns [37].
The most popular way of treating dense columns within the normal equations ap-
proach employs the Schur complement mechanism. It is based on (6.28) and an
explicit decomposition of the matrix

      A D^2 A^T = A1 D1^2 A1^T + A2 D2^2 A2^T.        (6.29)
Recently, Andersen [5] proposed a remedy to the rank deficiency arising in the Schur
complement mechanism. He observed that the rank deficiency of A1 D1^2 A1^T cannot
exceed k, the number of columns handled separately. His approach employs an old
technique due to Stewart [74], which corrects all unacceptably small pivots during the
Cholesky factorization by adding a regularizing diagonal term to them. Consequently,
instead of computing the decomposition of A1 D1^2 A1^T, his method computes the
following (stable) Cholesky decomposition of another matrix:

      L L^T = A1 D1^2 A1^T + σ E E^T,        (6.30)

where σ is a regularizing term and E is a matrix built of unit columns with nonzeros
appearing in rows corresponding to corrected pivots. Once such a stable decomposition
is obtained, it can be used within the Schur complement mechanism in place of the
factorization of A1 D1^2 A1^T.
Summing up, it is possible to overcome the most important drawback of the normal
equations approach, i.e., to handle dense columns within it. However, there still remains
the question of the heuristic used to choose the columns that should be treated separately.
A trivial selection rule based on the number of nonzero elements in a column does
not identify all "hard" columns; we shall discuss this issue in the next section.
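The Schur complement mechanism itself is easy to sketch. Assuming the splitting (6.28) with k dense columns, and assuming for simplicity that A1 D1^2 A1^T is nonsingular (in general it may be rank deficient, as discussed above), the following sketch (Python with NumPy, invented data) factorizes only the sparse part and folds the dense columns back in through a k x k Schur complement, i.e. the Sherman-Morrison-Woodbury formula:

```python
import numpy as np
from numpy.linalg import cholesky, solve

rng = np.random.default_rng(2)
m, n, k = 6, 30, 2
A1 = rng.standard_normal((m, n - k))   # "sparse" columns (dense here, for illustration)
A2 = rng.standard_normal((m, k))       # the k "dense" columns, handled separately
d1 = rng.uniform(0.5, 2.0, n - k)
d2 = rng.uniform(0.5, 2.0, k)
b = rng.standard_normal(m)

S = A1 @ (d1[:, None] * A1.T)          # A1 D1^2 A1^T, the part we factorize
L = cholesky(S)                        # stays sparse in a real implementation

def s_solve(v):                        # two triangular solves with the Cholesky factor
    return solve(L.T, solve(L, v))

# Schur complement on the k dense columns (Sherman-Morrison-Woodbury)
sb = s_solve(b)
V = s_solve(A2)                                 # S^{-1} A2
C = np.diag(1.0 / d2) + A2.T @ V                # k-by-k Schur complement
y = sb - V @ solve(C, A2.T @ sb)

M = S + A2 @ (d2[:, None] * A2.T)      # full A D^2 A^T, for verification only
print(np.allclose(M @ y, b))
```

Only one k x k dense system is solved per iteration on top of the sparse factorization, which is why the mechanism pays off as long as k stays small.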
Recall that the Schur complement mechanism is efficient if the number of dense
columns in the constraint matrix is not excessive. This motivated several researchers
to pay special attention to the augmented system form of the Newton equations
which allows more freedom in the pivot choice.
      [ -D^{-2}   A^T ]
      [    A       0  ]  =  L Λ L^T,        (6.31)
In contrast to the normal equations approach, in which the analysis and factorization
phases are separated, the factorization (6.31) is computed dynamically. This means
that the choice of pivot is concerned with both the sparsity and the stability of the
triangular factor. It is obvious that, due to the careful choice of stable pivots, this
factorization must be at least as stable as that of the normal equations. On
the other hand, due to the greater freedom in the choice of the pivot order, the
augmented system factorization may produce a significantly sparser factor than that
of the normal equations. Indeed, the latter is a special case of (6.31) in which the
first n pivots are chosen from the D^{-2} part regardless of their stability properties and
without any concern about the fill-in they produce.
The success of the augmented system factorization depends highly on the efficiency
of the pivot selection rule. Additionally, to save on the expensive analysis phase, the
pivot order is reused in subsequent IPM iterations and only occasionally updated
when the numerical properties of the Newton equation matrix have changed consid-
erably. Mehrotra's implementation [26, 60], for example, is based on the Bunch-Parlett
factorization [13] and on the use of the generalized Markowitz [55] count of
type (6.27) for 2 x 2 pivots.
On the other hand, it has been shown in [66] that the 1 x 1 pivot scheme is always
valid when computing the symmetric factorization of the augmented matrix, and if a
valid pivot order is computed for a certain D2, it will in theory be valid for arbitrary
D2 matrices occurring during the interior point iterations. However, this ordering
might be numerically unstable.
A popular pivot selection rule is to detect "dense" columns and to pivot
first on the diagonal positions of -D^{-2} in the augmented matrix that fall outside of
them. A difficulty arises, however, with the choice of the threshold density used to
group the columns of A into the sparse and dense parts in (6.28). A fixed threshold
value works well only when dense columns are easily identifiable,
i.e., when the number of nonzeros in each of them significantly exceeds the average
number of entries in the sparse columns [83]. Whenever a more complicated sparsity
structure appears in A, a more sophisticated heuristic is needed. Maros and Meszaros
[56] give a detailed analysis of this issue, which we present below.
Assume that A is partitioned as

      A = [ A11  A12 ]
          [ A21  A22 ],        (6.32)

where A11 is supposed to be very sparse and, additionally, is assumed to create
a sparse adjacency structure A11 A11^T; A12 is a presumably small set of "difficult"
columns, e.g., dense columns or columns referring to free variables; and [A21 A22]
is a set of "difficult" rows. An efficient heuristic to find such a partition is given in
[56].
The analysis of this system shows immediately which block can be inexpensively
pivoted out and which one should be delayed as long as possible. Elimination of the
D1^{-2} block causes very limited fill-in and reduces the matrix to

      [ -D2^{-2}         A12^T             A22^T          ]
      [   A12        A11 D1^2 A11^T    A11 D1^2 A21^T     ]        (6.33)
      [   A22        A21 D1^2 A11^T    A21 D1^2 A21^T     ]
The elimination of the D2^{-2} block should be delayed until all attractive pivot can-
didates from the A11 D1^2 A11^T and A21 D1^2 A21^T blocks are exploited. The normal
equations approach makes no such distinction and pivots out both the D1^{-2} and D2^{-2} blocks.
It is worth noting the close relationship between the approach of [56] and the Schur
complement mechanism applied to handle the block of "difficult" columns in A. Observe
that the normal equations matrix

      [ A11  A12 ] [ D1^2   0  ] [ A11^T  A21^T ]
      [ A21  A22 ] [  0   D2^2 ] [ A12^T  A22^T ]

can be replaced with that of the following system:

      [ A11 D1^2 A11^T    A11 D1^2 A21^T      A12      ]
      [ A21 D1^2 A11^T    A21 D1^2 A21^T      A22      ]        (6.34)
      [ A12^T             A22^T             -D2^{-2}   ]

It is easy to verify that the matrix involved in the system (6.34) has exactly the
same sparsity pattern (subject to symmetric row and column permutations) as that
in (6.33).
Table 6.2 compares the efficiency of the normal equations (NE) and the augmented
system (AS) approaches. We cluster our test problems into three groups. The first
group contains problems with dense columns (aircraft, fit1p, fit2p). In the
second group we collect some problems without dense columns, but with a "preju-
dicial" nonzero pattern for the normal equations (ganges, pilot4, stair). The
last group contains problems without any structure advantageous for the augmented
system. The first two columns of Table 6.2 contain the name of the problem and
the number of nonzeros in the densest column. The following two columns show
the setup time (in seconds) for the two competing approaches. Note that the setup
time includes not only the generation of the pivot order and the sparsity pattern
analysis but also the time of one numerical factorization. Columns 5 and 6 contain
Table 6.2 Comparison of normal equations (NE) and augmented system (AS)
factorizations
Name       Dens.  Analysis time     Nonzeros             Flops in 1000's     Fact. time
           col.   NE      AS        NE        AS         NE       AS         NE      AS
aircraft   751    115.2   0.97      1437398   20317      361174   37         79.19   0.122
fit1p      627    14.22   0.33      206097    10120      42920    63         9.281   0.058
fit2p      3000   -       1.73      -         50583      -        266        -       0.328
ganges     13     0.58    0.98      35076     23555      770      316        0.252   0.122
pilot4     27     0.64    0.58      18851     14153      488      265        0.146   0.082
stair      34     0.44    0.48      17990     11693      461      188        0.129   0.062
25fv47     21     0.93    2.77      43202     43569      1282     1297       0.363   0.412
80bau3b    11     2.02    3.38      57202     57683      1171     1181       0.476   0.487
d2q06c     34     5.34    22.30     167318    178763     11328    14480      2.85    3.604
the number of nonzeros in the factorization (in the case of NE, this corresponds to
the sum of the nonzeros in the Cholesky factor of (6.26) and the nonzeros in A). Columns
7 and 8 contain the number of flops (in thousands) required by one factorization
for the two approaches compared. The last two columns show the average times (in
seconds) to compute one factorization during the algorithm. All results were obtained
on a SUN Sparc-10 workstation.
The results of Table 6.2 obtained for problems with dense columns show an un-
questionable advantage of the augmented system over a trivial implementation of
the normal equations in which dense columns are not handled separately. Our 64
Mbyte workstation was unable to store the lower triangular part of a 3000 x 3000
totally dense matrix that resulted from the normal equations approach applied to
the problem fit2p. In contrast, the augmented system produced a very sparse fac-
torization in this case. For our second group of problems, the performance of the
augmented system is also much better. Finally, for our third group of problems, the
much lower setup cost of the normal equations made the augmented system approach
disadvantageous.
Figure 6.2 gives a bit of insight into the sparsity patterns generated for the prob-
lem stair. It displays the factored augmented matrices for the two competitive
approaches.
Based on the previous examples, we find that both methods are important for
computational practice. It would be advantageous to have both of them implemented,
as well as an analyzer that is able to determine which of them should be
used [56].
Figure 6.2 Sparsity patterns with the NE (left) and AS (right) pivot rule on
problem stair
Several approaches have been developed to compute the factorization. They exploit
sparsity in an efficient way and use different techniques of storage management in
the computations. George and Liu [29] demonstrate how these calculations can be
organized either by rows or by columns.
During the row-Cholesky factorization the rows of the Cholesky factor L are com-
puted one by one. This approach is called the bordering method. Several enhance-
ments of it can be found in [29, 50].
During the right-looking factorization, each computed column of L is immediately
applied to update all columns to the right of it in the matrix. The matching of nonzeros
during the transformations with this approach is not a trivial problem; several solutions
have been found for its efficient implementation [19, 72]. The interest in this approach
has increased in the past few years because of its ability to better exploit high
performance architectures and the memory hierarchy.
We shall present a few of the most important techniques that increase the efficiency
of the numerical factorization step in interior point methods. These techniques
come from parallel and vector computations and the common trick is the use of
the matrix~vector operations in 'dense' mode to reduce the overhead of the sparse
computations.
Dense window
Supernodes
The dense window technique can be generalized using the following observation. Due
to the way the Cholesky decomposition works, some blocks of columns in L tend to
have the same sparsity pattern below the diagonal. Such a block of columns is called
a supernode and it can be treated as a dense submatrix. The supernode terminology
comes from the elimination graph representation of the Cholesky decomposition [29].
There exist two different types of supernodes; they are presented in the figures below.
Type 1 supernode Type 2 supernode
* *
* * *
* * * *
* * * * * *
* * * * * *
* * * * * *
Both types of supernodes are exploited in a similar manner within the numerical
factorization step. Analogously to the dense window technique, the use of supernodes
increases the portion of flops that use dense matrix-vector transformations to save on
indirect addressing and memory references. The following operations take advantage
of the presence of supernodes:
It is advisable to impose a lower bound on the size of supernodes, since the extra work
in step (ii) does not pay off in the case of too small supernodes. Another suggestion
is the use of an upper bound on the number of nonzeros in each supernode to better
exploit the cache memory on several computer architectures [52].
supernodal methods is highly hardware-dependent and several results can be found
in the literature: the efficiency of the supernodal decomposition on the shared-
memory multiprocessors is discussed by Esmond and Peyton [69], the exploitation
of the cache memory on high-performance workstations is studied by Rothberg and
Gupta [72] in the framework of the right looking factorization while the case of the
left looking factorization was investigated by Meszaros [64].
with a further simplifying assumption that the blocks L11 and L22 of the Cholesky
factor define supernodes. The Cholesky factorization of this matrix can be computed
in the following steps:
1. Factorize L11 Λ11 L11^T = B11.
2. Update L21 = B21 (L11^T)^{-1} Λ11^{-1}.
3. Update B22 = B22 - L21 Λ11 L21^T.
4. Factorize L22 Λ22 L22^T = B22.
The advantage is that steps 1, 2, and 4 can be performed in dense mode, resulting
in a very efficient implementation on high performance computers.
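The four steps above can be sketched in NumPy on a small dense matrix; for simplicity the sketch uses a plain Cholesky factorization (i.e., Λ = I), so it shows only the blockwise organization, not the sparse data structures or the supernode detection of a real code:

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 4, 3
X = rng.standard_normal((p + q, p + q))
B = X @ X.T + (p + q) * np.eye(p + q)   # symmetric positive definite test matrix
B11, B21, B22 = B[:p, :p], B[p:, :p], B[p:, p:]

# Steps 1-4 with plain Cholesky (Lambda = I), performed blockwise:
L11 = np.linalg.cholesky(B11)            # 1. factorize B11 (dense mode)
L21 = np.linalg.solve(L11, B21.T).T      # 2. update L21 = B21 L11^{-T} (dense mode)
B22s = B22 - L21 @ L21.T                 # 3. update the Schur complement
L22 = np.linalg.cholesky(B22s)           # 4. factorize the updated B22 (dense mode)

L = np.block([[L11, np.zeros((p, q))], [L21, L22]])
print(np.allclose(L @ L.T, B))
```

Steps 1, 2 and 4 operate on contiguous dense blocks, which is exactly what lets them run as dense matrix-vector kernels on high performance hardware.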
Loop unrolling
Consequently, three memory references (steps 1, 2, and 4) are associated with only
one arithmetical operation (step 3). During the factorization, multiple column mod-
ifications are performed on a single column, which opens the possibility to unroll the
loop over the column transformations. Let a be the target column, b, c, d, e, f, and g
be the source columns, and h(1), ..., h(6) their scalar multipliers, respectively. A
6-step loop unrolling technique consists of the following transformation:

      a := a + h(1)b + h(2)c + h(3)d + h(4)e + h(5)f + h(6)g.
An execution of this transformation needs only eight memory references and six
arithmetical operations (multiplications). Hence, ten memory references have been
saved compared with an execution of six elementary flops that do not exploit loop
unrolling. This technique brings considerable time savings on all computer architec-
tures, although the savings may vary significantly on different computers.
Numerical examples
To give the reader some idea about the efficiency of all the techniques discussed so far
(i.e., dense window, supernodes and loop unrolling), we show the computational re-
sults of their application on a small set of test problems for one widely used computer
architecture, namely a SUN Sparc-10 workstation.
In Table 6.3 we compare the times (in seconds) of computing one decomposition
with a standard left looking factorization, T(ll); the one using the dense window tech-
nique, T(dw); a supernodal factorization without loop unrolling, T(sn1); and, finally,
supernodal factorizations with 2-, 4-, and 6-step loop unrolling, T(sn2), T(sn4) and
T(sn6), respectively.
To cover a possibly wide set of LP problems, we have chosen 6 test examples with
very different characteristics. Problems aircraft and fit2p have extremely sparse
factorizations with the augmented system (cf. Table 6.2). Problems 80bau3b and
25fv47 are "usual" sparse problems, and maros-r7 and dfl001 are examples of
very dense ones. In the case of the extremely sparse problems, these techniques have
very little influence on the factorization times. On the usual "sparse" problems, the
dense window method is unequivocally superior to the standard left looking method,
but the savings resulting from the use of supernodes and loop unrolling are not evident.
The 4-step loop unrolling gives a better execution time, but the effect of the 6-step
loop unrolling is negligible. Finally, for dense problems, the superiority of the simple
supernodal method over the use of the dense window is unquestionable; moreover,
the computation times monotonically decrease with the degree of the loop unrolling.
• cache memory,
• pipelining,
• vectorization,
• superscalar capabilities.
The simple choice of the "best" algorithm is usually impossible without extensive
computational tests. The reader can find a good discussion of these issues (as well
as many numerical examples) in [52]. Let us collect some general suggestions.
There is a choice between two ordering methods: minimum degree and minimum
local fill-in. The deciding factor is the ratio of the cost of integer (or logical) and
floating point operations. If the latter are executed fast compared with the former,
then there is little chance that the savings in the numerical factorization can compensate
for the excessive effort during the analysis phase. In this case, the faster minimum degree
ordering seems more appropriate. In other cases, e.g., for standard "low-cost" work-
stations, the minimum local fill-in ordering may become an attractive alternative.
In the numerical factorization phase, the two most commonly used methods are the
right looking and left looking algorithms. The right looking factorization exploits the
cache memory better, because the supernodes enter the cache only once during
the factorization, while in the case of the left looking factorization a supernode
enters the cache memory many times. However, the right looking factorization requires
additional indirect addressing. As can be presumed, the criteria for choosing
the numerical factorization algorithm must be based on an investigation of the cache
memory (its size, the time of bringing information into it, etc.).
It is possible to determine the maximum set of independent rows using Gaussian Elim-
ination [2]. The computational cost of such an operation is relatively low in most
cases. On the other hand, IPMs that use a starting point similar to (6.11) can
benefit from the additional Cholesky factorization of A D^2 A^T with a well conditioned
D^2 to detect linearly dependent rows. Whenever a pivot in the factorization falls
below a predetermined tolerance, i.e., Λ_ii < ε, row i can be dropped from the LP
model. Although the latter approach is, in general, less reliable than the specialized
Gaussian Elimination of A, its application does not need any additional computa-
tional effort, as it only exploits a factorization that has to be computed anyway.
Practice shows that this approach solves the problem of dependent rows as well.
Even if we manage to satisfy the full row rank property at the beginning of the
optimization process, the matrix AD may become "numerically" rank deficient during
the solution process. This is often the case due to the presence of primal de-
generacy. Consequently, IPM implementations have to be able to deal with rank
deficient matrices A D^2 A^T.
For feasible and bounded LPs, the negative influence of the ill-conditioning of A D^2 A^T
on the accuracy of the solution of the normal equations is surprisingly small. Stewart
[75] gives a nice explanation of this common experience, derived from an analysis
of the properties of the right hand side vector in (6.26). Stewart's result does not
apply to the case when the LP problem is infeasible. In practice, this case usually
manifests itself in a serious loss of accuracy when solving the Newton equations.
As mentioned before, numerical difficulties usually appear close to the optimal
solution, especially in the presence of primal degeneracy. There exist several ways to
overcome them. These techniques are not always mathematically elegant and, addi-
tionally, they are often treated as the most precious know-how that is not revealed
by IPM specialists. Below we present some suggestions for how instability problems can
be overcome, and we end this section with a detailed presentation of the accuracy
control technique applied in a public domain IPM code (available through Netlib)
[37, 36, 38].
Another way is to add a small regularizing term tI to the matrix A D^2 A^T before
or during the factorization. This helps to complete the factorization step but needs
special safeguards in the subsequent solution steps.
The approach used in the public domain LP code [38] consists of several safeguard
techniques. First of all, it uses a dynamically adjusted diagonal regularizing matrix
R ∈ R^{m x m}. Its elements R_ii vary from nearly zero values, added to acceptably stable
pivots Λ_ii, to quite large regularizations R_ii = 1, added to unstable pivots Λ_ii. Conse-
quently, instead of a decomposition of A D^2 A^T, a (stable) factorization of a different,
regularized matrix A D^2 A^T + R is computed. It is very rare that R contains more
than a few large regularizing terms; they refer to those rows of AD which are nearly
linearly dependent.
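A sketch of such a regularized factorization follows (Python with NumPy; the tolerance, the bump value, and the function name are illustrative choices, not those of [38]):

```python
import numpy as np

def regularized_cholesky(M, eps=1e-8, bump=1.0):
    """LL^T factorization that bumps unacceptably small pivots,
    returning L and the diagonal regularization R that was added."""
    n = M.shape[0]
    L = np.zeros_like(M, dtype=float)
    R = np.zeros(n)
    for j in range(n):
        d = M[j, j] - L[j, :j] @ L[j, :j]
        if d < eps:               # unstable pivot: regularize it
            R[j] = bump
            d += bump
        L[j, j] = np.sqrt(d)
        for i in range(j + 1, n):
            L[i, j] = (M[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L, R

# rank-deficient 3x3 example: the third row of A is the sum of the first two
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
M = A @ A.T                       # singular; plain Cholesky would break down
L, R = regularized_cholesky(M)
print(np.allclose(L @ L.T, M + np.diag(R)))   # we factorized M + R instead of M
```

The factorization succeeds on the singular matrix by bumping exactly one pivot, mirroring the behaviour described above for nearly dependent rows of AD.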
An important issue is to take R into account in all subsequent computations (i.e.,
in the solves for the direction). Observe that the form of the decomposed matrix could have
been obtained from the following perturbed augmented system:

      [ -D^{-2}   A^T ] [ Δx ]     [ r_R ]
      [    A       R  ] [ Δy ]  =  [ h_R ]        (6.37)
Note that this is the Newton equation system corresponding to the following quadratic
programming problem (closely related to the dual LP problem (6.2)):

      maximize    b^T y - u^T w - (y - y0)^T R (y - y0)
      subject to  A^T y - w + z = c,        (6.38)
                  z, w ≥ 0,
in which y0 is some reference point, e.g., the current iterate y. The right hand side
vectors r_R and h_R in (6.37) are derived from the first order optimality conditions
for the barrier problem associated with (6.38); note that they become identical to (6.9)
if we take the particular reference point y0 = y.
The regularization technique can be interpreted as the use of a quadratic penalty on
the changes of those dual variables y_i for which presumably unstable pivots were
computed. Computational experience shows that it effectively prevents the propagation
of round-off errors.
Apart from the quadratic regularization technique mentioned above, the LP code
of [38] makes extensive use of an iterative refinement process to improve the accuracy of the
Newton direction. The iterative refinement technique is always applied to the augmented
system formulation of the Newton equation system, although the direction is com-
puted via its reduced, normal equations form (see, e.g., [6]).
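The refinement loop itself is simple. In the sketch below (Python with NumPy; the data is invented), a deliberately low-precision solver stands in for the inexact direction computed from the factorization; each pass recomputes the residual of the augmented system in full precision and corrects the solution:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 4, 10
A = rng.standard_normal((m, n))
d2 = rng.uniform(0.1, 10.0, size=n)
K = np.block([[-np.diag(1.0 / d2), A.T],      # augmented system matrix (6.25)
              [A, np.zeros((m, m))]])
rhs = rng.standard_normal(m + n)

# an imperfect solver: a single precision factorization stands in for a
# direction obtained from a (slightly inaccurate) reduced-system solve
K32 = K.astype(np.float32)
def rough_solve(v):
    return np.linalg.solve(K32, v.astype(np.float32)).astype(np.float64)

x = rough_solve(rhs)
for _ in range(10):                           # iterative refinement steps
    resid = rhs - K @ x                       # residual in full precision
    x = x + rough_solve(resid)                # correct with the cheap solver

print(np.linalg.norm(rhs - K @ x) / np.linalg.norm(rhs))
```

Each pass reduces the residual by roughly the relative accuracy of the inner solver, so a handful of cheap solves recovers full working precision.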
6.5 PRESOLVE
The previous section was concerned with the efficiency of solving the Newton equa-
tion system using advanced numerical linear algebra. Another way to improve the
efficiency of solving the Newton equation system is to reduce its size and make the
system sparser. This aim can be achieved by analyzing the LP problem and removing
redundancies. In practice, almost all large-scale LP problems contain redundancies.
There are several reasons for this. First of all, model formulators tend to choose a
formulation that makes the model easy to understand and to modify. This often
leads to the introduction of superfluous variables and redundant constraints.
The use of a presolve phase is an old idea, see, e.g., Brearley et al. [12]; its role was acknowledged already in simplex type optimizers. The simplex method for LP works with sparse submatrices of A (bases) [78], while any IPM needs an inversion of a considerably denser AA^T matrix. Consequently, the potential savings resulting from an initial problem reduction may be larger in IPM implementations. This is the reason why presolve analysis has recently enjoyed great attention [1, 51, 3, 37, 2, 9, 54, 77]. An additional important motivation for its use is that large LP problems are solved routinely nowadays and the amount of redundancy increases with the size of the problem.
Observe that, due to the nonnegativity of x, the limits b̲_i and b̄_i are nonpositive and nonnegative, respectively. If the inequalities (6.40) are at least as tight as the original (inequality type) LP constraint, then constraint i is redundant. If one of them contradicts the LP constraint, then the problem is infeasible. Finally, in some special cases (e.g., a "less than or equal to" row with b̲_i = b_i, a "greater than or equal to" row with b̄_i = b_i, or an equality type row for which b_i equals one of the limits b̲_i or b̄_i), the LP constraint becomes a forcing one. This means that the only way to satisfy the constraint is to fix all variables that appear in it at their appropriate bounds.
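The classification above can be sketched as follows for a single row over 0 ≤ x ≤ u (the function and label names are ours; a "≤" row is encoded with b_lo = −∞ and a "≥" row with b_hi = +∞):

```python
import numpy as np

def classify_row(a, u, b_lo, b_hi):
    """Classify the constraint  b_lo <= a.x <= b_hi  over 0 <= x <= u
    using the row limits; returns 'redundant', 'infeasible', 'forcing'
    or 'none'."""
    neg, pos = a < 0, a > 0
    lim_lo = float(a[neg] @ u[neg])    # smallest achievable row value (<= 0)
    lim_hi = float(a[pos] @ u[pos])    # largest achievable row value (>= 0)
    if b_lo <= lim_lo and lim_hi <= b_hi:
        return 'redundant'             # limits at least as tight as the row
    if lim_hi < b_lo or b_hi < lim_lo:
        return 'infeasible'            # the limits contradict the row
    if lim_lo == b_hi or lim_hi == b_lo:
        return 'forcing'               # every variable fixed at a bound
    return 'none'
```

For instance, a "≤" row whose smallest achievable value already equals its right hand side is reported as forcing.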
5. Constraint limits (6.39) are used to generate implied variable bounds. (Note that the LP variables were transformed to the standard form 0 ≤ x ≤ u before.) This technique makes use of the original form of an LP constraint (i.e., its form before a slack variable has been added to transform it into the "standard" equality row of (6.1)). Assume, for example, that a nonredundant "less than or equal to" type constraint is given, i.e.,

Σ_j a_ij x_j ≤ b_i.

Then

∀k: a_ik > 0:   b̲_i + a_ik x_k ≤ Σ_j a_ij x_j ≤ b_i,

and

∀k: a_ik < 0:   b̲_i + a_ik (x_k − u_k) ≤ Σ_j a_ij x_j ≤ b_i,

and new implied bounds are given for all variables in row i by

x_k ≤ (b_i − b̲_i)/a_ik  for a_ik > 0,     x_k ≥ u_k + (b_i − b̲_i)/a_ik  for a_ik < 0.

If these bounds are tighter than the original ones, then the variable bounds are improved. Note that this technique is particularly useful when it imposes finite bounds on free variables. Free variables do not, in such a case, have to be split and represented as the difference of two nonnegative variables.
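The two inequalities of point 5 translate directly into code. A sketch for one "less than or equal to" row follows (names are ours; we assume the row's lower limit b̲_i has already been computed as in point 4):

```python
import numpy as np

def implied_bounds(a, u, b, lim_lo):
    """Implied bounds for the variables of a nonredundant row a.x <= b
    over 0 <= x <= u, where lim_lo is the row's lower limit.  Returns
    candidate (lower, upper) bound vectors to be compared with (0, u)."""
    lo, hi = np.zeros_like(u), u.copy()
    for k, aik in enumerate(a):
        if aik > 0:      # lim_lo + aik * x_k <= a.x <= b
            hi[k] = min(hi[k], (b - lim_lo) / aik)
        elif aik < 0:    # lim_lo + aik * (x_k - u_k) <= a.x <= b
            lo[k] = max(lo[k], u[k] + (b - lim_lo) / aik)
    return lo, hi
```

A presolver would keep a bound only if it is strictly tighter than the existing one, and would iterate because tightened bounds change the row limits of other constraints.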
This inequality can be solved and, depending on the sign of a_ij, produces a lower or an upper bound on y_i.
These bounds on the dual variables are used to generate lower and upper limits for all dual constraints (a technique similar to that of point 4 is used). The limits are then used to determine the variables' reduced costs. Whenever a reduced cost is proved to be strictly positive or strictly negative, the corresponding variable is fixed at an appropriate bound and eliminated from the problem.
8. Dual constraint limits (obtained with the technique of point 7) are used to generate new implied bounds on the dual variables. A technique similar to that of point 5 is applied. Implied bounds tighter than the original ones replace the old bounds and open the possibility of eliminating more variables with the technique of point 7.
3. Removing duplicate columns. Two columns are said to be duplicate if they are identical up to a scalar multiplier. An example of duplicate columns is the pair of nonnegative variables introduced to split a free variable.
When discussing the disadvantages of the normal equations approach in Section 6.4.1, we mentioned the negative consequences of the presence of split free variables. Sometimes it is possible to generate a finite implied bound on a free variable [37] and avoid the need to split it. Whenever possible, general duplicate variables are replaced with an aggregate variable (a linear combination of the duplicates).
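Duplicate columns can be detected cheaply by hashing a normalized form of each column. A dense sketch follows (our construction; real presolvers hash the sparse patterns of A's columns instead of materializing them):

```python
import numpy as np
from collections import defaultdict

def duplicate_column_groups(A, tol=1e-12):
    """Group the columns of A that are identical up to a scalar
    multiplier, such as the two halves of a split free variable."""
    groups = defaultdict(list)
    for j in range(A.shape[1]):
        col = A[:, j]
        nz = np.flatnonzero(np.abs(col) > tol)
        if nz.size == 0:
            key = ('zero',)
        else:                      # scale so the leading nonzero equals 1
            key = (tuple(nz), tuple(np.round(col[nz] / col[nz[0]], 9)))
        groups[key].append(j)
    return [g for g in groups.values() if len(g) > 1]
```

Columns landing in the same group share both sparsity pattern and direction, so they are duplicates in the above sense.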
much more suitable for a direct application of the interior point solver. The exact solution of this Sparsity Problem [15] is NP-complete, but efficient heuristics [1, 15, 37] usually produce satisfactory nonzero reductions in A. The algorithm of [37], for example, looks for a row of A whose sparsity pattern is a subset of the sparsity pattern of other rows and uses it to pivot out nonzero elements from those rows.
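A minimal version of this pattern test might look like the following (our sketch; a production code would use sorted sparse row lists instead of Python sets):

```python
import numpy as np

def subset_pattern_rows(A, tol=1e-12):
    """Return pairs (i, j) such that the sparsity pattern of row i is
    contained in the pattern of row j; row i can then be used to pivot
    nonzero elements out of row j."""
    patterns = [frozenset(np.flatnonzero(np.abs(row) > tol)) for row in A]
    return [(i, j)
            for i, pi in enumerate(patterns)
            for j, pj in enumerate(patterns)
            if i != j and pi and pi <= pj]
```

Each reported pair identifies one elimination opportunity: subtracting a suitable multiple of row i from row j removes at least one nonzero without creating any fill-in outside row i's pattern.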
1. Free and implied free variables can be eliminated not only in the case when they correspond to singleton columns (cf. Section 6.5.1, point 6) but also when they correspond to denser columns. It should be noted, however, that this elimination technique has to be used carefully, as it may introduce a large amount of fill-in and, in particular, create dense columns. Hence it requires additional sparsity structure analysis to be implemented properly [76].
The application of all the presolve techniques described so far often results in impressive reductions of the initial LP formulation. The reduced problem obtained after the presolve analysis can usually be solved faster. Once its solution is found, it is used to recover the complete primal and dual solutions to the original problem. This phase is called postsolve analysis; it has been discussed extensively in [3].
The advantages of the presolve analysis become clearer if one compares the sparsity of the Cholesky factors obtained for the original and the reduced LP formulations. Table 6.5 reports the number of nonzeros in the Cholesky factor, NZL, for all problems listed in Table 6.4. These numbers are given for the original problem formulation, the reduced one, and the final reduced form, in which linearly dependent rows have been eliminated.
Table 6.5  Nonzeros in the Cholesky factor (rows, columns, NZL) for the original, the reduced, and the final reduced formulation.

Problem      rows    cols      NZL    rows    cols      NZL    rows    cols      NZL
80BAU3B      2263    9799    29063    1960    8679    18969    1960    8679    18969
              826    8627    79433     763    8572    67571     690    8572    58604
CRE-B        9649   72447   328542    5324   31818   107603    5316   31818   107551
KEN-13      28633   42659   139834   22525   36552    81168   22356   36552    79478
NUG12        3193    8856    44244    3192    8856    38304    2794    8856    33528
OSA-30       4351  100024   700160    4279   96119   262872    4279   96119   262872
PDS-10      16559   48763   140063   15609   47729   103290   15598   47729   103169
PILOT87      2031    4883    73804    1966    4592    70375    1966    4592    70375
WOOD1P        245    2594    70216     170    1717    44573     169    1717    44306
Sum         67750  298652  1605359   55788  244634   794725   55128  244634   778852
This is the main idea behind the high-order methods which we shall discuss below. Their common feature is that they reuse the factorization of the Newton equation system in several solves, with the objective of computing a "better" search direction. There exist several approaches of this type; they apply different schemes to compute the search direction. We shall review them briefly.
The first such approach was proposed by Karmarkar et al. [47], who constructed a parameterized representation of the (feasible) trajectory motivated by the use of differential equations.
Mehrotra's method [62, 61] builds a higher order Taylor approximation of the (in-
feasible) primal-dual central trajectory and pushes an iterate towards an optimum
along such an approximation. The second order variant of this method proved very
successful.
Another approach, due to Domich et al. [18] uses three independent directions and
solves an auxiliary linear program in a three dimensional subspace to find a search
direction.
The method of Sonnevend et al. [73] uses subspaces spanned by directions generated by higher order derivatives of the feasible central path, or by earlier computed points of it, as a predictor step. This is followed by one (or more) centering steps to bring the next iterate sufficiently close to the central path.
In the following part of this section we shall concentrate on two approaches that
proved to be the most attractive in computations: a second order predictor-corrector
technique [62] and a multiple centrality correction technique [36].
The first step of the predictor-corrector strategy is to compute the affine scaling (predictor) direction. The affine scaling direction solves the Newton equation system (6.6) for μ = 0 and is denoted by Δ_a. It is easy to show that if a step of size α is taken in the affine scaling direction, then the infeasibility is reduced by the factor (1 − α). Moreover, if the current point is feasible, then the complementarity gap is also reduced by the same factor. Therefore, if a large step can be made in the affine scaling direction, then desirable progress in the optimization is achieved. On the other hand, if the feasible stepsize in the affine scaling direction is small, then the current point is probably too close to the boundary. In this case the barrier parameter should not be reduced too much.
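The (1 − α) reduction of the infeasibility is easy to verify numerically, since the affine scaling direction satisfies A Δx_a = b − Ax; a small self-contained check (our construction, with a least-squares solve standing in for the Newton solve):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 6
A = rng.standard_normal((m, n))
x = rng.random(n) + 0.1                       # current (infeasible) point
b = A @ (rng.random(n) + 0.1)                 # some attainable right hand side
# any direction with A dx = b - A x; here the minimum-norm one
dx = np.linalg.lstsq(A, b - A @ x, rcond=None)[0]
for alpha in (0.25, 0.5, 1.0):
    r = b - A @ (x + alpha * dx)              # residual after a step of size alpha
    assert np.allclose(r, (1 - alpha) * (b - A @ x))
```

With α = 1 the primal residual vanishes completely, which is exactly why a large feasible affine step signals good progress.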
Mehrotra suggested using the predicted reduction in the complementarity gap along the affine scaling direction to estimate the new barrier parameter. After the affine scaling direction has been computed, the maximum stepsizes along this direction in the primal (α_Pa) and in the dual (α_Da) spaces are determined, preserving nonnegativity of (x, s) and (z, w). Next the predicted complementarity gap g_a is computed and the new barrier parameter is chosen as

μ = (g_a / g)² (g_a / n),     (6.42)

where g denotes the current complementarity gap.
Let us observe that in the computation of the Newton direction in equation (6.6), the second order term ΔXΔz is neglected. Instead of setting the second order term equal to zero, Mehrotra proposes to estimate ΔXΔz using the affine scaling direction ΔX_a Δz_a. His predictor-corrector direction is obtained by solving the Newton equation system with (6.44) as the linearized complementarity conditions and the barrier parameter μ chosen through (6.42).
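For one complementarity block, the heuristic (6.42) and the corrector's linearized complementarity right hand side can be sketched as follows (our sketch; g and g_a denote the current and predicted gaps as above):

```python
import numpy as np

def mehrotra_mu(x, z, dx_a, dz_a, alpha_p, alpha_d):
    """Barrier parameter chosen through the heuristic (6.42)."""
    n = x.size
    g = float(x @ z)                                          # current gap
    g_a = float((x + alpha_p * dx_a) @ (z + alpha_d * dz_a))  # predicted gap
    return (g_a / g) ** 2 * (g_a / n)

def corrector_rhs(x, z, dx_a, dz_a, mu):
    """Right hand side of the linearized complementarity conditions with
    the second order term estimated from the affine-scaling direction."""
    return mu * np.ones_like(x) - x * z - dx_a * dz_a
```

A small predicted gap g_a thus produces a small μ (aggressive progress), while a small affine stepsize leaves g_a, and hence μ, large, matching the discussion above.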
We should note here that the above presentation of the predictor-corrector technique follows the computational practice. It abuses mathematics in the sense that the stepsizes α_P and α_D are not taken into account when building the higher order Taylor approximation of the central trajectory. The reader interested in a detailed, rigorous presentation of this approach can consult [61].
The disappointing results for the use of higher (than second) order predictor-corrector techniques have usually been explained by the difficulty of building an accurate higher order approximation of the central trajectory. On the other hand, many large scale linear programs exist for which the factorizations are extremely expensive. For those problems the need to save on the number of factorizations becomes more important.
The approach proposed by Gondzio [36] applies multiple centrality corrections and combines their use with a choice of reasonable, well centered targets that are supposed to be easier to reach than perfectly centered (but usually unreachable) analytic centers. The idea of using targets that are not analytic centers comes from Jansen, Roos, Terlaky and Vial [44]. They define a sequence of traceable weighted analytic centers, targets that go from an arbitrary interior point to a point close to the central path. The algorithm follows these targets and continuously (although very slowly) improves the centrality of subsequent iterates. The targets are defined in the space of the complementarity products.
The method of [36] translates this approach into computational practice, combining the choice of attractive targets with the use of multiple correctors. It abuses the theory of [44] in the sense that it does not limit the improvement of centrality (measured by the discrepancy between the largest and the smallest complementarity product). Below, we briefly present this approach.
Assume (x, s) and (y, z, w) are the primal and dual solutions at a given iteration of the primal-dual algorithm (x, s, z and w are strictly positive). Next, assume that a predictor direction Δ_p at this point is determined and that the maximum stepsizes in the primal, α_P, and dual, α_D, spaces are computed that preserve nonnegativity of the primal and dual variables, respectively.

We look for a corrector direction Δ_m such that larger stepsizes in the primal and dual spaces are allowed for the composite direction

Δ = Δ_p + Δ_m.     (6.45)

To enlarge these stepsizes from α_P and α_D to α̃_P = min(α_P + δ_α, 1) and α̃_D = min(α_D + δ_α, 1), respectively, the corrector term Δ_m has to compensate for the negative components in the primal and dual variables
We try to reach this goal by adding the corrector term Δ_m that drives from this exterior trial point to the next iterate (x̃, s̃, ỹ, z̃, w̃) lying in the vicinity of the central path. However, we are aware that there is little chance to reach the analytic center in one step, that is, to reach v = (μe, μe) ∈ R^{2n} in the space of the complementarity products. Hence, we compute the complementarity products of the trial point, ṽ = (X̃z̃, S̃w̃) ∈ R^{2n}, and concentrate the effort on correcting only their outliers. We thus project the point ṽ componentwise onto the hypercube H = [β_min μ, β_max μ]^{2n} to get the following target v_t

(6.47)
The corrector direction Δ_m solves a linear system similar to (6.6) with the following right hand side

(0, 0, 0, v_t − ṽ) ∈ R^{4n+m},     (6.48)

with nonzero elements only in the subset of positions of v_t − ṽ that refer to the complementarity products which do not belong to (β_min μ, β_max μ).
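The componentwise projection (6.47) and the outlier-only right hand side (6.48) are straightforward to express; a sketch for one complementarity block (the function name is ours):

```python
import numpy as np

def corrector_target(v_trial, mu, beta_min, beta_max):
    """Project the complementarity products of the trial point onto the
    hypercube [beta_min*mu, beta_max*mu] and return the target together
    with the corrector right hand side, nonzero only for the outliers."""
    v_t = np.clip(v_trial, beta_min * mu, beta_max * mu)
    return v_t, v_t - v_trial     # zero wherever v_trial was already inside H
```

Products already inside the hypercube contribute nothing to the right hand side, so the corrector spends its effort exclusively on the outliers.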
Once the corrector term Δ_m is computed, new stepsizes α_P and α_D are determined for the composite direction

Δ = Δ_p + Δ_m.     (6.49)

The correcting process can be repeated a desired number of times. In such a case, the direction Δ of (6.49) becomes the new predictor Δ_p and is used to compute a new trial point (6.46). An advantage of this approach is that computing every single corrector term requires exactly the same effort (it is dominated by the solution of a system like (6.6) with the right hand side (6.48)).
Questions arise about the choice of the "optimal" number of corrections for a given problem and about the criteria to stop correcting when it brings no improvement. They were answered in [36]. Naturally, the more expensive the factorization of (6.25) or (6.26) compared with the subsequent backsolves, the more correctors should be tried. The computational experience of [36] showed that, when applied to the solution of nontrivial problems, this method gives significant CPU time savings over the second order predictor-corrector technique of Mehrotra.
approximate analytic centers are looked for [32, 63], but in the general case, interior-point warm-start is inefficient. Consequently, the approach adopted nowadays is to solve the first problem of a sequence of closely related problems using an IPM and then cross over to the simplex method. In this case the advantages of both methods are exploited.
In this section, we shall address the problem of recovering an optimal basis from an almost optimal primal-dual interior-point solution. First, we would like to note that there exist LP applications in which an optimal interior-point solution is preferable, see, e.g., Christiansen and Kortanek [17] and Greenberg [39].
6.7.1 Notation
In this section we will work with the problem in a simplified standard form (in which
primal variables have no upper bounds)
It is well known that any optimal solution (x*, y*, z*) must satisfy the complementarity slackness conditions x*_j z*_j = 0. Moreover, it is known that there exists a strictly complementary solution, satisfying x*_j + z*_j > 0; see Goldman and Tucker [34]. Let (x*, y*, z*) be such a strictly complementary solution and define P* = {j : x*_j > 0}. It can be shown that P* is invariant with respect to all strictly complementary solutions. Hence P* is unique. The pair (P*, P̄*), where P̄ = {1, ..., n} \ P for any set P, determines the optimal partition.
Let (B, N) denote a partition of the variables into basic and nonbasic variables. (B, N) is an optimal basis if B is nonsingular,

x_B = B^{-1} b ≥ 0,   x_N = 0     (6.52)

and

B^T y = c_B,   z_N = c_N − N^T y ≥ 0.     (6.53)

A basic solution is said to be primal (dual) degenerate if at least one component of x_B (z_N) is zero.
Below we shall discuss Megiddo's algorithm and its implementation. For convenience we assume that a set of artificial variables has been added to the problem (6.50). Let V = {1, ..., m} denote the set of artificial variables; naturally, we must have x_V = 0 in any optimal solution. Furthermore, we assume that a strictly complementary solution is known. Hence, we assume that:
In fact, the algorithm presented below works for any complementary solution, i.e., when the conditions x_{P*} > 0 and z_{P̄*} > 0 in assumptions b and c are relaxed to x_{P*} ≥ 0 and z_{P̄*} ≥ 0.
Megiddo's algorithm consists of a primal and a dual phase. Let us start with a description of the primal phase. Let (B, N) be any partition of the variables of the problem (6.50) into basic and nonbasic parts. Then

x_B := B^{-1}(b − N x̄_N) = x̄_B ≥ 0.     (6.54)
Algorithm 6.7.1
1. Choose a basis B and let x = x̄.
2. while (∃ j ∈ P* \ B : x_j ≠ 0)
3.   Use the primal ratio test to move variable x_j to zero if possible,
     or pivot it into the basis.
4.   Update (B, N) and x.
5. end while
6. B is a primal optimal basis.
It can be observed that in step 1 it is always possible to choose a basis. One possible choice is B = V. Algorithm 6.7.1 is a simplified version of the primal simplex algorithm, because there is no pricing step (the incoming variables are predetermined).
The dual phase of Megiddo's algorithm is similar to the primal phase because, in
this case, a super-basic dual solution is known. This means that some of the reduced
costs corresponding to the basic variables might not be zero. Similarly to the primal
phase, those reduced costs can either be moved to zero or the corresponding primal
variable has to be pivoted out of the basis. The dual algorithm can be stated as
follows
Algorithm 6.7.2
1. Choose a basis B and let y = ȳ, z = c − A^T y.
2. while (∃ j ∈ P̄* ∩ B : z_j ≠ 0)
3.   Use the dual ratio test to move variable z_j to zero if possible,
     or take it out of the basis.
4.   Update (B, N), y and z.
5. end while
6. B is a dual optimal basis.
If the initial basis is primal feasible, then it remains feasible throughout all steps of Algorithm 6.7.2, because all pivots are primal degenerate. Once Algorithm 6.7.2 terminates, the final basis is both primal and dual feasible, and hence optimal. Furthermore, the number of iterations in the dual phase cannot exceed |P̄*|.

Summing up, Algorithms 6.7.1 and 6.7.2 generate an optimal basis after at most n iterations. In practice, the number of iterations depends on the level of primal and dual degeneracy.
Bixby and Lustig solve this problem using a Big-M version of Megiddo's algorithm; that is, their cross-over procedure drives both the complementarity and the feasibility to zero. In the worst case, this algorithm requires several additional simplex pivots to obtain an optimal basis. Their approach works well but, unfortunately, it complicates the implementation of the cross-over procedure.
Andersen and Ye [4] propose an alternative solution to this problem. Let (x^k, y^k, z^k) be the iterate generated by the primal-dual algorithm in iteration k and let (P^k, P̄^k) be a guess of the optimal partition generated in iteration k. Now define the following perturbed problem

where

b^k = A_{P^k} x^k_{P^k},   c^k_{P^k} = (A_{P^k})^T y^k   and   c^k_{P̄^k} = (A_{P̄^k})^T y^k + z^k_{P̄^k}.

Assume the variables in (6.55) are reordered such that x = (x_{P^k}, x_{P̄^k}); then the vector (x, y, z) = ((x^k_{P^k}, 0), y^k, (0, z^k_{P̄^k})) is a strictly complementary solution to (6.55). Moreover, if x^k converges towards an optimal primal solution and P^k converges towards P*, then b^k converges towards b and, similarly, c^k converges towards c. Therefore the two problems (6.50) and (6.55) will eventually share optimal bases. This advocates the application of Megiddo's algorithm to the perturbed problem (6.55).
Another question is the choice of the right iteration at which to terminate the interior point algorithm and start the cross-over. The optimal basis generation can only be expected to produce the correct optimal basis if the interior point solution is almost optimal and P^k is a good guess for P*. A good practical criterion for when to make the switch is when the fast (quadratic) convergence of the primal-dual algorithm sets in.
Finally, for a discussion of linear algebra issues related to implementing the pivoting
algorithm and computational results we refer the reader to the papers [10, 4].
Several efficient LP codes based on interior-point methods have been developed in recent years. Almost all codes are based on the primal-dual algorithm presented above, although they differ in many implementational details. There exist several commercial vendors, e.g.: AT&T (KORBX), CPLEX (CPLEX/BARRIER, http://www.cplex.com), DASH (XPRESS-MP, http://www.dash.com) and IBM (OSL, http://www.research.ibm.com/osl/), as well as numerous research codes, some of them public domain in an executable or even in a source code form. The reader may find it surprising that these research codes compare favorably with the best commercial products. Three public domain research codes deserve particular attention.
The reader interested in more information about these LP codes (both commercial
and research ones) should consult the LP FAQ (LP Frequently Asked Questions).
The World Wide Web address of the LP FAQ is
• http://www.skypoint.com/subscribers/ashbury/linear-programming-faq
• ftp://rtfm.mit.edu/pub/usenet/sci.answers/linear-programming-faq
To give the reader some idea about the efficiency of available commercial and research LP codes, we ran them on a few public domain test problems. Table 6.6 gives their sizes, i.e., the number of rows, columns and nonzero elements, m, n, and nonz, respectively. Problems pilot87, dfl001 and pds-10 come from Netlib; problems mod2, world and ilL belong to the collection maintained at the University of Iowa. Table 6.7 reports statistics on their solution (iterations and CPU time in seconds to reach 8-digit optimality) on an IBM Power PC workstation (model 601: 66 MHz, 64 MB RAM). In case 8-digit optimality could not be reached, we give in parentheses the number of exact digits in the suboptimal solution. The following solvers are compared: CPLEX version 3.0 (simplex method), CPLEX version 3.0 BARRIER, LIPSOL version 0.3, LOQO version 2.21, HOPDM version 2.12 and
Before analyzing the results collected in Table 6.7, we would like to warn the reader that computational results depend on many different factors. For example, the choice of test problems, the choice of computer and the choice of algorithmic parameters all influence the relative performance of the codes. The results reported in Table 6.7 have been obtained with all compared codes run with their default options.

The analysis of the results collected in Table 6.7 indicates that there is only an insignificant difference in the efficiency of commercial and public domain research codes. The latter are available free of charge.
Although there are many different LP codes available nowadays, the reader may be
interested in preparing his own implementation of an IPM. We have to warn him
that it might not be a trivial task. A lot of different issues have to be dealt with,
e.g., the system design, the choice of the programming language, etc.
When a programming language has been chosen, the next step is to choose a system design. It is advisable to build the code from well structured modules. For instance, the Cholesky factorization should be implemented in a separate module. Another recommendation is to build the optimizer such that it can be called as a stand-alone procedure.
Regarding the form of the input data, the standard MPS format surely has to be accepted, although more efficient binary formats might be advantageous. We refer the reader to the book [67] for a good discussion of the MPS format.
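As a tiny illustration (ours, not taken from [67]), the LP "minimize x1 + 2·x2 subject to 2·x1 + x2 ≤ 10, 0 ≤ x1 ≤ 4, x2 ≥ 0" reads in MPS format as follows; row, column and bound-set names are free to choose:

```
NAME          TINY
ROWS
 N  COST
 L  LIM1
COLUMNS
    X1        COST       1.0   LIM1       2.0
    X2        COST       2.0   LIM1       1.0
RHS
    RHS1      LIM1      10.0
BOUNDS
 UP BND1      X1         4.0
ENDATA
```

Note that the format is column oriented: the COLUMNS section lists, for each variable, its nonzero coefficients in the objective (an N row) and in the constraints.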
A good reason to be able to read the MPS format is that the majority (if not all) of the test problems are available in it. One such collection is the so-called Netlib suite, available via anonymous ftp to netlib.att.com (cd netlib/lp). Another source of larger and more difficult problems is an LP test collection gathered at the University of Iowa. It is also available via anonymous ftp to col.biz.uiowa.edu (cd pub/testprob/lp).
• warm start.
General warm start procedures in IPMs still work unsatisfactorily slowly and are not competitive with simplex based reoptimization. As mentioned in Section 6.7, the only promising results to date have been obtained in the particular case when an IPM is used to find an approximate analytic center of a polytope (not to optimize an LP). It seems that the best approach currently is to solve difficult problems with an IPM, identify an optimal basis and later employ the simplex method if reoptimization is required.
Apart from the two practical problems mentioned above, further implementational improvements can be expected. Although we have concluded that current IPM implementations work efficiently, we are aware that there exist LP problems that are very sparse but produce surprisingly dense symmetric factorizations, e.g., the dfl001 or pds- problems from the Netlib collection. It is possible that the right way to solve these problems is to apply iterative approaches to the Newton equation system.

Finally, the increasing accessibility of parallel computers in the near future will make IPMs that exploit this architecture more important. Indeed, such algorithms will be able to solve LP problems much larger than currently possible. This will have important consequences for the area of integer programming (improved cutting-plane methods) and the area of stochastic optimization.
6.10 CONCLUSIONS
In the previous sections we have addressed the most important issues of an efficient
implementation of interior-point methods.
Our discussion has concentrated on the most important algorithmic issues, such as the role of centering (or, equivalently, following the central path) and the way of treating infeasibility in a standard primal-dual algorithm (we have presented the HLF model, which solves the problem of detecting infeasibility efficiently). Furthermore, we have discussed in detail the computationally most expensive part of IPMs: the solution of the Newton equation system.
The progress in IPMs for LP during the past decade is impressive. Indeed, a complete theory of interior point methods has been developed. Moreover, based on this theory, many efficient implementations of IPMs have been constructed. In fact, due to this algorithmic development and the improvements in computer hardware, much larger LP problems can be solved routinely today than a decade ago. Even though the methods are unlikely to improve as dramatically over the next decade, we nevertheless predict significant improvements in the current implementations.

Finally, we hope and believe that these developments are useful to OR practitioners.
Acknowledgements
The research of the second author has been supported by the Fonds National de la Recherche Scientifique Suisse, grant #12-42503.94. The research of the third author has been supported by the Hungarian Research Fund OTKA No. T-016413.
REFERENCES
[1] I. Adler, N. Karmarkar, M. G. C. Resende, and G. Veiga. Data structures
and programming techniques for the implementation of Karmarkar's algorithm.
ORSA J. on Comput., 1(2):84-106, 1989.
[2] E. D. Andersen. Finding all linearly dependent rows in large-scale linear pro-
gramming. Optimization Methods and Software, 6:219-227, 1995.
[3] E. D. Andersen and K. D. Andersen. Presolving in Linear Programming.
Preprint 35, Dept. of Math. and Computer Sci., Odense University, 1993. To
appear in Math. Programming.
[4] E. D. Andersen and Y. Ye. Combining interior-point and pivoting algo-
rithms for linear programming. Technical report, Department of Management
Sciences, The University of Iowa, 1994. Available via anonymous ftp from
ftp://col.biz.uiowa.edu/pub/papers/cross.ps.Z, to appear in Management Sci-
ence.
[6] M. Arioli, J. W. Demmel, and I. S. Duff. Solving sparse linear systems with
sparse backward error. SIAM J. Matrix Anal. Appl., 10(2):165-190, 1989.
[11] Å. Björck. Methods for sparse linear least squares problems. In J. R. Bunch
and D. J. Rose, editors, Sparse Matrix Computation, pages 177-201. Academic
Press INC., 1976.
[13] J. R. Bunch and B. N. Parlett. Direct methods for solving symmetric indefinite
systems of linear equations. SIAM J. Numer. Anal., 8:639-655, 1971.
[54] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Interior point methods for linear
programming: Computational state of the art. ORSA J. on Comput., 6(1):1-15,
1994.
[55] H. M. Markowitz. The elimination form of the inverse and its application to
linear programming. Management Sci., 3:255-269, 1957.
[56] I. Maros and Cs. Meszaros. The role of the augmented system in interior point
methods. Technical Report TR/06/95, Brunel University, Department of Math-
ematics and Statistics, London, 1995.
[57] K. A. McShane, C. L. Monma, and D. F. Shanno. An implementation of a
primal-dual method for linear programming. ORSA J. on Comput., 1(2):70-83,
1989.
[58] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo,
editor, Progress in Mathematical Programming: Interior-Point Algorithms and
Related Methods, pages 131-158. Springer Verlag, 1989.
[59] N. Megiddo. On finding primal- and dual- optimal bases. ORSA J. on Comput.,
3(1):63-65, 1991.
[60] S. Mehrotra. Handling free variables in interior methods. Technical Report
91-06, Department of Industrial Engineering and Management Sciences, North-
western University, Evanston, USA., March 1991.
[61] S. Mehrotra. High order methods and their performance. Technical Report 90-
16R1, Department of Industrial Engineering and Management Sciences, North-
western University, Evanston, USA., 1991.
[62] S. Mehrotra. On the implementation of a primal-dual interior point method.
SIAM J. on Optim., 2(4):575-601, 1992.
[63] O. du Merle, J. L. Goffin, and J. P. Vial. A short note on the comparative be-
haviour of Kelley's cutting plane method and the analytic center cutting plane
method. Technical Report 1996.4, Logilab, HEC Geneva, Section of Manage-
ment Studies, University of Geneva, January 1996.
[64] Cs. Meszaros. Fast Cholesky factorization for interior point methods of linear
programming. Technical report, Computer and Automation Institute, Hungar-
ian Academy of Sciences, Budapest, 1994. To appear in Computers & Mathe-
matics with Applications.
[65] Cs. Meszaros. The "inexact" minimum local fill-in ordering algorithm. Working
paper WP 95-7, Computer and Automation Institute, Hungarian Academy of
Sciences, Budapest, 1995.
[66] Cs. Meszaros. The augmented system variant of IPMs in two-stage stochastic
linear programming computation. Working paper WP 95-1, Computer and
Automation Institute, Hungarian Academy of Sciences, Budapest, 1995.
[67] J. L. Nazareth. Computer Solution of Linear Programs. Oxford University Press,
New York, 1987.
[68] J. von Neumann. On a maximization problem. Technical report, Institute for
Advanced Study (Princeton, NJ, USA), 1947.
[69] E. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for
shared-memory multiprocessors. SIAM J. Sci. Statist. Comput., 14(4):761-769,
1993.
[70] L. Portugal, F. Bastos, J. Júdice, J. Paixão, and T. Terlaky. An investigation
of interior point algorithms for the linear transportation problems. Technical
Report 93-100, Faculteit der Technische Wiskunde en Informatica, Technische
Universiteit Delft, The Netherlands, 1993.
[71] M. G. C. Resende and G. Veiga. An efficient implementation of a network
interior point method. Technical report, AT&T Bell Laboratores, Murray Hill,
NJ, USA, February 1992.
[72] E. Rothberg and A. Gupta. Efficient Sparse Matrix Factorization on High-
Performance Workstations-Exploiting the Memory Hierarchy. ACM Trans.
Math. Software, 17(3):313-334, 1991.
[73] G. Sonnevend, J. Stoer, and G. Zhao. Subspace methods for solving linear
programming problems. Technical report, Institut fur Angewandte Mathematik
und Statistic, Universitat Wurz burg , Wurzburg, Germany, January 1994.
[74] G. W. Stewart. Modifying pivot elements in Gaussian elimination. Math. Comp.,
28:537-542, 1974.
[75] G. W. Stewart. On scaled projections and pseudoinverses. Linear Algebra Appl.,
112:189-193, 1989.
[76] R. Subramanian, R. P. S. Scheff Jr., J. D. Qillinan, D. S. Wiper, and R. E.
Marsten. Coldstart: Fleet assigment at Delta Air Lines. Interfaces, 24(1),
1994.
[77] U. H. Suh!. MPOS - Mathematical optimization system. European J. Oper.
Res., 72(2):312-322, 1994.
[86] X. Xu, P.-F. Hung, and Y. Yeo A simplified homogeneous and self-dual linear
programming algorithm and its implementation. Technical report, Department
of Management Sciences, The University of Iowa, 1993.
[87J X. Xu and Y. Yeo A generalized homogeneous and self-dual algorithm for linear
programming. Oper. Res. Lett., 17:181-190, 1995.
[89] Y. Yeo On the finite convergence of interior-point algorithms for linear program-
ming. Math. Programming, 57:325-335, 1992.
Introduction
Many of the theoretical results of the previous chapters about interior-point methods
for solving linear programs also hold for nonlinear convex programs. In this chapter
we intend to give a simple self-contained introduction to primal methods for convex
programs. Our focus is on the theoretical properties of the methods; in Section
7.3, we try to bridge the gap between theory and implementation, and propose
a primal long-step predictor-corrector infeasible interior-point method for convex
programming. Our presentation follows the outline in [19]; for a comprehensive
treatment of interior-point methods for convex programs we refer to [29] or [7].
This chapter is divided into four sections. In Section 7.1 a convex problem is de-
fined and an elementary method for solving this problem is listed. For this method
some crucial questions are stated that determine its efficiency. Based on these ques-
tions the concept of self-concordance is naturally derived in Section 7.2, and some
important examples of self-concordant barrier functions are listed. Section 7.2 also
presents the basic theoretical results needed in Section 7.3. In Section 7.3 a short
proof of polynomiality for the method of centers (a slight modification of the conceptual method described in Section 7.1) is given. The proof is very simple once the results of Section 7.2 are known. Section 7.3 closes with an implementable barrier
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 255-296. © 1996 Kluwer Academic Publishers.
method for a slightly more general form of convex programs. In Section 7.4 we list
some applications of convex programs.
for the set $S$. Throughout, we assume further that the constraint functions $r_i$ are such that $\phi$ is well defined and convex in $S^0$. In particular, we exclude restrictions like $r_i(t) := \max\{0, t\}^3$ for the negative real axis. (In this example the function
Solving Convex Programs 257
Let some $\lambda > \lambda_{\mathrm{opt}}$ be given. (If some point $x \in S$ is known we may choose for example $\lambda = 1 + c^T x$.) We define
$$S(\lambda) := \{\, x \in S \mid c^T x \le \lambda \,\}. \qquad (7.4)$$
Method of centers
Initialization: Let some value $\lambda = \lambda_0 > \lambda_{\mathrm{opt}}$ be given and some approximation $x^{(0)} \in S^0$ to $x(\lambda_0)$ with $c^T x^{(0)} < \lambda_0$. Set $k = 0$.
Until some stopping criterion is satisfied repeat
3. Set $k = k + 1$.
End.
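Steps 1 and 2 of the loop above did not survive extraction; the sketch below fills them in with one standard reading of the conceptual method (recenter with damped Newton steps on $\phi(\,\cdot\,,\lambda)$, then shrink $\lambda$ toward $c^T x$). The box feasible set $[0,1]^2$, the logarithmic barrier, and the shrink factor `sigma` are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def centering(x, lam, c, newton_iters=30):
    """Damped Newton on phi(x, lam) = -ln(lam - c@x) - sum(ln x_i + ln(1 - x_i))."""
    for _ in range(newton_iters):
        s0 = lam - c @ x
        g = c / s0 - 1.0 / x + 1.0 / (1.0 - x)                  # gradient
        H = np.outer(c, c) / s0**2 + np.diag(1.0 / x**2 + 1.0 / (1.0 - x) ** 2)
        dx = -np.linalg.solve(H, g)
        delta = np.sqrt(dx @ H @ dx)         # H-norm of the Newton step
        x = x + dx / (1.0 + delta)           # reduced Newton step keeps x interior
        if delta < 1e-10:
            break
    return x

def method_of_centers(c, x0, lam0, sigma=0.5, iters=60):
    x, lam = x0, lam0
    for _ in range(iters):
        x = centering(x, lam, c)             # step 1 (assumed): recenter
        lam = c @ x + sigma * (lam - c @ x)  # step 2 (assumed): shrink the level
    return x, lam

# minimize x1 + x2 over the box [0,1]^2; the optimal value is 0
c = np.array([1.0, 1.0])
x, lam = method_of_centers(c, np.array([0.5, 0.5]), lam0=2.0)
assert c @ x < 1e-3
```

Note that the reduced Newton step used inside `centering` is exactly the damped step discussed later in this section; it guarantees that the iterates stay strictly inside the barrier's domain.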
1. How well does Newton's method perform when applied to minimizing $\phi(\,\cdot\,,\lambda)$?
It is intuitively clear that the method of centers will be interesting if and only if both
questions allow a satisfactory answer. These two questions will be used to motivate
two forms of local Lipschitz continuity in the next section.
7.2 SELF-CONCORDANCE
We give an answer to the two crucial questions of the previous section by introducing
the notion of self-concordance. Self-concordant functions are defined and examined
in great detail by Nesterov and Nemirovsky in [27, 28], and while our presentation
is different, most results presented in this section are due to [28].
Derivation of Self-Concordance
A condition for a "nice" performance of Newton's method can be derived by the
following straightforward argument.
• Let us consider a simple example: $\phi(t) := -\ln t$, the logarithmic barrier function for the positive real axis. In this case, for $t > 0$, $\phi''(t) = \frac{1}{t^2}$ and $\phi'''(t) = -\frac{2}{t^3}$. The natural condition to bound $\phi'''$ relative to $\phi''$ is to require $|\phi'''(t)| \le 2\,\phi''(t)^{3/2}$. Of course, the constant "2" appears somewhat arbitrary, and also the exponent $3/2$ needs further justification. But as we will see next, this choice of condition makes sense indeed.
• The generalization to $n$ dimensions of the above condition results in the self-concordance condition given in [27]; for any $x \in S^0 \subset \mathbb{R}^n$ and any direction $h \in \mathbb{R}^n$ we require
$$\bigl| D^3\phi(x)[h, h, h] \bigr| \;\le\; 2\, \bigl( D^2\phi(x)[h, h] \bigr)^{3/2}$$
to hold true. From this formulation it becomes evident that the exponent $3/2$ on the right hand side is natural in that it ensures independence of this relation of the norm of $h$.
• Note that the quantities involved in this relation are just the second and third directional derivatives of $\phi$ at $x$ in direction $h$. Thus, with $f_{x,h}(t) := \phi(x + th)$ as in (7.6), the above relation can equivalently be rewritten as follows:
$$\bigl| f_{x,h}'''(0) \bigr| \;\le\; 2\, \bigl( f_{x,h}''(0) \bigr)^{3/2}. \qquad (7.7)$$
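Condition (7.7) can be checked directly for the motivating example $\phi(t) = -\ln t$, where it holds with equality; a minimal numeric check:

```python
import math

# second and third derivatives of phi(t) = -ln t
def phi2(t):
    return 1.0 / t**2

def phi3(t):
    return -2.0 / t**3

# |phi'''(t)| <= 2 * phi''(t)**(3/2) holds with equality for every t > 0
for t in [0.1, 0.5, 1.0, 3.0, 10.0]:
    assert math.isclose(abs(phi3(t)), 2.0 * phi2(t) ** 1.5)
```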
Throughout we will assume that $\phi$ is a barrier function for $S$, that is, for any point $y \in \partial S$ on the boundary of $S$ we assume that $\lim_{x \to y,\ x \in S^0} \phi(x) = \infty$. (In [28] the barrier property is called "strong self-concordance".)
Note that inequality (7.7) is not invariant under multiplication of $f$ (respectively $\phi$) by a positive constant. For example, if $f$ satisfies (7.7), the function $\tilde f(t) := 4 f(t)$ satisfies $|\tilde f'''| \le (\tilde f'')^{3/2}$, and the constant "2" is not needed. Condition (7.7) essentially requires that the supremum
$$\sup_{x \in S^0,\ h \in \mathbb{R}^n} \frac{\bigl| f_{x,h}'''(0) \bigr|}{\bigl( f_{x,h}''(0) \bigr)^{3/2}}$$
is finite, and that $\phi$ is multiplied by a sufficiently large constant such that the supremum is less than or equal to 2. Thus, the choice of the second constant ("2") in the definition of (7.7) may be somewhat arbitrary, based on the function $-\ln t$, but it is certainly without loss of generality. In fact, our Definition 7.2.1 is a slight variation of the original definition in [28], which requires the above supremum to be less than or equal to $2/\sqrt{\alpha}$ and calls $\phi$ self-concordant with parameter $\alpha$. However, in [28] it is also assumed for most of the monograph that $\alpha = 1$, so that the definitions are more or less the same.
Before proving that our incentive (finding some criterion which guarantees that Newton's method for minimizing $\phi$ converges well) is indeed fulfilled by (7.7), we show that there are a number of functions that satisfy (7.7).
Some Examples
In trying to construct functions that satisfy (7.7) let us start with the function $-\ln t$, which of course satisfies (7.7).
• Summation. Let us observe first that condition (7.7) is closed with respect to summation; that is, if $\phi_i : \mathbb{R}^n \to \mathbb{R}$ satisfy (7.7) for $i = 1, 2$, then so does $\phi_{1,2} := \phi_1 + \phi_2$ (as long as the intersection of the domains of $\phi_1$ and $\phi_2$ is not empty).
Indeed, $|f_1''' + f_2'''| \le |f_1'''| + |f_2'''| \le 2(f_1'')^{3/2} + 2(f_2'')^{3/2} \le 2(f_1'' + f_2'')^{3/2}$. Here, we denote by $f_i$ the restriction of $\phi_i$ to the line $x + th$, $f_i(t) := \phi_i(x + th)$. $\square$
• Affine transformations. Similarly, (7.7) is invariant under affine transformations. Let $\mathcal{A}(x) := Ax + b$ be an affine mapping with some matrix $A \in \mathbb{R}^{p \times q}$ and some vector $b \in \mathbb{R}^p$. If $\phi(\,\cdot\,) : \mathbb{R}^p \to \mathbb{R}$ satisfies (7.7) then so does $\phi(\mathcal{A}(\,\cdot\,)) : \mathbb{R}^q \to \mathbb{R}$ (as long as there exists some $x$ such that $\phi(Ax + b)$ is defined at all).
Indeed,
$$\frac{d^k}{dt^k}\, \phi(\mathcal{A}(x + th)) \;=\; \frac{d^k}{dt^k}\, \phi\bigl( (Ax + b) + t(Ah) \bigr) \;=\; \frac{d^k}{dt^k}\, \phi(\bar x + t\bar h)$$
with $\bar x := Ax + b$ and $\bar h := Ah$, so that (7.7) for $\phi(\mathcal{A}(\,\cdot\,))$ along the line $x + th$ reduces to (7.7) for $\phi$ along the line $\bar x + t\bar h$. $\square$
$$-\sum_{i=1}^{m} \ln(a_i^T x + \beta_i) \qquad (7.8)$$
is a self-concordant barrier function for the polyhedron $\{x \mid a_i^T x + \beta_i \ge 0 \text{ for } 1 \le i \le m\}$ (if it has nonempty interior).
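The self-concordance of the barrier (7.8) along an arbitrary line $x + th$ reduces, with $q_i = (a_i^T h)/(a_i^T x + \beta_i)$, to $|\sum_i q_i^3| \le (\sum_i q_i^2)^{3/2}$. A small randomized test of this reduction (the polyhedron data below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))          # rows a_i of the polyhedron a_i@x + b_i >= 0
x = rng.standard_normal(3)
b = 1.0 + np.abs(A @ x) - A @ x          # choose b_i so that a_i@x + b_i >= 1 > 0

for _ in range(100):
    h = rng.standard_normal(3)
    q = (A @ h) / (A @ x + b)            # q_i = (a_i@h) / (a_i@x + b_i)
    f2 = np.sum(q**2)                    # f''(0) along the line x + t h
    f3 = -2.0 * np.sum(q**3)             # f'''(0)
    assert abs(f3) <= 2.0 * f2**1.5 + 1e-12
```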
• Convex quadratic constraints. First note that the logarithmic barrier function
$$-\ln(-q(x)) \qquad (7.9)$$
of the constraint $q(x) \le 0$ with a convex quadratic function $q : \mathbb{R}^n \to \mathbb{R}$ satisfies (7.7).
Indeed, the restriction $f(t) := -\ln(-q(x + th))$ can be split into two linear parts: since $q$ is quadratic it follows that $q(x + th) = a_2 t^2 + a_1 t + a_0$ for some real numbers $a_i$ depending only on $q$, $x$ and $h$. Since $q$ is convex it follows that $a_2 \ge 0$, and since $x$ is strictly feasible, it follows that $q(x) < 0$. Hence, $q(x + th)$ is either linear in $t$, or it has two real roots as a function of $t$. In the latter case $f$ can be written as $f(t) = -\ln(u_1 t + v_1) - \ln(u_2 t + v_2)$ with $v_1 > 0$, $v_2 > 0$, $u_1, u_2 \in \mathbb{R}$, and these satisfy condition (7.7) as we have just seen. $\square$
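With $p(t) := q(x + th) = a_2 t^2 + a_1 t + a_0$ one obtains $f'' = (p'^2 - p\,p'')/p^2$ and, since $p''' = 0$, $f''' = p'(3p\,p'' - 2p'^2)/p^3$ for $f = -\ln(-p)$. A randomized check of the bound (7.7) using these closed-form derivatives (the random coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(200):
    a2 = rng.uniform(0.0, 2.0)           # convexity: a2 >= 0
    a1 = rng.normal()
    a0 = -rng.uniform(0.1, 3.0)          # strict feasibility at t = 0: p(0) < 0
    for t in np.linspace(-0.2, 0.2, 9):
        p = a2 * t**2 + a1 * t + a0
        if p >= -1e-6:
            continue                     # stay strictly inside {p < 0}
        dp, ddp = 2 * a2 * t + a1, 2 * a2
        f2 = (dp**2 - p * ddp) / p**2               # f''(t) of f = -ln(-p)
        f3 = dp * (3 * p * ddp - 2 * dp**2) / p**3  # f'''(t), using p''' = 0
        assert abs(f3) <= 2 * f2**1.5 + 1e-9
```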
$$-\ln\det(X + tY) \;=\; -2\ln\det X^{1/2} - \ln\det\bigl( I + t X^{-1/2} Y X^{-1/2} \bigr) \;=\; -\ln\det X - \sum_{i=1}^{n} \ln(1 + t\lambda_i),$$
where $\lambda_i$ are the eigenvalues of $X^{-1/2} Y X^{-1/2}$ (independent of $t$). By the closedness of (7.7) under summation and affine transformations we conclude again that $-\sum \ln(1 + t\lambda_i)$ satisfies (7.7). $\square$
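Both the eigenvalue reduction of $-\ln\det(X + tY)$ and the resulting self-concordance bound along the line can be verified numerically (the random $X$, $Y$ below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)                  # symmetric positive definite
Y = rng.standard_normal((n, n))
Y = Y + Y.T                                  # symmetric direction

w, V = np.linalg.eigh(X)
Xmh = V @ np.diag(w ** -0.5) @ V.T           # X^{-1/2}
lam = np.linalg.eigvalsh(Xmh @ Y @ Xmh)      # eigenvalues lambda_i

t = 0.1                                      # small enough that X + tY stays PD here
lhs = -np.log(np.linalg.det(X + t * Y))
rhs = -np.log(np.linalg.det(X)) - np.sum(np.log(1.0 + t * lam))
assert np.isclose(lhs, rhs)

# self-concordance of f(t) = -ln det(X + tY) along the line
f2 = np.sum(lam**2 / (1.0 + t * lam) ** 2)         # f''(t)
f3 = -2.0 * np.sum(lam**3 / (1.0 + t * lam) ** 3)  # f'''(t)
assert abs(f3) <= 2.0 * f2**1.5 + 1e-12
```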
The H-Norm
Our analysis of Newton's method heavily depends on the choice of the norm in which the analysis is carried out. By convexity of $\phi$, its Hessian $H_x := D^2\phi(x)$ is positive semidefinite, and we may thus define a semi-norm $\|z\|_{H_x} := (z^T H_x z)^{1/2}$. By our assumption on problem (7.2), the set of optimal solutions $S^{\mathrm{opt}}$ is bounded, hence $S$ does not contain a straight line, and therefore $\phi$ is strictly convex¹ by the observation following Lemma 7.2.2 below. Thus, $\|\cdot\|_{H_x}$ is a norm (referred to as $H$-norm in the sequel) and, as it will turn out, this norm is a natural and very suitable choice for our analysis. Indeed, it will turn out that the $H$-norm is closely related to the shape of the set $S$.
Lemma 7.2.2 (Inner ellipsoid) Assume that the function $\phi$ is a self-concordant barrier function and set $H_x := D^2\phi(x)$. Let $x \in S^0$ and $h \in \mathbb{R}^n$ be arbitrary. If $\delta := \|h\|_{H_x} \le 1$ then $x + h \in S$.
in fact holds true for all $t \in I$. To prove the Lemma it suffices to show that the points $\pm\delta^{-1}$ ($\pm\infty$ if $\delta = 0$) are in the domain of $f$ or at its boundary. Here, $\delta^2 = \|h\|_{H_x}^2 = f''(0)$. We consider the function $u(t) := f''(t)$. Note that $u(t) \ge 0\ \forall t \in I$ by convexity of $f$. By finding the poles of $u$ for $t \ge 0$ (by pole we denote a point $\bar t > 0$ where $\lim_{t \to \bar t,\, t < \bar t} u(t) = \infty$) we may determine the domain of $f$. (The case $t \le 0$ follows when replacing $h$ by $-h$ in the definition of $f$.) Let $v$ be the "extremal"
¹ If this assumption is violated, and $S$ does contain a straight line, then the null space of $D^2\phi(x)$ is nontrivial but independent of $x$, and straightforward modifications are possible to generalize the results of this section.
solution of the differential inequality $u' \le 2u^{3/2}$ in (7.11) with the same initial values as $u$ (i.e. as $f''$),
$$v'(t) = 2\, v(t)^{3/2}, \qquad v(0) = \delta^2.$$
We deduce that $u(t) \le v(t)$. (Straightforward, by exploiting the differential inequality (7.11); see for example [21], Theorem 3.1, page 19.) Since $v$ is given by $v(t) = \delta^2 (1 - \delta t)^{-2}$ and has its pole at $t = \delta^{-1}$, the claim follows. $\square$
Observe that in the case that $f''(0) = 0$ it follows from $v(t) \equiv 0$ that $f''(t) = 0\ \forall t$, i.e. the domain of $f$ in (7.6) is $I = \mathbb{R}$, and $S$ contains the straight line $\{x + th \mid t \in \mathbb{R}\}$.
Lemma 7.2.2 was first proved in [28], and simple examples are given there (e.g. the function $f(t) = -\ln t$) that show that the bound $t < \delta^{-1}$ on the maximum feasible step length for $f$ is tight. The above proof is taken from [17]. Note that only scalar inequalities (such as (7.7)) are needed to provide an inner ellipsoid in $n$-dimensional space.
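For the barrier $-\sum_i \ln x_i$ of the positive orthant the inner-ellipsoid statement of Lemma 7.2.2 is easy to test: $H_x = \mathrm{diag}(1/x_i^2)$, so $\|h\|_{H_x} \le 1$ forces $|h_i| \le x_i$ and hence $x + h \ge 0$. A quick randomized check (point and directions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.5, 3.0, size=4)        # interior point of the positive orthant
H = np.diag(1.0 / x**2)                  # Hessian of phi(x) = -sum(ln x_i)

for _ in range(500):
    h = rng.standard_normal(4)
    delta = np.sqrt(h @ H @ h)           # H-norm of h
    h = h / delta * rng.uniform(0.0, 1.0)  # rescale so that ||h||_H <= 1
    assert np.all(x + h >= 0)            # Lemma 7.2.2: x + h stays in S
```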
The close relation of the $H$-norm to the shape of the feasible set $S$ will become more apparent in Section 7.2.2, where it is shown that a small multiple of the inner ellipsoid is an outer ellipsoid for a certain subset of $S$; namely, there exists a small number $\gamma > 1$ such that for any $x \in S^0$, if $\delta = \|h\|_{H_x} \ge \gamma$ then either $x + h \notin S$ or $D\phi(x)h < 0$.
$x + \Delta x \in S^0$ and, with $\delta := \|\Delta x\|_{H_x} < 1$,
$$(1 - \delta)^2\, \|h\|_{H_x}^2 \;\le\; \|h\|_{H_{x+\Delta x}}^2 \;\le\; \frac{\|h\|_{H_x}^2}{(1 - \delta)^2}. \qquad (7.12)$$
Before proving that the finite difference version (7.12) follows from (7.7) we use (7.12) to derive a relative Lipschitz condition. By subtracting $\|h\|_{H_x}^2$ from (7.12) we obtain
$$\bigl|\, \|h\|_{H_{x+\Delta x}}^2 - \|h\|_{H_x}^2 \,\bigr| \;\le\; M(\delta)\, \|h\|_{H_x}^2, \qquad (7.13)$$
where
$$M(\delta) = \frac{1}{(1 - \delta)^2} - 1 = \frac{\delta}{1 - \delta} + \frac{\delta}{(1 - \delta)^2} = 2\delta + O(\delta^2).$$
By choosing $h = \Delta x/\delta$ it follows that $\|h\|_{H_x} = 1$, and when dividing both sides of (7.13) by $\delta$ and taking the limit as $\delta \to 0$ we obtain that $|D^3\phi(x)[h, h, h]| \le 2$, which is just (7.7). The converse direction of showing that (7.7) implies (7.13) is slightly more difficult, as (7.13) involves two vectors $\Delta x$ and $h$ while (7.7) is a scalar inequality.
The following basic lemma due to [28] allows a proof of this implication:
then
$$M[x, y, z]^2 \;\le\; \sigma\, A[x, x]\, A[y, y]\, A[z, z] \qquad \forall\, x, y, z \in \mathbb{R}^n. \qquad (7.14)$$
This Lemma was stated in [28] and proved in [29, 16]. We give the proof of [16]. Our proof will use the following slightly generalized version of the well-known Cauchy-Schwarz inequality.
holds, provided that $M[h, h, h]^2 \le \|h\|_2^6\ \ \forall h \in \mathbb{R}^n$ is true. (The remaining part follows by applying the generalized Cauchy-Schwarz inequality (7.15) for fixed $x$ to $B = M_x$.) Let
$$\gamma := \max\{\, M[x, h, h] \ \ \text{s.t.}\ \|x\|_2 = \|h\|_2 = 1 \,\}$$
and let $\bar x$, $\bar h$ be the (not necessarily unique) corresponding arguments. The necessary conditions for a maximum (or a minimum if $M[\bar x, \bar h, \bar h]$ is negative) imply that
$$M[\,\cdot\,, \bar h, \bar h] = 2\beta\, \bar x, \qquad M[\bar x, \,\cdot\,, \bar h] = \rho\, \bar h,$$
where $\beta$ and $\rho$ are the Lagrange multipliers. From this we deduce that $\beta = \gamma/2$ and $\rho = \gamma$ (by multiplying from the left with $(\bar x^T, 0)$ and $(0, \bar h^T)$) and therefore
Let a self-concordant function $\phi$, a point $x \in S^0$, the Hessian matrix $D^2\phi(x) = H_x$, an arbitrary vector $\Delta x \in \mathbb{R}^n$ with $\delta = \|\Delta x\|_{H_x} < 1$, and an arbitrary vector $h \in \mathbb{R}^n$ be given. From Lemma 7.2.2 it follows that $x + \Delta x \in S^0$.
To evaluate how the $H$-norm of the vectors $\Delta x$ and $h$ changes for different matrices $H_{x+t\Delta x}$, with $t \in [0, 1]$, let us define
$$u(t) := \Delta x^T H_{x+t\Delta x}\, \Delta x, \qquad v(t) := h^T H_{x+t\Delta x}\, h. \qquad (7.16)$$
The change of $v$ can be estimated by its derivative $v'(t)$ using the estimate (7.14). From self-concordance of $\phi$ it follows with (7.14) that
$$\bigl| D^3\phi(y)[p, q, r] \bigr| \;\le\; 2\, \sqrt{p^T H_y\, p}\ \sqrt{q^T H_y\, q}\ \sqrt{r^T H_y\, r},$$
and in particular,
$$|v'(t)| = \bigl| D^3\phi(x + t\Delta x)[\Delta x, h, h] \bigr| \;\le\; 2\, \sqrt{\Delta x^T D^2\phi(x + t\Delta x)\, \Delta x}\ \, h^T D^2\phi(x + t\Delta x)\, h \;=\; 2\, u(t)^{1/2}\, v(t).$$
Bounding $u(t)^{1/2} \le \delta/(1 - t\delta)$ as in the proof of Lemma 7.2.2 and inserting (7.16) in this inequality, we obtain a differential inequality,
$$|v'(t)| \;\le\; \frac{2\delta}{1 - t\delta}\, v(t).$$
The extremal solutions $\bar v$ of
$$\bar v'(t) = \pm\frac{2\delta}{1 - t\delta}\, \bar v(t), \qquad \bar v(0) = v(0),$$
are given by $\bar v(t) = v(0)\,(1 - t\delta)^{\mp 2}$, and thus (again by the comparison theorem for differential inequalities [21])
$$v(0)\, (1 - t\delta)^2 \;\le\; v(t) \;\le\; \frac{v(0)}{(1 - t\delta)^2}.$$
Newton's Method
This section is concerned with answering the first crucial question (about Newton's method for minimizing $\phi$). First note that $\phi$ has a unique minimum $x^*$ if $S$ is bounded. The minimum is called the analytic center of $S$ (Sonnevend [33]). This notation is commonly used but somewhat misleading, since the analytic center does not depend on the set of points $S$, but rather on the barrier function $\phi$ describing $S$.
Next we show that self-concordance of $\phi$ further implies that Newton's method converges "well" when applied to minimize $\phi$. Note that a linear perturbation of $\phi$ does not influence the self-concordance condition (7.7), so that our results below can also be applied to the unconstrained minimization problem
$$\min_x\ \frac{c^T x}{\mu} + \phi(x). \qquad (7.17)$$
For simplicity of presentation we disregard the linear term (i.e. set $\mu = \infty$), and for $x \in S^0$ let
$$\Delta x := -D^2\phi(x)^{-1}\, D\phi(x)^T \qquad (7.18)$$
denote the Newton step for obtaining the next Newton iterate $\bar x := x + \Delta x$. The following Lemma due to [28] states the main result about Newton's method.
For $\|\Delta x\|_{H_x} \le 1/4$ this implies that Newton's method (without line search) is guaranteed to be quadratically convergent with constant at most $16/9$.
Proof: In [28] it is shown that $\phi$ has a minimum. We only prove (7.19) and follow the outline of [28].
(7.20)
The following result completes our intuition on the domain of quadratic convergence
of Newton's method.
Corollary 7.2.5 Under the assumptions of Lemma 7.2.4 let $x^*$ be the minimum of (7.17). Newton's method starting at $x$ is quadratically convergent if $x \in x^* + \frac{1}{5} E(x^*)$, where $E(x^*) = \{z \mid z^T H_{x^*}\, z \le 1\}$.
Proof: Let $h \in \frac{1}{5} E(x^*)$ be given, i.e. such that $\hat\delta := \|h\|_{H_{x^*}} \le \frac{1}{5}$, and set $x(s) := x^* + sh$. For a given $z \in \mathbb{R}^n$ we may define $\kappa(s) := D\phi(x(s))\, z$. We obtain
where the above inequalities follow just as in the proof of Lemma 7.2.4. Setting $z = \Delta x = -D^2\phi(x^* + h)^{-1}\, D\phi(x^* + h)^T$ as the Newton step starting at $x^* + h$, we obtain (again as in the proof of Lemma 7.2.4) that
$$\|\Delta x\|_{H_{x^*+h}} \;\le\; \frac{\hat\delta}{(1 - \hat\delta)^2}.$$
In particular, for $\hat\delta \le 1/5$, it follows that $\|\Delta x\|_{H_{x^*+h}} \le 5/16$, and by relation (7.19) the point $x^* + h$ is in the domain of quadratic convergence. $\square$
The importance of these results on Newton's method is that they do not depend at all on $\mu$ in (7.17) or on the data of the problem (7.2), as long as $\phi$ is self-concordant. Furthermore, for $\mu = \infty$, the above corollary implies that the domain of quadratic convergence is one fifth of the inner ellipsoid of $S$, which in turn is a fixed fraction of the outer ellipsoid of $S$ as we will see in Section 7.2.2.
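The quadratic convergence can be observed on the self-concordant function $\phi(x) = \sum_i (x_i - \ln x_i)$ (a linear perturbation of $-\sum_i \ln x_i$, minimized at $x = \mathbf{1}$), for which the full Newton iteration happens to square the error exactly. The example is an illustration, not taken from the text:

```python
import numpy as np

x = np.full(3, 0.8)                      # start inside the quadratic-convergence region
errs = []
for _ in range(5):
    g = 1.0 - 1.0 / x                    # gradient of phi(x) = sum(x_i - ln x_i)
    H = np.diag(1.0 / x**2)              # Hessian
    dx = -np.linalg.solve(H, g)
    x = x + dx                           # full (undamped) Newton step
    errs.append(float(np.max(np.abs(x - 1.0))))  # minimizer is x = 1

# quadratic convergence: the error is squared at each step
# (here exactly, since 1 - x_new = (1 - x)^2 for this phi)
for e_prev, e_next in zip(errs, errs[1:]):
    assert e_next <= 1.01 * e_prev**2 + 1e-30
```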
Without proof we further quote the following result of [28]. If $\|\Delta x\|_{H_x} < 1/3$ then the distance of $x$ to $x^*$ is bounded as follows:
(7.21)
The previous results depend on the $H$-norm of the Newton step for minimizing $\phi$ starting at $x$. Clearly, the $H$-norm is a continuous function of $x$, and as will be shown in Section 7.2.2, it is uniformly bounded for the examples considered here. Nevertheless, the requirement that the starting point $x$ of Newton's method be close to the center in the sense that $\|\Delta x\|_{H_x} < 1$ can often not be satisfied in practice. For $\|\Delta x\|_{H_x} > 1$, the result $x + \Delta x$ of the Newton step may lie outside $S$. The following simple step length rule for a line-search, however, provides a favorably damped version of Newton's method.
Let a search direction $\Delta x$ be given with $D\phi(x)\Delta x < 0$; for example, $\Delta x$ may be the Newton direction (7.18) for $\phi$. We are interested in a search step $s$ such that $\phi(x + s\Delta x)$ is as small as possible (the line-search problem). For this purpose define $f(t) = \phi(x + t\Delta x)$ as in (7.6), and denote the minimizer of $f$ by $t^*$ and the Newton step for $f$ by $\Delta t = -f'(0)/f''(0)$. A possible rule for the search step $s$, referred to as the reduced Newton step for $f$, and for which [28] proved global convergence, is the following:
$$s = \frac{\Delta t}{1 + |\Delta t|\, \sqrt{f''(0)}}. \qquad (7.22)$$
(Note that the Newton step for $f$ refers to a one-dimensional Newton's method that is not to be confused with the $n$-dimensional Newton direction for $\phi$.)
Lemma 7.2.6 (see [18]). Assume the barrier function $\phi$ for $S$ is self-concordant. A line-search for $\phi$ starting at $x$ in direction $\Delta x$ using the reduced Newton step $s$ (7.22) is monotonically convergent, i.e. $s$ is "a little too short", $s \le t^*$. Further, $s$ is the largest step possible that is guaranteed to satisfy $s \le t^*$ if all that is known are the first two derivatives of $f$ at the current point $t = 0$.
Proof:
We first prove monotonicity of the line-search. Let $a = f'(0) < 0$, $b = f''(0) > 0$, and let $t^*$ be the zero of $g(t) := f'(t)$. We show that $s \le t^*$ holds true by writing the differential inequality (7.7) (self-concordance of $f$) in terms of the function $g = f'$. We obtain
$$g(0) = a, \qquad g'(0) = b, \qquad g''(t) \le 2\, g'(t)^{3/2}. \qquad (7.23)$$
The "extremal" solution $v$ of (7.23) that solves the initial value problem
On the other hand, since $g = v$ is possible, any step larger than $s$ may exceed $t^*$. $\square$
Observe that if $\Delta x$ is the Newton direction (7.18) of $\phi$ at $x$, then the reduced Newton step in direction $\Delta x$ simply yields $\Delta x/(1 + \|\Delta x\|_{H_x})$.
In spite of the strong result of Lemma 7.2.6, it turns out that the reduced Newton
step may be much too short in practice (Lemma 7.2.6 merely gives a worst-case
estimate!) and a practical implementation definitely should not rely on the reduced
Newton step; instead it might use a line search along the Newton direction, for
example.
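The rule (7.22) can be exercised on the restriction of $\phi(x) = x - \ln x$ to a line, where $t^*$ is known in closed form; the iterated reduced Newton step approaches $t^*$ from below, while each individual step is much shorter than the remaining distance, illustrating both Lemma 7.2.6 and the practical caveat above (the example function is an illustrative assumption):

```python
import math

def reduced_newton_linesearch(f1, f2, t=0.0, iters=200):
    """Iterate the reduced Newton step s = dt / (1 + |dt| sqrt(f'')) of (7.22)
    on the line-search problem min_t f(t), given f' and f''."""
    for _ in range(iters):
        a, b = f1(t), f2(t)
        dt = -a / b                          # one-dimensional Newton step
        t = t + dt / (1.0 + abs(dt) * math.sqrt(b))
    return t

# phi(x) = x - ln x restricted to the line x = 0.2 + t; minimizer at x = 1,
# so the exact line-search optimum is t* = 0.8
f1 = lambda t: 1.0 - 1.0 / (0.2 + t)         # f'(t)
f2 = lambda t: 1.0 / (0.2 + t) ** 2          # f''(t)

t_star = 0.8
t = reduced_newton_linesearch(f1, f2)
assert t <= t_star + 1e-9                    # "a little too short": never overshoots
assert abs(t - t_star) < 1e-6                # but the iteration does converge to t*
```

The very first step here is only about 0.09 while the remaining distance is 0.8, which is why a practical implementation prefers a genuine line search.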
Similarly, we may link the second crucial question of Section 7.1 (about the "distance" $c^T x(\lambda) - \lambda_{\mathrm{opt}}$) to ellipsoidal approximations of $S(\lambda)$ around the analytic centers $x(\lambda)$. If $\phi(\,\cdot\,,\lambda)$ is self-concordant for fixed $\lambda$, the results of the previous section imply that
$$E(\lambda) := \{z \mid z^T H_{x,\lambda}\, z \le 1\}$$
is an inner ellipsoid for $S(\lambda)$ in the sense that $x(\lambda) + E(\lambda) \subset S(\lambda)$. (Here, by $H_{x,\lambda}$ we denote the Hessian $D_x^2\phi(x,\lambda)$ of $\phi(\,\cdot\,,\lambda)$.) If there exists a small number $\gamma > 1$ such that $\gamma E(\lambda)$ yields an outer ellipsoid for $S(\lambda)$, i.e. $S(\lambda) \subset x(\lambda) + \gamma E(\lambda)$, then we may conclude that
It turns out that a second property of the log function is needed to allow such ellipsoidal approximations. (Self-concordance by itself is not sufficient, as can easily be seen from the example $\phi(t) := -\ln t - \sigma \ln(1 - t) : (0, 1) \to \mathbb{R}$; this function is self-concordant for $\sigma \ge 1$, but the inner ellipsoids around the minimizer $\frac{1}{1+\sigma}$ become smaller and smaller as $\sigma \to \infty$.)
A naive derivation of a second property needed to provide outer ellipsoids can again be obtained from the function $-\ln t$. To prevent the minimum of $\phi$ from being close to the boundary of the domain of $\phi$ one might impose a condition that bounds the growth of $\phi$ (i.e. that bounds $D\phi(x)$) relative to the canonical norm $\|\cdot\|_{H_x}$. For the function $\phi(t) := -\ln t : \mathbb{R}_+ \to \mathbb{R}$ we may observe that
$$\phi'(t)^2 \;\le\; \phi''(t) \qquad \forall t > 0$$
holds true. More generally we assume a second differential property of $\phi$:
$\phi$ is called a $\theta$-self-concordant barrier function if for all $x \in S^0$ and all $h \in \mathbb{R}^n$ the function $f$ satisfies (7.7) and
$$f'(0)^2 \;\le\; \theta\, f''(0). \qquad (7.24)$$
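For $\phi(t) = -\ln t$ condition (7.24) holds with $\theta = 1$ (with equality), and rescaling to $-\theta \ln t$ scales the parameter accordingly; a minimal check:

```python
import math

# phi(t) = -ln t:  phi'(t)^2 = 1/t^2 = phi''(t), so (7.24) holds with theta = 1
for t in [0.01, 0.5, 1.0, 7.0]:
    d1, d2 = -1.0 / t, 1.0 / t**2
    assert math.isclose(d1**2, d2)
    # -theta*ln t has derivatives theta*d1 and theta*d2, and
    # (theta*d1)^2 = theta * (theta*d2): parameter theta
    theta = 3.0
    assert math.isclose((theta * d1) ** 2, theta * (theta * d2))
```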
Remark: This definition coincides with the one given in [29]. In some cases, however, we need to refer to property (7.24) independently of (7.7). We therefore allocate an extra name and refer to (7.24) (without assuming that (7.7) holds) as $\theta$-self-limiting. The number $\sqrt{\theta}$ may be interpreted as a local Lipschitz constant for $\phi$ (or $f$), where the change in the argument $x$ is measured in the $H$-norm.
Note that just as (7.7), also (7.24) is assumed to hold only for the argument $t = 0$ since this is easier to verify. However, as before, (7.24) in fact holds true for all $t \in I$. Note further that (in contrast to (7.7)) condition (7.24) is not invariant when adding a linear perturbation $c^T x$ to $\phi$. (Such a perturbation is used below in the logarithmic barrier approach.)
Equivalent Formulations
As before for (7.7), there are also other equivalent formulations for (7.24).
(The proof that this is an equivalent condition is trivial.) Note that $w(x) > 0$ for $x \in S^0$, and $w$ can be extended continuously to the boundary of $S$ by setting $w(x) = 0$ for $x \in \partial S$. The resulting function is related to the multiplicative barrier function in [14].
The Newton step allows another equivalent formulation of the condition (7.24), namely the condition that the Newton step (7.18) is to satisfy
$$\|\Delta x\|_{H_x}^2 \;\le\; \theta. \qquad (7.25)$$
The derivation of this formulation is also straightforward, e.g. by applying the KKT theorem to $\max\{\, D\phi(x)h \mid h^T D^2\phi(x)h \le 1 \,\}$. Observe that this formulation uses our assumption that $\phi$ is strictly convex, i.e. that the Newton step exists at all, while the previous two formulations are slightly more general.
Condition (7.25) is remarkable, as for $\|\Delta x\|_{H_x} < 1$ Lemma 7.2.4 about Newton's method is applicable.
Some Examples
Let us briefly verify that the above examples (7.8)-(7.10) satisfy (7.24) as well.
• Summation. Similarly, we observe that if $\phi_1, \phi_2$ satisfy (7.24) for some self-concordance parameters $\theta_1, \theta_2$, then so does $\phi_{1,2} := \phi_1 + \phi_2$ with self-concordance parameter $\theta_{1,2} = \theta_1 + \theta_2$ (as long as the intersection of the domains of $\phi_1$ and $\phi_2$ is not empty).
(Straightforward.)
Properties
We note here that for $\theta < 1$ there is no function that satisfies both (7.7) and (7.24) (except for constant functions). This is proved in Lemma 7.2.12 below.
One of our main concerns in our derivation of (7.24) was the desire for an outer
ellipsoid. The following lemma shows that (7.24) indeed provides such an ellipsoid.
Proof: We show that the points $\pm d(\theta + 2\sqrt{\theta})$ are not feasible for $f = f_{x^*,h}$ in (7.6), where $d := 1/\sqrt{f''(0)}$. We consider the functions $g(t) = f'(t)$ and $u(t) = f''(t)$. To determine the domain of $f$ we investigate the poles of $g$ for $t \ge 0$. By (7.24), $g$ is a solution of
$$g(t)^2 \le \theta\, g'(t), \qquad g(0) = 0, \quad g'(0) = f''(0) > 0. \qquad (7.26)$$
Because of the initial values, the inequality $g^2 \le \theta g'$ is "inactive" near $t = 0$. For small values of $t \ge 0$ we therefore apply inequality (7.7) again. Let
$$\bar t := d\sqrt{\theta}.$$
If $f$ is not defined at $\bar t$ there is nothing to show. Hence we assume it is, and conclude analogously as in the proof of Lemma 7.2.2 that (7.7), i.e. $g''(t) \ge -2g'(t)^{3/2}$, implies
$$g'(t) \;\ge\; w(t) := (d + t)^{-2}$$
(since $w$ satisfies $w'(t) = -2\,w(t)^{3/2}$ and has the same initial value at $t = 0$). With the variable $\hat t := t - \bar t$ and $\hat g(\hat t) := g(\bar t + \hat t) = g(t)$, relation (7.26) implies
$$\hat g(\hat t)^2 \le \theta\, \hat g'(\hat t), \qquad (7.27)$$
$$\hat g(0) = \int_0^{\bar t} g'(\tau)\, d\tau \;\ge\; \int_0^{\bar t} w(\tau)\, d\tau = \frac{\sqrt{\theta}}{d\,(1 + \sqrt{\theta})} =: d_1,$$
and
$$\hat g'(0) = g'(\bar t) \;\ge\; w(\bar t) = d^{-2}(1 + \sqrt{\theta})^{-2} =: d_2.$$
Observe that $d_1^2 = \theta\, d_2$. It follows that $\hat g(\hat t) \ge s(\hat t)$, where $s$ satisfies
$$s(\hat t)^2 = \theta\, s'(\hat t) \quad \text{and} \quad s(0) = d_1.$$
The function $s$ is given by $s(\hat t) = (d_1^{-1} - \hat t/\theta)^{-1}$, and has its pole at $\hat t = \theta\, d_1^{-1}$. The corresponding value of $t$ is $t = \bar t + \hat t = d(\theta + 2\sqrt{\theta})$. By construction, $s(\hat t) \le \hat g(\hat t)$, so that the pole of $g$, and hence of $f$, must lie before this point. $\square$
Note that in the proof of Lemma 7.2.8, the function $w$ can be continued beyond the point $\bar t$ in such a way that $w = s'$ for $t > \bar t$. The second integral $W$ of $w$ is twice continuously differentiable, and satisfies the relations (7.7) and (7.24) almost everywhere (except at $\bar t$). This shows that the bound $d(\theta + 2\sqrt{\theta})$ obtained from $W$ cannot be improved for general self-concordant functions.
We point out that the proof of Lemma 7.2.8 may be generalized to yield an outer ellipsoid centered at other points $\hat x \ne x^*$ if the corresponding function $f$ satisfies $d\, f'(0) \le \alpha < 1$ independently of $h$. (In this case of course the constants will change.)
An immediate consequence of Lemma 7.2.2 and Lemma 7.2.8 is the following Theorem:
Theorem 7.2.9 Let $\phi$ be a $\theta$-self-concordant barrier function for $S$ and let the ellipsoid $E(x) = \{h \mid h^T D^2\phi(x)h \le 1\}$ be defined by the Hessian of $\phi$ at $x$. For all $x \in S^0$ we have
$$x + E(x) \subset S,$$
and if the minimum $x = x^*$ of $\phi$ exists we further have
$$x^* + (\theta + 2\sqrt{\theta})\, E(x^*) \supset S.$$
A two-sided ellipsoidal approximation of this type is also proved in [29], with a slightly larger ratio of inner and outer ellipsoid.
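Theorem 7.2.9 can be tested on the cube $[0,1]^n$ with the barrier $-\sum_i (\ln x_i + \ln(1 - x_i))$, which has parameter $\theta = 2n$ and analytic center $x^* = (1/2, \ldots, 1/2)$; the dimension and sample sizes below are arbitrary illustrative choices:

```python
import numpy as np

n = 4
theta = 2 * n                                 # parameter of -sum(ln x + ln(1-x)) on [0,1]^n
xc = np.full(n, 0.5)                          # analytic center of the cube
H = np.diag(1.0 / xc**2 + 1.0 / (1.0 - xc) ** 2)  # Hessian at the center (= 8 I)

rng = np.random.default_rng(4)
corners = rng.integers(0, 2, size=(50, n)).astype(float)  # random vertices of the cube
for v in corners:
    d = v - xc
    # outer ellipsoid: the extreme points of S (hence all of S, by convexity)
    # lie within theta + 2*sqrt(theta) of the center in H-norm
    assert np.sqrt(d @ H @ d) <= theta + 2 * np.sqrt(theta)

for _ in range(200):
    h = rng.standard_normal(n)
    h /= np.sqrt(h @ H @ h)                   # ||h||_H = 1: boundary of the inner ellipsoid
    assert np.all(xc + h >= 0) and np.all(xc + h <= 1)  # inner ellipsoid sits inside S
```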
then
$$x + E(x) \subset S,$$
and
$$x + (\theta + 2\sqrt{\theta})\, E(x) \supset S \cap \mathcal{H}.$$
As pointed out above, the two-sided ellipsoidal approximations proved above can be used to find an estimate for our second question regarding the distance $\lambda - c^T x(\lambda)$ compared to $c^T x(\lambda) - \lambda_{\mathrm{opt}}$. However, for $\kappa > 1$ in (7.5) the resulting answer is not optimal. Below we list a stronger estimate taken from [11]. (The result in [11] is slightly more general than what is needed to answer the second "crucial question" in Section 7.1.2.)
Lemma 7.2.11 Let the interior of the set $S(\lambda)$ in (7.4) be nonempty and bounded. Let $\lambda_{\mathrm{opt}} = \min\{c^T x \mid x \in S(\lambda)\}$ (as before) and let $\phi$ be a $\theta$-self-limiting barrier function for $S$. Let further $\kappa \ge 1$ be a constant and
Then:
$$c^T x(\lambda) - \lambda_{\mathrm{opt}} \;\le\; \frac{\theta}{\kappa}\, \bigl( \lambda - c^T x(\lambda) \bigr). \qquad (7.29)$$
Proof. Let $x = x(\lambda)$, and let $x_{\mathrm{opt}}$ be an optimal solution in $S(\lambda)$ with $c^T x_{\mathrm{opt}} = \lambda_{\mathrm{opt}}$. Define $h := x_{\mathrm{opt}} - x$. We consider the function $f$ of (7.6) with the above $x$ and $h$. Obviously, "1" is a boundary point of the domain $I$ of $f$. Note that $f'(0) = -\kappa\, \frac{c^T h}{\lambda - c^T x} > 0$. (This follows since the function $\tilde f(t) := \phi(x + th) - \kappa \ln(\lambda - c^T(x + th))$ has a minimum at $t = 0$.) We set again $g = f'$ and use (7.24):
$$g(t)^2 \le \theta\, g'(t), \qquad g(0) = f'(0).$$
As before, the extremal solution $(f'(0)^{-1} - \theta^{-1} t)^{-1}$ of this differential inequality is a lower bound for $g$. Since $t = 1$ must be in the domain of the extremal solution (or at its boundary) it follows that $f'(0)^{-1} - \theta^{-1} \ge 0$, or $f'(0) \le \theta$. But this is just the claim to be shown. $\square$
We now direct our effort to determining for which types of constraints the logarithmic barrier function (7.3) is "optimal".
Optimal Barriers
The previous section illustrated the importance of self-concordance of a barrier with a small parameter $\theta$. Here, we are interested in finding the "best" (with minimal $\theta$) barrier of a convex set $S$.
Solving Convex Programs 277
For any closed convex set $S \subset \mathbb{R}^n$ with nonempty interior there exists a universal $\theta = O(n)$-self-concordant barrier function. If $S$ does not contain a straight line, the universal barrier function is given by
and since the area of the triangle formed by the negative orthant and the line through $(-1/x_1, 0)$ and $(0, -1/x_2)$ is $1/(2x_1 x_2)$, this simplifies to $-\ln 2 - \ln(x_1) - \ln(x_2)$, i.e. we obtain the standard logarithmic barrier.
One might suspect that $-\ln(x_1) - \ln(x_2)$ is "optimal" (with respect to $\theta$) for this set, and indeed, if there were a better barrier function $\phi^*$, say with self-concordance parameter $\theta < 2$, then for $x \in \mathbb{R}^{2n}$ one could construct the barrier function
with self-concordance parameter $n\theta$. For large $n$ it follows that $n\theta + 2\sqrt{n\theta} < 2n - 1$, and this contradicts the ellipsoidal approximation of inner and outer ellipsoid by a ratio of $1 : n\theta + 2\sqrt{n\theta}$ (since by preapplying the affine mapping $\mathcal{A} : \mathbb{R}^{2n-1} \to \mathbb{R}^{2n}$ with
$$y \;\mapsto\; \Bigl( y,\ 1 - \sum_{i=1}^{2n-1} y_i \Bigr),$$
the function $\phi^{**}(\mathcal{A}(\,\cdot\,))$ becomes an $n\theta$-self-concordant barrier for the $(2n-1)$-dimensional simplex $\{\, y_i \ge 0,\ \sum_{i=1}^{2n-1} y_i \le 1 \,\}$, and inner and outer ellipsoid cannot approximate the simplex in $\mathbb{R}^{2n-1}$ with a ratio that is better than $1 : 2n - 1$).
Lemma 7.2.12 Any self-concordant barrier function for a convex set $S$ has self-concordance parameter at least $k$ if there exists an affine subspace $U$ such that $S \cap U$ contains a vertex at which precisely $k$ linearly independent smooth constraints are active.
As a corollary we obtain that $-\ln\det X$ is an optimal barrier function for the positive definite cone, as one might choose the linear subspace $U$ that fixes all off-diagonal elements of a matrix $X$ to zero; of course, the diagonal elements $X_{ii}$ of a positive definite diagonal matrix $X$ must satisfy $X_{ii} > 0$, so that there are precisely $n$ linearly independent constraints active in $U$ at $X = 0$.
A direct derivation of this lower bound for barrier functions is given in [29] for
polyhedra, and that result is easily generalizable to the case of nonlinear constraints
as well.
We used the quality of the ellipsoidal approximations to prove the lower bound on $\theta$, and since this bound is sharp (as there exist barrier functions that attain this bound), our derivation also implies that asymptotically, for large $\theta$, the ellipsoidal approximations of the sets $S$ are optimal. Nevertheless, for special classes of barrier functions (like $-\sum \ln x_i$ or $-\ln\det X$), the ratio of the ellipsoidal approximations can be improved to $1 : \theta - 1$; see e.g. [33, 5].
Above we have seen that the logarithm of a single linear or convex quadratic constraint is optimal. Further, it is straightforward to verify that the logarithm of a single linear constraint is the unique (up to an additive constant) optimal barrier function. In contrast, a convex quadratic constraint $q(x) \le 0$ has more than one optimal barrier function; for example, if $\hat x = \operatorname{argmin}\, q(x)$ exists (and $q(\hat x) < 0$), then $-\frac{q(x)}{q(\hat x)} - \ln(-q(x))$ is also a $\theta = 1$-self-concordant barrier function for the set $\{x \mid q(x) \le 0\}$. (The proof is straightforward.)
Finally we point out that the "optimal" self-concordance parameter $\theta$ is not a smooth function of the constraints. The set $\{x \mid \|x\|_2 \le 1,\ x_1 \le 1 + \epsilon\}$ has a $\theta = 1$-self-concordant barrier function for $\epsilon \ge 0$, and for $-2 < \epsilon < 0$ this set has a vertex at which precisely two linearly independent constraints are active, so that $\theta \ge 2$ must hold.
Solving Convex Programs 279

Another aspect of the "optimality" of a barrier function that was disregarded in this section is the cost of evaluating φ and its derivatives, which is an important issue when looking for an implementable barrier function. Certainly, the barriers for linear, convex quadratic, or semidefinite constraints given in Section 7.2.2 are "good" in the sense that their derivatives may be computed at a reasonable cost, and they are optimal with respect to θ. This optimality may be lost when forming intersections of the constraints by adding the barrier functions, but as long as the number of constraints is moderate these barrier functions seem appropriate.
For problems with very many constraints, the volumetric barrier function, see e.g. [2, 3], may be better suited. For further examples of "good" and implementable barrier functions besides the ones listed in this chapter we refer to [29], pp. 147-202, where, for example, barrier functions for the epigraph of the matrix norm and for the second-order cone are listed.
Let α ≥ 1 be fixed and let ζ be some function with |ζ'''(t)| ≤ −3αζ''(t)/t for all t > 0. (For example, α = 1 and ζ(t) := t^p for some 0 < p < 1.) Then the function

(7.30)

is a θ = 2α²-self-concordant barrier function for the set {(x, y) | x ≥ 0, y ≤ ζ(x)} ⊂ ℝ².
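For the example ζ(t) = t^p, the condition above can be checked in closed form: ζ''(t) = p(p − 1)t^{p−2} and ζ'''(t) = p(p − 1)(p − 2)t^{p−3}, so |ζ'''(t)| ≤ −3αζ''(t)/t reduces to 2 − p ≤ 3α, which holds for every 0 < p < 1 and α ≥ 1. A small numeric spot-check, with illustrative sample values:

```python
def d2(t, p):
    # zeta''(t) for zeta(t) = t**p:  p*(p-1)*t**(p-2), negative for 0 < p < 1
    return p * (p - 1) * t ** (p - 2)

def d3(t, p):
    # zeta'''(t) = p*(p-1)*(p-2)*t**(p-3)
    return p * (p - 1) * (p - 2) * t ** (p - 3)

alpha = 1
for p in (0.1, 0.5, 0.9):
    for t in (0.01, 1.0, 100.0):
        # the third-derivative condition |zeta'''| <= -3*alpha*zeta''/t
        assert abs(d3(t, p)) <= -3 * alpha * d2(t, p) / t
```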
1. {(x, y) | y ≥ e^x},
3. {(x, y) | y ≥ x^p, x > 0} for some p ≤ −1, or

1. −ln(ln(y) − x) − ln y,
2. −ln(y^{1/p} − x) − ln y,
4. −ln(y − x ln(x)) − ln x.
Another example is the set {(X, t) | X ∈ ℝ^{p×q}, t > 0, ||X||_2 ≤ t} with the (q + 1)-self-concordant barrier function

−ln det(t²I − XᵀX) + (q − 1) ln t.    (7.31)

Here, I is the q × q identity matrix, and ||·||_2 is the lub_2 norm, ||X||_2 = sup_{h≠0} ||Xh||_2 / ||h||_2. (Note that since ||X||_2 = ||Xᵀ||_2 we may assume q ≤ p.)
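A numeric illustration of (7.31) on random sample data (an assumption of this sketch): the matrix t²I − XᵀX is positive definite exactly when ||X||_2 < t, so the barrier is finite on the interior of the set.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 5, 3
X = rng.standard_normal((p, q))
t = np.linalg.norm(X, 2) + 1.0              # strictly inside the domain ||X||_2 < t

M = t ** 2 * np.eye(q) - X.T @ X            # positive definite iff ||X||_2 < t
assert np.all(np.linalg.eigvalsh(M) > 0)

phi = -np.linalg.slogdet(M)[1] + (q - 1) * np.log(t)   # the barrier (7.31)
print(phi)
```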
Compatibility
We conclude this section with a criterion that allows us to treat a nonlinear convex objective function c : S → ℝ as a constraint by introducing an (n+1)-st variable x_{n+1} and the additional constraint c(x) − x_{n+1} ≤ 0. The criterion guarantees that the resulting barrier function −ln(x_{n+1} − c(x)) + φ(x) is a self-concordant barrier function for the set

S⁺ := {(x, x_{n+1}) | x ∈ S, c(x) − x_{n+1} ≤ 0}.
Proposition 7.2.13 Let φ be some θ-self-concordant barrier function for the set S ⊂ ℝⁿ, and let c : S → ℝ be some smooth convex function. For x ∈ S° and h ∈ ℝⁿ let

f_1(t) := φ(x + th) and f_2(t) := c(x + th).

The function c is called β-compatible with φ for some β ≥ 1 iff for all x and h as above,

f_2'''(0) ≤ 3β f_2''(0) √(f_1''(0)).

If c is β-compatible with φ then the function

−β² ln(x_{n+1} − c(x)) + β² φ(x)

is a self-concordant barrier function for the set S⁺ with parameter β²(1 + θ).
The proof of this proposition follows from Theorem 3.1 in [30] by setting Γ := S⁺, G := ℝ₊ (= {t ∈ ℝ | t ≥ 0}), F(t) := −ln t, Φ(x, x_{n+1}) := φ(x), and A(x, x_{n+1}) := x_{n+1} − c(x).
Let φ be a θ-self-concordant barrier function for S, not necessarily of the form (7.3). Under the assumptions of Section 7.1.1, let a point x^(0) ∈ S° and some number λ⁰ > λ_opt be given such that

Here, Δx^(0) is the Newton step −D²φ(x^(0), λ⁰)⁻¹ Dφ(x^(0), λ⁰)ᵀ, H_{x,λ} = D²_x φ(x, λ), and φ(x, λ) is given as in (7.5) with κ = 0.
Algorithm 1
5. k := k + 1; go to 2.
282 CHAPTER 7
Convergence Analysis
(This means that the iterates remain "close" to the centers x(λ_k).)
We analyze the algorithm step by step.
Step 3: By (7.21) and (7.12) and the above result we may conclude that

(7.33)

Therefore the stopping criterion in step 3 guarantees c^T x^(K) − λ_opt < ε if K is the index k at which the method terminates.
Step 4: Relation (7.33) implies that the gap λ_k − λ_opt between the upper bound λ_k for c^T x^(k) and the (unknown) optimal value λ_opt is reduced by a factor of at least 1 − u/2 in step 4.
To complete our induction we verify that after the update of λ_{k+1} in step 4, the iterate x^(k+1) again satisfies

(7.34)
by the result of step 2. Using this we may continue with the triangle inequality,
For the above choice of κ and u = 1/(8√θ), these relations imply that at each iteration the distance λ_k − λ_opt is reduced by a factor 1 − 1/(16√θ), and from this it is straightforward to derive that the number K of iterations until the algorithm stops is bounded by

K ≤ 18√θ ln((λ⁰ − λ_opt)/ε).

Each iteration involves the computation of the functions r_i and their first and second derivatives, as well as the solution of a linear system in ℝⁿ. Since 12/16 > ln 2, we may also conclude that λ_k − λ_opt is reduced by a factor ½ after at most 12√θ iterations, validating the claim at the beginning of this chapter.
The above algorithm assumes that an initial point close to the center of some sublevel set S(λ) is given. This assumption is typically not satisfied in practice, and a phase-1 algorithm is necessary to find such a point. Moreover, in many real-world linear programs, for example, the interior of the feasible set is empty. Next we sketch an infeasible method that can solve problem (7.2), or even a slightly more general problem, in a single phase, even if the feasible set has empty interior.
Our presentation differs from infeasible interior-point methods for linear or convex quadratic programs, since for programs of the form (7.2) or (7.35) (below) the situation is more difficult than in the case of linear programs. A restriction r_i of (7.2), for example, may be convex in the interior of S and undefined or nonconvex outside S (while linear functions, of course, are also linear outside S).

min c^T x
subject to r_i(x) ≤ 0 for 1 ≤ i ≤ m,
Ax = b,    (7.35)

where A ∈ ℝ^{k×n} with k < n has maximum rank. For each i we assume that a point x_i is known such that r_i(x_i) < 0. We assume that

φ_i(x) := −ln(−r_i(x))
The last assumption is needed in our analysis to guarantee the existence of the points x(μ) below. It is always satisfied, for example, if φ_i(x) is θ-self-limiting for some θ ≥ 1. If the convexity assumption is violated, or if nonlinear equality constraints are present, some modification like a trust region interior point method may still be applicable. The knowledge of the points x_i, however, is a basic assumption, which is necessary if we do not assume anything about the constraints r_i at points where r_i(x) ≥ 0. Often the points x_i can all be chosen the same, x_i = x^(0), in which case the scheme below simplifies substantially.
By construction, x^(0) ∈ S(1)° (the relative interior of S(1)). For each fixed μ ∈ [0,1] the logarithmic barrier function⁴ φ_μ is convex for x ∈ S(μ)°, and in particular, S(μ) is closed and convex. Moreover, x^(0) is in dom φ_i.
⁴ The subscript of φ is ambiguous, but whenever there is a chance of confusion between the real subscript μ ∈ [0,1] and the integer subscript i we will specify the subscript explicitly.
Clearly, S(0) is the feasible set of (7.35). Since S⁺ is convex and S(1)° ≠ ∅, it follows that for μ ∈ (0,1) the domain of φ_μ is not empty if S(0) is not empty. One might therefore try to follow the points x(μ) from μ = 1 to μ = 0. We note that in the absence of the shifts (i.e. when x_i = x^(0) for all i) and in the absence of the linear constraints, the points x(μ) coincide with the points x(λ) of the method of centers. The relation of μ and the corresponding parameter λ was examined in [34], for example. We further note that we do not assume that the feasible set S(0) of (7.35) has nonempty interior; for our purposes it is sufficient that S(μ)° is nonempty for μ ∈ (0,1).

Proposition: Under the assumption that the set S_opt of optimal solutions of (7.35) is nonempty and bounded, the points x(μ) exist for μ > 0.
Assume that φ_μ(x(μ) + σd) is bounded (from above) for all σ > 0. We distinguish two cases: either c^T d ≤ 0 or c^T d > 0.
S⁺. Since the three points (x(0), 0), (x(μ̄) + σ̄d, μ̄), and (x(μ) + ½σ̄d, μ) are collinear, this leads to a contradiction to the convexity of S⁺. Hence, (x(0) + σ̄d, 0) ∈ S⁺. But this implies that x(0) + σd is feasible for (7.35) for all σ > 0. Since x(0) = x_opt is optimal for (7.35) and c^T d ≤ 0, the points x(0) + σd are optimal as well (and in fact c^T d = 0), but this contradicts the boundedness of S_opt. □
For a given μ > 0 and a given x ∈ S(μ), first approximate the minimum of φ_μ by a sequence of Newton steps with line search:

(7.38)

(For simplicity we restrict this presentation to plain Newton's method with line search.) If the barrier function φ_{μ=0} is self-concordant, then so are the functions φ_μ for μ ∈ [0,1], and a possible stopping test for Newton's method might be to stop whenever the H-norm of the Newton step is less than 1/2, for example. (In this case we know that the minimum of φ_μ exists.)
If, on the other hand, some unbounded direction is found during the line search for Newton's method, then the minimum of φ_μ does not exist, and by the above proposition either the set of optimal solutions is unbounded or there is no (finite) optimal solution. Likewise, if the domain of φ_μ "collapses" before μ reaches 0, it follows that the domain of (7.35) is empty.
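The corrector loop can be sketched on a one-dimensional instance (all data below are illustrative, not from the text): minimize φ_μ(x) = cx/μ − ln(1 − x) − ln(1 + x) by damped Newton steps with a backtracking line search, stopping when the H-norm ||Δx||_H = √(φ_μ''(x)) |Δx| drops below 1/2.

```python
import math

def corrector(c, mu, x=0.0):
    # phi_mu(x) = c*x/mu - ln(1 - x) - ln(1 + x) on the interval (-1, 1)
    def f(x):
        return c * x / mu - math.log(1 - x) - math.log(1 + x)
    def g(x):  # first derivative
        return c / mu + 1 / (1 - x) - 1 / (1 + x)
    def h(x):  # second derivative, positive on (-1, 1)
        return 1 / (1 - x) ** 2 + 1 / (1 + x) ** 2
    while True:
        dx = -g(x) / h(x)                       # Newton step
        if math.sqrt(h(x)) * abs(dx) < 0.5:     # H-norm stopping test
            return x + dx
        a = 1.0                                  # backtracking line search
        while not (-1 < x + a * dx < 1) or f(x + a * dx) > f(x):
            a /= 2
        x += a * dx

x_star = corrector(c=1.0, mu=0.1)
print(x_star)
```

At the returned point the optimality condition c/μ ≈ 2x/(x² − 1) of this toy instance is nearly satisfied; the damping is only needed far from the minimizer, exactly as in the full-dimensional method.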
Next, we explain the predictor step. Given a sufficiently close approximation x^(k) to x(μ̄) for some μ̄ ∈ (0,1], the predictor step follows the tangent x′(μ) in the direction of μ = 0 while maintaining feasibility with respect to φ_μ. It turns out that even though (most likely) the current iterate x^(k) does not lie on the curve x(μ), there is some other curve through x^(k) leading to the set of optimal solutions. Let g = Dφ_μ(x^(k))^T be the gradient of φ_μ at x^(k). The points

also form a smooth curve leading from x^(k) to some point in S_opt, and its tangent can be computed analytically. Differentiating Ax(μ) = b + μb̄ with respect to μ yields Ax′(μ) = b̄, and differentiating Dφ_μ(x(μ))^T ≡ g − y(μ)^T A with respect to μ yields a second linear equation for x′(μ), namely (the first block row of)

(7.39)

The predictor step follows the linear ray x(μ) := x^(k) + (μ̄ − μ)x′(μ̄). The next value of μ is chosen large enough (μ < μ̄) such that x(μ) ∈ S(μ)°. More precisely, let
Algorithm 2
Comments: Note that the above algorithm can be applied to a linear program in both its primal and its dual form, resulting in an infeasible primal or dual algorithm. We stress that primal-dual predictor-corrector methods have been extremely successful in the recent past, see e.g. [23, 24, 20]. While the purely primal (or dual) methods outlined above have received less attention, they are easy to implement, and we believe that they may also be efficient if implemented with the same care as the implementations of primal-dual methods. Apart from an efficient solver for the systems of the form (7.38), (7.39), exploiting sparsity structure, symmetry and quasi-definiteness, crucial features of an implementation typically include the choice of the starting points x_i and x^(0) (such that |r_i(x_i)|/||Dr_i(x_i)|| ≈ ||d_i|| + ||b̄||), the step length μ_+ in (7.40) (shorter in the initial stage of the algorithm and longer towards the final iterations), as well as suitable modifications of Newton's method.
1. b̄ = 0,
2. x_i = x^(0) for all i,
3. the functions φ_i are θ_i-self-concordant barrier functions,
4. the relative interior S° of S ≡ S(μ) for μ ∈ [0,1] is nonempty, and S_opt is bounded,
5. a starting point x^(0) close to the point x(1) is available.

Define φ := ∑ φ_i. Then, in the space aff(S) = {x + d | x ∈ S, Ad = 0}, the function φ is a θ-self-concordant barrier function for S with θ = ∑ θ_i. In particular, all results of Section 7.2 are valid for φ in aff(S). Under the above assumptions, φ_μ has the simpler form

φ_μ(x) = c^T x / μ + φ(x).
Observe that the remark following Lemma 7.2.2 and the boundedness of S_opt imply that the Hessian of φ (resp. of φ_μ) is positive definite on the null space N(A) of A. By H_x we again denote the Hessian H_x = D²φ(x) = D²φ_μ(x), and for a feasible direction d (i.e. such that Ad = 0) the H-norm of d is given by ||d||²_{H_x} = d^T H_x d > 0 for d ≠ 0. We may also verify that the H-norm of the Newton step Δx in (7.38) is given by

(If H is positive definite (on ℝⁿ), the matrix H⁺ = H^{−1/2} Π_{N(AH^{−1/2})} H^{−1/2}, where Π_{N(AH^{−1/2})} is the orthogonal projection onto the null space of AH^{−1/2}.)
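The projection formula can be spot-checked numerically: on random sample data (an assumption of this sketch), the step Δx = −H⁺g must coincide with the solution of the equality-constrained Newton (KKT) system for min{ g^T d + ½ d^T H d : Ad = 0 }.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 2
B = rng.standard_normal((n, n))
H = B @ B.T + n * np.eye(n)                 # positive definite Hessian
A = rng.standard_normal((k, n))
g = rng.standard_normal(n)

# H^{-1/2} via the eigendecomposition of H
w, V = np.linalg.eigh(H)
H_mh = V @ np.diag(w ** -0.5) @ V.T
M = A @ H_mh
P = np.eye(n) - M.T @ np.linalg.solve(M @ M.T, M)   # projector onto N(A H^{-1/2})
dx_plus = -H_mh @ P @ H_mh @ g              # step via H+ = H^{-1/2} P H^{-1/2}

# Reference: solve the KKT system [[H, A^T], [A, 0]] [dx; y] = [-g; 0]
K = np.block([[H, A.T], [A, np.zeros((k, k))]])
dx_kkt = np.linalg.solve(K, np.concatenate([-g, np.zeros(k)]))[:n]
assert np.allclose(dx_plus, dx_kkt)
```

Both routes also produce a feasible direction (A Δx = 0), since A H^{-1/2} P vanishes by construction.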
Algorithm 3
Here, Δx^(0) is the Newton step (7.38) with x = x^(0) and μ = μ⁰.
2. x^(k+1) := x^(k) + Δx^(k), where Δx^(k) is the Newton step (7.38) with x = x^(k) and μ = μ^k.
3. If θμ^k ≤ ε: stop, else
4. μ^{k+1} := μ^k(1 − u).
5. k := k + 1; go to 2.
After completion of step 2, it follows from Lemma 7.2.4, applied to the function φ_μ of the form (7.17), that

(7.41)

holds. □
see e.g. [4]. In these examples, typically, the number m is very large, and n ≈ √m is smaller. The matrices A_i often have a special form (low rank, very sparse), so that problems with up to m ≈ 300000 constraints can be solved efficiently, see e.g. [20]. These problems are also an example of the "gap" between theory and practice. The self-concordance parameter θ = m ≫ n is quite large in these examples, and the worst-case analysis of this chapter would suggest a slow rate of convergence for the large examples; nevertheless, the implementation in [20] converges reasonably fast (in about 60 iterations to 8 digits of accuracy), even for the large problems.
• L_p-norm approximation problem.
Another application, introduced in [29], is the problem

min { ∑_{j=1}^k τ_j : u_j ≤ τ_j^{1/p}, −u_j ≤ a_j^T x − b_j ≤ u_j for 1 ≤ j ≤ k }

with the θ = 4k-self-concordant barrier function

−∑_{j=1}^k [ ln(τ_j^{1/p} − u_j) + ln τ_j + ln(u_j − a_j^T x + b_j) + ln(u_j + a_j^T x − b_j) ].
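As an illustration, the barrier above can be evaluated at a strictly feasible point; the data (A, b), the exponent p, and the choices of u and τ below are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(6)
k, n, p = 5, 3, 1.5
Amat = rng.standard_normal((k, n))            # rows a_j^T (illustrative data)
b = rng.standard_normal(k)

x = np.zeros(n)
r = Amat @ x - b                              # residuals a_j^T x - b_j
u = np.abs(r) + 1.0                           # so that -u_j < a_j^T x - b_j < u_j
tau = u ** p + 1.0                            # so that u_j < tau_j**(1/p)

barrier = -(np.log(tau ** (1 / p) - u) + np.log(tau)
            + np.log(u - r) + np.log(u + r)).sum()
assert np.isfinite(barrier)                   # all 4k log arguments are positive
print(barrier)
```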
where the A^(i) are given symmetric matrices, and the inequality A(x) ⪰ 0 means that A(x) is positive semidefinite. If the matrices A^(i) are n × n, a θ = n-self-concordant barrier function for this constraint is

φ(x) = −ln det A(x),

as we have seen in Section 7.2. (In Section 7.2 we considered the function −ln det X. By composing with the affine mapping x ↦ A(x), we obtain the self-concordant barrier function −ln det A(x).) Note that the Hessian of φ can be
computed by
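The derivatives of φ(x) = −ln det A(x), with A(x) = A⁰ + ∑_i x_i A^(i) (an assumed affine parametrization), have the well-known closed forms ∂φ/∂x_i = −tr(A(x)⁻¹A^(i)) and ∂²φ/∂x_i∂x_j = tr(A(x)⁻¹A^(i)A(x)⁻¹A^(j)). A finite-difference spot-check of the gradient on random sample data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 3
A0 = 5.0 * np.eye(n)                          # keeps A(x) positive definite below

def sym(S):
    return (S + S.T) / 2

As = [sym(rng.standard_normal((n, n))) for _ in range(k)]   # symmetric A^(i)

def A(x):
    return A0 + sum(xi * Ai for xi, Ai in zip(x, As))

def phi(x):
    # -ln det A(x); slogdet is used for numerical stability
    return -np.linalg.slogdet(A(x))[1]

x = 0.1 * rng.standard_normal(k)
Ainv = np.linalg.inv(A(x))
grad = np.array([-np.trace(Ainv @ Ai) for Ai in As])        # closed-form gradient

eps = 1e-6                                    # central finite differences
fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps) for e in np.eye(k)])
assert np.allclose(grad, fd, atol=1e-5)
```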
For moderate values of k and n one may thus directly apply an interior-point method. For larger values of k, the dual form may be more appropriate; see e.g. [1, 36], where other means for efficiently solving the Newton systems are also discussed. For a comprehensive survey of semidefinite programs we refer to the next chapter, or to [36, 12].
Acknowledgements
The author would like to thank the editor, Prof. T. Terlaky, as well as Prof. K.
Anstreicher, Dr. R.W. Freund, Prof. S. Mizuno, Prof. D. Klatte, Prof. T. Jongen
and Prof. J. Stoer for their help and careful proofreading.
REFERENCES
[1] F. Alizadeh, "Interior point methods in semidefinite programming with applica-
tions to combinatorial optimization" SIAM Journal on Optimization, 5(1):13-
51, (1995).
[2] K.M. Anstreicher, "Volumetric Path Following Algorithms for Linear Programming," Technical Report, Dept. of Management Science, The University of Iowa, Iowa City, USA (1994).
[3] K.M. Anstreicher, "Large Step Volumetric Potential Reduction Algorithms for
Linear Programming" Technical Report, Dept. of Management Science, The
University of Iowa, Iowa City, USA (1994).
[4] A. Ben-Tal, M.P. Bendsoe, "A new method for optimal truss topology design"
SIAM Journal on Optimization, 3:322-358, (1993).
[5] S. Boyd and L. El Ghaoui, "Method of centers for minimizing generalized eigenvalues," Linear Algebra and Its Applications 188/189 (1993) 63-111.
[6] S. Boyd, L. El Ghaoui, E. Feron, V. Balakrishnan, Linear Matrix Inequalities in System and Control Theory (SIAM, Philadelphia, 1994).
[7] D. den Hertog, Interior Point Approach to Linear, Quadratic and Convex Programming (Kluwer, 1993).
[8] D. den Hertog, F. Jarre, C. Roos, T. Terlaky, "A Sufficient Condition for Self-
Concordance, with Application to Some Classes of Structured Convex Program-
ming Problems" Mathematical Programming, Series B 69, 1 (1995) 75-88.
[9] A.V. Fiacco and G.P. McCormick, Nonlinear Programming: Sequential Uncon-
strained Minimization Techniques (Wiley, New York, 1968), Reprinted 1990 in
the SIAM Classics in Applied Mathematics series.
[10] R. Freund, "An infeasible-start algorithm for linear programming whose com-
plexity depends on the distance from the starting point to the optimal solution,"
Working paper 3559-93-MSA, Sloan School of Management, Massachusetts In-
stitute of Technology, (Massachusetts, 1993).
[11] R.W. Freund and F. Jarre, "An interior-point method for fractional programs
with convex constraints," Mathematical Programming 67 (1994) 407-440.
[12] C. Helmberg, F. Rendl, H. Wolkowicz, R.J. Vanderbei, "An interior-point method for semidefinite programming," Report 264, CDLDO-24, Technische Universität Graz, June 1994.
[13] P. Huard, B.T. Lieu, "La méthode des centres dans un espace topologique," Numerische Mathematik 8 (1966) 56-67.
[14] M. Iri and H. Imai, "A multiplicative barrier function method for linear pro-
gramming", Algorithmica 1 (1986), pp. 455-482.
[15] F. Jarre, "On the method of analytic centers for solving smooth convex
programs," in: Optimization (Varetz, 1988), Lecture Notes in Mathematics
No. 1405 (Springer, Berlin, 1989) pp. 69-85.
[16] F. Jarre, "Interior-point methods for convex programming," Applied Mathemat-
ics and Optimization 26 (1992) 287-311.
[17] F. Jarre, "Optimal ellipsoidal approximations around the analytic center," Applied Mathematics and Optimization 30 (1994) 15-19.
[18] F. Jarre, "A new line-search step based on the Weierstrass ℘-function for minimizing a class of logarithmic barrier functions," Numerische Mathematik 68 (1994) 81-94.
[19] F. Jarre, "Interior-point methods via self-concordance or relative Lipschitz condition," Optimization Methods and Software 5 (1995) 75-104.
[27] J.E. Nesterov and A.S. Nemirovsky, "A general approach to polynomial-time algorithms design for convex programming," Report, Central Economical and Mathematical Institute, USSR Acad. Sci. (Moscow, Russia, 1988).
[28] J.E. Nesterov and A.S. Nemirovsky, Self-concordant Functions and Polynomial-time Methods in Convex Programming, Report, CEMI, USSR Academy of Sciences, Moscow (1989).
[29] J.E. Nesterov and A.S. Nemirovsky, Interior Point Polynomial Methods in Con-
vex Programming: Theory and Applications (SIAM, Philadelphia, 1994).
[30] J.E. Nesterov and A.S. Nemirovsky, "An interior-point method for generalized linear-fractional programming," Mathematical Programming, Series B 69, 1 (1995).
[31] F.A. Potra, "An infeasible interior-point predictor-corrector algorithm for linear
programming," Report No. 26, Department of Mathematics, The University of
Iowa (Iowa City, Iowa, 1992).
[32] F.A. Potra, "A quadratically convergent infeasible interior-point algorithm for
linear programming," Report No. 28, Department of Mathematics, The Univer-
sity of Iowa (Iowa City, Iowa, 1992).
[33] G. Sonnevend, "An 'analytical centre' for polyhedrons and new classes of global
algorithms for linear (smooth convex) programming," in: System Modelling and
Optimization (Budapest 1985), Lecture Notes in Control and Information Sci-
ences No. 84 (Springer, Berlin, 1986) pp. 866-875.
[34] G. Sonnevend and J. Stoer, "Global ellipsoidal approximations and homotopy
methods for solving convex analytic programs," Applied Mathematics and Op-
timization 21 (1990) 139-165.
[35] J. Stoer, "The complexity of an exterior point path-following method for the solution of linear programs," Working paper, Institut für Angewandte Mathematik und Statistik, Universität Würzburg (Germany, 1992).
[36] L. Vandenberghe, S. Boyd, "Semidefinite Programming" Technical report, ISL,
Stanford University, Stanford CA, (1994), to appear in: SIAM Review (1995).
[37] Y. Ye, M.J. Todd, and S. Mizuno, "An O(√n L)-iteration homogeneous and self-dual linear programming algorithm," Technical Report No. 1007, School of Operations Research and Industrial Engineering, Cornell University (Ithaca, NY, 1992), to appear in Mathematics of Operations Research.
8
COMPLEMENTARITY PROBLEMS
Akiko Yoshise
Institute of Socio-Economic Planning
University of Tsukuba
Tsukuba, Ibaraki 305, Japan
ABSTRACT
This chapter deals with interior point methods for solving complementarity problems. Complementarity problems provide generalized forms of nonlinear and/or linear programs and of equilibrium problems. Among others, the monotone linear complementarity problem has two important applications in mathematical programming: the linear program and the convex quadratic program. We focus on this problem and state its properties, which serve as the theoretical background of various interior point methods. We provide two prototype algorithms in the class of interior point methods for the monotone linear complementarity problem, together with their theoretical analysis. We also briefly refer to recent developments and further extensions of this subject.
8.1 INTRODUCTION
As we have seen in Part I, interior point methodologies have yielded rich theories and algorithms in the field of linear programming. This has motivated their extension to more general problems. In this chapter, we deal with the complementarity problem as an example and describe how the results obtained for linear programming have been generalized to this problem.
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming 297-367.
© 1996 Kluwer Academic Publishers.
where f denotes a mapping from the n-dimensional Euclidean space ℝⁿ into itself.
The complementarity problem has been conceived as a unifying form for nonlinear and/or linear programs and equilibrium problems. It is known that any differentiable convex program can be formulated as a monotone complementarity problem. Also, the variational inequality problem has had a close connection to the CP throughout its history. See Harker and Pang [14], where the authors provide an extensive review of the theory, algorithms and applications of these problems. Among others, the linear complementarity problem (LCP) has two important applications in mathematical programming: the linear program (LP) and the convex quadratic program (QP). The reader is referred to the monumental work of Cottle, Pang and Stone [5], a comprehensive book covering the mathematical theory, algorithms and applications of LCPs developed until 1992.
Each algorithm in the class of interior point methods for the CP has the common feature that it generates a sequence {(x^k, y^k)} in the positive orthant of ℝ^{2n}, i.e., every (x^k, y^k) satisfies (x^k, y^k) > 0. If each point (x^k, y^k) (k = 0, 1, ...) of the generated sequence satisfies the equality system y = f(x), then we say that the algorithm is a feasible-interior-point algorithm; otherwise it is an infeasible-interior-point algorithm. These algorithms originate in the primal-dual interior point algorithms for the LP (see [45, 30], and so on). Megiddo [45] first showed the existence of a path of centers for the primal-dual LP which converges to a solution, and extended the concept to the general LCP. This analysis introduced a new framework of interior point algorithms that trace the path of centers. Within this framework, primal-dual interior point algorithms were developed by Kojima, Mizuno and Yoshise [30], Tanabe [75], and so on. Kojima et al. [30] first proved the polynomial computational complexity of the algorithms, and since then many other algorithms have been developed based on the primal-dual strategy. Kojima et al. [29] proposed an interior point algorithm for solving the monotone LCP and established the best complexity bound known for this problem, O(√n L), on the number of iterations. Independently of this work, Monteiro and Adler [52] also provided an interior point algorithm for the convex quadratic program with an O(√n L) iteration bound. Up to the present, the study of interior point methods for the CP has paralleled that for the LP.
In this chapter, we describe these interior point methods for the CP with the intention of presenting the theoretical basis of these algorithms as plainly as possible. For this purpose, we concentrate on the class of feasible-interior-point algorithms for the monotone LCP.
This chapter is organized as follows. In the next section, we state the monotone LCP and give some viewpoints in the context of optimization. As the definition (8.1) of the CP indicates, it is natural to design interior point algorithms for the CP using the classical Newton method for finding a solution of a system of equations. In Section 3, we discuss the Newton method for the monotone LCP and give some fundamental results which lead us to the definition of the path of centers for the monotone LCP and its existence. In Section 4, we propose two prototype feasible-interior-point algorithms, the path-following algorithm and the potential-reduction algorithm. Under the assumption that the set of feasible interior points is nonempty, each of the algorithms generates an approximate solution of the monotone LCP in a finite number of iterations. The iteration count is related to the initial point and the stopping criteria. In Section 5, we discuss these two subjects and provide the polynomial complexity bounds for the algorithms described in Section 4. In Section 6, we briefly describe further developments of interior point algorithms for LCPs and extensions to more general classes of CPs. To ease readability, most theoretical results are presented without proofs, with just some references given. However, since it might require considerable effort to collect all the proofs from the extensive literature, most proofs are collected in Section 7 to assist the reader.
Here we list some symbols that are often used throughout this chapter:
In light of the definition (8.1), we define the linear complementarity problem (LCP) as a CP with an affine mapping f : ℝⁿ → ℝⁿ, i.e., f given by f(x) = Mx + q, where M is an n × n matrix and q is an n-dimensional vector:

LCP: Find (x, y) ∈ ℝ^{2n}
such that y = Mx + q, (x, y) ≥ 0, x_i y_i = 0 (i = 1, ..., n).    (8.2)

Let us define the affine space

S_af = {(x, y) ∈ ℝ^{2n} : y = Mx + q}.

Then the feasible region S_+, the feasible-interior region S_++ and the solution set S_cp of the LCP (8.2) are given by

S_+ = S_af ∩ ℝ^{2n}_+,  S_++ = {(x, y) ∈ S_af : (x, y) > 0},  S_cp = {(x, y) ∈ S_+ : x^T y = 0}.
Condition 8.2.1

(x − x′)^T (f(x) − f(x′)) ≥ 0 for every x, x′ ∈ ℝⁿ.

The monotone LCP is an LCP which satisfies Condition 8.2.2. The following is a well-known and important example of the monotone LCP.
QP: Minimize c^T u + (1/2) u^T Q u subject to Au ≥ b, u ≥ 0.

Its optimality conditions are

w = c + Qu − A^T v ≥ 0, u ≥ 0, u^T w = 0,
z = Au − b ≥ 0, v ≥ 0, v^T z = 0.    (8.3)
The above system can be formulated as a monotone LCP, where the matrix M and the vector q are given by

    M = ( Q  −A^T )        (  c )
        ( A    0  ),   q = ( −b ).
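A short numeric check, on randomly generated sample data (an assumption of this sketch), that this M is monotone: for z = (u, v) the A-blocks cancel, so z^T M z = u^T Q u ≥ 0 whenever Q is positive semidefinite.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 3
C = rng.standard_normal((n, n))
Q = C @ C.T                                   # positive semidefinite (convex QP)
A = rng.standard_normal((m, n))
c, b = rng.standard_normal(n), rng.standard_normal(m)

M = np.block([[Q, -A.T], [A, np.zeros((m, m))]])
q = np.concatenate([c, -b])

for _ in range(100):
    z = rng.standard_normal(n + m)
    assert z @ M @ z >= -1e-9                 # z^T M z = u^T Q u >= 0
```

The skew-symmetric off-diagonal blocks contribute nothing to z^T M z, which is why monotonicity of this LCP reduces to positive semidefiniteness of Q.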
One may take an interest in the case where the problem is given in the standard form:

QP′: Minimize c^T u + (1/2) u^T Q u subject to Au = b, u ≥ 0.

Its optimality conditions are

w = c + Qu − A^T v ≥ 0, u ≥ 0, u^T w = 0,
0 = Au − b.    (8.4)

Kojima et al. [30] first proposed a primal-dual interior point algorithm for this type of system arising from linear programming. Monteiro and Adler [51, 52] also refined the algorithm in [30] and developed a primal-dual interior point algorithm for solving the system (8.4).
Under the algebraic definition, the system (8.4) cannot be formulated directly as an LCP of the form (8.2). The problem (8.2) is a type of LCP which we call the standard LCP, and there are many other types of LCP, e.g., the mixed LCP (MLCP), the horizontal LCP (HLCP), the generalized (or geometrical) LCP (GLCP), and so on:

MLCP: Find (x, y) ∈ ℝ^{2n}
x_i y_i = 0, i = 1, ..., n,
where w ∈ ℝ^m, q ∈ ℝ^{n+m} and M is an (n + m) × (n + m) matrix;

It can easily be seen that the class of GLCPs includes both the class of HLCPs and that of MLCPs, and that the standard LCP (8.2) belongs to the class of HLCPs. We define the monotonicity of each problem as follows:

One can see that the systems (8.3) and (8.4) can be put into a monotone MLCP. See [5] for many more variations on the LCP.
Recently, some implications of the three types of LCPs above have become clear in the context of interior point algorithms ([2, 4, 10, 11, 48, 47, 54, 53, 65, 78, 81, 90], etc.).
Güler [11] first showed that the GLCP with a maximal monotone operator can be reduced to a monotone standard LCP, and Bonnans and Gonzaga [4] simplified its proof. Mizuno, Jarre and Stoer [48] provided a unified approach of interior point algorithms for a class of monotone GLCPs. The equivalence between the class of monotone LCPs and the class of monotone MLCPs was proved by Wright [81], and that between the class of GLCPs and the class of standard LCPs was shown by Potra [2] in view of the P*-property, which we will describe in Section 6.
If the matrix A in (8.4) has full row rank, then we can find a basis matrix B of A. Let us partition A as A = [B | N]. Dividing the variables u into the basic variables u_B and the nonbasic variables u_N, we have

{u : Au = b, u > 0} = {(u_B, u_N) : u_B = B⁻¹b − B⁻¹N u_N > 0, u_N > 0}.

Thus we can deal with the problem (8.4) as a problem of the type (8.3) with respect to the variables u_N in this case.
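The elimination of the basic variables is easy to illustrate numerically; the sample data and the choice of the first m columns as the basis are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 2, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
B, N = A[:, :m], A[:, m:]                     # assume the first m columns form a basis

uN = rng.random(n - m)                        # free choice of the nonbasic variables
uB = np.linalg.solve(B, b) - np.linalg.solve(B, N @ uN)
u = np.concatenate([uB, uN])
assert np.allclose(A @ u, b)                  # u solves Au = b by construction
```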
Figure 8.1 illustrates an example of an LCP with n = 2 in x-space. The feasible region and the feasible-interior region are given by S_+ and S_++, respectively. The boundary lines indicate the sets of points satisfying the equations x_i = 0 or y_i = (Mx + q)_i = 0 (i = 1 or 2). We often use this figure throughout this chapter. The feasible-interior-point algorithm generates a sequence in the feasible-interior region S_++ = {(x, y) ∈ ℝ^{2n} : y = Mx + q, (x, y) > 0} (the shaded zone in the figure), which converges to a solution of the LCP.
When we design an interior point method for the LCP, it is important to formulate the problem as an optimization model. We propose here two types of such models, which are closely related to the prototype algorithms described in Section 4.
The first model is a quadratic programming problem which is based on the fact that x^T y ≥ 0 whenever x ≥ 0 and y ≥ 0:

M1: Minimize x^T y subject to (x, y) ∈ S_+ = S_af ∩ ℝ^{2n}_+.    (8.8)

The model M1 is equivalent to the LCP in the sense that (x, y) is a solution of the LCP if and only if it is a minimum solution of M1 with objective value zero. This formulation is the basis of the so-called path-following algorithm for the LCP, which will be described in Section 4.1. Under certain conditions, the algorithm generates a sequence {(x^k, y^k) : k = 0, 1, ...} of feasible interior points (x^k, y^k) ∈ S_++ such that

(x^{k+1})^T y^{k+1} ≤ (1 − ρ)(x^k)^T y^k,    (8.9)

where ρ ∈ (0, 1) is a number which does not depend on the iteration k. This relation implies that an approximate solution (x^K, y^K) ∈ S_++ such that (x^K)^T y^K ≤ ε can be obtained after a finite number of iterations for any ε > 0.
The second model depends on the potential function, which was first introduced by Karmarkar [23] for linear programming problems in (non-standard) primal form. By extending the function to problems in primal-dual form, Todd and Ye [77] defined the primal-dual potential function, and independently of this work, Tanabe [74] also provided it in a multiplicative form. Ye [86] first presented the so-called primal-dual potential-reduction algorithm and established a bound of O(√n L) on the number of iterations (and O(n³L) on the number of arithmetic operations) of the algorithm. The first O(√n L)-iteration potential-reduction algorithm for LCPs was proposed by Kojima et al. [31]. Let us define the potential function φ for the LCP:

φ(x, y) = (n + ν) log x^T y − ∑_{i=1}^n log x_i y_i − n log n,

where ν > 0 is a parameter. The first term (n + ν) log x^T y comes from the objective function of the quadratic problem M1 (8.8), the second term −∑_{i=1}^n log x_i y_i works as a logarithmic barrier function ([7], etc.), and the last term is added for convenience in the following discussions. We consider the following minimization problem, which employs the potential function as the objective function:
It is easy to see that the potential function φ can be expressed as

φ(x, y) = ν log x^T y + φ_cen(x, y),  where  φ_cen(x, y) = n log( (x^T y / n) / (∏_{i=1}^n x_i y_i)^{1/n} ).

Here the factor (x^T y / n) / (∏_{i=1}^n x_i y_i)^{1/n} is the ratio of the arithmetic mean and the geometric mean of the n positive numbers x_1 y_1, x_2 y_2, ..., x_n y_n; hence we can see that

φ_cen(x, y) ≥ 0 for every (x, y) > 0.    (8.12)
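The arithmetic-geometric mean bound (8.12) is easy to confirm numerically on random positive data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
for _ in range(100):
    x = rng.random(n) + 1e-3                  # strictly positive sample points
    y = rng.random(n) + 1e-3
    w = x * y
    phi_cen = n * np.log(w.sum() / n) - np.log(w).sum()
    assert phi_cen >= -1e-12                  # AM >= GM, so phi_cen >= 0
print(phi_cen)
```

Equality holds exactly when all products x_i y_i are equal, i.e. on the path of centers.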
This bound implies that

ν log x^T y ≤ φ(x, y), i.e., x^T y ≤ exp(φ(x, y)/ν).    (8.13)
Thus, if we have a sequence {(x^k, y^k) : k = 0, 1, ...} such that φ(x^k, y^k) → −∞, then it satisfies (x^k)^T y^k → 0. The potential-reduction algorithm described in Section 4.2 generates a sequence {(x^k, y^k)} of feasible interior points (x^k, y^k) ∈ S_++ (k = 0, 1, ...) such that

φ(x^{k+1}, y^{k+1}) ≤ φ(x^k, y^k) − δ,    (8.14)

where δ > 0 is a number which does not depend on k. Similarly to the relation (8.9), this implies that an approximate solution (x^K, y^K) ∈ S_++ such that (x^K)^T y^K ≤ ε can be obtained after a finite number of iterations for any ε > 0.
As we have seen above, if the sequence {(x^k, y^k)} satisfies φ(x^k, y^k) → −∞ then (x^k)^T y^k → 0. However, the converse does not necessarily hold. Let n ≥ 2 and consider the sequence {(x^k, y^k) : k = 1, 2, ...} which satisfies

    x_1^k y_1^k = 1/k^{k+1},   x_i^k y_i^k = 1/k  (i = 2, 3, ..., n).
For simplicity, we assume that the following condition holds throughout the succeeding two sections, Section 3 and Section 4:

Condition 8.2.4 A feasible-interior-point (x̄, ȳ) ∈ S_{++} of the LCP (8.2) is known.

This condition ensures not only the availability of an initial point for an interior point algorithm, but also richer properties of the monotone LCP. First, let us observe the following well known results, which can be obtained under a more relaxed condition, i.e., S_+ ≠ ∅ (see Section 3.1 of [5]).
Lemma 8.2.5 Suppose that the LCP (8.2) satisfies Condition 8.2.2. If the feasible region S_+ of the LCP is not empty, then
and
Obviously, S_+ ≠ ∅ under Condition 8.2.4, hence the above lemma also holds under that condition. Moreover, this stronger condition leads us to the next lemma (see, for example, the proof of Theorem A.3 of [27]).
Lemma 8.2.6 Suppose that the LCP (8.2) satisfies Condition 8.2.2 and Condition 8.2.4. Then the assertions (i), (ii) and (iii) of Lemma 8.2.5 hold. Furthermore, the set

    S_+(τ) = {(x, y) ∈ S_+ : x^T y = Σ_{i=1}^n x_i y_i ≤ τ}   (8.15)

is closed and bounded for every τ ≥ 0. Here the set S_+(τ) can be regarded as a level set associated with the objective function x^T y of the model M1 (8.8).
Some of the results above can be generalized to nonlinear cases. Let us consider a monotone CP, i.e., a CP satisfying Condition 8.2.1. Then the monotone CP has a solution if it has a feasible-interior-point (x, y) ∈ S_{++} (see [58]). Moreover, we can show that the level set S_+(τ) in this case is also closed and bounded, in a way similar to the proof of Lemma 8.2.6. However, we cannot extend the assertion (i) of Lemma 8.2.5 to the nonlinear monotone case. Megiddo [43] showed an example of a nonlinear monotone CP where S_+ ≠ ∅ and S_{++} = ∅ but S_cp = ∅.
In the linear case, several results have been reported concerning the existence of a solution of the LCP. For example, if the matrix M is row sufficient and the feasible solution set S_+ is nonempty, then the solution set S_cp is also nonempty. On the other hand, the solution set S_cp is convex for each q if and only if the matrix M is column sufficient. Thus if M is sufficient (i.e., row and column sufficient) then the LCP has a nonempty convex solution set S_cp for every q. See [5] for more details.
As we will see in the next section, the boundedness of the level set S_+(τ) (8.15) plays a crucial role in showing the existence of the path of centers. It is known that the set S_+(τ) is bounded for every q ∈ ℝ^n if and only if M is an R_0-matrix, i.e., Mx ≥ 0, x ≥ 0 and x^T M x = 0 imply x = 0. Note that the 2 × 2 positive semi-definite matrix

is not R_0 (choose x = (0, 1)^T). Hence positive semi-definiteness does not necessarily ensure that the set S_+(τ) is bounded. This implies another importance of Condition 8.2.4.
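The displayed matrix did not survive reproduction here, but the point can be checked with a standard instance (my own choice, not necessarily the one shown in the text): M = diag(1, 0) is positive semi-definite yet fails the R_0 property at x = (0, 1)^T.

```python
import numpy as np

# An illustrative PSD matrix that is not R0: M x >= 0, x >= 0 and
# x^T M x = 0 admit the nonzero solution x = (0, 1)^T.
M = np.array([[1.0, 0.0],
              [0.0, 0.0]])
x = np.array([0.0, 1.0])

assert np.all(np.linalg.eigvalsh(M) >= -1e-12)   # positive semi-definite
assert np.all(M @ x >= 0) and np.all(x >= 0)
assert x @ M @ x == 0 and np.any(x != 0)          # so M is not an R0-matrix
```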
The boundedness of the set S_+(τ) under Condition 8.2.4 can be extended to the class of so-called P_*-matrices [26]. See Section 6 for the definition of a P_*-matrix. Recently, the equivalence of the class of P_*-matrices and that of sufficient matrices was shown by Väliaho [79].
In what follows, we assume that Condition 8.2.2 and Condition 8.2.4 hold. Let (x, y) ∈ S_{++} be the current point. We intend to find the next point in the feasible-interior region S_{++}. To define the next point, we introduce the search direction (Δx, Δy) ∈ ℝ^{2n} and the step parameter θ, and define (x(θ), y(θ)) as

    x(θ) = x + θ Δx,   y(θ) = y + θ Δy.   (8.16)

The next point (x̄, ȳ) is determined by (x(θ), y(θ)) for a given θ > 0. How should we determine the search direction (Δx, Δy) and the step parameter θ? A solution (x, y) of the LCP satisfies the equality system

    Xy = 0,   y = Mx + q.   (8.17)

Newton's method applied to (8.17) at the current point (x, y) yields the direction (Δx^a, Δy^a) solving

    Y Δx + X Δy = −Xy,   −M Δx + Δy = 0.   (8.18)

More generally, we consider the system

    Y Δx + X Δy = h,   −M Δx + Δy = 0   (8.19)

for a right-hand side h ∈ ℝ^n.
The following lemma has been used repeatedly in many papers on interior point algorithms for the monotone LCP (see, for example, Lemma 4.1 and Lemma 4.20 of [26]).

Lemma 8.3.1 Suppose that Condition 8.2.2 holds. Then, for every (x, y) ∈ ℝ^{2n}_{++},

(i) the matrix

    (  Y   X )
    ( −M   I )   (8.20)

is nonsingular, hence the system (8.19) has a unique solution (Δx, Δy) for every h ∈ ℝ^n, and

(ii) (Δx, Δy) satisfies the following inequalities:

Here X^{−1/2} (Y^{−1/2}) denotes the diagonal matrix whose components are x_i^{−1/2} (y_i^{−1/2}) (i = 1, 2, ..., n) and D = X^{1/2} Y^{−1/2}.
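A minimal NumPy sketch of solving (8.19) for a given right-hand side h (the monotone test instance is an arbitrary illustration, not from the text):

```python
import numpy as np

def newton_direction(M, x, y, h):
    """Solve Y dx + X dy = h, -M dx + dy = 0 -- the system (8.19)."""
    n = len(x)
    A = np.block([[np.diag(y), np.diag(x)],
                  [-M, np.eye(n)]])          # the coefficient matrix (8.20)
    d = np.linalg.solve(A, np.concatenate([h, np.zeros(n)]))
    return d[:n], d[n:]

M = np.array([[2.0, 1.0], [1.0, 2.0]])       # positive definite => monotone LCP
x = np.array([1.0, 1.0])
y = np.array([2.0, 2.0])
h = -(x * y)                                  # affine scaling right-hand side (8.18)
dx, dy = newton_direction(M, x, y, h)
assert np.allclose(y * dx + x * dy, h)        # first block of (8.19)
assert np.allclose(M @ dx, dy)                # second block of (8.19)
assert dx @ dy >= 0                           # monotonicity: dx^T M dx >= 0
```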
Let us observe how the above results serve to determine the next point, by adopting the model M1 (8.8) in Section 2. The model M1 is an optimization problem which minimizes the sum of complementarities x^T y = Σ_{i=1}^n x_i y_i. Therefore, our intention is to find a step size θ so that the next point (x̄, ȳ) remains in the feasible-interior region S_{++} and the complementarity x(θ)^T y(θ) at the next point is reduced sufficiently. Recall that (x̄, ȳ) is given by (8.16) for a θ > 0. For every (x, y) ∈ S_af, the system (8.18) ensures that

    y(θ) = y + θ Δy^a = (Mx + q) + θ M Δx^a = M x(θ) + q.

Hence (x(θ), y(θ)) ∈ S_af for every θ. It follows that (x(θ), y(θ)) > 0 is a necessary and sufficient condition for (x(θ), y(θ)) ∈ S_{++} = S_af ∩ ℝ^{2n}_{++}. We can easily see that (x(θ), y(θ)) > 0 if and only if (e + θ X^{−1} Δx^a, e + θ Y^{−1} Δy^a) > 0. Therefore, if

    θ ‖X^{−1} Δx^a‖_∞ < 1 and θ ‖Y^{−1} Δy^a‖_∞ < 1   (8.25)
then (x(θ), y(θ)) > 0. By (8.24), we obtain that

    ‖X^{−1} Δx^a‖_∞ = ‖X^{−1/2} Y^{−1/2} D^{−1} Δx^a‖_∞
                    ≤ ‖X^{−1/2} Y^{−1/2} D^{−1} Δx^a‖
                    ≤ ‖X^{−1/2} Y^{−1/2}‖ ‖D^{−1} Δx^a‖.
Let us combine the above inequality and the condition (8.26) for (x(θ), y(θ)) ∈ S_{++}. Define

    θ̄ = γ √( min{x_i y_i : i = 1, 2, ..., n} / x^T y ),

where γ is a constant such that γ ∈ (0, 1). Obviously, θ̄ satisfies (8.26) and

    0 < θ̄ ≤ γ/√n < 1.
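The step-size rule can be exercised numerically. The sketch below uses an arbitrary 2×2 monotone instance (my own choice) and computes the affine scaling direction from Y Δx + X Δy = −Xy, −M Δx + Δy = 0.

```python
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 2.0]])
x = np.array([1.0, 2.0])
y = M @ x + np.array([-1.0, -1.0])           # q = (-1, -1), so y = (3, 4) > 0
n, gamma = len(x), 0.5

# Affine scaling direction: Y dx + X dy = -Xy, dy = M dx
A = np.block([[np.diag(y), np.diag(x)], [-M, np.eye(n)]])
d = np.linalg.solve(A, np.concatenate([-(x * y), np.zeros(n)]))
dx, dy = d[:n], d[n:]

theta = gamma * np.sqrt(np.min(x * y) / (x @ y))     # the step size theta-bar
assert 0 < theta <= gamma / np.sqrt(n)
xt, yt = x + theta * dx, y + theta * dy
assert np.all(xt > 0) and np.all(yt > 0)             # next point stays interior
# complementarity shrinks by at least the factor (1 - theta/2)^2
assert xt @ yt <= (1 - theta / 2) ** 2 * (x @ y) + 1e-12
```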
Hence (x(θ̄), y(θ̄)) ∈ S_{++} and the complementarity at (x(θ̄), y(θ̄)) satisfies

    x(θ̄)^T y(θ̄) ≤ (1 − θ̄/2)^2 x^T y,   (8.27)

where (1 − θ̄/2)^2 ∈ (0, 1). The above inequality seems to lead us to the recurrence relation (8.9), which is a desirable property of the generated sequence for the optimization model M1 (8.8). However, there exists a serious difference between (8.9) and (8.27); the value (1 − θ̄/2)^2 in (8.27) depends on the point (x, y), while ρ in (8.9) is a number which does not depend on the point (x, y). The value (1 − θ̄/2)^2 is influenced by the dispersion of x_i y_i (i = 1, 2, ..., n) and satisfies

    (1 − γ/(2√n))^2 ≤ (1 − θ̄/2)^2 < 1   (8.28)

for every (x, y) > 0. Note that the equality above holds if and only if (x, y) > 0 satisfies x_1 y_1 = x_2 y_2 = ⋯ = x_n y_n. If we could generate points in the set

    S_cen = {(x, y) ∈ S_{++} : Xy = (x^T y / n) e},   (8.29)

then we can obtain the next point (x(θ̄), y(θ̄)) ∈ S_{++} such that the complementarity is reduced by the factor (1 − γ/(2√n))^2.
The set S_cen given by (8.29) is called the path of centers of the LCP. Let us introduce the following mappings u : ℝ^{2n}_{++} → ℝ^n_{++} and H : ℝ_+ × ℝ^{2n}_{++} → ℝ^n × ℝ^n:

    u(x, y) = Xy,   (8.30)
    H(μ, x, y) = ( Xy − μe, y − Mx − q ).   (8.31)

Then, the solution of the LCP (8.2) is equivalent to that of the system

    H(0, x, y) = 0,  (x, y) ∈ ℝ^{2n}_+,

and each point on the path of centers S_cen can be given by a solution of the system

    H(μ, x, y) = 0,  (x, y) ∈ ℝ^{2n}_{++},  for some μ > 0.   (8.32)

See Figure 8.2, which illustrates the path of centers S_cen in x-space and u(S_cen) in u-space, respectively.
The path of centers S_cen can be characterized in several ways. Here we consider the following family of problems:

    L(μ):  Minimize  ψ(μ, x, y) = x^T y − μ Σ_{i=1}^n log(x_i y_i)
           subject to (x, y) ∈ S_{++} = S_af ∩ ℝ^{2n}_{++},   (8.33)

for μ > 0. This problem may be regarded as the logarithmic barrier function problem for the model M1. Most of the following results were indicated and studied by Megiddo [45] and by Kojima et al. [29]. See also [34], which gives some ingredients of the proofs provided in Section 7.
Lemma 8.3.3 Let μ > 0 be fixed. If (x, y) satisfies (8.32) then it is an optimal solution of L(μ).
Lemma 8.3.4 Suppose that Condition 8.2.2 and Condition 8.2.4 hold. Then the problem L(μ) has a unique optimal solution (x(μ), y(μ)) for every μ > 0.

Lemma 8.3.5 Let μ > 0 be fixed. Under Condition 8.2.2, if (x, y) ∈ S_{++} is the optimal solution of L(μ) then it satisfies the system (8.32).

Theorem 8.3.6 Suppose that Conditions 8.2.2 and 8.2.4 hold. Then the path of centers S_cen is a 1-dimensional smooth curve which converges to a solution (x*, y*) ∈ S_cp of the LCP (8.2) as μ tends to 0.
As we can see in Section 7, the results above are mainly due to the following two facts given in Lemma 8.3.1 and Lemma 8.2.6:

(3a) The coefficient matrix defined by (8.20) is nonsingular for every (x, y) > 0.

(3b) The set S_+(τ) defined by (8.15) is bounded for every τ ≥ 0.
It is known that the condition (3a) holds if and only if all the principal minors of the matrix M are nonnegative, i.e., the matrix M is a P_0-matrix (Lemma 4.1 of [26]). In fact, Kojima et al. [26] showed that the mapping u (8.30) is a diffeomorphism from the feasible-interior region S_{++} onto the n-dimensional positive orthant ℝ^n_{++} under the conditions (3a) and (3b), and derived the existence of the path of centers. Besides this, generalizations have been obtained for various problems, e.g.,
(i) nonlinear CPs: Kojima, Mizuno and Noma [27, 28], Kojima, Megiddo and Noma [25], Noma [63], etc.,

(ii) CPs for maximal monotone operators: McLinden [41], Güler [12], etc.,

(iii) monotone semidefinite LCPs: Kojima, Shindoh and Hara [34], etc.,

(iv) monotone generalized CPs (including monotone linear and nonlinear CPs and monotone semidefinite LCPs): Shida, Shindoh and Kojima [72].
In the literature on interior point algorithms, the existence of the path of centers (or the central trajectory) is considered a crucial condition for designing a globally convergent algorithm.
Up to the present, our analysis of the path of centers S_cen has been based on the optimization model M1 (8.8). However, S_cen can also be characterized in the context of the model M2 (8.10). Recall that the potential function φ can be expressed as (8.11). In view of the definition of φ_cen, it is easily seen that the equality in (8.12) holds on the set S_{++} if and only if (x, y) ∈ S_cen. Thus, we obtain another definition of S_cen:

    S_cen = {(x, y) ∈ S_{++} : φ_cen(x, y) = 0}.   (8.34)

Figure 8.3 The level set A_{φ_cen}(τ) and u(A_{φ_cen}(τ)).
See Figure 8.3, Figure 8.4 and Figure 8.5, where the level sets
In the next section, we will propose two prototype algorithms based on the model
M1 and model M2, respectively.
The solution (Δx^a, Δy^a) of the system (8.18) for approximating a point which satisfies the system (8.17) is often called the affine scaling direction for the LCP. This direction is used not only in affine scaling algorithms, but also in the predictor-corrector algorithms for the LCP (see Section 6). Furthermore, as we will see in the next section, each of the directions used in the path-following algorithm and the potential-reduction algorithm can be regarded as a convex combination of the affine scaling direction and the so-called centering direction for approximating a point on the path of centers S_cen.
and consider that if (x, y) ∈ N(α) for a small α > 0 then (x, y) is sufficiently close to the path of centers. For a fixed α > 0, we call the set N(α) the neighborhood of the path of centers S_cen. Figure 8.6 illustrates the set N(α) in x-space and u(N(α)) in u-space, respectively.
Figure 8.6 The neighborhood N(α) of the path of centers S_cen and u(S_cen).
See (i) of Lemma 8.3.1. Let (Δx(β), Δy(β)) be a convex combination of the centering direction (Δx^c, Δy^c) and the affine scaling direction (Δx^a, Δy^a) at (x, y) ∈ S_{++}, given by

    (Δx(β), Δy(β)) = β (Δx^c, Δy^c) + (1 − β)(Δx^a, Δy^a)   (8.39)

for β ∈ [0, 1]. It is easily seen that the direction (Δx(β), Δy(β)) coincides with the unique solution of the system

    Y Δx + X Δy = −{β (Xy − (x^T y / n) e) + (1 − β) Xy} = −(Xy − β (x^T y / n) e),
    −M Δx + Δy = 0.   (8.40)
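Since the system (8.40) is linear in β, its solution interpolates between the two extreme directions; the sketch below checks this numerically on an arbitrary 2×2 instance (my own choice).

```python
import numpy as np

def direction(M, x, y, beta):
    """Solve (8.40): Y dx + X dy = -(Xy - beta*(x^T y / n) e), -M dx + dy = 0."""
    n = len(x)
    A = np.block([[np.diag(y), np.diag(x)], [-M, np.eye(n)]])
    rhs = -(x * y - beta * (x @ y / n) * np.ones(n))
    d = np.linalg.solve(A, np.concatenate([rhs, np.zeros(n)]))
    return d[:n], d[n:]

M = np.array([[2.0, 1.0], [1.0, 2.0]])
x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
beta = 0.3
dxa, dya = direction(M, x, y, 0.0)   # affine scaling direction (beta = 0)
dxc, dyc = direction(M, x, y, 1.0)   # centering direction (beta = 1)
dx, dy = direction(M, x, y, beta)
# (8.39): the beta-direction is the convex combination of the two extremes
assert np.allclose(dx, beta * dxc + (1 - beta) * dxa)
assert np.allclose(dy, beta * dyc + (1 - beta) * dya)
```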
A conceptual illustration of these three directions is given in Figure 8.7. Let us consider the search mapping (8.16) with (Δx, Δy) = (Δx(β), Δy(β)). The assumption (x, y) ∈ S_{++} and the system (8.40) imply that
Lemma 8.4.1 Suppose that Condition 8.2.2 and Condition 8.2.4 hold. Let (x̄, ȳ) ∈ N(α) for α ∈ (0, 1) and let (Δx(β), Δy(β)) be the solution of the system (8.40) for β ∈ [0, 1]. Then

Here D = X^{1/2} Y^{−1/2} and ΔX(β) denotes the diagonal matrix whose components are equal to those of Δx(β).
Let us derive a sufficient condition on the parameters α ∈ (0, 1), β ∈ (0, 1) and θ > 0 for our requirements (4a), (4b) and (4c). By a discussion similar to that in Section 3, we can see that (4a) holds if

    (1 − α)^2   (8.41)
For the requirements (4b) and (4c), we must observe x̄(θ)^T ȳ(θ) and X̄(θ) ȳ(θ) − (x̄(θ)^T ȳ(θ)/n) e. The following lemma directly follows from the fact that the displacement (Δx(β), Δy(β)) satisfies

    Ȳ Δx(β) + X̄ Δy(β) = −( X̄ ȳ − β (x̄^T ȳ / n) e ).
Lemma 8.4.2

then (x̄(θ), ȳ(θ)) ∈ N(α) (i.e., the requirement (4b)), and if there exists a constant ρ ∈ (0, 1) such that

    (8.45)

then x̄(θ)^T ȳ(θ) ≤ ρ x̄^T ȳ (i.e., the requirement (4c)). Therefore, a sufficient condition on the parameters α ∈ (0, 1), β ∈ [0, 1] and θ for the requirements (4a), (4b) and (4c) is that the inequalities (8.41), (8.44) and (8.45) hold with a constant ρ ∈ (0, 1).
In fact, let

    α = 1/2,   β = 1 − 1/(2√n),   θ = 1/5.   (8.46)

Hence the choice of the parameters (8.46) meets the requirements (4a), (4b) and (4c) with ρ = 1 − 1/(20√n).
There are many other possible choices of the parameters, but we never take β = 0 and/or β = 1 in our analysis above, since the requirements (4b) and/or (4c) are not necessarily ensured in those cases (see (8.44) and (8.45)). This means that using a combined direction of the affine scaling direction (Δx^a, Δy^a) and the centering direction (Δx^c, Δy^c) makes sense in our analysis.
Based on the discussion above, we now state an algorithm which we call the path-following algorithm:
Input
    α ∈ (0, 1): the neighborhood parameter;
    (x^0, y^0) ∈ N(α): the initial feasible-interior-point in the neighborhood N(α) of the path of centers;
Parameters
    ε > 0: the accuracy parameter;
    β ∈ [0, 1]: the parameter of the convex combination of the centering direction and the affine scaling direction;
    ρ ∈ (0, 1): the shrinking ratio of the complementarity x^T y;
    θ: the step size parameter;
begin
    (x, y) := (x^0, y^0);
    k := 0;
    while x^T y > ε do
        Calculate (Δx(β), Δy(β)) from (8.40);
        (Δx, Δy) := (Δx(β), Δy(β));
        Compute the search mapping (x(θ), y(θ)) by (8.16);
        Find θ̄ such that
            (x(θ̄), y(θ̄)) > 0, (x(θ̄), y(θ̄)) ∈ N(α) and x(θ̄)^T y(θ̄) ≤ ρ x^T y;
        (x, y) := (x(θ̄), y(θ̄));
        k := k + 1;
    end
end.
If we choose the parameters as in (8.46) and if an initial point (x^0, y^0) ∈ N(α) is obtained, then the algorithm is well-defined with the ratio ρ = 1 − 1/(20√n). Figure 8.8 gives an image of the sequence {(x^k, y^k)} generated by the path-following algorithm in x-space.
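The listing translates almost line for line into NumPy. The sketch below is an illustrative implementation, not the authors' code: it takes the parameter choice (8.46), uses the fixed step θ = 1/5 in place of the search for θ̄, and runs on a small monotone instance chosen arbitrarily.

```python
import numpy as np

def path_following(M, x, y, eps=1e-8):
    """Sketch of the path-following algorithm with the parameters (8.46):
    beta = 1 - 1/(2 sqrt(n)), fixed step theta = 1/5."""
    n = len(x)
    beta, theta = 1 - 1 / (2 * np.sqrt(n)), 0.2
    k = 0
    while x @ y > eps:
        # direction from (8.40)
        A = np.block([[np.diag(y), np.diag(x)], [-M, np.eye(n)]])
        rhs = -(x * y - beta * (x @ y / n) * np.ones(n))
        d = np.linalg.solve(A, np.concatenate([rhs, np.zeros(n)]))
        x, y = x + theta * d[:n], y + theta * d[n:]   # search mapping (8.16)
        k += 1
    return x, y, k

M = np.array([[2.0, 1.0], [1.0, 2.0]])
x0 = np.array([1.0, 1.0])
y0 = np.array([2.0, 2.0])
q = y0 - M @ x0                     # q chosen so that (x0, y0) is feasible-interior
x, y, k = path_following(M, x0, y0)
assert x @ y <= 1e-8 and np.all(x > 0) and np.all(y > 0)
assert np.allclose(y, M @ x + q, atol=1e-6)   # feasibility preserved throughout
```

The feasibility check works because dy = M dx at every step, so y − Mx − q is invariant along the iteration.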
In this case, the generated sequence {(x^k, y^k)} satisfies (8.9) for each k = 0, 1, ... and consequently

    (x^k)^T y^k ≤ (1 − 1/(20√n))^k (x^0)^T y^0.

Let us compute an iteration number K at which the criterion (x^K)^T y^K ≤ ε is satisfied. A sufficient condition for (x^K)^T y^K ≤ ε is given by

    (1 − 1/(20√n))^K (x^0)^T y^0 ≤ ε.
Figure 8.8 A sequence {(x^k, y^k)} generated by the path-following algorithm.
Theorem 8.4.4 Suppose that the LCP (8.2) satisfies Condition 8.2.2 and Condition 8.2.4. Define the parameters as in (8.46). Then Algorithm 8.4.3 terminates with an approximate solution (x, y) ∈ N(α) satisfying the desired accuracy x^T y ≤ ε in O(√n log((x^0)^T y^0 / ε)) iterations.

The order O(√n log((x^0)^T y^0 / ε)) is the best iteration bound known to date for feasible-interior-point algorithms for solving the LCP.
The path-following algorithm of this type was first proposed by Kojima, Mizuno and Yoshise [30]. While Algorithm 8.4.3 employs the quantity (8.35) as a measure of the "distance" between a point (x, y) and the path of centers, many other measures have been proposed. For instance, Kojima et al. [26] used the function φ_cen as a measure and showed the relationships among several measures.

In the case of linear programs, taking a small β ∈ (0, 1) and a large step size θ shows outstanding performance in practice (see [36, 42, 40, 3], etc.). A difficulty with Algorithm 8.4.3 is that it often forces us to use a short step size θ and requires too many iterations. Several approaches have been proposed to overcome this difficulty (see [50, 24, 68], etc.).
Another problem to be solved is how to prepare an initial point (x^0, y^0) which belongs to the neighborhood N(α). We have at least three approaches to overcome this difficulty. The first is to construct an artificial problem from the original one, which we will describe in Section 5. The second is to use another type of path of centers and its neighborhood, adapted to the initial feasible-interior-point (x^0, y^0) ∈ S_{++} ([50, 46, 49], etc.). See Chapter 3 for such variants of the path-following algorithm. The last one, which may be the most practical approach among them, is to give up the idea of finding a feasible-interior-point as an initial point, and to develop an infeasible-interior-point algorithm which allows us to start from an infeasible-interior-point (x, y), i.e., (x, y) > 0 but not necessarily (x, y) ∈ S_af. See Chapter 5 for the idea of infeasible-interior-point algorithms and the many developments on this subject.
where ν > 0 is a parameter. Suppose that ν is a fixed positive number, and that we currently have a feasible-interior-point (x̄, ȳ) ∈ S_{++}. Let us find the next point (x, y) ∈ S_{++} according to the search mapping (8.16). To determine the next point, it is important to bound the value of the potential function at (x̄(θ), ȳ(θ)) for each θ. For this purpose, we use the following lemma, which has appeared in many papers ([9, 23, 77, 86], etc.).
Lemma 8.4.5 If ξ ∈ ℝ^n satisfies ‖ξ‖_∞ ≤ τ < 1, then

    Σ_{i=1}^n log(1 + ξ_i) ≥ e^T ξ − ‖ξ‖^2 / (2(1 − τ)).
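The inequality can be spot-checked numerically on random vectors with ‖ξ‖_∞ ≤ τ (an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(200):
    tau = rng.uniform(0.05, 0.9)
    xi = rng.uniform(-tau, tau, size=5)           # so ||xi||_inf <= tau < 1
    lhs = np.sum(np.log1p(xi))                    # sum_i log(1 + xi_i)
    rhs = xi.sum() - (xi @ xi) / (2 * (1 - tau))  # e^T xi - ||xi||^2 / (2(1-tau))
    assert lhs >= rhs - 1e-12
```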
For convenience in the succeeding discussions, we define

    V = X̄^{1/2} Ȳ^{1/2} = diag{√(x̄_i ȳ_i)},
    v = V e = (√(x̄_1 ȳ_1), √(x̄_2 ȳ_2), ..., √(x̄_n ȳ_n))^T,   (8.47)
    v_min = min{v_i : i = 1, 2, ..., n}.   (8.48)
Lemma 8.4.6 If the step size θ > 0 satisfies θ ‖X̄^{−1} Δx‖_∞ ≤ τ and θ ‖Ȳ^{−1} Δy‖_∞ ≤ τ for some τ < 1, then we have

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ θ ( ((n+ν)/(x̄^T ȳ)) e − V^{−2} e )^T ( Ȳ Δx + X̄ Δy )
        + θ^2 { ((n+ν)/(x̄^T ȳ)) Δx^T Δy + ( ‖X̄^{−1} Δx‖^2 + ‖Ȳ^{−1} Δy‖^2 ) / (2(1−τ)) }.   (8.49)
In view of the above approximation, the vector Ȳ Δx + X̄ Δy plays a crucial role in the linear term with respect to θ. Furthermore, the quadratic term includes the factors which we can obtain if we let h = Ȳ Δx + X̄ Δy in Lemma 8.3.1. So from now on we assume that (Δx, Δy) is the solution of the system (8.19) with (x, y) = (x̄, ȳ) for some h ∈ ℝ^n. By Lemma 8.3.1, we have

    Δx^T Δy ≤ (1/4) ‖V^{−1} h‖^2,   ‖X̄^{−1} Δx‖^2 + ‖Ȳ^{−1} Δy‖^2 ≤ (1/v_min^2) ‖V^{−1} h‖^2.
Hence, if θ satisfies

    (θ / v_min) ‖V^{−1} h‖ = τ,   (8.50)

then we obtain a bound for the last term of (8.49) in Lemma 8.4.6 as follows:

    θ^2 { ((n+ν)/(x̄^T ȳ)) Δx^T Δy + ( ‖X̄^{−1} Δx‖^2 + ‖Ȳ^{−1} Δy‖^2 ) / (2(1−τ)) }
        ≤ { (1/4)(1 + ν/n) + 1/(2(1−τ)) } τ^2.   (8.51)
Thus the remaining concern is to choose an h ∈ ℝ^n suitable for deriving the potential-reduction inequality

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ −δ

for some constant δ > 0. While there have been several proposals for such a vector h (see [31, 26, 76, 84, 85], etc.), here we take

    h = −( X̄ ȳ − (x̄^T ȳ / (n + ν)) e ),   (8.52)

for which the solution (Δx, Δy) of the system (8.19) coincides with the solution (Δx(β), Δy(β)) of (8.40) with

    β = n/(n + ν) ∈ (0, 1).   (8.53)
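The identification of (8.52) with the right-hand side of (8.40) at β = n/(n+ν) is a one-line algebraic identity, easy to confirm numerically (arbitrary test point and ν):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])
n, nu = len(x), 2.0
e = np.ones(n)

# h from (8.52) ...
h = -(x * y - (x @ y) / (n + nu) * e)
# ... equals the right-hand side of (8.40) with beta = n/(n+nu), cf. (8.53)
beta = n / (n + nu)
assert 0 < beta < 1
assert np.allclose(h, -(x * y - beta * (x @ y / n) * e))
```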
In this case, the coefficient of the linear term in (8.49) turns out to be −((n + ν)/(x̄^T ȳ)) ‖V^{−1} h‖^2. Hence, by the assumption (8.50) and the inequality (8.51), we obtain the bound

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ −τ √( (n v_min^2 / x̄^T ȳ − 1)^2 + ν^2 v_min^2 / x̄^T ȳ )
        + { (1/4)(1 + ν/n) + 1/(2(1−τ)) } τ^2.   (8.54)

Observe that

    n v_min^2 / x̄^T ȳ = n min{x̄_i ȳ_i : i = 1, 2, ..., n} / x̄^T ȳ ∈ (0, 1].

If

    n v_min^2 / x̄^T ȳ ≤ 1/2

then

    (n v_min^2 / x̄^T ȳ − 1)^2 ≥ 1/4,

and otherwise ν^2 v_min^2 / x̄^T ȳ > ν^2/(2n). In either case, the square root in (8.54) is bounded below by σ_1, where

    σ_1 = min{ 1/2, ν/√(2n) }.
Let us observe the second term on the right-hand side of (8.54). If we assume that

    τ ≤ 1/3,

then we can easily see that

    (1/4)(1 + ν/n) + 1/(2(1−τ)) ≤ 1 + ν/(4n) ≤ 2 max{1, ν/(4n)}.

Define

    σ_2 = 2 max{1, ν/(4n)}.

By combining the above results, we obtain the following inequality whenever τ ≤ 1/3:

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ −σ_1 τ + σ_2 τ^2.   (8.56)
The right-hand side of the above inequality is a quadratic function with respect to τ, and its coefficients are positive and do not depend on the current point (x̄, ȳ). Hence we can easily find a suitable τ which ensures a constant reduction of the function φ. In fact, let

    τ̄ = min{ 1/3, σ_1/(5σ_2) };

then

    −σ_1 τ̄ + σ_2 τ̄^2 ≤ −σ_1 τ̄ + (1/5) σ_1 τ̄ < −(1/2) σ_1 τ̄.

Hence, if we take the step size θ according to the equation (8.50) with τ = τ̄, then

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ −δ,

where δ is given by

    δ = (1/2) σ_1 τ̄.
It should be noted that the existence of the above θ is ensured by the inequality (8.55), which implies that
Input
    ν > 0: the function parameter;
    (x^0, y^0) ∈ S_{++}: the initial feasible-interior-point;
Parameters
    ε > 0: the accuracy parameter;
    β ∈ [0, 1]: the parameter of the convex combination of the centering direction and the affine scaling direction;
    δ > 0: the reduction parameter of the potential function φ;
    θ: the step size parameter;
begin
    (x, y) := (x^0, y^0);
    k := 0;
    while x^T y > ε do
        Calculate (Δx(β), Δy(β)) from (8.40);
        (Δx, Δy) := (Δx(β), Δy(β));
        Compute the search mapping (x(θ), y(θ)) by (8.16);
        Find θ̄ such that
            (x(θ̄), y(θ̄)) ∈ S_{++} and φ(x(θ̄), y(θ̄)) − φ(x, y) ≤ −δ;
        (x, y) := (x(θ̄), y(θ̄));
        k := k + 1;
    end
end.
If we choose β as in (8.53), then we can find a step size θ̄ for which the value of the potential function φ is decreased by δ, where

    δ = (1/2) σ_1 τ̄,   τ̄ = min{1/3, σ_1/(5σ_2)},   σ_1 = min{1/2, ν/√(2n)},   σ_2 = 2 max{1, ν/(4n)},   (8.57)

for every function parameter ν > 0. In this case, the generated sequence {(x^k, y^k)} satisfies (8.14) with the above δ for each k = 0, 1, ... and consequently (see Figure 8.9)
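The potential-reduction loop can likewise be sketched in NumPy. The implementation below is illustrative, not the authors' code: it takes β from (8.53), the step from the rule (8.50) with τ = τ̄ as in (8.57), ν = √n, and an arbitrary small monotone instance.

```python
import numpy as np

def potential_reduction(M, x, y, nu, eps=1e-6):
    """Sketch of the potential-reduction algorithm: beta = n/(n+nu) (8.53),
    step size from (8.50) with tau-bar built from (8.57)."""
    n = len(x)
    beta = n / (n + nu)
    sigma1 = min(0.5, nu / np.sqrt(2 * n))
    sigma2 = 2 * max(1.0, nu / (4 * n))
    tau = min(1.0 / 3.0, sigma1 / (5 * sigma2))
    k = 0
    while x @ y > eps:
        A = np.block([[np.diag(y), np.diag(x)], [-M, np.eye(n)]])
        h = -(x * y - beta * (x @ y / n) * np.ones(n))   # the vector h (8.52)
        d = np.linalg.solve(A, np.concatenate([h, np.zeros(n)]))
        v = np.sqrt(x * y)
        theta = tau * v.min() / np.linalg.norm(h / v)     # the rule (8.50)
        x, y = x + theta * d[:n], y + theta * d[n:]
        k += 1
    return x, y, k

M = np.array([[2.0, 1.0], [1.0, 2.0]])
x0 = np.array([1.0, 1.0])
y0 = np.array([2.0, 2.0])                    # (x0, y0) feasible for q = y0 - M x0
x, y, k = potential_reduction(M, x0, y0, nu=np.sqrt(2.0))
assert x @ y <= 1e-6 and np.all(x > 0) and np.all(y > 0)
```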
Theorem 8.4.8 Suppose that the LCP (8.2) satisfies Condition 8.2.2 and Condition 8.2.4. Let ν = O(√n). Define the parameters β and δ as in (8.53) and (8.57). Then Algorithm 8.4.7 terminates in O(√n log((x^0)^T y^0 / ε)) iterations with an approximate solution (x, y) ∈ S_{++} satisfying the desired accuracy x^T y ≤ ε.
In the discussion above, we employed the search direction (Δx(β), Δy(β)), which is the solution of the system (8.40) where β is given by (8.53). We can easily see that this direction coincides with the solution of the following minimization problem with a certain ω > 0:

In this case, the direction (Δx, Δy) is an affinely scaled steepest descent feasible direction of the potential function. The above scaling is often called primal-dual scaling and is regarded as a key to showing the best complexity of the interior-point algorithms for the LCP to date. See [83], where an analysis is provided in the case of another scaling.
The first potential-reduction algorithm was proposed by Karmarkar [23] for solving linear programming problems. While the original potential function was defined for problems in primal form, a primal-dual potential function of the type (8.11) was introduced by Todd and Ye [77] and also by Tanabe [74] in a multiplicative form. The first O(√n L)-iteration potential-reduction algorithm for the LCP was proposed by Kojima et al. [31], and further discussion of the algorithms can be found in [26] in connection with the path-following algorithm. See Chapter 4 for an overview of the potential-reduction algorithms developed so far.
In the discussion of the potential-reduction algorithm above, we only used the fixed parameter β defined by (8.53). However, the polynomial-time complexity can be ensured with other choices. See, for example, [26], where the authors also provided a unified framework including the path-following algorithm and the potential-reduction algorithm, based on the fact that the level set of the function φ_cen defined in (8.11) gives a neighborhood of the path of centers (see Figure 8.3 and Figure 8.6).
In the succeeding subsections, we intend to answer both of the above questions. Here we impose one more condition in order to define the size of the LCP, which is a necessary concept for discussing the polynomiality of algorithms.

Condition 8.5.1 All the elements of the matrix M and the vector q in the LCP (8.2) are integral.
where m_ij and q_i denote the (i, j)-th element of the coefficient matrix M and the i-th element of the constant vector q of the LCP (8.2), respectively, and ⌈z⌉ the smallest integer not less than z ∈ ℝ. It follows from the definition of L that every minor of the matrix (−M I q) is integral and its absolute value is less than 2^L/n^2. As we will see below, the polynomial-time complexity O(√n L) of the interior point algorithms can be derived from this fact.
Let x^0 ∈ ℝ^n_{++}. In general, the point x^0 does not necessarily satisfy M x^0 + q > 0. However, if we introduce the vector e ∈ ℝ^n of ones and a new variable x_{n+1} ∈ ℝ, then we can easily find x^0_{n+1} > 0 and y^0 satisfying

    y^0 = M x^0 + x^0_{n+1} e + q > 0.

We extend this idea and construct an artificial monotone LCP:

    LCP′: Find (x′, y′) ∈ ℝ^{2(n+1)}
          such that y′ = M′ x′ + q′,  (x′, y′) ≥ 0,   (8.58)
                    x′_i y′_i = 0,  i = 1, 2, ..., n + 1,
with

    x′ = ( x, x_{n+1} ),   y′ = ( y, y_{n+1} ),   M′ = (  M    e
                                                        −e^T   0 ),   q′ = ( q, q_{n+1} ).   (8.59)
Here x_{n+1} ∈ ℝ and y_{n+1} ∈ ℝ are artificial variables and q_{n+1} is a positive constant. We also use the symbols S′_+, S′_{++} and S′_cp for the set of all feasible points, all feasible-interior-points and all complementarity solutions of the LCP′, respectively. Choosing x^0 > 0 and x^0_{n+1} > 0 such that

    e^T x^0 < q_{n+1},   M x^0 + x^0_{n+1} e + q > 0,

we obtain a point (x′^0, y′^0) = (x^0, x^0_{n+1}, y^0, y^0_{n+1}) > 0 with

    y^0 = M x^0 + x^0_{n+1} e + q,   y^0_{n+1} = −e^T x^0 + q_{n+1}.   (8.60)
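The embedding (8.58)–(8.60) is mechanical and easy to script. The sketch below is illustrative; the skew-symmetric test instance and the rule for picking x^0_{n+1} are my own choices.

```python
import numpy as np

def embed(M, q, x0, qn1):
    """Build M', q' of (8.59) and the initial point (8.60); assumes
    e^T x0 < qn1 for the chosen x0."""
    n = len(q)
    e = np.ones(n)
    Mp = np.block([[M, e[:, None]],
                   [-e[None, :], np.zeros((1, 1))]])
    # pick x0_{n+1} large enough that y0 = M x0 + x0_{n+1} e + q > 0
    xn1 = max(1.0, 1.0 - (M @ x0 + q).min())
    qp = np.concatenate([q, [qn1]])
    xp0 = np.concatenate([x0, [xn1]])
    yp0 = np.concatenate([M @ x0 + xn1 * e + q, [-e @ x0 + qn1]])
    return Mp, qp, xp0, yp0

M = np.array([[0.0, -1.0], [1.0, 0.0]])    # skew-symmetric => monotone, yet M x + q > 0 fails
q = np.array([-2.0, -2.0])
x0 = np.array([1.0, 1.0])
Mp, qp, xp0, yp0 = embed(M, q, x0, qn1=10.0)
assert np.allclose(yp0, Mp @ xp0 + qp)      # (x'0, y'0) is feasible for LCP'
assert np.all(xp0 > 0) and np.all(yp0 > 0)  # and interior
w = np.random.default_rng(0).standard_normal(3)
assert w @ Mp @ w >= -1e-12                 # M' remains positive semi-definite
```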
Note that the matrix M′ defined by (8.59) is positive semi-definite whenever M is positive semi-definite. Hence, if we replace the problem LCP in Condition 8.2.2 and Condition 8.2.4 with the artificial problem LCP′ (8.58), then the LCP′ satisfies both of these conditions. The next lemma follows from Theorem 8.3.6:
Lemma 8.5.2 Suppose that the LCP (8.2) satisfies Condition 8.2.2. Construct the artificial problem LCP′ (8.58) by using (8.59). Then LCP′ (8.58) is a monotone LCP which has a feasible-interior-point (x′^0, y′^0) ∈ S′_{++} for every positive constant q_{n+1} > 0. Moreover, the LCP′ has a solution (x′*, y′*) ∈ S′_cp.
Therefore, we may apply any of the feasible-interior-point algorithms to the artificial problem LCP′ (8.58). The remaining problem is what kind of information about the solution set S_cp of the original problem can be obtained from a solution of the LCP′. To see this, let us define L and L̂ as follows:

    L = Σ_{i=1}^n Σ_{j=1}^n log_2(|m_ij| + 1) + Σ_{i=1}^n log_2(|q_i| + 1) + 2 log_2 n,   (8.61)

    L̂ = Σ_{i=1}^n Σ_{j=1}^n ⌈log_2(|m_ij| + 1)⌉ + Σ_{i=1}^n ⌈log_2(|q_i| + 1)⌉ + 2 ⌈log_2(n + 1)⌉.   (8.62)

    (8.63)

By use of this inequality, we can obtain the following lemma (see [29, 51], etc.).
Lemma 8.5.3 Suppose that the LCP (8.2) satisfies Condition 8.2.2. Construct the artificial problem LCP′ (8.58) by using (8.59). Let a solution (x′*, y′*) = (x*, x*_{n+1}, y*, y*_{n+1}) of the LCP′ (8.58) be given. Then

(ii) if x*_{n+1} > 0 then the LCP (8.2) has no solution in the set {(x, y) ∈ ℝ^{2n}_+ : e^T x < q_{n+1}}, and

(iii) if q_{n+1} ≥ 2^L/n and x*_{n+1} > 0 then the LCP (8.2) has no solution.
If we turn our attention to the two interior point algorithms described in Section 4, they need more strictly conditioned initial points (see Theorem 8.4.4 and Theorem 8.4.8). However, we can also resolve this problem by suitably setting x^0 and q_{n+1}. Let us define the size L′ of the artificial problem LCP′ (8.58), the neighborhood N′(α) of the path of centers and the potential function φ′ associated with the artificial problem LCP′ (8.58) according to the definitions (8.62), (8.36) and (8.11) for the original LCP:

    L′ = Σ_{i=1}^{n+1} Σ_{j=1}^{n+1} ⌈log_2(|m′_ij| + 1)⌉ + Σ_{i=1}^{n+1} ⌈log_2(|q′_i| + 1)⌉   (8.64)

In the lemma below, the parameter γ serves for leveling the values of x′^0_i y′^0_i (i = 1, 2, ..., n + 1) and for bringing the initial point (x′^0, y′^0) close to the path of centers S′_cen (see [29, 30], etc.).
Lemma 8.5.4 Let n ≥ 2. Suppose that the LCP (8.2) satisfies Condition 8.5.1. Construct the artificial problem LCP′ (8.58) by using (8.59). Let

    α ∈ (0, 5/2],
Then
If we choose

    γ = γ̄(M, q) = 2^{L+1}/n^2,

then

    q_{n+1} = (n + 1) 2^{L+1}/n^2,

and we see that q_{n+1} ≥ 2^L/n, i.e., the requirement in (iii) of Lemma 8.5.3 for q_{n+1} is fulfilled. Therefore, the theorem below follows from the three lemmas above:
Theorem 8.5.5 Let n ≥ 2. Suppose that the LCP (8.2) satisfies Condition 8.5.1. Construct the artificial problem LCP′ (8.58) by using (8.59). Let α ∈ (0, 5/2], γ ∈ [γ(M, q), γ̄(M, q)] and q_{n+1} = (n + 1)γ, where γ(M, q) and γ̄(M, q) are defined by (8.65). Let (x′^0, y′^0) = (x^0, x^0_{n+1}, y^0, y^0_{n+1}) be the initial point given by (8.66) and let (x′*, y′*) = (x*, x*_{n+1}, y*, y*_{n+1}) be a solution of the artificial problem LCP′. Then, the following results hold.

(iv) The input size L′ of the LCP′ defined by (8.64) satisfies L ≤ L′ ≤ 5L.

(ii)′ If x*_{n+1} = 0 then we have a solution (x*, y*) of the LCP (8.2), and otherwise the original LCP (8.2) has no solution.
To decide whether the original LCP has a solution or not, our analysis requires the use of the number γ = γ̄(M, q) = 2^{L+1}/n^2, which often becomes extraordinarily large for practical use. On the other hand, we can compute the number
Recently, Ye [87] proposed another type of artificial problem for the monotone LCP. The problem is given by the following homogeneous model:

    y = M x + q,  (x, y) ≥ 0

can be represented as a ratio Δ_1/Δ_2, where Δ_1 is a minor of order n of the matrix [−M I q] and Δ_2 a nonzero minor of order n of the matrix [−M I] (see [29, 26], etc.):
Lemma 8.5.6 Let n ≥ 2. Assume that Condition 8.5.1 holds. Suppose that (x̄, ȳ) ∈ S_+ satisfies x̄^T ȳ ≤ 2^{−2L}. Define the index sets I and J by
Though the lemma above only ensures the existence of an exact solution (x*, y*) of the LCP, a method has been proposed for computing the solution (x*, y*) from the approximate solution (x̄, ȳ) in O(n^3) arithmetic operations (see Appendix B of [29]). Combining the results in Section 4 and the discussion above, let us derive the computational complexity of the two feasible-interior-point algorithms in Section 4.

Suppose that the LCP (8.2) satisfies Condition 8.5.1. Theorem 8.5.5 implies that we can start both algorithms for solving the artificial problem LCP′ (8.58) from the initial point (x′^0, y′^0) ∈ S′_{++} described in the theorem. From (ii) of Lemma 8.5.4, the initial point (x′^0, y′^0) satisfies the equalities

Thus, by each of the algorithms, an approximate solution (x′^K, y′^K) ∈ S′_{++} with (x′^K)^T y′^K ≤ ε can be obtained after

    K = O( √n log( 2^{O(L)} / ε ) )

iterations (see Theorem 8.4.4 and Theorem 8.4.8). If we take ε = 2^{−2L′} then we obtain an exact solution (x′*, y′*) ∈ S′_cp of the artificial problem LCP′, and if we take γ = γ̄(M, q) as in Theorem 8.5.5 then we can determine from the solution (x′*, y′*) whether the original LCP (8.2) has a solution or not. Note that the input size L′ of the artificial problem LCP′ (8.58) satisfies (iii) of Lemma 8.5.4. Thus the required number of iterations turns out to be

    K = O(√n L)

in each of the algorithms. It should be noted that each iteration requires O((n + 1)^3) = O(n^3) arithmetic operations, mainly due to the calculation of the search direction satisfying the system (8.40). Additionally, the last iteration needs O(n^3) arithmetic operations to refine the solution. Summarizing the discussion above, we finally obtain the following theorem:
Theorem 8.5.7 Suppose that the LCP (8.2) satisfies Condition 8.2.2 and Condition 8.5.1. Construct the artificial problem (8.58) as in Theorem 8.5.5 and apply the feasible-interior-point algorithms described in Section 4 for solving the LCP′ (8.58). Then, in each of the cases, we can either find an exact solution of the original LCP or determine that the original LCP has no solution in O(√n L) iterations with O(n^{3.5} L) arithmetic operations.
If we combine the path-following algorithm with approximate scaling matrices for computing the search directions, the average number of arithmetic operations per iteration can be theoretically reduced to O(n^{2.5}) and the total number of operations to O(n^3 L), which is the best bound at present (see [23, 29, 49], etc.).
The algorithms presented in Section 4 are based on the idea of using the Newton
direction, i.e., the solution of the system (8.40) with a fixed β, at each iteration. However,
there have been many algorithms outside of this framework. One such
algorithm is the so-called predictor-corrector algorithm, which alternates between the affine direction
(Δx^a, Δy^a) (the solution of (8.40) with β = 0) and the centering direction
(Δx^c, Δy^c) (the solution of (8.40) with β = 1) as the iteration proceeds.
A remarkable feature of this algorithm is that not only polynomial-time properties
of the algorithm but also various asymptotic convergence properties of the generated
sequence have been reported under certain assumptions ([20, 19, 21, 22, 48, 47, 55,
57, 56, 67, 69, 80, 88], etc.). Among others, Ye and Anstreicher [88] showed quadratic
convergence of the feasible predictor-corrector algorithm for the monotone LCP under
the assumption that a strictly complementary solution exists. Wright [80] and
Potra [69] proved superlinear or quadratic convergence of the infeasible predictor-corrector
algorithm for the LCP under the same assumption. Monteiro and Wright
[55] investigated the behavior of feasible and/or infeasible
predictor-corrector algorithms for the monotone LCP when the LCP is degenerate,
and Mizuno [47] succeeded in weakening the assumption and derived superlinear
convergence of the infeasible predictor-corrector algorithm for solving a geometrical
(or general) LCP (8.7) which has a solution (not necessarily strictly complementary).
340 CHAPTER 8
Another type of algorithm is given in [15], where a new class of search directions is
introduced. Each direction in this class is given by the solution of a system of the form

    Y Δx + X Δy = h_r,    −M Δx + Δy = 0,

where (x, y) ∈ S_{++}, r is a nonnegative real number, and the right-hand side h_r
depends on r (see [15] for its precise form). If we take r = 0 then the solution
of the above system is equivalent to the affine direction (Δx^a, Δy^a). However,
in the case r > 0, the solution cannot be represented as a linear combination
of the affine direction (Δx^a, Δy^a) and the centering direction (Δx^c, Δy^c). See [15]
for the theoretical results, including a polynomial complexity bound for this type of
algorithm.
In order to show the existence of the path of centers for the monotone LCP, we only
used some specific properties of the problem (see Section 3). In fact, Kojima et al.
[26] showed that there exists a path of centers S_cen converging to a solution under
the following condition (see Theorem 4.4 of [26]):

Condition 8.6.1

(i) The matrix M of the LCP (8.2) belongs to the class P_0 of matrices with all
principal minors nonnegative.
Thus, the condition above may be considered as a sufficient condition on the LCP
for ensuring the global convergence of feasible-interior-point algorithms. To derive
polynomiality of the algorithms, we repeatedly used Lemma 8.3.1, implied by the
monotonicity assumption on the LCP (see Section 4). Among others, assertion
(ii) of this lemma is essential for deriving the bounds (8.21) and (8.22) concerning
(Δx, Δy). However, similar bounds can also be obtained as long as the value of
Δx^T Δy is bounded from below. Based on this observation, the class of P_*-matrices
was first introduced in [26]. According to the definition in [26], the class P_* is the
union of the classes P_*(κ) over κ ≥ 0, where P_*(κ) (κ ≥ 0) consists of
matrices M such that

    (1 + 4κ) Σ_{i ∈ I_+(ξ)} ξ_i [Mξ]_i + Σ_{i ∈ I_-(ξ)} ξ_i [Mξ]_i ≥ 0   for every ξ ∈ R^n,

where

    I_+(ξ) = {i ∈ {1, 2, ..., n} : ξ_i [Mξ]_i > 0},   I_-(ξ) = {i ∈ {1, 2, ..., n} : ξ_i [Mξ]_i < 0}.
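The P_*(κ) inequality (1 + 4κ) Σ_{i∈I_+(ξ)} ξ_i[Mξ]_i + Σ_{i∈I_-(ξ)} ξ_i[Mξ]_i ≥ 0 is easy to test numerically for sample vectors ξ. A small sketch (the function name is ours):

```python
import numpy as np

def pstar_lhs(M, xi, kappa):
    """Left-hand side of the P_*(kappa) inequality:
    (1 + 4*kappa) * sum over I+ of xi_i [M xi]_i + sum over I- of xi_i [M xi]_i."""
    t = xi * (M @ xi)                 # componentwise products xi_i [M xi]_i
    return (1 + 4 * kappa) * t[t > 0].sum() + t[t < 0].sum()
```

For a positive semidefinite M the inequality already holds with κ = 0, since the two sums then add up to ξ^T M ξ ≥ 0, while a matrix such as M = (−1) fails for every κ.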
Let P_SD be the class of positive semi-definite matrices, P be the class of matrices
with positive principal minors, and CS and RS be the classes of column-sufficient and
row-sufficient matrices, respectively. Some known implications are

    P ∪ P_SD ⊆ P_* ⊆ CS ⊆ P_0

(see [5, 26, 79], etc.). Concerning the LCP with a P_*-matrix, the following results
have been shown (see Lemma 4.5 and Lemma 3.4 of [26]):
Lemma 8.6.2 Suppose that the matrix M in (8.2) is a P_*-matrix and that Condition
8.2.4 holds. Then Condition 8.6.1 holds.

Lemma 8.6.3 If the matrix M belongs to the class P_*(κ) with κ ≥ 0, then, for every
(x, y) ∈ R^{2n}_{++},

Therefore, the path of centers exists under the assumptions in Lemma 8.6.2, and
we can analyze the one-step behavior of the algorithm using Lemma 8.6.3 as in
Section 4. It has been proved that the LCP with a P_*-matrix M can be solved in
O(√n (1 + κ) L) iterations by constructing a suitable artificial problem (see [26]).
As described in Section 2, there are various types of LCPs, such as the MLCP (8.5),
the HLCP (8.6), the GLCP (8.7), etc. Recently, the LCP with a P_*-matrix has
attracted much attention, partially due to the fact that the P_*-property relates
these LCPs to each other. Let us define the P_*(κ)-property for these problems as follows:

GLCP: The dimension of Φ is n and x^T y ≥ −4κ Σ_{i ∈ I_+} x_i y_i for every (x, y) ∈ Φ.
Here I_+ = {i : x_i y_i > 0}. Anitescu, Lesaja and Potra [2] showed that the P_*(κ)-property is invariant
under some transformations which convert the above types of LCPs into each other.

It should be noted that the LCPs discussed so far constitute a mere part of the wide
class of LCPs, and that there are many other LCPs for which no polynomial-time
algorithm has been provided yet. It is known that the general P_0-matrix LCP,
i.e., the LCP for which only requirement (i) of Condition 8.6.1 is ensured, is NP-complete
(see Section 3.4 of [26]), while the Newton direction for the system
(8.17) can be computed (see Lemma 4.1 of [26]). See also [44] for an attempt to
find the complexity of another class of LCPs.
The nonlinear CP is another important problem in the field of interior point
algorithms. Kojima et al. [27] extended the results in [29] to a class of nonlinear
complementarity problems, and this work was followed by [28] and by [25]. In these
papers, the following three conditions are proposed:

Condition 8.6.4

(i) The mapping f is a P_0-function, i.e., for every x^1 ∈ R^n and x^2 ∈ R^n with
x^1 ≠ x^2, there exists an index i ∈ {1, 2, ..., n} such that

    x^1_i ≠ x^2_i   and   (x^1_i − x^2_i)( f_i(x^1) − f_i(x^2) ) ≥ 0.

The following mappings are also used:

    u(x, y) = (x_1 y_1, x_2 y_2, ..., x_n y_n),   v(x, y) = y − f(x),
    v(R^{2n}_{++}) = { v ∈ R^n : v = y − f(x) for some (x, y) ∈ R^{2n}_{++} }.

Condition 8.6.5 The mapping f is a uniform P-function, i.e., there exists a positive
number γ such that, for every x^1 ∈ R^n and x^2 ∈ R^n,

    max_{i ∈ {1,...,n}} (x^1_i − x^2_i)( f_i(x^1) − f_i(x^2) ) ≥ γ ||x^1 − x^2||^2.

Condition 8.6.6

(i) The mapping f is a monotone function, i.e., for every x^1 ∈ R^n and x^2 ∈ R^n,

    (x^1 − x^2)^T ( f(x^1) − f(x^2) ) ≥ 0.
Concerning algorithms for nonlinear CPs, Kojima et al. [25] provided a homotopy
continuation method which traces the center trajectory and globally converges to a
solution of the CP (8.1) under Condition 8.6.4. In [33], a more general framework
for globally convergent infeasible-interior-point algorithms is described in terms
of the global convergence theory given by Polak [66] in 1971. While the papers
mentioned above take the global convergence properties of the algorithms as
their main aim, the study of convergence rates has also become active for
smooth convex programming (see Chapter 8). In order to derive a convergence
rate, we must impose certain conditions on the smoothness of the nonlinear mapping
f. For the variational inequality problem, Nesterov and Nemirovsky [62] analyzed
the convergence rate of Newton's method in terms of the so-called self-concordant
barrier under the following condition on f:
Condition 8.6.7 The mapping f is a C^2-smooth monotone operator f : R^n_+ → R^n
which is β-compatible with F(x) = −Σ_{i=1}^n ln x_i, i.e., there exists a β ≥ 0 such that for all
x > 0 and h^i ∈ R^n (i = 1, 2, 3), the inequality

    | f''(x)[h^1, h^2, h^3] | ≤ 3^{3/2} β ∏_{i=1}^{3} { ( f'(x)[h^i, h^i] )^{1/3} ||X^{-1} h^i||^{1/3} }

holds.
for all t > 0 and x^1, x^2 ∈ R^n for which r := √( (x^1 − x^2)^T ∇f_t(x)(x^1 − x^2) ) < 1,
and h ∈ R^n.
On the other hand, Potra and Ye [70] presented a potential reduction algorithm for
the monotone CP and derived global and local convergence rates of the algorithm.
They used the so-called scaled Lipschitz condition below, which was introduced by
Zhu [91] for convex programming problems and used by Kortanek et al. [35] for
an analysis of a primal-dual method for entropy optimization problems, by Sun et
al. [73] for the min-max saddle point problems, and by Andersen and Ye [1] for the
monotone LCP embedded in a homogeneous problem.
Moreover, Jansen et al. [16] introduced the following condition for the mapping f:

In [16], the authors showed the global convergence rate of a class of affine-scaling
algorithms of [15] under Condition 8.6.10, and provided some relationships among the
four conditions above. Note that the definition (8.72) of the scaled Lipschitz
condition implies that h^T ∇f(x) h ≥ 0 for every x > 0, which eliminates non-monotone
mappings f a priori. Even in the linear case, i.e., when f is given by f(x) = Mx + q,
Condition 8.6.9 does not necessarily hold for P_*-matrices. On the other hand, Condition
8.6.10 needs no monotonicity and holds for any linear mapping, which may be considered
a merit of the condition.
Other remarkable aspects of interior point algorithms for the CP are the development
of infeasible-interior-point algorithms and the extension to semidefinite
programming. See Chapters 5 and 9 for the progress on these topics.
(i): Let us consider the optimization model M1 (8.8). The objective function
x^T y is rewritten as

    x^T y = (1/2) (x^T, y^T) ( 0  I ) ( x )
                             ( I  0 ) ( y ).
We only show the second and the third parts of the lemma. The closedness
of the set S_+(τ) can be obtained from the continuity of x^T y. Hence, it
suffices to show that the set S_+(τ) is bounded for every τ ≥ 0. Let (x, y)
be a fixed feasible-interior-point whose existence is ensured by Condition
8.2.4. Then, by Condition 8.2.2, we obtain the following inequality:

However, since the matrix X^{-1} Y + M is positive definite for every (x, y) >
0 whenever M is positive semi-definite, the above equation implies that
d^1 = 0 and d^2 = M d^1 = 0, which contradicts (d^1, d^2) ≠ 0. Thus we have
shown (i).
(ii): Since the matrix M is positive semi-definite, the equation Δy = M Δx
ensures that

    0 ≤ Δx^T M Δx = Δx^T Δy.

On the other hand, the equation Y Δx + X Δy = h implies that
The function ξ − μ log ξ is strictly convex on R_{++} and attains its minimum
at ξ = μ. Hence the point (x, y) ∈ S_{++} satisfying Xy = μe is an
optimal solution of L(μ). □
Let μ > 0 be fixed. The Hessian matrix of the objective function φ(μ, ·)
at (x, y) ∈ S_{++} is given by

    ( μX^{-2}   I       )
    ( I         μY^{-2} ).
Let (x', y') and (x'', y'') be arbitrary points in the set S_af such that (x', y') ≠
(x'', y''). Then we observe that

    = μ||X^{-1}(x' − x'')||^2 + 2(x' − x'')^T (y' − y'') + μ||Y^{-1}(y' − y'')||^2
    = μ||X^{-1}(x' − x'')||^2 + 2(x' − x'')^T M (x' − x'') + μ||Y^{-1}(y' − y'')||^2
    > 0.
Thus the Hessian matrix is positive definite at each point of the nonempty
convex set S_{++} = S_af ∩ R^{2n}_{++}, which implies that φ(μ, ·) is strictly convex
on S_{++}. Consequently, if the problem L(μ) has an optimal solution then it
is the unique solution. In order to see the existence of the optimal solution,
it suffices to show that the level set

    { (x, y) ∈ S_{++} : τ ≥ φ(μ, x, y) = Σ_{i=1}^n ( x_i y_i − μ log(x_i y_i) ) }

is bounded; it is contained in the set S_+(τ'), where

    τ' = 2n( τ − n(μ − μ log μ) + μ log 2 ).

As we have seen in Lemma 8.2.6, the set S_+(τ') is bounded under Condition
8.2.2 and Condition 8.2.4, which completes the proof. □
    y − μX^{-1}e + M^T z = 0,   x − μY^{-1}e − z = 0,   y − Mx − q = 0.

From the first and the second equalities, we observe that

    Xy − μe = −X M^T z = Y z.

Letting z' = Mz, the system −X M^T z = Y z can be rewritten as follows:

It follows from (i) of Theorem 8.3.1 that the coefficient matrix of the above
system is nonsingular, hence we can conclude that the Lagrange multiplier
vector z is 0. □
The existence and the uniqueness of the solution (x(μ), y(μ)) ∈ S_{++} of
the system (8.32) are ensured by Lemmas 8.3.3, 8.3.4 and 8.3.5. Furthermore,
the mapping H defined by (8.31) is C^∞ on R × R^{2n}, and its Jacobian
matrix with respect to (x, y) coincides with the matrix defined by (8.20). Since (i)
of Theorem 8.3.1 ensures that this matrix is nonsingular for every (x, y) > 0,
we obtain that the path of centers S_cen is a 1-dimensional smooth curve
by applying the implicit function theorem (see, for example, [64]). Let μ̄
be fixed. Then the set {(x(μ), y(μ)) : 0 < μ ≤ μ̄} ⊂ S_cen is bounded, since it
is contained in the bounded set {(x, y) ∈ S_+ : x^T y ≤ nμ̄} (see Lemma
8.2.6). This implies that there exists at least one accumulation point
of (x(μ), y(μ)) as μ > 0 tends to 0. By the continuity of the mapping H,
every accumulation point is a solution of the LCP. To see the convergence
of (x(μ), y(μ)) to a single point, we need to observe the limiting behavior
of (x(μ), y(μ)) more precisely.
In view of (ii) of Lemma 8.2.5, there exist two index sets I_x and I_y
such that

    lim_{μ→0} x_i(μ) = 0, i ∈ I_x,   and   lim_{μ→0} y_i(μ) = 0, i ∈ I_y.

Hence we only have to show that the other components of (x(μ), y(μ)) also converge
to some values. Let us define the function

    Ψ(x, y) = − Σ_{i=1}^n log x_i y_i.
Let x(μ)_i = ξ_i(μ) and y(μ)_i = η_i(μ), i = 1, 2, ..., n. It is easily seen that
the point (x(μ), y(μ)) is an optimal solution of the problem

    Minimize Ψ(x, y)

and

    Ψ_B(x, y) = Ψ(x, y) − Ψ_N(x, y).

Since Ψ_N(x, y) is constant on the set {(x, y) ∈ R^{2n} : x_i = ξ_i(μ), i ∈
I_x; y_i = η_i(μ), i ∈ I_y}, the point (x(μ), y(μ)) is the optimal solution of

    Minimize Ψ_B(μ, x, y)

over this set. The limiting problem

    Minimize Ψ_B(μ, x, y)
    subject to (x, y) ∈ { (x, y) ∈ S_+ : x^T y = 0,
                          x_i = 0, i ∈ I_x;  y_i = 0, i ∈ I_y,
                          x_i > 0, i ∉ I_x;  y_i > 0, i ∉ I_y }

has a unique optimal solution, which we denote by (x(0), y(0)). This solution
can be characterized by the following system:
(i): Since (Xy − (x^T y / n) e)^T e = 0, we obtain (i) from (x, y) ∈ N(α), where

    h = −( Xy − β (x^T y / n) e ).
(iv) and (v): Combining (8.75) and (iii) above with the equations

(i):

    x(θ)^T y(θ) = (x + θ dx(β))^T (y + θ dy(β))
                = x^T y + θ ( y^T dx(β) + x^T dy(β) ) + θ^2 dx(β)^T dy(β)
                = x^T y − θ e^T ( Xy − β (x^T y / n) e ) + θ^2 dx(β)^T dy(β).
(ii):

    log(1 + ξ) ≥ ξ − ξ^2 / 2   if ξ ≥ 0.   (8.77)

The assertion (i) is the inequality (8.76) itself. To see (ii), it is sufficient to
show that

    log(1 + ξ) ≥ ξ − ξ^2 / ( 2(1 − τ) )

if ξ ≥ −τ for some τ ∈ [0, 1). In the case ξ ≥ 0, the above inequality follows
immediately from (8.77). Furthermore, if |ξ| ≤ τ, we observe that

    log(1 + ξ) ≥ ξ − ξ^2 / ( 2(1 − τ) ).

Thus we have shown (ii). □
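Both logarithmic inequalities can be checked numerically on a grid (a quick sanity check of ours, not part of the original proof):

```python
import numpy as np

# (8.77): log(1 + xi) >= xi - xi^2 / 2 for xi >= 0.
xs = np.linspace(0.0, 10.0, 1001)
assert np.all(np.log1p(xs) >= xs - xs**2 / 2)

# log(1 + xi) >= xi - xi^2 / (2 (1 - tau)) for xi >= -tau, tau in [0, 1).
tau = 0.5
xs = np.linspace(-tau, 10.0, 1001)
assert np.all(np.log1p(xs) >= xs - xs**2 / (2 * (1 - tau)))
```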
Proof of 8.4.6: The following inequality follows from the assumption (8.48) and Lemma 8.4.5:

    (x'^*)^T y'^* + (x')^T y' = (x'^*)^T y' + (x')^T y'^* + (x'^* − x')^T M' (x'^* − x').

Since the matrix M' is a positive semi-definite matrix, we have
First note that a γ exists such that γ ∈ [γ̲(M, q), γ̄(M, q)]. This can be
seen from the inequality (8.63) on L, which implies that

    γ̄(M, q) = 2^{2L} / n ≥ γ̲(M, q) = 2 max_{i ∈ {1,2,...,n}} { |[Me]_i|, |q_i| } ≥ 2.
(i): To see (x'^0, y'^0) ∈ S'_{++}, we have only to show that y^0 > 0. By the
definition (8.65) of γ̲(M, q) and γ ≥ γ̲(M, q), we have

It follows that

    ( (5 − 1) nγ / 2 ) e ≤ y^0 ≤ ( (5 + 1) nγ / 2 ) e.   (8.78)
    x^0_{n+1} y^0_{n+1} = (5/σ) nγ^3,   (8.79)

    ( 2/n − 1/(n+1) ) nγ^3 ≤ (x'^0)^T y'^0 ≤ ( 2/n + 1/(n+1) ) nγ^3,

    −( 1/n + 1/(n+1) ) nγ^3 ≤ x'^0_i y'^0_i − (x'^0)^T y'^0 / (n+1) ≤ ( 1/n + 1/(n+1) ) nγ^3.

Hence

    φ_cen(x'^0, y'^0) = Σ_{i=1}^{n+1} log( ( (x'^0)^T y'^0 / (n+1) ) / ( x'^0_i y'^0_i ) )
                      = Σ_{i=1}^{n} log( ( (x'^0)^T y'^0 / (n+1) ) / ( x^0_i y^0_i ) )
                        + log( ( (x'^0)^T y'^0 / (n+1) ) / ( x^0_{n+1} y^0_{n+1} ) )
                      ≤ Σ_{i=1}^{n} log( ( 5/σ + 1/(n+1) ) / ( 5/σ − 1/n ) )
                        + log( ( 5/σ + 1/(n+1) ) / ( 5/σ ) )
                      ≤ 3(L + 1) + 10/σ   (by (i) of Lemma 8.4.5).
    q̄_{n+1} < (n + 1) 2^{L+1} / 2.

The assertion (iii) is obtained by taking account of
the above inequality, the construction (8.59) of the LCP', the definitions
(8.62) and (8.64) of L and L', and the known inequality n(n + 1) ≤ L. □
    y = Mx + q,   (x, y) ≥ 0,

where (x^l, y^l) is a vertex of S_+ for l = 1, ..., p and (ξ, η) is an unbounded direction
of S_+, i.e., η = Mξ and (ξ, η) ≥ 0. Among the (x^l, y^l) (l = 1, ..., p), we can
find a vertex (x^*, y^*) of S_+ such that c_l ≥ 1/(n + 1). It follows that

Since each nonzero component of the vertex (x^*, y^*) is not less than n^2 2^{−L} >
(n + 1) 2^{−L} (n ≥ 2), the above inequalities imply that the vertex (x^*, y^*)
satisfies the relation (8.69). Combining this with the fact that I ∪ J = {1, 2, ..., n},
we can conclude that (x^*, y^*) is a solution of the LCP. □
Acknowledgements
The author would like to thank Professor Tamas Terlaky, the editor of this book,
for his warm encouragement and suggestions. Also, a colleague, Yasushi Kondo,
contributed valuable comments on an early version of this chapter.
REFERENCES
[1] E. D. Andersen and Y. Ye. On a homogeneous algorithm for the monotone com-
plementarity problem. Research Reports, Department of Management Sciences,
University of Iowa, Iowa City, Iowa 52242, 1995.
[2] M. Anitescu, G. Lesaja, and F. A. Potra. Equivalence between different formu-
lations of the linear complementarity problem. Technical report, Department
of Mathematics, University of Iowa, Iowa City, IA 52242, USA, 1995.
[3] R. E. Bixby, J. W. Gregory, I. J. Lustig, R. E. Marsten, and D. F. Shanno. Very
large-scale linear programming: a case study in combining interior point and simplex
methods. Operations Research, 40:885-897, 1992.
[4] J. F. Bonnans and F. A. Potra. Infeasible path following algorithms for linear
complementarity problems. Technical report, INRIA, B.P.105, 78153 Rocquen-
court, France, 1994.
[5] R. W. Cottle, J.-S. Pang, and R. E. Stone. The Linear Complementarity Problem.
Computer Science and Scientific Computing, Academic Press Inc., San Diego,
CA 92101, 1992.
[6] D. den Hertog. Interior point approach to linear, quadratic and convex program-
ming. Mathematics and Its Applications, Vol. 277, Kluwer Academic Publishers,
The Netherlands, 1994.
[7] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Un-
constrained Minimization Techniques. John Wiley & Sons, New York, 1968.
[8] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Re-
search Logistics Quarterly, 3:95-110, 1956.
[9] R. M. Freund. Polynomial-time algorithms for linear programming based only
on primal scaling and projected gradients of a potential function. Mathematical
Programming, 51:203-222, 1991.
[10] M. S. Gowda. On reducing a monotone horizontal LCP to an LCP. Techni-
cal report, Department of Mathematics & Statistics, University of Maryland
Baltimore County, Baltimore, Maryland 21228, 1994.
[11] O. Güler. Generalized linear complementarity problems. Research Reports,
Department of Mathematics and Statistics, University of Maryland Baltimore
County, Baltimore, Maryland 21228-5398, 1992.
[12] O. Güler. Existence of interior points and interior paths in nonlinear monotone
complementarity problems. Mathematics of Operations Research, 18:128-148,
1993.
[13] O. Güler. Barrier functions in interior point methods. Technical report, Depart-
ment of Mathematics and Statistics, University of Maryland Baltimore County,
Baltimore, Maryland 21228, USA, 1994.
[14] P. T. Harker and J.-S. Pang. Finite-dimensional variational inequality and
nonlinear complementarity problems: A survey of theory, algorithms and appli-
cations. Mathematical Programming, 48:161-220, 1990.
[15] B. Jansen, C. Roos, and T. Terlaky. A family of polynomial affine scaling
algorithms for positive semi-definite linear complementarity problems. ISSN
0922-5641, Faculty of Technical Mathematics and Informatics, Delft University
of Technology, P.O. Box 5031, 2600 GA Delft, The Netherlands, 1993.
[16] B. Jansen, K. Roos, T. Terlaky, and A. Yoshise. Polynomiality of primal-dual
affine scaling algorithms for nonlinear complementarity problems. Technical
Report 95-83, Faculty of Technical Mathematics and Computer Science, Delft
University of Technology, Delft, The Netherlands, 1995.
[17] F. Jarre. On the method of analytical centers for solving smooth convex pro-
gramming. In S. Dolecki, editor, Optimization, pages 69-86, Berlin, Germany,
1988. Lecture Notes in Mathematics No. 1405, Springer Verlag.
[19] J. Ji, F. Potra, and S. Huang. A predictor-corrector method for linear comple-
mentarity problems with polynomial complexity and superlinear convergence.
No. 18, Department of Mathematics, The University of Iowa, Iowa City, Iowa
52242, 1991.
[22] J. Ji, F. A. Potra, and R. Sheng. A predictor-corrector method for solving the
P_*-matrix LCP from infeasible starting points. Technical report, Department of
Mathematics and Computer Science, Valdosta State University, Valdosta, GA
31698, 1994.
[24] M. Kojima, Y. Kurita, and S. Mizuno. Large-step interior point algorithms for
linear complementarity problems. SIAM J. Optimization, 3:398-412, 1993.
[27] M. Kojima, S. Mizuno, and T. Noma. A new continuation method for com-
plementarity problems with uniform P-functions. Mathematical Programming,
43:107-113,1989.
[34] M. Kojima, S. Shindoh, and S. Hara. Interior-point methods for the mono-
tone linear complementarity problem in symmetric matrices. Technical report,
Department of Information Sciences, Tokyo Institute of Technology, 2-12-1 Oh-
Okayama, Meguro-ku, Tokyo 152, Japan, 1994.
[35] K. O. Kortanek and J. Zhu. A polynomial barrier algorithm for linearly con-
strained convex programming problems. Mathematics of Operations Research,
18:116-127,1993.
[36] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Computational experience with a
primal-dual interior point method for linear programming. Linear Algebra and
Its Applications, 152:191-222,1991.
[65] P. M. Pardalos and Y. Ye. The general linear complementarity problem. Tech-
nical report, Department of Computer Science, The Pennsylvania State Univer-
sity, University Park, PA 16802, 1990.
[69] F. A. Potra and R. Sheng. A path following method for LCP with superlinearly
convergent iteration sequence. Report on Computational Mathematics, No.
69/1995, Department of Mathematics, University of Iowa, Iowa City, IA 52242,
1995.
[70] F. A. Potra and Y. Ye. Interior point methods for nonlinear complementarity
problems. Technical report, Department of Mathematics, The University of
Iowa, Iowa City, Iowa 52242, 1991.
[71] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons,
New York, 1986.
[77] M. J. Todd and Y. Ye. A centered projective algorithm for linear programming.
Mathematics of Operations Research, 15:508-529, 1990.
[83] Y. Ye. A class of potential functions for linear programming. Technical re-
port, Integrated Systems Inc., Santa Clara, CA and Department of Engineering-
Economic Systems, Stanford University, Stanford, CA, 1988.
[84] Y. Ye. A further result on the potential reduction algorithm for the P-matrix
linear complementarity problem. Technical report, Department of Management
Sciences, The University of Iowa, Iowa City, Iowa 52242, 1988.
[85] Y. Ye. The potential algorithm for linear complementarity problems. Technical
report, Department of Management Sciences, The University of Iowa, Iowa City,
Iowa 52242, 1988.
[86] Y. Ye. An O(n^3 L) potential reduction algorithm for linear programming. Math-
ematical Programming, 50:239-258, 1991.
[87] Y. Ye. On homogeneous and self-dual algorithms for LCP. Technical report,
Department of Management Sciences, The University of Iowa, Iowa City, Iowa
52242, 1994.
[88] Y. Ye and K. Anstreicher. On quadratic and O(√n L) convergence of a predictor-
corrector algorithm for LCP. Mathematical Programming, 59:151-162, 1993.
[89] Y. Ye, M. J. Todd, and S. Mizuno. An O(√n L)-iteration homogeneous and
self-dual linear programming algorithm. Technical report, Department of Man-
agement Sciences, The University of Iowa, Iowa City, Iowa 52242, 1992.
[90] Y. Zhang. On the convergence of a class of infeasible interior-point methods
for horizontal linear complementarity problem. Research Report 92-07, Depart-
ment of Mathematics and Statistics University of Maryland Baltimore County,
Baltimore, Maryland 21228-5398, 1992.
[91] J. Zhu. A path following algorithm for a class of convex programming problems.
Zeitschrift für Operations Research, 36:359-377, 1992.
9
SEMIDEFINITE PROGRAMMING
Motakuri V. Ramana, Panos M. Pardalos
Center for Applied Optimization
Department of Industrial and Systems Engineering
University of Florida
Gainesville, Florida 32611, USA
ABSTRACT
Semidefinite Programming is a rapidly emerging area of mathematical programming. It
involves optimization over sets defined by semidefinite constraints. In this chapter, several
facets of this problem are presented.
9.1 INTRODUCTION
Let S_n be the space of n × n real symmetric matrices. For A, B ∈ S_n, A • B
denotes the inner product Σ_{i,j} A_{ij} B_{ij}, and we write A ⪰ B if A − B is positive
semidefinite. Suppose that Q_0, ..., Q_m ∈ S_n are given matrices, and c ∈ R^m. Then
the semidefinite program in equality standard form is defined to be the following
optimization problem:

    inf:  U • Q_0
    s.t.  U • Q_i = c_i,  i = 1, ..., m        (SDP-E)
          U ⪰ 0.

We also define the semidefinite program in inequality standard form to be:

    sup:  c^T x
    s.t.  Σ_{i=1}^m x_i Q_i ⪯ Q_0.             (SDP-I)
The two problems SDP-E and SDP-I are equivalent in the sense that one can be
transformed into the other with relative ease. Furthermore, as will be seen in sections
369
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming 369-398.
© 1996 Kluwer Academic Publishers.
370 CHAPTER 9
to follow, these problems are the so-called standard duals of each other. The main
motivation for starting out with both problems is that the first form appears to be
more suitable for algebraic purposes, while the latter has a strong geometric flavor.
Let f_E and f_I denote the optimal values of the problems SDP-E and SDP-I, respectively.
Both problems will be collectively referred to as SDP.
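The notation above is straightforward to realize numerically. The sketch below (with function names of our choosing) checks feasibility for the two standard forms, using the identity A • B = trace(A^T B) and an eigenvalue test for positive semidefiniteness:

```python
import numpy as np

def dot(A, B):
    """A . B = sum_ij A_ij B_ij = trace(A^T B)."""
    return float(np.sum(A * B))

def is_psd(A, tol=1e-9):
    """A symmetric matrix is PSD iff all its eigenvalues are nonnegative."""
    return float(np.linalg.eigvalsh(A).min()) >= -tol

def sdp_e_feasible(U, Q, c):
    """Constraints of SDP-E: U . Q_i = c_i for all i, and U PSD."""
    return is_psd(U) and all(abs(dot(U, Qi) - ci) < 1e-9 for Qi, ci in zip(Q, c))

def sdp_i_feasible(x, Q, Q0):
    """Constraint of SDP-I: sum_i x_i Q_i <= Q0 in the semidefinite order."""
    return is_psd(Q0 - sum(xi * Qi for xi, Qi in zip(x, Q)))
```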
At the outset, it should be mentioned that two recent survey articles have already
appeared on SDP, namely [3] and [92] (an earlier version of the latter is [91]). The
main thrust of these two surveys has been interior point methodologies for SDP. In
addition, in [3], applications to combinatorial optimization are discussed, and
in [92], applications to engineering problems and other optimization problem classes
are presented. Keeping the above in mind, here we will dwell upon aspects that
have received less attention in the abovementioned references. In particular, only
sketchy attention will be paid to interior point methods, despite the stated title of
the current volume. Several open problems will be stated with the hope that they
will inspire further developments in this highly promising subject area.
    Q(x) = Σ_{i=1}^m x_i Q_i,

    G = { x | Q(x) ⪯ Q_0 },

where Q(x) is a linear symmetric matrix map as defined above, and Q_0 ∈ S_n. In
other words, G is the feasible region of the semidefinite program SDP-I. It is not hard
to see that the feasible region of SDP-E can be recast in the above inequality form,
and hence spectrahedra are precisely the feasible regions of semidefinite programs.
The name spectrahedron is chosen for the reason that their definition involves the
spectrum (the eigenvalues) of matrices, and they bear a resemblance to, and are a
generalization of, polyhedra.
Certain properties of spectrahedra have been studied in [79] and [68]. Some of these
properties are:
Using this, one can characterize extreme points and extreme rays of spectrahe-
dra. It is also known that every face of a spectrahedron is exposed (i.e., each
face of G can be written as the intersection of a hyperplane with G; see [84] for
examples of nonexposed faces of general convex sets).
2. Spectrahedra are closed under intersections, but are not closed under linear
mappings, projections, polar operation or Minkowski sums ([79]).
3. Unlike polyhedra [8], the dimensions of the faces of a spectrahedron need not
form a contiguous string. Take the PSD cone, for instance, which is a spectrahedron,
and it is well known that the dimensions of its faces are the triangular
integers k(k + 1)/2 for k = 0, ..., n (see [9] and [21]).
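For example, for n = 4 the attainable face dimensions are 0, 1, 3, 6 and 10; a value such as 2 is skipped, so the dimensions indeed fail to form a contiguous string:

```python
# Face dimensions of the n x n PSD cone: the triangular integers k(k+1)/2.
n = 4
dims = [k * (k + 1) // 2 for k in range(n + 1)]
print(dims)  # [0, 1, 3, 6, 10]
```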
    E_n := { U ∈ S_n | U_ii = 1 ∀ i, U ⪰ 0 }.

Such matrices are also known as correlation matrices, and they play a critical role
in the approximation algorithm for the MAXCUT problem developed in [26]. More
specifically, as we will see in more detail later, their method is a relaxation in which
one optimizes a linear objective function over E_n. In [50] and [51], this object has
been investigated. In particular, their results include the following.
In [68], results concerning the facial structure of spectrahedra are given. The following
results are also derived.

1. Bounds on the ranks of the matrices (U for the SDP-E case and Q_0 − Q(x) for
the SDP-I problem) when the solutions are extreme points.

2. Bounds on the multiplicity of the eigenvalues of the matrices at extreme point
optimal solutions ([69]).

3. In [70] and [49], the extreme points are treated as a generalization of the notion
of basic feasible solutions from LP, and "simplex-type" methods for SDP have
been proposed.
    G = { x | Q(x) ⪯ Q_0 },

clearly G contains the origin exactly when Q_0 ⪰ 0. Supposing that this latter
condition holds, it is not hard to derive (see [79]) the following expression for the
polar:

    G° = cl( { Q*(U) | U ⪰ 0, Q_0 • U ≤ 1 } ),

where Q*(U) denotes the adjoint of the linear map Q(x), and cl(·) is the closure
operation. When G is full dimensional, it is not necessary to take the closure in the
above expression, thus yielding an algebraic description of the polar for this case.
However, when full dimensionality is not satisfied, this fails to hold. In [76], by
using an incremental argument, an expression for G° is derived for the most general
situation. This in turn yields a polynomial size gapfree dual program for SDP, which
will be discussed in Section 9.2.2.
    E = { x | f(x) ≤ 1 },

where f(x) = x_1^2 + (x_2 − 2)^2/4 + x_3^2/4. Then every point on the boundary of G is an
extreme point, and consequently, all faces are zero dimensional, except for the whole
set itself, which is 3-dimensional. However, the surface of G can be partitioned into
three pieces: two smooth surfaces, which are given by exactly one of the functions
x^T x − 1 and f(x) − 1 being zero (and the other being negative), and one closed
nonplanar curve which is the intersection of the two surfaces of B and E. This curve
is parametrized as
Prompted by the above and other similar examples, we define the following nonlinear
notion of faces, called plates. Let G = { x ∈ R^m | Q(x) ⪯ Q_0 } be a spectrahedron,
where Q(x) is a linear n × n matrix map. Then, for every 0 ≤ k ≤ n, define the
subset of G given by:
2. Using the classical results of Whitney [94], it can be shown that every spectra-
hedron has at most finitely many plates.
Of course, very little is understood concerning the plates of spectrahedra and their
structure at this point. However, it appears that Algebraic Geometry techniques
such as Groebner bases ([10] and [16] are good introductory texts) are applicable
here.
One can reverse the minmax into maxmin, and it can be shown once again that

This implies that f_I ≤ f_E. There exist several examples for which equality fails to
hold (see, for instance, [91], [76] or [22]). Let us define, for the pair of semidefinite
programs SDP-E and SDP-I, the standard duality gap (SDG) to be the difference
f_E − f_I. Listed below are some conditions under which the SDG is zero (from [91]; see
[59] for a thorough treatment).
1. There exists a primal feasible solution U that is positive definite, or, less restrictively
(see [79] for an explanation), the primal feasible region is full dimensional.
When none of the above conditions hold, one may have a nonzero duality gap.
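For concreteness, here is a small instance with a finite nonzero gap (a standard example of this type from the SDP literature, not taken from the surveys above):

    inf:  x_1
    s.t.  ( 0     x_1   0       )
          ( x_1   x_2   0       ) ⪰ 0.
          ( 0     0     x_1 + 1 )

Feasibility forces x_1 = 0, since a positive semidefinite matrix with a zero diagonal entry must have zeros throughout that row and column; hence the primal optimal value is 0. The Lagrangian dual maximizes −Z_33 over Z ⪰ 0 subject to Z_22 = 0 and 2Z_12 + Z_33 = 1. The constraint Z_22 = 0 forces Z_12 = 0, so Z_33 = 1 and the dual optimal value is −1: a duality gap of one.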
Therefore, it is a natural question to ask if there exists a polynomial size dual program
for SDP which can be written down using the primal data and for which the duality
gap is zero, without any assumptions. A first step in this direction was taken in [13],
where it was shown that for any cone programming problem, restricting attention
to the minimal cone will result in zero duality gap. Furthermore, a theoretical (and
unimplementable) method for regularizing a cone program was given. While this
approach to duality gives a zero duality gap, the resulting dual programs are not explicit
polynomial size programs that depend only on the primal data. The derivation of
such a dual was an open problem before it was resolved in [76]. The approach used
there was to establish a description of polars of spectrahedra and use it to formulate
the dual program (for SDP-I) called Extended Lagrange-Slater Dual (ELSD).
In the following, we will present the ELSD program and state the main duality
theorem on ELSD. But first some notation is introduced.
    Q#(U) = ( Q_0 • U, Q*(U) ),        Q(y) = Σ_{i=0}^m y_i Q_i.
The following is a gapfree dual semidefinite program, called the Extended
Lagrange-Slater Dual (ELSD), for SDP-I.

    inf   (U + W_m) • Q_0
    s.t.  Q*(U + W_m) = c
          Q#(U_i + W_{i-1}) = 0,   i = 1, ..., m          (ELSD)
          U_i ⪰ W_i W_i^T,         i = 1, ..., m
          U ⪰ 0
          W_0 = 0
Note that the constraint U_i ⪰ W_i W_i^T can alternatively be written as

    [ I     W_i^T ]
    [ W_i   U_i   ]  ⪰  0.
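This rewriting is the usual Schur-complement argument (the (1,1) block is the identity). A small numerical sanity check of the equivalence, on an arbitrary randomly generated pair (our illustration, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Build a pair (U, W) with U - W W^T positive definite by construction.
W = rng.standard_normal((n, n))
U = W @ W.T + np.eye(n)

# Schur-complement form: [[I, W^T], [W, U]] is PSD exactly when U - W W^T is.
block = np.block([[np.eye(n), W.T], [W, U]])

min_eig_schur = np.linalg.eigvalsh(U - W @ W.T).min()
min_eig_block = np.linalg.eigvalsh(block).min()

assert min_eig_schur > 0        # U - W W^T is positive definite here
assert min_eig_block > -1e-10   # hence the 2n x 2n block matrix is PSD
```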
The duality theorem for ELSD is given below, wherein (U, W) is said to be dual
feasible if these matrices, along with some U_i, W_i, i = 1, ..., m, where W_m = W,
satisfy the constraints of the dual program ELSD.
Theorem 9.2.1 (Duality Theorem) The following hold for the primal problem
SDP-I and the dual problem ELSD:
In [83], connections between the minimal cone based approach and ELSD were
discussed. Furthermore, the extended dual of the standard SDP in equality form,
i.e., SDP-E, was also given. In the recent work [78], the Lagrangian dual (or standard
dual) of ELSD has been considered. After some reformulation, the standard dual of
ELSD, in variables z ∈ R^m, R_i ∈ S^n and y^(i) ∈ R^{m+1}, i = 1, ..., m, takes
the form given below.
    sup   c^T z − Σ_{i=1}^m R_i • I
    s.t.  Q(z) ⪯ Q_0
          [ R_i           Q(y^(i+1)) ]
          [ Q(y^(i+1))    Q(y^(i))   ]  ⪰  0,   ∀ i = 1, ..., m−1          (P2)
          [ R_m           Q_0 − Q(z) ]
          [ Q_0 − Q(z)    Q(y^(m))   ]  ⪰  0
In any feasible solution of P2, the z part is also feasible for SDP-I, and every Ri is
positive semidefinite. Therefore, it follows that the optimal value of P2 is at most
that of SDP-I. In [78], it was shown that these are actually equal. Since the La-
grangian dual of P2 will be ELSD, it follows that the SDG (standard duality gap) of
P2 is zero. Thus, starting with an arbitrary SDP, one can obtain another (polynomial
size) SDP with the same optimal value and whose SDG is zero. For this reason,
we will call the problem P2, the corrected primal of the semidefinite program
SDP-I. The corrected primal of SDP-E can be developed in a similar way. Now, in
order to develop interior point methods (or other complexity bounded algorithms)
for the most general SDPs, one may assume without loss of generality that the SDP
at hand (which may be taken to be in either SDP-E or SDP-I form) has zero
standard duality gap. Note, however, that one still cannot assume that the Slater
condition is satisfied, raising the possibility of developing infeasible interior point methods
in this framework.
Finally, certain analytical aspects of SDP have been studied in [52] and [85].
By applying ellipsoid and interior point methods, one can deduce the following
complexity results for SDP. The maximum of the bitlengths of the entries of the Q_i
and of the components of c will be denoted by L.
• There are algorithms which, given any rational ε > 0 and an x_0 such that
Q_0 − Q(x_0) ≻ 0, compute a rational vector x such that Q_0 − Q(x) ≻ 0, and
c^T x is within an additive factor ε of the optimum value of the SDP. The arithmetic
complexity of these algorithms is polynomial in n, m, L, log(1/ε), log(R) and the
bitlength of x_0, where R is an integer such that the feasible region of the SDP
lies inside the ball of radius R around the origin ([3], [59]). However, it should be
mentioned that a polynomial bound has not been established for the bitlengths
of the intermediate numbers occurring in these algorithms.
• For any fixed m, there is a polynomial time algorithm (in n, L) that checks
whether there exists an x such that Q(x) ≻ 0, and if so, computes such a vector
([75]). For the nonstrict case Q(x) ⪰ 0, feasibility can be verified in
polynomial time for the fixed dimensional problem, as shown in [72].
At the outset, we emphasize that these methods deal with the computation
of approximate optimal solutions only and that no bitlength analysis has been carried
out by any of the authors.
The main feature that enables one to extend LP interior point methods to SDP is
the fact that the logarithm of the determinant function serves as a barrier function
for SDP. Its self-concordance was established and used by Nesterov and Nemirovskii
[59] in developing barrier methods for SDP. In [1] and [3], a potential reduction
algorithm was developed based on Ye's projective algorithm for LP [96]. Alizadeh
([1]) also pointed out the striking similarity between LP and SDP and suggested
a mechanical way of extending results from LP to SDP. In [40], Jarre developed a
barrier method. More potential reduction methods are given in [92]. In [35], a
convergent and easily implementable method was given (a MATLAB code is available at
the ftp site ftp://orion.uwaterloo.ca/pub/henry/software). A primal-dual method
was presented in [4]. In [60] and [61], Nesterov and Todd discuss primal-dual meth-
ods for self-scaled cone problems and develop what has come to be known as the
Nesterov-Todd (NT) direction. In a recent work [22], Freund discusses interior-point
algorithms for SDPs in which no regularity (Slater-like) conditions are assumed. A
self-dual skew-symmetric embedding method was presented in [46] for the initializa-
tion of interior point methods for SDP.
Recently, several papers have appeared on interior point methods for SDP; these
can be obtained from the interior point archive maintained at Argonne National
Laboratory (http://www.mcs.anl.gov/home/otc/InteriorPoint/index.html).
Some details on these results follow.
The primal-dual central path is defined as the set of solutions (U (J.l), x(J.l), S(J.l)) of
the system
If the candidate solution (U, x, S) does not satisfy the first two requirements of
SDP-Path, then we enter the domain of infeasible interior point methods. In this case,
the right hand sides of the first two Newton equations in SDP-Newt are not zero,
but instead equal the current primal and dual infeasibility, respectively. These
methods simultaneously reduce the infeasibility and μ.
The papers [59, 92] deal with potential reduction methods. Much work has recently
been done on primal-dual central path following algorithms. Detailed study of search
directions can be found in [48, 56]. The properties of central trajectories are studied
in detail in the papers [59, 27, 20, 86].
An infeasible interior point method for SDP was developed by Potra and Sheng
[71]. This method is based on the Lagrange dual. It would be an interesting result
to develop infeasible interior point methods based on the ELSD duality approach
(see below). Interior point methods for monotone semidefinite complementarity
problems have been developed by Shida and Shindoh [73]. They prove that the
central trajectory converges to the analytic center of the optimal set. Further, they
prove global convergence of an infeasible interior point algorithm for the monotone
semidefinite complementarity problem.
With the exception of [22], most of the methods mentioned above explicitly
assume that the primal and/or the dual has a strictly feasible solution. As
mentioned in §9.2.2, it seems that infeasible interior point methods can be developed
using the gapfree dual ELSD and the "corrected primal" problem P2. The suitability
of infeasible IPMs for this situation can be justified as follows. Some difficulties with
initialization can be circumvented using a corrected primal based infeasible IPM
approach. Unlike in the case of LP, "Phase 1" type initialization can run into some
difficulties for SDP. For instance, for the SDP-I problem, consider the "Phase 1"
problem: inf{zoIQ(x) ~ Qo + zoI,zo 2:: O}. It may happen here that the infimum
is zero without being attained. No satisfactory "Big-M" method has been devised
for SDP (based on the examples of "ill-behaved SDPs" in [76], it is our conjecture
that M will need to be exponentially large in bitlength here). Also, even if the
initialization step is somehow carried out, there are instances of SDPs, for which all
rational solutions are exponential in bitlength, and hence the whole process becomes
inherently exponential, contradicting the initial objective of devising a polynomial
time algorithm, even in an approximate sense.
Open Problem 9.1 Develop an infeasible IPM for general semidefinite programs
using ELSD and the corrected primal P2.
Open Problem 9.2 Perform a bitlength analysis of the interior point methods for
SDP.
We now turn our attention to affine scaling algorithms. The affine scaling linear
programming algorithms have gained tremendous popularity owing to their charming
simplicity. The global convergence properties of these methods (for LP) have been
uncovered relatively recently (see [89] in this volume). In particular, Tsuchiya and
Muramatsu ([90]) proved that when the step length taken is in (0,2/3], then both
primal and dual iterates converge to optimal solutions for the respective problems.
It is not hard to extend the LP affine scaling algorithm to semidefinite programming.
For instance, for the problem SDP-I, let x be a strictly feasible solution and let
P = Q_0 − Σ_{i=1}^m x_i Q_i ≻ 0. Then consider the inequality
It can be shown that every feasible solution to the above inequality is feasible for
SDP-I. One can easily maximize cT x over the above ellipsoid and repeat as in the
standard dual affine scaling method. It remains to be seen if the proofs of [90] can
be extended to the above approach.
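The displayed inequality did not survive reproduction here; under a standard reading, the inner ellipsoid is { x + d : d^T M d ≤ 1 } with M_ij = tr(P^{-1} Q_i P^{-1} Q_j). A minimal NumPy sketch of the resulting step, under that assumption (the instance and the step length α are ours):

```python
import numpy as np

def affine_scaling_step(Qs, Q0, c, x, alpha=0.5):
    # One dual affine-scaling style step for
    #   max c^T x  s.t.  sum_i x_i Q_i <= Q0   (Loewner order),
    # assuming the inner ellipsoid d^T M d <= 1 with
    #   M_ij = tr(P^{-1} Q_i P^{-1} Q_j),  P = Q0 - sum_i x_i Q_i.
    m = len(Qs)
    P = Q0 - sum(xi * Qi for xi, Qi in zip(x, Qs))
    Pinv = np.linalg.inv(P)
    M = np.array([[np.trace(Pinv @ Qs[i] @ Pinv @ Qs[j]) for j in range(m)]
                  for i in range(m)])
    d = np.linalg.solve(M, c)   # direction maximizing c^T d over the ellipsoid
    d /= np.sqrt(c @ d)         # normalize so that d^T M d = 1
    return x + alpha * d

# Tiny instance: max x1 s.t. x1 * I <= I; the optimal value is 1.
Q0 = np.eye(2)
Qs = [np.eye(2)]
c = np.array([1.0])
x = np.array([0.0])
for _ in range(60):
    x = affine_scaling_step(Qs, Q0, c, x)
```

With α ∈ (0, 1) the iterate stays strictly feasible on this instance and approaches the optimum from the interior.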
Open Problem 9.3 Prove the global convergence of the above affine scaling method.
The primal-dual affine scaling algorithms (both the Dikin-affine scaling of Jansen et
al. [39] and the classical primal-dual affine scaling algorithm of Monteiro et al. [57])
have been generalized by de Klerk et al. [45]. The iteration complexity results are
analogous to the LP case.
We will briefly mention non-interior point methods for SDP. In [63], Overton
discusses an active set type method. In [75] (see [81]), and later independently in
[85], a notion of the convexity of a matrix map was introduced. Using this, one can
define what may be called a "convex nonlinear SDP". In [81] a Newton-like method
was developed for convex nonlinear semidefinite inequality systems, and in [85],
certain sensitivity results have been derived. While attempts towards extending
the LP simplex method to SDP have been made ([49, 70]), we consider that this
problem remains unsolved. Also, since SDP can be treated as a nondifferentiable
convex optimization problem (NDO), most NDO algorithms can be applied to solve
semidefinite programs. See [67] for interior point methods for global optimization
problems, which solve some SDP relaxations in a disguised form.
Many such examples are discussed in [22] and [76]. Therefore, rigorously speaking,
the computation of an exact optimal solution of an arbitrary rational SDP is not a
well stated problem, since the output is not representable in the Turing machine
model. Let us consider the feasibility problem defined below.
(SDFP)  Determine whether the system

    Σ_{i=1}^m x_i Q_i ⪯ Q_0

is feasible.
Note that the required output of this problem is a "Yes" or a "No" (decision prob-
lem). Therefore, it is reasonable to ask whether there is a polynomial time algorithm
for the solution of SDFP. In our opinion, this is the most challenging and outstanding
problem in semidefinite programming, at least in the context of complexity theory.
Open Problem 9.5 Determine whether the problem SDFP is NP-Hard, or else
find a polynomial time algorithm for its solution.
In [76], the following results concerning the exact complexity of SDP are established.
In [72], the authors discuss complexity results for fixed dimensional SDPs (both
n-fixed and m-fixed cases), extending and strengthening certain results of [75].
9.4 APPLICATIONS
Applications of semidefinite programming can be broadly classified into three groups:
Also, as seen earlier, SDP generalizes linear and convex quadratic programming
problems and, more generally, convex quadratic programming with convex quadratic
constraints. Since the latter has not been extensively studied by itself, most of its
applications (which arise in certain facility location problems as studied in [18]) can
also be considered to be applications of SDP.
    min   U • Q_0 + 2b_0^T x + c_0
    s.t.  U • Q_i + 2b_i^T x + c_i = 0,   ∀ i = 1, ..., m        (MQP2)
          U − xx^T = 0

Now, let us relax the condition U − xx^T = 0 to U − xx^T ⪰ 0, to obtain

    min   U • Q_0 + 2b_0^T x + c_0
    s.t.  U • Q_i + 2b_i^T x + c_i = 0,   ∀ i = 1, ..., m        (RMQP)
          U − xx^T ⪰ 0.

The condition U − xx^T ⪰ 0 is equivalent to

    [ 1    x^T ]
    [ x    U   ]  ⪰  0.
Therefore, the relaxed MQP (RMQP) is a semidefinite program. This SDP
relaxation of MQP will be referred to as the convexification relaxation of MQP, the
reason being that, if f : R^n → R^m is the quadratic map composed of the
constraint functions of the MQP, then the feasibility of that problem can be restated
as 0 ∈ f(R^n). On the other hand, it can be shown (see [75]) that the semidefinite
program RMQP is feasible if and only if 0 is in the convex hull of the image, i.e.,
Conv(f(R^n)). This relaxation was originally introduced by Shor [74], although in a
somewhat different form. It is also investigated in [24].
We will return to the connections between MQP and SDP after discussing some
results on the application of SDP to combinatorial optimization.
the largest stable set in G. Let STAB( G) denote the convex hull of the characteristic
vectors of the stable sets of G. If u, v are the characteristic vectors of a clique and
a stable set in G, we have the inequality u^T v ≤ 1. This implies that the polyhedron
contains STAB(G). Now, note that the problem of finding a maximum stable set is
equivalent to each of the following problems:
Note that the second of these problems is a multiquadratic program, and hence we
apply the convexification relaxation to it. Accordingly, we define the spectrahedron
and therefore, as a relaxation to MSS, one can maximize eT x over TH(G), which is
an SDP in both variables x and U.
For general graphs, not much is known about the effectiveness of the above relax-
ation. However, for a class of graphs known as perfect graphs, the relaxation is
exact. We will circumvent the usual combinatorial definition of perfect graphs as,
for our purposes, it suffices to define these graphs as those for which STAB(G) =
QSTAB(G). Clearly, in this case, all three sets STAB(G), TH(G) and QSTAB(G)
coincide. Thus, one can approximately maximize e^T x over TH(G) by the use of a
polynomial approximation algorithm for SDP. For techniques that extract discrete
solutions from this approximation, the reader is referred to [30] and [1]. Further-
more, when G is perfect, the following additional problems can be solved using this
methodology:
• Find the smallest number of colors required to color the vertices of G such that
every pair of adjacent vertices receive different colors.
In [1], a sublinear time parallel algorithm was presented for solving the stable set
and other problems for perfect graphs. The reader is also referred to the expository
article [47] by Knuth on this approach.
This proposition may be useful both in addressing the complexity of perfect graph
recognition and in settling what might be considered the most celebrated and
yet unresolved conjecture in Graph Theory, which states that a graph G is perfect
if and only if neither G nor its complement contains an induced odd cycle of length
at least 5.
modeled as the quadratic integer program given below, where W is the matrix of
weights, and J is the matrix of all ones.
    max   W • J − y^T W y
    s.t.  y_i ∈ {−1, +1}                                         (MAXCUT)

Note that y_i ∈ {−1, +1} is equivalently written as y_i^2 = 1. In [26], the following
SDP relaxation of MAXCUT was considered:

    max   W • J − W • U
    s.t.  U ⪰ 0                                                  (GWR)
          U_ii = 1,   i = 1, ..., n.
It is not hard to see that this is nothing but the convexification relaxation of the
MQP form of MAX CUT. Let us call it the Goemans-Williamson Relaxation (GWR)
of the maximum cut problem. The remarkable results of [26] are the following.
1. The optimal objective value of GWR is at most 1.14 times that of MAXCUT.
2. From an optimal solution to GWR, a cut whose expected value is at least .878
times the optimal cut value can be obtained using randomization.
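The randomized rounding of item 2 can be sketched directly: factor a GWR solution as U_ij = v_i · v_j, draw a random vector r, and set y_i = sign(v_i · r). The sketch below (our illustration) uses the 5-cycle, the standard near-worst-case example, with the known embedding that places consecutive vertices at angle 4π/5 apart:

```python
import numpy as np

rng = np.random.default_rng(7)

# 5-cycle with unit weights; its maximum cut value is 4.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

def cut_value(y):
    # (W . J - y^T W y) / 4 counts each cut edge of the bipartition once.
    return (W.sum() - y @ W @ y) / 4.0

# Unit vectors realizing a GWR solution, U_ij = v_i . v_j, with consecutive
# vertices placed at angle 4*pi/5 apart on the unit circle.
V = np.array([[np.cos(4 * np.pi * i / 5), np.sin(4 * np.pi * i / 5)]
              for i in range(n)])

# Random-hyperplane rounding: y_i = sign(v_i . r).
best = 0.0
for _ in range(200):
    r = rng.standard_normal(2)
    y = np.sign(V @ r)
    best = max(best, cut_value(y))
```

Here the best rounded cut matches the maximum cut value of 4.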
The underlying geometric reason behind item 1 above is best described using the
following theorem formulated by Laurent [51]. First, let C_n denote the convex hull
of all matrices of the form vv^T, where v ∈ {−1, +1}^n. It is clear that the MAXCUT
problem amounts to maximizing a linear function over C_n. Let us return to the
convex set (called the elliptope) E_n defined in §9.2.1. The main geometrical result
concerning these sets is given below. For a matrix A and a univariate function f,
f∘(A) denotes the matrix whose (i,j)th entry is f(A_ij).
Furthermore, the following nonlinear semidefinite program has the same optimal
value as MAXCUT (see [26]):

    (1/π) max{ W • arccos∘(U) : U ∈ E_n }.
1. In [26], the authors extend their analysis for MAX CUT to derive strong ap-
proximation results for the following problems: MAX SAT, MAX 2SAT, MAX
DICUT (the first two problems are related to the Satisfiability Problem, and
the last is a directed version of the MAX CUT problem).
2. In [43], an approximate graph coloring algorithm was developed.
3. Extensions of the Goemans-Williamson approach to the max-k-cut problem are
given in [23].
4. Here are some results that are somewhat negative concerning the application
of SDP to combinatorial optimization. Recently, Kleinberg and Goemans [44]
have shown that certain SDP relaxations of the vertex cover problem have a
worst case performance guarantee of only 2 (in the limit) coinciding with what
the standard LP relaxation guarantees. A similar result for the independent set
problem has been established by Alon and Kahale [6].
A topic that was studied well before SDP became popular was that of PSD comple-
tions of partially specified matrices. An early and well-written paper is [28]. These
problems involve determinant maximization subject to semidefinite constraints and
a recent reference to these is [93].
We would like to mention that the well known Graph Isomorphism Problem might
perhaps be reducible to an SDP feasibility problem. Given two graphs G 1 , G 2 with
adjacency matrices A, B, respectively, the graphs are isomorphic if and only if there
exists a permutation matrix X such that A = X^T B X. This can be written as the
MQP

    A = X^T B X,   Xe = e,   X^T e = e,   X ∘ X = X,

where e is the vector of all ones, and ∘ denotes the entrywise (Hadamard) product of
two matrices. In [82], it was shown that one can relax the condition that X is a
permutation matrix to X being doubly stochastic. This gives the MQP:
where J is the matrix of all ones. Whether the convexification relaxation of either
of the above systems is exact appears to be an intriguing question.
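The permutation-matrix formulation is easy to check numerically on a small instance (our example; two labelings of the 4-cycle):

```python
import numpy as np

# Two labelings of the 4-cycle: G1 = (0-1-2-3-0), G2 = (0-2-1-3-0).
A = np.zeros((4, 4))
B = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[i, j] = A[j, i] = 1.0
for i, j in [(0, 2), (2, 1), (1, 3), (3, 0)]:
    B[i, j] = B[j, i] = 1.0

# The permutation 0->0, 1->2, 2->1, 3->3 maps G1 onto G2.
X = np.zeros((4, 4))
for i, j in [(0, 0), (1, 2), (2, 1), (3, 3)]:
    X[i, j] = 1.0

e = np.ones(4)
assert np.array_equal(A, X.T @ B @ X)                           # A = X^T B X
assert np.array_equal(X @ e, e) and np.array_equal(X.T @ e, e)  # row/col sums
assert np.array_equal(X * X, X)                                 # X o X = X
```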
The second formulation is very closely related to the N_+ operator defined in [54].
Let f(x) = (f_1(x), ..., f_m(x)), where f_i(x) = x^T Q_i x + b_i^T x + c_i,
∀ i = 1, ..., m, and consider the problem max{ c^T x : f(x) = 0 }. Then the
convexification relaxation is

    max{ c^T x : Q_i • U + b_i^T x + c_i = 0 ∀ i,  U ⪰ xx^T }.
Now, let G = (V, E) be a graph and let f_G be the quadratic map composed both
of |E| components given by x_i x_j, ∀ {i,j} ∈ E, and of |V| components given by
x_i^2 − x_i, ∀ i ∈ V. Then it is seen that TH(f_G) is nothing but the usual TH(G)
defined for graphs earlier, and Conv(Z) is precisely STAB(G); hence the perfectness of
the graph G is the same as the perfectness of the quadratic map f_G.
In [31], Güler discusses the existence of barrier functions for problems of this type.
We strongly believe that many results that are known for SDP, such as interior point
methods and duality theories (both standard and ELSD duals) can be extended to
hyperbolic programs.
Acknowledgements
The first author, Ramana, would like to thank Laci Lovász, Jim Renegar and Rob
Freund for several interesting discussions on SDP, and Don Hearn for support and
encouragement.
REFERENCES
[1] F. ALIZADEH, Combinatorial Optimization with Interior Point Methods and
Semi-Definite Matrices, Ph.D. Thesis, Computer Science Department, Univer-
sity of Minnesota, Minneapolis, Minnesota, 1991.
[6] N. ALON AND N. KAHALE, Approximating the Independence Number via the
θ-function, Manuscript, 1995.
[8] G.P. BARKER, The lattice of faces of a finite dimensional cone, Linear Algebra
and its Applications, Vol. 7 (1973), pp. 71-82.
[9] G.P. BARKER AND D. CARLSON, Cones of Diagonally dominant matrices, Pa-
cific J. of Math, Vol. 57 (1975), pp. 15-32.
[11] A. BEN-TAL AND M.P. BLEDSOE, A New Method for Optimal Truss Topology
Design, SIAM J. Optim., Vol. 3 (1993), pp. 322-358.
[17] V. CHVATAL, Linear Programming, W.H. Freeman and Co., New York, 1983.
[18] J. ELZINGA, D.W. HEARN AND W. RANDOLPH, Minimax Multifacility Lo-
cation with Euclidean Distances, Transportation Science, Vol. 10, (1976), pp.
321-336.
[19] L. FAYBUSOVICH, On a Matrix Generalization of Affine-scaling Vector Fields,
SIAM J. Matrix Anal. Appl., Vol. 16 (1995), pp. 886-897.
[33] J.-P. A. HAEBERLY AND M.L. OVERTON, A Hybrid Algorithm for Optimizing
Eigenvalues of Symmetric Definite Pencils, SIAM J. Matr. Anal. Appl., Vol. 15
(1994), pp. 1141-1156.
[34] B. HE, E. DE KLERK, C. Roos AND T. TERLAKY, Method of Approximate
Centers for Semi-definite Programming, Technical Report 96-27, Faculty of
Technical Mathematics and Computer Science, Delft University of Technology,
Delft, The Netherlands, 1996.
[35] C. HELMBERG, F. RENDL, R. VANDERBEI AND H. WOLKOWICZ, An Interior-
point Method for Semidefinite Programming, To appear in SIAM J. Optim.,
1996.
[36] R.B. HOLMES, Geometric Functional Analysis and its Applications, Springer-
Verlag, New York, 1975.
[37] R. HORN AND C.R. JOHNSON, Matrix Analysis, Cambridge University Press,
Cambridge, 1985.
[38] B. JANSEN, C. Roos, T. TERLAKY AND J .-PH. VIAL, Primal-dual Algorithms
for Linear Programming Based on the Logarithmic Barrier Method, Journal of
Optimization Theory and Applications, Vol. 83 (1994), pp. 1-26.
[39] B. JANSEN AND C. Roos AND T. TERLAKY, A Family of Polynomial Affine
Scaling Algorithms for Positive Semi-definite Linear Complementarity Prob-
lems, Technical Report 93-112, Faculty of Technical Mathematics and Com-
puter Science, Delft University of Technology, Delft, The Netherlands, 1993.
(To appear in SIAM Journal on Optimization).
[40] F. JARRE, An Interior Point Method for Minimizing the Maximum Eigenvalue
of a Linear Combination of Matrices, Report SOL 91-8, Dept. of OR, Stanford
University, Stanford, CA, 1991.
[41] J. JIANG, A Long Step Primal Dual Path Following Method for Semidefinite
Programming, Technical Report 96009, Department of Applied Mathematics,
Tsinghua University, Beijing 100084, China, 1996.
[42] D.S. JOHNSON, C.H. PAPADIMITRIOU AND M. YANNAKAKIS, How Easy is
Local Search?, Journal of Comp. Sys. Sci., Vol. 37 (1988), pp. 79-100.
[43] D. KARGER, R. MOTWANI AND M. SUDAN, Improved Graph Coloring by
Semidefinite Programming, In 34th Symposium on Foundations of Computer
Science, IEEE Computer Society Press, 1994.
[44] J. KLEINBERG AND M. GOEMANS, The Lovasz Theta Function and a Semidef-
inite Relaxation of Vertex Cover, Manuscript, 1996.
[51] M. LAURENT, The Real Positive Semidefinite Completion Problem for Series-
Parallel Graphs, Preprint, 1995.
[52] A.S. LEWIS, Eigenvalue Optimization, ACTA Numerica (1996), pp. 149-190.
[54] L. LOVÁSZ AND A. SCHRIJVER, Cones of Matrices and Set Functions and 0-1
Optimization, SIAM J. Opt., Vol. 1 (1991), pp. 166-190.
[55] L. LOVÁSZ, Combinatorial Optimization: Some Problems and Trends, DIMACS
Tech Report 92-53, 1992.
[58] R.D.C. MONTEIRO AND J .-S. PANG, On Two Interior Point Mappings for
Nonlinear Semidefinite Complementarity Problems, Working Paper, School of
Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta,
USA, 1996.
[59] Y. NESTEROV AND A. NEMIROVSKII, Interior Point Polynomial Methods for
Convex Programming: Theory and Applications, SIAM, Philadelphia, 1994.
[60] Y. NESTEROV AND M.J. TODD, Self-scaled Barriers and Interior-point Methods
in Convex Programming, TR 1091, School of OR and IE, Cornell University,
Ithaca, NY 1994.
[61] Y. NESTEROV AND M.J. TODD, Primal-dual Interior-point Methods for Self-
scaled Cones, TR 1125, School of OR and IE, Cornell University, Ithaca, NY
1995.
[62] M.L. OVERTON, On Minimizing the Maximum Eigenvalue of a Symmetric Ma-
trix, SIAM J. Matrix Anal. Appl., Vol. 9 (1988) pp. 256-268.
[63] M.L. OVERTON, Large-Scale Optimization of Eigenvalues, SIAM J. Optimiza-
tion, Vol. 2 (1992), pp. 88-120.
[64] M.L. OVERTON AND R.S. WOMERSLEY, Second Derivatives for Eigenvalue
Optimization SIAM J. Matrix Anal. Appl., Vol. 16 (1995), pp. 697-718.
[65] M.L. OVERTON AND R.S. WOMERSLEY, Optimality Conditions and Duality
Theory for Minimizing Sums of the Largest Eigenvalues of Symmetric Matrices,
Math. Programming, Vol. 62 (1993), pp. 321-357.
[66] P.M. PARDALOS, Continuous Approaches to Discrete Optimization Problems,
In Nonlinear Optimization and Applications, G. Di Pillo & F. Giannessi, Ed.,
Plenum Publishing (1996).
[67] P.M. PARDALOS AND M.G.C RESENDE, Interior Point Methods for Global
Optimization, Chapter 12 of this volume.
[68] G. PATAKI, On the Facial Structure of Cone-LP's and Semidefinite Programs,
Management Science Research Report MSRR-595, GSIA, Carnegie-Mellon Uni-
versity, 1994.
[69] G. PATAKI, On the Multiplicity of Optimal Eigenvalues, Technical Report,
GSIA, 1994.
[70] G. PATAKI, Cone-LP's and Semidefinite Programs: Geometry, Basic Solutions
and a Simplex-type Method, Management Science Research Report MSRR-604,
GSIA, Carnegie-Mellon University, 1994.
[74] N.Z. SHOR, Quadratic Optimization Problems, Soviet Journal of Computer and
Systems Sciences, Vol. 25 (1987), pp. 1-11.
[76] M. RAMANA, An Exact Duality Theory for Semidefinite Programming and its
Complexity Implications. DIMACS Technical Report, 95-02R, DIMACS, Rut-
gers University, 1995. To appear in Math Programming. Can be accessed at
http://www.ise.ufl.edu/~ramana.
[78] M.V. RAMANA AND R.M. FREUND, A Corrected Primal for Semidefinite Pro-
gramming, with Strong Duality, In Preparation, 1996.
[79] M.V. RAMANA AND A.J. GOLDMAN, Some Geometric Results in Semidefinite
Programming, Journal Glob. Opt., Vol. 7 (1995), pp. 33-50.
[80] M.V. RAMANA AND A.J. GOLDMAN, Quadratic Maps with Convex Images,
Submitted to Math. of OR, 1995. Can be accessed at
http://www.ise.ufl.edu/~ramana.
[81] M.V. RAMANA AND A.J. GOLDMAN, A Newton-like Method for Nonlinear
Semidefinite Inequalities, Submitted to SIAM J. Optim., 1996. Can be accessed at
http://www.ise.ufl.edu/~ramana.
ABSTRACT
The paper discusses two alternative ways of implementing logarithmic barrier methods for
nonlinear programming. The first method is a pure barrier method which uses a modified
penalty-barrier function. The second uses logarithmic barrier methods to derive a modi-
fied version of a sequential quadratic programming algorithm. Implementation issues are
discussed for both methods and directions of future research indicated.
10.1 INTRODUCTION
Logarithmic barrier methods were originally developed by Fiacco and McCormick [5]
as a means of attack on the nonlinear programming problem. While they noted the
applicability of the methods to the linear programming problem, it was the general
perceived opinion at that time that the methods would not be competitive with
the simplex method. In developing algorithms and software to actually attempt to
solve nonlinear problems, a number of serious difficulties with the logarithmic barrier
method were discovered, and these proved sufficiently intractable at the time that
the methods fell into disuse for some years. Interest in the methods was rekindled
with their remarkable success in solving large linear programming problems, which in
turn has led to new research into applying the methods to nonlinear programming
problems. Most of this work is quite new, and very incomplete. However, the
methods show sufficient promise when carefully applied to a variety of nonlinear
problems as to be definitely worthy of further study. To be able to work well in
practice, however, they must overcome the problems which originally led to their
being abandoned. Not all of these problems have been fully solved to date, and in
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 399-414.
© 1996 Kluwer Academic Publishers.
solving some of the old problems, some new problems have arisen. It is the purpose
of this work to document the authors' experiences with two different schemes for
applying the methods to nonlinear problems, and indicate which problems arising
from the original methods now seem to be adequately solved, and which areas remain
for further work. In order to do this, it is instructive first to examine the algorithm,
and the problems which arose from implementing it, of what has come to be known
as the classical log barrier method.
and the logarithmic barrier algorithm is to choose an initial feasible x^0 and an
initial μ^0 > 0 and let x^k solve

    min_x B(x, μ^k) = f(x) − μ^k Σ_{i=1}^m ln(c_i(x)).

Set μ^{k+1} = γμ^k, where γ < 1, and continue until μ^k is sufficiently small. Note
that in the definition of the algorithm, x^0 is not specifically used. The need for a
feasible x^0 arises from the fact that minimizing B(x, μ^k) must in general be done
iteratively, and the iterative sequence requires a feasible initial estimate in order
that B(x, μ^k) is defined. If x^0 is feasible, then so are all iterates found in
determining x^1. This is then used as the initial point in determining x^2, and so on.
Thus a major problem initially with logarithmic barrier methods was the need to
determine an initial feasible point, which can be as difficult as solving the actual
problem.
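For concreteness, the classical scheme can be sketched in a few lines on a one-dimensional toy problem (our example, not from the text; Newton's method is used for the inner minimization, with a crude step-halving safeguard to maintain feasibility):

```python
# Classical log-barrier loop on the toy problem
#   min f(x) = x  s.t.  c(x) = x - 1 >= 0,
# whose optimum is x* = 1.
c = lambda x: x - 1.0

def barrier_min(mu, x0, iters=50):
    # Newton's method on B(x, mu) = x - mu*ln(x - 1).
    x = x0
    for _ in range(iters):
        g = 1.0 - mu / (x - 1.0)        # B'(x, mu)
        h = mu / (x - 1.0) ** 2         # B''(x, mu) > 0 on the interior
        step = g / h
        while c(x - step) <= 0.0:       # keep the log argument positive
            step *= 0.5
        x -= step
    return x

x, mu, gamma = 2.0, 1.0, 0.1            # feasible x^0, initial mu^0, gamma < 1
for _ in range(8):
    x = barrier_min(mu, x)              # here x(mu) = 1 + mu analytically
    mu *= gamma
```

Each outer iteration reuses the previous minimizer as the feasible starting point, exactly as described above.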
A second major problem with the classical log barrier method arises from structural
ill-conditioning of the method as the optimum is approached. To see this, we note
that taking first derivatives of B(x, μ) yields

    ∇B(x, μ) = ∇f(x) − Σ_{i=1}^m (μ / c_i(x)) ∇c_i(x),        (10.3)

and, as μ → 0, the quantities μ/c_i(x(μ)) converge to λ̄_i,
where (x̄, λ̄) are the optimal primal variables and the associated Lagrange multipliers.
Differentiating (10.3) again yields

    ∇²B(x, μ) = ∇²f(x) − Σ_{i=1}^m (μ / c_i(x)) ∇²c_i(x)
                + Σ_{i=1}^m (μ / c_i(x)²) ∇c_i(x) ∇c_i(x)^T.

Thus for any constraint which is active at the optimum, i.e., c_i(x̄) = 0, a
corresponding eigenvalue of the Hessian matrix of B(x, μ) tends to ∞, and the
problem becomes extremely ill-conditioned as the optimum is approached. This
makes it very difficult for any unconstrained numerical algorithm, used to minimize
the barrier function for any choice of μ^k, to converge as μ^k approaches 0.
Another difficulty with the classical logarithmic barrier method is the need for a
very careful line search algorithm. This arises during the unconstrained search for
the minimizer of B(x, μ^k). Typically, the iterative method to find x^k is of the form

    x_{j+1} = x_j + α_j d_j,

where α_j is a scalar step length chosen to ensure that B(x_{j+1}, μ^k) − B(x_j, μ^k)
is sufficiently negative. The fact that B(x, μ^k) has poles at c_i(x) = 0, i = 1, ..., m,
makes the line search extremely difficult. In fact, it is often quite difficult just to
find a bound on α_j which assures feasibility. It might seem that a safeguard would
be to choose α_j small enough that the poles are not approached, but in practice
this slows down the unconstrained algorithm so drastically that the method never
converges.
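When the constraints are linear along the search direction, the largest feasible step is available in closed form, and a common safeguard (drawn from later interior point practice, not from the classical method itself) is to take a fixed fraction of it:

```python
def step_to_boundary(c_vals, dc_vals, tau=0.995):
    # c_vals[i]  = c_i(x) > 0, current constraint values;
    # dc_vals[i] = directional derivative of c_i along d (linear model).
    # Returns tau times the largest alpha keeping every c_i positive.
    alpha = float("inf")
    for cv, dc in zip(c_vals, dc_vals):
        if dc < 0.0:                     # this constraint decreases along d
            alpha = min(alpha, -cv / dc)
    return tau * alpha if alpha != float("inf") else 1.0

# x = 0.5 with constraints x >= 0 and 1 - x >= 0, moving along d = +1:
# the pole of the barrier at x = 1 is reached at alpha = 0.5.
alpha = step_to_boundary([0.5, 0.5], [1.0, -1.0])
```

The factor τ < 1 keeps the trial points away from the poles while still allowing long steps.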
Another problem with the classical logarithmic barrier method is the choice of the
initial μ^0 and the subsequent algorithm for reducing μ at each step. The method is
often very sensitive to the choices of μ, and good general algorithms for determining
μ have proved elusive.
Finally, the problem (10.1) has only inequality constraints. The general nonlinear
programming problem is

min f(x)
subject to c_i(x) ≥ 0, i = 1, ..., m,   (10.4)
           g_i(x) = 0, i = 1, ..., p.
Fiacco and McCormick [5] incorporated the equality constraints by adding a penalty
term to the barrier formulation. The transformed problem is

min F(x, μ) = f(x) − μ Σ_{i=1}^m ln(c_i(x)) + (1/μ) Σ_{i=1}^p (g_i(x))².

Here a penalty term is added to assure that the equality constraints are driven to
zero as μ → 0. In practice the penalty terms do not cause problems in the line
search, as the penalty function does not have the troublesome poles of the barrier function.
However, the penalty function can also be shown to be very ill-conditioned as μ → 0
by an analysis similar to that done for the barrier function.
There have been remedies proposed to counteract these problems as each has arisen.
For example, Murray and Wright [9] have devised a safeguarded line search algorithm
especially designed for logarithmic barrier functions. Carefully implemented, it is
effective, but can be very costly when the line minimum is close to a pole of the
barrier function. McCormick [8], Nash and Sofer [10], and Wright [13] have devised
partitioning algorithms with approximate inverses to deal with the ill-conditioning
of the Hessian. We will not attempt to give an entire spectrum of possible remedies
here. Rather, we will deal with two different ways of implementing interior point
methods which alleviate most of the problems discussed in this section, but at the
same time introduce new problems which remain for further study. The next two
sections deal with these two different methods.
B(x, μ, λ) = f(x) − μ Σ_{i=1}^m λ_i ln(s_i + c_i(x)/μ),   (10.5)
where the λ_i are estimates of the Lagrange multipliers and the s_i are constants used
in scaling the problem. These will be discussed later. An immediate consequence of
modifying the barrier function in this way is that for any positive s_i, the problem of
initial feasibility disappears. To see this, suppose c_i(x^0) < 0 for some i. If we choose
μ^0 so that μ^0 s_i + c_i(x^0) > 0, which is always possible since s_i > 0, then the argument
of the logarithmic function is positive and the barrier function is well defined. Thus
one criterion in choosing μ^0 is to assure that the barrier function is defined for an
infeasible initial point.
The gradient of the modified barrier function is

∇_x B = ∇f(x) − Σ_{i=1}^m (μλ_i/(μs_i + c_i(x))) ∇c_i(x).   (10.6)
We are now faced with the problem of adjusting λ^k and μ^k. The adjustment of λ^k
is suggested by (10.6), namely

λ_i^{k+1} = μ^k λ_i^k / (μ^k s_i + c_i(x^k)).   (10.7)

The adjustment of μ^k is as before, μ^{k+1} = γμ^k, 0 < γ < 1. Neither of these adjustments
is without problems. We will discuss each in turn.
The problem of adjusting μ^k stems from the previous discussion concerning using
μ^0 to assure initial feasibility in the extended feasible region c_i(x) ≥ −μ^0 s_i, i =
1, ..., m. If x^k is sufficiently close to the boundary of the extended feasible region
c_i(x) ≥ −μ^k s_i for some i, then μ^k cannot be reduced, as the barrier function will no
longer be defined at the initial estimate x^k for the (k+1)st subproblem. In practice,
we have found that this occurs quite frequently. In this case, only the Lagrange
multiplier estimates can be adjusted in the hope that subsequent points will move
away from the boundary of the extended feasible region, but often this does not
occur, and the method gets stuck.
The formula for adjusting λ^k is in general satisfactory, but can, for poor initial
estimates λ^0, lead to a divergent sequence of estimates to one or more of the λ_i's.
Again, this is a problem that we have encountered in practice.
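The outer loop of the modified barrier method is short when written out. The sketch below applies it to the hypothetical one-dimensional problem min x² subject to x − 1 ≥ 0 (solution x* = 1, λ* = 2), with s_1 = 1, a bisection inner solve standing in for a serious unconstrained minimizer, and the updates (10.7) and μ^{k+1} = γμ^k:

```python
def modified_barrier(lam=1.0, mu=1.0, gamma=0.5, iters=20):
    """min x^2 s.t. c(x) = x - 1 >= 0 via the modified barrier
    B(x, mu, lam) = x^2 - mu*lam*ln(1 + (x - 1)/mu)   (s_1 = 1)."""
    x = 2.0
    for _ in range(iters):
        # dB/dx = 2x - mu*lam/(mu + x - 1) is increasing on the domain
        # x > 1 - mu, so its root can be found by bisection.
        lo, hi = 1.0 - mu + 1e-12, 5.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if 2 * mid - mu * lam / (mu + mid - 1.0) < 0:
                lo = mid
            else:
                hi = mid
        x = 0.5 * (lo + hi)
        lam = mu * lam / (mu + x - 1.0)   # multiplier update (10.7)
        mu *= gamma                       # mu reduction
    return x, lam

print(modified_barrier())   # close to (1, 2)
```

Starting from λ^0 = 1 the iterates converge rapidly to (x*, λ*); as noted above, poor initial multiplier estimates are what can cause trouble in practice.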
Conn, Gould, and Toint [3] were able to devise an algorithm based on the modified
barrier method for which they were able to prove global convergence. The algorithm
involves solving a feasibility problem to assure that the point x^{k+1}, the initial
estimate for the (k+1)st subproblem, satisfies the constraints of the extended
feasible region with μ^{k+1} suitably reduced, and it reduces μ or updates λ at any
given iteration, but not both. As the feasibility problem is itself about as difficult to
solve as minimizing the modified barrier function for a fixed μ and λ, this algorithm
is certainly complex, and its efficiency has yet to be satisfactorily
demonstrated.
Another difficulty with the modified barrier method is that it still requires a very
careful line search algorithm. The modified barrier method changes the location
of the poles of the barrier function to those points where c_i(x) + μ^k s_i = 0, but it
certainly does not eliminate the poles, and computational practice has shown that
the poles of the modified barrier method cause exactly the same difficulty with line
searches as the poles of the classical barrier method.
A final difficulty with the modified barrier method is that functions such as c_i(x) =
h(√x), where h(·) is an arbitrary function, are only defined for x ≥ 0. Thus extending
the feasible region by using the modified barrier function allows some variable x to
take on values for which some of the constraints, or the objective function, are not
defined. The next section will discuss one way of dealing with these problems.
modifying the modified barrier function (10.5) to a modified penalty-barrier function

P(x, μ, λ, β) = f(x) − μ Σ_{i=1}^m λ_i Φ(c_i(x)),   (10.8)

where

Φ(c_i(x)) = ln(s_i + c_i(x)/μ),   c_i(x) ≥ −βμs_i,   (10.9)

and

Φ(c_i(x)) = q_i^a (c_i(x))² + q_i^b c_i(x) + q_i^c,   c_i(x) < −βμs_i,   (10.10)

with

q_i^a = −1 / (2(s_i μ(1 − β))²),
q_i^b = (1 − 2β) / (s_i μ(1 − β)²),
q_i^c = β(2 − 3β) / (2(1 − β)²) + ln(s_i(1 − β)).
Here β is a scalar satisfying 0 < β < 1. The idea behind the choice of Φ is to use
the modified barrier function when the constraint is well away from the pole of the
extended feasible region, but to replace the barrier function with a quadratic penalty
function when the constraint becomes too close to the boundary of the extended
feasible region. The parameter β is used to determine how close to the boundary of
the region we wish to allow the constraint to come before switching from the barrier
to the penalty function. The constants q_i^a, q_i^b, and q_i^c are chosen to assure that the
two functions and their derivatives coincide at the boundary.
Using P in place of B has two immediate effects. First, μ can always be reduced
as much as we wish at any iteration, for we no longer require c_i(x) + s_i μ ≥ 0, as
the penalty function is well defined whether or not this condition holds. Further,
the method seems quite robust with respect to the choice of the initial μ. Second,
all poles are removed from the penalty-barrier function, which makes line searches
far simpler. The penalty-barrier algorithm is tested extensively against the modified
barrier algorithm in [1], and the improvements in performance are dramatic. A new
parameter, β, has been incorporated, and tests to date have shown that for badly
nonlinear problems, β = .9 appears satisfactory, while for mildly nonlinear problems,
β = .5 is preferable. A dynamic optimal choice of β remains for further study.
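The matching conditions at the switch point can be verified numerically. Below is a Python sketch of the piecewise function Φ using the coefficients above (written assuming value, first, and second derivatives are matched at c_i(x) = −βμs_i):

```python
import math

def phi(t, mu, s, beta):
    """Penalty-barrier function Phi(c_i(x)) with t = c_i(x):
    log-barrier branch for t >= -beta*mu*s, quadratic extrapolation below."""
    tstar = -beta * mu * s
    if t >= tstar:
        return math.log(s + t / mu)
    qa = -1.0 / (2.0 * (s * mu * (1.0 - beta)) ** 2)
    qb = (1.0 - 2.0 * beta) / (s * mu * (1.0 - beta) ** 2)
    qc = beta * (2.0 - 3.0 * beta) / (2.0 * (1.0 - beta) ** 2) \
        + math.log(s * (1.0 - beta))
    return qa * t * t + qb * t + qc

# Value and slope agree across the switch point t* = -beta*mu*s:
mu, s, beta, eps = 0.5, 1.0, 0.5, 1e-8
tstar = -beta * mu * s
print(phi(tstar - eps, mu, s, beta), phi(tstar + eps, mu, s, beta))
```

The two branches agree in value and slope at the switch point, so the composite function is smooth enough for the line search, while having no pole.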
The use of P rather than B does nothing to solve the problem of constraints that
are undefined outside of a given region. In our experience, these regions can be
defined by simple bounds on the variables. To attempt to handle this problem, it
appears best to handle simple bounds on the variables separately. In [1], we chose
to use the classical log barrier function for simple bounds on the variables. This
appeared satisfactory in practice, and did not appear to hurt conditioning of the
problem noticeably. This has by no means, however, been demonstrated to be the
best way of handling simple bounds. Simple projection onto the bounds appears to
be very competitive, for example. This remains another topic for further study.
The problem of when to reduce μ and when to update λ is studied in [2]. Here it is
shown that reducing μ by a factor of 10 every iteration, while updating λ according to
(10.7) at every iteration, is far better in general than the strategy of updating either
μ or λ, but not both. The strategy of [3] is to keep μ fixed as long as sufficient
progress toward feasibility is being made. If on any iteration sufficient progress is
not made, then λ^{k+1} = λ^k, and on only these iterations is μ reduced. This strategy
has the advantage of being provably globally convergent, but as previously stated, is
less efficient in general. When the "update both at every iteration" strategy fails to
converge (which is very seldom) it does so by having one or more Lagrange multipliers
diverge. Thus further research in this area would appear to be indicated on how
to keep the multipliers bounded for the "update every iteration" strategy. Also,
the first order update formula for the Lagrange multipliers (10.7), while reasonably
satisfactory, can make it difficult to achieve accurate solutions on some problems, and
can certainly slow the rate of convergence. Thus it appears necessary to investigate
higher order updates for the Lagrange multipliers.
A remaining topic is the choice of the scaling factors s_i. Polyak's original modified
barrier had s_i = 1, i = 1, ..., m. Note that if the constraint c_i(x) ≥ 0 is scaled by the
factor s_i, i.e., c̄_i(x) = c_i(x)/s_i, the original modified barrier term becomes
−μλ_i ln(1 + c_i(x)/(s_i μ)), which differs from the corresponding term in (10.5) only by a
term that is constant in x. Thus the s_i simply play the role of constraint scaling factors.
As a final note on this section, as with the classical barrier method, if we wish
to incorporate equality constraints in the problem, we can transform (10.8) to a
penalty-barrier method by incorporating the equality constraints in an augmented
Lagrangian function. Here the function to be minimized at each iteration is

W(x, μ, λ, β) = P(x, μ, λ, β) − Σ_{i=1}^p λ_{i+m} g_i(x) + (1/(2μ)) Σ_{i=1}^p (g_i(x))².
The update formula for the Lagrange multipliers corresponding to the equality constraints is

λ_{i+m}^{k+1} = λ_{i+m}^k − g_i(x^k)/μ^k,   i = 1, ..., p.
This general method has been extensively tested in [2] and has been shown to be quite
satisfactory in practice. An entirely different method for handling equality and
inequality constraints will be described in the next section.
min f(x)
subject to c_i(x) − z_i = 0, i = 1, ..., m,   (10.11)
           g_i(x) = 0, i = 1, ..., p,
           z_i ≥ 0.
Here, as the nonlinear inequalities are replaced with equalities, there is no reason to
differentiate between nonlinear inequalities and equalities. The classical logarithmic
barrier transformation of (10.11) is
min f(x) − μ Σ_{i=1}^m ln(z_i)
subject to c_i(x) − z_i = 0, i = 1, ..., m,   (10.12)
           g_i(x) = 0, i = 1, ..., p,
and the Lagrangian for (10.12) is
L(x, z, λ) = f(x) − μ Σ_{i=1}^m ln(z_i) − Σ_{i=1}^m λ_i (c_i(x) − z_i) − Σ_{i=1}^p λ_{i+m} g_i(x).   (10.13)
For clarity, it will be useful to differentiate between the Lagrange multipliers
corresponding to the equality and slack inequality constraints. Thus we designate
y_i = λ_{i+m}, i = 1, ..., p. The first order conditions for (10.13) are

∇_x L = ∇f(x) − Σ_{i=1}^m λ_i ∇c_i(x) − Σ_{i=1}^p y_i ∇g_i(x) = 0,   (10.14)
μZ^{-1}e − λ = 0,   (10.15)
c_i(x) − z_i = 0, i = 1, ..., m,   (10.16)
g_i(x) = 0, i = 1, ..., p,   (10.17)

where Z = diag(z_i), e = (1, ..., 1)^T, and λ = (λ_1, ..., λ_m)^T. Following the primal-dual
formulation that has proved so successful for linear programming, we designate
Λ = diag(λ_i) and rewrite (10.15) as

ΛZe = μe.   (10.18)
The solution technique now becomes directly analogous to that in linear programming,
namely to use Newton's method to find (x, z, λ, y) which solve the modified
first order conditions. Denoting

F(x, z, λ, y) = ( ∇_x L(x, z, λ, y)
                  c(x) − z
                  g(x)
                  ΛZe ),   (10.19)

a KKT point is a point satisfying F(x, z, λ, y) = 0, where g(x) = (g_1(x), ...,
g_p(x))^T and c(x) = (c_1(x), ..., c_m(x))^T. In [4], El-Bakry et al. analyze an interior
point method for finding a KKT point. In order to describe their algorithm, we first
need further notation. Let
G(x, z, λ, y) = ( ∇_x L(x, z, λ, y)
                  c(x) − z
                  g(x) ),

v = (x, z, λ, y)^T,
and let Δv = (Δx, Δz, Δλ, Δy)^T solve the damped Newton system. Along v(α) = v + αΔv,
the step length α must keep nonnegative the functions

θ_1(α) = min_i z_i(α)λ_i(α) − γ τ_1 (z(α)^T λ(α)/n)   (10.20)

and

θ_2(α) = z(α)^T λ(α) − γ τ_2 ‖G(v(α))‖,   (10.21)

where

τ_1 = (min_i z_i^0 λ_i^0) / (z^{0T} λ^0 / n),   (10.22)

τ_2 = z^{0T} λ^0 / ‖G(v^0)‖,   (10.23)
and γ ∈ (0, 1) is a constant. These are the familiar functions from linear programming
that guarantee that infeasibility is reduced comparably to complementarity
and that centrality is maintained. In linear programming, these conditions plus
nonnegativity of z and λ are sufficient to prove global convergence or divergence
of either z or λ. For nonlinear programming, however, an additional condition is
needed, namely that the chosen step length α also produces a sufficient reduction in
a merit function Ψ(α).
3. Choose 1/2 ≤ γ_k ≤ γ_{k−1}, and substituting γ_k for γ in (10.20) and (10.21),
compute

α* = max {α ∈ (0, 1) : θ_i(α′) ≥ 0 for all α′ ≤ α}.
Computational experience with this algorithm is very limited. El-Bakry et al. report
some very preliminary results on a limited number of test problems of small dimension.
Lasdon et al. [7] report results, on a somewhat larger test set, of a trust region
variant of the algorithm. We have been involved with applying the algorithm to nonlinear
complementarity problems, which contain nonlinear programming problems as
a subset, and have found the algorithm quite promising, but still in a stage requiring
much more research. In particular, the algorithm contains many parameters, and
performance is very dependent upon parameter choice. Proper means of choosing
these will require extensive numerical testing. Further, in the modified penalty-barrier
method, the penalty-barrier function is minimized, at least approximately, for
each value of μ^k. Here μ is adjusted after each single Newton step. Other adjustment
strategies may prove more computationally viable. As the method becomes more
fully tested, other issues will undoubtedly arise. Nonetheless, we find the method
sufficiently promising to merit much more research and testing. The next section
contains a few comparative results of the two methods documented here using the
research code developed to solve nonlinear complementarity problems to test the
method documented in this section.
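As a concrete illustration of the primal-dual machinery, the sketch below applies a damped Newton iteration to the perturbed conditions ΛZe = μ̂e for the hypothetical one-variable problem min x² subject to x − 1 ≥ 0 (so v = (x, z, λ) and there are no equality constraints). The step rule is only a simple fraction-to-the-boundary safeguard keeping z and λ positive, not the full step-length and merit-function machinery of [4]:

```python
def solve3(M, r):
    """Gaussian elimination with partial pivoting for a 3x3 system M s = r."""
    M = [row[:] + [ri] for row, ri in zip(M, r)]
    for k in range(3):
        p = max(range(k, 3), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, 3):
            f = M[i][k] / M[k][k]
            for j in range(k, 4):
                M[i][j] -= f * M[k][j]
    s = [0.0] * 3
    for i in (2, 1, 0):
        s[i] = (M[i][3] - sum(M[i][j] * s[j] for j in range(i + 1, 3))) / M[i][i]
    return s

def primal_dual(x=2.0, z=1.0, lam=1.0, sigma=0.2, iters=40):
    """min x^2 s.t. x - 1 >= 0: grad L = 2x - lam, c(x) = x - 1, slack z."""
    for _ in range(iters):
        mu_hat = sigma * z * lam                 # centering target
        F = [2 * x - lam, x - 1 - z, lam * z - mu_hat]
        J = [[2.0, 0.0, -1.0],                   # row of grad_x L
             [1.0, -1.0, 0.0],                   # row of c(x) - z
             [0.0, lam, z]]                      # row of lam*z
        dx, dz, dlam = solve3(J, [-f for f in F])
        alpha = 1.0                              # fraction to the boundary
        for val, d in ((z, dz), (lam, dlam)):
            if d < 0:
                alpha = min(alpha, -0.995 * val / d)
        x, z, lam = x + alpha * dx, z + alpha * dz, lam + alpha * dlam
    return x, z, lam

print(primal_dual())   # approaches (1, 0, 2)
```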
The problems and starting points chosen for the comparison are six of the more
difficult problems of the Hock and Schittkowski [6] suite of nonlinear test problems.
σ_k = min{η_1, η_2 z^{kT}λ^k},

where η_1 = .08 and η_2 = 1. The remaining algorithmic parameters were set as follows:
γ = 10^{-6}, ρ = 10^{-4}, β = 0.5.
The convergence tolerance was ε = 10^{-12}. For the penalty-barrier algorithm the
accuracy was 10^{-8}. The reader is referred to [2] for a detailed description of this
algorithm. The comparative results of the two algorithms are contained in Table
10.1.
Table 10.1

            Primal-dual    Penalty-barrier
Problem     Iterations     Major    Minor
23          15             3        26
80          14             3        27
86          11             4        39
100         12             2        44
106         26             3        87
117         33             5        131
In both codes, Newton's method was used as the basic iterative procedure. In the
results for the penalty-barrier code, the number of major iterations is the number of
times J-I was decreased and the Lagrange multipliers adjusted. The number of minor
iterations is the total number of Newton steps. For the primal-dual method, the
number of iterations is the total number of Newton steps.
The results clearly indicate that the primal-dual approach is more efficient on these
problems. It should be noted here that while the merit function used for the method
only guarantees convergence to a stationary point, not a local minimizer, in all cases
the documented minimizer was obtained. In view of the results, it is instructive to
consider the relative merits of the two approaches.
First, the primal-dual approach requires second partial derivatives, as the whole
method is predicated on solving the damped first order conditions using Newton's
method. While the modified penalty-barrier method tested here uses second partial
derivatives it has been used successfully with truncated Newton methods and limited
memory variable metric methods, both of which only require first order information.
Thus for problems where second derivative information is difficult to obtain, the
penalty-barrier method appears far preferable.
Also, when a problem has few variables but many inequality constraints, even if
Newton's method is used with the penalty-barrier method, the matrix to be factored
is of the order of the number of variables, while the primal-dual method factors a
matrix of the order of the number of variables plus the number of constraints. Here
again, the penalty-barrier method seems preferable.
An advantage of the primal-dual approach is that the Lagrange multipliers are cal-
culated directly by Newton's method rather than using first order estimates. This
should improve both accuracy and the rate of convergence. In fact, the method
is a variant of the sequential quadratic programming algorithm, which for equality
constrained problems usually converges in very few iterations. Thus for problems
with a reasonable number of constraints relative to the number of variables, and
available second order information, the algorithm should prove quite competitive.
Preliminary testing also indicates that the Hessian matrices remain better condi-
tioned, which should be a major advantage on ill-conditioned problems. This is the
case in all problems tested here.
Acknowledgements
This research was sponsored by the Air Force Office of Scientific Research, Air Force
System Command under Grant F49620-95-0110.
REFERENCES
[1] M. G. BREITFELD AND D. F. SHANNO, Computational experience with penalty-
barrier methods for nonlinear programming, RUTCOR Research Report RRR
17-93 (revised March 1994), Rutgers University, New Brunswick, New Jersey,
1995. To appear in Annals of Operations Research.
[2] - - , A globally convergent penalty-barrier algorithm for nonlinear program-
ming and its computational performance, RUTCOR Research Report RRR 12-
94 (revised September 1995), Rutgers University, New Brunswick, New Jersey,
1995.
[3] A. R. CONN, N. I. M. GOULD, AND P. TOINT, A globally convergent La-
grangian barrier algorithm for optimization with general inequality constraints
and simple bounds, Technical Report 92/07, Department of Mathematics, Facultés
Universitaires de Namur, Namur, Belgium, 1992.
[4] A. S. EL-BAKRY, R. A. TAPIA, T. TSUCHIYA, AND Y. ZHANG, On the formu-
lation and theory of the primal-dual Newton interior-point method for nonlinear
programming, Technical Report TR92-40, Department of Computational and
Applied Mathematics, Rice University, 1992.
[5] A. V. FIACCO AND G. P. MCCORMICK, Nonlinear Programming: Sequential
Unconstrained Minimization Techniques, John Wiley & Sons, New York, 1968.
Reprint: Volume 4 of SIAM Classics in Applied Mathematics, SIAM Publica-
tions, Philadelphia, Pennsylvania, 1990.
[9] W. MURRAY AND M. H. WRIGHT, Efficient linear search algorithms for the
logarithmic barrier function, Report SOL 76-18, Department of Operations Re-
search, Stanford University, Stanford, CA, 1976.
[10] S. G. NASH AND A. SOFER, A barrier method for large-scale constrained opti-
mization, ORSA Journal on Computing, 5 (1993), pp. 40-53.
ABSTRACT
Research on using interior point algorithms to solve combinatorial optimization and inte-
ger programming problems is surveyed. This paper discusses branch and cut methods for
integer programming problems, a potential reduction method based on transforming an
integer programming problem to an equivalent nonconvex quadratic programming prob-
lem, interior point methods for solving network flow problems, and methods for solving
multicommodity flow problems, including an interior point column generation algorithm.
11.1 INTRODUCTION
Research on using interior point algorithms to solve combinatorial optimization and
integer programming problems is surveyed. Typically, the problems we consider can
be formulated as linear programming problems with the restriction that some of the
variables must take integer values. The methods we consider have been used to solve
problems such as the linear ordering problem, clustering problems, facility location
problems, network flow problems, nonlinear multicommodity network flow problems,
and satisfiability problems. This paper discusses four main methodologies, three of
which are similar to known approaches using the simplex algorithm, while the fourth
method has a different flavor.
Branch and cut methods are considered in section 11.2. Simplex-based branch
and cut methods have been very successful in the last few years, being used to
solve both specific problems such as the traveling salesman problem and also generic
integer programming problems. The research described in this paper constructs a
branch and cut algorithm of the usual type, but then uses an interior point method
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 417-466.
© 1996 Kluwer Academic Publishers.
to solve the linear programming relaxations. The principal difficulty with using an
interior point algorithm in a branch and cut method to solve integer programming
problems is in warm starting the algorithm efficiently, that is, in using the solution
to one relaxation to give a good initial solution to the next relaxation. Methods
for overcoming this difficulty are described and other features of the algorithms
are given. This paper focuses on the techniques necessary to obtain an efficient
computational implementation; there is also a discussion of theoretical issues in
section 11.6.1. Column generation algorithms have a structural similarity to cutting
plane methods, and we describe a column generation algorithm for solving nonlinear
multicommodity network flow problems in section 11.5.1.
In section 11.3, we discuss a method for solving integer programming problems that
is based upon reformulating the integer programming problem as an equivalent
nonconvex quadratic programming problem. The quadratic program is then solved using
a potential reduction method. The potential function has some nice properties
which can be exploited in an efficient algorithm. Care is needed so that the algo-
rithm does not get trapped in a local minimum. We also discuss a related algorithm
for solving quadratic integer programming problems, which can be applied to the
graph partitioning problem, for example.
Many network flow problems can be solved by ignoring the integrality require-
ment on the variables and solving the linear programming relaxation of the problem,
because it is guaranteed that one of the optimal solutions to the linear program will
solve the integer programming problem. Typically for these problems, the simplex
method can be considerably enhanced by exploiting the structure of the constraint
matrix; there are also often very good methods which are not based on linear pro-
gramming. Thus, the challenge is to design an efficient implementation of an interior
point method which can compete with the algorithms which are already available.
We describe the research in this area in section 11.4.
Theoretical issues are discussed in section 11.6. This includes a discussion of the
computational complexity of interior point cutting plane methods and also improved
complexity results for various combinatorial optimization problems that have been
obtained through the use of interior point methods.
which has extreme points (1,2), (2,1), (3,1), (4,2), and (3,3). For a given linear
objective function c^T x := c_1 x_1 + c_2 x_2, the optimal solution to the integer program
min{c^T x : x ∈ S} will be one of these extreme points. Thus, with the given
description of P, we could solve the integer program by solving the linear program
min{c^T x : x ∈ P}. Of course, in general it is hard to find the polyhedral
description P.
A branch and bound approach to this problem would examine the solution (0, 1.8)
to the LP relaxation and then split the problem into two new problems, one where
x_2 ≥ 2 and one where x_2 ≤ 1. These new linear programs are then solved and
the process is repeated. If the solution to any of the linear programming problems
that arise in this process is integer then that point solves the corresponding part of
the integer programming problem; if any of the linear programs is infeasible, then
the corresponding part of the integer program is also infeasible. The value of the
linear program provides a lower bound on the value of the corresponding part of the
integer program, and this bound can be used to prune the search space and guide
the search.
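The solve-relax-branch-prune loop just described can be sketched compactly on a problem whose LP relaxation has a closed-form solution: a 0-1 knapsack, stated as a maximization so that the relaxation supplies an upper bound for pruning (the data below are illustrative):

```python
def branch_and_bound(items, cap):
    """Maximize total value with total weight <= cap, x_i in {0, 1}.
    The LP relaxation (fractional knapsack) is solved greedily for bounds."""
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    best = 0.0

    def lp_bound(i, cap):
        # Value of the LP relaxation over the remaining items i, i+1, ...
        val = 0.0
        for v, w in items[i:]:
            if w <= cap:
                cap -= w
                val += v
            else:
                val += v * cap / w     # fractional piece: relaxation only
                break
        return val

    def branch(i, cap, val):
        nonlocal best
        best = max(best, val)          # every node is integer feasible here
        if i == len(items) or val + lp_bound(i, cap) <= best:
            return                     # prune: bound no better than incumbent
        v, w = items[i]
        if w <= cap:
            branch(i + 1, cap - w, val + v)   # branch: x_i = 1
        branch(i + 1, cap, val)               # branch: x_i = 0

    branch(0, cap, 0.0)
    return best

print(branch_and_bound([(60, 10), (100, 20), (120, 30)], 50))  # 220.0
```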
Cutting plane methods and branch and bound methods can be combined into a
branch and cut method, but we will discuss them separately, in order to emphasize
their individual features. For a good discussion of simplex-based branch and cut
methods, see, for example, the books by Nemhauser and Wolsey [61] and Parker
and Rardin [64]. The book [61] is a detailed reference on integer programming and
it discusses cutting plane algorithms comprehensively; for a summary of this book,
see [62]. The book [64] also discusses cutting plane algorithms, and it discusses
branch and bound in more detail than [61]. Jünger et al. [35] discuss computational
work using branch and cut algorithms to solve a variety of integer programming
problems.
As mentioned above, cutting plane and branch and bound methods work by setting
up a linear programming relaxation of the integer programming problem, solving
that relaxation, and then, if necessary, refining the relaxation so that the solution
to the relaxation gets closer to the solution to the integer programming problem.
These methods have been known for many years (Land and Doig [46], Gomory [26]),
and they have achieved very good results in the last few years. Of course, most of
these results have been achieved by using the simplex algorithm to solve the linear
programming relaxations; the focus in this section is on using an interior point
method to solve the relaxations. Unfortunately, it is not usually sufficient to simply
replace the simplex algorithm with an interior point method, because an interior
point method is not as good as the simplex algorithm at exploiting the solution to
one relaxation when trying to solve the next relaxation. This relatively poor use of
the warm start provided by the previous relaxation makes it necessary to only solve
the relaxations approximately; the algorithms seem fairly adept at exploiting this
approximate solution. Other refinements to a traditional branch and cut approach
are also necessary when using an interior point method, but the principal difference
is in the use of approximate solutions to the relaxations.
We discuss cutting plane algorithms in section 11.2.1 and branch and bound al-
gorithms in section 11.2.2. Adding a constraint to a primal linear programming
problem is structurally equivalent to adding a column to the dual problem, so re-
search on column generation algorithms has a strong impact on research on cutting
plane algorithms, and vice versa. In section 11.5.1, we discuss a column generation
algorithm for a multicommodity network flow problem. The theoretical performance
of cutting plane and column generation algorithms is discussed in section 11.6.1.
(IP) is

min c^T x
subject to Ax ≤ b   (LP̄)
           0 ≤ x ≤ e

where e denotes a vector of ones of the appropriate dimension. (We will use e in
this way throughout this paper.) If the optimal solution to (LP̄) is integral then
it solves the original problem (IP), because it is feasible in (IP) and it is at least
as good as any other feasible point in (IP). If the optimal solution x^{LP} to (LP̄) is
not integral, then we improve the relaxation by adding an extra constraint or cutting
plane of the form a_0^T x ≤ b_0. This cutting plane is a valid inequality for (IP) but it
is violated by the optimal solution x^{LP}. We then solve the modified LP relaxation,
and repeat the process.
The recent success of simplex based cutting plane algorithms has been achieved
through the use of polyhedral theory and specialized cutting planes; the cutting
planes are generally chosen from families of facets of the convex hull of feasible
integer points. Traditionally, Gomory cutting planes were derived from the optimal
simplex tableau; Mitchell [55] has shown how these same cutting planes can be
derived when using an interior point cutting plane algorithm.
min c^T x
subject to Ax = b   (LP)
           0 ≤ x ≤ u

max b^T y − u^T w
subject to A^T y − w + z = c   (LD)
           w, z ≥ 0

min c^T x
subject to Ax = b
           a_0^T x + x_0 = b_0   (LPnew)
           0 ≤ x ≤ u
           0 ≤ x_0 ≤ u_0
3. Add cutting planes: See if the current iterate violates any constraints. If
not, tighten the desired degree of accuracy and return to Step 2; otherwise, add
a subset of the violated constraints and go to Step 4.
for some appropriate upper bound u_0 on the new slack variable x_0. The corresponding
new dual problem is
Note that if we know feasible solutions x̄ > 0 and ȳ, w̄ > 0, z̄ > 0 to (LP) and
(LD) respectively, then, after the addition of the cutting plane, we can obtain a new
feasible solution to (LDnew) by taking y = ȳ, w = w̄, z = z̄, y_0 = 0 and w_0 = z_0. If
we pick w_0 = z_0 to be strictly positive then all the nonnegativity constraints will be
satisfied strictly. It is not so simple to obtain a feasible solution to (LPnew) because
we have a_0^T x̄ > b_0 if the new constraint was a cutting plane. Nonetheless, if the old
solution was close to optimal for (LP) and (LD) then we can hope that it should
also be close to the solution to (LPnew) and (LDnew), so it provides a warm start
for solution of the new problem.
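This warm-start construction is easy to check numerically. In the toy data below (invented for illustration), a dual feasible point for (LD) is extended after a cut is added by setting y_0 = 0 and w_0 = z_0 = 0.1 > 0, and both blocks of the new dual constraints are verified to hold:

```python
# Original data: one equality constraint, two variables.
A = [[1.0, 1.0]]            # 1 x 2
c = [3.0, 2.0]
y, w, z = [1.0], [0.5, 0.5], [2.5, 1.5]   # satisfies A^T y - w + z = c

# Check original dual feasibility.
for j in range(2):
    assert abs(A[0][j] * y[0] - w[j] + z[j] - c[j]) < 1e-12

# Add a cutting plane a0^T x + x0 = b0 with new dual variable y0.
a0 = [1.0, 0.0]
y0, w0, z0 = 0.0, 0.1, 0.1    # theta = 0.1 > 0, chosen freely

# New dual rows: A^T y + a0 y0 - w + z = c  (columns of x) ...
for j in range(2):
    assert abs(A[0][j] * y[0] + a0[j] * y0 - w[j] + z[j] - c[j]) < 1e-12
# ... and y0 - w0 + z0 = 0 (column of the slack x0), with w0, z0 > 0.
assert abs(y0 - w0 + z0) < 1e-12 and w0 > 0 and z0 > 0
print("extended dual point remains feasible")
```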
In this section, we discuss how an interior point method can be used in this setting.
A simple, conceptual interior point cutting plane algorithm could be written as
in figure 11.2. We will give a more formal algorithm later. Currently, the best
algorithm for linear programming appears to be the primal-dual predictor-corrector
barrier method (see Lustig et al. [49,50] and Mehrotra [52]), so we consider modifying
this algorithm for use in a cutting plane algorithm. Other interior point algorithms
which maintain strictly positive primal and dual iterates can be modified in a similar
manner. We will also briefly discuss using a dual algorithm.
With a primal-dual algorithm, we always have interior primal and dual iterates, that
is, 0 < x < u, w > 0 and z > 0. We also have a barrier parameter μ and we refer to
an iterate as centered if we have
The perfect matching problem: Given a graph G = (V, E) with vertices
V and edges E, a matching is a subset M of the edges such that no two
edges in M share an end vertex. A perfect matching is a matching which
contains exactly |V|/2 edges, where |V| denotes the cardinality of V.
Given a set of weights w_e associated with the edges e in E, the perfect
matching problem is to find the perfect matching M with smallest weight
w(M) := Σ_{e∈M} w_e.
Edmonds [15, 16] showed that the perfect matching problem can be solved in poly-
nomial time. He also gave a complete polyhedral description of the perfect matching
problem. He showed that the optimal solution to a perfect matching problem is one
of the solutions to the linear programming problem
where δ(v) denotes the set of edges in E which are incident to vertex v and E(U)
denotes the set of edges in E which have both end vertices in U, where U is a
subset of V. Equations (11.2) are the degree constraints and equations (11.3) are
the odd set constraints. The number of odd set constraints is exponential in the
number of vertices, so it is impracticable to solve the linear programming problem
as expressed. Thus, in a cutting plane method, the initial relaxation consists of the
degree constraints together with the nonnegativity constraints (11.4), and the odd
set constraints are added as cutting planes. Consider, for example, the graph given
in figure 11.3. Here, the edge weights are the Euclidean lengths of the edges. The
solution is x_e = 0.5 if e is one of the edges (v_1,v_2), (v_2,v_3), (v_1,v_3), (v_4,v_5),
(v_4,v_6), (v_5,v_6), and x_e = 0 otherwise.
This solution violates the odd set constraint with U = {v1, v2, v3}:

x_{v1v2} + x_{v2v3} + x_{v1v3} = 3/2 > 1 = (|U| − 1)/2.

If this constraint is added to the relaxation, the optimal solution to the linear program is the optimal matching given above.
Another problem that can be solved by a cutting plane algorithm is the linear ordering problem - see, for example, Grötschel, Jünger and Reinelt [28] or Mitchell
and Borchers [57].
The linear ordering problem is NP-Hard [44]. This problem can be expressed as an
integer linear programming problem:
min Σ_{i,j} c_ij x_ij
subject to x_ij + x_ji = 1 for 1 ≤ i < j ≤ |V| (11.5)
x_ij + x_jk + x_ki ≤ 2 for 1 ≤ i < j < k ≤ |V| (11.6)
x_ij = 0 or 1 for 1 ≤ i < j ≤ |V| (11.7)
Grötschel et al. [28] have found several classes of valid inequalities for the linear
ordering problem which can be used as cutting planes. However, for many real world
problems, the solution to the linear programming relaxation given above solves the
linear ordering problem. The equations (11.6) are known as the 3-dicycle constraints;
notice that there are (|V| choose 3) of them. In both [28] and [57], the initial relaxation
consists just of the equations (11.5) together with the simple bounds 0 ≤ x_ij ≤ 1 for
each edge; the 3-dicycle constraints are added as cutting planes as needed. In these
implementations, the solutions to the relaxations can be integral but not feasible in
the linear ordering problem; in this case cutting planes are used to cut off infeasible
integral points. Thus, this approach to solving the linear ordering problem does not
quite fit in the framework we discussed earlier in relation to the problems (IP)
and (LP), but that framework can be extended in an obvious manner to include
this approach to the linear ordering problem. The traveling salesman problem is
also usually formulated so that solutions to the LP relaxations can be integral but
infeasible; the subtour elimination constraints are used to cut off these infeasible
integral points. (For a good discussion of the traveling salesman problem see the
book edited by Lawler et al. [47]; for a recent description of an implementation, see
Applegate et al. [5].)
Early termination
The optimal solution to (LP) is not an interior point. Therefore, if we solve (LP) to
optimality then it is necessary to perturb the solution slightly to obtain an interior
point before we can even start solving (LPnew) using an interior point method.
Typically, if an interior point method is started from close to the boundary, it will
move towards the center of the feasible region before starting to move towards the
optimal solution. Thus, the optimal solution to (LP) is not a very good starting point
for trying to solve (LPnew). A very successful method to try to avoid this difficulty
is to terminate solution of (LP) early. We will then have an interior point when we
start solving (LPnew). In addition, we will not spend as many iterations returning
towards the center of the polyhedron and we will start moving towards the optimal
solution to (LPnew) more quickly. Thus, we will spend fewer iterations solving
(LP) because we only solve it approximately, and we will also spend fewer iterations
solving (LPnew) because we start off with an iterate which is more centered.
We can terminate solution of (LP) early if we can find cutting planes which are
violated by the current solution. In fact, if we can find cutting planes which are
violated by this current iterate, they may well be deeper cuts and cut off more of the
feasible region, because the iterate is closer than the optimal solution to the center
of the polyhedron. We may also be able to find more good cutting planes at this
early iterate.
Consider, for example, the perfect matching problem on the graph in figure 11.4,
where edge weights are the Euclidean lengths. The optimal matching in this graph
uses edges (v1, v10), (v2, v3), (v4, v5), (v6, v7), and (v8, v9). The optimal solution to
the LP relaxation consisting of just the degree constraints and nonnegativity is fractional.
The separation routine for detecting violated odd set constraints involves finding connected components in the graph that only has edges where x_e > T for some threshold T ≥ 0. Thus, it would find the violated constraints for the odd sets {v1, v4, v5}
and {v6, v7, v10}. If we search at an early iterate, we may well have x_e > 0 on edges
(v2, v5) and (v3, v4), and in addition the values x_e on these edges are discernibly larger
than those on the edges (v4, v6), (v5, v7) and (v1, v10). Thus, for appropriately chosen
values of T, we would find the violated odd set constraints given above and also the
constraints corresponding to the odd sets {v1, v2, v3, v4, v5} and {v6, v7, v8, v9, v10}.
Without these constraints, the solution to the relaxation is fractional; thus, these
constraints are necessary, and the ability of the interior point method to find these
constraints at an earlier stage means that one fewer LP relaxation has to be solved.
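The separation idea just described, thresholding the fractional solution and taking connected components, can be sketched as follows. Function and variable names are illustrative, and a real separation routine would also verify each candidate constraint exactly.

```python
def odd_set_candidates(n_vertices, x, threshold):
    """Connected components of the support graph {e : x_e > threshold}.

    x: dict mapping (u, v) tuples to fractional edge values.
    Returns the components with an odd number of vertices (>= 3),
    which are candidate odd sets U for violated constraints.
    """
    # union-find over the vertices
    parent = list(range(n_vertices))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for (u, v), val in x.items():
        if val > threshold:
            parent[find(u)] = find(v)

    comps = {}
    for v in range(n_vertices):
        comps.setdefault(find(v), []).append(v)
    return [c for c in comps.values() if len(c) >= 3 and len(c) % 2 == 1]

# Fractional point like the six-vertex example: two half-valued triangles
# (vertices numbered from 0 here).
x = {(0, 1): 0.5, (1, 2): 0.5, (0, 2): 0.5,
     (3, 4): 0.5, (3, 5): 0.5, (4, 5): 0.5}
print(odd_set_candidates(6, x, 0.25))  # finds both triangles
```

Raising or lowering `threshold` plays the role of T in the discussion above.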
There are two disadvantages to looking for cuts before solving the current relaxation
to optimality. Firstly, we may be unable to find any cuts, so the search is a waste
of time. Secondly, the search may return cuts which are violated by the current
iterate, but which are not violated by the optimal solution, so we may end up
solving additional relaxations. The second disadvantage can be mitigated by moving
towards the optimal solution from the center of the polyhedron, making it unlikely
that we will violate unnecessary cutting planes. One method for reducing the impact
of the first disadvantage is to use a dynamically altered tolerance for deciding when
to search for violated cutting planes. We only search when the duality gap drops
below this tolerance. If we find a large number of violated constraints, we increase
the tolerance, because we probably did not need to solve the relaxation to such a
high degree of accuracy. If we only find a small number of violated constraints, we
decrease the tolerance - we should solve the relaxations more accurately to obtain
a better set of cutting planes as the relaxation becomes a better approximation to
the convex hull of feasible integer points. As the number of violated cutting planes
drops, it should also take fewer iterations to solve the next relaxation because the
two relaxations should be close to each other.
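The dynamic tolerance rule described in this paragraph might be sketched as follows; the particular factors and thresholds are illustrative assumptions, not values from the text.

```python
def update_search_tolerance(tolerance, n_violated,
                            many=20, few=3, grow=2.0, shrink=0.5):
    """Adjust the duality-gap tolerance that triggers a cut search.

    Many violated cuts found -> the relaxation was solved more
    accurately than necessary, so loosen the tolerance.  Only a few
    cuts found -> solve more accurately next time, so tighten it.
    """
    if n_violated >= many:
        return tolerance * grow
    if 0 < n_violated <= few:
        return tolerance * shrink
    return tolerance

tol = 1e-2
tol = update_search_tolerance(tol, 50)   # many cuts: loosen
tol = update_search_tolerance(tol, 2)    # few cuts: tighten
print(tol)
```

Doubling and halving keep the tolerance adjustments cheap and reversible.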
Early termination is the most important technique for improving an interior point
cutting plane algorithm. By using a dynamically altered tolerance for determining
when to search for cutting planes, the time spent on unnecessary searches can be
dramatically reduced.
One possible reason for the difficulty with restarting an interior point cutting plane
method with an infeasible point can be developed from the work of, for example, Anstreicher [3], Mizuno et al. [60], and Zhang [82], who have all discussed
interior point algorithms for linear programming which move towards feasibility
and complementary slackness simultaneously. A common feature of the analysis
of these algorithms is the exploitation of the fact that they move towards feasibil-
ity at least as fast as they move towards complementary slackness. When restarting directly from the approximate solution x̄ to the previous relaxation (LP), the
primal infeasibility is x0 + |b0 − a0ᵀx̄|. The total complementary slackness is
x̄ᵀz + (u − x̄)ᵀw + x0z0 + (u0 − x0)w0. In order to get an iterate which is approximately centered, we could choose w0 = z0 = 2µ/u0 and x0 = u0/2. The
complementary slackness will then be approximately (2n + 2)µ, so the ratio between
infeasibility and complementary slackness will be large if µ is small. Other choices
for x0, w0, and z0 would require a tradeoff between centrality and this balance between infeasibility and complementary slackness. This may explain why it is hard
to get very fast convergence from the infeasible warm start generated in a cutting
plane algorithm.
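A small numeric sketch of the analysis above, assuming the old iterate was centered (x_i z_i = (u_i − x_i) w_i = µ for each i) and using the choices w0 = z0 = 2µ/u0, x0 = u0/2; the function name and sample numbers are illustrative.

```python
def restart_gap_ratio(n, mu, u0, violation):
    """Ratio of primal infeasibility to complementary slackness at restart.

    Assumes the old iterate was centered, so the x-part of the
    complementary slackness is 2*n*mu.  violation is |b0 - a0^T xbar|,
    how much the old iterate violates the new cut; the centering
    choice from the text is x0 = u0/2 and w0 = z0 = 2*mu/u0.
    """
    x0 = u0 / 2.0
    w0 = z0 = 2.0 * mu / u0
    comp_slack = 2 * n * mu + x0 * z0 + (u0 - x0) * w0  # ~ (2n + 2) mu
    infeas = x0 + violation                             # x0 + |b0 - a0^T xbar|
    return infeas / comp_slack

# Small mu: infeasibility dwarfs the complementary slackness.
print(restart_gap_ratio(n=100, mu=1e-6, u0=1.0, violation=0.1))
```

The ratio grows as µ shrinks, which is exactly the imbalance the text identifies.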
Ye, Mizuno and Todd [81] introduced a skew-symmetric self-dual algorithm for
linear programming. Further investigation of this algorithm is described in, for
example, [76, 77]. This algorithm has the property that it is easy to generate a
perfectly centered initial iterate. This has the potential to make this algorithm very
useful in a cutting plane framework, because we can take the iterate for the previous
relaxation, modify it slightly, and obtain an almost centered iterate for the new
relaxation. This is an issue that needs more computational investigation.
min cᵀx
subject to Ax = b
A0x + x0 = b0 (LPmany)
0 ≤ x ≤ u
0 ≤ x0 ≤ u0
for some appropriate upper bound Uo on the new slack variables Xo. Note that
Xo and Uo are now vectors and Ao is a matrix of the appropriate dimension. The
corresponding dual problem is

max bᵀy + b0ᵀy0 − uᵀw − u0ᵀw0
subject to Aᵀy + A0ᵀy0 + z − w = c (LDmany)
y0 + z0 − w0 = 0
z, w, z0, w0 ≥ 0
where Yo, wo, and Zo are all vectors with dimension equal to the number of added
constraints. If we have an interior feasible solution to (LP) and (LD) then we can
get an interior feasible solution to (LDmany) by setting y0 = 0 and w0 = z0 = εe
for some small positive constant ε. If we set x0 to some positive vector, we can then
restart using an infeasible interior point method. Alternatively, it may be possible
to update the primal solution to a point which is feasible in (LPmany). Thus, the
algorithm can be restarted when many constraints are added in much the same way
that it can be restarted when only one constraint is added.
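The dual restart just described can be sketched as follows; names and the value of ε are illustrative.

```python
def extend_dual_point(y, w, z, n_new, eps=1e-4):
    """Extend an interior dual solution of (LD) to one of (LDmany).

    The new dual multipliers y0 are set to zero, so the old dual
    constraints are undisturbed, and the new bound multipliers
    w0 = z0 = eps*e stay strictly positive (hence interior).
    """
    y0 = [0.0] * n_new
    w0 = [eps] * n_new
    z0 = [eps] * n_new
    return y + y0, w + w0, z + z0

# Two constraints added: pad a small dual solution accordingly.
y, w, z = [1.0, -2.0], [0.5, 0.5, 0.5], [0.1, 0.2, 0.3]
y2, w2, z2 = extend_dual_point(y, w, z, n_new=2)
print(len(y2), len(w2), len(z2))
```

Because y0 = 0, the original dual equality constraints remain satisfied without any recomputation.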
Adding many constraints at once has several drawbacks:

• The LP relaxation has been changed dramatically, so the interior point algorithm takes several iterations to approach a new center, and then several iterations to move towards the optimal solution.
• The constraint matrix becomes far larger, so the computational time required
at each iteration increases.
• It takes more iterations to solve a linear program with a large number of con-
straints than one with a small number.
Perhaps the simplest method for choosing which constraints to add is to add the
constraints that are most violated by the approximate solution x̄ to (LP). The
disadvantages of this method are that it may add a large number of very similar
constraints, or that a constraint with a large violation may not actually be that
important.
Dropping constraints
Computationally, it is useful to be able to drop constraints because this will reduce
the time required for each iteration by reducing the size of the constraint matrix.
An additional benefit of dropping constraints is that smaller linear programs require
fewer iterations to solve. The simplest way to decide whether to drop a constraint is
to check its slack value - if the current iterate satisfies the constraint easily, then the
constraint is a candidate to be dropped. When a constraint is dropped, the structure
Primal heuristics
Primal heuristics are algorithms which generate integer solutions to (IP) from fractional solutions to (LP). If they are very cheap, it is possible to call them at every
iteration; however, it is usually more cost-effective to only call them when the separation heuristics are also called.
For many problems, it is usually considerably easier to find the optimal solution
than to prove that it is optimal. The primal heuristics may well find the optimal
solution, and the cutting plane method can then be used to prove that that solution
is optimal. If the objective function vector c is integer then we do not need to
proceed any further with the cutting plane algorithm once the lower bound provided
by the value of (LD) is within one of the value of the best known feasible solution
to (IP) provided by the primal heuristics. Thus, the primal heuristics may well save
us work by letting us terminate without having to construct a relaxation which has
an optimal solution that is feasible in (IP).
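For an integer objective vector c, the termination test in this paragraph is a simple gap check between the incumbent value and the dual lower bound; a minimal sketch (for a minimization problem).

```python
def can_stop(best_integer_value, dual_bound):
    """For an integer objective, stop once the dual (lower) bound is
    within one of the incumbent: no strictly better integer value can
    exist in the remaining gap."""
    return best_integer_value - dual_bound < 1.0

# gap 0.7 < 1: the incumbent is provably optimal
assert can_stop(42, 41.3)
# gap 1.1: an integer value of 41 might still exist
assert not can_stop(42, 40.9)
print("ok")
```

The same test is what makes early termination of the final relaxation safe.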
If the interior point method is converging to a point in the interior of the optimal
face of Q then the primal heuristics may well provide one of the optimal solutions
to (IP), so we can terminate the algorithm, because the value of the relaxation will
agree with the value of the integer solution. Without the primal heuristics, we may
search futilely for cutting planes, and be forced to branch. Thus, a good primal
heuristic algorithm can save a great deal of time.
Another use for the primal heuristic is to modify the restart point in Q. It can be
modified to be slightly closer to the integer point generated by the primal heuristics.
It is possible that the fractional optimal solution may provide more information than
one of the optimal integer solutions. For example, in the linear ordering problem, an
optimal fractional solution corresponds to a partial ordering of the nodes, and every
ordering which agrees with this partial ordering is optimal.
For a survey of the effects of degeneracy on interior point methods for linear programming problems, see Güler et al. [30]. Degeneracy does not appear to be as serious a
problem for interior point methods as it is for the simplex algorithm. The principal
practical effect of degeneracy on an interior point method is to cause possible numer-
ical problems because of numerical instability and ill-conditioning of the projection
matrix. Many integer programming problems have highly degenerate relaxations, so
an interior point method might be particularly well suited to such problems.
Fixing variables
Simplex branch and cut methods can use reduced costs to fix variables at zero or one
in the following manner. Let r be the reduced cost of a variable which is currently
zero in the solution to the relaxation. Let v_UB be the value of the best known feasible
solution to (IP) and let v_LB be the value of the relaxation. If r > v_UB − v_LB then
this variable must be zero in any optimal solution to (IP). A similar test can be
given for fixing a variable at one.
The reduced costs are not available at the current interior solution to the relaxation (LP), but the dual variables are available, and these can be used to fix variables. If z_i is the dual variable corresponding to the primal variable x_i and if v̄_LB
is the value of the current feasible solution to (LD) then x_i can be fixed at zero if
z_i > v_UB − v̄_LB. A similar test can be used to fix variables at one. See Mitchell [55]
for more details.
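The dual-variable fixing test might be sketched as follows (names are illustrative).

```python
def variables_fixable_at_zero(z, v_ub, v_lb_dual):
    """Indices i whose dual variable z_i exceeds the gap between the
    incumbent value v_ub and the current dual value v_lb_dual: such
    x_i must be zero in any optimal solution to (IP)."""
    gap = v_ub - v_lb_dual
    return [i for i, zi in enumerate(z) if zi > gap]

# Gap of 2.0: only the variables with z_i > 2 can be fixed at zero.
z = [0.1, 5.0, 2.5, 0.0]
print(variables_fixable_at_zero(z, v_ub=100.0, v_lb_dual=98.0))  # [1, 2]
```

Shrinking the gap (a better incumbent or a better dual value) lets more variables be fixed.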
1. Initialize. Set up the initial relaxation. Find initial interior primal and dual
points. If possible, find a feasible point in Q. If possible, find a restart point in
the relative interior of Q for use in Step 8.
2. Inner iteration. Perform one iteration of the primal-dual algorithm. While
the duality gap is above the tolerance T, repeat this step.
3. Primal heuristics. Use the primal heuristics to try to improve on the current
best solution to (IP). If successful, also update the known feasible point in the
relative interior of Q.
4. Look for cutting planes. Use heuristics and/or exact algorithms to find
cutting planes, if any exist.
5. Add cutting planes. If any cutting planes were found in Step 4 then add an
appropriate subset.
Some barrier methods, affine methods, and projective methods have been developed
for solving problems using either just the primal variables or just the dual variables,
and such methods can be used to solve the dual problem. Mitchell and Todd [59] con-
sidered using a projective algorithm applied to the dual problem in a cutting plane
algorithm. This algorithm does not require primal iterates, and only uses the value
of a primal feasible point in calculating the direction at each iteration. Heuristics
are used to generate primal solutions. When cutting planes are added to the primal
problem, a strictly positive dual iterate is obtained by first moving in a direction
which is guaranteed to increase the additional dual variables, and the algorithm is
then restarted from this new point. They obtained reasonable computational results
on matching problems, in terms of the number of iterations required. They also
obtained promising results on linear ordering problems; for details see [54].
Goffin et al. [25, 22, 21] have also experimented with algorithms which only require
primal iterates in their algorithms for nonsmooth optimization and multicommodity
flow problems. For a discussion of their algorithm for multicommodity flow problems
see section 11.5.1.
As with interior point cutting plane methods, one of the important features of a
competitive interior point branch and bound algorithm is that the relaxations are
not solved to optimality but are terminated early. This is usually possible, as we now
argue. When using branch-and-bound, one of four things can happen at each node
of the tree. The subproblem could be infeasible; in an interior point method this
can be detected by finding a ray in the dual problem. The subproblem could have
optimal value worse than the value of a known integer feasible solution, so the node
is fathomed by bounds; in an interior point method, this can usually be detected
well before the subproblem is solved to optimality. The optimal solution could be
an integer solution with value better than the best known solution; in this case we
need to solve the subproblem to optimality, but the node is then fathomed. The
final possibility is that the optimal solution to the subproblem has optimal value
smaller than the best known solution, but the optimal solution is not feasible in the
integer program; in this case, it is possible to use heuristics based upon the basis
identification techniques described in El-Bakry et al. [17] to determine that one of
the integer variables is tending to a fractional value, and therefore that we should
probably not solve the relaxation to optimality but should branch early.
It should be noted that in only one case is it necessary to actually solve the relaxation
to optimality, and in that case the node is fathomed. When we branch early, one
constraint in the dual relaxation (LD) is dropped, so the previous solution to (LD) is
still feasible. One variable in (LP) is fixed, so infeasibilities are introduced into (LP).
Despite this, it is still possible to solve the child node quickly [10, 11].
A branch and bound interior point algorithm has the form given in figure 11.6.
Notice that we do not necessarily maintain primal and dual feasibility throughout
the algorithm. This means that some of the tests for convergence have to depend
upon whether the iterates are feasible.
If the relaxation is infeasible, that is detected in Step 3d. If the relaxation has an
optimal value that is worse than that of the best known integer solution then that is
detected in Step 3c. In these two situations, the solution of the relaxation should not
take very many iterations, and the node is then fathomed. If the relaxation has an
integer solution which is better than the best known solution, then this relaxation is
solved to optimality and the node is fathomed in Step 3b. Here, the solution of the
relaxation may take several iterations, because an exact solution is needed, but the
node is then fathomed. Of course, it is possible that the rounding heuristics provide
the optimal solution to the relaxation early, and this is sufficiently close in value to
the dual value that the solution to the relaxation can be terminated early.
The one other possibility for a node of the tree is that the optimal solution to the
relaxation is fractional, but it has value smaller than that of the best known integer
solution. We discuss this situation further in section 11.6.
If it is necessary to branch then two child nodes will be created. It may eventually
be necessary to solve the relaxations at these child nodes, so it will be necessary to
start an interior point method on these relaxations. The simplex method can start
directly from the solution to the parent node, using the dual simplex algorithm. It
is necessary to modify the solution to the parent slightly before restarting with an
interior point method, in order to obtain a slightly more centered point. We discuss
restarting in more detail in section 11.6.
Degeneracy is generally not such a problem for interior point methods, and at least one
commercial package has installed a switch to change from simplex to an interior
point method within the branch-and-bound tree if difficulties arise. Applegate et
al. [5] also implemented such a switch. For a discussion of the effects of degeneracy
on interior point methods for linear programming, see Güler et al. [30].
One cost of using the simplex algorithm in a branch and bound method is that it
is necessary to perform an initial basis factorization for each child subproblem. The
cost of this is clear when examining, for example, the performance of the branch and
bound code for OSL [33] on solving integer programming problems: it often happens
that the average time per iteration is about three times larger for subproblems than
it is for the root node of the branch and bound tree. This extra time is almost
completely attributable to the overhead required at each node to calculate a new
basis factorization. A comparable slow down does not happen with interior point
methods. One way to avoid this overhead would be to store the basis factorization
of each parent node, but this usually requires too much storage and is hence imprac-
ticable. Of course, part of the reason that the slow down is so noticeable is that the
simplex algorithm requires far fewer iterations to solve subproblems than to solve
the root node, because the optimal solution to the parent node does provide a good
simplex warm start for the child subproblem. At present, it does not seem possible
to get a similar reduction with an interior point method, but the fact that the basis
refactorization is so expensive means that it is not necessary to obtain as good a
reduction in the number of iterations as enjoyed by the simplex algorithm.
Interior point branch and bound methods are somewhat competitive with simplex
based branch and bound algorithms on some problems, including facility location
problems [11]. These problems have a large number of continuous variables and a
relatively small number of integer variables, so the LP relaxations are large yet the
branch and bound tree is small. It is necessary to have large relaxations for an
interior point method to compete with a simplex method. In addition, we need the
problem to be solvable on current hardware, so the branch and bound tree can not
grow too large, so we need to have only a small number of integer variables. Because
of the early termination of the solution of the relaxations, not as much information
is available at each node, so the pseudo costs [64] used to select the next branching
variable and the next node can not be calculated as accurately. This is another
reason why an interior point method can currently only be competitive on problems
with a small proportion of integer variables, because in this situation the effect of a
bad choice of branching variable is not so dramatic. More research is needed to find
good, reliable pseudo costs in an interior point branch and bound method.
To conclude this section on interior point branch and bound methods, we discuss
restarting the algorithm (subsection 11.6) and terminating the solution of the relaxation early when the iterates are tending towards a fractional solution (subsection 11.6).
There is a risk associated with attempting to stop solution of the parent subproblem
early: the parent may be split into two child subproblems, when it might have been
possible to prune the parent if it had been solved to optimality. This could happen if
the parent subproblem has worse objective function value than that of the best known
feasible solution to (IP), or if it is infeasible, or if it has an integer optimal solution.
(Notice that the last possibility is unlikely if the basis identification techniques are
working well.) Therefore, it is wise to include some safeguards to attempt to avoid
this situation. Upper and lower bounds on the value of a subproblem are provided
by the values of the current primal and dual solutions, respectively, and these can
be used to regulate the decision to branch.
There are three tests used in [11] to prevent branching too early: the dual iterate
must be feasible, the relative primal infeasibility must be no greater than 10%,
and the dual objective function must not be increasing so quickly from iteration to
iteration that it is likely that the node will be fathomed by bound within an iteration
or two. Dual feasibility can usually be maintained throughout the branch and bound
tree so the first criterion is basically just a technicality. Every time a variable is
fixed, primal infeasibility is introduced; if the initial iterate for a subproblem is a
good warm start, primal feasibility can be regained in a few iterations. Thus, the
important criterion is the third one, regarding the increase in the dual value. This
criterion prevents branching if the difference between the dual value and the value of
the incumbent integer solution has been reduced by at least half in the last iteration,
provided the current primal value is greater than the current integer solution if the
current primal iterate is feasible.
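The three safeguards can be combined into one predicate; a sketch using the 10% threshold and the halving test from the text, with illustrative names.

```python
def ok_to_branch(dual_feasible, rel_primal_infeas, prev_gap, curr_gap):
    """Allow early branching only if: the dual iterate is feasible,
    relative primal infeasibility is at most 10%, and the dual value
    is not closing the gap so fast that the node is about to be
    fathomed anyway (gap reduced by at least half last iteration)."""
    if not dual_feasible:
        return False
    if rel_primal_infeas > 0.10:
        return False
    if curr_gap <= 0.5 * prev_gap:  # dual value rising quickly
        return False
    return True

# Gap shrinking slowly: branching is allowed.
assert ok_to_branch(True, 0.05, prev_gap=10.0, curr_gap=9.0)
# Gap more than halved: wait, the node may be fathomed by bound.
assert not ok_to_branch(True, 0.05, prev_gap=10.0, curr_gap=4.0)
print("ok")
```

As the text notes, the third test does most of the work in practice, since dual feasibility is usually maintained.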
dual barrier method is being employed (see, for example, Lustig et al. [49]). It
should be noted that many of the observations we make will also be applicable if
other interior point methods are used.
Assume a child problem has been created by fixing the variable Xo at 0 or 1 in the
parent problem
min cᵀx + c0x0
subject to Ax + a0x0 = b (LPparent)
0 ≤ x, x0 ≤ e,

where A is an m × n matrix, a0 and b are m-vectors, c and x are n-vectors, and c0
is a scalar. If x0 is fixed at zero, the child problem is

min cᵀx
subject to Ax = b (LPchild)
0 ≤ x ≤ e,
It may be possible to start the solution of the child problem from an iterate for
(LPparent) which was found before x*. This earlier iterate would be further from
optimality for (LPparent) than x*, but it may be a better initial solution to (LPchild)
just because it is more centered, with the nonbasic components being somewhat
larger. Preliminary experiments show that this approach may hold some promise,
but it needs considerably more investigation.
Finally, in section 11.3.6, we describe a method for solving quadratic integer pro-
gramming problems.
m log(n − xᵀx) − Σ_{i=1}^m log s_i

where s = b − Ax. The point x* is a global minimizer of this potential function if
and only if it is a feasible solution to (IPfeas).
(11.9)
It should be noted that the Hessian is a dense matrix, due to the outer product of
the vector x* with itself. The subproblem which is solved to find the direction Δx
is then

min (1/2) Δxᵀ H Δx + hᵀ Δx
subject to Δxᵀ AᵀS⁻²A Δx ≤ r²
If 0 < r ≤ 1 then a feasible point Δx in this subproblem leads to a feasible point
x + Δx in the problem (QP). The solution to this subproblem depends on the
eigenvalues of the Hessian matrix H in the norm defined by the matrix AᵀS⁻²A;
for details, see [43, 40]. This subproblem can be solved in polynomial time. For
methods for solving it, see [43, 40] or Ye [79]. Kamath et al. originally proposed
taking a step of a fixed length in the direction Δx; it was subsequently pointed out
by Shi and Vannelli [73] that the algorithm can be considerably enhanced by using
a line search to determine a step length.
Van Benthem et al. [7, 75] developed a variant of this algorithm to solve the radio
link frequency assignment problem. Their refinements to the original algorithm
included a method to deal with equality constraints, and the use of a barrier method
rather than a potential function method so that the Hessian matrix retains the
sparsity structure of AᵀS⁻²A. The structure of their problem is such that all the
slack variables in the original problem (IPfeas) must also be binary. They used
this observation to develop a quadratic objective function which enabled them to
eliminate the inequality constraints.
Other rounding schemes can be used. For example, we can choose the rounded value
x̄_i by examining whether x_i is increasing or decreasing. For some problems, the
structure of the problem suggests a natural rounding scheme; for example, Van Benthem et al. [7]
have suggested several rounding schemes for the radio link frequency assignment
problem. If the rounded point x̄^k is feasible in (QP) then we can terminate the
algorithm with success: x̄^k leads to a feasible point in the original problem (IPfeas).
Notice that every {−1, 1} point satisfies this constraint except x̄^k. After adding
this constraint, the algorithm is restarted from a strictly feasible point. It is best
to restart from scratch because the objective function is nonconvex, so we want to
generate a sequence of iterates that does not lead to the local minimizer. There is
no guarantee that equation (11.10) will cut off the local minimizer, but Karmarkar
claims that the addition of this constraint is usually sufficient to push the sequence
of iterates in a different direction, so that the algorithm terminates at a different
point.
Van Benthem et al. [7, 75] solved radio link frequency assignment problems using
a potential reduction method. By cleverly exploiting the structure of their model,
they were able to develop a variant of the algorithm which solves problems with
several thousand variables and constraints.
max xᵀQx
subject to x_i ∈ {−1, 1}, i = 1, …, n
upper bounds on the optimal value of the quadratic integer programming problem
in polynomial time [38].
This approach encloses the feasible region in an ellipsoid, finds the maximum value
of the objective function in the ellipsoid, and then modifies the ellipsoid appropri-
ately. The maximum value of the objective function over the ellipsoid is the largest
eigenvalue of an appropriate matrix. By modifying the ellipsoid appropriately, it is
possible to obtain a reasonable upper bound on the optimal value of the quadratic
problem.
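Since every x with x_i = ±1 lies on the sphere xᵀx = n, maximizing over that sphere gives the simplest bound of this kind: n times the largest eigenvalue of (the symmetric part of) Q. This is a sketch of the idea, not the precise ellipsoid construction of [38].

```python
import numpy as np

def sphere_upper_bound(Q):
    """Upper bound on max x^T Q x over x in {-1, +1}^n.

    All such x satisfy x^T x = n, so they lie on a sphere of radius
    sqrt(n); the maximum of the quadratic over that sphere is n times
    the largest eigenvalue of the symmetric part of Q."""
    Qs = 0.5 * (Q + Q.T)                    # x^T Q x = x^T Qs x
    n = Q.shape[0]
    return n * np.linalg.eigvalsh(Qs)[-1]   # eigvalsh sorts ascending

Q = np.array([[0.0, 1.0], [1.0, 0.0]])
print(sphere_upper_bound(Q))  # 2.0, attained here by x = (1, 1)
```

Modifying the enclosing ellipsoid, as the text describes, can only tighten this spherical bound.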
We now describe the minimum cost network flow problem. Given a directed graph
G = (V, E) with m vertices V and n arcs E, the arc from vertex i to vertex j is
denoted by (i, j). Flow moves around the network along the directed arcs. If more
flow is produced at a node i than is consumed at that node, then the node is called
a source node. If more flow is consumed at a node i than is produced at that node,
then the node is called a sink node. Any node which is neither a source node nor a
sink node is called a transshipment node. Let b_i denote the net required flow out of
node i; if b_i > 0 then node i is a source, if b_i < 0 then node i is a sink, and if b_i = 0
then node i is a transshipment node. For a feasible flow to exist, it is necessary that
Σ_{i∈V} b_i = 0. The flow must satisfy Kirchhoff's Law of flow conservation: the total
flow out of node i must equal the sum of bi and the total flow into node i for each
node i. There is a cost Cij for each unit of flow shipped along arc (i,j). We assume
without loss of generality that the lower bound on each arc is zero (see [1]), and
we denote the upper bound on arc (i, j) by u_ij. The minimum cost network flow
problem is then to meet the demands at the nodes at minimum cost while satisfying
both Kirchhoff's Law and the bounds on the edge capacities. This can be expressed
as the following linear programming problem:
min Σ_{(i,j)∈E} c_ij x_ij (11.11)

subject to Σ_{(i,j)∈E} x_ij − Σ_{(j,i)∈E} x_ji = b_i for all i ∈ V (11.12)

0 ≤ x_ij ≤ u_ij for all (i, j) ∈ E (11.13)
where x_ij denotes the flow on arc (i, j). Usually, the problem data is integer, in
which case one of the optimal solutions to this linear program will be integer.
We let A denote the node-arc incidence matrix of the graph. Each column of A
corresponds to an arc (i, j) and has an entry "+1" in row i and an entry "−1" in row j,
with all the remaining entries being zero. Notice that the constraint (11.12) can
be written Ax = b. The rank of the matrix A is equal to the difference between
the number of vertices and the number of connected components of the graph. One
redundant row can be eliminated for each connected component. For simplicity of
notation we retain the redundant rows, but it should be understood that these rows
have been eliminated.
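To make the construction concrete, the following sketch (in Python, with an illustrative graph) builds the node-arc incidence matrix from an arc list and checks the rank property stated above; the helper names are our own, not from the chapter.

```python
def incidence_matrix(num_nodes, arcs):
    """Node-arc incidence matrix: column for arc (i, j) has +1 in row i, -1 in row j."""
    A = [[0] * len(arcs) for _ in range(num_nodes)]
    for col, (i, j) in enumerate(arcs):
        A[i][col] = 1
        A[j][col] = -1
    return A

def matrix_rank(A, eps=1e-9):
    """Rank via Gaussian elimination (adequate for small examples)."""
    M = [row[:] for row in A]
    rank, rows, cols = 0, len(M), len(M[0])
    for col in range(cols):
        pivot = next((r for r in range(rank, rows) if abs(M[r][col]) > eps), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rows):
            if r != rank and abs(M[r][col]) > eps:
                f = M[r][col] / M[rank][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank

# A connected graph on 4 vertices: rank should be 4 - 1 = 3.
arcs = [(0, 1), (1, 2), (2, 3), (0, 2)]
A = incidence_matrix(4, arcs)
print(matrix_rank(A))  # 3
```

With two connected components (say arcs (0,1) and (2,3) only), the same computation gives rank 4 − 2 = 2, matching the rule that one redundant row can be dropped per component.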
At each iteration, these algorithms require the solution of a system of linear equations of the form

    A D Aᵀ v = w                                                  (11.14)

typically by computing a factorization of the matrix A D Aᵀ. The matrix D and the vector w change from
iteration to iteration; some algorithms require solving this system for more than one vector
w at each iteration. Resende and Veiga showed that superior
performance can be obtained on network flow problems if the system (11.14) is
solved using a preconditioned conjugate gradient method.
The structure of the network flow problem makes it possible to choose a good preconditioner M. The simplest preconditioner is to take M to be the diagonal of the
matrix A D Aᵀ. This is simple to compute, it makes the calculation of z_{k+1} trivial,
and it can be effective. A more sophisticated preconditioner that exploits the nature
of the problem is the maximum weighted spanning tree (MST) preconditioner. The
edges of the graph are weighted by the corresponding elements of the diagonal ma-
trix D, and a maximum weight spanning tree is then found using either Kruskal's
algorithm or Prim's algorithm. (For descriptions of these algorithms for finding a
maximum weight spanning forest, see [1].) Let S denote the columns of A corresponding to the edges in the maximum weight forest. The MST preconditioner is
then

    M = S D̂ Sᵀ                                                   (11.15)

where D̂ is a diagonal matrix containing the entries of D for the edges in the maximum weight spanning forest. The preconditioned residue system solved in Step 3e
can be solved in time proportional to the number of vertices because the coefficient
matrix S can be permuted into block triangular form.
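As an illustration, a maximum weight spanning forest for the MST preconditioner can be computed with Kruskal's algorithm and a union-find structure; the edge weights below stand in for the diagonal entries of D, and the data is illustrative rather than taken from any of the cited experiments.

```python
def max_weight_spanning_forest(num_nodes, edges):
    """Kruskal's algorithm run on edges sorted by *decreasing* weight.

    edges: list of (weight, i, j). Returns the forest edges; their columns of A
    would form the matrix S of the MST preconditioner."""
    parent = list(range(num_nodes))

    def find(v):  # union-find with path compression
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest = []
    for w, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:          # adding the edge creates no cycle
            parent[ri] = rj
            forest.append((w, i, j))
    return forest

# Example: the weights play the role of the diagonal of D.
edges = [(5.0, 0, 1), (1.0, 1, 2), (4.0, 0, 2), (3.0, 2, 3)]
print(max_weight_spanning_forest(4, edges))
# [(5.0, 0, 1), (4.0, 0, 2), (3.0, 2, 3)]
```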
Resende and Veiga switch to the spanning tree preconditioner once the performance of the diagonal preconditioner falls off in their
dual affine algorithm.
We now discuss the stopping criterion used within the preconditioned conjugate gra-
dient algorithm. Recall that we want to solve equation (11.14) and that we use the
vectors v_k as successive approximations to v. The check used in the papers discussed
in this section examines the vector A D Aᵀ v_k: if the angle θ between this vector and
the right hand side vector w is close to zero, then we have solved equation (11.14)
approximately. Resende and Veiga use the criterion that the preconditioned conjugate gradient algorithm can be halted if |1 − cos θ| < ε_cos, where ε_cos is 10⁻³
in early iterations of the interior point algorithm and is gradually decreased. The
calculation of cos θ requires about as much work as one conjugate gradient iteration,
so it is only calculated every fifth iteration by Resende and Veiga. Additionally, the
conjugate gradient method is halted if the size of the residual rk becomes very small.
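A minimal sketch of this approach follows: preconditioned conjugate gradients with the diagonal preconditioner and the angle-based stopping test. The dense matrix representation and the tolerance are illustrative, not those of Resende and Veiga's implementation.

```python
import math

def matvec(B, x):
    return [sum(bij * xj for bij, xj in zip(row, x)) for row in B]

def pcg(B, w, eps_cos=1e-8, max_iter=200):
    """Preconditioned conjugate gradients for B v = w, B symmetric positive definite.

    Preconditioner: diag(B). Halts when |1 - cos(theta)| < eps_cos, where theta
    is the angle between B v_k and w."""
    n = len(w)
    diag = [B[i][i] for i in range(n)]
    v = [0.0] * n
    r = w[:]                                   # residual w - B v (v starts at 0)
    z = [ri / di for ri, di in zip(r, diag)]   # preconditioned residual
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        Bp = matvec(B, p)
        alpha = rz / sum(pi * bpi for pi, bpi in zip(p, Bp))
        v = [vi + alpha * pi for vi, pi in zip(v, p)]
        r = [ri - alpha * bpi for ri, bpi in zip(r, Bp)]
        Bv = matvec(B, v)
        cos_theta = (sum(a * b for a, b in zip(Bv, w))
                     / (math.sqrt(sum(a * a for a in Bv)) * math.sqrt(sum(a * a for a in w))))
        if abs(1.0 - cos_theta) < eps_cos:     # angle-based stopping test
            break
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return v

B = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
w = [1.0, 2.0, 3.0]
v = pcg(B, w)
print(v)  # approximately solves B v = w
```

In the network setting B would be A D Aᵀ, applied in matrix-free form rather than stored densely as here.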
The maximum weight spanning tree found in the preconditioned conjugate routine
can be used to guess an optimal solution: if the basic solution corresponding to this
forest is feasible and the corresponding dual solution is also feasible then this solution
450 CHAPTER 11
is optimal. This works well if the solution is unique, but unfortunately it usually
does not work well in the presence of multiple optimal solutions. If the primal basic
solution is not feasible, then the current dual iterate is projected to give a point ŷ
which is complementary to the primal basic solution. The edges for which the dual
slack has small magnitude for this dual vector ŷ are then used to define a subgraph
of the original graph. The edges in this subgraph are a superset of the edges in the
forest. All edges not in this subgraph are assigned flow either 0 or equal
to their upper bound. Resende and Veiga then attempt to find a feasible flow in
the original graph by only adjusting flow on the edges in the subgraph. This can be
done by solving a maximum flow problem and is guaranteed to give an integral flow
if one exists.
As the interior point iterates converge towards optimality, this procedure will even-
tually give an integral optimal flow, provided the flows on the nonbasic edges are set
correctly to 0 or their upper bound. Resende and Veiga examine the dual variable
s_i corresponding to the nonbasic variable x_i and the dual variable z_i corresponding
to the upper bound constraint on this variable x_i. If s_i > z_i then variable x_i is set
to zero; otherwise it is set equal to its upper bound. As the interior point method
converges to optimality, this setting will eventually be optimal, and so the procedure
outlined above will give an optimal integral solution to the network flow problem.
The basis identification method of Megiddo [51] can be used to determine an optimal
integral basic feasible solution once the interior point method is close enough.
Thus, this work shows that interior point methods can outperform the simplex algo-
rithm even in problem classes which lend themselves to sophisticated implementa-
tions of simplex. For an interior point method to be successful, it is necessary to use
IPMs for Combinatorial Optimization 451
Many of the computational runs of these authors took several hours, and some of
the runs with CPLEX Netopt took longer than a day. They used a number of
workstations (each solving a separate problem) to obtain their results, and they
were able to solve problems which are considered very large. It is on these large
problems that the advantages of interior point methods become clear.
Goffin et al. [25, 22] have previously described column generation interior point algo-
rithms designed to solve nonsmooth optimization problems. The research described
in this section is a continuation and extension of the work described in their earlier
papers.
We are given a graph G = (V, E) and a set of commodities I. We denote the node
arc incidence matrix by A. For each commodity, there are source nodes where flow is
produced, sink nodes where flow is consumed, and transshipment nodes, where the
flow is in balance. The required net flow out of node v of commodity i is represented
by d^i_v. Goffin et al. [21] restrict themselves to the case where each commodity has
exactly one source node and one sink node. The capacity y_e of each arc e can be
selected, with an associated convex cost f_e(y_e); the upper bound on the capacity is
denoted by ȳ_e. Associated with each commodity i and each arc e is a linear cost c^i_e
for each unit of commodity i shipped along arc e. The multicommodity flow problem
can then be formulated as

    min  Σ_{i∈I} Σ_{e∈E} c^i_e x^i_e + Σ_{e∈E} f_e(y_e)          (11.16)

    subject to  Σ_{i∈I} x^i_e = y_e   for all e ∈ E              (11.17)

                A x^i = d^i   for all i ∈ I                      (11.18)

                x^i_e ≥ 0,   0 ≤ y_e ≤ ȳ_e                      (11.19)

Here, x^i_e represents the flow of commodity i on arc e and y_e represents the total
flow on arc e. We assume that the cost function f_e(y_e) is strictly increasing and
convex and that the costs c^i_e are nonnegative. The standard linear multicommodity
flow problem corresponds to f_e = 0 for every arc e. Equation (11.17) is called the
coupling constraint and equation (11.18) is the flow conservation constraint. Without
equation (11.17), the problem would be separable. This equation is dualized in the
Lagrangian relaxation developed for this problem. The Lagrangian multipliers for
these constraints are nonnegative because of the structure of the objective function;
with the use of an interior point cutting plane algorithm, the multipliers are actually
always positive.
where u is the vector of Lagrange multipliers for the coupling constraints. Since the
multicommodity flow problem is convex, it can be solved by solving the Lagrangian
dual problem

    max  LD(u)                                                   (11.20)
    subject to  u ≥ 0                                            (11.21)

where the Lagrangian dual function LD(u) is given by

    LD(u) = min { Σ_{i∈I} Σ_{e∈E} c^i_e x^i_e + Σ_{e∈E} f_e(y_e)
                  + Σ_{e∈E} u_e (Σ_{i∈I} x^i_e − y_e) : x, y satisfy (11.18) and (11.19) }   (11.22)
The Lagrangian dual function LD(u) is a nonsmooth concave function. The La-
grangian dual problem can be solved by obtaining a polyhedral approximation to
the dual function using supergradients ξ. If LD(u) is differentiable at the point u then
the only supergradient at that point is the gradient itself. In general, a supergradient
ξ at a point ū satisfies

    LD(u) ≤ LD(ū) + ξᵀ(u − ū)                                   (11.23)

for all u ≥ 0. Given points u^k ≥ 0 and associated supergradients ξ^k for k = 1, ..., κ,
the optimal value of the linear programming problem

    max  z
    subject to  z − (ξ^k)ᵀ u ≤ LD(u^k) − (ξ^k)ᵀ u^k   for k = 1, ..., κ

provides an upper bound θ_up on the optimal value of the Lagrangian dual. It can
be shown that if κ is large enough, then the solution to this linear program will solve
the Lagrangian dual. The maximum of LD(u^k) for k = 1, ..., κ provides a lower
bound θ_inf on the optimal value of the dual, and any optimal solution lies in the
localization set
At each stage, the algorithm generates a point in the localization set. If this point is
feasible in the Lagrangian dual, then we can update the lower bound θ_inf. If the point
is not feasible, then we can generate a new subgradient ξ and add the corresponding
constraint to the localization set. In either case, the localization set is updated, so
we then find a new point in this set and repeat the process until the gap between
θ_inf and θ_up is sufficiently small. We summarize this in the prototypical algorithm
given in figure 11.8, dropping the iteration counter k to simplify the notation.
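The prototype can be illustrated on a one-dimensional toy instance. Below, LD is a piecewise linear concave function, the oracle returns a value and a supergradient, and the master problem (maximizing the polyhedral upper envelope over an interval) is solved by inspecting endpoints and cut intersections. Everything about the instance is illustrative; a real implementation solves the master as an LP (or, as Goffin et al. do, finds its analytic center).

```python
def make_oracle(pieces):
    """LD(u) = min over affine pieces (a + b*u); returns value and a supergradient."""
    def oracle(u):
        val, slope = min((a + b * u, b) for a, b in pieces)
        return val, slope
    return oracle

def master(cuts, lo, hi):
    """Maximize the upper envelope min_k [LD(u_k) + xi_k (u - u_k)] over [lo, hi]."""
    def envelope(u):
        return min(v + g * (u - uk) for uk, v, g in cuts)
    candidates = [lo, hi]
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            u1, v1, g1 = cuts[i]
            u2, v2, g2 = cuts[j]
            if g1 != g2:  # intersection point of two cuts
                u = ((v2 - g2 * u2) - (v1 - g1 * u1)) / (g1 - g2)
                if lo <= u <= hi:
                    candidates.append(u)
    return max(candidates, key=envelope)

def cutting_plane(oracle, lo, hi, tol=1e-8, max_iter=100):
    u, cuts, theta_inf = lo, [], float("-inf")
    for _ in range(max_iter):
        val, slope = oracle(u)             # Step 2: call the oracle
        theta_inf = max(theta_inf, val)    # best dual value found so far
        cuts.append((u, val, slope))       # add the corresponding cut
        u = master(cuts, lo, hi)           # Step 1: next point from the master
        theta_up = min(v + g * (u - uk) for uk, v, g in cuts)
        if theta_up - theta_inf <= tol:    # gap between bounds is small
            break
    return u, theta_inf

# LD(u) = min(2 + u, 10 - 2u): maximized at u = 8/3 with value 14/3.
u, val = cutting_plane(make_oracle([(2.0, 1.0), (10.0, -2.0)]), 0.0, 10.0)
print(u, val)
```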
Step 1 of this process is usually called the Master Problem. Classically, it has been
solved using the simplex algorithm, and then the whole process resembles Dantzig-
Wolfe decomposition. Goffin et al. use an interior point method to solve the Master
Problem. They apply the de Ghellinck and Vial variant [19] of Karmarkar's projective algorithm [41] to the dual of the Master Problem to calculate the analytic center of
the localization set. The localization set is modified by the addition of constraints so
columns are added to the dual of this problem. An interior point is generated in the
dual by using the technique of Mitchell and Todd [59]. The method used by Goffin
et al. generates primal and dual iterates at each approximate solution to the master
problem, so an approximate solution to the Lagrangian dual can be converted to an
approximate solution to the multicommodity flow problem.
Step 2 of the prototype algorithm is called the subproblem or oracle. There are
choices available in the solution of this problem for a multicommodity flow problem,
Figure 11.8 Column generation algorithm for the multicommodity flow problem
depending upon the level of disaggregation of the constraints. The constraints for
the subproblem are separable by commodity. It is then possible to generate one
subgradient for the whole problem, or to generate subgradients corresponding to each
commodity. Goffin et al. obtained better results by disaggregating the constraints
and generating separate subgradients for each commodity; this is in agreement with
other work in the literature which used different algorithms to solve the Master
Problem (see Jones et al. [34]).
Goffin et al. give computational results for random problems with up to 500 nodes,
1000 arcs, and 4000 commodities, and for some smaller problems from the litera-
ture. (In their formulation, the largest problems could have up to 8 × 10⁶ primal
variables x^i_e.) They compared their algorithm with an implementation of Dantzig-
Wolfe decomposition, and the interior point algorithm was clearly superior for the
problems discussed.
To date, the only interior point algorithm which solves the integer program in polyno-
mial time and which does not drop constraints is due to Vaidya [74]. This algorithm
uses the volumetric center, so its analysis differs from that of more standard interior
point methods. Vaidya's analysis of his algorithm shows that only a polynomial
number of constraints are generated, even though an infinite number of possible
constraints exists. This is a crucial point in proving the polynomial complexity of
his algorithm, and indeed of any cutting plane or column generation algorithm. For
an alternative analysis of this algorithm, see Anstreicher [4]. Anstreicher was able
to greatly reduce the constants involved in the complexity analysis of Vaidya's algo-
rithm, making the algorithm considerably more attractive for implementation. For
example, Anstreicher reduced the number of Newton steps by a factor of 1.8 million
and he reduced the maximum number of constraints used by a factor of 10⁴. Vaidya's
algorithm is a short step algorithm, in the sense that the reduction in the duality
gap at an iteration is dependent on the dimension of the problem. Ramaswamy
and Mitchell [68] have developed a long step variant of Vaidya's algorithm that has
polynomial convergence. Their algorithm reduces the duality gap by a fixed ratio at
any iteration where it is not necessary to add or drop constraints.
Atkinson and Vaidya [6] developed a polynomial time cutting plane algorithm which
used the analytic center. This algorithm drops constraints that become unimportant,
and this is essential in their complexity analysis. Previous algorithms were often
shown to be polynomial in the number of additional constraints, but without a
proof that the number of added constraints is polynomial. Atkinson and Vaidya's
algorithm finds a feasible point for a set of convex inequalities by finding an analytic
center for a subset of the inequalities and using an oracle to test whether that point
satisfies all the inequalities. If the oracle returns a violated inequality, a shifted
linear constraint is added so that the analytic center remains feasible and close to
the new analytic center.
Mitchell and Ramaswamy [58] developed a barrier function cutting plane algorithm
using some of the ideas from [6]. This algorithm is a long step algorithm, unlike
the algorithm in [6]: if it is not necessary to add or drop constraints, then they
reduce the duality gap by a constant fraction. They showed some links between
the notion of a point being centered (see, for example, Roos and Vial [72]) and the
criteria for a constraint to be added or dropped in [6]. Barrier function methods for
linear programming have shown excellent computational performance and they can
be constructed to have superlinear and quadratic convergence. It would thus appear
desirable to employ these methods in a column generation algorithm.
There have been several papers recently analyzing algorithms that add many cuts at
once (see, for example, Luo [48], Ramaswamy and Mitchell [67], and Ye [80]). These
papers generally show that the complexity of an algorithm is not harmed if many
cuts are added at once, although there do have to be some bounds on the number
of constraints added simultaneously.
The earlier theoretical papers on interior point cutting plane algorithms generally
added the constraints far from the current center, so that the center of the new
system is close to the center of the old system. The paper by Goffin et al. [24]
shows that it is possible to add a cutting plane right through the current analytic
center without changing the complexity of their algorithm [23]. Ye [80] extended
this analysis to the case where multiple cuts are placed right through the analytic
center. Ramaswamy and Mitchell [67] describe an algorithm which adds multiple
cuts through the analytic center, and they show that the new analytic center can be
regained in O(√p log p) iterations, where p is the number of added cuts.
The research on interior point methods for positive semidefinite programming has
led to improved algorithms for various problems in combinatorial optimization. For
example, see the chapter in this book by Pardalos and Ramana or the papers by
Goemans et al. [18, 20], Alizadeh [2], or chapter 9 of the book [29].
Bertsimas and Orlin [8] use the interior point algorithm for convex programming
given by Vaidya [74] to obtain algorithms with superior theoretical complexity for
several combinatorial optimization problems, principally by giving a new method
for solving the Lagrangean dual of a problem. This leads to improved complexity
for lower bounding procedures for the traveling salesman problem (particularly, the
Held and Karp method), the Steiner tree problem, the 2-connected problem, vehicle
routing problems, multicommodity flow problems, facility location problems, and
others.
Xue and Ye [78] have described an interior point algorithm for solving the problem of
minimizing a sum of Euclidean norms. This algorithm can be used to solve problems
related to Steiner trees with better theoretical complexity than the previously best
known algorithm.
11.7 CONCLUSIONS
We have discussed the ways in which interior point methods have been used to
solve combinatorial problems. The methods discussed include algorithms where the
simplex method has been replaced by an interior point method as well as a new
method which appears unrelated to previous simplex-based algorithms.
We have discussed incorporating interior point methods into cutting plane and
branch and bound algorithms for integer programming in section 11.2.
We described the use of interior point methods to solve network flow problems in
section 11.4. These problems can be solved by solving a single linear program. The
computational results with an interior point method are better than those with a
specialized simplex method for large problems in several classes.
Research on the multicommodity network flow problem was discussed in section 11.5.
A column generation algorithm which appears to outperform classical Dantzig-Wolfe
decomposition on these problems was described.
With all of these methods, the performance of the interior point method relative to
other methods improves as the problem size increases. This is typical of computa-
tional results with interior point methods for linear programming and other prob-
lems. Interior point methods will probably not be the method of choice for small
or medium sized problems, but they may become the preferred method for larger
problems once computational hardware improves sufficiently to make it possible to
routinely solve problems which are currently impracticably large.
We discussed theoretical issues concerning cutting plane and column generation al-
gorithms in section 11.6.1. There are polynomial time interior point cutting plane
algorithms. However, to date there is no polynomial time interior point cutting
plane algorithm that is based upon the analytic center and which does not drop
constraints. Whether such an algorithm exists is an interesting open problem. The
discussion in section 11.6.2 of improved complexity results for various combinatorial
optimization problems is a starting point for what will probably be an active research
area in the next few years.
Acknowledgements
Research partially supported by ONR Grant number N00014-94-1-0391.
REFERENCES
[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows. Prentice Hall,
Englewood Cliffs, New Jersey, 1993.
[5] D. Applegate, R. Bixby, V. Chvátal, and W. Cook. Finding cuts in the TSP (a
preliminary report). Technical report, Mathematics, AT&T Bell Laboratories,
Murray Hill, NJ, 1994.
[6] D. S. Atkinson and P. M. Vaidya. A cutting plane algorithm for convex program-
ming that uses analytic centers. Mathematical Programming, 69:1-43, 1995.
[7] H. van Benthem, A. Hipolito, B. Jansen, C. Roos, T. Terlaky, and J. Warners.
Radio link frequency assignment project, Technical annex T-2.3.2: Potential
reduction methods. Technical report, Faculty of Technical Mathematics and
Informatics, Delft University of Technology, Delft, The Netherlands, 1995.
[8] D. Bertsimas and J. B. Orlin. A technique for speeding up the solution of the
Lagrangean dual. Mathematical Programming, 63:23-45, 1994.
[9] J. R. Birge, R. M. Freund, and R. J. Vanderbei. Prior reduced fill-in in solving
equations in interior point algorithms. Operations Research Letters, 11:195-198,
1992.
[10] B. Borchers. Improved branch and bound algorithms for integer programming.
PhD thesis, Rensselaer Polytechnic Institute, Mathematical Sciences, Troy, NY,
1992.
[11] B. Borchers and J. E. Mitchell. Using an interior point method in a branch and
bound algorithm for integer programming. Technical Report 195, Mathemat-
ical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, March 1991.
Revised July 7, 1992.
[12] CPLEX Optimization Inc. CPLEX Linear Optimizer and Mixed Integer Opti-
mizer. Suite 279, 930 Tahoe Blvd. Bldg 802, Incline Village, NV 89541.
[13] M. Davis and H. Putnam. A computing procedure for quantification theory. J.
Assoc. Comput. Mach., 7:201-215, 1960.
[14] DIMACS. The first DIMACS international implementation challenge: The
benchmark experiments. Technical report, DIMACS, RUTCOR, Rutgers Uni-
versity, New Brunswick, NJ, 1991.
[15] J. Edmonds. Maximum matching and a polyhedron with 0, 1 vertices. Journal
of Research National Bureau of Standards, 69B:125-130, 1965.
[16] J. Edmonds. Paths, trees and flowers. Canadian Journal of Mathematics,
17:449-467, 1965.
[17] A. S. El-Bakry, R. A. Tapia, and Y. Zhang. A study of indicators for identifying
zero variables in interior-point methods. SIAM Review, 36:45-72, 1994.
[18] Uriel Feige and Michel X. Goemans. Approximating the value of two prover
proof systems, with applications to MAX 2SAT and MAX DICUT. In Pro-
ceedings of the Third Israel Symposium on Theory of Computing and Systems,
1995.
[19] G. de Ghellinck and J.-P. Vial. A polynomial Newton method for linear pro-
gramming. Algorithmica, 1:425-453, 1986.
[20] Michel X. Goemans and David P. Williamson. Improved Approximation Algo-
rithms for Maximum Cut and Satisfiability Problems Using Semidefinite Pro-
gramming. J. Assoc. Comput. Mach., 1994. (To appear). A preliminary version
appeared in Proc. 26th Annual ACM Symposium on Theory of Computing.
[21] J.-L. Goffin, J. Gondzio, R. Sarkissian, and J.-P. Vial. Solving nonlinear multi-
commodity network flow problems by the analytic center cutting plane method.
Technical report, GERAD, Faculty of Management, McGill University, Mon-
treal, Quebec, Canada H3A 1G5, October 1994.
[22] J.-L. Goffin, A. Haurie, and J.-P. Vial. Decomposition and nondifferentiable
optimization with the projective algorithm. Management Science, 38:284-302,
1992.
[23] J.-L. Goffin, Z.-Q. Luo, and Y. Ye. On the complexity of a column generation
algorithm for convex or quasiconvex problems. In Large Scale Optimization:
The State of the Art. Kluwer Academic Publishers, 1993.
[24] J.-L. Goffin, Z.-Q. Luo, and Y. Ye. Complexity analysis of an interior cut-
ting plane method for convex feasibility problems. Technical report, Faculty of
Management, McGill University, Montreal, Quebec, Canada, June 1994.
[25] J.-L. Goffin and J.-P. Vial. Cutting planes and column generation techniques
with the projective algorithm. Journal of Optimization Theory and Applica-
tions, 65(3):409-429, 1990.
[26] R. E. Gomory. An algorithm for integer solutions to linear programs. In R. L.
Graves and P. Wolfe, editors, Recent Advances in Mathematical Programming,
pages 269-302. McGraw-Hill, New York, 1963.
[27] M. Grötschel and O. Holland. Solving matching problems with linear program-
ming. Mathematical Programming, 33:243-259, 1985.
[28] M. Grötschel, M. Jünger, and G. Reinelt. A cutting plane algorithm for the
linear ordering problem. Operations Research, 32:1195-1220, 1984.
[29] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combi-
natorial Optimization. Springer-Verlag, Berlin, Germany, 1988.
[31] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex Pro-
gramming, Algorithms and Complexity. PhD thesis, Faculty of Mathematics and
Informatics, TU Delft, NL-2628 BL Delft, The Netherlands, September 1992.
[32] D. den Hertog, C. Roos, and T. Terlaky. A build-up variant of the path-
following method for LP. Operations Research Letters, 12:181-186, 1992.
[33] IBM. IBM Optimization Subroutine Library Guide and Reference, August 1990.
Publication number SC23-0519-1.
[34] K. L. Jones, I. J. Lustig, J. M. Farvolden, and W. B. Powell. Multicommodity
network flows - the impact of formulation on decomposition. Mathematical
Programming, 62:95-117, 1993.
[35] M. Jünger, G. Reinelt, and S. Thienel. Practical problem solving with cutting
plane algorithms in combinatorial optimization. Technical Report 94.156, Institut für Informatik, Universität zu Köln, Pohligstraße 1, D-50969 Köln, Germany,
March 1994.
[36] A. P. Kamath. Efficient Continuous Algorithms for Combinatorial Optimiza-
tion. PhD thesis, Department of Computer Science, Stanford University, Palo
Alto, CA, February 1995.
[37] A. P. Kamath and N. K. Karmarkar. A continuous approach to compute up-
per bounds in quadratic maximization problems with integer constraints. In
C. A. Floudas and P. M. Pardalos, editors, Recent Advances in Global Optimiza-
tion, Princeton Series in Computer Science, pages 125-140. Princeton University
Press, Princeton, NJ, USA, 1992.
[38] A. P. Kamath and N. K. Karmarkar. An O(nL) iteration algorithm for com-
puting bounds in quadratic optimization problems. In P. M. Pardalos, editor,
Complexity in Numerical Optimization, pages 254-268. World Scientific Pub-
lishing Company, Singapore (USA address: River Edge, NJ 07661), 1993.
[39] A. P. Kamath, N. K. Karmarkar, and K. G. Ramakrishnan. Computational
and complexity results for an interior point algorithm on multi-commodity flow
problem. Technical report, Department of Computer Science, Stanford Univer-
sity, Palo Alto, CA, 1993.
[40] A. P. Kamath, N. K. Karmarkar, K. G. Ramakrishnan, and M. G. C. Re-
sende. A continuous approach to inductive inference. Mathematical Program-
ming, 57:215-238, 1992.
[41] N. K. Karmarkar. A new polynomial-time algorithm for linear programming.
Combinatorica, 4:373-395, 1984.
[55] J. E. Mitchell. Fixing variables and generating classical cutting planes when
using an interior point branch and cut method to solve integer programming
problems. Technical Report 216, Mathematical Sciences, Rensselaer Polytechnic
Institute, Troy, NY 12180-3590, October 1994.
[56] J. E. Mitchell. An interior point column generation method for linear program-
ming using shifted barriers. SIAM Journal on Optimization, 4:423-440, May
1994.
[67] S. Ramaswamy and J. E. Mitchell. On updating the analytic center after the
addition of multiple cuts. Technical Report 37-94-423, Dept. of Decision Sci-
ences and Engg. Systems, Rensselaer Polytechnic Institute, Troy, NY 12180,
October 1994.
[68] S. Ramaswamy and J. E. Mitchell. A long step cutting plane algorithm that
uses the volumetric barrier. Technical report, Dept. of Decision Sciences and
Engg. Systems, Rensselaer Polytechnic Institute, Troy, NY 12180, June 1995.
[69] M. G. C. Resende and P. M. Pardalos. Interior point algorithms for network flow
problems. Technical report, AT&T Bell Laboratories, Murray Hill, New Jersey
07974-2070, 1994. To appear in Advances in Linear and Integer Programming,
J. E. Beasley, ed., Oxford University Press, 1995.
[70] M. G. C. Resende and G. Veiga. An efficient implementation of a network
interior point method. In D.S. Johnson and C.C. McGeoch, editors, Network
Flows and Matching: First DIMACS Implementation Challenge, pages 299-348.
American Mathematical Society, 1993. DIMACS Series on Discrete Mathemat-
ics and Theoretical Computer Science, vol. 12.
[77] X. Xu and Y. Ye. A generalized homogeneous and self-dual algorithm for linear
programming. Operations Research Letters, 17:181-190, 1995.
[78] G. Xue and Y. Ye. An efficient algorithm for minimizing a sum of Euclidean
norms with applications. Technical report, Department of Computer Science
and Electrical Engineering, University of Vermont, Burlington, VT 05405-0156,
June 1995.
[80] Y. Ye. Complexity analysis of the analytic center cutting plane method that
uses multiple cuts. Technical report, Department of Management Sciences, The
University of Iowa, Iowa City, Iowa 52242, September 1994.
ABSTRACT
Interior point methods, originally invented in the context of linear programming, have found
a much broader range of applications, including global optimization problems that arise in
engineering, computer science, operations research, and other disciplines. This chapter
overviews the conceptual basis and applications of interior point methods for some classes
of global optimization problems.
Key Words: Interior point methods, nonconvex optimization, global optimization, quadratic
programming, linear complementarity problem, integer programming, combinatorial opti-
mization
12.1 INTRODUCTION
During the last decade, the field of mathematical programming has evolved rapidly.
New approaches have been developed and increasingly difficult problems are be-
ing solved with efficient implementations of new algorithms. One of these new ap-
proaches is the interior point method [17]. These algorithms have been primarily
used to develop solution methods for linear and convex minimization problems. In-
terior point methods have been also developed for nonconvex minimization problems
and have been used as subroutines in many global optimization algorithms.
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 467-500.
© 1996 Kluwer Academic Publishers.
468 CHAPTER 12
multiquadratic programming and present an interior point algorithm for quadratic
programming with box constraints. In Section 12.3, we discuss an algorithm for
the minimization of nonconvex potential functions and show how this algorithm
can be applied to solve combinatorial optimization problems. Computational issues
are discussed in detail. Section 12.4 deals with an affine scaling algorithm for general nonconvex quadratic programming. A lower bounding technique that uses an
interior point method is considered in Section 12.5. Section 12.6 discusses a poten-
tial reduction interior point algorithm for general linear complementarity problems.
Concluding remarks are made in Section 12.7.
The general quadratic programming problem with linear constraints has been shown
to be NP-complete [8, 32]. For example, the well-known maximum clique problem
on a graph G = (V, E) can be formulated as the indefinite quadratic program

    max  ½ xᵀ A_G x
    subject to  eᵀ x = 1,  x ≥ 0,

where A_G is the adjacency matrix of the graph G and e is the vector of all ones.
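A quick way to see why this formulation detects cliques is the Motzkin-Straus identity, which underlies formulations of this kind: the uniform weighting of a k-clique attains objective value ½(1 − 1/k), which increases with k. The small sketch below (the graph is illustrative) evaluates the objective at such weightings.

```python
def clique_qp_value(adj, support):
    """Objective (1/2) x^T A_G x at the uniform weighting of a vertex subset."""
    k = len(support)
    x = {v: 1.0 / k for v in support}
    return 0.5 * sum(x[i] * x[j] for i in support for j in support if j in adj[i])

# Triangle with a pendant vertex: the maximum clique is {0, 1, 2}, of size 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clique_qp_value(adj, {0, 1, 2}))  # 1/2 * (1 - 1/3) = 0.3333...
print(clique_qp_value(adj, {2, 3}))     # 1/2 * (1 - 1/2) = 0.25
```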
Even the problem of deciding the existence of a Karush-Kuhn-Tucker point for the
problem
subject to
x ≥ 0,
is NP-complete [8]. An approximate solution of the general quadratic programming
problem with linear constraints can be computed by successively solving a sequence
of quadratic problems with an ellipsoid constraint. Consider the problem

    min  q(x) = ½ xᵀ Q x + cᵀ x                                  (12.1)
    subject to  x ∈ P = {x ∈ Rⁿ | Ax = b, x ≥ 0},                (12.2)
IPMs for Global Optimization 469
where Q ∈ Rⁿˣⁿ, A ∈ Rᵐˣⁿ, c ∈ Rⁿ, and b ∈ Rᵐ. A special case of this problem
is the box constrained problem

    min  q(x) = ½ xᵀ Q x + cᵀ x                                  (12.3)
    subject to  x ∈ B(r) = {x ∈ Rⁿ | ‖x‖_∞ ≤ r}.                 (12.4)
Quadratic programming problems with box constraints are also NP-complete. This
class of problems is important because it constitutes a major ingredient in many
nonlinear programming algorithms.
Replacing the infinity norm ‖·‖_∞ with the Euclidean norm ‖·‖_2 results in quadratic
programming with a single quadratic constraint

    min  q(x) = ½ xᵀ Q x + cᵀ x                                  (12.5)
    subject to  x ∈ E(r) = {x ∈ Rⁿ | ‖x‖_2 ≤ r}.                 (12.6)
Quadratic programming with an ellipsoid constraint is a useful subproblem in many
interior point algorithms for discrete and continuous optimization.
We conclude this section with some basic results about the eigenvalues of Q. Computing the eigenvalues of a symmetric matrix is a well-studied problem [5] and can
be done in O(n^3) time for an n × n matrix. Assume that the components of Q and
c are rational numbers, Q is a symmetric rational matrix, and let L(Q) denote the
binary encoding length of Q. Using linear algebra techniques, it can be shown that
if λ(Q) is an eigenvalue of Q, then |λ(Q)| ≤ n max_{i,j} |q_{ij}| ≤ 2^{O(L(Q))}, and either
λ(Q) = 0 or |λ(Q)| > 2^{-O(L(Q))}.
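As a quick illustration (the example and function name are ours, not from the text), the magnitude bound |λ(Q)| ≤ n max_{i,j} |q_{ij}| can be checked in a few lines of NumPy:

```python
import numpy as np

def eigenvalue_bound_holds(Q):
    """Check |lambda(Q)| <= n * max_ij |q_ij| for a symmetric matrix Q."""
    n = Q.shape[0]
    lam = np.linalg.eigvalsh(Q)          # real eigenvalues, ascending order
    return bool(np.max(np.abs(lam)) <= n * np.max(np.abs(Q)) + 1e-12)

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Q = (B + B.T) / 2                        # symmetrize
print(eigenvalue_bound_holds(Q))         # True
```

The bound follows from |λ| ≤ max row sum of |Q| ≤ n max |q_{ij}|, so the check passes for any symmetric Q.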
Among those solutions satisfying (12.7-12.9), we are interested in the one that
achieves the (global) minimum objective value for (12.5-12.6). Interestingly, all
feasible solutions satisfying (12.7-12.9) must have the same μ and the same objective
value. More explicitly, if (μ_1, x_1) and (μ_2, x_2) satisfy (12.7-12.9), then μ_1 = μ_2
and q(x_1) = q(x_2). This fact was shown for the trust region method in unconstrained
optimization [24].
From the above result, we have that any solution satisfying (12.7-12.9) is a
global minimum solution for (12.5-12.6), and μ is unique among these minimum
solutions. Next, we discuss an algorithm for finding a solution satisfying (12.7-12.9)
in O(n^3 log(1/ε)) arithmetic operations with error tolerance ε > 0.
Let μ* ≥ 0 be the unique μ satisfying (12.7-12.9). An upper bound for μ* is
Figure 12.1 Procedure bs: Algorithm for quadratic programming over an ellipsoid
using binary search
The binary search procedure bs, shown in the pseudo-code of Figure 12.1, can be
used to approximately compute μ*. The procedure takes as input the problem data
n, Q, c and r, and the tolerance ε, and returns an ε-approximation of μ*. In step 1,
the lower and upper bounds on μ* are initialized. The loop from line 2 to line 9
is repeated until the interval containing μ* is smaller than the tolerance ε. The loop
carries out the steps of a binary search. In step 3, the midpoint μ is determined. If
Q + μI is positive definite and the norm of the solution of (12.7) is less than r, then
the upper bound μ_u of μ* is updated to be the midpoint μ. Else, if Q + μI is positive
definite and the norm of the solution is greater than or equal to r, then the lower
bound μ_l is set to be the midpoint. On the other hand, if Q + μI is negative definite
or indefinite, or no solution of (12.7) exists, or if the norm of the minimum norm
solution x̲ = -(Q + μI)^+ c is greater than r, then the lower bound μ_l is set to the
midpoint (B^+ denotes the pseudoinverse of B).
The minimum norm solution of (12.7) is considered in the case μ = |λ_min(Q)|, in which
Q + μI is positive semidefinite (and therefore singular), and solutions exist for (12.7).
Thus, c must equal the projection of c onto the column (or row) space of Q + μI, and
the minimum norm solution x̲ is the solution that lies in the row (or column)
space of Q + μI or, equivalently, x̲ = -(Q + μI)^+ c. Each iteration of bs requires one
matrix inversion, and can thus be completed in O(n^3) arithmetic operations.
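A minimal sketch of the binary search in procedure bs, assuming the parametrization x(μ) = -(Q + μI)^{-1}c for (12.7); the initial upper bound and tolerance are our choices, and the degenerate pseudoinverse case is omitted:

```python
import numpy as np

def bs(Q, c, r, eps=1e-9):
    """Binary search for mu* with x(mu) = -(Q + mu I)^{-1} c and ||x(mu*)|| = r
    (or mu* = 0 when the unconstrained minimizer lies inside the ball).
    The hard case needing the pseudoinverse is not handled in this sketch."""
    n = Q.shape[0]
    I = np.eye(n)
    # safe overestimate: mu_u > n * max|q_ij| >= -lambda_min(Q)
    mu_l, mu_u = 0.0, n * np.max(np.abs(Q)) + np.linalg.norm(c) / r + 1.0
    while mu_u - mu_l > eps:
        mu = 0.5 * (mu_l + mu_u)
        if np.linalg.eigvalsh(Q + mu * I)[0] > 0:
            x = -np.linalg.solve(Q + mu * I, c)
            if np.linalg.norm(x) < r:
                mu_u = mu          # minimizer strictly inside: mu* is smaller
            else:
                mu_l = mu          # on or outside the ball: mu* is larger
        else:
            mu_l = mu              # Q + mu I not positive definite
    x = -np.linalg.solve(Q + mu_u * I, c)
    return mu_u, x

# Demo on a small indefinite example (data made up)
Q_demo, c_demo = np.diag([-2.0, 1.0, 3.0]), np.ones(3)
mu_star, x_star = bs(Q_demo, c_demo, 1.0)
```

Each pass costs one factorization plus an eigenvalue check, matching the O(n^3)-per-iteration count in the text.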
Although the above algorithm is polynomially bounded (with error ε), trust region
techniques are preferred in practice. Ye [36] proposed a hybrid algorithm, combining Newton's method and binary search, that solves the problem of minimizing
a quadratic function over an ellipsoid in O(log(log(1/ε))) iterations, each iteration
taking O(n^3) operations.
A very nice discussion regarding nonconvex quadratic programming over a sphere
can be found in the book by Vavasis [32].
procedure qpbox(n, Q, c, r, x)
1  k = 1; x^0 = (1/2)e; D_1 = diag(1/2, ..., 1/2);
2  do stopping criterion not satisfied →
3    E_k = {x | ‖D_k^{-1}(x - x^{k-1})‖_2 ≤ r ≤ 1};
4    x^k = argmin {(1/2) x^T Q x + c^T x | x ∈ E_k};
5    d_i = min{x_i^k, 1 - x_i^k},  i = 1, ..., n;
6    D_{k+1} = diag(d_1, ..., d_n);
7    k = k + 1;
8  od;
end qpbox;
(12.10)
subject to
0 ≤ x ≤ e,  (12.11)
where e ∈ R^n is a vector of all ones. This problem is an essential subroutine in many
subject to
(12.13)
The significance of this optimization problem is that many combinatorial optimiza-
tion problems can be formulated as above with the additional requirement that the
variables are binary.
In [14, 15] a new affine scaling algorithm was proposed for solving the above problem using a logarithmic potential function. Consider the nonconvex optimization
problem
(12.14)
where
φ(w) = log(m - w^T w)^{1/2} - (1/n) Σ_{i=1}^{n} log d_i(w)   (12.15)
(12.16)
and where
d_i(w) = b_i - a_i^T w,  i = 1, ..., n,   (12.17)
are the slacks. The denominator of the log term of φ(w) is the geometric mean of
the slacks and is maximized at the center of the polytope defined by
be a given initial interior point. The algorithm generates a sequence of interior points
of 𝓛.
(12.19)
To see that the ellipsoid 𝓔(r) is inscribed in the polytope 𝓛, assume that r = 1 and
let y ∈ 𝓔(1). Then
and consequently
D^{-1} A^T (y - w^k) ≤ e,
where w^k ∈ 𝓛^0. Denoting the i-th row of A^T by a_i^T, we have
a_i^T (y - w^k) / (b_i - a_i^T w^k) ≤ 1,  ∀i = 1, ..., n.
Hence,
and consequently
a_i^T y ≤ b_i,  ∀i = 1, ..., n,
i.e. A^T y ≤ b, showing that y ∈ 𝓛. This shows that 𝓔(1) ⊂ 𝓛 and, since 𝓔(r) ⊂ 𝓔(1)
for 0 ≤ r < 1, 𝓔(r) ⊂ 𝓛, i.e. 𝓔(r) is an inscribed ellipsoid in 𝓛.
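The inscription argument can be verified numerically; the sketch below (data and names are ours) samples boundary points of the ellipsoid 𝓔(1) = {y : ‖D^{-1}A^T(y - w)‖_2 ≤ 1} and checks that they satisfy A^T y ≤ b:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 6                       # w in R^m, n inequalities a_i^T w <= b_i
A = rng.standard_normal((m, n))   # column i of A is a_i, as in A^T w <= b
w = np.zeros(m)
b = np.abs(A).sum(axis=0) + 1.0   # makes w = 0 strictly interior
d = b - A.T @ w                   # slacks d_i(w), all positive here

def boundary_points_feasible(r, trials=200):
    ok = True
    for _ in range(trials):
        u = rng.standard_normal(m)
        s = np.linalg.norm((A.T @ u) / d)   # ||D^{-1} A^T u||_2
        y = w + (r / s) * u                 # boundary point of E(r)
        ok = ok and bool(np.all(A.T @ y <= b + 1e-9))
    return ok

print(boundary_points_feasible(1.0))   # True: E(1) sits inside the polytope
```

The check mirrors the proof: the 2-norm bound forces every component of D^{-1}A^T(y - w) below 1, which is exactly a_i^T y ≤ b_i.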
476 CHAPTER 12
(12.21)
subject to
(12.22)
The optimal solution Δw* to (12.21-12.22) is a descent direction of Q(w) from
w^k. For a given radius r > 0, the value of the original potential function φ(w)
may increase by moving in the direction Δw*, because of the higher order terms
ignored in the approximation. It can be easily verified, however, that if the radius
is decreased sufficiently, the value of the potential function will decrease by moving
in the new Δw* direction. We shall say a local minimum of (12.14) has been found
if the radius must be reduced below a tolerance ε to achieve a reduction in the value
of the potential function.
The following result, proved in [14], characterizes the optimal solution of (12.21-
12.22). Using a linear transformation, the problem is transformed into the mini-
mization of a quadratic function over a sphere.
(12.23)
subject to
(12.24)
where Q ∈ R^{m×m} is symmetric and indefinite, x, c ∈ R^m and 0 < r ∈ R. Let
u_1, ..., u_m denote a full set of orthonormal eigenvectors spanning R^m and let λ_1,
..., λ_m be the corresponding eigenvalues, ordered so that λ_1 ≤ λ_2 ≤ ... ≤ λ_{m-1} ≤
λ_m. Denote 0 > λ_min = min{λ_1, ..., λ_m} and u_min the corresponding eigenvector.
Furthermore, let q be such that λ_min = λ_1 = ... = λ_q < λ_{q+1}. To describe the
solution to (12.23-12.24) consider two cases:
Case 1: Assume Σ_{i=1}^{q} (c^T u_i)^2 > 0. Let the scalar λ ∈ (-∞, λ_min) and consider the
parametric family of vectors
For any r > 0, denote by λ(r) the unique solution of the equation x(λ)^T x(λ) = r^2
in λ. Then x(λ(r)) is the unique optimal solution of (12.23-12.24).
Case 2: Assume c^T u_i = 0, ∀i = 1, ..., q. Let the scalar λ ∈ (-∞, λ_min) and consider
the parametric family of vectors
(12.25)
Let
r_max = ‖x(λ_min)‖_2.
If r < r_max, then for any 0 < r < r_max, denote by λ(r) the unique solution of
the equation x(λ)^T x(λ) = r^2 in λ. Then x(λ(r)) is the unique optimal solution of
(12.23-12.24).
If r ≥ r_max, then let α_1, α_2, ..., α_q be any real scalars such that
Σ_{i=1}^{q} α_i^2 = r^2 - r_max^2.
Then
This shows the existence of a unique optimal solution to (12.23-12.24) if r < r_max.
The proof of this result is based on another fact, used to develop the algorithm
described in [14, 15], that we state next.
for λ ∈ (-∞, λ_min). Now, assume that c^T u_i = 0, ∀i = 1, ..., q, and consider the
parametric family of vectors
(12.26)
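The case analysis suggests a direct solver: diagonalize Q, then solve the secular equation ‖x(λ)‖_2 = r by bisection over λ < λ_min with x(λ) = -(Q - λI)^{-1}c. The sketch below assumes the objective is (1/2)x^T Q x + c^T x (the scaling in (12.23) may differ) and assumes Case 1, i.e. c not orthogonal to the λ_min eigenspace:

```python
import numpy as np

def sphere_qp(Q, c, r):
    """Case 1 sketch: minimize (1/2) x^T Q x + c^T x over ||x||_2 <= r for
    indefinite Q, via bisection on ||x(lam)||_2 = r with lam < lam_min,
    where x(lam) = -(Q - lam I)^{-1} c. Hard case (Case 2) not handled."""
    lam_all, U = np.linalg.eigh(Q)            # ascending eigenvalues
    w = U.T @ c                               # coordinates c^T u_i

    def norm_x(lam):
        return np.linalg.norm(w / (lam_all - lam))

    lo = lam_all[0] - np.linalg.norm(c) / r - 1.0   # here ||x(lo)|| < r
    hi = lam_all[0] - 1e-12                         # here ||x(hi)|| > r
    for _ in range(200):                      # ||x(lam)|| increases in lam
        mid = 0.5 * (lo + hi)
        if norm_x(mid) < r:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return U @ (w / (lam - lam_all))          # x(lam) = -(Q - lam I)^{-1} c

x_opt = sphere_qp(np.diag([-2.0, 1.0, 3.0]), np.ones(3), 1.0)
```

Since λ < λ_min keeps Q - λI positive definite, the returned point satisfies the global optimality conditions for the ball-constrained problem.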
Figure 12.3 Procedure cmq: Algorithm for nonconvex potential function mini-
mization
The above result suggests an approach to solve the nonconvex optimization problem
(12.14). At each iteration, a quadratic approximation of the potential function
φ(w) around the iterate w^k is minimized over an ellipsoid inscribed in the polytope
{w ∈ R^m | A^T w ≤ b} and centered at w^k. Either a descent direction Δw* of φ(w)
is produced or w^k is declared a local minimum. A new iterate w^{k+1} is computed
by moving from w^k in the direction Δw* such that φ(w^{k+1}) < φ(w^k). This can be
done by moving a fixed step α in the direction Δw* or by doing a line search to find
the α that minimizes the potential function φ(w^k + αΔw*) [30].
Figure 12.3 shows a pseudo-code procedure cmq for finding a local minimum of the
convex quadratic maximization problem. Procedure cmq takes as input the problem
dimension n, the matrix A, the right hand side vector b, an initial estimate μ_0 of the
parameter μ, and initial lower and upper bounds on the acceptable length, ℓ_0 and
l̄_0, respectively. In line 2, get_start_point returns a strict interior point of the
polytope under consideration, i.e. w^0 ∈ 𝓛^0.
The algorithm iterates in the loop between lines 3 and 13, terminating when a local
optimum is found. At each iteration, a descent direction of the potential function
φ(w) is produced in lines 4 through 8. In line 4, the minimization of a quadratic
function over an ellipsoid (12.21-12.22) is solved. Because of higher order terms, the
direction returned by descent_direction may not be a descent direction for φ(w).
In this case, the loop from lines 5 to 8 is repeated until an improving direction for the
potential function is produced or the largest acceptable length falls below a given tolerance ε.
If an improving direction for φ(w) is found, a new point w^{k+1} is defined (in line 10)
by moving from the current iterate w^k in the direction Δw* by a step length α < 1.
(12.29)
(12.30)
With the change of variables γ = 1/(μ + 1/n) and substituting the Hessian (12.19)
and the gradient (12.20) into (12.29), we obtain
(A D^{-2} A^T - (2γ/f_0^2) w^k w^{kT} - (γ/f_0) I)^{-1} ×
that satisfies (12.29). Note that r does not appear in (12.32) and that (12.32) is not
defined for all values of r. However, if the radius r of the ellipsoid (12.28) is kept
within a certain range, then there exists an interval 0 ≤ γ ≤ γ_max such that
(12.33)
then
lim_{γ→0^+} h^T Δw*/γ = -h^T (A D^{-2} A^T)^{-1} h,
showing that there exists γ > 0 such that the direction Δw*, given in (12.32), is a
descent direction of φ(w).
The idea ofthe algorithm is to solve (12.27-12.28), more than once if necessary, with
the radius r as a variable. Parameter, is varied until r takes a value in some given
H_0 = -(2/f_0^2) w^k w^{kT} - (1/f_0) I   (12.36)
and define
M = H_c + γ H_0.
Given the current iterate w^k, we first seek a value of γ such that M Δw = γh has a
solution Δw*. This can be done by binary search, as we will see shortly. Once such
a parameter γ is found, the linear system
is solved for Δw* ≡ Δw*(γ). As was shown previously, the length l(Δw*(γ)) is
a monotonically increasing function of γ in the interval 0 ≤ γ ≤ γ_max. Optimality
condition (12.30) implies that r = √(l(Δw*(γ))) if μ > 0. Small lengths result in
small changes in the potential function, since r is small and the optimal solution
lies on the surface of the ellipsoid. A length that is too large may not correspond
to an optimal solution of (12.27-12.28), since this may require r > 1. An interval
(ℓ, l̄), called the acceptable length region, is defined such that a length l(Δw*(γ)) is
accepted if ℓ ≤ l(Δw*(γ)) ≤ l̄. If l(Δw*(γ)) < ℓ, γ is increased and (12.37) is
re-solved with the new M matrix and h vector. On the other hand, if l(Δw*(γ)) > l̄,
γ is reduced and (12.37) is re-solved. Once an acceptable length is produced, we use
Δw*(γ) as the descent direction.
Similar to the treatment of case (i), case (ii) is handled in lines 14-18. The current
value of l is an upper bound on an acceptable value of l and is recorded in line 15,
and the corresponding logical key is set. If a lower bound ℓ for an acceptable value
of l has been found, the new estimate for l is set to the geometric mean of l̄ and ℓ
in line 16. Otherwise, l is decreased by a fixed factor in line 17.
Finally, in line 20, the lower bound ℓ may have to be adjusted if l < ℓ and LDkey =
true. Note that the key LDkey is used only to allow the adjustment in the range of
the acceptable length, so that the range returned contains the current length l.
M = A D^{-2} A^T - (2/f_0^2) w^k w^{kT} - (1/f_0) I,
(2/f_0^2) w^k w^{kT} u,
is done in two steps. First, an inner product w^{kT} u is computed. Then, the vector
(2/f_0^2) w^k is scaled by the inner product. The third product,
A D^{-2} A^T u,
is done in three steps. First the product A^T u is carried out. The resulting vector is
scaled by D^{-2} and then multiplied by A. Therefore, if A is sparse, the entire matrix-vector
multiplication can be done efficiently.
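The matrix-free evaluation described above can be written directly; in the sketch below (function name and test data are ours) u is a generic vector and f_0 stands for the scalar m - w^T w of the chapter's setting:

```python
import numpy as np

def apply_M(A, d, w, f0, u):
    """Apply M = A D^{-2} A^T - (2/f0^2) w w^T - (1/f0) I to u without
    forming M. A may be dense or scipy.sparse; d holds the slacks d_i."""
    v = (A.T @ u) / (d * d)                   # D^{-2} (A^T u), O(nnz(A)) work
    return A @ v - (2.0 / f0**2) * w * (w @ u) - (1.0 / f0) * u

# Check against the explicitly formed M on small made-up data.
rng = np.random.default_rng(4)
m, n = 4, 7
A = rng.standard_normal((m, n))
d = rng.uniform(0.5, 2.0, size=n)
w = rng.standard_normal(m)
f0, u = 3.0, rng.standard_normal(m)
M = (A @ np.diag(1.0 / d**2) @ A.T
     - (2.0 / f0**2) * np.outer(w, w) - (1.0 / f0) * np.eye(m))
print(np.allclose(M @ u, apply_M(A, d, w, f0, u)))   # True
```

Because only A^T u, a diagonal scaling, A v, and one inner product are needed, the cost is dominated by the sparsity of A, as the text claims.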
φ_p(w) = m - w^T w - Σ_{i=1}^{n} p_i log d_i(w),
h = -2w + A D^{-1} p,
and
H = -2I + A D^{-1} P D^{-1} A^T,
where p = (p_1, ..., p_n) and P = diag(p). Note that the density of the Hessian
depends only on the density of A A^T. Consequently, direct factorization methods
can be used efficiently when the density of A A^T is small.
The more common form of integer programming, where the variables x_i take on (0,1)
values, can be converted to the above form with the change of variables
x_i = (1 + w_i)/2,  i = 1, ..., m.
and
and let
With this notation, we can state the integer programming problem as: find w ∈ I.
As before, let
𝓛 = {w ∈ R^m | A^T w ≤ b}
and consider the linear programming relaxation of (12.38-12.39), i.e. find w ∈
𝓛. One way of selecting ±1 integer solutions over fractional solutions in linear
programming is to introduce the quadratic objective function,
and solve the nonconvex quadratic programming problem (12.12-12.13). Note that
w^T w ≤ m, with equality occurring only when w_j = ±1, j = 1, ..., m. Furthermore,
if w ∈ I then w ∈ 𝓛 and w_i = ±1, i = 1, ..., m, and therefore w^T w = m.
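The ±1 encoding claims are easy to sanity-check numerically (the data below is made up):

```python
import numpy as np

# For integer w in {-1, 1}^m we get w^T w = m exactly, while any fractional
# point of the cube [-1, 1]^m gives w^T w < m.
rng = np.random.default_rng(2)
m = 8
w_int = rng.choice([-1.0, 1.0], size=m)
w_frac = rng.uniform(-1.0, 1.0, size=m)
print(w_int @ w_int == m)            # True: equality exactly at +/-1 points
print(w_frac @ w_frac < m)           # True for any strictly interior point
x = (1.0 + w_int) / 2.0              # back to the 0/1 encoding x_i = (1+w_i)/2
print(set(x) <= {0.0, 1.0})          # True
```

This is why maximizing w^T w (equivalently, driving it toward m) pushes the relaxation toward integer vertices.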
In place of (12.12-12.13), one solves the nonconvex potential function minimization
(12.40)
where φ(w) is given by (12.15-12.17). The generally applied scheme rounds each
iterate to an integer solution, terminating if a feasible integer solution is produced.
If the algorithm converges to a nonglobal local minimum of (12.40), then the problem
is modified by adding a cut and the algorithm is applied to the augmented problem.
Let v be the integer solution rounded off from the local minimum. A valid cut is
(12.41)
We note that adding a cut of the type above will not, theoretically, prevent the
algorithm from converging to the same local minimum twice. In practice [15], the
addition of the cut changes the objective function, consequently altering the trajec-
tory followed by the algorithm.
Most combinatorial optimization problems have very natural equivalent integer and
quadratic programming formulations [26]. The algorithms described in this section
have been applied to a variety of problems, including maximum independent set [16],
set covering [15], satisfiability [13, 30], inductive inference [11, 12], and frequency
assignment in cellular telephone systems [34].
We now consider an affine scaling algorithm for general nonconvex quadratic
programming (12.1-12.2). Given an arbitrary feasible point x^k ∈ P =
{x ∈ R^n | Ax = b, x ≥ 0}, using the scaling technique, we form the suboptimization problem
(12.42)
subject to
Ā x = b,  and  ‖x - e‖_2 ≤ r,   (12.43)
where Q̄ = X_k Q X_k, c̄ = X_k c, with X_k = diag(x^k), Ā = A X_k, and 0 < r < 1.
Let Δx ≡ x - e; then (12.42-12.43) can be written as
Let B ∈ R^{n×m} be an orthonormal basis spanning the null space of Ā. Then Δx =
B Δy for some Δy ∈ R^m and, with this change of variables, (12.42-12.43) becomes
(12.44)
subject to
Δy ∈ E(r),   (12.45)
which is the minimization of a quadratic function over an ellipsoid.
is also an interior feasible point for (12.1-12.2), since the linear constraints are
satisfied by the new interior point, i.e.
The first and second order necessary conditions for the solution of (12.42-12.43) are
given by
Q̄ x̲ + c̄ - Ā^T y + μ^k (x̲ - e) = 0,
Ā (x̲ - e) = 0,
‖x̲ - e‖_2 ≤ r,  μ^k ≥ 0  and  μ^k (r - ‖x̲ - e‖_2) = 0,
and
Let
and
Then,
μ^k = ‖p^k‖_2 / r,
p^k = X_k s^{k+1},
and
x^{k+1} = X_k (e - r p^k / ‖p^k‖_2).
Repeatedly solving (12.42-12.43) with the binary search procedure bs (or a trust
region method), one solves (12.1-12.2), as shown in the pseudo-code in Figure 12.5.
This procedure takes as input the problem data Q, B, c, the positive parameter r < 1,
and outputs the primal and dual solution vectors x̲ and y̲, respectively. Until the
termination criterion is satisfied, the algorithm iterates from line 2 to line 8, solving
a quadratic program over an ellipsoid and recovering the variables. One possible
stopping rule is to halt when ‖x^{k+1} - x^k‖_2 ≤ ε, for some given tolerance ε > 0.
Another rule is to stop when the tightnesses of the optimality conditions are at an
acceptable level.
Let x^k and y^k converge to x̲ and y̲. Then x̲ and y̲ are feasible for (12.1-12.2), and
satisfy both the first and the second order necessary conditions of (12.1-12.2), i.e.
Figure 12.5 Procedure asqp: Affine scaling algorithm for quadratic programming
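A compact sketch of the asqp iteration, with the ellipsoid subproblem solved by eigendecomposition instead of procedure bs (the solver, radius, iteration count, and demo data are our choices; the degenerate hard case is ignored):

```python
import numpy as np

def ball_qp(Q, c, r):
    """Approximately minimize (1/2) d^T Q d + c^T d over ||d||_2 <= r by
    bisection on the multiplier mu (sketch; hard case not handled)."""
    lam, U = np.linalg.eigh(Q)
    g = U.T @ c
    if lam[0] > 0:
        d0 = -g / lam
        if np.linalg.norm(d0) <= r:            # unconstrained min inside ball
            return U @ d0
    lo = max(0.0, -lam[0])
    hi = lo + np.linalg.norm(c) / r + 1.0
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.linalg.norm(g / (lam + mu)) > r:
            lo = mu
        else:
            hi = mu
    return U @ (-g / (lam + hi))

def asqp(Q, A, b, c, x0, r=0.5, iters=60):
    """Affine scaling sketch for min (1/2) x^T Q x + c^T x, Ax = b, x >= 0,
    started from a strictly feasible x0; scaled subproblems as in (12.42-12.43)."""
    x = x0.astype(float).copy()
    e = np.ones_like(x)
    for _ in range(iters):
        X = np.diag(x)
        Qb, cb, Ab = X @ Q @ X, X @ c, A @ X
        _, s, Vt = np.linalg.svd(Ab)
        B = Vt[np.sum(s > 1e-12):].T           # null-space basis of A_bar
        if B.shape[1] == 0:
            break
        dy = ball_qp(B.T @ Qb @ B, B.T @ (Qb @ e + cb), r)
        x = x + X @ (B @ dy)                   # x_{k+1} = X_k (e + B dy) > 0
    return x

# Demo (made-up data): min (1/2)||x||^2 - 2 x_1 over the simplex x_1 + x_2 = 1.
Q_d, c_d = np.eye(2), np.array([-2.0, 0.0])
A_d, b_d = np.array([[1.0, 1.0]]), np.array([1.0])
x_sol = asqp(Q_d, A_d, b_d, c_d, np.array([0.5, 0.5]))
```

Because ‖BΔy‖_2 ≤ r < 1, each scaled step keeps every component of x strictly positive, matching the interior-point character of the procedure.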
No complexity bound has been developed for the algorithm described above. However, if the global minimum solution of (12.3-12.4) is contained in a larger ball E(R),
Ye [35] shows that if λ(Q) ≥ 0, then
q(x̲) - z* ≤ (1 - r/R)(q(0) - z*),
where z* is the global minimum objective value for (12.3-12.4). This result shows
that the objective reduction rate is at least 1 - (r/R)^2 after solving (12.5-12.6),
independent of the convexity of the quadratic objective function.
Although the general quadratic programming problem with linear constraints is NP-hard,
the affine scaling algorithm has proven itself to be efficient in solving certain
classes of nonconvex quadratic programming problems [6].
More recently, Monteiro and Wang [23] study trust region affine scaling algorithms
for solving linearly constrained convex and concave programming problems. For a
special class of convex or concave functions, satisfying a certain invariance condition
on their Hessians, the authors prove R-linear and Q-linear convergence, respectively.
In addition, under primal nondegeneracy, and for the same class of functions, they
show that an accumulation point of the iterates satisfies the first and second order
optimality conditions.
Consider the problem of finding good lower bounds on f_min. To apply an interior
point method to this problem, one needs to embed the discrete set S in a continuous
set T ⊇ S. Clearly, the minimum of f(x) over T is a lower bound on f_min.
Clearly, the set S is contained in E(w). If λ_min(w) is the minimum eigenvalue of
W^{-1/2} Q W^{-1/2}, then
min_x (x^T Q x)/(x^T W x) = min_x (x^T W^{-1/2} Q W^{-1/2} x)/(x^T x) = λ_min(w),
and therefore
x^T Q x ≥ λ_min(w),  ∀x ∈ E(w).
Hence, the minimum value of f(x) over E(w) can be obtained by simply computing
the minimum eigenvalue of W^{-1/2} Q W^{-1/2}. To further improve the bound on f_min
requires that λ_min(w) be maximized over the set U. Therefore, the problem of
finding a better lower bound is transformed into the optimization problem
max ν
subject to
(x^T Q x)/(x^T W x) ≥ ν,  ∀x ∈ R^n - {0} and w ∈ U.
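The eigenvalue lower bound can be checked by sampling E(w) = {x : x^T W x ≤ 1} (our reading of the set; the data is made up):

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((5, 5))
Q = (B + B.T) / 2
w = rng.uniform(0.5, 2.0, size=5)            # W = diag(w), assumed positive
W_half_inv = np.diag(1.0 / np.sqrt(w))
lam_min = np.linalg.eigvalsh(W_half_inv @ Q @ W_half_inv)[0]

# x^T Q x >= lam_min for every x with x^T W x <= 1 (boundary sample check):
ok = True
for _ in range(300):
    x = rng.standard_normal(5)
    x = x / np.sqrt(x @ (w * x))             # scale so that x^T W x = 1
    ok = ok and bool(x @ Q @ x >= lam_min - 1e-9)
print(ok)   # True
```

One eigenvalue computation thus certifies a valid lower bound over the whole ellipsoid, which is the starting point for maximizing λ_min(w) over U.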
One can further simplify the problem by defining d = (d_1, ..., d_n) ∈ R^n such that
Σ_{i=1}^{n} d_i = 0. Let D = diag(d). If
for x ∈ S. Now, define z = νw + d and let Z = diag(z). For all x ∈ S,
subject to
Figure 12.6 Procedure qplb: Interior point algorithm for computing lower bounds
Let M(z) = Q - Z. Observe that solving the above problem amounts to minimizing
the trace of M(z) while keeping M(z) positive semidefinite. Since M(z) is real
and symmetric, it has n real eigenvalues λ_i(M(z)), i = 1, ..., n. To ensure positive
semidefiniteness, the eigenvalues of M(z) must be nonnegative. Hence, the above problem
is reformulated as
min tr(M(z))
subject to
λ_i(M(z)) ≥ 0,  i = 1, ..., n.
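One simple feasible point for this trace minimization is z_i = λ_min(Q) for all i, which shifts the spectrum of M(z) = Q - Z into the nonnegative half-line (this ignores the chapter's additional structure on z and is only meant to illustrate the trace/semidefiniteness tradeoff; data made up):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((6, 6))
Q = (B + B.T) / 2

lam_min = np.linalg.eigvalsh(Q)[0]
z = np.full(6, lam_min)                  # feasible: shifts all eigenvalues up
M = Q - np.diag(z)
print(np.linalg.eigvalsh(M)[0] >= -1e-10)                   # True: M(z) is PSD
print(np.isclose(np.trace(M), np.trace(Q) - 6 * lam_min))   # True
```

The interior point method then searches over feasible z for the smallest achievable trace, rather than settling for this uniform shift.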
Kamath and Karmarkar [9, 10] proposed an interior point approach to solve the above
trace minimization problem that takes no more than O(nL) iterations, with two
matrix inversions per iteration. Figure 12.6 shows a pseudo-code for this algorithm.
To analyze the algorithm, consider the parametric family of potential functions given
by
g(z, ν) = 2n ln(tr(M(z)) - ν) - ln det(M(z)),
where ν ∈ R is a parameter. This algorithm will generate a monotonically increasing
sequence of parameters ν^{(k)} that converges to the optimal value ν*. The sequence
ν^{(k)} is constructed together with the sequence z^{(k)} of interior points, as shown in the
pseudo-code in Figure 12.6. Since Q - Z^{(0)} is a positive definite matrix, ν^{(0)} = 0 ≤ ν*
is used as the initial point in the sequence.
g_1^{(k)}(z, ν) = -(2n / (tr(M(z^{(k)})) - ν)) e^T z + diag(M(z^{(k)})^{-1})^T z + C,
where C is a constant. Kamath and Karmarkar show how g_1^{(k)}(z, ν) can be reduced
by a constant amount at each iteration. They prove that it is possible to compute
ν^{(k+1)} ∈ R and a point z^{(k+1)} in a closed ball of radius α centered at z^{(k)} such that
ν^{(k)} ≤ ν^{(k+1)} ≤ ν* and
Using this fact, they show that, if z^{(k)} is the current interior point and ν^{(k)} ≤ ν* is
the current estimate of the optimal value, then
g(z^{(k+1)}, ν^{(k+1)}) - g(z^{(k)}, ν^{(k)}) ≤ -α + α^2/(2(1 - α)),
where z^{(k+1)} and ν^{(k+1)} are the new interior point and new estimate, respectively.
This proves polynomial-time complexity for the algorithm.
y = Mx + q,  x ≥ 0,  y ≥ 0,  x^T y = 0,
or proving that no such (x, y) pair exists. Although the general LCP is NP-hard,
some classes of LCP's can be solved by polynomial time algorithms. For example,
when M is positive semidefinite, the LCP is a convex quadratic program and can
be solved by the ellipsoid algorithm or several interior point algorithms (e.g., see
Kojima et al. [18]). Other classes of polynomially solvable LCP's are discussed in
[7, 20, 37].
The general linear complementarity problem is equivalent to the mixed integer feasi-
bility problem. It has been shown by Pardalos and Rosen [27] that any general LCP
can be reduced to a mixed (0,1) integer feasibility problem. On the other hand,
using the observation that every constraint of the form z ∈ {0, 1} is equivalent to
z + w = 1,  z ≥ 0,  w ≥ 0,  zw = 0,
it is easy to show that any mixed (0,1) integer feasibility problem can be formulated
as an LCP [26]. Furthermore, the problem of checking existence of a Karush-Kuhn-Tucker
point of the nonconvex quadratic problem of the form
subject to
x ≥ 0,
can be reduced to a symmetric LCP, which has been shown to be NP-complete [8].
φ(x, y) = (ρ - n) log(x^T y) - Σ_{j=1}^{n} log(x_j y_j / x^T y),
where ρ > n, which is defined for any (x, y) > 0. The gradients of the potential
function are
where e is the vector of all ones. Starting from an interior point (x^0, y^0) that satisfies
where L is the size of M and q, the potential reduction algorithm generates a sequence
of interior feasible solutions {x^k, y^k}, terminating at a point such that
subject to
δy = M δx,
‖(X^k)^{-1} δx‖_2^2 + ‖(Y^k)^{-1} δy‖_2^2 ≤ β^2 < 1,
is solved. The scaled gradient projection vector onto the null space of the scaled
equality constraints is
(12.48)
where
(12.49)
and Δ^k = (x^k)^T y^k, and X^k (resp. Y^k) denotes the diagonal matrix of x^k (resp. y^k).
Then, for some β > 0, assign
(12.51)
where
α(‖p^k‖_2^2) = ‖p^k‖_2^2 / (2(ρ + 2)),  if ‖p^k‖_2^2 ≤ (ρ + 2)^2/4,
α(‖p^k‖_2^2) = (ρ + 2)/8,  otherwise.
procedure lcp(M, q)
1  Find x^0 such that y^0 = M x^0 + q > 0;
2  do (x^k)^T y^k ≥ 2^{-L} →
3    π^k = ((Y^k)^2 + M (X^k)^2 M^T)^{-1} (Y^k - M X^k)(X^k Y^k e - (Δ^k/ρ) e);
4    p_1^k = (ρ/Δ^k) X^k (y^k + M^T π^k) - e;
5    p_2^k = (ρ/Δ^k) Y^k (x^k - π^k) - e;
6    p^k = (p_1^k, p_2^k);
7    β = min(‖p^k‖_2^2/(2(ρ + 2)), 1/2);
8    x^{k+1} = x^k - β X^k p_1^k / ‖p^k‖_2;
9    y^{k+1} = y^k - β Y^k p_2^k / ‖p^k‖_2;
10   k = k + 1;
11 od;
end lcp;
Pseudo code is given in Figure 12.7 for the potential reduction algorithm for the
LCP.
From (12.51), it follows that ‖p^k‖_2^2 partially determines the potential reduction at
the k-th iteration. Note that the potential reduction increases with ‖p^k‖_2^2. Let
g(x, y) = (ρ/(x^T y)) X y - e
and
Then
‖p^k‖_2^2 = g(x^k, y^k)^T H(x^k, y^k) g(x^k, y^k).
Ye and Pardalos [37] define the condition number for the LCP as
γ(M, q) = inf{ ‖g(x, y)‖_H^2 : x^T y ≥ 2^{-L}, φ(x, y) ≤ O(ρL) and (x, y) > 0 },
where ‖g(x, y)‖_H^2 denotes g(x, y)^T H(x, y) g(x, y). They prove that the algorithm
with ρ > n solves the LCP's for which γ(M, q) > 0 in O(nL/γ(M, q)) iterations,
each of which requires the solution of one system of linear equations. Consequently,
they show that LCP's for which γ(M, q) > 0 and 1/γ(M, q) is bounded above by
a polynomial in L and n can be solved in polynomial time. Thus, the condition
number represents the degree of difficulty of the potential reduction algorithm. Furthermore, the condition number suggests that convexity (or positive semidefiniteness
of the matrix M in LCP) may not be the basic issue that separates the polynomially
solvable classes from the class of NP-complete problems.
Many classes of nonconvex LCP's have been identified that can be solved in polynomial
time by this algorithm.
The advent of interior point methods has provided new alternatives for designing exact
algorithms, as well as heuristics, for many classes of global optimization problems.
In this chapter, we restricted ourselves to applications of interior point methods for
quadratic and combinatorial optimization problems, as well as nonconvex potential
functions.
There is a vast amount of literature on interior point methods for linear and con-
vex programming, as well as applications in global and combinatorial optimiza-
tion. We direct the reader to the interior point World Wide Web page at the URL
http://www.mcs.anl.gov/home/otc/InteriorPoint.
REFERENCES
[1] I. Adler, M.G.C. Resende, G. Veiga, and N. Karmarkar. An implementation
of Karmarkar's algorithm for linear programming. Mathematical Programming,
44:297-335, 1989.
[2] F. Alizadeh. Optimization over positive semi-definite cone: Interior-point methods and combinatorial applications. In P.M. Pardalos, editor, Advances in Optimization and Parallel Computing. North-Holland, Amsterdam, 1992.
[3] I.I. Dikin. Iterative solution of problems of linear and quadratic programming.
Soviet Math. Doklady, 8:674-675, 1967.
[5] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins
University Press, Baltimore, 1989.
[6] C.-G. Han, P.M. Pardalos, and Y. Ye. On the solution of indefinite quadratic
problems using an interior point algorithm. Informatica, 3:474-496, 1992.
[7] R. Horst and P.M. Pardalos, editors. Handbook of Global Optimization. Kluwer
Academic Publishers, Amsterdam, 1995.
[8] R. Horst, P.M. Pardalos, and N.V. Thoai. Introduction to Global Optimization.
Kluwer Academic Publishers, Amsterdam, 1995.
[9] A.P. Kamath and N. Karmarkar. A continuous method for computing bounds
in integer quadratic optimization problems. Journal of Global Optimization,
2:229-241, 1992.
[10] A.P. Kamath and N. Karmarkar. An O(nL) iteration algorithm for computing
bounds in quadratic optimization problems. In P.M. Pardalos, editor, Complexity
in Numerical Optimization, pages 254-268. World Scientific, Singapore, 1993.
[11] A.P. Kamath, N. Karmarkar, K.G. Ramakrishnan, and M.G.C. Resende. A continuous
approach to inductive inference. Mathematical Programming, 57:215-238, 1992.
[17] N.K. Karmarkar. A new polynomial time algorithm for linear programming.
Combinatorica, 4:373-395, 1984.
[19] K. Levenberg. A method for the solution of certain problems in least squares.
Quart. Appl. Math., 2:164-168, 1944.
[24] J.J. Moré and D.C. Sorensen. Computing a trust region step. SIAM J. Sci.
Stat. Comput., 4:553-572, 1983.
[25] P.M. Pardalos, Y. Ye, C.-G. Han, and J. Kaliski. Solution of P-matrix linear complementarity problems using a potential reduction algorithm. SIAM J.
Matrix Anal. & Appl., 14:1048-1060, 1993.
[27] P.M. Pardalos and J.B. Rosen. Global optimization approach to the linear
complementarity problem. SIAM J. Scient. Stat. Computing, 9:341-353, 1988.
[28] M.V. Ramana. An algorithmic analysis of multiplicative and semidefinite pro-
gramming problems. PhD thesis, The Johns Hopkins University, Baltimore,
1993.
[29] M.G.C. Resende and P.M. Pardalos. Interior point algorithms for network flow
problems. In J.E. Beasley, editor, Advances in linear and integer programming.
Oxford University Press, 1996.
[30] C.-J. Shi, A. Vannelli, and J. Vlach. An improvement on Karmarkar's algorithm
for integer programming. In P.M. Pardalos and M.G.C. Resende, editors, COAL
Bulletin - Special issue on Computational Aspects of Combinatorial Optimiza-
tion, number 21, pages 23-28. 1992.
[31] T. Tsuchiya and M. Muramatsu. Global convergence of a long-step affine scaling
algorithm for degenerate linear programming problems. Technical Report 423,
The Institute of Statistical Mathematics, Tokyo, 1992. To appear in SIAM J.
Opt.
[32] Stephen A. Vavasis. Nonlinear Optimization, Complexity Issues. Oxford Uni-
versity Press, Oxford, 1991.
[33] J.P. Warners, T. Terlaky, C. Roos, and B. Jansen. Potential reduction algo-
rithms for structured combinatorial optimization problems. Technical Report
95-88, Delft University of Technology, Delft, 1995.
[34] J.P. Warners, T. Terlaky, C. Roos, and B. Jansen. A potential reduction ap-
proach to the frequency assignment problem. Technical Report 95-98, Delft
University of Technology, Delft, 1995.
[35] Y. Ye. A new complexity result on minimization of a quadratic function with a
sphere constraint. In C.A. Floudas and P.M. Pardalos, editors, Recent Advances
in Global Optimization, pages 19-21. Princeton University Press, Princeton,
1992.
[36] Y. Ye. On affine scaling algorithms for nonconvex quadratic programming.
Mathematical Programming, 56:285-300, 1992.
[37] Y. Ye and P.M. Pardalos. A class of linear complementarity problems solvable
in polynomial time. Linear Algebra and its Applications, 152:3-17, 1991.
13
INTERIOR POINT APPROACHES
FOR THE VLSI PLACEMENT
PROBLEM
Anthony Vannelli, Andrew Kennings,
Paulina Chin
Department of Electrical and Computer Engineering
University of Waterloo
Waterloo, Ontario
CANADA N2L 3G1
ABSTRACT
VLSI placement involves arranging components on a two-dimensional board such that the
total interconnection wire length is minimized while avoiding component overlap and ensur-
ing enough area is provided for routing. Placement is accomplished in a two-step procedure.
The first step involves computing a good relative placement of all components while ignor-
ing overlap and routing. The second step involves removing overlap and routing. This
paper describes two new relative placement models that generate sparse LP and QP pro-
grams. The resulting LP and QP programs are efficiently solved using appropriate interior
point methods. In addition, an important extension is described to reduce module overlap.
Numerical results on a representative set of real test problems are presented.
13.1 INTRODUCTION
In the combinatorial sense, the layout problem is a constrained optimization prob-
lem. We are given a circuit (usually a module-wire connection-list called a netlist)
which is a description of switching elements and their connecting wires. We seek an
assignment of geometric coordinates of the circuit components (in the plane or in one
of a few planar layers) that satisfies the requirements of the fabrication technology
(sufficient spacing between wires, restricted number of wiring layers, and so on) and
that minimizes certain cost criteria. Practically, all aspects of the layout problem
as a whole are intractable; that is, they are NP-hard [4]. Consequently, we have to
resort to heuristic methods to solve very large problems. One of these methods is
to break up the problem into subproblems, which are then solved. Almost always,
501
T. Terlaky (ed.), Interior Point Methods of Mathematical Programming, 501-528.
© 1996 Kluwer Academic Publishers.
502 CHAPTER 13
these subproblems are NP-hard as well, but they are more amenable to heuristic so-
lutions than is the entire layout problem itself. Each one of the layout subproblems
is decomposed in an analogous fashion. In this way, we proceed to break up the
optimization problems until we reach primitive subproblems.
These subproblems are not decomposed further, but rather solved directly, either
optimally (if an efficient polynomial-time optimization algorithm exists) or approx-
imately if the subproblem is itself NP-hard or intractable, otherwise. The most
common way of breaking up the layout problem into subproblems is first to do logic
partitioning where a large circuit is divided into a collection of smaller modules
according to some criteria, then to perform component placement, and then to de-
termine the approximate course of the wires in a global routing phase. This phase
may be followed by a topological-compaction phase that reduces the area require-
ment of the layout, after which a detailed-routing phase determines the exact course
of the wires without changing the layout area. After detailed-routing, a geometric-
compaction phase may further reduce the layout area requirement [7].
In VLSI placement, which is the focus of this work, we are given a set of components
(modules) that are interconnected by a set of signal paths (nets). The objective is
to position the modules while minimizing the total wirelength required to connect
the modules. In positioning the modules, several placement constraints must be
considered to guarantee feasibility (a legal placement). For instance, the modules
must be placed within some given physical area and must not overlap. Furthermore,
the modules must be placed such that the nets can physically be connected (routing).
Examples of the placement problem arise in macrocell, gate array and standard cell
design [11].
Since the VLSI placement problem is computationally intractable and optimal place-
ments are difficult to produce, advanced heuristics such as Tabu Search [14], simu-
lated annealing [11] and hierarchical partitioning [10] are used to obtain near optimal
placements. Although these heuristics yield near optimal placements, they still tend
to require large computational times. However, it is well known that when good
initial placements are generated, these heuristics tend to converge quickly to near
optimal placements with low computational effort [5].
The linear program model of Weis and Mlynski is described in Section 13.2. A
new extension of a module-net-point model is described in Section 13.3. The model
is shown to be equivalent to a quadratic programming (QP) problem with a sparse
positive definite matrix in the objective function which can be solved efficiently using
an interior point method. Section 13.4 describes an important extension which can
be included to improve the relative placement by forcing overlapping modules further
apart. The quadratic interior point method used in this work to solve the relative
placement problem is described in Section 13.5. Numerical results on test problems
for both the LP and QP problems are presented in Section 13.6. Finally, Section
13.7 summarizes the results of this work and presents directions for future research.
Figure 13.1 A board with fixed modules (I/O pads) on the perimeter and free modules in the interior.
the modules, and then other methods are used to eliminate overlap and subsequently
form a legal placement. The force-directed method [12], which involves minimizing
an unconstrained quadratic function, is more commonly used, as it requires only the
solution of one linear system. However, the LP model often gives a more accurate
estimate of the total wirelength [13] and is easier to extend and generalize. The
formulation of the placement problem is a modification of Weis and Mlynski's model,
and it is presented below.
We must compute locations of M free modules that are connected by N nets. Some
of the N nets may involve connections with F fixed modules as well. The fixed
modules are usually I/O pads placed on the perimeter of the board (see Figure 13.1)
and their locations are known in advance.
Figure 13.2 A net connecting four modules and its circumscribing rectangle, with lower-left corner (u_j, û_j) and upper-right corner (v_j, v̂_j).
2. The wirelength required for a net is approximated by half the perimeter of the
net's circumscribing rectangle ((v_j − u_j) + (v̂_j − û_j) in Figure 13.2). For 2-
module or 3-module nets, this measure is equivalent to that given by a minimal
spanning Steiner tree.
• {(u_j, û_j), j = 1, 2, ..., N} and {(v_j, v̂_j), j = 1, 2, ..., N}, the lower-left and
upper-right corners of the nets' circumscribing rectangles, as in Figure 13.2
(i.e., if module i is connected to net j, then it is within net j's circumscribing
rectangle; consequently, u_j ≤ x_i ≤ v_j and û_j ≤ y_i ≤ v̂_j).
For convenience, we let the vectors x, y, u, û, v, and v̂ contain all the components
x_i, y_i, u_j, û_j, v_j, v̂_j, respectively. We find the values of these vectors so that the
sum of the circumscribing rectangles' perimeters over all nets is as small as possible.
That is, we wish to minimize the cost function
    Σ_{j=1}^{N} [w_j(v_j − u_j) + ŵ_j(v̂_j − û_j)],   (13.1)
where w = [w_1, w_2, ..., w_N]^T and ŵ = [ŵ_1, ŵ_2, ..., ŵ_N]^T are weights on the nets.
These can be adjusted to obtain different layouts. Initially, w_j = ŵ_j = 1 for all
nets connecting only free modules. The w_j and ŵ_j values for nets connecting free
modules to fixed modules are then adjusted in order to distribute modules as evenly
as possible over the given board area, so as to avoid clustering.
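As a concrete illustration, the weighted half-perimeter objective (13.1) can be evaluated directly from a netlist. The sketch below assumes a hypothetical netlist format (a list of module-index lists); the function name is illustrative.

```python
import numpy as np

def hpwl_cost(nets, x, y, w=None, w_hat=None):
    """Weighted half-perimeter wirelength, as in cost function (13.1).

    nets   : list of nets, each a list of module indices (illustrative format)
    x, y   : arrays of module coordinates
    w, w_hat : per-net weights on the x- and y-spans (default 1)
    """
    N = len(nets)
    w = np.ones(N) if w is None else w
    w_hat = np.ones(N) if w_hat is None else w_hat
    cost = 0.0
    for j, net in enumerate(nets):
        u, v = min(x[i] for i in net), max(x[i] for i in net)          # x-span
        u_hat, v_hat = min(y[i] for i in net), max(y[i] for i in net)  # y-span
        cost += w[j] * (v - u) + w_hat[j] * (v_hat - u_hat)
    return cost
```

For a single two-pin net between modules at (0, 0) and (2, 3), the cost is the half-perimeter 2 + 3 = 5.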
1. Each net has a minimum width D > 0 which can be varied to give the desired
distribution over the board area:
3. Each free module must be within the circumscribing rectangle of a net to which
it is connected:
Because the x-direction and y-direction variables and constraints are independent of
one another and the cost function can be separated, we can solve two independent
linear programs, one for each direction. For the sake of brevity, only the LP formed
for the x-direction is shown:
    minimize   w^T(v − u)

    subject to
        [ I   −I    0 ]            [ −D·e ]
        [ 0    0   −I ]   [ u ]    [   0  ]
        [ 0    0    I ]   [ v ]  ≤ [  X·e ]      (13.7)
        [ Q    0   −P ]   [ x ]    [   0  ]
        [ 0   −Q    P ]            [   0  ]
        [ I    0    0 ]            [   g  ]
        [ 0   −I    0 ]            [   h  ]
where e is the vector of ones, and g and h are vectors containing the bounds on u
and v. I is the identity matrix, while P and Q are matrices containing a single entry
of 1 on each row.
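The constraint rows above can be assembled mechanically from the module-net connection list. The following sketch builds a dense version of the x-direction constraints under an assumed row ordering; the function name and row ordering are illustrative, and a real implementation would use a sparse matrix format, since each row has at most two nonzeros.

```python
import numpy as np

def build_x_lp(connections, M, N, D, X):
    """Assemble x-direction placement constraints as A z <= b, z = [u, v, x].

    connections lists (module i, net j) pairs; each pair contributes one row
    of P and Q, which have a single 1 per row as in LP (13.7).
    """
    K = len(connections)
    rows = N + 2 * M + 2 * K            # width, box, and containment rows
    A = np.zeros((rows, 2 * N + M))
    b = np.zeros(rows)
    r = 0
    for j in range(N):                  # u_j - v_j <= -D  (net width >= D)
        A[r, j], A[r, N + j], b[r] = 1.0, -1.0, -D
        r += 1
    for i in range(M):                  # 0 <= x_i <= X
        A[r, 2 * N + i], b[r] = -1.0, 0.0
        A[r + 1, 2 * N + i], b[r + 1] = 1.0, X
        r += 2
    for (i, j) in connections:          # u_j <= x_i <= v_j
        A[r, j], A[r, 2 * N + i] = 1.0, -1.0
        A[r + 1, 2 * N + i], A[r + 1, N + j] = 1.0, -1.0
        r += 2
    return A, b
```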
When all fixed modules are fixed in both the x- and y-directions, the constraint
matrices for the x-direction and y-direction LPs are identical, although in general
the right-hand-side vectors differ. More precisely, the y-direction LP looks
identical to (13.7), with u, v, x and w replaced by û, v̂, y and ŵ, and with
right-hand-side components D, X, g and h replaced by the corresponding values
D̂, Ŷ, ĝ and ĥ.
It is possible to have modules that are fixed in only one direction. For example, we
may wish to specify that an I/O pad can be placed anywhere on the left edge of the
Figure 13.3 The constraint matrix sparsity pattern from a placement LP.
board. In this case, the I/O pad is considered to be a fixed module in the x-direction
but a free module in the y-direction. Consequently, the two constraint matrices will
differ in structure.
Note that, by changing the sign of the cost function, LP (13.7) can be written in
standard dual form, with inequality constraints:

    maximize   b^T y
    subject to A^T y ≤ c.      (13.8)
The constraint matrix A (the transpose of the constraint matrix in LP (13.7)) has
M + 2N rows. The number of columns varies, depending on the number of module-
net connections, but is typically two to four times the number of rows. Real-life
applications involve thousands of nets and modules, with larger problems being
formulated continually. Although the matrix can be very large, it is extremely
sparse, with only one or two nonzeros in each column. Figure 13.3 shows the sparsity
pattern of the matrix A from a typical placement example. (Note that the ordering
of rows and columns may not be the same as that shown in LP (13.7).)
Let the location of free module i be (x_i, y_i) and the location of net j be (U_j, V_j). In
this case, we attempt to find one location for each net by defining (U_j, V_j), as
compared to the previous LP model, which used (u_j, û_j) and (v_j, v̂_j) to describe the
respective lower-left and upper-right corner locations of the circumscribing rectangle
that contains the net. Finally, let the location of fixed module i be (c_i, d_i). To
denote the module-net interconnections, let
    n_ij = 1 if free module i is connected to net j, and 0 otherwise;
    n̂_ij = 1 if fixed module i is connected to net j, and 0 otherwise.
    f = ½ Σ_{i=1}^{M} Σ_{j=1}^{N} n_ij [(x_i − U_j)² + (y_i − V_j)²]
      + ½ Σ_{i=1}^{F} Σ_{j=1}^{N} n̂_ij [(c_i − U_j)² + (d_i − V_j)²]   (13.9)
      = f_x + f_y,   (13.10)
where

    f_x = ½ Σ_{i=1}^{M} Σ_{j=1}^{N} n_ij (x_i − U_j)² + ½ Σ_{i=1}^{F} Σ_{j=1}^{N} n̂_ij (c_i − U_j)²   (13.11)

and

    f_y = ½ Σ_{i=1}^{M} Σ_{j=1}^{N} n_ij (y_i − V_j)² + ½ Σ_{i=1}^{F} Σ_{j=1}^{N} n̂_ij (d_i − V_j)².   (13.12)
The objective is to find M module points and N net points to minimize the objective
function f. Note that minimizing f can be performed by minimizing fx and fy
independently, which implies the two-dimensional placement problem is equivalent
to solving two one-dimensional problems. The rest of the discussion involves fx only,
but extends to fy without any loss of generality.
Let x = [x_1, x_2, ..., x_M]^T and U = [U_1, U_2, ..., U_N]^T be vectors representing the
module and net points, respectively, and let z = [x^T, U^T]^T. The objective function
f_x can be conveniently rewritten in the following matrix form:

    f_x = ½ z^T B z + g^T z + h,   (13.13)
where the two diagonal blocks of the (M+N)×(M+N) matrix B have entries

    Σ_{k=1}^{N} n_ik  if i = j,  and 0 otherwise,   (13.15)

and

    Σ_{k=1}^{M} n_kj + Σ_{k=1}^{F} n̂_kj  if i = j,  and 0 otherwise,   (13.16)

respectively, while the off-diagonal blocks have entries −n_ij. The linear cost vector g is given by

    g_{M+j} = −Σ_{i=1}^{F} n̂_ij c_i,  j = 1, ..., N,  with g_i = 0 for i = 1, ..., M,   (13.17)

and the constant term is h = ½ Σ_{i=1}^{F} Σ_{j=1}^{N} n̂_ij c_i².
(13.19)
(13.20)

where

    h_j = Σ_{i=1}^{M} n_ij + Σ_{i=1}^{F} n̂_ij.   (13.21)
With this restriction, and assuming each net is wired as its Steiner tree, the resulting
objective function f should closely approximate the wirelength.
Second let the maximum dimension of the placement area be (X, Y). All free mod-
ules must be constrained such that they are positioned within the placement area.
Therefore, we have
    0 ≤ x_i ≤ X.   (13.22)
Finally, in relative placement it is desirable to obtain an even spread of free modules
over the placement area (i.e., to avoid clustering). This is obtained by including the
first moment constraint
    Σ_{i=1}^{M} x_i = M·X/2   (13.23)
to force an even spread of free modules around the centre of the placement area.
    minimize   ½ z^T B z + g^T z + h
    subject to A z = b,   (13.24)
               0 ≤ z ≤ X·e.
Such a problem can be efficiently solved using a quadratic interior point method.
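Ignoring the bound constraints, the equality-constrained core of (13.24) reduces to a linear KKT system. The sketch below solves only that core; it is not the quadratic interior point method the chapter uses, which also handles 0 ≤ z ≤ X·e, and the function name is illustrative.

```python
import numpy as np

def eq_qp_solve(B, g, A, b):
    """Minimize 0.5 z^T B z + g^T z subject to A z = b (bounds ignored).

    Solves the KKT system [B A^T; A 0][z; y] = [-g; b], assuming it is
    nonsingular; y holds the equality-constraint multipliers.
    """
    n, m = B.shape[0], A.shape[0]
    K = np.block([[B, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([-g, b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                     # z part of the KKT solution
```

Minimizing ½||z||² subject to z_1 + z_2 = 2, for instance, yields z = (1, 1).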
To improve the spread of the free modules, a second moment constraint can also be imposed:

    (1/N) Σ_{i=1}^{M} x_i² = σ² + m²,   (13.25)
where (J'2 is a desired variance and m is the average position of the free modules (i.e.,
the centre of the board). With the second moment constraint the relative placement
problem becomes
    minimize   ½ z^T B z + g^T z + h
    subject to A z = b,
               ½ z^T D z = W,   (13.26)
               0 ≤ z ≤ X·e,
where W = ½N(σ² + m²) and D is a diagonal matrix with either 0 or 1 on the
diagonal to pick off the components of z corresponding to the free modules. The
resulting problem now contains a quadratic equality constraint, and only a locally
optimal solution can be guaranteed.
    z ← z + αd   (13.27)
If we determine the initial solution z by solving the original MNP model given by
(13.24) (i.e., without the second moment constraint), then the following lemma holds.

Lemma 13.4.1 Let d ≠ 0 be a solution to the linearized problem (13.28) and let
z be a point satisfying the equality constraints Az = b and the variable bounds
0 ≤ z ≤ X·e. The updated solution z + αd will also satisfy the equality constraints
A(z + αd) = b and the variable bounds for 0 ≤ α ≤ 1.

Proof: Since d solves (13.28), we have Ad = (b − Az) = 0. We find

    A(z + αd) = Az + αAd = Az = b.

The proof for the variable bounds is trivial for 0 ≤ α ≤ 1.
The penalty function associated with the second moment constraint is

    P(z) = ½ z^T B z + g^T z + h + ½ φ (W − ½ z^T D z)².   (13.29)

Proof: We have

    P(z + αd) = P(z) + α ∇P(z) d + O(α²)
              = P(z) + α [g^T + z^T B − φ(W − ½ z^T D z)(z^T D)] d + O(α²).   (13.30)

From the constraints of (13.28) we have

    z^T D d = W − ½ z^T D z.   (13.31)

Thus

    ∇P(z) d = g^T d + z^T B d − φ (W − ½ z^T D z)².   (13.32)

Clearly, for an appropriately large φ value, it follows that ∇P(z) d ≤ 0 and
P(z + αd) ≤ P(z) for α sufficiently small.
We perform a line search using d to minimize the penalty function. As φ → ∞, the
recursive quadratic program will approach a locally optimal solution. Assuming that
the initial solution provided by the original MNP model is good, then this locally
optimal solution should also be good, with an increase in the module spreading. Of
course, since the penalty function is not an exact penalty function, for a finite value
of φ the second moment constraint will not be exactly satisfied [9]. However, the
module spread will be increased.
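The line search on the penalty function can be sketched as a simple backtracking loop. The penalty form used below, P(z) = ½z^TBz + g^Tz + h + ½φ(W − ½z^TDz)², is an assumption chosen to be consistent with the gradient used in (13.30)-(13.32); the function name is illustrative.

```python
import numpy as np

def penalty_linesearch(z, d, B, g, h, D, W, phi, steps=50):
    """Backtracking search on the quadratic-penalty function for the
    second moment constraint 0.5 z^T D z = W.

    Halves alpha until an improving step is found (simple improvement
    test rather than an Armijo condition, to keep the sketch short).
    """
    def P(v):
        viol = W - 0.5 * v @ (D @ v)
        return 0.5 * v @ (B @ v) + g @ v + h + 0.5 * phi * viol ** 2
    alpha, best = 1.0, P(z)
    for _ in range(steps):
        if P(z + alpha * d) < best:
            return z + alpha * d, alpha
        alpha *= 0.5                    # shrink toward alpha = 0
    return z, 0.0                       # no improving step found
```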
13.5.1 Theory
The primal-dual algorithm is derived by applying a logarithmic barrier function
to the primal problem in order to eliminate the nonnegativity constraints. The
resulting barrier problem is given by

    minimize   c^T x + ½ x^T Q x − μ Σ_j ln x_j − μ Σ_j ln s_j
    subject to A x = b,  x + s = u.
A similar approach applied to the dual problem yields a dual barrier problem.
Assuming a point satisfying {(x, s, r, w, y) : x, s, r, w > 0} and a fixed value of
the barrier parameter μ > 0, the first order conditions for simultaneous optimality
of the primal and dual barrier problems are:
    A x = b   (13.36)
    x + s = u   (13.37)
    A^T y − w + r − Q x = c   (13.38)
    X R e = μe   (13.39)
    S W e = μe   (13.40)

where e denotes the n-vector of ones, and X, S, W and R are diagonal matrices
containing the components of x, s, w and r, respectively. Equations (13.36) and
(13.37) guarantee primal feasibility and equation (13.38) guarantees dual feasibility.
Equations (13.39) and (13.40) represent the μ-complementarity conditions.
The idea behind the primal-dual interior point algorithm can be stated as follows.
Let (x_μ, s_μ, r_μ, w_μ, y_μ) denote the solution of the optimality conditions for any value
μ > 0, and let (x*, s*, r*, w*, y*) denote the solution as μ tends to zero. Given
an initial point (x, s, r, w, y), the primal-dual algorithm uses one step of Newton's
method to try to find a point closer to (x_μ, s_μ, r_μ, w_μ, y_μ). This becomes the new
solution and the barrier parameter μ is reduced appropriately. This process is continued
until μ is sufficiently close to zero and the solution (x*, s*, r*, w*, y*) is obtained.
It follows from the first order optimality conditions that this solution is both primal
and dual feasible, and the duality gap is zero. Thus, (x*, s*) is optimal for the primal
problem, and (r*, w*, y*) is optimal for the dual problem.
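The μ-reduction scheme described above can be sketched end to end. The code below simplifies to a standard-form LP (no Q term and no upper bounds u), so it is a toy illustration of the path-following idea rather than the chapter's QP algorithm; the μ-reduction factor and step damping are assumptions.

```python
import numpy as np

def primal_dual_lp(A, b, c, tol=1e-8, max_iter=200):
    """Path-following sketch for min c^T x s.t. A x = b, x >= 0."""
    m, n = A.shape
    x, s, y = np.ones(n), np.ones(n), np.zeros(m)
    for _ in range(max_iter):
        if x @ s / n < tol and np.linalg.norm(b - A @ x) < tol:
            break
        mu = 0.1 * (x @ s) / n          # reduced complementarity target
        r_p = b - A @ x                 # primal residual
        r_d = c - A.T @ y - s           # dual residual
        r_c = mu - x * s                # centering residual
        # eliminate dx, ds to get normal equations in dy
        M = (A * (x / s)) @ A.T
        dy = np.linalg.solve(M, r_p - A @ ((r_c - x * r_d) / s))
        dx = (r_c - x * r_d) / s + (x / s) * (A.T @ dy)
        ds = r_d - A.T @ dy
        alpha = 1.0                     # damped step to stay positive
        for v, dv in ((x, dx), (s, ds)):
            neg = dv < 0
            if neg.any():
                alpha = min(alpha, 0.9 * float(np.min(-v[neg] / dv[neg])))
        x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
    return x, y, s
```

On the toy LP min x_1 + 2x_2 subject to x_1 + x_2 = 1, x ≥ 0, the iterates approach the optimum x = (1, 0).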
Applying Newton's method to the first-order optimality conditions yields the follow-
ing set of linear equations for the search direction (Δx, Δs, Δy, Δw, Δr):

    A Δx = b − A x,
    Δx + Δs = u − x − s,
    A^T Δy − Δw + Δr − Q Δx = c − A^T y + w − r + Q x,   (13.41)
    R Δx + X Δr = μe − X R e,
    W Δs + S Δw = μe − S W e.   (13.42)
The remaining components of the search direction are then recovered as

    Δs = (u − x − s) − Δx,   (13.44)
    Δw = S⁻¹(μe − S W e − W Δs),   (13.45)
    Δr = X⁻¹(μe − X R e − R Δx).   (13.46)
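Given Δx, the remaining direction components follow by cheap componentwise back-substitution. The formulas below are the reconstruction of (13.44)-(13.46) used here (an assumption, obtained by linearizing XRe = μe and SWe = μe), with diagonal matrices represented as vectors.

```python
import numpy as np

def back_substitute(dx, x, s, r, w, u, mu):
    """Recover ds, dw, dr from dx; all diagonal matrices stored as vectors."""
    ds = (u - x - s) - dx               # from dx + ds = u - x - s
    dw = (mu - s * w - w * ds) / s      # from W ds + S dw = mu e - SWe
    dr = (mu - x * r - r * dx) / x      # from R dx + X dr = mu e - XRe
    return ds, dw, dr
```

The recovered directions satisfy the linearized complementarity equations by construction.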
The solution of the augmented system at each iteration of the interior point method
represents the main computational burden of the algorithm. Eliminating Δs, Δw and
Δr from (13.41) leaves an augmented system of the form

    [ −(Q + X⁻¹R + S⁻¹W)   A^T ] [ Δx ]
    [         A             0  ] [ Δy ]  =  rhs,   (13.48)

where the right-hand side collects the primal, dual and complementarity residuals.
A stopping criterion is also required. Optimality requires both primal and dual
feasibility and that the duality gap is below a preselected threshold. Primal and
dual feasibility are measured by
Two issues arise during the progression of the interior point algorithm. The first
issue is the computation of the parameter μ. From the optimality conditions we see
that

    x^T r = nμ,   s^T w = nμ.   (13.52)

One way to recover μ is to compute

    μ = (x^T r + s^T w) / (2n)   (13.53)

and reduce this value by one tenth to move closer to optimality at each stage of the
algorithm.
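Recovering μ as in (13.53) and shrinking it takes one line. Whether "reduce by one tenth" means multiplying by 0.1 or by 0.9 is ambiguous in the text, so the factor below is an assumption, as is the function name.

```python
import numpy as np

def update_mu(x, r, s, w, factor=0.1):
    """Recover mu from (13.52)-(13.53) as average complementarity, then shrink."""
    n = len(x)
    mu = (x @ r + s @ w) / (2 * n)      # consistent with x^T r = s^T w = n mu
    return factor * mu
```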
The second issue which must be addressed is the selection of the step size α̂ used
to update the solution at each iteration. The step size α̂ must be selected to ensure
positivity of the nonnegative variables. In this work we use:

    α̂ = 1 / max { max_j { −Δx_j/x_j , −Δw_j/w_j , −Δs_j/s_j , −Δr_j/r_j } / 0.95 , 1 }.   (13.54)
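The step-size rule (13.54) can be implemented generically over the (variable, direction) pairs; the function name is illustrative.

```python
import numpy as np

def max_step(vars_and_dirs, damping=0.95):
    """Largest step keeping all nonnegative variables positive, as in (13.54).

    vars_and_dirs : iterable of (v, dv) pairs, e.g. (x, dx), (w, dw), ...
    Returns 1 / max(worst_ratio / damping, 1), i.e. a damped step to the
    boundary, capped at a full step of 1.
    """
    worst = 0.0                         # largest ratio -dv_j / v_j seen
    for v, dv in vars_and_dirs:
        worst = max(worst, float((-dv / v).max()))
    return 1.0 / max(worst / damping, 1.0)
```

For x = (1) and Δx = (−2) the boundary is reached at step ½, so the damped step is 0.95/2 = 0.475; if no direction component is negative, the full step 1 is returned.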
13.6 NUMERICAL RESULTS

Table 13.1 shows the size information for the 8 test problems. This chart provides the
number of free modules, fixed modules and nets for each test case. In addition, the
number of rows m, the number of columns n, and the number of nonzero elements in
the constraint matrix A are shown. In the tests documented below, only the results
for the x-direction linear programs are reported. Because the constraint matrix is
the same for both the x-direction and y-direction problems, the y-direction results
would yield similar conclusions.
Comparisons were made with both the Barrier and Simplex options of the CPLEX LP
package [3]. Feasibility and duality gap tolerances of 10⁻⁴ were used as the stopping
criteria in the CPLEX tests and subsequent QP testing. Table 13.2 shows the total
iterations taken; i.e., the interior-point iterations for CPLEX Barrier and the total
iterations for the CPLEX Simplex option.
Table 13.1  Size information for the test problems.

    problem     free modules  fixed modules  nets    rows    columns  nonzeros
    fnn4        140           20             115     370     1481     2652
    place1      264           36             294     852     2444     4216
    fnn8        440           36             291     1022    2444     8556
    primary1    752           81             902     2556    4745     14462
    place2      2194          77             2192    6578    21888    35553
    primary2    2907          107            3029    8965    31078    60470
    place3      6208          209            5711    17630   69330    105452
    place4      11741         400            12949   37639   129716   233983
Table 13.2 also shows the execution times in CPU seconds for each of the solvers
tested. The Simplex solver was very efficient on the smaller problems, but showed
worse performance on the larger problems, which are the problems of interest.
CPLEX Barrier performed well on most of the test cases, but had the worst running
time for "primary2". Chin and Vannelli [2] develop iterative solvers to reduce the
related fill problems and execution times.
Since the performance of the algorithm depends on the efficiency in solving the augmented system, the statistics
for the augmented system are also shown in Table 13.3. This includes the number
of rows, columns and nonzeros in the augmented system.
For the quadratic MNP model, the relative placements are presented in Table 13.4
(x-direction only). Table 13.4 shows the total number of interior point iterations,
total solution times required to obtain the placements and the resulting estimate
of the wirelength. Also provided in Table 13.4 is the variance of the free modules,
which provides a measure of module spreading.
Figure 13.4 shows the relative placement obtained for chip1. As expected, there is
some spreading of modules due to the presence of the I/O pads, but in general the
modules tend to cluster towards the centre of the placement area. We now include
the second moment constraint to improve module spreading.
Figure 13.4 Relative placement for chip1; QP-MNP model. An "x" denotes the
position of a module.
Table 13.5 Results of the MNP placement model (variance constraint included).

Table 13.5 shows the total number of interior point iterations for all linearized QPs
(including the first QP required to generate the initial point), the total number of
QPs solved and the
total solution times. Also shown is the resulting estimate of the wirelength and the
new variance for the free modules. The new relative placement for chip1 is shown in
Figure 13.5. It can be observed that the modules are more evenly spread throughout
the placement area.
Few linearizations are required and the method quickly converges to a new solution
with improved variance. Also, general locations of modules are relatively unaffected
and the new estimate of the wirelength is close to that obtained from the original
MNP model.
Figure 13.5 Relative placement for chip1 with variance constraint. An "x" denotes
the position of a module.
13.7 CONCLUSIONS
In this work, two relative placement models were developed. The models result in
sparse LP or QP programs which are efficiently solved by interior point methods.
The sparsity and structure of these problems make them more suitable candidates
for these solvers than for Simplex-based variants.
In relative placement, modules tend to cluster around the centre of the placement
area and overlap with one another. A very efficient approach has been described
to prevent clustering and overlap. The resulting approach "forces" modules further
apart by increasing the variance. A small number of additional QPs are solved. This
approach has been shown to be effective.
Future work will include investigating ways to improve the module separation. Even
with the second moment constraint, modules will still cluster and overlap. A more
aggressive approach is to include constraints which directly prevent overlap. In
addition to making the problem nonlinear, these constraints couple the problems in
the x and y directions.
To move towards a placement without overlap, consider Figure 13.6, which shows
two modules i and j with dimensions (Wi, hi) and (Wj, hj ), respectively. Around each
module i we draw a circle with radius r_i such that the circle completely surrounds
module i. Let γ = {(i, j) : i = 1, ..., M, j = i+1, ..., M} denote the pairs of free
modules. Overlap between a pair of modules i and j can be prevented by including
the constraints
    √((x_i − x_j)² + (y_i − y_j)²) ≥ r_i + r_j,   ∀(i, j) ∈ γ.   (13.55)
We rewrite this constraint in a slightly more convenient form. We square both sides
of this expression to remove the square root and multiply through by 1/2 to get
    ½(x_i − x_j)² + ½(y_i − y_j)² ≥ ½(r_i + r_j)²,   ∀(i, j) ∈ γ.   (13.56)
Notice that the x and y coordinates of the modules now interact and the relative
placement can no longer be divided into two one-dimensional problems. Moreover,
constraints of the type given in (13.56) are nonconvex.
If we let z = [x^T, U^T, y^T, V^T]^T, then the overlap constraint for modules i and j, denoted
by g_ij(z), can be written as

    g_ij(z) = γ_ij − ½ z^T D_ij z ≤ 0,   (13.57)

where γ_ij = ½(r_i + r_j)² and D_ij is a sparse semidefinite matrix which picks off the
appropriate components of the vector z.
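The pairwise overlap test (13.57) and the selection of strongly overlapping pairs can be sketched as follows; function names are illustrative, and the quadratic form is evaluated directly on coordinates rather than through an explicit D_ij matrix.

```python
import numpy as np

def overlap_violation(x, y, r, i, j):
    """Evaluate g_ij = gamma_ij - 0.5 ||p_i - p_j||^2 as in (13.57);
    g_ij <= 0 means the circles around modules i and j do not overlap."""
    gamma = 0.5 * (r[i] + r[j]) ** 2
    return gamma - 0.5 * ((x[i] - x[j]) ** 2 + (y[i] - y[j]) ** 2)

def worst_pairs(x, y, r, k):
    """Return up to k most-overlapping free-module pairs, supporting the idea
    of adding overlap constraints only for pairs that strongly overlap."""
    M = len(x)
    pairs = [(overlap_violation(x, y, r, i, j), i, j)
             for i in range(M) for j in range(i + 1, M)]
    pairs.sort(reverse=True)            # largest violation first
    return [(i, j) for g, i, j in pairs[:k] if g > 0]
```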
Figure 13.6 Two modules i and j with dimensions (w_i, h_i) and (w_j, h_j).
One such inequality constraint is required for each pair of modules, resulting in
O(M2) additional inequality constraints. Although sparse, such a large number of
constraints may be prohibitive. However, it may not be necessary to include the
overlap constraint for every pair of modules. Practically, we consider first the pairs
of modules that strongly overlap. This observation may substantially reduce the
number of added constraints. One interesting consequence of using circles is that
they tend to extend over more area than is actually required by
the encompassed modules; this extra space between modules may prove useful for
routing.
Finally, the integration of the relative placement results with an approach such as
Tabu Search can lead to "legal placements" with no overlap. Such an approach was
developed by one of the authors [14]. We propose to include the relative placement
approaches described in this work as a preprocessing stage before the legal placement
is found using such an approach. This work is currently in progress.
Acknowledgements
This research was partially funded by a Natural Sciences and Engineering Research
Council of Canada (NSERC) Operating Grant, No. OGP 0044456.
REFERENCES
[1] J. R. Bunch and B. N. Parlett. Direct methods for solving symmetric indefinite
systems of linear equations. SIAM J. Numer. Anal., 8:639-655, 1971.
[2] P. Chin and A. Vannelli. Computational methods for an LP model of the place-
ment problem. Technical report, University of Waterloo, Waterloo, Ontario,
1994. UW E & C-94-02.
[3] CPLEX Optimization Inc. Using the CPLEX callable library and CPLEX mixed
integer library. Incline Village, NV. 1993.
[7] T. C. Hu and E. Kuh. Theory and concepts of circuit layout. In VLSI Circuit
Layout: Theory and Design, pp. 3-18, IEEE Press, New York, 1985.
[8] K. Kozminski. Benchmarks for layout synthesis - evolution and current status.
In Proceedings 28th ACM/IEEE Design Automation Conference, pp. 265-270,
1991.
[12] N. Sherwani. Algorithms for VLSI Physical Design Automation. Kluwer Aca-
demic Publishers, Norwell, Massachusetts, 1993.
[14] L. Song and A. Vannelli. A VLSI placement method using TABU search. Mi-
croelectronics Journal, 23:167-172, 1992.