
Interior Point Methods of Mathematical Programming

Applied Optimization
Volume 5

Series Editors:
Panos M. Pardalos
University of Florida, U.S.A.

Donald Hearn
University of Florida, U.S.A.

The titles published in this series are listed at the end of this volume.
Interior Point Methods
of Mathematical
Programming
Edited by

Tamas Terlaky
Delft University of Technology

KLUWER ACADEMIC PUBLISHERS


DORDRECHT / BOSTON / LONDON
A C.I.P. Catalogue record for this book is available from the Library of Congress.

ISBN-13: 978-1-4613-3451-4 e-ISBN-13: 978-1-4613-3449-1


DOI: 10.1007/978-1-4613-3449-1

Published by Kluwer Academic Publishers,


P.O. Box 17, 3300 AA Dordrecht, The Netherlands.

Kluwer Academic Publishers incorporates


the publishing programmes of
D. Reidel, Martinus Nijhoff, Dr W. Junk and MTP Press.

Sold and distributed in the U.S.A. and Canada


by Kluwer Academic Publishers,
101 Philip Drive, Norwell, MA 02061, U.S.A.

In all other countries, sold and distributed


by Kluwer Academic Publishers Group,
P.O. Box 322, 3300 AH Dordrecht, The Netherlands.

Printed on acid-free paper

All Rights Reserved


© 1996 Kluwer Academic Publishers
No part of the material protected by this copyright notice may be reproduced or
utilized in any form or by any means, electronic or mechanical,
including photocopying, recording or by any information storage and
retrieval system, without written permission from the copyright owner.
This book is dedicated to the memory of
Professor Gyorgy Sonnevend,
the father of analytic centers.
CONTENTS

PREFACE xv

Part I LINEAR PROGRAMMING 1

1 INTRODUCTION TO THE THEORY OF


INTERIOR POINT METHODS
Benjamin Jansen, Cornelis Roos, Tamas Terlaky 3
1.1 The Theory of Linear Programming 3
1.2 Sensitivity Analysis in Linear Programming 14
1.3 Concluding Remarks 30
REFERENCES 31

2 AFFINE SCALING ALGORITHM


Takashi Tsuchiya 35
2.1 Introduction 35
2.2 Problem and Preliminaries 38
2.3 The Affine Scaling Algorithm 40
2.4 Nondegeneracy Assumptions 47
2.5 Basic Properties of the Iterative Process 50
2.6 Global Convergence Proof Under a Nondegeneracy Assumption 54
2.7 Global Convergence Proof Without Nondegeneracy Assumptions 56
2.8 The Homogeneous Affine Scaling Algorithm 59
2.9 More on the Global Convergence Proof of the Affine Scaling
Algorithm 67
2.10 Why Two-Thirds is Sharp for the Affine Scaling? 68
2.11 Superlinear Convergence of the Affine Scaling Algorithm 69
2.12 On the Counterexample of Global Convergence of The Affine
Scaling Algorithm 70


2.13 Concluding Remarks 73


2.14 Appendix: How to Solve General LP Problems with the Affine
Scaling Algorithm 75
REFERENCES 77

3 TARGET-FOLLOWING METHODS FOR


LINEAR PROGRAMMING
Benjamin Jansen, Cornelis Roos, Tamas Terlaky 83
3.1 Introduction 83
3.2 Short-step Primal-dual Algorithms for LP 86
3.3 Applications 93
3.4 Concluding Remarks 121
REFERENCES 121

4 POTENTIAL REDUCTION ALGORITHMS


Kurt M. Anstreicher 125
4.1 Introduction 125
4.2 Potential Functions for Linear Programming 126
4.3 Karmarkar's Algorithm 130
4.4 The Affine Potential Reduction Algorithm 134
4.5 The Primal-Dual Algorithm 139
4.6 Enhancements and Extensions 142
REFERENCES 151

5 INFEASIBLE-INTERIOR-POINT ALGORITHMS
Shinji Mizuno 159
5.1 Introduction 159
5.2 An IIP Algorithm Using a Path of Centers 161
5.3 Global Convergence 164
5.4 Polynomial Time Convergence 172
5.5 An IIP Algorithm Using a Surface of Centers 175
5.6 A Predictor-corrector Algorithm 178
5.7 Convergence Properties 181
5.8 Concluding Remarks 184
REFERENCES 185

6 IMPLEMENTATION OF INTERIOR-POINT
METHODS FOR LARGE SCALE LINEAR
PROGRAMS
Erling D. Andersen, Jacek Gondzio, Csaba Meszaros,
Xiaojie Xu 189
6.1 Introduction 190
6.2 The Primal-dual Algorithm 193
6.3 Self-dual Embedding 200
6.4 Solving the Newton Equations 204
6.5 Presolve 225
6.6 Higher Order Extensions 230
6.7 Optimal Basis Identification 235
6.8 Interior Point Software 240
6.9 Is All the Work Already Done? 243
6.10 Conclusions 244
REFERENCES 245

Part II CONVEX PROGRAMMING 253

7 INTERIOR-POINT METHODS FOR CLASSES


OF CONVEX PROGRAMS
Florian Jarre 255
7.1 The Problem and a Simple Method 256
7.2 Self-Concordance 258
7.3 A Basic Algorithm 281
7.4 Some Applications 291
REFERENCES 293

8 COMPLEMENTARITY PROBLEMS
Akiko Yoshise 297
8.1 Introduction 297
8.2 Monotone Linear Complementarity Problems 300
8.3 Newton's Method and the Path of Centers 308
8.4 Two Prototype Algorithms for the Monotone LCP 316
8.5 Computational Complexity of the Algorithms 332

8.6 Further Developments and Extensions 339


8.7 Proofs of Lemmas and Theorems 345
REFERENCES 359

9 SEMIDEFINITE PROGRAMMING
Motakuri V. Ramana, Panos M. Pardalos 369
9.1 Introduction 369
9.2 Geometry and Duality 370
9.3 Algorithms and Complexity 377
9.4 Applications 383
9.5 Concluding Remarks 390
REFERENCES 391

10 IMPLEMENTING BARRIER METHODS FOR


NONLINEAR PROGRAMMING
David F. Shanno, Mark G. Breitfeld, Evangelia M.
Simantiraki 399
10.1 Introduction 399
10.2 Modified Penalty-Barrier Methods 402
10.3 A Slack Variable Alternative 407
10.4 Discussion and Preliminary Numerical Results 411
REFERENCES 413

Part III APPLICATIONS, EXTENSIONS 415

11 INTERIOR POINT METHODS FOR


COMBINATORIAL OPTIMIZATION
John E. Mitchell 417
11.1 Introduction 417
11.2 Interior Point Branch and Cut Algorithms 419
11.3 A Potential Function Method 441
11.4 Solving Network Flow Problems 445
11.5 The Multicommodity Network Flow Problem 451
11.6 Computational Complexity Results 455
11.7 Conclusions 457
REFERENCES 459

12 INTERIOR POINT METHODS FOR GLOBAL


OPTIMIZATION
Panos M. Pardalos, Mauricio G. C. Resende 467
12.1 Introduction 467
12.2 Quadratic Programming 468
12.3 Nonconvex Potential Function Minimization 474
12.4 Affine Scaling Algorithm for General Quadratic Programming 486
12.5 A Lower Bounding Technique 490
12.6 Nonconvex Complementarity Problems 493
12.7 Concluding Remarks 497
REFERENCES 497

13 INTERIOR POINT APPROACHES FOR THE


VLSI PLACEMENT PROBLEM
Anthony Vannelli, Andrew Kennings, Paulina Chin 501
13.1 Introduction 501
13.2 A Linear Program Formulation of the Placement Problem 503
13.3 A Quadratic Program Formulation of the MNP Placement Model 509
13.4 Towards Overlap Removal 512
13.5 Primal-Dual Quadratic Interior Point Methods 514
13.6 Numerical Results 518
13.7 Conclusions 524
REFERENCES 526
CONTRIBUTORS

Erling D. Andersen
Department of Management, Odense University
Campusvej 55, DK-5230 Odense M, Denmark
e-mail: eda@busieco.ou.dk

Kurt M. Anstreicher
School of Business Administration, The University of Iowa
Iowa City, Iowa 52242, USA
e-mail: kanstrei@scout-po.biz.uiowa.edu

Mark G. Breitfeld
A.T. Kearney GmbH
Stuttgart, Germany

Paulina Chin
Department of Electrical and Computer Engineering, University of Waterloo
Waterloo, Ontario, CANADA N2L 3G1
e-mail: chin@panther.waterloo.ca

Jacek Gondzio
Logilab, HEC Geneva, Section of Management Studies, University of Geneva
102 Bd Carl Vogt, CH-1211 Geneva 4, Switzerland
(on leave from the Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland)
e-mail: gondzio@divsun.unige.ch

Benjamin Jansen
Faculty of Technical Mathematics and Computer Science, Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands
e-mail: b.jansen@twi.tudelft.nl

Florian Jarre
Institut für Angewandte Mathematik und Statistik, Universität Würzburg
97074 Würzburg, Germany
e-mail: jarre@mathematik.uni-wuerzburg.de

Andrew Kennings
Department of Electrical and Computer Engineering, University of Waterloo
Waterloo, Ontario, CANADA N2L 3G1
e-mail: kennings@panther.waterloo.ca

Csaba Meszaros
Department of Operations Research and Decision Support Systems,
Computer and Automation Institute, Hungarian Academy of Sciences
Lagymanyosi u. 11, Budapest, Hungary
e-mail: meszaros@lutra.sztaki.hu

John E. Mitchell
Department of Mathematical Sciences, Rensselaer Polytechnic Institute
Troy, NY 12180, USA
e-mail: mitchell@turing.cs.rpi.edu

Shinji Mizuno
Department of Prediction and Control, The Institute of Statistical Mathematics
Minato-ku, Tokyo 106, Japan
e-mail: mizuno@ism.ac.jp

Panos M. Pardalos
Department of Industrial and Systems Engineering, 303 Weil Hall, University of Florida
Gainesville, Florida, FL 32611-9083 USA
e-mail: pardalos@math.ufl.edu

Motakuri V. Ramana
Department of Industrial and Systems Engineering, 303 Weil Hall, University of Florida
Gainesville, Florida, FL 32611-9083 USA
e-mail: ramana@math.ufl.edu

Mauricio G.C. Resende
AT&T Bell Laboratories
Murray Hill, New Jersey 07974 USA
e-mail: resende@research.att.com

Cornelis Roos
Faculty of Technical Mathematics and Computer Science, Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands
e-mail: c.roos@twi.tudelft.nl

David F. Shanno
RUTCOR, Rutgers University
New Brunswick, New Jersey, USA
e-mail: shanno@farkas.rutgers.edu

Evangelia M. Simantiraki
RUTCOR and Graduate School of Management, Rutgers University
New Brunswick, New Jersey, USA
e-mail: sima@farkas.rutgers.edu

Tamas Terlaky
Faculty of Technical Mathematics and Computer Science, Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands
e-mail: t.terlaky@twi.tudelft.nl

Takashi Tsuchiya
Department of Prediction and Control, The Institute of Statistical Mathematics
4-6-7 Minami-Azabu, Minato-ku, Tokyo 106, Japan
e-mail: tsuchiya@sun312.ism.ac.jp

Anthony Vannelli
Department of Electrical and Computer Engineering, University of Waterloo
Waterloo, Ontario, CANADA N2L 3G1
e-mail: vannell@panther.waterloo.ca

Xiaojie Xu
X_Soft, P.O. Box 7207, University, MS 38677-7207, USA
(on leave from the Institute of Systems Science, Chinese Academy of Sciences, Beijing 100080, China)
e-mail: xxu@sunset.backbone.olemiss.edu

Akiko Yoshise
Institute of Socio-Economic Planning, University of Tsukuba
Tsukuba, Ibaraki 305, Japan
e-mail: yoshise@tsukuba.ac.jp
PREFACE

One has to make everything as simple as possible, but never more simple.
Albert Einstein

Discovery consists of seeing what everybody has seen and thinking what nobody has thought.
Albert Szent-Györgyi

The primary goal of this book is to provide an introduction to the theory of Interior
Point Methods (IPMs) in Mathematical Programming. At the same time, we try to
present a quick overview of the impact of extensions of IPMs on smooth nonlinear
optimization and to demonstrate the potential of IPMs for solving difficult practical
problems.

The Simplex Method has dominated the theory and practice of mathematical pro-
gramming since 1947 when Dantzig discovered it. In the fifties and sixties several
attempts were made to develop alternative solution methods. At that time the prin-
cipal base of interior point methods was also developed, for example in the work of
Frisch (1955), Carroll (1961), Huard (1967), Fiacco and McCormick (1968) and Dikin
(1967). In 1972 Klee and Minty made explicit that in the worst case some variants
of the simplex method may require an exponential amount of work to solve Linear
Programming (LP) problems. This was at the time when complexity theory became
a topic of great interest. People started to classify mathematical programming prob-
lems as efficiently (in polynomial time) solvable and as difficult (NP-hard) problems.
For a while it remained open whether LP was solvable in polynomial time or not.
The breakthrough resolution of this problem was obtained by Khachijan (1979). His
analysis, based on the ellipsoid method, proved that LP and some special convex
programming problems are polynomially solvable. However, it soon became clear
that in spite of its theoretical efficiency, the ellipsoid method was not a challenging
competitor of the simplex method in practice.

The publication of Karmarkar's paper (1984) initiated a new research area that is
now referred to as Interior Point Methods (IPMs). IPMs for LP not only have
better polynomial complexity than the ellipsoid method, but are also very efficient


in practice. Since the publication of Karmarkar's epoch-making paper, more than


3000 papers have been published related to interior point methods. It is impossible
to summarize briefly the tremendous amount of intellectual effort that was invested
in working out all the details necessary for a comprehensive theory, and successful
implementation, of IPMs. This volume's primary intent is to give an introduction to
and an overview of the field of IPMs for non-experts. We also hope that the surveys
collected here contain useful additional information and provide new points of view
for experts.

This book is divided into three parts. Part I summarizes the basic techniques,
concepts and algorithmic variants of IPMs for linear programming. Part II is devoted
to specially structured and smooth convex programming problems, while Part III
illustrates some application areas. The authors of the different chapters are all
experts in the specific areas. The content of the thirteen chapters is briefly described
below.

Part I: Linear Programming contains six chapters.


Chapter 1, Introduction to the Theory of Interior Point Methods, introduces the basic
notion of the central path, studies its elementary properties, and gives a stand-alone
treatment of the duality theory of LP using concepts and tools of IPMs. This part
establishes that IPMs can be presented as a self-supporting theory, independent of
the classical approach based on the simplex method. The skew-symmetric self-dual
embedding introduced here is not only a tool to prove duality theory, but also pro-
vides a perfect solution to the initialization problem faced by all IPMs. In addition,
this chapter shows how sensitivity and postoptimal parametric analysis can be done
correctly, and how this analysis might profit from the extra information provided by
interior solutions.
The authors, B. Jansen, C. Roos and T. Terlaky, are members of the optimization group
of the Delft University of Technology, The Netherlands. In recent years this group made
significant contributions to the field of IPMs. B. Jansen defended his Ph.D. Thesis in Jan-
uary 1996 on IPMs; C. Roos was one of the first in Europe who recognized the significance
of IPMs and, together with J.-Ph. Vial, developed path following barrier methods; T.
Terlaky is known in the optimization community not only as an active member of the IPM
community but also as the author of the criss-cross method for linear and oriented matroid
programming.

Chapter 2, Affine Scaling Algorithms, gives a survey of the results concerning affine
scaling algorithms introduced and studied first by I.I. Dikin in 1967. Conceptually
these algorithms are the simplest IPMs, being based on repeatedly optimizing a
linear function on a so-called Dikin ellipsoid inside the feasible region. The affine
scaling algorithms were rediscovered after 1984, and the first implementations of
IPMs were based on these methods. Unfortunately no polynomial complexity result

is available for affine scaling methods, and it is generally conjectured that such a
result is impossible. Even to prove global convergence without any non-degeneracy
assumption is quite difficult. This chapter surveys the state of the art results in the
area.
The author, T. Tsuchiya (The Institute of Statistical Mathematics, Tokyo, Japan) is well
known as the leading expert on affine scaling methods. He has contributed to virtually all
of the important results which led to global convergence proofs without non-degeneracy
assumptions.

Chapter 3, Target Following Methods for Linear Programming, presents a unifying


view of primal, dual and primal-dual methods. Almost all IPMs follow a path (the
central path, or a weighted path) or some sequence of reference points that leads
to optimality, or to a specific central point of the feasible region. The sequence of
reference points is called the "target sequence." Newton steps (possibly damped)
are made to get close to the current target. Closeness is measured by an appropriate
proximity measure. This framework facilitates a unified analysis of most IPMs,
including efficient centering techniques.
For information about the authors, B. Jansen, C. Roos and T. Terlaky, see the information
following the description of Chapter 1.

Chapter 4, Potential Reduction Algorithms, is included due to the primary historical


importance of potential reduction methods: Karmarkar's seminal paper presented a
polynomial, projective potential reduction method for LP. After giving an elegant
treatment of Karmarkar's projective algorithm, this chapter discusses some versions
of the affine potential reduction method and the primal-dual potential reduction
method. Several extensions and enhancements of potential reduction algorithms are
also briefly described.
This survey is given by K.M. Anstreicher from The University of Iowa. In the past ten years
he has worked primarily on projective and potential reduction methods. He also showed
the equivalence of the classical SUMT code and modern polynomial barrier methods. Most
recently his research has considered IPMs based on the volumetric barrier.

Chapter 5, Infeasible Interior Point Methods, discusses the (for the time being, at
least) most practical IPMs. These algorithms require extending the concept of the
central path to infeasible solutions. Infeasible IPMs generate iterates that are infea-
sible for the equality constraints, but still require that the iterates stay in the interior
of the positive orthant. Optimality and feasibility are reached simultaneously. In-
feasibility of either the primal or the dual problem is detected by divergence of the
iterates.
This chapter is written by S. Mizuno (The Institute of Statistical Mathematics, Tokyo,
Japan) who has contributed to several different areas of IPMs. He was one of the first who

proposed primal-dual methods, made significant contributions to the theory of IPMs for
complementarity problems, and is one of the most active researchers on infeasible IPMs.

Chapter 6, Implementation Issues, discusses all the ingredients that are needed for
an efficient, robust implementation of IPMs for LP. After presenting a prototype
infeasible IPM, the chapter discusses preprocessing techniques, elements and algo-
rithms of sparse linear algebra, adaptive higher order methods, initialization, and
stopping strategies. The effects of centering, cross-over and basis identification techniques are studied. Finally some open problems are presented.
The authors, E.D. Andersen (Denmark), J. Gondzio (Poland and Switzerland), Cs. Meszaros
(Hungary) and X. Xu (China and USA), are prominent members of the new generation of
people who have developed efficient, state-of-the-art optimization software. Each one has
his own high performance IPM code, and each code has its own strong points. Andersen's
code has the most advanced basis-identification and cross-over, Gondzio's code is the best
in preprocessing and Meszaros' has the most efficient and flexible implementation of sparse
linear algebra. Xu's code is based on the skew-symmetric embedding discussed in Chapter
1, and is therefore the most reliable in detecting unboundedness and infeasibilities.

Part II: Convex Programming contains four chapters.


Chapter 7, Interior Point Methods for Classes of Convex Programs, presents the
generalization of polynomial IPMs for smooth convex programs. The smoothness
conditions of self-concordance and self-limitation are motivated and defined. Several
examples illustrate the concepts and ease the understanding. After presenting a
prototype polynomial algorithm, and an implementable variant, several classes of
structured convex programs are considered that satisfy the imposed smoothness
condition.
The chapter is written by F. Jarre, who wrote his Ph.D. and Habilitation theses on IPMs for
convex optimization. He was one of the first who proved polynomial convergence of IPMs
for quadratically constrained convex programs and programs satisfying a certain Relative Lipschitz condition. Recently he started working on an efficient implementation of IPMs for
large scale convex programs, more specifically for problems arising from structural design.

Chapter 8, Complementarity Problems, gives an extensive survey of polynomiality re-


sults of IPMs for linear and non-linear complementarity problems. Primal-dual IPMs
generalize relatively easily to linear complementarity problems, at least if the coeffi-
cient matrix satisfies some additional condition. Here feasible and infeasible IPMs for
linear complementarity problems with appropriate matrices are discussed. The gen-
eralization for non-linear complementarity problems is far from trivial. Smoothness
conditions similar to those discussed in Chapter 7 are needed. Further extensions to
variational inequalities are also mentioned.
The author, A. Yoshise (The University of Tsukuba, Japan) worked for years together with
a group of Japanese researchers who pioneered primal-dual IPMs for LP and complementar-

ity problems. For this work A. Yoshise, together with her coauthors (including S. Mizuno,
the author of Chapter 5), received the Lanchester Prize in 1993.

Chapter 9, Semidefinite Programming, gives an excellent introduction to this newly


identified research field of convex programming. Semidefinite programs contain a
linear objective and linear constraints, while a matrix of variables should be pos-
itive semidefinite. It is proved that this program admits a self-concordant barrier
function and is therefore solvable in polynomial time. Semidefinite programs arise
among other places in relaxations of combinatorial optimization problems, in control
theory, and in solving structural design problems. Basic concepts, algorithms, and
applications are discussed.
The authors are M.V. Ramana and P.M. Pardalos (University of Florida, Gainesville). Mo-
takuri V. Ramana hails from India and he received his Ph.D. from The Johns Hopkins
University in 1993. He wrote his doctoral dissertation on Multiquadratic and Semidefinite
Programming problems. He developed the first algebraic polynomial size gap-free dual pro-
gram for SDP, called the Extended Lagrange Slater Dual (ELSD), and has written several
papers concerning geometrical, structural and complexity theoretic aspects of semidefinite
programming. His other research interests include global and combinatorial optimization,
graph theory and complexity theory. For some information about P. Pardalos' activities,
see the information following the description of Chapter 12.

Chapter 10, Implementing Barrier Methods for Nonlinear Programming, proposes


two algorithmic schemes for general nonlinear programs. The first is a pure barrier
algorithm using modified barriers, while the second uses the classical logarithmic
barrier and builds a way to generate variants of sequential quadratic programming
methods. Implementation issues and some illustrative computational results are
presented as well. The practical efficiency of IPMs for solving nonlinear problems is
not yet as established as in the case of LP, and this paper is an important step in
this direction.
D.F. Shanno (RUTCOR, Rutgers University) is well known in the nonlinear optimization
community for his classical work on Quasi-Newton methods. He was one of the authors of
the OB1 code, which was the first really efficient implementation of IPMs for LP. He and his
coauthors received the Orchard-Hays prize of the Mathematical Programming Society in
1992 for their pioneering work in implementing IPMs. M.G. Breitfeld (Stuttgart, Germany)
was, and E.M. Simantiraki (RUTCOR, Rutgers University) is Shanno's Ph.D. student.
Both are known for their significant contributions in developing and implementing barrier
methods for nonlinear programming.

Part III: Applications, Extensions contains three chapters.


Chapter 11, Interior Point Methods for Combinatorial Optimization, surveys the
applicability of IPMs in solving combinatorial optimization problems. The chapter
describes the adaptation of IPMs to branch and cut methods, and also to potential

reduction algorithms specially designed to solve combinatorial problems by trans-


forming them into nonconvex nonlinear problems. IPMs tailored to solve network
optimization and multicommodity flow problems, including some IPM based cutting
plane methods, are also discussed.
J.E. Mitchell (Rensselaer Polytechnic Institute) received his Ph.D. from Cornell University.
His work was the first attempt to use IPMs in combinatorial optimization. He has mainly
worked in exploring the potential of IPMs in branch and cut algorithms.

Chapter 12, Interior Point Methods for Global Optimization, indicates the potential
of IPMs in global optimization. As in the case of combinatorial optimization, most
problems in global optimization are NP-hard. Thus to expect polynomiality results
for such problems is not realistic. However, significant improvement in the quality
of the obtained (possibly) local solution and improved solution time are frequently
achieved. The paper presents potential reduction and affine scaling algorithms and
lower bounding techniques for general nonconvex quadratic problems, including some
classes of combinatorial optimization problems. It is easy to see that any nonlinear
problem with polynomial constraints can be transformed to such quadratic problems.
The authors P.M. Pardalos (University of Florida, Gainesville) and M.G.C. Resende (AT&T
Research) are recognized experts in optimization. Pardalos is known as a leading expert
in the field of global optimization and has written and/or edited over ten books in recent
years. Resende is responsible for pioneering work in implementing IPMs for LP, network
programming, combinatorial and global optimization problems.

Chapter 13, Interior Point Approaches for the VLSI Placement Problem, introduces
the reader to an extremely important application area of optimization. Several
optimization problems arise in VLSI (Very Large Scale Integration) chip design.
Here two new placement models are discussed that lead to sparse LP and sparse
convex quadratic programming problems respectively. The resulting problems are
solved by IPMs. Computational results solving some real placement problems are
presented.
A. Vannelli and his Ph.D. students A. Kennings and P. Chin are working at the Electrical
Engineering Department of the University of Waterloo, Waterloo, Canada. Vannelli is known
for his devoted pioneering work on applying exact optimization methods in VLSI design.

Acknowledgements
I would like to thank my close colleagues D. den Hertog, B. Jansen, E. de Klerk,
T. Luo, H. van Maaren, J. Mayer, A.J. Quist, C. Roos, J. Sturm, J.-Ph. Vial, J.P.
Warners, S. Zhang for their help and continuous support. These individuals have
provided countless useful discussions in the past years, have helped me to review
the chapters of this book, and have helped me with useful comments of all sorts. I
am also grateful to all the authors of this book for their cooperation and for their
excellent work, to John Martindale and his assistants (Kluwer Academic Publishers)
for their kind practical help, and to P. Pardalos, the managing editor of the series
"Applied Optimization" for his deep interest in modern optimization methods and
his constant encouragement. Professor Emil Klafszky (University of Technology,
Budapest, Hungary), my Ph.D. supervisor, had a profound personal influence on
my interest, taste and insight in linear and nonlinear programming. Without this
intellectual impulse I would probably never have become an active member of the
mathematical programming community. Finally, but most of all, I thank my wife
for all her love, patience, and support. Without her continuous support this book
would never have been completed.

Tamas Terlaky
May 1996,
Delft, The Netherlands
PART I
LINEAR PROGRAMMING
1
INTRODUCTION TO THE THEORY
OF INTERIOR POINT METHODS
Benjamin Jansen, Cornelis Roos,
Tamas Terlaky
Faculty of Technical Mathematics and Computer Science
Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands

ABSTRACT
We discuss the basic concepts of interior point methods for linear programming, viz., duality,
the existence of a strictly complementary solution, analytic centers and the central path with
its properties. To solve the initialization problem we give an embedding of the primal and
the dual problem in a skew-symmetric self-dual reformulation that has an obvious initial
interior point. Finally, we consider the topic of interior point based sensitivity analysis.

Key Words: theory, strictly complementary, central path, embedding, logarithmic barrier
function, potential function, sensitivity analysis

1.1 THE THEORY OF LINEAR PROGRAMMING

1.1.1 Introduction
It is not surprising that considering the theory of linear programming from an interior
point of view on the one hand, and the development and analysis of interior point
methods on the other, are intimately related. In fact, a similar interaction is well-known
for the simplex method. Megiddo [25] was the first to analyze the central path
in detail. Güler et al. [16] presented a complete duality theory for LP based on the
concepts of interior point methods, thereby making the field of interior point methods
for LP fully self-supporting. Kojima et al. [21] and Monteiro and Adler [28] used
Megiddo's results to propose the first primal-dual interior point method, forming
the basis for high-standard interior point codes such as CPLEX and OSL. The important
results in the theory of linear programming are weak and strong duality and the
existence of a strictly complementary solution (Goldman-Tucker's theorem [12]). In
this chapter we will derive these results using a skew-symmetric self-dual embedding
of the primal and the dual problem (the importance of self-duality was already
recognized in the early days of LP, e.g. Tucker [35]). An analogous reformulation
was proposed by Ye et al. [38] for a computational reason: the embedding allows an
obvious interior feasible point that need not be feasible to the original primal and
dual problems. Hence, a standard interior point method could be applied to it to derive
the best known complexity bound for an infeasible start interior point method. The
approach is also computationally efficient (see Xu et al. [37]) and very effective in
discovering primal and/or dual infeasibility. The skew-symmetric embedding we use
allows for an easy analysis.

Let us first introduce some notation and state the results mentioned above. Let
c, x ∈ ℝ^n, b ∈ ℝ^m and A be an m × n matrix. The primal LP problem in standard
format is given by

    (P)   min_x { c^T x : Ax = b, x ≥ 0 }.

The associated dual problem reads

    (D)   max_y { b^T y : A^T y + s = c, s ≥ 0 }.

The sets of feasible solutions of (P) and (D) are denoted by P and D respectively.
Problem (P) is called feasible if the set P is nonempty; if P is empty then (P) is
infeasible; if there is a sequence of feasible solutions for which the objective value
goes to minus infinity then (P) is said to be unbounded; analogous statements hold
for (D). We assume throughout that A has full row rank. This implies that y follows
from a given feasible s ≥ 0 in a unique way, and we may identify a feasible solution
of (D) just by s. It is easy to check that for any primal feasible x and dual feasible
(y, s) it holds b^T y ≤ c^T x, which is weak duality. The first theorem is the main result
in the theory of LP.

Theorem 1.1.1 (Strong duality) For (P) and (D) one of the following alternatives holds:
(i) Both (P) and (D) are feasible and there exist x^* ∈ P and (y^*, s^*) ∈ D such that c^T x^* = b^T y^*;
(ii) (P) is infeasible and (D) is unbounded;
(iii) (D) is infeasible and (P) is unbounded;
(iv) Both (P) and (D) are infeasible.

An alternative way of writing the optimality condition in Theorem 1.1.1(i) is by
using the complementary slackness condition

    x_i^* s_i^* = 0,   i = 1, ..., n.

Because of the nonnegativity condition on x^* and s^* this is also equivalent to
(x^*)^T s^* = 0. Note that for arbitrary complementary solutions we might have
x_i^* = s_i^* = 0. In the analysis of interior point methods strict complementarity is
a central theme; it is involved in theoretical analyses, in sensitivity analysis as well
as in the development and analysis of polynomial time interior point methods.

Theorem 1.1.2 (Strict complementarity) If (P) and (D) are feasible then there
exist x^* ∈ P and (y^*, s^*) ∈ D such that (x^*)^T s^* = 0 and x_i^* + s_i^* > 0, i = 1, ..., n.
The solution (x^*, s^*) is called strictly complementary.

The strict complementarity condition implies that for each index i exactly one of x_i^*
and s_i^* is zero, while the other is positive. This result was first shown in 1956 by
Goldman and Tucker [12].
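As a small illustration, consider min { 0^T x : x_1 + x_2 = 1, x ≥ 0 } with dual max { y : y + s_1 = 0, y + s_2 = 0, s ≥ 0 }. Every feasible x is optimal, and the only dual optimal solution is y^* = 0 with s^* = (0, 0). The optimal pair x = (1, 0), s^* = (0, 0) is complementary but not strictly complementary (x_2 = s_2^* = 0), whereas the pair x = (1/2, 1/2), s^* = (0, 0) is strictly complementary, as guaranteed by Theorem 1.1.2.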

In the next sections we give an elementary proof of the above fundamental theorems, based on interior point ideas.

1.1.2 Duality Theory for Skew-symmetric Self-dual LPs

We define a specific skew-symmetric self-dual linear program in the following form

    (SP)   min_x { a^T x : Cx ≥ -a, x ≥ 0 },

where C is an n × n skew-symmetric matrix (i.e., C^T = -C) and a, x ∈ ℝ^n. We
require a ≥ 0. Observe that for each x ∈ ℝ^n it holds

    x^T Cx = 0.   (1.1)

The associated dual program is given by

    (SD)   max_y { -a^T y : C^T y ≤ a, y ≥ 0 },

with y ∈ ℝ^n. Obviously the skew-symmetry of C implies that the primal and dual
feasible sets are identical. The strong duality for these problems is easy.

Lemma 1.1.3 (SP) and (SD) are feasible and for both the zero vector is an optimal
solution.

Proof: Since a ≥ 0 the zero vector is primal and dual feasible. For each primal
feasible x it holds

    0 = x^T Cx ≥ -a^T x

by (1.1), so a^T x ≥ 0; analogously a^T y ≥ 0 for each dual feasible y. Hence the zero
vector is an optimal solution for (SP) and also for (SD). □

Corollary 1.1.4 Let x be feasible for (SP) and define s = Cx + a. Then x is optimal
if and only if x^T s = 0.

Proof: Using (1.1) it holds

    a^T x = x^T (s - Cx) = x^T s.   (1.2)

The statement follows from Lemma 1.1.3. □

Observe that (SP) is trivial from a computational point of view since an optimal
solution is readily available. However, the problem is interesting from a theoretical
point of view. To complete the duality theory of the skew-symmetric self-dual
problem (SP) we need to prove the existence of a strictly complementary solution.
Since (SP) and (SD) are identical it suffices to work just with the primal problem
(SP). The feasible region of (SP) will be denoted as SP. So

    SP := { (x, s) : Cx - s = -a, x ≥ 0, s ≥ 0 }.

The set of positive vectors in SP is denoted as SP^0:

    SP^0 := { (x, s) : Cx - s = -a, x > 0, s > 0 }.

The set of optimal solutions of (SP) will be denoted by SP^*. As a consequence of
Corollary 1.1.4 we have

    SP^* = { (x, s) : Cx - s = -a, x^T s = 0, x ≥ 0, s ≥ 0 }.
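As a small example of such a problem, take n = 2 with

    C = [ 0  1 ; -1  0 ],   a = (1, 1)^T,

so that (SP) reads min { x_1 + x_2 : x_2 ≥ -1, -x_1 ≥ -1, x ≥ 0 } and s = Cx + a = (x_2 + 1, 1 - x_1). Here SP^0 is nonempty (e.g. x = (1/2, 1) gives s = (2, 1/2) > 0), the optimal value is 0, and SP^* consists of the single point (x^*, s^*) = ((0, 0), (1, 1)), which is strictly complementary.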

We will need the following well-known result from elementary convex analysis, see
e.g. Rockafellar [29].

Lemma 1.1.5 Let f : D → ℝ be a convex differentiable function, where D ⊆ ℝ^n
is an open convex set. Then x ∈ D minimizes f over D if and only if ∇f(x) = 0.

We will also use the following straightforward lemma from calculus, denoting ℝ^n_{++} =
{ x ∈ ℝ^n : x > 0 }.

Lemma 1.1.6 Let μ ∈ ℝ_{++} and p ∈ ℝ^n_{++} be given. Then the function h(x) =
p^T x - μ Σ_{i=1}^n ln x_i, where x ∈ ℝ^n_{++}, has a unique minimizer.

Proof: Let us introduce the following notation: h(x) = Σ_{i=1}^n h_i(x_i), where h_i(x_i) :=
p_i x_i - μ ln x_i. Let

    h̄_i(x_i) := h_i(x_i) - μ + μ ln μ - μ ln p_i = μ ( p_i x_i / μ - ln( p_i x_i / μ ) - 1 ).

It easily follows that the functions h̄_i(x_i) are strictly convex and nonnegative on their
domain (0, ∞); furthermore h̄_i(x_i) → ∞ as x_i → 0 or x_i → ∞. Hence all the level
sets of the functions h̄_i(x_i) are bounded, and bounded away from zero. Consider a
nonempty r-level set L := { x : h(x) ≤ r } of the function h(x). Note that L is
nonempty if we take r := h(x^{(0)}) for some x^{(0)} > 0. For x ∈ L and for each i, we
have

    h̄_i(x_i) ≤ Σ_{j=1}^n h̄_j(x_j) = Σ_{j=1}^n ( h_j(x_j) - μ + μ ln μ - μ ln p_j )
             = h(x) - nμ + nμ ln μ - μ Σ_{j=1}^n ln p_j ≤ r - nμ + nμ ln μ - μ Σ_{j=1}^n ln p_j.

So L is a subset of the Cartesian product of level sets of the functions h̄_i. We
conclude from this that the level set L is bounded. Since h(x) is continuous, it has
a minimizer in L. The uniqueness of the minimizer follows from the strict convexity
of h(x). □
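Incidentally, the minimizer in Lemma 1.1.6 is available in closed form: setting the gradient of h to zero gives p_i - μ/x_i = 0, i.e., x_i = μ/p_i for i = 1, ..., n.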

For any positive number μ > 0, we define the function φ_μ : ℝ^n_{++} × ℝ^n_{++} → ℝ by

    φ_μ(x, s) := a^T x - μ ( Σ_{i=1}^n ln x_i + Σ_{i=1}^n ln s_i ),

and f_μ : ℝ^n_{++} → ℝ by

    f_μ(x) := a^T x - μ ( Σ_{i=1}^n ln x_i + Σ_{i=1}^n ln( C_{i·} x + a_i ) ),   (1.3)

where C_{i·} denotes the i-th row of C. Note that f_μ(x) = φ_μ(x, s) for (x, s) ∈ SP^0. The
function f_μ is called the logarithmic barrier function for (SP) with barrier parameter
μ. Due to (1.2) the term a^T x can equally well be replaced by x^T s, which shows that
φ_μ(x, s) is symmetric in x and s on SP.

Lemma 1.1.7 Let μ > 0. The following two statements are equivalent:
(i) The function f_μ(x) has a (unique) minimizer;
(ii) There exist x, s ∈ ℝ^n such that

    Cx - s = -a,  x ≥ 0,  s ≥ 0,
    Xs = μe.   (1.4)

Further, if one of the statements holds then x minimizes f_μ if and only if x and s
satisfy (1.4).

Proof: First note that whenever (x, s) solves (1.4), then both x and s are positive,
due to the second equation. So the nonnegativity conditions for x and s in (1.4) can
equally well be replaced by requiring that x and s are positive. One easily checks that
f_μ(x) is strictly convex, and hence it has at most one minimizer. Since the domain
of f_μ is open, Lemma 1.1.5 applies and it follows that f_μ has x as a minimizer if and
only if ∇f_μ(x) = 0, i.e.,

    a - μ X^{-1} e - μ C^T S^{-1} e = 0.   (1.5)

Using s = Cx + a and C^T = -C, (1.5) can be written as

    μ X^{-1} e - s = C ( μ S^{-1} e - x ).

Rearranging the terms we obtain

    X^{-1} S ( μ S^{-1} e - x ) = C ( μ S^{-1} e - x ).

Since C is skew-symmetric and the matrices X^{-1}S and S^{-1} are positive definite and
diagonal, the last equation holds if and only if Xs = μe. This proves the lemma. □

Now assume that the set SP^0 is nonempty and let (x^{(0)}, s^{(0)}) ∈ SP^0. By (1.1) we
have for any (x, s) ∈ SP

    ( x - x^{(0)} )^T ( s - s^{(0)} ) = 0.   (1.6)

Property (1.6) is known as the orthogonality property and often used in pivoting
algorithms, see Terlaky and Zhang [34]. Equivalently it holds

    ( s^{(0)} )^T x + ( x^{(0)} )^T s = x^T s + ( x^{(0)} )^T s^{(0)} = a^T x + a^T x^{(0)},

which gives

    a^T x = ( s^{(0)} )^T x + ( x^{(0)} )^T s - a^T x^{(0)}.   (1.7)

Now defining the function g_μ : ℝ^n_{++} × ℝ^n_{++} → ℝ,

    g_μ(x, s) := ( s^{(0)} )^T x + ( x^{(0)} )^T s - μ ( Σ_{i=1}^n ln x_i + Σ_{i=1}^n ln s_i ),

we have for any (x, s) ∈ SP^0

    g_μ(x, s) = f_μ(x) + a^T x^{(0)},

so g_μ(x, s) and f_μ(x) differ only by a constant on SP^0. We now prove the following
theorem.

Theorem 1.1.8 Let μ > 0. The following statements are equivalent:

(i) The set SP^0 is nonempty;
(ii) The function f_μ(x) defined in (1.3) has a (unique) minimizer;
(iii) The system (1.4) has a solution.

Proof: The equivalence of (ii) and (iii) is already contained in Lemma 1.1.7. Earlier
we noted the obvious fact that (iii) implies (i). So it suffices to show that (i) implies
(ii). Assuming (i), let (x^{(0)}, s^{(0)}) ∈ SP^0. Due to relation (1.7) minimizing f_μ(x) over
ℝ^n_{++} is equivalent to minimizing g_μ(x, s) over SP^0. So the proof will be complete if
we show that g_μ has a minimizer in SP^0. Note that g_μ is defined on the intersection
of ℝ^{2n}_{++} and an affine space. By the proof of Lemma 1.1.6 the level sets of g_μ are
bounded, hence g_μ has a (unique) minimizer. This completes the proof. □

In the remainder of this section, we will make the basic assumption that statement
(i) of Theorem 1.1.8 holds, namely that (SP) has a strictly feasible solution.

Assumption 1.1.9 SP contains a vector (x^{(0)}, s^{(0)}) > 0, i.e., SP^0 is nonempty.

For each positive μ we will denote the minimizer of f_μ(x) as x(μ), and define s(μ) :=
Cx(μ) + a. The set { x(μ) : μ > 0 } is called the central path of (SP). We now
prove that any section (0 < μ ≤ μ̄) of the central path is bounded.

Lemma 1.1.10 Let μ̄ > 0. The set { (x(μ), s(μ)) : 0 < μ ≤ μ̄ } is bounded.

Proof: Let (x^{(0)}, s^{(0)}) ∈ SP^0. Using the orthogonality property (1.6) and the fact
that (1.4) holds with x(μ) we get for any i, 1 ≤ i ≤ n,

    s_i^{(0)} x_i(μ) ≤ ( s^{(0)} )^T x(μ) + ( x^{(0)} )^T s(μ) = x(μ)^T s(μ) + ( x^{(0)} )^T s^{(0)}
                    = nμ + ( x^{(0)} )^T s^{(0)} ≤ nμ̄ + ( x^{(0)} )^T s^{(0)}.

This shows that x_i(μ) ≤ ( nμ̄ + ( x^{(0)} )^T s^{(0)} ) / s_i^{(0)}. So the set { x(μ) : 0 < μ ≤ μ̄ } is
bounded. The proof for { s(μ) : 0 < μ ≤ μ̄ } is similar. □
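To illustrate, in the small example given after the definition of SP^* (C = [ 0 1 ; -1 0 ], a = (1, 1)^T) the system (1.4) reads x_1 (x_2 + 1) = μ, x_2 (1 - x_1) = μ. Adding the two equations gives x_1 + x_2 = 2μ, so x_1(μ) solves x_1^2 - (2μ + 1) x_1 + μ = 0, i.e.,

    x_1(μ) = ( (2μ + 1) - √(4μ^2 + 1) ) / 2,   x_2(μ) = 2μ - x_1(μ),

and indeed a^T x(μ) = x(μ)^T s(μ) = 2μ = nμ. As μ → 0 the central path converges to the strictly complementary optimal pair (x^*, s^*) = ((0, 0), (1, 1)); this is the behavior established in general in Theorem 1.1.11 below.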

We proceed by showing the existence of a strictly complementary solution (x^*, s^*) ∈
SP under Assumption 1.1.9, that is, a solution satisfying (x^*)^T s^* = 0 and x^* + s^* >
0. This completes the duality theory for (SP). We denote by σ(u) the support of the
vector u, i.e.,

    σ(u) = { i : u_i > 0 }.

Theorem 1.1.11 If Assumption 1.1.9 holds, then there exist (x^*, s^*) ∈ SP^* such
that x^* + s^* > 0.

Proof: Let {μ_k}_{k=1}^∞ be a positive sequence such that μ_k → 0 if k → ∞. By
Lemma 1.1.10 the set { (x(μ_k), s(μ_k)) } is bounded, hence it contains a subsequence
converging to a point (x^*, s^*). Then it holds (x^*, s^*) ∈ SP and, from x(μ_k)^T s(μ_k) =
nμ_k → 0 we conclude (x^*)^T s^* = 0, so it is an optimal solution. We claim that
(x^*, s^*) is strictly complementary. This can be shown as follows. By (1.6)

    ( x(μ_k) - x^* )^T ( s(μ_k) - s^* ) = 0.

Rearranging the terms of this equality, and noting that x(μ_k)^T s(μ_k) = nμ_k and
(x^*)^T s^* = 0, we arrive at

    Σ_{i ∈ σ(x^*)} x_i^* s_i(μ_k) + Σ_{i ∈ σ(s^*)} x_i(μ_k) s_i^* = nμ_k.

Dividing both sides by μ_k and recalling that x_i(μ_k) s_i(μ_k) = μ_k, we obtain

    Σ_{i ∈ σ(x^*)} x_i^* / x_i(μ_k) + Σ_{i ∈ σ(s^*)} s_i^* / s_i(μ_k) = n.

Letting k → ∞, we see that the first sum above becomes equal to the number of
nonzero coordinates in x^*. Similarly, the second sum becomes equal to the number
of nonzero coordinates in s^*. We conclude that the optimal pair (x^*, s^*) is strictly
complementary. □

Observe that the proof of Theorem 1.1.11 shows that the central path has a subsequence
converging to an optimal solution. This suffices for proving the existence of a
strictly complementary solution. However, it can be shown that the central path is
an analytic curve and converges itself. The limiting behavior of the central path as
μ → 0 has long been an important subject in the research on interior point methods.
In the book by Fiacco and McCormick [7] the convergence of the path to an
optimal solution is investigated for general convex programming problems. McLinden
[24] considered the limiting behavior of the path for monotone complementarity
problems and introduced the idea for the proof-technique of Theorem 1.1.11, which
was later adapted by Güler and Ye [17]. Megiddo [25] extensively investigated the
properties of the central path, which motivated Monteiro and Adler [28] and Kojima
et al. [21] for research on primal-dual methods.

Lemma 1.1.12 If Assumption 1.1.9 holds then the central path converges to a
unique primal-dual feasible pair.

Proof: The proof very much resembles that of Theorem 1.1.11. Let x̄ be optimal
in (SP) and (ȳ, s̄ = Cȳ + a) in (SD), and let (x^*, s^*) be the accumulation point of
the central path as defined in Theorem 1.1.11. It easily follows that

    Σ_{i ∈ σ(x^*)} x̄_i / x_i^* + Σ_{i ∈ σ(s^*)} s̄_i / s_i^* = n.

Using the arithmetic-geometric mean inequality we obtain

    ( Π_{i ∈ σ(x^*)} x̄_i / x_i^*  ·  Π_{i ∈ σ(s^*)} s̄_i / s_i^* )^{1/n}
        ≤ (1/n) ( Σ_{i ∈ σ(x^*)} x̄_i / x_i^* + Σ_{i ∈ σ(s^*)} s̄_i / s_i^* ) = 1.

Applying the inequality with s̄ = s^* gives

    Π_{i ∈ σ(x^*)} x̄_i ≤ Π_{i ∈ σ(x^*)} x_i^*,

and with x̄ = x^* it gives

    Π_{i ∈ σ(s^*)} s̄_i ≤ Π_{i ∈ σ(s^*)} s_i^*.

This implies that x^* maximizes the product Π_{i ∈ σ(x^*)} x_i and s^* maximizes the product
Π_{i ∈ σ(s^*)} s_i over the optimal set. Hence the central path of (SP) has a unique
limit point. □

The proof of the lemma shows that the limit point of the central path solves an
optimization problem over the optimal set. Actually, we proved that the limit point
is the so-called analytic center of the optimal set.

Definition 1.1.13 (Analytic center) Let D ⊂ ℝ^n be a bounded convex set. The
analytic center of D is the unique minimizer of

    min_x { - Σ_{i=1}^n ln x_i : x ∈ D }.

The analytic center of bounded convex sets was introduced by Sonnevend [32] and
plays an important role in interior point methods. Note that the central path is the
set of analytic centers of the level sets.
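For example, if D is the standard simplex { x ∈ ℝ^n : x ≥ 0, x_1 + ... + x_n = 1 }, then - Σ_i ln x_i is minimized at x_i = 1/n, i = 1, ..., n (by the arithmetic-geometric mean inequality), so the analytic center of the simplex is its barycenter.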

1.1.3 Duality Theory for General LPs


The results of the previous section can easily be applied to prove the strong duality
theorem of LP. In this way we present a new proof of this classical result. We
also obtain Goldman-Tucker's theorem for the general case. In this section we will
consider the LP problem in the symmetric form instead of in the standard form (P).
Obviously, this can be done without loss of generality since every LP problem can
be rewritten from one of these forms to the other. So let the primal be given by

    (P̄)   min_x { c^T x : Ax ≥ b, x ≥ 0 },

where A is an m × n matrix, c, x ∈ ℝ^n, and b ∈ ℝ^m. The associated dual problem
(D̄) is

    (D̄)   max_y { b^T y : A^T y ≤ c, y ≥ 0 }.

For convenience, we describe how (P̄) is derived from (P) without increasing the
number of variables or constraints. Consider (P) and assume that rank(A) = m
(otherwise the redundant constraints can easily be eliminated). Let B be any basis of
A and partition A = [A_B, A_N], c^T = [c_B^T, c_N^T] and x^T = [x_B^T, x_N^T]. Then Ax = b, x ≥ 0
can be written as x_B + A_B^{-1} A_N x_N = A_B^{-1} b, x ≥ 0, or equivalently -A_B^{-1} A_N x_N ≥
-A_B^{-1} b, x_N ≥ 0. Likewise c^T x = c_B^T x_B + c_N^T x_N = c_B^T A_B^{-1} b + ( c_N^T - c_B^T A_B^{-1} A_N ) x_N,
hence (P) can be written equivalently in the symmetric form as

    min_{x_N} { ( c_N^T - c_B^T A_B^{-1} A_N ) x_N : -A_B^{-1} A_N x_N ≥ -A_B^{-1} b, x_N ≥ 0 }.

Expressed in the form (P̄)-(D̄), it is easily seen that a pair (x^*, y^*) is strictly complementary
if x^* is feasible in (P̄), y^* is feasible in (D̄) and moreover

    ( Ax^* - b )^T y^* = ( c - A^T y^* )^T x^* = 0,
    y^* + ( Ax^* - b ) > 0,
    x^* + ( c - A^T y^* ) > 0.

We formulate a skew-symmetric self-dual LP problem that incorporates all the
necessary information contained in (P̄) and (D̄). A similar embedding of the primal
and dual problem in a skew-symmetric self-dual problem was considered in [12,
38]. Let x^{(0)}, u^{(0)} ∈ ℝ^n_{++}, y^{(0)}, r^{(0)} ∈ ℝ^m_{++} and ϑ_0, τ_0, μ_0, ν_0 ∈ ℝ_{++} be arbitrary.
Further, vectors c̄ ∈ ℝ^n, b̄ ∈ ℝ^m and scalars ᾱ, β̄ ∈ ℝ are defined in terms of these
data; the definitions are chosen so that, as noted below, the point (y, x, ϑ, τ) =
(y^{(0)}, x^{(0)}, ϑ_0, τ_0) is strictly feasible for the embedding problem (SP̄) and β̄ ≥ ν_0 > 0.

It is worthwhile to note that if x^{(0)} is strictly feasible for (P̄) and r^{(0)} := Ax^{(0)} - b,
then we have b̄ = 0 by setting ϑ_0 = τ_0 = 1. Also if y^{(0)} is strictly feasible for (D̄)
and u^{(0)} := c - A^T y^{(0)}, then c̄ = 0 if ϑ_0 = τ_0 = 1. So, the vectors b̄ and c̄ measure
the infeasibility of the given vectors x^{(0)}, r^{(0)}, y^{(0)} and u^{(0)}. We define the problem

    (SP̄)   min_{y,x,ϑ,τ}   β̄ ϑ
           s.t.    Ax      + b̄ ϑ - b τ       ≥ 0,
                  -A^T y   - c̄ ϑ + c τ       ≥ 0,
                  -b̄^T y + c̄^T x     - ᾱ τ   ≥ -β̄,
                   b^T y - c^T x + ᾱ ϑ        ≥ 0,
                   y ≥ 0,  x ≥ 0,  ϑ ≥ 0,  τ ≥ 0.

Due to the selection of the parameters the positive solution x = x^{(0)}, y = y^{(0)}, ϑ =
ϑ_0, τ = τ_0 is feasible for (SP̄), and Assumption 1.1.9 holds. Also, the coefficients
in the objective function are nonnegative. Hence, the results of the previous section
apply to this problem, and we can derive the following result.

Theorem 1.1.14 For (P̄) and (D̄) one of the following alternatives holds:
(i) (P̄) and (D̄) are both feasible and there exists a strictly complementary solution (x̄^*, ȳ^*).
(ii) (P̄) is infeasible and (D̄) is unbounded.
(iii) (D̄) is infeasible and (P̄) is unbounded.
(iv) (P̄) and (D̄) are both infeasible.

Proof: Problem (SP̄) is skew-symmetric and self-dual, the objective has nonnegative
coefficients and Assumption 1.1.9 holds. Hence Theorem 1.1.11 guarantees the
existence of a strictly complementary solution (x^*, y^*, ϑ^*, τ^*). By Lemma 1.1.3 we
also know that ϑ^* = 0, since β̄ ≥ ν_0 > 0. Two possibilities may occur. If τ^* > 0,
then it is easily seen that x̄^* := x^*/τ^* and ȳ^* := y^*/τ^* are feasible in (P̄) and (D̄) respectively,
and that they constitute a strictly complementary pair. So case (i) holds.
On the other hand, if τ^* = 0 then it follows that Ax^* ≥ 0, x^* ≥ 0, A^T y^* ≤ 0, y^* ≥ 0
and b^T y^* - c^T x^* > 0. If b^T y^* > 0 then (P̄) is infeasible, since by assuming that x̄
is a primal feasible solution one has 0 ≥ x̄^T A^T y^* ≥ b^T y^*, which is a contradiction.
Also, it follows immediately that if (D̄) is feasible then it is unbounded in this case.
If c^T x^* < 0 then (D̄) is infeasible, since by assuming ȳ to be a dual feasible solution
we have 0 ≤ ȳ^T A x^* ≤ c^T x^*, which is a contradiction; also, (P̄) is unbounded if it is
feasible. If b^T y^* > 0 and c^T x^* < 0 then both (P̄) and (D̄) are infeasible, which can
be seen in just the same way. □

The proof reveals that the construction (SP̄) cannot always determine which of the
alternatives in the theorem actually applies. It still is an open question whether a
variant of this approach can be found that does not solve an additional feasibility
problem, nor uses a 'big M'-parameter, and still identifies exactly which of the four
holds for a given pair of LP problems. Now we only have the following corollary.

Corollary 1.1.15 Let (x^*, y^*, ϑ^*, τ^*) be a strictly complementary solution of (SP̄).
If τ^* > 0 then (i) of Theorem 1.1.14 applies; if τ^* = 0 then one of (ii), (iii) or (iv)
holds.
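A small example of alternative (iv): for A = [ 1 -1 ; -1 1 ], b = (1, 1)^T and c = (-1, 0)^T the primal constraints x_1 - x_2 ≥ 1 and -x_1 + x_2 ≥ 1 are contradictory, and so are the dual constraints y_1 - y_2 ≤ -1 and -y_1 + y_2 ≤ 0; hence (P̄) and (D̄) are both infeasible, and by Corollary 1.1.15 any strictly complementary solution of (SP̄) built from these data must have τ^* = 0.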

Observe that there is ample freedom in the choice of the starting point. This is highly
attractive for warm-starting, when related but (slightly) perturbed LP problems
have to be solved.

1.2 SENSITIVITY ANALYSIS IN LINEAR PROGRAMMING

1.2.1 Introduction
The merits of LP are nowadays well-established and it is widely accepted as a useful
tool in Operations Research and Management Science. In many companies this way
of modeling is used to solve various kinds of practical problems. Applications in-
clude transportation problems, production planning, investment decision problems,
blending problems, location and allocation problems, among many others. Often
use is made of some standard code, most of which use a version of Dantzig's sim-
plex method as solution procedure (for a recent survey we refer to [31]). Many LP

packages do not only solve the problem at hand, but provide additional information
on the solution, in particular information on the sensitivity of the solution to cer-
tain changes in the data. This is referred to as sensitivity analysis or post optimal
analysis. This information can be of tremendous importance in practice, where pa-
rameter values may be estimates, where questions of type "What if... " are frequently
encountered, and where implementation of a specific solution may be difficult. Sen-
sitivity analysis serves as a tool for obtaining information about the bottlenecks and
degrees of freedom in the problem. Unfortunately, interpreting this information and
estimating its value is often difficult in practice; misuse is common, which may lead
to expensive mistakes (see e.g., Rubin and Wagner [30]). In the literature there are
several references where (often partially) the correct interpretation of sensitivity re-
sults is stressed. We mention Gal [8, 9], Ward and Wendell [36], Rubin and Wagner
[30], Greenberg [14], among others. The purpose of this section is threefold. Our
first objective is to convince the reader of a correct way of considering and applying
sensitivity analysis in LP. The important observation here is that knowledge of the
set of optimal solutions is needed, instead of knowing just one optimal solution. Sec-
ondly, we show that, contrary to a popular belief, sensitivity on the basis of interior
point methods is possible and even natural by using the optimal partition of the LP
problem. Research in this area was triggered by Adler and Monteiro [1] and Jansen
et al. [18] (see also Mehrotra and Monteiro [26]). Greenberg [15] has given some
examples where the interior approach has important practical influence. Thirdly, we
unify various viewpoints on sensitivity analysis, namely approaches using optimal
bases ('simplex approach'), optimal partitions ('interior approach'), or the optimal
value ('value approach'). This unification hinges on the fact that these are three
approaches by which the optimal set can be characterized.

1.2.2 Optimal Value Functions, Optimal Sets and Optimal Partitions

We consider the primal and dual LP problems (P) and (D) as introduced in Section
1.1.1. The sets of feasible solutions of (P) and (D) are denoted by P and D, whereas
the sets of optimal solutions are given by P^* and D^*. Let the index sets B and N
be defined as

    B := { i : x_i > 0 for some x ∈ P^* },
    N := { i : s_i > 0 for some (y, s) ∈ D^* }.

This partition is called the optimal partition and denoted by π = (B, N). Using the
optimal partition we may rewrite the primal and dual optimal sets as

    P^* = { x : Ax = b, x_B ≥ 0, x_N = 0 },
    D^* = { (y, s) : A^T y + s = c, s_B = 0, s_N ≥ 0 }.

Since we assume A to have full rank we can identify any feasible s ≥ 0 with a unique
y such that A^T y + s = c, and vice versa; hence we will sometimes just use y ∈ D^*
or s ∈ D^* instead of (y, s) ∈ D^*.
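For example, for min { x_1 : x_1 + x_2 = 1, x ≥ 0 } the unique primal optimal solution is x^* = (0, 1) and the unique dual optimal solution is y^* = 0 with s^* = (1, 0); hence the optimal partition is B = {2}, N = {1}, and indeed P^* = { x : x_1 + x_2 = 1, x_1 = 0, x_2 ≥ 0 } and D^* = { (y, s) : y + s_1 = 1, y + s_2 = 0, s_1 ≥ 0, s_2 = 0 } = { (0, (1, 0)) }.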

We will study the pair of LP problems (P) and (D) as their data vectors b and c
change; the matrix A will be constant throughout. Therefore, we index the
problems as (P(b, c)) and (D(b, c)). We denote the optimal value function by z(b, c).
We will call the pair (b, c) a feasible pair if the problems (P(b, c)) and (D(b, c)) are
both feasible. If (P(b, c)) is unbounded then we define z(b, c) := -∞, and if its dual
(D(b, c)) is unbounded then we define z(b, c) := ∞. If both (P(b, c)) and (D(b, c))
are infeasible then z(b, c) is undefined. Specifically we are interested in the behavior
of the optimal value function as one parameter changes. Although this is a severe
restriction, it is both common from a theoretical and a computational point of view,
since the multi-parameter case is very hard (see e.g. Ward and Wendell [36] for a
practical approximative approach). So, let Δb and Δc be given perturbation vectors
and define

    b(β) := b + β Δb,   f(β) := z(b(β), c),
    c(γ) := c + γ Δc,   g(γ) := z(b, c(γ)).

In the next lemma we prove a well-known elementary fact on the optimal value
function.

Lemma 1.2.1 The optimal value function f(β) is convex and piecewise linear in β,
while g(γ) is concave and piecewise linear in γ.

Proof: By definition

    f(β) = max_y { b(β)^T y : y ∈ D }.

If f(β) has a finite value, the optimal value is attained at the analytic center of one
of the faces of D (cf. Lemma 1.1.12). Since the number of faces is finite it holds

    f(β) = max_y { b(β)^T y : y ∈ S },

where S is a finite set, viz. the set of analytic centers of the faces of D. For each
y ∈ S we have

    b(β)^T y = b^T y + β Δb^T y,

which is linear in β. So f(β) is the maximum of a finite set of linear functions, which
implies the first statement. The second can be shown similarly. □

The proof of the lemma is an 'interior point variation' of a well-known proof using
for S the vertices of D. The intervals for β (or γ) on which the optimal value function
f(β) (or g(γ)) is linear are called linearity intervals. The points where the slope of
the optimal value function changes are called breakpoints.

We give here four questions a typical user might ask once an LP problem has been
solved for a certain value of, say, β:

Question 1 What is the rate of change the optimal value is affected with by a
change in β?
Question 2 In what interval may β be varied such that this rate of change is constant?
Question 3 In what interval may β be varied such that the optimal solution of (D)
obtained from our solution procedure remains optimal?
Question 4 What happens to the optimal solution of (P) obtained from our solution
procedure?

Questions 1 and 2 clearly have an intimate connection with the optimal value function.
It will need some analysis to show that the same is true for Questions 3 and
4. The answer to Question 1 must clearly be that the derivative (slope) of the
optimal value function is the rate at which the optimal value changes. This rate of
change is called the shadow price (in case of varying objective we speak of shadow
cost). However, if β is a breakpoint then we must distinguish between increasing
and decreasing β, since the rate of change is different in both cases. Moreover, the
shadow price is constant on a linear piece of the optimal value function. Hence the
answer to Question 2 must be a linearity interval. One of the reasons that Questions
3 and 4 are more involved is that the answer depends on the type of solution that is
computed by the solution procedure.

The next two lemmas show that the set of optimal solutions for (D(b(β), c)) (being
denoted by D^*_β) is constant on a linearity interval of f(β) and changes in its
breakpoints. Similar results can be obtained for variations in c and are therefore
omitted.

Lemma 1.2.2 If f(β) is linear on the interval [β_1, β_2] then the optimal set D^*_β is
constant on (β_1, β_2).


Proof: Let β̄ ∈ (β₁, β₂) be arbitrary and let ȳ ∈ D*_β̄ be arbitrary as well. Then

f(β̄) = b(β̄)^T ȳ,

and, since ȳ is feasible for all β,

f(β₁) ≥ b(β₁)^T ȳ   and   f(β₂) ≥ b(β₂)^T ȳ.

Using the linearity of f(β) on [β₁, β₂] yields

f(β̄) = (β₂−β̄)/(β₂−β₁) f(β₁) + (β̄−β₁)/(β₂−β₁) f(β₂)
      ≥ (β₂−β̄)/(β₂−β₁) b(β₁)^T ȳ + (β̄−β₁)/(β₂−β₁) b(β₂)^T ȳ = b(β̄)^T ȳ = f(β̄).

So all the above inequalities are equalities and we obtain f'(β) = Δb^T ȳ, which in
turn implies

f(β) = b(β)^T ȳ   for all β ∈ [β₁, β₂].                    (1.8)

Hence ȳ ∈ D*_β for all β ∈ [β₁, β₂]. From this we conclude that the sets D*_β are
constant for β ∈ (β₁, β₂). □

Corollary 1.2.3 Let f(β) be linear on the interval [β₁, β₂] and denote D̄* := D*_β
for arbitrary β ∈ (β₁, β₂). Then D̄* ⊆ D*_{β₁} and D̄* ⊆ D*_{β₂}.

Observe that the proof of the lemma reveals that Δb^T y must have the same value
for all y ∈ D*_β for all β ∈ (β₁, β₂). We will next deal with the converse implication.

Lemma 1.2.4 Let β₁ and β₂ be such that D*_{β₁} = D*_{β₂} =: D̄*. Then D*_β = D̄* for
β ∈ [β₁, β₂] and f(β) is linear on this interval.

Proof: Let ȳ ∈ D̄* be arbitrary. Then

f(β₁) = b(β₁)^T ȳ   and   f(β₂) = b(β₂)^T ȳ.

Consider the linear function h(β) := b(β)^T ȳ. Note that h(β₁) = f(β₁) and h(β₂) =
f(β₂). Since f is convex it thus holds that f(β) ≤ h(β) for β ∈ [β₁, β₂]. On the other
hand, since ȳ is feasible for all β we have

f(β) ≥ b(β)^T ȳ = h(β).

Hence f(β) is linear on [β₁, β₂] and ȳ ∈ D*_β for all β ∈ [β₁, β₂]. Hence D̄* is a subset
of the optimal set on (β₁, β₂). From Corollary 1.2.3 we know the reverse inclusion also
holds, hence for all β ∈ (β₁, β₂) the optimal set equals D̄*. □

As we have seen in the proof of Lemma 1.2.2 the quantity Δb^T y is the same for
all y ∈ D*_β for β in a linearity interval. The next lemma shows that this property
distinguishes a linearity interval from a breakpoint. Gauvin [11] was one of the first¹
to show this result and to emphasize the need to discriminate between left and right
shadow prices, i.e., between decreasing and increasing the parameter.

Lemma 1.2.5 Let f'₋(β) and f'₊(β) be the left and right derivative of f(·) in β.
Then

f'₋(β) = min_y { Δb^T y : y ∈ D*_β },

f'₊(β) = max_y { Δb^T y : y ∈ D*_β }.

Proof: We give the proof for f'₊(β); the one for f'₋(β) is similar. Let β̂ be in the
linearity interval just to the right of β and let ŷ ∈ D*_β̂. Then

f(β̂) = b(β̂)^T ŷ ≥ (b + β̂Δb)^T y,   ∀y ∈ D*_β.

Since ŷ ∈ D*_β by Corollary 1.2.3 we also have b(β)^T ŷ = b(β)^T y, ∀y ∈ D*_β. Hence

Δb^T y ≤ Δb^T ŷ,   ∀y ∈ D*_β.

Since ŷ ∈ D*_β and f'₊(β) = f'(β̂) = Δb^T ŷ this implies the result. □
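In the one-constraint example given after Lemma 1.2.1 (where f(β) = |β|), the dual
optimal set at the breakpoint β = 0 is the whole segment D*_0 = { y : −1 ≤ y ≤ 1 },
and Lemma 1.2.5 gives f'₋(0) = min{ Δb^T y : y ∈ D*_0 } = −1 and
f'₊(0) = max{ Δb^T y : y ∈ D*_0 } = +1, which are precisely the slopes of |β| on the
two neighboring linearity intervals.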

We now show how a linearity interval can be obtained.

Lemma 1.2.6 Let β₁, β₂ be two consecutive breakpoints of the optimal value func-
tion f(β). Let β̄ ∈ (β₁, β₂) and define D̄* := D*_β̄. Then

β₁ = min_{β,x} { β : Ax − βΔb = b, x ≥ 0, x^T s = 0 ∀s ∈ D̄* },

β₂ = max_{β,x} { β : Ax − βΔb = b, x ≥ 0, x^T s = 0 ∀s ∈ D̄* }.

¹ Personal communication 1992; Gauvin's paper is not mentioned in the historical survey by Gal
[9].

Proof: We will only give the proof for the minimization problem. By Lemma
1.2.2, D̄* is the optimal set for all β ∈ (β₁, β₂). Observe that the minimization
problem is convex; let (β*, x*) be a solution to it. Obviously x* is also optimal in
(P(b(β*), c)) with optimal value (b + β*Δb)^T y for arbitrary y ∈ D̄*. Hence β* ≥ β₁.
On the other hand, let x^(1) be optimal in (P(b(β₁), c)). By Corollary 1.2.3 it holds
(x^(1))^T s = 0, ∀s ∈ D̄*. Hence the pair (β₁, x^(1)) is feasible in the minimization
problem and we have β* ≤ β₁. This completes the proof. □

Reconsidering the results obtained above, we see that computation of linearity in-
tervals and shadow prices can be done unambiguously using optimal sets, in contrast
to what is usually done using just one optimal solution. Next we give three
approaches based on the use of optimal sets, motivated by three different but equiv-
alent ways of describing the optimal set. The first uses optimal partitions, the second
optimal values and the third (primal/dual) optimal bases.

1.2.3 Using Optimal Partitions


In Section 1.1 we showed that in each LP problem a strictly complementary solution
exists (Theorem 1.1.14); such a solution uniquely determines the optimal partition
of the LP problem. In this section we will analyze an approach to sensitivity anal-
ysis using optimal partitions. The important result is that the linearity intervals
of the optimal value function correspond to intervals where the optimal partition is
constant, while in the breakpoints different partitions occur. Recalling from Section
1.2.2 that the optimal partition gives a complete description of the set of optimal
solutions this should not be a surprise after having proved Lemmas 1.2.2 and 1.2.4.
This approach to sensitivity analysis is natural in the context of interior point meth-
ods. From Lemma 1.1.12 it follows that the limit point of the central path is a strictly
complementary solution, hence determines the optimal partition. Most interior point
methods intrinsically follow the central path and, as shown by Güler and Ye [17],
many of them actually yield a final iterate from which (at least theoretically) the
optimal partition can be obtained. Mehrotra and Ye [27] propose and analyze a
projection technique that yields the optimal partition in practice. Andersen and Ye
[3] apply a similar technique based on [17]. In this section we will show that we can
compute not only the linearity intervals but also the optimal partitions in the breakpoints;
moreover, when computing shadow prices we automatically obtain the optimal partitions in the
neighboring linearity intervals.

Perturbations in the Right-hand Side


As before we use the notation

b(β) := b + βΔb,   f(β) := z(b(β), c).

For each β the corresponding optimal partition and a strictly complementary optimal
solution will be denoted by π_β = (B_β, N_β) and (x(β), y(β), s(β)), respectively.

Lemma 1.2.7 Let the value function f(β) be linear for β ∈ [β₁, β₂]. Then π_β is
independent of β for all β ∈ (β₁, β₂).

Proof: Follows immediately from Lemma 1.2.2 after the observation that the opti-
mal partition exactly identifies the optimal set. □

Let us assume that β = 0 and β = 1 are two consecutive breakpoints of the optimal
value function f(β). We will show that the optimal partition in the linearity interval
0 < β < 1 can be determined from the optimal partition at the breakpoint β = 0
by computing the right shadow price at β = 0. To this end we define the following
primal-dual pair of LP problems²:

(P_Δb^↦)   min_x { c^T x : Ax = Δb, x_{N₀} ≥ 0 },

(D_Δb^↦)   max_{y,s} { (Δb)^T y : A^T y + s = c, s_{B₀} = 0, s_{N₀} ≥ 0 }.

Note that in (P_Δb^↦) the variables x_i, i ∈ B₀, are free, hence we need to define its
optimal partition π = (B, N) in this case. Let (x̄, ȳ, s̄) be a strictly complementary
solution of this pair of auxiliary problems. Since the dual variables s̄_i for i ∈ B₀ are
identically zero, it is natural to let them be elements of B. So, we have B = B₀ ∪ { i ∈
N₀ : s̄_i = 0 }. We now derive the following theorem.

Theorem 1.2.8 Let β ∈ (0,1) be arbitrary. For the primal-dual pair of problems
(P_Δb^↦) and (D_Δb^↦) it holds:
(i) The optimal partition is (B_β, N_β);
(ii) y(β) is optimal in (D_Δb^↦);
(iii) The optimal value (Δb)^T y(β) is the right shadow price at β = 0.

² The notation ↦ (and later ↤, ≺ and ≻) refers to the starting position and the direction of
change. For instance, ↦ means starting in the breakpoint and increasing the parameter; ≻ means
starting in a linearity interval and decreasing the parameter.

Proof: Note that (ii) and (iii) follow from Lemma 1.2.5. Let 0 < β < 1 be arbitrary
and consider

x̄ := (x(β) − x(0))/β.                    (1.9)

Since (x(0))_{N₀} = 0 we have x̄_{N₀} ≥ 0. Obviously Ax̄ = Δb, so x̄ is feasible in (P_Δb^↦).

Observe that the dual problem (D_Δb^↦) admits (y(β), s(β)) as a feasible solution. We
conclude the proof by showing that the pair (x̄, y(β), s(β)) is strictly complementary
and that it determines π_β = (B_β, N_β) as the optimal partition. Recall that the
support of x(β) is B_β and the support of x(0) is B₀. So, for i ∈ N₀ we have x̄_i > 0 if
and only if i ∈ N₀ \ N_β. On the other hand, if i ∈ N₀, then we have (s(β))_i > 0 if and
only if i ∈ N_β. This proves that the given pair of solutions is strictly complementary
with optimal partition π_β = (B_β, N_β). The statement in (ii) follows immediately.
Using (1.9), we obtain for β ∈ (0,1)

f(β) = c^T x(β) = c^T x(0) + βc^T x̄ = c^T x(0) + β(Δb)^T y(β),

which also shows (iii). □
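For a computational illustration of Theorem 1.2.8, the auxiliary problem (P_Δb^↦) can be
handed to any LP solver; the sketch below is illustrative only, assuming
scipy.optimize.linprog and hypothetical names A, c, delta_b, N0 for the data and for
the zero index set of the optimal partition at the breakpoint.

import numpy as np
from scipy.optimize import linprog

def right_shadow_price(A, c, delta_b, N0):
    # Solve (P_Δb^↦): min { c^T x : Ax = Δb, x_i >= 0 for i in N0, x_i free otherwise }.
    # By Theorem 1.2.8(iii), its optimal value is the right shadow price at the breakpoint.
    n = A.shape[1]
    bounds = [(None, None)] * n          # variables indexed by B0 remain free
    for i in N0:
        bounds[i] = (0, None)            # nonnegativity only on the N0 part
    res = linprog(c, A_eq=A, b_eq=delta_b, bounds=bounds, method="highs")
    return res.fun if res.status == 0 else None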

Starting from the breakpoint at β = 1 and using the optimal partition (B₁, N₁) a
similar result can be obtained by using the primal-dual pair of LP problems given
by:

(P_Δb^↤)   min_x { c^T x : Ax = −Δb, x_{N₁} ≥ 0 },

(D_Δb^↤)   max_{y,s} { −(Δb)^T y : A^T y + s = c, s_{B₁} = 0, s_{N₁} ≥ 0 }.

Without further proof we state the following theorem.

Theorem 1.2.9 Let β ∈ (0,1) be arbitrary. For the primal-dual pair of problems
(P_Δb^↤) and (D_Δb^↤) it holds:
(i) The optimal partition is (B_β, N_β);
(ii) y(β) is optimal in (D_Δb^↤);
(iii) The value (Δb)^T y(β) is the left shadow price at β = 1.

For future use we include the following result.

Lemma 1.2.10 If β ∈ (0,1) is arbitrary then it holds that (Δb)^T(y(β) − y(0)) > 0 and
(Δb)^T(y(1) − y(β)) > 0.

Proof: Theorem 1.2.8 shows that maximizing (Δb)^T y over the dual optimal face
gives y(β) as an optimal solution, and (Δb)^T y(β) as the right shadow price. As a
consequence of Theorem 1.2.9, minimizing (Δb)^T y over the optimal face gives the
left shadow price at β = 0; let ȳ denote an optimal solution for this problem. Since
the value function f(β) has a breakpoint at β = 0, its left and right derivatives are
different at β = 0, so we conclude (Δb)^T ȳ < (Δb)^T y(β). It follows that (Δb)^T y is
not constant on the dual optimal face. Since y(0) is an interior point of this face, we
conclude that (Δb)^T ȳ < (Δb)^T y(0) < (Δb)^T y(β), which implies the first result. An
analogous proof using β = 1 gives the second result. □

Now we consider the case that the optimal partition associated with some given linearity
interval is known. We will show that the breakpoints and the corresponding optimal
partitions can be found from the given partition and the perturbation vector Δb.
This is done by observing that we may write the problems in Lemma 1.2.6 as LP
problems.

For convenience we assume that β = 0 belongs to the linearity interval under con-
sideration, and that the surrounding breakpoints, if they exist, occur at β⁻ < 0 and
β⁺ > 0 respectively. To determine β⁻ we consider the following primal-dual pair of
problems.

(P_Δb^≻)   min_{β,x} { β : Ax − βΔb = b, x_{B₀} ≥ 0, x_{N₀} = 0 },

(D_Δb^≻)   max_{y,s} { b^T y : A^T y + s = 0, (Δb)^T y = −1, s_{B₀} ≥ 0 }.

Theorem 1.2.11 For the primal-dual pair of problems (P_Δb^≻) and (D_Δb^≻) it holds:
(i) The optimal partition is (B_{β⁻}, N_{β⁻});
(ii) x(β⁻) is optimal in (P_Δb^≻);
(iii) The optimal value is β⁻.

Proof: Items (ii) and (iii) follow in fact from Lemma 1.2.6. The proof of (i) follows
the same line of reasoning as the proof of Theorem 1.2.8. We construct feasible
solutions for both problems and prove that these solutions are strictly complementary
with the correct partition. Since (y(0), s(0)) is optimal in (D(b(β⁻), c)) (Corollary
1.2.3), we obtain the inclusion N₀ ⊆ N_{β⁻}. This shows that

x̄ := x(β⁻),   β̄ := β⁻

is feasible for (P_Δb^≻). We will show that

ȳ := (y(β⁻) − y(0)) / ((Δb)^T(y(0) − y(β⁻)))                    (1.10)

is feasible for (D_Δb^≻). First we deduce from Lemma 1.2.10 that (Δb)^T(y(0) − y(β⁻))
is positive, so ȳ is well defined. Clearly (Δb)^T ȳ = −1. Furthermore,

((Δb)^T(y(0) − y(β⁻))) A^T ȳ = A^T(y(β⁻) − y(0)) = s(0) − s(β⁻).

Since (s(0))_{B₀} = 0 and s(β⁻) ≥ 0, it follows that (s(0))_{B₀} − (s(β⁻))_{B₀} = −(s(β⁻))_{B₀} ≤
0. So ȳ is feasible for the dual problem. Since for i ∈ B₀ we have x̄_i > 0 if and only if
i ∈ B_{β⁻}, and s̄_i = 0 if and only if i ∈ B_{β⁻}, the given pair is strictly complementary
with the partition (B_{β⁻}, N_{β⁻}). This proves (i) and also (ii). To give also a proof of
(iii), by the linearity of the optimal value function on [β⁻, 0] it follows that

b(β⁻)^T y(β⁻) = b(β⁻)^T y(0),

or equivalently

b^T(y(β⁻) − y(0)) = β⁻ (Δb)^T(y(0) − y(β⁻)).                    (1.11)

Multiplying (1.10) with b^T we obtain that the optimal value equals

b^T(y(β⁻) − y(0)) / ((Δb)^T(y(0) − y(β⁻))) = β⁻,

where the equality follows from (1.11). □

The breakpoint β⁺ and the corresponding optimal partition can be found by solving
the pair of LP problems:

(P_Δb^≺)   max_{β,x} { β : Ax − βΔb = b, x_{B₀} ≥ 0, x_{N₀} = 0 },

(D_Δb^≺)   min_{y,s} { −b^T y : A^T y + s = 0, (Δb)^T y = 1, s_{B₀} ≥ 0 }.

Theorem 1.2.12 For the primal-dual pair of problems (P_Δb^≺) and (D_Δb^≺) it holds:
(i) The optimal partition is (B_{β⁺}, N_{β⁺});
(ii) x(β⁺) is optimal in (P_Δb^≺);
(iii) The optimal value is β⁺.
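To make Theorems 1.2.11 and 1.2.12 concrete, the two auxiliary problems can be solved
with an off-the-shelf LP code. The sketch below is illustrative only; it assumes
scipy.optimize.linprog and hypothetical names A, b, delta_b, N0 (the zero set of the
partition of the current linearity interval), and an unbounded auxiliary problem signals
that no breakpoint exists on that side.

import numpy as np
from scipy.optimize import linprog

def surrounding_breakpoints(A, b, delta_b, N0):
    # Lemma 1.2.6 / Theorems 1.2.11-1.2.12: optimize beta subject to
    # Ax - beta*delta_b = b, x >= 0 and x_i = 0 for i in N0.
    m, n = A.shape
    A_eq = np.hstack([A, -delta_b.reshape(-1, 1)])     # variables (x, beta)
    bounds = [(0, None)] * n + [(None, None)]          # x >= 0, beta free
    for i in N0:
        bounds[i] = (0, 0)                             # x_i fixed to zero on N0
    ends = []
    for sign in (+1.0, -1.0):                          # +1: beta^-, -1: beta^+
        obj = np.zeros(n + 1)
        obj[-1] = sign
        res = linprog(obj, A_eq=A_eq, b_eq=b, bounds=bounds, method="highs")
        if res.status == 3:                            # unbounded: no breakpoint
            ends.append(-np.inf if sign > 0 else np.inf)
        else:
            ends.append(res.x[-1])
    return ends[0], ends[1]                            # (beta^-, beta^+)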

Perturbations in the Objective

Let us now consider the effect of variations in the objective vector c on the optimal
value function. By 'dualizing' the results above we obtain the appropriate results.
Just as in the previous section we show that the 'surrounding' partitions of a given
partition can be found by solving appropriate LP problems, which are formulated in
terms of the given partition and the perturbation Δc. The proofs are based on the
same idea as for their dual counterparts: one checks that some natural candidate
solutions for both problems are feasible indeed, and then shows that these solutions
are strictly complementary with the correct partition. Therefore, we state these
results without proofs. The discussion is facilitated by using

c(γ) := c + γΔc,   g(γ) := z(b, c(γ)),

where b and c are such that the pair (b, c) is feasible. For each γ we will denote
the corresponding optimal partition by π_γ = (B_γ, N_γ) and strictly complementary
solutions by (x(γ), y(γ), s(γ)). We start with the case that the given partition belongs
to a breakpoint. Without loss of generality we assume again that γ = 0 and γ = 1
are two consecutive breakpoints of g(γ).

Consider the following pair of LP problems.

(P_Δc^↦)   min_x { (Δc)^T x : Ax = b, x_{B₀} ≥ 0, x_{N₀} = 0 },

(D_Δc^↦)   max_{y,s} { b^T y : A^T y + s = Δc, s_{B₀} ≥ 0 }.

Theorem 1.2.13 Let γ ∈ (0,1) be arbitrary. For the primal-dual pair of problems
(P_Δc^↦) and (D_Δc^↦) it holds:
(i) The optimal partition is (B_γ, N_γ);
(ii) x(γ) is optimal in (P_Δc^↦);
(iii) The optimal value (Δc)^T x(γ) is the right shadow cost at γ = 0.

A similar result can be obtained for the optimal partition at γ = 1. Defining the
pair of LP problems

(P_Δc^↤)   max_x { (Δc)^T x : Ax = b, x_{B₁} ≥ 0, x_{N₁} = 0 },

(D_Δc^↤)   min_{y,s} { −b^T y : A^T y + s = −Δc, s_{B₁} ≥ 0 },

one has

Theorem 1.2.14 Let γ ∈ (0,1) be arbitrary. For the primal-dual pair of problems
(P_Δc^↤) and (D_Δc^↤) it holds:
(i) The optimal partition is (B_γ, N_γ);
(ii) x(γ) is optimal in (P_Δc^↤);
(iii) The optimal value (Δc)^T x(γ) is the left shadow cost at γ = 1.
Using these results we derive the following corollary.

Corollary 1.2.15 It holds that (Δc)^T(x(γ) − x(0)) < 0 and (Δc)^T(x(1) − x(γ)) < 0 for
arbitrary γ ∈ (0,1).

The last two results concern the determination of the size of the linearity interval and
the optimal partition in the breakpoints, given that the optimal partition associated
with the linearity interval is known. Assume that γ = 0 belongs to the linearity interval
under consideration, and that the surrounding breakpoints, if they exist, occur at
γ⁻ and γ⁺ respectively. We consider the following pair of problems.

(P_Δc^≻)   max_x { −c^T x : Ax = 0, (Δc)^T x = 1, x_{N₀} ≥ 0 },

(D_Δc^≻)   min_{γ,y,s} { γ : A^T y + s − γΔc = c, s_{B₀} = 0, s_{N₀} ≥ 0 }.

We now state

Theorem 1.2.16 For the primal-dual pair of problems (P_Δc^≻) and (D_Δc^≻) it holds:
(i) The optimal partition is (B_{γ⁻}, N_{γ⁻});
(ii) y(γ⁻) is optimal in (D_Δc^≻);
(iii) The optimal value is γ⁻.

Similarly, the breakpoint γ⁺ is obtained from the pair of LP problems:

(P_Δc^≺)   min_x { c^T x : Ax = 0, (Δc)^T x = −1, x_{N₀} ≥ 0 },

(D_Δc^≺)   max_{γ,y,s} { γ : A^T y + s − γΔc = c, s_{B₀} = 0, s_{N₀} ≥ 0 }.

Theorem 1.2.17 For the primal-dual pair of problems (P_Δc^≺) and (D_Δc^≺) it holds:
(i) The optimal partition is (B_{γ⁺}, N_{γ⁺});
(ii) y(γ⁺) is optimal in (D_Δc^≺);
(iii) The optimal value is γ⁺.
1.2.4 Using Optimal Values
In Section 1.2.2 we showed that correct shadow prices and linearity intervals can be
obtained by solving appropriate LP problems over the optimal face of the original
primal or dual problem, that is, knowledge of the set of optimal solutions is needed
instead of just one solution. However, once the optimal value z* of the LP problem
is known, we can just as well describe the optimal faces as follows:

{ x : Ax = b, x ≥ 0, c^T x = z* },
{ (y, s) : A^T y + s = c, s ≥ 0, b^T y = z* }.

Replacing the description using optimal partitions by the description using the op-
timal value, the results in Section 1.2.2 remain valid. For instance, linearity intervals of
f(β) are computed by (cf. Lemma 1.2.6)

β₁ = min_{β,x} { β : Ax − βΔb = b, x ≥ 0, c^T x = (b + βΔb)^T y* },

β₂ = max_{β,x} { β : Ax − βΔb = b, x ≥ 0, c^T x = (b + βΔb)^T y* },

where y* ∈ D*. Similarly, left and right shadow prices are found by

f'₋(β) = min_{y,s} { Δb^T y : A^T y + s = c, s ≥ 0, (b + βΔb)^T y = (b + βΔb)^T y* },

f'₊(β) = max_{y,s} { Δb^T y : A^T y + s = c, s ≥ 0, (b + βΔb)^T y = (b + βΔb)^T y* }.

An advantage of the approach is that we do not need to know the optimal partition,
just the optimal value. In the literature few explicit references to this idea can be
found, e.g., Akgül [2], De Jong [19], Gondzio and Terlaky [13] and Mehrotra and
Monteiro [26]. Similar ideas appear in Magnanti and Wong [23], who use a subprob-
lem defined on the optimal set to compute certain cuts in Benders decomposition [5]
and in Terlaky [33], who considers marginal values in lp-programming.
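The optimal-value description lends itself directly to computation: with only A, b, c,
Δb and one optimal dual solution y* (so that the optimal value equals b^T y*) at hand,
the left and right shadow prices follow from two LPs over the dual optimal face. A
sketch, again assuming scipy.optimize.linprog and illustrative variable names, could
look as follows; the interval endpoints β₁, β₂ are obtained in the same way from the
first pair of problems above.

import numpy as np
from scipy.optimize import linprog

def shadow_prices(A, b, c, delta_b, y_star):
    # Optimize delta_b^T y over { (y, s) : A^T y + s = c, s >= 0, b^T y = b^T y* },
    # the dual optimal face described by the optimal value b^T y*.
    m, n = A.shape
    A_eq = np.vstack([np.hstack([A.T, np.eye(n)]),
                      np.hstack([b.reshape(1, -1), np.zeros((1, n))])])
    b_eq = np.concatenate([c, [b @ y_star]])
    bounds = [(None, None)] * m + [(0, None)] * n      # y free, s >= 0
    vals = []
    for sign in (+1.0, -1.0):                          # +1: left, -1: right shadow price
        obj = np.concatenate([sign * delta_b, np.zeros(n)])
        res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
        vals.append(sign * res.fun)
    return vals[0], vals[1]                            # (f'_-(0), f'_+(0))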

1.2.5 Using Optimal Bases

When the simplex method is used for solving an LP problem an optimal basic solution
is obtained. A basis of A is a set of m indices, denoted by B, such that the submatrix
A_B of A is nonsingular. The corresponding variables are called the basic variables.
The indices of the remaining nonbasic variables are in N. Given a basis B, the
associated primal basic solution x is given by

x_B = A_B^{-1} b,   x_N = 0,

and the dual basic solution by

y = A_B^{-T} c_B,   s = (s_B; s_N) := (0; c_N − A_N^T y).

If x_B ≥ 0 then B is a primal feasible basis; if s_N ≥ 0 then B is dual feasible. We
call the basis optimal if it is both primal and dual feasible; a basis is called primal
optimal if the associated primal basic solution is optimal for (P); analogously, a basis
is called dual optimal if the associated dual basic solution is optimal for (D). Note
that a primal (dual) optimal basis need not be dual (primal) feasible. A basis B
is called primal degenerate if the associated primal solution x has x_i = 0 for some
i ∈ B. Analogously, a basis B is called dual degenerate if the associated dual solution
s has s_i = 0 for some i ∈ N.
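The computations behind these definitions are elementary linear algebra; a small numpy
sketch (illustrative names only, with B given as a list of m column indices) could read as
follows. The second function performs the classical ratio test that yields, for a right-hand-
side perturbation b(β) = b + βΔb, the interval T_B discussed below (dual feasibility of B is
unaffected by a change of b).

import numpy as np

def basic_solutions(A, b, c, B):
    # Primal and dual basic solutions associated with the basis B.
    m, n = A.shape
    N = [j for j in range(n) if j not in B]
    x = np.zeros(n)
    x[B] = np.linalg.solve(A[:, B], b)        # x_B = A_B^{-1} b, x_N = 0
    y = np.linalg.solve(A[:, B].T, c[B])      # y = A_B^{-T} c_B
    s = np.zeros(n)
    s[N] = c[N] - A[:, N].T @ y               # s_B = 0, s_N = c_N - A_N^T y
    return x, y, s, bool(np.all(x[B] >= 0)), bool(np.all(s[N] >= 0))

def basis_rhs_interval(A, b, delta_b, B):
    # Ratio test: interval of beta keeping x_B(beta) = A_B^{-1}(b + beta*delta_b) >= 0.
    xB = np.linalg.solve(A[:, B], b)
    dB = np.linalg.solve(A[:, B], delta_b)
    lo = max((-xB[i] / dB[i] for i in range(len(B)) if dB[i] > 0), default=-np.inf)
    hi = min((-xB[i] / dB[i] for i in range(len(B)) if dB[i] < 0), default=np.inf)
    return lo, hi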

Shadow Prices and Shadow Costs


One important aspect of postoptimal analysis is the determination of shadow prices
(shadow costs), that is, the rate at which the optimal objective value changes as a
result of a small change in an element of the right-hand side b (or the objective c).
As follows from Lemma 1.2.5 the left and right shadow prices (costs) can be obtained
from solving auxiliary LP problems. Let Δb := e^(i), where e^(i) is the ith unit vector,
and let us denote the left and right shadow prices by p_i⁻ and p_i⁺. Then it follows that

p_i⁻ = min { y_i^{(k)} : k = 1, …, K },   p_i⁺ = max { y_i^{(k)} : k = 1, …, K },    (1.12)

where y^{(k)}, k = 1, …, K, are the dual optimal basic solutions. After Gauvin [11]
analogous results have been derived in [2], [4], [14], [20].

The theory shows that in case of multiple optimal dual basic solutions (primal de-
generacy) one has to distinguish between the rate of change as a consequence of
decreasing and increasing the parameter β. In this case, the widespread belief that
the shadow price is given by the dual value is not valid. Rubin and Wagner [30]
indicate the traps and give a number of tips for correct interpretation of results of
the dual problem in practice. Analogously, shadow costs are not uniquely defined
in a breakpoint of the optimal value function g(γ) (cf. Greenberg [14]). This leads
to the introduction of left and right shadow costs for which similar results can be
derived.

Linearity Intervals
The classical approach to sensitivity analysis is to pose the question in what interval
the objective coefficient c_j (or right-hand side b_i) can vary such that the given
(computed) optimal basis B remains an optimal basis. To clarify this and other
approaches we consider the case of a varying right-hand side, and assume that Δb =
e^(i). Hence we are interested in the problem (P(b(β), c)) and its dual. Let us denote
by T_B the interval for β for which B is an optimal basis. It is easy to see that

T_B = { β : { (x, y, s) : Ax = b + βΔb, x_B ≥ 0, x_N = 0,
                          A^T y + s = c, s_B = 0, s_N ≥ 0 } ≠ ∅ }.
It is well known that T_B is indeed an interval, and that it can be computed at low cost
by twice computing m ratios and comparing them. The reason that this approach
may give different answers from different LP packages is explained by the degeneracy
apparent in the problem, whence the optimal basis might not be unique and/or the
optimal primal or dual solution might not be unique. In Section 1.2.2 it was shown
that the optimal set should be used. Using bases, this implies (by definition) that
the dual optimal bases are required. Let y* be the optimal basic solution obtained for
the original problem; then we denote the set of dual optimal bases associated with y*
by S(y*). Ward and Wendell [36] introduce the optimal coefficient set of an optimal
solution y* of (D(b, c)) as

T(y*) := { β : y* is an optimal solution of (D(b(β), c)) }.
A similar definition is given by Mehrotra and Monteiro [26]. Let us also define

R(y*) := { β : f(β) = f(0) + βy*_i }.

Since y* is optimal in (D(b, c)), R(y*) is either a linearity interval of f(β) with slope
y*_i, or the set {0}; in the latter case β = 0 is a breakpoint of f(β). The following
lemma contains the main result.

Lemma 1.2.18
(i) If y* is an optimal solution of (D(b, c)) then T(y*) = R(y*);
(ii) If y* is an optimal basic solution of (D(b, c)) then T(y*) = ∪_{B ∈ S(y*)} T_B.

Proof: (i) For β ∈ T(y*) it holds

f(β) = (b + βe^(i))^T y* = b^T y* + βy*_i = f(0) + βy*_i,

so β ∈ R(y*). Now letting β ∈ R(y*) we have

f(β) = f(0) + βy*_i = b^T y* + βy*_i = (b + βe^(i))^T y*,

which shows that y* is optimal in (D(b(β), c)).
(ii) If β ∈ ∪_{B ∈ S(y*)} T_B, then clearly y* is optimal in (D(b(β), c)), so β ∈ T(y*).
Conversely, if β ∈ T(y*) there is a basis B which is optimal in (P(b(β), c)) and
associated with y*; so β ∈ T_B. Since B is dual feasible for (P(b(β), c)) it is dual
feasible for (P(b, c)). Hence B ∈ S(y*) by the definition of a dual optimal basis. □
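In the one-constraint example used earlier (f(β) = |β| with Δb = e^(1) and b = 0), every
y ∈ [−1, 1] is dual optimal at β = 0, and the dual optimal basic solutions are y* = 1
and ȳ* = −1. For y* = 1 one finds R(y*) = { β : |β| = β } = [0, ∞) = T(y*), as Lemma
1.2.18(i) predicts, while T(ȳ*) = (−∞, 0]; the two optimal coefficient sets intersect only
in {0}, which is exactly the situation excluded in the hypothesis of the next lemma.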

A few remarks are in order. Item (ii) of the lemma was shown by Ward and Wendell
[36, Th. 17]. Note that the basis B used in its proof is dual feasible for (P(b, c)) but
not necessarily primal feasible. From Lemma 1.2.18 we may conclude that either
the optimal basic solution is only optimal in the breakpoint, or it corresponds to a
linearity interval of the optimal value function in the sense that for each value of the
parameter in this interval this solution is an optimal solution of the corresponding
problem. If fJ = 0 is a breakpoint of f(f3) then obviously there must exist more than
one optimal basic solutions of (P( b, c». The following lemma implies that whenever
the intersection of optimal coefficient sets corresponding to different optimal basic
solutions is nontrivial, then the sets coincide.

Lemma 1.2.19 Let y* and ȳ* be optimal basic solutions of (D(b, c)) and let T(y*) ∩
T(ȳ*) ≠ {0}. Then T(y*) = T(ȳ*).

Proof: By assumption, there exists β̄ ≠ 0 such that

f(β̄) = f(0) + β̄y*_i = b^T y* + β̄y*_i,
f(β̄) = f(0) + β̄ȳ*_i = b^T ȳ* + β̄ȳ*_i.

From b^T y* = b^T ȳ* we may conclude y*_i = ȳ*_i. From this the result immediately
follows. □

To the best of our knowledge, all commercial LP packages offering the opportunity
of performing sensitivity analysis take the approach using one optimal basis, inde-
pendently of whether degeneracy is present or not; this approach is also standard
in textbooks, often without reference to degeneracy problems. Earlier attempts have
been made to circumvent the obvious shortcomings of the classical approach, see
e.g., [6, 20, 14, 8, 9]. They suggest computing the interval for β where at least
one of the optimal bases associated with y* remains optimal. Obviously the overall
critical region given by such an approach is the union of intervals, each being one
where an optimal basis remains optimal. This requires more computational effort,
since (possibly) all optimal bases have to be generated. Evans and Baker [6] suggest
solving a sequence of LP problems to find this interval. Knolmayer [20] proposes an
algorithm which does not need to generate all optimal bases associated with y*; how-
ever, the statement of his algorithm is neither quite clear nor complete. Gal [10] provides
a parametric algorithm inspired by [22] that does not necessarily need all optimal
bases associated with y*. However, this approach still does not always generate the
complete linearity interval as desired.

1.3 CONCLUDING REMARKS


Some of the results derived in this chapter will be heavily used in subsequent chap-
ters. First of all, the central path is the basic guideline to optimality in all IPMs,
for LP as well as for convex programming. Most IPMs for LP generate a strictly
complementary solution in the limit, as shown by Güler and Ye [17]. This is very
important in asymptotic analysis of IPMs.

Acknowledgements
The first author is supported by the Dutch Organization for Scientific Research
(NWO), grant 611-304-028. Currently he is working at Centre for Quantitative
Methods (CQM) B.V., Eindhoven, The Netherlands.

REFERENCES
[1] I. Adler and R.D.C. Monteiro. A geometric view of parametric linear program-
ming. Algorithmica, 8:161-176, 1992.

[2] M. Akgül. A note on shadow prices in linear programming. J. Opl. Res. Soc.,
35:425-431, 1984.

[3] E.D. Andersen and Y. Ye. Combining interior-point and pivoting algorithms for
linear programming. Technical Report, Department of Management Sciences,
University of Iowa, Iowa City, USA, 1994.

[4] D.C. Aucamp and D.I. Steinberg. The computation of shadow prices in linear
programming. J. Opl. Res. Soc., 33:557-565, 1982.

[5] J.F. Benders. Partitioning procedures for solving mixed variables programming
problems. Numerische Mathematik, 4:238-252, 1962.

[6] J.R. Evans and N.R. Baker. Degeneracy and the (mis)interpretation of sensi-
tivity analysis in linear programming. Decision Sciences, 13:348-354, 1982.

[7] A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Un-
constrained Minimization Techniques. John Wiley & Sons, New York, 1968.
(Reprint: Volume 4 of SIAM Classics in Applied Mathematics, SIAM Publica-
tions, Philadelphia, USA, 1990).

[8] T. Gal. Postoptimal analyses, parametric programming and related topics.
McGraw-Hill Inc., New York/Berlin, 1979.

[9] T. Gal. Shadow prices and sensitivity analysis in linear programming under
degeneracy, state-of-the-art-survey. OR Spektrum, 8:59-71, 1986.

[10] T. Gal. Weakly redundant constraints and their impact on postoptimal analyses
in LP. Diskussionsbeitrag 151, FernUniversität Hagen, Hagen, Germany, 1990.

[11] J. Gauvin. Quelques précisions sur les prix marginaux en programmation lin-
éaire. INFOR, 18:68-73, 1980. (In French).

[12] A.J. Goldman and A.W. Tucker. Theory of linear programming. In H.W. Kuhn
and A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of
Mathematical Studies, No. 38, pages 53-97. Princeton University Press, Prince-
ton, New Jersey, 1956.
[13] J. Gondzio and T. Terlaky. A computational view of interior-point methods
for linear programming. In J. Beasley, editor, Advances in linear and integer
programming. Oxford University Press, Oxford, UK, 1995.

[14] H.J. Greenberg. An analysis of degeneracy. Naval Research Logistics Quarterly,
33:635-655, 1986.

[15] H.J. Greenberg. The use of the optimal partition in a linear programming so-
lution for postoptimal analysis. Operations Research Letters, 15:179-186, 1994.

[16] O. Güler, C. Roos, T. Terlaky, and J.-Ph. Vial. Interior point approach to
the theory of linear programming. Cahiers de Recherche 1992.3, Faculté des
Sciences Économiques et Sociales, Université de Genève, Genève, Switzerland,
1992. (To appear in Management Science).

[17] O. Güler and Y. Ye. Convergence behavior of interior-point algorithms. Math-
ematical Programming, 60:215-228, 1993.

[18] B. Jansen, C. Roos, and T. Terlaky. An interior point approach to postopti-
mal and parametric analysis in linear programming. Technical Report 92-21,
Faculty of Technical Mathematics and Computer Science, Delft University of
Technology, Delft, The Netherlands, 1992.

[19] J.J. de Jong. A computational study of recent approaches to sensitivity analysis
in linear programming: optimal basis, optimal partition and optimal value ap-
proach. Master's thesis, Delft University of Technology, Delft, The Netherlands,
1993.
[20] G. Knolmayer. The effects of degeneracy on cost-coefficient ranges and an
algorithm to resolve interpretation problems. Decision Sciences, 15:14-21, 1984.
[21] M. Kojima, S. Mizuno, and A. Yoshise. A primal-dual interior point algorithm
for linear programming. In N. Megiddo, editor, Progress in Mathematical Pro-
gramming: Interior Point and Related Methods, pages 29-47. Springer Verlag,
New York, 1989.

[22] T.L. Magnanti and J.B. Orlin. Parametric linear programming and anti-cycling
pivoting rules. Mathematical Programming, 41:317-325, 1988.

[23] T.L. Magnanti and R.T. Wong. Accelerating Benders decomposition: algorith-
mic enhancement and model selection criteria. Operations Research, 29:464-484,
1981.
[24] L. McLinden. The analogue of Moreau's proximation theorem, with applications
to the nonlinear complementarity problem. Pacific Journal of Mathematics,
88:101-161,1980.
[25] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo,
editor, Progress in Mathematical Programming: Interior Point and Related
Methods, pages 131-158. Springer Verlag, New York, 1989.
[26] S. Mehrotra and R.D.C. Monteiro. Parametric and range analysis for interior
point methods. Technical Report, Dept. of Systems and Industrial Engineering,
University of Arizona, Tucson, AZ, USA, 1992.
[27] S. Mehrotra and Y. Ye. Finding an interior point in the optimal face of linear
programs. Mathematical Programming, 62:497-515, 1993.
[28] R.D.C. Monteiro and I. Adler. Interior path following primal-dual algorithms:
Part I : Linear programming. Mathematical Programming, 44:27-41, 1989.
[29] R.T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New
Jersey, 1970.
[30] D.S. Rubin and H.M. Wagner. Shadow prices: tips and traps for managers and
instructors. Interfaces, 20:150-157, 1990.
[31] R. Sharda. Linear programming software for personal computers: 1992 survey.
OR/MS Today, pages 44-60, June 1992.
[32] G. Sonnevend. An "analytic center" for polyhedrons and new classes of global al-
gorithms for linear (smooth, convex) programming. In A. Prekopa, J. Szelezsan,
and B. Strazicky, editors, System Modelling and Optimization: Proceedings
of the 12th IFIP-Conference held in Budapest, Hungary, September 1985, vol-
ume 84 of Lecture Notes in Control and Information Sciences, pages 866-876.
Springer Verlag, Berlin, Germany, 1986.
[33] T. Terlaky. On lp-programming. European Journal of Operational Research,
22:70-100, 1985.
[34] T. Terlaky and S. Zhang. Pivot rules for linear programming: a survey on recent
theoretical developments. Annals of Operations Research, 46:203-233, 1993.

[35] A.W. Tucker. Dual systems of homogeneous linear relations. In H.W. Kuhn
and A.W. Tucker, editors, Linear Inequalities and Related Systems, Annals of
Mathematical Studies, No. 38, pages 3-18. Princeton University Press, Prince-
ton, New Jersey, 1956.
[36] J .E. Ward and R.E. Wendell. Approaches to sensitivity analysis in linear pro-
gramming. Annals of Operations Research, 27:3-38, 1990.
[37] X. Xu, P.-F. Hung, and Y. Ye. A simplified homogeneous and self-dual linear
programming algorithm and its implementation. Technical Report, Department
of Mathematics, University of Iowa, Iowa City, Iowa, USA, 1994.

[38] Y. Ye, M.J. Todd, and S. Mizuno. An O(√nL)-iteration homogeneous and
self-dual linear programming algorithm. Mathematics of Operations Research,
19:53-67, 1994.
2
AFFINE SCALING ALGORITHM
Takashi Tsuchiya
The Institute of Statistical Mathematics
Department of Prediction and Control
4-6-7 Minami-Azabu, Minato-ku, Tokyo 106 Japan
e-mail: tsuchiya@sun312.ism.ac.jp

ABSTRACT
The affine scaling algorithm, proposed by the Russian mathematician Dikin in 1967, is the
first interior point algorithm. The algorithm is simple and efficient, and is
known as the first interior point algorithm which suggested that an interior point algorithm
can outperform the existing simplex algorithm. The polynomiality status of the algorithm
is still an open question, but a number of papers have revealed its deep and beautiful
mathematical structures related to other interior point algorithms. In this paper we survey
interesting convergence results on the affine scaling algorithm.

2.1 INTRODUCTION
The affine scaling algorithm is the first interior point method (IPM) for linear
programming (LP); it was proposed by Dikin in 1967 [12], and is known as one of the
simplest and most efficient interior point algorithms. The algorithm was rediscovered by
several researchers including Barnes [7], Cavalier and Soyster [10], Karmarkar and
Ramakrishnan [29], Kortanek and Shi [30] and Vanderbei et al. [68] after Karmarkar
proposed his famous projective scaling algorithm in 1984 [28]. Each step of the
algorithm is (i) to construct an ellipsoid called "the Dikin ellipsoid," and (ii) then to
take a step in the direction which minimizes the objective function over the ellipsoid
to obtain the next iterate. This simple procedure turned out to be efficient in solving
fairly large LP problems in a few dozen iterations [1, 2, 11, 21, 29, 32, 34, 35,
44, 50]. The algorithm is known as the first IPM which suggested that an IPM can
outperform the simplex algorithms for large problems [1, 2]. It was also used in


the KORBX system, which is one of the first commercial IPM packages developed by
AT&T [11].

Due to its practical and theoretical importance, there have been a number of papers
which gradually revealed interesting mathematical structures of the affine scaling
algorithm [3, 7, 8, 13, 15, 16, 20, 23, 26, 31, 33, 47, 48, 59, 62, 63, 64, 65, 66, 71,
68, 67]. The purpose of this survey is to shed light on mathematical structures and
convergence results on the affine scaling algorithm.

This survey is organized as follows. In §2, we introduce problems, notations and


preliminary results of the duality theory for linear programming.

In §3, we explain the affine scaling algorithm. We introduce the Dikin ellipsoid
which is an ellipsoid inscribed in the feasible region of the (primal) LP problem.
This ellipsoid is centered at the current iterate and has certain invariance properties.
The search direction of the algorithm is defined as the direction which minimizes the
objective function over this ellipsoid. We introduce two versions of the algorithm; the
short-step version which takes a step within the ellipsoid and the long-step version
which uses the ellipsoid only for deriving the search direction and moves with a step
in terms of a fraction λ of the way to the boundary of the feasible region. We also
define a dual estimate, which is a reasonable estimate of an optimal solution of the
dual problem based on the current iterate.

In §4, we introduce various nondegeneracy assumptions for analyzing convergence of
the affine scaling algorithm. These nondegeneracy assumptions, which simplify the
convergence analysis, are closely related to the nondegeneracy assumptions made for the
simplex method. We summarize the convergence results on the affine scaling algorithm under
these nondegeneracy assumptions. Briefly speaking, if the problem is nondegener-
ate, the primal iterates and the dual estimates generated by the long-step algorithm
converge to relative interior points of the optimal faces of the primal and dual prob-
lems, respectively, for any 0 < λ < 1. In the general case, the same algorithm with
0 < λ ≤ 2/3 generates primal iterates convergent to a relative interior point of the
optimal face of the primal problem and dual estimates convergent to the relative
analytic center of the optimal face of the dual problem.

In §5, we observe some basic properties of the generated sequence such as convergence
to a unique point, asymptotic linear convergence of the objective function values,
etc.

Then we prove global convergence of the long-step algorithm under a nondegeneracy
assumption in §6. The primal iterates and the dual estimates are shown to converge
to relative interior points of the optimal faces of the primal and dual, respectively,
for 0 < λ < 1. This result is due to Dikin [13].

In §7, we deal with the global convergence results on the long-step algorithm obtained
by Dikin [15] and Tsuchiya and Muramatsu [65] without nondegeneracy assumptions.
In this case, for 0 < λ ≤ 2/3, we can show that the primal iterates converge to a
relative interior point of the optimal face of the primal problem, while the dual
estimates converge to the relative analytic center of the optimal face of the dual
problem. We do not make a complete analysis there but outline the idea. A local
potential function plays an important role in the analysis. We introduce this function
and illustrate how it is used to prove the main result mentioned above.

From §8 to §11, the topics are more related to the special case where the feasible
region is a polyhedral cone. (We call this case "homogeneous.") We start §8 by
explaining why it is important to study the algorithm applied to homogeneous prob-
lems. Then we show that the algorithm is nothing but a version of the Karmarkar
algorithm in such a case and observe that it is also directly connected to the
Newton method for obtaining the analytic center of a polyhedron. We give an alterna-
tive proof of polynomiality of the Karmarkar algorithm based on this interpretation.
Interestingly, this proof was derived in the analysis of the global convergence of the
long-step affine scaling algorithm and plays a central role in the global convergence
proof of the affine scaling algorithm.

In §9, we revisit the global convergence analysis of the algorithm for general prob-
lems, and explain how the analysis of the homogeneous case comes into the global
convergence analysis for general problems.

It was shown by Tsuchiya and Muramatsu that λ = 2/3 in their global convergence
result [65] mentioned above is tight in ensuring convergence of the dual estimates to
the analytic center of the dual optimal face, and more strongly, Hall and Vanderbei
[26] showed that the dual estimate is not ensured to be convergent if λ > 2/3. In §10,
we explain why two-thirds is an upper bound concerning convergence of the dual
estimates, by using the relationship between the Newton method for the analytic
center and the homogeneous affine scaling algorithm exploited in §8. Then in §11, as
another application of this relation, we show that a superlinearly convergent version of
the affine scaling algorithm is obtained by controlling the step-sizes carefully. These
results were obtained by Tsuchiya and Monteiro [64].

It is an interesting question what is the largest step-size which ensures global con-
vergence of the primal iterates of the affine scaling algorithm. As was mentioned
above, the primal iterates converge to an optimal solution if 0 < λ ≤ 2/3. In the fall
of 1993, Mascarenhas found an interesting instance where the algorithm cannot be
convergent to an optimal solution when λ = 0.999 [31]. In §12, we review this result,
and give a plausible explanation by, again, making use of the relationship between
the homogeneous affine scaling algorithm and the Newton method for the analytic
center.

Finally, in §13, we make a concluding remark, briefly reviewing several other interesting
topics which could not be included in this survey because of limitations of space and time.
We close the survey by suggesting several challenging open questions associated with
this algorithm.

2.2 PROBLEM AND PRELIMINARIES


We deal with the following standard form LP problem

minimize_x   c^T x
subject to   Ax = b,  x ≥ 0,                    (2.1)

and its associated dual problem

maximize_{(y,s)}   b^T y
subject to         A^T y + s = c,  s ≥ 0,        (2.2)

where A ∈ R^{m×n}, c, x, s ∈ R^n and b, y ∈ R^m.

Let us begin with some definitions, notation and a description of the duality theory for
linear programming. We cite [49] as a standard textbook for these basic results.

Given a vector v, ∥v∥, ∥v∥₁ and ∥v∥_∞ are the Euclidean norm, 1-norm and ∞-norm of
v. max[v] means the maximum component of v. For an index set J, we denote by
v_J the subvector of v composed of the elements associated with J. Similarly, for a
matrix M, M_J denotes the matrix composed of the columns of M associated with the
index set J. We define M^J ≡ (M_J)^T.

We denote the affine spaces {x | Ax = b} and {(y, s) | s = c − A^T y} by P and D,
respectively. The feasible regions of (2.1) and (2.2) are written as P+ and D+. A
feasible solution x (or (y, s)) is called an interior feasible solution of (2.1) (or (2.2))
if x > 0 (or s > 0). We denote the sets of interior feasible solutions of (2.1) and (2.2)
by P++ and D++, respectively.

We define faces of P+ and D+. Let N and B be a pair of index sets. The pair
(N, B) is called a partition if N ∪ B = {1, …, n} and N ∩ B = ∅.

Let (N₁, B₁) be a partition. If the set

{ x | x_{N₁} = 0, x_{B₁} > 0, x ∈ P+ }                    (2.3)

is nonempty, then the set

P+_(N₁,B₁) ≡ { x | x_{N₁} = 0, x ∈ P+ }                    (2.4)

is referred to as the face of P+ determined by the partition (N₁, B₁). The set (2.3)
is called the relative interior of P+_(N₁,B₁), and is written as P++_(N₁,B₁). A point which
belongs to P++_(N₁,B₁) is referred to as a relative interior point of the face. As a special
case, we regard P+ as a face of P+ itself, where (N₁, B₁) is given by (∅, {1, …, n}).
If P+_(N₁,B₁) is bounded, then there exists a unique point which minimizes the barrier
function

− Σ_{i ∈ B₁} log x_i                    (2.5)

over P++_(N₁,B₁). This special point is referred to as the "relative analytic center" of
the face. When (N₁, B₁) = (∅, {1, …, n}), the relative analytic center is called the
"analytic center" of the feasible region P+ [51].

Similarly, for a partition (N₂, B₂), if

{ (y, s) | s_{N₂} > 0, s_{B₂} = 0, (y, s) ∈ D+ }                    (2.6)

is nonempty, the set

D+_(N₂,B₂) = { (y, s) | s_{B₂} = 0, (y, s) ∈ D+ }                    (2.7)

is referred to as the face of D+ determined by the partition (N₂, B₂). The set (2.6)
is called the relative interior of D+_(N₂,B₂), and is written as D++_(N₂,B₂). As a special case,
we regard D+ as a face of D+ itself, where (N₂, B₂) is given by ({1, …, n}, ∅).

The relative analytic center of D+_(N₂,B₂) is defined similarly as the relative interior
point of D+_(N₂,B₂) which minimizes the barrier function

− Σ_{i ∈ N₂} log s_i                    (2.8)

associated with this face. When (N₂, B₂) = ({1, …, n}, ∅), the relative analytic center
is called the analytic center of the feasible region D+.

The duality theory of linear programming concludes that the dual problem (2.2)
is feasible and has an optimal solution if and only if the primal problem (2.1) has
an optimal solution, and the optimal value of (2.2) is equal to the optimal value
of (2.1). Now, we assume that (2.1) has an optimal solution. The points x and
(y, s) are optimal solutions of (2.1) and (2.2) if and only if they satisfy the following
complementarity condition:

(i) x ∈ P+,  (y, s) ∈ D+,   (ii) x_i s_i = 0  (i = 1, …, n).        (2.9)

Furthermore, there always exists a pair of optimal solutions of (2.1) and (2.2) satis-
fying the following condition in addition to (i) and (ii):

(iii) x_i + s_i > 0  (i = 1, …, n).                    (2.10)

This condition means that for each i at least one of x_i and s_i is positive. Conditions
(i)–(iii) are called the strict complementarity condition. Now, let x* and (y*, s*) be a pair
of optimal solutions of (2.1) and (2.2) satisfying the strict complementarity
condition, and let N* and B* be index sets such that x*_i = 0 and s*_i > 0 for all
i ∈ N*, and x*_i > 0 and s*_i = 0 for all i ∈ B*. By definition, (N*, B*) is a partition,
and the optimal sets S_P of (2.1) and S_D of (2.2) are written as

S_P = P+_(N*,B*) = { x ∈ P+ | x_{N*} = 0 },   S_D = D+_(N*,B*) = { (y, s) ∈ D+ | s_{B*} = 0 }.   (2.11)

Thus, the optimal sets of (2.1) and (2.2) are completely characterized by N* and
B*. If x* and (y*, s*) satisfy the strict complementarity condition, then x* and
(y*, s*) are relative interior points of the optimal faces S_P and S_D, respectively.

In the subsequent sections, we use the following standard conventions. The letter e
denotes the vector of all ones with appropriate dimension. For feasible solutions x and
(y, s) of (2.1) and (2.2), we denote by X and S the diagonal matrices whose diagonal
entries are x and s, respectively. An analogous rule applies to subvectors of x and s. We
also extend this convention in an obvious way when we consider sequences {x^k},
{(y^k, s^k)}, etc. Finally, when f(x) is a function of x and {x^k} is a sequence, we
occasionally use f^k as an abbreviation for f(x^k) as long as it does not cause confusion.

2.3 THE AFFINE SCALING ALGORITHM

Now, we introduce the affine scaling algorithm. We make the following assumptions:

Assumption 1: (2.1) has an interior feasible solution.
Assumption 2: c^T x is not constant over the feasible region of (2.1).
Assumption 3: Rank(A) = m.

Assumption 1 is crucial to the interior point algorithms. Assumptions 2 and 3 are
not substantial. Assumption 2 is made to avoid dealing with the trivial case
where the objective function is constant over the feasible region (we can easily check
whether this occurs or not in advance), and we make Assumption 3 for obtaining
a closed form of the search direction. All of the results in this survey hold without
Assumptions 2 and 3 after a simple modification. In the Appendix, we explain how to
satisfy these requirements in implementation.

Given an interior feasible solution x of (2.1), we solve (2.1) by an iterative pro-
cess which generates a sequence of interior feasible solutions. For this purpose, we
consider a ball E(x, μ) centered at x with radius μ which is inscribed in the feasible
region P+. We obtain the next iterate x⁺ as the optimal solution of the following
optimization problem:

minimize_x̃  c^T x̃   subject to  x̃ ∈ E(x, μ).                    (2.12)

Since E(x, μ) is an inscribed ball, x⁺ is also a feasible solution. If E(x, μ) is a good
approximation to P+, then x⁺ will be a much better approximate optimal solution
for (2.1) than x is.

How can we construct such an ellipsoid at an interior feasible solution x? In order
to answer the question, we make use of the fact that (2.1) and the following LP
problem

minimize_y   (D⁻¹c)^T y
subject to   AD⁻¹y = b,  y ≥ 0,                    (2.13)

which is obtained by scaling the variables x by a diagonal matrix D whose diagonal
entries are positive, are substantially the same, as is easily seen by letting y = Dx. It
is reasonable to require that the ellipsoids for (2.1) and (2.13) at the corresponding
points x and ȳ = Dx are the same.

To satisfy this invariance requirement, we focus on the special standard form LP
problem obtained by scaling the original problem with the diagonal matrix X⁻¹,
which maps x to e. The new variable u is defined as u(x̃) = X⁻¹x̃. In other words,
we take the following problem as a canonical form

minimize_u   (Xc)^T u
subject to   AXu = b,  u ≥ 0.                    (2.14)

Then, we consider a sphere with radius μ:

{ u | ∥u − e∥ ≤ μ, AXu = b },                    (2.15)

and use this sphere as the ellipsoid E(x, μ) at x. This ellipsoid can be written as
follows when transformed back into the space of x:

E(x, μ) ≡ { x̃ | ∥u(x̃) − e∥ ≤ μ, Ax̃ = b } = { x̃ | ∥X⁻¹x̃ − e∥ ≤ μ, Ax̃ = b }.    (2.16)

Obviously, this ellipsoid is invariant under the scaling of variables. Furthermore, we
have the following proposition.

Proposition 2.3.1 Let x be an interior feasible solution of (2.1). We have E(x, 1) ⊆
P+ and, if μ < 1, then E(x, μ) ⊆ P++.

Proof: Let x̃ ∈ E(x, μ). Due to the definition of E(x, μ), we have

|x̃_i/x_i − 1| ≤ μ   (i = 1, …, n).                    (2.17)

Multiplying this relation by x_i, we have

|x̃_i − x_i| ≤ μ x_i ≤ x_i,                    (2.18)

which immediately implies the result. •


Proposition 2.3.1 means that the optimal solution for (2.12) remains feasible as long
as /-' S 1. Taking this property into consideration, we determine the next iterate
x+ when taking the step-size 0 < /-' S 1 as the optimal solution of (2.12). This is
a version of the affine scaling algorithm proposed by Dikin (with /-' = 1) [12] and
rediscovered by Barnes (with /-' < 1) [7]. We refer this version as the short-step affine
scaling algorithm, to distinguish it from the long-step version explained later.

In Fig. 1, we show how this ellipsoid with /-' =


1 approximates the feasible region for
a small problem. We draw a line connecting X and x+ inside the ellipsoid, which is
the search direction ofthe algorithm. It is seen that each E( x, 1) is an inscribing ball
no matter where x is. Another interesting point is that E( x, 1) approximates the
feasible polyhedron well when x is located relatively far away from the boundary of
the feasible region, and it is a poor approximation when x is close to the boundary.
Fig. 2 shows the sequence generated by the algorithm when /-' =
1/8. One can get
some image about the vector field generated by the algorithm.

The ellipsoid E(x, μ) defined in (2.16) is referred to as the "Dikin ellipsoid." Now,
we derive a closed form of the displacement vector and the search direction of the
algorithm. The displacement vector of the algorithm when we take the step-size μ
is given as the optimal solution of the following optimization problem:

minimize_d̃  c^T d̃   subject to  Ad̃ = 0,  d̃^T X⁻² d̃ = μ².                    (2.19)

When regarding u = X⁻¹d̃ as the variable, this problem becomes nothing but to
find the minimum point of a linear function (Xc)^T u over the intersection between

the linear space { u | AXu = 0 } and the sphere with radius μ. The optimal solution
of this optimization problem is written as

d̃ = −μ d(x)/∥X⁻¹d(x)∥ = −μ XP_AX Xc/∥P_AX Xc∥,                    (2.20)

where

d(x) = XP_AX Xc                    (2.21)

and P_AX is the projection matrix onto Ker(AX), which is defined as

P_AX w = argmin_v { ∥w − v∥² | v ∈ Ker(AX) }.                    (2.22)

Under Assumption 3, P_AX is written as P_AX = I − XA^T(AX²A^T)⁻¹AX. From
(2.21), we see that

c^T d(x) = ∥P_AX Xc∥² ≥ 0.                    (2.23)
Then, the iteration of the short-step affine scaling algorithm with the step μ is
written as

x⁺ = x − μ XP_AX Xc/∥P_AX Xc∥.                    (2.24)

In view of the standard terminology of the theory of mathematical programming, the
search direction is the steepest descent direction of c^T x with respect to the metric
X⁻².

In the short-step version, the next iterate is supposed to stay in the ellipsoid with
radius μ ≤ 1. From the practical point of view, it is more efficient to move aggres-
sively along the search direction to obtain a further reduction of the objective function
value, as is seen from Fig. 1. Since the next iterate should remain an interior point,
however, we move a fixed fraction λ ∈ (0,1) of the way to the boundary. The algo-
rithm with this step-size choice was proposed by Vanderbei et al. [68] and Kortanek
and Shi [30], and is called the long-step affine scaling algorithm. Most of the efficient
implementations use this version with λ = 0.9 ~ 0.99 [1, 2, 11, 21, 32, 34, 35, 43, 44].
The iterative formula of the long-step affine scaling algorithm is written as follows:

x⁺(x, λ) = x − λ d(x)/max[X⁻¹d(x)] = x − λ XP_AX Xc/max[P_AX Xc].    (2.25)

Note that the iteration is not well-defined when max[P_AX Xc] ≤ 0. However, since
max[P_AX Xc] ≤ 0 implies that −d(x) ≥ 0 and c^T d(x) > 0, we have the following
proposition, which means that we may terminate the iteration if this happens.

Proposition 2.3.2 If max[P_AX Xc] ≤ 0, then c^T x is unbounded below over the
feasible region of (2.1).
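As a concrete reading of (2.20)–(2.25), one long-step iteration amounts to a scaled
projection followed by a ratio test. The following numpy sketch is illustrative only; A, c
and the current interior point x (with Ax = b, x > 0) are assumed given, and λ is taken
close to 0.9–0.99 in practice as noted above.

import numpy as np

def affine_scaling_step(A, c, x, lam=0.9):
    # One long-step affine scaling iteration (2.25); returns the next iterate,
    # or None if max[P_AX Xc] <= 0, in which case c^T x is unbounded below
    # (Proposition 2.3.2).
    Xc = x * c                                   # Xc, with X = diag(x)
    AX = A * x                                   # AX (columns of A scaled by x)
    # p = P_AX Xc = Xc - XA^T (AX^2 A^T)^{-1} AX^2 c  (projection onto Ker(AX))
    w = np.linalg.solve(AX @ AX.T, AX @ Xc)
    p = Xc - AX.T @ w
    if p.max() <= 0:
        return None                              # objective unbounded below
    return x - lam * (x * p) / p.max()           # x - lam * d(x)/max[X^{-1}d(x)]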

Since the affine scaling algorithm is an algorithm which generates iterates only
in the space of (2.1), it is important to obtain an estimate of an optimal solution
for the dual problem (2.2). We define a quantity for this purpose, called the "dual
estimate." As was explained in the previous section, it is well known that solving
the pair of the primal and dual problems is equivalent to finding a pair of a primal
feasible solution x and a dual feasible solution (y, s) satisfying the complementarity
condition (2.9). Based on this fact, let us construct a good estimate of an optimal
solution for the dual problem. Given an interior feasible solution x, we obtain (y, s)
which is closest to a solution of (2.9) in a certain sense. If we give up satisfying
the nonnegativity constraint on s in (2.9), a reasonable estimate of a dual optimal
solution would be the solution of the following least squares problem:

minimize_{(y,s)}  (1/2)∥Xs∥²   subject to  s = c − A^T y.                    (2.26)

As the optimizer of this problem, (y(x), s(x)) is written as follows:

y(x) = (AX²A^T)⁻¹AX²c,   s(x) = c − A^T y(x).                    (2.27)

It is easy to see that the relation

d(x) = X²s(x)                    (2.28)

holds between the search direction d(x) and the dual estimate. This relation is
frequently used throughout the paper. The following interesting property holds
for the dual estimate (y(x), s(x)).

Theorem 2.3.3 The dual estimate (y(x), s(x)) is bounded over P++.

This result was first derived by Dikin [20] and rediscovered later by several other
authors including Vanderbei and Lagarias [67], Stewart [52] and Todd [58], and has
theoretically interesting applications, e.g., [69].
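Computationally, the dual estimate is a single weighted least squares solve; a minimal
numpy sketch (illustrative names only) is given below. As a check, x² s(x), taken
componentwise, should reproduce the search direction d(x), which is relation (2.28).

import numpy as np

def dual_estimate(A, c, x):
    # Dikin's dual estimate (2.27) at an interior feasible x: the least squares
    # solution of (2.26).
    AX2 = A * x**2                               # A X^2
    y = np.linalg.solve(AX2 @ A.T, AX2 @ c)      # y(x) = (A X^2 A^T)^{-1} A X^2 c
    s = c - A.T @ y                              # s(x) = c - A^T y(x)
    return y, s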

There is an interesting historical story about the dual estimate [17, 18]. Indeed,
the least squares problem (2.26) was the starting point for Dikin when he developed
the affine scaling algorithm in 1967. In 1965, he was a postdoctoral fellow of Kan-
torovich, and they were carrying out some data analysis on the agricultural production
of the former Soviet Union. Kantorovich asked Dikin to estimate the dual variables
from the primal variables which were already available as observed data. If
we assume that these economic quantities are in an equilibrium state, the primal
variables and the dual variables should satisfy the complementarity condition (2.9),
and hence it would be reasonable to consider the following weighted least squares
problem to estimate the dual variables (y, s):

minimize_{(y,s)}  (1/2) Σ_i x_i s_i²   subject to  s = c − A^T y.                    (2.29)

Note that (2.29) differs from (2.26) in that x_i is not squared. As a further devel-
opment of this idea of Kantorovich, Dikin realized that it is more natural to use his
dual estimate s(x), which has an invariance property. (The estimate (2.29) depends
on the scaling of the primal variables, while the dual estimate (2.26) by Dikin does not.)
Furthermore, he noticed that −X²s(x) = −d(x) is a feasible descent direction
for (2.1), and used it for solving linear programming problems. Thus, the way the affine scal-
ing algorithm is usually explained is a little bit different from the way Dikin
developed this method.

The dual estimate is a quantity very similar to the shadow price in the simplex
method. Iteration of the simplex method is stopped when the shadow price becomes
nonnegative, recognizing that the primal iterate comes to an optimal solution. Then
the shadow price is nothing but a dual optimal solution. In other words, the shadow
price converges to an optimal solution of the dual problem while the primal sequence
converges to an optimal solution of the primal. Analogously, we expect that the dual
estimate converges to an optimal solution of the dual problem as the primal iterate
converges to an optimal solution of the primal. Convergence of the dual estimates
is important in the convergence theory of the affine scaling algorithm.

In order to solve any LP problem with this algorithm, we have to convert the original
problem into an equivalent problem satisfying Assumptions 1–3. There are two
ways to do this: the Big-M method and the Phase I-Phase II method. We will review
them briefly in the Appendix.

Before concluding this section, we discuss how we define the affine scaling algorithm
for general form LP problems which contain free variables. The following proposition
is easy to see and hence its proof is left to the reader.

Proposition 2.3.4 If the objective function of an LP problem is bounded either from
below or from above over the feasible region, then the objective function can be written
as an affine function of the nonnegative variables (i.e., without free variables).

Note that the unboundedness of the objective function (both above and below) in
this proposition is easily checked once a problem is given. If the objective function is
bounded neither above nor below, the LP is meaningless. Therefore, this proposition
tells us that any reasonable LP problem can be regarded as an optimization problem
in nonnegative variables. Taking note that the set of feasible nonnegative variables
is written as the cross-section of an affine space and the nonnegative orthant, the
LP problem can be rewritten as

minimize_s  c^T s   subject to  s ∈ T,  s ≥ 0.                    (2.30)

Here, T is an affine space. Let us assume that (2.30) has an interior feasible solution
s, i.e., s > 0.
The Dikin ellipsoid at s is defined as

(2.31)

The affine scaling direction d for this problem is given as

minimize d cT d subject to dE E( s, fl)· (2.32)

We apply this idea to the dual standard form problem (2.2). By using Assumption
3, we obtain the iterative formula for the long-step affine scaling algorithm with the
step). E (0,1) for the dual standard form [2] as follows, by letting c = x where x is
=
a solution for Ax band T =
{slATy + s c}. =
(AS- 2AT)-lb
(2.33)
y + ). max[S-l AT (AS-2 AT)-lb] ,
( AS- 2AT)-lb
s - ). AT ----;-=-'-:-:-=-:--:-:::"--;:--:-;;;-:--~
max[S-l AT(AS-2 AT)-lb]
s _). S(I - PAS-.)Sx
(2.34)
max[(I - PAS-.)Sx]

We leave the derivation of this iterative formula to the interested reader.

2.4 NONDEGENERACY ASSUMPTIONS


Like in the simplex algorithm, degeneracy has something to do with the convergence
theory of the affine scaling algorithm. It is an unusual feature of the affine scaling
algorithm as an IPM. Earlier global convergence results on this algorithm assumed
some nondegeneracy assumptions which make the convergence analysis easier. In
order to give definitions of the non degeneracy conditions, we define a degenerate
point of the solution set of the linear equations

Ax =b (2.35)
48 CHAPTER 2

and
s = c-ATy. (2.36)
Recall that n is the dimension of x and s and that n - m and m are the dimensions of
the feasible polyhedrons p+ and V+, respectively, under Assumptions 1 and 3. The
dimensions ofP+ and V+ are equal to the dimensions ofP and V under Assumption
1. A solution x for Ax = b is called "degenerate" if more than n - m components
of x become zero simultaneously. Similarly, we call a solution (y, s) of s = c - AT Y
is "degenerate" if more than m components of s become zero simultaneously. Now,
we introduce nondegeneracy conditions about the affine spaces P and V and the
polyhedrons p+ and V+ [25].

1. The affine space P (or V) is called nondegenerate if no point in P (or V) is


degenerate.
2. The polyhedron p+ (or V+ ) is called nondegenerate if no point in p+ (or V+ )
is degenerate.

Making the nondegeneracy assumption on p+ (or V+) for (2.1) (or (2.2» is exactly
the same as making a standard nondegeneracy assumption in the simplex method to
prevent cycling. On the other hand, requiring nondegeneracy of V (or P) for (2.1)
(or (2.2» has something to do with the existence of constant-cost face on the feasible
region. That is, p+ (or V+) has no constant-cost face except for vertices under the
nondegeneracy assumption of V (or Pl. See [70] on this point. Fig. 3 shows some
examples of degenerate problems.

Remark. These conditions are completely symmetric with respect to the primal
problem (2.1) and the dual problem (2.2) in the following sense. Given a standard
form problem (2.1), we can convert it to the dual standard form problem like (2.2)
by taking a basis. Let vt be the feasible region of the converted dual standard
form problem equivalent to (2.1). Then nondegeneracy of p+ is equivalent to non-
degeneracy of vt. On the other hand, given a dual standard form problem (2.2),
we may eliminate the free variable y to write it in a standard form. Let pjj be the
feasible region of the converted standard form problem equivalent to (2.2). Then
nondegeneracy of V+ is equivalent to nondegeneracy of P"Jj. The same thing can
be said about the nondegeneracy condition of the affine spaces P and V. We also
mention that this definition of non degeneracy uses Assumption 3. To extend it to
the general cases, we should replace "m and m - n" in the definition above by "the
dimension of {sis = c - ATy} and {xlAx = b}," respectively. _

One of the important conclusions of the non degeneracy condition associated with
(2.1) is the following proposition.
Affine Scaling Algorithm 49

Objectl. . I'IIfte,,- ..~ ..., •

ta) 1)+ i. de~.te. 1'", nondcpDeralC. tb) P+ i .......dqeQO:,.tc • ." II depue,....


The feasible rqiaa 'Z)+ ia II ....ul.,. ortahe- 1'. i, _depanate. h h .. muitipit! Opti.
drcm (n "" 5). 1'+ i. deJencra'. '-=- m.a dmw-, hencr. l' i. de&__,e.
.. (> ,,, _ 3) equations ....... d6ed ..muk...
fteOu.J.y at each .......wx. l' il _dewmwatl!.
bela... chere il 110 ......1'Ult·c:o-t r.o:e .,or""pt
Corvlll1.iCIII.

OD,.d..... funU'an....,tnr.

. . . .··-
Lb~·"'''
. .
{~,,~T:.... ~O

(el z,+ i. depner.'e, l' il ~alI!.

'D+i.~ar.e .... .bow.iQ,(.). 'Pi.aI..


• 11 _ _ • b _ _ 'lui objlet;ive tuncti_ i,
""".t ...., _ _ old... tber.dp., •.•.• Be.

Figure 2.3

Proposition 2.4.1 If P+ is nondegenerate, then the matrix AX 2 AT is invertible


over P+, in particular on the boundary of p+ . Furthermore, the dual estimate
(y(x), sex)) is a continuous mapping over P+.

Due to this proposition, convergence of the primal iterate xk implies convergence of


the dual estimate (y(xk), s(x k )) under the non degeneracy condition of P+, which
simplifies the proof of global convergence like in the case of the simplex algorithm.

Now, we are ready to summarize the convergence results on the affine scaling algo-
rithm for (2.1) in view of the non degeneracy conditions. In 1974, Dikin [13] proved
global convergence of the primal iterates and the dual estimates for a short-step ver-
sion of the algorithm with J1. = 1 when p+ is nondegenerate. Unfortunately, Dikin's
work was not known to the Western countries until 1988 [14]. It is worth noting
that he even wrote a book on the affine scaling algorithm in 1980 with one of his
colleagues [20].

Soon after Karmarkar [28] proposed the projective scaling algorithm in 1984, Barnes
[7] proposed the short-step algorithm with 0 < J1. < 1 in 1985 and proved global con-
vergence of the primal iterates and the dual estimates when p+ and V are non degen-
erate, and Vanderbei et al. independently obtained the same result for a long-step
version [68] with 0 < ). < 1.
50 CHAPTER 2

The first global convergence proof of the affine scaling algorithm for (2.1) without
any non degeneracy assumption concerning P nor p+ was obtained by Tsuchiya [63]
=
in 1989 for a short-step version with II 1/8, yet requiring non degeneracy condition
on V. [This result by Tsuchiya was obtained for the affine scaling algorithm for
(2.2). He proved global convergence under the nondegeneracy condition ofP. When
interpreted in terms of the affine scaling algorithm for (2.1), the corresponding non-
degeneracy condition is the nondegeneracy condition of V.] In the same year, Tseng
and Luo gave a global convergence proof without any non degeneracy assumption but
assuming that all the entries of A, b, c are integer with input size L, with a tiny-step,
i.e., II= 2-£ [59]. Then, Tsuchiya [62] proved global convergence of the primal
iterates without any non degeneracy assumption with II =
1/8 in 1990. The proofs
by Tsuchiya made use of a local potential function which will be explained in this
survey later. This idea of the local potential function is used in most of the global
convergence proofs which do not require non degeneracy conditions, except for the
one by Tseng and Luo.

Finally, from the end of 1991 to the beginning of 1992, Dikin [15] and Tsuchiya
and Muramatsu [65] independently succeeded in proving global convergence of the
primal and the dual sequence for the long-step version without any assumption on
non degeneracy, with the step-size 0 < >. ::; 1/2 and 0 < >. ::; '}./3, respectively. Dikin's
work came out a bit earlier than Tsuchiya and Muramatsu's, while the latter result is
a little bit better. In the both papers, a paper by Dikin [16] played an important role
which dealt with the homogeneous case with>' = 1/2. Tsuchiya and Muramatsu
also proved that the asymptotic convergence rate of the objective function value
approaches "exactly" 1 - >..

In this paper, we do not give a complete global convergence proof for general cases.
We recommend papers by Monteiro, Tsuchiya and Wang [37] and by Saigal [45] for
self-contained elucidative proofs, which somewhat simplify the results in the original
work by Tsuchiya [62, 63] and Tsuchiya and Muramatsu [65]. A recent text book on
linear programming by Saigal [48] is also recommended as a literature which gives
an integrated complete treatment of the affine scaling algorithm.

2.5 BASIC PROPERTIES OF THE


ITERATIVE PROCESS
In this section, we derive some basic convergence properties of the sequence. We
begin with the following important theorem by Tseng and Luo [59].
Affine Scaling Algorithm 51

Theorem 2.5.1 There exists a positive constant ~(A, c) which is determined from
A and c such that

_ cT d(x) h d (/ ++
r (x ) = 1IcIlIld(x)1I 2: ~ > 0 0/ s Jor a / x ~ P . (2.37)

Proof We prove this by contradiction. If such ~ does not exist, there exists a
sequence {x P } of interior feasible solutions such that r( x P ) --+ 0 as p --+ 00. For each
p, J(x P) == d(xP)/II(XP)-ld(xP)1I is an optimal solution of the optimization problem

minimize d - cT J subject to AJ = 0, JT(Xp)-2J = 1. (2.38)

Since r(x P) tepds to zero along x P, we may surpass a subsequence {x q } of {x P},


where Idil/cT dq (i =
1, ... , n) either converges to a number or diverges to infinity.
Let I be the index set consists of all the index such that Jl --+ 00. By definition, I
is not empty. There exists a constant C l such that, if if/. I, IJlI :::; C1CT Jq holds for
all q.

Now, we consider the following system of linear equations with respect to d:


(2.39)

Since this equation has a solution (e.g. Jq itself), there exists a solution dq whose
norm is bounded by IIJql1 :::; C2(CT Jq + 2::iltI IJm :::; C 2 (1 + (n -III)Cl)cT dq , where
C 2 only depends on A and c. Furthermore, since Idlll cT Jq --+ 00 holds for sufficiently
large q for i E I, we have IIJqll < Idll (i E 1) holds for all q sufficiently large. On
the other hand, we have di = Ji (i f/. 1), hence we conclude that, for sufficiently
large q,
(2.40)
Togegher with cT Jq =cT dq, this is a contradiction to the fact that dq is an optimal
solution of (2.38). •

By using this theorem, we obtain the following properties of the search direction and
the generated sequence including convergence of the sequence to a unique point.

Theorem 2.5.2 Let {xk} be the sequence of the primal iterates of the long-step
affine scaling algorithm with A E (0,1). If {cT xk} is bounded below, then we have
52 CHAPTER 2

3. The sequence converges to a unique point xoo.

4. The inequality

(2.41)

holds for all k.

Proof Since {c T xk} is a monotone decreasing sequence bounded below, we have a


limiting value COO and cT (x k - xk+l) -+ O. Since

k-l k k-l k cTd k


o < Amax[(X) d] ~ AII(X) d II = A II (Xk)-ld k ll
Td k
c _ T( k k+l) (2.42)
< Amax[(Xk)-ldk] - c x - x

(here we used (2.23) in the first equality), we have, in view of (2.28),


lim II(Xk)-ld k ll = lim IIX kskll =
k-co Ie-co
o. (2.43)

From Theorem 2.5.1, we have, for all k > 0, that

Ilcllil(A, c)llxk - xk+ 1 11 = II cll il (A, c) I max[(X~)-ldk] dk II


dk)
~ C
T ( A
max[(Xk)-ld k ] = cT( x k - x
k+1)
.
(2.44)

(2.45)

Together with (2.43) and (2.45), we see that d k converges to zero, which proves the
second statement of the lemma.

On the other hand, taking summation of (2.44) with respect to k, we have, for any
o ~ Kl < K 2 ,
K,
ileA, c)llcllllx K2 - xK'1l ~ ileA, c)llcll L IIxk - xk - 1 11
Affine Scaling Algorithm 53

From the second inequality, we see that {xk} is a Cauchy sequence and hence con-
verges to a unique point. This shows the third relation. The fourth relation is readily
seen by letting J{2 -+ 00 in (2.46). •

Thus, the sequence {xk} converges to a unique point xoo. Let (N, B) be the partition
such that x/J = 0 and x'B > O. We have the following proposition.

Proposition 2.5.3 Under the same assumption as Theorem 2.5.2, the set S ==
{x E P+IXN = O} is a face ofP where the objective function is constant, and for
any x E P, the objective function is written as

(2.47)

where (fI, s) E V.

Proof. Since IIX ks(xk)11 -+ 0 and x'B > 0, we see that limk--+oo(c - ATy(xk))B O. =
This implies that there exists (ii, s) such that s = c - AT iJ and CB = A~iJ, so that
SB = O. For any x E P, we have

(2.48)

from which the proposition immediately follows. •


The following theorem is due to Barnes [7], and shows asymptotic linear convergence
of the objective function value.

Theorem 2.5.4 Under the same assumption as Theorem 2.5.2, cT xk converges lin-
early to cT x oo , where the reduction rate is at least 1 - )"/(2fo) asymptotically.

Proof. Since xk -+ x oo , we have


II(Xk)-l(Xk - xOO)11 < II(xt)-l(x~ - x/J)II + II(X~)-l(x~ - x'B)11
< M + II(X~)-l(x~ - x'B)1I
< Vn + 1, (2.49)
thus X OO E E( xk , 2fo) holds for sufficiently large k, and hence xk - XOO is a feasible
solution for (2.19) when we let f.l = 2fo . On the other hand, the optimal solution
=
for (2.19) with f.l 2fo is

(2.50)
54 CHAPTER 2

Then, we have
(2.51 )

This implies that


cT(xk_x oo ) dk dk
-'---=,------'- < cT < cT -----o-,.---c,---..,..-,,.,,. (2.52)
2ft - II(Xk)-ldkll - max[(Xk)-ldkJ'
Consequently, we have

This completes the proof. •


Finally, we give a result about the asymptotic behavior of the sequence of dual
estimates without a proof. This result is important in estimating several relevant
quantities in the limit. See [37] for the proof.

TheoreIll 2.5.5 Under the same assumption as Theorem 2.5.2, we have

(2.54)

2.6 GLOBAL CONVERGENCE PROOF


UNDER A NONDEGENERACY
ASSUMPTION
We show a global convergence proof of the long-step algorithm under the nondegen-
eracy condition ofP+. The result is due to Dikin [13] (see also [23, 67].)

TheoreIll 2.6.1 Let {xk} and ((y(xk), s(xk))} be the sequences of the primal iter-
ates and the dual estimates generated by the long-step affine scaling algorithm for
(2.1) with.A E (0,1). IfP+ is nondegenerate and if (2.1) has an optimal solution,
{xk} and {(yk, sk)} converge to relative interior points of the primal optimal face
and the dual optimal face, respective/yo

Proof. Due to Theorem 2.5.2 and Proposition 2.4.1, the primal iterates and the dual
estimates converge to XOO and (yOO , SOO). Now, we show that XOO and (yOO , SOO) satisfy
Affine Scaling Algorithm 55

the strict complementarity condition. (As we mentioned in §2, if XOO and (yOO, SOO)
satisfy the strict complementarity condition, then they are relative interior points of
the optimal faces.) Let (N, B) be the partition where x'; = 0 and xll > O. Since
IIxoosooll = 0, we see that s'tJ = o.

First we show that s'; 2: 0, thus XOO and (yOO, SOO) satisfy the complmentarity
condition. If s'; l 0, there exists an index i E N where si < O. For sufficiently
large k, we have s~ < 0 and hence, by taking note that Xs(x) = X- 1d(x) (cf. (2.28»,

10+1 _ (Ie \
(X k)2 s).
Ie Ie 10
_ .(1 \ Xi si ) 10
Xi - X - "max[Xksk] , - x, - "max[Xksk] > xi' (2.55)

This implies that x~ is monotonically increasing for sufficiently large k. However,


this contradicts that xi = O. Thus, s'; 2: o.

Now, we show that XOO and (yOO, SOO) satisfy strict complementarity. Let J and J
be the index sets such that si > 0 (i E J) and si = 0 (i E J) , respectively. It is
enough to show that xi > 0 for all i E J. To this end, we observe that
00

L:(logxf+1 -log x:) (2.56)


10=0

is bounded for each i E J. A sufficient condition for this is that


Ix·10+1 - x·10 I L: I 10 10 I
A x·s·
00 00
L: ] ] - ] J (2.57)
x~ - max[Xksk]
k=O ] 10=0

is bounded for each i ~ J.

Since i E J, we have, by using that Ker(AX) is orthogonal to Im«AXV),

Sj(x) [X- 1PAXXC]j = [X- 1PAXX(c - AT yOO)]j = [X- 1PAXXSOO]j


sf - a](AX2 AT)-1 AJX;sj = a](AX2 AT)-1 AJX;sj. (2.58)
Due to Proposition 2.4.1 and the relation above, we have

IsJI :::; Cllx~W, (2.59)

where C is a constant, hence IIX}s~1I = O(lIx~1I2). Since each component of s~


converges to a positive number, we have max[Xjs~l > (1/2) miniEJ si IIx~lIoo > 0
for sufficiently large k, and hence we have

(2.60)
56 CHAPTER 2

Thus, if k is sufficiently large,

where we used that each component of s~ is uniformly bounded below by a positive


constant and that {xk} is a bounded sequence. Since the rightmost hand side is a
linearly convergent sequence, we see that (2.57) is bounded, and this completes the
~~ .
2.7 GLOBAL CONVERGENCE PROOF
WITHOUT NONDEGENERACY
ASSUMPTIONS
In this section, we deal with global convergence of the long-step affine scaling algo-
rithm for general problems. We describe the main result, and outline the underlying
idea for proving global convergence of the algorithm without non degeneracy assump-
tions. Here is the main result:

Theorem 2.7.1 (See [65].) If (2.1) has an optimal solution, the sequences {xk} and
((y(xk), s(xk))} generated by the long-step affine scaling algorithm with 0 < A ~ 2/3
have the following properties.

1. {xk} converges to a relative interior point of the optimal face of (2.1).


2. {(yk, sk)} converges to the relative analytic center of the optimal face of (2.2).
3. The asymptotic reduction rate of the objective function is "exactly" 1 - A.

This result was obtained by Tsuchiya and Muramatsu [65]. Slightly prior to this
result, Dikin established the statements 1 and 2 for 0 < A ~ 1/2 [15]. Surprizingly,
the step-size 2/3 appearing in the theorem is tight for the statements 2 and 3 to
hold. See the papers [65] and [26] and §2.1O of this survey. As to the statement I,
we do not know what is the upper bound of A which ensures global convergence of
the primal iterates of the affine scaling algorithm. Recently, Mascarenhas gave an
interesting example where the sequence cannot converge to an optimal vertex when
A = 0.999 [31]. Terlaky and Tsuchiya [57] obtained an instance where the algorithm
Affine Scaling Algorithm 57

can fail with a A ~ 0.92 by modifying his example. We deal with the example by
Mascarenhas in §2.12.

Due to Theorem 2.5.2, we see that {xk} converges to a unique point xco. We use the
same notation as in §5. Recall that Nand B are the index sets such that x'fj = 0
and xEl > O. The major tool for the proof is a local potential function which is
defined as follows:

1j;(x) = INllogcT(x - X CO ) - L)ogxi = INllogs~xN - I)ogx;. (2.62)


ieN ;eN

This potential function is an analogue of the Karmarkar potential function [28]. The
function is called the local Karmarkar potential function and was first introduced in
[63] for a global convergence analysis of the affine scaling algorithm. Observe that
this local potential function is a homogeneous function in x N. Furthermore, we have,
due to (2.41) and the inequality between arithmetic mean and geometric mean, that

(2.63)

Thus, {'Ij;k} is bounded below by a constant. Let u(x) = XS(X)/S~XN' By using

cT xk+l _ cT X CO
cTx k _ cTx co

(2.64)
X~+l u~
• = '[ k]'
1 - Amaxu (2.65)
~,
the reduction of this potential function is written as:

Let
e
k
wN = uNk - INI ' (2.67)
58 CHAPTER 2

The following theorem is a key result for the global convergence proof. Though we
do not prove it here, we give a detailed explanation on the "heart" of the proof in
§8 and §9.

Theorem 2.7.2 Under the same assumption as Theorem 2.7.1, we have

(2.68)

Since s~x1\, = cT x" - cT X OO is a linearly convergent sequence and 1j} is bounded


below (cf. (2.63)), the right-hand side of (2.68) converges to zero. This drives w1\,
to 0 in the limit, which implies that u~ -+ e/INI. On the other hand, we can show
that u~ = O(s~x~) due to Theorem 2.5.5. Thus we are able to prove the following
theorem.

Theorem 2.7.3 Under the same assumption as Theorem 2.7.1, we have

·
I 1m
"-+00
"
UN = -INe-I' lim u~
"-+00
= o. (2.69)

Now, Theorem 2.7.1 is shown as follows.

Proof of Theorem 2.7.1. The linear convergence rate of the objective function readily
follows from (2.64) and Theorem 7.3.

Let t~ == x1\, Is~x1\,. By definition and (2.41), there exists a positive constant c such
that
1 cT xl: - cT X OO
t~ ~ (s~x1\, )/lIx1\,1I ~ IIx" _ xooll ~ c (2.70)

for each i E N. On the other hand, because of Theorem 2.7.2 and the linear con-
vergence of S~XN = cT X - cT x co , we see that exp(v'N(x"» = I/(IT;
above. Together with (2.70), we see that (Tt)-le ::; ee
tn
for some constant
is bounded
e,where
Tt == diag(t~). By definition, we have s;(x") = u~ It~ (i EN). Since u~ converges
to e/lNI and ce ::; (Tt}-le ::; ee, we see that {s~} is a bounded sequence whose
accumulation points are strictly positive. Furthermore, since s~ converges to zero
(cf. Theorem 2.5.5), we see that the limiting point X OO and every accumulation point
(yOO, SOO) satisfy the strict complementarity condition. Thus, XOO is a relative interior
of the optimal face of (2.1).
Affine Scaling Algorithm 59

It is remaining to show that the accumulation point (yoo, 8 (0


is the analytic center
)

of the dual optimal face. A necessary and sufficient condition for the relative analytic
center (y*, 8*) of the dual optimal face is that there exists xB satisfying

(2.71)

Let tN be an accumulation point of t~. Since t~ is obtained by scaling x~ and


ANX~ =
+ AB(xt - xIl) 0, there exists z'B such that
(2.72)

Since SfjtN = e/INI, we have


(2.73)

This completes the proof. •

2.8 THE HOMOGENEOUS AFFINE SCALING


ALGORITHM
In this section, we deal with the affine scaling algorithm applied to homogeneous
problems. This special case is important because of the following reasons: (i) Since
the structure ofthe neighborhood ofthe limiting point x oo of the iterates ofthe affine
scaling algorithm is similar to a polyhedral cone, we can approximate the affine scal-
ing algorithm in the final stage of the iteration by the affine scaling algorithm applied
to a homogeneous problem whose feasible region is a cone. (ii) There exist several
interesting connections between the affine scaling algorithm applied to homogeneous
problems and other basic interior point algorithms, especially, the Karmarkar algo-
rithm and the Newton method for the analytic center of a polyhedron.

We review the basic properties of the affine scaling algorithm applied to a homoge-
neous problem and exploit its close relationship to the other interior point algorithms.
The point raised in (i) will be discussed in the next section, based on the results in
this section.
60 CHAPTER 2

2.8.1 The Homogeneous Affine Scaling


Algorithm
Now, let us consider the special case where the feasible region is homogeneous,
namely, b = 0 in (2.1). Specifically, we consider the following problem:

minimize x cT x
(2.74)
subject to Ax = 0, x ~ O.

Recall that we use P+ to denote the feasible region of (2.74). There are three
possibilities about the problem.

1. minimum{cTxlx E P+} = 0 and x = 0 is a unique optimal solution.


2. minimum{cT xix E P+} = 0 and there exists an optimal solution where x "# O.
3. minimum{cT xix E P+} does not exist, so cT x can diverge to minus infinity.

We apply the long-step affine scaling algorithm to this problem. We refer to this
algorithm as "the homogeneous affine scaling algorithm." We assume that the fea-
sible region has an interior point x such that cT x > O. (In the cases 1 and 2, this
condition is always satisfied under Assumption 1.) Furthermore, we assume that
cT x > 0 is always satisfied at any interior feasible solution x under consideration
unless otherwise stated.

Let x be an interior feasible solution. Since AXe = Ax = 0 so that e E Ker(AX),


we have PAxe = e. Due to this relation, we have

(2.75)

where u(x) = X-1d(x)jc T X = Xs(x)jcT x. This is a remarkable property of the


search direction which holds only for homogeneous problems. The dual problem to
(2.74) is a feasibility problem of finding (y, s) such that

s =c - AT y, S ~ O. (2.76)

We have the following proposition.

Proposition 2.8.1 The dual estimate s(x) cannot be strictly positive in the cases 2
and 3.
Affine Scaling Algorithm 61

Proof If s(x) > 0 holds for some feasible solution X, then x 0 and (y, s) = =
(y(x), s(x» make a pair of primal-dual feasible solutions satisfying strict comple-
mentarity. This implies that x =
0 is the unique optimal solution of the problem
(2.74), which cannot take place in the cases 2 and 3. This completes the proof. _

Since the signs of each component of s(x) and u(x), d(x) are the same, we have the
following corollary.

Corollary 2.8.2 In the cases 2 and 3, we have u(x) :f 0 and d(x) :f O.

To analyze behavior of the algorithm, we consider the Karmarkar potential function


[28]
1jJ(x) = nlogcT x - :l)ogx;. (2.77)

This function is a homogeneous function, and have the following property.

Proposition 2.8.3 The K armarkar potential function is bounded below if and only
if the case 1 occurs, where the minimum value is attained along a line emanating
from x = o.
Proof The proof is easy by taking note of the fact {xix E P+, cT x = I} is bounded
if and only if the case 1 occurs. _

The following theorem is crucial, and shows that the potential function reduces as
long as A :s: 2/3 in the homogeneous affine scaling algorithm.

Theorem 2.8.4 (See ~39, 65].) Let x be an interior feasible solution of (2. 74) such
that cT x > 0, and let x be the interior feasible solution such that cT x+ > 0 obtained
by one iteration of the long-step affine scaling algorithm with 0 :s: A :s: 2/3. Then,
reduction of the potential function is bounded above as follows:

'I/I{x+) - 'I/I{x) ~ n,\/max[uJ


n - ,\/max[uJ
Ilu 1 el1 2
- ;
(
-
n+ 1 ,\)
max[u]2{1 - ,\)

~ n,\2 IIu -;ell (-1 + nm:x[u]) ~ n,\211w1l 1+~Iwll ~ 0,


2 3

(2.78)
where w =u - (l/n)e.
62 CHAPTER 2

Proof Let A = A/max[u] and 9 = nA/(n - A). Taking account of the relation
uTe = 1, we have
T
= 1 _ AJl.!:IL
2-
= ~(1 xt
_ 91I w I1 2 ), -' A
C
cTx
x+
max[u] 9 Xi
= 1- A - - = -(1- 9Wi)
max[u] 9
Ui
(2.79)

(cf. (2.64) and (2.65». On the other hand, since the relations l/n ::; max[uJ, 0 <
A ::; 2/3, 0 < 1 - AlluW /max[u] ::; 1 - Amax[u] hold, we see that

(2.80)

Due to (2.79), we have

.,p(x+) - .,p(x) = nlog(l- 911w1l2) - Elog(l- 9Wi). (2.81)

Now, we find an upper bound of ¢(x+) - ¢(x) by using the following well-known
inequalities [28].

log(1 - () < -( «( < 1), (2.82)


n

Elog(l - 11i) > -r? e _ I177W o ::; max[77] < 1).


;=1
2(1 - max[77])
(2.83)

Substituting these inequalities into (2.81) and taking note of the relation w T e = 0,
we have

2 92 11wl1 2
< -n9l1 w ll + 2(1 _ 9 max[w])

= 911 wll 2 (-n + 2(1- 9~ax[w])) (2.84)

Since uT e = 1, we have
max[u] = lin + max[w] 2: lin + IIwllln = (1 + IIwlDln. (2.85)

We substitute the definition of wand 9 into (2.84). Then taking account of the
relations (2.80) and (2.85) and that max[u] = max[w] + lin, we obtain the desired
inequality (2.78). •

Based on this theorem, we obtain the following main result on the homogeneous
affine scaling algorithm.
Affine Scaling Algorithm 63

Theorelll 2.8.5 (See [39].) Let {xk} be the sequence generated by the long-step
affine scaling algorithm applied to (2.74) with the step 0 < A ::; 2/3. The following
situation occurs on {xk}.

1. In the case 1, we have limk_oo '!jJ(xk) = min '!jJ(x) and limk_oo u(xk) = eln.
Furthermore, the dual estimate ((y(xk), s(xk))} converges to the analytic center
of the dual feasible region {(y, s)ls = c - ATy :::: OJ.
2. In the case 2, we have limk_oo '!jJ(xk) = -00, where

'!jJk+l _ '!jJk < _ A(2 - 3A) if A < 2/3, (2.86)


- 2V2(1 - A)

and
(2.87)

3. In the case 3, we have {2.86} and {2.87} as long as cT xk > 0, and cT xk < 0
holds after a finite number of iterations.

Proof. In the case 1, we see that '!jJ(x) is bounded below by a constant due to
Proposition 2.8.3. If A ::; 2/3, we have either (i) '!jJk+l - '!jJi.: = 0 for some k or (ii)
'!jJk+l _ '!jJk < 0 for all k. In the case (i), we have u(xk) = eln due to Theorem 2.8.4.
Since u(xk) = (Xk)-ld(xk) by definition, this implies that d(xk) is proportional to
xk and so x k+1 is proportional to xk also. Due to the homogeneous property, for all
=
k :::: k, we see that xk is proportional to xk and that u(xk)-eln 0 holds recursively.
In the case (ii), we have limk_oo('!jJk+l_'!jJk) = 0, because, in view of Theorem 2.8.4
and Prpoposition 2.8.3, {'!jJk} is a monotonically decreasing sequence bounded below.
This implies that limk_oo lIu(x k ) - elnll = =
0, because limk_oo('!jJk+1 - '!jJk) 0 holds
only if Ilu(x k) - elnll = 0 because of Theorem 2.8.4.

Thus, we have limk_~oo u(xk) = eln in the both cases, which implies that s(xk)
converges to the analytic center of {(y, s)ls = c - AT Y > O} as we see in the similar
manner as in the proof of Theorem 2.7.1 (To see this, we put N = 0 in the proof of
Theorem 2.7.1 and use the fact that {'!jJ(Xk)} is both bounded below and above.)

The second statement associated with the case 2 is proved as follows. Due to Corol-
lary 2.8.2 and eT uk = 1, we have max[u k ] :::: 1/(n - 1). Maximizing the func-
tion on the righthand side of the first inequality of (2.78) under the condition that
max[u] :::: 1/(n - 1) and eT u = 1, we obtain the results.
64 CHAPTER 2

The proof of the former part of the statement 3 is the same as the proof of the
statement 2. We omit the proof for the latter part. _

Recently, Dikin and Roos proved the statement 1 for sequence generated by the
short-step version with J1. = 1 [19). This result seems interesting, because the step-
size II = 1 is the original version of the affine scaling algorithm by Dikin [12) and
hence has a special meaning.

2.8.2 The Homogeneous Affine Scaling


Algorithm and the Karmarkar Algorithm
Now we show how the homogeneous affine scaling algorithm is related to the Kar-
markar algorithm [28). This connection was first pointed out by Bayer and Lagarias
[8) and was studied in, e.g, [22, 39, 60, 61).

Let 9 ::::: 0 be a nonzero vector, and consider the linear programming problem where
the constraint gT x = 1 is added to (2.74), i.e., we consider the following problem:

minimize x cT x
(2.88)
subject to Ax = 0, x::::: o.
In particular, if we choose 9 = e, we obtain the Karmarkar canonical form [28). Note
that a standard form problem is also readily converted into this form [22). Let L be
the input size of this problem. We assume that (i) a feasible point XO > 0 of (2.88)
is available such that cT xO > 0 and 1jJ(xO) = O( nL); (ii) the optimal value is zero;
(iii) the optimal set is bounded.

It is known that the setting above is general enough to solve any LP problem.
Our objective is to find a feasible point of (2.88) where cT x =
O. Intuitively, this
is attained by decreasing 1jJ(x) to minus infinity. To explain this, for simplicity,
consider the case where 9 =
e. By using the inequality between arithmetic mean
and geometric mean, we have, for any interior feasible point x,

) } l/n cTx cTx


{
exp(tf!(x) = (TI x i)1/n ::::: n(eTx) = ncTx. (2.89)

From this relation, it is easy to see that x tends to an optimal solution of (2.88)
when 1jJ(x) tends to minus infinity. More precise argument shows that finding x such
that 1jJ(i:) = -O(nL) is enough to obtain an exact optimal solution by rounding
approximate solution under the assumptions (i)-(iii).
Affine Scaling Algorithm 65

Now, we associate a feasible solution x of (2.74) to the feasible solution x of (2.88)


by the following conic projection:
_ x
x == T""' (2.90)
9 x
We consider the following iteration based on the long-step affine scaling algorithm
applied to (2.74):
d(x"')
x"'+l - x'" - A . (2.91)
- max[(X"')-ld(x"'»)'
where 0 < A < 2/3. x'" is a feasible solution of (2.88) for each k. We start the
algorithm from XO = xo. Since the optimal value of (2.88) is zero, we see that (2.74)
has a nonzero optimal solution where the objective function value is zero, so that
the statement 2 of Theorem 2.8.5 applies. Thus, ~(Xk+l) = ~(xk) - 0(1). Since
~(x) is a homogeneous function, we see that ~(x"') =
~(xk). Based on these facts,
it is easy to see that the algorithm finds a feasible point x' of (2.88) such that
~(x·) = -O( nL) in O( nL) iterations. This algorithm is known to be equivalent to
the Karmarkar algorithm in the following sense [8).

Proposition 2.8.6 The direction xk + 1 - i k is exactly the same as the search direc-
tion of the Karmarkar algorithm applied to (2. 88}.

2.8.3 The Homogeneous Affine Scaling


Algorithm and the Newton Method for the
Analytic Center
In this subsection, we show that the homogeneous affine scaling algorithm in §8.1 is
closely related the Newton method for the analytic center of a polyhedron.

We deal with the homogeneous problem (2.74). Now, let us consider a constant cost
hyperplane where the objective function value is one. We define the polyhedron
Q+ = {xix E P+, cTx = I}. Under the assumption, Q+ is nonempty. We consider
a conic projection
x
vex) = -;y-.
c x
(2.92)
Observe that, when restricted to Q+, the Karmarkar potential function ~(x) sub-
stantially becomes the log barrier function

- Elogx; (2.93)
66 CHAPTER 2

associated with Q+, because cT x is constant. We can relate the homogeneous affine
scaling algorithm to the Newton method to find the analytic center v· of Q+ which
minimizes the log barrier function. We denote by dN(v) the Newton step to obtain
the analytic center of Q+ at a relative interior point v of Q+. The following theorem
shows that the "conic projection" of the search direction of the homogeneous affine
scaling algorithm is just the Newton direction dN(v).

Theorem 2.8.7 (See [64].) Let x E P+ such that cT x > 0, and x+ (A) be the point
such that cT x+ > 0 obtained by moving in the homogeneous affine scaling direction
with the step-size A. Then, we have

AI(U(X» N
v(x) - 1 _ AI(U(X)) d (v(x»,

v(x) - (u(x), A)dN (v(x», (2.94)

where
I(U)=~.
max[u]
(2.95)

In other words, V(X+(A)) coincides the point obtained by making one iteration of the
Newton method at v( x) with the step ( to obtain the analytic center of Q+ .

It is worth discussing a little bit more on the implication of Theorem 2.8.7 in the
case 1 where (2.74) have a unique optimal solution and when we take the step-size
o < A ~ 2/3. In this case, Q+ is bounded and hence the analytic center v· exists.
We have the following proposition.

Proposition 2.8.8 lIu(x) - elnll--+ 0 if and only if Ilv(x) - v·ll--+ o.

Proof Due to Theorem 2.8.4, we see that lIu(x) - elnll --+ 0 holds when 1j;(x)
approaches its (unique) minimum. By using the similar agrument as we did in
Theorem 2.7.1 (see also the proof of the first statement of Theorem 2.8.5), conversely,
we see that Ilu(x) - elnll --+ 0 implies that 1j;(x) approaches its minimum. Thus
lIu(x) - elnll --+ 0 and that 1j;(x) approaches its minimum are equivalent. On the
other hand, by definition, Ilv(x) - v·lI--+ 0 is equivalent to that 1j;(x) approaches its
minimum. The proposition readily follows from these two facts. _

Let {xk} be the sequence generated by the homogeneous affine scaling algorithm with
the step-size 0 < A ~ 2/3. It is readily seen from Theorem 2.8.5 that uk --+ eln, and
this immediately implies that v(xk) --+ v·. In view of Theorem 2.8.7, asymptotically
Affine Scaling Algorithm 67

the step-size ((Uk,A) in (2.94) approaches ((e/INI,A) =


A/(l- A), which is 1 for
A = 1/2 and 2 for A = 2/3. This suggests that v" converges quadratically to v'
of the analytic center of Q+ when A =1/2. In conclusion, we have the following
theorem.

Theorem 2.8.9 If (2.74) has a unique optimal solution and 0 < A :5 2/3, v(x")
converges to the analytic center of Q+. In particular, if A = 1/2, its asymptotic
convergence rate is quadratic.

Thus, the conical projection of the affine scaling direction for a homogeneous problem
generates a Newton direction for the analytic center of Q+. This property gives
some insights into the asymptotic behavior of the affine scaling algorithm, as will be
discussed in §1O-§ 12.

2.9 MORE ON THE GLOBAL


CONVERGENCE PROOF OF THE
AFFINE SCALING ALGORITHM
Now we are ready to see how the special case analysis of the homogeneous problem
in the previous section is closely related to the global convergence analysis for the
general case in §7. We use the same notation as in §7.

We are interested in the behavior of the algorithm in the final stage, i.e., in a suf-
ficiently small neighborhood of the limiting point xoo. It seems that the constraint
XN ~ 0 which becomes active in the end asymptotically plays a dominant role in
determining the search direction compared with the remaining constraint XB ~ 0
which remains "far away" throughout the iterations. Then, it makes sense to con-
sider the following LP problem obtained by discarding the constraint XB > 0 from
(2.1).

(2.96)
Since ABxif = b, by introducing a new variable ZB = XB - xif, we see that the
problem is equivalent to the following homogeneous LP problem with respect to XN.

minimize XN (2.97)
This is a homogeneous problem in XN space. As we saw in the previous section,
we can associate the Karmarkar potential function to this problem, which is exactly
68 CHAPTER 2

the same one as we used in §7 as "the local Karmarkar'potential function" (2.62).


Now, (x~, x~ - x B) is a feasible solution of (2.97). Let dN(xk) be the affine scaling
direction for (2.97), and let UN(xk) == (X~ )-ld~/s~x~. (cf. (2.30) and (2.32) to
see how we define dN(X k).) As we saw in the previous section, dN(xk) has the effect
of reducing the local potential function, where the amount of reduction is estimated
by Theorem 2.8.4.

On the other hand, we have the following theorem which shows that dN(xk) and
UN(xk) of the xN-part of the affine scaling direction for (2.1) and its scaled version
are very close to dN(xk) and UN(xk) in the final stage of iterations.

Theorem 2.9.1 (See [64].) We have Ilu~ - u~11 = O«s~x~ ?).

Since lIu~1I ~ l/M because eTuN = 1 (cf. (2.75», u~ is a good approximation


to u~ asymptotically. Due to this similarity, we may expect an analogue of Theorem
2.8.4 holds by using u~ rather than u~. Indeed, this analogue is Theorem 2.7.2.
(Compare (2.78) of Theorem 2.8.4 with (2.68).) In this way, the analysis of the affine
scaling algorithm applied to the homogeneous problem is important for the analysis
of the affine scaling algorithm for general problems.

2.10 WHY TWO-THIRDS IS SHARP FOR


THE AFFINE SCALING?
In §7, we proved the primal sequence {xk} converges to a relative interior of the pri-
mal optimal face, while the dual sequence ((y(x k), s(xk»)} converges to the relative
analytic center of the dual optimal face, if 0 < >. ~ 2/3. After releasing the first ver-
=
sion of [65], Tsuchiya and Muramatsu observed that>. 2/3 is the largest step-size
which ensures that the dual estimates converge to the relative analytic center of the
dual optimal face [65]. More strongly, Hall and Vanderbei [26] found an example to
=
show that ..\ 2/3 is the largest step-size which ensures the pair (x k, (yk, sk» of the
primal sequence and the dual estimates converge to unique points in their respective
=
spaces. These results show that the ..\ 2/3 in Theorem 2.7.1 is tight.

We have the following plausible explanation for why the step-size two-thirds is sharp
for obtaining convergence of the dual estimates [64]. This argument is based on
Theorem 2.8.7.
Affine Scaling Algorithm 69

We take up the homogeneous problem in §8, and use the same notation as §8.3. We
also assume that the case 1 occurs, i.e., the problem (2.74) has a unique optimal
solution, and apply the homogeneous affine scaling algorithm. Let {Xk} be the
generated sequence with the step-size A. Observe that the dual estimate s( x) is a
(nonlinear) function of the direction v(x)= x/cTx as seen from its definition (2.26).
If the projected iterate v(xk) does not converge to a point in Q+, then it is unlikely
that the associated dual estimate (y(xk), s(xk)) converges to a unique point.

Now, we will show that A > 2/3 results in non-convergence of the projected iterate
v(xk) to a unique point. Suppose that we take a step A > 2/3 and v(xk) converges
to an interior point of Q+. This implies that the limit point V OO should be v·
of the analytic center of Q+, since v· is the only interior point of Q+ where the
Newton step dN(v) = 0 holds. Since vk ---> v· implies that u(xk) ---> e/n due to
Proposition 2.8.8, we have limk_oo (k = A/(l - A) = 1 (cf. (2.94)) when A = 1/2
and limk_oo (k = A/(l - A) > 2 when A > 2/3. In view of (2.94), this means
that the iteration with the step-size A > 2/3 results in an overshooting Newton
iteration in the space of v(xk) with the step-size greater than two, which cannot be
convergent to a unique point. This is a contradiction to the assumption that v(xk)
converges. Thus, A > 2/3 implies non-convergence of v", which is likely to result in
non-convergence of s(xk).

As was suggested in §9, most of the convergence results about the affine scaling
algorithm for homogeneous problems have its analogue in the asymptotic behavior
of xN-part of the sequence of the affine scaling algorithm for general problems.
Therefore, it is plausible that an analogous result holds generally, that is, the dual
estimate usually does not converge to a unique point if A > 2/3 in the affine scaling
algorithm. (This is not the case when the problem is nondegenerate, because SN(xk)
converges even if VN(xk) == XN(xk)/ shx'fv does not converge. See Theorem 2.6.1.)

2.11 SUPERLINEAR CONVERGENCE OF


THE AFFINE SCALING ALGORITHM
Based on the relationship between the homogeneous affine scaling algorithm and the
Newton method for the analytic center, we can show that the affine scaling algorithm
can enjoy superlinear convergence property, without introducing any other auxiliary
search direction but only by controlling step-size carefully [64).

Now, suppose that the problem is homogeneous and has a unique optimal solution,
and see how we can obtain superlinearly convergent sequence to zero in this special
70 CHAPTER 2

=
case. We use the same notations as in §8. In §8.3, we observed that A 1/2 implies
quadratic convergence of the projected iterates V(xk) to the analytic center of Q+.
If the projected iterate v(xk) is sufficiently close to the analytic center v· of Q+,
then u(xk) - eln is very small (cf. Proposition 2.8.8). In this case, since we have

cT x+ = 1 _ A~ '" 1 _ A lIelnl1 2 = 1 - A (2.98)


cT x max[u] max[eln]
(cf. (2.79), here we put x := xk, x+ := xk+ 1 , u := uk), we can reduce the objective
function value a lot by taking "a very long step" like A '" 1.

On the other hand, in view of Theorem 2.8.7, if we take the step A'" 1, the step-size
((U(X),A) ('" AI(l - A) when u '" eln) in (2.94) can be very large, and x+ may
not stay well-centered any more in the sense that v(x+) '" v· or u(x+) '" eln hold.
Then we cannot expect to reduce cT x drastically any more in the next step at x+
even if we take another long-step, because (2.98) holds only when u(x+) '" eln.

=
However, if we take the step A 1/2 at x+ instead of taking a long-step, it is possible
to recover the centrality by taking advantage of the quadratic convergence of v( x)
to v*, which enables us again to take another long-step to decrease cT x sufficiently
in the next step.

Based on this idea, we can prove two-step superlinear convergence of the affine
scaling algorithm for homogeneous problems. Furthermore, the idea can be made
use of to implement a superlinearly convergent affine scaling algorithm for general
problems, because most of the convergence results about the homogeneous affine
scaling algorithm can hold in XN-space in general cases asymptotically. Indeed,
Tsuchiya and Monteiro [64] was able to construct a two-step superlinearly convergent
=
affine scaling algorithm by taking the step A 1/2 or A '" 1 alternatively. Stimulated
by this idea, Saigal [46] developed an affine scaling algorithm with a three-step
quadratic convergence property.

2.12 ON THE COUNTEREXAMPLE OF


GLOBAL CONVERGENCE OF THE
AFFINE SCALING ALGORITHM
In the fall of 1993, Mascarenhas gave an interesting example showing that the al-
=
gorithm fails to converge to an optimal vertex when taking the step A 0.999 [31].
In this section, we pick up this example. He considered the following dual standard
Affine Scaling Algorithm 71

homogeneous LP problem:

mm;mrn',~, ,ubjoct to ,~u f ~:) (: )> ( n' (2.99)

=
where (3 > 1. (Thus it have an edge (Yl, Y2, YO) t(I/(1 + (3), 1/(1 + (3),1) (t 2: 0).)
This is a dual standard form problem (2.2) where we let

b = -(0,0,1), c = (0,0,0,0), A=- (1o -1)


(3
1
(3
1
0
1
-1
0
0
. (2.100)

As was shown in (2.33), the iteration of the affine scaling algorithm for this problem
is as follows:
(2.101)

(We omitted the iterative formula for s which is automatically follows from s =
c - AT y.) The point of his example is that it is homogeneous, symmetric with
respect to Yl and Y2, and has no optimal solution. Let T«YI, Y2, Yo» (Y2,YI,YO). =
Due to the homogeneous property and symmetry, we can easily check the following
relations:

y+(p,y, >..) = P,y+(y, >..) (for 0 < p,) and T(y+(y, >..» = y+(T(y), >..). (2.102)

Now, suppose that we could find an interior feasible solution fj such that

fjo > 0 and y+(fj, >..) = p,T(fj), (2.103)

where 0 < p, < 1. Then, we have


y++(fj, >..) y+(y+(fj, >..), >..) = y+(p,T(fj), >..) = p,T(y+(fj, >..»
= J-lT(J-lT(fj» = p,2fj. (2.104)

This means that the iterates initiated at fj approaches zero, shrinking each of its
components exactly by a factor of J-l2 every two iterations. In other words, the
iterates initiated at fj with step-size >.. converge to the origin, not diverging with
driving Yo to minus infinity. Mascarenhas found that such a point fj exists for
=
>.. 0.999 by setting appropriate (3.
72 CHAPTER 2

This example is a homogeneous problem with no optimal solution. Now, we add one
more constraint
Yo;?: -1 (2.105)
which is parallel to a hyperplane where the objective function is constant. This
problem is no more homogeneous and has an optimal solution whose optimal value
is -1. We can easily show that the search direction ofthe affine scaling algorithm for
this modified problem is the same as the original homogeneous one. Thus, we obtain
the same result with this inhomogeneous problem [31, 57]. Namely, if we start the
iteration from a solution of (2.103), the sequence converges to the nonoptimal vertex
=
Y 0 and fails to find the optimal face where Yo -1. =
There is a simple explanation for why this inconvenience occurs in his example. We
return to the homogeneous case, and introduce a conic projection v(y) =
(Y1, Y2)/YO
for Y such that Yo > O. It is easily verified that the following proposition holds.

Proposition 2.12.1 Let y be an interior feasible solution of (2. 99) such that Yo > O.
y satisfies (2.103) if and only if
v(y+(y, A)) = T(v(fJ)), (2.106)

Now, let Q+ be the polyhedron defined as

{(Y1, Y2)1 Yo = 1, (y, s) E V+} = {(Y1, Y2)1 aliY1 + a2iY2 + aOi ::; 0 (i = 1, ... , 4)}.
(2.107)
Obviously, v(y) E Q+. The log barrier function for Q+ is defined as
4 4
- 2)ogs; = - 2)og[-(aliY1 + a2iY2 + aod]· (2.108)
;=1 i=l

Recall that v(y) is a conic projection of an interior feasible solution onto the hy-
perplane where the objective function is a constant. We are analyzing behavior of
the iterates {yk} by conically projecting onto the hyperplane where the objective
function is a constant. This situation is exactly the same as the one we analyzed
in §8.3. We have the following theorem which is a dual standard form version of
Theorem 2.8.7.

Theorem 2.12.2 v(y+(y, A)) - v(y) is proportional!o the Newton direction J,N at
v(y) to minimize the log barrier function (2.108) of Q+.
Affine Scaling Algorithm 73

Now, we subtract v(y) from the both sides of (2.106). Due to the theorem above
and the definition of T, we obtain

v(y+(y, A)) - v(y) = T(v(y)) - v(y)


(V2(Y) - Vl(Y), Vl(Y) - V2(y)) = C2(1, -1), (2.109)

where Cl, C2 are scaling constants. Thus, in view of Theorem 2.12.2, we can char-
acterize the set of initial points which generates the sequences convergent to y 0=
for a certain A as the set of points y satisfying

IN (v(Y)) = C3 (1, -1), (2.110)

where C 3 is a constant. An advantage of this characterization is that the properties


of IN over Q+ is studied well in connection with the primal interior point algorithms,
e.g., [6, 25]. Modifying the Mascarenhas example based on this explanation, we can
find an instance where the affine scaling algorithm with a step A ::; 0.92 fails to
converge to an optimal solution. See [57] for details.

Theorem 2.12.3 There exists an instance of LP problem where the affine scaling
algorithm with a A ::; 0.92 fails to converge to an optimal solution.

2.13 CONCLUDING REMARKS


We reviewed convergence results on the affine scaling algorithm. We close this survey
with comments on several topics we could not touch and by suggesting some open
problems.

2.13.1 Continuous Trajectory


One interesting topic we could not deal with was analysis on the limiting behaviors
of the continuous trajectory associated with the algorithm. Adler and Monteiro
[3] and Witzgall et al. [71] analyzed limiting behavior of the continuous trajectory
and proved global convergence. We should also mention the work by Megiddo and
Shub [33] which observed that the continuous trajectory can "visit" exponential
number of vertices before it comes to an optimal solution when applied to the Klee-
Minty problem. Due to this fact, many people think that the affine scaling algorithm
cannot be a polynomial algorithm. As was pointed out by Bayer and Lagarias [9] and
Tanabe [54, 55], there exists a nonlinear transformation of coordinate which maps
74 CHAPTER 2

each affine scaling trajectory to a straight line. Tanabe and Tsuchiya [56) observed
that this structure is nicely interpreted in the framework of the information geometry
by Amari and Nagaoka (5).

2.13.2 Saigal's Power Method


Saigal (47) considered a modified "power" version of the algorithm where the ellipsoid
{l:lIlX-r(x - x)11 :::; p" Ax= b} is used in place of the Dikin ellipsoid E(x, p,). (See
his text book (48) also.) One disadvantage of this modification is that we lose the nice
scaling invariance property, however, surprizingly, yet most of the results obtained
about the original version including global convergence of the primal and the dual
iterates, superlinear convergence, etc., carryover to this version.

2.13.3 Extensions to Infeasible Interior Point


Methods
Extension of the affine scaling algorithm to an infeasible interior point method is
given by Dikin and Zorkaltsev [20], Muramatsu and Tsuchiya [40, 41). The search
directions of these algorithms are combinations of the two affine scaling direction
aiming at feasibility and optimality. The search direction defines a smooth vector
field on the nonnegative orthant whose associated continuous trajectories end up with
points in the optimal set of (2.1). In (40) and (41), they proved global convergence
of the primal iterates and the dual estimates.

2.13.4 Convergence Results About General


Objective Function Case
The affine scaling algorithm is naturally extended to optimization of a nonlinear
function f(x) over the feasible region of (2.1). There are two versions of such
extensions. Let x* be the current iterate. The first one is to take a step in
the direction which minimizes the first order approximation of f(x), i.e., c(x) =
[\7f(x*)jT(x - x*), over the Dikin ellipsoid, while the second one is to determine
the next iterate as the minimizer of the second order approximation of f(x), i.e.,
(1/2)(x - x*f\7 2 f(x*)(x - x*) + c(x), over the Dikin ellipsoid with certain radius
o< p, < 1. The first one is referred to as the first order affine scaling algorithm while
Affine Scaling Algorithm 75

the second one is referred to as the second order affine scaling algorithm. There are
several convergence results obtained so far.

The convex quadratic programming problem (CQP) is the most direct extension of
LP. Sun extended the global convergence result of Tseng and Luo for LP to the
second order algorithm [53]. Monteiro and Tsuchiya proved global convergence of
the second order algorithm for CQP without non degeneracy assumption with the
=
step-size up to J-l 2/3, by extending the prooffor LP [36].

On the other hand, Gonzaga and Carlos [24] proved global convergence of the first
order algorithm for a convex function under the assumption that p+ is non degen-
erate. Recently, Monteiro and Wang proved global convergence of the second order
algorithm for convex and concave function under the same nondegeneracy condition
[38].

2.13.5 Open Question and Further Research


Topics
The most interesting open question about the affine scaling algorithm for LP is to
prove its polynomiality (when started from an well-centered point). So far, few
results are obtained about this problem. It seems important to develop a novel nice
measure to the central trajectory if one tries to prove polynomiality. In connection
with this, it looks interesting to see the problem more precisely in terms of the
information geometry. Recently, many of the IPMs and their convergence results are
extended to the semidefinite programming problems (SDP) - optimization of linear
functions over semidefinite cones [4, 42]. It would be a challenging and interesting
question how we can extend the affine scaling algorithm and the convergence analysis
to SDP.

2.14 APPENDIX: HOW TO SOLVE GENERAL


LP PROBLEMS WITH THE AFFINE
SCALING ALGORITHM
In order to solve an LP problem by the affine scaling algorithm, we have to convert it
to an equivalent problem satisfying Assumptions 1 '" 3. It is easy to check whether
Assumption 2 holds or not and it is easy to satisfy Assumption 3. Hence we only
deal with how to satisfy Assumption 1.
76 CHAPTER 2

2.14.1 Big-M Method


The first method is more or less the same as the Big-M method in the simplex
method. To solve the problem (2.1), we solve the following problem:

minimize (x,t) cT x + Mt
(2.111)
subject to Ax - t(Ax O - b) = b, x ~ 0, t ~ 0,
where X O is a positive vector. Obviously, this problem have an interior feasible
solution (x,t) = (xO, 1), and if M is sufficiently large, t is forced to 0 at its optimal
solution. This means that (2.1) can be solved by (2.111) if M is sufficiently large. In
this approach, we need to choose appropriate M in advance. Kojima and Ishihara
proposed a procedure to change M adaptively while running the algorithm to end
up with a sufficiently large M. See [27].

2.14.2 Phase I-Phase II Method


In this method, we solve the problem (2.1) in two stages. In the first stage, we
solve a problem to find a relative interior point of the feasible region of the original
problem, whereas in the second stage, we solve the LP problem to the optimality.

Let xO be any positive vector. Consider the following problem:

minimize (x,t) t subject to Ax - t(AxO - b) = b, x ~ 0, t ~ O. (2.112)

Obviously, (x, t) = (xO, 1) is an interior feasible solution of (2.112).


If we solve this problem by the affine scaling algorithm with ,\ ~ 2/3, the limiting
point X OO is a relative interior point of the feasible region of the original problem.
Let Nand B be the index sets such that xIV = 0 and xfj > O.

Since the optimal face of (2.112) is the feasible set of (2.1) and X OO is a relative
interior point of the optimal face of (2.112), X OO is a relative interior point of the
feasible region of (2.1). Then, (2.1) is equivalent to the following problem

mllllmize Xa C~XB subject to ABxB = b, XB ~ 0 (2.113)

for which xfj > 0 is available as an initial interior feasible solution for the affine
scaling method to solve this problem. We obtain the optimal solution of the original
problem by solving (2.113) with the affine scaling algorithm.
Affine Scaling Algorithm 77

REFERENCES
[1] Adler, I., Karmarkar, N., Resende, M., and Veiga, G., "Data structures and
programming techniques for the implementation of Karmarkar's algorithm,"
ORSA Journal on Computing, Vol. 1, No.2 (1989), pp. 84-106.
[2] Adler, I., Resende, M., Veiga, G., and Karmarkar, N., "An implementation of
Karmarkar's algorithm for linear programming," Mathematical Programming,
Vol. 44 (1989), pp. 297-335.
[3] Adler, I., and Monteiro, R. D. C., "Limiting behavior of the affine scaling con-
tinuous trajectories for linear programming problems," Mathematical Program-
ming, Vol. 50 (1990), pp. 29-51.
[4] Alizadeh, F., "Interior point methods in semidefinite programming with appli-
cations to combinatorial optimization," SIAM Journal on Optimization, Vol.5
(1995), pp.13-52.
[5] Amari, S.-I., "Differential-Geometrical Methods in Statistics," Lecture Notes in
Statistics, Vol. 28, Springer-Verlag, Berlin, 1985.

[6] Anstreicher, K., "Linear programming and the Newton barrier flow," Mathe-
matical Programming, Vol. 41 (1988), pp.367-373.
[7] Barnes, E. R., "A Variation on Karmarkar's algorithm for solving linear pro-
gramming problems," Mathematical Programming, Vol. 36 (1986), pp. 174-182.
[8] Bayer, D. A., and Lagarias, J. C., "The nonlinear geometry of linear program-
ming, I. Affine and projective trajectories," Transactions o/the American Math-
ematical Society, Vol. 314, No.2 (1989), pp. 499-526.

[9] Bayer, D. A., and Lagarias, J. C., "The nonlinear geometry of linear program-
ming, II. Legendre transform coordinates and centeral trajectories," Transac-
tions o/the American Mathematical Society, Vol. 314, No.2 (1989), pp. 527-581.
[10] Cavalier, T. M., and Soyster, A. 1., "Some computational experience and a
modification of the Karmarkar algorithm," The Pennsylvania State University,
ISME Working Paper 85-105, 1985.
[11] Cheng, Y.-C., Houck, D. J., Liu, J.-M., Meketon, M. S., Slutsman, L., Vanderbei,
R. J., and Wang, P., "The AT&T KORBX system," AT&T Technical Journal,
Vol. 68, No.3 (1989), pp. 7-19.
[12] Dikin, I. I., "Iterative solution of problems of linear and Quadratic program-
ming," Soviet Mathematics Doklady, Vol. 8 (1967), pp. 674-675.
78 CHAPTER 2

[13] Dikin, I. I., "0 skhodimosti odnogo iteratsionnogo protsessa "(in Russian), Up-
ravlyaemye Sistemy, Vol. 12 (1974), pp. 54-60.
[14] Dikin, I. I., "Letter to the editor," Mathematical Programming, Vol. 41 (1988),
pp. 393-394.
[15] Dikin, I. I., "The convergence of dual variables," Technical Report, Siberian
Energy Institute, Irkutsk, Russia, December, 1991.
[16] Dikin, I. I., "Determining the interior point of a system of linear inequalities,"
Cybernetics and Systems Analysis, Vol. 28(1992), pp. 54-67.
[17] Dikin, I. I., "Affine scaling methods for linear programming," Research Memo-
randum No. 479, The Institute of Statistical Mathematics, Tokyo, Japan, June,
1993.
[18] Dikin, I. I., Private communication, 1993.
[19] Dikin, 1.1., and Roos, C., "Convergence of the dual variables for the primal affine
scaling method with unit steps in the homogeneous case," Report No. 94-69,
Faculty of Technical Mathematics and Informatics, Delft University of Technol-
ogy, Delft, Netherlands, 1994.
[20] Dikin, I. I., and Zorkaltsev, V. I., "Iterativnoe Reshenie Zadach Matematich-
eskogo Programmirovaniya(Algoritmy Metoda Vnutrennikh Tochek)" (in Rus-
sian), Nauka, Novosibirsk, USSR, 1980.
[21] Gay, D., "Stopping tests that compute optimal solutions for interior-point linear
programming algorithms," Numerical Analysis Manuscript 89-11, AT&T Bell
Laboratories, Murray Hill, NJ, USA, 1989.
[22] Gonzaga, C. C., "Conial projection algorithms for linear programming," Math-
ematical Programming, Vol. 43 (1989), pp. 151-173.
[23] Gonzaga, C. C., "Convergence of the large step primal affine-scaling algorithm
for primal non-degenerate linear programs," Technical Report, Department of
Systems Engineering and Computer Sciences, COPPE-Federal University of Rio
de Janeiro, Brazil, 1990.
[24] Gonzaga, C. C., and Carlos, A., "A primal affine-scaling algorithm for linearly
constrained convex programs," Technical Report ES-238/90, Department ofSys-
terns Engineering and Computer Science, COPPE-Federal University of Rio de
Janeiro, Brazil, December 1990.
[25] Giiler, 0., den Hertog, D., Roos, C., Terlaky, T., and Tsuchiya, T., "Degener-
acy in interior point methods for linear programming," Annals of Operations
Research, Vol. 47 (1993), pp. 107-138.
Affine Scaling Algorithm 79

[26] Hall, L. A., and Vanderbei, R. J., "Two-thirds is sharp for affine scaling," Op-
erations Research Letters, Vol. 13 (1993), pp. 197-201.

[27] Ishihara, T., and Kojima, K., "On the big M in the affine scaling algorithm,"
Mathematical Programming, Vol. 62 (1993), pp. 85-94.

[28] Karmarkar, N., "A new polynomial-time algorithm for linear programming."
Combinatorica, Vol. 4, No.4 (1984), pp. 373-395.

[29] Karmarkar, N., and Ramakrishnan, K., "Further developments in the new
polynomial-time algorithm for linear programming," Talk given at ORSA/TIMS
National Meeting, Boston, MA, USA, April, 1985.
[30] Kortanek, K. 0., and Shi, M., "Convergence results and numerical experiments
on a linear programming hybrid algorithm," European Journal of Operations
Research, Vol.32 (1987), pp. 47-61.
[31] Mascarenhas, W. F., "The affine scaling algorithm fails for A = 0.999." Techni'Cal
Report, Universidade Estadual de Campinas, Campinas S. P., Brazil, October,
1993.
[32] McShane, K. A., Monma, C. L., and Shanno, D. F., "An implementation of a
primal-dual interior point method for linear programming," ORSA Journal on
Computing, Vol. 1 (1989), pp. 70-83.

[33] Megiddo, N., and Shub, M., "Boundary behavior of interior point algorithms
for linear programming," Mathematics of Operations Research, Vol. 14, No.1
(1989), pp. 97-146.
[34] Mehrotra, S., "Implementations of affine scaling methods: approximate solutions
of system oflinear equations using preconditioned conjugate gradient methods,"
Technical Report, Department of Industrial Engineering and Management Sci-
ences, Northwestern University, Evanston, IL 60208, USA, 1989.
[35] Monma, C. L., and Morton, A. J., "Computational experience with a dual affine
variant of Karmarkar's method for linear programming," Operations Research
Letters, Vol. 6 (1987), pp. 261-267.

[36] Monteiro, R., and Tsuchiya, T., "Global convergence of the affine scaling algo-
rithm for convex quadratic programming," Research Memorandum, The Insti-
tute of Statistical Mathematics, Tokyo, Japan, March 1995.
[37] Monteiro, R., Tsuchiya, T., and Wang, Y., "A simplified global convergence
proof of the affine scaling algorithm," Annals of Operations Research, Vol. 47
(1993), pp. 443-482.
80 CHAPTER 2

[38] Monteiro, R., and Wang, Y., "Trust region affine scaling algorithms for linearly
constrained convex and concave programs," Manuscript, School of Industrial
and Systems Engineering, Georgia Institute of Technology, Atlanta, USA, 1995.
[39] Muramatsu, M., and Tsuchiya, T., "Convergence analysis of the projective
scaling algorithm based on a long-step homogeneous affine scaling algorithm,"
Manuscript, September 1995. (To appear in Mathematical Programming. A re-
vised version of "A convergence analysis of a long-step variant of the projective
scaling algorithm," Research Memorandum No. 454, The Institute of Statistical
Mathematics, Tokyo, Japan, October 1992.)
[40] Muramatsu, M., and Tsuchiya, T., "Affine scaling method with an infeasi-
ble starting point," Research Memorandum No.490, The Institute of Statistica
Mathematics, Tokyo, Japan, 1994.
[41] Muramatsu, M., and Tsuchiya, T., "Affine scaling method with an infeasi-
ble starting point: Convergence analysis under non degeneracy assumption,"
Manuscript, 1995. (To appear in Annals of Operations Research.)
[42] Nesterov, Yu., and Nemirovskiy, A., "Interior Point Polynomial Methods in
Convex Programming," SIAM Publications, Philadelphia, Pensnsylvania, USA,
1994.
[43] Resende, M., Tsuchiya, T., and Veiga, G., "Identifying the optimal face of a
network linear program with a globally convergent interior point method," In
Large Scale Optimization: State of the Art (eds. W. W. Hager et al.), Kluwer
Academic Publishers, Netherlands, 1994.
[44] Resende, M., and Veiga, G., "An efficient implementation of a network interior
point method," Manuscript, AT&T Bell Laboratories, Murray Hill, NJ, USA,
March, 1992.
[45] Saigal, R., "A simple proof of primal affine scaling method," Technical Report,
Department of Industrial and Operations Engineering, University of Michigan,
Ann Arbor, MI48109-2117, USA, March, 1993. (To appear in Annals of Opea-
rations Research.)
[46] Saigal, R., "A three step quadratically convergent implementation of the primal
affine scaling method," Technical Report No.93-9, Department of Industrial and
Operations Engineering, University of Michigan, Ann Arbor, MI48109, USA,
1993.
[47] Saigal, R., "The primal power affine scaling method," Technical Report No.93-
21, Department of Industrial and Operations Engineering, University of Michi-
gan, Ann Arbor, MI48109, USA, 1993. (To appear in Annals of Opearations
Research.)
Affine Scaling Algorithm 81

[48] Saigal, R., "Linear Programming: A Modern Integrated Analysis," Kluwer Aca-
demic Publishers, Netherlands, 1995.
[49] Schrijver, A., "Theory of Linear and Integer Programming." John Wiley & Sons,
Chichester, England, 1986.
[50] Sinha, L., Freedman, B., Karmarkar, N., Putcha, N., and Ramakrishnan, K.,
"Overseas network planning," Proceedings of "the Third International Network
Planning Sysmposium - Networks' 86" (IEEE Communications Society, held on
June 1-6, 1986, Tarpon Springs, Florida, USA), pp. 121-124.
[51] Sonnevend, G., "An "analytic centre" for polyhedrons and new classes of global
algorithms for linear (smooth, convex) programming," Lecture Notes in Control
and Information Sciences, Springer-Verlag, New York, Vol. 84, pp. 866-876,
1985.
[52] Stewart, G. W., "On scaled projections and pseudo inverses," Linear Algebra
and its Applications, Vol.112 (1989), pp.189-193.

[53] Sun, J., "A convergence proof for an affine-scaling algorithm for convex
quadratic programming without non degeneracy assumptions," Mathematical
Programming, Vol.60 (1993), pp.69-79.
[54] Tanabe, K., "Center flattening transformation and a centered Newton method
for linear programming," Manuscript presented at MP seminar, the Operations
Research Society of Japan, July, 1987.
[55] Tanabe, K., "Differential geometry of Optimization" (in Japanese), Preliminary
issue of the Bulletin of the Japan Society for Industrial and Applied Mathemat-
ics, No.3 (1990), pp. 39-50.
[56] Tanabe, K., and Tsuchiya, T., "New geometry of linear programming" (in
Japanese), Mathematical Science, No.303 (1988), pp. 32-37.
[57] Terlaky, T., and Tsuchiya, T., "A note on Mascarenhas' counter-example about
global convergence of the affine scaling algorithm," Manuscript, March, 1996.
[58] Todd, M. J., "A Dantzig-Wolfe-like variant of Karmarkar's interior point method
for linear programming," Operations Research, Vol. 38(1990), pp.1006-1018.
[59] Tseng, P., and Luo., Z.-Q., "On the convergence of the affine-scaling algorithm,"
Mathematical Programming, Vol. 56 (1992), pp. 301-319.
[60] Tsuchiya, T., "On Yamashita's method and Freund's method for linear program-
ming" (in Japanese), Cooperative Research Report of the Institute of Statistical
Mathematics, Vol. 10 (1988), pp. 105-115.
82 CHAPTER 2

[61] Tsuchiya, T., "Dual standard form linear programming problems and Kar-
markar's canonical form" (in Japanese), Lecture Note of the Research Institute
of Mathematical Sciences, Vol. 676 (1988), pp. 330-336.
[62] Tsuchiya, T., "Global convergence of the affine scaling method for degener-
ate linear programming problems," Mathematical Programming, Vol. 52 (1991),
pp. 377-404.
[63] Tsuchiya, T., "Global convergence property of the affine scaling method for
primal degenerate linear programming problems," Mathematics of Operations
Research, Vol. 17, No.3 (1992), pp. 527-557.
[64] Tsuchiya, T., and Monteiro, R. D. C., "Superlinear convergence of the affine
scaling algorithm." Technical Report, CRPC-92288, Center for Research on
Parallel Computation, Rice University, Houston, USA, November, 1992. (To
appear in Mathematical Programming.)
[65] Tsuchiya, T., and Muramatsu, M., "Global convergence of a long-step affine
scaling algorithm for degenerate linear programming problems," SIAM Journal
on Optimization, Vol. 5, No.3 (1995), pp.525-551.
[66] Tsuchiya, T., and Tanabe, K., "Local convergence properties of new methods in
linear programming," The Journal of the Operations Research Society of Japan,
Vol. 33, No.1 (1990), pp. 22-45.
[67] Vanderbei, R. J., and Lagarias, J. C., "I. I. Dikin's convergence result for the
affine-scaling algorithm," Contemporary Mathematics, Vol. 114 (1990), pp. 109-
119.
[68] Vanderbei, R. J., Meketon, M. S., and Freedman, B. A., "A modification of
Karmarkar's linear programming algorithm," Algorithmica, Vol. 1 (1986), pp.
395-407.
[69] Vavasis, S. T., and Ye, Y., "A primal-dual accelerated interior point method
whose running time depends only on A," Technical Report, Department of Com-
puter Science, Cornell University, December, 1994.
[70] Wang, Y., and Monteiro, R., "Non degeneracy of polyhedra and linear pro-
grams," Manuscript, School of Industrial and Systems Engineering, Georgia
Institute of Technology, Atlanta, USA, 1994. (To appear in Computational Op-
timization and Applications.)
[71] Witzgall, C., Boggs, P. T., and Domich, P. D., "On the convergence behavior
of trajectories for linear programming," Contemporary Mathematics, Vol. 114
(1990), pp. 161-187.
3
TARGET-FOLLOWING METHODS
FOR LINEAR PROGRAMMING
Benjamin Jansen, Cornelis RODS,
Tamas Terlaky
Faculty of Technical Mathematics and Computer Science
Delft University of Technology
Mekelweg 4, 2628 CD, Delft, The Netherlands

ABSTRACT
We give a unifying approach to various primal-dual interior point methods by performing
the analysis in 'the space of complementary products', or v-space, which is closely related to
the use of weighted logarithmic barrier functions. We analyze central and weighted path-
following methods, Dikin-path-following methods, variants of a shifted barrier method
and the cone-affine scaling method, efficient centering strategies, and efficient strategies for
computing weighted centers.

Key Words: target-following, primal-dual, weighted logarithmic barrier, unified frame-


work, centering, analytic center, central path

3.1 INTRODUCTION
In this chapter we offer a general framework for the convergence analysis of primal-
dual interior point methods for linear programming (LP). This framework is general
enough to apply to very diverse existing methods and still yield simple convergence
proofs. The methods being analyzable in this context are called target-following.
These methods appeared to be closely related to the methods using a-sequences
developed by Mizuno [24, 25] for linear complementarity problems (LCPs).

To be more specific we use the LP problem in standard form

(P) mill { cT x : Ax = b, x ~ 0 } ,
x

83
T. Terlaky (ed.), Interior Point Methods o/Mathematical Programming 83-124.
© 1996 Kluwer Academic Publishers.
84 CHAPTER 3

and its dual

where c, x E lRn , b, y E lRm. We assume the existence of a positive primal-dual pair


(i.e., Slater points) for (P) and (D). Consider the system 1

b,
C, (3.1 )
-2
V ,

for v E lR~ + (i.e., v > 0). The basic result in the development and analysis of
target-following methods is contained in the following theorem, establishing a one-
to-one correspondence between positive primal-dual pairs (x, s) and positive vectors
in lRn. The theorem was proved by McLinden [22], Kojima et al. [20], see also Giiler
et al. [11].

Theorelll 3.1.1 Let there exist at least one positive primal-dual pair for (P) and
(D). Then for each v E lR~+ there exists a unique positive primal-dual pair (x, s)
such that Xi Si = v; ,
i = 1, ... ,n, i. e., a pair solving system (3.1).

The existence of the solution follows from the observation that the given system is
the Karush-Kuhn-Tucker (KKT) system for minimizing the weighted logarithmic
barrier function
n
f(x, s;v) = xT S - LV; InXiSi (3.2)
i=l

on the primal and dual set. We now define the v-space of a given LP problem as the
space of (the square roots of) the complementary products of positive primal-dual
pairs:

v = { v E lRn : Vi = VXiSi, Ax = b, AT Y + s = c, x> 0, s > 0 }.


Note that if v = y'xS then IIvl12 = x T S, so in the v-space the points with constant
norm represent all positive primal-dual pairs with a fixed duality gap. Observe that
all optimal pairs (x,s) correspond to the vector v = O. The central paths of (P)
and (D) are the set of solutions of (3.1) with v2 = J-t e, where J-t > 0 and e an all-
one vector of appropriated length (cf. Chapter 1 of this book). The image of the
central path in the v-space is the main diagonal; also the image of the weighted path
that passes through an initial point (x(O), s(O)) is the positive ray passing through
1 As far as notation is concerned, if x, s E IR n thenx T s denotes the dot product of the two vectors,
whereas xs, Vx and x" for Ct E IR denote the vectors obtained from componentwise operations.
Target following for LP 85

v(O) = Vx(O)s(O). Atkinson and Vaidya [1] discuss how the efficiency of Newton's
method is affected by differences in the elements of a weight-vector. They give a
simple example demonstrating that when the ratio between the smallest and the
largest weight decreases, the region where Newton's method converges gets smaller.
Hence, a natural way of measuring the closeness of a point to the central path appears
to be this ratio, which is denoted as

( _) ._ min (v) (3.3)


w v .- max (v) .

Note that 0 < w(v) ::; 1, with equality if and only if v is on the central path. To
combine centering and improving complementarity we will be interested in trajec-
tories of which the image in the v-space passes through v(O) and is tangent to the
main diagonal at the origin of the positive orthant.

To analyze primal-dual algorithms we focus on a few general concepts. The basic


algorithmic step in path-following primal-dual interior point methods is a Newton
step in the (x, s )-space. This step is defined with respect to some target(-point) v in
the v-space. The fundamental property in interior point methods is that the step is
feasible (i.e., preserves the interior point property) if the current iterate (x, s) is close
enough to the target v, where closeness is defined with some appropriate measure
of proximity. With this in mind, we can define the concept of a target-sequence, by
which we mean any sequence of vectors in the v-space. A traceable target-sequence
is a target-sequence with the property that: (i) it can be approximated, in the sense
of the above mentioned proximity measure, by a sequence of points in the (x, s)-
space, such that (ii) successive points in the (x, s)-space are obtained by some 'easy'
computations such as one or a few Newton steps. If the target-sequence converges
to some point, then we may enforce convergence of the associated (x, s)-sequence to
the target limit. We now define a target-following algorithm as an algorithm that
generates iterates (xCk), s(k») which are close to their corresponding targets v Ck ). In
the standard (central) path-following methods the targets are points on the central
path. Then the (traceable) target-sequence is determined by

for certain values JJo > 0 and 0 ::; Ok ::; 1, where k is the iteration number. A
weighted-path following algorithm has a given v(O) > 0 and sets

However, the one-to-one correspondence between points in the v-space and positive
primal-dual pairs (x, s) su?gests that, to solve t~e LP problem, we can follow any
sequence of targets {vCk )} III the v-space, for whIch eT (vCk»)2 tends to zero, hence
86 CHAPTER 3

leads to optimality. The same methodology can be used to solve other problems, like
computing weighted centers. Note that a target-sequence may consist of an infinite
as well as a finite number of targets; a target-sequence can be predetermined, but
also adaptively constructed during the algorithm.

The striking feature of the convergence analysis we propose is that it is essentially


performed in the v-space. We express a simple condition on the target-sequence
to be traceable by a sequence of primal-dual pairs (x, s). By verifying that a given
target-sequence satisfies the condition, we have a simple methodology to derive
complexity bounds. The general results are developed in Section 3.2. In this way
we are able to analyze and prove convergence of a great variety of algorithms (see
Section 3.3) such as the standard path-following method [27, 21] and the weighted
path-following method [3], predictor-corrector variants of these algorithms (Mizuno
et al. [26]), two variants of the Dikin-path-following method [18], a variant of the
cone-affine scaling algorithm [31], a variant of Freund's shifted barrier method [5],
algorithms for computing analytic centers [13, 24] and algorithms for computing
weighted centers [25, 1]. The convergence proofs are short and similar, thereby
demonstrating the unifying value of an analysis focusing on the v-space.

3.2 SHORT-STEP PRIMAL-DUAL


ALGORITHMS FOR LP

3.2.1 Directions in v-space and (x, s )-space


In this section we will analyze primal-dual methods for LP that follow a traceable
target-sequence. Methods of this type have an iterative nature, meaning that in
every iteration a direction is computed that leads from the current iterate to the
next. Let (x,s) be a pair of primal-dual interior-feasible solutions, and let v be
the corresponding point in the v-space, i.e., v =..jXS. Furthermore, let v be the
current target-point in the v-space. Our aim is to find an approximate solution of
the system of equations (3.1), or stated otherwise, we seek directions (~x, ~y, ~s)
such that

A(x + ~x) = b,
AT (y + ~y) + s + ~s = c,
(x + ~x)(s + ~s) = -2
v.
Target following for LP 87

Applying Newton's method to this system we remove the nonlinear term in the last
equation and obtain the following relations for the displacements:

A~x 0,
AT ~y+ ~s 0, (3.4)
x~s+s~x v2 _ v2 •

It is not difficult to obtain explicit expressions for the search-direction vectors


~x, ~y and ~s. For the analysis below it will be convenient to work in scaled
space as has become more or less standard in the literature on primal-dual methods
for LP (see Gonzaga [8]). To this end we introduce the vector

d:= ";xs- 1 .

Using d we can rescale both x and s to the same vector, namely v:

d- 1 x = ds = v.
The main property of the scaling is that it maps both x and s to the vector v; this
property is extended to a nonlinear setting by Nesterov and Todd [28]. We also use
d to rescale ~x and ~s:

P. := d~s.

Note that the orthogonality of ~x and ~s implies that Px and P. are orthogonal
as well. Thus, in the scaled space, the search-directions Px and P. are orthogonal
components of the vector
p" := px + P.· (3.5)
By definition, we may write

x~s + s~x = xd-ld~s + sdd-l~X = v(Px + P.).


Obviously ~y should not be scaled, hence we define py = ~y. So, Newton's direction
is determined by the following linear system:

ADpx o
DATpy + P. o
Px + P. v-I (v 2 - v 2) = p".
Note that Px and P. are simply the orthogonal decomposition of p" in the nullspace of
AD and the row space of AD respectively. Note that this is established by the scaling
with d. We mention here that this is the last time that the data A, b, c explicitly
appear in this section, and that the data only come in via an initial starting point.
This has the great advantage that we work completely in the v-space from now on.
88 CHAPTER 3

3.2.2 Analysis of the Newton step


Since we will use Newton's method for following a traceable target-sequence we need
to analyze its behavior. Let us define the vector qv as follows:

qv := px - P.·
Note that the orthogonality of Px and P. implies that IIqvll = IIPvll. We also have
px t(Pv + qv),
P. t(Pv - qv),
whence
(3.6)
The product PxP. plays an important role in the analysis. It represents the sec-
ond order effect in the Newton step, which needs to be small to prove efficiency of
Newton's method. Indeed, we have

(x + ~x)(s + ~s) = xs + x~s + s~x + ~x~s = v 2 + VPv + PxP. = 'iP + PxP•.


So, unless the nonlinear term ~x~s (that was left out in (3.4) to obtain a linear
system) is zero, the vector of complementarity products after the step will not exactly
be iJ2 . We relate the euclidean and the infinity norms of this product to the norm
of Pv as follows; a similar lemma for the case iJ is on the central path is proved by
Mizuno et al. [26]).

Lemma 3.2.1 One has IIPxp.lloo :5I1PvIl 2 /4 and IIPxp.1I :5I1PvIl 2 /(2-/2).

Proof Using (3.6) we may write

Using (3.6) once more we obtain

IIPxp.1I 2 = eT (Pxp.)2 = 116eT (p; _ q;)2 = 116 lip; _ q;11 2


< 116 (lip; 112 + I q;ln :5 116 (IIPvIl4 + IIqvll4) = kIIpv 114.
This proves the lemma. o
Target following for LP 89

In the analysis of target-following algorithms we will need a measure for the prox-
imity of the current iterate v to the current target v. For this purpose we introduce
the following proximity measure:

6(v;v) := 1
2min(v) IIPvli = 1
2min(v) v2v
II - - v211
- . (3.7)

We point out that this proximity measure is in the spirit of the Roos-Vial measure
[30], and the primal-dual measures discussed in Jansen et al. [19]. Note that this
measure is not symmetric in the iterate v and the target v. Defining
V
'U := -, (3.8)
v
the measure can be rewritten as
1 1
6(v;v) = 2 mmv
. ( ) Ilv- 1 (v 2 - v2 )11 = 2 mmv
. ( ) IIv(u - u-1)II· (3.9)

Let us indicate that if v2 = J.le for some positive J.I then this amounts to

which is up to the factor 1/2 equal to the proximity measure used in [19]. A similar
measure, namely
6M (v;v):= 2mi~(v) Ilv2; v211,
was used by Mizuno [24, 25]. This proximity measure differs from ours by a factor
involving

The next lemma is concerned with bounding these quantities. Moreover, our analysis
will show that these quantities are very important for the proximity in the v-space.

Lemma 3.2.2 Let 6 := 6(v;v) and u as defined in (3.7) and (3.8). Then it holds

1
p(6) ~ Ui ~ p(6), i = 1, .. . n,
where
p(6) :=6+~. (3.10)
90 CHAPTER 3

Proof Observe that


1 1 1
8=2. ()llv(u-u-l)II~2 . ()min(v)IIu-u-lll=-211u-u-lll·
mmv mmv
So, for each i, 1 ~ i ~ n,
-28 ~ u;l - Ui ~ 28.
Since Ui is positive, this is equivalent to

-2u;8 ~ 1 - u; ~ 2u;8,
or
u; - 2u;8 - 1 ~ 0 ~ u; + 2Ui8 - 1.
One easily verifies that this is equivalent to

p(8)-1 ~ Uj ~ p(8).
This proves the lemma. o

We proceed by investigating when the (full) Newton step to the target-point v can
be made without becoming infeasible, i.e., under which conditions the new iterates
x+ := x + dx and s+ := s + ds are positive. The next lemma gives a simple
condition on 8(v;v) which guarantees that the property is met after a Newton step.

Lemma 3.2.3 If IIv- 2p.,Pslloo < I, the Newton step is feasible. This condition is
satisfied if8 := 8( v; v) < 1.

Proof Let 0 ~ 0' ~ 1 be a step length along the Newton direction. We define
= =
x(n) x + ndx and sen) s + nds. Then we have

x(n)s(n) = (v + np.,)(v + nps) = v2 + nv(p., + Pa) + n 2 p.,Pa


v 2 + n(v2 _ v 2 ) + n 2 p.,Pa (3.11)
= v 2 (1- 0') + nv2 (e + nv- 2 p.,Pa) .
So, if IIv- 2 p.,p.1l00 < 1 and 0' ~ 1 then x(n)s(n) > 0, which proves the first state-
ment. The condition on 8 follows from the observation

II p.,Pall < IIp.,p.lloo <


IIpv 112 = 82.
v2 00 - min(v)2 - 4min(v)2
where the last inequality follows from Lemma 3.2.1. o
Target following for LP 91

Letting Q = 1 in (3.11) and denoting (v+)2 = x+s+ we get the useful relation

(3.12)

The following lemma shows that if the current iterate v is close enough to the target
v, the Newton step ensures quadratic convergence of the proximity measure.

Lemma 3.2.4 Assume that 8 := 8( v; v) < 1 and let v+ result from a Newton step
at v with respect to v. Then one has

+ _2 84
8(v ;v) :::; 2(1- 82 )

Proof. From Lemma 3.2.3 we know that x+ and s+ are feasible. For the calculation
of 8(v+; v) we need v+. From (3.12) and Lemma 3.2.1 we get

Using this relation, (3.9) and (3.12) we may write

.1 II(v+)-1(v2_(v+?)112
4mm(vp

4 . \ pll(v+)-lpxPsI12
mm v
1 IIPxPsl12
< 4 min( v p min( v+)2 .

Substitution of the bounds derived in Lemma 3.2.1 and (3.13) yields

8(v+. v)2 < 1 Ilpv 114


, - 32min(vp min(v)2(1- 82 ).

Performing the substitution Ilpv II = 2 min(v)8, gives

( + _2 84
8 v ; v) :::; 2 (1 _ 8 2 )'

which proves the lemma. o


92 CHAPTER 3

For 8 := 8( v; v) < ..j2J3 it holds 8( v+; v) < 8, implying convergence of the sequence
of Newton steps, while for 8 < 1/>12 it holds 8(v+;v) < 82 , guaranteeing quadratic
convergence.

The Newton step has another important consequence, namely that the duality gap
after the step has the same value as the gap in the target v.

Lemma 3.2.5 Let the primal-dual feasible pair (x+, s+) be obtained from a full
Newton step with respect to v. Then the corresponding duality gap achieves its target
value, namely (x+f s+ = Ilv112.

Proof Recall from (3.12) that (V+)2 = v 2 + pxP•. Hence, using orthogonality of Px
and p. we may write

This lemma has two important implications. First, if subsequent Newton steps would
be taken with v fixed, then the duality gap would remain constant. Furthermore, if
we take only full Newton steps in an algorithm (as is typically done in short-step
methods) the lemma implies that we do not have to bother about the duality gap
in the iterates themselves, but that it suffices to consider the duality gap in targets.

To complete the general results we will analyze the effect on the proximity measure
of a Newton step followed by an update in the target. This is technically a bit more
easily than analyzing the effect of an update in the target followed by a Newton
step, since now we can just use P. as defined before. Although the lat~er might seem
more natural both approaches are of course equivalent. We will do the analysis in
a very general setting, such that in the sequel it will be an easy task to apply this
theorem and derive polynomial complexity bounds for various applications.

Theorem 3.2.6 Let v and v be such that 8 := 8( v; v) ~ 1/2. Let v+ be obtained


from v by a full Newton step with respect to v and let v+ E 1R++ be arbitrary. Then

-/6 1 min (v)


8(v+·V+)
,
<
-
-8(v·v+)
2'
+ -2-/6 min(V+)
.
Target following for LP 93

Proof First, from Lemma 3.2.3 it follows that v+ is well-defined. By definition we


have
+.~ _
8( v ,v ) - 2 mm
1
. (~)
II(v+?-(v+)211
v v+ .

Recall from (3.12) that (V+)2 = v 2 + PxP. and from (3.13) that

min(v+)2 ~ min(v)2(1 - 8 2). (3.14)

Using these and Lemmas 3.2.1 and 3.2.2 gives

1 II (V+)2 - v2 v II 1 IIPxPs II
2 min(v+) v v+ + 2 min( v+) --;;+
< 8(v;V+) II v: 1100 + 2min(v+~min(v+)2~llpvI12
min (v? 2
< 8(v·V+)p(8(v+·v)) + 8
, 'V2min(v+) min(v+)
8- ~ 8 + - min (v) 82
< (v; V )p( (v ; v)) + min(v+) )2(1- 82 )'

where the last inequality follows from (3.14). Finally, from Lemma 3.2.4 we obtain

82
8(v + . v) < -~;====:;;~
, - )2 (1- 82 )

Substituting 8 ::; 1/2 yields 82 / )2( 1 - 82) ::; 1/ (2V6) and p( 8( v+ ; v)) ::; V6/2. This
gives the bound. 0

We will later apply this theorem several times in the following way. Given v close
to a target v such that 8( v; v) < 1/2, we need to determine a condition on the new
target v+ such that v+ will be in the region of quadratic convergence around 1)+, in
other words, such that 8( v+; v+) < 1/2. The lemma implies that this can be done by
measuring the proximity 8(v; v+) between the targets and the ratio min(v)/min(v+).

3.3 APPLICATIONS
We will now apply the general ingredients from Section 3.2.2 to various primal-dual
algorithms found in the literature, and to some primal-dual variants of pure primal
or dual methods that appear in the literature. The reader should recall that the only
94 CHAPTER 3

missing element to complete the convergence analysis of a target-following method


is to determine the step size that can be taken, which is obtained from the condition
that after a Newton step the iterate should be close to an updated target, in the
sense that it belongs to the region of quadratic convergence around the target (cf.
Theorem 3.2.6). The number of iterations required then follows from analyzing the
effect of the step size on the measure of progress.

3.3.1 Path-following Methods


The standard path-following methods were derived and analyzed by Monteiro and
Adler [27] and Kojima et al. [21], being inspired by studies on the central path by
Megiddo [23] and Bayer and Lagarias [2], among others. Ding and Li [3] analyzed
primal-dual weighted path-following methods [3] (see also Mizuno [24]; a primal
version was studied by Den Hertog et al. [29, 14]).

3.3.2 Weighted Path-following Methods


In the weighted path-following methods the centering phase is by-passed and the
iterates keep approximately the distance to the path as in the initial point. Let veO )
be given, define

It is evident that

Lemma 3.3.1 Let v be given and let w = min(v)/max(v); using the target update
v+ =
~v, we have
min (v) 1
and
min(V+) - VI - {}

Proof The first statement is trivial. The second follows from

1 II (1 - (})v 2- v211
2~min(v) v

2~min(V) II {}vl I
1 {}vfn
< 2~w .
Target following for LP 95

As is clear from the lemma, in the maximal step size we have to take into account
w. Combining Lemma 3.3.1 with Theorem 3.2.6 gives that 6(v+;V+) < 1/2 for
8 = w/(3fo.). Since IIv+W = (1 - 8) II vII 2 , we get by Lemma 3.2.5 that the num-
ber of iterations required for the algorithm is O(fo./wln(x(O)ls(O)/f). Note that
for central path-following methods w = 1, so the complexity bound is negatively
influenced by non-central starting points. The bound is in accordance with [3] for
weighted path-following.

Predictor-corrector Methods
We will now analyze a predictor-corrector variant of the path-following algorithm.
As above, we assume an initial V<0) been given. Let (x, s) be the current iterate. An
'iteration' of the algorithm consists of two steps: the predictor step, which is a step
in the primal-dual affine-scaling direction, followed by a centering (corrector) step.
Let the current iterate be (x, s), and target

such that 6( v; v) :::; 1/4. In the predictor step, the target is zero; using step size ()
the new complementarity satisfies

(3.15)

For the corrector step, we specify a new target on the path determined by V<0) as
follows:
T( +)2
(V+)2:= e v (v(O)?
eT (V(0)2

We claim that there exists a 'sufficiently large' value for 8 such that 6( v+; v+) :::; 1/2.
From the quadratic convergence result of Lemma 3.2.4 it then immediately follows
that we can compute v++ such that 6(v++;V+) :::; 1/4 in one Newton step towards
v+. Lemma 3.2.5 implies that
(3.16)

Consequently, defining
96 CHAPTER 3

it follows 6(v++;V++) :::; 1/4. It remains to prove the claim. The next lemma
provides a lowerbound on 8 such that 6( v+; v+) :::; 1/2; in practice we can compute
that value 8 such that 6( v+; v+) = 1/2, which will be (much) larger in general.

Lemma 3.3.2 Let the iterates and targets be as defined above. IJ6(v;v):::; 1/4 and
8:= w(v)/(2fo) then 6(v+;V+) :::; 1/2.

Proof First, note that eT (V+)2 = (1 - 8)eT v 2. For the predictor step Ax, As as in
(3.15) Lemma 3.2.1 implies

(3.17)

and
(3.18)

By definition
1
--r==================== x
2)( eT (v+)2 / eT (vC°»)2)min( (vC°»)2)
(e T (v+)2 / eT (V<0»)2)(vC°»)2 - v 2(1 - 8) - 82AxAs I
II Jv 2 (1 - 8) + 82AxAs

1
< x
2~)(eT v 2 /e T (vC 0 »)2)min«vC 0 »)2)
1
x
Jl- 8 - 82l1v-2AxAslloo

{ (1 _ 8) II (e T v 2leT (V<O)~2)(V<O»)2 - v 2 11 + 82 11v- 1 AXASII} .

Using the value of 8, n 2: 2 and (3.17) it follows

2 2 w(v) W(V)2 n 1 1
1- 8 - 8 Ilv- AxAsll oo 2: 1- 2fo - ~ 4w(v)2 2: 1- 2../2 - 16'

Hence

< ~ {v'f="86(V'V) + w(v)2 nmax(v)2 1 x


3 ' 4 n 2../2min(v) 2)1- 1/(2'-1'2)
Target following for LP 97

<

To estimate the last term we note that eT v 2 = eT v2 , while v2 = a(V<°))2 for some
o< a < 1. From Lemma 3.2.2 it follows
min(v) _
~(_) :::; p(8(v; v)).
mm v
Combining these bounds we obtain
1 4 1 1
8(v+;V+):::; 3+ 3 / . p(I/4) < 2'
32y 1 - 1/(20)
which proves the lemma. o

To obtain a complexity bound, we need to show that the step-size can be bounded
from below by a uniform constant. Let i, j be such that min( v) = Vi and max( v) = Vj.
Then
min(v) min(v) Vi max(v)
w(v)
max( v) ~ max(v) max( v)
1 min (v) Vj 1 _ 3 (-(0)
> --;-:-;--~ - ----- > w > -w v ).
p(8(v; v)) max(v) max(v) - p(8(v; v))2 - 5
Together with (3.16) this shows that, to reach an f-approximate solution, the algo-
rithm requires at most O( fo/w(V<°)) In 1/ f) iterations. Observe, that the estimation
in (3.17) essentially determines the complexity bound; in practice the actual value
Ilv- 2 ~x~sll will determine the actual predictor step that can be taken.

3.3.3 Algorithms Using Dikin-affine Steps

Motivation
In Jansen et al. [17] the primal-dual Dikin-affine scaling direction at v is introduced
by using the solution of the subproblem

~~n {vT ~v : IIv-l~vll:::; 1},


98 CHAPTER 3

defined in the v-space. This problem can be interpreted as finding the direction in
the v-space that aims at a maximum decrease in the duality gap within the Dikin-
ellipsoid in the v-space. The solution ~v is given by -v3 /1Iv2 11. Let us now use the
vector field of the primal-dual Dikin direction and its associated set of trajectories.
The equation of the trajectory passing through v E IR~+ and tangent to the vector
field is given by
v
lI>(t;v) = ~' t:?: o. (3.19)
vv2 t + e
It holds 11>(0; v) =
v and, for t --+ 00, lI>(t; v) tends to zero tangentially to the vector e.
Observe that the central path and the weighted paths discussed in Section 3.3.1 are
straight half-lines emanating from the origin. Contrary to these, the path defined
by (3.19) is a smooth curve connecting v and the origin, see Figure 3.1.

central path

v (0)

° VI

Figure 3.1 The central path, the weighted path through v(0) and the Dikin-path
through v(0) in the v-space.

We first show that lI>(t; v) defines a path in the v-space, henceforth called the Dikin-
path starting at v, and derive some interesting properties.

Lemma 3.3.3 Let lI>(tjv) be as defined in (3.19).


(i) For any t1, t2 :?: 0 it holds
Target following for LP 99

(ii) For any t 2: 0 it holds that ifv; ::; Vj then 11>;(t; v) ::; I1>j (t; v);
(iii) For any t 2: 0 it holds w(l1>(t; v)) 2: w(v).

Proof (i) It holds

(ii) If Vi ::; Vj then it also holds

vJ(V]t + 1) ::; vJ(V]t + 1),


from which the statement follows.
(iii) Using the fact from (ii) that the ordering of the coordinates of v is the same
along the path we have

min(l1>(t; v)) _ min (v) max(v)2t + 1


w(l1>(t;V)) = max(l1>(t; v)) max(v) min(v)2t + 1

max(v)2t + 1
w(V) --'-~2-- > w (_)
V .
min(v)t+l-
o

Algorithms using such paths were introduced in Jansen et al. [18]. Note that it
combines centering and moving towards optimality, as opposed to a weighted path.
We stress that centering is very important in interior point methods. A sequence
of iterates that approximate the central path (in the limit) will generate points
converging to the analytic center of the optimal face, see Guier and Ye [12]. It is
well-known that this center is a strictly complementary solution, thereby defining
the optimal partition of the indices that characterizes the optimal set, which is
very useful in sensitivity analysis. Also, the asymptotic analysis of certain interior
point methods use the centering to prove superlinear or quadratic convergence of
algorithms, see e.g., [10, 9].

We will consider two algorithms. The first is called a Dikin-path-following method.


Given an initial target ve O), the other targets will all be at the Dikin-path starting
at vC°). The second algorithm we consider uses the tangent at vC°) and moves the
target with a certain step size in this direction. This brings the new target to a
different Dikin-path, from which the algorithm proceeds. We will show that from
a complexity point of view both algorithms behave similarly. Observe, that in the
case of a weighted path-following method both approaches are identical.
100 CHAPTER 3

Algorithm 1, Properties and Complexity


Let the initial target be denoted by v(0) and let (x(O), sea)~ be such that for v(O) :=
y' x(O)s(O) we have 6( v(O); v(0) :::; 1/2. In view of Lemma 3.3.3( ii) we assume

min(v(°) = viO) , max(v(O) = v},0).


The target-sequence is determined by values tk > 0 and the targets are defined by

Since we are interested in the behavior of Newton's method per iteration we just
denote v := v(k-l), v+ := v(k) and t := tk. We also use w := w(v). Taking for
~x and ~s the displacements according to a full Newton step with respect to the
target-point v, we can now formally state the algorithm as in Figure 3.2.

Input
(x eO) ,s(O): the initial pair of interior feasible solutions;
v(O) := y' x(O)s(O);
Parameters
f is the accuracy parameter;
t is the step size (default value wo/(3v~fo»;
begin
x := xeD); s := sea); v:= ViS;
while x T s > f do
v:= v/Jv2t + e;
compute (~x, ~s) from (3.4);
x:= x + ~x;
end'
s·= s + ~s·,
end.

Figure 3.2 Dikin-path-following algorithm.

From Section 3.2.2 it is clear that the only thing remaining to analyze a target-
following method, is to guarantee that a sufficiently large step size in the v-space can
be taken, and to use this to compute the number of steps needed by the algorithm.
Specifically, we should check for which value of t the conditions of Theorem 3.2.6
hold.
Target following for LP 101

Lemma 3.3.4 Let v+ result from a step along the Dikin-path with step size
W
t := 3v~fo.

Then
_m_l_·n~(v-".)_ < ~ and
min(v+) - 8

Proof. Using min(v) = VI and min(v+) = vt, the first bound follows from
(3.20)

Furthermore

<

6Jw'l/(3fo) +1
1
<
6
This completes the proof. D

Assuming 6( v; v) < 1/2, combining Theorem 3.2.6 with Lemma 3.3.4 shows that we
can compute v+ in one Newton step such that 6( v+; v+) < 1/2. We proceed by con-
sidering the reduction of the duality gap in the algorithm. Recall from Lemma 3.2.5
that after a full Newton step the duality gap attains its target value, so we only need
to consider the duality gaps eT v2 resulting from successive target values. Using this,
we prove the following theorem.

Theorem 3.3.5 Let (x(O), s(O)) be a given initial point and lei
102 CHAPTER 3

If the step size t has in every iteration the value wl(3Vnt;;) then after at most

o (_Vn_n In -'.-(x_C_o).:.-)T_s_(0_) )
w~ f

iterations the algorithm stops with a positive primal-dual pair (x*, SO) satisfying
(x*f s* ::::: f.

Proof. At the start of the algorithm the duality gap is given by

If, as before, the target-point at the beginning of some iteration is denoted as v and
at the end of the same iteration as V+, then we have

where w := w(v). Since w 2: wo by Lemma 3.3.3(iii), at each iteration the duality


gap is reduced by at least the factor

1 + w~/(3Vn)'

From this the theorem follows. D

From the theorem we see that whenever (x(O))T s(O) = 0(1) and wo = n(l), the
target-following algorithm runs in O( Vn In 11 f) iterations. Unfortunately, whenever
Wo is smaller than 0(1) the complexity bound is heavily negatively influenced. We
will later show how the bound can be improved by adjusting the analysis and using
the fact that the proximity w increases along the Dikin-path.

Algorithm 2, Properties and Complexity


The second algorithm we consider determines the target-sequence by moving from
one target to the other using tangents to successive Dikin-paths. Specifically, given
a current target v let us define the next target by

-3 ( -2 )
v+ := V - a 11~211 = v e- a 11~211 '
Target following for LP 103

for some positive number 0:'. Since we require v+ to be positive it is well defined
only if
._ IIv2 11
0:' < O:'max .-
max ( V )2'
Defining the step size (J by (J := O:'/O:'max we have 0 < (J < 1 and

v+ := V - (J
max
-3 )2 = (
V(
V
V e - (J
max
-2)
V()2
V
. (3.21)

Note that each element of v+ is smaller than the corresponding element of V. This
property is important, since the Newton process in the (x, s)-space forces equality
between the duality gap and eT (v+)2, see Lemma 3.2.5. So the duality gap will be
decreasing and is bounded by

Ilvll (1 - (J) :::; 11v+ II :::; Ilvll (1 - (J :::~:~:) = Ilvll (1 - (J[J2) , (3.22)

where [J := w(v). If we choose (J :::; 1/3 then the Dikin step has two interesting
properties, which are similar to the ones in Lemma 3.3.3: it preserves the ranking
of the coordinates of V, and it causes the ratio [J to increase monotonically. These
results are summarized in the next lemmas.

Lemma 3.3.6 Assume that 0 < VI :::; V2 :::; ... :::; vn and let (J :::; 1/3. Then

O<vt:::;vr:::;···:::;v;;·

Proof Let i < j. We have

Thus it follows that vt : :; vi with equality if and only if Vj = Vi. This proves the
lemma. 0

Remark. An alternative proof of Lemma 3.3.6 can be given using the function ¢(t) =
vn = 1, then vt = ¢(Vi)
t(l- (Jt 2 )/(1- (J). Assuming without loss of generality that
is the value after the Dikin step, where the maximal component of v+ is rescaled to
1. This function is monotonically increasing and concave for (J :::; 1/3. •
104 CHAPTER 3

In the sequel we shall use () :s: 1/3, hence we may assume that the coordinates of v
are ranked as in Lemma 3.3.6. SO VI is the smallest and vn the largest element of v
and w = vl/vn .

Lemma 3.3.7 Assume that () :s: 1/3 and let w+ := w(v+). Then

w+=
1-
( 1-()
()W2) _ _ (3.23)
w~w,

and
1-W+ < ( 1- - Bw ) (l-w). (3.24)
1-B

Proof Since B :s: 1/3 Lemma 3.3.6 implies that w+ vi /vt· Hence, from the
definition of vi and v;t we get

For the second inequality, note that

1 - Bw 2 1 - B - w + ()w3
1-w+ 1- w=------
(1- ()(w+w2)) (l-w)
1-() 1-()

1- B(l +w+w2)(1_w) =
1-() 1-()

< ( 1 - 1()w
_ () ) (1 - w).

This proves the lemma. o

Remark. If we use a value () > 1/3, the ranking of v may not be preserved and the
proof of Lemma 3.3.7 doesn't go through. However, it is still possible to prove the
monotonicity of w for () :s: 1/2. We will omit the proof since this property will not
be used in the analysis. _

Again it is important to analyze the influence of a target update on the proximity


measure by applying Theorem 3.2.6.
Target following for LP 105

Lemma 3.3.8 Let v+ result from a Dikin step at v with step size () ~ 1/3 usmg
(3.21). Then
_m_i_n,-'-(v-:-)..,.. < _1_ and
-.",-t 1 (}VTi
min(v+) - 1- () 8(v,v )~1-() w'

Proof Lemma 3.3.6 guarantees the same ordering for v+ as for V. SO


min(v+) = vi = vl(1- (}w2 ) 2: Vl(1- (}) = min(v)(1- (}). (3.25)
By definition and Lemma 3.3.6 we have
1
8(v;V+) = _+ Ilv- 1 (v+)2 _v2
2vl
)11. (3.26)

Since v+ < v it holds v+ + v < 2v ~ 2vn e. Using also the definition of v+ we get
Ilv- 1 (v+? - v 2) II = IIv-1(v+ + v)(v+ - v)11
~ 2(}vn Ilv- I ~ 2(}v Vn·
1
:; n

Using (3.25) we obtain


1 _ _ _1_(}VTi
8(v, v ) ~ 2(1- (})Vl 2(}v n Vn - 1 _ () w .
-.",-t

This proves the lemma. o

Assuming that 8(v;v) ~ 1/2, we have that () =


w/(6VTi) gives 8(v+;V+) ~ 1/2
applying Theorem 3.2.6. Since w increases during the course of the algorithm (see
=
Lemma 3.3.7) the default value () wo/(6VTi) guarantees that one Newton step per
target update is sufficient. The following theorem can now be proved analogous to
Theorem 3.3.5.

Theorem 3.3.9 Let (x(O), s(O) be a given initial point and let

V<0) := v'x(O)s(O) and wo:= w(v(O).


If the step size () has its default value wo/(6y'ri.) in every iteration then after at most

o (VTi In (x(O)Ts(O))
~ (

iterations the algorithm stops with a positive primal-dual pair (x*, s*) satisfying
(x*)T s* ~ L
106 CHAPTER 3

Comparing Theorem 3.3.9 with Theorem 3.3.5 we see that this target-following algo-
rithm has exactly the same complexity as the Dikin-path-following method analyzed
before. Still, there is a major conceptual difference between the two algorithms, since
one chooses its targets on one smooth path, while the other has targets on various
Dikin-paths. Moreover, when starting at the same point in the v-space, a Dikin step
as in the second algorithm moves the target closer to the central path than a step
along the Dikin-path; this can be verified by comparing the values of w+ in Lemma
3.3.3(iii) and Lemma 3.3.7.

Improved Analysis and Complexity


Unfortunately, when Wo is smaller than Q(I), the complexity bound of the target-
following algorithms considered above is highly affected. For instance, when Wo =
Q(I/Jn) we only obtain an O(n 2 Inl/c) iteration algorithm. However, the rather
straightforward analysis given above can be improved significantly to yield a bound
of
O(Jn(~ ln~ +In(xCO)fsCO)))
Wo Wo c
iterations. The fact is that in our analysis we bounded w by its initial value wo,
without considering that it is increasing in each iteration. Actually, w will reach
a value of constant order in a limited number of steps, as is clear from Lemmas
3.3.3(iii) and 3.3.7. From that point on we can use this new value to bound w from
below. The first goal is thus to bound the number of iterations to have w 'close to' l.
We will only show this procedure and analysis for the second Dikin-type algorithm.
Similar results are straightforwardly obtained for the Dikin-path-following method.

Lemma 3.3.10 Let 0 ~ 1/3. After at most

o (!ln~)
o Wo
target updates using Dikin steps with step size 0 we have w2 2:: 1/2.

Proof Using (3.23) we have for w2 ~ 1/2

1-0w2 >1-0/2=1+ 0/2.


1-0 - 1-0 1-0

r
So w2 2:: 1/2 will certainly hold if

(1 + 10~20 k
(wo)2 2:: 1/2,
Target following for LP 107

or equivalently, if
2kln(l+ 1(J~2(J) ~lnC~~2)'
Using In(1 + t) > t/2 for t < 1, this will certaillly be satisfied if

k (J/2 > In (~).


1- (J - (wo)2
Hence we find that the number of iterations required is at most
2(1- (J)
(J n
I (_1_)
2(wo)2 ,
which is of the order specified in the lemma. o

From the discussion succeeding Lemma 3.3.8 we know that (J = wo/(6..jii) is an


acceptable choice. Thus we reach a point with w2 ~ 1/2 in o
((vn/wo) In l/wo)
iterations; in that process 'iJ and hence eT 'iJ2 decreases. From then on, we can use
(J = 1/(6V2n) and we need O( vnln((xOf sO)/€) more iterations to terminate. We
have proved the following theorem.

Theorem 3.3.11 The algorithm tracing targets determined by Dikin steps requires

o (vn
at most
(~ In ~ + In (x(O)f s(O»))
Wo Wo €

iterations to obtain an i-approximate solution.

Unfortunately, this complexity bound is not better than the one obtained for weight-
ed path-following algorithms (see Ding and Li [3] or Section 3.3.1); still, the new
algorithm has the advantage of generating, in theory and in practice, increasingly
centered pairs. Let us define 'close to the central path' by requiring that the iterate
is in the region of quadratic convergence of some point on the central path. We can
relate 'closeness' to the value of w as follows.

Lemma 3.3.12 If w := w(v) ~ n/(n + 1), then there exists a target-point 'iJ on the
central path such that 6 := 6( v; 'iJ) < 1/v'2.

Proof If'iJ2 = Jle for some Jl > 0 then 6 reduces to


108 CHAPTER 3

This measure is minimal for J.I = IIvll / Ilv- 1II with value
1
yl2Vllvllllv-lll- n.
Hence we will have 6 ~ 1/../2 if

IIvllllv-11l- n ~ 1.
Using the bounds Ilvll ~ y'nmax(v) and IIv- 1II ~ y'n/min(v), this implies that it
suffices to have
1 n+ 1
- < -n - '
W -
which implies the lemma. o

The next lemma estimates the number of updates needed to reach a target with
w 2: n/(n + 1).

Lemma 3.3.13 Let (J ~ 1/3. After at most

o ((J~o In(n+ 1»)


iterations we have w 2: n/(n + 1).

Proof From equation (3.24) we need k to satisfy

~k)
(1- W' ) ~
(1- 1(Jwo_ (J )k (1- wo) ~ n +1 1·
Taking logarithms and using In(l- t) ~ -t for t < 1 we obtain that k should satisfy
1-(J
k 2: (J- In((n
Wo
+ 1)(1 - wo»,

which gives the order in the lemma. o

Other Scaling Factors


Instead of Dikin steps, we can let the steps be determined by v-order scaling in the
following sense
..+ __
( v
v - v e - (J v~v
-2V) . (3.27)
Target following for LP 109

In this setting the Dikin step has v =


1 and weighted path-following has v O. =
Again it is easy to analyze the resulting algorithms, which can be viewed as the
family of target-following algorithms simulating the family of primal-dual affine
scaling algorithms introduced in Jansen et a1. [16]. We assume that v = 0(1), since
otherwise the computations may require exponentially large or small numbers, and
the step size might become exponentially small.

First observe that

It is left to the reader to verify the following lemmas, which can be proved similarly
as in the case v = 1.

Lemma 3.3.14 If (J ~ 1/(2v + 1) then v+ has the same ranking as v; moreover,


w+ ?: w
with equality only ifw 1. =

Lemma 3.3.15 Let v+ result from v by a target update using (3.27) with step size
(J ~ 1/(2v + 1). Then

_m_in,..:.(v-f):"" < _1_ 1 (J..fii


and 6(v'V+)
, < ----
-1-(J w.
min(v+) - 1 - (J

We find that the algorithm using v-order scaling for the target update requires

iterations to obtain an (-approximate solution. In a similar way as in Lemma 3.3.10


and Theorem 3.3.11 we can improve the convergence analysis and improve the com-
plexity bound to

o ( vr.::
n
( 1
=- Wo
In=-
1
+ In
(x(O)?
.
s(O»))
Wo (

3.3.4 Cone-affine Scaling


Recently, Sturm and Zhang [31] proposed a new search-direction, which they used
in a so-called cone-affine scaling method. It appears that their direction is a linear
combination of the primal-dual affine scaling direction and a new centering direction.
110 CHAPTER 3

Here we will analyze a method following a target-sequence constructed with cone-


affine scaling steps. The target update is as follows. Let v := V<k-I), denote v+ :=
v(k), and define
v+ := y'Omin(v)v, (3.28)
for some 0 < 1. The new duality gap satisfies

eT (v+)2 = Omin(v)eTv:::; OeTTi,


hence the algorithm requires at most 0(1/(1-0) In( eT (V<0»)2 If)) iterations to obtain
an f-accurate solution. As in the Dikin-path-following algorithms the ordering of
the elements of the targets remains the same:

For the ratio w(v+) we derive

_+) -...r
_ vI
-+ .~
~(_) > w (_)
w (V -_ VVI _
Iff: - Vw~V) _ V • (3.29)
Vn VVn

Lemma 3.3.16 Lei v be given and let w = min(v)/max(v); using the target update
(3.28) we have

min (v)
min(v+) = ../9
1
and 6(v·, v+) < _1_
- 2../9
(..!.w - 0) 'no
V"

Proof. The first statement is trivial. The second follows from

6(v;V+) = 1 II0min(v)v - v211


2../9min(v) v
1
../9 1I0min(v)e - vII
2 Omin(v)

< _1 (max(v) _
2../9 min (v)
0) Fn
_1 (..!.-O) 'no
= 2../9 w V"

o
Target following for LP 111

Applying Theorem 3.2.6 with the bounds in Lemma 3.3.16 we can compute the
maximal value of 0 such that 8(v+;V+):::; 1/2 will hold, given 8(v;v):::; 1/2. Unfor-
tunately, it appears that this cannot be done without requiring a condition on w. If
we require
1 1
-<1+--
w- 5y'n'
and choose 0 = 1 - 1/(5y'n), then it holds 8(v+;v+) :::; 1/2 and the algorithm
has an O(y'nln l/c) iteration complexity. Observe that even in a target-following
framework the algorithm is required to stay in a small neighborhood of the central
path.

3.3.5 Freund's Shifted Barrier Method


Freund [5] analyzes a so-called shifted barrier method for the primal LP problem.
We will outline his method and then analyze a primal-dual variant.

Let xeO) be given such that Axe O) = b and define hand /-lo such that xeO) + /-loh > 0.
As Freund we make the following assumption.

Assumption 3.3.17 The shift h is chosen such that for all dual feasible slacks s
the condition Ilhsll :::; y'n holds.

Note that the assumption can be satisfied ifthe dual feasible region is bounded. Fre-
und shows, that when an approximation s to the analytic center of the (bounded)
dual feasible region is known then the algorithm can be started with this approxi-
mation, the shift h = ~s-l and a suitable value for /-lo. The system to be (approxi-
mately) solved in an iteration is given by
Ax b, x + /-lh ?: 0,
ATy+s c, s?: 0,
(x+/-lh)s /-le.
While Freund's algorithm does not necessarily generate feasible dual iterates in each
iteration, our primal--dual variant does. The main task it to estimate the effect of
updating the target foe, in which we use the distance measure

u« x + J.l h, s,· V r.;)


/-le 1_11 (x + /-lh)s -
-__ /-le II .
2fo J(x + /-lh)s
We define the following notion.
112 CHAPTER 3

Definition 3.3.18 The vectors x and s are called (p, f3)-approximate solutions if

Ax b, x + ph 2: 0,
AT Y + s = c, s 2: 0,
and 8(x + ph, s; foe) ~ f3 for some constant f3 < 1.

We have the following lemma.

Lemma 3.3.19 Let x and s be (p, 1/4)-approximate solutions and let p+ (1-0)p =
for 0 =
1/(16fo). Then we can compute (p+, 1/4)-approximate solutions x+ and
s+ with one Newton step.

Proof. Using Lemma 3.2.2 it holds for all i


1
-<
P-

where p := 1/4 + Jl + {1/4)2 ~ J5/3. Consequently,

( X' Hh·)s· > .!!.... > ~H.


+r
• ' • - p2 - 5 r

Then,
3
(Xi + p+ h;)Si (Xi + phi)Si + (p+ - p)hiSi 2: "5 P - Ophisi

> "53 P - opfo = (3"5 - 16


1) 43
P = 80 P > 0,

so we can use the pair (x + p+h,s) as starting point for Newton's method toward
the new target Ue.
We first establish that this pair is still close to the current
target foe:

8(x + p+ h, s; foe) = _1_11


2fo
(x + p+h)s - pc II
J(x + p+h)s

_1_11 (x + ph)s - pe+ (p+ - p)hs II


2fo J(x + p+h)s

< 8(x + ph, s; ViLe) I (~x::+~;s IL + 2~ I J(x ::+h)S I


Target following for LP 113

1 p,fii (}p yTi


<
"4 J43p/80 + -2,fii-p J43p/80
< ~ (~v1 + ~ 116) < ~.
Let (x+ + p+h, s+) result from a Newton step w.r.t. the new target pe. Then
Theorem 3.2.6 implies

6(x+ + I/+h s+' r;;+e) < 01 (}yTi + _1___1_.


r , , Y f-t' - 2 2v'l-B 201 v'l-B

Since, 1 - () ~ 1- 1/(160), we have 1/v'l-B ::; 44/43, hence


+ + +. 1:""+ < 01 ~ ~ _1_ 44 ~
6(x +p h,s ,YW e ) - 2 162.43 + 201 4 3 < 4'

So, the pair (x+, s+) is a (p+, 1/4)-approximate solution. o

We will let the algorithm run until (x+ph)T s ::; f; from the condition of approximate
solutions it then follows that np ::; 2('. Hence after O( yTiln(l/ f)) iterations the
algorithm has generated p' and a pair (x', SO) such that

and
2f
x' = x' + p*h - p'h ~ -p'h ~ --llhll
n
oo .

Hence the pair (x', s') is an approximately feasible and approximately optimal so-
lution if f is chosen sufficiently small.

3.3.6 Efficient Centering


The next application of the target-following concept is the problem of efficient cen-
tering as considered by Den Hertog [13) and Mizuno [24). The problem is stated as
follows: given an arbitrary interior-feasible point (x, s) compute a point close to the
central path. In this section we give a simple analysis of an algorithm, independently
proposed by Den Hertog and Mizuno. The idea of the algorithm is to successively
increase the smaller elements of the target-vector until they all become equal to the
largest element. More specifically, let (V(0))2 = x(O) s(O) be given; update v to obtain
v+ as follows:
vt = max(v;, v'I"'+B min (v)) , i = 1, ... , n; (3.30)
114 CHAPTER 3

if min(v+) > max(v), then we set v+ = max(v) e which is on the central path.
The goal of the algorithm is to obtain a vector which is a multiple of the all~one
vector. Since
( max(v+)) 2 1 (max(v)) 2
min(V+) ~ 1 + fJ min (v) ,
or equivalently (w+)2 ~ (1 + fJ)w 2, it follows that reaching this goal will require at
most
o (~ln
fJ
~)
Wo
iterations. The appropriate value of fJ is determined from the following lemma.

Lemma 3.3.20 Let v be given; using the target update (3.30) we have

min (v) < 1 and


min (v+) -

Proof. If we are not at the last iteration then from (3.30) it follows that for any i

vt ~ ~min(v) ~ min(v);
when v+ = max(v)e at the last iteration we have vt ~ min(v), hence the first bound.
Let J be the set of indices for which Vi is increased. Then we have vt = Vi for i ~ J
and
o ~ (vtf- vi ~ f) min (v)2 for i E J.
Consequently,

b(v; v+) = .1 II (v+? - v211 ~ .1 II f) min(v?eJ II ~ ~f)fo.


2 mlll( v+) v 2 mlll( v) v 2
where eJ is the O~ 1 characteristic vector of indices in J. o

Combining this result with Theorem 3.2.6 gives that we can take f) = 1/(3fo) to
have b(v+; V+) < 1/2. So we obtain that the algorithm needs at most O( foln 1/wo)
iterations.

If we combine the above centering scheme with the standard primal~dual path~
following algorithm we obtain an algorithm for the LP problem needing at most

(3.31)
Target following for LP 115

iterations, starting from any interior feasible point. This is done by first centering,
and then working to optimality. Note that in the centering phase the duality gap in
subsequent target points increases, but is bounded by n max(v(O)?

It is interesting to consider the seemingly equivalent scheme of moving the larger


components of v downwards. One can check that the analysis does not yield as good
a bound as before. Due to the asymmetry of the proximity measure, there is a factor
w that appears in the bound on o(v; V+). It is also clear that if we combine the
efficient centering scheme with a standard path-following algorithm, we can reach
the target (min(vCOl))e with complexity proportional to fo with no w factor. So the
observed asymmetry is not intrinsic to the problem.

3.3.7 Computing Weighted Centers


In this application we discuss some algorithms to find an approximate solution to
the KKT -system
b, x 2: 0,
c, s 2: 0, (3.32)
w2,

where W E R++ is a prespecified weight-vector. Approximate means that we will


compute a feasible pair (x, s), such that

o(v; w) :::; 1/2,

where v = y'xS as usual. We make the assumption that a (specific) point on or close
to the central path is available. Note that we might use the centering algorithm of
the previous subsection to find such a point. This problem has interesting special
cases that are considered by Atkinson and Vaidya [1], Freund [6] and Goffin and
Vial [7], namely to obtain the weighted analytic center of a polytope. If b = 0 and
(x, y, s) is a solution to system (3.32) then y is the weighted analytic center of the
dual space, if it is bounded; when c = 0 and (x, y, s) satisfies the given system then
x is the weighted analytic center of the primal space, if it is bounded.

We will first analyze an algorithm proposed by Mizuno [25], which is somehow the
dual of the algorithm for finding a center as discussed in the previous subsection.
Then we give a simplified analysis of the algorithm proposed by Atkinson and Vaidya
[1] for computing weighted analytic centers. We extend their algorithm to the case
of computing weighted primal and dual centers, i.e., for finding a solution to the
system (3.32).
116 CHAPTER 3

Mizuno '8 Algorithm


Assume that we start close to the center I-'e, with I-' = max(w 2). The aim is to get
close to the weighted center w. The first target point is set to v = max(w)e. We
then gradually decrease the elements of the vector v until they all reach the correct
value. This will be performed updating the target as follows:
vt = max(wi, vr=oVi). (3.33)
Each component Vi is decreased until it reaches its final value Wi.

Lemma 3.3.21 Let v+ be obtained from v with an update of the target using {3.33}.
Then
min (v) 1 1
--+) < "...--n
min(v - v 1- 0 and 6(v; v+) ~ 2vT-;::-/'..;ri·

Proof The first bound is trivial. The components of v that are decreased by a factor
vr=o have not yet achieved their final value Wi. Since they all start with the same
value, they have all been reduced by the same cumulated factor and thus
vt = vr=oVi ~ Vi = min(v).
So we have for all i that l(vt)2 -v~1 ~ Omin(v)2. Hence

6(v;V+) 1 II (1)+)2 - v211


2 min(V+) v

< 1 IIOmin (v)2 e II


2v'f=B min (v) v
1
< v'f=B0..;ri.
2 1- 0
o

Using Theorem 3.2.6 gives us 6( v+; v+) < 1/2 for 0 = 1/(3y'n). The number of
iterations to be performed is determined by the condition

which means that


k > ~ln (max(w)2).
- 0 min(w)2
Consequently the number of Newton steps to compute the weighted center is at most
O(y'nln l/w(w)).
Target following for LP 117

Atkinson and Vaidya's Algorithm (Dual)


Atkinson and Vaidya [1] propose an algorithm for the special case that b O. This =
corresponds to computing the weighted analytic center of the dual feasible region.
The algorithm is completely different from the one in the previous paragraph. Here
we will give a simple analysis for the algorithm by using two nested traceable target-
sequences. Moreover, we extend the algorithm to the general case (i.e., solving (3.32))
and show that this algorithm has a complexity bound which is worse than the one
for Mizuno's algorithm.

=
So first we consider the case b O. Assuming that w 2 2:: e and w 2 integral, Atkinson
and Vaidya suggest to start with a target vector v<0) =
e, and to successively increase
the weights by the use of a scaling technique it la Edmonds and Karp [4]. The basic
idea is to recursively solve the given problem with all weights W[ replaced by the
maximum of 1 and Lw[!2J. Let p = Llog2max(w2)J. Then W[ can be written in
binary notation as
=
W[ f3i o f3h .. . f3i p '
where f3ij E {O, 1} for all i, j. Elements of the weight-vector w 2 which do not need p
digits for their binary description start by convention with a string of zeroes. Now,
at iteration k the target is defined by

- R. R.
( ;;;(v k))2 R.
i - P l o P S ! .. ojJ,1c'

where we set v~k) =1 if f3i o f3i • ... f3i, =


O. Note that an update of the target to get
V<k)from V<k-l) amounts to doubling the target (i.e., adding a zero to the binary
description) and possibly adding 1 (if f3i, = 1) or subtracting 1 (if f3i o f3i • ... f3i, = 0).
This is the outer target-sequence in the algorithm.

From now on, we denote for ease of notation v:= V<k-l) and v+ := V<k). Then the
technique boils down to a scheme that updates Vi in the following way:

if i E h = {i
if i E 12 = {i (3.34)
if i E 13 = {i
Observe that
i E h ==> vt = Vi = 1. (3.35)
The number of updates in the outer target-sequence is determined by the condition

2k 2:: max( w)2 ,


which implies that there will be Llog2 max(w 2)J + 1 updates.
118 CHAPTER 3

We next need to compute the complexity of one outer update. This will be done
by defining an inner target-sequence that leads from v to V+. In [1) a pure dual
algorithm is used which means that doubling all weights does not change the position
of the dual weighted center. Hence, the only Newton steps needed are to get from
2v2 to (v+?, which are quite close to each other. Let (x, s) and v be given such that
6( v; v) < 1/2, where v := .,fXs. Since b = 0, by setting

x+ = 2x, s+ = s, v+ = v'x+s+, (3.36)


we have a feasible pair (x+, s+) for which
1
6(v+; V2v) = 6(v;v) < -.
2
So in the analysis we just have to consider the target-sequence that leads from V2v
to V+. For this purpose we use the following scheme. Let j = 0 and (v(0»)2 = 2v2.
Define
if i E h,
t'J i ={ ~
0
a
ifi E 12 , (3.37)
-ii,Tn if i E h,
where a > 0 is a certain constant. Update v(j) for j ~ 1 in the following way:
-U»)2 -_ (1 _ v,
(Vi .• ,)(-U-l))2
Vi . (3.38)
Of course, we do not overshoot the target value. The conditions

( 1 - _ a r.::)i (2v;)
Viyn
~ 2v; - 1 ifi E h,

(1 + VifoY (2v;) ~ 2v; + 1


determine the number of updates to be performed. For i E h, it suffices to have

.
J >
ViVn
-In( -2v;
--) .
- -a 2-
Vi- 1
2

Since from (3.35) we have Vi = 1 it follows that at most


Vn In2
a
iterations are needed for i E h. For i E 13 we have the following: j satisfying
ja 1
1+_Viyn
r.::~I+--=2
2vi
Target following for LP 119

suffices. This leads to the condition that j ~ tJ.


suffices; using the fact that Vi ~ 1,
this proves that the number of updates to be performed is not larger than y'n/2Ot.
We need to show now that the specific choice of the update guarantees that one
Newton step suffices per update.

Lemma 3.3.22 Define {} := Ot/y'n. Let v(i) be obtained from v(j-l) with an update
of the target using {3.37} and {3.38}. Then

min(v(j-l») 1
b(v(j-l). v(j)) < 30t .
--~~~< ----- and
,
min(v(j») -.;r=B -2~

Proof For ease of notation, let v = v(j-l) and v+ = v(j). Then

and hence we have

1 ( (V~(I_!?i)_V~)2)1/2
<
2.;r=Bmin(v) iEf;I. · Vi '

= 1 ( E (Ot Vi) 2) 1/2


2.;r=Bmin(v) iEI,uI. y'nVi

{} ( (J2V; + 1) 2) 1/2
<
2.;r=B min( V) iEf;I 3 Vi
{} 30t
<
2 .;r=B
1-{}min(v) 3vn = 2v'l=O
1-{}min(v) .
Since min( v) ~ 1 the lemma follows. o

Using Theorem 3.2.6 it follows from the lemma that for Ot =


1/7 we can get close
to v(j) from a point close to v(j-l) in one full Newton step. So the entire algorithm
performs at most
O( vn log2 max( w)) (3.39)
Newton steps, and for this pure dual algorithm we get the same complexity as in [1]
using a much simpler analysis.
120 CHAPTER 3

Atkinson and Vaidya)s Algorithm (Primal-dual)


We will now analyze the same strategy for the problem of finding the primal-dual
weighted centers, i.e., the solution of system (3.32). The outer iteration is the same
as before, i.e., doubling the target and subtracting or adding one if necessary, see
(3.34). The number of Newton steps needed to get close to a new target is more than
one now, since the update of v is big: the trick in (3.36) cannot be used anymore.
Again, to compute an iterate in the quadratic convergence region of v+ another
sequence of targets is constructed by which we reach v+ from v. The following
scheme is used. Let i/O) = v and define

if i E h,
{)i = { ~ Q
if i E 12 U Is,
v.Tn
where a > 0 is a certain constant. Update vU ) for j ~ 1 in the following way:

(3.40)

Note that the proof of Lemma 3.3.22 is easily adapted for this sequence and that its
result remains the same. Using the condition

or

if i E h,

ifi E h

and using the fact that Vi ~ 1, this implies that the number of updates must be of
the order O(max(v)y'n), so an upper bound expressed in the data is

O(max( w )v'n).

The reader should notice here that we have shown that the algorithm has a total
complexity of
O(max(w)v'nlog2 max(w))
Newton steps. This is a factor max(w) worse than the result (3.39) above and in [1].
This difference can be explained by noticing that doubling all weights does not have
any effect in a pure primal or dual method, but has quite an effect in a primal-dual
method.
Target following for LP 121

3.4 CONCLUDING REMARKS


The approach given in this chapter has proven to be effective in analyzing many
primal-dual algorithms for linear programming. In Jansen [15] the methodology is
transferred to the analysis of (self-concordant) nonlinear programming problems.

Acknowledgements
The first author is supported by the Dutch Organization for Scientific Research
(NWO), grant 611-304-028. Currently he is working at Centre for Quantitative
Methods (CQM) B.V., Eindhoven, The Netherlands.

REFERENCES
[1] D.S. Atkinson and P.M. Vaidya. A scaling technique for finding the weighted
analytic center of a polytope. Mathematical Programming, 57:163-192, 1992.

[2] D.A. Bayer and J .C. Lagarias. The nonlinear geometry of linear programming,
Part I : Affine and projective scaling trajectories. Transactions of the American
Mathematical Society, 314:499-526, 1989.

[3] J. Ding and T.Y. Li. An algorithm based on weighted logarithmic barrier func-
tions for linear complementarity problems. Arabian Journal for Science and
Engineering, 15:679-685, 1990.

[4] J. Edmonds and R.M. Karp. Theoretical improvements in algorithmic efficiency


for network flow problems. Journal of the ACM, 19:248-264,1972.

[5] R.M. Freund. Theoretical efficiency of a shifted barrier function algorithm for
linear programming. Linear Algebra and Its Applications, 152:19-41,1991.

[6] R.M. Freund. Projective transformations for interior-point algorithms, and a


superlinearly convergent algorithm for the w-center problem. Mathematical
Programming, 58:385-414, 1993.

[7] J.-L. Goffin and J.- Ph. Vial. On the computation of weighted analytic centers
and dual ellipsoids with the projective algorithm. Mathematical Programming,
60:81-92,1993.
122 CHAPTER 3

[8] C.C. Gonzaga. Path following methods for linear programming. SIAM Review,
34:167-227, 1992.
[9] C.C. Gonzaga. The largest step path following algorithm for monotone lin-
ear complementarity problems. Technical Report 94-07, Faculty of Technical
Mathematics and Computer Science, Delft University of Technology, Delft, The
Netherlands, 1994.
[10] C.C. Gonzaga and R.A. Tapia. On the quadratic convergence of the simplified
Mizuno-Todd-Ye algorithm for linear programming. Technical Report 92-41,
Dept. of Mathematical Sciences, Rice University, Houston, TX, USA, 1992.
[11] O. Giiler, C. Roos, T. Terlaky, and J .-Ph. Vial. Interior point approach to
the theory of linear programming. Cahiers de Recherche 1992.3, Faculte des
Sciences Economique et Sociales, Universite de Geneve, Geneve, Switzerland,
1992. (To appear in Management Science).
[12] O. Giiler and Y. Yeo Convergence behavior of interior-point algorithms. Math-
ematical Programming, 60:215-228, 1993.
[13] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex
Programming, Algorithms and Complexity. Kluwer Publishers, Dordrecht, The
Netherlands, 1994.
[14] D. den Hertog, C. Roos, and T. Terlaky. A polynomial method of weighted cen-
ters for convex quadratic programming. Journal of Information f'3 Optimization
Sciences, 12:187-205, 1991.
[15] B. Jansen. Interior point techniques in optimization; complementarity, sensitiv-
ity and algorithms. PhD thesis, Faculty of Technical Mathematics and Computer
Science, Delft University of Technology, Delft, The Netherlands, 1995.
[16] B. Jansen, C. Roos, and T. Terlaky. A family of polynomial affine scaling al-
gorithms for positive semi-definite linear complementarity problems. Technical
Report 93-112, Faculty of Technical Mathematics and Computer Science, Delft
University of Technology, Delft, The Netherlands, 1993.
[17] B. Jansen, C. Roos, and T. Terlaky. A polynomial primal-dual Dikin-type
algorithm for linear programming. Technical Report 93-36, Faculty of Technical
Mathematics and Computer Science, Delft University of Technology, Delft, The
Netherlands, 1993. (To appear in Mathematics of Operations Research).
[18] B. Jansen, C. Roos, T. Terlaky, and J .-Ph. Vial. Primal-dual target-following
algorithms for linear programming. Technical Report 93-107, Faculty of Techni-
cal Mathematics and Computer Science, Delft University of Technology, Delft,
The Netherlands, 1993. (To appear in Annals of Operations Research).
Target following for LP 123

[19] B. Jansen, C. Roos, T. Terlaky, and J .-Ph. Vial. Primal-dual algorithms for
linear programming based on the logarithmic barrier method. Journal of Opti-
mization Theory and Applications, 83:1-26, 1994.

[20] M. Kojima, N. Megiddo, T. Noma, and A. Yoshise. A unified approach to


interior point algorithms for linear complementarity problems, volume 538 of
Lecture Notes in Computer Science. Springer Verlag, Berlin, Germany, 1991.

[21] M. Kojima, S. Mizuno, and A. Yoshise. A primal-dual interior point algorithm


for linear programming. In N. Megiddo, editor, Progress in Mathematical Pro-
gramming : Interior Point and Related Methods, pages 29-47. Springer Verlag,
New York, 1989.
[22] L. McLinden. The analogue of Moreau's proximation theorem, with applications
to the nonlinear complementarity problem. Pacific Journal of Mathematics,
88:101-161,1980.
[23] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo,
editor, Progress in Mathematical Programming: Interior Point and Related
Methods, pages 131-158. Springer Verlag, New York, 1989.

[24] S. Mizuno. An O(n 3 L) algorithm using a sequence for linear complementarity


problems. Journal of the Operations Research Society of Japan, 33:66-75, 1990.
[25] S. Mizuno. A new polynomial time method for a linear complementarity prob-
lem. Mathematical Programming, 56:31-43, 1992.
[26] S. Mizuno, M.J. Todd, and Y. Yeo On adaptive step primal-dual interior-
point algorithms for linear programming. Mathematics of Operations Research,
18:964-981, 1993.

[27] R.D.C. Monteiro and I. Adler. Interior path following primal-dual algorithms:
Part I : Linear programming. Mathematical Programming, 44:27-41, 1989.

[28] Y. Nesterov and M.J. Todd. Self-scaled barriers and interior-point methods
for convex programming. Technical Report 1091, School of OR and IE, Cor-
nell University, Ithaca, New York, USA, 1994. (To appear in Mathematics of
Operations Research).
[29] C. Roos and D. den Hertog. A polynomial method of weighted centers for linear
programming. Technical Report 89-13, Faculty of Technical Mathematics and
Computer Science, Delft University of Technology, Delft, The Netherlands, 1989.
[30] C. Roos and J .-Ph. Vial. A polynomial method of approximate centers for linear
programming. Mathematical Programming, 54:295-305, 1992.
124 CHAPTER 3

[31] J.F. Sturm and S. Zhang. An O(y'nL) iteration bound primal-dual cone affine
scaling algorithm. Technical Report TI 93-219, Tinbergen Institute, Erasmus
University Rotterdam, 1993.
4
POTENTIAL REDUCTION
ALGORITHMS
Kurt M. Anstreicher
Department of Management Sciences
University of Iowa
Iowa City, IA 52242, USA

4.1 INTRODUCTION
Potential reduction algorithms have a distinguished role in the area of interior point
methods for mathematical programming. Karmarkar's [44] algorithm for linear pro-
gramming, whose announcement in 1984 initiated a torrent of research into interior
point methods, used three key ingredients: a non-standard linear programming for-
mulation, projective transformations, and a potential function with which to measure
the progress of the algorithm. It was quickly shown that the non-standard formula-
tion could be avoided, and evennally algorithms were developed that eliminated the
projective transformations, but retained the use of a potential function. It is then fair
to say that the only really essential element of Karmarkar's analysis was the potential
function. Further modifications to Karmarkar's original potential function gave rise
to potential reduction algorithms having the state-of-the-art theoretical complexity
of O(..,fiiL) iterations, to solve a standard form linear program with n variables, and
integer data with total bit size L. In the classical optimization literature, potential
reduction algorithms are most closely related to Huard's [39] "method of centres;"
see also Fiacco and McCormick [21, Section 7.2]. However, Karmarkar's use of a
potential function to facilitate a complexity, as opposed to convergence analysis, was
completely novel.

The purpose of this article is to give a comprehensive survey of potential reduc-


tion algorithms for linear programming. (In the final section we will also briefly
describe the extension of potential reduction algorithms to more general problems.)
The major algorithms that are discussed, and analyzed, are Karmarkar's algorithm,
the affine potential reduction method, and the primal-dual potential reduction al-
gorithm. The different algorithms are all described using simple, consistent notation
in order to facilitate a comparison between them. Before discussing any algorithms

125
T. Ter/aky (ed.l.lnterior Point Methods ofMathemmical PrOflramming 125-158.
CD 19'J6KlPuAaltlemicPlIlIlIIhen.
126 CHAPTER 4

we provide (in the next section) the basic complexity arguments based on the pri-
mal, and primal-dual potential functions. In the last section we describe various
modifications and extensions of the algorithms.

Todd [78] has already written an excellent survey of potential reduction algorithms.
Compared to [78], this is a more introductory article that covers less material. For
the reader interested in a more technical discussion of the topics covered here, with a
greater emphasis on research issues and new extensions, we highly recommend [78].
For a discussion of path-following methods, the other major class of polynomial-
time interior point algorithms, we highly recommend the survey paper of Gonzaga
[37].

Notation. We use very standard notation throughout. Subscripts denote compo-


nents of a vector, and superscripts denote iteration numbers. For a vector x E Rn,
we use X to denote the n x n diagonal matrix having Xi; Xi, i = =
1, ... , nj similar
notation is used for sand S, v and V, and so on. We use e to denote a vector of
variable dimension, with each component equal to one. We use II ·11 to denote the
=
two-norm, IIxll Ilx112, and II ·1100 to denote the infinity-norm.

4.2 POTENTIAL FUNCTIONS FOR LINEAR


PROGRAMMING
Consider a standard form linear program, and its dual:

LP: min cTx LD: max bTy


Ax b ATy+s = c
x > 0 s > 0,

where A is an m x n matrix. We assume without loss of generality that the rows of


A are linearly independent. We also assume that the set of optimal solutions for LP
is nonempty and bounded, and let z* denote the optimal objective value in LP and
LD. The primal potential function for LP is then

L
n

f(x, z) = q In(cT x - z) - In(x;),


;=1

where x > 0 is a point in the (relative) interior of LP, z :5 z* is a lower bound on


the optimal objective value, and q ~ n. Given an initial interior point xO, and lower
bound zO, a potential reduction method based on IL .) obtains a sequence (xl:, zl:),
Potential Reduction Algorithms 127

k ~ 0 of interior points and lower bounds such that f(x k , Zk) -+ -00. The usual
approach to analyzing such an algorithm is to show that on each iteration k it is
possible to reduce f(·,·) by some uniform, positive amount o. Note that for any
x> 0,
tln(X i ) ~ nln (e:x) ,
by the arithmetic-geometric mean inequality. If we assume that a decrease of at
least fJ occurs on each iteration, then after k iterations we immediately obtain

( T x k -z k)
I nc < f(xO,zO) kfJ n I n
--+- (eTxk)
-- . (4.1)
- q q q n

Clearly then if the solution sequence {xk} is bounded, the "gap" cT xk - zk will be
driven to zero. We will next translate this observation into a precise complexity
result for LP.

The usual complexity model for LP (see for example [65]) assumes that the data
in LP is integral, and characterizes the performance of an algorithm in terms of
the dimensions m and n, and the number of bits L required to encode the problem
instance in binary. (The quantity L is commonly refered to as the size of LP.)
A complete complexity analysis should bound the number of digits required in all
computations carried out by the algorithm, but we ignore this issue here and consider
only the number of arithmetic operations performed, and not the sizes of the numbers
involved. We will use the well-known fact (see [65]) that if cT x - z ~ 2- 2£ for a
feasible solution x and lower bound z, then x can be "rounded" to an exact optimal
solution of LP in O( m 2 n) operations. It is also well known that if LP has an optimal
solution value z·, then _2°(£) :S z· ~ 2°(£).

To start, we assume that we are given an initial interior solution xO, and lower bound
zO, such that f(xO, zO) ~ O(qL). Later we will discuss the "initialization" problem
of finding such a pair (xO, zO).

Theorem 4.2.1 Assume that the set of optimal solutions of LP is nonempty and
bounded. Suppose that f(xO,zO) ~ O(qL), and f(-,·) is reduced by 0 on each itera-
tion. Then after k = O(qLlfJ) iterations, cT xk _ zk ~ 2- 2 £.

Proof We will show that In(e T xk In) = O(L) for all k ~ O(qLlo), and therefore the
theorem immediately follows from (4.1). For each iteration k define scalars
128 CHAPTER 4

and let e = nxkjeTx k = Atxk, so that eTe = n. Exponentiating (4.1) then results

e ~ A~
m
cT zk + A~ .
It follows that for every k 2:: 0, (e, At, A~) is a feasible solution for the linear pro-
gramming problem:
mm Al + A2
A~ - Alb 0
eT~ n (4.2)
cT ~ - Al zmax - A2 < 0
~ 2:: 0, Al > 0,A2 > 0,
where Zmax = 20(L) is an upper bound for Z·. Since the set of optimal solutions of
LP is nonempty and bounded, the optimal objective value in (4.2) is strictly positive.
Moreover the size of (4.2) is O(L), and therefore the optimal objective value is at
least 2- 0 (L) (see [65]). However, after k = O(qLjfJ) iterations we must have either
eTx k ~ n, or A~ < 2- 0 (L). It follows that for all k 2:: O(qLjfJ), A~ 2:: 2- 0 (L), and
therefore In(e T xk jn) ~ O(L), as claimed. _

To provide a complete complexity result for LP we still need to deal with the is-
sue of satisfying the assumptions of Theorem 4.2.1. This is quite simple, at least
from a theoretical standpoint. For an arbitrary problem LP, with no assumptions
whatsoever, consider the augmented problem:

MLP: mm cT x
Ax b
eT X < M
x > 0,
where x E Rn+1, and
A = (A,b- Ae), c -_ ( Mc ) .

It is then very well known (see for example [65]) that MLP is equivalent to LP for
M = 20 (L), in that x· with eT x· < M is an optimal solution for LP if and only
Xi = xi, i = 1, ... , n, xn +! = 0 is an optimal solution for MLP. (If the optimal
solution to MLP has Xn +1 > 0 then LP is infeasible. If the optimal solution to MLP
has eT X = M then either LP is unbounded, or LP has an unbounded set of optimal
solutions, and these cases can be distinguished by doubling M and solving MLP
again.) The primal potential function can then be defined for MLP instead of LP,
and it is easy to verify that for zO = _2 0 (L), xO = e, the assumptions of Theorem
4.2.1 are satisfied.
Potential Reduction Algorithms 129

In addition to potential reduction methods based on f("'), we will consider algo-


rithms that utilize the primal-dual potential function for LP:
n n
F(x, s) = q In(xT s) - L In(xi) - L In(si)'
;=1 ;=1

where q > n, x > 0 is feasible for LP, and s > 0 is feasible for LD. (By the latter we
mean that there is ayE R m so that ATy + s = c.) Note that for any such x and s,
n

F(x, s) = (q _ n)ln(xTs) _ LIn (X~;)


;=1 x s
> (q - n) In(xT s) + n In(n), (4.3)
by the arithmetic-geometric mean inequality, where we are using the fact that
eT(XSe/x T s) = 1. A potential reduction algorithm based on F(·,·) generates a
sequence of primal and dual solutions (xk,sk) so that F(xk,sk) ---> -00. We will
now give a complexity result for such an algorithm under the assumption that F(.,.)
is reduced by some uniform amount {} on each iteration k. The proof of this result
is extremely simple, due to the form of F(·, .).

Theorem 4.2.2 Suppose that F(xO, sO) ~ O«q - n)L), and F(.,.) is reduced by {}
=
on each iteration. Then after k O«q - n)L/{}) iterations, (xk)T sk ~ 2- 2L •

Proof. Using (4.3) we obtain

I n « x k)T s k) F(xk, sk) F(xO, sO) - k{}


~ ~ ,
q-n q-n
from which the theorem easily follows. •
Note that the existence of an interior point for the dual problem LD implies that
the set of optimal solutions for LP is bounded, so the boundedness assumption that
was explicit in Theorem 4.2.1 is implicit in Theorem 4.2.2. To provide a complete
complexity result for LP based on the reduction of F(.,.) we must deal with the
initialization problem of finding (xO, sO) that satisfy the assumptions of Theorem
4.2.2. This can be done using an augmented problem that is very similar to the
problem MLP described above, but the analysis is somewhat more complex than for
the primal case, and is omitted here. We refer the interested reader to [12, Section
5] for the details of such an initialization.

Remarks. The primal potential function was introduced by Karmarkar [44]. The
exponentiated, or "multiplicative" form of the potential function was used by Iri
130 CHAPTER 4

and Imai [41], and was further studied by Imai (40). The use of general values for q
was suggested by Gonzaga (33). The primal-dual potential function was introduced
by Todd and Ye [SO), and (in multiplicative form) Tanabe (70). See Ye, Todd, and
Mizuno (91) and Jansen, Roos and Terlaky (42) for alternative "homogeneous self-
dual" approaches to the initialization problem.

4.3 KARMARKAR'S ALGORITHM


In this section we describe Karmarkar's projective algorithm for LP. The original
algorithm, as presented in [44], was based on a linear program in a non-standard
special form. The "standard form" version we describe here was independently
devised in (2), [20], [2S], [32), (67), and [89). Let xk, k ~ 0, be a feasible interior point
for LP, and zk :::; z' a valid lower bound. Our goal is to generate a new interior
point xk+l, and lower bound zk+l, so that the primal potential function f(·, .), with
q = n+l, is decreased by an amount fJ = 0(1). From Theorem 4.2.1, such a decrease
immediately provides an O( nL) iteration algorithm for LP.

Consider a new linear programming problem, with variables x E Rn+l:

HLP: mm i:T x
Ax 0
dTx
X > 0,

where
A = (AXk, -b), c= (X;c) , d= U).
One can think of obtaining HLP from LP by applying a transformation of variables:

and then using the additional variable X n +l to "homogenize" the original equality
constraints of LP. Clearly HLP is equivalent to LP, and x = e is feasible in LP.
The derivation of a step in LP is based on the transformed problem LP. First we
consider the issue of updating the lower bound. For any matrix B, let PB denote the
Potential Reduction Algorithms 131

orthogonal projection onto the nullspace of B. In the case that B has independent
rows, we then have PE = 1- BT(BBT)-l B.

Lemma 4.3.1 (Todd and Burrell [79]) Suppose that z E R satisfies PA(c-zd) 2: o.
Then z :::; z*.

Proof. The dual of HLP is:

HLD: max z
.F y + dz < c.
But Pji(c - zd) = (c - zd) - -,4T y(z) for some y(z) E R m , so PA(c - zd) 2: 0 implies
that (y(z), z) is feasible in HLD. Then z :::; z*, since LP and HLP have the same
optimal objective value. _

Using Lemma 4.3.1 the lower bound zk can be updated as follows. Let Zk = {z 2:
zk I Pji(c - zd) 2: O}, and define zk+l to be:

zk+1 = {max{z E Zk} if Zk "# 0,


zk otherwise.

Then Z = zk+1 :::; z*, by Lemma 4.3.1, and moreover by construction we have
PA(c - zd) 1- O. Now let

A -
uX = P [5 1(-C - z-d) = Pe T PA (-C - z-d) = P ji (-C - Z-d) -
(c - zdf e
n + 1 e,

where we are using the fact that Ae = O. Since Pji(c- zd) 1- 0, we then immediatelyl
have
II~xll 2: II~xlloo 2: (c: ~dt e. (4.4)

The next point, in the transformed variables x, will be of the form

_I ~x
X = e- Q lI~xll ' (4.5)

where Q > 0 is a steplength yet to be decided. Note that the resulting x 1 will satisfy
the equality constraints Ax = 0 of HLP, but in general will fail to satisfy ~ x' = 1.
In order to obtain a new point x k +1 which is feasible for LP, we employ a projective
transformation
k+1 _ X x
k- I
x - Px . (4.6)
'
132 CHAPTER 4

Substituting (4.6) into the definition of fe, .), with q = n+ 1, for a sufficiently small
we obtain

~_
< -a - t;
n+l
In
(
1- a II~~II
)
' ( 4.7)

where the inequality uses (4.4), and the fact that In(1 - t) ::; -t for any 0::; t < 1.
To obtain a bound on the potential decrease for Karmarkar's algorithm we need to
obtain a bound for (4.7). One approach is to use the following well-known inequality.

Lemma 4.3.2 Let U E R n , Ilulloo ::; 1. Then

~ T IIul1 2
£;-tln(1 + Ui)?: e U- 2(1-lluII00)

Proof For each i = 1, ... , n the Taylor series expansion for In(1 + Ui) results in

00 (_1)i+1ui 1 00. u2
In(l + Ui) = L
J=l
. ' ?: Ui - 2" L
J
IUil J =
J=2
Ui - 2(l-'lu;!) . (4.8)

The proof is completed by summing (4.8), and using IUil ::; Ilulloo for each i. •

Theorem 4.3.3 On every iteration k ?: 0 of Karmarkar's algorithm, the steplength


a may be chosen so that f(x k , zk) - f(x k+!, zk+!) 2: .25 .
Potential Reduction Algorithms 133

Proof We have
f(xk,zk) _ f(x k+1,zk+l) >

>

> a- 2(1_a)' ( 4.9)

where the first inequality uses zk+l ~ zk, the second uses (4.7), and the third uses
Lemma 4.3.2 and the fact that eT ax = O. The proof is completed by substituting
a = .5 into (4.9). •

An important feature of Karmarkar's algorithm is that in practice, an approximate


linesearch in the step length a can be performed to maximize the potential decrease
on each step. Such a linesearch typically obtains steplengths, and potential decreases,
that are much larger than the 0(1) values that appear in the worst-case analysis
above.

Remarks. There are many papers that consider different aspects of Karmarkar's
algorithm. One line of investigation concerns the potential decrease assured in The-
orem 4.3.3. The decrease of .25 proved here can easily be improved to 1-ln(2) ~ .31
by sharper approximation of the logarithmic barrier terms. Muramatsu and Tsuchiya
[59] show that using a "fixed fraction to the boundary" step, based on the "affine"
direction PA,(c - Ed), a decrease of about .41 is always possible. Anstreicher [3] and
McDiarmid [48] independently proved that with exact linesearch of the potential
function a decrease of approximately .7215 is always possible, and this bound is
tight. Another interesting topic is the derivation of a lower bound for the worst-case
complexity of the algorithm. Anstreicher [7] shows that using exact linesearch of the
potential function, the algorithm may produce an 0(1) reduction in f(',') on every
iteration, and may require O(ln(n/f)) iterations to reduce the gap cT xk - zk to a
factor f < 1 of its initial value. Ji and Ye [43] elaborate further the analysis of [7].
Powell [66] shows that the iterates of Karmarkar's algorithm, with exact linesearch,
may visit the neighborhoods of O( n) extreme points of the feasible region.

Anstreicher [2] and Steger [67] describe a "ball update" alternative to Todd and
Burrell's [79] lower bound methodology. Shaw and Goldfarb [69] show that with
a weakened version of the ball update, and short steps (a < 1), the projective
algorithm can be viewed as a path following method and has a complexity of O(..jTiL)
iterations. Anstreicher [2] describes a modification of the algorithm that assures
monotonicity of the objective values {cT xk}. Anstreicher [10] describes a stronger
monotonicity modification, and obtains a complexity of O(..jTiL) iterations using the
134 CHAPTER 4

weakened ball updates, and step lengths based on the primal-dual potential function
F(·, .). Goldfarb and Mehrotra [30],[31] modify the projective algorithm to allow for
the use of inexact computation of the search direction ~x. Todd [71] considers the
computation of lower bounds, and the search direction, for problems with special
structure. Todd [72] and Ye [84] describe the construction of "dual ellipsoids" that
contain all dual optimal solutions. In principle this procedure could be used to
eliminate variables as the algorithm iterates, but Anstreicher [6] describE)s why the
process fails in the presence of degeneracy. Todd [74] and Anstreicher and Watteyne
[13] describe alternatives to the usual search direction obtained via decomposition,
and projection onto a simplex, respectively. Computational results for Karmarkar's
algorithm are reported in [13], and by Todd [73].

Asic et al. [14] consider the the asymptotic behavior of the iterates in Karmarkar's
algorithm using short step (a < 1), while Megiddo and Shub [49] and Monteiro
[53] examine properties of the continuous trajectories associated with the algorithm.
Bayer and Lagarias [15] explore connections between Karmarkar's algorithm and
Newton's method, Gill et al. [29] describe relationships between Karmarkar's al-
gorithm and logarithmic barrier methods, and Mitchell and Todd [51] relate Kar-
markar's method to the primal affine scaling algorithm. Freund [23], Gonzaga [35],
and Mitchell and Todd [52] consider the projective algorithm for more general prob-
lem formulations than that of LP. See also Freund [26] for a very general discussion
of the use of projective transformations.

4.4 THE AFFINE POTENTIAL REDUCTION


ALGORITHM
Although Karmarkar's algorithm caused a revolution in mathematical programming,
there are some aspects of the method that are less than ideal. For example projective
transformations have rarely been employed in the optimization literature, and the
use of the projective transfomation (4.6) is not particularly intuitive. In addition,
the O(nL) iteration complexity bound for the algorithm was eventually bettered
by "path-following" methods for linear programming (see for example [37]), which
achieve a complexity of O(.,fii.L) iterations.

It turns out that both of the above issues can be addressed by a method that is
quite similar to Karmarkar's algorithm, but which avoids the use of a projective
transformation on each step. Given a feasible interior point xk, k ;::: 0, consider a
transformed problem:
LP: mill
Potential Reduction Algorithms 135

Ax b
x > 0,

where now A = AX k and c = Xk c. Let LD denote the dual ofLP. One can think of
obtaining LP from LP by applying a simple re-scaling of the variables of the form

(4.10)

Clearly LP is equivalent to LP, and x =


e is feasible in LP. As in Karmarkar's
algorithm, the derivation of a step in LP is based on the transformed problem LP.
Define a transformed potential function
n
/(x, z) = q In(cT x - z) - L In(x;).
;=1

Note that ifx and x are related by (4.10), then !( x, z) and !( x, z) differ by a constant
which depends only on xk. As a result, it suffices to analyze the decrease in j(-,.)
=
starting at x e, z =
zk. To this end, let ~x be the projection of the gradient of
/(e, zk) onto the nullspace of A:

~x = p ..dV'xj(e, zk}f = P}, (;;r q


c e- z
kC - e) . ( 4.11)

Re-arranging (4.11), it follows that there is a y' E R m so that

(4.12)

=
Lemma 4.4.1 Let q n + vfn, and suppose that II~xll ::; TJ < 1. Then zk+1 = bT y'
satisfies zk < zk+1 ::; z·, and !(xk, zk) - !(xk, zk+1) ~ (1 - TJ)vfn.

Proof Clearly e + ~x > 0, so (4.12) implies that y' is feasible for the dual of LP,
and therefore bT y' ::; z·. In addition,

= (c +qTJvfn (cT e _
T e q- zk) eT(e
c-T e - bT y' + ~x) ::; n zk), (4.13)

implying zk+1 > zk. Finally,


136 CHAPTER 4

q In ( cTc;;re e- - zk+l)
z
k ~ q In
(n + rrJri)
q

q In (1 - (1 - ;)fo) ~ -(1 - '1)fo,

where the last inequality uses In(1 - t) ~ -t for t < 1. •


Let 0 < '1 < 1 be a fixed constant, independent of n. By Lemma 4.4.1, if II~xll < '1,
then the lower bound can be updated to a new value zHl, such that the potential
function f(·,·) is reduced by n(fo). Consider next the situation when "~x" ~ '1.
In this case we take a step in the transformed problem of the form:

_I ~x
X = e -a,,~x'" (4.14)

where a > 0 is a step length yet to be decided. Following such a step, a new point
=
x k +1 is defined by x k +1 X k X I.

Lemma 4.4.2 Let q = n + fo, and suppose that "~x,, ~ '1 > O. Then there is a
step/ength a so that f(x k , zk) - f(X H1 , zk) ~ (1 + T]) - v'I+2i7 > o.

Proof. We have

< -aT] + 2(1 - a) , (4.15)

where the first inequality uses Lemma 4.3.2, and the second uses In(1 - t) ~ -t
for t < 1. A straightforward calculus exercise shows that (4.15) is minimized at
Potential Reduction Algorithms 137

Q' =1- 1/>/1 + 21/, and substitution of this value into (4.15) completes the proof.

Taken together, Lemmas 4.4.1 and 4.4.2 immediately imply that for q n + fo, an =
Q(I) decrease in !(.,.) is always possible. As a result, the affine potential reduction
algorithm is an O( nL) iteration method for LP. However, there is a striking asymme-
try between Lemmas 4.4.1 and 4.4.2, since the former shows that in fact an Q(fo)
decrease occurs on steps where the lower bound is updated. In fact the affine poten-
tial reduction method, exactly as described above, can be shown to be an O( foL)
iteration algorithm by analyzing the algorithm using the symmetric primal-dual
potential function F(·, .), instead of the primal potential function !(', .).

Suppose that x lc > 0 and sic > 0 are feasible for LP and LD, respectively. Consider
a linear transformation of the dual variables

(4.16)

Then for any x > 0 and s > 0, feasible in LP and LD, respectively, x and s from
(4.10) and (4.16) are feasible in LP and LD, respectively, and moreover F(x, s) =
F(x, s). As a result, it suffices to analyze the descent in F(·,·) starting at x e, =
= = =
s sic X lc sic. Let ~x be as in (4.11), for zlc bTylc, where ATylc + sic = C. If
II~xll ~ 1/, we continue to take a step as in (4.14), and let x lc + 1 =
Xkx'.

=
Lemma 4.4.3 Let q n+fo, and let ~x be as in (4.11), with zk = bT ylc. Suppose
that II~xll ~ 1/ > O. Then there is a steplength Q' so that F(x lc , sic) - F(xlc+I, sk) ~
(1 + 1/) - v'f+211 > O.

Proof The proof is identical to that of Lemma 4.4.2, using the fact that for any x,
xT sk = cT X _ zk. •

Next we turn to the case of II~xll ::; 1/. As before, we will use the fact that (4.12)
provides a feasible solution for LD. Define

(4.17)

We now require an analysis of the step from sic to s' that includes the effect of the
dual barrier terms in F(·, .).

Theorem 4.4.4 Suppose that lI~xll ::; 1/. Let s' be as in (4.17), and let slc+ 1
(Xlc)-ls'. Then F(x", sk) - F(x lc , slc+ 1 ) ~ (1 - 21/)/(2 - 21/).
138 CHAPTER 4

Proof. We have

F(Xk, Sk+ 1) _ F(Xk, Sk)


F(e, s') - F(e, Sk)

qln (n+:T~x) -nln (eTqs


k
) - tln(1+~Xi)+ tln(Sf)'
where the last equality uses (4.17). Note that
n
L In(s7) ~ n In(eTs k In), (4.18)
i=1

by the arithmetic-geometric mean inequality. Moreover, Lemma 4.3.2 implies that


n 2
-''In(1+~xi)~-eT~x+ (1] )" (4.19)
~ 21-1]

Using (4.18) and (4.19), we obtain F(xk,sk+1) - F(xk,sk)

where the last inequality uses In(1 + t) ~ t for t > -1 (twice), and the fact that
II~xll ~ 1]. The proof is completed by noting that q =
n + yfii ~ 2n for n :::: 1. •

Lemma 4.4.3, and Theorems 4.4.4 and 4.2.2, imply that the affine potential reduction
algorithm, using q =n + yfii and 1] < .5, is an O( yfiiL) algorithm for LP. As with
Karmarkar's algorithm, in practice a linesearch in a can also be used to improve the
decrease in Fe·) on primal steps.

Remarks. The affine potential reduction method based on f(·,·) was proposed by
Gonzaga [33], who assumed that zO = z*. The lower bound logic based on (4.12)
Potential Reduction Algorithms 139

was suggested in [33], and fully developed by Freund [24]. Independently, Ye [85]
devised the analysis based on F(·, .), which reduces the complexity of the algorithm
to O( ynL) iterations. Ye [83] also describes an alternative O( ynL) iteration algo-
rithm that uses F(·, .), but employs projectiv~ transformations as in Karmarkar's
algorithm.

The lower bound, or dual variable, update based on (4.12) can be modified in several
different ways. For example, in [24] the lower bound is increased to a value zk+l
so that following the bound update it is always the case that II.:lili ;: : "I. As a
result, updates of the lower bound (or dual solution) are immediately be followed
by primal steps. Gonzaga [36] considers a general procedure for the construction of
lower bounds, and Mitchell [50] relates the construction in [36] to earlier results of
Todd [72].

Anstreicher [9] describes a monotonicity modification for the affine potential reduc-
tion algorithm, and Ye [86] analyzes a variant that allows for column generation.
Monteiro [58] considers the behavior of the continuous trajectories associated with
the algorithm. Todd [77] describes analogs of potential reduction methods for semi-
infinite linear programming. Anstreicher [11] devises an algorithm which is similar
to the affine potential reduction for LD, but which employs a volumetric potential
function
1
q In(z - bT y) - 2" In (det (AS- 2 AT)) ,

where s = c - AT Y > 0, q = O(m), and z > z*. The resulting algorithm has a
complexity of O(mynL) iterations. Using a potential function that combines the
volumetric barrier with the usual logarithmic barrier, the algorithm's complexity is
reduced to O( yrnnL) iterations.

4.5 THE PRIMAL-DUAL ALGORITHM


In the analysis of the previous section, the use of the primal-dual potential function
=
F(·, .), with q n + yn, results in a comparable potential reduction on primal and
dual steps, and improves the complexity of the affine potential reduction algorithm
to O( ynL) iterations. The algorithm's treatment of primal versus dual variables
is still very asymmetric, however. In this section we describe a different potential
reduction method based on F(·,·) which treats the primal and dual variables in
a completely symmetric fashion. This "primal-dual" algorithm is due to Kojima,
Mizuno, and Yoshise [47]. Our derivation here differs somewhat from that in [47],
as we wish to emphasize the connection with the primal algorithm of the previous
section.
140 CHAPTER 4

Let q = n + vn,and let xk and sk be feasible interior solutions of LP and LD,


respectively. Consider a change of variables

x (Xk)-1/2(Sk)1/2 X
(4.20)
s (Xk)1/2(Sk)-1/2 s.

Then for any x feasible in LP, x from (4.20) is feasible for a rescaled problem
LP defined as in the previous section, but using the primal-dual scaling matrix
(Xk)1/2(Sk)-1/2 in place of Xk. Similarly if s is feasible for LD, then s is feasible
in LD, the dual of LP. Moreover, F(x, s) = F(x, s). Note that the transformation
(4.20) maps both xk and sk to the vector v = (X k )1/2(Sk)1/2 e. As a result, it suffices
= =
to consider the reduction in F(·,·) starting at x s v. Note that

We define directions

(4.21 )
where A = A(Xk)1/2(Sk)-1/2. Consider simultaneous primal and dual steps of the
form:
x' = v - ~Llx = V(e - ~V-ILlx),
(4.22)
s' v-£Lls=V(e-£V-1Lls)
-y -y'

where I = JIIV-1LlxIl2 + IIV-1LlsIl2, and a > 0 is a steplength yet to be decided.


We then have F(x', s') - F(v, v)

qln ((V - aLlxh)T(v - aLlsh)) _ ~ln (xi) _ ~ln (s:)


II vl12 ~
i=l

'
~
i=l

"

qln (1- avT(Llx+LlS))


ll
III v 2
_ ~ln (1- aLlx i ) _ ~ln (1- aLlsi) ,
8 8 IVi IVi

=
where we are using the fact that LlxT Lls O. Applying Lemma 4.3.2, and the fact
that In(l - t) ~ -t for t < 1, for a sufficiently small we obtain

F(x',s')-F(v,v) <

( 4.23)
Potential Reduction Algorithms 141

where the equality uses the fact that ~x + ~s = (q/llvIl 2 )v - V-Ie. Now let Vrnin =
mini {Vi}. Then
,2 IW- l ~x112 + IIV- l ~sw
< ---i-(II~xW + II~sI12)
vrnin
1
-2-II~x+~sW
vmin

v;Jlllv~'2v - V- Ie 11
2
(4.24)

Using (4.24) in (4.23), we obtain

F(x I, S ') - F( v, v) ~ -avrnin 11"v~'2 v - V-lell + 2(la~ a)" (4.25)

To obtain an estimate for the decrease in F(·,·) for the primal-dual algorithm we
require a bound for the linear term in (4.25). Such a bound is provided by the
following lemma.

Lemma 4.5.1 [47, Lemma 2.5] Let vERn, v > 0, and q = n +,;n. Then

vrnin II IlvWV
q - V -1 eII 2: v'3
2'

Proof We have

>

>
142 CHAPTER 4

where the second equality uses the fact that vT[V-le - (n/llvW)v] = o. •
=
Theorem 4.5.2 Let q n +.,fii, and consider the primal-dual steps defined as in
=
(4.22). Let xk+l (Xk)1/2(Sk)-1/2 x', Sk+l =
(Xk)-1/2(Sk)1/2 s', Then there is a
steplength a so that F(xk, sk) - F(xk+l, sk+l) ~ .16 .

Proof From (4.25) and Lemma 4.5.1 we have

(4.26)

The proof is completed by substituting a = .37 into (4.26). •


Remarks. Todd and Ye [80], who introduce the primal-dual potential function
F(·, .), devise an interesting primal-dual potential reduction algorithm that may
be considered to be a precursor to the algorithm of this section. The method of
[80] uses projective transformations, like Karmarkar's algorithm, and attains a com-
plexity of O( foL) iterations. Unfortunately the iterates are constrained to lie in a
neighborhood of the central path, making the algorithm similar to a path following
method, and precluding the use of linesearches to increase the descent in F (" .) on
each step. Gonzaga and Todd [38] describe a "primal or dual" potential reduction
method based on F(·,.) which achieves symmetry between the primal and dual vari-
ables in a fundamentally different way from the algorithm of this section. In [38],
the algorithm takes either a primal step as in (4.14), or a dual step which is based on
a projected gradient step in the transformed dual variables s, after a scaling of the
form s = (Sk)-ls which maps sk to e. It is shown that for q = n + fo, one of these
two steps must produce an 0(1) decrease in FC .). Mizuno and Nagasawa [56], and
Tun<;el [81] consider variants of the primal-dual potential reduction algorithm that
use the primal-dual affine scaling direction. Ye et al. [90] consider modifications
of the primal-dual algorithm based on varying the value of the parameter q in the
system used to derive the primal-dual directions.

4.6 ENHANCEMENTS AND EXTENSIONS


In this section we describe several modifications of the potential reduction methods
described in the previous sections that enhance the theoretical complexity, and/or
practical performance, of the algorithms. We also describe extensions of the algo-
rithms to problems more general than LP.
Potential Reduction Algorithms 143

4.6.1 Partial Updating


For each ofthe algorithms described above, the dominant computational task on each
iteration is the formation, and factorization, of an m x m matrix, requiring 0(m 2 n)
operations using standard linear algebra. The remaining work per iteration is all
O(mn). As a result, the total complexity for Karmarkar's algorithm is 0(m 2 n 2 L)
operations, and the total complexity for both the affine and primal-dual potential
=
reduction algorithms (using q n + yin) is 0(m 2 n1.5 L) operations.

The above total complexity bounds can be improved using a technique known as
partial updating. Consider Karmarkar's algorithm, or the affine potential reduction
method. Then the matrix to be formed and factorized on each iteration is of the
form A(Xk)2 AT. The idea of partial updating is to instead maintain a factorization
of a matrix A(Xk? AT, where irk > 0 satisfies

-p1 ~ -t-
ir~
xi
~ p, i = 1, ... , n, (4.27)

and p > 1 is a 0(1) constant. The computations required on each step are then mod-
ified to use the factorization of A(Xk)2 AT, instead of a factorization of A(Xk)2 AT.
Following a step from xk to xk+1, the algorithm first sets xk+ 1 = xk, and then "up-
dates" any indecies i which fail to satisfy (4.27), for k =
k + 1. Each such update
produces a rank-one change in A(Xk+l)2 AT, requiring an update of the factoriza-
tion of A(Xk+l)2 AT that can be performed in 0(m 2) operations. See for example
Shanno [68] for details of updating a Cholesky factorization. Karmarkar [44], who
introduced the technique, shows that when his algorithm uses partial updating the
number of iterations is still O( nL) but the total number of updates required on all
iterations is only 0(n1.5 L). As a result, the complexity of Karmarkar's algorithm
=
using partial updating is reduced to 0(n1.5(m 2 )L + n(mn)L) 0(m1.5n 2 L). In the
interior point literature the distinction between m and n is often ignored, in which
case partial updating provides a factor-of-yIn complexity improvement.

We will not present the details of potential reduction algorithms that incorporate par-
tial updating, but we will describe some results on the topic. A serious shortcoming
of Karmarkar's original analysis of partial updating is that the complexity improve-
ment requires that the algorithm take short steps (a < 1), instead of performing
a linesearch of the potential function. This restriction makes the technique hope-
lessly impractical. Anstreicher [5] shows that with a simple safeguard, a linesearch
can be performed when using partial updating, while still retaining the complexity
improvement. Ye [85] describes a partial updating version of the affine potential
reduction algorithm that reduces the total complexity to O( m1.5 n 1.5 L) operations.
However, the analysis of [85], like that in [44], requires that the algorithm take short
144 CHAPTER 4

steps. Anstreicher and Bosch [12] adapt the safeguarded linesearch of [5] to the affine
potential reduction algorithm, resulting in an O( m1.5 n 1.5 L) algorithm that can use
linesearch to improve the reduction in F(·,·) on each iteration. Other partial updat-
ing variants of the affine potential reduction method are devised by Bosch [16], and
Mizuno [53], [54].

Partial updating can also be applied to primal-dual algorithms, which are based on
a primal-dual scaling matrix of the form (Xk)1/2(Sk)-1/2. Bosch and Anstreicher
[17] devise an O( m1.5 n 1.5 L) partial updating variant of the primal-dual potential
reduction algorithm of [47], that allows for safeguarded linesearch of F(.,.) using
unequal primal and dual step lengths.

Although partial updating is important from the standpoint of theoretical complex-


ity, the technique has not been used very much in practice. The reason for this
is quite simple. The complexity improvement from partial updating is based on
worst-case decrease in the potential function, and reducing the number of updates
per iteration from n to an average of O( y'n). However, in practice algorithms typi-
cally achieve potential decreases that are much better than the worst-case bounds,
using long steps that would result in O( n) updates per iteration. The additional
"overhead" required to implement partial updating then makes the technique un-
competitive. Shan no [68] and Bosch and Anstreicher [18] present computational
results using partial updating. In [18] it is shown that for certain problem struc-
tures partial updating can actually enhance the practical performance of the affine
potential reduction algorithm.

4.6.2 Long Steps


For each of the algorithms considered above, the steps (in (4.5), (4.14), and (4.22»
are parameterized using a two-norm step length. In practice a potential reduction
algorithm can (and generally will) use a steplength having O! > 1, but the perfor-
mance on such a "long" step cannot be theoretically analyzed. One way to analyze
such long steps, and in so doing perhaps get more insight into the typical behav-
ior of a potential reduction algorithm, is to parameterize the step in terms of an
infinity-norm, as opposed to two-norm, steplength.

Consider for example Karmarkar's algorithm. Instead of the step as in (4.5), define
a step of the form
_/ ~x
(4.28)
;x; = e- O! lI~xlloo .
Potential Reduction Algorithms 145

Proceeding as in the derivation of (4.7), we then obtain


f(Xk+l, Zk+l) _ f(x k , Zk+l)

= (n + l)ln (1- all~xW/II~xlloo)


(c-zdVe
- Eln
i=l
(1- a ~Xi
II~xlloo
)
< (n + 1) In (1 _ a(ll~xll/lI~xlloo)211~xlloo) + a2(II~xll/ll~xlloo)2
(c-zd)T e 2(1-a)
( 2 ) lI~xW (4.29)
II~xll~'
(
< -a + 2(1- a)
where the first inequality uses Lemma 4.3.2 and the fact that eT ~x = 0, and the
second uses (4.4) and In(l - t) ~ -t for t < 1. As in Theorem 4.3.3, (4.29) shows
that an Q(l) decrease in f(',') is always possible. However, (4.29) also indicates that
the decrease on a step of Karmarkar's algorithm will typically be much greater. In
particular, II~xI12/II~xll~ is typically Q(n/ln(n)) (as first observed by Nemirov/!kii
[60]), implying that the algorithm can obtain a potential decrease of Q(n/ In(n)) .on
a single step. From Theorem 4.2.1, this magnitude of potential decrease per step
results in an O(ln( n)L) iteration algorithm, in accord with the observation that in
practice the convergence of the algorithm is independent (or nearly independent) of
n.

Nesterov and Todd [62] suggest a similar "long step" analysis for the affine potential
reduction algorithm based on f(', .), with q = 2n. Let ~x be as in (4.11), and
suppose that lI~xlloo ~ TJ < 1. Let zk+ 1 = bTy', where y' is as in (4.12). It then
follows easily that
(4.30)
and also that

Thus updates of the lower bound now produce an Q(n) decrease in f(" .). Next
consider the situation where lI~xlloo > TJ. Instead of using the step as in (4.14),
define
_I ~x
X =e - a II~xlloo
Proceeding as in the proof of Lemma 4.4.2, we obtain
f(xk+l, zk) - f(x k , zk)

< -a ( q _
c- e
)T ~x
+ a2(II~xll/lI~xlloo)2
cr e - zk II~xlloo
-.:..:..:....::-:-::.:..:....:..:~...:..:..::..::..:....-
2(1 - a)
146 CHAPTER 4

< (4.31)

As in the case of Karmarkar's algorithm, (4.31) assures an n(l) decrease in f(·, zk),
but indicates that a much larger decrease will typically occur.

If one considers the affine potential reduction algorithm using Fe, .), with q 2n, =
then the situation on primal steps, with IIAxll oo 2: "I, is exactly as above. For dual
steps, the effect on F(·,·) can easily be analyzed as in the proof of Theorem 4.4.4.
The final result is that on a dual step, where IIAxll oo :5 "I, F(xk, sk) - F(xk, sk+l) 2:
n(1 - 2"1)/(2 - 2"1), a decrease of exactly n times the bound of Theorem 4.4.4.
However, with q = 2n there is essentially no reason to measure progress of the
algorithm using F(·, .).

For a more extensive discussion of the use of "long steps" in potential reduction
methods see Nesterov [61], Nesterov and Todd [62], and Todd [78]. The latter also
describes a "long step" analysis for the primal-dual potential reduction algorithm.

4.6.3 Large-step Dual Updates


=
The affine potential reduction method based on F(·, .), with q n +...[ii, was con-
sidered a major breakthrough in interior point methods. Previous O( ...[iiL) iteration
methods were all of the short-step path-following variety (see for example [37]),
where iterates were constrained to remain within a small neighborhood of the cen-
tral trajectory. The affine potential reduction method, on the other hand, placed
no explicit restrictions on the iterates, and offered the possibility of an O(...[iiL) al-
gorithm that might perform well in practice. Unfortunately the algorithm does not
=
perform well in practice with q n +...[ii. An explanation for this phenomenon was
provided by Gonzaga [34]. With q = n + ...[ii, dual updates are performed when
IIAxll :5 "I < 1. The result of such an update is a "small-step" reduction in the gap;
in fact (4.13) indicates that the gap is reduced by a factor which is no smaller than
1-(I+T/)/...[ii. On the other hand the algorithm takes primal steps when IIAxll > "I,
and in this case the "worst-case" reduction in F(·,·) is only n(I). One might hope
that the use of a linesearch on the primal and dual steps could improve the per-
formance of the algorithm, but in practice (with q = n + ...[ii) this improvement is
minimal.
Potential Reduction Algorithms 147

It turns out that it is possible to retain the O( vnL) iteration complexity of the affine
potential reduction algorithm while using "larger-step" dual updates. Consider q =
n + vvn, where v = 0(1). The analysis of descent in F(-,·) for primal and dual
steps is then almost identical to the analysis with v = 1, and the bounds provided by
Lemma 4.4.3 and Theorem 4.4.4 continue to hold. By Theorem 4.2.2, the algorithm
remains an O( vnL) iteration algorithm. However, the dual update will now result
III
cT xk - zk+l n + TJvn
cT xk - zk ::; n + vvn '
so large values of v produce a better gap reduction on dual steps. In addition,
following such a step one will tend to have a larger value for II~xll, resulting in a
primal step with better potential decrease. This is the rationale behind the "large
step dual update" of [34], although Gonzaga describes the dual update somewhat
differently from the way we describe it here, and bases his complexity analysis on
1(-,.) rather than F(·, .).

A "truly large" dual step update, with an Q( 1) reduction in the gap, is provided by
=
using q 2n. In this case the algorithm can also be analyzed using an infinity-norm
parameterization of the primal step, as described above. Thus q = 2n produces
truly-large-step dual updates, and allows for long primal steps, leading to a very
substantial improvement in the practical performance of the algorithm.

4.6.4 Infeasible-Start Methods


The potential reduction algorithms described above all require an initial primal fea-
sible xO > 0, and possibly an initial dual feasible so. As described in Section 2,
it is possible to devise an augmented problem like MLP which has an initial feasi-
ble solution. However, the large value of the parameter M makes the use of MLP
computationally unattractive.

Several approaches have been developed to allow potential reduction algorithms to


operate on problems that do not have a known feasible interior point, without the
use of M as in MLP. Phase I - Phase II algorithms use a formulation similar to MLP,
but without the explicit use of the M objective coefficient. Consider a problem:

mIn c.T x
Ax b
(4.32)
d!'x 0
x > 0,
148 CHAPTER 4

where x E Rn+l,

and xO > O. It is not assumed that Ax D = b. Clearly (4.32) is equivalent to LP, and
xDgiven by xf = xf, i = 1, ... , n, x~+1 = 1 is feasible for all of the constraints of
(4.32) except the constraint efT x = O. The approach of a Phase I - Phase II potential
reduction algorithm is to simultaneously decrease the usual primal potential function
!(.,.) based on (4.32), and also decrease a "Phase I" potential function:
n+l
j(x) = q In(dT x) - L In(x;).
;=1

Algorithms of this type based on Karmarkar's algorithm, using q = n + 1, were


devised by Anstreicher [3], and Todd [75]. Methods based on the affine potential
reduction algorithm, using !e .)
and q :2: n + fo, can be found in Anstreicher [8],
and Todd [76]. It should be noted that even with q = n + fo, the latter algorithms
cannot use Fe·) to improve the complexity ofthese methods to O( foL) iterations.

DeGhellinck and Vial [20] describe a variant of Karmarkar's algorithm, based on


parameterized feasibility problems, that does not require an initial feasible point.
When initialized with a feasible point, the method of [20] is essentially the "stan-
dard form" variant of Karmarkar's method, as described in Section 3. Fraley [22]
considers an improvement of the lower bound procedure in [20] when the initial point
is not feasible. Freund [27] describes a Phase I - Phase II affine potential reduction
algorithm that uses a single potential function, and enforces a "balance" between
the Phase I and Phase II objectives through an added constraint.

Freund [25] uses a "shifted barrier" approach to allow for the initialization of a
potential reduction algorithm with an infeasible point. In [25] it is assumed that
AxD = b, but that xO may have negative components. The usual potential function
!(-, .) is replaced by a function of the form
n
q In(cT x - z) - L In(x; + h;(c T x - z)),
;=1

where q = n + fo, and h > 0 is a "shift" vector such that xO + (cT XO - zO)h >
O. Similarly F(·,.) is replaced with a potential function that includes the shifted
primal barrier terms. Algorithms based on these perturbed potential functions have
complexities of O( nL) or O( foL) iterations, under various assumptions regarding
the dual feasible region.
Potential Reduction Algorithms 149

In practice, primal-dual "infeasible-interior-point" methods have been used very


successfully to solve linear programs from infeasible starting points. For a given
iterate xk > 0, sk > 0, these algorithms obtain search directions .6.x and .6.s by
solving a system of the form:

A.6.x b-Axk
AT .6.y + .6.s C - ATyk _ sk (4.33)
Sk .6. x + Xk .6.s = ip,ke-XkSke,

=
where 0 ::;: i ::;: 1, and p,k (xkl sk In. (The use of i =0 results in the "primal-dual
affine scaling," or "predictor" step, while i =1 gives a "centering," or "corrector"
step.) The next point is of the form

for a step parameter a ::;: 1. Most algorithms based on (4.33) are of the path-
following, or predictor-corrector variety. However, Mizuno, Kojima and Todd [55]
devise a potential reduction algorithm that uses directions from (4.33).

4.6.5 Linear Complementarity Problems


The Linear Complementarity Problem is:

LCP: s-Mx q
s ~ 0, x > 0, xT s = 0,
where M is an n x n matrix, and q E Rn. It is well known that for appropriate choices
of M (see for example [19]), LCP can be used to represent linear programming,
convex quadratic programming, matrix games, and other problems. Many primal-
dual algorithms for LP can be extended to LCP, under the assumption that M is
a positive semidefinite (but not necessarily symmetric) matrix. In particular, the
primal-dual potential reduction algorithm of Section 5 was originally devised as a
method for LCP, and retains a complexity of O( foL) iterations so long as M is
positive semidefinite. See Kojima, Mizuno, and Yoshise [47] for details.

The theory of LCP depends very heavily on the membership of M in various classes of
matrices (for example, positive semidefinite matrices). Kojima et al. [45] discuss the
application of interior point algorithms, including primal-dual potential reduction
methods, to LCP problems with different types of M. Kojima, Megiddo, and Ye
[46] analyze a potential-reduction algorithm in the case that M is a P-matrix (that
150 CHAPTER 4

is, a matrix with positive principal minors), for which a solution to LCP always
exists (see [19]). Ye [87) analyzes a potential reduction algorithm that obtains an
approximate stationary point of a general LCP, and Ye [88) considers a potential
reduction method for the related problem of approximating a Karush-Kuhn-Tucker
point of a general quadratic programming problem. The last three references show
that the potential reduction framework can be used to analyze algorithms that are
not polynomial-time methods.

4.6.6 Linear Programming Over Cones


Nesterov and Nemirovskii [64) consider a "conic" extension of the usual linear pro-
gramming problem of the form

CLP: mm (c,x)
Ax b
x E K,

where x is in a finite-dimensional real vector space X, c is in the dual space X* , b


is in a finite-dimensional real vector space Y, A is a linear mapping from X to Y,
and K is a closed, convex, and pointed cone in X. A dual problem for CLP is then

CLD: mm (b,y)
A*y + s c
s E K*,

where A* : y* -+ X* is the adjoint of A, y E Y, and K* C X* is the dual cone

K* = {s E X*I (x, s) ~ 0 Vx E K}.

Strong duality holds between CLP and CLD if, for example, CLP and CLD both have
feasible solutions which are interior to the cones K and K* ,respectively. See [64) for
more extensive duality results for these problems. Note that if X = R", Y = Rm ,
and K = R't, the nonnegative orthant, then CLP is simply LP. It is shown in [64)
that CLP actually provides a formulation for general convex programming.

In [64, Chapter 4] it is shown that Karmarkar's algorithm, and the affine potential
reduction algorithm, can be extended to problems of the form CLP so long as the
cone K possesses a f)-logarithmically-homogeneous barrier. The exact definition of
such a barrier, and its properties, are beyond the scope of this article. We note here
only that the complexities of algorithms for CLP depend on the parameter f). For
the usual LP problem, - :L7=lln(Xi) is an n-logarithmically-homogeneous barrier
Potential Reduction Algorithms 151

forR+. Another important special case takes X to be the space of n x n symmetric


matrices, and K the cone of symmetric positive semidefinite matrices. For this case
(x, s) = tr(xs), where tr(·) denotes the trace of a matrix, K* = K, and the barrier
-In(det (x)) is an n-Iogarithmically-homogeneous barrier for K. Problems of the
latter type are now commonly refered to as semidefinite programming problems,
and have a number of significant applications in combinatorial optimization, control
theory, and elsewhere. See Vandenberghe and Boyd [82] for an excellent survey of
semidefinite programming applications, and algorithms.

Todd [78] gives a much more extensive discussion of Nesterov and Nemirovskii's [64]
generalization of potential reduction algorithms to CLP. The extension of a potential
reduction algorithm (specifically Ye's [83] projective potential reduction method) to
semidefinite programming was independently obtained by Alizadeh [1]. Nesterovand
Todd [62], [63] obtain an extension of the primal-dual potential reduction method
to problems of the form CLP where K and its barrier are self-scaled; see also [78]
for a summary of these results.

Acknowledgements
I would like to thank Rob Freund, Tamas Terlaky, Mike Todd, and Yinyu Ye for
their comments on a draft of this article.

REFERENCES
[1] F. Alizadeh, "Interior point methods in semidefinite programming with appli-
cations to combinatorial optimization," SIAM J. Opt. 5 (1995) 13-51.

[2] K.M. Anstreicher, "A monotonic projective algorithm for fractional linear pro-
gramming," Algorithmica 1 (1986) 483-498.

[3] K.M. Anstreicher, "The worst-case step in Karmarkar's algorithm," Math. Oper.
Res. 14 (1989) 294-302.
[4] K.M. Anstreicher, "A combined phase I-phase II projective algorithm for linear
programming," Math. Prog. 43 (1989) 209-223.
[5] K.M. Anstreicher, "A standard form variant, and safeguarded linesearch, for
the modified Karmarkar algorithm," Math. Prog. 47 (1990) 337-351.
152 CHAPTER 4

[6] K.M. Anstreicher, "Dual ellipsoids and degeneracy in the projective algorithm
for linear programming," Contemporary Mathematics 114 (1990) 141-149.
[7] K.M. Anstreicher, "On the performance of Karmarkar's algorithm over a se-
quence of iterations," SIAM J. Opt. 1 (1991) 22-29.
[8] K.M. Anstreicher, "A combined phase I - phase II scaled potential algorithm
for linear programming," Math. Prog. 52 (1991) 429-439.
[9] K.M. Anstreicher, "On monotonicity in the scaled potential algorithm for linear
programming," Linear Algebra Appl. 152 (1991) 223-232.
[10] K.M. Anstreicher, "Strict monotonicity and improved complexity in the stan-
dard form projective algorithm for linear programming," Math. Prog. 62 (1993)
517-535.
[11] K.M. Anstreicher, "Large step volumetric potential reduction algorithms for
linear programming," to appear in Annals of O.R. (1996).
[12] K.M. Anstreicher and R.A. Bosch, "Long steps in an O(n 3 L) algorithm for
linear programming," Math. Prog. 54 (1992) 251-265.
[13] K.M. Anstreicher and P. Watteyne, "A family of search directions for Kar-
markar's algorithm," Operations Research 41 (1993),759-767.
[14] M.D. Asic, V.V. Kovacevic-Vujcic, and M.D. Radosavljevcic-Nikolic, "A note
on limiting behavior of the projective and the affine rescaling algorithms, Con-
temporary Mathematics 114 (1990) 151-157.

[15] D. Bayer and J .C. Lagarias, "Karmarkar's linear programming algorithm and
Newton's method," Math. Prog. 50 (1991) 291-330.
[16] R.A. Bosch, "On Mizuno's rank one updating algorithm for linear program-
ming," SIAM J. Opt. 3 (1993) 861-867.
[17] R.A. Bosch and K.M. Anstreicher, "On partial updating in a potential reduction
linear programming algorithm of Kojima, Mizuno, and Yoshise," Algorithmica
9 (1993) 184-197.
[18] R.A. Bosch and K.M. Anstreicher, "A partial updating algorithm for linear
programs with many more variables than constraints," Optimization Methods
and Software 4 (1995) 243-257.
[19] R. W. Cottle, J .-S. Pang, and R.E. Stone, The Linear Complementarity Problem
(Academic Press, Boston, 1992).
Potential Reduction Algorithms 153

[20] G. de Ghellinck and J.-Ph. Vial, "A polynomial Newton method for linear
programming," Algorithmica 1 (1986) 425-453.

[21] A.V. Fiacco and G.P. McCormick, Nonlinear Programming, Sequential Uncon-
strained Minimization Techniques, (John Wiley, New York, 1968); reprinted as
Classics in Applied Mathematics Vol. 4, (SIAM, Philadelphia, 1990).

[22] C. Fraley, "Linear updates for a single-phase projective method," O.R. Leiters
9 (1990) 169-174.

[23] R.M. Freund, "An analog of Karmarkar's algorithm for inequality constrained
linear programs, with a 'new' class of projective transformations for centering
a polytope," O.R. Letters 7 (1988) 9-14.

[24] R.M. Freund, "Polynomial-time algorithms for linear programming based only
on primal scaling and projected gradients of a potential function," Math. Prog.
51 (1991) 203-222.

[25] R.M. Freund, "A potential-function reduction algorithm for solving a linear
program directly from an infeasible 'warm start'," Math. Prog. 52 (1991) 441-
466.

[26] R.M. Freund, "Projective transformations for interior-point algorithms, and a


superlinearly convergent algorithm for the w-center problem," Math. Prog. 58
(1993) 385-414.

[27] R.M. Freund, "A potential reduction algorithm with user-specified phase 1-
phase II balance for solving a linear program from an infeasible warm start,"
SIAM J. Opt. 5 (1995) 247-268.

[28] D.M. Gay, "A variant of Karmarkar's linear programming algorithm for prob-
lems in standard form," Math. Prog. 37 (1987) 81-90.

[29] P. Gill, W. Murray, M. Saunders, J. Tomlin, and M. Wright, "On projected New-
ton barrier methods for linear programming and an equivalence to Karmarkar's
projective method," Math. Prog. 36 (1986) 183-209.

[30] D. Goldfarb and S. Mehrotra, "Relaxed variants of Karmarkar's algorithm for


linear programs with unknown optimal objective value," Math. Prog. 40 (1988),
183-195.

[31] D. Goldfarb and S. Mehrotra, "A relaxed version of Karmarkar's method,"


Math. Prog. 40 (1988), 289-315.

[32] C.C. Gonzaga, "Conical projection algorithms for linear programming," Math.
Prog. 43 (1989) 151-173.
154 CHAPTER 4

[33] C.C. Gonzaga, "Polynomial affine algorithms for linear programming," Math.
Prog. 49 (1991) 7-21.

[34] C.C. Gonzaga, "Large-step path following methods for linear programming,
part II: potential reduction method," SIAM J. Opt. 1 (1991) 280-292.

[35] C.C. Gonzaga, "Interior point algorithms for linear programs with inequality
constraints," Math. Prog. 52 (1991) 209-225.

[36] C.C. Gonzaga, "On lower bound updates in primal potential reduction methods
for linear programming," Math. Prog. 52 (1991) 415-428.

[37] C.C. Gonzaga, "Path-following methods for linear programming," SIAM Review
34 (1992) 167-224.

[38] C.C. Gonzaga and M.J. Todd, "An O( foL )-iteration large-step primal-dual
affine algorithm for linear programming," SIAM J. Opt. 2 (1992) 349-359.

[39] P. Huard, "Resolution of mathematical programming with nonlinear constraints


by the method of centres," in Nonlinear Programming, J. Abadie, editor (North-
Holland, Amsterdam, 1967).

[40] H. Imai, "On the convexity ofthe multiplicative version of Karmarkar's potential
function," Math. Prog. 40 (1988) 29-32.

[41] M. Iri and H. Imai, "A multiplicative barrier function method for linear pro-
gramming," Algorithmica 1 (1986) 455-482.

[42] B. Jansen, C. Roos, and T. Terlaky, "The theory of linear programming: skew
symmetric self-dual problems and the central path," Optimization 29 (1993)
225-233.
[43] J. Ji and Y. Ye, "A complexity analysis for interior-point algorithms based on
Karmarkar's potential function," SIAM J. Opt. 4 (1994) 512-520.

[44] N. Karmarkar, "A new polynomial-time algorithm for linear programming,"


Combinatorica 4 (1984) 373-395.

[45] M. Kojima, N. Megiddo, T. Noma, and A. Yoshise, "A unified approach to


interior point algorithms for linear complementarity problems," Lecture Notes
in Computer Science 538 (Springer-Verlag, Berlin, 1991).

[46] M. Kojima, N. Megiddo, and Y. Ye, "An interior point potential reduction
algorithm for the linear complementarity problem," Math. Prog. 54 (1992) 267-
279.
Potential Reduction Algorithms 155

[47] M. Kojima, S. Mizuno, and A. Yoshise, "An O(.,fiiL) iteration potential re-
duction algorithm for linear complementarity problems," Math. Prog. 50 (1991)
331-342.
[48] C. McDiarmid, "On the improvement per iteration in Karmarkar's algorithm
for linear programming," Math. Prog. 46 (1990) 299-320.
[49] N. Megiddo and M. Shub, "Boundary behavior of interior point algorithms in
linear programming," Math. Oper. Res. 14 (1989), 97-146
[50] J .E. Mitchell, "Updating lower bounds when using Karmarkar's projective al-
gorithm for linear programming," JOTA 78 (1993) 127-142.
[51] J.E. Mitchell and M.J. Todd, "On the relationship between the search directions
in the affine and projective variants of Karmarkar's linear programming algo-
rithm," in Contributions to Operations Research and Economics: The Twentieth
Anniversary of CORE, B. Cornet and H. Tulkens, editors, MIT Press (Cam-
bridge, MA, 1989) 237-250.
[52] J .E. Mitchell and M.J. Todd, "A variant of Karmarkar's linear programming
algorithm for problems with some unrestricted variables," SIAM J. Matrix Anal.
Appl. 10 (1989) 30-38.

[53] S. Mizuno, "A rank one updating algorithm for linear programming," The Ara-
bian Journal for Science and Engineering 15 (1990) 671-677.

[54] S. Mizuno, "O( n P L) iteration O( n 3 L) potential reduction algorithms for linear


programming," Linear Algebra Appl. 152 (1991) 155-168.

[55] S. Mizuno, M. Kojima, and M.J. Todd, "Infeasible-interior-point primal-dual


potential-reduction algorithms for linear programming," SIAM J. Opt. 5 (1995)
52-67.

[56] S. Mizuno and A. Nagasawa, "A primal-dual affine scaling potential reduction
algorithm for linear programming," Math. Prog. 62 (1993) 119-131.
[57] R.D.C. Monteiro, "Convergence and boundary behavior of the projective scaling
trajectories for linear programming," Contemporary Mathematics 114 (1990)
213-229.
[58] R.D.C. Monteiro, "On the continuous trajectories for a potential reduction al-
gorithm for linear programming," Math. Oper. Res. 17 (1992) 225-253.
[59] M. Muramatsu and T. Tsuchiya, "A convergence analysis ofa long-step variant
of the projective scaling algorithm," The Institute of Statistical Mathematics
(Tokyo, Japan, 1993); to appear in Math. Prog.
156 CHAPTER 4

[60] A.S. Nemirovskii, "An algorithm of the Karmarkar type," Soviet Journal on
Computers and Systems Sciences 25 (1987) 61-74.

[61] Y.E. Nesterov, "Long-step strategies in interior point potential-reduction algo-


rithms," Dept. SES-COMIN, University of Geneva (Geneva, Switzerland, 1993).
[62] Y.E. Nesterov and M.J. Todd, "Self-scaled barriers and interior-point methods
for convex programming," Technical Report 1091, School of OR/IE, Cornell
University (Ithaca, NY, 1994); to appear in Math. Oper. Res ..
[63] Y.E. Nesterov and M.J. Todd, "Primal-dual interior point methods for self-
scaled cones," Technical Report 1125, School of OR/IE, Cornell University
(Ithaca, NY, 1995).
[64] Y. Nesterov and A. Nemirovskii, Interior-Point Polynomial Algorithms in Con-
vex Programming (SIAM, Philadelphia, 1994).

[65] C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms


and Complexity (Prentice-Hall, 1982).

[66] M.J .D. Powell, "On the number of iterations of Karmarkar's algorithm for linear
programming," Math. Prog. 62 (1993) 153-197.
[67] A.E. Steger, "An extension of Karmarkar's algorithm for bounded linear pro-
gramming problems," M.S. Thesis, State University of New York (Stonybrook,
NY, 1985).
[68] D.F. Shanno, "Computing Karmarkar projections quickly," Math. Prog. 41
(1988) 61-71.
[69] D. Shaw and D. Goldfarb, "A path-following projective interior point method
for linear programming," SIAM J. Opt. 4 (1994) 65-85.
[70] K. Tanabe, "Centered Newton method for mathematical programming," Lecture
Notes in Control and Information Sciences 113 (Springer-Verlag, Berlin, 1988)
197-206.
[71] M.J. Todd, "Exploiting special structure in Karmarkar's linear programming
algorithm," Math. Prog. 41 (1988) 97-113.
[72] M.J. Todd, "Improved bounds and containing ellipsoids in Karmarkar's linear
programming algorithm," Mathematics of Operations Research 13 (1988) 650-
659.
[73] M.J. Todd, "The effects of degeneracy and null and unbounded variables on vari-
ants of Karmarkar's linear programming algorithm," in Large Scale Numerical
Optimization, T.F. Coleman and Y. Li, editors (SIAM, Philadelphia, 1990).
Potential Reduction Algorithms 157

[74] M.J. Todd, "A Dantzig-Wolfe-like variant of Karmarkar's interior-point linear


programming algorithm," Operations Research 38 (1990) 1006-1018.

[75] M.J. Todd, "On Anstreicher's combined phase I-phase II projective algorithm
for linear programming," Math. Prog. 55 (1992) 1-15.

[76] M.J. Todd, "Combining phase I and phase II in a potential reduction algorithm
for linear programming," Math. Prog. 59 (1993) 133-150.

[77] M.J. Todd, "Interior-point algorithms for semi-infinite programming," Math.


Prog. 65 (1994) 217-245.

[78] M.J. Todd, "Potential-reduction methods in mathematical programming,"


School of IE/OR, Cornell University (Ithaca, NY, 1995); to appear in Math.
Prog.

[79] M.J. Todd and B.P. Burrell, "An extension of Karmarkar's algorithm for linear
programming using dual variables," Algorithmica 1 (1986) 409-424.

[80] M.J. Todd and Y. Ye, "A centered projective algorithm for linear programming,"
Math. Oper. Res. 15 (1990) 508-529.

[81] L. Tunc;el, "Constant potential primal-dual algorithms: a framework," Math.


Prog. 66 (1994) 145-159.

[82] L. Vandenberghe and S. Boyd, "Positive definite programming," Dept. of Elec-


trical Engineering, Stanford University (Stanford, CA, 1994); to appear in SIAM
Review.

[83] Y. Ye, "A class of projective transformations for linear programming," SIAM
J. Compo 19 (1990) 457-466.

[84] Y. Ye, "A 'build down' scheme for linear programming," Mathematical Pro-
gramming 46 (1990) 61-72.

[85] Y. Ye, "An O(n 3 L) potential reduction algorithm for linear programming,"
Math. Prog. 50 (1991) 239-258.

[86] Y. Ye, "A potential reduction algorithm allowing column generation," SIAM J.
Opt. 2 (1992), 7-20.

[87] Y. Ye, "A fully polynomial-time approximation algorithm for computing a sta-
tionary point of the general LCP," Math. Oper. Res. 18 (1993) 334-345.
[88] Y. Ye, "On the complexity of approximating a KKT point of quadratic pro-
gramming," Dept. of Management Sciences, University of Iowa (Iowa City, lA,
1995).
158 CHAPTER 4

[89] Y. Ye and M. Kojima, "Recovering optimal dual solutions in Karmarkar's poly-


nomial algorithm for linear programming," Math. Prog. 39 (1987) 305-317.
[90] Y. Ye, K.O. Kortanek, J.A. Kaliski, and S. Huang, "Near-boundary behavior
of primal-dual potential reduction algorithms for linear programming," Math.
Prog. 58 (1993) 243-255.

[91] Y. Ye, M.J. Todd, and S. Mizuno, "An O(foL)-iteration homogeneous and
self-dual linear programming algorithm," Math. Oper. Res. 19 (1994) 53-67.
5
INFEASIBLE-INTERIOR-POINT
ALGORITHMS
Shinji Mizuno
Department of Prediction and Control,
The Institute of Statistical Mathematics,
Minato-ku, Tokyo 106, Japan

ABSTRACT
An interior-point algorithm whose initial point is not restricted to a feasible point is called
an infeasible-interior-point algorithm. The algorithm directly solves a given linear program-
ming problem without using any artificial problem. So the algorithm has a big advantage
of implementation over a feasible-interior-point algorithm, which has to start from a feasi-
ble point. We introduce a primal-dual infeasible-interior-point algorithm and prove global
convergence of the algorithm. When all the data of the linear programming problem are
integers, the algorithm terminates in polynomial-time under some moderate conditions of
the initial point. We also introduce a predictor-corrector infeasible-interior-point algorithm,
which achieves better complexity and has superlinearly convergence.

5.1 INTRODUCTION
A linear programming problem is to find an optimal solution, which minimizes an
objective function under linear equality and inequality constraints. A point is called
feasible if it satisfies all the linear constraints, and called infeasible otherwise. A
point, which satisfies the inequality constraints but may not satisfy the equality con-
straints, is called interior. An interior-point algorithm solves the linear programming
problem by generating a sequence of interior points from an initial interior point. A
natural question is how to prepare the initial point?

In 1980's since the announcement of a projective interior-point algorithm by Kar-


markar [2], interior point algorithms were developed under the assumption that the
initial point is feasible and interior. For a general linear programming problem,

159
T. Terlaky (ed.), Interior Point Methods ofMathematical Programming 159-187.
© 1996 Kluwer Academic Publishers.
160 CHAPTER 5

however, computing a feasible point is as difficult as computing an optimal solution.


So we need to construct an artificial problem having a trivial feasible interior point.
Such an artificial problem requires additional variables and big constants to assure
an equivalence between the original problem and the artificial problem.

An interior point algorithm, whose initial point may not be feasible, was introduced
by Lustig [6] and Tanabe [23]. The algorithm is a simple variant of a primal-dual
interior-point algorithm developed by Megiddo [10], Kojima et al. [4, 5], Monteiro
and Adler [17], and Tanabe [22]. For any linear programming problem, it is very
easy to get an initial interior point by using slack variables. Lustig et al. [7, 8] and
Marsten et al. [9] reported that such an algorithm was practically efficient among
numerous algorithms. In this paper, we call an algorithm, which solves a given linear
programming problem directly without using any artificial problem and generate a
sequence of feasible or infeasible interior points from an arbitrary interior-point, an
infeasible-interior-point algorithm or an lIP algorithm simply.

Global convergence of an infeasible-interior-point algorithm was proved by Kojima


et al. [3]. The algorithm starts from an arbitrary interior-point and utilizes different
step sizes for primal and dual variables such as the practically efficient algorithm
proposed by Lustig et al. [7, 8]. The algorithm finds an approximate solution if a
given problem is feasible, or it detects that there are no solutions of the problem in a
wide region defined in advance. Zhang [26] proposed an O( n 2 L )-iteration infeasible-
interior-point algorithm under the assumption that the problem is feasible and all
the data are integers, where nand L denote the number of variables and the size of
all the data respectively. Then Mizuno [11] showed that a variant of the algorithm
[3] achieves the same complexity without the feasibility condition. Mizuno [11] also
proposed an O( nL )-iteration algorithm by using the predictor-corrector technique.
The infeasible-interior-point algorithms mentioned above use the path of feasible
centers. Potra [19, 20] and Stoer [21] proposed another type of infeasible-interior-
point algorithms, which trace a path of infeasible centers. This type of algorithm
possesses nice theoretical properties such as O( nL )-iteration complexity and super-
linear convergence.

In this Chapter, we present two generic algorithms named Algorithms A and B.


Algorithm A is a variant of the practically efficient algorithm proposed by Lustig
et al. [7, 8]. It uses the path of feasible centers to compute the search direction,
and it may take different step sizes for the primal and dual variables. Algorithm
B was proposed as a theoretically efficient algorithm. It uses a surface of infeasible
centers and takes a same step size for both primal and dual variables. We also present
Algorithms Al and Bl which belong to Algorithms A and B respectively. In the next
section, we introduce Algorithm A. In Section 5.3, we presents Algorithm Al and
prove global convergence of Algorithm Al from arbitrary initial interior point. Then
lIP Algorithms 161

we show in Section 5.4 that the number of iterations is bounded by a polynomial


function of nand L by taking a special initial point. In Section 5.5, we introduce
a surface of centers and present Algorithm B, which traces a path on the surface.
Then we propose Algorithm B1 in Section 5.6, which is called a predictor-corrector
algorithm. We show convergence properties of Algorithm B1 in Section 5.7. Finally
we survey infeasible-interior-point algorithms in Section 5.8.

5.2 AN liP ALGORITHM USING A PATH OF


CENTERS
Let n 2: m > 0 be integers. For an m x n matrix A and vectors bERm and cERn,
we define a primal linear programming problem

mllllmize cT x,
(5.1)
subject to =
Ax b, x 2: 0,

where x E R n is an unknown vector. Assume that the rank of A is m and the system
=
Au b has a solution u. The dual problem of (5.1) is defined as

maximize bT y,
(5.2)
subject to AT y + s = c, s 2: 0,

where y E R m and s E R n are unknown vectors. If x is an optimal solution of (5.1)


and (y, s) is an optimal solution of (5.2) then the following conditions hold:

(5.3)

where X := diag(x) denotes a diagonal matrix whose each diagonal element is equal
to the element Xi of x E Rn. The problem for finding a solution (x, y, s) of (5.3)
is called a primal-dual linear programming problem. Conversely if (x, y, s) is a
solution of the primal-dual problem, x and (y,s) are optimal solutions of (5.1) and
(5.2) respectively. We call a point (x, y, s) interior if x > 0 and s > 0 and call it
feasible if it satisfies Ax = b, ATy + s = c, and (x, s) 2: O.

We introduce a path of centers for the primal-dual problem (5.3). The path runs
through the feasible region, and one of the end points is on the solution set of the
problem. The path is very important to understand interior-point methods. For
162 CHAPTER 5

each p > 0, we consider a system of equations:

(5.4)

where e := (1,1, ... , If E Rn. Suppose that the problem (5.3) has an interior point.
Then the system (5.4) has a unique solution, which we denote by (x(p), yep), s(p)).
The center is clearly feasible. Let PI be the set of centers

PI := {(x(p), yep), s(p)) : p > O}.


The set PI forms a smooth path. We call (x(p), yep), s(p)) a feasible center and PI
a path of feasible centers. If we take p = 0, the system (5.4) is equivalent to the
system (5.3). So the solution (x(p), yep), s(p)) of (5.4) approaches to the solution
set of the problem (5.3) as p goes to O. Generating a sequence of approximate points
of (x(pk), y(pk), s(pk)) for a sequence {pk}, which converges to 0 as k ---. 00, we
obtain an approximate solution of (5.3). We note that PI is empty if the problem
(5.3) does not have interior points.

Now we introduce an infeasible-interior-point algorithm for solving the primal-dual


linear programming problem (5.3). The algorithm generates a sequence of parame-
ters pk and a sequence of interior points (x k , yk, sk). Let (xO, yO, SO) be an interior
point of (5.3), which serves as an initial point of our algorithm. The point is not nec-
essary feasible. Set pO := (xO)T SO In. Suppose that pk and (x k , yk, sk) are available.
Then we choose a parameter value p' ::; pk. In order to compute an approximation
of (x(p'), yep'), s(p')), we compute the Newton direction (~x, ~y, ~s) of the sys-
tem of equations (5.4) for p = p' at the current point (x k , yk, sk). The direction is
computed as the solution of the system of equations

(5.5)

where Xk := diag(x k) and Sk := diag(sk). We compute a new interior point


(xk+ l , yk+l, sk+l) by using a primal step size ap and a dual step size aD:
(X k+l , yk+!, sk+!) := (x k , yk, sk) + (ap~x, aD~y, aD~s). (5.6)

Then we set pk+1 := (X k+1f sk+1 In, because this pk+1 attains the minimum of the
residual for the third equality in (5.4), i.e.
IIXk+ 1Sk+1 - pk+! ell ::; IIXk+ 1 Sk+1 - pell for any p > 0,
where II . II without subscript denotes the Euclidean norm. We also use IIxliI
2:7=1 IXil and Ilxlloo := maxdlx;I}· The algorithm is summarized as follows.
lIP Algorithms 163

Algorithm A: Let (x O, yO, sO) be an initial interior point. Set k := 0 and j.J0
(xO)T sO In.

Step 1: Choose a parameter value j.J' :::; j.Jk.

Step 2: Compute the solution (Llx, Lly, Lls) of the system (5.5).

Step 3: Compute step sizes ap and aD and a next point (xk+l, yk+ 1, Sk+l) by (5.6).
Set j.Jk+1 := (xk+l)T sk+ 1In.

Step 4: Set k := k + 1 and go to Step 1.

We shall show that any iterate generated by Algorithm A lies on an affine subspace,
which includes the initial point (x O, yO, SO) and the feasible region of (5.3).

Lemma 5.2.1 Let {(xk, yk, sk)} be a sequence gentrated by Algorithm A. We have
that
A(x k + aLlx) - b = (1 - a)(Axk - b),
=
AT(yk + ally) + (sk + aLls) - c (1- a)(ATyk + Sk - c)
for each a. If we set Ofj, := 1 and O~ := 1 and compute

0~+1 := (1 - ap)OJ, and 0~+1 := (1- aD)B1>

at Step 3 of Algorithm A for each k, then

=
Axk - b OJ,(Ax o - b), (5.7)
=
AT yk + sk _ C 01> (AT yO + sO - c). (5.8)

Proof. Since (Llx, Lly, Lls) is a solution of (5.5), we have that

A(x k + aLlx) - b AXk + aALlx - b


Axk + a(-Axk + b) - b
(1- a)(Axk - b).

Similarly we get the second equality in the lemma. Since xk+l = xk + apLlx and
(yk+1,sk+1) = (yk,sk) + aD(Lly,Lls) in each iteration, we can prove the latter
assertion in the lemma by using induction. D

We can construct various infeasible-interior-point algorithms by specifying a method


for computing the parameter value j.J' at Step 1 and the step sizes ap and aD at
164 CHAPTER 5

Step 3 in Algorithm A. Here we introduce a simple algorithm, which is based on the


one proposed by Lustig et al. [7). Let A E (0,1) and q E (0,1). At Step 1, we choose
p' = Apk. At Step 3, we firstly compute the maximize step sizes ap and aD which
preserve the nonnegativity:

ap := max{ a' : x + aLlx 2: 0 for any a E [0, a')},

aD := max{ a' : s + aLls 2: 0 for any a E [0, a1}.


Then we choose ap := qap and aD := qaD. Lustig et al. [8) reported that this
algorithm was very efficient in practice when the values A and 1 - q are small.
However we can not prove convergence of the algorithm theoretically.

5.3 GLOBAL CONVERGENCE


Since we do not know whether the primal-dual problem (5.3) is feasible or not, we
have to detect the feasibility of the problem. Let p be a big positive constant and (,
(p, and (D be small positive constants. Suppose that we are interested in a solution
(x, y, s) of (5.3) only in the set

Bp:= {(x,y,s): II(x,s)lloo::; pl·


The parameter p is used in algorithms only for detecting the nonexistence of solutions
in B p , so we may set p := 00. If all the data A, b, and e are integers and L denotes
the size of the data, the problem (5.3) has a solution in Bp for p := 2L or the problem
is infeasible. Our algorithm finds an approximate solution (x, y, s) of (5.3) such that

(5.9)

or detect that the problem (5.3) has no solutions in Bp.

Let 1 E (0,1) be a constant such that

For ~ E (0,1], we define the set

N·- {(x,y,s):x>O, s>O, Xs2:1peforp=x T s/n,


x T sllbll 2: ~npoliAx - bll, x T sllcll 2: ~npoliAT y + s - ell},
where
(5.10)
lIP Algorithms 165

The set N contains the initial point (xO, yO, sO) and includes the path Pl. We generate
a sequence {(xk, yk, sk)} in the set N by Algorithm A. Then if (xk)T sk -- 0, the
sequence approaches to the solution set of the primal-dual problem (5.3) because
IIAxk -bll-- 0 and IIATyk +sk -cll-- O. The condition Xs ~ I"e in N assures that
any point in N is well separated from the boundary of the feasible region except for
the solution set.

We are ready to state a global convergent lIP algorithm for solving the primal-dual
problem (5.3). The algorithm belongs to Algorithm A.

Algorithm AI: Let (xO,yO,SO) be an initial interior point. Choose the parameter
values p, f, fp, fD, I, e,
and A E (0,1). Set k := 0, (}r;, := 1, (}fjy := 1, and
,,0 := (xOf sO In.

Step 1: If the conditions in (5.9) hold true at the current iterate (xk, yk, sk) then
output it as an approximate solution and stop. If

(}}.(xOf sk + (}1( s of xk
(5.11)
> p«(}}.e T X O + (}.b eT SO) + (xk)T sk + (}t(}.b(xO)T so,
then stop. Otherwise set ,,' = A"k.
Step 2: Compute the solution (~x, ~y, ~s) of the system (5.5).

Step 3: Compute

Let a be the value of a which attains the minimum. Choose any step sizes
ap ~ 0 and aD ~ 0 such that

(x k+l , yk+l, sk+l) := (x k + ap~x, yk + aD~y, sk + aD~s) E N,


"k+l ::; j1 for "k+l := (xk+ l f sk+l In.

Set (}~+l := (1- ap)()}. and (}~+l := (1- aD)(}.b.

Step 4: Set k := k + 1 and go to Step 1.

The algorithm stops in two cases at Step 1. In the former case we get an approximate
solution, while in the latter case we detect an infeasibility of the problem. If we
use p = 00, the algorithm generates a infinite sequence of points unless we get an
approximate solution. In Theorem 5.3.1 below, we shall show that Algorithm 1
terminates in a finite number of iterations if the problem is feasible.
166 CHAPTER 5

At Step 3 of Algorithm A, we try to decrease the value of (x k + a6xl(sk + a6s)


as much as possible under the condition that the next iterate is also in N when the
step size of primal variables is equal to that of dual variables. We use different step
sizes ap and aD if we could decrease the value more than that. Otherwise we should
use ap = a and aD = a at Step 3. Using Lemma 5.2.1, the value of jl and a is
computed by solving at most n + 3 quadratic equations.

Now we state the global convergence of Algorithm Al.

Theorem 5.3.1 Suppose that the parameters A, ~, and 1 are independent of the
data. If p is finite, Algorithm A1 terminates in a finite number of iterations, which
=
depends on the initial point, a solution of Au b and AT v + w =
c, p, c, cp, cD, and
n. If the condition (5.11) holds true at some iteration k, the primal-dual problem
=
(5.3) has no solutions in Bp. If p 00 and the problem (5.3) is feasible, Algorithm
A1 terminates in a finite number of iterations, which depends on the initial point,
an optimal solution, c, cp, cD, and n. If p = 00 and Algorithm A1 generates an
infinite sequence, the sequence is unbounded and the problem (5.3) is infeasible.

If we are interested in a solution only in a bounded region, we should use an ap-


propriate value of p. If we want to find a solution whenever the problem is feasible,
we should use p := 00, so that the condition (5.11) is nonsense. Then the iterates
generated by Algorithm Al may become very big. Nevertheless we could find an
approximate solution in a finite number of iterations if the problem is feasible. We
utilize 4 lemmas below to prove this theorem.

Lemma 5.3.2 Suppose that (xk,yk,sk) E N,

(5.12)

for some TJl >0 and TJ2 >0 at each iteration of Algorithm A1. Define

a *._ .
.-mln, {I A
.5(I-A) ,-, (1-1)A} .
TJl TJl nTJ2

If a E [0, a*] then

(x k + a6xf(sk + a6s) S; (1 - .5a(1- A»(xk)T sk (5.13)

and
lIP Algorithms 167

Proof. From the third equality in (5.5), we have that

(skf Llx + (xkf Lls = nJ.l' - (xk)T sk.

=
Using this equality, J.lk (xkf sk In, J.l' = AJ.lk, (5.12), and a :5 .5(1 - )..)/"11, we get
the inequality (5.13) as follows:

(x k + aLlxf(sk + aLls) (xkf sk + a(nJ.l' -


(xkf sk) + a 2LlxT Lls
:5 (xkf sk + a(A - l)(x k f sk + 0'2"11 (xk)T sk
:5 (1 - .50'(1- A»(xkf sk.
The condition (x k , yk, sk) E N implies

xf sf ~ -Yf-lk for each i, (5.14)


(xk)T skllbll ~ enf-l°IIAxk - bll, (5.15)
(xk)T skllcll ~ enf-l°Il AT yk + sk - ell.
We shall show that these conditions hold true at (x k , yk, sk) + a(Llx, Lly, Lls). Using
(5.12), (5.14), and a :5 (1 - -Y)A/(n'TJ2),

(x~ + aLlxi)(s~ + aLls;) - -y(x k + aLlxf(sk + aLls)/n


xf sf
+ a(1-l' - xf sf)
+ 0'2 Llx;Lls;
--y«xkf sk + a(nf-l' - (xkf sk) + 0'2 LlxT Lls)/n
(1 - a )xr s~ + a)"J.lk + 0'2 Llx;Lls;
--y(1 - a)f-lk - -yaAf-lk - -ya 2LlxT Lls/n
> 0'(1 - -Y)Af-lk - 0'2"12 (xkl sk
> O.

Since this inequality holds true for any a E [0,0'*], we also see that x~ + aLlxi > 0
and s~ + aLls; > 0 by the continuity with respect to a. Using Lemma 5.2.1, (5.15),
and a :5 )../"11,

(x k + aLlxf(sk + aLls)llbll- enf-l°IIA(xk + aLlx) - bll


«1- a)(xkf sk + a)..(xkf sk + a 2LlxT Lls)llbll- enJ.l°(1- a)IIAxk - bll
~ (a)..(xkf sk - a2'TJ1(xk)T sk)llbll
~ 0
and similarly

(x k + aLlxf(sk + aLls)llcll- enf-l°IlAT(yk + ally) + (sk + aLls) - ell ~ O.

We have shown that (x k , yk, sk) + a(Llx, Lly, Lls) E N. o


168 CHAPTER 5

From Lemma 5.3.2, the iterates generated by Algorithm Al are in N, and the step
sizes a computed at Step 3 is greater than or equal to 0'*, which does not depend
on k. From Step 3 of Algorithm Al and Lemma 5.3.2, we see that

(x k +1f Sk+l ~ nil


~ (1 - .50'*(1 - A))(xk)T sk.

If 0'* is bounded away from 0 then (xk)T sk -> 0 as k -> 00. In order to obtain a
lower bound of 0'* , we estimate the magnitude of 7]1 and 7]2 in the next three lemmas.

Lemma 5.3.3 The solution of (5.5) satisfies

D-1D.x = -OJ,QD-l(xO - u) + 01 (I - Q)D(sO - w)


-(1 - Q)(XS)-·5(XS k -/-l'e),
D.y = -01(yO -
v) - (AD2 AT)-l AD(0j,D-1(xO - u) + 01>D(sO - w)
-(XS)-·5(Xsk -/-l'e)),
DD.s = 0j,QD-1(xO - u) - 01 (I - Q)D(sO - w) - Q(XS)-·5(X sk -/-l'e),

where X := diag(xk), S := diag(sk), D := x


5 S-·5, Q := DAT (AD2 AT)-l AD and
(u, v, w) is a solution of Au = b and AT v + w = c. Moreover

IID-1D.xll ~ OJ,IIS(xO - U)~IIX(SO - w)1I + (1 + ~) In/-lk, (5.16)

IIDD.sll ~ OJ,IIS(xO - U)~"X(SO - w)1I + (1 + ~) In/-l k . (5.17)

Proof Suppose that (D.x, D.y, D.s) is expressed as in the lemma. Since ADQ = AD
=
and AD(I - Q) 0, we see that

AD.x AD( D- 1 D.x)


-OJ,A( xO - u)
-Axk +b
by using Au = band (5.7). Similarly we have that AT D.y + D.s = _AT yk - sk +c
and that

SD.x + XD.s (XS)·5(D- 1D.x + DD.s)


_(Xsk -/-l'e).
IIP Algorithms 169

So (,6.x, ,6.y, ,6.s) is the solution of (5.5). Since Q and I -Q are orthogonal projections,
we have that

IID- 1,6.xll < IIO},D-1(xO - u)1I + IlotD(sO - w)11 + II(XS)-5(Xs k - p'e)11


< II(XS)-511(O}'IIS(xO - u)11 + otllX(so - w)ID
+11(XS)5 e - pl(XS)- 5ell,

where the norm IIX/II of a diagonal matrix XI is equal to the maximum absolute value
of the diagonal elements. Since pi = )..pk and Xsk 2: 'Ypke, we have II(XS)-511 :::
1/ J'Ypk and
II(XS)5 e - P'(XS)-5ell < II(XS)5 ell + p / ll(XS)- 5e ll
< ~ + )..pky'n/';:;-;;;;
(1 + )..Iv:r)~.
Hence we have shown the inequality (5.16). Similarly we obtain the bound of IID,6.sll
as in (5.17). 0

Since IID-l,6.xll and IID,6.sll are bounded by (5.16) and (5.17), we shall obtain an
upper bound of the first term in the right side of them.

Lemma 5.3.4 If the condition {5.11} holds true, the primal-dual problem {5.3} has
no solutions in Bp. If the condition {5.11} does not hold true at k-th iteration then

where X := diag(xk), 5 := diag(sk), and

for a solution of Au = b and AT v + w = c.

Proof Suppose that the primal-dual problem (5.3) has a solution (x', y*, s*) in B p ,
that is, II(x*, s*)lloo ::: p. From Lemma 5.2.1,

O},AxO + (1- O},)b - (b + O},(AxO - b))


o
170 CHAPTER 5

and similarly

So we have that

which implies

(O~xO + (1- O~)x'l sk + (01s0 + (1- (1)s'l xk (5.18)


= (O~xO + (1- O~)x'l(OhsO + (1- 0h)s*) + (xk)T sk.
Using this equality, 0::; x' ::; pe, 0::; s' ::; pe, and (x*)T s* = 0, we obtain

So (5.11) does not hold true, and we have proved the first assertion.

Now suppose that the condition (5.11) does not hold true, that is, (5.19) holds true.
Then we have that

O~IIS(xO - u)11 + 0hIIX(so -


w)11
< 0~IISxOIIII(XO)-1(xO - u)lloo + OhIIXsollll(So)-1(sO - w)lloo
< K1(0~IISxoll + OhllXsolD
< K1(0~(sk)T x O + Oh(xk)T sO)
::; K1(0~(xOl(pe) + 01(sOl(pe) + O~Oh(xOl sO + (xkl sk).

From (5.7), (5.8), and (xk,yk,sk) E N, we have that

(xkls k ?f.O~nJ-l° and (xk)Tsk ?f.01nJ-l°. (5.20)

Hence

O~IIS(xO - u)11 + 0hIIX(sO - w)11


peT xO peT sO Oh(xO? sO ) kT k
::; K1 ( T O + T O + f. ° +1 (x) s .
.. nJ-l .. nJ-l nJ-l
Since Oh ::; 1 and J-l 0 = (xO? sO In, we obtain the inequality in the lemma. 0

By using Lemmas 5.3.3 and 5.3.4, we shall get the values of 1/1 and 1/2 defined in
Lemma 5.3.2.
IIP Algorithms 171

Lemma 5.3.5 If the condition (5.11) does not hold true at k-th iteration then

l~xT ~sl ~ TJ(xkf sk and I~Xi~Si - l~xT ~s/nl ~ (1 + I)TJ(xkf sk,

Proof Using Lemmas 5.3.3 and 5.3.4,

Since we have the same bound for IID~sll. we see that

I~Xi~Si-l~xT~s/nl < (1+I)m!lXI~Xi~Sil


,
< (1 + 1)IID- 1 ~xIIIlD~sll
< (1 + I)TJ(xkf sk.
Similarly we obtain that l~xT ~sl ~ IID-l~xIlIID~sll ~ TJ(xkf sk. o

We are ready to prove Theorem 5.3.1.

Proof of Theorem 5.3.1: Suppose that p is finite and the condition (5.11) does
not hold true throughout Algorithm AI. Then TJ defined in Lemma 5.3.5 is finite.
From Lemmas 5.3.2 and 5.3.5 we have that

for each k, where


• . {1 - >. (1 - I)>' }
a := mm ~' n(1 +I)TJ .
Hence we have that
172 CHAPTER 5

IIAxk - bll < (Xk)T skllbll/(enpo)


< (1- .5a*(I- A»"'lIbll/e
and similarly

Hence if
k > max {In((xO)T sO I f), In(lIbll/(efP», In(lIcll/(efD »}
(5.21 )
- .5a*(1 - A)
then the conditions (5.9) hold true. The right side is finite and depends on the point
(u, v, w), the initial point (xO, yO, SO), p, i, ip, iD, and n.

As stated in Lemma 5.3.4, if the condition (5.11) holds true at some iteration, there
are no solutions of (5.3) in Bp.

If the primal-dual problem is feasible, there exists a solution (x', y*, SO) of it. So
the condition (5.11) does not hold true throughout Algorithm Al if p 2: pi
lI(x*, s*)lIoo· Hence Algorithm Al terminates in a finite number of iterations, which
depends on (u,v,w) := (x',y',s*), (XO,yO,sO), pi, i, ip, iD, and n, by using the
same argument above.

Suppose that Algorithm Al generates an infinite sequence of points (xk, yk, sic). If
/-I k --> 0 then the algorithm terminates by the conditions in (5.9). So pk is bounded
away from o. If (x lc , yk, sic) is bounded then they are bounded away from 0 because
x~ s~ 2: '/-I k • Thus D and D- 1 are bounded. Hence ~x and ~s are bounded by
Lemma 5.3.3. Therefore a* in Lemma 5.3.2 is bounded away from 0, that is, pic goes
to 0, and we have derived a contradiction. 0

5.4 POLYNOMIAL TIME CONVERGENCE


In this section, we assume that the size of the data A, b, and c is L. We shall show
that the number of iterations required in Algorithm Al is bounded by O(n 2 L), when
we choose an appropriate initial point and a parameter e. We also show that the
number is bounded by O( nL) if the initial point is feasible or almost feasible.

Let PO > 0 be a constant such that the system


Au = b, AT V + w = c, lI(u,w)lIoo::5 Po (5.22)
IIP Algorithms 173

has a solution. It is well known that this system has a solution for Po = 2L, so we
assume that Po ::; 2L. We may compute a smaller Po than 2L by solving a simple
minimization problem without inequality constraints:

min //(u, w)// subject to Au = b, AT v + w = c.


We obtain the following polynomial-time bound of Algorithm AI.

Theorem 5.4.1 Suppose that we choose ~ = 1, p E [po,2 L ], ( ~ 2- L , (p ~ 2- L ,


(D~ 2- L , and the initial point

Then Algorithm Al terminates in O(n 2 L) iterations.

We use the following lemma to prove the theorem.

Lemma 5.4.2 Under the conditions in Theorem 5.4.1, if the condition (5.11) does
not hold true at k-th iteration of Algorithm Al then

11:1 ::; 2 and TJ::; 100n/'y,

where 11:1 and TJ are defined in Lemmas 5.3.4 and 5.3.5 respective/yo

Proof Let (u, v, w) be a solution of (5.22). Then we have that


o ::; xO - u ::; pe + poe::; 2x o,
0::; sO - w::; pe + poe::; 2so,
which imply
11:1 := //((XO)-l(xO - u), (5 0 )-1(sO - w))//oo ::; 2.
Since xO = pe, sO = pe, and ~ = 1, we have that

(~ (2n~2 + 1 + 1) Vn + 1 + ~)2
v'1 np v'1
< 100n/'y.
o

Proof of Theorem 5.4.1: From Lemmas 5.3.2,5.3.5, and 5.4.2, we have that
(xk+ 1f sk+1 ::; (1 _ .50:*(1- A»(xk)T sk
174 CHAPTER 5

for
* . {I(I->.) 1(1-,)>, }
a := mill 200n' 100n 2 (1 + I) .
Using the same argument as in the proof of Theorem 5.3.1, the number of iterations
of Algorithm Al is bounded by the right side of (5.21), which is O(n 2 L) from the
parameter values in Theorem 5.4.1 and the value of a* above. 0

It is well known that if the initial point is feasible, Algorithm Al requires at most
O(nL) iterations, see for example Mizuno et al. [15]. Here we shall get a sufficient
condition to achieve the O( nL) iteration complexity of Algorithm AI.

Theorem 5.4.3 Let fJ > 0 be a constant independent of the data. Suppose that the
parameter values~, c, cp, en, and p are as in Theorem 5.4.1. For a given initial
point (xO,yO,sO) EN, if there exists a solution (u,v,w) of Au = band ATv+w = c
such that

(5.23)
Algorithm A1 terminates in O( nL) iterations.

Note that if (xO, yO, sO) = p( e, 0, e) in addition, this theorem easily follows from the
proof of Theorem 5.4.1.

Proof From the condition (5.23), .5xo :::; u :::; 1.5xo and .5so :::; w :::; l.5so. Note that
the relation (5.18) holds true not only for the optimal solution (x*, y*, SO) but also
for the point (u, v, w). So we have that
.5(x Of sk
+ .5(sO)T xk
< (e~xO + (1 - e~ )u)T sk + (e1s0 + (1 - e1)wf xk
(e~xo + (1- e~)u)T(e1so + (1- e1)w) + (xkf sk
< (l.5xOf(l.5so) + (xkf sk
< 3.25(xO)T sO.

Using this inequality and (5.23), we see that

B~IIS(xO
- u)11 + e11lX(so - w)11
< (fJ/y'n)e~IISxoll + (6/y'n)e11IXsoll
< (fJ/y'n) max{e~, e1}((skf xO + (xkf sO)
< (fJ/y'n) max{e~, e1}6.5(xof sO
< 6.56(x k f sk /y'n.
lIP Algorithms 175

where the last inequality follows from e= 1 and (5.20). From Lemma 5.3.3,
IID- I .6.xll < 6.58..jnpk/v'f+ (1 +)../v'f)Jnpk
< (2 + 6.58)Jnpk/v'f. (5.24)
We have the same bound for IID.6.sll. Using the same argument in the proof of
Lemma 5.3.5,
l.6.xT .6.sl $ «2+6.58)2h)(x k ls k ,
I.6.Xj.6.sj -,.6.xT .6.s/nl $ «1 + ,)(2 + 6.58)2h)(x kl sk.
By Lemmas 5.3.2, we have that
(xk+If sk+ I $ (1- .5a*(1- )..»(xkl sk

for
* . { ,(1 - )..) ,(1 - , ».. }
a := mill 2(2 + 6.582)' (2 + 6.58)2(1 + ,)n .
Then the number of iterations of Algorithm Al is bounded by the right side of
(5.21), which is O(nL) from the parameter values in Theorem 5.4.3 and the value of
a* above. 0

In Theorem 5.4.3, we have got a condition of the initial point to achieve O( nL) iter-
ation complexity of Algorithm AI. Mizuno [11) showed that a variant of Algorithm
Al terminates in O( nL) iterations under the condition of initial point given in The-
orem 5.4.1. The variant uses a predictor-corrector technique. Here we do not show
the variant, but we introduce a predictor-corrector algorithm, which is a variant of
Algorithm B given in the next section, and we prove its O(nL) iteration complexity
in Sections 5.6 and 5.7.

5.5 AN lIP ALGORITHM USING A SURFACE


OF CENTERS
Let (xO, yO, sO) be an interior point of (5.3). Let band e be defined by (5.10). For
each fixed B, we consider a perturbed linear programming problem
mlllImIze (c - Be)T x,
(5.25)
subject to Ax = b - Bb, x;::: 0,
its dual problem
maxImIze (b - Bb)Ty,
(5.26)
subject to AT Y + s = c - Be, s;::: 0,
176 CHAPTER 5

and its primal-dual problem

(x, s) 2: o. (5.27)

Note that if B = 0 then the problems (5.25), (5.26), and (5.27) coincide with the
original problems (5.1), (5.2), and (5.3) respectively, and if B = 1 then the initial
point (xO, yO, sO) is a feasible interior point of (5.27). A point (x k , yk, Sk) generated
by Algorithm A is a feasible point of the problem (5.27), if B~ = Band B1 = B.

Now we consider the feasibility of the primal-dual problem (5.27). It is easy to verify
that if (5.27) has interior points for two different parameter values Bl < B2, there
exists an (' > 0 such that (5.27) has an interior point for any B E (Bl - (', B2 + (').
Hence the set of parameter values, for which (5.27) has an interior point, is an open
interval (B" Bu), where B, < 1 and Bu > 1 may be -00 and 00 respectively. From the
definition, B, < 0 if and only if the original primal-dual problem (5.3) has an interior
point, and B, = 0 if and only if it has a feasible point but does not have an interior
point.

If the perturbed primal-dual problem (5.27) has an interior point, then centers of
it exist. For each B E (B t , Bu) and fJ > 0, the center (x(B, fJ), y(B, fJ), s(B, fJ)) of the
problem (5.27) is a solution of the system

(5.28)

The center (x(B, fJ), y(B, fJ), s(B, fJ)) exists uniquely for each B E (B t , Bu) and fJ > o.
We define the set of parameters

T:= {(e,fJ): B E (B"eu),fJ > O}

and the set of centers

5:= {(x(B,fJ),y(e,fJ),s(B,fJ)): (B,fJ) E T}.

The following properties of the set 5 were shown in Mizuno et al. [16].

Theorem 5.5.1 The set 5 of centers is a surface. Let {(Bk, fJk)} be a sequence on
T. When the primal-dual problem (5.3) has an interior point, the center (x(Bk, {lk),
y(B k , {lk), s(Bk, {lk)) approaches to the solution set of (5.3) if (Bk, {lk) 1 (0,0). When
(5.3) has a feasible point but not an interior point, (x(Bk, {lk), y(e k , fJk), s(e k , {lk))
lIP Algorithms 177

approaches to the solution set of (5.3) if(Ok ,pk) ! (0,0) such that pk 10 k is bounded.
For any p* > 0, if (Ok, pk) approaches (Bl, p*) then II(x(B k , pk), y(Bk, pk), s(Bk, pk)11
is unbounded.

Outline of the proof: Since the centers are solutions of the system (5.28) for (θ, μ) ∈ T, S is a surface by the implicit function theorem. Suppose that the problem (5.3) has an interior point (x*, y*, s*), where x* > 0 and s* > 0. Then we have that

   (θx^0 + (1−θ)x* − x(θ,μ))^T (θs^0 + (1−θ)s* − s(θ,μ)) = 0,

or equivalently

   (θx^0 + (1−θ)x*)^T s(θ,μ) + (θs^0 + (1−θ)s*)^T x(θ,μ)
      = (θx^0 + (1−θ)x*)^T (θs^0 + (1−θ)s*) + x(θ,μ)^T s(θ,μ),

from which we can prove that (x(θ,μ), y(θ,μ), s(θ,μ)) is bounded. Then every cluster point of (x(θ,μ), y(θ,μ), s(θ,μ)) is a solution of (5.3) if (θ,μ) ↓ (0,0). Now suppose that the problem (5.3) has a feasible optimal point (x*, y*, s*). Then we get from the equality above that

   (x^0)^T s(θ,μ) + (s^0)^T x(θ,μ) ≤ θ(x^0)^T s^0 + (1−θ)((x^0)^T s* + (s^0)^T x*) + nμ/θ.

So (x(θ,μ), y(θ,μ), s(θ,μ)) is bounded if μ/θ is bounded. Hence every cluster point of (x(θ,μ), y(θ,μ), s(θ,μ)) is a solution of (5.3) if (θ,μ) ↓ (0,0). If (θ^k, μ^k) goes to (θ_l, μ*) with μ* > 0, then an element of x(θ,μ) or s(θ,μ) goes to 0, which implies that the corresponding element of the other vector is unbounded because x_i s_i → μ* > 0. □

Using the results in Theorem 5.5.1, we construct an algorithm for solving the primal-dual problem (5.3). The algorithm generates a sequence of approximate points of (x(θ^k,μ^k), y(θ^k,μ^k), s(θ^k,μ^k)) for (θ^k,μ^k) ∈ T which converges to (0,0) as k → ∞, if the primal-dual problem (5.3) is feasible.

Let (x^k, y^k, s^k) be the current iterate. For (θ, μ) := (θ', μ'), we compute the Newton direction (Δx, Δy, Δs) of the system (5.28) at (x^k, y^k, s^k), that is, the solution of

   AΔx = −Ax^k + b − θ'b̄,
   A^TΔy + Δs = −A^Ty^k − s^k + c − θ'c̄,                  (5.29)
   S^kΔx + X^kΔs = −X^k s^k + μ'e.

While we have used different step sizes for the primal variables and for the dual variables in Algorithm A, in the following algorithm we use a single step size for all the variables, so that the iterates generated by the algorithm are feasible for the perturbed primal-dual problem (5.27).

Algorithm B: Let (x^0, y^0, s^0) be an initial interior point. Set k := 0 and μ^0 := (x^0)^T s^0 / n.

Step 1: Choose parameter values θ' ≤ θ^k and μ' ≤ μ^k.

Step 2: Compute the solution (Δx, Δy, Δs) of the system (5.29).

Step 3: Compute a step size α and the next point (x^{k+1}, y^{k+1}, s^{k+1}) := (x^k, y^k, s^k) + α(Δx, Δy, Δs). Compute θ^{k+1} and μ^{k+1}.

Step 4: Set k := k + 1 and go to Step 1.

We may regard the infeasible-interior-point algorithms proposed by Potra [19, 20], Stoer [21], Freund [1], and Mizuno et al. [14] as instances of Algorithm B.

5.6 A PREDICTOR-CORRECTOR ALGORITHM

We define a path on the surface S. Let μ^0 = (x^0)^T s^0 / n. Define

   P_2 := {(x(θ,μ), y(θ,μ), s(θ,μ)) : μ = θμ^0, θ > 0}.

From Theorem 5.5.1, the center (x(θ,μ), y(θ,μ), s(θ,μ)) on P_2 approaches the solution set of the problem (5.3) as θ → 0 if the problem is feasible, and it diverges as θ → θ_l > 0 if the problem is infeasible. We call P_2 a path of infeasible centers, because it consists of infeasible points of (5.3) unless the initial point is feasible.

Let β ∈ [0,1]. We define a neighborhood of the path P_2 as follows:

   N'(β) := {(x, y, s) : x > 0, s > 0, ||Xs − μe|| ≤ βμ, μ = θμ^0, θ > 0,
             Ax = b − θb̄, A^Ty + s = c − θc̄}.

This set N'(β) is much smaller than N because the Euclidean norm ||Xs − μe|| is used to measure the closeness to the path. By generating a sequence of iterates in this smaller neighborhood, we construct a theoretically better algorithm.

A predictor-corrector algorithm tracing the path P_2 is described as follows:

Algorithm B1: Set β_1 := 0.25 and β_2 := 0.5. Choose the parameter values ρ, ε, ε_P, and ε_D. Let (x^0, y^0, s^0) be an initial interior point such that ||X^0 s^0 − μ^0 e|| ≤ β_1 μ^0 for μ^0 := (x^0)^T s^0 / n. Set k := 0 and θ^0 := 1.

Step 1: If the conditions in (5.9) hold true at the current iterate (x^k, y^k, s^k) then output it as an approximate solution and stop. If the condition (5.11) holds true for θ_P^k = θ^k and θ_D^k = θ^k then stop.

Step 2: Compute the solution (Δx, Δy, Δs) of the system (5.29) for (θ', μ') = (0, 0) at (x^k, y^k, s^k). Compute

   ᾱ := max{α : (x^k, y^k, s^k) + α'(Δx, Δy, Δs) ∈ N'(β_2) for any α' ∈ [0, α)}.

Set
   (x', y', s') := (x^k, y^k, s^k) + ᾱ(Δx, Δy, Δs),
   θ^{k+1} := (1 − ᾱ)θ^k,
   μ^{k+1} := θ^{k+1} μ^0.

Step 3: Compute the solution (Δx', Δy', Δs') of the system (5.29) for (θ', μ') = (θ^{k+1}, μ^{k+1}) at (x', y', s').
Set (x^{k+1}, y^{k+1}, s^{k+1}) := (x', y', s') + (Δx', Δy', Δs').

Step 4: Set k := k + 1 and go to Step 1.

In each loop of Algorithm B1 we compute two directions, so that one iteration of Algorithm B1 corresponds to two iterations of Algorithm B. Step 2 is called a predictor step and Step 3 is a corrector step. At the predictor step, we try to decrease the values of θ^{k+1} and μ^{k+1} as much as possible subject to the condition that the new iterate is in the neighborhood N'(β_2). Then at the corrector step, we compute a point near to the path of centers P_2. We shall show that the point computed at the corrector step belongs to the smaller neighborhood N'(β_1).
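To make the two steps concrete, the following sketch in Python with NumPy carries out one predictor-corrector loop of this type on small dense data. It is only an illustration under our own naming (newton_529, in_neighborhood, b1_iteration are hypothetical helpers, and the maximal step size ᾱ of Step 2 is merely approximated by a crude backtracking search); it is not the implementation analysed in this chapter.

import numpy as np

def newton_529(A, b, c, bbar, cbar, x, y, s, theta, mu):
    # Newton direction for system (5.29); ds and dx are eliminated and the
    # m x m matrix A X S^{-1} A^T is solved for dy.
    r_p = b - theta * bbar - A @ x
    r_d = c - theta * cbar - A.T @ y - s
    r_c = mu * np.ones_like(x) - x * s
    d = x / s
    dy = np.linalg.solve(A @ (d[:, None] * A.T),
                         r_p - A @ ((r_c - x * r_d) / s))
    ds = r_d - A.T @ dy
    dx = (r_c - x * ds) / s
    return dx, dy, ds

def in_neighborhood(x, s, mu, beta):
    # Proximity part of N'(beta); linear feasibility is preserved by the steps.
    return np.all(x > 0) and np.all(s > 0) and \
           np.linalg.norm(x * s - mu) <= beta * mu

def b1_iteration(A, b, c, bbar, cbar, x, y, s, theta, mu0, beta2=0.5):
    # Predictor (Step 2): affine direction, (theta', mu') = (0, 0).
    dx, dy, ds = newton_529(A, b, c, bbar, cbar, x, y, s, 0.0, 0.0)
    alpha = 1.0
    # crude backtracking approximation of the maximal step size alpha-bar
    while alpha > 1e-12 and not in_neighborhood(
            x + alpha * dx, s + alpha * ds, (1.0 - alpha) * theta * mu0, beta2):
        alpha *= 0.9
    x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
    theta = (1.0 - alpha) * theta
    mu = theta * mu0
    # Corrector (Step 3): full Newton step back towards the (theta, mu) center.
    dx, dy, ds = newton_529(A, b, c, bbar, cbar, x, y, s, theta, mu)
    return x + dx, y + dy, s + ds, theta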

Lemma 5.6.1 For any k, (x^k, y^k, s^k) is a feasible point of (5.27) for θ = θ^k. Moreover μ^k = (x^k)^T s^k / n and (x^k, y^k, s^k) ∈ N'(β_1) for β_1 = 0.25.

Proof. Suppose that the assertion of the lemma is true for k. We shall prove that it is also true for k + 1. We have that

   Ax' − b = Ax^k + ᾱ(−Ax^k + b) − b
           = −(1 − ᾱ)(−Ax^k + b)
           = −(1 − ᾱ)θ^k b̄
           = −θ^{k+1} b̄

and similarly

   A^T y' + s' − c = −θ^{k+1} c̄.

From these equalities and the step size ᾱ at Step 2, (x', y', s') is a feasible point of (5.27) for θ = θ^{k+1}, and it is in N'(β_2) for β_2 = 0.5. By Step 3, (Δx', Δy', Δs') is a solution of the system

   AΔx' = −Ax' + b − θ^{k+1}b̄ = 0,
   A^TΔy' + Δs' = −A^Ty' − s' + c − θ^{k+1}c̄ = 0,
   S'Δx' + X'Δs' = −X's' + μ^{k+1}e.

Let D' := (X')^{.5}(S')^{−.5} for X' := diag(x') and S' := diag(s'). From the system of equations above, we see that (Δx')^T Δs' = 0,

   (x^{k+1})^T s^{k+1} = (x')^T s' + ((s')^T Δx' + (x')^T Δs') + (Δx')^T Δs'
                      = (x')^T s' + (−(x')^T s' + nμ^{k+1}) + 0
                      = nμ^{k+1},

and

   |Δx'_i Δs'_i| ≤ (|s'_i Δx'_i| + |x'_i Δs'_i|)^2 / (4 x'_i s'_i)
               ≤ (μ^{k+1} − x'_i s'_i)^2 / (4(1 − β_2)μ^{k+1})

for each i. Thus we obtain

   ||ΔX'Δs'||^2 = Σ_{i=1}^n (Δx'_i Δs'_i)^2.

Since β_2 = 0.5,

   ||ΔX'Δs'|| ≤ (√2/8) μ^{k+1}.
For each α ∈ [0,1], we have that

   A(x' + αΔx') = Ax' = b − θ^{k+1} b̄,
   A^T(y' + αΔy') + (s' + αΔs') = A^T y' + s' = c − θ^{k+1} c̄,

and

   ||(X' + αΔX')(s' + αΔs') − μ^{k+1} e||
      = ||X's' + α(−X's' + μ^{k+1}e) + α^2 ΔX'Δs' − μ^{k+1}e||
      ≤ (1−α)||X's' − μ^{k+1}e|| + α^2 ||ΔX'Δs'||
      ≤ 0.5(1−α)μ^{k+1} + α^2 (√2/8)μ^{k+1}.

These relations imply that for each α ∈ [0,1], x' + αΔx' > 0, s' + αΔs' > 0, and (x', y', s') + α(Δx', Δy', Δs') ∈ N'(0.5(1−α) + 0.25α^2); in particular, (x^{k+1}, y^{k+1}, s^{k+1}) ∈ N'(β_1) for β_1 = 0.25. □

5.7 CONVERGENCE PROPERTIES

In this section, we prove that Algorithm B1 terminates in a finite number of iterations from an arbitrary initial interior point if ρ is finite or the problem is feasible. Then we show that the complexity of Algorithm B1 is better than that of Algorithm A1 when we use a big initial point or an almost feasible initial point.

Theorem 5.7.1 If ρ is finite, Algorithm B1 terminates in a finite number of iterations, which depends on the initial point, a solution of Au = b and A^T v + w = c, ρ, ε, ε_P, ε_D, and n. If the condition (5.11) holds true at some iteration k, the primal-dual problem (5.3) has no solutions in B_ρ. If ρ = ∞ and the problem (5.3) is feasible, Algorithm B1 terminates in a finite number of iterations, which depends on the initial point, an optimal solution, ε, ε_P, ε_D, and n. If ρ = ∞ and Algorithm B1 generates an infinite sequence, the sequence is unbounded and the problem (5.3) is infeasible.

From Lemma 5.6.1, we have that

   (x^k)^T s^k = nμ^k = θ^k (x^0)^T s^0,
   ||Ax^k − b|| = θ^k ||Ax^0 − b||,
   ||A^T y^k + s^k − c|| = θ^k ||A^T y^0 + s^0 − c||.

Hence the conditions in (5.9) hold true if θ^k satisfies the bound (5.30), i.e. if θ^k is sufficiently small. Since θ^{k+1} = (1 − ᾱ)θ^k at each iteration of Algorithm B1, we shall derive a lower bound on ᾱ, and then prove the theorem.

Lemma 5.7.2 At each iteration of Algorithm B1, if ||ΔXΔs|| ≤ η_3 μ^k for some η_3 > 0, then ᾱ ≥ ā, where ā denotes the positive root of η_3 α^2 + 0.25α − 0.25 = 0.

Proof. Since (x^k, y^k, s^k) ∈ N'(β_1) and (Δx, Δy, Δs) is the solution of (5.29) for (θ', μ') := (0, 0), we have that

   A(x^k + αΔx) = Ax^k + α(−Ax^k + b)
               = (1−α)(b − θ^k b̄) + αb
               = b − (1−α)θ^k b̄,

similarly

   A^T(y^k + αΔy) + (s^k + αΔs) = c − (1−α)θ^k c̄,

and

   ||(X^k + αΔX)(s^k + αΔs) − (1−α)μ^k e|| − β_2(1−α)μ^k
      = ||X^k s^k + α(X^kΔs + S^kΔx) + α^2 ΔXΔs − (1−α)μ^k e|| − β_2(1−α)μ^k
      = ||(1−α)(X^k s^k − μ^k e) + α^2 ΔXΔs|| − β_2(1−α)μ^k
      ≤ (1−α)β_1 μ^k + α^2 η_3 μ^k − β_2(1−α)μ^k
      = (α^2 η_3 + 0.25α − 0.25)μ^k
      < 0   if α ∈ [0, ā).

From the condition above and the continuity with respect to α ∈ [0, ā], we also have that x^k + αΔx > 0 and s^k + αΔs > 0. Hence (x^k, y^k, s^k) + α(Δx, Δy, Δs) ∈ N'(β_2) for any α ∈ [0, ā], which implies ᾱ ≥ ā. □

Proof of Theorem 5.7.1: Since (x^k, y^k, s^k) ∈ N'(β_1) and μ^k = (x^k)^T s^k / n, it is easy to see that (x^k, y^k, s^k) ∈ N if γ ≤ 1 − β_1. So the results in Lemmas 5.3.3 and 5.3.4 hold true for θ_P^k := θ^k, θ_D^k := θ^k, γ := 1 − β_1, μ' := 0, and λ := 0 (with the remaining parameter set to 1). If the condition (5.11) does not hold true at the k-th iterate then, from Lemmas 5.3.3 and 5.3.4, ||D^{−1}Δx|| is bounded in terms of the constant κ_1 defined in Lemma 5.3.4. We also have this bound for ||DΔs||. Hence we obtain the bound (5.31) on ||ΔXΔs||.

If ρ is finite then ||ΔXΔs||/μ^k is bounded. Hence from Lemma 5.7.2, the step size ᾱ is bounded away from 0, and the algorithm terminates in a finite number of iterations. We can prove the other assertions in Theorem 5.7.1 as we have done in the proof of Theorem 5.3.1. □

We state two results on the polynomial-time complexity of Algorithm B1, whose proofs are similar to those of Theorems 5.4.1 and 5.4.3. The important difference between Algorithms A1 and B1 is the complexity: the number of iterations required by Algorithm B1 is theoretically less than that of Algorithm A1.

Theorem 5.7.3 Suppose that the parameter values ε, ε_P, ε_D, ρ, and the initial point (x^0, y^0, s^0) are as in Theorem 5.4.1. Then Algorithm B1 terminates in O(nL) iterations.

Outline of the proof: From (x^0, s^0) = ρ(e, e), (5.31), and Lemmas 5.4.2 and 5.7.2, we obtain that ᾱ is at least 1/O(n). Since θ^{k+1} = (1 − ᾱ)θ^k, the condition (5.30) holds true for k = O(nL). Hence the number of iterations is bounded by O(nL). □

Theorem 5.7.4 Let δ > 0 be a constant independent of the data. Suppose that the parameter values ε, ε_P, ε_D, and ρ are as in Theorem 5.4.1. For a given initial point (x^0, y^0, s^0) ∈ N'(β_1), if there exists a solution (u, v, w) of Au = b and A^T v + w = c such that (5.23) holds true, then Algorithm B1 terminates in O(√n L) iterations.

Outline of the proof: From (5.24) and the same bound for ||DΔs||, ||ΔXΔs|| is O(n)μ^k. So ᾱ is at least 1/O(√n) from Lemma 5.7.2. Since θ^{k+1} = (1 − ᾱ)θ^k, the condition (5.30) holds true for k = O(√n L), and the number of iterations is bounded by O(√n L). □

It is also possible to prove that if we set ε = 0 in Algorithm B1 and the problem (5.3) is feasible, then the iterates generated by the algorithm converge quadratically, i.e. there exists a constant ζ > 0 such that (x^{k+1})^T s^{k+1} ≤ ζ((x^k)^T s^k)^2 for each k. Since this requires technical results and a complicated analysis, we do not prove it here. See Mizuno et al. [11], for example, for the complete proofs of Theorems 5.7.3 and 5.7.4 and of the quadratic convergence.

5.8 CONCLUDING REMARKS

In this chapter, we have introduced primal-dual infeasible-interior-point algorithms for linear programming. This type of algorithm is easily extended to a linear complementarity problem (LCP) with a positive semidefinite matrix. In fact, the IIP algorithms presented by Zhang [26], Potra [20], and Wright [24] solve an LCP rather than a linear programming problem. Mizuno et al. [11] gave a unified approach of an IIP algorithm for various LCPs including primal-dual linear programming problems.

IIP algorithms that solve only the primal or only the dual linear programming problem are proposed by Freund [1] and Muramatsu and Tsuchiya [18]. The algorithm in [1] traces a path of centers, which is a projection of P_2 onto the primal space, and uses a short step size at each iteration. The algorithm in [18] is an extension of Dikin's affine scaling algorithm, so that it can start from an infeasible interior point.

Although the IIP algorithms presented in this chapter use a big initial point or an almost feasible initial point to achieve polynomiality, Freund's algorithm [1] can start from a smaller initial point and the number of iterations is bounded by O(n^2 L). Mizuno et al. [14] proposed a potential reduction IIP algorithm which requires O(n^{2.5} L) iterations. They also proposed a variant which requires O(nL) iterations. The IIP algorithm presented by Mizuno and Jarre [13] is different from the others, because it uses a projection onto a convex set at each iteration, which may increase the infeasibility.

A superlinear convergence result for an IIP algorithm was proved by Zhang and Zhang [27]. Then Potra [20] proposed a quadratically convergent predictor-corrector IIP algorithm for an LCP under the condition that a strictly complementary solution exists. The algorithm in Mizuno [12] converges superlinearly for an LCP without this condition. Finally, we mention the homogeneous and self-dual interior-point algorithm presented by Ye et al. [25]. The algorithm uses an artificial problem; however, it may start from the simple interior point (x, y, s) = (e, 0, e) and it requires O(√n L) iterations without using any big constant.

REFERENCES
[1] R. Freund, "An infeasible-start algorithm for linear programming whose com-
plexity depends on the distance from the starting point to the optimal solution,"
Working paper 3559-93-MSA, Sloan School of Management, Massachusetts In-
stitute of Technology, USA (1993).
[2] N. Karmarkar, "A new polynomial-time algorithm for linear programming,"
Combinatorica 4 (1984) 373-395.
[3] M. Kojima, N. Megiddo, and S. Mizuno, "A primal-dual infeasible-interior-point
algorithm for linear programming," Mathematical Programming 61 (1993) 261-
280.
[4] M. Kojima, S. Mizuno, and A. Yoshise, "A primal-dual interior point algorithm
for linear programming," in: Progress in Mathematical Programming, Interior-
Point and Related Methods, ed. N. Megiddo (Springer-Verlag, New York, 1989)
29-47.
[5] M. Kojima, S. Mizuno, and A. Yoshise, "A polynomial-time algorithm for a class
of linear complementarity problems," Mathematical Programming 44 (1989) 1-26.
[6] I. J. Lustig, "Feasibility issues in a primal-dual interior-point method for linear
programming," Mathematical Programming 49 (1990/91) 145-162.
[7] I. J. Lustig, R. E. Marsten, and D. F. Shanno, "Computational experience with
a primal-dual interior point method for linear programming," Linear Algebra
and Its Applications 152 (1991) 191-222.
[8] I. J. Lustig, R. E. Marsten, and D. F. Shanno, "Interior point methods: com-
putational state of the art," ORSA Journal on Computing 6 (1994) 1-14.
[9] R. Marsten, R. Subramanian, M. Saltzman, I. J. Lustig, and D. Shanno, "In-
terior point methods for linear programming: Just call Newton, Lagrange, and
Fiacco and McCormick!," Interfaces 20 (1990) 105-116.
[10] N. Megiddo, "Pathways to the optimal set in linear programming," in: Progress
in Mathematical Programming, Interior-Point and Related Methods, ed. N.
Megiddo (Springer-Verlag, New York, 1989) 131-158.
[11] S. Mizuno, "Polynomiality of infeasible-interior-point algorithms for linear pro-
gramming," Mathematical Programming 67 (1994) 109-119.
[12] S. Mizuno, "A superlinearly convergent infeasible-interior-point algorithm for
geometrical LCP's without a strictly complementary condition," Preprint 214,
Mathematische Institute der Universitaet Wuerzburg, Germany (1994).

[13] S. Mizuno and F. Jarre, "An infeasible-interior-point algorithm using projec-


tions onto a convex set," Preprint 209, Mathematische Institute der Universitaet
Wuerzburg, Germany (1993).

[14] S. Mizuno, M. Kojima, and M. J. Todd, "Infeasible-interior-point primal-dual


potential-reduction algorithms for linear programming," SIAM Journal on Op-
timization 5 (1995) 52-67.

[15] S. Mizuno, M. J. Todd, and Y. Ye, "On adaptive-step primal-dual interior-


point algorithms for linear programming," Mathematics of Operations Research
18 (1993) 964-981.

[16] S. Mizuno, M. J. Todd, and Y. Ye, "A surface of analytic centers and infeasible-
interior-point algorithms for linear programming," Mathematics of Operations
Research 20 (1995) 52-67.

[17] R. D. C. Monteiro and I. Adler, "Interior path following primal-dual algorithms.


Part I: linear programming," Mathematical Programming 44 (1989) 27-41.

[18] M. Muramatsu and T. Tsuchiya, "An affine scaling method with an infeasible
starting point," Research Memorandum 490, The Institute of Statistical Math-
ematics, Tokyo (1993).
[19] F. A. Potra, "An infeasible interior-point predictor-corrector algorithm for linear
programming," Report No. 26, Department of Mathematics, The University of
Iowa, USA (1992).
[20] F. A. Potra, "A quadratically convergent predictor-corrector method for solving
linear programs from infeasible starting points," Mathematical Programming 67
(1994) 383-406.
[21] J. Stoer, "The complexity of an infeasible interior-point path-following method
for the solution of linear programs," Optimization Methods and Software 3
(1994) 1-12.
[22] K. Tanabe, "Centered Newton method for mathematical programming," in:
System Modeling and Optimization, eds. M. Iri and K. Yajima (Springer-Verlag,
New York, 1988) 197-206.
[23] K. Tanabe, "Centered Newton method for linear programming: Interior and
'exterior' point method' (Japanese)," in: New Methods for Linear Programming
3, ed. K. Tone, (The Institute of Statistical Mathematics, Tokyo, Japan, 1990)
98-100.
[24] S. Wright, "An infeasible-interior-point algorithm for linear complementarity
problems," Mathematical Programming 67 (1994) 29-52.

[25] Y. Ye, M. J. Todd, and S. Mizuno, "An O(√n L)-iteration homogeneous and


self-dual linear programming algorithm," Mathematics of Operations Research
19 (1994) 53-67.

[26] Y. Zhang, "On the convergence of a class of infeasible interior-point methods for
the horizontal linear complementarity problem," SIAM Journal on Optimization
4 (1994) 208-227.
[27] Y. Zhang and D. Zhang, "Superlinear convergence of infeasible interior-point
methods for linear programming," Mathematical Programming 66 (1994) 361-
378.
6
IMPLEMENTATION OF
INTERIOR-POINT METHODS
FOR LARGE SCALE LINEAR
PROGRAMS
Erling D. Andersen¹, Jacek Gondzio², Csaba Meszaros³, Xiaojie Xu⁴
¹ Department of Management, Odense University, Campusvej 55, DK-5230 Odense M, Denmark.
² Logilab, HEC Geneva, Section of Management Studies, University of Geneva, 102 Bd Carl Vogt, CH-1211 Geneva 4, Switzerland (on leave from the Systems Research Institute, Polish Academy of Sciences, Newelska 6, 01-447 Warsaw, Poland).
³ Department of Operations Research and Decision Support Systems, Computer and Automation Research Institute, Hungarian Academy of Sciences, Lagymanyosi u. 11, Budapest, Hungary.
⁴ Institute of Systems Science, Academia Sinica, Beijing 100080, China.

ABSTRACT
In the past 10 years the interior point methods (IPM) for linear programming have gained
extraordinary interest as an alternative to the sparse simplex based methods. This has
initiated a fruitful competition between the two types of algorithms which has led to very
efficient implementations on both sides. The significant difference between interior point
and simplex based methods is reflected not only in the theoretical background but also in
the practical implementation. In this paper we give an overview of the most important
characteristics of advanced implementations of interior point methods. First, we present
the infeasible-primal-dual algorithm which is widely considered the most efficient general
purpose IPM. Our discussion includes various algorithmic enhancements of the basic al-
gorithm. The only shortcoming of the "traditional" infeasible-primal-dual algorithm is the detection of a possible primal or dual infeasibility of the linear program. We discuss how this problem can be solved with the homogeneous and self-dual model.


The practical efficiency of IPMs is highly dependent on the linear algebra used. Hence, we
discuss this subject in great detail. Finally we cover the related topics of preprocessing and
obtaining an optimal basic solution from the interior-point solution.

6.1 INTRODUCTION
As early as the late 1940s, almost at the same time when Dantzig presented the famous simplex method, several researchers, including von Neumann (1947) [68], Hoffman et al. (1953) [41] and Frisch (1955) [27], proposed interior-point algorithms which traverse the interior of the feasible region in an attempt to avoid the combinatorial complexities of vertex-following algorithms.

However, the expensive computational steps they require, the possibility of numerical
instability in the calculations, and some discouraging experimental results led to a
consensus view that such algorithms would not be competitive with the simplex
method in practice.

In fact, it would have been very difficult to find serious discussion of any approach
other than the simplex method before 1984 when Karmarkar [46] presented a novel
interior point method, which, as he claimed, was able to solve large-scale linear
programs up to 50 times faster than the simplex method. Karmarkar's announcement
led to an explosion of interest in interior point methods (IPMs) among researchers
and practitioners.

Soon after Karmarkar's publication, Gill et al. [31] showed a formal relationship be-
tween the new interior point method and the classical logarithmic barrier method.
The barrier method is usually attributed to Frisch (1955) [27] and is formally studied
in Fiacco and McCormick [23] in the context of nonlinear optimization. Much re-
search has concentrated on the common theoretical foundations of linear and nonlin-
ear programming. A fundamental theme is the creation of continuously parametrized
families of approximate solutions that asymptotically converge to the exact solution.
A basic iteration of such a path-following algorithm consists of moving from one point
in a certain neighborhood of a path to another one called a target that preserves the
property of lying in the neighborhood of the path and is "near" to the exact solution.

In the past ten years several efficient implementations of interior point methods have been developed. Lustig, Marsten and Shanno [54] have made a particularly important contribution in this area with their code OB1. Although implementations of the simplex method have improved a lot in recent years [78, 9, 24], extensive numerical tests (cf. [54]) have indicated conclusively that an efficient and robust implementation of an interior point method can solve many large scale LP problems substantially faster than a state-of-the-art simplex code.

The most efficient interior point method today is the infeasible-primal-dual algo-
rithm. Therefore in this chapter we discuss techniques used in an efficient and
robust implementation of the primal-dual method. Although the chapter focuses on
implementation techniques, some closely related theoretical issues are addressed as
well.

Most relevant issues of interior point method implementations are illustrated by computational results. A small set of test problems (from the public domain collections of LPs) is chosen in each case to illustrate the typical behavior of the presented implementation techniques. The reader interested in extensive numerical results that demonstrate how a given technique works in practice should consult the appropriate references.

The presentation starts in Section 6.2 with a description of the infeasible-primal-dual method. Most issues of the theory and implementation of this method are now well understood. However, two of them still remain open, namely detecting infeasibility of the problem and the choice of a well-centered starting point. A solution to these problems that is both mathematically elegant and implementable in practice comes with the use of a homogeneous and self-dual linear feasibility model. We will address this model in Section 6.3.

The practical success of any IPM implementation depends on the efficiency and the
reliability of the linear algebra kernel in it. We focus on these issues in Section 6.4.

The major work in a single iteration of any IPM consists of solving a set of linear equations, the so-called Newton equation system. In all IPMs this system reduces to a problem that is equivalent to an orthogonal projection of a vector onto the null space of a scaled linear operator. The diagonal scaling matrix depends on the variant of the method used and it changes considerably in subsequent IPM iterations. All general purpose IPM codes use a direct approach [19] to solve the Newton equation system. The alternative, iterative methods, has not been used as much due to difficulties in choosing a preconditioner. There are two competitive direct approaches for solving the Newton equations: the augmented system approach [6, 7] and the normal equations approach. The former requires the factorization of a symmetric indefinite matrix, the latter works with a smaller positive definite matrix.

In Section 6.4, we discuss both these approaches in detail, analyse their advantages
and point out some difficulties that arise in their implementation. Moreover, we

present a unified framework which covers all previously presented techniques. We


also briefly discuss hardware dependencies of the implementations.

Other issues related to an efficient implementation of IPMs are addressed in Sec-


tion 6.5. We discuss the important role of preprocessing the linear program and
recall some related problems, such as the impact of the presence of free variables
and dense columns in the LP problem.

As mentioned before a direct approach is used to solve a system of Newton equa-


tions in every IPM iteration. Therefore, in each iteration a matrix factorization is
computed that requires a nontrivial amount of work. In contrast, the following back-
solve step is usually significantly cheaper. An obvious idea, known from different
applications of the Newton method, is to reuse the factorization in several iterations
or, equivalently, to repeat several backsolves to guess a better next iterate. We call
such an approach a higher order method. The first higher order method was incor-
porated into a dual affine-scaling method of AT&T's Korbx system [47]. An efficient
high-order method was proposed by Mehrotra; his second-order predictor-corrector
strategy [62] has been incorporated in all primal-dual type implementations. As
shown in Mehrotra [61], the improvement from using orders higher than 2 is very
limited. Recently, Gondzio [36] proposed a new way to exploit high order information
in a primal-dual algorithm and showed considerable improvements in solving large
scale problems. We shall address the use of higher order methods in Section 6.6.

An important issue is when to terminate an IPM. Contrary to the simplex algorithm


an IPM never generates the exact optimal solution; instead it generates an infinite
sequence converging towards an optimal solution. Hence, it is necessary to be able
to terminate an IPM after a finite number of iterations and report the exact optimal
solution. This problem is solved with Ye's finite termination scheme, see [89]. A
closely related problem is to generate an optimal basic solution from an optimal
interior point solution. In general, if an LP problem has multiple optimal solutions
an IPM does not produce an optimal solution which is also a basic solution. Megiddo
[59] has shown that if an exact primal and dual optimal solution is known, then an
optimal basic solution can be produced in strongly polynomial time using a simplified
simplex algorithm. In Section 6.7, we discuss a method which combines Ye's finite
termination scheme and Megiddo's method to produce an optimal basic solution.

Interior point methods are now very reliable optimization tools. Sometimes, only out of inertia, the operations research community keeps using the simplex method in applications that could undoubtedly benefit from the new interior point technology. This is particularly important in those applications which require the solution of very large linear programs (with tens or hundreds of thousands of constraints and variables). We thus end the chapter with a brief guide to the interior point

software available nowadays. We shall list in Section 6.8 both commercial and ex-
perimental (research) LP codes based on interior point methods. Among the latter,
there exist very efficient programs that are public domain in a form of source code
and are competitive (in terms of speed) with the best commercial products.

Although the past ten years brought an enormous development of both the theory
and the implementations of IPMs, several issues still remain open. We shall address
them in Section 6.9 before giving our conclusions in Section 6.10.

6.2 THE PRIMAL-DUAL ALGORITHM


The computationally most attractive IPM is an infeasible-primal-dual algorithm. Indeed, it has been implemented in all commercial software packages. Hence, we start by presenting this algorithm.

The algorithm generates iterates which are positive (i.e. are interior with respect
to the inequality constraints) but do not necessarily satisfy the equality constraints.
Hence, the name infeasible-interior-point primal-dual method. For the sake of brevity,
we call it the primal-dual algorithm.

The first theoretical results for this method are due to Megiddo [58] who proposed to
apply a logarithmic barrier method to the primal and the dual problems at the same
time. Independently, Kojima, Mizuno and Yoshise [49] developed the theoretical
background of this method and gave the first complexity results.

The first implementations [57, 16] showed great promise and encouraged further re-
search in this field. These implementations have been continuously improved and
have led to the development of several highly efficient LP codes. Today's computa-
tional practice of the primal-dual implementation follows [51, 53, 54, 62, 36].

The practical implementations of the primal-dual algorithm still differ a lot from the
theoretical algorithms with polynomial complexity since the latter give too much
importance to the worst-case analysis. This gap between theory and practice has
been closed recently by Kojima, Megiddo and Mizuno [48] who show that the primal-
dual algorithm with some safe-guards has good theoretical properties.

6.2.1 Fundamentals
Let us consider a primal linear programming problem

   minimize    c^T x
   subject to  Ax = b,                                     (6.1)
               x + s = u,
               x, s ≥ 0,

where c, x, s, u ∈ R^n, b ∈ R^m, A ∈ R^{m×n}, and its dual

   maximize    b^T y − u^T w
   subject to  A^T y − w + z = c,                          (6.2)
               z, w ≥ 0,

where y ∈ R^m and z, w ∈ R^n. An LP problem is said to be feasible if and only if its constraints are consistent; it is called unbounded if there is a sequence of feasible points whose objective value tends to infinity. An LP problem is said to have a solution if and only if it is feasible and bounded.

With some abuse of mathematics, to derive the primal-dual algorithm one should:

• replace the nonnegativity constraints on the variables with logarithmic barrier


penalty terms;
• move equality constraints to the objective with the Lagrange transformation to
obtain an unconstrained optimization problem and write first order optimality
conditions for it; and
• apply Newton's method to solve these first order optimality conditions (i.e. to
solve a system of nonlinear equations).

Let us do this exercise.

Replacing the nonnegativity constraints with logarithmic penalty terms gives the following logarithmic barrier function

   L(x, s, μ) = c^T x − μ Σ_{j=1}^n ln x_j − μ Σ_{j=1}^n ln s_j.        (6.3)

Next, we write the first order optimality conditions for it:

   Ax = b,
   x + s = u,
   A^T y + z − w = c,                                       (6.4)
   XZe = μe,
   SWe = μe,

where X, S, Z and W are diagonal matrices with the elements x_j, s_j, z_j and w_j, respectively, e is the n-vector of all ones, μ is a barrier parameter, and z = μX^{−1}e. Let us observe that the first three of the above equations are linear and force primal and dual feasibility of the solution. The last two equations are nonlinear and depend on the barrier parameter μ. They become the complementarity conditions for μ = 0, which together with the feasibility constraints provide optimality of the solutions.

It can be seen that (6.4) is identical to the Karush-Kuhn-Tucker (KKT) system for the LP problem, in which the complementarity conditions are perturbed by μ. Hence, (6.4) is called the perturbed KKT conditions.

A nonnegative solution of (6.4) is called an analytic center. It clearly depends on the value of the barrier parameter μ. The set of such solutions (x(μ), s(μ)) and (y(μ), z(μ), w(μ)) defines a trajectory of centers for the primal and dual problem, respectively, and is called the central path. The quantity

   g = x^T z + s^T w

measures the error in the complementarity and is called the complementarity gap. Note that for a feasible point, this value reduces to the usual duality gap. For a μ-center, for example,

   g = 2μ e^T e = 2nμ,                                      (6.5)

and it vanishes at an optimal solution.

One iteration of the primal-dual algorithm makes one step of Newton's method applied to the first order optimality conditions (6.4) with a given μ, and then μ is updated (usually decreased). The algorithm terminates when the infeasibility and the complementarity gap are reduced below predetermined tolerances.

Given x, s, z, w ∈ R^n_+ and y ∈ R^m, Newton's direction is obtained by solving the following system of linear equations:

   A Δx = ξ_b,
   Δx + Δs = ξ_u,
   A^T Δy + Δz − Δw = ξ_c,                                  (6.6)
   Z Δx + X Δz = μe − XZe,
   W Δs + S Δw = μe − SWe,

where

   ξ_b = b − Ax,
   ξ_u = u − x − s,
   ξ_c = c − A^T y − z + w

denote the violations of the primal and the dual constraints, respectively. We call the linear system (6.6) the Newton equation system.

Note that the primal-dual method does not require feasibility of the iterates (ξ_b, ξ_u and ξ_c might be nonzero) during the optimization process. Feasibility is attained during the process as optimality is approached. It is easy to verify that if a step of length one is made in the Newton direction (6.6), then feasibility is reached immediately. This is seldom the case, as a smaller stepsize usually has to be chosen (a damped Newton iteration is taken) to preserve the positivity of x, s, z and w. If this is the case and a stepsize α < 1 is applied, then the infeasibilities ξ_b, ξ_u and ξ_c are reduced by a factor (1 − α).

Let us take a closer look at the Newton equation system. After elimination of

   Δz = X^{−1}(μe − XZe − ZΔx),
   Δs = ξ_u − Δx,                                           (6.7)
   Δw = S^{−1}(μe − SWe − WΔs) = S^{−1}(μe − SWe − Wξ_u + WΔx),

it reduces to

   [ −D^{−2}   A^T ] [ Δx ]   [ r ]
   [    A       0  ] [ Δy ] = [ h ],                        (6.8)

where

   D^2 = (X^{−1}Z + S^{−1}W)^{−1},
   r = ξ_c − X^{−1}(μe − XZe) + S^{−1}(μe − SWe) − S^{−1}Wξ_u,          (6.9)
   h = ξ_b.

The solution of the reduced Newton equation system (6.8) is the computationally most involved step of any interior point method. We shall discuss it in detail in Section 6.4.

Once the system (6.8) has been solved, Δx and Δy are used to compute Δs, Δz and Δw by (6.7). Next the maximum step sizes in the primal space (α_P) and the dual space (α_D) are computed such that the nonnegativity of the variables is preserved. These step sizes are slightly reduced by a factor α_0 < 1 to prevent hitting the boundary. Finally a new iterate is computed as follows:

   x^{k+1} = x^k + α_0 α_P Δx,
   s^{k+1} = s^k + α_0 α_P Δs,
   y^{k+1} = y^k + α_0 α_D Δy,                              (6.10)
   z^{k+1} = z^k + α_0 α_D Δz,
   w^{k+1} = w^k + α_0 α_D Δw.

After making the step, the barrier parameter μ is updated and the process is repeated.
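To fix ideas, the following Python/NumPy sketch performs one damped iteration following (6.6)-(6.10) on small dense data. It is a didactic sketch with our own names and simplifications (a fixed reduction factor gamma, steps capped at one, dense linear algebra), not a production implementation; real codes factorize A D^2 A^T with sparse Cholesky techniques, as discussed in Section 6.4.

import numpy as np

def pd_iteration(A, b, c, u, x, s, y, z, w, gamma=0.1, alpha0=0.99995):
    n = x.size
    e = np.ones(n)
    mu = gamma * (x @ z + s @ w) / (2 * n)     # fraction of the average product, cf. (6.12)

    # violations of the linear constraints
    xi_b = b - A @ x
    xi_u = u - x - s
    xi_c = c - A.T @ y - z + w

    # reduced Newton system (6.8)-(6.9), solved here via the normal equations
    d2 = 1.0 / (z / x + w / s)                 # diagonal of D^2
    r = xi_c - (mu * e - x * z) / x + (mu * e - s * w) / s - (w / s) * xi_u
    dy = np.linalg.solve(A @ (d2[:, None] * A.T), xi_b + A @ (d2 * r))
    dx = d2 * (A.T @ dy - r)

    # back-substitution (6.7)
    ds = xi_u - dx
    dz = (mu * e - x * z - z * dx) / x
    dw = (mu * e - s * w - w * ds) / s

    # ratio tests (6.13), damped by alpha0 as in (6.10)
    def max_step(v, dv):
        neg = dv < 0
        return 1.0 if not neg.any() else min(1.0, float(np.min(-v[neg] / dv[neg])))
    a_p = alpha0 * min(max_step(x, dx), max_step(s, ds))
    a_d = alpha0 * min(max_step(z, dz), max_step(w, dw))

    return (x + a_p * dx, s + a_p * ds,
            y + a_d * dy, z + a_d * dz, w + a_d * dw)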

6.2.2 From Theory to Computational Practice


In the previous section we have outlined the primal-dual algorithm. Now, we shall
address some practical issues of its implementation.

From theory it is known that if the barrier parameter is only reduced slightly in each iteration, it is possible to take long steps in the Newton direction. This implies fast convergence of Newton's method, and all iterates stay close to the central path. In practice it is not efficient to reduce the barrier parameter only slightly in every iteration and stay very close to the central path. (Recall that we want to find a solution where the barrier parameter is zero.) On the other hand it is not efficient to move too far away from the central path and close to the boundary, because in that case the algorithm might get stuck taking small steps in the Newton direction. Hence, convergence will be painfully slow.

Starting point

The first difficulty arising in implementing the primal-dual method is the choice
of an initial solution. (Note that this problem is solved in an elegant way when a
homogeneous model is used, cf. Section 6.3.) One would like this point to be well
centered and to be as close to primal and dual feasibility as possible. Surprisingly,

points that are relatively close to the optimal solution (but are not well centered)
often lead to bad performance and/or numerical difficulties.

Mehrotra [62] has proposed to solve a certain quadratic programming problem to


obtain the initial solution. We will now present a variant of his idea. As the starting solution we use the optimal solution of a quadratic programming (QP) problem of the form

   minimize    ϱ c^T x + ½ (x^T x + s^T s)
   subject to  Ax = b,                                      (6.11)
               x + s = u,
where ϱ is a predetermined weight parameter. A solution of (6.11) is given by an explicit formula and can be computed at a cost comparable to a single interior point iteration. It is supposed to minimize the norm of the primal solution (x, s), while promoting points that are better in the sense of the LP objective. As the solution of (6.11) may have negative components in x and s, those negative components are pushed towards positive values sufficiently bounded away from zero (all elements smaller than δ are replaced by δ, say, δ = 1). Independently, an initial dual solution (y, z, w) is chosen similarly to satisfy y = 0 and the dual constraint (6.2). Again, all elements of z and w smaller than δ are replaced by δ.
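A possible realization of this idea is sketched below. It is our own simplification, not the exact formula of [62]: the least-squares solve below ignores the objective weight ϱ and the upper-bound split of (6.11), and the clipping at δ means the dual constraint holds only approximately, exactly as described above.

import numpy as np

def starting_point(A, b, c, u, delta=1.0):
    # minimum-norm solution of Ax = b (ignores the objective weight of (6.11))
    x = A.T @ np.linalg.solve(A @ A.T, b)
    s = u - x
    # push components sufficiently far away from zero
    x = np.maximum(x, delta)
    s = np.maximum(s, delta)
    # dual start: y = 0, split c into z - w and clip both at delta
    y = np.zeros(A.shape[0])
    z = np.maximum(c, delta)
    w = np.maximum(z - c, delta)   # after clipping, (6.2) holds only approximately
    return x, s, y, z, w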

Stepsize

The simplest way to ensure that all iterates remain close to the central path is to
decrease the barrier parameter slowly in subsequent IPM iterations. This gave rise
to so-called short step methods that are known to have nice theoretical properties
but they are also known to demonstrate hopelessly slow convergence in practice.

In long step methods the barrier parameter is reduced much faster than what the
theory suggests. To preserve good convergence properties of this strategy the theory
requires that several Newton steps are computed within each primal-dual iteration
such that the new point is in a close neighborhood of the central path. In practice
this is ignored and only one Newton step is made before the barrier parameter is
reduced. A negative consequence of it is that the iterates cannot be kept close to the
central path. However, the computational practice shows that even if they remain
in a relatively large vicinity of the central path, the algorithm still converges fast.

The barrier parameter is chosen as some fraction of the average complementarity product at the current point (cf. equation (6.5)):

   μ_new = γ μ_average = γ (g / 2n),                        (6.12)

where γ ∈ [0,1]. The choice of γ = 1 corresponds to a pure recentering step, while the choice of γ < 1 is expected to reduce the complementarity gap in the next iterate. Indeed, if the iterates are feasible, the complementarity gap is guaranteed to be reduced by a factor (1 − α(1 − γ)).

The choice of γ or, more generally, the choice of a point (a so-called target) to which the next iterate will hopefully be taken is a crucial issue for the efficiency of the primal-dual method. We shall discuss it in detail in Section 6.6.

Let us observe that current implementations use different stepsizes in the primal and dual spaces. This implies that the infeasibility is reduced faster than if the same stepsize were used. All implementations use a variant of the following strategy. First the maximum possible stepsizes are computed by the formulae

   α_P := max {α > 0 : (x, s) + α(Δx, Δs) ≥ 0},
   α_D := max {α > 0 : (z, w) + α(Δz, Δw) ≥ 0},             (6.13)

and these step sizes are slightly reduced by a factor α_0 = 0.99995 to ensure that the new point is strictly positive. Some codes use a smaller α_0 in those iterations in which 0.99995 might be too aggressive. However, in most cases this aggressive choice of α_0 seems to be the best.

In general, the algorithm cannot be guaranteed to be globally convergent with the choice α_0 = 0.99995. However, Kojima, Megiddo and Mizuno [48] have proved global convergence of a variant of the primal-dual method that allows the aggressive choice of α_0 in most iterations. To ensure global convergence, the stepsizes must be chosen such that the infeasibilities converge to zero faster than the complementarity gap and the iterates are not allowed to move too far away from the central path. For most LP problems with the default starting point (described previously) the additional safe-guards do not constrain the stepsize.

Stopping criteria

Interior point algorithms terminate when the first order optimality conditions (6.4) are satisfied to some predetermined tolerance. In the case of the primal-dual method, this translates to the following conditions imposed on the relative primal and dual feasibility and the relative duality gap:

   ||Ax − b|| / (1 + ||b||) ≤ 10^{−p}   and   ||x + s − u|| / (1 + ||u||) ≤ 10^{−p},     (6.14)

   ||A^T y + z − w − c|| / (1 + ||c||) ≤ 10^{−p},                                        (6.15)

   g / (1 + |c^T x|) ≤ 10^{−p},                                                          (6.16)

where p is the number of accurate digits required in the solution. An 8-digit accurate solution (p = 8) is typically required in the literature.

Let us observe that conditions (6.14)-(6.16) depend strongly on the scaling of the problem. In particular, the denominators of their left hand sides usually decrease after scaling of the problem.

In practice, it is rare that condition (6.16) is satisfied while at the same time one of the conditions (6.14) or (6.15) does not hold. The explanation of this phenomenon comes from the analysis of the first order optimality conditions (6.4). Observe that the first three equations, which impose primal and dual feasibility, are linear. They are thus "easier" for Newton's method to satisfy than the last two equations, which are nonlinear and, additionally, change in subsequent interior point iterations. Consequently, the most important and perhaps the only condition that really has to be checked is (6.16).
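The check itself is a few lines of code. The sketch below uses our own naming, and, since the precise form of (6.16) is stated only as a relative duality gap above, it uses the complementarity gap divided by 1 + |c^T x| as a stand-in for that condition.

import numpy as np

def converged(A, b, c, u, x, s, y, z, w, p=8):
    tol = 10.0 ** (-p)
    primal_ok = (np.linalg.norm(A @ x - b) / (1 + np.linalg.norm(b)) <= tol and
                 np.linalg.norm(x + s - u) / (1 + np.linalg.norm(u)) <= tol)       # (6.14)
    dual_ok = np.linalg.norm(A.T @ y + z - w - c) / (1 + np.linalg.norm(c)) <= tol  # (6.15)
    gap_ok = (x @ z + s @ w) / (1 + abs(c @ x)) <= tol                              # cf. (6.16)
    return primal_ok and dual_ok and gap_ok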

Complexity

At least at one point the theory is still far from the computational practice: in the estimates of the worst-case complexity. The theoretical bound of O(√n log(1/ε)) iterations to obtain an ε-exact solution of an LP is extremely pessimistic since, in practice, the number of iterations behaves more like O(log n) or O(n^{1/4}). It is rare that a current implementation of the primal-dual method needs more than 50 iterations to reach 10^{−8}-optimality.

6.3 SELF-DUAL EMBEDDING

Two important elements of the primal-dual algorithm have not been resolved satisfactorily from a practical point of view.

The first element is the choice of an initial solution. Even though the heuristic presented in the previous section works well in practice, it is scaling dependent and there is no guarantee that the method produces a well-centered point.

The second element is the lack of a reliable technique to detect infeasibility or unboundedness of the LP problem. The infeasibility or unboundedness of one of the problems (6.1) and (6.2) usually manifests itself in a rapid growth of the primal or dual objective function and immediately leads to numerical problems. This is really a critical point in any implementation of the primal-dual algorithm.

The algorithm presented in this section removes both these drawbacks. It is based on a skew-symmetric and self-dual artificial LP model first considered by Ye et al. [90]. Somewhat later Jansen et al. [45] presented the skew-symmetric self-dual model for a primal-dual pair in a symmetric form. Xu et al. [86, 87] considered a homogeneous and self-dual linear feasibility (HLF) model that was in fact studied already in the 60s by Goldman and Tucker [33, 80]. Xu [84, 85] developed a large step path following LP algorithm based on the HLF model and implemented it.

The main advantage of the algorithm is that it solves the LP problem without any regularity assumption concerning the existence of optimal, feasible, or interior feasible solutions. If the problem is infeasible or unbounded, the algorithm correctly detects the infeasibility of at least one of the primal and dual problems. Moreover, the algorithm may start from any positive primal-dual pair, feasible or infeasible, near the central ray of the positive orthant. Finally, even if the algorithm takes large steps, it achieves the O(√n L)-iteration complexity.

Compared to the primal-dual method from the previous section this algorithm has
only one disadvantage: it requires one additional solve with the factorization of the
Newton equation matrix in each iteration.

6.3.1 HLF Model

Let us now present the HLF model. For the sake of simplicity, we will work throughout this section with a simplified primal LP formulation (in which all primal variables are nonnegative and without upper bounds), that is

   minimize  c^T x   subject to  Ax = b,  x ≥ 0,            (6.17)

where c, x ∈ R^n, b ∈ R^m, A ∈ R^{m×n}, and its dual

   maximize  b^T y   subject to  A^T y ≤ c,                  (6.18)

where y ∈ R^m. Introducing a homogeneous variable τ and coupling the primal and dual problems together gives the homogeneous and self-dual linear feasibility model

   Ax − bτ = 0,
   −A^T y + cτ ≥ 0,
   b^T y − c^T x ≥ 0,                                        (6.19)
   y free, x ≥ 0, τ ≥ 0.

This linear feasibility system is homogeneous and has zero as its trivial solution. The zero solution is of course not of interest, but LP theory tells us that a strictly complementary solution exists for any linear program. Now the HLF model (6.19) is an LP problem with a zero objective function and a zero right hand side. Furthermore, it is self-dual. Denote by z the slack vector for the second (inequality) constraint and by κ the slack scalar for the third (inequality) constraint. By the skew-symmetric and self-dual property, the complementary pairs are (x, z) and (τ, κ). A strictly complementary solution for the HLF model satisfies (6.19) and

   Xz = 0,  τκ = 0,   and   x + z > 0,  τ + κ > 0,           (6.20)

where X = diag(x). Let (y*, x*, τ*, z*, κ*) be a strictly complementary solution of the HLF model. We can prove the following:

•  If τ* > 0, then (y*/τ*, x*/τ*, z*/τ*) is an optimal strictly complementary solution to (6.17) and (6.18).

•  If τ* = 0, then κ* > 0, which implies that c^T x* − b^T y* < 0, i.e. at least one of c^T x* and −b^T y* is strictly less than zero. If c^T x* < 0 then (6.18) is infeasible; if −b^T y* < 0 then (6.17) is infeasible; and if both c^T x* < 0 and −b^T y* < 0, then both (6.17) and (6.18) are infeasible (see the sketch below).
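The case analysis above translates directly into code. The small helper below is our own sketch (with an arbitrary tolerance, not part of the chapter) for interpreting a computed, approximately strictly complementary solution of the HLF model.

import numpy as np

def interpret_hlf(y, x, tau, z, kappa, b, c, tol=1e-10):
    if tau > tol:
        # tau* > 0: recover an optimal pair for (6.17)-(6.18) by rescaling
        return {"status": "optimal", "x": x / tau, "y": y / tau, "z": z / tau}
    # tau* = 0 and kappa* > 0: c^T x* - b^T y* < 0 reveals infeasibility
    status = []
    if c @ x < -tol:
        status.append("dual problem (6.18) infeasible")
    if -(b @ y) < -tol:
        status.append("primal problem (6.17) infeasible")
    return {"status": ", ".join(status)}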

6.3.2 A Path Following Algorithm

Due to the third constraint of (6.19), the HLF model does not have a feasible interior point. Therefore, a definition of a central path similar to (6.4) makes no sense, since it is restricted to the interior of the feasible region. In this subsection we define a central path which connects any given initial positive pair (x, z) and (τ, κ) with a strictly complementary solution of the HLF model. Afterwards an algorithm is developed based on following such an "infeasible" central path to a strictly complementary solution.

For any (y, x > 0, τ > 0, z > 0, κ > 0), the feasibility residuals and the average complementarity residual are defined as

   r_P = bτ − Ax,
   r_D = cτ − A^T y − z,
   r_G = c^T x − b^T y + κ,                                  (6.21)
   μ = (x^T z + τκ) / (n + 1),

respectively. Given (y^0, x^0 > 0, τ^0 > 0, z^0 > 0, κ^0 > 0), the following barrier problem with a parameter λ defines a central path:

   minimize    z^T x + κτ − λμ^0 Σ_i (ln x_i + ln z_i) − λμ^0 (ln τ + ln κ)
   subject to  bτ − Ax = λ r_P^0,
               cτ − A^T y − z = λ r_D^0,                     (6.22)
               c^T x − b^T y + κ = λ r_G^0,

where (r_P^0, r_D^0, r_G^0) and μ^0 are the initial residuals at (y^0, x^0 > 0, τ^0 > 0, z^0 > 0, κ^0 > 0). As shown in Xu [87], it is essential to introduce the feasibility residual terms in the right hand sides of (6.22). Along the central path, the feasibility and complementarity residuals are reduced at the same rate and eventually converge to zero. The same rate of reduction in the feasibility and complementarity residuals guarantees that the limit point is a strictly complementary solution of the HLF model (6.19).

By using the skew-symmetric property, the first order optimality conditions for the barrier problem (6.22) are

   bτ − Ax = λ r_P^0,
   cτ − A^T y − z = λ r_D^0,
   c^T x − b^T y + κ = λ r_G^0,                              (6.23)
   Xz = λ μ^0 e,
   τκ = λ μ^0,
   x, τ, z, κ > 0,

for λ ∈ (0, 1].

It is worth comparing this system with the analogous first order optimality conditions (6.4) used in the primal-dual algorithm presented in the previous section. Note, for example, that conditions (6.23) define the central path even though the model (6.19) does not have an interior point. This is important when highly degenerate problems are solved. Indeed, for this reason it might be helpful to add feasibility residuals into (6.4).

Analogously to the primal-dual algorithm, the search direction for the "infeasible" path following algorithm is generated by applying Newton's method to (6.23). Actually, in each iteration the algorithm solves the following linear equation system for the direction (Δy, Δx, Δτ, Δz, Δκ):

   bΔτ − AΔx = (γ − 1) r_P^k,
   cΔτ − A^T Δy − Δz = (γ − 1) r_D^k,
   c^T Δx − b^T Δy + Δκ = (γ − 1) r_G^k,                     (6.24)
   X^k Δz + Z^k Δx = γ μ^k e − X^k z^k,
   τ^k Δκ + κ^k Δτ = γ μ^k − τ^k κ^k,

where (r_P^k, r_D^k, r_G^k) and μ^k are the residuals at the current point (y^k, x^k > 0, τ^k > 0, z^k > 0, κ^k > 0) and γ ∈ [0,1] is a chosen reduction rate of the barrier (or path) parameter. Setting γ = 0 yields an affine direction, and setting γ = 1 yields a pure centering direction. After the Newton direction has been computed, a stepsize is chosen, using the same method as in the primal-dual algorithm, such that the new point is strictly positive.

The algorithm continues until one of the following stopping criteria is satisfied.

•  The LP problem is declared infeasible (or near infeasible) if τ becomes sufficiently small relative to κ.

•  An optimal (approximate) solution is obtained if

      ||r_P|| / (τ + ||x||) < 10^{−8},    ||r_D|| / (τ + ||z||) < 10^{−8}.

If the step length is chosen such that the updated solution is still in a certain neighborhood of the central path, then a worst case polynomial complexity result can be established. Xu [84] restricted all iterates to stay within the intersection of an ∞-norm neighborhood and a large 2-norm neighborhood of the central path. In this case, the implementation achieves the O(√n L)-iteration complexity in the worst case.

Clearly the dimension of the Newton equation system solved by the homogeneous algorithm is slightly larger than that of the corresponding system solved in the primal-dual method; in fact the dimension is increased by exactly one. The homogeneous algorithm can be implemented such that the same factorization as in the primal-dual method is computed in each iteration. However, the factorization must be used in one more solve to compute the solution of the Newton equation system; see [86] for details.

6.4 SOLVING THE NEWTON EQUATIONS

In Section 6.2 we noted that the solution of the Newton equation system is the computationally most involved task in the primal-dual method. This system reduces, in practice, to the following set of equations:

   [ −D^{−2}   A^T ] [ Δx ]   [ r ]
   [    A       0  ] [ Δy ] = [ h ].                         (6.25)

It should be noted that all IPMs solve an identical system of linear equations. The only difference is in the value of the diagonal matrix D^2 and the right-hand side. This is the reason why the comparison of different variants of interior point methods is often simplified to a comparison of the number of iterations (Newton steps).

The linear system (6.25) can be solved using either direct or iterative methods. Iterative methods, e.g. conjugate gradient algorithms, are not competitive in the general case due to the difficulty of choosing a good and computationally cheap preconditioner. Some success with iterative methods for special LP problems has been obtained, see [71, 70].

Consequently, all state of the art implementations of general purpose IPMs use a direct approach [19] to solve the Newton equations. We can be even more specific and say that they all use some variant of the symmetric triangular LΛL^T decomposition, where L is a lower triangular matrix and Λ is a block diagonal matrix with blocks of dimension 1 or 2. To complete the discussion, let us mention an alternative direct approach, the QR decomposition of A. Although this approach uses orthogonal transformations and guarantees high accuracy of the solutions, it cannot be used in practice since it is prohibitively expensive.

Summing up, the only practicable approach to solve the Newton equations in general purpose IPM codes is the LΛL^T decomposition. There exist numerous variants of its implementation. They differ essentially in the restrictions imposed on the choice of the pivot order and, from some perspective, they can all be viewed within the same unifying framework that we shall present later in this section. We will be able to do so after we have described the two major alternative approaches. The first one reduces (6.25) to the normal equations

   (A D^2 A^T) Δy = A D^2 r + h,                             (6.26)

by pivoting out the diagonal elements of −D^{−2} in (6.25). The other approach solves the augmented system (6.25) directly, without necessarily pivoting on the −D^{−2} part first.

Next we shall address some technical aspects of the implementation and its de-
pendency on the computer hardware. Due to the rapid changes in the computing
technology, a detailed discussion of the effect of computer hardware goes beyond
the scope of this book. We shall display, however, several important points where
different computer architectures influence the efficiency the most. Finally, we shall
discuss some issues of accuracy control within IPM implementations.

6.4.1 The Normal Equations Approach

An advantage of the normal equations approach is that it works with a positive definite matrix A D^2 A^T (we assume that the LP constraint matrix has full row rank; D^2 is positive definite by definition). Thus the Cholesky decomposition of this matrix exists for any D^2 and numerical pivoting is not necessary to maintain stability. Moreover, the sparsity pattern of the decomposition is independent of the value of D^2 and hence is constant in all IPM iterations. Consequently, a good sparsity preserving pivot order can be chosen with much care (even if this involves considerable computational effort) since it will be used extensively throughout the whole solution process. This argument has been used to justify the application of the normal equations approach in the first professional IPM implementations [1, 47, 57].
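As an illustration, the dense sketch below (our own stand-in using SciPy's Cholesky routines, not the codes discussed here) solves one Newton system through the normal equations (6.26). A production IPM code would instead use a sparse Cholesky factorization whose pivot order is computed once, as described next.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_step_normal_equations(A, d2, r, h):
    # Solve (6.25) through the normal equations (6.26):
    # form A D^2 A^T, factorize it by Cholesky, then recover dx.
    M = A @ (d2[:, None] * A.T)      # positive definite if A has full row rank
    L = cho_factor(M)                # numerical phase; in a sparse code the
                                     # nonzero pattern is fixed across iterations
    dy = cho_solve(L, A @ (d2 * r) + h)
    dx = d2 * (A.T @ dy - r)
    return dx, dy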

The success of an implementation of the Cholesky factorization depends on the quality of its analysis phase [19, 29], i.e. the reordering for sparsity. Its goal is to find a permutation matrix P such that the Cholesky factor of P A D^2 A^T P^T is the sparsest possible. In practice, heuristics are used to solve this problem, since finding an optimal permutation (which is, by the way, an NP-complete problem [88]) would be unacceptably expensive. Two such heuristics, namely the minimum degree and the minimum local fill-in orderings [19, 29, 30], are particularly useful in the context of IPM implementations. They are both local, i.e. they rely on a pivot choice limited to a small subset of the most attractive pivot candidates. Let us briefly discuss these two heuristics.

Minimum degree ordering

Assume that in the kth step of the Gaussian elimination, the ith column of the Schur complement contains c_i nonzero entries and its diagonal element becomes a pivot. The kth step of the elimination thus requires

   f_i = ½ (c_i − 1)^2                                       (6.27)

floating point operations (flops) to be executed. We exploit the fact that the decomposed matrix A D^2 A^T is positive definite, so the pivot choice can be limited to the diagonal elements. In fact, this choice has to be limited to diagonal elements to preserve symmetry. The function f_i evaluates the computational effort and gives an overestimate of the fill-in that can result from the elimination if the ith diagonal element becomes a pivot (f_i is the Markowitz merit function [55] applied to a symmetric matrix [79]).

The "best" pivot at step k, in the sense of the number of flops required to perform
the elimination step, is the one that minimizes Ii. Interpreting this process in terms
of the elimination graph [29}, one can see that it is equivalent to the choice of
the node in the graph which has the minimum degree (this gave the name to this
heuristic). The minimum degree ordering algorithm can be implemented efficiently
both in terms of speed and storage requirements. For details, the reader is referred
to the excellent summary in [30].
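The idea itself can be stated in a few lines. The sketch below is our own naive, quadratic-time illustration on the elimination graph; efficient implementations rely on quotient graphs, supernodes and other refinements surveyed in [30].

def minimum_degree_order(adjacency):
    # adjacency: dict mapping each node to the set of its neighbours
    # (the off-diagonal nonzero structure of A D^2 A^T).
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order = []
    while adj:
        # choose the node of minimum degree, i.e. of minimum f_i in (6.27)
        v = min(adj, key=lambda u: len(adj[u]))
        order.append(v)
        nbrs = adj.pop(v)
        # eliminating v makes its neighbours pairwise adjacent (the fill-in)
        for u in nbrs:
            adj[u].discard(v)
            adj[u] |= (nbrs - {u})
    return order

For example, minimum_degree_order({0: {1, 2}, 1: {0}, 2: {0}}) eliminates one of the degree-one nodes 1 or 2 first and leaves node 0 for last.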

Minimum local fill-in ordering

Let us observe that, in general, the function (6.27) considerably overestimates the expected number of fill-ins in a given iteration of the Gaussian elimination, because it does not take into account the fact that in many positions of the predicted fill-in, nonzero entries already exist. It is possible that another pivot candidate, although more expensive in terms of (6.27), would produce less fill-in, as the elimination step would mainly update already existing nonzero entries of the Schur complement. The minimum local fill-in ordering chooses such a pivot. Generally, the minimum local fill-in algorithm produces a sparser factorization, but at a higher initial cost of obtaining the ordering [54], because the analysis that exactly predicts the fill-in and chooses the pivot producing its minimum number is very expensive.

Another efficient technique to determine the pivot order has been proposed in [65]. The method first selects a set of attractive pivot candidates and, in the next step, chooses from this smaller set the pivot that generates the minimal predicted fill-in. Computational experience shows a considerable improvement in speed without loss in the quality of the ordering.

Numerical examples

To give the reader some rough idea about the advantages of the two competitive
ordering schemes, we shall compare their performance on a subset of medium scale
linear problems from the Netlib collection [28]. Table 6.1 collects the results of
this comparison. Abbreviations MDO and MFO in it denote the minimum degree
ordering and the minimum local fill-in ordering, respectively.

The first three columns of Table 6.1 contain the problem names and the times (in
seconds) of the analysis phase for the two orderings considered. The analysis time
includes the setup for the ordering (i.e. building a representation of AAT), the order-
ing time, and the time for building the nonzero patterns of the Cholesky factors. For

Table 6.1 Comparison of minimum degree (MDO) and minimum local fill-in
(MFO) orderings
Name        Analysis time       Nonzeros in L         Flops in thousands     Factorization time
            MDO      MFO        MDO       MFO         MDO       MFO          MDO       MFO
25fv47      0.50     1.38       32984     27219       1282      811          0.345     0.244
80bau3b     1.22     2.12       37730     34006       1171      893          0.424     0.361
bnl2        0.91     2.82       59437     56705       3860      3420         0.957     0.889
cycle       0.93     1.80       54682     39073       2004      920          0.565     0.305
d2q06c      1.89     5.74       135960    91614       11327     4752         2.693     1.308
degen3      20.77    13.33      119403    115681      7958      7403         2.312     2.198
dfl001      37.40    552.44     1632101   1445468     711739    547005       160.471   129.905
greenbea    2.21     2.11       47014     45507       907       842          0.379     0.341
grow22      0.21     0.51       8618      8590        157       156          0.064     0.055
maros-r7    6.70     47.49      510148    511930      70445     72568        15.730    15.945
pilot       5.67     25.18      191704    172264      24416     18956        5.704     4.731
pilot87     19.27    110.71     423656    389787      88504     75791        20.725    18.138
pilot-we    0.29     0.58       14904     13887       350       292          0.124     0.100

both algorithms, the ordering time is the dominating factor. The following columns
contain the number of nonzeros in the Cholesky factors produced by the two order-
ings, the number of flops (in thousand) needed to compute the factorization including
flops required by the computation of AAT. The last two columns contain the average
time (in seconds) to execute one factorization on a SUN Sparc-10 workstation.

The results presented in Table 6.1 indicate that MDO is usually faster than MFO
(degen3 is one exception) but it usually produces denser Cholesky factors. Without
going into details, we note that on problems where the nonzeros of AAT are con-
centrated in a tight band near the diagonal (e.g.: grow22, maros-r7), MFO does
not offer any advantage over MDO. In contrast, on problems with "hard" structures
(e.g.: cycle, dfl001) MFO may be more efficient. Figure 6.1 shows the sparsity
patterns of the Cholesky factors obtained by the minimum degree and minimum
local fill-in orderings for the problem cycle, on which the largest difference between
the two heuristics has been observed.

We have to be careful when giving final conclusions. An additional difficulty comes


with the fact that the numerical factorization depends very much on the hardware
and, in particular, on the ratio of the performance of integer and floating point
operations on a given machine. We shall address this problem in more detail in
Section 6.4.4. Here we only conclude that the minimum degree ordering performs
sufficiently well to be a default option in any IPM implementation. In some cases,
however, when very difficult problems are solved or a sequence of problems with the
same sparsity patterns is solved, the more involved analysis of the minimum local
fill-in ordering may pay off.

Figure 6.1 Sparsity pattern with the MDO (left) and MFO (right) on problem
cycle


Disadvantages of the normal equations approach

The normal equations approach shows a uniformly good performance when applied
to the solution of the majority of all linear programs. Unfortunately, it suffers from
two drawbacks.

Normal equations behave badly whenever the primal linear program contains free
variables. To transform the problem to the standard form (6.1), any free variable
has to be replaced with a difference of two nonnegative variables: $x_F = x^+ - x^-$.
The presence of logarithmic terms in the objective function causes very fast growth
of both split brothers. Although their difference may be kept relatively close to the
optimal value of $x_F$, both $x^+$ and $x^-$ tend to infinity. This results in a serious loss
of accuracy in (6.26). A remedy used in many IPM implementations is to prevent
excessive growth of x+ and x-.

A more serious drawback of the normal equations approach is that it suffers dramat-
ically from the presence of dense columns in A. The reason is that a dense column
in A with p nonzero elements creates a dense window of size $p \times p$ in the $AD^2A^T$
matrix (subject to its symmetric row and column permutation). Assume that

$$A = [A_1 \;\; A_2], \qquad (6.28)$$

where $A_1 \in \mathbb{R}^{m \times (n-k)}$ and $A_2 \in \mathbb{R}^{m \times k}$ are matrices built of sparse and dense columns,
respectively. Several techniques have been proposed to treat the $A_2$ part separately.

The simplest one, due to Birge et al. [8] makes a choice between the factorizations
of AAT and AT A matrices. The latter factorization easily accommodates dense
columns of A (dense rows of AT). The approach clearly fails when A contains both
dense columns and dense rows.

Another possibility is the column splitting technique [35, 82]. It cuts a long column
into shorter pieces, introducing additional linking constraints. Unfortunately, it
works satisfactorily only for a small number of dense columns [37].

The most popular way of treating dense columns within the normal equations ap-
proach employs the Schur complement mechanism. It is based on (6.28) and an
explicit decomposition of the matrix

$$AD^2A^T = A_1D_1^2A_1^T + A_2D_2^2A_2^T \qquad (6.29)$$

into a presumably sparse part $A_1D_1^2A_1^T$ and a significantly denser symmetric rank-k
update of it. A Cholesky decomposition is then computed for the "sparse" part and

the dense rank-k update is handled via the Sherman-Morrison-Woodbury formula.


This method is not guaranteed to work correctly because the sparse part may be rank
deficient (clearly, a full row rank assumption on A does not guarantee that $A_1$ has full
row rank). Whenever this happens, the Cholesky decomposition of $A_1D_1^2A_1^T$ does not
exist and the Sherman-Morrison-Woodbury update is not well defined. Therefore in
a practical implementation a small diagonal regularization term is added to $A_1D_1^2A_1^T$
such that the decomposition exists. The method usually works satisfactorily for a
small number of dense columns.
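
As an illustration of this mechanism, the following sketch (Python with dense NumPy algebra standing in for sparse factorizations; the function and variable names are ours, not taken from any of the codes discussed here) factorizes the sparse part once, optionally with the diagonal regularization just mentioned, and handles the rank-k dense update with the Sherman-Morrison-Woodbury formula.

```python
import numpy as np

# Solve (A1 D1^2 A1^T + A2 D2^2 A2^T) y = b when A2 holds a few dense columns.
def solve_with_dense_columns(A1, d1, A2, d2, b, sigma=0.0):
    m, k = A2.shape
    # "sparse" part, optionally regularized with a small diagonal term
    M1 = A1 @ np.diag(d1**2) @ A1.T + sigma * np.eye(m)
    L = np.linalg.cholesky(M1)                 # stands in for a sparse Cholesky

    def M1_solve(r):                           # two triangular solves
        return np.linalg.solve(L.T, np.linalg.solve(L, r))

    U = A2 * d2                                # M = M1 + U U^T, with U of size m x k
    # Sherman-Morrison-Woodbury:
    # M^{-1} b = M1^{-1} b - M1^{-1} U (I + U^T M1^{-1} U)^{-1} U^T M1^{-1} b
    y0 = M1_solve(b)
    Z = M1_solve(U)                            # columns M1^{-1} u_i
    S = np.eye(k) + U.T @ Z                    # small k x k Schur complement
    return y0 - Z @ np.linalg.solve(S, U.T @ y0)

# tiny random example to check the identity
rng = np.random.default_rng(0)
A1, A2 = rng.standard_normal((6, 10)), rng.standard_normal((6, 2))
d1, d2, b = rng.random(10) + 0.1, rng.random(2) + 0.1, rng.standard_normal(6)
y = solve_with_dense_columns(A1, d1, A2, d2, b)
M = A1 @ np.diag(d1**2) @ A1.T + A2 @ np.diag(d2**2) @ A2.T
print(np.allclose(M @ y, b))                   # True
```

The cost of the correction grows only with k, which is why the approach is attractive when the number of dense columns is small.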

Recently, Andersen [5] proposed a remedy to the rank deficiency arising in the Schur
complement mechanism. His approach employs an old technique due to Stewart
[74]. The technique corrects all unacceptably small pivots during the Cholesky factorization
by adding a regularizing diagonal term to them. Consequently, instead of
computing the decomposition of $A_1D_1^2A_1^T$, it computes a decomposition of another
matrix $A_1D_1^2A_1^T + \sigma EE^T$, where $\sigma$ is a regularizing term and E is a matrix built
of unit columns with nonzeros appearing in rows corresponding to corrected pivots.
Once such a stable decomposition is obtained,

$$A_1D_1^2A_1^T + \sigma EE^T = L\Lambda L^T, \qquad (6.30)$$

it is used as a stable "working basis" in the Sherman-Morrison-Woodbury update to
compute the solution of the normal equations (6.26). Stewart's technique is attractive,
of course, only for a small rank deficiency of $A_1D_1^2A_1^T$.

Andersen [5] observed that the rank deficiency of $A_1D_1^2A_1^T$ cannot exceed k, the
number of columns handled separately. His method consists of correcting too small
pivots in the factorization of $A_1D_1^2A_1^T$ by computing a (stable) Cholesky
decomposition of the regularized matrix as in (6.30). Next, this factorization is
employed in the Schur complement mechanism to compute the Newton direction.

Summing up, it is possible to overcome the most important drawback of the normal
equations approach, i.e. to handle dense columns in it. However, there still remains a
question about the heuristic to choose the columns that should be treated separately.

A trivial selection rule based on the number of nonzero elements in a column does
not identify all "hard" columns; we shall discuss this issue in the next section.

Recall that the Schur complement mechanism is efficient if the number of dense
columns in the constraint matrix is not excessive. This motivated several researchers
to pay special attention to the augmented system form of the Newton equations
which allows more freedom in the pivot choice.

6.4.2 The Augmented System Approach


The augmented system approach is an old and well understood technique to solve a
least squares problem [6, 7, 11, 19]. It consists in the application of the Bunch-Parlett
[13] factorization to the symmetric indefinite matrix

$$\begin{bmatrix} -D^{-2} & A^T \\ A & 0 \end{bmatrix} = L\Lambda L^T, \qquad (6.31)$$

where $\Lambda$ is an indefinite block diagonal matrix with $1 \times 1$ and $2 \times 2$ blocks.

In contrast to the normal equations approach in which the analysis and factorization
phases are separated, the factorization (6.31) is computed dynamically. This means
that the choice of pivot is concerned with both the sparsity and stability of the
triangular factor. It is obvious that, due to the careful choice of stable pivots, this
factorization must be at least as stable as the one of the normal equations. On
the other hand, due to the greater freedom in the choice of the pivot order, the
augmented system factorization may produce a significantly sparser factor than that
of the normal equations. Indeed the latter is a special case of (6.31) in which the
first n pivots are chosen from the $D^{-2}$ part regardless of their stability properties and
without any concern about the fill-in they produce.
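
The following toy comparison (dense Python/NumPy with assumed random data; a real code would use a sparse Bunch-Parlett factorization rather than a general solver) illustrates that the augmented system and the normal equations deliver the same Newton direction, the difference lying only in which matrix is factorized.

```python
import numpy as np

# Compare the augmented system (6.31) with the normal equations:
# both recover the same (dx, dy) from the same data.
def newton_directions(A, d, r, h):
    m, n = A.shape
    Dm2 = np.diag(1.0 / d**2)
    # augmented system [ -D^{-2}  A^T ; A  0 ] [dx; dy] = [r; h]
    K = np.block([[-Dm2, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([r, h]))   # Bunch-Parlett in real codes
    dx_aug, dy_aug = sol[:n], sol[n:]
    # normal equations: AD^2A^T dy = h + A D^2 r, then dx = D^2 (A^T dy - r)
    D2 = np.diag(d**2)
    dy = np.linalg.solve(A @ D2 @ A.T, h + A @ D2 @ r)
    dx = D2 @ (A.T @ dy - r)
    return (dx_aug, dy_aug), (dx, dy)

rng = np.random.default_rng(1)
A, d = rng.standard_normal((3, 7)), rng.random(7) + 0.1
r, h = rng.standard_normal(7), rng.standard_normal(3)
aug, ne = newton_directions(A, d, r, h)
print(np.allclose(aug[0], ne[0]), np.allclose(aug[1], ne[1]))  # True True
```

Eliminating the $-D^{-2}$ block of the augmented matrix reproduces exactly the normal equations matrix, which is the algebraic content of the remark above.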

Advantageous stability properties of the augmented system approach motivated sev-


eral researchers to incorporate it into their IPM codes [20, 26, 56, 81, 83]. Soon
afterwards, other advantages of this approach, namely, an ease of handling free LP
variables and dense columns in A and an ability of its easy extension to handling
quadratic programming problems were recognized [83, 60, 14].

The success of the augmented system factorization depends highly on the efficiency
of the pivot selection rule. Additionally, to save on the expensive analysis phase, the
pivot order is reused in subsequent IPM iterations and only occasionally updated
when the numerical properties of the Newton equation matrix have changed considerably.
Mehrotra's implementation [26, 60], for example, is based on the Bunch-Parlett
factorization [13] and on the use of the generalized Markowitz [55] count of
type (6.27) for $2 \times 2$ pivots.

On the other hand, it has been shown in [66] that the 1 x 1 pivot scheme is always
valid when computing the symmetric factorization of the augmented matrix, and if a
valid pivot order is computed for a certain D2, it will in theory be valid for arbitrary
D2 matrices occurring during the interior point iterations. However, this ordering
might be numerically unstable.

A popular pivot selection rule is to detect "dense" columns and to pivot
first on the diagonal positions of $D^{-2}$ in the augmented matrix falling outside of
them. A difficulty arises, however, with the choice of a threshold density used to
group columns of A into the sparse and the dense parts in (6.28). A fixed threshold
value approach works well only in a case when dense columns are easily identifiable,
i.e. when the number of nonzeros in each of them exceeds significantly the average
number of entries in sparse columns [83]. Whenever more complicated sparsity
structure appears in A, a more sophisticated heuristic is needed. Maros and Meszaros
[56] give a detailed analysis of this issue that we shall present below.

Instead of (6.28), they consider the following partition of the LP constraint matrix

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad (6.32)$$

where $A_{11}$ is supposed to be very sparse and additionally it is assumed to create
a sparse adjacency structure $A_{11}A_{11}^T$, $A_{12}$ is a presumably small set of "difficult"
columns, e.g., dense columns or columns referring to free variables, and $[A_{21} \; A_{22}]$
is a set of "difficult" rows. An efficient heuristic to find such a partition is given in
[56].

Once the partition (6.32) is determined, the augmented system (6.25) is partitioned
accordingly, with a diagonal block $-D_1^{-2}$ for the columns of $[A_{11}; A_{21}]$ and a block
$-D_2^{-2}$ for the "difficult" columns of $[A_{12}; A_{22}]$. The analysis of this system shows
immediately which block can be inexpensively pivoted out and which one should be
delayed as much as possible. Elimination of $D_1^{-2}$ causes very limited fill-in and reduces
the matrix to

$$\begin{bmatrix} -D_2^{-2} & A_{12}^T & A_{22}^T \\ A_{12} & A_{11}D_1^2A_{11}^T & A_{11}D_1^2A_{21}^T \\ A_{22} & A_{21}D_1^2A_{11}^T & A_{21}D_1^2A_{21}^T \end{bmatrix}. \qquad (6.33)$$

The elimination of the $D_2^{-2}$ block should be delayed until all attractive pivot candidates
from the $A_{11}D_1^2A_{11}^T$ and $A_{21}D_1^2A_{11}^T$ blocks are exploited. The normal equations
approach makes no such distinction and pivots out both the $D_1^{-2}$ and $D_2^{-2}$ blocks.

It is worth noting the close relationship between the approach of [56] and the Schur
complement mechanism applied to handle the block of "difficult" columns in A. Observe
that the normal equations, whose matrix written with the partition (6.32) is

$$AD^2A^T = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} D^2 \begin{bmatrix} A_{11}^T & A_{21}^T \\ A_{12}^T & A_{22}^T \end{bmatrix},$$

can be replaced with the following system

$$\begin{bmatrix} -D_2^{-2} & \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix}^T \\ \begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix} & \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} D_1^2 \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix}^T \end{bmatrix}, \qquad (6.34)$$

in which all "difficult" columns are handled as a symmetric rank-k update of an
"easy" part (cf. (6.29))

$$\begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix} D_1^2 \begin{bmatrix} A_{11} \\ A_{21} \end{bmatrix}^T.$$

It is easy to verify that the matrix involved in the system (6.34) has exactly the
same sparsity pattern (subject to symmetric row and column permutations) as that
in (6.33).

Normal equations versus the augmented system

Table 6.2 compares the efficiency of the normal equations (NE) and the augmented
system (AS) approaches. We cluster our test problems into three groups. The first
group contains problems with dense columns (aircraft, fitlp, fit2p). In the
second group we collect some problems without dense columns, but with a "preju-
dicial" nonzero pattern for the normal equations (ganges, pilot4. stair). The
last group contains problems without any advantageous structure for the augmented
system. The first two columns of Table 6.2 contain the name of the problem and
the number of nonzeros in the densest column. The following two columns show
the setup time (in seconds) for the two competing approaches. Note that the setup
time includes not only the generation of the pivot order and the sparsity pattern
analysis but also the time of one numerical factorization. Columns 5 and 6 contain

Table 6.2 Comparison of normal equations (NE) and augmented system (AS)
factorizations
Name       Dens.   Analysis time      Nonzeros              Flops in 1000's     Fact. time
           col.    NE       AS        NE        AS          NE       AS         NE       AS
aircraft   751     115.2    0.97      1437398   20317       361174   37         79.19    0.122
fit1p      627     14.22    0.33      206097    10120       42920    63         9.281    0.058
fit2p      3000    -        1.73      -         50583       -        266        -        0.328
ganges     13      0.58     0.98      35076     23555       770      316        0.252    0.122
pilot4     27      0.64     0.58      18851     14153       488      265        0.146    0.082
stair      34      0.44     0.48      17990     11693       461      188        0.129    0.062
25fv47     21      0.93     2.77      43202     43569       1282     1297       0.363    0.412
80bau3b    11      2.02     3.38      57202     57683       1171     1181       0.476    0.487
d2q06c     34      5.34     22.30     167318    178763      11328    14480      2.85     3.604

the number of nonzeros in the factorization (in the case of the NE, this corresponds to
the sum of nonzeros in the Cholesky factor of (6.26) and the nonzeros in A). Columns
7 and 8 contain the number of flops (in thousands) required by one factorization
for the two approaches compared. The last two columns show the average times (in
seconds) to compute one factorization during the algorithm. All results are obtained
on a SUN Sparc-10 workstation.

The results of Table 6.2 obtained for problems with dense columns show an un-
questionable advantage of the augmented system over a trivial implementation of
the normal equations in which dense columns are not handled separately. Our 64
Mbyte workstation was unable to store the lower triangular part of a 3000 x 3000
totally dense matrix that resulted from the normal equations approach applied to
the problem fit2p. In contrast, the augmented system produced a very sparse fac-
torization in this case. For our second group of problems, the performance of the
augmented system is also much better. Finally, for our third group of problems, the
much lower setup cost of the normal equations made the augmented system approach
disadvantageous.

Figure 6.2 gives a bit of insight into the sparsity patterns generated for the prob-
lem stair. It displays the factored augmented matrices for the two competitive
approaches.

Based on the previous examples, we find that both methods are important for
computational practice. It would be advantageous to have both of them implemented,
as well as an analyzer that is able to determine which of them should be
used [56].

Figure 6.2 Sparsity patterns with the NE (left) and AS (right) pivot rule on
problem stair

6.4.3 The Numerical Factorization


In this section we shall demonstrate several issues of the implementation of the
numerical factorization step. We use the normal equations approach because of
notational convenience, but the methods used here can be applied in a similar way
to the general symmetric decomposition of the augmented system.

Let $M = AD^2A^T$ and consider its Cholesky factorization $M = L\Lambda L^T$, where L is
a lower triangular matrix and $\Lambda$ is a diagonal matrix. We note that the solution
of a sparse symmetric system of linear equations is a very important problem in
scientific computing. Therefore, it is a well developed area both in theory and in
the computational practice. The basic formulae for computing the column j of L
(denoted by $L_j$) and the pivot $\Lambda_{jj}$ are:

$$\Lambda_{jj} = M_{jj} - \sum_{k=1}^{j-1} \Lambda_{kk}\, l_{jk}^2, \qquad (6.35)$$

$$L_j = \frac{1}{\Lambda_{jj}} \left( M_j - \sum_{k=1}^{j-1} (\Lambda_{kk}\, l_{jk}) L_k \right). \qquad (6.36)$$

Several approaches have been developed to compute the factorization. They exploit
sparsity in an efficient way and use different techniques of storage management in
the computations. George and Liu [29] demonstrate how these calculations can be
organized either by rows or by columns.
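
A dense, unoptimized sketch of formulae (6.35)-(6.36) in Python may help fix the ideas; it is only meant to show the recurrences, not the sparse data structures and storage schemes discussed below.

```python
import numpy as np

# Dense illustration of (6.35)-(6.36): the pivot Lambda_jj and the column L_j
# are computed one after another, using the columns to the left (left-looking).
def ldl_left_looking(M):
    n = M.shape[0]
    L, lam = np.eye(n), np.zeros(n)
    for j in range(n):
        # (6.35): pivot = M_jj minus contributions of the already computed columns
        lam[j] = M[j, j] - sum(lam[k] * L[j, k]**2 for k in range(j))
        # (6.36): column j of L accumulates the updates from columns k < j
        col = M[j+1:, j].copy()
        for k in range(j):
            col -= (lam[k] * L[j, k]) * L[j+1:, k]
        L[j+1:, j] = col / lam[j]
    return L, lam

M = np.array([[4.0, 2.0, 0.0],
              [2.0, 5.0, 1.0],
              [0.0, 1.0, 3.0]])
L, lam = ldl_left_looking(M)
print(np.allclose(L @ np.diag(lam) @ L.T, M))   # True
```

The inner loop over k is exactly the "column modification" that the row-, column- and submatrix-oriented schemes described next organize in different orders.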

During the row-Cholesky factorization the rows of the Cholesky factor L are com-
puted one by one. This approach is called the bordering method. Several enhance-
ments of it can be found in [29, 50].

An alternative approach is the column-Cholesky factorization in which the columns


of L are computed one by one. This is the most commonly used form; its efficient
implementations can be found, for example, in the Yale Sparse Matrix Package [21]
and Waterloo SPARSPAK [29]. This method is also called left looking factorization,
because for computing the column Lj the information from the left part of the factor
(i.e. the columns prior to Lj) is used in the computations. Its implementation uses
dynamic linked lists to identify the 'left' columns when updating the pivot column,
and a double precision work array to accumulate the column modifications and to
resolve the nonzero matching between different columns.

The third approach is the submatrix-Cholesky factorization, also referred to as the


right looking factorization. In this approach, once a column Lj has been computed,
it immediately generates all contributions to subsequent columns, i.e. to columns to

the right of it in the matrix. The matching of nonzeros during the transformations
with this approach is not a trivial problem; several solutions have been found for its
efficient implementation [19, 72]. The interest in this approach has increased in the
past few years because of its ability to better exploit high performance architectures
and the memory hierarchy.

We shall present a few of the most important techniques that increase the efficiency
of the numerical factorization step in interior point methods. These techniques
come from parallel and vector computations and the common trick is the use of
the matrix-vector operations in 'dense' mode to reduce the overhead of the sparse
computations.

Dense window

The most straightforward improvement of the factorization is exploitation of the


dense window. In practice, the triangular factors become completely dense in the
last steps of the Cholesky factorization (see e.g. Figures 6.1 and 6.2). The last
partition of columns can be handled as a dense matrix when those columns are
factored. In this way the overhead of doing sparse computations is avoided. It
might also be advantageous to include some almost dense columns into the dense
window (see, e.g., [19]).

Supernodes

The dense window technique can be generalized using the following observation. Due
to the way the Cholesky decomposition works, some blocks of columns in L tend to
have the same sparsity pattern below the diagonal. Such a block of columns is called
a supernode and it can be treated as a dense submatrix. The supernode terminology
comes from the elimination graph representation of the Cholesky decomposition [29].
There exist two different types of supernodes: they are presented in the figures below.
Type 1 supernode Type 2 supernode

* *
* * *
* * * *
* * * * * *
* * * * * *
* * * * * *

Both types of supernodes are exploited in a similar manner within the numerical
factorization step. Analogously to the dense window technique, the use of supernodes
increases the portion of flops that use dense matrix-vector transformations to save on
indirect addressing and memory references. The following operations take advantage
of the presence of supernodes:

(i) whenever column j is a member of a supernode, the operation of building Lj


is simplified (operations on other members of the supernode are done in dense
mode);
(ii) whenever column j is not a member of a supernode (but it depends on a set of
columns that belong to a supernode), then a temporary work array is used to
accumulate the contribution of the whole supernode before this contribution is
added to Lj.

It is advisable to impose a lower bound on the size of supernodes since the extra work
in step (ii) does not pay off in the case of too small supernodes. Another suggestion
is the use of an upper bound on the number of nonzeros in each supernode to better
exploit the cache memory on several computer architectures [52]. The effect of the
supernodal methods is highly hardware-dependent and several results can be found
in the literature: the efficiency of the supernodal decomposition on the shared-
memory multiprocessors is discussed by Esmond and Peyton [69], the exploitation
of the cache memory on high-performance workstations is studied by Rothberg and
Gupta [72] in the framework of the right looking factorization while the case of the
left looking factorization was investigated by Meszaros [64].
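
A minimal sketch of how type 1 supernodes could be detected from the symbolic structure of L is given below (Python; the data representation is an assumption made for the example and not the layout used in the codes cited above).

```python
# Group consecutive columns of L into (type 1) supernodes.
# 'cols' is a list; cols[j] is the set of row indices of the nonzeros of
# column j strictly below the diagonal (symbolic information only).
def find_supernodes(cols):
    supernodes, start = [], 0
    for j in range(1, len(cols)):
        # column j extends the current supernode if its pattern equals the
        # previous column's pattern with row index j removed
        if cols[j] == cols[j - 1] - {j}:
            continue
        supernodes.append(range(start, j))
        start = j
    supernodes.append(range(start, len(cols)))
    return supernodes

# small example: the last four columns share their trailing pattern
cols = [{3, 5}, {2, 4}, {3, 4, 5}, {4, 5}, {5}, set()]
print([list(s) for s in find_supernodes(cols)])
# [[0], [1], [2, 3, 4, 5]]  -- the final block is the dense window
```

Once such blocks are known, the column updates inside each block can be performed with dense kernels, which is precisely the point of the supernodal techniques.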

Block Cholesky factorization

Another possibility to use dense computations is the partitioning of L into smaller,


presumably dense blocks. We try to divide L into block diagonal 'supernodal' sub-
matrices. This technique is very effective in some cases, because the typical Cholesky
factor contains many such blocks (the largest of them is usually the dense window
located at the bottom of the matrix). Consider the following matrix:

$$B = \begin{bmatrix} B_{11} & B_{21}^T \\ B_{21} & B_{22} \end{bmatrix},$$

with a further simplifying assumption that the blocks $L_{11}$ and $L_{22}$ of the Cholesky
factor define supernodes. The Cholesky factorization of this matrix can be computed
in the following steps:

1. Factorize $L_{11}\Lambda_{11}L_{11}^T = B_{11}$.

2. Update $L_{21} = B_{21}L_{11}^{-T}\Lambda_{11}^{-1}$.

3. Update $\hat{B}_{22} = B_{22} - L_{21}\Lambda_{11}L_{21}^T$.

4. Factorize $L_{22}\Lambda_{22}L_{22}^T = \hat{B}_{22}$.

The advantage is that steps 1, 2, and 4 can be performed in dense mode, resulting
in a very efficient implementation on high performance computers.
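
The steps above can be written down directly; the following dense Python sketch (with a small LDL^T helper and assumed test data) mirrors steps 1-4 and verifies the result on a random positive definite matrix.

```python
import numpy as np

def ldl(M):                      # small helper: dense LDL^T of an SPD block
    n = M.shape[0]
    L, lam = np.eye(n), np.zeros(n)
    for j in range(n):
        lam[j] = M[j, j] - (L[j, :j]**2 * lam[:j]).sum()
        L[j+1:, j] = (M[j+1:, j] - L[j+1:, :j] @ (lam[:j] * L[j, :j])) / lam[j]
    return L, np.diag(lam)

# 2x2 block factorization following steps 1-4 above (dense illustration).
def block_ldl(B11, B21, B22):
    L11, Lam11 = ldl(B11)                                     # step 1
    L21 = B21 @ np.linalg.inv(L11.T) @ np.linalg.inv(Lam11)   # step 2 (dense mode)
    B22_hat = B22 - L21 @ Lam11 @ L21.T                       # step 3
    L22, Lam22 = ldl(B22_hat)                                 # step 4
    return L11, L21, L22, Lam11, Lam22

rng = np.random.default_rng(2)
G = rng.standard_normal((5, 5)); B = G @ G.T + 5 * np.eye(5)   # SPD test matrix
L11, L21, L22, Lam11, Lam22 = block_ldl(B[:3, :3], B[3:, :3], B[3:, 3:])
L = np.block([[L11, np.zeros((3, 2))], [L21, L22]])
Lam = np.block([[Lam11, np.zeros((3, 2))], [np.zeros((2, 3)), Lam22]])
print(np.allclose(L @ Lam @ L.T, B))   # True
```

Steps 1, 2 and 4 operate on contiguous dense blocks, which is where high performance hardware can be exploited.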

Loop unrolling

Dense computations can be further specialized to exploit a loop unrolling technique.


The typical inner loop of the factorization adds a multiple of a column to another
one. Let a be the target column, b the source column and c the multiplier. If we
assume that c is kept in a single register, then the steps performed by the computer
to execute one transformation a <- a + cb can be written as follows:

1. read a( i) from the memory,

2. read b(i) from the memory,

3. compute a(i) + cb(i),


4. store the result in the memory.

Consequently, three memory references (steps 1, 2, and 4) are associated with only
one arithmetical operation (step 3). During the factorization, multiple column mod-
ifications are performed on a single column, which opens the possibility to unroll the
loop over the column transformations. Let a be the target column, b, c, d, e, f and g
be the source columns, and h(1), ..., h(6) their scalar multipliers, respectively. A
6-step loop unrolling technique consists of the following transformation:

a <- a + h(1)b + h(2)c + h(3)d + h(4)e + h(5)f + h(6)g.

An execution of this transformation needs only eight memory references and six
arithmetical operations (multiplications). Hence, ten memory references have been
saved compared with an execution of six elementary flops that do not exploit loop

Table 6.3 Comparison of different techniques of the factorization


Name        T(ll)     T(dw)     T(sn1)    T(sn2)    T(sn4)    T(sn6)
aircraft    0.131     0.143     0.142     0.135     0.133     0.129
fit2p       0.344     0.324     0.343     0.347     0.334     0.328
80bau3b     0.504     0.464     0.528     0.480     0.428     0.424
25fv47      0.471     0.437     0.463     0.404     0.349     0.346
dfl001      434.964   349.539   219.333   191.068   175.221   160.594
maros-r7    28.901    26.285    22.308    18.422    16.416    15.734

unrolling. This technique brings considerable time savings on all computer architec-
tures although the savings may vary significantly on different computers.

Numerical examples

To give the reader some idea about the efficiency of all techniques discussed by now
(i.e. dense window, supernodes and loop unrolling), we show the computational re-
sults of their application on a small set of test problems for one widely used computer
architecture, namely, a SUN Sparc-10 workstation.

In Table 6.3 we compare the times (in seconds) of computing one decomposition
with a standard left looking factorization, T(ll), the one using dense window tech-
nique, T(dw) , supernodal factorization without loop unrolling, T(snl), and, finally,
supernodal factorizations with 2-, 4-, and 6-step loop unrolling, T(sn2), T(sn4) and
T(sn6), respectively.

To cover a possibly wide set of LP problems, we have chosen 6 test examples with
very different characteristics. Problems aircraft and fit2p have extremely sparse
factorization with the augmented system (cf. Table 6.2). Problems 80bau3b and
25fv47 are "usual" sparse problems, and maros-r7 and dfl001 are examples of
very dense ones. In the case of sparse problems, these techniques have very little
influence on the factorization times. On usual "sparse" problems, the dense window
method is unequivocally superior to the standard left looking method but the savings
resulting from the use of supernodes and loop unrolling are not evident. The 4-step
loop unrolling gives a better execution time, but the effect of the 6-step loop unrolling
is negligible. Finally, for dense problems, the superiority of the simple supernodal
method over the use of the dense window is unquestionable; moreover, the computation
times monotonically decrease with the degree of the loop unrolling.

6.4.4 Hardware-Dependent Aspects


As noted previously, modern computer architectures have many different character-
istics that influence the choice of the algorithm to be applied. The most important
of them are:

• cache memory,
• pipelining,
• vectorization,
• superscalar capabilities.

The simple choice of the "best" algorithm is usually impossible without extensive
computational tests. The reader can find a good discussion of these issues (as well
as many numerical examples) in [52]. Let us collect some general suggestions.

There is a choice between two ordering methods: the minimum degree and the min-
imum local fill-in one. The hint is the ratio of the cost of integer (or logical) and
floating point operations. If the latter are executed fast compared with the former,
then there is little chance that the savings in numerical factorization can compensate
excessive effort during the analysis phase. In this case, the faster minimum degree
ordering seems more appropriate. In other cases, e.g., for standard "low-cost" work-
stations, the minimum local fill-in ordering may become an attractive alternative.

In the numerical factorization phase the two most commonly used methods are the
right looking and left looking algorithms. The right looking factorization exploits the
cache memory better, because the supernodes enter into the cache only once during
the factorization, while in the case of the left looking factorization a supernode
enters cache memory many times. However, the right looking factorization requires
additional indirect addressing. As can be presumed, the criteria for choosing
the numerical factorization algorithm must be based on the investigation of the cache
memory (its size, the time of bringing information into it, etc.).

6.4.5 Handling of Rank Deficiency and


Instability
The standard assumption in the theory of interior point methods is that the LP con-
straint matrix A has full row rank. In practice, this property may not be satisfied. It

is possible to determine the maximum set of independent rows using Gaussian Elim-
ination [2]. The computational cost of such an operation is relatively low in most of
the cases. On the other hand, IPMs that use a starting point similar to (6.11) can
benefit from the additional Cholesky factorization of AD2 AT with a well conditioned
D2 to detect linearly dependent rows. Whenever a pivot in the factorization falls
below a predetermined tolerance, i.e. $\Lambda_{ii} < \epsilon$, row i can be dropped from the LP
model. Although the latter approach is, in general, less reliable than the specialized
Gaussian Elimination of A, its application does not need any additional computa-
tional effort as it only exploits the factorization that anyway has to be computed.
The practice shows that this approach solves the problem of dependent rows as well.
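
A simplified sketch of this pivot-based detection is given below (Python, dense algebra, D = I and an illustrative tolerance; when a tiny pivot is met, the pivot is replaced by one and the column zeroed, in the spirit of the "virtual" removal mentioned later in this section).

```python
import numpy as np

# Detect (nearly) dependent rows of A from the pivots of an LDL^T
# factorization of A A^T (i.e. with a well conditioned D = I).
def dependent_rows(A, eps=1e-10):
    M = A @ A.T
    m = M.shape[0]
    L, lam, dropped = np.eye(m), np.zeros(m), []
    for j in range(m):
        lam[j] = M[j, j] - (L[j, :j]**2 * lam[:j]).sum()
        if lam[j] < eps * max(M[j, j], 1.0):     # tiny pivot => dependent row
            dropped.append(j)
            lam[j] = 1.0                          # "virtual" removal of the row
            L[j+1:, j] = 0.0
            continue
        L[j+1:, j] = (M[j+1:, j] - L[j+1:, :j] @ (lam[:j] * L[j, :j])) / lam[j]
    return dropped

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 3.0]])    # third row = first row + second row
print(dependent_rows(A))           # [2]
```

Because the factorization of $AD^2A^T$ has to be computed anyway, this check comes essentially for free, which is the argument made in the paragraph above.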

Even if we manage to satisfy the full row rank property at the beginning of the
optimization process, then during the solution process the matrix AD may become "nu-
merically" rank deficient. This is often the case due to the presence of primal de-
generacy. Consequently, IPM implementations have to be able to deal with rank
deficient matrices AD2 AT .

For feasible and bounded LPs, the negative influence of the ill-conditioning of AD2 AT
on the accuracy of the solution of the normal equations is surprisingly small. Stewart
[75] gives a nice explanation to this common experience, derived from an analysis
of the properties of the right hand side vector in (6.26). Stewart's result does not
apply to the case when the LP problem is infeasible. In practice, this case usually
manifests in a serious loss of accuracy when solving the Newton equations.

As mentioned before, the numerical difficulties usually appear close to the optimal
solution, especially in the presence of primal degeneracy. There exist several ways to
overcome them. These techniques are not always mathematically elegant and, addi-
tionally, they are often treated as the most precious know-how that is not revealed
by IPM specialists. Below we present some suggestions how instability problems can
be overcome and we end this section with a detailed presentation of the technique
to control accuracy applied in a public domain IPM code (available through Netlib)
[37, 36, 38].

One possibility to handle nearly rank deficient factorizations is to remove presumably


dependent rows only "virtually" by setting the nonzeros of $L_i$ to zero and $\Lambda_{ii}$ to one.
Note that it is much easier if the inverse of the diagonal matrix $\Lambda$ is stored instead
of $\Lambda$ itself (in this case it suffices to set $\Lambda_{ii}^{-1} = 0$).

Another way is to add a small regularizing term tI to the matrix AD2 AT before
or during the factorization. This helps to complete the factorization step but needs
special safeguards in the following solution steps.

The approach used in the public domain LP code [38] consists of several safeguard
techniques. First of all, it uses a dynamically adjusted diagonal regularizing matrix
$R \in \mathbb{R}^{m \times m}$. Its elements $R_{ii}$ vary from nearly zero values, added to acceptably stable
pivots $\Lambda_{ii}$, to quite large regularizations $R_{ii} = 1$ added to unstable pivots $\Lambda_{ii}$. Consequently,
instead of a decomposition of $AD^2A^T$, a (stable) factorization of a different
regularized matrix $AD^2A^T + R$ is computed. It is very rare that R contains more
than a few large regularizing terms; they refer to those rows of AD which are nearly
linearly dependent.

An important issue is to take R into account in all the following computations (i.e.
in the solves for directions). Observe that the form of the decomposed matrix could have
been obtained from the following perturbed augmented system

$$\begin{bmatrix} -D^{-2} & A^T \\ A & R \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix} = \begin{bmatrix} r_R \\ h_R \end{bmatrix}. \qquad (6.37)$$

Note that it is the Newton equations system corresponding to the following quadratic
programming problem (closely related to the dual LP problem (6.2))

$$\begin{array}{ll}
\mbox{maximize} & b^Ty - u^Tw - (y - y_0)^T R (y - y_0) \\
\mbox{subject to} & A^Ty - w + z = c, \qquad z, w \ge 0,
\end{array} \qquad (6.38)$$

in which $y_0$ is some reference point, e.g., the current iterate y. The right hand side
vectors $r_R$ and $h_R$ in (6.37) are derived from the first order optimality conditions
for the barrier problem associated with (6.38):

$$r_R = \xi_c - X^{-1}(\mu e - XZe) + S^{-1}(\mu e - SWe) - S^{-1}W\xi_u,$$
$$h_R = \xi_b + R(y - y_0).$$

Note that they become identical to (6.9) if we take the particular reference point $y_0 = y$.
The regularization technique can be interpreted as the use of quadratic penalty for
the changes of those dual variables Yi for which presumably unstable pivots were
computed. Computational experience shows that it effectively prevents the propagation of
round-off errors.

Apart from the quadratic regularization technique mentioned above, the LP code
of [38] makes extensive use of an iterative refinement process to improve the accuracy of the
Newton direction. The iterative refinement technique is always applied to the augmented
system formulation of the Newton equations system, although the direction is
computed via its reduced, normal equations form (see, e.g., [6]).

6.5 PRESOLVE
The previous section was concerned with the efficiency of solving the Newton equation
system using advanced numerical linear algebra. Another way to improve the
efficiency of solving the Newton equation system is to reduce its size and make the
system sparser. This aim can be achieved by analyzing the LP problem and removing
redundancies. In practice, almost all large-scale LP problems contain redundancies.
There are several reasons for this. First of all, model formulators tend to choose a
formulation that makes it easy to understand and to modify the model. This often
leads to the introduction of superfluous variables and redundant constraints.

Unfortunately, it is impossible to remove all redundancies in a large-scale LP problem


manually. Therefore a presolve analysis aims at improving problem formulation.
More precisely, its goals might be defined as follows:

• reduce the size of the LP problem as much as possible;

• reformulate the model to the most suitable form for a solver.

The use of a presolve phase is an old idea, see, e.g., Brearley et al. [12]; its role
was acknowledged already in simplex type optimizers. The simplex method for LP
works with sparse submatrices of A (bases) [78] while any IPM needs an inversion
of a considerably denser AAT matrix. Consequently, the potential savings resulting
from an initial problem reduction may be larger in IPM implementations. This is
the reason why the presolve analysis has recently enjoyed great attention [1, 51, 3,
37, 2, 9, 54, 77]. An additional important motivation for its use is that large LP
problems are solved routinely nowadays and the amount of redundancy is increasing
with the size of the problem.

6.5.1 The Reduction Methods


In general, finding all redundancies in an LP problem is computationally too expen-
sive. Therefore all presolve procedures use an arsenal of simple inspection techniques
to detect obvious forms of redundancies. These techniques are applied repeatedly
until the problem cannot be reduced any further. Below, we briefly present the
most common reduction procedures. Further details about the presolve phase are
presented in [3, 37]. The following reduction techniques are used:

1. Empty rows and columns are removed.



2. A fixed variable ($u_j = 0$) can be substituted out of the problem.


3. A row singleton defines a simple variable bound; after an appropriate bound
modification the row can be removed.

4. Lower and upper limits for every constraint i are determined,

$$\underline{b}_i = \sum_{j:\,a_{ij}<0} a_{ij}u_j \quad \mbox{and} \quad \bar{b}_i = \sum_{j:\,a_{ij}>0} a_{ij}u_j, \qquad (6.39)$$

that clearly satisfy

$$\underline{b}_i \le \sum_j a_{ij}x_j \le \bar{b}_i. \qquad (6.40)$$

Observe that due to the nonnegativity of x, the limits $\underline{b}_i$ and $\bar{b}_i$ are nonpositive
and nonnegative, respectively. If the inequalities (6.40) are at least as tight as
the original (inequality type) LP constraint, then the constraint i is redundant.
If one of them contradicts the LP constraint, then the problem is infeasible.
Finally, in some special cases (e.g.: a "less than or equal to" row with $\underline{b}_i = b_i$,
a "greater than or equal to" row with $\bar{b}_i = b_i$, or an equality type row for which
$b_i$ equals one of the limits $\underline{b}_i$ or $\bar{b}_i$), the LP constraint becomes a forcing one.
This means that the only way to satisfy the constraint is to fix all variables that
appear in it on their appropriate bounds (a sketch of these tests is given after this list).

5. Constraint limits (6.39) are used to generate implied variable bounds. (Note
that the LP variables were transformed to the standard form $0 \le x \le u$ before.)
This technique makes use of the original form of an LP constraint (i.e. its form
before a slack variable has been added to it to transform it to the "standard"
equality row of (6.1)). Assume, for example, that a nonredundant "less than or
equal to" type constraint is given, i.e.

$$\underline{b}_i < \sum_j a_{ij}x_j \le b_i.$$

Then

$$\forall k:\ a_{ik} > 0 \qquad \underline{b}_i + a_{ik}x_k \le \sum_j a_{ij}x_j \le b_i,$$

$$\mbox{and} \quad \forall k:\ a_{ik} < 0 \qquad \underline{b}_i + a_{ik}(x_k - u_k) \le \sum_j a_{ij}x_j \le b_i,$$

and new implied bounds are given for all variables in row i by

$$x_k \le u_k' = (b_i - \underline{b}_i)/a_{ik} \quad \mbox{for all } k:\ a_{ik} > 0,$$

$$x_k \ge l_k' = u_k + (b_i - \underline{b}_i)/a_{ik} \quad \mbox{for all } k:\ a_{ik} < 0.$$

If these bounds are tighter than the original ones, then variable bounds are
improved. Note, that this technique is particularly useful when it imposes finite
bounds on free variables. Free variables do not, in such a case, have to be split
and represented as the difference of two nonnegative variables.

6. Variable j is a free column singleton if

$$\exists k:\ a_{kj} \ne 0 \ \wedge\ (a_{ij} = 0,\ \forall i \ne k) \ \wedge\ l_j = -\infty \ \wedge\ u_j = +\infty.$$

In this case variable $x_j$ can be substituted out of the problem: variable $x_j$
disappears and the kth constraint is eliminated. The same technique can be
applied to eliminate a singleton implied free variable, i.e. a variable for which
the implied bounds (generated by the technique of point 5) are at least as tight as
the original bounds.

7. Nonnegative unbounded variables ($0 \le x_j \le +\infty$) referring to singleton columns
are used to generate bounds on the dual variables y. Namely, if the variable j refers
to a singleton column with an entry $a_{ij}$ and $u_j = +\infty$ (i.e. $w_j = 0$), then the
dual constraint (6.2) becomes an inequality

$$a_{ij}y_i \le c_j.$$

This inequality can be solved and, depending on the sign of $a_{ij}$, produces a
lower or upper bound on $y_i$.

These bounds on the dual variables are used to generate lower and upper limits
for all dual constraints (a technique similar to that of point 4 is used). The limits
are then used to determine the variables' reduced costs. Whenever a reduced
cost is proved to be strictly positive or strictly negative, the corresponding
variable is fixed on an appropriate bound and eliminated from the problem.

8. Dual constraint limits (obtained with a technique of point 7) are used to generate
new implied bounds on the dual variables. A technique similar to that of point 5
is applied. Implied bounds tighter than the original ones replace old bounds and
open the possibility to eliminate more variables with the technique of point 7.
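
As a small illustration of points 4 and 5, the sketch below (Python, dense data, illustrative variable names) computes the row limits (6.39), applies the redundancy, infeasibility and forcing tests, and derives the implied bounds for a single "less than or equal to" row.

```python
import numpy as np

# Row-limit tests of point 4 and implied bounds of point 5 for a constraint
# sum_j a_j x_j <= b with variable bounds 0 <= x <= u (one row at a time).
def analyse_row(a, u, b):
    lo = float(a[a < 0] @ u[a < 0])        # underline{b}_i of (6.39)
    hi = float(a[a > 0] @ u[a > 0])        # bar{b}_i       of (6.39)
    if lo > b:
        return "infeasible", None
    if hi <= b:
        return "redundant", None
    if lo == b:
        return "forcing", None             # every variable forced onto a bound
    # otherwise derive the implied bounds of point 5 (raw values)
    new_u = {k: (b - lo) / a[k] for k in range(len(a)) if a[k] > 0}
    new_l = {k: u[k] + (b - lo) / a[k] for k in range(len(a)) if a[k] < 0}
    return "kept", (new_u, new_l)

a = np.array([2.0, -1.0, 3.0])
u = np.array([1.0, 4.0, 10.0])
print(analyse_row(a, u, b=8.0))
# ('kept', ({0: 6.0, 2: 4.0}, {1: -8.0}))
```

Only implied bounds that are tighter than the existing ones would actually be kept; in the example above, only $x_2 \le 4$ improves on the original bound $u_2 = 10$.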

6.5.2 Detecting Redundancy and Improving


Sparsity in A
The presolve techniques described in the previous section involve a considerable
amount of arithmetical operations. The techniques discussed in this section are
based mainly on the sparsity pattern analysis. We list them below.

1. Removing duplicate constraints. Two constraints are said to be duplicate if


they are identical up to a scalar multiplier. One of the duplicate constraints is
removed from the problem.

2. Removing linearly dependent constraints. The presence of more than a few


linearly dependent rows in A may lead to serious numerical problems in
interior-point methods, since it implies a rank deficiency in the Newton equation
system. Subramanian et al. [76] and Andersen [2] report that in some cases the
computational savings from removing the linearly dependent constraints are
significant.

3. Removing duplicate columns. Two columns are said to be duplicate if they are
identical up to a scalar multiplier. An example of duplicate columns are two
non-negative split brothers used to replace a free variable.
When discussing the disadvantages of the normal equations approach in Sec-
tion 6.4.1, we have mentioned the negative consequences of the presence of split
free variables. Sometimes it is possible to generate a finite implied bound on a
free variable [37] and avoid the need of splitting it. Whenever possible, general
duplicate variables are replaced with an aggregate variable (a linear combination
of duplicates).

4. Improving sparsity of A. We look for a nonsingular matrix $M \in \mathbb{R}^{m \times m}$ such
that the matrix MA is as sparse as possible. Primal feasibility constraints can
in such a case be replaced with an equivalent formulation

MAx = Mb, (6.41)

much more suitable for a direct application of the interior point solver. Exact
solution of this Sparsity Problem [15] is an NP-complete problem but efficient
heuristics [1, 15, 37] usually produce satisfactory nonzero reductions in A. The
algorithm of [37], for example, looks for such a row of A that has a sparsity
pattern being the subset of the sparsity pattern of other rows and uses it to
pivot out nonzero elements from other rows.

6.5.3 Other Types of Reductions


The common feature of the previous presolve techniques is that they cannot increase
the number of nonzeros in the LP problem. It may in some cases be advantageous
to allow a limited fill-in that results from the elimination of certain variables and
constraints. We list these elimination techniques below.

1. Free and implied free variables can be eliminated not necessarily only in a case
when they correspond to singleton columns (cf. Section 6.5.1, point 6) but also
in a case when they correspond to denser columns. It should be noted, however,
that this elimination technique has to be used carefully as it may introduce a large
amount of fill-in and, in particular, create dense columns. Hence, it requires
additional sparsity structure analysis to be implemented properly [76].

2. Doubleton rows corresponding to equality type constraints can be used to pivot


out one of the variables. This operation is clearly the opposite to splitting dense
columns: it causes a concatenation of two shorter columns into a longer one but
it may be advantageous if the length of the new column is not excessive [25].

The application of all the presolve techniques described so far often results in impressive
reductions of the initial LP formulation. Hopefully, the reduced problem obtained
after the presolve analysis can be solved faster. Once its solution is found, the
solution is used to recover the complete primal and dual solutions to the original
problem. This phase is called a postsolve analysis; it has been discussed extensively
in [3].

6.5.4 Numerical Examples


In Table 6.4 we present some computational results reproduced from [2]. The
columns ROWS, COLS and NZA show the number of rows, columns and nonzero
elements in A, respectively. The following columns: RROWS, RCOLS and RNZA
show the same numbers, but after presolve. Finally, LROWS, LCOLS and LNZA
present the LP matrix statistics after presolve and after elimination of all linearly
dependent rows. The results collected in Table 6.4 clearly advocate for the use of an
involved presolve analysis, although they also show that there exist (rare in practice)
almost irreducible problems.

The advantages of the presolve analysis become clearer if one compares the sparsity
of the Cholesky factors obtained for the original and the reduced LP formulations.
Table 6.5 reports the number of nonzeros in the Cholesky factor, NZL, for all problems
listed in Table 6.4. These numbers are given for the original problem formulation,
the reduced one, and the final reduced form, in which linearly dependent rows have
been eliminated.

Table 6.4 Advantages of the presolve analysis.


Name      ROWS    COLS     NZA       RROWS   RCOLS    RNZA      LROWS   LCOLS    LNZA
80BAU3B   2263    9799     29063     1960    8679     18969     1960    8679     18969
aa3       826     8627     79433     763     8572     67571     690     8572     58604
CRE-B     9649    72447    328542    5324    31818    107603    5316    31818    107551
KEN-13    28633   42659    139834    22525   36552    81168     22356   36552    79478
NUG12     3193    8856     44244     3192    8856     38304     2794    8856     33528
OSA-30    4351    100024   700160    4279    96119    262872    4279    96119    262872
PDS-10    16559   48763    140063    15609   47729    103290    15598   47729    103169
PILOT87   2031    4883     73804     1966    4592     70375     1966    4592     70375
WOOD1P    245     2594     70216     170     1717     44573     169     1717     44306
Sum       67750   298652   1605359   55788   244634   794725    55128   244634   778852

Table 6.5 Cholesky factors after presolve.


Name      NZL       RNZL      LNZL
80BAU3B   40521     38308     38308
aa3       204468    185961    152501
CRE-B     957052    246616    245903
KEN-13    340070    269227    256213
NUG12     2732346   2732346   1993043
OSA-30    222863    14125     14125
PDS-10    1729070   1615648   1615516
PILOT87   423242    422195    422195
WOOD1P    18347     11658     11488

6.6 HIGHER ORDER EXTENSIONS


The computationally most expensive step in any implementation of an IPM is the
solution of the Newton equation system. This system has to be solved in each iter-
ation and, as we have discussed in Section 6.4, it requires computing a symmetric
factorization of one of the matrices (6.25) or (6.26) followed with a solve employing
this factorization. Both in theory and in practice the factorization phase is compu-
tationally much more expensive than the solve phase. Therefore, we can allow to
do several solves in each iteration if these solves help to reduce the total number of
interior point iterations and therefore also the number of factorizations.

This is the main idea for using high-order methods which we shall discuss below.
Their common feature is that they reuse the factorization of the Newton equations
system in several solves with the objective to compute a "better" search direction.
There exist several approaches of this type: they apply different schemes to compute
the search direction. We shall review them briefly.

The first such approach was proposed by Karmarkar et al. [47] who constructed a
parameterized representation of the (feasible) trajectory motivated from the use of
differential equations.

Mehrotra's method [62, 61] builds a higher order Taylor approximation of the (in-
feasible) primal-dual central trajectory and pushes an iterate towards an optimum
along such an approximation. The second order variant of this method proved very
successful.

Another approach, due to Domich et al. [18] uses three independent directions and
solves an auxiliary linear program in a three dimensional subspace to find a search
direction.

The method of Sonnevend et al. [73] uses subspaces spanned by directions generated
by higher order derivatives of the feasible central path, or earlier computed points
of it as a predictor step. This is later followed by one (or more) centering steps to
take the next iterate sufficiently close to the central path.

Hung and Ye [42] studied theoretically higher order predictor-corrector techniques


incorporated in a homogeneous self-dual algorithm.

The approach of Gondzio [36] defines a sequence of targets in a vast neighborhood


of the central path. These targets are usually easier to reach than analytic centers.
The correctors are supposed to take the iterates to these points. Consequently, the
iterates remain relatively well centered (a large discrepancy of the complementarity
products is avoided) and larger steps can be taken in the primal and dual spaces.

In the following part of this section we shall concentrate on two approaches that
proved to be the most attractive in computations: a second order predictor-corrector
technique [62] and a multiple centrality correction technique [36].

6.6.1 Predictor-corrector Technique


Mehrotra's predictor-corrector strategy [62, 61] has two components which are an
adaptive choice of the barrier parameter and the computation of a high-order ap-
proximation to the central path.

The first step of the predictor-corrector strategy is to compute the affine scaling
(predictor) direction. The affine scaling direction solves the Newton equation system
(6.6) for $\mu = 0$ and is denoted by $\Delta_a$. It is easy to show that if a step of size $\alpha$

is taken in the affine scaling direction, then the infeasibility is reduced by the factor
$(1 - \alpha)$. Moreover, if the current point is feasible, then the complementarity gap
is also reduced by the same factor. Therefore, if a large step can be made in the
affine scaling direction, then a desirable progress in the optimization is achieved. On
the other hand, if the feasible stepsize in the affine-scaling direction is small, then
the current point is probably too close to the boundary. In this case the barrier
parameter should not be reduced too much.

Mehrotra suggested to use the predicted reduction in the complementarity gap along
the affine scaling direction to estimate the new barrier parameter. After the affine
scaling direction has been computed, the maximum stepsizes along this direction in
the primal ($\alpha_{Pa}$) and in the dual ($\alpha_{Da}$) spaces are determined preserving nonnegativity
of (x, s) and (z, w). Next the predicted complementarity gap

$$g_a = (x + \alpha_{Pa}\Delta x_a)^T(z + \alpha_{Da}\Delta z_a) + (s + \alpha_{Pa}\Delta s_a)^T(w + \alpha_{Da}\Delta w_a)$$

is computed and the barrier parameter is chosen using the heuristic

$$\mu = \left(\frac{g_a}{g}\right)^2 \frac{g_a}{n}. \qquad (6.42)$$

Next, the high-order component of the predictor-corrector direction is computed.
Note that we ideally want to compute a direction such that the next iterate is
perfectly centered, i.e.

$$(X + \Delta X)(z + \Delta z) = \mu e. \qquad (6.43)$$

(We have an equivalent relation for the variables s and w associated with the upper
bounds.) The above system can be rewritten as

$$Z\Delta x + X\Delta z = -Xz + \mu e - \Delta X\Delta z. \qquad (6.44)$$

Let us observe that in the computation of the Newton direction in equation (6.6),
the second order term $\Delta X\Delta z$ is neglected. Instead of setting the second order term
equal to zero, Mehrotra proposes to estimate $\Delta X\Delta z$ using the affine-scaling direction
$\Delta X_a\Delta z_a$. His predictor-corrector direction is obtained by solving the Newton
equations system with (6.44) as the linearized complementarity conditions and the
barrier parameter $\mu$ chosen through (6.42).
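
A compact sketch of one predictor-corrector step is given below (Python; for brevity it ignores the upper-bound variables s and w, and solve_kkt is an assumed callback that reuses the already computed factorization of the Newton system, so none of the names come from any cited code).

```python
import numpy as np

def max_stepsize(v, dv, eta=0.9995):
    # largest step keeping v + alpha*dv nonnegative, damped by eta
    neg = dv < 0
    return eta * min(1.0, float((-(v[neg] / dv[neg])).min())) if neg.any() else 1.0

# One Mehrotra step without upper bounds: solve_kkt(rhs) is assumed to return
# (dx, dz) for the given complementarity right hand side, reusing one factorization.
def mehrotra_step(x, z, solve_kkt):
    n, g = x.size, x @ z                      # current complementarity gap
    dxa, dza = solve_kkt(-x * z)              # predictor (affine scaling), mu = 0
    a_p, a_d = max_stepsize(x, dxa), max_stepsize(z, dza)
    g_a = (x + a_p * dxa) @ (z + a_d * dza)   # predicted gap
    mu = (g_a / g) ** 2 * (g_a / n)           # heuristic (6.42)
    # corrector: same factorization, right hand side of (6.44)
    dx, dz = solve_kkt(-x * z + mu - dxa * dza)
    return dx, dz, mu
```

The two calls to solve_kkt correspond to the two backsolves per iteration mentioned in the next paragraph; the factorization is computed only once.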

We should note here that the above presentation of the predictor-corrector tech-
nique follows the computational practice. It abuses mathematics in the sense that

stepsizes $\alpha_P$ and $\alpha_D$ are not taken into account when building the higher order Taylor
approximation of the central trajectory. The reader interested in a detailed,
rigorous presentation of this approach can consult [61].

Let us observe that a single iteration of the (second order) predictor-corrector


primal-dual method needs two solves of the same large, sparse linear system for
two different right hand sides. The benefit of the method is that we obtain a good
estimate for the barrier parameter $\mu$ and a high order approximation to the central
path. Indeed computational practice shows that the additional computational cost
of the predictor-corrector strategy is more than offset by a reduction in the number
of iterations (factorizations).

The predictor-corrector mechanism can be applied repeatedly leading thus to meth-


ods of order higher than two. However, the computational results presented in [61]
show that the number of iterations does not decrease sufficiently to justify the addi-
tional computations. Consequently, the second order predictor-corrector technique
became for a couple of years the computational state of the art [54, 62].

The disappointing results for the use of higher (than second) order predictor-corrector
techniques used to be explained by the difficulty of building an accurate higher order
approximation of the central trajectory. On the other hand, many large scale lin-
ear programs exist for which the factorizations are extremely expensive. For those
problems the need to save on the number of factorizations becomes more important.

The method presented in the next section responds to this need.

6.6.2 Modified Centering Directions


Let us observe that the step $(\Delta x, \Delta y, \Delta s, \Delta z, \Delta w)$ of (6.6) aims at drawing all complementarity
products to the same value $\mu$. Moreover, to ensure the progress of
optimization, the barrier parameter $\mu$ has to be smaller than the average complementarity
product $\mu_{\mbox{\scriptsize average}} = (x^Tz + s^Tw)/(2n)$. Such perfectly centered points usu-
ally cannot be reached. Although the theory requires that subsequent iterates are in
the neighborhood of the central path, in the computational practice, they may stay
quite far away from it without negative consequences for the ability of taking large
steps (and the fast convergence).

The approach proposed by Gondzio [36] applies multiple centrality corrections and
combines their use with a choice of reasonable, well centered targets that are sup-
posed to be easier to reach than perfectly centered (but usually unreachable) analytic

centers. The idea to use targets that are not analytic centers comes from Jansen,
Roos, Terlaky and Vial [44]. They define a sequence of traceable weighted analytic
centers, targets that go from an arbitrary interior point to a point close to the
central path. The algorithm follows these targets and continuously (although very
slowly) improves the centrality of subsequent iterates. The targets are defined in the
space of the complementarity products.

The method of [36] translates this approach into a computational practice combin-
ing the choice of attractive targets with the use of multiple correctors. It abuses the
theory of [44] in the sense that it does not limit the improvement of centrality (mea-
sured with the discrepancy between the largest and the smallest complementarity
product). Below, we briefly present this approach.

Assume (x, s) and (y, z, w) are primal and dual solutions at a given iteration of the
primal-dual algorithm (x, s, z and w are strictly positive). Next, assume that a
predictor direction $\Delta_p$ at this point is determined and the maximum stepsizes in
the primal, $\alpha_P$, and dual, $\alpha_D$, spaces are computed that preserve nonnegativity of the
primal and dual variables, respectively.

We look for a corrector direction $\Delta_m$ such that larger stepsizes in the primal and dual
spaces are allowed for a composite direction

$$\Delta = \Delta_p + \Delta_m. \qquad (6.45)$$

To enlarge these stepsizes from $\alpha_P$ and $\alpha_D$ to $\bar\alpha_P = \min(\alpha_P + \delta, 1)$ and $\bar\alpha_D =
\min(\alpha_D + \delta, 1)$, respectively, a corrector term $\Delta_m$ has to compensate for the negative
components in the primal and dual variables

$$\begin{array}{rcl}
(\bar x, \bar s) & = & (x, s) + \bar\alpha_P(\Delta_p x, \Delta_p s), \\
(\bar y, \bar z, \bar w) & = & (y, z, w) + \bar\alpha_D(\Delta_p y, \Delta_p z, \Delta_p w).
\end{array} \qquad (6.46)$$

We try to reach this goal by adding the corrector term $\Delta_m$ that drives from this
exterior trial point to the next iterate $(\hat x, \hat s, \hat y, \hat z, \hat w)$ lying in the vicinity of the central
path. However, we are aware that there is little chance to reach the analytic center
in one step, that is, to reach $v = (\mu e, \mu e) \in \mathbb{R}^{2n}$ in the space of the complementarity
products. Hence, we compute the complementarity products of the trial point, $\bar v =
(\bar X\bar z, \bar S\bar w) \in \mathbb{R}^{2n}$, and concentrate the effort on correcting only their outliers. We
thus project the point $\bar v$ componentwise on a hypercube $H = [\beta_{\min}\mu, \beta_{\max}\mu]^{2n}$ to
get the following target

$$(v_t)_i = \max\left(\beta_{\min}\mu,\ \min\left(\bar v_i,\ \beta_{\max}\mu\right)\right), \qquad i = 1, \dots, 2n. \qquad (6.47)$$

The corrector direction Δ_m solves a linear system similar to (6.6) for the right hand side

(0, 0, 0, v_t − ṽ) ∈ R^{4n+m},    (6.48)

with nonzero elements only in the subset of positions of v_t − ṽ that refer to the complementarity products which do not belong to (β_min μ, β_max μ).

Once the corrector term Δ_m is computed, the new stepsizes α̂_P and α̂_D are determined for the composite direction

Δ = Δ_p + Δ_m,    (6.49)

and the primal-dual algorithm can move to the next iterate.

The correcting process can be repeated a desired number of times. In such a case, the direction Δ of (6.49) becomes a new predictor Δ_p and is used to compute a new trial point (6.46). An advantage of this approach is that computing every single corrector term needs exactly the same effort (it is dominated by the solution of a system like (6.6) with the right hand side (6.48)).

Questions arise about the choice of the "optimal" number of corrections for a given problem and about the criterion to stop correcting when it brings no improvement. They were answered in [36]. Naturally, the more expensive the factorizations of (6.25) or (6.26) are compared with the subsequent backsolves, the more correctors should be tried.

The computational experience of [36] proved that, when applied to the solution of
nontrivial problems, this method gives significant CPU time savings over the second
order predictor-corrector technique of Mehrotra.

6.7 OPTIMAL BASIS IDENTIFICATION


In many practical applications of linear programming, a sequence of closely related
problems has to be solved. This is, for example, the case in the branch and bound
algorithm for integer programming or in column generation (cutting planes) meth-
ods. Obviously, when two closely related problems are solved, the previous optimal solution should and can be used to solve the new problem faster. In the context of the simplex algorithm this aim is achieved by starting from the previous optimal basic solution. In the context of an interior-point method, a satisfactory warm start procedure still does not exist, and it is not obvious that this problem will ever be solved satisfactorily (cf. Section 6.9). Some hope comes from the particular IPM application in which

approximate analytic centers are looked for [32, 63], but in the general case, interior-point warm-start is inefficient. Consequently, the approach adopted nowadays is to solve the first problem of a sequence of closely related problems using an IPM and then cross over to the simplex method. In this case the advantages of both methods are exploited.

In this section, we shall address the problem of recovering an optimal basis from an almost optimal primal-dual interior-point solution. Before doing so, we would like to note that there exist LP applications in which an optimal interior-point solution is preferable, see, e.g., Christiansen and Kortanek [17] and Greenberg [39].

The primal-dual algorithm discussed in the previous sections produces an optimal basic solution only if the optimal solution is unique (which is very rare in practice). In fact, in the case of either multiple primal or multiple dual optimal solutions, the primal-dual method will generate an optimal solution in the analytic center of the optimal face, see Güler and Ye [40]. Therefore an algorithm is needed that generates an optimal basis from an optimal interior-point solution.

6.7.1 Notation
In this section we will work with the problem in a simplified standard form (in which primal variables have no upper bounds)

minimize c^T x subject to Ax = b, x ≥ 0.    (6.50)

The dual of (6.50) is

maximize b^T y subject to A^T y + z = c, z ≥ 0.    (6.51)

It is well-known that any optimal solution (x*, y*, z*) must satisfy the complementarity slackness conditions x*_j z*_j = 0. Moreover, it is known that there exists a strictly complementary solution that satisfies x*_j + z*_j > 0, see Goldman and Tucker [34]. Let (x*, y*, z*) be such a strictly complementary solution and define P* = {j : x*_j > 0}. It can be shown that P* is invariant with respect to all strictly complementary solutions. Hence P* is unique. The pair (P*, P̄*), where P̄ = {1, ..., n} \ P for any set P, determines the optimal partition.

Furthermore, we use the notation x_P = (x_j)_{j∈P} for any vector x and any set P; |P| denotes the number of elements in P. For any index set P, the symbol P also denotes the matrix built of the columns of A corresponding to the variables that belong to P, namely P = a_{(·, j∈P)}, where a_{(·, j)} is the jth column of A.

Let (B, N) denote a partition of the variables into basic and non-basic variables. (B, N) is an optimal basis if B is non-singular,

x_B = B^{-1} b ≥ 0,  x_N = 0    (6.52)

and

z_N = c_N − N^T (B^{-1})^T c_B ≥ 0.    (6.53)

A basic solution is said to be primal (dual) degenerate if at least one component in x_B (z_N) is zero.
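As a small illustration of conditions (6.52) and (6.53), the following sketch checks whether a given index set is an optimal basis for (6.50); the function name and the use of dense numpy linear algebra are illustrative assumptions.

```python
import numpy as np

def is_optimal_basis(A, b, c, basis, tol=1e-9):
    """Check conditions (6.52)-(6.53) for the index set `basis` of problem (6.50)."""
    n = A.shape[1]
    nonbasis = [j for j in range(n) if j not in basis]
    B, N = A[:, basis], A[:, nonbasis]
    x_B = np.linalg.solve(B, b)            # (6.52): x_B = B^{-1} b
    y = np.linalg.solve(B.T, c[basis])     # simplex multipliers y = (B^{-1})^T c_B
    z_N = c[nonbasis] - N.T @ y            # (6.53): reduced costs of the non-basic variables
    return bool(np.all(x_B >= -tol) and np.all(z_N >= -tol))
```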

6.7.2 The Pivoting Algorithm


The best algorithm to generate an optimal basis has been proposed by Megiddo [59]. It constructs an optimal basis in less than n iterations starting from any complementary solution and it is strongly polynomial. Megiddo has proved an even stronger result: he has shown that an optimal basis cannot be constructed from a primal or dual optimal solution alone in strongly polynomial time unless there exists a strongly polynomial algorithm for LP.

Below we shall discuss Megiddo's algorithm and its implementation. For convenience we assume that a set of artificial variables has been added to the problem (6.50). Let V = {1, ..., m} denote the set of artificial variables; naturally, we must have x_V = 0 in any optimal solution. Furthermore, we assume that a strictly complementary solution is known. Hence, we assume that:

a. We know the optimal partition (P*, P̄*) and V ⊆ P̄*.

b. We know an optimal primal solution x̄ such that Ax̄ = b, x̄_{P̄*} = 0 and x̄_{P*} > 0.

c. We know an optimal dual solution (ȳ, z̄) such that A^T ȳ + z̄ = c, z̄_{P̄*} > 0 and z̄_{P*} = 0.

In fact, the algorithm presented below works for any complementary solution, i.e. when the conditions x̄_{P*} > 0 and z̄_{P̄*} > 0 in assumptions b and c are relaxed to x̄_{P*} ≥ 0 and z̄_{P̄*} ≥ 0.

Megiddo's algorithm consists of a primal and a dual phase. Let us start with a description of the primal phase. Let (B, N) be any partition of the variables of the problem (6.50) into basic and non-basic parts. Then

x_B := B^{-1}(b − N x̄_N) = x̄_B ≥ 0    (6.54)

because B is non-singular. The solution (x̄_B, x̄_N) is called a super-basic solution since some of the non-basic variables are not at their lower bound zero (the non-basic variables that are not identical to zero are called super-basic). The idea of the primal phase is to move all super-basic variables to zero or to pivot them into the basis using simplex-like iterations. The resulting basis is primal optimal, because it is feasible and it is complementary with respect to the dual optimal solution (ȳ, z̄). Each move or pivot step reduces the number of super-basic variables by one; since the number of super-basic variables cannot exceed |P*|, the algorithm terminates after at most |P*| iterations.

Now we will state the algorithm.

Algorithm 6.7.1
1. Choose a basis B and let x = x̄.
2. while (∃ j ∈ P* \ B : x_j ≠ 0)
3.    Use the primal ratio test to move variable x_j to zero if possible
      or pivot it into the basis.
4.    Update (B, N) and x.
5. end while
6. B is a primal optimal basis.

It can be observed that in step 1 it is always possible to choose a basis. One possible choice is B = V. Algorithm 6.7.1 is a simplified version of the primal simplex algorithm, because there is no pricing step (the incoming variables are predetermined).

The dual phase of Megiddo's algorithm is similar to the primal phase because, in
this case, a super-basic dual solution is known. This means that some of the reduced
costs corresponding to the basic variables might not be zero. Similarly to the primal
phase, those reduced costs can either be moved to zero or the corresponding primal
variable has to be pivoted out of the basis. The dual algorithm can be stated as
follows

Algorithm 6.7.2
1. Choose a basis B and let y = ȳ, z = c − A^T y.
2. while (∃ j ∈ P̄* ∩ B : z_j ≠ 0)
3.    Use the dual ratio test to move variable z_j to zero if possible
      or take it out of the basis.
4.    Update (B, N), y and z.
5. end while
6. B is a dual optimal basis.

If the initial basis is primal feasible, then it remains feasible throughout all steps of Algorithm 6.7.2 because all pivots are primal degenerate. Once Algorithm 6.7.2 terminates, the final basis is both primal and dual feasible and hence optimal. Furthermore, the number of iterations in the dual phase cannot exceed |P̄*|.

Summing up, Algorithms 6.7.1 and 6.7.2 generate an optimal basis after at most n iterations. In practice, the number of iterations depends on the level of primal and dual degeneracy.

6.7.3 Implementational Issues of the Pivoting Algorithm
Megiddo's algorithm presented in the previous section assumes that an exact optimal
solution is known. This assumption is never met in practice, because the primal-
dual algorithm only generates a sequence converging towards the optimal solution.
Furthermore, due to the finite precision of computations, the solution returned by
the primal-dual algorithm is neither exactly feasible nor complementary.

Bixby and Lustig solve this problem using a Big-M version of Megiddo's algorithm, that is, their cross-over procedure drives both the complementarity and the infeasibility to zero. This algorithm adds, in the worst case, several simplex pivots to obtain an optimal basis. Their approach works well but, unfortunately, it complicates the implementation of the cross-over procedure.

Andersen and Ye [4] propose an alternative solution to this problem. Let (x^k, y^k, z^k) be the iterate generated by the primal-dual algorithm in iteration k and (P^k, P̄^k) be a guess of the optimal partition generated in iteration k. Now define the following perturbed problem

minimize (c^k)^T x subject to Ax = b^k, x ≥ 0,    (6.55)

where

b^k = P^k x^k_{P^k},   c^k_{P^k} = (P^k)^T y^k   and   c^k_{P̄^k} = (P̄^k)^T y^k + z^k_{P̄^k}.

Assume the variables in (6.55) are reordered such that x = (x_{P^k}, x_{P̄^k}); then the vector (x, y, z) = ((x^k_{P^k}, 0), y^k, (0, z^k_{P̄^k})) is a strictly complementary solution to (6.55). Moreover, if x^k converges towards an optimal primal solution and P^k converges towards P*, then b^k converges towards b and, similarly, c^k converges towards c. Therefore the two problems (6.50) and (6.55) will eventually share optimal bases. This advocates for an application of Megiddo's algorithm to the perturbed problem
This advocates for an application of Megiddo's algorithm to the perturbed problem

(6.55) instead of to (6.50). Note that an optimal complementary solution to the problem (6.55) is known.
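Assembling the data of the perturbed problem (6.55) from the current iterate and the partition guess is straightforward, as in the following sketch; the dense numpy formulation and the function name are our assumptions.

```python
import numpy as np

def perturbed_problem(A, x_k, y_k, z_k, P_k):
    """Build (b^k, c^k) of the perturbed problem (6.55) from iterate k and partition guess P^k."""
    n = A.shape[1]
    P_bar = [j for j in range(n) if j not in P_k]
    b_k = A[:, P_k] @ x_k[P_k]                       # b^k = P^k x^k_{P^k}
    c_k = np.empty(n)
    c_k[P_k] = A[:, P_k].T @ y_k                     # c^k_{P^k} = (P^k)^T y^k
    c_k[P_bar] = A[:, P_bar].T @ y_k + z_k[P_bar]    # c^k_{bar P^k} = (bar P^k)^T y^k + z^k_{bar P^k}
    return b_k, c_k
```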

An important practical issue is the choice of an indicator P^k for the optimal partition P*. A trivial one is

P^k = {j : x^k_j ≥ z^k_j}.    (6.56)

Unfortunately, this indicator is not invariant with respect to column scaling. Hence it is less attractive. Another indicator is

P^k = {j : |Δ^a x_j| / x^k_j ≤ |Δ^a z_j| / z^k_j},    (6.57)

where (Δ^a x, Δ^a z) is the primal-dual affine scaling search direction; this indicator is scaling invariant. It uses the variable changes to guess the optimal partition. This indicator is justified by the theory in [22] and is also reliable in practice.
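Once the affine scaling direction is available, the indicator (6.57) is cheap to evaluate, as the following sketch illustrates (the function name and numpy usage are assumptions).

```python
import numpy as np

def partition_guess(x_k, z_k, dx_aff, dz_aff):
    """Guess of the optimal partition (P^k, bar P^k) via the indicator (6.57)."""
    in_P = np.abs(dx_aff) / x_k <= np.abs(dz_aff) / z_k
    P_k = np.flatnonzero(in_P)         # predicted positive variables
    P_bar_k = np.flatnonzero(~in_P)    # predicted zero variables
    return P_k, P_bar_k
```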

Another question is the choice of the right iteration to terminate the interior point algorithm and to start the cross-over. The optimal basis generation can only be expected to produce the correct optimal basis if the interior point solution is almost optimal and P^k is a good guess for P*. A good practical criterion for when to make the switch is the onset of fast (quadratic) convergence of the primal-dual algorithm.

Finally, for a discussion of linear algebra issues related to implementing the pivoting
algorithm and computational results we refer the reader to the papers [10, 4].

6.8 INTERIOR POINT SOFTWARE


Now, more than ten years after Karmarkar's publication, interior point methods are a well understood area both in theory and in practice. The current implementations are sophisticated optimization tools capable of solving very large linear programs. Moreover, interior-point methods have proved to be significantly more efficient than the best available simplex implementations for many LP problems [54].

Several efficient LP codes based on interior-point methods have been developed in recent years. Almost all codes are based on the primal-dual algorithm presented above, although they differ in many implementational details. There exist several commercial vendors, e.g.: AT&T (KORBX), CPLEX (CPLEX/BARRIER, http://www.cplex.com), DASH (XPRESS-MP, http://www.dash.com) and IBM (OSL, http://www.research.ibm.com/osl/), as well as numerous research codes, some of them public domain in an executable or even in a source code form. The reader may find it surprising that these research codes compare favorably with the
reader may find it surprising that these research codes compare favorably with the

best commercial products. Three public domain research codes draw particular at-
tention.

Vanderbei's LOQO is an implementation of the predictor-corrector primal-dual algorithm for LP and QP. The code is written in C and is available in executable form or as a callable library. Note that LOQO is only free if it is used for academic purposes. LOQO is available from http://www.sor.princeton.edu/~rvdb/.

Zhang's LIPSOL is written in MATLAB and FORTRAN. It is also an implementation of the predictor-corrector primal-dual method. Its undoubted advantage is the ease of comprehension resulting from the use of MATLAB's programming language. LIPSOL is available from http://pc5.math.umbc.edu/~yzhang/.

Gondzio's HOPDM is an implementation of a higher order primal-dual method (cf. Section 6.6.2). This code is public domain in the form of FORTRAN source files from http://ecolu-info.unige.ch/~logilab/software/.

Meszaros' BPMPD is another implementation of a higher order primal-dual method. The code is available in the form of FORTRAN source files for academic purposes from ftp://ftp.sztaki.hu/pub/oplab/SOFTWARE/BPMPD.

The reader interested in more information about these LP codes (both commercial and research ones) should consult the LP FAQ (LP Frequently Asked Questions). The World Wide Web addresses of the LP FAQ are

• http://www.skypoint.com/subscribers/ashbury/linear-programming-faq
• ftp://rtfm.mit.edu/pub/usenet/sci.answers/linear-programming-faq

To give the reader some idea about the efficiency of available commercial and research LP codes, we ran them on a few public domain test problems. Table 6.6 gives their sizes, i.e. the number of rows, columns and nonzero elements, m, n, and nonz, respectively. Problems pilot87, dfl001 and pds-10 come from Netlib; problems mod2, world and NL belong to the collection maintained at the University of Iowa. Table 6.7 reports statistics on their solution (iterations and CPU time in seconds to reach 8-digit optimality) on an IBM Power PC workstation (model 601: 66 MHz, 64 MB RAM). In cases where 8-digit optimality could not be reached, we give in parentheses the number of exact digits in the suboptimal solution. The following solvers are compared: CPLEX version 3.0 SM (simplex method), CPLEX version 3.0 BARRIER, LIPSOL version 0.3, LOQO version 2.21, HOPDM version 2.12 and BPMPD version 2.1. CPLEX represents the current state-of-the-art commercial LP optimizer. The remaining solvers are the earlier mentioned research codes.

Table 6.6 Test problem statistics.


Name m n nonz
pilot87 2030 4883 73804
dfl001 6071 12230 41873
pds-10 16558 48763 140063
mod2 35664 31728 n01l6
world 35510 32734 220748
NL 7195 9718 102570

Table 6.7 Commercial vs. public domain solvers.


Problem Cplex SM Cplex/BAR LIPSOL
its time its time its time
pilot87 10167 722.2 41 602.8 38 817.4
dfl001 63389 3766.9 47 3829.3 85 9906.8
pds-10 38327 1222.7 60 3462.1 51 5824.3
mod2 117360 18199.9 57 947.0 72 3737.3
world 134270 22469.5 62 1145.6 60 3115.2
NL 32273 1124.8 31 147.2 35 319.3

Problem LOQO HOPDM BPMPD


its time its time its time
pilot87 47 1220.8 24 501.4 30 391.9
dfl001 53 7378.6 (7)33 5294.3 34 2921.4
pds-10 51 5616.9 29 3393.7 30 2715.4
mod2 73 1591.7 47 1069.1 48 875.7
world 74 1753.6 51 1345.2 57 1109.4
NL 29 210.0 23 171.2 30 148.5


Before analyzing the results collected in Table 6.7 we would like to warn the reader
that the computational results are dependent on many different factors. For example
the choice of test problems, the choice of computer and the choice of algorithmic
parameters all influence the relative performance of the codes. The results reported
in Table 6.7 have been obtained when all compared codes were run with their default
options.

The analysis of the results collected in Table 6.7 indicates that there is only an insignificant difference in the efficiency of the commercial and the public domain research codes. The latter are available free of charge.

Although there are many different LP codes available nowadays, the reader may be
interested in preparing his own implementation of an IPM. We have to warn him
that it might not be a trivial task. A lot of different issues have to be dealt with,
e.g., the system design, the choice of the programming language, etc.

In general, implementing the primal-dual algorithm in C or FORTRAN is a time-consuming job, whereas an IPM can be implemented quickly in the MATLAB environment. MATLAB has a sparse matrix capability, which means that relatively large LP problems can be solved efficiently with it.
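As an illustration of how little code the core linear algebra requires in such an environment, the following Python/scipy sketch forms the normal equations matrix AD²Aᵀ of a primal-dual step and solves it by a sparse factorization; the diagonal scaling D² = diag(x/z), the function name, and the choice of a sparse LU factorization (rather than Cholesky) are illustrative assumptions.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def normal_equations_step(A, x, z, rhs):
    """Solve (A D^2 A^T) dy = rhs with D^2 = diag(x/z); A is a scipy.sparse matrix (sketch)."""
    d2 = sp.diags(x / z)                 # D^2 as a sparse diagonal matrix
    M = (A @ d2 @ A.T).tocsc()           # normal equations matrix, stays sparse
    return spla.splu(M).solve(rhs)       # sparse factorization and backsolve
```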

If performance is the ultimate goal, then the code should be implemented in C or FORTRAN. Which programming language to use is a matter of taste; even the commercial codes do not all use the same language.

When a programming language has been chosen the next step is to choose a system
design. It is advisable to build the code based on well structured modules. For
instance the Cholesky factorization should be implemented in a separate module.
Another recommendation is to build the optimizer such that it can be called as a
stand alone procedure.

Regarding the form of input data, the standard MPS format surely has to be accepted
although more efficient binary formats might be advantageous. We refer the reader
to the book [67] for a good discussion of the MPS format.

A good reason to be able to read the MPS format is that the majority (if not all) of the test problems are available in it. One such collection is the so-called Netlib suite, available via anonymous ftp to netlib.att.com (cd netlib/lp). Another source of larger and more difficult problems is an LP test collection gathered at the University of Iowa. It is also available via anonymous ftp to col.biz.uiowa.edu (cd pub/testprob/lp).

6.9 IS ALL THE WORK ALREADY DONE?


From reading the previous sections, the reader may have the impression that the area of interior point methods for linear programming has been deeply explored. Indeed, current IPM implementations are extremely powerful, robust and often significantly faster than the simplex codes. A natural question arises about the relevant problems that still remain open in the implementation of these methods. From our point of view there are at least two important issues:

• implementation of postoptimal analysis in a correct way, and

• warm start.

When implementing LP algorithms one has to consider methods to produce shadow prices and ranges. OR practitioners are used to the simplex based postoptimal analysis that assumes the knowledge of an optimal basis, and they are not always aware of its potential mathematical errors and sometimes misleading economic consequences [39]. This is especially the case if the LP problem is degenerate, which is almost always the case in practice. An interior-point based postoptimal analysis will in this case give more accurate answers, see Jansen et al. [43]. However, an interior-point based postoptimal analysis is potentially computationally much more expensive than the simplex based one.

The general warm start procedures in IPMs are still unsatisfactorily slow and are not competitive with the simplex based reoptimizations. As mentioned in Section 6.7, the only promising results to date have been obtained in the particular case when an IPM is used to find an approximate analytic center of a polytope (not to optimize an LP). It seems that the best approach currently is to solve difficult problems with an IPM, identify an optimal basis and later employ the simplex method if reoptimization is required.

Apart from the two practical problems mentioned above, further implementational improvements can be expected. Although we have concluded that current IPM implementations work efficiently, we are aware that there exist LP problems that are very sparse but produce surprisingly dense symmetric factorizations, e.g.: dfl001 or the pds- problems from the Netlib collection. It is possible that the right way to solve these problems is to apply iterative approaches to the Newton equations system.

Finally, the increasing accessibility of parallel computers in the near future will make IPMs that exploit this architecture more important. Indeed, such algorithms will be able to solve LP problems much larger than currently possible. This will have important consequences for the areas of integer programming (improved cutting-plane methods) and stochastic optimization.

6.10 CONCLUSIONS
In the previous sections we have addressed the most important issues of an efficient
implementation of interior-point methods.

Our discussion has concentrated on the most important algorithmic issues, such as the role of centering (or, equivalently, following the central path) and the way of treating infeasibility in a standard primal-dual algorithm (we have presented the HLF model, which solves the problem of detecting infeasibility efficiently). Furthermore, we have discussed in detail the computationally most expensive part of IPMs: the solution of the Newton equations system.

The progress in IPMs for LP during the past decade is impressive. Indeed, a complete theory of interior point methods has been developed. Moreover, based on this theory, many efficient implementations of IPMs have been constructed. In fact, due to this algorithmic development and the improvements in computer hardware, much larger LP problems can be solved routinely today than a decade ago. Even though the methods are not going to improve as dramatically over the next decade, we nevertheless predict significant improvements in the current implementations.

Finally, we hope and believe that these developments are useful to the OR practi-
tioners.

Acknowledgements
The research of the second author has been supported by the Fonds National de la
Recherche Scientifique Suisse, grant #12-42503.94. The research of the third author
has been supported by the Hungarian Research Fund OTKA No. T-016413.

REFERENCES
[1] I. Adler, N. Karmarkar, M. G. C. Resende, and G. Veiga. Data structures
and programming techniques for the implementation of Karmarkar's algorithm.
ORSA J. on Comput., 1(2):84-106,1989.
[2] E. D. Andersen. Finding all linearly dependent rows in large-scale linear pro-
gramming. Optimization Methods and Software, 6:219-227, 1995.
[3] E. D. Andersen and K. D. Andersen. Presolving in Linear Programming.
Preprint 35, Dept. of Math. and Computer Sci., Odense University, 1993. To
appear in Math. Programming.
[4] E. D. Andersen and Y. Ye. Combining interior-point and pivoting algo-
rithms for linear programming. Technical report, Department of Management

Sciences, The University of Iowa, 1994. Available via anonymous ftp from
ftp://col.biz.uiowa.edu/pub/papers/cross.ps.Z, to appear in Management Sci-
ence.

[5] K. D. Andersen. A modified Schur complement method for handling dense


columns in interior point methods for linear programming. Technical report,
Dept. of Math. and Computer Sci., Odense University, 1994. Submitted to ACM
Transaction on Mathematical Software.

[6] M. Arioli, J. W. Demmel, and I. S. Duff. Solving sparse linear systems with
sparse backward error. SIAM J. Matrix Anal. Appl., 10(2):165-190, 1989.

[7] M. Arioli, I. S. Duff, and P. P. M. de Rijk. On the augmented system approach


to sparse least-squares problems. Numer. Math., 55:667-684, 1989.

[8] J. R. Birge, R. M. Freund, and R. Vanderbei. Prior reduced fill-in solving


equations in interior point algorithms. Oper. Res. Lett., 11:195-198, 1992.

[9] R. E. Bixby. Progress in linear programming. ORSA J. on Comput., 6(1):15-22,


1994.

[10] R. E. Bixby and M. J. Saltzman. Recovering an optimal basis from an interior


point solution. Oper. Res. Lett., 15(4):169-178,1993.

[11] Å. Björck. Methods for sparse linear least squares problems. In J. R. Bunch
and D. J. Rose, editors, Sparse Matrix Computation, pages 177-201. Academic
Press INC., 1976.

[12] A. L. Brearley, G. Mitra, and H. P. Williams. Analysis of mathematical program-


ming problems prior to applying the simplex algorithm. Math. Programming,
15:54-83, 1975.

[13] J. R. Bunch and B. N. Parlett. Direct methods for solving symmetric indefinite
systems of linear equations. SIAM J. Numer. Anal., 8:639-655, 1971.

[14] T. J. Carpenter, I. J. Lustig, J. M. Mulvey, and D. F. Shanno. Separable


quadratic programming via a primal-dual interior point method and its use in
a sequential procedure. ORSA J. on Comput., 5:182-191, 1993.

[15] S. F. Chang and S. T. McCormick. A hierarchical algorithm for making sparse


matrices sparse. Math. Programming, 56:1-30, 1992.

[16] I. C. Choi, C. L. Monma, and D. F. Shanno. Further Development of a Primal-


Dual Interior Point Method. ORSA J. on Comput., 2(4):304-311, 1990.

[17] E. Christiansen and K. O. Kortanek. Computation of the collapse state in limit


analysis using the LP primal affine scaling algorithm. J. Comput. Appl. Math.,
34:47-63, 1991.
[18] P. D. Domich, P. T. Boggs, J. E. Rogers, and C. Witzgall. Optimizing over
three dimensional subspaces in an interior-point method for linear programming.
Linear Algebra Appl., 152:315-342, 1991.
[19] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct methods for sparse matrices.
Oxford University Press, New York, 1989.
[20] I. S. Duff, N. I. M. Gould, J. K. Reid, J. A. Scott, and K. Turner. The factoriza-
tion of sparse symmetric indefinite matrices. IMA J. Numer. Anal., 11:181-204,
1991.
[21] S. C. Eisenstat, M. C. Gursky, M. H. Schultz, and A. H. Sherman. The Yale
sparse matrix package, I. the symmetric code. Internat. J. Numer. Methods
Engrg., 18:1145-1151,1982.
[22] A. S. El-Bakry, R. A. Tapia, and Y. Zhang. A study of indicators for identifying
zero variables in interior-point methods. SIAM Rev., 36(1):45-72, 1994.
[23] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Un-
constrained Minimization Techniques. John Wiley and Sons, New York, 1968.

[24] J. J. H. Forrest and D. Goldfarb. Steepest-edge simplex algorithms for linear


programming. Math. Programming, 57:341-374, 1992.
[25] J. J. H. Forrest and J. A. Tomlin. Implementing the simplex method for opti-
mization subroutine library. IBM Systems J., 31(1):11-25, 1992.
[26] R. Fourer and S. Mehrotra. Solving symmetric indefinite systems in an interior
point method for linear programming. Math. Programming, 62:15-40, 1993.
[27] K. R. Frisch. The logarithmic potential method of convex programming. Tech-
nical report, University Institute of Economics, Oslo, Norway, 1955.
[28] D. M. Gay. Electronic mail distribution of linear programming test problems.
COAL Newsletter, 13:10-12, 1985.
[29] A. George and J. W. -H. Liu. Computing Solution of Large Sparse Positive
Definite Systems. Prentice-Hall, Englewood Cliffs, NJ, 1981.
[30] A. George and J. W. -H. Liu. The evolution of the minimum degree ordering
algorithm. SIAM Rev., 31:1-19, 1989.

[31] P. E. Gill, W. Murray, M. A. Saunders, J. A. Tomlin, and M. H. Wright. On the


projected Newton barrier methods for linear programming and an equivalence
to Karmarkar's projective method. Math. Programming, 36:183-209, 1986.
[32] J. L. Goffin and J. P. Vial. Cutting planes and column generation techniques
with the projective algorithm. J. Optim. Theory Appl., 65:409-429, 1990.
[33] A. J. Goldman and A. W. Tucker. Polyhedral convex cones. In H. W. Kuhn
and A. W. Tucker, editors, Linear Inequalities and related Systems, pages 19-40,
Princeton, New Jersey, 1956. Princeton University Press.
[34] A. J. Goldman and A. W. Tucker. Theory of linear programming. In H. W.
Kuhn and A. W. Tucker, editors, Linear Inequalities and related Systems, pages
53-97, Princeton, New Jersey, 1956. Princeton University Press.
[35] J. Gondzio. Splitting dense columns of constraint matrix in interior point meth-
ods for large scale linear programming. Optimization, 24:285-297, 1992.
[36] J. Gondzio. Multiple centrality corrections in a primal-dual method for lin-
ear programming. Technical Report 1994.20, Logilab, HEC Geneva, Section of
Management Studies, University of Geneva, November 1994. Revised May 1995,
to appear in Computational Optimization and Applications.
[37] J. Gondzio. Presolve analysis of linear programs prior to applying the interior
point method. Technical Report 1994.3, Logilab, HEC Geneva, Section of Man-
agement Studies, University of Geneva, 1994. Revised Dec. 1994, to appear in
ORSA J. on Comput.
[38] J. Gondzio. HOPDM (version 2.12) - A fast LP solver based on a primal-dual
interior point method. European J. Oper. Res., 85:221-225, 1995.
[39] H. J. Greenberg. The use of the optimal partition in a linear programming
solution for postoptimal analysis. Oper. Res. Lett., 15(4):179-186, 1994.
[40] O. Güler and Y. Ye. Convergence behaviour of interior-point algorithms. Math.
Programming, 60(2):215-228, 1993.
[41] A. J. Hoffman, M. Mannos, D. Sokolowsky, and N. Wiegman. Computational
experience in solving linear programs. Journal of the Society for Industrial and
Applied Mathematics, 1:17-33,1953.
[42] P.-F. Hung and Y. Ye. An asymptotical O(√nL)-iteration path-following lin-
ear programming algorithm that uses wide neighborhoods. Technical report,
Department of Mathematics, The University of Iowa, March 1994. To appear
in SIAM J. on Optimization.

[43] B. Jansen, C. Roos, and T. Terlaky. An interior point approach to postopti-


mal and parametric analysis in linear programming. In Interior point meth-
ods. Eötvös Loránd University, Department of Operations Research, H-1088 Bu-
dapest, Muzeum krt. 6-8., Hungary, 1992.
[44] B. Jansen, C. Roos, T. Terlaky, and J. P. Vial. Primal-dual target follow-
ing algorithms for linear programming. Technical Report 93-107, Faculty of
Technical Mathematics and Informatics, Technical University Delft, Delft, The
Netherlands, 1993.
[45] B. Jansen, T. Terlaky, and C. Roos. The theory of linear programming: Skew
symmetric self-dual problems and the central path. Optimization, 29:225-233,
1994.
[46] N. K. Karmarkar. A polynomial-time algorithm for linear programming. Com-
binatorica, 4:373-395, 1984.

[47] N. K. Karmarkar, J. C. Lagarias, L. Slutsman, and P. Wang. Power series


variants of Karmarkar-type algorithms. AT&T Tech. J., 68:20-36, 1989.
[48] M. Kojima, N. Megiddo, and S. Mizuno. A primal-dual infeasible-interior-point
algorithm for linear programming. Math. Programming, 61:263-280, 1993.
[49] M. Kojima, S. Mizuno, and A. Yoshise. A primal-dual interior point algo-
rithm for linear programming. In N. Megiddo, editor, Progress in Mathemati-
cal Programming: Interior-Point Algorithms and Related Methods, pages 29-47.
Springer Verlag, Berlin, 1989.
[50] J. W. -H. Liu. A generalized envelope method for sparse factorization by rows.
ACM Trans. Math. Software, 17(1):112-129,1991.

[51] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Computational experience with


a primal-dual interior point method for linear programming. Linear Algebra
Appl., 20:191-222, 1991.

[52] I. J. Lustig, R. E. Marsten, and D. F. Shanno. The interaction of algorithms and


architectures for interior point methods. In P. M. Pardalos, editor, Advances in
optimization and parallel computing, pages 190-205. Elsevier Sciences Publishers
B.V.,1992.
[53] I. J. Lustig, R. E. Marsten, and D. F. Shanno. On implementing Mehrotra's
predictor-corrector interior-point method for linear programming. SIAM J. on
Optim., 2(3):435-449, 1992.

[54] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Interior point methods for linear
programming: Computational state of the art. ORSA J. on Comput., 6(1):1-15,
1994.
[55] H. M. Markowitz. The elimination form of the inverse and its application to
linear programming. Management Sci., 3:255-269, 1957.
[56] I. Maros and Cs. Meszaros. The role of the augmented system in interior point
methods. Technical Report TR/06/95, Brunel University, Department of Math-
ematics and Statistics, London, 1995.
[57] K. A. McShane, C. L. Monma, and D. F. Shanno. An implementation of a
primal-dual method for linear programming. ORSA J. on Comput., 1(2):70-83,
1989.
[58] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo,
editor, Progress in Mathematical Programming: Interior-Point Algorithms and
Related Methods, pages 131-158. Springer Verlag, 1989.
[59] N. Megiddo. On finding primal- and dual- optimal bases. ORSA J. on Comput.,
3(1):63-65, 1991.
[60] S. Mehrotra. Handling free variables in interior methods. Technical Report
91-06, Department of Industrial Engineering and Management Sciences, North-
western University, Evanston, USA., March 1991.
[61] S. Mehrotra. High order methods and their performance. Technical Report 90-
16R1, Department of Industrial Engineering and Management Sciences, North-
western University, Evanston, USA., 1991.
[62] S. Mehrotra. On the implementation of a primal-dual interior point method.
SIAM J. on Optim., 2(4):575-601, 1992.
[63] O. du Merle, J. L. Goffin, and J. P. Vial. A short note on the comparative be-
haviour of Kelley's cutting plane method and the analytic center cutting plane
method. Technical Report 1996.4, Logilab, HEC Geneva, Section of Manage-
ment Studies, University of Geneva, January 1996.
[64] Cs. Meszaros. Fast Cholesky factorization for interior point methods of linear
programming. Technical report, Computer and Automation Institute, Hungar-
ian Academy of Sciences, Budapest, 1994. To appear in Computers & Mathe-
matics with Applications.
[65] Cs. Meszaros. The "inexact" minimum local fill-in ordering algorithm. Working
paper WP 95-7, Computer and Automation Institute, Hungarian Academy of
Sciences, Budapest, 1995.

[66] Cs. Meszaros. The augmented system variant of IPMs in two-stage stochastic
linear programming computation. Working paper WP 95-1, Computer and
Automation Institute, Hungarian Academy of Sciences, Budapest, 1995.
[67] J. L. Nazareth. Computer Solution of Linear Programs. Oxford University Press,
New York, 1987.
[68] J. von Neumann. On a maximization problem. Technical report, Institute for
Advanced Study (Princeton, NJ, USA), 1947.
[69] E. Ng and B. W. Peyton. A supernodal Cholesky factorization algorithm for
shared-memory multiprocessors. SIAM J. Sci. Statist. Comput., 14(4):761-769,
1993.
[70] L. Portugal, F. Bastos, J. Júdice, J. Paixão, and T. Terlaky. An investigation
of interior point algorithms for the linear transportation problems. Technical
Report 93-100, Faculteit der Technische Wiskunde en Informatica, Technische
Universiteit Delft, The Netherlands, 1993.
[71] M. G. C. Resende and G. Veiga. An efficient implementation of a network
interior point method. Technical report, AT&T Bell Laboratories, Murray Hill,
NJ, USA, February 1992.
[72] E. Rothberg and A. Gupta. Efficient Sparse Matrix Factorization on High-
Performance Workstations-Exploiting the Memory Hierarchy. ACM Trans.
Math. Software, 17(3):313-334, 1991.
[73] G. Sonnevend, J. Stoer, and G. Zhao. Subspace methods for solving linear
programming problems. Technical report, Institut für Angewandte Mathematik
und Statistik, Universität Würzburg, Würzburg, Germany, January 1994.
[74] G. W. Stewart. Modifying pivot elements in Gaussian elimination. Math. Comp.,
28:537-542, 1974.
[75] G. W. Stewart. On scaled projections and pseudoinverses. Linear Algebra Appl.,
112:189-193, 1989.
[76] R. Subramanian, R. P. S. Scheff Jr., J. D. Quillinan, D. S. Wiper, and R. E.
Marsten. Coldstart: Fleet assignment at Delta Air Lines. Interfaces, 24(1),
1994.
[77] U. H. Suhl. MOPS - Mathematical optimization system. European J. Oper.
Res., 72(2):312-322, 1994.

[78] U. H. Suhl and L. M. Suhl. Computing sparse LU factorizations for large-scale


linear programming bases. ORSA J. on Comput., 2(4):325-335, 1990.

[79] W. F. Tinney and J. W. Walker. Direct solution of sparse network equations by


optimally ordered triangular factorization. In Proceedings of IEEE, volume 55,
pages 1801-1809. 1967.

[80] A. W. Tucker. Dual systems of homogeneous linear relations. In Linear inequali-


ties and related systems, pages 3-18. Princeton University Press, Princeton, NJ,
1956.
[81] K. Turner. Computing projections for Karmarkar algorithm. Linear Algebra
Appl., 152:141-154,1991.

[82] R. J. Vanderbei. Splitting dense columns in sparse linear systems. Linear


Algebra Appl., 152:107-117,1991.

[83] R. J. Vanderbei and T. J. Carpenter. Symmetric indefinite systems for interior


point methods. Math. Programming, 58:1-32, 1993.

[84] X. Xu. An O(√nL)-iteration large-step infeasible path-following algorithm for


linear programming. Technical report, College of Business Administration, The
University of Iowa, Iowa City, IA 52242, August 1994.

[85] X. Xu. On the implementation of a homogeneous and self-dual linear program-


ming algorithm. Technical report, 1994. Manuscript.

[86] X. Xu, P.-F. Hung, and Y. Ye. A simplified homogeneous and self-dual linear
programming algorithm and its implementation. Technical report, Department
of Management Sciences, The University of Iowa, 1993.

[87] X. Xu and Y. Ye. A generalized homogeneous and self-dual algorithm for linear
programming. Oper. Res. Lett., 17:181-190, 1995.

[88] M. Yannakakis. Computing the minimum fill-in is NP-complete. SIAM J.


Algebraic Discrete Methods, pages 77-79, 1981.

[89] Y. Ye. On the finite convergence of interior-point algorithms for linear program-
ming. Math. Programming, 57:325-335, 1992.

[90] Y. Ye, M. J. Todd, and S. Mizuno. An O(√nL)-iteration homogeneous and


self-dual linear programming algorithm. Math. Oper. Res., 19:53-67, 1994.
PART II
CONVEX PROGRAMMING
7
INTERIOR-POINT METHODS
FOR CLASSES OF CONVEX
PROGRAMS
Florian Jarre
Institut für Angewandte
Mathematik und Statistik
Universität Würzburg
97074 Würzburg, Germany

Introduction
Many of the theoretical results of the previous chapters about interior-point methods
for solving linear programs also hold for nonlinear convex programs. In this chapter
we intend to give a simple self-contained introduction to primal methods for convex
programs. Our focus is on the theoretical properties of the methods; in Section
7.3, we try to bridge the gap between theory and implementation, and propose
a primal long-step predictor-corrector infeasible interior-point method for convex
programming. Our presentation follows the outline in [19]; for a comprehensive
treatment of interior-point methods for convex programs we refer to [29] or [7].

The generalization of primal methods-such as barrier methods, primal affine scaling


methods, path-following methods or the method of centers-from linear programs
to classes of smooth convex programs is based on the identification of two of the key
properties of the (scalar) function −ln t : R₊ → R. These properties, known as
self-concordance [29], are directly or implicitly used in most, if not all, analyses of
logarithmic barrier methods for linear or convex quadratic programs.

This chapter is divided into four sections. In Section 7.1 a convex problem is de-
fined and an elementary method for solving this problem is listed. For this method
some crucial questions are stated that determine its efficiency. Based on these ques-
tions the concept of self-concordance is naturally derived in Section 7.2, and some
important examples of self-concordant barrier functions are listed. Section 7.2 also
presents the basic theoretical results needed in Section 7.3. In Section 7.3 a short
proof of polynomiality for the method of centers (in slight modification of the con-
ceptual method described in Section 7.1) is given. The proof is very simple once the
results of Section 7.2 are known. Section 7.3 closes with an implementable barrier


method for a slightly more general form of convex programs. In Section 7.4 we list
some applications of convex programs.

7.1 THE PROBLEM AND A SIMPLE METHOD

7.1.1 A Convex Problem and Assumptions


Let a convex set S ⊂ R^n,

S = {x | r_i(x) ≤ 0 for 1 ≤ i ≤ m}    (7.1)

be given by m restrictions r_i : S → R. We assume that S has a nonempty interior S⁰, and that the constraint functions r_i are continuous on S and three times continuously differentiable on S⁰. We consider the problem of minimizing a linear objective function c^T x subject to the constraint x ∈ S, i.e. we search for x^opt ∈ S and λ^opt such that

c^T x^opt = λ^opt = min { c^T x | x ∈ S }.    (7.2)

(Note that a nonlinear objective function can be rewritten as a linear objective function by adding one more constraint and one more variable.) Throughout we assume that the set S^opt of optimal solutions is nonempty and bounded. (Note that S itself may be unbounded.) Next we briefly outline the method of centers for solving (7.2), and point out certain critical aspects of this method.

7.1.2 The Method of Centers


A simple method for solving (7.2) is the method of centers of Huard (1966), [13], which is also well suited to point out the two crucial aspects of solving (7.2).

The method of centers is based on the logarithmic barrier function

φ(x) := −Σ_{i=1}^{m} ln(−r_i(x))    (7.3)

for the set S. Throughout, we assume further that the constraint functions r_i are such that φ is well defined and convex on S⁰. In particular, we exclude restrictions like r_i(t) := max{0, t}³ for the negative real axis. (In this example the function

φ(t) := −ln(−r_i(t)) is not defined on the interior S⁰ := {t ∈ R | t < 0} of the feasible set.) It is straightforward to verify that φ is smooth and convex if the functions r_i are so. By assumption, φ is finite on S⁰, and as x approaches the boundary ∂S of S we may verify that lim_{x→∂S, x∈S⁰} φ(x) = ∞. (While most of the applications will probably involve the logarithms of the constraint functions as a barrier, it is not necessary to assume such a structure; all that will be used in our analysis are the self-concordance properties derived in the next section.)

Let some λ > λ^opt be given. (If some point x ∈ S is known we may choose for example λ = 1 + c^T x.) We define

S(λ) := {x ∈ S | c^T x ≤ λ}    (7.4)

and consider the following barrier function for S(λ):

ϕ(x, λ) := −q ln(λ − c^T x) + φ(x)    (7.5)

for some fixed q ≥ 1. The minimizer of ϕ(·, λ) will be denoted by x(λ). (Such a minimizer x(λ) exists if S(λ) is bounded, or equivalently, if S^opt is nonempty and bounded, and it is unique if φ is strictly convex.) The simplest form of the method of centers proceeds in two steps.

Method of centers
Initialization: Let some value λ = λ⁰ > λ^opt be given and some approximation x^(0) ∈ S⁰ to x(λ⁰) with c^T x^(0) < λ⁰. Set k = 0.
Until some stopping criterion is satisfied repeat

1. Reduce λ to λ^{k+1} = ½(λ^k + c^T x^{(k)}).

2. Approximate x(λ^{k+1}) by Newton's method (for minimizing ϕ(·, λ^{k+1})) starting at x^{(k)}.

3. Set k = k + 1.
End.
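To make the two steps concrete, the following Python sketch instantiates the method of centers for a polyhedral set S = {x | Ax ≤ b} with the logarithmic barrier (7.3), the weight q = 1 in (7.5), and damped Newton steps; all function names, parameters and the toy data in the usage example are illustrative assumptions, not part of the original text.

```python
import numpy as np

def method_of_centers(A, b, c, x0, lam0, outer_iters=30, newton_iters=20):
    """Method of centers (sketch) for min c^T x s.t. Ax <= b, started at an interior x0."""
    def grad_hess(x, lam):
        s = b - A @ x                   # slacks of the constraints, must stay positive
        t = lam - c @ x                 # slack of the objective cut, must stay positive
        g = c / t + A.T @ (1.0 / s)     # gradient of phi(x, lam) with weight q = 1
        H = np.outer(c, c) / t**2 + A.T @ np.diag(1.0 / s**2) @ A
        return g, H

    x, lam = x0.copy(), lam0
    for _ in range(outer_iters):
        lam = 0.5 * (lam + c @ x)       # step 1: reduce lambda toward c^T x
        for _ in range(newton_iters):   # step 2: approximate x(lam) by Newton's method
            g, H = grad_hess(x, lam)
            d = np.linalg.solve(H, -g)
            alpha = 1.0                 # damping: keep the iterate strictly feasible
            while (np.any(b - A @ (x + alpha * d) <= 0)
                   or lam - c @ (x + alpha * d) <= 0):
                alpha *= 0.5
            x = x + alpha * d
    return x

# Usage (hypothetical data): minimize x1 + x2 over the box 0 <= x <= 1.
A = np.array([[1., 0.], [0., 1.], [-1., 0.], [0., -1.]])
b = np.array([1., 1., 0., 0.])
c = np.array([1., 1.])
x0 = np.array([0.5, 0.5])
x_opt = method_of_centers(A, b, c, x0, lam0=1.0 + c @ x0)
```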

The performance of this method depends on two crucial questions.

1. How well does Newton's method perform when applied to minimizing ϕ(·, λ)?

2. How large is the "distance" ½(λ^k − c^T x^{(k)}) by which λ^k is reduced compared to the "distance" c^T x^{(k)} − λ^opt of x^{(k)} to optimality?

It is intuitively clear that the method of centers will be interesting if and only if both questions allow a satisfactory answer. These two questions will be used to motivate two forms of local Lipschitz continuity in the next section.

7.2 SELF-CONCORDANCE
We give an answer to the two crucial questions of the previous section by introducing
the notion of self-concordance. Self-concordant functions are defined and examined
in great detail by Nesterov and Nemirovsky in [27, 28], and while our presentation
is different, most results presented in this section are due to [28].

7.2.1 The First Question, Newton's Method


Next we state a condition that enables us to analyze some theoretical properties of Newton's method for minimizing the barrier function φ for the set S. These results will be applied to ϕ(·, λ) of (7.5) and S(λ) later on. Note that ϕ(·, λ) has the same structure as φ, being the sum of the logarithms of finitely many smooth constraints.

Derivation of Self-Concordance
A condition for a "nice" performance of Newton's method can be derived by the
following straightforward argument.

• Newton's method for minimizing φ starting at x^{(k)} is based on an approximation to φ, the approximation having a constant Hessian H = D²φ(x^{(k)}) (quadratic model).

• Intuitively it is clear that Newton's method is "good" if the relative change of the Hessian D²φ(x) is small for small changes in x.

• The absolute change of D²φ(x) is determined by the third derivative D³φ(x).

• Thus, D³φ(x) should be small relative to D²φ(x).

• Let us consider a simple example: φ(t) := −ln t, the logarithmic barrier function for the positive real axis. In this case, for t > 0, φ''(t) = 1/t² and φ'''(t) = −2/t³. The natural condition to bound φ''' relative to φ'' is to require |φ'''(t)| ≤ 2 φ''(t)^{3/2}. Of course, the constant "2" appears somewhat arbitrary, and also the exponent 3/2 needs further justification. But as we will see next, this choice of condition makes sense indeed.

• The generalization to n dimensions of the above condition results in the self-concordance condition given in [27]; for any x ∈ S⁰ ⊂ R^n and any direction h ∈ R^n we require

|D³φ(x)[h, h, h]| ≤ 2 (D²φ(x)[h, h])^{3/2}

to hold true. From this formulation it becomes evident that the exponent 3/2 on the right hand side is natural in that it ensures that the relation is independent of the norm of h.

• Note that the quantities involved in this relation are just the second and third directional derivatives of φ at x in direction h. Thus, the above relation can equivalently be rewritten as follows.

Definition 7.2.1 (Self-concordance) Let x ∈ S⁰ be some strictly feasible point for φ and let h ∈ R^n, h ≠ 0 be some direction. Define the function f : I → R by

f(t) = f_{x,h}(t) = φ(x + th).    (7.6)

Here, I = {t | x + th ∈ S⁰} is an open interval containing 0. The function f hence depends on the barrier function φ as well as on x and h. The function φ is self-concordant if it is convex, three times continuously differentiable, and if for any x ∈ S⁰ and h ∈ R^n, the function f satisfies

|f'''(0)| ≤ 2 f''(0)^{3/2}.    (7.7)

Throughout we will assume that φ is a barrier function for S, that is, for any point y ∈ ∂S on the boundary of S we assume that lim_{x→y, x∈S⁰} φ(x) = ∞. (In [28] the barrier property is called "strong self-concordance".)

Note that inequality (7.7) is not invariant under multiplication of f (respectively φ) by a positive constant. For example, if f satisfies (7.7), the function f̃(t) := 4f(t) satisfies |f̃'''| ≤ f̃''^{3/2}, and the constant "2" is not needed. Condition (7.7) essentially requires that the supremum

sup_{x ∈ S⁰, h ∈ R^n} |f'''_{x,h}(0)| / (f''_{x,h}(0))^{3/2}

is finite, and that φ is multiplied by a sufficiently large constant such that the supremum is less than or equal to 2. Thus, the choice of the second constant ("2") in the definition of (7.7) may be somewhat arbitrary, based on the function −ln t, but it is certainly without loss of generality. In fact, our Definition 7.2.1 is a slight variation of the original definition in [28], where the above supremum is required to be less than or equal to 2/√α, and φ is called self-concordant with parameter α. However, in [28] it is also assumed for most parts of the monograph that α = 1, so that the definitions are more or less the same.

Before proving that our incentive (of finding some criterion which guarantees that Newton's method for minimizing φ converges well) is indeed fulfilled by (7.7), we show that there are a number of functions that satisfy (7.7).

Some Examples
In trying to construct functions that satisfy (7.7), let us start with the function "−ln t", which of course satisfies (7.7).

• Summation. Let us observe first that condition (7.7) is closed with respect to summation, that is, if φ_i : R^n → R satisfy (7.7) for i = 1, 2, then so does φ_{1,2} := φ_1 + φ_2 (as long as the intersection of the domains of φ_1 and φ_2 is not empty).
Indeed, |f_1''' + f_2'''| ≤ |f_1'''| + |f_2'''| ≤ 2(f_1'')^{3/2} + 2(f_2'')^{3/2} ≤ 2(f_1'' + f_2'')^{3/2}. Here, we denote by f_i the restriction of φ_i to the line x + th, f_i(t) := φ_i(x + th). □

• Affine transformations. Similarly, (7.7) is invariant under affine transformations. Let A(x) := Ax + b be an affine mapping with some matrix A ∈ R^{p×q} and some vector b ∈ R^p. If φ(·) : R^p → R satisfies (7.7) then so does φ(A(·)) : R^q → R (as long as there exists some x such that φ(Ax + b) is defined at all).
Indeed,

d^k/dt^k φ(A(x + th)) = d^k/dt^k φ((Ax + b) + t(Ah)) = d^k/dt^k φ(x̃ + t h̃)

with x̃ = Ax + b and h̃ = Ah. Thus, if φ (with f(t) = φ(x̃ + t h̃)) satisfies (7.7), then so does φ(A(·)) with f̃(t) = φ(A(x + th)). □

• Polyhedron. In particular, if we choose A = a_i^T and b = β_i in the previous observation, we may conclude that −ln(a_i^T x + β_i) satisfies (7.7), and by the above closedness under summation we may further conclude that

−Σ_{i=1}^{m} ln(a_i^T x + β_i)    (7.8)

is a self-concordant barrier function for the polyhedron {x | a_i^T x + β_i ≥ 0 for 1 ≤ i ≤ m} (if it has nonempty interior).

Let us give two further brief examples of self-concordant barrier functions.

• Convex quadratic constraints. First note that the logarithmic barrier function

−ln(−q(x))    (7.9)

of the constraint q(x) ≤ 0 with a convex quadratic function q : R^n → R satisfies (7.7).
Indeed, the restriction f(t) := −ln(−q(x + th)) can be split into two linear parts: since q is quadratic it follows that q(x + th) = a_2 t² + a_1 t + a_0 for some real numbers a_i depending only on q, x and h. Since q is convex it follows that a_2 ≥ 0, and since x is strictly feasible, it follows that q(x) < 0. Hence, q(x + th) is either linear in t, or it has two real roots as a function of t. In the latter case f can be written as f(t) = −ln(u_1 t + v_1) − ln(u_2 t + v_2) with v_1 > 0, v_2 > 0, u_1, u_2 ∈ R, and these satisfy condition (7.7) as we have just seen. □

• Semidefiniteness constraint. The second very important example regards positive semidefinite programs. These programs are similar in structure to linear programs; however, the unknown is not a real vector x but a symmetric n×n matrix X, and the constraint x ≥ 0 (meaning that each component of x is nonnegative) is replaced by the constraint X ⪰ 0 that X is positive semidefinite, i.e. h^T X h ≥ 0 for all h ∈ R^n. The logarithmic barrier function for the cone of positive definite matrices is given by

φ(X) := −ln det X  if X is positive definite,  +∞ else.

Given a positive definite n×n matrix X and a symmetric matrix Y, we may consider the restriction f of φ,

f(t) = −ln det(X + tY).    (7.10)

In order to evaluate its derivatives we rewrite f as

f(t) = −2 ln det X^{1/2} − ln det(I + t X^{-1/2} Y X^{-1/2}) = −ln det X − Σ_{i=1}^{n} ln(1 + t λ_i),

where λ_i are the eigenvalues of X^{-1/2} Y X^{-1/2} (independent of t). By the closedness of (7.7) under summation and affine transformations we conclude again that −Σ ln(1 + t λ_i) satisfies (7.7). □
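The eigenvalue identity used in this example is easy to verify numerically; the following sketch (with hypothetical random test data) compares −ln det(X + tY) with −ln det X − Σ ln(1 + tλ_i).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)               # positive definite
Y = rng.standard_normal((n, n))
Y = 0.5 * (Y + Y.T)                       # symmetric

# Eigenvalues of X^{-1/2} Y X^{-1/2} (independent of t).
w, V = np.linalg.eigh(X)
X_inv_half = V @ np.diag(w ** -0.5) @ V.T
lam = np.linalg.eigvalsh(X_inv_half @ Y @ X_inv_half)

t = 0.1                                   # small enough that X + tY stays positive definite
lhs = -np.log(np.linalg.det(X + t * Y))   # f(t) = -ln det(X + tY)
rhs = -np.log(np.linalg.det(X)) - np.sum(np.log(1.0 + t * lam))
print(abs(lhs - rhs))                     # agreement up to rounding error
```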

The H-Norm
Our analysis of Newton's method heavily depends on the choice of the norm in which the analysis is carried out. By convexity of φ, its Hessian H_x := D²φ(x) is positive semidefinite, and we may thus define a semi-norm ‖z‖_{H_x} := (z^T H_x z)^{1/2}. By our assumption on problem (7.2), the set of optimal solutions S^opt is bounded, hence S does not contain a straight line, and therefore φ is strictly convex by the observation following Lemma 7.2.2 below. (If this assumption is violated, and S does contain a straight line, then the null space of D²φ(x) is nontrivial but independent of x, and straightforward modifications are possible to generalize the results of this section.) Thus, ‖·‖_{H_x} is a norm, referred to as the H-norm in the sequel, and as it will turn out, this norm is a natural and very suitable choice for our analysis. Indeed, it will turn out that the H-norm is closely related to the shape of the set S.

Lemma 7.2.2 (Inner ellipsoid) Assume that the function φ is a self-concordant barrier function and set H_x := D²φ(x). Let x ∈ S⁰ and h ∈ R^n be arbitrary. If δ := ‖h‖_{H_x} ≤ 1 then x + h ∈ S.

Proof: As in Definition 7.2.1 denote by f = f_{x,h} the restriction of φ to the line {x + th | t ∈ I}. We note that the differential inequality (7.7) is assumed to hold only at the argument t = 0, since this is easier to verify. However, since (7.7) is assumed to hold for any x ∈ S⁰ and h ∈ R^n in (7.6), the more general inequality

|f'''(t)| ≤ 2 f''(t)^{3/2}    (7.11)

in fact holds true for all t ∈ I. To prove the lemma it suffices to show that the points ±δ^{-1} (±∞ if δ = 0) are in the domain of f or at its boundary. Here, δ² = ‖h‖²_{H_x} = f''(0). We consider the function u(t) = f''(t). Note that u(t) ≥ 0 for all t ∈ I by convexity of f. By finding the poles of u for t ≥ 0 (by pole we denote a point t̄ > 0 where lim_{t→t̄, t<t̄} u(t) = ∞) we may determine the domain of f. (The case t ≤ 0 follows when replacing h by −h in the definition of f.) Let v be the "extremal" solution of the differential inequality u' ≤ 2u^{3/2} in (7.11) with the same initial value as u (i.e. as f''),

v'(t) = 2v(t)^{3/2},  v(0) = δ².

We deduce that u(t) ≤ v(t). (Straightforward, by exploiting the differential inequality (7.11), see for example [21], Theorem 3.1, page 19.) Since v is given by v(t) = δ²(δt − 1)^{-2} and has its pole at t = δ^{-1}, the claim follows. □

Observe that in the case f''(0) = 0 it follows from v(t) ≡ 0 that f''(t) = 0 for all t, i.e. the domain of f in (7.6) is I = R, and S contains the straight line {x + th | t ∈ R}.

Lemma 7.2.2 was first proved in [28], and simple examples are given there (e.g. the function f(t) = −ln t) that show that the bound t < δ^{-1} on the maximum feasible step length for f is tight. The above proof is taken from [17]. Note that only scalar inequalities (such as (7.7)) are needed to provide an inner ellipsoid in n-dimensional space.

The close relation of the H-norm to the shape of the feasible set S will become more apparent in Section 7.2.2, where it is shown that a small multiple of the inner ellipsoid is an outer ellipsoid for a certain subset of S; namely, there exists a small number γ > 1 such that for any x ∈ S⁰, if δ = ‖h‖_{H_x} ≥ γ then either x + h ∉ S or Dφ(x)h < 0.

Relative Lipschitz Condition

In [28], condition (7.7) is rewritten in an equivalent finite difference version: for any x ∈ S⁰, any Δx with δ := ‖Δx‖_{H_x} < 1, and any h ∈ R^n,

x + Δx ∈ S⁰  and  (1 − δ) ‖h‖_{H_x} ≤ ‖h‖_{H_{x+Δx}} ≤ (1/(1 − δ)) ‖h‖_{H_x}.    (7.12)

Before proving that the finite difference version (7.12) follows from (7.7), we use (7.12) to derive a relative Lipschitz condition. By subtracting ‖h‖_{H_x} from (7.12) we obtain

−δ ‖h‖_{H_x} ≤ ‖h‖_{H_{x+Δx}} − ‖h‖_{H_x} ≤ (δ/(1 − δ)) ‖h‖_{H_x},

which implies

| ‖h‖_{H_{x+Δx}} − ‖h‖_{H_x} | ≤ (δ/(1 − δ)) ‖h‖_{H_x},

and hence for the squares of the norms

| h^T (H_{x+Δx} − H_x) h | ≤ M(δ) ‖h‖²_{H_x},    (7.13)

where

M(δ) = 1/(1 − δ)² − 1 = 2δ/(1 − δ) + δ²/(1 − δ)² = 2δ + O(δ²).

Relation (7.13) may be regarded as a relative Lipschitz condition on the Hessian of φ similar to the one developed in [15], and it precisely states our original intention that the relative change of ∇²φ be small if the change ‖Δx‖_{H_x} = δ in its argument is small. However, since condition (7.13) involves two directions Δx and h, it seems less practical than the scalar condition (7.7) which, in many cases, can easily be checked.

By choosing h = Δx/δ it follows that ‖h‖_{H_x} = 1, and when dividing both sides of (7.13) by δ and taking the limit as δ → 0 we obtain |D³φ(x)[h, h, h]| ≤ 2, which is just (7.7). The converse direction of showing that (7.7) implies (7.13) is slightly more difficult as (7.13) involves two vectors Δx and h while (7.7) is a scalar inequality. The following basic lemma due to [28] allows a proof of this implication:

Lemma 7.2.3 (Spectral Radius for Symmetric Trilinear Forms) Let


M : IR n x IR n x IR n --> IR be a symmetric homogeneous trilinear form, let A
IR n x IR n --> IR be a symmetric bilinear form, and let further u > 0 be a number
such that

then
M[x,y,zF $ uA[x,x]A[y,y]A[z,z], Vx,y,z E IRn. (7.14)

This Lemma was stated in [28] and proved in [29, 16]. We give the proof of [16]. Our
proof will use the following slightly generalized version of the well-known Cauchy-
Schwarz inequality.

Generalized Cauchy-Schwarz Inequality.


If A, B are symmetric matrices with Ix T Bxl $ xT Ax, "Ix E IR n , then

(aT Bb)2 $ aT AabT Ab, Va, bE IRn. (7.15)

Proof: Straightforward, see e.g. [16].


Solving Convex Programs 265

Proof of Lemma 7.2.3: For x E IR n denote by Mx the (symmetric) matrix defined


by yT Mxz := Mx[Y, z] := M[x, y, z], Vy, z E IRn. Without loss of generality let
u = 1 (else substitute A by ~A). Without loss of generality we assume that A is
positive definite. (Else replace A by A. := A + tl with f > 0, and take the limit
as f ---> 0.) By substituting M[x,y,zj:= M[A-l/2x,A-l/2y,A-l/2z] we can further
assume that A = I is the identity. Finally, it is sufficient to show that

holds, provided that M[h, h, hj2 :::; Ilhll~, Vh E IR n is true. (The remaining part
follows by applying the generalized Cauchy-Schwarz inequality (7.15) for fixed x to
B = Mx!) Let
, := max{M[x, h, h]1 s.t. IIxl12 = Ilhli2 = I}
and let X, h be the (not necessarily unique) corresponding arguments. The necessary
conditions for a maximum (or a minimum if M[x, h, h] is negative) imply that

where (3 and p are the Lagrange multipliers. From this we deduce that (3 = ,/2 and
p =,(by multiplying from left with (x T , 0) and (0, hT)) and therefore

(Mh)h =,x, (Mh)x = ,h,


i.e. (Mh)2h = ,2h. By symmetry of M h , h is eigenvector of Mh to the eigenvalue
±" which implies that
, = IhT Mhhl = M[h, h, h],
and this completes the proof. o

Proof of (7.12) by (7.7): (The outline of this proof is due to [29].)

=
Let a self-concordant function ¢>, a point x E So, the Hessian matrix D2 ¢>( x) H x,
an arbitrary vector dx E IR n with fJ =
IIdxlIH", < 1 and an arbitrary vector h E IR n
be given. From Lemma 7.2.2 follows that x + dx E So.

To evaluate how the H -norm of the vectors dx and h changes for different matrices
H x +t6x , with t E [0,1]' let us define

u(t) := Ildxllkx+,."x = dx T D2¢>(x + idx)dx ~ °


and
266 CHAPTER 7

Note that u(t) = 1':


6x(t) is as in the proof of Lemma 7.2.2. From the proof of
Lemma 7.2.2 we thu; obtain that

(7.16)

Relation (7.12) can be expressed in terms of v, i.e. we have to show:

The change of v can be estimated by its derivative v'et) using the estimate (7.14).
From self-concordance of rp it follows with (7.14) that

and in particular,
Iv'(t)1 = D3 rp(x + t~x)[~x, h, h]
~ 2)~xT D2rp(x + t~x)~x hT D 2rp(x + Mx)h = 2U(t)1/2 V (t).
Inserting (7.16) in this inequality we obtain a differential inequality,
28
Iv'(t)1 ~ 1 _ t8 vet).

The "extremal" solutions v of this differential inequality satisfying

v I (t) -_+~
-1- t8 v (t), yeO) = v(O),

are given by vet) = v(O)(l - t8)+2, and thus (again by the comparison theorem for
differential inequalities [21])
2 v(O)
v(O)( 1 - t8) ~ vet) ~ (1 _ t8)2

which is just relation (7.12) (after taking square roots). o

Newton's Method
This section is concerned with answering the first crucial question (about Newton's
method for minimizing rp). First note that rp has a unique minimum x· if S is
bounded. The minimum is called analytic center of S (Sonnevend [33]). This nota-
tion is commonly used but somewhat misleading since the analytic center does not
depend on the set of points S, but rather on the barrier function rp describing S.
Solving Convex Programs 267

Next we show that self-concordance of rp further implies that Newton's method con-
verges "well" when applied to minimize rp. Note that a linear perturbation of rp does
not influence the self-concordance condition (7.7), so that our results below can also
be applied to the unconstrained minimization problem

(7.17)

For simplicity of presentation we disregard the linear term (i.e. set J.t = (0), and for
x E So let
(7.18)
denote the Newton step for obtaining the next Newton iterate x := x + 8x. The
following Lemma due to [28) states the main result about Newton's method.

Lemma 7.2.4 (Newton) If rp is a self-concordant barrier function for S and x E


So such that 118xllHr < 1, then rp has a minimum x· E So. Further, the next Newton
iterate x satisfies x E So, and with 8x denoting the next Newton step starting at x,
we have
(7.19)

For 118xllHx :s: 1/4 this implies that Newton's method (without line search) is guar-
anteed to be quadratically convergent with constant at most 196 •

Proof: In [28) it is shown that rp has a minimum. We only prove (7.19) and follow
the outline of [28).

Define x(s) := x+s8x for s E [0,1) where 2 8x = -D 2 rp(x)-lDrp(xf is the Newton


step starting at x to minimize rp. By (7.13), x(s) E So for all s E [0,1) and any
h E IR n we have

(7.20)

For given z E IR n let

/C(s) := Drp(x(s»z - (1 - s)Drp(x)z = Drp(x(s»z + (1 - s)8xT D 2 rp(x)z.


2Jn this Chapter the gradient D¢(x) of some function ¢ is always a row vector, by V¢(x) we
denote the corresponding column vector.
268 CHAPTER 7

Observe that K(O) =


0 and K(l) =
DtjJ(x + ~x)z. Using the generalized Cauchy-
Schwarz inequality (7.15) and defining 6 := II~xIlHr we obtain from (7.20)

IK'(s)1 = /~DtjJ(X(S»z - ~xT D 2 tjJ(X)Z/ = l~xT(D2tjJ(x(s» - D2tjJ(x»zl

< ()(1_1 S6 )2 -1) J~xTD2tjJ(x)~xJzTD2tjJ(X)Z


2

((1_1S6)2 -1) 6l1zllHr·


By integration (K(O) = O!) we can thus bound
IK(l)l::; 11
o
IK'(s)lds::; IIzllHx 6
11
0
(
1
l-s
6)2 - 1 ds
6
= IlzlIHx -~.
2

1-0

Choosing z = ~x = _D2tjJ(x + ~x)-lDtjJ(x + ~xf as the next Newton step we


obtain
II~xllkx+AX = IDtjJ(x + ~x)~xl = 1K(1)1
62 62
::; 1 - 6 IIdxllHx ::; (1 _ 6)2 lI~xIIHx+AZ'
the last inequality following from (7.12). Now the claim follows when dividing the
last line by IldxIIHz+AZ. 0

The following result completes our intuition on the domain of quadratic convergence
of Newton's method.

Corollary 7.2.5 Under the assumptions of Lemma 7.2.4 let x* be the minimum of
(7.17). Newton's method starting at x is quadratically convergent if x E x* + i E(x*),
=
where E(x*) {zl zT H",oz ::; I}.

Proof: Let h E iE(x*) be given i.e. such that {) := IIhllHzo ::; and set xes) i. =
=
x* + sh. For a given z E IR n we may define K(s) DtjJ(x(s»z. We obtain
Solving Convex Programs 269

where the above inequalities follow just as in the proof of Lemma 7.2.4. Setting
z = ~x = -D 2 </J(x" + h)-l D</J(x" + hf as the Newton step starting at x" + h, we
obtain-again as in the proof of Lemma 7.2.4-that

In particular, for 0 :::: 1/5, it follows that II~xIIHr'+h :::: 5/16, and by relation (7.19),
the point x* + h is the domain of quadratic convergence. 0

The importance of these results on Newton's method is that they do not depend at all
on J.l in (7.17) or on the data of the problem (7.2)-(as long as </J is self-concordant).
Furthermore, for J.l = 00, the above corollary implies that the domain of quadratic
convergence is one fifth of the inner ellipsoid of S which in turn is a fixed fraction
of the outer ellipsoid of S as we will see in Section 7.2.2.

Without proof we further quote the following result of [28]. If II~xIIHr < 1/3 then
the distance of x to x" is bounded as follows:

(7.21)

The previous results depend on the H-norm of the Newton step for minimizing </J
starting at x. Clearly, the H -norm is a continuous function of x, and as will be
shown in Section 7.2.2, it is uniformly bounded for the examples considered here.

Nevertheless, the requirement that the starting point x of Newton's method be close
to the center in the sense that II~xIIHr < 1 can often not be satisfied in practice. For
lI~xllHr > 1, the result x + ~x of the Newton step may lie outside S. The following
simple step length rule for a line-search, however, seems to be an apparently favorably
damped version of Newton's method.

Let a search direction ~x be given with D</J(x )~x < 0, for example ~x may be
the Newton direction (7.18) for </J. We are interested in a search step s such that
</J( x + s~x) is as small as possible. (Line-search problem.) For this purpose define
Jet) = </J(x + t~x) as in (7.6) and denote the minimizer of / by t" and the Newton
step for / by ~t = - /' (0)/ /,,(0). A possible rule for the search step s referred to as
the reduced Newton step for / and for which [28] proved global convergence is the
following:
~t
(7.22)
s= l+l~tIJf"(O)'
270 CHAPTER 7

(Note that the Newton step for / refers to a one-dimensional Newton's method that
is not to be confused with the n-dimensional Newton direction for tjJ.)

Lemma 7.2.6 (see [18]). Assume the barrier function tjJ for S is self-concordant.
A line-search for ¢ starting at x in direction Llx using the reduced Newton step s
(7.22) is monotonically convergent, i.e. s is "a little too short", s ::; t*. Further, s
is the largest step possible that is guaranteed to satisfy s ::; t* if all that is known are
the first two derivatives of / at the current point t = O.

Proof:
We first prove monotonicity of the line-search. Let a = /'(0) < 0, b = 1"(0) > 0,
and t* be the zero of get) := /'(t). We show that s ::; t* holds true by writing the
differential inequality (7.7) (self-concordance of J) in terms of the function g = /'.
We obtain
=
g(O) a, g'(O) b,= g"(t) ::; 2g'(t)3/2. (7.23)
The "extremal" solution v of (7.23) that solves the initial value problem

v(O) = a, v'(O) = b, v"(t) = 2v'(t)3/2


is an upper bound for g: Since g' satisfies a first-order initial value inequality it
follows that g' ::; v' (see, for example, [21], Theorem 3.1), and this implies that
g ::; v. The function v is given by vet) = (b- 1 / 2 - t)-1 + a - b1/ 2 and has the
(unique) zero s = ab- 1 /(1 + ab- 1 / 2 ). By construction of v, we may conclude that
s < to.

On the other hand, since g = v is possible, any step larger than s may exceed t* . 0

Observe that if Llx is the Newton direction (7.18) of ¢ at x, then the reduced Newton
step in direction Llx simply yields llx/(l + IILlxIIHJ.

In spite of the strong result of Lemma 7.2.6, it turns out that the reduced Newton
step may be much too short in practice (Lemma 7.2.6 merely gives a worst-case
estimate!) and a practical implementation definitely should not rely on the reduced
Newton step; instead it might use a line search along the Newton direction, for
example.
Solving Convex Programs 271

7.2.2 The Second Question, Outer Ellipsoidal


Approximation
The results of the previous section state that Newton's method for minimizing a
self-concordant function converges quadratically in "one fifth of the inner ellipsoid
around the analytic center". This result looks nice, but it is meaningful only if
the inner ellipsoid is sufficiently large; for example if a small multiple of the inner
ellipsoid forms an outer ellipsoid for S.

Similarly, we may link the second crucial question of Section 7.1 (about the "dis-
tance" cT X(A) - AoPt ) to ellipsoidal approximations of S(A) around the analytic
centers X(A). If <p( ,A) is self-concordant for fixed A the results of the previous
section imply that
E(A) := {zl zT Hx,>.z :s: I}
is an inner ellipsoid for S(A) in the sense that X(A) + E(A) C S(A). (Here, by H x ,>.
we denote the Hessian D;,<p(x, A) of <p( . ,A).) If there exists a small number 1> 1
such that lE(A) yields an outer ellipsoid for S(A), i.e. S(A) C X(A) + lE(A), then
we may conclude that

and thus answer the second question.

It turns out that a second property of the log function is needed to allow such
ellipsoidal approximations. (Self-concordance by itself is not sufficient-as can easily
be seen from the example r/J(t) := -In t - 0" In(1 - t): (0,1) -+ 1R; this function is
self-concordant for 0" ;::: I, the inner ellipsoids around the minimizer l~q, however,
become smaller and smaller as 0" -+ 00.)

A naive derivation of a second property needed to provide outer ellipsoids can again
be obtained from the function "-In t". To prevent the minimum of r/J from being
close to the boundary of the domain of r/J one might impose a condition that bounds
the growth of r/J (i.e. that bounds Dr/J(x»-relative to the canonical norm II .IIHx'
For the function r/J(t) := -In t: 1R+ -+ IR we may observe that
r/J'(t)2 :s: r/J"(t) Vt > 0
holds true. More generally we assume a second differential property of r/J:

Definition 7.2.7 (Self-limitation) Let the function f be defined as in (7.6). Let a


constant parameter 0 ;::: 1 be given. The barrier function r/J: S C IR n -+ IR is called
272 CHAPTER 7

B-self-concordant barrier function if for all x E SO and all h E lRn the function f
satisfies (7.7) and

(7.24)

Remark: This definition coincides with the one given in [29]. In some cases however,
we need to refer to property (7.24) independently of (7.7). We therefore allocate an
extra name and refer to (7.24) (without assuming that (7.7) holds) as B-self-/imiting.

The number VB may be interpreted as a local Lipschitz constant for!jJ (or I), where
the change in the argument x is measured in the H -norm.

Note that just as (7.7), also (7.24) is assumed to hold only for the argument t = 0
since this is easier to verify. However, as before, (7.24) in fact holds true for all tEl.

Note further that (in contrast to (7.7» condition (7.24) is not invariant when adding
a linear perturbation c: x
to!jJ. (Such a perturbation is used below in the logarithmic
barrier approach.)

Equivalent Formulations
As before for (7.7), there are also other equivalent formulations for (7.24).

The first reformulation of (7.24) is the requirement that the function'll: So -+ lR


defined by w(x) := e-¢(x)/9 is concave.

(The proof that this is an equivalent condition is trivial.) Note that w(x) > 0 for
x E So, and'll can be extended contiuously to the boundary of S by setting w(x) = 0
for x E as. The resulting function is related to the multiplicative barrier function
in [14].

The Newton step allows another equivalent formulation of the condition (7.24),
namely the condition that the Newton step (7.18) is to satisfy
(7.25)
The derivation of this formulation is also straightforward, e.g. by applying the KKT-
Theorem to max{D!jJ(x)hl hT D2!jJ(x)h ~ I}. Observe that this formulation uses our
assumption that tP is strictly convex, i.e. that the Newton step exists at all, while
the previous two formulations are slightly more general.
Solving Convex Programs 273

Condition (7.25) is remarkable, as for II~xIIHx < 1 Lemma 7.2.4 about Newton's
method is applicable.

Some Examples
Let us briefly verify that the above examples (7.8) - (7.10) satisfy (7.24) as well.

• Affine transformations. We begin with the invariance of property (7.24) with


respect to affine transformations and observe as for (7.7) that if A(x) := Ax+b
is an affine mapping with some matrix A E JRpxq and some vector b E JRP, then
with 1>( . ): JRP -+ JR, also ¢(A( . )): JRq -+ JR satisfies (7.24) with the same
parameter 8 (as long as there exists some x such that 1>( Ax + b) is defined at
all).
(The proof of this statement is exactly as for (7.7).)

• Summation. Similarly, we observe that if ¢1, 1>2 satisfy (7.24) for some self-
concordance parameters 81, 82, then so does ¢1,2 := ¢1 + 1>2 with self-concor-
dance parameter 81,2 = 81 + 82 (as long as the intersection of the domains of 1>1
and 1>2 is not empty).
(Straightforward)

• Linear, quadratic or semi-definiteness constraints. The proofs for (7.8)


and (7.10) can easily be modified to show the following results.
The logarithmic barrier function of a linear constraint is a 8 = I-self-concordant
barrier function, the logarithmic barrier function -In X of a symmetric positive
definite n x n matrix X is a 8 = n-self-concordant barrier function.
Moreover it is easy to see that the logarithmic barrier function -In(-r(x)) of
any convex constraint function r( x) :::; 0 is 8 = I-self-limiting (but not necessar-
ily self-concordant) as long as there exists some x for which rex) < O. In partic-
ular, the logarithm of a convex quadratic constraint is a 8 = I-self-concordant
barrier function.

Properties
We note here that for 8 < 1 there is no solution that satisfies both, (7.7) and (7.24)
(except for constant functions). This is proved in Lemma 7.2.12 below.

One of our main concerns in our derivation of (7.24) was the desire for an outer
ellipsoid. The following lemma shows that (7.24) indeed provides such an ellipsoid.
274 CHAPTER 7

Lemma 7.2.8 (Outer ellipsoid) Let ¢ be a (J-se/f-concordant barrier function for


Sand x' be the analytic center of S. Let h be some vector with 6 = IlhllHr* >
«(J + 2v'o). Then the points x' ± h are outside S.

Proof: We show that the points ±d«(J + 2v'o) are not feasible for 1 Ix*,h in (7.6), =
where d := I/.J1"(0). We consider the functions g(t) = /'(t) and u(t) = /"(t). To
determine the domain of 1 we investigate the poles of g for t ~ O. By (7.24), g is a
solution of
i(t) ::; (Jg'(t), g(O) = 0, g' (0) = f" (0) > O. (7.26)
Because of the initial values, the inequality g2 ::; (Jg' is "inactive" near t = O. For
small values of t ~ 0 we therefore apply inequality (7.7) again. Let

-[:= dVO.

If 1 is not defined at -[ there is nothing to show. Hence we assume it is, and conclude
analogously as in the proof of Lemma 7.2.2 that (7.7): g"(t) ~ _2g'(t)3/2 implies

g'(t) ~ w(t) := (t + d)-2 for t E [O,-[]

=
(since w satisfies w"(t) -2w'(t?/2 and has the same initial value at t = 0). With
the variable i := t - -[ and g(t) := g(-[ + t) = g(t) relation (7.26) implies

(7.27)

We find that the initial values for (7.27) satisfy

g(O) = if
o
g'(T) dT ~
if0
W(T) dT =
d(I
v'o
v'o =: dl ,
+ (J)
and
g'(O) = g'(-[) ~ w(-[) = d- 2(1 + VO)-2 =: d2 •
Observe that di = (Jd 2 . It follows that g(i) ~ s(i), where s satisfies
s(i)2 = (Js'(i) and s(O) = dl .
The function s is given by s(i) =
(di l - i/(J)-l, and has its pole at i (Jd I l . The =
corresponding value of t is t = -[ + i = d(J(I + 2/v'o). By construction, s(i) ::; g(t),
so that the pole of g and hence of 1 must lie before this point. 0

Note that in the proof of Lemma 7.2.8, the function w can be continued beyond
the point -[ in such a way that w =
s' for t > t. The second integral W of w is
Solving Convex Programs 275

twice continuously differentiable, and satisfies the relations (7.7) and (7.24) almost
everywhere (except at 1). This shows that also the bound d(9+2VO) obtained from
W cannot be improved for general self-concordant functions.

We point out that the proof of Lemma 7.2.8 may be generalized to yield an outer
ellipsoid centered at other points i: #- x* if the corresponding function f satisfies
df'(i:) ::; a: < 1 independent of h. (In this case of course the constants will change.)

An immediate consequence of Lemma 7.2.2 and Lemma 7.2.8 is the following Theo-
rem:

Theorem 7.2.9 Let ¢> be a 9-self-concordant barrier function for S and let the el-
lipsoid E(x) ={hi hT D2¢>(x)h ::; I} be defined by the Hessian of rp at x. For all
x E So we have
x + E(x) C S,
and if the minimum x = x* of rp exists we further have

x* + (9 + 2v'o)E(x*) :J S.

A two-sided ellipsoidal approximation of this type is also proved in [29] with a slightly
larger ratio of inner and outer ellipsoid.

It is clear that two-sided ellipsoidal approximations of S by concentric ellipsoids with


=
a fixed similarity ratio l' 9 + 2VO are only possible at or near the analytic center
x' of S. For arbitrary x E So however, we can show the following new corollary. By
"cutting off" from S a suitable half-space 1t through the current point x, a small
multiple of the inner ellipsoid E( x) yields an outer ellipsoid for the remaining part
of Sn1t.

Corollary 7.2.10 Let rp be a 8-self-concordant bamer function forS and let x E So


be arbitrary. Further define

1t := {y I Drp(x)(y - x) ~ O} and E(x):= {h I hT D 2rp(x)h ::; I},

then
x + E(x) C S,
and
x + (8 + 2v'o)E(x) :J S n 1t.
276 CHAPTER 7

Proof: The proof of Lemma 7.2.8 can be applied to the Corollary. 0

As pointed out above, the two-sided ellipsoidal approximations proved above can be
used to find an estimate for our second question regarding the distance ,\ - cT x('\)
compared to cT x('\) - ,\oP'. However, for Ii: > 1 in (7.5) the resulting answer is not
optimal. Below we list a stronger estimate taken from [11]. (The result in [11] is
slightly more general than what is needed to answer the second "crucial question"
in Section 7.1.2.)

Lemma 7.2.11 Let the interior of the set 8('\) in (7.4) be 1!onempty and bounded.
Let ,\op' = min{cT x I x E 8('\)} (as before) and let IjJ be a B-selflimiting barrier
function for 8. Let further Ii: ~ 1 be a constant and

x('\) := arg min ljJ(x) - Ii: In('\ - cT x). (7.28)


xES(>')

Then:
(7.29)

Proof. Let x = x('\), and x OP ' be an optimal solution in 8('\) with cTxop, = ,\op'.
Define h := x OP ' - x. We consider the function f of (7.6) with the above x and
h. Obviously, "I" is a boundary point of the domain I of f. Note that /'(0) =
-Ii: >.=-)x > O. (This follows since the function j(t) := ljJ(x+th)-1i: In('\-c T (x+th))
has a minimum at t = 0.) We set again 9 = /' and use (7.24):

g(O) = /'(0).
As before, the extremal solution (1'(0)-1 - e- 1t)-1 of this differential inequality is
a lower bound for g. Since t = 1 must in in the domain of the extremal solution (or
:s
at its boundary) it follws that /'(0)-1 - B- 1 ~ 0, or /'(0) B. But this is just the
claim to be shown. 0

We now direct our effort to proving for which types of constraints the logarithmic
barrier function (7.3) is "optimal".

Optimal Barriers
The previous section illustrated the importance of self-concordance of a barrier with
a small parameter B. Here, we are interested in finding the "best" (with minimal B)
barrier of a convex set 8.
Solving Convex Programs 277

In particular, we would like to find barrier functions that have a self-concordance


parameter 0 lower than the straightforward barrier functions defined in the previous
sections. A very interesting theoretical result is the following due to [29].

For any closed convex set S C JRn with nonempty interior there exists a universal
o= O( n )-self-concordant barrier function.
If S does not contain a straight line, the
universal barrier function is given by

canst In I{hl hT(y - x) ~ 1 V y E S}I,

where I . I is the Lebesque measure in JR n , and canst is some positive number


independent of n. In most cases, this barrier function is not practical, and its
evaluation is much more costly than the solution of the original optimization problem
(7.2). In some cases however, we may evaluate the universal barrier explicitly; for
=
the positive orthant of JR2 for example, S JR~, it is straightforward to verify that
the (above) universal barrier is given by

and since the area of the triangle formed by the negative orthant and the line through
(-I/Xl'O) and (0, -1/x2), is 1/(2xIX2), this simplifies to -In 2 -In(xI) -In(x2),
i.e. we obtain the standard logarithmic barrier.

One might suspect that -In(xI) - In(x2) is "optimal" (with respect to 0) for this
set, and indeed if there was a better barrier function q;*, say with self-concordance
parameter 0 < 2, then for x E JR~n one could construct the barrier function

with self-concordance parameter nO. For large n it follows that nO + 2v'n8 < 2n -1,
and this contradicts the ellipsoidal approximation of inner and outer ellipsoid by a
ratio of 1 ; nO + 2v'n8 (since by preapplying the affine mapping A ; JR 2n-l -> JR2n
with
y -> (1 _,,~n-l
L.....=l
.),
y.
the function q;**(A( . )) becomes a nO-self-concordant barrier for the 2n - I-dimen-
sional simplex {Yi ~ 0, l:;:~l Yi ~ I}, and inner and outer ellipsoid cannot approx-
imate the simplex in JR 2n - 1 with a ratio that is better than 1 ; 2n - 1).

With a similar consideration one can show the following Lemma.


278 CHAPTER 7

Lemma 1.2.12 Any self-concordant barrier function for a convex set S has self-
concordance parameter at least k if there exists an affine subspace U such that S n U
contains a vertex at which precisely k linearly independent smooth constraints are
active.

As a corollary we obtain that "-In det X" is an optimal barrier function for the
positive definite cone as one might choose the linear subspace U that fixes all off-
diagonal elements of a matrix X to zero; of course, the diagonal elements Xi;, of a
positive definite diagonal matrix X must satisfy Xii> 0 so that there are precisely
n linearly independent constraints active in U at X =o.
A direct derivation of this lower bound for barrier functions is given in [29] for
polyhedra, and that result is easily generalizable to the case of nonlinear constraints
as well.

We used the quality of the ellipsoidal approximations to prove the lower bound on
(), and since this bound is sharp (as there exist barrier functions that attain this
bound), our derivation also implies that asymptotically, for large (), the ellipsoidal
approximations of the sets S are optimal. Nevertheless, for special classes of barrier
functions (like - L: In Xi or -In det X), the ratio of the ellipsoidal approximations
can be improved to 1 : () - 1, see e.g [33, 5].

Above we have seen that the logarithm of a single linear or convex quadratic con-
straint is optimal. Further, it is straightforward to verify that the logarithm of a
single linear constraint is the unique-up to an additive constant-optimal barrier
function. In contrast, a convex quadratic constraint q(x) :s 0 has more than one
=
optimal barrier function; for example if x argminq(x) exists (and q(x) < 0), then
4;(~)q(x) - In(-q(x» is also a () = I-self-concordant barrier function for the set
{xl q(x) :s a}. (The proof is straightforward.)

Finally we point out that the "optimal" self-concordance parameter () is not a smooth
function of the constraints. The set {xl IIxII2 :s 1, Xl :s1 + f} has a () = I-self-
concordant barrier function for f 2: 0, and for -2 < f < 0 this set has a vertex at
which precisely two linearly independent constraints are active, so that () 2: 2 must
hold.

Another aspect concerning "optimality" of a barrier function and that was disre-
garded in this section is the cost of evaluating tP and its derivatives which is an
important issue when looking for an implementable barrier function. Certainly, the
barriers for linear, convex quadratic, or semi-definite constraints given in Section
Solving Convex Programs 279

7.2.2 are "good" in the sense that their derivatives may be computed at a reasonable
cost, and they are optimal with respect to o. This optimality may be lost when
forming intersections of the constraints by adding the barrier functions, but as long
as the number of constraints is moderate these barrier functions seem appropriate.
For problems with very many constraints, the volumetric barrier function, see e.g.
[2, 3], may be better suited. For further examples of "good" and implementable
barrier functions besides the ones listed in this chapter we refer to [29] pp. 147-202,
where barrier functions for the epigraph of the matrix norm or the second-order cone
are listed for example.

Further Examples of Self-Concordant Functions


Here, we quote some further examples from [29] of convex domains that have "easily
computable" self-concordant barrier functions.

Let a ?:: 1 be fixed and ( be some function with 1("I(t)1 :<::; -3a("(t)jt for all t > O.
(For example a = 1 and ((t) := t P for some 0 < p < 1.) Then the function

(7.30)

is a 8 = 2a 2 -self-concordant barrier function for the set {(x, y)1 x ?:: 0, y:<::; ((tn c
JR,2.

In particular (see also [8]), the two-dimensional sets

1. {( x, y) I y ?:: eX},

2. {(x,y)1 y?:: (x+)P} forsomep?:: 1 (and x+ :=max{x,O}),

3. {(x,y)ly?::x P, x>O}forsomep:<::;-l,or

4. {(x, y)1 y?:: x Inx, x?:: O}

have the 2-self-concordant barrier functions

1. -In(ln(y) - x) -In y,
2. -In(yl/p - x) -In y,

3. -In( x - yl/P) - In y, respectively


280 CHAPTER 7

4. -In(y-xln(x))-Inx.

Another example is the set {(X,t) IX E JRpx q , t > 0, IIXI12::; t} with the q + 1-
self-concordant barrier function

- In det (t 2 I - XT X) + (q - 1) In t. (7.31)

Here, I is the q x q identity matrix, and II . 112 is the lub 2 norm, IIXII2 = SUPh;to 11~~~2 .
(Note that by IIXI12 = IIX I12 we may assume q < p.)
T

Compatibility
We conclude this section with a criterion that allows us to treat some nonlinear
convex objective function c : S --+ JR as a constraint by introducing an n + 1-
st variable Xn +l, and the additional constraint c(x) - Xn +l ::; O. The criterion
guarantees that the resulting barrier function -In(x n +l - c(x)) + t,f>(x) is a self-
concordant barrier function for the set

S+ := {(x, xn+l)1 XES, c(x)::; xn+d.


This result (taken from [8]) is a special case of a more general theory developed in
[30].

Proposition 7.2.13 Let t,f> be some 8-self-concordant barrier function for the set
S C JR n , and let c : S -+ JR be some smooth convex function. For x E So and
hE JRn let
l1(t) := t,f>(x + th) and h(x):= c(x + th).
The function c is called f3-compatible with ¢ for some f3 2': 1 iff for all x and h as
above,
f~/f(O) ::; 3f3f~'(0)j ff'(O).
If c is f3-compatible with ¢ then the function
-f3 2 In(x n +l - c(x)) + (32t,f>(x)
is a self-concordant barrier function for the set S+ with parameter (32(1 + 8).

The proof of this proposition follows from Theorem 3.1 in [30] by setting r := S+,
G:= JR+(= {t E JRI t 2': O}), F(t):= -Int, II(X,Xn+l) := ¢(x), and A(x,xn+d:=
X n +l - c(x).
Solving Convex Programs 281

7.3 A BASIC ALGORITHM


In this section we show that the conditions identified in the previous section are
indeed sufficient to guarantee a polynomial rate of convergence of the method of
centers when certain parameters are suitably chosen. Here, a polynomial rate means
that a certain measure for the closeness to an optimal solution-for example if x is
feasible, this measure may be the gap in the objective function value cT x - cT xopt-is
t
reduced by a factor in a number of iterations that is bounded by a polynomial in
the dimension of the problem and the self-concordance parameter O. For the method
of centers it turns out that given a suitable starting point, a certain upper bound
t
for cT x - cT x opt is reduced by a factor in 12V8 iterations.

7.3.1 A Model Algorithm with Polynomial Rate


of Convergence
We present a polynomial version of the method of centers for solving problem (7.2).
As mentioned above, polynomial refers to the rate of convergence; the exact solution
of problem (7.2), of course, is not computable in general.

Let cp be a O-self-concordant barrier function for S-not necessarily ofthe form (7.3).
Under the assumptions of Section 7.1.1 let a point x(o) E So and some number
°
A > Aopt be given such that

Here, ~x(o) is the Newton step -D2<p(x(O) , AO)-l D<p(x(O), AO)T, He,>. = D;<p(x, A),
and <p is given as in (7.5) with", = O.

Algorithm 1

1. k := 0; u:= 1/(SV8); E: := desired accuracy.


2. x(k+ 1) := x(k) + ~x(k) where ~x(k) := _D2<p(x(k), Ak)-l D<p(x(k), Ak)T.

3. If Ak - cT x(k+l) 5 ~~E: stop, else

4. Ak+l := Ak _ u(Ak _ cT x(k+l».

5. k := k + 1; go to 2.
282 CHAPTER 7

Convergence Analysis

We prove feasibility and convergence of the objective function value cT x Ck ) to the


optimal value. By induction, we assume that x Ck ) is strictly feasible, x Ck ) E So, and
satisfies

(This means that the iterates remain "close" to the centers X(A k).)
We analyze the algorithm step by step.

Step 2: By Lemma 7.2.4, the result x Ck+ 1 ) of step 2. satisfies

Step 3: By (7.21) and (7.12), and the above result we may conclude that

IIx(k+l) - X(Ak)IIHz(>k).>k ::; 0/(1- 6) ::; 1/14, (7.32)

where 6 = 1 - (1 - 3 . (20/81)2)1/3 ~ 0.0651141. This implies that x(k+ 1 ) lies in


1/14 of the inner ellipsoid around x(,Ak). For the center X(Ak) we can apply Lemma
7.2.11 and obtain
Ak _ cT X(Ak) 2: cT x(,Ak) _ AoPt •
We would like to use this result with cT x(k+ 1 ) in place of cT X(Ak). For cT x(k+ 1 ) I-
cT X(Ak) we consider the line through X(Ak) and xk+ 1 and mark the intersection of
this line with the inner ellipsoid for S(A), and the intersections with cT x =Ak and
cT x= Aopt. (The latter intersections are outside the inner ellipsoid.) The previous
estimates then result in

(7.33)

Therefore the stopping criterion in step 3 guarantees cT x(K) - Aopt < e if K is the
index k at which the method terminates.

Step 4: Relation (7.33) implies that the gap Ak - Aopt between the upper bound AI:
for cT x(.!;) and the (unknown) optimal value Aopt is reduced by a factor of at least
~~u in step 4.
To complete our induction we verify that after the update of Ak+l in step 4, the
iterate x(k+!) again satisfies

(7.34)
Solving Convex Programs 283

From the definition of the Hessian


HX,A = D2¢(X) + K:(A~~;x)2 follows that Hx(k+l),Ak+1 ~ H x (k+l),Ak, and therefore
the inverses satisfy H-(!+I) ',+1 S H-(!+I) ' k ' (Here, two symmetric matrices A, B
X,A X ,1'\

satisfy A ~ B iff A - B is positive semidefinite.) Hence,

by the result of step 2. Using this we may continue with the triangle inequality,

S (20/81)2 + IIMC/(A k +1 - cT x(k+1»)llwl S (20/81)2 + 1/8 < 20/101.


x(k+l) .,k+l

The third inequality is a straightforward consequence (maybe 5 lines proof) of the


Sherman Morrison update formula for inverse matrices 3 applied to H = D2¢ +
K:(A~~;X)2' and the last inequality implies (7.34). 0

For the above choice of K: = (J and u = 1/(8VO) these relations imply that at each
iteration the distance Ak - Aopt is reduced by a factor 1- 28 1 :v'8'
and from this it is
straightforward to derive that the number f{ of iterations until the algorithm stops
is bounded by
AO _ AoPt
f{ S 18VOln( ).
f

Each iteration involves the computation of the functions Ti, their first and second
derivatives as well as the solution of a linear system in JR n . Since 122~38 > In2
we may also conclude that Ak - AOP ' is reduced by a factor ~ after at most 12VO
iterations, validating the claim at the beginning of this chapter.

The above algorithm assumes that an initial point close to the center of some sub
level set S(A) is given. This assumption is typically not satisfied in practice, and a
phase-1 algorithm is necessary to find such a point. Moreover, in many real-world
linear programs, for example, the interior of the feasible set is empty. Next we sketch
an infeasible method that can solve problem (7.2)-or even a slightly more general
problem-in a single phase even if the feasible set has empty interior.
284 CHAPTER 7

7.3.2 Towards a Practical Algorithm


In addition to the need of a phase 1 algorithm, the polynomial method of the previous
section is prohibitively slow when being implemented as stated in Algorithm 1. While
the most successful implementations of interior-point methods to date are primal-
dual algorithms, we briefly outline a primal method that can be implemented with
the means discussed in this chapter and that may nevertheless be fast enough for
a practical implementation. We choose a modification of the logarithmic barrier
method in [9]. (A modification of the method of centers would be equally suitable.)

Our presentation differs from infeasible interior-point methods for linear or convex
quadratic programs since for programs of the form (7.2) or (7.35) (below) the situ-
ation is more difficult than in the case of linear programs. A restriction Ti of (7.2),
for example, may be convex in the interior of S and undefined or non convex outside
S (while linear functions of course are also linear outside S).

An approach commonly used for linear programs is to relax a constraint of the


form T;(X) = aTx - bi :::; 0 to Ti(X) :::; l'f3i for some fixed f3i ~ 0, such that a
given initial point x(O) is strictly feasible for the relaxed constraint with I' = 1,
i.e. ''-In(l'f3i + Ti(X(O))" is well defined. During the course of the algorithm the
duality gap and the infeasibility I' simultaneously tend to zero. It is clear that
this approach needs to be modified, for example, if Ti has a singularity somewhere in
{xl Tj(X) ~ O}, or if Tj(X) is the function "-det X". In the latter case, -In( -Ti(X»
is self-concordant, but the function ''-In(l'f3 - Ti(X))" is not for 1'f3 > O.

AMore General Problem and Assumptions


Since in many applications there are linear equality constraints, we outline the
method for the slightly more general problem

mm cTx
Tj(X):::;O for l:::;i:::;m
Ax= b (7.35)

where A E /Rkxn with k < n has maximum rank. For each i we assume that a point
Xi is known such that T.(X.) < o. We assume that

tP.(x) := -In(-Tj(x))

is continuous and convex wherever it is defined. We further assume that Tj(X) is


bounded below by some polynomial p in Ilxll, T.(X) ~ p(lIxlD for x in the domain
dom(¢l.) of ¢l•.
Solving Convex Programs 285

The last assumption is needed in our analysis to guarantee the existence of the points
x(Jl) below. It is always satisfied, for example, if cP;(x) is B self-limiting for some
B ~ 1. If the convexity assumption is violated, or if nonlinear equality constraints
are present, some modification like a trust region interior point method may still be
applicable. The knowledge of the points Xi, however, is a basic assumption which is
necessary if we don't assume anything about the constraints Ti for T;(X) <f.. O. Often,
the points Xi can be chosen all the same, Xi = x(O), in which case the scheme below
simplifies substantially.

Constructing a Path X(p,)


In general, we assume that some arbitrary initial point x(O) is given. Set b Ax(O)-b. =
If it is not possible to find common points Xi =
x(O) for all i, we may use the freedom
of x(O) and choose x(O) as the least squares solution of Ax = b so that b = O.

We consider the feasible sets


d; := Xi - x(O).

By construction, x(O) E S(I)O (the relative interior of S(I)). For each fixed Jl E [0,1]
the logarithmic barrier function 4
m m

i=1 i=1

is convex for X E S(Jl)O, and in particular, S(Jl) is closed and convex. Moreover,

S+ := {(x,Jl) E lR n + 1 I Jl E [0,1]' X E S(Jl)} (7.36)


is closed and convex.

• The proof is straightforward; for completeness, we show convexity: The linear


equation (A,-b)(x,Jl) =
b is clearly convex. If (Xj,Jlj) E S+ for j 1,2, then =
Xj + Jljd; E domcPi for all i, and hence, by convexity of domcPi,

is in domcP; so that

o
4 The subscript of r/> is ambiguous, but whenever there is a chance of confusion of the real subscript
J1. E [0,1] and the integer subscript i we will specify the subscript explicitly.
286 CHAPTER 7

Clearly, S(O) is the feasible set of(7.35). Since S+ is convex and S(I)O i- 0 it follows
that for p, E (0,1) the domain of ¢I' is not empty if S(O) is not empty. One might
therefore try to follow the points

From p, = 1 to P, =O. We note that in the absence of the shifts (i.e. when Xi =
x(O)
for all i) and in the absence of the linear constraints, the points x(p,) coincide with
the points x(.A) of the method of centers. The relation of J-I and the corresponding
parameter .A was examined in [34), for example. We further note that we do not
assume that the feasible set S(O) of (7.35) has nonempty interior, for our purposes
it is sufficient that S(J-I)O is nonempty for p, E (0,1).

Proposition: Under the assumption that the set Sop' of optimal solutions of (7.35)
is nonempty and bounded, the points x(J-I) exist for p, > O.

Proof: Let x op ' be some solution of (7.35). Then,

For x E S(J-I)O let


cTx
'I'1'(x) := - + ¢I'(x).
J-I
Since '1'1' is a convex barrier function, it suffices to show that limllxll_oo, Ax=b+b
'1'1' (x) = 00 to guarantee the existence of its minimizer x(J-I). More precisely we show
by contradiction that if there exists some direction d such that x(J-I) + O"d E S(p,) for
all 0" ~ 0 then
lim 'I'1'(x(J-I) + O"d) = 00.
a_oo

Assume that 'I'1"(x(J-I) + O"d) is bounded (from above) for all u > O. We distinguish
two cases: Either cT d SO or cT d > O.

The latter case immediately leads to a contradiction. By our assumption on ployno-


mial boundedness of -ri, the function 4>Ax(p,) + O"d) falls at most logarithmically
with 0". On the other hand, cT(x(J-I) + ud) grows linearly with 0" leading to a con-
tradiction.

The case cT d S 0 leads to a contradiction to the boundedness of Sop,: Clearly, X(O) E


SoP', and in particular, (X(O),O) E S+. If it happened that for some (j > 0 the point
(X(O) + (jd, 0) f/. S+, then-since S+ is closed-also (X(p,f) + (jd, J-If) f/. S+ for some
small f > O. On the other hand, x(J-I) + ~(jd E S(J-I) and therefore (x(J-I) + ~(jd, J-I) E
Solving Convex Programs 287

S+. Since the three points (X(O), 0), (x(J.Lf)+i7d, J.lc), and (x(J.l)+~i7d, J.l) are colinear,
this leads to a contradiction to the convexity of S+. Hence, (X(O) + i7d, 0) E S+.
But this implies that X(O) + ud is feasible for (7.35) for all u > O. Since X(O) = xo pt
is optimal for (7.35) and cT d::; 0, the points X(O) + ud are optimal as well (and in
fact cT d = 0), but this contradicts boundedness of Sopt. 0

A Single-Phase Primal Predictor-Corrector Algorithm


For numerically tracing the points x(J.l) we further assume that 1>1'=0 is smooth and
strictly convex on S(J.l) (i.e. the Hessian is positive definite on the null space of A.)
This implies that x(J.l) is unique for each J.l, and forms a smooth curve in J.l. Below,
D as in D'PI' (x) always refers to differentiation with respect to x; differentiation
with respect to J.l will be denoted by a " as in x'(J.l). For following the curve x(J.l),
a predictor-corrector scheme appears to be most promising. The corrector step can
be motivated as follows:

For a given J.l > 0 and given x E S(J.l) first approximate the minimum of 'PI' by a
sequence of Newton steps with line search;

x+ = x + Umin~X, Umin = argmin{u > 0 I 'P1'(x + u~x)}, (7.37)

where ~x is given by the solution of

(7.38)

(For simplicity we restrict this presentation to plain Newton's method with line
search.) If the function 1>1'=0 is self-concordant, then so are the functions 'PI' for
J.l E [0,1] and a possible stopping test for Newton's method might be whenever the
H-norm of the Newton step is less than 1/2, for example. (In this case we know that
the minimum of 'PI' exists.)

If on the other hand some unbounded direction is found during the line search for
Newton's method, the minimum of 'PI' does not exist, and by the above proposition,
either the set of optimal solutions is unbounded, or there is no (finite) optimal
solution. Likewise, if the domain of 'PI' "collapses" before J.l reaches 0, it follows that
the domain of (7.35) is empty.

Next, we explain the predictor step. Given a sufficiently close approximation x(k)
to x(ji) for some ji E (0,1]' the predictor step follows the tangent x'(J.l) in direction
J.l = 0 while maintaining feasibility with respect to 1>1'. It turns out that even though
(most likely) the current iterate x(k) does not lie on the curve x(J.l) , there is some
288 CHAPTER 7

other curve through x Ck ) leading to the set of optimal solutions. Let 9 = Dt.pp.(xCk)f
be the gradient of t.p p. at xC k). The points

also form a smooth curve leading from x Ck ) to some point in SoP', and whose tangent
can be computed analytically. Differentiating AX(Jl) = b + Jlb with respect to Jl
yields Ax'(Il) = b, and differentiating Dt.pp.(x(ll)f == 9 - Y(Jlf A with respect to II
yields a second linear equation for x'(Jl), namely (the first block row of)

(7.39)

The predictor step follows the linear ray x(Jl) := x Ck ) + (ft -Il)x'([t). The next value
of Jl is chosen large enough (Jl < [t) such that x(Jl) E S(Il)o. More precisely, let

ft := inf {Jl I Ti(X(Jl) + Jld i ) < 0 for all i}.


By convexity of S+ (7.36) we conclude that for any II E (ft, ftl the point x(Jl) is in
the domain of t.pp.o By some line search we therefore determine it and choose the
next Jl = 11+ as
Jl+ = 0.3ft + 0.7ft, (7.40)
for example, and set x+ = x(Jl+). Summarizing we obtain the following algorithm.

Algorithm 2

1. k := 0; c := desired accuracy; xCD) and Xi as given above. Define b = AxCD) - b.


2. repeat step (7.37), (7.38) starting with x = xC,.) until the H-norm ofthe Newton
step is less than ~. Let the result be xP:).
3. If Jl ~ c/m or k > iteration limit, stop, else
4. do step (7.39), (7.40) with x = xC,.) and set x Ck+ 1 ) := x+. Set k := k + 1, go to
2.

Comments: Note that the above algorithm can be applied to a linear program
in both, its primal or its dual form resulting in an infeasible primal or dual algo-
rithm. We stress that primal-dual predictor-corrector methods have been extremely
successful in the recent past, see e.g. [23, 24, 20]. While purely primal (or dual)
methods as outlined above received less attention, they are easy to implement, and
Solving Convex Programs 289

we believe that they may also be efficient if implemented with the same care as
the implementations for primal-dual methods. Apart from an efficient solver for
the systems of the form (7.38), (7.39)-exploiting sparsity structure, symmetry and
quasi-definiteness, crucial features of an implementation typically include the choice
of the starting points Xi and x(O)-such that Iri(xi)I/IIDri(xi)11 ""Ildill + Ilbll, the
step length 1'+ in (7.40)-shorter in the initial stage of the algorithm and longer
towards the final iterations, as well as suitable modifications of Newton's method.

7.3.3 Analysis of a Short Step Version of


Algorithm 2
A careful complexity analysis of Algorithm 2 is beyond the scope of this chapter.
Even for linear programs the analysis of infeasible interior point methods is a difficult
topic; the rate of convergence of most infeasible interior-point methods is worse
(namely O(m)) than for feasible methods (which is O(ym)), see e.g. [10,22,25,26,
31, 32, 35, 37]. For the simple case, however, that

1. b = 0,
2. Xi = x(O) for all i,
3. the functions <Pi are OJ-self-concordant barrier functions,
4. the relative interior So of S == S(J-1) for p. E [0,1] is non empty, and Sopt is
bounded,
5. a starting point x(O) close to the point x(l) is available,

a complexity analysis of a short step version of Algorithm 2 is given below.

=
Define <P :L <pj. Then, in the space aff(S) = =
{x+d I xES, Ad O} the function <p
=
is a O-self-concordant barrier function for S with 0 :L OJ. In particular, all results
of Section 7.2 are valid for <p in aff(S). Under the above assumptions, 'PI' has the
simpler form
cTx
'PI' (x) =- + <p(x).
I'
Observe that the remark following Lemma 7.2.2 and boundedness of Sopt imply that
the Hessian of <p resp. of 'PI' is positive definite on the null space N(A) of A. By H:r;
we denote again the Hessian H:r; = =
D2<p(X) D2'Pp(x), and for a feasible direction
290 CHAPTER 7

d (i.e. such that Ad = 0) the H-norm of d is given by IIdllkx = rfI' Hxd > 0 for d oJ O.
We may also verify that the H-norm of the Newton step ~x in (7.38) is given by

In slight abuse of notation we further denote

(If H is positive definite (on JRn), the matrix Ht = H;1/2 llN (AH;1/2)H;1/2 where
llN(AH-l/2) is the orthogonal projection onto the null space of AH- 1/ 2 .)

Algorithm 3

1. Let J.t 0 = 1. Set k := 0; u := 1/(9V8); c: = desired accuracy, and assume that

Here, ~x(o) is the Newton step (7.38) with x = x(o) and p. = p.o.
2. X(k+1) := x(k)+~x(k) where ~x(k) is the Newton step (7.38) with x = x(k) and
p. = p.k.
3. If 8p.k :::; i~c: stop, else
4. p.k+1 := J.tk(l- u).

5. k:= k + 1; go to 2.

The convergence analysis of Algorithm 3 may be carried out by induction like in


Section 7.3.1. We show that II~x(k)IIHx(.) :::; 120°1 for all k.

After completion of step 2., it follows from Lemma 7.2.4 for the function <PI-' of the
form (7.17) that
(7.41)

By property (7.25) we may thus conclude that


Solving Convex Programs 291

(The function ¢ is O-selfconcordant and its gradient at x(k+ 1 ) is D¢(x(k+ 1 »


DCPl'k(x Ck+ 1 » - clJ-lk.) Hence, after the update of Ilk+l in Step 4., it follows

lI_c_ + D¢(xCk+l»TII + < ~ + VO + 1/16 < ~


1lk+ 1 Hx(k+l) - 16 9VO - 1 - 101'
since 0 ~ 1. Finally, to verify that the stopping test in Step 3. is accurate observe that
the centers x( A) minimizing cP in (7.5) and the points x(ll) minimizing ¢( x) + cT XI J-l
coincide for
=
II OI(A - cT X(A)).
(Here, in slight abuse of notation we use the name (A or II) of the parameter to
distinguish to which minimization problem the curve x( . ) refers to (i.e. to cp(x, A)
of the method of centers or to CPI'(x) = ¢(x)+cT xiII).) Again, a relation of the form
(7.32) follows from (7.41) and implies that x Ck+ 1 ) lies within 114 of the inner ellipsoid
around x(ll k ) = X(Ak) with Ilk = (Ak - cT x()..k»/O. Thus, as in Section 7.3.1, the
desired bound
Ollk ~ ~~ (cT x Ck +1 ) _ AO P ' )

follows. 0

7.4 SOME APPLICATIONS


As we have seen, self-concordant barrier functions are attractive (for the use in a
numerical implementation) if their first and second derivatives are computable at a
reasonable cost, and if the self-concordance parameter is not much bigger than the
dimension of the problem. We conclude this chapter with some simple examples of
convex problems that have "attractive" self-concordant barrier functions. For each
example, of course, other self-concordant constraints can be added. More examples
can be found in [29, 36, 8].

• Quadratically constrained convex programs.


The most simple class of nonlinearly constrained convex problems are those with
quadratic constraints. In spite of their simple form, these problems arise in a
variety of applications. In robotics, for example, finding the best way to move a
robot often leads to a problem with linear and quadratic constraints. Another
interesting example arises in mechanical engineering. The discretization of the
problem of finding the stiffest truss (pin-joined framework) with respect to a
given load leads to a problem of the form

min {cT x I x T AiX ::; bi for 1::; i ::; m} ,


292 CHAPTER 7

see e.g. [4]. In these examples, typically, the number m is very large, and
n ~ Vm is smaller. The matrices Ai often have a special form (low rank, very
sparse) so that problems with up to m ~ 300000 constraints can be solved
efficiently, see e.g. [20]. These problems are also an example for the "gap"
between theory and practice. The self-concordance parameter B m ~ n is =
quite large in these examples, and the worst-case analysis of this chapter would
suggest a slow rate of convergence for the large examples, but nevertheless, the
implementation in [20] converges reasonably fast (in about 60 iterations to 8
digits accuracy), even for the large problems.
• Lp-norm approximation problem.
Another application-introduced in [29]-is the problem
k

min L laT x - bjl P ,


j=l

subject to some convex constraints on x. Here, the vectors aj E IR n , as well as


bj E IR, and p 2: 1 are given data. This problem can be reformulated as

-U·
J - < aTJ x- b·J -< U·J for 1 _< J. <_ k}
with B = 4k-selfconcordant barrier function
k
- Lln(T}/P - Uj) -lnTj -In(uj - aT x + bj ) -In(uj + aT x - bj ),
j=l

which is composed of 2k linear constraints and k constraints of the form y 2: zP,


see Example 2 in Section 7.2.2. Note that the artificial variables Uj can be
eliminated from this presentation.
• Further examples from [8] include the dual geometric programming problem, the
extended entropy programming problem, the primal ip-programming problem
and the dual ip-programming problem.
• The most popular class of convex programs with self-concordant barrier func-
tions are semidefinite programs. A number of relaxations of combinatorial
problems-see e.g. [1, 12], problems from geometry-see e.g. [36] or control
theory-see e.g. [6]-can be rewritten as semidefinite programs. Often these
programs can be reduced to the standard form

min {cT x I A(x) := A(O) + t


J=l
xiA(i) 2: o} ,
Solving Convex Programs 293

where A(i) are given symmetric matrices, and the inequality A(x) 2:: 0 means
that A(x) is positive semidefinite. If the matrices A(i) are n x n, a e = n-self-
concordant barrier-function for this constraint is

1/>( x) = - In det A( x)
as we have seen in Section 7.2. (In Section 7.2 we considered the function
-lndetX. By preapplying the affine mapping x ----- A(x), we obtain the self-
concordant barrier function - In det A( x).) Note that the Hessian of I/> can be
computed by

(DI/>(X))i -trace(A(i)A(x)-I), (D 2 1/>(x))i,j


trace(A(i) A(x)-l AU) A(x)-l).

For moderate values of k and n one may thus directly apply an interior-point
method. For larger values of k, the dual form may be more appropriate, see
e.g. [1, 36] where also other means for efficiently solving the Newton systems
are discussed. For a comprehensive survey on semidefinite programs we refer to
the next chapter, or to [36, 12].

Acknowledgements
The author would like to thank the editor, Prof. T. Terlaky, as well as Prof. K.
Anstreicher, Dr. R.W. Freund, Prof. S. Mizuno, Prof. D. Klatte, Prof. T. Jongen
and Prof. J. Stoer for their help and careful proofreading.

REFERENCES
[1] F. Alizadeh, "Interior point methods in semidefinite programming with applica-
8
COMPLEMENTARITY PROBLEMS
Akiko Yoshise
Institute of Socio-Economic Planning
University of Tsukuba
Tsukuba, Ibaraki 305, Japan

ABSTRACT
This chapter deals with interior point methods for solving complementarity problems.
Complementarity problems provide a unified form for nonlinear and/or linear programs
and equilibrium problems. Among others, the monotone linear complementarity problem
has two important applications in mathematical programming: the linear program and
the convex quadratic program. We focus on this problem and state the properties that
serve as the theoretical background of various interior point methods. We present two
prototype algorithms in the class of interior point methods for the monotone linear
complementarity problem, together with their theoretical analysis. We also briefly
review recent developments and further extensions of this subject.

Key Words: Interior point method, complementarity problem, path-following algorithm,
potential-reduction algorithm, artificial problem, computational complexity, P*-matrix,
smoothness condition

8.1 INTRODUCTION
As we have seen in Part I, interior point methodologies have yielded rich theories
and algorithms in the field of linear programming. This has motivated their extension
to more general problems. In this chapter, we deal with the complementarity problem
as an example and describe how the results obtained for linear programming have been
generalized to this problem.


The (standard) complementarity problem, abbreviated by CP, is defined to be of the
form (see [5], etc.):

    CP: Find (x, y) ∈ R^{2n}
        such that y = f(x), (x, y) ≥ 0, and x_i y_i = 0 (i = 1, ..., n),          (8.1)

where f denotes a mapping from the n-dimensional Euclidean space R^n into itself.
The complementarity problem has been conceived as a unifying form for nonlinear
and/or linear programs and equilibrium problems. It is known that any differentiable
convex program can be formulated as a monotone complementarity problem. Also, the
variational inequality problem has a close connection to the CP. See Harker and
Pang [14], where the authors provide an extensive review of theory, algorithms and
applications of these problems. Among others, the linear complementarity problem
(LCP) has two important applications in mathematical programming: the linear program
(LP) and the convex quadratic program (QP). The reader is referred to the monumental
work of Cottle, Pang and Stone [5], a comprehensive book covering the mathematical
theory, algorithms and applications of LCPs developed until 1992.

Each algorithm in the class of interior point methods for the CP has the common
feature that it generates a sequence {(x^k, y^k)} in the positive orthant of R^{2n},
i.e., every (x^k, y^k) satisfies (x^k, y^k) > 0. If each point (x^k, y^k) (k = 0, 1, ...)
of the generated sequence satisfies the equality system y = f(x), then we say that the
algorithm is a feasible-interior-point algorithm, and otherwise an infeasible-interior-point
algorithm. These algorithms originate in the primal-dual interior point algorithms
for the LP (see [45, 30], and so on). Megiddo [45] first showed the existence of a
path of centers for the primal-dual LP which converges to a solution, and extended
the concept to the general LCP. This analysis introduced a new framework of interior
point algorithms that trace the path of centers. Within this framework, primal-dual
interior point algorithms were developed by Kojima, Mizuno and Yoshise [30], Tanabe [75]
and others. Kojima et al. [30] first proved the polynomial computational complexity of
the algorithms, and since then many other algorithms have been developed based on the
primal-dual strategy. Kojima et al. [29] proposed an interior point algorithm for
solving the monotone LCP and established the best known complexity bound of O(√n L)
on the number of iterations for this problem. Independently of this work, Monteiro
and Adler [52] also provided an interior point algorithm for the convex quadratic
program with an O(√n L) iteration bound. Up to the present, the study of interior
point methods for the CP has paralleled that for the LP.

In this chapter, we describe these interior point methods for the CP with the
intention of presenting the theoretical basis of these algorithms as plainly as
possible. For this purpose, the class of feasible-interior-point algorithms for the
monotone LCP will be selected as the subject for the discussion of theoretical
aspects. Concerning other algorithms, we briefly explain the recent developments
with appropriate references.

This chapter is organized as follows. In the next section, we state the monotone LCP
and give some viewpoints in the context of optimization. As the definition (8.1) of
the CP indicates, it is natural to use the classical Newton method for finding a
solution of a system of equations to design interior point algorithms for the CP.
In Section 3, we discuss the Newton method for the monotone LCP and give some
fundamental results which lead us to the definition of the path of centers for the
monotone LCP and its existence. In Section 4, we propose two prototype
feasible-interior-point algorithms: the path-following algorithm and the
potential-reduction algorithm. Under the nonemptiness assumption on the set of
feasible-interior-points, each of the algorithms generates an approximate solution
of the monotone LCP in a finite number of iterations. The iteration number is related
to the initial point and the stopping criteria. In Section 5, we discuss these two
subjects and provide the polynomial complexity bounds for the algorithms described
in Section 4. In Section 6, we briefly describe further developments of interior
point algorithms for LCPs and extensions to more general classes of CPs. In order to
ease readability, most theoretical results are presented without proofs, giving just
some references. However, it might require considerable effort to collect all the
proofs from the extensive literature, thus to assist the reader most proofs are
collected in Section 7.

Here we list some symbols that are often used throughout this chapter:

    R^m      : the m-dimensional Euclidean space.
    R^m_+    : the nonnegative orthant of R^m.
    R^m_{++} : the positive orthant of R^m.
    I        : the identity matrix.
    O        : the zero matrix.
    e        : the vector of ones in R^n.
    (I, O and e are suitably dimensioned in the context.)
    X = diag{x_i : i = 1, 2, ..., n} :
               the diagonal matrix whose diagonal elements are the coordinates x_i of x ∈ R^n.
8.2 MONOTONE LINEAR COMPLEMENTARITY PROBLEMS
In this section, we describe the monotone linear complementarity problem (mono-
tone LCP) and its examples. We present two optimization problems related to the
monotone LCP which are important models for the design of interior-point methods.
Some basic properties of the monotone LCP are also derived under certain condi-
tions. These results support theoretical aspects of interior point algorithms for the
monotone LCP, which we will see in Section 3.

In light of the definition (8.1), we define the linear complementarity problem (LCP)
as a CP with an affine mapping f : R^n → R^n, i.e., f is given by f(x) = Mx + q,
where M is an n × n matrix and q is an n-dimensional vector:

    LCP: Find (x, y) ∈ R^{2n}
         such that y = Mx + q, (x, y) ≥ 0, x_i y_i = 0 (i = 1, ..., n).           (8.2)

Let us define the affine space

    S_af = {(x, y) ∈ R^{2n} : y = Mx + q}.

Then the feasible region S_+, the feasible-interior region S_{++} and the solution
set S_cp of the LCP (8.2) are given by

    S_+    = {(x, y) ∈ R^{2n}_+    : y = Mx + q} = S_af ∩ R^{2n}_+,
    S_{++} = {(x, y) ∈ R^{2n}_{++} : y = Mx + q} = S_af ∩ R^{2n}_{++},
    S_cp   = {(x, y) ∈ R^{2n}_+ : y = Mx + q, x_i y_i = 0, i = 1, 2, ..., n}
           = S_af ∩ {(x, y) ∈ R^{2n}_+ : x_i y_i = 0, i = 1, 2, ..., n}.
A CP is called a monotone complementarity problem if the mapping f associated with
the CP satisfies the following condition:

Condition 8.2.1
    (x − x')^T (f(x) − f(x')) ≥ 0 for every x, x' ∈ R^n.

When we consider the linear case f(x) = Mx + q, the above condition is equivalent
to the condition below:

Condition 8.2.2 The matrix M is positive semi-definite, i.e.,

    d^T M d ≥ 0 for every d ∈ R^n.
The monotone LCP is an LCP which satisfies Condition 8.2.2. The following is a
well-known and important example of a monotone LCP.

Example 8.2.3 [Convex quadratic programming]

    QP: Minimize c^T u + (1/2) u^T Q u  subject to  Au ≥ b, u ≥ 0,

where Q is a symmetric positive semi-definite matrix.

Since the objective function is convex, the following Karush-Kuhn-Tucker conditions
are necessary and sufficient for optimal solutions of QP:

    w = c + Qu − A^T v ≥ 0,   u ≥ 0,   u^T w = 0,
    z = Au − b ≥ 0,           v ≥ 0,   v^T z = 0.                                 (8.3)

The above system can be formulated as a monotone LCP where the matrix M and the
vector q are given by

    M = ( Q   −A^T )        q = (  c )
        ( A     O  ),           ( −b ).

Here the positive semi-definiteness of M follows from the condition on Q. In the
special case Q = O, the problem QP becomes a linear program with inequality
constraints. In this case, the matrix M is skew-symmetric, i.e., d^T M d = 0 for
every d ∈ R^n.
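The assembly of (M, q) from the QP data is mechanical; the small NumPy sketch below
illustrates it. The function name qp_to_lcp and the toy data are our own, not from the
text, and NumPy is only one possible choice.

    import numpy as np

    def qp_to_lcp(Q, A, b, c):
        """Assemble the monotone LCP data (M, q) of Example 8.2.3 from the
        QP data (Q, A, b, c)."""
        n, m = Q.shape[0], A.shape[0]
        M = np.block([[Q, -A.T],
                      [A, np.zeros((m, m))]])
        q = np.concatenate([c, -b])
        return M, q

    # Small example: Q positive semi-definite, hence M positive semi-definite.
    Q = np.array([[2.0, 0.0], [0.0, 0.0]])
    A = np.array([[1.0, 1.0]])
    b = np.array([1.0])
    c = np.array([-1.0, -1.0])
    M, q = qp_to_lcp(Q, A, b, c)
    d = np.random.randn(3)
    assert d @ M @ d >= -1e-12      # d^T M d >= 0 up to rounding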

One may take an interest in the case where the problem is given in standard form:

    QP': Minimize c^T u + (1/2) u^T Q u  subject to  Au = b, u ≥ 0.

The Karush-Kuhn-Tucker conditions of this problem are given by

    w = c + Qu − A^T v ≥ 0,   u ≥ 0,   u^T w = 0,
    0 = Au − b.                                                                   (8.4)

Kojima et al. [30] first proposed a primal-dual interior point algorithm for this
type of system arising from linear programming. Monteiro and Adler [51, 52] refined
the algorithm of [30] and developed a primal-dual interior point algorithm for
solving the system (8.4).
In the algebraic definition, the system (8.4) cannot be formulated directly as an
LCP of the form (8.2). The problem (8.2) is a type of LCP which we call the standard
LCP, and there are many other types of LCP, e.g., the mixed LCP (MLCP), the
horizontal LCP (HLCP), the generalized (or geometrical) LCP (GLCP) and so on:

    MLCP: Find (x, y) ∈ R^{2n} and w ∈ R^m
          such that ( y )     ( x )
                    ( 0 ) = M ( w ) + q,   (x, y) ≥ 0,
                    x_i y_i = 0, i = 1, ..., n,                                    (8.5)
          where q ∈ R^{n+m} and M is an (n + m) × (n + m) matrix;

    HLCP: Find (x, y) ∈ R^{2n}
          such that M_1 x + M_2 y = q,   (x, y) ≥ 0,
                    x_i y_i = 0, i = 1, 2, ..., n,                                 (8.6)
          where M_1 and M_2 are n × n matrices;

    GLCP: Find (x, y) ∈ R^{2n}
          such that (x, y) ∈ Ξ(x̄, ȳ),   (x, y) ≥ 0,
                    x_i y_i = 0, i = 1, 2, ..., n,                                 (8.7)
          where Ξ(x̄, ȳ) is the linear manifold given by
                    Ξ(x̄, ȳ) = (x̄^T, ȳ^T)^T + Ξ_0
          with a linear subspace Ξ_0 and a point (x̄, ȳ) ∈ R^{2n}.

It can easily be seen that the class of GLCPs includes both the class of HLCPs and
the class of MLCPs, and that the standard LCP (8.2) belongs to the class of HLCPs.
We define the monotonicity of each problem as follows:

    MLCP: The matrix M is positive semi-definite;
    HLCP: The equation M_1 x + M_2 y = 0 implies that x^T y ≥ 0;
    GLCP: The dimension of Ξ_0 is n and x^T y ≥ 0 for every (x, y) ∈ Ξ_0.

One can see that the systems (8.3) and (8.4) can be put into a monotone MLCP.
See [5] for many more variations on the LCP.

Recently, some implications of the three types of LCPs above have become clear in
the context of interior point algorithms ([2, 4, 10, 11, 48, 47, 54, 53, 65, 78, 81, 90],
etc.). Güler [11] first showed that the GLCP with a maximal monotone operator can
be reduced to a monotone standard LCP, and Bonnans and Gonzaga [4] simplified its
proof. Mizuno, Jarre and Stoer [48] provided a unified approach of interior point
algorithms for a class of monotone GLCPs. The equivalence between the class of
monotone LCPs and the class of monotone MLCPs has been proved by Wright [81], and
the one between the class of GLCPs and the class of standard LCPs has been shown by
Potra [2] in view of the P_*-property which we will describe in Section 6.

If the matrix A in (8.4) has full row rank then we can find a basis matrix B of A.
Let us partition A as A = [B | N]. Dividing the variables u into the basic variables
u_B and the nonbasic variables u_N, we have

    {u : Au = b, u > 0} = {(u_B, u_N) : u_B = B^{-1} b − B^{-1} N u_N > 0, u_N > 0}.

Thus we can deal with the problem (8.4) as a problem of the type (8.3) with respect
to the variables u_N in this case.

Figure 8.1 illustrates an example of the LCP with n = 2 in x-space. The feasible
region and the feasible-interior region are given by S_+ and S_{++}, respectively.
The boundary lines indicate the sets of points satisfying the equations x_i = 0 or
y_i = (Mx + q)_i = 0 (i = 1 or 2). We often use this figure throughout this chapter.
A feasible-interior-point algorithm generates a sequence in the feasible-interior
region S_{++} = {(x, y) ∈ R^{2n} : y = Mx + q, (x, y) > 0} (the shaded zone in the
figure), which converges to a solution of the LCP.

When we design an interior point method for the LCP, it is important to formulate
the problem as an optimization model. We propose here two types of such models which
are closely related to the prototype algorithms described in Section 4.

The first model is a quadratic programming problem which is based on the fact that
x^T y ≥ 0 whenever x ≥ 0 and y ≥ 0:

    M1: Minimize x^T y  subject to  (x, y) ∈ S_+ = S_af ∩ R^{2n}_+.               (8.8)

The model M1 is equivalent to the LCP in the sense that (x, y) is a solution of the
LCP if and only if it is a minimum solution of M1 with objective value zero. This
formulation is the basis of the so-called path-following algorithm for the LCP,
which will be described in Section 4.1. Under certain conditions, the algorithm
generates a sequence {(x^k, y^k) : k = 0, 1, ...} of feasible-interior-points
(x^k, y^k) ∈ S_{++} such that

    (x^{k+1})^T y^{k+1} ≤ ρ (x^k)^T y^k,                                          (8.9)

where ρ ∈ (0, 1) is a number which does not depend on the iteration k. This relation
implies that an approximate solution (x^K, y^K) ∈ S_{++} such that (x^K)^T y^K ≤ ε
can be obtained after a finite number of iterations for any ε > 0.

Figure 8.1 The feasible-interior region S_{++} of the LCP (n = 2).

The second model depends on the potential function which was first introduced by
Karmarkar [23] for linear programming problems in (non-standard) primal form. By
extending the function to problems in primal-dual form, Todd and Ye [77] defined the
primal-dual potential function, and independently of this work, Tanabe [74] also
provided it in a multiplicative form. Ye [86] first gave the so-called primal-dual
potential-reduction algorithm and established a bound of O(√n L) on the number of
iterations (and O(n^3 L) on the number of arithmetic operations) of the algorithm.
The first O(√n L)-iteration potential-reduction algorithm for LCPs was proposed by
Kojima et al. [31]. Let us define the potential function φ for the LCP:

    φ(x, y) = (n + ν) log x^T y − Σ_{i=1}^n log x_i y_i − n log n,  for every (x, y) > 0,

where ν > 0 is a parameter. The first term (n + ν) log x^T y comes from the objective
function of the quadratic problem M1 (8.8), the second term −Σ_{i=1}^n log x_i y_i
works as a logarithmic barrier function ([7], etc.), and the last term is added for
convenience in the following discussions. We consider the following minimization
problem, which employs the potential function as the objective function:

    M2: Minimize φ(x, y)  subject to  (x, y) ∈ S_{++} = S_af ∩ R^{2n}_{++}.        (8.10)

It is easy to see that the potential function φ can be expressed as follows:

    φ(x, y)      = ν φ_cp(x, y) + φ_cen(x, y),
    φ_cp(x, y)   = log x^T y,                                                     (8.11)
    φ_cen(x, y)  = n log x^T y − Σ_{i=1}^n log x_i y_i − n log n
                 = Σ_{i=1}^n log ( (x^T y / n) / (x_i y_i) )
                 = n log ( (x^T y / n) / (Π_{i=1}^n x_i y_i)^{1/n} ).

Here the factor (x^T y / n) / (Π_{i=1}^n x_i y_i)^{1/n} corresponds to the ratio of
the arithmetic mean and the geometric mean of the n positive numbers
x_1 y_1, x_2 y_2, ..., x_n y_n; hence we can see that

    φ_cen(x, y) ≥ 0  for every (x, y) > 0.                                        (8.12)

This bound implies that

    φ(x, y) ≥ ν φ_cp(x, y) = ν log x^T y  for every (x, y) > 0.                   (8.13)

Thus, if we have a sequence {(x^k, y^k) : k = 0, 1, ...} such that φ(x^k, y^k) → −∞,
then it satisfies (x^k)^T y^k → 0. The potential-reduction algorithm described in
Section 4.2 generates a sequence {(x^k, y^k)} of feasible-interior-points
(x^k, y^k) ∈ S_{++} (k = 0, 1, ...) such that

    φ(x^{k+1}, y^{k+1}) − φ(x^k, y^k) ≤ −δ,                                       (8.14)

where δ > 0 is a number which does not depend on k. Similarly to the relation (8.9),
this implies that an approximate solution (x^K, y^K) ∈ S_{++} such that
(x^K)^T y^K ≤ ε can be obtained after a finite number of iterations for any ε > 0.
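A direct evaluation of the potential and its decomposition (8.11) may clarify the roles
of the two parts. The sketch below is ours (Python/NumPy, names of our choosing); it
merely checks the inequality (8.12) on sample data.

    import numpy as np

    def potential_terms(x, y, nu):
        """Evaluate phi and its parts phi_cp, phi_cen from (8.11)."""
        n = x.size
        xy = x @ y                        # complementarity x^T y
        phi_cp = np.log(xy)
        phi_cen = n * np.log(xy) - np.sum(np.log(x * y)) - n * np.log(n)
        phi = nu * phi_cp + phi_cen       # = (n + nu) log x^T y - sum log x_i y_i - n log n
        return phi, phi_cp, phi_cen

    x = np.array([1.0, 2.0, 0.5])
    y = np.array([0.3, 0.1, 1.0])
    phi, phi_cp, phi_cen = potential_terms(x, y, nu=np.sqrt(3))
    assert phi_cen >= 0.0                 # arithmetic mean >= geometric mean, see (8.12)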

As we have seen above, if the sequence {(x^k, y^k)} satisfies φ(x^k, y^k) → −∞ then
(x^k)^T y^k → 0. However, the converse does not necessarily hold. Let n ≥ 2 and
consider a sequence {(x^k, y^k) : k = 1, 2, ...} which satisfies

    x_1^k y_1^k = 1 / k^{ν+1},   x_i^k y_i^k = 1 / k   (i = 2, 3, ..., n)

for each k. Obviously (x^k)^T y^k = Σ_{i=1}^n x_i^k y_i^k tends to zero as k tends to
infinity, but the sequence {φ(x^k, y^k)} is bounded, as we will see below. The
following two inequalities follow from the fact that the function log t is
monotonically increasing on the interval (0, ∞):

    (n + ν) log ( 1/k^{ν+1} + (n−1)/k ) − log (1/k^{ν+1}) − (n − 1) log (1/k)
        ≥ (n + ν) log ( (n−1)/k ) + (ν + 1) log k + (n − 1) log k
        = (n + ν) log (n − 1) + { −(n + ν) + (ν + 1) + (n − 1) } log k
        = (n + ν) log (n − 1),

    (n + ν) log ( 1/k^{ν+1} + (n−1)/k ) − log (1/k^{ν+1}) − (n − 1) log (1/k)
        ≤ (n + ν) log ( n/k ) + (ν + 1) log k + (n − 1) log k
        = (n + ν) log n + { −(n + ν) + (ν + 1) + (n − 1) } log k
        = (n + ν) log n.

Hence φ(x^k, y^k) lies between (n + ν) log(n − 1) − n log n and (n + ν) log n − n log n
for every k. This example also illustrates the fact that x^T y converges to zero as
φ_cen diverges to +∞ whenever φ(x, y) is bounded from above. See Chapter 4 for more
detailed descriptions of various potential functions.

For simplicity, we assume that the following condition holds throughout the
succeeding two sections, Section 3 and Section 4:

Condition 8.2.4 A feasible-interior-point (x̄, ȳ) ∈ S_{++} of the LCP (8.2) is known.

This condition ensures not only the availability of an initial point for the interior
point algorithms, but also richer properties of the monotone LCP. First, let us
observe the following well-known results, which can be obtained under a more relaxed
condition, i.e., S_+ ≠ ∅ (see Section 3.1 of [5]).

Lemma 8.2.5 Suppose that the LCP (8.2) satisfies Condition 8.2.2. If the feasible
region S_+ of the LCP is not empty, then

(i) the solution set S_cp is also nonempty,

(ii) there exist two index sets I_x and I_y such that
        I_x = {i : x_i = 0 for every (x, y) ∈ S_cp},
        I_y = {i : y_i = 0 for every (x, y) ∈ S_cp},
        I_x ∪ I_y = {1, 2, ..., n},
        x_i > 0 (i ∉ I_x) and y_i > 0 (i ∉ I_y) for some (x, y) ∈ S_cp,

and

(iii) the solution set S_cp is equal to the convex polyhedron

        P = {(x, y) ∈ S_+ : x_i = 0, i ∈ I_x,  y_i = 0, i ∈ I_y},

      where the index sets I_x and I_y are given in (ii).

Obviously, S_+ ≠ ∅ under Condition 8.2.4, hence the above lemma also holds under this
condition. Moreover, this stronger condition leads us to the next lemma (see, for
example, the proof of Theorem A.3 of [27]).

Lemma 8.2.6 Suppose that the LCP (8.2) satisfies Condition 8.2.2 and Condition 8.2.4.
Then the assertions (i), (ii) and (iii) of Lemma 8.2.5 hold. Furthermore, the set

    S_+(τ) = {(x, y) ∈ S_+ : x^T y = Σ_{i=1}^n x_i y_i ≤ τ}                       (8.15)

is closed and bounded for every τ ≥ 0. Here the set S_+(τ) can be regarded as a level
set associated with the objective function x^T y of the model M1 (8.8).

In particular, the solution set S_cp is a closed bounded convex set.

Some of the results above can be generalized to nonlinear cases. Let us consider a
monotone CP, i.e., a CP which satisfies Condition 8.2.1. Then the monotone CP has a
solution if it has a feasible-interior-point (x, y) ∈ S_{++} (see [58]). Moreover,
we can show that the level set S_+(τ) in this case is also closed and bounded, in a
way similar to the proof of Lemma 8.2.6. However, we cannot extend the assertion (i)
of Lemma 8.2.5 to the nonlinear monotone case. Megiddo [43] showed an example of a
nonlinear monotone CP where S_+ ≠ ∅ and S_{++} = ∅ but S_cp = ∅.

In the linear case, several results have been reported concerning the existence of a
solution of the LCP. For example, if the matrix M is row sufficient and the feasible
set S_+ is nonempty, then the solution set S_cp is also nonempty. On the other hand,
the solution set S_cp is convex for each q if and only if the matrix M is column
sufficient. Thus if M is sufficient (i.e., row and column sufficient) then the LCP
has a nonempty convex solution set S_cp for every q. See [5] for more details.

As we will see in the next section, the boundedness of the level set S_+(τ) (8.15)
plays a crucial role in showing the existence of the path of centers. It is known
that the set S_+(τ) is bounded for every q ∈ R^n if and only if M is an R_0-matrix,
i.e., if Mx ≥ 0, x ≥ 0 and x^T M x = 0 then x = 0. Note that the 2 × 2 positive
semi-definite matrix

is not R_0 (choose x = (0, 1)^T). Hence positive semi-definiteness does not
necessarily ensure that the set S_+(τ) is bounded. This implies another importance
of Condition 8.2.4.
The boundedness of the set S_+(τ) under Condition 8.2.4 can be extended to the class
of so-called P_*-matrices [26]. See Section 6 for the definition of a P_*-matrix.
Recently, the equivalence of the class of P_*-matrices and the class of sufficient
matrices was shown by Väliaho [79].

8.3 NEWTON'S METHOD AND THE PATH OF CENTERS

In this section, we discuss Newton's method for solving the monotone LCP described
in the previous section. We also refer to the path of centers for the monotone LCP,
whose existence is essential for showing the convergence properties of interior
point algorithms.

In what follows, we assume that Condition 8.2.2 and Condition 8.2.4 hold. Let
(x̄, ȳ) ∈ S_{++} be the current point. We intend to find the next point in the
feasible-interior region S_{++}. To define the next point, we introduce the search
direction (Δx, Δy) ∈ R^{2n} and the step parameter θ, and define (x̄(θ), ȳ(θ)) as

    (x̄(θ), ȳ(θ)) = (x̄, ȳ) + θ (Δx, Δy).                                           (8.16)

The next point (x̂, ŷ) is determined as (x̄(θ), ȳ(θ)) for a given θ > 0. How should we
determine the search direction (Δx, Δy) and the step parameter θ? A solution (x, y)
of the LCP satisfies the equality system

    y = Mx + q,   x_i y_i = 0 (i = 1, 2, ..., n)                                   (8.17)

and the inequality system (x, y) ≥ 0. Therefore, a reasonable method may be to employ
the Newton direction for approximating a point which satisfies the system (8.17) of
equations, and to choose a suitable step size so that the next point (x̄(θ), ȳ(θ))
remains in the positive orthant. In this case, the Newton direction at the point
(x̄, ȳ) ∈ S_{++} ⊂ S_af satisfies the following system:

    Ȳ Δx + X̄ Δy = −X̄ ȳ,    −M Δx + Δy = 0,                                        (8.18)

where X̄ (Ȳ) denotes the diagonal matrix whose diagonal components are x̄_i (ȳ_i)
(i = 1, ..., n). Under the assumption that the matrix M is positive semi-definite,
the system has a solution whose Euclidean norm can be bounded explicitly. To see
this, we consider a more general system in which the vectors x̄, ȳ and −X̄ ȳ are
replaced by two positive vectors x ∈ R^n_{++}, y ∈ R^n_{++} and an n-dimensional
vector h, respectively:

    Y Δx + X Δy = h,    −M Δx + Δy = 0.                                            (8.19)

The following lemma has been used repeatedly in many papers on interior point
algorithms for the monotone LCP (see, for example, Lemma 4.1 and Lemma 4.20 of [26]).

Lemma 8.3.1 Suppose that Condition 8.2.2 holds. Then, for every (x, y) ∈ R^{2n}_{++},

(i) the matrix

        M̃ = (  Y   X )
            ( −M   I )                                                             (8.20)

    is nonsingular, hence the system (8.19) has a unique solution (Δx, Δy) for every
    h ∈ R^n, and

(ii) (Δx, Δy) satisfies the following inequalities:

        0 ≤ Δx^T Δy ≤ (1/4) ‖X^{-1/2} Y^{-1/2} h‖^2,                               (8.21)
        ‖D^{-1} Δx‖^2 + ‖D Δy‖^2 = ‖X^{-1/2} Y^{-1/2} h‖^2 − 2 Δx^T Δy
                                 ≤ ‖X^{-1/2} Y^{-1/2} h‖^2.                        (8.22)

Here X^{-1/2} (Y^{-1/2}) denotes the diagonal matrix whose components are x_i^{-1/2}
(y_i^{-1/2}) (i = 1, 2, ..., n) and D = X^{1/2} Y^{-1/2}.

Noting that ‖X̄^{-1/2} Ȳ^{-1/2} (−X̄ ȳ)‖^2 = x̄^T ȳ, we obtain the following results as
a corollary of the lemma.

Corollary 8.3.2 Suppose that Condition 8.2.2 holds. Then

(i) the system (8.18) has a unique solution (Δx^a, Δy^a), and

(ii) (Δx^a, Δy^a) satisfies the following inequalities:

        0 ≤ (Δx^a)^T Δy^a ≤ (1/4) x̄^T ȳ,                                           (8.23)
        ‖D̄^{-1} Δx^a‖^2 + ‖D̄ Δy^a‖^2 ≤ x̄^T ȳ.                                      (8.24)

Here D̄ = X̄^{1/2} Ȳ^{-1/2}.
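Computationally, the system (8.19) reduces to a single n × n linear solve, since the
second equation gives Δy = M Δx. The following dense NumPy sketch (function names ours)
illustrates this; a serious implementation would exploit sparsity and factorizations.

    import numpy as np

    def newton_direction(M, x, y, h):
        """Solve (8.19): Y dx + X dy = h, -M dx + dy = 0.
        Eliminating dy = M dx gives (Y + X M) dx = h."""
        dx = np.linalg.solve(np.diag(y) + np.diag(x) @ M, h)
        dy = M @ dx
        return dx, dy

    def affine_scaling_direction(M, x, y):
        """The affine scaling direction (8.18): the special case h = -X y."""
        return newton_direction(M, x, y, -(x * y))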
Let us observe how the above results serve to determine the next point, by adopting
the model M1 (8.8) of Section 2. The model M1 is an optimization problem which
minimizes the sum of complementarities x^T y = Σ_i x_i y_i. Therefore, our intention
is to find a step size θ so that the next point (x̂, ŷ) remains in the feasible-interior
region S_{++} and the complementarity x̄(θ)^T ȳ(θ) at the next point is reduced
sufficiently. Recall that (x̂, ŷ) is given by (8.16) for some θ > 0. For every
(x̄, ȳ) ∈ S_af, the system (8.18) ensures that

    ȳ(θ) = ȳ + θ Δy^a = (M x̄ + q) + θ M Δx^a = M x̄(θ) + q.

Hence (x̄(θ), ȳ(θ)) ∈ S_af for every θ. It follows that (x̄(θ), ȳ(θ)) > 0 is a
necessary and sufficient condition for (x̄(θ), ȳ(θ)) ∈ S_{++} = S_af ∩ R^{2n}_{++}.
We can easily see that (x̄(θ), ȳ(θ)) > 0 if and only if
(e + θ X̄^{-1} Δx^a, e + θ Ȳ^{-1} Δy^a) > 0. Therefore, if

    θ ‖X̄^{-1} Δx^a‖_∞ < 1  and  θ ‖Ȳ^{-1} Δy^a‖_∞ < 1,                             (8.25)

then (x̄(θ), ȳ(θ)) > 0. By (8.24), we obtain that

    ‖X̄^{-1} Δx^a‖_∞ = ‖X̄^{-1/2} Ȳ^{-1/2} D̄^{-1} Δx^a‖_∞
                    ≤ ‖X̄^{-1/2} Ȳ^{-1/2} D̄^{-1} Δx^a‖
                    ≤ ‖X̄^{-1/2} Ȳ^{-1/2}‖ ‖D̄^{-1} Δx^a‖
                    ≤ √( x̄^T ȳ / min{x̄_i ȳ_i : i = 1, 2, ..., n} ),

and similarly,

    ‖Ȳ^{-1} Δy^a‖_∞ ≤ √( x̄^T ȳ / min{x̄_i ȳ_i : i = 1, 2, ..., n} ).

Thus, the upper bound

    θ < √( min{x̄_i ȳ_i : i = 1, 2, ..., n} / x̄^T ȳ )                               (8.26)

gives a sufficient condition for (x̄(θ), ȳ(θ)) ∈ S_{++}. On the other hand, the
complementarity x̄(θ)^T ȳ(θ) at the next point can be rewritten as follows:

    x̄(θ)^T ȳ(θ) = (x̄ + θ Δx^a)^T (ȳ + θ Δy^a)
                = x̄^T ȳ + θ (ȳ^T Δx^a + x̄^T Δy^a) + θ^2 (Δx^a)^T Δy^a.

Since (Δx^a, Δy^a) satisfies the system (8.18) and the inequality (8.23), x̄(θ)^T ȳ(θ)
is bounded by

    x̄(θ)^T ȳ(θ) ≤ (1 − θ) x̄^T ȳ + (θ^2 / 4) x̄^T ȳ = (1 − θ/2)^2 x̄^T ȳ.
Let us combine the above inequality and the condition (8.26) for (x̄(θ), ȳ(θ)) ∈ S_{++}.
Define

    θ̄ = γ √( min{x̄_i ȳ_i : i = 1, 2, ..., n} / x̄^T ȳ ),

where γ is a constant such that γ ∈ (0, 1). Obviously, θ̄ satisfies (8.26) and

    0 < θ̄ ≤ γ/√n < 1.

Hence (x̄(θ̄), ȳ(θ̄)) ∈ S_{++} and the complementarity at (x̄(θ̄), ȳ(θ̄)) satisfies

    x̄(θ̄)^T ȳ(θ̄) ≤ (1 − θ̄/2)^2 x̄^T ȳ,                                             (8.27)

where (1 − θ̄/2)^2 ∈ (0, 1). The above inequality seems to lead us to the recurrence
relation (8.9), which is a desirable property of the generated sequence for the
optimization model M1 (8.8). However, there is a serious difference between (8.9)
and (8.27): the value (1 − θ̄/2)^2 in (8.27) depends on the point (x̄, ȳ), while ρ in
(8.9) is a number which does not depend on the point (x̄, ȳ). The value (1 − θ̄/2)^2
is influenced by the dispersion of the products x̄_i ȳ_i (i = 1, 2, ..., n) and
satisfies

    (1 − γ/(2√n))^2 ≤ (1 − θ̄/2)^2 < 1                                              (8.28)

for every (x̄, ȳ) > 0. Note that the equality on the left holds if and only if
(x̄, ȳ) > 0 satisfies

    x̄_i ȳ_i = μ  (i = 1, 2, ..., n)  (or equivalently X̄ ȳ = μ e)  for some μ > 0.

This means that if (x̄, ȳ) belongs to the set

    S_cen = {(x, y) ∈ S_{++} : x_i y_i = μ (i = 1, 2, ..., n), μ > 0}
          = {(x, y) ∈ S_{++} : Xy = μ e, μ > 0}                                    (8.29)

then we can obtain a next point (x̄(θ̄), ȳ(θ̄)) ∈ S_{++} such that the complementarity
is reduced by the factor (1 − γ/(2√n))^2.
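As an illustration of this step rule, the sketch below (ours, assuming NumPy) takes one
damped affine-scaling step with θ̄ = γ √(min_i x_i y_i / x^T y) and checks the reduction
(8.27).

    import numpy as np

    def affine_scaling_step(M, x, y, gamma=0.5):
        """One damped affine-scaling step; gamma is any constant in (0, 1)."""
        dx = np.linalg.solve(np.diag(y) + np.diag(x) @ M, -(x * y))
        dy = M @ dx
        theta = gamma * np.sqrt(np.min(x * y) / (x @ y))
        xn, yn = x + theta * dx, y + theta * dy
        # (8.27): the complementarity shrinks at least by the factor (1 - theta/2)^2
        assert xn @ yn <= (1.0 - theta / 2.0) ** 2 * (x @ y) + 1e-12
        return xn, yn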

The set S_cen given by (8.29) is called the path of centers of the LCP. Let us
introduce the following mappings u : R^{2n}_+ → R^n_+ and
H : R_+ × R^{2n}_{++} → R^n × R^n:

    u(x, y) = Xy = (x_1 y_1, x_2 y_2, ..., x_n y_n)^T,                             (8.30)
    H(μ, x, y) = (u(x, y) − μ e, y − Mx − q) = (Xy − μ e, y − Mx − q).              (8.31)

Figure 8.2 The path of centers S_cen and u(S_cen).

Then the solution of the LCP (8.2) is equivalent to that of the system

    H(0, x, y) = 0,   (x, y) ∈ R^{2n}_+,

and each point on the path of centers S_cen is given by a solution of the system

    H(μ, x, y) = 0,   (x, y) ∈ R^{2n}_{++},   for some μ > 0.                       (8.32)

See Figure 8.2, which illustrates the path of centers S_cen in x-space and u(S_cen)
in u-space, respectively.

The path of centers S_cen can be characterized in several ways. Here we consider the
following family of problems associated with the model M1 (8.8):

    L(μ): Minimize ψ(μ, x, y) = x^T y − μ Σ_{i=1}^n log(x_i y_i)
          subject to (x, y) ∈ S_{++} = S_af ∩ R^{2n}_{++},                          (8.33)

for μ > 0. This problem may be regarded as the logarithmic barrier function problem
for the model M1. Most of the following results were indicated and studied by
Megiddo [45] and by Kojima et al. [29]. See also [34], which gives some ingredients
of the proofs provided in Section 7.

Lemma 8.3.3 Let μ > 0 be fixed. If (x, y) satisfies (8.32) then it is an optimal
solution of L(μ).

Lemma 8.3.4 Suppose that Condition 8.2.2 and Condition 8.2.4 hold. Then the problem
L(μ) has a unique optimal solution (x(μ), y(μ)) for every μ > 0.

Lemma 8.3.5 Let μ > 0 be fixed. Under Condition 8.2.2, if (x, y) ∈ S_{++} is the
optimal solution of L(μ) then it satisfies the system (8.32).

Theorem 8.3.6 Suppose that Conditions 8.2.2 and 8.2.4 hold. Then the path of centers
S_cen is a 1-dimensional smooth curve which converges to a solution (x*, y*) ∈ S_cp
of the LCP (8.2) as μ tends to 0.

As we can see in Section 7, the results above are mainly due to the following two
facts given in Lemma 8.3.1 and Lemma 8.2.6:

(3a) The matrix M̃ defined by (8.20) is nonsingular for every (x, y) > 0.

(3b) The set S_+(τ) defined by (8.15) is bounded for every τ ≥ 0.

It is known that the condition (3a) holds if and only if all the principal minors of
the matrix M are nonnegative, i.e., the matrix M is a P_0-matrix (Lemma 4.1 of [26]).
In fact, Kojima et al. [26] showed that the mapping u (8.30) is a diffeomorphism from
the feasible-interior region S_{++} onto the n-dimensional positive orthant R^n_{++}
under the conditions (3a) and (3b), and derived the existence of the path of centers.
Besides this, the generalization has been carried out for various problems, e.g.,

(i) nonlinear CPs: Kojima, Mizuno and Noma [27, 28], Kojima, Megiddo and Noma [25],
    Noma [63], etc.,

(ii) CPs for maximal monotone operators: McLinden [41], Güler [12], etc.,

(iii) monotone semidefinite LCPs: Kojima, Shindoh and Hara [34], etc.,

(iv) monotone generalized CPs (including monotone linear and nonlinear CPs and
     monotone semidefinite LCPs): Shida, Shindoh and Kojima [72].

In the literature on interior point algorithms the existence of the path of centers
(or the central trajectory) is considered to be a crucial condition for providing a
globally convergent algorithm.

Up to the present, our analysis of the path of centers S_cen has been based on the
optimization model M1 (8.8). However, S_cen can also be characterized in the context
of the model M2 (8.10). Recall that the potential function φ can be expressed as in
(8.11). In view of the definition of φ_cen, it is easily seen that the equality in
(8.12) holds on the set S_{++} if and only if (x, y) ∈ S_cen. Thus, we obtain another
definition of S_cen:

    S_cen = {(x, y) ∈ S_{++} : φ_cen(x, y) = 0}.                                    (8.34)

Figure 8.3 The level set A_{φcen}(τ) and u(A_{φcen}(τ)).

See Figure 8.3, Figure 8.4 and Figure 8.5, where the level sets

    A_{φcen}(τ) = {(x, y) ∈ S_{++} : φ_cen(x, y) ≤ τ},
    A_{φcp}(τ)  = {(x, y) ∈ S_{++} : φ_cp(x, y) ≤ τ},
    A_{φ}(τ)    = {(x, y) ∈ S_{++} : φ(x, y) ≤ τ}

of the functions φ_cen, φ_cp and φ in x-space and their images u(A_{φcen}(τ)),
u(A_{φcp}(τ)) and u(A_{φ}(τ)) under the mapping u given by (8.30) are represented
for some τ, respectively.

In the next section, we will propose two prototype algorithms based on the model M1
and the model M2, respectively.

The solution (Δx^a, Δy^a) of the system (8.18) for approximating a point which
satisfies the system (8.17) is often called the affine scaling direction for the LCP.
This direction is used not only in the affine scaling algorithms, but also in the
predictor-corrector algorithms for the LCP (see Section 6). Furthermore, as we will
see in the next section, each of the directions used in the path-following algorithm
and the potential-reduction algorithm can be regarded as a convex combination of the
affine scaling direction and the so-called centering direction for approximating a
point on the path of centers S_cen.

Figure 8.4 The level set A_{φcp}(τ) and u(A_{φcp}(τ)).

Figure 8.5 The level set A_{φ}(τ) and u(A_{φ}(τ)).

8.4 TWO PROTOTYPE ALGORITHMS FOR THE MONOTONE LCP

In this section, we propose two algorithms, the path-following algorithm and the
potential-reduction algorithm, which are known as typical interior point algorithms
for the LCP. We still impose Condition 8.2.2 and Condition 8.2.4 on the LCP
throughout this section.

8.4.1 Path-Following Algorithm

As we have seen in the previous section, if we have a point (x̄, ȳ) on the path of
centers S_cen, then we can easily find a next point (x̂, ŷ) such that it belongs to
the feasible-interior region S_{++} and the complementarity x̂^T ŷ is reduced by the
fixed factor (1 − γ/(2√n))^2 (see (8.27) and (8.28)). However, since each point on
the path of centers S_cen is the solution of the nonlinear system (8.32), it is not
so easy to find a point on S_cen. In the analysis in Section 3, we employed the
Newton direction (Δx^a, Δy^a) as the search direction, which is the unique solution
of the system (8.18). Since the direction (Δx^a, Δy^a) can be regarded as a
continuous map from S_{++} to R^{2n}, a similar result may be obtained if (x̄, ȳ) is
sufficiently close to the path of centers. This is the motivation for developing the
path-following algorithm.

For each (x, y) ∈ S_{++}, we employ the quantity

    min_{μ ∈ R_+} ‖H(μ, x, y)‖ = min_{μ ∈ R_+} ‖Xy − μ e‖ = ‖Xy − (x^T y / n) e‖    (8.35)

as a measure of the distance from (x, y) to the path of centers S_cen. We also define
the set

    N(α) = {(x, y) ∈ S_{++} : ‖Xy − (x^T y / n) e‖ ≤ α (x^T y / n)}                 (8.36)

and consider that if (x, y) ∈ N(α) for a small α > 0 then (x, y) is sufficiently
close to the path of centers. For a fixed α > 0, we call the set N(α) a neighborhood
of the path of centers S_cen. Figure 8.6 illustrates the set N(α) in x-space and
u(S_cen) in u-space, respectively.
Figure 8.6 The neighborhood N(α) of the path of centers S_cen and u(S_cen).
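The measure (8.35) and the membership test for N(α) are straightforward to code; the
following few lines (names ours, Python/NumPy) are one possible sketch.

    import numpy as np

    def centrality_measure(x, y):
        """Distance (8.35) from (x, y) to the path of centers: ||Xy - (x^T y / n) e||."""
        mu = (x @ y) / x.size
        return np.linalg.norm(x * y - mu), mu

    def in_neighborhood(x, y, alpha):
        """Membership test for N(alpha) of (8.36); feasibility y = Mx + q is assumed."""
        dist, mu = centrality_measure(x, y)
        return dist <= alpha * mu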

Let α ∈ (0, 1) be fixed. Then we can see that

    0 < (1 − α) (x^T y / n) ≤ x_i y_i ≤ (1 + α) (x^T y / n),  i ∈ {1, 2, ..., n},   (8.37)

for each (x, y) ∈ N(α). Thus the upper bound in (8.26) for maintaining the positivity
of (x̄(θ), ȳ(θ)) is at least √((1 − α)/n). Consequently, we may find a
feasible-interior-point (x̂, ŷ) ∈ S_{++} along the direction (Δx^a, Δy^a) at which the
complementarity is reduced by the factor (1 − (γ/2) √((1 − α)/n))^2, where γ ∈ (0, 1)
is a constant. However, since the next point (x̂, ŷ) does not necessarily belong to
the neighborhood N(α), the above discussion might be inadequate to be continued. Our
intention is to construct an algorithm which generates a sequence
{(x^k, y^k)} ⊂ N(α) satisfying (8.9), which we call a feasible path-following
algorithm for the LCP. For this purpose, we introduce another direction
(Δx^c, Δy^c) ∈ R^{2n}, called the centering direction, which is the Newton direction
at (x̄, ȳ) ∈ S_{++} for approximating a point on the path of centers S_cen such that

    Xy = (x̄^T ȳ / n) e,   y = Mx + q.

The direction (Δx^c, Δy^c) is given by the unique solution of the system

    Ȳ Δx + X̄ Δy = −( X̄ ȳ − (x̄^T ȳ / n) e ),    −M Δx + Δy = 0.                     (8.38)
See (i) of Lemma 8.3.1. Let (Δx(β), Δy(β)) be the convex combination of the centering
direction (Δx^c, Δy^c) and the affine scaling direction (Δx^a, Δy^a) at
(x̄, ȳ) ∈ S_{++} given by

    (Δx(β), Δy(β)) = β (Δx^c, Δy^c) + (1 − β) (Δx^a, Δy^a)                          (8.39)

for β ∈ [0, 1]. It is easily seen that the direction (Δx(β), Δy(β)) coincides with
the unique solution of the system

    Ȳ Δx + X̄ Δy = −{ β ( X̄ ȳ − (x̄^T ȳ / n) e ) + (1 − β) X̄ ȳ }
                = −( X̄ ȳ − β (x̄^T ȳ / n) e ),
    −M Δx + Δy = 0.                                                                (8.40)

A conceptual illustration of these three directions is given in Figure 8.7.

Let us consider the search mapping (8.16) with (Δx, Δy) = (Δx(β), Δy(β)). The
assumption (x̄, ȳ) ∈ S_{++} and the system (8.40) imply that

    (x̄(θ), ȳ(θ)) = (x̄, ȳ) + θ (Δx(β), Δy(β)) ∈ S_af

for every θ. In what follows, we will show that there exist a constant ρ ∈ (0, 1)
and a new point (x̄(θ̂), ȳ(θ̂)) such that

(4a) (x̄(θ̂), ȳ(θ̂)) > 0,

(4b) (x̄(θ̂), ȳ(θ̂)) ∈ N(α),

(4c) x̄(θ̂)^T ȳ(θ̂) ≤ ρ x̄^T ȳ

for every (x̄, ȳ) ∈ N(α), by choosing suitable parameters (the combination parameter
β ∈ [0, 1], the neighborhood parameter α ∈ (0, 1) and the step size parameter θ̂ > 0).
It should be noted that the requirement (4a) is a sufficient condition for
(x̄(θ̂), ȳ(θ̂)) ∈ S_{++} = R^{2n}_{++} ∩ S_af, since (x̄(θ), ȳ(θ)) ∈ S_af for every θ.
The following results are useful for determining the parameters and can be obtained
from the definition (8.36) of N(α), Lemma 8.3.1 and the inequality (8.37).

Lemma 8.4.1 Suppose that Condition 8.2.2 and Condition 8.2.4 hold. Let (x̄, ȳ) ∈ N(α)
for α ∈ (0, 1) and let (Δx(β), Δy(β)) be the solution of the system (8.40) for
β ∈ [0, 1]. Then

(i) ‖X̄ ȳ − β (x̄^T ȳ / n) e‖^2 ≤ {α^2 + (1 − β)^2 n} (x̄^T ȳ / n)^2.

(ii) 0 ≤ Δx(β)^T Δy(β) ≤ (α^2 + (1 − β)^2 n) / (4n(1 − α)) · x̄^T ȳ.

(iii) ‖D̄^{-1} Δx(β)‖^2 + ‖D̄ Δy(β)‖^2 ≤ (α^2 + (1 − β)^2 n) / (1 − α) · (x̄^T ȳ / n).

(iv) ‖X̄^{-1} Δx(β)‖ ≤ √( (α^2 + (1 − β)^2 n) / (1 − α)^2 )  and
     ‖Ȳ^{-1} Δy(β)‖ ≤ √( (α^2 + (1 − β)^2 n) / (1 − α)^2 ).

(v) ‖ΔX(β) Δy(β)‖ ≤ (α^2 + (1 − β)^2 n) / (1 − α) · (x̄^T ȳ / n).

Here D̄ = X̄^{1/2} Ȳ^{-1/2} and ΔX(β) denotes the diagonal matrix whose components
are equal to those of Δx(β).

Let us derive a sufficient condition on the parameters α ∈ (0, 1), β ∈ [0, 1] and
θ > 0 for our requirements (4a), (4b) and (4c). By a discussion similar to that in
Section 3, we can see that (4a) holds if

    θ ‖X̄^{-1} Δx(β)‖ < 1  and  θ ‖Ȳ^{-1} Δy(β)‖ < 1

(see (8.25)). By (iv) of Lemma 8.4.1, the above inequality gives us the first
restriction on the parameters α, β and θ:

    θ ≤ √( (1 − α)^2 / (α^2 + (1 − β)^2 n) ).                                       (8.41)
For the requirements (4b) and (4c), we must observe x̄(θ)^T ȳ(θ) and
X̄(θ) ȳ(θ) − (x̄(θ)^T ȳ(θ) / n) e. The following lemma directly follows from the fact
that the displacement (Δx(β), Δy(β)) satisfies

    Ȳ Δx(β) + X̄ Δy(β) = −( X̄ ȳ − β (x̄^T ȳ / n) e ).

Lemma 8.4.2

(i) x̄(θ)^T ȳ(θ) = (1 − θ(1 − β)) x̄^T ȳ + θ^2 Δx(β)^T Δy(β).

(ii) X̄(θ) ȳ(θ) − (x̄(θ)^T ȳ(θ) / n) e
        = (1 − θ) ( X̄ ȳ − (x̄^T ȳ / n) e )
          + θ^2 ( ΔX(β) Δy(β) − (Δx(β)^T Δy(β) / n) e ).

Therefore, by (ii) and (v) of Lemma 8.4.1 and the fact that (x̄, ȳ) ∈ N(α), the value
of x̄(θ)^T ȳ(θ) and ‖X̄(θ) ȳ(θ) − (x̄(θ)^T ȳ(θ) / n) e‖ can be bounded as follows:

    (1 − θ(1 − β)) x̄^T ȳ ≤ x̄(θ)^T ȳ(θ)
        ≤ ( 1 − θ(1 − β) + θ^2 (α^2 + (1 − β)^2 n) / (4n(1 − α)) ) x̄^T ȳ,            (8.42)

    ‖X̄(θ) ȳ(θ) − (x̄(θ)^T ȳ(θ) / n) e‖
        ≤ (1 − θ) ‖X̄ ȳ − (x̄^T ȳ / n) e‖
          + θ^2 ( ‖ΔX(β) Δy(β)‖ + Δx(β)^T Δy(β) / √n )
        ≤ (1 − θ) α (x̄^T ȳ / n)
          + θ^2 ( (α^2 + (1 − β)^2 n) / (1 − α) · (x̄^T ȳ / n)
                  + (α^2 + (1 − β)^2 n) / (4√n (1 − α)) · (x̄^T ȳ / n) )
        = { (1 − θ) α + θ^2 (1 + 1/(4√n)) (α^2 + (1 − β)^2 n) / (1 − α) } (x̄^T ȳ / n).  (8.43)

Consequently, we obtain from (8.42) and (8.43) that if

    (1 − θ) α + θ^2 (1 + 1/(4√n)) (α^2 + (1 − β)^2 n) / (1 − α) ≤ α (1 − θ(1 − β)),     (8.44)

then (x̄(θ), ȳ(θ)) ∈ N(α) (i.e., the requirement (4b)), and if there exists a constant
ρ ∈ (0, 1) such that

    1 − θ(1 − β) + θ^2 (α^2 + (1 − β)^2 n) / (4n(1 − α)) ≤ ρ,                           (8.45)
then x̄(θ)^T ȳ(θ) ≤ ρ x̄^T ȳ (i.e., the requirement (4c)). Therefore, a sufficient
condition on the parameters α ∈ (0, 1), β ∈ [0, 1] and θ for the requirements (4a),
(4b) and (4c) is to satisfy the inequalities (8.41), (8.44) and (8.45) with a constant
ρ ∈ (0, 1). In fact, let

    α = 1/2,   β = 1 − 1/(2√n),   θ = 1/5;                                              (8.46)

then it can easily be seen that (1 − β)^2 n = 1/4 and that

    θ = 1/5 ≤ 1/√2 = √( (1 − α)^2 / (α^2 + (1 − β)^2 n) ),

    (1 − θ) α + θ^2 (1 + 1/(4√n)) (α^2 + (1 − β)^2 n) / (1 − α) ≤ 9/20 ≤ α (1 − θ(1 − β))

for every n ≥ 1. Furthermore,

    1 − θ(1 − β) + θ^2 (α^2 + (1 − β)^2 n) / (4n(1 − α))
        = 1 − 1/(10√n) + 1/(100 n) ≤ 1 − 1/(20√n).

Hence the choice of the parameters (8.46) meets the requirements (4a), (4b) and (4c)
with ρ = 1 − 1/(20√n). There are many other possible choices of the parameters, but
we never take β = 0 and/or β = 1 in the analysis above, since the requirements (4b)
and/or (4c) are not necessarily ensured in those cases (see (8.44) and (8.45)). This
means that using a combination of the affine scaling direction (Δx^a, Δy^a) and the
centering direction (Δx^c, Δy^c) makes sense in our analysis.

Based on the discussion above, we now state an algorithm which we call the
path-following algorithm:
Algorithm 8.4.3 [Path-following algorithm for the LCP]

Input
    α ∈ (0, 1): the neighborhood parameter;
    (x^0, y^0) ∈ N(α): the initial feasible-interior-point in the neighborhood N(α)
        of the path of centers;
Parameters
    ε > 0: the accuracy parameter;
    β ∈ [0, 1]: the parameter of the convex combination of the centering direction
        and the affine scaling direction;
    ρ ∈ (0, 1): the parameter of the shrinking ratio of the complementarity x^T y;
    θ̂: the step size parameter;
begin
    (x̄, ȳ) := (x^0, y^0);
    k := 0;
    while x̄^T ȳ > ε do
        Calculate (Δx(β), Δy(β)) from (8.40);
        (Δx, Δy) := (Δx(β), Δy(β));
        Compute the search mapping (x̄(θ), ȳ(θ)) by (8.16);
        Find θ̂ such that
            (x̄(θ̂), ȳ(θ̂)) > 0, (x̄(θ̂), ȳ(θ̂)) ∈ N(α) and x̄(θ̂)^T ȳ(θ̂) ≤ ρ x̄^T ȳ;
        (x̄, ȳ) := (x̄(θ̂), ȳ(θ̂));
        k := k + 1;
    end
end.
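A compact sketch of the whole loop, using the parameter choice (8.46) and the reduced
linear system for (8.40), might look as follows. It is an illustration only (names
ours): it assumes the starting point is a feasible-interior-point lying in N(1/2) and
omits all safeguards.

    import numpy as np

    def path_following(M, q, x, y, eps=1e-8, max_iter=10_000):
        """Sketch of Algorithm 8.4.3 with alpha = 1/2, beta = 1 - 1/(2 sqrt(n)),
        fixed step theta = 1/5 as in (8.46)."""
        n = x.size
        beta, theta = 1.0 - 1.0 / (2.0 * np.sqrt(n)), 0.2
        for _ in range(max_iter):
            if x @ y <= eps:
                break
            mu = (x @ y) / n
            rhs = -(x * y - beta * mu)               # right-hand side of (8.40)
            dx = np.linalg.solve(np.diag(y) + np.diag(x) @ M, rhs)
            dy = M @ dx
            x, y = x + theta * dx, y + theta * dy    # stays in N(alpha) by the analysis above
            assert np.all(x > 0) and np.all(y > 0)
        return x, y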
If we choose the parameters as in (8.46) and if an initial point (x^0, y^0) ∈ N(α) is
available, then the algorithm is well-defined with the ratio ρ = 1 − 1/(20√n).
Figure 8.8 gives an image of the sequence {(x^k, y^k)} generated by the
path-following algorithm in x-space.

In this case, the generated sequence {(x^k, y^k)} satisfies (8.9) for each
k = 0, 1, ..., and consequently

    (x^k)^T y^k ≤ ( 1 − 1/(20√n) )^k (x^0)^T y^0.

Let us compute an iteration number K at which the criterion (x^K)^T y^K ≤ ε is
satisfied. A sufficient condition for (x^K)^T y^K ≤ ε is given by

    ( 1 − 1/(20√n) )^K (x^0)^T y^0 ≤ ε.

Figure 8.8 A generated sequence {(x^k, y^k)} by the path-following algorithm.

This implies that

    K log( 1 − 1/(20√n) ) ≤ log( ε / (x^0)^T y^0 ).

By using the fact (see (i) of Lemma 8.4.5) that

    log( 1 − 1/(20√n) ) ≤ −1/(20√n) < 0,

we see that it suffices to take

    K ≥ 20 √n log( (x^0)^T y^0 / ε ).

Thus the following theorem has been shown.

Theorem 8.4.4 Suppose that the LCP (8.2) satisfies Condition 8.2.2 and Condition
8.2.4. Define the parameters as in (8.46). Then Algorithm 8.4.3 terminates with an
approximate solution (x, y) ∈ N(α) satisfying the desired accuracy x^T y ≤ ε in
O( √n log( (x^0)^T y^0 / ε ) ) iterations.

The order O( √n log( (x^0)^T y^0 / ε ) ) is the best iteration bound known to date
for feasible-interior-point algorithms for solving the LCP.
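To get a feel for the size of this bound, the following short computation (the numbers
are illustrative only, not from the text) evaluates it for n = 100 and a reduction
ratio of 10^{12}.

    import numpy as np

    # Rough size of K >= 20 sqrt(n) log((x^0)^T y^0 / eps) for sample values.
    n, ratio = 100, 1e12
    K = 20 * np.sqrt(n) * np.log(ratio)
    print(round(K))    # about 5526 iterations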
The path-following algorithm of this type was first proposed by Kojima, Mizuno and
Yoshise [30]. While Algorithm 8.4.3 employs the quantity (8.35) as a measure of the
"distance" between a point (x, y) and the path of centers, many other measures have
been proposed. For instance, Kojima et al. [26] used the function φ_cen as a measure
and showed the relationships among several measures.

In the case of linear programs, taking a small β ∈ (0, 1) and a large step size θ
shows outstanding performance in practical use (see [36, 42, 40, 3], etc.). A
difficulty of Algorithm 8.4.3 is that it often forces us to use a short step size θ
and requires too many iterations. Several approaches have been proposed to overcome
this difficulty (see [50, 24, 68], etc.).

Another problem to be solved is how to prepare an initial point (x^0, y^0) which
belongs to the neighborhood N(α). We have at least three approaches to overcome this
difficulty. The first is to construct an artificial problem from the original one,
which we will describe in Section 5. The second is to use another type of path of
centers and its neighborhood adapted to the initial feasible-interior-point
(x^0, y^0) ∈ S_{++} ([50, 46, 49], etc.). See Chapter 3 for such variants of the
path-following algorithm. The last one, which may be the most practical approach
among them, is to give up the idea of finding a feasible-interior-point as an initial
point, and to develop an infeasible-interior-point algorithm which allows us to start
from an infeasible-interior-point (x, y), i.e., (x, y) > 0 but not necessarily
(x, y) ∈ S_af. See Chapter 5 for the idea of infeasible-interior-point algorithms and
the many developments on this subject.

8.4.2 Potential-Reduction Algorithm

We describe here another typical interior point algorithm, the potential-reduction
algorithm, for solving the LCP. The algorithm is based on the model M2 (8.10) and
generates a sequence {(x^k, y^k)} which satisfies the relation (8.14) with a number
δ > 0 not depending on k.

Recall that the potential function φ is defined by

    φ(x, y) = (n + ν) log x^T y − Σ_{i=1}^n log x_i y_i − n log n  for every (x, y) > 0,

where ν > 0 is a parameter. Suppose that ν is a fixed positive number, and that we
currently have a feasible-interior-point (x̄, ȳ) ∈ S_{++}. Let us find the next point
(x̂, ŷ) ∈ S_{++} according to the search mapping (8.16). To determine the next point,
it is important to bound the value of the potential function at (x̄(θ), ȳ(θ)) for each
θ. For this purpose, we use the following lemma, which has appeared in many papers
([9, 23, 77, 86], etc.).
Lemma 8.4.5

(i) If 1 + ξ > 0 for ξ ∈ R, then log(1 + ξ) ≤ ξ.

(ii) Let τ ∈ [0, 1). If ξ ∈ R^n satisfies e + ξ ≥ (1 − τ) e, then

        Σ_{i=1}^n log(1 + ξ_i) ≥ e^T ξ − ‖ξ‖^2 / (2(1 − τ)).

For convenience in the succeeding discussions, we define

    V = X̄^{1/2} Ȳ^{1/2} = diag{ √(x̄_i ȳ_i) },
    v = V e = ( √(x̄_1 ȳ_1), √(x̄_2 ȳ_2), ..., √(x̄_n ȳ_n) )^T,                          (8.47)
    v_min = min{ v_i : i = 1, 2, ..., n }.

The following lemma directly follows from the lemma above:

Lemma 8.4.6 Suppose that

    θ ‖X̄^{-1} Δx‖_∞ ≤ τ  and  θ ‖Ȳ^{-1} Δy‖_∞ ≤ τ  for some τ ∈ [0, 1);                (8.48)

then we have

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ)
        ≤ θ { ((n + ν)/(x̄^T ȳ)) v − V^{-1} e }^T { V^{-1} (Ȳ Δx + X̄ Δy) }
          + θ^2 { ((n + ν)/(x̄^T ȳ)) Δx^T Δy
                  + ( ‖X̄^{-1} Δx‖^2 + ‖Ȳ^{-1} Δy‖^2 ) / (2(1 − τ)) }.                   (8.49)

In view of the above approximation, the vector Ȳ Δx + X̄ Δy plays a crucial role in
the linear term with respect to θ. Furthermore, the quadratic term includes the
factors which we can obtain if we let h = Ȳ Δx + X̄ Δy in Lemma 8.3.1. So from now on
we assume that (Δx, Δy) is the solution of the system (8.19) with (x, y) = (x̄, ȳ) for
some h ∈ R^n. By Lemma 8.3.1, we have

    ‖X̄^{-1} Δx‖^2 + ‖Ȳ^{-1} Δy‖^2 = ‖V^{-1} D̄^{-1} Δx‖^2 + ‖V^{-1} D̄ Δy‖^2
                                   ≤ ‖V^{-1}‖^2 ( ‖D̄^{-1} Δx‖^2 + ‖D̄ Δy‖^2 )
                                   ≤ (1 / v_min^2) ‖V^{-1} h‖^2,

    Δx^T Δy ≤ (1/4) ‖V^{-1} h‖^2.
Hence, if θ satisfies

    (θ / v_min) ‖V^{-1} h‖ = τ,                                                          (8.50)

then we obtain a bound for the last term of (8.49) in Lemma 8.4.6 as follows:

    θ^2 { ((n + ν)/(x̄^T ȳ)) Δx^T Δy + ( ‖X̄^{-1} Δx‖^2 + ‖Ȳ^{-1} Δy‖^2 ) / (2(1 − τ)) }
        ≤ { (1/4)(1 + ν/n) + 1/(2(1 − τ)) } τ^2.                                         (8.51)

Thus the remaining concern is to choose an h ∈ R^n suitable for deriving the
potential-reduction inequality

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ −δ

for some constant δ > 0. While there have been several proposals for such a vector h
(see [31, 26, 76, 84, 85], etc.), here we take

    h = −( X̄ ȳ − (x̄^T ȳ / (n + ν)) e ),                                                 (8.52)

for which the solution (Δx, Δy) of the system (8.19) coincides with the solution
(Δx(β), Δy(β)) of (8.40) with

    β = n / (n + ν) ∈ (0, 1).                                                            (8.53)

In this case, the coefficient of the linear term in (8.49) turns out to be

    { ((n + ν)/(x̄^T ȳ)) v − V^{-1} e }^T V^{-1} h = −((n + ν)/(x̄^T ȳ)) ‖V^{-1} h‖^2.

Hence, by the assumption (8.50) and the inequality (8.51), we obtain the bound

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ −((n + ν)/(x̄^T ȳ)) v_min ‖V^{-1} h‖ τ
                               + { (1/4)(1 + ν/n) + 1/(2(1 − τ)) } τ^2.                  (8.54)

By the definition (8.52) and the fact that

    ( (n/(x̄^T ȳ)) v − V^{-1} e )^T v = 0,
we can see that

    ( ((n + ν)/(x̄^T ȳ)) v_min ‖V^{-1} h‖ )^2
        ≥ ( n v_min^2 / (x̄^T ȳ) − 1 )^2 + ν^2 v_min^2 / (x̄^T ȳ).

The definition (8.47) of v_min implies that

    n v_min^2 / (x̄^T ȳ) = n min{x̄_i ȳ_i : i = 1, 2, ..., n} / (x̄^T ȳ) ∈ (0, 1]

for every (x̄, ȳ) ∈ S_{++}. Specifically, if

    n v_min^2 / (x̄^T ȳ) ≤ 1/2,

then

    ( n v_min^2 / (x̄^T ȳ) − 1 )^2 ≥ 1/4,

and otherwise

    ν^2 v_min^2 / (x̄^T ȳ) > ν^2 / (2n).

Thus, we conclude that

    ((n + ν)/(x̄^T ȳ)) v_min ‖V^{-1} h‖ ≥ σ_1,                                            (8.55)

where

    σ_1 = min{ 1/2, ν/√(2n) }.
Let us observe the second term on the right-hand side of (8.54). If we assume that
τ ≤ 1/3, then we can easily see that

    (1/4)(1 + ν/n) + 1/(2(1 − τ)) ≤ 1 + ν/(4n) ≤ 2 max{ 1, ν/(4n) }.

Define

    σ_2 = 2 max{ 1, ν/(4n) }.

By combining the above results, we obtain the following inequality whenever τ ≤ 1/3:

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ −σ_1 τ + σ_2 τ^2.                                          (8.56)

The right-hand side of the above inequality is a quadratic function of τ, and its
coefficients are positive and do not depend on the current point (x̄, ȳ). Hence we can
easily find a suitable τ which ensures a constant reduction of the function φ. In
fact, let

    τ̄ = min{ 1/3, σ_1/(5 σ_2) };

then

    −σ_1 τ̄ + σ_2 τ̄^2 ≤ −σ_1 τ̄ + (1/5) σ_1 τ̄ < −(1/2) σ_1 τ̄.

Hence, if we take the step size θ according to the equation (8.50) with τ = τ̄, then

    φ(x̄(θ), ȳ(θ)) − φ(x̄, ȳ) ≤ −δ,

where δ is given by

    δ = (1/2) σ_1 τ̄.

It should be noted that the existence of the above θ is ensured by the inequality
(8.55), which implies that

    ‖V^{-1} h‖ ≥ σ_1 (x̄^T ȳ) / ((n + ν) v_min) > 0.

Now we state the second algorithm, which we call the potential-reduction algorithm:
Algorithm 8.4.7 [Potential-reduction algorithm for the LCP]

Input
    ν > 0: the function parameter;
    (x^0, y^0) ∈ S_{++}: the initial feasible-interior-point;
Parameters
    ε > 0: the accuracy parameter;
    β ∈ [0, 1]: the parameter of the convex combination of the centering direction
        and the affine scaling direction;
    δ > 0: the reduction parameter of the potential function φ;
    θ̂: the step size parameter;
begin
    (x̄, ȳ) := (x^0, y^0);
    k := 0;
    while x̄^T ȳ > ε do
        Calculate (Δx(β), Δy(β)) from (8.40);
        (Δx, Δy) := (Δx(β), Δy(β));
        Compute the search mapping (x̄(θ), ȳ(θ)) by (8.16);
        Find θ̂ such that
            (x̄(θ̂), ȳ(θ̂)) ∈ S_{++} and φ(x̄(θ̂), ȳ(θ̂)) − φ(x̄, ȳ) ≤ −δ;
        (x̄, ȳ) := (x̄(θ̂), ȳ(θ̂));
        k := k + 1;
    end
end.
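A rough Python sketch of Algorithm 8.4.7 follows. It uses β = n/(n + ν) as in (8.53),
but replaces the exact step rule (8.50) by a crude backtracking search on φ; that
substitution, and all names, are ours.

    import numpy as np

    def potential(x, y, nu):
        n = x.size
        return (n + nu) * np.log(x @ y) - np.sum(np.log(x * y)) - n * np.log(n)

    def potential_reduction(M, q, x, y, nu=None, eps=1e-8, max_iter=50_000):
        """Sketch of Algorithm 8.4.7; assumes (x, y) is a feasible-interior-point."""
        n = x.size
        nu = np.sqrt(n) if nu is None else nu
        beta = n / (n + nu)
        for _ in range(max_iter):
            if x @ y <= eps:
                break
            rhs = -(x * y - beta * (x @ y) / n)        # right-hand side of (8.40)
            dx = np.linalg.solve(np.diag(y) + np.diag(x) @ M, rhs)
            dy = M @ dx
            theta, phi0 = 1.0, potential(x, y, nu)
            while np.any(x + theta * dx <= 0) or np.any(y + theta * dy <= 0) \
                    or potential(x + theta * dx, y + theta * dy, nu) > phi0:
                theta *= 0.5                            # crude backtracking line search
            x, y = x + theta * dx, y + theta * dy
        return x, y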

If we choose β as in (8.53), then we can find a step size θ̂ for which the value of
the potential function φ is decreased by δ, where

    δ = (1/2) σ_1 τ̄,
    τ̄ = min{ 1/3, σ_1/(5 σ_2) },
    σ_1 = min{ 1/2, ν/√(2n) },                                                           (8.57)
    σ_2 = 2 max{ 1, ν/(4n) },

for every function parameter ν > 0. In this case, the generated sequence
{(x^k, y^k)} satisfies (8.14) with the above δ for each k = 0, 1, ..., and
consequently (see Figure 8.9)

    φ(x^k, y^k) ≤ φ(x^0, y^0) − k δ.
Let us compute an iteration number K at which the accuracy (x^K)^T y^K ≤ ε is
reached. Recall that the potential function φ has the property (8.13). It follows
that if

    φ(x^K, y^K) ≤ ν log ε

for a given ε > 0, then

    (x^K)^T y^K ≤ ε.

Hence, the inequality

    φ(x^0, y^0) − K δ ≤ ν log ε

gives a sufficient condition for such a number K, which can be rewritten as

    K ≥ (ν/δ) log( e^{φ(x^0, y^0)/ν} / ε ).

In view of the definition (8.57) of δ, the value of ν/δ is given by

    ν/δ = 40 n / ν       (0 < ν ≤ √(n/2)),
    ν/δ = 80 ν           (√(n/2) < ν ≤ 4n),
    ν/δ = 20 ν^2 / n     (4n < ν).

Thus the order of ν/δ attains its minimum O(√n) when we take ν = O(√n) in our
analysis. We have shown the following theorem:

Figure 8.9 A generated sequence {(x^k, y^k)} by the potential-reduction algorithm.
Theorem 8.4.8 Suppose that the LCP (8.2) satisfies Condition 8.2.2 and Condition
8.2.4. Let ν = O(√n). Define the parameters β and δ as in (8.53) and (8.57). Then
Algorithm 8.4.7 terminates in O( √n log( e^{φ(x^0, y^0)/ν} / ε ) ) iterations with an
approximate solution (x, y) ∈ S_{++} satisfying the desired accuracy x^T y ≤ ε.

In the discussion above, we employ the search direction (Δx(β), Δy(β)) which is the
solution of the system (8.40), where β is given by (8.53). We can easily see that
this direction coincides with the solution of the following minimization problem for
a certain ω > 0:

    Minimize   ∇_x φ(x̄, ȳ)^T Δx + ∇_y φ(x̄, ȳ)^T Δy
    subject to −M Δx + Δy = 0,   ‖V^{-1} (Ȳ Δx + X̄ Δy)‖^2 ≤ ω.

Here ∇_x φ(x̄, ȳ) and ∇_y φ(x̄, ȳ) denote the gradients of the potential function φ at
(x̄, ȳ) ∈ S_{++} with respect to the variable vectors x and y, respectively. Thus the
direction (Δx, Δy) can be regarded as the trust region direction of the potential
function φ on S_{++} with respect to the metric induced by

    ‖(Δx, Δy)‖ = ‖V^{-1} (Ȳ Δx + X̄ Δy)‖,

which defines a norm on the linear subspace

    {(Δx, Δy) ∈ R^{2n} : −M Δx + Δy = 0}.

If the matrix M is skew-symmetric (as in the LCP arising from a linear program), then
Δx^T Δy = Δx^T M Δx = 0, and we can see that

    ‖V^{-1} (Ȳ Δx + X̄ Δy)‖^2 = ‖D̄^{-1} Δx‖^2 + ‖D̄ Δy‖^2.

In this case, the direction (Δx, Δy) is an affinely scaled steepest descent feasible
direction of the potential function. The above scaling is often called the
primal-dual scaling and is regarded as a key to showing the best complexity to date
of the interior-point algorithms for the LCP. See [83], where an analysis is provided
in the case of using another scaling.

The first potential-reduction algorithm was proposed by Karmarkar [23] for solving
linear programming problems. While the original potential function was defined for
problems in primal form, a primal-dual potential function of the type (8.11) was
introduced by Todd and Ye [77] and also by Tanabe [74] in a multiplicative form. The
first O(√n L)-iteration potential-reduction algorithm for the LCP was proposed by
Kojima et al. [31], and further discussions of the algorithms can be found in [26] in
connection with the path-following algorithm. See Chapter 4 for an overview of the
potential-reduction algorithms developed so far.
In the discussion of the potential-reduction algorithm above, we only used the fixed
parameter β defined by (8.53). However, the polynomial-time complexity can be ensured
with other choices as well. See, for example, [26], where the authors also provided a
unified framework including the path-following algorithm and the potential-reduction
algorithm, based on the fact that the level set of the function φ_cen defined in
(8.11) gives a neighborhood of the path of centers (see Figure 8.3 and Figure 8.6).

8.5 COMPUTATIONAL COMPLEXITY OF


THE ALGORITHMS
In this section, we will discuss the complexity of the interior point algorithms for the
LCP satisfying Condition 8.2.2. Each of the two algorithms proposed in the previous
section starts from an initial point (XO, yO) in the feasible-interior region and generate
an approximate solution (x K , yK) after a finite number J{ of iterations. See Theorem
8.4.4 and Theorem 8.4.8. Here the following two important questions remain to be
solved:

(5a) How can we prepare such an initial point (X O, yO)?


(5b) How can we obtain an exact solution (x*, y*) of the LCP from the approximate
solution (x K , yK)?

In the succeeding subsections, we intend to answer both of the above questions. Here
we impose one more condition to define the size of the LCP, which is a necessary
concept for discussing the polynomiality of algorithms.

Condition 8.5.1 All the elements of the matrix M and the vector q in the Lep
(8.2) are integral.

Under the condition above, the size L of the LCP is defined by

L = Σ_{i=1}^{n} Σ_{j=1}^{n} ⌈log₂(|m_ij| + 1)⌉ + Σ_{i=1}^{n} ⌈log₂(|q_i| + 1)⌉ + 2⌈log₂(n + 1)⌉ + n(n + 1),

where m_ij and q_i denote the (i, j)-th element of the coefficient matrix M and the
i-th element of the constant vector q of the LCP (8.2), respectively, and ⌈z⌉ is the
smallest integer not less than z ∈ ℝ. It follows from the definition of L that every
minor of the matrix (−M I q) is integral and its absolute value is less than 2^L/n².
As we will see below, the polynomial-time complexity O(√n L) of the interior point
algorithms can be derived from this fact.
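To make the definition concrete, here is a small Python sketch that computes L from
integer data; the function name and the use of NumPy are our own illustrative choices,
not part of the chapter.

```python
import numpy as np

def lcp_size_L(M, q):
    """Size L of an integral LCP (M, q):
    L = sum_{i,j} ceil(log2(|m_ij|+1)) + sum_i ceil(log2(|q_i|+1))
        + 2*ceil(log2(n+1)) + n*(n+1).
    For a nonnegative integer k, ceil(log2(k+1)) equals k.bit_length()."""
    M, q = np.asarray(M), np.asarray(q)
    n = q.size
    bits = lambda k: abs(int(k)).bit_length()
    L = sum(bits(m) for m in M.ravel()) + sum(bits(qi) for qi in q)
    L += 2 * bits(n) + n * (n + 1)
    return L
```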

8.5.1 Initial Point

A most practical way to overcome the difficulty of finding a feasible-interior initial
point is to use infeasible-interior-point algorithms, which allow us to start from a
point (x^0, y^0) > 0 that does not necessarily satisfy (x^0, y^0) ∈ S_af. See Chapter 5 for a
detailed description of such algorithms. We propose here another method, for which the
results described so far can be easily adopted and for which the best complexity bound
to date has been shown.

Let x^0 ∈ ℝ^n_{++}. In general, the point x^0 does not necessarily satisfy M x^0 + q > 0.
However, if we introduce the vector e ∈ ℝ^n of ones and a new variable x_{n+1} ∈ ℝ, then
we can easily find x^0_{n+1} > 0 and y^0 satisfying

y^0 = M x^0 + x^0_{n+1} e + q > 0.
We extend this idea and construct an artificial monotone LCP:

LCP': Find (x', y') ∈ ℝ^{2(n+1)}
such that y' = M'x' + q', (x', y') ≥ 0,    (8.58)
x'_i y'_i = 0, i = 1, 2, ..., n+1,

with

x' = (x, x_{n+1}),  y' = (y, y_{n+1}),  M' = ( M  e ; −e^T  0 ),  q' = (q, q_{n+1}).    (8.59)

Here x_{n+1} ∈ ℝ and y_{n+1} ∈ ℝ are artificial variables and q_{n+1} is a positive constant.
We also use the symbols S'_+, S'_{++} and S'_{cp} for the set of all feasible points, all feasible-
interior-points and all complementary solutions of the LCP', respectively:

S'_+ = {(x', y') ≥ 0 : y' = M'x' + q'},
S'_{++} = {(x', y') > 0 : y' = M'x' + q'},
S'_{cp} = {(x', y') ∈ S'_+ : x'_i y'_i = 0, i = 1, 2, ..., n+1}.

For a given q_{n+1} > 0, we can easily find an x^0 > 0 and x^0_{n+1} > 0 for which

e^T x^0 < q_{n+1},  M x^0 + x^0_{n+1} e + q > 0

hold. Hence, the LCP' has an initial interior point

(x'^0, y'^0) = (x^0, x^0_{n+1}, y^0, y^0_{n+1}) > 0

with

y^0 = M x^0 + x^0_{n+1} e + q,  y^0_{n+1} = −e^T x^0 + q_{n+1}.    (8.60)
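The construction (8.58)-(8.60) is easy to set up in code. The following Python sketch
builds M' and q' and one admissible interior starting point; the particular choice of x^0
and x^0_{n+1} below is only a convenient possibility (it is not the specific point (8.66) used
later), and all function names are ours.

```python
import numpy as np

def build_artificial_lcp(M, q, q_extra):
    """M' = [[M, e], [-e^T, 0]], q' = (q, q_{n+1}) as in (8.59);
    q_extra plays the role of the positive constant q_{n+1}."""
    n = len(q)
    e = np.ones((n, 1))
    M1 = np.block([[np.asarray(M, float), e],
                   [-e.T, np.zeros((1, 1))]])
    return M1, np.append(q, q_extra)

def an_interior_point(M, q, q_extra):
    """Some (x'^0, y'^0) > 0 feasible for LCP', in the spirit of (8.60)."""
    M = np.asarray(M, float)
    n = len(q)
    e = np.ones(n)
    x0 = e.copy()                                 # any x^0 > 0 with e^T x^0 < q_{n+1}
    x0_last = max(1.0, 1.0 - (M @ x0 + q).min())  # makes M x^0 + x_{n+1} e + q > 0
    y0 = M @ x0 + x0_last * e + q
    y0_last = -e @ x0 + q_extra                   # positive whenever q_{n+1} > e^T x^0
    assert y0.min() > 0 and y0_last > 0
    return np.append(x0, x0_last), np.append(y0, y0_last)
```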
Note that the matrix M' defined by (8.59) is positive semi-definite whenever M
is positive semi-definite. Hence, if we replace the problem LCP in Condition 8.2.2
and Condition 8.2.4 with the artificial problem LCP' (8.58), then the LCP' satisfies
both of these conditions. The next lemma follows from Theorem 8.3.6:

Lemma 8.5.2 Suppose that the LCP (8.2) satisfies Condition 8.2.2. Construct the
artificial problem LCP' (8.58) by using (8.59). Then LCP' (8.58) is a monotone
LCP which has a feasible-interior-point (x'^0, y'^0) ∈ S'_{++} for every positive constant
q_{n+1} > 0. Moreover, the LCP' has a solution (x'*, y'*) ∈ S'_{cp}.

Therefore, we may apply feasible-interior-point algorithms to the artificial
problem LCP' (8.58). The remaining question is what kind of information
can be obtained from a solution of the LCP' concerning the solution set S_cp of the
original problem. To see this, let us define L̲ and L as follows:

L̲ = Σ_{i=1}^{n} Σ_{j=1}^{n} log₂(|m_ij| + 1) + Σ_{i=1}^{n} log₂(|q_i| + 1) + 2 log₂ n,    (8.61)

L = Σ_{i=1}^{n} Σ_{j=1}^{n} ⌈log₂(|m_ij| + 1)⌉ + Σ_{i=1}^{n} ⌈log₂(|q_i| + 1)⌉ + 2⌈log₂(n + 1)⌉
    + n(n + 1).    (8.62)

The above definitions imply that

2^{L̲} = n² ∏_{i=1}^{n} ∏_{j=1}^{n} (|m_ij| + 1) ∏_{i=1}^{n} (|q_i| + 1) ≤ 2^L.    (8.63)

By use of this inequality, we can obtain the following lemma (see [29, 51], etc.).

Lemma 8.5.3 Suppose that the LCP (8.2) satisfies Condition 8.2.2. Construct
the artificial problem LCP' (8.58) by using (8.59). Let a solution (x'*, y'*) =
(x*, x*_{n+1}, y*, y*_{n+1}) of the LCP' (8.58) be given. Then

(i) if x*_{n+1} = 0 then (x*, y*) is a solution of the LCP (8.2),

(ii) if x*_{n+1} > 0 then the LCP (8.2) has no solution in the set {(x, y) ∈ ℝ^{2n}_+ : e^T x <
q_{n+1}}, and

(iii) if q_{n+1} ≥ 2^{L̲}/n and x*_{n+1} > 0 then the LCP (8.2) has no solution.

If we turn our attention to the two interior point algorithms described in Section 4,
they need more strictly conditioned initial points (see Theorem 8.4.4 and Theorem
8.4.8). However, we can also resolve this problem by suitably setting x^0 and q_{n+1}. Let
us define the size L' of the artificial problem LCP' (8.58), the neighborhood N'(α)
of the path of centers and the potential function φ' associated with the artificial
problem LCP' (8.58) according to the definitions (8.62), (8.36) and (8.11) for the
original LCP:

L' = Σ_{i=1}^{n+1} Σ_{j=1}^{n+1} ⌈log₂(|m'_ij| + 1)⌉ + Σ_{i=1}^{n+1} ⌈log₂(|q'_i| + 1)⌉
     + 2⌈log₂((n+1) + 1)⌉ + (n+1)((n+1) + 1),    (8.64)

N'(α) = {(x', y') ∈ S'_{++} : ‖X'y' − (x'^T y'/(n+1)) e‖ ≤ α x'^T y'/(n+1)},

φ'(x', y') = ν' φ'_cp(x', y') + φ'_cen(x', y'),
φ'_cp(x', y') = log x'^T y',
φ'_cen(x', y') = (n+1) log x'^T y' − Σ_{i=1}^{n+1} log x'_i y'_i − (n+1) log(n+1).

In the lemma below, the parameter γ serves for leveling the values of x'^0_i y'^0_i (i =
1, 2, ..., n+1) and for bringing the initial point (x'^0, y'^0) close to the path of centers
of the LCP' (see [29, 30], etc.).

Lemma 8.5.4 Let n ≥ 2. Suppose that the LCP (8.2) satisfies Condition 8.5.1.
Construct the artificial problem LCP' (8.58) by using (8.59). Let

α ∈ (0, 5/2],

γ̲(M, q) = 2 max_{i∈{1,2,...,n}} { |[Me]_i|, |q_i| } ≥ 2,

γ̄(M, q) = 2^{L̲+1}/n²,    (8.65)

γ ∈ [γ̲, γ̄],  q_{n+1} = (n+1)γ,

where [Me]_i denotes the i-th component of the vector Me. Let the vector (x'^0, y'^0) =
(x^0, x^0_{n+1}, y^0, y^0_{n+1}) be defined by

x^0 = γ e,  x^0_{n+1} = (5/α) n γ²,

y^0 = M x^0 + x^0_{n+1} e + q = γ M e + (5/α) n γ² e + q,    (8.66)

y^0_{n+1} = −e^T x^0 + (n+1)γ = γ.

Then

(i) (x'^0, y'^0) ∈ S'_{++} and (x'^0, y'^0) ∈ N'(α),

(ii) φ'_cp(x'^0, y'^0) ≤ (3 + 10/α) L,  φ'_cen(x'^0, y'^0) ≤ α,  φ'(x'^0, y'^0) ≤ ν'(3 + 10/α) L + α, and

(iii) L ≤ L' ≤ 5L.
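Under this reading of (8.66), the specially centered starting point is immediate to
build; the sketch below is our own and only restates the formulas of the lemma.

```python
import numpy as np

def centered_initial_point(M, q, alpha, gamma):
    """Initial point of (8.66) for the artificial LCP':
       x^0 = gamma*e,  x^0_{n+1} = (5/alpha)*n*gamma**2,
       y^0 = M x^0 + x^0_{n+1} e + q,  y^0_{n+1} = gamma."""
    n = len(q)
    e = np.ones(n)
    x0 = gamma * e
    x0_last = (5.0 / alpha) * n * gamma ** 2
    y0 = np.asarray(M, float) @ x0 + x0_last * e + q
    y0_last = -e @ x0 + (n + 1) * gamma          # equals gamma
    return np.append(x0, x0_last), np.append(y0, y0_last)
```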

If we choose

γ = γ̄(M, q) = 2^{L̲+1}/n²,

then

q_{n+1} = (n+1) 2^{L̲+1}/n²,

and we see that q_{n+1} ≥ 2^{L̲}/n, i.e., the requirement in (iii) of Lemma 8.5.3 for q_{n+1}
is fulfilled. Therefore, the theorem below follows from the three lemmas above:

Theorem 8.5.5 Let n ≥ 2. Suppose that the LCP (8.2) satisfies Condition 8.5.1.
Construct the artificial problem LCP' (8.58) by using (8.59). Let α ∈ (0, 5/2],
γ ∈ [γ̲(M, q), γ̄(M, q)] and q_{n+1} = (n+1)γ, where γ̲(M, q) and γ̄(M, q) are defined
by (8.65). Let (x'^0, y'^0) = (x^0, x^0_{n+1}, y^0, y^0_{n+1}) be the initial point given by (8.66)
and let (x'*, y'*) = (x*, x*_{n+1}, y*, y*_{n+1}) be a solution of the artificial problem LCP'.
Then, the following results hold.

(i) (x'^0, y'^0) ∈ N'(α) for every α ∈ (0, 5/2].

(ii) φ'(x'^0, y'^0)/ν' = O(L) for every ν' > 0.

(iii) If x*_{n+1} = 0 then we have a solution (x*, y*) of the LCP (8.2); otherwise
the original LCP (8.2) has no solution in the set {(x, y) ∈ ℝ^{2n}_+ : e^T x < q_{n+1}}.

(iv) The input size L' of the LCP' defined by (8.64) satisfies L ≤ L' ≤ 5L.

In particular, if γ = γ̄(M, q) then the assertion (iii) is replaced by (iii)':

(iii)' If x*_{n+1} = 0 then we have a solution (x*, y*) of the LCP (8.2); otherwise
the original LCP (8.2) has no solution.

To find whether the original LCP has a solution or not, our analysis requires the
use of the number γ = γ̄(M, q) = 2^{L̲+1}/n², which often becomes extraordinarily large
for practical use. On the other hand, we can compute the number

γ̲(M, q) = 2 max_{i∈{1,2,...,n}} { |[Me]_i|, |q_i| }

in O(n²) arithmetic operations. A practical method may be to start with a suitable
γ ≥ γ̲(M, q) and to update it until the set {(x, y) ∈ ℝ^{2n}_+ : e^T x < q_{n+1}} with
q_{n+1} = (n+1)γ becomes sufficiently large. Kojima et al. [32] proposed a method
of updating γ based on the results of Mangasarian [37, 38, 39].
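As a sanity check of the O(n²) claim, γ̲(M, q) costs essentially one matrix-vector
product; a tiny Python sketch (helper name ours):

```python
import numpy as np

def gamma_lower(M, q):
    """gamma_(M, q) = 2 * max_i { |[Me]_i|, |q_i| }; the dominant cost is
    the single product M e, i.e. O(n^2) arithmetic operations."""
    Me = np.asarray(M, float) @ np.ones(len(q))
    return 2.0 * max(np.abs(Me).max(), np.abs(np.asarray(q, float)).max())
```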

Recently, Ye [87] proposed another type of artificial problem for the monotone LCP.
The problem is given by the following homogeneous model:

RCP: Find (x, x_{n+1}, y, y_{n+1}) ∈ ℝ^{2(n+1)}
such that y = x_{n+1} ( M(x/x_{n+1}) + q ),  y_{n+1} = −x^T ( M(x/x_{n+1}) + q ),    (8.67)
(x, x_{n+1}, y, y_{n+1}) ≥ 0,  x_i y_i = 0, i = 1, 2, ..., n+1,

which is an extension of the homogeneous self-dual linear programming models proposed
in [89], [82] and so on. While the problem (8.67) loses the linearity of the original
problem, this approach brought a remarkable development: it succeeded in providing
an O(√n L) interior point algorithm for solving the problem (8.67) by which
the feasibility and/or the solvability of the original LCP can be detected without any
use of a big parameter as in Theorem 8.5.5. The approach has also been generalized
to other problems (see [1, 13], etc.).
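Under the reading of (8.67) given above, the homogeneous map is easy to write down;
the residual function below (our own helper, with our own names) makes the loss of
linearity visible through the division by x_{n+1}.

```python
import numpy as np

def homogeneous_residual(x, x_last, y, y_last, M, q):
    """Residuals of the two equations in (8.67):
       y       - x_{n+1} * ( M (x/x_{n+1}) + q )
       y_{n+1} + x^T     * ( M (x/x_{n+1}) + q )
    Both vanish at a solution of the homogeneous model."""
    w = np.asarray(M, float) @ (x / x_last) + q
    return y - x_last * w, y_last + x @ w
```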

8.5.2 Stopping Criterion and Computational Complexity

In Section 4, we showed that each of the path-following algorithm and the potential-
reduction algorithm provides an approximate solution (x^K, y^K) ∈ S_{++} with x^{K T} y^K ≤
ε in a finite number K of iterations, for every ε > 0 (see Theorem 8.4.4 and Theorem
8.4.8). The lemma below gives us a sufficient criterion on ε for computing an exact
solution (x*, y*) ∈ S_cp of the LCP. Its proof is based on the fact that each basic
component of a basic feasible solution of the system

y = Mx + q,  (x, y) ≥ 0

can be represented as a ratio ∆₁/∆₂, where ∆₁ is a minor of order n of the matrix
(−M I q) and ∆₂ a nonzero minor of order n of the matrix (−M I) (see [29, 26],
etc.):

Lemma 8.5.6 Let n ≥ 2. Assume that Condition 8.5.1 holds. Suppose that
(x, y) ∈ S_+ satisfies x^T y ≤ 2^{−2L̲}. Define the index sets I and J by

I = {i ∈ {1, 2, ..., n} : x_i ≤ 2^{−L̲}}  and  J = {j ∈ {1, 2, ..., n} : y_j ≤ 2^{−L̲}}.    (8.68)

Then there exists a solution (x*, y*) of the LCP (8.2) satisfying

(x*, y*) ∈ S_+,  x*_i = 0 for every i ∈ I and y*_j = 0 for every j ∈ J.    (8.69)
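Extracting the index sets of (8.68) from an approximate solution is immediate; a short
Python sketch (names ours):

```python
import numpy as np

def rounding_index_sets(x, y, L):
    """Index sets I and J of (8.68): coordinates already below the
    threshold 2^{-L} in an approximate solution with x^T y <= 2^{-2L}."""
    tol = 2.0 ** (-L)
    I = np.flatnonzero(np.asarray(x) <= tol)
    J = np.flatnonzero(np.asarray(y) <= tol)
    return I, J
```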

Though the lemma above only ensures the existence of an exact solution (x*, y*)
of the LCP, a method has been proposed for computing the solution (x*, y*) from
the approximate solution (x, y) in O(n³) arithmetic operations (see Appendix B of
[29]). Combining the results in Section 4 and the discussion above, let us derive the
computational complexity of the two feasible-interior-point algorithms of Section 4.
Suppose that the LCP (8.2) satisfies Condition 8.5.1. Theorem 8.5.5 implies that
we can start both algorithms for solving the artificial problem LCP' (8.58) from the
initial point (x'^0, y'^0) ∈ S'_{++} described in the theorem. From (ii) of Lemma 8.5.4,
the initial point (x'^0, y'^0) satisfies the equalities

x'^{0 T} y'^0 = 2^{O(L)}  and  2^{φ'(x'^0, y'^0)/ν'} = 2^{O(L)}.

Thus, by each of the algorithms, an approximate solution (x'^K, y'^K) ∈ S'_{++} with
x'^{K T} y'^K ≤ ε can be obtained after

K = O( √n log(2^{O(L)}/ε) )

iterations (see Theorem 8.4.4 and Theorem 8.4.8). If we take ε = 2^{−2L'} then we
obtain an exact solution (x'*, y'*) ∈ S'_{cp} of the artificial problem LCP', and if we
take γ = γ̄(M, q) as in Theorem 8.5.5 then we can determine whether the original
LCP (8.2) has a solution or not from the solution (x'*, y'*). Note that the input
size L' of the artificial problem LCP' (8.58) satisfies (iii) of Lemma 8.5.4. Thus the
required number of iterations turns out to be

K = O(√n L)

in each of the algorithms. It should be noted that each iteration requires
O((n+1)³) = O(n³) arithmetic operations, which are mainly due to the calculation of the
search direction satisfying the system (8.40). Additionally, the last iteration needs
O(n³) arithmetic operations to refine the solution. Summarizing the discussions
above, we finally obtain the following theorem:

Theorem 8.5.7 Suppose that the LCP (8.2) satisfies Condition 8.2.2 and Condi-
tion 8.5.1. Construct the artificial problem (8.58) as in Theorem 8.5.5 and apply the
feasible-interior-point algorithms described in Section 4 for solving the LCP' (8.58).
Then, in each of the cases, we can either find an exact solution of the original LCP or
determine that the original LCP has no solution in O(√n L) iterations with O(n^{3.5} L)
arithmetic operations.

If we combine the path-following algorithm with a way of using approximate scaling
matrices for computing the search directions, the average number of arithmetic op-
erations per iteration can be theoretically reduced to O(n^{2.5}), and the total number
of operations to O(n³ L), which is the best bound to date (see [23, 29, 49], etc.).

8.6 FURTHER DEVELOPMENTS AND EXTENSIONS

In this section, we briefly describe some further developments and extensions of
the interior point algorithms for the CP (8.1) which we have not mentioned in the
previous sections.

The algorithms that appeared in Section 4 are based on the idea of using the Newton
direction, i.e., the solution of the system (8.40) with a fixed β, at each iteration. How-
ever, there have been many algorithms outside of this framework. One such
algorithm is the so-called predictor-corrector algorithm, which uses the affine direc-
tion (∆x^a, ∆y^a) (the solution of (8.40) with β = 0) and the centering direction
(∆x^c, ∆y^c) (the solution of (8.40) with β = 1) alternately as the iteration proceeds.
A remarkable feature of this algorithm is that not only polynomial-time properties
but also various asymptotic convergence properties of the generated sequence have
been reported under certain assumptions ([20, 19, 21, 22, 48, 47, 55,
57, 56, 67, 69, 80, 88], etc.). Among others, Ye and Anstreicher [88] showed quadratic
convergence of the feasible predictor-corrector algorithm for the monotone LCP un-
der the assumption that a strictly complementary solution exists. Wright [80] and
Potra and Sheng [69] proved superlinear or quadratic convergence of infeasible predictor-
corrector algorithms for the LCP under the same assumption. Monteiro and Wright
[55] investigated the behavior of feasible and/or infeasible predictor-corrector
algorithms for the monotone LCP when the LCP is degenerate,
and Mizuno [47] succeeded in weakening the assumption and deriving superlinear
convergence of an infeasible predictor-corrector algorithm for solving a geometrical
(or general) LCP (8.7) which has a solution (not necessarily strictly complementary).
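The alternation just described is easy to sketch. The following Python fragment is a
much-simplified illustration, not any of the cited algorithms: it assumes a feasible
monotone LCP, takes the right-hand side of (8.40) to be β(x^T y/n)e − Xy as in the
relation used in Lemma 8.4.2, and uses a crude damped step length of our own choosing.

```python
import numpy as np

def direction(x, y, M, beta):
    """Solve (8.40):  Y dx + X dy = beta*(x^T y/n) e - X y,  -M dx + dy = 0."""
    n = len(x)
    rhs = beta * (x @ y / n) * np.ones(n) - x * y
    dx = np.linalg.solve(np.diag(y) + np.diag(x) @ M, rhs)
    return dx, M @ dx

def predictor_corrector(x, y, M, q, iters=20):
    """Alternate an affine (predictor, beta=0) and a centering
    (corrector, beta=1) step, keeping (x, y) strictly positive."""
    assert np.allclose(y, np.asarray(M) @ x + q)        # feasible start
    for _ in range(iters):
        for beta in (0.0, 1.0):
            dx, dy = direction(x, y, M, beta)
            ratios = [1.0] + [-v / d for v, d in zip(np.concatenate([x, y]),
                                                     np.concatenate([dx, dy])) if d < 0]
            t = 0.9 * min(ratios)                        # damped step to the boundary
            x, y = x + t * dx, y + t * dy
    return x, y
```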

Another type of algorithm is given in [15], where a new class of search directions is
introduced. Each direction in this class is given by the solution of a system whose
left-hand side is the same as in (8.40),

Y∆x + X∆y = (a right-hand side depending on r),  −M∆x + ∆y = 0,

where (x, y) ∈ S_{++} and r is a nonnegative real number. If we take r = 0 then the so-
lution of the above system is equivalent to the affine direction (∆x^a, ∆y^a). However,
in the case r > 0, the solution cannot be represented as a linear combination
of the affine direction (∆x^a, ∆y^a) and the centering direction (∆x^c, ∆y^c). See [15]
for the theoretical results, including a polynomial complexity bound, for this type of
algorithm.

In order to show the existence of the path of centers for the monotone LCP, we only
used some specific properties of the problem (see Section 3). In fact, Kojima et al.
[26] showed that there exists a path of centers S_cen converging to a solution under
the following condition (see Theorem 4.4 of [26]):

Condition 8.6.1

(i) The matrix M of the LCP (8.2) belongs to the class P₀ of matrices with all
principal minors nonnegative.

(ii) A feasible-interior-point (x^0, y^0) ∈ S_{++} is known.

(iii) The level set S_+(τ) = {(x, y) ∈ S_+ : x^T y ≤ τ} of the objective function of the
model M1 (8.8) is bounded for every τ ≥ 0.

Thus, the condition above may be considered as a sufficient condition on the LCP
for ensuring the global convergence of feasible-interior-point algorithms. To derive
polynomiality of the algorithms, we repeatedly used Lemma 8.3.1, brought by the
monotonicity assumption on the LCP (see Section 4). Among others, the asser-
tion (ii) of this lemma is essential for deriving the bounds (8.21) and (8.22) concerning
(∆x, ∆y). However, similar bounds can also be obtained as long as the value of
∆x^T ∆y is bounded from below. Based on this observation, the class of P_*-matrices
was first introduced in [26]. According to the definition in [26], the class P_* is the
union of the classes P_*(κ) over κ ≥ 0, where P_*(κ) (κ ≥ 0) consists of the
matrices M such that

(1 + 4κ) Σ_{i∈I₊(ξ)} ξ_i [Mξ]_i + Σ_{i∈I₋(ξ)} ξ_i [Mξ]_i ≥ 0  for every ξ ∈ ℝⁿ.    (8.70)

Here [Mξ]_i denotes the i-th component of the vector Mξ and

I₊(ξ) = {i ∈ {1, 2, ..., n} : ξ_i [Mξ]_i > 0},  I₋(ξ) = {i ∈ {1, 2, ..., n} : ξ_i [Mξ]_i < 0}.
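Testing the defining inequality (8.70) for a given matrix, value of κ, and test vector is
a few lines of Python; the names below are ours, and random sampling can of course
only refute membership in P_*(κ), never prove it.

```python
import numpy as np

def p_star_lhs(M, xi, kappa):
    """Left-hand side of (8.70) for one vector xi: a negative value shows
    that M is not a P_*(kappa)-matrix."""
    t = xi * (np.asarray(M, float) @ xi)          # components xi_i [M xi]_i
    return (1.0 + 4.0 * kappa) * t[t > 0].sum() + t[t < 0].sum()

# A positive semidefinite matrix already satisfies (8.70) with kappa = 0:
M = np.array([[2.0, -1.0], [-1.0, 2.0]])
rng = np.random.default_rng(0)
print(min(p_star_lhs(M, rng.standard_normal(2), 0.0) for _ in range(1000)))
```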
Let PSD be the class of positive semi-definite matrices, P be the class of matrices
with positive principal minors, and CS and RS be the classes of column-sufficient and
row-sufficient matrices, respectively. Some known implications are

PSD ⊂ P_* ⊂ CS,  P ⊂ P_*,  P_* = CS ∩ RS

(see [5, 26, 79], etc.). Concerning the LCP with a P_*-matrix, the following results
have been shown (see Lemma 4.5 and Lemma 3.4 of [26]):

Lemma 8.6.2 Suppose that the matrix M in (8.2) is a P_*-matrix and that Condi-
tion 8.2.4 holds. Then Condition 8.6.1 holds.

Lemma 8.6.3 If the matrix M belongs to the class P_*(κ) with κ ≥ 0 then, for every
(x, y) ∈ ℝ^{2n}_{++},

(i) the matrix

M̂ = ( Y  X ; −M  I )

is nonsingular, hence the system (8.19) has a unique solution (∆x, ∆y) for every
h ∈ ℝⁿ, and

(ii) (∆x, ∆y) satisfies the following inequalities:

−κ ‖X^{−1/2}Y^{−1/2}h‖² ≤ ∆x^T ∆y ≤ (1/4) ‖X^{−1/2}Y^{−1/2}h‖²,

‖D^{−1}∆x‖² + ‖D∆y‖² = ‖X^{−1/2}Y^{−1/2}h‖² − 2∆x^T ∆y
    ≤ (1 + 2κ) ‖X^{−1/2}Y^{−1/2}h‖².

Here X^{−1/2} (Y^{−1/2}) denotes the diagonal matrix whose diagonal components are x_i^{−1/2}
(y_i^{−1/2}) (i = 1, 2, ..., n), and D = X^{1/2}Y^{−1/2}.

Therefore, the path of centers exists under the assumptions in Lemma 8.6.2 and
we can analyze the one-step behavior of the algorithms using Lemma 8.6.3 as in
Section 4. It has been proved that the LCP with a P_*(κ)-matrix M can be solved in
O(√n (1 + κ) L) iterations by constructing a suitable artificial problem (see [26]).
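The system (8.19) behind Lemma 8.6.3 reduces, after eliminating ∆y = M∆x, to a
single n × n linear solve; the sketch below (our own code) also checks the bounds of
part (ii) numerically for a monotone, i.e. P_*(0), instance.

```python
import numpy as np

def newton_direction(x, y, M, h):
    """Solve (8.19):  Y dx + X dy = h,  -M dx + dy = 0,
    via (Y + X M) dx = h and dy = M dx (one O(n^3) solve)."""
    dx = np.linalg.solve(np.diag(y) + np.diag(x) @ M, h)
    return dx, M @ dx

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
M = A @ A.T                                  # positive semidefinite, hence P_*(0)
x, y = rng.random(4) + 0.1, rng.random(4) + 0.1
h = rng.standard_normal(4)
dx, dy = newton_direction(x, y, M, h)
bound = 0.25 * np.linalg.norm(h / np.sqrt(x * y)) ** 2
print(dx @ dy >= -1e-12, dx @ dy <= bound + 1e-12)   # bounds of Lemma 8.6.3 (ii)
```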

As described in Section 2, there are various types of LCPs, such as the MLCP (8.5),
the HLCP (8.6), the GLCP (8.7), etc. Recently, the LCP with a P_*-matrix has
attracted much attention, partly because it relates these problems to each other. Let
us define the P_*(κ)-property for these problems as follows:

MLCP: The matrix M is a P_*(κ)-matrix.

HLCP: The equation M₁x + M₂y = 0 implies that x^T y ≥ −4κ Σ_{i∈I₊} x_i y_i.

GLCP: The dimension of Φ is n and x^T y ≥ −4κ Σ_{i∈I₊} x_i y_i for every (x, y) ∈ Φ.

Here I₊ = {i : x_i y_i > 0}. Anitescu, Lesaja and Potra [2] showed that the P_*(κ)-property
is invariant under some transformations which convert the above types of LCPs into
each other.

It should be noted that the LCPs discussed so far constitute only a part of the wide
class of LCPs, and that there are many other LCPs for which no polynomial-time
algorithm has been provided yet. It is known that the general P₀-matrix LCP,
i.e., the LCP for which only the requirement (i) of Condition 8.6.1 is ensured, is NP-
complete (see Section 3.4 of [26]), while the Newton direction for the system
(8.17) can still be computed (see Lemma 4.1 of [26]). See also [44] for an attempt to
determine the complexity of another class of LCPs.

The nonlinear CP is another important problem in the field of interior point al-
gorithms. Kojima et al. [27] extended the results in [29] to a class of nonlinear
complementarity problems, and this work was continued in [28] and in [25]. In these
papers, the following three conditions are proposed:

Condition 8.6.4

(i) The mapping f is a P₀-function, i.e., for every x¹ ∈ ℝⁿ and x² ∈ ℝⁿ with
x¹ ≠ x², there exists an index i ∈ {1, 2, ..., n} such that

x¹_i ≠ x²_i and (x¹_i − x²_i)(f_i(x¹) − f_i(x²)) ≥ 0.

(ii) The set

S_{++}(f) = {(x, y) ∈ ℝ^{2n} : y = f(x), (x, y) > 0}    (8.71)

is nonempty.

(iii) The set H^{−1}(C) = {(x, y) ∈ ℝ^{2n}_+ : H(x, y) ∈ C} is bounded for every compact
subset C of ℝⁿ_+ × v(ℝ^{2n}_{++}). Here

H(x, y) = (u(x, y), v(x, y)),
u(x, y) = (x₁y₁, x₂y₂, ..., x_n y_n),  v(x, y) = y − f(x),
v(ℝ^{2n}_{++}) = {v ∈ ℝⁿ : v = y − f(x) for some (x, y) ∈ ℝ^{2n}_{++}}.

Condition 8.6.5 The mapping f is a uniform P-function, i.e., there exists a positive
number γ such that for every x¹ ∈ ℝⁿ and x² ∈ ℝⁿ,

max_{i∈{1,...,n}} (x¹_i − x²_i)(f_i(x¹) − f_i(x²)) ≥ γ ‖x¹ − x²‖².

Condition 8.6.6

(i) The mapping f is a monotone function, i.e., for every x¹ ∈ ℝⁿ and x² ∈ ℝⁿ,

(x¹ − x²)^T (f(x¹) − f(x²)) ≥ 0.

(ii) The set

S_{++}(f) = {(x, y) ∈ ℝ^{2n} : y = f(x), (x, y) > 0}

is nonempty.

Condition 8.6.4 and Condition 8.6.6 may be regarded as extensions, to the nonlinear
case, of Condition 8.6.1 and of the assumption on the LCP (8.2) in Theorem 8.3.6,
respectively. The main contributions of these studies are to prove the existence of
the path of centers under Condition 8.6.4 and the implication that if Condition 8.6.5
or Condition 8.6.6 holds then so does Condition 8.6.4. Güler [12] extended the former
result to the CP with a maximal monotone map and established the existence of the
path of centers for the problem. It should be noted that a theoretical background of
these results can be found in the fundamental work of McLinden [41].

Concerning algorithms for nonlinear CPs, Kojima et al. [25] provided a homotopy
continuation method which traces the center trajectory and globally converges to a
solution of the CP (8.1) under Condition 8.6.4. In [33], a more general framework
for globally convergent infeasible-interior-point algorithms is described in terms
of the global convergence theory given by Polak [66] in 1971. While the papers
mentioned above consider the global convergence properties of the algorithms as
their main aim, the study of convergence rates has also become active for
smooth convex programming (see Chapter 8). In order to derive a convergence
rate, we must impose certain conditions on the smoothness of the nonlinear mapping
f. For the variational inequality problem, Nesterov and Nemirovsky [62] analyzed
the convergence rate of Newton's method in terms of the so-called self-concordant
barrier under the following condition on f:

Condition 8.6.7 The mapping f is a C²-smooth monotone operator f : ℝⁿ_+ → ℝⁿ
which is β-compatible with F(x) = −Σ_{i=1}^{n} ln x_i, i.e., there exists a β ≥ 0 such that
for all x > 0 and h_i ∈ ℝⁿ (i = 1, 2, 3), the inequality

|f''(x)[h₁, h₂, h₃]| ≤ 3^{3/2} β ∏_{i=1}^{3} { (f'(x)[h_i, h_i])^{1/3} ‖X^{−1}h_i‖^{1/3} }

holds.

It has been shown that the barrier function f_t(x) = (1 + β)²{t f(x) + X^{−1}e} is self-
concordant for every t > 0 under the condition above (see [62]). The concept of
self-concordance was originally given in the study of barrier function methods for
solving convex programming ([59, 60, 61], etc.). Independently of this approach,
Jarre [17] introduced a smoothness condition which can be regarded as a relative
Lipschitz condition. These two conditions brought many theoretical results to convex
programming, which can be seen in Nesterov and Nemirovsky [62], Jarre [18], Den
Hertog [6], and so on. See also Chapter 8. A result that can be seen in all these
studies is that self-concordant mappings satisfy a modified version of the condition
proposed in [17]. In view of the CP, the condition reads as follows:

Condition 8.6.8 The mapping f is a continuous and differentiable monotone op-
erator f : ℝⁿ_+ → ℝⁿ and satisfies the relative Lipschitz condition, i.e., there exists a
β ≥ 0 such that the mapping f_t(x) = (1 + β)²{t f(x) + X^{−1}e} satisfies the inequality

|h^T (∇f_t(x²) − ∇f_t(x¹)) h| ≤ ( (1/(1 − r))² − 1 ) h^T ∇f_t(x¹) h

for all t > 0 and x¹, x² ∈ ℝⁿ for which r := √((x¹ − x²)^T ∇f_t(x¹)(x¹ − x²)) < 1, and
all h ∈ ℝⁿ.

On the other hand, Potra and Ye [70] presented a potential reduction algorithm for
the monotone CP and derived global and local convergence rates of the algorithm.
They used the so-called scaled Lipschitz condition below which was introduced by
Zhu [91] for convex programming problems and used by Kortanek et al. [35] for
an analysis of a primal-dual method for entropy optimization problems, by Sun et
al. [73] for the min-max saddle point problems, and by Andersen and Ye [1] for the
monotone LCP embedded in a homogeneous problem.

Condition 8.6.9 The mapping f is a continuous and differentiable monotone op-
erator f : ℝⁿ_+ → ℝⁿ and satisfies the scaled Lipschitz condition, i.e., there is a
nondecreasing function ψ(α) such that

‖X( f(x + h) − f(x) − ∇f(x)h )‖ ≤ ψ(α) h^T ∇f(x) h    (8.72)

for all x > 0 and h satisfying ‖X^{−1}h‖ ≤ α.

Moreover, Jansen et al. [16] introduced the following condition for the mapping f:

Condition 8.6.10 The mapping f is a continuous and differentiable operator and
there exists a constant κ ≥ 0 such that the Jacobian ∇f(x) of the mapping f is a
P_*(κ)-matrix defined by (8.70) for all x ≥ 0. Furthermore, there exist a δ > 0
and a γ ≥ 0 such that

‖D( f(x + θ∆x) − f(x) − ∇f(x)∆x )‖ ≤ γ ‖D ∇f(x)∆x‖

for every ∆x satisfying ‖(X^{−1} + Y^{−1}∇f(x))∆x‖ ≤ 1 and θ ∈ (0, δ], (x, y) ∈
S_{++}(f). Here the set S_{++}(f) is defined by (8.71) and D = X^{1/2}Y^{−1/2}.

In [16], the authors showed the global convergence rate of a class of affine-scaling al-
gorithms of [15] under Condition 8.6.10, and provided some relationships among the
four conditions above. Note that the definition (8.72) of the scaled Lipschitz con-
dition implies that h^T ∇f(x) h ≥ 0 for every x > 0, which excludes non-monotone
mappings f a priori. Even in the linear case, i.e., when f is given by f(x) = Mx + q,
Condition 8.6.9 does not necessarily hold for P_*-matrices. On the other hand, Condition
8.6.10 needs no monotonicity and holds for any linear mapping, which may be con-
sidered a merit of that condition.

Other remarkable aspects of interior point algorithms for the CP are the devel-
opment of infeasible-interior-point algorithms and the extensions to semidefinite
programming. See Chapters 5 and 9 for the progress on these topics.

8.7 PROOFS OF LEMMAS AND THEOREMS

Proof of Lemma 8.2.5:

(i): Let us consider the optimization model M1 (8.8). The objective func-
tion x^T y can be rewritten as

x^T y = (1/2) (x^T, y^T) ( 0  I ; I  0 ) ( x ; y ),

hence model M1 is a quadratic program whose objective function is bounded
from below on the feasible region S_+. Thus model M1 has an optimal
solution (x, y) ∈ S_+ (see the Appendix of Frank and Wolfe [8]) which satisfies
the Karush-Kuhn-Tucker conditions with Lagrange multiplier vectors z,
z_x and z_y:

y − M^T z − z_x = 0,  z_x ≥ 0,  z_x^T x = 0,
x + z − z_y = 0,  z_y ≥ 0,  z_y^T y = 0.

As y − z_x = M^T z and x − z_y = −z, it follows from monotonicity that

0 ≥ −z^T M z = (y − z_x)^T (x − z_y) = y^T x − z_x^T x − y^T z_y + z_x^T z_y.

Since z_x^T x = y^T z_y = 0, (x, y) ≥ 0 and (z_x, z_y) ≥ 0, we obtain the inequality

0 ≥ y^T x + z_x^T z_y ≥ 0,

which implies that x^T y = z_x^T z_y = 0.


(ii): Let (x, y) and (x', y') be arbitrary two solutions of the monotone LCP.
We claim that any convex combination of (x, y) and (x', y') is also a solution.
First, we observe that

0::; (x - x'f(y - y') = _xT y' - x,T y::; 0


since xTy = x,T y' = 0, (x,y) ~ 0 and (x',y') ~ O. Thus we have xTy' =
x,T Y = 0 for any (x,y) E Scp and (x',y') E Scpo Let us define
(X(A),y(A» = A(X,y) + (1- A)(X',y'), A E [0,1].
It is easily seen that

yeA)= MX(A) + q, (X(A), yeA»~ ~ o.


Using the fact x T y' = x,T Y = 0, we also see that

x(Af yeA) = (AX + (1- A)x'f(AY + (1- A)Y')


= A(1 - A)(xT y' + x,T y)
o.
Therefore, (X(A), y(A» E Scp for every (x, y) E Scp, (x', y') E Scp and
A E [0,1]. The assertion (ii) follows from this fact and the nonnegativity of
(x, y) E Scpo
(iii): We have already seen in (ii) that Scp C P. Since IxU1y = {I, 2, ... , n},
we also have that P C Scpo This completes the proof of the lemma. 0
Proof of Lemma 8.2.6:

We only show the second and the third parts of the lemma. The closed-
ness of the set S_+(τ) follows from the continuity of x^T y. Hence, it
suffices to show that the set S_+(τ) is bounded for every τ ≥ 0. Let (x̂, ŷ)
be a fixed feasible-interior-point whose existence is ensured by Condition
8.2.4. Then, by Condition 8.2.2, we obtain for every (x, y) ∈ S_+ the inequality

ŷ^T x + x̂^T y ≤ x^T y + x̂^T ŷ.

Hence, if (x, y) ∈ S_+(τ) then (x, y) belongs to the bounded set

{(x, y) ∈ ℝ^{2n} : (x, y) ≥ 0, ŷ^T x + x̂^T y ≤ τ + x̂^T ŷ}.

The last part of the lemma follows from the fact S_cp = S_+(0) and the
assertion (ii) of Lemma 8.2.5. □

Proof of Lemma 8.3.1:

(i): Let us assume that M̂ is singular. Then there exists a 2n-dimensional
vector (d₁, d₂) ≠ 0 which satisfies

Y d₁ + X d₂ = 0,  −M d₁ + d₂ = 0.

Hence we obtain the equation

(X^{−1}Y + M) d₁ = 0.

However, since the matrix X^{−1}Y + M is positive definite for every (x, y) >
0 whenever M is positive semi-definite, the above equation implies that
d₁ = 0 and d₂ = M d₁ = 0, which contradicts (d₁, d₂) ≠ 0. Thus we have
shown (i).

(ii): Since the matrix M is positive semi-definite, the equation ∆y = M ∆x
ensures that

0 ≤ ∆x^T M ∆x = ∆x^T ∆y.

On the other hand, the equation Y ∆x + X ∆y = h implies that

(X^{1/2}Y^{1/2})^{−1} Y ∆x + (X^{1/2}Y^{1/2})^{−1} X ∆y = D^{−1}∆x + D∆y = (X^{1/2}Y^{1/2})^{−1} h.

Thus, we obtain the inequality (8.21) as follows:

∆x^T ∆y = (D^{−1}∆x)^T (D∆y)
= (1/4) { ‖D^{−1}∆x + D∆y‖² − ‖D^{−1}∆x − D∆y‖² }
≤ (1/4) ‖D^{−1}∆x + D∆y‖²
= (1/4) ‖X^{−1/2}Y^{−1/2}h‖².

The equation (8.22) immediately follows from

‖D^{−1}∆x + D∆y‖² = ‖D^{−1}∆x‖² + 2∆x^T ∆y + ‖D∆y‖²

and ∆x^T ∆y ≥ 0. □

Proof of Lemma 8.3.3:

It is easily seen that the objective function ψ(μ, ·) in (8.33) can be
rewritten as

ψ(μ, x, y) = x^T y − μ Σ_{i=1}^{n} log x_i y_i = Σ_{i=1}^{n} (x_i y_i − μ log x_i y_i).

The function ξ − μ log ξ is strictly convex on ℝ_{++} and attains its minimum
at ξ = μ. Hence the point (x, y) ∈ S_{++} satisfying Xy = μe is an
optimal solution of L(μ). □

Proof of Lemma 8.3.4:

Let μ > 0 be fixed. The Hessian matrix of the objective function φ(μ, ·)
at (x, y) ∈ S_{++} is given by

( μX^{−2}  I ; I  μY^{−2} ).

Let (x', y') and (x'', y'') be arbitrary points in the set S_af such that (x', y') ≠
(x'', y''). Then we observe that

((x' − x'')^T, (y' − y'')^T) ( μX^{−2}  I ; I  μY^{−2} ) ( x' − x'' ; y' − y'' )
= μ‖X^{−1}(x' − x'')‖² + 2(x' − x'')^T (y' − y'') + μ‖Y^{−1}(y' − y'')‖²
= μ‖X^{−1}(x' − x'')‖² + 2(x' − x'')^T M (x' − x'') + μ‖Y^{−1}(y' − y'')‖²
> 0.

Thus the Hessian matrix is positive definite at each point of the nonempty
convex set S_{++} = S_af ∩ ℝ^{2n}_{++}, which implies that φ(μ, ·) is strictly convex
on S_{++}. Consequently, if the problem L(μ) has an optimal solution then it
is a unique solution. In order to see the existence of the optimal solution,
it suffices to show that the level set

A_φ(τ) = {(x, y) ∈ S_{++} : φ(μ, x, y) ≤ τ}

of the objective function φ(μ, ·) is closed and bounded for every real number τ.
The closedness of the set A_φ(τ) follows from the continuity of the function
φ(μ, ·). Hence, we only have to show that A_φ(τ) is bounded. Let (x̂, ŷ) be a fixed
feasible-interior-point, and let us choose τ so that τ ≥ φ(μ, x̂, ŷ). Then we
see that A_φ(τ) is nonempty, and for every (x, y) ∈ A_φ(τ) and each index i,

τ ≥ φ(μ, x, y)
= Σ_{j=1}^{n} (x_j y_j − μ log(x_j y_j))
≥ (n − 1)(μ − μ log μ) + x_i y_i − μ log(x_i y_i)
= (n − 1)(μ − μ log μ) + x_i y_i − μ log(2μ + (x_i y_i − 2μ))
= (n − 1)(μ − μ log μ) + x_i y_i − μ log ( 2μ ( 1 + (x_i y_i − 2μ)/(2μ) ) )
= (n − 1)(μ − μ log μ) + x_i y_i − μ log 2μ − μ log ( 1 + (x_i y_i − 2μ)/(2μ) )
≥ (n − 1)(μ − μ log μ) + x_i y_i − μ log 2μ − μ (x_i y_i − 2μ)/(2μ)
= n(μ − μ log μ) − μ log 2 + x_i y_i / 2.

Here the second inequality follows from the fact that log(1 + ξ) ≤ ξ if
1 + ξ > 0 (see (i) of Lemma 8.4.5). Thus every point (x, y) ∈ A_φ(τ)
belongs to the set

S_+(τ') = {(x, y) ∈ S_+ : x^T y = Σ_{i=1}^{n} x_i y_i ≤ τ'}    (8.73)

where

τ' = 2n( τ − n(μ − μ log μ) + μ log 2 ).

As we have seen in Lemma 8.2.6, the set S_+(τ') is bounded under Condition
8.2.2 and Condition 8.2.4, which completes the proof. □
Proof of Lemma 8.3.5:

The optimal solution (x, y) satisfies the Karush-Kuhn-Tucker optimality
condition with a Lagrange multiplier vector z:

y − μX^{−1}e + M^T z = 0,  x − μY^{−1}e − z = 0,  y − Mx − q = 0.

From the first and the second equalities, we observe that

Xy − μe = −X M^T z = Y z.

Letting z' = M^T z, the system −X M^T z = Y z can be rewritten as

Y z + X z' = 0,  −M^T z + z' = 0.

It follows from (i) of Lemma 8.3.1 (applied to the positive semi-definite matrix M^T)
that the coefficient matrix of the above system is nonsingular, hence we can conclude
that the Lagrange multiplier vector z is 0. □

Proof of Theorem 8.3.6:

The existence and the uniqueness of the solution (x(μ), y(μ)) ∈ S_{++} of
the system (8.32) are ensured by Lemmas 8.3.3, 8.3.4 and 8.3.5. Further-
more, the mapping H defined by (8.31) is C^∞ on ℝ × ℝ^{2n}_{++} and its Jacobian
matrix with respect to (x, y) coincides with M̂ defined by (8.20). Since (i)
of Lemma 8.3.1 ensures that M̂ is nonsingular for every (x, y) > 0, we
obtain that the path of centers S_cen is a 1-dimensional smooth curve
by applying the implicit function theorem (see, for example, [64]). Let μ̄ > 0
be fixed. Then the set {(x(μ), y(μ)) : 0 < μ ≤ μ̄} ⊂ S_cen is bounded since it
is contained in the bounded set {(x, y) ∈ S_+ : x^T y ≤ nμ̄} (see Lemma
8.2.6). This implies that there exists at least one accumulation point
of (x(μ), y(μ)) as μ > 0 tends to 0. By the continuity of the mapping H,
every accumulation point is a solution of the LCP. To see the convergence
of (x(μ), y(μ)) to a single point, we need to observe the limiting behavior
of (x(μ), y(μ)) more precisely.

In view of (ii) of Lemma 8.2.5, there exist two index sets I_x and I_y
such that

I_x = {i : x_i = 0 for every (x, y) ∈ S_cp},
I_y = {i : y_i = 0 for every (x, y) ∈ S_cp},
I_x ∪ I_y = {1, 2, ..., n}.

Since every accumulation point of (x(μ), y(μ)) is a solution of the LCP, it
follows that

lim_{μ→0} x_i(μ) = 0, i ∈ I_x,  and  lim_{μ→0} y_i(μ) = 0, i ∈ I_y.

Hence we only have to show that the other components of (x(μ), y(μ)) also
converge to some values. Let us define the function

W(x, y) = −Σ_{i=1}^{n} log x_i y_i.

Let x(μ)_i = ξ_i(μ) and y(μ)_i = η_i(μ), i = 1, 2, ..., n. It is easily seen that
the point (x(μ), y(μ)) is an optimal solution of the problem

Minimize W(x, y)
subject to (x, y) ∈ S_{++} = S_af ∩ ℝ^{2n}_{++},  x^T y = Σ_{i=1}^{n} ξ_i(μ)η_i(μ),

and (x(μ), y(μ)) satisfies the Karush-Kuhn-Tucker condition with a La-
grange multiplier vector (z₀, z) ∈ ℝ^{1+n}:

−X^{−1}e + M^T z + z₀ y = 0,  −Y^{−1}e − z + z₀ x = 0,    (8.74)
y = Mx + q,  x^T y = Σ_{i=1}^{n} ξ_i(μ)η_i(μ).

Let us define

W_N(x, y) = −( Σ_{i∈I_x} log x_i + Σ_{i∈I_y} log y_i )

and

W_B(x, y) = W(x, y) − W_N(x, y).

Since W_N(x, y) is constant on the set {(x, y) ∈ ℝ^{2n} : x_i = ξ_i(μ), i ∈
I_x; y_i = η_i(μ), i ∈ I_y}, the point (x(μ), y(μ)) is the optimal solution of

Minimize W_B(x, y)
subject to (x, y) ∈ S_{++} = S_af ∩ ℝ^{2n}_{++},  x^T y = Σ_{i=1}^{n} ξ_i(μ)η_i(μ),
x_i = ξ_i(μ), i ∈ I_x,  y_i = η_i(μ), i ∈ I_y.

By Lemma 8.2.6, we can see that the set

{(x, y) ∈ S_+ : x^T y = 0,
x_i = 0, i ∈ I_x;  y_i = 0, i ∈ I_y,
x_i > 0, i ∉ I_x;  y_i > 0, i ∉ I_y}

is a nonempty bounded convex set. It follows that the problem correspond-
ing to μ = 0,

Minimize W_B(x, y)
subject to (x, y) ∈ S_+,  x^T y = 0,
x_i = 0, i ∈ I_x;  y_i = 0, i ∈ I_y,
x_i > 0, i ∉ I_x;  y_i > 0, i ∉ I_y,

has a unique optimal solution, which we denote by (x(0), y(0)). This solution
can be characterized by the following system:

−1/x_i + M_i^T z + z₀ y_i = 0, i ∉ I_x,
−1/y_i − z_i + z₀ x_i = 0, i ∉ I_y,
x_i = 0, i ∈ I_x;  y_i = 0, i ∈ I_y;  y = Mx + q,

where (z₀, z) ∈ ℝ^{1+n} is a Lagrange multiplier vector and M_i^T is the i-th
row of M^T. Since the point (x(μ), y(μ)) satisfies the system (8.74) for every
μ > 0, any accumulation point of {(x(μ_k), y(μ_k))} (μ_k → 0) satisfies the
above system and coincides with (x(0), y(0)). Thus, (x(μ), y(μ)) converges
to the point (x(0), y(0)) as μ > 0 tends to 0. □

Proof of Lemma 8.4.1:

(i): Since (X̄ȳ − (x̄^T ȳ/n) e)^T e = 0, the assertion (i) follows directly from
(x̄, ȳ) ∈ N(α).

(ii) and (iii): The inequality (8.37) implies that

‖X^{−1/2}Y^{−1/2}‖² ≤ n / ((1 − α) x^T y)    (8.75)

for every (x, y) ∈ N(α). Hence, from (i) above, we have

‖X̄^{−1/2}Ȳ^{−1/2}( X̄ȳ − β (x̄^T ȳ/n) e )‖² ≤ ( (α² + (1 − β)² n) / (1 − α) ) (x̄^T ȳ / n).

The assertions (ii) and (iii) immediately follow from this inequality and (ii)
of Lemma 8.3.1 by substituting (x, y) = (x̄, ȳ) and

h = −( X̄ȳ − β (x̄^T ȳ/n) e ).

(iv) and (v): Combining (8.75) and (iii) above with the equations

‖X̄^{−1} dx(β)‖ = ‖X̄^{−1/2}Ȳ^{−1/2}(D^{−1} dx(β))‖,
‖Ȳ^{−1} dy(β)‖ = ‖X̄^{−1/2}Ȳ^{−1/2}(D dy(β))‖,
‖dX(β) dy(β)‖ = ‖(D^{−1} dx(β))(D dy(β))‖,

we obtain (iv) and (v), which completes the proof of this lemma. □

Proof of Lemma 8.4.2:

The assertions are based on the relation

Ȳ dx(β) + X̄ dy(β) = −( X̄ȳ − β (x̄^T ȳ/n) e ).

(i):

x(θ)^T y(θ) = (x̄ + θ dx(β))^T (ȳ + θ dy(β))
= x̄^T ȳ + θ( ȳ^T dx(β) + x̄^T dy(β) ) + θ² dx(β)^T dy(β)
= x̄^T ȳ − θ e^T ( X̄ȳ − β (x̄^T ȳ/n) e ) + θ² dx(β)^T dy(β)
= x̄^T ȳ − θ e^T ( X̄ȳ − (x̄^T ȳ/n) e ) − θ(1 − β) e^T (x̄^T ȳ/n) e + θ² dx(β)^T dy(β)
= (1 − θ(1 − β)) x̄^T ȳ + θ² dx(β)^T dy(β).

(ii):

X(θ)y(θ) − (x(θ)^T y(θ)/n) e
= X̄ȳ + θ( Ȳ dx(β) + X̄ dy(β) ) + θ² dX(β) dy(β) − (x(θ)^T y(θ)/n) e
= X̄ȳ − θ( X̄ȳ − β (x̄^T ȳ/n) e ) + θ² dX(β) dy(β) − (x(θ)^T y(θ)/n) e
= X̄ȳ − θ( X̄ȳ − (x̄^T ȳ/n) e ) − θ(1 − β)(x̄^T ȳ/n) e + θ² dX(β) dy(β)
  − (1 − θ(1 − β))(x̄^T ȳ/n) e − θ² (dx(β)^T dy(β)/n) e    (by (i))
= (1 − θ)( X̄ȳ − (x̄^T ȳ/n) e ) + θ²( dX(β) dy(β) − (dx(β)^T dy(β)/n) e ).  □

Proof of Lemma 8.4.5:

For ξ ∈ ℝ, one can easily see the following inequalities:

log(1 + ξ) ≤ ξ,  if 1 + ξ > 0,    (8.76)

log(1 + ξ) ≥ ξ − ξ²/2,  if ξ ≥ 0.    (8.77)

The assertion (i) is the inequality (8.76) itself. To see (ii), it is sufficient to
show that

log(1 + ξ) ≥ ξ − ξ²/(2(1 − τ))

if ξ ≥ −τ for some τ ∈ [0, 1). In the case ξ ≥ 0, the above inequality follows
immediately from (8.77). Furthermore, if |ξ| ≤ τ, we observe that

log(1 + ξ) = ξ − ξ²/2 + ξ³/3 − ξ⁴/4 + ⋯
≥ ξ − (ξ²/2)(1 + |ξ| + |ξ|² + ⋯)
= ξ − ξ²/(2(1 − |ξ|))
≥ ξ − ξ²/(2(1 − τ)).

Thus we have shown (ii). □

Proof of Lemma 8.4.6:

The following inequality follows from the assumption (8.48) and Lemma
8.4.5:

φ(x(θ), y(θ)) − φ(x̄, ȳ) = { (n + ν) log (x̄ + θ∆x)^T (ȳ + θ∆y)
− Σ_{i=1}^{n} log (x̄_i + θ∆x_i)(ȳ_i + θ∆y_i) }
− { (n + ν) log x̄^T ȳ − Σ_{i=1}^{n} log x̄_i ȳ_i }.

Proof of Lemma 8.5.3:

(i): It is straightforward.

(ii): Let (x̄, ȳ) be a solution of the LCP (8.2) such that (x̄, ȳ) ∈ {(x, y) ∈
S_+ : e^T x < q_{n+1}}. Then −e^T x̄ + q_{n+1} > 0, and (x̄', ȳ') = (x̄, 0, ȳ, −e^T x̄ +
q_{n+1}) is a solution of the artificial problem LCP' by its construction. From
(x'*, y'*) ∈ S'_{cp} and (x̄', ȳ') ∈ S'_{cp}, we see that

0 = (x'*)^T y'* + (x̄')^T ȳ'
= (x'*)^T ȳ' + (x̄')^T y'* + (x'* − x̄')^T M' (x'* − x̄').

Since the matrix M' is a positive semi-definite matrix, we have

(x'*)^T ȳ' + (x̄')^T y'* ≤ 0,

which implies that x*_{n+1} = 0, since ȳ'_{n+1} = −e^T x̄ + q_{n+1} > 0. Thus we
obtain the assertion (ii).

(iii): Every basic component of a basic feasible solution of the system

y = Mx + q,  (x, y) ≥ 0

is represented as the ratio of two minors of order n of the n × (2n+1) matrix
(−M I q). Since every minor of (−M I q) is integral and its absolute
value is less than 2^{L̲}/n² by (8.63), every component of a basic feasible
solution is less than 2^{L̲}/n². Thus, the set {(x, y) ∈ S_+ : e^T x < q_{n+1}} with
q_{n+1} ≥ 2^{L̲}/n contains all basic feasible solutions of the LCP (8.2). We
obtain (iii) from (ii). □

Proof of Lemma 8.5.4:

First note that a γ exists such that γ ∈ [γ̲(M, q), γ̄(M, q)]. This can be
seen from the inequality (8.63) for L̲, which implies that

γ̄(M, q) = 2^{L̲+1}/n² ≥ γ̲(M, q) = 2 max_{i∈{1,2,...,n}} { |[Me]_i|, |q_i| } ≥ 2.

(i): To see (x'^0, y'^0) ∈ S'_{++}, we only have to show that y^0 > 0. By the
definition (8.65) of γ̲(M, q) and γ ≥ γ̲(M, q), we have

|γ [Me]_i + q_i| ≤ (γ + 1) γ̲(M, q)/2 ≤ γ² ≤ nγ²/2,  i = 1, 2, ..., n.

It follows that

( 5/α − 1/2 ) nγ² e ≤ y^0 ≤ ( 5/α + 1/2 ) nγ² e.    (8.78)

Since α ∈ (0, 5/2], we obtain y^0 > 0. From the definition (8.66) of (x'^0, y'^0)
and the bounds (8.78) of y^0, we have

( 5/α − 1/n ) nγ³ ≤ x^0_i y^0_i ≤ ( 5/α + 1/n ) nγ³  (i = 1, 2, ..., n),

x^0_{n+1} y^0_{n+1} = (5/α) nγ³,

( 5/α − 1/(n+1) ) n(n+1)γ³ ≤ (x'^0)^T y'^0 ≤ ( 5/α + 1/(n+1) ) n(n+1)γ³,    (8.79)

−( 1/n + 1/(n+1) ) nγ³ ≤ x'^0_i y'^0_i − (x'^0)^T y'^0/(n+1) ≤ ( 1/n + 1/(n+1) ) nγ³.

Using n ≥ 2 and α ∈ (0, 5/2], it follows that

‖X'^0 y'^0 − ((x'^0)^T y'^0/(n+1)) e‖ ≤ α (x'^0)^T y'^0/(n+1),

hence we have shown (i).

(ii): The bounds (8.79) and γ ≤ γ̄(M, q) also ensure that

φ'_cp(x'^0, y'^0) = log (x'^0)^T y'^0
≤ log { ( 5/α + 1/(n+1) ) n(n+1) γ³ }
≤ log { ( 5/α + 1/(n+1) ) n(n+1) 2^{3(L̲+1)}/n⁶ }
≤ 3(L̲ + 1) log 2 + log ( 1 + 10/α )
≤ 3(L̲ + 1) + 10/α    (by (i) of Lemma 8.4.5)
≤ (3 + 10/α) L    (since L ≥ L̲ + 1 ≥ 2 log₂ n + 1 > 2),

φ'_cen(x'^0, y'^0) = Σ_{i=1}^{n+1} log ( ((x'^0)^T y'^0/(n+1)) / (x'^0_i y'^0_i) )
= Σ_{i=1}^{n} log ( ((x'^0)^T y'^0/(n+1)) / (x^0_i y^0_i) ) + log ( ((x'^0)^T y'^0/(n+1)) / (x^0_{n+1} y^0_{n+1}) )
≤ Σ_{i=1}^{n} log ( (5/α + 1/(n+1)) / (5/α − 1/n) ) + log ( (5/α + 1/(n+1)) / (5/α) )
= n log { 1 + (1/n + 1/(n+1)) / (5/α − 1/n) } + log { 1 + α/(5(n+1)) }
≤ n (1/n + 1/(n+1)) / (5/α − 1/n) + α/(5(n+1))    (by (i) of Lemma 8.4.5)
≤ 2 / (5/α − 1/n) + α/10
= 2α / (5 − α/n) + α/10
≤ α    (since α ∈ (0, 5/2]).

The proof of (ii) is completed by the definition of φ'.

(iii): Since γ ≤ γ̄(M, q) = 2^{L̲+1}/n² and q_{n+1} = (n+1)γ, we have that

q_{n+1} ≤ (n+1) 2^{L̲+1}/n².

The assertion (iii) is obtained by taking account of the above inequality,
the construction (8.59) of the LCP', the definitions (8.62) and (8.64) of L
and L', and the inequality n(n+1) ≤ L. □

Proof of Lemma 8.5.6:

First observe that I ∪ J = {1, 2, ..., n}, since otherwise there exists an
index k such that x_k > 2^{−L̲} and y_k > 2^{−L̲}, which contradicts x^T y ≤ 2^{−2L̲}.
Each basic component of a basic feasible solution of the system

y = Mx + q,  (x, y) ≥ 0

can be represented as a ratio ∆₁/∆₂, where ∆₁ is a minor of order n of
the matrix (−M I q) and ∆₂ a nonzero minor of order n of the matrix
(−M I). By the definition (8.62) of L and the bound (8.63), we see that

1 ≤ |∆₁| ≤ 2^{L̲}/n²,  1 ≤ |∆₂| ≤ 2^{L̲}/n²

whenever ∆₁ ≠ 0 (see, for example, [71]). Therefore each nonzero component of a
vertex of S_+ is not less than n² 2^{−L̲}.

In view of Carathéodory's theorem (see, for example, [64]) we have

(x, y) = Σ_{ℓ=1}^{p} c_ℓ (x^ℓ, y^ℓ) + (ξ, η),

where

p ≤ n + 1,  Σ_{ℓ=1}^{p} c_ℓ = 1,  c_ℓ ≥ 0  (ℓ = 1, ..., p),

(x^ℓ, y^ℓ) is a vertex of S_+ for ℓ = 1, ..., p and (ξ, η) is an unbounded direction
of S_+, i.e., η = Mξ and (ξ, η) ≥ 0. Among (x^ℓ, y^ℓ) (ℓ = 1, ..., p), we can
find a vertex (x*, y*) of S_+ such that c_ℓ ≥ 1/(n + 1). It follows that

(x, y) ≥ (1/(n+1)) (x*, y*)

and

(n + 1) 2^{−L̲} ≥ (n + 1) x_i ≥ x*_i  (i ∈ I),
(n + 1) 2^{−L̲} ≥ (n + 1) y_j ≥ y*_j  (j ∈ J).

Since each nonzero component of the vertex (x*, y*) is not less than n² 2^{−L̲} >
(n + 1) 2^{−L̲} (n ≥ 2), the above inequalities imply that the vertex (x*, y*)
satisfies the relation (8.69). Combining this with the fact that I ∪ J = {1, 2, ..., n},
we can conclude that (x*, y*) is a solution of the LCP. □

Acknowledgements
The author would like to thank Professor Tamas Terlaky, the editor of this book,
for his warm encouragement and suggestions. Also, a colleague, Yasushi Kondo,
contributed valuable comments on an early version of this chapter.

REFERENCES
[1] E. D. Andersen and Y. Ye. On a homogeneous algorithm for the monotone com-
plementarity problem. Research Reports, Department of Management Sciences,
University of Iowa, Iowa City, Iowa 52242, 1995.
[2] M. Anitescu, G. Lesaja, and F. A. Potra. Equivalence between different formu-
lations of the linear complementarity problem. Technical report, Department
of Mathematics, University of Iowa, Iowa City, IA 52242, USA, 1995.
[3] R. E. Bixby, J. W. Gregory, I. J. Lustig, R. E. Marsten, and D. F. Shanno. Very
large-scale programming: a case study in combining interior point and simplex
methods. Operations Research, 40:885-897, 1992.
[4] J. F. Bonnans and F. A. Potra. Infeasible path following algorithms for linear
complementarity problems. Technical report, INRIA, B.P.105, 78153 Rocquen-
court, France, 1994.

[5] R. W. Cottle, J.-S. Pang, and R. E. Stone. The linear complementarity problem.
Computer Science and Scientific Computing, Academic Press Inc, San Diego,
CA92101, 1990.
[6] D. den Hertog. Interior point approach to linear, quadratic and convex program-
ming. Mathematics and Its Application, Vol. 277, Kluwer Academic Publishers,
The Netherlands, 1994.
[7] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Un-
constrained Minimization Techniques. John Wiley & Sons, New York, 1968.
[8] M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Re-
search Logistics Quarterly, 3:95-110, 1956.
[9] R. M. Freund. Polynomial-time algorithms for linear programming based only
on primal scaling and projected gradients of a potential function. Mathematical
Programming, 51:203-222, 1991.
[10] M. S. Gowda. On reducing a monotone horizontal LCP to an LCP. Techni-
cal report, Department of Mathematics & Statistics, University of Maryland
Baltimore County, Baltimore, Maryland 21228, 1994.
[11] O. Güler. Generalized linear complementarity problems. Research Reports,
Department of Mathematics and Statistics, University of Maryland Baltimore
County, Baltimore, Maryland 21228-5398, 1992.
[12] O. Güler. Existence of interior points and interior paths in nonlinear monotone
complementarity problems. Mathematics of Operations Research, 18:128-148,
1993.
[13] O. Güler. Barrier functions in interior point methods. Technical report, Depart-
ment of Mathematics and Statistics, University of Maryland Baltimore County,
Baltimore, Maryland 21228, USA, 1994.
[14] P. T. Harker and J.-S. Pang. Finite-dimensional variational inequality and
nonlinear complementarity problems: A survey of theory, algorithms and appli-
cations. Mathematical Programming, 48:161-220, 1990.
[15] B. Jansen, C. Roos, and T. Terlaky. A family of polynomial affine scaling
algorithms for positive semi-definite linear complementarity problems. ISSN
0922-5641, Faculty of Technical Mathematics and Informatics, Delft University
of technology, P.O.Box 5031,2600 GA Delft, The Netherlands, 1993.
[16] B. Jansen, K. Roos, T. Terlaky, and A. Yoshise. Polynomiality of primal-dual
affine scaling algorithms for nonlinear complementarity problems. Technical
Report 95-83, Faculty of Technical Mathematics and Computer Science, Delft
University of Technology, Delft, The Netherlands, 1995.

[17] F. Jarre. On the method of analytical centers for solving smooth convex pro-
gramming. In S. Dolecki, editor, Optimization, pages 69-86, Berlin, Germany,
1988. Lecture Notes in Mathematics No. 1405, Springer Verlag.

[18] F. Jarre. Interior-point methods via self-concordance or relative Lipschitz con-
dition. Habilitationsschrift, Fakultät für Mathematik der Bayerischen Julius-
Maximilians-Universität Würzburg, 1994.

[19] J. Ji, F. Potra, and S. Huang. A predictor-corrector method for linear comple-
mentarity problems with polynomial complexity and super linear convergence.
No. 18, Department of Mathematics, The University of Iowa, Iowa City, Iowa
52242, 1991.

[20] J. Ji, F. Potra, R. A. Tapia, and Y. Zhang. An interior-point method with


polynomial complexity and superlinear convergence for linear complementarity
problems. TR91-23, Department of Mathematics, The University of Iowa, Iowa
City, Iowa 52242, 1991.

[21] J. Ji and F. A. Potra. An infeasible-interior-point method for the P.-matrix


LCP. Technical report, Department of Mathematics and Computer Science,
Valdosta State University, Valdosta, GA 31698, 1994.

[22] J. Ji, F. A. Potra, and R. Sheng. A predictor-corrector method for solving the
P.-matrix LCP from infeasible starting points. Technical report, Department of
Mathematics and Computer Science, Valdosta State University, Valdosta, GA
31698, 1994.

[23] N. Karmarkar. A new polynomial-time algorithm for linear programming. Com-


binatorica, 4:373-395, 1984.

[24] M. Kojima, Y. Kurita, and S. Mizuno. Large-step interior point algorithms for
linear complementarity problems. SIAM J. Optimization, 3:398-412, 1993.

[25] M. Kojima, N. Megiddo, and T. Noma. Homotopy continuation methods


for nonlinear complementarity problems. Mathematics of operations research,
16:754-774,1991.

[26] M. Kojima, N. Megiddo, T. Noma, and A. Yoshise. A Unified Approach to


Interior Point Algorithms for Linear Complementarity Problems. Lecture Notes
in Computer Science 538, Springer-Verlag, New York, 1991.

[27] M. Kojima, S. Mizuno, and T. Noma. A new continuation method for com-
plementarity problems with uniform P-functions. Mathematical Programming,
43:107-113,1989.

[28] M. Kojima, S. Mizuno, and T. Noma. Limiting behavior of trajectories gener-


ated by a continuation method for monotone complementarity problems. Math-
ematics of Operations Research, 15:662-675, 1990.

[29] M. Kojima, S. Mizuno, and A. Yoshise. A polynomial-time algorithm for a class


of linear complementary problems. Mathematical Programming, 44:1-26, 1989.

[30] M. Kojima, S. Mizuno, and A. Yoshise. A primal-dual interior point algorithm


for linear programming. In N. Megiddo, editor, Progress in Mathematical Pro-
gramming, Interior-Point and Related Methods, pages 29-47, New York, 1989.
Springer-Verlag.

[31] M. Kojima, S. Mizuno, and A. Yoshise. An O(√n L) iteration potential reduction
algorithm for linear complementarity problems. Mathematical Programming,
50:331-342, 1991.
[32] M. Kojima, S. Mizuno, and A. Yoshise. A little theorem of the big M in interior
point algorithms. Mathematical Programming, 59:361-375, 1993.

[33] M. Kojima, T. Noma, and A. Yoshise. Global convergence in infeasible-interior-


point algorithms. Mathematical Programming, 65:43-72, 1994.

[34] M. Kojima, S. Shindoh, and S. Hara. Interior-point methods for the mono-
tone linear complementarity problem in symmetric matrices. Technical report,
Department ofInformation Sciences, Tokyo Institute of Technology, 2-12-1 Oh-
Okayama, Meguro-ku, Tokyo 152, Japan, 1994.

[35] K. O. Kortanek and J. Zhu. A polynomial barrier algorithm for linearly con-
strained convex programming problems. Mathematics of Operations Research,
18:116-127,1993.
[36] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Computational experience with a
primal-dual interior point method for linear programming. Linear Algebra and
Its Applications, 152:191-222,1991.

[37] O. L. Mangasarian. Characterization of bounded solution sets of linear comple-
mentarity problems. Mathematical Programming Study, 19:153-166, 1982.
[38] O. L. Mangasarian. Simple and computable bounds for solutions of linear com-
plementarity problems and linear programs. Mathematical Programming Study,
25:1-12, 1985.
[39] O. L. Mangasarian. Error bounds for nondegenerate monotone linear comple-
mentarity problems. Mathematical Programming, 48:437-445, 1990.

[40] R. Marsten, R. Subramanian, M. Saltzman, I Lustig, and D. Shanno. Interior


point methods for linear programming: Just call Newton, Lagrange and Fiacco
and McCormick! Interfaces, 20:105-116, 1990.
[41] L. McLinden. An analogue of Moreau's proximation theorem, with application
to the nonlinear complementarity problem. Pacific Journal of Mathematics,
88:101-161, 1980.
[42] K. A. McShane, C. L. Monma, and D. F. Shanno. An implementation of a
primal-dual interior point method for linear programming. ORSA Journal on
Computing, 1:70-83, 1989.

[43] N. Megiddo. A monotone complementarity problem with feasible solutions but


no complementary solutions. Mathematical Programming, 12:131-132,1977.
[44] N. Megiddo. A note on the complexity of P-matrix LCP and computing an
equilibrium. Technical report, IBM Almaden Research Center and School of
Mathematical Sciences, Tel Aviv University, 650 Harry Road, San Jose, CA
95120-6099, USA and Tel Aviv, Israel, 1988.
[45] N. Megiddo. Pathways to the optimal set in linear programming. In N. Megiddo,
editor, Progress in Mathematical Programming, Interior-Point and Related
Methods, pages 131-158, New York, 1989. Springer-Verlag.

[46] S. Mizuno. An O( n 3 L) algorithm using a sequence for a linear complementarity


problem. Journal of the Operations Research Society of Japan, 33:66-75, 1990.

[47] S. Mizuno. A superlinearly convergent infeasible-interior-point algorithm for


geometrical LCPs without a strictly complementary condition. Technical re-
port, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-ku,
Tokyo, 106, Japan, 1994.

[48] S. Mizuno, F. Jarre, and J. Stoer. An unified approach to infeasible-interior-


point algorithms via geometrical linear complementarity problems. Technical
report, The Institute of Statistical Mathematics, 4-6-7 Minami-Azabu, Minato-
ku, Tokyo, 106, Japan, 1994.

[49] S. Mizuno and M. J. Todd. An O( n 3 L) adaptive path following algorithm for a


linear complementarity problem. Mathematical Programming, 52:587-595, 1991.
[50] S. Mizuno, A. Yoshise, and T. Kikuchi. Practical polynomial time algorithms for
linear complementarity problems. Journal of the Operations Research Society
of Japan, 32:75-92, 1989.

[51] R. D. C. Monteiro and I. Adler. Interior path following primal-dual algorithms.


Part I: linear programming. Mathematical Programming, 44:27-41, 1989.

[52] R. D. C. Monteiro and I. Adler. Interior path following primal-dual algorithms.


Part II: convex quadratic programming. Mathematical Programming, 44:43-66,
1989.
[53] R. D. C. Monteiro and J .-S. Pang. Properties of an interior-point mapping for
mixed complementarity problems. Technical report, School of Industrial and
Systems Engineering Georgia Institute of Technology, Atlanta, Georgia 30332-
0205, 1993.
[54] R. D. C. Monteiro and T. Tsuchiya. Limiting behavior of the derivatives of cer-
tain trajectories associated with a monotone horizontal linear complementarity
problem. Technical report, Systems and Industrial Engineering, University Ari-
zona, Tucson, AZ 85721, 1992.
[55] R. D. C. Monteiro and S. Wright. Local convergence of interior-point algo-
rithms for degenerate monotone LCP. Technical report, Systems and Industrial
Engineering, University Arizona, Tucson, AZ 85721, 1993.
[56] R. D. C. Monteiro and S. Wright. A superlinear infeasible-interior-point affine
scaling algorithm for LCP. Technical report, Systems and Industrial Engineer-
ing, University Arizona, Tucson, AZ 85721, 1993.
[57] R. D. C. Monteiro and S. Wright. Superlinear primal-dual affine scaling algo-
rithms for LCP. Technical report, Systems and Industrial Engineering, Univer-
sity Arizona, Tucson, AZ 85721, 1993.
[58] J. J. More. Class of functions and feasibility conditions in nonlinear comple-
mentarity problem. Mathematical Programming, 6:327-338, 1974.
[59] J. E. Nesterov. New polynomial-time algorithms for linear and quadratic pro-
gramming. Report at the 13-th International Symposium on Mathematical
Programming, Central Economical and Mathematical Institute, USSR Acad.
Sci., Krasikova str. 32,117418 Moscow, USSR, 1988.
[60] J. E. Nesterov and A. S. Nemirovsky. A general approach to polynomial-time
algorithms design for convex programming. Report at the 13-th International
Symposium on Mathematical Programming, Central Economical and Mathe-
matical Institute, USSR Acad. Sci., Krasikova str. 32, 117418 Moscow, USSR,
1988.
[61] J. E. Nesterov and A. S. Nemirovsky. Self-concordant functions and polynomial-
time methods in convex programming. Technical report, Central Economical
and Mathematical Institute, USSR Acad. Sci., Moscow, USSR, 1989.

[62] Y. Nesterov and A. S. Nemirovsky. Interior point polynomial algorithms in


Convex Programming. SIAM Studies in Applied Mathematics, Vol. 13, SIAM,
Philadelphia, 1994.

[63] T. Noma. A globally convergent iterative algorithm for complementarity prob-


lems - a modification of interior point algorithms for linear complementarity
problems - . Dr. Thesis, Dept. of Systems Sciences, Tokyo Institute of Tech-
nology, Oh-Okayama, Meguro-ku, Tokyo 152, Japan, 1991.

[64] J. M. Ortega and W. G. Rheinboldt. Iterative Solution of Nonlinear Equations


in Several Variables. Academic Press, Orlando, Florida 32887, 1970.

[65] P. M. Pardalos and Y. Yeo The general linear complementarity problem. Tech-
nical report, Department of Computer Science, The Pennsylvania State Univer-
sity, University Park, PA 16802, 1990.

[66] E. Polak. Computational Methods in Optimization: A Unified Approach. Aca-


demic Press, New York, 1971.

[67] F. A. Potra. An O(nL) infeasible-interior-point algorithm for LCP with


quadratic convergence. Technical report, Department of Mathematics, The Uni-
versity of Iowa, Iowa City, Iowa 52242, 1994.

[68] F. A. Potra and R. Sheng. A large-step infeasible-interior-point method for the


po-matrix LCP. Technical report, Department of Mathematics, University of
Iowa, Iowa City, IA 52242, USA, 1994.

[69] F. A. Potra and R. Sheng. A path following method for LCP with superlinearly
convergent iteration sequence. Report on Computational Mathematics, No.
69/1995, Department of Mathematics, University of Iowa, Iowa City, IA 52242,
1995.

[70] F. A. Potra and Y. Ye. Interior point methods for nonlinear complementarity
problems. Technical report, Department of Mathematics, The University of
Iowa, Iowa City, Iowa 52242, 1991.

[71] A. Schrijver. Theory of Linear and Integer Programming. John-Wiley & Sons,
New York, 1986.

[72] M. Shida, S. Shindoh, and M. Kojima. Centers of monotone generalized comple-


mentarity problems. Research Reports on Information Science B-303, Depart-
ment of Mathematical and Computing Sciences, Tokyo Institute of Technology,
2-12-1 Oh-Okayama, Meguro-ku, Tokyo 152, Japan, 1995.

[73] J. Sun, J. Zhu, and G. Zhao. A predictor-corrector algorithm for a class of


nonlinear saddle point problem. Technical report, Department of Decision Sci-
ences, National University of Singapore, 10 Kent Ridge Crescent, Singapore
0511, 1994.

[74] K. Tanabe. Centered Newton method for mathematical programming. In M. Iri


and K. Yajima, editors, System Modelling and Optimization, pages 197-206.
Springer-Verlag, 1988.

[75] K. Tanabe. A posteriori error estimate for an approximate solution of a general


linear programming problem. In K. Tone, editor, New Methods for Linear Pro-
gramming 2, pages 118-120,4-6-7 Minamiazabu, Minato-ku, Tokyo 106, Japan,
1988. The Institute of Statistical Mathematics.

[76] M. J. Todd. Projected scaled steepest descent in Kojima-Mizuno-Yoshise's


potential reduction algorithm for the linear complementarity problem. Technical
Report No. 950, School of Operations Research and Industrial Engineering,
College of Engineering, Cornell University, Ithaca, New York 14853-3801, 1990.

[77] M. J. Todd and Y. Ye. A centered projective algorithm for linear programming.
Mathematics of Operations Research, 15:508-529, 1990.

[78] R. H. Tütüncü and M. J. Todd. Reducing horizontal linear complementar-


ity problems. Technical report, School of Operations Research and Industrial
Engineering, Cornell University, Ithaca, New York 14853, USA, 1994.

[79] H. Väliaho. P∗-matrices are just sufficient. Technical report, Department of


Mathematics, University of Helsinki, Helsinki, Finland, 1995.

[80] S. Wright. A superlinear infeasible-interior-point algorithm for monotone non-


linear complementarity problems. Technical report, Argonne National Labora-
tory, 9700 South Cass Avenue, Argonne, Illinois 60439, 1993.

[81] S. J. Wright. A path-following interior-point algorithm for linear and quadratic


problems. Preprint MCS-P401-1293, Mathematics and Computer Science Divi-
sion, Argonne National Laboratory, Argonne, IL 60439, 1994.
[82] X. Xu, P.-F. Hung, and Y. Ye. A simplified homogeneous and self-dual linear
programming algorithm and its implementation. Technical report, Institute of
Systems Science, Academia Sinica, Beijing 100080, China, 1993.

[83] Y. Ye. A class of potential functions for linear programming. Technical re-
port, Integrated Systems Inc., Santa Clara, CA and Department of Engineering-
Economic Systems, Stanford University, Stanford, CA, 1988.

[84] Y. Ye. A further result on the potential reduction algorithm for the P-matrix
linear complementarity problem. Technical report, Department of Management
Sciences, The University of Iowa, Iowa City, Iowa 52242, 1988.
[85] Y. Ye. The potential algorithm for linear complementarity problems. Technical
report, Department of Management Sciences, The University of Iowa, Iowa City,
Iowa 52242, 1988.
[86] Y. Ye. An O(n³L) potential reduction algorithm for linear programming. Math-
ematical Programming, 50:239-258, 1991.

[87] Y. Ye. On homogeneous and self-dual algorithms for LCP. Technical report,
Department of Management Sciences, The University of Iowa, Iowa City, Iowa
52242, 1994.
[88] Y. Ye and K. Anstreicher. On quadratic and O(√nL) convergence of a predictor-
corrector algorithm for LCP. Mathematical Programming, 59:151-162, 1993.
[89] Y. Ye, M. J. Todd, and S. Mizuno. An O(√nL)-iteration homogeneous and
self-dual linear programming algorithm. Technical report, Department of Man-
agement Sciences, The University of Iowa, Iowa City, Iowa 52242, 1992.
[90] Y. Zhang. On the convergence of a class of infeasible interior-point methods
for horizontal linear complementarity problems. Research Report 92-07, Depart-
ment of Mathematics and Statistics, University of Maryland Baltimore County,
Baltimore, Maryland 21228-5398, 1992.

[91] J. Zhu. A path following algorithm for a class of convex programming problems.
Zeitschrift für Operations Research, 36:359-377, 1992.
9
SEMIDEFINITE PROGRAMMING
Motakuri V. Ramana, Panos M. Pardalos
Center for Applied Optimization
Department of Industrial and Systems Engineering
University of Florida
Gainesville, Florida 32611, USA

ABSTRACT
Semidefinite Programming is a rapidly emerging area of mathematical programming. It
involves optimization over sets defined by semidefinite constraints. In this chapter, several
facets of this problem are presented.

9.1 INTRODUCTION
Let S_n be the space of n × n real symmetric matrices; for A, B ∈ S_n, A • B
denotes the inner product Σ_{i,j} A_ij B_ij, and we write A ⪰ B if A − B is positive
semidefinite. Suppose that Q_0, ..., Q_m ∈ S_n are given matrices and c ∈ R^m. Then
the semidefinite program in equality standard form is defined to be the following
optimization problem:

    inf    U • Q_0
    s.t.   U • Q_i = c_i,   i = 1, ..., m                    (SDP-E)
           U ⪰ 0.

We also define the semidefinite program in inequality standard form to be:

    sup    c^T x
    s.t.   Σ_{i=1}^m x_i Q_i ⪯ Q_0.                          (SDP-I)

The two problems SDP-E and SDP-I are equivalent in the sense that one can be
transformed into the other with relative ease. Furthermore, as will be seen in the
sections to follow, these problems are the so-called standard duals of each other. The
main motivation for starting out with both problems is that the first form appears to
be more suitable for algebraic purposes, while the latter has a strong geometric flavor.
Let f_E and f_I denote the optimal values of the problems SDP-E and SDP-I, respectively.
Both problems will be collectively referred to as SDP.
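To fix ideas, the following minimal Python/NumPy sketch builds a small random
instance (Q_0, ..., Q_m, c) in the notation above and checks feasibility of candidate
points for SDP-I and SDP-E; the data and helper names are our own illustrative
choices, not part of the original text.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 4, 2

    def sym(A):
        return (A + A.T) / 2

    # Hypothetical data in the notation above: Q_0, ..., Q_m symmetric.
    Q = [sym(rng.standard_normal((n, n))) for _ in range(m + 1)]
    Q[0] = Q[0] + n * np.eye(n)            # make Q_0 comfortably positive definite

    def inner(A, B):
        # A . B = sum_{i,j} A_ij B_ij
        return float(np.sum(A * B))

    # Choose c so that U = I is feasible for SDP-E (purely for illustration).
    c = np.array([inner(np.eye(n), Q[i]) for i in range(1, m + 1)])

    def feasible_sdp_i(x, tol=1e-9):
        # x is feasible for (SDP-I) iff Q_0 - sum_i x_i Q_i is positive semidefinite
        S = Q[0] - sum(x[i - 1] * Q[i] for i in range(1, m + 1))
        return np.linalg.eigvalsh(S).min() >= -tol

    def feasible_sdp_e(U, tol=1e-6):
        # U is feasible for (SDP-E) iff U >= 0 and U . Q_i = c_i for all i
        psd = np.linalg.eigvalsh(U).min() >= -tol
        eqs = all(abs(inner(U, Q[i]) - c[i - 1]) <= tol for i in range(1, m + 1))
        return psd and eqs

    x = np.zeros(m)                        # feasible for SDP-I since Q_0 is PSD
    print(feasible_sdp_i(x), "objective c^T x =", float(c @ x))
    U = np.eye(n)                          # feasible for SDP-E by the choice of c
    print(feasible_sdp_e(U), "objective U . Q_0 =", inner(U, Q[0]))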

The main subject matter of Semidefinite Programming (SDP) can be broadly


classified into the following three categories.

1. Geometric, algebraic and duality theoretic treatment of SDP.


2. Algorithmic, complexity theoretic and computational development.
3. Applications of SDP.

At the outset, it should be mentioned that two recent survey articles have already
appeared on SDP, namely [3] and [92] (an earlier version of the latter is [91]). The
main thrust of these two surveys had been interior point methodologies for SDP. In
addition, in [3], applications to combinatorial optimization have been discussed, and
in [92], applications to engineering problems and other optimization problem classes
were presented. Keeping the above in mind, here we will dwell upon aspects that
have received less attention in the abovementioned references. In particular, only
sketchy attention will be paid to interior point methods, despite the stated title of
the current volume. Several open problems will be stated with the hope that they
will inspire further developments in this highly promising subject area.

9.2 GEOMETRY AND DUALITY


In this section, we will look at several geometric and duality theoretic aspects con-
cerning SDP. Throughout this chapter, Q(x) will denote the linear matrix map

    Q(x) = Σ_{i=1}^m x_i Q_i.

A spectrahedron is defined to be a closed convex set of the type

    G = {x | Q(x) ⪯ Q_0},

where Q(x) is a linear symmetric matrix map as defined above and Q_0 ∈ S_n. In
other words, G is the feasible region of the semidefinite program SDP-I. It is not hard
to see that the feasible region of SDP-E can be recast in the above inequality form,
and hence spectrahedra are precisely the feasible regions of semidefinite programs.
The name spectrahedron is chosen because the definition involves the spectrum (the
eigenvalues) of matrices, and because spectrahedra bear a resemblance to, and are a
generalization of, polyhedra.

9.2.1 Analysis of Spectrahedra


We begin by first introducing some special classes of spectrahedra.

• If P = {x | Ax ≤ b} is a polyhedron, then {x | Diag(b − Ax) ⪰ 0} is a spectrahedral
  representation of P. Thus every polyhedron is a spectrahedron (a small numerical
  sketch of this representation follows this list).

• Let S = {x | x^T Q x + b^T x + c ≤ 0} be a generic ellipsoid, where Q is a PSD
  matrix. Then it can easily be shown that S is a spectrahedron (see [92], [75]).
  Moreover, the intersection of finitely many spectrahedra is another spectrahedron,
  and hence the intersection of several generic ellipsoids is a spectrahedron. In
  particular, every Euclidean ball is a spectrahedron. It is also interesting that the
  unit ball in the l_4 norm is the projection of a spectrahedron. To see this, let
  m = 2 and consider S = {(x_1, x_2) | x_1^4 + x_2^4 ≤ 1}. Then consider
  R = {(x_1, x_2, y_1, y_2) | y_1^2 + y_2^2 ≤ 1, x_1^2 ≤ y_1, x_2^2 ≤ y_2}. It follows that R is a
  spectrahedron, and S is the projection of R onto its first two coordinates.
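The sketch below (a toy polyhedron and the l_4-ball lift above, with data chosen
only for illustration) tests membership through the representation Diag(b − Ax) ⪰ 0
and through the lifted set R.

    import numpy as np

    # A toy polyhedron P = {x | Ax <= b} (illustrative data only).
    A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    b = np.array([1.0, 1.0, 1.0])

    def in_P_via_spectrahedron(x, tol=1e-9):
        # x in P  iff  Diag(b - Ax) is positive semidefinite, i.e. all
        # diagonal entries b_i - a_i^T x are nonnegative.
        return np.linalg.eigvalsh(np.diag(b - A @ x)).min() >= -tol

    print(in_P_via_spectrahedron(np.array([0.2, 0.3])))    # True
    print(in_P_via_spectrahedron(np.array([2.0, 0.0])))    # False

    def in_l4_ball_via_lift(x1, x2):
        # (x1, x2) lies in the l_4 unit ball iff some (y1, y2) satisfies
        # y1^2 + y2^2 <= 1, x1^2 <= y1, x2^2 <= y2; the smallest admissible
        # choice y_i = x_i^2 suffices to test.
        y1, y2 = x1 ** 2, x2 ** 2
        return y1 ** 2 + y2 ** 2 <= 1.0

    print(in_l4_ball_via_lift(0.8, 0.8))     # True:  0.8^4 + 0.8^4 <= 1
    print(in_l4_ball_via_lift(0.95, 0.95))   # False: 0.95^4 + 0.95^4 > 1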

Certain properties of spectrahedra have been studied in [79] and [68]. Some of these
properties are:

1. Given a point x̄ in a spectrahedron G as defined earlier, the smallest face of G
   containing x̄ is given by

       F_G(x̄) = {x ∈ G | Null(Q_0 − Q(x)) ⊇ Null(Q_0 − Q(x̄))}.

Using this, one can characterize extreme points and extreme rays of spectrahe-
dra. It is also known that every face of a spectrahedron is exposed (i.e., each
face of G can be written as the intersection of a hyperplane with G; see [84] for
examples of nonexposed faces of general convex sets).

2. Spectrahedra are closed under intersections, but are not closed under linear
mappings, projections, polar operation or Minkowski sums ([79]).

3. Unlike for polyhedra [8], the dimensions of the faces of a spectrahedron need not
   form a contiguous string. Take the PSD cone, for instance, which is a spectrahedron:
   it is well known that the dimensions of its faces are the triangular integers
   k(k + 1)/2 for k = 0, ..., n (see [9] and [21]).

In [50], the following subclass of spectrahedra, called elliptopes, was introduced:

    E_n := {U ∈ S_n | U_ii = 1 ∀ i, U ⪰ 0}.

Such matrices are also known as correlation matrices, and they play a critical role
in the approximation algorithm for the MAXCUT problem developed in [26]. More
specifically, as we will see in more detail later, their method is a relaxation in which
one optimizes a linear objective function over E_n. In [50] and [51], this object has
been investigated. In particular, their results include the following.

• Expressions for the normal cones.


• A proof that E_n has exactly 2^{n−1} vertices (points at which the normal cone is full
  dimensional), namely, the matrices of the form vv^T where v is a binary (±1) vector.

• Various results concerning regular points (points where the normal cone is one
  dimensional), tangent cones and faces of E_n.

In [68], results concerning facial structure of spectrahedra are given. The following
results are also derived.

1. Bounds on the ranks of the matrices (U for the SDP-E case and Q_0 − Q(x) for
   the SDP-I problem) when the solutions are extreme points.

2. Bounds on the multiplicity of the eigenvalues of the matrices at extreme point
   optimal solutions ([69]).

3. In [70] and [49], the extreme points are treated as a generalization of the notion
   of basic feasible solutions from LP, and "simplex-type" methods for SDP have
   been proposed.

The polar of a convex set G containing the origin is defined by

    G° := {y | y^T x ≤ 1 ∀ x ∈ G}.

When G is a spectrahedron of the form

    G = {x | Q(x) ⪯ Q_0},

clearly G contains the origin exactly when Q_0 ⪰ 0. Supposing that this latter
condition holds, it is not hard to derive (see [79]) the following expression for the
polar:

    G° = Cl({Q*(U) | U ⪰ 0, Q_0 • U ≤ 1}),

where Q*(U) denotes the adjoint of the linear map Q(x), and Cl(·) is the closure
operation. When G is full dimensional, it is not necessary to take the closure in the
above expression, thus yielding an algebraic description of the polar for this case.
However, when full dimensionality is not satisfied, this fails to hold. In [76], by
using an incremental argument, an expression for G° is derived for the most general
situation. This in turn yields a polynomial size gapfree dual program for SDP, which
will be discussed in §9.2.2.

Since spectrahedra are a generalization of polyhedra, a seemingly interesting problem


is that of characterizing when a spectrahedron is polyhedral. More generally, one
can ask when a given projection of a spectrahedron is polyhedral. What is rather
surprising is that a satisfactory answer to this latter question will likely yield a good
characterization of perfect graphs, as will be seen in §9.4.1.

On the Nonlinear Geometry of Spectrahedra


Much has been understood concerning the linear geometry (i.e., description of objects
such as faces and polars) of spectrahedra. However, these objects do not seem to
capture the inherently nonlinear nature of the surfaces of spectrahedra. To illustrate
our point, we consider the following simple example.

Let G be a spectrahedron in R³ defined as the intersection of the unit ball
B = {x | x^T x ≤ 1} and the ellipsoid

    E = {x | f(x) ≤ 1},   where f(x) = x_1² + (x_2 − 2)²/4 + x_3²/4.

Then every point on the boundary of G is an extreme point, and consequently all
faces are zero dimensional, except for the whole set itself, which is 3 dimensional.
However, the surface of G can be partitioned into three pieces: two smooth surfaces,
each given by exactly one of the functions x^T x − 1 and f(x) − 1 being zero (and the
other being negative), and one closed nonplanar curve which is the intersection of the
two boundary surfaces of B and E. This curve is parametrized as

    γ(t) := (±√((4t − 1)/3), t, ±√((t + 2)(2 − 3t)/3)),


where t is in the range [1/4, 2/3].
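As a quick numerical sanity check of this parametrization (a throwaway NumPy
snippet, not part of the original text), one can verify that γ(t) lies on both boundary
surfaces:

    import numpy as np

    def f(x):
        # the ellipsoid function from the example above
        return x[0] ** 2 + (x[1] - 2) ** 2 / 4 + x[2] ** 2 / 4

    def gamma(t, s1=1.0, s2=1.0):
        # one of the four sign choices of the curve parametrization
        return np.array([s1 * np.sqrt((4 * t - 1) / 3),
                         t,
                         s2 * np.sqrt((t + 2) * (2 - 3 * t) / 3)])

    for t in np.linspace(0.25, 2 / 3, 5):
        x = gamma(t)
        # both defining functions should evaluate to 1 along the curve
        print(round(float(x @ x), 10), round(f(x), 10))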

Prompted by the above and other similar examples, we define the following nonlinear
notion of faces, called plates. Let G = {x ∈ R^m | Q(x) ⪯ Q_0} be a spectrahedron,
where Q(x) is a linear n × n matrix map. Then, for every 0 ≤ k ≤ n, define the
subset of G given by

    G[k] := {x ∈ G | rank(Q_0 − Q(x)) = k}.

Clearly, G = ∪_{k=0}^n G[k]. A plate of G of order k is then defined to be the closure
of a connected component of G[k]. It is not hard to show the following:

1. The rank of Q_0 − Q(x) is constant over the relative interior of an (ordinary)
   face (in fact, the null space is constant; see [79]). Hence the relative interior
   of a face on which rank(Q_0 − Q(x)) = k is contained in exactly one connected
   component of G[k].

2. Using the classical results of Whitney [94], it can be shown that every spectra-
hedron has at most finitely many plates.

3. If we have a polyhedron given by P = {x | Ax ≤ b}, we can re-express it as
   P = {x | Diag(b − Ax) ⪰ 0}. Then the above definition of plates reduces to the
   usual notion of polyhedral faces.

Of course, very little is understood at this point concerning the plates of spectrahedra
and their structure. However, it appears that algebraic geometry techniques such as
Groebner bases ([10] and [16] are good introductory texts) are applicable here.

9.2.2 Duality in SDP


As mentioned earlier, the two formulations of SDP, namely SDP-E and SDP-I (of
§9.1), stand in a certain duality correspondence. More specifically, they are Lagrangian
duals (or standard duals) of each other. To show this, consider the following minmax
reformulation of SDP-I, which is not hard to establish:

    f_I = sup_{x ∈ R^m}  inf_{U ⪰ 0}  { c^T x + U • (Q_0 − Q(x)) }.

One can reverse the minmax into maxmin and, it can be shown once again that

    f_E = inf_{U ⪰ 0}  sup_{x ∈ R^m}  { c^T x + U • (Q_0 − Q(x)) }.
This implies that f_I ≤ f_E. There exist several examples for which equality fails to
hold (see, for instance, [91], [76] or [22]). Let us define, for the pair of semidefinite
programs SDP-E and SDP-I, the standard duality gap (SDG) to be the difference
f_E − f_I. Listed below are some conditions under which the SDG is zero (from [91];
see [59] for a thorough treatment).

1. There exists a primal feasible solution U that is positive definite, or less restric-
tively (see [79] for explanation), the primal feasible region is full dimensional.

2. The dual feasible region is full dimensional.

3. The primal optimal solution set is nonempty and bounded.

4. The dual optimal solution set is nonempty and bounded.

When none of the above conditions hold, one may have a nonzero duality gap.
Therefore, it is a natural question to ask if there exists a polynomial size dual program
for SDP which can be written down using the primal data and for which the duality
gap is zero, without any assumptions. A first step in this direction was taken in [13],
where it was shown that for any cone programming problem, restricting attention
to the minimal cone will result in zero duality gap. Furthermore, a theoretical (and
unimplementable) method for regularizing a cone program was given. While this
approach to duality gives zero duality gap, resulting dual programs are not explicit
polynomial size programs that depend only on the primal data. The derivation of
such a dual was an open problem before it was resolved in [76]. The approach used
there was to establish a description of polars of spectrahedra and use it to formulate
the dual program (for SDP-I) called Extended Lagrange-Slater Dual (ELSD).
In the following, we will present the ELSD program and state the main duality
theorem on ELSD. But first some notation is introduced.

- Q(x) = Σ_{i=1}^m x_i Q_i.

- G := {x | Q(x) ⪯ Q_0} is the feasible region of SDP-I.

- Q* : M_n → R^m is defined componentwise by (Q*(U))_i = U • Q_i, i = 1, ..., m (here,
  and in what follows, M_n denotes the space of n × n real matrices).

- Q# : M_n → R^{m+1} is defined to be

      Q#(U) = ( Q_0 • U )
              ( Q*(U)   ).

- If y ∈ R^{m+1}, with indexing starting at zero,

      Q(y) = Σ_{i=0}^m y_i Q_i.

The following is a gapfree dual semidefinite program, called the Extended Lag-
range-Slater Dual (ELSD), for SDP-I.

    inf    (U + W_m) • Q_0
    s.t.   Q*(U + W_m) = c
           Q#(U_i + W_{i−1}) = 0,   i = 1, ..., m              (ELSD)
           U_i ⪰ W_i W_i^T,         i = 1, ..., m
           U ⪰ 0
           W_0 = 0.

Note that the constraint U_i ⪰ W_i W_i^T can alternately be written as

    [ I     W_i^T ]
    [ W_i   U_i   ]  ⪰  0,

and consequently ELSD is a semidefinite program. The domains of the different
variables are given by U ∈ S_n, U_i ∈ S_n ∀ i = 1, ..., m, and W_i ∈ M_n ∀ i = 1, ..., m
(and we use an auxiliary matrix variable W_0 = 0 for notational convenience). The
size of ELSD is easily seen to be polynomial in the size of the primal problem SDP-I.
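The Schur-complement equivalence invoked here is easy to exercise numerically; the
following throwaway NumPy check (random data, not part of the original text)
confirms that U_i ⪰ W_i W_i^T holds exactly when the block matrix above is positive
semidefinite.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 4

    def is_psd(A, tol=1e-9):
        return np.linalg.eigvalsh((A + A.T) / 2).min() >= -tol

    def block(W, U):
        # the block matrix [[I, W^T], [W, U]] from the ELSD reformulation
        I = np.eye(W.shape[1])
        return np.block([[I, W.T], [W, U]])

    W = rng.standard_normal((n, n))
    U_good = W @ W.T + np.eye(n)          # satisfies U >= W W^T (strictly)
    U_bad = W @ W.T - 0.5 * np.eye(n)     # violates U >= W W^T

    print(is_psd(U_good - W @ W.T), is_psd(block(W, U_good)))   # True True
    print(is_psd(U_bad - W @ W.T), is_psd(block(W, U_bad)))     # False False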

The duality theorem for ELSD is given below, wherein (U, W) is said to be dual
feasible if these matrices, along with some U_i, W_i, i = 1, ..., m, where W_m = W,
satisfy the constraints of the dual program ELSD.

Theorem 9.2.1 (Duality Theorem) The following hold for the primal problem
SDP-I and the dual problem ELSD:

1. (Weak Duality) If x is primal feasible and (U, W) is dual feasible, then
   c^T x ≤ (U + W) • Q_0.

2. (Primal Boundedness) If the primal is feasible, then its optimal value is finite
   if and only if the dual ELSD is feasible.

3. (Zero Gap) If both the primal and the dual ELSD are feasible, then the optimal
   values of these two programs are equal.

4. (Dual Attainment) Whenever the common optimal value of the primal and
   ELSD is finite, the latter attains this value.
In [83], connections between the minimal cone based approach and ELSD were dis-
cussed. Furthermore, the extended dual of the standard SDP in equality form, i.e.,
SDP-E, was also given. In the recent work [78], the Lagrangian dual (or standard
dual) of ELSD has been considered. After some reformulation, the standard dual of
ELSD, in the variables z ∈ R^m, R_i ∈ S_n and y(i) ∈ R^{m+1}, i = 1, ..., m, takes the
form given below.

    sup    c^T z − Σ_{i=1}^m R_i • I
    s.t.   Q(z) ⪯ Q_0
           [ R_i          Q(y(i+1)) ]
           [ Q(y(i+1))    Q(y(i))   ]  ⪰  0,   i = 1, ..., m−1          (P2)
           [ R_m            Q_0 − Q(z) ]
           [ Q_0 − Q(z)     Q(y(m))    ]  ⪰  0.
In any feasible solution of P2, the z part is also feasible for SDP-I, and every R_i is
positive semidefinite. Therefore, it follows that the optimal value of P2 is at most
that of SDP-I. In [78], it was shown that these are actually equal. Since the Lagrangian
dual of P2 is again ELSD, it follows that the SDG (standard duality gap) of P2 is zero.
Thus, starting with an arbitrary SDP, one can obtain another (polynomial size) SDP
with the same optimal value and whose SDG is zero. For this reason, we will call
the problem P2 the corrected primal of the semidefinite program SDP-I. The
corrected primal of SDP-E can be developed in a similar way. Now, in order to
develop interior point methods (or other complexity bounded algorithms) for the
most general SDPs, one may assume without loss of generality that the SDP at hand
(which may be taken to be in either SDP-E or SDP-I form) has zero standard duality
gap. Note, however, that one still cannot assume that the Slater condition is satisfied,
which raises the possibility of developing infeasible interior point methods in this
framework.

Finally, certain analytical aspects of SDP have been studied in [52] and [85].

9.3 ALGORITHMS AND COMPLEXITY


9.3.1 An Overview of Known Complexity
Results
Let Q_i, i = 0, ..., m, be given rational symmetric matrices, let c be a rational vector,
and let

    G = {x | Q(x) ⪯ Q_0}

be the feasible region of SDP-I.
By applying ellipsoid and interior point methods, one can deduce the following com-
plexity results for SDP. The maximum of the bitlengths of the entries of the Q_i and
the components of c will be denoted by L, and for ε > 0 define

    S(G, ε) = G + B(0, ε)   and   S(G, −ε) = {x | B(x, ε) ⊆ G}.


• If a positive integer R is known a priori such that either G = ∅ or G ∩ B(0, R) ≠ ∅,
  then there is an algorithm that solves the "weak optimization" problem; i.e., for
  any rational ε > 0, the algorithm either finds a point y ∈ S(G, ε) that satisfies
  c^T x ≤ c^T y + ε ∀ x ∈ S(G, −ε), or asserts that S(G, −ε) is empty ([30]). The
  complexity of the algorithm is polynomial in n, m, L, and log(1/ε).

• There are algorithms which, given any rational ε > 0 and an x_0 such that
  Q_0 − Q(x_0) ≻ 0, compute a rational vector x such that Q_0 − Q(x) ≻ 0 and
  c^T x is within an additive factor ε of the optimum value of the SDP. The arithmetic
  complexity of these algorithms is polynomial in n, m, L, log(1/ε), log(R) and the
  bitlength of x_0, where R is an integer such that the feasible region of the SDP
  lies inside the ball of radius R around the origin ([3], [59]). However, it should be
  mentioned that a polynomial bound has not been established for the bitlengths
  of the intermediate numbers occurring in these algorithms.

• For any fixed m, there is a polynomial time algorithm (in n, L) that checks
  whether there exists an x such that Q(x) ≻ 0, and if so, computes such a vector
  ([75]). For the nonstrict case Q(x) ⪰ 0, feasibility can be verified in polynomial
  time for the fixed dimensional problem, as shown in [72].

9.3.2 Interior Point Methods


The development of IPMs for SDP is currently an extremely active research area.
The reader is referred to the surveys [3] and [92] for extensive details. Below, we
describe, in a somewhat cursory fashion, some of the specific interior point algorithms
that have been developed.

At the outset, we emphasize the facts that these methods deal with the computation
of approximate optimal solutions only and that no bitlength analysis has been carried
out by any of the authors.

The main feature that enables one to extend LP interior point methods to SDP is
the fact that the logarithm of the determinant function serves as a barrier function
for SDP. Its self-concordance was established and used by Nesterov and Nemirovskii
[59] in developing barrier methods for SDP. In [1] and [3], a potential reduction

algorithm was developed based on Ye's projective algorithm for LP [96]. Alizadeh
([1]) also pointed out the striking similarity between LP and SDP and suggested
a mechanical way of extending results from LP to SDP. In [40], Jarre developed a
barrier method. More potential reduction methods are given in [92]. In [35], a con-
vergent and easily implementable method was given (a matlab code is available at
the ftp site ftp://orion.uwaterloo.ca/pub/henry/software). A primal-dual method
was presented in [4]. In [60] and [61], Nesterov and Todd discuss primal-dual meth-
ods for self-scaled cone problems and develop what has come to be known as the
Nesterov-Todd (NT) direction. In a recent work [22], Freund discusses interior-point
algorithms for SDPs in which no regularity (Slater-like) conditions are assumed. A
self-dual skew-symmetric embedding method was presented in [46] for the initializa-
tion of interior point methods for SDP.

Recently, several papers have appeared on interior point methods for SDP, and these
can be obtained from the interior point archive maintained at the Argonne National
Laboratory (WWW URL: http://www.mcs.anl.gov/home/otc/InteriorPoint/index.html).
Some details on these results follow.

The primal-dual central path is defined as the set of solutions (U(μ), x(μ), S(μ)),
μ > 0, of the system

    U • Q_i = c_i,  ∀ i = 1, ..., m,        U ⪰ 0
    Σ_{i=1}^m x_i Q_i + S = Q_0,            S ⪰ 0            (SDP-Path)
    US = μI.

If we assume that the matrices Q_i, i = 1, ..., m, are independent, then for each μ > 0
the solution (U(μ), x(μ), S(μ)) is unique. If we have a solution with μ = 0, then we
have an optimal solution pair with duality gap zero. Given an interior primal-dual
solution (U, x, S) that satisfies the first two of the above requirements and satisfies
the last one approximately (an "approximately centered" solution), the search
direction (ΔU, Δx, ΔS) in primal-dual methods is derived by solving the following
Newton system:

    ΔU • Q_i = 0,  ∀ i = 1, ..., m
    Σ_{i=1}^m Δx_i Q_i + ΔS = 0                              (SDP-Newt)
    ΔU S + U ΔS = μI − US.

Under the usual mild assumptions, the solution of the above system is unique. The
ΔS part is symmetric, while the ΔU part is not. One therefore symmetrizes this part
and determines a step length α to obtain the new iterate (U + αΔU, x + αΔx, S + αΔS).
The procedure is then repeated while the parameter μ is driven to zero.
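To make the linear algebra behind SDP-Newt concrete, here is a minimal NumPy
sketch (the function name and data layout are our own, and it assumes S is
nonsingular and the Q_i are independent): it eliminates ΔS and ΔU, solves an m × m
system for Δx, and then symmetrizes ΔU as described above.

    import numpy as np

    def newton_direction(Q, U, x, S, mu):
        # Solve SDP-Newt for (dU, dx, dS).  Q = [Q_0, Q_1, ..., Q_m], all symmetric.
        # From the third equation, dU = (mu*I - U S + U Q(dx)) S^{-1}; substituting
        # this into dU . Q_i = 0 leaves an m x m linear system for dx.
        n = Q[0].shape[0]
        m = len(Q) - 1
        S_inv = np.linalg.inv(S)
        R = mu * np.eye(n) - U @ S                  # residual  mu*I - US

        M = np.empty((m, m))
        r = np.empty(m)
        for i in range(1, m + 1):
            r[i - 1] = -np.trace(R @ S_inv @ Q[i])
            for j in range(1, m + 1):
                M[i - 1, j - 1] = np.trace(U @ Q[j] @ S_inv @ Q[i])
        dx = np.linalg.solve(M, r)

        dS = -sum(dx[j - 1] * Q[j] for j in range(1, m + 1))
        dU = (R - U @ dS) @ S_inv                   # = (mu*I - US + U Q(dx)) S^{-1}
        dU = (dU + dU.T) / 2                        # symmetrize the U-direction
        return dU, dx, dS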

If the candidate solution (U, x, S) does not satisfy the first two requirements of SDP-
Path, then we enter the domain of infeasible interior point methods. In this case,
the right hand sides of the first two Newton equations in SDP-Newt are not zero,
but instead they equal the current primal and dual infeasibility, respectively. These
methods simultaneously reduce the infeasibility and μ.

The papers [59, 92] deal with potential reduction methods. Much work has recently
been done on primal-dual central path following algorithms. Detailed study of search
directions can be found in [48, 56]. The properties of central trajectories are studied
in detail in the papers [59, 27, 20, 86].

The so-called primal or dual logarithmic-barrier path-following methods are gener-


alized by Faybusovich [20] (general analysis), de Klerk et al. [34] (full-step meth-
ods with local quadratic convergence) and Anstreicher and Fampa [7] (large update
method). Primal-dual path following methods were independently developed by
Sturm and Zhang [87] (a full-step primal-dual algorithm, predictor-corrector algo-
rithm and the largest-step method) and Jiang [41] (long-step primal-dual logarithmic
barrier algorithm of Jansen et al. [38]).

An infeasible interior point method for SDP was developed by Potra and Sheng
[71]. This method is based on the Lagrange dual. It would be an interesting result
to develop infeasible interior point methods based on the ELSD duality approach
(see below). Interior point methods for monotone semidefinite complementarity
problems have been developed by Shida and Shindoh [73]. They prove that the
central trajectory converges to the analytic center of the optimal set. Further, they
prove global convergence of an infeasible interior point algorithm for the monotone
semidefinite complementarity problem.

With the exception of [22], most of the methods mentioned above make an explicit
assumption that the primal and/or the dual have a strictly feasible solution. As
mentioned in §9.2.2, it seems that infeasible interior point methods can be developed
using the gapfree dual ELSD and the "corrected primal" problem P2. The suitability
of infeasible IPMs for this situation can be justified as follows. Some difficulties with
initialization can be circumvented using a corrected primal based infeasible IPM
approach. Unlike in the case of LP, "Phase 1" type initialization can run into
difficulties for SDP. For instance, for the SDP-I problem, consider the "Phase 1"
problem: inf{z_0 | Q(x) ⪯ Q_0 + z_0 I, z_0 ≥ 0}. It may happen here that the infimum
is zero without being attained. No satisfactory "Big-M" method has been devised
for SDP (based on the examples of "ill-behaved SDPs" in [76], it is our conjecture
that M would need to be exponentially large in bitlength here). Also, even if the
initialization step is somehow carried out, there are instances of SDPs for which all
rational solutions are exponential in bitlength, and hence the whole process becomes
inherently exponential, contradicting the initial objective of devising a polynomial
time algorithm, even in an approximate sense.

Open Problem 9.1 Develop an infeasible IPM for general semidefinite programs
using ELSD and the corrected primal P2.

Open Problem 9.2 Perform a bitlength analysis of the interior point methods for
SDP.

We now turn our attention to affine scaling algorithms. The affine scaling linear
programming algorithms have gained tremendous popularity owing to their charming
simplicity. The global convergence properties of these methods (for LP) have been
uncovered relatively recently (see [89] in this volume). In particular, Tsuchiya and
Muramatsu ([90]) proved that when the step length taken is in (0, 2/3], both
primal and dual iterates converge to optimal solutions of the respective problems.
It is not hard to extend the LP affine scaling algorithm to semidefinite programming.
For instance, for the problem SDP-I, let x̄ be a strictly feasible solution and let
P = Q_0 − Σ_{i=1}^m x̄_i Q_i ≻ 0. Then consider the inequality (in the variable x)

    trace((Q_0 − Σ_i x_i Q_i) P^{−2} (Q_0 − Σ_i x_i Q_i)) ≤ 1.

It can be shown that every feasible solution of the above inequality is feasible for
SDP-I. One can easily maximize c^T x over the above ellipsoid and repeat, as in the
standard dual affine scaling method. It remains to be seen whether the proofs of [90]
can be extended to this approach.

Open Problem 9.3 Prove the global convergence of the above affine scaling method.
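For illustration only (a sketch under assumptions, not the chapter's exact
construction), the following NumPy snippet performs one step of a Dikin-type dual
affine scaling iteration for SDP-I: it maximizes c^T Δx over the centered ellipsoid
||P^{−1/2} Q(Δx) P^{−1/2}||_F ≤ 1, whose maximizer is Δx = M^{−1}c / (c^T M^{−1}c)^{1/2} with
M_ij = trace(P^{−1} Q_i P^{−1} Q_j); the data layout and damping fraction are assumptions.

    import numpy as np

    def affine_scaling_step(Q, c, x, frac=0.5):
        # One Dikin-type dual affine scaling step for (SDP-I), under the assumption
        # stated above.  Q = [Q_0, ..., Q_m]; x must be strictly feasible, i.e.
        # P = Q_0 - sum_i x_i Q_i is positive definite.
        m = len(Q) - 1
        P = Q[0] - sum(x[i - 1] * Q[i] for i in range(1, m + 1))
        P_inv = np.linalg.inv(P)
        # M_ij = trace(P^{-1} Q_i P^{-1} Q_j) is the Dikin-ellipsoid metric.
        M = np.array([[np.trace(P_inv @ Q[i] @ P_inv @ Q[j])
                       for j in range(1, m + 1)] for i in range(1, m + 1)])
        d = np.linalg.solve(M, c)
        dx = d / np.sqrt(c @ d)        # maximizer of c^T dx over the unit ellipsoid
        return x + frac * dx           # frac < 1 keeps the next iterate strictly feasible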

The primal-dual affine scaling algorithms (both the Dikin-affine scaling of Jansen et
al. [39] and the classical primal-dual affine scaling algorithm of Monteiro et al. [57])
have been generalized by de Klerk et al. [45]. The iteration complexity results are
analogous to the LP case.

We will briefly mention non-interior point methods for SDP. In [63], Overton
discusses an active set type method. In [75] (see [81]), and later independently in
[85], a notion of the convexity of a matrix map was introduced. Using this, one can
define what may be called a "convex nonlinear SDP". In [81] a Newton-like method
was developed for convex nonlinear semidefinite inequality systems, and in [85]
certain sensitivity results have been derived. While attempts towards extending
the LP simplex method to SDP have been made ([49, 70]), we consider that this
problem remains unsolved. Also, since an SDP can be treated as a nondifferentiable
convex optimization problem (NDO), most NDO algorithms can be applied to solve
semidefinite programs. See [67] for interior point methods for global optimization
problems, which solve some SDP relaxations in a disguised form.

Open Problem 9.4 Develop an appropriate (globally convergent) extension of the


simplex method to SDP.

9.3.3 Feasibility and Complexity in SDP


In this section, we address the issue of exact complexity of Semidefinite Program-
ming. As is well known, every rational linear inequality system that is feasible has a
rational solution of polynomial bitlength. In sharp contrast, the following situations
can occur for a feasible rational semidefinite inequality:

• it only has irrational solutions

• all its rational solutions have exponential bitlength.

Many such examples are discussed in [22] and [76]. Therefore, rigorously speaking,
it is not a well stated problem to want to compute an exact optimal solution of an
arbitrary rational SDP, since the output is not representable in the Turing machine
model. Let us consider the feasibility problem defined below.

Definition 9.3.1 (Semidefinite Feasibility Problem (SDFP)) Given rational
symmetric matrices Q_0, ..., Q_m, determine whether the semidefinite system

    Σ_{i=1}^m x_i Q_i ⪯ Q_0

is feasible.

Note that the required output of this problem is a "Yes" or a "No" (decision prob-
lem). Therefore, it is reasonable to ask whether there is a polynomial time algorithm
for the solution of SDFP. In our opinion, this is the most challenging and outstanding
problem in semidefinite programming, at least in the context of complexity theory.

Open Problem 9.5 Determine whether the problem SDFP is NP-Hard, or else
find a polynomial time algorithm for its solution.

In [76], the following results concerning the exact complexity of SDP are established.

1. If SDFP ∈ NP, then SDFP ∈ co-NP, and vice versa.

2. In the Turing machine model [25], SDFP is not NP-complete unless NP = co-NP.

3. SDFP is in NP ∩ co-NP in the real number model of Blum, Shub and Smale
   [12].
4. There are polynomial time reductions from the following problems to SDFP:
(a) Checking whether a feasible SDP is bounded (i.e., it has a finite optimal
value).
(b) Checking whether a feasible and bounded SDP attains the optimum.
(c) Checking the optimality of a given feasible solution.

In [72], the authors discuss complexity results for fixed dimensional SDPs (both
n-fixed and m-fixed cases), extending and strengthening certain results of [75].

9.4 APPLICATIONS
Applications of semidefinite programming can be broadly classified into three groups:

• SDP as a relaxation of nonconvex problems, in particular, mathematical pro-


grams involving quadratic functions.
• Combinatorial Optimization applications.
• Direct SDP models, arising in some engineering problems.

Also, as seen earlier, SDP generalizes linear and convex quadratic programming and,
more generally, convex quadratic programming with convex quadratic constraints.
Since the latter has not been extensively studied by itself, most of its applications
(which arise in certain facility location problems, as studied in [18]) can also be
considered to be applications of SDP.

Semidefinite programming can be arrived at naturally (see [75]) by relaxing Mul-
tiquadratic Programs (MQP), which are optimization problems of the type given
below.
    min    x^T Q_0 x + 2 b_0^T x + c_0
    s.t.   x^T Q_i x + 2 b_i^T x + c_i = 0,   ∀ i = 1, ..., m.         (MQP)

By introducing a new matrix variable U and imposing the additional constraint
U = xx^T, we can rewrite the above problem as

    min    U • Q_0 + 2 b_0^T x + c_0
    s.t.   U • Q_i + 2 b_i^T x + c_i = 0,   ∀ i = 1, ..., m            (MQP2)
           U − xx^T = 0.

Now, let us relax the condition U − xx^T = 0 to U − xx^T ⪰ 0, to obtain

    min    U • Q_0 + 2 b_0^T x + c_0
    s.t.   U • Q_i + 2 b_i^T x + c_i = 0,   ∀ i = 1, ..., m            (RMQP)
           U − xx^T ⪰ 0.

By the Schur complement, the condition U − xx^T ⪰ 0 is equivalent to

    [ 1    x^T ]
    [ x    U   ]  ⪰  0.
Therefore, the relaxed MQP (RMQP) is a semidefinite program. This SDP relax-
ation of MQP will be referred to as the convexification relaxation of MQP. The
reason is that if f : R^n → R^m is the quadratic map composed of the constraint
functions of the MQP, then the feasibility of that problem can be restated as
0 ∈ f(R^n). On the other hand, it can be shown (see [75]) that the semidefinite
program RMQP is feasible if and only if 0 is in the convex hull of the image, i.e.,
0 ∈ Conv(f(R^n)). This relaxation was originally introduced by Shor [74], although in
a somewhat different form. It is also investigated in [24].
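To see the lifting at work, the small NumPy sketch below (with made-up quadratic
data, not taken from the chapter) checks that any MQP-feasible x yields an
RMQP-feasible pair (U, x) = (xx^T, x), using the Schur-complement form above.

    import numpy as np

    rng = np.random.default_rng(1)
    n, m = 3, 2

    def sym(A):
        return (A + A.T) / 2

    # Made-up MQP constraint data: x^T Q_i x + 2 b_i^T x + c_i = 0, i = 1, ..., m.
    Qs = [sym(rng.standard_normal((n, n))) for _ in range(m)]
    bs = [rng.standard_normal(n) for _ in range(m)]
    x = rng.standard_normal(n)
    cs = [-(x @ Qs[i] @ x + 2 * bs[i] @ x) for i in range(m)]  # make this x feasible

    U = np.outer(x, x)   # the lift U = x x^T
    for i in range(m):
        # the lifted linear constraint U . Q_i + 2 b_i^T x + c_i = 0 holds exactly
        print(abs(np.sum(U * Qs[i]) + 2 * bs[i] @ x + cs[i]) < 1e-10)   # True

    # Schur-complement form of U - x x^T >= 0 (here it holds with equality):
    Z = np.block([[np.ones((1, 1)), x[None, :]], [x[:, None], U]])
    print(np.linalg.eigvalsh(Z).min() >= -1e-10)                        # True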

We will return to the connections between MQP and SDP after discussing some
results on the application of SDP to combinatorial optimization.

9.4.1 Combinatorial Optimization


Stable Set Problems and Perfect Graphs
One of the early works in semidefinite programming emanated in the context of
certain graph optimization problem such as the Maximum Stable Set (MSS) and
related problems.

A clique (resp. stable set) in a graph G = (V, E) is a subset S of V in which every
pair of nodes is adjacent (resp. nonadjacent). The problem MSS is that of finding
the largest stable set in G. Let STAB(G) denote the convex hull of the characteristic
vectors of the stable sets of G. If u, v are the characteristic vectors of a clique and
a stable set in G, we have the inequality u^T v ≤ 1. This implies that the polyhedron

    QSTAB(G) = {x ≥ 0 | x^T u ≤ 1 ∀ characteristic vectors u of cliques of G}

contains STAB(G). Now, note that the problem of finding a maximum stable set is
equivalent to each of the following problems:

1. maximize e^T x over STAB(G);

2. maximize e^T x where x satisfies x_i x_j = 0 ∀ (i, j) ∈ E and x_i ∈ {0, 1} ∀ i ∈ V.

Note that the second of these problems is a multiquadratic program, and hence we
apply the convexification relaxation to it. Accordingly, we define the spectrahedron

    S(G) := {(U, x) | x ≥ 0, U ⪰ xx^T, U_ii = x_i ∀ i ∈ V, U_ij = 0 ∀ (i, j) ∈ E}.

This spectrahedron can be projected onto the x variables to get the following set,
defined in, for instance, [30]:

    TH(G) = {x | ∃ U such that (U, x) ∈ S(G)}.

It is not hard to show ([30], [54]) that

    STAB(G) ⊆ TH(G) ⊆ QSTAB(G),

and therefore, as a relaxation of MSS, one can maximize e^T x over TH(G), which is
an SDP in the variables x and U.

For general graphs, not much is known about the effectiveness of the above relax-
ation. However, for a class of graphs known as perfect graphs, the relaxation is
exact. We will circumvent the usual combinatorial definition of perfect graphs since,
for our purposes, it suffices to define these graphs as those for which STAB(G) =
QSTAB(G). Clearly, in this case, all three sets STAB(G), TH(G) and QSTAB(G)
coincide. Thus, one can approximately maximize e^T x over TH(G) by the use of a
polynomial approximation algorithm for SDP. For techniques that extract discrete
solutions from this approximation, the reader is referred to [30] and [1]. Further-
more, when G is perfect, the following additional problems can be solved using this
methodology:

• Find the largest clique in G.



• Find the smallest number of colors required to color the vertices of G such that
every pair of adjacent vertices receive different colors.

In [1], a sublinear time parallel algorithm was presented for solving the stable set
and other problems for perfect graphs. The reader is also referred to the expository
article [47] by Knuth on this approach.

Finally, we turn to the problem of characterization and recognition of perfect graphs.


The definition of perfect graphs (STAB(G) = QSTAB(G)) involves two polyhedra
whose descriptions involve the set of all maximal cliques and maximal stable sets of
G. Since these may be exponential in number, the above definition does not seem
to yield directly a recognition algorithm for perfect graphs. For that matter, even if
the number of cliques and stable sets is polynomial, no polynomial time algorithm
is known for solving this type of problem (see [55]). However, the following was
shown in [30].

Proposition 9.4.1 A graph G is perfect if and only if TH(G) is a polyhedron.

This proposition may be useful both in addressing the complexity of perfect graph
recognition and in settling what might be considered the most celebrated and
yet unresolved conjecture in graph theory, which states that a graph G is perfect
if and only if neither G nor its complement induces an odd cycle of size at least 5.

Open Problem 9.6 Characterize the polyhedrality of TH(G).

Since TH(G) is a projected spectrahedron, it is natural to ask about the com-
plexity of verifying the polyhedrality of an arbitrary projected spectrahedron. Un-

plexity of verifying the polyhedrality of an arbitrary projected spectrahedron. Un-
fortunately, this general problem turns out to be NP-Hard as shown in an upcoming
paper ([77]) by Ramana. There, it was also shown that, under an irredundancy
assumption, the verification of polyhedrality of a spectrahedron can be done in ran-
domized polynomial time. This latter result, however, does not seem to extend easily
to projected spectrahedra such as TH(G).

The Maximum Cut Problem


Let us now turn our attention to another celebrated combinatorial optimization
problem, the Maximum Cut Problem (abbreviated MAXCUT): given a set of
nonnegative weights w_ij, 1 ≤ i < j ≤ n, the problem is to determine a partition
S ∪ S̄ of the set N = {1, ..., n} that maximizes Σ_{i∈S, j∈S̄} w_ij. This problem can be
modeled as the quadratic integer program given below, where W is the matrix of
weights and J is the matrix of all ones.

    max    W • J − y^T W y
    s.t.   y_i ∈ {−1, +1}  ∀ i.                                (MAXCUT)

Note that y_i ∈ {−1, +1} is equivalently written as y_i² = 1. In [26], the following
SDP relaxation of MAXCUT was considered:

    max    W • J − W • U
    s.t.   U ⪰ 0                                               (GWR)
           U_ii = 1  ∀ i.
It is not hard to see that this is nothing but the convexification relaxation of the
MQP form of MAXCUT. Let us call it the Goemans-Williamson Relaxation (GWR)
of the maximum cut problem. The remarkable results of [26] are the following.

1. The optimal objective value of GWR is at most 1.14 times that of MAXCUT.

2. From an optimal solution to GWR, a cut whose expected value is at least .878
   times the optimal cut value can be obtained using randomization.
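The randomized rounding in item 2 is simple to state in code. The sketch below is
illustrative only: it takes a feasible matrix U of GWR as given (rather than computing
an optimal one), factors U = V^T V with unit-norm columns, and cuts by a random
hyperplane as in [26].

    import numpy as np

    def random_hyperplane_cut(U, rng):
        # Rounding of a correlation matrix U (U >= 0, U_ii = 1): factor U = V^T V
        # with unit-norm columns v_1, ..., v_n, draw a random direction r, and
        # assign y_i = sign(r^T v_i).
        w, P = np.linalg.eigh(U)
        V = (P * np.sqrt(np.clip(w, 0.0, None))).T   # columns satisfy v_i^T v_j = U_ij
        r = rng.standard_normal(V.shape[0])
        y = np.sign(r @ V)
        y[y == 0] = 1.0
        return y

    def cut_value(W, y):
        # weight of the cut induced by y in +/-1 form (W symmetric, zero diagonal)
        return 0.25 * (np.sum(W) - y @ W @ y)

    rng = np.random.default_rng(2)
    # Tiny illustrative instance: a 4-cycle with unit weights.
    W = np.zeros((4, 4))
    for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
        W[i, j] = W[j, i] = 1.0
    U = np.eye(4)                                    # a (non-optimal) GWR-feasible point
    y = random_hyperplane_cut(U, rng)
    print(y, cut_value(W, y))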

The underlying geometric reason behind item 1 above is best described using the
following theorem, formulated by Laurent [51]. First, let C_n denote the convex hull
of all matrices of the form vv^T, where v ∈ {−1, +1}^n. It is clear that the MAXCUT
problem amounts to maximizing a linear function over C_n. Let us return to the
convex set E_n (the elliptope) defined in §9.2.1. The main geometrical result
concerning these sets is given below. For a matrix A and a univariate function f,
f∘(A) denotes the matrix whose (i, j)th entry is f(A_ij).

Theorem 9.4.1  C_n ⊆ E_n ⊆ {sin∘((π/2) U) | U ∈ C_n}.

Furthermore, the following nonlinear semidefinite program has the same objective
function value as MAXCUT (see [26]):

    (1/π) max{ W • (arccos∘(U)) | U ∈ E_n }.

As mentioned above, to obtain an approximately optimal cut, a randomized rounding


technique is applied, and then the entire algorithm is derandomized. However, it
seems to be an interesting question to ask if there is a direct deterministic procedure
that achieves the same. In particular, such a method might make use of semidefinite
programming duality. We state this as an open problem.

Open Problem 9.7 Find a deterministic procedure for obtaining an approximate


maximum cut from an optimal solution to the semidefinite relaxation GWR.

Other Combinatorial Optimization Problems


Several other applications of SDP to combinatorial optimization have recently been
developed. The following is a list of some of these results.

1. In [26], the authors extend their analysis for MAX CUT to derive strong ap-
proximation results for the following problems: MAX SAT, MAX 2SAT, MAX
DICUT (the first two problems are related to the Satisfiability Problem, and
the last is a directed version of the MAX CUT problem).
2. In [43], an approximate graph coloring algorithm was developed.
3. Extensions of the Goemans-Williamson approach to the max-k-cut problem are
given in [23].
4. Here are some results that are somewhat negative concerning the application
of SDP to combinatorial optimization. Recently, Kleinberg and Goemans [44]
have shown that certain SDP relaxations of the vertex cover problem have a
worst case performance guarantee of only 2 (in the limit) coinciding with what
the standard LP relaxation guarantees. A similar result for the independent set
problem has been established by Alon and Kahale [6].

A topic that was studied well before SDP became popular was that of PSD comple-
tions of partially specified matrices. An early and well-written paper is [28]. These
problems involve determinant maximization subject to semidefinite constraints and
a recent reference to these is [93].

We would like to mention that the well known Graph Isomorphism Problem might
perhaps be reducible to an SDP feasibility problem. Given two graphs G_1, G_2 with
adjacency matrices A, B, respectively, the graphs are isomorphic if and only if there
exists a permutation matrix X such that A = X^T B X. This can be written as the
MQP

    A = X^T B X,   Xe = e,   X^T e = e,   X ∘ X = X,

where e is the vector of all ones and ∘ gives the entrywise (Hadamard) product of
two matrices. In [82], it was shown that one can relax the condition that X is a
permutation matrix to X being doubly stochastic. This gives the MQP

    A = X^T B X,   Xe = e,   X^T e = e,   0 ≤ X ≤ J,

where J is the matrix of all ones. Whether the convexification relaxation of either
of the above systems is exact appears to be an intriguing question.

Open Problem 9.8 Is Graph Isomorphism reducible to Semidefinite Feasibility?

9.4.2 More on Multiquadratic Programming


We will return to the connections between multiquadratic programming and SDP.
As we have seen, for every MQP, there is a natural SDP relaxation, called the
convexification relaxation. It is natural to ask when this relaxation is exact. This is
quite similar to the question of when LP relaxation is exact for Integer Programming
problems (when the coefficient matrix is unimodular, for instance). We will present
two slightly different ways of addressing this problem. The first of these comes from
the analysis of [75], and the second is inspired by results on perfect graphs and the
work of [54].

Let f : R^n → R^m be a constant-free (i.e., f(0) = 0) quadratic map, and consider
the problem of verifying the feasibility of f(x) = b, or equivalently, b ∈ f(R^n).
As mentioned earlier, the convexification relaxation b ∈ Conv(f(R^n)) reduces to a
semidefinite program. The convexification relaxation is exact for every b ∈ R^m if
and only if f(R^n), the image of f, is convex. This inspires the definition of ICON
maps, which are maps that have convex images. In [80], quadratic ICON maps
were characterized. Unfortunately, as shown in the same paper, the recognition
of ICON maps is NP-Hard. However, the restriction to special classes of quadratic
maps might yield polynomial time recognition results. For the optimization problem
min{f_0(x) | f(x) = 0}, the convexification relaxation can be shown to be exact when
the combined map (f_0(x), f(x)) is ICON.

The second formulation is very closely related to the N_+ operator defined in [54]. Let
f(x) = (f_1(x), ..., f_m(x)), where f_i(x) = x^T Q_i x + b_i^T x + c_i ∀ i = 1, ..., m, and consider
the problem max{c^T x | f(x) = 0}. Then the convexification relaxation is

    max{ c^T x | Q_i • U + b_i^T x + c_i = 0 ∀ i,  U ⪰ xx^T }.

Define the convex set (the projection of a spectrahedron)

    TH[f] = {x | ∃ U such that Q_i • U + b_i^T x + c_i = 0 ∀ i,  U ⪰ xx^T}.

Clearly, TH[f] contains the convex hull of the MQP feasible region Z = {x | f(x) = 0}.
Let us say that f is a perfect quadratic map if

    TH[f] = Conv({x | f(x) = 0}).


Now, let G = (V, E) be a graph and let f_G be the quadratic map composed both of
the |E| components given by x_i x_j, (i, j) ∈ E, and the |V| components given by
x_i² − x_i, i ∈ V. Then it is seen that TH[f_G] is nothing but the usual TH(G) defined
for graphs earlier, and Conv(Z) is precisely STAB(G); hence the perfectness of
the graph G is the same as the perfectness of the quadratic map f_G.

Open Problem 9.9 Characterize the perfectness of quadratic maps.

9.4.3 Engineering and Other Applications


Listed below are certain applications that have been discussed for most part in [14]
and [92].

• Logarithmic Chebychev Approximation ([92]).


• Structural Design problems, such as Truss design are found in [11].
• Pattern separation problems ([92]).
• Statistical applications such as minimum trace factor analysis ([92]).
• Control Theory applications (see [14]).

9.5 CONCLUDING REMARKS


Since Semidefinite Programming came to light four to five years ago, significant
strides have been made in this subject. Theoretical as well as algorithmic advances
continue to be made fairly rapidly. Several open problems and future research direc-
tions were presented in this chapter. We will conclude the chapter by mentioning a
promising generalization of semidefinite programming which was unearthed by Güler
[31].
A real homogeneous polynomial p is said to be hyperbolic with respect to a nonzero
vector d if p(d) > 0 and the univariate (in t) polynomial p(x + td) has only real
roots for every real vector x. Let K(p, d) denote the connected component of
{x | p(x) > 0} that contains d. It is well known that this is an open convex cone,
called the hyperbolicity cone of p (in the direction d). Now let us define a Hyperbolic
Program (HP) to be one of the type

    max{ c^T x | L(x) ∈ Cl(K) },

where L(x) is an affine map of x. Hyperbolic programs generalize semidefinite pro-
grams. To see this, let p(U) be the determinant polynomial of a symmetric matrix
variable U. Then the roots (in t) of p(U + tI) = 0 are precisely the negatives of the
eigenvalues of the matrix U, which are real. Furthermore, p(I) = 1 > 0, and hence,
in this case, the hyperbolicity cone K(p, I) is simply the cone of positive definite
matrices, whose closure is the cone of PSD matrices. Taking the affine map L(x) to
be Q_0 − Q(x), we recover the semidefinite program SDP-I.

In [31], Güler discusses the existence of barrier functions for problems of this type.
We strongly believe that many results that are known for SDP, such as interior point
methods and duality theories (both standard and ELSD duals) can be extended to
hyperbolic programs.

Acknowledgements
The first author Ramana would like to thank Laci Lovász, Jim Renegar and Rob
Freund for several interesting discussions on SDP, and Don Hearn for support and
encouragement.

REFERENCES
[1] F. ALIZADEH, Combinatorial Optimization with Interior Point Methods and
Semi-Definite Matrices, Ph.D. Thesis, Computer Science Department, Univer-
sity of Minnesota, Minneapolis, Minnesota, 1991.

[2] F. ALIZADEH, Optimization Over Positive Semi-Definite Cone; Interior-Point


Methods and Combinatorial Applications, In "Advances in Optimization and
Parallel Computing", P.M. Pardalos, editor, North-Holland, 1992.

[3] F. ALIZADEH, Interior Point Methods in Semidefinite Programming with Ap-


plications to Combinatorial Optimization, SIAM J. Opt., Vol. 5 (1995), pp.
13-51.

[4] F. ALIZADEH, J.A. HAEBERLY AND M. OVERTON, Primal-dual Interior Point


Methods for Semidefinite Programming, Manuscript, 1994.

[5] F. ALIZADEH, J.-P. A. HAEBERLY AND M.L. OVERTON, Complementarity


and Nondegeneracy in Semidefinite Programming, Submitted to Math. Pro-
gramming, March 1995.

[6] N. ALON AND N. KAHALE, Approximating the Independence Number via the
θ-function, Manuscript, 1995.

[7] K.M. ANSTREICHER AND M. FAMPA, A Long-step Path Following Algorithm


for Semidefinite Programming Problems, Working Paper, Department of Man-
agement Sciences, University of Iowa, Iowa City, USA, 1996.

[8] G.P. BARKER, The lattice of faces of a finite dimensional cone, Linear Algebra
and its Applications, Vol. 7 (1973), pp. 71-82.

[9] G.P. BARKER AND D. CARLSON, Cones of Diagonally dominant matrices, Pa-
cific J. of Math, Vol. 57 (1975), pp. 15-32.

[10] T. BECKER AND V. WEISPFENNING (WITH H. KREDEL), Gröbner Bases: A


Computational Approach to Commutative Algebra, Springer-Verlag, New York,
1993.

[11] A. BEN-TAL AND M.P. BENDSØE, A New Method for Optimal Truss Topology
Design, SIAM J. Optim., Vol. 3 (1993), pp. 322-358.

[12] L. BLUM, M. SHUB AND S. SMALE, On a Theory of Computation and Com-


plexity over the Real Numbers: NP-Completeness, Recursive Functions and Uni-
versal Machines, Bull. (New Series) of the AMS, Vol. 21 (1989), pp 1-46.

[13] J. BORWEIN AND H. WOLKOWICZ, Regularizing the Abstract Convex Program,


J. Math. Anal. Appl., Vol. 83(1981).

[14] S. BOYD, L. EL GHAOUI, E. FERON AND V. BALAKRISHNAN, Linear Matrix


Inequalities in System and Control Theory, Volume 15 of Studies in Applied
Mathematics, SIAM, Philadelphia, PA, 1994.

[15] A. BRØNDSTED, An Introduction to Convex Polytopes, Springer-Verlag, New
York, 1983.

[16] D. Cox, J. LITTLE AND D. O'SHEA, Ideals, Varieties, and Algorithms,


Springer-Verlag, New York, 1992.

[17] V. CHVÁTAL, Linear Programming, W.H. Freeman and Co., New York, 1983.
[18] J. ELZINGA, D.W. HEARN AND W. RANDOLPH, Minimax Multifacility Lo-
cation with Euclidean Distances, Transportation Science, Vol. 10, (1976), pp.
321-336.
[19] L. FAYBUSOVICH, On a Matrix Generalization of Affine-scaling Vector Fields,
SIAM J. Matrix Anal. Appl., Vol. 16 (1995), pp. 886-897.

[20] L. FAYBUSOVICH, Semi-definite Programming: a Path-following Algorithm for


a Linear-quadratic Functional, Technical Report, Dept. of Mathematics, Uni-
versity of Notre Dame, Notre Dame, IN, USA, 1995.
[21] R. FLETCHER, Semi-definite Matrix Constraints in Optimization, SIAM J. on
Control and Optimization, Vol. 23 (1985), pp. 493-513.
[22] R. FREUND, Complexity of an Algorithm for Finding an Approximate Solution
of a Semi-Definite Program, with no Regularity Condition, Working Paper, OR
302-94, ORC, MIT, 1994.
[23] A. FRIEZE AND M. JERRUM, Improved Approximation Algorithms for MAX-
k-CUT and MAX BISECTION, To appear in SIAM Journal of Discrete Math-
ematics, 1996.
[24] T. FUJIE AND M. KOJIMA, Semidefinite Programming Relaxation for Noncon-
vex Quadratic Programs, To appear in J. of Global Optimization, 1996.
[25] M.R. GAREY AND D.S. JOHNSON, Computers and Intractability: A Guide to
the Theory of NP-Completeness, W.H. Freeman and Company, New York, 1979.
[26] M.X. GOEMANS AND D.P. WILLIAMSON, Improved Approximation Algorithms
for Maximum Cut and Satisfiability Problems Using Semidefinite Programming,
J. ACM, Vol. 42 (1995), pp. 1115-1145.
[27] D. GOLDFARB AND K. SCHEINBERG, Interior Point Trajectories in Semidefinite
Programming, Working Paper, Dept. of IEOR, Columbia University, New York,
NY, 1996.
[28] R. GRONE, C.R. JOHNSON, E.M. SA AND H. WOLKOWICZ, Positive Definite
Completions of Partial Semidefinite Matrices, Linear Algebra and its Applica-
tions, Vol. 58 (1984), pp. 109-124.
[29] M. GRÖTSCHEL, L. LOVÁSZ AND A. SCHRIJVER, Polynomial Algorithms for
Perfect Graphs, Annals of Discrete Mathematics 21, C. Berge and V. Chvatal,
eds., North Holland, 1984.
[30] M. GRÖTSCHEL, L. LOVÁSZ AND A. SCHRIJVER, Geometric Algorithms and
Combinatorial Optimization, Springer-Verlag, Berlin, 1988.
[31] O. GÜLER, Hyperbolic Polynomials and Interior Point Methods for Convex Pro-
gramming, Technical report TR95-40, Dept. of Math. and Stat., University of
Maryland, Baltimore County, Baltimore, MD 21228.
[32] J.-P. A. HAEBERLY AND M.L. OVERTON, Optimizing Eigenvalues of Symmet-
ric Definite Pencils, Proceedings of American Control Conference, Baltimore,
July 1994.

[33] J.-P. A. HAEBERLY AND M.L. OVERTON, A Hybrid Algorithm for Optimizing
Eigenvalues of Symmetric Definite Pencils, SIAM J. Matr. Anal. Appl., Vol. 15
(1994), pp. 1141-1156.
[34] B. HE, E. DE KLERK, C. Roos AND T. TERLAKY, Method of Approximate
Centers for Semi-definite Programming, Technical Report 96-27, Faculty of
Technical Mathematics and Computer Science, Delft University of Technology,
Delft, The Netherlands, 1996.
[35] C. HELMBERG, F. RENDL, R. VANDERBEI AND H. WOLKOWICZ, An Interior-
point Method for Semidefinite Programming, To appear in SIAM J. Optim.,
1996.
[36] R.B. HOLMES, Geometric Functional Analysis and its Applications, Springer-
Verlag, New York, 1975.
[37] R. HORN AND C.R. JOHNSON, Matrix Analysis, Cambridge University Press,
Cambridge, 1985.
[38] B. JANSEN, C. Roos, T. TERLAKY AND J .-PH. VIAL, Primal-dual Algorithms
for Linear Programming Based on the Logarithmic Barrier Method, Journal of
Optimization Theory and Applications, Vol. 83 (1994), pp. 1-26.
[39] B. JANSEN AND C. Roos AND T. TERLAKY, A Family of Polynomial Affine
Scaling Algorithms for Positive Semi-definite Linear Complementarity Prob-
lems, Technical Report 93-112, Faculty of Technical Mathematics and Com-
puter Science, Delft University of Technology, Delft, The Netherlands, 1993.
(To appear in SIAM Journal on Optimization).
[40] F. JARRE, An Interior Point Method for Minimizing the Maximum Eigenvalue
of a Linear Combination of Matrices, Report SOL 91-8, Dept. of OR, Stanford
University, Stanford, CA, 1991.
[41] J. JIANG, A Long Step Primal Dual Path Following Method for Semidefinite
Programming, Technical Report 96009, Department of Applied Mathematics,
Tsinghua University, Beijing 100084, China, 1996.
[42] D.S. JOHNSON, C.H. PAPADIMITRIOU AND M. YANNAKAKIS, How Easy is
Local Search?, Journal of Comp. Sys. Sci., Vol. 37 (1988), pp. 79-100.
[43] D. KARGER, R. MOTWANI AND MADHU SUDAN, Improved Graph Coloring by
Semidefinite Programming, In 34th Symposium on Foundations of Computer
Science, IEEE Computer Society Press, 1994.
[44] J. KLEINBERG AND M. GOEMANS, The Lovasz Theta Function and a Semidef-
inite Relaxation of Vertex Cover, Manuscript, 1996.

[45] E. DE KLERK, C. Roos AND T. TERLAKY, Polynomial Primal-dual Affine


Scaling Algorithms in Semidefinite Programming, Technical Report 96-42, Fac-
ulty of Technical Mathematics and Computer Science, Delft University of Tech-
nology, Delft, The Netherlands, 1996.

[46] E. DE KLERK, C. Roos, T. TERLAKY, Initialization in Semidefinite Program-


ming Via a Self-dual Skew-Symmetric Embedding, Report 96-10, Faculty of
Technical Mathematics and Informatics, Delft University of Technology, Delft,
The Netherlands, 1996.

[47] D. E. KNUTH, The Sandwich Theorem, The Electronic Journal of Combina-


torics 1 (1994), #Al.

[48] M. KOJIMA, M. SHIDA AND S. SHINDOH, Global and Local Convergence of


Predictor-Corrector Infeasible-Interior-Point Algorithms for Semidefinite Pro-
grams, Technical Report B-305, Department of Mathematical and Computing
Sciences, Tokyo Institute of Technology, Tokyo, Japan, 1996.

[49] J .B. Lasserre, Linear Programming with Positive Semi-definite Matrices, To


appear in Mathematical Problems in Engineering, 1994.

[50] M. LAURENT AND S. POLJAK, On a Positive Semidefinite Relaxation of the


Cut Polytope, To appear in Linear Algebra and its Applications, 1996.

[51] M. LAURENT, The Real Positive Semidefinite Completion Problem for Series-
Parallel Graphs, Preprint, 1995.

[52] A.S. LEWIS, Eigenvalue Optimization, ACTA Numerica (1996), pp. 149-190.

[53] L. LovAsz, On the Shannon Capacity of a Graph, IEEE Transactions on In-


formation Theory IT-25 (1979), pp. 1-7.

[54] L. Lov ASZ AND A. SCHRJIVER, Cones of Matrices and Set Functions and 0-1
Optimization, SIAM J. Opt. 1 (1991), pp. 166-190.

[55] L. Lov ASZ, Combinatorial Optimization: Some Problems and Trends, DIMACS
Tech Report, 92-53, 1992.

[56] R.D.C. MONTEIRO, Primal-Dual Algorithms for Semidefinite Programming,


Working Paper, School of Industrial and Systems Engineering, Georgia Institute
of Technology, Atlanta, USA, 1995.

[57] R.D.C. MONTEIRO, I. ADLER AND M.G.C. RESENDE, A Polynomial-time


Primal-dual Affine Scaling Algorithm for Linear and Convex Quadratic Pro-
gramming and its Power Series Extension, Mathematics of Operations Research,
Vol. 15 (1990), pp. 191-214.
396 CHAPTER 9

[58] R.D.C. MONTEIRO AND J .-S. PANG, On Two Interior Point Mappings for
Nonlinear Semidefinite Complementarity Problems, Working Paper, School of
Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta,
USA, 1996."
[59] Y. NESTEROV AND A. NEMIROVSKII, Interior Point Polynomial Methods for
Convex Programming: Theory and Applications, SIAM, Philadelphia, 1994.
[60] Y. NESTEROV AND M.J. TODD, Self-scaled Barriers and Interior-point Methods
in Convex Programming, TR 1091, School of OR and IE, Cornell University,
Ithaca, NY 1994.
[61] Y. NESTEROV AND M.J. TODD, Primal-dual Interior-point Methods for Self-
scaled Cones, TR 1125, School of OR and IE, Cornell University, Ithaca, NY
1995.
[62] M.L. OVERTON, On Minimizing the Maximum Eigenvalue of a Symmetric Ma-
trix, SIAM J. Matrix Anal. Appl., Vol. 9 (1988) pp. 256-268.
[63] M.L. OVERTON, Large-Scale Optimization of Eigenvalues, SIAM J. Optimiza-
tion, Vol. 2 (1992), pp. 88-120.
[64] M.L. OVERTON AND R.S. WOMERSLEY, Second Derivatives for Eigenvalue
Optimization SIAM J. Matrix Anal. Appl., Vol. 16 (1995), pp. 697-718.
[65] M.L. OVERTON AND R.S. WOMERSLEY, Optimality Conditions and Duality
Theory for Minimizing Sums of the Largest Eigenvalues of Symmetric Matrices,
Math. Programming, Vol. 62 (1993), pp. 321-357.
[66] P.M. PARDALOS, Continuous Approaches to Discrete Optimization Problems,
In Nonlinear Optimization and Applications, G. Di Pillo & F. Giannessi, Ed.,
Plenum Publishing (1996).
[67] P.M. PARDALOS AND M.G.C RESENDE, Interior Point Methods for Global
Optimization, Chapter 12 of this volume.
[68] G. PATAKI, On the Facial Structure of Cone-LP's and Semidefinite Programs,
Management Science Research Report MSRR-595, GSIA, Carnegie-Mellon Uni-
versity, 1994.
[69] G. PATAK I , On the Multiplicity of Optimal Eigenvalues, Technical Report,
GSIA, 1994.
[70] G. PATAKI, Cone-LP's and Semidefinite Programs: Geometry, Basic Solutions
and a Simplex-type Method, Management Science Research Report MSRR-604,
GSIA, Carnegie-Mellon University, 1994.
Semidefinite Programming 397

[71] F.A. POTRA AND R. SHENG, A Superlinearly Convergent Primal-duallnfeas-


ible-interior-point Algorithm for Semidefinite Programming, Reports on Com-
putational Mathematics, No. 78, 1995, Dept. of Mathematics, The University
of Iowa, Iowa City, USA.

[72] L. PORKOLAB AND L. KHACHIYAN, On the Complexity of Semidefinite Pro-


grams, RUTCOR Research Report, RRR 40-95, Rutgers University, New
Brunswick, NJ-08903.

[73] M. SHIDA AND S. SHINDOH, Monotone Semidefinite Complementarity Prob-


lems, Technical Report B-312, 1996, Department of Mathematical and Com-
puting Sciences, Tokyo Institute of Technology, Tokyo, Japan.

[74] N .Z. SHOR, Quadratic Optimization Problems, Soviet Journal of Computer and
Systems Sciences, Vol. 25 (1987), pp. I-II.

[75] M.V. RAMANA, An Algorithmic Analysis of Multiquadratic and Semidefinite


Programming Problems, Ph.D. Thesis, The Johns Hopkins University, Balti-
more, 1993.

[76] M. RAMANA, An Exact Duality Theory for Semidefinite Programming and its
Complexity Implications. DIMACS Technical Report, 95-02R, DIMACS, Rut-
gers University, 1995. To appear in Math Programming. Can be accessed at
http://www.ise.ufl.edu;-ramana.

[77] M.V. RAMANA, On Polyhedrality in Semidefinite Programming,In preparation,


1996.

[78] M.V. RAMANA AND R.M. FREUND, A Corrected Primal for Semidefinite Pro-
gramming, with Strong Duality, In Preparation, 1996.

[79] M.V. RAMANA AND A.J. GOLDMAN, Some Geometric Results in Semidefinite
Programming, Journal Glob. Opt., Vol. 7 (1995), pp. 33-50.

[80] M.V. RAMANA AND A.J. GOLDMAN, Quadratic Maps with Con-
vex Images, Submitted to Math. of OR, 1995. Can be accessed at
http://www.ise.ufl.edu;-ramana.

[81] M.V. RAMANA AND A.J. GOLDMAN, A Newton-like Method for Nonlinear
Semidefinite Inequalities. Submitted to SIAM J. Optim., 1996. Can be accessed
at
http://www.ise.ufl.edu;-ramana.

[82] M.V. RAMANA, E.R. SCHEINERMAN AND D. ULLMAN, Fractional Isomor-


phism of Graphs, Disc. Math., Vol. 132 (1994), pp 247-265.
398 CHAPTER 9

[83] M.V. RAMANA, 1. TUNQEL AND H. WOLKOWICZ, Strong Duality in Semidef-


inite Programming, To appear in SIAM J. Optimization, 1995. Can be accessed
at http://www.ise.ufl.edu;-ramana.
[84] T.R. ROCKAFELLAR, Convex Analysis, Princeton University Press, Princeton,
1970.
[85] A. SHAPIRO, First and Second Order Analysis of Nonlinear Semidefinite Pro-
grams, To appear in Math Programming, 1996.
[86] M. SHIDA AND S. SHINDOH, Monotone Semidefinite Complementarity Prob-
lems, Technical Report B-312, 1996, Department of Mathematical and Com-
puting Sciences, Tokyo Institute of Technology, Tokyo, Japan.
[87] J.F. STURM AND S. ZHANG, Symmetric Primal-dual Path Following Algo-
rithms for Semidefinite Programming, Technical Report 9554/ A, 1995, Tinber-
gen Institute, Erasmus University Rotterdam.
[88] J. F. STURM AND S. ZHANG, Superlinear Convergence of Symmetric Primal-
dual Path Following Algorithm for Semidefinite Programming, Technical Report
9607/ A, 1996, Tinbergen Institute, Erasmus University Rotterdam.
[89] T. TSUCHIYA, Affine Scaling Algorithm, Chapter 2 of this volume.
[90] T. TSUCHlYA AND M. MURAMATSU, Global Convergence of a Long-step Affine
Scaling Algorithm for Degenerate Linear Programming Problems, SIAM J Op-
tim., Vol. 5 (1995), pp. 525-551.
[91] L. VANDENBERGHE AND S. BOYD, Positive-Definite Programming, Mathemat-
ical Programming: State of the Art 1994, J.R. Birge and K.G. Murty ed.s, U. of
Michigan, 1994.
[92] 1. VANDENBERGHE AND S. BOYD, Semidefinite Programming, SIAM Review,
38 (1996), pp. 49-95.
[93] L. VANDENBERGHE, S. BOYD AND SHAC-PO WU, Determinant Maximization
with Linear Matrix Inequality Constraints, Technical Report, Information Sys-
tems Laboratory, Elec. Engg. Dept., Stanford University, Stanford, CA, 1996.
[94] H. WHITNEY, Elementary Structure of Real Algebraic Varieties, Annals of
Math., Vol. 66 (1957), No.3, pp. 545-556.
[95] H. WOLKOWICZ, Some Applications of Optimization in Matrix Theory, Linear
Algebra and its Applications, Vol. 40 (1981), pp. 101-118.
[96] Y. YE, A Class of Projective Transformations for Linear Programming, SIAM
J. Comput., Vo1.19 (1990), pp. 457-466.
10
IMPLEMENTING BARRIER
METHODS FOR NONLINEAR
PROGRAMMING
David F. Shanno¹, Mark G. Breitfeld², Evangelia M. Simantiraki³
¹ RUTCOR, Rutgers University, New Brunswick, New Jersey.
² A. T. Kearney, GmbH, Stuttgart, Germany.
³ RUTCOR and Graduate School of Management, Rutgers University, New Brunswick, New Jersey.

ABSTRACT
The paper discusses two alternative ways of implementing logarithmic barrier methods for
nonlinear programming. The first method is a pure barrier method which uses a modified
penalty-barrier function. The second uses logarithmic barrier methods to derive a modi-
fied version of a sequential quadratic programming algorithm. Implementation issues are
discussed for both methods and directions of future research indicated.

10.1 INTRODUCTION
Logarithmic barrier methods were originally developed by Fiacco and McCormick [5]
as a means of attack on the nonlinear programming problem. While they noted the
applicability of the methods to the linear programming problem, it was the generally
perceived opinion at that time that the methods would not be competitive with
the simplex method. In developing algorithms and software to actually attempt to
solve nonlinear problems, a number of serious difficulties with the logarithmic barrier
method were discovered, and these proved sufficiently intractable at the time that
the methods fell into disuse for some years. Interest in the methods was rekindled
with their remarkable success in solving large linear programming problems, which in
turn has led to new research into applying the methods to nonlinear programming
problems. Most of this work is quite new, and very incomplete. However, the
methods show sufficient promise when carefully applied to a variety of nonlinear
problems as to be definitely worthy of further study. To be able to work well in
practice, however, they must overcome the problems which originally led to their
being abandoned. Not all of these problems have been fully solved to date, and in
solving some of the old problems, some new problems have arisen. It is the purpose
of this work to document the authors' experiences with two different schemes for
applying the methods to nonlinear problems, and indicate which problems arising
from the original methods now seem to be adequately solved, and which areas remain
for further work. In order to do this, it is instructive first to examine the algorithm,
and the problems which arose from implementing it, of what has come to be known
as the classical log barrier method.

10.1.1 The Classical Logarithmic Barrier Method
The classical logarithmic barrier method of Fiacco and McCormick was designed to
solve the problem
$$\min f(x) \qquad (10.1)$$
subject to $c_i(x) \ge 0$, $i = 1,\dots,m$.
The barrier function transformation of the problem is
$$\min B(x,\mu) = f(x) - \mu \sum_{i=1}^{m} \ln(c_i(x)), \qquad (10.2)$$
and the logarithmic barrier algorithm is to choose an initial feasible $x^0$ and an initial
$\mu^0 > 0$ and let $x^k$ solve
$$\min_x B(x,\mu^k) = f(x) - \mu^k \sum_{i=1}^{m} \ln(c_i(x)).$$
Set $\mu^{k+1} = \gamma\mu^k$, where $\gamma < 1$, and continue until $\mu^k$ is sufficiently small. Note that
in the definition of the algorithm, $x^0$ is not specifically used. The need for a feasible
$x^0$ arises from the fact that minimizing $B(x,\mu^k)$ must in general be done iteratively,
and the iterative sequence requires a feasible initial estimate in order that $B(x,\mu^k)$
is defined. If $x^0$ is feasible, then so are all iterates found in determining $x^1$, which
is then used as the initial point in determining $x^2$, and so on. Thus a major
problem initially with logarithmic barrier methods was the need to determine an
initial feasible point, which can be as difficult as solving the actual problem.
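To make the outer loop concrete, here is a minimal sketch in Python of the classical barrier iteration just described. The small test problem, the fixed reduction factor, and the use of a general-purpose derivative-free inner solver are all illustrative choices of ours, not part of the original method.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative problem: min (x1+1)^2 + (x2-0.5)^2
# subject to c1(x) = x1 >= 0, c2(x) = x2 >= 0, c3(x) = 1 - x1 - x2 >= 0.
f = lambda x: (x[0] + 1.0)**2 + (x[1] - 0.5)**2
cons = [lambda x: x[0], lambda x: x[1], lambda x: 1.0 - x[0] - x[1]]

def B(x, mu):
    """Classical barrier function (10.2); +inf outside the feasible region."""
    c = np.array([ci(x) for ci in cons])
    if np.any(c <= 0.0):
        return np.inf
    return f(x) - mu * np.sum(np.log(c))

x = np.array([0.25, 0.25])          # feasible starting point x^0
mu, gamma = 1.0, 0.1                # initial mu^0 and reduction factor gamma
for k in range(8):
    # inner unconstrained minimization of B(., mu^k), warm started at the previous x
    x = minimize(lambda y: B(y, mu), x, method="Nelder-Mead").x
    print(f"mu = {mu:8.1e}   x = {x}")
    mu *= gamma                     # mu^{k+1} = gamma * mu^k
```

The iterates approach the constrained minimizer (0, 0.5); as mu becomes small, the numerical difficulties discussed below begin to dominate the inner solves.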

A second major problem with the classical log barrier method arises from structural
ill-conditioning of the method as the optimum is approached. To see this, we note
that taking first derivatives of $B(x,\mu)$ yields
$$\nabla_x B = \nabla f - \sum_{i=1}^{m} \frac{\mu}{c_i(x)} \nabla c_i(x) \qquad (10.3)$$
and the KKT conditions imply that
$$\lim_{\mu \to 0} \frac{\mu}{c_i(\bar{x})} = \bar{\lambda}_i,$$
where $(\bar{x},\bar{\lambda})$ are the optimal primal variables and the associated Lagrange multipliers.
Differentiating (10.3) again yields
$$\nabla_x^2 B = \nabla^2 f - \sum_{i=1}^{m} \bar{\lambda}_i \nabla^2 c_i(x) + \sum_{i=1}^{m} \frac{\bar{\lambda}_i}{c_i(x)} \nabla c_i(x)\nabla c_i(x)^T.$$
Thus for any constraint $c_i(x)$ which satisfies $c_i(\bar{x}) = 0$, a corresponding eigenvalue
of the Hessian matrix of $B(x,\mu)$ tends to $\infty$, and the problem becomes extremely ill-
conditioned as the optimum is approached. This makes it very difficult for any
unconstrained numerical algorithm, used to minimize the barrier function for any
choice of $\mu^k$, to converge as $\mu^k$ approaches 0.
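The growth of the condition number is easy to observe numerically. The two-variable example below is our own, chosen so that exactly one constraint is active at the solution; the barrier minimizer is known in closed form, so the Hessian of B can be evaluated there directly.

```python
import numpy as np

# Illustrative problem: f(x) = x1 + 0.5*(x2 - 1)^2 with c1(x) = x1 >= 0 and
# c2(x) = x2 >= 0.  At the solution (0, 1) only c1 is active, so one eigenvalue
# of the barrier Hessian grows like 1/mu while the other stays O(1).
for mu in [1e-1, 1e-3, 1e-5, 1e-8]:
    x1 = mu                                      # minimizer of x1 - mu*ln(x1)
    x2 = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * mu))   # root of x2 - 1 - mu/x2 = 0
    H = np.diag([mu / x1**2, 1.0 + mu / x2**2])  # Hessian of B at the minimizer
    print(f"mu = {mu:7.1e}   cond(H) = {np.linalg.cond(H):9.2e}")
```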

Another difficulty with the classical logarithmic barrier method is the need for a
very careful line search algorithm. This arises during the unconstrained search for
the minimizer of $B(x,\mu^k)$. Typically, the iterative method to find $x^k$ is of the form
$$x_{j+1} = x_j + \alpha_j d_j,$$
where $\alpha_j$ is a scalar parameter chosen to assure that $B(x_{j+1},\mu^k) - B(x_j,\mu^k)$ is
sufficiently small. The fact that $B(x,\mu^k)$ has poles at $c_i(x) = 0$, $i = 1,\dots,m$, makes
the line search extremely difficult. In fact, it is often quite difficult just to find a
bound on $\alpha_j$ which assures feasibility. It might seem that a safeguard would be to
choose $\alpha_j$ to be small enough so that the poles are not approached, but in practice
this slows down the unconstrained algorithm so drastically that the method never
converges.
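A common practical safeguard is first to shrink the trial step until every constraint stays strictly positive, and only then to test the barrier value itself. The sketch below is a deliberately simplified version of this idea (our own, not the Murray and Wright procedure cited later, and without any sufficient-decrease or curvature test).

```python
def barrier_line_search(B, cons, x, d, mu, alpha0=1.0, shrink=0.5, max_tries=50):
    """Backtracking search on B(., mu) along d that never evaluates B at a pole.

    cons is a list of the constraint functions c_i.  The step is halved until
    c_i(x + alpha*d) > 0 for every i and the barrier value decreases.
    """
    alpha = alpha0
    for _ in range(max_tries):
        trial = x + alpha * d
        if all(c(trial) > 0.0 for c in cons) and B(trial, mu) < B(x, mu):
            return alpha
        alpha *= shrink            # back off; keeps iterates away from the poles
    return 0.0                     # no acceptable step found
```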

Another problem with the classical logarithmic barrier method is the choice of the
initial $\mu^0$ and the subsequent algorithm for reducing $\mu$ at each step. The method is
often very sensitive to the choices of $\mu$, and good general algorithms for determining
$\mu$ have proved elusive.

Finally, the problem (10.1) has only inequality constraints. The general nonlinear
programming problem is
$$\min f(x)$$
$$\text{subject to } c_i(x) \ge 0, \; i = 1,\dots,m, \qquad (10.4)$$
$$g_i(x) = 0, \; i = 1,\dots,p.$$
Fiacco and McCormick [5] incorporated the equality constraints by adding a penalty
term to the barrier formulation. The transformed problem is
$$\min F(x,\mu) = f(x) - \mu \sum_{i=1}^{m} \ln(c_i(x)) + \frac{1}{\mu} \sum_{i=1}^{p} (g_i(x))^2.$$
Here a penalty term is added to assure that the equality constraints are driven to
zero as $\mu \to 0$. In practice the penalty terms do not contribute problems to the line
search, as the function does not have the troublesome poles of the barrier function.
However, the penalty function can also be shown to be very ill-conditioned as $\mu \to 0$
by an analysis similar to that done above for the barrier function.
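For reference, the transformed objective above is a one-line computation; the helper below (all names are ours, and f, cons and eqs are assumed to be supplied by the caller) simply evaluates it.

```python
import numpy as np

def penalty_barrier(x, mu, f, cons, eqs):
    """Fiacco-McCormick function F(x, mu) for problem (10.4):
    f(x) - mu*sum(ln c_i(x)) + (1/mu)*sum(g_i(x)^2)."""
    c = np.array([ci(x) for ci in cons])
    g = np.array([gi(x) for gi in eqs])
    if np.any(c <= 0.0):
        return np.inf        # the barrier part is undefined unless c_i(x) > 0
    return f(x) - mu * np.sum(np.log(c)) + np.sum(g**2) / mu
```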

There have been remedies proposed to counteract these problems as each has arisen.
For example, Murray and Wright [9] have devised a safeguarded linesearch algorithm
especially designed for logarithmic barrier functions. Carefully implemented it is
effective, but can be very costly when the line minimum is close to a pole of the
barrier function. McCormick [8], Nash and Sofer [10], and Wright [13] have devised
partitioning algorithms with approximate inverses to deal with the ill-conditioning
of the Hessian. We will not attempt to give an entire spectrum of possible remedies
here. Rather, we will deal with two different ways of implementing interior point
methods which alleviate most of the problems discussed in this section, but at the
same time introduce new problems which remain for further study. The next two
sections deal with these two different methods.

10.2 MODIFIED PENALTY-BARRIER METHODS

10.2.1 Modified Barrier Methods
In order to attempt to overcome some of the problems alluded to in the previous
section with the classical log barrier function for (10.1), Polyak [11] proposed the
modified barrier method, where the barrier function is defined by
$$B(x,\mu,\lambda) = f(x) - \mu \sum_{i=1}^{m} \lambda_i \ln\left(s_i + \frac{c_i(x)}{\mu}\right), \qquad (10.5)$$
where the $\lambda_i$ are estimates of the Lagrange multipliers and the $s_i$ are constants used
in scaling the problem. These will be discussed later. An immediate consequence of
modifying the barrier function in this way is that for any positive $s_i$, the problem of
initial feasibility disappears. To see this, suppose $c_i(x^0) < 0$ for some $i$. If we choose
$\mu^0$ so that $\mu^0 s_i + c_i(x^0) > 0$, which is always possible since $s_i > 0$, then the argument
of the logarithmic function is positive and the barrier function is well defined. Thus
one criterion in choosing $\mu^0$ is to assure that the barrier function is defined for an
infeasible initial point.

The effect on ill-conditioning of the problem is more subtle. Proceeding as in the
previous section, differentiating (10.5) yields
$$\nabla_x B = \nabla f - \sum_{i=1}^{m} \frac{\lambda_i}{s_i + c_i(x)/\mu} \nabla c_i(x),$$
which as before indicates that
$$\frac{\lambda_i}{s_i + c_i(x)/\mu} \longrightarrow \bar{\lambda}_i. \qquad (10.6)$$
Taking second derivatives yields
$$\nabla_x^2 B = \nabla^2 f - \sum_{i=1}^{m} \bar{\lambda}_i \nabla^2 c_i(x) + \sum_{i=1}^{m} \frac{\bar{\lambda}_i}{s_i\mu + c_i(x)} \nabla c_i(x)\nabla c_i(x)^T,$$
which again has an eigenvalue which becomes infinitely large when $c_i(x) = 0$, if
$\mu \to 0$ as well. However, Polyak showed that there exists a threshold value $\bar{\mu}$
such that for any fixed $\mu$ satisfying $\mu \le \bar{\mu}$, the solution to the modified barrier
problem will converge to the solution of (10.1) if the estimates of the Lagrange
multipliers converge to the optimal Lagrange multipliers $\bar{\lambda}$. Thus the modified barrier
algorithm becomes: choose $x^0$ and $\lambda^0 > 0$, and choose $\mu^0$ large enough to assure
that $\mu^0 s_i + c_i(x^0) > 0$, $i = 1,\dots,m$. Let $x^k$ solve
$$\min_x B(x,\mu^k,\lambda^k).$$
We are now faced with the problem of adjusting $\lambda^k$ and $\mu^k$. The adjustment of $\lambda^k$
is suggested by (10.6), namely
$$\lambda_i^{k+1} = \frac{\mu^k \lambda_i^k}{\mu^k s_i + c_i(x^k)}. \qquad (10.7)$$
The adjustment of $\mu^k$ is as before, $\mu^{k+1} = \gamma\mu^k$, $0 < \gamma < 1$. Neither of these adjust-
ments is without problems. We will discuss each in turn.

The problem of adjusting $\mu^k$ stems from the previous discussion concerning using
$\mu^0$ to assure initial feasibility in the extended feasible region $c_i(x) \ge -\mu^0 s_i$, $i =
1,\dots,m$. If $x^k$ is sufficiently close to the boundary of the extended feasible region
$c_i(x) \ge -\mu^k s_i$ for some $i$, then $\mu^k$ cannot be reduced, as the barrier function will no
longer be defined at the initial estimate $x^k$ for the $(k+1)$st subproblem. In practice,
we have found that this occurs quite frequently. In this case, only the Lagrange
multiplier estimates can be adjusted in the hope that subsequent points will move
away from the boundary of the extended feasible region, but often this does not
occur, and the method gets stuck.

The formula for adjusting $\lambda^k$ is in general satisfactory, but can, for poor initial
estimates $\lambda^0$, lead to a divergent sequence of estimates to one or more of the $\lambda_i$'s.
Again, this is a problem that we have encountered in practice.
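Putting the pieces together, a bare-bones sketch of the modified barrier outer loop, with the multiplier update (10.7) and a fixed reduction of mu, is given below. It is our own illustration and deliberately omits any safeguard against the two failure modes just described.

```python
import numpy as np
from scipy.optimize import minimize

def modified_barrier(f, cons, x0, lam0, s, mu0=10.0, gamma=0.1, outer=10):
    """Modified barrier iteration (10.5)/(10.7).  f, the list cons of c_i and the
    shifts s are supplied by the caller; the inner minimization uses a
    general-purpose derivative-free solver purely for illustration."""
    x, lam = np.asarray(x0, float), np.asarray(lam0, float)
    s, mu = np.asarray(s, float), mu0

    def B(y):
        arg = s + np.array([ci(y) for ci in cons]) / mu
        if np.any(arg <= 0.0):
            return np.inf           # outside the extended feasible region
        return f(y) - mu * np.sum(lam * np.log(arg))

    for k in range(outer):
        x = minimize(B, x, method="Nelder-Mead").x     # approximate x^k
        c = np.array([ci(x) for ci in cons])
        lam = mu * lam / (mu * s + c)                  # update (10.7)
        mu *= gamma                                    # mu^{k+1} = gamma * mu^k
    return x, lam
```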

Conn, Gould, and Toint [3] were able to devise an algorithm based on the modified
barrier method for which they were able to prove global convergence. The algorithm
involved solving a feasibility problem to assure that the point $x_0^{k+1}$, the initial
estimate for the $(k+1)$st subproblem, satisfied the constraints of the extended
feasibility region with $\mu^{k+1}$ suitably reduced. It also reduced $\mu$ or updated $\lambda$ at any
given iteration, but not both. As the feasibility problem is itself about as difficult to
solve as minimizing the modified barrier function for a fixed $\mu$ and $\lambda$, this algorithm
is certainly complex, and its efficiency has yet to be satisfactorily
demonstrated.

Another difficulty with the modified barrier method is that it still requires a very
careful line search algorithm. The modified barrier method changes the location
of the poles of the barrier function to those points where $c_i(x) + \mu^k s_i = 0$, but it
certainly does not eliminate the poles, and computational practice has shown that
the poles of the modified barrier method cause exactly the same difficulty with line
searches as the poles of the classical barrier method.

A final difficulty with the modified barrier method is that functions such as $c_i(x) =
h(\sqrt{x})$, where $h(\cdot)$ is an arbitrary function, are only defined for $x \ge 0$. Thus extending
the feasible region by using the modified barrier function allows some variable x to
take on values for which some of the constraints, or the objective function, are not
defined. The next section will discuss one way of dealing with these problems.

10.2.2 The Modified Penalty-Barrier Method


In [1], the problem of maintaining feasibility while allowing for the reduction of $\mu$
and the problem of eliminating poles from the barrier function are both addressed by
modifying the modified barrier function (10.5) to a modified penalty-barrier function
$$P(x,\mu,\lambda,\beta) = f(x) - \mu \sum_{i=1}^{m} \lambda_i \Phi(c_i(x)), \qquad (10.8)$$
where
$$\Phi(c_i(x)) = \ln\left(s_i + \frac{c_i(x)}{\mu}\right), \qquad c_i(x) \ge -\beta\mu s_i, \qquad (10.9)$$
$$\Phi(c_i(x)) = \frac{1}{2} q_i^a c_i(x)^2 + q_i^b c_i(x) + q_i^c, \qquad c_i(x) < -\beta\mu s_i, \qquad (10.10)$$
and
$$q_i^a = \frac{-1}{(s_i\mu(1-\beta))^2}, \qquad
q_i^b = \frac{1-2\beta}{s_i\mu(1-\beta)^2}, \qquad
q_i^c = \frac{\beta(2-3\beta)}{2(1-\beta)^2} + \ln(s_i(1-\beta)).$$

Here $\beta$ is a scalar satisfying $0 < \beta < 1$. The idea behind the choice of $P$ is to use
the modified barrier method when the constraint is well away from the pole of the
extended feasible region, but to replace the barrier function with a quadratic penalty
function when the constraint becomes too close to the boundary of the extended
feasible region. The parameter $\beta$ is used to determine how close to the boundary of
the region we wish to allow the constraint to come before switching from the barrier
to the penalty function. The constants $q_i^a$, $q_i^b$, and $q_i^c$ are chosen to assure that the
two functions and their derivatives coincide at the boundary.
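The two branches of Phi are easy to transcribe directly from (10.9)-(10.10), and doing so makes it simple to check numerically that value and slope match at the switching point $c_i = -\beta\mu s_i$. The short function below is our transcription for a single constraint.

```python
import numpy as np

def phi(c, mu, s, beta):
    """Penalty-barrier term Phi(c_i(x)) of (10.9)-(10.10) for one constraint value c."""
    if c >= -beta * mu * s:
        return np.log(s + c / mu)                       # barrier branch (10.9)
    qa = -1.0 / (s * mu * (1.0 - beta))**2
    qb = (1.0 - 2.0 * beta) / (s * mu * (1.0 - beta)**2)
    qc = beta * (2.0 - 3.0 * beta) / (2.0 * (1.0 - beta)**2) + np.log(s * (1.0 - beta))
    return 0.5 * qa * c**2 + qb * c + qc                # quadratic branch (10.10)

# value and one-sided slopes agree at the switching point c = -beta*mu*s:
mu, s, beta, h = 0.1, 1.0, 0.9, 1e-7
c0 = -beta * mu * s
print(phi(c0 - h, mu, s, beta), phi(c0 + h, mu, s, beta))
print((phi(c0, mu, s, beta) - phi(c0 - h, mu, s, beta)) / h,
      (phi(c0 + h, mu, s, beta) - phi(c0, mu, s, beta)) / h)
```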

Using $P$ in place of $B$ has two immediate effects. First, $\mu$ can always be reduced
as much as we wish at any iteration, for we no longer require $c_i(x) + s_i\mu \ge 0$, as
the penalty function is well defined whether or not this condition holds. Further,
the method seems quite robust with respect to the choice of the initial $\mu$. Second,
all poles are removed from the penalty-barrier function, which makes line searches
far simpler. The penalty-barrier algorithm is tested extensively against the modified
barrier algorithm in [1], and the improvements in performance are dramatic. A new
parameter, $\beta$, has been incorporated, and tests to date have shown that for badly
nonlinear problems, $\beta = .9$ appears satisfactory, while for mildly nonlinear problems,
$\beta = .5$ is preferable. A dynamic optimal choice of $\beta$ remains for further study.

The use of P rather than B does nothing to solve the problem of constraints that
are undefined outside of a given region. In our experience, these regions can be
defined by simple bounds on the variables. To attempt to handle this problem, it
appears best to handle simple bounds on the variables separately. In [1], we chose
to use the classical log barrier function for simple bounds on the variables. This
appeared satisfactory in practice, and did not appear to hurt conditioning of the
problem noticeably. This has by no means, however, been demonstrated to be the
best way of handling simple bounds. Simple projection onto the bounds appears to
be very competitive, for example. This remains another topic for further study.

The problem of when to reduce $\mu$ and when to update $\lambda$ is studied in [2]. Here it is
shown that reducing $\mu$ by a factor of 10 every iteration, while updating $\lambda$ according to
(10.7) at every iteration, is far better in general than the strategy of either updating
$\mu$ or $\lambda$, but not both. The strategy of [3] is to keep $\mu$ fixed as long as sufficient
progress toward feasibility is being made. If on any iteration sufficient progress is
not made, then $\lambda^{k+1} = \lambda^k$, and on only these iterations is $\mu$ reduced. This strategy
has the advantage of being provably globally convergent, but as previously stated, is
less efficient in general. When the "update both at every iteration" strategy fails to
converge (which is very seldom) it does so by having one or more Lagrange multipliers
diverge. Thus further research in this area would appear to be indicated on how
to keep the multipliers bounded for the "update every iteration" strategy. Also,
the first order update formula for the Lagrange multipliers (10.7), while reasonably
satisfactory, can make it difficult to achieve accurate solutions on some problems, and
can certainly slow the rate of convergence. Thus it appears necessary to investigate
higher order updates for the Lagrange multipliers.

A remaining topic is the choice of the scaling factors $s_i$. Polyak's original modified
barrier had $s_i = 1$, $i = 1,\dots,m$. Note that if the constraint $c_i(x) \ge 0$ is scaled by a
scaling factor $s_i$, $\tilde{c}_i(x) = c_i(x)/s_i$, the original modified barrier term becomes
$$\ln\left(1 + \frac{\tilde{c}_i(x)}{\mu}\right) = \ln\left(s_i + \frac{c_i(x)}{\mu}\right) - \ln(s_i),$$
so the constraints can be scaled naturally by introducing the shift $s_i$. We have found
it useful numerically to shift at each major iteration by
$$s_i^k = \|c_i(x_0^k)\|,$$
where $\|\cdot\|$ is the Euclidean norm.

As a final note on this section, as with the classical barrier method, if we wish
to incorporate equality constraints in the problem, we can transform (10.8) to a
penalty-barrier method by incorporating the equality constraints in an augmented
Lagrangian function. Here the function to be minimized at each iteration is
$$W(x,\mu,\lambda,\beta) = P(x,\mu,\lambda,\beta) + \sum_{i=1}^{p} \lambda_{i+m}\, g_i(x) + \frac{1}{2\mu} \sum_{i=1}^{p} g_i(x)^2.$$
The update formula for the Lagrange multipliers corresponding to the equality con-
straints is
$$\lambda_{i+m}^{k+1} = \lambda_{i+m}^{k} - \frac{g_i(x^k)}{\mu^k}, \qquad i = 1,\dots,p.$$
This general method has been extensively tested in [2] and has been shown to be quite
satisfactory in practice. An entirely different method for handling equality and
inequality constraints will be described in the next section.

10.3 A SLACK VARIABLE ALTERNATIVE


The method developed in the previous section modifies the classical logarithmic
barrier approach where the nonlinear inequalities appear directly in the modified
penalty-barrier function. An alternative approach is to introduce slack variables
into (10.4), yielding the formulation
$$\min f(x)$$
$$\text{subject to } c_i(x) - z_i = 0, \; i = 1,\dots,m, \qquad (10.11)$$
$$g_i(x) = 0, \; i = 1,\dots,p,$$
$$z_i \ge 0.$$
Here, as the nonlinear inequalities are replaced with equalities, there is no reason to
differentiate between nonlinear inequalities and equalities. The classical logarithmic
barrier transformation of (10.11) is
$$\min f(x) - \mu \sum_{i=1}^{m} \ln(z_i)$$
$$\text{subject to } c_i(x) - z_i = 0, \; i = 1,\dots,m, \qquad (10.12)$$
$$g_i(x) = 0, \; i = 1,\dots,p,$$
and the Lagrangian for (10.12) is
$$L(x,z,\lambda) = f(x) - \mu \sum_{i=1}^{m} \ln(z_i) - \sum_{i=1}^{m} \lambda_i (c_i(x) - z_i) - \sum_{i=1}^{p} \lambda_{i+m}\, g_i(x). \qquad (10.13)$$
For clarity, it will be useful to differentiate between the Lagrange multipliers cor-
responding to the equality and slack inequality constraints. Thus we designate
$y_i = \lambda_{i+m}$, $i = 1,\dots,p$. The first order conditions for (10.13) are
$$\nabla_x L = \nabla f(x) - \sum_{i=1}^{m} \lambda_i \nabla c_i(x) - \sum_{i=1}^{p} y_i \nabla g_i(x) = 0, \qquad (10.14)$$
$$\lambda - \mu Z^{-1} e = 0, \qquad (10.15)$$
$$c_i(x) - z_i = 0, \; i = 1,\dots,m, \qquad (10.16)$$
$$g_i(x) = 0, \; i = 1,\dots,p, \qquad (10.17)$$
where $Z = \mathrm{diag}(z_i)$, $e = (1,\dots,1)^T$, and $\lambda = (\lambda_1,\dots,\lambda_m)^T$. Following the primal-
dual formulation that has proved so successful for linear programming, we designate
$\Lambda = \mathrm{diag}(\lambda_i)$ and rewrite (10.15) as
$$Z\Lambda e = \mu e. \qquad (10.18)$$
The solution technique now becomes directly analogous to that in linear program-
ming, namely to use Newton's method to find $(x,z,\lambda,y)$ which solve the modified
first order conditions. Denoting
$$F(x,z,\lambda,y) = \begin{pmatrix} \nabla_x L(x,z,\lambda,y) \\ c(x) - z \\ g(x) \\ \Lambda Z e \end{pmatrix}, \qquad (10.19)$$
then a KKT point is a point satisfying $F(x,z,\lambda,y) = 0$, where $g(x) = (g_1(x),\dots,g_p(x))^T$
and $c(x) = (c_1(x),\dots,c_m(x))^T$. In [4], El-Bakry et al. analyze an interior
point method for finding a KKT point. In order to describe their algorithm, we first
need further notation. Let
$$G(x,z,\lambda,y) = \begin{pmatrix} \nabla_x L(x,z,\lambda,y) \\ c(x) - z \\ g(x) \end{pmatrix},$$
$$v = (x,z,\lambda,y)^T,$$
and let $\Delta v = (\Delta x, \Delta z, \Delta\lambda, \Delta y)^T$ solve the damped Newton system
$$F'(v)\Delta v = -F(v) + \mu \hat{e},$$


where $\hat{e} = (0,0,0,e^T)^T$. Let
$$v(\alpha) = (x(\alpha),z(\alpha),\lambda(\alpha),y(\alpha))^T = (x,z,\lambda,y)^T + \alpha\,(\Delta x,\Delta z,\Delta\lambda,\Delta y)^T.$$
Further, let
$$\theta_1(\alpha) = \min_i \left(z_i(\alpha)\lambda_i(\alpha)\right) - \gamma\tau_1 \frac{z(\alpha)^T\lambda(\alpha)}{n} \qquad (10.20)$$
and
$$\theta_2(\alpha) = z(\alpha)^T\lambda(\alpha) - \gamma\tau_2 \|G(v(\alpha))\|, \qquad (10.21)$$
where
$$\tau_1 = \frac{\min_i z_i^0\lambda_i^0}{z^{0T}\lambda^0/n}, \qquad (10.22)$$
$$\tau_2 = \frac{z^{0T}\lambda^0}{\|G(v^0)\|}, \qquad (10.23)$$
and $\gamma \in (0,1)$ is a constant. These are the familiar functions from linear program-
ming that guarantee that infeasibility is reduced comparably to complementarity
and that centrality is maintained. In linear programming, these conditions plus
nonnegativity of $z$ and $\lambda$ are sufficient to prove global convergence, or divergence
of either $z$ or $\lambda$. For nonlinear programming, however, an additional condition is
needed, namely that the chosen step length $\alpha$ also produces a sufficient reduction in
a merit function. Here the merit function $\Psi(\alpha)$ is defined to be
$$\Psi(\alpha) = \|F(v(\alpha))\|_2^2.$$


We can now state the nonlinear variant of the linear primal-dual algorithm. It is to
choose $v^0$ with $z^0 > 0$, $\lambda^0 > 0$, $\rho \in (0,1)$, $\varsigma \in (0,1/2]$, set $k = 0$ and $\gamma^{-1} = 1$, and
compute $\Psi^0(v^0)$. (A schematic code sketch of a single step is given after the algorithm.)

1. Test for convergence. If $\Psi^k \le \epsilon$, stop.

2. Choose $\sigma^k \in (0,1)$ and compute the perturbed Newton direction $\Delta v^k$ from
$F'(v^k)\Delta v^k = -F(v^k) + \mu^k \hat{e}$, with $\mu^k = \sigma^k (z^k)^T \lambda^k / n$.

3. Choose $1/2 \le \gamma^k \le \gamma^{k-1}$ and, substituting $\gamma^k$ for $\gamma$ in (10.20) and (10.21), compute
$$\alpha_i = \max_{\alpha \in (0,1)} \{\alpha : \theta_i(\alpha') \ge 0 \text{ for all } \alpha' \le \alpha\}, \qquad i = 1,2.$$
Let $\bar{\alpha}^k = \min\{\alpha_1, \alpha_2\}$. Let $\alpha^k = \rho^t \bar{\alpha}^k$, where $t$ is the smallest nonnegative integer
such that the step $\rho^t \bar{\alpha}^k$ yields a sufficient decrease in the merit function $\Psi$.

4. Let $v^{k+1} = v^k + \alpha^k \Delta v^k$, $k = k + 1$, and go to step 1.
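The sketch below illustrates the Newton step of the algorithm for the slack-variable system (10.14)-(10.18). It is not El-Bakry et al.'s code: the problem data are passed through a hypothetical object `prob` (gradients, Jacobians and the Lagrangian Hessian), and the step-length rule shown only keeps (z, lambda) strictly positive, omitting the conditions (10.20), (10.21) and the merit-function test.

```python
import numpy as np

def pd_newton_step(x, z, lam, y, sigma, prob):
    """One perturbed Newton step on F(v) = 0, see (10.19), for problem (10.11)."""
    n, m, p = x.size, z.size, y.size
    Jc, Jg = prob.jac_c(x), prob.jac_g(x)
    mu = sigma * (z @ lam) / m                       # perturbation parameter

    # residual F(v), with the complementarity block perturbed by mu
    r = np.concatenate([
        prob.grad_f(x) - Jc.T @ lam - Jg.T @ y,      # (10.14)
        prob.c(x) - z,                               # (10.16)
        prob.g(x),                                   # (10.17)
        lam * z - mu])                               # (10.18): Lambda Z e = mu e

    # block Jacobian F'(v) with respect to (x, z, lambda, y)
    J = np.zeros((n + 2*m + p, n + 2*m + p))
    J[:n, :n] = prob.hess_lag(x, lam, y)
    J[:n, n+m:n+2*m] = -Jc.T
    J[:n, n+2*m:] = -Jg.T
    J[n:n+m, :n] = Jc
    J[n:n+m, n:n+m] = -np.eye(m)
    J[n+m:n+m+p, :n] = Jg
    J[n+m+p:, n:n+m] = np.diag(lam)
    J[n+m+p:, n+m:n+2*m] = np.diag(z)

    dx, dz, dlam, dy = np.split(np.linalg.solve(J, -r), [n, n+m, n+2*m])

    # simple damping: stay strictly inside z > 0, lambda > 0
    alpha = 1.0
    for vec, step in ((z, dz), (lam, dlam)):
        neg = step < 0
        if np.any(neg):
            alpha = min(alpha, 0.995 * np.min(-vec[neg] / step[neg]))
    return x + alpha*dx, z + alpha*dz, lam + alpha*dlam, y + alpha*dy
```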


El-Bakry et al. were able to prove global convergence for this algorithm under stan-
dard conditions, with one additional interesting, and important, condition, namely
that the matrix $\nabla_{xx}^2 L + \nabla c^T \Lambda Z^{-1} \nabla c$ remain nonsingular throughout the algorithm,
where $\nabla c$ is the Jacobian matrix of the inequality constraints. The condition is
interesting, because while it may seem to be an artifact needed to prove convergence,
it is more than that. In [12], we have found that when no feasible solution exists,
this matrix becomes singular, causing the norm of the search vector to become very
large, and hence driving $\alpha$ to zero.

Computational experience with this algorithm is very limited. El-Bakry et al. report
some very preliminary results on a limited number of test problems of small dimen-
sion. Lasdon et al. [7] report results, on a somewhat larger test set, of a trust region
variant of the algorithm. We have been involved with applying the algorithm to non-
linear complementarity problems, which contain nonlinear programming problems as
a subset, and have found the algorithm quite promising, but still in a stage requiring
much more research. In particular, the algorithm contains many parameters, and
performance is very dependent upon parameter choice. Proper means of choosing
these will require extensive numerical testing. Further, in the modified penalty- bar-
rier method, the penalty-barrier function is minimized, at least approximately, for
each value of J-I k • Here J-I is adjusted after each single Newton step. Other adjustment
strategies may prove more computationally viable. As the method becomes more
fully tested, other issues will undoubtedly arise. Nonetheless, we find the method
sufficiently promising to merit much more research and testing. The next section
contains a few comparative results of the two methods documented here using the
research code developed to solve nonlinear complementarity problems to test the
method documented in this section.

10.4 DISCUSSION AND PRELIMINARY NUMERICAL RESULTS
The penalty-barrier algorithm described in Section 10.2 has been extensively tested
on a large variety of test problems (see [1], [2]). The slack variable alternative of
Section 10.3 is still under development, and the code we used to provide the sample re-
sults in this section has been developed and tested as a complementarity code rather
than being restricted to nonlinear optimization problems. Thus, it is likely that
more efficient versions can be developed which are restricted to nonlinear program-
ming problems, which are more structured than the more general complementarity
problems.

The problems and starting points chosen for the comparison are six of the more
difficult problems of the Hock and Schittkowski [6] suite of nonlinear test problems.

For the primal-dual algorithm $\sigma^k$ was selected as
$$\sigma^k = \min\{\eta_1,\; \eta_2\, z^{kT}\lambda^k\},$$
where $\eta_1 = .08$ and $\eta_2 = 1$. The remaining algorithmic parameters were set as follows:
$\gamma = 10^{-6}$, $\varsigma = 10^{-4}$, $\rho = 0.5$. The convergence tolerance was $\epsilon = 10^{-12}$. For the
penalty-barrier algorithm the accuracy was $10^{-8}$. The reader is referred to [2] for a
detailed description of this algorithm. The comparative results of the two algorithms
are contained in Table 10.1.
             Primal-dual    Penalty-barrier
 Problem     Iterations     Major    Minor
    23           15            3       26
    80           14            3       27
    86           11            4       39
   100           12            2       44
   106           26            3       87
   117           33            5      131

Table 10.1  Comparative performance

In both codes, Newton's method was used as the basic iterative procedure. In the
results for the penalty-barrier code, the number of major iterations is the number of
times $\mu$ was decreased and the Lagrange multipliers adjusted. The number of minor
iterations is the total number of Newton steps. For the primal-dual method, the
number of iterations is the total number of Newton steps.

The results clearly indicate that the primal-dual approach is more efficient on these
problems. It should be noted here that while the merit function used for the method
only guarantees convergence to a stationary point, not a local minimizer, in all cases
the documented minimizer was obtained. In view of the results, it is instructive to
consider the relative merits of the two approaches.

First, the primal-dual approach requires second partial derivatives, as the whole
method is predicated on solving the damped first order conditions using Newton's
method. While the modified penalty-barrier method tested here uses second partial
derivatives it has been used successfully with truncated Newton methods and limited
memory variable metric methods, both of which only require first order information.
Thus for problems where second derivative information is difficult to obtain, the
penalty-barrier method appears far preferable.

Also, when a problem has few variables but many inequality constraints, even if
Newton's method is used with the penalty-barrier method, the matrix to be factored
is of the order of the number of variables, while the primal-dual method factors a
matrix of the order of the number of variables plus the number of constraints. Here
again, the penalty-barrier method seems preferable.

An advantage of the primal-dual approach is that the Lagrange multipliers are cal-
culated directly by Newton's method rather than using first order estimates. This
should improve both accuracy and the rate of convergence. In fact, the method
is a variant of the sequential quadratic programming algorithm, which for equality
constrained problems usually converges in very few iterations. Thus for problems
with a reasonable number of constraints relative to the number of variables, and
available second order information, the algorithm should prove quite competitive.
Preliminary testing also indicates that the Hessian matrices remain better condi-
tioned, which should be a major advantage on ill-conditioned problems. This is the
case in all problems tested here.

In summary, the reemergence of barrier methods for nonlinear programming is still
in its infancy. Much remains to be done, but results to date are sufficiently promising
that we can hope to have viable algorithms in the near future.

Acknowledgements
This research was sponsored by the Air Force Office of Scientific Research, Air Force
System Command under Grant F49620-95-0110.

REFERENCES
[1] M. G. BREITFELD AND D. F. SHANNO, Computational experience with penalty-
barrier methods for nonlinear programming, RUTCOR Research Report RRR
17-93 (revised March 1994), Rutgers University, New Brunswick, New Jersey,
1995. To appear in Annals of Operations Research.
[2] ———, A globally convergent penalty-barrier algorithm for nonlinear program-
ming and its computational performance, RUTCOR Research Report RRR 12-
94 (revised September 1995), Rutgers University, New Brunswick, New Jersey,
1995.
[3] A. R. CONN, N. I. M. GOULD, AND P. TOINT, A globally convergent La-
grangian barrier algorithm for optimization with general inequality constraints
and simple bounds, Technical Report 92/07, Department of Mathematics, Fac-
ulte Universitaires de Namur, Namur, Belgium, 1992.
[4] A. S. EL-BAKRY, R. A. TAPIA, T. TSUCHIYA, AND Y. ZHANG, On the formu-
lation and theory of the primal-dual Newton interior-point method for nonlinear
programming, Technical Report TR92-40, Department of Computational and
Applied Mathematics, Rice University, 1992.
[5] A. V. FIACCO AND G. P. MCCORMICK, Nonlinear Programming: Sequential
Unconstrained Minimization Techniques, John Wiley & Sons, New York, 1968.
Reprint: Volume 4 of SIAM Classics in Applied Mathematics, SIAM Publica-
tions, Philadelphia, Pennsylvania, 1990.

[6] W. HOCK AND K. SCHITTKOWSKI, Test Examples for Nonlinear Programming
Codes, vol. 187 of Lecture Notes in Economics and Mathematical Systems,
Springer Verlag, Berlin, 1981.
[7] L. S. LASDON, J. PLUMMER, AND G. Yu, Primal-dual and primal interior
point algorithms for general nonlinear programs, ORSA Journal on Computing,
7 (1995), pp. 321-332.
[8] G. P. MCCORMICK, The projective SUMT method for convex optimization,
Mathematics of Operations Research, 14 (1989), pp. 203-224.
[9] W. MURRAY AND M. H. WRIGHT, Efficient linear search algorithms for the
logarithmic barrier function, Report SOL 76-18, Department of Operations Re-
search, Stanford University, Stanford, CA, 1976.
[10] S. G. NASH AND A. SOFER, A barrier method for large-scale constrained opti-
mization, ORSA Journal on Computing, 5 (1993), pp. 40-53.

[11] R. POLYAK, Modified barrier functions (theory and methods), Mathematical
Programming, 54 (1992), pp. 177-222.
[12] E. SIMANTIRAKI AND D. SHANNO, An infeasible-interior-point algorithm for
solving mixed complementarity problems, RUTCOR Research Report RRR 37-
95, Rutgers University, New Brunswick, New Jersey, 1995.
[13] M. H. WRIGHT, Interior methods for constrained optimization, in Acta Numer-
ica, A. Iserles, ed., Cambridge University Press, New York, 1992, pp. 341-407.
PART III
APPLICATIONS, EXTENSIONS
11
INTERIOR POINT METHODS FOR
COMBINATORIAL OPTIMIZATION
John E. Mitchell
Department of Mathematical Sciences
Rensselaer Polytechnic Institute
Troy, NY 12180

ABSTRACT
Research on using interior point algorithms to solve combinatorial optimization and inte-
ger programming problems is surveyed. This paper discusses branch and cut methods for
integer programming problems, a potential reduction method based on transforming an
integer programming problem to an equivalent nonconvex quadratic programming prob-
lem, interior point methods for solving network flow problems, and methods for solving
multicommodity flow problems, including an interior point column generation algorithm.

11.1 INTRODUCTION
Research on using interior point algorithms to solve combinatorial optimization and
integer programming problems is surveyed. Typically, the problems we consider can
be formulated as linear programming problems with the restriction that some of the
variables must take integer values. The methods we consider have been used to solve
problems such as the linear ordering problem, clustering problems, facility location
problems, network flow problems, nonlinear multicommodity network flow problems,
and satisfiability problems. This paper discusses four main methodologies, three of
which are similar to known approaches using the simplex algorithm, while the fourth
method has a different flavor.

Branch and cut methods are considered in section 11.2. Simplex-based branch
and cut methods have been very successful in the last few years, being used to
solve both specific problems such as the traveling salesman problem and also generic
integer programming problems. The research described in this paper constructs a
branch and cut algorithm of the usual type, but then uses an interior point method
to solve the linear programming relaxations. The principal difficulty with using an
interior point algorithm in a branch and cut method to solve integer programming
problems is in warm starting the algorithm efficiently, that is, in using the solution
to one relaxation to give a good initial solution to the next relaxation. Methods
for overcoming this difficulty are described and other features of the algorithms
are given. This paper focuses on the techniques necessary to obtain an efficient
computational implementation; there is also a discussion of theoretical issues in
section 11.6.1. Column generation algorithms have a structural similarity to cutting
plane methods, and we describe a column generation algorithm for solving nonlinear
multicommodity network flow problems in section 11.5.1.

In section 11.3, we discuss a method for solving integer programming problems that
is based upon reformulating the integer programming problem as an equivalent non-
convex quadratic programming problem. The quadratic program is then solved using
a potential reduction method. The potential function has some nice properties
which can be exploited in an efficient algorithm. Care is needed so that the algo-
rithm does not get trapped in a local minimum. We also discuss a related algorithm
for solving quadratic integer programming problems, which can be applied to the
graph partitioning problem, for example.

Many network flow problems can be solved by ignoring the integrality require-
ment on the variables and solving the linear programming relaxation of the problem,
because it is guaranteed that one of the optimal solutions to the linear program will
solve the integer programming problem. Typically for these problems, the simplex
method can be considerably enhanced by exploiting the structure of the constraint
matrix; there are also often very good methods which are not based on linear pro-
gramming. Thus, the challenge is to design an efficient implementation of an interior
point method which can compete with the algorithms which are already available.
We describe the research in this area in section 11.4.

Interior point approaches to the multicommodity network flow problem are
discussed in section 11.5. These include the column generation algorithm mentioned
earlier. These problems can be modelled as linear programming problems which are
too large to be solved easily, so it is necessary to use alternative methods to just
solving the linear programming problem.

Theoretical issues are discussed in section 11.6. This includes a discussion of the
computational complexity of interior point cutting plane methods and also improved
complexity results for various combinatorial optimization problems that have been
obtained through the use of interior point methods.

Finally, we offer our conclusions in section 11.7.



Figure 11.1 Feasible region of an integer program

11.2 INTERIOR POINT BRANCH AND CUT ALGORITHMS
In this section, we discuss the solution of integer programming problems using cutting
plane and branch and bound methods. Before considering the general case, we
examine the following example. Consider the integer feasible region $S$:
$$3x_1 + 5x_2 \ge 9$$
$$-2x_1 + 5x_2 \le 9$$
$$5x_1 + 2x_2 \le 25$$
$$3x_1 - 4x_2 \le 7$$
$$x_i \text{ integer}, \; i = 1,2,$$
shown in figure 11.1. The feasible points are shown by dots. The convex hull of the
feasible integer points is the set of points
$$P := \{(x_1,x_2) : x_1 + x_2 \ge 3,\; x_2 \ge 1,\; x_1 - x_2 \le 2,\; x_1 + x_2 \le 6,\; -x_1 + 2x_2 \le 3\},$$
which has extreme points $(1,2)$, $(2,1)$, $(3,1)$, $(4,2)$, and $(3,3)$. For a given linear
objective function $c^T x := c_1 x_1 + c_2 x_2$, the optimal solution to the integer program
$\min\{c^T x : x \in S\}$ will be one of these extreme points. Thus, with the given de-
scription of $P$, we could solve the integer program by solving the linear program
$\min\{c^T x : x \in P\}$. Of course, in general it is hard to find the polyhedral descrip-
tion $P$.

Let us take $c_1 = 2$, $c_2 = 3$. The solution to the integer program is then the
point $(2,1)$. A cutting plane method first solves the LP relaxation of the integer
program:
$$\min 2x_1 + 3x_2$$
$$\text{subject to } 3x_1 + 5x_2 \ge 9$$
$$-2x_1 + 5x_2 \le 9$$
$$5x_1 + 2x_2 \le 25$$
$$3x_1 - 4x_2 \le 7.$$
This problem has optimal solution $(0,1.8)$, with value 5.4. We then add an extra
constraint (or cutting plane) to the LP relaxation that is violated by the point $(0,1.8)$
but is satisfied by every point in $S$, and then resolve the LP relaxation. For example,
we could add the constraint $4x_1 + x_2 \ge 4$. Modern cutting plane methods attempt
to use cutting planes which are facets of the convex hull $P$ of $S$, so they would add
either $x_1 + x_2 \ge 3$ or $-x_1 + 2x_2 \le 3$. It is harder to find strong cutting planes like
these than a weaker cutting plane such as a Gomory cut.
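The two linear programs in this small example can be reproduced directly; the sketch below uses scipy's general-purpose LP routine (any LP solver would do) and adds the facet $x_1 + x_2 \ge 3$ mentioned above as the cutting plane.

```python
from scipy.optimize import linprog

c = [2.0, 3.0]                       # objective 2*x1 + 3*x2
A = [[-3.0, -5.0],                   #  3*x1 + 5*x2 >= 9, written as <=
     [-2.0,  5.0],                   # -2*x1 + 5*x2 <= 9
     [ 5.0,  2.0],                   #  5*x1 + 2*x2 <= 25
     [ 3.0, -4.0]]                   #  3*x1 - 4*x2 <= 7
b = [-9.0, 9.0, 25.0, 7.0]

lp = linprog(c, A_ub=A, b_ub=b, bounds=(0, None))
print(lp.x, lp.fun)                  # fractional optimum (0, 1.8) with value 5.4

# add the cutting plane x1 + x2 >= 3 (violated by (0, 1.8)) and re-solve
A.append([-1.0, -1.0])
b.append(-3.0)
lp = linprog(c, A_ub=A, b_ub=b, bounds=(0, None))
print(lp.x, lp.fun)                  # new optimum (19/7, 2/7) is still fractional,
                                     # so further cuts or branching would follow
```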

A branch and bound approach to this problem would examine the solution $(0,1.8)$
to the LP relaxation and then split the problem into two new problems, one where
$x_2 \ge 2$ and one where $x_2 \le 1$. These new linear programs are then solved and
the process is repeated. If the solution to any of the linear programming problems
that arise in this process is integer, then that point solves the corresponding part of
the integer programming problem; if any of the linear programs is infeasible, then
the corresponding part of the integer program is also infeasible. The value of the
linear program provides a lower bound on the value of the corresponding part of the
integer program, and this bound can be used to prune the search space and guide
the search.
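The branching scheme just described can be written as a short recursion on the variable bounds; the sketch below applies it to the same example. The most-fractional branching rule and the finite bounds $0 \le x_i \le 5$ are our own, purely illustrative, choices.

```python
import numpy as np
from scipy.optimize import linprog

c = [2.0, 3.0]
A = [[-3.0, -5.0], [-2.0, 5.0], [5.0, 2.0], [3.0, -4.0]]
b = [-9.0, 9.0, 25.0, 7.0]
best = {"val": np.inf, "x": None}    # incumbent integer solution

def branch_and_bound(bounds):
    lp = linprog(c, A_ub=A, b_ub=b, bounds=bounds)
    if not lp.success or lp.fun >= best["val"]:
        return                                   # infeasible, or pruned by the bound
    frac = np.abs(lp.x - np.round(lp.x))
    if np.all(frac < 1e-6):                      # integral: new incumbent
        best["val"], best["x"] = lp.fun, np.round(lp.x)
        return
    i = int(np.argmax(frac))                     # branch on a fractional variable
    lo, hi = bounds[i]
    down = list(bounds); down[i] = (lo, np.floor(lp.x[i]))
    up   = list(bounds); up[i]   = (np.ceil(lp.x[i]), hi)
    branch_and_bound(down)
    branch_and_bound(up)

branch_and_bound([(0, 5), (0, 5)])
print(best)                                      # finds x = (2, 1) with value 7
```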

Cutting plane methods and branch and bound methods can be combined into a
branch and cut method, but we will discuss them separately, in order to emphasize
their individual features. For a good discussion of simplex-based branch and cut
methods, see, for example, the books by Nemhauser and Wolsey [61] and Parker
and Rardin [64]. The book [61] is a detailed reference on integer programming and
it discusses cutting plane algorithms comprehensively; for a summary of this book,
see [62]. The book [64] also discusses cutting plane algorithms, and it discusses
branch and bound in more detail than [61]. Junger et al. [35] discuss computational
work using branch and cut algorithms to solve a variety of integer programming
problems.

As mentioned above, cutting plane and branch and bound methods work by setting
up a linear programming relaxation of the integer programming problem, solving
that relaxation, and then, if necessary, refining the relaxation so that the solution
to the relaxation gets closer to the solution to the integer programming problem.
These methods have been known for many years (Land and Doig [46], Gomory [26]),
and they have achieved very good results in the last few years. Of course, most of
these results have been achieved by using the simplex algorithm to solve the linear
programming relaxations; the focus in this section is on using an interior point
method to solve the relaxations. Unfortunately, is is not usually sufficient to simply
replace the simplex algorithm with an interior point method, because an interior
point method is not as good as the simplex algorithm at exploiting the solution to
one relaxation when trying to solve the next relaxation. This relatively poor use of
the warm start provided by the previous relaxation makes it necessary to only solve
the relaxations approximately; the algorithms seem fairly adept at exploiting this
approximate solution. Other refinements to a traditional branch and cut approach
are also necessary when using an interior point method, but the principal difference
is in the use of approximate solutions to the relaxations.

We discuss cutting plane algorithms in section 11.2.1 and branch and bound al-
gorithms in section 11.2.2. Adding a constraint to a primal linear programming
problem is structurally equivalent to adding a column to the dual problem, so re-
search on column generation algorithms has a strong impact on research on cutting
plane algorithms, and vice versa. In section 11.5.1, we discuss a column generation
algorithm for a multicommodity network flow problem. The theoretical performance
of cutting plane and column generation algorithms is discussed in section 11.6.1.

11.2.1 Interior point cutting plane algorithms


In order to simplify the discussion, we assume that all the variables are constrained
to take the values zero or one, and that all the constraints are inequality constraints.
We assume we have an integer programming problem of the form
$$\min c^T x$$
$$\text{subject to } Ax \le b \qquad (IP)$$
$$x_i = 0 \text{ or } 1,$$
where $x$ and $c$ are $n$-vectors, $b$ is an $m$-vector, and $A$ is an $m \times n$ matrix. We
assume that c is not in the row space of A; if this was not the case, every feasible
solution would be optimal. We do not make any assumptions regarding the relative
magnitudes of m and n, nor do we make any assumptions regarding the matrix A.
Many problems can be cast in this framework. We let Q denote the convex hull of
feasible solutions to (IP). The linear programming relaxation (or LP relaxation) of
(IP) is
$$\min c^T x$$
$$\text{subject to } Ax \le b \qquad (\widetilde{LP})$$
$$0 \le x \le e,$$
where $e$ denotes a vector of ones of the appropriate dimension. (We will use $e$ in
this way throughout this paper.) If the optimal solution to $(\widetilde{LP})$ is integral then
it solves the original problem (IP), because it is feasible in (IP) and it is at least
as good as any other feasible point in (IP). If the optimal solution $x^{LP}$ to $(\widetilde{LP})$ is
not integral, then we improve the relaxation by adding an extra constraint or cutting
plane of the form $a_0^T x \le b_0$. This cutting plane is a valid inequality for (IP) but it
is violated by the optimal solution $x^{LP}$. We then solve the modified LP relaxation,
and repeat the process.

The recent success of simplex based cutting plane algorithms has been achieved
through the use of polyhedral theory and specialized cutting planes; the cutting
planes are generally chosen from families of facets of the convex hull of feasible
integer points. Traditionally, Gomory cutting planes were derived from the optimal
simplex tableau; Mitchell [55] has shown how these same cutting planes can be
derived when using an interior point cutting plane algorithm.

We prefer to write the linear programming relaxation as a problem with equality
constraints; thus we include slack variables to get the relaxation
$$\min c^T x$$
$$\text{subject to } Ax = b \qquad (LP)$$
$$0 \le x \le u,$$
where $u$ is a vector of upper bounds on the variables, so $u_i = 1$ for the original
integer variables, and $u_i$ takes an appropriate value for the remaining variables. The
dual problem to (LP) is
$$\max b^T y - u^T w$$
$$\text{subject to } A^T y - w + z = c \qquad (LD)$$
$$w, z \ge 0.$$
When we add a cutting plane, we will obtain the new relaxation
$$\min c^T x$$
$$\text{subject to } Ax = b$$
$$a_0^T x + x_0 = b_0 \qquad (LPnew)$$
$$0 \le x \le u$$
$$0 \le x_0 \le u_0$$

1. Initialize: Pick initial $x$, $y$, $w$ and primal and dual slacks.

2. Approximately solve relaxation: Solve the current relaxation to the desired
   degree of accuracy using an interior point algorithm. If the current iterate is a
   sufficiently accurate solution to the original problem (IP), STOP.

3. Add cutting planes: See if the current iterate violates any constraints. If
   not, tighten the desired degree of accuracy and return to Step 2; otherwise, add
   a subset of the violated constraints and go to Step 4.

4. Update the relaxation and restart: Update the variables appropriately.
   Return to Step 2.

Figure 11.2 A prototype interior point cutting plane algorithm

for some appropriate upper bound $u_0$ on the new slack variable $x_0$. The correspond-
ing new dual problem is
$$\max b^T y - u^T w - u_0 w_0$$
$$\text{subject to } A^T y + a_0 y_0 - w + z = c$$
$$y_0 - w_0 + z_0 = 0 \qquad (LDnew)$$
$$w, z \ge 0$$
$$w_0, z_0 \ge 0.$$
Note that if we know feasible solutions $\bar{x} > 0$ and $\bar{y}$, $\bar{w} > 0$, $\bar{z} > 0$ to (LP) and
(LD) respectively, then, after the addition of the cutting plane, we can obtain a new
feasible solution to (LDnew) by taking $y = \bar{y}$, $w = \bar{w}$, $z = \bar{z}$, $y_0 = 0$ and $w_0 = z_0$. If
we pick $w_0 = z_0$ to be strictly positive then all the nonnegativity constraints will be
satisfied strictly. It is not so simple to obtain a feasible solution to (LPnew) because
we have $a_0^T \bar{x} > b_0$ if the new constraint was a cutting plane. Nonetheless, if the old
solution was close to optimal for (LP) and (LD) then we can hope that it should
also be close to the solution to (LPnew) and (LDnew), so it provides a warm start
for solution of the new problem.
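In code, the dual warm start is just an append. The snippet below (notation follows (LD) and (LDnew); the value chosen for $w_0 = z_0$ is arbitrary as long as it is positive) extends a strictly feasible dual iterate after one cut has been added.

```python
import numpy as np

def extend_dual_after_cut(y, w, z, w0=1.0):
    """Extend a strictly feasible iterate of (LD) to one of (LDnew).

    Setting y0 = 0 and w0 = z0 > 0 keeps A^T y + a0*y0 - w + z = c satisfied and
    makes the new constraint y0 - w0 + z0 = 0 hold with strictly positive slacks.
    """
    return np.append(y, 0.0), np.append(w, w0), np.append(z, w0)
```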

In this section, we discuss how an interior point method can be used in this setting.
A simple, conceptual interior point cutting plane algorithm could be written as
in figure 11.2. We will give a more formal algorithm later. Currently, the best
algorithm for linear programming appears to be the primal-dual predictor-corrector
barrier method (see Lustig et al. [49,50] and Mehrotra [52]), so we consider modifying
this algorithm for use in a cutting plane algorithm. Other interior point algorithms
which maintain strictly positive primal and dual iterates can be modified in a similar
manner. We will also briefly discuss using a dual algorithm.

With a primal-dual algorithm, we always have interior primal and dual iterates, that
is, $0 < x < u$, $w > 0$ and $z > 0$. We also have a barrier parameter $\mu$ and we refer to
an iterate as centered if we have
$$x_i z_i = \mu \quad \text{and} \quad (u_i - x_i) w_i = \mu, \qquad i = 1,\dots,n. \qquad (11.1)$$
When $\mu = 0$, these conditions are the complementary slackness conditions. Interior
point methods tend to work better when they can use iterates that are close to being
centered. The importance of having centered iterates is a theme which will recur in
this paper.

We first motivate the discussion by describing two integer programming problems.

Two example problems


The perfect matching problem can be solved by using a cutting plane algorithm -
see Grotschel and Holland [27] and Mitchell and Todd [59].

The perfect matching problem: Given a graph $G = (V,E)$ with vertices
$V$ and edges $E$, a matching is a subset $M$ of the edges such that no two
edges in $M$ share an end vertex. A perfect matching is a matching which
contains exactly $|V|/2$ edges, where $|V|$ denotes the cardinality of $V$.
Given a set of weights $w_e$ associated with the edges $e$ in $E$, the perfect
matching problem is to find the perfect matching $M$ with smallest weight
$w(M) := \sum_{e \in M} w_e$.

Edmonds [15, 16] showed that the perfect matching problem can be solved in poly-
nomial time. He also gave a complete polyhedral description of the perfect matching
problem. He showed that the optimal solution to a perfect matching problem is one
of the solutions to the linear programming problem
$$\min \sum_{e \in E} w_e x_e$$
$$\text{subject to } \sum_{e \in \delta(v)} x_e = 1 \quad \text{for all } v \in V \qquad (11.2)$$
$$\sum_{e \in E(U)} x_e \le (|U|-1)/2 \quad \text{for all } U \subseteq V \text{ with } |U| \text{ odd} \qquad (11.3)$$
$$x_e \ge 0 \quad \text{for all } e \in E, \qquad (11.4)$$
where $\delta(v)$ denotes the set of edges in $E$ which are incident to vertex $v$ and $E(U)$
denotes the set of edges in $E$ which have both end vertices in $U$, where $U$ is a

subset of V. Equations (11.2) are the degree constraints and equations (11.3) are
the odd set constraints. The number of odd set constraints is exponential in the
number of vertices, so it is impracticable to solve the linear programming problem
as expressed. Thus, in a cutting plane method, the initial relaxation consists of the
degree constraints together with the nonnegativity constraints (11.4), and the odd
set constraints are added as cutting planes. Consider, for example, the graph given
in figure 11.3. Here, the edge weights are the Euclidean lengths of the edges. The

Figure 11.3 The effect of an odd set constraint

optimal matching has M =


{(V2, V3), (v!, V4), (vs, V6)}. The LP relaxation consisting
of the degree constraints and the nonnegativity constraints has optimal solution

O~5
if e is one of the edges
" = { (Vl,V2),(V2,V3), (Vl,V3),(V4,VS), (V4,V6), (VS,V6)
otherwise

This solution violates the odd set constraint with U = {Vl, V2, V3}:

If this constraint is added to the relaxation, the optimal solution to the linear pro-
gram is the optimal matching given above.

Another problem that can be solved by a cutting plane algorithm is the linear or-
dering problem - see, for example, Grotschel, Junger and Reinelt [28] or Mitchell
and Borchers [57].

The linear ordering problem: Given a complete directed graph $G =
(V,A)$, with costs $c_{ij}$ on the arcs, define the cost of a permutation $\sigma$ of the
vertices to be $c(\sigma) := \sum_{(i,j): \sigma(i) < \sigma(j)} c_{ij}$. The linear ordering problem is to
find the permutation with the smallest cost.

The linear ordering problem is NP-Hard [44]. This problem can be expressed as an
integer linear programming problem:

mm Li,j CijXii
subject to xii+ Xji = 1 for 1 S; i < j S; IV I (11.5)
xii + Xjk + xki S; 2 for 1 S; i < j < k S; IV I (11.6)
= 0 or 1 for 1 S; i < j S; IV I (11. 7)

Grotschel et al. [28] have found several classes of valid inequalities for the linear
ordering problem which can be used as cutting planes. However, for many real world
problems, the solution to the linear programming relaxation given above solves the
linear ordering problem. The equations (11.6) are known as the 3-dicycle constraints;
notice that there are ( ; ) of them. In both [28] and [57]. the initial relaxation
consists just of the equations (11.5) together with the simple bounds 0 S; Xij S; 1 for
each edge; the 3-dicycle constraints are added as cutting planes as needed. In these
implementations, the solutions to the relaxations can be integral but not feasible in
the linear ordering problem; in this case cutting planes are used to cut off infeasible
integral points. Thus, this approach to solving the linear ordering problem does not
quite fit in the framework we discussed earlier in relation to the problems (IP)
and (LP), but that framework can be extended in an obvious manner to include
this approach to the linear ordering problem. The traveling salesman problem is
also usually formulated so that solutions to the LP relaxations can be integral but
infeasible; the subtour elimination constraints are used to cut off these infeasible
integral points. (For a good discussion of the traveling salesman problem see the
book edited by Lawler et al. [47]; for a recent description of an implementation, see
Applegate et al. [5].)

Early termination
The optimal solution to (LP) is not an interior point. Therefore, if we solve (LP) to
optimality then it is necessary to perturb the solution slightly to obtain an interior
point before we can even start solving (LPnew) using an interior point method.
Typically, if an interior point method is started from close to the boundary, it will
move towards the center of the feasible region before starting to move towards the
optimal solution. Thus, the optimal solution to (LP) is not a very good starting point
for trying to solve (LPnew). A very successful method to try to avoid this difficulty
is to terminate solution of (LP) early. We will then have an interior point when we
start solving (LPnew). In addition, we will not spend as many iterations returning
towards the center of the polyhedron and we will start moving towards the optimal
solution to (LPnew) more quickly. Thus, we will spend fewer iterations solving

(LP) because we only solve it approximately, and we will also spend fewer iterations
solving (LPnew) because we start off with an iterate which is more centered.

We can terminate the solution of (LP) early if we can find cutting planes which are
violated by the current solution. In fact, if we can find cutting planes which are
violated by this current iterate, they may well be deeper cuts and cut off more of the
feasible region, because the iterate is closer than the optimal solution to the center
of the polyhedron. We may also be able to find more good cutting planes at this
early iterate.

Consider, for example, the perfect matching problem on the graph in figure 11.4,
where edge weights are the Euclidean lengths.

Figure 11.4 An illustration of the phenomenon of nested odd sets

The optimal matching in this graph uses edges (v_1, v_10), (v_2, v_3), (v_4, v_5), (v_6, v_7), and (v_8, v_9). The optimal solution to
the LP relaxation consisting of just the degree constraints and nonnegativity has

    x_e = 1     for edges (v_2, v_3), (v_8, v_9)
    x_e = 0.5   for edges (v_1, v_4), (v_1, v_5), (v_4, v_5), (v_6, v_7), (v_6, v_10), (v_7, v_10)
    x_e = 0     otherwise.

The separation routine for detecting violated odd set constraints involves finding connected
components in the graph that retains only the edges with x_e > τ for some threshold
τ ≥ 0. Thus, it would find the violated constraints for the odd sets {v_1, v_4, v_5}
and {v_6, v_7, v_10}. If we search at an early iterate, we may well have x_e > 0 on edges
(v_2, v_5) and (v_3, v_4), and in addition the values x_e on these edges are discernibly larger
than those on the edges (v_4, v_6), (v_5, v_7) and (v_1, v_10). Thus, for appropriately chosen
values of τ, we would find the violated odd set constraints given above and also the
constraints corresponding to the odd sets {v_1, v_2, v_3, v_4, v_5} and {v_6, v_7, v_8, v_9, v_10}.
Without these constraints, the solution to the relaxation is fractional; thus, these
constraints are necessary, and the ability of the interior point method to find these
constraints at an earlier stage means that one fewer LP relaxation has to be solved.
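The separation idea just described can be sketched as follows; this is a plausible rendering of the connected-component search, not the code of the implementations cited above. The fractional solution is assumed to be given as a dictionary mapping edges (i, j) to values x_e.

def separate_odd_sets(vertices, x, tau):
    # Keep only the edges with x_e above the threshold tau.
    adj = {v: [] for v in vertices}
    for (i, j), val in x.items():
        if val > tau:
            adj[i].append(j)
            adj[j].append(i)
    # Find connected components by depth-first search; odd components with at
    # least three vertices are candidates for violated odd set constraints (11.3).
    seen, candidates = set(), []
    for v in vertices:
        if v in seen:
            continue
        stack, comp = [v], set()
        while stack:
            node = stack.pop()
            if node not in comp:
                comp.add(node)
                stack.extend(adj[node])
        seen |= comp
        if len(comp) % 2 == 1 and len(comp) >= 3:
            candidates.append(comp)
    return candidates

Each candidate set U still has to be tested explicitly against (11.3), since an odd component need not correspond to a violated constraint.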

There are two disadvantages to looking for cuts before solving the current relaxation
to optimality. Firstly, we may be unable to find any cuts, so the search is a waste
of time. Secondly, the search may return cuts which are violated by the current
iterate, but which are not violated by the optimal solution, so we may end up
solving additional relaxations. The second disadvantage can be mitigated by moving
towards the optimal solution from the center of the polyhedron, making it unlikely
that we will violate unnecessary cutting planes. One method for reducing the impact
of the first disadvantage is to use a dynamically altered tolerance for deciding when
to search for violated cutting planes. We only search when the duality gap drops
below this tolerance. If we find a large number of violated constraints, we increase
the tolerance, because we probably did not need to solve the relaxation to such a
high degree of accuracy. If we only find a small number of violated constraints, we
decrease the tolerance - we should solve the relaxations more accurately to obtain
a better set of cutting planes as the relaxation becomes a better approximation to
the convex hull of feasible integer points. As the number of violated cutting planes
drops, it should also take fewer iterations to solve the next relaxation because the
two relaxations should be close to each other.
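A minimal sketch of such a dynamically altered tolerance follows; the thresholds and scaling factors are illustrative choices, not values taken from the references.

def update_search_tolerance(tol, num_cuts_found, many=20, few=3):
    if num_cuts_found >= many:
        # Many violated constraints were found: the relaxation did not need to be
        # solved this accurately, so loosen the tolerance before the next search.
        return 2.0 * tol
    if num_cuts_found <= few:
        # Few violated constraints: tighten the tolerance so the next relaxation
        # is solved more accurately before cutting planes are sought.
        return 0.5 * tol
    return tol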

Early termination is the most important technique for improving an interior point
cutting plane algorithm. By using a dynamically altered tolerance for determining
when to search for cutting planes, the time spent on unnecessary searches can be
dramatically reduced.

Restarting the algorithm and regaining primal feasibility


When adding a cutting plane, we can obtain a new feasible interior dual iterate by
setting y_0 = 0 and w_0 = z_0 = ε for some appropriately small positive value of ε.
It is not straightforward to transform the old primal iterate into a feasible interior
iterate in the new problem (LPnew). One option is to pick a strictly positive value
for x_0 and then use a primal-dual infeasible interior point method, as described in,
for example, Zhang [82]. To improve stability and performance, it is useful to also
increase any small components of x, w, z and u - x up to ε. It is also often useful
to take a pure centering step when restarting. This works reasonably well, typically
using about one third to one half of the number of iterations required to solve the
problem from a cold start. However, the sequence of iterates often tends to move
towards the center of the feasible region and away from the optimal solution while
attempting to regain feasibility, with the result that it takes several iterations to
solve (LPnew).

We have found that better performance can be obtained if it is possible to update


the primal iterate to a point that is known to be feasible and interior in (LPnew).
Any interior point which is a convex combination of feasible integral points will

satisfy all cutting planes, so it will be feasible in (LPnew). In addition, it will be


interior in (LPnew) provided it satisfies all the cutting planes strictly. Any point
in the relative interior of Q will be feasible and interior in (LPnew). It is often
straightforward to find an initial point of this type; this point can be updated as the
algorithm progresses, either by combining it with integral solutions that are found
by heuristics, or by combining it with any iterate which is in the convex hull. The
improved performance of the algorithm arises because there is no need to balance the
search for feasibility with the searches for optimality and centrality, and also because
the point in the convex hull is often more centered than the point which is feasible
in (LP).

One possible reason for the difficulty with restarting an interior point cutting plane
method with an infeasible point can be developed from the work of, for exam-
ple, Anstreicher [3], Mizuno et a/. [60], and Zhang [82], who have all discussed
interior point algorithms for linear programming which move towards feasibility
and complementary slackness simultaneously. A common feature of the analysis
of these algorithms is the exploitation of the fact that they move towards feasibil-
ity at least as fast as they move towards complementary slackness. When restart-
ing directly from the approximate solution to the previous relaxation (LP), the
primal infeasibility is x_0 + |b_0 - a_0^T x̄|. The total complementary slackness is
x̄^T z + (u - x̄)^T w + x_0 z_0 + (u_0 - x_0) w_0. In order to get an iterate which is ap-
proximately centered, we could choose w_0 = z_0 = 2μ/u_0 and x_0 = u_0/2. The
complementary slackness will then be approximately (2n + 2)μ, so the ratio between
infeasibility and complementary slackness will be large if μ is small. Other choices
for x_0, w_0, and z_0 would require a tradeoff between centrality and this balance be-
tween infeasibility and complementary slackness. This may explain why it is hard
to get very fast convergence from the infeasible warm start generated in a cutting
plane algorithm.
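To make the imbalance concrete, the following sketch evaluates the approximately centered restart choice w_0 = z_0 = 2μ/u_0, x_0 = u_0/2 for a single added cut and returns the ratio of primal infeasibility to total complementary slackness; the numbers in the example call are illustrative.

def restart_imbalance(mu, n, u0, cut_violation):
    # Approximately centered values for the new slack variable and its duals.
    x0 = u0 / 2.0
    w0 = z0 = 2.0 * mu / u0
    # Old variables contribute about 2*n*mu if the previous iterate was centered;
    # the new slack adds x0*z0 + (u0 - x0)*w0 = 2*mu, giving roughly (2n + 2)*mu.
    comp_slack = 2.0 * n * mu + x0 * z0 + (u0 - x0) * w0
    # Primal infeasibility of the new row: x0 + |b0 - a0^T xbar| = x0 + cut_violation.
    infeasibility = x0 + cut_violation
    return infeasibility / comp_slack

# The ratio blows up as mu shrinks, which is exactly the difficulty described above.
print(restart_imbalance(mu=1e-6, n=1000, u0=1.0, cut_violation=0.5))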

Ye, Mizuno and Todd [81] introduced a skew-symmetric self-dual algorithm for
linear programming. Further investigation of this algorithm is described in, for
example, [76, 77]. This algorithm has the property that it is easy to generate a
perfectly centered initial iterate. This has the potential to make this algorithm very
useful in a cutting plane framework, because we can take the iterate for the previous
relaxation, modify it slightly, and obtain an almost centered iterate for the new
relaxation. This is an issue that needs more computational investigation.

Adding many constraints at once


It is usual in practice to add many constraints at once. If we add many constraints
to the relaxation (LP), we obtain the new relaxation

    min        c^T x
    subject to Ax = b
               A_0 x + x_0 = b_0                       (LPmany)
               0 ≤ x ≤ u
               0 ≤ x_0 ≤ u_0,
for some appropriate upper bound u_0 on the new slack variables x_0. Note that
x_0 and u_0 are now vectors and A_0 is a matrix of the appropriate dimension. The
corresponding dual problem is

    max        b^T y + b_0^T y_0 - u^T w - u_0^T w_0
    subject to A^T y + A_0^T y_0 - w + z = c
               y_0 - w_0 + z_0 = 0                     (LDmany)
               w, z ≥ 0
               w_0, z_0 ≥ 0,

where y_0, w_0, and z_0 are all vectors with dimension equal to the number of added
constraints. If we have an interior feasible solution to (LP) and (LD) then we can
get an interior feasible solution to (LDmany) by setting y_0 = 0 and w_0 = z_0 = εe
for some small positive constant ε. If we set x_0 to some positive vector, we can then
restart using an infeasible interior point method. Alternatively, it may be possible
to update the primal solution to a point which is feasible in (LPmany). Thus, the
algorithm can be restarted when many constraints are added in much the same way
that it can be restarted when only one constraint is added.

A very large number of constraints is occasionally generated when solving some


problems. If all of these constraints are added to the relaxation at once, the algorithm
slows down for several reasons:

• The LP relaxation has been changed dramatically, so the interior point algo-
rithm takes several iterations to approach a new center, and then several itera-
tions to move towards the optimal solution.

• The constraint matrix becomes far larger, so the computational time required
at each iteration increases.

• It takes more iterations to solve a linear program with a large number of con-
straints than one with a small number.

Thus, it is advisable to only add a subset of the constraints.

Perhaps the simplest method for choosing which constraints to add is to add the
constraints that are most violated by the approximate solution x̄ to (LP). The
disadvantages of this method are that it may add a large number of very similar
constraints, or that a constraint with a large violation may not actually be that
important.

In some situations, it is useful to consider the effect of a constraint on the structure


of the nonzeros in the constraint matrix when deciding whether to add a constraint.
It is desirable to keep the Cholesky factors of the product AA^T sparse in order to
be able to calculate the projections quickly, and if we add a lot of constraints which
all use the same variables then these constraints are going to lead to fill-in in AA^T.
This was found to be an important consideration when solving the linear ordering
problem [57]. For this problem, it was found to be very advantageous to add only
a subset of the violated 3-dicycle constraints that was pairwise arc-disjoint - if the
same arc appears in two different 3-dicycles then the inner product between the rows
of the constraint matrix representing the two constraints is nonzero, so there is fill-in
in AA^T.
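One plausible way to implement the arc-disjoint selection rule is a greedy pass over the violated 3-dicycles, ordered by violation, keeping a cycle only if none of its arcs has been used yet; this is a sketch, not the procedure of [57].

def select_arc_disjoint(violated_dicycles):
    # Each entry is (violation, (arc1, arc2, arc3)); process the deepest cuts first.
    used_arcs, selected = set(), []
    for violation, arcs in sorted(violated_dicycles, reverse=True):
        if all(arc not in used_arcs for arc in arcs):
            selected.append(arcs)
            used_arcs.update(arcs)   # keeps the chosen constraint rows arc-disjoint
    return selected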

An alternative measure for the importance of a constraint has been proposed by


Atkinson and Vaidya [6], who considered convex feasibility problems. In their setting,
the current relaxation has the constraints Cx ≤ g, 0 ≤ x ≤ u for some upper bound u,
and they consider a possible constraint of the form h^T x ≤ h_0. They suggest looking
at the interplay of the constraint with the Hessian of the barrier function used in
an interior point method; in particular, if C has full column rank then the quantity
h^T (S^2 + C^T D^2 C)^{-1} h should be large if the constraint is important, where S and
D are appropriate diagonal matrices. The cutting plane algorithm described earlier
in this section calculates projections using matrices of the form S̄^2 + C D̄^2 C^T rather
than S^2 + C^T D^2 C for some appropriate diagonal matrices S̄ and D̄; however, these
products are related (see, for example, Birge et al. [9]), so a similar test could
be derived using quantities already available in the algorithm. To the best of our
knowledge, nobody has investigated this measure computationally.
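For completeness, a dense sketch of how the Atkinson-Vaidya quantity could be evaluated; it ignores the sparsity and reuse issues a practical implementation would have to address, and the argument names are illustrative.

import numpy as np

def av_importance(h, C, s, d):
    # S = diag(s) and D = diag(d) are the diagonal matrices from the barrier
    # Hessian; here they are simply passed in as vectors.
    H = np.diag(s**2) + C.T @ np.diag(d**2) @ C
    # Evaluate h^T (S^2 + C^T D^2 C)^{-1} h by solving one linear system.
    return float(h @ np.linalg.solve(H, h))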

Dropping constraints
Computationally, it is useful to be able to drop constraints because this will reduce
the time required for each iteration by reducing the size of the constraint matrix.
An additional benefit of dropping constraints is that smaller linear programs require
fewer iterations to solve. The simplest way to decide whether to drop a constraint is
to check its slack value - if the current iterate satisfies the constraint easily, then the
constraint is a candidate to be dropped. When a constraint is dropped, the structure

of the matrix AA^T is changed, so it is necessary to calculate a new ordering of the


columns of this matrix for the Cholesky factorization. Because of this work, the
algorithm described in [57] only dropped constraints when other constraints were
added to the matrix.

The Atkinson-Vaidya measure of the importance of a constraint can be used to


decide whether to drop a constraint. To the best of our knowledge, nobody has
attempted to use this heuristic, principally because of the work required to calculate
this measure.

Primal heuristics
Primal heuristics are algorithms which generate integer solutions to (IP) from frac-
tional solutions to (LP). If they are very cheap, it is possible to call them at every
iteration; however, it is usually more cost effective to only call them when the sepa-
ration heuristics are also called.

For many problems, it is usually considerably easier to find the optimal solution
than to prove that it is optimal. The primal heuristics may well find the optimal
solution, and the cutting plane method can then be used to prove that that solution
is optimal. If the objective function vector c is integer then we do not need to
proceed any further with the cutting plane algorithm once the lower bound provided
by the value of (LD) is within one of the value of the best known feasible solution
to (IP) provided by the primal heuristics. Thus, the primal heuristics may well save
us work by letting us terminate without having to construct a relaxation which has
an optimal solution that is feasible in (IP).

If the interior point method is converging to a point in the interior of the optimal
face of Q then the primal heuristics may well provide one of the optimal solutions
to (IP), so we can terminate the algorithm, because the value of the relaxation will
agree with the value of the integer solution. Without the primal heuristics, we may
search futilely for cutting planes, and be forced to branch. Thus, a good primal
heuristic algorithm can save a great deal of time.

Another use for the primal heuristic is to modify the restart point in Q. It can be
modified to be slightly closer to the integer point generated by the primal heuristics.

Multiple optimal solutions and degeneracy


If the integer programming problem has multiple optimal solutions, then it is likely
that the iterates generated by an interior point cutting plane method will converge
to a point in the interior of the optimal face of Q. In this case, the primal heuristics

can usually be used to find an optimal solution to (IP). Alternatively, Megiddo's


approach [51] can be used to find an optimal basic feasible solution in Q which will
then solve (IP). Megiddo's algorithm is strongly polynomial.

It is possible that the fractional optimal solution may provide more information than
one of the optimal integer solutions. For example, in the linear ordering problem, an
optimal fractional solution corresponds to a partial ordering of the nodes, and every
ordering which agrees with this partial ordering is optimal.

For a survey of the effects of degeneracy on interior point methods for linear program-
ming problems, see Güler et al. [30]. Degeneracy does not appear to be as serious a
problem for interior point methods as it is for the simplex algorithm. The principal
practical effect of degeneracy on an interior point method is to cause possible numer-
ical problems because of numerical instability and ill-conditioning of the projection
matrix. Many integer programming problems have highly degenerate relaxations, so
an interior point method might be particularly well suited to such problems.

Fixing variables
Simplex branch and cut methods can use reduced costs to fix variables at zero or one
in the following manner. Let r be the reduced cost of a variable which is currently
zero in the solution to the relaxation. Let v^UB be the value of the best known feasible
solution to (IP) and let v^LB be the value of the relaxation. If r > v^UB - v^LB then
this variable must be zero in any optimal solution to (IP). A similar test can be
given for fixing a variable at one.

The reduced costs are not available at the current interior solution to the relax-
ation (LP), but the dual variables are available, and these can be used to fix vari-
ables. If z_i is the dual variable corresponding to the primal variable x_i and if ṽ^LB
is the value of the current feasible solution to (LD) then x_i can be fixed at zero if
z_i > v^UB - ṽ^LB. A similar test can be used to fix variables at one. See Mitchell [55]
for more details.
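A sketch of this fixing test using the dual variables available at an interior iterate follows; the rule for fixing at one mirrors the rule for fixing at zero using the duals of the upper bounds, which is one natural reading of the "similar test" mentioned above.

def fix_variables(z, w, v_ub, v_lb):
    # z[i]: dual variable for the lower bound x_i >= 0,
    # w[i]: dual variable for the upper bound x_i <= 1,
    # v_ub: value of the best known integer solution, v_lb: current dual value.
    gap = v_ub - v_lb
    at_zero = [i for i, zi in enumerate(z) if zi > gap]
    at_one = [i for i, wi in enumerate(w) if wi > gap]
    return at_zero, at_one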

The complete algorithm


We summarize the complete algorithm in figure 11.5.

Algorithms which only require positive dual iterates


After adding constraints to the primal problem, the current primal iterate is infeasi-
ble and the dual variables corresponding to the additional columns have value zero.

1. Initialize. Set up the initial relaxation. Find initial interior primal and dual
points. If possible, find a feasible point in Q. If possible, find a restart point in
the relative interior of Q for use in Step 8.

2. Inner iteration. Perform one iteration of the primal-dual algorithm. While
the duality gap is above the tolerance τ, repeat this step.

3. Primal heuristics. Use the primal heuristics to try to improve on the current
best solution to (IP). If successful, also update the known feasible point in the
relative interior of Q.

4. Look for cutting planes. Use heuristics and/or exact algorithms to find
cutting planes, if any exist.

5. Add cutting planes. If any cutting planes were found in Step 4 then add an
appropriate subset.

6. Fix variables. If possible, fix variables at zero or one.

7. Drop cutting planes. If any cutting plane appears to no longer be important,


drop it.

8. Modify current iterate. Increase any small components of w and z to a
small value ε. If a feasible point in the relative interior of Q is known, update
the primal solution to this point. Otherwise, increase any small components of
x and the vector of primal slacks to ε. Modify the barrier parameter. If not
using the predictor-corrector algorithm, perform one pure centering step to get
a better initial point for the next relaxation. Return to Step 2.

Figure 11.5 An interior point cutting plane algorithm



Some barrier methods, affine methods, and projective methods have been developed
for solving problems using either just the primal variables or just the dual variables,
and such methods can be used to solve the dual problem. Mitchell and Todd [59] con-
sidered using a projective algorithm applied to the dual problem in a cutting plane
algorithm. This algorithm does not require primal iterates, and only uses the value
of a primal feasible point in calculating the direction at each iteration. Heuristics
are used to generate primal solutions. When cutting planes are added to the primal
problem, a strictly positive dual iterate is obtained by first moving in a direction
which is guaranteed to increase the additional dual variables, and the algorithm is
then restarted from this new point. They obtained reasonable computational results
on matching problems, in terms of the number of iterations required. They also
obtained promising results on linear ordering problems; for details see [54].

Goffin et al. [25, 22, 21] have also experimented with algorithms which only require
primal iterates in their algorithms for nonsmooth optimization and multicommodity
flow problems. For a discussion of their algorithm for multicommodity flow problems
see section 11.5.1.

11.2.2 Interior point branch and bound methods


Branch-and-bound is a method for solving an integer programming problem (IP)
by solving a sequence of linear programming problems. The subproblems can be
regarded as forming a tree, rooted at the linear programming relaxation (LP) of
the integer programming problem. As we move down the tree, more and more
integer variables are fixed at their values. We provide a very brief description of the
technique in order to highlight some aspects which prove important when using an
interior point method. For a more detailed discussion of branch-and-bound and the
options available, see, for example, Parker and Rardin [64].

As with interior point cutting plane methods, one of the important features of a
competitive interior point branch and bound algorithm is that the relaxations are
not solved to optimality but are terminated early. This is usually possible, as we now
argue. When using branch-and-bound, one of four things can happen at each node
of the tree. The subproblem could be infeasible; in an interior point method this
can be detected by finding a ray in the dual problem. The subproblem could have
optimal value worse than the value of a known integer feasible solution, so the node
is fathomed by bounds; in an interior point method, this can usually be detected
well before the subproblem is solved to optimality. The optimal solution could be
an integer solution with value better than the best known solution; in this case we
need to solve the subproblem to optimality, but the node is then fathomed. The
final possibility is that the optimal solution to the subproblem has optimal value

smaller than the best known solution, but the optimal solution is not feasible in the
integer program; in this case, it is possible to use heuristics based upon the basis
identification techniques described in El-Bakry et al. [17] to determine that one of
the integer variables is tending to a fractional value, and therefore that we should
probably not solve the relaxation to optimality but should branch early.

It should be noted that in only one case is it necessary to actually solve the relaxation
to optimality, and in that case the node is fathomed. When we branch early, one
constraint in the dual relaxation (LD) is dropped, so the previous solution to (LD) is
still feasible. One variable in (LP) is fixed, so infeasibilities are introduced into (LP).
Despite this, it is still possible to solve the child node quickly [10, 11].

A branch and bound interior point algorithm has the form given in figure 11.6.

Notice that we do not necessarily maintain primal and dual feasibility throughout
the algorithm. This means that some of the tests for convergence have to depend
upon whether the iterates are feasible.

If the relaxation is infeasible, that is detected in Step 3d. If the relaxation has an
optimal value that is worse than that of the best known integer solution then that is
detected in Step 3c. In these two situations, the solution of the relaxation should not
take very many iterations, and the node is then fathomed. If the relaxation has an
integer solution which is better than the best known solution, then this relaxation is
solved to optimality and the node is fathomed in Step 3b. Here, the solution of the
relaxation may take several iterations, because an exact solution is needed, but the
node is then fathomed. Of course, it is possible that the rounding heuristics provide
the optimal solution to the relaxation early, and this is sufficiently close in value to
the dual value that the solution to the relaxation can be terminated early.

The one other possibility for a node of the tree is that the optimal solution to the
relaxation is fractional, but it has value smaller than that of the best known integer
solution. We discuss this situation further in section 11.6.

If it is necessary to branch then two child nodes will be created. It may eventually
be necessary to solve the relaxations at these child nodes, so it will be necessary to
start an interior point method on these relaxations. The simplex method can start
directly from the solution to the parent node, using the dual simplex algorithm. It
is necessary to modify the solution to the parent slightly before restarting with an
interior point method, in order to obtain a slightly more centered point. We discuss
restarting in more detail in section 11.6.

The linear programming problems generated in a branch-and-bound tree can suffer


from considerable degeneracy, which can greatly slow the simplex method.

1. Initialize: Pick an initial relaxation. Choose an initial primal and dual


iterate, using, for example, the method of Lustig et al. [50]. Set a tolerance τ
for optimality. Initialize the branch and bound tree to contain only the initial
relaxation.
2. Pick a node: Select a node of the branch and bound tree to solve next. Find
an initial solution for this node. (We discuss a method for restarting at a node
later.)
3. Perform an interior point iteration: Perform one iteration of the interior
point method for the current node.
(a) Attempt to find a good integer solution by rounding the fractional solution
in an appropriate manner. If this gives the best solution found so far, store
it, and take its value as an upper bound on the optimal value of (IP).
(b) If the duality gap of the current relaxation is smaller than T and if the
primal value is better than the upper bound on the optimal value of (IP)
then
• If the primal solution is integral, fathom this node, update the upper
bound on the optimal value of (IP), and return to Step 2.
• If the primal value is nonintegral, go to Step 5.
(c) If the dual value of the current node is greater than the best known upper
bound, prune this node.
(d) If we can find a dual ray, showing that the primal problem is infeasible,
prune this node.
(e) If the current solution is dual feasible and if the relative primal infeasibility
is smaller than some tolerance, go to Step 4.
Repeat this step.
4. Check for nonintegrality: Check to see whether the solution to the current
node appears to be fractional. If it appears that the solution to this node will be
fractional, and if it appears unlikely that this node will be fathomed by bounds,
go to Step 5; otherwise return to Step 3.
5. Branch: Split the current node into two nodes. Pass the current primal and
dual solution onto the child nodes as a warm start. Go to Step 2.

Figure 11.6 An interior point branch and bound algorithm



Degeneracy is generally not such a problem for interior point methods, and at least one
commercial package has installed a switch to change from simplex to an interior
point method within the branch-and-bound tree if difficulties arise. Applegate et
al. [5] also implemented such a switch. For a discussion of the effects of degeneracy
on interior point methods for linear programming, see Güler et al. [30].

One cost of using the simplex algorithm in a branch and bound method is that it
is necessary to perform an initial basis factorization for each child subproblem. The
cost of this is clear when examining, for example, the performance of the branch and
bound code for OSL [33] on solving integer programming problems: it often happens
that the average time per iteration is about three times larger for subproblems than
it is for the root node of the branch and bound tree. This extra time is almost
completely attributable to the overhead required at each node to calculate a new
basis factorization. A comparable slow down does not happen with interior point
methods. One way to avoid this overhead would be to store the basis factorization
of each parent node, but this usually requires too much storage and is hence imprac-
ticable. Of course, part of the reason that the slow down is so noticeable is that the
simplex algorithm requires far fewer iterations to solve subproblems than to solve
the root node, because the optimal solution to the parent node does provide a good
simplex warm start for the child subproblem. At present, it does not seem possible
to get a similar reduction with an interior point method, but the fact that the basis
refactorization is so expensive means that it is not necessary to obtain as good a
reduction in the number of iterations as enjoyed by the simplex algorithm.

Interior point branch and bound methods are somewhat competitive with simplex
based branch and bound algorithms on some problems, including facility location
problems [11]. These problems have a large number of continuous variables and a
relatively small number of integer variables, so the LP relaxations are large yet the
branch and bound tree is small. It is necessary to have large relaxations for an
interior point method to compete with a simplex method. In addition, we need the
problem to be solvable on current hardware, so the branch and bound tree can not
grow too large, so we need to have only a small number of integer variables. Because
of the early termination of the solution of the relaxations, not as much information
is available at each node, so the pseudo costs [64] used to select the next branching
variable and the next node can not be calculated as accurately. This is another
reason why an interior point method can currently only be competitive on problems
with a small proportion of integer variables, because in this situation the effect of a
bad choice of branching variable is not so dramatic. More research is needed to find
good, reliable pseudo costs in an interior point branch and bound method.

To conclude this section on interior point branch and bound methods, we discuss
restarting the algorithm (subsection 11.6) and terminating the solution of the re-

laxation early when the iterates are tending towards a fractional solution (subsec-
tion 11.6).

Terminating the current relaxation early


In this section, we assume that the optimal solution to the relaxation is fractional, but
it has value smaller than that of the best known integer solution. In this situation,
we can use basis identification techniques [17], as mentioned in Step 4. This will
usually save a few iterations on the solution of the current node, and it will also
result in termination at a more centered iterate, which will result in a better initial
iterate for each child of this node.

There is a risk associated with attempting to stop solution of the parent subproblem
early: the parent may be split into two child subproblems, when it might have been
possible to prune the parent if it had been solved to optimality. This could happen if
the parent subproblem has worse objective function value than that of the best known
feasible solution to (IP), or if it is infeasible, or if it has an integer optimal solution.
(Notice that the last possibility is unlikely if the basis identification techniques are
working well.) Therefore, it is wise to include some safeguards to attempt to avoid
this situation. Upper and lower bounds on the value of a subproblem are provided
by the values of the current primal and dual solutions, respectively, and these can
be used to regulate the decision to branch.

There are three tests used in [11] to prevent branching too early: the dual iterate
must be feasible, the relative primal infeasibility must be no greater than 10%,
and the dual objective function must not be increasing so quickly from iteration to
iteration that it is likely that the node will be fathomed by bound within an iteration
or two. Dual feasibility can usually be maintained throughout the branch and bound
tree so the first criterion is basically just a technicality. Every time a variable is
fixed, primal infeasibility is introduced; if the initial iterate for a subproblem is a
good warm start, primal feasibility can be regained in a few iterations. Thus, the
important criterion is the third one, regarding the increase in the dual value. This
criterion prevents branching if the difference between the dual value and the value of
the incumbent integer solution has been reduced by at least half in the last iteration,
provided that, if the current primal iterate is feasible, its value is greater than that of
the incumbent integer solution.

Warm starting at a node of the tree


The exact method used for restarting at a node of the branch and bound tree depends
upon the interior point algorithm used. In this section, we assume that the primal-

dual barrier method is being employed (see, for example, Lustig et al. [49]). It
should be noted that many of the observations we make will also be applicable if
other interior point methods are used.

Assume a child problem has been created by fixing the variable x_0 at 0 or 1 in the
parent problem

    min        c^T x + c_0 x_0
    subject to Ax + a_0 x_0 = b                        (LPparent)
               0 ≤ x, x_0 ≤ e,
where A is an m × n matrix, a_0 and b are m-vectors, c and x are n-vectors, and c_0
and x_0 are scalars. The child problem has the form

    min        c^T x
    subject to Ax = b̄                                 (LPchild)
               0 ≤ x ≤ e,

where b̄ = b if x_0 is fixed at zero, and b̄ = b - a_0 if x_0 is fixed at one. An approximate


solution x = x^*, x_0 = x_0^* to (LPparent) is known. Since we created this particular
child problem, x_0^* must be fractional, so x^* is probably infeasible in (LPchild). If
we examine the dual problems

    max        b^T y - e^T w - w_0
    subject to A^T y - w + z = c                       (LDparent)
               a_0^T y - w_0 + z_0 = c_0
               w, w_0, z, z_0 ≥ 0,

and

    max        b̄^T y - e^T w
    subject to A^T y - w + z = c                       (LDchild)
               w, z ≥ 0,
we notice that the approximate solution y = y^*, w = w^*, z = z^* (and w_0 = w_0^*,
z_0 = z_0^*) to (LDparent) is still feasible in (LDchild) - all that has changed is the
objective function value. Therefore, it is possible to restart the algorithm using an
infeasible interior point method. It may be that some of the components of x* are
very close to zero or one, and these components should be modified to give a slightly
more interior point, with small components being increased and large components
being decreased. Similarly, if some components of the dual variables w^* and z^* are
smaller than some tolerance, they should be increased, at the cost of making the
initial iterate for the subproblem slightly more infeasible. Thus, we can start the
interior point method on the child problem if we have stored x^*, y^*, and w^*. It may
be beneficial to use a pure centering step first before updating the barrier parameter
μ in a standard manner.
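A sketch of this modification of the parent solution before the child problem is attacked; the tolerance and the amounts by which components are moved are illustrative.

import numpy as np

def warm_start_child(x_star, w_star, z_star, eps=1e-3):
    # Push primal components away from the bounds 0 and 1 to get a more interior point.
    x = np.clip(x_star, eps, 1.0 - eps)
    # Increase tiny dual components; this costs a little dual infeasibility but gives
    # a better centered starting point for the infeasible interior point method.
    w = np.maximum(w_star, eps)
    z = np.maximum(z_star, eps)
    return x, w, z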

It may be possible to start the solution of the child problem from an iterate for
(LPparent) which was found before x^*. This earlier iterate would be further from op-
timality for (LPparent) than x^*, but it may be a better initial solution to (LPchild)
just because it is more centered, with the nonbasic components being somewhat
larger. Preliminary experiments show that this approach may hold some promise,
but it needs considerably more investigation.

11.3 A POTENTIAL FUNCTION METHOD


Karmarkar and various coauthors [43, 40] proposed a novel approach to solve integer
programming problems. This approach examines a related continuous optimization
problem and uses this continuous problem to approach a solution to the original
integer program. We outline their approach in this section.

The algorithm transforms the problem to a hard equivalent quadratic programming


problem (see section 11.3.1). At each iteration, given a strictly feasible iterate for
the hard QP, it solves an easier nonconvex quadratic programming problem to find
a direction in the hard quadratic problem (see section 11.3.2), and it moves in the
direction to obtain a new iterate; it also rounds the new point to an integer point
which is then checked for feasibility (see section 11.3.3). It may happen that the
algorithm converges to a noninteger solution; a method for handling this situation
is described in section 11.3.4. Computational experience with the algorithm is dis-
cussed in section 11.3.5.

Finally, in section 11.3.6, we describe a method for solving quadratic integer pro-
gramming problems.

11.3.1 Transforming the problem


The integer programming feasibility problem can be stated as

Find a point x in R^n satisfying Ax ≤ b, x_i ∈ {0, 1}, i = 1, ..., n,        (IPfeas)

where A is an m x n matrix and b is an m-vector. This can be scaled to give the


equivalent problem

Find a point x̄ in R^n satisfying Āx̄ ≤ b̄, x̄_i ∈ {-1, 1}, i = 1, ..., n,



where x̄ = 2x - e, Ā = A, and b̄ = 2b - Ae. This can be transformed to the nonconvex
quadratic programming problem

    min        n - x̄^T x̄
    subject to Āx̄ ≤ b̄                                 (QP)
               -e ≤ x̄ ≤ e.

The problem (QP) is written equivalently as

    min        n - x̄^T x̄
    subject to Âx̄ ≤ b̂,                                (QP)

where Â = [Ā^T  I  -I]^T and b̂ = [b̄^T  e^T  e^T]^T. We define m̂ = m + 2n, so Â is an
m̂ × n matrix and b̂ is an m̂-vector. The feasible solutions to (IPfeas) correspond
to points in (QP) with value zero. Any other feasible point to (QP) has strictly
positive value. The algorithm is initialized with any strictly feasible point for the
problem (QP).

11.3.2 Finding the next iterate


The algorithm uses an interior point method to attempt to minimize the potential
function

    m̂ log(n - x̄^T x̄) - Σ_{i=1}^{m̂} log s_i,

where s = b̂ - Âx̄. The point x̄^* is a global minimizer of this potential function if
and only if it is a feasible solution to (IPfeas).

At each iteration, a quadratic approximation to the potential function is constructed,


and a direction is obtained by finding the minimum of this quadratic approximation
over an inscribed sphere. The gradient of the potential function at the current
iterate x̄^k is given by

    h = -(2m̂/f_0) x̄^k + Â^T S^{-1} e,                                         (11.8)

where f_0 := n - (x̄^k)^T x̄^k and S is the diagonal matrix with diagonal elements the
entries of the vector s. The Hessian of the potential function at the current point is

    H = -(2m̂/f_0) I - (4m̂/f_0^2) x̄^k (x̄^k)^T + Â^T S^{-2} Â.                  (11.9)

It should be noted that the Hessian is a dense matrix, due to the outer product of
the vector x̄^k with itself. The subproblem which is solved to find the direction Δx̄
is then

    min        (1/2) Δx̄^T H Δx̄ + h^T Δx̄
    subject to Δx̄^T Â^T S^{-2} Â Δx̄ ≤ r^2.

If 0 < r ≤ 1 then a feasible point Δx̄ in this subproblem leads to a feasible point
x̄^k + Δx̄ in the problem (QP). The solution to this subproblem depends on the
eigenvalues of the Hessian matrix H in the norm defined by the matrix Â^T S^{-2} Â;
for details, see [43, 40]. This subproblem can be solved in polynomial time. For
methods for solving it, see [43, 40] or Ye [79]. Kamath et al. originally proposed
taking a step of a fixed length in the direction Δx̄; it was subsequently pointed out
by Shi and Vannelli [73] that the algorithm can be considerably enhanced by using
a line search to determine a step length.
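The derivatives (11.8) and (11.9) can be evaluated directly; the following dense sketch assumes the stacked data Â and b̂ are available as arrays and is intended only to make the formulas concrete.

import numpy as np

def potential_derivatives(A_hat, b_hat, x, n):
    m_hat = A_hat.shape[0]
    s = b_hat - A_hat @ x               # slacks, assumed strictly positive
    f0 = n - x @ x                      # f_0 = n - x^T x, positive at interior points
    # Gradient (11.8): -(2 m_hat / f0) x + A_hat^T S^{-1} e.
    h = -(2.0 * m_hat / f0) * x + A_hat.T @ (1.0 / s)
    # Hessian (11.9); dense because of the outer product x x^T.
    H = (-(2.0 * m_hat / f0) * np.eye(len(x))
         - (4.0 * m_hat / f0 ** 2) * np.outer(x, x)
         + A_hat.T @ np.diag(1.0 / s ** 2) @ A_hat)
    return h, H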

Van Benthem et al. [7, 75] developed a variant of this algorithm to solve the radio
link frequency assignment problem. Their refinements to the original algorithm
included a method to deal with equality constraints, and the use of a barrier method
rather than a potential function method so that the Hessian matrix retains the
sparsity structure of AS2 AT. The structure of their problem is such that all the
slack variables in the original problem (I P f eas) must also be binary. They used
this observation to develop a quadratic objective function which enabled them to
eliminate the inequality constraints.

11.3.3 Obtaining an integer point


At each iteration k, a new strictly feasible point x̄^k for (QP) is obtained. This point
is then rounded to obtain an integral point x̃^k. The simplest method to obtain x̃^k
is to set

    x̃^k_i = 1    if x̄^k_i ≥ 0,
    x̃^k_i = -1   if x̄^k_i < 0.

Other rounding schemes can be used. For example, we can choose x̃^k_i by examining
whether x̄^k_i is increasing or decreasing. For some problems, the structure of the
problem suggests a natural rounding scheme; for example, Van Benthem et al. [7]
have suggested several rounding schemes for the radio link frequency assignment
problem. If the rounded point x̃^k is feasible in (QP) then we can terminate the
algorithm with success: x̃^k leads to a feasible point in the original problem (IPfeas).
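A sketch of this simplest rounding scheme together with the feasibility test; the names are illustrative.

import numpy as np

def round_and_check(x, A_hat, b_hat):
    # Round each component to the nearest point of {-1, +1}.
    x_rounded = np.where(x >= 0.0, 1.0, -1.0)
    # The rounded point is feasible in (QP), and hence yields a solution of (IPfeas),
    # exactly when it satisfies all the inequalities A_hat x <= b_hat.
    feasible = bool(np.all(A_hat @ x_rounded <= b_hat + 1e-9))
    return x_rounded, feasible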

11.3.4 Avoiding local minimizers


The nonconvex quadratic problem may have local minimizers which are fractional
points and therefore not global minimizers. If this happens, it may be that the
rounded solution x̃^k remains infeasible. In this situation, it is possible to add a
constraint

    (x̃^k)^T x̄ ≤ n - 2.                                                        (11.10)

Notice that every {-1, 1} point satisfies this constraint except x̃^k. After adding
this constraint, the algorithm is restarted from a strictly feasible point. It is best
to restart from scratch because the objective function is nonconvex, so we want to
generate a sequence of iterates that does not lead to the local minimizer. There is
no guarantee that equation (11.10) will cut off the local minimizer, but Karmarkar
claims that the addition of this constraint is usually sufficient to push the sequence
of iterates in a different direction, so that the algorithm terminates at a different
point.

11.3.5 Computational experience


The algorithm was used to solve set covering problems [43]. It was also used to
solve satisfiability problems from inductive inference [40], obtaining promising results
compared to an implementation of the Davis-Putnam procedure [13] (an implicit
enumeration procedure which is similar to branch and bound). Shi and Vannelli [73]
improved on these results by incorporating a line search.

Van Benthem et al. [7, 75] solved radio link frequency assignment problems using
a potential reduction method. By cleverly exploiting the structure of their model,
they were able to develop a variant of the algorithm which solves problems with
several thousand variables and constraints.

The algorithm can be used to solve optimization problems by incorporating con-


straints of the form c^T x ≤ K and resolving with different values of K. It should
be noted that this method is essentially a heuristic, in that it can not determine
that an infeasible instance is really infeasible. Thus, the algorithm does not lead
to a guarantee that the optimal solution has been found when solving optimization
problems.

11.3.6 Quadratic integer programming problems


Kamath et al. [37, 38] also investigated using a potential function interior point
method to solve quadratic integer programming problems of the form

    max        x^T Q x
    subject to x_i ∈ {-1, 1},   i = 1, ..., n,

where Q is a symmetric n x n matrix and x is an n-vector. This problem is NP-


Hard if Q has at least one positive eigenvalue [63]. The graph partitioning problem
can be modeled in this manner [37]. They were able to use their method to obtain

upper bounds on the optimal value of the quadratic integer programming problem
in polynomial time [38].

This approach encloses the feasible region in an ellipsoid, finds the maximum value
of the objective function in the ellipsoid, and then modifies the ellipsoid appropri-
ately. The maximum value of the objective function over the ellipsoid is the largest
eigenvalue of an appropriate matrix. By modifying the ellipsoid appropriately, it is
possible to obtain a reasonable upper bound on the optimal value of the quadratic
problem.

11.4 SOLVING NETWORK FLOW PROBLEMS

11.4.1 Introduction
Network flow problems arise in the shipment of commodities (for example, oil, tele-
phone calls, or cars) over a network from sources to destinations. For example, oil
is shipped from oil fields to refineries to its eventual destination, telephone calls are
routed from the caller to the destination over the telephone company's network, and
cars are routed through a city between residences, offices, and commercial buildings.
Many of these network flow problems can be modeled as linear programming prob-
lems with a constraint matrix with a special structure. Historically, these problems
have been solved by using specialized versions of the simplex algorithm designed to
exploit the structure of the constraint matrix. Recently, several researchers have
experimented with using an interior point algorithm to solve these linear program-
ming problems, with results that are very competitive with the network simplex
method. We describe some of the computational results in section 11.4.3. As is to
be expected, the interior point method has to be modified in order to exploit the
structure of the constraint matrix fully. The principal modification is in the use of a
preconditioned conjugate gradient method to calculate the necessary projections at
each iteration. We describe this and other modifications in section 11.4.2.

An overview of work in this area is contained in Resende and Pardalos [69]. An


implementation of the dual affine method is described in Resende and Veiga [70,71].
Other interior point algorithms are investigated in Portugal et al. [65, 66]. For an
introduction to network flow problems and applications, see the book by Ahuja et
al. [1].

We now describe the minimum cost network flow problem. Given a directed graph
G = (V, E) with m vertices V and n arcs E, the arc from vertex i to vertex j is
denoted by (i,j). Flow moves around the network along the directed arcs. If more
flow is produced at a node i than is consumed at that node, then the node is called
a source node. If more flow is consumed at a node i than is produced at that node,
then the node is called a sink node. Any node which is neither a source node nor a
sink node is called a transshipment node. Let bi denote the net required flow out of
node i; if b_i > 0 then node i is a source, if b_i < 0 then node i is a sink, and if b_i = 0
then node i is a transshipment node. For a feasible flow to exist, it is necessary that
Σ_{i∈V} b_i = 0. The flow must satisfy Kirchhoff's Law of flow conservation: the total
flow out of node i must equal the sum of bi and the total flow into node i for each
node i. There is a cost Cij for each unit of flow shipped along arc (i,j). We assume
without loss of generality that the lower bound on each arc is zero (see [1]), and
we denote the upper bound on arc (i, j) by u_ij. The minimum cost network flow
problem is then to meet the demands at the nodes at minimum cost while satisfying
both Kirchhoff's Law and the bounds on the edge capacities. This can be expressed
as the following linear programming problem:

    min        Σ_{(i,j)∈E} c_ij x_ij                                          (11.11)
    subject to Σ_{(i,j)∈E} x_ij - Σ_{(j,i)∈E} x_ji = b_i    for all i ∈ V      (11.12)
               0 ≤ x_ij ≤ u_ij                              for all (i,j) ∈ E  (11.13)

where x_ij denotes the flow on arc (i,j). Usually, the problem data is integer, in
which case one of the optimal solutions to this linear program will be integer.

We let A denote the node-arc incidence matrix of the graph. Each column of A
corresponds to an arc (i,j) and has an entry "1" in row i and an entry "-1" in row j,
with all the remaining entries being zero. Notice that the constraint (11.12) can
be written Ax = b. The rank of the matrix A is equal to the difference between
the number of vertices and the number of connected components of the graph. One
redundant row can be eliminated for each connected component. For simplicity of
notation we retain the redundant rows, but it should be understood that these rows
have been eliminated.
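A small sketch of the node-arc incidence matrix just described, for a graph whose arcs are given as pairs of vertex indices.

import numpy as np

def incidence_matrix(num_vertices, arcs):
    # Column k corresponds to arc (i, j): +1 in row i, -1 in row j, zeros elsewhere.
    A = np.zeros((num_vertices, len(arcs)))
    for k, (i, j) in enumerate(arcs):
        A[i, k] = 1.0
        A[j, k] = -1.0
    return A

# With this matrix, the flow conservation constraints (11.12) read A x = b.
A = incidence_matrix(4, [(0, 1), (1, 2), (2, 3), (0, 3)])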

Many combinatorial optimization problems can be formulated as minimum cost net-


work flow problems. Examples include the assignment problem, the transportation
problem, the shortest path problem, and the maximum flow problem. For more
details, see [1]. The multicommodity network flow problem has more than one com-
modity moving through the network. See section 11.5 for a discussion of interior point

1. Given: Constraint matrix A, diagonal matrix D, preconditioner M, vector w,
tolerance ε; want to calculate an approximate solution v to equation (11.14).

2. Initialize: Set v = 0, r_0 = w, z_0 = M^{-1} r_0, p_0 = z_0, k = 0.

3. Main loop: While the stopping criterion is not satisfied, repeat the following
steps:

(a) Calculate q_k = A D A^T p_k.
(b) Calculate α_k = z_k^T r_k / p_k^T q_k.
(c) Calculate v_{k+1} = v_k + α_k p_k.
(d) Calculate r_{k+1} = r_k - α_k q_k.
(e) Find z_{k+1} by solving M z_{k+1} = r_{k+1}.
(f) Calculate β_k = z_{k+1}^T r_{k+1} / z_k^T r_k.
(g) Calculate p_{k+1} = z_{k+1} + β_k p_k.
(h) Increase the iteration counter k by one.

4. Stop: Final solution is v = v_k.

Figure 11.7 The preconditioned conjugate gradient algorithm

multicommodity network flow algorithms. For background on the multicommodity


network flow problem, see, for example, the books by Ahuja [1] and Minoux [53].

11.4.2 Components of interior point network flow methods

Calculating the projections by using a preconditioned conjugate gradient method
In any implementation of an interior point method, it is necessary to find a direction
at each iteration by solving a system of equations

    A D A^T v = w                                                             (11.14)

where A is the m × n constraint matrix, D is a diagonal n × n matrix, v is an


unknown m-vector, and w is a known m-vector. This is usually done by calculating

a factorization of the matrix ADA^T. The matrix D and the vector w change from
iteration to iteration; it is necessary to solve this system for more than one vector
w at each iteration of some algorithms. Resende and Veiga showed that superior
performance can be obtained on network flow problems if the system (11.14) is
solved using a preconditioned conjugate gradient method.

A preconditioned conjugate gradient algorithm for solving (11.14) is given in fig-


ure 11.7. The preconditioner is denoted by M. The matrix M is a positive definite
matrix and it is chosen so that the matrix M^{-1}(ADA^T) is less ill-conditioned than
the original matrix ADA^T, and this should then improve the convergence of the
conjugate gradient algorithm. Notice that Step 3e of the preconditioned conjugate
gradient algorithm requires the solution of a system of equations involving M. The
loop in the algorithm will probably be executed at least five to ten times for each
calculation of a projection; thus, it is essential that it be considerably easier to solve
a system of equations involving M than one involving ADA^T.
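The steps of figure 11.7 translate almost directly into code. The sketch below takes the preconditioner as a callable that performs the solve in Step 3e, forms A D A^T p without ever building A D A^T explicitly, and uses a simple relative-residual stopping test in place of the cosine test discussed below; all names are illustrative.

import numpy as np

def pcg(A, d, w, m_solve, tol=1e-8, max_iter=200):
    # Approximately solve (A D A^T) v = w, where D = diag(d).
    v = np.zeros_like(w)
    r = w.copy()                        # r_0 = w since v_0 = 0
    z = m_solve(r)                      # Step 3e: z_0 = M^{-1} r_0
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        q = A @ (d * (A.T @ p))         # q_k = A D A^T p_k
        alpha = rz / (p @ q)
        v = v + alpha * p
        r = r - alpha * q
        if np.linalg.norm(r) <= tol * np.linalg.norm(w):
            break
        z = m_solve(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p       # beta_k = z_{k+1}^T r_{k+1} / z_k^T r_k
        rz = rz_new
    return v

# Diagonal preconditioner: M = diag(A D A^T), so the solve in Step 3e is a division.
# m_diag = np.einsum('ij,j,ij->i', A, d, A)
# v = pcg(A, d, w, lambda r: r / m_diag)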

The structure of the network flow problem makes it possible to choose a good pre-
conditioner M. The simplest preconditioner is to take M to be the diagonal of the
matrix ADA^T. This is simple to compute, it makes the calculation of z_{k+1} trivial,
and it can be effective. A more sophisticated preconditioner that exploits the nature
of the problem is the maximum weighted spanning tree (MST) preconditioner. The
edges of the graph are weighted by the corresponding elements of the diagonal ma-
trix D, and a maximum weight spanning tree is then found using either Kruskal's
algorithm or Prim's algorithm. (For descriptions of these algorithms for finding a
maximum weight spanning forest, see [1].) Let S denote the columns of A corre-
sponding to the edges in the maximum weight forest. The MST preconditioner is
then

    M = S D̄ S^T,                                                              (11.15)

where D̄ is a diagonal matrix containing the entries of D for the edges in the max-
imum weight spanning forest. The preconditioned residue system solved in Step 3e
can be solved in time proportional to the number of vertices because the coefficient
matrix S can be permuted into block triangular form.
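A sketch of the maximum weight spanning forest computation underlying the MST preconditioner, using Kruskal's algorithm with a simple union-find structure; the arc indices it returns identify the columns S and the diagonal D̄ used in (11.15).

def max_weight_spanning_forest(num_vertices, arcs, weights):
    # Kruskal's algorithm on the undirected support of the network; the weight of
    # arc k is the corresponding diagonal entry of D at the current iteration.
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    forest = []
    for k in sorted(range(len(arcs)), key=lambda a: -weights[a]):
        i, j = arcs[k]
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            forest.append(k)                # arc k becomes a column of S
    return forest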

The diagonal preconditioner appears to be better than the MST preconditioner in


the early iterations of the interior point algorithm, in that it requires fewer steps of
the preconditioned conjugate gradient algorithm to obtain a direction of sufficient
accuracy. The situation reverses in later iterations. Initially, the MST preconditioner
is a poor approximation to the matrix ADA^T because it puts too much emphasis
on a few edges when it is not really possible to decide which edges are important.
Eventually, the MST preconditioner becomes a better approximation to the matrix
ADA^T, because it is possible to pick the right subset of the edges. Thus, Resende
and Veiga [70, 71] use the diagonal preconditioner initially and switch to the MST

preconditioner once the performance of the diagonal preconditioner falls off in their
dual affine algorithm.

Portugal et al. [65, 66] have proposed a preconditioner based on an incomplete QR


factorization of the matrix D^{1/2} A^T. This preconditioner appears to behave like the
diagonal preconditioner in the early iterations, like the MST preconditioner in the
later iterations, and to perform better than either of the other two preconditioners
in the intermediate iterations. They have used this preconditioner in a primal-dual
interior point algorithm for network flow problems. A preconditioner proposed by
Karmarkar and Ramakrishnan [42] is based on selectively zeroing out elements of DA
and also of the resulting modified product ADA^T, and then using the incomplete
Cholesky factors of the approximation to this matrix as the preconditioner. This
preconditioner also performs similarly to the diagonal preconditioner in the early
iterations and similarly to the MST preconditioner in the later iterations.

We now discuss the stopping criterion used within the preconditioned conjugate gra-
dient algorithm. Recall that we want to solve equation (11.14) and that we use the
vectors v_k as successive approximations to v. The check used in the papers discussed
in this section examines the vector ADA^T v_k: if the angle θ between this vector and
the right hand side vector w is close to zero, then we have solved equation (11.14)
approximately. Resende and Veiga use the criterion that the preconditioned con-
jugate gradient algorithm can be halted if |1 - cos θ| < ε_cos, where ε_cos is 10^{-3}
in early iterations of the interior point algorithm and is gradually decreased. The
calculation of cos θ requires about as much work as one conjugate gradient iteration,
so it is only calculated every fifth iteration by Resende and Veiga. Additionally, the
conjugate gradient method is halted if the size of the residual r_k becomes very small.

Recovering the Optimal Flow


Since the node arc incidence matrix is totally unimodular, every basic feasible solu-
tion to the network flow problem is integral provided b is integral, so every iterate
generated by the simplex algorithm corresponds to an integral flow. The basic feasi-
ble solutions correspond to forests in the graph, with nonzero flow only on the edges
in the forest. An interior point method usually converges to a point in the relative
interior of the face of optimal solutions, so, if the optimal solution is not unique,
an interior point method will not return an integral solution. We discuss methods
used to obtain an integral optimal solution from the iterates generated by an interior
point algorithm.

The maximum weight spanning tree found in the preconditioned conjugate gradient routine
can be used to guess an optimal solution: if the basic solution corresponding to this
forest is feasible and the corresponding dual solution is also feasible then this solution
is optimal. This works well if the solution is unique, but unfortunately it usually
does not work well in the presence of multiple optimal solutions. If the primal basic
solution is not feasible, then the current dual iterate is projected to give a point ȳ
which is complementary to the primal basic solution. The edges for which the dual
slack has small magnitude for this dual vector ȳ are then used to define a subgraph
of the original graph. The edges in this subgraph are a superset of the edges in the
forest. Each edge not in this subgraph is assigned flow either 0 or equal
to its upper bound. Resende and Veiga then attempt to find a feasible flow in
the original graph by only adjusting flow on the edges in the subgraph. This can be
done by solving a maximum flow problem and is guaranteed to give an integral flow
if one exists.

As the interior point iterates converge towards optimality, this procedure will eventually
give an integral optimal flow, provided the flows on the nonbasic edges are set
correctly to 0 or their upper bound. Resende and Veiga examine the dual variable
s_i corresponding to the nonbasic variable x_i and the dual variable z_i corresponding
to the upper bound constraint on this variable x_i. If s_i > z_i then variable x_i is set
to zero; otherwise it is set equal to its upper bound. As the interior point method
converges to optimality, this setting will eventually be optimal, and so the procedure
outlined above will give an optimal integral solution to the network flow problem.
The basis identification method of Megiddo [51] can be used to determine an optimal
integral basic feasible solution once the interior point method is close enough.
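The comparison rule can be sketched in a few lines; the names and the assumption that the nonbasic arc indices and the dual quantities are available from the interior point iterate are ours.

import numpy as np

def round_nonbasic_flows(x, s, z, upper, nonbasic):
    # Set each nonbasic arc flow to 0 if its dual variable s_i exceeds z_i,
    # and to its upper bound otherwise; the flows on the remaining arcs are
    # then repaired by the maximum flow computation described above.
    x = np.array(x, dtype=float)
    for i in nonbasic:
        x[i] = 0.0 if s[i] > z[i] else upper[i]
    return x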

11.4.3 Comparison with Network Simplex


Resende and Veiga [70,71] have compared their code with version 3.0 of CPLEX Ne-
topt [12]. They generated problems of seven different structures and of varying sizes
for each structure. Two problem classes were generated using NETGEN [45], and
the other problems were generated using various generators contributed to the First
DIMACS Algorithm Implementation Challenge [14] (these generators are available
from DIMACS at Rutgers University, at FTP site: dimacs.rutgers.edu). Both
CPLEX Netopt and the code of Resende and Veiga were able to solve all of the
generated problems, providing integer flows as output. In all but two classes, the
interior point code was faster than CPLEX Netopt on the largest problems. On
one of the remaining classes, the difference between the interior point code and the
simplex code was decreasing as the problem size increased.

Thus, this work shows that interior point methods can outperform the simplex algo-
rithm even in problem classes which lend themselves to sophisticated implementa-
tions of simplex. For an interior point method to be successful, it is necessary to use
a preconditioned conjugate gradient method to calculate the projections, and to use
various other techniques outlined here and discussed in more detail in [65, 66, 70, 71].

Many of the computational runs of these authors took several hours, and some of
the runs with CPLEX Netopt took longer than a day. They used a number of
workstations (each solving a separate problem) to obtain their results, and they
were able to solve problems which are considered very large. It is on these large
problems that the advantages of interior point methods become clear.

11.5 THE MULTICOMMODITY NETWORK FLOW PROBLEM

In this section, we describe two interior point approaches to multicommodity network
flow problems. The nonlinear multicommodity network flow problem with separable
increasing convex costs can be modelled as a nonlinear programming problem with
linear constraints. The problems of interest generally create very large nonlinear
programs. They arise in, for example, the areas of telecommunication, transportation,
computer networks, and multi-item production planning. For more discussion
of the multicommodity network flow problem, see the books by Ahuja et al. [1] and
Minoux [53]. (For a description of single commodity linear network flow problems, see
section 11.4.)

11.5.1 A Column Generation Algorithm for the Multicommodity Network Flow Problem

Goffin et al. [21] have described an interior point algorithm for solving nonlinear
multicommodity network flow problems that has similarities to the Dantzig-Wolfe
algorithm. Their algorithm is a column generation method, with new columns added
either one at a time or in bunches. It approximately solves the nonlinear program
that arises at each stage by using a projective method, specifically the de Ghellinck
and Vial [19] variant of Karmarkar's algorithm [41]. The column generation subproblem
is formulated as a shortest path problem and is solved using an implementation
of Dijkstra's algorithm.
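For the purely linear case, the flavour of this subproblem can be sketched as follows. The routine below is an illustrative stand-in (names and data layout are assumptions), computing for a single commodity a shortest path in which each arc cost is the commodity cost plus the price of the coupling constraint from the Lagrangian relaxation described below.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def commodity_shortest_path(n_nodes, arcs, c_i, u, source, sink):
    # Shortest path oracle for one commodity: each arc e = (tail, head) gets
    # cost c_e^i + u_e (both nonnegative here).  Assumes at most one arc per
    # ordered node pair, since csr_matrix sums duplicate entries.
    tails = [a[0] for a in arcs]
    heads = [a[1] for a in arcs]
    graph = csr_matrix((np.asarray(c_i, float) + np.asarray(u, float),
                        (tails, heads)), shape=(n_nodes, n_nodes))
    dist = dijkstra(graph, directed=True, indices=source)
    return dist[sink]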

Goffin et al. [25, 22] have previously described column generation interior point algo-
rithms designed to solve nonsmooth optimization problems. The research described
in this section is a continuation and extension of the work described in their earlier
papers.

We are given a graph G = (V, E) and a set of commodities I. We denote the node-arc
incidence matrix by A. For each commodity, there are source nodes where flow is
produced, sink nodes where flow is consumed, and transshipment nodes, where the
flow is in balance. The required net flow out of node v of commodity i is represented
by d_v^i. Goffin et al. [21] restrict themselves to the case where each commodity has
exactly one source node and one sink node. The capacity y_e of each arc e can be
selected, with an associated convex cost f_e(y_e); the upper bound on the capacity is
denoted by ȳ_e. Associated with each commodity i and each arc e is a linear cost c_e^i
for each unit of commodity i shipped along arc e. The multicommodity flow problem
can then be formulated as

    min   Σ_{i∈I} Σ_{e∈E} c_e^i x_e^i + Σ_{e∈E} f_e(y_e)      (11.16)

    subject to   Σ_{i∈I} x_e^i ≤ y_e     ∀e ∈ E              (11.17)
                 A x^i = d^i             ∀i ∈ I              (11.18)
                 x_e^i ≥ 0               ∀i ∈ I, e ∈ E        (11.19)
                 0 ≤ y_e ≤ ȳ_e           ∀e ∈ E.             (11.20)

Here, x_e^i represents the flow of commodity i on arc e and y_e represents the total
flow on arc e. We assume that the cost function f_e(y_e) is strictly increasing and
convex and that the costs c_e^i are nonnegative. The standard linear multicommodity
flow problem corresponds to f_e ≡ 0 for every arc e. Equation (11.17) is called the
coupling constraint and equation (11.18) is the flow conservation constraint. Without
equation (11.17), the problem would be separable. This equation is dualized in the
Lagrangian relaxation developed for this problem. The Lagrangian multipliers for
these constraints are nonnegative because of the structure of the objective function;
with the use of an interior point cutting plane algorithm, the multipliers are actually
always positive.

Dualizing the coupling constraints (11.17) gives the Lagrangian

    L(x, y; u) := Σ_{i∈I} Σ_{e∈E} c_e^i x_e^i + Σ_{e∈E} f_e(y_e) + Σ_{e∈E} u_e ( −y_e + Σ_{i∈I} x_e^i )      (11.21)

where u is the vector of Lagrange multipliers for the coupling constraints. Since the
multicommodity flow problem is convex, it can be solved by solving the Lagrangian
dual problem

    max   LD(u)
    subject to   u ≥ 0,

where the Lagrangian dual function LD(u) is given by

    LD(u) := min { L(x, y; u) | A x^i = d^i ∀i ∈ I,  x ≥ 0,  0 ≤ y_e ≤ ȳ_e ∀e ∈ E }.      (11.22)

The Lagrangian dual function LD(u) is a nonsmooth concave function. The Lagrangian
dual problem can be solved by obtaining a polyhedral approximation to
the dual function using supergradients ξ. If LD(u) is differentiable at the point u then
the only supergradient at that point is the gradient itself. In general, a supergradient
ξ at u satisfies

    LD(ū) ≤ LD(u) + ξ^T (ū − u)      (11.23)

for all ū ≥ 0. Given points u^k ≥ 0 and associated supergradients ξ^k for k = 1, ..., κ,
the optimal value of the linear programming problem

    max   z
    subject to   z − (ξ^k)^T u ≤ LD(u^k) − (ξ^k)^T u^k    for k = 1, ..., κ

provides an upper bound θ_sup on the optimal value of the Lagrangian dual. It can
be shown that if κ is large enough, then the solution to this linear program will solve
the Lagrangian dual. The maximum of LD(u^k) for k = 1, ..., κ provides a lower
bound θ_inf on the optimal value of the dual, and any optimal solution lies in the
localization set

    LOC := { (u, z) | u ≥ 0,  z ≥ θ_inf,  z − (ξ^k)^T u ≤ LD(u^k) − (ξ^k)^T u^k,  k = 1, ..., κ }.

At each stage, the algorithm generates a point in the localization set. If this point is
feasible in the Lagrangian dual, then we can update the lower bound θ_inf. If the point
is not feasible, then we can generate a new supergradient ξ and add the corresponding
constraint to the localization set. In either case, the localization set is updated, so
we then find a new point in this set and repeat the process until the gap between
θ_inf and θ_sup is sufficiently small. We summarize this in the prototypical algorithm
given in figure 11.8, dropping the iteration counter k to simplify the notation.

Step 1 of this process is usually called the Master Problem. Classically, it has been
solved using the simplex algorithm, and then the whole process resembles Dantzig-
Wolfe decomposition. Goffin et al. use an interior point method to solve the Master
Problem. They apply the de Ghellinck and Vial variant [19] of Karmarkar's projective
algorithm [41] to the dual of the Master Problem to calculate the analytic center of
the localization set. The localization set is modified by the addition of constraints so
columns are added to the dual of this problem. An interior point is generated in the
dual by using the technique of Mitchell and Todd [59]. The method used by Goffin
et al. generates primal and dual iterates at each approximate solution to the master
problem, so an approximate solution to the Lagrangian dual can be converted to an
approximate solution to the multicommodity flow problem.

1. Select a point (z̄, ū) in the localization set LOC.

2. Compute LD(ū) and find a supergradient ξ of LD(ū) at ū.

3. Add the inequality
       z − ξ^T u ≤ LD(ū) − ξ^T ū
   to the definition of the localization set LOC. If LD(ū) > θ_inf, then update θ_inf
   to LD(ū).

4. Repeat the process until the termination criterion is satisfied.

Figure 11.8  Column generation algorithm for the multicommodity flow problem

Step 2 of the prototype algorithm is called the subproblem or oracle. There are
choices available in the solution of this problem for a multicommodity flow problem,
depending upon the level of disaggregation of the constraints. The constraints for
the subproblem are separable by commodity. It is then possible to generate one
supergradient for the whole problem, or to generate supergradients corresponding to each
commodity. Goffin et al. obtained better results by disaggregating the constraints
and generating separate supergradients for each commodity; this is in agreement with
other work in the literature which used different algorithms to solve the Master
Problem (see Jones et al. [34]).
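A bare-bones sketch of the prototype of Figure 11.8 is given below. It is only meant to show the mechanics of the loop: the master problem is solved here by simply maximizing z over the localization set with an LP solver, with an artificial box on u to keep that LP bounded, whereas Goffin et al. compute the analytic center of the localization set; the oracle is passed in as a function returning LD(u) and a supergradient, and all names are illustrative.

import numpy as np
from scipy.optimize import linprog

def cutting_plane_lagrangian_dual(oracle, n, u_max=1e4, max_iter=50, tol=1e-6):
    # oracle(u) must return (LD(u), xi) with xi a supergradient of LD at u.
    cuts_xi, cuts_rhs = [], []          # cuts: z - xi_k^T u <= LD(u_k) - xi_k^T u_k
    theta_inf = -np.inf                 # best lower bound found so far
    u = np.zeros(n)
    for _ in range(max_iter):
        val, xi = oracle(u)
        xi = np.asarray(xi, dtype=float)
        theta_inf = max(theta_inf, val)
        cuts_xi.append(xi)
        cuts_rhs.append(val - xi @ u)
        # master problem over variables (u, z): maximize z subject to the cuts,
        # u >= 0, and an artificial box u <= u_max so the LP stays bounded
        A_ub = np.hstack([-np.array(cuts_xi), np.ones((len(cuts_xi), 1))])
        c = np.zeros(n + 1); c[-1] = -1.0        # linprog minimizes, so minimize -z
        bounds = [(0.0, u_max)] * n + [(None, None)]
        res = linprog(c, A_ub=A_ub, b_ub=np.array(cuts_rhs), bounds=bounds,
                      method="highs")
        if not res.success:
            break
        u, theta_sup = res.x[:n], res.x[-1]
        if theta_sup - theta_inf < tol:          # duality gap small enough
            break
    return u, theta_inf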

Goffin et al. give computational results for random problems with up to 500 nodes,
1000 arcs, and 4000 commodities, and for some smaller problems from the literature.
(In their formulation, the largest problems could have up to 8 × 10^6 primal
variables x_e^i.) They compared their algorithm with an implementation of Dantzig-
Wolfe decomposition, and the interior point algorithm was clearly superior for the
problems discussed.

11.5.2 Other Interior Point Methods to Solve the Multicommodity Network Flow Problem

Kamath et al. [36, 39] have described several interior point methods for the multicommodity
flow problem. One approach solves the problem as a linear programming
problem using a dual projective interior point method. They obtained computational
results comparable with CPLEX Netopt [12]. A second approach places the network
flow constraints in a convex quadratic objective function and solves a minimization
problem with this objective subject to the capacity constraints. This algorithm has
good theoretical complexity for approximately solving the multicommodity network
flow problem.

11.6 COMPUTATIONAL COMPLEXITY RESULTS

11.6.1 Theoretical Behaviour of Cutting Plane Algorithms

It is usually straightforward to show that an interior point cutting plane (or column
generation) algorithm runs in time polynomial in the total number of constraints (or
columns) generated during the algorithm - see, for example, Mitchell [56] or den
Hertog et al. [31, 32]. A harder problem is to show that such an algorithm runs in
time polynomial in the size of the original description of the problem.

Given an integer programming problem, a separation routine either confirms that a
point is in the convex hull of the set of feasible integer points, or it provides a cutting
plane which separates the point from the convex hull. If the separation routine runs
in time polynomial in the size of the problem, then the ellipsoid algorithm can be
used to solve the integer programming problem in polynomial time - see Grotschel
et al. [29]. It is not necessary to drop any constraints when using this method. For
the rest of this subsection, we assume that the separation routines require polynomial
time.

To date, the only interior point algorithm which solves the integer program in polyno-
mial time and which does not drop constraints is due to Vaidya [74]. This algorithm
uses the volumetric center, so its analysis differs from that of more standard interior
point methods. Vaidya's analysis of his algorithm shows that only a polynomial
number of constraints are generated, even though an infinite number of possible
constraints exists. This is a crucial point in proving the polynomial complexity of
his algorithm, and indeed of any cutting plane or column generation algorithm. For
an alternative analysis of this algorithm, see Anstreicher [4]. Anstreicher was able
to greatly reduce the constants involved in the complexity analysis of Vaidya's algo-
rithm, making the algorithm considerably more attractive for implementation. For
example, Anstreicher reduced the number of Newton steps by a factor of 1.8 million
and he reduced the maximum number of constraints used by a factor of 10^4. Vaidya's
algorithm is a short step algorithm, in the sense that the reduction in the duality
gap at an iteration is dependent on the dimension of the problem. Ramaswamy
and Mitchell [68] have developed a long step variant of Vaidya's algorithm that has
polynomial convergence. Their algorithm reduces the duality gap by a fixed ratio at
any iteration where it is not necessary to add or drop constraints.

Atkinson and Vaidya [6] developed a polynomial time cutting plane algorithm which
used the analytic center. This algorithm drops constraints that become unimportant,
and this is essential in their complexity analysis. Previous algorithms were often
shown to be polynomial in the number of additional constraints, but without a
proof that the number of added constraints is polynomial. Atkinson and Vaidya's
algorithm finds a feasible point for a set of convex inequalities by finding an analytic
center for a subset of the inequalities and using an oracle to test whether that point
satisfies all the inequalities. If the oracle returns a violated inequality, a shifted
linear constraint is added so that the current analytic center remains feasible and close to
the new analytic center.

Mitchell and Ramaswamy [58] developed a barrier function cutting plane algorithm
using some of the ideas from [6]. This algorithm is a long step algorithm, unlike
the algorithm in [6]: if it is not necessary to add or drop constraints, then they
reduce the duality gap by a constant fraction. They showed some links between
the notion of a point being centered (see, for example, Roos and Vial [72]) and the
criteria for a constraint to be added or dropped in [6]. Barrier function methods for
linear programming have shown excellent computational performance and they can
be constructed to have superlinear and quadratic convergence. It would thus appear
desirable to employ these methods in a column generation algorithm.

Goffin et al. [23, 24] presented a pseudopolynomial column generation algorithm
which does not need to drop any columns. The number of iterations required to get
the objective function value to within ε of optimality is polynomial in 1/ε, but this
algorithm does not obtain a solution within 2^{-L} of optimality in time polynomial
in L, where L is the size of the data.

There have been several papers recently analyzing algorithms that add many cuts at
once (see, for example, Luo [48], Ramaswamy and Mitchell [67], and Ye [80]). These
papers generally show that the complexity of an algorithm is not harmed if many
cuts are added at once, although there do have to be some bounds on the number
of constraints added simultaneously.

The earlier theoretical papers on interior point cutting plane algorithms generally
added the constraints far from the current center, so that the center of the new
system is close to the center of the old system. The paper by Goffin et al. [24]
shows that it is possible to add a cutting plane right through the current analytic
center without changing the complexity of their algorithm [23]. Ye [80] extended
this analysis to the case where multiple cuts are placed right through the analytic
center. Ramaswamy and Mitchell [67] describe an algorithm which adds multiple
cuts through the analytic center, and they show that the new analytic center can be
regained in O(√p log(p)) iterations, where p is the number of added cuts.

11.6.2 Improved Complexity Results for Selected Combinatorial Optimization Problems

There has been some research on using interior point methods within algorithms to
solve some combinatorial optimization problems that can be solved in polynomial
time. This has led to improved complexity results for some problems.

The research on interior point methods for positive semi-definite programming has
led to improved algorithms for various problems in combinatorial optimization. For
example, see the chapter in this book by Pardalos and Ramana or the papers by
Goemans et al. [18, 20], Alizadeh [2] or chapter 9 of the book [29].

Bertsimas and Orlin [8] use the interior point algorithm for convex programming
given by Vaidya [74] to obtain algorithms with superior theoretical complexity for
several combinatorial optimization problems, principally by giving a new method
for solving the Lagrangean dual of a problem. This leads to improved complexity
for lower bounding procedures for the traveling salesman problem (particularly, the
Held and Karp method), the Steiner tree problem, the 2-connected problem, vehicle
routing problems, multicommodity flow problems, facility location problems, and
others.

Xue and Ye [78] have described an interior point algorithm for solving the problem of
minimizing a sum of Euclidean norms. This algorithm can be used to solve problems
related to Steiner trees with better theoretical complexity than the previously best
known algorithm.

11.7 CONCLUSIONS
We have discussed the ways in which interior point methods have been used to
solve combinatorial problems. The methods discussed include algorithms where the
simplex method has been replaced by an interior point method as well as a new
method which appears unrelated to previous simplex-based algorithms.

We have discussed incorporating interior point methods into cutting plane and
branch and bound algorithms for integer programming in section 11.2. In order
to do this successfully, it is necessary to be able to use a warm start somewhat
efficiently. The effective use of a warm start in an interior point method is an active
area of research; if a warm start could be exploited successfully by an interior point
method then the performance of interior point cutting plane and branch and bound
algorithms would be considerably enhanced. In the research to date, the most im-
portant technique appears to be early termination: the current relaxation is only
solved to within some tolerance of optimality before we attempt to refine the relax-
ation. Currently, interior point cutting plane methods do appear to be somewhat
competitive with simplex cutting plane algorithms, at least for some problems. In-
terior point branch and bound algorithms still appear weaker than simplex based
algorithms, at least for the size of problems which can currently be solved. For linear
programming, interior point methods start to outperform simplex for large problems,
so a branch and bound interior point method would only be advantageous for large
problems (thousands of variables and constraints). Pure integer problems of this size
are currently generally intractable. Thus, interior point branch and bound methods
are currently only useful for problems with a small number of integer variables, but a
large number of continuous variables. As hardware improves, it will become possible
to solve larger problems, and interior point branch and bound methods will become
more attractive. Additionally, if a warm start could be exploited more efficiently
then an interior point method would become attractive even for smaller problems.

We described a potential reduction algorithm that transforms an integer programming
problem into an equivalent quadratic program in section 11.3. This algorithm
appears to have reasonable computational performance, and it could solve large
problems that were previously unsolved.

We described the use of interior point methods to solve network flow problems in
section 11.4. These problems can be solved by solving a single linear program. The
computational results with an interior point method are better than those with a
specialized simplex method for large problems in several classes.

Research on the multicommodity network flow problem was discussed in section 11.5.
A column generation algorithm which appears to outperform classical Dantzig-Wolfe
decomposition on these problems was described.

With all of these methods, the relative performance of the interior point method to
other methods improves as the problem size increases. This is typical of computa-
tional results with interior point methods for linear programming and other prob-
lems. Interior point methods will probably not be the method of choice for small
or medium sized problems, but they may become the preferred method for larger
problems once computational hardware improves sufficiently to make it possible to
routinely solve problems which are currently impracticably large. The increasing use
of parallel computers and networks of workstations is leading to the solution of ever
larger problems. Of course, improvements in simplex may keep it the method of
choice even for large problems, but we expect that there will be at least some classes
of problems where an interior point method is superior for large instances. Research
on most of the algorithms discussed in this paper is ongoing, and the researchers
involved are attempting to solve larger problems, in an effort to determine the best
algorithm for large hard problems.

We discussed theoretical issues concerning cutting plane and column generation al-
gorithms in section 11.6.1. There are polynomial time interior point cutting plane
algorithms. However, to date there is no polynomial time interior point cutting
plane algorithm that is based upon the analytic center and which does not drop
constraints. Whether such an algorithm exists is an interesting open problem. The
discussion in section 11.6.2 of improved complexity results for various combinatorial
optimization problems is a starting point for what will probably be an active research
area in the next few years.

Acknowledgements
Research partially supported by ONR Grant number N00014-94-1-0391.

REFERENCES
[1] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows. Prentice Hall,
Englewood Cliffs, New Jersey, 1993.

[2] F. Alizadeh. Interior point methods in semidefinite programming with applica-


tions to combinatorial optimization. SIAM Journal on Optimization, 5(1):13-
51, 1995.

[3] K. M. Anstreicher. A combined phase I - phase II scaled potential algorithm


for linear programming. Mathematical Programming, 52:429-439, 1991.

[4] K. M. Anstreicher. On Vaidya's volumetric cutting plane method for convex


programming. Technical report, Department of Management Sciences, Univer-
sity of Iowa, Iowa City, Iowa 52242, September 1994.

[5] D. Applegate, R. Bixby, V. Chvatal, and W. Cook. Finding cuts in the TSP (a
preliminary report). Technical report, Mathematics, AT&T Bell Laboratories,
Murray Hill, NJ, 1994.

[6] D. S. Atkinson and P. M. Vaidya. A cutting plane algorithm for convex program-
ming that uses analytic centers. Mathematical Programming, 69:1-43, 1995.
[7] H. van Benthem, A. Hipolito, B. Jansen, C. Roos, T. Terlaky, and J. Warners.
Radio link frequency assignment project, Technical annex T-2.3.2: Potential
reduction methods. Technical report, Faculty of Technical Mathematics and
Informatics, Delft University of Technology, Delft, The Netherlands, 1995.
[8] D. Bertsimas and J. B. Orlin. A technique for speeding up the solution of the
Lagrangean dual. Mathematical Programming, 63:23-45, 1994.
[9] J. R. Birge, R. M. Freund, and R. J. Vanderbei. Prior reduced fill-in in solving
equations in interior point algorithms. Operations Research Leiters, 11:195-198,
1992.
[10] B. Borchers. Improved branch and bound algorithms for integer programming.
PhD thesis, Rensselaer Polytechnic Institute, Mathematical Sciences, Troy, NY,
1992.
[11] B. Borchers and J. E. Mitchell. Using an interior point method in a branch and
bound algorithm for integer programming. Technical Report 195, Mathemat-
ical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180, March 1991.
Revised July 7, 1992.
[12] CPLEX Optimization Inc. CPLEX Linear Optimizer and Mixed Integer Opti-
mizer. Suite 279, 930 Tahoe Blvd. Bldg 802, Incline Village, NV 89541.
[13] M. Davis and H. Putnam. A computing procedure for quantification theory. J.
Assoc. Comput. Mach., 7:201-215,1960.
[14] DIMACS. The first DIMACS international implementation challenge: The
benchmark experiments. Technical report, DIMACS, RUTCOR, Rutgers Uni-
versity, New Brunswick, NJ, 1991.
[15] J. Edmonds. Maximum matching and a polyhedron with 0, 1 vertices. Journal
of Research National Bureau of Standards, 69B:125-130, 1965.
[16] J. Edmonds. Paths, trees and flowers. Canadian Journal of Mathematics,
17:449-467,1965.
[17] A. S. EI-Bakry, R. A. Tapia, and Y. Zhang. A study of indicators for identifying
zero variables in interior-point methods. SIAM Review, 36:45-72, 1994.
[18] Uriel Feige and Michel X. Goemans. Approximating the value of two prover
proof systems, with applications to MAX 2SAT and MAX DICUT. In Pro-
ceedings of the Third Israel Symposium on Theory of Computing and Systems,
1995.

[19] G. de Ghellinck and J.-P. Vial. A polynomial Newton method for linear pro-
gramming. Algorithmica, 1:425-453, 1986.
[20] Michel X. Goemans and David P. Williamson. Improved Approximation Algo-
rithms for Maximum Cut and Satisfiability Problems Using Semidefinite Pro-
gramming. J. Assoc. Comput. Mach., 1994. (To appear). A preliminary version
appeared in Proc. 26th Annual ACM Symposium on Theory of Computing.
[21] J.-L. Goffin, J. Gondzio, R. Sarkissian, and J.-P. Vial. Solving nonlinear multi-
commodity network flow problems by the analytic center cutting plane method.
Technical report, GERAD, Faculty of Management, McGill University, Mon-
treal, Quebec, Canada H3A IG5, October 1994.
[22] J.-L. Goffin, A. Haurie, and J.-P. Vial. Decomposition and nondifferentiable
optimization with the projective algorithm. Management Science, 38:284-302,
1992.
[23] J.-L. Goffin, Z.-Q. Luo, and Y. Ye. On the complexity of a column generation
algorithm for convex or quasiconvex problems. In Large Scale Optimization:
The State of the Art. Kluwer Academic Publishers, 1993.

[24] J.-L. Goffin, Z.-Q. Luo, and Y. Ye. Complexity analysis of an interior cutting
plane method for convex feasibility problems. Technical report, Faculty of
Management, McGill University, Montreal, Quebec, Canada, June 1994.
[25] J.-L. Goffin and J.-P. Vial. Cutting planes and column generation techniques
with the projective algorithm. Journal of Optimization Theory and Applications,
65(3):409-429, 1990.
[26] R. E. Gomory. An algorithm for integer solutions to linear programs. In R. L.
Graves and P. Wolfe, editors, Recent Advances in Mathematical Programming,
pages 269-302. McGraw-Hill, New York, 1963.
[27] M. Grötschel and O. Holland. Solving matching problems with linear programming.
Mathematical Programming, 33:243-259, 1985.
[28] M. Grötschel, M. Jünger, and G. Reinelt. A cutting plane algorithm for the
linear ordering problem. Operations Research, 32:1195-1220, 1984.
[29] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial
Optimization. Springer-Verlag, Berlin, Germany, 1988.

[30] O. Giiler, D. den Hertog, C. Roos, T. Terlaky, and T. Tsuchiya. Degeneracy in


interior point methods for linear programming: A survey. Annals of Operations
Research, 46:107-138, 1993.

[31] D. den Hertog. Interior Point Approach to Linear, Quadratic and Convex Pro-
gramming, Algorithms and Complexity. PhD thesis, Faculty of Mathematics and
Informatics, TU Delft, NL-2628 BL Delft, The Netherlands, September 1992.
[32] D. den Hertog, C. Roos, and T. Terlaky. A build-up variant of the path-
following method for LP. Operations Research Letters, 12:181-186, 1992.
[33] IBM. IBM Optimization Subroutine Library Guide and Reference, August 1990.
Publication number SC23-0519-1.
[34] K. L. Jones, I. J. Lustig, J. M. Farvolden, and W. B. Powell. Multicommodity
network flows - the impact of formulation on decomposition. Mathematical
Programming, 62:95-117, 1993.

[35] M. Jünger, G. Reinelt, and S. Thienel. Practical problem solving with cutting
plane algorithms in combinatorial optimization. Technical Report 94.156, Institut
für Informatik, Universität zu Köln, Pohligstraße 1, D-50969 Köln, Germany,
March 1994.
[36] A. P. Kamath. Efficient Continuous Algorithms for Combinatorial Optimiza-
tion. PhD thesis, Department of Computer Science, Stanford University, Palo
Alto, CA, February 1995.
[37] A. P. Kamath and N. K. Karmarkar. A continuous approach to compute up-
per bounds in quadratic maximization problems with integer constraints. In
C. A. Floudas and P. M. Pardalos, editors, Recent Advances in Global Optimiza-
tion, Princeton Series in Computer Science, pages 125-140. Princeton University
Press, Princeton, NJ, USA, 1992.
[38] A. P. Kamath and N. K. Karmarkar. An O(nL) iteration algorithm for com-
puting bounds in quadratic optimization problems. In P. M. Pardalos, editor,
Complexity in Numerical Optimization, pages 254-268. World Scientific Pub-
lishing Company, Singapore (USA address: River Edge, NJ 07661), 1993.
[39] A. P. Kamath, N. K. Karmarkar, and K. G. Ramakrishnan. Computational
and complexity results for an interior point algorithm on multi-commodity flow
problem. Technical report, Department of Computer Science, Stanford Univer-
sity, Palo Alto, CA, 1993.
[40] A. P. Kamath, N. K. Karmarkar, K. G. Ramakrishnan, and M. G. C. Re-
sende. A continuous approach to inductive inference. Mathematical Program-
ming, 57:215-238, 1992.
[41] N. K. Karmarkar. A new polynomial-time algorithm for linear programming.
Combinatorica, 4:373-395, 1984.

[42] N. K. Karmarkar and K. G. Ramakrishnan. Computational results of an interior


point algorithm for large scale linear programming. Mathematical Programming,
52:555-586, 1991.
[43] N. K. Karmarkar, M. G. C. Resende, and K. G. Ramakrishnan. An interior point
algorithm to solve computationally difficult set covering problems. Mathematical
Programming, 52:597-618, 1991.
[44] R. M. Karp. Reducibility among combinatorial problems. In R. E. Miller and
J. W. Thatcher, editors, Complexity of Computer Computations, pages 85-103.
Plenum Press, New York, 1972.
[45] D. Klingman, A. Napier, and J. Stutz. NETGEN: A program for generating large
scale capacitated assignment, transportation, and minimum cost network flow
problems. Management Science, 20:814-821, 1974.
[46] A. H. Land and A. G. Doig. An automatic method of solving discrete programming
problems. Econometrica, 28:497-520, 1960.
[47] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy Kan, and D. B. Shmoys, editors.
The Traveling Salesman Problem. John Wiley, New York, 1985.
[48] Z.-Q. Luo. Analysis of a cutting plane method that uses weighted analytic
center and multiple cuts. Technical report, Department of Electrical and Computer
Engineering, McMaster University, Hamilton, Ontario, L8S 4L7, Canada,
September 1994.
[49] I. J. Lustig, R. E. Marsten, and D. F. Shanno. On implementing Mehrotra's
predictor-corrector interior point method for linear programming. SIAM Journal
on Optimization, 2:435-449, 1992.
[50] I. J. Lustig, R. E. Marsten, and D. F. Shanno. Interior point methods for linear
programming: Computational state of the art. ORSA Journal on Computing,
6(1):1-14, 1994. See also the following commentaries and rejoinder.
[51] N. Megiddo. On finding primal- and dual-optimal bases. ORSA Journal on
Computing, 3:63-65, 1991.
[52] S. Mehrotra. On the implementation of a (primal-dual) interior point method.
SIAM Journal on Optimization, 2(4):575-601, 1992.
[53] M. Minoux. Mathematical Programming: Theory and Algorithms. Wiley, New
York, 1986.
[54] J. E. Mitchell. Karmarkar's Algorithm and Combinatorial Optimization Prob-
lems. PhD thesis, School of Operations Research and Industrial Engineering,
Cornell University, Ithaca, NY, 1988.

[55] J. E. Mitchell. Fixing variables and generating classical cutting planes when
using an interior point branch and cut method to solve integer programming
problems. Technical Report 216, Mathematical Sciences, Rensselaer Polytechnic
Institute, Troy, NY 12180-3590, October 1994.

[56] J. E. Mitchell. An interior point column generation method for linear program-
ming using shifted barriers. SIAM Journal on Optimization, 4:423-440, May
1994.

[57] J. E. Mitchell and B. Borchers. Solving real-world linear ordering problems


using a primal-dual interior point cutting plane method. Technical Report 207,
Mathematical Sciences, Rensselaer Polytechnic Institute, Troy, NY 12180-3590,
March 1993. To appear in Annals of OR.
[58] J. E. Mitchell and S. Ramaswamy. An extension of Atkinson and Vaidya's
algorithm that uses the central trajectory. Technical Report 37-93-387, DSES,
Rensselaer Polytechnic Institute, Troy, NY 12180-3590, August 1993.
[59] J. E. Mitchell and M. J. Todd. Solving combinatorial optimization problems
using Karmarkar's algorithm. Mathematical Programming, 56:245-284, 1992.
[60] S. Mizuno, M. Kojima, and M. J. Todd. Infeasible-interior-point primal-dual
potential-reduction algorithms for linear programming. SIAM Journal on Op-
timization, 5:52-67, 1995.
[61] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization.
John Wiley, New York, 1988.
[62] G. L. Nemhauser and L. A. Wolsey. Integer programming. In G. L. Nemhauser
et al., editor, Optimization, chapter 6, pages 447-527. North-Holland, 1989.
[63] P. M. Pardalos and S. A. Vavasis. Quadratic programming with one negative
eigenvalue is N P-hard. Journal of Global Optimization, 1:15-23, 1991.
[64] R. G. Parker and R. L. Rardin. Discrete Optimization. Academic Press, San
Diego, CA 92101, 1988.
[65] L. Portugal, F. Bastos, J. Júdice, J. Paixão, and T. Terlaky. An investigation of
interior point algorithms for the linear transportation problem. Technical report,
Department of Mathematics, University of Coimbra, Coimbra, Portugal, 1993.
To appear in SIAM J. Sci. Computing.
[66] L. Portugal, M. Resende, G. Veiga, and J. Júdice. A truncated primal-infeasible
dual-feasible network interior point method. Technical report, AT&T Bell Laboratories,
Murray Hill, New Jersey, 1994.

[67] S. Ramaswamy and J. E. Mitchell. On updating the analytic center after the
addition of multiple cuts. Technical Report 37-94-423, Dept. of Decision Sci-
ences and Engg. Systems, Rensselaer Polytechnic Institute, Troy, NY 12180,
October 1994.

[68] S. Ramaswamy and J. E. Mitchell. A long step cutting plane algorithm that
uses the volumetric barrier. Technical report, Dept. of Decision Sciences and
Engg. Systems, Rensselaer Polytechnic Institute, Troy, NY 12180, June 1995.
[69] M. G. C. Resende and P. M. Pardalos. Interior point algorithms for network flow
problems. Technical report, AT&T Bell Laboratories, Murray Hill, New Jersey
07974-2070, 1994. To appear in Advances in Linear and Integer Programming,
J. E. Beasley, ed., Oxford University Press, 1995.
[70] M. G. C. Resende and G. Veiga. An efficient implementation of a network
interior point method. In D. S. Johnson and C. C. McGeoch, editors, Network
Flows and Matching: First DIMACS Implementation Challenge, pages 299-348.
American Mathematical Society, 1993. DIMACS Series on Discrete Mathematics
and Theoretical Computer Science, vol. 12.

[71] M. G. C. Resende and G. Veiga. An implementation of the dual affine scaling


algorithm for minimum cost flow on bipartite uncapacitated networks. SIAM
Journal on Optimization, 3:516-537, 1993.
[72] C. Roos and J. P. Vial. A polynomial method of approximate centers for linear
programming. Mathematical Programming, 54:295-305, 1992.
[73] C.-J. Shi, A. Vannelli, and J. Vlach. An improvement on Karmarkar's algorithm
for integer programming. COAL Bulletin, 21:23-28, November 1992.
[74] P. M. Vaidya. A new algorithm for minimizing convex functions over convex
sets. In Proceedings of the 30th Annual IEEE Symposium on Foundations of
Computer Science, pages 338-343, Los Alamitos, CA, 1989. IEEE Computer
Press. To appear in Mathematical Programming.

[75] J. P. Warners. A potential reduction approach to the radio link frequency


assignment problem. Master's thesis, Faculty of Technical Mathematics and
Informatics, Delft University of Technology, Delft, The Netherlands, 1995.
[76] X. Xu, P. F. Hung, and Y. Ye. A simplified homogeneous and self-dual linear
programming algorithm and its implementation. Technical report, College
of Business Administration, The University of Iowa, Iowa City, Iowa 52242,
September 1993.

[77] X. Xu and Y. Ye. A generalized homogeneous and self-dual algorithm for linear
programming. Operations Research Letters, 17:181-190, 1995.

[78] G. Xue and Y. Ye. An efficient algorithm for minimizing a sum of Euclidean
norms with applications. Technical report, Department of Computer Science
and Electrical Engineering, University of Vermont, Burlington, VT 05405-0156,
June 1995.

[79] Y. Ye. On an affine scaling algorithm for nonconvex quadratic programming.
Mathematical Programming, 56:285-300, 1992.

[80] Y. Ye. Complexity analysis of the analytic center cutting plane method that
uses multiple cuts. Technical report, Department of Management Sciences, The
University of Iowa, Iowa City, Iowa 52242, September 1994.

[81] Y. Ye, M. J. Todd, and S. Mizuno. An O(√n L)-iteration homogeneous and
self-dual linear programming algorithm. Mathematics of Operations Research,
19:53-67, 1994.
[82] Y. Zhang. On the convergence of a class of infeasible interior-point methods for
the horizontal linear complementarity problem. SIAM Journal on Optimization,
4(1):208-227,1994.
12
INTERIOR POINT METHODS FOR
GLOBAL OPTIMIZATION
Panos M. Pardalos 1, Mauricio G.C. Resende 2
1 University of Florida
Gainesville, Florida 32611 USA
2 AT&T Bell Laboratories
Murray Hill, New Jersey 07974 USA

ABSTRACT
Interior point methods, originally invented in the context of linear programming, have found
a much broader range of applications, including global optimization problems that arise in
engineering, computer science, operations research, and other disciplines. This chapter
overviews the conceptual basis and applications of interior point methods for some classes
of global optimization problems.

Key Words: Interior point methods, nonconvex optimization, global optimization, quadratic
programming, linear complementarity problem, integer programming, combinatorial opti-
mization

12.1 INTRODUCTION
During the last decade, the field of mathematical programming has evolved rapidly.
New approaches have been developed and increasingly difficult problems are be-
ing solved with efficient implementations of new algorithms. One of these new ap-
proaches is the interior point method [17]. These algorithms have been primarily
used to develop solution methods for linear and convex minimization problems. In-
terior point methods have been also developed for nonconvex minimization problems
and have been used as subroutines in many global optimization algorithms.

In this chapter, we provide an overview of some recent developments in the field of
interior point algorithms for global optimization. In Section 12.2, we discuss several
classes of quadratic programming problems. We first consider a polynomial
time algorithm for quadratic programming over an ellipsoid. We briefly discuss
multiquadratic programming and present an interior point algorithm for quadratic
programming with box constraints. In Section 12.3, we discuss an algorithm for
the minimization of nonconvex potential functions and show how this algorithm
can be applied to solve combinatorial optimization problems. Computational issues
are discussed in detail. Section 12.4 deals with an affine scaling algorithm for general
nonconvex quadratic programming. A lower bounding technique that uses an
interior point method is considered in Section 12.5. Section 12.6 discusses a potential
reduction interior point algorithm for general linear complementarity problems.
Concluding remarks are made in Section 12.7.

12.2 QUADRATIC PROGRAMMING

We start our discussion on the use of interior point techniques to solve nonconvex
quadratic programming problems.

The general quadratic programming problem with linear constraints has been shown
to be NP-complete [8, 32]. For example, the well-known maximum clique problem
on a graph G = (V, E) can be formulated as the indefinite quadratic program

    max   (1/2) x^T A_G x
    subject to   e^T x = 1,  x ≥ 0,

where A_G is the adjacency matrix of the graph G and e is the vector of all ones.
Even the problem of deciding the existence of a Karush-Kuhn-Tucker point for the
problem

    min   q(x) = (1/2) x^T Q x + c^T x
    subject to   x ≥ 0,

is NP-complete [8]. An approximate solution of the general quadratic programming
problem with linear constraints can be computed by successively solving a sequence
of quadratic problems with an ellipsoid constraint.

The general quadratic problem has the form

    min   q(x) = (1/2) x^T Q x + c^T x      (12.1)

    subject to
    x ∈ P = { x ∈ R^n | Ax = b, x ≥ 0 },      (12.2)

where Q ∈ R^{n×n}, A ∈ R^{m×n}, c ∈ R^n, and b ∈ R^m. A special case of this problem
is the box constrained problem

    min   q(x) = (1/2) x^T Q x + c^T x      (12.3)

    subject to
    x ∈ B(r) = { x ∈ R^n | ||x||_∞ ≤ r }.      (12.4)
Quadratic programming problems with box constraints are also NP-complete. This
class of problems is important because it constitutes a major ingredient in many
nonlinear programming algorithms.

Replacing the infinity norm ||·||_∞ with the Euclidean norm ||·||_2 results in quadratic
programming with a single quadratic constraint

    min   q(x) = (1/2) x^T Q x + c^T x      (12.5)

    subject to
    x ∈ E(r) = { x ∈ R^n | ||x||_2 ≤ r }.      (12.6)
Quadratic programming with an ellipsoid constraint is a useful subproblem in many
interior point algorithms for discrete and continuous optimization.

We conclude this section with some basic results about the eigenvalues of Q. Computing
the eigenvalues of a symmetric matrix is a well-studied problem [5] and can
be done in O(n^3) time for an n × n matrix. Assume that the components of Q and
c are rational numbers, Q is a symmetric rational matrix, and let L(Q) denote the
binary encoding length of Q. Using linear algebra techniques, it can be shown that
if λ(Q) is an eigenvalue of Q, then |λ(Q)| ≤ n max_{i,j} |q_{ij}| ≤ 2^{O(L(Q))}, and either
λ(Q) = 0 or |λ(Q)| > 2^{-O(L(Q))}.

Let P_A be the orthogonal projection matrix onto the null space {x ∈ R^n | Ax = 0},
and B be an orthonormal basis spanning the null space. Then λ(B^T Q B) ∈
{λ(P_A^T Q P_A)} and the columns of A^T are the eigenvectors of P_A^T Q P_A corresponding
to the zero eigenvalues of P_A^T Q P_A.

12.2.1 Quadratic Programming with an Ellipsoid Constraint

Although quadratic programming subject to box constraints is NP-hard, quadratic
programming subject to ellipsoid constraints can be solved by a polynomial time
algorithm. In this subsection, we outline a polynomial-time algorithm for computing
a global optimum of a quadratic function over a sphere [35, 36]. The first order
necessary conditions for (12.5-12.6) are

    (Q + μI) x = −c,      (12.7)

and

    ||x||_2 ≤ r,   μ ≥ 0,   and   μ(r − ||x||_2) = 0.      (12.8)

The second order necessary condition for (12.5-12.6) is that Q + μI is positive
semidefinite, or equivalently, that all eigenvalues of Q + μI are nonnegative, i.e.
λ(Q + μI) ≥ 0. If we denote by λ_min(Q) the smallest eigenvalue of Q, then the second
order condition can be expressed as

    μ ≥ max(0, −λ_min(Q)).      (12.9)

When the objective function q(x) is indefinite, i.e. λ_min(Q) < 0, we must have ||x||_2 = r
in (12.8), since, from (12.9), μ ≥ |λ_min(Q)| > 0. In this case, the solution is on the
boundary of the sphere. Because the feasible domain is a compact set, there exists
at least one solution that satisfies optimality conditions (12.7-12.9).

Let x* satisfy (12.7-12.9). If λ_min(Q) < 0, then

    q(x*) ≤ q(0) − (1/2) r^2 |λ_min(Q)|.

This property relates the optimal value of the objective to the smallest eigenvalue
of Q. In other words, this property indicates that the larger the absolute value of
λ_min(Q), the greater the reduction of the objective value for (12.5-12.6).

Among those solutions satisfying (12.7-12.9), we are interested in the one that
achieves the (global) minimum objective value for (12.5-12.6). Interestingly, all
feasible solutions satisfying (12.7-12.9) must have the same μ and the same objective
value. More explicitly, let (μ_1, x_1) and (μ_2, x_2) satisfy (12.7-12.9); then μ_1 = μ_2
and q(x_1) = q(x_2). This fact was shown for the trust region method in unconstrained
optimization [24].

From the above result, we have that any solution that satisfies (12.7-12.9) is the
globally minimum solution for (12.5-12.6), and μ is unique among these minimum
solutions. Next, we discuss an algorithm for finding a solution satisfying (12.7-12.9)
in O(n^3 log(1/ε)) arithmetic operations with the error tolerance ε > 0.

Let μ* ≥ 0 be the unique μ satisfying (12.7-12.9). An upper bound for μ* is

    ||c||_2 / r + n max_{i,j} |q_{ij}|  ≥  μ*.

procedure bs(n, Q, c, r, ε, μ)
1    μ_l = 0;  μ_u = ||c||_2/r + n max_{i,j} |q_{ij}|;
2    do μ_u − μ_l ≥ ε →
3        μ = (μ_l + μ_u)/2;
4        if (Q + μI) is positive definite →
5            if ||(Q + μI)^{-1} c||_2 < r →
6                μ_u = μ;
7            else μ_l = μ fi;
8        else μ_l = μ fi;
9    od;
end bs;

Figure 12.1  Procedure bs: Algorithm for quadratic programming over an ellipsoid
using binary search

The binary search procedure bs, shown in the pseudo-code of Figure 12.1, can be
used to approximately compute μ*. The procedure takes as input the problem data
n, Q, c and r, and the tolerance ε, and returns an ε-approximation of μ*. In step 1,
the lower and upper bounds on μ* are initialized. The loop from line 2 to line 9
is repeated until the interval containing μ* is less than the tolerance ε. The loop
carries out the steps of a binary search. In step 3, the midpoint μ is determined. If
Q + μI is positive definite and the norm of the solution of (12.7) is less than r, then
the upper bound μ_u of μ* is updated to be the midpoint μ. Else, if Q + μI is positive
definite and the norm of the solution is greater than or equal to r, then the lower
bound μ_l is set to be the midpoint. On the other hand, if Q + μI is negative definite
or indefinite, or no solution of (12.7) exists, or if the norm of the minimum norm
solution −(Q + μI)^+ c is greater than r, then the lower bound μ_l is set to the
midpoint (B^+ denotes the pseudoinverse of B).

The minimum norm solution of (12.7) is considered in the case μ = |λ_min(Q)|, in which
Q + μI is positive semidefinite (and therefore singular), and solutions exist for (12.7).
Thus, c must equal the projection of c onto the column (or row) space of Q + μI, and
the minimum norm solution is the solution that lies in the row (or column)
space of Q + μI or, equivalently, −(Q + μI)^+ c. Each iteration of bs requires one
matrix inversion, and can thus be completed in O(n^3) arithmetic operations.

The above binary search procedure terminates in O(log(1/ε)) iterations, resulting in
a total complexity of O(n^3 log(1/ε)) arithmetic operations. The solution μ resulting
from the procedure satisfies (12.7-12.9) with 0 ≤ μ_u − μ* < O(ε) and | ||x||_2 − r | <
O(ε).
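A direct NumPy transcription of procedure bs is sketched below. Positive definiteness is tested with a Cholesky factorization; the pseudoinverse handling of the singular boundary case discussed above is omitted in this sketch.

import numpy as np

def bs(Q, c, r, eps):
    # Binary search of Figure 12.1: find mu with (Q + mu I) x = -c, ||x||_2 <= r.
    n = Q.shape[0]
    mu_l = 0.0
    mu_u = np.linalg.norm(c) / r + n * np.abs(Q).max()
    while mu_u - mu_l >= eps:
        mu = 0.5 * (mu_l + mu_u)
        try:
            np.linalg.cholesky(Q + mu * np.eye(n))   # raises if not positive definite
            x = np.linalg.solve(Q + mu * np.eye(n), -c)
            if np.linalg.norm(x) < r:
                mu_u = mu
            else:
                mu_l = mu
        except np.linalg.LinAlgError:
            mu_l = mu
    return mu_u                                      # eps-approximation of mu*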

Although the above algorithm is polynomially bounded (with error ε), trust region
techniques are preferred in practice. Ye [36] proposed a hybrid algorithm, combining
Newton's method and binary search, that solves the problem of minimizing
a quadratic function over an ellipsoid in O(log(log(1/ε))) iterations, each iteration
taking O(n^3) operations.

A very nice discussion regarding nonconvex quadratic programming over a sphere
can be found in the book by Vavasis [32].

12.2.2 Multiquadratic Programming

The multiquadratic programming problem (MQP) is defined to be the problem of
globally minimizing a quadratic function subject to quadratic equality and inequality
constraints. The MQP offers a powerful unification of several mathematical optimization
problems. For instance, it includes as special cases conventional quadratic
programming and binary integer programming, and allows compact formulations of
problems such as the job-shop scheduling problem. Also, the more general problem
of polynomial programming can be reduced to MQP.

The general multiquadratic programming problem can be stated as

    min   f(x) = x^T A x + a^T x

    subject to
    x^T B_i x + b_i^T x = d_i,    i = 1, ..., m,
    x^T C_i x + c_i^T x ≤ e_i,    i = 1, ..., k,
    x ∈ R^n,

where A, B_i, C_i are n × n real symmetric matrices, and a, b_i, c_i, d, e are real vectors
of appropriate dimensions. This type of problem is, in general, nonconvex since it
contains as special cases polynomial optimization and zero-one integer programming.

Interior point approaches and in particular semidefinite programming techniques
have been used to attack this problem, e.g. [28].

12.2.3 Quadratic Programming with Box Constraints

In this subsection, we describe an interior point algorithm for general quadratic programming
subject to box constraints. Without loss of generality, consider quadratic
programming over the unit hypercube, i.e.

    min   q(x) = (1/2) x^T Q x + c^T x      (12.10)

    subject to
    0 ≤ x ≤ e,      (12.11)

where e ∈ R^n is a vector of all ones. This problem is an essential subroutine in many
general nonlinear optimization codes. Furthermore, many engineering problems can
be formulated as quadratic programs with box constraints.

procedure qpbox(n, Q, c, r, x)
1    k = 1;  x^0 = (1/2) e;  D_1 = diag(1/2, ..., 1/2);
2    do stopping criterion not satisfied →
3        E_k = { x | ||D_k^{-1}(x − x^{k−1})||_2 ≤ r };
4        x^k = argmin { (1/2) x^T Q x + c^T x | x ∈ E_k };
5        d_i = min{ x_i^k, 1 − x_i^k },  i = 1, ..., n;
6        D_{k+1} = diag(d_1, ..., d_n);
7        k = k + 1;
8    od;
end qpbox;

Figure 12.2  Procedure qpbox: Algorithm to solve quadratic programming with
box constraints

The algorithm, described in Figure 12.2, solves a sequence of quadratic programs
over ellipsoids, to find a locally (perhaps globally) optimal solution of (12.10-12.11).
Procedure qpbox takes as input the problem data n, Q, c, and r ≤ 1, and returns
an approximately optimum solution x*. In line 1 of the pseudo code, the solution
vector is initialized to be in the center of the hypercube, and the scaling matrix of
the ellipsoid (D_1) is set up so that the ellipsoid is centered at the initial solution.
The loop from line 2 to line 8 is repeated until a stopping criterion is satisfied. One
such stopping rule, used in [6], is to halt when ||x^k − x^{k−1}||_2 ≤ ε, where ε is a given
tolerance. In line 3, an ellipsoid centered at the current solution is set up and the
quadratic program over this ellipsoid is solved in line 4. Lines 5 and 6 update the
scaling matrix and in line 7 the iteration counter is incremented.

In [6], preliminary computational results are presented using an implementation of
procedure bs, described earlier, to do the optimization over the ellipsoid.
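A sketch of the outer loop of qpbox in NumPy is given below. The routine ellipsoid_solver stands for any method that minimizes a quadratic over a ball of radius r (for instance a trust region solver, or procedure bs after the change of variables shown in the comments); it is an assumed ingredient of this sketch, not part of [6].

import numpy as np

def qpbox(Q, c, ellipsoid_solver, r=0.95, tol=1e-6, max_iter=100):
    # min (1/2) x^T Q x + c^T x over the unit hypercube 0 <= x <= e.
    # ellipsoid_solver(Qs, cs, r) must return an (approximate) minimizer of
    # (1/2) y^T Qs y + cs^T y over the ball ||y||_2 <= r.
    n = Q.shape[0]
    x = 0.5 * np.ones(n)                 # start at the center of the hypercube
    d = 0.5 * np.ones(n)
    for _ in range(max_iter):
        D = np.diag(d)
        # change of variables x = x_old + D y turns the ellipsoidal trust region
        # E_k of Figure 12.2 into the ball ||y||_2 <= r <= 1
        Qs = D @ Q @ D
        cs = D @ (Q @ x + c)
        y = ellipsoid_solver(Qs, cs, r)
        x_new = x + D @ y
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
        d = np.minimum(x, 1.0 - x)       # new scaling keeps the ellipsoid in the box
    return x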

12.3 NONCONVEX POTENTIAL FUNCTION MINIMIZATION

Consider the problem of maximizing a convex quadratic function defined as

    max   w^T w = Σ_{i=1}^m w_i^2      (12.12)

    subject to
    A^T w ≤ b.      (12.13)

The significance of this optimization problem is that many combinatorial optimization
problems can be formulated as above with the additional requirement that the
variables are binary.

In [14, 15] a new affine scaling algorithm was proposed for solving the above problem
using a logarithmic potential function. Consider the nonconvex optimization
problem

    min   φ(w)      (12.14)

where

    φ(w) = log(m − w^T w)^{1/2} − (1/n) Σ_{i=1}^n log d_i(w)      (12.15)

         = log [ (m − w^T w)^{1/2} / ( Π_{i=1}^n d_i(w) )^{1/n} ],      (12.16)

and where

    d_i(w) = b_i − a_i^T w,   i = 1, ..., n,      (12.17)

are the slacks. The denominator of the log term of φ(w) is the geometric mean of
the slacks and is maximized at the center of the polytope defined by

    L := { w ∈ R^m | A^T w ≤ b }.

To find a local (perhaps global) solution of (12.14), an approach similar to the
classical Levenberg-Marquardt methods [19, 21] is used. Let

    w^0 ∈ L^0 := { w ∈ R^m | A^T w < b }

be a given initial interior point. The algorithm generates a sequence of interior points
of L.

Let w^k ∈ L^0 be the k-th iterate. Around w^k a quadratic approximation of the
potential function is set up. Let D = diag(d_1(w), ..., d_n(w)), e = (1, ..., 1), f_0 =
m − w^T w, and C be a constant. The quadratic approximation of φ(w) around w^k is
given by

    Q(w) = (1/2) (w − w^k)^T H (w − w^k) + h^T (w − w^k) + C      (12.18)

where the Hessian is

    H = (1/n) A D^{-2} A^T − (1/f_0) I − (2/f_0^2) w^k (w^k)^T      (12.19)

and the gradient is

    h = −(1/f_0) w^k + (1/n) A D^{-1} e.      (12.20)
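For concreteness, the quantities above can be computed as follows. This is a small sketch under the stated notation, with the columns of A taken to be the constraint normals a_i; the function name is illustrative.

import numpy as np

def potential_pieces(A, b, w, m_const):
    # phi(w) of (12.15) together with the gradient h of (12.20) and the
    # Hessian H of (12.19), evaluated at an interior point w (d > 0, w^T w < m).
    n = A.shape[1]
    d = b - A.T @ w                                # slacks d_i(w)
    f0 = m_const - w @ w
    phi = 0.5 * np.log(f0) - np.log(d).sum() / n
    h = -w / f0 + A @ (1.0 / d) / n
    H = (A * (1.0 / d**2)) @ A.T / n \
        - np.eye(len(w)) / f0 \
        - 2.0 * np.outer(w, w) / f0**2
    return phi, h, H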
Recall that minimizing (12.18) over a polytope is NP-complete. However, if the
polytope is substituted by an inscribed ellipsoid, the resulting approximate problem
is easy. As we have seen in subsection 12.2.1, this easier problem can be solved
in polynomial time [36]. Since preliminary implementations of this algorithm indi-
cate that trust region methods are more efficient for solving these problems, in the
discussion that follows we consider a trust region approach.

Consider the ellipsoid

    E(r) := { y ∈ R^m | ||D^{-1} A^T (y − w^k)||_2 ≤ r }.

To see that the ellipsoid E(r) is inscribed in the polytope L, assume that r = 1 and
let y ∈ E(1). Then

    ||D^{-1} A^T (y − w^k)||_2 ≤ 1

and consequently

    D^{-1} A^T (y − w^k) ≤ e,

where w^k ∈ L^0. Denoting the i-th row of A^T by a_i^T, we have

    a_i^T (y − w^k) / (b_i − a_i^T w^k) ≤ 1,   ∀i = 1, ..., n.

Hence,

    a_i^T (y − w^k) ≤ b_i − a_i^T w^k,

and consequently

    a_i^T y ≤ b_i,   ∀i = 1, ..., n,

i.e. A^T y ≤ b, showing that y ∈ L. This shows that E(1) ⊂ L and since E(r) ⊂ E(1)
for 0 ≤ r < 1, E(r) ⊂ L, i.e. E(r) is an inscribed ellipsoid in L.

Substituting the polytope by the appropriate inscribed ellipsoid and letting Δw :=
w − w^k results in the minimization of a quadratic function over an ellipsoid, i.e.

    min   (1/2) Δw^T H Δw + h^T Δw      (12.21)

    subject to
    ||D^{-1} A^T Δw||_2 ≤ r.      (12.22)

The optimal solution Δw* to (12.21-12.22) is a descent direction of Q(w) from
w^k. For a given radius r > 0, the value of the original potential function φ(w)
may increase by moving in the direction Δw*, because of the higher order terms
ignored in the approximation. It can be easily verified, however, that if the radius
is decreased sufficiently, the value of the potential function will decrease by moving
in the new Δw* direction. We shall say a local minimum to (12.14) has been found
if the radius must be reduced below a tolerance ε to achieve a reduction in the value
of the potential function.

The following result, proved in [14], characterizes the optimal solution of (12.21-
12.22). Using a linear transformation, the problem is transformed into the mini-
mization of a quadratic function over a sphere.

Consider the optimization problem

    min  (1/2) x^T Q x + c^T x                                         (12.23)

subject to

    x^T x ≤ r^2,                                                       (12.24)

where Q ∈ R^{m×m} is symmetric and indefinite, x, c ∈ R^m and 0 < r ∈ R. Let
u_1, ..., u_m denote a full set of orthonormal eigenvectors spanning R^m and let
λ_1, ..., λ_m be the corresponding eigenvalues ordered so that λ_1 ≤ λ_2 ≤ ... ≤ λ_{m-1} ≤
λ_m. Denote 0 > λ_min = min{λ_1, ..., λ_m} and u_min the corresponding eigenvector.
Furthermore, let q be such that λ_min = λ_1 = ... = λ_q < λ_{q+1}. To describe the
solution of (12.23-12.24) consider two cases:
Case 1: Assume Σ_{i=1}^{q} (c^T u_i)^2 > 0. Let the scalar λ ∈ (−∞, λ_min) and consider the
parametric family of vectors

    x(λ) = − Σ_{i=1}^{m} [ (c^T u_i) / (λ_i − λ) ] u_i.

For any r > 0, denote by λ(r) the unique solution of the equation x(λ)^T x(λ) = r^2
in λ. Then x(λ(r)) is the unique optimal solution of (12.23-12.24).

Case 2: Assume c^T u_i = 0, ∀ i = 1, ..., q. Let the scalar λ ∈ (−∞, λ_min) and consider
the parametric family of vectors

    x(λ) = − Σ_{i=q+1}^{m} [ (c^T u_i) / (λ_i − λ) ] u_i.              (12.25)

Let

    r_max = ||x(λ_min)||_2.

If r < r_max, then for any 0 < r < r_max, denote by λ(r) the unique solution of
the equation x(λ)^T x(λ) = r^2 in λ. Then x(λ(r)) is the unique optimal solution of
(12.23-12.24).
If r ≥ r_max, then let α_1, α_2, ..., α_q be any real scalars such that

    Σ_{i=1}^{q} α_i^2 = r^2 − r_max^2.

Then

    x = x(λ_min) + Σ_{i=1}^{q} α_i u_i

is an optimal solution of (12.23-12.24). Since the choice of the α_i's is arbitrary, this
solution is not unique.

This shows the existence of a unique optimal solution to (12.23-12.24) if r < r_max.
The proof of this result is based on another fact, used to develop the algorithm
described in [14, 15], that we state next.

Let the length of x(λ) be

    l(x(λ)) ≡ ||x(λ)||_2^2 = x(λ)^T x(λ);

then l(x(λ)) is monotonically increasing in λ in the interval λ ∈ (−∞, λ_min). To
see this, consider two cases. First, assume Σ_{i=1}^{q} (c^T u_i)^2 > 0 and consider the
parametric family of vectors

    x(λ) = − Σ_{i=1}^{m} [ (c^T u_i) / (λ_i − λ) ] u_i

for λ ∈ (−∞, λ_min). Now, assume that c^T u_i = 0, ∀ i = 1, ..., q, and consider the
parametric family of vectors

    x(λ) = − Σ_{i=q+1}^{m} [ (c^T u_i) / (λ_i − λ) ] u_i               (12.26)

procedure cmq(n, A, b, μ_0, ℓ_0, l̄_0)
1      k = 0;  γ = 1/(μ_0 + 1/n);  ℓ = ℓ_0;  l̄ = l̄_0;
2      w^k = get_start_point(A, b);
3      do l̄ > ε →
4          Δw* = descent_direction(γ, w^k, ℓ, l̄);
5          do φ(w^k + αΔw*) ≥ φ(w^k) and l̄ > ε →
6              l̄ = l̄ / l_r;
7              Δw* = descent_direction(γ, w^k, ℓ, l̄);
8          od;
9          if φ(w^k + αΔw*) < φ(w^k) →
10             w^{k+1} = w^k + αΔw*;
11             k = k + 1;
12         fi;
13     od;
end cmq;

Figure 12.3 Procedure cmq: Algorithm for nonconvex potential function minimization

for λ ∈ (−∞, λ_min). Furthermore, assume Σ_{i=q+1}^{m} (c^T u_i)^2 > 0.
Then l(x(λ)) is monotonically increasing in λ in the interval λ ∈ (−∞, λ_min).

The above result suggests an approach to solve the nonconvex optimization problem
(12.14). At each iteration, a quadratic approximation of the potential function
φ(w) around the iterate w^k is minimized over an ellipsoid inscribed in the polytope
{w ∈ R^m | A^T w ≤ b} and centered at w^k. Either a descent direction Δw* of φ(w)
is produced or w^k is declared a local minimum. A new iterate w^{k+1} is computed
by moving from w^k in the direction Δw* such that φ(w^{k+1}) < φ(w^k). This can be
done by moving a fixed step α in the direction Δw* or by doing a line search to find
the α that minimizes the potential function φ(w^k + αΔw*) [30].

Figure 12.3 shows a pseudo-code procedure cmq for finding a local minimum of the
convex quadratic maximization problem. Procedure cmq takes as input the problem
dimension n, the matrix A, the right hand side vector b, an initial estimate μ_0 of
the parameter μ, and initial lower and upper bounds ℓ_0 and l̄_0 on the acceptable
length. In line 2, get_start_point returns a strict interior point of the
polytope under consideration, i.e. w^k ∈ L^0.

The algorithm iterates in the loop between lines 3 and 13, terminating when a local
optimum is found. At each iteration, a descent direction of the potential function
φ(w) is produced in lines 4 through 8. In line 4, the minimization of a quadratic
function over an ellipsoid (12.21-12.22) is solved. Because of the higher order terms,
the direction returned by descent_direction may not be a descent direction for φ(w).
In this case, loop 5 to 8 is repeated until an improving direction for the potential
function is produced or the largest acceptable length falls below a given tolerance ε.

If an improving direction for φ(w) is found, a new point w^{k+1} is defined (in line 10)
by moving from the current iterate w^k in the direction Δw* by a step length α < 1.
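A compact sketch of this outer loop in Python, assuming the potential phi and the routine descent_direction are available as callables (both names, and the fixed factors used below, are illustrative rather than taken from [14, 15]):

    def cmq_sketch(phi, descent_direction, w0, gamma0, l_lo, l_hi, alpha=0.9,
                   eps=1e-8, shrink=2.0, max_iter=100):
        """Potential reduction: shrink the acceptable length until an improving step is found."""
        w, gamma = w0, gamma0
        for _ in range(max_iter):
            if l_hi <= eps:                       # radius can no longer be reduced:
                return w                          # declare w a local minimum of (12.14)
            dw, gamma, l_lo = descent_direction(gamma, w, l_lo, l_hi)
            while phi(w + alpha * dw) >= phi(w) and l_hi > eps:
                l_hi = l_hi / shrink              # shrink the largest acceptable length
                dw, gamma, l_lo = descent_direction(gamma, w, l_lo, l_hi)
            if phi(w + alpha * dw) < phi(w):
                w = w + alpha * dw                # accept the improving step
        return w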

12.3.1 Computing the Descent Direction


Now consider in more detail the computation of the descent direction for the potential
function. The algorithm described in this section is similar to the trust region method
described in Moré and Sorensen [24].

As discussed previously, the algorithm solves the optimization problem

    min  (1/2) Δw^T H Δw + h^T Δw                                      (12.27)

subject to

    Δw^T A D^{-2} A^T Δw ≤ r^2,                                        (12.28)

to produce a descent direction Δw* for the potential function φ(w). A solution
Δw* ∈ R^m to (12.27-12.28) is optimal if and only if there exists μ ≥ 0 such that

    (H + μ A D^{-2} A^T) Δw* + h = 0,                                  (12.29)

    μ ( r^2 − (Δw*)^T A D^{-2} A^T Δw* ) = 0,                          (12.30)

    H + μ A D^{-2} A^T is positive semidefinite.                       (12.31)

With the change of variables γ = 1/(μ + 1/n), and substituting the Hessian (12.19)
and the gradient (12.20) into (12.29), we obtain

    Δw* = − ( A D^{-2} A^T − (2γ/f_0^2) w^k (w^k)^T − (γ/f_0) I )^{-1} γ ( −(1/f_0) w^k + (1/n) A D^{-1} e )    (12.32)

that satisfies (12.29). Note that r does not appear in (12.32) and that (12.32) is not
defined for all values of γ. However, if the radius r of the ellipsoid (12.28) is kept
within a certain range, then there exists an interval 0 ≤ γ ≤ γ_max such that

    M(γ) = A D^{-2} A^T − (2γ/f_0^2) w^k (w^k)^T − (γ/f_0) I           (12.33)

is nonsingular. Next, we show that for γ small enough Δw* is a descent direction of
φ(w). Note that Δw* is a descent direction of φ(w) at w^k whenever h^T Δw* < 0, since
h is the gradient of φ at w^k. Let γ = ε > 0 and consider lim_{ε→0+} h^T Δw*. Since

    lim_{ε→0+} Δw*/ε = (A D^{-2} A^T)^{-1} (−h),

then

    lim_{ε→0+} h^T Δw*/ε = − h^T (A D^{-2} A^T)^{-1} h.

Since A D^{-2} A^T is positive definite, this limit is negative for h ≠ 0, hence

    h^T Δw* < 0

for all sufficiently small ε, showing that there exists γ > 0 such that the direction Δw*,
given in (12.32), is a descent direction of φ(w).

The idea of the algorithm is to solve (12.27-12.28), more than once if necessary, with
the radius r as a variable. Parameter γ is varied until r takes a value in some given

interval. Each iteration of this algorithm is comprised of two tasks. To simplify
notation, let

    H_c = A D^{-2} A^T                                                 (12.35)

and

    H_0 = − (2/f_0^2) w^k (w^k)^T − (1/f_0) I,                         (12.36)

and define

    M = H_c + γ H_0.

Given the current iterate w^k, we first seek a value of γ such that M Δw = −γ h has a
solution Δw*. This can be done by binary search, as we will see shortly. Once such
a parameter γ is found, the linear system

    M Δw* = −γ h                                                       (12.37)

is solved for Δw* ≡ Δw*(γ). As was shown previously, the length l(Δw*(γ)) is
a monotonically increasing function of γ in the interval 0 ≤ γ ≤ γ_max. Optimality
condition (12.30) implies that r = √l(Δw*(γ)) if μ > 0. Small lengths result in
small changes in the potential function, since r is small and the optimal solution
lies on the surface of the ellipsoid. A length that is too large may not correspond
to an optimal solution of (12.27-12.28), since this may require r > 1. An interval
(ℓ, l̄), called the acceptable length region, is defined such that a length l(Δw*(γ)) is
accepted if ℓ ≤ l(Δw*(γ)) ≤ l̄. If l(Δw*(γ)) < ℓ, γ is increased and (12.37) is
resolved with the new M matrix and right hand side. On the other hand, if l(Δw*(γ)) > l̄,
γ is reduced and (12.37) is resolved. Once an acceptable length is produced we use
Δw*(γ) as the descent direction.

Figure 12.4 presents pseudo-code for procedure descent_direction, where (12.27-
12.28) is optimized. As input, procedure descent_direction is given an estimate
for parameter γ, the current iterate w^k around which the inscribing ellipsoid is to be
constructed, and the current acceptable length region defined by ℓ and l̄. The value
of γ passed to descent_direction at minor iteration k of cmq is the value returned
by descent_direction at minor iteration k − 1. It returns a descent direction Δw*
of the quadratic approximation Q(w) of the potential function from w^k, the next
estimate for parameter γ, and the current lower bound ℓ of the acceptable length region.

In line 1, the length l is set to a large number and several logical keys are initialized:
LD_key is true if a linear dependency in the rows of M is ever found during the
solution of the linear system (12.37) and is false otherwise; γ_u_key (γ_l_key) is true if an
upper (lower) bound for an acceptable γ has been found and false otherwise.

procedure descent_direction(γ, w^k, ℓ, l̄)
1      l = ∞;  LD_key = false;  γ_u_key = false;  γ_l_key = false;
2      do l > l̄ or (l < ℓ and LD_key = false) →
3          M = H_c + γ H_0;  b = −γ h;
4          do M Δw = b has no solution →
5              γ = γ / γ_r;  LD_key = true;
6              M = H_c + γ H_0;  b = −γ h;
7          od;
8          Δw* = M^{-1} b;  l = (Δw*)^T A D^{-2} A^T Δw*;
9          if l < ℓ and LD_key = false →
10             γ_l = γ;  γ_l_key = true;
11             if γ_u_key = true → γ = √(γ_l γ_u) fi;
12             if γ_u_key = false → γ = γ · γ_r fi;
13         fi;
14         if l > l̄ →
15             γ_u = γ;  γ_u_key = true;
16             if γ_l_key = true → γ = √(γ_l γ_u) fi;
17             if γ_l_key = false → γ = γ / γ_r fi;
18         fi;
19     od;
20     do l < ℓ and LD_key = true → ℓ = ℓ / l_r od;
21     return(Δw*);
end descent_direction;

Figure 12.4 Procedure descent_direction: Algorithm to compute the descent direction in nonconvex potential function minimization

The problem of minimizing a nonconvex quadratic function over an ellipsoid is carried
out in the loop going from line 2 to 19. The loop is repeated until either a length
l is found such that ℓ ≤ l ≤ l̄, or l < ℓ with a linear dependency having been found
during the solution of (12.37), i.e. LD_key = true. Lines 3 to 8 produce a descent
direction that may not necessarily have an acceptable length. In line 3 the matrix M
and the right hand side vector b are formed. The linear system (12.37) is tentatively
solved in line 4. The solution procedure may not be successful, i.e. M may be singular.
This implies that parameter γ is too large, and γ is reduced in line 5 of
loop 4-7, which is repeated until a nonsingular matrix M is produced.

Once a nonsingular M matrix is available, a descent direction Δw* is computed in
line 8 along with its corresponding length l. Three cases can occur: (i) the length is
too small even though no linear dependency was detected in the factorization; (ii)
the length is too large; or (iii) the length is acceptable. Case (iii) is the termination
condition for the main loop 2-19. In lines 9-13 the first case is considered. The current
value of γ is a lower bound on an acceptable value of γ; it is recorded in line 10 and the
corresponding logical key is set. If an upper bound γ_u for an acceptable value of γ
has already been found, the new estimate for γ is set to the geometric mean of γ_l and γ_u
in line 11. Otherwise γ is increased by a fixed factor in line 12.

Similar to the treatment of case (i), case (ii) is handled in lines 14-18. The current
value of γ is an upper bound on an acceptable value of γ; it is recorded in line 15
and the corresponding logical key is set. If a lower bound γ_l for an acceptable value
of γ has already been found, the new estimate for γ is set to the geometric mean of γ_l and γ_u
in line 16. Otherwise γ is decreased by a fixed factor in line 17.

Finally, in line 20, the lower bound ℓ may have to be adjusted if l < ℓ and LD_key =
true. Note that the key LD_key is used only to allow the adjustment of the range of
the acceptable length, so that the range returned contains the current length l.
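The safeguarded search of lines 2-19 can be sketched as follows (Python/NumPy; H_c, H_0 and h are assumed to be given dense arrays, and the factors gamma_r, eps are illustrative choices, not values specified in the text):

    import numpy as np

    def descent_direction_sketch(gamma, Hc, H0, h, l_lo, l_hi,
                                 gamma_r=2.0, max_iter=50, eps=1e-12):
        """Safeguarded search on gamma until l = dw^T H_c dw lands in [l_lo, l_hi]."""
        gamma_lo = gamma_hi = None
        dw, length, ld_found = None, np.inf, False
        for _ in range(max_iter):
            M, b = Hc + gamma * H0, -gamma * h
            try:
                dw = np.linalg.solve(M, b)                 # lines 3-8: tentative direction
            except np.linalg.LinAlgError:                  # M singular: gamma is too large
                gamma, ld_found = gamma / gamma_r, True
                continue
            length = dw @ (Hc @ dw)                        # l(dw) = dw^T A D^{-2} A^T dw
            if l_lo <= length <= l_hi or (length < l_lo and ld_found):
                break                                      # acceptable length found
            if length < l_lo:                              # case (i): gamma is a lower bound
                gamma_lo = gamma
                gamma = np.sqrt(gamma_lo * gamma_hi) if gamma_hi else gamma * gamma_r
            else:                                          # case (ii): gamma is an upper bound
                gamma_hi = gamma
                gamma = np.sqrt(gamma_lo * gamma_hi) if gamma_lo else gamma / gamma_r
        while length < l_lo and l_lo > eps:                # line 20: shrink the lower bound
            l_lo = l_lo / gamma_r
        return dw, gamma, l_lo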

12.3.2 Some Computational Considerations


The density of the linear system solved at each iteration of descent_direction
is determined by the density of the Hessian matrix. Using the potential function
described in the previous section, this Hessian,

    M = A D^{-2} A^T − (2γ/f_0^2) w^k (w^k)^T − (γ/f_0) I,

is totally dense, because of the rank-one component (2γ/f_0^2) w^k (w^k)^T. Consequently, direct
factorization techniques must be ruled out for large instances. However,
in the case where the matrix A is sparse, iterative methods can be applied to
approximately solve the linear system. In [13], a preconditioned conjugate gradient


algorithm, using diagonal preconditioning, was used to solve the system efficiently
taking advantage of the special structure of the coefficient matrix. In this approach,
the main computational effort is the multiplication of a dense vector ~ and the coeffi-
cient matrix M, i.e. M~. This multiplication can be done efficiently, by considering
fact that M is the sum of three matrices, each of which has special structure. The
first multiplication,
1
f/~
is simply a scaling of ~. The second product,

2 k kT
f'6w W ~

is done in two steps. First, an inner product w kT ~ is computed. Then, the vector
/g w k is scaled by the inner product. The third product,

is done in three steps. First the product AT ~ is carried out. The resulting vector is
scaled by D- 2 and multiplies A. Therefore, if A is sparse, the entire matrix vector
multiplication can be done efficiently.
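A sketch of this structured product (Python/NumPy; A may be dense or a scipy.sparse matrix, and d holds the slacks d_i(w^k)):

    import numpy as np

    def apply_M(xi, A, d, w, gamma, f0):
        """Compute M xi = (A D^{-2} A^T - (2 gamma/f0^2) w w^T - (gamma/f0) I) xi
        without forming the dense matrix M."""
        term1 = A @ ((A.T @ xi) / d**2)                 # A D^{-2} A^T xi (exploits sparse A)
        term2 = (2.0 * gamma / f0**2) * w * (w @ xi)    # rank-one part: w (w^T xi)
        term3 = (gamma / f0) * xi                       # simple scaling of xi
        return term1 - term2 - term3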

In a recent study, Warners et al. [33] describe a new potential function

    φ_p(w) = m − w^T w − Σ_{i=1}^{n} p_i log d_i(w),

whose gradient and Hessian are given by

    h = −2w + A D^{-1} p

and

    H = −2I + A D^{-1} P D^{-1} A^T,

where p = (p_1, ..., p_n)^T and P = diag(p). Note that the density of the Hessian
depends only on the density of AA^T. Consequently, direct factorization methods
can be used efficiently when the density of AA^T is small.
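With SciPy sparse matrices this Hessian can be assembled directly; a sketch, under the assumption that A is stored sparse and p, d are given vectors:

    import numpy as np
    import scipy.sparse as sp

    def warners_hessian(A, d, p):
        """H = -2 I + A D^{-1} P D^{-1} A^T; sparse whenever A A^T is sparse."""
        m = A.shape[0]
        Dinv_P_Dinv = sp.diags(p / d**2)      # D^{-1} P D^{-1} is diagonal
        return (-2.0) * sp.identity(m) + A @ Dinv_P_Dinv @ A.T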

12.3.3 Application to Combinatorial Optimization
The algorithms discussed in this section have been applied to the following integer
programming problem: Given A' ∈ R^{m×n'} and b' ∈ R^{n'}, find w ∈ R^m such that

    A'^T w ≤ b'                                                        (12.38)

    w_i ∈ {−1, 1},   i = 1, ..., m.                                    (12.39)

The more common form of integer programming, where variables x_i take on {0, 1}
values, can be converted to the above form with the change of variables

    x_i = (1 + w_i)/2,   i = 1, ..., m.
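A small sketch of this change of variables (the array names A_x, b_x are illustrative; constraints are stored column-wise, as in A'^T w ≤ b'):

    import numpy as np

    def to_plus_minus_one_form(Ax, bx):
        """Rewrite A_x^T x <= b_x with x in {0,1}^m as A'^T w <= b' with w in {-1,1}^m,
        using x = (1 + w) / 2."""
        e = np.ones(Ax.shape[0])
        A_prime = 0.5 * Ax                      # columns of A_x are the constraints
        b_prime = bx - 0.5 * (Ax.T @ e)
        return A_prime, b_prime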

More specifically, let I denote an m × m identity matrix,

    A = [ A'  I  −I ]

and

    b = [ b'^T  e^T  e^T ]^T,

so that n = n' + 2m and the box constraints −e ≤ w ≤ e are included, and let

    ℐ = { w ∈ R^m | A^T w ≤ b and w_i ∈ {−1, 1}, i = 1, ..., m }.

With this notation, we can state the integer programming problem as: find w ∈ ℐ.

As before, let

    L = { w ∈ R^m | A^T w ≤ b }

and consider the linear programming relaxation of (12.38-12.39), i.e. find w ∈ L.
One way of selecting ±1 integer solutions over fractional solutions in linear
programming is to introduce the quadratic objective function

    max  w^T w = Σ_{i=1}^{m} w_i^2

and solve the nonconvex quadratic programming problem (12.12-12.13). Note that
w^T w ≤ m, with equality occurring only when w_j = ±1, j = 1, ..., m. Furthermore,
if w ∈ ℐ then w ∈ L and w_i = ±1, i = 1, ..., m, and therefore w^T w = m.

Hence, if w is the optimal solution to (12.12-12.13) then w ∈ L. If, in addition, w^T w = m then
w_i = ±1, i = 1, ..., m, and therefore w ∈ ℐ. Consequently, this shows that for w ∈ L,
w ∈ ℐ if and only if w^T w = m.

In place of (12.12-12.13), one solves the nonconvex potential function minimization

    min { φ(w) : w ∈ L^0 },                                            (12.40)

where φ(w) is given by (12.15-12.17). The generally applied scheme rounds each
iterate to an integer solution, terminating if a feasible integer solution is produced.
If the algorithm converges to a nonglobal local minimum of (12.40), then the problem
is modified by adding a cut and the algorithm is applied to the augmented problem.
Let v be the integer solution rounded off from the local minimum. A valid cut is

    v^T w ≤ m − 2.                                                     (12.41)

Observe that if w = v then v^T w = m; otherwise, for any other ±1 vector w, v^T w ≤ m − 2.
Therefore, the cut (12.41) excludes v but does not exclude any other feasible integral
solution of (12.38-12.39).
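A sketch of the rounding and cut-generation step (Python/NumPy; the helper names are illustrative, and constraints are again stored as columns of A, so the cut becomes one extra column):

    import numpy as np

    def round_to_pm1(w):
        """Round a fractional iterate to the nearest +/-1 vector."""
        return np.where(w >= 0.0, 1.0, -1.0)

    def add_cut(A, b, w_local):
        """Append the cut v^T w <= m - 2 that excludes the rounded local minimum v."""
        v = round_to_pm1(w_local)
        m = v.size
        A_new = np.hstack([A, v.reshape(-1, 1)])
        b_new = np.append(b, m - 2)
        return A_new, b_new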

We note that adding a cut of the type above will not, theoretically, prevent the
algorithm from converging to the same local minimum twice. In practice [15], the
addition of the cut changes the objective function, consequently altering the trajec-
tory followed by the algorithm.

Most combinatorial optimization problems have very natural equivalent integer and
quadratic programming formulations [26]. The algorithms described in this section
have been applied to a variety of problems, including maximum independent set [16],
set covering [15], satisfiability [13, 30], inductive inference [11, 12], and frequency
assignment in cellular telephone systems [34].

12.4 AFFINE SCALING ALGORITHM FOR GENERAL QUADRATIC PROGRAMMING
The affine scaling algorithm for linear programming, introduced by Dikin [3], and re-
discovered after the publication of Karmarkar's interior point algorithm, was among
the first interior point algorithms to be shown to be competitive with the Simplex
Method. Affine scaling algorithms for linear, convex quadratic, and network pro-
gramming have been extensively studied and implemented (e.g. [1, 22, 29, 31]).

We now consider an affine scaling algorithm for general nonconvex quadratic
programming (12.1-12.2). Given an arbitrary feasible point x^k ∈ P = {x ∈ R^n | Ax =
b, x ≥ 0}, using the scaling technique, we form the suboptimization problem

    min  (1/2) x^T Q̄ x + c̄^T x                                         (12.42)

subject to

    Ā x = b,   and   ||x − e||_2 ≤ r,                                   (12.43)

where Q̄ = X_k Q X_k, c̄ = X_k c, with X_k = diag(x^k), Ā = A X_k, and 0 < r < 1.
Letting Δx ≡ x − e, (12.42-12.43) can be written as

    min  (1/2) Δx^T Q̄ Δx + (Q̄ e + c̄)^T Δx

subject to

    Ā Δx = 0   and   ||Δx||_2 ≤ r.

Let B ∈ R^{n×m} be an orthonormal basis spanning the null space of Ā. Then Δx =
B Δy for some Δy ∈ R^m and, with this change of variables, (12.42-12.43) becomes

    min  (1/2) Δy^T B^T Q̄ B Δy + (Q̄ e + c̄)^T B Δy                      (12.44)

subject to

    Δy ∈ E(r),                                                          (12.45)

which is the minimization of a quadratic function over an ellipsoid.

Problem (12.44-12.45) is identical to (12.5-12.6), whose solution can be obtained


with the trust region method, or in polynomial time, via the binary search procedure
bs. For simplicity, we analyze the affine scaling algorithm in the standard form of
(12.42-12.43) instead of (12.44-12.45).

Denote by ~ the solution for (12.42-12.43). If x"


is an interior feasible point for
(12.1-12.2), i.e. xk > 0, then
Xk+1 =
Xk~ > 0

is also an interior feasible point for (12.1-12.2), since the linear constraints are sat-
isfied by the new interior point, i.e.

AXk+l = A~ = band 0 < 1 - r < ~i < 1 + r < 2.



In addition, observe that q(x^{k+1}) ≤ q(x^k), since the value of the subproblem
objective at x̄ does not exceed its value at the centre e.
The first and second order necessary conditions for the solution of (12.42-12.43) are
given by

    Q̄ x̄ + c̄ − Ā^T ȳ + μ^k (x̄ − e) = 0,
    Ā (x̄ − e) = 0,
    ||x̄ − e||_2 ≤ r,   μ^k ≥ 0   and   μ^k ( r − ||x̄ − e||_2 ) = 0,

and B^T (Q̄ + μ^k I) B is positive semidefinite. Let

    y^{k+1} = ȳ   and   s^{k+1} = Q x^{k+1} + c − A^T y^{k+1}.

Then,

    μ^k = ||p^k||_2 / r,   with   p^k = X_k s^{k+1},

and

    x^{k+1} = X_k ( e − r p^k / ||p^k||_2 ).

Repeatedly solving (12.42-12.43) with the binary search procedure bs (or a trust
region method), one solves (12.1-12.2), as shown in the pseudo-code in Figure 12.5.
This procedure takes as input the problem data Q, A, b, c and the positive parameter r < 1,
and outputs the primal and dual solution vectors x̄ and ȳ, respectively. Until the
termination criterion is satisfied, the algorithm iterates from line 2 to line 8, solving
a quadratic program over an ellipsoid and recovering the variables. One possible
stopping rule is to halt when ||x^{k+1} − x^k||_2 ≤ ε, for some given tolerance ε > 0.
Another rule is to stop when the optimality conditions are satisfied to an
acceptable tolerance.

Let x^k and y^k converge to x̄ and ȳ. Then x̄ and ȳ are feasible for (12.1-12.2), and
satisfy both the first and the second order necessary conditions of (12.1-12.2), i.e.

procedure asqp(Q, A, b, c, r, x̄, ȳ)
1      k = 0;
2      do termination criterion not satisfied →
3          Δy = argmin{ (1/2) Δy^T B^T Q̄ B Δy + (Q̄ e + c̄)^T B Δy  |  Δy ∈ E(r) };
4          x̄ = e + B Δy;
5          x^{k+1} = X_k x̄;
6          y^{k+1} = ȳ;
7          k = k + 1;
8      od;
end asqp;

Figure 12.5 Procedure asqp: Affine scaling algorithm for quadratic programming

    Q x̄ + c − A^T ȳ ≥ 0,   x̄ ≥ 0,   x̄^T (Q x̄ + c − A^T ȳ) = 0,

and Q is positive semidefinite in the null space of

    { x ∈ R^n | A x = 0 and e_j^T x = 0 for j ∈ J },

where the index set J = { j ∈ {1, 2, ..., n} | x̄_j = 0 }.


Under certain assumptions given in [35], xk and yk generated by asqp converge to a
local optimum. In practice, one can relax the restriction of r < 1 in asqp. In other
words, the spherical feasible region for (12.42-12.43) may be enlarged to achieve
further improvement as long as xk+ 1 remains an interior point of (12.1-12.2).
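One iteration of asqp can be sketched as follows (Python/SciPy; solve_sphere_qp is a hypothetical stand-in for the trust region or binary search solver of the sphere-constrained subproblem of Section 12.2):

    import numpy as np
    import scipy.linalg

    def asqp_step(Q, A, c, x, r, solve_sphere_qp):
        """One affine scaling step for min 0.5 x^T Q x + c^T x, Ax = b, x >= 0."""
        Xk = np.diag(x)
        Qbar, cbar, Abar = Xk @ Q @ Xk, Xk @ c, A @ Xk       # scaled data
        B = scipy.linalg.null_space(Abar)                    # orthonormal null-space basis
        Hs = B.T @ Qbar @ B                                  # reduced Hessian
        gs = B.T @ (Qbar @ np.ones_like(x) + cbar)           # reduced gradient at e
        dy = solve_sphere_qp(Hs, gs, r)                      # min over the ball ||dy|| <= r
        x_bar = np.ones_like(x) + B @ dy
        return Xk @ x_bar                                    # next (interior) iterate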

No complexity bound has been developed for the algorithm described above. However,
if the global minimum solution of (12.3-12.4) is contained in a larger ball E(R),
Ye [35] shows that if λ_min(Q) ≥ 0, then

    q(x̄) − z* ≤ (1 − r/R)(q(0) − z*),

and otherwise, if λ_min(Q) < 0, then

    q(x̄) − z* ≤ (1 − r^2/R^2)(q(0) − z*),

where z* is the global minimum objective value for (12.3-12.4). This result shows
that the objective reduction rate is at least 1 − (r/R)^2 after solving (12.5-12.6),
independent of the convexity of the quadratic objective function.

Although the general quadratic programming problem with linear constraints is NP-hard,
the affine scaling algorithm has proven itself to be efficient in solving certain
classes of nonconvex quadratic programming problems [6].

More recently, Monteiro and Wang [23] study trust region affine scaling algorithms
for solving linearly constrained convex and concave programming problems. For a
special class of convex or concave functions, satisfying a certain invariance condition
on their Hessians, the authors prove R-linear and Q-linear convergence, respectively.
In addition, under primal nondegeneracy, and for the same class of functions, they
show that an accumulation point of the iterates satisfies the first and second order
optimality conditions.

12.5 A LOWER BOUNDING TECHNIQUE


A lower bound for the globally optimal solution of the quadratic program (12.1-
12.2) can be obtained by minimizing the objective function over the largest ellipsoid
inscribed in P. This technique can be applied to quadratic integer programming, a
problem that is NP-hard in the general case. Kamath and Karmarkar [9] proposed a
polynomial time interior point algorithm for computing these bounds. The problem
is solved as a minimization of the trace of a matrix subject to positive definiteness
conditions. The algorithm takes no more than O(nL) iterations (where L is the
number of bits required to represent the input). The algorithm does two matrix
inversions per iteration.

Consider the quadratic integer program

    min  f(x) = x^T Q x                                               (12.46)

subject to

    x ∈ S = {−1, 1}^n,                                                (12.47)

where Q ∈ R^{n×n} is symmetric. Let f_min be the value of the optimal solution of
(12.46-12.47).

Consider the problem of finding good lower bounds on f_min. To apply an interior
point method to this problem, one needs to embed the discrete set S in a continuous
set T ⊇ S. Clearly, the minimum of f(x) over T is a lower bound on f_min.

A commonly used approach is to choose the continuous set to be the box

    B = { x ∈ R^n | −1 ≤ x_i ≤ 1, i = 1, ..., n }.

However, if f(x) is not convex, the problem of minimizing f(x) over B is NP-hard.
Consider this difficult case, and therefore assume that Q has at least one negative
eigenvalue. Since optimizing over a box can be hard, instead enclose the box in an
ellipsoid E. Let

    U = { w = (w_1, ..., w_n) ∈ R^n | Σ_{i=1}^{n} w_i = 1 and w_i > 0, i = 1, ..., n },

and consider the parameterized ellipsoid

    E(w) = { x ∈ R^n | x^T W x ≤ 1 },

where w ∈ U and W = diag(w).

Clearly, the set S is contained in E(w). If λ_min(w) is the minimum eigenvalue of
W^{-1/2} Q W^{-1/2}, then

    min_{x≠0}  (x^T Q x)/(x^T W x) = min_{y≠0} (y^T W^{-1/2} Q W^{-1/2} y)/(y^T y) = λ_min(w),

and therefore

    x^T Q x ≥ λ_min(w),   ∀ x ∈ E(w).

Hence, the minimum value of f(x) over E(w) can be obtained by simply computing
the minimum eigenvalue of W^{-1/2} Q W^{-1/2}. To further improve the bound on f_min
requires that λ_min(w) be maximized over the set U. Therefore, the problem of
finding a better lower bound is transformed into the optimization problem

    max  ν

subject to

    (x^T Q x)/(x^T W x) ≥ ν,   ∀ x ∈ R^n − {0}, and w ∈ U.

One can further simplify the problem by defining d = (d_1, ..., d_n) ∈ R^n such that
Σ_{i=1}^{n} d_i = 0. Let D = diag(d). If

    x^T Q x ≥ ν x^T W x + x^T D x,   ∀ x ∈ R^n,

then, since Σ_{i=1}^{n} w_i = 1 and Σ_{i=1}^{n} d_i = 0,

    x^T Q x ≥ ν

for x ∈ S. Now, define z = ν w + d and let Z = diag(z). For all x ∈ S,

    x^T Q x ≥ x^T Z x = ν,

and therefore the problem becomes

    max  Σ_{i=1}^{n} z_i

subject to

    x^T (Q − Z) x ≥ 0,   ∀ x ∈ R^n.
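For a fixed weight vector w, the bound itself is a single symmetric eigenvalue computation; a sketch assuming Q is symmetric and w is strictly positive with unit sum:

    import numpy as np

    def ellipsoid_lower_bound(Q, w):
        """Lower bound on min_{x in {-1,1}^n} x^T Q x via lambda_min(W^{-1/2} Q W^{-1/2})."""
        w_inv_sqrt = 1.0 / np.sqrt(w)
        Qw = Q * np.outer(w_inv_sqrt, w_inv_sqrt)     # W^{-1/2} Q W^{-1/2} for diagonal W
        return np.linalg.eigvalsh(Qw).min()           # = lambda_min(w) <= f_min

    # e.g. the uniform choice w = np.full(n, 1.0/n) gives the bound n * lambda_min(Q)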

procedure qplb(Q, ε, z, opt)
1      z^(0) = (λ_min(Q) − 1) e;
2      ν^(0) = 0;
3      M(z^(0)) = Q − Z^(0);  k = 0;
4      do tr(M(z^(k))) − ν^(k) ≥ ε →
5          Construct H^(k), where H_ij^(k) = (e_i^T M(z^(k))^{-1} e_j)^2;
6          f^(k)(z) = 2n ln(tr(M(z^(k))) − ν^(k)) − ln det M(z^(k));
7          g^(k) = ∇ f^(k)(z^(k));
8          β = 0.5 / √( (g^(k))^T (H^(k))^{-1} g^(k) );
9          Solve H^(k) Δz = −β g^(k);
10         if (g^(k))^T Δz < 0.5 →
11             Increase ν^(k) until (g^(k))^T Δz = 0.5;
12         fi;
13         z^(k+1) = z^(k) + Δz;  k = k + 1;
14     od;
15     z = z^(k);  opt = tr(Q) − ν^(k);
end qplb;

Figure 12.6 Procedure qplb: Interior point algorithm for computing lower bounds

Let M(z) = Q − Z. Observe that solving the above problem amounts to minimizing
the trace of M(z) while keeping M(z) positive semidefinite. Since M(z) is real
and symmetric, it has n real eigenvalues λ_i(M(z)), i = 1, ..., n. To ensure positive
semidefiniteness, these eigenvalues must be nonnegative. Hence, the above problem
is reformulated as

    min  tr(M(z))

subject to

    λ_i(M(z)) ≥ 0,   i = 1, ..., n.
Kamath and Karmarkar [9, 10] proposed an interior point approach to solve the above
trace minimization problem that takes no more than O(nL) iterations, with two
matrix inversions per iteration. Figure 12.6 shows pseudo-code for this algorithm.

To analyze the algorithm, consider the parametric family of potential functions given
by

    g(z, ν) = 2n ln(tr(M(z)) − ν) − ln det(M(z)),

where ν ∈ R is a parameter. The algorithm generates a monotonically increasing
sequence of parameters ν^(k) that converges to the optimal value ν*. The sequence

ν^(k) is constructed together with the sequence z^(k) of interior points, as shown in the
pseudo-code in Figure 12.6. Since Q − Z* is a positive semidefinite matrix, ν^(0) = 0 ≤ ν*
is used as the initial point in the sequence.

Let g_l^(k)(z, ν) be the linear approximation of g(z, ν) at z^(k). Then

    g_l^(k)(z, ν) = − (2n / (tr(M(z^(k))) − ν)) e^T z + diag(M(z^(k))^{-1})^T z + C,

where C is a constant and diag(M(z^(k))^{-1}) denotes the vector of diagonal entries of
M(z^(k))^{-1}. Kamath and Karmarkar show how g_l^(k)(z, ν) can be reduced
by a constant amount at each iteration. They prove that it is possible to compute
ν^(k+1) ∈ R and a point z^(k+1) in a closed ball of radius α centered at z^(k) such that
ν^(k) ≤ ν^(k+1) ≤ ν* and

    g_l^(k)(z^(k+1), ν^(k+1)) − g_l^(k)(z^(k), ν^(k+1)) ≤ −α.

Using this fact, they show that, if z^(k) is the current interior point and ν^(k) ≤ ν* is
the current estimate of the optimal value, then

    g(z^(k+1), ν^(k+1)) − g(z^(k), ν^(k)) ≤ −α + α^2 / (2(1 − α)),

where z^(k+1) and ν^(k+1) are the new interior point and new estimate, respectively.
This proves polynomial-time complexity for the algorithm.

12.6 NONCONVEX COMPLEMENTARITY PROBLEMS
For a given n × n rational matrix M and a given rational n-vector q, the linear
complementarity problem (LCP) is that of finding an (x, y) pair such that

    y = Mx + q,   x ≥ 0,   y ≥ 0,   x^T y = 0,

or proving that no such (x, y) pair exists. Although the general LCP is NP-hard,
some classes of LCP's can be solved by polynomial time algorithms. For example,
when M is positive semidefinite, the LCP is a convex quadratic program and can
be solved by the ellipsoid algorithm or several interior point algorithms (e.g., see
Kojima et al. [18]). Other classes of polynomially solved LCP's are discussed in
[7, 20, 37].

The general linear complementarity problem is equivalent to the mixed integer feasi-
bility problem. It has been shown by Pardalos and Rosen [27] that any general LCP

can be reduced to a mixed (0,1) integer feasibility problem. On the other hand,
using the observation that every constraint of the form z ∈ {0, 1} is equivalent to

    z + w = 1,   z ≥ 0,   w ≥ 0,   zw = 0,

it is easy to show that any mixed (0,1) integer feasibility problem can be formulated
as an LCP [26]. Furthermore, the problem of checking the existence of a Karush-Kuhn-Tucker
point of a nonconvex quadratic problem of the form

    min  c^T x + (1/2) x^T Q x

subject to

    x ≥ 0,

can be reduced to a symmetric LCP, which has been shown to be NP-complete [8].

We now focus our attention on a potential reduction interior point approach to
nonconvex LCP, following the discussion in [25, 37]. Given the LCP defined by M
and q, define the set

    Ω⁺ = { (x, y) | x > 0, y = Mx + q > 0 },

which is assumed nonempty. An LCP with empty Ω⁺ can be transformed to an
equivalent or related LCP with nonempty Ω⁺. Finally, define the feasible domain
by

    Ω = { (x, y) | x ≥ 0, y = Mx + q ≥ 0 }.

Consider the potential function

    φ(x, y) = ρ log(x^T y) − Σ_{j=1}^{n} log(x_j y_j)
            = (ρ − n) log(x^T y) − Σ_{j=1}^{n} log( x_j y_j / (x^T y) ),

where ρ > n, which is defined for any (x, y) ∈ Ω⁺. The gradients of the potential
function are

    ∇_x φ(x, y) = (ρ / (x^T y)) y − X^{-1} e   and   ∇_y φ(x, y) = (ρ / (x^T y)) x − Y^{-1} e,

where e is the vector of all ones. Starting from an interior point (x^0, y^0) that satisfies

    φ(x^0, y^0) = O(ρ L),

where L is the size of M and q, the potential reduction algorithm generates a sequence
of interior feasible solutions {x^k, y^k}, terminating at a point such that

    φ(x^k, y^k) ≤ −(ρ − n) L.

It can be verified that at this point

    (x^k)^T y^k ≤ 2^{-L},

and an exact solution to the LCP can be obtained in O(n^3) additional operations.

To achieve a potential reduction, the scaled gradient projection method is applied.
At the k-th iteration, the linear program over an ellipsoid

    min  ∇_x φ(x^k, y^k)^T δx + ∇_y φ(x^k, y^k)^T δy

subject to

    δy = M δx,
    ||(X^k)^{-1} δx||_2^2 + ||(Y^k)^{-1} δy||_2^2 ≤ β^2 < 1,

is solved. The scaled gradient projection vector onto the null space of the scaled
equality constraints is

    p^k = (p_x^k, p_y^k),   p_x^k = (ρ/Δ^k) X^k (y^k + M^T π^k) − e,   p_y^k = (ρ/Δ^k) Y^k (x^k − π^k) − e,    (12.48)

where

    π^k = ((Y^k)^2 + M (X^k)^2 M^T)^{-1} (Y^k − M X^k)(X^k y^k − (Δ^k/ρ) e),                                   (12.49)

and Δ^k = (x^k)^T y^k, while X^k (resp. Y^k) denotes the diagonal matrix of x^k (resp. y^k).
Then, for some β > 0, assign

    x^{k+1} = x^k − β X^k p_x^k / ||p^k||   and   y^{k+1} = y^k − β Y^k p_y^k / ||p^k||.

By choosing

    β = min{ ||p^k||_2 / (ρ + 2), 1/2 } ≤ 1/2,                                                                  (12.50)

it can be shown that

    φ(x^{k+1}, y^{k+1}) ≤ φ(x^k, y^k) − α(||p^k||_2^2),                                                          (12.51)

where

    α(||p^k||_2^2) = ||p^k||_2^2 / (2(ρ + 2))   if ||p^k||_2^2 ≤ (ρ + 2)^2/4,
    α(||p^k||_2^2) = (ρ + 2)/8                  otherwise.

procedure lcp(M, q)
1      Find x^0 such that y^0 = M x^0 + q > 0;  k = 0;
2      do (x^k)^T y^k ≥ 2^{-L} →
3          π^k = ((Y^k)^2 + M (X^k)^2 M^T)^{-1} (Y^k − M X^k)(X^k y^k − (Δ^k/ρ) e);
4          p_x^k = (ρ/Δ^k) X^k (y^k + M^T π^k) − e;
5          p_y^k = (ρ/Δ^k) Y^k (x^k − π^k) − e;
6          p^k = (p_x^k, p_y^k);
7          β = min{ ||p^k||_2 / (ρ + 2), 1/2 };
8          x^{k+1} = x^k − β X^k p_x^k / ||p^k||_2;
9          y^{k+1} = y^k − β Y^k p_y^k / ||p^k||_2;
10         k = k + 1;
11     od;
end lcp;

Figure 12.7 Procedure lcp: Potential reduction algorithm for LCP

Pseudo code is given in Figure 12.7 for the potential reduction algorithm for the
LCP.
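One iteration of this scheme, transcribed directly from the formulas above (dense NumPy linear algebra, purely illustrative and following the reconstruction of (12.48)-(12.50)):

    import numpy as np

    def lcp_potential_reduction_step(M, x, y, rho):
        """One scaled gradient projection step of the potential reduction method."""
        n = x.size
        X, Y = np.diag(x), np.diag(y)
        delta = x @ y                                   # Delta^k = x^T y
        S = Y @ Y + M @ X @ X @ M.T
        pi = np.linalg.solve(S, (Y - M @ X) @ (x * y - (delta / rho) * np.ones(n)))
        px = (rho / delta) * X @ (y + M.T @ pi) - np.ones(n)
        py = (rho / delta) * Y @ (x - pi) - np.ones(n)
        p_norm = np.sqrt(px @ px + py @ py)
        beta = min(p_norm / (rho + 2.0), 0.5)
        x_new = x - beta * X @ px / p_norm
        y_new = y - beta * Y @ py / p_norm
        return x_new, y_new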

From (12.51), it follows that ||p^k||_2^2 partially determines the potential reduction at
the k-th iteration. Note that the potential reduction increases with ||p^k||_2^2. Let

    g(x, y) = (ρ / (x^T y)) X y − e

and

    H(x, y) = 2I − (Y − MX)^T (Y^2 + M X^2 M^T)^{-1} (Y − MX).

Then

    ||p^k||_2^2 = g(x^k, y^k)^T H(x^k, y^k) g(x^k, y^k).

Ye and Pardalos [37] define the condition number for the LCP as

    γ(M, q) = inf{ ||g(x, y)||_H^2 : x^T y ≥ 2^{-L}, φ(x, y) ≤ O(ρL) and (x, y) ∈ Ω⁺ },

where ||g(x, y)||_H^2 denotes g(x, y)^T H(x, y) g(x, y). They prove that the algorithm
with ρ > n solves any LCP for which γ(M, q) > 0 in O(nL/α(γ(M, q))) iterations,
each of which requires the solution of one system of linear equations. Consequently,
they show that LCP's for which γ(M, q) > 0 and 1/γ(M, q) is bounded above by
a polynomial in L and n can be solved in polynomial time. Thus, the condition
number represents the degree of difficulty of the potential reduction algorithm.
Furthermore, the condition number suggests that convexity (or positive semidefiniteness

of the matrix M in LCP) may not be the basic issue that separates the polynomially
solvable classes from the class of NP-complete problems.

Many classes of nonconvex LCP's have been identified that can be solved in
polynomial time by this algorithm.

12.7 CONCLUDING REMARKS


Global optimization is of central importance both in the natural sciences, such as
physics, chemistry and biology, and in the artificial or man-made sciences, such as
computer science and operations research.

The advent of interior point methods has provided new alternatives for designing exact
algorithms, as well as heuristics, for many classes of global optimization problems.
In this chapter, we restricted ourselves to applications of interior point methods to
quadratic and combinatorial optimization problems, as well as nonconvex potential
functions.

Recently, a great amount of research activity on semidefinite programming [2, 28]


has produced some very interesting results. The significance of semidefinite program-
ming is that it provides tighter relaxations to many combinatorial and nonconvex
optimization problems and, in theory, semidefinite programming can be solved in
polynomial time. Preliminary implementations of semidefinite programming have
been recently described [4].

There is a vast amount of literature on interior point methods for linear and convex
programming, as well as applications in global and combinatorial optimization.
We direct the reader to the interior point World Wide Web page at the URL
http://www.mcs.anl.gov/home/otc/InteriorPoint.

REFERENCES
[1] I. Adler, M.G.C. Resende, G. Veiga, and N. Karmarkar. An implementation
of Karmarkar's algorithm for linear programming. Mathematical Programming,
44:297-335, 1989.
[2] F. Alizadeh. Optimization over positive semi-definite cone: Interior-point meth-
ods and combinatorial applications. In P.M. Pardalos, editor, Advances in Op-

timization and Parallel Computing, pages 1-25. North-Holland, Amsterdam,


1992.

[3] I.I. Dikin. Iterative solution of problems of linear and quadratic programming.
Soviet Math. Doklady, 8:674-675, 1967.

[4] K. Fujisawa and M. Kojima. SDPA (Semidefinite Programming Algorithm) -
User's manual. Technical Report B-308, Tokyo Institute of Technology, Department
of Mathematical and Computing Sciences, Tokyo, December 1995. Program
available at ftp://ftp.is.titech.ac.jp/pub/OpRes/software/SDPA.

[5) G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins
University Press, Baltimore, 1989.

[6] C.-G. Han, P.M. Pardalos, and Y. Ye. On the solution of indefinite quadratic
problems using an interior point algorithm. Informatica, 3:474-496, 1992.

[7) R. Horst and P.M. Pardalos, editors. Handbook of Global Optimization. Kluwer
Academic Publishers, Amsterdam, 1995.

[8) R. Horst, P.M. Pardalos, and N.V. Thoai. Introduction to Global Optimization.
Kluwer Academic Publishers, Amsterdam, 1995.

[9) A.P. Kamath and N. Karmarkar. A continuous method for computing bounds
in integer quadratic optimization problems. Journal of Global Optimization,
2:229-241,1992.

[10) A.P. Kamath and N. Karmarkar. An O(nL) iteration algorithm for computing
bounds in quadratic optimization problems. In P.M. Pardalos, editor, Com-
plexity in Numerical Optimization, pages 254-268. World Scientific, Singapore,
1993.

[11) A.P. Kamath, N. Karmarkar, K.G. Ramakrishnan, and M.G.C. Resende. A con-
tinuous approach to inductive inference. Mathematical Programming, 57:215-
238, 1992.

[12) A.P. Kamath, N. Karmarkar, K.G. Ramakrishnan, and M.G.C. Resende. An


interior point approach to Boolean vector function synthesis. In Proceedings of
the 36th MSCAS, pages 185-189, 1993.

[13) A.P. Kamath, N. Karmarkar, N. Ramakrishnan, and M.G.C. Resende. Compu-


tational experience with an interior point algorithm on the Satisfiability prob-
lem. Annals of Operations Research, 25:43-58, 1990.

[14) N. Karmarkar. An interior-point approach for NP-complete problems. Contem-


porary Mathematics, 114:297-308, 1990.

[15] N. Karmarkar, M.G.C. Resende, and K. Ramakrishnan. An interior point al-


gorithm to solve computationally difficult set covering problems. Mathematical
Programming, 52:597-618, 1991.

[16] N. Karmarkar, M.G.C. Resende, and K.G. Ramakrishnan. An interior point


approach to the maximum independent set problem in dense random graphs.
In Proceedings of the XIII Latin American Conference on Informatics, volume 1,
pages 241-260, Santiago, Chile, July 1989.

[17] N.K. Karmarkar. A new polynomial time algorithm for linear programming.
Combinatorica, 4:373-395, 1984.

[18] M. Kojima, N. Megiddo, T. Noma, and A. Yoshise. A unified approach to
interior point methods for linear complementarity problems. Lecture Notes in
Computer Science. Springer-Verlag, 1991.

[19] K. Levenberg. A method for the solution of certain problems in least squares.
Quart. Appl. Math., 2:164-168, 1944.

[20] O.L. Mangasarian. Characterization of linear complementarity problems as


linear programs. Mathematical Programming Study, 7:74-87, 1978.

[21] D. Marquardt. An algorithm for least-squares estimation of nonlinear parame-


ters. SIAM J. Appl. Math., 11:431-441,1963.

[22] R.D.C. Monteiro, I. Adler, and M.G.C. Resende. A polynomial-time primal-


dual affine scaling algorithm for linear and convex quadratic programming and
its power series extension. Mathematics of Operations Research, 15:191-214,
1990.
[23] R.D.C. Monteiro and Y. Wang. Trust region affine scaling algorithms for linearly
constrained convex and concave programs. Technical report, Georgia Institute of
Technology, Atlanta, GA, June 1995. To appear in Mathematical Programming.

[24] J.J. Moré and D.C. Sorensen. Computing a trust region step. SIAM J. Sci.
Stat. Comput., 4:553-572, 1983.

[25] P.M. Pardalos, Y. Ye, C.-G. Han, and J. Kaliski. Solution of P-matrix linear
complementarity problems using a potential reduction algorithm. SIAM J.
Matrix Anal. & Appl., 14:1048-1060, 1993.

[26] P.M. Pardalos. Continuous approaches to discrete optimization problems. In


G. Di Pillo and F. Giannessi, editors, Nonlinear optimization and applications.
Plenum Publishing, 1996.

[27] P.M. Pardalos and J.B. Rosen. Global optimization approach to the linear
complementarity problem. SIAM J. Scient. Stat. Computing, 9:341-353, 1988.
[28] M.V. Ramana. An algorithmic analysis of multiplicative and semidefinite pro-
gramming problems. PhD thesis, The Johns Hopkins University, Baltimore,
1993.

[29] M.G.C. Resende and P.M. Pardalos. Interior point algorithms for network flow
problems. In J.E. Beasley, editor, Advances in linear and integer programming.
Oxford University Press, 1996.
[30] C.-J. Shi, A. Vannelli, and J. Vlach. An improvement on Karmarkar's algorithm
for integer programming. In P.M. Pardalos and M.G.C. Resende, editors, COAL
Bulletin - Special issue on Computational Aspects of Combinatorial Optimiza-
tion, number 21, pages 23-28. 1992.
[31] T. Tsuchiya and M. Muramatsu. Global convergence of a long-step affine scaling
algorithm for degenerate linear programming problems. Technical Report 423,
The Institute of Statistical Mathematics, Tokyo, 1992. To appear in SIAM J.
Opt.
[32] Stephen A. Vavasis. Nonlinear Optimization, Complexity Issues. Oxford Uni-
versity Press, Oxford, 1991.
[33] J.P. Warners, T. Terlaky, C. Roos, and B. Jansen. Potential reduction algo-
rithms for structured combinatorial optimization problems. Technical Report
95-88, Delft University of Technology, Delft, 1995.
[34] J.P. Warners, T. Terlaky, C. Roos, and B. Jansen. A potential reduction ap-
proach to the frequency assignment problem. Technical Report 95-98, Delft
University of Technology, Delft, 1995.
[35] Y. Ye. A new complexity result on minimization of a quadratic function with a
sphere constraint. In C.A. Floudas and P.M. Pardalos, editors, Recent Advances
in Global Optimization, pages 19-21. Princeton University Press, Princeton,
1992.
[36] Y. Ye. On affine scaling algorithms for nonconvex quadratic programming.
Mathematical Programming, 56:285-300, 1992.
[37] Y. Ye and P.M. Pardalos. A class of linear complementarity problems solvable
in polynomial time. Linear Algebra and its Applications, 152:3-17, 1991.
13
INTERIOR POINT APPROACHES
FOR THE VLSI PLACEMENT
PROBLEM
Anthony Vannelli, Andrew Kennings,
Paulina Chin
Department of Electrical and Computer Engineering
University of Waterloo
Waterloo, Ontario
CANADA N2L 3Gl

ABSTRACT
VLSI placement involves arranging components on a two-dimensional board such that the
total interconnection wire length is minimized while avoiding component overlap and ensur-
ing enough area is provided for routing. Placement is accomplished in a two-step procedure.
The first step involves computing a good relative placement of all components while ignor-
ing overlap and routing. The second step involves removing overlap and routing. This
paper describes two new relative placement models that generate sparse LP and QP pro-
grams. The resulting LP and QP programs are efficiently solved using appropriate interior
point methods. In addition, an important extension is described to reduce module overlap.
Numerical results on a representative set of real test problems are presented.

Keywords: relative placement, quadratic programming, interior point methods.

13.1 INTRODUCTION
In the combinatorial sense, the layout problem is a constrained optimization prob-
lem. We are given a circuit (usually a module-wire connection-list called a netlist)
which is a description of switching elements and their connecting wires. We seek an
assignment of geometric coordinates of the circuit components (in the plane or in one
of a few planar layers) that satisfies the requirements of the fabrication technology
(sufficient spacing between wires, restricted number of wiring layers, and so on) and
that minimizes certain cost criteria. Practically, all aspects of the layout problem
as a whole are intractable; that is, they are NP-hard [4]. Consequently, we have to
resort to heuristic methods to solve very large problems. One of these methods is
to break up the problem into subproblems, which are then solved. Almost always,


these subproblems are NP-hard as well, but they are more amenable to heuristic so-
lutions than is the entire layout problem itself. Each one of the layout subproblems
is decomposed in an analogous fashion. In this way, we proceed to break up the
optimization problems until we reach primitive subproblems.

These subproblems are not decomposed further, but rather solved directly, either
optimally (if an efficient polynomial-time optimization algorithm exists) or approx-
imately if the subproblem is itself NP-hard or intractable, otherwise. The most
common way of breaking up the layout problem into subproblems is first to do logic
partitioning where a large circuit is divided into a collection of smaller modules
according to some criteria, then to perform component placement, and then to de-
termine the approximate course of the wires in a global routing phase. This phase
may be followed by a topological-compaction phase that reduces the area require-
ment of the layout, after which a detailed-routing phase determines the exact course
of the wires without changing the layout area. After detailed-routing, a geometric-
compaction phase may further reduce the layout area requirement [7].

In VLSI placement which is the focus of this work, we are given a set of components
(modules) that are interconnected by a set of signal paths (nets). The objective is
to position the modules while minimizing the total wirelength required to connect
the modules. In positioning the modules, several placement constraints must be
considered to guarantee feasibility (a legal placement). For instance, the modules
must be placed within some given physical area and must not overlap. Furthermore,
the modules must be placed such that the nets can physically be connected (routing).
Examples of the placement problem arise in macrocell, gate array and standard cell
design [11].

Since the VLSI placement problem is computationally intractable and optimal place-
ments are difficult to produce, advanced heuristics such as Tabu Search [14], simu-
lated annealing [11] and hierarchical partitioning [10] are used to obtain near optimal
placements. Although these heuristics yield near optimal placements, they still tend
to require large computational times. However, it is well known that when good
initial placements are generated, these heuristics tend to converge quickly to near
optimal placements with low computational effort [5].

Relative placement involves generating good initial placements. To generate initial


placements, some estimate of the wirelength is minimized while ignoring feasibility
issues such as overlap and routing. The idea behind relative placement is to quickly
determine the general location of modules within the placement area, and then to
subsequently obtain a legal placement with another heuristic method.

Relative placement heuristics include eigenvector techniques [6] and force-directed
methods [12]. However, these techniques tend to be restrictive in that they
do not permit the addition of constraints which may further improve the relative
placement. More recently, a linear programming (LP) model for relative placement
has been proposed [17], and subsequently studied in terms of its efficiency [2]. The
LP model provides a global view of the placement problem by considering all modules
simultaneously. Additionally, the model is easily extended by including additional
constraints.

In this work, an alternative placement procedure is proposed by using a module-net-point
(MNP) model. In this model, all modules and nets are considered to be points.
This results in a quadratic program (QP) model which can be solved efficiently using
an interior point method [15]. This model is much in the spirit of the LP model
previously mentioned [17]; it considers all modules simultaneously and allows for the
inclusion of additional constraints. However, the MNP model is expected to extend
more naturally to more complicated models, as it is based on a QP rather than an
LP model. Despite the similarities of the LP and QP models, the aim of this work
is not to directly compare these two models. This would require taking the relative
placements produced by each model and actually generating a legal placement; such
an analysis is beyond the scope of this work. The main focus of this work is to
explore the MNP model and its solution, and extensions which further improve the
resulting relative placement.

The linear program model of Weis and Mlynski is described in Section 13.2. A
new extension of a module-net-point model is described in Section 13.3. The model
is shown to be equivalent to a quadratic programming (QP) problem with a sparse
positive definite matrix in the objective function which can be solved efficiently using
an interior point method. Section 13.4 describes an important extension which can
be included to improve the relative placement by forcing overlapping modules further
apart. The quadratic interior point method used in the work to solve the relative
placement problem is described in Section 13.5. Numerical results on test problems
for both the LP and QP problems are presented in Section 13.6. Finally, Section
13.7 summarizes the results of this work and presents directions for future research.

13.2 A LINEAR PROGRAM FORMULATION OF THE PLACEMENT PROBLEM
To solve the placement problem, we must determine locations on a board for the
modules while minimizing wirelength. Weis and Mlynski [17] proposed an LP model
for determining relative placements. Their solution provides general locations for all

Figure 13.1 Placement of free and fixed modules on a board.

the modules, and then other methods are used to eliminate overlap and subsequently
form a legal placement. The force-directed method [12], which involves minimizing
an unconstrained quadratic function, is more commonly used, as it requires only the
solution of one linear system. However, the LP model often gives a more accurate
estimate of the total wirelength [13] and is easier to extend and generalize. The
formulation of the placement problem is a modification of Weis and Mlynski's model,
and it is presented below.

We must compute locations of M free modules that are connected by N nets. Some
of the N nets may involve connections with F fixed modules as well. The fixed
modules are usually I/O pads placed on the perimeter of the board (see Figure 13.1)
and their locations are known in advance.

Figure 13.2 The circumscribing rectangle for a net.

13.2.1 Basic Assumptions

1. Modules are modelled as points; i.e., height and width information is disregarded.
   This assumption is reasonable for computing initial relative placements
   and simplifies the original model in [17].

2. The wirelength required for a net is approximated by half the perimeter of the
   net's circumscribing rectangle ((v_j − u_j) + (v̄_j − ū_j) in Figure 13.2). For
   2-module or 3-module nets, this measure is equivalent to that given by a minimal
   spanning Steiner tree.

3. The following information is given:

   • X and Y, the board's width and height,
   • a list of all modules and their connections, and
   • {(c_i, d_i), i = 1, 2, ..., F}, the positions of modules that are fixed on the board
     (including I/O pads).

13.2.2 Objective Function Construction

We solve for the following unknowns:

• {(x_i, y_i), i = 1, 2, ..., M}, the coordinates of the free modules,

• {(u_j, ū_j), j = 1, 2, ..., N} and {(v_j, v̄_j), j = 1, 2, ..., N}, the lower-left and
  upper-right corners of the nets' circumscribing rectangles, as in Figure 13.2
  (i.e., if module i is connected to net j, then it is within net j's circumscribing
  rectangle; consequently, u_j ≤ x_i ≤ v_j and ū_j ≤ y_i ≤ v̄_j).

For convenience, we let the vectors x, y, u, ū, v, and v̄ contain all the components
x_i, y_i, u_j, ū_j, v_j, v̄_j, respectively. We find the values of these vectors so that the
sum of the circumscribing rectangles' perimeters over all nets is as small as possible.
That is, we wish to minimize the cost function

    Σ_{j=1}^{N} [ w_j (v_j − u_j) + w̄_j (v̄_j − ū_j) ],                (13.1)

where w = [w_1, w_2, ..., w_N]^T and w̄ = [w̄_1, w̄_2, ..., w̄_N]^T are weights on the nets.
These can be adjusted to obtain different layouts. Initially, w_j = w̄_j = 1 for all
nets connecting only free modules. The w_j and w̄_j values for nets connecting free
modules to fixed modules are then adjusted in order to distribute modules as evenly
as possible over the given board area, so as to avoid clustering.
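For intuition, the cost (13.1) can be evaluated for any candidate placement straight from the module coordinates; a small sketch, with net membership given as lists of module indices (an illustrative data layout, not the one used in the chapter's implementation):

    def weighted_half_perimeter(x, y, nets, w, w_bar):
        """Sum of weighted circumscribing-rectangle half-perimeters over all nets.

        x, y     : coordinates of all modules (free and fixed)
        nets     : nets[j] is the list of module indices connected to net j
        w, w_bar : horizontal and vertical net weights
        """
        total = 0.0
        for j, members in enumerate(nets):
            xs = [x[i] for i in members]
            ys = [y[i] for i in members]
            total += w[j] * (max(xs) - min(xs)) + w_bar[j] * (max(ys) - min(ys))
        return total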

13.2.3 Constraint Generation


Constraints for the x-direction are shown below. The y-direction constraints can be
easily derived in a similar manner.

1. Each net has a minimum width D > 0 which can be varied to give the desired
   distribution over the board area:

       v_j − u_j ≥ D,   j = 1, 2, ..., N.                              (13.2)

2. The modules must be placed within the board edges:

       0 ≤ x_i   and   x_i ≤ X,   i = 1, 2, ..., M.                    (13.3)



3. Each free module must be within the circumscribing rectangle of a net to which
   it is connected:

       u_j ≤ x_i   and   x_i ≤ v_j,   j = 1, 2, ..., N, free module i in net j.   (13.4)

4. An upper bound on u_j is the minimum x-coordinate over all fixed modules in
   net j. A lower bound on v_j is the maximum x-coordinate over all fixed modules
   in net j:

       u_j ≤ g_j   and   v_j ≥ h_j,   j = 1, 2, ..., N,                (13.5)

   where

       g_j = min{ c_i : fixed module i in net j }   and
       h_j = max{ c_i : fixed module i in net j },   j = 1, 2, ..., N.  (13.6)

Because the x-direction and y-direction variables and constraints are independent of
one another and the cost function can be separated, we can solve two independent
linear programs, one for each direction. For the sake of brevity, only the LP formed
for the x-direction is shown:

    minimize   w^T (v − u)

    subject to
               [  I   −I    0 ]             [ −D·e ]
               [  0    0   −I ]   [ u ]     [  0   ]
               [  0    0    I ]   [ v ]  ≤  [ X·e  ]                  (13.7)
               [  Q    0   −P ]   [ x ]     [  0   ]
               [  0   −Q    P ]             [  0   ]
               [  I    0    0 ]             [  g   ]
               [  0   −I    0 ]             [ −h   ]

where e is the vector of ones, and g and h are vectors containing the bounds on u
and v. I is the identity matrix, while P and Q are matrices containing a single entry
of 1 in each row.

When all fixed modules are fixed in both the x- and y-directions, the constraint
matrices for the x-direction and y-direction LPs are identical, although in general
the right-hand-side vectors differ. More precisely, the y-direction LP looks identical
to (13.7), with u, v, x and w replaced by ū, v̄, y and w̄, and with the right-hand-side
components D, X, g and h replaced by different values D̄, Y, ḡ and h̄.

It is possible to have modules that are fixed in only one direction. For example, we
may wish to specify that an I/O pad can be placed anywhere on the left edge of the

Figure 13.3 The constraint matrix sparsity pattern from a placement LP.

board. In this case, the I/O pad is considered to be a fixed module in the x-direction
but a free module in the y-direction. Consequently, the two constraint matrices will
differ in structure.

Note that, by changing the sign of the cost function, LP (13.7) can be written in
standard dual form, with inequality constraints only:

    maximize   −w^T (v − u)   subject to   A^T z ≤ q,                  (13.8)

where z = (u, v, x)^T collects the unknowns, A is the transpose of the constraint
matrix in LP (13.7), and q is its right-hand-side vector.

The constraint matrix A (the transpose of the constraint matrix in LP (13.7)) has
M + 2N rows. The number of columns varies, depending on the number of module-
net connections, but is typically two to four times the number of rows. Real-life
applications involve thousands of nets and modules, with larger problems being
formulated continually. Although the matrix can be very large, it is extremely
sparse, with only one or two nonzeros in each column. Figure 13.3 shows the sparsity
pattern of the matrix A from a typical placement example. (Note that the ordering
of rows and columns may not be the same as that shown in LP (13.7).)

13.3 A QUADRATIC PROGRAM FORMULATION OF THE MNP PLACEMENT MODEL

The LP formulation described in Section 13.2 leads to many modules being "clustered"
about the center of the placement region. It is important to attempt to force the
modules apart so that a legal placement, i.e., a placement with no overlap, is more
easily generated. The approaches described in this section and in Section 13.4 separate
the modules further apart in the desired placement region.

In an attempt to force more module separation, we develop a new module-net-point


(MNP) model, where all the modules and nets are considered as points. We are
concerned with determining the location of M free modules interconnected by N
nets while minimizing some measure of wirelength. Furthermore, some of the N
nets may involve connection to F fixed I/O pads located around the periphery of
the placement area.

Let the location of free module i be (Xi, Yi) and the location of net j be (Uj, Vj ). In
this case, we attempt to find one location of the net by defining (Uj, Vj) as compared
to the previous LP model which used (Uj, 'lij) and (Vj, iij ) to describe the respective
lower-left and upper-right corner locations of the circumscibing rectangle that con-
tains the net. Finally, let the location of fixed module i be (Ci, d i ). To denote the
module-net interconnections, let

if free module i is connected to net j


otherwise,

and
if fixed module i is connected to net j
otherwise.

13.3.1 Objective Function Construction


To approximate the total estimated wirelength, we use the sum of the squared
wirelengths as the objective function; i.e.,

    f = (1/2) Σ_{i=1}^{M} Σ_{j=1}^{N} n_ij [ (x_i − u_j)^2 + (y_i − v_j)^2 ]       (13.9)

      + (1/2) Σ_{i=1}^{F} Σ_{j=1}^{N} n̄_ij [ (c_i − u_j)^2 + (d_i − v_j)^2 ]

      = f_x + f_y,                                                     (13.10)

where

    f_x = (1/2) Σ_{i=1}^{M} Σ_{j=1}^{N} n_ij (x_i − u_j)^2 + (1/2) Σ_{i=1}^{F} Σ_{j=1}^{N} n̄_ij (c_i − u_j)^2    (13.11)

and

    f_y = (1/2) Σ_{i=1}^{M} Σ_{j=1}^{N} n_ij (y_i − v_j)^2 + (1/2) Σ_{i=1}^{F} Σ_{j=1}^{N} n̄_ij (d_i − v_j)^2.   (13.12)

The objective is to find M module points and N net points to minimize the objective
function f. Note that minimizing f can be performed by minimizing f_x and f_y
independently, which implies that the two-dimensional placement problem is equivalent
to solving two one-dimensional problems. The rest of the discussion involves f_x only,
but extends to f_y without any loss of generality.

Let x = [x_1, x_2, ..., x_M]^T and u = [u_1, u_2, ..., u_N]^T be vectors representing the
module and net points, respectively, and let z = [x^T, u^T]^T. The objective function
f_x can be conveniently rewritten in the following matrix form:

    f_x = (1/2) z^T B z + g^T z + h,                                   (13.13)

where B is a symmetric positive definite matrix given by

    B = [  D_a   -N  ]
        [ -N^T   D_b ].                                                               (13.14)

Matrix B is positive definite when F > 0, which is always the case in practical layout
problems; i.e., at least one module is always fixed.

The matrix N = [n_{ij}] is an M x N matrix describing the free module-net
interconnections; e.g., n_{ij} = 1 if free module i is in net j. The matrices
D_a = [d_{a,ij}] and D_b = [d_{b,ij}] are diagonal matrices, where

    d_{a,ij} = \sum_{k=1}^{N} n_{ik}  if i = j,  and  d_{a,ij} = 0  otherwise,        (13.15)

and

    d_{b,ij} = \sum_{k=1}^{M} n_{kj} + \sum_{k=1}^{F} \bar{n}_{kj}  if i = j,  and  d_{b,ij} = 0  otherwise,     (13.16)

respectively. The linear cost vector g is given by

    g = [ 0^T, β^T ]^T,                                                               (13.17)

where 0 is an M-vector of zeros and β is an N-vector with element j given by

    β_j = - \sum_{i=1}^{F} \bar{n}_{ij} c_i.                                          (13.18)

Finally, the scalar h is given by

    h = (1/2) \sum_{i=1}^{F} \sum_{j=1}^{N} \bar{n}_{ij} c_i^2.                       (13.19)

13.3.2 Constraint Generation


Several important constraints must be included in the resulting placement model. We
generate only the constraints for the x-direction; equivalent constraints are generated
for the QP problem formulated in the y-direction. First the module and net points
are related. The net point Uj should represent the centre of gravity of all modules
connected to net j, i.e.,

(13.20)

where
M F
hj = L nij +L fiij . (13.21)
i=1 i=1

With this restriction, and assuming each net is wired as its Steiner tree, the resulting
objective function f should closely approximate the wirelength.

Second, let the maximum dimension of the placement area be (X, Y). All free modules
must be constrained such that they are positioned within the placement area.
Therefore, we have

    0 ≤ x_i ≤ X.                                                                      (13.22)
Finally, in relative placement it is desirable to obtain an even spread of free modules
over the placement area (i.e., to avoid clustering). This is obtained by including the
first moment constraint

    \sum_{i=1}^{M} x_i = M X / 2                                                      (13.23)

to force an even spread of free modules around the centre of the placement area.

As described, the placement model involves the minimization of a positive definite
quadratic objective function subject to a set of linear equality and inequality
constraints. Using the previous definition of z and considering the minimization of
f_x only, the minimization problem is stated concisely as

    minimize    (1/2) z^T B z + g^T z + h
    subject to  A z = b,                                                              (13.24)
                0 ≤ z ≤ X·e.
Such a problem can be efficiently solved using a quadratic interior point method.
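To make the construction above concrete, the following sketch assembles the x-direction data of QP (13.24) from a toy netlist, following (13.14)-(13.23). It is not the authors' code (which was written in C); the function name build_mnp_qp and its arguments are illustrative assumptions.

import numpy as np

def build_mnp_qp(n_free, n_bar, c, X):
    # n_free: M x N 0/1 incidence of free modules and nets (the matrix N)
    # n_bar:  F x N 0/1 incidence of fixed modules and nets
    # c:      x-coordinates of the F fixed modules; X: board width
    M, N = n_free.shape
    Da = np.diag(n_free.sum(axis=1))                       # connections per free module
    Db = np.diag(n_free.sum(axis=0) + n_bar.sum(axis=0))   # h_j per net, cf. (13.21)
    B = np.block([[Da, -n_free], [-n_free.T, Db]])         # (13.14); positive definite when F > 0
    g = np.concatenate([np.zeros(M), -n_bar.T @ c])        # (13.17)-(13.18)
    h = 0.5 * float(np.sum(n_bar * c[:, None] ** 2))       # (13.19), constant term
    # centre-of-gravity rows (13.20): h_j u_j - sum_i n_ij x_i = sum_i nbar_ij c_i
    A_cog = np.hstack([-n_free.T, Db])
    b_cog = n_bar.T @ c
    # first moment constraint (13.23): sum_i x_i = M X / 2
    A_mom = np.concatenate([np.ones(M), np.zeros(N)])[None, :]
    b_mom = np.array([M * X / 2.0])
    A = np.vstack([A_cog, A_mom])
    b = np.concatenate([b_cog, b_mom])
    return B, g, h, A, b      # data of QP (13.24), with bounds 0 <= z <= X e

Any convex QP solver, such as the primal-dual method summarized in section 5, can then be applied to these data.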

13.4 TOWARDS OVERLAP REMOVAL


The MNP model described in the previous section relies heavily on the presence of
I/O pads and the first moment constraint to force module separation. Significant
overlap will occur when few I/O pads are present, and the modules tend to cluster
around the middle of the placement area. We seek methods to improve free module
separation.

To force additional free module separation, we include a second moment constraint
given by

    (1/N) \sum_{i=1}^{M} x_i^2 = σ^2 + m^2,                                           (13.25)

where σ^2 is a desired variance and m is the average position of the free modules (i.e.,
the centre of the board). With the second moment constraint the relative placement
problem becomes

    minimize    (1/2) z^T B z + g^T z + h
    subject to  A z = b,
                (1/2) z^T D z = W,                                                    (13.26)
                0 ≤ z ≤ X·e,

where W = (1/2) N (σ^2 + m^2) and D is a diagonal matrix with either 0 or 1 on the
diagonal to pick off the components of z corresponding to the free modules. The
resulting problem now contains a quadratic equality constraint and only a locally
optimal solution can be guaranteed.

To attack this problem we consider recursive quadratic programming, where at each
iteration we have a tentative solution z and we wish to determine an appropriate
search direction d. We update our solution as

    z ← z + αd,                                                                       (13.27)

where 0 ≤ α ≤ 1 is an appropriate step size. By substituting z + d for z in (13.26), the
following convex quadratic program approximation of (13.26) is solved to determine d:

    minimize    (g + Bz)^T d + (1/2) d^T B d
    subject to  (b - Az) - Ad = 0,
                (W - (1/2) z^T D z) - z^T D d = 0,                                    (13.28)
                0 ≤ z + d ≤ X·e.

Note that the term (1/2) d^T D d is not considered in the linearized second moment constraint.

If we determine the initial solution z by solving the original MNP model given by
(13.24) (i.e., without the second moment constraint), then the following lemma holds.

Lemma 13.4.1 Let d ≠ 0 be a solution to the linearized problem (13.28) and let
z be a point satisfying the equality constraints Az = b and the variable bounds
0 ≤ z ≤ X·e. The updated solution z + αd will also satisfy the equality constraints
A(z + αd) = b and the variable bounds for 0 ≤ α ≤ 1.

Proof: From the linearized problem, assuming Az = b, we have

    Ad = (b - Az) = 0.

We find

    A(z + αd) = Az + αAd = Az = b.

The proof for the variable bounds is trivial for 0 ≤ α ≤ 1.

Hence, the sequence of solutions generated by recursive quadratic programming will


satisfy all the constraints and variable bounds, except for the second moment con-
straint. It is therefore necessary to show that d is such that the second moment
constraint will move towards becoming satisfied. We have the following proposition:

Proposition 1 Let d ≠ 0 be the solution of the linearized quadratic program
(13.28) and let z be a point satisfying the conditions of Lemma 13.4.1. Then d is a
descent direction for the penalty function

    P(z) = (1/2) z^T B z + g^T z + h + (φ/2) (W - (1/2) z^T D z)^2,                   (13.29)

where φ > 0 is a large penalty parameter.



Proof: We have

    P(z + αd) = P(z) + α ∇P(z) d + O(α^2)
              = P(z) + α [ g^T + z^T B - φ(W - (1/2) z^T D z)(z^T D) ] d + O(α^2).    (13.30)

From the constraints of (13.28) we have

    z^T D d = W - (1/2) z^T D z.                                                      (13.31)

Thus

    ∇P(z) d = g^T d + z^T B d - φ (W - (1/2) z^T D z)^2.                              (13.32)

Clearly, for an appropriately large value of φ, it follows that ∇P(z) d ≤ 0 and
P(z + αd) ≤ P(z) for α sufficiently small.

We perform a line search using d to minimize the penalty function. As φ → ∞, the
recursive quadratic program will approach a locally optimal solution. Assuming that
the initial solution provided by the original MNP model is good, then this locally
optimal solution should also be good, with an increase in the module spreading. Of
course, since the penalty function is not an exact penalty function, for a finite value
of φ the second moment constraint will not be exactly satisfied [9]. However, the
module spread will be increased.
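A minimal sketch of this recursive scheme is given below. It assumes a routine qp_solver that returns the direction d solving the linearized subproblem (13.28) at the current point; that routine, the backtracking values, and the schedule for φ are illustrative assumptions rather than details taken from the text.

import numpy as np

def penalty(z, B, g, h, D, W, phi):
    # P(z) = f_x(z) + (phi/2)(W - z'Dz/2)^2, cf. (13.29)
    return 0.5 * z @ B @ z + g @ z + h + 0.5 * phi * (W - 0.5 * z @ D @ z) ** 2

def recursive_qp(z, B, g, h, D, W, qp_solver, phi=1.0e3, max_outer=10):
    # qp_solver(z) must return a direction d solving subproblem (13.28) at z
    for _ in range(max_outer):
        d = qp_solver(z)
        if np.linalg.norm(d) < 1.0e-8:
            break
        # crude line search on the penalty function along d (Proposition 1 gives descent)
        alphas = (1.0, 0.5, 0.25, 0.1, 0.05)
        alpha = min(alphas, key=lambda a: penalty(z + a * d, B, g, h, D, W, phi))
        z = z + alpha * d          # update (13.27); feasibility is preserved by Lemma 13.4.1
        phi *= 10.0                # tighten the penalty between outer iterations
    return z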

13.5 PRIMAL-DUAL QUADRATIC INTERIOR POINT METHODS
In this section and for the sake of completeness, we summarize the quadratic interior
point approach [15] that is used to solve the QP problems formulated in the last two
sections. We are interested in solving the following primal quadratic program:
    minimize    c^T x + (1/2) x^T Q x
    subject to  A x = b,
                x + s = u,                                                            (13.33)
                x, s ≥ 0,

and its dual quadratic program:

    maximize    b^T y - u^T w - (1/2) x^T Q x
    subject to  A^T y - w + r - Q x = c,                                              (13.34)
                r, w ≥ 0,

where Q is an n x n positive semidefinite matrix, A is an m x n matrix with full row
rank, c and u are n-vectors, b is an m-vector, and r is an n-vector of slack variables
which allows the dual problem to be expressed in equality form.

13.5.1 Theory
The primal-dual algorithm is derived by applying a logarithmic barrier function
to the primal problem in order to eliminate the non-negativity constraints. The
resulting barrier problem is given by

    minimize    c^T x + (1/2) x^T Q x - μ \sum_{j=1}^{n} ln x_j - μ \sum_{j=1}^{n} ln s_j
    subject to  A x = b,                                                              (13.35)
                x + s = u.

A similar approach will yield a barrier problem if applied to the dual problem.

Assuming a point that satisfies {(x, s, r, w, y) : x, s, r, w > 0}, for a fixed value of
the penalty parameter μ > 0, the first order conditions for simultaneous optimality
for the primal and dual barrier problems are:

    A x = b                                                                           (13.36)
    x + s = u                                                                         (13.37)
    A^T y - w + r - Q x = c                                                           (13.38)
    XRe = μe                                                                          (13.39)
    SWe = μe                                                                          (13.40)

where e denotes the n-vector of ones, and X, S, W and R are diagonal matrices
containing the components of x, s, w and r, respectively. Equations (13.36) and
(13.37) guarantee primal feasibility and equation (13.38) guarantees dual feasibility.
Equations (13.39) and (13.40) represent the μ-complementarity conditions.

The idea behind the primal-dual interior point algorithm can be stated as follows.
Let (x_μ, s_μ, r_μ, w_μ, y_μ) denote the solution of the optimality conditions for any value
μ > 0, and let (x*, s*, r*, w*, y*) denote the solution as μ tends to zero. Given
an initial point (x, s, r, w, y), the primal-dual algorithm uses one step of Newton's
method to try to find a point closer to (x_μ, s_μ, r_μ, w_μ, y_μ). This becomes the new
solution and the penalty term μ is reduced appropriately. This process is continued
until μ is sufficiently close to zero and the solution (x*, s*, r*, w*, y*) is obtained.
It follows from the first order optimality conditions that this solution is both primal
and dual feasible, and the duality gap is zero. Thus, (x*, s*) is optimal for the primal
problem, and (r*, w*, y*) is optimal for the dual problem.

Applying Newton's method to the first-order optimality conditions yields the following
set of linear equations for the search direction (Δx, Δs, Δy, Δw, Δr):

    A Δx = ρ
    Δx + Δs = τ
    -Q Δx + A^T Δy - Δw + Δr = σ                                                      (13.41)
    R Δx + X Δr = φ_r
    W Δs + S Δw = φ_w

where

    ρ   = b - A x
    τ   = u - x - s
    σ   = c - A^T y + w - r + Q x                                                     (13.42)
    φ_r = μe - XRe
    φ_w = μe - SWe.
The desired solution is then updated as

    (x, s, w, r, y) ← (x, s, w, r, y) + α̂ (Δx, Δs, Δw, Δr, Δy),                       (13.43)

where α̂ is a step length chosen to ensure positivity of the non-negative variables.

13.5.2 Search Directions


The linear system of equations derived from Newton's method is usually not solved
directly, but is first simplified. If we first solve for Δs, Δw and Δr we arrive at
the following:

    Δs = τ - Δx,                                                                      (13.44)
    Δw = S^{-1} (φ_w - W Δs),                                                         (13.45)
    Δr = X^{-1} (φ_r - R Δx),                                                         (13.46)

and

    [ -(Q + RX^{-1} + WS^{-1})   A^T ] [ Δx ]   [ σ - X^{-1}φ_r + S^{-1}φ_w - S^{-1}W τ ]
    [            A                0  ] [ Δy ] = [ ρ                                     ].    (13.47)

This system is referred to as the augmented system and is symmetric indefinite,


whereas the original system is not.

The solution of the augmented system at each iteration of the interior point method
represents the main computational burden of the algorithm. The solution of the

augmented system typically consumes 80-90 percent of the total computational
time. Several approaches have been suggested for solving the augmented system,
including both direct methods [1,16] and iterative methods [2]. Rather than focusing
on the iterative solution of the augmented system (which is an active area of research
in itself), in this work we use the direct method as described in [16].
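For illustration, the following sketch computes one search direction through the augmented system (13.44)-(13.47). Dense NumPy linear algebra is used purely for readability, whereas an actual implementation would factor the sparse symmetric indefinite matrix as in [1, 16]; the function itself and its name are hypothetical, with symbols following (13.36)-(13.47).

import numpy as np

def search_direction(A, Q, x, s, w, r, y, b, c, u, mu):
    m, n = A.shape
    rho   = b - A @ x                          # primal residual
    tau   = u - x - s                          # bound residual
    sigma = c - A.T @ y + w - r + Q @ x        # dual residual
    phi_r = mu - x * r                         # mu e - XRe
    phi_w = mu - s * w                         # mu e - SWe
    # augmented system (13.47)
    H = Q + np.diag(r / x) + np.diag(w / s)
    K = np.block([[-H, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([sigma - phi_r / x + phi_w / s - (w / s) * tau, rho])
    sol = np.linalg.solve(K, rhs)
    dx, dy = sol[:n], sol[n:]
    ds = tau - dx                              # (13.44)
    dw = (phi_w - w * ds) / s                  # (13.45)
    dr = (phi_r - r * dx) / x                  # (13.46)
    return dx, ds, dy, dw, dr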

13.5.3 Additional Issues


Several additional issues must be addressed in order to implement the interior point
algorithm. An initial starting point {(x, w, r, s, y) : x, w, r, s > 0} is required to
start the algorithm. Although it is possible to devise elaborate schemes for selecting
a starting point, an arbitrary initial solution that is often sufficient is given by

(13.48)

A stopping criterion is also required. Optimality requires both primal and dual
feasibility and that the duality gap is below a preselected threshold. Primal and
dual feasibility are measured by

    [ ||ρ||^2 + ||τ||^2 ]^{1/2} / ( ||b|| + 1 )                                       (13.49)

and

    ||σ|| / ( ||c|| + 1 ),                                                            (13.50)

respectively. To measure the duality gap, we use

    | c^T x - b^T y + u^T w + x^T Q x | / ( | c^T x + (1/2) x^T Q x | + 1 ).          (13.51)
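As a sketch, the three measures can be evaluated as follows, reusing the residual names introduced with (13.42); the helper below is hypothetical and not the authors' code.

import numpy as np

def stopping_measures(A, Q, x, s, w, r, y, b, c, u):
    rho   = b - A @ x
    tau   = u - x - s
    sigma = c - A.T @ y + w - r + Q @ x
    primal = np.sqrt(rho @ rho + tau @ tau) / (np.linalg.norm(b) + 1.0)   # (13.49)
    dual   = np.linalg.norm(sigma) / (np.linalg.norm(c) + 1.0)            # (13.50)
    gap    = abs(c @ x - b @ y + u @ w + x @ (Q @ x)) / (abs(c @ x + 0.5 * x @ (Q @ x)) + 1.0)  # (13.51)
    return primal, dual, gap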

Two issues arise during the progression of the interior point algorithm. The first
issue is the computation of the parameter μ. From the optimality conditions we see
that

    x^T r = nμ,    s^T w = nμ.                                                        (13.52)

One way to recover μ is to compute

    μ = ( x^T r + s^T w ) / ( 2n ),                                                   (13.53)

and reduce this value by one tenth to move closer to optimality at each stage of the
algorithm.

The second issue which must be addressed is the selection of the step size α̂ used
to update the solution at each iteration. The step size α̂ must be selected to ensure
positivity of the nonnegative variables. In this work we use

    α̂ = [ max{ max_j { -Δx_j/x_j , -Δw_j/w_j , -Δs_j/s_j , -Δr_j/r_j } / 0.95 , 1 } ]^{-1}.    (13.54)
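The step-size rule (13.54) and the recovery of μ can be sketched as below; the reduction factor of one tenth is our reading of the text and is marked as an assumption in the comments.

import numpy as np

def step_size(x, s, w, r, dx, ds, dw, dr):
    # largest step keeping x, s, w, r strictly positive, damped by 0.95 and capped at 1, cf. (13.54)
    ratios = np.concatenate([-dx / x, -dw / w, -ds / s, -dr / r])
    return 1.0 / max(ratios.max() / 0.95, 1.0)

def next_mu(x, r, s, w, factor=0.1):
    # recover mu from the complementarity products (13.52)-(13.53) and reduce it;
    # the factor 0.1 is an assumed reduction used only for this sketch
    n = x.size
    return factor * (x @ r + s @ w) / (2.0 * n)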

13.6 NUMERICAL RESULTS


To test the placement models, we implemented the QP approaches described in
sections 3 and 4 in C and tested the algorithm on realistic test layout problems from
the VLSI literature. The heuristics and the resulting LP and QP programs were run
on a Sun SPARC 2 workstation with 64 MB of memory.

13.6.1 LP Placement Results


In this subsection, we exploit the sparsity of the constraint matrix A in problem (13.8)
to show the advantage of solving these problems by an interior point algorithm as
compared to using a Simplex method. The placement LPs were created from netlists
in the MCNC Benchmark Test Suite [8] (fnn4, fnn8, primary1 and primary2) and
four additional netlists (place1, place2, place3 and place4). Each of these problems
listed I/O pads for each edge of the board. Locations for these pads were fixed by
distributing them evenly across the specified edges. The net weights for connecting
free modules to fixed modules were set to 10 and all other net weights were set to 1.
The minimum net width and height were set to 0.1X and 0.1Y, respectively.

Table 13.1 shows the size information for the 8 problems. This chart provides the
number of free modules, fixed modules and nets for each test case. In addition, the
number of rows m, the number of columns n, and the number of nonzero elements in
the constraint matrix A are shown. In the tests documented below, only the results
for the x-direction linear programs are reported. Because the constraint matrix is
the same for both the x-direction and y-direction problems, the y-direction results
would yield similar conclusions.

Comparisons were made with both the Barrier and Simplex options of the CPLEX LP
package [3]. Feasibility and duality gap tolerances of 10^{-4} were used as the stopping
criteria in the CPLEX tests and in the subsequent QP testing. Table 13.2 shows the
total iterations taken; i.e., the interior-point iterations for CPLEX Barrier and the
total iterations for the CPLEX Simplex option.

Table 13.1  Characteristics of Placement LPs.

            free       fixed
            modules    modules    nets     rows     columns   nonzeros
fnn4        140        20         115      370      1481      2652
place1      264        36         294      852      2444      4216
fnn8        440        36         291      1022     2444      8556
primary1    752        81         902      2556     4745      14462
place2      2194       77         2192     6578     21888     35553
primary2    2907       107        3029     8965     31078     60470
place3      6208       209        5711     17630    69330     105452
place4      11741      400        12949    37639    129716    233983

Table 13.2  Interior-point or Simplex Iterations for Placement LPs.

            CPLEX Execution Time (sec.)     CPLEX Iterations
            Barrier       Simplex           Barrier       Simplex
fnn4        3             2                 12            504
place1      4             4                 9             580
fnn8        14            7                 20            1060
primary1    25            32                16            1873
place2      51            207               12            5245
primary2    299           242               18            5608
place3      175           1353              14            12435
place4      2545          7680              17            34229

Table 13.2 also shows the execution times in CPU seconds for each of the solvers
tested. The Simplex solver was very efficient on the smaller problems, but showed
worse performance on the larger problems, which are the problems of interest.
CPLEX Barrier performed well on most of the test cases, but had the worst running
time for "primary2". Chin and Vannelli [2] develop iterative solvers to reduce the
associated fill-in and execution times.

13.6.2 Quadratic Model Placement Results


To show that better module separation is achieved by the approaches outlined in
sections 3 and 4, we considered the following new test problems, shown in Table 13.3.
This table provides the problem name and the number of fixed modules, free modules
and nets in each test problem. In generating these problems, the fixed I/O pads were
distributed evenly around the edges of the placement area. Since the efficiency of the
algorithm depends on the efficiency of solving the augmented system, the statistics
for the augmented system are also shown in Table 13.3; this includes the dimension
(number of rows and columns) and the number of nonzeros of the augmented system.

Table 13.3  Characteristics of Placement QPs.

Problem     Problem Size                                Augmented System
            Free Modules   Fixed Modules   Nets         Dimension   Nonzeros
chip1       270            30              294          859         4732
chip2       248            28              239          725         3859
chip3       179            20              219          618         3178
chip4       219            25              221          662         3376

For the quadratic MNP model, the relative placements are presented in Table 13.4
(x-direction only). Table 13.4 shows the total number of interior point iterations,
total solution times required to obtain the placements and the resulting estimate
of the wirelength. Also provided in Table 13.4 is the variance of the free modules,
which provides a measure of module spreading.

Figure 13.4 shows the relative placement obtained for chip1. As expected, there is
some spreading of modules due to the presence of the I/O pads, but in general the
modules tend to cluster towards the centre of the placement area. We now include
the second moment constraint to improve module spreading.

[Figure 13.4: scatter plot of the module positions; both axes range from 0 to 300.]

Figure 13.4  Relative placement for chip1; QP-MNP model. An "x" denotes the
position of a module.

Table 13.4  Results of the QP MNP-Placement Model.

Problem     Interior Point   Solution       Estimate of    Module
            Iterations       Time (sec.)    Wirelength     Variance
chip1       9                119            2.69e06        1656.92
chip2       11               56             2.60e06        2706.81
chip3       10               35             6.87e05        1586.42
chip4       10               25             2.66e06        3163.89

13.6.3 Variance Improvement


To improve the spread of modules within the placement area, the module variance
was increased by 50 percent from that shown in Table 13.4. Using the solution
provided by the original MNP model, the placement results obtained by including
the second moment constraint are shown in Table 13.5. Table 13.5 shows the total
number of interior point iterations for all linearized QPs (including the first QP
required to generate the initial point), the total number of QPs solved and the
total solution times. Also shown are the resulting estimate of the wirelength and
the new variance for the free modules.

Table 13.5  Results of the MNP placement model (variance constraint included).

Problem     Interior Point   QPs      Solution       Estimate of    Module
            Iterations       Solved   Time (sec.)    Wirelength     Variance
chip1       18               2        236            3.95e06        2486.53
chip2       22               2        113            3.68e06        4062.09
chip3       20               2        72             8.12e05        2380.68
chip4       21               2        54             3.02e06        4747.99

The new relative placement for chip1 is shown in Figure 13.5. It can be observed
that the modules are more evenly spread throughout the placement area.

Few linearizations are required and the method quickly converges to a new solution
with improved variance. Also, the general locations of the modules are relatively
unaffected and the new estimate of the wirelength is close to that obtained from the
original MNP model.

[Figure 13.5: scatter plot of the module positions; both axes range from 0 to 300.]

Figure 13.5  Relative placement for chip1 with variance constraint. An "x" denotes
the position of a module.

13.7 CONCLUSIONS
In this work, two relative placement models were developed. The models result in
sparse LP or QP programs which are efficiently solved by interior point methods.
The sparsity and structure of these problems make them more suitable candidates
for these solvers than for Simplex-based variants.

In relative placement, modules tend to cluster around the centre of the placement
area and overlap with one another. A very efficient approach has been described
to prevent clustering and overlap. The resulting approach "forces" modules further
apart by increasing the variance. A small number of additional QPs are solved. This
approach has been shown to be effective.

Future work will include investigating ways to improve the module separation. Even
with the second moment constraint, modules will still cluster and overlap. A more
aggressive approach is to include constraints which directly prevent overlap. In
addition to making the problem nonlinear, these constraints couple the problems in
the x and y directions.

To move towards a placement without overlap, consider Figure 13.6, which shows
two modules i and j with dimensions (w_i, h_i) and (w_j, h_j), respectively. Around each
module i we draw a circle with radius r_i such that the circle completely surrounds
module i. Let γ = {(i, j) : i = 1, ..., M, j = i+1, ..., M} denote the pairs of free
modules. Overlap between a pair of modules i and j can be prevented by including
the constraints

    \sqrt{ (x_i - x_j)^2 + (y_i - y_j)^2 } ≥ r_i + r_j,   ∀(i, j) ∈ γ.                (13.55)

We rewrite this constraint in a slightly more convenient form. We square both sides
of this expression to remove the square root and multiply through by 1/2 to get

    (1/2)(x_i - x_j)^2 + (1/2)(y_i - y_j)^2 ≥ (1/2)(r_i + r_j)^2,   ∀(i, j) ∈ γ.      (13.56)

Notice that the x and y coordinates of the modules now interact and the relative
placement can no longer be divided into two one-dimensional problems. Moreover,
the constraints of the type given in (13.56) are nonconvex.

If we let z = [x^T, u^T, y^T, v^T]^T, then the overlap constraint for modules i and j,
denoted by g_{ij}(z), can be written as

    g_{ij}(z) = γ_{ij} - (1/2) z^T D_{ij} z ≤ 0,                                      (13.57)

where γ_{ij} = (1/2)(r_i + r_j)^2 and D_{ij} is a sparse semidefinite matrix which picks off
the appropriate components of the vector z.
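The sketch below builds γ_ij and the sparse matrix D_ij of (13.57) for one pair of free modules, assuming the ordering z = [x, u, y, v]; the helper names are illustrative only.

import numpy as np
from scipy import sparse

def overlap_constraint(M, N, i, j, r_i, r_j):
    # indices of (x_i, x_j) and (y_i, y_j) inside z = [x, u, y, v]
    dim = 2 * (M + N)
    pairs = ((i, j), (M + N + i, M + N + j))
    rows, cols, vals = [], [], []
    for a, b in pairs:
        # each pair contributes (z_a - z_b)^2 = z_a^2 - 2 z_a z_b + z_b^2 to z' D_ij z
        rows += [a, b, a, b]
        cols += [a, b, b, a]
        vals += [1.0, 1.0, -1.0, -1.0]
    D_ij = sparse.coo_matrix((vals, (rows, cols)), shape=(dim, dim)).tocsr()
    gamma_ij = 0.5 * (r_i + r_j) ** 2
    return gamma_ij, D_ij

def g_ij(z, gamma_ij, D_ij):
    # constraint (13.57): modules i and j do not overlap when g_ij(z) <= 0
    return gamma_ij - 0.5 * float(z @ (D_ij @ z))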


Figure 13.6 Preventing overlap by surrounding modules with circles.



One such inequality constraint is required for each pair of modules, resulting in
O(M^2) additional inequality constraints. Although sparse, such a large number of
constraints may be prohibitive. However, it may not be necessary to include the
overlap constraint for every pair of modules. Practically, we consider first the pairs
of modules that strongly overlap. This observation may substantially reduce the
number of added constraints. One interesting consequence of using circles is that
they extend over more area than is actually required by the encompassed modules;
this extra space between modules may prove useful for routing.

Our placement problem now has the form

    minimize    f(z) = (1/2) z^T B z + g^T z + h
    subject to  A z = b,
                g_{ij}(z) ≤ 0,   ∀(i, j) ∈ γ,                                         (13.58)
                0 ≤ z ≤ Z·e.

New work will focus on reducing the number of required constraints and on
investigating techniques for improving the efficiency of the interior point method
used to solve the linearized QPs.

Finally, the integration of the relative placement results with an approach such as
Tabu Search can lead to "legal placements" with no overlap. Such an approach was
developed by one of the authors [14]. We propose to include the relative placement
approaches described in this work as a preprocessing stage before the legal placement
is found using such an approach. This work is currently in progress.

Acknowledgements
This research was partially funded by a Natural Sciences and Engineering Research
Council of Canada (NSERC) Operating Grant, No. OGP 0044456.

REFERENCES
[1] J. R. Bunch and B. N. Parlett. Direct methods for solving symmetric indefinite
systems of linear equations. SIAM J. Numer. Anal., 8:639-655, 1971.

[2] P. Chin and A. Vannelli. Computational methods for an LP model of the placement
problem. Technical report UW E & C-94-02, University of Waterloo, Waterloo,
Ontario, 1994.

[3] CPLEX Optimization Inc. Using the CPLEX Callable Library and CPLEX Mixed
Integer Library. Incline Village, NV, 1993.

[4] G. Hachtel and C. Morrison. Linear complexity algorithms for hierarchical
routing. IEEE Transactions on Computer-Aided Design, 8(1):64-80, 1989.

[5] S. W. Hadley, B. L. Mark, and A. Vannelli. An efficient eigenvector approach
for finding netlist partitions. IEEE Transactions on Computer-Aided Design,
11(7):885-892, July 1992.

[6] K. M. Hall. An r-dimensional quadratic placement algorithm. Management
Science, 17(3):219-229, November 1991.

[7] T. C. Hu and E. Kuh. Theory and concepts of circuit layout. In VLSI Circuit
Layout: Theory and Design, pp. 3-18, IEEE Press, New York, 1985.

[8] K. Kozminski. Benchmarks for layout synthesis - evolution and current status.
In Proceedings 28th ACM/IEEE Design Automation Conference, pp. 265-270,
1991.

[9] D. G. Luenberger. Introduction to Linear and Nonlinear Programming.
Addison-Wesley Pub. Co., Reading, Mass., 1973.

[10] D. G. Schweikert and B. W. Kernighan. A proper model for the partitioning of
electrical circuits. In Proceedings of the 9th Design and Automation Workshop,
pp. 57-62, June 1979.

[11] C. Sechen and A. Sangiovanni-Vincentelli. The TimberWolf placement and
routing package. IEEE J. Solid-State Circuits, 20:510-522, 1985.

[12] N. Sherwani. Algorithms for VLSI Physical Design Automation. Kluwer
Academic Publishers, Norwell, Massachusetts, 1993.

[13] G. Sigl, K. Doll and F. Johannes. Analytical placement: a linear or quadratic
objective function? In Proc. 28th ACM/IEEE Design Automation Conference,
pp. 427-432, 1991.

[14] L. Song and A. Vannelli. A VLSI placement method using TABU search.
Microelectronics Journal, 23:167-172, 1992.

[15] R. J. Vanderbei. LOQO: An interior point code for quadratic programming.
Technical report, Princeton University, Princeton, N.J., 1994.

[16] R. J. Vanderbei and T. J. Carpenter. Symmetric indefinite systems for interior
point methods. Mathematical Programming, 58:1-32, 1993.

[17] B. X. Weis and D. A. Mlynski. A new relative placement procedure based on
MSST and linear programming. In Proc. IEEE Int. Symp. Circ. & Sys., 2:564-567,
1987.
Applied Optimization

1. D.-Z. Du and D. F. Hsu (eds.): Combinatorial Network Theory. 1996
2. M. J. Panik: Linear Programming: Mathematics, Theory and Algorithms. 1996
3. R. B. Kearfott and V. Kreinovich (eds.): Applications of Interval Computations. 1996
4. N. Hritonenko and Y. Yatsenko: Modeling and Optimization of the Lifetime of Technology. 1996
5. T. Terlaky (ed.): Interior Point Methods of Mathematical Programming. 1996

KLUWER ACADEMIC PUBLISHERS - DORDRECHT / BOSTON / LONDON
