Chapter 1. Introduction

Welcome to Clever Algorithms! This is a handbook of recipes for computational problem solving techniques from the fields of Computational Intelligence, Biologically Inspired Computation, and Metaheuristics. Clever Algorithms are interesting, practical, and fun to learn about and implement. Research scientists may be interested in browsing algorithm inspirations in search of an interesting system or process analogs to investigate. Developers and software engineers may compare various problem solving algorithms and technique-specific guidelines. Practitioners, students, and interested amateurs may implement state-of-the-art algorithms to address business or scientific needs, or simply play with the fascinating systems they represent.

This introductory chapter provides relevant background information on Artificial Intelligence and Algorithms. The core of the book provides a large corpus of algorithms presented in a complete and consistent manner. The final chapter covers some advanced topics to consider once a number of algorithms have been mastered. This book has been designed as a reference text, where specific techniques are looked up, or where the algorithms across whole fields of study can be browsed, rather than being read cover-to-cover. This book is an algorithm handbook and a technique guidebook, and I hope you find something useful.

1.1 What is AI

1.1.1 Artificial Intelligence

The field of classical Artificial Intelligence (AI) coalesced in the 1950s, drawing on an understanding of the brain from neuroscience, the new mathematics of information theory, control theory referred to as cybernetics, and the dawn of the digital computer. AI is a cross-disciplinary field of research that is generally concerned with developing and investigating systems that operate or act intelligently. It is considered a discipline in the field of computer science given the strong focus on computation.

Russell and Norvig provide a perspective that defines Artificial Intelligence in four categories: 1) systems that think like humans, 2) systems that act like humans, 3) systems that think rationally, 4) systems that act rationally [43]. In their definition, acting like a human suggests that a system can do some specific things humans can do; this includes fields such as the Turing test, natural language processing, automated reasoning, knowledge representation, machine learning, computer vision, and robotics. Thinking like a human suggests systems that model the cognitive information processing properties of humans, for example a general problem solver and systems that build internal models of their world. Thinking rationally suggests laws of rationalism and structured thought, such as syllogisms and formal logic. Finally, acting rationally suggests systems that do rational things such as expected utility maximization and rational agents.

Luger and Stubblefield suggest that AI is a sub-field of computer science concerned with the automation of intelligence, and like other sub-fields of computer science has both theoretical concerns (how and why do the systems work?) and application concerns (where and when can the systems be used?) [34]. They suggest a strong empirical focus to research, because although there may be a strong desire for mathematical analysis, the systems themselves defy analysis given their complexity. The machines and software investigated in AI are not black boxes; rather, analysis proceeds by observing the systems' interactions with their environments, followed by an internal assessment of the system to relate its structure back to its behavior.

Artificial Intelligence is therefore concerned with investigating mechanisms that underlie intelligence and intelligent behavior. The traditional approach toward designing and investigating AI (the so-called 'good old fashioned' AI) has been to employ a symbolic basis for these mechanisms. A newer approach, historically referred to as scruffy artificial intelligence or soft computing, does not necessarily use a symbolic basis, instead patterning these mechanisms after biological or natural processes. This represents a modern paradigm shift in interest from symbolic knowledge representations to inference strategies for adaptation and learning, and has been referred to as neat versus scruffy approaches to AI. The neat philosophy is concerned with formal symbolic models of intelligence that can explain why they work, whereas the scruffy philosophy is concerned with intelligent strategies that explain how they work [44].

Neat AI

The traditional stream of AI concerns a top down perspective of problem solving, generally involving symbolic representations and logic processes that most importantly can explain why the systems work. The successes of this prescriptive stream include a multitude of specialist approaches such as rule-based expert systems, automatic theorem provers, and operations research techniques that underlie modern planning and scheduling software. Although traditional approaches have resulted in significant success, they have their limits, most notably scalability. Increases in problem size result in an unmanageable increase in the complexity of such problems, meaning that although traditional techniques can guarantee an optimal, precise, or true solution, the computational execution time or computing memory required can be intractable.

Scruffy AI

There have been a number of thrusts in the field of AI toward less crisp techniques that are able to locate approximate, imprecise, or partially-true solutions to problems with a reasonable cost of resources. Such approaches are typically descriptive rather than prescriptive, describing a process for achieving a solution (how), but not explaining why they work (like the neater approaches).

Scruffy AI approaches are defined as relatively simple procedures that result in complex emergent and self-organizing behavior that can defy traditional reductionist analyses, the effects of which can be exploited for quickly locating approximate solutions to intractable problems. A common characteristic of such techniques is the incorporation of randomness in their processes, resulting in robust probabilistic and stochastic decision making contrasted to the sometimes more fragile determinism of the crisp approaches. Another important common attribute is the adoption of an inductive rather than deductive approach to problem solving, generalizing solutions or decisions from sets of specific observations made by the system.

1.1.2 Natural Computation

An important perspective on scruffy Artificial Intelligence is the motivation and inspiration for the core information processing strategy of a given technique. Computers can only do what they are instructed, therefore a consideration is to distill information processing from other fields of study, such as the physical world and biology. The study of biologically motivated computation is called Biologically Inspired Computing [16], and is one of three related fields of Natural Computing [22, 23, 39]. Natural Computing is an interdisciplinary field concerned with the relationship of computation and biology, which in addition to Biologically Inspired Computing is also comprised of Computationally Motivated Biology and Computing with Biology [36, 40].

Biologically Inspired Computation

Biologically Inspired Computation is computation inspired by biological metaphor, also referred to as Biomimicry and Biomemetics in other engineering disciplines [6, 17]. The intent of this field is to devise mathematical and engineering tools to generate solutions to computation problems. The field involves using procedures for finding solutions abstracted from the natural world for addressing computationally phrased problems.

Computationally Motivated Biology

Computationally Motivated Biology involves investigating biology using computers. The intent of this area is to use information sciences and simulation to model biological systems in digital computers with the aim to replicate and better understand behaviors in biological systems. The field facilitates the ability to better understand life-as-it-is and investigate life-as-it-could-be. Typically, work in this sub-field is not concerned with the construction of mathematical and engineering tools; rather it is focused on simulating natural phenomena. Common examples include Artificial Life, Fractal Geometry (L-systems, Iterative Function Systems, Particle Systems, Brownian motion), and Cellular Automata. A related field is that of Computational Biology, generally concerned with modeling biological systems and the application of statistical methods such as in the sub-field of Bioinformatics.

Computation with Biology

Computation with Biology is the investigation of substrates other than silicon in which to implement computation [1]. Common examples include molecular or DNA Computing and Quantum Computing.

1.1.3 Computational Intelligence

Computational Intelligence is a modern name for the sub-field of AI concerned with sub-symbolic (also called messy, scruffy, and soft) techniques. Computational Intelligence describes techniques that focus on strategy and outcome. The field broadly covers sub-disciplines that focus on adaptive and intelligent systems, not limited to: Evolutionary Computation, Swarm Intelligence (Particle Swarm and Ant Colony Optimization), Fuzzy Systems, Artificial Immune Systems, and Artificial Neural Networks [20, 41]. This section provides a brief summary of each of the five primary areas of study.

Evolutionary Computation

A paradigm that is concerned with the investigation of systems inspired by the neo-Darwinian theory of evolution by means of natural selection (natural selection theory and an understanding of genetics). Popular evolutionary algorithms include the Genetic Algorithm, Evolution Strategy, Genetic and Evolutionary Programming, and Differential Evolution [4, 5]. The evolutionary process is considered an adaptive strategy and is typically applied to search and optimization domains [26, 28].

Swarm Intelligence

A paradigm that considers collective intelligence as a behavior that emerges through the interaction and cooperation of large numbers of lesser intelligent agents. The paradigm consists of two dominant sub-fields: 1) Ant Colony Optimization, which investigates probabilistic algorithms inspired by the foraging behavior of ants [10, 18], and 2) Particle Swarm Optimization, which investigates probabilistic algorithms inspired by the flocking and foraging behavior of birds and fish [30]. Like evolutionary computation, swarm intelligence-based techniques are considered adaptive strategies and are typically applied to search and optimization domains.

Artificial Neural Networks

Neural Networks are a paradigm that is concerned with the investigation of architectures and learning strategies inspired by the modeling of neurons in the brain [8]. Learning strategies are typically divided into supervised and unsupervised, which manage environmental feedback in different ways. Neural network learning processes are considered adaptive learning and are typically applied to function approximation and pattern recognition domains.

Fuzzy Intelligence

Fuzzy Intelligence is a paradigm that is concerned with the investigation of fuzzy logic, which is a form of logic that is not constrained to true and false determinations like propositional logic, but rather functions which define approximate truth, or degrees of truth [52]. Fuzzy logic and fuzzy systems are a logic system used as a reasoning strategy and are typically applied to expert system and control system domains.
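To make 'degrees of truth' concrete, the following is a minimal Ruby sketch (an illustrative assumption, not drawn from a fuzzy systems library or a later chapter) of a triangular membership function that maps a crisp value to a truth degree in [0, 1]:

def triangular_membership(x, low, peak, high)
  # degree of truth rises linearly from low to peak, falls from peak to high
  return 0.0 if x <= low or x >= high
  return (x - low) / (peak - low) if x <= peak
  (high - x) / (high - peak)
end

# Degree to which a reading of 22.5 belongs to an assumed fuzzy set "warm",
# defined here as rising from 15, peaking at 25, and falling away by 35.
puts triangular_membership(22.5, 15.0, 25.0, 35.0) # => 0.75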

Artificial Immune Systems

A collection of approaches inspired by the structure and function of the acquired immune system of vertebrates. Popular approaches include clonal selection, negative selection, the dendritic cell algorithm, and immune network algorithms. The immune-inspired adaptive processes vary in strategy and show similarities to the fields of Evolutionary Computation and Artificial Neural Networks, and are typically used for optimization and pattern recognition domains [15].

1.1.4 Metaheuristics

Another popular name for the strategy-outcome perspective of scruffy AI is metaheuristics. In this context, a heuristic is an algorithm that locates 'good enough' solutions to a problem without concern for whether the solution can be proven to be correct or optimal [37]. Heuristic methods trade off concerns such as precision, quality, and accuracy in favor of computational effort (space and time efficiency). The greedy search procedure that only takes cost-improving steps is an example of a heuristic method.
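As a minimal illustration, the following Ruby sketch (an assumed example, not one of this book's listings) implements such a greedy procedure: it repeatedly proposes a random neighbor and only ever accepts cost-improving steps.

def greedy_descent(cost, neighbor, start, max_steps)
  current = start
  max_steps.times do
    candidate = neighbor.call(current)
    # greedy rule: only ever take a cost-improving step
    current = candidate if cost.call(candidate) < cost.call(current)
  end
  current
end

cost = lambda {|x| x * x}                     # cost function to minimize
neighbor = lambda {|x| x + (rand() - 0.5)}    # random neighboring solution
puts greedy_descent(cost, neighbor, 5.0, 100) # approaches 0.0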
Like heuristics, metaheuristics may be considered a general algorithmic framework that can be applied to different optimization problems with relatively few modifications to adapt them to a specific problem [25, 46]. The difference is that metaheuristics are intended to extend the capabilities of heuristics by combining one or more heuristic methods (referred to as procedures) using a higher-level strategy (hence 'meta'). A procedure in a metaheuristic is considered black-box in that little (if any) prior knowledge is known about it by the metaheuristic, and as such it may be replaced with a different procedure. Procedures may be as simple as the manipulation of a representation, or as complex as another complete metaheuristic. Some examples of metaheuristics include iterated local search, tabu search, the genetic algorithm, ant colony optimization, and simulated annealing.

Blum and Roli outline nine properties of metaheuristics [9], as follows:

- Metaheuristics are strategies that guide the search process.
- The goal is to efficiently explore the search space in order to find (near-)optimal solutions.
- Techniques which constitute metaheuristic algorithms range from simple local search procedures to complex learning processes.
- Metaheuristic algorithms are approximate and usually non-deterministic.
- They may incorporate mechanisms to avoid getting trapped in confined areas of the search space.
- The basic concepts of metaheuristics permit an abstract level description.
- Metaheuristics are not problem-specific.
- Metaheuristics may make use of domain-specific knowledge in the form of heuristics that are controlled by the upper level strategy.
- Today's more advanced metaheuristics use search experience (embodied in some form of memory) to guide the search.

Hyperheuristics are yet another extension that focuses on heuristics that modify their parameters (online or offline) to improve the efficacy of solution, or the efficiency of the computation. Hyperheuristics provide high-level strategies that may employ machine learning and adapt their search behavior by modifying the application of the sub-procedures or even which procedures are used (operating on the space of heuristics which in turn operate within the problem domain) [12, 13].

1.1.5 Clever Algorithms

This book is concerned with 'clever algorithms', which are algorithms drawn from many sub-fields of artificial intelligence, not limited to the scruffy fields of biologically inspired computation, computational intelligence, and metaheuristics. The term 'clever algorithms' is intended to unify a collection of interesting and useful computational tools under a consistent and accessible banner. An alternative name (Inspired Algorithms) was considered, although ultimately rejected given that not all of the algorithms to be described in the project have an inspiration (specifically a biological or physical inspiration) for their computational strategy. The set of algorithms described in this book may generally be referred to as 'unconventional optimization algorithms' (for example, see [14]), as optimization is the main form of computation provided by the listed approaches. A technically more appropriate name for these approaches is 'stochastic global optimization' (for example, see [49] and [35]).

Algorithms were selected in order to provide a rich and interesting coverage of the fields of Biologically Inspired Computation, Metaheuristics and Computational Intelligence. Rather than a coverage of just the state-of-the-art and popular methods, the algorithms presented also include historic and newly described methods. The final selection was designed to provoke curiosity and encourage exploration and a wider view of the field.

1.2 Problem Domains

Algorithms from the fields of Computational Intelligence, Biologically Inspired Computing, and Metaheuristics are applied to difficult problems, to which more traditional approaches may not be suited. Michalewicz and Fogel propose five reasons why problems may be difficult [37] (page 11):

- The number of possible solutions in the search space is so large as to forbid an exhaustive search for the best answer.
- The problem is so complicated that just to facilitate any answer at all, we have to use such simplified models of the problem that any result is essentially useless.
- The evaluation function that describes the quality of any proposed solution is noisy or varies with time, thereby requiring not just a single solution but an entire series of solutions.
- The possible solutions are so heavily constrained that constructing even one feasible answer is difficult, let alone searching for an optimal solution.
- The person solving the problem is inadequately prepared or imagines some psychological barrier that prevents them from discovering a solution.

This section introduces two problem formalisms that embody many of the most difficult problems faced by Artificial and Computational Intelligence. They are: Function Optimization and Function Approximation. Each class of problem is described in terms of its general properties, a formalism, and a set of specialized sub-problems. These problem classes provide a tangible framing of the algorithmic techniques described throughout the work.

1.2.1 Function Optimization

Real-world optimization problems and generalizations thereof can be drawn from most fields of science, engineering, and information technology (for a sample see [2, 48]). Importantly, function optimization problems have had a long tradition in the fields of Artificial Intelligence in motivating basic research into new problem solving techniques, and for investigating and verifying systemic behavior against benchmark problem instances.

Problem Description

Mathematically, optimization is defined as the search for a combination of parameters commonly referred to as decision variables (x = {x1, x2, x3, ..., xn}) which minimize or maximize some ordinal quantity (c) (typically a scalar called a score or cost) assigned by an objective function or cost function (f), under a set of constraints (g = {g1, g2, g3, ..., gn}). For example, a general minimization case would be as follows: f(x′) ≤ f(x), ∀ xi ∈ x. Constraints may provide boundaries on decision variables (for example in a real-value hypercube ℝ^n), or may generally define regions of feasibility and infeasibility in the decision variable space. In applied mathematics the field may be referred to as Mathematical Programming. More generally the field may be referred to as Global or Function Optimization given the focus on the objective function. For more general information on optimization refer to Horst et al. [29].
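As a tiny concrete instance of this formalism, the following Ruby sketch (illustrative assumptions only; it is not one of this book's listings) defines an objective function and a feasibility test over a real-value hypercube:

# Objective function f: the sum of squares (a minimization problem).
def objective(x)
  x.inject(0.0) {|sum, xi| sum + xi * xi}
end

# Constraint g: each decision variable must lie in the hypercube [-5, 5].
def feasible?(x, min=-5.0, max=5.0)
  x.all? {|xi| xi >= min and xi <= max}
end

candidate = [1.5, -2.0] # a vector of two decision variables
puts "feasible=#{feasible?(candidate)}, cost=#{objective(candidate)}" # cost = 6.25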

Sub-Fields of Study

The study of optimization is comprised of many specialized sub-fields, based on an overlapping taxonomy that focuses on the principle concerns in the general formalism. For example, with regard to the decision variables, one may consider univariate and multivariate optimization problems. The type of decision variables promotes specialities for continuous, discrete, and permutations of variables. Dependencies between decision variables under a cost function define the fields of Linear Programming, Quadratic Programming, and Nonlinear Programming. A large class of optimization problems can be reduced to discrete sets and are considered in the field of Combinatorial Optimization, to which many theoretical properties are known, most importantly that many interesting and relevant problems cannot be solved by an approach with polynomial time complexity (so-called NP, for example see Papadimitriou and Steiglitz [38]).

The evaluation of variables against a cost function may collectively be considered a response surface. The shape of such a response surface may be convex, which is a class of functions to which many important theoretical findings have been made, not limited to the fact that location of the local optimal configuration also means the global optimal configuration of decision variables has been located [11]. Many interesting and real-world optimization problems produce cost surfaces that are non-convex or so-called multi-modal(1) (rather than unimodal), suggesting that there are multiple peaks and valleys. Further, many real-world optimization problems with continuous decision variables cannot be differentiated given their complexity or limited information availability, meaning that derivative-based gradient descent methods (that are well understood) are not applicable, necessitating the use of so-called 'direct search' (sample or pattern-based) methods [33]. Real-world objective function evaluation may be noisy, discontinuous, and/or dynamic, and the constraints of real-world problem solving may require an approximate solution in limited time or using limited resources, motivating the need for heuristic approaches.

(1) Taken from statistics, referring to the centers of mass in distributions, although in optimization it refers to regions of interest in the search space, in particular valleys in minimization and peaks in maximization cost surfaces.

1.2.2 Function Approximation

Real-world Function Approximation problems are among the most computationally difficult considered in the broader field of Artificial Intelligence for reasons including: incomplete information, high-dimensionality, noise in the sample observations, and non-linearities in the target function. This section considers the Function Approximation formalism and related specializations as a general motivating problem to contrast and compare with Function Optimization.

Problem Description

Function Approximation is the problem of finding a function (f) that approximates a target function (g), where typically the approximated function is selected based on a sample of observations (x, also referred to as the training set) taken from the unknown target function. In machine learning, the function approximation formalism is used to describe general problem types commonly referred to as pattern recognition, such as classification, clustering, and curve fitting (called a decision or discrimination function). Such general problem types are described in terms of approximating an unknown Probability Density Function (PDF), which underlies the relationships in the problem space, and is represented in the sample data. This perspective of such problems is commonly referred to as statistical machine learning and/or density estimation [8, 24].
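To ground the formalism, the following Ruby sketch (an illustrative assumption, not one of this book's listings) approximates an unknown target function g(x) = 2x + 1 from noisy samples by fitting the model f(x) = ax + b with ordinary least squares:

# Noisy samples drawn from the (unknown to the learner) target g(x) = 2x + 1.
samples = Array.new(20) {|i| x = i / 2.0; [x, 2.0 * x + 1.0 + (rand() - 0.5)]}

# Ordinary least squares for the model f(x) = slope * x + intercept.
n = samples.size.to_f
sum_x  = samples.inject(0.0) {|s, (x, y)| s + x}
sum_y  = samples.inject(0.0) {|s, (x, y)| s + y}
sum_xy = samples.inject(0.0) {|s, (x, y)| s + x * y}
sum_xx = samples.inject(0.0) {|s, (x, y)| s + x * x}
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n
puts "approximated f(x) = #{slope}x + #{intercept}" # close to 2x + 1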

Sub-Fields of Study

The function approximation formalism can be used to phrase some of the hardest problems faced by Computer Science, and Artificial Intelligence in particular, such as natural language processing and computer vision. The general process focuses on 1) the collection and preparation of the observations from the target function, 2) the selection and/or preparation of a model of the target function, and 3) the application and ongoing refinement of the prepared model. Some important problem-based sub-fields include:

- Feature Selection, where a feature is considered an aggregation of one or more attributes, and only those features that have meaning in the context of the target function are necessary to the modeling function [27, 32].
- Classification, where observations are inherently organized into labelled groups (classes) and a supervised process models an underlying discrimination function to classify unobserved samples.
- Clustering, where observations may be organized into groups based on underlying common features, although the groups are unlabeled, requiring a process to model an underlying discrimination function without corrective feedback.
- Curve or Surface Fitting, where a model is prepared that provides a best-fit (called a regression) for a set of observations that may be used for interpolation over known observations and extrapolation for observations outside what has been modeled.

The field of Function Optimization is related to Function Approximation, as many sub-problems of Function Approximation may be defined as optimization problems. Many of the technique paradigms used for function approximation are differentiated based on the representation and the optimization process used to minimize error or maximize effectiveness on a given approximation problem. The difficulty of Function Approximation problems centres on 1) the nature of the unknown relationships between attributes and features, 2) the number (dimensionality) of attributes and features, and 3) general concerns of noise in such relationships and the dynamic availability of samples from the target function. Additional difficulties include the incorporation of prior knowledge (such as imbalance in samples, incomplete information and the variable reliability of data), and problems of invariant features (such as transformation, translation, rotation, scaling, and skewing of features).

1.3 Unconventional Optimization

Not all algorithms described in this book are for optimization, although those that are may be referred to as 'unconventional' to differentiate them from the more traditional approaches. Examples of traditional approaches include (but are not limited to) mathematical optimization algorithms (such as Newton's method and Gradient Descent that use derivatives to locate a local minimum) and direct search methods (such as the Simplex method and the Nelder-Mead method that use a search pattern to locate optima). Unconventional optimization algorithms are designed for the more difficult problem instances, the attributes of which were introduced in Section 1.2.1. This section introduces some common attributes of this class of algorithm.

1.3.1 Black Box Algorithms

Black Box optimization algorithms are those that exploit little, if any, information from a problem domain in order to devise a solution. They are generalized problem solving procedures that may be applied to a range of problems with very little modification [19]. Domain specific knowledge refers to known relationships between solution representations and the objective cost function. Generally speaking, the less domain specific information incorporated into a technique, the more flexible the technique, although the less efficient it will be for a given problem. For example, random search is the most general black box approach and is also the most flexible, requiring only the generation of random solutions for a given problem. Random search allows resampling of the domain, which gives it a worst case behavior that is worse than enumerating the entire search domain. In practice, the more prior knowledge available about a problem, the more information that can be exploited by a technique in order to efficiently locate a solution for the problem, heuristically or otherwise. Therefore, black box methods are those methods suitable for those problems where little information from the problem domain is available to be used by a problem solving approach.

1.3.2 No-Free-Lunch

The No-Free-Lunch Theorem of search and optimization by Wolpert and Macready proposes that all black box optimization algorithms are the same for searching for the extremum of a cost function when averaged over all possible functions [50, 51]. The theorem has caused a lot of pessimism and misunderstanding, particularly in relation to the evaluation and comparison of Metaheuristic and Computational Intelligence algorithms.

The implication of the theorem is that searching for the 'best' general-purpose black box optimization algorithm is irresponsible as no such procedure is theoretically possible. No-Free-Lunch applies to stochastic and deterministic optimization algorithms as well as to algorithms that learn and adjust their search strategy over time. It is independent of the performance measure used and the representation selected. Wolpert and Macready's original paper was produced at a time when grandiose generalizations were being made as to algorithm, representation, or configuration superiority. The practical impact of the theory is to encourage practitioners to bound claims of applicability for search and optimization algorithms. Wolpert and Macready encouraged effort be put into devising practical problem classes and into the matching of suitable algorithms to problem classes. Further, they compelled practitioners to exploit domain knowledge in optimization algorithm application, which is now an axiom in the field.

1.3.3 Stochastic Optimization

Stochastic optimization algorithms are those that use randomness to elicit non-deterministic behaviors, contrasted to purely deterministic procedures. Most algorithms from the fields of Computational Intelligence, Biologically Inspired Computation, and Metaheuristics may be considered to belong to the field of Stochastic Optimization. Algorithms that exploit randomness are not random in behavior; rather they sample a problem space in a biased manner, focusing on areas of interest and neglecting less interesting areas [45]. A class of techniques that focus on the stochastic sampling of a domain, called Markov Chain Monte Carlo (MCMC) algorithms, provide good average performance, and generally offer a low chance of the worst case performance. Such approaches are suited to problems with many coupled degrees of freedom, for example large, high-dimensional spaces. MCMC approaches involve stochastically sampling from a target distribution function similar to Monte Carlo simulation methods using a process that resembles a biased Markov chain.

- Monte Carlo methods are used for selecting a statistical sample to approximate a given target probability density function and are traditionally used in statistical physics. Samples are drawn sequentially and the process may include criteria for rejecting samples and biasing the sampling locations within high-dimensional spaces.
- Markov Chain processes provide a probabilistic model for state transitions or moves within a discrete domain called a walk or a chain of steps. A Markov system is only dependent on the current position in the domain in order to probabilistically determine the next step in the walk.

MCMC techniques combine these two approaches to solve integration and optimization problems in large dimensional spaces by generating samples while exploring the space using a Markov chain process, rather than sequentially or independently [3]. The step generation is configured to bias sampling in more important regions of the domain. Three examples of MCMC techniques include the Metropolis-Hastings algorithm, Simulated Annealing for global optimization, and the Gibbs sampler, which are commonly employed in the fields of physics, chemistry, statistics, and economics.
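As a minimal illustration of the first of these, the following Ruby sketch (illustrative, not a listing from this book) implements a random-walk Metropolis-Hastings sampler for an unnormalized Gaussian-shaped target density; the proposal distribution and step size are assumptions:

# Unnormalized target density: a standard normal shape.
def target(x)
  Math.exp(-0.5 * x * x)
end

def metropolis_hastings(iterations, step_size=1.0)
  x, samples = 0.0, []
  iterations.times do
    # symmetric random-walk proposal around the current state
    candidate = x + step_size * (rand() * 2.0 - 1.0)
    # accept with probability min(1, target(candidate) / target(x))
    x = candidate if rand() < target(candidate) / target(x)
    samples << x
  end
  samples
end

samples = metropolis_hastings(10000)
mean = samples.inject(0.0) {|s, v| s + v} / samples.size
puts "sample mean (expected near 0): #{mean}"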

1.3.4 Inductive Learning

Many unconventional optimization algorithms employ a process that includes the iterative improvement of candidate solutions against an objective cost function. This process of adaptation is generally a method by which the process obtains characteristics that improve the system's (candidate solution's) relative performance in an environment (cost function). This adaptive behavior is commonly achieved through a 'selectionist process' of repetition of the steps: generation, test, and selection. The use of non-deterministic processes means that the sampling of the domain (the generation step) is typically non-parametric, although guided by past experience.

The method of acquiring information is called inductive learning or learning from example, where the approach uses the implicit assumption that specific examples are representative of the broader information content of the environment, specifically with regard to anticipated need. Many unconventional optimization approaches maintain a single candidate solution, a population of samples, or a compression thereof that provides both an instantaneous representation of all of the information acquired by the process, and the basis for generating and making future decisions.

This method of simultaneously acquiring and improving information from the domain and optimizing decision making (where to direct future effort) is called the k-armed bandit (two-armed and multi-armed bandit) problem from the field of statistical decision making known as game theory [7, 42]. This formalism considers the capability of a strategy to allocate available resources proportional to the future payoff the strategy is expected to receive. The classic example is the 2-armed bandit problem used by Goldberg to describe the behavior of the genetic algorithm [26]. The example involves an agent that learns which one of the two slot machines provides more return by pulling the handle of each (sampling the domain) and biasing future handle pulls proportional to the expected utility, based on the probabilistic experience with the past distribution of the payoff. The formalism may also be used to understand the properties of inductive learning demonstrated by the adaptive behavior of most unconventional optimization algorithms.
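A minimal Ruby sketch of the 2-armed bandit example follows. The payout probabilities are illustrative assumptions, and where Goldberg's account biases pulls proportionally to expected utility, this sketch uses a simpler greedy rule with occasional random exploration to the same effect:

# Two slot machines with hidden payout probabilities (assumed values).
ARMS = [0.3, 0.6]
def pull(arm) rand() < ARMS[arm] ? 1.0 : 0.0 end

counts, rewards = [0, 0], [0.0, 0.0]
1000.times do
  # explore occasionally (and until both arms are sampled), else exploit
  if rand() < 0.1 or counts.include?(0)
    arm = rand(2)
  else
    arm = (rewards[0] / counts[0] > rewards[1] / counts[1]) ? 0 : 1
  end
  counts[arm] += 1
  rewards[arm] += pull(arm)
end
puts "pulls per arm: #{counts.inspect}" # most pulls go to the better arm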
The stochastic iterative process of generate-and-test can be computationally wasteful, potentially re-searching areas of the problem space already searched, and requiring many trials or samples in order to achieve a 'good enough' solution. The limited use of prior knowledge from the domain (black box), coupled with the stochastic sampling process, means that the adapted solutions are created without top-down insight or instruction, and can sometimes be interesting, innovative, and even competitive with decades of human expertise [31].

1.4 Book Organization

The remainder of this book is organized into two parts: Algorithms, which describes a large number of techniques in a complete and consistent manner presented in rough algorithm groups, and Extensions, which reviews more advanced topics suitable for when a number of algorithms have been mastered.

1.4.1 Algorithms

Algorithms are presented in seven groups or 'kingdoms' distilled from the broader fields of study, each in their own chapter, as follows:

- Stochastic Algorithms, which focuses on the introduction of randomness into heuristic methods (Chapter 2).
- Evolutionary Algorithms, inspired by evolution by means of natural selection (Chapter 3).
- Physical Algorithms, inspired by physical and social systems (Chapter 4).
- Probabilistic Algorithms, which focuses on methods that build models and estimate distributions in search domains (Chapter 5).
- Swarm Algorithms, which focuses on methods that exploit the properties of collective intelligence (Chapter 6).
- Immune Algorithms, inspired by the adaptive immune system of vertebrates (Chapter 7).
- Neural Algorithms, inspired by the plasticity and learning qualities of the human nervous system (Chapter 8).

A given algorithm is more than just a procedure or code listing; each approach is an island of research. The meta-information that defines the context of a technique is just as important to understanding and application as abstract recipes and concrete implementations. A standardized algorithm description is adopted to provide a consistent presentation of algorithms with a mixture of softer narrative descriptions, programmatic descriptions both abstract and concrete, and most importantly useful sources for finding out more information about the technique. The standardized algorithm description template covers the following subjects:

Name: The algorithm name defines the canonical name used to refer to the technique, in addition to common aliases, abbreviations, and acronyms. The name is used as the heading of an algorithm description.

Taxonomy: The algorithm taxonomy defines where a technique fits into the field, both the specific sub-fields of Computational Intelligence and Biologically Inspired Computation as well as the broader field of Artificial Intelligence. The taxonomy also provides a context for determining the relationships between algorithms.

Inspiration: (where appropriate) The inspiration describes the specific system or process that provoked the inception of the algorithm. The inspiring system may non-exclusively be natural, biological, physical, or social. The description of the inspiring system may include relevant domain specific theory, observation, nomenclature, and those salient attributes of the system that are somehow abstractly or conceptually manifest in the technique.

Metaphor: (where appropriate) The metaphor is a description of the technique in the context of the inspiring system or a different suitable system. The features of the technique are made apparent through an analogous description of the features of the inspiring system. The explanation through analogy is not expected to be literal; rather the method is used as an allegorical communication tool. The inspiring system is not explicitly described, this being the role of the inspiration topic, which represents a loose dependency for this topic.

Strategy: The strategy is an abstract description of the computational model. The strategy describes the information processing actions a technique shall take in order to achieve an objective, providing a logical separation between a computational realization (procedure) and an analogous system (metaphor). A given problem solving strategy may be realized as one of a number of specific algorithms or problem solving systems.

Procedure: The algorithmic procedure summarizes the specifics of realizing a strategy as a systemized and parameterized computation. It outlines how the algorithm is organized in terms of the computation, data structures, and representations.

Heuristics: The heuristics section describes the commonsense, best practice, and demonstrated rules for applying and configuring a parameterized algorithm. The heuristics relate to the technical details of the technique's procedure and data structures for general classes of application (neither specific implementations nor specific problem instances).

Code Listing: The code listing description provides a minimal but functional version of the technique implemented with a programming language. The code description can be typed into a computer and provide a working execution of the technique. The technique implementation also includes a minimal problem instance to which it is applied, and both the problem and algorithm implementations are complete enough to demonstrate the technique's procedure. The description is presented as a programming source code listing with a terse introductory summary.

References: The references section includes a listing of both primary sources of information about the technique as well as useful introductory sources for novices to gain a deeper understanding of the theory and application of the technique. The description consists of hand-selected reference material including books, peer reviewed conference papers, and journal articles.

Source code examples are included in the algorithm descriptions, and the Ruby Programming Language was selected for use throughout the book. Ruby was selected because it supports the procedural programming paradigm, adopted to ensure that examples can be easily ported to object-oriented and other paradigms. Additionally, Ruby is an interpreted language, meaning the code can be directly executed without an introduced compilation step, and it is free to download and use from the Internet.(2) Ruby is concise, expressive, and supports meta-programming features that improve the readability of code examples.

The sample code provides a working version of a given technique for demonstration purposes. Having a tinker with a technique can really bring it to life and provide valuable insight into a method. The sample code is a minimum implementation, providing plenty of opportunity to explore, extend and optimize. All of the source code for the algorithms presented in this book is available from the companion website, online at http://www.CleverAlgorithms.com. All algorithm implementations were tested with Ruby 1.8.6, 1.8.7 and 1.9.

(2) Ruby can be downloaded for free from http://www.ruby-lang.org

1.4.2 Extensions

There are some advanced topics that cannot be meaningfully considered until one has a firm grasp of a number of algorithms, and these are discussed at the back of the book. The Advanced Topics chapter addresses topics such as: the use of alternative programming paradigms when implementing clever algorithms, methodologies used when devising entirely new approaches, strategies to consider when testing clever algorithms, visualizing the behavior and results of algorithms, and comparing algorithms based on the results they produce using statistical methods. Like the background information provided in this chapter, the extensions provide a gentle introduction and starting point into some advanced topics, and references for seeking a deeper understanding.

1.5 How to Read this Book

This book is a reference text that provides a large compendium of algorithm descriptions. It is a trusted handbook of practical computational recipes to be consulted when one is confronted with difficult function optimization and approximation problems. It is also an encompassing guidebook of modern heuristic methods that may be browsed for inspiration, exploration, and general interest.

The audience for this work may be interested in the fields of Computational Intelligence, Biologically Inspired Computation, and Metaheuristics and may count themselves as belonging to one of the following broader groups:

- Scientists: Research scientists concerned with theoretically or empirically investigating algorithms, addressing questions such as: What is the motivating system and strategy for a given technique? What are some algorithms that may be used in a comparison within a given subfield or across subfields?
- Engineers: Programmers and developers concerned with implementing, applying, or maintaining algorithms, addressing questions such as: What is the procedure for a given technique? What are the best practice heuristics for employing a given technique?
- Students: Undergraduate and graduate students interested in learning about techniques, addressing questions such as: What are some interesting algorithms to study? How can a given approach be implemented?
- Amateurs: Practitioners interested in knowing more about algorithms, addressing questions such as: What classes of techniques exist and what algorithms do they provide? How can the computation of a technique be conceptualized?

1.6 Further Reading

This book is not an introduction to Artificial Intelligence or related sub-fields, nor is it a field guide for a specific class of algorithms. This section provides some pointers to selected books and articles for those readers seeking a deeper understanding of the fields of study to which the Clever Algorithms described in this book belong.

1.6.1 Artificial Intelligence

Artificial Intelligence is a large field of study and many excellent texts have been written to introduce the subject. Russell and Norvig's "Artificial Intelligence: A Modern Approach" is an excellent introductory text providing a broad and deep review of what the field has to offer, and is useful for students and practitioners alike [43]. Luger and Stubblefield's "Artificial Intelligence: Structures and Strategies for Complex Problem Solving" is also an excellent reference text, providing a more empirical approach to the field [34].

1.6.2 Computational Intelligence

Introductory books for the field of Computational Intelligence generally focus on a handful of specific sub-fields and their techniques. Engelbrecht's "Computational Intelligence: An Introduction" provides a modern and detailed introduction to the field covering classic subjects such as Evolutionary Computation and Artificial Neural Networks, as well as more recent techniques such as Swarm Intelligence and Artificial Immune Systems [20]. Pedrycz's slightly more dated "Computational Intelligence: An Introduction" also provides a solid coverage of the core of the field with some deeper insights into fuzzy logic and fuzzy systems [41].

1.6.3 Biologically Inspired Computation

Computational methods inspired by natural and biological systems represent a large portion of the algorithms described in this book. The collection of articles published in de Castro and Von Zuben's "Recent Developments in Biologically Inspired Computing" provides an overview of the state of the field, and the introductory chapter on the need for such methods does an excellent job to motivate the field of study [17]. Forbes's "Imitation of Life: How Biology Is Inspiring Computing" sets the scene for Natural Computing and the interrelated disciplines, of which Biologically Inspired Computing is but one useful example [22]. Finally, Benyus's "Biomimicry: Innovation Inspired by Nature" provides a good introduction into the broader related field, a new frontier in science and technology that involves building systems inspired by an understanding of the biological world [6].

1.6.4 Metaheuristics

The field of Metaheuristics was initially constrained to heuristics for applying classical optimization procedures, although it has expanded to encompass a broader and diverse set of techniques. Michalewicz and Fogel's "How to Solve It: Modern Heuristics" provides a practical tour of heuristic methods with a consistent set of worked examples [37]. Glover and Kochenberger's "Handbook of Metaheuristics" provides a solid introduction into a broad collection of techniques and their capabilities [25].

1.6.5 The Ruby Programming Language

The Ruby Programming Language is a multi-paradigm dynamic language that appeared in approximately 1995. Its meta-programming capabilities coupled with concise and readable syntax have made it a popular language of choice for web development, scripting, and application development. The classic reference text for the language is Thomas, Fowler, and Hunt's "Programming Ruby: The Pragmatic Programmers' Guide", referred to as the 'pickaxe' book because of the picture of the pickaxe on the cover [47]. An updated edition is available that covers version 1.9 (compared to 1.8 in the cited version) that will work just as well for use as a reference for the examples in this book. Flanagan and Matsumoto's "The Ruby Programming Language" also provides a seminal reference text with contributions from Yukihiro Matsumoto, the author of the language [21]. For more information on the Ruby Programming Language, see the quick-start guide in Appendix A.

1.7 Bibliography

[1] S. Aaronson. NP-complete problems and physical reality. ACM SIGACT News (COLUMN: Complexity theory), 36(1):30-52, 2005.

[2] M. M. Ali, C. Storey, and A. Törn. Application of stochastic global optimization algorithms to practical problems. Journal of Optimization Theory and Applications, 95(3):545-563, 1997.

[3] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50:5-43, 2003.

[4] T. Bäck, D. B. Fogel, and Z. Michalewicz, editors. Evolutionary Computation 1: Basic Algorithms and Operators. IoP, 2000.

[5] T. Bäck, D. B. Fogel, and Z. Michalewicz, editors. Evolutionary Computation 2: Advanced Algorithms and Operations. IoP, 2000.

[6] J. M. Benyus. Biomimicry: Innovation Inspired by Nature. Quill, 1998.

[7] D. Bergemann and J. Välimäki. Bandit problems. Cowles Foundation Discussion Papers 1551, Cowles Foundation, Yale University, January 2006.

[8] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.

[9] C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual comparison. ACM Computing Surveys (CSUR), 35(3):268-308, 2003.

[10] E. Bonabeau, M. Dorigo, and G. Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press US, 1999.

[11] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[12] E. K. Burke, E. Hart, G. Kendall, J. Newall, P. Ross, and S. Schulenburg. Handbook of Metaheuristics, chapter Hyper-heuristics: An emerging direction in modern search technology, pages 457-474. Kluwer, 2003.

[13] E. K. Burke, G. Kendall, and E. Soubeiga. A tabu-search hyperheuristic for timetabling and rostering. Journal of Heuristics, 9(6):451-470, 2003.

[14] D. Corne, M. Dorigo, and F. Glover. New Ideas in Optimization. McGraw-Hill, 1999.

[15] L. N. de Castro and J. Timmis. Artificial Immune Systems: A New Computational Intelligence Approach. Springer, 2002.

[16] L. N. de Castro and F. J. Von Zuben. Recent developments in biologically inspired computing, chapter From biologically inspired computing to natural computing. Idea Group, 2005.

[17] L. N. de Castro and F. J. Von Zuben. Recent developments in biologically inspired computing. Idea Group Inc, 2005.

[18] M. Dorigo and T. Stützle. Ant Colony Optimization. MIT Press, 2004.

[19] S. Droste, T. Jansen, and I. Wegener. Upper and lower bounds for randomized search heuristics in black-box optimization. Theory of Computing Systems, 39(4):525-544, 2006.

[20] A. P. Engelbrecht. Computational Intelligence: An Introduction. John Wiley and Sons, second edition, 2007.

[21] D. Flanagan and Y. Matsumoto. The Ruby Programming Language. O'Reilly Media, 2008.

[22] N. Forbes. Biologically inspired computing. Computing in Science and Engineering, 2(6):83-87, 2000.

[23] N. Forbes. Imitation of Life: How Biology Is Inspiring Computing. The MIT Press, 2005.

[24] K. Fukunaga. Introduction to Statistical Pattern Recognition. Academic Press, 1990.

[25] F. Glover and G. A. Kochenberger. Handbook of Metaheuristics. Springer, 2003.

[26] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.

[27] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157-1182, 2003.

[28] J. H. Holland. Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence. University of Michigan Press, 1975.

[29] R. Horst, P. M. Pardalos, and N. V. Thoai. Introduction to Global Optimization. Kluwer Academic Publishers, 2nd edition, 2000.

[30] J. Kennedy, R. C. Eberhart, and Y. Shi. Swarm Intelligence. Morgan Kaufmann, 2001.

[31] J. R. Koza, M. A. Keane, M. J. Streeter, W. Mydlowec, J. Yu, and G. Lanza. Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Springer, 2003.

[32] M. Kudo and J. Sklansky. Comparison of algorithms that select features for pattern classifiers. Pattern Recognition, 33:25-41, 2000.

[33] R. M. Lewis, V. Torczon, and M. W. Trosset. Direct search methods: then and now. Journal of Computational and Applied Mathematics, 124:191-207, 2000.

[34] G. F. Luger and W. A. Stubblefield. Artificial Intelligence: Structures and Strategies for Complex Problem Solving. Benjamin/Cummings Pub. Co., second edition, 1993.

[35] S. Luke. Essentials of Metaheuristics. Lulu, 2010. Available at http://cs.gmu.edu/~sean/book/metaheuristics/.

[36] P. Marrow. Nature-inspired computing technology and applications. BT Technology Journal, 18(4):13-23, 2000.

[37] Z. Michalewicz and D. B. Fogel. How to Solve It: Modern Heuristics. Springer, 2004.

[38] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Courier Dover Publications, 1998.

[39] R. Paton. Computing With Biological Metaphors, chapter Introduction to computing with biological metaphors, pages 1-8. Chapman & Hall, 1994.

[40] G. Păun. Bio-inspired computing paradigms (natural computing). Unconventional Programming Paradigms, 3566:155-160, 2005.

[41] W. Pedrycz. Computational Intelligence: An Introduction. CRC Press, 1997.

[42] H. Robbins. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 58:527-535, 1952.

[43] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, third edition, 2009.

[44] A. Sloman. Evolving Knowledge in Natural Science and Artificial Intelligence, chapter Must intelligent systems be scruffy? Pitman, 1990.

[45] J. C. Spall. Introduction to stochastic search and optimization: estimation, simulation, and control. John Wiley and Sons, 2003.

[46] E. G. Talbi. Metaheuristics: From Design to Implementation. John Wiley and Sons, 2009.

[47] D. Thomas, C. Fowler, and A. Hunt. Programming Ruby: The Pragmatic Programmers' Guide. Pragmatic Bookshelf, second edition, 2004.

[48] A. Törn, M. M. Ali, and S. Viitanen. Stochastic global optimization: Problem classes and solution techniques. Journal of Global Optimization, 14:437-447, 1999.

[49] T. Weise. Global Optimization Algorithms - Theory and Application. (Self Published), 2009-06-26 edition, 2007.

[50] D. H. Wolpert and W. G. Macready. No free lunch theorems for search. Technical report, Santa Fe Institute, Santa Fe, NM, USA, 1995.

[51] D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67-82, 1997.

[52] L. A. Zadeh, G. J. Klir, and B. Yuan. Fuzzy sets, fuzzy logic, and fuzzy systems: selected papers. World Scientific, 1996.

Part II. Algorithms

Chapter 2. Stochastic Algorithms

2.1 Overview

This chapter describes Stochastic Algorithms.

2.1.1 Stochastic Optimization

The majority of the algorithms to be described in this book are comprised of probabilistic and stochastic processes. What differentiates the 'stochastic algorithms' in this chapter from the remaining algorithms is the specific lack of 1) an inspiring system, and 2) a metaphorical explanation. Both inspiration and metaphor refer to the descriptive elements in the standardized algorithm description.

These described algorithms are predominately global optimization algorithms and metaheuristics that manage the application of an embedded neighborhood exploring (local) search procedure. As such, with the exception of Stochastic Hill Climbing and Random Search, the algorithms may be considered extensions of the multi-start search (also known as multi-restart search). This set of algorithms provides various different strategies by which 'better' and varied starting points can be generated and issued to a neighborhood searching technique for refinement, a process that is repeated with potentially improving or unexplored areas to search.

2.2 Random Search

Random Search, RS, Blind Random Search, Blind Search, Pure Random Search, PRS.

2.2.1 Taxonomy

Random Search belongs to the fields of Stochastic Optimization and Global Optimization. Random Search is a direct search method as it does not require derivatives to search a continuous domain. This base approach is related to techniques that provide small improvements such as Directed Random Search, and Adaptive Random Search (Section 2.3).

2.2.2 Strategy

The strategy of Random Search is to sample solutions from across the entire search space using a uniform probability distribution. Each future sample is independent of the samples that come before it.

2.2.3 Procedure

Algorithm 2.2.1 provides a pseudocode listing of the Random Search Algorithm for minimizing a cost function.

Algorithm 2.2.1: Pseudocode for Random Search.
    Input: NumIterations, ProblemSize, SearchSpace
    Output: Best
    Best <- nil
    For (iter_i in NumIterations)
        candidate_i <- RandomSolution(ProblemSize, SearchSpace)
        If (Cost(candidate_i) < Cost(Best))
            Best <- candidate_i
        End
    End
    Return (Best)

2.2.4 Heuristics

- Random Search is minimal in that it only requires a candidate solution construction routine and a candidate solution evaluation routine, both of which may be calibrated using the approach.
- The worst case performance for Random Search for locating the optima is worse than an Enumeration of the search domain, given that Random Search has no memory and can blindly resample.
- Random Search can return a reasonable approximation of the optimal solution within a reasonable time under low problem dimensionality, although the approach does not scale well with problem size (such as the number of dimensions).
- Care must be taken with some problem domains to ensure that random candidate solution construction is unbiased.
- The results of a Random Search can be used to seed another search technique, like a local search technique (such as the Hill Climbing algorithm) that can be used to locate the best solution in the neighborhood of the 'good' candidate solution, as sketched after this list.
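The following is a minimal sketch of that seeding idea (an assumed example, not one of this book's listings): a simple stochastic hill climber refines a vector, such as the best solution returned by a random search, by repeatedly proposing small perturbations and keeping improvements. The step size and iteration count are illustrative.

def hill_climb(start_vector, objective, step=0.1, iterations=1000)
  best = {:vector => start_vector, :cost => objective.call(start_vector)}
  iterations.times do
    # perturb each dimension of the current best by a small random step
    candidate = best[:vector].map {|v| v + (rand() * 2.0 - 1.0) * step}
    cost = objective.call(candidate)
    best = {:vector => candidate, :cost => cost} if cost < best[:cost]
  end
  best
end

objective = lambda {|v| v.inject(0.0) {|s, x| s + x * x}}
seed = [1.2, -0.8] # e.g. the best vector returned by a random search
puts hill_climb(seed, objective)[:cost] # refined toward 0.0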

2.2.5 Code Listing

Listing 2.1 provides an example of the Random Search Algorithm implemented in the Ruby Programming Language. In the example, the algorithm runs for a fixed number of iterations and returns the best candidate solution discovered. The example problem is an instance of a continuous function optimization that seeks min f(x) where f = Σ_{i=1}^{n} x_i^2, -5.0 ≤ x_i ≤ 5.0 and n = 2. The optimal solution for this basin function is (v_0, ..., v_{n-1}) = 0.0.

def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def search(search_space, max_iter)
  best = nil
  max_iter.times do |iter|
    candidate = {}
    candidate[:vector] = random_vector(search_space)
    candidate[:cost] = objective_function(candidate[:vector])
    best = candidate if best.nil? or candidate[:cost] < best[:cost]
    puts " > iteration=#{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 100
  # execute the algorithm
  best = search(search_space, max_iter)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.1: Random Search in Ruby

References

Listing 2.1: Random Search in Ruby
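As noted in the heuristics, the best result from a Random Search can seed a local search for refinement. The sketch below is illustrative only: it assumes the definitions from Listing 2.1 are loaded, and hill_climb is a hypothetical local search routine (for example, the Stochastic Hill Climbing strategy of Section 2.4 adapted to continuous domains with a step size), not part of the original listing.

# Illustrative only: seed a (hypothetical) local search routine with
# the best candidate found by Random Search from Listing 2.1.
seed = search(Array.new(2) {|i| [-5, +5]}, 100)
refined = hill_climb(seed, 30)  # hypothetical local search routine
puts "refined cost=#{refined[:cost]}"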

2.2.6 References

Primary Sources

There is no seminal specification of the Random Search algorithm; rather, there are discussions of the general approach and related random search methods from the 1950s through to the 1970s. This was around the time that pattern and direct search methods were actively researched. Brooks is credited with the so-called 'pure random search' [1]. Two seminal reviews of random search methods of the time include Karnopp [2] and perhaps Kulchitskii [3].

Learn More

For overviews of Random Search methods see Zhigljavsky [9], Solis and Wets [4], and also White [7], who provides an insightful review article. Spall provides a detailed overview of the field of Stochastic Optimization, including the Random Search method [5] (for example, see Chapter 2). For a shorter introduction by Spall, see [6] (specifically Section 6.2). Also see Zabinsky for another detailed review of the broader field [8].

2.2.7 Bibliography

[1] S. H. Brooks. A discussion of random methods for seeking maxima. Operations Research, 6(2):244-251, 1958.

[2] D. C. Karnopp. Random search techniques for optimization problems. Automatica, 1(2-3):111-121, 1963.

[3] O. Y. Kulchitskii. Random-search algorithm for extrema in functional space under conditions of partial uncertainty. Cybernetics and Systems Analysis, 12(5):794-801, 1976.

[4] F. J. Solis and J. B. Wets. Minimization by random search techniques. Mathematics of Operations Research, 6:19-30, 1981.

[5] J. C. Spall. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. John Wiley and Sons, 2003.

[6] J. C. Spall. Handbook of Computational Statistics: Concepts and Methods, chapter 6: Stochastic Optimization, pages 169-198. Springer, 2004.

[7] R. C. White. A survey of random methods for parameter optimization. Simulation, 17(1):197-205, 1971.

[8] Z. B. Zabinsky. Stochastic Adaptive Search for Global Optimization. Kluwer Academic Publishers, 2003.

[9] A. A. Zhigljavsky. Theory of Global Random Search. Kluwer Academic, 1991.

2.3 Adaptive Random Search

Adaptive Random Search, ARS, Adaptive Step Size Random Search, ASSRS, Variable Step-Size Random Search.

2.3.1 Taxonomy

The Adaptive Random Search algorithm belongs to the general set of approaches known as Stochastic Optimization and Global Optimization. It is a direct search method in that it does not require derivatives to navigate the search space. Adaptive Random Search is an extension of the Random Search (Section 2.2) and Localized Random Search algorithms.

2.3.2 Strategy

The Adaptive Random Search algorithm was designed to address the limitations of the fixed step size in the Localized Random Search algorithm. The strategy for Adaptive Random Search is to continually approximate the optimal step size required to reach the global optimum in the search space. This is achieved by trialling and adopting smaller or larger step sizes only if they result in an improvement in the search performance.

The strategy of the Adaptive Step Size Random Search algorithm (the specific technique reviewed) is to trial a larger step in each iteration and adopt the larger step if it results in an improved result. Very large step sizes are trialled in the same manner, although with a much lower frequency. This strategy of preferring large moves is intended to allow the technique to escape local optima. Smaller step sizes are adopted if no improvement is made for an extended period.

2.3.3 Procedure

Algorithm 2.3.1 provides a pseudocode listing of the Adaptive Random Search algorithm for minimizing a cost function, based on the specification for Adaptive Step-Size Random Search by Schumer and Steiglitz [6].

Algorithm 2.3.1: Pseudocode for Adaptive Random Search.
  Input: Iter_max, Problem_size, SearchSpace, StepSize_init_factor, StepSize_small_factor, StepSize_large_factor, StepSize_iter_factor, NoChange_max
  Output: S
  1   NoChange_count ← 0;
  2   StepSize_i ← InitializeStepSize(SearchSpace, StepSize_init_factor);
  3   S ← RandomSolution(Problem_size, SearchSpace);
  4   for i = 0 to Iter_max do
  5       S_1 ← TakeStep(SearchSpace, S, StepSize_i);
  6       StepSize_large_i ← 0;
  7       if i mod StepSize_iter_factor then
  8           StepSize_large_i ← StepSize_i × StepSize_large_factor;
  9       else
  10          StepSize_large_i ← StepSize_i × StepSize_small_factor;
  11      end
  12      S_2 ← TakeStep(SearchSpace, S, StepSize_large_i);
  13      if Cost(S_1) ≤ Cost(S) ∨ Cost(S_2) ≤ Cost(S) then
  14          if Cost(S_2) < Cost(S_1) then
  15              S ← S_2;
  16              StepSize_i ← StepSize_large_i;
  17          else
  18              S ← S_1;
  19          end
  20          NoChange_count ← 0;
  21      else
  22          NoChange_count ← NoChange_count + 1;
  23          if NoChange_count > NoChange_max then
  24              NoChange_count ← 0;
  25              StepSize_i ← StepSize_i ÷ StepSize_small_factor;
  26          end
  27      end
  28  end
  29  return S;

2.3.4 Heuristics

- Adaptive Random Search was designed for continuous function optimization problem domains.

- Candidates with equal cost should be considered improvements to allow the algorithm to make progress across plateaus in the response surface.

- Adaptive Random Search may adapt the search direction in addition to the step size.

- The step size may be adapted for all parameters, or for each parameter individually.

2.3.5 Code Listing

Listing 2.2 provides an example of the Adaptive Random Search algorithm implemented in the Ruby Programming Language, based on the specification for Adaptive Step-Size Random Search by Schumer and Steiglitz [6]. In the example, the algorithm runs for a fixed number of iterations and returns the best candidate solution discovered. The example problem is an instance of a continuous function optimization that seeks min f(x) where f = Σ_{i=1}^{n} x_i^2, −5.0 ≤ x_i ≤ 5.0 and n = 2. The optimal solution for this basin function is (v_0, ..., v_{n−1}) = 0.0.

def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    rand_in_bounds(minmax[i][0], minmax[i][1])
  end
end

def take_step(minmax, current, step_size)
  position = Array.new(current.size)
  position.size.times do |i|
    min = [minmax[i][0], current[i]-step_size].max
    max = [minmax[i][1], current[i]+step_size].min
    position[i] = rand_in_bounds(min, max)
  end
  return position
end

def large_step_size(iter, step_size, s_factor, l_factor, iter_mult)
  return step_size * l_factor if iter>0 and iter.modulo(iter_mult) == 0
  return step_size * s_factor
end

def take_steps(bounds, current, step_size, big_stepsize)
  step, big_step = {}, {}
  step[:vector] = take_step(bounds, current[:vector], step_size)
  step[:cost] = objective_function(step[:vector])
  big_step[:vector] = take_step(bounds, current[:vector], big_stepsize)
  big_step[:cost] = objective_function(big_step[:vector])
  return step, big_step
end

def search(max_iter, bounds, init_factor, s_factor, l_factor, iter_mult, max_no_impr)
  step_size = (bounds[0][1]-bounds[0][0]) * init_factor
  current, count = {}, 0
  current[:vector] = random_vector(bounds)
  current[:cost] = objective_function(current[:vector])
  max_iter.times do |iter|
    big_stepsize = large_step_size(iter, step_size, s_factor, l_factor, iter_mult)
    step, big_step = take_steps(bounds, current, step_size, big_stepsize)
    if step[:cost] <= current[:cost] or big_step[:cost] <= current[:cost]
      if big_step[:cost] <= step[:cost]
        step_size, current = big_stepsize, big_step
      else
        current = step
      end
      count = 0
    else
      count += 1
      count, step_size = 0, (step_size/s_factor) if count >= max_no_impr
    end
    puts " > iteration #{(iter+1)}, best=#{current[:cost]}"
  end
  return current
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  bounds = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 1000
  init_factor = 0.05
  s_factor = 1.3
  l_factor = 3.0
  iter_mult = 10
  max_no_impr = 30
  # execute the algorithm
  best = search(max_iter, bounds, init_factor, s_factor, l_factor, iter_mult, max_no_impr)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.2: Adaptive Random Search in Ruby
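The interplay of the two step-size factors can be seen by calling large_step_size directly. This is an illustrative usage sketch assuming the definitions from Listing 2.2 are loaded; the factor values are the configuration used above.

# Illustrative usage of large_step_size from Listing 2.2: with
# s_factor=1.3 and l_factor=3.0, the trial step is 1.3x the current
# step on most iterations, and 3x on every iter_mult-th iteration.
step_size = 0.5
puts large_step_size(3, step_size, 1.3, 3.0, 10)   # => 0.65
puts large_step_size(10, step_size, 1.3, 3.0, 10)  # => 1.5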

2.3.6 References

Primary Sources

Many works in the 1960s and 1970s experimented with variable step sizes for Random Search methods. Schumer and Steiglitz are commonly credited with the adaptive step size procedure, which they called 'Adaptive Step-Size Random Search' [6]. Their approach only modifies the step size based on an approximation of the optimal step size required to reach the global optima. Kregting and White review adaptive random search methods and propose an approach called 'Adaptive Directional Random Search' that modifies both the algorithm's step size and direction in response to the cost function [2].

Learn More

White reviews extensions to Rastrigin's 'Creeping Random Search' [4] (fixed step size) that use probabilistic step sizes drawn stochastically from uniform and probabilistic distributions [7]. White also reviews works that propose dynamic control strategies for the step size, such as Karnopp [1], who proposes increases and decreases to the step size based on performance over very small numbers of trials. Schrack and Choit review random search methods that modify their step size in order to approximate optimal moves while searching, including the property of reversal [5]. Masri et al. describe an adaptive random search strategy that alternates between periods of fixed and variable step sizes [3].

2.3.7 Bibliography

[1] D. C. Karnopp. Random search techniques for optimization problems. Automatica, 1(2-3):111-121, 1963.

[2] J. Kregting and R. C. White. Adaptive random search. Technical Report TH-Report 71-E-24, Eindhoven University of Technology, Eindhoven, Netherlands, 1971.

[3] S. F. Masri, G. A. Bekey, and F. B. Safford. Global optimization algorithm using adaptive random search. Applied Mathematics and Computation, 7(4):353-376, 1980.

[4] L. A. Rastrigin. The convergence of the random search method in the extremal control of a many parameter system. Automation and Remote Control, 24:1337-1342, 1963.

[5] G. Schrack and M. Choit. Optimized relative step size random searches. Mathematical Programming, 10(1):230-244, 1976.

[6] M. Schumer and K. Steiglitz. Adaptive step size random search. IEEE Transactions on Automatic Control, 13(3):270-276, 1968.

[7] R. C. White. A survey of random methods for parameter optimization. Simulation, 17(1):197-205, 1971.

2.4 Stochastic Hill Climbing

Stochastic Hill Climbing, SHC, Random Hill Climbing, RHC, Random Mutation Hill Climbing, RMHC.

2.4.1 Taxonomy

The Stochastic Hill Climbing algorithm is a Stochastic Optimization algorithm and is a Local Optimization algorithm (contrasted to Global Optimization). It is a direct search technique, as it does not require derivatives of the search space. Stochastic Hill Climbing is an extension of deterministic hill climbing algorithms such as Simple Hill Climbing (first-best neighbor) and Steepest-Ascent Hill Climbing (best neighbor), and a parent of approaches such as Parallel Hill Climbing and Random-Restart Hill Climbing.

2.4.2 Strategy

The strategy of the Stochastic Hill Climbing algorithm is to iterate the process of randomly selecting a neighbor for a candidate solution and only accepting it if it results in an improvement. The strategy was proposed to address the limitations of deterministic hill climbing techniques that were likely to get stuck in local optima due to their greedy acceptance of neighboring moves.

2.4.3 Procedure

Algorithm 2.4.1 provides a pseudocode listing of the Stochastic Hill Climbing algorithm, specifically the Random Mutation Hill Climbing algorithm described by Forrest and Mitchell applied to a maximization optimization problem [3].

Algorithm 2.4.1: Pseudocode for Stochastic Hill Climbing.
  Input: Iter_max, ProblemSize
  Output: Current
  1  Current ← RandomSolution(ProblemSize);
  2  foreach iter_i ∈ Iter_max do
  3      Candidate ← RandomNeighbor(Current);
  4      if Cost(Candidate) ≥ Cost(Current) then
  5          Current ← Candidate;
  6      end
  7  end
  8  return Current;

2.4.4 Heuristics

- Stochastic Hill Climbing was designed to be used in discrete domains with explicit neighbors, such as combinatorial optimization (compared to continuous function optimization).

- The algorithm's strategy may be applied to continuous domains by making use of a step-size to define candidate-solution neighbors (such as Localized Random Search and Fixed Step-Size Random Search).

- Stochastic Hill Climbing is a local search technique (compared to global search) and may be used to refine a result after the execution of a global search algorithm.

- Even though the technique uses a stochastic process, it can still get stuck in local optima.

- Neighbors with better or equal cost should be accepted, allowing the technique to navigate across plateaus in the response surface.

- The algorithm can be restarted and repeated a number of times after it converges to provide an improved result (called Multiple Restart Hill Climbing; see the sketch after Listing 2.3).

- The procedure can be applied to multiple candidate solutions concurrently, allowing multiple algorithm runs to be performed at the same time (called Parallel Hill Climbing).

2.4.5 Code Listing

Listing 2.3 provides an example of the Stochastic Hill Climbing algorithm implemented in the Ruby Programming Language, specifically the Random Mutation Hill Climbing algorithm described by Forrest and Mitchell [3]. The algorithm is executed for a fixed number of iterations and is applied to a binary string optimization problem called 'One Max'. The objective of this maximization problem is to prepare a string of all '1' bits, where the cost function only reports the number of '1' bits in a given string.

def onemax(vector)
  return vector.inject(0.0){|sum, v| sum + ((v=="1") ? 1 : 0)}
end

def random_bitstring(num_bits)
  return Array.new(num_bits){|i| (rand<0.5) ? "1" : "0"}
end

def random_neighbor(bitstring)
  mutant = Array.new(bitstring)
  pos = rand(bitstring.size)
  mutant[pos] = (mutant[pos]=='1') ? '0' : '1'
  return mutant
end

def search(max_iterations, num_bits)
  candidate = {}
  candidate[:vector] = random_bitstring(num_bits)
  candidate[:cost] = onemax(candidate[:vector])
  max_iterations.times do |iter|
    neighbor = {}
    neighbor[:vector] = random_neighbor(candidate[:vector])
    neighbor[:cost] = onemax(neighbor[:vector])
    candidate = neighbor if neighbor[:cost] >= candidate[:cost]
    puts " > iteration #{(iter+1)}, best=#{candidate[:cost]}"
    break if candidate[:cost] == num_bits
  end
  return candidate
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_iterations = 1000
  # execute the algorithm
  best = search(max_iterations, num_bits)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].join}"
end

Listing 2.3: Stochastic Hill Climbing in Ruby
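The Multiple Restart heuristic mentioned above can be realized as a thin wrapper around the search function. This sketch is illustrative only and assumes the definitions from Listing 2.3 are loaded; multi_restart_search and its restarts parameter are not part of the original listing.

# Illustrative Multiple Restart wrapper around search from Listing 2.3
# (One Max is a maximization problem, hence the > comparison).
def multi_restart_search(restarts, max_iterations, num_bits)
  best = nil
  restarts.times do
    candidate = search(max_iterations, num_bits)
    best = candidate if best.nil? or candidate[:cost] > best[:cost]
  end
  return best
end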

2.4.6 References

Primary Sources

Perhaps the most popular implementation of the Stochastic Hill Climbing algorithm is by Forrest and Mitchell, who proposed the Random Mutation Hill Climbing (RMHC) algorithm (with communication from Richard Palmer) in a study that investigated the behavior of the genetic algorithm on a deceptive class of (discrete) bit-string optimization problems called 'royal road' functions [3]. The RMHC was compared to two other hill climbing algorithms in addition to the genetic algorithm, specifically: the Steepest-Ascent Hill Climber, and the Next-Ascent Hill Climber. This study was then followed up by Mitchell and Holland [5].

Juels and Wattenberg were also early to consider stochastic hill climbing as an approach to compare to the genetic algorithm [4]. Skalak applied the RMHC algorithm to a single long bit-string that represented a number of prototype vectors for use in classification [8].

Learn More

The Stochastic Hill Climbing algorithm is related to the genetic algorithm without crossover. Simplified versions of the approach are investigated for bit-string based optimization problems with the population size of the genetic algorithm reduced to one. The general technique has been investigated under the names Iterated Hillclimbing [6], ES(1+1,m,hc) [7], Random Bit Climber [2], and (1+1)-Genetic Algorithm [1]. The main difference between RMHC and ES(1+1) is that the latter uses a fixed probability of a mutation for each discrete element of a solution (meaning the neighborhood size is probabilistic), whereas RMHC will only stochastically modify one element.

2.4.7 Bibliography

[1] T. Bäck. Optimal mutation rates in genetic search. In Proceedings of the Fifth International Conference on Genetic Algorithms, pages 2-9, 1993.

[2] L. Davis. Bit-climbing, representational bias, and test suite design. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 18-23, 1991.

[3] S. Forrest and M. Mitchell. Relative building-block fitness and the building-block hypothesis. In Foundations of Genetic Algorithms 2, pages 109-126. Morgan Kaufmann, 1993.

[4] A. Juels and M. Wattenberg. Stochastic hill climbing as a baseline method for evaluating genetic algorithms. Technical report, University of California, Berkeley, 1994.

[5] M. Mitchell and J. H. Holland. When will a genetic algorithm outperform hill climbing? In Proceedings of the 5th International Conference on Genetic Algorithms. Morgan Kaufmann Publishers Inc., 1993.

[6] H. Mühlenbein. Evolution in time and space - the parallel genetic algorithm. In Foundations of Genetic Algorithms, 1991.

[7] H. Mühlenbein. How genetic algorithms really work: I. mutation and hillclimbing. In Parallel Problem Solving from Nature 2, pages 15-26, 1992.

[8] D. B. Skalak. Prototype and feature selection by sampling and random mutation hill climbing algorithms. In Proceedings of the Eleventh International Conference on Machine Learning, pages 293-301. Morgan Kaufmann, 1994.

2.5 Iterated Local Search

Iterated Local Search, ILS.

2.5.1 Taxonomy

Iterated Local Search is a Metaheuristic and a Global Optimization technique. It is an extension of Multi-Start Search and may be considered a parent of many two-phase search approaches such as the Greedy Randomized Adaptive Search Procedure (Section 2.8) and Variable Neighborhood Search (Section 2.7).

2.5.2 Strategy

The objective of Iterated Local Search is to improve upon stochastic Multi-Restart Search by sampling in the broader neighborhood of candidate solutions and using a Local Search technique to refine solutions to their local optima. Iterated Local Search explores a sequence of solutions created as perturbations of the current best solution, the result of which is refined using an embedded heuristic.

2.5.3 Procedure

Algorithm 2.5.1 provides a pseudocode listing of the Iterated Local Search algorithm for minimizing a cost function.

Algorithm 2.5.1: Pseudocode for Iterated Local Search.
  Input:
  Output: S_best
  1   S_best ← ConstructInitialSolution();
  2   S_best ← LocalSearch();
  3   SearchHistory ← S_best;
  4   while ¬StopCondition() do
  5       S_candidate ← Perturbation(S_best, SearchHistory);
  6       S_candidate ← LocalSearch(S_candidate);
  7       SearchHistory ← S_candidate;
  8       if AcceptanceCriterion(S_best, S_candidate, SearchHistory) then
  9           S_best ← S_candidate;
  10      end
  11  end
  12  return S_best;

2.5.4 Heuristics

- Iterated Local Search was designed for, and has been predominately applied to, discrete domains such as combinatorial optimization problems.

- The perturbation of the current best solution should be in a neighborhood beyond the reach of the embedded heuristic and should not be easily undone.

- Perturbations that are too small make the algorithm too greedy; perturbations that are too large make the algorithm too stochastic.

- The embedded heuristic is most commonly a problem-specific local search technique.

- The starting point for the search may be a randomly constructed candidate solution, or one constructed using a problem-specific heuristic (such as nearest neighbor).

- Perturbations can be made deterministically, although stochastic and probabilistic (adaptive based on history) perturbations are the most common.

- The procedure may store as much or as little history as needed to be used during perturbation and when testing the acceptance criterion. No history represents a random walk in a larger neighborhood of the best solution and is the most common implementation of the approach.

- The simplest and most common acceptance criterion is an improvement in the cost of constructed candidate solutions.

2.5.5 Code Listing

Listing 2.4 provides an example of the Iterated Local Search algorithm implemented in the Ruby Programming Language. The algorithm is applied to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken from the TSPLIB. The problem seeks a permutation of the order to visit cities (called a tour) that minimizes the total distance traveled. The optimal tour distance for the Berlin52 instance is 7542 units.

The Iterated Local Search runs for a fixed number of iterations. The implementation is based on a common algorithm configuration for the TSP, where a 'double-bridge move' (4-opt) is used as the perturbation technique, and a stochastic 2-opt is used as the embedded Local Search heuristic. The double-bridge move involves partitioning a permutation into 4 pieces (a,b,c,d) and putting it back together in a specific and jumbled ordering (a,d,c,b); a worked example follows Listing 2.4.

def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def local_search(best, cities, max_no_improv)
  count = 0
  begin
    candidate = {:vector=>stochastic_two_opt(best[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

def double_bridge_move(perm)
  pos1 = 1 + rand(perm.size / 4)
  pos2 = pos1 + 1 + rand(perm.size / 4)
  pos3 = pos2 + 1 + rand(perm.size / 4)
  p1 = perm[0...pos1] + perm[pos3..perm.size]
  p2 = perm[pos2...pos3] + perm[pos1...pos2]
  return p1 + p2
end

def perturbation(cities, best)
  candidate = {}
  candidate[:vector] = double_bridge_move(best[:vector])
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate
end

def search(cities, max_iterations, max_no_improv)
  best = {}
  best[:vector] = random_permutation(cities)
  best[:cost] = cost(best[:vector], cities)
  best = local_search(best, cities, max_no_improv)
  max_iterations.times do |iter|
    candidate = perturbation(cities, best)
    candidate = local_search(candidate, cities, max_no_improv)
    best = candidate if candidate[:cost] < best[:cost]
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 100
  max_no_improv = 50
  # execute the algorithm
  best = search(berlin52, max_iterations, max_no_improv)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.4: Iterated Local Search in Ruby
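To make the (a,d,c,b) reordering concrete, the fragment below applies the same recombination as double_bridge_move from Listing 2.4, but with fixed (illustrative) cut points rather than the random ones used by the listing.

# Worked double-bridge example with fixed cut points (illustrative
# only; double_bridge_move chooses pos1..pos3 at random).
perm = (0...12).to_a
pos1, pos2, pos3 = 3, 6, 9
# pieces: a=[0,1,2], b=[3,4,5], c=[6,7,8], d=[9,10,11]
p perm[0...pos1] + perm[pos3..perm.size] + perm[pos2...pos3] + perm[pos1...pos2]
# => [0, 1, 2, 9, 10, 11, 6, 7, 8, 3, 4, 5]  (a,d,c,b)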

2.5.6 References

Primary Sources

The definition and framework for Iterated Local Search was described by Stützle in his PhD dissertation [12]. Specifically he proposed constraints on what constitutes an Iterated Local Search algorithm as 1) a single chain of candidate solutions, and 2) the method used to improve candidate solutions occurs within a reduced space by a black-box heuristic. Stützle does not take credit for the approach, instead highlighting specific instances of Iterated Local Search from the literature, such as 'iterated descent' [1], 'large-step Markov chains' [7], 'iterated Lin-Kernighan' [3], and 'chained local optimization' [6], as well as [2] that introduces the principle, and [4] that summarized it (list taken from [8]).

Learn More

Two early technical reports by Stützle that present applications of Iterated Local Search include a report on the Quadratic Assignment Problem [10], and another on the permutation flow shop problem [9]. Stützle and Hoos also published an early paper studying Iterated Local Search for the TSP [11]. Lourenco, Martin, and Stützle provide a concise presentation of the technique, related techniques and the framework, much as it is presented in Stützle's dissertation [5]. The same authors also present an authoritative summary of the approach and its applications as a book chapter [8].

2.5.7 Bibliography

[1] E. B. Baum. Towards practical neural computation for combinatorial optimization problems. In AIP Conference Proceedings: Neural Networks for Computing, pages 53-64, 1986.

[2] J. Baxter. Local optima avoidance in depot location. Journal of the Operational Research Society, 32:815-819, 1981.

[3] D. S. Johnson. Local optimization and the travelling salesman problem. In Proceedings of the 17th Colloquium on Automata, Languages, and Programming, pages 446-461, 1990.

[4] D. S. Johnson and L. A. McGeoch. Local Search in Combinatorial Optimization, chapter The travelling salesman problem: A case study in local optimization, pages 215-310. John Wiley & Sons, 1997.

[5] H. R. Lourenco, O. Martin, and T. Stützle. A beginner's introduction to iterated local search. In Proceedings of the 4th Metaheuristics International Conference (MIC2001), 2001.

[6] O. Martin and S. W. Otto. Combining simulated annealing with local search heuristics. Annals of Operations Research, 63:57-75, 1996.

[7] O. Martin, S. W. Otto, and E. W. Felten. Large-step markov chains for the traveling salesman problem. Complex Systems, 5(3):299-326, 1991.

[8] H. Ramalhinho-Lourenco, O. C. Martin, and T. Stützle. Handbook of Metaheuristics, chapter Iterated Local Search, pages 320-353. Springer, 2003.

[9] T. Stützle. Applying iterated local search to the permutation flow shop problem. Technical Report AIDA-98-04, FG Intellektik, TU Darmstadt, 1998.

[10] T. Stützle. Iterated local search for the quadratic assignment problem. Technical Report AIDA-99-03, FG Intellektik, FB Informatik, TU Darmstadt, 1999.

[11] T. Stützle and H. H. Hoos. Analyzing the run-time behaviour of iterated local search for the TSP. In Proceedings of the III Metaheuristics International Conference, 1999.

[12] T. G. Stützle. Local Search Algorithms for Combinatorial Problems: Analysis, Improvements, and New Applications. PhD thesis, Darmstadt University of Technology, Department of Computer Science, 1998.

2.6 Guided Local Search

Guided Local Search, GLS.

2.6.1 Taxonomy

The Guided Local Search algorithm is a Metaheuristic and a Global Optimization algorithm that makes use of an embedded Local Search algorithm. It is an extension to Local Search algorithms such as Hill Climbing (Section 2.4) and is similar in strategy to the Tabu Search algorithm (Section 2.10) and the Iterated Local Search algorithm (Section 2.5).

2.6.2 Strategy

The strategy for the Guided Local Search algorithm is to use penalties to encourage a Local Search technique to escape local optima and discover the global optima. A Local Search algorithm is run until it gets stuck in a local optimum. The features from the local optimum are evaluated and penalized, the results of which are used in an augmented cost function employed by the Local Search procedure. The Local Search is repeated a number of times using the last local optimum discovered and the augmented cost function that guides exploration away from solutions with features present in discovered local optima.

2.6.3 Procedure

Algorithm 2.6.1 provides a pseudocode listing of the Guided Local Search algorithm for minimization. The Local Search algorithm used by the Guided Local Search algorithm uses an augmented cost function of the form h(s) = g(s) + λ · Σ_{i=1}^{M} f_i, where h(s) is the augmented cost function, g(s) is the problem cost function, λ is the 'regularization parameter' (a coefficient for scaling the penalties), s is a locally optimal solution of M features, and f_i is the i-th feature in the locally optimal solution. The augmented cost function is only used by the local search procedure; the Guided Local Search algorithm uses the problem-specific cost function without augmentation.

Penalties are only updated for those features in a locally optimal solution that maximize utility, updated by adding 1 to the penalty for the feature (a counter). The utility for a feature is calculated as U_feature = C_feature / (1 + P_feature), where U_feature is the utility for penalizing a feature (maximizing), C_feature is the cost of the feature, and P_feature is the current penalty for the feature.

Algorithm 2.6.1: Pseudocode for Guided Local Search.
  Input: Iter_max, λ
  Output: S_best
  1   f_penalties ← ∅;
  2   S_best ← RandomSolution();
  3   foreach Iter_i ∈ Iter_max do
  4       S_curr ← LocalSearch(S_best, λ, f_penalties);
  5       f_utilities ← CalculateFeatureUtilities(S_curr, f_penalties);
  6       f_penalties ← UpdateFeaturePenalties(S_curr, f_penalties, f_utilities);
  7       if Cost(S_curr) ≤ Cost(S_best) then
  8           S_best ← S_curr;
  9       end
  10  end
  11  return S_best;

2.6.4 Heuristics

- The Guided Local Search procedure is independent of the Local Search procedure embedded within it. A suitable domain-specific search procedure should be identified and employed.

- The Guided Local Search procedure may need to be executed for thousands to hundreds-of-thousands of iterations, each iteration of which assumes a run of a Local Search algorithm to convergence.

- The algorithm was designed for discrete optimization problems where a solution is comprised of independently assessable features, such as Combinatorial Optimization, although it has been applied to continuous function optimization modeled as binary strings.

- The λ parameter is a scaling factor for feature penalization that must be in the same proportion to the candidate solution costs from the specific problem instance to which the algorithm is being applied. As such, the value for λ must be meaningful when used within the augmented cost function (such as when it is added to a candidate solution cost in minimization and subtracted from a cost in the case of a maximization problem).

2.6.5 Code Listing

Listing 2.5 provides an example of the Guided Local Search algorithm implemented in the Ruby Programming Language. The algorithm is applied to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken from the TSPLIB. The problem seeks a permutation of the order to visit cities (called a tour) that minimizes the total distance traveled. The optimal tour distance for the Berlin52 instance is 7542 units.

The implementation of the algorithm for the TSP was based on the configuration specified by Voudouris in [7]. A TSP-specific local search algorithm is used called 2-opt that selects two points in a permutation and reconnects the tour, potentially untwisting the tour at the selected points. The stopping condition for 2-opt was configured to be a fixed number of non-improving moves.

The equation for setting λ for TSP instances is λ = α · cost(optima) / N, where N is the number of cities, cost(optima) is the cost of a local optimum found by a local search, and α ∈ (0,1] (around 0.3 for TSP and 2-opt). The cost of a local optimum was fixed to the approximated value of 12000 for the Berlin52 instance (the local_search_optima constant in Listing 2.5). The utility function for features (edges) in the TSP is U_edge = D_edge / (1 + P_edge), where U_edge is the utility for penalizing an edge (maximizing), D_edge is the cost of the edge (distance between cities), and P_edge is the current penalty for the edge.
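Worked example, using the configuration values from Listing 2.5: with α = 0.3, cost(optima) = 12000, and N = 52 cities, λ = 0.3 × 12000 / 52 ≈ 69.2. An unpenalized edge contributes only its distance to the augmented cost, while each unit of penalty on an edge adds a further 69.2 units, quickly making repeatedly-penalized edges unattractive to the embedded 2-opt search.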

def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def augmented_cost(permutation, penalties, cities, lambda)
  distance, augmented = 0, 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    d = euc_2d(cities[c1], cities[c2])
    distance += d
    augmented += d + (lambda * (penalties[c1][c2]))
  end
  return [distance, augmented]
end

def cost(cand, penalties, cities, lambda)
  cost, acost = augmented_cost(cand[:vector], penalties, cities, lambda)
  cand[:cost], cand[:aug_cost] = cost, acost
end

def local_search(current, cities, penalties, max_no_improv, lambda)
  cost(current, penalties, cities, lambda)
  count = 0
  begin
    candidate = {:vector=> stochastic_two_opt(current[:vector])}
    cost(candidate, penalties, cities, lambda)
    count = (candidate[:aug_cost] < current[:aug_cost]) ? 0 : count+1
    current = candidate if candidate[:aug_cost] < current[:aug_cost]
  end until count >= max_no_improv
  return current
end

def calculate_feature_utilities(penal, cities, permutation)
  utilities = Array.new(permutation.size,0)
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    utilities[i] = euc_2d(cities[c1], cities[c2]) / (1.0 + penal[c1][c2])
  end
  return utilities
end

def update_penalties!(penalties, cities, permutation, utilities)
  max = utilities.max()
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    penalties[c1][c2] += 1 if utilities[i] == max
  end
  return penalties
end

def search(max_iterations, cities, max_no_improv, lambda)
  current = {:vector=>random_permutation(cities)}
  best = nil
  penalties = Array.new(cities.size){ Array.new(cities.size, 0) }
  max_iterations.times do |iter|
    current = local_search(current, cities, penalties, max_no_improv, lambda)
    utilities = calculate_feature_utilities(penalties, cities, current[:vector])
    update_penalties!(penalties, cities, current[:vector], utilities)
    best = current if best.nil? or current[:cost] < best[:cost]
    puts " > iter=#{(iter+1)}, best=#{best[:cost]}, aug=#{best[:aug_cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 150
  max_no_improv = 20
  alpha = 0.3
  local_search_optima = 12000.0
  lambda = alpha * (local_search_optima/berlin52.size.to_f)
  # execute the algorithm
  best = search(max_iterations, berlin52, max_no_improv, lambda)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.5: Guided Local Search in Ruby

2.6.6 References

Primary Sources

Guided Local Search emerged from an approach called GENET, which is a connectionist approach to constraint satisfaction [6, 13]. Guided Local Search was presented by Voudouris and Tsang in a series of technical reports (that were later published) that described the technique and provided example applications of it to constraint satisfaction [8], combinatorial optimization [5, 10], and function optimization [9]. The seminal work on the technique was Voudouris' PhD dissertation [7].

Learn More

Voudouris and Tsang provide a high-level introduction to the technique [11], and a contemporary summary of the approach in Glover and Kochenberger's Handbook of Metaheuristics [12] that includes a review of the technique, application areas, and demonstration applications on a diverse set of problem instances. Mills et al. elaborated on the approach, devising an Extended Guided Local Search (EGLS) technique that added 'aspiration criteria' and random moves to the procedure [4], work which culminated in Mills' PhD dissertation [3]. Lau and Tsang further extended the approach by integrating it with a Genetic Algorithm, called the 'Guided Genetic Algorithm' (GGA) [2], which also culminated in a PhD dissertation by Lau [1].

2.6.7 Bibliography

[1] T. L. Lau. Guided Genetic Algorithm. PhD thesis, Department of Computer Science, University of Essex, 1999.

[2] T. L. Lau and E. P. K. Tsang. The guided genetic algorithm and its application to the general assignment problems. In IEEE 10th International Conference on Tools with Artificial Intelligence (ICTAI'98), 1998.

[3] P. Mills. Extensions to Guided Local Search. PhD thesis, Department of Computer Science, University of Essex, 2002.

[4] P. Mills, E. Tsang, and J. Ford. Applying an extended guided local search on the quadratic assignment problem. Annals of Operations Research, 118:121-135, 2003.

[5] E. Tsang and C. Voudouris. Fast local search and guided local search and their application to British Telecom's workforce scheduling problem. Technical Report CSM-246, Department of Computer Science, University of Essex, Colchester CO4 3SQ, UK, 1995.

[6] E. P. K. Tsang and C. J. Wang. A generic neural network approach for constraint satisfaction problems. In Taylor G, editor, Neural Network Applications, pages 12-22, 1992.

[7] C. Voudouris. Guided local search for combinatorial optimisation problems. PhD thesis, Department of Computer Science, University of Essex, Colchester, UK, July 1997.

[8] C. Voudouris and E. Tsang. The tunneling algorithm for partial CSPs and combinatorial optimization problems. Technical Report CSM-213, Department of Computer Science, University of Essex, Colchester CO4 3SQ, UK, 1994.

[9] C. Voudouris and E. Tsang. Function optimization using guided local search. Technical Report CSM-249, Department of Computer Science, University of Essex, Colchester CO4 3SQ, UK, 1995.

[10] C. Voudouris and E. Tsang. Guided local search. Technical Report CSM-247, Department of Computer Science, University of Essex, Colchester CO4 3SQ, UK, 1995.

[11] C. Voudouris and E. P. K. Tsang. Guided local search joins the elite in discrete optimisation. In Proceedings, DIMACS Workshop on Constraint Programming and Large Scale Discrete Optimisation, 1998.

[12] C. Voudouris and E. P. K. Tsang. Handbook of Metaheuristics, chapter 7: Guided Local Search, pages 185-218. Springer, 2003.

[13] C. J. Wang and E. P. K. Tsang. Solving constraint satisfaction problems using neural networks. In Proceedings of the Second International Conference on Artificial Neural Networks, pages 295-299, 1991.

2.7 Variable Neighborhood Search

Variable Neighborhood Search, VNS.

2.7.1 Taxonomy

Variable Neighborhood Search is a Metaheuristic and a Global Optimization technique that manages a Local Search technique. It is related to the Iterated Local Search algorithm (Section 2.5).

2.7.2 Strategy

The strategy for Variable Neighborhood Search involves iterative exploration of larger and larger neighborhoods for a given local optimum until an improvement is located, after which time the search across expanding neighborhoods is repeated. The strategy is motivated by three principles: 1) a local minimum for one neighborhood structure may not be a local minimum for a different neighborhood structure, 2) a global minimum is a local minimum for all possible neighborhood structures, and 3) local minima are relatively close to global minima for many problem classes.

2.7.3 Procedure

Algorithm 2.7.1 provides a pseudocode listing of the Variable Neighborhood Search algorithm for minimizing a cost function. The pseudocode shows that the systematic search of expanding neighborhoods for a local optimum is abandoned when a global improvement is achieved (shown with the Break jump).

Algorithm 2.7.1: Pseudocode for VNS.
  Input: Neighborhoods
  Output: S_best
  1   S_best ← RandomSolution();
  2   while ¬StopCondition() do
  3       foreach Neighborhood_i ∈ Neighborhoods do
  4           Neighborhood_curr ← CalculateNeighborhood(S_best, Neighborhood_i);
  5           S_candidate ← RandomSolutionInNeighborhood(Neighborhood_curr);
  6           S_candidate ← LocalSearch(S_candidate);
  7           if Cost(S_candidate) < Cost(S_best) then
  8               S_best ← S_candidate;
  9               Break;
  10          end
  11      end
  12  end
  13  return S_best;

2.7.4 Heuristics

- Approximation methods (such as stochastic hill climbing) are suggested for use as the Local Search procedure for large problem instances in order to reduce the running time.

- Variable Neighborhood Search has been applied to a very wide array of combinatorial optimization problems as well as clustering and continuous function optimization problems.

- The embedded Local Search technique should be specialized to the problem type and instance to which the technique is being applied.

- Variable Neighborhood Descent (VND) can be embedded in the Variable Neighborhood Search as the Local Search procedure and has been shown to be most effective.

2.7.5 Code Listing

Listing 2.6 provides an example of the Variable Neighborhood Search algorithm implemented in the Ruby Programming Language. The algorithm is applied to the Berlin52 instance of the Traveling Salesman Problem (TSP), taken from the TSPLIB. The problem seeks a permutation of the order to visit cities (called a tour) that minimizes the total distance traveled. The optimal tour distance for the Berlin52 instance is 7542 units.

The Variable Neighborhood Search uses a stochastic 2-opt procedure as the embedded local search. The procedure deletes two edges and reverses the sequence in-between the deleted edges, potentially removing twists in the tour. The neighborhood structure used in the search is the number of times the 2-opt procedure is performed on a permutation, between 1 and 19 times (the Range 1...20 in the configuration excludes its upper bound). The stopping condition for the local search procedure is a maximum number of iterations without improvement. The same stop condition is employed by the higher-order Variable Neighborhood Search procedure, although with a lower boundary on the number of non-improving iterations.

def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

def stochastic_two_opt!(perm)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def local_search(best, cities, max_no_improv, neighborhood)
  count = 0
  begin
    candidate = {}
    candidate[:vector] = Array.new(best[:vector])
    neighborhood.times{stochastic_two_opt!(candidate[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    if candidate[:cost] < best[:cost]
      count, best = 0, candidate
    else
      count += 1
    end
  end until count >= max_no_improv
  return best
end

def search(cities, neighborhoods, max_no_improv, max_no_improv_ls)
  best = {}
  best[:vector] = random_permutation(cities)
  best[:cost] = cost(best[:vector], cities)
  iter, count = 0, 0
  begin
    neighborhoods.each do |neigh|
      candidate = {}
      candidate[:vector] = Array.new(best[:vector])
      neigh.times{stochastic_two_opt!(candidate[:vector])}
      candidate[:cost] = cost(candidate[:vector], cities)
      candidate = local_search(candidate, cities, max_no_improv_ls, neigh)
      puts " > iteration #{(iter+1)}, neigh=#{neigh}, best=#{best[:cost]}"
      iter += 1
      if(candidate[:cost] < best[:cost])
        best, count = candidate, 0
        puts "New best, restarting neighborhood search."
        break
      else
        count += 1
      end
    end
  end until count >= max_no_improv
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_no_improv = 50
  max_no_improv_ls = 70
  neighborhoods = 1...20
  # execute the algorithm
  best = search(berlin52, neighborhoods, max_no_improv, max_no_improv_ls)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.6: Variable Neighborhood Search in Ruby
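The k-th neighborhood in Listing 2.6 is simply k stacked applications of the stochastic 2-opt move. The minimal helper below makes this explicit; it is illustrative only (not part of the original listing) and assumes the definitions from Listing 2.6 are loaded.

# Illustrative only: sample one point from the k-th neighborhood of a
# tour by applying the stochastic 2-opt move k times to a copy.
def neighborhood_sample(vector, k)
  candidate = Array.new(vector)
  k.times { stochastic_two_opt!(candidate) }
  return candidate
end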

2.7.6 References

Primary Sources

The seminal paper describing Variable Neighborhood Search was by Mladenovic and Hansen in 1997 [7], although an early abstract by Mladenovic is sometimes cited [6]. The approach is explained in terms of three different variations on the general theme. Variable Neighborhood Descent (VND) refers to the use of a Local Search procedure and the deterministic (as opposed to stochastic or probabilistic) change of neighborhood size. Reduced Variable Neighborhood Search (RVNS) involves performing a stochastic random search within a neighborhood and no refinement via a local search technique. Basic Variable Neighborhood Search is the canonical approach described by Mladenovic and Hansen in the seminal paper.

Learn More

There are a large number of papers published on Variable Neighborhood Search, its applications and variations. Hansen and Mladenovic provide an overview of the approach that includes its recent history, extensions and a detailed review of the numerous areas of application [4]. For some additional useful overviews of the technique, its principles, and applications, see [1-3].

There are many extensions to Variable Neighborhood Search. Some popular examples include: Variable Neighborhood Decomposition Search (VNDS) that involves embedding a second heuristic or metaheuristic approach in VNS to replace the Local Search procedure [5], Skewed Variable Neighborhood Search (SVNS) that encourages exploration of neighborhoods far away from discovered local optima, and Parallel Variable Neighborhood Search (PVNS) that either parallelizes the local search of a neighborhood or parallelizes the searching of the neighborhoods themselves.

2.7.7 Bibliography

[1] P. Hansen and N. Mladenovic. Meta-heuristics, Advances and Trends in Local Search Paradigms for Optimization, chapter An introduction to Variable Neighborhood Search, pages 433-458. Kluwer Academic Publishers, 1998.

[2] P. Hansen and N. Mladenovic. Variable neighborhood search: Principles and applications. European Journal of Operational Research, 130(3):449-467, 2001.

[3] P. Hansen and N. Mladenovic. Handbook of Applied Optimization, chapter Variable neighbourhood search, pages 221-234. Oxford University Press, 2002.

[4] P. Hansen and N. Mladenovic. Handbook of Metaheuristics, chapter 6: Variable Neighborhood Search, pages 145-184. Springer, 2003.

[5] P. Hansen, N. Mladenovic, and D. Perez-Britos. Variable neighborhood decomposition search. Journal of Heuristics, 7(4):335-350, 2001.

[6] N. Mladenovic. A variable neighborhood algorithm - a new metaheuristic for combinatorial optimization. In Abstracts of Papers Presented at Optimization Days, 1995.

[7] N. Mladenovic and P. Hansen. Variable neighborhood search. Computers & Operations Research, 24(11):1097-1100, 1997.

2.8 Greedy Randomized Adaptive Search

Greedy Randomized Adaptive Search Procedure, GRASP.

2.8.1 Taxonomy

The Greedy Randomized Adaptive Search Procedure is a Metaheuristic and Global Optimization algorithm, originally proposed for Operations Research practitioners. The iterative application of an embedded Local Search technique relates the approach to Iterated Local Search (Section 2.5) and Multi-Start techniques.

2.8.2 Strategy

The objective of the Greedy Randomized Adaptive Search Procedure is to repeatedly sample stochastically greedy solutions, and then use a local search procedure to refine them to a local optimum. The strategy of the procedure is centered on the stochastic and greedy step-wise construction mechanism that constrains the selection and order-of-inclusion of the components of a solution based on the value they are expected to provide.

2.8.3 Procedure

Algorithm 2.8.1 provides a pseudocode listing of the Greedy Randomized Adaptive Search Procedure for minimizing a cost function.

Algorithm 2.8.1: Pseudocode for the GRASP.
  Input:
  Output: S_best
  1  S_best ← ConstructRandomSolution();
  2  while ¬StopCondition() do
  3      S_candidate ← GreedyRandomizedConstruction();
  4      S_candidate ← LocalSearch(S_candidate);
  5      if Cost(S_candidate) < Cost(S_best) then
  6          S_best ← S_candidate;
  7      end
  8  end
  9  return S_best;

Algorithm 2.8.2 provides the pseudocode for the Greedy Randomized Construction function. The function involves the step-wise construction of a candidate solution using a stochastically greedy construction process. The function works by building a Restricted Candidate List (RCL) that constrains the components of a solution (features) that may be selected from in each cycle.

Algorithm 2.8.2: Pseudocode for the GreedyRandomizedConstruction function.
  Input: α
  Output: S_candidate
  1   S_candidate ← ∅;
  2   while Size(S_candidate) < ProblemSize do
  3       Feature_costs ← ∅;
  4       for Feature_i ∉ S_candidate do
  5           Feature_costs ← CostOfAddingFeatureToSolution(S_candidate, Feature_i);
  6       end
  7       RCL ← ∅;
  8       Fcost_min ← MinCost(Feature_costs);
  9       Fcost_max ← MaxCost(Feature_costs);
  10      for Fi_cost ∈ Feature_costs do
  11          if Fi_cost ≤ Fcost_min + α · (Fcost_max − Fcost_min) then
  12              RCL ← Feature_i;
  13          end
  14      end
  15      S_candidate ← SelectRandomFeature(RCL);
  16  end
  17  return S_candidate;

The RCL may be constrained by an explicit size, or by using a threshold (α ∈ [0,1]) on the cost of adding each feature to the current candidate solution.

2.8.4 Heuristics

- The α threshold defines the amount of greediness of the construction mechanism, where values close to 0 may be too greedy, and values close to 1 may be too generalized (a worked example follows this list).

- As an alternative to using the α threshold, the RCL can be constrained to the top n% of candidate features that may be selected from each construction cycle.

- The technique was designed for discrete problem classes such as combinatorial optimization problems.
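Worked example of the RCL threshold (with illustrative values, not taken from the listing): suppose the costs of adding the remaining features range from Fcost_min = 10 to Fcost_max = 50. With α = 0.3 the threshold is 10 + 0.3 × (50 − 10) = 22, so only features whose insertion cost is at most 22 enter the RCL, and one of them is selected uniformly at random. With α = 0 only minimum-cost features qualify (pure greedy construction); with α = 1 every feature qualifies (pure random construction).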

44

43

42

41

40

39

38

37

36

35

34

33

32

31

30

29

28

27

26

25

24

23

22

21

20

19

18

17

16

15

14

13

12

11

10

Chapter 2. Stochastic Algorithms

2.8.5 Code Listing

Listing 2.7 provides an example of the Greedy Randomized Adaptive Search
Procedure implemented in the Ruby Programming Language. The algorithm is
applied to the Berlin52 instance of the Traveling Salesman Problem (TSP),
taken from the TSPLIB. The problem seeks a permutation of the order to visit
cities (called a tour) that minimizes the total distance traveled. The
optimal tour distance for the Berlin52 instance is 7542 units.

The stochastic and greedy step-wise construction of a tour involves
evaluating candidate cities by the cost they contribute as the next city in
the tour. The algorithm uses a stochastic 2-opt procedure for the Local
Search with a fixed number of non-improving iterations as the stopping
condition.

def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def local_search(best, cities, max_no_improv)
  count = 0
  begin
    candidate = {:vector=>stochastic_two_opt(best[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

def construct_randomized_greedy_solution(cities, alpha)
  candidate = {}
  candidate[:vector] = [rand(cities.size)]
  allCities = Array.new(cities.size) {|i| i}
  while candidate[:vector].size < cities.size
    candidates = allCities - candidate[:vector]
    costs = Array.new(candidates.size) do |i|
      euc_2d(cities[candidate[:vector].last], cities[candidates[i]])
    end
    rcl, max, min = [], costs.max, costs.min
    costs.each_with_index do |c,i|
      rcl << candidates[i] if c <= (min + alpha*(max-min))
    end
    candidate[:vector] << rcl[rand(rcl.size)]
  end
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate
end

def search(cities, max_iter, max_no_improv, alpha)
  best = nil
  max_iter.times do |iter|
    candidate = construct_randomized_greedy_solution(cities, alpha)
    candidate = local_search(candidate, cities, max_no_improv)
    best = candidate if best.nil? or candidate[:cost] < best[:cost]
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 50
  max_no_improv = 50
  greediness_factor = 0.3
  # execute the algorithm
  best = search(berlin52, max_iter, max_no_improv, greediness_factor)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.7: Greedy Randomized Adaptive Search Procedure in Ruby

2.8.6 References

Primary Sources

The seminal paper that introduces the general approach of stochastic and
greedy step-wise construction of candidate solutions is by Feo and Resende
[3]. The general approach was inspired by greedy heuristics by Hart and
Shogan [9]. The seminal review paper that is cited with the preliminary
paper is by Feo and Resende [4], and provides a coherent description of the
GRASP technique, an example, and a review of early applications. An early
application was by Feo, Venkatraman and Bard for a machine scheduling
problem [7]. Other early applications to scheduling problems include
technical reports [2] (later published as [1]) and [5] (also later published
as [6]).

Learn More

There are a vast number of review, application, and extension papers for
GRASP. Pitsoulis and Resende provide an extensive contemporary overview of
the field as a review chapter [11], as do Resende and Ribeiro, who include a
clear presentation of the use of the α threshold parameter instead of a
fixed size for the RCL [13]. Festa and Resende provide an annotated
bibliography as a review chapter that gives some needed insight into the
large amount of study that has gone into the approach [8]. There are
numerous extensions to GRASP, not limited to the popular Reactive GRASP for
adapting α [12], the use of long-term memory to allow the technique to learn
from candidate solutions discovered in previous iterations, and parallel
implementations of the procedure such as Parallel GRASP [10].


2.8.7 Bibliography

[1] J. F. Bard, T. A. Feo, and S. Holland. A GRASP for scheduling printed
wiring board assembly. I.I.E. Trans., 28:155–165, 1996.

[2] T. A. Feo, J. Bard, and S. Holland. A GRASP for scheduling printed
wiring board assembly. Technical Report TX 78712-1063, Operations Research
Group, Department of Mechanical Engineering, The University of Texas at
Austin, 1993.

[3] T. A. Feo and M. G. C. Resende. A probabilistic heuristic for a
computationally difficult set covering problem. Operations Research Letters,
8:67–71, 1989.

[4] T. A. Feo and M. G. C. Resende. Greedy randomized adaptive search
procedures. Journal of Global Optimization, 6:109–133, 1995.

[5] T. A. Feo, K. Sarathy, and J. McGahan. A GRASP for single machine
scheduling with sequence dependent setup costs and linear delay penalties.
Technical Report TX 78712-1063, Operations Research Group, Department of
Mechanical Engineering, The University of Texas at Austin, 1994.

[6] T. A. Feo, K. Sarathy, and J. McGahan. A GRASP for single machine
scheduling with sequence dependent setup costs and linear delay penalties.
Computers & Operations Research, 23(9):881–895, 1996.

[7] T. A. Feo, K. Venkatraman, and J. F. Bard. A GRASP for a difficult
single machine scheduling problem. Computers & Operations Research,
18:635–643, 1991.

[8] P. Festa and M. G. C. Resende. Essays and Surveys on Metaheuristics,
chapter GRASP: An annotated bibliography, pages 325–367. Kluwer Academic
Publishers, 2002.

[9] J. P. Hart and A. W. Shogan. Semi-greedy heuristics: An empirical
study. Operations Research Letters, 6:107–114, 1987.

[10] P. M. Pardalos, L. S. Pitsoulis, and M. G. C. Resende. A parallel
GRASP implementation for the quadratic assignment problem. In Parallel
Algorithms for Irregularly Structured Problems (Irregular94), pages 111–130.
Kluwer Academic Publishers, 1995.

[11] L. Pitsoulis and M. G. C. Resende. Handbook of Applied Optimization,
chapter Greedy randomized adaptive search procedures, pages 168–181. Oxford
University Press, 2002.

[12] M. Prais and C. C. Ribeiro. Reactive GRASP: An application to a matrix
decomposition problem in TDMA traffic assignment. INFORMS Journal on
Computing, 12:164–176, 2000.

[13] M. G. C. Resende and C. C. Ribeiro. Handbook of Metaheuristics, chapter
Greedy randomized adaptive search procedures, pages 219–249. Kluwer Academic
Publishers, 2003.


2.9 Scatter Search

Scatter Search, SS.

2.9.1 Taxonomy

Scatter Search is a Metaheuristic and a Global Optimization algorithm. It is
also sometimes associated with the field of Evolutionary Computation given
the use of a population and recombination in the structure of the technique.
Scatter Search is a sibling of Tabu Search (Section 2.10), developed by the
same author and based on similar origins.

2.9.2 Strategy

The objective of Scatter Search is to maintain a set of diverse and
high-quality candidate solutions. The principle of the approach is that
useful information about the global optima is stored in a diverse and elite
set of solutions (the reference set) and that recombining samples from the
set can exploit this information. The strategy involves an iterative
process, where a population of diverse and high-quality candidate solutions
is partitioned into subsets and linearly recombined to create weighted
centroids of sample-based neighborhoods. The results of recombination are
refined using an embedded heuristic and assessed in the context of the
reference set as to whether or not they are retained.

2.9.3 Procedure

Algorithm 2.9.1 provides a pseudocode listing of the Scatter Search
algorithm for minimizing a cost function. The procedure is based on the
abstract form presented by Glover as a template for the general class of
technique [3], with influences from an application of the technique to
function optimization by Glover [3].

Algorithm 2.9.1: Pseudocode for Scatter Search.
Input: DiverseSet_size, ReferenceSet_size
Output: ReferenceSet
1   InitialSet ← ConstructInitialSolution(DiverseSet_size);
2   RefinedSet ← ∅;
3   for S_i ∈ InitialSet do
4       RefinedSet ← LocalSearch(S_i);
5   end
6   ReferenceSet ← SelectInitialReferenceSet(ReferenceSet_size);
7   while ¬StopCondition() do
8       Subsets ← SelectSubset(ReferenceSet);
9       CandidateSet ← ∅;
10      for Subset_i ∈ Subsets do
11          RecombinedCandidates ← RecombineMembers(Subset_i);
12          for S_i ∈ RecombinedCandidates do
13              CandidateSet ← LocalSearch(S_i);
14          end
15      end
16      ReferenceSet ← Select(ReferenceSet, CandidateSet, ReferenceSet_size);
17  end
18  return ReferenceSet;

2.9.4 Heuristics

• Scatter Search is suitable both for discrete domains, such as
combinatorial optimization, and for continuous domains, such as non-linear
programming (continuous function optimization).

• Small set sizes are preferred for the ReferenceSet, such as 10 or 20
members.

• Subset sizes can be 2, 3, 4 or more members that are all recombined to
produce viable candidate solutions within the neighborhood of the members of
the subset.

• Each subset should comprise at least one member added to the set in the
previous algorithm iteration.

• The Local Search procedure should be a problem-specific improvement
heuristic.

• The selection of members for the ReferenceSet at the end of each
iteration favors solutions with higher quality and may also promote
diversity.

• The ReferenceSet may be updated at the end of an iteration, or
dynamically as candidates are created (a so-called steady-state population
in some evolutionary computation literature).

• A lack of changes to the ReferenceSet may be used as a signal to stop the
current search, and potentially restart the search with a newly initialized
ReferenceSet.
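As a minimal illustration of how recombined and refined candidates are
assessed against the reference set, the following sketch (separate from
Listing 2.8, with illustrative solution hashes) applies a simple
replace-the-worst update rule:

# A sketch of the reference set update described above: a refined
# candidate displaces the worst member only if it improves on it and is
# not already present. Solution values are illustrative.
def update_reference_set(ref_set, candidate)
  return false if ref_set.any? {|x| x[:vector] == candidate[:vector]}
  ref_set.sort! {|x, y| x[:cost] <=> y[:cost]}
  return false unless candidate[:cost] < ref_set.last[:cost]
  ref_set[-1] = candidate  # displace the current worst member
  return true
end

ref_set = [{:vector=>[0.1], :cost=>0.01}, {:vector=>[2.0], :cost=>4.0}]
p update_reference_set(ref_set, {:vector=>[1.0], :cost=>1.0})  # => true
p ref_set.map {|x| x[:cost]}                                   # => [0.01, 1.0]

The duplicate check keeps the set diverse, while the cost comparison keeps
it elite, matching the two roles the strategy assigns to the reference set.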


2.9.5 Code Listing

Listing 2.8 provides an example of the Scatter Search algorithm implemented
in the Ruby Programming Language. The example problem is an instance of a
continuous function optimization that seeks min f(x) where
f = Σ_{i=1}^{n} x_i², −5.0 ≤ x_i ≤ 5.0 and n = 3. The optimal solution for
this basin function is (v_1, ..., v_n) = 0.0.

The algorithm is an implementation of Scatter Search as described in an
application of the technique to unconstrained non-linear optimization by
Glover [6]. The seeds for initial solutions are generated as random vectors,
as opposed to stratified samples. The example was further simplified by not
including a restart strategy, and by excluding diversity maintenance in the
ReferenceSet. A stochastic local search algorithm is used as the embedded
heuristic, with a stochastic step size in the range of half a percent of the
search space.

def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    rand_in_bounds(minmax[i][0], minmax[i][1])
  end
end

def take_step(minmax, current, step_size)
  position = Array.new(current.size)
  position.size.times do |i|
    min = [minmax[i][0], current[i]-step_size].max
    max = [minmax[i][1], current[i]+step_size].min
    position[i] = rand_in_bounds(min, max)
  end
  return position
end

def local_search(best, bounds, max_no_improv, step_size)
  count = 0
  begin
    candidate = {:vector=>take_step(bounds, best[:vector], step_size)}
    candidate[:cost] = objective_function(candidate[:vector])
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

def construct_initial_set(bounds, set_size, max_no_improv, step_size)
  diverse_set = []
  begin
    cand = {:vector=>random_vector(bounds)}
    cand[:cost] = objective_function(cand[:vector])
    cand = local_search(cand, bounds, max_no_improv, step_size)
    diverse_set << cand if !diverse_set.any? {|x| x[:vector]==cand[:vector]}
  end until diverse_set.size == set_size
  return diverse_set
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

def distance(v, set)
  return set.inject(0){|s,x| s + euclidean_distance(v, x[:vector])}
end

def diversify(diverse_set, num_elite, ref_set_size)
  diverse_set.sort!{|x,y| x[:cost] <=> y[:cost]}
  ref_set = Array.new(num_elite){|i| diverse_set[i]}
  remainder = diverse_set - ref_set
  remainder.each{|c| c[:dist] = distance(c[:vector], ref_set)}
  remainder.sort!{|x,y| y[:dist]<=>x[:dist]}
  ref_set = ref_set + remainder.first(ref_set_size-ref_set.size)
  return [ref_set, ref_set[0]]
end

def select_subsets(ref_set)
  additions = ref_set.select{|c| c[:new]}
  remainder = ref_set - additions
  remainder = additions if remainder.nil? or remainder.empty?
  subsets = []
  additions.each do |a|
    remainder.each{|r| subsets << [a,r] if a!=r && !subsets.include?([r,a])}
  end
  return subsets
end

def recombine(subset, minmax)
  a, b = subset
  d = rand(euclidean_distance(a[:vector], b[:vector]))/2.0
  children = []
  subset.each do |p|
    step = (rand<0.5) ? +d : -d
    child = {:vector=>Array.new(minmax.size)}
    child[:vector].each_index do |i|
      child[:vector][i] = p[:vector][i] + step
      child[:vector][i]=minmax[i][0] if child[:vector][i]<minmax[i][0]
      child[:vector][i]=minmax[i][1] if child[:vector][i]>minmax[i][1]
    end
    child[:cost] = objective_function(child[:vector])
    children << child
  end
  return children
end

def explore_subsets(bounds, ref_set, max_no_improv, step_size)
  was_change = false
  subsets = select_subsets(ref_set)
  ref_set.each{|c| c[:new] = false}
  subsets.each do |subset|
    candidates = recombine(subset, bounds)
    improved = Array.new(candidates.size) do |i|
      local_search(candidates[i], bounds, max_no_improv, step_size)
    end
    improved.each do |c|
      if !ref_set.any? {|x| x[:vector]==c[:vector]}
        c[:new] = true
        ref_set.sort!{|x,y| x[:cost] <=> y[:cost]}
        if c[:cost] < ref_set.last[:cost]
          ref_set.delete(ref_set.last)
          ref_set << c
          puts " >> added, cost=#{c[:cost]}"
          was_change = true
        end
      end
    end
  end
  return was_change
end

def search(bounds, max_iter, ref_set_size, div_set_size, max_no_improv,
    step_size, max_elite)
  diverse_set = construct_initial_set(bounds, div_set_size, max_no_improv,
    step_size)
  ref_set, best = diversify(diverse_set, max_elite, ref_set_size)
  ref_set.each{|c| c[:new] = true}
  max_iter.times do |iter|
    was_change = explore_subsets(bounds, ref_set, max_no_improv, step_size)
    ref_set.sort!{|x,y| x[:cost] <=> y[:cost]}
    best = ref_set.first if ref_set.first[:cost] < best[:cost]
    puts " > iter=#{(iter+1)}, best=#{best[:cost]}"
    break if !was_change
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  bounds = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 100
  step_size = (bounds[0][1]-bounds[0][0])*0.005
  max_no_improv = 30
  ref_set_size = 10
  diverse_set_size = 20
  no_elite = 5
  # execute the algorithm
  best = search(bounds, max_iter, ref_set_size, diverse_set_size,
    max_no_improv, step_size, no_elite)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.8: Scatter Search in Ruby






2.9.6 References

Primary Sources

A form of the Scatter Search algorithm was proposed by Glover for integer
programming [1], based on Glover's earlier work on surrogate constraints.
The approach remained idle until it was revisited by Glover and combined
with Tabu Search [2]. The modern canonical reference of the approach was
proposed by Glover, who provides an abstract template of the procedure that
may be specialized for a given application domain [3].

Learn More

The primary reference for the approach is the book by Laguna and Martí that
reviews the principles of the approach in detail and presents tutorials on
applications of the approach on standard problems using the C programming
language [7]. There are many review articles and chapters on Scatter Search
that may be used to supplement an understanding of the approach, such as a
detailed review chapter by Glover [4], a review of the fundamentals of the
approach and its relationship to an abstraction called path relinking by
Glover, Laguna, and Martí [5], and a modern overview of the technique by
Martí, Laguna, and Glover [8].

2.9.7 Bibliography

[1] F. Glover. Heuristics for integer programming using surrogate
constraints. Decision Sciences, 8(1):156–166, 1977.

[2] F. Glover. Tabu search for nonlinear and parametric optimization (with
links to genetic algorithms). Discrete Applied Mathematics, 49:231–255,
1994.

[3] F. Glover. Artificial Evolution, chapter A Template For Scatter Search
And Path Relinking, page 13. Springer, 1998.

[4] F. Glover. New Ideas in Optimization, chapter Scatter search and path
relinking, pages 297–316. McGraw-Hill Ltd., 1999.

[5] F. Glover, M. Laguna, and R. Martí. Fundamentals of scatter search and
path relinking. Control and Cybernetics, 39(3):653–684, 2000.

[6] F. Glover, M. Laguna, and R. Martí. Advances in Evolutionary
Computation: Theory and Applications, chapter Scatter Search, pages 519–537.
Springer-Verlag, 2003.

[7] M. Laguna and R. Martí. Scatter search: methodology and implementations
in C. Kluwer Academic Publishers, 2003.

[8] R. Martí, M. Laguna, and F. Glover. Principles of scatter search.
European Journal of Operational Research, 169(1):359–372, 2006.

2.10 Tabu Search

Tabu Search, TS, Taboo Search.

2.10.1 Taxonomy

Tabu Search is a Global Optimization algorithm and a Metaheuristic or
Meta-strategy for controlling an embedded heuristic technique. Tabu Search
is the parent of a large family of derivative approaches that introduce
memory structures in Metaheuristics, such as Reactive Tabu Search
(Section 2.11) and Parallel Tabu Search.

2.10.2 Strategy

The objective of the Tabu Search algorithm is to constrain an embedded
heuristic from returning to recently visited areas of the search space,
referred to as cycling. The strategy of the approach is to maintain a
short-term memory of the specific changes made by recent moves within the
search space, and to prevent future moves from undoing those changes.
Additional intermediate-term memory structures may be introduced to bias
moves toward promising areas of the search space, as well as longer-term
memory structures that promote a general diversity in the search across the
search space.

2.10.3 Procedure

Algorithm 2.10.1 provides a pseudocode listing of the Tabu Search algorithm
for minimizing a cost function. The listing shows the simple Tabu Search
algorithm with short-term memory, without intermediate- and long-term memory
management.

Algorithm 2.10.1: Pseudocode for Tabu Search.
Input: TabuList_size
Output: S_best
1   S_best ← ConstructInitialSolution();
2   TabuList ← ∅;
3   while ¬StopCondition() do
4       CandidateList ← ∅;
5       for S_candidate ∈ S_best_neighborhood do
6           if ¬ContainsAnyFeatures(S_candidate, TabuList) then
7               CandidateList ← S_candidate;
8           end
9       end
10      S_candidate ← LocateBestCandidate(CandidateList);
11      if Cost(S_candidate) ≤ Cost(S_best) then
12          S_best ← S_candidate;
13          TabuList ← FeatureDifferences(S_candidate, S_best);
14          while Size(TabuList) > TabuList_size do
15              DeleteFeature(TabuList);
16          end
17      end
18  end
19  return S_best;

2.10.4 Heuristics

• Tabu Search was designed to manage an embedded hill climbing heuristic,
although it may be adapted to manage any neighborhood exploration heuristic.

• Tabu Search was designed for, and has predominately been applied to,
discrete domains such as combinatorial optimization problems.

• Candidates for neighboring moves can be generated deterministically for
the entire neighborhood, or the neighborhood can be stochastically sampled
to a fixed size, trading off efficiency for accuracy.

• Intermediate-term memory structures can be introduced (complementing the
short-term memory) to focus the search on promising areas of the search
space (intensification), called aspiration criteria.

• Long-term memory structures can be introduced (complementing the
short-term memory) to encourage useful exploration of the broader search
space, called diversification. Strategies may include generating solutions
with rarely used components and biasing the generation away from the most
commonly used solution components.
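The short-term memory described above can be illustrated with a minimal
sketch (separate from Listing 2.9); the fixed-size FIFO list below and its
move encoding are illustrative only:

# A sketch of short-term tabu memory: recently applied moves are held in
# a fixed-size FIFO list, and candidate moves found in the list are
# rejected until they age out.
class ShortTermMemory
  def initialize(size)
    @size, @list = size, []
  end

  def admit(move)
    @list.push(move)
    @list.shift while @list.size > @size  # forget the oldest moves
  end

  def tabu?(move)
    @list.include?(move)
  end
end

memory = ShortTermMemory.new(2)
memory.admit([0, 5])
memory.admit([3, 7])
p memory.tabu?([0, 5])  # => true
memory.admit([1, 2])    # oldest entry [0, 5] is forgotten
p memory.tabu?([0, 5])  # => false

The list size plays the role of the tabu tenure: moves remain forbidden only
while they are recent enough to still be in memory.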

2.10.5 Code Listing

Listing 2.9 provides an example of the Tabu Search algorithm implemented in
the Ruby Programming Language. The algorithm is applied to the Berlin52
instance of the Traveling Salesman Problem (TSP), taken from the TSPLIB. The
problem seeks a permutation of the order to visit cities (called a tour)
that minimizes the total distance traveled. The optimal tour distance for
the Berlin52 instance is 7542 units.

The algorithm is an implementation of the simple Tabu Search with a
short-term memory structure that executes for a fixed number of iterations.
The starting point for the search is prepared using a random permutation
that is refined using a stochastic 2-opt Local Search procedure. The
stochastic 2-opt procedure is used as the embedded hill climbing heuristic
with a fixed-sized candidate list. The two edges that are deleted in each
2-opt move are stored on the tabu list. This general approach is similar to
that used by Knox in his work on Tabu Search for the symmetric TSP [12] and
by Fiechter in the Parallel Tabu Search for the TSP [2].

def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

def stochastic_two_opt(parent)
  perm = Array.new(parent)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm, [[parent[c1-1], parent[c1]], [parent[c2-1], parent[c2]]]
end

def is_tabu?(permutation, tabu_list)
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    tabu_list.each do |forbidden_edge|
      return true if forbidden_edge == [c1, c2]
    end
  end
  return false
end

def generate_candidate(best, tabu_list, cities)
  perm, edges = nil, nil
  begin
    perm, edges = stochastic_two_opt(best[:vector])
  end while is_tabu?(perm, tabu_list)
  candidate = {:vector=>perm}
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate, edges
end

def search(cities, tabu_list_size, candidate_list_size, max_iter)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  best = current
  tabu_list = Array.new(tabu_list_size)
  max_iter.times do |iter|
    candidates = Array.new(candidate_list_size) do |i|
      generate_candidate(current, tabu_list, cities)
    end
    candidates.sort! {|x,y| x.first[:cost] <=> y.first[:cost]}
    best_candidate = candidates.first[0]
    best_candidate_edges = candidates.first[1]
    if best_candidate[:cost] < current[:cost]
      current = best_candidate
      best = best_candidate if best_candidate[:cost] < best[:cost]
      best_candidate_edges.each {|edge| tabu_list.push(edge)}
      tabu_list.pop while tabu_list.size > tabu_list_size
    end
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 100
  tabu_list_size = 15
  max_candidates = 50
  # execute the algorithm
  best = search(berlin52, tabu_list_size, max_candidates, max_iter)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.9: Tabu Search in Ruby


2.10.6 References

Primary Sources

Tabu Search was introduced by Glover, applied to scheduling employees to
duty rosters [9], with a more general overview in the context of the TSP
[5], based on his previous work on surrogate constraints in integer
programming problems [4]. Glover provided a seminal overview of the
algorithm in a two-part journal article, the first part of which introduced
the algorithm and reviewed then-recent applications [6], and the second of
which focused on advanced topics and open areas of research [7].

Learn More

Glover provides a high-level introduction to Tabu Search in the form of a
practical tutorial [8], as do Glover and Taillard in a user guide format
[10]. The best source of information for Tabu Search is the book dedicated
to the approach by Glover and Laguna that covers the principles of the
technique in detail, as well as an in-depth review of applications [11]. The
approach also appeared in Science, in a paper that considered a modification
for its application to continuous function optimization problems [1].
Finally, Gendreau provides an excellent contemporary review of the
algorithm, highlighting best practices and application heuristics collected
from across the field of study [3].

2.10.7 Bibliography

[1] D. Cvijovic and J. Klinowski. Taboo search: An approach to the multiple
minima problem. Science, 267:664–666, 1995.

[2] C-N. Fiechter. A parallel tabu search algorithm for large traveling
salesman problems. Discrete Applied Mathematics, 3(6):243–267, 1994.

[3] M. Gendreau. Handbook of Metaheuristics, chapter 2: An Introduction to
Tabu Search, pages 37–54. Springer, 2003.

[4] F. Glover. Heuristics for integer programming using surrogate
constraints. Decision Sciences, 8(1):156–166, 1977.

[5] F. Glover. Future paths for integer programming and links to artificial
intelligence. Computers and Operations Research, 13(5):533–549, 1986.

[6] F. Glover. Tabu search – Part I. ORSA Journal on Computing,
1(3):190–206, 1989.

[7] F. Glover. Tabu search – Part II. ORSA Journal on Computing, 2(1):4–32,
1990.

[8] F. Glover. Tabu search: A tutorial. Interfaces, 4:74–94, 1990.

[9] F. Glover and C. McMillan. The general employee scheduling problem: an
integration of MS and AI. Computers and Operations Research, 13(5):536–573,
1986.

[10] F. Glover and E. Taillard. A user's guide to tabu search. Annals of
Operations Research, 41(1):1–28, 1993.

[11] F. W. Glover and M. Laguna. Tabu Search. Springer, 1998.

[12] J. Knox. Tabu search performance on the symmetric traveling salesman
problem. Computers & Operations Research, 21(8):867–876, 1994.

2.11 Reactive Tabu Search

Reactive Tabu Search, RTS, R-TABU, Reactive Taboo Search.

2.11.1 Taxonomy

Reactive Tabu Search is a Metaheuristic and a Global Optimization algorithm.
It is an extension of Tabu Search (Section 2.10) and the basis for a field
of reactive techniques called Reactive Local Search and, more broadly, the
field of Reactive Search Optimization.

2.11.2 Strategy

The objective of Tabu Search is to avoid cycles while applying a local
search technique. The Reactive Tabu Search addresses this objective by
explicitly monitoring the search and reacting to the occurrence of cycles
and their repetition by adapting the tabu tenure (tabu list size). The
strategy of the broader field of Reactive Search Optimization is to automate
the process by which a practitioner configures a search procedure by
monitoring its online behavior, and to use machine learning techniques to
adapt a technique's configuration.

2.11.3 Procedure

Algorithm 2.11.1 provides a pseudocode listing of the Reactive Tabu Search
algorithm for minimizing a cost function. The pseudocode is based on the
version of the Reactive Tabu Search described by Battiti and Tecchiolli in
[9] with supplements like the IsTabu function from [7]. The procedure has
been modified for brevity to exclude the diversification procedure (escape
move). Algorithm 2.11.2 describes the memory-based reaction that manipulates
the size of the ProhibitionPeriod in response to identified cycles in the
ongoing search. Algorithm 2.11.3 describes the selection of the best move
from a list of candidate moves in the neighborhood of a given solution. The
function permits prohibited moves in the case where a prohibited move is
better than both the best known solution and the selected admissible move
(called aspiration). Algorithm 2.11.4 determines whether a given
neighborhood move is tabu based on the current ProhibitionPeriod, and is
employed by sub-functions of Algorithm 2.11.3.

Algorithm 2.11.1: Pseudocode for Reactive Tabu Search.
Input: Iteration_max, Increase, Decrease, ProblemSize
Output: S_best
1   S_curr ← ConstructInitialSolution();
2   S_best ← S_curr;
3   TabuList ← ∅;
4   ProhibitionPeriod ← 1;
5   foreach Iteration_i ∈ Iteration_max do
6       MemoryBasedReaction(Increase, Decrease, ProblemSize);
7       CandidateList ← GenerateCandidateNeighborhood(S_curr);
8       S_curr ← BestMove(CandidateList);
9       TabuList ← S_curr_feature;
10      if Cost(S_curr) ≤ Cost(S_best) then
11          S_best ← S_curr;
12      end
13  end
14  return S_best;

Algorithm 2.11.2: Pseudocode for the MemoryBasedReaction function.
Input: Increase, Decrease, ProblemSize
Output: (none)
1   if HaveVisitedSolutionBefore(S_curr, VisitedSolutions) then
2       S_curr_t ← RetrieveLastTimeVisited(VisitedSolutions, S_curr);
3       RepetitionInterval ← Iteration_i − S_curr_t;
4       S_curr_t ← Iteration_i;
5       if RepetitionInterval < 2 × ProblemSize then
6           RepetitionInterval_avg ← 0.1 × RepetitionInterval + 0.9 × RepetitionInterval_avg;
7           ProhibitionPeriod ← ProhibitionPeriod × Increase;
8           ProhibitionPeriod_t ← Iteration_i;
9       end
10  else
11      VisitedSolutions ← S_curr;
12      S_curr_t ← Iteration_i;
13  end
14  if Iteration_i − ProhibitionPeriod_t > RepetitionInterval_avg then
15      ProhibitionPeriod ← Max(1, ProhibitionPeriod × Decrease);
16      ProhibitionPeriod_t ← Iteration_i;
17  end

Algorithm 2.11.3: Pseudocode for the BestMove function.
Input: ProblemSize
Output: S_curr
1   CandidateList_admissible ← GetAdmissibleMoves(CandidateList);
2   CandidateList_tabu ← CandidateList − CandidateList_admissible;
3   if Size(CandidateList_admissible) < 2 then
4       ProhibitionPeriod ← ProblemSize − 2;
5       ProhibitionPeriod_t ← Iteration_i;
6   end
7   S_curr ← GetBest(CandidateList_admissible);
8   S_best_tabu ← GetBest(CandidateList_tabu);
9   if Cost(S_best_tabu) < Cost(S_best) ∧ Cost(S_best_tabu) < Cost(S_curr) then
10      S_curr ← S_best_tabu;
11  end
12  return S_curr;

Algorithm 2.11.4: Pseudocode for the IsTabu function.
Input: (none)
Output: Tabu
1   Tabu ← FALSE;
2   S_curr_feature_t ← RetrieveTimeFeatureLastUsed(S_curr_feature);
3   if S_curr_feature_t ≥ Iteration_curr − ProhibitionPeriod then
4       Tabu ← TRUE;
5   end
6   return Tabu;

2.11.4 Heuristics

• Reactive Tabu Search is an extension of Tabu Search and as such should
exploit the best practices used for the parent algorithm.

• Reactive Tabu Search was designed for discrete domains such as
combinatorial optimization, although it has been applied to continuous
function optimization.

• Reactive Tabu Search was proposed to use efficient memory data structures
such as hash tables.

• Reactive Tabu Search was proposed to use a long-term memory to diversify
the search after a threshold of cycle repetitions has been reached.

• The increase parameter should be greater than one (such as 1.1 or 1.3)
and the decrease parameter should be less than one (such as 0.9 or 0.8).
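The reaction described above can be reduced to a small sketch (separate from
Listing 2.10) of the tenure update rule, using the increase and decrease
factors suggested in the heuristics:

# A sketch of the memory-based reaction: the prohibition period grows
# multiplicatively when a previously visited solution recurs, and decays
# toward 1 after a quiet spell with no detected cycling.
def react(prohib_period, repetition_detected, quiet, increase=1.3, decrease=0.9)
  if repetition_detected
    return prohib_period * increase          # cycling: prohibit moves longer
  elsif quiet
    return [prohib_period * decrease, 1].max # calm: relax the tenure
  end
  return prohib_period
end

tenure = 1.0
tenure = react(tenure, true, false)   # cycle detected  => 1.3
tenure = react(tenure, true, false)   # still cycling   => 1.69
tenure = react(tenure, false, true)   # quiet period    => 1.521
puts tenure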

2.11.5 Code Listing

Listing 2.10 provides an example of the Reactive Tabu Search algorithm
implemented in the Ruby Programming Language. The algorithm is applied to
the Berlin52 instance of the Traveling Salesman Problem (TSP), taken from
the TSPLIB. The problem seeks a permutation of the order to visit cities
(called a tour) that minimizes the total distance traveled. The optimal tour
distance for the Berlin52 instance is 7542 units.

The procedure is based on the code listing described by Battiti and
Tecchiolli in [9] with supplements like the IsTabu function from [7]. The
implementation does not use efficient memory data structures such as hash
tables. The algorithm is initialized with a stochastic 2-opt local search,
and the neighborhood is generated as a fixed-size candidate list of
stochastic 2-opt moves. The edges selected for changing in the 2-opt move
are stored as features in the tabu list. The example does not implement the
escape procedure for search diversification.

def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

def stochastic_two_opt(parent)
  perm = Array.new(parent)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm, [[parent[c1-1], parent[c1]], [parent[c2-1], parent[c2]]]
end

def is_tabu?(edge, tabu_list, iter, prohib_period)
  tabu_list.each do |entry|
    if entry[:edge] == edge
      return true if entry[:iter] >= iter-prohib_period
      return false
    end
  end
  return false
end

def make_tabu(tabu_list, edge, iter)
  tabu_list.each do |entry|
    if entry[:edge] == edge
      entry[:iter] = iter
      return entry
    end
  end
  entry = {:edge=>edge, :iter=>iter}
  tabu_list.push(entry)
  return entry
end

def to_edge_list(perm)
  list = []
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    c1, c2 = c2, c1 if c1 > c2
    list << [c1, c2]
  end
  return list
end

def equivalent?(el1, el2)
  el1.each {|e| return false if !el2.include?(e) }
  return true
end

def generate_candidate(best, cities)
  candidate = {}
  candidate[:vector], edges = stochastic_two_opt(best[:vector])
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate, edges
end

def get_candidate_entry(visited_list, permutation)
  edgeList = to_edge_list(permutation)
  visited_list.each do |entry|
    return entry if equivalent?(edgeList, entry[:edgelist])
  end
  return nil
end

def store_permutation(visited_list, permutation, iteration)
  entry = {}
  entry[:edgelist] = to_edge_list(permutation)
  entry[:iter] = iteration
  entry[:visits] = 1
  visited_list.push(entry)
  return entry
end

def sort_neighborhood(candidates, tabu_list, prohib_period, iteration)
  tabu, admissable = [], []
  candidates.each do |a|
    if is_tabu?(a[1][0], tabu_list, iteration, prohib_period) or
        is_tabu?(a[1][1], tabu_list, iteration, prohib_period)
      tabu << a
    else
      admissable << a
    end
  end
  return [tabu, admissable]
end

def search(cities, max_cand, max_iter, increase, decrease)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  best = current
  tabu_list, prohib_period = [], 1
  visited_list, avg_size, last_change = [], 1, 0
  max_iter.times do |iter|
    candidate_entry = get_candidate_entry(visited_list, current[:vector])
    if !candidate_entry.nil?
      repetition_interval = iter - candidate_entry[:iter]
      candidate_entry[:iter] = iter
      candidate_entry[:visits] += 1
      if repetition_interval < 2*(cities.size-1)
        avg_size = 0.1*repetition_interval + 0.9*avg_size
        prohib_period = (prohib_period.to_f * increase)
        last_change = iter
      end
    else
      store_permutation(visited_list, current[:vector], iter)
    end
    if iter-last_change > avg_size
      prohib_period = [prohib_period*decrease,1].max
      last_change = iter
    end
    candidates = Array.new(max_cand) do |i|
      generate_candidate(current, cities)
    end
    candidates.sort! {|x,y| x.first[:cost] <=> y.first[:cost]}
    tabu,admis = sort_neighborhood(candidates,tabu_list,prohib_period,iter)
    if admis.size < 2
      prohib_period = cities.size-2
      last_change = iter
    end
    current,best_move_edges = (admis.empty?) ? tabu.first : admis.first
    if !tabu.empty?
      tf = tabu.first[0]
      if tf[:cost]<best[:cost] and tf[:cost]<current[:cost]
        current, best_move_edges = tabu.first
      end
    end
    best_move_edges.each {|edge| make_tabu(tabu_list, edge, iter)}
    best = candidates.first[0] if candidates.first[0][:cost] < best[:cost]
    puts " > it=#{iter}, tenure=#{prohib_period.round}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 100
  max_candidates = 50
  increase = 1.3
  decrease = 0.9
  # execute the algorithm
  best = search(berlin52, max_candidates, max_iter, increase, decrease)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.10: Reactive Tabu Search in Ruby





2.11.6 References

Primary Sources

Reactive Tabu Search was proposed by Battiti and Tecchiolli as an extension
to Tabu Search that included an adaptive tabu list size in addition to a
diversification mechanism [7]. The technique also used efficient memory
structures that were based on earlier work by Battiti and Tecchiolli that
considered a parallel tabu search [6]. Some early application papers by
Battiti and Tecchiolli include a comparison to Simulated Annealing applied
to the Quadratic Assignment Problem [8], a benchmark on instances of the
knapsack problem and N-K models compared with Repeated Local Minima Search,
Simulated Annealing, and Genetic Algorithms [9], and the training of neural
networks on an array of problem instances [10].

Learn More

Reactive Tabu Search was abstracted to a form called Reactive Local Search
that considers adaptive methods that learn suitable parameters for
heuristics that manage an embedded local search technique [4, 5]. Under this
abstraction, the Reactive Tabu Search algorithm is a single example of the
Reactive Local Search principle applied to Tabu Search. This framework was
further extended to the use of any adaptive machine learning techniques to
adapt the parameters of an algorithm by reacting to algorithm outcomes
online while solving a problem, called Reactive Search [1]. The best
reference for this general framework is the book on Reactive Search
Optimization by Battiti, Brunato, and Mascia [3]. Additionally, the review
chapter by Battiti and Brunato provides a contemporary description [2].

2.11.7 Bibliography

[1] R. Battiti. Machine learning methods for parameter tuning in
heuristics. In 5th DIMACS Challenge Workshop: Experimental Methodology Day,
1996.

[2] R. Battiti and M. Brunato. Handbook of Metaheuristics, chapter Reactive
Search Optimization: Learning while Optimizing. Springer Verlag, 2nd
edition, 2009.

[3] R. Battiti, M. Brunato, and F. Mascia. Reactive Search and Intelligent
Optimization. Springer, 2008.

[4] R. Battiti and M. Protasi. Reactive local search for the maximum clique
problem. Technical Report TR-95-052, International Computer Science
Institute, Berkeley, CA, 1995.

[5] R. Battiti and M. Protasi. Reactive local search for the maximum clique
problem. Algorithmica, 29(4):610–637, 2001.

[6] R. Battiti and G. Tecchiolli. Parallel biased search for combinatorial
optimization: genetic algorithms and tabu. Microprocessors and
Microsystems, 16(7):351–367, 1992.

[7] R. Battiti and G. Tecchiolli. The reactive tabu search. ORSA Journal on
Computing, 6(2):126–140, 1994.

[8] R. Battiti and G. Tecchiolli. Simulated annealing and tabu search in
the long run: a comparison on QAP tasks. Computers and Mathematics with
Applications, 28(6):1–8, 1994.

[9] R. Battiti and G. Tecchiolli. Local search with memory: Benchmarking
RTS. Operations Research Spektrum, 17(2/3):67–86, 1995.

[10] R. Battiti and G. Tecchiolli. Training neural nets with the reactive
tabu search. IEEE Transactions on Neural Networks, 6(5):1185–1200, 1995.