Genetic algorithms (GAs) are stochastic adaptive algorithms whose search method is based
on simulation of natural genetic inheritance and Darwinian striving for survival. They can be
used to find approximate solutions to numerical optimization problems in cases where
finding the exact optimum is prohibitively expensive, or where no algorithm is known.
However, such applications can encounter problems that sometimes delay, if not prevent,
finding the optimal solutions with desired precision. In this paper we describe applications of
GAs to numerical optimization, present three novel ways to handle such problems, and give
some experimental results.
Keywords: Genetic algorithm, random algorithm, optimization technique, constraint handling, local tuning, convergence
the last non-don't care symbols of a chromosome. It defines the compactness of information contained in a schema.

Assuming that the selective probability is proportional to fitness, and independent probabilities, p_c and p_m, for crossover and mutation, respectively, we can derive the following growth equation (see e.g. Goldberg, 1989):

m(H, t + 1) >= m(H, t) * (f(H, t)/f(t)) * [1 - p_c * δ(H)/(m - 1) - p_m * o(H)]    (1)

where m(H, t) is the number of schema H at time t, f(H, t) is the average fitness of schema H at time t, and f(t) is the average fitness of the population. This equation is also based on the assumption that the fitness function f returns only positive values; when applying GAs to optimization problems where the optimization function may return negative values, some additional mapping between optimization and fitness functions is required (see Goldberg, 1989).

The growth equation (1) shows that selection increases the sampling rate of the above-average schemata, and that this change is exponential. The sampling itself does not introduce any new schemata (not represented in the initial t = 0 sampling). This is exactly why the crossover operator is introduced: to enable structured yet random information exchange. Additionally, the mutation operator introduces greater variability into the population. The combined (disruptive) effect of these operators on a schema is not significant if the schema is short and low-order. This result can be stated as:

Theorem 1 (Schema Theorem) Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations of a genetic algorithm.

An immediate result of this theorem is that GAs explore the search space by short schemata which, subsequently, are used for information exchange during crossover:

Hypothesis 1 (Building Block Hypothesis) A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks.

As stated by Goldberg (1989):

Just as a child creates magnificent fortresses through the arrangement of simple blocks of wood, so does a genetic algorithm seek near-optimal performance through the juxtaposition of short, low-order, high performance schemata, or building blocks.

Although some research has been done to prove this hypothesis (e.g. Bethke, 1980), for most non-trivial applications we rely mostly on empirical results. During the last 15 years many GA applications were developed which supported the building block hypothesis in many different problem domains. Nevertheless, this hypothesis suggests that the problem of coding for a GA is critical for its performance, and that such a coding should satisfy the idea of short building blocks.

Since the binary alphabet offers the maximum number of schemata per bit of information of any coding (see Goldberg, 1989), the bit string representation of solutions has dominated genetic algorithm research. Additionally, such coding provides simplicity of analysis and elegance of available operators. But the 'implicit parallelism' result does not depend on the use of bit strings (see Antonisse and Keller, 1987); hence it may be worthwhile experimenting with richer data structures and other 'genetic' operators. This may be important in particular in the presence of non-trivial constraints on potential solutions to the problem.

2.3 How they work

Suppose we wish to maximize a function of k variables, f(x_1, ..., x_k): R^k -> R. Suppose further that each variable x_i can take values from a domain D_i = [a_i, b_i] ⊆ R. We wish to optimize the function f with some precision: suppose six decimal places for the variables' values is desired.

It is clear that to achieve such precision each domain D_i should be cut into (b_i - a_i) * 10^6 equal size ranges. Let us denote by m_i the smallest integer such that (b_i - a_i) * 10^6 <= 2^{m_i} - 1. Then, a representation having each variable coded as a binary string of length m_i clearly satisfies the precision requirement. Additionally, the following formula easily interprets each such string:

x_i = a_i + decimal(1001...001_2) * (b_i - a_i)/(2^{m_i} - 1)

where decimal(string_2) represents the decimal value of that binary string.

Now, each chromosome (as a potential solution) is represented by a binary string of length m = m_1 + ... + m_k; the first m_1 bits map into a value from the range [a_1, b_1], the next group of m_2 bits map into a value from the range [a_2, b_2], ..., and the last group of m_k bits map into a value from the range [a_k, b_k].

To initialize the population, if we do not have any intuition about the distribution of potential optima, we can simply set some number, pop_size, of chromosomes randomly in a bitwise fashion. Otherwise, we can provide some initial (potential) solutions.

The rest of the algorithm is straightforward: in each generation we evaluate each chromosome (using the function f on the decoded sequences of variables), select a new population according to the probability distribution based on fitness values, and recombine the chromosomes in the new population by mutation and crossover operators. After a number of generations, when no further improvement is observed, the best chromosome represents a (possibly the global) optimal solution. In practice, we stop the algorithm after a fixed number of iterations, depending on speed and resource criteria.

78 Michalewicz and Janikow

2.4 An example

2^17 < 151 000 <= 2^18

The next 15 bits

111110010100010

represent

x_2 = 4.1 + decimal(111110010100010_2) * (5.8 - 4.1)/(2^15 - 1) = 4.1 + 31906 * 1.7/32767 = 4.1 + 1.655330 = 5.755330.

The last 19 bits

The fitness value for this chromosome is
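The encoding and decoding steps of Section 2.3 can be sketched as follows (a minimal illustration; the function names are ours, and the numbers in the usage note come from the surviving x_2 fragment of the example, which implies four decimal places of precision):

```python
def required_bits(a, b, d):
    # smallest m such that (b - a) * 10**d <= 2**m - 1,
    # i.e. enough bits to resolve [a, b] to d decimal places
    ranges = (b - a) * 10 ** d
    m = 1
    while 2 ** m - 1 < ranges:
        m += 1
    return m

def decode(bits, a, b):
    # x = a + decimal(bits) * (b - a) / (2**m - 1)
    m = len(bits)
    return a + int(bits, 2) * (b - a) / (2 ** m - 1)
```

For the domain [4.1, 5.8] and four decimal places this gives m = 15, and the 15-bit string 111110010100010 decodes to about 5.755330, matching the example.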
optimal solutions with a desired precision. Such problems originate from many sources: for example, the coding, which moves the operational search space away from the problem space; an insufficient number of iterations; and an insufficient population size. These problems then manifest themselves in premature convergence of the entire population to a non-global optimum, inability to perform fine local tuning, or inability to operate in the presence of non-trivial constraints. In this section we describe these three manifestations and in Sections 4-6 we present quite novel ways to handle them.

3.1 Premature convergence

Premature convergence is a common problem of GAs and other optimization algorithms. If convergence occurs too rapidly, then the valuable information developed in part of the population is often lost. Implementations of genetic algorithms are prone to converge prematurely before the optimal solution has been found. As stated by Booker (1987):

While the performance of most implementations is comparable to or better than the performance of many other search techniques, it [the GA] still fails to live up to the high expectations engendered by the theory. The problem is that, while the theory points to sampling rates and search behavior in the limit, any implementation uses a finite population or set of sample points. Estimates based on finite samples inevitably have a sampling error and lead to search trajectories much different from those theoretically predicted. This problem is manifested in practice as a premature loss of diversity in the population with the search converging to a sub-optimal solution.

To improve the performance of GAs, DeJong (1975) investigated five modifications of the basic algorithm. These modifications were called the elitist model, the expected value model, the elitist expected value model, the crowding factor model, and the generalized crossover model. A few years later, Brindle (1981) examined five further modifications: deterministic sampling, remainder stochastic sampling without replacement, stochastic sampling without replacement, remainder stochastic sampling with replacement, and stochastic tournament. A detailed discussion of all these modifications is presented by Goldberg (1989). Quite another direction in relaxing this problem borrows ideas from simulated annealing (see e.g. Sirag and Weisser, 1987).

In general, most approaches which attempted to improve the convergence of GAs presented some modifications to the selection routine.

3.2 Fine local tuning

Genetic algorithms display inherent difficulties in performing local search for numerical applications. As observed by Grefenstette (1987a):

Like natural genetic systems, GAs progress by virtue of changing the distribution of high performance substructures in the overall population; individual structures are not the focus of attention. Once the high performance regions of the search space are identified by a GA, it may be useful to invoke a local search routine to optimize the members of the final population.

Local search requires the utilization of schemata of higher order and longer defining length than those suggested by Theorem 1. Holland (1975) suggested that the GA should be used as a preprocessor to perform the initial search, before turning the search process over to a system that can employ domain knowledge to guide the local search. Additionally, there are problems where the domains of parameters are unlimited, the number of parameters is quite large, and high precision is required. These requirements imply that the length of the (binary) solution vector is quite significant (for 100 variables with domains in the range [-500, 500], where six digits of precision after the decimal point are required, the length of the binary solution vector is 3000). For such problems the performance of GAs is quite poor.

3.3 Constraints

The central problem in applications of GAs is that of constraints; until recently there was no promising methodology for handling them.

Traditionally, to solve a constrained optimization problem using the GA approach we use some penalty functions. However, such approaches suffer from the disadvantage of being tailored to the problem and are not sufficiently general to handle a variety of problems. In this approach we generate potential solutions without considering the constraints, but incorporating in the evaluation function penalties for suppressing illegal candidates. However, though the evaluation function is usually well defined, there is no accepted methodology for combining it with the penalty (Richardson et al, 1989). Davis (1987) discusses a problem in deciding how large a penalty to impose:

If one incorporates a high penalty into the evaluation routine and the domain is one in which production of an individual violating the constraint is likely, one runs the risk of creating a genetic algorithm that spends most of its time evaluating illegal individuals. Further, it can happen that when a legal individual is found, it drives the others out and the population converges on it without finding better individuals, since the likely paths to other legal individuals require the production of illegal individuals as intermediate structures, and the penalties for violating the constraints make it unlikely that such intermediate structures will reproduce. If one imposes moderate penalties, the system may evolve individuals that violate the constraint
but are rated better than those that do not because the rest of the evaluation function can be satisfied better by accepting the moderate constraint penalty than by avoiding it.

4. Premature convergence

As stated in Section 3, one of the problems encountered by GA applications is premature convergence of the entire population to a non-global optimum. The premature convergence is closely related to the characteristics of the function itself, and to the magnitude and kind of errors introduced by the sampling mechanism.

The problem of premature convergence is primarily related to the existence of local optima, and depends on both function characteristics and sampling of the solution space. For example, assume that s_v^t in P(t) is close to some local optimum, and f(s_v^t) is much greater than the average evaluation f(t). Also, assume that there is no s_i close to the global maximum sought; this might be the case for multimodal functions. In such a case, there is a fast convergence toward that local optimum, with little chance of the more global exploration needed to search for other optima. While such a behavior is permissible at the later evolutionary stages, and even desired at the final ones, it is quite disturbing very early. We propose an approach (see Janikow and Michalewicz, 1991; Janikow and Michalewicz, 1990) which diminishes this problem by decreasing the speed of convergence during the early stages of population existence.

The other problem, also reflecting on the convergence, is related to shifts in the average population fitness. Consider two functions: f_1(s), and f_2(s) = f_1(s) + const, which have the same relative optima. One would expect that both can be optimized with a similar degree of difficulty. However, if const >> f_1(s), then either the function f_2(s) will suffer from much slower convergence than the function f_1(s), or the function f_1(s) will converge, possibly to a local optimum.

Some of the previous approaches to this problem used rank instead of actual values f(s_i) to guide the selective process. Such an approach suffers from some drawbacks. First, it puts the responsibility on the user to decide on the best selection mechanism. Second, it totally ignores information it holds about the absolute evaluations of different chromosomes. Third, it treats all cases uniformly, regardless of the magnitude of the problem.

To deal with such cases we introduce a measure of the problematic characteristics of the function being optimized, which we later incorporate into the sampling mechanism.

4.1 The scaling mechanism

We are mostly interested in some average behavior of the function being optimized. However, we know such behavior only through finite sampling from the population. Therefore, we define a measure using statistical random sample variance and mean (we call such a measure a span):

s_t = sqrt( (1/(n - 1)) * sum_{i=1}^n (f(x_i) - f̄)^2 ) / f̄,  where f̄ = (1/n) * sum_{i=1}^n f(x_i)

which can be rewritten as:

s_t = sqrt( (n/(n - 1)) * ( n * sum_{i=1}^n f(x_i)^2 / (sum_{i=1}^n f(x_i))^2 - 1 ) )

This measure is normalized so that it is quite function-independent.

Genetic algorithm applications normally require that all chromosomes evaluate to non-negative numbers; such a requirement relates to the sampling mechanism and can be easily enforced. In such a case max(s_t) = sqrt(n) since sum_{i=1}^n f(x_i)^2 <= (sum_{i=1}^n f(x_i))^2. However, such a maximal value is obtained only when all but one evaluations are zero: a very unlikely case. Experiments show that most spans are less than one. Moreover, we have determined experimentally, by running different functions with differently scaled s and observing the average optimization behavior, that s* = 0.1 gives the best possible trade-off between space exploration and speed of convergence; therefore, we subsequently treat all cases with respect to the relationship between s_t and s*. This particular choice of s* is rather rough and more extensive experiments are planned to approximate it in a more systematic way. Moreover, it would be interesting to see whether it could be approximated theoretically; however, the mechanism we are about to introduce is little affected by some variations of s*.

To preserve efficiency we do not want to recalculate s_t at each iteration. This is actually not necessary as this measure finds the function's characteristics determinable from a random sample. We have determined experimentally that very often the initial population provides a very good approximation. However, if desired, such sampling may be performed for some time before the algorithm actually starts iterating (we call such a span s_0).

For the construction of our scaling mechanism, which we call a non-uniform power scaling, we use two dynamic parameters: the span s and population age t (this parameter is taken to be the iteration number of the algorithm). What we seek is a mechanism which produces more random search for small ages and increasingly selects only the best chromosomes at late stages of the population life. Moreover, the net effect of such a mechanism should depend on the span; a greater span should cause the whole mechanism to put less emphasis on chromosomes' fitness, somehow randomizing the sampling. The scaling itself uses

Genetic algorithms for numerical optimization 81
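In code, the span is simply the sample coefficient of variation of the population's fitness values; a minimal sketch (the function name is ours):

```python
import math

def span(fitnesses):
    # s_t = (sample standard deviation) / (sample mean) of the fitness values
    n = len(fitnesses)
    mean = sum(fitnesses) / n
    var = sum((f - mean) ** 2 for f in fitnesses) / (n - 1)
    return math.sqrt(var) / mean
```

A population in which all but one evaluation are zero attains the maximum sqrt(n), while identical fitness values give a span of 0.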
Fig. 2. k values for functions A, B, and C: c_0 = 0.1, p_1 = 0.05, p_2 = 0.1 (horizontal axis: iteration t, from 0 to T).
[Figure: plots of the test functions f_B and f_C; a = 1000.]
We used a traditional binary representation with 30 bits per variable; therefore, a chromosome was a vector of 300 binary bits for #elems = 10. Moreover, the 30 bits gave us a precision of Δx = π/2^30 ≈ 3 × 10^-9. This error propagated to the function error

Δf(x) = Δx * sum_{i=1}^{#elems} ∂f/∂x_i

which, very pessimistically, can be approximated to

Δf(x) = Δx * sum_{i=1}^{#elems} (1 + 4i) = Δx * (10 + 220)

This, in turn, translates to about 7 × 10^-7 for functions A and C, and about 1.6 × 10^-4 for function B.

During the runs we used one-bit mutations (0.001 rate on bits), single-point crossover (0.2 rate on chromosomes), and inversion (0.02 rate on chromosomes). The results are summarized in Table 1; they represent an average of 25 independent runs, each with population size of 40 and iterated 5000 times.

As expected, function B turned out to be the most difficult for the GA; its high span s_0 ≈ 1.0 caused many faulty local optima to be included in the solutions. This is also the case where our scaling mechanism gave best improvements, both in terms of faulty convergence and of the absolute magnitude of solution vectors. The most average function A also benefited from the mechanism in terms of both measures; this function has a usual span, but is still highly multimodal, difficult for any method. The smallest influence was observed on function C, the one with a very small span; such characteristics prohibited faulty peak selections in a natural way. The increased selective pressure in older populations, generated by higher k-values, slightly improved accuracy, while preserving high precision.

We have yet to conduct more systematic testing in order to optimize the various parameters used in our scaling

Table 1. Accuracy and imprecision, GA without and with scaling (columns: Function; absolute optimum f_global ± a; Accuracy: GA without scaling, GA with scaling; Imprecision: GA without scaling, GA with scaling)
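The pessimistic error bound above can be checked directly; a small verification sketch, where the per-variable bound 1 + 4i on |∂f/∂x_i| is taken from the reconstructed formula:

```python
import math

dx = math.pi / 2 ** 30          # quantization step: 30 bits over a domain of length pi
elems = 10
# Delta f = dx * sum_{i=1}^{10} (1 + 4i) = dx * (10 + 220) = 230 * dx
df = dx * sum(1 + 4 * i for i in range(1, elems + 1))
```

This gives roughly 6.7 × 10^-7, consistent with the quoted "about 7 × 10^-7" for functions A and C.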
mechanism. Nevertheless, these results indicate its usefulness as an automatic problem analyser in cases where the user is not expected to perform such an analysis.

As to the increased computational complexity of calculating the k parameter, this was rather an insignificant change for two reasons. First, the span s_0 capturing the characteristics of the optimized function was calculated only once at iteration t = 0. Moreover, to increase the quality of this measure approximated by the final sampling, it was performed without the population size restriction; the number of such samples was set to 200. Second, the formula can be partially rewritten to account for the constant and incremental parts of it.

tion requires appropriately different operators; the interested reader should refer to Janikow and Michalewicz (1990).

5.1 The non-uniform operator

The non-uniform mutation operator was introduced in two of our earlier papers (see Michalewicz and Janikow, 1990 and Janikow and Michalewicz, 1990). As mentioned, this is the operator responsible for the fine tuning capabilities in a numerical search space. It is defined as follows: if s_v^t = (v_1, ..., v_m) is a chromosome and the element v_k is selected for mutation (the domain of v_k is [l_k, u_k]), the result is a vector s_v^{t+1} = (v_1, ..., v'_k, ..., v_m), with

Fig. 4. Δ(t, y) for two selected times: t/T = 0.50, b = 2, and t/T = 0.90, b = 2.
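The closed form of Δ(t, y) is not recoverable from this excerpt; the sketch below assumes the form popularized in Michalewicz's later accounts of this operator, Δ(t, y) = y * (1 - r^((1 - t/T)^b)) with random r in [0, 1], which matches the figure's parameters (b = 2, ratio t/T). Function names are ours:

```python
import random

def delta(t, y, T, b=2.0, rng=random):
    # returns a value in [0, y]; the range shrinks toward 0 as t approaches T,
    # so early mutations explore widely and late ones perform fine local tuning
    r = rng.random()
    return y * (1.0 - r ** ((1.0 - t / T) ** b))

def nonuniform_mutation(v, k, lk, uk, t, T, b=2.0, rng=random):
    # mutate gene k of chromosome v within its domain [lk, uk]
    v = list(v)
    if rng.random() < 0.5:                      # "a random digit is 0"
        v[k] = v[k] + delta(t, uk - v[k], T, b, rng)
    else:                                       # "a random digit is 1"
        v[k] = v[k] - delta(t, v[k] - lk, T, b, rng)
    return v
```

Since Δ(t, y) never exceeds y, the mutated gene always stays inside its domain, and at t = T the operator leaves the gene unchanged.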
5.3 Experiments and results

The exact representation is as follows: for a problem of m variables, each chromosome, representing a permissible solution, is represented as a vector of m floating point numbers s_v^t = (v_1, ..., v_m) (when the generation number t and the chromosome number i are not important, we simply write s). The precision of such a representation is fixed for a given machine, and based on the precision of the floating point (or double, if needed) type.

We experimented with ten different cases, as defined in Table 2. For each, we repeated three separate runs of 40 000 generations and reported the best (with respect to final value) in Table 3. The same table also gives some intermediate results after a selected number of generations; for example, the values in the 10 000 column indicate the partial results after 10 000 generations, while running 40 000. It is important to note that such values are somewhat worse than those obtained while running only 10 000 generations, due to the nature of the non-uniform operator. For all cases the population size was fixed at 100. The domain of all variables was set to [-200, 200].

Table 2. The ten test cases for the dynamic control problem

Case   N    x_0   s     r     q     a     b
1      45   100   1     1     1     1     1
2      45   100   10    1     1     1     1
3      45   100   1000  1     1     1     1
4      45   100   1     10    1     1     1
5      45   100   1     1000  1     1     1
6      45   100   1     1     0     1     1
7      45   100   1     1     1000  1     1
8      45   100   1     1     1     0.01  1
9      45   100   1     1     1     1     0.01
10     45   100   1     1     1     1     100

Table 3. Results of the modified genetic algorithm on the dynamic control problem

        Generations
Case    1          100       1000      10 000    20 000    30 000    40 000    Factor
1       17 807.4   3.27985   1.74689   1.61866   1.61825   1.61804   1.61803   10^4
2       13 670.4   5.33177   1.45968   1.11349   1.09205   1.09165   1.09163   10^5
3       17 023.8   2.87485   1.07974   1.00968   1.00126   1.00104   1.00103   10^7
4       15 077.3   8.64310   3.75530   3.71846   3.70812   3.70165   3.70160   10^4
5       5956.43    12.2559   2.89769   2.87727   2.87646   2.87570   2.87569   10^5
6       16 657.7   5.07047   2.05314   1.61869   1.61830   1.61806   1.61806   10^4
7       288.0666   19.2684   7.02566   1.63464   1.62412   1.61888   1.61882   10^4
8       116.982    67.1758   1.92764   1.00009   1.00005   1.00005   1.00005   10^4
9       7.18263    4.42849   4.37093   4.31504   4.31024   4.31004   4.31004   10^5
10      9 870 352  138 132   16 096.0  1.38244   1.00041   1.00010   1.00010   10^4

It is interesting to compare these results with the exact solutions as well as those obtained from another GA: exactly the same one but without the non-uniform mutation. Table 4 summarizes the results; columns labeled D indicate the relative percentage errors. The GA using the non-uniform mutation clearly outperforms the other one with respect to the accuracy of the found optimal solution; while the enhanced GA was rarely in error by more than a few thousandths of 1%, the other one was hardly ever better than 1% out. Moreover, it also converged much faster to that solution (see Table 5 for case 1). In other words it has an additional advantage in time constrained situations.

Table 4. Comparison with the exact solutions (columns: Case; exact solution value; GA with non-uniform mutation: value, D (%); GA without non-uniform mutation: value, D (%))

Table 5. Convergence comparison for case 1

                                Generations
GA                              1            10       100     1000    10 000  40 000  CPU time
with non-uniform mutation       178 074 000  293 564  32 798  17 468  16 186  16 180  719.4 sec*
without non-uniform mutation    17 243 100   415 623  59 872  17 931  17 445  16 234  719.1 sec*

*Times are similar since the GA without the special mutation performed other operations at a higher rate in order to achieve the same rate of breeding. For a comparison, a GA with a binary representation of 30 bits per variable used 12 622 sec CPU for the same run (all runs reported from a DEC3100 station).

Figure 5 illustrates the non-uniform mutation's effect on the evolutionary process. The new mutation causes quite an increase in the number of improvements observed in the population at the end of the population's life. Moreover, the smaller number of such improvements prior to that time, together with an actually faster convergence, clearly indicates a better overall search.

In other publications (see Michalewicz et al, 1990 and Janikow and Michalewicz, 1990) we present more experiments with other dynamic control problems: these prove the superiority of genetic approaches over commercial optimization packages as well.

The proposed methodology for handling linear constraints by genetic algorithms (see Michalewicz and Janikow, 1991) combines some previous ideas, but in a totally new way and context. The main idea lies in a careful elimination of the equation constraints, and designing dynamic operators preserving the inequality constraints. Both types of constraint can be processed very efficiently if they contain only linear equations; such constrained problems include many of the interesting optimization cases.

Linear constraints are of two types: equalities and inequalities (the latter include all variables' domains). We first eliminate all the equalities, reducing the number of variables and modifying the inequalities; reducing the set of variables both decreases the length of the representation and reduces the search space. Left with only linear inequalities, we deal with the convex search space which, in the presence of dynamic and closed operators, can be searched without explicitly considering the constraints. The problem then becomes one of designing such closed operators. We achieve this by defining them as being context-dependent, that is, dynamically adjusting to the current context.

The set of equalities can be eliminated (on a one by one basis) as follows: first transform an equality so that one of its variables is expressed in terms of the others, and then substitute all occurrences of this variable by such an expression, in all remaining equalities and all inequalities. A detailed description of this approach, along with theoretical foundations, can be found in Michalewicz and Janikow (1991). Here, we provide only an example.

Suppose that we wish to minimize a function of six variables:

f(x_1, x_2, x_3, x_4, x_5, x_6)

subject to the following constraints:

x_1 + x_2 + x_3 = 5
x_4 + x_5 + x_6 = 10
x_1 + x_4 = 3
x_2 + x_5 = 4
x_1 >= 0, x_2 >= 0, x_3 >= 0, x_4 >= 0, x_5 >= 0, x_6 >= 0

We can take advantage of the presence of four independent equations and express four variables as functions of the remaining two:

x_3 = 5 - x_1 - x_2
x_4 = 3 - x_1
x_5 = 4 - x_2
x_6 = 3 + x_1 + x_2

Thus, we have reduced the original problem to the optimization problem of two variables x_1 and x_2:

g(x_1, x_2) = f(x_1, x_2, (5 - x_1 - x_2), (3 - x_1), (4 - x_2), (3 + x_1 + x_2))

subject to the following constraints (inequalities only):

x_1 >= 0, x_2 >= 0
5 - x_1 - x_2 >= 0
3 - x_1 >= 0
4 - x_2 >= 0
3 + x_1 + x_2 >= 0

These inequalities can be further reduced to:

0 <= x_1 <= 3
0 <= x_2 <= 4
x_1 + x_2 <= 5

Now, given a chromosome (a point within the constrained solution space), any operator must produce a new feasible solution (this is what we mean by the closedness of operators). This can be achieved by working within the current context; e.g. if x = (1.8, 2.3) is to be mutated on x_1, then the new value of this variable must be taken from the range [0, 5 - x_2] = [0, 2.7]. This signifies another fact: it is again much easier to define such operators on a floating point representation, as it is not enough to deal locally with one bit at a time.

6.1 The dynamic operators

The genetic operators are dynamic, i.e. the value of a vector component depends on the remaining values of the vector.

The value of the ith component of a feasible solution s = (v_1, ..., v_m) is always in some (dynamic) range [l, u]; the bounds l and u depend on the other vector's values v_1, ..., v_{i-1}, v_{i+1}, ..., v_m, and the set of inequalities. We say that the ith component (ith gene) of the vector s is movable if l < u.

Before we describe the operators, we present two important characteristics of convex spaces (due to the linearity
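The elimination in the example can be checked mechanically; a small sketch (the function names are ours):

```python
def expand(x1, x2):
    # reconstruct the four dependent variables from the equality constraints
    return (x1, x2, 5 - x1 - x2, 3 - x1, 4 - x2, 3 + x1 + x2)

def feasible_reduced(x1, x2):
    # the reduced inequality set: 0 <= x1 <= 3, 0 <= x2 <= 4, x1 + x2 <= 5
    return 0 <= x1 <= 3 and 0 <= x2 <= 4 and x1 + x2 <= 5
```

For any (x1, x2) satisfying the reduced inequalities, expand returns a point satisfying all four equalities and all six non-negativity constraints of the original problem, e.g. the chromosome (1.8, 2.3) used in the mutation example.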
of the constraints, the solution space is always a convex space S), which play an essential role in the definition of these operators:

(1) For any two points s_1 and s_2 in the solution space S, the linear combination a*s_1 + (1 - a)*s_2, where a in [0, 1], is a point in S.

(2) For every point s_0 in S and any line p such that s_0 in p, p intersects the boundaries of S at precisely two points, say l_p and u_p.

Since we are only interested in lines parallel to each axis, to simplify the notation we denote by l^(i) and u^(i) the ith components of the vectors l_p and u_p, respectively, where the line p is parallel to the axis i. We assume further that l^(i) <= u^(i).

Because of intuitional similarities, we cluster the operators into the standard two classes: mutation and crossover. The proposed crossover and mutation operators use the two properties to ensure that the offspring of a point in the convex solution space S belongs to S. For a detailed discussion on these topics, the reader is referred to Michalewicz and Janikow (1990).

6.1.1 Mutation group

Mutations are quite different from the traditional mutation operator with respect to both the actual mutation (a gene, being a floating point number, is mutated in

of the operator defined in Section 5.1; the only change relates to the dynamic domains:

v'_k = v_k + Δ(t, u^(k) - v_k)   if a random digit is 0
v'_k = v_k - Δ(t, v_k - l^(k))   if a random digit is 1

6.1.2 Crossover group

Chromosomes are randomly selected in pairs for application of the crossover operators according to appropriate probabilities. The operators defined here are versions of the ones outlined in Janikow and Michalewicz (1990) for floating point crossover, but adjusted to our dynamic domains:

(1) Simple crossover. This is defined as follows: if s_v^t = (v_1, ..., v_m) and s_w^t = (w_1, ..., w_m) are crossed after the kth position, the resulting offspring are s_v^{t+1} = (v_1, ..., v_k, w_{k+1}, ..., w_m) and s_w^{t+1} = (w_1, ..., w_k, v_{k+1}, ..., v_m). Note that the only permissible split points are between individual floating points (using float representation it is impossible to split anywhere else).

However, such an operator may produce offspring outside the convex solution space S. To avoid this problem, we use the property of the convex spaces, which says that there exists a in [0, 1] such that

s_v^{t+1} = (v_1, ..., v_k, w_{k+1}*a + v_{k+1}*(1 - a), ..., w_m*a + v_m*(1 - a))

where a is a choice from the following range:

[ max( (l^(k) - w_k)/(v_k - w_k), (u^(k) - v_k)/(w_k - v_k) ),
  min( (l^(k) - v_k)/(w_k - v_k), (u^(k) - w_k)/(v_k - w_k) ) ]   if v_k > w_k

[0, 0]   if v_k = w_k

[ max( (l^(k) - v_k)/(w_k - v_k), (u^(k) - w_k)/(v_k - w_k) ),
  min( (l^(k) - w_k)/(v_k - w_k), (u^(k) - v_k)/(w_k - v_k) ) ]   if v_k < w_k

g(x_i) =  0      if 0 <= x_i <= 2
          c_i    if 2 < x_i <= 4
          2c_i   if 4 < x_i <= 6
          3c_i   if 6 < x_i <= 8
          4c_i   if 8 < x_i <= 10
          5c_i   if 10 < x_i

with parameters c_i as given in Table 6 and subject to the following equality constraints:

Table 6. Parameters c_i

c_1 = 0    c_2 = 21   c_3 = 50   c_4 = 62   c_5 = 93   c_6 = 77    c_7 = 1000
c_8 = 21   c_9 = 0    c_10 = 17  c_11 = 54  c_12 = 67  c_13 = 1000 c_14 = 48
c_15 = 50  c_16 = 17  c_17 = 0   c_18 = 60  c_19 = 98  c_20 = 67   c_21 = 25
c_22 = 62  c_23 = 54  c_24 = 60  c_25 = 0   c_26 = 27  c_27 = 1000 c_28 = 38
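For the worked two-variable example, the closedness of simple crossover can be sketched as follows. The paper samples the contraction factor a from the admissible range derived above; the variant below instead halves a until both children are feasible, a simplification of ours that is justified by the same convexity property (a = 0 reproduces the feasible parents):

```python
def feasible(x):
    # the reduced constraint set from the paper's example
    x1, x2 = x
    return 0 <= x1 <= 3 and 0 <= x2 <= 4 and x1 + x2 <= 5

def simple_crossover(v, w, k, a):
    # exchange tails after position k, contracted toward each parent by a
    c1 = v[:k] + [a * wi + (1 - a) * vi for vi, wi in zip(v[k:], w[k:])]
    c2 = w[:k] + [a * vi + (1 - a) * wi for vi, wi in zip(v[k:], w[k:])]
    return c1, c2

def closed_simple_crossover(v, w, k, tries=20):
    # start from plain simple crossover (a = 1) and halve a until both
    # children are feasible; fall back to the parents if none is found
    a = 1.0
    for _ in range(tries):
        c1, c2 = simple_crossover(v, w, k, a)
        if feasible(c1) and feasible(c2):
            return c1, c2
        a /= 2
    return list(v), list(w)
```

For the parents (2.5, 2.0) and (0.5, 3.5), plain simple crossover after the first gene would yield the infeasible child (2.5, 3.5); the contracted version returns feasible offspring.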
There are 13 independent and one dependent equations here; therefore, we eliminate thirteen variables: x1, . . ., x8, x15, x22, x29, x36, x43. All remaining variables are renamed y1, . . ., y36, preserving order, i.e. y1 = x9, y2 = x10, . . ., y6 = x14, y7 = x16, . . ., y36 = x49. Each of these variables has to satisfy four two-sided inequalities, which result from the initial domains and our transformations. Now, each chromosome is a float vector (y1, . . ., y36).

6.3 Experiments and results

For comparative experiments we planned on using a standard GA approach to constraints by penalties (see e.g. Richardson et al., 1989) and a version (the student version) of GAMS, a package for the construction and solution of large and complex mathematical programming models (Brooke et al., 1988); we used the MINOS version of the optimizer. However, we did not succeed with the former: the major issue in using the penalty functions approach is assigning weights to the constraints; these weights play the role of penalties if a potential solution does not satisfy them. In experiments with the above problem the evaluation function Eval was composed of the optimization function f and the penalty P:

    Eval(x) = f(x) + P

For our experiments we used a suggestion (see Richardson et al., 1989) to start with relaxed penalties and to tighten them as the run progresses. We used

    P = k * (t/T)^p * f * (de_1 + . . . + de_14)

where f is the average fitness of the population at the given generation t, k and p are parameters, T is the maximum number of generations, and de_i returns the 'degree of constraint violation' of the ith equality constraint.

We experimented with various values of p (close to 1), k (close to 1/14, where 14 is the total number of equality constraints; the static domain constraints were naturally satisfied by a proper representation), and T = 8000. However, this method did not lead to feasible solutions: in over 1200 runs (with different random number generator seeds and various values of the parameters k and p) the best chromosomes (after 8000 generations) significantly violated at least three constraints. For example, very often the algorithm converged to a solution where the numbers on one diagonal were equal to 20 and all others were zeros:

    x1 = x9 = x17 = x25 = x33 = x41 = x49 = 20
    xi = 0 for other values of i

As to GAMS, which only works with continuous functions, we reimplemented the problem using arctangent functions to approximate each of the five steps. A parameter PA was used to control the 'tightness' of the fit:

    g(xi) = ci * [arctan(PA(xi - 2))/π + 1/2
                + arctan(PA(xi - 4))/π + 1/2
                + arctan(PA(xi - 6))/π + 1/2
                + arctan(PA(xi - 8))/π + 1/2
                + arctan(PA(xi - 10))/π + 1/2]

This method allowed the system to find a feasible solution, but one much worse than those of our GA (Tables 8 and 9); the GA found an optimal value of 24.15 while the other system found one of 96.00.

The results indicate the usefulness of our method in the presence of many constraints and its superiority over some standard systems on non-trivial problems. Our modified genetic algorithm was run for 8000 iterations, with a population size of 40. The parameters used in all runs are displayed in Table 10. A single run of 8000 iterations took 2:28 CPU sec on a Cray Y-MP. GAMS was run on an Olivetti 386 with a math coprocessor and a single run took 1:20 sec.
Table 10. Parameters used in all runs

Parameter        Value
pop_size         40
prob_mut_um      0.08
prob_mut_bm      0.03
prob_mut_nm      0.07
prob_cross_sc    0.10
prob_cross_sa    0.10
prob_cross_wa    0.10
a                0.25
b                2.0

Key: population size (pop_size), probability of uniform mutation (prob_mut_um), probability of boundary mutation (prob_mut_bm), probability of non-uniform mutation (prob_mut_nm), probability of simple crossover (prob_cross_sc), probability of single arithmetical crossover (prob_cross_sa), probability of whole arithmetical crossover (prob_cross_wa), coefficient a for the whole arithmetical crossover, coefficient b for Δ of the non-uniform mutation.

7 Conclusions

In this paper we presented modifications of genetic algorithms to overcome three major problems: handling of constraints, premature convergence, and local fine tuning. The preliminary results of several experiments are more than encouraging and suggest that the methods are very useful. They may lead to the solution of some difficult operations research problems. We are currently in the process of implementing a single system to bring all the above ideas together. When completed, the system will be compared with many software optimization packages on different functions with non-trivial constraints. The system should be able to deal with very complex problems (thousands of variables, hundreds of constraints) since we need to represent only a relatively small population of potential solution vectors and apply new genetic operators to them.

Genetic algorithms for numerical optimization 91

References

Ackley, D. H. (1987) An empirical study of bit vector function optimization, in Genetic Algorithms and Simulated Annealing, L. Davis (ed.), Pitman, London, pp. 170-204.
Antonisse, J. (1989) A new interpretation of schema notation that overturns the binary encoding constraint, in Proceedings of the Third International Conference on Genetic Algorithms, J. Schaffer (ed.), Morgan Kaufmann, San Mateo, California, pp. 86-91.
Antonisse, H. J. and Keller, K. S. (1987) Genetic operators for high level knowledge representation, in Proceedings of the Second International Conference on Genetic Algorithms, MIT, Cambridge, Mass., J. J. Grefenstette (ed.), Lawrence Erlbaum, Hillsdale, New Jersey.
Bellman, R. (1957) Dynamic Programming, Princeton University Press, Princeton, NJ.
Bertsekas, D. P. (1987) Dynamic Programming: Deterministic and Stochastic Models, Prentice Hall, Englewood Cliffs, NJ.
Bethke, A. D. (1980) Genetic algorithms as function optimizers. PhD dissertation, University of Michigan.
Booker, L. B. (1982) Intelligent behavior as an adaptation to the task environment. PhD dissertation, University of Michigan.
Booker, L. B. (1987) Improving search in genetic algorithms, in Genetic Algorithms and Simulated Annealing, L. Davis (ed.), Pitman, London, pp. 61-73.
Bosworth, J., Foo, N. and Zeigler, B. P. (1972) Comparison of Genetic Algorithms with Conjugate Gradient Methods (CR-2093), National Aeronautics and Space Administration, Washington, DC.
Brindle, A. (1981) Genetic algorithms for function optimization. PhD dissertation, University of Alberta, Edmonton.
Brooke, A., Kendrick, D. and Meeraus, A. (1988) GAMS: A User's Guide, The Scientific Press, Redwood City, California.
Davis, L. (ed.) (1987) Genetic Algorithms and Simulated Annealing, Pitman, London.
DeJong, K. A. (1975) An analysis of the behaviour of a class of genetic adaptive systems. PhD dissertation, University of Michigan.
DeJong, K. A. (1985) Genetic algorithms: a 10 year perspective, in Proceedings of the First International Conference on Genetic Algorithms, Pittsburgh, 24-26 July, J. J. Grefenstette (ed.), Lawrence Erlbaum, Hillsdale, New Jersey.
Dhar, V. and Ranganathan, N. (1990) Integer programming vs. expert systems: an experimental comparison. Communications of the ACM, 33, 323-336.
Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, Massachusetts.
Grefenstette, J. J. (1985) Proceedings of the First International Conference on Genetic Algorithms, Pittsburgh, 24-26 July, Lawrence Erlbaum, Hillsdale, New Jersey.
Grefenstette, J. J. (1986) Optimization of control parameters for genetic algorithms, IEEE Transactions on Systems, Man, and Cybernetics, 16(1), 122-128.
Grefenstette, J. J. (1987a) Incorporating problem specific knowledge into genetic algorithms, in Genetic Algorithms and Simulated Annealing, L. Davis (ed.), Pitman, London, pp. 42-60.
Grefenstette, J. J. (1987b) Proceedings of the Second International Conference on Genetic Algorithms, MIT, Cambridge, Mass., 28-31 July, Lawrence Erlbaum, Hillsdale, New Jersey.
Groves, L., Michalewicz, Z., Elia, P. and Janikow, C. (1990) Genetic algorithms for drawing directed graphs, in Proceedings of the Fifth International Symposium on Methodologies of Intelligent Systems, Knoxville, 25-27 October, pp. 268-276.
Holland, J. (1959) A universal computer capable of executing an arbitrary number of sub-programs simultaneously, in Proceedings of the 1959 EJCC, pp. 108-113.
Holland, J. (1975) Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor.
Janikow, C. and Michalewicz, Z. (1990) Specialized genetic algorithms for numerical optimization problems, in Proceedings of the International Conference on Tools for AI, Washington, 6-9 November, pp. 798-804.
Janikow, C. and Michalewicz, Z. (1991) Convergence problem in genetic algorithms, submitted for publication.
Kirkpatrick, S., Gelatt, C. and Vecchi, M. (1983) Optimization by simulated annealing, Science, 220(4598), 671.
Laarhoven, P. J. M. van and Aarts, E. H. L. (1987) Simulated Annealing: Theory and Applications, D. Reidel, Dordrecht.
Michalewicz, Z. and Janikow, C. (1991) GENOCOP: a genetic algorithm for numerical optimization problems with constraints, to appear in Communications of the ACM.
Michalewicz, Z., Jankowski, A. and Vignaux, G. A. (1990) The constraints problem in genetic algorithms, in Methodologies for Intelligent Systems: Selected Papers, M. L. Emrich, M. S. Phifer, B. Huber, M. Zemankova and Z. W. Ras (eds), Proceedings of the Fifth International Symposium on Methodologies of Intelligent Systems, Knoxville, 25-27 October, pp. 142-157.
Michalewicz, Z., Kazemi, M., Krawczyk, J. and Janikow, C. (1990) On dynamic control problem, in Proceedings of the 29th IEEE Conference on Decision and Control, Honolulu, 5-7 December.
Michalewicz, Z., Vignaux, G. A. and Groves, L. (1989) Genetic algorithms for optimization problems, in Proceedings of the 11th NZ Computer Conference, Wellington, New Zealand, 16-18 August, pp. 211-223.
Michalewicz, Z., Vignaux, G. A. and Hobbs, M. (1991) A genetic algorithm for the nonlinear transportation problem, ORSA Journal on Computing, 3(4).
Neumann, J. von (1966) Theory of Self-Reproducing Automata, Burks (ed.), University of Illinois Press.
Richardson, J. T., Palmer, M. R., Liepins, G. and Hilliard, M. (1989) Some guidelines for genetic algorithms with penalty functions, in Proceedings of the Third International Conference on Genetic Algorithms, 4-7 June, J. Schaffer (ed.), Morgan Kaufmann, London.
Sasieni, M., Yaspan, A. and Friedman, L. (1959) Operations Research Methods and Problems, John Wiley and Sons.
Schaffer, J. (1989) Proceedings of the Third International Conference on Genetic Algorithms, 4-7 June, Morgan Kaufmann, Redwood City, California.
Schaffer, J., Caruana, R., Eshelman, L. and Das, R. (1989) A study of control parameters affecting online performance of genetic algorithms for function optimization, in Proceedings of the Third International Conference on Genetic Algorithms, 4-7 June, J. Schaffer (ed.), Morgan Kaufmann, pp. 51-60.
Sirag, D. J. and Weisser, P. T. (1987) Toward a unified thermodynamic genetic operator, in Proceedings of the Second International Conference on Genetic Algorithms, 28-31 July, Lawrence Erlbaum, Hillsdale, New Jersey.
Taha, H. A. (1982) Operations Research: An Introduction, Collier Macmillan, London.
Ulam, S. (1949) On the Monte Carlo Method, in Proceedings of the 2nd Symposium on Large Scale Digital Calculating Machinery, Harvard University Press, pp. 207-212.
Vignaux, G. A. and Michalewicz, Z. (1989) Genetic algorithms for the transportation problem, in Proceedings of the 4th International Symposium on Methodologies for Intelligent Systems, Charlotte, NC, 12-14 October, pp. 252-259.
Vignaux, G. A. and Michalewicz, Z. (1990) A genetic algorithm for the linear transportation problem. IEEE Transactions on Systems, Man, and Cybernetics, 21(2).