
A Generic Genetic Algorithm using Phenotype Building Functions

Marc Tanti
Dept Artificial Intelligence
University of Malta
mtan0007@um.edu.mt

Abstract

The problem with genetic algorithms is that they are not very reusable, which makes them difficult to use. This is because the chromosome representation, genetic operators and fitness function need to be redefined for many problems. In order to use a genetic algorithm one has to think in terms of evolution, which might not be easy for a programmer. In this project, we aim to solve these problems by creating a generic chromosome representation which always uses the same genetic algorithm regardless of the problem being solved. The user then directs the genetic algorithm by writing a function, called the ‘phenotype building function’ or the ‘builder function’, which maps the generic chromosome into candidate solutions. The fitness function will still have to be provided in order to measure the fitness of a candidate solution.

Results of examples executed with this method show that the genetic algorithm’s process is not significantly hindered by this abstraction and that the use of the genetic algorithm was intuitive.

1 Introduction

In genetic algorithms, the problem being solved dictates the encoding of the chromosome. Obitko (n.d.) mentions four types of chromosomal encodings:

Binary encoding is when the chromosome is a string of bits and its applications include evolving binary numbers and sequences of Boolean values. Crossover is done by exchanging bits between chromosomes using methods such as single point crossover, two point crossover, n-point crossover and uniform crossover, whilst mutation is done by randomly inverting bits.

Permutation encoding is when the chromosome is a permutation, that is, an ordering of elements. Things like Hamiltonian cycles or ordering problems can be encoded this way. Since the permutation property needs to be preserved, partially matched crossover is used for crossover and random swaps are used for mutation.

Value encoding is the same thing as binary encoding except that it uses a greater cardinality, that is, more than just ‘0’ or ‘1’, such as enumerations, integers, real numbers and objects. Any of the crossover methods for binary chromosomes can be used. For mutation, a small random real number can be added to a number in the chromosome in the case of real numbers, or an element in the chromosome can be randomly exchanged with another random element.

Tree encoding is when trees are used for chromosomes. This is used in genetic programming where equations are represented as expression trees and programs are represented as syntax trees. Crossover can be the swapping of sub-trees between trees and mutation can be the replacement of a sub-tree with a random sub-tree.

This variety of encodings (and corresponding genetic operators) contrasts with biological evolution, where only one encoding is used, the DNA. Yet an enormous diversity of form and function exists in the different phenotypes we observe. So biological evolution can be thought of as a form of generic genetic algorithm where, from a single chromosome encoding, any solution can be evolved. It would be desirable to have the same ability in genetic algorithms.

Of course the DNA can be thought of as a value encoding with a cardinality of four (since it is made up of four different nucleotides) and things like Hamiltonian cycles and expression trees are probably not best evolved on such a
system. But the power of biological evolution is not in how the genotype is represented, but in how it is used. Kratz (2009) describes how the DNA is used: it is first divided into sections called genes, which encode proteins. Proteins are sequences of molecules called amino acids, the sequence of which is determined from the gene. Once a protein is produced it folds up into a three dimensional structure, the shape of which will give it a function such as breaking or joining molecules. All the different proteins produced, when working together, make cells do what they do and this gives rise to the phenotype.

So it can be said that the power of chemistry is held inside this chromosome representation. Different DNA instances give rise to different chemical results rather than encoding the phenotype directly in the DNA, as is the case with traditional genetic algorithms. This is what makes biological evolution so versatile. The question is whether we can place the power of the Turing machine, rather than of chemistry, inside a generic genetic algorithm. That way it can be used and reused for a wide variety of problems, rather than having to redesign the encoding and genetic operators for different problems.

There are, of course, several indirect encodings which use a process to convert the chromosome into a candidate solution rather than encoding the candidate solution itself in the chromosome. Examples include the following: (i) Developmental Genetic Programming (Banzhaf & Keller, The Evolution of Genetic Code in Genetic Programming, 1999), which uses an arbitrary sequence of symbols found in mathematical equations as a chromosome and repairs these sequences into legal expressions, (ii) Gene Expression Programming (Ferreira, 2001), which encodes a tree of nodes into a linearly structured chromosome to be converted to a tree such that any chromosome is guaranteed to have a prefix that encodes a valid tree, and (iii) Grammatical Encoding (Mitchell, 1998, p. 74), where a context free grammar is evolved, out of which a single string is produced which encodes how the neurons of an artificial neural network are to be interconnected.

However, the only indirect encoding found which was meant to be used as a general solution rather than being problem oriented was that described in (Guillaumier, 2002), where problems are described as DBMS tables which need relationships between the tables; the genetic algorithm will then evolve the relationships. This system is very natural to use for solving relationship problems such as time table scheduling, where a relationship between time slots and lessons has to be found. However, it becomes less natural to use for other problems such as numerical ones.

This paper describes a way to create a generic genetic algorithm which is both reusable for different problems and easy for a programmer to use.

2 Specification and Analysis

One way to create a generic genetic algorithm is to make one which can evolve any value of any type (strings, integers, floats, arrays of strings, etc). The user would then only have to supply a fitness function which, apart from acting as a normal fitness function, also penalizes a candidate solution for being of an undesired type and for being an illegal value (such as an array which isn’t a permutation). The problem, of course, is the massive search space created as well as the increased complexity of the fitness function.

Instead, a way to guide the genetic algorithm to create only candidate solutions of a desired type is used. The genetic algorithm uses a simple binary chromosome encoding, irrespective of the problem being solved. Therefore the genetic algorithm never needs to be altered and is generic. What’s new is that, apart from the fitness function, the user also supplies a phenotype building function, or builder function.

This function takes as a parameter a reader object which encapsulates the binary chromosome and provides methods to read the sequence of bits as primitive data type values such as integers, strings and floats. In other words, each method reads a part of the binary chromosome and converts the binary number read to the appropriate type, for example reading a single bit to return a Boolean value or reading several bits to return an integer. This is analogous to the way a file reader works, which reads a file byte by byte, converting the bytes into data of the desired types.

The user writes the phenotype builder function such that its parameter, the reader object, is used to obtain inputs for the function. Based on these inputs, the function is to output a candidate solution. The function is written such that only candidate solutions of the desired type and only legal values are returned. For example, if a feed forward artificial neural network is to be evolved, a phenotype building function could be written which reads as many floats as there are
weights, organizes these floats into matrices for the artificial neural network and returns the neural network as a candidate solution, or phenotype.

An example of a phenotype building function which produces square numbers would be the following:

    function int builder(reader) {
        int num = reader.readNatural(10);
        return num*num;
    }

This function first reads a natural number made from 10 bits from the binary chromosome and then returns the square of that number.

But the builder function could be more complex. For example, permutations could be generated by reading pairs of array indexes and then using these pairs to swap the elements at those indexes in an ordered array. Thus an ordered array would be shuffled into a new permutation just by reading numbers from the chromosome, without having to use special encodings.

It was thought that reading the chromosome at runtime rather than preloading all required inputs beforehand would allow for a shorter chromosome, as the same bits can be read differently based on previous inputs read.

Thus we have a new way of thinking about genetic algorithms. Instead of viewing genetic algorithms as evolving solutions to problems, we view them as evolving inputs to an algorithm which generates solutions. As the inputs are evolved, the output is also evolved and the genetic algorithm searches for the inputs which maximize the fitness of the output.

We believe that this new way of using genetic algorithms is easier for programmers to use as, instead of thinking in terms of evolution, they would be thinking in terms of algorithms which map inputs to outputs. Hence in this way we have created a genetic algorithm which is generic because it harnesses the power of the Turing machine in the phenotype building function, just as biological evolution is generic because it harnesses the power of chemistry in the proteins.

3 Implementation

The genetic algorithm used was intentionally made simple in order to demonstrate that the performance of the search is not drastically deteriorated by the way the genetic algorithm is used and that no special genetic algorithm needs to be used for it to work. The genetic algorithm uses binary chromosomes, rank selection, 2-point crossover and point mutation (giving each bit a small probability of being flipped).

4 Examples and Experiments

In order to demonstrate the validity of the concept, a suite of problems was attempted using the described system. Each experiment was run 3 times and was allowed to evolve for 500 generations. The best fitness of each generation was recorded and is presented here as a graph.

4.1 Artificial Neural Network for Parity

In this problem, the genetic algorithm was used to evolve the weights for an artificial neural network to learn the 7-bit odd parity problem using 20 hidden neurons (plus 7 inputs and 1 output neuron).

The builder function reads rational numbers of 10 bits for the whole number and 10 bits for the fractional part for each weight, 160 (7×20+20×1) weights in all, and organizes them into the matrices of the artificial neural network.

The fitness function calculates the mean square error over the 7-bit inputs. The genetic algorithm minimizes the error.

None of the 3 runs reached zero error. The lowest MSE reached was 0.0234375 with only 3 (of 127) inputs which give an error greater than 0.1.

Figure 1 Graph of best fitness vs generation of evolution of weights of an ANN. (Axes: Mean Square Error against Generation.)

4.2 N-Queens Problem

In this problem, the genetic algorithm was used to evolve a solution for the 16-queen problem, by placing 16 queens on a 16 by 16 chess board without restrictions on the placement in order to intentionally make it harder to evolve (queens can even overlap).

The builder function reads 16 chess board coordinates (pairs of natural numbers) and places
each queen there.
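
As a rough illustration of the reader object and of the builder functions described so far, the following Python sketch may help. It is not the implementation used in the paper: the class name `BitReader`, the method name `read_natural` (modelled on the paper's `readNatural`), the most-significant-bit-first order, the wrap-around when the chromosome runs out and the choice of 4 bits per board coordinate are all assumptions made for the example.

```python
import random


class BitReader:
    """Hypothetical reader wrapping a binary chromosome (a non-empty list of 0/1)."""

    def __init__(self, bits):
        self.bits = bits
        self.pos = 0

    def read_natural(self, n_bits):
        """Read the next n_bits as an unsigned integer, most significant bit first."""
        value = 0
        for _ in range(n_bits):
            # Assumption: wrap around to the start when the chromosome is exhausted.
            bit = self.bits[self.pos % len(self.bits)]
            self.pos += 1
            value = (value << 1) | bit
        return value


def square_builder(reader):
    """Python rendering of the square-number builder: read 10 bits, return the square."""
    num = reader.read_natural(10)
    return num * num


def queens_builder(reader, n=16, coord_bits=4):
    """Read n (row, column) pairs; 4 bits cover 0..15 on a 16 by 16 board."""
    return [(reader.read_natural(coord_bits), reader.read_natural(coord_bits))
            for _ in range(n)]


# Example: 16 queens x 2 coordinates x 4 bits = 128 bits of chromosome.
chromosome = [random.randint(0, 1) for _ in range(128)]
queens = queens_builder(BitReader(chromosome))
```

Because every 4-bit number is a legal coordinate, any chromosome maps to a legal (if poor) placement, so the genetic algorithm itself needs no knowledge of chess boards.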
The fitness function counts the number of queens in the same row, column or diagonal of another queen and divides the total count by 2 to avoid counting each attack twice, as attacks are symmetric (queen A attacks queen B means that queen B attacks queen A). Overlapping queens are counted 3 times, hence giving them a greater penalty. The genetic algorithm minimizes the number of attacks.

One of the 3 runs reached a solution to the problem with no queens attacking others; the other two only had one queen in the wrong place.

Figure 2 Graph of best fitness vs generation of evolution of n-queen problem. (Axes: Attacks against Generation.)

4.3 Time Table Scheduling Problem

In this problem, the genetic algorithm was used to evolve a schedule for 90 lessons among 40 timeslots such that neither the teachers nor the groups of students encounter any clashes in time. There are 10 lessons per teacher (9 teachers) and 4 groups of students in all, and these teachers and students are paired up into lessons in such a way as to appear haphazardly mixed.

The builder function reads a natural number as a time slot (between 0 and 39) for each lesson.

The fitness function counts the number of duplicate time slots for each teacher and the number of duplicate time slots for each group, since duplicate time slots indicate clashes. The genetic algorithm minimizes the number of clashes.

All 3 runs evolved to zero clashes very quickly.

Figure 3 Graph of best fitness vs generation of evolution of time table scheduling. (Axes: Clashes against Generation.)

4.4 Travelling Salesperson Problem

In this problem, the genetic algorithm was used to solve a TSP by evolving a minimum distance cycle for an 8 by 8 grid of cities (64 cities), spaced at unit intervals.

Being a permutation problem, the builder function generates permutations of cities by selecting which city goes in which place in the permutation. In this case, this is done by selecting a city from an ordered list of cities, appending it to the permutation and removing it from the ordered list. This is repeated until every city is moved to the permutation.

The fitness function takes every neighboring pair of points in the cycle, finds the Euclidean distance between them using the Pythagorean theorem, sums up all the distances and returns the total. The genetic algorithm minimizes the distance.

None of the 3 runs reached a minimum distance. The shortest cycle evolved was:
Figure 4 Hamiltonian cycle evolved.
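
The select-append-remove builder and the distance fitness function described for the TSP can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used for the experiment: the helper names (`read_natural`, `tsp_builder`, `tour_length`), the 6 bits read per selection, the wrap-around at the end of the chromosome and the use of a modulo to turn each number read into a legal index are all choices made for the example.

```python
import math


def read_natural(bits, pos, n_bits):
    """Read n_bits starting at pos (most significant bit first); return (value, new_pos)."""
    value = 0
    for i in range(n_bits):
        # Assumption: wrap around when the chromosome is shorter than needed.
        value = (value << 1) | bits[(pos + i) % len(bits)]
    return value, pos + n_bits


def tsp_builder(bits, cities, index_bits=6):
    """Build a permutation by repeatedly moving one city out of the ordered list."""
    remaining = list(cities)
    tour = []
    pos = 0
    while remaining:
        raw, pos = read_natural(bits, pos, index_bits)
        # Assumption: reduce modulo the remaining count so every read is a legal index.
        tour.append(remaining.pop(raw % len(remaining)))
    return tour


def tour_length(tour):
    """Sum the Euclidean distances between neighbouring cities, closing the cycle."""
    total = 0.0
    for i in range(len(tour)):
        (ax, ay), (bx, by) = tour[i], tour[(i + 1) % len(tour)]
        total += math.hypot(ax - bx, ay - by)
    return total


# The 8 by 8 grid of cities at unit spacing used in this experiment.
grid = [(x, y) for y in range(8) for x in range(8)]
tour = tsp_builder([0] * 384, grid)  # an all-zero chromosome keeps the original order
```

Since each selection removes a city from the list, every chromosome yields a valid permutation and no permutation-preserving crossover or mutation is needed.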
Figure 5 Graph of best fitness vs generation of evolution of TSP. (Axes: Distance against Generation.)

4.5 Bin Packing Problem

In this problem, the genetic algorithm was used to evolve a solution to a bin packing problem where the bin size is 100 units and the items to bin are the numbers from 1 to 99.

To make the example more interesting, it was decided to solve this problem by finding a permutation of the list of items and adding them to the bins in order, fitting as many numbers as possible in the current bin and starting a new bin when the next number doesn’t fit.

As a permutation problem, the builder function generates permutations by swapping the elements in the ordered list of numbers. In this case, we use the concept used by selection sort, where we move through the list from first to last, swapping each element with an element after it (or with itself in order to leave it in place). Natural numbers are read to determine which element to swap with.

The fitness function counts the number of bins used. The genetic algorithm minimizes the number of bins.

All 3 runs evolved a solution using 53 bins, 3 bins more than a greedy algorithm’s solution (binning 99 with 1, 98 with 2, etc).

Figure 6 Graph of best fitness vs generation of evolution of bin packing problem. (Axes: Bins against Generation.)

4.6 Symbolic Regression of Orbits

In this problem, the genetic algorithm was used to evolve an equation which has a graph that passes through a given set of points with minimal error. The points [1] used are the duration of the orbits of planets against their distance from the sun. The goal is that of finding an equation which describes the relationship between orbit duration and radius.

[1] The points were taken from http://www.topscope.org/nuffield/pas/solar/solar7.html

The equation to be evolved can consist of any number of the following: a constant made of 10 bits for the whole number and 10 bits for the fractional part, an input variable for the equation (x), a square root, a plus, a times and a division. In all there cannot be more than 50 operands and operators.

Since evolving equations involves evolving expression trees, the builder function generates trees by using reverse Polish notation. First, a queue of rational number constants of 10 bit whole numbers and 10 bit fractional parts is evolved together with another queue of operands and operators, collectively called functions. Functions are de-queued and pushed onto a stack where, if a function is an operation, the right number of functions are popped from the stack and attached to the operation. Should there not be enough functions in the stack, the operation is ignored. If the function being pushed is a constant, a rational number is de-queued and used as the constant to be pushed onto the stack. At the end only the tree found at the top of the stack is used as the phenotype, which allows variation in the size of the tree. If the stack is empty, the phenotype will be the equation y = 0.

The fitness function takes each point the equation is supposed to pass through and calculates the mean square error over these points. The genetic algorithm minimizes the error.

None of the 3 runs reached zero error; however, they all came very close when considering that a perfect fit might not exist within the constraints imposed on the equation. The best fit had a maximum error of 48.4 and the graph of the equation is:
Figure 7 Equation evolved: y=(x + x) * sqrt(x / 100.751953125)

Figure 8 Graph of best fitness (logarithmic) vs generation of evolution of equations. (Axes: Mean Square Error, logarithmic from 1.00E+00 to 1.00E+10, against Generation.)

4.7 Comparison to Traditional Genetic Algorithms

The bin packing problem evolved here was compared to solutions evolved using a direct permutation encoding. Keeping all things equal and changing only the way permutations are evolved (by swaps versus by partially matched crossover and swap mutation), the following data was obtained:

Figure 9 Graph of best fitness vs generation of evolution of bin packing problem using project genetic algorithm (blue) in comparison to traditional genetic algorithm (red). (Axes: Bins against Generation.)

The project genetic algorithm (blue) was a clear winner when compared to the traditional genetic algorithm (red), as it evolved faster and in smaller steps, unlike the traditional genetic algorithm, which took longer to decrement the number of bins.

5 Conclusion

We feel that the experiments demonstrated in this report show that the concept described is worth investigating further. Although we do not claim that any problem can be efficiently evolved with this system, we believe that any Turing computable candidate solution can be described and evolved.

From a cynical point of view, one might claim that the system is merely an indirect encoding genetic algorithm which repairs the genotype to produce a valid phenotype. The builder function could be viewed as merely correcting any inconsistencies between the genotype and the desired phenotype (as in the symbolic regression example). We still believe this is easier to use than a direct evolution of the phenotype with custom genetic operators because it allows the user to abstract away details which are not relevant to the problem. The user only focuses on the “repair” aspect of the genetic algorithm and the rest is reused.

Writing a function to generate a candidate solution from inputs seems to be a more natural thought for a programmer than thinking in terms of evolution. For some problems, it might be easier even for non-programmers who still need to use a genetic algorithm, such as engineers.

References

Banzhaf, W., & Keller, R. E. (1999). The Evolution of Genetic Code in Genetic Programming. Proc. Genetic and Evolutionary Computation Conference (pp. 1077-1082). San Francisco, CA, USA: Morgan Kaufmann Publishers.

Ferreira, C. (2001). Gene Expression Programming: A New Adaptive Algorithm for Solving Problems. In Complex Systems, Vol. 13, issue 2 (pp. 87-129). Complex Systems Publication, Inc.

Guillaumier, K. (2002). A Multi-Purpose Scripting Language and Interpreter for Optimisation Problems. Msida, Malta: University of Malta.

Kratz, R. F. (2009). Molecular & Cell Biology For Dummies. Indianapolis, IN, USA: Wiley Publishing, Inc.

Mitchell, M. (1998). An Introduction to Genetic Algorithms. Cambridge, MA, USA: MIT Press.

Obitko, M. (n.d.). Introduction to Genetic Algorithms. Retrieved May 2010, from www.obitko.com: http://www.obitko.com/tutorials/genetic-algorithms/index.php
