by Benjamin J. Lynch
T C A G T T G C G A C T G A C T
Outline
What are genetic algorithms?
Biological origins
Shortcomings of Newton-type optimizers
Strategies
Why would we use genetic algorithms? Isn't there a simple solution we learned in calculus?
Newton-Raphson and its many relatives and variants are based on the use of local information. The function value and its derivatives with respect to the optimized parameters are used to take a step in an appropriate direction towards a local maximum or minimum.
Newton-Raphson
Perfect for a parabola!
H is the Hessian (the second derivatives with respect to all parameters optimized), g is the gradient, and Δx is the step that minimizes the gradient:
$H \, \Delta x = -g$
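A minimal sketch of the Newton step in one dimension, where the Hessian reduces to the second derivative. The function name `newton_minimize` and both test functions are illustrative assumptions, not taken from the slides. It shows why the method is perfect for a parabola, and why it only finds the minimum nearest to the starting point when the function has several minima.

```python
def newton_minimize(grad, hess, x0, steps=20, tol=1e-10):
    """Repeat the 1-D Newton step dx = -g / H until the gradient vanishes."""
    x = x0
    for _ in range(steps):
        g = grad(x)
        if abs(g) < tol:
            break
        x += -g / hess(x)  # solve H * dx = -g for dx
    return x

# Parabola f(x) = (x - 3)^2: a single step lands on the minimum at x = 3.
parabola_min = newton_minimize(lambda x: 2 * (x - 3), lambda x: 2.0, x0=10.0, steps=1)

# Hypothetical double-well f(x) = x^4 - 3x^2 + x: the result depends on x0.
def grad(x):
    return 4 * x**3 - 6 * x + 1

def hess(x):
    return 12 * x**2 - 6

from_right = newton_minimize(grad, hess, x0=2.0)   # ends in the shallower well
from_left = newton_minimize(grad, hess, x0=-2.0)   # ends in the deeper well
```

Only local information is used, so the optimizer started at x0 = 2 never learns that a deeper minimum exists on the other side.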
Enter
Genetic Algorithm
Create initial population
Evaluate the fitness of each individual (Evaluation)
Select the parents based on fitness (Selection)
Create new population (Recombination)
Setup
Density Functional Example: Choose parameters to be all the variables in the gradient-corrected exchange terms.
[Equation: the GGA exchange energy E_X^GGA as a function of the adjustable parameters; the expression is not recoverable from this copy]
Fitness
We need to evaluate the fitness of each individual. The fitness will be used to bias the next generation towards better genes.
Fitness
We must first define what fitness is! You must come up with a single metric that will be used to compare 2 possible solutions and decide which is better.
[Equation: Fitness written as a sum over simulations i divided by the number of simulations; the summand is not recoverable from this copy]
For an empirical density functional, the fitness might be a weighted RMS deviation from experimental values.
$\text{Fitness} = -\sum_{i}^{\text{dataset}} \left( E_i^{\text{Calc}}(P) - E_i^{\text{Exp}} \right)^2$
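A fitness of this kind can be sketched in a few lines. The sketch assumes that P is the parameter set being optimized and that some model turns P into calculated values; the linear `model` and the three data points are hypothetical stand-ins for a real calculation and a real experimental dataset.

```python
def fitness(params, dataset, model):
    """Negative sum of squared deviations between calculated and experimental values.

    A higher (less negative) value means a better parameter set.
    """
    return -sum((model(params, x) - e_exp) ** 2 for x, e_exp in dataset)

# Hypothetical stand-ins: a two-parameter linear model and three "experimental" points.
def model(params, x):
    slope, offset = params
    return slope * x + offset

dataset = [(0, 1), (1, 3), (2, 5)]

perfect = fitness((2, 1), dataset, model)  # reproduces every point
shifted = fitness((2, 0), dataset, model)  # off by 1 at each of the 3 points
```

Because the GA only compares individuals, any monotonic variant (with weights, or with a square root to make it an RMS) would rank parameter sets in a similar way.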
Check that your problem is well-suited for optimization with a GA. Your fitness function will need to be evaluated thousands of times. Make sure you have the resources. If a GA is too expensive, you still might be able to simplify your problem and use a GA to find regions of interest in the parameter space.
Determine chromosomal representation of parameters
Parameters can be encoded in binary, base-4, base-10, etc.
encoding
After you decide how to encode the parameters, you must decide on the domain of your parameters. This is entirely dependent on your problem. You will want to allow your parameters to be anything physically reasonable (if you're solving a physical problem).
Create an initial population with randomized chromosomes.
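One way to tie an encoding to a parameter domain is to let the gene count evenly spaced levels between the lower and upper bound. This is a sketch under that assumption; the function names and the domain [0, 5] are illustrative, not from the slides.

```python
def decode(bits, low, high):
    """Map a binary gene such as '100000001101' onto the interval [low, high]."""
    return low + (high - low) * int(bits, 2) / (2 ** len(bits) - 1)

def encode(value, low, high, n_bits):
    """Inverse of decode: the nearest n_bits-long binary gene for a value in [low, high]."""
    level = round((value - low) / (high - low) * (2 ** n_bits - 1))
    return format(level, "0{}b".format(n_bits))

# A 12-bit gene decoded onto a hypothetical domain of 0 to 5.
value = decode("100000001101", 0.0, 5.0)
gene = encode(value, 0.0, 5.0, 12)
```

The number of bits sets the resolution: 12 bits give 4096 levels across the domain, so a wider domain needs more bits for the same precision.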
100000001101 3213
You need to understand the system you are optimizing in order to determine the proper parameter range
Generate initial population of individuals (chromosomes)
The initial population can be generated by randomizing the genes for each chromosome. You can set the parameters for a few individuals if you want. This might speed up the process.
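A minimal sketch of this step, assuming binary chromosomes stored as strings. The `seeds` argument is a hypothetical way to hand-set a few individuals; everything else is randomized.

```python
import random

def initial_population(size, n_bits, seeds=(), rng=random):
    """Random binary chromosomes, optionally preceded by hand-picked ones."""
    population = list(seeds)[:size]
    while len(population) < size:
        population.append("".join(rng.choice("01") for _ in range(n_bits)))
    return population

# Six individuals of 12 bits each; the first one is set by hand.
population = initial_population(6, 12, seeds=["100000001101"], rng=random.Random(0))
```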
Review Steps in a GA
1. Initialization of population
2. Evaluation of fitness
3. Selection
4. Recombination
5. Repeat 2-4
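The five steps can be sketched as one short loop. This is an illustration under simple assumptions (binary strings, two-way tournament selection, one-point crossover, no mutation), not the specific setup of any GA program; the toy fitness counts the 1 bits in a chromosome.

```python
import random

def run_ga(fitness, n_bits=16, pop_size=20, generations=40, seed=0):
    rng = random.Random(seed)

    def pick(pop, scores):
        # 3. Selection: the fitter of two randomly drawn individuals
        i, j = rng.sample(range(len(pop)), 2)
        return pop[i] if scores[i] >= scores[j] else pop[j]

    # 1. Initialization of population
    pop = ["".join(rng.choice("01") for _ in range(n_bits)) for _ in range(pop_size)]
    for _ in range(generations):  # 5. Repeat 2-4
        # 2. Evaluation of fitness
        scores = [fitness(c) for c in pop]
        # 4. Recombination: one-point crossover of two selected parents
        children = []
        while len(children) < pop_size:
            p1, p2 = pick(pop, scores), pick(pop, scores)
            cut = rng.randrange(1, n_bits)
            children.append(p1[:cut] + p2[cut:])
        pop = children
    return max(pop, key=fitness)

# Toy problem: maximize the number of 1 bits.
best = run_ga(lambda c: c.count("1"))
```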
Evaluation
Assign a fitness value to each individual based on the parameters derived from its chromosome
Evaluate Fitness
The fitness function is based on the genes of the individual and should reflect how good a given set of parameters is.
Lift-to-drag ratio or low drag of an airfoil
Ability of a density functional to better predict chemical phenomena
Swimming speed of a robotic fish
Power output of a chemical laser
Evaluation of the fitness is the computationally intensive portion of a GA optimization.
Each chromosome holds the information that uniquely describes an individual.
Each chromosome (parameter set, individual) can be evaluated separately from the other individuals.
GA optimizations are typically described as embarrassingly parallel.
The evaluation of each chromosome reduces to a fitness value for the individual, which will be used in the next step.
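Because each individual is evaluated independently, the evaluation step maps directly onto a worker pool. A sketch using Python's standard `concurrent.futures`; the thread pool is an assumption that suits a fitness function that mostly waits on an external program, and `ProcessPoolExecutor` from the same module would be the analogous choice for CPU-bound Python code.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_population(population, fitness, max_workers=4):
    """Evaluate every chromosome independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fitness, population))

# Toy fitness: the number of 1 bits in each chromosome.
scores = evaluate_population(["1100", "1111", "0000"], lambda c: c.count("1"))
```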
Selection
Allow selection rules and random behavior to select next population
The parents must be selected based on their fitness. The individuals with a higher fitness must have a higher probability of having offspring. There are several methods for selection.
Roulette wheel selection has problems if the fitness changes by orders of magnitude. If two individuals have a much higher fitness, they could be the parents for every child in the next generation.
[Figure: roulette wheel divided into slices Fit(#1), Fit(#2), Fit(#3), Fit(#4), Fit(#5)]
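A sketch of roulette wheel selection, assuming non-negative fitness values so that each individual's slice of the wheel is its share of the total fitness. The five fitness values are made up to show the domination problem.

```python
import random

def roulette_probabilities(fitnesses):
    """Each individual's share of the wheel: its fitness over the total."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def roulette_select(population, fitnesses, rng=random):
    """Spin the wheel once and return the chosen parent."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

# Two individuals are orders of magnitude fitter than the other three.
probs = roulette_probabilities([1000, 900, 1, 1, 1])
dominant_share = probs[0] + probs[1]

parent = roulette_select(["a", "b", "c", "d", "e"], [1000, 900, 1, 1, 1],
                         rng=random.Random(1))
```

With these numbers the two fittest individuals own more than 99% of the wheel, so nearly every child descends from them.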
Rank Selection
All individuals in the population are ranked according to fitness. Each individual is assigned a weight inversely proportional to its rank (or another similar scheme), for example
$w = \frac{1}{\text{rank}}$ or $w = \frac{1}{(\text{rank})^{1.7}}$
[Figure: selection wheels with slices Fit(#1) through Fit(#5) sized by these rank-based weights]
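The rank-based weights can be sketched as follows, assuming rank 1 is the fittest individual and that ties are broken by position. The `power` argument is an illustrative name covering both the 1/rank scheme and steeper variants.

```python
def rank_weights(fitnesses, power=1.0):
    """Weight 1 / rank**power, where rank 1 is the fittest individual."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    weights = [0.0] * len(fitnesses)
    for rank, i in enumerate(order, start=1):
        weights[i] = 1.0 / rank ** power
    return weights

# Fitness values that differ by orders of magnitude give the same weights
# as values that differ only slightly: only the ordering matters.
wide = rank_weights([1000, 900, 1])
narrow = rank_weights([3.2, 3.1, 3.0])
```

These weights can be passed to a roulette-style draw in place of the raw fitness values.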
Tournament Selection
4 individuals (A,B,C,D) are randomly selected from the population. Two are eliminated and two become the parents of a child in the next generation
Example: Fitness(A) > Fitness(B) and Fitness(D) > Fitness(C), so B and C are eliminated and A and D become the parents.
Selection of parents continues until a new population is completed. Individuals might be the parent to several children, or no children.
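A sketch of one tournament, assuming the four individuals are paired off in the order they are drawn. The toy population and fitness (the number of 1 bits) are hypothetical.

```python
import random

def tournament_parents(population, fitness, rng=random):
    """Draw 4 individuals, pair them off, and return the two winners."""
    a, b, c, d = rng.sample(population, 4)
    first = a if fitness(a) > fitness(b) else b
    second = c if fitness(c) > fitness(d) else d
    return first, second

population = ["0000", "0001", "0011", "0111", "1111"]
parent1, parent2 = tournament_parents(population, lambda c: c.count("1"),
                                      rng=random.Random(0))
```

Only comparisons are used, so the scale of the fitness values does not matter, and the least fit individual in the population can never win a pairing.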
9/16, 6/16, 1/16
Recombination
Using the chromosomes of the parents, we create the chromosome of the child
[Figure: parent and child chromosomes, each divided into Param. 1 (eyes) and Param. 2 (nose)]
In this case the child will have a new nose that is not the same as that of parent 1 or parent 2.
Param. 1
1 5
1 0
Param. 2
0 0
1 5
0 3
Allows more variation in offspring and decreases need for random mutations
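A sketch of one-point crossover that contrasts two choices of crossover point: only between parameters, or anywhere in the chromosome. The `anywhere` flag and the fixed `gene_length` per parameter are illustrative assumptions.

```python
import random

def crossover(parent1, parent2, gene_length, anywhere=False, rng=random):
    """One-point crossover; the cut falls between parameters unless anywhere=True."""
    if anywhere:
        cut = rng.randrange(1, len(parent1))
    else:
        n_params = len(parent1) // gene_length
        cut = gene_length * rng.randrange(1, n_params)
    return parent1[:cut] + parent2[cut:]

# Two parameters of 4 bits each: the only boundary is after bit 4, so the
# child inherits whole parameters (Param. 1 from parent 1, Param. 2 from parent 2).
child_whole = crossover("11110000", "00001111", 4, rng=random.Random(0))

# A cut inside a parameter can produce a value that neither parent had.
child_mixed = crossover("11110000", "00001111", 4, anywhere=True, rng=random.Random(0))
```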
Creep mutations are a special type of random mutation. Creep mutations cause a parameter to change by a small amount, rather than randomizing any one element.
[Figure: 8-bit example genes for Param. 1 and Param. 2 before and after random mutations and creep mutations]
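The two kinds of mutation can be sketched for a binary gene. The sketch assumes a creep moves the encoded value by exactly one level and clamps at the ends of the range; both function names are illustrative.

```python
import random

def random_mutation(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if b == "0" else "0") if rng.random() < rate else b for b in bits
    )

def creep_mutation(bits, rng=random):
    """Move the encoded integer up or down by one level, staying in range."""
    top = 2 ** len(bits) - 1
    level = int(bits, 2) + rng.choice((-1, 1))
    level = min(max(level, 0), top)
    return format(level, "0{}b".format(len(bits)))

flipped = random_mutation("11100000", rate=0.1, rng=random.Random(0))
crept = creep_mutation("11111000", rng=random.Random(0))
```

A random bit flip can change the parameter by a large amount (the leading bit is worth half the range), while a creep always changes it by one level, even though that may flip several bits at once, as in 11111000 to 11110111.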
The desirable frequency of mutations depends greatly on the other GA options chosen.
Parent 1: A B C D E F G H
Parent 2: a b c d e f g h
Child after crossover: A B C D e f g h
Child after mutation: A B C D e f g H
Elitism
Elitism refers to the safeguarding of the chromosome of the most fit individual in a given generation. If elitism is used, only N-1 individuals are produced by recombining the information from parents. The last individual is a copy of the most fit individual from the previous generation. This ensures that the best chromosome is never lost in the optimization process due to random events.
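Elitism fits in a few lines. In this sketch `make_child` is a hypothetical callback that stands for selection plus recombination; the stand-in used in the example always returns a useless child, to show that the best chromosome survives regardless.

```python
def next_generation(population, fitness, make_child):
    """Produce N-1 children and copy the most fit individual unchanged."""
    elite = max(population, key=fitness)
    children = [make_child(population) for _ in range(len(population) - 1)]
    return children + [elite]

old = ["0001", "0111", "0011"]
new = next_generation(old, lambda c: c.count("1"), lambda pop: "0000")
```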
More GA Options
Separate islands with populations that interact infrequently
Use male & female populations
Alpha male selection
Use 3 parents for each offspring
Use 3 sexes
Recessive genes
...and many more, most of which are only useful for very specific types of fitness functions.
Micro-GA
Small population size of 1 individual per parameter optimized.
No random mutations.
When the genetic variance is below a certain threshold (~5%), the most fit individual goes on, while the chromosomes of all other individuals are randomized.
Cycle this process.
Overview of a micro-GA
Create initial population
Evaluate the fitness of each individual
Select the parents based on fitness
Recombine parents to create new population
Is the variance in genes less than 5%?
No: continue with the next evaluation
Yes: randomize the chromosomes for all but the most fit individual, then continue with the next evaluation
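A sketch of the micro-GA cycle. Several details are assumptions made for the illustration: the variance in genes is measured as the mean fraction of bits that differ from the most fit individual, the most fit individual is also carried over within a cycle, and the population size of 5 is arbitrary rather than one individual per parameter.

```python
import random

def diversity(population, best):
    """Assumed variance measure: mean fraction of bits differing from the best."""
    diffs = sum(a != b for c in population for a, b in zip(c, best))
    return diffs / (len(population) * len(best))

def micro_ga(fitness, n_bits, pop_size=5, generations=200, threshold=0.05, seed=0):
    rng = random.Random(seed)

    def randomized():
        return "".join(rng.choice("01") for _ in range(n_bits))

    def pick(pop):
        a, b = rng.sample(pop, 2)  # two-way tournament
        return a if fitness(a) >= fitness(b) else b

    pop = [randomized() for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=fitness)
        if diversity(pop, best) < threshold:
            # Converged: the most fit individual goes on, the rest are randomized.
            pop = [best] + [randomized() for _ in range(pop_size - 1)]
        else:
            # No random mutations: children come from crossover only.
            children = [best]  # assumption: keep the best within a cycle too
            while len(children) < pop_size:
                p1, p2 = pick(pop), pick(pop)
                cut = rng.randrange(1, n_bits)
                children.append(p1[:cut] + p2[cut:])
            pop = children
    return max(pop, key=fitness)

best = micro_ga(lambda c: c.count("1"), n_bits=12)
```

The restarts take over the role that random mutations play in a conventional GA: they are the only source of new genetic material once the small population has converged.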
Review
Create initial population
Evaluate the fitness of each individual (Evaluation)
Select the parents based on fitness (Selection)
Create new population (Recombination)
If you are unsatisfied with the progress, you can change a couple of options:
Allow crossover everywhere to introduce more variety between each generation.
Change to base-10 encoding as well, so it isn't too random.
Use rank selection with a weight of 1/rank^3 to heavily favor the best individuals.
Method to Parallelize a GA
MPI is always a good choice if you're already familiar with it. This option would also enable GAs with islands on heterogeneous clusters.
Fork/wait would be very easy.
Can be done in several languages.
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $max_process = 4;
my $pm = Parallel::ForkManager->new($max_process);
my @all_chromosomes = @ARGV;

# writeinput, evalfitness and parse_output are user-supplied subroutines

# enter main loop; no more than $max_process
# children will run at a time
foreach my $chromosome (@all_chromosomes) {
    # fork: the parent moves on to the next chromosome
    my $pid = $pm->start and next;
    # prepare an appropriate input file with
    # this set of parameters
    writeinput($chromosome);
    # run the program to evaluate the fitness
    evalfitness($chromosome);
    $pm->finish;
}

# make sure that all the processes are done
$pm->wait_all_children;

# parse output and return to our GA
parse_output();
GA Programs Available
David Carroll's GA Driver
http://cuaerospace.com/carroll/ga.html
GAUL
http://gaul.sourceforge.net/
Simple Generalized GA
http://www.skamphausen.de/software/AI/ga.html