
Computationally Intensive and Noisy Tasks: Co-Evolutionary Learning and Temporal Difference Learning on Backgammon

Motivation

Experimental Setup

A Benchmark and a Representation


Fitness Function and Other Parameters Measuring Genetic Diversity

Results

How Big a Population, How Many Games


Corollary: Small Population, More Games is Worse

Conclusions

Experimental Setup

Pubeval (trained by Temporal Difference learning) is the benchmark. The co-evolutionary system here also uses Pubeval's simple linear representation.

On a more sophisticated neural network architecture, this method created the world's best Backgammon computer, TD-Gammon.


Pubeval is two linear functions: one for the main part of a game of Backgammon, and one for the final racing stage, when pieces no longer have to pass the opponent's pieces. This racing stage is less interesting than the main part, because there is an algorithm to exactly solve the end game.

Co-evolution here only optimizes the first function; the final racing part of the game uses Pubeval's racing weights.
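As a rough sketch of this two-function design (the feature encoding and weight values below are placeholders, not Pubeval's actual numbers), the evaluation picks whichever weight vector matches the current phase and takes a weighted sum over the board features:

```python
def evaluate(features, contact_weights, race_weights, is_race):
    """Score a board position with the phase-appropriate linear function.

    `features` is a numeric encoding of the board; only the race weights
    are fixed (taken from Pubeval), while the contact weights are the
    part that co-evolution optimizes.
    """
    weights = race_weights if is_race else contact_weights
    return sum(w * f for w, f in zip(weights, features))
```

In play, the higher-scoring candidate move is chosen; the linear form keeps each evaluation cheap, which matters when fitness requires hundreds of games.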

Measuring Genetic Diversity

A popular measure of genetic diversity is the Shannon index. Given n different groups, each holding a fraction f_i of the total number of individuals, the Shannon index is

H = -sum_{i=1}^{n} f_i ln(f_i)
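The Shannon index is straightforward to compute from group counts; a minimal sketch:

```python
import math

def shannon_index(counts):
    """Shannon diversity index H = -sum(f_i * ln f_i) over group fractions f_i.

    `counts` gives the number of individuals in each group; empty groups
    contribute nothing (the limit of f*ln(f) as f -> 0 is 0).
    """
    total = sum(counts)
    fractions = [c / total for c in counts if c > 0]
    return -sum(f * math.log(f) for f in fractions)
```

H is maximized (H = ln n) when all n groups are equally sized, and falls to 0 when everyone is in one group, so a drop in H signals a loss of diversity.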

The question facing this paper is: for a given way to represent a solution, how can co-evolutionary learning obtain the highest ability from the least CPU time?

Results (How Big a Population, How Many Games)

A plausible answer is that, on this noisy task, more samples per evaluation would help. Do those extra games make any difference at all?

Sampling more games would more accurately discern the differences in ability among the members of the population. It turns out that the extra precision in those evaluations does indeed have an effect, but a negative one: more games reduce the behavioral diversity, which in turn requires even more evaluations to discern those smaller differences between players.
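The precision argument can be made concrete. A win rate estimated from G independent games is a binomial proportion, so its standard error shrinks only as 1/sqrt(G); distinguishing two players whose true win rates differ by d requires the error to be small relative to d. A minimal sketch (the function name is illustrative, not from the paper):

```python
import math

def win_rate_std_error(p, games):
    """Standard error of a win-rate estimate from `games` independent games,
    where p is the true probability of winning one game."""
    return math.sqrt(p * (1 - p) / games)
```

At p = 0.5, 100 games give an error of about 0.05, while 1600 games are needed to shrink it to 0.0125, a fourfold gain in precision for sixteen times the CPU time. If more games also compress the spread of abilities in the population, the precision needed to tell players apart grows even as it gets more expensive to obtain.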

Corollary: Small Population, More Games is Worse

Since we're interested in achieving a representation's peak ability from the least CPU time, it is reasonable to ask what happens when the population size is barely large enough, instead of generously large. Smaller populations use less CPU time. But a smaller population has more trouble maintaining diversity, and Figures 3 and 4 show that, with such a small population, playing more games per evaluation makes learning worse.

Co-Evolution, What Is It Good For?

The answer depends partly on your learning task and computational resources.

If your task is such that a small improvement doesn't count, and parallel hardware is unavailable, then temporal difference learning is attractive. If the task requires the best possible competitive advantage, and coming second means losing, then using much more CPU time for the best possible result may be worth it. If parallel hardware is available, co-evolution becomes attractive.

Conclusion

Use a generously large population: more noise requires a larger population. If you skimp on population size, more evaluations can make learning worse, not better, because of their tendency to reduce diversity. Use just enough evaluations that adding more does not improve learning; how many is enough depends on the task and your implementation. Here, each individual needs to take part in about 1600 games.

Computationally intensive noisy tasks are tractable to co-evolutionary learning on inexpensive parallel hardware, and, given enough computational power, co-evolution can create a solution comparable to Temporal Difference learning.
