4. Preliminary Investigations
to use sent the same data, but the “see” messages were slightly different. In version 9 of the protocol the server (using the previous example) would send:

    (see 43 ((line t) 6.2 -17) ((ball) 16.8 15))

However, we changed to version 7 as this was what the RoboCup manual used. In doing so, the messages were sent slightly differently:

    (see 43 ((l t) 6.2 -17) ((b) 16.8 15))

The words “line” and “ball” had been replaced with the letters “l” and “b”. Although this was not a major problem, it became a hindrance: some of the first code we wrote was based on version 9 of the protocol, and when we changed to version 7 our code fell over.

4.3. Choosing a Learning Algorithm

There were a few options open to us when it came to choosing a learning algorithm, and we narrowed it down to two: neural networks and genetic algorithms.

4.3.1. Neural Networks. A neural network consists of a number of interconnected neurons called nodes. They work by inputting data into the input nodes and processing it through the network to the output nodes. The number and strength (the weight) of the connections between the nodes determine the final output. The theory is that the neural network works in the same way as an incredibly primitive biological brain. A human brain, however, contains billions of neurons and connections, so even if scientists fully understood the dynamics of the human brain, it would still be beyond our computing capacity to simulate it.

A neural network typically has three layers: the input layer, the hidden layer and the output layer. In this simple example there are just two input nodes and two output nodes, with four hidden layer nodes. Input values are entered at the input nodes and processed through the network. As the values pass through each connection they are multiplied by the associated weight, a number between 0 and 1. The summations of these weighted values are worked out at each node and, if a threshold value is reached, they are passed on to the next layer, until eventually there is a value in the output nodes (see section 3.1.1. in the corpus of material).

Neural networks can learn through a number of techniques, all of which involve changing the weights associated with the interconnections between the nodes. They are used for pattern recognition amongst other things. They have three main advantages:
    1. They have the ability to learn through example
    2. They are more fault tolerant than most other learning algorithms
    3. They are more suited to real-time operations due to their high computational rates.

4.3.2. Genetic Algorithms. Genetic algorithms are inspired by Darwin’s theory of evolution. The algorithm begins with a set of solutions to a problem, called the population. Solutions from one population are taken and used to form new populations, in the hope that the new population will be better at solving the problem than the old one. The solutions chosen to “reproduce” are those that provide the best answers to the problem, much like survival of the fittest in biology.

A basic genetic algorithm has the following steps:
    1. Generate a random population of n suitable solutions
    2. Evaluate the fitness of each solution in the population
    3. Create a new population by selecting the two best performing solutions and combining them to produce offspring. Then mutate the offspring slightly and place them in the new population.
    4. Use the new population and run the algorithm again.
    5. If you are satisfied with the results stop, else go to step 2.
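The steps above can be sketched as a short program. The following is a minimal illustration rather than our actual implementation: the encoding of a solution as an array of doubles, the toy fitness function and the mutation size are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

public class SimpleGA {
    static final Random RNG = new Random(42);

    // Toy fitness for illustration: higher is better when genes are close to 3.0.
    static double fitness(double[] genes) {
        double f = 0;
        for (double g : genes) f -= (g - 3.0) * (g - 3.0);
        return f;
    }

    // Combine the two best solutions with single-point crossover (step 3).
    static double[] crossover(double[] a, double[] b) {
        int cut = 1 + RNG.nextInt(a.length - 1);
        double[] child = new double[a.length];
        for (int i = 0; i < a.length; i++) child[i] = i < cut ? a[i] : b[i];
        return child;
    }

    // Mutate the offspring slightly (step 3).
    static void mutate(double[] genes) {
        genes[RNG.nextInt(genes.length)] += RNG.nextGaussian() * 0.1;
    }

    static double[] evolve(int popSize, int geneCount, int generations) {
        // Step 1: a random population of suitable solutions.
        double[][] pop = new double[popSize][geneCount];
        for (double[] sol : pop)
            for (int i = 0; i < geneCount; i++) sol[i] = RNG.nextDouble() * 10;

        for (int gen = 0; gen < generations; gen++) {
            // Step 2: evaluate fitness and sort best-first.
            Arrays.sort(pop, (a, b) -> Double.compare(fitness(b), fitness(a)));
            // Step 3: breed the next population from the two best performers.
            double[][] next = new double[popSize][];
            next[0] = pop[0].clone();            // keep the best solution unchanged
            for (int i = 1; i < popSize; i++) {
                double[] child = crossover(pop[0], pop[1]);
                mutate(child);
                next[i] = child;
            }
            pop = next;                          // steps 4-5: run again on the new population
        }
        Arrays.sort(pop, (a, b) -> Double.compare(fitness(b), fitness(a)));
        return pop[0];
    }

    public static void main(String[] args) {
        double[] best = evolve(20, 4, 200);
        System.out.println(Arrays.toString(best)); // genes drift towards 3.0
    }
}
```

Keeping the best solution unchanged in step 3 (elitism) means the best fitness can never get worse from one generation to the next.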
    2. They do not rely on any assumptions or prior knowledge of the search space
    3. A genetic algorithm search runs based on probabilistic rules instead of deterministic ones.
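The third advantage refers to selection schemes such as fitness-proportionate (“roulette wheel”) selection, in which fitter solutions are more likely, but never certain, to be chosen to reproduce. A minimal sketch, with made-up fitness scores:

```java
import java.util.Random;

public class RouletteSelect {
    // Pick an index with probability proportional to its (non-negative) fitness.
    static int select(double[] fitness, Random rng) {
        double total = 0;
        for (double f : fitness) total += f;
        double spin = rng.nextDouble() * total;   // random point on the wheel
        double cumulative = 0;
        for (int i = 0; i < fitness.length; i++) {
            cumulative += fitness[i];
            if (spin < cumulative) return i;
        }
        return fitness.length - 1;                // guard against rounding error
    }

    public static void main(String[] args) {
        double[] fitness = {1.0, 3.0, 6.0};       // made-up scores for illustration
        int[] counts = new int[3];
        Random rng = new Random(1);
        for (int i = 0; i < 10000; i++) counts[select(fitness, rng)]++;
        // Roughly 10%, 30% and 60% of the picks respectively.
        System.out.printf("%d %d %d%n", counts[0], counts[1], counts[2]);
    }
}
```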
direction of the ball, and the two values we wished to output were the turn value (which direction the player turns) and the dash value (how fast the player runs):

Figure 4. The 2x2x2 neural network

To create this small network in Java we created a number of arrays. The first was called neuronTypes, and this was used to identify which input neuron the values for the ball’s distance and direction should be loaded into. We then created neuronOutputTypes to extract the correct output value. These arrays were of String type, but we also required arrays to handle the actual values, so we created the output layer, a one-dimensional array called outputLayer, and two two-dimensional arrays called inputLayer and hiddenLayer. The reason inputLayer and hiddenLayer were two-dimensional was so we could store the value and the weights associated with each neuron in the same array. This enabled us to save valuable time during the processing stage.

Instead of taking well over 100ms to process data through the network, it was now taking nanoseconds. We then tested how long it took to parse the messages and then process the network. We found it took around 2 or 3 milliseconds almost every time, but occasionally it would jump to 30 or 40. We discovered the reason for this was that Java performed its own garbage collection automatically when a certain amount of memory was in use, so every now and then it would take time out to clear this. We decided to request garbage collection manually every time the messages were processed through the network. This increased the average time to parse and process the messages, but it meant that the whole process never took more than about 10ms.

5.3. Backpropagation

The backpropagation algorithm starts by initialising all the weights in the neural network to random values, usually in the range -0.5 to 0.5. The data is then fed through the network to give some output. The algorithm assumes you know what you want at the output nodes, so what you actually get will have an error. If you wanted an output of 30 and you get an output of 50, you have an error of 20. It is this error that leads to the next phase, which gives the algorithm its name: the errors are fed back through the network, and changes are made to the weights of the nodes depending on how much each node contributes to the error at the output. The algorithm repeats this process until the outputs produced for the training data are sufficiently close to the desired outputs; in other words, until the error is sufficiently small (further explanation is given in section 3.1.1. in the corpus of material).

For the backpropagation algorithm to work you must first train the network using training data. Since we only had a small network to begin with, we decided we would train it to recognise the following simple situation:

    If the ball is greater than 20m away, turn towards it but don’t run
    If the ball is less than 20m away, turn and run towards it

After implementing the mathematics of the algorithm we began to train the network, somewhat unsuccessfully. We had a number of problems, one of which was to do with normalising our inputs. We were advised to use input values between 0 and 1, but the ball could be anything up to 160m away, and the direction of the ball could be from -180° to 180°. We therefore decided to normalise the inputs by dividing them by the maximum possible error we could receive at the output nodes, being 160 for ball distance and 360 for ball direction.

Another problem was to do with the sigmoid function that was required during the backpropagation algorithm. The problem here was that it would never produce negative numbers. There were a couple of ways round this. Firstly, we could have used another version of this function that allowed for negative values, called the tan sigmoid, but eventually we just worked around the problem by assuming we would never use negative numbers: for example, if we had to turn -10° we would simply turn 350°, as this amounted to the same thing.

We eventually managed to get the neural network learning this one situation after around 500 cycles through the backpropagation algorithm. Now we knew the mathematics was working, we attempted to learn every combination of ball distance and direction - all 57600 (160*360) of them. Here, however, the algorithm did not work as we had hoped. Instead of learning the two rules we had given it, the network seemed to just take an average of the maximum errors we were giving it.
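The training procedure described above can be sketched in miniature. The code below is an illustration, not our actual network: it uses a 2-2-1 network (two inputs, two hidden nodes, one output), learns only the dash half of the rule (run if the ball is nearer than 20m), and the learning rate, random seed and the balanced sampling of near and far examples are all assumptions made for the example. The inputs are normalised by 160 and 360 as described.

```java
import java.util.Random;

public class TinyBackprop {
    static final int IN = 2, HID = 2;
    static double[][] w1 = new double[HID][IN + 1];  // hidden weights (last entry is the bias)
    static double[] w2 = new double[HID + 1];        // output weights (last entry is the bias)

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Forward pass; hidden activations are written into 'hid' for reuse in backprop.
    static double forward(double[] in, double[] hid) {
        for (int h = 0; h < HID; h++) {
            double sum = w1[h][IN];
            for (int i = 0; i < IN; i++) sum += w1[h][i] * in[i];
            hid[h] = sigmoid(sum);
        }
        double sum = w2[HID];
        for (int h = 0; h < HID; h++) sum += w2[h] * hid[h];
        return sigmoid(sum);
    }

    // One online backpropagation update for a single training pattern.
    static void train(double[] in, double target, double rate) {
        double[] hid = new double[HID];
        double out = forward(in, hid);
        double dOut = (target - out) * out * (1 - out);          // error at the output node
        for (int h = 0; h < HID; h++) {
            double dHid = dOut * w2[h] * hid[h] * (1 - hid[h]);  // share of the error fed back
            for (int i = 0; i < IN; i++) w1[h][i] += rate * dHid * in[i];
            w1[h][IN] += rate * dHid;
            w2[h] += rate * dOut * hid[h];
        }
        w2[HID] += rate * dOut;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        // Initialise all weights to random values in the range -0.5 to 0.5.
        for (double[] row : w1)
            for (int i = 0; i < row.length; i++) row[i] = rng.nextDouble() - 0.5;
        for (int i = 0; i < w2.length; i++) w2[i] = rng.nextDouble() - 0.5;

        // Target rule: dash (output 1) if the ball is nearer than 20m, else don't (output 0).
        for (int epoch = 0; epoch < 20000; epoch++) {
            // Alternate near and far examples so both rules are seen equally often (our choice).
            double dist = (epoch % 2 == 0) ? rng.nextDouble() * 20 : 20 + rng.nextDouble() * 140;
            double dir = rng.nextDouble() * 360 - 180;
            double[] in = { dist / 160.0, (dir + 180) / 360.0 };  // normalised inputs
            train(in, dist < 20 ? 1.0 : 0.0, 0.5);
        }

        double near = forward(new double[]{ 5 / 160.0, 0.5 }, new double[HID]);
        double far  = forward(new double[]{ 100 / 160.0, 0.5 }, new double[HID]);
        System.out.printf("near=%.2f far=%.2f%n", near, far);  // near high, far low
    }
}
```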
Dash converged to 80 (with a maximum error of 160) and Turn converged to 180 (with a maximum error of 360).

We consulted a neural network expert, Dr. Konstantinos Sirlantzis, who showed us that there was more than one way to teach a neural network using backpropagation: an online method and a batch method. Online means you backpropagate every time data is processed through the network, whereas batch means you run all the training patterns, sum the errors to find an overall average error, and then backpropagate using this average. We tried them both, but still the network failed to learn, more often than not converging to the average of the maximum error.

After this failed we assumed there must be something wrong with the code we had written, so we went through it line by line and calculated each weight update manually to make sure it was doing exactly what we wanted. A couple of mistakes were found - we were applying the sigmoid function twice to the same value, and there were brackets in the wrong place - but other than these small typos the mathematics worked as we planned, and even after fixing these mistakes the network still was not learning the patterns we presented.

We were faced with the following problem: should we continue with the neural network, which was the backbone of the client programs we had written, or should we consider a different learning algorithm? We decided to give ourselves a deadline on the neural network; if it still wasn’t working by this date we would be forced to consider some other implementation, likely to be some kind of genetic algorithm.

In the remaining weeks of using the neural network we went through our code with Dr. Sirlantzis, but he could see nothing wrong with how it was implemented. In the backpropagation algorithm there are many different variables we could change. The learning rate, for example, is one such parameter, and we altered it across values from 0.001 to 2000, but this didn’t seem to have any effect. We then tried increasing the number of hidden layer neurons in the network from two to higher numbers, because we thought that maybe there were not enough routes through the network for all the combinations of patterns to be learnt.

After weeks of tweaking and testing, the network simply was not working, and we decided we had to change our approach dramatically and implement a rule-based client that used a genetic algorithm to learn.

5.4. Synchronisation Issues

While we were developing the neural network, not much attention was paid to the synchronisation of the threads we created. We always knew this would have to be done, but we thought it would be better to clean up the code and synchronise it once the neural network was working. Since the network never did work properly, we decided to implement this before we continued with the genetic algorithm, to save any more work in the future.

Two threads are created by our program, one of them being the PlayerBrain and the other containing the smaller brains and the SensoryInput class. The motivations behind synchronising the threads were twofold: firstly, we wanted to make sure we only sent a maximum of three commands per cycle (the maximum number allowed by the server), and secondly, we did not want to be in a situation where we were reading and writing data on the same arrays. We had to create a semaphore to ensure that decide() - the method used to make the decisions - was blocked until told to run. Since sense_body messages are sent every 100ms (the same time as one cycle), we agreed this would be used to call decide(). However, there was another problem in that see messages were sent every 150ms, which meant that every third cycle we would have two messages being sent to us. Since we wanted see messages to be loaded into the arrays first, we had to call decide(), then sleep for 15ms and use a second semaphore to block until all the see parameters were loaded. This ensured that decisions were not being made on the previous cycle’s information.

5.5. Implementing Genetic Algorithms

The approach we took with the genetic algorithm was to design the player so he could recognise different situations. In each situation he would have some actions, but the genetic algorithm would be responsible for learning what value each action should take. For example, if the player is in a situation where he must turn and then run, the genetic algorithm does not actually change the fact that the player will turn and run, but it will determine how fast and in what direction the player will go.

The first step therefore was to write down all the different situations we wanted our players to recognise, and then code them. This was different for each type of player - midfielder, defender, etc. - and proved quite a hard task in itself.
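The scheme in section 5.4 can be sketched with semaphores from java.util.concurrent. This is a reduced illustration of the two motivations - decide() is blocked until a (simulated) sense_body tick, and a second semaphore gives mutual exclusion on the shared arrays - rather than our actual classes; the 150ms see timing and the 15ms sleep are omitted for clarity.

```java
import java.util.concurrent.Semaphore;

public class SyncSketch {
    static final Semaphore cycleTick = new Semaphore(0);  // one permit per sense_body message
    static final Semaphore arrayGuard = new Semaphore(1); // guards the shared arrays
    static final double[] seen = new double[2];           // e.g. ball distance and direction
    static int decisions = 0;

    // Called by the sensory thread when a see message has been parsed.
    static void loadSeeData(double dist, double dir) throws InterruptedException {
        arrayGuard.acquire();
        try { seen[0] = dist; seen[1] = dir; } finally { arrayGuard.release(); }
    }

    // Called once per cycle; blocked until a sense_body tick tells it to run.
    static void decide() throws InterruptedException {
        cycleTick.acquire();
        arrayGuard.acquire();   // never read the arrays while the sensor thread writes them
        try { decisions++; /* choose turn/dash values from seen[] here */ }
        finally { arrayGuard.release(); }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread sensor = new Thread(() -> {
            try {
                for (int cycle = 0; cycle < 10; cycle++) {
                    loadSeeData(cycle, 0.0);  // pretend a see message arrived
                    cycleTick.release();      // pretend a sense_body arrived (every 100ms)
                    Thread.sleep(100);
                }
            } catch (InterruptedException ignored) {}
        });
        sensor.start();
        for (int cycle = 0; cycle < 10; cycle++) decide();
        sensor.join();
        System.out.println("decisions: " + decisions);
    }
}
```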
We spent a lot of time testing the code to make sure each situation was being recognised correctly. It was here that we noticed a problem. The player primarily uses what he can see to work out the situation he is in, but when the information is loaded into the arrays, if something cannot be seen, that piece of information is not updated, and therefore the array could hold old information. We solved this problem by setting a default value of -999 for all the elements; the player then knew he could not see an object if the value stored for it was -999.

The second step was to implement a mechanism that would ensure the players only sent a maximum of three commands per cycle. This was done by creating a LinkedList of length three and adding commands to it. The first command would be taken and sent to the server, and the next cycle the second command would be sent. This ensured only one action command was sent per cycle. The only time the player made a decision was when the list of commands was empty, thus making sure we only ever queued a maximum of three messages (there can only be one kick, turn, dash or turn_neck command per cycle, but say messages are allowed to be sent as well as any of the previous ones). Another LinkedList of length three was created for say messages. This meant that both action commands and say commands could be sent in the same cycle without exceeding the number of messages per cycle the server allowed.

The next stage in implementing the genetic algorithm was to create an array where each element refers to a specific dash, turn, kick or turn_neck value. In accordance with the process outlined in section 4.3.2. of this document, we would set all the values in the array to valid, random numbers.

Once this was complete we needed to implement a scoring system. We planned to do this with a kind of “fantasy football” scoring system, where the coach would rate the players depending on how well they played. For example, ten points would be awarded to a player who scores a goal, two points for a completed pass, etc. This way we could rate the fitness of each player and select the two best players to produce the next generation.

However, since the failure of the neural network, time was always against us. After we coded the players to recognise the different situations they were in, we made the decision to stop working on the genetic algorithm, as we knew we would not have time to implement, test and then train it. This was disappointing because we had made good progress in the two months we worked on it, and felt that with more time we would have done a good job of it. However, we felt the time would be put to better use by tidying up and documenting the code we already had and beginning the documentation.

5.6. Optimisation Issues

With cycle times being just 100ms, it was obviously important that our program executed everything it required and sent its messages to the server within this time. Even though we changed the learning algorithm, the overall structure of our program remained the same, with the smaller brains parsing the information and loading the values into arrays which would be used to make decisions. This meant that optimising the brains, especially the see brain, which was used most often, would save processing time whether we used a neural network or genetic algorithms.

The main operation the see brain performed was to break the message up into substrings (see section 5.1. of this document). However, this is time consuming because a lot of comparisons were being made. The optimised version was almost rewritten from scratch: instead of breaking the string down into smaller and smaller substrings, it went through the message once, indexing where specific brackets were. Depending on where these brackets were and how many parameters were contained within them, we could identify which objects the player could see and extract the necessary information.

This new indexing of the see message, instead of breaking it down, meant that the time it took to process one message went from an average of about 10ms down to an average of about 3ms.

We also went through the PlayerBrain to optimise the different loops that were required, and tidied up the code to make it as efficient as possible. After all these changes were made we managed to decrease the total processing time - from when we initially receive the messages to sending our own messages to the server - from 30ms to less than 1ms.

6. Conclusions

Obviously we did not achieve our major objective for this project, which was to implement a neural network so our players could learn about the opposition and progressively improve their performance. We were very disappointed about this because we had spent such a long time on it.
We were unable to fully diagnose why the network failed without risking missing the deadline. Neither our supervisor nor Dr. Sirlantzis could find any problems, so it remains a mystery. Maybe we missed some vitally important detail, or didn’t present the training patterns in a way that allowed the network to learn. However, we did choose a challenging project, and one of the risks we identified was not being able to implement the neural network, which unfortunately proved to be a reality.

Since we spent such a long time on the neural network, we had left ourselves a tight schedule for learning about, implementing and training a genetic algorithm. As we progressed with the algorithm we began to accept that it was not going to be fully implemented. This was frustrating because we quickly learned how genetic algorithms worked, and strongly believed that had we chosen this approach over the neural network we would have made a really good job of it. As it turned out, we had too little time and only managed to implement the players recognising the situations they found themselves in.

Although we did not meet our aims for the project, we don’t feel it was a wasted effort. We all now understand neural networks and genetic algorithms in some detail, and all learnt from the experience. If we were to begin the project again we would seriously consider implementing a genetic algorithm from the start. The message parsing and handling would be kept the same, but having taken both learning algorithms as far as we could, we believe we would have done a better job with the genetic algorithm. We also agreed that we should have paid more attention to the synchronisation of the threads before we began working on the neural network. A lot of time was spent on this after the failure of the neural network, and the time we would have saved by completing the task before starting on the network would have been valuable as we began to reach the deadline.

As far as future work is concerned, there are various routes we could take. Finishing the genetic algorithm would obviously be the first thing to do, but we would also have liked to implement a more useful coach. The coach we actually managed to implement was quite primitive, but eventually he would have been necessary to record how well the players were doing, and so be able to choose the players that could win the game (natural selection) in order to produce offspring.

7. Acknowledgements

We would like firstly to thank our project supervisor Colin Johnson for his continued input and support for the project. Colin helped in numerous ways, including suggesting how we might implement the neural network using arrays instead of classes, as well as providing us with the backpropagation algorithm. He was always available for meetings, and discussed our problems until he was sure we understood how to overcome them. Additionally, he was a tremendous help with respect to the genetic algorithm and the write-up of documentation, including this paper.

We would also like to thank Dr. Konstantinos Sirlantzis for his much appreciated help while we struggled with the neural network. Kostas never hesitated when we asked to meet with him and discuss our implementation of the network, and he was always available for further meetings and gave us invaluable insights into the workings of neural networks. Without his help we would not have understood our problems with the network as well as we did, and he was genuinely disappointed when we had to abandon the network for a different approach.

8. References

[1] RoboCup Official Site, http://www.robocup.org/

[2] Neural Networks: Computerized Brains, http://www.glencoe.com/norton/n-instructor-/updates/1999/8299-4.html

[3] Introduction to Genetic Algorithms, http://cs.felk.cvut.cz/~xobitko/ga/

[4] Introduction to Backpropagation Neural Network, http://www.geocities.com/neuralbug/neural_networks.htm

[5] Bentley, P. J., Digital Biology, Simon & Schuster, 2001

[6] Coppin, Ben, Artificial Intelligence Illuminated, Jones & Bartlett, 2004

[7] Müller, Berndt, Neural Networks: An Introduction, Berlin, 1990

[8] Barnes, D., Object Oriented Programming with Java, Prentice Hall, 2000