
Comparing Word Categories to Make Sentences Using Genetic Algorithms

Eric Marsh
12/18/2015

Hypothesis: When using evolutionary computation to generate sentences, it is better to focus on evolving categories of words instead of evolving grammar structure.

Abstract: The goal of this project stems from the poor results of the last project. In the previous project, sentences came out jumbled and incoherent because the grammar itself was generated by the genetic algorithm, and as a result was ruined by the negative effects of crossover and mutation. In this project, the goal was to evolve good sentences by comparing categories of words and plugging them into a sentence template. For example, when evolving a subject, the program can use family nouns such as sister or baby together with violent verbs such as kick or punch, plugged into a template to get "The baby liked to punch." These sentences are then assigned a fitness by a user. Overall, this new method produced much better sentences than the previous project, supporting my hypothesis. This is largely because the sentence templates provide a much better grammar structure than the previous project could evolve. Another reason it did better is that it kept producing sentences similar to the user's tastes; as a result, the user would see many sentences based on the ones they had chosen before. For the future, it is recommended to allow individuals to carry more than just one subject and one verb. It would also be nice to see this implemented on Twitter, as the last project was.
Methods/Algorithms: This genetic algorithm consisted of a population of 5 individuals, with each individual containing a subject category and a verb category. Each individual is then assigned a subject from the list for its subject category (iguana from ANIMALS) and a verb from the list for its verb category (paint from CREATIVE). These words are then plugged into a shared template (The SUBJECT was a master at VERB.) to get an actual sentence (The iguana was a master at paint.) The resulting sentences were then ranked by a user and assigned a fitness based on that ranking. For the next generation, the best individual was appended to the population as elitism. Then two individuals are chosen by a tournament selection function and crossed over. Crossover works by switching either the subject categories or the verb categories between the two individuals. Once an individual receives a new category from the other individual, it is assigned a random word from that category. An example of crossover can be seen in Figure 1: the first individual receives the second individual's category of KITCHEN APPLIANCE, so it picks fridge.

Figure 1: Crossover. Green text marks the subjects; yellow text marks the verbs.
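The representation and crossover step described above can be sketched in Python. This is a hypothetical reconstruction, not the project's actual code: the category lists, field names, and the `make_individual`/`crossover` helpers are placeholders chosen for illustration.

```python
import random

# Placeholder category lists standing in for the project's word lists.
CATEGORIES = {
    "subjects": {"ANIMALS": ["iguana", "dog"], "APPLIANCES": ["fridge", "toaster"]},
    "verbs": {"CREATIVE": ["paint", "write"], "VIOLENCE": ["kick", "punch"]},
}

def make_individual(subj_cat, verb_cat):
    """An individual holds one subject category and one verb category,
    plus a concrete word drawn at random from each category's list."""
    return {
        "subj_cat": subj_cat,
        "subj_word": random.choice(CATEGORIES["subjects"][subj_cat]),
        "verb_cat": verb_cat,
        "verb_word": random.choice(CATEGORIES["verbs"][verb_cat]),
    }

def crossover(a, b):
    """Swap either the subject categories or the verb categories (50/50),
    then redraw a random word from the newly received category."""
    a, b = dict(a), dict(b)  # work on copies; the parents stay intact
    if random.random() < 0.5:
        a["subj_cat"], b["subj_cat"] = b["subj_cat"], a["subj_cat"]
        for ind in (a, b):
            ind["subj_word"] = random.choice(CATEGORIES["subjects"][ind["subj_cat"]])
    else:
        a["verb_cat"], b["verb_cat"] = b["verb_cat"], a["verb_cat"]
        for ind in (a, b):
            ind["verb_word"] = random.choice(CATEGORIES["verbs"][ind["verb_cat"]])
    return a, b

def render(ind, template="The {s} was a master at {v}."):
    """Fill the shared sentence template with the individual's two words."""
    return template.format(s=ind["subj_word"], v=ind["verb_word"])
```

Because only the category slot is exchanged and the concrete word is redrawn, a child can land on a different word than its parent had, which is what keeps the sentences varied even when the user favors the same categories.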

After crossover, each individual has a chance to be mutated. Mutation works by either changing the subject category to another random subject category, or changing the verb category to another random verb category. If a category is changed, the individual is assigned a new word from that category's list. After this process is repeated enough times to accumulate a population of 5 individuals, the user ranks the sentences to assign their fitness again, and a new generation is born.
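The whole generation step (elitism, tournament selection of 2, a 30% crossover chance, and a 40% mutation chance) can be sketched as follows. This is a minimal sketch under assumptions: an individual is reduced to a (subject_category, verb_category) pair, lower fitness is better, and all names and category lists are illustrative rather than taken from the project's code.

```python
import random

SUBJ_CATS = ["ANIMALS", "POLITICS", "FAMILY"]   # placeholder category names
VERB_CATS = ["CREATIVE", "VIOLENCE", "LAZY"]

def tournament(pop, fitnesses, k=2):
    """Pick k random individuals and return the fitter (lower fitness) one."""
    picks = random.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: fitnesses[i])]

def next_generation(pop, fitnesses, size=5, p_cross=0.3, p_mut=0.4):
    """Build the next population of `size` individuals."""
    best_i = min(range(len(pop)), key=lambda i: fitnesses[i])
    new_pop = [pop[best_i]]                     # elitism: carry the best forward
    while len(new_pop) < size:
        a = tournament(pop, fitnesses)
        b = tournament(pop, fitnesses)
        if random.random() < p_cross:           # 30%: swap one category slot
            if random.random() < 0.5:
                a, b = (b[0], a[1]), (a[0], b[1])   # swap subject categories
            else:
                a, b = (a[0], b[1]), (b[0], a[1])   # swap verb categories
        for child in (a, b):
            if random.random() < p_mut:         # 40%: randomize one category
                if random.random() < 0.5:
                    child = (random.choice(SUBJ_CATS), child[1])
                else:
                    child = (child[0], random.choice(VERB_CATS))
            if len(new_pop) < size:
                new_pop.append(child)
    return new_pop
```

In the actual project each individual also carries its concrete subject and verb words, which would be redrawn whenever a category changes; the sketch omits that to keep the loop structure visible.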

Algorithm: Generational
Population size: 5
Selection method: Tournament selection of 2
Elitism (if used): The highest-ranked individual is appended to the next generation
Population initialization: A population of 5 individuals is initialized. Each individual picks a subject category (ex. POLITICS) and a verb category (ex. BODILYFUNCTIONS). Its subject_word is then assigned a random noun from that category's list (ex. Obama from POLITICS) and its verb_word a random verb from that category's list (ex. fart from BODILYFUNCTIONS). Each individual fills the same sentence template with its two words; a user then reads each individual's sentence and ranks all individuals.
Crossover method: 50% chance to swap subject categories, 50% to swap verb categories
Crossover rate: 30%
Mutation method: 50% chance to assign a random subject category, 50% to assign a random verb category
Mutation rate: 40%
Fitness function: A user reads the population and ranks each individual based on how well its subject and verb work together. The rank is then added to the individual's current fitness, so lower is better. (Ex. if the individual was the third best of a population and its initial fitness was 7, its new fitness is 10.)
Size control (if any): None; only two words were used for each individual
Word types: Subjects: ANIMALS, FAMILY, POLITICS, SCIFI, MATH, PLACES, CELEBRITIES, SILLY, JOBS, BODYPARTS, BODYFLUIDS, APPLIANCES, TECHNOLOGY, VEHICLES, CLOTHES, MUSIC. Verbs: FOOD, VIOLENCE, FUN, LAZY, SOCIAL, CREATIVE, ENGINEER
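The rank-accumulation fitness update described above can be shown in a few lines. The helper name is illustrative, not from the project:

```python
# Each generation the user ranks the 5 sentences (1 = best), and each
# individual's rank is added to its running fitness, so lower totals mark
# individuals that consistently please the user.
def update_fitness(fitnesses, ranks):
    """fitnesses: running totals; ranks: this generation's user ranking."""
    return [f + r for f, r in zip(fitnesses, ranks)]

# The example above: an individual ranked third with fitness 7 moves to 10.
print(update_fitness([7], [3]))  # -> [10]
```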

Results:

Initial population:
2) uncle is the master of wiggle
3) hair is the master of kill
1) President Obama is the master of paint
4) iguana is the master of read
5) wizard is the master of write

Next generation:
If you like President Obama, then you can go sleep
If you like pilot, then you can go read
If you like baby, then you can go cheer
If you like congress, then you can go attack
If you like brother, then you can go cheer

Figure 2: An example of an initial population evolving. The first list consists of the first 5 individuals; in the original figure, green text marks the subjects and yellow text the verbs. The numbers to the left of the sentences are their rankings.

[Chart: Population Fitness over Time. Y-axis: fitness value (lower is better), 0 to 40; X-axis: generation, 1 to 10; one line per individual, Individuals 1 through 5.]

Figure 3: All 5 individuals' fitness over time. Notice Individual 1 is the best individual and Individual 5 is the worst. The best evolved sentence was "Being a Hillary Clinton means you must fart."

Conclusion:
The hypothesis was supported. Evolving categories of words resulted in more readable and interesting sentences than before. The evolutionary algorithm stuck to the user's preferences and tended to create similar sentences. As seen in Figure 2, POLITICS and CREATIVE were the categories of the best individual in the initial population, and these categories reappear in the next generation (President Obama in the first sentence, read in the second sentence, congress in the fourth sentence). This made for more fun results for the user, since the algorithm leaned toward their tastes. For example, I have a childish sense of humor, so most of my sentences involved POLITICS and BODILYFUNCTIONS as the respective subject and verb categories. Since users tended to keep picking the same subjects and verbs, there tended to be one clear best individual; in Figure 3, it is Individual 1. However, we can still see diversity as individuals vary in fitness, for example in Generation 6, where many individuals' fitness rankings change.

While the results were positive, there is room for improvement in the future. For example, individuals should be able to carry more than just one subject and one verb. It would be nice to evolve varying word counts, such as an individual with 2 subjects and 1 verb, or 5 subjects and 6 verbs. This would make the results substantially more erratic, but more complex in category combinations. I would also like to see this implemented on Twitter. That was not done in this project because each individual had to be ranked from 1 to 5, and the Twitter program from the last project did not support that. In the future, implementing polling on a Twitter bot account would allow online users to vote collectively on sentences and allow for more interesting category combinations. Lastly, I would like to see more subject categories, verb categories, and sentence templates in the future. For this project, I had to think of them all by myself, so there isn't a large range of categories to pick from. Increasing this amount would allow for more complex sentences; this can be done by finding a database of these categories, or by writing a program to extract them from a text. Overall, this was a very fun project to create, and the success of the results shows that my hypothesis was correct.
