
Towards using Genetic Algorithms in lossy audio compression

Peter Vel
p.d.vel@student.utwente.nl

ABSTRACT


Compression is a search for the smallest representation of an information resource. To find the global optimum in the solution space for these problems, the use of genetic algorithms is beneficial. This research studies the theories behind audio compression and genetic algorithms, and applies genetic algorithms to the optimization problem of dynamically sized blocks of audio samples, which are encoded using a Discrete Cosine Transform (DCT). This technique is compared to a version of itself with static block sizes, by applying both to a test set of sounds, showing that genetic algorithms are not only beneficial to lossy audio compression techniques, but can also be combined well with current optimization techniques.

Keywords
Genetic Algorithms, audio compression, Fourier analysis, lossy compression, discrete cosine transform (DCT).

1. INTRODUCTION
Compression plays a central role in most modern-day communication. It is used when viewing an image, listening to MP3s, even when using a mobile phone. The field of compression is a highly theoretical one, which attempts to find the smallest possible representation of a source of information. To find these optima a wide range of techniques is applied, some of which will be discussed in the next chapters of this paper. A method that is occasionally used for trivial problems is that of a systematic brute force approach. However, the traditional inherent problem with brute force techniques lies in the impractical length of their processing time. The research in this paper attempts to show that an optimized brute force technique called genetic algorithms is feasible to use in a wider context of lossy audio compression. The goal of this research is not to create a new optimized compressor, but instead to prove the concept that higher compression rates can be attained when a compressor uses genetic algorithms. First, some key concepts of lossy audio compression will be described. Next, the application of genetic algorithms in this context will be explained, showing how such an algorithm might be applied to a real problem. The design and implementation of the prototype will then be described. The results of a number of test runs using this prototype will then be evaluated. Finally, a conclusion will be drawn from these results and suggestions made for improvements or further application of these concepts.

2. AUDIO COMPRESSION
To put this research into context, it should first be explained what lossy audio compression is and how it works. Compression techniques attempt to discover structure in the source input which can be used to create a compressed representation of the same information, as described in detail in [1-3]. A distinction should be made between the two main types of compression. Lossless compression is a type of compression in which no information is lost. The compressor attempts to find the smallest representation for a given source input (e.g. ZIP, RAR), from which the original information can still be completely restored. Because all information must be preserved, this leads to relatively low compression rates. Lossy compression is a type of compression in which some loss of information is allowed in order to reach even higher compression rates. The compressor attempts to find a balance between a small compressed representation and a high accuracy in describing the source input (e.g. JPEG, MP3). In most applications of audio compression (the topic of this paper), the loss of some information is in most cases not detrimental [1-3], and therefore audio is usually compressed using lossy algorithms.

2.1 Lossy Audio Compression
For lossy audio compression, the compressed representation is created by extracting features from the source data and describing these features using some coding scheme, rather than the source itself. Digital audio consists of sampled sound waves which can be displayed in a graph as a wave form (the term wave form is also used in this paper to mean digital sound in general). Audio is periodic by nature, so most algorithms use this property when compressing. Some of the most commonly used feature extraction techniques are variants of the Fast Fourier Transform (FFT), which describe the source input as a sum of sinusoids by detecting their frequencies (as explained in [1-4]). Because each value describes the presence of a sinusoid with a certain frequency, this representation is known as the frequency domain. Describing a long and complex sound in its entirety as a single sum of sinusoids would require an extremely large number of sinusoids. This would result in code expansion, since the enormous number of sinusoids would be more complex to describe than the original data. To prevent this from happening, sounds are usually broken into blocks, and Fourier transforms are then applied to each block separately. After optimizations are done (for example, prioritizing the frequencies most audible to humans, or applying Huffman encoding), this representation of the sound (in the frequency domain) is encoded into the compressed file. For this encoding and optimization, several decisions have to be made by either the developer of the encoder or the person using the encoder. These decisions vary from the acceptable level of data loss to the size of the blocks that will be encoded using the Fourier algorithm.
This research focuses on the latter example. When this type of low-level decision is made globally (i.e. for the entire sound), any chance of local optimization is lost. Since there are many of these optimization decisions to be made, it is impossible for any encoder to determine the best decision for each part, as well as each combination of parts, of the sound. As an accepted solution, some of the more advanced compressors will do a very basic check to see if the block to be encoded is very complex, and if so divide it up into a number of smaller blocks. For example, MPEG Layer 3 divides a block into three smaller ones if it deems the sound too complex to describe. This is a start at solving the problem, but it is very rigid because of the constants still involved. Every block will still be either the size of a standard block, or one third of that. Making this system more flexible would require a search using a brute force algorithm. Due to the complexity and the large number of possible combinations, which grows exponentially as the wave form becomes larger, a simple brute force is not really feasible in practice, and a more sophisticated method of brute forcing would have to be applied. Genetic algorithms are such a system and will be extensively explained in the next chapter. It should be added that some other optimization settings can be determined by brute force quite easily, for example the number of bits which should be used to encode the FFT values of each block into the final representation. The reason for this is that there are only a small number of options to be tried, and because the number of bits used to encode one block does not influence any other blocks. The latter in particular means that the complexity of the problem does not increase exponentially with the size of the wave form. The benefit of the standard brute force technique over a technique such as a genetic algorithm is that by applying a real brute force technique one always finds the global optimal solution for a problem, since it actually tries all possible solutions. As will be explained in chapter 3, other techniques do not always guarantee this, thus a brute force is the preferred method when the problem is a trivial one. Yet other settings can actually be determined without any search taking place at all, as they can be directly derived from the source input. This is the case with the Fourier transforms, which determine the frequency components directly from the source input. Ideally then, a combination of all these techniques could be used: directly deriving values when possible, brute forcing them when the problem is trivial to brute force, and using genetic algorithms when it is not. This paper describes how such a combination can be implemented and how it performed on a test set of sound files.

2.2 Searching for the optimum
Arbitrarily sized blocks of audio are a considerably complex problem to optimize. The reason for this is that the size of each block influences the rest of the solution. This of course assumes that each sample of the input is only part of one block, because otherwise the same sample would be encoded multiple times. A search must be undertaken in order to find the best combination of settings for all of the parts of a sound. In this preliminary research a combination of a genetic algorithm and brute force will be used on a set of test sounds, attempting to optimize three such settings for each block of audio samples, given a required level of fidelity (which is a fixed value for the whole sound):
1. The number of bits used to encode each FFT value.
2. The number of FFT values required to attain the required fidelity.
3. The size of each encoded audio block.

Let us walk through these parameters one by one. If we know the required fidelity and the size of a block, then parameter 2 depends only on parameter 1. Since parameters 1 and 2 are inversely related, we are actually trying to determine the optimal value for the combination of parameters 1 and 2. Parameter 1 has very few useful values (e.g. 4 - 32 bits). This means it is an easy target for a brute force approach. Given a set of audio samples, each of these values can be used to determine the number of FFT values required (see above), after which the most efficient solution can be selected. It is trivial to simply keep adding sinusoids until the minimum fidelity is attained, or until all FFT values are used. In the latter case the number of bits used was simply insufficient to reach the minimal fidelity (even with all of the values being used), and this setting for parameter 1 can therefore be discarded as invalid.

The third parameter is where finding a solution becomes much more complicated. The previous two parameters were shown to be relatively easy to determine, given parameter 3. But how does one find the optimal solution for parameter 3? The answer to this will be the main focus of this research, and will be explained extensively in later chapters. Needless to say, this is a very simplified audio compression algorithm, but the purpose of this research is not to create a new high performance audio compression algorithm. Rather, the purpose is to deliver a proof of concept showing the application of genetic algorithms to optimize audio compression in ways that are not achieved by other methods. The audio compression used here is therefore a very simple design, which could still prove the following important points:
- the ability of genetic algorithms to create complex optimizations not realistically achievable by other means;
- the possibility to use genetic algorithms in combination with other methods, such as brute force search and feature extraction algorithms like the Fast Fourier Transform.
The fact that the algorithm is not optimized does not take away the validity of these results, as they can easily be applied to a more advanced and more optimized compression algorithm.

3. GENETIC ALGORITHMS
As mentioned before, genetic algorithms are an efficient way to search a solution space for a global optimum. This is done by mimicking Darwin's theory of evolution [5, 6]. The theory of Darwinian evolution states that if individuals in a population have beneficial characteristics, they will have a higher chance of reproducing. As they reproduce, they pass on these characteristics to the next generation by means of their genes. The result is that in the next generation the beneficial properties are more common, because those who had them reproduced more. In nature, offspring is often created by two parents, thereby combining properties of both parents in a single individual. This might lead to an offspring with the best of both parents, which may have a higher fitness (fitness is used here in the Darwinian sense of the word, simply meaning the chance to reproduce) than either of its parents. Another way life evolves is through random mutation, which randomly alters a property of an individual. Usually this is harmful to the individual, but sometimes it leads to a higher fitness, which causes the mutated gene to spread to more and more individuals over future generations.

The Darwinian evolutionary process can be imitated in a computer environment to achieve similar results. This technique is known as genetic algorithms. One difference between nature and genetic algorithms is that instead of finding an individual that is good at surviving, a genetic algorithm tries to find a good solution to a certain problem. A large amount of research has already been done on genetic algorithms, and on similar techniques such as Genetic Programming by John R. Koza [7], and they have been applied in a wide range of different areas [8-12].


3.1 Why use Genetic Algorithms?


In order to answer this question, two similar search techniques will first be described, along with their advantages and disadvantages: brute force and hill climbing. The problem with the brute force method in this case is that it would simply take too long. This approach will always find the global optimum, but it does so by checking every solution, which can take a very long time. In other words, it does a lot of exploration, but no exploitation of any promising results. The hill climbing approach does the opposite. It randomly creates some solutions, and selects the most promising candidate. Then it keeps changing each parameter in the solution until it can no longer find a change that will improve the result, and then it stops. This technique does a lot of exploitation of a promising result, but no exploration of other areas. Hill climbing tends to get stuck in local optima, and has a hard time finding the global optimum. This is illustrated in Figure 1, which has a global optimum at B, but searching this solution space with a hill climbing approach would more likely result in one of the local optima A or C, depending on the first solution it discovered.

Figure 1. Solution space with three optima (local optima at A and C, global optimum at B)

Genetic algorithms balance exploration and exploitation by giving promising solutions a better chance of reproducing, while at the same time keeping and mutating less fit solutions as well. For a more theoretical proof of why genetic algorithms work well on such solution spaces, please refer to [7, 9, 11]. Each of these contains one or more chapters about theorems that explain the efficiency of genetic algorithms, such as the schema theorem and the building block theorem. Repeating and explaining these fairly complex proofs here would be excessive and draw away from the topic of this research. At this point, two things should be clarified about genetic algorithms. Firstly, the term fitness as it is used in this paper describes the performance of a solution, and in the implementation a numeric value will be assigned to a solution that describes how well it performed. This value is known as the solution's fitness value. Secondly, a higher fitness creates a higher chance of reproduction, but it provides no guarantees. Genetic algorithms are by their very nature non-deterministic, so each run of a genetic algorithm will provide a different result. As in nature, performing relatively well is no guarantee of reproductive success. By sheer chance some good genes might still be eliminated from reproduction while some less fit ones are not. This is often beneficial, and steers the genetic algorithm away from the problems that a hill climbing approach has with complex problems.

4. DESIGN
Genetic algorithms all follow a fixed structure, and to create an implementation for a problem some standard parts will need to be designed. In this chapter, first the audio compression used will be described, followed by descriptions of the design of each part of the genetic algorithm.

4.1 Audio Compression


For the compression, an FFT variant known as the Discrete Cosine Transform (DCT) was used. The values X(k) produced by the DCT were scaled to fall in the range -1 ≤ X(k) ≤ 1. By scaling these values to a fixed range, they are more efficient to encode. This particular range was selected because it preserves the sign of each value, and because values in it can easily be scaled to and from integer values of n bits.
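As an illustration of this step, a block DCT with n-bit quantization could be sketched as below. This is a minimal sketch assuming a plain (unnormalized) DCT-II and symmetric scaling to signed integers; the helper names dct(), quantize() and dequantize() are illustrative and not taken from the project's code.

#include <cmath>
#include <cstddef>
#include <vector>

// Plain (unnormalized) DCT-II of one block of samples.
std::vector<double> dct(const std::vector<double>& x) {
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<double> X(N, 0.0);
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t n = 0; n < N; ++n)
            X[k] += x[n] * std::cos(pi / N * (n + 0.5) * k);
    return X;
}

// Map a value in [-1, 1] to a signed integer of 'bits' bits, and back again,
// so that the rounding error introduced by a given bit count can be measured.
long long quantize(double value, int bits) {
    const double scale = std::ldexp(1.0, bits - 1) - 1.0;  // 2^(bits-1) - 1 keeps the sign
    return std::llround(value * scale);
}

double dequantize(long long q, int bits) {
    const double scale = std::ldexp(1.0, bits - 1) - 1.0;
    return static_cast<double>(q) / scale;
}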

4.2 Representation
When using genetic algorithms, often the most difficult thing to design is the representation of a solution. This is also known as the genotype, since this is the structure that gets passed along to the next generation (similar to genes in nature). The actual solution is known as the phenotype, which can be decoded from the genotype (similar to the actual life form in nature). The best possible phenotype is what a genetic algorithm actually looks for and evaluates, but the genotype is what is used to describe and pass on properties of that phenotype. The reason it is so difficult to design a genotype is that it has to fulfill a few rather tricky requirements:
- It has to consistently represent the solution. This means that the meaning of each part of the genotype should not change depending on other parts.
- Similar genotypes should produce similar phenotypes. If a small mutation occurs in the genotype, it should not produce a radically different phenotype. If it did, the genetic algorithm would basically turn into a blind search.
In this case, the representation used was an array of blocks which together spanned the entire wave form, each containing its starting point. The reason each block only has a starting point is that its end point is already determined by the starting point of the next block. Because a single sample is insignificant to the result, the audio input was first divided into chunks of 8 samples to decrease the time needed to run the algorithm.

As was explained in paragraph 2.2, the bit count of each encoded DCT value and the number of DCT values used for each block will not need to be optimized using a genetic algorithm, and therefore will not be part of the genotype. Instead they will be calculated when a block is created or altered, because they are required for determining the fitness of a solution.
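A minimal sketch of this representation, using hypothetical field names (only firstChunk is named explicitly in chapter 5; the derived fields are assumptions):

#include <vector>

// One block of the genotype; only its starting chunk is evolved.
struct Block {
    int firstChunk;    // index of the first 8-sample chunk covered by this block
    int bitsPerValue;  // derived later by optimize(), not part of the evolved genotype
    int valuesUsed;    // derived later by optimize(), not part of the evolved genotype
};

// A genotype: blocks ordered by firstChunk, together spanning the whole wave form.
// Each block ends where the next one starts; the last block ends at the final chunk.
struct Solution {
    std::vector<Block> blocks;
};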


4.3 Fitness Function


The fitness function in a genetic algorithm usually takes a genotype and uses it to evaluate a solution. It generally does not evaluate the genotype directly, but by determining the phenotype represented by the genotype, and evaluating that instead. In this case, the fitness value of each solution was determined by the size of the encoded representation it would result in. This size was determined taking into account all the settings that were optimized when the blocks were constructed, as described in the previous paragraph. The smaller the encoded representation was, the fitter the solution was considered to be. The reason there is no description of the range that the fitness values fell into was because the fitness value was not directly used in this project. The reason for this will be explained in the following paragraph.
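A minimal sketch of such an evaluation, using the Block and Solution structures sketched in paragraph 4.2 and assuming a fixed number of header bits per block to store its size (the exact header layout is an assumption, not taken from the paper):

#include <cstddef>

// (Block and Solution as sketched in paragraph 4.2.)
// Smaller encoded size means a fitter solution.
std::size_t encodedSizeInBits(const Solution& s, std::size_t headerBitsPerBlock) {
    std::size_t total = 0;
    for (const Block& b : s.blocks)
        total += headerBitsPerBlock                    // block size / start marker
               + static_cast<std::size_t>(b.valuesUsed)
               * static_cast<std::size_t>(b.bitsPerValue);  // the encoded DCT values
    return total;
}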

4.4 Selection
Having a fitness value for each solution, a system is needed to decide which solutions get to reproduce for the next generation and which do not. There are many ways to do this, all of them applying some sort of weighted chance to the selection. Some of these techniques have been proven to be more effective than others [10]. For example, the most straightforward approach is to use a system which works like a roulette wheel, where each solution gets allocated a part of the wheel that is directly related to its fitness (see Figure 2). The wheel is then spun, and the solution on which the ball lands gets to reproduce.

Figure 2. Simple roulette wheel system

A downside of this system is that even though two solutions might not differ much in fitness in absolute terms, they might differ quite a lot relatively. Using this simple system the chances of getting selected might be very unbalanced, resulting in the same single solution getting selected many times. In the next generation this will only get worse, since there are then even more instances of that same solution, quickly resulting in this being the only solution in the system. This leaves mutation as the only method of exploration, and is a problem known as premature convergence. It is perhaps best illustrated with an example. Imagine the following population of four solutions. The desired fitness value is 1, but all four solutions perform quite poorly. Their fitness values are as follows:

Solution 1: 0.01
Solution 2: 0.06
Solution 3: 0.02
Solution 4: 0.005

Figure 3. Example data

All of them perform poorly, and even though solution 2 should be slightly favored, they should all be given an almost equal chance at reproduction. The reality is different however, because the roulette wheel would look like the one shown in Figure 4.

Figure 4. Roulette wheel based on example data (slices for solutions 1-4)

As can be seen in the image, the chances of getting selected for any solution other than solution 2 are very low, and the next generation may very well consist only of copies of this same solution. Many other systems have been invented and evaluated [7, 9-13], but will not be further described here. The example of the standard roulette system should illustrate why selection systems are an area of much research and discussion. The system that was used in this research used a combination of elitism, ranking and tournament selection. These concepts will be described in the following paragraphs.

4.4.1 Elitism
Because any selection system in a genetic algorithm is still based on chance, there is always a small risk of losing the best solution that was found. The chance of the best solution being selected is greater than that of other solutions, but this is no guarantee. If it does not reproduce, the best solution in the next generation might turn out to be worse than its predecessor. To prevent this from happening, a copy of the best solution can be automatically selected to go into the next generation. This is known as elitism, because the elite solution of a generation is given special privileges.

4.4.2 Ranking
In order to prevent the skewed chance distribution shown in Figure 4, a system known as ranking is used. This method does not use the fitness value directly to determine the chance of being selected; rather, it sorts all the solutions based on their fitness values, and then assigns a reproduction chance based on their position in the list. This creates a more even and controlled chance distribution, without taking away the advantages of the better solutions [10]. The fact that our system used ranking is the reason why the encoded size of a solution could be used directly to evaluate fitness, as mentioned in paragraph 4.3. The solutions are simply ordered by size before selection takes place.


4.4.3 Tournaments
To give further control over the selective pressure (i.e. the influence of the difference in fitness values) in the system, tournaments were also used. After the solutions were ranked according to their fitness values, two were randomly selected using the ranking system. After this, one of the two is randomly selected for reproduction, with the higher ranked of the two having a chance p, where 0.5 ≤ p < 1. Here p is set by the encoder and could even be dynamically altered at runtime to vary the selective pressure over time. This last option has a lot of advantages if it is used properly, but it was not implemented in our system.
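A minimal sketch of the combined ranking and tournament step, assuming the population has already been sorted by encoded size (rank 0 is the best solution), a simple linear ranking scheme, and the tournament chance p described above; the helper names are illustrative:

#include <algorithm>
#include <cstddef>
#include <cstdlib>

// Pick a rank with a linearly decreasing chance (rank 0 owns the most "tickets").
std::size_t pickByRank(std::size_t populationSize) {
    std::size_t tickets = populationSize * (populationSize + 1) / 2;
    std::size_t draw = static_cast<std::size_t>(std::rand()) % tickets;
    for (std::size_t rank = 0; rank < populationSize; ++rank) {
        std::size_t weight = populationSize - rank;
        if (draw < weight) return rank;
        draw -= weight;
    }
    return populationSize - 1;
}

// Tournament of two ranked picks: the better one wins with chance p (0.5 <= p < 1).
std::size_t selectParent(std::size_t populationSize, double p) {
    std::size_t a = pickByRank(populationSize);
    std::size_t b = pickByRank(populationSize);
    double roll = static_cast<double>(std::rand()) / RAND_MAX;
    return (roll < p) ? std::min(a, b) : std::max(a, b);
}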


4.5 Reproduction
After a solution has been selected, the system randomly decides whether to reproduce it by cloning or by using crossover. These two reproduction methods will be described in the following paragraphs.

4.5.1 Cloning
The most straightforward method of reproduction is cloning. The newly created solution is an exact copy of the original in every way. This is useful for exploitation of promising solutions, but of course it does nothing for exploring new options.

4.5.2 Crossover
An alternative reproduction method consists of using a crossover operator, which was also implemented in this system. A crossover operator combines two solutions to create a new solution, rather than simply cloning an existing one. This could result in a new solution that combines the best properties of both its parents, and has been proven to greatly increase the efficiency of a genetic algorithm [7, 9, 11-14]. To accomplish this, the system first selects a mate for the solution to reproduce with, using the selection system described in paragraph 4.4. After this, a block is randomly selected in the first solution as the cutoff point. The nearest matching block is selected as the cutoff point in the other solution. The two arrays of blocks are cut off at these points and recombined to form two new solutions.
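A minimal sketch of this operator, producing one of the two recombined children; it uses the Block and Solution structures sketched in paragraph 4.2 and assumes that the nearest matching block is the one whose starting chunk lies closest to the chosen cutoff:

#include <cstddef>
#include <cstdlib>
#include <vector>

// (Block and Solution as sketched in paragraph 4.2.)
// Cut both parents at roughly the same position and combine head and tail.
Solution crossover(const Solution& a, const Solution& b) {
    std::size_t cutA = static_cast<std::size_t>(std::rand()) % a.blocks.size();  // random cutoff block in parent a
    int cutChunk = a.blocks[cutA].firstChunk;

    std::size_t cutB = 0;                                    // nearest matching block in parent b
    for (std::size_t i = 1; i < b.blocks.size(); ++i)
        if (std::abs(b.blocks[i].firstChunk - cutChunk) <
            std::abs(b.blocks[cutB].firstChunk - cutChunk))
            cutB = i;

    Solution child;                                          // head of parent a followed by tail of parent b
    child.blocks.assign(a.blocks.begin(), a.blocks.begin() + cutA);
    child.blocks.insert(child.blocks.end(), b.blocks.begin() + cutB, b.blocks.end());
    return child;
}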

4.6 Mutation Operator


An important part of the reason genetic algorithms work so well is the exploration that they do of new solutions. This exploration is in part done by mutating existing solutions to create new ones. For this project, the mutation operator changes the parameters in one or more blocks of a solution. Before a new solution is placed into the next generation it always has a small chance of mutating (almost always: the copy of the best solution kept through elitism is never mutated), no matter whether it was created by cloning or crossover.
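Taken together, the parts designed above fit into a standard generational loop. The following is a minimal sketch, assuming the hypothetical helpers from the earlier sketches and the behaviour of the World class described in chapter 5; it is not the project's actual code:

#include <algorithm>
#include <cstddef>
#include <vector>

// Genotype structures as sketched in paragraph 4.2.
struct Block { int firstChunk, bitsPerValue, valuesUsed; };
struct Solution { std::vector<Block> blocks; };

// Assumed helpers, sketched elsewhere or described in the paper.
double randomValue();                                              // uniform value in [0, 1)
std::size_t selectParent(std::size_t populationSize, double p);    // ranking + tournament
Solution crossover(const Solution& a, const Solution& b);          // recombination
void mutate(Solution& s);                                          // random block mutation
std::size_t encodedSizeInBits(const Solution& s, std::size_t headerBitsPerBlock);

std::vector<Solution> nextGeneration(std::vector<Solution> pop,
                                     double crossoverChance, double mutationChance, double tournamentP) {
    // Rank by compressed size, smallest (fittest) first.
    std::sort(pop.begin(), pop.end(), [](const Solution& x, const Solution& y) {
        return encodedSizeInBits(x, 32) < encodedSizeInBits(y, 32);  // 32 header bits per block assumed
    });

    std::vector<Solution> next;
    next.push_back(pop.front());                                     // elitism: the best is kept unchanged

    while (next.size() < pop.size()) {
        const Solution& parent = pop[selectParent(pop.size(), tournamentP)];
        Solution child = parent;                                     // cloning by default
        if (randomValue() < crossoverChance)                         // sometimes crossover instead
            child = crossover(parent, pop[selectParent(pop.size(), tournamentP)]);
        if (randomValue() < mutationChance)                          // small chance of mutation
            mutate(child);
        next.push_back(child);
    }
    return next;
}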

5. IMPLEMENTATION
The system was implemented in C++ using only the standard libraries. For the random numbers used throughout the program, a randomValue() function was created.

To make decisions based on chance, a shouldI() function was also created.

These two functions are at the heart of the non-deterministic genetic algorithm, and are crucial to the functioning of the system. In the following paragraphs the important elements of the implementation will be described.
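A minimal sketch of two such helpers, assuming they simply wrap the standard library's rand() (the actual implementation may differ):

#include <cstdlib>

// Uniformly distributed value in [0, 1).
double randomValue() {
    return static_cast<double>(std::rand()) / (static_cast<double>(RAND_MAX) + 1.0);
}

// True with the given probability, e.g. shouldI(0.3) for a 30% chance.
bool shouldI(double probability) {
    return randomValue() < probability;
}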

5.1 WAVE Files
The audio files were read in by a custom-made class, which can read each of the channels of a 44100 Hz WAVE file into an array of double precision floating point values. The values were scaled to fall within the range of -1.0 and +1.0, making them easier to handle in the rest of the system. The system selected only the first audio channel to work with, since it was designed to optimize only a single array of samples.

5.2 User Interface
A simple front end was created with a menu that allows the user to browse for and open WAVE files, and run an experiment. When a file is opened, the WAVE file is parsed as described above and the data is used to create a new instance of the World class. Each time the run button is pressed, the World is initialized and the experiment is run for a given number of generations, which is passed as a parameter to the World.

5.3 World
Internally there is a central class called World which takes care of most of the organization. When it is created, it receives the scaled audio data and converts it into chunks of 8 samples each. When the init() method of the World is called, it uses this array of chunks to create the array of random solutions that form the first generation. The World class includes a run() function which executes the genetic algorithm for a number of generations, as determined by a parameter. At the start of each generation it tells each solution to optimize its blocks for the array of audio data, in order to determine its true compressed size. It then applies elitism by placing the best solution directly into the next generation. It then goes through the selection system and reproduction functions to create the rest of the new generation, deciding when to cross over and mutate new solutions. At the end of each generation it discards the current generation and continues with the next. When the function has finished it returns the most successful solution discovered.

5.4 Solutions
The Solution class contains an array of the blocks it uses. When a solution is created it takes the number of chunks in the audio input and divides them all into randomly sized blocks. It does this as follows:
1. A split point is randomly selected, where a new border is introduced.
2. For every two borders that are more than a fixed upper limit of block size apart, step 1 is applied. The upper limit for block size was 256 chunks in this system.
3. After every two borders are less than the maximum size apart, the actual Block instances are created using these newly defined borders.
Solutions have a mutate() function which randomly selects blocks from the vector to mutate. Blocks that are no longer useful due to a mutation are immediately removed, and when a block grows above the maximum size, a new block is introduced. The Solution class also contains an optimize() function that calls optimize() on each of the blocks it contains that have changed since the last call.
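A minimal sketch of this initial random splitting, using the Block and Solution structures sketched in paragraph 4.2; the helper names randomSolution() and split() are illustrative:

#include <algorithm>
#include <cstdlib>
#include <vector>

// (Block and Solution as sketched in paragraph 4.2.)
const int kMaxBlockChunks = 256;   // upper limit on block size, in 8-sample chunks

// Keep introducing borders until no two neighbouring borders are too far apart.
void split(std::vector<int>& borders, int first, int last) {
    if (last - first <= kMaxBlockChunks) return;
    int border = first + 1 + std::rand() % (last - first - 1);   // random split point
    borders.push_back(border);
    split(borders, first, border);
    split(borders, border, last);
}

// Divide 'chunkCount' chunks into randomly sized blocks for a new solution.
Solution randomSolution(int chunkCount) {
    std::vector<int> borders{0};
    split(borders, 0, chunkCount);
    std::sort(borders.begin(), borders.end());

    Solution s;
    for (int b : borders)
        s.blocks.push_back(Block{b, 0, 0});   // bit count and value count are filled in later by optimize()
    return s;
}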

5.5 Blocks
A Block structure contains a firstChunk variable that determines where the block starts in the audio. It is initialized when it is created by the Solution to which it belongs (see previous paragraph).

The other values stored in a block are the number of bits used to describe each DCT value and the number of DCT values used to achieve the desired fidelity. These do not initially contain a value, but are instead filled in when the optimize() function is called. The optimize() function implements the core functionality of the brute force technique and the calculation of the DCT values, and is therefore crucial to the system. It takes as parameters the scaled audio input and the start of the next Block. It uses this information to determine the audio samples it is to be optimized for, and stores that range in a new array, while using the DCT() function to get the DCT values that describe that audio. After this is done, the brute force optimization technique works as follows:
1. It selects one of the possible bit counts to describe each DCT sample, from an array of predefined options. The following 16 options were determined experimentally to be the most useful: {4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 24, 32, 64}.
2. Each DCT value is converted to this bit count, and then restored to a double precision float, in order to see how far it has been rounded off.
3. It then uses these slightly inaccurate DCT values to see how many of them it would need to attain the required level of fidelity.
4. Knowing how many values it took, and how many bits each value takes, it determines what the compressed size for this bit count would be.
5. When it has tried all of the possible bit counts, it selects the one that resulted in the smallest compressed size.
The required fidelity was set to be a maximum of 5% loss of information. For a rounded DCT value, any difference from its original was considered to be lost information. To determine how many DCT values were needed, the system kept adding rounded DCT values until the remaining lost information dropped below the required 5% threshold. If it used up all the rounded DCT values and still had too much lost information, the option of using this bit count would be discarded.
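A minimal sketch of this brute force loop, using the quantize()/dequantize() helpers sketched in paragraph 4.1 and the Block structure from paragraph 4.2. The measure of lost information used below (the absolute difference between each original and rounded value, relative to the total absolute signal) is an assumption and may differ from the exact definition used in the project:

#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// (Block as sketched in paragraph 4.2; quantize()/dequantize() as sketched in paragraph 4.1.)
long long quantize(double value, int bits);
double dequantize(long long q, int bits);

const int kBitOptions[16] = {4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 24, 32, 64};

// Brute force the bit count for one block of scaled DCT values, allowing at most 5% lost information.
void optimizeBlock(const std::vector<double>& dctValues, Block& block) {
    double totalInfo = 0.0;
    for (double v : dctValues) totalInfo += std::fabs(v);

    std::size_t bestSize = std::numeric_limits<std::size_t>::max();
    for (int bits : kBitOptions) {
        double lost = totalInfo;                       // nothing encoded yet, so everything counts as lost
        std::size_t used = 0;
        for (double v : dctValues) {
            double rounded = dequantize(quantize(v, bits), bits);
            lost -= std::fabs(v);                      // this value is now encoded...
            lost += std::fabs(v - rounded);            // ...except for its rounding error
            ++used;
            if (lost <= 0.05 * totalInfo) break;       // required fidelity reached
        }
        if (lost > 0.05 * totalInfo) continue;         // this bit count cannot reach the fidelity: discard it

        std::size_t size = used * static_cast<std::size_t>(bits);
        if (size < bestSize) {                         // keep the cheapest valid bit count
            bestSize = size;
            block.bitsPerValue = bits;
            block.valuesUsed = static_cast<int>(used);
        }
    }
}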

6. EXPERIMENT
The prototype described in the previous chapter was evaluated using a set of three sound files. The sound files were taken from songs that were copied directly from CD, to avoid information already having been lost to other lossy compression methods such as MP3. A representative audio segment of 5 seconds was taken from each song. This length was used because it allowed for quick experiments on this unoptimized system while still being representative in size, being a total of 220500 samples long. The songs were selected to be of extremely different genres and styles, in order to find results that are widely applicable. The first song used was Beethoven's 5th symphony. This is a famous classical work which uses a lot of long smooth tones and bass. The second song used was Voodoo People by The Prodigy. This is a techno song and was selected because of its fast beats and quickly alternating frequencies. Lastly, a rap song by LL Cool J called Phenomenon was used. It features smooth steady rhythms and beats, and unlike the other two it has a lot of vocals.

The sound files were each compressed using fixed block sizes for the entire sound file. The block size used was the same as the standard MP3 encoding size, which is 384 samples or 48 chunks [1-3]. This technique had the advantage of not having to describe the size of each block in its compressed version, at the price of only having a single fixed block size for the entire sound file. This method will from this point on be referred to as the fixed system. The sound files were then compressed individually using dynamically sized blocks with the genetic algorithm version described in this paper. This technique had the advantage of optimizing the block size locally, at the price of having to put the size of each encoded block in the compressed representation. This method will from this point on be referred to as the dynamic system. During the execution of the test using the genetic algorithm, the following settings were used:
- Population size: 20
- Number of generations: 250
- Chance of mutation: 30%
- Chance of crossover: 75%
- Chance of the fittest solution winning the selection tournament: 65%
After the experiments were done, the results for each sample were compared. This will be described in the next chapter.

7. RESULTS
In this chapter the results will be shown for each sound file in the form of a graph with two lines. The horizontal blue line represents the constant size resulting from using the fixed system. The changing red line shows the size of the best solution of each generation, using the dynamic system. On the vertical axis is the number of bits in the compressed representation. The size of the original uncompressed sound data for each sound file was left out of the graph, because each sound snippet was 220500 x 16 = 3528000 bits in size, which is so large compared to the compressed representations that it would be futile to put it in the same graph.

Figure 5. Beethoven's 5th symphony (compressed size in thousands of bits per generation)

It may appear odd that in the graphs, the dynamic system immediately and consistently performs better than the fixed system throughout the testing. This phenomenon can be explained by the fact that a number of random solutions were generated, and only the very best one is shown. Averaged over all solutions, the first generation performed worse, because it is random. The real benefit, however, is that the dynamic system evolves over time, and continually improves its compression. After the number of generations used in this experiment, the results are already significantly different. This evolution is best illustrated by the graph of the Beethoven segment, where the dynamic system only takes up two thirds of the number of bits the fixed system needs. This trend results from the fact that the audio segment in question contains long melodious sound, which is more efficiently compressed by using larger blocks. The dynamic system adjusts to meet this demand, while the fixed system does not.

Figure 6. The Prodigy - Voodoo People (compressed size in thousands of bits per generation)

The Prodigy audio segment was also found to require a substantially larger description due to its complexity. The compressed size is almost three times larger than that of the Beethoven snippet, despite the fact that they have the same number of samples. Even on a snippet of this high complexity, the dynamic system created a smaller resulting representation.

Figure 7. LL Cool J - Phenomenon (compressed size in thousands of bits per generation)

To summarize, the dynamic system outperformed the fixed system in all experiments, and was still steadily decreasing the compressed size at the end of each experiment.

8. CONCLUSIONS
The experiments demonstrated that a lossy audio compression technique that uses dynamically sized blocks can outperform a system that uses fixed size blocks, despite the extra cost of having to store the size of each block. By using genetic algorithms to actually find efficient solutions to such a complex optimization problem, it has also been proven that genetic algorithms can be a useful tool for creating complex optimizations that are difficult or even impossible to achieve by other means. Lastly, by combining this approach with brute forcing the bit count used for each DCT value, as well as deriving the number of such DCT values stored, it has also been proven that genetic algorithms can be combined well with other established compression and optimization techniques. The ability to use genetic algorithms for optimization gives compressors a chance to optimize areas that cannot be directly derived, and where brute force seems unfeasible in a reasonable amount of time. It can be added to existing compression techniques, creating compression ratios that could not have been achieved without it.

9. DISCUSSION
In the experiments the fixed system was shown to perform relatively poorly on the segment from Beethoven's 5th symphony, presumably because it used blocks that were inefficiently small for the audio it was compressing. A question arises from this that may prove to be relevant to this research. Optimizing the fixed block size of the entire audio file would be feasible to do using a brute force method, simply trying out each block size. Using this method, the discovered optimal block size would only have to be stored once in the header of the compressed file, since it does not vary per block. In the case of the Beethoven segment, this block size would presumably be large, while it would be small for the Prodigy snippet. This is an interesting alternative technique that might be explored in future work to improve the performance of the fixed system. I am confident, however, based on the steady decline in file size that the dynamic system was still producing at the end of the experiments, that it would outperform the fixed system in the end. Of course, we will not know for sure until this has been implemented and tested. Another criticism of the system is the time it takes to find solutions. One of the problems during the experiments was the extremely long time it took to come up with solutions for a full length song. The original plan was to simply use an entire song at once, but in the end representative 5 second snippets were used. They were sufficient for the experiment, but for encoding full songs this still poses a problem. This problem, however, was not caused by the genetic algorithms, but by the rather inefficient implementation of the DCT function. Every time a block was to be optimized, the DCT function was called upon to perform a Fourier analysis of the samples, which accounted for 75% of the CPU time used by the system. This therefore does not affect the main point of the paper, but it is an important side note concerning the performance of the compression system described here.

ACKNOWLEDGMENTS
I would like to thank Mannes Poel for taking the time to guide me through this project, and helping me keep my goals realistic and achievable. I would also like to thank Betsy van Dijk for enabling me to take on this project despite the planning difficulties.

REFERENCES
[1] K. Sayood, Introduction to Data Compression, Third Edition. Morgan Kaufmann Publishers Inc., 2005.
[2] D. Salomon, Data Compression: The Complete Reference. New York: Springer-Verlag, 1998.
[3] D. Salomon, A Guide to Data Compression Methods. New York: Springer-Verlag, 2002.
[4] J. F. James, A Student's Guide to Fourier Transforms: With Applications in Physics and Engineering. Cambridge, UK: Cambridge University Press, 2003.
[5] C. Zimmer, Evolution: The Triumph of an Idea. New York: HarperCollins Publishers, 2001.
[6] C. Darwin, On the Origin of Species by Means of Natural Selection. Cambridge, MA: Harvard Univ. Press, 1859.
[7] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
[8] L. Vences and I. Rudomin, "Fractal Compression of Single Images and Image Sequences using Genetic Algorithms," 1994.
[9] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs (3rd ed.). New York: Springer-Verlag, 1996.
[10] D. Beasley, D. R. Bull, and R. R. Martin, "An Overview of Genetic Algorithms, Part I: Fundamentals," University Computing, vol. 14, p. 11, 1993.
[11] W. B. Langdon and R. Poli, Foundations of Genetic Programming. Springer-Verlag, 2002.
[12] R. L. Haupt and S. E. Haupt, Practical Genetic Algorithms. John Wiley & Sons, Inc., 1998.
[13] S. R. Ladd, Genetic Algorithms in C++. New York: M & T Books, 1996.
[14] E. Vonk, L. C. Jain, and R. P. Johnson, Automatic Generation of Neural Network Architecture Using Evolutionary Computation. World Scientific Publishing Co., Inc., 1997.
