You are on page 1of 5

IMPLEMENTATION OF ANT SYSTEM FOR DNA SEQUENCE OPTIMIZATION

Tri Basuki Kurniawan, Zuwairie Ibrahim, Muhammad Faiz Mohamed Saaid, and Azli Yahya
Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310 UTM Skudai,
Johor Darul Takzim, Malaysia.

Email: tribasukikurniawan@yahoo.com; zuwairiee@fke.utm.my; ninjagaban@yahoo.com;


azli@fke.utm.my

Abstract

In DNA based computation, the design of good DNA sequences has turned out to be a
fundamental problem and one of the most practical and important research topics. Although the
design of DNA sequences is dependent on the protocol of biological experiments, it is highly
required to establish a method for the systematic design of DNA sequences, which could be
applied to various design constraints. Much works have focused on designing the DNA sequences
to obtain a set of good DNA sequences. In this paper, Ant System (AS) is proposed to solve the
DNA sequence optimization problem. AS, which is the first approach proposed in Ant Colony
Optimization (ACO), uses some ants to search the solutions based on the pheromone information.
A model is adapted, which consists of four nodes representing four DNA bases. The results of the
proposed approach are compared with other methods, such as evolutionary algorithm.

Keyword: Ant Colony Optimization, Ant System, DNA sequence optimization.

1.0 Introduction

The ability of a DNA computer to perform calculations using specific biochemical


reactions between different DNA strands by Watson–Crick complementary base pairing, affords a
number of useful properties such as massive parallelism and huge memory capacity [1]. However,
due to the technological difficulty, the in vitro reactions may result in incorrect computation.
Sometimes, DNA computer fails to generate identical results for the same problem and using the
same algorithm. Furthermore, some DNAs could be wasted because of the undesirable reactions.
To overcome these drawbacks, much works have focused on the design of DNA sequences to
reduce the possibility for illegal reactions [2].
In these approaches, sequences are designed such that each element uniquely hybridizes to
its complementary sequence, but not to any other sequence. Due to the differences in experimental
requirements, however, it seems impossible to establish an all-purpose library of sequences that
effectively caters to the requirements of all laboratory experiments [1]. Although the design of
DNA sequences is dependent on the protocol of biological experiments, it is highly required to
establish a method for the systematic design of DNA sequences, which could be applied to various
design constraints [3].
Various kinds of methods and strategies for DNA sequence optimization have been
proposed to date, such as exhaustive search [4], template-map strategy [5], graph method [6],
stochastic methods [7], and nearest-neighbour thermodynamic [8]. An Ant Colony Optimization
(ACO) approach for DNA sequence design has been previously proposed [9], which used
thermodynamic values as heuristic information. However, in DNA sequence design, since there is
actually no information could be used as heuristic information, in this study, DNA sequences are
designed based on Ant System (AS) without any heuristic information.
2.0 The DNA sequence optimization

The objective of the DNA sequence optimization problem is basically to obtain a set of
DNA sequences, where each sequence is unique or cannot be hybridized with other sequences in
the set. In this paper, the objective functions and constraints from [9] are used. Two objective
functions, namely Hmeasure and similarity, are chosen to estimate the uniqueness of each DNA
sequence. Moreover, two additional objective functions, which are hairpin and continuity, are used
in order to prevent secondary structure of a DNA sequence. Furthermore, two constraints, which
are GC% and melting temperature, are used to keep uniform chemical characteristics.
DNA sequence optimization is actually a multi-objective optimization problem. However,
the problem is converted into single-objective problem, which can be formulated as follows:

min f DNA = ∑ ωi f i (1)


i

subjected to Tm and GCcontent constraints, where fi is the objective function for each i ∈ {Hmeasure,
similarity, hairpin, continuity}, and ωi is the weight for each fi. In this study, the weights are
defined by the user.

3.0 Ant System

ACO is a population-based meta-heuristic approach for combinatorial optimization


problems. It is inspired by the ability of ants to find the shortest path from their nest to the food
source. Marco Dorigo has firstly introduced AS in his PhD thesis [10] and applied it to the
travelling salesman problem (TSP). Since then, it has been applied in many applications such as
the quadratic assignment problem [11] and RNA secondary structure prediction [12].
The main processes in AS are the construction of a solution and the pheromone updating by
all ants at each iteration. To construct a solution, ants select the following city to be visited through
a stochastic mechanism. When ant k is in city r and has so far constructed a partial solution, the
probability of going to city s is given by:

 [τ (r , s )]α [η ( r , s)]β
 α β
if s ∈ J k (r )
pk (r , s) =  ∑ [τ (r , u )] [η (r , u )] (2)
 u∈ J k ( r )
0 otherwise

where Jk(r) is a set of feasible components; that is, edges (r, s) where s is a city not yet visited by
the ant k. Eq. (2) also includes other edges (r, u), where u refers to all cities not yet visited by the
ant k. The parameters α and β control the relative importance of the pheromone versus the heuristic
information.
The pheromone τ(r, s) is associated with the edge joining cities i and j, and the pheromone
updating is done as follows:

τ (r , s) = (1 − ρ ) ⋅ τ (r , s) + ∆τ (r , s) (3)

where ρ is the evaporation rate and ∆τ (r , s ) is the quantity of pheromone laid on edge (r, s) by ant
k:
 1 if (r , s ) ∈ sequence done by ant k (4)
∆τ ( r , s ) =  Q
0 otherwise

where Q is the sum of all objectives.


A suitable model is proposed to solve the DNA sequence design based on ACO. The model
is similar to a finite state machine, which consists of four nodes. Individual node representing
DNA bases, which is A, C, G, or T, as shown in Fig 1.
As illustrated in Fig 1, if an ant is placed (randomly) at node A (Fig 1a), and the ant moves
to node T (Fig 1b), hence the path constructed by the ant can be translated into ‘AT’ sequence of
DNA. Next, when the ant moves from node T to node C (Fig 1c), the DNA sequence of ‘ATC’ is
constructed. The tours of the ants continue until the number of required sequences is produced.

The AS algorithm for DNA sequence optimization can be summarized in Algorithm 1;


Algorithm 1. Ant System for DNA Sequence Optimization
// --- Initialization step
Initialize parameter t, α, ρ, nk, N, and all DNA parameters, such as hcon, hdis, scon, sdis;
Calculate τo;
For each link(i, j) do
τ (i, j) = τo; // --- Pheromone initialize
end
t= 0; // --- initialize no of iteration. // --- Main Process
Repeat
Repeat
Place all ants randomly, k =1, …, nk; // --- (nk = number of ants)
For each ant k =1, …, nk do
Repeat // --- State Transition Rule
Each ant applies a state transition rule (Eq. 2) to incrementally build a DNA sequence;
Until all ants have build a complete a DNA sequence; // --- (no. of tours)
Next // --- Archive update
If no. of DNA sequences in archive are equal with the number of ants (full) Then
Calculate all DNA sequences in archive bases on four objective functions and then sort the
sequence in descending order; // --- N-first(s) DNA sequences in archive
N-DNA sequences which have higher value in archive is removed;
EndIf // --- Storing the archive process
For all DNA sequences, compute four objective functions and then sort the sequence in ascending
order;
For each DNA sequence do
Check GCcontent and Tm constraints;
If pass and archive is not full Then store DNA sequence to archive;
Next // --- Global Pheromone Updating for all DNA sequences produced by ants;
A global pheromone updating rule (Eq. 3) is applied;
Until number of DNA sequences in archive are equal to the number of ants;
t = t + 1;
Until End_Condition // --- (maximum no. of iteration is reached)

During the initialization step, all parameters, such as α, ρ, and nk were set using default
parameters for ACO [13], α was set to 1 and ρ was set to 0.5. Number of mutation (N) was set as
half of number of ants and 500 iterations were used. The set of sequences consists of 7 sequences
(nk) of 20-mer length.
The DNA parameters are initialized with values in [14]. Hmeasure and similarity employed
continuous parameters (hcon and scon) equal to six bases and discontinuous parameters (hdis and sdis)
Fig 1. Finite state machine as a model for constructing a DNA sequence.

is 17%. For continuity, the threshold value was 2. Hairpin formation was set at least six base-
pairings and a six base loop. The melting temperature was computed base on the nearest-neighbour
(NN) method with 1 M salt concentration and 10 nM nucleic acid concentration.
In this paper, the multi-objective optimization problem is converted into single-objective
problem. Since it is difficult to find the proper weight values for every objective [15], the weight in
Eq. (1) were set equal to 1.

4.0 Results and Discussion

Based on the proposed model and algorithm, 100 independent runs of the AS approach for
DNA sequence optimization were executed and the average values are calculated. Comparisons
are made with the previously proposed ACS approach in [9] and MOEA method in [14] as shown
in Table 1 and Fig. 2.
Our sequences show lower Hmeasure value but a bit higher similarity value. For sequences
generated by ACS, no continuity is observed, whereas the continuity value of sequences generated
by the proposed approach and MOEA are 0.1 and 1.3, respectively. The total objective values
show that the result obtained is the best compared to others.

5.0 Conclusion

Ant System has been implemented without heuristic information for DNA sequence
optimization with four objective functions: Hmeasure, similarity, continuity, and hairpin and two
constraints: GCcontent and melting temperature. The DNA sequences obtained from the proposed
approach were compared with sequences designed by ACS and MOEA approach. The results
show that AS can generate relatively better in total objective values than MOEA approach.

6.0 Acknowledgement

This research is supported financially by the Ministry of Science, Technology, and


Innovation (MOSTI), Malaysia under eScienceFund Research Funding (Vot 79034) and Universiti
Teknologi Malaysia under Initial Research Grant Scheme (IRGS) (Vot 78285).

7.0 References

[1] L.M. Adleman, Molecular computation of solutions to combinatorial problem, Science, 266, 1994.
[2] A. Brenneman and A. Condon, Strand design for biomolecular computation, Theory: Comput. Sci, 287,
2002, 39–58.
[3] S. Kashiwamura, A. Kameda, M. Yamamoto, A. Ohuchi, Two-step search for DNA sequence design,
Proceedings of the ITC-CSCC 2003, 2003, 1815–1818.
Table 1. The comparison results of AS approach, ACS [9], and MOEA [14].
The average value of objective for 100 time running of AS approach
C Hr Hm Sm Total
Average 0.1 0.5 36.8 57.9 95.1
Standard Deviation 0.3 0.8 8.5 8.8 1.7
The DNA sequences taken from [9]
Average 0.0 0.0 54.1 51.9 106.0
Standard Deviation 0.0 0.0 12.4 7.8 10.5
The DNA sequences taken from [14]
Average 1.3 0.0 52.0 52.4 107.1
Standard Deviation 3.4 0.0 8.3 6.7 5.5
Remark: C, Hr, Hm, and Sm denote continuity, hairpin, Hmeasure, and similarity objectives values, respectively

Fig 2. Comparison of average fitness of AS, ACS [9], and MOEA [14]. Here, the number of
sequences = 7 and length = 20.

[4] A.J. Hartemink, D.K. Gifford, and J. Khodor, Automated constraint based nucleotide sequence
selection for DNA computation, Proceedings of the 4th DIMACS Workshop DNA Based Computers,
1998, 227–235.
[5] A.G. Frutos, A.J. Thiel, A.E. Condon, L.M. Smith, and R.M. Corn, DNA computing at surfaces: four
base mismatch word design, Proceedings of the 3rd DIMACS Workshop DNA Based Computers, 1997.
[6] U. Feldkamp, S. Saghafi, W. Banzhaf, and H. Rauhe, DNA sequence generator - A program for the
construction of DNA sequences, Proceedings of the 7th International Workshop DNA Based
Computers, 2001, 179–188.
[7] F. Tanaka, M. Nakatsugawa, M. Yamamoto, T. Shiba, and A. Ohuchi, Developing support system for
sequence design in DNA computing, Proceedings of the 7th International Workshop DNA Based
Computers, 2001, 340–349.
[8] F. Tanaka, A. Kameda, M. Yamamoto, and A. Ohuchi, Thermodynamic parameter based on nearest-
neighbor model for DNA sequences with a single-bulge loop, Biochemistry, 43 (22), 2004, 7143–7150.
[9] T.B. Kurniawan, N.K. Khalid, Z. Ibrahim, M. Khalid, and M. Middendorf, An ant colony system for
DNA sequence design based on thermodynamics, Proceedings of IASTED-ACST, 2008, 144-149.
[10] M. Dorigo, Optimization, learning, and natural algorithms, PhD Thesis, Politechnico di Milano, Italy,
1992.
[11] V. Maniezzo, A. Colorni, and M. Dorigo, The ant system applied to the quadratic assignment problem,
Universite Libre de Bruxelles, Belgium, Tech. Rep. IRIDIA/94-28, 1994.
[12] N. McMcMellan, RNA secondary structure prediction using ant colony optimisation, Master Thesis,
School of Informatics, University of Edinburgh, 2006.
[13] M. Dorigo and T. Stutzle, Ant Colony Optimization, Massachusets Institute of Technology, 2004.
[14] S.Y. Shin, I.H. Lee, D. Kim, and B.T. Zhang, Multiobjective evolutionary optimization of DNA
sequences for reliable DNA computing, IEEE Transaction on Evolutionary Computation, 9 (2), 2005,
143-158.
[15] Y. Jin, M. Olhofer, and B. Sendhoff, Dynamic weighted aggregation for evolutionary multi-objective
optimization: How does it work and how?, In Proceeding of GECCO Conference, 2001.

You might also like