
Journal of Chemical Engineering of Japan, Vol. 50, No. 4, pp. 273–290, 2017 (Research Paper)

A Fast Converging and Consistent Teaching-Learning-Self-Study Algorithm for Optimization: A Case Study of Tuning of LSSVM Parameters for the Prediction of NOx Emissions from a Tangentially Fired Pulverized Coal Boiler

Faisal Ahmed(1,2), Jin-Kuk Kim(1), Asad Ullah Khan(2), Ho Young Park(3) and Yeong Koo Yeo(1)

1 Department of Chemical Engineering, Hanyang University, 222 Wangsibriro, Sungdong-gu, Seoul 04763, Korea
2 Department of Chemical Engineering, COMSATS Institute of Information Technology, Lahore, Pakistan
3 KEPRI, 105 Munji-Ro, Yuseong-Gu, Daejeon 305-760, Republic of Korea

Keywords: Least Squares Support Vector Machines, NOx Prediction, Teaching-Learning-Based-Optimization, Ameliorated Teaching-Learning-Based-Optimization, Teaching-Learning-Self-Study-Optimization

This paper presents a novel Teaching-Learning-Self-Study-Optimization (TLSO) algorithm which not only converges quickly in terms of the number of iterations, but is also more consistent than several other algorithms in converging, with high accuracy, to the global minimum. The original Teaching-Learning-Based Optimization (TLBO) assigns a uniformly distributed, randomly selected weight to the amount of knowledge gained by a learner in each phase, i.e., the teacher phase and the learner phase. This randomly selected weight causes the algorithm to converge the average cost of the learners only in a moderate number of iterations. In 2013, Li and his coworkers intensified the teacher and learner phases by introducing weight parameters to improve the convergence speed in terms of iterations and named the result Ameliorated Teaching-Learning-Based Optimization (ATLBO). A good evolutionary optimization algorithm should converge the cost of the objective function consistently. To do so, it must combine intensification for local search with diversification for global search in order to reduce the chance of becoming trapped in a local minimum. Some students naturally tend to study by themselves, using the library and internet academic resources, to enhance their knowledge. This phenomenon, termed self-study, is introduced into the learner phase of the proposed TLSO as a diversification factor (DF). Several other evolutionary algorithms, namely ACO, PSO, TLBO, ATLBO and two variants of TLSO, were also implemented and compared with TLSO in terms of consistency in converging to the global minimum. The results reveal that TLSO is consistent for a larger number of the 20 benchmark functions than the other algorithms and also for the NOx prediction application. The results also show that the NOx emissions predicted by the LSSVM tuned with TLSO are comparable with those obtained with the other algorithms considered in this work.

Received on January 12, 2016; accepted on October 6, 2016. DOI: 10.1252/jcej.16we002. Correspondence concerning this article should be addressed to Y. K. Yeo (E-mail address: ykyeo@hanyang.ac.kr).

Introduction

NOx emissions from industrial stacks such as coal-fired utility boilers have several adverse and harmful effects on the environment, e.g., photochemical smog, depletion of the ozonosphere, acid rain and enhancement of the greenhouse effect (Lv et al., 2013). Therefore, environmental regulation authorities have imposed strict regulations on coal-fired power plants and other industries to reduce NOx emissions. The increase of coal consumption in power plants worldwide and stringent environmental regulations on NOx emissions have attracted the attention of researchers in the last decade. With this increasing interest in environmental protection, easy-to-use and more compact simulation models and monitoring systems are required to help power plant operators control NOx emissions.

The reason for developing such simulation models is twofold. (1) Combustion optimization has proved to be an effective and economic technique to reduce NOx emissions (Li et al., 2013). For this, adjustable process variables or operational parameters such as primary and secondary air flow bias and coal feed flow rate are carefully optimized to realize low NOx emissions in coal-fired utility boilers. However, before such optimization can be carried out, a precise model is essential to relate the various adjustable process variables to NOx emissions (Wei et al., 2013). (2) Hardware-based continuous emission monitoring systems (CEMSs) are commonly used to estimate and monitor NOx emissions. However, their installation and maintenance costs are relatively high, and owing to the severe environment they are often taken off-line for maintenance. In such circumstances a redundant facility is crucial, and this role can be fulfilled by a simulation model (Zhou et al., 2012).

During the last decades, several models have been proposed to predict and reduce NOx emissions. Generally, they can be categorized into three major categories, i.e., computational fluid dynamics (CFD) models (Fan et al., 2001; Khoshhal et al., 2010; Baek et al., 2014), statistical heuristic models (Adali et al., 1999; Ligang et al., 2010; Li et al., 2013; Ahmed et al., 2015) and gray-box models (Pearson and Pottmann, 2000; Li et al., 2004). Because of the time, effort and complexity involved in developing CFD models, statistical heuristic models based on historical process data are used quite frequently in the modeling of NOx emissions (Ahmed et al., 2009).

Among the statistical heuristic methods, LSSVM has gained much attention in recent years. LSSVM has several advantages over other heuristic methods such as partial least squares (PLS) and the multilayer perceptron (MLP): it has better generalization ability and a global optimization property. In addition, an MLP (for instance, a back-propagation neural network, BPNN) may become trapped in a local minimum (Vong et al., 2006), and LSSVM has fewer parameters than a BPNN. Moreover, LSSVM utilizes structural risk minimization by applying an upper bound on the expected risk, while a BPNN makes use of traditional empirical risk minimization (Frias-Martinez et al., 2006). However, the capability of LSSVM is highly dependent on the selection of its two hyper-parameters, i.e., the generalization parameter C and the width parameter σ of the RBF kernel function. These two parameters are crucial for appropriate model building and should be tuned properly. Several tuning methods have been applied to support vector regression (SVR) and LSSVM to optimize the hyper-parameters, including simulated annealing (SA) (Ye, 2007), grid search (GS) (Zhou et al., 2012) and population-based swarm intelligence such as the genetic algorithm (GA) (Wang and Fu, 2014), ant colony optimization (ACO) (Zhou et al., 2012), particle swarm optimization (PSO) (Hu et al., 2009; Chamkalani et al., 2014), teaching-learning-based optimization (TLBO) (Li et al., 2013) and ameliorated teaching-learning-based optimization (ATLBO) (Li et al., 2013).

Though TLBO has been widely applied to various industrial applications for building models tuned for the prediction of relevant quality variables (Rao et al., 2011; Degertekin and Hayalioglu, 2013; Gang et al., 2014), to the best of the authors' knowledge only a few studies have hitherto utilized an LSSVM tuned with a teaching-learning-based algorithm for NOx emissions modeling (Li et al., 2013).

In this work, we propose the new concept of self-study in the teaching and learning environment to incorporate diversification into the teaching-learning-based algorithm used to tune the hyper-parameters of LSSVM for the prediction of NOx emissions from a tangentially fired pulverized coal boiler. The proposed TLSO keeps the inertia weight and acceleration coefficient that Li and his coworkers (Li et al., 2013) introduced into the teacher phase of TLBO, but also introduces the self-study concept in the learner phase to keep a trade-off between intensification and diversification. This balance between intensification and diversification gives TLSO its characteristic consistency in converging to a global minimum over several independent runs.

Non-population-based algorithms such as simulated annealing (SA) and Tabu search (TS) execute a single-starting-point search, and their solutions are highly dependent on the initial conditions, whereas population-based algorithms normally start the search from multiple points and their performance is little influenced by the initial solution. On the other hand, among population-based metaheuristics, a genetic algorithm may destroy previous knowledge during offspring production and hence converges very slowly (Kao et al., 2012). Therefore, for comparison purposes, population-based metaheuristics that retain the knowledge of good solutions have been adopted, namely ACO, PSO, TLBO, ATLBO and two variants of TLSO (described in a later section).

Prior to applying the proposed TLSO to tune the hyper-parameters of LSSVM for the NOx prediction application, an experiment was carried out on a balanced combination of 20 separable, non-separable, unimodal and multimodal benchmark functions to validate the superiority of the proposed TLSO in terms of consistency in converging to the global minimum over several executions or runs of the algorithms considered. The comparison was made among the population-based, good-history-retaining algorithms ACO, PSO, TLBO, ATLBO, the two variants of TLSO and the proposed TLSO.

Eventually, a model was built using LSSVM tuned with the proposed TLSO to predict the NOx emissions from a tangentially fired pulverized coal boiler in a 500 MW power plant. The prediction results based on test data show that the LSSVM tuned with the proposed TLSO exhibits good regression accuracy and generalization ability.

1. Literature Review

1.1 Least squares support vector machine (LSSVM)

Suykens and Vandewalle (1999) proposed an extended version of support vector machines (SVM) called Least Squares Support Vector Machines (LSSVM). It converts the quadratic programming problem of SVM into a system of linear equations.

For training data containing N data points {x_i, y_i} (i = 1, 2, ..., N) with input x_i ∈ R^n and output y_i ∈ R, the LSSVM model takes the form of Eq. (1).

y(x) = w^T φ(x) + b    (1)

where T denotes the transpose operation. In LSSVM for function estimation, the optimization problem is described as follows.

min J(w, e) = (1/2) w^T w + (1/2) C Σ_{i=1}^{N} e_i^2    (2)

It is subject to the equality constraints

y_i = w^T φ(x_i) + b + e_i  (i = 1, 2, ..., N)    (3)
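As an illustrative aid only (not code from the original study), the following minimal Python sketch shows how an RBF-kernel LSSVM of this form can be trained by solving the linear system that follows from the KKT conditions (Eqs. (6)-(9) in the next subsection); the helper names and array shapes are assumptions of the sketch.

import numpy as np

def rbf_kernel(X1, X2, sigma2):
    # Gram matrix of the RBF kernel K(x, x') = exp(-||x - x'||^2 / (2*sigma2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma2))

def lssvm_train(X, y, C, sigma2):
    # Solve [[0, 1^T], [1, K + I/C]] [b; alpha] = [0; y]  (cf. Eq. (6))
    N = len(y)
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(X_train, alpha, b, X_new, sigma2):
    # y(x) = sum_i alpha_i K(x, x_i) + b  (cf. Eq. (9))
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b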


The corresponding Lagrangian is given by Eq. (4).

L(w, b, e; α) = J(w, e) − Σ_{i=1}^{N} α_i {w^T φ(x_i) + b + e_i − y_i}    (4)

Considering the Karush–Kuhn–Tucker (KKT) optimality conditions, we have Eq. (5).

∂L/∂w = 0,  ∂L/∂b = 0,  ∂L/∂e_i = 0,  ∂L/∂α_i = 0    (5)

The optimization problem can then be written as Eq. (6).

[ 0    1^T      ] [ b ]   [ 0 ]
[ 1    Ω + I/C  ] [ α ] = [ y ]    (6)

where y = [y_1, ..., y_N], 1 = [1, ..., 1], α = [α_1, ..., α_N] and the Mercer condition is applied as Eq. (7).

Ω_ij = K(x_i, x_j) = φ(x_i)^T φ(x_j)  (i, j = 1, 2, ..., N)    (7)

Investigations show that the radial basis function (RBF) is superior to other kernel functions when there is a lack of prior knowledge (Vapnik, 1995). The RBF kernel, which satisfies the Mercer conditions, is given below.

K(x, x_i) = exp(−||x − x_i||^2 / (2σ^2))  (i = 1, 2, ..., N)    (8)

The resulting LSSVM model for nonlinear regression can be written as Eq. (9).

y(x) = Σ_{i=1}^{N} α_i K(x, x_i) + b    (9)

1.2 Teaching-learning-based optimization (TLBO)

In 2011, Rao et al. (2011) proposed TLBO, which is based on the social phenomenon of teaching and learning in a classroom environment. Li et al. (2013) ameliorated this algorithm into what is known as ameliorated teaching-learning-based optimization (ATLBO). It is composed of two vital modes. (1) Teacher phase: the teacher teaches and influences all students in a classroom. (2) Learner phase: less knowledgeable students are influenced by more knowledgeable students and learn from them.

1.2.1 Teacher phase  This phase represents the influence of the teacher, who is considered to be the most knowledgeable person in the classroom. The teacher, according to his or her abilities, tries to raise the knowledge level of the students to his or her own level; in other words, the teacher tries to increase the mean result of the class in the subject or course being taught. At any iteration, suppose that there are 's' subjects (i.e. process variables or hyper-parameters; m = 1, 2, ..., s) and 'k' students/learners (i.e. the population size; i = 1, 2, ..., k), and that M_i represents the mean of the marks obtained by the learners in the class. As the teacher is considered the most learned person in the class, the best learner in the class is taken as the teacher, and the mark X which gives the best fitness value is the mark of the teacher, that is Eq. (10).

X_teacher = X_{min f(X)}    (10)

The teacher increases the knowledge of the learners so as to move the mean value M_i towards itself; that is, M_i will move towards X_teacher. The learners increase their knowledge and update their marks, and the newly obtained marks depend primarily on two components: the old mark X_old,i and the difference between X_teacher and the mean. According to the TLBO algorithm, these two components take the following form.

X_new,i = X_old,i + r_i (X_teacher − T_F M_i)    (11)

r_i is a random number between 0 and 1, which randomly decides the step size of the difference (X_teacher − T_F M_i). T_F is the teaching factor, which governs the value of the mean to be updated. The value of T_F can be either 1 or 2, decided with equal probability as Eq. (12).

T_F = round[1 + rand(0, 1)(2 − 1)]    (12)

1.2.2 Learner phase  In this phase, a learner can update his/her marks and enhance his/her fitness value by interacting with other learners in the class who have more knowledge. For a learner X_i, another learner X_j is randomly selected (i ≠ j) and X_i updates his/her marks according to the randomly selected learner.

X_new,i = X_old,i + r_i (X_i − X_j)   if f(X_i) < f(X_j)
X_new,i = X_old,i + r_i (X_j − X_i)   if f(X_i) ≥ f(X_j)    (13)

X_new,i is accepted if it gives a better fitness value.
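As a rough illustration under the notation above (not an implementation from the paper), one TLBO iteration following Eqs. (10)-(13) might look as follows in Python; the bound arrays lb and ub, the pre-computed cost array and the in-place update of X are assumptions of the sketch.

import numpy as np

def tlbo_iteration(X, cost, f, lb, ub):
    # One teacher phase plus one learner phase of the original TLBO (Eqs. (10)-(13)).
    k, s = X.shape

    # Teacher phase: the best learner acts as the teacher (Eq. (10)).
    X_teacher = X[np.argmin(cost)]
    M = X.mean(axis=0)
    TF = np.round(1 + np.random.rand())                 # teaching factor, 1 or 2 (Eq. (12))
    X_new = np.clip(X + np.random.rand(k, s) * (X_teacher - TF * M), lb, ub)   # Eq. (11)
    new_cost = np.array([f(x) for x in X_new])
    better = new_cost < cost                            # greedy acceptance
    X[better], cost[better] = X_new[better], new_cost[better]

    # Learner phase: each learner interacts with a randomly chosen partner (Eq. (13)).
    for i in range(k):
        j = np.random.choice([q for q in range(k) if q != i])
        r = np.random.rand(s)
        if cost[i] < cost[j]:
            cand = X[i] + r * (X[i] - X[j])
        else:
            cand = X[i] + r * (X[j] - X[i])
        cand = np.clip(cand, lb, ub)
        c = f(cand)
        if c < cost[i]:
            X[i], cost[i] = cand, c
    return X, cost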


2. Variants of Teaching-Learning-Self-Study-Optimization (TLSO)

In order to investigate the convergence performance thoroughly in terms of consistency in converging the cost to a global minimum, a total of three variants were developed, executed and tested. The variant selected from the three is described below as the proposed TLSO; the remaining two variants are described subsequently.

2.1 The proposed teaching-learning-self-study optimization (TLSO)

Intensification and diversification are the two main components of a good optimization algorithm. The proposed TLSO algorithm builds on these two basic concepts as follows:

1. In TLBO, the teacher's effort is randomly distributed to all learners, and he/she offers a random amount of learning to good and bad learners alike. Naturally, a teacher may put more effort into a bad learner in order to move the average of the class towards himself/herself faster in the teacher phase (Li et al., 2013). In TLBO, the marks obtained by the learners after an exam or phase are considered to be normally distributed, though in practice they may not be normally distributed but may show some skewness (Xing and Gao, 2014).

2. In the learner phase, apart from learning from each other, learners may naturally increase their knowledge by studying various books, articles and theses by themselves, using available facilities such as the library and the internet. This phenomenon can be termed self-study and has been added to the learner phase as a sub-step.

In this paper, a novel Teaching-Learning-Self-Study-Optimization (TLSO) is therefore proposed by incorporating into TLBO the natural phenomenon of the teacher's focus on bad learners, and by adding the equally natural phenomenon of self-study. Integration of the first phenomenon into TLBO strengthens intensification, which in general accelerates convergence compared with TLBO. The second phenomenon adds diversification, which in general reduces the risk of converging to a local minimum and increases the consistency of the algorithm in reaching the global minimum.

2.1.1 Teacher phase  The low marks of some learners have a bad effect on the mean of the class. Therefore, the teacher may want to put greater effort into selected bad learners to raise the mean of the class: if the obtained mark of a learner is too low, the teacher must put more effort into this learner than into a learner whose mark is higher. Hence, the learner with lower marks needs a big step towards the teacher and the learner with higher marks needs only a small step. To integrate this concept, Eq. (11) is modified by introducing new parameters (Li et al., 2013).

X_mod,i = Ω_i X_old,i + Γ_i (X_teacher − T_F M_i)    (14)

where Ω_i is the inertia weight, which gives weight to the previous solution and keeps a balanced contribution of the historical impact to the new solution, and Γ_i is the acceleration coefficient, which accelerates the convergence by controlling the step size of the difference (X_teacher − T_F M_i). Γ_i can take values from 0 to 1. These parameters are defined as follows.

Ω_i = 1 / [1 + exp(−fit_i / fit_teacher)]^iter    (15)

Γ_i = 1 / [1 + exp(−(fit_i / fit_teacher) iter)]    (16)

where fit_i is the fitness value of the i-th learner and fit_teacher is the fitness value of the teacher (the lowest cost of the objective function among all learners) in the first iteration.

The TLSO is initialized in the same fashion as the TLBO, and the fundamental learning theme is similar. In the subsequent iterations, after the modification of the marks of all the learners through Eq. (14) and the computation of their fitness values, the modified marks and fitness values are concatenated with the old marks and fitness values. The fitness values are then sorted in ascending order and their marks are assigned accordingly. Finally, following the elitist strategy, the upper half is selected and the algorithm moves on to the learner phase.

2.1.2 Learner phase  A good optimization algorithm should focus on diversification as well as intensification; a balanced trade-off between the two is indispensable. In the teaching and learning scenario, diversification can be considered equivalent to the self-study of the learners. Self-study is a natural phenomenon which allows learners to take full advantage of available resources such as libraries and academic websites on the internet. As the teacher phase has already been intensified by the introduction of the new coefficients, we introduce diversification in the learner phase. To introduce diversification into the algorithm, a learner should be allowed to visit the wider search space to increase search-point sparsity. In TLBO this is allowed only in the first iteration, where learners get their marks randomly from the given min–max search space, but they start following the teacher and good learners in the subsequent iterations. This allowance in the first iteration is not enough, especially when the teacher phase is intensified, and it needs to be applied throughout all iterations for the diversification factor to make a significant contribution to the algorithm. To incorporate this concept, TLSO introduces the diversification factor D as Eq. (17).

D_m = X_m^U − X_m^L    (17)

The diversification factor D_m, accounting for the self-study concept in the learner phase, can be incorporated as a sub-step as Eq. (18) or Eq. (19).

if rand < 0.5:
X_mod,i = X_old,i + r_i (X_i − X_j)   if f(X_i) < f(X_j)
X_mod,i = X_old,i + r_i (X_j − X_i)   if f(X_i) ≥ f(X_j)    (18)

otherwise:
X_mod,i = X_old,i − r_i D_m   if r_i D_m < X_old,i
X_mod,i = X_old,i + r_i D_m   if r_i D_m = X_old,i
X_mod,i = r_i D_m − X_old,i   if r_i D_m > X_old,i    (19)

Similar to the teacher phase, after the modification of the marks and the computation of the fitness values of all learners, the marks and fitness values are concatenated, sorted according to the fitness values, and the marks are set accordingly in the learner phase as well. Lastly, the upper half is selected for the next iteration.

A step by step procedure of TLSO is depicted in Figure 1.

Fig. 1  Flow chart of TLSO algorithm
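The following Python sketch is one possible reading of Eqs. (14)-(19); it is not the authors' code, the exact forms of Ω_i and Γ_i as reconstructed in Eqs. (15)-(16) are uncertain, and the bound handling and elitist selection are simplifications. fit_teacher is passed in as the first-iteration teacher fitness, as described in the text.

import numpy as np

def tlso_teacher_phase(X, cost, f, lb, ub, it, fit_teacher):
    # Intensified teacher phase, Eq. (14); Omega/Gamma follow our reading of Eqs. (15)-(16).
    k, s = X.shape
    X_teacher = X[np.argmin(cost)]
    M = X.mean(axis=0)
    TF = np.round(1 + np.random.rand())
    omega = 1.0 / (1.0 + np.exp(-cost / fit_teacher)) ** it      # inertia weight (Eq. (15))
    gamma = 1.0 / (1.0 + np.exp(-(cost / fit_teacher) * it))     # acceleration coefficient (Eq. (16))
    X_mod = np.clip(omega[:, None] * X + gamma[:, None] * (X_teacher - TF * M), lb, ub)
    # Elitist selection: concatenate old and modified learners, keep the best half.
    all_X = np.vstack([X, X_mod])
    all_c = np.concatenate([cost, [f(x) for x in X_mod]])
    keep = np.argsort(all_c)[:k]
    return all_X[keep], all_c[keep]

def tlso_learner_phase(X, cost, f, lb, ub):
    # Learner phase with the self-study diversification factor D_m = X_U - X_L (Eq. (17)).
    k, s = X.shape
    D = ub - lb
    X_mod = X.copy()
    for i in range(k):
        r = np.random.rand(s)
        if np.random.rand() < 0.5:                     # ordinary peer learning (Eq. (18))
            j = np.random.choice([q for q in range(k) if q != i])
            sign = 1.0 if cost[i] < cost[j] else -1.0
            X_mod[i] = X[i] + sign * r * (X[i] - X[j])
        else:                                          # self-study step (Eq. (19))
            step = r * D
            X_mod[i] = np.where(step < X[i], X[i] - step,
                                np.where(step > X[i], step - X[i], X[i] + step))
        X_mod[i] = np.clip(X_mod[i], lb, ub)
    all_X = np.vstack([X, X_mod])
    all_c = np.concatenate([cost, [f(x) for x in X_mod]])
    keep = np.argsort(all_c)[:k]
    return all_X[keep], all_c[keep]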


2.2 Variant-1

Teacher phase  The teacher phase of variant-1 is the same as that of the proposed TLSO.

Learner phase  Two acceleration coefficients Φ_i and Δ_i are assimilated (Li et al., 2013) to control the step size according to the present efficiency of learner i and of the randomly selected learner j, as follows.

Φ_i = 1 − exp(fit(X_j) − fit(X_i))    (20)

Δ_i = 1 − exp(fit(X_i) − fit(X_j))    (21)

The final equations of the learner phase of variant-1 can be written as follows.

if rand < 0.5:
X_mod,i = X_old,i + Φ_i (X_i − X_j)   if f(X_i) < f(X_j)
X_mod,i = X_old,i + Δ_i (X_j − X_i)   if f(X_i) ≥ f(X_j)    (22)

otherwise:
X_mod,i = X_old,i − r_i D_m   if r_i D_m < X_old,i
X_mod,i = X_old,i + r_i D_m   if r_i D_m = X_old,i
X_mod,i = r_i D_m − X_old,i   if r_i D_m > X_old,i    (23)

2.3 Variant-2

Teacher phase  The teacher phase of variant-2 is also the same as that of the proposed TLSO.

Learner phase  One more acceleration coefficient, Ψ_i, is incorporated as in (Li et al., 2013), Eq. (24).

Ψ_i = 1 − exp(fit(X_teacher) − fit(X_i))    (24)

The final equations of the learner phase of variant-2 can be written as follows.

if rand < 0.5:
X_mod,i = X_old,i + Φ_i (X_j − X_i)   if f(X_i) ≤ f(X_j)
X_mod,i = X_old,i + Ψ_i (X_teacher − X_i)   if f(X_i) > f(X_j)    (25)

otherwise:
X_mod,i = X_old,i − r_i D_m   if r_i D_m < X_old,i
X_mod,i = X_old,i + r_i D_m   if r_i D_m = X_old,i
X_mod,i = r_i D_m − X_old,i   if r_i D_m > X_old,i    (26)

3. Experimental Study and Data Analysis

3.1 Experimental setup

In this experiment, we employed 20 benchmark functions with a balanced combination of separable, non-separable, unimodal and multimodal functions. The details of the functions are shown in Table 1. These functions were optimized with different evolutionary and swarm-intelligence-based optimization algorithms, namely ACO, PSO, TLBO, ATLBO, variant-1, variant-2 and TLSO. The ACO and PSO algorithms execute the objective function (1 × population size) times per iteration, whereas TLBO, ATLBO, variant-1, variant-2 and TLSO execute the objective function (2 × class size) times per iteration. Therefore, in order to keep the comparison fair and the number of function evaluations the same, the total number of iterations for ACO and PSO was set to double that of the other considered algorithms. The class size or population size and the number of function evaluations were set to 25 and 50,000, respectively, for all considered algorithms; the maximum number of iterations was 1,000 for ATLBO, variant-1, variant-2 and TLSO, and 2,000 for ACO and PSO. TLBO removes duplicate marks and obtains their results during the iterations; therefore, it may run the objective function more than (2 × class size) times per iteration. To address this and to keep the number of function evaluations identical to the other algorithms, a stopping criterion was incorporated which terminates the execution of the algorithm once the allotted function evaluations have been used. Each algorithm was executed 30 times independently. The PSO and ACO algorithms considered in this experiment can be found in Hu et al. (2009) and Zheng et al. (2009), respectively; two different ACO algorithms are presented in Zheng et al. (2009), of which ACO2 has been used in this work.
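To make the evaluation-budget bookkeeping concrete, a hypothetical harness is sketched below; the algorithm(objective, pop, iters) call signature and the assumption that each run returns its final best cost are illustrative, not from the paper. It reproduces the 50,000-evaluation budget, the population of 25, the doubled iteration count for ACO/PSO and the best/worst/mean/SD statistics reported in Table 2.

import numpy as np

POP = 25               # class / population size used for all algorithms
BUDGET = 50_000        # objective-function evaluations allowed per run
RUNS = 30              # independent runs per algorithm

def iteration_limit(evals_per_iter):
    # TLBO-family algorithms spend 2*POP evaluations per iteration (1,000 iterations),
    # while ACO and PSO spend 1*POP, so their iteration count is doubled (2,000).
    return BUDGET // evals_per_iter

def summarize(algorithm, objective, evals_per_iter):
    # Run one algorithm RUNS times and summarize its consistency, as in Table 2.
    costs = np.array([algorithm(objective, pop=POP, iters=iteration_limit(evals_per_iter))
                      for _ in range(RUNS)])
    return {"best": costs.min(), "worst": costs.max(),
            "mean": costs.mean(), "sd": costs.std(ddof=1)}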


Table 1  Benchmark functions considered in the experiment

No. Function (D, search range, C): Formula
1. Step (D = 30, [−100, 100], US): F_min = Σ_{i=1}^{D} ⌊x_i + 0.5⌋^2
2. Sphere (D = 30, [−100, 100], US): F_min = Σ_{i=1}^{D} x_i^2
3. Sum Squares (D = 30, [−10, 10], US): F_min = Σ_{i=1}^{D} i x_i^2
4. Beale (D = 2, [−4.5, 4.5], UN): F_min = (1.5 − x_1 + x_1x_2)^2 + (2.25 − x_1 + x_1x_2^2)^2 + (2.625 − x_1 + x_1x_2^3)^2
5. Easom (D = 2, [−100, 100], UN): F_min = −cos(x_1)cos(x_2)exp(−(x_1 − π)^2 − (x_2 − π)^2)
6. Matyas (D = 2, [−10, 10], UN): F_min = 0.26(x_1^2 + x_2^2) − 0.48 x_1x_2
7. Zakharov (D = 10, [−5, 10], UN): F_min = Σ_{i=1}^{D} x_i^2 + (Σ_{i=1}^{D} 0.5 i x_i)^2 + (Σ_{i=1}^{D} 0.5 i x_i)^4
8. Powell (D = 24, [−4, 5], UN): F_min = Σ_{i=1}^{D/4} [(x_{4i−3} + 10x_{4i−2})^2 + 5(x_{4i−1} − x_{4i})^2 + (x_{4i−2} − 2x_{4i−1})^4 + 10(x_{4i−3} − x_{4i})^4]
9. Schwefel 2.22 (D = 30, [−10, 10], UN): F_min = Σ_{i=1}^{D} |x_i| + Π_{i=1}^{D} |x_i|
10. Schwefel 1.2 (D = 30, [−100, 100], UN): F_min = Σ_{i=1}^{D} (Σ_{j=1}^{i} x_j)^2
11. Bohachevsky 1 (D = 2, [−100, 100], MS): F_min = x_1^2 + 2x_2^2 − 0.3cos(3πx_1) − 0.4cos(4πx_2) + 0.7
12. Booth (D = 2, [−10, 10], MS): F_min = (x_1 + 2x_2 − 7)^2 + (2x_1 + x_2 − 5)^2
13. Michalewicz 2 (D = 2, [0, π], MS): F_min = −Σ_{i=1}^{2} sin(x_i)(sin(i x_i^2/π))^20
14. Six-Hump Camel Back (D = 2, [−5, 5], MN): F_min = 4x_1^2 − 2.1x_1^4 + x_1^6/3 + x_1x_2 − 4x_2^2 + 4x_2^4
15. Schaffer (D = 2, [−100, 100], MN): F_min = 0.5 + (sin^2(√(x_1^2 + x_2^2)) − 0.5) / (1 + 0.001(x_1^2 + x_2^2))^2
16. Bohachevsky 2 (D = 2, [−100, 100], MN): F_min = x_1^2 + 2x_2^2 − 0.3cos(3πx_1)cos(4πx_2) + 0.3
17. Bohachevsky 3 (D = 2, [−100, 100], MN): F_min = x_1^2 + 2x_2^2 − 0.3cos(3πx_1 + 4πx_2) + 0.3
18. Shubert (D = 2, [−10, 10], MN): F_min = (Σ_{i=1}^{5} i cos((i+1)x_1 + i)) (Σ_{i=1}^{5} i cos((i+1)x_2 + i))
19. Goldstein-Price (D = 2, [−2, 2], MN): F_min = [1 + (x_1 + x_2 + 1)^2 (19 − 14x_1 + 3x_1^2 − 14x_2 + 6x_1x_2 + 3x_2^2)] × [30 + (2x_1 − 3x_2)^2 (18 − 32x_1 + 12x_1^2 + 48x_2 − 36x_1x_2 + 27x_2^2)]
20. Griewank (D = 30, [−600, 600], MN): F_min = (1/4000) Σ_{i=1}^{D} x_i^2 − Π_{i=1}^{D} cos(x_i/√i) + 1

D = dimension, C = characteristic, U = unimodal, M = multimodal, S = separable, N = non-separable.

A drawback of algorithms with algorithm-specific parameters, such as ACO and PSO in this case, is that they need extensive experimentation to select those parameters for each objective function prior to the actual optimization. These parameters have substantial effects on the performance of the algorithms, i.e., on convergence accuracy and convergence speed, so the selection of suitable algorithm-specific parameters is an essential task for such algorithms. Accordingly, extensive experimentation has been carried out to obtain the algorithm-specific parameters of ACO and PSO that give the best result for each benchmark function separately. In contrast, TLBO, ATLBO, variant-1, variant-2 and TLSO have the advantage of having no such algorithm-specific parameters, which saves a great deal of the time otherwise consumed in parameter selection.

All of the experiments were conducted on an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz with 8 GB RAM, running Windows 7 Ultimate.

3.2 Comparison with other algorithms


Table 2 Comparison of algorithms for benchmark function experiment

Function ACO PSO TLBO ATLBO Variant-1 Variant-2 TLSO

Step Optimum 0 0 0 0 0 0 0
Best 4.2987e-07 0.0076 1.4948e-16 6.2621 5.9954 6.6278 0.1792
Worst 0.0126 1.8693 1.1425e-10 7.5000 7.4844 7.4992 1.0741
Mean 0.0020 0.7049 5.1737e-12 7.3546 6.9546 7.2777 0.5448
SD 0.0031 0.3926 2.1416e-11 0.2792 0.4329 0.2093 0.2118
MI — — 830.2222 — — — —
Time 1.6155 1.1484 5.6979 1.0729 1.1596 1.1027 1.0434
Sphere Optimum 0 0 0 0 0 0 0
Best 9.9745e-07 0 7.4626e-149 0 0 0 0
Worst 0.0141 1.0002e+04 5.8388e-144 0 0 0 0
Mean 0.0016 1.3343e+03 3.6166e-145 0 0 0 0
SD 0.0032 3.4573e+03 1.0796e-144 0 0 0 0
MI — — — 527.5333 541.5000 532.9000 534.6667
Time 1.6207 1.1412 5.6143 1.0965 1.1736 1.1224 1.0603
Sum Squares Optimum 0 0 0 0 0 0 0
Best 7.3885e-10 0 3.5427e-149 0 0 0 0
Worst 9.5630e-04 1500 2.6366e-144 0 0 0 0
Mean 1.4028e-04 186.6667 1.8251e-145 0 0 0 0
SD 2.3041e-04 434.4901 6.1275e-145 0 0 0 0
MI — — — 524.9333 537.8333 529.1667 531.7333
Time 1.6259 1.1413 5.6341 1.0915 1.1763 1.1128 1.0655
Beale Optimum 0 0 0 0 0 0 0
Best 2.5085e-04 0 0 4.4019e-06 5.9169e-06 1.1297e-05 0
Worst 0.4458 0.7621 0 1.1225 0.0050 0.0072 6.1630e-32
Mean 0.0659 0.0762 0 0.2788 8.7650e-04 0.0017 2.9788e-33
SD 0.0815 0.2325 0 0.3492 9.7008e-04 0.0020 1.2179e-32
MI — 81.3333 Not converged — — — 192.7500
Time 0.7723 0.5238 4.5045 0.5775 0.5797 0.5531 0.5115
Easom Optimum −1 −1 −1 −1 −1 −1 −1
Best −0.9999 −1 −1 −0.9999 −0.9999 −0.9999 −1
Worst −0.9996 −1 −1 0 −0.0024 −0.2231 −1
Mean −0.9999 −1 −1 −0.3177 −0.9071 −0.8394 −1
SD 9.0771e-05 0 0 0.4606 0.2416 0.2158 0
MI — 168.4000 94.6333 — — — 138.6000
Time 1.2857 1.0491 4.6495 0.9954 1.0612 0.9898 0.9362
Matyas Optimum 0 0 0 0 0 0 0
Best 8.3409e-15 0 1.3722e-273 0 0 0 0
Worst 2.1366e-07 0 1.7080e-260 4.9407e-324 4.9407e-324 4.9407e-324 0
Mean 1.7586e-08 0 6.7700e-262 0 0 0 0
SD 4.5363e-08 0 0 0 0 0 0
MI — 417.7333 — 521.6538 526.7917 523.8148 392.8667
Time 0.7367 0.5115 4.4447 0.5497 0.5472 0.5300 0.4874
Zakharov Optimum 0 0 0 0 0 0 0
Best 1.1189e-08 0 1.2474e-102 0 0 0 0
Worst 0.0014 0.0765 1.3062e-96 0 0 0 0
Mean 9.7569e-05 0.0077 5.9532e-98 0 0 0 0
SD 2.7039e-04 0.0181 2.4845e-97 0 0 0 0
MI — — — 521.8000 533.9333 527.9667 528.1000
Time 1.4539 1.1166 5.0447 1.0795 1.1386 1.0640 1.0142
Powell Optimum 0 0 0 0 0 0 0
Best 1.2187e-09 0 1.3436e-12 0 0 0 0
Worst 6.3814e-04 93.3053 2.2869e-07 0 0 0 0
Mean 5.6287e-05 14.8647 2.4341e-08 0 0 0 0
SD 1.2657e-04 20.9282 5.8292e-08 0 0 0 0
MI — — — 525.5667 537.5667 530.4667 528.8000
Time 1.6754 1.2795 5.6434 1.1937 1.2678 1.1992 1.1416
Schwefel 2.22 Optimum 0 0 0 0 0 0 0
Best 1.3446e-04 0 2.6461e-74 8.1947e-309 2.6589e-303 3.3894e-305 3.4250e-306
Worst 0.0392 49.4945 1.1577e-72 1.1410e-304 1.3414e-300 1.9049e-302 2.9975e-305
Mean 0.0105 12.4796 2.7073e-73 1.9068e-305 8.6671e-302 1.3074e-303 1.0911e-305
SD 0.0103 11.7663 2.7884e-73 0 0 0 0
MI — Not converged — — — — —
Time 1.6403 1.1661 5.8249 1.0787 1.1675 1.1074 1.0755
Schwefel 1.2 Optimum 0 0 0 0 0 0 0
Best 2.0244e-15 0 5.9502e-293 0 0 0 0
Worst 0.0023 800000000 2.8513e-287 0 0 0 0
Mean 1.1665e-04 60000000 2.2931e-288 0 0 0 0
SD 4.2449e-04 1.8864e+08 0 0 0 0 0
MI — 403 — 261.7667 274.2667 266.6000 269.3667
Time 1.9412 1.4797 6.2953 1.4537 1.5217 1.4894 1.4039

Table 2  (Continued)

Function ACO PSO TLBO ATLBO Variant-1 Variant-2 TLSO


Bohachevsky 1 Optimum 0 0 0 0 0 0 0
Best 3.2787e-07 0 0 0 0 0 0
Worst 0.0139 0.4129 0 0.5339 0 0 0
Mean 0.0010 0.0138 0 0.0336 0 0 0
SD 0.0026 0.0754 0 0.1281 0 0 0
MI — 75.5172 62.3333 23.4643 32.9000 27.3667 24.8333
Time 0.7593 0.5094 4.3776 0.5585 0.5686 0.5440 0.4883
Booth Optimum 0 0 0 0 0 0 0
Best 0.0327 0 0 0.0170 1.9773e-04 0.0017 0
Worst 15.7028 0 0 13.9661 12.2347 20.0143 0
Mean 2.4235 0 0 2.6168 0.5589 2.0699 0
SD 3.3621 0 0 3.6574 2.2178 3.8630 0
MI — 53.7333 Not converged — — — 163.9667
Time 0.7369 0.4917 4.4084 0.5446 0.5557 0.5395 0.4814
6 Hump Camel Optimum −1.03163 −1.03163 −1.03163 −1.03163 −1.03163 −1.03163 −1.03163
Back Best −1.031343 −1.031628 −1.031628 −1.023477 −1.030853 −1.031120 −1.031628
Worst −0.512714 −1.031628 −1.031628 −0.037735 −0.946287 −0.577493 −1.031628
Mean −0.913493 −1.031628 −1.031628 −0.738855 −1.012701 −0.950475 −1.031628
SD 0.148386 6.5195e-16 6.7752e-16 0.306883 0.021221 0.113539 1.4700e-08
MI — 40.9333 181.2692 — — — 38.0667
Time 0.780536 0.5443 4.4418 0.5959 0.5961 0.5569 0.5385
Michalewicz 2 Optimum −1.8013 −1.8013 −1.8013 −1.8013 −1.8013 −1.8013 −1.8013
Best −1.8008 −1.8013 −1.8013 −1.80129 −1.80129 −1.80128 −1.8013
Worst −1.2108 −0.8013 −1.8013 −1.7138 −1.0032 −1.6965 −1.8011
Mean −1.6625 −1.5424 −1.8013 −1.7850 −1.7479 −1.7890 −1.8013
SD 0.1599 0.3821 9.0336e-16 0.0222 0.1757 0.0204 3.5272e-05
MI — 6.7000 26 — — — 22.6897
Time 1.3704 1.0845 4.6834 1.0666 1.1437 1.0625 0.9943
Schaffer Optimum 0 0 0 0 0 0 0
Best 1.3264e-07 0 0 0.0097 0 0.0097 0
Worst 7.7539e-04 0.0097 0.0080 0.1782 0.1276 0.1782 0.0372
Mean 6.1321e-05 6.4773e-04 3.1107e-04 0.0479 0.0277 0.0268 0.0127
SD 1.4583e-04 0.0025 0.0015 0.0454 0.0290 0.0357 0.0101
MI — 660.6429 Not converged — 662 — 41.5000
Time 1.0297 0.7711 4.4180 0.7643 0.7945 0.7556 0.6982
Bohachevsky 2 Optimum 0 0 0 0 0 0 0
Best 9.6508e-08 0 0 0 0 0 0
Worst 9.4119e-04 0 0 0 0 0 0
Mean 1.0520e-04 0 0 0 0 0 0
SD 2.3087e-04 0 0 0 0 0 0
MI — 34.9000 62.0667 24.2333 31.4333 27.2667 24.5333
Time 0.7548 0.5107 4.3693 0.5535 0.5622 0.5449 0.4863
Bohachevsky 3 Optimum 0 0 0 0 0 0 0
Best 1.4178e-07 0 0 0 0 0 0
Worst 0.0070 0 0 0.3176 0 0 0
Mean 0.0018 0 0 0.0358 0 0 0
SD 0.0019 0 0 0.0940 0 0 0
MI — 121.5000 97.4333 24.2308 33.6667 27.2000 26.3667
Time 0.7518 0.5042 4.3639 0.5660 0.5575 0.5456 0.4902
Shubert Optimum −186.73 −186.73 −186.73 −186.73 −186.73 −186.73 −186.73
Best −186.6447 −186.7309 −186.7309 −183.4215 −186.7309 −186.7262 −186.7309
Worst −150.9113 −186.7309 −186.7309 −35.0872 −162.4750 −78.1390 −186.7309
Mean −179.5812 −186.7309 −186.7309 −115.1551 −184.5825 −177.8981 −186.7309
SD 9.3883 3.6944e-14 1.9748e-14 44.6395 4.6589 20.0409 1.3443e-07
MI — 64.4667 530 — 151.8 — 39.4333
Time 0.7979 0.5503 4.4531 0.6053 0.6123 0.6001 0.5329
Gold Stein Price Optimum 3 3 3 3 3 3 3
Best 3.0652 2.9999 2.9999 3.0845 3.0003 3.0025 3
Worst 22.5277 84.0000 2.9999 201.0964 30.0048 84.3471 2.9999
Mean 9.9523 9.3000 2.9999 26.6149 6.3867 18.3300 3
SD 5.7063 20.8941 1.3323e-15 39.3595 5.9907 17.2697 3.1963e-07
MI — 43.963 57.5 — — — 82.1786
Time 0.7677 0.5282 4.3912 0.5850 0.5894 0.5646 0.5362
Griewank Optimum 0 0 0 0 0 0 0
Best 3.6640e-09 0 0 0 0 0 0
Worst 0.0097 0 0 0 0 0 0
Mean 0.0015 0 0 0 0 0 0
SD 0.0024 0 0 0 0 0 0
MI — 407.2667 145.5333 26.6333 39.3667 31.6333 34.6000
Time 1.7111 1.2521 5.8819 1.6477 1.2267 1.1595 1.6163

SD=Standard deviation, MI=mean iterations to converge the average cost of class/population.



Table 3 Frequency of benchmark functions presenting best results according to the criteria over 30 independent runs from 20 benchmark functions

Criterion ACO PSO TLBO ATLBO Variant-1 Variant-2 TLSO

Mean solution 1 8 12 8 10 10 17
Standard deviation (if the mean solution is among the best) 1 7 11 8 10 10 13
Mean iterations to converge the average cost of class (if the solution 0 4 2 9 0 0 4
is among the best)

In this subsection, the results of TLSO with 25 learners (the class or population size) and 50,000 function evaluations are presented and compared with the results obtained using the other algorithms considered, in terms of consistency in converging the cost to a global minimum, the mean iterations to converge the average cost of the class or population (MI), and the average time in seconds taken per run. Table 2 shows the results in the form of the best solution among 30 independent runs, the worst solution, the mean solution, the standard deviation, the MI and the average time consumed per run. The mean solution and standard deviation represent the consistency of an algorithm (for the specific function) in converging to a global minimum. The MI represents the speed with which the average of the marks or positions of the learners or population of an algorithm moves towards the teacher (best ant or best particle) during the iterations. In this experiment, the MI over 30 independent runs to converge the average cost of the class or population has been computed for the algorithms giving the best solution for the functions.

It can be observed from Table 2 that in this experiment the performances of PSO, TLBO, ATLBO, variant-1, variant-2 and TLSO were the same for Bohachevsky 2 and Griewank, and they outperformed ACO in terms of mean solution and standard deviation. PSO, TLBO, variant-1, variant-2 and TLSO performed identically and outperformed ACO and ATLBO for Bohachevsky 3. For Matyas, PSO, ATLBO, variant-1, variant-2 and TLSO performed in a similar fashion and outperformed ACO and TLBO. For Sphere, Sum Squares, Zakharov, Powell and Schwefel 1.2, the performances of ATLBO, variant-1, variant-2 and TLSO were found to be alike and better than those of the other considered algorithms. The results obtained from TLBO, variant-1, variant-2 and TLSO were the same for Bohachevsky 1, and these algorithms outperformed the other algorithms. For Schaffer, only PSO, TLBO, variant-1 and TLSO converged the cost to a global minimum, but ACO outperformed all algorithms in terms of mean solution and standard deviation. In the case of Easom and 6 Hump Camel Back, PSO, TLBO and TLSO performed better than the other algorithms; for 6 Hump Camel Back, the standard deviation of PSO was better than those of TLBO and TLSO. For Booth and Shubert, the algorithms with the best performance are PSO, TLBO and TLSO; in addition, for Shubert, TLBO outperformed PSO and TLSO in terms of standard deviation. TLBO and TLSO achieved the same results for Michalewicz 2 except for the standard deviation, which was better in the case of TLBO. TLBO gave the best results compared to the other algorithms for the Step and Beale functions. For Schwefel 2.22, TLSO outperformed the others. TLBO and TLSO outperformed the others for Goldstein-Price, but in terms of standard deviation TLBO was found to be better than TLSO.

Table 3 summarizes the results presented in Table 2 and shows the number of benchmark functions for which each considered algorithm was found the best according to the mean solution and the standard deviation (if the mean solution is among the best). Even if an algorithm is able to converge to a global minimum and converges the average cost faster for a range of functions, its consistency in converging to a global minimum for a function still remains the essential criterion for performance evaluation. In this experiment, TLSO outperformed the other considered algorithms in terms of consistency in converging to a global minimum, giving better results for 17 functions in terms of mean solution and 13 functions in terms of standard deviation out of the 20 benchmark functions. However, as expected, in terms of MI, ATLBO was found to be better than the other algorithms considered: ATLBO converges the average cost of the class more quickly than the other algorithms for 9 functions (Table 3). The reason for ATLBO's better performance in this regard may be the inclusion of the inertia weight and acceleration coefficients in the teacher and learner phases of TLBO, respectively. These coefficients help accelerate the movement of the average cost of the class towards the global minimum, causing faster convergence of the average cost of the class. On the other hand, it may be inferred that the compelling pressure, of the teacher on the learners and of learners on other learners, exerted through these acceleration coefficients may lead learners to a local minimum, especially in the case of multimodal functions, during some of the independent runs. It is evident in Table 2 that the ability of ATLBO to converge consistently to a global minimum is deficient for multimodal functions compared with unimodal functions. TLSO, by contrast, shows balanced performance on unimodal and multimodal functions alike. The self-study concept in TLSO helps diversify the search process, so that learners can spread out and cover the whole search space instead of blindly following the teacher or the other learners and converging at a local minimum. There is, however, a price for this consistency in TLSO: a somewhat higher MI than ATLBO.

As far as the elapsed time per run is concerned, the elapsed time of different functions per run for TLSO outperformed TLBO, but was found comparable (because the difference is less than a second) to the other considered algorithms (Table 2).


4. NOx Modeling

4.1 Plant description

For the prediction of NOx emissions from the stack, a standard 500 MW unit consisting of a tangentially fired pulverized-coal boiler is considered in this study. The boiler consists of 6 layers and 24 burners. Tilting fuel and combustion air nozzles are located in each corner of the furnace. These nozzles are directed towards the circumference of an imaginary circle at the center of the furnace. Pulverized coal from the silos is conveyed through a conveyer belt to the mill, where size reduction of the coal is carried out; the amount of coal used in the boiler depends on the boiler load. The pulverized coal is transported from the mill to the boiler by preheated primary air. Secondary air travels through an air preheater to the boiler by means of a forced draft fan and is used in the furnace to burn the coal. Coal combustion produces combustion gas and ash; heavy ash is collected at the bottom and fly ash at the top. NOx is then removed using Selective Catalytic Reduction (SCR), and the remaining gas is fed to a dust collector where fly ash is captured. Finally, SOx is removed and the remaining gases are emitted through the stack. A simplified sketch of the boiler is shown in Figure 2.

Fig. 2  A simplified sketch of a tangentially fired pulverized coal boiler

4.2 NOx modeling setup

For the purpose of this study, the operational process variables and NOx emissions history were obtained from the Taean Power Plant, Taean, Korea. First of all, the data were set up according to the time lag between the process variables and the NOx emissions; the time lag was 11 min, which was incorporated accordingly. Initially, 37 process variables were collected from the plant (after averaging those operational parameters which were recorded redundantly), including coal feed rate, air flow bias, mill bowl differential pressure, total moisture, fuel ratio, O2 and CO outflow, sulfur content, etc. A graphical investigation was carried out, the variables found to be constant or pure noise over the whole period were omitted, and 14 variables were selected. The selected variables are forced draft fan bias, induced draft fan bias, primary air flow bias, coal feed rate, mill bowl differential pressure, calorific value of coal, ash content, air-to-fuel ratio, total moisture, volatile matter, reheater outlet temperature, generation in megawatts, and O2 and CO outflow. From the plant, 246 data samples were collected with a sampling time of 2 h. The LSSVM model was developed with 184 samples, and the remaining 62 samples were utilized for testing and validating the model.

4.3 Tuning of hyper-parameters of LSSVM

The generalization ability and prediction accuracy of LSSVM mainly depend on its two hyper-parameters: the punishment on the data set beyond the pre-specified error tolerance (C) and the RBF width (σ^2). Therefore, it is of utmost importance to adjust these hyper-parameters carefully to obtain a well-tuned LSSVM model. As the cost of the objective function, the root mean squared error of cross validation (RMSECV) with 5-fold cross validation is adopted; the details of cross validation can be found elsewhere (Kohavi, 1995).

In this work, the proposed TLSO is applied to the NOx emissions data to build a well-tuned model and is compared with other optimization algorithms in terms of consistency in approaching the global minimum. For all optimization algorithms tested in this work, the class size or population size is selected as 30, the dimension is 2 and the error tolerance of the cost function is set to 0.00001. The [min, max] intervals of the variables (C, σ^2) are [0.001, 1000].

Extensive experimentation has been carried out to select suitable algorithm-specific parameters for ACO and PSO. For ACO, these parameters have been set as evaporation factor (ρ) = 0.3556, constant (k) = 0.2790, constant (p_o) = 0.9999, traverse step of ants (λ) = 0.7431, and traverse step of the best ant (w) with w_max = 0.9999 and w_min = 0.2001. For PSO, the cognitive acceleration constant (c_1) = 1.7425, the social acceleration constant (c_2) = 1.9999 and the inertia weight (ω) = 0.5398. Unlike ACO and PSO, TLBO, ATLBO, variant-1, variant-2 and TLSO are free from the hindrance of selecting algorithm-specific parameters. All the algorithms considered have been run 100 times each independently to tune the LSSVM hyper-parameters with the settings mentioned above.

There are various performance criteria for estimating the prediction error. Here, the relative root mean squared error (rRMSE) and the mean relative error (MRE) are adopted for estimating the prediction error of the LSSVM model.

relative RMSE = sqrt[ (1/N_t) Σ_{i=1}^{N_t} ((Y_i,actual − Y_i,pred) / Y_i,actual)^2 ]    (27)

MRE = (1/N_t) Σ_{i=1}^{N_t} |Y_i,actual − Y_i,pred| / Y_i,actual    (28)
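For illustration (assumed helper names, reusing the lssvm_train/lssvm_predict sketch given earlier rather than the authors' code), the 5-fold RMSECV cost of a candidate (C, σ^2) pair and the rRMSE/MRE criteria of Eqs. (27)-(28) could be computed as follows.

import numpy as np

def rmsecv_5fold(X, y, C, sigma2, folds=5):
    # 5-fold root mean squared error of cross validation, used as the tuning cost.
    idx = np.array_split(np.random.permutation(len(y)), folds)
    sq_errors = []
    for test in idx:
        train = np.setdiff1d(np.arange(len(y)), test)
        alpha, b = lssvm_train(X[train], y[train], C, sigma2)
        pred = lssvm_predict(X[train], alpha, b, X[test], sigma2)
        sq_errors.append((pred - y[test]) ** 2)
    return np.sqrt(np.concatenate(sq_errors).mean())

def relative_rmse(y_actual, y_pred):
    # Eq. (27): square root of the mean squared relative error.
    return np.sqrt(np.mean(((y_actual - y_pred) / y_actual) ** 2))

def mre(y_actual, y_pred):
    # Eq. (28): mean absolute relative error.
    return np.mean(np.abs((y_actual - y_pred) / y_actual))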


Fig. 3  ACO: Convergence consistency of (a) cost of objective function against number of iterations, (b) converged cost at each run, (c) average cost of objective function against number of iterations, (d) converged average cost at each run

4.4 Results and discussion

After careful processing of the operational data, the NOx emissions from the tangentially fired pulverized coal boiler were modeled using LSSVM. For a finely tuned LSSVM model, the hyper-parameters of LSSVM need to be well optimized to obtain good predictions. The hyper-parameters of LSSVM have therefore been optimized by employing the various algorithms, and their convergence performance has been investigated in terms of consistency in converging the cost of the 5-fold RMSECV of the NOx emissions under consideration to a global minimum, and in terms of MI. The algorithms employed for this application have been run 30 times each. For a fair comparison, ACO and PSO have been iterated double the number of times of the other algorithms; therefore, the even-numbered values of the cost and average cost of ACO and PSO have been taken when drawing the graphs, to be consistent with the graphs of the other algorithms.

The performance of the 30 runs of ACO is shown in Figure 3. The cost and average cost of the objective function (5-fold RMSECV) are plotted against the number of iterations in Figures 3(a) and (c). The average cost of the objective function has been plotted for all algorithms to see whether it converges to the global minimum and to examine how rapidly the learners of a class, the particles of a swarm or the ants of a colony move with each iteration. The converged cost and the converged average cost are plotted against the 30 independent runs in Figures 3(b) and (d), respectively, to magnify the results. The minimum cost (i.e. global minimum) of the 5-fold RMSECV of the NOx data under consideration using LSSVM was 20.4805.

Figure 3(a) shows the convergence of the cost of the objective function using ACO. It is clear from the figure that ACO did not converge to the global minimum in all 30 runs. In fact, it is difficult to see from Figure 3(a) that some runs which seem to have converged to the global minimum did not actually converge to it but only approached near the global minimum. Therefore, in order to differentiate, we have classified the diversions as near-global-minimum diversions and local-minimum diversions. In Figure 3(b), it is clear that ACO did not converge the cost to the global minimum in any of the 30 runs: ACO reached near the global minimum in 7 runs, and the remaining 23 runs diverged to a local minimum (Table 4). As far as the average cost is concerned, ACO diverged it to a local minimum in all 30 runs (Figure 3(d)). This demonstrates that during the execution of the iterations (for the NOx data under consideration) the average position of the ants did not approach the global minimum in any run. ACO converged the cost to near the global minimum in 7 runs because the best ant of the iteration approached near the global minimum; the number of runs converging to near the global minimum is not fixed and, owing to the stochastic nature of the process, this number may change. The important point to be noted here is that the cost did not converge to the global minimum in any of the 30 runs.


Table 4 Performance in terms of consistency to converge and speed in terms of iterations in 30-run experiment of optimization algorithms

Type ACO PSO TLBO ATLBO Variant-1 Variant-2 TLSO

Convergence of cost of 5-fold RMSECV 0 0 30 5 11 14 30


Local minimum diversion (cost) 23 29 0 2 5 5 0
Near to global minimum diversion (cost) 7 1 0 23 14 11 0
Mean iterations to converge the cost (if converged) — — 18.6667 9.6 13.4545 12 12.8
Mean iterations to converge the average cost (if converged) — — 40.0333 12.4000 16.2727 15.1429 16

Fig. 4 PSO: Convergence consistency of (a) cost of objective function against number of iterations (b) converged cost at each run (c) average cost of
objective function against number of iterations (d) converged average cost at each run

Similarly, graphs were plotted to investigate the consistency performance of PSO. It can be seen in Figures 4(a) and (b) that not all the runs of PSO converged the cost to near the global minimum, and none converged to the global minimum; the dotted line represents the global minimum. PSO diverged the cost to a local minimum 29 times and to near the global minimum once out of the 30 runs (Table 4). It may be said that the 5-fold RMSECV objective function of the NOx model using LSSVM is, in this case, a multimodal function and that, for this reason, some of the runs of PSO diverted completely to a local minimum. Figures 4(c) and (d) show that the average cost of PSO diverged to a local minimum and never converged to the global minimum, nor even to near the global minimum, in the 30-run experiment. Similar to ACO, it can be said for PSO as well (for the NOx data under consideration) that the average position of the swarm did not converge, apart from the best particle of the iteration. PSO was thus also found to be inconsistent in converging the cost of the 5-fold RMSECV of the NOx data under consideration.

The performance of TLBO for the NOx data under consideration is presented in Figure 5. The cost of the objective function using TLBO not only converged to the global minimum in every run (Figures 5(a) and (b)), but the average cost of the class also converged to the global minimum in every run (Figures 5(c) and (d)). In other words, in all 30 runs all the learners approached the global minimum. The cost and average cost curves of TLBO approached the global minimum in 18.6667 and 40.0333 mean iterations averaged over the 30 runs (as all 30 runs converged to the global minimum) (Table 4).

The ATLBO graphs shown in Figures 6(a) and (b) depict that ATLBO diverged the cost in some of the runs. Table 4 shows that the cost diverged to a local minimum in 2 runs, diverged to near the global minimum in 23 runs and converged to the global minimum in 5 runs. The mean iterations to converge the cost and the average cost were computed to be 9.6 and 12.4, which indicate quite fast convergence of the cost and the average cost of the class for ATLBO in terms of iterations. Figure 6(c) illustrates the rapid movement of the average cost towards the global minimum compared with TLBO (Figure 5(c)). This rapid movement results from the inclusion of the inertia weight, the acceleration coefficient and the elitist strategy in ATLBO. On the other hand, as the cost using ATLBO for the considered NOx data did not converge in all 30 runs, the average cost likewise did not converge in all 30 runs (Figures 6(c) and (d)).

Fig. 5 TLBO: Convergence consistency of (a) cost of objective function against number of iterations (b) converged cost at each run (c) average cost
of objective function against number of iterations (d) converged average cost at each run

Fig. 6 ATLBO: Convergence consistency of (a) cost of objective function against number of iterations (b) converged cost at each run (c) average cost
of objective function against number of iterations (d) converged average cost at each run

ATLBO differs from TLBO in the introduced inertia weight and acceleration coefficients in both the teacher and learner phases. These coefficients intensify the unidirectional search and compel the average of the cost towards the minimum without taking diversification into account.


Fig. 7 Variant-1: Convergence consistency of (a) cost of objective function against number of iterations (b) converged cost at each run (c) average
cost of objective function against number of iterations (d) converged average cost at each run

As a result, although the runs of ATLBO moved the average cost quickly towards the global minimum, they could not avoid divergences from the global minimum in some runs. Keeping in mind that the runs of TLBO avoided becoming trapped in a local minimum and converged steadily to the global minimum, it is quite reasonable to infer that the compelling pressure of intensification, without the inclusion of diversification, in ATLBO is responsible for some runs merely approaching near the global minimum without converging to it, and/or diverting completely to a local minimum (for the NOx data under consideration, which has local minima).

Figure 7 presents the performance of variant-1. As shown in Figures 7(a) and (b), variant-1 successfully prevented the cost of the objective function from being trapped in a local minimum 25 times in total and converged to the global minimum 11 times, thereby outperforming ATLBO in this regard. This may be the result of the incorporation of the self-study concept of the learners (diversification); on the other hand, variant-1 diverted the cost to near the global minimum 14 times and to a local minimum 5 times in the 30-run experiment, as shown in Figure 7(b) and Table 4. The mean iterations to converge the cost and the average cost were computed as 13.4545 and 16.2727, which are higher than those of ATLBO. Compared with TLBO, the mean iterations to converge the cost and the average cost of variant-1 are lower, indicating that variant-1 moves the average cost towards the global minimum more quickly than TLBO. Despite this capability, variant-1 did not converge the cost to the global minimum in all 30 runs as TLBO did; the reason may be over-intensification caused by the inclusion of the acceleration coefficients in the learner phase. The average cost using variant-1 did not converge to the global minimum in all 30 runs (Figures 7(c) and (d)).

Variant-2 successfully prevented the cost of the objective function from becoming stuck in a local minimum 25 times and converged the cost 14 times, thus outperforming ATLBO and variant-1 in this regard (Figures 8(a) and (b)). The cost of variant-2 diverted to near the global minimum 11 times (Table 4). The mean iterations to converge the cost and the average cost were computed to be 12 and 15.1429. Figures 8(c) and (d) show that the average cost did not converge to the global minimum in all 30 runs.

The performance of the proposed TLSO is illustrated in Figure 9. It is clear from Figures 9(a) and (b) that TLSO has not only been successful in avoiding local minima, but has also avoided divergence to near the global minimum: TLSO converged the cost to the global minimum in all 30 runs. As far as the consistency of the runs in converging the average cost to the global minimum is concerned, TLSO also escaped from trapping near the global minimum as well as in local minima, and converged to the global minimum in all 30 runs (Figure 9(d)). In terms of consistency in converging the cost and the average cost to the global minimum, TLSO outperformed ACO, PSO, ATLBO, variant-1 and variant-2, and is comparable to TLBO for the objective function of the application considered in this study. The average cost curves of TLSO approached the global minimum in 16 mean iterations averaged over the 30 runs. The skewness of the curves of the average cost against iterations and the comparison of the mean iterations to converge the cost demonstrate that the movement of the average cost towards the global minimum with TLSO is faster than with TLBO, variant-1 and variant-2, but slightly slower than with ATLBO (Figures 3(c)–9(c) and Table 4).


Fig. 8 Variant-2: Convergence consistency of (a) cost of objective function against number of iterations (b) converged cost at each run (c) average
cost of objective function against number of iterations (d) converged average cost at each run

Fig. 9 Proposed TLSO: Convergence consistency of (a) cost of objective function against number of iterations (b) converged cost at each run (c) av-
erage cost of objective function against number of iterations (d) converged average cost at each run

of TLSO is faster than TLBO, variant-1 and variant-2, but in that they are successful in converging the cost of some
slightly slower than ATLBO (Figures 3(c)–9(c) and Table 4). objective functions and not successful in the case of some
This is the characteristic of all evolutionary algorithms other objective functions. Furthermore, if they converge the

Vol. 50  No. 4  2017 287
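To make the bookkeeping behind these consistency counts concrete, the following is a minimal sketch of how a 30-run experiment could be tallied. It is an illustration only: run_optimizer_once stands in for one independent run of any of the optimizers, and the tolerances used to classify a run as converged to the global minimum, diverted to near it, or trapped in a local minimum are hypothetical choices, not values from the paper.

```python
import statistics

# Hypothetical helper: one independent run of an optimizer (e.g., TLSO),
# returning the best cost found and the iteration at which it converged.
def run_optimizer_once(objective, max_iter=100):
    raise NotImplementedError("plug one run of TLSO/TLBO/ATLBO/... in here")

def consistency_experiment(objective, global_min, n_runs=30, tol=1e-3, near_tol=1e-1):
    """Tally how often n_runs independent runs reach the global minimum."""
    outcomes = {"global": 0, "near_global": 0, "local": 0}
    converge_iters = []
    for _ in range(n_runs):
        best_cost, iters = run_optimizer_once(objective)
        gap = abs(best_cost - global_min)
        if gap <= tol:                 # converged to the global minimum
            outcomes["global"] += 1
            converge_iters.append(iters)
        elif gap <= near_tol:          # diverted to near the global minimum
            outcomes["near_global"] += 1
        else:                          # trapped in a local minimum
            outcomes["local"] += 1
    mean_iters = statistics.mean(converge_iters) if converge_iters else float("nan")
    return outcomes, mean_iters
```

The mean iterations reported in Table 4 would then correspond to averaging only over the runs classified as converged, which is what the sketch does.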


It has already been shown above, in the experiments on the benchmark functions, that TLSO outperformed the other algorithms considered in terms of consistency to converge the cost, and is faster in terms of time elapsed per run than TLBO.

In the case of the NOx prediction application under consideration, TLSO was found to outperform the other algorithms considered in terms of consistency and to be comparable to TLBO; however, TLSO outperformed TLBO in terms of mean iterations to converge the cost and the average cost. In fact, TLSO outperformed all of the other algorithms considered, including TLBO, in terms of mean iterations to converge the cost and the average cost, with the exception of ATLBO; against ATLBO, TLSO presented better results in terms of consistency of convergence.

The superior performance of TLSO over the considered algorithms in the two experiments may be due to the following reasons. Through the incorporation of underlying and natural teaching and learning concepts, such as the teacher's greater effort for the bad learner and the learner's self-study, TLSO has intensified the teacher phase and, contrary to variant-1 and variant-2, diversified the learner phase to some extent, keeping a balanced trade-off between intensification and diversification. This balance allows TLSO to force the average cost of the class towards the teacher so that it converges rapidly, while at the same time avoiding the divergence to a local minimum that such a force may otherwise cause.

It should be noted that the 30-run experiment of each algorithm mentioned in this work was executed multiple times. As expected, owing to their stochastic nature, the algorithms, including TLSO, did not exhibit exactly the same results every time, because each run is independent of the other runs of the same algorithm. However, the general trend of the 30-run experiments remained the same as described above for each algorithm.

All of the optimization algorithms considered in this work were executed 30 times each and the medians of the costs of each algorithm were computed. For comparison purposes, the (C, σ2) corresponding to the median cost of each algorithm was picked for NOx modeling and simulation, and it was found that their NOx prediction results were comparable with those of TLSO. The prediction-error criteria rRMSE and MRE for the training and test data have also been computed and are tabulated in Table 5. The maximum difference between the MRE values of the models tuned with the different algorithms is only 0.008, and the maximum difference between the rRMSE values is only 0.022. This may be because the NOx emissions data are not sensitive to the level of accuracy at which the optimization algorithms were tested. Therefore, we show only the TLSO graphs, since TLSO is consistent and faster in terms of iterations. For TLSO, (C, σ2) were found to be (1.5387, 12.8843) and were used to build a model for the prediction of NOx emissions; the model was then simulated on the test data. The comparison between the measured NOx and the NOx predicted by TLSO-LSSVM is presented in Figure 10. It is clear from Figure 10(a) that the predicted NOx keeps good track of, and is in good agreement with, the measured NOx of the training data. Figure 10(b) shows the model predictions on the test data and exhibits good generalization ability. The residual plots in Figures 11(a) and (b) show that the residuals of the training and test predictions are in acceptable agreement with the zero line. Hence, it can be said that TLSO-LSSVM maintains the prediction accuracy as well as the generalization ability of LSSVM for the prediction of NOx emissions from a tangentially fired pulverized coal boiler of a standard 500 MW power plant, comparable to the other considered algorithms while outperforming them in consistency and in speed measured by iterations.
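The median-based comparison described above can be sketched as follows; tune_lssvm_once is a hypothetical placeholder for one complete tuning run of any of the algorithms (ACO, PSO, TLBO, ATLBO, variant-1, variant-2 or TLSO), returning its final cost together with the (C, σ2) it selected.

```python
# Hypothetical helper: one tuning run returns (cost, C, sigma2).
def tune_lssvm_once():
    raise NotImplementedError("one run of ACO/PSO/TLBO/ATLBO/TLSO goes here")

def median_cost_hyperparameters(n_runs=30):
    """Pick the (C, sigma2) of the run whose cost sits at the middle of n_runs runs."""
    runs = [tune_lssvm_once() for _ in range(n_runs)]   # list of (cost, C, sigma2)
    runs.sort(key=lambda r: r[0])                       # order the runs by cost
    median_run = runs[len(runs) // 2]                   # the run closest to the median cost
    cost, C, sigma2 = median_run
    return C, sigma2, cost
```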

Table 5  Performance evaluation of NOx prediction using the median value given by each algorithm

Simulation      ACO             PSO             TLBO            ATLBO           Variant-1       Variant-2       TLSO
                MRE    rRMSE    MRE    rRMSE    MRE    rRMSE    MRE    rRMSE    MRE    rRMSE    MRE    rRMSE    MRE    rRMSE
Training data   0.0619 0.0780   0.0735 0.0904   0.0578 0.0733   0.0578 0.0733   0.0575 0.0730   0.0576 0.0732   0.0578 0.0733
Test data       0.0673 0.0853   0.0679 0.0875   0.0680 0.0856   0.0680 0.0856   0.0681 0.0855   0.0681 0.0856   0.0680 0.0856
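The exact error definitions are given earlier in the paper; purely for orientation, the sketch below computes the two criteria of Table 5 under common conventions, assuming MRE is the mean of |y − ŷ|/y and rRMSE is the root-mean-square error normalized by the mean of the measured values. If the paper's definitions differ, the formulas should be adjusted accordingly.

```python
import numpy as np

def mre(y_actual, y_pred):
    """Mean relative error, assuming MRE = mean(|y - yhat| / y)."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_actual - y_pred) / y_actual))

def rrmse(y_actual, y_pred):
    """Relative RMSE, assuming rRMSE = RMSE / mean(y)."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_actual - y_pred) ** 2))
    return float(rmse / np.mean(y_actual))
```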

Fig. 10 Predicted NOx with TLSO-LSSVM in comparison with measured NOx (a) training data (b) test data



Fig. 11 Residual plot (samples vs. residual) (a) residual plot of training data (b) residual plot of test data
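For readers who wish to reproduce predictions and residuals of the kind shown in Figures 10 and 11, the following is a minimal sketch of the standard LSSVM regression formulation with an RBF kernel, using the tuned values (C, σ2)=(1.5387, 12.8843) quoted above. It is not the authors' code; the kernel convention K(x, z)=exp(−‖x−z‖²/σ2) is an assumption, and the sketch only shows how the dual linear system in (α, b) could be solved and predictions formed.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma2):
    """K(x, z) = exp(-||x - z||^2 / sigma2)  (one common RBF convention)."""
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-d2 / sigma2)

def lssvm_fit(X, y, C=1.5387, sigma2=12.8843):
    """Solve the standard LSSVM regression dual system for (alpha, b)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma2)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                      # [0      1^T      ]
    A[1:, 0] = 1.0                      # [1   K + I / C   ]
    A[1:, 1:] = K + np.eye(n) / C
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    return alpha, b

def lssvm_predict(X_train, alpha, b, X_new, sigma2=12.8843):
    """yhat(x) = sum_i alpha_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma2) @ alpha + b

# Residuals of the kind plotted in Fig. 11:
# residuals = y_test - lssvm_predict(X_train, alpha, b, X_test)
```

Under this formulation, C plays the role of the punishment (regularization) parameter of the nomenclature and σ2 the RBF kernel width, which is consistent with the pair tuned by the optimization algorithms.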

Conclusion

In this work, a novel Teaching-Learning-Self-Study-Optimization (TLSO) algorithm has been proposed and compared with ACO, PSO, TLBO and ATLBO, and with two variants of TLSO, to examine the consistency of convergence to a global minimum using benchmark functions. TLSO introduces the concept of self-study of the learners, which, in combination with the teacher's greater effort on a bad learner, makes the algorithm balanced in terms of intensification and diversification. TLSO has been found consistent in converging to the global minimum in the benchmark-function experiments and also in the 30-run experiment for tuning the hyper-parameters of LSSVM for the prediction of NOx emissions from a tangentially fired pulverized coal boiler of a standard 500 MW power plant.

Moreover, the results also indicate that the average cost of the class moves faster towards the global minimum and converges in a smaller number of iterations in the case of TLSO compared with the other algorithms under consideration, except ATLBO. The superior performance of ATLBO in this regard may be because it intensifies TLBO; however, due to this over-intensification, the consistency of convergence is disturbed for some benchmark functions, especially multimodal functions, and for the considered NOx modeling application as well, whereas TLSO balances intensification and diversification by introducing the self-study concept. In other words, it can be said that TLSO combines the good attributes of TLBO and ATLBO. Furthermore, the NOx emissions predicted by LSSVM tuned with the proposed TLSO are in good accordance with the measured values of NOx from the boiler.

In summary, the study demonstrated the capability of the TLSO, in combination with an artificial intelligence technique, i.e., LSSVM, to model the complex and highly nonlinear relation of the NOx emissions to the operational parameters of the tangentially fired pulverized coal boiler. It also testifies that the novel TLSO-LSSVM model can yield promising results for NOx prediction from boilers of power plants.

Acknowledgement

This research work is sponsored by the Higher Education Commission (HEC), Govt. of Pakistan under the scholarship program titled: "HRDI-MS leading to PhD program of faculty development for UESTPs/UETs (Batch-II) Phase-I". We also would like to thank the staff of the Taean Power Plant, Taean, South Korea for their technical support.

Nomenclature

b = output bias
C = punishment on the data set beyond the pre-specified error tolerance
Dm = diversification factor (min–max search space) for m-th subject
e = error between actual and predicted output
f(Xi) = cost of objective function with marks Xi
fit = cost of objective function
I = identity matrix
iter = current iteration
K = radial basis kernel function
k = total number of learners (i=1, 2, ..., k)
M = mean of the marks obtained by learners in class
N = number of data points in x
Nt = number of data points in test or validation data set
r = random number
rand = uniformly distributed random number between 0 and 1
s = total number of subjects (m=1, 2, ..., s)
TF = teaching factor
w = weight vector
Xteacher = marks of the best learner
Xold = old marks of learners
Xnew = new marks of learners
XmU = maximum marks of m-th subject
XmL = minimum marks of m-th subject
Ypred = predicted output (NOx in this case)
Yactual = actual output (NOx in this case)
y = output data
x = input data

α = Lagrange multipliers
Γ = acceleration coefficient
Δ = another acceleration coefficient for variant-1
Φ = acceleration coefficient for variant-1 and variant-2
φ(x) = nonlinear transformation function
Ψ = another acceleration coefficient for variant-2
Ω = inertia weight



‹Subscripts›
i = i-th data point (LSSVM)
i = i-th learner (TLBO)
j = j-th data point (TLBO)
j = j-th data point (LSSVM)

