Keywords: Least Squares Support Vector Machines, NOx Prediction, Teaching-Learning-Based-Optimization, Ameliorated Teaching-Learning-Based-Optimization, Teaching-Learning-Self-Study-Optimization
This paper presents a novel Teaching-Learning-Self-Study-Optimization (TLSO) algorithm which is not only fast-converging in terms of the number of iterations, but also comparatively consistent in converging with high accuracy to the global minimum relative to several other algorithms. The original Teaching-Learning-Based Optimization (TLBO) gives a uniformly distributed, randomly selected weight to the amount of knowledge passed to a learner in each phase, i.e., the teacher phase and the learner phase. This randomly selected weight causes the algorithm to converge the average cost of the learners in a moderate number of iterations. In 2013, Li and coworkers intensified the teacher and learner phases by introducing weight parameters in order to improve the convergence speed in terms of iterations, and called the result Ameliorated Teaching-Learning-Based Optimization (ATLBO). The criterion of a good evolutionary optimization algorithm is consistency in converging the cost of the objective function; to achieve it, an algorithm should include intensification for local search as well as diversification for global search, so as to reduce the chance of becoming trapped in a local minimum. Some students naturally tend to study by themselves, by means of a library and internet academic resources, in order to enhance their knowledge. This phenomenon is termed self-study and is introduced into the proposed TLSO's learner phase as a diversification factor (DF). Several other evolutionary algorithms, namely ACO, PSO, TLBO, ATLBO and two variants of TLSO, are also implemented and compared with TLSO in terms of consistency in converging to the global minimum. Results reveal that TLSO was consistent not only for the highest number of the 20 benchmark functions, but also for the NOx prediction application. Results also show that the NOx emissions predicted by an LSSVM tuned with TLSO are comparable with those of the other algorithms considered in this work.
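To make the teacher, learner and self-study phases concrete, the following minimal Python sketch performs one TLSO-style iteration. The teacher- and learner-phase updates follow the standard TLBO rules from the literature; the self-study move is only schematic, modeled here as a random perturbation applied with a probability controlled by a diversification factor, since the exact DF formulation is defined in the body of the paper. All identifiers (`tlso_step`, `df`) are illustrative rather than taken from the paper.

```python
import numpy as np

def tlso_step(X, f, df=0.1, rng=np.random.default_rng()):
    """One illustrative TLSO-style iteration on a population X (n_learners x n_dims).

    f:  objective function mapping a 1-D vector to a scalar cost.
    df: diversification factor controlling the schematic self-study move.
    """
    n, d = X.shape
    cost = np.apply_along_axis(f, 1, X)

    # Teacher phase (standard TLBO): move every learner towards the teacher,
    # away from the class mean scaled by the teaching factor TF (1 or 2).
    teacher = X[np.argmin(cost)]
    mean = X.mean(axis=0)
    TF = rng.integers(1, 3)
    X_new = X + rng.random((n, d)) * (teacher - TF * mean)
    cost_new = np.apply_along_axis(f, 1, X_new)
    improved = cost_new < cost
    X[improved], cost[improved] = X_new[improved], cost_new[improved]

    # Learner phase: each learner i interacts with a random partner j and
    # moves towards the better of the two (standard TLBO rule).
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        direction = X[i] - X[j] if cost[i] < cost[j] else X[j] - X[i]
        xi = X[i] + rng.random(d) * direction
        # Self-study (schematic): with probability df a learner explores on
        # its own -- the diversification idea TLSO adds to the learner phase.
        if rng.random() < df:
            xi = xi + rng.normal(scale=X.std(axis=0) + 1e-12)
        ci = f(xi)
        if ci < cost[i]:
            X[i], cost[i] = xi, ci
    return X, cost
```

For example, `tlso_step(np.random.uniform(-5, 5, (25, 10)), lambda x: float(np.sum(x**2)))` advances a 25-learner class on the 10-dimensional Sphere function.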
parameters, such as ACO and PSO in this case, is that they need extensive experimentation for the selection of the algorithm-specific parameters for each objective function prior to actual optimization. These parameters have substantial effects on the performance of the algorithms, i.e., on convergence accuracy and convergence speed, so the selection of suitable algorithm-specific parameters is an essential task for such algorithms. Extensive experimentation has therefore been carried out to obtain the algorithm-specific parameters of ACO and PSO that give the best result for each benchmark function separately. In contrast, TLBO, ATLBO, variant-1, variant-2 and TLSO have the advantage of not having such algorithm-specific parameters, which saves a great deal of the time otherwise consumed in algorithm-specific parameter selection.

All of the experiments have been conducted on an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz with 8 GB RAM, running Windows 7 Ultimate.

3.2 Comparison with other algorithms
Table 2 (continued) Comparative results on the benchmark functions over 30 independent runs (SD: standard deviation; MI: mean iterations to converge the average cost of the class or population; Time: average time per run in seconds)

| Function | Metric | ACO | PSO | TLBO | ATLBO | variant-1 | variant-2 | TLSO |
|---|---|---|---|---|---|---|---|---|
| Step | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 4.2987e-07 | 0.0076 | 1.4948e-16 | 6.2621 | 5.9954 | 6.6278 | 0.1792 |
| | Worst | 0.0126 | 1.8693 | 1.1425e-10 | 7.5000 | 7.4844 | 7.4992 | 1.0741 |
| | Mean | 0.0020 | 0.7049 | 5.1737e-12 | 7.3546 | 6.9546 | 7.2777 | 0.5448 |
| | SD | 0.0031 | 0.3926 | 2.1416e-11 | 0.2792 | 0.4329 | 0.2093 | 0.2118 |
| | MI | — | — | 830.2222 | — | — | — | — |
| | Time | 1.6155 | 1.1484 | 5.6979 | 1.0729 | 1.1596 | 1.1027 | 1.0434 |
| Sphere | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 9.9745e-07 | 0 | 7.4626e-149 | 0 | 0 | 0 | 0 |
| | Worst | 0.0141 | 1.0002e+04 | 5.8388e-144 | 0 | 0 | 0 | 0 |
| | Mean | 0.0016 | 1.3343e+03 | 3.6166e-145 | 0 | 0 | 0 | 0 |
| | SD | 0.0032 | 3.4573e+03 | 1.0796e-144 | 0 | 0 | 0 | 0 |
| | MI | — | — | — | 527.5333 | 541.5000 | 532.9000 | 534.6667 |
| | Time | 1.6207 | 1.1412 | 5.6143 | 1.0965 | 1.1736 | 1.1224 | 1.0603 |
| Sum Squares | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 7.3885e-10 | 0 | 3.5427e-149 | 0 | 0 | 0 | 0 |
| | Worst | 9.5630e-04 | 1500 | 2.6366e-144 | 0 | 0 | 0 | 0 |
| | Mean | 1.4028e-04 | 186.6667 | 1.8251e-145 | 0 | 0 | 0 | 0 |
| | SD | 2.3041e-04 | 434.4901 | 6.1275e-145 | 0 | 0 | 0 | 0 |
| | MI | — | — | — | 524.9333 | 537.8333 | 529.1667 | 531.7333 |
| | Time | 1.6259 | 1.1413 | 5.6341 | 1.0915 | 1.1763 | 1.1128 | 1.0655 |
| Beale | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 2.5085e-04 | 0 | 0 | 4.4019e-06 | 5.9169e-06 | 1.1297e-05 | 0 |
| | Worst | 0.4458 | 0.7621 | 0 | 1.1225 | 0.0050 | 0.0072 | 6.1630e-32 |
| | Mean | 0.0659 | 0.0762 | 0 | 0.2788 | 8.7650e-04 | 0.0017 | 2.9788e-33 |
| | SD | 0.0815 | 0.2325 | 0 | 0.3492 | 9.7008e-04 | 0.0020 | 1.2179e-32 |
| | MI | — | 81.3333 | Not converged | — | — | — | 192.7500 |
| | Time | 0.7723 | 0.5238 | 4.5045 | 0.5775 | 0.5797 | 0.5531 | 0.5115 |
| Easom | Optimum | −1 | −1 | −1 | −1 | −1 | −1 | −1 |
| | Best | −0.9999 | −1 | −1 | −0.9999 | −0.9999 | −0.9999 | −1 |
| | Worst | −0.9996 | −1 | −1 | 0 | −0.0024 | −0.2231 | −1 |
| | Mean | −0.9999 | −1 | −1 | −0.3177 | −0.9071 | −0.8394 | −1 |
| | SD | 9.0771e-05 | 0 | 0 | 0.4606 | 0.2416 | 0.2158 | 0 |
| | MI | — | 168.4000 | 94.6333 | — | — | — | 138.6000 |
| | Time | 1.2857 | 1.0491 | 4.6495 | 0.9954 | 1.0612 | 0.9898 | 0.9362 |
| Matyas | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 8.3409e-15 | 0 | 1.3722e-273 | 0 | 0 | 0 | 0 |
| | Worst | 2.1366e-07 | 0 | 1.7080e-260 | 4.9407e-324 | 4.9407e-324 | 4.9407e-324 | 0 |
| | Mean | 1.7586e-08 | 0 | 6.7700e-262 | 0 | 0 | 0 | 0 |
| | SD | 4.5363e-08 | 0 | 0 | 0 | 0 | 0 | 0 |
| | MI | — | 417.7333 | — | 521.6538 | 526.7917 | 523.8148 | 392.8667 |
| | Time | 0.7367 | 0.5115 | 4.4447 | 0.5497 | 0.5472 | 0.5300 | 0.4874 |
| Zakharov | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 1.1189e-08 | 0 | 1.2474e-102 | 0 | 0 | 0 | 0 |
| | Worst | 0.0014 | 0.0765 | 1.3062e-96 | 0 | 0 | 0 | 0 |
| | Mean | 9.7569e-05 | 0.0077 | 5.9532e-98 | 0 | 0 | 0 | 0 |
| | SD | 2.7039e-04 | 0.0181 | 2.4845e-97 | 0 | 0 | 0 | 0 |
| | MI | — | — | — | 521.8000 | 533.9333 | 527.9667 | 528.1000 |
| | Time | 1.4539 | 1.1166 | 5.0447 | 1.0795 | 1.1386 | 1.0640 | 1.0142 |
| Powell | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 1.2187e-09 | 0 | 1.3436e-12 | 0 | 0 | 0 | 0 |
| | Worst | 6.3814e-04 | 93.3053 | 2.2869e-07 | 0 | 0 | 0 | 0 |
| | Mean | 5.6287e-05 | 14.8647 | 2.4341e-08 | 0 | 0 | 0 | 0 |
| | SD | 1.2657e-04 | 20.9282 | 5.8292e-08 | 0 | 0 | 0 | 0 |
| | MI | — | — | — | 525.5667 | 537.5667 | 530.4667 | 528.8000 |
| | Time | 1.6754 | 1.2795 | 5.6434 | 1.1937 | 1.2678 | 1.1992 | 1.1416 |
| Schwefel 2.22 | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 1.3446e-04 | 0 | 2.6461e-74 | 8.1947e-309 | 2.6589e-303 | 3.3894e-305 | 3.4250e-306 |
| | Worst | 0.0392 | 49.4945 | 1.1577e-72 | 1.1410e-304 | 1.3414e-300 | 1.9049e-302 | 2.9975e-305 |
| | Mean | 0.0105 | 12.4796 | 2.7073e-73 | 1.9068e-305 | 8.6671e-302 | 1.3074e-303 | 1.0911e-305 |
| | SD | 0.0103 | 11.7663 | 2.7884e-73 | 0 | 0 | 0 | 0 |
| | MI | — | Not converged | — | — | — | — | — |
| | Time | 1.6403 | 1.1661 | 5.8249 | 1.0787 | 1.1675 | 1.1074 | 1.0755 |
| Schwefel 1.2 | Optimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Best | 2.0244e-15 | 0 | 5.9502e-293 | 0 | 0 | 0 | 0 |
| | Worst | 0.0023 | 800000000 | 2.8513e-287 | 0 | 0 | 0 | 0 |
| | Mean | 1.1665e-04 | 60000000 | 2.2931e-288 | 0 | 0 | 0 | 0 |
| | SD | 4.2449e-04 | 1.8864e+08 | 0 | 0 | 0 | 0 | 0 |
| | MI | — | 403 | — | 261.7667 | 274.2667 | 266.6000 | 269.3667 |
| | Time | 1.9412 | 1.4797 | 6.2953 | 1.4537 | 1.5217 | 1.4894 | 1.4039 |
Table 3 Number of benchmark functions (out of 20) for which each algorithm was found the best

| Criterion | ACO | PSO | TLBO | ATLBO | variant-1 | variant-2 | TLSO |
|---|---|---|---|---|---|---|---|
| Mean solution | 1 | 8 | 12 | 8 | 10 | 10 | 17 |
| Standard deviation (if the mean solution is among the best) | 1 | 7 | 11 | 8 | 10 | 10 | 13 |
| Mean iterations to converge the average cost of the class (if the solution is among the best) | 0 | 4 | 2 | 9 | 0 | 0 | 4 |
In this subsection, the results of TLSO with 25 learners (population size) and 50,000 function evaluations are presented and compared with the results obtained using the other algorithms, in terms of consistency in converging the cost to a global minimum, mean iterations to converge the average cost of the class or population (MI), and average time in seconds per run. Table 2 shows the results in the form of the best solution among 30 independent runs, worst solution, mean solution, standard deviation, MI and average time consumed per run. The mean solution and standard deviation represent the consistency of an algorithm (for the specific function) in converging to a global minimum. The MI represents the speed with which the average of the marks or positions of an algorithm's learners or population moves towards the teacher (best ant or best particle) during the iterations. In this experiment, the MI over 30 independent runs has been computed for the algorithms giving the best solution for each function.

It can be observed from Table 2 that, in this experiment, the performances of PSO, TLBO, ATLBO, variant-1, variant-2 and TLSO were the same for Bohachevsky 2 and Griewank, and they outperformed ACO in terms of mean solution and standard deviation. PSO, TLBO, variant-1, variant-2 and TLSO were found to perform identically and outperformed ACO and ATLBO for Bohachevsky 3. For Matyas, PSO, ATLBO, variant-1, variant-2 and TLSO performed in similar fashion and outperformed ACO and TLBO. For Sphere, Sum Squares, Zakharov, Powell and Schwefel 1.2, the performances of ATLBO, variant-1, variant-2 and TLSO were found alike and better than those of the other considered algorithms. Results obtained from TLBO, variant-1, variant-2 and TLSO were the same for Bohachevsky 1, and these algorithms outperformed the others. For Schaffer, only PSO, TLBO, variant-1 and TLSO converged the cost to a global minimum, but ACO outperformed all algorithms in terms of mean solution and standard deviation. In the case of Easom and 6 Hump Camel Back, PSO, TLBO and TLSO performed better than the other algorithms; for 6 Hump Camel Back, the standard deviation of PSO was better than those of TLBO and TLSO. For Booth and Shubert, the best-performing algorithms were PSO, TLBO and TLSO; in addition, for Shubert, TLBO outperformed PSO and TLSO in terms of standard deviation. TLBO and TLSO achieved the same results for Michalewicz 2 except for the standard deviation, which was better in the case of TLBO. TLBO gave the best results compared to the other algorithms for the Step and Beale functions. For Schwefel 2.22, TLSO outperformed the others. TLBO and TLSO outperformed for Goldstein Price, but in terms of standard deviation, TLBO was found better than TLSO.

Table 3 summarizes the results presented in Table 2 and shows the number of benchmark functions for which each considered algorithm was found the best according to the mean solution and standard deviation (if the mean solution is among the best). Even if an algorithm is able to converge to a global minimum and converges the average cost faster for a range of functions, its consistency in converging to a global minimum for a function remains the essential criterion for performance evaluation. In this experiment, TLSO outperformed the other considered algorithms in terms of consistency in converging to a global minimum, giving better results for 17 of the 20 benchmark functions in terms of mean solution and for 13 in terms of standard deviation. However, as expected, in terms of MI, ATLBO was found better than the other algorithms considered: it converges the average cost of the class quicker than the other algorithms for 9 functions (Table 3). The reason for ATLBO's better performance in this regard may be the inclusion of the inertia weight and acceleration coefficients in the teacher and learner phases of TLBO, respectively. These coefficients help accelerate the movement of the average cost of the class towards the global minimum, causing faster convergence of the average cost of the class. On the other hand, it may be inferred that the compelling pressure, of the teacher on the learners and of learners on other learners, exerted through these acceleration coefficients may lead learners to a local minimum, especially in the case of multimodal functions, during some of the independent runs. It is evident in Table 2 that ATLBO's consistency in converging to a global minimum is deficient for multimodal functions compared to unimodal functions. TLSO, by contrast, is balanced in its performance across unimodal and multimodal functions. The self-study concept in TLSO helps diversify the search process in the search space, so that learners can spread over and cover the whole search space instead of blindly following the teacher or the other learners and converging at a local minimum. However, this consistency comes at a cost: the MI of TLSO is slightly higher than that of ATLBO.

As far as the elapsed time per run is concerned, TLSO outperformed TLBO and was found comparable (the difference being less than a second) to the other considered algorithms (Table 2).
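The MI statistic used in Table 2 and Table 3 can be computed directly from the per-iteration average-cost histories of the runs. The sketch below is one plausible implementation, under the assumption that a run counts as converged at the first iteration where its average cost comes within a tolerance of the known optimum; the tolerance value and function name are illustrative, not taken from the paper.

```python
import numpy as np

def mean_iterations_to_converge(avg_cost_histories, optimum, tol=1e-8):
    """MI: mean over runs of the first iteration at which the average cost
    of the class/population reaches the known optimum (within tol).

    avg_cost_histories: list of 1-D arrays, one per independent run.
    Returns the mean first-hitting iteration over converged runs,
    or None when no run converged ("Not converged" in Table 2).
    """
    hits = []
    for history in avg_cost_histories:
        within = np.abs(np.asarray(history) - optimum) <= tol
        if within.any():
            # np.argmax on a boolean array gives the first True index.
            hits.append(int(np.argmax(within)) + 1)  # 1-based iteration
    return float(np.mean(hits)) if hits else None
```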
$$\mathrm{relative\ RMSE}=\sqrt{\frac{1}{N_t}\sum_{i=1}^{N_t}\left(\frac{Y_{i,\mathrm{actual}}-Y_{i,\mathrm{pred}}}{Y_{i,\mathrm{actual}}}\right)^{2}} \tag{27}$$

$$\mathrm{MRE}=\frac{1}{N_t}\sum_{i=1}^{N_t}\left|\frac{Y_{i,\mathrm{actual}}-Y_{i,\mathrm{pred}}}{Y_{i,\mathrm{actual}}}\right| \tag{28}$$
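For quick reference, the two metrics of Eqs. (27) and (28) can be computed as follows. This is a direct transcription of the formulas above; the square root in Eq. (27) and the absolute value in Eq. (28) are reconstructed from the standard definitions of relative RMSE and mean relative error, since the extracted equations lost those symbols.

```python
import numpy as np

def relative_rmse(y_actual, y_pred):
    """Eq. (27): root mean square of the relative prediction errors."""
    r = (np.asarray(y_actual) - np.asarray(y_pred)) / np.asarray(y_actual)
    return float(np.sqrt(np.mean(r ** 2)))

def mre(y_actual, y_pred):
    """Eq. (28): mean absolute relative prediction error."""
    r = (np.asarray(y_actual) - np.asarray(y_pred)) / np.asarray(y_actual)
    return float(np.mean(np.abs(r)))
```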
4.4 Results and discussion

After careful processing of the operational data, the NOx emissions from a tangentially fired pulverized-coal boiler were modeled using LSSVM. For a well-tuned LSSVM model, the hyper-parameters of LSSVM need to be optimized for better predictions. The hyper-parameters of LSSVM have therefore been optimized by employing the various algorithms, and their convergence performances have been investigated in terms of consistency in converging the cost of the 5-fold RMSECV of the NOx emissions under consideration to a global minimum, and in terms of MI. Each algorithm employed for this application has been run 30 times. For a fair comparison, ACO and PSO have been iterated double the number of times of the other algorithms; accordingly, the even-indexed values of the cost and average cost of ACO and PSO have been taken to draw the graphs, so as to be consistent with the graphs of the other algorithms.

The performance of the 30 runs of ACO is shown in Figure 3. The cost and average cost of the objective function (5-fold RMSECV) are plotted against the number of iterations in Figures 3(a) and (c). The average cost of the objective function has been plotted for all algorithms to see whether it converges to the global minimum and to examine how rapidly the learners of a class, particles of a swarm or ants of a colony move with each iteration. The converged cost and converged average cost are plotted against the 30 independent runs in Figures 3(b) and (d), respectively, to magnify the results. The minimum cost (i.e., the global minimum) of the 5-fold RMSECV of the NOx data under consideration using LSSVM was 20.4805. Figure 3(a) shows the convergence of the cost of the objective function using ACO. It is clear from the figure that ACO has not converged to the global minimum in any of the 30 runs. In fact, it is difficult to see from Figure 3(a) that some runs which appear to have converged to the global minimum have actually not converged to it, but have only approached near it. Therefore, in order to differentiate, we have classified the diversions as near-global-minimum diversions and local-minimum diversions. In Figure 3(b), it is clear that ACO has not converged the cost to the global minimum in any of the 30 runs: ACO reached near the global minimum in 7 runs, and the remaining 23 runs diverged to a local minimum (Table 4). As far as the average cost is concerned, ACO diverged it to a local minimum in all 30 runs (Figure 3(d)). This demonstrates that, during the execution of the iterations of the algorithm (for the NOx data under consideration), the average position of the ants did not approach the global minimum in any run. ACO converged the cost to near the global minimum in 7 runs because the best ant of the iteration approached near the global minimum. The number of runs converging to near the global minimum is not fixed and, owing to the stochastic nature of the process, may change. The important point to be noted here is that the cost has not converged to the global minimum in any of the 30 runs.
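For readers who want to reproduce the shape of this objective, the following is a minimal sketch of a 5-fold RMSECV cost as a function of two hyper-parameters. It uses scikit-learn's KernelRidge with an RBF kernel as a convenient stand-in for LSSVM (the two models share the same regularized least-squares core); this substitution, like the parameter names `reg` and `gamma`, is an assumption for illustration, not the paper's implementation.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import KFold

def rmsecv_5fold(params, X, y):
    """5-fold cross-validated RMSE of an RBF kernel ridge model.

    params: (reg, gamma) -- regularization strength and RBF kernel width,
    the two hyper-parameters an optimizer such as TLSO would tune.
    X, y: NumPy arrays of inputs and measured NOx values.
    """
    reg, gamma = params
    errors = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True,
                                     random_state=0).split(X):
        model = KernelRidge(alpha=reg, kernel="rbf", gamma=gamma)
        model.fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx])
        errors.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(errors))
```

An evolutionary optimizer then minimizes `rmsecv_5fold` over (reg, gamma); it is this kind of cost surface, with its local minima, that the following run-by-run discussion examines.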
Fig. 4 PSO: Convergence consistency of (a) cost of the objective function against number of iterations, (b) converged cost at each run, (c) average cost of the objective function against number of iterations, (d) converged average cost at each run
Similarly, graphs were plotted to investigate the consistency performance of PSO. It can be seen in Figures 4(a) and (b) that not all the runs of PSO converged the cost to near the global minimum, and none converged to the global minimum (the dotted line represents the global minimum). PSO diverged the cost to a local minimum 29 times and to near the global minimum 1 time out of the 30 runs (Table 4). It may be said that the 5-fold RMSECV objective function of the NOx model using LSSVM is, in this case, a multimodal function, and for this reason some of the runs of PSO diverted completely to a local minimum. Figures 4(c) and (d) show that the average cost of PSO diverged to a local minimum and never converged to the global minimum, nor even to near the global minimum, in the 30-run experiment. As with ACO, it can be said for PSO (for the NOx data under consideration) that the average position of the swarm did not converge; only the best particle of the iteration did. PSO was thus also found inconsistent in converging the cost of the 5-fold RMSECV of the NOx data under consideration.

The performance of TLBO for the NOx data under consideration is presented in Figure 5. The cost of the objective function using TLBO not only converged to the global minimum at every run (Figures 5(a) and (b)), but the average cost of the class also converged to the global minimum at every run (Figures 5(c) and (d)). In other words, in all 30 runs, all the learners approached the global minimum. The cost and average cost curves of TLBO approached the global minimum in 18.6667 and 40.0333 mean iterations, respectively, averaged over the 30 runs (as all 30 runs converged to the global minimum) (Table 4).

The ATLBO graphs shown in Figures 6(a) and (b) depict that ATLBO diverged the cost in some of the runs. Table 4 tabulates that the cost diverged to a local minimum in 2 runs, diverged to near the global minimum in 23 runs, and converged to the global minimum in 5 runs. The mean iterations to converge the cost and average cost were computed to be 9.6 and 12.4 iterations, which indicates quite fast convergence of the cost and average cost of the class for ATLBO in terms of iterations. Figure 6(c) illustrates the rapid movement of the average cost towards the global minimum compared to TLBO (Figure 5(c)). This rapid movement is attributable to the inclusion of the inertia weight, acceleration coefficient and elitist strategy in ATLBO.
Fig. 6 ATLBO: Convergence consistency of (a) cost of the objective function against number of iterations, (b) converged cost at each run, (c) average cost of the objective function against number of iterations, (d) converged average cost at each run
On the other hand, as the cost using ATLBO for the considered NOx data did not converge in all 30 runs, the average cost likewise did not converge in all 30 runs (Figures 6(c) and (d)). ATLBO differs from TLBO by the introduced inertia weight and acceleration coefficients in both the teacher and learner phases. These coefficients intensify the unidirectional search and compel the average of the cost towards the minimum without taking diversification into account. As a result, although the runs of ATLBO moved the average cost quickly towards the global minimum, they could not avoid divergences from the global minimum in some runs. Keeping in mind that the runs of TLBO avoided trapping in a local minimum and converged steadily to the global minimum, it is quite reasonable to infer that the compelling pressure of intensification, without the inclusion of diversification, in ATLBO is responsible for some runs merely approaching near the global minimum without converging to it, and/or completely diverting to a local minimum (for the NOx data under consideration, which has local minima).

Figure 7 represents the performance of variant-1. As shown in Figures 7(a) and (b), variant-1 successfully avoided the cost of the objective function being trapped in a local minimum 25 times in total and converged to the global minimum 11 times, outperforming ATLBO in this regard. This may be the result of the incorporation of the self-study concept of learners (diversification); on the other side, variant-1 diverted the cost to near the global minimum 14 times and to a local minimum 5 times in the 30-run experiment, as shown in Figure 7(b) and Table 4. The mean iterations to converge the cost and average cost were computed as 13.4545 and 16.2727, which are higher than those of ATLBO. Compared to TLBO, the mean iterations to converge the cost and average cost of variant-1 are lower, indicating that variant-1 moves the average cost towards near the global minimum quicker than TLBO. Despite this capability, variant-1 did not converge the cost to the global minimum in all 30 runs as TLBO did. The reason for this may be over-intensification due to the inclusion of the acceleration coefficients in the learner phase. The average cost using variant-1 did not converge to the global minimum in all 30 runs (Figures 7(c) and (d)).

Variant-2 successfully prevented the cost of the objective function from being stuck in a local minimum 25 times and converged the cost to the global minimum 14 times, thus outperforming ATLBO and variant-1 in this regard (Figures 8(a) and (b)). The cost of variant-2 diverted to near the global minimum 11 times (Table 4). The mean iterations to converge the cost and average cost were computed to be 12 and 15.1429. Figures 8(c) and (d) show that the average cost did not converge to the global minimum in all 30 runs.

The performance of the proposed TLSO is illustrated in Figure 9. It is clear from Figures 9(a) and (b) that TLSO has not only been successful in avoiding local minima, but has also avoided divergence to near the global minimum: TLSO converged the cost to the global minimum in all 30 runs. As far as the consistency of runs in converging the average cost to the global minimum is concerned, TLSO escaped trapping both near the global minimum and in local minima, and converged to the global minimum in all 30 runs (Figure 9(d)). In terms of consistency in converging the cost and average cost to the global minimum, TLSO outperformed ACO, PSO, ATLBO, variant-1 and variant-2, and is comparable to TLBO for the objective function of the application considered in this study. The average cost curves of TLSO approached the global minimum in 16 mean iterations averaged over the 30 runs.
Fig. 9 Proposed TLSO: Convergence consistency of (a) cost of the objective function against number of iterations, (b) converged cost at each run, (c) average cost of the objective function against number of iterations, (d) converged average cost at each run
The skewness of the curves of average cost against iterations, and the comparison of mean iterations to converge the cost, demonstrate that the movement of the average cost towards the global minimum for TLSO is faster than for TLBO, variant-1 and variant-2, but slightly slower than for ATLBO (Figures 3(c)–9(c) and Table 4). This is a characteristic of all evolutionary algorithms, in that they are successful in converging the cost of some objective functions and unsuccessful in the case of some other objective functions. Furthermore, if they converge the
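The per-run bookkeeping behind Table 4 (converged, near-global-minimum diversion, local-minimum diversion) can be expressed as a simple classification of each run's final cost. The thresholds below are illustrative placeholders; the paper does not state the numeric cut-off it used to separate "near global minimum" from "local minimum".

```python
import numpy as np

def classify_runs(final_costs, global_min, tol=1e-6, near_band=0.05):
    """Count runs as converged / near-global / local, as in Table 4.

    tol: tolerance for counting a run as converged to the global minimum.
    near_band: relative band above the global minimum treated as "near"
    (an assumed placeholder, not the paper's threshold).
    """
    gaps = np.asarray(final_costs) - global_min
    converged = int(np.sum(gaps <= tol))
    near = int(np.sum((gaps > tol) & (gaps <= near_band * abs(global_min))))
    local = len(gaps) - converged - near
    return {"converged": converged,
            "near global minimum": near,
            "local minimum": local}
```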
Table 5 Performance evaluation of NOx prediction using the median value given by each algorithm (entries: MRE, Eq. (28) / relative RMSE, Eq. (27))

| | ACO | PSO | TLBO | ATLBO | variant-1 | variant-2 | TLSO |
|---|---|---|---|---|---|---|---|
| Training data | 0.0619 / 0.0780 | 0.0735 / 0.0904 | 0.0578 / 0.0733 | 0.0578 / 0.0733 | 0.0575 / 0.0730 | 0.0576 / 0.0732 | 0.0578 / 0.0733 |
| Test data | 0.0673 / 0.0853 | 0.0679 / 0.0875 | 0.0680 / 0.0856 | 0.0680 / 0.0856 | 0.0681 / 0.0855 | 0.0681 / 0.0856 | 0.0680 / 0.0856 |
Fig. 10 Predicted NOx with TLSO-LSSVM in comparison with measured NOx: (a) training data, (b) test data