
International Journal of Control, Automation, and Systems (2013) 11(1):159-166

DOI 10.1007/s12555-011-0099-1

ISSN:1598-6446 eISSN:2005-4092
http://www.springer.com/12555

Increasing the Accuracy of Incremental Naive Bayes Classifier Using Instance Based Learning
Sotiris Kotsiantis
Abstract: Along with the increase of data and information, incremental learning ability becomes more and more important for machine learning approaches. Online algorithms process instances one at a time and try to discard irrelevant information, instead of synthesizing all available information as classic batch learning algorithms do. In this study, we attempted to increase the prediction accuracy of an incremental version of the Naive Bayes model by integrating instance based learning. We performed a large-scale comparison of the proposed method with other state-of-the-art algorithms on several datasets, and the proposed method produced better accuracy in most cases.
Keywords: Concept drift, incremental machine learning, online learning.

1. INTRODUCTION
In large datasets, thousands of measurements are collected every day, so the amount of information stored in databases is massive and continuously growing. Therefore, the knowledge extracted from these databases needs to be continuously updated; otherwise it can become outdated. The main problem of using traditional (non-incremental) learning algorithms to extract knowledge from huge and continuously growing databases is the high computational effort required.
Incremental learning ability is vital for large datasets for two reasons. Firstly, it is almost impossible to collect all useful training instances before the trained system is put into use. Secondly, modifying a trained system may be cheaper in time than building a new system from scratch.
Naive Bayes (NB) [5] classifier is the simplest form of
Bayesian network [15] since it captures the assumption
that every attribute is independent of all other attributes, given the class. The naive Bayes algorithm is traditionally used in batch mode, meaning that the algorithm does not perform the majority of its computations after observing each training instance, but rather accumulates certain information on all of the training instances and then performs the final computations on the batch of instances [5]. However, there is nothing inherent in the algorithm that prevents one from using it to learn incrementally. As an example, consider how the incremental naive Bayes algorithm can work, assuming that it makes one pass through the training set. In step #1, it initializes all of the counts and totals to 0 and then goes through the training instances, one at a time.
Manuscript received August 16, 2011; revised March 2, 2012
and September 6, 2012; accepted November 1, 2012. Recommended by Editorial Board member Yuan Fang Zheng under the
direction of Editor Myotaeg Lim.
Sotiris Kotsiantis is with the Educational Software Development Laboratory of the Department of Mathematics, University of
Patras, Rio 26504, Greece (e-mail: kotsiantis@upatras.gr).
© ICROS, KIEE and Springer 2013

For each training instance, it is given the feature vector x and the value of the class label for that instance. The algorithm goes through the feature vector and increments the correct counts. In step #2, these counts and totals are converted to probabilities by dividing each count by the number of training instances in the same class. The final step (#3) computes the prior probabilities p(k) as the fraction of all training instances that are in class k.
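To make the counting scheme just described concrete, the following Python sketch (an illustration, not code from the paper; it assumes purely categorical features and uses hypothetical names) keeps the per-class and per-feature-value counts up to date after every instance and turns them into relative frequencies at prediction time:

    from collections import defaultdict

    class IncrementalNB:
        """Minimal incremental Naive Bayes for categorical features (illustrative sketch)."""

        def __init__(self):
            self.class_counts = defaultdict(int)   # count of instances per class
            self.feat_counts = defaultdict(int)    # count of (feature index, value, class)
            self.n = 0                             # total instances seen so far

        def update(self, x, y):
            """Counting pass of step #1: consume one instance and increment the proper counts."""
            self.n += 1
            self.class_counts[y] += 1
            for i, value in enumerate(x):
                self.feat_counts[(i, value, y)] += 1

        def predict(self, x):
            """Steps #2 and #3, performed lazily: score each class k by p(k) * prod_i P(f_i | k)."""
            best_class, best_score = None, -1.0
            for k, nk in self.class_counts.items():
                score = nk / self.n                # prior p(k)
                for i, value in enumerate(x):
                    score *= self.feat_counts[(i, value, k)] / nk  # P(f | k), zero if unseen
                if score > best_score:
                    best_class, best_score = k, score
            return best_class

Because the model only stores counts, updating it with a new instance costs constant time per feature, which is what makes the one-pass, incremental use of NB straightforward.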
For many real-world learning tasks where data is collected over an extended period of time, the underlying distribution is likely to change. This kind of phenomenon is known as concept drift. Research on concept drift shows that lazy learning algorithms are among the most effective models [7]. In this study, we attempted to increase the prediction accuracy of an incremental version of the Naive Bayes model by integrating instance based learning. We performed a large-scale comparison of the proposed method with other state-of-the-art algorithms on several datasets and obtained better accuracy in most cases.
Section 2 introduces some basic themes about online learning, while Section 3 discusses the proposed method. Experimental results and comparisons of the proposed method with other learning algorithms on several datasets are presented in Section 4. Finally, we conclude in Section 5 with a summary and additional research topics.
2. ONLINE LEARNING
In an online setting, the algorithm continually
modifies its hypothesis as it is being used; it repeatedly
receives a new pattern, predicts its value and possibly
updates its hypothesis accordingly. Online learning is important for many applications, such as intelligent user interfaces, computer security, and market-basket analysis. For example, customer preferences change as new products and services become available. Desirable features for incremental learning systems are that they should: a) require a small amount of time per record, b) be able to build a model using one scan of the data, c) use only a fixed amount of memory, and d) make a usable model available at any time.
Researchers have developed online algorithms for learning traditional machine learning models such as decision trees [26]. Given an existing decision tree and a new instance, this algorithm adds the instance to the example set at the appropriate non-terminal and leaf nodes and then verifies that all the features at the non-terminal nodes and the class at the leaf node are still the best. Neural networks can be learned online by simply making one pass through the data; however, there would obviously be some loss associated with only making one pass through the data [21]. A known disadvantage of all these algorithms is that it is very difficult to perform learning with several instances at once. In order to solve this problem, some algorithms rely on windowing techniques [28], which consist of storing the last k examples and performing a learning process each time a new instance is encountered.
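As a rough sketch of such a windowing scheme (an illustration, not the method of [28]; the window size k and the build_model callable are placeholders), one can keep only the last k examples in a bounded buffer and relearn whenever a new instance arrives:

    from collections import deque

    class WindowedLearner:
        """Keeps the last k examples and retrains a batch learner on them after every update."""

        def __init__(self, build_model, k=200):
            self.build_model = build_model    # placeholder: list of (x, y) pairs -> trained model
            self.window = deque(maxlen=k)     # oldest examples are dropped automatically
            self.model = None

        def update(self, x, y):
            self.window.append((x, y))
            self.model = self.build_model(list(self.window))  # relearn on the current window

        def predict(self, x):
            return None if self.model is None else self.model.predict(x)

The buffer bounds memory use, but relearning on every arrival is exactly the cost that truly incremental algorithms try to avoid.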
The Weighted Majority (WM) algorithm [17] is the basis of many online algorithms. WM maintains a weight vector for a set of classifiers, and predicts the outcome using a weighted majority vote among the classifiers. WM learns this weight vector online by punishing erroneous classifiers. A number of similar algorithms have been developed, such as [1]. It must be mentioned that WM is a binary algorithm.
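A minimal sketch of the WM prediction and update steps for binary outcomes is given below (an illustration; the penalty factor beta = 0.5 is the usual halving choice and not a value taken from [17]):

    def wm_predict(weights, votes):
        """Weighted majority vote over binary classifier predictions (0 or 1)."""
        pro = sum(w for w, v in zip(weights, votes) if v == 1)
        con = sum(w for w, v in zip(weights, votes) if v == 0)
        return 1 if pro >= con else 0

    def wm_update(weights, votes, true_label, beta=0.5):
        """Punish every classifier that voted wrongly by shrinking its weight."""
        return [w * beta if v != true_label else w for w, v in zip(weights, votes)]

    # weights start at 1.0 for every classifier and only shrink on mistakes
    weights = [1.0, 1.0, 1.0]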
Voted-perceptron [10] stores more information during
training and then uses this elaborate information to
generate better predictions on the test instance. The
information it maintains during training is the list of all
prediction vectors that were generated after each mistake.
For each such vector, the algorithm counts the number of
iterations the vector survives until the next error. This
count is the weight of the prediction vector. Voted-perceptron computes the binary prediction of each of the prediction vectors and combines all these predictions by a weighted majority vote. The weights used are the survival times described above. Good prediction vectors are likely to survive for a long time and thus have larger weight in the majority vote. It must be mentioned that Voted-perceptron is also a binary algorithm.
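The bookkeeping described above can be sketched as follows (a simplified illustration for labels in {-1, +1}, not the exact formulation of [10]): every time the current prediction vector makes a mistake it is stored together with its survival count, and prediction is a survival-weighted vote over all stored vectors.

    def voted_perceptron_train(examples, epochs=1):
        """Return a list of (prediction vector, survival count) pairs; labels must be -1 or +1."""
        dim = len(examples[0][0])
        w, vectors, count = [0.0] * dim, [], 1
        for _ in range(epochs):
            for x, y in examples:
                pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1
                if pred == y:
                    count += 1                                   # current vector survives one more round
                else:
                    vectors.append((list(w), count))             # store the vector and its survival time
                    w = [wi + y * xi for wi, xi in zip(w, x)]    # perceptron update after the mistake
                    count = 1
        vectors.append((list(w), count))
        return vectors

    def voted_perceptron_predict(vectors, x):
        """Weighted majority vote of all stored vectors, weighted by their survival times."""
        total = sum(c * (1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1)
                    for w, c in vectors)
        return 1 if total >= 0 else -1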
Research on concept drift shows that lazy learning
algorithms are among the most effective models [7]. The
3-nearest neighbors algorithm (3-NN) is a method for
classifying instances based on closest training examples
in the feature space [29]. An instance is classified by a
majority vote of its neighbors, with the object being
assigned to the class most common amongst its 3 nearest
neighbors. K-star is another instance-based learner that differs from other instance-based learners in that it utilizes an entropy-based distance function [3].
Non-Nested Generalised Exemplars [23] (NNGE)
expands on nearest neighbour by introducing generalised
exemplars. Generalised exemplars are a bounded group
of examples that share the same concept and are close in
proximity within an n-dimensional problem space, where
n is the number of features in each example. The
bounding of groups is implemented by axis-parallel n-dimensional rectangles, or hyperrectangles. Hyperrectangles represent each generalisation by an exemplar where each feature value is replaced by either a range of values for a continuous-valued domain or a list of possible values for a discrete-valued domain [23]. This enables hyperrectangles to represent a broader rule than many single examples.
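For intuition, such a hyperrectangle can be stored as one bound per feature, either a numeric interval or a set of admissible nominal values, and membership testing is a feature-wise check (an illustrative sketch, not the NNGE implementation of [23]):

    def inside_hyperrectangle(example, rectangle):
        """rectangle[i] is (low, high) for a continuous feature or a set of values for a discrete one."""
        for value, bound in zip(example, rectangle):
            if isinstance(bound, tuple):          # continuous-valued domain: range check
                low, high = bound
                if not (low <= value <= high):
                    return False
            elif value not in bound:              # discrete-valued domain: membership check
                return False
        return True

    # example: a rectangle covering the interval 4.0-6.0 and the nominal values {"red", "blue"}
    print(inside_hyperrectangle([5.1, "red"], [(4.0, 6.0), {"red", "blue"}]))   # True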
AODE (Averaged One-Dependence Estimators) classifier [27] is considered an improvement on NB. Sahami
[22] introduced the notion of k-dependence estimators,
through which the probability of each feature value is
conditioned by the class and, at most, k other features.
AODE uses 1-dependence estimators (ODEs), specifically SPODEs (SuperParent One-Dependence Estimators), in which every feature depends on the class and on another shared feature, designated as the superparent. AODE weakens the feature independence assumption by averaging all models from a restricted class of one-dependence learners.
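For orientation, the averaged estimate usually given for AODE has the following form, up to normalization (stated here from the general description above; see [27] for the exact estimator, including the minimum-frequency condition imposed on candidate superparents):

    \hat{P}(y \mid x_1, \dots, x_n) \;\propto\; \frac{1}{|S|} \sum_{i \in S} P(y, x_i) \prod_{j=1}^{n} P(x_j \mid y, x_i)

where S denotes the set of features whose observed value occurs often enough in the training data to act as a superparent, and the probabilities are estimated from frequency counts.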
In an online environment, it is less clear how to apply ensemble methods directly [25]. For instance, with bagging, when one new example arrives that is misclassified, it is too inefficient to resample the available data and learn new classifiers [13]. One solution is to rely on the user to specify the number of examples from the input stream for each base learner [8,19], but this approach supposes one knows a great deal about the structure of the data stream. There are also online boosting algorithms that reweight classifiers [6,19], but these presume a fixed number of classifiers. Additionally, online boosting is likely to suffer a large loss initially, when the base models have been trained with very few instances, and the algorithm may never be able to recover [2,11].
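For example, the online bagging scheme of Oza and Russell [19] replaces sampling with replacement by showing each incoming example to every base model k times, with k drawn from a Poisson(1) distribution; a rough sketch of that idea (an illustration, assuming base models that support an incremental update call) is:

    import math
    import random

    def poisson_one():
        """Draw k ~ Poisson(lambda = 1) using Knuth's multiplication method."""
        limit, k, p = math.exp(-1.0), 0, 1.0
        while True:
            p *= random.random()
            if p <= limit:
                return k
            k += 1

    def online_bagging_update(base_models, x, y):
        """Present the new example to each base model k ~ Poisson(1) times."""
        for model in base_models:
            for _ in range(poisson_one()):
                model.update(x, y)   # assumed incremental update method on each base model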
3. PROPOSED METHOD
Univariate approaches assume the independence of features, whereas multivariate approaches take the relations between features into account. While it is good practice to start modeling with simple univariate models, the problem should also be investigated using local models. The proposed model simply trains a Naive Bayes classifier during the training process. For this reason, the training time of the model is that of simple Naive Bayes. During the classification of a test instance, the model calculates the probability of each class; if the probability of the most probable class is at least two times the probability of the next most probable class, then the decision is that of the NB model. However, if NB is not so sure, i.e., the probability of the most probable class is less than two times the probability of the next most probable class, the model finds the 3 nearest neighbors using the selected distance metric. In this case the model averages the probabilities of NB with those of the 3NN classifier for the classification of the test instance. It must be mentioned that the 3NN classifier is only used for a small number of test instances, and for this reason classification time is not a big problem for our model. The proposed ensemble is described by the pseudo-code in Fig. 1.


Training:
1. Initialize all the counts of feature values and totals to 0 and then go through the training examples, one at a time.
2. For each training example, the algorithm is given the feature vector x and the value of the class. It goes through the feature vector and, according to the feature values, increments the proper feature value counts.
3. These counts and totals are converted to probabilities by dividing each count by the number of training examples in the same class c, i.e., P(f|c): the probability of each feature value f in each class c.
4. Compute the prior probabilities P(c) as the fraction of all training examples that are in class c.
Classification:
1. Obtain the test instance.
2. Calculate the probability of the test instance belonging to each class of the dataset, i.e., take P(c) times the calculated probability P(f|c) of each feature value f of the test instance in class c.
3. If the probability of the most probable class is at least two times the probability of the next most probable class, then the decision is that of the NB model; else
   a. Find the k (=3) nearest neighbors using the selected distance metric (Manhattan in our implementation).
   b. Aggregate the decisions of NB with the 3NN classifier by averaging the probabilities for the classification of the test instance.
   c. The class with the highest probability is the final decision.

Fig. 1. The proposed algorithm.
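The classification part of Fig. 1 can be rendered in Python roughly as follows (an illustrative sketch, not the author's implementation; it assumes numeric features, a helper nb_probabilities(x) returning the NB class distribution, and access to the stored training instances):

    def manhattan(a, b):
        """Manhattan distance between two numeric feature vectors."""
        return sum(abs(p - q) for p, q in zip(a, b))

    def classify(x, nb_probabilities, training_set, k=3):
        """NB decision when NB is confident, otherwise average NB with a k-NN vote (Fig. 1)."""
        probs = nb_probabilities(x)                        # dict: class -> NB probability
        ranked = sorted(probs, key=probs.get, reverse=True)
        best, second = ranked[0], ranked[1]
        if probs[best] >= 2 * probs[second]:               # step 3: NB alone decides
            return best
        # step 3a: the k nearest stored instances under the Manhattan metric
        neighbors = sorted(training_set, key=lambda ex: manhattan(ex[0], x))[:k]
        knn_probs = {c: 0.0 for c in probs}
        for _, label in neighbors:
            knn_probs[label] += 1.0 / k                    # neighbour vote fractions
        # steps 3b-3c: average the two distributions and take the most probable class
        combined = {c: (probs[c] + knn_probs[c]) / 2.0 for c in probs}
        return max(combined, key=combined.get)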


The situation in which the NB algorithm fails to classify a particular instance is when the class probability is almost evenly distributed among the distinct classes. In that case, NB misclassifies because the class label is assigned essentially at random. The proposed algorithm alleviates this problem by aggregating the decisions of NB with the 3NN classifier. An advantage of 3NN is that it is particularly effective when the probability distributions of the feature variables are not known. In addition, a problem occurs if a value of any attribute is never observed in the training data: the posterior probability of every class for a test instance with an unseen attribute value becomes zero. This problem is also resolved by aggregating the decisions of NB with the 3NN classifier.
4. COMPARISONS AND RESULTS
In this section, two experiments are conducted. In the first experiment, a number of incremental learning algorithms (Naive Bayes, 3NN, KSTAR, NNge) are compared with the proposed method. In the second experiment, a representative algorithm from each family of sophisticated batch supervised learning algorithms is compared with the proposed method. We used batch algorithms as a reference point for the accuracy achievable by learning algorithms, since most incremental versions of batch algorithms are not lossless [21,26,28]. An online lossless learning algorithm is an algorithm that returns a hypothesis identical to that of the corresponding batch algorithm.

For the purpose of our study, the datasets come from many domains of the UCI repository, including text classification problems (oh0.mat, oh15.mat) [9]. The used datasets are batch datasets, i.e., there is no natural order in the data. In order to calculate the classifiers' accuracy, cross-validation was run 10 times for each algorithm and the mean value of the 10 cross-validations was calculated.
It must be mentioned that we used the freely available source code of the algorithms from [12] for our experiments. We did not attempt to tune any of the algorithms to a specific dataset. Wherever possible, default values of the learning parameters were used. This approach results in lower estimates of the true accuracy, but it is a bias that affects all the learning algorithms uniformly.
The last rows of Table 1 present the aggregated results of the first experiment. An algorithm is counted as significantly better than another on a dataset when it performed statistically better according to a t-test with p < 0.05 [29]; in all other cases there is no significant statistical difference between the results (draws). The aggregated results are given in the form (W-D-L): W means that the proposed method is significantly less accurate than the compared algorithm in W out of 34 datasets, L means that the proposed method is significantly more accurate than the compared algorithm in L out of 34 datasets, while in the remaining D cases there is no significant statistical difference between the results. To sum up, the proposed ensemble is significantly more precise than the NB algorithm in 7 out of the 34 datasets, whilst it has significantly higher error rates in no dataset. In addition, the proposed algorithm is significantly more accurate than the 3NN algorithm in 9 out of the 34 datasets, whereas it has significantly higher error rates in 2 datasets. Moreover, the proposed ensemble is significantly more precise than the KSTAR algorithm in 15 out of the 34 datasets, whilst it has significantly higher error rates in 4 datasets. The proposed ensemble is also significantly more precise than the NNGE algorithm in 12 out of the 34 datasets, whilst it has significantly higher error rates in one dataset. In the following, we also provide representative figures (Figs. 2 and 3) showing how the accuracy increases as more instances are added.
In the second experiment, the AODE classifier [27] and a representative algorithm from each family of sophisticated batch supervised learning algorithms were compared with the proposed method. The C4.5 algorithm [20] was the representative of the decision trees [18] in our study. The SMO algorithm [14] was the representative of the support vector machines [16,30]. Finally, RIPPER [4] was the representative of the rule learners [24] in our study. As one can see in Table 2, the proposed ensemble is significantly more precise than AODE in 5 out of the 34 datasets, whilst it has significantly higher error rates in 4 datasets. In addition, the proposed ensemble is significantly more precise than SMO in 4 out of the 34 datasets, whilst it has significantly higher error rates in 3 datasets.
The proposed algorithm is also significantly more precise than the RIPPER algorithm in 8 out of the 34 datasets, while it has significantly higher error rates in 2 datasets. Finally, the proposed algorithm has significantly lower error rates than the C4.5 algorithm in 9 out of the 34 datasets and is significantly less accurate in 5 datasets.

The naive Bayes classifier has been successful despite its crude class conditional independence assumption. There are, however, limits on the ability of NB to reach near-optimal performance. Take, for example, a linearly separable pair of non-Gaussian classes: a linear classifier trained by the perceptron algorithm is guaranteed to learn the classification boundary, while NB is not. Obviously, some real datasets violate the conditional independence assumption. Because of the naive assumption, NB often produces poor posterior probability estimates. This is the reason why some datasets are classified well while others are not. When NB produces poor results on a specific dataset, the proposed method cannot produce much better results.

All the experiments indicate that the proposed method performed, on average, better than all the tested algorithms.

Table 1. Comparing the proposed ensemble with other incremental classifiers.

Dataset                       Proposed   NB        3NN       K*        NNge
audiology                     73.07      72.64     67.97     80.32     73
autos                         69.46      57.41     67.23     72.01     74.26
badges                        100        99.66     100       90.27     100
balance-scale                 90.1       90.53     86.74     88.72     80.46
breast-cancer                 73.22      72.7      73.13     73.73     67.8
wisconsin-breast-cancer       96.19      96.07     96.6      95.35     96.18
colic                         80.57      78.7      80.95     75.71     79.02
credit-g                      75.15      75.16     72.21     70.17     69.24
credit-rating                 82.41      77.86     84.96     79.1      82.83
diabetes                      75.72      75.75     73.86     70.19     72.84
Glass                         66.31      49.45     70.02     75.31     67.98
haberman                      73.31      75.06     69.77     70.27     66.8
heart-c                       83.74      83.34     81.82     75.18     77.78
hungarian-14-heart-diseas     84.15      83.95     82.33     77.83     79.6
heart-statlog                 84.93      83.59     79.11     76.44     77.3
hepatitis                     83.11      83.81     80.85     80.17     81.88
ionosphere                    91.14      82.17     86.02     84.64     90.6
iris                          95         95.53     95.2      94.67     96
labor                         93.37      93.57     92.83     92.03     86.23
lymphography                  84.77      83.13     81.74     85.08     77.14
monk1                         80.37      73.38     78.97     80.27     86.73
monk2                         56.87      56.83     54.74     58.35     53.98
monk3                         93.45      93.45     86.72     86.22     89.08
primary-tumor                 47.2       49.71     44.98     38.02     39.09
sonar                         80.04      67.71     83.76     85.11     71.12
students                      85.26      85.7      82.29     80.85     81.01
relation                      78.31      77.85     78.9      77.56     70.69
vehicle                       66.17      44.68     70.21     70.22     62.26
vote                          90.74      90.02     93.08     93.22     95.1
wine                          98.13      97.46     95.85     98.72     95.93
zoo                           96.25      94.97     92.61     96.03     94.09
oh0.mat                       80.12      80.09     21.79     32.02     73.06
oh15.mat                      74.58      74.67     19.05     31.27     67.96
DLBCL-Stanford                95.31      96.00     75.35     51.00     94.30
W-D-L (p<0.05)                           0/27/7    2/23/9    4/15/15   1/21/12

5. CONCLUSION
Online learning is essential when the dataset is large enough that multiple passes through it would be too time-consuming. In this study, we attempted to increase the prediction accuracy of an incremental version of the Naive Bayes model by integrating instance based learning. We performed a large-scale comparison with other state-of-the-art algorithms and ensembles on 34 standard benchmark datasets and obtained better accuracy in most cases. We believe the simplicity of this algorithm and its good performance make it an appealing tool for pattern classification. However, in spite of these results, no general method will work best in all cases.
We have mostly used our online algorithm to learn static datasets, i.e., those which do not have any temporal ordering among the training instances. Much data mining research is concerned with finding methods applicable to the increasing variety of types of data available: time series, multimedia, spatial data, worldwide web logs, etc. Using the presented algorithm on these different types of data is an important area of future work.


Fig. 2. Representative graphs showing how the accuracy increases as more instances are added.


Fig. 3. Representative graphs showing how the accuracy increases as more instances are added.

Table 2. Comparing the proposed ensemble with well known classifiers.

Dataset                       Proposed   SMO       RIPPER    C4.5      AODE
audiology                     73.07      80.77     73.1      77.26     72.73
autos                         69.46      71.34     73.62     81.77     74.76
badges                        100        100       100       100       100
balance-scale                 90.1       87.57     80.3      77.82     69.96
breast-cancer                 73.22      69.52     71.45     74.28     73.05
wisconsin-breast-cancer       96.19      96.75     95.61     95.01     97.05
colic                         80.57      82.66     85.1      85.16     82.45
credit-g                      75.15      75.09     72.21     71.25     75.83
credit-rating                 82.41      84.88     85.16     85.57     86.67
diabetes                      75.72      76.8      75.18     74.49     75.70
Glass                         66.31      57.36     66.78     67.63     74.53
haberman                      73.31      73.4      72.72     71.05     71.57
heart-c                       83.74      83.86     79.95     76.94     82.87
hungarian-14-heart-diseas     84.15      82.74     79.57     80.22     84.33
heart-statlog                 84.93      83.89     78.7      78.15     82.70
hepatitis                     83.11      85.77     78.13     79.22     85.36
ionosphere                    91.14      88.07     89.16     89.74     91.09
iris                          95         96.27     93.93     94.73     93.07
labor                         93.37      92.97     83.7      78.6      88.43
lymphography                  84.77      86.48     76.31     75.84     86.86
monk1                         80.37      79.58     83.87     80.61     82.32
monk2                         56.87      58.7      56.21     57.75     59.62
monk3                         93.45      93.45     84.8      92.95     93.21
primary-tumor                 47.2       47.09     38.74     41.39     49.77
sonar                         80.04      76.6      73.4      73.61     77.05
students                      85.26      86.72     86.44     86.75     86.08
relation                      78.31      77.6      78.01     78.55     78.21
vehicle                       66.17      74.08     68.32     72.28     70.32
vote                          90.74      95.77     95.75     96.57     94.28
wine                          98.13      98.76     93.14     93.2      98.31
zoo                           96.25      96.05     86.62     92.61     94.66
oh0.mat                       80.12      81.62     80.78     81.57     76.06
oh15.mat                      74.58      73.98     76.38     74.64     69.96
DLBCL-Stanford                95.31      95.35     74.45     77.90     95.35
W-D-L (p<0.05)                           3/27/4    2/24/8    5/20/9    4/25/5

REFERENCES

[1] P. Auer and M. Warmuth, "Tracking the best disjunction," Machine Learning, vol. 32, no. 2, pp. 127-150, 1998.
[2] F. Chu and C. Zaniolo, "Fast and light boosting for adaptive mining of data streams," Lecture Notes in Computer Science, vol. 3056, pp. 282-292, 2004.
[3] J. G. Cleary and L. E. Trigg, "K*: an instance-based learner using an entropic distance measure," Proc. of the 12th International Conference on Machine Learning, pp. 108-114, 1995.
[4] W. Cohen, "Fast effective rule induction," Proc. of Int. Conf. of ML-95, pp. 115-123, 1995.
[5] P. Domingos and M. Pazzani, "On the optimality of the simple Bayesian classifier under zero-one loss," Machine Learning, vol. 29, no. 2-3, pp. 103-130, 1997.
[6] W. Fan, S. Stolfo, and J. Zhang, "The application of AdaBoost for distributed, scalable and online learning," Proc. of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, pp. 362-366, 1999.
[7] F. Fdez-Riverola, E. L. Iglesias, F. Díaz, J. R. Méndez, and J. M. Corchado, "Applying lazy learning algorithms to tackle concept drift in spam filtering," Expert Systems with Applications, vol. 33, no. 1, pp. 36-48, 2007.
[8] A. Fern and R. Givan, "Online ensemble learning: an empirical study," Proc. of the 17th International Conference on ML, pp. 279-286, 2000.
[9] A. Frank and A. Asuncion, UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, 2010.
[10] Y. Freund and R. Schapire, "Large margin classification using the perceptron algorithm," Machine Learning, vol. 37, no. 3, pp. 277-296, 1999.
[11] A. Gangardiwala and R. Polikar, "Dynamically weighted majority voting for incremental learning and comparison of three boosting based approaches," Proc. of Joint Conf. on Neural Networks, pp. 1131-1136, 2005.
[12] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA data mining software: an update," SIGKDD Explorations, vol. 11, no. 1, pp. 10-18, 2009.
[13] L. I. Kuncheva, "Classifier ensembles for changing environments," Lecture Notes in Computer Science, vol. 3077, pp. 1-15, 2004.
[14] L. Kai and H.-K. Huang, "Incremental learning proximal support vector machine classifiers," Proc. of International Conference on Machine Learning and Cybernetics, pp. 1635-1637, 2002.
[15] J. Lee, W. Chung, E. Kim, and S. Kim, "A new genetic approach for structure learning of Bayesian networks: matrix genetic algorithm," International Journal of Control, Automation and Systems, vol. 8, no. 2, pp. 398-407, 2010.
[16] Z. Liang and Y. Li, "Incremental support vector machine learning in the primal and applications," Neurocomputing, vol. 72, no. 10-12, pp. 2249-2258, 2009.
[17] N. Littlestone and M. Warmuth, "The weighted majority algorithm," Information and Computation, vol. 108, pp. 212-261, 1994.
[18] L. Jing, L. Xue, and Z. Weicai, "Ambiguous decision trees for mining concept-drifting data streams," Pattern Recognition Letters, vol. 30, no. 15, pp. 1347-1355, 2009.
[19] N. C. Oza and S. Russell, "Online bagging and boosting," Proc. of Artificial Intelligence and Statistics 2001, pp. 105-112, 2001.
[20] J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco, 1993.
[21] D. Saad, Online Learning in Neural Networks, Cambridge University Press, London, 1998.
[22] M. Sahami, "Learning limited dependence Bayesian classifiers," Proc. of the 2nd Int. Conf. on Knowledge Discovery in Databases, pp. 335-338, 1996.
[23] R. Sylvain, Nearest Neighbor with Generalization, Christchurch, New Zealand, 2002.
[24] C.-J. Tsai, C.-I. Lee, and W.-P. Yang, "Mining decision rules on data streams in the presence of concept drifts," Expert Systems with Applications, vol. 36, no. 2, pp. 1164-1178, 2009.
[25] A. Tsymbal, M. Pechenizkiy, P. Cunningham, and S. Puuronen, "Dynamic integration of classifiers for handling concept drift," Information Fusion, vol. 9, no. 1, pp. 56-68, 2008.
[26] P. Utgoff, N. Berkman, and J. Clouse, "Decision tree induction based on efficient tree restructuring," Machine Learning, vol. 29, no. 1, pp. 5-44, 1997.
[27] G. I. Webb, J. R. Boughton, and Z. Wang, "Not so naive Bayes: aggregating one-dependence estimators," Machine Learning, vol. 58, pp. 5-24, 2005.
[28] G. Widmer and M. Kubat, "Learning in the presence of concept drift and hidden contexts," Machine Learning, vol. 23, pp. 69-101, 1996.
[29] I. Witten, E. Frank, and M. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2011.
[30] H.-G. Yeom, S.-M. Park, J. Park, and K.-B. Sim, "Superiority demonstration of variance-considered machines by comparing error rate with support vector machines," Int. Journal of Control, Automation, and Systems, vol. 9, no. 3, pp. 595-600, 2011.

Sotiris Kotsiantis received his bachelor's degree in mathematics in 1999, a master's degree in 2001, and a Ph.D. degree in computer science in 2005 from the University of Patras, Greece. His research interests are mainly in the field of data mining and machine learning. He has numerous publications to his credit in international journals and conferences.
