
Cluster Oriented Ensemble Classifier: Impact of Multi-cluster

Characterisation on Ensemble Classifier Learning

B. Verma and A. Rahman

Centre of Intelligent and Networked Systems

School of Computing Sciences, CQUniversity

Rockhampton, Queensland 4702, Australia

Email: b.verma@cqu.edu.au, a.rahman@cqu.edu.au

Abstract

This paper presents a novel cluster oriented ensemble classifier. The proposed ensemble

classifier is based on original concepts such as learning of cluster boundaries by the base

classifiers and mapping of cluster confidences to class decision using a fusion classifier. The

class labelled data set is characterised into multiple clusters and fed to a number of distinctive

base classifiers. The base classifiers learn cluster boundaries and produce cluster confidence

vectors. A second level fusion classifier combines the cluster confidences and maps to class

decisions. The proposed ensemble classifier modifies the learning domain for the base

classifiers and facilitates efficient learning. The proposed approach is evaluated on

benchmark data sets from UCI machine learning repository to identify the impact of multi-

cluster boundaries on classifier learning and classification accuracy. The experimental results

and two–tailed sign test demonstrate the superiority of the proposed cluster oriented ensemble

classifier over existing ensemble classifiers published in the literature.

Keywords: Ensemble classifier, clustering, classification, fusion of classifiers




1. Introduction

An ensemble classifier is conventionally constructed from a set of base classifiers that

separately learn the class boundaries over the patterns in a training set. The decision of an

ensemble classifier on a test pattern is produced by fusing the individual decisions of the base

classifiers. Ensemble classifiers are also known as multiple classifier systems, committee of

classifiers and mixture of experts [1]. An ensemble classifier produces more accurate

classification than its individual counterparts provided the base classifier errors are

uncorrelated [3].

Contemporary ensemble generation techniques train the base classifiers on different subsets

of the training data in order to make their errors uncorrelated. The different algorithms

including bagging [4] and boosting [7] vary in terms of generating the training subsets for

base classifier training. The decisions of the base classifiers are fused into a single decision

by using either majority voting on discrete decisions [1] or algebraic combiners [15] on

continuous valued confidence measures. Although the contemporary ensemble classifiers

(detailed in Section 2) are capable of making the base classifier errors uncorrelated they fail

to establish any mechanism to improve the learning domain of the individual base classifiers.

To clarify this concern let us consider a real world data set with overlapping patterns from

different classes. The learning of class boundaries between overlapping class patterns in such

cases is a difficult problem. Excessive training of the base classifiers will lead to accurate learning of the decision boundary but will result in overfitting and thus misclassification of test instances. On the other hand, learning generalized boundaries will avoid overfitting but at the cost of always misclassifying some overlapping patterns. This problem of learning the class boundaries of overlapping patterns remains inherent in all the base classifiers and is


propagated to the decision fusion stage as well even though the base classifier errors are

uncorrelated.

We opt to bring in clustering at this point. Clustering is the process of partitioning a data set

into multiple groups where each group contains data points that are very close in Euclidean

space. The clusters have well defined and easy to learn boundaries. Let’s assume that the

patterns are labelled with their cluster number. Now if the base classifiers are trained on the

modified data set they will learn the cluster boundaries. As the clusters have well defined

easy to learn boundaries the base classifiers can learn them with high accuracy. Clusters can

contain overlapping patterns from multiple classes. A fusion classifier can be trained to

predict the class of a pattern from the predicted cluster. The proposed cluster oriented

ensemble classifier is based on the above philosophy.

With the aim to achieve better learning and improved accuracy of the ensemble classifier, in

this paper we propose an ensemble classifier approach that clusters classified data into

multiple clusters, learns the decision boundaries between the clusters using a set of base

classifiers, and combines the cluster decisions produced by the base classifiers into a class decision using a fusion classifier. Learning cluster boundaries leads to superior performance of the base classifiers. The fusion classifier maps the clustering pattern produced by the base classifiers into a class decision. Altogether, the ensemble of base and fusion classifiers aims at better learning, leading to higher classification accuracy, as evidenced by the experimental

results.

While achieving the above mentioned aim, the research presented in this paper seeks to answer four major research questions. The first research question is to

investigate the performance of different clustering approaches namely heterogeneous

clustering (i.e. clustering all the patterns from different classes) and homogeneous clustering


(i.e. clustering patterns within a class). The second research question is to investigate whether

the ensemble classifier outperforms the base classifiers significantly. The third research

question is to find out the impact of the fusion classifier. The final research question is to find the

standing of the proposed ensemble classifier with respect to other ensemble classifiers on

benchmark data sets.

This paper is organized as follows. Section 2 presents the literature review. The proposed

ensemble classifier is discussed in Section 3 and the methodology is presented in Section 4.

Section 5 describes the experimental setup used for evaluating the proposed approach.

Section 6 presents the results and comparative analysis. Finally, Section 7 concludes the

paper.

2. Literature Review

The major concentration of ensemble classifier research [1]–[2] is on (i) generation of base

classifiers for achieving diversity among them, and (ii) methods for fusing the decision of the

base classifiers. Two classifiers are diverse if they make different errors on different instances.

The ultimate objective of diversity is to make the base classifiers as unique as possible with

respect to misclassified instances. We present a review of the contemporary ensemble

classifiers related to the proposed approach in this section.

Bagging [4][6] is a sampling based ensemble classifier generation approach that was

introduced by Breiman. Bagging generates the multiple base classifiers by training them on

data subsets randomly drawn (with replacement) from the entire training set. The decisions of

the base classifiers are combined into the final decision by majority voting. The sampling

procedure of bagging creates the various training subsets by bootstrap sampling which results

in the diversity among the base classifiers. Bagging is suitable for small data sets. For large

data sets however the sampling scheme based on the bootstrap with replicates of the training


set is infeasible. Moreover, the randomness introduced by the sampling process in bagging

cannot guarantee the performance of the overall ensemble classifier. A number of variations

to bagging are observed in the literature to improve its performance and the list includes

random forests [5], ordered aggregation [11], adaptive generation and aggregation approach

[14], and fuzzy bagging [13].

Schapire proposed a method called boosting [7][8] that creates data subsets for base classifier training by re-sampling the training data such that the most informative training instances are provided to each consecutive classifier. In boosting, each of the training instances is

assigned a weight that determines how well the instance was classified in the previous

iteration. The instances of the training data that are badly classified (i.e. instances with higher weights) are included in the training set for the next iteration. This way boosting pays more

attention to instances that are hard to classify. Although boosting identifies difficult to

classify instances it does not provide any mechanism to improve the learning of base

classifiers on these instances. The problem of base classifier learning that is raised by

overlapping patterns still remains (as mentioned in the previous section), and leads to poor

base classifier performance. A number of variants of boosting can be observed in the

literature including boosting recombined weak classifiers [12], weighted instance selection

[10], Learn++ [20] and its variant Learn++.NC [21].

Random subspace [9] is an ensemble creation method that uses feature subsets to create the

different data subsets to train the base classifiers. Maclin and Shavlik proposed a neural

ensemble [22] where a number of new approaches are presented to initialise the network

weights in order to achieve diversity and generalization. Pujol and Masip presented a binary

discriminative learning technique [23] based on the approximation of the non-linear decision

boundary by a piece-wise linear smooth additive model. Chaudhuri et al. presented a hybrid

ensemble model [24] that combines the strengths of parametric and non–parametric

classifiers. In recent times there have been some works on cluster ensembles, which aim to obtain an improved clustering of the data set by combining multiple partitionings of the data set [25]. Note that the focus of an ensemble classifier, namely improved classification accuracy, is significantly different from that of cluster ensembles, which aim to achieve improved clustering accuracy.

The other key aspect of an ensemble classifier is the fusion of base classifier outputs into class

decisions. The mapping can be done on discrete class decisions or continuous class

confidence values produced by the base classifiers. The commonly used fusion methods [1]

for combining class labels are majority voting, weighted majority voting, behaviour

knowledge space, and Borda count. The commonly used fusion methods for combining

continuous outputs are algebraic combiners [15] including mean rule, weighted average,

trimmed mean, min/max/median rule, product rule, and generalized mean. A number of other

fusion rules include decision template [16], pair-wise fusion matrix [17], adaptive fusion

method [18], and non–Bayesian probabilistic fusion [19]. Note that all these approaches are

designed to fuse the class decisions from the base classifiers into a single class decision.
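To make the two families of fusion rules concrete, the following minimal sketch (our own illustration in Python, not code from the cited works) shows majority voting over discrete class labels and the mean rule over continuous class confidences.

```python
import numpy as np

def majority_vote(labels):
    """Fuse discrete class labels from the base classifiers: the most frequent label wins."""
    return np.bincount(labels).argmax()

def mean_rule(confidences):
    """Fuse continuous class confidences (one row per base classifier) by averaging."""
    return np.mean(confidences, axis=0)

print(majority_vote(np.array([1, 0, 1])))                          # -> 1
print(mean_rule(np.array([[0.7, 0.3], [0.4, 0.6], [0.9, 0.1]])))   # -> [0.667 0.333]
```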

Summarizing, the contemporary ensemble classifier generation methods are able to produce diversity among the base classifiers by making their errors uncorrelated. They however do not provide any mechanism to improve the learning process of the individual base classifiers on

difficult to classify overlapping patterns. The proposed ensemble classifier aims to address

this issue by creating multiple boundaries through data clustering, training the base classifiers

on easy to learn cluster boundaries and handling the cluster to class mapping process by a

fusion classifier. The overall philosophy of the proposed approach is presented in the

following section.


3. The Proposed Ensemble Classifier

3.1 Motivation

The decision boundaries in real world data sets are not simple. This is primarily because of

overlapping patterns from different classes in the data set. As a result the learning of decision

boundaries in such data sets leads to either overfitting or poor generalization. In both cases it

causes classification errors. The situation is explained in Figure 1. The data set in Figure 1(a)

contains overlapping patterns from two classes. Accurate learning from the training data by a

generic classifier will result in class boundaries in Figure 1(b) leading to overfitting and thus

misclassification of test data. An alternate solution to the problem can be achieved by

reducing penalties for misclassification during training. In this case the generic classifier will

learn simple decision boundaries (Figure 1(c)) but will cause misclassification of training as

well as test data.

This is the point where we would like to introduce multiple decision boundaries for each

class through clustering. Clustering is the process of grouping similar patterns. Clustering the

data set in Figure 1(a) with overlapping patterns will result in smaller groups of patterns as in

Figure 1(d). Note that the cluster boundaries (Figure 1(e)) are simple and easy to learn. A

generic classifier, if trained on this data, now learns simple cluster boundaries that cause neither overfitting nor extreme generalization. Cluster to class mapping can be done by a fusion classifier. The underlying theoretical model and the methodology of the proposed ensemble classifier are based on the above observation and are presented below.



Figure 1: Impact of clustering on an example data set consisting of two classes. (a) The
original data set with overlapping patterns, (b) Overfitting caused by accurate learning of the
decision boundaries, (c) Generalized decision boundary with overlapping patterns of class
two considered as part of class one, (d) Clustered data set, and (e) Decision boundaries
learned on clustered data set.

3.2 Ensemble Classifier Model

Let the ensemble classifier be composed of a set of $N_{bc}$ base classifiers $\psi_1, \psi_2, \ldots, \psi_{N_{bc}}$ and a fusion classifier $\varphi$. Given a pattern $\mathbf{x}$, the ensemble classifier $f$ can be defined to achieve the following mapping:

$f(\mathbf{x}) = [t_1, \ldots, t_{N_c}]$   (1)

where $t_1, \ldots, t_{N_c}$ are class confidence values for the $N_c$ classes. The base and fusion classifiers combine to achieve the above mapping.

Assuming that the data set is partitioned into $K$ clusters, each pattern belongs to a cluster. The base classifier $\psi_i$ is set to map the input pattern $\mathbf{x}$ to a set of cluster confidence measures $w_{i1}, \ldots, w_{iK}$ as

$\psi_i(\mathbf{x}) = [w_{i1}, \ldots, w_{iK}]$.   (2)


The training set $\Gamma_b$ of a base classifier $\psi$ is made of pairs $(\mathbf{x}, [w_1, \ldots, w_K])$ where $\mathbf{x}$ represents the input and $[w_1, \ldots, w_K]$ represents the target. Given that $\mathbf{x}$ belongs to cluster $k$, the target cluster confidence vector is set as

$w_j = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{otherwise} \end{cases}$   (3)

The base classifier parameters $\theta_\psi$ are tuned such that

$\theta_\psi = \arg\min_{\hat{\theta}_\psi} \sum_{\forall (\mathbf{x}, [w_1, \ldots, w_K]) \in \Gamma_b} \varepsilon_b \big( \psi_{\hat{\theta}_\psi}(\mathbf{x}), [w_1, \ldots, w_K] \big)$   (4)

where $\varepsilon_b$ is the error function. Let $\psi_{\hat{\theta}_\psi}(\mathbf{x}) = [\gamma_1, \ldots, \gamma_K]$. The error function $\varepsilon_b$ for the base classifier is defined as

$\varepsilon_b = \sum_{k=1}^{K} |w_k - \gamma_k|$.   (5)

Given the cluster confidence vectors produced by the base classifiers, the fusion classifier $\varphi$ performs the following mapping

$\varphi \big( [w_{11}, \ldots, w_{1K}], \ldots, [w_{N_{bc}1}, \ldots, w_{N_{bc}K}] \big) = [t_1, \ldots, t_{N_c}]$   (6)

where $w_{i1}, \ldots, w_{iK}$ are the cluster confidence measures produced by base classifier $\psi_i$ and $t_1, \ldots, t_{N_c}$ are class confidence values. The training set for the fusion classifier $\Gamma_f$ is composed of pairs $(\mathbf{w}, [t_1, \ldots, t_{N_c}])$ where $\mathbf{w}$ is the merged cluster confidence vector and $[t_1, \ldots, t_{N_c}]$ is the target class confidence vector. A cluster can contain patterns from multiple classes, and in that case a unique mapping is not possible by the fusion classifier. Depending on the number of classes $N_c$, each class deserves a share of the cluster. There are thus a total of $N_c$ outputs/targets of the fusion classifier, each representing a class, and each target receives a weight during training according to the proportion of its patterns in the cluster. Let the cluster confidence vectors produced by the base classifiers in (6) correspond to cluster $k$ that contains $n_j$ patterns of class $j$ where $1 \leq j \leq N_c$. The target class confidence for the $j$-th class is set as

$t_j = \dfrac{n_j}{\sum_{j=1}^{N_c} n_j}$.   (7)

For example, a cluster containing six patterns of class 1 and four patterns of class 2 in a two class problem yields the target class confidence vector $[0.6, 0.4]$.

The parameters $\theta_\varphi$ for the fusion classifier $\varphi$ are optimized such that

$\theta_\varphi = \arg\min_{\hat{\theta}_\varphi} \sum_{\forall (\mathbf{w}, [t_1, \ldots, t_{N_c}]) \in \Gamma_f} \varepsilon_f \big( \varphi_{\hat{\theta}_\varphi}(\mathbf{w}), [t_1, \ldots, t_{N_c}] \big)$   (8)

where $\varepsilon_f$ is the error function. Assuming $\varphi_{\hat{\theta}_\varphi}(\mathbf{w}) = [\eta_1, \ldots, \eta_{N_c}]$, the error function $\varepsilon_f$ is defined as:

$\varepsilon_f = \sum_{j=1}^{N_c} |t_j - \eta_j|$.   (9)

Using (2) and (6), the ensemble classifier mapping in (1) can be enumerated as:

$f(\mathbf{x}) = \varphi \big( [\psi_1(\mathbf{x}) \circ \cdots \circ \psi_{N_{bc}}(\mathbf{x})] \big) = \varphi \big( [w_{11}, \ldots, w_{1K}] \circ \cdots \circ [w_{N_{bc}1}, \ldots, w_{N_{bc}K}] \big) = [t_1, \ldots, t_{N_c}]$   (10)

where $\circ$ denotes the concatenation of the cluster confidence vectors produced by the base classifiers.

The proposed ensemble classifier is based on the above model and the corresponding architecture is presented in Figure 2.

Figure 2: Architecture of the proposed ensemble classifier. The input pattern $\mathbf{x}$ is presented to the base classifiers $\psi_1, \psi_2, \ldots, \psi_{N_{bc}}$; their cluster confidence vectors $[w_{11}, \ldots, w_{1K}], \ldots, [w_{N_{bc}1}, \ldots, w_{N_{bc}K}]$ are merged and mapped by the fusion classifier to the class confidence vector $[t_1, \ldots, t_{N_c}]$.

The objective of the proposed Cluster Oriented Ensemble Classifier (COEC) is to improve


the learning process as well as the overall prediction accuracy by partitioning the data set,

learning cluster boundaries by the base classifiers and mapping base classifiers’ output to

class confidence vector using a fusion classifier. The novelty of the proposed method lies in:

(i) Partitioning classified data into multiple clusters for achieving better separation.

(ii) Use of base classifiers in an ensemble to learn cluster boundaries.

(iii) Fusion of cluster confidence values produced by the base classifiers into class

confidence values by a fusion classifier.

3.3 Clustering in COEC

The learning of the base and fusion classifiers in COEC depends on multiple class boundaries

produced by clustering. The outcome of the clustering algorithm depends on the similarity measure $\Delta$ between the patterns, and we have used the Euclidean distance that computes the geometric distance between two patterns $\mathbf{x}_i = \langle x_{i1}, x_{i2}, \ldots, x_{in} \rangle$ and $\mathbf{x}_j = \langle x_{j1}, x_{j2}, \ldots, x_{jn} \rangle$ in $n$-dimensional hyperspace. We performed two types of clustering

in COEC:

(i) Heterogeneous clustering to partition all the patterns in the training set independent of any

knowledge of the class of the patterns.

(ii) Homogeneous clustering for partitioning the patterns belonging to a single class only.

Patterns belonging to each class are partitioned separately.

The characteristics and outcomes of the two types of clustering are significantly different and influence the accuracy of COEC, as evidenced by the experimental results.

Assuming a set of $K$ clusters $\{\Omega_1, \Omega_2, \ldots, \Omega_K\}$ and the associated cluster centres $\{\kappa_1, \kappa_2, \ldots, \kappa_K\}$, the clustering algorithm aims to minimize an objective function

$J = \sum_{k=1}^{K} \sum_{\forall \mathbf{x}_i \in \Omega_k} \Delta(\mathbf{x}_i, \kappa_k)$   (11)

for the patterns in the corresponding training set. Considering an augmented training set $\Gamma'$ defined as $\Gamma' = \{(\mathbf{x}_1, l_1), (\mathbf{x}_2, l_2), \ldots, (\mathbf{x}_{|\Gamma'|}, l_{|\Gamma'|})\}$ where $l_i \in \{\Omega_1, \Omega_2, \ldots, \Omega_K\}$, a generic classifier $\psi_i$ learns the decision boundaries between the clusters and produces the cluster confidence vector $w_{i1}, \ldots, w_{iK}$. The fusion classifier maps the cluster confidence vector to the class confidence vector $[t_1, \ldots, t_{N_c}]$.

The performance of the fusion classifier depends on the content of the cluster. If all the

patterns in the cluster belong to the same class the mapping is unique. We refer to these

clusters as atomic clusters. Non–atomic clusters are composed of patterns from different

classes. The target vector of the fusion classifier for these clusters is set according to the

proportion of patterns from different classes during training as mentioned in (7).
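As a small worked illustration of the target vectors in (3) and (7), the sketch below (our own NumPy illustration; the function and array names are not from the paper) builds the one-hot cluster target used by a base classifier and the proportional class target of a non-atomic cluster.

```python
import numpy as np

def base_target(cluster_id, n_clusters):
    """Eq. (3): one-hot cluster confidence target [w_1, ..., w_K]."""
    w = np.zeros(n_clusters)
    w[cluster_id] = 1.0
    return w

def fusion_target(class_labels_in_cluster, n_classes):
    """Eq. (7): class shares t_j = n_j / sum_j n_j for a single cluster."""
    counts = np.bincount(class_labels_in_cluster, minlength=n_classes)
    return counts / counts.sum()

# A non-atomic cluster holding 6 patterns of class 0 and 4 patterns of class 1
labels = np.array([0] * 6 + [1] * 4)
print(base_target(2, 5))          # [0. 0. 1. 0. 0.]
print(fusion_target(labels, 2))   # [0.6 0.4]
```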

4. Learning and Prediction Methodology of COEC

The overall learning and prediction methodology of COEC is presented in Figure 3 and

Figure 4. The learning process is depicted in Figure 3 where the training data is first clustered

and the base classifiers then learn the mapping from patterns to clusters. The cluster

confidence values produced by the different base classifiers are then merged to form the

inputs for the fusion classifier and the targets are set to the original class values for learning

the cluster to class map. During prediction (Figure 4), the base classifiers produce cluster

confidence vectors for a test pattern. These vectors are merged to form the input for the

fusion classifier that produces the class confidence vector.

The different steps of learning and prediction of the ensemble of classifiers are detailed in the

following sections.


Figure 3: Training process for COEC

Figure 4: Test process for COEC


4.1 Homogeneous/Heterogeneous Clustering

The learning process starts by partitioning the training data into multiple clusters. Given the

training data set $[x_{ij}] = [d_{ij}] \circ [\mathit{class}_i]$ where $1 \leq i \leq N_{examples}$ and $1 \leq j \leq N_{features}$, the purpose of the clustering algorithm is to partition the training data set into a number of $N_{cluster}$ clusters. The output of the clustering algorithm is the modified data set $[y_{ij}] = [d_{ij}] \circ [\mathit{cluster}_i]$. Given the training data set, the clustering algorithm is presented in Figure 5. At the completion of clustering, each row of $[d_{ij}]$ is augmented with its cluster id, producing $[y_{ij}] = [d_{ij}] \circ [\mathit{cluster}_i]$.

The output of the clustering algorithm depends on the input argument type. We have used two

types of clustering in COEC – (i) Homogeneous clustering: Clustering is performed

separately on the patterns belonging to the same class, and (ii) Heterogeneous clustering:

Clustering is performed on the entire data set. We have reported our findings on both of these

clustering approaches in Section 6.

Figure 5: Homogeneous/Heterogeneous Clustering algorithm for partitioning classified data


into multiple clusters.
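As an illustration of the partitioning step of Figure 5, the sketch below uses scikit-learn's k-means as a stand-in for the clustering algorithm; the function name and the mode argument are our own, not the authors'. In homogeneous mode the cluster ids of different classes are offset so that every cluster id remains unique across the data set.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_training_data(d, classes, n_clusters, mode="heterogeneous", seed=0):
    """Return a cluster id for every training pattern.

    mode="heterogeneous": one k-means run over the whole data set.
    mode="homogeneous":   a separate k-means run per class, with the cluster ids
                          of different classes offset so they remain distinct.
    """
    if mode == "heterogeneous":
        return KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=seed).fit_predict(d)

    cluster_ids = np.empty(len(d), dtype=int)
    offset = 0
    for c in np.unique(classes):
        idx = np.where(classes == c)[0]
        labels = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=seed).fit_predict(d[idx])
        cluster_ids[idx] = labels + offset   # keep ids unique across classes
        offset += n_clusters
    return cluster_ids
```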


4.2 Base Classifier Training

A set of $N_{bc}$ base classifiers are trained with $[y_{ij}] = [d_{ij}] \circ [\mathit{cluster}_i]$ as produced by the clustering algorithm. The input to each base classifier is set to $[d_{ij}]$. The target for each base classifier is set to $[t_{ik}]$ such that

$t_{ik} = \begin{cases} 1 & \text{if } \mathit{cluster}_i = k \\ 0 & \text{otherwise} \end{cases}$   (12)

where $1 \leq k \leq N_{clusters}$. The aim of training the base classifiers with the target cluster matrix is that during prediction the base classifiers produce cluster confidence values for a pattern. The training parameters for each base classifier are optimized to fit the training data. The training algorithm for a generic classifier is presented in Figure 6. At the completion of training, for each base classifier $b$ a model $\hat{\theta}^b$ is obtained where $1 \leq b \leq N_{bc}$, and $[d_{ij}]$ is presented to each of the base classifiers, producing a set of $N_{bc}$ cluster confidence matrices $\{[w^b_{ik}]\}$ for the training patterns where $1 \leq b \leq N_{bc}$ and $1 \leq k \leq N_{clusters}$.

Figure 6: Base classifier training algorithm.
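The base classifier step of Figure 6 can be sketched as below, with scikit-learn's k-NN, neural network and SVM models standing in for the base classifiers used later in the paper; their predict_proba outputs play the role of the cluster confidence vectors of equation (2). This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def train_base_classifiers(d, cluster_ids):
    """Train each base classifier to predict the cluster id of a pattern.

    With integer cluster ids 0..K-1 all present in training, predict_proba
    then yields the cluster confidence vector [w_1, ..., w_K] of eq. (2).
    """
    bases = [
        KNeighborsClassifier(n_neighbors=5),
        MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
        SVC(kernel="rbf", probability=True, random_state=0),
    ]
    for clf in bases:
        clf.fit(d, cluster_ids)
    return bases

def cluster_confidences(bases, d):
    """Merge the N_bc cluster confidence vectors of every pattern into one long row."""
    return np.hstack([clf.predict_proba(d) for clf in bases])
```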


4.3 Fusion Classifier Training

The confidence matrices $\{[w^b_{ik}]\}$ produced by the base classifiers are combined to form the input to the fusion classifier $\varphi$, where $1 \leq i \leq N_{examples}$ and $1 \leq k \leq N_{clusters}$. The target matrix for $\varphi$ is composed of class confidence vectors that are set according to the proportion of class instances within the cluster. The parameters of the fusion classifier $\varphi$ are optimized to fit the above input-output pattern produced by the training examples. At the completion of training, a model $\hat{\theta}^\varphi$ for the fusion classifier is obtained. The training algorithm for the fusion classifier is presented in Figure 7.

Figure 7: Fusion classifier training algorithm.
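A corresponding sketch of the fusion step of Figure 7 is given below; a small multi-output MLPRegressor is our stand-in for the neural network fusion classifier, the proportional targets follow equation (7), and the function and variable names are ours. Integer class labels 0..N_c-1 are assumed.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fusion_targets(cluster_ids, classes, n_classes):
    """Per-pattern target = class proportions of the pattern's cluster, as in eq. (7)."""
    targets = np.zeros((len(classes), n_classes))
    for k in np.unique(cluster_ids):
        members = cluster_ids == k
        counts = np.bincount(classes[members], minlength=n_classes)
        targets[members] = counts / counts.sum()
    return targets

def train_fusion_classifier(merged_confidences, cluster_ids, classes, n_classes):
    """Fit the fusion classifier on merged cluster confidences vs. class-share targets."""
    fusion = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)
    fusion.fit(merged_confidences, fusion_targets(cluster_ids, classes, n_classes))
    return fusion
```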

4.4 Prediction

The test pattern $\mathbf{e} = \langle e_1, \ldots, e_{N_{features}} \rangle$ is presented to each of the base classifiers. Each base classifier $b$ produces $N_{clusters}$ different confidence values $\langle w^b_1, \ldots, w^b_{N_{clusters}} \rangle$ that indicate the possibility of the pattern belonging to the different clusters. The cluster confidence vectors produced by the different base classifiers are combined to produce $\langle w^1_1, \ldots, w^1_{N_{clusters}}, \ldots, w^{N_{bc}}_1, \ldots, w^{N_{bc}}_{N_{clusters}} \rangle$, which forms the input to the fusion classifier. At the output, the fusion classifier produces the class confidence values $\langle \eta_1, \ldots, \eta_{N_{class}} \rangle$ that indicate the possibility of the example belonging to the different classes. The ensemble classifier prediction algorithm is presented in Figure 8.


Figure 8: Ensemble classifier prediction algorithm.
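Chaining the illustrative helpers sketched above, prediction for test patterns (Figure 8) reduces to merging the base classifiers' cluster confidences and taking the class with the highest fused confidence.

```python
import numpy as np

def coec_predict(bases, fusion, d_test):
    """Merge the base classifiers' cluster confidences and map them to classes."""
    merged = np.hstack([clf.predict_proba(d_test) for clf in bases])
    class_conf = fusion.predict(merged)        # [eta_1, ..., eta_Nc] per pattern
    return np.argmax(class_conf, axis=1), class_conf
```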

5. Experimental Setup

We have conducted a number of experiments on benchmark data sets from UCI machine

learning repository [27] to verify the strength of COEC and investigate the research questions

mentioned in Section 1. We have used the same data sets as used in recently published

research [10]–[12][17] so that the results can be easily compared. A summary of the data sets

is presented in Table 1. The Wine data set has well defined training and test sets, so, as directed by the description of the data set [27], we have used them as they are. We have used 10–fold cross validation for reporting the classification results for all the other data sets.

Table 1: Data sets used in the experiments.


Dataset                      # instances   # attributes   # classes
Breast Cancer (Wisconsin)            699             10           2
Sonar                                208             60           2
Iris                                 150              4           3
Ionosphere                           351             34           2
Thyroid (New)                        215              5           3
Vehicle                              946             18           4
Liver                                345              7           2
Diabetes                             768              8           2
Wine                                 178             13           3
Satellite                           6435             36           6
Segment                             2310             19           7

We used the k–means clustering algorithm [26] for partitioning the data sets. Two types of

clustering were performed: (i) heterogeneous clustering: conventional clustering of the entire

data set into k clusters where a cluster can contain examples of more than one class. The


target for the fusion classifier is set as per the proportions of the class examples within each

cluster; (ii) homogeneous clustering: examples of a single class are partitioned into k clusters.

The target of the fusion classifier is set to the class for which the clustering is performed. We

have reported the impact of both types of clustering on ensemble classifier accuracy and analysed their relative merits.

We have investigated the proposed ensemble classifier by incorporating three well known and distinct classifiers, namely k–Nearest Neighbour (k–NN), Neural Network (NN), and Support Vector Machine (SVM), as the base classifiers. A Neural Network is used as the

fusion classifier. The neural networks for small data sets are trained using a single hidden

layer and tan sigmoid activation functions for the neurons. The Levenberg–Marquardt

backpropagation method is used for learning of the weights in these cases. Larger data sets

are however learned with log sigmoid activation function and gradient descent training

function. We have used the radial basis kernel for SVM and the libsvm library [28] in all the

experiments. The different parameters for the classifiers (e.g. k in k–NN classifier, sigma in

RBF kernel of SVM, and epochs, RMS error goal, learning rate in neural network) were

hand tuned for different data sets. The classification accuracies of bagging, boosting and

random subspace on the data sets in Table 1 are obtained from [17] and WEKA [31]. All the

experiments were conducted on MATLAB 7.5.0.
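For concreteness, the evaluation protocol can be sketched as below by wiring together the illustrative helpers from Section 4 on the Iris data; this is not the authors' MATLAB code, and the number of clusters and model parameters are placeholders that would be hand tuned per data set as described above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler

# Illustrative COEC evaluation on one UCI data set (Iris), reusing the helpers
# sketched in Section 4: cluster_training_data, train_base_classifiers,
# cluster_confidences, train_fusion_classifier and coec_predict.
X, y = load_iris(return_X_y=True)
n_classes = len(np.unique(y))

accuracies = []
for train, test in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    scaler = StandardScaler().fit(X[train])
    Xtr, Xte = scaler.transform(X[train]), scaler.transform(X[test])
    cl = cluster_training_data(Xtr, y[train], n_clusters=3, mode="homogeneous")
    bases = train_base_classifiers(Xtr, cl)
    fusion = train_fusion_classifier(cluster_confidences(bases, Xtr), cl,
                                     y[train], n_classes)
    predictions, _ = coec_predict(bases, fusion, Xte)
    accuracies.append(np.mean(predictions == y[test]))

print(f"COEC 10-fold accuracy: {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
```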

6. Results and Discussion

6.1 Heterogeneous and Homogeneous Clustering

6.1.1 Heterogeneous clustering

Given a set of training examples the heterogeneous clustering partitions the entire data set. In

a data set where examples of different classes are well separated in Euclidean space,


heterogeneous clustering will produce partitions each containing examples from one class

only. We use the term atomic cluster to refer to a partition containing examples from a single

class. Most of the real world data sets however contain overlapping examples from different

classes. It is thus likely to observe mostly non–atomic clusters (clusters containing examples

from multiple classes) when the data set is partitioned using heterogeneous clustering where

the number of clusters equals the number of classes. Figure 9 represents a set of co–

occurrence matrices that are obtained from different data sets by counting the number of

instances of each class belonging to a particular cluster when the data sets are partitioned into

k clusters using k–means clustering with k=# of classes.
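Such a co-occurrence matrix is simply a contingency table of cluster id against class label; a minimal sketch of how it can be computed is given below (the function name is ours, and integer class labels 0..#classes-1 are assumed).

```python
import numpy as np
from sklearn.cluster import KMeans

def cooccurrence_matrix(X, y, n_clusters, seed=0):
    """Rows: clusters, columns: classes; entry = # patterns of that class in that cluster."""
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=seed).fit_predict(X)
    n_classes = len(np.unique(y))
    M = np.zeros((n_clusters, n_classes), dtype=int)
    for k, c in zip(cluster_ids, y):
        M[k, c] += 1
    return M
```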

Ionosphere:            Class 1   Class 2
  Cluster 1                61        82
  Cluster 2               141        31

Sonar:                 Class 1   Class 2
  Cluster 1                30        32
  Cluster 2                57        68

Iris:                  Class 1   Class 2   Class 3
  Cluster 1                 0        42        45
  Cluster 2                21         3         0
  Cluster 3                24         0         0

Wine:                  Class 1   Class 2   Class 3
  Cluster 1                 0        32         0
  Cluster 2                30         1         0
  Cluster 3                 0         3        24

Figure 9: Cluster–class co–occurrence matrices when heterogeneous clustering is performed on the data sets using the k–means clustering algorithm with k = # of classes.

Note from Figure 9 that in Ionosphere and Sonar data sets each cluster contains examples

from multiple classes. This implies overlapping data points in these data sets. Nearly atomic

and atomic clusters are obtained for the Iris data set at the second and third clusters

respectively. The first cluster however contains overlapping examples from class 2 and class

3. Clustering these data sets into a higher number of partitions will lead to a higher number of atomic or nearly atomic clusters, leading to better learning of the ensemble classifier. The

clusters produced for the Wine data set are however either atomic or nearly atomic. It is easier

to produce the cluster to class mapping for these clusters by the fusion classifier in COEC.

Clustering further is unlikely to provide any benefit for the ensemble classifier learning for

such data sets. Figure 10 represents the co–occurrence matrices when the Ionosphere, Sonar

and Iris data sets are partitioned into higher number of clusters.


It can be observed from Figure 10 that a higher number of clusters improves the learning scenario for all the data sets. Six out of ten clusters in the Ionosphere data set are atomic and two clusters are near atomic. Four clusters are atomic and three clusters are near atomic for the Sonar data set. All the clusters are either atomic or near atomic for the Iris data set. These results imply that a higher number of clusters in heterogeneous clustering produces significant numbers of atomic and near atomic clusters, and it becomes easier for the fusion classifier in COEC to produce the cluster to class map, leading to better classification accuracy.

Ionosphere:            Class 1   Class 2
  Cluster 1                 0        10
  Cluster 2                35         0
  Cluster 3                 0         7
  Cluster 4               108        26
  Cluster 5                22         2
  Cluster 6                20        38
  Cluster 7                 0        15
  Cluster 8                17         1
  Cluster 9                 0         8
  Cluster 10                0         6

Sonar:                 Class 1   Class 2
  Cluster 1                17        26
  Cluster 2                11         1
  Cluster 3                 0        11
  Cluster 4                18        22
  Cluster 5                17         5
  Cluster 6                 0         9
  Cluster 7                 4         7
  Cluster 8                 0         8
  Cluster 9                20         5
  Cluster 10                0         6

Iris:                  Class 1   Class 2   Class 3
  Cluster 1                 0        24         2
  Cluster 2                 0         0        10
  Cluster 3                 0        20         2
  Cluster 4                12         0         0
  Cluster 5                 0         1        16
  Cluster 6                 9         0         0
  Cluster 7                 0         0        15
  Cluster 8                 9         0         0
  Cluster 9                15         0         0
Ionosphere Sonar Iris

Figure 10: Cluster–class co–occurrence matrices when heterogeneous clustering is performed on the data sets using the k–means clustering algorithm with a higher number of clusters.

Figure 11 presents the classification accuracies on the data sets in Table 1 at different numbers of clusters using heterogeneous clustering in COEC. The best classification accuracies are obtained for all the data sets when the number of clusters is greater than the number of classes. As the clusters have well defined boundaries the base classifiers learn the cluster boundaries easily. A higher number of clusters produces mostly atomic and near–atomic clusters for data sets like Iris, Ionosphere and Sonar (Figure 10). As a result the fusion classifier learns the cluster to class maps with high accuracy, resulting in better classification performance of COEC. Data sets like Wine have class patterns that are already well separated (Figure 9) and further clustering does not significantly improve the classification performance of COEC.


[Panels: (a) Breast cancer, (b) Sonar, (c) Iris, (d) Ionosphere, (e) Thyroid, (f) Vehicle, (g) Liver, (h) Diabetes, (i) Wine, (j) Satellite, (k) Segment; each panel plots classification accuracy against the number of clusters.]

Figure 11: Classification accuracy of COEC with heterogeneous clustering at different numbers of clusters on the test cases of the data sets in Table 1.

6.1.2 Homogeneous clustering

Homogeneous clustering partitions the examples belonging to a single class only and ignores

the instances of other classes. Consider the partitioning of the data sets in Figure 9 using

homogeneous clustering. The resultant cluster–class co–occurrence matrices are represented

in Figure 12 considering two clusters for each class. The total number of clusters equals the

number of classes times the number of clusters per class. Note that all the clusters are atomic

in nature.


Ionosphere:            Class 1   Class 2
  Cluster 1                59         0
  Cluster 2               143         0
  Cluster 3                 0        82
  Cluster 4                 0        31

Sonar:                 Class 1   Class 2
  Cluster 1                51         0
  Cluster 2                36         0
  Cluster 3                 0        61
  Cluster 4                 0        39

Iris:                  Class 1   Class 2   Class 3
  Cluster 1                21         0         0
  Cluster 2                24         0         0
  Cluster 3                 0        23         0
  Cluster 4                 0        22         0
  Cluster 5                 0         0        20
  Cluster 6                 0         0        25

Wine:                  Class 1   Class 2   Class 3
  Cluster 1                11         0         0
  Cluster 2                19         0         0
  Cluster 3                 0        18         0
  Cluster 4                 0        18         0
  Cluster 5                 0         0        15
  Cluster 6                 0         0         9
Ionosphere Sonar Iris Wine

Figure 12: Cluster–class co–occurrence matrices when homogeneous clustering is performed on the data sets using the k–means clustering algorithm with two clusters for each class.

Figure 13 represents the classification performance of COEC at different numbers of clusters on the data sets in Table 1 using homogeneous clustering. Here n clusters imply a total of n×number_of_classes clusters in the data set. For example, the Vehicle data set has four classes, so four clusters in Figure 13 means 4×4=16 clusters in the data set. Too many clusters in small data sets imply a small number of patterns in each cluster, which leads to poor learning of the fusion classifier in COEC. This explains the fall in accuracy at higher numbers of clusters for the majority of the data sets in Figure 13.

6.1.3 Comparison

Homogeneous clustering can be beneficial over heterogeneous clustering for overlapping

patterns. For clarification, consider an artificial data set in Figure 14. The data set contains

overlapping patterns from multiple classes. Heterogeneous clustering is likely to produce the

partitions presented in Figure 14(b), where a large cluster is non–atomic. Even with a higher number of partitions the situation is unlikely to change, or the produced clusters will be random with each being non–atomic. The partitions produced by homogeneous clustering under an identical situation are presented in Figure 14(c). Note that all the clusters are atomic in nature. The groups within each cluster are well separated geometrically for this data set. As the data is clustered class wise, the cluster to class mapping by the fusion classifier becomes easier. COEC thus performs better using homogeneous clustering.

[Panels: (a) Breast cancer, (b) Sonar, (c) Iris, (d) Ionosphere, (e) Thyroid, (f) Vehicle, (g) Liver, (h) Diabetes, (i) Wine, (j) Satellite, (k) Segment; each panel plots classification accuracy against the number of clusters.]


Figure 13: Classification accuracy of COEC with homogeneous clustering at different numbers of clusters on the test cases of the data sets in Table 1.


Figure 14: Clustering of an artificial data set with overlapping data points using
homogeneous and heterogeneous clustering.


To verify the above observation we have conducted a set of classification experiments on the

data sets in Table 1 using both homogeneous and heterogeneous clustering with COEC. The

10–fold cross validation results on the test sets are presented in Table 2. It can be observed

that homogeneous clustering performs 14.38% better than heterogeneous clustering on average with COEC. These real world data sets contain significantly overlapping patterns, and the performance of homogeneous clustering is better than that of heterogeneous clustering, as evidenced from Table 2. To validate this claim, we define the null and alternative hypotheses as follows:

Null Hypothesis: Homogeneous clustering is equivalent to heterogeneous clustering for

classifying data using COEC.

Alternative Hypothesis: Homogeneous clustering is significantly better than heterogeneous

clustering for classifying data using COEC.

Note that the Null Hypothesis is rejected at 0.05 significance level by two–tailed sign test

[29][30] from the comparative classification performances of heterogeneous and

homogeneous clustering in Table 2.
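The two–tailed sign test used throughout this section counts, over the data sets, how often one method beats the other and compares that count against a Binomial(n, 0.5) reference, with ties discarded. A minimal sketch, using the accuracies reported in Table 2, is given below (helper name is ours; scipy >= 1.7 is assumed for binomtest).

```python
from scipy.stats import binomtest

def two_tailed_sign_test(scores_a, scores_b):
    """p-value of the two-tailed sign test for paired accuracies of methods A and B."""
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b                        # ties are discarded
    return binomtest(wins_a, n, 0.5, alternative="two-sided").pvalue

homogeneous   = [97.72, 84.44, 96.00, 89.09, 94.89, 71.77, 63.33, 71.08, 99.05, 89.19, 95.97]
heterogeneous = [97.59, 67.29, 95.33, 86.55, 86.06, 52.15, 57.67, 64.80, 98.10, 76.08, 66.93]
print(two_tailed_sign_test(homogeneous, heterogeneous))   # ~0.001 < 0.05
```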

Table 2: Classification performance comparison of COEC at homogeneous clustering and


heterogeneous clustering on the test cases of the data sets in Table 1 using 10–fold cross
validation. The sign test on the results implies that homogeneous clustering is significantly
better than heterogeneous clustering with COEC.
Data Set Heterogeneous clustering Homogeneous clustering
Breast Cancer 97.59±1.29 97.72±2.23
Sonar 67.29±9.90 84.44±7.60
Iris 95.33±5.49 96.00±3.44
Ionosphere 86.55±7.03 89.09±5.49
Thyroid 86.06±9.55 94.89±6.20
Vehicle 52.15±4.45 71.77±2.99
Liver 57.67±6.28 63.33±9.05
Diabetes 64.8±3.50 71.08±5.65
Wine 98.10±0.00 99.05±0.00
Satellite 76.08±4.97 89.19±1.22
Segment 66.93±5.07 95.97±1.08


Note that the performance of COEC with clustering depends on the number of clusters. The

main objective of this paper is to observe the influence of clustering on classification

accuracy. We have adopted a stepwise search method by changing the number of clusters

within a limited range and observing its influence on classification accuracy. The actual

number of clusters is a function of the number of patterns in the data set and it is thus

required that a wider range of number of clusters be considered for finding the optimal

number of clusters at which the classification accuracy is maximum. Further research is

required for finding the optimal number of clusters.

6.2 Impact of Clusters on Diversity

In order to ascertain the impact of clusters on diversity we have computed the errors made by

the base classifiers as we change the number of clusters in COEC. Figure 15 represents the

errors made by the k–NN, NN and SVM base classifiers as the number of clusters changes. Note that the base classifier errors at each number of clusters are different for all the data sets. This is possible only if the base classifiers make different errors on identical patterns. This implies that the errors made by the base classifiers are not correlated, which in turn indicates diversity among the base classifiers.


[Panels: (a) Breast cancer, (b) Sonar, (c) Iris, (d) Ionosphere, (e) Thyroid, (f) Vehicle, (g) Liver, (h) Diabetes, (i) Wine, (j) Satellite, (k) Segment; each panel plots the normalized errors of the k–NN, NN and SVM base classifiers against the number of clusters.]

Figure 15: Change in the errors made by the base classifiers as the number of clusters changes in COEC. The errors are normalized within a range of zero to one.

6.3 Comparative Performance Analysis of COEC and Base Classifiers

Table 3 represents a comparative analysis of the classification performance of COEC and the

corresponding base classifiers. Note that different base classifiers achieve different accuracies

on the data sets. This indicates that the errors made by the base classifiers are different and that diversity among the base classifiers is achieved in COEC. On average COEC performs 3.92% better than k–NN, 7.26% better than NN and 7.39% better than SVM as the base classifiers. The fusion classifier combines the decisions from the base classifiers to find the best possible verdict, and this can be attributed to the better performance of COEC. In


order to validate the claims we define the null and alternative hypotheses for each classifier

pair in Table 4. Note that the null hypothesis is rejected at 0.05 significance level by two–

tailed sign test for each classifier pair in Table 4 implying that COEC performs significantly

better than the corresponding base classifiers.

Table 3: Classification performance comparison between COEC and the


corresponding base classifiers.
Data Set k–NN NN SVM COEC
Breast Cancer 96.78 95.09 92.02 97.72
Sonar 81.49 73.09 55.04 84.44
Iris 94 93.33 93.33 96.00
Ionosphere 80.66 77.69 84.66 89.09
Thyroid 85.17 91.33 93.83 94.89
Vehicle 68.71 65.95 68.31 71.77
Liver 61.08 62.58 61.75 63.33
Diabetes 70.29 61.27 70.64 71.08
Wine 97.14 93.50 96.07 99.05
Satellite 87.45 83.23 88.89 89.19
Segment 94.76 94.98 95.28 95.97

Table 4: Significance test for comparing the classification performance of COEC


and the corresponding base classifiers using sign test.
Classifier pair Hypothesis Test
COEC vs. k–NN Null Hypothesis: COEC is equivalent to base k–NN classifier
Alternative Hypothesis: COEC is significantly better than the base k–NN classifier
Sign-Test: Null Hypothesis rejected at 0.05 significance level from the comparative
classification performances of COEC and k–NN in Table 3

COEC vs. NN Null Hypothesis: COEC is equivalent to base NN classifier


Alternative Hypothesis: COEC is significantly better than the base NN classifier
Sign-Test: Null Hypothesis rejected at 0.05 significance level from the comparative
classification performances of COEC and NN in Table 3

COEC vs. SVM Null Hypothesis: COEC is equivalent to base SVM classifier
Alternative Hypothesis: COEC is significantly better than the base SVM classifier
Sign-Test: Null Hypothesis rejected at 0.05 significance level from the comparative
classification performances of COEC and SVM in Table 3

We also conducted classification experiments on the data sets with the base classifiers alone, without any clustering. The classification results are presented in Table 5. COEC performs 3.62% better than k–NN, 5.33% better than NN and 6.51% better than SVM classifiers. This implies that clustering has a significant impact on the learning of the ensemble classifier (Section 6.1), leading to overall better performance. We justify this claim by defining the null and alternative hypotheses in Table 6 for each pair of classifiers. Note that the null hypothesis is rejected at 0.05 significance level using two–tailed sign test for each

classifier pair. This implies that clustering significantly impacts the learning in COEC and

improves the classification performance.

Table 5: Classification performance comparison between COEC and individual


classifiers with no clustering.
Data Set k-NN NN SVM COEC
Breast Cancer 96.78 95.09 92.02 97.72
Sonar 80.53 70.89 53.29 84.44
Iris 95.33 96 94.67 96.00
Ionosphere 82.80 82.04 87.23 89.09
Thyroid 88.5 87.11 93.83 94.89
Vehicle 69.39 72.26 70.15 71.77
Liver 59.08 61.67 66.92 63.33
Diabetes 68.9 68.62 71.74 71.08
Wine 97.14 93.50 96.07 99.05
Satellite 87.45 83.23 88.89 89.19
Segment 95.24 95.46 93.33 95.97

Table 6: Significance test for comparing the pair–wise classification performance


between COEC and the individual classifiers (without clustering) using sign test.
Classifier pair Hypothesis Test
COEC vs. k–NN Null Hypothesis: COEC is equivalent to k–NN classifier
Alternative Hypothesis: COEC is significantly better than the k–NN classifier
Sign-Test: Null Hypothesis rejected at 0.05 significance level from the comparative
classification performances of COEC and k–NN in Table 5

COEC vs. NN Null Hypothesis: COEC is equivalent to NN classifier


Alternative Hypothesis: COEC is significantly better than the NN classifier
Sign-Test: Null Hypothesis rejected at 0.05 significance level from the comparative
classification performances of COEC and NN in Table 5

COEC vs. SVM Null Hypothesis: COEC is equivalent to SVM classifier


Alternative Hypothesis: COEC is significantly better than the SVM classifier
Sign-Test: Null Hypothesis rejected at 0.05 significance level from the comparative
classification performances of COEC and SVM in Table 5

6.4 Comparative Performance Analysis of Classifier Fusion and Algebraic Fusion

Conventional algebraic fusion methods fuse the class confidence values produced by the base

classifiers to produce the class confidence values of the ensemble classifier. In COEC the

base classifiers produce cluster confidence values. If conventional algebraic methods (e.g.

mean of confidence values) are used in COEC the cluster confidence values will be produced

for the ensemble classifier. The cluster–to–class mapping can then be obtained using majority

voting: the class having the maximum number of patterns in the cluster wins the vote. This process is not suitable for strongly non–atomic clusters as it disregards classes that are significantly present in the cluster but do not hold the majority, and this impacts the overall classification accuracy. A fusion classifier performs better under this circumstance. Its targets are set according to the proportions of class patterns and it is trained accordingly. The fusion classifier thus gives importance to all the classes in a cluster according to their proportions, whereas majority voting does not.
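The algebraic fusion baseline discussed above (mean rule over cluster confidences followed by a majority-vote cluster-to-class map) can be sketched as follows; the helper names are ours, and cluster_to_class would be built from the training cluster contents as indicated in the trailing comment.

```python
import numpy as np

def algebraic_fusion_predict(bases, cluster_to_class, d_test):
    """Mean rule over cluster confidences, then a majority-vote cluster-to-class map.

    cluster_to_class[k] is the class holding the most training patterns in
    cluster k (the 'winner' of the vote), which is exactly what discards the
    minority classes of a non-atomic cluster.
    """
    mean_conf = np.mean([clf.predict_proba(d_test) for clf in bases], axis=0)
    winning_cluster = np.argmax(mean_conf, axis=1)
    return cluster_to_class[winning_cluster]

# cluster_to_class can be built from the training cluster contents, e.g.
# cluster_to_class = np.array([np.bincount(y_train[cl == k]).argmax()
#                              for k in range(n_clusters)])
```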

Table 7 provides a comparative classification performance of the fusion classifier and algebraic fusion (mean confidence for the cluster and majority voting for the class) when used with COEC. Overall the fusion classifier performs 1.08% better than algebraic fusion. This implies that the use of a fusion classifier significantly improves the performance of COEC compared to algebraic fusion. To justify this claim we define the following null and alternative hypotheses:

Null Hypothesis: Fusion classifier approach is equivalent to algebraic fusion approach while

used with COEC

Alternative Hypothesis: Fusion classifier approach is significantly better than the algebraic

fusion approach while used with COEC

Note that the null hypothesis is rejected at 0.05 significance level by two–tailed sign test from

the comparative classification performances presented in Table 7.

Table 7: Classification performance comparison between algebraic fusion and classifier


fusion in COEC.
Data Set Algebraic fusion Classifier fusion
Breast Cancer 97.61 97.72
Sonar 83.89 84.44
Iris 95.33 96.00
Ionosphere 84.00 89.09
Thyroid 94.28 94.89
Vehicle 70.86 71.77
Liver 62.83 63.33
Diabetes 70.46 71.08
Wine 97.14 99.05
Satellite 89.89 89.19
Segment 96.36 95.97


6.5 Comparative Performance Analysis of COEC and Classical Ensemble Classifiers

In order to find the standing of COEC we have classified the data sets using classical ensemble classifiers, namely bagging, boosting, and the random subspace method. Figure 16 provides a summary of the classification accuracies obtained using COEC and the other ensemble classifiers. On average COEC performs 6.05% better than bagging, 8.20% better than boosting and 9.08% better than the random subspace method. As mentioned in Section 2 the classical methods aim to achieve diversity and do not provide any mechanism to improve the learning performance of the base classifiers. In COEC this issue is handled by first allowing the base classifiers to learn cluster boundaries. As clusters have well defined boundaries, it is easier for the base classifiers to learn them. The fusion classifier performs the cluster to class mapping and

as observed in the previous section it performs better than the conventional fusion methods.

This combination of cluster boundary learning and fusion classifier mapping leads to better

performance of COEC. We justify this claim by conducting a sign test as presented in Table 8.

Note that the null hypothesis is rejected in all cases either at 0.05 or 0.15 significance level

indicating that COEC performs significantly better than the conventional ensemble

classifiers.


Figure 16: Classification performance comparison between COEC and classical ensemble
classifiers.

Table 8: Significance test for comparing the pair–wise classification performance


between COEC and the classical ensemble classifiers using sign test.
Classifier pair Hypothesis Test
COEC vs. bagging Null Hypothesis: COEC is equivalent to bagging
Alternative Hypothesis: COEC is significantly better than bagging
Sign-Test: Null Hypothesis rejected at 0.05 significance level from the comparative
classification performances of COEC and bagging in Figure 16

COEC vs. boosting Null Hypothesis: COEC is equivalent to boosting


Alternative Hypothesis: COEC is significantly better than boosting
Sign-Test: Null Hypothesis rejected at 0.05 significance level from the comparative
classification performances of COEC and boosting in Figure 16

COEC vs. random Null Hypothesis: COEC is equivalent to random subspace method
subspace method Alternative Hypothesis: COEC is significantly better than random subspace method
Sign-Test: Null Hypothesis rejected at 0.15 significance level from the comparative
classification performances of COEC and random subspace method in Figure 16

7. Conclusion

We have presented a novel cluster oriented ensemble classifier (COEC) which is based on learning of cluster boundaries by the base classifiers, leading to better learning capability, and cluster–to–class mapping by a fusion classifier, leading to better classification accuracy.

The proposed COEC has been evaluated on benchmark data sets from UCI machine learning

repository. The detailed experimental results and their significance using two-tailed sign test


have been presented and analysed in Section 6. The evidence from the experimental results and the two–tailed sign test shows that (i) homogeneous clustering performs significantly better

than heterogeneous clustering with COEC. As shown in Section 6.1, overall the

homogeneous clustering performs 14.38% better than heterogeneous clustering. (ii) the

proposed COEC performs significantly better than its base counterparts. As shown in Section

6.3, overall COEC performs 3.62% better than k–NN, 5.33% better than NN and 6.51% better

than SVM classifiers. (iii) the fusion classifier performs significantly better than algebraic fusion

with COEC. As shown in Section 6.4, overall the fusion classifier performs 1.08% better than

algebraic fusion. (iv) COEC outperforms classical ensemble classifiers namely bagging,

boosting and random subspace method significantly on benchmark data sets. As shown in

Section 6.5, overall COEC performs 6.05% better than bagging, 8.20% better than boosting

and 9.08% better than random subspace method.

In our future research, we intend to focus on finding the optimal number of clusters and on the global optimization of the parameters of the base and fusion classifiers.


Author Biography

Brijesh Verma is a Chair Professor in the School of Information and Communication Technology at Central Queensland University, Australia. His research interests include pattern recognition and computational intelligence. He has published thirteen books, seven book chapters and over one hundred papers in journals and conference proceedings. He has received twelve competitive research grants and has supervised thirty-one research students in the areas of pattern recognition and computational intelligence. He has served on the program committees of over thirty international conferences and on the editorial boards of six international journals. He is a Senior Member of the IEEE and has served as Chair of the IEEE Computational Intelligence Society's Queensland Chapter (2007-2008) and as a member of the IEEE CIS Subcommittee for the Outstanding Chapter Award (2010).

Ashfaqur Rahman received his Ph.D. degree in Information Technology from Monash University, Australia, in 2008. He is currently a Research Fellow at the Centre for Intelligent and Networked Systems (CINS) at Central Queensland University (CQU), Australia. His major research interests are in the fields of data mining, multimedia signal processing and communication, and artificial intelligence. He has published more than 20 peer-reviewed journal articles and conference papers. Dr. Rahman is the recipient of numerous academic awards, including a CQU Seed Grant, the International Postgraduate Research Scholarship (IPRS), the Monash Graduate Scholarship (MGS) and the FIT Dean Scholarship from Monash University, Australia.
