You are on page 1of 20

Department

of Computer Science and Engineering

An Adaptive Ensemble Classifier for Mining


Concept Drifting Data Streams

Presented By
Dr. Dewan Md. Farid
Associate Professor
Department of Computer Science & Engineering
United International University, Bangladesh

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE


Department of Computer Science and Engineering

Outline of The Presentation


Research challenges for mining data streams
Novel class instances
Adaptive ensemble classifier
Experimental analysis
Conclusions and future works

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 2


Department of Computer Science and Engineering

Data Streams Mining


Its a method of extracting knowledge and information from
continuous data instances. Instances in data streams are
generated with time passing by and cannot be controlled by
any pre-defined order.

Network traffic
Sensor data

Continuous data flow

Call center records

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 3


Department of Computer Science and Engineering

Research Challenges
Existing mining classifiers need to be updated frequently to adapt
to the changes in data streams.
A data stream has very diverse characteristics compared to the
traditional static data or database.
A. Dynamic
B. Infinite
C. High dimensional
D. High-speed
E. Non-repetitive
New instances with new class labels may appear at any time.
Dynamic feature sets
Noise in data streams

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 4


Department of Computer Science and Engineering

Novel Class Instances


Most existing data mining techniques cannot detect and classify
such novel class instances in data streaming environments, because
they are trained on instances with a fixed number of class labels.
The classification models become less accurate as time passes.
The statistical properties of the target class, which the data mining
classifiers are trying to classify, change over time in unforeseen
ways.

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 5


Department of Computer Science and Engineering

Novel Class Instances (con.)


Novel class
y y
- - - - -
D - - - - - D X X X X X
C
y1 - - - - - - -
- - - C y1 - - - - - - -
- - -
X X X X X XX X X X X X
XX X X X X X X X X X X
X X X X X XX X X X X X
A
A X X X X X X

- - - - - - -

- - - - - - -

- - - - - - - - - - - - - - - -

- - - - - - - - - - - - - - - -
++++ ++
- - - - - - - - - - - - - - - -
++++ ++
- - - - - - - - - - - - - - - -
++ + + ++
- - - - - - - - - - - - - - - -
y ++ + + ++
- - - - - - - - - - - - - - - -
+ +++ ++ +
- - - - - - - -- - - - - + +++ ++ + y
- - - - - - - -- - - - -
++ + + + ++ +
2 ++ + + + ++ +
2
+++++ ++++ +++ + + + + + + + +
+ ++ + + ++ ++ + + + + + + + + +
B +++++ ++++ +++ + + + + + + + +
+ ++ + + ++ ++ + B
+ + + + + + + +

x1 x x1 x
Classification rules:
R1. if (x > x1 and y < y2) or (x < x1 and y < y1) then class = +
R2. if (x > x1 and y > y2) or (x < x1 and y > y1) then class = -
Existing classification models misclassify novel class instances

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 6


Department of Computer Science and Engineering

Data Streams Mining Models


Single model incremental classification
It incrementally update a single classifier with new data to cope with
the evolution of the data stream. Usually its requires complex operations to
modify the internal structure of the classifier. In some cases, this technique
perform poorly.
Ensemble model based classification
Its a combination or a set of classifiers, i.e. it combines a series of
classifiers with the intention to create an improved composite model. It has
higher classification accuracy rates than single model classification techniques.

Fig. An example of Ensemble classifier.

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 7


Department of Computer Science and Engineering

Adaptive Ensemble Classifier


The proposed adaptive ensemble classifier addresses several challenges in data stream
classifications:
1. Used traditional mining classifiers (Decision tree with Clustering).
2. Updates itself automatically (so that it represents the most recent concepts in
data streams).
3. Handled infinite length problem (dividing a data stream into equal-sized sub-
stream).
4. A new instance is classified by the majority of weighted voting among the
classifiers in M.
5. The instances of each sub-stream are labeled by M and then a new classifier is
trained with the most recent dataset.
6. The classifier with the smallest weight reflecting the minimum classification
accuracy rate in M is chosen for replacement.

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 8


Department of Computer Science and Engineering

Adaptive Ensemble Classifier (con.)

Fig. Flow charts of the proposed adaptive ensemble classifier.

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 9


Department of Computer Science and Engineering

Adaptive Ensemble Classifier (con.)

Algorithms:
A. Algorithm 1. Instance Weighting using NB Classifier
B. Algorithm 2. Decision Tree Learning
C. Algorithm 3. Similarity Based Clustering
D. Algorithm 4. The Adaptive Ensemble Classifier
E. Algorithm 5. Classification and Novel Class Detection

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 10


Department of Computer Science and Engineering

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 11


Department of Computer Science and Engineering

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 12


Department of Computer Science and Engineering

Experimental Analysis
Used Symbols and Terms

Symbol Term
The equations used to evaluate learning
N Total instances in the data stream
algorithms:
Nc Total novel class instances in the data stream
Fn *100
Total existing class instances misclassified
M new =
Fp Nc
as novel class
Fn Total novel class instances misclassified as
existing class Fp *100
Fnew =
Fe Total existing class instances misclassified N Nc
Mnew % of novel class instances misclassified as
existing class
( Fp + Fn + Fe ) *100
Fnew % of existing class instances falsely ERR =
identified as novel class N
ERR Total misclassified error

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 13


Department of Computer Science and Engineering

Experimental Analysis (con.)


We implement our algorithm in Java.

The code for decision tree has been adapted from the Weka machine learning open
source repository (http://www.cs.waikato.ac.nz/ml/weka).

Weka contains tools for data pre-processing, classification, regression, clustering,


association rules, and visualization.

The experiments were run on an Intel Core 2 Duo Processor 2.0 GHz processor (2 MB
Cache, 800 MHz FSB) with 1 GB of RAM.

Datasets from UCI Machine Learning Repository


Dataset No of Attribute No of Instances No of Class
Attributes Types Attribute
NSL-KDD 41 Real & Nominal 25192 23
Dataset
Large Soybean 35 Nominal 683 19
Database
Image 19 Real 2310 7
Segmentation
Data

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 14


Department of Computer Science and Engineering

Experimental Analysis

Fig. The comparison of accuracy rate of classifiers


in concept drifting using the NSL-KDD dataset.

Fig. The ROC curves of classifiers using


the entire NSL-KDD dataset.

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 15


Department of Computer Science and Engineering

Experimental Analysis (con.)

Fig. The comparison of accuracy rate of classifiers


in concept drifting using the soybean dataset.

Fig. The ROC curves of classifiers using


the entire soybean dataset.

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 16


Department of Computer Science and Engineering

Experimental Analysis (con.)

Fig. The comparison of accuracy rate of


classifiers in concept drifting using the image
segmentation dataset.

Fig. The ROC curves of classifiers using


the entire image dataset.

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 17


Department of Computer Science and Engineering

Experimental Analysis (con.)

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 18


Department of Computer Science and Engineering

Conclusion & Future Works


Our work addresses Future works

A. Infinite length A. Concept drifting under


dynamic feature sets
B. Limited labeled data
B. Test instance weighting
C. Concept drift approach
D. Concept-evolution C. Selecting the most
informative test instances

Publication:
Dewan Md. Farid, Li Zhang, Alamgir Hossain, Chowdhury Mofizur Rahman, Rebecca
Strachan, Graham Sexton, and Keshav Dahal, An Adaptive Ensemble Classifier for
Mining Concept Drifting Data Streams, Expert Systems with Applications, Elsevier
Science, Vol. 40, Issue 15, 1 November 2013, pp. 5895-5906 (Impact Factor: 2.981).

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 19


Department of Computer Science and Engineering

Questions

http://cse.uiu.ac.bd Dr. Dewan Md. Farid, Associate Professor, Dept. of CSE 20

You might also like