Professional Documents
Culture Documents
Presented By
Dr. Dewan Md. Farid
Associate Professor
Department of Computer Science & Engineering
United International University, Bangladesh
Network traffic
Sensor data
Research Challenges
Existing mining classifiers need to be updated frequently to adapt
to the changes in data streams.
A data stream has very diverse characteristics compared to the
traditional static data or database.
A. Dynamic
B. Infinite
C. High dimensional
D. High-speed
E. Non-repetitive
New instances with new class labels may appear at any time.
Dynamic feature sets
Noise in data streams
Algorithms:
A. Algorithm 1. Instance Weighting using NB Classifier
B. Algorithm 2. Decision Tree Learning
C. Algorithm 3. Similarity Based Clustering
D. Algorithm 4. The Adaptive Ensemble Classifier
E. Algorithm 5. Classification and Novel Class Detection
Experimental Analysis
Used Symbols and Terms
Symbol Term
The equations used to evaluate learning
N Total instances in the data stream
algorithms:
Nc Total novel class instances in the data stream
Fn *100
Total existing class instances misclassified
M new =
Fp Nc
as novel class
Fn Total novel class instances misclassified as
existing class Fp *100
Fnew =
Fe Total existing class instances misclassified N Nc
Mnew % of novel class instances misclassified as
existing class
( Fp + Fn + Fe ) *100
Fnew % of existing class instances falsely ERR =
identified as novel class N
ERR Total misclassified error
The code for decision tree has been adapted from the Weka machine learning open
source repository (http://www.cs.waikato.ac.nz/ml/weka).
The experiments were run on an Intel Core 2 Duo Processor 2.0 GHz processor (2 MB
Cache, 800 MHz FSB) with 1 GB of RAM.
Experimental Analysis
Publication:
Dewan Md. Farid, Li Zhang, Alamgir Hossain, Chowdhury Mofizur Rahman, Rebecca
Strachan, Graham Sexton, and Keshav Dahal, An Adaptive Ensemble Classifier for
Mining Concept Drifting Data Streams, Expert Systems with Applications, Elsevier
Science, Vol. 40, Issue 15, 1 November 2013, pp. 5895-5906 (Impact Factor: 2.981).
Questions