Professional Documents
Culture Documents
Anomaly Detection
Tan,Steinbach, Kumar
4/18/2004
Anomaly/Outlier Detection
Applications:
Credit card fraud detection, telecommunication fraud detection,
network intrusion detection, fault detection
Tan,Steinbach, Kumar
4/18/2004
Anomaly Detection
Challenges
assumption:
Tan,Steinbach, Kumar
4/18/2004
General Steps
Build a profile of the normal behavior
Tan,Steinbach, Kumar
4/18/2004
Graphical Approaches
Limitations
Time consuming
Subjective
Tan,Steinbach, Kumar
4/18/2004
Statistical Approaches
Tan,Steinbach, Kumar
4/18/2004
Grubbs Test
Detect
test statistic:
H0 if:
Tan,Steinbach, Kumar
( N 1)
G
N
max X X
s
t (2 / N , N 2 )
N 2 t (2 / N , N 2 )
4/18/2004
Statistical-based Likelihood
Approach
General Approach:
Initially, assume all the data points belong to M
Let Lt(D) be the log likelihood of D at time t
For each point xt that belongs to M, move it to A
Tan,Steinbach, Kumar
4/18/2004
Statistical-based Likelihood
Approach
distribution, D = (1 ) M + A
M is a probability distribution estimated from data
Data
Lt ( D ) PD ( xi ) (1 ) PM t ( xi ) PAt ( xi )
i 1
xi M t
xi At
xi M t
Tan,Steinbach, Kumar
| At |
xi At
4/18/2004
Distance-based Approaches
Tan,Steinbach, Kumar
4/18/2004
10
Approach:
Compute the distance between every pair of data
points
There are various ways to define outliers:
Data
The
The
Tan,Steinbach, Kumar
4/18/2004
11
In the NN approach, p2 is
not considered as
outlier, while LOF
approach find both p1
and p2 as outliers
p2
Tan,Steinbach, Kumar
p1
Introduction to Data Mining
4/18/2004
12
Clustering-Based
Basic idea:
Cluster the data into
groups of different density
Choose points in small
cluster as candidate
outliers
Compute the distance
between candidate points
and non-candidate
clusters.
If
Tan,Steinbach, Kumar
4/18/2004
13