Outline
Introduction
Aspects of Anomaly Detection Problem
Applications
Different Types of Anomaly Detection
Case Studies
Discussion and Conclusions
Introduction
We are drowning in the deluge
of data that are being collected
world-wide, while starving for
knowledge at the same time*
Anomalous events occur
relatively infrequently
However, when they do occur,
their consequences can be quite
dramatic and quite often in a
negative sense
* - J. Naisbitt, Megatrends: Ten New Directions Transforming Our Lives. New York: Warner Books,
1982.
Cyber Intrusions
A web server involved in FTP
traffic
Simple Example
N1 and N2 are
regions of normal
behavior
Points o1 and o2
are anomalies
Points in region O3
are anomalies
[Figure: regions N1 and N2 of normal points, with point anomalies o1 and o2 and an anomalous region O3]
Related problems
Rare Class Mining
Chance discovery
Novelty Detection
Exception Mining
Noise Removal
Black Swan*
* N. Taleb, The Black Swan: The Impact of the Highly Improbable, 2007
Key Challenges
Defining a representative normal region is
challenging
The boundary between normal and outlying behavior
is often not precise
The exact notion of an outlier is different for different
application domains
Availability of labeled data for training/validation
Malicious adversaries
Data might contain noise
Normal behavior keeps evolving
Input Data
Most common form of data handled by anomaly
detection techniques is Record Data
Univariate
Multivariate
Attributes can be continuous, categorical, or binary

[Table: multivariate connection records with attributes Tid, SrcIP, Start time, Dest IP, Dest Port, Number of bytes, Attack. All ten connections use Dest Port 139; byte counts are 192, 195, 180, 199, 19, 177, 172, 285, 195, 163 with Attack labels No, No, No, No, Yes, No, No, Yes, No, Yes.]

Tid  SrcIP            Duration  Dest IP          Number of bytes  Internal
1    206.163.37.81      0.10    160.94.179.208   150              No
2    206.163.37.99      0.27    160.94.179.235   208              No
3    160.94.123.45      1.23    160.94.179.221   195              Yes
4    206.163.37.37    112.03    160.94.179.253   199              No
5    206.163.37.41      0.32    160.94.179.244   181              No

Sequence data (e.g., a genome fragment):
GGTTCCGCCTTCAGCCCCGCGCC
CGCAGGGCCCGCCCCGCGCCGTC
GAGAAGGGCCCGCCTGGCGGGCG
GGGGGAGGCGGGGCCGCCCGAGC
CCAACCGAGTCCGACCAGGTGCC
CCCTCTGCTCGGCCTAGACCTGA
GCTCATTAGGCGGCAGCGGACAG
GCCAAGTAGAACACGCGAAGCGC
TGGGCTGCCTGCTGCGACCAGGG
Data Labels
Supervised Anomaly Detection
Labels available for both normal data and
anomalies
Similar to rare class mining
Type of Anomaly
Point Anomalies
Contextual Anomalies
Collective Anomalies
Point Anomalies
An individual data instance is anomalous
w.r.t. the data
[Figure: point anomalies o1 and o2 and region O3 relative to normal regions N1 and N2]
Contextual Anomalies
An individual data instance is anomalous within a context
Requires a notion of context
Also referred to as conditional anomalies*
[Figure: a value labeled Anomaly in its context, while the same value elsewhere is labeled Normal]
* Xiuyao Song, Mingxi Wu, Christopher Jermaine, Sanjay Ranka, Conditional Anomaly Detection, IEEE
Transactions on Data and Knowledge Engineering, 2006.
Collective Anomalies
A collection of related data instances is anomalous
Requires a relationship among data instances
Sequential Data
Spatial Data
Graph Data
[Figure: a time series with an anomalous subsequence highlighted]
Score
Each test instance is assigned an anomaly score
Allows the output to be ranked
Requires an additional threshold parameter
F measure
Recall (R) = TP / (TP + FN)
Precision (P) = TP / (TP + FP)
F = 2*R*P / (R + P)
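The measures above can be computed directly from the detection counts; a minimal sketch (the function name is illustrative):

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall, and F-measure from detection counts."""
    p = tp / (tp + fp)  # fraction of flagged instances that are true anomalies
    r = tp / (tp + fn)  # fraction of true anomalies that were flagged
    f = 2 * r * p / (r + p)
    return p, r, f

# e.g. 8 anomalies flagged correctly, 2 false alarms, 2 missed anomalies
p, r, f = precision_recall_f(tp=8, fp=2, fn=2)
```

With these counts all three measures equal 0.8; lowering the score threshold trades precision for recall.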
Intrusion Detection
Intrusion Detection:
Process of monitoring the events occurring in a computer system or
network and analyzing them for intrusions
Intrusions are defined as attempts to bypass the security
mechanisms of a computer or network
Challenges
Traditional signature-based intrusion detection
systems are based on signatures of known
attacks and cannot detect emerging cyber threats
Substantial latency in deployment of newly
created signatures across the computer system
Fraud Detection
Fraud detection refers to the detection of criminal activities
occurring in commercial organizations
Malicious users might be actual customers of the organization
or might be posing as customers (also known as identity theft).
Types of fraud
Challenges
Fast and accurate real-time detection
Misclassification cost is very high
Healthcare Informatics
Detect anomalous patient records
Indicate disease outbreaks, instrumentation
errors, etc.
Key Challenges
Only normal labels available
Misclassification cost is very high
Data can be complex: spatio-temporal
Key Challenges
Data is extremely large, noisy, and unlabeled
Most applications exhibit temporal behavior
Detecting anomalous events typically requires immediate intervention
Image Processing
Key Challenges
Detecting collective anomalies
Data sets are very large
Anomaly
Taxonomy*
Anomaly Detection
Classification Based
Clustering Based
Statistical
Others
Rule Based
Density Based
Parametric
Distance Based
Non-parametric
SVM Based
Contextual Anomaly
Detection
Visualization Based
Collective Anomaly
Detection
Online Anomaly
Detection
Distributed Anomaly
Detection
* Outlier Detection: A Survey, Varun Chandola, Arindam Banerjee, and Vipin Kumar, Technical Report
TR07-17, University of Minnesota (under review)
Drawbacks:
Supervised classification techniques
Require labels from both the normal and anomaly classes
Cannot detect unknown and emerging anomalies
Association rules
Rules with support higher than a pre-specified threshold may characterize
normal behavior [Barbara01, Otey03]
An anomalous data record occurs in fewer frequent itemsets than a normal
data record [He04]
Frequent episodes for describing temporal normal behavior [Lee00, Qin04]
Case-specific feature weighting [Cardie97] - Decision tree learning, where for
each rare-class test example the global weight vector is replaced with a dynamically
generated weight vector that depends on the path taken by that example
Case-specific rule weighting [Grzymala00] - LERS (Learning from Examples
based on Rough Sets) algorithm increases the rule strength for all rules
describing the rare class
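The frequent-itemset idea of [He04] can be sketched briefly: score each record by the fraction of mined frequent patterns it supports, so records supporting few frequent patterns are anomalous. This is a simplified illustration (only length-1 and length-2 itemsets are mined; the function name is illustrative):

```python
from collections import Counter
from itertools import combinations

def fpof_scores(records, min_support=0.5):
    """Frequent-pattern outlier sketch (after [He04]): records that
    support fewer frequent itemsets are more anomalous.
    'records' is a list of sets of items."""
    n = len(records)
    counts = Counter()
    for rec in records:
        items = sorted(rec)
        for it in items:                       # length-1 itemsets
            counts[(it,)] += 1
        for pair in combinations(items, 2):    # length-2 itemsets
            counts[pair] += 1
    frequent = {p for p, c in counts.items() if c / n >= min_support}
    # score = fraction of frequent itemsets contained in the record
    return [sum(1 for p in frequent if set(p) <= set(r)) / max(len(frequent), 1)
            for r in records]
```

A record that looks like the bulk of the data scores near 1; a record sharing no frequent pattern scores 0.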
N-phase:
remove false positives from examples covered in the P-phase
N-rules give high accuracy and significant support
Oscillatory networks
Relaxation time of oscillatory NNs is used as a criterion for novelty detection when
a new instance is presented [Ho98, Borisyuk00]
* S. Hawkins, et al. Outlier detection using replicator neural networks, DaWaK 2002.
Categories:
Distance based methods
Drawbacks
If normal points do not have a sufficient number of
neighbors, the techniques may fail
Computationally expensive
In high dimensional spaces, data is sparse and the
concept of similarity may no longer be meaningful.
Due to the sparseness, distances between any two data
records may become quite similar => each data record
may be considered a potential outlier!
Not suitable for datasets that have modes with varying density
* Knorr, Ng, Algorithms for Mining Distance-Based Outliers in Large Datasets, VLDB 98
** S. Ramaswamy, R. Rastogi, S. Kyuseok: Efficient Algorithms for Mining Outliers from
Large Data Sets, ACM SIGMOD Conf. on Management of Data, 2000.
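A minimal sketch of the distance-based idea, in the spirit of the Ramaswamy et al. score (distance to the k-th nearest neighbor); the O(n^2) pairwise-distance matrix makes the computational cost noted above explicit:

```python
import numpy as np

def kth_nn_distance(X, k=3):
    """Distance-based outlier score sketch: a point's score is its
    distance to its k-th nearest neighbor. O(n^2) pairwise distances."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)          # exclude self-distance
    return np.sort(D, axis=1)[:, k - 1]  # distance to the k-th nearest neighbor
```

Points far from all neighbors (e.g. an isolated point away from a cluster) receive the largest scores.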
For each data point q compute the distance to its k-th nearest neighbor
(k-distance)
Compute the reachability distance (reach-dist) for each data example q with
respect to data example p as:
reach-dist(q, p) = max{k-distance(p), d(q, p)}
Compute the local reachability density lrd(q) as the inverse of the average
reach-dist of q's k nearest neighbors
Compute LOF(q) as the ratio of the average local reachability density of q's
k nearest neighbors to the local reachability density of the data record q:
LOF(q) = (1 / MinPts) * Sum over p in kNN(q) of lrd(p) / lrd(q)
* Breunig, et al, LOF: Identifying Density-Based Local Outliers, KDD 2000.
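The steps above can be sketched end to end; this is a compact illustration of the LOF definitions, not the original implementation:

```python
import numpy as np

def lof_scores(X, k=3):
    """LOF sketch following the definitions above (Breunig et al.):
    reach-dist(q, p) = max(k-distance(p), d(q, p));
    lrd(q) = 1 / mean reach-dist from q to its k nearest neighbors;
    LOF(q) = mean over neighbors p of lrd(p) / lrd(q)."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    knn = np.argsort(D, axis=1)[:, :k]          # indices of k nearest neighbors
    kdist = D[np.arange(n), knn[:, -1]]         # k-distance of each point
    reach = np.maximum(kdist[knn], D[np.arange(n)[:, None], knn])
    lrd = 1.0 / reach.mean(axis=1)              # local reachability density
    return lrd[knn].mean(axis=1) / lrd          # ~1 for inliers, >> 1 for outliers
```

A point inside a cluster gets a score near 1; a point whose neighborhood is much sparser than its neighbors' gets a score well above 1.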
In the NN approach, p2 is not considered an outlier,
while the LOF approach finds both p1 and p2 as outliers
[Figure: point p1 inside a sparse cluster, point p2 near a dense cluster (its distance to its nearest neighbor shown), and point p3 at the edge of the dense cluster]
The NN approach may consider p3 an outlier, but the
LOF approach does not
ac-dist_G(p1) = Sum_{i=1}^{r-1} [ 2(r-i) / (r(r-1)) ] * dist(o_i, p_{i+1})
ac-dist_G(p1) is larger if dist(o_i, p_{i+1}) is large for small values of the index i, which
corresponds to a sparser neighborhood of the point p1.
COF identifies outliers as points whose neighborhoods are sparser than the
neighborhoods of their neighbors
* J. Tang, Z. Chen, A. W. Fu, D. Cheung, A robust outlier detection scheme for large data sets, Proc. Pacific-Asia Conf. Knowledge Discovery and Data Mining, Taipei, Taiwan, 2002.
Example: n(pi, r) = 4, with neighbor counts n(p1, r) = 3, n(p2, r) = 5, n(p3, r) = 2, n(p4, r) = 1;
the average neighborhood count is n^(pi, r, alpha) = (1 + 3 + 5 + 2) / 4 = 2.75
Drawbacks
Computationally expensive
Using indexing structures (k-d tree, R* tree) may alleviate this
problem
FindOut
FindOut algorithm* is a by-product of WaveCluster
Main idea: remove the clusters from the original data and then
identify the outliers
Transform data into multidimensional signals using the wavelet
transformation
High-frequency parts of the signals correspond to regions where there is a
rapid change in the distribution boundaries of the clusters
Low-frequency parts correspond to
the regions where the data is
concentrated
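FindOut itself relies on the wavelet transform, but the general clustering-based idea can be sketched much more simply: cluster the data, then score each point by its distance to the nearest cluster centroid. The sketch below uses a few Lloyd (k-means) iterations with a deterministic initialization on the first k points; it is an illustration of the category, not of FindOut:

```python
import numpy as np

def cluster_outlier_scores(X, k=2, iters=20):
    """Clustering-based outlier sketch (not FindOut's wavelet method):
    run a few Lloyd/k-means iterations, then score each point by its
    distance to the nearest centroid. Initialization on the first k
    points keeps the sketch deterministic."""
    centroids = X[:k].astype(float).copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
    return d.min(axis=1)  # high score = far from every cluster
```

Points that belong to no cluster end up far from all centroids and receive the highest scores; note this inherits the varying-density weakness discussed above.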
Challenges
With high dimensions, difficult to estimate distributions
Parametric assumptions often do not hold for real data sets
Non-parametric Techniques
Do not assume any knowledge of parameters
Use non-parametric techniques to learn a distribution, e.g., Parzen
window estimation
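A minimal Parzen-window sketch: estimate the density at each point with a Gaussian kernel over the remaining points and flag low-density points as anomalous. The bandwidth h is a free parameter of this illustration:

```python
import numpy as np

def parzen_scores(X, h=1.0):
    """Parzen-window density sketch: leave-one-out Gaussian kernel
    density estimate; lower density means more anomalous, so the
    score returned is the negated density."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    d = X.shape[1]
    K = np.exp(-D2 / (2 * h * h)) / (2 * np.pi * h * h) ** (d / 2)
    np.fill_diagonal(K, 0.0)             # leave-one-out: exclude the point itself
    dens = K.sum(axis=1) / (len(X) - 1)
    return -dens                         # highest score = lowest density
```

Because no parametric form is assumed, this sidesteps the distributional-assumption challenge above, at the cost of choosing h.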
SmartSifter (SS)*
Uses Finite Mixtures
SS uses a probabilistic model as a representation of underlying
mechanism of data generation.
Histogram density used to represent a probability density for categorical
attributes
SDLE (Sequentially Discounting Laplace Estimation) for learning
histogram density for categorical domain
Finite mixture model used to represent a probability density for continuous
attributes
SDEM (Sequentially Discounting Expectation and Maximizing) for
learning finite mixture for continuous domain
Advantage
Operate in an unsupervised mode
Challenges
Require an information-theoretic measure sensitive enough to detect
the irregularity induced by very few outliers
Spectral Techniques
Analysis based on eigen decomposition of data
Key Idea
Find combination of attributes that capture bulk of variability
Reduced set of attributes can explain normal data well, but not necessarily the
outliers
Advantage
Can operate in an unsupervised mode
Disadvantage
Based on the assumption that anomalies and normal instances are
distinguishable in the reduced space
Disadvantages
Works well only for low dimensional data
Can provide only aggregated or partial views for high
dimensional data
* Haslett, J. et al. Dynamic graphics for exploring spatial data with application to locating global and local anomalies.
The American Statistician
* Cox et al 1997. Visual data mining: Recognizing telephone calling fraud. Journal of Data Mining and
Knowledge Discovery
Assumption
All normal instances within a context will be
similar (in terms of behavioral attributes), while
the anomalies will be different
Challenges
Identifying a set of good contextual attributes
Determining a context using the contextual
attributes
Contextual Attributes
Contextual attributes define a neighborhood (context)
for each instance
For example:
Spatial Context
Latitude, Longitude
Graph Context
Edges, Weights
Sequential Context
Position, Time
Profile Context
User demographics
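The assumption above (normal instances are similar within a context) can be sketched with per-context z-scores: compare each behavioral value only against records sharing its contextual attribute. The function name and data layout are illustrative:

```python
import numpy as np

def contextual_zscores(contexts, values):
    """Contextual anomaly sketch: z-score of each behavioral value
    computed only within its context group (e.g. month of year).
    'contexts' and 'values' are parallel arrays."""
    contexts = np.asarray(contexts)
    values = np.asarray(values, dtype=float)
    scores = np.empty_like(values)
    for c in np.unique(contexts):
        mask = contexts == c
        mu, sigma = values[mask].mean(), values[mask].std()
        scores[mask] = np.abs(values[mask] - mu) / (sigma + 1e-12)
    return scores
```

A temperature of 2 degrees is unremarkable in a "winter" context but gets a large score in a "summer" context, which is exactly the conditional-anomaly behavior described here.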
A mapping p(Vj|Ui) is learnt that indicates the probability of the behavioral part
being generated by component Vj when the contextual part is generated by
component Ui
* Xiuyao Song, Mingxi Wu, Christopher Jermaine, Sanjay Ranka, Conditional Anomaly Detection, IEEE
Transactions on Data and Knowledge Engineering, 2006.
Testing
Extract fixed length subsequences from the test sequence
Find empirical probability of each test subsequence from the above counts
If probability for a subsequence is below a threshold, the subsequence is
declared as anomalous
Number of anomalous subsequences in a test sequence is its anomaly
score
* Warrender, Christina, Stephanie Forrest, and Barak Pearlmutter. Detecting Intrusions Using System Calls: Alternative Data
Models. 1999 IEEE Symposium on Security and Privacy, 1999.
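The testing procedure above can be sketched with fixed-length sliding windows; this is a simplified illustration of the Warrender et al. system-call modeling, with illustrative function names and threshold:

```python
from collections import Counter

def train_window_counts(train_seq, w=3):
    """Empirical probabilities of fixed-length subsequences (windows)
    observed in normal training data."""
    grams = [tuple(train_seq[i:i + w]) for i in range(len(train_seq) - w + 1)]
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}

def anomaly_score(test_seq, probs, w=3, threshold=0.01):
    """Anomaly score of a test sequence: the number of its windows whose
    empirical probability under the training counts is below threshold."""
    windows = [tuple(test_seq[i:i + w]) for i in range(len(test_seq) - w + 1)]
    return sum(1 for g in windows if probs.get(g, 0.0) < threshold)
```

A test sequence drawn from the normal pattern scores 0; one containing an unseen symbol produces several low-probability windows and a positive score.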
[Figure: a data stream divided into time slots 1 ... t; Di denotes the data in time slot i and Di+1 the data in time slot (i+1)]
IncrementalLOF_insertion(Dataset S)
Given: set S = {p1, ..., pN}, pi in R^D, where D corresponds
to the dimensionality of the data records.
For each data point pc inserted into data set S
  insert(pc)
  Compute kNN(pc)
  for each (pj in kNN(pc))
    compute reach-dist_k(pc, pj) using Eq. (1);
  // Update neighbors of pc
  S_update_k_distance = kRNN(pc);
  for each (pj in S_update_k_distance)
    update k-distance(pj) using Eq. (5);
  S_update_lrd = S_update_k_distance;
  for each (pj in S_update_k_distance), (pi in kNN(pj) \ {pc})
    reach-dist_k(pi, pj) = k-distance(pj);
    if pj in kNN(pi)
      S_update_lrd = S_update_lrd U {pi};
  S_update_LOF = S_update_lrd;
  for each (pm in S_update_lrd)
    update lrd(pm) using Eq. (2);
    S_update_LOF = S_update_LOF U kRNN(pm);
  for each (pl in S_update_LOF)
    update LOF(pl) using Eq. (3);
  compute lrd(pc) using Eq. (2);
  compute LOF(pc) using Eq. (3);
End // for
[Chart: incidents per year, 1990-2003, growing from near zero toward 100,000]
[Diagram: scanning activity - an attacker scans the computer network and compromises a machine with a vulnerability]
Intrusion Detection
Intrusion Detection System
combination of software and hardware that attempts
to perform intrusion detection
raises an alarm when a possible intrusion happens
Traditional intrusion detection system (IDS) tools (e.g. SNORT) are based
on signatures of known attacks
Example of SNORT rule (MS-SQL Slammer worm)
any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|";
content:"sock"; content:"send")
www.snort.org
Limitations
Signature database has to be manually revised for each new type of
discovered intrusion
They cannot detect emerging cyber threats
Substantial latency in deployment of newly created signatures across the
computer system
Anomaly detection
Detect novel attacks as deviations from normal behavior
Potential high false alarm rate - previously unseen (yet legitimate) system
behaviors may also be recognized as anomalies
Misuse Detection
Building Predictive Models
[Diagram: Training Set -> Learn Classifier -> Model, applied to a Test Set; the labeled connection records have categorical and continuous attributes (Tid, SrcIP, Start time, Dest IP, Dest Port, Number of bytes) and a class attribute (Attack). The same pipeline is contrasted for misuse detection and anomaly detection.]
Training records: Dest Port 139 throughout, with byte counts 192, 195, 180, 199, 19, 177, 172, 285, 195, 163 and Attack labels No, No, No, No, Yes, No, No, Yes, No, Yes.
Test records: byte counts 150, 208, 195, 199, 181, with Attack = ?
Summarization of attacks using association rules
Rules Discovered:
{SrcIP = 206.163.37.95, Dest Port = 139, Bytes in [150, 200]} --> {ATTACK}
[Diagram: MINDS - the Minnesota Intrusion Detection System. Network traffic is captured (tcpdump), filtered, and passed to feature extraction. A known-attack detection module outputs detected known attacks; an anomaly detection module scores the remaining connections for a human analyst, whose labels feed association pattern analysis (MINDSAT) for summary and characterization of attacks, yielding detected novel attacks.]
Feature Extraction
Three groups of features
Basic features of individual TCP connections
Features 1 & 2
Features 3 & 4
Feature 5
Feature 6
Feature 7
Feature 8
[Table: connection records with attributes dst, service, flag. Ten http connections with flag S0 (a SYN flood) appear alongside normal ftp connections; the existing basic features are useless for separating them, so features with high information gain must be constructed.]
For the same source (destination) IP address, number of unique destination (source)
IP addresses inside the network in the last T seconds - Features 9 (13)
Number of connections from the same source (destination) IP to the same destination (source)
port in the last T seconds - Features 11 (15)
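A time-window feature of this kind can be sketched with a per-source sliding window; the function name and event layout are illustrative (cf. Feature 9, unique destination IPs per source in the last T seconds):

```python
from collections import deque

def unique_dst_last_T(events, T=5.0):
    """Sliding-window feature sketch: for each connection, the number of
    unique destination IPs its source IP contacted within the preceding
    T seconds. 'events' is a time-ordered list of (time, src_ip, dst_ip)."""
    window = {}   # src_ip -> deque of (time, dst_ip) still inside the window
    feats = []
    for t, src, dst in events:
        q = window.setdefault(src, deque())
        q.append((t, dst))
        while q and t - q[0][0] > T:   # expire events older than T seconds
            q.popleft()
        feats.append(len({d for _, d in q}))
    return feats
```

A host scanning many destinations in a short burst drives this feature up sharply, which is what makes such features informative for SYN floods and scans.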
[Table: MINDS-ranked connections during the SQL Slammer worm outbreak. Most flows originate from srcIP 63.150.X.253, source port 1161, to destination port 1434 on many internal hosts (128.101.X.* and 160.94.X.*); for these flows, features 1-8 are 0 while the time-based features 9 and 11 are high (roughly 0.81-0.83 and 0.56-0.59). A few unrelated flows (e.g. from 142.150.Y.* hosts to port 2048, and from 200.250.Z.20 and 202.175.Z.237 to ports 4629 and 4148) appear in between, some with feature 15 = 1 instead.]
Anomalies/attacks picked by MINDS include scanning activities, worms, and non-standard behavior such as
policy violations and insider attacks. Many of the attacks detected by MINDS have already been on the
CERT/CC list of recent advisories and incident notes.
Scans
August 13, 2004, Detected scanning for Microsoft DS service on port 445/TCP (Ranked#1)
Reported by CERT as recent DoS attacks that need further analysis (CERT August 9, 2004)
Undetected by SNORT since the scanning was non-sequential (very slow). Rule added to SNORT in September 2004
August 13, 2004, Detected scanning for Oracle server (Ranked #2), Reported by CERT, June 13, 2004
Undetected by SNORT because the scanning was hidden within another Web scanning
October 10, 2005, Detected a distributed windows networking scan from multiple source locations (Ranked #1)
Policy Violations
August 8, 2005, Identified machine running Microsoft PPTP VPN server on non-standard ports (Ranked #1)
Undetected by SNORT since the collected GRE traffic was part of the normal traffic
August 10 2005 & October 30, 2005, Identified compromised machines running FTP servers on non-standard ports, which is a policy violation (Ranked #1)
February 6, 2006, Detected a computer on the network apparently communicating with a computer in California over a VPN or on IPv6
Worms
October 10, 2005, Detected several instances of slapper worm that were not identified by SNORT since they were variations of existing worm code
February 6, 2006, Detected unsolicited ICMP ECHOREPLY messages to a computer previously infected with the Stacheldraht worm (a DDoS agent)
Conclusions
Anomaly detection can reveal critical
information hidden in data
Highly applicable in various application
domains
The nature of the anomaly detection problem
depends on the application domain
Different approaches are needed for each
particular problem formulation
References
Ling, C., Li, C. Data mining for direct marketing: Problems and solutions, KDD, 1998.
Kubat M., Matwin, S., Addressing the Curse of Imbalanced Training Sets: One-Sided Selection, ICML
1997.
N. Chawla et al., SMOTE: Synthetic Minority Over-Sampling Technique, JAIR, 2002.
W. Fan et al, Using Artificial Anomalies to Detect Unknown and Known Network Intrusions, ICDM 2001
N. Abe, et al, Outlier Detection by Active Learning, KDD 2006
C. Cardie, N. Howe, Improving Minority Class Prediction Using Case specific feature weighting, ICML
1997.
J. Grzymala et al, An Approach to Imbalanced Data Sets Based on Changing Rule Strength, AAAI
Workshop on Learning from Imbalanced Data Sets, 2000.
George H. John. Robust linear discriminant trees. AI&Statistics, 1995
Barbara, D., Couto, J., Jajodia, S., and Wu, N. Adam: a testbed for exploring the use of data mining in
intrusion detection. SIGMOD Rec., 2001
Otey, M., Parthasarathy, S., Ghoting, A., Li, G., Narravula, S., and Panda, D. Towards nic-based intrusion
detection. KDD 2003
He, Z., Xu, X., Huang, J. Z., and Deng, S. A frequent pattern discovery method for outlier detection. Web-Age Information Management, 726-732, 2004
Lee, W., Stolfo, S. J., and Mok, K. W. Adaptive intrusion detection: A data mining approach. Artificial
Intelligence Review, 2000
Qin, M. and Hwang, K. Frequent episode rules for internet anomaly detection. In Proceedings of the 3rd
IEEE International Symposium on Network Computing and Applications, 2004
Ide, T. and Kashima, H. Eigenspace-based anomaly detection in computer systems. KDD, 2004
Sun, J. et al., Less is more: Compact matrix representation of large sparse graphs. ICDM 2007
References
Lee, W. and Xiang, D. Information-theoretic measures for anomaly detection. In Proceedings of the
IEEE Symposium on Security and Privacy. IEEE Computer Society, 2001
Ratsch, G., Mika, S., Scholkopf, B., and Muller, K.-R. Constructing boosting algorithms from SVMs: An
application to one-class classification. IEEE Trans. on Pattern Analysis and Machine Intelligence, 2002
Eskin, E., Arnold, A., Prerau, M., Portnoy, L., and Stolfo, S. A geometric framework for unsupervised
anomaly detection. In Proceedings of Applications of Data Mining in Computer Security, 2002
Scholkopf, B., Platt, O., Shawe-Taylor, J., Smola, A., and Williamson, R. Estimating the support of a
high-dimensional distribution. Tech. Rep. 99-87, Microsoft Research, 1999
Baker, D. et al., A hierarchical probabilistic model for novelty detection in text. ICML 1999
Das, K. and Schneider, J. Detecting anomalous records in categorical datasets. KDD 2007
Augusteijn, M. and Folkert, B. Neural network classification and novelty detection. International Journal
on Remote Sensing, 2002
Sykacek, P. Equivalent error bars for neural network classifiers trained by Bayesian inference. In
Proceedings of the European Symposium on Artificial Neural Networks, 121-126, 1997
Vasconcelos, G. C., Fairhurst, M. C., and Bisset, D. L. Investigating feedforward neural networks with
respect to the rejection of spurious patterns. Pattern Recognition Letter, 1995
References
S. Hawkins, et al. Outlier detection using replicator neural networks, DaWaK 2002.
Dasgupta, D. and Nino, F. 2000. A comparison of negative and positive selection algorithms in novel
pattern detection. IEEE International Conference on Systems, Man, and Cybernetics, 2000
Caudell, T. and Newman, D. An adaptive resonance architecture to define normality and detect novelties
in time series and databases. World Congress on Neural Networks, 1993
Albrecht, S. et al. Generalized radial basis function networks for classification and novelty detection:
self-organization of optional Bayesian decision. Neural Networks, 2000
Steinwart, I., Hush, D., and Scovel, C. A classification framework for anomaly detection. JMLR, 2005
Srinivas Mukkamala et al. Intrusion Detection Systems Using Adaptive Regression Splines. ICEIS 2004
Li, Y., Pont et al. Improving the performance of radial basis function classifiers in condition monitoring
and fault diagnosis applications where unknown faults may occur. Pattern Recognition Letters, 2002
Borisyuk, R. et al. An oscillatory neural network model of sparse distributed memory and novelty
detection. Biosystems, 2000
Ho, T. V. and Rouat, J. Novelty detection based on relaxation time of a network of integrate-and-fire
neurons. Proceedings of Second IEEE World Congress on Computational Intelligence, 1998
Thanks!!!
Questions?