
2012 2nd International Conference on Computer Science and Network Technology

A Novel Approach to Intrusion Detection Based on Fast Incremental SVM
Qi Mu, Yongjun Zhang, Qian Niu
School of Computer, Xi'an University of Science and Technology
Xi'an, China
E-mail: azhangyongjun@126.com

Abstract- A new incremental SVM algorithm for intrusion detection based on the cloud model is proposed to address the low efficiency of border vector extraction. In this algorithm, the characteristic distance between heterogeneous samples is mapped into a membership degree to extract the boundary vectors from the initial dataset, which reflects the stability and uncertainty characteristics of the cloud model. The possible changes of the support vector set after new samples are added are also analyzed, and useless samples are discarded according to the analysis results. Theoretical analysis and simulation results show that the detection speed is greatly improved while a high detection performance is maintained.
Keywords- Intrusion Detection; Support Vector Machine (SVM); Incremental Learning; Cloud Model; Boundary Vectors

I. INTRODUCTION

Support vector machine (SVM) [1] is a machine learning algorithm based on statistical learning theory, proposed by Vapnik. It is widely used in the field of intrusion detection because of its strong nonlinear processing and generalization capability. For intrusion detection systems, however, an overly large training set and continuously arriving new samples inevitably lead to long training times and affect the classification accuracy of further training. As an effective way to process continuously updated data, incremental learning retains the previous results, studies only the newly added data, and forms a continuous learning process. Incremental SVM, applied in intrusion detection systems, makes full use of the results of historical training to effectively solve the memory problems caused by huge datasets. At the same time, it alleviates the low classification accuracy and long training time that result from the emergence of new samples.

Syed [2] first proposed an SVM-based incremental learning algorithm by analyzing the support vectors (SVs) of the sample set. Liu Ye [3] proposed a DoS intrusion detection algorithm based on incremental SVM (ISVM), which retains only the SVs and discards all non-SV samples. But as new samples arrive, non-SVs and SVs transform into each other, and discarding non-SV samples too quickly harms the classification accuracy of incremental learning; in particular, when initial samples are scarce, subsequent learning may be unstable. Liu Yeqing et al. [4] proposed an incremental SVM learning algorithm based on the nearest border vectors (referred to as N-ISVM), which alleviates the over-discarding problem by using a small extension set (the nearest border vectors) instead of the SV set. However, in practical applications, a classifier trained on the nearest vectors may suffer from fitting problems caused by overfitting samples. Clearly, extracting as few boundary samples as possible while preserving the original classification information as completely as possible is the key to incremental SVM research.

On this basis, this paper improves the extraction of support vectors and proposes a new incremental SVM method based on the cloud model (referred to as C-ISVM) by identifying cloud boundary areas [5]. For the initial set, the distances between each sample and all of its heterogeneous samples are first calculated in the feature space and then mapped into membership degrees to effectively extract the boundary vectors, according to the characteristics of the cloud model: uncertainty with certainty, and stability with variability. For the incremental set, both the samples violating the KKT conditions and the ones satisfying them but lying near the classification plane are retained. Combining the two parts and training them completes the final incremental SVM classifier.


II. CLOUD THEORY

The cloud model [6-7], proposed by Professor Li Deyi, expresses the uncertain transformation between a qualitative concept and quantitative data in natural-language values, and provides a powerful tool for jointly processing qualitative and quantitative information.

Definition 1 Let U be a quantitative domain expressed by numerical values, and let C be a qualitative concept on U. If the quantitative value x ∈ U is a random realization of C, and the certainty degree μ(x) ∈ [0, 1] of x to C is a random number with a stable tendency,

    μ: U → [0, 1], ∀x ∈ U, x ↦ μ(x),

then the distribution of x over the domain U is called a cloud, denoted C(X), and each x is called a cloud droplet.

When μ(x) follows a normal distribution, the cloud is called a normal cloud model [6-7]: a set of normally distributed random numbers with a stable tendency, determined by three digital characteristics, the expected value Ex, the entropy En, and the hyper-entropy He, as shown in Figure 1. An algorithm or hardware device that generates cloud droplets is called a cloud generator.
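To make the three digital characteristics concrete, the following is a minimal Python sketch of the standard forward normal cloud generator (the function name and defaults are ours, not from the paper): each droplet draws an entropy sample En′ from N(En, He²), then a value x from N(Ex, En′²), with certainty degree μ(x) = exp(−(x − Ex)²/(2 En′²)).

```python
import numpy as np

def forward_normal_cloud(Ex, En, He, n, seed=None):
    """Generate n cloud droplets (x, mu) of a normal cloud C(Ex, En, He).

    A sketch of the usual forward normal cloud generator; names and the
    non-positive-entropy guard are our assumptions.
    """
    rng = np.random.default_rng(seed)
    # He (hyper-entropy) controls how much the entropy itself varies.
    En_prime = np.abs(rng.normal(En, He, size=n))
    # Each droplet is a random realization of the qualitative concept.
    x = rng.normal(Ex, En_prime)
    # Certainty degree of each droplet with respect to the concept.
    mu = np.exp(-(x - Ex) ** 2 / (2.0 * En_prime ** 2))
    return x, mu

# Example: droplets concentrate around Ex, with fuzzier edges as He grows.
droplets, degrees = forward_normal_cloud(Ex=0.0, En=1.0, He=0.1, n=1000, seed=42)
```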



Figure 1. The diagram of the three digital characteristics (Ex, En, He) of the cloud model.

Algorithm 1 X-condition cloud generator [8]
input: the three digital characteristics {Ex, En, He}, the number of droplets n, and the value x0
output: {(x0, μ1), ..., (x0, μn)}
for i = 1 to n:
    En′ = normrnd(En, He)
    μi = exp(−(x0 − Ex)² / (2 En′²))
    drop(x0, μi)

III. IDENTIFYING CLOUD BOUNDARY AREAS

Support vectors uniquely determine the classification hyperplane, so finding the support vectors is the key to SVM classification. Support vectors have a distinctive geometric feature: they lie near the edge of their own class and are the samples closest to the separating hyperplane. According to this feature, the boundary vectors can be selected from the samples most likely to be SVs, in order to reduce the training set and improve training efficiency.

A. Nearest Border Vectors
Let the training set be T = {(x1, y1), ..., (xl, yl)}, and let d(xi, xj) denote the distance between xi and xj. For each xi, the nearest border vector is the heterogeneous sample xj attaining min d(xi, xj). Traversing all i yields the nearest border vector set.

Theorem 1 If the training set T is mapped to a high-dimensional feature space in which T is linearly separable, then the hyperplane defined by the border vector set of T is unique and entirely separates the two classes [4].

Theorem 1 shows that the nearest border vectors can replace the full sample set, and that the SV set is contained in them. However, because of noise data and overfitting samples, penalty parameters are introduced to reduce the deviation caused by such samples, so in practice an SV is not necessarily the point nearest to the classification plane. The method of replacing the full set with the nearest border vectors therefore needs further improvement.

B. Cloud Boundary Vectors
An SV certainly lies in the area close to the heterogeneous class, but it is not necessarily the nearest point; this agrees with the stable tendency and randomness of the cloud model. The theory of clouds is therefore introduced in this paper to extract the boundary vectors. The main process is as follows.

For a linearly inseparable training set, the data are mapped from the original space to a high-dimensional feature space H by a nonlinear mapping φ: Rⁿ → H, x ↦ φ(x).

Definition 2 Characteristic Distance Let x, y be two vectors. Under the nonlinear mapping to the feature space H, their characteristic distance is defined as

    d_H(x, y) = d(φ(x), φ(y)) = √(K(x, x) − 2K(x, y) + K(y, y)),    (1)

where K(·,·) is a kernel function satisfying the Mercer condition.

Definition 3 Inter-class Characteristic Distance Matrix Using (1), calculate the distance Dij between each sample i and each heterogeneous sample j. Traversing all samples of the two classes defines the C₁ × C₂ matrix D as the inter-class characteristic distance matrix, where C₁ and C₂ are respectively the numbers of samples in the two classes.

Let D_min^i be the distance between sample i and its nearest heterogeneous sample, and let D̄^i be the average distance between sample i and all heterogeneous samples. The distance between the two nearest vectors of the two classes is then Dmin = min(Dij).

Definition 4 Cloud Membership Transform the characteristic distance Dij according to Algorithm 1:

    μij = exp(−(Dij − Ex)² / (2 En′²)),    En′ = normrnd(En, He),

where Ex = D_min^i, En = D̄^i − D_min^i, and He = En/c. In this paper c = N² serves as a control parameter, where N is the number of homogeneous samples. En′ is a normally distributed random number with expectation En and standard deviation He, so a smaller Dij does not necessarily correspond to a larger μij, which reflects the uncertainty. μij is defined as the cloud membership of i to j.

Definition 5 Cloud Boundary Vectors and Cloud Boundary Areas For each sample j, find the heterogeneous sample i attaining max(μij); this sample becomes a cloud boundary vector. All the cloud boundary vectors constitute the cloud boundary areas.
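To make Definitions 2-5 concrete, here is a minimal Python sketch of the extraction pipeline: compute the inter-class characteristic distances in feature space, map them to cloud memberships with Algorithm 1, and pick the cloud boundary vectors. The function names are ours, and the RBF kernel is an assumption for illustration (the paper's experiments also use an RBF kernel).

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2), a Mercer kernel."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def characteristic_distance_matrix(X1, X2, gamma=0.5):
    """Inter-class characteristic distance matrix D (Definition 3), via (1):
    d_H(x, y) = sqrt(K(x,x) - 2K(x,y) + K(y,y)); for RBF, K(x,x) = 1."""
    sq = 2.0 - 2.0 * rbf_kernel(X1, X2, gamma)
    return np.sqrt(np.maximum(sq, 0.0))

def cloud_memberships(D, rng):
    """Map distances to cloud memberships (Definition 4 / Algorithm 1).
    Row i of D holds the distances from sample i to every heterogeneous j."""
    Ex = D.min(axis=1, keepdims=True)            # Ex = D_min^i
    En = D.mean(axis=1, keepdims=True) - Ex      # En = Dbar^i - D_min^i
    He = En / D.shape[0] ** 2                    # He = En / c, c = N^2
    En_prime = np.maximum(rng.normal(En, He), 1e-12)  # En' = normrnd(En, He)
    return np.exp(-(D - Ex) ** 2 / (2.0 * En_prime ** 2))

def cloud_boundary_vectors(X1, X2, gamma=0.5, seed=0):
    """Definition 5: for each sample j, the heterogeneous sample i with the
    largest membership mu_ij becomes a cloud boundary vector. Returns the
    boundary indices into X1 and X2."""
    rng = np.random.default_rng(seed)
    D = characteristic_distance_matrix(X1, X2, gamma)
    idx1 = np.unique(cloud_memberships(D, rng).argmax(axis=0))    # from X1
    idx2 = np.unique(cloud_memberships(D.T, rng).argmax(axis=0))  # from X2
    return idx1, idx2
```

Because En′ is redrawn for every row, two runs can select slightly different boundary sets; this is the intended randomness with a stable tendency, not a bug.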


C. Parameters and Performance Analysis
The nearest vectors merely have a higher probability of being boundary vectors. On the other hand, samples far from the classification hyperplane may contain useful information and are also added to the cloud boundary areas, even though they are not SVs in the current round of learning. The cloud boundary vectors are extracted according to the characteristic distance so as to maintain the stable tendency of the samples, retaining samples close to the heterogeneous class and eliminating the others. For each individual sample, however, the outcome is uncertain: the nearest vector will not necessarily be selected, and the farthest may be retained.

To extract the boundary vectors based on the cloud model and meet these requirements, the parameters play a vital role. The horizontal position and the steepness of the cloud are determined by Ex and En. He governs the dispersion of the cloud droplets: the greater He is, the larger the dispersion. All cloud droplets fluctuate randomly around the expectation curve, and this fluctuation is controlled by He.

First, Ex = D_min^i ensures that the vectors near the classification surface tend to receive larger membership degrees. Setting En = D̄^i − D_min^i controls the cloud coverage, so the scope of the cloud increases with the sample distances. Too large an He damages the stable tendency, while too small an He loses the randomness; when He = 0, the algorithm degenerates into the nearest border vector algorithm. In this paper the parameter c = N², controlled by the number of samples, makes He decrease as the samples increase, so as to avoid retaining too many distant samples.

Figure 2. Extraction of cloud boundary vectors (the figure marks the noise samples, the overfitting samples, and the cloud boundary vectors).

Through the cloud boundary vectors, marked with solid points, the training set is reduced as shown in Figure 2, where the two types of samples are depicted as rectangles and triangles. The final cloud boundary vector districts have no clear boundaries, in keeping with the fuzziness of the cloud model. As a result, the full sample set is effectively reduced while the distribution characteristics of the samples themselves are largely preserved.

IV. KKT CONDITIONS AND SAMPLE DISTRIBUTIONS

If the new samples contain classification information that the original sample set does not, the SV set is bound to change after learning because of the new information. The impact of the added samples on the original SVs is related to the KKT conditions: new samples satisfying the KKT conditions do not change the original support vector set, whereas violating samples do.

Theorem 2 If a sample violating the KKT conditions exists among the added samples, a non-SV of the original samples may become a new SV [9].

The impact of new samples on the original non-SVs is illustrated in Figure 3, where rectangles and triangles represent the two classes of samples and the new samples are marked with solid points. The initial classification plane is f(x) = 0 and the new classification plane is g(x) = 0. N1 was a non-SV in the initial set; after the new samples are added, it converts into an SV.

In the simple incremental SVM method, only the original SV set and the samples violating the KKT conditions are merged into the new training set, which may lose the classification information carried by the original non-SVs. According to Theorem 2, these ignored non-SVs may become SVs in follow-up learning, so the subsequent learning process may exhibit an "oscillation" phenomenon owing to the deficiency of the initial samples.

Figure 3. Impact of adding samples on the classification: initial plane f(x) = 0, new plane g(x) = 0, with the non-SV N1 and the KKT-satisfying sample A1 becoming SVs.

Theorem 3 If a sample violating the KKT conditions exists among the added samples, a sample satisfying the KKT conditions may become a new SV [9].

In Figure 3, A1 satisfies the KKT conditions and yet transforms into an SV after the new samples are added.

Therefore, new samples satisfying the KKT conditions should not simply be thrown away. In incremental learning, in addition to the samples violating the KKT conditions, the part of the new samples that satisfy the KKT conditions but are distributed outside the class margin within a certain range should also be considered, as sketched below.
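As a concrete reading of this extended KKT rule, the following sketch (our illustration, not code from the paper) partitions newly arrived samples by the margin value yi f(xi) returned by the current classifier: violators with yi f(xi) < 1, near-margin satisfiers with 1 ≤ yi f(xi) ≤ 1 + ε, and samples that can be safely discarded this round.

```python
import numpy as np

def partition_new_samples(decision_values, y, eps=0.5):
    """Split incremental samples by y_i * f(x_i) under the extended KKT rule.

    decision_values: f(x_i) from the current SVM; y: labels in {-1, +1}.
    eps in (0, 1) sets the near-margin band; 0.5 is an assumed default.
    Returns boolean masks (violators, near_margin, discard).
    """
    margin = np.asarray(y) * np.asarray(decision_values)
    violators = margin < 1.0                                # violate KKT
    near_margin = (margin >= 1.0) & (margin <= 1.0 + eps)   # satisfy KKT, near plane
    discard = margin > 1.0 + eps                            # ignorable this round
    return violators, near_margin, discard
```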


V. INCREMENTAL SVM METHOD FOR INTRUSION DETECTION BASED ON CLOUD BOUNDARY VECTORS

For the initial set, the distances between each sample and all of its heterogeneous samples are first calculated in the feature space and then mapped into membership degrees to extract the boundary vectors, which are trained to obtain the initial classifier. For the incremental set, both the samples violating the KKT conditions, expressed by yi f(xi) < 1, and the ones satisfying them but lying near the classification plane, expressed by 1 ≤ yi f(xi) ≤ 1 + ε with ε ∈ (0, 1), are retained. Combining all three parts and training them yields the final incremental SVM classifier.

Let X0 be the initial samples and XI the incremental samples at step I. The algorithm is as follows (a code sketch is given after the steps):
(1) Extract the cloud boundary vectors C0 of X0 and train them to obtain the initial classifier SVM;
(2) If XI = ∅, the algorithm terminates; otherwise, turn to step (3);
(3) If there is no sample violating the KKT conditions in XI, turn to step (2);
(4) Define XN as the samples violating the KKT conditions in XI, and XV as the ones satisfying them but near the classification plane;
(5) Set C0 ← C0 ∪ XN ∪ XV, train C0, update the SVM, and turn to step (2).
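The loop below is a minimal sketch of steps (1)-(5) in Python with scikit-learn's SVC, standing in for the MATLAB/libsvm setup used in the paper's experiments. It reuses partition_new_samples from Section IV, and C0_X, C0_y are the cloud boundary vectors already extracted in step (1); parameter defaults are our assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def c_isvm(C0_X, C0_y, increments, C=1.0, gamma=0.5, eps=0.5):
    """Sketch of the C-ISVM incremental loop, steps (1)-(5).

    increments: iterable of (X_I, y_I) batches. C and gamma would be
    tuned by cross-validation as in Section VI; eps as in Section V.
    """
    # Step (1): train the initial classifier on the cloud boundary vectors.
    clf = SVC(C=C, gamma=gamma, kernel="rbf").fit(C0_X, C0_y)
    for X_I, y_I in increments:                 # step (2): stop when exhausted
        f = clf.decision_function(X_I)
        violators, near_margin, _ = partition_new_samples(f, y_I, eps)
        if not violators.any():                 # step (3): no KKT violators
            continue
        keep = violators | near_margin          # step (4): X_N and X_V
        C0_X = np.vstack([C0_X, X_I[keep]])     # step (5): merge and retrain
        C0_y = np.concatenate([C0_y, y_I[keep]])
        clf = SVC(C=C, gamma=gamma, kernel="rbf").fit(C0_X, C0_y)
    return clf
```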
VI. EXPERIMENT AND ANALYSIS OF INTRUSION DETECTION

12000 independent samples from the KDD CUP 1999 dataset [10] are randomly selected and divided into four parts: one initial set and three incremental sets. The experimental environment is a 1.99 GHz CPU with 2 GB of memory under Windows XP, using MATLAB 7.0 and the libsvm-mat-2.91-1 toolbox. The RBF function serves as the kernel function of the SVM, and the kernel parameter g and penalty coefficient C are obtained by cross-validation; a minimal sketch of such a search follows. The experimental results are shown in Table I.
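For the parameter search, here is a hedged sketch using scikit-learn's GridSearchCV in place of the libsvm-mat toolbox actually used; the grid ranges are our assumptions, not the paper's values.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def tune_rbf_svm(X_train, y_train):
    """Cross-validate the RBF kernel parameter g (gamma) and penalty C."""
    param_grid = {
        "C": [2.0 ** k for k in range(-2, 9)],      # penalty coefficient C
        "gamma": [2.0 ** k for k in range(-8, 3)],  # kernel parameter g
    }
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train, y_train)
    return search.best_params_, search.best_estimator_
```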

TABLE I. EXPERIMENTAL RESULTS

For each method the three columns give the number of trained samples, the training time (ms), and the detection rate (%).

                              I-SVM                        N-ISVM                       C-ISVM
Training set       Samples    Trained  Time(ms)  Rate(%)   Trained  Time(ms)  Rate(%)   Trained  Time(ms)  Rate(%)
Initial set        2999       2999     78        100       2999     78        100       2999     78        100
Incremental set 1  3001       3178     203       94.87     1989     125       93.47     1695     125       93.30
Incremental set 2  2999       3205     422       84.97     2124     313       90.62     1721     310       94.71
Incremental set 3  3001       3247     671       80.81     2200     547       92.50     1798     469       96.29

Table I shows that the C-ISVM algorithm is superior to the I-SVM and N-ISVM methods in overall performance. In terms of training samples and time, I-SVM trains all incremental samples, resulting in the largest training set and the longest time; for the first increment, the training time of the other two methods is 38.42% shorter than that of I-SVM. Although C-ISVM expands the boundary vectors, the samples violating the KKT conditions are reduced, so its training set and time are slightly smaller than those of N-ISVM. In terms of detection rate, the final rate of C-ISVM is higher than those of I-SVM and N-ISVM by 15.48% and 3.79%, respectively. In terms of algorithm stability, the detection rate of I-SVM declines as the increments accumulate. N-ISVM maintains a higher detection rate, but the noise data and overfitting samples limit further improvement of its classification performance. C-ISVM focuses on filtering the boundary vectors while retaining the overall distribution characteristics of the samples; as the samples gradually improve during follow-up learning, its detection rate maintains a steady rising trend.

VII. CONCLUSION

A new incremental SVM method for intrusion detection based on the cloud model has been proposed. The cloud membership is defined to replace the characteristic distance, and the KKT conditions are extended. Experimental results show that the method effectively reduces the sample set and the running time while maintaining a high detection performance.

REFERENCES
[1] Cortes C, Vapnik V. Support vector networks [J]. Machine Learning, 1995, 20(3): 273-297.
[2] Syed N, Liu H, Sung K. Incremental learning with support vector machines [C]. Proc Int Joint Conf on Artificial Intelligence, 1999.
[3] Liu Ye, Wang Zebing, Feng Yan. DoS intrusion detection based on incremental learning with support vector machines [J]. Computer Engineering, 2006, 32(4): 179-186.
[4] Liu Yeqing, Liu Sanyang, Gu Mingtao. Incremental learning algorithm of support vector machine based on nearest border vectors [J]. Mathematics in Practice and Theory, 2011, 41(2): 110-114.
[5] Chen Weimin. Research on support vector machine solving the large-scale data set [D]. Nanjing University of Aeronautics and Astronautics, 2006.
[6] Li Deyi, Meng Haijun, Shi Xuemei. Membership clouds and membership cloud generators [J]. Journal of Computer Research and Development, 1995, 32(6): 15-20.
[7] Li Deyi, Liu Changyu, Du Yi, et al. Artificial intelligence with uncertainty [J]. Journal of Software, 2004, 15(11): 1583-1594.
[8] Li Xingsheng. Study on classification and clustering mining based on cloud model and data field [D]. The PLA University of Science and Technology, 2003: 16-19.
[9] Wang Xiaodan, Zheng Chunying, Wu Chongming, et al. New algorithm for SVM-based incremental learning [J]. Journal of Computer Applications, 2006, 26(10): 2440-2443.
[10] KDD Cup 1999 dataset [DB/OL]. [2012-08-07]. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
