Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002
can be generated, according to the associated probability distribution. Only the outcome, not the state, is visible to an external observer, and therefore the states are "hidden" to the outside. This is where the name Hidden Markov Models comes from.

Traditionally, people have used the Markov model to successfully model many real-world processes. But for some other processes, the strict Markov assumption that the next state depends only upon the current state does not hold, so we need more general models that can deal with these processes while retaining the good properties of the Markov model. These considerations motivated the development of Hidden Markov Models. A Hidden Markov Model is a doubly embedded stochastic process with two hierarchy levels. The upper level is a Markov process whose states are unobservable. An observation is a probabilistic function of the upper-level Markov state, and different Markov states have different observation probability functions.

This two-level structure is the main idea and advantage of HMMs: it can model much more complicated stochastic processes than the traditional Markov model. In speech recognition, HMMs have been widely used to analyze human auditory signals as speech patterns. In modeling dynamic human control strategy, HMMs have been used to classify different humans' behavior patterns. Transient sonar signals are analyzed with HMMs for ocean surveillance [12].

2.2 Using HMMs to characterize the behavior of programs

Because Hidden Markov Models (HMMs) are statistical

3.1 Profiling Normal and Abnormal Behavior

The first thing we need to do is to profile normal and abnormal behavior from the view of a specified process, e.g., mail-sending. We have obtained two sets of mail-sending system call data. The procedures for generating these traces are described in (Forrest et al. 1996). Each trace is the list of system calls issued by a single process from the beginning of its execution to the end. Each trace file lists pairs of numbers, one pair per line. The first number in a pair is the PID of the executing process, and the second is a number representing the system call. The mapping between system call numbers and actual system call names is given in a separate file; for example, the number 10 represents the system call "execv". Forks are traced individually, and their traces are included as part of normal. The set of traces includes:

Normal traces: a trace of the mail-sending daemon and a concatenation of several invocations of mail-sending.

Abnormal traces: 2 traces of the syslog-remote intrusion, 2 traces of the sscp intrusion, 2 traces of the syslog-local intrusion, 2 traces of the decode intrusion, 1 trace of the sm5x intrusion and 1 trace of the sm565a intrusion. These traces come from abnormal runs of the mail-sending program exploited by the above intrusion tools. The mail-sending daemon deals with incoming mail and all other processes deal with outgoing mail.

Table 1. System Call Data. Each file has two columns, the PIDs and the system call numbers.

  PID          | 1381 1381 ...       | 1391 1391 ...
  System calls | 5 2 3 4 5 4 2766 ...| 19 19 105 1 0 4 ...
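Given the trace format just described (one "PID, system-call number" pair per line), profiling reduces to grouping the calls by PID and windowing each process's call stream into short overlapping sequences, such as the length-11 sequences shown later in Table 2. A minimal sketch, with hypothetical helper names since the paper gives no code:

```python
from collections import defaultdict

def read_trace(lines):
    # Each line of a trace file holds one "PID system-call-number"
    # pair; group the calls by process ID, preserving issue order.
    per_process = defaultdict(list)
    for line in lines:
        pid, call = line.split()
        per_process[int(pid)].append(int(call))
    return per_process

def to_sequences(calls, length=11):
    # Slice one process's call stream into overlapping fixed-length
    # sequences, the unit that is later labeled "normal"/"abnormal".
    return [calls[i:i + length] for i in range(len(calls) - length + 1)]
```

For example, `read_trace(["1381 5", "1381 2", "1391 19"])` yields the call list `[5, 2]` for PID 1381 and `[19]` for PID 1391.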
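The two-level generative structure described in Section 2 — a hidden Markov chain whose states are never observed, each state emitting symbols from its own distribution — can be illustrated with a small sampler. The parameters below are toy values for illustration, not from the paper:

```python
import random

def sample_hmm(pi, A, B, length, seed=0):
    # Upper level: the hidden state follows a Markov chain with
    # initial distribution pi and transition matrix A (row = current
    # state). Lower level: state s emits a symbol drawn from B[s];
    # only the emissions are visible to an observer.
    rng = random.Random(seed)
    state = rng.choices(range(len(pi)), weights=pi)[0]
    observations = []
    for _ in range(length):
        observations.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return observations
```

With degenerate (0/1) probabilities the output is deterministic: `sample_hmm([1, 0], [[0, 1], [1, 0]], [[1, 0], [0, 1]], 4)` returns `[0, 1, 0, 1]`, the alternating emissions of the two states.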
normal traces are labeled as "normal". It should be noted that for an intrusion trace not all the sequences are "abnormal", since the illegal activities occur only in a few places within the trace.

Table 2. Classified System Call Data.

  System call sequence (length 11) | Class label
  52345421665426666                | "normal"

3.2 Experimental Setup

We formulate our learning method as follows:
- The training data is composed of all normal sequences, plus the abnormal sequences from 1 trace of the sscp intrusion, 1 trace of the syslog-local intrusion, and 1 trace of the syslog-remote intrusion.
- The testing data is all the sequences (normal and abnormal) in the intrusion traces not included in the training data.
The HMMs can be used to predict whether a sequence is "abnormal" or "normal". But what the intrusion detection system needs to know is whether the trace being tested is an intrusion or not. Can we say that whenever there is an "abnormal" sequence in the trace, it is an intrusion? It depends on the accuracy of our HMMs when classifying a sequence as "abnormal". Unless that accuracy is close to 100%, it is unlikely that a predicted "abnormal" sequence is always part of an intrusion rather than just an error.

3.3 Detecting Abnormal Behavior

We use the following algorithm to detect whether a trace is an intrusion, based on the HMMs' predictions for its constituent sequences:
1. Use a sliding window of length 2n+1 (e.g., 7, 9, 11, 13, etc.) and a sliding (shift) step of n to scan the predictions made by the HMMs.
2. For each of the regions of HMM predictions generated in Step 1, divide the number of "abnormal" predictions by n and denote the quotient as w. If w is less than 1, the current region of predictions is a "normal" region; otherwise the current region is an "abnormal" region and its weight is w.
3. Add up the weights of all "abnormal" regions and denote the sum as S. Then divide S by the number of all regions to get the percentage of "abnormal" regions, and denote it as a. If a is above a threshold value, say 10%, then the trace is an intrusion.
This algorithm can filter out false positives (i.e., "normal" sequences classified as "abnormal"). The principle behind it is that when an intrusion actually occurs, it generates a number of abnormal system calls, and as a result the neighboring sequences of system calls will not match the normal sequences. Prediction errors, in contrast, tend to be isolated. We think the weight of a region indicates its "abnormal" degree. The abnormal system calls generated by an intrusion action lead to more abnormal sequences in a region than a normal action does, and hence to a greater weight, whereas a prediction error leads only to very sparse "abnormal" sequences, so the weight of such a region is light. In (Forrest et al. 1996), the percentage of mismatched sequences (out of the total number of matches (lookups) performed for the trace) is used to distinguish normal from abnormal. Our algorithm is different in that we look for "abnormal regions" that contain more "abnormal" sequences than "normal" ones, and calculate the percentage of abnormal regions (out of the total number of regions). Our algorithm is more sensitive and produces fewer false alarms.

4 Results

We now analyze the results of our experiments. We demonstrate that our machine learning approach is useful in detecting misuse and anomaly intrusions. We include all the unique normal sequences of mail-sending. The abnormal sequences are taken from sscp-1, syslog-local-1, and syslog-remote-1. We compare the results of the following experiments, which have different distributions of "abnormal" versus "normal" sequences in the training data:
1. Experiment A: 1 copy of all unique normal sequences and 1 copy of the abnormal sequences.
2. Experiment B: 4 copies of all unique abnormal sequences.
3. Experiment C: 4 copies of all unique normal sequences.
4. Experiment D: 4 copies of all unique normal sequences and 4 copies of the abnormal sequences.
Each copy of the abnormal sequences has 1,315 sequences. We test the performance of the classifiers generated by the HMMs on every intrusion trace by supplying all the sequences (abnormal and normal) of the trace to the classifiers. To detect abnormal behavior, a sliding window of length 9 is applied to the predictions of the classifiers. Note that the window length here specifies the size of the regions of predictions, which can be different from the length of the sequences of system calls. Table 3 shows the anomaly detection results of these experiments compared with the results from Forrest et al. (1996).
From Table 3, we can see that the classifier from Experiment A is not acceptable because it classifies every trace (including mail-sending) as an intrusion. This is due to too few examples of "normal" and "abnormal" in the training data (each unique "normal" sequence appears only once). The classifier from Experiment B does better than the classifier from Experiment A. It is important to note that the classifiers from Experiment B perform quite well on known intrusions, i.e., sscp-2, syslog-local-2, and syslog-remote-2, because the training data includes the abnormal sequences from traces of the same types of intrusions. But they perform relatively poorly on unknown intrusions (the abnormal sequences of the traces of these types of
intrusions are not in the training data), i.e., decode-1&2, sm565a, and sm5x. This result shows us that classifiers trained with "abnormal" sequences are only good for detecting known intrusions and hence do not generalize to other "unseen" intrusions. The classifier from Experiment C, which has the 4 copies of "normal" sequences, is an improvement. It predicts correctly on all intrusions, both known and unknown, but does worse than D on the known intrusions, i.e., sscp-2, syslog-local-2, and syslog-remote-2, because in Experiment D we have not only normal sequences but also abnormal sequences to train our HMMs. Our HMMs have knowledge about both normal and abnormal sequences, and if either the classifier trained with normal sequences or the classifier trained with abnormal sequences detects an intrusion, we can say we have detected this intrusion. So the performance in recognizing the known intrusions is better than C. Therefore, the classifier from Experiment D has the best performance: it correctly classifies every trace of the known and the unknown intrusions. Experiments C and D also confirm our conjecture that classifiers for the "normal" sequences can be used for anomaly intrusion detection, thus generalizing the notion of normalcy. The results from Forrest et al. (1996) showed that their methods required a very low threshold in order to correctly detect the decode and sm565a intrusions. Compared with that, our results showed that our approach generated much "stronger signals" of anomalies from the intrusion traces.
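The region-based algorithm of Section 3.3 can be sketched as follows. This is a minimal illustration, not the authors' code; `predictions` is assumed to be a list of 0/1 flags, one per sequence, with 1 meaning the HMM classifier predicted "abnormal":

```python
def detect_intrusion(predictions, n=4, threshold=0.10):
    # Step 1: slide a window of length 2n+1 (here 9) with shift
    # step n over the per-sequence predictions.
    window = 2 * n + 1
    regions = [predictions[i:i + window] for i in range(0, len(predictions), n)]
    # Step 2: a region's weight w = (# "abnormal" predictions) / n;
    # w >= 1 marks the region "abnormal" with weight w.
    weights = [sum(region) / n for region in regions]
    # Step 3: S = sum of abnormal-region weights; a = S / (# regions);
    # the trace is an intrusion when a exceeds the threshold (10%).
    s = sum(w for w in weights if w >= 1)
    a = s / len(regions)
    return a > threshold
```

With n = 4 (window length 9) and the 10% threshold, a burst of consecutive abnormal predictions trips the detector, while a few isolated mis-predictions produce only light regions (w < 1) and are filtered out, which is the false-positive suppression the text describes.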