
Proceedings of the First International Conference on Machine Learning and Cybernetics, Beijing, 4-5 November 2002

HMMS (HIDDEN MARKOV MODELS) BASED ANOMALY INTRUSION DETECTION METHOD

BO GAO(a), HUI-YE MA(b), YU-HANG YANG(a)

(a) Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai, China
(b) Center for Space Science and Applied Research, Chinese Academy of Sciences
E-MAIL: gaobo@sjtu.edu.cn, pony@nc.poac.ac.cn

Abstract:

In this paper we discuss our research in developing an anomaly detection method for intrusion detection. The key idea is to use HMMs (Hidden Markov Models) to learn the (normal and abnormal) patterns of UNIX processes. These patterns can be used to detect anomalies and known intrusions. Using experiments on the mail-sending system call data, we demonstrate that we can construct concise and accurate classifiers to detect intrusion actions.

Keywords:

Intrusion detection; Anomaly detecting; HMMs (Hidden Markov Models); Machine learning; System call

1 Introduction

Approaches to intrusion detection can be broadly divided into anomaly detection and misuse detection. Misuse detection based approaches [1,2,3] define and look for precise sequences of events that damage the system. Misuse detection approaches are more precise and less prone to false positives. However, since misuse detection approaches require specification of damaging events, which is usually manual and based on previously known attacks, they are less effective against newly discovered vulnerabilities and attacks. Anomaly detection based approaches first create a profile that describes normal behaviors and then detect deviations from this profile [4,5,6]. Anomaly detection approaches possess the advantage that learning to identify normal behavior can be automated, but they are prone to false positives, especially when permissible but previously unlearned behavior occurs.

An alternative approach is taken by Fink, Levitt and Ko [7]. Instead of trying to build up normal user profiles, they focus on determining normal behavior for privileged processes, those that run as root. They define normal behavior using a program specification language, in which the allowed operations (the system calls and their parameters) of a process are formally specified. Forrest et al. [8] introduced a novel and simpler method. They gathered the traces of normal runs of a program and analyzed the "local (short) range ordering of system calls". They discovered that these local orderings "appears to be remarkably consistent, and this suggests a simple definition of normal behavior". The key idea here is to build a "normal" database that contains all possible short sequences (e.g., of length 11) of system calls for each program (mail-sending, lpr, etc.) that needs to be guarded. The normal database is then used to examine the behavior of a running program (e.g., an instance of mail-sending). If the total number (or percentage) of abnormal sequences, which are those that don't match the normal sequences in the normal database, is above a threshold value, then the current run is flagged as abnormal, i.e., a misuse or anomaly intrusion is detected.

There are many ways in which system call data could be used to characterize the normal behavior of programs, each of which involves building or training a model using traces of normal processes. A machine learning approach to this problem would construct a finite state machine to recognize the "language" of the program traces. A particularly powerful finite state machine is the Hidden Markov model, used widely in speech recognition and also in DNA sequence modeling [9,10].

We get our experiment data, which are a set of traces of the mail-sending program, from Stephanie Forrest. These traces were used in the experiments reported in (Forrest et al. 1996) [8]. What we want to know is whether a machine learning approach based on HMMs can be used to learn the normal and/or abnormal behavior of the process. Consequently, we want to know whether our approach can detect known and unknown intrusions more accurately.

The rest of the paper is organized as follows: Section 2 presents the HMMs and how to use HMMs in our experiments. Section 3 describes the details of our experiment. Section 4 presents the results of our experiments and analyses them in detail. Section 5 outlines some open issues and avenues for future work. Section 6 makes our conclusions.

2 Hidden Markov Models

2.1 Hidden Markov Models Description

The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multi-dimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities. In a particular state an outcome or observation

0-7803-7508-4/02/$17.00 ©2002 IEEE

can be generated, according to the associated probability distribution. It is only the outcome, not the state, that is visible to an external observer; therefore the states are "hidden" to the outside. This is where the name Hidden Markov Models comes from.

Traditionally, people have used the Markov model to successfully model many real-world processes. But for some other processes, the strict Markov assumption that the next state depends only upon the current state does not hold, so we need to find more general models to deal with these processes while at the same time retaining some good properties of the Markov model. These principles motivated people to develop the Hidden Markov Model. The Hidden Markov Model is a doubly embedded stochastic process with two hierarchical levels. The upper level is a Markov process whose states are unobservable. The observation is a probabilistic function of the upper-level Markov states. Different Markov states have different observation probability functions.

This two-level hierarchical structure is the main idea and advantage of HMMs. It can be used to model much more complicated stochastic processes than the traditional Markov model. In speech recognition, HMMs have been widely used to analyze human auditory signals as speech patterns [9]. In modeling dynamic human control strategies, [11] uses HMMs to classify different humans' behavior patterns. Transient sonar signals are analyzed with HMMs for ocean surveillance [12].
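As a toy illustration of this two-level structure, the following sketch samples an observation sequence from a small discrete HMM; the states, symbols, and probabilities are made up for illustration and are not taken from our experiments. The hidden chain evolves according to the transition probabilities, and each visited state emits a symbol from its own observation distribution.

```python
import random

# Hypothetical two-state discrete HMM (illustrative parameters only).
trans = {0: {0: 0.9, 1: 0.1},      # upper level: Markov transition probabilities
         1: {0: 0.2, 1: 0.8}}
emit = {0: {"a": 0.7, "b": 0.3},   # lower level: per-state observation distributions
        1: {"a": 0.1, "b": 0.9}}

def weighted_choice(dist, rng):
    """Draw one outcome from a {outcome: probability} mapping."""
    r, acc = rng.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point round-off

def sample(length, seed=0):
    """Generate (hidden_states, observations); only the latter is visible."""
    rng = random.Random(seed)
    state, hidden, obs = 0, [], []   # start in state 0 for simplicity
    for _ in range(length):
        hidden.append(state)
        obs.append(weighted_choice(emit[state], rng))
        state = weighted_choice(trans[state], rng)
    return hidden, obs

hidden, obs = sample(10)
# An external observer sees only `obs`; `hidden` stays hidden.
```

Here the observer sees only the emitted symbols; reasoning about the hidden path, or marginalizing over it, is what the standard algorithms of [9] provide.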

2.2 Using HMMs to characterize the behavior of programs

Because Hidden Markov Models (HMMs) are statistical models that have been used successfully in pattern recognition, speech recognition, and the modeling of biological sequences, and our traces are sequences of system calls ("normal" and "abnormal") from a privileged program, we want to use HMMs to learn the patterns of normal and/or abnormal behavior of the process. Standard HMMs have a fixed number of states, so one must decide on the size of the model before training. Preliminary experiments show us that a good choice for our application is to choose a number of states roughly corresponding to the number of unique system calls used by the program. Our test programs use an alphabet of about 60 system calls, so we use 60-state HMMs for mail-sending. The states are fully connected; transitions are allowed from any state to any state. For each state, then, we need to store the probabilities associated with transitions to every other state, and the probabilities associated with producing each system call. For a program using S system calls, and hence a model of S states, this means roughly 2S^2 values. Transition and symbol probabilities are initialized randomly, and then trained using the Baum-Welch algorithm as described in [9].
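Once trained, an HMM scores a sequence with the forward algorithm of [9], which sums the probability of the observations over all hidden state paths. The sketch below uses a tiny S = 3 model with hand-picked parameters purely for illustration (our actual models are roughly 60-state and Baum-Welch-trained); note the parameter count, one S x S transition matrix plus an S x S emission matrix, i.e. roughly the 2S^2 values mentioned above.

```python
# Forward algorithm for a discrete HMM with S fully connected states.
# A[i][j] = P(next state j | state i); B[i][k] = P(symbol k | state i).
# Together these hold roughly 2*S*S values for S states and S symbols.
S = 3
A = [[1.0 / S] * S for _ in range(S)]   # uniform transitions (illustrative)
B = [[0.6, 0.2, 0.2],
     [0.2, 0.6, 0.2],
     [0.2, 0.2, 0.6]]                   # per-state symbol probabilities
pi = [1.0 / S] * S                      # uniform initial state distribution

def forward_likelihood(obs):
    """P(obs | model), summing over all hidden state paths."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(S)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(S)) * B[j][o]
                 for j in range(S)]
    return sum(alpha)

p = forward_likelihood([0, 0, 1, 2])   # likelihood of a 4-symbol sequence
```

A sequence whose likelihood falls far below those of the normal training traces would be a candidate "abnormal" prediction; in practice the Baum-Welch procedure (also in [9]) re-estimates A, B, and pi from the training sequences.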
3 Experiments

3.1 Profiling Normal and Abnormal Behavior

The first thing that we need to do is to profile normal and abnormal behavior from the view of a specified process, e.g., mail-sending. We have obtained two sets of mail-sending system call data. The procedures for generating these traces are described in (Forrest et al. 1996) [8]. Each trace is the list of system calls issued by a single process from the beginning of its execution to the end. Each trace file lists pairs of numbers, one pair per line. The first number in a pair is the PID of the executing process, and the second is a number representing the system call. The mapping between system call numbers and actual system call names is given in a separate file. For example, the number 10 represents the system call "execv". Forks are traced individually, and their traces are included as part of normal. The set of traces includes:

Normal traces: a trace of the mail-sending daemon and a concatenation of several invocations of mail-sending.

Abnormal traces: 2 traces of the syslog-remote intrusion, 2 traces of the sscp intrusion, 2 traces of the syslog-local intrusion, 2 traces of the decode intrusion, 1 trace of the sm5x intrusion and 1 trace of the sm565a intrusion. These traces come from abnormal runs of the mail-sending program exploited by the above intrusion tools. The mail-sending daemon deals with incoming mail and all other processes deal with outgoing mail.

Table 1. System Call Data. Each file has two columns, the PIDs and the system call numbers.

PID:          1381 1381 ... 1391 1391 ...
System calls: 5 2 3 4 5 4 2766 ... 19 19 105 104 ...

The algorithm used to build the profile of normal and abnormal behavior is extremely simple. Since short sequences of system calls are important characteristics of a program's normal behavior, we can use a sliding window to cross the trace of mail-sending and record which calls follow which within the sliding window. By recording these short sequences we get the profile of normal behavior. Following (Forrest et al. 1996) [8], a sliding window of length 11 seemed to give the best predictive performance. Therefore, we also use a sliding window of length 11 with a sliding (shift) step of 1 to create sequences of system calls from the traces. We first use a sliding window to scan the normal traces (of the mail-sending daemon and mail-sending) and create a list of unique sequences of system calls, 1,082 in total. We call this list the "normal" list. Next, we check each of the intrusion traces against it using the same method. We slide a window of size 11 across the intrusion traces, determining whether each sequence of system calls differs from those recorded in the normal list. If a match can be found, then the sequence is labeled as "normal". Otherwise it is labeled as "abnormal". See Table 2 for an example of the labeled sequences. All sequences in the


normal traces are labeled as "normal". It should be noted that for an intrusion trace not all the sequences are "abnormal", since the illegal activities only occur in several places within a trace.
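A minimal sketch of this profiling and labeling procedure (stdlib-only; the call numbers below are invented, and the window is shortened to 5 so the toy traces stay readable, whereas the experiments use length 11):

```python
def windows(trace, n):
    """All length-n subsequences, sliding with a shift step of 1."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def build_normal_list(normal_traces, n):
    """Record every unique short sequence seen in the normal traces."""
    normal = set()
    for trace in normal_traces:
        normal.update(windows(trace, n))
    return normal

def label_trace(trace, normal, n):
    """Label each window 'normal' if it matches the normal list, else 'abnormal'."""
    return ["normal" if w in normal else "abnormal" for w in windows(trace, n)]

# Toy traces of system-call numbers (illustrative, not the real data).
normal_traces = [[5, 2, 3, 4, 5, 4, 2, 3, 4, 5],
                 [2, 3, 4, 5, 4, 2]]
normal_list = build_normal_list(normal_traces, 5)
labels = label_trace([5, 2, 3, 4, 5, 9, 9, 3, 4, 5], normal_list, 5)
# Every window touching the unseen call number 9 comes out "abnormal".
```

Note how a single pair of unexpected calls taints all the windows that overlap it, which is exactly why an intrusion produces a cluster of "abnormal" labels rather than one.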
Table 2. Classified System Call Data.

System call (11)                  Class labels
5 2 3 4 5 4 2 1 6 6 5 ...         "normal"
...                               ...

3.2 Experimental Setup

We formulate our learning method as follows:
- The training data is composed of all normal sequences, plus the abnormal sequences from 1 trace of the sscp intrusion, 1 trace of the syslog-local intrusion, and 1 trace of the syslog-remote intrusion.
- The testing data is all the sequences (normal and abnormal) in the intrusion traces not included in the training data.

The HMMs can be used to predict whether a sequence is "abnormal" or "normal". But what the intrusion detection system needs to know is whether the trace being tested is an intrusion or not. Can we say that whenever there is an "abnormal" sequence in the trace, it is an intrusion? That depends on the accuracy of our HMMs when classifying a sequence as "abnormal". Unless it is close to 100%, it is unlikely that a predicted "abnormal" sequence is always part of an intrusion rather than just an error.

3.3 Detecting Abnormal Behavior

We use the following algorithm to detect whether a trace is an intrusion based on the HMMs' predictions of its constituent sequences:
1. Use a sliding window of length 2n+1, e.g., 7, 9, 11, 13, etc., and a sliding (shift) step of n, to scan the predictions made by the HMMs.
2. For each of the regions (of HMM predictions) generated in Step 1, we divide the number of "abnormal" predictions by n, and we notate the quotient as w. If w is less than 1, the current region of predictions is a "normal" region; otherwise the current region is an "abnormal" region and its weight is w.
3. We add up all of the weights of the "abnormal" regions and notate the sum as S. Then we divide S by the number of all regions to get the percentage of "abnormal" regions, which we notate as a. If a is above a threshold value, say 10%, then the trace is an intrusion.
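Our reading of the three steps above can be sketched as follows (stdlib-only; the prediction labels and the 10% threshold are illustrative):

```python
def detect_intrusion(preds, n, threshold=0.10):
    """Decide whether a trace is an intrusion from per-sequence predictions.

    preds: list of "normal"/"abnormal" labels for consecutive sequences.
    Regions of length 2n+1 are taken with a sliding (shift) step of n.
    """
    region_len = 2 * n + 1
    weights, num_regions = [], 0
    for i in range(0, len(preds) - region_len + 1, n):
        region = preds[i:i + region_len]
        w = region.count("abnormal") / n   # step 2: the quotient w
        num_regions += 1
        if w >= 1:                         # an "abnormal" region of weight w
            weights.append(w)
    if num_regions == 0:
        return False
    a = sum(weights) / num_regions         # step 3: weighted fraction a
    return a > threshold

# A dense burst of "abnormal" predictions flags the trace as an intrusion,
# while a single isolated prediction error does not.
burst = ["normal"] * 20 + ["abnormal"] * 9 + ["normal"] * 20
flagged = detect_intrusion(burst, n=4)
```

Because an isolated misclassification contributes w < 1 to every region it touches, it never creates an "abnormal" region, which is how this scheme filters out such false positives.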
This algorithm can filter out false positives (i.e., a "normal" sequence misclassified as an "abnormal" sequence). The principle behind this algorithm is that when an intrusion actually occurs, it generates a number of abnormal system calls, and as a result, the neighboring sequences of system calls will not match the normal sequences. However, the prediction errors tend to be isolated. We think the weight of a region indicates its "abnormal" degree. A number of abnormal system calls generated by an intrusion action will lead to more abnormal sequences in a region than a normal action does, so we get a greater weight. But a prediction error only leads to very sparse "abnormal" sequences, so the weight of such a region is light. In (Forrest et al. 1996) [8], the percentage of mismatched sequences (out of the total number of matches (lookups) performed for the trace) is used to distinguish normal from abnormal. Our algorithm is different in that we look for "abnormal regions" that contain more "abnormal" sequences than "normal" ones, and calculate the percentage of abnormal regions (out of the total number of regions). Our algorithm is more sensitive, and produces fewer false alarms.

4 Results

We now analyze the results of our experiments. We demonstrate that our machine learning approach is useful in detecting misuse and anomaly intrusions. We include all the unique normal sequences of mail-sending. The abnormal sequences are taken from sscp-1, syslog-local-1, and syslog-remote-1. We compare the results of the following experiments that have different distributions of "abnormal" versus "normal" sequences in the training data:
1. Experiment A: 1 copy of all unique normal sequences and 1 copy of the abnormal sequences.
2. Experiment B: 4 copies of all unique abnormal sequences.
3. Experiment C: 4 copies of all unique normal sequences.
4. Experiment D: 4 copies of all unique normal sequences and 4 copies of the abnormal sequences.

Each copy of the abnormal sequences has 1,315 sequences. We test the performance of the classifiers generated by HMMs on every intrusion trace by supplying all the sequences (abnormal and normal) of the trace to the classifiers. To detect abnormal behavior, a sliding window of length 9 is applied to the predictions of the classifiers. Note that the window length here specifies the size of the regions of predictions, which can be different from the length of the sequences of system calls. Table 3 shows the anomaly detection results of these experiments compared with the results from Forrest et al. (1996) [8].

From Table 3, we can see that the classifier from Experiment A is not acceptable because it classifies every trace (including mail-sending) as an intrusion. This is due to too few examples of "normal" and "abnormal" in the training data (each unique "normal" sequence only appears once). The classifier from Experiment B does better than the classifier from Experiment A. It is important to note that the classifiers from Experiment B perform quite well on known intrusions, i.e., sscp-2, syslog-local-2, and syslog-remote-2, because the training data includes the abnormal sequences from the traces of the same types of intrusions. But they perform relatively poorly on unknown intrusions (the abnormal sequences of the traces of these types of


intrusions are not in the training data), i.e., decode-1&2, sm565a, and sm5x. This result shows us that classifiers with "abnormal" sequences are only good for detecting known intrusions and hence don't generalize to other "unseen" intrusions. The classifier from Experiment C, which has the 4 copies of "normal" sequences, is an improvement. It predicts correctly on all intrusions, including the known and the unknown, but does worse than D on the known intrusions, i.e., sscp-2, syslog-local-2, and syslog-remote-2, because in Experiment D, we have not only normal sequences but also abnormal sequences to train our HMMs. Our HMMs have knowledge about both normal and abnormal sequences, and if either the classifier trained with normal sequences or the classifier trained with abnormal sequences detects an intrusion, we can say we detect this intrusion. So the performance in recognizing the known intrusions is better than C. Therefore, the classifier from Experiment D has the best performance. It correctly classifies every trace of the known and the unknown intrusions. Experiments C and D also confirm our conjecture that classifiers for the "normal" sequences can be used for anomaly intrusion detection, thus generalizing the notion of normalcy. The results from Forrest et al. (1996) showed that their methods required a very low threshold in order to correctly detect the decode and sm565a intrusions. Compared with that, our results showed that our approach generated much "stronger signals" of anomalies from the intrusion traces.

5 Discussion and Future Work

These experiments teach us a very important lesson: in order to detect unknown anomalous behaviors, it is pivotal that a model of the normal behavior of the program be used. This confirms the intuition of Forrest et al. (1996) as well. Recall that we have chosen to define normal in terms of short sequences of system calls. In the interests of simplicity, we ignore the parameters passed to the system calls, and look only at their temporal orderings. This definition of normal behavior ignores many other important aspects of process behavior, such as timing information, instruction sequences between system calls, and interactions with other processes. Certain intrusions may only be detectable by examining other aspects of process behavior, and so we may need to consider them later.

6 Conclusions

We applied a machine learning approach based on HMMs to learn normal and abnormal patterns of program behavior from its execution traces, generalizing upon the method introduced in (Forrest et al. 1996) [8]. The resultant normal patterns (classifiers) are shown to be able to accurately detect anomalous intrusions. Our experiments demonstrate that machine learning can indeed play an important role in intrusion detection for computer systems. Much more research needs to be pursued in order to build a system that can much more rapidly and correctly detect intrusions.

References

[1] P. Porras, R. Kemmerer. Penetration State Transition Analysis: A Rule-Based Intrusion Detection Approach. Computer Security Applications Conference, 1992.
[2] K. Ilgun. A real-time intrusion detection system for UNIX. IEEE Symp. on Security and Privacy, 1993.
[3] S. Kumar, E. Spafford. A Pattern-Matching Model for Intrusion Detection. National Computer Security Conference, 1994.
[4] K. Fox, R. Henning, J. Reed, R. Simonian. A Neural Network Approach Towards Intrusion Detection. National Computer Security Conference, 1990.
[5] T. Lunt. A Real-Time Intrusion Detection Expert System (IDES): Final Report. SRI-CSL-92-05, SRI International, 1992.
[6] D. Anderson, T. Lunt, H. Javitz, A. Tamaru, A. Valdes. Next-generation Intrusion Detection Expert System (NIDES): A Summary. SRI-CSL-95-07, SRI International, 1995.


[7] C. Ko, G. Fink, K. Levitt. Automated Detection of Vulnerabilities in Privileged Programs by Execution Monitoring. In Proceedings of the 10th Annual Computer Security Applications Conference, pp. 134-144, 1994.
[8] Stephanie Forrest, Steven A. Hofmeyr, Anil Somayaji, Thomas A. Longstaff. A Sense of Self for Unix Processes. In Proceedings of the 1996 IEEE Symposium on Security and Privacy, IEEE Computer Society Press, Los Alamitos, CA, pp. 120-128.
[9] L. R. Rabiner. A tutorial on Hidden Markov Models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989.
[10] L. R. Rabiner, B. H. Juang. An introduction to Hidden Markov Models. IEEE ASSP Magazine, pp. 4-16, January 1986.
[11] M. C. Nechyba, Y. Xu. Stochastic Similarity for Validating Human Control Strategy Models.
[12] A. Kundu, G. C. Chen, C. E. Persons. Transient Sonar Signal Classification Using Hidden Markov Models and Neural Nets. IEEE Journal of Oceanic Engineering, 19(1):87-99, 1994.

