Professional Documents
Culture Documents
A CLUSTERING ALGORITHM
A CLUSTERING FOR DETECTING
ALGORITHM DDoS ATTACKS IN
FOR DETECTING
NETWORKS
DDoS ATTACKS IN NETWORKS
Dr.K.Sarmila, G.Kavin
S.Rajalakshmi, Dr.A.Kannan,
Assistant Professor, Research scholar, Department of Computer Science and Engineering,
M.E.SoftwareEngineeing, Assistant
RVS College of Engineering, Coimbatore, Professor,
Tamilnadu, India
Department of Computer Science and Engineering, Department of Computer Science and Engineering,
Anna University, Chennai Anna University, Chennai
mail2rajalakshmi@gmail.com kannan@cs.annauniv.edu
Index terms: Anomaly detection, heuristic clustering, true positive rate, false positive rate.
1. INTRODUCTION
With rapid development in the computer misuse detection, each instance in a data set is
based technology, new application areas for labeled as normal or intrusion and a learning
computer networks have emerged in Local algorithm are trained over the labeled data. . An
Area Network and Wide Area Network .This anomaly detection technique builds models of
became an attractive target for the abuse and a normal behavior, and automatically detects any
big vulnerability for the community. Securing deviation from it, flagging the latter as suspect.
this infrastructure has become the one Attacks fall into four main categories[8] they
research area. Network intrusion detection are Denial of Service (DoS) is a class of attacks
systems have become a standard component in where an attacker makes some computing or
security infrastructures. Intrusion detection is memory resource too busy or too full to handle
the process of monitoring the events legitimate requests, A remote to user (R2L)
occurring in a computer system or network attack is class of attacks where an attacker sends
and analyzing them for signs of intrusions, packets to a machine over a network, then
defined as attempts to compromise the exploits machines vulnerability to illegally gain
confidentiality, integrity, availability, or to local access a user, User to root exploits is a
bypass the security mechanisms of a computer class of attacks where an attacker starts out with
or network. access to a normal user account on the system
There are generally two types of and is able to exploit vulnerability to gain root
attacks in network intrusion detection. In access to the system. Probing is a class of
8
www.ijresonline.com
International Journal of Recent Engineering Science (IJRES),
ISSN: 2349-7157, volume1 issue2 Feb 2014
attacks where an attacker scans a network to for audit mechanisms, among which were
gather information or find known allowing the discovery of attempts to bypass
vulnerabilities. protection mechanism.
The main reason for using Data Over the past years there are many
Mining Techniques for intrusion detection different types of intrusions, and different
system is due to the enormous volume of detectors are needed to detect normal and
existing and newly appearing network data anomalous activities in network. Some
that require for processing. The data Clustering algorithms have recently gained
accumulated each day by a network is huge. attention in related literature, since they can help
Several data mining techniques such as current intrusion detection systems in several
clustering, classification, and association rules respects.
are proving to be useful for gathering different Several existing supervised and
knowledge for intrusion detection. unsupervised anomaly detection schemes and
Unsupervised Anomaly Detection their variations are evaluated on the DARPA
(UAD) algorithms have the major advantage 1998 data set of network connections [3] as well
of being able to process unlabeled data and as on real network data using existing standard
detect intrusions that otherwise could not be evaluation techniques.
detected. The goal of data clustering, or Recently, anomaly detection has been
unsupervised learning, is to discovery a used for identifying attacks in computer
natural grouping in a set of patterns, points, networks [1], malicious activities in computer
or objects, without knowledge of any class systems and misuse in web systems [6], [7]. A
labels. more recent class of Anomaly Detection
We need a technique for detecting Systems developed using machine learning
intrusions when our training data is unlabeled, techniques like artificial neural-networks [8],
as well as for detecting new and unknown Kohonens self organizing maps [9] fuzzy
types of Intrusions. We evaluated our classifiers and others [7] have become popular
clustering method over a data set of network because of their high detection accuracies at low
connections. The network data we examined false positives. Data mining for security
was from DARPA 2000 [9], which is a very knowledge [2] employs a number of search
popular and widely used intrusion attack data algorithms, such as statistical analysis, deviation
set. Our experimental results show that the analysis, rule induction, neural abduction,
detection rate of the proposed method making associations, correlations, and
consistently outperforms other existing clustering.
algorithms reported in the literature; especially Another data mining approach [6] for
at the false positive rate intrusion detection is the application of
clustering techniques for effective intrusion
2. RELATED WORKS .How ever , the existing works that is based on
either K-Means or K-Medoids have two
Network-based computer systems play shortcomings in clustering large network
increasingly vital roles in modern society, datasets namely number of clusters dependency
security of network systems has become and lacking of the ability of dealing with
important than ever before. As an active character attributes in the network transactions.
defense technology, IDS (Intrusion Detection Among these the number of clusters dependency
Systems) attempts to identify intrusions in suggests that the value of K is very critical to
secure architecture. In 1970s, the U.S the clustering result and should be fixed before
Department of Defense outlined security goals clustering.
9
www.ijresonline.com
International Journal of Recent Engineering Science (IJRES),
ISSN: 2349-7157, volume1 issue2 Feb 2014
10
www.ijresonline.com
International Journal of Recent Engineering Science (IJRES),
ISSN: 2349-7157, volume1 issue2 Feb 2014
The initial center of clustering Sim (ei,ej)) = Sim N(ei,ej)+ Sim P(ei,ej)
11
www.ijresonline.com
International Journal of Recent Engineering Science (IJRES),
ISSN: 2349-7157, volume1 issue2 Feb 2014
Let P= (PN + PS), and P= (P1, P2,., Pm) 4. EXPERIMENTS AND RESULTS
where PN is the center of numerical attribute,
the Ps is the center of character attribute, In this section, we present the results of the
1 n Heuristic clustering method compare it with
PNi= hji i= 1,2,., p (p<= m)
n j 1 K-Means over the dataset 2000 DARPA . In
heuristic clustering the relations between
The hji is the numerical attribute. The Ps is categorical and continuous features are handled
a frequent character attribute set which naturally, without any forced conversions (k-
consists of q (q=m-p) most frequent character means) between these two types of features. Fig
attribute. 3.2 shows how can we detect the anomaly
instances.
Labeling
We use the measures for evaluating the
We propose a method to detect anomaly that performance
does not either depend on the population ratio 1. TPR or recall is the percentage of anomaly
of the clusters. In our labeling method, we instances correctly detected,
assume that center of a normal cluster is 2. FPR is the percentage of normal instances
highly close to the initial cluster center vh incorrectly classified as anomaly,
which are created from the clustering. 3. Precision is the percentage of correctly
In other words, if a cluster is normal, the detected anomaly instances over all the detected
distance between the center of the cluster and anomaly instances,
vh will be small, otherwise it will be 4. Total accuracy or accuracy is the
large.Thus, we first, for each cluster center percentage of all normal and anomaly instances
Cj (1 j k), calculate the maximum distance that are correctly classified.
to vh. We then calculate the average distance The identification of normal and
of the maximum distances. If the maximum abnormal classes has been represented in the
distance from a cluster to vh is less than the form of a matrix called Confusion matrix as
maximum average distance, we label the given in Table 4.1.
cluster as normal. Otherwise, label as attack.
Actual \
Predicted Normal Abnormal
Class
Normal True positive False negative
(TP) (FN)
12
www.ijresonline.com
International Journal of Recent Engineering Science (IJRES),
ISSN: 2349-7157, volume1 issue2 Feb 2014
Intrusion Detection
Detection Rate = TN / (FN + TN) * 100
95.00%
FPR
The results obtained from heuristic
clustering based intrusion detection accuracy 80.00%
accuracy, false positive rate is shown in fig Fig 4.2 Results of Detection rate with
4.2. heuristic clustering
Table 4.2 Results for anomaly detection using 5. CONCLUSIONS AND FUTURE WORKS
heuristic clustering.
This paper Network intrusion detection
Iteration Training Detection False Error with heuristic clustering incorporates the idea of
applying data mining techniques to intrusion
instances Rate Positive Rate detection system to maximize the effectiveness in
Rate identifying attacks, thereby helping the users to
0 415 90.9% 2.8% 3.4% construct more secure information systems. The
main advantage of our algorithm is that the
1 820 94.7% 2.5% 2.9% relations between categorical and continuous
2 1200 89.0% 4.4% 5.4%
features in 2000 DARPA data set are handled
naturally, without any forced conversions (k-
3 3000 87.0% 6.4% 7.5% means) between these two types of features.
For future work, we need to verify
4 5200 89.3% 2.4% 3.1%
performance of the proposed clustering
5 6130 89.7% 1.9% 2.7% algorithm over real data and make a new
benchmark dataset for intrusion detection.
6 7117 88.3% 3.1% 3.9% Future directions in this research area can
7 8045 90.7% 6.1% 7.3% be extracting more variables and to develop an
advanced detection algorithm to suit these
8 9300 87.5% 2.8% 3.6% multiple variables. The new system may be
tested with more than one benchmark data set
9 11245 91.7% 5.3% 5.4%
13
www.ijresonline.com
International Journal of Recent Engineering Science (IJRES),
ISSN: 2349-7157, volume1 issue2 Feb 2014
14
www.ijresonline.com