This article appeared in a journal published by Elsevier.

The attached copy is furnished to the author for internal non-commercial research
and education use, including for instruction at the author's institution
and sharing with colleagues.
Other uses, including reproduction and distribution, or selling or
licensing copies, or posting to personal, institutional or third party
websites are prohibited.
In most cases authors are permitted to post their version of the
article (e.g. in Word or TeX form) to their personal website or
institutional repository. Authors requiring further information
regarding Elsevier's archiving and manuscript policies are
encouraged to visit:
http://www.elsevier.com/copyright

Author's personal copy


Journal of Network and Computer Applications 34 (2011) 1302-1325

Contents lists available at ScienceDirect

Journal of Network and Computer Applications


journal homepage: www.elsevier.com/locate/jnca

Anomaly detection in wireless sensor networks: A survey


Miao Xie*,1, Song Han*, Biming Tian, Sazia Parvin
Digital Ecosystems and Business Intelligence Institute, Curtin University, DEBII, GPO Box U1987, Perth, WA 6845, Australia

Article info

Article history:
Received 19 August 2010
Received in revised form 10 February 2011
Accepted 7 March 2011
Available online 21 March 2011

Abstract

Since security threats to WSNs are becoming increasingly diversified and deliberate, prevention-based
techniques alone can no longer provide WSNs with adequate security. Detection-based techniques,
however, can be effective when used in collaboration with prevention-based techniques. Anomaly
detection is a significant branch of detection-based techniques. Research on anomaly detection in wired
networks and wireless ad hoc networks is already quite mature, but such solutions can rarely be
applied to WSNs without change, because WSNs are characterized by constrained resources, such
as limited energy, weak computation capability, poor memory, and short communication range. The
development of anomaly detection techniques suitable for WSNs is therefore regarded as an essential
research area, which will enable WSNs to be much more secure and reliable. In this survey paper, a few
of the key design principles relating to the development of anomaly detection techniques in WSNs are
discussed. Then, the state-of-the-art techniques of anomaly detection in WSNs are systematically
introduced, according to WSN architecture (hierarchical/flat) and detection technique category
(statistical techniques, rule based, data mining, computational intelligence, game theory,
graph based, hybrid, etc.). The approaches that belong to a similar technique category are analyzed
and compared, followed by a brief discussion of potential research areas in the near future and a
conclusion.
© 2011 Elsevier Ltd. All rights reserved.

Keywords:
Wireless sensor networks
Information security
Anomaly detection

* Corresponding authors.
E-mail addresses: clifford1984621@gmail.com (M. Xie), hansongau@gmail.com (S. Han).
1 Tel.: +61 040 1400624.
1084-8045/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.jnca.2011.03.004

1. Introduction
A wireless sensor network (WSN) is made up of a large number of
spatially distributed autonomous sensors that jointly monitor
physical or environmental conditions, such as temperature,
sound, vibration, pressure, motion and pollutants (Yick et al.,
2008). To date, WSNs have been successfully applied to many
industrial and civil domains, including industrial process monitoring and control, machine health
monitoring, environment and habitat monitoring, healthcare applications, home automation,
and traffic control. A typical WSN has little or no infrastructure. If
a WSN is deployed in an ad hoc manner, it is
categorized as unstructured. In contrast, a network deployed
in a pre-planned manner is categorized as structured. Each
sensor node is optionally equipped with a variety of network
services such as localization, coverage, synchronization, data
compression and aggregation, and security, for the purpose of
enhancing the network's overall performance. Sensor nodes
communicate with each other by following the typical
five-layer communication protocol stack, which consists of the
physical layer, data link layer, network layer, transport layer,
and application layer.
The properties of a WSN inevitably mean that a sensor node is
extremely restricted in resources, including energy, memory,
computing, bandwidth, and communication. Hence, a WSN is
vulnerable to both external and internal security threats. In
addition, sensor nodes are physically accessible, as the
network is usually deployed near the physical source of the event
but without tamper resistance owing to cost constraints. What is
worse, the exchanged information can be captured by any internal
or external device, because publicly accessible
communication channels are used. In consequence, a WSN is often exposed to multiple security
threats, which can be categorized as follows (Lopez and Zhou, 2008):

• communication attack;
• denial of service attack;
• node compromise;
• impersonation attack;
• protocol-specific attack.

Han et al. (2005) also propose a taxonomy that surveys the
security threats according to more detailed criteria.
Securing WSNs is accordingly both imperative and challenging.
Prevention-based techniques, which fundamentally build upon
cryptography, are the first line of defense for protecting a WSN.
Based on a primitive of secret key management, encryption and
authentication are the primary measures in a prevention-based
technique, as introduced in the security framework SPINS
(Perrig et al., 2001). However, once the first line of defense is
broken through, compromised nodes could extract security-sensitive information (e.g. secret
keys), leading to breaches of security.
Thus, developing detection-based techniques as the second line of
defense is of great importance. Intrusion detection is a
typical example of detection-based techniques. This concept was
originally proposed by Anderson (1980) in the
report "Computer Security Threat Monitoring and Surveillance".
Intrusion detection is defined as the process of monitoring the
events occurring in a computer system or network and analyzing
them for any signs of possible incidents, which are violations or
imminent threats of violation of computer policies, acceptable use
policies, or standard practices (Scarfone and Mell, 2007). However, anomaly detection
(Hu, 2010; also referred to as outlier detection, deviation detection, etc.), a branch of
intrusion detection, is best suited to WSNs because its methodology is generally flexible and
resource-friendly. Anomaly detection is defined as the
process of comparing definitions of activity that is considered
normal against observed events in order to identify significant
deviations. Moreover, an anomaly in a dataset is defined as an
observation that appears to be inconsistent with the remainder of
the dataset (Hodge and Justin, 2004).
An anomaly may be caused not only by security threats, but also by
faulty sensor nodes in the network or unusual phenomena in the
monitored zone (Rajasegarar et al., 2008). In the real world,
isolated node failures can bring down the entire network, which
harms the reliability of a WSN. This survey paper focuses solely
on anomaly detection techniques in WSNs, irrespective of the cause
of the anomaly. An overview of the content of this survey
paper is given in Fig. 1.

Fig. 1. The content of this survey paper.

1.1. Motivation

Research on anomaly detection in WSNs has attracted much interest in recent years. From the ISSNIP
(Intelligent Sensors, Sensor Networks and Information Processing, The University of Melbourne,
Australia) group, Rajasegarar et al. (2008) surveyed the related work before 2007 with a
simpler criterion: statistical parameter estimation techniques or
non-parametric techniques. Nevertheless, a technology-oriented survey presenting the latest
progress in anomaly detection for WSNs is still absent.
Moreover, our paper is intended to act as a guideline for selecting
appropriate anomaly detection techniques. By analyzing
and comparing approaches that belong to a
similar technique category, the advantages and shortcomings of
each technique category can be identified. Accordingly, the paper further
extracts the key design principles for overcoming possible flaws.
The pattern of anomaly detection, which basically relates to
who is mainly responsible for the data processing of detection, significantly affects the
performance of a detection scheme.
The choice of detection pattern depends on the application
scenario. A fair understanding of the available
anomaly detection patterns could facilitate the development of
detection schemes. In consequence, these anomaly detection
patterns are surveyed separately in this paper.
In our survey paper, all detection schemes are divided into two
types of detection method: prior-knowledge based or prior-knowledge free. Prior-knowledge-based
detection schemes are better suited to applications that favor detection
speed; prior-knowledge free schemes, on the contrary,
provide applications with stronger detection generality. This awareness helps in selecting anomaly
detection techniques optimally. Attribute selection is traditionally a critical
issue in a detection system, as using fewer attributes
conserves resources. Our paper emphasizes the importance
of this issue for developing anomaly detectors in WSNs, although a
detailed discussion is not given owing to space constraints.
Finally, the development directions in this area are examined,
and a number of potential research areas in the near future are
proposed.

1.2. State-of-the-art techniques

Other than anomaly detection, the category of intrusion detection also includes misuse/signature
detection and stateful protocol analysis (Scarfone and Mell, 2007). Misuse/signature
detection is defined as the process of comparing signatures against
observed events to identify possible incidents, where each
signature is a pattern corresponding to a known threat. Stateful
protocol analysis is defined as the process of comparing predetermined profiles of generally
accepted definitions of benign protocol activity for each protocol state against observed
events to identify outliers. Misuse/signature detection and stateful protocol analysis need
complicated expression computing
and/or sizeable memory, which WSNs usually cannot afford.
Moreover, they are unable to defend against unknown security
threats. Consequently, anomaly detection is currently the dominant technology for enhancing
the security and reliability of WSNs.
Though WSNs are derived from wireless ad hoc networks,
most of the detection schemes that function well in ad hoc networks
are not suitable for WSNs, probably because (Akyildiz et al., 2002):

• the number of sensor nodes in a WSN can be several orders of
  magnitude higher than that of an ad hoc network;
• sensor nodes are densely deployed;
• a sensor node is less stable;
• the topology of WSNs varies frequently;
• sensor nodes mainly use a broadcast communication paradigm, whereas ad hoc networks are
  mainly based on point-to-point communication;
• each sensor node is highly constrained in energy, computation
  capability, memory, etc.;
• sensor nodes may have no global identification as a result of
  the large amount of overhead.

Table 1
Summary of the taxonomy.

Category                      Techniques
Statistical                   Distribution; Measure; Model
Data mining                   Clustering; SVM; Rule learner
Computational intelligence    SOM; ANN; GA
Rule                          Assumption; Experience
Game theory                   Non-cooperative and non-zero-sum
Graph                         Tree construction; Depth-first search
Hybrid                        Prevention and detection

Accordingly, the advanced anomaly detection schemes developed for ad
hoc networks (Qian et al., 2007; Tarique et al., 2009; Wu et al.,
2007) cannot be applied to WSNs, nor can those developed for
wired networks.
In this survey paper, recently proposed detection schemes
for WSNs are introduced. Because the architecture of a WSN is
strongly related to many aspects of designing a suitable scheme,
these detection schemes are classified as hierarchical or flat
(homogeneous) according to their architectures. In a hierarchical
WSN, all sensor nodes are grouped or clustered, and a
single node in each group or cluster is elected as the cluster head (possibly equipped with
stronger capacity) to conduct the organizational functions within
its group or cluster. In a flat WSN, on the contrary, all sensor nodes
contribute equally to any team functions and participate in internal
protocols (e.g. routing protocols). For each of the
architectures, a number of typical examples are given in terms of
the technique category that they belong to.
As for the technique categories, statistical techniques, data
mining, and computational intelligence are employed most
widely. Statistical techniques consist of statistical distribution
(Palpanas et al., 2003; Subramaniam et al., 2006; Liu et al., 2007;
Dallas et al., 2007; Li et al., 2008a; Tiwari et al., 2009), statistical
measure (e.g. mean, variance, self-defined, etc.) (Zhang et al.,
2008; Pires et al., 2004; Onat and Miri, 2005a,b; Li et al., 2008b),
and statistical model (e.g. autoregression) (Curiac et al., 2007).
Computational intelligence is closely linked to machine learning
and remotely linked to data mining. Conceptually, machine
learning is more concerned with the design and development of
algorithms that enable computers to learn from large-scale
datasets. Data mining, however, principally focuses on discovering patterns, associations,
changes, anomalies, and statistically
significant structures and events in datasets. Under the technique
categories of data mining and computational intelligence, several
examples are introduced, including clustering algorithms
(Rajasegarar et al., 2006; Masud et al., 2009; Wang et al., 2009),
support vector machine (SVM) (Rajasegarar et al., 2007), artificial
neural network (ANN) (Wang et al., 2009), self-organizing map
(SOM) (Wang et al., 2009), genetic algorithm (GA) (Rahul et al.,
2009), and association rule learning (Yu and Tsai, 2008). Game
theory is used to build smart strategies for identifying
vulnerable areas in a WSN (Agah et al., 2004a,b). Only one
case concentrates on linking detection with prevention
to protect a hierarchical WSN from both internal and
external attacks (Su et al., 2005). Graph-based techniques specialize in modeling the network
flow as a graph (Ngai et al., 2006,
2007), which allows graph algorithms (such as
tree construction, depth-first search, etc.) to be applied to detect anomalies.
Finally, rule-based techniques, which often build upon prior knowledge such as assumption and
experience, are preferred in
flat WSNs (Silva et al., 2005; Yu and Xiao, 2006; Ioannis et al.,
2007; Ho et al., 2009). Table 1 summarizes this taxonomy.
1.3. Key challenge
The key challenge in developing anomaly detection for WSNs is to
identify anomalies with high accuracy at minimal energy cost,
so as to prolong the lifetime of the entire network. This target
can be approached along several paths. First, more attention
should be paid to lightweight detection techniques, which are
characterized by compactness and efficiency. Second, restructuring detection schemes in a
distributed manner can spread the
energy overhead across the entire network and markedly reduce
the communication overhead, so that the lifetime of the network is extended. A suitable
detection pattern can also reduce the
energy cost without sacrificing security and reliability. In addition,
smart strategies such as shrinking the
attribute set, compressing the input dataset, and simplifying the
procedure of analysis and decision can contribute considerably to
conserving energy.
1.4. Organization
The rest of this paper is organized as follows. In the second
section, the key design principles of anomaly
detection in WSNs are discussed in detail. The following two
sections introduce many representative detection schemes for
hierarchical and flat topologies, respectively. The fifth
section presents the analysis and comparison of schemes
that belong to a similar technique category. Finally, this survey is
summarized with a presentation of potential research
areas in the near future.

2. Key design principles


The key design principles of anomaly detection in WSNs
concern several aspects:

• target;
• typical security threats;
• detection pattern;
• detection method;
• attribute selection.


2.1. Target
The target states what a detection scheme is expected to be
able to do. To ensure its performance, a detection
scheme should achieve a target comprising the following (Ioannis
et al., 2007):
Effectiveness: The effectiveness of a detection scheme is reflected by
its detection accuracy and false alarm rate. The detection
accuracy rate is the number of successfully detected anomalies divided
by the total number of anomalies. False alarms consist of false
positives and false negatives, where a false positive means that a
legitimate activity is falsely identified as an anomaly, and a failure
to capture a real anomaly results in a false negative. The false alarm
rate is the number of false alarms divided by the number of
reported anomalies. A good scheme should reach a high detection accuracy rate while keeping
the false alarm rate low. On the
other hand, the ability to detect unknown anomalies (new types of
anomaly) is also significant, as security threats to
WSNs are increasingly diversified and deliberate. This ability
is referred to as detection generality in this paper.
Minimized resource: A WSN is characterized by severely constrained resources, especially
energy. As a
result, minimizing the energy cost is a priority. Lower
resource usage partly leads to faster detection, but probably
at the cost of effectiveness. In consequence, it is difficult to
trade off effectiveness against resource usage. Given that
most of the energy in a sensor node is drained by radio
communication rather than by computation (Roman et al., 2006),
performing as much in-network computing as possible, namely
computing in a distributed manner, might be a promising
way to address this issue. In addition, resources
may be conserved by designing lightweight detection
schemes and smart strategies.
Trust no node: Unlike nodes in wired networks or ad hoc networks, a
sensor node can be compromised easily due to its weakness.
Accordingly, a detection scheme has to meet the criterion
"trust-no-node" at all times. Building on a security foundation
(Zhang et al., 2008; Curiac et al., 2007; Su et al., 2005; Ngai
et al., 2006, 2007; Yu and Xiao, 2006; Ho et al., 2009), adding a
process of data filtering (Liu et al., 2007), and employing a voting
(or similar) mechanism (Liu et al., 2007; Li et al., 2008a,b;
Tiwari et al., 2009; Pires et al., 2004; Ioannis et al., 2007) might
be effective for directly ensuring the legitimate identity of a
sensor node or diluting the bad effects caused by unattended malicious nodes.
Be secure: The detection schemes themselves must be secure,
because the line of defense would be completely destroyed if
sophisticated adversaries disabled or bypassed the detection
service before initiating thorough attacks. In theory, adversaries
could use analytical measures to infer what kind of
detection rules or algorithms are employed by the targeted
schemes. Furthermore, adversaries may wreck the detection
scheme with brute force. The survivability against malicious
activities is thus a significant criterion for assessing the security of
the detection schemes themselves. Moreover, an optimal detection
scheme must be able to recover its detection service
immediately after being wrecked, which is referred to as tolerability.
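The two effectiveness measures described under "Effectiveness" can be made concrete with a
small sketch. It follows the wording of this subsection literally (false alarms are the sum of
false positives and false negatives, normalized by the number of reported anomalies); the
function name and the set-based representation of instances are assumptions made for
illustration only.

```python
def effectiveness(true_anomalies, reported):
    """Compute the two effectiveness measures as worded in this subsection.

    true_anomalies: set of instance ids that really are anomalous (hypothetical input)
    reported:       set of instance ids that the detector flagged (hypothetical input)
    """
    detected = true_anomalies & reported          # successfully detected anomalies
    false_positives = reported - true_anomalies   # legitimate activity flagged as anomalous
    false_negatives = true_anomalies - reported   # real anomalies that were missed

    detection_accuracy = len(detected) / len(true_anomalies) if true_anomalies else 0.0
    # The text counts both false positives and false negatives as false alarms
    # and divides by the number of reported anomalies.
    false_alarms = len(false_positives) + len(false_negatives)
    false_alarm_rate = false_alarms / len(reported) if reported else 0.0
    return detection_accuracy, false_alarm_rate


# Four real anomalies; the detector reports three of them plus one legitimate instance.
accuracy, alarm_rate = effectiveness({1, 2, 3, 4}, {2, 3, 4, 9})
print(accuracy, alarm_rate)
```

In this example three of the four anomalies are detected, and one false positive plus one
false negative are counted against four reports.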
2.2. Typical security threats
The typical security threats to WSNs that can be identified by
a detection scheme should be fully reviewed. Many surveys
of these security threats have been presented (Lopez
and Zhou, 2008; Han et al., 2005) according to different criteria,
but detection is not effective against all of the threats they mention;
an eavesdropping attack, for example, can only be resisted by the built-in
security foundation. On the other hand, the relationship between
these threats is sometimes hard to untangle. For example, a selective forwarding attack is a
subsequent offence based on a sinkhole attack, whereas a successful sinkhole attack
results not only in the following selective forwarding attack, but
also in a series of severe security damages such as message alteration.
The typical security threats and their countermeasures
that have been mentioned in the cited papers are therefore only roughly
summarized in Table 2. In fact, more comparisons should be made,
such as the damage scope of each security threat, the
damage degree of each security threat, the symptom of each
security threat (relating to attribute selection, see Section 2.5),
etc. This full work is expected to be completed separately, due to
space limitations. Random failure is regarded here as a special case of
security threat, as anomaly detection is also able to deal
with it.

Table 2
The typical security threats and preferred countermeasures.

Security threats        Preferred countermeasures
Black-hole              Statistical measure
Malicious node          Statistical distribution, data mining
Sinkhole                Graph, rule
Selective forwarding    Statistical measure, data mining
Wormhole                Statistical measure, rule
Replica node            Rule
Random failure          Statistical distribution, data mining

2.3. Detection pattern


Axelsson (1998) proposed a generic framework of intrusion
detection systems (IDSs), consisting of audit collection/storage,
processing, configuration/reference data, active/processing data,
and alarm. Since anomaly detection is a branch of intrusion detection, a generic
framework of anomaly detection systems (ADSs) can be derived directly
from the original IDS framework; it comprises
input, data processing, analysis and decision, and output (Chandola
et al., 2009). In general, the input for anomaly detection is a dataset that includes a
collection of data instances. A data instance
consists of a set of attributes, and is either univariate or multivariate.
An attribute can be binary, categorical, or
continuous. In the procedure of data processing, a normal profile
representing the benign status of the system is produced by a
training procedure or from prior knowledge. Certain detection
schemes may need a special preprocessing procedure.
Depending on how the input dataset is labeled, training can be supervised, semi-supervised,
or unsupervised. Relying on the established normal profile, the procedure of analysis and
decision uses specified algorithms to determine whether a test instance is an anomaly. Usually,
single or multiple thresholds are established for this
task. The type of anomaly can be point, contextual, or collective.
The final result, namely the output, is produced by the anomaly
detector in one of two possible forms: score or label. Figure 2
illustrates the generic framework of anomaly detection.
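The four stages of the generic framework (input, data processing, analysis and decision,
output) can be illustrated with a minimal sketch. The choice of a mean/standard-deviation
profile and a single z-score threshold is an assumption made here for illustration and is not
prescribed by the framework; the function names and the sample readings are hypothetical.

```python
from statistics import mean, stdev


def train_profile(training_values):
    """Data processing: build a normal profile from benign, univariate training data."""
    return {"mean": mean(training_values), "std": stdev(training_values)}


def score(profile, value):
    """Analysis: anomaly score as the distance from the profile mean in standard deviations.

    Assumes the training data were not all identical (non-zero standard deviation).
    """
    return abs(value - profile["mean"]) / profile["std"]


def label(profile, value, threshold=3.0):
    """Decision: turn the score into a label using a single (illustrative) threshold."""
    return "anomaly" if score(profile, value) > threshold else "normal"


# Input: a small dataset of continuous, univariate readings (e.g. temperature).
profile = train_profile([20.1, 19.8, 20.3, 20.0, 19.9, 20.2])

# Output: the detector can report either a score or a label for each test instance.
print(score(profile, 25.0), label(profile, 25.0))
print(score(profile, 20.1), label(profile, 20.1))
```

Returning both forms from the same profile mirrors the two output types named above; a
multivariate or contextual detector would replace the profile and the scoring function but keep
the same stages.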
The detection pattern basically concerns who takes
charge of the data processing procedure of anomaly
detection, since this determines many design details of a
scheme as well as its performance. Depending on the architecture
of a WSN, a range of detection patterns have been in use, which
are briefly described below. Table 3 lists
these popular detection patterns and their corresponding references, where CH and CSN stand
for cluster head and common sensor node, respectively.


Fig. 2. Generic framework of anomaly detection.

Table 3
Popular detection patterns.

Hierarchical WSNs
Patterns        References
CH              Wang et al. (2009) and Su et al. (2005)
CH and CSNs     Palpanas et al. (2003), Subramaniam et al. (2006),
                Zhang et al. (2008), and Rajasegarar et al. (2006, 2007)
Base station    Masud et al. (2009) and Rahul et al. (2009)

Flat WSNs
Patterns        References
One-hop         Onat and Miri (2005a,b)
Radio-range     Liu et al. (2007), Pires et al. (2004), and Silva et al. (2005)
Other           Dallas et al. (2007), Yu and Tsai (2008), Yu and Xiao (2006),
                Ioannis et al. (2007), and Ho et al. (2009)
Base station    Curiac et al. (2007) and Ngai et al. (2006, 2007)
Grouping        Li et al. (2008a,b)

In a hierarchical WSN, there are basically three available
detection patterns. First, the cluster head alone is responsible for
the data processing procedure (Wang et al., 2009; Su
et al., 2005). Second, the cluster head and common sensor
nodes cooperate to accomplish it (Palpanas et al., 2003;
Subramaniam et al., 2006; Zhang et al., 2008; Rajasegarar
et al., 2006, 2007). Third, the procedure is carried out at the
base station (Masud et al., 2009; Rahul et al., 2009). In the first
pattern, the common sensor nodes only collect the input datasets and/or
partly contribute to the procedure of analysis and decision; they do not
participate in the data processing procedure, for which the
cluster head alone is in charge.
However, this clearly leads to the overuse of energy in the cluster
head. As a result, the second and third detection patterns seem to
be more reasonable. None of these patterns, however, considers monitoring the cluster
head itself, which may fail to meet the criterion "trust-no-node". One possible remedy is to
let the common sensor nodes
monitor the cluster head by turns, for example by selecting a subset
of nodes according to their remaining energy (Wang et al., 2009;
Su et al., 2005). These detection patterns are illustrated in Fig. 3.
There are also three broad categories of detection pattern in
flat WSNs. First, a subset of nodes is on duty, each covering its
neighborhood according to a certain specification. In detail, this
neighborhood can be its one-hop neighbors (Onat and Miri, 2005a,b),
its radio range (Liu et al., 2007; Pires et al., 2004; Silva et al., 2005),
or other (Dallas et al., 2007; Yu and Tsai, 2008; Yu and Xiao, 2006;
Ioannis et al., 2007; Ho et al., 2009). Each active node takes care of its
specified neighborhood by monitoring it and carrying out the procedure of data processing.
The procedure of analysis and decision may
be resolved by the active nodes alone or cooperatively.
Second, the base station conducts anomaly detection across the
network (Curiac et al., 2007; Ngai et al., 2006, 2007). Third, the
network is partitioned into groups, and a subset of sensor nodes in
each group is activated to take charge of the monitoring and data processing
procedure (Li et al., 2008a,b). The common shortcoming of the first
pattern is redundant protection coverage, because there is no
mechanism capable of accurately measuring the maximal protection
coverage that the active nodes can afford. The third pattern
gives flat WSNs the chance to employ techniques as advanced as those
used in hierarchical WSNs. However, the grouping procedure certainly
brings a massive energy burden. The available detection patterns in flat
WSNs are shown in Fig. 4.
2.4. Detection method
The detection method is a key aspect of a detection scheme, as
it determines the scheme's scope of use. The applicable range of a
scheme is restricted by its preconditions, according to
which two detection methods are distinguished: prior-knowledge
based and prior-knowledge free.

Fig. 3. Available detection patterns (hierarchical). Pattern 1: CH; Pattern 2: CH and CSNs;
Pattern 3: BS.

Fig. 4. Available detection patterns (flat). Pattern 1: one-hop; Pattern 2: radio range;
Pattern 3: other; Pattern 4: BS; Pattern 5: grouping.

Fig. 5. Process of identifying detection techniques. DM: Data Mining; CI: Computational
Intelligence; IDA: Intrusion Detection Agent; SF: Security Foundation;
VD: Verifying Dataset; Stat: Statistical Techniques; DAD: Distributed Anomaly Detection.

The knowledge used in anomaly detection often consists of
assumption (Palpanas et al., 2003; Subramaniam et al., 2006; Liu
et al., 2007; Dallas et al., 2007; Li et al., 2008a; Pires et al., 2004;
Curiac et al., 2007; Ho et al., 2009) and experience (Tiwari et al.,
2009; Silva et al., 2005; Ioannis et al., 2007). If a normal profile is
produced on the basis of knowledge known in advance
instead of by an explicit training procedure, the scheme is
categorized as prior-knowledge based. For instance, one detector
is built on the assumption that the Mahalanobis squared distance constructed from the
networking attributes
follows a chi-square distribution (Liu et al., 2007). Based on the
assumption that the signal propagates according to a known model (e.g. the
two-ray ground model), another detection scheme
compares the signal strength estimated from the given model
with the real signal strength measured at the transceiver (Pires et al.,
2004). Security experts suggest that a node is highly likely to
be compromised if it discards more than w percent of the packets during t time units; from
this experience, a detection rule
is established (Tiwari et al., 2009; Ioannis et al., 2007).
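The experience-based rule above (a node is suspected when it discards more than w percent of
the packets within t time units) can be sketched as a sliding-window check. This is only an
illustration of the rule as stated here, not the implementation of the cited schemes; the log
format, the function name, and the parameter names are assumptions.

```python
def is_suspicious(forward_log, w_percent, t_units, now):
    """Flag a node that dropped more than w_percent of its packets in the last t_units.

    forward_log: list of (timestamp, forwarded) pairs observed for one neighbor, where
                 forwarded is True if the packet was relayed and False if it was dropped
                 (hypothetical format).
    """
    # Keep only the observations that fall inside the window (now - t_units, now].
    window = [fwd for ts, fwd in forward_log if now - t_units < ts <= now]
    if not window:
        return False  # nothing observed in the window, so no evidence either way
    dropped = sum(1 for fwd in window if not fwd)
    return 100.0 * dropped / len(window) > w_percent


# A neighbor that dropped 3 of the 5 packets it was expected to forward.
log = [(1, True), (2, False), (3, False), (4, True), (5, False)]
print(is_suspicious(log, w_percent=50, t_units=5, now=5))   # 60% dropped, above 50%
print(is_suspicious(log, w_percent=70, t_units=5, now=5))   # 60% dropped, below 70%
```

No training step is involved: the thresholds w and t come entirely from prior knowledge, which
is what makes such a rule fast to evaluate but limited to the behavior it was written for.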
A prior-knowledge free scheme performs detection
without any related knowledge in advance. The normal profile is
produced by a training procedure. All data mining, computational intelligence-based, and
graph-based detection schemes are
prior-knowledge free (Rajasegarar et al., 2006, 2007; Masud et al.,
2009; Wang et al., 2009; Rahul et al., 2009; Yu and Tsai, 2008;
Ngai et al., 2006, 2007), as are most statistical detection
schemes (Zhang et al., 2008; Onat and Miri, 2005a,b; Li et al.,
2008b). Classification is a typical detection technique derived
from the family of data mining, in which the classifier is built
by the training procedure. As for computational intelligence, GA
is a good example: it is applied to measure the fitness of nodes
without any prior knowledge (Rahul et al., 2009), so that a
detection scheme can be optimally deployed. With the network
flow information, sensor nodes are divided into many sub-trees,
where the root of the biggest sub-tree is regarded as a compromised
node (Ngai et al., 2006, 2007). In addition, the standard deviation
of packet arrival intervals during a specified time period is trained
as the normal profile for identifying anomalies (Onat and Miri,
2005a), in accordance with
$$|\mathrm{mean}(\mathit{recBuf}) - \mathrm{mean}(\mathit{intBuf})| > K \cdot \mathrm{std}(\mathit{recBuf}).$$
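A direct transcription of the inequality above is sketched below. The survey does not define
the two buffers at this point, so the sketch assumes that recBuf holds the packet arrival
intervals used as the reference profile and that intBuf holds the intervals under test; the use
of the population standard deviation and the sample values are likewise assumptions.

```python
from statistics import mean, pstdev


def arrival_anomaly(rec_buf, int_buf, k):
    """Return True when |mean(rec_buf) - mean(int_buf)| > k * std(rec_buf).

    rec_buf: packet arrival intervals assumed to represent normal behavior
    int_buf: packet arrival intervals under test
    k:       sensitivity factor K from the inequality
    """
    return abs(mean(rec_buf) - mean(int_buf)) > k * pstdev(rec_buf)


reference = [1.0, 1.1, 0.9, 1.0, 1.0]   # intervals collected while the node behaved normally

print(arrival_anomaly(reference, [1.0, 1.05, 0.95], k=3))  # similar mean interval
print(arrival_anomaly(reference, [2.0, 2.1, 1.9], k=3))    # packets arrive half as often
```

A larger K tolerates more variation before raising an alarm, so it trades detection accuracy
against false alarms in the sense of Section 2.1.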
In conclusion, the dependency on prior knowledge certainly
limits the applicability of prior-knowledge-based schemes, but they are
generally good at detecting anomalies that closely correlate with
their known knowledge. Besides, these schemes usually offer
fast detection and are simple to implement. On the
contrary, prior-knowledge free detection schemes may be slower at detection, but they have a
stronger capability to address unknown security threats or random
failures. A rough process for identifying appropriate
detection techniques is shown in Fig. 5.
2.5. Attribute selection
Notably, most malicious activities or
random failures against a WSN are reflected by one
or more attributes over the network. In fact, this is the essential reason why
anomaly detection can enhance the security and
reliability of a WSN. For example, an irregular change of hop count
implies a high likelihood of sinkhole attacks
(Dallas et al., 2007; Ngai et al., 2006, 2007); the signal power becomes
implausible under Hello flood and wormhole attacks
(Pires et al., 2004); insider attacks markedly affect the underlying
distribution of the sensed data (Liu et al., 2007); and network
traffic measurements such as packet dropping rate
(Ioannis et al., 2007) and packet arrival process (Onat and Miri,
2005a) are capable of identifying black-hole and selective forwarding
attacks. This makes attribute selection a critical research problem.
Furthermore, a reduced set of attributes can remarkably improve both the detection
speed and the detection accuracy (Chebrolu et al.,
2005; Kloft et al., 2008). However, this problem remains open in the
anomaly detection of WSNs, although a little progress has been made sporadically (Silva et al., 2005; Ho et al., 2009). Owing to space limitations, this issue will be
addressed separately.

3. Anomaly detection based on hierarchical WSNs


In hierarchical WSNs, statistical techniques, data mining and
computational intelligence, game theory, and hybrid detection
have been employed to realize detection schemes. The input is
collected at each common sensor node, possibly followed by a
preprocessing procedure or by part of the computation belonging
to the data processing procedure. The original or preprocessed
inputs, or the local normal profiles, are then sent to the cluster head or
base station, where the global normal profile is produced during
the data processing procedure with a training algorithm, some prior knowledge, or a combining algorithm.
The analysis and decision procedure is carried out at each common sensor node,
at the cluster head, or at both. Finally, the output of
anomaly detection is produced in a specified form wherever the
analysis and decision procedure has been performed. Basically, these
techniques tend to find a normal profile using a training procedure in order to achieve higher detection generality. Thus,
most of their detection methods are prior-knowledge free.
One common feature of these detection schemes is that they
use the hierarchical architecture to implement detection
in a distributed manner, which spreads the energy overhead
across the entire network and relieves the communication
burden. Because distributed detection requires a central entity
to organize and coordinate the sub-computation tasks throughout a group, the cluster head suits this role
naturally. In a distributed detection scheme, the common sensor
nodes participate in the data processing procedure, thereby
taking over part of the computing cost of the cluster head, and
exchange less information with the cluster head, which
conserves communication cost. For example, kernel
density estimator (Palpanas et al., 2003; Subramaniam et al., 2006),
clustering algorithms (Rajasegarar et al., 2006; Masud et al., 2009),
and support vector machine (SVM) (Rajasegarar et al., 2007) are
typical techniques on which these distributed schemes depend.
In the following, a number of particular detection schemes are
introduced by technique category; for each,
its principle, detection pattern, detection method, and any
unique feature or additional strategy are described in detail.
3.1. Statistical techniques
3.1.1. Distributed detection using kernel density estimator
A kernel density estimator is built to identify anomaly by
estimating the underlying distribution of sensed data (Palpanas
et al., 2003). First, each common sensor node accomplishes the
local detection. The cluster head then collects all local normal
profiles to carry out the global detection within its group. To
ensure the smooth delivery of streaming data, each
discrete event occurs under the control of two timing parameters:
deadline and importance.
The principle is simply described as follows. Given that S is a
random sample of static relation T and k(x) is the kernel function,
the underlying distribution f(x) is estimated over all tuples in S with
$$f(x) = \frac{1}{n} \sum_{t_i \in S} k(x - t_i).$$
The Epanechnikov kernel is employed in this case, as
$$k(x) = \begin{cases} \dfrac{3}{4}\,\dfrac{1}{B}\left(1 - \left(\dfrac{x}{B}\right)^{2}\right), & \left|\dfrac{x}{B}\right| < 1, \\ 0, & \text{otherwise}, \end{cases}$$
where B ($B = \sqrt{5}\,\sigma\,|S|^{-1/5}$, and $\sigma$ is the standard deviation of T) is the
bandwidth of the kernel function. Once f(x) is estimated, an anomaly
can be identified by calculating the number of sensed
data values within the neighborhood of t0. N(t0, r) is the
number of sensed data values in T that fall into a
sphere of radius r around t0, as
$$N(t_0, r) = \int_{[t_0 - r,\; t_0 + r]} f(x)\,dx.$$
If N(t0, r) is less than a threshold p, t0 is identified as an anomaly.
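The one-dimensional case can be sketched as follows. The sketch assumes that the standard deviation is estimated from the sample itself, uses the closed-form integral of the Epanechnikov kernel, and returns the integral of f(x) as a fraction of the data (scaling it to a count is left out). The function names and the example threshold are illustrative, not taken from the cited scheme.

```python
import math


def bandwidth(sample):
    """B = sqrt(5) * sigma * |S|^(-1/5); sigma is estimated from the sample here."""
    n = len(sample)
    mu = sum(sample) / n
    sigma = math.sqrt(sum((t - mu) ** 2 for t in sample) / n)
    return math.sqrt(5.0) * sigma * n ** (-0.2)


def epanechnikov_cdf(u):
    """Integral of 3/4 * (1 - v^2) from -1 to u, clipped to [0, 1]."""
    if u <= -1.0:
        return 0.0
    if u >= 1.0:
        return 1.0
    return 0.5 + 0.75 * u - 0.25 * u ** 3


def neighborhood_mass(t0, r, sample, b):
    """Integral of the estimated density f(x) over [t0 - r, t0 + r]."""
    total = 0.0
    for t in sample:
        upper = epanechnikov_cdf((t0 + r - t) / b)
        lower = epanechnikov_cdf((t0 - r - t) / b)
        total += upper - lower
    return total / len(sample)


def is_anomaly(t0, r, sample, b, p):
    """Flag t0 when the estimated mass around it falls below threshold p."""
    return neighborhood_mass(t0, r, sample, b) < p


readings = [20.0, 20.5, 19.5, 20.2, 19.8, 20.1, 19.9, 20.3]
b = bandwidth(readings)
print(is_anomaly(30.0, 0.5, readings, b, p=0.2))  # far from every sample
print(is_anomaly(20.0, 0.5, readings, b, p=0.2))  # inside the dense region
```

Because only the sample and the bandwidth are needed to evaluate the estimate, these are the values a node would have to share with its cluster head.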


Afterwards, the sample set S and the kernel bandwidth
B at each common sensor node are sent to the cluster head. Using a
combining algorithm, the cluster head works out the global
normal profile, with which the global detection is then launched.
The kernel density estimator is good at approximating the underlying distribution of a multi-dimensional dataset at reasonable resource cost. Moreover, it is easy to operate in a
distributed manner by combining the bandwidths of local kernel
functions. The choice of kernel function is critical to the
performance; however, parameter estimation is a hard
problem in this kind of non-parametric statistical technique.
3.1.2. Online detection using kernel density estimator
In the advanced kernel density estimator-based detection
scheme (Subramaniam et al., 2006), many enhancements are
made over the original effort (Palpanas et al.,
2003). Online approximation of the sensed data in a sliding
window is proposed, using the chain-sample algorithm. To
support the online approximation, several
points are improved. First, the size of the set resulting from two
sensor nodes is reduced by the technique of warehousing of
samples. Second, a suitable technique for computing the standard
deviation in a sliding window of streaming data is used to
facilitate the combination of bandwidths, as
$$V_{1,2} = V_1 + V_2 + \frac{N_1 N_2}{N_{1,2}} (\mu_1 - \mu_2)^2,$$
where $\mu$ is the mean, V is the variance, $N_{1,2} = N_1 + N_2$, and
$\mu_{1,2} = (\mu_1 N_1 + \mu_2 N_2)/N_{1,2}$. Third, each common sensor node only
reports the update of its kernel density estimator with a probability $f = |R_p| / (l\,|R|)$, where a parent node has l children nodes, each
with a kernel density estimator of size $|R|$, and the kernel density
estimator of the parent node has size $|R_p|$. Besides the distance-based distributed deviation detection algorithm (Palpanas
et al., 2003), a new local metrics-based algorithm, multi-granular
deviation detection (MGDD), is introduced. Given that $MDEF(p, r, \alpha)$
is the deviation factor of an observation p, and $\sigma_{MDEF}(p, r, \alpha)$ is the
normalized standard deviation in the sampling neighborhood of
p, p is flagged as an anomaly if
$$MDEF(p, r, \alpha) > k_\sigma\, \sigma_{MDEF}(p, r, \alpha),$$
where $k_\sigma$ is the factor determining a significant deviation.
Online detection is carried out in this advanced scheme. With
a probability-based strategy, the normal profile can be regularly
updated to follow the dynamics of the system without incurring too
much energy cost. On the other hand, a new local metrics-based
algorithm is introduced for detection, which suits datasets that are
indistinguishable by distance.
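The combination rule for two windows can be checked numerically with the short sketch below. It assumes that V denotes the sum of squared deviations from the mean (the variance multiplied by N), which is the form under which the identity holds exactly; the function names are illustrative.

```python
def summarize(values):
    """Return (N, mean, V), with V the sum of squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    v = sum((x - mu) ** 2 for x in values)
    return n, mu, v


def merge_stats(n1, mu1, v1, n2, mu2, v2):
    """Combine the summaries of two windows without revisiting the raw data."""
    n = n1 + n2
    mu = (mu1 * n1 + mu2 * n2) / n
    v = v1 + v2 + (n1 * n2 / n) * (mu1 - mu2) ** 2
    return n, mu, v


left = [1.0, 2.0, 3.0, 4.0]
right = [10.0, 12.0, 14.0]
n, mu, v = merge_stats(*summarize(left), *summarize(right))
print(n, mu, v / n)  # size, mean and population variance of the merged window
```

Only three numbers per window are exchanged, which is why the rule suits the combination of bandwidths between a child node and its parent.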
3.1.3. Detection using statistical measures
Relying on spatiotemporal correlation and consistency at some
spatial granularity, and on a frequency mechanism, a
detection scheme is designed to deal with insider attacks (Zhang
et al., 2008), such as exceptional messages and abnormal behavior.
Two detection mechanisms are introduced: in one,
the cluster head covers its group; in the other, each
common sensor node watches its one-hop neighbors. A random
secret key pre-distribution mechanism cooperates with this
detection scheme.
The principle of the exceptional message detection mechanism
(EMDM) is to use the similarity between a pair of messages
coming from the common sensor nodes to identify anomaly.
Given a dynamic set maintained by the cluster head
$$D = \{(M_i, W_i) \mid (M_1, W_1), (M_2, W_2), \ldots, (M_n, W_n)\},$$
where Mi stands for a recorded message and Wi is the weight
(frequency) of Mi. When a new message Mnew arrives at the
cluster head, it is compared against every entry of D. If Mnew matches any Mi
in accordance with
simMnew ,Mi

VMnew  VMi
,
VMnew  VMi

namely the similarity between Mnew and Mi is less than a threshold, Mnew is identified as normal and its corresponding Wi
increases. Otherwise, Mnew is put into a new observing period to
eventually determine whether it is a new type of message or a fake message.
If similar messages come from other nodes during this period,
Mnew is a new type of normal message; otherwise, Mnew is
confirmed as a fake message. The sender of Mnew is then marked as malicious
immediately, and the other common sensor nodes and the base
station are informed.
As for the abnormal behavior detection mechanism (ABDM),
two measures are employed to identify anomaly. One is to
examine whether a common sensor node sends too many or too few
messages in a turn. The other is built upon a security
foundation. Each common sensor node records its one-hop
neighbors' IDs and N(IDi), where N(IDi) is the value of the abnormal
behavior of node IDi. Given
$$\varphi(ID_x) = \{(ID_j, N(ID_j)) \mid (ID_1, N(ID_1)), \ldots, (ID_m, N(ID_m))\},$$
where m is the number of IDx's neighbors, and
$$\mu(ID_x) = \frac{1}{m}\sum_{j=1}^{m} N(ID_j), \qquad \sigma(ID_x) = \sqrt{\frac{1}{m-1}\sum_{j=1}^{m}\bigl(N(ID_j) - \mu(ID_x)\bigr)^{2}},$$
$$\varphi(ID_x) = \left|\frac{N(ID_j) - \mu(ID_x)}{\sigma(ID_x)}\right|,$$
where $\mu(ID_x)$ and $\sigma(ID_x)$ denote the mean and standard deviation of
$\varphi(ID_x)$ respectively, if $\varphi(ID_x)$ deviates from a normal value, node
IDj will be reported to the cluster head as a suspicious node.
This detection scheme uses a comparatively simple
technique, so that faster detection is achieved. Because
EMDM and ABDM work together, the cluster head and common
sensor nodes perform detection at the same time,
which may provide the network with stronger security. However,
an apparent flaw exists in EMDM: if more than one malicious
node sends the same fake message, EMDM cannot distinguish the fake
message from a new type of normal message, and thus fails against such attacks.
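The neighbor check of ABDM can be sketched as a standardized deviation test. The source only says that a value deviating from a normal value is reported, so the numeric threshold, the function name, and the dictionary layout below are assumptions for illustration.

```python
from statistics import mean, stdev


def suspicious_neighbors(abnormal_values, threshold=1.5):
    """Return neighbor IDs whose abnormal-behavior value deviates strongly.

    abnormal_values: dict mapping neighbor ID -> N(ID).
    threshold:       hypothetical cut-off on |N(ID) - mean| / std.
    """
    values = list(abnormal_values.values())
    if len(values) < 2:
        return []  # the sample standard deviation needs two or more values
    mu = mean(values)
    sigma = stdev(values)  # divides by m - 1
    if sigma == 0:
        return []  # all neighbors behave identically
    return [node for node, n in abnormal_values.items()
            if abs(n - mu) / sigma > threshold]


neighbors = {"n1": 1, "n2": 2, "n3": 1, "n4": 2, "n5": 1, "n6": 12}
print(suspicious_neighbors(neighbors))
```

With few neighbors a single extreme value inflates the standard deviation, so the threshold has to be chosen with the typical neighborhood size in mind.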

3.1.4. Detection using rules based on probability


Tiwari et al. introduce a probability model (Tiwari et al., 2009) into
the rule-based scheme (Ioannis et al., 2007), aiming at black-hole
and selective forwarding attacks. By using the probability model
to measure the traffic behaviors more accurately, the false alarm
rate of the rule-based detection scheme can be sharply reduced. A
part of the common sensor nodes are selected as watchdogs to
monitor the neighbors within their radio range; the cluster head
is responsible for the analysis and decision procedure.
This scheme employs two detection rules: (A) during a time
window of w, if the probability p0 of packet dropping in a sensor
node is greater than a threshold t, this node is reported as
suspicious; (B) if the probability p of a sensor node being reported
as suspicious is greater than 50%, the cluster head marks it as
definitely compromised. At each watchdog, the network traffic
pattern is modeled with a Poisson distribution. If the expected
number of occurrences during a given interval is $\lambda$, the probability
of k occurrences (k a non-negative integer, k = 0, 1, 2, ...) is equal to
$$f(k, \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!},$$
where $\lambda$ can be estimated according to network learning. If a
sudden change of the network traffic in a sensor node is perceived
by a watchdog, this node is reported as suspicious to the cluster
head. The rest of the watchdogs covering the radio range where the
suspicion appears are called to participate in the
analysis and decision procedure. During this procedure, if the probability p0
reported by a watchdog against the suspicious node is greater
than t, the cluster head records it as 1, otherwise 0. After a
specified time interval, the cluster head generates a probability
sequence against the suspicious node from the reports of the watchdogs. This sequence is split into two-bit pairs; afterwards, all 00
and 11 pairs are eliminated to prevent bias. Let the
probability of outcome 0 be q and that of 1 be $1 - q$. p is then
computed from the resulting sequence; if (B) is satisfied, the
suspicious node is definitively marked as a compromised node.
This scheme improves a rule-based detection scheme by
taking advantage of a probability-based measure, which reduces the false
alarm rate significantly.
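The two building blocks of this scheme, the Poisson probability and the elimination of 00 and 11 pairs, can be sketched as below. The source does not state which bit of a surviving pair is kept; keeping the first bit (the classic von Neumann convention) is an assumption, as are the function names.

```python
import math


def poisson_pmf(k, lam):
    """Probability of k occurrences when lam occurrences are expected."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def debias(bits):
    """Split bits into two-bit pairs, drop 00 and 11, keep the first bit of the rest."""
    kept = []
    for i in range(0, len(bits) - 1, 2):  # a trailing unpaired bit is ignored
        first, second = bits[i], bits[i + 1]
        if first != second:
            kept.append(first)
    return kept


def is_compromised(bits):
    """Rule (B): the share of 1s in the debiased sequence exceeds 50%."""
    kept = debias(bits)
    if not kept:
        return False  # no usable evidence after debiasing
    p = sum(kept) / len(kept)
    return p > 0.5


reports = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0]  # 1 = watchdog report above threshold
print(debias(reports))
print(is_compromised(reports))
```

Debiasing discards a large share of the reports, so the cluster head needs a sufficiently long observation interval before rule (B) becomes meaningful.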

3.1.5. Research problems


Statistical techniques-based detection schemes are flexible.
Single or multiple attributes over the network, such as the
network traffic (Tiwari et al., 2009) and the (multidimensional) sensed data (Palpanas et al., 2003; Subramaniam et al., 2006),
can be utilized to construct a variety of statistical distributions;
alternatively, statistical measurements such as similarity, mean, variance, and standard deviation
(Zhang et al., 2008) are dedicated to reflecting a normal
status. Taking the appropriate statistical distributions and measurements into account is necessary
to meet a wider range of application scenarios.
The benefits of the distributed manner have already been mentioned, and
using it as much as possible is strongly encouraged.
Statistical techniques have great potential to be reconstructed in a
distributed manner, because their core computing tasks can
be divided into smaller ones and then combined easily, as with the
kernel density estimator (Palpanas et al., 2003; Subramaniam
et al., 2006). Along this path, detection schemes based
on statistical techniques can be implemented with stronger
detection generality while remaining resource-efficient.
Online detection, which is of great significance for many real-time application scenarios, has been achieved with the kernel
density estimator technique (Subramaniam et al., 2006). However, it needs smart strategies to greatly reduce the
information exchange.

Incorporating other techniques into statistical techniques
could boost the detection performance. An example is the rule-based detection technique (Tiwari et al., 2009), where a couple of detection
rules are set up to avoid the difficulty of training the normal
profile, while a probability model is used to measure the
traffic behaviors accurately.

3.2. Data mining and computational intelligence-based techniques

3.2.1. Distributed detection using K-means clustering

With a K-means clustering algorithm, Rajasegarar et al. (2006)
design a distributed detection scheme. Each common sensor node
locally collects the input dataset to work out a normal profile.
Then the cluster head collects all local normal profiles to accomplish the data processing procedure, where a global normal
profile is produced. After receiving the global normal profile, each
common sensor node initiates the analysis and decision procedure to perform detection. In order to fit distance-based
clustering, the input dataset is normalized at each common
sensor node with a preprocessing procedure.
Given a dataset $v_{kj}$, $k = 1 \ldots m$, it is transformed to
$$u_{kj} = (v_{kj} - \mu_{v_j}) / \sigma_{v_j},$$
where $\mu_{v_j}$ and $\sigma_{v_j}$ stand for the mean and standard deviation of
the jth attribute in $v_{kj}, \forall k$, respectively. Subsequently $u_{kj}$ is normalized to the interval [0,1], according to
$$\bar{u}_{kj} = (u_{kj} - \min u_j) / (\max u_j - \min u_j).$$
Given a common sensor node si collecting a dataset Xi, si sends the
local normal profile
$$\left( \sum_{k=1}^{m} x_k^{i},\; \sum_{k=1}^{m} (x_k^{i})^{2},\; m,\; x_{\max}^{i},\; x_{\min}^{i} \right)$$
to the cluster head, where m stands for $|X_i|$. After the global
normal profile
$$(\mu_G, \sigma_G, x_{\max}^{G}, x_{\min}^{G})$$
is computed, the cluster head sends it back to the common sensor
nodes. After receiving the global normal profile, each common
sensor node initiates detection locally, using a fixed-width clustering algorithm. If the Euclidean distance between a data point
and its closest cluster centroid is larger than a user-specified
radius $\omega$, a new cluster is formed with this data point as its
centroid. To reduce the number of resulting clusters, a cluster
merging process is then conducted by measuring the inner-cluster distances. The clusters c1 and c2 merge if their inner-cluster distance d(c1,c2) is less than $\omega$. Finally, the average inter-cluster distance of the K nearest neighbor (KNN) clusters is applied to
identify anomalous clusters. Let ICDi be the average inter-cluster
distance (KNN) of cluster i, and AVG(ICD) and SD(ICD) be the mean and
standard deviation of all inter-cluster distances respectively. If
$$ICD_i > SD(ICD) + AVG(ICD),$$
cluster i is viewed as anomalous.
This detection scheme operates in a distributed manner,
where the common sensor nodes are responsible for a part of the
global normalizing procedure, which serves the core
K-means clustering algorithm. A normal profile consists of a four-parameter tuple,
which conserves energy cost in
communications.

3.2.2. Distributed detection using SVM
One-class quarter-sphere SVM, as a representative SVM algorithm,
is also suited to distributed anomaly detection (Rajasegarar
et al., 2007). First, the local quarter-sphere is computed at each
common sensor node. Second, the cluster head collects these
locally computed radii to work out a global radius. Detection is
then launched at each common sensor node with the global
normal profile.
In terms of the optimization problem:
$$\min_{R \in \mathbb{R},\; \xi \in \mathbb{R}^{n}} \; R^{2} + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad \|\varphi(x_i)\|^{2} \le R^{2} + \xi_i, \quad \xi_i \ge 0,$$
where $x_i$ is a data vector, the mapped vector $\varphi(x_i)$ is called
the image vector, R is the radius of the quarter-sphere, and
$\{\xi_i : i = 1 \ldots n\}$ are the slack variables that allow a part of the
image vectors to lie outside the quarter-sphere. This problem can
be resolved by the Lagrange algorithm. The image vectors consequently may fall inside, on the boundary of, or outside the
quarter-sphere (outliers). Subsequently, the cluster head collects
the radii locally computed at each common sensor node to obtain
a global radius Rm. Several measures are optional to compute
Rm: mean, median, maximum, and minimum. When the common
sensor nodes receive Rm, detection is initiated. If a test instance xi
satisfies
$$\mathrm{norm}\,\tilde{k}(x_i, x_i) > R_m^{2},$$
xi is identified as an anomaly.
This scheme may suffer from a heavier
data processing procedure, as a result of the high complexity of SVM. However,
only one parameter is exchanged between
the cluster head and common sensor nodes as the normal profile, which means much less
communication cost.
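The exchange of a single radius can be illustrated as follows. Solving the quarter-sphere optimization itself is not shown; the local radii are taken as given, the feature map is assumed to be the identity (so the squared norm of the raw vector is tested), and all names are hypothetical.

```python
from statistics import mean, median

# The four aggregation measures named in the text.
AGGREGATORS = {"mean": mean, "median": median, "max": max, "min": min}


def global_radius(local_radii, how="median"):
    """Combine locally computed radii into one global radius at the cluster head."""
    return AGGREGATORS[how](local_radii)


def is_anomalous(x, r_m):
    """Test ||x||^2 > R_m^2 under the assumed identity feature map."""
    return sum(v * v for v in x) > r_m ** 2


radii = [1.0, 1.2, 0.9, 3.0]  # hypothetical radii reported by four nodes
r_m = global_radius(radii, "median")
print(r_m, is_anomalous([1.0, 1.0], r_m), is_anomalous([0.5, 0.5], r_m))
```

The median is less sensitive than the mean or the maximum to one node reporting an inflated radius, which matters if a reporting node may itself be faulty.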
3.2.3. Distributed detection using clustering ellipsoids
Across the entire network, a WSN probably contains multiple
types of underlying data distribution; accordingly, Moshtaghi
et al. propose a distributed detection scheme based on clustering
ellipsoids (Masud et al., 2009). The base station takes charge of
computing the global hyper-ellipsoid, to accommodate the non-homogeneous underlying data distributions. The common sensor
nodes, on the other hand, are in charge of performing detection
with the global hyper-ellipsoid.
The general form of the elliptical boundary is represented as
$$ell(a, A; t) = \{x \in \mathbb{R}^{p} \mid (x - a)^{T} A (x - a) = t^{2}\},$$
where a is the center of the ellipsoid and t is its effective radius.
The Mahalanobis distance of x is
$$\|x - m\|_{V^{-1}} = \sqrt{(x - m)^{T} V^{-1} (x - m)},$$
where m is the mean and V is the covariance matrix. Consequently, x resides within a hyper-ellipsoidal boundary
if its Mahalanobis distance is t, i.e.:
$$B(m, V^{-1}; t) = \{x \in \mathbb{R}^{p} \mid \|x - m\|_{V^{-1}}^{2} = t^{2}\}.$$
x is considered a local anomaly if it falls outside this boundary.
Hyper-ellipsoids are sent to the base station by the common
sensor nodes as local normal profiles, and a global ellipsoid is
produced there. In order to cover as many types of underlying data
distribution as possible, t is selected deliberately. In addition,
the ellipsoids reported by the common sensor nodes are
clustered, which reduces the redundancy
between them. Given a common sensor node Nj sending the
parameter tuple (mj, Vj, nj) of its local ellipsoid Ej to the base
station B, the similarity between two ellipsoids is measured as
$$S(E_1, E_2) = e^{-\|m_1 - m_2\|}.$$
Positive root eigenvalue (PRE) plot is employed to estimate the
number of clusters c. Ellipsoids merge in a pairwise manner when
the similarities and c are ready: let (mu, Vu, nu) and (mv, Vv, nv) be
the parameter tuples of the ellipsoids Eu and Ev respectively; the
parameter tuple of the global ellipse E0 will be (m, V, n):
$$n = n_u + n_v,$$
$$m = \frac{n_u}{n} m_u + \frac{n_v}{n} m_v,$$
$$V = \frac{n_u - 1}{n - 1} V_u + \frac{n_v - 1}{n - 1} V_v + \frac{n_u n_v}{n(n - 1)} (m_u - m_v)(m_u - m_v)^{T}.$$

This parameter tuple of the global ellipse is in fact the global normal
profile. When the common sensor nodes receive it from B,
detection is launched locally.
Using the base station to undertake the main computing tasks,
this detection scheme is energy-efficient. However, there is
scope for better similarity measures for hyper-ellipsoids, which take the shape and orientation of the ellipses
into consideration, as well as their separation. Moreover, more
robust methods are needed to merge ellipses that come from
slightly different underlying distributions. In this context, a more
appropriate boundary than a standard deviation is also desirable, in order to avoid excessive false positive alarms.
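The pairwise merge of two ellipsoid summaries (mean, covariance, count) can be sketched in plain Python. The sketch assumes that each covariance is the unbiased sample covariance, under which the merged tuple equals the statistics of the pooled data; the function names are illustrative.

```python
def mean_cov(points):
    """Sample mean and unbiased covariance of a list of equal-length tuples."""
    n = len(points)
    p = len(points[0])
    m = [sum(pt[i] for pt in points) / n for i in range(p)]
    cov = [[sum((pt[i] - m[i]) * (pt[j] - m[j]) for pt in points) / (n - 1)
            for j in range(p)] for i in range(p)]
    return m, cov, n


def merge_ellipsoids(m_u, v_u, n_u, m_v, v_v, n_v):
    """Merge two (mean, covariance, count) tuples into the global tuple."""
    n = n_u + n_v
    m = [(n_u * a + n_v * b) / n for a, b in zip(m_u, m_v)]
    d = [a - b for a, b in zip(m_u, m_v)]  # difference of the two centers
    p = len(m)
    cov = [[(n_u - 1) / (n - 1) * v_u[i][j]
            + (n_v - 1) / (n - 1) * v_v[i][j]
            + n_u * n_v / (n * (n - 1)) * d[i] * d[j]
            for j in range(p)] for i in range(p)]
    return m, cov, n


node_a = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]
node_b = [(5.0, 5.0), (6.0, 7.0), (7.0, 5.0), (8.0, 9.0)]
m, cov, n = merge_ellipsoids(*mean_cov(node_a), *mean_cov(node_b))
print(n, m, cov)
```

The last term grows with the distance between the two centers, so merging ellipsoids from clearly different distributions inflates the global ellipsoid; this is one reason why clustering precedes merging.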

3.2.4. Detection using multi-agent and rened clustering


Wang et al. (2009) introduce a multi-agent-based detection
scheme, which takes advantage of the self-organizing map (SOM)
neural network algorithm and the K-means clustering algorithm.
Detection agents, including sentry, analysis, response, and management agents, are attached to each node over the network and
specifically take charge of detection. In this scheme, the cluster
head monitors its common sensor nodes, whereas a part of the
common sensor nodes are activated according to their remaining
energy to monitor the cluster head.
In fact, the cluster head and common sensor nodes monitor
each other using the same principle. The input dataset is
first clustered by the SOM neural network. Afterwards, the
clusters are refined using the K-means clustering algorithm. Let
Dxi be the Euclidean distance between xi and the center of its
cluster Xj1. If Dxi is larger than the distance between xi and the
center of another cluster Xj2, xi is re-clustered into cluster Xj2. The
U-Matrix map of the weights generated by the neural network makes it possible
to identify anomaly. Once an anomaly is perceived, the trust degree
between the two nodes is decreased. A definitive alarm is not produced
until the degree of trust falls below a predefined threshold.
The participation of agents provides this scheme with higher
flexibility, but also incurs extra costs. Having the cluster head
monitored increases the security, as it follows the trust-no-node principle.
However, employing the SOM neural network algorithm and the K-means
clustering algorithm at the same time brings a heavy computation burden.

3.2.5. Optimized detection using genetic algorithm


This GA-based scheme does not focus on detection explicitly,
but it is able to not only improve the detection accuracy, but also
reduce the false alarm rate (Rahul et al., 2009). This scheme
allocates the monitoring function to the sensor nodes by
using GA to evaluate their fitness on the basis of workload patterns,
packet statistics, utilization data, battery status, and quality-of-service compliance.
Sensor nodes are classified as cluster head (CH), inactive node
(powered off), inter-cluster router (ICR), and common sensor
node (NS). The base station obtains a competing
fitness function based on GA to optimally select a CH or ICR as the
local monitoring node (LMN), where each solution is represented
as a binary string (chromosome) and an associated fitness
measure. From the mating pool, a solution is picked out with a
probability Pi, as
$$P_i = \frac{F_i}{\sum_{j=0}^{N} F_j},$$
where Fi is the functional fitness of a possible solution, and N is
the total number of possible solutions. The LMN agent is in charge of
monitoring the following for its neighbor nodes: (a) received signal strength,
(b) transmission periodicity, (c) spurious transmissions from
illegitimate nodes, (d) response delay, and (e) packet dropping
or modification. In addition, the base station utilizes the LMN as a
loop-back agent to transmit special patterns through its trusted
route and receive the patterns over a pre-established route, in
which malicious nodes can be identified by the transmission of
hashed data. Moreover, the base station covers the entire network
with optional techniques (statistical metrics and models, Markov
model, time series model, etc.) on the basis of analytical
traffic data and LMN alerts. The fitness function consists of
monitoring node integrity fitness (MIF), monitoring node battery
fitness (MBF), monitoring node coverage fitness (MCF), and
cumulative trust fitness (CTF). MIF resists the allocation of
an LMN that is suspected to be compromised; the base station
estimates MIF with the integrity rank value, whereby a low value
indicates high susceptibility to intrusion.
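$$MIF = \frac{\sum_{ch=1}^{N} (IR_{ch} \times K_{ch})}{\sum_{ch=1}^{N} K_{ch}} + \frac{\sum_{icr=1}^{M} (IR_{icr} \times K_{icr})}{\sum_{icr=1}^{M} K_{icr}},$$
$$K_x = 1 \quad \text{if } x = LMN;\; x \in \{ch, icr\},$$
$$IR_{icr} = \sum_{r=1}^{R} IR_{icr}^{r},$$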

where IRch and IRicr are the integrity ranks of CH and ICR
respectively, R is the number of routes, and IRricr is the integrity
rank of the route r that includes icr as a router in its path. IR is
estimated by the base station according to
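$$R_{x,y} = \frac{cov(x, y)}{var(x) \times var(y)}, \quad -1 < R_{x,y} < 1,$$
$$\hat{\lambda}_t = \alpha \times \lambda_{t-1} + (1 - \alpha) \times \hat{\lambda}_{t-1},$$
$$IDC = var\left(\sum_{k=0}^{n} \lambda_k\right) \Big/ \sum_{k=0}^{n} \lambda_k,$$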

where $\lambda_t$ stands for the actual number of packet arrivals
during interval t, $\hat{\lambda}_t$ stands for the estimated number of
packet arrivals during interval t, and $\lambda_k$ is the number of
packet arrivals between time intervals $t_k$ and $t_{k+1}$. MBF reflects a
penalty on the battery usage of the communication between
sensor nodes, as
$$MBF = \frac{\sum_{i}^{N} (BC_i \times K_i)}{\sum_{i}^{N} K_i}, \quad BC_i = f(Q, U),$$
where Q is the residual battery capacity and BCi is the projected
battery capacity of node i (CH or ICR). The battery usage rate (U)
depends on individual load and can be estimated with traffic
patterns and node-sync data. MCF rewards LMNs that can snoop
on the maximal number of nodes with low estimated
integrity rank:
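$$MCF = \frac{1}{2}\left(\frac{\beta_1 \sum_{i}^{N} c_i}{F_1 \times N} + \frac{\beta_2 \sum_{j}^{M} c_j}{F_2 \times M}\right), \quad \beta_1 + \beta_2 = 1,$$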
where ci is the number of LMN agents that monitor malicious
node i, which is below the integrity rank threshold, cj is the
number of LMN agents that monitor non-malicious node j, which
is above the integrity rank threshold, and F1 and F2 are the
expected coverage redundancies for each malicious and non-malicious node respectively. The total fitness is given by CTF, as
$$CTF = \alpha_1 MIF + \alpha_2 MBF + \alpha_3 MCF.$$
This scheme is well suited to cooperate with any
detection scheme, not only conserving resource usage but also
improving detection performance. The limitation of this
scheme is that GA suffers from an exponential increase in running time as the
network scale grows.

3.2.6. Research problems
Data mining and computational intelligence-based
detection schemes are characterized by strong detection generality,
meaning that they are effective in defending against a wider range of security
threats, even unknown ones. This attractive detection generality, of
course, comes with high complexity, so these
schemes make their best effort to operate in a distributed manner
(Rajasegarar et al., 2006, 2007; Masud et al., 2009).
Beyond simply profiting from the hierarchical architecture of the
network, such as proficient control and management, little
routing redundancy, and adaptability to a distributed manner,
assigning the primary computing tasks to the base station also
saves much more energy overhead
(Masud et al., 2009; Rahul et al., 2009).
Equipping each sensor node with detection agents could enhance
the performance and the ease of implementation without taking
too much energy away from sensor nodes (Wang et al., 2009), but
certainly leads to extra expense on advanced devices.
In fact, the GA-based scheme (Rahul et al., 2009) is an
attractive paradigm for developing intelligent detection schemes
over WSNs. A few significant factors relating to the benign
status are modeled with a fitness function in each potential
solution, according to which the best solution is eventually found
by an optimizing process. The final detection solution could
achieve maximal detection performance with minimal resources.
This scheme is able to cooperate with a range of detection
techniques and makes them more intelligent.

3.3. Game theory-based techniques
3.3.1. Non-cooperative game theory
A game theory-based scheme is introduced for finding the
vulnerable areas in a WSN (Agah et al., 2004a), based on many
risk factors such as the reliability of a sensor node, different types of
attack, and past behaviors of the attacker. Only these identified
areas are provided with the protection of detection, in order to
save energy.
Intrusion detection is modeled as a game played between the
detection system and the adversary. Each player is allowed to select
one strategy from a set of strategies. Given a fixed cluster in
the network, say K, these strategies are available to the adversary:
attack cluster K, not attack cluster K, and attack a different
cluster. The detection system responds by either defending cluster K or
defending a different cluster. The strategies are numbered 1 to
3 for the adversary and 1 to 2 for the detection system,
so that two 2 × 3 payoff matrices A and B can be established. The
problem is to find the optimal strategy that maximizes the
profit for both players, namely achieving Nash equilibrium.
Measuring the payoff depends on several factors, including
attack type, density of sensor nodes, and the number of previous
attacks. Nash equilibrium is achieved when both players select
their own first strategy. In other words, protecting the cluster
which has the highest value of $U(t) - C_k$ brings about a reliable
rate of successful detection, where U(t) indicates the utility of
the network's on-going sessions, and $C_k$ indicates the average
cost of protecting cluster K.

3.3.2. Comparisons with game theory-based scheme
The non-cooperative game theory-based scheme (Agah et al.,
2004a) is then compared with Markov decision process (MDP)
and intuitive traffic measure (Agah et al., 2004b).
With a stochastic process known as a Markov chain, MDP can
forecast by modeling the system's past state transitions.
An MDP is a tuple (S,A,R,tr), where S is a state set, A is a set of
actions, R is the reward function, and tr is the state-transition
function. The past system states and the transitions between
states can be described by an MDP model. The target is to
maximize the expected value of the rewards received over time.
On the other hand, the traffic measure is based on an intuitive
metric: the cluster that carries the heaviest traffic
volume is marked as the most vulnerable area. Because it takes
many factors into account, the non-cooperative game theory-based scheme achieves the highest forecasting accuracy of the
three.
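The game of Section 3.3.1 can be illustrated with a small search for pure-strategy Nash equilibria. The payoff values below are hypothetical and chosen only so that the equilibrium falls on the first strategy of both players; the payoff construction of the cited scheme is not reproduced, and the matrix layout (rows for the detection system, columns for the adversary) is an assumption.

```python
def pure_nash(a, b):
    """Return all (row, column) pairs where neither player gains by deviating.

    a[i][j]: payoff of the row player (detection system, assumed).
    b[i][j]: payoff of the column player (adversary, assumed).
    """
    rows, cols = len(a), len(a[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            row_best = all(a[i][j] >= a[k][j] for k in range(rows))
            col_best = all(b[i][j] >= b[i][l] for l in range(cols))
            if row_best and col_best:
                equilibria.append((i, j))
    return equilibria


# Rows: defend cluster K, defend a different cluster.
# Columns: attack K, do not attack, attack a different cluster.
A = [[8, -1, -4],
     [-10, -1, 2]]   # hypothetical detection-system payoffs
B = [[3, 0, 2],
     [9, 0, -2]]     # hypothetical adversary payoffs
print(pure_nash(A, B))
```

With these numbers cluster K is valuable enough that attacking it stays attractive even when it is defended, which is what makes the first strategy of each player a mutual best response.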

3.3.3. Research problems


Similar to the GA-based scheme (Rahul et al., 2009) mentioned
earlier, non-cooperative game theory-based schemes are not
directly concerned with detection; however, they could help
detection schemes improve their performance as well as their
efficiency. The design of the payoff function is crucial to the
forecasting accuracy and deserves further study. Moreover, if
the GA-based scheme, which is capable of optimizing the placement
of the monitoring nodes, could cooperate with the game
theory-based scheme, which identifies the vulnerable
areas, the detection schemes are expected to achieve better
performance.
3.4. Hybrid detection
3.4.1. Detection with prevention technique
There is only one hybrid detection framework (Su et al., 2005)
that really calls for collaboration between an energy-saving
detection technique and an authentication-based prevention
technique. In the detection scheme, the cluster head is responsible
for monitoring its common sensor nodes; in turn, a
subset of the common sensor nodes is selected according to their
residual energy to monitor the cluster head.
A suite of secret keys is established during initialization:
the base station and each common sensor node share an
individual secret key, each common sensor node shares a set of
pairwise secret keys with its neighbors, the common sensor nodes
within a cluster share a cluster secret key, and a group secret key
is shared among all sensor nodes over the network. The packets
transmitted through the network are categorized as control messages
and sensed data. When the base station, a cluster head, or any
intermediate node forwards a control message, a message authentication
code (MAC) is appended using the proper secret key. The intermediate
nodes forwarding this control message verify the appended
MAC and replace it with a new MAC. The verifying and replacing
of MACs continues until the control message arrives at its destination.
If sender u sends control message M to receiver $v_i$ with
current time stamp $T_c$, a MAC is generated with the proper secret key
according to
$$u \rightarrow v_i : M,\ T_c,\ \mathrm{MAC}(K_{uv_i}, M \,|\, T_c)$$
where $M \,|\, T_c$ is the concatenation of M and $T_c$, and $\mathrm{MAC}(K_{uv_i}, M \,|\, T_c)$ is
the MAC generated from $M \,|\, T_c$ with the secret key $K_{uv_i}$, which is
shared between u and $v_i$. When a common sensor node $v_i$ forwards
sensed data D to the cluster head u, u needs to verify D to


prevent any fake or redundant messages sent by attackers.
Because D is usually large and periodically sent from $v_i$ to u, the
generation of MACs along the forwarding path is time-consuming
and impractical for a WSN. Consequently, an enhanced authentication
scheme based on LEAP is put forward. The original LEAP cannot
identify compromised nodes, as all the common sensor nodes
within a cluster share only one cluster secret key. First, the enhanced
scheme uses pairwise secret keys instead of the cluster secret
key used by the original LEAP. Second, the enhanced
scheme employs a one-time key chain as session keys, which is fairly
efficient for authentication.
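The hop-by-hop verification and replacement of MACs on control messages, described earlier in this subsection, can be sketched as follows. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the unspecified MAC algorithm, the freshness window `max_age` is hypothetical, and the function names are not taken from Su et al. (2005).

```python
import hashlib
import hmac

def make_packet(key: bytes, message: bytes, timestamp: int):
    """Build the triple (M, Tc, MAC(K, M | Tc)) for one hop."""
    data = message + b"|" + str(timestamp).encode()
    return message, timestamp, hmac.new(key, data, hashlib.sha256).digest()

def verify_packet(key: bytes, packet, now: int, max_age: int = 5) -> bool:
    """Check the MAC and reject packets whose time stamp is too old (assumed window)."""
    message, timestamp, tag = packet
    data = message + b"|" + str(timestamp).encode()
    expected = hmac.new(key, data, hashlib.sha256).digest()
    fresh = 0 <= now - timestamp <= max_age
    return fresh and hmac.compare_digest(tag, expected)

def forward(packet, key_in: bytes, key_out: bytes, now: int):
    """Verify the incoming MAC, then replace it with a MAC under the next-hop key."""
    if not verify_packet(key_in, packet, now):
        return None  # drop forged, tampered, or stale control messages
    message, _, _ = packet
    return make_packet(key_out, message, now)
```

An intermediate node would call `forward` with the pairwise key it shares with the previous hop as `key_in` and the key it shares with the next hop as `key_out`.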
The detection targets three types of
misbehavior: packet dropping, packet duplicating, and packet
jamming. The detection scheme can be divided into two parts:
in one, the cluster head monitors its common sensor nodes,
and in the other, the common sensor nodes monitor their
cluster head in turn. In particular, monitoring the cluster head
consists of arranging monitoring nodes, reacting to abnormal
cluster heads, determining the alarm threshold, and determining
the group size. Monitoring the common sensor nodes
simply localizes the suspicious node by its pairwise secret key if
an anomaly is found.
This scheme is able to be both energy-efficient and
strongly secured by attending to many
details, for example linking detection against internal attackers
with prevention against external attackers, using a one-time
key chain, letting the cluster head be monitored at
minimized energy cost, and quickly localizing the compromised
nodes with a secret key. However, sensor nodes cannot move
and new sensor nodes cannot be added once the pairwise keys have
been established. A dynamic key management and
distribution mechanism could probably overcome this flaw.

might be implemented with assistance such as the installation of
agents (Ho et al., 2009).
Besides, detection methods in flat WSNs are also diverse.
Minimizing energy consumption while retaining good performance
is always important, and this is discussed along with the
various detection methods used in the detection
schemes below.

3.4.2. Research problems


Few schemes (Zhang et al., 2008) mention cooperating with
a prevention-based technique in hierarchical WSNs. Moreover,
the security foundation established with a prevention technique
serves only to enhance the security of the network, and the
functions made available by the
secret keys are not exploited. WSNs should already be protected by a security
foundation (Perrig et al., 2001). The detection scheme
would evidently be more efficient if it could utilize the functions provided
by this security foundation, rather than using prevention
and detection separately.

4.1.2. Detection using multi-hop ACK


Building upon a multi-hop acknowledgement mechanism, a
detection scheme is put forward to defend against selective
forwarding attacks (Yu and Xiao, 2006). Detection is active along
the path that forwards packets from the source node to the base
station, and the base station, intermediate nodes, and source
node all take part.
A security foundation has to be established first, including
(A) node initialization and deployment, and (B) OHC (one-way
hash chain) based one-to-many authentication. During initialization, the secret key
server loads every sensor node with a unique secret key and a
symmetric bivariate polynomial f(u, v). The
unique secret key is shared between the node and the base
station, and can be used for encrypting messages and generating
MACs (message authentication codes). During deployment,
each sensor node tries to find its downstream and upstream
nodes. Afterwards, OHC is used to establish one-to-many
authentication among sensor nodes, which may be multiple hops
away. Compared with pairwise key-based authentication, the
OHC-based mechanism decreases both the communication
and computation overhead.
The detection is carried out separately in the upstream direction (from the source node
to the base station) and the downstream direction (from the base station to the
source node). In the upstream detection, each intermediate
node handles report packets, ACK packets, and alarm
packets. When a report packet has been forwarded up to
ACK_Cnt times, an ACK packet (with TTL set to ACK_TTL) is generated
and forwarded in the opposite direction. An intermediate
node starts waiting for ACK packets after it sends a report
packet. If fewer than t ACK packets return within the time period $T_{ack}$,
the node generates an alarm packet reporting that the next downstream
node is suspicious. Probably a malicious node would

4. Anomaly detection based on flat WSNs


In flat WSNs, rule-based techniques and statistical techniques
are more likely to be used. Without a hierarchical architecture,
all nodes are equally capable of functioning and participating
in internal protocols. Consequently, detection schemes that
are lightweight and require less communication are preferable. In
this section, we survey some of the representative literature for
each technique category mentioned above.
A rule-based model is commonly developed in accordance
with assumptions, information, or experience known in advance.
As a result, it often focuses on specific security issues by examining
particular attributes of networking behaviors. In flat
WSNs, statistical techniques are relatively simpler than those
for hierarchical WSNs, because of the nature of the architecture.
Because data mining and computational intelligence techniques
often depend on a central entity to cope with heavy organizational
tasks, a flat architecture is naturally unsuited to them,
although data mining and computational intelligence techniques

4.1. Rule-based detection


4.1.1. Decentralized detection using rules
A decentralized rule-based scheme is proposed (Silva et al.,
2005), in which a rule union picked from a set of candidate rules
is applied to satisfy the specic demands of application scenarios.
Given a WSN composed of common nodes, monitor nodes,
intruder nodes, and base station, each monitor node is in charge
of monitoring the neighbors within its radio range, by turning the
promiscuous listening mode on.
In particular, this scheme consists of data acquisition, rule
application, and intrusion detection. In the first phase, each
monitor node collects messages in promiscuous listening mode
and filters out the important information for subsequent analysis.
The applicable rules are selected according to the requirements
during the second phase. In the intrusion detection phase,
each failure to match a rule increments a failure counter. An
alarm is raised once this counter exceeds a predefined threshold
within a round of detection.
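The rule application and failure counting of one detection round can be sketched as below. The two rules and the message fields `interval` and `rssi` are hypothetical examples chosen for illustration; they are not the rule set of Silva et al. (2005).

```python
def run_detection_round(messages, rules, threshold):
    """Return True (raise an alarm) if rule failures in this round exceed the threshold."""
    failures = 0
    for msg in messages:
        for rule in rules:
            if not rule(msg):
                failures += 1  # each failed rule match increments the counter
    return failures > threshold

# Hypothetical rules over overheard messages.
def interval_rule(msg):
    """Messages from a neighbor must not arrive faster than once per second."""
    return msg["interval"] >= 1.0

def range_rule(msg):
    """Signal strength must be plausible for a neighbor within radio range."""
    return msg["rssi"] >= -90

overheard = [
    {"interval": 2.0, "rssi": -60},
    {"interval": 0.2, "rssi": -95},
    {"interval": 0.1, "rssi": -50},
]
print(run_detection_round(overheard, [interval_rule, range_rule], threshold=2))
```

Here three rule failures occur in the round, so a threshold of 2 raises an alarm while a threshold of 3 does not.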
This scheme gives a good framework for rule-based detection.
However, it lacks a clear description of how the
monitor nodes are determined, in particular how many and
which sensor nodes should be on duty to make sure the entire
network is protected.


fabricate the alarm packet. However, this has little effect,
because the malicious node can only mark its next downstream
node as suspicious, whereas the source node regards both of them
as suspicious. In the downstream detection, an intermediate
node is able to identify a malicious node from discontinuous
Packet_IDs for a specific source node, in which case its upstream
node is reported as suspicious to the base station. The base station
can then filter out false alarms definitively.
In fact, a specific protocol serves the detection service,
rather than simply a set of detection rules. Hence, this scheme
is only reluctantly classified as a rule-based technique, on the understanding that
a protocol can be viewed as a series of rules. This scheme
is simple and fast, but it has to rely on a security foundation
based on secret key management. Besides, the conditions under which
report messages are started are not stated precisely.
4.1.3. Detection using rules
A rule-based detection scheme addressing black-hole and
selective forwarding attacks is proposed (Ioannis et al., 2007). A
subset of nodes, called watchdog nodes, is activated for monitoring
and for jointly making the final decision.
In particular, for a link A→B, node A and the nodes residing
inside the intersection of the radio ranges of A and B are set up as watchdog
nodes, which actually participate in detection. Two detection
rules are established: (A) if a node drops more than t percent
of packets during w time units, an alarm is produced; (B) if more
than half of the watchdog nodes produce alarms against a
node, this node is definitively marked as an intruder.
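The two rules can be expressed directly in code. The following is a minimal sketch; the function names and the representation of each watchdog's observation as a `(dropped, total)` pair for the window of w time units are assumptions made for illustration.

```python
def watchdog_alarm(dropped, total, t_percent):
    """Rule (A): alarm if more than t percent of packets were dropped in the window."""
    return total > 0 and 100.0 * dropped / total > t_percent

def is_intruder(observations, t_percent):
    """Rule (B): intruder if more than half of the watchdog nodes raise an alarm.

    observations holds one (dropped, total) pair per watchdog node.
    """
    alarms = sum(watchdog_alarm(d, n, t_percent) for d, n in observations)
    return alarms > len(observations) / 2

# Three watchdogs observe the same forwarding node; two see heavy dropping.
print(is_intruder([(5, 10), (6, 10), (0, 10)], t_percent=20))
```

Note that an exact tie (half of the watchdogs raising alarms) does not mark the node, since the rule requires more than half.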
This mechanism is energy efficient, since relatively simple
detection rules require less communication and incur little
overhead. However, it is unsuitable for
any application scenario that requires high detection accuracy
and a low false alarm rate.
This scheme is arguably the simplest and fastest one to date,
although its detection performance is low. Nevertheless, it establishes a
classic framework for using detection rules, together with
the majority vote mechanism, which it introduced to
detection in WSNs.
4.1.4. Detection using group deployment knowledge
In order to defend against replica node attacks, a detection
scheme is developed (Ho et al., 2009) that takes advantage of
group deployment knowledge. Detection is initiated at a
sensor node when this node receives a request from its neighbor
to forward a message. The use of a group deployment strategy is
the key assumption in this scheme. In this strategy, sensor nodes
are grouped together by the network operator and programmed
with the corresponding group information before deployment.
Each group of nodes is deployed towards the same location,
called the group deployment point, following a probability density
function (PDF) f. f is modeled as a two-dimensional Gaussian
distribution, as
$$f(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\left[(x - x_G)^2 + (y - y_G)^2\right] / (2\sigma^2)},$$
where (x, y) is the position of a sensor node, $(x_G, y_G)$ is the
deployment point of group G, and $\sigma$ is the standard deviation. In
total, there are three approaches, from basic to
advanced: basic, location claim, and multi-group.
The basic approach is described as follows. If a sensor node u
(a member of group $G_u$) receives a request from its neighbor node
v (a member of group $G_v$), u first checks whether the distance
between the group deployment points of $G_u$ and $G_v$ is less
than a predefined distance threshold d. If so, u believes v is a
trusted neighbor and then forwards messages coming from v. If
the deployment is not accurate enough, many benign nodes may
have their messages rejected by their neighbor nodes.
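The deployment model and the basic trust check can be sketched in a few lines. The function names are illustrative assumptions; the density follows the two-dimensional Gaussian given above, and the check compares the distance between the two group deployment points with the threshold d.

```python
import math

def deployment_pdf(x, y, x_g, y_g, sigma):
    """Two-dimensional Gaussian density of a node position around its deployment point."""
    sq_dist = (x - x_g) ** 2 + (y - y_g) ** 2
    return math.exp(-sq_dist / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def is_trusted_neighbor(point_u, point_v, d):
    """Basic approach: trust v if the deployment points of G_u and G_v are closer than d."""
    return math.dist(point_u, point_v) < d

# Deployment points of the two groups, in arbitrary length units.
print(is_trusted_neighbor((0.0, 0.0), (3.0, 4.0), d=6.0))
```

Because the check uses only pre-programmed group information, it costs no communication, which is also why inaccurate deployment leads directly to benign nodes being rejected.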
Consequently, the location claim approach offers
stronger tolerance: a node also forwards messages
coming from untrusted neighbor nodes as long as evidence can
be provided for their innocence. The home (deployment) zone of a
group is defined as a circle centered at its group deployment point
with radius $R_z$. By means of a secure software localization
method, a node can discover its real location after
deployment. In the neighbor discovery phase, a node v
($v \in N(u)$, where N(u) is the neighborhood of node u before v's
real location is verified) is retained only if it satisfies one of the
following conditions: (A) if v resides outside its home zone, its location
claim
$$C_v = \{ v \,\|\, L_v \,\|\, Sig_v \},$$
where $L_v$ is v's real location and $Sig_v$ is the signature generated
with v's private key, can be authenticated successfully; (B) if
v resides inside its home zone, its location message
$$M_v = \{ v \,\|\, L_v \,\|\, MAC_{k_{uv}} \},$$
where $MAC_{k_{uv}}$ is a message authentication code and $k_{uv}$ is the
secret key shared between u and v, can be authenticated successfully;
(C) v resides inside the radio range of u. Otherwise, v is removed
from N(u). Then u marks v as trusted if $v \in N(u)$, and otherwise as
untrusted. When u forwards a message coming from an untrusted
node v, u verifies v's location claim $C_v$. In detail, u floods $C_v$
throughout v's home zone with a probability $p_f$. If any node in this zone
receives conflicting claims, v has been replicated.
Based on the group deployment assumption,
this scheme is effective in defending against replica node attacks
in collaboration with secret key management. Similar to the
multi-hop ACK-based scheme (Yu and Xiao, 2006), a specific
protocol is designed. However, the performance may be unstable
because of the high dependency on the accuracy of node
deployment.
4.1.5. Research problems
In general, these schemes rely on prior knowledge, which restricts
their detection generality. However, their detection speed
benefits from the absence of an explicit training procedure. Basically, rule-based
schemes fall into two broad categories.
The simpler one builds up a series of detection rules,
following the classical rule-based detection framework (Silva
et al., 2005; Ioannis et al., 2007) to carry out detection. The
combination of detection rules is a significant issue, as too many
rules incur unnecessary energy cost but too few may reduce
detection accuracy. This issue is strongly related to the attribute
selection mentioned in Section 2.4.
The other one invents a specific protocol aimed at certain
particular security threats (Yu and Xiao, 2006; Ho et al., 2009),
which adds complexity to the design and implementation.
Furthermore, cooperation with the security
foundation appears to be of vast importance. Without a security
mechanism that protects them from tampering during operation, these
protocols may not be trustworthy at all.
4.2. Statistical techniques
4.2.1. Detection using radio model
Junior et al. introduce a scheme that focuses on HELLO flood
and wormhole attacks (Pires et al., 2004). Each node monitors
the neighbors within its radio range. Once a node hears a
message transmission from one of its neighbors, it starts
detection. A message transmission is regarded as suspicious if


its signal power is incompatible with its sender's geographical
position. The final decision is then made by a vote mechanism.
If the network follows a radio propagation model such as the Free
Space model or the Two-Ray Ground model, this model relates the
transmission power, received signal strength, and distance
between sender and receiver. For example, in the Two-Ray
Ground model the received signal power is
$$P_r = \frac{P_t \cdot G_t \cdot G_r \cdot h_t^2 \cdot h_r^2}{d^4 \cdot L},$$
where $P_r$ (watts) is the received power, $P_t$ (watts) is the transmission
power, $G_t$ is the transmitter antenna gain, $G_r$ is the receiver antenna
gain, $h_t$ (meters) is the transmitter antenna height, $h_r$ (meters) is the
receiver antenna height, d (meters) is the distance between the
sender and receiver, and L is the system loss (a constant). A transmission
is suspicious if the difference between the expected strength of the received signal
and the actual strength detected at the receiver's transceiver is
larger than a predefined threshold, that is,
$$r > R, \qquad r = 1 - \frac{\min(P_r, P_e)}{\max(P_r, P_e)},$$
where $P_r$ is the received signal strength, $P_e$ is the
expected received signal strength, and R is the threshold, called the
maximum ratio difference. Then, the suspicious node information
dissemination protocol (SNIDP) is initiated, by which the final
decision is made. Each sensor node maintains a table that records
the numbers of suspicious and unsuspicious votes against the neighbors
within its radio range. When node A perceives a suspicious node B, A
broadcasts a request to the entire network. If the request receiver
does not hear B, the request is ignored; otherwise, the request
receiver has to cast a suspicious or unsuspicious vote for B by
broadcasting a reply. If A eventually finds that the suspicious count outnumbers
the unsuspicious count, B is definitely marked as compromised.
In addition, each transmission is only examined with a certain
probability, in order to conserve energy.
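The signal-strength check can be sketched as follows, assuming the Two-Ray Ground model and reading the ratio difference as one minus the ratio of the smaller to the larger power. The function names and the numeric values are illustrative assumptions, not parameters from Pires et al. (2004).

```python
def two_ray_expected_power(pt, gt, gr, ht, hr, d, loss=1.0):
    """Expected received power (watts) under the Two-Ray Ground model."""
    return pt * gt * gr * ht ** 2 * hr ** 2 / (d ** 4 * loss)

def is_suspicious(p_received, p_expected, max_ratio_diff):
    """Flag a transmission whose ratio difference r exceeds the threshold R."""
    r = 1.0 - min(p_received, p_expected) / max(p_received, p_expected)
    return r > max_ratio_diff

# A sender that claims to be 10 m away but is heard four times too strongly.
expected = two_ray_expected_power(pt=1.0, gt=1.0, gr=1.0, ht=1.0, hr=1.0, d=10.0)
print(is_suspicious(4 * expected, expected, max_ratio_diff=0.2))
```

Using a ratio rather than an absolute difference keeps the test meaningful across the very wide range of received powers that the fourth-power distance term produces.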
In accordance with the assumption regarding radio propagation,
an anomaly is identified by a statistical measure, along with a
majority vote mechanism. If the assumption does not match the
real situation, this scheme is invalid.

4.2.2. Detection using packet arrival process


Based on a statistical measure of the packet arrival
process, a detection scheme is proposed by Onat and Miri (2005a).
Each sensor node maintains the normal traffic profile of its one-hop
neighbor nodes, with which anomalies can be identified.
A general Pareto arrival process with short ON bursts and
long OFF periods is built to depict the traffic pattern. For each ON
burst, a Poisson sub-process is created. For each of its one-hop neighbors,
each sensor node separately maintains a received buffer (storing the last N
arrival intervals) and an intrusion
buffer (storing the latest $N_1$ arrival intervals, $N_1 < N$),
where an arrival interval is the time between a pair of
consecutive packets. Then, the means and standard
deviations of the received buffer and intrusion buffer are examined. If
$$|\mathrm{mean}(recBuff) - \mathrm{mean}(intBuff)| > K \cdot \mathrm{std}(recBuff),$$
where K is a predefined constant, an alarm is produced. To
prevent the false alarm rate from increasing as
energy is depleted, a parameter called the miss threshold is
introduced, because the rate of packet dropping gradually grows
as the energy of a sensor node declines. In detail,
if the number of packets failing to pass detection is
larger than a miss threshold $N_2$ ($N_2 < N$), the properties of the
received buffer need to be adjusted.
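The two-buffer comparison can be sketched as below. The class name `ArrivalMonitor` is hypothetical, the sample standard deviation is an assumption (the source does not say which estimator is used), and the miss-threshold adjustment is omitted for brevity.

```python
from collections import deque
from statistics import mean, stdev

class ArrivalMonitor:
    """Tracks inter-arrival times of packets from one one-hop neighbor."""

    def __init__(self, n, n1, k):
        assert n1 < n
        self.rec_buff = deque(maxlen=n)   # received buffer: last N intervals
        self.int_buff = deque(maxlen=n1)  # intrusion buffer: latest N1 intervals
        self.k = k

    def observe(self, interval):
        """Record one inter-arrival time and return True if an alarm is raised."""
        self.rec_buff.append(interval)
        self.int_buff.append(interval)
        if len(self.rec_buff) < self.rec_buff.maxlen:
            return False  # not enough history to form a profile yet
        gap = abs(mean(self.rec_buff) - mean(self.int_buff))
        return gap > self.k * stdev(self.rec_buff)

monitor = ArrivalMonitor(n=10, n1=3, k=1.0)
normal = [1.0, 1.2, 0.8, 1.0, 1.2, 0.8, 1.0, 1.2, 0.8, 1.0]
print([monitor.observe(v) for v in normal + [0.1, 0.1, 0.1]][-1])
```

A burst of three very short intervals pulls the mean of the small intrusion buffer away from the long-term mean, which triggers the alarm in this example.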

4.2.3. Detection using packet power levels and arrival rates


Similar to the scheme based on the packet arrival process (Onat
and Miri, 2005a), a new scheme is advanced, in which packet
power levels and arrival rates are observed to identify anomalies
(Onat and Miri, 2005b). Each sensor node likewise monitors its one-hop
neighbor nodes.
A main packet buffer is maintained to record the arrival times
and received powers of the latest N packets. If the received power
of an incoming packet is below the minimum value or above the
maximum value of the received powers currently recorded in the
main packet buffer, this incoming packet is regarded as anomalous.
An alarm is produced once the number of
consecutive anomalous packets reaches a predefined threshold.
In addition, the arrival rates of the latest $N_2$ and N ($N_2 < N$)
packets are recorded. If
$$\frac{rate_{N_2}}{rate_N} \geq K,$$
where $rate_{N_2}$ and $rate_N$ are the arrival rates and K is a threshold,
an alarm is produced. The miss threshold is applied to this scheme
as well, in the same way as in the original scheme.
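Both checks can be sketched with plain functions. The names are illustrative, the recorded powers are held fixed while a run of incoming packets is examined (an assumption, since the source does not say whether anomalous packets enter the buffer), and the arrival rate is taken as packets per unit time over the span of the recorded arrival times.

```python
def power_is_anomalous(recorded_powers, power):
    """True if the power lies outside the range currently held in the main buffer."""
    return power < min(recorded_powers) or power > max(recorded_powers)

def power_alarm(recorded_powers, incoming_powers, limit):
    """Alarm once `limit` consecutive incoming packets are anomalous."""
    streak = 0
    for p in incoming_powers:
        streak = streak + 1 if power_is_anomalous(recorded_powers, p) else 0
        if streak >= limit:
            return True
    return False

def arrival_rate(times):
    """Packets per time unit over a list of increasing arrival times."""
    return (len(times) - 1) / (times[-1] - times[0])

def rate_alarm(times, n2, k):
    """Alarm if the rate over the latest N2 packets is at least K times the rate over all N."""
    return arrival_rate(times[-n2:]) / arrival_rate(times) >= k

# Packets arrive every 10 time units, then suddenly every 1 time unit.
print(rate_alarm([0, 10, 20, 30, 40, 41, 42, 43], n2=4, k=3))
```

The power check catches a sender transmitting from an unexpected distance, while the rate check catches flooding, so the two tests complement each other.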
This suite of detection schemes performs anomaly detection
with simple statistical techniques. It is indeed fast and efficient,
and probably suits a large number of applications in which the
security requirement is merely at entry level. However, because it uses a limited
number of attributes and a simple method
that only roughly measures the correlations in the data, it would
have difficulty finding malicious activity hidden behind superficially
genuine behaviors.

4.2.4. Detection using statistical distribution


By exploring the spatial correlations among the
networking behaviors of sensor nodes in close proximity, Liu et al.
(2007) propose an insider attacker detection scheme. Each monitoring
sensor node collects information from the neighbors
within its radio range, and applies a filtering process to ascertain
the authenticity of the information provided by indirect
neighbors (more than one hop away). Then a majority vote is
used to reach a verdict on each suspicious node.
In particular, this detection scheme consists of four phases:
information collection, false information filtering, outlier
detection, and majority vote. Let $N_1(x)$ be a bounded closed
set of $\mathbb{R}^2$ that can be monitored by sensor node x directly;
specifically, it can be x's one-hop neighborhood in watchdog-like
techniques. $N(x)$, with $N(x) \supset N_1(x)$, is the closed set of $\mathbb{R}^2$ that covers x
and the additional $n - 1$ nearest sensor nodes. x obtains a set of
attribute vectors F(x),
$$F(x) = \{ f(x_i) = (f_1(x_i), f_2(x_i), \ldots, f_q(x_i))^T \mid x_i \in N(x) \},$$
where q is the number of attributes in
use. In this case, the attribute set is made up of packet dropping
rate, packet sending rate, forwarding delay time, and sensor
readings. To prevent any adversary
hiding in $N_1(x)$ from forwarding false attribute vectors of $x_i$
(x's two-hop neighbor, $x_i \in N(x) \setminus N_1(x)$), a trust-based false
information filtering protocol is developed. Let
$$F_1(x) = \{ f(x_i) = (f_1(x_i), f_2(x_i), \ldots, f_q(x_i))^T \mid x_i \in N_1(x) \}$$
be the attribute vectors of $N_1(x)$, and let $\hat{\mu}_j$ be the sample mean and $\hat{\delta}_j$
the sample standard deviation of the jth component set
$F_{1,j}(x) = \{ f_j(x_i) \mid x_i \in N_1(x) \}$ of $F_1(x)$, as
$$\hat{\mu}_j = \frac{1}{n_1} \sum_{i=1}^{n_1} f_j(x_i), \qquad \hat{\delta}_j = \sqrt{\frac{1}{n_1 - 1} \sum_{i=1}^{n_1} \left( f_j(x_i) - \hat{\mu}_j \right)^2},$$


where $n_1$ is the number of nodes in $N_1(x)$. x works out
$$F'_{1,j}(x) = \{ f'_j(x_i) \mid x_i \in N_1(x) \},$$
where $f'_j(x_i) = \left| (f_j(x_i) - \hat{\mu}_j) / \hat{\delta}_j \right|$. For each $x_i \in N_1(x)$, x seeks out the
maximum attribute component
$$f'_M(x_i) = \max\{ f'_j(x_i) \mid 1 \leq j \leq q \},$$
which indicates how extremely $x_i$ deviates from its neighbors'
behaviors. A trust value is computed according to
$$T(x_i) = f_{Mm} / f'_M(x_i),$$
where $f_{Mm} = \min\{ f'_M(x_i) \mid x_i \in N_1(x) \}$. Suppose x receives t attribute
vectors from its neighbors regarding $x_j \in N(x) \setminus N_1(x)$, relayed by
$x_{j_1}, \ldots, x_{j_t} \in N_1(x)$. A node $x_{j_T}$ ($1 \leq T \leq t$) is a reliable relay
node if
$$T(x_{j_T}) = \max\{ T(x_{j_s}) \mid 1 \leq s \leq t \}, \qquad T(x_{j_T}) \geq T_{min},$$
where $T_{min} = f_{Mm}/2$. If $x_j$ has no reliable relay node available in
$N_1(x)$, the information about $x_j$ is dismissed by x. The filtered set
$\tilde{N}(x)$ ($\tilde{N}(x) \subseteq N(x)$) is determined as a result, which covers x's one-hop
neighbors and those two-hop neighbors of x that have a trustworthy
relay node in $N_1(x)$. Under the assumption that all
$f(x_i)$, $x_i \in \tilde{N}(x)$, follow a multivariate normal distribution, the Mahalanobis
squared distance constructed from the dataset $\tilde{F}(x)$ follows a
$\chi^2_q$ distribution, which enables outlier detection (Li et al.,
2008a).
In addition, the OGK algorithm is employed to estimate the mean $\mu$ and the
covariance matrix $\Sigma$ accurately. Finally, a majority vote mechanism is used for
making the final decision. x announces all suspicious nodes (D(x)) to
its neighborhood $N^*(x)$ ($N(x) \subseteq N^*(x)$), so that more neighbors can
participate in the voting procedure. The announcement includes
x's vote on each neighbor $x_i$, $x_i \in \tilde{N}(x)$, in which 1/0 indicates outlier/
normal. x then receives the announcements from other participant
nodes to generate a record for each node in $\tilde{N}(x)$. If the
proportion $p_i$ of the announcements voting against $x_i$ is larger
than 0.5, $x_i$ is definitely regarded as an outlier.
The choice of attributes is flexible in this scheme, so the
detection generality can be further extended by using
different combinations of attributes. With a false information
filtering procedure, this detection scheme can survive the
interference injected by unattended adversaries. Furthermore,
the use of a majority vote mechanism provides this detection
scheme with a double guarantee of reliability.
4.2.5. Detection using auto-regression model
Curiac et al. (2007) propose a detection scheme against
malicious nodes, based on the auto-regression (AR) model. The detector
is installed in the base station to examine whether the real value of
the data measured by a sensor node is approximately equal to the value
predicted by the AR model.
The AR model is defined as
$$x(t) = a_1 x(t-1) + a_2 x(t-2) + \cdots + a_n x(t-n) + \xi(t),$$
where x(t) stands for a time series, $a_i$ is an auto-regression
coefficient, n is the order of auto-regression, and $\xi$ is the noise
(commonly Gaussian white noise). If the coefficients $a_i$ are time-varying,
this equation can be rewritten as
$$x(t) = a_1(t)\, x(t-1) + a_2(t)\, x(t-2) + \cdots + a_n(t)\, x(t-n) + \xi(t).$$
The coefficients $a_i(t)$ can be estimated if the time series
$x(t), \ldots, x(t-n)$ is known (recursive parameter estimation).
Accordingly, the future value $\hat{x}(t)$ can be estimated if the
coefficients $a_i(t)$ and the time series $x(t), \ldots, x(t-n)$ are known.
Specifically, the QR decomposition recursive least squares (QRD-RLS)
algorithm is used to estimate the coefficients. After
$\hat{x}(t)$ is computed, the real value $x_A(t)$ measured by node A is
compared with it in terms of the error
$$e_A(t) = x_A(t) - \hat{x}_A(t).$$
If $e_A(t)$ is larger than a threshold $\varepsilon_A$, A is marked as a malicious
node.
The AR model is known for its efficiency, accuracy, and flexibility in
the field of time-series forecasting. However, it rests on the strong premise
that the chosen data meet the properties of the AR model,
which this paper does not sufficiently demonstrate.
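The prediction-error test of the AR-based scheme can be illustrated with a deliberately simplified sketch. It assumes an order-1 model fitted by ordinary batch least squares, which is swapped in here for the QRD-RLS estimator named in the text, and it compares the absolute error with the threshold (also an assumption). The function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares estimate of a1 in x(t) = a1 * x(t-1) + noise."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def is_malicious(history, measured, threshold):
    """Compare a node's reported value with the AR(1) prediction from its history."""
    a1 = fit_ar1(history)
    predicted = a1 * history[-1]
    error = measured - predicted
    return abs(error) > threshold  # absolute error is an assumption of this sketch

history = [1.0, 0.5, 0.25, 0.125]  # a series that halves at every step
print(fit_ar1(history), is_malicious(history, 1.0, threshold=0.01))
```

A recursive estimator such as QRD-RLS would update the coefficients with each new sample instead of refitting on the whole history, which matters when the coefficients are time-varying.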

4.2.6. Detection using hop count


Hop-count monitoring is applied to implement a detection
scheme by Dallas et al. (2007), which is effective against sinkhole
attacks. Each sensor node is responsible for examining the hop
counts of the packets passing through it, on the basis of the
ad hoc on-demand distance vector (AODV) routing protocol.
A sensor node collects the hop counts of the nodes in
its vicinity when the base station initializes the network and
periodically maintains routes by broadcasts. Given the sequence
of hop-count measurements in a sensor node
$$X = \{x_1, x_2, \ldots, x_n\},$$
a pair of limits $x_{lo}$ and $x_{hi}$ is estimated, which enables anomalies
to be identified. $x_{lo}$ and $x_{hi}$ are the lower and upper tolerance limits
for a given proportion $\alpha$ of X's underlying distribution.
The lognormal distribution closely approximates the underlying
distribution of hop counts, based on a goodness-of-fit test
measured by the skewness and kurtosis of the lognormal distribution,
whose probability density function (PDF) is
$$f_X(x) = \frac{e^{-(\ln x)^2 / (2\sigma^2)}}{x \sigma \sqrt{2\pi}}, \qquad x \geq 0,\ \sigma > 0.$$
Since $\alpha$ strongly affects the trade-off between detection
accuracy and false alarm rate, an alarm is only raised
once a number m of successive route updates (m = 2 in this case)
fall outside the confidence limits.
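The limit estimation and the m-successive-updates rule can be sketched as follows. This is a simplified illustration: the limits are taken as a central interval of a lognormal distribution fitted to the observed hop counts (with an estimated log-mean), which approximates but is not identical to formal tolerance limits, and the function names are assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

def hop_count_limits(hop_counts, alpha):
    """Approximate limits covering a proportion alpha of a fitted lognormal distribution."""
    logs = [math.log(x) for x in hop_counts]
    mu, sigma = mean(logs), stdev(logs)
    z = NormalDist().inv_cdf((1 + alpha) / 2)
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)

def raise_alarm(route_updates, limits, m=2):
    """Alarm once m successive route updates fall outside the limits."""
    lo, hi = limits
    streak = 0
    for x in route_updates:
        streak = streak + 1 if not (lo <= x <= hi) else 0
        if streak >= m:
            return True
    return False

limits = hop_count_limits([4, 5, 5, 6, 5, 4, 6, 5], alpha=0.95)
print(raise_alarm([5, 1, 1], limits))
```

A sinkhole typically advertises an implausibly short route, so a run of very small hop counts such as the two consecutive values of 1 above is what the rule is meant to catch.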
The success of this scheme depends upon the accuracy of the
lognormal distribution, which can be controlled by improving the
algorithms. However, this scheme is best deployed together with a
prevention-based technique, instead of as a standalone detection
service. Without a security foundation, adversaries may inject
tampered hop-count information into the network, disabling the
detection service immediately.
4.2.7. Detection using quantitative measure
A data transmission quality (DTQ) function is defined to
identify compromised nodes, followed by a voting procedure to
make the final decision (Li et al., 2008b). The entire network is
divided into multiple groups. Each node is aware of its group
information, and maintains a DTQ table for its neighbors in
this group. Two communication scenarios are considered: intra-group
and inter-group.
The DTQ function fluctuates smoothly for legitimate nodes,
whereas it decreases continuously for suspicious nodes. Building
upon long-term and short-term statistics of data
transmission quality, the DTQ function is established with the following
variables: E, the total energy cost of transmitting a data burst; D, the
total number of packets transmitted; $P(\cdot)$, the expected probability
of a packet being successfully transmitted; and $STB(\cdot)$, the
transmission stability of a sensor node:
$$STB(\cdot) = \begin{cases} \left( \dfrac{S(d,u)}{L(d,u)} \right)^{\alpha}, & \text{if } \dfrac{S(d,u)}{L(d,u)} < 1, \\[2ex] \left( \dfrac{S(d,u)}{L(d,u)} \right)^{1/\alpha}, & \text{if } \dfrac{S(d,u)}{L(d,u)} > 1, \end{cases}$$


where S(d, u) is the sum of the successful transmission rates of the
past n/m data bursts, and L(d, u) is that of the past n data bursts.
S(d, u)/L(d, u) reflects the trend in the quality of recent data
transmissions. In particular, the data transmission quality
increases when S(d, u)/L(d, u) > 1; otherwise, it decreases
immediately. The DTQ function is defined as
$$DTQ = k \cdot \frac{D}{E} \cdot \frac{STB(\cdot)}{P(\cdot)},$$
where k > 0. In this function, D/E stands for the real amount of
data packets successfully transmitted per energy unit.
$STB(\cdot)$ specifies the degree of data transmission stability. A sudden
increase of data transmission stability does not push the DTQ value
up, but any decay of data transmission stability soon decreases the DTQ
value. Ideally, the factor $1/P(\cdot)$ is able to erase the impact
of the environment entirely. Accordingly, a fast-falling DTQ value
enables malicious nodes to be identified. If any misbehavior is found
by the ACK mechanism, which usually serves as the security
foundation, the sender (in the intra-group scenario) or the first node (in
the inter-group scenario) of this group reduces the DTQ values
of its group nodes along the data transmission path.
Voting is subsequently applied to localize the compromised nodes
exactly. The detection threshold of a group is defined as
$$Th = \frac{\rho}{|N_{DTQ}|} \sum_{i \in N_{DTQ}} q_i,$$
where $N_{DTQ}$ signifies the set of nodes listed in the DTQ table,
$|N_{DTQ}|$ is the size of this set, $q_i$ is the DTQ value of node i, and
$0 < \rho < 1$. When a node finds that one or more DTQ values are
below the threshold, it believes compromised nodes exist. In such
a case, this node broadcasts a message to its group members,
initiating a voting procedure. Since the DTQ values recorded by
the other nodes carry unequal weights with respect to a node, it is unfair
to compare the number of votes directly. Evidently, DTQ
values with recent time stamps are more important than those with
old time stamps. Therefore, a variable $\omega$, taking values between 0 and 1, is set up as the
weight of a DTQ value, and it decreases over time. The voting
result can be represented as
Vm

n
X

wim qim vim ,

i1

where qim is the DTQ value of node m in the DTQ table located at
node i, wim is the weight of qim, and vim 1 if votes for m or vim  1 if
i votes against m. If V 5 0, m is denitely a compromised node; if
V b 0, m is a legitimate node and those nodes voting against m need
to update their DTQ values according to formula


$$q_m = q_m \times \left(1 + \frac{f}{f+a}\right),$$

where the votes for and against $m$ stand in the ratio $f:a$. Malicious nodes are then dismissed from the network immediately.
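The threshold test and the weighted vote described above can be illustrated with a minimal Python sketch. The names `Vote`, `group_threshold`, `suspicious_nodes`, and `voting_result`, as well as the sample DTQ table and votes, are hypothetical and serve only as illustration; the sketch returns $V_m$ and leaves the decision to the caller, because the text does not state how far from zero $V_m$ must be to count as decisive.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    weight: float     # w_im, time-decayed weight of the recorded DTQ value
    dtq: float        # q_im, DTQ value that voter i holds for node m
    in_favour: bool   # True if voter i votes for m, False if against


def group_threshold(dtq_table, r):
    """Th = r * (1 / |N_DTQ|) * sum of q_i over the DTQ table, with 0 < r < 1."""
    if not 0 < r < 1:
        raise ValueError("r must lie strictly between 0 and 1")
    return r * sum(dtq_table.values()) / len(dtq_table)


def suspicious_nodes(dtq_table, r):
    """Nodes whose DTQ value falls below the group threshold."""
    th = group_threshold(dtq_table, r)
    return sorted(node for node, q in dtq_table.items() if q < th)


def voting_result(votes):
    """V_m = sum_i w_im * q_im * v_im, with v_im = +1 (for) or -1 (against)."""
    return sum(v.weight * v.dtq * (1 if v.in_favour else -1) for v in votes)


# Hypothetical DTQ table held by one node, and votes cast on node "c".
table = {"a": 0.90, "b": 0.80, "c": 0.20, "d": 0.85}
votes_on_c = [Vote(0.9, 0.2, False), Vote(0.5, 0.3, False), Vote(0.2, 0.6, True)]
```

In this example only node "c" falls below the threshold, and the weighted vote on it is negative, so the group would lean towards treating it as compromised.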
A DTQ function is developed to identify compromised nodes quantitatively; it takes many attributes into account, such as energy cost, data transmission quality, and a slack variable. The weight-based voting mechanism improves the reliability of the proposed scheme. However, its dependence on a grouping method and on the assistance of the ACK mechanism adds complexity.
4.2.8. Detection using grouping and statistical distribution
Li et al. (2008a) propose a group-based detection scheme that uses a statistical distribution-based technique to identify anomalies. The entire network is first partitioned into a set of groups, where the nodes within the same group are physically close to each other and their sensed data differ by at most $\delta$. Each $\delta$-group is further partitioned into equal-sized sub-groups. Each node in a sub-group takes charge of monitoring the network, whereas the sub-groups monitor the entire $\delta$-group in turn, in order to reduce energy cost.
The principle of the original $\delta$-grouping algorithm is as follows. Node $i$ waits a random time $T_{random}(i)$ before initiating a grouping message. During this period, if $i$ receives a grouping message from a neighbor and has not yet joined any group, it examines the Euclidean distance $d(f_i, f_{r_j})$ between its sensed data $f_i$ and the sensed data $f_{r_j}$ received from group root $r_j$, as well as the number of hops $hops(i, r_j)$ between them. If $d(f_i, f_{r_j}) \le \delta/2$ and $hops(i, r_j) \le h/2$, $i$ joins group $r_j$. If a sensor node has not joined any group when its waiting time expires, it creates a new group itself. As the original $\delta$-grouping algorithm elects the root of a group at random, the result may fall into extreme situations, such as too many small groups. Accordingly, the refined $\delta$-grouping algorithm is introduced. Two new variables $f_{r_j\_max}$ and $f_{r_j\_min}$ are employed, indicating the maximal and minimal values of the sensed data in group $r_j$; $f_{r_j\_avg}$ is the current average of the sensed data in group $r_j$, and $f_{r_j\_count}$ is the current number of sensor nodes in group $r_j$. Given

$$f'_{r_j\_avg} = \frac{f_{r_j\_avg} \times f_{r_j\_count} + f_i}{f_{r_j\_count} + 1},$$
$$f'_{r_j\_max} = \max(f_{r_j\_max}, f_i),$$
$$f'_{r_j\_min} = \min(f_{r_j\_min}, f_i),$$

if $d(f'_{r_j\_max}, f'_{r_j\_avg}) \le \delta/2$, $d(f'_{r_j\_min}, f'_{r_j\_avg}) \le \delta/2$, and $hops(i, r_j) \le h/2$, $i$ joins group $r_j$.
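The join test of the refined $\delta$-grouping algorithm can be sketched in Python as follows. This is a minimal illustration that assumes scalar sensed data, so that the Euclidean distance reduces to an absolute difference; `GroupState` and `try_join` are hypothetical names, and the message exchange between nodes is omitted.

```python
from dataclasses import dataclass


@dataclass
class GroupState:
    avg: float    # f_rj_avg, current average of the sensed data in the group
    high: float   # f_rj_max, current maximum of the sensed data
    low: float    # f_rj_min, current minimum of the sensed data
    count: int    # f_rj_count, current number of nodes in the group


def try_join(group, f_i, hop_count, delta, h):
    """Return the updated group state if node i may join, otherwise None."""
    new_avg = (group.avg * group.count + f_i) / (group.count + 1)
    new_high = max(group.high, f_i)
    new_low = min(group.low, f_i)
    allowed = (
        abs(new_high - new_avg) <= delta / 2
        and abs(new_low - new_avg) <= delta / 2
        and hop_count <= h / 2
    )
    if not allowed:
        return None
    return GroupState(new_avg, new_high, new_low, group.count + 1)


# Hypothetical group of four nodes with readings around 20.
group = GroupState(avg=20.0, high=21.0, low=19.0, count=4)
```

A node reporting 20.5 at one hop is admitted with `delta=4.0` and `h=4`, whereas a node reporting 30.0, or one that is three hops away, is rejected.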
Detection is carried out in two steps and takes several attributes into consideration, including sensed data, packet sending rate, packet dropping rate, packet mismatch rate, packet receiving rate, and packet sending power. First, if a node detects a deviation, it alerts the other nodes in its sub-group. Second, if more than $n_1$ alert messages coming from the same node or raised against the same node appear in the alert table during a period, promiscuous mode is activated to monitor the suspicious node specifically. The alert value is defined as $n_1 w_1 + n_2 w_2$, where $w_1$ and $w_2$ are the weights of the alert messages coming from the other sensor nodes and from the node itself, respectively, and $n_1$ and $n_2$ are their numbers. If the alert value is above a predefined threshold $\theta$, the suspicious node is confirmed as malicious. Given the attribute vector of node $x_i$,

$$f(x_i) = \{x_1, \ldots, x_n\},$$

forming a sample of a multivariate normal distribution, and given that $f(x_i)$ is distributed as $N_q(\mu, \Sigma)$, a multivariate normal distribution with mean vector $\mu$ and variance-covariance matrix $\Sigma$, the Mahalanobis squared distance

$$(f(x_i) - \mu)^T \Sigma^{-1} (f(x_i) - \mu)$$

is distributed as $\chi^2_q$, where $\chi^2_q$ is the chi-square distribution with $q$ degrees of freedom. Consequently, the probability that $f(x_i)$ satisfies

$$(f(x_i) - \mu)^T \Sigma^{-1} (f(x_i) - \mu) > \chi^2_q(\alpha)$$

is $\alpha$, where $\chi^2_q(\alpha)$ is the upper $(100\alpha)$th percentile of this chi-square distribution. Let $\hat{\mu}$ and $\hat{\Sigma}$ be the estimates of $\mu$ and $\Sigma$, respectively. The probability that $f(x_i)$ satisfies

$$(f(x_i) - \hat{\mu})^T \hat{\Sigma}^{-1} (f(x_i) - \hat{\mu}) > \chi^2_q(\alpha)$$

is expected to be roughly $\alpha$. Given

$$d(x_i) = \left((f(x_i) - \hat{\mu})^T \hat{\Sigma}^{-1} (f(x_i) - \hat{\mu})\right)^{1/2},$$

$x_i$ is regarded as an outlier if $d(x_i)$ or $d^2(x_i)$ is unusually large, for example $d^2(x_i) > \chi^2_q(\alpha)$. Estimating $\mu$ and $\Sigma$ with the simple mean and the simple variance-covariance matrix may suffer from distortion caused by outlying sensor nodes. Instead, the OGK (orthogonalized Gnanadesikan-Kettenring) algorithm is used to estimate $\mu$ and $\Sigma$.
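The outlier test $d^2(x_i) > \chi^2_q(\alpha)$ can be illustrated with a small Python sketch restricted to $q = 2$ attributes. That restriction is an assumption made here so that the example needs only the standard library: for two degrees of freedom the chi-square survival function is $e^{-t/2}$, hence $\chi^2_2(\alpha) = -2\ln\alpha$. The estimates of the mean and covariance are taken as given inputs; the OGK estimator used by the scheme is not implemented, and all function names are hypothetical.

```python
import math


def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance for a 2-D attribute vector."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]].
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def chi2_upper_quantile_2dof(alpha):
    """Upper-alpha quantile of chi-square with 2 degrees of freedom."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return -2.0 * math.log(alpha)


def is_outlier(x, mean, cov, alpha=0.05):
    """Flag x when its squared distance exceeds the chi-square threshold."""
    return mahalanobis_sq_2d(x, mean, cov) > chi2_upper_quantile_2dof(alpha)
```

For higher dimensions the same test applies with a general matrix inverse and a chi-square quantile function from a numerical library.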
By incorporating the group average into the original $\delta$-grouping algorithm, an enhanced algorithm with stronger resilience is obtained. Relying on an interval estimation technique from statistics as well as a weight-based voting mechanism, the scheme is effective. Moreover, the use of several typical network attributes gives the proposed scheme broad detection generality.
4.2.9. Research problems
In flat WSNs, detection schemes based on statistical techniques are the most popular. Taking advantage of a relatively complex training procedure, these schemes can reach stronger detection generality than rule-based schemes. In general, a subset of the nodes is responsible for most of the computation in the data processing procedure. However, this may rapidly drain these working nodes. Therefore, a strategy that spreads the anomaly detection workload evenly across sensor nodes is urgently needed.
Incorporating a majority voting mechanism into the detection scheme is very common, because detection based on statistics alone may generate a high false alarm rate. This is caused by the inherent inaccuracy of the statistical measures themselves as well as by interference from undetected malicious nodes. Liu et al. (2007) deal with this issue through extra false information filtering, but this markedly raises the energy cost. A new scheme that works in collaboration with prevention-based techniques could probably reduce the need for such additional safeguard mechanisms.
The invention of grouping-based schemes (Li et al., 2008a,b) makes it possible to bring techniques as advanced as those used in hierarchical WSNs to flat WSNs. Nevertheless, the weakness of the group heads (elected from normal sensor nodes) impedes progress.
4.3. Graph-based techniques
4.3.1. Detection using routing pattern
Based on the routing pattern, a detection scheme is introduced mainly to defend against sinkhole attacks (Ngai et al., 2006). First, the base station collects network flow information and uses a low-overhead algorithm to identify the attacked area, which contains all the affected nodes. Second, the base station localizes the intruder exactly by modeling the attacked area as a graph according to the routing pattern. Multiple malicious nodes may cooperate to prevent the base station from collecting correct information. Therefore, the scheme is enhanced by establishing a secret key-based security foundation, increasing path redundancy, and adding an algorithm that deals with multiple malicious nodes using hop counts.
In this case, the base station estimates the suspicious area by examining any missing or inconsistent data with

$$f(x_j) = \sqrt{\frac{(x_j - \bar{x})^2}{\bar{x}}},$$

where $x_1, \ldots, x_n$ are the sensed data collected in a sliding window, and $\bar{x}$ is their mean. The base station broadcasts a request message encrypted with its private secret key $K_{BS}$, as $\langle TS, ID_1, ID_2, \ldots, ID_n \rangle_{K_{BS}}$, where $ID_i$ stands for the ID of node $i$, and $TS$ is the time stamp. Each node $v$ receiving this request for the first time should reply to the base station with a message $\langle ID_v, ID_{nexthop}, cost \rangle$ if its ID is listed, where the cost may be hop count, data rate, etc. Afterwards, the base station can reconstruct the routing pattern by building a tree from the next-hop information collected. In total, the base station may obtain more than one tree. Finally, the number of nodes in each tree is computed by a depth-first search. The intruder should be the root of the biggest tree, which attracts most of the network traffic.
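The tree construction and depth-first counting step can be sketched as follows. This is a simplified Python illustration, not the authors' algorithm: `tree_sizes` and `suspected_intruder` are hypothetical names, replies are reduced to a mapping from node ID to next-hop ID, and a root is assumed to be any node that is named as a next hop but reports no next hop of its own.

```python
from collections import defaultdict


def tree_sizes(next_hop):
    """Map each root to the number of nodes in its routing tree."""
    children = defaultdict(list)
    for node, nxt in next_hop.items():
        if nxt is not None:
            children[nxt].append(node)
    # Roots: next hops that sent no reply of their own, or nodes reporting None.
    roots = {n for n in children if next_hop.get(n) is None}
    roots |= {n for n, nxt in next_hop.items() if nxt is None}
    sizes = {}
    for root in roots:
        seen = set()
        stack = [root]  # iterative depth-first search
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(children.get(node, ()))
        sizes[root] = len(seen)
    return sizes


def suspected_intruder(next_hop):
    """Root of the biggest tree, or None if no root can be identified."""
    sizes = tree_sizes(next_hop)
    if not sizes:
        return None
    return max(sorted(sizes), key=sizes.get)


# Hypothetical replies from the attacked area: node ID -> reported next hop.
replies = {2: 1, 3: 1, 4: 2, 5: 2, 8: 7, 9: 8}
```

Here node 1 roots a tree of five nodes and node 7 a tree of three, so node 1 would be reported as the suspected intruder.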
In addition to identifying an intruder, two enhancements have been made to guard against concurrent attacks launched by more than one malicious node: one is to establish a security foundation, and the other is to increase the redundancy of the paths that forward reply messages. The base station then localizes the real intruder by detecting inconsistencies in the hop-count information.
In the extended journal publication (Ngai et al., 2007), a MAC (message authentication code) mechanism is additionally introduced for nodes sending reply messages, because reply messages can be tampered with along the forwarding path if they are not protected. When node $v$ sends a reply message $R$, it actually sends $\langle R, MAC_{K_v}(R) \rangle$ to the base station, where $MAC_{K_v}(R)$ is the MAC computed over message $R$ with secret key $K_v$. Furthermore, a more elaborate evaluation of the scheme's performance is described, covering its detection accuracy, communication cost, and energy cost. According to the simulation experiments, the scheme performs well.
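The pair $\langle R, MAC_{K_v}(R) \rangle$ can be illustrated with Python's standard `hmac` module. The choice of HMAC-SHA256 is an assumption made for this sketch, since the text above does not name a specific MAC algorithm; `protect_reply` and `verify_reply` are hypothetical helper names, and key distribution is out of scope.

```python
import hashlib
import hmac


def protect_reply(reply: bytes, key_v: bytes):
    """Node v: return the pair (R, MAC_Kv(R)) to be sent to the base station."""
    tag = hmac.new(key_v, reply, hashlib.sha256).digest()
    return reply, tag


def verify_reply(reply: bytes, tag: bytes, key_v: bytes) -> bool:
    """Base station: recompute the MAC with K_v and compare in constant time."""
    expected = hmac.new(key_v, reply, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```

A reply altered by a forwarding node no longer matches its tag, so the base station can discard it before building the routing trees.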
This scheme uses routing patterns to detect sinkhole attacks in WSNs. An intruder can be identified by detecting inconsistencies in the hop-count information. The security of the scheme is enhanced by a message authentication code mechanism. In addition, graph-based techniques would normally be resource-intensive, but having the base station carry out the computation overcomes this problem.
4.3.2. Research problems
Graph-based techniques are currently not very popular for anomaly detection in WSNs, although the architecture of a WSN is naturally suited to being modeled as a graph. Establishing a graph commonly requires the following: (A) the participation of the routing protocol, (B) the assignment of a global identifier to each node, and (C) protection by a secret key management mechanism; these requirements diminish the applicability of graph-based techniques. With the support of a specifically designed routing protocol and security foundation, graph-based detection schemes could realistically become more useful.
4.4. Data mining and computational intelligence-based techniques
4.4.1. Detection using rule learner
Yu and Tsai (2008) develop a detection scheme based on association rule learning. Each sensor node is equipped with an intrusion detection agent (IDA), which consists of a local intrusion detection component (LIDC) that monitors its host node and a packet-based intrusion detection component (PIDC) that identifies malicious nodes from the communication activities of its neighbors.
A machine learning algorithm called SLIPPER is applied in this scheme; it comprises multiple binary classifiers, each of which contains a set of rules. $C_R$ is the prediction confidence of rule $R$ in a binary classifier, and it is fixed during the training phase. In a binary classifier, the final hypothesis sums the confidence values of the rules according to

$$H(x) = \mathrm{sign}\left(\sum_{R_t : x \in R_t} C_{R_t}\right),$$

where the sign gives the predicted class label and the magnitude of the sum gives the degree of prediction confidence (PC). Because there are multiple binary classifiers, a prediction confidence ratio (PCR) based arbitration strategy is used to make the final decision. PCR is defined as

$$PCR = \frac{PC}{\max\{PC_1, PC_2, \ldots, PC_m\}},$$

where $PC$ is the prediction confidence for a data item in the test dataset, and $PC_i$ is the prediction confidence for the $i$th data item in the training dataset. The final decision is determined by

$$i = \{\, j \mid PCR_j = \max\{PCR_1, PCR_2, \ldots, PCR_n\} \,\},$$

where $PCR_j$ is the prediction confidence ratio of the $j$th binary classifier, $i$ is the index of the binary classifier selected as the final result, and $n$ is the total number of binary classifiers. In addition, the specific conditions employed separately by different rules may have the same implication; a strategy is therefore introduced to optimize the rule evaluation procedure. Finally, a model tuning algorithm is proposed to prevent false alarms caused by inappropriately configured confidence values. In particular, when a classifier generates a false positive prediction, the confidence values of the positive rules involved should be decreased; conversely, when a classifier generates a false negative prediction, the confidence values of the positive rules involved should be increased.

Table 4
Horizontal evaluation of statistical technique category.

| Scheme | D3 | MGDD | Chi | Stat M | DTQ |
|---|---|---|---|---|---|
| ACC | ≈ 94% | ≈ 94% | ≥ 90% | ≤ 93.3% | ≥ 93% |
| FAR | ≈ 2% | ≈ 1% | ≤ 5% | ≤ 0.032% | ≤ 2% |
| GENE | 1 | 1 | 6 | 3 | 6 |
| CC | $O(d\lvert R\rvert)$ | $O\left(\frac{d\lvert R\rvert}{2\alpha r}\right)$ | $O(mn)$ | $O(n)$ | $O(N_{DTQ}) + O(N_{vote})$ |
| MEM | $O\left(d\lvert R\rvert + \frac{1}{\epsilon^2}\log\lvert W\rvert\right)$ | $O(d\lvert R\rvert)$ | $O(nw)$ | $O(mn)$ | $O(N_{DTQ}\,n)$ |
Few schemes built on data mining and computational intelligence algorithms can be found for flat WSNs. This scheme makes it possible by equipping sensor nodes with an IDA, which does not bring additional energy cost but does incur considerably higher device cost. A smart rule evaluation strategy and a model tuning algorithm are also introduced, bringing lower energy overhead and stronger robustness to the scheme.
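The rule-summing hypothesis and the PCR-based arbitration of Section 4.4.1 can be sketched in Python as follows. This is an illustrative reading rather than the SLIPPER implementation: rules are modeled as hypothetical (predicate, confidence) pairs, a zero sum is assumed to map to the negative class, and the denominator of the PCR is passed in as the largest PC each classifier produced on its training data.

```python
def classify(rules, x):
    """Return (label, PC): the sign and magnitude of the summed confidences."""
    total = sum(conf for matches, conf in rules if matches(x))
    label = 1 if total > 0 else -1  # assumption: a zero sum counts as negative
    return label, abs(total)


def arbitrate(test_pcs, training_max_pcs):
    """Return (index of the winning classifier, list of PCR values)."""
    pcrs = [pc / top for pc, top in zip(test_pcs, training_max_pcs)]
    best = max(range(len(pcrs)), key=pcrs.__getitem__)
    return best, pcrs


# Hypothetical rule set of one binary classifier over packet statistics.
rules = [
    (lambda x: x["drop_rate"] > 0.3, 1.5),
    (lambda x: x["send_rate"] > 100, 0.8),
    (lambda x: True, -1.0),  # default rule that covers every example
]
```

Dividing by the training maximum puts classifiers with differently scaled confidence values on a comparable footing before the largest ratio is selected.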
4.4.2. Research problems
Researchers are generally reluctant to develop detection schemes with data mining and computational intelligence algorithms, because a flat WSN can hardly accommodate them without a powerful head node to carry out central functions. The introduction of the IDA offers an opportunity, although at a much higher cost for the device upgrade. Similarly, the grouping algorithms mentioned in Section 4.2 can produce the same effect. Research on applying advanced techniques to anomaly detection in flat WSNs is encouraged to continue, because flat WSN-based applications may well need a detection service with good generality.

5. Analysis and comparison


The advantages and disadvantages of each detection scheme have been discussed individually above. It is also meaningful to analyze and compare these schemes horizontally and vertically. To this end, a number of representative cases are selected from the three most popular technique categories: rule-based, data mining and computational intelligence, and statistical techniques. Although an evaluation standard for the performance of a detection scheme has been suggested in Section 2.1, it may not be suitable for evaluating most existing schemes, as it contains unfulfilled and idealized elements. In this section, each case is evaluated against a set of traditional criteria: detection accuracy, detection generality, false alarm rate, and detection speed. It is reasonable to assume that the detection speed is reflected by the computation complexity and memory use. Accordingly, each representative case's computation complexity and memory use are evaluated, instead of reporting its detection speed quantitatively. A short comment may also be given for a case to discuss its potential limits, such as the need for prior knowledge.

Table 5
Horizontal evaluation of data mining and computational intelligence technique category.

| Scheme | Clustering | SVM | SLIPPER |
|---|---|---|---|
| ACC | N/A | ≈ 100% | N/A |
| FAR | 4% | ≤ 10% | N/A |
| GENE | 1 | 1 | Many |
| CC | $O(mN_c)$ | $O(\sqrt{n}L) + O(l) + O(n)$ | $\le O(n \log n)$ |
| MEM | $O(N_c)$ | $O(n)$ | N/A |
5.1. Statistical techniques
5.1.1. Analysis and comparison
The statistical techniques used for anomaly detection in WSNs can be classified as statistical distribution, statistical measures, and statistical model. The kernel density estimator-based schemes (Palpanas et al., 2003; Subramaniam et al., 2006) are chosen to represent the statistical distribution category. Two examples of the statistical measures category are given: one makes use of classic statistical measures (mean and standard deviation) (Onat and Miri, 2005a), while the other defines its own measure (self-defined measure) for identifying anomalies (Li et al., 2008b). The statistical model category is omitted because it is rarely used. Table 4 shows the results in brief, where the number of security issues roughly covered is used as the measure of detection generality; ACC, FAR, GENE, CC, and MEM denote detection accuracy, false alarm rate, detection generality, computation complexity, and memory use, respectively (the same applies in Tables 5 and 6); and the following abbreviated names are used: D3 (Subramaniam et al., 2006), MGDD (Subramaniam et al., 2006), Chi (Liu et al., 2007), Stat M (Onat and Miri, 2005a), and DTQ (Li et al., 2008b).
Two algorithms, D3 (distributed detection of distance-based outliers) and MGDD (outlier detection using multi-granular local metrics), are developed by Subramaniam et al. (2006). Overall, D3 achieves detection accuracy of up to 94% with a false alarm rate of only 2%. MGDD likewise achieves detection accuracy of around 94%, with a false alarm rate of 1%. The computation complexity of D3 is $O(d|R|)$, and the memory use of each sensor node is $O(d|R| + (1/\epsilon^2)\log|W|)$, where $d$ is the data dimensionality, $|R|$ is the sample size, $\epsilon$ is the maximum error of the standard deviation, and $|W|$ is the size of the sliding window. MGDD requires $O(d|R|/(2\alpha r))$ computation complexity and $O(d|R|)$ memory, where $r$ stands for the sampling neighborhood and $\alpha r$ is the counting neighborhood. By modeling the sensed data with a kernel density estimator, this scheme is effective against malicious nodes injecting false information into the network, as well as against random failures. Consequently, it is also feasible for dealing with other issues, such as faulty sensor discovery and online query processing. However, its success depends strongly on the accuracy of the model, in particular on the accuracy of parameters such as the bandwidth of the kernel function.
The interval estimation-based scheme (Liu et al., 2007) performs detection with an accuracy higher than 90% and a false alarm rate generally below 5%. Its computation complexity is the sum of the costs of false information filtering, outlier detection, and majority voting, which is $O(mn)$ with $m \gg n \gg q$, where $m$ is the number of nodes in neighborhood Nn, $n$ is the number of nodes in neighborhood N, and $q$ is the data dimensionality. The memory use is $O(nw)$, where $w$ is the size of the sliding window. Moreover, the voting procedure generates $O(n) + O(m)$ communication overhead. In this scheme, the set of attributes comprises packet dropping rate, packet sending rate, forwarding delay time, and sensed data. Therefore, it can defend against a range of attacks: selective forwarding, black-hole, power consumption, DoS (Denial of Service), message alteration, false information, etc. A dataset built from such a variety of attributes does not necessarily satisfy the assumption of a chi-square distribution. As a result, performance degrades if the distribution underlying the dataset does not fit the chi-square distribution.
Classic statistical measures (such as mean, variance, standard deviation, and covariance) are well capable of revealing the intrinsic properties of a dataset. For example, mean and standard deviation are used to build a normal profile representing the packet arrival process (Onat and Miri, 2005a). In an ideal situation where all the parameters are optimized, the detection accuracy can reach 93.3%, with a false alarm rate of around 0.032%. The computation complexity is $O(n)$, while the memory use is $O(mn)$, where $n$ is the number of nodes in the monitoring area, and $m$ is the length of buffer $N_1$. The packet arrival process partly relates to the packet sending rate of a node as well as to the packet dropping rate; thus, this scheme may be effective against power consumption, selective forwarding, and black-hole attacks.
The example of a self-defined statistical measure is the DTQ (data transmission quality) based scheme (Li et al., 2008b). Once the parameters are well tuned, this scheme generally achieves detection accuracy higher than 93% with a false alarm rate below 2%. Each sensor node maintains $N_{DTQ}$ DTQ values for its group neighbors. The computation complexity is $O(N_{DTQ}) + O(N_{vote})$, where $O(N_{vote})$ is generated by the voting procedure, and $N_{vote}$ is the number of sensor nodes participating in voting. Its memory use depends on the number of nodes on the transmission path and is $O(N_{DTQ}\,n)$, where $n$ is the length of the buffer. The DTQ function jointly takes packet sending power, packet dropping rate, forwarding delay time, etc. into account; therefore, this scheme is able to defend against attacks including DoS, black-hole, wormhole, message alteration, power consumption, selective forwarding, etc.

Table 6
Horizontal evaluation of rule-based technique category.

| Scheme | Wide | Pac Drop |
|---|---|---|
| ACC | > 80% | N/A |
| FAR | < 10% | < 5% |
| GENE | Many | 2 |
| CC | N/A | $O(n)$ |
| MEM | N/A | $O(nw)$ |

5.1.2. Conclusion
After reviewing these schemes, we note that D3 and MGDD (Subramaniam et al., 2006) operate in hierarchical WSNs, whereas Chi (Liu et al., 2007), Stat M (Onat and Miri, 2005a), and DTQ (Li et al., 2008b) are developed for flat WSNs. Stat M seemingly demands the least computation and memory at each node. However, many nodes are actively monitoring at the same time, so overall it is probably the most resource-expensive scheme of all. Communication cost is a significant factor in detection speed, but it is hard to measure because the exchange of detection-related information does not necessarily occur explicitly. Although fewer nodes work simultaneously in the Chi and DTQ schemes, Chi actually generates considerable communication cost, because it has to collect lengthy data from each of its neighbor nodes. D3 and MGDD seem to show the most stable performance while consuming about the same resources as the other schemes. This is because their distributed operation spreads the computation over the entire network and thereby reduces the communication cost; moreover, the cluster head takes on the primary computing tasks, relieving the computing pressure on common sensor nodes. Furthermore, we can conclude that stronger detection generality is commonly achieved by taking advantage of more attributes.
5.2. Data mining and computational intelligence
5.2.1. Analysis and comparison
This survey provides a few examples for the data mining and computational intelligence category, most of which are based on hierarchical WSNs. In this sub-section, a fixed-width clustering-based scheme (Rajasegarar et al., 2006) represents the clustering algorithms of data mining. The second example is SVM (Rajasegarar et al., 2007). In addition, the scheme based on an association rule learning algorithm is cited (Yu and Tsai, 2008). Table 5 shows the comparison, where the abbreviated names are Clustering (Rajasegarar et al., 2006), SVM (Rajasegarar et al., 2007), and SLIPPER (Yu and Tsai, 2008).
The fixed-width clustering-based scheme (Rajasegarar et al., 2006) stands out for its economical resource use because it operates in a distributed manner. Compared with its centralized counterpart, the average false alarm rates are comparable (4% in the distributed case and 3% in the centralized case), as are the detection accuracies. The key point is that the scheme reduces communication overhead by up to 98%. The computation complexity is the sum of the costs of data normalization, clustering, merging, and detection, which is $O(mN_c)$, where $m$ is the number of measurements during a time window, and $N_c$ ($N_c \ll m$) is the number of clusters. In addition, each sensor node uses $O(N_c)$ memory. The sensed data, consisting of humidity, temperature, and pressure, are used as the attribute set for detection. Hence, this scheme is expected to be effective against false information attacks and the majority of random failures.
Similarly, the quarter-sphere SVM-based detection scheme (Rajasegarar et al., 2007) operates in a distributed manner. Given an RBF kernel and optimized parameters $\nu$ and $\sigma$, this scheme can reach approximately 100% detection accuracy, with a false alarm rate below 10%. Moreover, there is a 357-fold reduction in communication overhead compared with its centralized counterpart. Solving the linear optimization problem of the SVM requires a polynomial-time algorithm such as an interior point method, which incurs $O(n^3)$ arithmetic operations and $O(\sqrt{n}L)$ iterations, where $n$ is the number of variables and $L$ is the size of the optimization problem. Computing the global normal profile has $O(l)$ complexity, where $l$ is the number of common sensor nodes. The detection procedure has $O(n)$ complexity. In total, the computation complexity is $O(\sqrt{n}L) + O(l) + O(n)$. The memory use of each node is $O(n)$, where $n$ is the number of sensed data used for detection. Information is exchanged twice between the cluster head and the common sensor nodes, resulting in little communication cost. The detection attribute set is made up of light, temperature, pressure, and humidity, with which this scheme is able to identify false information attacks and random failures.
Building upon a rule learning algorithm called SLIPPER, Yu and Tsai (2008) in fact propose a detection framework rather than a specific scheme. The performance of this framework may vary depending on which detection attributes and detection rules are used in practice. However, SLIPPER has a computation complexity of at most $O(n \log n)$, where $n$ is the number of examples.

5.2.2. Conclusion
Data mining and computational intelligence-based detection schemes are similar in many respects: they are all built for hierarchical WSNs, except the one that relies on the IDA (Yu and Tsai, 2008); they do not depend on any prior knowledge; they tend to operate in a distributed manner, as these schemes are often complicated; and they prefer a multivariate attribute set in order to achieve good detection generality. Thus, the differences between these schemes stem mainly from the choice of attribute set and any additional strategies.

5.3. Rule-based detection


5.3.1. Analysis and comparison
Rule-based detection schemes are highly restricted by their dependence on prior knowledge regarding anomaly detection. In this sub-section, two examples are given (Silva et al., 2005; Ioannis et al., 2007), both of which rely on prior experience of genuine network behavior. The results are shown in Table 6, where the examples are abbreviated as Wide (Silva et al., 2005) and Pac Drop (Ioannis et al., 2007).
A flexible rule-based detection scheme is proposed by Silva et al. (2005), in which a wide range of rules is available for a variety of application scenarios. Able to counter attacks including wormhole, message alteration, black-hole, DoS (jamming), and replay, this scheme maintains a detection accuracy of no less than 85%, except for delay attacks, where it is 80%. Its false alarm rate is generally below 10%. The detection speed is tied to the positions of sensor nodes in the routing tree, which varies with the network topology and communication protocol. Since many rules are available, this scheme is effective against many security issues if appropriate rules are running. However, without a security expert to advise on the appropriate use of rules, the scheme is of little practical use.
Another rule-based detection scheme focuses on the packet dropping rate (Ioannis et al., 2007). The simulation results demonstrate sound detection accuracy with a false alarm rate of no more than 5%. The computation complexity arises mainly from the rule matching procedure and is $O(n)$, where $n$ is the number of nodes in the monitoring area. Each node uses $O(nw)$ memory, where $w$ is the size of the sliding window. Moreover, the voting procedure generates little communication overhead. Because only two rules about the packet dropping rate are employed, its scope is limited to black-hole and selective forwarding attacks. A security expert is also important for this scheme, as users are usually unfamiliar with setting the threshold.
5.3.2. Conclusion
The number of attributes and their corresponding rules plays a key role in the detection speed of a scheme as well as in its detection generality. Without a training procedure, a set of rules functions directly as the normal profile. This leads not only to fast detection but also to a strong reliance on how the normal profile and threshold are set with the available rules. As a result, either a reasonable assumption or a security expert who can advise on these settings from experience is required.
5.4. Summary
Because the environment, dataset, focus, scale, etc., of the experiments differ completely from scheme to scheme, detection accuracy and false alarm rate may not reflect realistic performance. Therefore, the detection technique categories are examined vertically without these two factors. Table 7 shows the panoramic comparison of evaluations on the popular techniques.

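Table 7
Panoramic comparison of evaluations on popular detection techniques.

| Techniques | ACC | CC | Remark |
|---|---|---|---|
| D3 | ≈ 94% | $O(d\lvert R\rvert)$ | Either parametric or non-parametric statistical techniques, balanced performance and complexity |
| MGDD | ≈ 94% | $O\left(\frac{d\lvert R\rvert}{2\alpha r}\right)$ | Either parametric or non-parametric statistical techniques, balanced performance and complexity |
| Chi | ≥ 90% | $O(mn)$ | Either parametric or non-parametric statistical techniques, balanced performance and complexity |
| Stat M | ≤ 93.3% | $O(n)$ | Either parametric or non-parametric statistical techniques, balanced performance and complexity |
| DTQ | ≥ 93% | $O(N_{DTQ}) + O(N_{vote})$ | Either parametric or non-parametric statistical techniques, balanced performance and complexity |
| Clustering | N/A | $O(mN_c)$ | Basically based on machine learning, good performance but complex |
| SVM | ≈ 100% | $O(\sqrt{n}L) + O(l) + O(n)$ | Basically based on machine learning, good performance but complex |
| SLIPPER | N/A | $\le O(n \log n)$ | Basically based on machine learning, good performance but complex |
| Wide | > 80% | N/A | Simple and fast rule-based techniques, performance is limited |
| Pac Drop | N/A | $O(n)$ | Simple and fast rule-based techniques, performance is limited |

ACC: detection accuracy; CC: computation complexity; FAR: false alarm rate; GENE: detection generality.
D3: distributed detection of distance-based outliers; MGDD: outlier detection using multi-granular local metrics.
Chi: chi-square distribution-based detection; Stat M: statistical measure; DTQ: data transmission quality.
SVM: support vector machine; SLIPPER: rule-learning algorithm.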


Table 8
Vertical evaluation of technique categories.

| Tech. category | Generality | Speed | Distributed | Prior knowledge |
|---|---|---|---|---|
| Statistical techniques | Normal | Normal | Possible | Assumption |
| DM/CI | High | Low | Necessary | Not |
| Rule | Low | High | Not | Assumption, experience |

The vertical evaluation on the three technique categories is


illustrated in Table 8, where DM/CI stands for data mining/
computational intelligence.
Overwhelmingly, data mining and computational intelligence-based
schemes have the strongest detection generality, as long as
adequate attributes are in use. Their formidable capability of
dealing with multi-dimensional data makes this realistic. Of
course, this capability comes with high complexity. Fortunately,
distributed detection may be implemented with the help of a WSN's
hierarchical architecture, which eventually cuts the complexity
down to the level of the relatively advanced statistical
technique-based schemes. This kind of scheme is also characterized
by great flexibility, as it never depends on any prior knowledge.
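To make the role of the hierarchical architecture concrete, the following is a minimal sketch, not taken from any surveyed scheme. It assumes that each cluster head compresses its local readings into a fixed-size summary (count, mean, sum of squared deviations) and that a parent node merges these summaries instead of collecting raw data. The names Summary, summarize, merge and is_anomalous, and the 3-sigma rule, are illustrative assumptions; a real DM/CI scheme would exchange a richer model than these three numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    """Fixed-size description of a set of readings (hypothetical format)."""
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean


def summarize(readings):
    """Run at a cluster head; assumes a non-empty list of readings."""
    count = len(readings)
    mean = sum(readings) / count
    m2 = sum((x - mean) ** 2 for x in readings)
    return Summary(count, mean, m2)


def merge(a, b):
    """Run at the parent node: combine two summaries without raw data."""
    count = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / count
    m2 = a.m2 + b.m2 + delta ** 2 * a.count * b.count / count
    return Summary(count, mean, m2)


def is_anomalous(value, summary, k=3.0):
    """Assumed rule: flag values more than k standard deviations away."""
    std = math.sqrt(summary.m2 / summary.count)
    return abs(value - summary.mean) > k * std


cluster_a = [20.0, 21.0, 19.0, 20.0]
cluster_b = [22.0, 21.0, 20.0]
network_view = merge(summarize(cluster_a), summarize(cluster_b))
```

Each cluster head transmits three numbers regardless of how many readings it holds, which is the sense in which the hierarchy can reduce the cost borne by any single node.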
Apparently, the computation complexity and memory use of
rule-based detection schemes are the lowest, indicating the fastest
detection speed. However, they suffer from the weakest detection
generality, since they are not equipped to handle
multi-dimensional data. Inserting new rules that cover more
detection attributes into the rule set is the only way to push the
detection generality up, which results in a linear increase of the
complexity. What is worse, the establishment of these schemes
often demands some prior knowledge regarding anomaly detection,
either assumptions or experience.
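The linear growth can be seen in a minimal sketch of a rule set, given here only as an illustration. The attribute names (dropped_ratio, packets_per_s, retransmissions) and all thresholds are hypothetical and would in practice come from the prior knowledge mentioned above.

```python
from typing import Callable, Dict, List, Tuple

Observation = Dict[str, float]
Rule = Tuple[str, Callable[[Observation], bool]]


def check(observation: Observation, rules: List[Rule]) -> List[str]:
    """Return the names of violated rules; one evaluation per rule."""
    return [name for name, violated in rules if violated(observation)]


# Hypothetical rules; each covers exactly one attribute.
rules: List[Rule] = [
    ("packet_drop", lambda o: o["dropped_ratio"] > 0.2),
    ("send_rate", lambda o: o["packets_per_s"] > 50),
]

# Covering one more attribute means appending one more rule, which
# adds one more evaluation to every call of check().
rules.append(("retransmissions", lambda o: o["retransmissions"] > 5))
```

Every observation is tested against every rule, so the per-observation cost grows in proportion to the number of rules, and an attack that touches no covered attribute passes unnoticed.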
The performance of statistical technique-based schemes
essentially stands in the middle. These schemes can be deployed in
any WSN. With a rational assumption, some statistical
distributions or models can be used for measuring the networking
behaviors so as to identify anomalies. Statistical techniques can
tackle multi-dimensional data, but the complexity would climb
dramatically. Distributed anomaly detection is also feasible for
this kind of scheme.

6. Potential research areas and conclusion

Anomaly detection has received much attention in recent years, as
a result of its outstanding contribution to securing WSNs. The
increasingly complicated application scenarios and risky
adversaries, however, force us to keep this research going
forward. According to the papers surveyed above, a number of
potential research areas are suggested as follows.
6.1. Potential research areas
6.1.1. Modeling the problem of anomaly detection in WSN
Anomaly detection is well studied in TCP/IP networks and ad hoc
networks, but new problems arise once anomaly detection enters a
WSN. In fact, anomaly detection in WSNs relates to many factors,
such as the sensor nodes' operating system, sensor nodes'
capabilities, sensor nodes' deployment, routing protocols, network
architectures, security foundation, natural environment, etc. We
believe a formal model is necessary to describe the problem of
anomaly detection in WSNs. Based on this model, the research can
proceed within a general-purpose framework. For example, a classic
model (Denning, 1987) was put forward for an intrusion detection
expert system, which is independent of any particular system,
application environment, system vulnerability, or type of security
issue.
6.1.2. Attribute selection
The significance of attribute selection has been stated in
Section 2.4. In an ideal situation, the interrelationship between
attributes over the network and each potential security issue can
be fully figured out. But this situation is impractical, since the
interrelationship will not firmly follow a one-to-one pattern: one
malicious activity may be disclosed by several attributes, and
more than one malicious activity may affect the same attribute. In
a word, this interrelationship is fuzzy. We suggest this problem
could be addressed in three steps. First, examine popular security
issues against WSNs. Second, find out the atomic attributes over
the network based on the established anomaly detection model,
where the immediate attributes provided by the model are referred
to as atomic attributes. For example, packet arrival interval
(Onat and Miri, 2005a) is just a derived attribute calculated as
the difference between packet arrival times, where the packet
arrival time is an atomic attribute. Third, explore the
interrelationship with a variety of advanced techniques, such as
support vector data description (SVDD) (Kloft et al., 2008),
Bayesian networks (BN), classification and regression trees (CART)
(Chebrolu et al., 2005), and fuzzy and rough sets (Jensen and
Shen, 2009).
6.1.3. Distributed anomaly detection in WSN
Distributed detection is not new to traditional networks (Huang
et al., 2006; Cabrera et al., 2008), but rarely appears in WSNs.
In this paper, a couple of distributed anomaly detection schemes
are reviewed, in which using time-consuming algorithms for
detection in WSNs becomes feasible when they are run in a
distributed manner. Most of the time, we prefer a scheme with
sound intelligence at a slightly higher cost in speed, rather than
one in which everything gives way to efficiency. As a result,
advanced detection techniques run in a distributed manner are
absolutely promising. Currently a number of data mining algorithms
and statistical techniques have been successfully applied to this
issue. One caveat is that not every detection algorithm can be
restructured in a distributed manner; moreover, not every
distributed version of a detection algorithm is better than its
original. This point urgently needs to be investigated.

6.1.4. Advanced strategy
A few strategies have been incorporated into detection schemes in
WSNs. The GA-based scheme (Rahul et al., 2009), which introduces a
strategy to optimally arrange detection services for WSNs, is one
of the most typical cases. Another is a smart strategy derived
from game theory (Agah et al., 2004a), which is designed to
identify the vulnerable areas in a WSN. Besides, a lot of schemes
take advantage of majority vote or similar strategies to make the
final decision. Moreover, probability-based strategies are
sometimes suitable for reducing the frequency of operation or
communication (Subramaniam et al., 2006; Pires et al., 2004). In
conclusion, the intelligent use of strategies can make up for the
flaws existing in current schemes. This problem is worth further
study.
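Two of the strategies named above, majority vote and probability-based reduction of operations, can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure of any cited scheme: the function names majority_vote and should_check are hypothetical, a strict majority is assumed, and the check probability is a free parameter.

```python
import random


def majority_vote(votes):
    """True if strictly more than half of the voters reported an anomaly."""
    return sum(votes) * 2 > len(votes)


def should_check(probability, rng):
    """Run the costly check in this round only with the given probability."""
    return rng.random() < probability


rng = random.Random(0)  # seeded only to make the sketch repeatable
neighbor_votes = [True, True, False]
if should_check(1.0, rng):
    decision = majority_vote(neighbor_votes)
```

Lowering the probability trades detection latency for fewer operations and messages, which is the motivation given for such strategies in resource-constrained nodes.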
6.1.5. Cooperation with prevention-based technique
The original intention of the research on anomaly detection in
WSNs is to work together with prevention-based techniques,
enabling WSNs to be much more secure and reliable. However, few
schemes emphasize this point, except a couple of schemes which
would be exposed to malicious behaviors without the protection of
secret keys (Yu and Xiao, 2006; Ho et al., 2009). In fact, a WSN
should be equipped with a security foundation for the majority of
safety-critical applications. There is, however, only one case
that really links detection with prevention, in which the pairwise
keys established between two nodes are used for localizing the
compromised nodes. Consequently, we suggest paying more attention
to the cooperation between the detection service and the security
foundation.
6.1.6. Survivable and tolerable anomaly detection
This is another old problem derived from traditional intrusion
detection (Yu and Frincke, 2004; Frinckea et al., 2006). As a result
of the resource constraints of WSNs, simpler detection algorithms
are more likely to be employed, allowing an adversary to break
through the detection service itself before initiating a thorough
attack. For example, an adversary can find out the threshold easily
by analyzing the pattern of network traffic and alarms, if the
detection is based on measuring the statistical mean/variance of
the network traffic. To date, this issue remains open in WSNs. We
have suggested a conceptual outline for inventing survivable and
tolerable anomaly detection in WSNs in Section 2.1, where
survivability implies that the detection scheme is equipped with
an anti-analysis capability (Deng et al., 2004); in other words,
it perturbs its own detection algorithm. Tolerability, on the
other hand, indicates that a detection service will be recovered
or re-enabled immediately after it fails under external
disturbance.
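The weakness described above can be illustrated with a small simulation. It is a sketch under strong assumptions, not a model of any surveyed scheme: the detector raises an alarm when the traffic rate exceeds mean + k * standard deviation, its alarm response is observable, and the threshold is fixed. The names make_detector and estimate_threshold and the value k = 3 are hypothetical.

```python
import statistics


def make_detector(training_rates, k=3.0):
    """Build a mean/variance detector with a fixed threshold (assumed form)."""
    mean = statistics.mean(training_rates)
    std = statistics.pstdev(training_rates)
    threshold = mean + k * std

    def raises_alarm(rate):
        return rate > threshold

    return raises_alarm, threshold


def estimate_threshold(raises_alarm, low, high, tolerance=0.01):
    """Bisect on observed alarms; assumes no alarm at low and an alarm at high."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if raises_alarm(mid):
            high = mid
        else:
            low = mid
    return (low + high) / 2


raises_alarm, threshold = make_detector([10, 12, 11, 9, 10, 11, 12, 10])
estimate = estimate_threshold(raises_alarm, 0.0, 100.0)
```

Because the alarm is a deterministic function of the rate, each probe halves the uncertainty, so the number of probes grows only logarithmically with the search range. A survivable scheme in the sense used here would have to break this determinism, for example by perturbing its own threshold.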
6.1.7. Uniform performance evaluation standard
Currently, the performance metrics of intrusion detection are still
in use, which are mainly composed of detection accuracy, false alarm
rate, and complexity. But we believe these metrics are far from
sufficient to evaluate the next generation of anomaly detection in
WSNs. First, a detection scheme is always expected to be capable of
addressing a wider range of security issues at a comparable cost;
therefore, detection generality should be added to the new
performance metrics. Second, energy cost must be taken into account,
as energy is the most precious resource in WSNs. Finally, a metric
has to be involved to evaluate the security and robustness of
detection schemes themselves, for which we tentatively define
survivability and tolerability.
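For reference, the two classic metrics can be computed from labeled outcomes as in the sketch below. The definitions are assumptions made for illustration: detection accuracy is taken as the fraction of true anomalies that raised an alarm, and false alarm rate as the fraction of normal events that raised an alarm. The names Evaluation and evaluate are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    detection_accuracy: float  # assumed: detected anomalies / all anomalies
    false_alarm_rate: float    # assumed: false alarms / all normal events


def evaluate(truth, alarms):
    """Compare ground-truth labels with alarms, both lists of booleans."""
    if len(truth) != len(alarms):
        raise ValueError("truth and alarms must have the same length")
    anomalies = sum(truth)
    normals = len(truth) - anomalies
    detected = sum(1 for t, a in zip(truth, alarms) if t and a)
    false_alarms = sum(1 for t, a in zip(truth, alarms) if a and not t)
    # A ratio with an empty denominator is reported as 0.0 in this sketch.
    accuracy = detected / anomalies if anomalies else 0.0
    far = false_alarms / normals if normals else 0.0
    return Evaluation(accuracy, far)


truth = [True, True, False, False, False, True]
alarms = [True, False, True, False, False, True]
result = evaluate(truth, alarms)
```

Neither number says how many kinds of attack a scheme covers, how much energy it consumes, or how well it resists analysis, which is why the additional metrics are proposed.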
6.2. Conclusion
Research on anomaly detection in WSNs has recently attracted
increasing attention, whereas a survey paper that systematically
details the up-to-date anomaly detection techniques in WSNs and
suggests a number of significant research problems is not yet
available. In this paper, we have first presented the key design
principles of anomaly detection in WSNs. Then, many typical
examples are introduced according to WSN architectures and
their technique categories. Moreover, a few examples are
picked from the three most popular technique categories to carry
out a particular comparison and analysis. Finally, the potential
research areas in the near future are suggested.
References
Agah A, et al. A non-cooperative game approach for intrusion detection in sensor
networks. Presented at the IEEE 60th vehicular technology conference,
September 2004.
Agah A, et al. Intrusion detection in sensor networks: a non-cooperative game
approach. Presented at the 3rd IEEE international symposium on network
computing and applications, 2004.
Akyildiz IF, et al. A survey on sensor networks. IEEE Communications Magazine
2002;40(August):102–114.

Anderson JP. Computer security threat monitoring and surveillance. Fort Washington, Pennsylvania: James P Anderson Co; April 1980.
Axelsson S. Research in intrusion-detection systems: a survey; December 1998.
Cabrera JoBD, et al. Ensemble methods for anomaly detection and distributed intrusion detection in mobile ad-hoc networks. Information Fusion
2008;9(January).
Chandola V, et al. Anomaly detection: a survey. ACM Computing Surveys
2009;41(July).
Chebrolu S, et al. Feature deduction and ensemble design of intrusion detection
systems. Computers & Security 2005;24(June):295–307.
Curiac D-I, et al. Malicious node detection in wireless sensor networks using an
autoregression technique. Presented at the 3rd international conference on
networking and services, June 2007.
Dallas D, et al. Hop-count monitoring: detecting sinkhole attacks in wireless
sensor networks. Presented at the 15th IEEE international conference on
networks, 2007.
Deng J, et al. Intrusion tolerance and anti-traffic analysis strategies for wireless
sensor networks. Presented at the 2004 international conference on dependable systems and networks, July 2004.
Denning DE. An intrusion-detection model. IEEE Transactions on Software
Engineering 1987;SE-13:222–32.
Frinckea D, et al. From intrusion detection to self-protection. Computer Networks
2006;11(November):1233–8.
Han S, et al. Taxonomy of attacks on wireless sensor networks. Presented at the 1st
European conference on computer network defence, 2005.
Hodge VJ, Austin J. A survey of outlier detection methodologies. Artificial Intelligence Review 2004;22:85–126.
Ho J-W, et al. Distributed detection of replica node attacks with group
deployment knowledge in wireless sensor networks. Ad Hoc Networks
2009;7(November):1476–88.
Hu J. Host-based anomaly IDS. In: Springer handbook of information and communication security. Springer Verlag; 2010.
Huang L, et al. Distributed PCA and network anomaly detection; July 2006.
Ioannis K, et al. Towards intrusion detection in wireless sensor networks.
Presented at the 13th European wireless conference, April 2007.
Jensen R, Shen Q. New approaches to fuzzy-rough feature selection. IEEE Transactions on Fuzzy Systems 2009;17(August):824–38.
Kloft M, et al. Automatic feature selection for anomaly detection. Presented at the
1st ACM workshop on AISec, 2008.
Li G, et al. Group-based intrusion detection system in wireless sensor networks.
Computer Communications 2008a;31(December):4324–32.
Li T, et al. Compromised sensor nodes detection: a quantitative approach.
Presented at the 28th international conference on distributed computing
systems workshops, June 2008.
Liu F, et al. Insider attacker detection in wireless sensor networks. Presented
at the 26th IEEE international conference on computer communications,
May 2007.
Lopez J, Zhou J. Overview of wireless sensor network security. In: Wireless sensor
network security. IOS Press, incorporated; May 2008. p. 121.
Masud M, et al. Anomaly detection by clustering ellipsoids in wireless sensor
networks. Presented at the 5th international conference on intelligent sensors,
sensor networks and information processing, 2009.
Ngai ECH, et al. On the intruder detection for sinkhole attack in wireless sensor
networks. Presented at the IEEE international conference on communications,
June 2006.
Ngai ECH, et al. An efficient intruder detection algorithm against sinkhole attacks in
wireless sensor networks. Computer Communications 2007;30(September):
2353–64.
Onat I, Miri A. A real-time node-based traffic anomaly detection algorithm for
wireless sensor networks. Presented at the 2005 systems communications,
August 2005.
Onat I, Miri A. An intrusion detection system for wireless sensor networks.
Presented at the 2005 IEEE international conference on wireless and mobile
computing, networking and communications, August 2005.
Palpanas T, et al. Distributed deviation detection in sensor networks. SIGMOD
Record 2003;32(December):77–82.
Perrig A, et al. SPINS: security protocols for sensor networks. Presented at the
17th ACM international conference on mobile computing and networks,
2001.
Pires WR, et al. Malicious node detection in wireless sensor networks. Presented at
the 18th international parallel and distributed processing symposium,
April 2004.
Qian L, et al. Detection of wormhole attacks in multi-path routed wireless ad hoc
networks: a statistical analysis approach. Journal of Network and Computer
Applications 2007;30:308–30.
Rahul K, et al. Reduced complexity intrusion detection in sensor networks using
genetic algorithm. Presented at the IEEE international conference on communications, 2009.
Rajasegarar S, et al. Distributed anomaly detection in wireless sensor networks.
Presented at the 10th IEEE Singapore international conference on communication systems, October 2006.
Rajasegarar S, et al. Quarter sphere based distributed anomaly detection in
wireless sensor networks. Presented at the IEEE international conference on
communications, June 2007.
Rajasegarar S, et al. Anomaly detection in wireless sensor networks. IEEE Wireless
Communications 2008;15:34–40.


Roman R, et al. Applying intrusion detection systems to wireless sensor networks.
Presented at the 3rd IEEE consumer communications and networking
conference, January 2006.
Scarfone K, Mell P. Guide to intrusion detection and prevention systems (IDPS);
February 2007.
Silva APRd, et al. Decentralized intrusion detection in wireless sensor networks.
Presented at the 1st ACM international workshop on quality of service and
security in wireless and mobile networks, 2005.
Subramaniam S, et al. Online outlier detection in sensor data using non-parametric models. Presented at the 32nd international conference on very large
data bases, September 2006.
Su C-C, et al. The new intrusion prevention and detection approaches for
clustering-based sensor networks. Presented at the IEEE wireless communications and networking conference, March 2005.
Tarique M, et al. Survey of multipath routing protocols for mobile ad hoc
networks. Journal of Network and Computer Applications 2009;32:1125–43.
Tiwari M, et al. Designing intrusion detection to detect black hole and selective
forwarding attack in WSN based on local information. Presented at the fourth
international conference on computer sciences and convergence information
technology, 2009.


Wang H, et al. Intrusion detection for wireless sensor networks based on multi-agent
and refined clustering. Presented at the international conference on
communications and mobile computing, January 2009.
Wu B, et al. Secure and efficient key management in mobile ad hoc networks.
Journal of Network and Computer Applications 2007;30:937–54.
Yick J, Mukherjee B, Ghosal D. Wireless sensor network survey. Computer
Networks 2008;52(August):2292–330.
Yu D, Frincke D. Towards survivable intrusion detection system. Presented at the
37th annual Hawaii international conference on system sciences, January
2004.
Yu Z, Tsai JJP. A framework of machine learning based intrusion detection for
wireless sensor networks. Presented at the IEEE international conference on
sensor networks, ubiquitous and trustworthy computing, June 2008.
Yu B, Xiao B. Detecting selective forwarding attacks in wireless sensor
networks. Presented at the 20th international parallel and distributed processing, 2006.
Zhang Y-Y, et al. Inside attacker detection in hierarchical wireless sensor network.
Presented at the 3rd international conference on innovative computing
information and control, June 2008.
