
D. Project Description
1 Introduction
Network neutrality [37] has been a hotly debated topic of the past decade, with no end in sight [54]. There are
at least three different perspectives from which one can view the concept [38]. However, one of its major
themes is the content nondiscrimination principle, which roughly states that Internet Service Providers (ISPs)
should not be able to restrict, limit, or discriminate against their users' right to access content and run network
applications. It is probably fair to say that the network neutrality debate is a debate over who has control of the
network: the edge or the middle?
The edge represents (the devices of) network users, who have a better idea of what they actually want,
collectively have more computational power, and are the more innovative of the two. Giving more power to
the edge is consistent with the end-to-end (E2E) principle [92], which has served us very well for the past
40 years. It is the principle to which much of the current Internet architecture owes its shape.
The middle represents the routers/switches, their links, and other devices which form the infrastructure
for delivering data for the edge. Following the E2E principle, for most of the Internet's existence the middle
has interfered minimally with users' traffic; the Internet has been relatively neutral. However, in the
past decade or so, a variety of factors have been making the Internet in general, and the ISPs in particular,
less and less neutral. These factors include, in the words of Blumenthal and Clark [22], "the rise of new
stake-holders in the Internet, in particular Internet service providers; new government interests; the changing
motivations of a growing user base; and the tension between the demand for trustworthy overall operation
and the inability to trust the behavior of individual users." In general, politics, economics, security, and new
types of application quality of service (QoS) demands are the major factors contributing to this deviation
from the E2E principle.
Perhaps some degree of E2E principle violation has to be tolerated in order to cope with the "brave
new world," or the E2E arguments themselves have to be altered a little (as argued in [22]). As a few simple
examples, firewalls and spam mail filtering are necessary for security, network address translation (NAT)
is necessary for scalability, content caching is necessary for performance, and probably some intelligence
inside the network is necessary for media-intensive and/or QoS-sensitive applications.
However, there is a large class of E2E principle violations which have much less to do with technical
reasons. For example, it is not uncommon for ISPs and even entire countries to selectively degrade or block
voice-over-IP (VoIP) traffic [48] or peer-to-peer (P2P) traffic [9, 14]. It is much harder, if not impossible,
for the ISPs to justify this type of traffic discrimination.
Earlier this year, Representative Henry Waxman put forward a proposal in the House to set clear rules
on neutrality, to no avail. On April 6, 2010, a three-judge panel of a federal appeals court in Washington, D.C.
ruled that the Federal Communications Commission (FCC) does not have the authority to regulate how an
ISP manages its network [11, 13]. If nothing else, the ruling indicates that traffic discrimination will not
disappear through legal means any time soon. Early this December (of 2010), there was a confusing and
media-attention-grabbing dispute between Internet backbone operator Level 3 and Comcast, when Level
3 asserted that Comcast was violating network neutrality by suddenly charging Level 3 to deliver its
traffic onto Comcast's network [12]. In fact, the FCC just delayed its vote on network neutrality until
December 21st, while its Chairman Julius Genachowski gave a speech entitled "Preserving Internet Freedom
and Openness" a few weeks ago [10].
In the midst of the opinionated and contentious debate between many parties (some with vested interests)
in network neutrality, a very natural technical question arises, which this proposal is developed around.
The question is the following:
How do we design applications which can evade the ISPs' traffic discrimination?

The question posed as such is too vague, in large part due to its scope. The answer to the question
depends on three main factors:
(i) How exactly does the network operator detect and discriminate against the application's data flows?
(ii) What are the user's network connection and service characteristics?
(iii) What are the QoS requirements of the application?
By precisely scoping each of the above three items, we will have a set of well-formulated, well-motivated, and intriguing technical questions to study. Addressing this entire class of questions is beyond
the scope of a three-year research program. We will further narrow down our answers to (i), (ii), and (iii).
With respect to (i), there are two complementary approaches to data flow detection and/or classification:
content-based and statistical machine learning (SML) based. Content-based classifiers look at the actual
contents of packets in the flows, including information in the headers such as protocol and port [63, 68] and
in the actual payloads [47, 56, 66, 93, 98] (deep packet inspection, or DPI). SML-based classifiers make use
of various statistics such as packet sizes, flow length, inter-arrival times, etc. Needless to say, a combination
of SML- and content-based methods can also be used.
The header-based methods have been shown to be ineffective [55, 56], mostly because ports and protocols can easily be forged. DPI-based classifiers can be accurate. However, they also have severe limitations [58]. First, they can only identify known classes of traffic. It would be difficult for DPI classifiers
to deal with applications running proprietary protocols such as Skype. Second, DPI is not feasible on encrypted payloads. For example, both KaZaa and Skype are known to encrypt their payloads not only to avoid
traffic detection but also to avoid reverse-engineering [21, 61]. The popular L7-Filter [1], for example,
cannot accurately detect obfuscated or encrypted protocols [14]. The open-source OpenDPI has recently been
stripped of its capability to identify obfuscated or encrypted protocols [2]. Third, DPI is impractical in
real time on high-speed links. Last but not least, DPI raises major privacy and legal concerns.
SML-based classifiers do not peek into packet payloads, and thus are immune to encryption. It has
been shown that SML classifiers can be highly accurate [20, 62, 67, 87, 91, 102]. For example, [25] recently
reported that a 5-second observation window is sufficient for an SML-based classifier to pin-point Skype
traffic with recall and precision better than 98%. SML-based classifiers are starting to show up in the wild.
(See, for example, the open-source SPID software [3].) In a real-time setting, SML-based classifiers are
also aided by advances in data streaming algorithms [69]. Furthermore, unlike DPI, (semi-)unsupervised
learning techniques allow SML-learners to classify new types of application traffic.
SML-based classifiers have their own limitations, however. By padding payloads and/or splitting the
flows, applications can easily alter the statistical signatures of the flows. However, flow padding and splitting come at a price: the QoS level of the application clearly must suffer, potentially to the point that the
application can no longer function properly, rendering the camouflaging effort ineffectual. VoIP, for example, has very stringent QoS requirements. One of the major questions this proposal aims to address is to
formalize and characterize this tradeoff between QoS degradation and evasion capability of an application.
To summarize, with respect to question (i), we will focus our attention on evading SML-based classifiers for the reasons elaborated above. In addition to the network neutrality motivation, there is
another great motivation for studying the problem of evading SML-based classifiers: evading SML-based
engines is a fundamental problem in the emerging area of machine learning security [17, 18]. As an increasing number of network and systems security and fault diagnosis solutions are SML-based [15, 52, 60], it
is imperative that we understand the security of machine learning models and algorithms. Knowing which
SML-based flow classifiers are practically useful under various criteria is also important for other reasons.
For example, network operators need accurate flow statistics. Network security administrators want to detect
abnormal traffic, such as when there is an attack or a worm propagating.
With respect to (ii), we can make use of the service terms indicated in typical service level agreement
(SLA) contracts between users and ISPs. Basically, we need to determine the resources that the application
has to work with. For widest possible applicability of the problem formulation, we will make as few assumptions about the SLA as possible. For example, in a formulation discussed in Section 4, the only thing
we assume is that the user has a certain access bandwidth.
With respect to (iii), we will formalize commonly used QoS parameters such as total throughput, latency, jitter, loss rate, and instantaneous bandwidth. For example, if we want to send VoIP streams to bypass
the ISP's filter, we need to ensure the typical VoIP QoS parameters, such as a loss rate of less than 1 percent, one-way latency of no more than 150 ms, average one-way jitter of less than 30 ms, and instantaneous throughput of
21-320 kbps, depending on the sampling rate, the codec, and Layer 2 media overhead [95].
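As a minimal sketch of how such QoS requirements could be checked in code, the following Python fragment tests a single measurement against the VoIP thresholds quoted above. The class and function names, and the choice of 21 kbps (the low end of the quoted range) as the default throughput floor, are illustrative assumptions rather than part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class QosSample:
    """One measurement of a flow as seen at the receiver (hypothetical type)."""
    loss_rate: float        # fraction of packets lost, e.g. 0.005 for 0.5%
    latency_ms: float       # one-way latency in milliseconds
    jitter_ms: float        # average one-way jitter in milliseconds
    throughput_kbps: float  # instantaneous throughput in kbps


def meets_voip_qos(sample: QosSample, min_kbps: float = 21.0) -> bool:
    """Return True if the sample satisfies the VoIP thresholds quoted in the text.

    min_kbps is an assumption: the real floor depends on the codec in use.
    """
    return (
        sample.loss_rate < 0.01
        and sample.latency_ms <= 150.0
        and sample.jitter_ms < 30.0
        and sample.throughput_kbps >= min_kbps
    )
```

For instance, a sample with 0.5% loss, 120 ms latency, 10 ms jitter, and 64 kbps passes, while the same sample with 2% loss does not.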
This proposal aims to formulate and address a class of problems regarding methods for evading
statistical machine learning based packet classifiers, while keeping some specific level of QoS,
subject to the service characteristics provided to the application by the network provider.
Intellectual Merit: Leaving aside all the obvious political and economic overtones, technically the
network neutrality debate is also about the role of the E2E design principle which has been influential
in the design of the Internet and its protocols in the past 40 odd years. At the very least, it would thus
be very interesting and satisfying intellectually to have a viable E2E solution to the traffic discrimination
problem, the core of network (in)neutrality. As shall be elaborated, before trying to evade SML-based data
flow classifiers we will also need to verify that indeed SML-based classifiers can be practically effective in
distinguishing data flows, possibly in real time. This problem is useful from both a security standpoint and
a network statistics/diagnosis standpoint. Furthermore, studying effective and accurate SML-based
classifiers under real-time constraints adds a fresh challenge to machine learning. The fruits borne of
the interaction between statistical modeling and data-streaming requirements should be useful in
a wide variety of contexts. For example, statistically learning the environment given a multitude of sensing
data streams is a central problem in sensor networking. Last but not least, our problems are a sub-class of
machine learning security, which is crucial in an era where increasingly we rely on intelligent systems for
data gathering, storage, and analysis.
In terms of qualification, the PI is an expert in combinatorial group testing and switching networks design
and analysis. In group testing, his major contributions were in highly error-tolerant explicit constructions
of disjunct matrices based on algebraic objects [36, 73, 74, 76], and sub-linear time decodable group testing
[53]. Combinatorial group testing has many applications in network and security [19, 33, 34, 40, 57, 103].
In switching networks, the PI has developed complexity models for multi-channel switching networks,
proved asymptotically optimal and near-optimal bounds for many classes of multi-channel switching
networks [35, 71, 72, 75, 78, 80, 81, 86, 101], and devised new techniques for switching network blocking
analysis [70, 77, 79, 82-85, 88, 100]. More closely related to this project, the PI has devised combinatorial
models for insider threat detection and mitigation, and developed practical GUI-based tools based on the
models [29-31, 41, 45]. In [64], the PI and co-authors proposed a data-centric machine learning framework
for database insider threat detection. The works in [42-44, 46] involve the modeling and analysis of
Internet worm and botnet propagation mechanisms.
Broader Impacts: The project naturally brings together ideas and researchers from networking, security, and statistical machine learning. Whatever the outcome of the FCC's vote on the network neutrality issue
at this year's end, traffic discrimination is here to stay. Hence, even a partial solution to this problem should
lead the way to a variety of network applications favoring users. Obviously, the (in)security of machine
learning has important real-world impacts. The project will support the training of graduate students and the
PI will disseminate the results via research publications, presentations, and surveys.

2 Main Proposed Research Questions

Problem 2.1. With respect to data flow classification and discrimination, are SML-based classifiers really a
worthy target for evasion?
We have briefly argued in the previous section for narrowing the scope of the ISP's traffic discrimination technique to SML-based methods. However, we will need to be much more certain that SML-based classifiers
really are practical for real-time traffic discrimination. Thus, this is the first problem in our proposed research.
As preliminary evidence, there have been many studies showing that good SML-based classifiers can
achieve very high accuracy (more than 98%) in classifying many types of typical Internet data flows such
as HTTP, FTP, SSH, P2P, VoIP, DNS, HTTPS, etc. See [87] for an excellent survey on both SML-based
classification techniques and performance results. The comprehensive evaluation work in [58] also includes
comparisons of SML-based classifiers with content-based classifiers. As alluded to earlier, content-based
classifiers are not suitable for obfuscated or encrypted payloads. Furthermore, deep packet inspection raises
privacy concerns which the ISPs may not want to deal with.
If SML-based classifiers are that accurate, then why is the above an open research question? The catch
here lies in the fact that virtually all proposed (accurate) classifiers make use of statistical features derived
from all packets in at least one direction of the flows. (For TCP connections, a classifier might have access to
packets flowing back too! Bidirectional access improves classification accuracy a little. However, requiring
bidirectional access is not very practical partly because the classifier may not be on the backward path [87].)
The SML-classifiers were trained off-line and tested with off-line data sets, for which the entire flows are
available and statistical feature vectors are computed from the entire flows. While these accurate classifiers
are certainly very useful for network operators (for provisioning, diagnosis, and a posteriori traffic analysis,
e.g.), they are not useful for discriminating against flows. In order to disturb/drop a flow, the classifier must
be able to accurately identify the flow while it is still present.
Consequently, Problem 2.1 is really about the effectiveness of SML-based classifiers in identifying data
flows in the middle of a flow transmission. We will refer to these classifiers as real-time classifiers, for lack
of a better name. (The term "online classifier" might be used; however, "online learning" has a totally different
meaning, which might cause some confusion.) In this respect, there are significantly fewer studies. A few
notable exceptions are the works in [23-25, 89]. In [89], the classifiers are trained on features selected on a
(small) sliding window of packets in a flow, then evaluated in identifying a UDP-based online game protocol.
In [25], a 5-second window was shown to be sufficient to identify Skype flows with high accuracy. In [23, 24],
SML-classifiers whose features are based on both traffic characteristics and fingerprints computed from the
encrypted payloads are shown to be accurate. Classification methods which only require small observational
windows also use less memory space and thus are more suitable for deployment at high-capacity routers.
Problem 2.2. How do we design practical real-time data flow classifiers?
The solutions to this problem have two implications. First, they are certainly useful for network operators
in collecting statistics and diagnosis information, and they should be useful in designing flow-based routing
algorithms. Second, knowing which learning methods and models are really practical, we can then focus on
designing practical solutions to evade them.
A practical real-time flow classifier has to work under quite stringent real-time constraints. Hence,
Problem 2.2 is embedded with many interesting questions. For example, the classifier can only sample
packets from small, possibly randomized, sliding windows of the stream of packets from a given flow. It is
unclear which features to extract and how to extract them at millisecond time scales. Do Internet data flows
exhibit sufficient statistical patterns in small observational windows to distinguish themselves? If so, which
types of flows have that property?
Even with perfect feature and statistical model selection strategies, to make them practical we will also
need to solve the accompanying problem of designing the algorithms realizing the strategies in a data-streaming fashion (sub-linear time and space). There is evidence from data streaming and network traffic
anomaly detection that such algorithms exist for some simple statistical models. For example, it is known
that entropy and entropy norms of data streams can be estimated efficiently [27, 28], which can be used for
worm and anomaly detection [16, 99]. Problem 2.2 will generate a new class of data streaming questions,
whose solution techniques would contribute further to our knowledge of data streaming.
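To make the entropy statistic mentioned above concrete, the following Python sketch computes the empirical Shannon entropy of a stream of items (for example, destination ports or packet sizes). It is only an illustration of the quantity: it keeps an exact counter per distinct item, so its memory use is linear in the number of distinct items, unlike the sub-linear estimators of [27, 28]. The function name is a hypothetical choice.

```python
import math
from collections import Counter


def empirical_entropy(items):
    """Exact empirical Shannon entropy (in bits) of an iterable of hashable items.

    This is a plain reference computation, not a streaming estimator.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A stream in which every packet goes to the same port has entropy 0, while a stream spread uniformly over four ports has entropy 2 bits; a sudden shift in such a value is the kind of signal used for anomaly detection.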
Problem 2.3. How do we formulate and solve the SML-classifier evasion problems?
This class of problems is at the center of our proposed research. In order to formulate such problems, we
shall need to formalize the notions of a user application's capability and levels of quality of service guarantees.
For example, in terms of what the application can do (to evade the classifier), we can make some minimal
assumptions about the user's SLA terms with the ISP. The simplest type of SLA is probably some given
download and upload bandwidth capacities. The QoS guarantees will vary depending on the application.
We will look at basic QoS parameters such as overall throughput, instantaneous throughput, jitter, latency,
and loss rate.
At a high level, the classifier evasion problem is how to send data by splitting the original flow into multiple sub-flows, and possibly padding the sub-flows, so that the classifier allows the (padded) sub-flows to go
through, and that the aggregate of the sub-flows as seen at the receiving side satisfies the QoS requirements.
There are two types of classifier evasion problems we will consider. For lack of better words, we will
name them generic classifier evasion and specific classifier evasion problems.
Generic classifier evasion. In this first type of problems, we will make no assumption about what
learning model the classifier is using. This is the weakest possible assumption, and thus we expect this
version of classifier evasion to be the hardest to solve. Section 4 discusses a particular formulation of this
type of classifier evasion problems and presents some partial results.
Briefly, we can assume that the application has access to a set of white-listed flows which the classifier
allows to get through. The application picks sub-flows from this white-list, and schedules sub-flow transmissions. The objective is to optimize some aggregate QoS parameter such as throughput, as observed at
the receiver side. The constraints come from how we model the SLA. If upload bandwidth is a constraint,
for example, then mathematically the sub-flow selection and scheduling is subject to a multidimensional
packing-type of constraint. Because the application is just re-playing allowable flows, it does not matter
what type of model the classifier uses.
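The sub-flow selection step can be illustrated with a deliberately simplified sketch. Assume, purely for illustration, that each white-listed flow has an integer bandwidth cost and a useful payload rate, and that the only constraint is a single upload capacity. Selection then reduces to a 0/1 knapsack problem, solved below by dynamic programming. This is a one-dimensional stand-in for the multidimensional packing constraint described above, and it ignores scheduling; it is not the formulation of Section 4.

```python
def select_subflows(flows, capacity):
    """Pick white-listed flows maximizing total payload under a bandwidth cap.

    flows: list of (bandwidth_cost, payload_rate) pairs; costs are positive
           integers (hypothetical units, e.g. kbps).
    capacity: integer upload bandwidth available.
    Returns (best_total_payload, indices_of_chosen_flows).
    """
    # best[c] holds the best (payload, chosen indices) with total cost <= c.
    best = [(0, [])] * (capacity + 1)
    for idx, (cost, payload) in enumerate(flows):
        # Go through capacities downward so each flow is used at most once.
        for c in range(capacity, cost - 1, -1):
            candidate = best[c - cost][0] + payload
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - cost][1] + [idx])
    return best[capacity]
```

With flows `[(60, 40), (50, 30), (50, 35)]` and capacity 100, the sketch prefers the two cheaper flows (total payload 65) over the single flow with the highest individual payload.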
Specific classifier evasion. In this type, we will look at some specific statistical models for the real-time
classifiers. Answers to Problem 2.2 provide us the guidance on which specific classifiers are worthy of
evading. Then, we assume that the application knows which model the classifier is using. As shall be seen
in Section 3, ensemble learning methods such as AdaBoost and Bagging [65], or Decision Tree - C4.5 [48]
are good candidates for real-time SML-classifiers.
In the generic classifier evasion problems, the application is highly constrained in what type of sub-flows
are allowed. In this specific version, the allowable sub-flows can be generated from the known statistical
model. The application will generally have more flexibility in picking the allowable sub-flows; thus, we
expect the QoS guarantees to be better than those of solutions to the generic type of evasion problems. Obviously,
solutions to the second type of evasion problems have direct applications in SML-based network security.
Another related question is, which model is more capable of coping with flow evasion?
Problem 2.4. How do we create sample flows from a given classification model?
For the specific classifier evasion problem, the application has access to a classification model, which
could be generative or discriminative. The application will likely have to create sample flows from the
model, and then use the sample flows to guide the flow splitting and padding process. Thus, implicitly we
need to solve the problem of creating sample flows from a given model. This problem would be simpler
if the classification model is generative, such as Naive Bayes or a mixture of Gaussians. For a discriminative
model such as support vector machine or boosting, finding efficient methods for generating sample flows
fitting the model is a very interesting question on its own!
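For the generative case, a minimal sketch of sampling is shown below, assuming a Gaussian Naive Bayes model in which each feature of a class is an independent Gaussian with a known mean and standard deviation. The feature names and the dictionary layout are hypothetical. Note that the output is a feature vector, not a packet sequence; turning such a vector into actual packet sizes and timings is a further step that this sketch does not address.

```python
import random


def sample_flow_features(class_model, rng=random):
    """Draw one feature vector from a Gaussian Naive Bayes class model.

    class_model: dict mapping feature name -> (mean, standard deviation),
                 an assumed representation of the class-conditional model.
    rng: a random.Random instance (or the random module) used for sampling.
    """
    # Naive Bayes treats features as independent given the class,
    # so each feature can be sampled separately.
    return {name: rng.gauss(mu, sigma) for name, (mu, sigma) in class_model.items()}
```

A caller might pass `random.Random(seed)` to obtain reproducible samples, and would likely need to reject or clamp physically impossible values such as negative packet lengths.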
Problem 2.5. How does the location of the classifier(s) affect the formulations and solutions to the above
problems?
Thus far, there are a few real-world constraints which have been swept under the carpet. We have not
been explicit in stating where exactly between the sender and the receiver the classifier is placed. In fact,
there might also be multiple classifiers. Furthermore, we have not specified whether the classifier(s) has
access to both directions of the communication.
For the sake of clarity, suppose there is only one classifier placed somewhere between the sender and
the receiver. Consider a solution to Problem 2.3 where the sender sends sub-flows according to a white-list
of allowable flows. As the sub-flows travel from the sender to the classifier, current network conditions
will alter the statistical signature of the sub-flows: some packets might be lost, the inter-arrival times are
certainly not the same as the inter-transmission times. We will refer to this effect as the network noise effect.
The farther the classifier is from the sender, the higher the network noise level will be. It is not clear how
the classifier will classify a sub-flow plus network noise. This practical issue brings up a subtle point in
the (potential) formulations for Problem 2.3.
In the generic classifier evasion version, we assume that there is a list of white-listed flows. We have not
specifically indicated where and how this white-list can be obtained. The simplest way is for the user to
experimentally run network applications which the ISP allows to go through and record all the packet sizes
and their timings. Thus, the white-list flows are flows recorded at the user's machine. By replaying these
flows for the camouflaging effort, the faked sub-flows will suffer from exactly the same type of network
noise as the real flows. Thus, the important point to note here is that for white-listed flows obtained at the
user's machine, we do not need to worry about the network noise affecting the faked flows. However, it is
still very interesting to answer the following two related problems, which will be useful for us later on:
Problem 2.6. How do the network conditions (traffic load, congestion, e.g.) affect the classification performance of (real-time) SML-based classifiers?
For example, suppose we train a classifier under some specific traffic load. What will its classification
accuracy be under a different traffic load, which could be lighter or heavier than the original load?
Problem 2.7. How does the distance from the source affect the (real-time) classifier's precision?
We will set up a small experimental local area network, and also connect the network to PlanetLab
[32, 90] to address the above two questions. The answers to the above two questions should guide us toward
an answer to the following problem.
Problem 2.8. Under a given network condition and a given sender-to-classifier distance, how do we model
the network noise effect? More concretely, given a specific flow characteristic at the sender (all packet
sizes and their timings), what is the distribution of the arrival times and packet losses as observed at the
classifier?
In the specific classifier evasion version, we have access to a specific statistical model that the classifier
uses. Thus, if we had a good answer to Problem 2.8, we could then attempt to generate sub-flows from the
sender such that those flows plus network noise will fit the classifier's model. This constraint adds
another layer of complexity to Problem 2.4.
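As a toy illustration of what a network noise model for Problem 2.8 might look like, the sketch below applies independent packet loss and an exponentially distributed extra delay to a flow recorded at the sender. Both distributional choices, and all parameter names, are assumptions made only for illustration; determining a realistic model is exactly the open question.

```python
import random


def apply_network_noise(packets, loss_prob, base_delay, mean_jitter, rng):
    """Simulate how a flow sent by the sender might look at the classifier.

    packets: list of (send_time_s, size_bytes) at the sender.
    loss_prob: assumed independent per-packet loss probability.
    base_delay: fixed propagation delay in seconds.
    mean_jitter: mean of an assumed exponential extra delay (0 disables it).
    rng: a random.Random instance.
    Returns the surviving packets as (arrival_time_s, size_bytes), time-ordered.
    """
    arrived = []
    for send_time, size in packets:
        if rng.random() < loss_prob:
            continue  # packet lost before reaching the classifier
        jitter = rng.expovariate(1.0 / mean_jitter) if mean_jitter > 0 else 0.0
        arrived.append((send_time + base_delay + jitter, size))
    arrived.sort()  # variable delay may reorder packets
    return arrived
```

Running a candidate sub-flow through such a function before computing its statistical features gives a rough idea of whether the flow "plus noise" would still fit the classifier's model.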
Problem 2.9. When splitting a flow into sub-flows and/or padding them for masquerading, how can we add
redundant data to make the transmission more error tolerant?
Error-tolerance can be used to maintain a certain packet loss rate, which is a basic QoS parameter. Also,
redundancy will help cope with the (hopefully small) probability that the classifier disturbs or drops one of
the sub-flows. For this problem, lessons learned from designing erasure codes for large reliable content
distribution should be very helpful [26].
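The simplest form of such redundancy is a single XOR parity chunk, sketched below in Python. It assumes that the data has already been split into equal-length chunks, one per sub-flow, and it can repair the loss of at most one chunk. This is far weaker than the erasure codes of [26] and is shown only to make the idea concrete; the function names are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(chunks):
    """Return the chunks followed by one XOR parity chunk.

    chunks: non-empty list of equal-length bytes objects.
    """
    parity = bytes(len(chunks[0]))  # all-zero starting value
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return list(chunks) + [parity]


def recover(received):
    """Rebuild the data chunks when at most one entry was lost (None)."""
    missing = [i for i, chunk in enumerate(received) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost chunk")
    repaired = list(received)
    if missing:
        size = len(next(c for c in received if c is not None))
        rebuilt = bytes(size)
        for chunk in received:
            if chunk is not None:
                rebuilt = xor_bytes(rebuilt, chunk)
        repaired[missing[0]] = rebuilt
    return repaired[:-1]  # drop the parity chunk
```

The overhead here is one extra sub-flow per group, which is the kind of cost that Problem 2.10 asks to characterize.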
Problem 2.10. Characterize the overhead involved in flow splitting, padding, and redundancy addition.
Clearly, in formulating and addressing the classifier evasion problem, we have to take into account both
data and network overhead.
Problem 2.11. Re-visit all the above problems in the scenario where the network operator adopts collaborative and distributed machine learning classifiers.
This problem is very natural, but realistically its scope is likely much wider than this proposal was intended to cover. The problem of devising statistical models which can be adopted by a set of distributed
nodes and designing distributed protocols for both training and decision making belongs to distributed machine learning, a difficult topic still in its infancy [51]. (The key motivation for [51] to develop distributed
learning models and protocols was for a network-wide anomaly detection problem similar to our distributed
classifier problem [50].) The difficulties come from both the distributed system and the machine learning
angles. From the distributed system angle, computing sophisticated functions and solving large-scale optimization problems distributedly is hard. Yet, this task is required by most learning models such as SVM
and PCA (as was done in [50, 51]). From the machine learning angle, nodes in the network often have
incomplete pictures of the instance space, which requires us to design statistical models from a collection of
partial windows into the instance space world. Traditional learning theory was not designed to deal with
such distributed sampling constraints.
The inherent difficulty with Problem 2.11 is also a good opportunity. This proposal aims to study how
to design and evaluate data flow classifiers which can collaborate to detect correlational patterns between
the flows (the analog of Problems 2.1 and 2.2). As a first step, we narrow the problem down to only look for
a given flow type. For example, if there were at least one classifier intercepting each split sub-flow, can
they collaborate to disturb the overall flow? Then, we will also explore the corresponding evasion problems.

3 Preliminary results for Questions 2.1 and 2.2

This section presents very briefly some of our preliminary results regarding Questions 2.1 and 2.2. It will
elaborate on several subtle points regarding the design and evaluation of SML-based classifiers. Another
objective of the section is to further convince the reader of the central premise that evading SML-based
classifiers is a worthy goal. The results confirm that real-time SML-based classifiers can potentially be made
practical. These classifiers are trained and tested on small observational windows in the flows.
The data traces. We use public traffic traces from NLANR [96] and MAWI Working Group Traffic
Archive [97]. Several-day-long packet header traces are obtained from the Auckland-IV and Auckland-VI
data sets. Auckland-IV was collected from 02/21/2001 to 02/25/2001 and from 03/16/2001 to 03/24/2001. Auckland-VI is from 06/09/2001 to 06/12/2001. Because the traces are very large (65 GB containing over 3 billion
IP headers for Auckland-IV), we sample a subset from each trace. In order to avoid bias in favor of
the large classes and over-fitting to prior probabilities of classes, each class has roughly equal size. We
use stratified sampling to sample 1,000 flows for each application and each trace. The MAWI data traces are
48 hours long, from 01/09/2007 to 01/11/2007, on a 1G Ethernet link. During this period, we collected 8 traces
of 1 hour each, taken each day at identical times (02:00, 10:00, 14:00, 20:00), which were chosen to cover business,
non-business, as well as night-time hours. Our second data set is generated by sampling 1,000 randomly
selected flows from each of 10 classes (SSN, DNS, HTTP, HTTPS, POP3, TELNET, NNTP, SOCKS, FTP, SMTP). Overall,
there are 10,000 flows in the data set.
Feature selection. For a window (sub-flow) of fixed size (the number of packets p), we select 10 features
which are the duration, total number of bytes, the minimum, mean, maximum, and standard deviation of
packet lengths and of inter-arrival times. For bidirectional access there are 20 features.
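The ten per-window features listed above can be computed as in the following Python sketch. The input format (a list of timestamp and length pairs) and the use of the population standard deviation are assumptions; the text does not say which variant of the standard deviation was used. The window must contain at least two packets so that inter-arrival times exist.

```python
import statistics


def window_features(packets):
    """Compute the 10 features of one window (sub-flow) in one direction.

    packets: list of (timestamp_s, length_bytes) for p >= 2 consecutive packets,
             ordered by time (an assumed input format).
    Returns a tuple: (duration, total_bytes,
                      min/mean/max/std of packet lengths,
                      min/mean/max/std of inter-arrival times).
    """
    times = [t for t, _ in packets]
    lengths = [n for _, n in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]

    def summary(values):
        # pstdev is the population standard deviation (an assumption).
        return (min(values), statistics.mean(values), max(values),
                statistics.pstdev(values))

    return (times[-1] - times[0], sum(lengths)) + summary(lengths) + summary(gaps)
```

For bidirectional access, calling the function once per direction and concatenating the two tuples yields the 20 features mentioned above.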
The SML algorithms. We evaluate 10 different machine learning algorithms, which are AdaBoost,
Bagging (both with decision stumps as base classifiers), Decision Tree (C4.5, J48 implementation), Sequential Minimal Optimization (SMO), Naive Bayes with Kernel Density Estimate (NBK) and with discretization (NBD), Naive Bayes Tree (NBTree), Support Vector Machine (SVM), Bayesian Network (BN), and
Instance-based Learning (IB1 and IBk). The algorithms are trained on the data sets using 10-fold cross
validation, where a labeled data set is partitioned into 10 subsets, and 10 training and testing iterations are
performed. In each iteration, 9 of the 10 subsets are used for training and the remaining subset for testing.
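The partitioning step of 10-fold cross validation can be sketched as follows. This is a plain (non-stratified) split written only to illustrate the procedure described above; it is not the tooling used for the reported experiments, and the function name and seed parameter are hypothetical.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

Each sample appears in exactly one test fold, so averaging the per-iteration results uses every labeled flow for testing exactly once.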
Evaluation criteria. For a given flow class $i$, true positive ($TP_i$) is the number of correctly identified
flows, false positive ($FP_i$) is the number of flows wrongly marked as belonging to the class, and false
negative ($FN_i$) is the number of flows in the class wrongly classified. Precision, recall, and accuracy are the
three evaluation measures: $\text{precision} = TP/(TP + FP)$, $\text{recall} = TP/(TP + FN)$, and
$$\text{overall accuracy} = \frac{\sum_i TP_i}{\sum_i (TP_i + FN_i)},$$
where the sums are over all flow classes $i$.
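The three measures can be computed directly from the true and predicted labels, as in the Python sketch below. The function name and the convention of returning 0.0 when a denominator is zero are assumptions. Since every flow is either a true positive or a false negative of its own class, the denominator of the overall accuracy equals the total number of flows.

```python
def per_class_metrics(true_labels, predicted_labels):
    """Return ({class: (precision, recall)}, overall_accuracy)."""
    pairs = list(zip(true_labels, predicted_labels))
    classes = sorted(set(true_labels) | set(predicted_labels))
    stats = {}
    total_tp = 0
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        stats[c] = (precision, recall)
        total_tp += tp
    # sum_i (TP_i + FN_i) is simply the number of flows.
    accuracy = total_tp / len(pairs) if pairs else 0.0
    return stats, accuracy
```

For example, with true labels `["http", "http", "dns", "dns"]` and predictions `["http", "dns", "dns", "dns"]`, the class `http` has precision 1.0 and recall 0.5, the class `dns` has precision 2/3 and recall 1.0, and the overall accuracy is 0.75.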
The results. We vary the observational window size p from 5 to 100. That is, the
classifier, during training and testing, is only provided with p consecutive packets in each flow. In the first
and second sets of experiments, there are two windows per flow, one for each flow direction. All windows
of the first experiment set start from the beginning of a flow direction; accuracy results are reported in Fig.
1. The second set (see Fig. 2) picks a random starting position for the windows. In the third and fourth sets
of experiments there is only one window per flow, whose starting point is random. In the third set (Fig. 3)
the window is on the backward flow, and in the fourth (Fig. 4) it is on the forward flow.
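The window selection used in these experiments can be pictured with a short sketch. The name `observation_window`, the uniform choice among start positions that leave p packets available, and returning the whole direction when it has fewer than p packets are assumptions for illustration; the text does not state how short flows were handled.

```python
import random

def observation_window(packets, p, random_start=False, rng=random):
    """Return p consecutive packets of one flow direction.

    packets: the packets of one direction, in arrival order.
    random_start: False -> window starts at the beginning of the direction;
                  True  -> window starts at a uniformly random position.
    """
    if len(packets) <= p:
        # Assumption: directions shorter than p are used in full.
        return list(packets)
    start = rng.randrange(len(packets) - p + 1) if random_start else 0
    return packets[start:start + p]
```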
Figure 1: Bidirectional, window starts from beginning
(Plot: accuracy (%) versus window size (p) for AdaBoost, IB1, IBk, NBK, NBD, NBTree, SMO, Bagging, BN, and J48.)

Figure 2: Bidirectional, random window
(Plot: accuracy (%) versus window size (p) for the same ten algorithms.)

From the results, we learned several useful facts which are partial answers to Question 2.2: (a) AdaBoost,
Bagging, and C4.5 perform very well, with accuracy consistently well over 90%; (b) for these well-performing
algorithms, the window size does not have much of an effect; in fact, a window of size 10 seems to be
a consistently good choice; (c) there is a slight performance penalty for observing only one flow direction.
It is worth noting that, in order to discriminate traffic, an accuracy of 80-85% is more than sufficient, because
ISPs do not need to completely drop a flow. Randomly delaying some packets in a VoIP flow, for example,
already has a negative impact on the application.

Figure 3: Unidirectional, random window on backward direction
(Plot: accuracy (%) versus window size (p) for AdaBoost, IB1, IBk, NBK, NBD, NBTree, SMO, Bagging, BN, and J48.)

Figure 4: Unidirectional, random window on forward direction
(Plot: accuracy (%) versus window size (p) for the same ten algorithms.)

With respect to the individual flow types, Table 1 shows the recall rates for the AdaBoost algorithm with
a window size p = 10. Other well-performing algorithms follow the same trend. HTTP, HTTPS, and SOCKS are
the hardest flows to classify, probably because many different types of applications can
run on these protocols. Observing only one direction does degrade the recall rate slightly. Observing
from the start of the flows is also slightly better than observing randomly inside the flows. Overall, the recall
rates for most applications are high.
Table 1: Recall rates (%) of different flow types. Observation window size p = 10, AdaBoost algorithm.
BC: window starts from beginning and entire flows observed, BI: window starts from beginning and p
packets observed, DC: window starts at a random point and runs to the end of the flow, DI: window starts at a
random point and runs until p packets are observed, -F: window on the flow's forward direction, -B: window on the backward
direction.

              Bidirectional             Unidirectional
              BC    BI    DC    DI      BC-F  BC-B  BI-F  BI-B  DC-F  DC-B  DI-F  DI-B
ssh           100   98.8  99    97.2    99.7  99.5  98.7  99    98.7  99.3  97.2  96.1
dns           99.1  98.9  96.3  95.2    99    98.4  98.8  98.2  98.3  98.1  86.6  92.5
telnet        98.4  97.2  83.2  94.3    98.6  98.6  93.6  96.6  95.7  95.8  94    96.2
smtp          97.2  96.9  94.4  92.8    94.9  94.3  91    90.3  90.9  94.4  95.6  92.1
pop3          94.2  97.6  96.9  90.6    98.6  97.3  92    86.7  97.1  90.2  92.2  85.7
ftp-control   97.2  93.5  95.1  91.5    93.4  96.4  89.8  88.8  91.3  92.4  89.3  92.2
nntp          99.9  97    97    92.1    98.1  99.6  92.2  94.7  98.5  99.5  82.5  85.3
socks         96.3  92.7  85.9  76.4    84.9  93.1  83.2  88.9  84.3  84.9  65.7  75
http          98    92.6  74.6  74.4    95.4  94.7  82.9  88.8  86.1  79.3  68.8  72.8
https         97.3  94.6  79.5  73.7    96.7  95    89.1  87.4  82.3  84.9  70.7  77.2

4 A Formulation of the Generic Classifier Evasion Problem

This section develops a particular formulation of the generic classifier evasion problem along with algorithmic results and evaluations. We hope to convince the reader that evading SML-based classifiers while
maintaining high levels of QoS is a distinct possibility. Furthermore, the section illustrates the types of
research questions arising from addressing the problems studied in the proposal.

4.1 The data-flow masquerading problem

The ISP's discrimination policy can take a variety of forms. They can, for example, black-list certain
flow types like VoIP or P2P. They can put some flows such as HTTP on a white-list, allowing them to pass
through untouched. However the discrimination policy is defined, it is reasonable to assume that there is
an implicit or explicit white-list of flows. Hence, in this formulation we make the minimal assumption that
the user has access to a database of white-listed flows. There are several ways in which the user can build
this database: (1) she can experimentally run applications which the ISP allows to go through and record
all the packet sizes and their timings, (2) she can statistically learn the distributions allowed through by the
classifier and generate white-listed flows from the learned generative model, or (3) she has access to the
publicly available statistical models that the classifiers use (i.e., no security by obscurity).

Figure 5: Sender's transmission schedule and the combined effect

The problem now becomes: given a database of white-listed data flows, how can a user transfer data while
evading the classifier and satisfying some desired Quality-of-Service (QoS) requirements? We next describe how the white-list flows are modeled. The white-list database consists of m data flows $A_1, \ldots, A_m$.
The $A_i$ are unidirectional. (Classifiers requiring bidirectional access are not practical [87].) Time is slotted.
All slots have the same fixed length, which is sufficient to send at least one packet of a given flow. From the data
traces described in Section 3, practical slot lengths are in the tens of milliseconds. Each flow $A_i$ can thus be
thought of as an array $A_i[1], \ldots, A_i[l_i]$, where $l_i$ is the number of time slots that flow $A_i$ lasts, and $A_i[j]$ is
the number of bytes sent in slot $j$. For example, the following flow is taken from the data trace with
a slot length of 10 ms.

A[1]  A[2]  A[3]  A[4]  A[5]  A[6]  A[7]  A[8]
1450  0     0     50    0     0     0     1330

At time slot 1, the flow has a packet of size 1450 bytes. Then, the inter-arrival time between the first and the
second packet is 3 slots = 30 ms, at which point the second packet of size 50 bytes is sent. Similarly, the next
inter-arrival time is 40 ms and the next packet is of size 1330 bytes.
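The slotted representation can be derived from a packet trace along the following lines. This is a sketch under stated assumptions: the function name `to_slot_array` is hypothetical, timestamps are taken to be integer milliseconds, and the list is 0-based, so index 0 corresponds to slot 1 in the text.

```python
def to_slot_array(packets, slot_ms):
    """Convert a packet trace into a per-slot byte array.

    packets: list of (timestamp_ms, length_bytes) in arrival order.
    slot_ms: slot length in milliseconds.
    Returns a list whose entry j holds the bytes sent in slot j (0-based).
    """
    first = packets[0][0]
    slots = [(t - first) // slot_ms for t, _ in packets]
    array = [0] * (slots[-1] + 1)
    for slot, (_, size) in zip(slots, packets):
        array[slot] += size   # packets sharing a slot are summed
    return array

# The example flow: packets at 0 ms, 30 ms and 70 ms with 10 ms slots.
example = to_slot_array([(0, 1450), (30, 50), (70, 1330)], 10)
```

Here `example` reproduces the eight-slot array shown in the example above.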
Now, suppose the user would like to transfer data given some desired level of end-to-end QoS, but
she is only allowed to send data in accordance with the white-listed flows. At any time slot, she can start
re-playing some flows in the database. We assume that once a flow is chosen to be re-played, she has
to follow through by re-playing the entire flow. (Otherwise, a smart classifier would be suspicious.) A
transmission schedule indicates which flow to start at which time slot. Note that the same white-listed flow
can be chosen multiple times in a schedule. However, the total number of bytes she can transmit in one time
slot is at most C, which is determined by the available bandwidth of her Internet service. Can she accomplish the task?
For example, suppose the database consists of the allowable flows $A_1$, $A_2$, $A_3$ as shown in Fig. 5. Given
the transmission schedule in the figure, cumulatively the sender is able to send 10 bytes in the first slot,
10 in the second, 20 in the third and so on. The total throughput is 150/11 ≈ 13.6 bytes per slot. We
next formulate two versions of the so-called flow masquerading problem, taking into account two very basic
quality of service (QoS) parameters. The first QoS parameter is the overall throughput of the combined flow, and the
second, more suitable for real-time applications, is the minimum instantaneous throughput.
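To make the notion of a transmission schedule concrete, the sketch below superimposes replayed flows and reports the per-slot load. The flows and schedule used here are hypothetical and are not the ones in Fig. 5; the name `combined_load` and the 0-based slot indexing are also assumptions.

```python
def combined_load(flows, schedule, horizon):
    """Bytes sent in each of the first `horizon` slots (0-based).

    flows: dict mapping a flow name to its per-slot byte array.
    schedule: list of (flow_name, start_slot); a flow may appear repeatedly.
    """
    load = [0] * horizon
    for name, start in schedule:
        for offset, size in enumerate(flows[name]):
            slot = start + offset
            if slot < horizon:          # bytes past the horizon do not count
                load[slot] += size
    return load

# Hypothetical white-list and schedule, for illustration only.
flows = {"A1": [10, 0, 10], "A2": [5, 5]}
load = combined_load(flows, [("A1", 0), ("A2", 1), ("A1", 2)], horizon=5)
throughput = sum(load) / len(load)      # overall bytes per slot
```

A schedule is feasible for capacity C when `max(load) <= C`.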
Problem 4.1 (MAXIMUM THROUGHPUT DATA FLOW MASQUERADING, MTDFM). We are given m
white-listed flows $A_1, \ldots, A_m$, the maximum bandwidth capacity per slot $C$, and a duration $T$. The problem
is to find a transmission schedule maximizing the total amount of data transmitted from slot 1 to slot $T$.
MTDFM is not very stringent on timeliness, and thus its solution is likely not suitable for real-time
types of QoS. However, throughput is probably the most basic QoS parameter, and an algorithm solving
MTDFM will serve as a basis for solving more complex versions of DATA-FLOW MASQUERADING. The next
version has more time-sensitive requirements. We first need to formally define the concept of instantaneous
throughput. Fix a transmission schedule $S$. For any time slot $i$, let $B_i^S$ denote the total amount of data the
schedule sends from slot 1 up to slot $i$. The instantaneous throughput at $i$ is $B_i^S / i$.

For example, for the schedule in Fig. 5, starting from slot 3, the minimum instantaneous throughput is
$$\min\left\{\frac{40}{3}, \frac{60}{4}, \frac{65}{5}, \frac{85}{6}, \frac{90}{7}, \frac{110}{8}, \frac{130}{9}, \frac{135}{10}, \frac{150}{11}\right\} \approx 12.85 \text{ Bps}.$$
The schedule is thus good for any application
requiring at least 12.85 Bps (bytes per slot) instantaneous throughput after an initial period of 2 slots.
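The worst-case instantaneous throughput of the example can be checked with a few lines of code. The per-slot loads below are back-computed from the cumulative totals quoted in the example (10, 20, 40, 60, ..., 150); the function name and the parameter name `grace` (the number of initial slots that are skipped) are assumptions for this sketch.

```python
from itertools import accumulate

def worst_instantaneous_throughput(load, grace):
    """Minimum of B_i / i over slots i = grace+1, ..., len(load) (1-based)."""
    cumulative = list(accumulate(load))   # cumulative[i - 1] equals B_i
    return min(cumulative[i - 1] / i
               for i in range(grace + 1, len(load) + 1))

# Per-slot loads consistent with the cumulative totals of the example.
fig5_load = [10, 10, 20, 20, 5, 20, 5, 20, 20, 5, 15]
worst = worst_instantaneous_throughput(fig5_load, grace=2)
```

The minimum is attained at slot 7, where the cumulative total is 90 bytes.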
Problem 4.2 (MAXIMUM INSTANTANEOUS THROUGHPUT DATA FLOW MASQUERADING, MITDFM).
In addition to the input parameters of MTDFM, we are also given an initial grace period shorter than $T$.
(The grace period captures the initial buffer time in a real-time media transmission application.) The problem is to find a transmission schedule $S$ maximizing the worst-case instantaneous throughput after the
grace period, that is, the minimum of $B_i^S / i$ over all slots $i$ after the grace period and up to $T$.
The above two versions do not restrict the number of concurrent flows that the user is allowed to schedule. While the bandwidth capacity C will certainly limit the number of concurrent flows, in practice this
limit might be too high. In reality, the number of concurrent TCP connections per host is bounded by the
kernel, or the user might still want to further limit the number of concurrent TCP connections to avoid performance decay. The ISP's classifier might also put a threshold on the number of concurrent connections.
The next two versions are the concurrency-bounded versions of the previous two.
Problem 4.3 (k-MTDFM and k-MITDFM). These two problems are the same as their corresponding versions
above, with an additional requirement that the maximum number of scheduled flows spanning any time slot
is at most k.
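The concurrency requirement can be checked for a given schedule as sketched below. The function name `max_concurrency`, the 0-based slots, and the example schedule are hypothetical; a flow is taken to span every slot from its start to its last slot, including slots in which it sends no bytes.

```python
def max_concurrency(flow_lengths, schedule):
    """Largest number of scheduled flows spanning any single slot.

    flow_lengths[i]: length of flow i in slots.
    schedule: list of (flow_index, start_slot), 0-based.
    """
    if not schedule:
        return 0
    horizon = max(start + flow_lengths[i] for i, start in schedule)
    spanning = [0] * horizon
    for i, start in schedule:
        for slot in range(start, start + flow_lengths[i]):
            spanning[slot] += 1
    return max(spanning)

# Hypothetical schedule: three flows overlap in slot 2.
k_needed = max_concurrency([3, 2], [(0, 0), (1, 1), (0, 2)])
```

A schedule is feasible for the k-bounded versions exactly when this value is at most k.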
For example, the schedule shown in Fig. 5 is not feasible for 2-MTDFM and 2-MITDFM, because slots 6
to 8 are spanned by 3 flows. All of the above problems are of the multidimensional bin-packing/knapsack
type [49]. Hence, it should not be surprising that they are NP-hard. In fact, we can prove that all four
of them are strongly NP-hard, i.e., they remain NP-hard even when all numerical values are polynomially
bounded in the encoding size [39]. Note also that strongly NP-hard problems do not have fully polynomial-time approximation schemes (FPTAS) [39].
Theorem 4.4. The four problems MTDFM, MITDFM, k-MTDFM, and k-MITDFM are strongly NP-hard. In
particular, there is no fully polynomial-time approximation scheme for any of them unless P = NP.
Problem 4.5. Design approximation algorithms and prove stronger hardness of approximation results for
the four problems MTDFM, MITDFM, k-MTDFM, and k-MITDFM.

4.2 Integer Program Formulations and Approximation Heuristics

Although the above four problems are hard to solve optimally, it is easy to write down equivalent (mixed)
integer program formulations for them. We will also design several approximation heuristics and demonstrate that the heuristics (based on the integer programs) can achieve very high quality solutions.
For each $i \in [m]$ and $t \in [T]$, let $x_{it}$ denote the number of copies of flow $A_i$ which are scheduled to start
at time slot $t$. The following integer program is equivalent to MTDFM.

$$
\begin{aligned}
\max \quad & \sum_{i=1}^{m} \sum_{t=1}^{T} \left( \sum_{l=1}^{\min\{l_i,\, T-t+1\}} A_i[l] \right) x_{it} \\
\text{s.t.} \quad & \sum_{i=1}^{m} \sum_{j=\max\{1,\, t-l_i+1\}}^{\min\{t,\, T\}} A_i[t-j+1]\, x_{ij} \le C, \quad \forall t \in [T+n-1] \\
& x_{it} \in \mathbb{Z}_{+}, \quad \forall i, t
\end{aligned}
\qquad (1)
$$
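The objective and the capacity constraints of the integer program can be evaluated for a candidate solution as follows. This is a checking sketch, not a solver; the function name `objective_and_loads`, the dictionary encoding of the variables, and the 0-based indices (start slot t = 0 corresponds to slot 1 in the text) are assumptions.

```python
def objective_and_loads(flows, x, T):
    """Evaluate the MTDFM objective and per-slot loads of a solution.

    flows: list of per-slot byte arrays (flow i is flows[i]).
    x: dict {(i, t): copies of flow i started at 0-based slot t}, t < T.
    Returns (objective, load) where load has T + n - 1 entries and n is
    the maximum flow length.
    """
    n = max(len(a) for a in flows)
    load = [0] * (T + n - 1)
    objective = 0
    for (i, t), copies in x.items():
        a = flows[i]
        # Only bytes sent in slots t, ..., T - 1 count toward the objective.
        objective += copies * sum(a[:min(len(a), T - t)])
        for offset, size in enumerate(a):
            load[t + offset] += copies * size
    return objective, load

# Hypothetical instance for illustration.
obj, load = objective_and_loads([[10, 0, 10], [5, 5]],
                                {(0, 0): 1, (0, 1): 1, (1, 2): 2}, T=3)
```

The solution satisfies the capacity constraints for a given C exactly when `max(load) <= C`.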

For the bounded-concurrency version k-MTDFM, we simply add a constraint stating that for each time
slot $t \in [T]$, the number of flows spanning slot $t$ is at most $k$. For the MITDFM problem, in addition
to the slot capacity constraints, we also add a constraint saying that the instantaneous throughput at slot $t$ is at
least $z$, for each slot $t$ after the grace period and up to $T$, where $z$ is a variable; the program's objective is then to maximize
$z$. For k-MITDFM, we add the bounded-concurrency constraints to the integer program for MITDFM.
The integer program for MTDFM is a packing integer program (PIP). Randomized rounding for this PIP
has approximation ratio O( T) [94]. The greedy algorithm in [59] gives a ratio of $\frac{s_{\max}}{s_{\min}} n$, where $s_{\max}$ (resp.
$s_{\min}$) is the maximum (resp. minimum) packet size in the flow list, and $n$ is the maximum flow size. These
ratios are too large for our problems, partly because they were designed for generic PIPs. The greedy choice
in [59] is also not very natural for our problems. Hence, we chose to implement and evaluate more intuitive
greedy choices in this section.
First-fit. A natural greedy heuristic is to keep filling flows in the slots within $T$ as much as possible, and
as early as possible. Define the weight $w_i(t)$ of a flow $A_i$ at time $t \in [T]$ to be the total number of bytes
scheduled to be sent by the flow before the deadline $T$, if we start $A_i$ at $t$. In the algorithm, we maintain a
capacity vector $c[1, \ldots, T+n-1]$, initialized by setting $c[j] = C$ for all $j$. When a flow is fit in, we
subtract the corresponding amounts from the components of this vector. We say that flow $A_i$ can fit capacity
vector $c$ at slot $t \in [T]$ if $A_i[l] \le c[t+l-1]$ for all $l \in [l_i]$. We pick the heaviest flow that can fit first, breaking
ties by choosing flows that can start as far left as possible.
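One possible reading of the first-fit heuristic is sketched below. It is a naive illustration, not the implementation used in the experiments: the function name `first_fit`, the 0-based slots, the exhaustive rescan of all (flow, start) pairs in every round, and skipping placements of weight zero (to guarantee termination) are all assumptions.

```python
def first_fit(flows, C, T):
    """Greedy schedule: repeatedly place the heaviest flow that fits.

    flows: list of per-slot byte arrays; C: capacity per slot;
    T: number of slots that count. Returns (schedule, remaining_capacity),
    where schedule is a list of (flow_index, start_slot), 0-based.
    """
    n = max(len(a) for a in flows)
    cap = [C] * (T + n - 1)
    schedule = []

    def fits(a, t):
        return all(size <= cap[t + l] for l, size in enumerate(a))

    while True:
        best = None
        for i, a in enumerate(flows):
            for t in range(T):
                w = sum(a[:T - t])            # weight w_i(t)
                if w > 0 and fits(a, t):
                    key = (-w, t)             # heaviest first, then leftmost
                    if best is None or key < best[0]:
                        best = (key, i, t)
        if best is None:
            return schedule, cap
        _, i, t = best
        for l, size in enumerate(flows[i]):
            cap[t + l] -= size
        schedule.append((i, t))

# Hypothetical instance for illustration.
ff_schedule, ff_cap = first_fit([[6, 0, 6], [4, 4]], C=10, T=4)
```

On this small instance the heuristic fills all four slots within T to capacity.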
Greedy heuristic. In the MITDFM problem, we would like to fit flows as heavy as possible and as early
as possible. To further push packed flows to the left, we take into account the right edge of the flows
in the selection process. For each flow $A_i$ and time $t \le T$, define the right edge of flow $A_i$ at time $t$ to
be $\text{right-edge}(i, t) := \min(t + l_i - 1, T)$. We will choose flows to fit in the capacity vector in descending
order of $\frac{w_i(t)}{\text{right-edge}(i,t)}$. For the concurrency-bounded versions, we can maintain an additional concurrency
array $r = r[1, \ldots, T+n-1]$, where $r[t]$ counts the number of scheduled flows spanning time slot $t$, and
straightforwardly modify the greedy heuristics to take $r$ into account.
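The greedy rule and the concurrency array can be combined as in the following sketch. As before, this is one plausible reading rather than the evaluated implementation: the function name `greedy_right_edge`, the 0-based slots, the tie-breaking by earliest start, and the optional parameter `k` are assumptions.

```python
def greedy_right_edge(flows, C, T, k=None):
    """Place flows in descending order of w_i(t) / right-edge(i, t).

    k: optional bound on the number of flows spanning any slot.
    Returns (schedule, remaining_capacity, concurrency_array).
    """
    n = max(len(a) for a in flows)
    cap = [C] * (T + n - 1)
    r = [0] * (T + n - 1)        # flows spanning each slot
    schedule = []

    def fits(a, t):
        for l, size in enumerate(a):
            if size > cap[t + l]:
                return False
            if k is not None and r[t + l] >= k:
                return False
        return True

    while True:
        best = None
        for i, a in enumerate(flows):
            for t in range(T):
                w = sum(a[:T - t])
                if w == 0 or not fits(a, t):
                    continue
                # With 0-based t this equals the 1-based min(t + l_i - 1, T).
                right_edge = min(t + len(a), T)
                key = (-w / right_edge, t)
                if best is None or key < best[0]:
                    best = (key, i, t)
        if best is None:
            return schedule, cap, r
        _, i, t = best
        for l, size in enumerate(flows[i]):
            cap[t + l] -= size
            r[t + l] += 1
        schedule.append((i, t))

# Hypothetical instance, without and with a concurrency bound.
gr_schedule, gr_cap, _ = greedy_right_edge([[6, 0, 6], [4, 4]], C=10, T=4)
k1_schedule, _, k1_r = greedy_right_edge([[6, 0, 6], [4, 4]], C=10, T=4, k=1)
```

With k = 1 no two flows may overlap, so far fewer bytes can be placed within T, which mirrors the effect of the concurrency bound discussed later.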
Relaxation and rounding. The above two combinatorial heuristics do not take advantage of the global
structure of the problem. The integer programs do, at the cost of exponential running time. We can take
advantage of both methods by first solving the LP-relaxation of the integer programs (which takes polynomial time) and rounding the variables down to get a partial solution. Then, we run one of the greedy algorithms on the
residual problem, trying to regain the loss imposed by the conservative rounding procedure. Specifically, let
$x$ denote the optimal solution to the LP-relaxation of the corresponding integer program formulated earlier.
We construct an ILP-feasible solution $x^{\lfloor LP \rfloor}$ by assigning $x^{\lfloor LP \rfloor}_{it} = \lfloor x_{it} \rfloor$. We then amend this solution by
solving the residual problem.
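The rounding step and the residual capacities handed to the greedy phase can be sketched as follows. The LP solve itself is not shown; the fractional solution in the example is made up, and the function name `round_down_and_residual` and the 0-based dictionary encoding are assumptions.

```python
import math

def round_down_and_residual(flows, x_fractional, C, T):
    """Floor a fractional LP solution and compute the residual capacities.

    x_fractional: dict {(i, t): fractional copies of flow i started at
    0-based slot t}. Rounding down keeps the packing constraints satisfied.
    Returns (x_integer, residual_capacity).
    """
    n = max(len(a) for a in flows)
    residual = [C] * (T + n - 1)
    x_integer = {}
    for (i, t), value in x_fractional.items():
        copies = math.floor(value)
        if copies > 0:
            x_integer[(i, t)] = copies
            for l, size in enumerate(flows[i]):
                residual[t + l] -= copies * size
    return x_integer, residual

# Made-up fractional solution, for illustration only.
x_int, residual = round_down_and_residual(
    [[6, 0, 6], [4, 4]], {(0, 0): 1.5, (1, 1): 0.7, (1, 2): 2.0}, C=20, T=4)
```

A greedy heuristic would then be started from `residual` instead of a full capacity vector.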

4.3 Evaluating the heuristics

There are 5 heuristics to be evaluated: LP-relaxation (LP, which serves as a very crude upper bound), first-fit
(FF), greedy (GR), LP-relaxation, rounding, plus FF (RR-FF), and LP-relaxation, rounding, plus GR (RR-GR). White-list databases, each consisting of 1000 flows, are sampled from the traces described in Section 3.
We have to sample a white-list for each set of experiments because one single white-list may not satisfy the
parameters of all the experiments. We evaluate each heuristic by varying the following major parameters
that might affect the outcome: (1) 4 different bandwidth capacities C of 1.544 Mbps (for a T1 connection), 5
Mbps (for a typical DSL connection), 8 Mbps (for a typical cable modem connection), and 44.736 Mbps
(for a T3 connection), (2) 5 different flow durations T of 30 seconds, 2, 3, 5, and 10 minutes, (3) 3 different
maximum white-listed flow lengths n of 10, 20, and 30 packets, (4) 3 different minimum white-listed flow
lengths of 10, 20, and 30 packets, (5) 3 different initial grace periods of 0.1, 0.2, and 0.5 seconds, (6)
3 different concurrency bounds k of 10, 50, and 100. The time slot length is fixed at 20 ms. Either the overall
throughput or the worst-case instantaneous throughput is used to evaluate the quality of a solution. Overall,
10,720 evaluations were conducted. (This is slightly less than $5^2 \cdot 4 \cdot 3^4 \cdot 2$ because not all combinations are
meaningful.) We only report a small subset of results, because most results follow the same trend.
The integer programs have very large sizes, hence we are not able to compare the heuristics directly with
the optimal objective. Thus, we have to use the LP objective as a crude upper bound. Of course, the optimal
solution is likely much less than the LP's objective, due to the likely large integrality gap. In fact, the
objective of the LP is in most cases extremely close to the maximum capacity C.
Figures 6 and 7 show the effect of varying bandwidth capacities C on the overall and instantaneous throughput. Here we chose T = 2 minutes, the maximum white-listed flow length is 30 packets, and a 500 ms grace
period. Most algorithms perform close to optimal with respect to throughput, regardless of available capacities. There is a marked difference between algorithms with respect to instantaneous throughput, however.
The LP-based algorithms (RR-FF and RR-GR) perform unexpectedly well. However, since they require
solving a large LP, they may not be desirable. The greedy choice of GR was designed with instantaneous throughput in mind, and GR works reasonably well, attaining 75% of the maximum possible.
Figure 6: Effect of varying capacities on throughput
(Plot: throughput (bps) versus capacity (Mbps) for LP, FF, GR, RR-FF, and RR-GR.)

Figure 7: Instantaneous throughput
(Plot: instantaneous throughput (bps) versus capacity (Mbps) for LP, FF, GR, RR-FF, and RR-GR.)

Figures 8 and 9 show the effect of varying the duration T. Here we fix C = 8 Mbps, white-listed flows
of 30 packets, and a 500 ms grace period. Most algorithms perform close to optimal with respect to
throughput. It is interesting to note that while GR is consistently better than FF, RR-FF is better than RR-GR. This is likely because the residual problems have only a few places to fit a new flow in, and thus it is
better to fit the heavier flow first regardless of its right edge. In terms of instantaneous throughput, again the
LP-based solutions work well, and RR-GR works reasonably well as before. FF is ill-suited for optimizing
this objective.
Figures 10 and 11 show the effect of limiting the minimum white-list flow length. These experiments
are conducted because there are some flows in the traces which have very short lengths (4-5 packets), which
the heuristics can choose repeatedly to fill up most of the slots, leading to very high quality solutions in the
previous cases. As expected, the performance of all algorithms degrades slowly as the min-flow lengths increase.
Achievable instantaneous throughputs are a little lower than achievable overall throughputs. However, they
are mostly more than half of the available bandwidths, which should be sufficient for many real-time applications. One noticeable fact is that, in terms of throughput, the LP-based heuristics no longer outperform
FF consistently. The grace period of up to 500 ms does not have a significant effect on the performance of
any of the algorithms.

Figure 8: Effect of varying duration T on throughput
(Plot: throughput (bps) versus duration (sec) for LP, FF, GR, RR-FF, and RR-GR.)

Figure 9: Effect of varying duration T on instantaneous throughput. There are missing data points because the LP-solvers took too long to run for large T.
(Plot: instantaneous throughput (bps) versus duration (sec) for LP, FF, GR, RR-FF, and RR-GR.)
Figure 10: Effect of varying min flow length on throughput
(Plot: throughput (bps) versus min flow weight (# pkts) for LP, FF, GR, RR-FF, and RR-GR.)

Figure 11: Effect of varying min flow length on instantaneous throughput
(Plot: instantaneous throughput (bps) versus min flow weight (# pkts) for LP, FF, GR, RR-FF, and RR-GR.)

The concurrency degree k has the largest impact on performance. The first impact is on the running
time of the LP and the LP-based algorithms. In fact, most of the LP-based algorithms cannot terminate
within a reasonable amount of time due to the explosion in the number of constraints for the k-MITDFM
problem, as shown in Fig. 13. The second impact is on the throughput (both types). Although our heuristics
are reasonably close to the LP's objective (which is an overestimate of the optimum), the LP's objective
itself is a lot worse once we restrict the concurrency degree. In all of the experiments where we do not restrict
the concurrency degree, we were able to achieve near optimality with a concurrency of about 500 flows.
There are two lessons we can learn from this fact. From the classifier's point of view, the number of concurrent
flows from a particular machine is probably a good feature. From the user's point of view, evading the
classifier under our (very strict) re-playing model might require distributing the connections across multiple
machines.

Figure 12: Effect of concurrency degree k on throughput. The performance dip at k = 100 is due to the
difference in white-listed flow sets.
(Plot: throughput (bps) versus max concurrent flows (# flows).)

Figure 13: Effect of concurrency degree k on instantaneous throughput. The LP-based solutions take
too long to finish and thus are not reported. Here, C = 8 Mbps.
(Plot: instantaneous throughput (bps) versus max concurrent flows (# flows).)

Prior NSF Support

PI Ngo was the PI of an NSF CAREER Award entitled "Designs and Analysis of WDM Switching Architectures" (CCF-0347565, Feb 2004 - Feb 2009, $409K). The project aims to provide a theoretical framework
to characterize the complexity of WDM switching networks under different traffic models and non-blocking
requirements. Then, practical and cost-effective constructions of non-blocking switching networks are to
be designed. The third objective is devising efficient routing algorithms on the new architectures. Since
February 2004, the project has produced 13 journal papers (including papers in IEEE/ACM Transactions on
Networking and SIAM J. Comput.), and 30 conference papers (including INFOCOM, GLOBECOM, DSN,
SODA, RAID), among other workshop papers and book chapters. Notable results include optimal and near
optimal upper and lower bounds on the complexity of multi-wavelength switching networks and multi-rate
switching networks. We resolved a 17-year-old open problem in the complexity of multirate distribution networks. Five Ph.D. students were partially supported by the grant, four of whom have successfully defended
their dissertations and are working in Silicon Valley.

Timeline and Dissemination Plan

Year 1: we will focus on addressing Problems 2.1, 2.2, 2.5, 2.6, and 2.7, which have to do with pinpointing the best SML-classifiers to evade, and with identifying the effect of network noise on the classification performance. To do this, we will set up a small experimental local area network testbed, and also
connect it to PlanetLab. Furthermore, we will download and analyze newer network traces than those in
Section 3. Year 2: knowing which classification models are worthy of evasion, we will work on Problems 2.3, 2.4, 2.5, and 2.8. These problems concern evading the classifiers taking into account network noise
models. Year 3: we study the error-tolerant and overhead questions (Problems 2.9, 2.10), and explore the
highly open-ended Problem 2.11.
We will disseminate the results from this project through web pages, seminars, invited talks, conference
presentations and journal publications. We believe that this project will stimulate further research on data
flow classification and machine learning, at other institutions and in the industry at large. One graduate student
is to be employed to work on the project.

Curriculum Development Activities

The research materials and results shall also be integrated into course materials for CSE 489/589 (Modern Networking Concepts), CSE 711 (Computational Learning Theory), and CSE 694 (Probabilistic Analysis and Randomized Algorithms). These are courses taught yearly by the PI at SUNY Buffalo. The classifier
evasion problems fit very well into the overall framework of modern statistical learning theory (CSE 711),
especially with noise-tolerant and statistical query learning models. Networking aspects of the proposed
research belong to the networking course (CSE 589), and many of the analytical techniques can be taught in
CSE 694.

E. References Cited
[1] L7 packet filter. http://l7-filter.sourceforge.net/.
[2] Open dpi. http://opendpi.org/opendpi.org/index.html.
[3] Statistical protocol identification. http://sourceforge.net/projects/spid/.
[4] Panama cracks down on net telephony. http://news.cnet.com/2100-1033-965073.html, Nov 2002.
[5] Clearwire blocks competitive voice offering. http://www.vonage-forum.com/article1744.html, Mar 2005.
[6] Mexico phone operator telmex blocking voip traffic and websites. http://www.vonage-forum.com/article1795.html, Mar 2005.
[7] Trouble in the tropics: Belize telco accused of blocking voip. http://www.zdnet.com/blog/ip-telephony/trouble-in-the-tropics-belize-telco-accused-of-blocking-voip/1048, Apr 2006.
[8] Trouble on the line. http://www.guardian.co.uk/technology/2006/apr/06/voip.telephony, Apr 2006.
[9] Comcast blocks some internet traffic. http://www.msnbc.msn.com/id/21376597/, Oct 2007.
[10] Att, comcast face new web rules as agency sets vote. http://www.bloomberg.com/news/
2010-12-01/net-neutrality-vote-by-u-s-fcc-set-for-december-after-year-of-conf
html, Dec 2010.
[11] Court rules for comcast over fcc in net neutrality case. http://www.washingtonpost.com/wp-dyn/content/article/2010/04/06/AR2010040600742.html, Apr 2010.
[12] How comcast became a toll-collecting, nuke-wielding hydra. http://arstechnica.com/tech-policy/news/2010/11/how-comcast-became-a-toll-collecting-hydra-with-a-nuke.ars, Dec 2010.
[13] Is net neutrality dead? http://m.news.com/2166-12_3-20001886-266.html, Apr 2010.
[14] Network neutrality and protocol discrimination. http://www.computerworld.com/s/article/9197721/Network_Neutrality_and_Protocol_Discrimination?taxonomyId=154, Nov 2010.
[15] Ahmed, T., Coates, M., and Lakhina, A. Multivariate online anomaly detection using kernel recursive least squares. In INFOCOM (2007), pp. 625-633.
[16] Arackaparambil, C., Bratus, S., Brody, J., and Shubina, A. Entropy based worm and anomaly detection in fast ip networks. In Proceedings of IEEE Workshop on Scalable Stream Processing Systems (SSPS) (2010).
[17] Barreno, M., Bartlett, P. L., Chi, F. J., Joseph, A. D., Nelson, B., Rubinstein, B. I. P., Saini, U., and Tygar, J. D. Open problems in the security of learning. In AISec (2008), pp. 19-26.
[18] Barreno, M., Nelson, B., Sears, R., Joseph, A. D., and Tygar, J. D. Can machine learning be secure? In ASIACCS (2006), pp. 16-25.
[19] Berger, T., Mehravari, N., Towsley, D., and Wolf, J. Random multiple-access communications and group testing. IEEE Trans. Commun. 32, 7 (1984), 769-779.
[20] Bernaille, L., Teixeira, R., and Salamatian, K. Early application identification. In CoNEXT '06 (2006).
[21] Biondi, P., and Desclaux, F. Silver needle in the skype. In Black Hat Europe (2006).
[22] Blumenthal, M. S., and Clark, D. D. Rethinking the design of the internet: the end-to-end arguments vs. the brave new world. ACM Trans. Internet Techn. 1, 1 (2001), 70-109.
[23] Bonfiglio, D., Mellia, M., Meo, M., Ritacca, N., and Rossi, D. Tracking down skype traffic. In INFOCOM (2008), pp. 261-265.
[24] Bonfiglio, D., Mellia, M., Meo, M., Rossi, D., and Tofanelli, P. Revealing skype traffic: when randomness plays with you. In SIGCOMM (2007), pp. 37-48.
[25] Branch, P. A., Heyde, A., and Armitage, G. J. Rapid identification of skype traffic flows. In NOSSDAV '09 (2009), ACM, pp. 91-96.
[26] Byers, J. W., Luby, M., Mitzenmacher, M., and Rege, A. A digital fountain approach to reliable distribution of bulk data. SIGCOMM Comput. Commun. Rev. 28 (October 1998), 56-67.
[27] Chakrabarti, A., Ba, K. D., and Muthukrishnan, S. Estimating entropy and entropy norm on data streams. Internet Mathematics 3, 1 (2006).
[28] Chakrabarti, A., Cormode, G., and McGregor, A. A near-optimal algorithm for estimating the entropy of a stream. ACM Transactions on Algorithms 6, 3 (2010).
[29] Chinchani, R., Ha, D., Iyer, A., Ngo, H. Q., and Upadhyaya, S. Insider threat assessment: model, analysis, and tool. In Network Security. Springer, New York, 2006. To Appear.
[30] Chinchani, R., Ha, D. T., Iyer, A., Ngo, H. Q., and Upadhyaya, S. J. On the hardness of approximating the min-hack problem. J. Comb. Optim. 9, 3 (2005), 295-311.
[31] Chinchani, R., Iyer, A., Ngo, H. Q., and Upadhyaya, S. J. Towards a theory of insider threat assessment. In DSN (2005), pp. 108-117.
[32] Chun, B., Culler, D., Roscoe, T., Bavier, A., Peterson, L., Wawrzoniak, M., and Bowman, M. PlanetLab: An Overlay Testbed for Broad-Coverage Services. ACM SIGCOMM Computer Communication Review 33, 3 (July 2003), 0000.
[33] Colbourn, C. J., Dinitz, J. H., and Stinson, D. R. Applications of combinatorial designs to communications, cryptography, and networking. In Surveys in combinatorics, 1999 (Canterbury). Cambridge Univ. Press, Cambridge, 1999, pp. 37-100.

[34] D U , D.-Z., AND H WANG , F. K. Combinatorial group testing and its applications, second ed., vol. 12
of Series on Applied Mathematics. World Scientific Publishing Co. Inc., River Edge, NJ, 2000.
[35] D U , D.-Z., AND N GO , H. Q., Eds. Switching Networks: Recent Advances. Network Theory and
Applications, 5. Kluwer Academic Publishers, Dordrecht, The Netherlands, 2001.
[36] DYACHKOV, A., H WANG , F., M ACULA , A., V ILENKIN , P., AND W ENG , C. W. A construction of
pooling designs with some happy surprises. Journal of Computational Biology 12, 8 (2005), 1129
1136.
[37] F ELTEN , E. Nuts and bolts of network neutrality, Jul 2006.
[38] F ELTEN , E. Three flavors of net neutrality, Dec 2008.
[39] G AREY, M. R., AND J OHNSON , D. S. Computers and intractability. W. H. Freeman and Co.,
San Francisco, Calif., 1979. A guide to the theory of NP-completeness, A Series of Books in the
Mathematical Sciences.
[40] G OODRICH , M. T., ATALLAH , M. J., AND TAMASSIA , R. Indexing information for data forensics.
In Third International Conference on Applied Cryptography and Network Security (ANCS) (2005),
pp. 206221.
[41] H A , D., U PADHYAYA , S., N GO , H. Q., P RAMANIK , S., C HINCHANI , R., AND M ATHEW, S. Insider
threat analysis using information-centric modeling. In Advances in Digital Forensics III, P. Craiger
and S. Shenoi, Eds. Springer, Boston, 2007.
[42] Ha, D. T., and Ngo, H. Q. On the trade-off between speed and resiliency of Flash worms and
similar malcodes. In Proceedings of The 5th ACM Workshop on Recurring Malcode (WORM 2007), in
association with the 14th ACM Conference on Computer and Communications Security (CCS 2007)
(Oct 29–Nov 02 2007), ACM.
[43] Ha, D. T., and Ngo, H. Q. On the trade-off between speed and resiliency of flash worms and
similar malcodes. Journal in Computer Virology 5, 4 (2009), 309–320.
[44] Ha, D. T., Ngo, H. Q., and Chandrasekaran, M. Crestbot: A new family of resilient botnets.
In GLOBECOM (2008), pp. 2148–2153.
[45] Ha, D. T., Upadhyaya, S. J., Ngo, H. Q., Pramanik, S., Chinchani, R., and Mathew,
S. Insider threat analysis using information-centric modeling. In IFIP Int. Conf. Digital Forensics
(2007), pp. 55–73.
[46] Ha, D. T., Yan, G., Eidenbenz, S., and Ngo, H. Q. On the effectiveness of structural detection
and defense against p2p-based botnets. In DSN (2009), pp. 297–306.
[47] Haffner, P., Sen, S., Spatscheck, O., and Wang, D. Acas: automated construction of
application signatures. In MineNet '05: Proceedings of the 2005 ACM SIGCOMM workshop on Mining
network data (New York, NY, USA, 2005), ACM, pp. 197–202.
[48] Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning: Data
Mining, Inference, and Prediction, corrected ed. Springer, August 2003.
[49] Hochbaum, D. S., Ed. Approximation Algorithms for NP-Hard Problems. PWS Publishing
Company, Boston, MA, 1997.

[50] Huang, L., Nguyen, X., Garofalakis, M. N., Hellerstein, J. M., Jordan, M. I., Joseph,
A. D., and Taft, N. Communication-efficient online detection of network-wide anomalies. In
INFOCOM (2007), pp. 134–142.
[51] Huang, L., Nguyen, X., Garofalakis, M. N., Jordan, M. I., Joseph, A. D., and Taft,
N. In-network pca and anomaly detection. In NIPS (2006), pp. 617–624.
[52] Huang, Y., Feamster, N., Lakhina, A., and Xu, J. J. Diagnosing network disruptions with
network-wide analysis. In SIGMETRICS (2007), pp. 61–72.
[53] Indyk, P., Ngo, H. Q., and Rudra, A. Efficiently decodable non-adaptive group testing. In
Proceedings of the Twenty First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA2010)
(New York, 2010), ACM.
[54] Joch, A. Debating net neutrality. Commun. ACM 52, 10 (2009), 14–15.
[55] Karagiannis, T., Broido, A., Brownlee, N., Claffy, K., and Faloutsos, M. Is P2P
dying or just hiding? In Globecom 2004 (Dec 2004).
[56] Karagiannis, T., Broido, A., Faloutsos, M., and Claffy, K. Transport layer identification
of p2p traffic. In IMC '04 (2004), ACM, pp. 121–134.
[57] Khattab, S. M., Gobriel, S., Melhem, R. G., and Mossé, D. Live baiting for service-level
dos attackers. In INFOCOM (2008), pp. 171–175.
[58] Kim, H., Claffy, K., Fomenkov, M., Barman, D., Faloutsos, M., and Lee, K. Internet
traffic classification demystified: myths, caveats, and the best practices. In CoNEXT '08 (2008),
ACM, pp. 1–12.
[59] Krysta, P. Greedy approximation via duality for packing, combinatorial auctions and routing. In
MFCS (2005), pp. 615–627.
[60] Lakhina, A., Crovella, M., and Diot, C. Mining anomalies using traffic feature distributions.
In SIGCOMM (2005), pp. 217–228.
[61] Leibowitz, N., Ripeanu, M., and Wierzbicki, A. Deconstructing the kazaa network. In
WIAPP '03 (2003), p. 112.
[62] Li, W., and Moore, A. W. A machine learning approach for efficient traffic classification. In
MASCOTS (2007), pp. 310–317.
[63] Logg, C., and Cottrell, L. Characterization of the traffic between slac and the internet.
http://www.slac.stanford.edu/comp/net/slac-netflow/html/slac-netflow.html, July 2003.
[64] Mathew, S., Petropoulos, M., Ngo, H. Q., and Upadhyaya, S. J. A data-centric approach
to insider attack detection in database systems. In RAID (2010), pp. 382–401.

[65] Meir, R., and Rätsch, G. An introduction to boosting and leveraging. In Machine Learning
Summer School (2002), pp. 118–183.
[66] Moore, A. W., and Papagiannaki, K. Toward the accurate identification of network
applications. In PAM (2005), pp. 41–54.

[67] Moore, A. W., and Zuev, D. Internet traffic classification using bayesian analysis techniques. In
SIGMETRICS '05 (2005), ACM, pp. 50–60.
[68] Moore, D., Keys, K., Koga, R., Lagache, E., and Claffy, K. C. The coralreef software
suite as a tool for system and network administrators. In Proceedings of LISA '01, pp. 133–144.
[69] Muthukrishnan, S. Data streams: algorithms and applications. Foundations and Trends in
Theoretical Computer Science 1, 2 (2005).
[70] Ngo, H. Q. A new routing algorithm for multirate rearrangeable Clos networks. Theoret. Comput.
Sci. 290, 3 (2003), 2157–2167.
[71] Ngo, H. Q. WDM switching networks, rearrangeable and nonblocking [w, f]-connectors. SIAM
Journal on Computing 35, 3 (2005-2006), 766–785.
[72] Ngo, H. Q. WDM switching networks: complexity and constructions. In Combinatorial
Optimization in Communication Networks, D.-Z. Du, M. Cheng, and Y. Li, Eds., vol. 18 of Combinatorial
Optimization. Springer, New York, 2006, pp. 395–426.
[73] Ngo, H. Q. On a hyperplane arrangement problem and tighter analysis of an error-tolerant pooling
design. J. Comb. Optim. 15, 1 (2008), 61–76.
[74] Ngo, H. Q., and Du, D.-Z. A survey on combinatorial group testing algorithms with
applications to DNA library screening. In Discrete mathematical problems with medical applications (New
Brunswick, NJ, 1999), vol. 55 of DIMACS Ser. Discrete Math. Theoret. Comput. Sci. Amer. Math.
Soc., Providence, RI, 2000, pp. 171–182.
[75] Ngo, H. Q., and Du, D.-Z. Notes on the complexity of switching networks. In Advances in
Switching Networks, D.-Z. Du and H. Q. Ngo, Eds., vol. 5 of Network Theory and Applications.
Kluwer Academic Publishers, 2001, pp. 307–367.
[76] Ngo, H. Q., and Du, D.-Z. New constructions of non-adaptive and error-tolerance pooling designs.
Discrete Math. 243, 1-3 (2002), 161–170.
[77] Ngo, H. Q., Nguyen, T.-N., and Ha, D. T. Crosstalk-free widesense nonblocking multicast
photonic switching networks. In Proceedings of the 2008 IEEE Global Communications Conference
(GLOBECOM) (New Orleans, LA, U.S.A., 2008), IEEE, pp. ????
[78] Ngo, H. Q., Pan, D., and Qiao, C. Nonblocking WDM switches based on arrayed waveguide
grating and limited wavelength conversion. In Proceedings of the 23rd Conference of the IEEE
Communications Society (INFOCOM) (Hong Kong, China, 2004), IEEE.
[79] Ngo, H. Q., Pan, D., and Qiao, C. Constructions and analyses of nonblocking wdm switches
based on arrayed waveguide grating and limited wavelength conversion. IEEE/ACM Transactions on
Networking 14, 1 (2006), 205–217.
[80] Ngo, H. Q., Pan, D., and Yang, Y. Optical switching networks with minimum number of
limited range wavelength converters. In Proceedings of the 24th Annual Joint Conference of the IEEE
Computer and Communications Societies (INFOCOM) (Miami, Florida, U.S.A., March 2005), vol. 2,
IEEE, pp. 1128–1138.
[81] Ngo, H. Q., Pan, D., and Yang, Y. Optical switching networks with minimum number of limited
range wavelength converters. IEEE/ACM Transactions on Networking 15, 4 (2007), 969–979.

[82] Ngo, H. Q., Rudra, A., Le, A. N., and Nguyen, T.-N. Analyzing nonblocking switching
networks using linear programming (duality). In INFOCOM (2010), pp. 2696–2704.
[83] Ngo, H. Q., and Vu, V. H. Multirate rearrangeable Clos networks and a generalized bipartite
graph edge coloring problem. SIAM Journal on Computing 32, 4 (2003), 1040–1049.
[84] Ngo, H. Q., and Vu, V. H. Multirate rearrangeable Clos networks and a generalized bipartite
graph edge coloring problem. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on
Discrete Algorithms (SODA2003, Baltimore, MD) (New York, 2003), ACM, pp. 834–840.
[85] Ngo, H. Q., Wang, Y., and Le, A. A linear programming duality approach to analyzing strictly
nonblocking d-ary multilog networks under general crosstalk constraints. In Proceedings of the 14th
Annual International Computing and Combinatorics Conference (COCOON) (Beijing, China, 2008),
Springer, LNCS, pp. 509–519.
[86] Ngo, H. Q., Wang, Y., and Pan, D. Rearrangeable and nonblocking [w, f]-distributors.
IEEE/ACM Transactions on Networking (2008). Accepted for publication.
[87] Nguyen, T., and Armitage, G. A Survey of Techniques for Internet Traffic Classification using
Machine Learning. IEEE Communications Surveys & Tutorials 10, 4 (2008), 56–76.
[88] Nguyen, T.-N., Ngo, H. Q., and Wang, Y. Strictly nonblocking f-cast photonic switching
networks under general crosstalk constraints. In Proceedings of the 2008 IEEE Global Communications
Conference (GLOBECOM) (New Orleans, LA, U.S.A., 2008), IEEE, pp. ????
[89] Nguyen, T. T. T., and Armitage, G. J. Training on multiple sub-flows to optimise the use of
machine learning classifiers in real-world ip networks. In LCN (2006), pp. 369–376.
[90] Peterson, L., Anderson, T., Culler, D., and Roscoe, T. A Blueprint for Introducing
Disruptive Technology into the Internet. In Proceedings of HotNets-I (Princeton, New Jersey, October
2002).
[91] Roughan, M., Sen, S., Spatscheck, O., and Duffield, N. Class-of-service mapping for qos:
a statistical signature-based approach to ip traffic classification. In IMC '04 (2004), ACM,
pp. 135–148.
[92] Saltzer, J. H., Reed, D. P., and Clark, D. D. End-to-end arguments in system design. ACM
Trans. Comput. Syst. 2 (November 1984), 277–288.
[93] Sen, S., Spatscheck, O., and Wang, D. Accurate, scalable in-network identification of p2p
traffic using application signatures. In WWW '04 (2004), ACM, pp. 512–521.
[94] Srinivasan, A. Improved approximation guarantees for packing and covering integer programs.
SIAM J. Comput. 29, 2 (1999), 648–670.
[95] Szigeti, T., and Hattingh, C. Quality of service design overview. Cisco Press (Dec 2004).
[96] NLANR traces. http://pma.nlanr.net/Special/.
[97] MAWI traces. http://tracer.csl.sony.co.jp/mawi/.

[98] van der Merwe, J., Cáceres, R., hua Chu, Y., and Sreenan, C. mmdump: a tool for
monitoring internet multimedia traffic. SIGCOMM Comput. Commun. Rev. 30, 5 (2000), 48–59.

[99] Wagner, A., and Plattner, B. Entropy based worm and anomaly detection in fast ip networks.
In Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure
for Collaborative Enterprise (Washington, DC, USA, 2005), IEEE Computer Society, pp. 172–177.
[100] Wang, Y., Ngo, H. Q., and Jiang, X. Strictly nonblocking f-cast d-ary multilog networks
under fanout and crosstalk constraints. In Proceedings of the 2008 International Conference on
Communications (ICC) (Beijing, China, 2008), IEEE.
[101] Wang, Y., Ngo, H. Q., and Nguyen, T.-N. Constructions of given-depth and optimal multirate
rearrangeably nonblocking distributors. In Proceedings of the 2007 Workshop on High Performance
Switching and Routing (HPSR) (2007), IEEE.
[102] Williams, N., Zander, S., and Armitage, G. A preliminary performance comparison of five
machine learning algorithms for practical ip traffic flow classification. SIGCOMM Comput. Commun.
Rev. 36, 5 (2006), 5–16.
[103] Wolf, J. K. Born again group testing: multiaccess communications. IEEE Transactions on
Information Theory IT-31 (1985), 185–191.
