A statistical approach to IP-level classification of
network traffic
Manuel Crotti, Francesco Gringoli, Paolo Pelosato, Luca Salgarelli
DEA, University of Brescia, Italy
Abstract— Correct classification of traffic flows according to
the application layer protocols that generated them is essential
for most network-management, resource allocation and intrusion
detection systems in TCP/IP networks. With the ever increasing
number of network protocols and services running on non-standard TCP ports, classification methods based on the analysis
of the transport layer header are rapidly becoming ineffective.
On the other hand, mechanisms based on full payload analysis
are too computationally demanding to be run on most high-bandwidth links. Here we present a novel classification technique
based on the statistical analysis of network traffic performed at
the IP-level. The key idea behind our approach is to build a set of
protocol fingerprints that we believe summarize, in a compact and
efficient way, the main IP-level statistical properties of application
layer protocols. By means of a simple, lightweight algorithm
based on the notion of anomaly scores, also presented in this
paper, an unknown flow can be compared against known protocol
fingerprints, detecting the application that generated the flow.
Our methodology is completely based on IP-level analysis: no
payload analysis or port analysis is required for the classification
of an unknown flow. Besides introducing our approach, we
describe preliminary experimental results that show how this
technique is effective in correctly classifying network traffic in a
real network environment.
Keywords: Traffic classification, traffic measurement.
I. INTRODUCTION
Traffic classification mechanisms belong to the wide set
of tools that help the allocation, control and management of
resources in TCP/IP networks, and improve the reliability of
Network Intrusion Detection Systems (NIDS). An effective
mechanism for the classification of traffic flows according to
the application layer protocols that generated them can suggest
suitable measures to prevent or ease network congestion, to
deploy QoS-aware mechanisms successfully, or to counter
network attacks. Indeed the ability of network operators to
discover and manage new traffic patterns such as widespread
peer-to-peer file sharing protocols could have social and political repercussions [1]. A detailed analysis of how traffic is
distributed among different applications is also a key issue for
any research work based on traffic simulations, since accurate
models of real networks are required.
Different techniques can be used to classify IP traffic. The
simplest method is to identify the application layer service
that generated each flow by its transport level source and
destination ports [2]. Since many services are supposed to run
on “well-known”, standard ports¹, this classification technique
can be successful in some cases. However, nowadays standard
services are frequently run on non-standard ports to circumvent
policy restrictions, e.g., an HTTP server running on port 8080
instead of the canonical 80. Moreover, some increasingly
popular applications, such as peer-to-peer services,
do not even rely on a predefined set of well-known ports. In
the case of protocols tunneled on top of other protocols, such
as chat traffic tunneled on top of HTTP connections, correct
classification of network traffic cannot rely at all on transport
layer headers.
¹ For example, the Simple Mail Transfer Protocol (SMTP) uses TCP port 25.
Another set of techniques, such as the ones present in many
NIDS (e.g., [3], [4]), is based on the detailed analysis of the
captured traffic at different layers, including the application
layer, to classify flows. By means of an exhaustive payload
analysis, these techniques try to discover which application
layer protocol originated each flow. Apart from legal issues
concerning the privacy of end users, the main drawback of
this kind of approach is the computational power needed to
classify the traffic, since the finite state machines that drive
the application layer protocols must be decoded. Therefore
these techniques scale poorly to the capacity of current high-speed networks, limiting their use to lower bandwidth links.
Moreover, even these signature-based classification algorithms
can fail when traffic is tunneled: for example, when HTTP
is used as a transport layer for peer-to-peer traffic, most
signature-based engines would classify such flows as regular
HTTP. In the worst case, payload analysis techniques can
become completely ineffective, for example when end-to-end
encryption mechanisms, such as Transport Layer Security, are
used to protect the payload.
Our approach belongs to yet another class of techniques,
those which try to classify network traffic relying exclusively
on the statistical properties of the flows (see for example [5],
[6]). The key idea behind our work is that the statistical
properties of three basic elements of each network flow, i.e. the
size of the IP packets, their inter-arrival time and their cardinality within the flow, should be sufficient to determine which
application layer protocol generated the flow. We describe our
idea in this paper by providing three research contributions.
We first define the notion of protocol fingerprints, which
express the three statistical properties mentioned above in
a compact and efficient way. We then introduce the notion
of an anomaly score in this context. Our anomaly score
measures how close an unknown traffic flow is to a given
protocol fingerprint. Finally, we introduce a simple algorithm
that classifies unknown flows by checking their anomaly scores
against all known protocol fingerprints.
The key benefits of our approach versus existing techniques
are its light weight and its robustness against the emergence
of traffic generated by new application layer protocols.
Furthermore, it is not based on signature matching and does
not rely in any way on knowledge of port numbers or on transport
layer payloads: this theoretically allows the classification of
traffic flows that are tunneled, or even encrypted. For example,
HTTP sessions tunneling peer-to-peer bulk data should not
be classified as web browsing. A preliminary experimental
application of our technique on traffic traces collected on the
University of Brescia’s Faculty of Engineering network shows
promising results.
The remainder of this paper is organized as follows. In
Section II we briefly describe related work. In Section III
we introduce our classification methodology, showing how we
can derive protocol fingerprints by using statistical analysis at
the IP-level. We also describe our definition of anomaly score
of a given IP traffic flow versus a fingerprint, and introduce
a simple classification algorithm based on such score. In
Section IV we describe preliminary numerical results: given
a set of protocol fingerprints we show how our definition
of anomaly score is effective at classifying traffic flows by
analyzing exclusively their properties at the IP-level. Finally,
Section V concludes the paper.
II. RELATED WORK
The idea of using the statistical properties of network traffic
to classify flows, or at least to describe their behavior, is not
new. Early, pioneering work on Internet traffic characterization
published in [7], [8] focused on the relationship between
the observed statistical properties of flows and the application
protocol that generated them. These works show that analytical
models describing random variables can be suitable to express
the behavior of a few protocols. Such models include observed
lengths, duration and inter-arrival times of different TCP flows.
However, the analysis of inter-arrival statistics shows that
non-homogeneous Poisson models, though they can successfully
describe some user-induced events (e.g. remote shell), do not
capture most of the traffic characteristics. These preliminary
works do not make any attempt to classify flows according to
application layer protocols.
One of the first attempts to classify content-biased traffic [9]
shows how Real Audio flows may be identified among
aggregates. With a simple analysis of packet lengths and
inter-arrival times, the technique described in this work aims at
allowing QoS deployment for audio traffic, independently from
the particular transport protocol used. A similar approach has
been used in [5] to analyze chat traffic. Stemming from the
observation that this kind of traffic is dominated by human
interactions, this work proved the feasibility of identifying chat
flows, whether they use their own transport protocol or are
layered on top of other application protocols like
HTTP. To overcome one of the key issues with statistically
trained classifiers, i.e. the lack of verifiable reference data,
this work was based on the statistical analysis of Internet
Relay Chat traffic traces, since such traffic flows are easily
identifiable even by payload analysis. These works differ
from ours in that they focus exclusively on a single class of
applications (audio traffic and chat traffic, respectively).
In [10], [11] it is shown that traffic pattern similarity
between different application layer protocols can be exploited
to group observed flows into hierarchical clusters. Even if
only a few representative features are taken from each flow
(the number of exchanges between the endpoints, the total
number of bytes, connection duration and so on), the produced
clusters show the feasibility of untrained coarse statistical
traffic classification aimed at the discrimination among
different application classes (e.g., peer-to-peer flows versus remote
login). Although these techniques represent a first attempt
at untrained traffic classification, it is not clear if they could
be applied to the finely grained application layer classification
we are interested in.
Other trained approaches confirm the possibility of
discrimination between different application classes. One of the main
obstacles to the deployment of effective QoS mechanisms in
the Internet remains the lack of a fast and reliable mechanism
to classify flows depending on which kind of application
data they transport (e.g. bulk FTP data versus low jitter
audio packets). The technique presented in [12] focuses on
this problem: in this work it is shown that a useful set of
features allowing this kind of classification can be located at
different levels (single packet, flow, connection and so on)
and they can be successfully exploited by Nearest Neighbor
and Linear Discriminant Analysis algorithms. Another trained
approach for class discrimination among flows has also been
demonstrated with a supervised Bayesian Learning Machine
approach in [13]. Although based on full payload analysis, [14]
also tries to identify classes of traffic, instead of focusing on
the classification of specific application layer protocols. Even
if these approaches could lead to an effective deployment of
QoS-based mechanisms, they would probably not be precise
enough to allow fine-grained application layer protocol
discrimination, e.g. POP versus IMAP.
Finally, [6] is one of the recent works that focus on the
statistical analysis of traffic, and that shows a promising
approach for fine-grained protocol classification. It introduces
a classification method based on the analysis of host behavior,
with the same goals as ours: the classification of flows
according to the applications that generated them without payload
analysis. However, their approach differs considerably from
ours: in the case of [6] the classification is made by associating
a host behavior pattern to one or more applications and
refining the association by means of heuristics and behavior
stratification.
III. OUR CLASSIFICATION METHODOLOGY: APPLICATION
LAYER FINGERPRINTS AT THE NETWORK LAYER
In this paper, we focus on the classification of IP flows
produced by network applications exchanging data through
TCP connections such as HTTP, SMTP, SSH, etc.² These
applications always follow the same basic model of operation:
a pair of client and server endpoints establish a connection by
means of a three-way handshake procedure (connection set
up); they exchange application data through TCP segments
(communication phase); they decide to end the communication
(connection tear down). If something goes wrong this
three-step procedure is usually terminated and the resulting flow
is incomplete. With this basis and referring only to complete
flows, we define flow F as the unidirectional, ordered sequence
of IP packets produced either by the client towards the server,
or by the server towards the client. Each communication will
therefore generate two flows, the client-server flow, composed
of ($N_{client} + 1$) IP packets:
$$F_{client} = \{Pkt_0, Pkt_1, Pkt_2, \ldots, Pkt_{N_{client}}\},$$
where $Pkt_j$ represents the j-th IP packet sent by the client to
the server, and the corresponding server-client flow, composed
of ($N_{server} + 1$) IP packets:
$$F_{server} = \{Pkt_0, Pkt_1, Pkt_2, \ldots, Pkt_{N_{server}}\}.$$
At the IP layer, each flow F can be characterized as an
ordered sequence of N pairs $P_i = \{s_i, \Delta t_i\}$, with $1 \le i \le N$,
where $s_i$ represents the size of $Pkt_i$ and $\Delta t_i$ represents the
inter-arrival time between $Pkt_{i-1}$ and $Pkt_i$. Our study is
based on the tenet that the statistical information contained
in an appropriate number of flows generated according to the
same application layer protocol rules should be enough to
decide whether an unknown flow is in agreement with such
protocol or not. We name such statistical information protocol
fingerprint, and we define it in the remainder of this section.
A. Protocol fingerprint precursors: Probability Density Function vectors
The generation of a given application layer protocol’s
fingerprint starts from the evaluation of a set of L Probability
Density Functions $PDF_i$, estimated from a set of flows (a
training set) generated by the same, known protocol, and
captured by a monitoring device. Here $PDF_i$ is
built on all the i-th packets belonging to those flows that
are at least i packets long. Obviously, as i increases, the
corresponding $PDF_i$ is evaluated on a decreasing number of
flows. Therefore, L is chosen so that each $PDF_i$ is based on a
statistically significant number of flows³. If this holds, $PDF_i$
describes the behavior of the i-th packets on the plane
(packet size s, inter-arrival time $\Delta t$) for a certain protocol.
Variable s is discrete⁴ and assumes values in a range
dimensioned according to the minimum and maximum size of
an IP packet on the type of network interface used to collect
the traces. For example, on an Ethernet link, variable s would
range from 40 to 1500 (bytes).
Variable $\Delta t$ is, instead, sampled with a resolution coherent
with the speed of the network interface used to capture the traffic
traces and with the clock resolution of the capture device, and
binned accordingly. In the case of Tcpdump [15] used on
off-the-shelf hardware, the $PDF_i$ plane can be realistically binned
along the ($\log_{10}$) $\Delta t$-axis from $10^{-7}$ to $10^{3}$ (seconds), with
step 0.01⁵. Each resulting $PDF_i$ matrix in our example above
would be 1461x1001. Finally, if L+1 is the number of packets
of the longer-lived flows used to analyze a certain protocol,
we order the resulting L $PDF_i$s into the Probability Density
Function vector PDF.
B. Anomaly score: from protocol PDFs to protocol fingerprints
In order to classify an unknown traffic flow given a set of
different protocol PDFs we need to check if the behavior of the
flow is statistically compatible with the description given by
at least one of the PDFs; furthermore we also need to choose
which PDF describes it best. If we were able to do this we
could state that the unknown flow belongs to the application
protocol from which that PDF was built.
We are looking for a definition of an anomaly score S
that could describe “how statistically far” an unknown flow
F, composed of a series of packet pairs $P_i$, is from a given
protocol PDF. A basic building block of such anomaly score
is the value that the i-th component of PDF assumes in
$P_i$. In fact, since the value of $PDF_i$ in $(s, \Delta t)$ expresses
the probability that the pair $P_i$ is set to $(s, \Delta t)$, the value
$PDF_i(P_i)$ gives us the correlation between the unknown flow’s
i-th packet and the application layer protocol described by
the specific PDF used: the higher the value, the higher the
probability that the flow was generated by such protocol.
To counter the fact that the random variables $(s, \Delta t)$ are
affected by “noisy” components such as network congestion
and path MTU values, the values of $PDF_i$ in the region close to
$P_i$ should also be allowed to have an impact on the definition
of anomaly score S. To this end, we introduce the concept
of protocol fingerprint M, defined as the vector of matrices $M_i$
resulting from the application of a circular Gaussian filter of
a given radius⁶ to each component of the PDF vector, and
rescaling every resulting matrix so that it still sums to 1.
Figure 1 gives a graphic representation of the first three
components of the fingerprint resulting from the HTTP
protocol $F_{client}$ traces obtained from the training set described
in the following section. It is clear how the behavior of a
packet extracted from an HTTP flow strongly depends on its
cardinality number inside the flow.
² The extension of our work to other kinds of transport layer protocols, such
as the User Datagram Protocol, is left as future work.
³ We will specify the order of magnitude of this number in Section IV.
⁴ As usual when we consider a random variable X assuming values on
a finite size domain $\mathcal{X} = \{x_1, ..., x_N\}$ we implicitly state its distribution
function is everywhere constant apart from a finite number of points that are
a subset of $\mathcal{X}$. We then consider, with a small notation misuse, its probability
density function as being null everywhere except on these points, where it
is positive and equal to the amplitude of the corresponding distribution discontinuity.
⁵ Note that although the binning of the $\Delta t$ axis is done to accept time-stamp
differences of as little as $10^{-7}$ seconds, in our case this is a value that is far
too conservative, since Tcpdump will not be able to reach this kind of accuracy
on stock hardware. However, this fact does not impact the correctness of our
methodology, since the same inaccuracies imputable to Tcpdump are expected
to affect homogeneously the generation of every protocol’s $PDF_i$.
⁶ We will give examples of numerical values of the radius in Section IV.
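The construction described in Sections III-A and III-B (packets to pairs, pairs to binned PDF vectors, PDF vectors to fingerprints) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the sparse dictionary representation, the clamping of out-of-range values to the grid edges, and the Gaussian width `sigma` are all hypothetical choices, since the text specifies only the radius of the circular filter.

```python
import math
from collections import defaultdict

# Grid taken from the Ethernet/Tcpdump example in the text.
S_MIN, S_MAX = 40, 1500                 # packet size s, in bytes
LOG_DT_MIN, LOG_DT_MAX = -7.0, 3.0      # log10 of inter-arrival time (seconds)
LOG_DT_STEP = 0.01
N_ROWS = S_MAX - S_MIN + 1                                         # 1461
N_COLS = int(round((LOG_DT_MAX - LOG_DT_MIN) / LOG_DT_STEP)) + 1   # 1001


def flow_to_pairs(packets):
    """Turn Pkt_0..Pkt_N, given as (timestamp, size), into pairs P_1..P_N."""
    return [(size, t - t_prev)
            for (t_prev, _), (t, size) in zip(packets, packets[1:])]


def bin_pair(size, dt):
    """Map one pair (s, dt) to a (row, col) cell; clamping is an assumption."""
    row = min(max(size, S_MIN), S_MAX) - S_MIN
    log_dt = math.log10(max(dt, 10.0 ** LOG_DT_MIN))
    log_dt = min(log_dt, LOG_DT_MAX)
    col = int(round((log_dt - LOG_DT_MIN) / LOG_DT_STEP))
    return row, col


def estimate_pdfs(flows, length):
    """Estimate PDF_1..PDF_L as sparse dicts {cell: probability}."""
    counts = [defaultdict(int) for _ in range(length)]
    for pairs in flows:
        for i, (size, dt) in enumerate(pairs[:length]):
            counts[i][bin_pair(size, dt)] += 1
    pdfs = []
    for cells in counts:
        total = sum(cells.values())
        pdfs.append({c: n / total for c, n in cells.items()} if total else {})
    return pdfs


def smooth(pdf, radius, sigma):
    """Apply a circular Gaussian kernel to one PDF_i, then rescale to sum 1."""
    if not pdf:
        return {}
    kernel = [(dr, dc, math.exp(-(dr * dr + dc * dc) / (2.0 * sigma * sigma)))
              for dr in range(-radius, radius + 1)
              for dc in range(-radius, radius + 1)
              if dr * dr + dc * dc <= radius * radius]
    out = defaultdict(float)
    for (row, col), p in pdf.items():
        for dr, dc, w in kernel:
            r, c = row + dr, col + dc
            if 0 <= r < N_ROWS and 0 <= c < N_COLS:  # mass off the grid is dropped
                out[(r, c)] += p * w
    total = sum(out.values())
    return {cell: v / total for cell, v in out.items()}


def build_fingerprint(flows, length, radius, sigma):
    """Fingerprint M = [M_1..M_L] from training flows given as lists of pairs."""
    return [smooth(pdf, radius, sigma) for pdf in estimate_pdfs(flows, length)]
```

A sparse representation is used here only because a dense 1461x1001 matrix per component would be mostly zeros for small training sets; a dense array with an off-the-shelf image filter would be an equally valid design.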
Fig. 1. First three components of the HTTP fingerprint, as derived from the
University of Brescia’s $F_{client}$ traffic.
We can now proceed to the actual definition of anomaly
score S. We start by introducing an anomaly score vector A,
whose i-th component $A_i$ is a function of the value $M_i(P_i)$.
The i-th component of vector A is defined as follows:
$$A_i(P_i, M_i) = \frac{1}{\max\left(\varepsilon, M_i(P_i)\right)},$$
where $M_i(P_i)$ is the value of $M_i$ calculated in $P_i$, and $\varepsilon$
is a small positive quantity. We introduce the term $\varepsilon$ to let
the score be always finite, even when $M_i$ is zero in $P_i$⁷. By
construction, the following will hold true, for any value of $P_i$
on the plane $(s, \Delta t)$: $1 \le A_i(P_i, M_i) \le \varepsilon^{-1}$.
Starting from the definition of anomaly score vector A we
can now define the anomaly score S of F versus M as follows:
$$S(F, M) = \frac{\frac{1}{N}\sum_{i=1}^{N} A_i(P_i, M_i) - A_{min}}{A_{max} - A_{min}},$$
where N is the minimum between the number of pairs
composing F and L, and $A_{min}$ and $A_{max}$ are the allowed extreme
values of A as defined above, i.e. 1 and $\varepsilon^{-1}$, respectively. This
implies that $0 \le S(F, M) \le 1$.
C. Classification algorithm
Building on the definitions of flow F, protocol fingerprint M
and anomaly score S(F, M), we can introduce the following
simple classification algorithm.
Given K protocol fingerprints $M^j$, with $1 \le j \le K$, and an
unknown flow F, we state that F was originated by protocol
p if its score versus $M^p$ is lower than for any other $M^j$, with
$j \ne p$, and $1 \le j \le K$.
More precisely, our classification algorithm works as follows:
1) For each protocol fingerprint $M^j$, with $1 \le j \le K$,
calculate $S(F, M^j)$.
2) F was originated by protocol p if
$S(F, M^p) = \min\{S(F, M^1), \ldots, S(F, M^K)\}$.
D. Using the technique in practice
The application of this technique to classify TCP flows on
a real network is relatively simple, and can be summarized in
the following steps:
1) Collect traffic traces on the edge gateway of the network.
This can involve using Tcpdump or any other
traffic-capture mechanism available. These traces will serve as
training set for our classification technique.
2) Pre-classify the traces by means of any effective
mechanism, either payload or header based, such as Snort, Bro,
the techniques proposed in [6], [13], or a combination
of these mechanisms.
3) Using the results of the payload-based pre-classification,
build protocol fingerprints, following the procedures
described in Sections III-A and III-B. Install the
fingerprints on the classification engine.
4) At this point, the (software based) classification engine
can run the algorithm introduced in Section III-C, based
on the protocol fingerprints derived from the previous
phase. This activity can be performed on live traffic.
5) Periodically, if necessary, update the fingerprints by
running steps 1-3 again.
Note that the accuracy of the tools used in step 2 is critical:
it is necessary that the pre-classification of the flows that will
be used to build fingerprints is done with as little error as
possible. For example, when building the HTTP fingerprints
the inclusion of HTTP flows tunneling peer-to-peer bulk data
should be strictly avoided.
Although at this stage the validation of training sets is
widely recognized as a difficult problem to solve [6], different
solutions either exist, or could be implemented in the near
future at least for a subset of the cases. A combination of
different payload-based classifiers could, in many cases, be a
useful tool in this phase: even if computationally inefficient,
the use of such a combination of mechanisms could find its
place here, since the pre-classification phase has to be executed
only seldom, i.e. when the fingerprints have to be created for the
first time, and when they have to be updated as application
layer protocols evolve with time. However, an implementation
of such an effective combination of techniques is not available
to the research community at the time of this writing.
As we will see in the following sections, the
pre-classification of the traffic traces used in the experiments
described in the rest of this paper can be done with a
very simple technique, since in our experimental setup prior
knowledge can be used to infer with 100% certainty the nature
of the flows under consideration. Although this works perfectly
well for obtaining the preliminary validation results that we
report here, in the future a more general approach will need
to be implemented.
⁷ $\varepsilon$ should be an order of magnitude smaller than the smallest non-zero
value of $M_i$.
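The anomaly score of Section III-B and the two-step rule of Section III-C can be sketched in a few lines. This is a hedged illustration rather than the authors' code: it assumes that each fingerprint component is stored as a sparse dictionary mapping a binned (s, Δt) cell to its value, that cells missing from the dictionary count as zero, and that the flow has already been binned; the function names and the toy fingerprints are hypothetical.

```python
def anomaly_score(cells, fingerprint, eps):
    """Anomaly score S(F, M) of a flow, given as binned cells P_1..P_n, versus M."""
    n = min(len(cells), len(fingerprint))   # N = min(number of pairs in F, L)
    if n == 0:
        raise ValueError("flow and fingerprint must both be non-empty")
    a_min, a_max = 1.0, 1.0 / eps
    # A_i = 1 / max(eps, M_i(P_i)); cells absent from the sparse M_i count as 0.
    mean_a = sum(1.0 / max(eps, fingerprint[i].get(cells[i], 0.0))
                 for i in range(n)) / n
    return (mean_a - a_min) / (a_max - a_min)


def classify(cells, fingerprints, eps):
    """Hard-decision rule: return (protocol with the lowest score, all scores)."""
    scores = {name: anomaly_score(cells, m, eps)
              for name, m in fingerprints.items()}
    return min(scores, key=scores.get), scores


# Toy fingerprints with two components each (purely illustrative values).
toy = {
    "proto_a": [{(0, 0): 1.0}, {(1, 1): 1.0}],
    "proto_b": [{(5, 5): 1.0}, {(6, 6): 1.0}],
}
label, scores = classify([(0, 0), (1, 1)], toy, eps=1e-3)
```

In the toy case the flow sits exactly on the mass of `proto_a`, so every component reaches the lower bound of A and the score is at the bottom of the [0, 1] range, while against `proto_b` every component is capped by ε and the score is at the top.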
IV. ANALYSIS AND EXPERIMENTAL RESULTS
A. Testbed setup and protocol fingerprints
In order to preliminarily test the validity of our classification
technique we collected network traces at the edge gateway of
the University of Brescia Engineering Faculty’s data center
network. The data center is composed of a dozen high-end
servers, running a mix of versions of the Linux operating
system, and interconnected by several layer-2 100/1000BaseT
802.3 segments. This network hosts e-mail, web, and shell
accounts for around eight hundred people, including researchers,
administrative staff and students. The edge gateway where the
trace was collected connects the data center to the Internet by
means of a 100 Mb/s 802.3 link, and is implemented with a
Linux-based dual-processor server.
The traces were collected with Tcpdump from the
outgoing interface of the gateway, and included each packet’s
full payload. During the preliminary validation phase of our
approach described in this paper, we focused our attention
on the following protocols: HTTP, SMTP and POP3. The
validation of our technique with other protocols is left to
future work.
We collected traffic traces running Tcpdump for several
hours a day, over a period of time spanning six months. Both
the time and duration of each trace were chosen randomly.
In the end, this phase resulted in traces for over 150 GB of
traffic composed of a variable mix of semantically valid HTTP,
POP3 and SMTP flows. Here, we define as “semantically valid”
the TCP flows that go beyond the initial 3-way-handshake
procedure, i.e. that are characterized by more than two $\Delta t$ values.
We divided the traces into two sets: a training set, amounting
to more than 15000 flows for each of the three protocols we
are considering, used to build protocol fingerprints, and an
evaluation set amounting to more than 6000 flows for each of
the three protocols, used to validate our technique.
Following the procedure described in Section III-D, we
started by creating protocol fingerprints from the training set.
We actually generated several sets of fingerprints, starting from
increasingly large portions of the training set. This is useful
to assess the sensitivity of our technique to the number of
flows used to fingerprint a given protocol. The value used for
the radius in the Gaussian filter (applied to PDFs to derive
protocol fingerprints, see Section III-B) was 21. This value was
chosen empirically, since it gave the best results with the data
available during this preliminary validation phase. The study
of how the radius affects the precision of our technique, whether it
should be a fixed value or fingerprint-dependent,
and whether the Gaussian filter should be parametrized differently
along the two axes $(s, \Delta t)$ is left to future work. We generated
fingerprint vectors composed of twenty elements, which seem
more than enough to statistically characterize even longer-lived
flows.
In this paper, the technique used to pre-classify the training
set (step 2, Section III-D) is very simple: since in this case
the traffic used for the validation was either originated from
or directed to the data center network described above, we can
be sure that by simply observing each flow’s TCP headers we
can detect with certainty its related application layer protocol.
In other words, since we have complete control over the
software run on the network servers⁸, we can be sure that
traffic flowing on a given port is actually valid traffic generated
by its related application layer protocol, otherwise our servers
would not generate semantically correct TCP flows. The same
considerations hold for the procedure we used to obtain
classification data on the evaluation set: by simply examining
the TCP headers of each flow we could, in this case, obtain
data to validate our technique against.
Fig. 2. First three components of the SMTP fingerprint, as derived from the
University of Brescia’s $F_{client}$ traffic.
Fig. 3. First three components of the POP3 fingerprint, as derived from the
University of Brescia’s $F_{client}$ traffic.
Figures 1, 2 and 3 show the first three components of the
fingerprint vectors that we obtained for HTTP, SMTP and
POP3, respectively. Once again, it is clear how the behavior of
packets depends on their cardinality. Furthermore, the different
packet distributions of the three considered protocols are
clearly visible. Note that although the figures for POP3 at this
resolution seem to show fewer differences than the other two
fingerprints, the same kind of cardinality-dependence exhibited
by SMTP and HTTP is clearly visible for POP3 at higher
resolutions.
⁸ Note that, on the contrary, we do not have any control over the clients
which, for all the traces involved in our experiments, are on the Internet.
0.98
0.96
hit ratio
0.94
B. Experimental results
0.9
0.88
By running the methodology explained in Section III, we
classified the flows of the evaluation set. For each flow in the
evaluation set, we compared the result given by our technique
with the actual application layer protocol determined by the
TCP headers as explained in section IV-A.
Protocol
HTTP
SMTP
POP3
0.92
Fserver
99.41%
99.65%
98.47%
http Fserv
smtp Fserv
pop3 Fserv
http Fcli
smtp Fcli
pop3 Fcli
0.86
0.84
0.82
1000
Fclient
97.46%
97.79%
92.79%
2500
5000
7500
number of flows
10000
15000
Fig. 4. Ratio of HTTP, POP3 and SMTP flows from the evaluation set
that were correctly classified vs. number of flows used to build protocol
fingerprints.
TABLE I
P ERCENTAGE OF FLOWS FROM THE EVALUATION SET THAT WERE
in use, allowing the direct application of our classification
methodology to a large portion of the traffic. However, no
matter how many the available fingerprints, it is reasonable to
assume that it will always be possible to encounter a traffic
flow that does not belong to any of them. In such cases our
classification algorithm would fail: it would incorrectly assign
the unknown flow to one of the known fingerprints. In fact,
the proposed algorithm is based on a hard decision rule: given
some fingerprints, the type of an unknown flow is always set
to the “closest” fingerprint as stated by rule 2 of Section IIIC. This approach cannot work for the classification of traffic
which we don’t have a fingerprint for. The problem here is
that the absolute value of the anomaly score is not taken into
account.
To understand how to improve our algorithm we can observe
Table II, where we report the mean anomaly scores of flows
in the evaluation set versus the three available fingerprints.
For flows produced by fingerprinted protocols (i.e., HTTP,
SMTP and POP3), the mean score of traffic flows versus the
available fingerprints is at its minimum for the fingerprint of
the protocol that generated the flows (values in italic), and
this is the factor that our classification algorithm is based on.
Furthermore, the mean score of the flows versus the “correct”
protocol fingerprint is below 0.1 in all cases and the differences
are again more marked for Fserver flows.
We can hence modify our algorithm and include the absolute
score value in the decision mechanism so that a “warning
bell” could ring when a protocol that does not belong to any
available fingerprint is being classified. For example, we could
set a threshold value so that when the smallest score is above
it the flow is classified as “unknown”.
This idea is validated by the last rows of each of Tables II,
which show the mean anomaly score of SSH flows from
the three protocol fingerprints. Contrary to what happens for
fingerprinted protocols, in this case, all mean scores are well
CORRECTLY CLASSIFIED .
Table I presents the main results of this experimental phase.
As shown by the numbers, our technique correctly classifies
the application layer protocol of each flow in excess of at
least 92% of the times, with protocol fingerprints obtained
from 15000 flows. The technique seems to perform better for
Fserver flows than for Fclient ones. This can be justified by
the fact that the statistics of the parameters we are observing
is more deterministic, in our experiment, for Fserver than
for Fclient flows. In fact, while the former are generated
by a relatively uniform set of variables (operating systems,
hardware and network conditions), we can expect much more
heterogeneity for the latter (clients on the Internet). Nevertheless, our technique shows promising results even in classifying
Fclient flows.
As shown in Figure 4, our technique is sensitive to the
number of flows used to build fingerprints: as long as this
number increases we note that, apart from local oscillations,
the hit ratio rises. However it is worth noting that even when
the fingerprints are built on 1000 flows only, the hit ratio is
still above 96% for all three protocols Fserver traffic. Among
the Fclient flows, only the classification of HTTP achieves
relatively lower results, but in any case above the 83% mark.
Once again, this confirms the fact that our technique seems to
perform reasonably well even with traffic generated under a
highly heterogeneous set of conditions, as in the case of traffic
coming from clients on the Internet.
C. Extension to the classification of non-fingerprinted protocols
The application of our technique on a generic TCP/IP
network could, after some time spent collecting traces, produce
fingerprints for the majority of the protocols that are effectively
above the 0.1 mark and only a small set of flows reach an
anomaly score lower than 0.1 (0.6% of flows in the best case
and 6.5% in the worst case). This value seems to be a good
candidate for the proposed threshold and could be used to
improve the robustness of the proposed algorithm with respect to
flows generated by non-fingerprinted protocols.
Fclient       HTTP M    SMTP M
HTTP flows    0.0139    0.2116
SMTP flows    0.4872    0.0376
POP3 flows    0.0740    0.0394
SSH flows     0.5333    0.2701

Fserver       HTTP M    SMTP M
HTTP flows    0.0278    0.5020
SMTP flows    0.4506    0.0050
POP3 flows    0.5100    0.1617
SSH flows     0.5740    0.4118
sets and expand the analysis to other fingerprinted protocols.
Also, several improvements to our algorithm are possible. A
few of them are the correlation of classification information
from Fclient and Fserver , and the integration of different
factors, such as the numerical value of the anomaly score of an
unknown flow versus each fingerprint, in the decision process,
moving away from the hard-decision algorithm presented in
this paper. Finally, Table II shows that the mean anomaly score
is protocol-biased. A preliminary analysis that we recently
conducted indicates that the bias is related to the Gaussian
filter applied to PDFs to obtain protocol fingerprints. We are
investigating the possibility of building each fingerprint with a
different smoothing factor, and of applying a Gaussian filter with
different parameters on the s and ∆t axes, in order to obtain
more precise fingerprints and anomaly score measurements.
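As an illustration of what a Gaussian filter with different parameters on the two axes could look like, the sketch below smooths a two-dimensional PDF with a separable kernel. It is not the filter used to build the fingerprints in this paper: the grid layout (rows indexed by s bins, columns by ∆t bins), the zero padding at the borders, the truncation at three standard deviations, and the names smooth_pdf, sigma_s and sigma_dt are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def gaussian_kernel(sigma: float) -> List[float]:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma.

    A non-positive sigma yields the identity kernel (no smoothing).
    """
    if sigma <= 0:
        return [1.0]
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_1d(values: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Convolve a sequence with a symmetric kernel, zero-padding the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * values[j]
        out.append(acc)
    return out


def smooth_pdf(pdf: List[List[float]], sigma_s: float,
               sigma_dt: float) -> List[List[float]]:
    """Smooth a 2-D PDF with independent widths on the s and delta-t axes.

    Rows are assumed to index s bins and columns delta-t bins. The result
    is renormalized so that it still sums to one.
    """
    k_s, k_dt = gaussian_kernel(sigma_s), gaussian_kernel(sigma_dt)
    rows = [smooth_1d(row, k_dt) for row in pdf]           # along delta-t
    cols = [smooth_1d(col, k_s) for col in zip(*rows)]     # along s
    smoothed = [list(row) for row in zip(*cols)]           # transpose back
    total = sum(map(sum, smoothed))
    if total <= 0:
        raise ValueError("PDF has no probability mass")
    return [[v / total for v in row] for row in smoothed]


# A single spike spreads further along delta-t than along s.
grid = [[0.0] * 9 for _ in range(9)]
grid[4][4] = 1.0
result = smooth_pdf(grid, sigma_s=0.5, sigma_dt=1.5)
print(result[4][5] > result[5][4])  # True
```

Because the kernel is separable, each axis can be tuned independently, and a per-fingerprint smoothing factor amounts to calling smooth_pdf with a different pair of widths for each protocol.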
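Fclient       POP3 M
HTTP flows    0.2059
SMTP flows    0.4305
POP3 flows    0.0014
SSH flows     0.5390

Fserver       POP3 M
HTTP flows    0.4849
SMTP flows    0.1966
POP3 flows    0.0169
SSH flows     0.5314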
REFERENCES
[1] T. Karagiannis, A. Broido, M. Faloutsos, and K. C. Claffy, “Transport
layer identification of P2P traffic,” in IMC’04: Proceedings of the 4th
ACM SIGCOMM conference on Internet measurement, (New York, NY,
USA), pp. 121–134, ACM Press, 2004.
[2] D. Moore, K. Keys, R. Koga, E. Lagache, and K. C. Claffy, “The
CoralReef Software Suite as a Tool for System and Network Administrators,” in LISA ’01: Proceedings of the 15th USENIX conference on
System administration, (Berkeley, CA, USA), pp. 133–144, USENIX
Association, 2001.
[3] V. Paxson, “BRO: a system for detecting network intruders in real-time,”
in Proceedings of the 7th USENIX Security Symposium, (San Antonio,
TX, USA), January 1998.
[4] M. Roesch, “Snort: Lightweight intrusion detection for networks,” in
LISA ’99: Proceedings of the 13th Conference on Systems Administration, (Seattle, Washington, USA), pp. 229–238, 7-12 November 1999.
[5] C. Dewes, A. Wichmann, and A. Feldmann, “An analysis of Internet
chat systems,” in IMC ’03: Proceedings of the 3rd ACM SIGCOMM
conference on Internet measurement, (New York, NY, USA), pp. 51–64,
ACM Press, 2003.
[6] T. Karagiannis, K. Papagiannaki, and M. Faloutsos, “BLINC: multilevel
traffic classification in the dark,” in SIGCOMM’05: Proceedings of the
2005 conference on Applications, technologies, architectures, and protocols for computer communications, (New York, NY, USA), pp. 229–240,
ACM Press, 2005.
[7] V. Paxson, “Empirically derived analytic models of wide-area TCP
connections,” IEEE/ACM Trans. Netw., vol. 2, no. 4, pp. 316–336, 1994.
[8] V. Paxson and S. Floyd, “Wide area traffic: the failure of Poisson
modeling,” IEEE/ACM Trans. Netw., vol. 3, no. 3, pp. 226–244, 1995.
[9] A. Mena and J. Heidemann, “An Empirical Study of Real Audio Traffic,”
in Proceedings of the IEEE Infocom, (Tel-Aviv, Israel), pp. 101–110,
IEEE, March 2000.
[10] F. Hernández-Campos, F. D. Smith, K. Jeffay, and A. B. Nobel, “Statistical Clustering of Internet Communications Patterns,” in Computing
Science and Statistics, vol. 35, July 2003.
[11] A. McGregor, M. Hall, P. Lorier, and J. Brunskill, “Flow Clustering
Using Machine Learning Techniques,” in Proceedings of the Fifth
Passive and Active Measurement Workshop (PAM 2004), Mar. 2004.
[12] M. Roughan, S. Sen, O. Spatscheck, and N. Duffield, “Class-of-service
mapping for QoS: a statistical signature-based approach to IP traffic
classification,” in IMC ’04: Proceedings of the 4th ACM SIGCOMM
conference on Internet measurement, (New York, NY, USA), pp. 135–148,
ACM Press, 2004.
[13] A. W. Moore and D. Zuev, “Internet traffic classification using bayesian
analysis techniques,” in SIGMETRICS ’05: Proceedings of the 2005
ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (New York, NY, USA), pp. 50–60, ACM
Press, 2005.
[14] A. W. Moore and K. Papagiannaki, “Toward the Accurate Identification
of Network Applications,” in Proceedings of the Sixth Passive and Active
Measurement Workshop (PAM 2005), Oct. 2005.
[15] “Tcpdump/Libpcap.” http://www.tcpdump.org.
TABLE II
MEAN VALUES OF ANOMALY SCORES S(F,M) FOR FINGERPRINTED
PROTOCOLS AND FOR SSH.
In other words, while the blind application of our methodology to the classification of, for example, SSH flows based
on HTTP, POP3 and SMTP fingerprints would assign them
(incorrectly) to SMTP traffic, the inclusion of a threshold
value for the anomaly score in our algorithm would signal
that the unknown traffic does not belong to any of
the fingerprinted protocols. We are continuing to study the
applicability of our methodology to non-fingerprinted traffic,
and will report on it in more detail in future papers. In
particular, while the application of a fixed threshold value
represents a first-order approximation of an algorithm capable
of detecting non-fingerprinted protocols, we are studying the
effects of the application of a per-fingerprint threshold, as well
as more refined data derived from the analysis of the properties
of the mean scores.
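One possible reading of a per-fingerprint threshold is sketched below: a fingerprint remains a candidate only if the score of the flow is within that fingerprint's own threshold, and the closest remaining candidate wins. This is one design among several and has not been evaluated in this paper. The function name and the threshold values are hypothetical, chosen only for illustration; the sample scores are mean values from Table II, used as if they belonged to single flows.

```python
from typing import Dict, Optional


def classify_per_fingerprint(scores: Dict[str, float],
                             thresholds: Dict[str, float]) -> Optional[str]:
    """Return the best fingerprint among those within their own threshold.

    A fingerprint is a candidate only if the flow's score against it does
    not exceed that fingerprint's threshold. None means no fingerprint
    matched (non-fingerprinted protocol).
    """
    candidates = {name: score for name, score in scores.items()
                  if score <= thresholds[name]}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical thresholds, NOT derived from our measurements.
thresholds = {"HTTP": 0.10, "SMTP": 0.08, "POP3": 0.05}

pop3_scores = {"HTTP": 0.0740, "SMTP": 0.0394, "POP3": 0.0014}
ssh_scores = {"HTTP": 0.5333, "SMTP": 0.2701, "POP3": 0.5390}

print(classify_per_fingerprint(pop3_scores, thresholds))  # POP3
print(classify_per_fingerprint(ssh_scores, thresholds))   # None
```

Setting every entry of thresholds to the same value recovers the fixed-threshold rule.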
V. CONCLUSIONS AND FUTURE WORK
In this paper we introduced and analyzed a new methodology for the IP-level classification of network traffic. The
main highlight of our technique is the fact that it is based
on the statistical properties of network traffic rather than
on the analysis of its payloads. This means that it is less
computationally intensive than payload-based mechanisms,
making it more scalable to the increasing speed of today’s
networks. Furthermore, it can in principle be extended to the
classification of encrypted traffic.
Experimental tests show promising, albeit preliminary, results with respect to the correctness of our classification
algorithm, and to its sensitivity to a series of parameters.
Comparisons with related work are also encouraging: besides
the novelty of the main idea behind our approach, numerical
results presented in this paper show how this methodology
can potentially surpass the performance of existing traffic-classification mechanisms.
Our work in this area is continuing in several directions. A
first natural step will be to run new tests with different training