
Cluster Analysis for Anomaly Detection
Giuseppe Lieto, Fabio Orsini, and Genoveffa Pagano
System Management, Inc. Italy
[email protected], [email protected],
[email protected]
Abstract. This paper presents a traffic analysis technique aimed at detecting intrusion attempts and information attacks. A traffic classifier aggregates packets into clusters by means of an adapted genetic algorithm. In a network whose traffic is homogeneous over time, clusters do not vary in number or characteristics; in the event of attacks or of the introduction of new applications, the clusters change in number and characteristics. The data processed for the tests are extracted from the DARPA traffic provided by MIT Lincoln Laboratory, which is commonly used to test the effectiveness and efficiency of intrusion detection systems. The target events of the trials are Denial of Service and Reconnaissance. The experimental evidence shows that, even with unrefined input data, the algorithm is able to classify malicious events with fair accuracy.
1 Introduction
Anomaly detection techniques learn from the traffic observed on the network to be protected, so that traffic that is abnormal in quantity or in quality can be detected.
Almost all approaches rely on the anti-intrusion system learning what normal traffic is: anything that differs from normal traffic is considered malicious or suspect.
Normal traffic is classified by analyzing:
• Application-level content, with a textual characterization;
• Whole connections and packet headers, usually with clustering techniques;
• Traffic volume and connection transition frequencies, by modelling users' behaviour at different hours according to the services and applications they use.
1.1 Anomaly Detection with Clustering on Header Data
The most interesting studies concern learning algorithms that work without human supervision. They classify the traffic into different clusters [3], each containing strongly correlated packets. Packets are characterized by their header fields, while the clusters can be created with different algorithms.
All traffic that does not belong to the normal clusters is classified as abnormal; the next step is to distinguish between abnormal and malicious traffic.
Traffic characterization starts with data mining and the creation of multidimensional vectors, called feature vectors, whose components represent the dimensions of each instance. The choice of relevant attributes is crucial for the characterization, and many studies focus on evaluating the best feature selection techniques.
E. Corchado et al. (Eds.): CISIS 2008, ASC 53, pp. 163–169, 2009.
springerlink.com
© Springer-Verlag Berlin Heidelberg 2009
The approach chosen in [4] treats each connection as a single entity containing several packets. In this way it is possible to retrieve from network data information such as the observed domain, the number of hosts, the active applications, and the number of users connected to a host or to a service.
This is the most widely used approach; however, there have also been classification attempts based on raw data.
DoS attack detection is a particularly interesting field, because such attacks produce traffic that is hard to distinguish from legitimate traffic. [1] proposes a defence based on edge routers that create separate queues for malicious packets and for normal traffic. Packets are separated by an anomaly detection algorithm that classifies normal traffic with k-means clustering, based on the observed statistical trend of the traffic.
The experimental results in [1] show that, during a DoS attack, new clusters are created that are denser, more populated and larger. Even a sudden increase in the density of an existing cluster can indicate an ongoing attack, if it is observed together with an immediate variation of the mean values of individual features.
Separate queues shorten the connection time of normal traffic; moreover, without having to drop suspect connections, the bandwidth devoted to DoS traffic decreases (because of the long queues), and so does its impact on resource management.
In our approach, we chose a genetic clustering technique, Unsupervised Niche Clustering (UNC) [2], to classify the network traffic. UNC uses an evolutionary algorithm with a niching strategy, which maintains genetic niches as candidate clusters instead of losing track of small groups of highly distinctive individuals that would become extinct under a traditional evolutionary algorithm.
To analyse the traffic, we applied the algorithm to several groups of individuals, monitoring both the formation of new clusters and the density trend of existing clusters, as in [1]. We also observed the trend in the number of clusters extracted from a fixed number of individuals. Both approaches were tested during normal network activity, and the results were compared with those obtained during a DoS attack or a reconnaissance activity.
2 Description of the Algorithm: Unsupervised Niche Clustering
Unsupervised Niche Clustering searches the solution space for any number of clusters. It maintains dense areas of the solution space using an evolutionary algorithm and a niching technique. As in nature, niches represent subspaces of the environment that support different types of individuals with similar genetic characteristics.
2.1 Encoding Scheme
Each individual represents a candidate cluster and is characterized by its center (a point in n dimensions), a robust measure of its scale, and its fitness.
Table 1. Feature vector of each individual
Genome                          Scale    Fitness
g_i = (g_i1, g_i2, …, g_in)     σ_i²     f_i
2.2 Genetic Operators and Scale
At the very first step, the scale of each individual is assigned empirically: we assume that the i-th individual is the center, that is, the mean value, of a cluster containing all the individuals in the solution space.
At each generation, parents and offspring update their scale recursively, according to equation (1):

$$\sigma_i^2 = \frac{\sum_{j=1}^{N} w_{ij}\, d_{ij}^2}{\sum_{j=1}^{N} w_{ij}} \qquad (1)$$

$$w_{ij} = \exp\left(-\frac{d_{ij}^2}{2\sigma_i^2}\right) \qquad (2)$$

where $w_{ij}$ is a robust weight measuring how much the j-th individual belongs to the i-th cluster, $d_{ij}$ is the Euclidean distance of the j-th individual from the center of the i-th cluster, and $N$ is the number of individuals in the solution space.
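The empirical initialization and one recursive scale update can be sketched in Python as follows. This is a minimal sketch, assuming the UNC forms of the robust weight and weighted scale from [2], $w_{ij} = \exp(-d_{ij}^2 / 2\sigma_i^2)$ and $\sigma_i^2 = \sum_j w_{ij} d_{ij}^2 / \sum_j w_{ij}$; the function names are hypothetical and genomes are plain tuples of numbers.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance d_ij^2 between two genomes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def initial_scale(center, population):
    """Empirical start: the whole population is one cluster whose mean is `center`."""
    return sum(sq_dist(center, p) for p in population) / len(population)

def update_scale(center, population, sigma2):
    """One recursive update: weights from the current scale, then the weighted scale."""
    d2 = [sq_dist(center, p) for p in population]
    w = [math.exp(-d / (2.0 * sigma2)) for d in d2]   # robust weights w_ij
    total = sum(w)
    if total == 0.0:                                   # every weight underflowed: keep old scale
        return sigma2
    return sum(wi * di for wi, di in zip(w, d2)) / total

population = [(0, 0), (1, 0), (0, 1), (10, 10)]
s0 = initial_scale((0, 0), population)        # the outlier (10, 10) inflates the scale
s1 = update_scale((0, 0), population, s0)     # the outlier is down-weighted, so the scale shrinks
```

Here the starting scale is 50.5, almost entirely due to the single distant point; after one update its weight drops to about 0.14 and the scale falls below 10, which is what makes the measure robust.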
At birth, children inherit the scale of their closest parent. Offspring are generated by two genetic operators: crossover and mutation. In our work we applied one-point crossover to each dimension, combining the most significant bits of one parent with the least significant bits of the other. Mutation can flip each bit of the genome with a given probability; we set the mutation probability to 0.001.
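The two operators can be sketched as follows, assuming each dimension of the genome is a non-negative integer of fixed width and that the cut point is drawn uniformly for each dimension. `BITS = 32` is a hypothetical width (the paper does not state it), and the function names are illustrative.

```python
import random

BITS = 32  # hypothetical gene width in bits

def crossover(p1, p2, rng):
    """One-point crossover applied independently to each dimension."""
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        k = rng.randint(1, BITS - 1)       # the k least significant bits are swapped
        low = (1 << k) - 1
        c1.append((a & ~low) | (b & low))  # MSBs of p1, LSBs of p2
        c2.append((b & ~low) | (a & low))  # MSBs of p2, LSBs of p1
    return c1, c2

def mutate(genome, rng, p=0.001):
    """Flip each bit of each gene independently with probability p."""
    out = []
    for g in genome:
        for bit in range(BITS):
            if rng.random() < p:
                g ^= 1 << bit
        out.append(g)
    return out

rng = random.Random(0)
child1, child2 = crossover([0xFFFFFFFF] * 3, [0] * 3, rng)
```

With an all-ones and an all-zeros parent, each child gene is a run of ones on one side of the cut and zeros on the other, so the two children are exact complements.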
The scale given by equation (1) maximizes the fitness value (3) of the i-th cluster.
2.3 Fitness Function
The fitness of the i-th individual is the density of a hypothetical cluster having the i-th individual as its center:

$$f_i = \frac{\sum_{j=1}^{N} w_{ij}}{\sigma_i^2} \qquad (3)$$
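A minimal sketch of this density, assuming the UNC fitness $f_i = \sum_j w_{ij} / \sigma_i^2$ with Gaussian weights as in [2]; the function name is hypothetical.

```python
import math

def fitness(center, population, sigma2):
    """Density of a hypothetical cluster centred at `center` with scale sigma2."""
    w = [math.exp(-sum((c - x) ** 2 for c, x in zip(center, p)) / (2.0 * sigma2))
         for p in population]                       # robust weights w_ij
    return sum(w) / sigma2                          # f_i = sum_j w_ij / sigma_i^2

population = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
dense = fitness((0.5, 0.5), population, 1.0)   # centre of the four nearby points
sparse = fitness((10, 10), population, 1.0)    # the isolated point
```

Dividing by the scale rewards compact clusters: the same total weight spread over a larger scale yields a lower density.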
2.4 Niching
UNC uses Deterministic Crowding (DC) to create and maintain niches.
DC steps are:
1. choose the pair of parents;
2. apply crossover and mutation;
3. calculate the distance from each parent to each child;
4. pair each child with one parent, so that the sum of the two parent-child distances is minimized;
5. in each parent-child pair, the individual with the better fitness survives and the other is discarded from the population.
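Steps 3 to 5 for a single pair of parents can be sketched as below. The function names and the toy fitness are hypothetical; ties are resolved in favour of the parent, a choice the paper does not specify.

```python
import math

def crowding_survivors(p1, p2, c1, c2, fit):
    """Steps 3-5 of Deterministic Crowding for one pair of parents and their two children."""
    d = math.dist
    # Step 4: choose the parent-child pairing with the smaller total distance.
    if d(p1, c1) + d(p2, c2) <= d(p1, c2) + d(p2, c1):
        pairs = [(p1, c1), (p2, c2)]
    else:
        pairs = [(p1, c2), (p2, c1)]
    # Step 5: in each pair the fitter individual survives (the parent wins ties).
    return [c if fit(c) > fit(p) else p for p, c in pairs]

def toy_fit(g):
    """Hypothetical fitness: higher near the origin."""
    return -abs(g[0])

survivors = crowding_survivors((0, 0), (10, 0), (9, 0), (1, 0), toy_fit)
```

Here child (1, 0) is paired with parent (0, 0) and loses, while child (9, 0) is paired with parent (10, 0) and replaces it, so each child competes only inside its own neighbourhood.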
In DC, a child's survival is decided by comparing its fitness only with that of its closest parent; this keeps the comparisons within a limited region of the solution space.
In addition, we analysed a conservative approach for step 1: the parents were chosen so that their distance was below a fixed threshold and their fitness values had the same order of magnitude, in order to maintain genetic diversity.
Coupling very distant individuals with very different fitness values would quickly extinguish the weaker individual, losing track of the evolutionary niches.
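The conservative parent selection might look like this. It is a sketch under stated assumptions: "same order of magnitude" is read as a fitness ratio below 10, fitness values are positive as in equation (3), and the retry limit and function name are hypothetical.

```python
import math
import random

def pick_parents(population, fitnesses, max_dist, rng, tries=100):
    """Conservative step 1: draw two parents that are close and have comparable fitness."""
    n = len(population)
    for _ in range(tries):
        i, j = rng.sample(range(n), 2)
        close = math.dist(population[i], population[j]) < max_dist
        lo, hi = sorted((fitnesses[i], fitnesses[j]))
        comparable = hi < 10 * lo          # "same order of magnitude", read as a ratio below 10
        if close and comparable:
            return i, j
    return None                            # no admissible pair within `tries` draws

population = [(0, 0), (1, 0), (100, 100)]
pair = pick_parents(population, [5.0, 6.0, 5.5], max_dist=5.0, rng=random.Random(1))
```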
2.5 Extraction of the Cluster Centers
The final cluster centers are individuals in the final population with fitness greater
than a given value: in our case, greater than the mean fitness of the entire population.
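The extraction rule is a single filter on the final population; a minimal sketch with a hypothetical function name (near-duplicate individuals from the same niche are not merged here, since the text does not describe such a step):

```python
def extract_centers(population, fitnesses):
    """Keep the individuals whose fitness exceeds the population's mean fitness."""
    mean_f = sum(fitnesses) / len(fitnesses)
    return [g for g, f in zip(population, fitnesses) if f > mean_f]

centers = extract_centers([(0, 0), (5, 5), (9, 9)], [1.0, 2.0, 6.0])   # mean fitness is 3.0
```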
2.6 Cluster Characterization
The assignment of individuals to clusters does not follow a binary logic; fuzzy logic is applied instead. Clusters do not have a sharp radius: each individual is assigned a degree of membership to each cluster.
The membership functions of the fuzzy sets are the Gaussian functions in equation (4):

$$\mu_{ij} = \exp\left(-\frac{d_{ij}^2}{2\sigma_i^2}\right) \qquad (4)$$

where the mean of the Gaussian is the center of the i-th cluster, and the scale $\sigma_i^2$ of that center coincides with the scale of the membership function. An individual is considered to belong to the cluster that maximizes the membership function (4).
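The membership and assignment rule can be sketched as follows, assuming each cluster is stored as a hypothetical `(center, sigma2)` tuple:

```python
import math

def membership(x, center, sigma2):
    """Gaussian membership of x in a cluster with the given center and scale."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma2))

def assign(x, clusters):
    """Index of the cluster (center, sigma2) whose membership function is largest at x."""
    return max(range(len(clusters)), key=lambda i: membership(x, *clusters[i]))

clusters = [((0, 0), 0.01), ((5, 0), 100.0)]   # a tight cluster and a broad one
```

Because the scale enters the membership, a point can belong to a broad cluster even when it is closer to the center of a tight one: (1, 0) is assigned to the broad cluster, while (0, 0) stays with the tight one.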
3 Data Set
For a correct exploration of the solution space, the genomes must correctly represent the physical reality under study.
Our work can be divided into the following phases:
• We created a tool to extract network traffic data.
• We investigated the results of genetic clustering without any data manipulation, in order to observe whether raw header data could correctly represent the network traffic population without any human interpretation of the attributes' meaning. This approach proved to be completely different from the analytic one proposed in [4].
• We observed the evolution of the cluster centres to detect DoS and scanning attacks.
• All data were extracted from tcpdump text files obtained from real network traffic; the packets were studied starting from the third level of the TCP/IP stack. As a consequence, the mapping between IP addresses and physical addresses, contained in ARP tables, is not available.
• Header values were extracted, separated, and converted into long integers. Because of the maximum integer size our computers could represent, IP addresses were divided into two separate segments before conversion.
• From a single data set we built an object containing the whole population under examination. The header fields and the population size were chosen heuristically.
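The header conversion can be sketched with Python's standard `ipaddress` module. Splitting the address into two 16-bit halves is an assumption (the text only says "two separate segments"), and the field selection in `feature_vector` is hypothetical, since the actual fields were chosen heuristically.

```python
import ipaddress

def split_ip(addr):
    """Split a dotted IPv4 address into two 16-bit integers (high half, low half)."""
    value = int(ipaddress.IPv4Address(addr))
    return value >> 16, value & 0xFFFF

def feature_vector(proto, src, dst, sport, dport):
    """Hypothetical header feature vector; the paper's exact field choice is not given."""
    return (proto, *split_ip(src), *split_ip(dst), sport, dport)

vec = feature_vector(6, "10.0.0.1", "10.0.0.2", 1025, 80)
```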
4 Experimental Results
We focused on denial-of-service attacks and scanning activities. Our data set was extracted from the DARPA data built by MIT Lincoln Laboratory (Lexington, Massachusetts, USA).
4.1 Experimentation 1
The first example is an IP sweep attack: the attacker sends an ICMP packet to each machine in order to discover which hosts are up at the time of the attack.
We applied the algorithm to an initial train of 5000 packets and then monitored the trend of the centers on subsequent trains of 1000 packets.
Fig. 1. Trends of clusters for Neptune attack
Since the number of clusters is not predefined, we related each cluster center of a train to the closest center of the following train. For the unassigned centers, we calculated the minimum distance from the clusters of the preceding train.
We observed that ICMP clusters can never be assigned to the preceding train of packets, and that their minimum distance is far larger than that of any other unassigned cluster.
Figure 1 shows the evolution of the normal clusters and, in red, the isolated cluster created during the attack. The three dimensions are transport-layer protocol, destination port, and source port.
We identified the attack when the minimum distance of an unassigned cluster exceeded a threshold value. We obtained the same results with a Port Sweep attack.
However, we encountered some false positives in the presence of DNS requests; these could be avoided by carefully assigning weight functions that balance the different kinds of normal traffic in the network.
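The train-to-train matching and threshold test described in this experiment can be sketched as follows. The function name, the center values, and the threshold are hypothetical (the paper does not state its threshold); the dimensions follow Fig. 1 (protocol, destination port, source port), with 0 used as a placeholder port for the ICMP-like center.

```python
import math

def unassigned_alarms(prev_centers, curr_centers, threshold):
    """(index, distance) of each unassigned current center farther than `threshold` from all previous ones."""
    # Each center of the previous train is related to its closest center in the current train.
    assigned = {min(range(len(curr_centers)), key=lambda j: math.dist(p, curr_centers[j]))
                for p in prev_centers}
    alarms = []
    for j, c in enumerate(curr_centers):
        if j not in assigned:
            d_min = min(math.dist(c, p) for p in prev_centers)
            if d_min > threshold:
                alarms.append((j, d_min))
    return alarms

# Hypothetical centers as (protocol, destination port, source port).
prev = [(6, 80, 1025), (17, 53, 1030)]
curr = [(6, 80, 1026), (17, 53, 1031), (1, 0, 0)]   # a new ICMP-like center appears
alarms = unassigned_alarms(prev, curr, threshold=100.0)
```

With raw, unscaled features the port numbers dominate the Euclidean distance; per-dimension weights, as suggested above for the DNS false positives, would be one way to rebalance them.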
4.2 Experimentation 2
The second case we analysed was a Neptune attack, which causes a denial of service by flooding the target machine with TCP SYN packets and never completing the three-way handshake. Since this attack produces a huge number of packets, we expected an increase in the number of clusters computed on the packet trains containing the attack, with a density higher than that observed during normal activity.

Table 2. Evolution of the cluster centers during Neptune attack
Train   Attack   Colour in fig. 2   Population   Number of clusters   Average scale of the clusters in the train
1       No       1                  5000         18                   2.37E+08
2       No       2                  1000         13                   4.35E+08
3       No       3                  1000         13                   5.29E+08
4       Yes      4                  1000         21                   5.95E+08
5       Yes      5                  1000         2                    8.40E+07
6       Yes      6                  1000         2                    1.70E+08

Fig. 2. Dispersion around the centers in each train of packets [plot: scale, from 0 to 6.00E+08, against data sets 1-6]
Figure 2 represents the cluster centers in each train of packets. Contrary to our expectation, we observed a strong contraction in the number of clusters, and the individuals in the population were much less dispersed around these centers than around the normal centers. Although computed from the same number of packets, the centers abruptly diminish in number, and the dispersion around them diminishes just as abruptly, as seen in figure 3.
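One possible way to turn the contraction seen in Table 2 into an alarm is sketched below; it is not a procedure described in the paper. The rule (both the number of clusters and the average scale fall below `ratio` times their mean over the first `baseline_trains` trains), the function name, and the parameter values are all hypothetical. The usage example feeds it the values from Table 2.

```python
def contraction_alarm(n_clusters, avg_scale, baseline_trains=3, ratio=0.5):
    """Flag trains whose cluster count and average scale both fall below `ratio` times the baseline."""
    k = baseline_trains
    base_n = sum(n_clusters[:k]) / k          # mean number of clusters in the baseline trains
    base_s = sum(avg_scale[:k]) / k           # mean average scale in the baseline trains
    return [n < ratio * base_n and s < ratio * base_s
            for n, s in zip(n_clusters, avg_scale)]

# Values from Table 2 (trains 1-6); trains 4-6 contain the Neptune attack.
flags = contraction_alarm([18, 13, 13, 21, 2, 2],
                          [2.37e8, 4.35e8, 5.29e8, 5.95e8, 8.40e7, 1.70e8])
```

On these values the rule flags trains 5 and 6 but not train 4, where the attack has already started but the number of clusters has not yet contracted.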
5 Conclusions
Experimental results show that our algorithm can identify new events occurring in trains containing a given, small number of packets; it is sensitive to attacks that produce a large number of homogeneous packets.
The evolutionary approach proved feasible for highlighting trends in the traffic: thanks to the recombination of data and to the random component, cluster centers can be identified in genomes that were not present in the initial solution space.
Using a hill-climbing procedure, UNC selects the fittest individuals while preserving evolutionary niches generation after generation. Monitoring the evolution of the centers gave us an approach that is more robust to noise than a statistical approach to clustering: individuals that do not represent an evolutionary niche have a low probability of surviving.
The performance of the algorithm can be improved by processing the traffic entering and leaving the network under analysis separately, which would help keep the false positive rate under control. Moreover, the extraction of data from packet headers can be refined, so as to build a feature vector containing not only raw header data but also more refined data carrying knowledge about the network and its hosts, the connections, the services, and so on.
A different use of the same algorithm would be to monitor the traffic and evaluate it against the existing clusters, rather than observing the evolution of the clusters: once the clusters are formed, each individual under investigation can be assigned an abnormality score according to how much it belongs to each cluster of the solution space. In a few empirical tests, we simulated a wide range of attacks using the Nessus tool: although some trains of anomalous packets obtained substantially normal scores and the number of false positives is significant, we observed that, on average, abnormal traffic obtains a noticeably higher abnormality score than normal traffic.
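The paper does not give the scoring formula; one simple choice, sketched here under that assumption, is 1 minus the highest Gaussian membership of the individual over all clusters, with clusters stored as hypothetical `(center, sigma2)` tuples.

```python
import math

def abnormality(x, clusters):
    """1 minus the largest Gaussian membership of x over all (center, sigma2) clusters."""
    best = max(math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * s2))
               for c, s2 in clusters)
    return 1.0 - best

clusters = [((0, 0), 1.0), ((10, 10), 4.0)]
normal_score = abnormality((0.5, 0.0), clusters)   # close to an existing cluster
odd_score = abnormality((30, -30), clusters)       # far from every cluster
```

A score near 0 means the individual is well explained by some cluster; a score near 1 means no cluster claims it.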
References
1. Rouil, R., Chevrollier, N., Golmie, N.: Unsupervised anomaly detection system using next-generation router architecture (2005)
2. Leon, E., Nasraoui, O., Gomez, J.: Anomaly detection based on unsupervised niche clustering with application to network intrusion detection
3. Cerbara, I.: Cenni sulla cluster analysis (1999)
4. Lee, S.: A framework for constructing features and models for intrusion detection systems
(2001)