Alert Correlation for Extracting Attack Strategies

Authors: B. Zhu and A. A. Ghorbani
Source: IJNS review paper
Reporter: Chun-Ta Li (李俊達)
Outline





Introduction
Proposed approach
Test and evaluation
Conclusions and future work
Comments
Introduction (1/2)

Security issues




The number of incidents rapidly increased from 82,094 in
2002 to 137,529 in 2003.
Attacks are getting more and more sophisticated.
One of the solutions: Intrusion detection systems (IDS)
Intrusion detection systems


Host-based and network-based (data source)
Problems:


Understanding of attack behaviors
Extracting attack strategies from the alerts

Managing alerts manually is time-consuming and error-prone
Introduction (2/2)

Alert correlation techniques

Alert Correlation Based on Feature Similarity

Based on the similarities of some selected features
Ex. Source IP address, target IP address, and port number
Drawback: Cannot discover the causal relationships between related alerts

Alert Correlation Based on Known Scenario

Learned from training datasets using a data mining approach
It can uncover the causal relationship of alerts
Drawback: These methods are all restricted to known situations

Alert Correlation Based on Prerequisite and Consequence Relationship

Most alerts are not isolated, but related to different stages of attacks
Drawback: It cannot correlate unknown attacks
Proposed approach (1/9)

Overview


To reveal the causal relationship among the alerts
Automated construction of attack graphs
[Figure: system overview]
Alerts → Correlation engine
1. Multi-Layer Perceptron (MLP)
2. Support Vector Machine (SVM)
▪ Whether or not two alerts should be correlated
▪ If yes, the probability with which they are correlated
Correlation engine → Algorithm 1 → A list of hyper-alerts (each a group of correlated alerts)
Alert Correlation Matrix (ACM, n × n for n alerts) → Algorithm 2 → Attack graph (causal relationship among alerts)
Proposed approach (2/9)

Alert Correlation Matrix (ACM)

An n × n matrix over n alerts
Cell: encodes the temporal relationship between two alerts
Each cell in ACM holds a correlation weight

Correlation Strength (Π)

Backward Correlation Strength (Πb): between the current alert and a previous alert
Forward Correlation Strength (Πf): between the current alert and a next alert
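The ACM can be sketched as a small data structure. The slide's formulas for Πb and Πf did not survive extraction, so the normalization used here (cell weight divided by its column sum for Πb, by its row sum for Πf) is an assumption made for illustration only. The class name, method names, and alert type names are likewise hypothetical.

```python
from collections import defaultdict


class AlertCorrelationMatrix:
    """Hypothetical ACM: weight[(a, b)] accumulates evidence that type a precedes type b."""

    def __init__(self):
        self.weight = defaultdict(float)
        self.types = set()

    def update(self, earlier, later, probability):
        # Accumulate the correlation probability reported by the correlation engine.
        self.types.update((earlier, later))
        self.weight[(earlier, later)] += probability

    def forward_strength(self, a, b):
        # Assumed normalization: share of a's outgoing weight that goes to b.
        row_sum = sum(self.weight.get((a, t), 0.0) for t in self.types)
        return self.weight.get((a, b), 0.0) / row_sum if row_sum else 0.0

    def backward_strength(self, a, b):
        # Assumed normalization: share of b's incoming weight that comes from a.
        col_sum = sum(self.weight.get((t, b), 0.0) for t in self.types)
        return self.weight.get((a, b), 0.0) / col_sum if col_sum else 0.0


acm = AlertCorrelationMatrix()
acm.update("scan", "exploit", 0.75)
acm.update("scan", "probe", 0.25)
acm.update("login", "exploit", 0.75)
print(acm.forward_strength("scan", "exploit"))
print(acm.backward_strength("scan", "exploit"))
```

Storing only the cells that have been observed keeps the matrix sparse, and a new alert type is added simply by calling update with it.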
Proposed approach (3/9)

Feature selection


Alert information: timestamp, source IP, destination IP,
source port, destination port, type of the attack
6 features:


→ 0 or 1
→ 0 or 1
→ between 0 and 1
→ between 0 and 1
▪ The value of F6 is low → two alerts are seldom correlated (Πb is not reliable)
▪ The value of F6 becomes large → two alerts are frequently correlated (Πb is reliable)
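The definitions of the six features were lost from this slide, so the following is only a sketch of how pairwise features with the same value ranges (binary, or between 0 and 1) could be derived from the alert fields listed above. The four features chosen, the Alert class, and the function names are assumptions and are not taken from the paper.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    attack_type: str


def ip_similarity(ip_a: str, ip_b: str) -> float:
    """Fraction of leading bits two IPv4 addresses share (a value between 0 and 1)."""
    a = int(ipaddress.IPv4Address(ip_a))
    b = int(ipaddress.IPv4Address(ip_b))
    shared_prefix = 32 - (a ^ b).bit_length()
    return shared_prefix / 32


def pair_features(prev: Alert, curr: Alert) -> List[float]:
    """Illustrative feature vector for an ordered pair of alerts."""
    return [
        ip_similarity(prev.src_ip, curr.src_ip),  # between 0 and 1
        ip_similarity(prev.dst_ip, curr.dst_ip),  # between 0 and 1
        float(prev.dst_port == curr.dst_port),    # 0 or 1
        float(prev.dst_ip == curr.src_ip),        # 0 or 1: victim becomes attacker
    ]


prev = Alert(0.0, "10.0.0.1", "10.0.0.5", 1234, 80, "scan")
curr = Alert(1.0, "10.0.0.5", "10.0.0.9", 4321, 80, "exploit")
features = pair_features(prev, curr)
```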
Proposed approach (4/9)


Alert Correlation Using Multi-Layer Perceptron (MLP)

Inputs: the 6 feature values and a label

Outputs: a value between 0 and 1 (the probability that two alerts are
correlated)
Alert Correlation Using Support Vector Machine (SVM)


Inputs: the same input used for MLP and bipolar format labels
Outputs: a value between 0 and 1
-- The output of the conventional SVM (not probability)
-- The probability output of SVM [Platt, 1999]
-- cross-entropy error function [Platt, 1999]
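The probability output of [Platt, 1999] passes the raw SVM decision value f through a sigmoid, P = 1 / (1 + exp(A·f + B)), with A and B chosen to minimize the cross-entropy error on training data. The sketch below fits A and B with plain gradient descent on toy decision values. The function names, learning rate, step count, and data are assumptions. It uses 0/1 targets, so bipolar labels would first be mapped from -1/+1 to 0/1, and it omits the smoothed targets that Platt recommends.

```python
import math


def platt_probability(score, a, b):
    """Platt's sigmoid: maps a raw SVM decision value to a probability."""
    z = a * score + b
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores, labels, lr=0.1, steps=500):
    """Fit (a, b) by gradient descent on the cross-entropy error; labels are 0 or 1."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for f, t in zip(scores, labels):
            p = platt_probability(f, a, b)
            # Derivatives of the cross-entropy error with respect to a and b.
            grad_a += (t - p) * f
            grad_b += (t - p)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Toy decision values: negative for uncorrelated pairs, positive for correlated pairs.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 0, 1, 1, 1]
a, b = fit_platt(scores, labels)
```

After fitting, platt_probability(f, a, b) can be compared directly with the MLP output, since both lie between 0 and 1.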
Proposed approach (5/9)

Comparison of MLP and SVM

MLP:




More accurate than SVM
Slow training speed
Over-fitting problem
SVM:

To produce precise probabilistic output → select
appropriate training patterns
Decision: based on the outputs of both methods
Proposed approach (6/9)

Correlation process (Algorithm 1)

Construct the hyper-alert graph


To give the network administrator intrinsic view of attack scenarios
Two thresholds: correlation threshold and correlation sensitivity
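The slide does not reproduce Algorithm 1, so the following is one plausible reading of how the two thresholds could interact, not the authors' algorithm. A new alert starts its own hyper-alert when its best correlation probability falls below the correlation threshold r. Otherwise it is linked to every earlier alert whose probability lies within the correlation sensitivity s of the best one. The function name, the dictionary layout, and the probability callback (a stand-in for the MLP/SVM output) are assumptions.

```python
def correlate(alerts, probability, r=0.5, s=0.1):
    """Group alerts into hyper-alerts; each holds its members and correlation edges."""
    hyper_alerts = []
    for new in alerts:
        # Score the new alert against every alert already placed in a hyper-alert.
        scored = [
            (probability(prev, new), h, prev)
            for h in hyper_alerts
            for prev in h["members"]
        ]
        best = max((p for p, _, _ in scored), default=0.0)
        if best < r:
            # Not correlated strongly enough with anything seen so far.
            hyper_alerts.append({"members": [new], "edges": []})
            continue
        touched = []
        for p, h, prev in scored:
            if best - p <= s:
                h["edges"].append((prev, new))
                if all(h is not t for t in touched):
                    touched.append(h)
        for h in touched:
            h["members"].append(new)
    return hyper_alerts


table = {("scan", "exploit"): 0.9}
result = correlate(
    ["scan", "exploit", "noise"],
    lambda prev, new: table.get((prev, new), 0.0),
)
```

With the toy probabilities above, "scan" and "exploit" end up in one hyper-alert joined by an edge, and "noise" forms a hyper-alert of its own.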
Proposed approach (8/9)

Generating Attack Graph using ACM (Algorithm 2)


To represent different typical attack strategies
hyper-alert graph vs. attack graph




Attack graph (it can have cycles), hyper-alert graph (no cycles)
Attack graphs are a more general representation of attack strategies
Encodes causal relationship among alerts
The algorithm performs a horizontal search (Πf) in the ACM
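The horizontal search can be illustrated as a row-by-row traversal: starting from one alert type, scan its ACM row for sufficiently strong Πf values, add an edge for each, and continue from the alert types reached. Algorithm 2 itself is not shown on the slide, so this is a sketch under assumptions. The nested-dictionary layout, the function name, and the cut-off threshold are hypothetical.

```python
def build_attack_graph(forward, start, threshold=0.1):
    """forward[a][b] holds the forward strength from alert type a to alert type b."""
    edges = []
    visited = set()
    frontier = [start]
    while frontier:
        current = frontier.pop()
        if current in visited:
            continue  # already expanded; this is what lets the graph contain cycles safely
        visited.add(current)
        # Horizontal search: scan the row of the current alert type.
        for nxt, strength in sorted(forward.get(current, {}).items()):
            if strength >= threshold:
                edges.append((current, nxt, strength))
                frontier.append(nxt)
    return edges


forward = {
    "scan": {"exploit": 0.8, "noise": 0.05},
    "exploit": {"install": 0.7, "scan": 0.2},
    "install": {},
}
graph = build_attack_graph(forward, "scan")
```

The edge from "exploit" back to "scan" is kept, which matches the point above that an attack graph, unlike a hyper-alert graph, can have cycles.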
Test and evaluation (1/7)

Experiment with DARPA 2000 Dataset [MIT Lincoln Laboratory]


Two multistage attack scenarios: LLDOS1.0 and LLDOS2.0.2
Alert log file [RealSecure IDS]

LLDOS1.0 (924 alerts)

Correlation process




correlation threshold r = 0.5
correlation sensitivity s = 0.1
ACM contains 19 different types of alerts
LLDOS2.0.2 (494 alerts)

Correlation process



correlation threshold r = 0.5
correlation sensitivity s = 0.1
ACM contains 17 different types of alerts
Test and evaluation (3/7)

LLDOS 1.0 – Scenario One
Test and evaluation (4/7)
Test and evaluation (5/7)

LLDOS 2.0.2 – Scenario Two
Test and evaluation (6/7)
Test and evaluation (7/7)



The attack strategies extracted from intrusion alerts are similar
to those of Ning et al. [Ning and Xu, 2003]
The difference is that the proposed approach does
not need to define a large number of rules in order to
correlate the alerts.
The ACS adapts to the emergence of new attack
patterns because new alerts are automatically added
to the ACM.
Conclusions and future work

This paper presents an alert correlation technique





Multilayer Perceptron (MLP)
Support Vector Machine (SVM)
Alert Correlation Matrix (ACM)
Automatic extraction of attack strategies from alerts
Future work




Identifying more features for correlation
Real-time correlation
Recognizing the variations of attack strategies
Target recognition and risk assessment
Comments
Evaluation of Paper: Good
Recommendation: Accept after minor revision

In this paper, the authors propose a new alert correlation technique that
can automatically extract attack strategies from a large number of intrusion alerts so
that the administrator can study new countermeasures. Two machine learning
approaches, Multilayer Perceptron (MLP) and Support Vector Machine (SVM), are used
to determine the causal relationship between two alerts. Moreover, an
Alert Correlation Matrix (ACM) is used to process and update the correlation
strength of any two types of alerts. The evaluation on the DARPA 2000
dataset shows that the results of the authors' approach are similar to previous research. The
difference is that it does not need to define a large number of rules for the alerts, and
new alerts can be automatically added to the ACM for studying new attack
strategies.

My outlook






More features
The value of each label in MLP and SVM
18 training patterns
SVM writing
Efficiency and correlations
11 typos