Applying PCA for Traffic Anomaly Detection: Problems and Solutions
Daniela Brauckhoff (ETH Zurich, CH)
Kave Salamatian (Lancaster University, UK)
Martin May (Thomson, CH)
IEEE INFOCOM (April 2009)
Presented 2010/3/2
Agenda
• Before Introduction
• Objective
• A Signal Processing View on PCA
• Extension of PCA to Stochastic Processes
• Validation
• Conclusion
What is PCA?
• PCA
  – Principal Component Analysis
• PCA's usage
  – reduce the dimensionality of the data
  – e.g., a picture of size 1024 × 768
    • its dimension is its length × width
    • i.e., 786,432 feature values
    • use PCA to reduce this dimensionality
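As a minimal illustration of this dimensionality reduction, the sketch below projects synthetic 4-dimensional data onto its two strongest principal components; the data, sizes, and variable names are hypothetical, chosen only to show the mechanics, not the slide's picture example.

```python
import numpy as np

# Hypothetical data: 100 observations of 4 correlated-ish features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))

Xc = X - X.mean(axis=0)                # center the data (essential for PCA)

# SVD of the centered data: the rows of Vt are the principal directions,
# ordered by decreasing variance.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                                  # keep the two strongest components
Y = Xc @ Vt[:k].T                      # reduced 2-D representation
print(Y.shape)                         # (100, 2)
```

The projection onto the first component carries the greatest variance, so dropping the trailing components loses as little information as possible.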
What is PCA? (cont.)
[Figure: PCA illustration; see http://blog.finalevil.com/2008/07/pca.html]
Problems and Solutions
• Consider the temporal correlation of the data
• Extend PCA
  – replace it with the Karhunen-Loeve Transform
Two different interpretations
1. As an efficient representation that transforms the data to a new coordinate system
   • Projection on the first coordinate contains the greatest variance
2. As a modeling technique
   • using a finite number of terms of an orthogonal series expansion of the signal with uncorrelated coefficients
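The second interpretation can be made concrete: model a signal with a finite number of terms of an orthogonal expansion with uncorrelated coefficients, and check how much is lost by truncation. A minimal NumPy sketch with synthetic data (all names and sizes are hypothetical):

```python
import numpy as np

# Hypothetical data: 5 observed features driven by 2 latent factors plus a
# little noise, so a 2-term expansion should capture almost everything.
rng = np.random.default_rng(4)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(300, 5))

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

def reconstruct(m):
    # Keep only the first m terms of the orthogonal expansion: the
    # coefficients Y are uncorrelated, and Y @ Vt[:m] is the truncated model.
    Y = Xc @ Vt[:m].T
    return Y @ Vt[:m]

err2 = np.linalg.norm(Xc - reconstruct(2)) / np.linalg.norm(Xc)
err5 = np.linalg.norm(Xc - reconstruct(5)) / np.linalg.norm(Xc)
print(err2 < 0.05, err5 < 1e-8)   # True True: 2 terms already suffice
```

Two terms recover nearly all of the signal because only two latent factors generated it; this "small number of terms explains the data" view is exactly what anomaly detection with PCA exploits.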
Background
• Suppose that we have a column vector of correlated random variables:
  – X = (X_1, ..., X_K)^T ∈ R^K
  – We observe N realizations of this vector, the i-th being
    x^i = (x_1^i, ..., x_K^i)^T
  – Note:
    • "random variables" here means the data collected from the network
Background (cont.1)
• In order to find the characteristics of the above data collected from the network
  – i.e., the most suitable basis (φ_1, ..., φ_K),
    • where φ_i is an eigenvector of the covariance matrix of X, defined as
      Σ = E{(X − μ)(X − μ)^T}, estimated by Σ̂ = (1/(N − 1)) x x^T
      (with x the matrix of centered observations)
    • where μ is a column vector containing the means of the X_i
Background (cont.2)
• The most suitable basis: (φ_1, ..., φ_K)
• How do we find each φ_i?
  – i.e., solve the eigenvalue equation Σ φ_i = λ_i φ_i, with Σ = E{(X − μ)(X − μ)^T}
  – Method: SVD (Singular Value Decomposition)
• Note: the basis change matrix is U = [φ_1, ..., φ_K]^T
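The eigenproblem above can be solved numerically with the SVD the slide names. A minimal NumPy sketch on synthetic data (K, N, and all variable names are hypothetical, not the paper's traffic metrics):

```python
import numpy as np

rng = np.random.default_rng(1)
K, N = 3, 50
x = rng.normal(size=(K, N))             # K random variables, N realizations

mu = x.mean(axis=1, keepdims=True)      # column vector of means
xc = x - mu                             # centered observations

# SVD of the centered data matrix xc = U S V^T: the columns of U are the
# eigenvectors phi_i of the sample covariance (1/(N-1)) xc xc^T, with
# eigenvalues lambda_i = S_i^2 / (N - 1).
U, S, _ = np.linalg.svd(xc, full_matrices=False)
lam = S**2 / (N - 1)

# Verify the eigenrelation  Sigma phi_i = lambda_i phi_i.
cov = xc @ xc.T / (N - 1)
residual = np.abs(cov @ U[:, 0] - lam[0] * U[:, 0]).max()
print(residual < 1e-10)                 # True
```

Taking the SVD of the data matrix itself (rather than of the covariance) is the usual numerically stable route to the same eigenvectors.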
Background (cont.3)
• But U = [φ_1, ..., φ_K]^T is a basis change matrix only when X is zero mean
• Therefore, X must be replaced by X̃ = X − μ
  – i.e., ỹ = U x̃
  – not taking care of this can lead to large errors when using PCA
• Rewrite the initial vector of random variables as X = Σ_{i=1}^{K} Y_i φ_i
  – the coefficients Y_i are the essential property!
  – i.e., suitable for PCA representation
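A small sketch of why centering matters. The data are hypothetical: a 2-D cloud whose true spread is along the first axis, shifted far from the origin by a common offset.

```python
import numpy as np

# True spread along axis 0 (std 3 vs. 0.5), but with a large mean of 100.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2)) * np.array([3.0, 0.5]) + 100.0

# Without mean removal the leading direction points at the mean,
# not along the direction of greatest variance.
_, _, Vt_raw = np.linalg.svd(X, full_matrices=False)

# With mean removal (X~ = X - mu) PCA recovers the true spread direction.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)

print(np.abs(Vt_raw[0]))   # roughly [0.71, 0.71]: dominated by the offset
print(np.abs(Vt[0]))       # roughly [1, 0]: the genuine high-variance axis
```

With the offset in place, the uncentered "principal direction" is an artifact of the mean, which is exactly the large error the slide warns about.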
Stochastic Process
• Extend PCA to stochastic processes that have temporal as well as spatial correlations
• Assume we have a K-vector of zero-mean stationary stochastic processes
  X(t) = (X_1(t), ..., X_K(t))^T
  – with covariance function Σ_{i,j}(τ) = E{X_i(t) X_j(t − τ)}
Stochastic Process (cont.1)
• The multi-dimensional Karhunen-Loeve theorem states that one can rewrite this vector as a series expansion (the KL expansion):
  X_l(t) = Σ_{i=1}^{K} Σ_{j=1}^{∞} Y_{i,j}^l φ_{i,j}(t)
  – Compare with the static case: X = Σ_{i=1}^{K} Y_i φ_i
Stochastic Process (cont.2)
• How do we get the basis functions φ_{i,j}(t)?
  – Solve the linear integral equations:
    Σ_{i=1}^{K} ∫_a^b Σ_{l,i}(s − t) φ_{i,j}(s) ds = λ_{l,j} φ_{l,j}(t)
  – Compare with the static case: Σ φ_i = λ_i φ_i
• Then we can obtain the coefficients Y_{i,j}^l by
  Y_{i,j}^l = ∫_a^b X_l(s) φ_{i,j}(s) ds
  and reconstruct X_l(t) = Σ_{i=1}^{K} Σ_{j=1}^{∞} Y_{i,j}^l φ_{i,j}(t)
Stochastic Process (cont.3)
• The Galerkin method transforms the above integral equations into a matrix problem that can be solved by applying the SVD technique
• It is possible to derive the KL expansion using only a finite number of samples
  – Time-sampled version: φ_{i,j}[k] = φ_{i,j}(kT)
  – Finally, we obtain a discrete version of the KL expansion:
    X_l[k] = Σ_{i=1}^{K} Σ_{j=1}^{N} Y_{i,j}^l φ_{i,j}[k]
Stochastic Process (cont.4)
• Construct a KN × (n − N) observation matrix by stacking N consecutive samples of each of the K processes, corresponding to
  X_l[k] = Σ_{i=1}^{K} Σ_{j=1}^{N} Y_{i,j}^l φ_{i,j}[k]
• This yields KN eigenvectors
Stochastic Process (cont.5)
• Use Σ̂ = (1/(n − N − 1)) x x^T to estimate all the needed spatio-temporal covariances
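The two steps above, building the lagged observation matrix and estimating the spatio-temporal covariance, can be sketched as follows. The data and sizes are hypothetical, the helper `lagged_observations` is an illustration, and the exact indexing convention may differ from the paper's; zero-mean data are assumed, as on the earlier slides.

```python
import numpy as np

def lagged_observations(x, N):
    # Stack N consecutive samples of each of the K series into one column,
    # turning a K x n data matrix into a KN x (n - N) observation matrix.
    K, n = x.shape
    cols = [x[:, k:k + N].reshape(-1) for k in range(n - N)]
    return np.stack(cols, axis=1)

rng = np.random.default_rng(3)
K, n, N = 4, 96, 3                      # hypothetical sizes
x = rng.normal(size=(K, n))             # zero-mean synthetic processes

obs = lagged_observations(x, N)         # shape (K*N, n - N) = (12, 93)

# Spatio-temporal covariance estimate, as on the slide.
cov = obs @ obs.T / (n - N - 1)
print(obs.shape, cov.shape)             # (12, 93) (12, 12)
```

Applying SVD to `obs` (or eigendecomposition to `cov`) then yields the KN eigenvectors of the previous slide, with N = 1 collapsing back to standard PCA.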
Data Set and Metrics
• Three weeks of NetFlow data collected on one of the peering links of a medium-sized ISP (SWITCH, AS559)
• Recorded in August 2007
  – comprises a variety of traffic anomalies occurring in daily operation, such as network scans, denial-of-service attacks, alpha flows, etc.
Data Set and Metrics (cont.1)
• Computing the detection metrics:
  – distinguish between incoming and outgoing traffic, as well as UDP and TCP flows
  – for each of these four categories, compute seven commonly used traffic features:
    • byte, packet, and flow counts
    • source and destination IP address entropy
    • source and destination IP address counts
Data Set and Metrics (cont.2)
• All metrics are obtained by aggregating the traffic in 15-minute intervals, resulting in a 28 × 96 matrix per measurement day
• Anomalies were identified by visual inspection
• This resulted in 28 detected anomalous events in UDP traffic and 73 in TCP traffic
Data Set and Metrics (cont.3)
• Use the vector of metrics containing the first two days of metrics for building the model
• Derive a spatio-temporal correlation matrix with the temporal correlation range set to N = 1, ..., 5
  – Note that setting N = 1 gives the standard PCA approach
  – apply SVD to the data, resulting in a basis change matrix
ROC curves
• A Receiver Operating Characteristic (ROC) curve combines the two parameters in one plot and captures the essential trade-off
  – false positive rate vs. true positive rate
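The trade-off can be computed by sweeping a detection threshold over the anomaly scores. A minimal sketch; the scores, labels, and the helper `roc_points` are hypothetical illustrations, not the paper's detector output.

```python
import numpy as np

def roc_points(scores, labels):
    # Sweep a threshold from high to low over the anomaly scores and
    # return the (false positive rate, true positive rate) pairs.
    order = np.argsort(scores)[::-1]        # highest scores first
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                  # true positives per threshold
    fp = np.cumsum(1 - labels)              # false positives per threshold
    return fp / (1 - labels).sum(), tp / labels.sum()

# Hypothetical anomaly scores and ground-truth labels (1 = true anomaly).
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4])
labels = np.array([1, 1, 0, 1, 0, 0])
fpr, tpr = roc_points(scores, labels)
print(tpr)   # [1/3, 2/3, 2/3, 1, 1, 1]
print(fpr)   # [0, 0, 1/3, 1/3, 2/3, 1]
```

A detector whose curve sits closer to the top-left corner (high TPR at low FPR) is strictly better, which is how the PCA and KL variants are compared on the following slides.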
ROC curves (cont.2)
[Figure: ROC curves for PCA and the KL expansion]
ROC curves (cont.3)
• The comparison of ROC curves shows a considerable improvement in anomaly detection performance when the KL expansion is used with N = 2, 3, consistently for both UDP and TCP traffic, and a decrease thereafter for N ≥ 4
Effect of non-stationarity
• Stationarity issue:
  – for N ≥ 4 the performance decreases
  – when N increases, the model contains more parameters and becomes more sensitive to the stationarity of the traffic metrics
Conclusion
• Direct application of the PCA method results in poor performance in terms of ROC curves
• The correct framework is not classical PCA but rather the Karhunen-Loeve expansion
• A Galerkin method is provided for developing a predictive model; an important improvement is attained when temporal correlation is considered
Q&A
Thank you!