
Hamsa: Fast Signature Generation
for Zero-day Polymorphic Worms
with Provable Attack Resilience
Lab for Internet & Security Technology (LIST)
Northwestern University
The Spread of Sapphire/Slammer Worms
Desired Requirements for Polymorphic
Worm Signature Generation
• Network-based signature generation
– Worms spread at exponential speed, so detecting them in their early stage is crucial… However
» In the early stage there are only limited worm samples
– A high-speed network router may see more worm samples… But
» Needs to keep up with the network speed!
» Can only use network-level information
Desired Requirements for Polymorphic
Worm Signature Generation
• Noise tolerant
– Most network flow classifiers suffer from false positives.
– Even host based approaches can be injected with
noise.
• Attack resilience
– Attackers always try to evade the detection
systems
• Efficient signature matching for high-speed
links
No existing work satisfies these requirements!
Outline
• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion
Choice of Signatures
• Two classes of signatures
– Content based
» Token: a substring with reasonable coverage of the suspicious traffic
» Signatures: conjunction of tokens
– Behavior based
• Our choice: content based
– Fast signature matching: ASIC-based approaches can achieve 6~8 Gb/s
– Generic, independent of any protocol or server
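For illustration, a minimal matching sketch in Python (not Hamsa's ASIC or production matcher): a flow matches a token-multiset signature only if every token appears in it the required number of times. The tokens follow the Code-Red II signature shown later in the talk; the flow bytes are made up.

def matches(signature, flow: bytes) -> bool:
    """A flow matches a token-multiset signature {(t1,n1),...,(tk,nk)}
    iff every token ti occurs at least ni times in the flow."""
    return all(flow.count(tok) >= n for tok, n in signature)

sig = [(b'.ida?', 1), (b'%u780', 1), (b' HTTP/1.0\r\n', 1), (b'GET /', 1), (b'%u', 2)]
print(matches(sig, b'GET /x.ida?XX%u780%u9090 HTTP/1.0\r\n'))   # True
print(matches(sig, b'GET /index.html HTTP/1.0\r\n'))            # False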
Unique Invariants of Worms
• Protocol Frame Invariants
– The code path leading to the vulnerability, usually infrequently used
– Code-Red II: ‘.ida?’ or ‘.idq?’
• Control Data: leading to control flow hijacking
– Hard coded value to overwrite a jump target or a
function call
• Worm Executable Payload
– CLET polymorphic engine: ‘0\x8b’, ‘\xff\xff\xff’ and
‘t\x07\xeb’
• Possible to have worms with no such invariants,
but very hard
Hamsa Architecture
[Architecture diagram] A network tap feeds a protocol classifier (TCP 25, TCP 53, TCP 80, …, TCP 137, UDP 1434). After a known-worm filter, a worm flow classifier separates traffic into a suspicious traffic pool and a normal traffic pool (normal traffic reservoir). The Hamsa signature generator consumes both pools and outputs signatures. Operation is real-time and policy-driven.
Components from existing work
• Worm flow classifiers
– Scan-based detectors [Autograph]
– Byte-spectrum-based approaches [PAYL]
– Honeynet/honeyfarm sensors [Honeycomb]
Hamsa Design
• Key idea: model the uniqueness of worm
invariants
– Greedy algorithm for finding token conjunction
signatures
• Highly accurate while much faster
– Both analytically and experimentally
– Compared with the latest work, Polygraph
– Suffix-array-based token extraction
• Provable attack resilience guarantee
• Noise tolerant
Outline
• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion
Hamsa Signature Generator
[Diagram] The suspicious traffic pool feeds a token extractor; the extracted tokens are filtered and passed to the core module, which performs token identification against the normal traffic pool; a signature refiner then outputs the signature. If the remaining pool size is too small, the generator quits.
• Core part: model-based greedy signature generation
• Iterative approach for multiple worms
Problem Formulation
[Diagram] The signature generator takes a suspicious pool and a normal pool as input, together with a false-positive bound r: maximize the coverage in the suspicious pool while keeping the false positive in the normal pool bounded by r.
Without noise, this can be solved in linear time using token extraction
With noise: NP-Hard!
Model Uniqueness of Invariants
[Illustration] Tokens are considered in order: t1 with U(1) = upper bound of FP(t1); t2 with U(2) = upper bound of FP(t1, t2), the joint FP with t1; and so on (example FP values on the figure: 21%, 2%, 9%, 0.5%, 17%, 1%, 5%). The total number of tokens is bounded by k*.
Signature Generation Algorithm
[Illustration] Token extraction on the suspicious pool yields candidate tokens with (coverage, false positive) pairs: (82%, 50%), (70%, 11%), (67%, 30%), (62%, 15%), (50%, 25%), (41%, 55%), (36%, 41%), (12%, 9%). The tokens are ordered by coverage, and t1 is chosen subject to the bound u(1) = 15%.
Signature Generation Algorithm
[Illustration] With t1 in the signature, the remaining candidates are re-scored by their joint (coverage, false positive) with t1: (69%, 9.8%), (68%, 8.5%), (67%, 1%), (40%, 2.5%), (35%, 12%), (31%, 9%), (10%, 0.5%). Ordered by joint coverage with t1, t2 is the highest-coverage token whose joint FP stays within u(2) = 7.5%.
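As an illustration of the greedy loop sketched on the last two slides, here is a simplified stand-in (not Hamsa's C++/Python implementation): pools are lists of byte strings, u_bounds holds u(1)…u(k*), and at step i the highest-coverage remaining token whose joint false positive stays under u(i) is added, stopping once the normal-pool false positive drops below r. The helper names and data layout are assumptions of this sketch.

def coverage(tokens, pool):
    # Fraction of pool flows that contain every token in `tokens`.
    hit = sum(all(t in flow for t in tokens) for flow in pool)
    return hit / len(pool)

def greedy_signature(candidates, suspicious, normal, u_bounds, r):
    chosen = []
    for u_i in u_bounds:
        # Order remaining tokens by joint coverage with the tokens chosen so
        # far; take the best one whose joint false positive stays under u(i).
        best = None
        for tok in sorted((t for t in candidates if t not in chosen),
                          key=lambda t: coverage(chosen + [t], suspicious),
                          reverse=True):
            if coverage(chosen + [tok], normal) <= u_i:
                best = tok
                break
        if best is None:
            break
        chosen.append(best)
        if coverage(chosen, normal) <= r:   # false-positive goal reached
            return chosen
    return chosen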
Algorithm Runtime Analysis
• Preprocessing: O(m + n + T·l + T·(|M| + |N|))
• Running time: O(T·(|M| + |N|))
– In most cases |M| < |N|, so this reduces to O(T·|N|)
Notation: T: number of tokens; |M|: number of flows in the suspicious pool; m: number of bytes in the suspicious pool; |N|: number of flows in the normal pool; n: number of bytes in the normal pool; l: maximum token length
Provable Attack Resilience
Guarantee
• Proved the worst-case bound on false negatives for a given false-positive bound
• Analytically bounds the worst the attackers can do!
• Example: k* = 5, u(1) = 0.2, u(2) = 0.08, u(3) = 0.04, u(4) = 0.02, u(5) = 0.01 and r = 0.01

  Noise ratio       5%      10%     20%
  FP upper bound    1%      1%      1%
  FN upper bound    1.84%   3.89%   8.75%

• The better the flow classifier, the lower the false negatives
Attack Resilience Assumptions
• Common assumptions for any sig generation sys
1. The attacker cannot control which worm
samples are encountered by Hamsa
2. The attacker cannot control which worm
samples encountered will be classified as worm
samples by the flow classifier
• Unique assumptions for token-based schemes
1. The attacker cannot change the frequency of
tokens in normal traffic
2. The attacker cannot control which normal
samples encountered are classified as worm
samples by the worm flow classifier
Attack Resilience Assumptions
• Attacks on the flow classifier
– Our approach does not depend on perfect flow classifiers
– But with 99% noise, no approach can work!
– High noise injection makes the worm propagate less efficiently
• Enhancing flow classifiers
– Cluster suspicious flows by return messages
– Information-theory-based approaches (DePaul Univ.)
Generalizing Signature Generation
with noise
• BEST signature = balanced signature
– Balance the sensitivity with the specificity
– Introduce a scoring function score(cov, fp, …) to evaluate the goodness of a signature
– Currently used:
score(COV, FP, LEN) = a^(−log10(β + FP)) · COV + γ · LEN
(β is a small constant avoiding log 0; γ is a small length weight)
» Intuition: it is better to reduce the coverage to 1/a of its value if the false positive becomes 10 times smaller
» Add a small weight on the signature length (LEN) to break ties between signatures with the same coverage and false positive
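A small sketch of a scoring function consistent with the intuition above (the constants A, BETA, GAMMA and the exact functional form are illustrative reconstructions, not the paper's definitive choice): making the false positive 10 times smaller multiplies the score by A, so trading away a factor 1/A of coverage breaks even, and a tiny length term breaks ties.

import math

A, BETA, GAMMA = 5.0, 1e-6, 1e-3   # illustrative constants

def score(cov, fp, length):
    # Losing a factor 1/A of coverage is worth a 10x reduction in fp;
    # GAMMA*length only breaks ties between otherwise equal signatures.
    return (A ** (-math.log10(BETA + fp))) * cov + GAMMA * length

print(round(score(0.80, 0.010, 20), 2))      # ~20.02
print(round(score(0.80 / A, 0.001, 20), 2))  # ~20.01 -- roughly the same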
Hamsa Signature Generator
[Diagram, as before: suspicious traffic pool → token extractor → tokens → core (token identification against the normal traffic pool) → signature refiner → signature]
Next: token extraction and token identification
Token Extraction
• Problem formulation:
– Input: a set of strings, and minimum length l and
minimum coverage COVmin
– Output:
» A set of tokens (substrings) that meet the minimum length and coverage requirements
• Coverage: the portion of strings having the token
» Corresponding sample vectors for each token
• Main techniques:
– Suffix array
– LCP (Longest Common Prefix) array, and LCP intervals
– Token Extraction Algorithm (TEA)
Suffix Array
• Illustration by an example
– String 1: abrac, String 2: adabra
– Concatenate them: abracadabra$
– All suffixes: a$, ra$, bra$, abra$, dabra$, …
– Sort all the suffixes (suffix → starting position): a → 10, abra → 7, abracadabra → 0, acadabra → 3, adabra → 5, bra → 8, bracadabra → 1, cadabra → 4, dabra → 6, ra → 9, racadabra → 2
– 4n space
– Sorting can be done in 4n space and O(n log n) time
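A minimal suffix-array sketch for this example (plain sorting of suffixes; a real implementation would use an O(n log n) or linear-time construction). The extra entry 11 is the suffix consisting of '$' alone, which the slide omits.

def suffix_array(s):
    # Sort suffix start positions by the suffix they begin.
    return sorted(range(len(s)), key=lambda i: s[i:])

text = "abrac" + "adabra" + "$"   # the two strings concatenated, as above
print(suffix_array(text))
# [11, 10, 7, 0, 3, 5, 8, 1, 4, 6, 9, 2]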
LCP Array and LCP Intervals
[Table] Sorted suffixes of abracadabra$ with their starting positions (sufarr), LCP values (lcparr), and source string (str; 1 = abrac, 2 = adabra):

  suffix        sufarr  lcparr  str
  a               10      -      2
  abra             7      1      2
  abracadabra      0      4      1
  acadabra         3      1      1
  adabra           5      1      2
  bra              8      0      2
  bracadabra       1      3      1
  cadabra          4      0      1
  dabra            6      0      2
  ra               9      0      2
  racadabra        2      2      1

LCP intervals: 0-[0,10], 1-[0,4], 4-[1,2], 3-[5,6], 2-[9,10]
LCP intervals => tokens
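The LCP array above can be computed in O(n) time from the suffix array with Kasai's algorithm; a short sketch for the same example string:

def lcp_array(s, sa):
    # lcp[i] = length of the longest common prefix of the suffixes ranked
    # i-1 and i in the suffix array (lcp[0] = 0 by convention).
    n = len(s)
    rank = [0] * n
    for r, suf in enumerate(sa):
        rank[suf] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            h = max(h - 1, 0)
        else:
            h = 0
    return lcp

text = "abracadabra$"
sa = sorted(range(len(text)), key=lambda i: text[i:])
print(lcp_array(text, sa))   # [0, 0, 1, 4, 1, 1, 0, 3, 0, 0, 0, 2]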
Token Extraction Algorithm (TEA)
• Find eligible LCP intervals first
• Then find the tokens
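TEA itself walks the eligible LCP intervals over the suffix array; as a rough stand-in for what it computes, here is a brute-force sketch that returns the maximal substrings of length at least l_min covering at least a cov_min fraction of the suspicious pool (fine for small pools, far slower than TEA). The names and the crude maximality filter are assumptions of this sketch.

def extract_tokens(pool, l_min, cov_min):
    need = cov_min * len(pool)
    counts = {}
    for flow in pool:
        # Count each distinct substring once per flow (coverage, not frequency).
        seen = {flow[i:j] for i in range(len(flow))
                          for j in range(i + l_min, len(flow) + 1)}
        for sub in seen:
            counts[sub] = counts.get(sub, 0) + 1
    frequent = {t: c for t, c in counts.items() if c >= need}
    # Crude maximality filter: drop a token if a strictly longer frequent
    # token with the same coverage contains it.
    return {t: c for t, c in frequent.items()
            if not any(t in u and len(u) > len(t) and frequent[u] == c
                       for u in frequent)}

pool = [b"GET /x.ida?AAA HTTP/1.0", b"GET /y.ida?BBB HTTP/1.0"]
print(extract_tokens(pool, 4, 1.0))
# {b'GET /': 2, b'.ida?': 2, b' HTTP/1.0': 2}  (order may vary)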
Token Identification
• For normal traffic, pre-compute and
store suffix array offline
• For a given token, binary search in
suffix array gives the corresponding
LCP intervals
• O(log n) time complexity
– A more sophisticated O(1) algorithm is possible, but may require more space
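A sketch of the binary-search step (Python 3.10+ for the key= argument to bisect): with the normal pool concatenated into one string and its suffix array built offline, two binary searches delimit the range of suffixes that start with the token, so the range width is the occurrence count. The tiny "normal pool" here is just the running example.

from bisect import bisect_left, bisect_right

def occurrences(text, sa, token):
    # Suffixes that start with `token` form one contiguous range in the
    # sorted suffix array; compare only the first len(token) characters.
    key = lambda i: text[i:i + len(token)]
    return bisect_right(sa, token, key=key) - bisect_left(sa, token, key=key)

normal = "abracadabra$"
sa = sorted(range(len(normal)), key=lambda i: normal[i:])
print(occurrences(normal, sa, "abra"))  # 2
print(occurrences(normal, sa, "cad"))   # 1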
Implementation Details
• Token extraction: extract a set of tokens with minimum length l and minimum coverage COVmin
– Polygraph uses a suffix-tree-based approach: 20n space and time consuming
– Our approach: enhanced suffix arrays, 8n space and much faster (at least 20 times)!
• Calculating false positives when checking U-bounds (token identification)
– Again suffix-array based, but for a 300MB normal pool the 1.2GB suffix array is still large!
– Optimization: using MMAP, memory usage: 150~250MB
Hamsa Signature Generator
[Diagram, as before: suspicious traffic pool → token extractor → tokens → core (token identification against the normal traffic pool) → signature refiner → signature]
Next: signature refinement
Signature Refinement
• Why refinement?
– To produce a signature with the same sensitivity but better specificity
• How?
– After the core algorithm produces the greedy signature, we treat the samples matched by that signature as true worm samples
– This reduces to signature generation without noise: do another round of token extraction
Extend to Detect Multiple
Worms
• Iteratively apply the single-worm detector to detect multiple worms (see the sketch below)
– In the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool
– All other worms and normal traffic are treated as noise
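A minimal sketch of that iteration (generate_single_signature stands in for the single-worm generator above; the pool-size cutoff is an assumption):

def generate_all_signatures(suspicious, normal, generate_single_signature,
                            min_pool_size=20):
    signatures, pool = [], list(suspicious)
    while len(pool) >= min_pool_size:
        sig = generate_single_signature(pool, normal)   # most popular worm first
        if not sig:
            break
        signatures.append(sig)
        # Remove the flows matched by this signature and repeat on the rest.
        pool = [flow for flow in pool
                if not all(tok in flow for tok in sig)]
    return signatures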
Practical Issues on Data
Normalization
• Typical cases need data normalization
– IP packet fragmentation
– TCP flow reassembly (to defend against fragroute)
– RPC fragmentation
– URL Obfuscation
– HTML Obfuscation
– Telnet/FTP Evasion by \backspace or
\delete keys
• Normalization translates data into the
canonical form
Practical Issues on Data
Normalization (II)
• Hamsa with data normalization works better
• Without or with weak data normalization, Hamsa still works
– But because the data may appear in different encodings, it may produce multiple signatures for a single worm
– Needs sufficient samples for each form of encoding
Outline
• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion
Experiment Methodology
• Experimental setup:
– Suspicious pool:
» Three pseudo-polymorphic worms based on real exploits (Code-Red II, Apache-Knacker and ATPhttpd)
» Two polymorphic engines from the Internet (CLET and TAPiON)
– Normal pool: 2-hour departmental HTTP trace (326MB)
• Signature evaluation:
– False negatives: 5000 generated worm samples per worm
– False positives:
» 4-day departmental HTTP trace (12.6 GB)
» 3.7GB web crawl including .mp3, .rm, .ppt, .pdf, .swf, etc.
» /usr/bin of Linux Fedora Core 4
Results on Signature Quality
• Code-Red II: training FN 0, training FP 0, evaluation FN 0, evaluation FP 0, binary evaluation FP 0; signature {'.ida?': 1, '%u780': 1, ' HTTP/1.0\r\n': 1, 'GET /': 1, '%u': 2}
• CLET: training FN 0, training FP 0.109%, evaluation FN 0, evaluation FP 0.06236%, binary evaluation FP 0.268%; signature {'0\x8b': 1, '\xff\xff\xff': 1, 't\x07\xeb': 1}
• Single worm with noise
– Suspicious pool size: 100 and 200 samples
– Noise ratio: 0%, 10%, 30%, 50%, 70%
– Noise samples randomly picked from the normal pool
– Always produced the above signatures and accuracy
Results on Signature Quality (II)
• Suspicious pool with high noise ratio:
– For noise ratios of 50% and 70%, sometimes we produce two signatures: one is the true worm signature, the other comes solely from the noise, due to the locality of the noise
– The false positives of these noise signatures turn out to be very small:
» Mean: 0.09%
» Maximum: 0.7%
• Multiple worms with noise give similar results
Experiment: U-bound evaluation
• u(i) = u(1) · ur^(i−1), for 1 ≤ i ≤ k*
• To be conservative we chose k* = 15
– u(k*) = u(15) = 9.16×10^-6
• u(1) and ur evaluation
– We tested u(1) = [0.02, 0.04, 0.06, 0.08, 0.10, 0.20, 0.30, 0.40, 0.5]
– and ur = [0.20, 0.40, 0.60, 0.8]
– The minimum (u(1), ur) that worked for all our worms was (0.08, 0.20)
– In practice, we use the conservative values (0.15, 0.5)
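A quick check of this schedule with the conservative values used here (u(1) = 0.15, ur = 0.5, k* = 15):

u1, ur, k_star = 0.15, 0.5, 15
u = [u1 * ur ** (i - 1) for i in range(1, k_star + 1)]
print(f"u(k*) = {u[-1]:.2e}")   # 9.16e-06, matching the value above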
Speed Results
• Implementation in C++/Python
– 500 samples with 20% noise and a 100MB normal traffic pool: 15 seconds on a Xeon 2.8GHz, 112MB memory consumption
• Speed comparison with Polygraph
– Asymptotic runtime: O(T) vs. O(|M|²); when |M| increases, T doesn't grow as fast as |M|!
– Experimental: 64 to 361 times faster (Polygraph vs. ours, both in Python)
[Plot: number of tokens (0-3000) vs. suspicious pool size (100-400 samples) for 20%, 30%, 40% and 50% noise]
Experiment: Sample requirement
• Coincidental-pattern attack [Polygraph]
• Results
– For the three pseudo worms, 10 samples give good results
– CLET and TAPiON need at least 50 samples
• Conclusion
– For better signatures, to be conservative, at least 100+ samples are needed
– This requires scalable and fast signature generation!
Token-fit Attack Can Fail Polygraph
• Polygraph: hierarchical clustering to find
signatures w/ smallest false positives
• Using the token distribution of the noise in the suspicious pool, the attacker can make the worm samples look more like noise traffic
– Different worm samples encode different
noise tokens
• Our approach can still work!
Token-fit attack could make
Polygraph fail
[Illustration] Noise samples N1-N3 and worm samples W1-W3 merge pairwise into candidates 1-3; the candidates CANNOT merge further, so NO true signature is found!
Experiment: Token-fit attack
• Suspicious pool of 50 samples with 50% noise
• Craft each worm sample to resemble a different noise sample
• Results
– Polygraph: 100% false negatives
– Hamsa can still get the correct signature as before!
Outline
• Motivation
• Hamsa Design
• Model-based Signature Generation
• Evaluation
• Related Work
• Conclusion
Related works
• Network or host based: Hamsa: Network; Polygraph: Network; CFG: Network; PADS: Host; Nemean: Host; COVERS: Host; Malware Detection: Host
• Content or behavior based: Hamsa: Content; Polygraph: Content; CFG: Behavior; PADS: Content; Nemean: Content; COVERS: Behavior; Malware Detection: Behavior
• Noise tolerance: Hamsa: Yes; Polygraph: Yes (slow); CFG: Yes; PADS: No; Nemean: No; COVERS: Yes; Malware Detection: Yes
• Multiple worms in one protocol: Hamsa: Yes; Polygraph: Yes (slow); CFG: Yes; PADS: No; Nemean: Yes; COVERS: Yes; Malware Detection: Yes
• On-line signature matching: Hamsa: Fast; Polygraph: Fast; CFG: Slow; PADS: Fast; Nemean: Fast; COVERS: Fast; Malware Detection: Slow
• Generality: Hamsa: general purpose; Polygraph: general purpose; CFG: general purpose; PADS: general purpose; Nemean: protocol specific; COVERS: server specific; Malware Detection: general purpose
• Provable attack resilience: Hamsa: Yes; all others: No
• Information exploited: Hamsa: egp; Polygraph: egp; CFG: p; PADS: egp; Nemean: e; COVERS: eg; Malware Detection: p
Conclusion
• Network-based signature generation and matching are important and challenging
• Hamsa: automated signature generation
– Fast
– Noise tolerant
– Provable attack resilience
– Capable of detecting multiple worms in a single application protocol
• Proposed a model to describe the worm invariants
Questions ?
Results on Signature Quality (II)
• Suspicious pool with high noise ratio:
– For noise ratios of 50% and 70%, sometimes we produce two signatures: one is the true worm signature, the other comes solely from noise
– The false positives of these noise signatures turn out to be very small:
» Mean: 0.09%
» Maximum: 0.7%
• Multiple worms with noise give similar results
Normal Traffic Poisoning
Attack
• We found our approach is not sensitive to the normal traffic pool used
• History: the last 6 months as the time window
• The attacker has to poison the normal traffic 6 months ahead!
• In 6 months the vulnerability may have been patched!
• Poisoning a popular protocol is very difficult
Red Herring Attack
• Hard to implement
• A dynamic-updating problem; again, our approach is fast
• Partial signature matching, discussed in the extended version
Coincidental Attack
• As mentioned in the Polygraph paper, this attack increases the sample requirement
• Again, our approach is scalable and fast
Model Uniqueness of Invariants
• Let the worm have a set of invariant tokens; determine their order by:
FP({t1}) ≤ FP({tj}) for all j
t1: the token with the minimum false positive in normal traffic; u(1) is the upper bound of the false positive of t1
FP({t1, t2}) ≤ FP({t1, tj}) for all j ≠ 1
t2: the token with the minimum joint false positive with t1; FP({t1, t2}) is bounded by u(2)
FP({t1, …, ti}) ≤ FP({t1, …, ti−1, tj}) for all j ∉ {1, …, i−1}
ti: the token with the minimum joint false positive with {t1, t2, …, ti−1}; FP({t1, t2, …, ti}) is bounded by u(i)
The total number of tokens is bounded by k*
Problem Formulation
Noisy Token Multiset Signature Generation Problem:
INPUT: suspicious pool M and normal traffic pool N; value r < 1.
OUTPUT: a multiset-of-tokens signature S = {(t1, n1), …, (tk, nk)} that maximizes the coverage in the suspicious pool while keeping the false positive in the normal pool below r
• Without noise, a polynomial-time algorithm exists
• With noise, NP-Hard
Generalizing Signature Generation
with noise
• BEST signature = balanced signature
– Balance the sensitivity with the specificity
– But how? Introduce a scoring function score(cov, fp, …) to evaluate the goodness of a signature
– Currently used:
score(COV, FP, LEN) = a^(−log10(β + FP)) · COV + γ · LEN
(β is a small constant avoiding log 0; γ is a small length weight)
» Intuition: it is better to reduce the coverage to 1/a of its value if the false positive becomes 10 times smaller
» Add a small weight on the signature length (LEN) to break ties between signatures with the same coverage and false positive
Generalizing Signature Generation
with noise
• Algorithm: similar
• Running time: same as the previous simple form
• Attack resilience guarantee: similar
Extension to multiple worm
• Iteratively use the single-worm detector to detect multiple worms
– In the first iteration, the algorithm finds the signature for the most popular worm in the suspicious pool. All other worms and normal traffic are treated as noise.
– Though the single-worm analysis applies to multiple worms, the bounds are not very promising.
Reason: high noise ratio
Token Extraction
• Extract a set of tokens with minimum length lmin and coverage COVmin, and for each token output the frequency vector
• Polygraph uses a suffix-tree-based approach: 20n space and time consuming
• Our approach:
– Enhanced suffix array, 4n space
– Much faster, at least 50(UPDATE) times!
– Can apply to Polygraph also
Calculate the false positive
• We need the false positives to check the U-bounds
• Again a suffix-array-based approach, but for a 300MB normal pool the 1.2GB suffix array is still large!
• Improvements
– Caching
– MMAP the suffix array; true memory usage: 150~250MB
– 2-level normal pool
– Hardware-based fast string matching
– Compress the normal pool and run string matching algorithms directly over the compressed strings
Future works
• Enhance the flow classifiers
– Cluster suspicious flows by return messages
– Malicious flow verification by replaying flows against Address Space Randomization enabled servers