CSIE, NCKU

Space-Time Tradeoffs in Software-based Deep Packet Inspection
Authors: Anat Bremler-Barr, Yotam Harchol, and David Hay
Published in Proc. IEEE HPSR 2011
Goal
• Software-based DPI
• Aho-Corasick (AC) based, exact matching
• Reduced memory size
  • Fit in the CPU cache
  • Worst-case throughput
Aho-Corasick
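Given a state s:
• Depth(s): the number of symbols on the path from S0 to s, e.g., Depth(S4) = 2, Depth(S13) = 3
• Label(s): the symbols along that path, e.g., Label(S4) = BD, Label(S13) = BCA, Label(S12) = CDBCAB
• Forward transitions go to deeper states.
• Failure transitions are followed when no forward transition matches; failure transitions to S0 are omitted from the figure.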
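The definitions above can be made concrete with a small sketch of the standard Aho-Corasick construction. The pattern set (BD, BCA, CDBCAB, E) is hypothetical: it reproduces the labels quoted on the slide, but not necessarily the slide's full automaton or its state numbering, so states are looked up by label instead of by S-number.

```python
from collections import deque


class ACState:
    def __init__(self, depth, label):
        self.depth = depth      # Depth(s): number of symbols on the path from S0
        self.label = label      # Label(s): the symbols spelled from S0 to s
        self.forward = {}       # forward transitions: symbol -> deeper state
        self.failure = None     # failure transition (None only for S0)
        self.output = []        # patterns reported at s (own and via failure links)


def build_aho_corasick(patterns):
    root = ACState(0, "")
    states = [root]
    # Build the trie of forward transitions.
    for pattern in patterns:
        node = root
        for symbol in pattern:
            if symbol not in node.forward:
                child = ACState(node.depth + 1, node.label + symbol)
                node.forward[symbol] = child
                states.append(child)
            node = node.forward[symbol]
        node.output.append(pattern)
    # Compute failure transitions breadth-first, so shallower states are done first.
    queue = deque()
    for child in root.forward.values():
        child.failure = root  # depth-1 states always fail to S0
        queue.append(child)
    while queue:
        node = queue.popleft()
        for symbol, child in node.forward.items():
            f = node.failure
            while f is not root and symbol not in f.forward:
                f = f.failure
            child.failure = f.forward.get(symbol, root)
            child.output = child.output + child.failure.output
            queue.append(child)
    return root, states


def search(root, text):
    """Return (end_index, pattern) for every match in text."""
    node, hits = root, []
    for i, symbol in enumerate(text):
        while node is not root and symbol not in node.forward:
            node = node.failure  # follow failure transitions
        node = node.forward.get(symbol, root)
        hits.extend((i, p) for p in node.output)
    return hits


root, states = build_aho_corasick(["BD", "BCA", "CDBCAB", "E"])
by_label = {s.label: s for s in states}
print(by_label["BD"].depth, by_label["BCA"].depth)  # 2 3
print(by_label["CDBCAB"].failure.label)             # B
print(search(root, "CDBCAB"))                       # [(4, 'BCA'), (5, 'CDBCAB')]
```

The breadth-first order matters: a state's failure target is always shallower, so its output list is already complete when it is copied.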
State Structure (1/3)
Lookup Table Format
[Figure: lookup-table rows, each holding one next-state entry per input symbol (A, B, C, D, E, …)]
The lookup table format is used for states with more than 64 forward transitions.
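A minimal sketch of a lookup-table state follows. It assumes (not stated on the slide) that a row stores only forward transitions, indexed directly by byte value with None where no transition exists, plus a separate failure pointer, and that pointers take 4 bytes. The class and function names are hypothetical, and the example transitions are read off the slides' figures only loosely. The fixed cost of a full 256-entry row is presumably why this format is reserved for states with many forward transitions.

```python
ALPHABET_SIZE = 256  # one table entry per possible byte value


class LookupTableState:
    """Hypothetical lookup-table state: forward transitions indexed directly by
    symbol (None where no forward transition exists) plus a failure pointer."""

    def __init__(self, forward, failure):
        self.table = [None] * ALPHABET_SIZE
        for symbol, target in forward.items():
            self.table[symbol] = target
        self.failure = failure

    def next_forward(self, symbol):
        return self.table[symbol]  # O(1): one array access, however many transitions


def step(states, root, state_id, symbol):
    """Follow failure transitions until a forward transition on symbol exists."""
    while True:
        target = states[state_id].next_forward(symbol)
        if target is not None:
            return target
        if state_id == root:
            return root  # S0 stays at S0 on a symbol it has no transition for
        state_id = states[state_id].failure


def row_size_bytes(pointer_bytes=4):
    """Memory for one row: ALPHABET_SIZE pointers (assumed 4 bytes each)."""
    return ALPHABET_SIZE * pointer_bytes


# Illustrative states; integer ids stand for S0, S2, ...
states = {
    0: LookupTableState({ord("B"): 2, ord("C"): 7, ord("E"): 1}, failure=0),
    2: LookupTableState({ord("C"): 5, ord("D"): 4, ord("E"): 3}, failure=0),
}
print(step(states, 0, 2, ord("D")))  # 4: forward transition S2 -D-> S4
print(step(states, 0, 2, ord("B")))  # 2: S2 fails to S0, and S0 -B-> S2
print(row_size_bytes())              # 1024
```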
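State Structure (2/3)
Linear Format
[Figure: linear encoding; each state stores its failure target and lists its forward transitions as (symbol, next state) pairs, e.g., S2 (failure to S0): C -> S5, D -> S4, E -> S3; S5 (failure to S7): D -> S6]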
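The linear format can be sketched as below. The class name, the 1-byte symbol and 4-byte pointer sizes, and the exact memory layout are assumptions for illustration; the two example states follow the figure (S2 failing to S0, S5 failing to S7).

```python
from dataclasses import dataclass, field


@dataclass
class LinearState:
    """Hypothetical linear encoding: a failure pointer plus the forward
    transitions stored as a short list of (symbol, next_state) pairs."""
    failure: str
    pairs: list = field(default_factory=list)

    def next_forward(self, symbol):
        for sym, target in self.pairs:  # linear scan: cost grows with len(pairs)
            if sym == symbol:
                return target
        return None  # no forward transition: the caller follows self.failure

    def size_bytes(self, symbol_bytes=1, pointer_bytes=4):
        # one (symbol, pointer) pair per forward transition, plus the failure pointer
        return len(self.pairs) * (symbol_bytes + pointer_bytes) + pointer_bytes


s2 = LinearState("S0", [("C", "S5"), ("D", "S4"), ("E", "S3")])
s5 = LinearState("S7", [("D", "S6")])
print(s2.next_forward("D"), s2.next_forward("A"))  # S4 None
print(s2.size_bytes(), s5.size_bytes())            # 19 9
```

Memory grows with the number of forward transitions instead of the alphabet size, at the price of a scan per lookup, which suits states with few transitions.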
State Structure (3/3)
Bitmap Format
[Figure: bitmap encoding over the symbols A-E; S2 (failure to S0) stores bitmap 00111 followed by pointers S5, S4, S3 for C, D, E; S5 (failure to S7) stores bitmap 00010 followed by pointer S6 for D]
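The bitmap format can be sketched as follows. The bitmap has one bit per alphabet symbol, and pointers are stored only for set bits, in alphabet order. The pointer for a symbol is found by counting the set bits below it (a rank or popcount operation). The class name and the use of a Python int as the bitmap are illustrative assumptions. With the full byte alphabet the bitmap would take 256 bits (32 bytes); the example uses the figure's five-symbol alphabet.

```python
class BitmapState:
    """Hypothetical bitmap encoding: bit i is set iff there is a forward
    transition on alphabet[i]; pointers are stored only for set bits."""

    def __init__(self, alphabet, forward, failure):
        self.alphabet = alphabet
        self.index = {sym: i for i, sym in enumerate(alphabet)}
        self.bitmap = 0
        for sym in forward:
            self.bitmap |= 1 << self.index[sym]
        # pointers kept in alphabet order, matching the bit order
        self.pointers = [forward[sym] for sym in alphabet if sym in forward]
        self.failure = failure

    def next_forward(self, symbol):
        i = self.index[symbol]
        if not (self.bitmap >> i) & 1:
            return None  # no forward transition: the caller follows self.failure
        # rank: the number of set bits below position i selects the pointer slot
        rank = bin(self.bitmap & ((1 << i) - 1)).count("1")
        return self.pointers[rank]

    def bitmap_string(self):
        # written left to right in alphabet order, as on the slide (A..E)
        return "".join("1" if (self.bitmap >> i) & 1 else "0"
                       for i in range(len(self.alphabet)))


s2 = BitmapState("ABCDE", {"C": "S5", "D": "S4", "E": "S3"}, "S0")
s5 = BitmapState("ABCDE", {"D": "S6"}, "S7")
print(s2.bitmap_string(), s2.pointers)  # 00111 ['S5', 'S4', 'S3']
print(s2.next_forward("D"))             # S4
print(s5.bitmap_string())               # 00010
```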
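Path-Compression (1/3)
• One-way branch states (states with a single forward transition) are compressed into one state.
• Problem:
  • Incoming failure transitions: another state may fail into a state in the middle of the compressed path.
  • Outgoing failure transitions: each original state on the path has its own failure transition.
• Solution:
  • A state may be compressed only if no failure transition points to it.
  • The compressed state keeps multiple outgoing failure-transition fields, one per symbol of the path.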
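Path-Compression (2/3)
[Figure, before: Sa -A-> Sb -B-> Sc -C-> Sd, with failure transitions Sa -> Sx, Sb -> Sy, Sc -> Sz. After: a single state Sa stores the path length and next state (3, Sd), the label ABC, and the failure fields (A, Sx), (B, Sy), (C, Sz).]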
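A compressed state replaces several trie states, so the automaton's position must also record how many path symbols have matched so far. The sketch below models this with an explicit offset. This is an illustrative assumption, not necessarily how the paper lays out the state in memory, and the class and method names are hypothetical. On a mismatch at path position j, the failure field of position j is taken and the symbol is retried, as in ordinary Aho-Corasick.

```python
from dataclasses import dataclass


@dataclass
class CompressedPathState:
    """Hypothetical path-compressed state replacing a chain of one-way states.
    label[j] is the j-th symbol of the path; failures[j] is the failure
    transition of the original state that expected label[j]."""
    name: str
    label: str
    next_state: str
    failures: list

    def step(self, offset, symbol):
        """Process one symbol when `offset` path symbols already matched.
        Returns (state, offset, consumed)."""
        if symbol == self.label[offset]:
            if offset + 1 == len(self.label):
                return self.next_state, 0, True      # whole path matched
            return self.name, offset + 1, True       # still inside this state
        # Mismatch: take the failure field for this position and retry the symbol
        return self.failures[offset], 0, False


sa = CompressedPathState("Sa", "ABC", "Sd", ["Sx", "Sy", "Sz"])
print(len(sa.label), sa.next_state)  # 3 Sd (the "3, Sd" field on the slide)
print(sa.step(1, "B"))               # ('Sa', 2, True)
print(sa.step(2, "A"))               # ('Sz', 0, False)
```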
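Path-Compression (3/3)
Tuck et al. (INFOCOM 2004)
[Figure, before: Sa -A-> Sb -B-> Sc -C-> Sd with failure transitions to Sx, Sy, Sz, and Si -T-> Sj -A-> Sk with failure transitions to Sp, Sq; the failure transition of Sk points to Sb. After: Sa becomes (3, Sd) with label ABC and failure fields (A, Sx), (B, Sy), (C, Sz); Si becomes (2, Sk) with label TA and failure fields (T, Sp), (A, Sq). The failure transition of Sk now has no valid target (???) because Sb was compressed away.]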
Path Compression: Before and After
[Figure: Aho-Corasick automaton before and after path compression, traced on the texts CDBCAB and CDBCAA]
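Leaves-Compression
• In the original trie, leaves consist only of a failure transition.
[Figure: original trie Sa -A-> Sb -B-> Sc, where the leaf Sc holds only its failure transition (*, Sx). After the 1st pass, Sa keeps (A, Sb, 0) and Sb becomes (B, Sx, 1); after the 2nd pass, Sa becomes (AB, Sx, 1).]
• One bit is added to each forward transition to indicate that it leads to an accepting state.
• The process can be applied recursively.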
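One pass of leaves compression can be sketched as follows. Each leaf is dropped, and every transition into a leaf is redirected to the leaf's failure target. Each forward transition gains a bit recording whether its original target was accepting. The sketch assumes "accepting" already includes matches reported through failure links. It does not implement the recursive second pass that merges Sa and Sb into (AB, Sx, 1). All names are hypothetical, and Sx plays the role of the root in this toy automaton.

```python
class State:
    def __init__(self, name, accepting=False):
        self.name = name
        self.forward = {}            # symbol -> State (trie edges)
        self.failure = None          # failure transition (None only for the root)
        self.accepting = accepting   # True if reaching this state reports a match


def is_leaf(state):
    # In the original trie a leaf holds nothing but its failure transition
    return not state.forward


def resolve(state):
    # A leaf behaves exactly like its failure target, so skip over leaves
    while state.failure is not None and is_leaf(state):
        state = state.failure
    return state


def compress_leaves(states):
    """One pass: drop leaves, redirect transitions into them, add accept bits.
    Returns {name: (edges, failure_name)} with edges = {symbol: (target, bit)}."""
    compressed = {}
    for s in states:
        if is_leaf(s):
            continue
        edges = {sym: (resolve(t).name, int(t.accepting))
                 for sym, t in s.forward.items()}
        failure = resolve(s.failure).name if s.failure is not None else None
        compressed[s.name] = (edges, failure)
    return compressed


sx, sa, sb, sc = State("Sx"), State("Sa"), State("Sb"), State("Sc", accepting=True)
sa.forward["A"] = sb
sb.forward["B"] = sc
for s in (sa, sb, sc):
    s.failure = sx
result = compress_leaves([sx, sa, sb, sc])
print(result["Sa"])  # ({'A': ('Sb', 0)}, 'Sx')
print(result["Sb"])  # ({'B': ('Sx', 1)}, 'Sx')
```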
Use both techniques
[Figure: a path Sa -B-> Sb -C-> Sc -D-> Sd compressed into one state whose symbols carry the bits (B, 0), (C, 1), (D, 1)]
• Add one bit for every symbol of the compressed path.
• Set the bit of the i-th symbol when either:
(1) the transition on the first i symbols of the path leads to an accepting state, or
(2) the failure transition of the pre-compression state reached after the first i symbols of the path points to a leaf.
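The bit rule can be sketched as below. For each prefix of the path, the original (pre-compression) state reached is summarized as a pair (is_accepting, failure_goes_to_leaf), and the bit is set if either holds. A walk over the compressed path then reports a match wherever a set bit is passed. The pairs are hypothetical, chosen to reproduce the figure's bits 0, 1, 1 for the path BCD, and the function and class names are illustrative.

```python
from dataclasses import dataclass


def path_bits(reached):
    """reached[i-1] describes the original state reached after the first i path
    symbols as (is_accepting, failure_goes_to_leaf); returns one bit per symbol."""
    return [int(accepting or fails_to_leaf) for accepting, fails_to_leaf in reached]


@dataclass
class CompressedPath:
    label: str   # symbols of the compressed path
    bits: list   # bits[i-1] is the bit of the i-th symbol

    def matches(self, text, start):
        """Walk the path over text[start:]; return (positions where a set bit
        reports a match, number of path symbols matched before stopping)."""
        hits = []
        for i, symbol in enumerate(self.label):
            pos = start + i
            if pos >= len(text) or text[pos] != symbol:
                return hits, i
            if self.bits[i]:
                hits.append(pos)
        return hits, len(self.label)


bits = path_bits([(False, False), (True, False), (False, True)])
path = CompressedPath("BCD", bits)
print(bits)                     # [0, 1, 1]
print(path.matches("XBCD", 1))  # ([2, 3], 3)
print(path.matches("BCE", 0))   # ([1], 2)
```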
Leaves Compression: Before and After
Pointer Compression
• Many transitions go to states of small depth.
  • 31% of the failure transitions go to depth-1 states.
  • Another 35% of the failure transitions go to depth-2 states.
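Variable-Size Pointers
• Two pointer lengths: 2 bits and 2 + log2|S| bits, where |S| is the number of states.
• 00: go to state S0.
• 01: go to a depth-1 state, namely the state S0 reaches on the current symbol.
• 10: go to a depth-2 state, namely the state S0 reaches on the last symbol followed by the current symbol. Few such symbol pairs are valid, so these states are located by hashing.
• 11: go to the next state through a regular pointer.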
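A sketch of the 2-bit code scheme follows, with pointers written as bit strings for readability. The tables depth1 (symbol to depth-1 state) and depth2 (symbol pair to depth-2 state, standing in for the hash table) are hypothetical inputs. Which symbols count as "current" and "last" follows the slide's wording, and the exact convention is an assumption. A regular pointer is assumed to use ceil(log2|S|) bits after the 2-bit code.

```python
import math


def pointer_width(num_states):
    # regular pointer: 2-bit code + ceil(log2 |S|) bits of state id
    return 2 + max(1, math.ceil(math.log2(num_states)))


def encode_pointer(target, cur, prev, depth, depth1, depth2, num_states):
    """Hypothetical encoder. depth: state -> depth; depth1: symbol -> depth-1
    state; depth2: (prev, cur) -> depth-2 state (the hashed pairs)."""
    if depth[target] == 0:
        return "00"                                   # S0
    if depth[target] == 1 and depth1.get(cur) == target:
        return "01"                                   # implied by cur
    if depth[target] == 2 and depth2.get((prev, cur)) == target:
        return "10"                                   # implied by (prev, cur)
    width = pointer_width(num_states) - 2
    return "11" + format(target, f"0{width}b")        # regular pointer


def decode_pointer(bits, cur, prev, depth1, depth2):
    code = bits[:2]
    if code == "00":
        return 0
    if code == "01":
        return depth1[cur]
    if code == "10":
        return depth2[(prev, cur)]
    return int(bits[2:], 2)


depth = {0: 0, 1: 1, 2: 1, 5: 2, 9: 4}
depth1 = {"E": 1, "B": 2}
depth2 = {("B", "C"): 5}
print(encode_pointer(2, "B", "A", depth, depth1, depth2, 16))  # 01
print(encode_pointer(5, "C", "B", depth, depth1, depth2, 16))  # 10
print(encode_pointer(9, "A", "A", depth, depth1, depth2, 16))  # 111001
```

Only the 11 case pays for a full state id; the other three codes cost 2 bits because the target can be recomputed from the input symbols.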
Huffman Coding
• Huffman coding assigns short codes to frequent symbols and long codes to infrequent ones.
• A lookup table converts each symbol to its Huffman code.
• This idea is not used.
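For reference, here is a minimal sketch of building the symbol-to-code lookup table with standard Huffman coding, even though the slides say the idea is not used. The symbol frequencies are made up for illustration. Codes for tied weights depend on tie-breaking, so only the code lengths are fixed.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a symbol -> code lookup table from symbol frequencies."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        (_, _, table), = heap
        return {sym: "0" for sym in table}
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


codes = huffman_codes({"a": 5, "b": 2, "c": 1, "d": 1})
print({s: len(c) for s, c in sorted(codes.items())})  # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
```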
Evaluation Environment
Two environments:
• Core 2 Duo, 2.53 GHz (2 cores), 32 KB L1, 3 MB L2
• Core i7, 2.93 GHz (4 cores), 32 KB L1, 256 KB L2, 8 MB L3
Evaluation Traffic
Patterns:
• Snort
• ClamAV (partial)
Traffic:
• DARPA (real-life)
• Exhaustive traversal
• Failure-path traversal
• Worst case
Space Requirement
Throughput
Memory Access
L1 Cache Miss Ratio
Miss ratio of Larger L2 Cache