Thesis_ECDS - Computer and Internet Architecture Laboratory

A DFA with Extended Character-Set for
Fast Deep Packet Inspection
Authors: Cong Liu, Yan Pan, Ai Chen, and Jie Wu
Presenter: Xiao-Min Zheng
Date: 2016/11/09
Department of Computer Science and Information Engineering
National Cheng Kung University, Taiwan R.O.C.
Introduction




DPI is becoming increasingly important in classifying and
controlling network traffic.
We focus on a general-purpose processor approach.
NFA implementations of regular expressions incur a
nondeterministic number of main memory accesses per byte.
DFA implementations of regular expressions require a
very large memory space to store their transition tables.
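To make the contrast concrete, the following minimal sketch counts table lookups per payload for a toy NFA and a toy DFA that both search for the pattern "ab". The automata, the function names, and the choice to count one lookup per active NFA state are illustrative assumptions, not taken from the paper.

```python
def nfa_lookups(nfa, start, payload):
    """Simulate an NFA and count one table lookup per active state per byte."""
    active = {start}
    lookups = 0
    for c in payload:
        nxt = set()
        for s in active:
            lookups += 1
            nxt |= nfa.get((s, c), set())
        nxt.add(start)  # keep scanning: a match may begin at any offset
        active = nxt
    return lookups

def dfa_lookups(dfa, start, payload):
    """Simulate a DFA: exactly one table lookup per byte."""
    state = start
    lookups = 0
    for c in payload:
        lookups += 1
        state = dfa.get((state, c), start)
    return lookups

# Toy automata for the pattern "ab" (hypothetical example data).
toy_nfa = {(0, "a"): {1}, (1, "b"): {2}}
toy_dfa = {(0, "a"): 1, (1, "a"): 1, (1, "b"): 2, (2, "a"): 1}

nfa_count = nfa_lookups(toy_nfa, 0, "aab")
dfa_count = dfa_lookups(toy_dfa, 0, "aab")
```

The NFA count depends on how many states happen to be active at each byte, while the DFA count always equals the payload length.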
National Cheng Kung University CSIE
Computer & Internet Architecture Lab
2
Introduction





DFA/EC, a general model of DFA, removes part of each
DFA state and incorporates it into the next input character.
This novel solution focuses on state reduction by
doubling the size of the character-set.
Our solution requires only a single main memory access for
each byte in the traffic payload.
DFA/EC is simple, easy to implement, and easy to update
due to fast construction speed.
DFA/EC can be combined with other compression
approaches to provide a better level of compression.
The Conceptual DFA/EC



A DFA can be constructed from a set of NFAs.
The number of DFA states is the number of possible
combinations of NFA states that can be
simultaneously active, which can be exponential in the
number of NFA states.
Let N be the set of NFA states, and let D be the set of
DFA states.
1) ∀d ∈ D, d ⊆ N;
2) D ⊂ 2^N;
3) |N| ≪ |D| ≪ |2^N| = 2^|N|
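The three relations above can be checked on a small example. The sketch below runs the standard subset construction on a toy NFA for "the third character from the end is a"; the NFA and all names are illustrative assumptions.

```python
from collections import deque

def subset_construction(nfa, start, alphabet):
    """Return all reachable DFA states, each a frozenset of NFA states."""
    d0 = frozenset([start])
    seen = {d0}
    queue = deque([d0])
    while queue:
        d = queue.popleft()
        for c in alphabet:
            nxt = set()
            for n in d:
                nxt |= nfa.get((n, c), set())
            nxt = frozenset(nxt)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Toy NFA (hypothetical): state 0 loops on every character, so it is always active.
N = {0, 1, 2, 3}
toy_nfa = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "a"): {2}, (1, "b"): {2},
    (2, "a"): {3}, (2, "b"): {3},
}
D = subset_construction(toy_nfa, 0, "ab")
```

Here |N| = 4 and the construction reaches 8 DFA states, each a subset of N, against an upper bound of 2^|N| = 16.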
The Conceptual DFA/EC

The reasons why NFAs cause state explosion:
1) States 2, 6, and 7 are more likely to be active;
2) A frequently active NFA state is more likely to be active
simultaneously with other sets of states.
The Conceptual DFA/EC


In DFA/EC, we select some of the most frequently active
NFA states and incorporate them into the character-set (or
the alphabet) of the DFA to form a slightly larger
extended character-set.
Three components of DFA/EC:
1) main DFA (D1): the DFA within a DFA/EC that
implements the remaining, infrequently active NFA states;
2) complementary states (N2): NFA states that are selected
and incorporated into the character-set;
3) main states (N1): the remaining NFA states.
The Formal Model of DFA/EC


Theorem 1: For any DFA, there exists an equivalent DFA/EC.
A DFA/EC can be defined by (D1, D2, C, Ce, He, Te, M):
D1: Set of simultaneously active sets of main states
D2: Set of simultaneously active sets of complementary states
C: Original character-set (or alphabet)
Ce: Extended character-set
D: Set of conventional DFA states
The Formal Model of DFA/EC
He: Generates an extended character ce∈Ce
Te: Generates a pair of partial states
M: Generates another partial state
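One way to picture how the three functions cooperate is the per-byte loop sketched below. The signatures He(d2, c), Te(d1, ce), and M(d2, c), the encoding of an extended character as a pair, and the toy NFA are all assumptions made for illustration; the slides do not give them. In this sketch the functions are computed on the fly from NFA transitions, whereas a real implementation would precompute Te as a table indexed by main-DFA state.

```python
def step(nfa, states, c, targets):
    """Union of NFA transitions on c from `states`, restricted to `targets`."""
    out = set()
    for n in states:
        out |= nfa.get((n, c), set())
    return frozenset(out & targets)

def make_dfa_ec(nfa, N1, N2):
    def He(d2, c):
        # Assumed encoding: the byte plus the main states that d2 activates on c.
        return (c, step(nfa, d2, c, N1))
    def Te(d1, ce):
        c, from_d2 = ce
        # Returns (next main part, complementary states activated by d1).
        return step(nfa, d1, c, N1) | from_d2, step(nfa, d1, c, N2)
    def M(d2, c):
        # Complementary states activated by complementary states.
        return step(nfa, d2, c, N2)
    return He, Te, M

def run_dfa_ec(nfa, N1, N2, start, payload):
    He, Te, M = make_dfa_ec(nfa, N1, N2)
    d1 = frozenset({start} & N1)
    d2 = frozenset({start} & N2)
    for c in payload:
        ce = He(d2, c)
        d1_next, d2_from_main = Te(d1, ce)
        d2 = d2_from_main | M(d2, c)  # right-hand side uses the old d2
        d1 = d1_next
    return d1 | d2

def run_subset(nfa, N, start, payload):
    """Reference: track the full set of active NFA states."""
    d = frozenset({start})
    for c in payload:
        d = step(nfa, d, c, N)
    return d

# Toy NFA (hypothetical); state 0 is always active, so it is chosen as complementary.
toy_nfa = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "a"): {2}, (1, "b"): {2},
    (2, "a"): {3}, (2, "b"): {3},
}
N1, N2 = {1, 2, 3}, {0}
result = run_dfa_ec(toy_nfa, N1, N2, 0, "abab")
```

On this toy input the split execution ends in the same set of active NFA states as the reference simulation.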
The Formal Model of DFA/EC

A DFA defined by (D, C, T), with T: D × C → D, can be written in
an equivalent form (D1, D2, C, T11, T12, T21, T22),
where d1 and d2 are sets of simultaneously active NFA states.
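A plausible reading of the four partial transition functions is written out below. It assumes that T_ij maps a partial state d_i and a character c to the states in N_j reached from d_i; this index convention is an assumption, not something stated on the slide. The decomposition then follows because the transition of a set of NFA states is the union of the transitions of its members.

```latex
\begin{align*}
T(d_1 \cup d_2,\, c) &= d_1' \cup d_2', \\
d_1' &= T_{11}(d_1, c) \cup T_{21}(d_2, c), \\
d_2' &= T_{12}(d_1, c) \cup T_{22}(d_2, c).
\end{align*}
```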
An Efficient Implementation

Two constraints on the complementary states:
1) non-conflicting constraint
2) binary constraint
The purpose of these constraints is to reduce the range of
function He.
An Efficient Implementation

Non-conflicting constraint

Under the non-conflicting constraint, for a given c,
there can be at most one complementary state n2 in d2
that has a transition to one or several main states, {n1}.
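The constraint as stated can be checked mechanically for a given d2. The helper below is a minimal sketch; the function name, the dictionary representation of NFA transitions, and the example data are hypothetical.

```python
def conflicting_characters(nfa, d2, N1, alphabet):
    """Return the characters for which more than one state in d2 reaches N1."""
    bad = []
    for c in alphabet:
        sources = [n2 for n2 in d2 if nfa.get((n2, c), set()) & N1]
        if len(sources) > 1:
            bad.append(c)
    return bad

# Hypothetical transitions: complementary states 0 and 4, main states 1 and 5.
toy_nfa = {(0, "a"): {0, 1}, (4, "a"): {5}, (4, "b"): {4}}
main_states = {1, 5}

conflicts_both = conflicting_characters(toy_nfa, {0, 4}, main_states, "ab")
conflicts_single = conflicting_characters(toy_nfa, {0}, main_states, "ab")
```

With both 0 and 4 in d2, the character "a" violates the constraint because two complementary states activate main states; with only state 0 there is no violation.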
An Efficient Implementation

Binary constraint

The binary constraint concerns transitions
within N2, while the non-conflicting constraint concerns
transitions from states in N2 to states in N1.
An Efficient Implementation

Determine the complementary states by the independent-state
method.
1) The first step is to estimate the extent to which each NFA
state causes state explosion;
2) The second step is to determine the complementary states
based on the results of the first step and the two
constraints.
An Efficient Implementation


The number of states of a DFA constructed from an NFA
depends on the level of independence among the states in
the NFA:
1) if no pair of states in the NFA is independent,
the size of the DFA equals that of the NFA;
2) if every pair of states in the NFA is independent, the
size of the DFA is 2^|N|.
We measure the level of independence of an NFA state
among other NFA states by using the number of times it
appears in a pair of independent states, which we call the
independent number of the state.
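Counting independent numbers is straightforward once the independent pairs are known. The sketch below takes the list of pairs as given, since how independence is decided is not shown on this slide; the function name and the example pairs are hypothetical.

```python
from collections import Counter

def independent_numbers(independent_pairs):
    """Count how many independent pairs each NFA state appears in."""
    counts = Counter()
    for a, b in independent_pairs:
        counts[a] += 1
        counts[b] += 1
    return counts

# Hypothetical independent pairs among NFA states 0..3.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2)]
numbers = independent_numbers(pairs)
top_state, top_count = numbers.most_common(1)[0]
```

State 0 appears in three pairs, so under this measure it is the strongest candidate for becoming a complementary state, subject to the two constraints.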
Evaluation





We developed several compilers, which read files of rules
and create the corresponding inspection programs and
the transition tables.
We extracted rule-sets from the Snort rules.
We developed a synthetic payload generator.
We generated the inspection programs for the rule-sets,
measured their storage, and ran them on the synthetic
payloads to measure their performance.
Results:
1) On storage size
2) On memory bandwidth and speed
Evaluation


The total number of states (percentage relative to DFA)
The significant reduction is due to the removal of the
frequently active complementary states in DFA/EC.
Evaluation


The total number of transitions (percentage relative to DFA)
The total number of transitions is the sum of the numbers of
transitions over all states.
Evaluation


The transition storage (bits / percentage relative to DFA)
The total minimum memory (storage) requirement of the
transition tables, in bits.
Evaluation



The size of the per-flow state (bits)
We measure the sizes of the per-flow state of the inspection
programs in terms of bits and words.
The per-flow states for DFA, MDFA, and DFA/EC are
…, …, and … + N2, respectively.
Evaluation



Memory bandwidth (bits) with different rule-sets
The memory bandwidths of DFA, MDFA, and DFA/EC are
The figure shows that the memory bandwidth of DFA/EC is
very close to that of DFA and is much smaller than that of MDFA.
Evaluation



Memory accesses (times/KB)
The number of main memory accesses per KB of payload.
DFA/EC and DFA have the minimum number of main
memory accesses, while that of MDFA increases in proportion to M.
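As a back-of-the-envelope illustration of this proportionality, the sketch below assumes that 1 KB is 1024 bytes and that every automaton consulted costs exactly one main memory access per byte. Both assumptions and the value M = 4 are hypothetical and are not measurements from the evaluation.

```python
BYTES_PER_KB = 1024  # assumption: 1 KB = 1024 bytes

def accesses_per_kb(automata_per_byte):
    """Main memory accesses per KB if each automaton costs one access per byte."""
    return BYTES_PER_KB * automata_per_byte

single_automaton_accesses = accesses_per_kb(1)  # a single DFA or DFA/EC
mdfa_accesses = accesses_per_kb(4)              # hypothetical MDFA with M = 4
```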
Evaluation



Inspection speed (Java)
We measure the speed of the inspection programs with both
Java and C++ implementations on a Unix machine with
16GB of 1333MHz DDR3 memory and a 2.66 GHz Intel
Core i5 CPU.
The speeds of the inspection programs depend on the
hardware and software on which they are implemented.
Evaluation
Inspection speed figures: Java and C++
Conclusion




DFA/EC is a regular-expression-based deep packet inspection
algorithm for general-purpose processors.
This solution requires only a single main memory access
for each byte in the traffic payload.
DFA/ECs are very compact, have a smaller memory
bandwidth, and run faster than DFA.
We will combine DFA/EC with the existing transition
compression and character-set compression techniques,
and perform experiments with more rule-sets.