A DFA with Extended Character-Set for Fast Deep Packet Inspection
Authors: Cong Liu, Yan Pan, Ai Chen, and Jie Wu
Presenter: Xiao-Min Zheng
Date: 2016/11/09
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.

Introduction
DPI is becoming increasingly important in classifying and controlling network traffic.
We focus on a general-purpose processor approach.
NFA implementations of regular expressions require a nondeterministic number of main memory accesses per byte.
DFA implementations of regular expressions require a very large memory space to store their transition tables.

National Cheng Kung University CSIE Computer & Internet Architecture Lab

Introduction
DFA/EC, a general model of DFA, removes part of each DFA state and incorporates it into the next input character.
This solution focuses on state reduction, achieved by doubling the size of the character-set.
Our solution requires only a single main memory access for each byte in the traffic payload.
DFA/EC is simple, easy to implement, and easy to update thanks to its fast construction speed.
DFA/EC can be combined with other compression approaches to provide a better level of compression.

The Conceptual DFA/EC
A DFA can be constructed from a set of NFAs.
The number of DFA states is the number of possible combinations of NFA states that can be simultaneously active, which can be exponential in the number of NFA states.
Let N be the set of NFA states, and let D be the set of DFA states.
1) for every d ∈ D, d ⊆ N;
2) D ⊂ 2^N;
3) |N| ≪ |D| ≪ |2^N| = 2^|N|.

The Conceptual DFA/EC
The reasons why NFAs cause state explosion:
1) States 2, 6, and 7 are more likely to be active;
2) A frequently active NFA state is more likely to be active simultaneously with other sets of states.
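The relation |N| ≪ |D| ≤ 2^|N| and the role of frequently active states can be illustrated with a small subset-construction sketch. The toy NFA below ("the n-th symbol from the end is a", over the alphabet {a, b}) is an assumption chosen for illustration; it is not the NFA shown on the slides, and the function names are hypothetical.

```python
from collections import deque

def toy_nfa(n):
    """Toy NFA over {a, b}: accepts strings whose n-th symbol from the end is 'a'.
    State 0 loops on every symbol, so it is always active."""
    nfa = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n):
        nfa[(i, "a")] = {i + 1}
        nfa[(i, "b")] = {i + 1}
    return nfa

def subset_construction(nfa, start, alphabet):
    """Return all reachable DFA states, each a frozenset of NFA states."""
    start_set = frozenset([start])
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for ch in alphabet:
            # Union of the NFA targets of every active state on this symbol.
            nxt = frozenset(t for s in current for t in nfa.get((s, ch), ()))
            if nxt and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def count_states(n):
    """Return (number of NFA states, number of reachable DFA states)."""
    return n + 1, len(subset_construction(toy_nfa(n), 0, "ab"))

def always_active_in_every_dfa_state(n):
    """True if NFA state 0 is a member of every reachable DFA state."""
    return all(0 in d for d in subset_construction(toy_nfa(n), 0, "ab"))

print(count_states(4))  # (5, 16)
```

In this toy case the always-active state 0 appears in every DFA state, combined with every possible subset of the other states, which is the kind of frequently active state the slide identifies as a cause of state explosion.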

The Conceptual DFA/EC
In DFA/EC we select some of the most frequently active NFA states and incorporate them into the character-set (or alphabet) of the DFA to form a slightly larger extended character-set.
Three key terms of DFA/EC:
1) main DFA (D1): the DFA within a DFA/EC that implements the remaining, infrequently active NFA states;
2) complementary states (N2): the NFA states that are selected and incorporated into the character-set;
3) main states (N1): the remaining NFA states.

The Formal Model of DFA/EC
Theorem 1: For any DFA, there exists an equivalent DFA/EC.
A DFA/EC can be defined by (D1, D2, C, Ce, He, Te, M):
D1: set of sets of simultaneously active main states
D2: set of sets of simultaneously active complementary states
C: original character-set (or alphabet)
Ce: extended character-set
D: set of conventional DFA states

The Formal Model of DFA/EC
He: generates an extended character ce ∈ Ce
Te: generates a pair of partial states
M: generates another partial state

The Formal Model of DFA/EC
A DFA defined by (D, C, T), with T: D × C → D, is equivalent to another form of DFA, (D1, D2, C, T11, T12, T21, T22), where d1 and d2 are sets of simultaneously active NFA states.

An Efficient Implementation
Two constraints are placed on the complementary states:
1) the non-conflicting constraint;
2) the binary constraint.
The purpose of these constraints is to reduce the range of the function He.

An Efficient Implementation
Non-conflicting constraint
Under the non-conflicting constraint, for a given c, there can be at most one complementary state n2 in d2 that has a transition to one or several main states {n1}.

An Efficient Implementation
Binary constraint
The binary constraint concerns transitions within N2, while the non-conflicting constraint concerns transitions from states in N2 to states in N1.

An Efficient Implementation
The complementary states are determined by the independent-state method:
1) The first step is to estimate the extent to which each NFA state causes state explosion;
2) The second step is to determine the complementary states based on the results of the first step and the two constraints.

An Efficient Implementation
The number of states of a DFA constructed from an NFA depends on the level of independence among the states in the NFA:
1) if no pair of states in the NFA is independent, the size of the DFA equals that of the NFA;
2) if every pair of states in the NFA is independent, the size of the DFA is 2^|N|.
We measure the level of independence of an NFA state among the other NFA states by the number of times it appears in a pair of independent states, which we call the independent number of the state.

Evaluation
We developed several compilers, which read files of rules and create the corresponding inspection programs and transition tables.
We extracted rule-sets from the Snort rules.
We developed a synthetic payload generator.
We generated the inspection programs for the rule-sets, measured their storage, and ran them on the synthetic payloads to measure their performance.
Results:
1) On storage size
2) On memory bandwidth and speed

Evaluation
The total number of states (as a percentage of DFA)
The significant reduction is due to the removal of the frequently active complementary states in DFA/EC.

Evaluation
The total number of transitions (as a percentage of DFA)
The number of transitions is the sum of the numbers of transitions of each state.

Evaluation
The transition storage (bits / percentage of DFA)
The total minimum memory (storage) requirement of the transition tables, in bits.

Evaluation
The size of the per-flow state (bits)
We measure the sizes of the per-flow state of the inspection programs in bits and words.
The per-flow states for DFA, MDFA, and DFA/EC are [expression missing], [expression missing], and [expression missing] + N2.

Evaluation
Memory bandwidth (bits) with different rule-sets
The memory bandwidths of DFA, MDFA, and DFA/EC are [expressions missing].
The figure shows that the memory bandwidth of DFA/EC is very close to that of DFA and much smaller than that of MDFAs.

Evaluation
Memory accesses (times/KB)
The number of main memory accesses per KB of payload.
DFA/EC and DFA have the minimum number of main memory accesses, while that of MDFAs increases in proportion to M.
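The access counts compared on this slide can be sketched with a toy model. The code below assumes that M denotes the number of component DFAs in an MDFA, that every transition-table lookup costs one main memory access (caches are ignored), and that the tables are a hypothetical 2-state DFA; none of these details come from the slides. DFA/EC itself is not modeled here; the slide only states that its access count matches that of a single DFA.

```python
def make_toy_table():
    """Hypothetical 2-state DFA: the state is 1 iff the last byte was 'a' (0x61)."""
    row = [1 if b == 0x61 else 0 for b in range(256)]
    return [row, list(row)]

def run_dfa(table, payload, start=0):
    """Scan the payload with one DFA and count transition-table lookups."""
    state, accesses = start, 0
    for byte in payload:
        state = table[state][byte]  # one lookup per payload byte
        accesses += 1
    return state, accesses

def run_mdfa(tables, payload, start=0):
    """Scan the payload with M DFAs in parallel and count all lookups."""
    states = [start] * len(tables)
    accesses = 0
    for byte in payload:
        for i, table in enumerate(tables):
            states[i] = table[states[i]][byte]  # one lookup per DFA per byte
            accesses += 1
    return states, accesses

payload = bytes(1024)  # 1 KB of zero bytes as a stand-in payload
_, dfa_accesses = run_dfa(make_toy_table(), payload)
_, mdfa_accesses = run_mdfa([make_toy_table() for _ in range(3)], payload)
print(dfa_accesses, mdfa_accesses)  # 1024 3072
```

Under these assumptions a single DFA performs 1024 lookups per KB, and an MDFA with M = 3 performs three times as many, matching the proportional growth described on the slide.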

Evaluation
Inspection speed (Java and C++)
We measure the speed of the inspection programs with both Java and C++ implementations on a Unix machine with 16 GB of 1333 MHz DDR3 memory and a 2.66 GHz Intel Core i5 CPU.
The speeds of the inspection programs depend on the hardware and software on which they are implemented.
[Figures: inspection speed, Java and C++]

Conclusion
DFA/EC is a regular-expression-based deep packet inspection algorithm for general-purpose processors.
This solution requires only a single main memory access for each byte in the traffic payload.
DFA/ECs are very compact, have a smaller memory bandwidth, and run faster than DFAs.
We will combine DFA/EC with existing transition compression and character-set compression techniques, and perform experiments with more rule-sets.