TCS-TR-A-14-74
TCS Technical Report

Packet Classification for Global Network View of Software-Defined Networking
by Takeru Inoue, Toru Mano, Kimihiro Mizutani, Shin-ichi Minato, and Osamu Akashi

Division of Computer Science Report Series A
July 16, 2014
Hokkaido University, Graduate School of Information Science and Technology
Email: [email protected] Phone: +81-011-706-7682 Fax: +81-011-706-7682

Takeru Inoue∗† Toru Mano† Shin-ichi Minato‡ Kimihiro Mizutani† Osamu Akashi†

Abstract

In software-defined networking, applications are allowed to access a global view of the network so as to provide sophisticated functionalities, such as quality-oriented service delivery, automatic fault localization, and network verification. All of these functionalities rely on a well-studied technology, packet classification. Unlike the conventional classification problem, which searches for the action taken at a single switch, the global network view requires identifying the network-wide behavior of a packet, which is defined as a combination of switch actions. Conventional classification methods, however, do not support network-wide behaviors well, because these combinations partition the search space in a complicated way. This paper proposes a novel packet classification method that efficiently supports network-wide packet behaviors. Our method utilizes a compressed data structure named the multi-valued decision diagram, which allows it to manipulate the complex search space with several algorithms. Through detailed analysis, we optimize the classification performance as well as the construction of decision diagrams. Experiments with real network datasets show that our method identifies the packet behavior at 20.1 Mpps on a single CPU core with only 8.4 MB of memory; by contrast, conventional methods failed to work even with 16 GB of memory.
We believe that our method is essential for realizing advanced applications that can fully leverage the potential of software-defined networking.

∗ [email protected]
† NTT Network Innovation Laboratories, Yokosuka, Japan.
‡ Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan.

1 Introduction

Software-Defined Networking (SDN) [1] is a new paradigm that separates the control and forwarding planes in a network. The control plane, which is operated by a logically centralized controller, provides a global view of the distributed network state. This global view allows SDN applications running on the control plane to readily identify network-wide packet behaviors, e.g., which path the packet traverses, whether the packet is discarded in the network, and what actions the packet is subjected to at middle boxes. Many applications have been developed that utilize the network-wide packet behavior to offer sophisticated functionalities.

• In quality-oriented services [2] and an upcoming infrastructural paradigm called "network fabric" (separation of intelligence from the network core) [3, 4], the network edge is assumed to be able to determine the network-wide behavior of every incoming packet.
• Reference [5] proposed an automatic fault localization technique that exploits the differences between the actual packet behavior monitored in the network and the theoretical behavior estimated from the global view.
• References [6–11] estimate the network-wide packet behavior from the network status, in order to verify conformity with a network policy.

A method that can efficiently determine the network-wide behavior of a packet is essential to successfully implementing these advanced applications; they cannot work if only the action to be taken at a single switch can be identified.
1.1 Summary and Limitations of Prior Art

Packet classification [12–20], a functionality that determines the action taken on a packet based on multiple header fields, has been a key technology in modern networks for providing services beyond basic packet forwarding, such as access control, quality of service, and traffic monitoring and analysis. Packet classification has been extensively studied over the past fifteen years, and the state-of-the-art method [16] looks up large classifiers with tens of thousands of rules very efficiently.

These well-studied methods are, however, easily overwhelmed by the complexity of handling network-wide behaviors. This is because, as shown in Fig. 1, a network-wide behavior is defined as a combination of switch actions. Our preliminary experiments on the Stanford backbone network [10], a medium-scale network with 16 switches, 757,170 forwarding rules, and 1,584 ACL rules, revealed that the classification method of [16] fails to construct a classifier of the network-wide behaviors, exhausting the available 16 GB of memory. The main cause of this failure is that the search space defined by the packet header is partitioned into a huge number of blocks; 652 million rules are required to express this complicated space, a number that conventional methods cannot handle.

In order to efficiently handle such a complicated partitioned space, some network verifiers [6–8] employ Binary Decision Diagrams (BDDs) [21], a compressed data structure designed to represent Boolean functions.
Due to the great space efficiency of BDDs, only 6.6 MB of memory was required to represent the Stanford backbone in our experiments. However, a behavior must be looked up by linear search, which is impractically slow: each network-wide packet behavior is represented by its own Boolean function (BDD), which maps the header space exclusively to that behavior, and these functions must be tested one by one to determine the behavior of a packet (e.g., in Fig. 1, twelve Boolean functions have to be tested in turn). In our experiments on the Stanford backbone, which has 1,093 network-wide behaviors, BDDs achieved only 5.78 Kpps. Classification throughput should exceed 10 Mpps in order to examine packets at line rate on 10 Gbps links.

[Figure 1: Search space defined by packet header and network-wide packet behaviors. The example network at the top includes three switches (A, B, and C), each of which has three ports. Each switch has a table that maintains several rules, and each rule associates two 3-bit header fields (Field1 and Field2) with an action. The two-field header space (a square with axes Field1 and Field2, each ranging over 0 to 7) is divided into only seven blocks at switch A, but it is partitioned into 17 blocks for the twelve network-wide behaviors (I to XII), as shown at the bottom.]

Actions at switches (from Figure 1):

  Switch A                   Switch B                   Switch C
  Field1  Field2  Action     Field1  Field2  Action     Field1  Field2  Action
  *       [3,4]   Drop       [0,3]   [2,5]   B2         [2,2]   [1,3]   C2
  [4,5]   *       A1         [4,5]   *       B1         [4,5]   *       C1
  [6,7]   *       A3         [7,7]   [1,3]   B3         [6,7]   *       C3
  [0,3]   *       A2         *       *       Drop       *       *       Drop

As discussed so far, no method has been introduced that can identify network-wide packet behavior with the necessary time and space efficiency.
This deficiency imposes the following limitations on the SDN applications discussed earlier.

• Quality-oriented services and network fabric cannot be realized in a complex network like the Stanford backbone, whose complicated network-wide packet behaviors cannot be maintained by network edge switches using conventional classification methods.
• Reference [5] only employs predefined test packets for failure detection, since it is hard for conventional methods to identify the theoretical network-wide behavior of arbitrary packets; this restriction might hinder the resolution of faulty behaviors of production traffic.
• References [6–11] can test only a few properties every few msec, which may cause serious policy violations to be overlooked if many properties have to be checked in real time.

Similar limitations are faced by other applications. In addition, the limitations will hinder the advent of more advanced SDN applications.

To remove the limitations and leverage the full potential of SDN, it is imperative to develop a new classification method satisfying the following requirements: the complicated header space of network-wide packet behaviors should be represented compactly enough to fit within fast cache memory, and packets should be classified based on their network-wide behavior at a rate beyond 10 Mpps. Rapid construction and update of classifiers are also desirable.

1.2 Our Contributions

This paper proposes a novel packet classification method that meets the above requirements. Our idea is simple but very effective. Boolean functions, each of which is associated with a single behavior, are unified into a single multi-valued function that represents the whole classifier by itself. This multi-valued function is represented by a data structure named the multi-valued decision diagram (MDD) [22], a variant of BDD.
Since this unification removes the slow linear search from the classification process, the throughput can be greatly improved while maintaining the space efficiency of BDDs. Our method is two-fold.

• An efficient algorithm constructs the MDD from a set of BDDs representing Boolean functions. The construction process is analyzed and optimized by reducing it to Huffman coding. Upon receipt of a new packet behavior, the MDD can be incrementally updated in a short time.
• A very simple and fast algorithm examines the MDD to classify packets based on their behaviors. Another algorithm further accelerates the classification process by regarding a group of header bits as a single variable. The time-space tradeoff entailed in the bit aggregation is analyzed.

Our method is evaluated with three real network configurations: Internet2¹, Stanford backbone network [10], and Purdue campus network [23]. In the case of Stanford, the MDD requires only 8.4 MB of memory, which fits within fast CPU cache. The classification throughput is 20.1 Mpps on a single CPU core, nearly 3,500 times faster than conventional BDD methods. The MDD is constructed in 0.93 sec, and a new behavior can be added in just 29 msec. Our method does not require any special hardware support like TCAMs, and can be applied to any use case without significant difficulties. In this paper, our method is implemented as a fully software-based classifier, assuming its use in software-based SDN applications, but it is so simple that it can also be realized as a hardware-based system.

The rest of this paper is organized as follows. After defining our problem in Section 2, Section 3 reveals deficits of conventional methods. Section 4 designs our method, and Section 5 elaborates algorithms used in the method. Section 6 reports the experiments and their results, and Section 7 discusses application aspects.
Section 8 summarizes related work, and Section 9 concludes this paper.

¹ http://vn.grnoc.iu.edu/Internet2/fib/

2 Problem Statement

The logical search space defined by the L-bit packet header is denoted by $X = \{0, 1\}^L$, and is called the header space [10] (e.g., a square in Fig. 1 represents a header space of L = 3 + 3). Packet header x corresponds to a point in the header space, $x \in X$. The protocol-specific semantics of the packet header are ignored in our method, and a packet header is considered as a flat sequence of bits.

The header space is partitioned into equivalent subspaces, each of which corresponds exclusively to a single behavior (e.g., the bottom square of Fig. 1 has twelve equivalent subspaces). An equivalent subspace is not necessarily contiguous, and can consist of several blocks, where a block is a contiguous subspace. Let $X_i$ be the equivalent subspace that maps to the i-th behavior; it is defined as

$$X_i = \{x \in X : f_i(x) = \top\},$$

where $f_i(x)$ is a Boolean function indicating whether a packet with header x follows the i-th behavior, $f_i : X \to \{\bot, \top\}$, with $\bot$ denoting false and $\top$ denoting true. Since equivalent subspaces are mutually exclusive, $X_i \cap X_j = \emptyset$ $(i \neq j)$, and collectively exhaustive, $\bigcup_i X_i = X$, they form a partition P of header space X, as follows,

$$P = \{X_{\mathrm{I}}, X_{\mathrm{II}}, \ldots, X_{|P|}\}.$$

The equivalent subspaces, $X_i$'s, and the corresponding Boolean functions, $f_i$'s, are obtained from switch actions as follows [8]. Given $P_1$ and $P_2$ as arbitrary partitions of X, we define operation $\otimes$ as

$$P_1 \otimes P_2 = \{X' \cap Y' : X' \in P_1,\ Y' \in P_2,\ X' \cap Y' \neq \emptyset\}.$$

Using this operation, the whole header space is then partitioned into equivalent subspaces as follows,

$$P = \bigotimes_j P_j, \tag{1}$$

where $P_j$ is a partition defined by the actions of the j-th switch (e.g., a square in the middle of Fig. 1)².
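The operation $\otimes$ and Eq. (1) can be illustrated on a toy header space. The following is a minimal sketch, not the authors' implementation: partitions are modeled as explicit Python sets of `frozenset` blocks, and the names `refine`, `network_partition`, `P_A`, and `P_B`, as well as the two per-switch partitions themselves, are assumptions made up for this illustration. Enumerating headers explicitly is only feasible for very small L.

```python
from functools import reduce
from itertools import product


def refine(p1, p2):
    """P1 (x) P2: every non-empty intersection of a block of p1 with a block of p2."""
    return {x & y for x, y in product(p1, p2) if x & y}


def network_partition(switch_partitions):
    """Fold the refinement over all per-switch partitions, as in Eq. (1)."""
    return reduce(refine, switch_partitions)


# Toy 3-bit header space {0, ..., 7}; both per-switch partitions are hypothetical.
HEADER_SPACE = frozenset(range(8))
P_A = {frozenset({0, 1, 2, 3}), frozenset({4, 5}), frozenset({6, 7})}
P_B = {frozenset({0, 1}), frozenset({2, 3, 4, 5}), frozenset({6, 7})}

# The combined partition is finer than either input partition.
P = network_partition([P_A, P_B])
```

In this toy case two three-block partitions refine into four blocks, because the block {2, 3, 4, 5} of the second switch straddles two blocks of the first.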
This operation divides the header space into several blocks, which become combinatorially finer (e.g., the bottom square is much finer than the middle squares in Fig. 1). In other words, intersecting rules of different switches spawn new blocks to distinguish network-wide behaviors, whereas intersecting rules on a single switch are simply masked by higher-priority rules.

Multi-valued function F is defined by

$$F : X \to \{\mathrm{nil}, \mathrm{I}, \mathrm{II}, \ldots, |P|\},$$

where $F(x) = i$ if and only if header x is in the i-th equivalent subspace, $x \in X_i$. A multi-valued function that never takes the value nil is called a complete multi-valued function, and can be used as a packet classifier. Our goal in this paper is to construct an efficient data structure representing a complete multi-valued function, F, given a set of Boolean functions, $f_i$'s.

3 Inefficiencies of Conventional Packet Classification Methods

This section identifies why conventional packet classification methods fail to support network-wide packet behaviors. There are two major conventional approaches: algorithmic methods [12–16] usually construct a decision tree to represent the header space, while architectural methods [17–20] often try to reduce the number of rules to fit them in space-limited TCAMs. These conventional methods commonly depend on the following assumption: the header space is a multi-dimensional space with one dimension per header field (not per bit, as in our definition), and it is covered by a set of hypercubes, where a hypercube is a convex block defined by a range or prefix in each field. To resolve possible conflicts among hypercubes, they are given a total order. Normally, hypercubes are specified by a list of "5-tuple" rules (i.e., source/destination IP, source/destination port, and protocol). The computational complexity of both conventional approaches depends on the number of hypercubes, as follows.
• In a decision tree, the root node represents the whole header space, which is recursively divided at every intermediate node until only a few rules are left at a leaf node. The space complexity is known to be $O(N^D)$ for $O(\log N)$ classification time [24], where N is the number of hypercubes and D is the number of header fields. The required space in practice could be smaller than this worst-case bound, but it still depends on the number of hypercube rules, because the decision tree must be large enough to divide the rule set into subsets of a few rules each.
• Rule reduction methods reduce the number of hypercube rules by merging those of the same action into a single large hypercube; in the following example, the left set of hypercube rules is reduced to the right set.

  IP destination  Action        IP destination  Action
  0.0.0.0/3       Drop          32.0.0.0/3      A1
  32.0.0.0/3      A1            0.0.0.0/0       Drop
  64.0.0.0/2      Drop
  128.0.0.0/1     Drop

  The time complexity of TCAM razor, the most-cited rule reduction method, is roughly $O(DNAL^2)$, where A is the number of actions/behaviors, since it performs dynamic programming of $O(NAL^2)$ [25] for each header field.

² Note that these partitions, $P_j$'s, might be given for each port as well as for each switch, depending on the network configuration.

The number of hypercubes, N, is examined for the three real networks (the statistics are given in Section 6). To our knowledge, there is no technique that can directly calculate a set of hypercubes representing a header space of network-wide behaviors, and so hypercubes are extracted from the BDDs of the $f_i$'s.

  Network     # of hypercubes
  Internet2   35,700
  Stanford    652,115,821
  Purdue      834,648,394

The numbers for the Stanford and Purdue networks are extremely large. This is because Stanford and Purdue include multi-field rules, which incur the curse of dimensionality [26]. The Internet2 dataset includes single-field rules only, and so it has a moderate number of hypercubes.
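The rule reduction example above (four prefix rules merged into two) can be checked mechanically. The sketch below is only an illustration under stated assumptions: rules are matched in list order (first match wins), which is how the total order over hypercubes is interpreted here, and the helper `classify` and the constants `ORIGINAL`, `REDUCED`, and `SAMPLES` are hypothetical names, not part of any cited method. It uses only the standard-library `ipaddress` module.

```python
import ipaddress


def classify(rules, addr):
    """First-match lookup over an ordered list of (prefix, action) rules."""
    ip = ipaddress.ip_address(addr)
    for prefix, action in rules:
        if ip in ipaddress.ip_network(prefix):
            return action
    return None  # no rule matched


# Left and right rule sets from the example table, highest priority first.
ORIGINAL = [("0.0.0.0/3", "Drop"), ("32.0.0.0/3", "A1"),
            ("64.0.0.0/2", "Drop"), ("128.0.0.0/1", "Drop")]
REDUCED = [("32.0.0.0/3", "A1"), ("0.0.0.0/0", "Drop")]

# No prefix is longer than /3, so one sample address per /3 block suffices.
SAMPLES = [f"{first_octet}.0.0.1" for first_octet in range(0, 256, 32)]
EQUIVALENT = all(classify(ORIGINAL, a) == classify(REDUCED, a) for a in SAMPLES)
```

The reduced set relies on rule priority: the catch-all 0.0.0.0/0 rule only gives the right answer because the more specific 32.0.0.0/3 rule is tested first.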
Against these networks, we evaluate three conventional methods: two decision tree methods, HyperCuts [12] and HybridCuts [16], and one rule reduction method, TCAM razor [18]. Open-source implementations were available for HyperCuts³ and HybridCuts⁴, while we developed our own implementation of TCAM razor. As shown in the following table, the only success was achieved by HybridCuts for Internet2; in all other cases, the conventional methods exhausted 16 GB of memory or did not finish within half an hour.

  Method       Internet2        Stanford         Purdue
  HyperCuts    fail (memory)    fail (memory)    fail (memory)
  HybridCuts   success          fail (memory)    fail (memory)
  TCAM razor   fail (timeout)   fail (timeout)   fail (timeout)

³ http://hypercuts.masicek.net/, by Charles University.
⁴ http://github.com/lwj4333765/HybridCuts, by the authors of HybridCuts.

To recap, the header space of network-wide behaviors requires a huge number of hypercube rules to represent it, and the computational complexity of conventional methods depends directly on that number. In order to remove this dependency, our method should be constructed from a compressed representation like BDDs, in which individual rules do not need to be examined.

4 Abstract Procedure of Proposed Method

This section describes the abstract procedure of our method; algorithms to implement the procedure are given in Section 5.

Our method first builds an incomplete multi-valued function $F_i$ from the Boolean function of the i-th behavior, $f_i$. Function $F_i$ maps the header space to just the i-th behavior, as follows,

$$F_i : X \to \{\mathrm{nil}, i\},$$

where $F_i(x) = i$ if $f_i(x) = \top$, and $F_i(x) = \mathrm{nil}$ otherwise. Since two equivalent subspaces, $X_i$ and $X_j$ $(i \neq j)$, are mutually exclusive, two incomplete multi-valued functions, $F_i$ and $F_j$, never define behaviors for the same packet header; i.e., $\forall x \in X,\ F_i(x) = \mathrm{nil} \lor F_j(x) = \mathrm{nil}$.
These two incomplete multi-valued functions can be unified into a single one without conflict, by introducing the following unification operation,

$$F_i(x) \uplus F_j(x) = \begin{cases} F_i(x) & \text{if } F_j(x) = \mathrm{nil}, \\ F_j(x) & \text{if } F_i(x) = \mathrm{nil}, \\ \text{undefined} & \text{otherwise.} \end{cases} \tag{2}$$

This operation is clearly associative and commutative. Performing this operation over the multi-valued functions of all the behaviors yields the multi-valued function of the classifier, F(x), as follows,

$$F(x) = \biguplus_{i=\mathrm{I}}^{|P|} F_i(x). \tag{3}$$

Since the equivalent subspaces are collectively exhaustive, F(x) is complete, that is, $\forall x \in X,\ F(x) \neq \mathrm{nil}$.

To incrementally update classifier F, we consider that rules of another multi-valued function, $F'$, would be inserted above the original rules. More formally, F(x) is overwritten by $F'(x)$ just in the subspace $X'$ in which $F'(x)$ is defined, $\forall x \in X',\ F'(x) \neq \mathrm{nil}$. This update operation is defined as

$$F'(x) \triangleright F(x) = \begin{cases} F'(x) & \text{if } F'(x) \neq \mathrm{nil}, \\ F(x) & \text{otherwise.} \end{cases} \tag{4}$$

Note that the update operation is equivalent to $\uplus$ when the latter operation is defined, but this paper distinguishes them for clarity.

In order to accelerate the packet classification process, a multi-valued function in which consecutive K bits are aggregated into a single variable is defined as follows,

$$F^{(K)} : \{0, 1, \ldots, 2^K - 1\}^{L/K} \to \{\mathrm{nil}, \mathrm{I}, \mathrm{II}, \ldots, |P|\}.$$

Necessarily, $F^{(K)}(x) = F(x)$, $\forall x \in X$. This paper uses $F^{(K)}$ only when K has to be specified, and simply uses F otherwise.

Last, the construction and update procedures of our method are summarized as a flowchart in Fig. 2.

[Figure 2: Flowchart of classifier construction and update in our method. Labels in the figure: BDDs $f_{\mathrm{I}}, f_{\mathrm{II}}, \ldots, f_{|P|}$; construction; MDDs $F_{\mathrm{I}}, F_{\mathrm{II}}, \ldots, F_{|P|}$; unification; F; bit aggregation; $F^{(K)}$; update with a new rule $f'$ and its MDD $F'$.]
• In classifier construction, the Boolean functions mapped to behaviors, $f_i$'s, are converted to multi-valued functions, $F_i$'s, which are unified into the multi-valued function of the classifier, $F$, and further converted to $F^{(K)}$ by bit aggregation.
• To incrementally update classifier $F^{(K)}$, a new rule $f'$ is converted into $F'^{(K)}$ through bit aggregation, and then inserted into the classifier.

5 Algorithms in Proposed Method

Figure 3: BDD of $f_{\mathrm{I}}$ (left) and MDD of $F^{(2)}$, i.e., $K = 2$ (right). The BDD represents the Boolean function mapped to behavior "I" of Fig. 1, while the MDD represents the multi-valued function mapped to all twelve behaviors of Fig. 1. Since two consecutive bits are aggregated in the MDD, non-terminal nodes are labeled by the two bits while arcs are labeled 0- to 3-child. A packet header of x = 010111 is indicated by the red paths in both diagrams.

Given a set of BDDs representing Boolean functions, $f_i$'s, which are obtained using existing techniques like [8], the algorithms proposed in this section construct the MDD of $F^{(K)}$.
After reviewing decision diagrams in Section 5.1, Section 5.2 analyzes the unification and update operations performed on MDDs. Section 5.3 introduces the bit aggregation algorithm, and Section 5.4 proposes a search algorithm for packet classification.

5.1 Decision Diagrams

As shown in Fig. 3 (left), a BDD [21] is an acyclic directed graph with a single root node and two terminal nodes, $\bot$ and $\top$. Each non-terminal node is labeled with the $b$-th bit of the header, $b \in [0, L - 1]$, and has two labeled arcs, the 0-child and the 1-child, which are followed when the $b$-th bit is 0 or 1, respectively. A path from the root to a terminal corresponds to a packet header, $x$, or to a set of headers (a hypercube) when some bits are skipped (e.g., the red BDD path in Fig. 3, on which the fifth bit is skipped, expresses the hypercube 01011∗). The terminal node at the end of the path indicates the value of $f(x)$. The maximum height, i.e., the length of the longest path, is $L$. Common prefixes and suffixes are shared among paths for compression (e.g., the two paths 00∗11∗ and 01111∗ share their prefix 0 and suffix 11∗ in Fig. 3); it is believed that BDDs compress most practical functions well [27]. The size of a BDD is defined as the number of non-terminal nodes in it, and is denoted by $\|f\|$ if the BDD represents function $f$.

An MDD [22], shown in Fig. 3 (right), is also an acyclic directed graph with a single root, but it can have more than two terminal nodes, $\mathrm{nil}, \mathrm{I}, \mathrm{II}, \cdots, |P|$. Each non-terminal node is labeled by aggregated bits, and it can have more than two child arcs, $0, 1, \cdots, 2^K - 1$. The maximum height is $L/K$. Other properties are the same as those of a BDD.

In our method, a BDD is used to represent a Boolean function that maps the header space to a single packet behavior, while an MDD is used to express a multi-valued function.
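As a small illustration of the BDD properties reviewed above (one root-to-terminal path per header, skipped bits forming a hypercube, size counted in non-terminal nodes), the following C++ sketch evaluates a toy BDD. The flat node layout, the id convention, and the example function $f(x) = x_0 \wedge x_2$ are assumptions made for this example only; they are not taken from Fig. 3 or from the authors' implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flat layout: ids 0 and 1 are the false and true terminals,
// ids >= 2 refer to non-terminal nodes stored in `nodes`.
struct BddNode {
    int bit;  // header bit tested at this node (0-based)
    int lo;   // id of the 0-child
    int hi;   // id of the 1-child
};

struct Bdd {
    std::vector<BddNode> nodes;  // nodes[i] has id i + 2
    int root = 0;

    // Follow a single path from the root; bits that no node on the path
    // tests are simply skipped ("don't care").
    bool eval(const std::string& header) const {
        int id = root;
        while (id >= 2) {
            const BddNode& n = nodes[id - 2];
            id = (header[n.bit] == '1') ? n.hi : n.lo;
        }
        return id == 1;
    }

    // Size as defined in the text: the number of non-terminal nodes.
    std::size_t size() const { return nodes.size(); }
};

// Toy function f(x) = x0 AND x2 over 3-bit headers. Bit 1 is never tested,
// so the accepting path stands for the hypercube 1*1.
Bdd make_example() {
    Bdd d;
    d.nodes.push_back({2, 0, 1});  // id 2: tests bit 2
    d.nodes.push_back({0, 0, 2});  // id 3: tests bit 0, 1-child is id 2
    d.root = 3;
    return d;
}
```

Both headers 101 and 111 follow the same two-node path here, which is the sense in which one path covers a whole hypercube.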
The BDD of $f_i(x)$ is easily converted to the MDD of $F_i(x)$ by replacing the $\bot$- and $\top$-terminals with the nil- and $i$-terminals, respectively. The time complexity of this operation is constant, $O(1)$, and the MDD size is the same as the BDD size, i.e., $\|F_i\| = \|f_i\|$.

5.2 Unification and Update Operations

5.2.1 CASE Algorithm

Reference [22] defines an efficient algorithm named CASE that performs arbitrary operations over two operand MDDs and constructs the resulting MDD. This CASE algorithm allows our method to apply operations $\uplus$ and $\triangleleft$ to multi-valued functions so as to calculate (2) and (4). In the worst case the resulting MDD can be quite large, $\|A \diamond B\| \le \|A\| \cdot \|B\|$, where $A$ and $B$ are multi-valued functions and $\diamond$ is an arbitrary operator. However, the size is usually considerably smaller than this worst-case upper bound, closer to $\|A\| + \|B\|$ [28]; in our experiments, the size was never greater than the total size of the operands, though impractical counterexamples can be constructed. This observation yields the following assumption.

Assumption 1. The size of the MDD produced by a single operation is less than or equal to the total size of the operand MDDs,

\[ \|A \diamond B\| \le \|A\| + \|B\|. \tag{5} \]

The time complexity of the CASE algorithm is determined by the number of nodes and the number of children per node [22], and so it is fixed with Assumption 1 as follows,

\[ O\left(2^K (\|A\| + \|B\|)\right), \tag{6} \]

where the algorithm involves $\|A\| + \|B\|$ node creations, at each of which $2^K$ children are set. Note that $A$ and $B$ must share a common $K$ to apply the CASE algorithm.

5.2.2 Unification

The MDD size and time complexity of (3) are analyzed. Using (5), the MDD size is bounded as follows,

\[ \|F\| \le \sum_{i=\mathrm{I}}^{|P|} \|F_i\|. \tag{7} \]
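To make the pairwise operation analysed in Section 5.2.1 more concrete, the following C++ sketch unifies two incomplete multi-valued functions with a generic memoized "apply"-style recursion. It is not the CASE algorithm of [22]: it is restricted to $K = 1$, and the node store, the negative-id terminal encoding, and all names are assumptions chosen for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <stdexcept>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical node store for K = 1. A terminal carrying value v
// (0 = nil, 1 = behavior I, 2 = behavior II, ...) is encoded as id -(v + 1);
// ids >= 0 index non-terminal nodes.
struct Mdd {
    struct Node { int bit, lo, hi; };
    std::vector<Node> nodes;
    std::map<std::tuple<int, int, int>, int> unique;  // shares identical nodes
    std::map<std::pair<int, int>, int> memo;          // caches unite() results

    static int terminal(int v) { return -(v + 1); }
    static bool is_terminal(int id) { return id < 0; }
    static int value(int id) { return -id - 1; }

    int make(int bit, int lo, int hi) {
        if (lo == hi) return lo;  // redundant test: skip this bit
        auto key = std::make_tuple(bit, lo, hi);
        auto it = unique.find(key);
        if (it != unique.end()) return it->second;
        nodes.push_back({bit, lo, hi});
        int id = static_cast<int>(nodes.size()) - 1;
        unique.emplace(key, id);
        return id;
    }

    // Unify two functions whose defined subspaces are mutually exclusive.
    int unite(int a, int b) {
        if (is_terminal(a) && value(a) == 0) return b;  // left side is nil
        if (is_terminal(b) && value(b) == 0) return a;  // right side is nil
        if (is_terminal(a) && is_terminal(b))
            throw std::logic_error("both operands define a behavior");
        auto key = std::make_pair(a, b);
        auto it = memo.find(key);
        if (it != memo.end()) return it->second;
        // Branch on the smallest bit tested by either operand.
        int bit;
        if (is_terminal(a)) bit = nodes[b].bit;
        else if (is_terminal(b)) bit = nodes[a].bit;
        else bit = std::min(nodes[a].bit, nodes[b].bit);
        bool split_a = !is_terminal(a) && nodes[a].bit == bit;
        bool split_b = !is_terminal(b) && nodes[b].bit == bit;
        int a0 = split_a ? nodes[a].lo : a, a1 = split_a ? nodes[a].hi : a;
        int b0 = split_b ? nodes[b].lo : b, b1 = split_b ? nodes[b].hi : b;
        int lo = unite(a0, b0);
        int hi = unite(a1, b1);
        int r = make(bit, lo, hi);
        memo.emplace(key, r);
        return r;
    }

    // Evaluate the function rooted at `id` for header bits x.
    int eval(int id, const std::vector<int>& x) const {
        while (!is_terminal(id))
            id = x[nodes[id].bit] ? nodes[id].hi : nodes[id].lo;
        return value(id);
    }
};
```

The memo table is what keeps the number of recursive calls tied to pairs of operand nodes rather than to individual headers; the early returns on nil operands are one reason the result often stays close to the sum of the operand sizes rather than their product.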
Figure 4: Huffman tree representing the optimal calculation order of (3) for Fig. 1. Leaf nodes are MDDs of $F_i$'s, while internal nodes represent $\uplus$ operations. The MDD sizes, $\|F_i\|$'s, are shown at the bottom.

Since operation $\uplus$ is associative and commutative, the operations in (3) can be performed in an arbitrary order. In addition, the time complexity of each operation in (3) varies depending on the sizes of the operand MDDs. We can, therefore, optimize the calculation order.

Theorem 1. The time complexity of (3) optimized with Huffman coding is given by,

\[ O\left(2^K \sum_{i=\mathrm{I}}^{|P|} \ell(F_i)\,\|F_i\|\right), \tag{8} \]

where $\ell(F_i)$ is the path length from the root to leaf $F_i$ on the Huffman tree.

Proof. Since each operation takes only two operands at a time in the CASE algorithm, the calculation order can be illustrated as a binary tree, like Fig. 4. Every function $F_i$ on a leaf is used at each internal node up to the root, and so the time complexity of (3) is bounded by (8) using (6). The optimal calculation order is given as a binary tree minimizing (8); this optimization can be regarded as Huffman coding [29], as leaf MDDs and their sizes are replaced with symbols and their weights (frequencies), respectively. This binary tree is also known as a Huffman tree.

Note that the time complexity ignores Huffman tree construction, since it is negligible compared with MDD operations.

5.3 Bit Aggregation

This subsection defines the bit aggregation algorithm that accelerates packet classification. Algorithm 1 constructs the MDD of $F^{(K)}$ from that of $F$, by aggregating every $K$ consecutive bits into a single variable. Note that in Algorithm 1, $F$ or $F^{(K)}$ refers to the root node of the MDD representing function $F$ or $F^{(K)}$, not to the function itself.

Algorithm 1: Aggregate
  Input: F
  Output: F^(K)
  if F is non-terminal and F not found in cache then
      create F^(K)
      for x' ← 0 to 2^K − 1 do
          F' ← descendant reached by x' from F      // x' is K-bit from F.b
          F^(K).child[x'] ← Aggregate(F')           // set x'-th child
      cache[F] ← F^(K)
  return cache[F]

This algorithm recursively creates MDD nodes by finding their $2^K$ children at $K$ bits ahead from the node; $x'$ means $K$ consecutive bits from $F.b$, where $F.b$ is the smallest bit number labeling node $F$ (e.g., $F.b = 0$ at the root of the MDD in Fig. 3).
To avoid repeatedly visiting the same node, visited nodes are cached; this algorithm assumes that all terminal nodes have been placed in the cache in advance. The time complexity of this algorithm is $O(2^K \|F\|)$.

Since the maximum height of the MDD is shrunk to $L/K$ by bit aggregation, the worst-case search path length is also reduced by a factor of $K$. This is a great acceleration, but there is a tradeoff between the search time and memory space, as follows. We set the following assumption about MDD size.

Assumption 2. The bit aggregation algorithm (Algorithm 1) shrinks the MDD size by a factor of $K$,

\[ \|F^{(K)}\| \approx \frac{\|F^{(1)}\|}{K}, \tag{9} \]

assuming that MDD nodes are roughly equally distributed over each run of $K$ bits (i.e., bits between $[Ki, K(i + 1) - 1]$, $i \in [0, L/K - 1]$).

Since the memory requirement of MDD $F^{(K)}$ is given by the product of the MDD size and the memory requirement of each node [30], it is derived with Assumption 2 as follows,

\[ \frac{\|F^{(1)}\|}{K} \left(|b| + 2^K |child|\right), \tag{10} \]

where $|b|$ is the width of bit numbers and $|child|$ is that of child IDs. Given $|b| = |child|$, the memory requirement is minimized at $K = 2$, and increases exponentially for $K > 2$.

The memory requirement might be reduced by introducing heterogeneous MDDs [30], in which non-terminal nodes are allowed to maintain different numbers of children and different numbers of bits. This heterogeneity is also utilized by conventional decision tree methods. However, it prevents the efficient MDD traversal introduced in the next subsection, and so we only use MDDs of uniform $K$ in our method.

Algorithm 2: Search
  Input: F^(K), pkt                            // pkt is in K-bit array
  Output: packet behavior ∈ {I, II, ···, |P|}
  while F^(K) is non-terminal do
      x' ← pkt[F^(K).b / K]                    // get K-bit from F^(K).b
      F^(K) ← F^(K).child[x']                  // go to next node
  return F^(K)                                 // terminal node of behavior

5.4 Packet Classification

Algorithm 2 presents the search algorithm that identifies the network-wide behavior of a given packet. It simply follows a path based on the packet. The time complexity is determined solely by the maximum height of $F^{(K)}$, that is, $O(L/K)$, because this algorithm includes no operation other than MDD traversal; in contrast, conventional decision tree methods usually involve a linear search at a leaf node to select a single rule.

This algorithm is also very efficient in terms of implementation. A packet header is represented as an array of $K$-bit elements; i.e., the $i$-th $K$-bit element of the header can be accessed by index $i$, like pkt[i]. The array element itself serves as a child index, without any adjustment for the numbers of children and bits. Our algorithm directly handles the raw bit sequence of the packet header as a $K$-bit array, and so the packet does not need to be pre-processed at all; other implementations might assume that the header fields of interest have been extracted in advance (footnote 5).

6 Experiments

This section evaluates our method in terms of memory usage (Section 6.1), construction and update time (Section 6.2), and classification throughput (Section 6.3). Our method was implemented in C++. The bit aggregation parameter, $K$, was chosen from 1, 2, 4, and 8, in order to align $K$-bit array elements with bytes. The widths of bit numbers and child IDs, $|b|$ and $|child|$, were set to 32 bits. Our method was compared with HybridCuts [16] as well as a classification method utilizing BDDs [8].
For HybridCuts, parameters "binth" and "spfac" were set to eight and four, respectively, which showed the best performance in terms of memory usage and throughput. We added a search procedure to the original implementation. The BDD classification method relies on a set of BDDs of $f_i$'s to determine the network-wide behavior. We also implemented it in C++.

Footnote 5: For instance, http://www.arl.wustl.edu/~hs1/PClassEval.html and http://hypercuts.masicek.net/

Table 1: Statistics of Three Real Networks

                                        Internet2    Stanford     Purdue
  # of switches                                 9          16      1,646
  # of ports used                              56          58      2,736
  # of rules (FIB)                        126,017     757,170          0
  # of rules (ACL)                              0       1,584      3,605
  # of header bits of interest                 32          88        104
  # of network-wide packet behaviors           86       1,093     10,353

Configuration datasets of three real networks, Internet2, the Stanford backbone network [10], and the Purdue campus network [23], were employed in the experiments. The network statistics are shown in Table 1 (footnote 6). Configuration rules of the FIB (Forwarding Information Base) specify the destination IP field only, while those of the ACL (Access Control List) are written as 5-tuples (footnote 7). Some rules specify the VLAN field, but it was ignored in our experiments in order to use a packet trace generator named ClassBench [31], which supports 5-tuples only. This omission, however, has no significant impact on our results, because the MDD size is at most 20 % larger when the VLAN field is considered. The experiments were conducted using a single core of a 3.5 GHz Xeon with 8 MB cache and 16 GB main memory (DELL PowerEdge server).

6.1 Memory Usage

Figure 5 shows the MDD size (the number of nodes in an MDD). The size was smaller than the total number of nodes in the set of BDDs, listed below, as indicated by (7).

                    Internet2    Stanford     Purdue
  # of BDD nodes       37,903     547,298    561,635

Figure 5 shows a good fit with (9), which validates Assumption 2; the discrepancy is 9.6 % at maximum (Purdue with K = 8).
The amount of memory used by an MDD is shown in Fig. 6, while the size of each MDD node, |b| + 2^K |child|, is given in the following table (the memory usage of an MDD is the product of the MDD size and the node size).

K                    1       2       4       8
Node size [B]       12      20      68    1028

Footnote 6: The numbers of network-wide packet behaviors in Table 1 are somewhat different from [8]. We are unable to explain this difference, but in the most complicated Purdue network, our number is greater than that of [8] (3,917) and so this difference does not favor us.

Footnote 7: Since the Purdue network has no FIB rule, network-wide packet behaviors are defined only at the network edge.

Figure 5: MDD size. The dotted lines indicate (9). (Axes: K versus MDD size [K]; series: Internet2, Stanford, Purdue.)

Figure 6: Memory usage of MDD. The dotted lines indicate (10). The horizontal line is memory usage of HybridCuts for Internet2. (Axes: K versus Memory usage [MB]; series: Internet2, Stanford, Purdue.)

The memory usage is minimized at K = 2 and increases exponentially for larger K, as expected from (10), but it still fits within the CPU cache (8 MB) even for K = 8 in Internet2 and Stanford, and for K = 4 in Purdue. Although the memory usage is relatively large, 56 MB, at K = 8 in Purdue, the CPU cache remained effective even though the data is seven times the cache size; this issue is discussed in the throughput evaluation. BDD memory usage is as follows.

                              Internet2    Stanford      Purdue
BDD memory usage [MB]             0.454       6.568       6.740

The MDD has smaller memory usage than the BDDs when K ≤ 4 in Internet2 and Stanford and K ≤ 2 in Purdue.

HybridCuts successfully constructed a classifier only for Internet2, as described in Section 3; this classifier requires 739 KB of memory. HybridCuts could be comparable to our method if the header space were represented by a small number of hypercubes, but it does not scale with the header space complexity of network-wide packet behavior.
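The node sizes listed in this section follow directly from the expression |b| + 2^K |child| with 32-bit (4-byte) fields. The short Python sketch below reproduces them and multiplies by a node count to obtain memory usage; the function names and the 10,000-node count are hypothetical and chosen only for illustration, not taken from the measurements.

```python
B_BYTES = 4      # |b|: bit number, 32 bits
CHILD_BYTES = 4  # |child|: child ID, 32 bits


def node_size_bytes(k: int) -> int:
    """Size of one MDD node: |b| + 2^K * |child|, in bytes."""
    return B_BYTES + (2 ** k) * CHILD_BYTES


def mdd_memory_mb(num_nodes: int, k: int) -> float:
    """Memory usage = MDD size (node count) x node size, in MB (10^6 bytes)."""
    return num_nodes * node_size_bytes(k) / 1e6


# Node sizes for the values of K used in the experiments.
sizes = {k: node_size_bytes(k) for k in (1, 2, 4, 8)}
print(sizes)

# Hypothetical MDD with 10,000 nodes at K = 8.
print(mdd_memory_mb(10_000, 8))
```

The exponential term 2^K explains why memory grows quickly for large K even though the number of nodes shrinks as K increases.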
Figure 7: Computation time of MDD unification. Each point is the average of 10 trials. (Panels: Internet2, Stanford, Purdue; x-axis: calculation order, Optimal / Ascending / Random; y-axis: Time [sec].)

Figure 8: Computation time of bit aggregation. Each point is the average of 10 trials. (Axes: K versus Time [sec]; series: Internet2, Stanford, Purdue.)

6.2 Construction and Update Time

Figure 7 shows the time to unify the BDDs of f_i's into a single MDD of F^(1) by the operations of (3). We evaluated three calculation orders: the optimal order of Theorem 1, ascending order (a minimum MDD is added to the current unified MDD at every step), and random order. As shown in Fig. 7, the optimal order outperforms the others. Even for the complicated Purdue network, the MDDs were unified in just 6.35 sec, which is usually acceptable for an initial construction; note that the MDD can be incrementally updated on demand, as detailed later.

The MDD of F^(1) was converted to that of F^(K) by the bit aggregation algorithm presented in Algorithm 1. The aggregation time is shown in Fig. 8. As expected, the time grows exponentially with K, but it remained under 2.5 sec, which is considered acceptable. It is worth noting that the calculation time of the BDDs, which is less than 1 sec according to [8], has to be added to the total construction time of our method.

HybridCuts required 2.17 sec to construct a classifier for Internet2. This is slower than our method with the random calculation order, because HybridCuts processes each hypercube rule individually while our method handles them collectively in a compressed manner.

For MDD updates, we assume that the packet classifier is updated by a new rule associated with a new network-wide behavior.
First, the new rule was randomly chosen from the original rules, and then the MDD of the classifier was constructed from the remaining rules. We finally measured the time to perform the update operation over the MDDs of the new rule and the original classifier. Figure 9 shows the update time. It is less than 10 msec for Internet2, and less than 30 msec for Stanford. Even for Purdue with K = 8, the MDD was updated in just 200 msec; these results are entirely acceptable unless time constraints are extraordinarily severe.

Figure 9: Computation time to update an MDD. Each point is the average of 100 trials. (Axes: K versus Time [msec]; series: Internet2, Stanford, Purdue.)

Figure 10: Average hop counts on an MDD to classify a packet. The dotted lines guide ∝ 1/K. The horizontal line is average hop counts of HybridCuts for Internet2. (Axes: K versus # of hops; series: Internet2, Stanford, Purdue.)

6.3 Classification Throughput

Traditionally, the throughput of packet classification has been evaluated by the number of memory accesses per packet. However, modern processors have complicated architectures with multi-level cache hierarchies, which have a significant impact on throughput, and so classification throughput was measured by actually processing packet headers. In addition to the Xeon, classification throughput was also measured using a single core of a Core i7 1.7 GHz with 4 MB cache and 8 GB main memory (Apple MacBook Air), in order to examine the impact of hardware.

In the experiments, the classification throughput was measured as follows. First, a million packets with IP and TCP/UDP headers were generated by ClassBench based on the network configurations. The packets were then written into a file in network byte order. Finally, each packet was read from the file and classified.
Note that packets should be generated based on the configurations, as done by ClassBench, in order to match all behaviors; randomly generated packets are likely to match larger subspaces (probably a “default rule”), but never match smaller ones.

Figure 11: Classification throughput measured on Xeon 3.5 GHz at the top, and on Core i7 1.7 GHz at the bottom. Each cross is the average of 30 trials, and each trial processed a million packets. The worst-case throughput is indicated by dashed lines. The horizontal lines in Internet2 are throughput of HybridCuts. (Panels: Internet2, Stanford, Purdue for each CPU; axes: K versus Throughput [Mpps]; series: Average, Worst-case.)

Figure 10 shows the average number of hops from the root to a terminal on the MDD of F'^(K). Considering that the worst-case number of hops at K = 1 is equal to the number of bits of interest given in Table 1, the average hop number was roughly 2/3 of the worst case. The average hop number follows 1/K with a maximum deviation of 17 % (Stanford with K = 8), and it is less than 10 at K = 8 for all networks, which ensures the fast classification of our method. The hop number is much less than the 288 bits of the TCP/IP header, since header fields of no interest are simply skipped on MDDs.

The classification throughput measured on the Xeon is shown at the top of Fig. 11. It exceeded 10 Mpps at K = 8 for all networks; at 10 Mpps, even short packets of 125 bytes can fill a 10 Gbps link. The throughput of our method scales linearly with K for all networks, even though the MDD is larger than the CPU cache for Purdue with K = 8.
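The line-rate statement above is a simple unit conversion, and the worst-case hop count follows from the O(L/K) height bound of Section 5.4. The Python sketch below works through both; the function names are hypothetical, and the hop bound assumes that the L bits of interest are aligned to K-bit boundaries (as the byte-aligned 5-tuple fields are for K up to 8), so it should be read as an estimate rather than the measured worst case.

```python
def link_rate_gbps(mpps: float, packet_bytes: int) -> float:
    """Link rate filled by `mpps` million packets/sec of a given packet size."""
    return mpps * 1e6 * packet_bytes * 8 / 1e9


def worst_case_hops(bits_of_interest: int, k: int) -> int:
    """Upper bound on MDD hops: ceil(L / K), assuming K-bit-aligned fields."""
    return -(-bits_of_interest // k)  # ceiling division on integers


# 10 Mpps of 125-byte packets corresponds to a 10 Gbps link.
print(link_rate_gbps(10, 125))

# Bits of interest from Table 1: Internet2 32, Stanford 88, Purdue 104.
for name, bits in (("Internet2", 32), ("Stanford", 88), ("Purdue", 104)):
    print(name, [worst_case_hops(bits, k) for k in (1, 2, 4, 8)])
```

At K = 1 the bound equals the number of bits of interest, consistent with the observation about worst-case hops made for Fig. 10.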
Figure 11 also shows the worst-case throughput estimated using the worst-case hop number shown in Fig. 10. Even in the worst case, the throughput is rather high.

Interestingly, the classification throughput behaves differently on the Core i7, as shown at the bottom of Fig. 11. The throughput is quite high, greater than 3 Mpps at K = 8 for all networks, but it does not scale linearly; rather, it scales logarithmically.

Figure 12: Computation time required for a single hop versus memory usage, measured on Xeon 3.5 GHz at the top and on Core i7 1.7 GHz at the bottom. The dotted line shows the linear regression on (log(x), y). (Axes: Memory usage [MB] versus Time [nsec].)

We investigated this difference in scaling further. Figure 12 shows the computation time required for a single hop versus the memory usage of the MDD. The hop time on the Xeon stays nearly constant, while that on the Core i7 increases logarithmically with MDD size; this is the reason for the scaling difference. We do not go into the details of CPU architecture and cache hierarchy, which are beyond the scope of this paper, but these results justify our evaluation strategy: the classification throughput should be evaluated based on actual measurements in order to account for the hardware impact.

The throughput of HybridCuts was 7.26 Mpps on the Xeon for Internet2, as shown in Fig. 11; it is slower than our method with K ≥ 2, though its average hop count was quite small, 3.46. This is because each hop took 31.8 nsec, which is nearly four times that of our method. HybridCuts has to examine decision criteria at every node, while our simple search algorithm finds the next node just by array access. Moreover, HybridCuts has to find a rule from “binth” rules by linear search at a leaf node.
The BDD classification method was slower by several orders of magnitude due to the linear search performed over all behaviors, as follows.

                               Internet2    Stanford      Purdue
Throughput by BDDs [Mpps]         0.0849     0.00578     0.00153

7 Discussion from Application Aspects

This section discusses our method in terms of applications. Section 7.1 chooses the best K based on the experimental results. Section 7.2 discusses the contributions of our method to SDN applications.

7.1 Choice of Best K

Network operators need to choose the best K to deploy their applications. Our evaluation in Section 6 revealed a tradeoff between the throughput and the construction/update time; a large K improves the throughput, but degrades the construction/update time. This tradeoff is shown in Fig. 13 (the construction time is omitted so that the tradeoff can be depicted on a two-dimensional plane). The figure tells us that K should be set to 4 or 8, since the points of K = 4 and 8 form a Pareto frontier, i.e., neither can be improved in one property without degrading the other. If the network were much simpler, K = 16 might be included in the Pareto frontier, but MDDs of K = 16 require 128× the memory capacity of K = 8, which would probably make it difficult to fully leverage small CPU caches. In contrast, only K = 4 would form a Pareto frontier in more complicated networks.

Figure 13: Classification throughput versus update rate. Internet2 and Purdue are omitted since they showed similar results. (Axes: Update rate [update/sec] versus Throughput [Mpps]; network: Stanford.)

7.2 Contributions to SDN Applications

7.2.1 Network Fabric

Network fabric [3, 4] is an idea to improve the economic efficiency and manageability of networks by separating the “intelligence” from the network core. An application program of network fabric constructs a classifier of network-wide packet behaviors based on the network policies.
The classifier is then installed on edge switches, which tag the header of every incoming packet with some sort of “behavior label”. The labeled packets are forwarded through the network core based on the behavior labels. This simple network core can consist of low-cost switches and can be managed just by the labels, independently of the protocols used in external networks.

To realize the network fabric, the network edge is required to determine the network-wide behavior for every packet at line rate, but that is virtually impossible for conventional classification methods, which either failed to construct the classifier or achieved unacceptably low throughput given complex policies like those of the Stanford backbone. Reference [4] estimated that a classification rate of 6.7 Gbps on a single CPU core is enough, and that the rate could be enhanced using computer clustering techniques like RouteBricks [32] for a large domain. However, no specific classification method that can deal with network-wide packet behaviors is given in that paper. Our method is the first one that can realize the network fabric even if the network policy is complex.

7.2.2 Fault Localization

In network fault localization [5], the application software calculates, in advance, the network-wide packet behaviors that could be monitored in the network. A few probe packets are then selected for each behavior, and the packets are exchanged periodically between switches. If the actually monitored behavior is not identical to the pre-calculated one, the application investigates possible causes, as in solving a set cover problem. Reference [5] assumes that only a small number of predefined packets are employed, since no conventional classification method can classify arbitrary packets.
This limitation, which was recognized as a match fault deficiency in [5], implies that faulty behaviors of other packets are overlooked; more importantly, failures occurring on production traffic are not detected. Our classifier allows the application software to examine all packets without ignoring any of them. Because our method is efficient, there is no need to compromise the sampling rate.

7.2.3 Network Verification

Network verification [6–11] is used to check network properties, such as reachability (e.g., packets of a given header subspace sent from a client can get to a specified server), loop-freedom (no packet in the subspace would traverse any cyclic path), and waypointing (packets from outside the network should pass through a firewall). The property tests rely on packet classification in order to map the header space to a network-wide packet behavior represented as a directed subgraph.

Assume that, in the Stanford backbone network, a new rule should be applied immediately to fix a security hole, and several properties must be checked with the new rule before deploying it. Subgraphs representing packet behaviors can be updated in 26 msec by [8], while the classifier is updated in 29 msec in our method (Fig. 9); our method is comparably fast. The new behaviors are then confirmed with each packet or each header subspace to be tested; assuming that as many properties as there are rules in the network, e.g., 100,000, have to be checked, our method requires only 5.0 msec for the lookups while [8] requires 17.3 sec, which can be critical in a severe security incident.

8 Related Work

This section discusses related work; conventional packet classification methods [12–20] are not discussed, since they were thoroughly examined in Section 3.

To fully leverage efficient packet classifiers, fast packet I/O mechanisms should be created to receive packets at line rates.
Recent research on system technologies [33] achieved 14.9 Mpps on a single CPU core, which is roughly comparable to our results. Our classification method combined with fast packet I/O can provide SDN applications with quick identification of network-wide packet behavior on fast links.

OpenFlow (see footnote 8), one version of SDN, defines the multi-table pipeline; it divides a large classifier into multiple small ones. If the original complicated header space is divided into many parts, the complexity of each part might be greatly reduced. However, the multi-table pipeline focuses on rules specified on a single switch, not network-wide packet behaviors, and it is supposed to handle each dimension with a separate table. Therefore, each machine holds only a few tables (e.g., four tables [34]), and so our problem is not significantly mitigated.

Several papers [35–38], some of which were written in the context of SDN [37, 38], discussed algorithms to distribute classification rules among multiple switches without changing their semantics, in order to store the rules in space-limited TCAMs. They are, however, not designed to identify the network-wide packet behavior.

Multiple-bit striding has been utilized to accelerate trie traversal [39, 40], and this technique looks similar to our bit aggregation. Our method is, however, built on MDDs, not simple tries, and so Algorithm 1 is designed to handle path convergence by caching visited nodes and to take care of nodes skipped for don't-care bits.

BDDs have often been used to deal with complicated header spaces in various network applications such as firewall analysis [41], network state verification [6–8], policy enforcement [42], and traffic analysis [43]. However, work to date has not focused on classification throughput, and so speeds are low.
Firewall Decision Diagrams (FDDs) [44, 45] can be used to represent the header space in a compressed manner, but they were designed to be human readable, not to be efficiently manipulated by computers; e.g., the network verifier using FDDs [9] seems to be at least 1,000 times slower than those with BDDs [6, 8]. Prefix DAG [46] employs a data structure similar to MDDs, but it focused on a simple classification problem with a single header field, in which classifiers are readily constructed even if all rules are extracted.

BDDs and MDDs have been intensively studied in the LSI-CAD community. Reference [47] minimized the total size of given BDDs by applying different bit orders to them, while our interest includes the size of the MDD. The memory usage of MDDs was optimized in [30], but the MDD structure becomes complicated, which degrades the classification throughput.

Footnote 8: http://www.openflow.org/documents/openflow-spec-v1.1.0.pdf

9 Conclusions

This paper has developed a packet classification method that efficiently represents the complicated header space defined by network-wide packet behaviors. To fully leverage the global network view provided by SDN, packet classification methods must handle network-wide packet behaviors well, not just actions taken on a single switch. Although packet classification has been studied intensely, conventional classification methods are unable to handle network-wide behaviors at all. Our method is based on compressed decision diagrams and is supported by several new algorithms; it is simple but surprisingly efficient. Numerical experiments on three actual networks showed that our method can identify the network-wide packet behavior at 10 Mpps or more for all three networks. Our work is the only one to solve this hard but important problem in SDN.
Moreover, thanks to the introduction of SDN, complex network policies can now be automatically processed by application programs without being manually written in the inefficient 5-tuple format, and so our efficient internal representation will find more opportunities beyond the example applications shown in this paper.

In future work, we will develop SDN applications that utilize our classification method, and evaluate its feasibility with field experiments. Powerful techniques studied in computer science, such as compressed self-indexes [46] and Boolean expression minimization [48], will be applied to our method.

References

[1] Nick McKeown, Tom Anderson, Hari Balakrishnan, Guru Parulkar, Larry Peterson, Jennifer Rexford, Scott Shenker, and Jonathan Turner. OpenFlow: enabling innovation in campus networks. SIGCOMM Comput. Commun. Rev., 38(2):69–74, 2008.
[2] S. Agarwal, M. Kodialam, and T.V. Lakshman. Traffic engineering in software defined networks. In IEEE INFOCOM, pages 2211–2219, 2013.
[3] Martin Casado, Teemu Koponen, Scott Shenker, and Amin Tootoonchian. Fabric: A retrospective on evolving SDN. In ACM HotSDN, pages 85–90, 2012.
[4] Barath Raghavan, Martín Casado, Teemu Koponen, Sylvia Ratnasamy, Ali Ghodsi, and Scott Shenker. Software-defined internet architecture: Decoupling architecture from infrastructure. In ACM HotNets, pages 43–48, 2012.
[5] Hongyi Zeng, Peyman Kazemian, George Varghese, and Nick McKeown. Automatic test packet generation. In ACM CoNEXT, pages 241–252, 2012.
[6] E. Al-Shaer, W. Marrero, A. El-Atawy, and K. ElBadawi. Network configuration in a box: towards end-to-end verification of network reachability and security. In IEEE ICNP, pages 123–132, 2009.
[7] R. McGeer. Verification of switching network properties using satisfiability. In IEEE ICC, pages 6638–6644, 2012.
[8] Hongkun Yang and S.S. Lam. Real-time verification of network properties using atomic predicates. In IEEE ICNP, pages 1–11, 2013.
[9] A.R. Khakpour and A.X. Liu. Quantifying and querying network reachability. In IEEE ICDCS, pages 817–826, 2010.
[10] Peyman Kazemian, George Varghese, and Nick McKeown. Header space analysis: Static checking for networks. In USENIX NSDI, pages 113–126, 2012.
[11] Ahmed Khurshid, Xuan Zou, Wenxuan Zhou, Matthew Caesar, and P. Brighten Godfrey. VeriFlow: Verifying network-wide invariants in real time. In USENIX NSDI, pages 15–28, 2013.
[12] Sumeet Singh, Florin Baboescu, George Varghese, and Jia Wang. Packet classification using multidimensional cutting. In ACM SIGCOMM, pages 213–224, 2003.
[13] Hyesook Lim and Ju Hyoung Mun. High-speed packet classification using binary search on length. In ACM/IEEE ANCS, pages 137–144, 2007.
[14] Yaxuan Qi, Lianghong Xu, Baohua Yang, Yibo Xue, and Jun Li. Packet classification algorithms: From theory to practice. In IEEE INFOCOM, pages 648–656, 2009.
[15] Balajee Vamanan, Gwendolyn Voskuilen, and T. N. Vijaykumar. EffiCuts: Optimizing packet classification for memory and throughput. SIGCOMM Comput. Commun. Rev., 40(4):207–218, 2010.
[16] Wenjun Li and Xianfeng Li. HybridCuts: A scheme combining decomposition and cutting for packet classification. In IEEE HOTI, pages 41–48, 2013.
[17] Hao Che, Z. Wang, Kai Zheng, and Bin Liu. DRES: Dynamic range encoding scheme for TCAM coprocessors. IEEE Transactions on Computers, 57(7):902–915, 2008.
[18] A.X. Liu, C.R. Meiners, and E. Torng. TCAM razor: A systematic approach towards minimizing packet classifiers in TCAMs. IEEE/ACM Transactions on Networking, 18(2):490–500, 2010.
[19] A. Bremler-Barr and D. Hendler. Space-efficient TCAM-based classification using gray coding. IEEE Transactions on Computers, 61(1):18–30, 2012.
[20] O. Rottenstreich, R. Cohen, D. Raz, and I. Keslassy. Exact worst case TCAM rule expansion. IEEE Transactions on Computers, 62(6):1127–1140, 2013.
[21] R.E. Bryant. Graph-based algorithms for boolean function manipulation. IEEE Transactions on Computers, C-35(8):677–691, 1986.
[22] A. Srinivasan, T. Ham, S. Malik, and R.K. Brayton. Algorithms for discrete function manipulation. In IEEE ICCAD, pages 92–95, 1990.
[23] Yu-Wei Eric Sung, Sanjay G. Rao, Geoffrey G. Xie, and David A. Maltz. Towards systematic design of enterprise networks. In ACM CoNEXT, pages 22:1–22:12, 2008.
[24] Mark H. Overmars and Frank A. van der Stappen. Range searching and point location among fat objects. Journal of Algorithms, 21(3):629–656, 1996.
[25] Subhash Suri, Tuomas Sandholm, and Priyank Warkhede. Compressing two-dimensional routing tables. Algorithmica, 35(4):287–300, 2003.
[26] Christian Böhm, Stefan Berchtold, and Daniel A. Keim. Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Comput. Surv., 33(3):322–373, 2001.
[27] Ryo Yoshinaka, Jun Kawahara, Shuhei Denzumi, Hiroki Arimura, and Shin'ichi Minato. Counterexamples to the long-standing conjecture on the complexity of BDD binary operations. Information Processing Letters, 112(16):636–640, 2012.
[28] Donald E. Knuth. The Art of Computer Programming: Combinatorial Algorithms Part 1, volume 4A. Addison-Wesley, USA, 2011.
[29] David A. Huffman. A method for the construction of minimum redundancy codes. Proceedings of the IRE, 40(9):1098–1101, 1952.
[30] S. Nagayama and T. Sasao. On the optimization of heterogeneous MDDs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(11):1645–1659, 2005.
[31] D.E. Taylor and J.S. Turner. ClassBench: a packet classification benchmark. In IEEE INFOCOM, volume 3, pages 2068–2079, 2005.
[32] Mihai Dobrescu, Norbert Egi, Katerina Argyraki, Byung-Gon Chun, Kevin Fall, Gianluca Iannaccone, Allan Knies, Maziar Manesh, and Sylvia Ratnasamy. RouteBricks: exploiting parallelism to scale software routers. In ACM SOSP, pages 15–28, 2009.
[33] Luigi Rizzo. netmap: a novel framework for fast packet I/O. In USENIX ATC, pages 101–112, 2012.
[34] Heng Pan, Hongtao Guan, Junjie Liu, Wanfu Ding, Chengyong Lin, and Gaogang Xie. The FlowAdapter: Enable flexible multi-table processing on legacy hardware. In ACM HotSDN, pages 85–90, 2013.
[35] Minlan Yu, Jennifer Rexford, Michael J. Freedman, and Jia Wang. Scalable flow-based networking with DIFANE. In ACM SIGCOMM, pages 351–362, 2010.
[36] Chad R. Meiners, Alex X. Liu, Eric Torng, and Jignesh Patel. Split: Optimizing space, power, and throughput for TCAM-based classification. In ACM/IEEE ANCS, pages 200–210, 2011.
[37] Nanxi Kang, Zhenming Liu, Jennifer Rexford, and David Walker. Optimizing the "one big switch" abstraction in software-defined networks. In ACM CoNEXT, pages 13–24, 2013.
[38] Y. Kanizo, D. Hay, and I. Keslassy. Palette: Distributing tables in software-defined networks. In IEEE INFOCOM, pages 545–549, 2013.
[39] Jahangir Hasan and T. N. Vijaykumar. Dynamic pipelining: Making IP-lookup truly scalable. SIGCOMM Comput. Commun. Rev., 35(4):205–216, 2005.
[40] Yi Wang, Yuan Zu, Ting Zhang, Kunyang Peng, Qunfeng Dong, Bin Liu, Wei Meng, Huichen Dai, Xin Tian, Zhonghu Xu, Hao Wu, and Di Yang. Wire speed name lookup: A GPU-based approach. In USENIX NSDI, pages 199–212, 2013.
[41] Lihua Yuan, Hao Chen, Jianning Mai, Chen-Nee Chuah, Zhendong Su, and P. Mohapatra. Fireman: a toolkit for firewall modeling and analysis. In IEEE S&P, pages 199–213, 2006.
[42] Yu-Wei Eric Sung, Carsten Lund, Mark Lyn, Sanjay G. Rao, and Subhabrata Sen. Modeling and understanding end-to-end class of service policies in operational networks. In ACM SIGCOMM, pages 219–230, 2009.
[43] Lihua Yuan, Chen-Nee Chuah, and P. Mohapatra. ProgME: Towards programmable network measurement. IEEE/ACM Transactions on Networking, 19(1):115–128, 2011.
[44] Mohamed G. Gouda and Alex X. Liu. Structured firewall design. Computer Networks, 51(4):1106–1120, 2007.
[45] A.X. Liu and M.G. Gouda. Diverse firewall design. IEEE Transactions on Parallel and Distributed Systems, 19(9):1237–1251, 2008.
[46] Gábor Rétvári, János Tapolcai, Attila Kőrösi, András Majdán, and Zalán Heszberger. Compressing IP forwarding tables: Towards entropy bounds and beyond. SIGCOMM Comput. Commun. Rev., 43(4):111–122, 2013.
[47] Amit Narayan, Adrian J. Isles, Jawahar Jain, Robert K. Brayton, and Alberto L. Sangiovanni-Vincentelli. Reachability analysis using partitioned-ROBDDs. In IEEE/ACM ICCAD, pages 388–393, 1997.
[48] Kirill Kogan, Sergey Nikolenko, William Culhane, Patrick Eugster, and Eddie Ruan. Towards efficient implementation of packet classifiers in SDN/OpenFlow. In ACM HotSDN, pages 153–154, 2013.