Clustered Linked List Forest For IPv6 Lookup Oğuzhan ERDEM1 and Aydın CARUS2 1Department of Electrical and Electronics Engineering 2Department of Computer Engineering Trakya University, TURKEY {ogerdem, aydinc}@trakya.edu.tr HOTI'13- Oğuzhan Erdem Agenda Ø Background Ø Prior Work Ø Our Solutions Ø Implementation Results Ø Conclusions & Future Work 2 HOTI'13- Oğuzhan Erdem Agenda Ø Background Ø Prior Work Ø Our Solutions Ø Implementation Results Ø Conclusions & Future Work 3 4 HOTI'13- Oğuzhan Erdem Background Ø IP Lookup Ø Core functions of Internet routers Ø Match the destination IP address to the prefixes in the routing table Ø Longest Prefix Matching (LPM) Ø Multiple matched prefixes possible Ø Only the longest matched prefix will be used Ø Example Ø IP address “114.110.88.212” Ø Routing table: Routing Prefixes Next-Hop Forwarding 114.110.*.*. Port 0 114.110.88.*. Port 1 Ø Matches both, but the second match is used 5 HOTI'13- Oğuzhan Erdem Tree based IP Lookup Ø Binary Search Tree (BST) Ø Each node stores prefix values Ø Better memory performance Ø Complex pre-processing Ø Binary Trie (BT) Ø Simple data structure Ø Easy update process Ø Large number of memory accesses Ø Prefix expansion Ø Sorting requırement Ø Hard incremental updates Prefix Next Hop 00* 01* 10* 101* 110* 0001* 0011* 0101* 1000* 1100* P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 0 0 1 P6 0 1 P1 1 0 1 P2 1 P7 0 0 1 P8 P9 Binary Trie 0 1 P3 1 0 P4 0 P5 P10 Binary Search Tree HOTI'13- Oğuzhan Erdem 6 Performance Metrics Ø Throughput Ø Support for higher link data rates Ø 100 Gbps network interfaces and links Ø Memory usage Ø Rapid growth of routing table size Ø Estimated as ~500K entries in 2015 Ø Transition from IPv4 (32-bit) to IPv6 (128-bit) Ø IPv4 address exhaustion Ø State-of-the-art designs cannot support large routing tables due to the available on-chip memory and the number of I/O pins of FPGAs Ø Update Capability Ø Types of updates – Insert, delete and modify Ø How fast an update operation can be performed HOTI'13- Oğuzhan Erdem Agenda Ø Background Ø Prior Work Ø Our Solutions Ø Implementation Results Ø Conclusions & Future Work 7 HOTI'13- Oğuzhan Erdem 8 Hardware based solutions Ø Linear pattern search in TCAM Ø One lookup per cycle Ø Expensive, power-hungry, low density, large access time, need for a priority encoder Ø Hash based solutions Ø Simple and fast Ø Large number of different hash tables and functions for each prefix lengths, hash collisions. Ø Binary bit traversal in pipelined tries Ø Good throughput performance and support quick prefix updates Ø Inefficient memory usage Ø Binary value search in pipelined trees Ø Good memory performance Ø Complex pre-processing and updates 9 HOTI'13- Oğuzhan Erdem Hierarchical Search Structures Ø Hierarchical trie for packet classification Ø Hierarchically connected binary tries Ø One big SA trie from SA prefixes Ø Many small DA tries using DA prefixes Ø Hardware implementation unfeasible 0 1 0 0 1 1 Ø Backtracking problem SA Trie DA Tries 1 1 0 1 1 0 0 1 0 0 0 0 Ø Hierarchical structures for IP lookup Ø Ony DA is used Ø Split prefixes into two part Ø Hierarchically connected data structures Ø Substantial memory saving Ø Bit strings shared by multiple prefixes are stored only once Ø No backtracking : Fixed size strings naturally pairwise disjoint HOTI'13- Oğuzhan Erdem Agenda Ø Background Ø Prior Work Ø Our Solutions Ø Implementation Results Ø Conclusions & Future Work 10 HOTI'13- Oğuzhan Erdem 11 Our Solutions Ø Clustered Linked List Forest (CLLF) structure Ø Hierarchical two stage data structure Ø Reduces the memory requirement Ø Simplifies the table updates Ø Linear pipelined SRAM-based architecture on FPGAs Ø Supports up to 712K IPv6 prefixes Ø Sustained throughput of 432 Million lookups per second (138 Gbps for the minimum packet size of 40 Bytes) 12 HOTI'13- Oğuzhan Erdem Clustered Linked List Forest (CLLF) Ø Hierarchical two stage data structure Ø A prefix P can be expressed as the concatenation of two substrings Px and Py such that P = Px Py . Ø Stage 1 : Clustering stage (Px) Stage 2 : Linked List (LL) stage (Py) C18 C214 C320 C426 9 C215 C321 C427 C110 C216 C322 C428 11 C217 C323 C429 C112 C218 C324 C430 13 C219 C325 C431 Cluster3 Cluster4 C1 C1 C1 Cluster1 Cluster2 Clustering Stage N1,0,0 N1,1,0 N1,k,0 N2,0,0 N2,1,0 N2,t,0 N3,0,0 N3,1,0 N1,0,1 N1,1,1 N1,k,1 N2,0,1 N2,1,1 N2,t,1 N3,0,1 N3,1,1 N2,1,2 N2,t,2 N3,0,2 N3,r,0 N4,0,0 N4,1,0 N4,z,0 N4,1,1 N4,z,1 LL1 N1,0,2 N1,k,2 N4,1,2 LLk N1,0,m LL0 N2,t,m Linked List (LL) Stage N4,1,m HOTI'13- Oğuzhan Erdem 13 Clustering Ø Length-based Ø 64 clusters in IPv6 the worst case Ø Each cluster is indexed starting from 1 to n (Ci represents ith cluster) Ø Reduce the number of clusters by merging Ø Parameters Ø Block size (BS) : The number of distinct lengths involved in a cluster Ø CiL : Each distinct length (L) block of prefixes in a Ci Ø Ex: If a cluster Ci includes the prefixes of lengths from 8 to 11, then BS(Ci)=4 and Ci9 represents a bunch of prefixes with length 9. Ø Initial prefix stride (IPS) : The length of substring Px (|Px|) in P = Px Py Ø Ex: If IPS (Ci)= 4, then a prefix P = 110010111* in Ci can be represented using two substrings Px= 1100 and Py= 10111* 14 HOTI'13- Oğuzhan Erdem Data Structure Ø Stage 1: Clustering stage Ø Each group has a Binary Search Tree (BST) using Px of prefixes. Ø Stage 2: Linked List (LL) stage Ø Each node of BST connects to a linked list data structure. Ø Each node in a LL contains different Py substring with its length. Ø BS(C1)=2, BS(C2)=2, BS(C3)=3, IPS(C1)=2, IPS(C2)=3, IPS(C3)=4. P0 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 Prefix 10* 00* 000* 011* 101* 111* 1000* 0010* 0111* 10010* 11000* 01101* 11110* 10110* 01111* 101101* 011011* 1011110* 0110011* 01100000* 01101011* Next Hop 1 2 2 8 6 5 8 6 7 10 9 8 10 3 1 5 1 3 7 9 4 C1={C12,C13} C2={C24,C25} C3={C36,C37,C38} 100 (4) 10 (2) 0110 (6) 1011 (6) 01 (1) 110 (6) 011 (3) 11 (3) 101 (5) 001 (1) 00 (0) 111 (7) Clustering stage Linked List Stage - 0 P1 0 1 P2 1 1 P3 - 0 P0 1 1 P4 1 1 P5 0 1 P7 1 1 P8 0 1 P6 01 2 P11 10 2 P9 11 2 P14 10 2 P13 00 2 P10 10 2 P12 11 2 P16 01 2 P15 011 3 P18 110 3 P17 0000 4 P19 1011 4 P20 HOTI'13- Oğuzhan Erdem 15 IP Lookup Ø Split IP address into two: IPx and IPy Ø Based on the IPS value of corresponding cluster Ø Search IPx in Stage 1 (BST) and IPy in Stage 2 (Linear search). Ø Parallel searches in clusters Ø If a match is found in BST, proceed to the connected LL. Ø At each LL node, an IPy sub-address is compared with Py sub-prefix. Ø Update search result if longer match than the previous match. Ø Search terminates when all the nodes in a LL is traversed Ø Outputs from all clusters are given to the priority encoder that chooses the longest prefix. HOTI'13- Oğuzhan Erdem 16 Optimization Ø Node size optimizations Ø BST Ø The nodes of BST can be enhanced to store Py substrings Ø Ptree: A limit for the number of Py substrings per tree node Ø LL Ø LL nodes can be enhanced to store more than one Py substrings Ø PLL: Upper bound for the number of Py prefixes per LL node Ø Clustering optimizations Ø BS Ø Defines the number and size of clusters Ø Larges BS results in less number of pipelines in hardware implementation Ø IPS Ø Determines the number of nodes in BST and LL. Ø Smaller IPS reduces the number of BST nodes but increase the number of LL nodes and implicitly the delay. 17 HOTI'13- Oğuzhan Erdem Architecture Ø Overall 1 stage Ø Register Incoming Packet Header extracter Address Data IPx BST BST IPx NHIIP Address NHIIP Address Px Next Stage Address BRAM BST NHIprefix Match Module (BST) Address Next Stage Address Clustering stage Register LL LL LL Address Data IPy Cluster2 Clustern Address BRAM Address NHIprefix Priority Encoder Result IPy NHIIP L(Py) NHIIP Cluster1 Py LL stage Match Module (LL) Address HOTI'13- Oğuzhan Erdem Agenda Ø Background Ø Prior Work Ø Our Solutions Ø Implementation Results Ø Conclusions & Future Work 18 19 HOTI'13- Oğuzhan Erdem Results No. of prefixes Ø Experimental setup Ø Ten IPv4 core routing tables from Project - RIS on 06/03/2010. Ø Ten synthetic IPv6 routing tables generated using IPv4 core RTs IPv4 rrc00 rrc03 rrc05 rrc06 rrc07 rrc10 rrc11 rrc12 rrc15 rrc16 IPv6 rrc00_6 rrc03_6 rrc05_6 rrc06_6 rrc07_6 rrc10_6 rrc11_6 rrc12_6 rrc15_6 rrc16_6 # prefixes 332118 321617 322997 321577 322557 319952 323668 320016 323987 328296 140000 rrc00_6 120000 rrc03_6 100000 rrc06_6 80000 rrc07_6 rrc05_6 rrc10_6 60000 rrc11_6 rrc12_6 40000 rrc15_6 20000 rrc16_6 0 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 Prefix Length 20 HOTI'13- Oğuzhan Erdem Memory Requirement Ø For various BS and IPS (PLL=Ptree=1) Ø For various Ptree and PLL (BS=4, IPS=13) 19 20 Memory (Mbits) Memory (Mbits) 25 15 10 5 0 4 5 6 7 8 BS rrc00_6 9 10 rrc03_6 11 12 12 rrc05_6 13 14 15 16 17 18 18 17 16 15 1 2 IPS rrc06_6 3 4 PLL rrc07_6 rrc10_6 5 rrc11_6 6 7 rrc12_6 8 1 2 rrc15_6 3 4 Ptree rrc16_6 BT 341 330 332 331 332 329 347 329 334 337 BTPC 72 69 70 69 70 69 72 70 70 71 DBPC 45 45 45 45 46 45 48 45 45 46 BST 35.5 34.4 34.6 34.4 34.5 34.3 34.6 34.3 34.7 35.2 CTF 27.9 27.3 27.0 29.2 27.2 27.1 27.1 26.9 28.0 27.2 CLLF 16.4 15.8 15.9 15.9 16.0 15.7 16.0 15.7 15.9 16.1 21 HOTI'13- Oğuzhan Erdem Delay Ø The delay for various Ptree and PLL Ø The % of memory by BST and LL for various IPS (BS=4, PLL=5, Ptree=1) (BS=4, IPS=13) 200 150 100 50 0 1 2 3 4 PLL rrc00_6 CLLF 45 5 rrc03_6 44 6 7 8 rrc05_6 45 1 2 3 4 Ptree rrc06_6 44 % of memory usage Delay (Cycle) 250 rrc07_6 45 69 45 Delay 27 14 12 13 14 100% 90% 80% 70% 60% 50% 40% 30% 20% 10% 0% rrc10_6 45 rrc11_6 IPS 15 Ø Depth (cycles) (BS=4, IPS=13, PLL=5, Ptree=1) 6 6 16 17 18 BST rrc12_6 44 8 45 rrc15_6 46 Linked List rrc16_6 45 22 HOTI'13- Oğuzhan Erdem Performance Comparison Ø FPGA implementation Ø Implemented in Verilog using Xilinx ISE 12.4 and target device Xilinx XC6VSX475T Ø 216 MHz (432 million packets per second with dual-ported BRAMs) Ø Clock period is 4.63 ns, 138 Gbps for the minimum packet size of 40 bytes Ø Memory efficiency (in Bits/prefix) Ø Throughput efficiency (in Gbps/Bits) Architecture # prefix Mem. Eff. Bits/prefix Throughput Gbps Throughput eff. Gbps/B CLLF 324 51.8 138 2.66 CTF 324 88.1 135 1.53 BST 324 124 125 1.01 DBPC 324 146 120 0.82 2-3 tree 324 167 120 0.72 Flashtrie 310 171 64 0.37 TreeBitmap 310 405 6 0.02 HOTI'13- Oğuzhan Erdem Agenda • Background • Prior Work • Our Solutions • Implementation Sesults • Conclusions & Future Work 23 HOTI'13- Oğuzhan Erdem 24 Conclusions and Future Work Ø Clustered Linked List Forest Ø Significant memory saving Ø Performance improvements with optimizations Ø Linear pipelined SRAM-based architecture on FPGAs Ø Supports up to 712K IPv6 prefixes Ø Sustained throughput of 138 Gbps Ø Future work Ø Extend the algorithm to virtual routers to improve their memory efficiency THANK YOU !
© Copyright 2024 Paperzz