
Multi-Terabit IP Lookup Using Parallel Bidirectional Pipelines

Authors: Weirong Jiang, Viktor K. Prasanna
Publisher: ACM 2008
Presenter: Po Ting Huang
Date: 2010/3/03
Outline

– Introduction
– Architecture
  – Front End
  – Back End
– Memory Balancing
  – Trie Partitioning
  – Subtrie-to-Pipeline Mapping
  – Node-to-Stage Mapping
– Performance
Introduction

– To balance the memory distribution across stages, several novel pipeline
  architectures have been proposed [2, 11, 6], but none of them achieves a
  perfectly balanced memory distribution over the stages:
  – Ring → throughput degradation and packet blocking during a route update.
  – CAMP → delay variation and packet blocking during a route update.
  – OLP → the first several stages may not be balanced.
– The "memory wall" tends to impede further performance improvement of a
  single-pipeline architecture.
– Thus it becomes necessary to employ a parallel search architecture with
  multiple pipelines to speed up IP lookup.
– The proposed architecture consists of multiple bidirectional linear
  pipelines, where each pipeline stores part of the routing table.
– Two mapping schemes are proposed to balance the memory distribution over
  the different pipelines as well as across the stages within each pipeline.
Architecture

(figure: overall architecture, showing the front end, the parallel
bidirectional pipelines, and the back end)
Front End

– Receives packets and dispatches them to different pipelines.
– Cache: stores the most recently searched IP addresses and their next-hop
  information.
– DIT (destination index table): stores the relationship between the
  subtries and the pipeline entrances.
– Scheduler: determines which pipeline entrance each packet is routed to.
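A minimal sketch of how these front-end pieces could fit together, assuming
the DIT is indexed by the first I bits of the destination address; the names
(CACHE, DIT, dispatch) and the dictionary-based structures are illustrative
assumptions, not the paper's stated implementation.

```python
# Illustrative front-end dispatch sketch (names and data structures are
# assumptions; the paper specifies the components, not this exact logic).

I = 12        # initial stride: the DIT is assumed indexed by the first I bits
CACHE = {}    # most recently searched IP addresses -> next-hop information
DIT = {}      # first-I-bit index -> (pipeline id, entrance direction)

def dispatch(dst_ip: int):
    """Return cached next-hop info, or the pipeline entrance to route to."""
    if dst_ip in CACHE:                  # cache hit: bypass the pipelines
        return ("hit", CACHE[dst_ip])
    index = dst_ip >> (32 - I)           # first I bits select a subtrie
    pipeline, direction = DIT[index]     # DIT maps the subtrie to an entrance
    return ("dispatch", pipeline, direction)
```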
Back End

– Processes the packets and outputs the retrieved next-hop information.
– Multi-port queue: tolerates access conflicts.
– Output delay queue: when a packet exits the pipeline, its output may be
  delayed so that the intra-flow packet order is preserved.
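One plausible way to realize the output delay queue is to tag packets with
per-flow sequence numbers at the front end and release them in order at the
back end; this mechanism, and every name in the sketch, is an assumption for
illustration rather than the paper's stated design.

```python
import heapq

class OutputDelayQueue:
    """Sketch of an order-preserving output queue: packets are assumed to be
    tagged with per-flow sequence numbers on entry, and a packet is released
    only after all earlier packets of its flow have been released."""

    def __init__(self):
        self.next_seq = {}   # flow id -> next sequence number to release
        self.held = {}       # flow id -> min-heap of (seq, packet)

    def push(self, flow, seq, packet):
        """Buffer a packet leaving a pipeline; return all packets of this
        flow that are now in order and can be output."""
        heapq.heappush(self.held.setdefault(flow, []), (seq, packet))
        released, heap = [], self.held[flow]
        while heap and heap[0][0] == self.next_seq.get(flow, 0):
            released.append(heapq.heappop(heap)[1])
            self.next_seq[flow] = self.next_seq.get(flow, 0) + 1
        return released
```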
Trie Partitioning

– Initial stride (I): the number of initial address bits used to partition
  the trie into subtries.
– A larger I yields more, smaller subtries, but results in prefix
  duplication.
– Prefix duplication results in memory inefficiency and may increase the
  update cost.

(figure: example partitioning with I = 2)
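A minimal sketch of stride-based partitioning, operating on a flat prefix
list rather than the trie itself (an assumption made to keep the example
short); it shows how prefixes shorter than I get duplicated into every
subtrie they cover.

```python
# Partition a prefix table by the first I bits (illustrative sketch).
# Prefixes shorter than I must be duplicated into every subtrie they
# cover -- the prefix duplication the slide warns about.

def partition(prefixes, I):
    """prefixes: list of (bits, length), bits holding the prefix value.
    Returns {subtrie index: list of remaining (bits, length) prefixes}."""
    subtries = {}
    for bits, length in prefixes:
        if length >= I:
            idx = bits >> (length - I)          # first I bits pick the subtrie
            rest = (bits & ((1 << (length - I)) - 1), length - I)
            subtries.setdefault(idx, []).append(rest)
        else:
            # Duplicate: one copy per subtrie covered by this short prefix.
            for fill in range(1 << (I - length)):
                idx = (bits << (I - length)) | fill
                subtries.setdefault(idx, []).append((0, 0))  # subtrie root
    return subtries

# With I = 2, the prefix 1* (bits=0b1, length=1) is duplicated into
# subtries 10 and 11:
print(partition([(0b1, 1), (0b101, 3)], 2))   # {2: [(0,0), (1,1)], 3: [(0,0)]}
```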
– We study the prefix length distribution of four representative routing
  tables collected from [17].
– In the following sections, I = 12 is used as the default.
Subtrie-to-Pipeline Mapping

– S_i denotes the set of subtries contained in the i-th pipeline, i = 1 to P.
– K is the number of subtries.
– T_i is the i-th subtrie, i = 1 to K.
– The goal is to balance the total subtrie size across the P pipelines.
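Balancing subtrie sizes across pipelines is a bin-packing-style problem; the
sketch below uses a standard greedy largest-first heuristic as one plausible
approach, and is not the paper's exact mapping algorithm.

```python
import heapq

def map_subtries(subtrie_sizes, P):
    """Greedy largest-first balancing sketch: assign each subtrie, largest
    first, to the currently least-loaded pipeline.
    subtrie_sizes: {subtrie id: node count}; returns {pipeline id: [ids]}."""
    loads = [(0, i) for i in range(P)]            # (node count, pipeline id)
    heapq.heapify(loads)
    S = {i: [] for i in range(P)}                 # S_i: subtries of pipeline i
    for tid, size in sorted(subtrie_sizes.items(), key=lambda kv: -kv[1]):
        load, pipe = heapq.heappop(loads)         # least-loaded pipeline
        S[pipe].append(tid)
        heapq.heappush(loads, (load + size, pipe))
    return S

print(map_subtries({0: 90, 1: 60, 2: 50, 3: 40}, 2))  # {0: [0, 3], 1: [1, 2]}
```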
(figure: subtrie-to-pipeline mapping example)
Node-to-Stage Mapping

– Constraint: if node A is an ancestor of node B in a trie, then A must be
  mapped to a stage preceding the stage to which B is mapped.
– The main ideas are to allow:
  (1) two subtries to be mapped onto different directions, and
  (2) two trie nodes on the same trie level to be mapped onto different
      stages.
– Several heuristics to select the subtries to be inverted (scored in the
  sketch after this list):
  1. Largest leaf
  2. Least height
  3. Largest leaf per height
  4. Least average depth per leaf
– IFR denotes the inversion factor:
  – IFR = 0 → no subtrie is inverted.
  – IFR close to the pipeline depth → all subtries are inverted.
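A sketch of how the four heuristics could be scored. The subtrie summary
fields (leaf count, height, summed leaf depth) and the way IFR is turned
into a count of subtries to invert are assumptions for illustration; the
paper's exact threshold rule is not reproduced here.

```python
def inversion_score(leaves, height, total_leaf_depth, heuristic):
    """Score one subtrie for inversion; higher means invert earlier.
    Assumed summary fields: leaf count, height, and summed leaf depth."""
    return {
        "largest_leaf": leaves,
        "least_height": -height,
        "largest_leaf_per_height": leaves / height,
        "least_avg_depth_per_leaf": -(total_leaf_depth / leaves),
    }[heuristic]

def pick_inverted(subtries, heuristic, num_to_invert):
    """subtries: {id: (leaves, height, total_leaf_depth)}.
    Rank by the chosen heuristic and invert the top candidates; the count
    is assumed to grow with IFR (0 -> none, pipeline depth -> all)."""
    ranked = sorted(subtries,
                    key=lambda t: inversion_score(*subtries[t], heuristic),
                    reverse=True)
    return set(ranked[:num_to_invert])
```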
Node-to-Stage Mapping

(figure: step-by-step mapping example with IFR = 1, using the least average
depth per leaf heuristic)

– The priority of a trie node is defined as:
  – forward subtrie → its height;
  – reverse subtrie → its depth.
– A node whose priority is equal to the number of remaining stages is
  regarded as a critical node: it must be mapped to the current stage, or
  its subtree will not fit into the stages that remain.
– Node fields:
  – Distance to the child node
  – Memory address of the child node
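A simplified, forward-direction-only sketch of the stage-by-stage mapping
implied by these definitions; the per-stage capacity handling and the
frontier bookkeeping are assumptions, while the critical-node rule follows
the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    priority: int               # height (forward subtrie) or depth (reverse)
    children: list = field(default_factory=list)

def map_nodes(roots, H, capacity):
    """Map trie nodes to H stages (forward direction only, simplified).
    Critical nodes (priority == remaining stages) must be placed now;
    remaining slots go to the highest-priority candidates."""
    frontier, stages = list(roots), {}
    for stage in range(H):
        remaining = H - stage
        critical = [n for n in frontier if n.priority >= remaining]
        rest = sorted((n for n in frontier if n.priority < remaining),
                      key=lambda n: n.priority, reverse=True)
        slots = max(0, capacity - len(critical))
        chosen = critical + rest[:slots]
        stages[stage] = chosen
        frontier = rest[slots:]                 # deferred to later stages
        for n in chosen:                        # children become candidates
            frontier.extend(n.children)
    return stages
```

Because a node's children only enter the candidate frontier after the node
itself is placed, every child lands in a strictly later stage, which is
exactly the ancestor-precedes-descendant constraint stated above.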
Performance

– Experimental setting: I = 12, P = 4 pipelines, H = 25 stages, IFR = 4~8,
  using the least average depth per leaf heuristic.
Performance

– Memory: 1.8 MB
  – (13 + 5) bits per node × 2^13 nodes per stage × 25 stages × 4 pipelines
    = 14.75 Mb ≈ 1.8 MB
  – 18 KB per stage
– Throughput: 18.75 G packets/sec
  – 7.5 packets per clock × 2.5 GHz = 18.75 G packets/sec = 6.0 Tbps
    (packet size = 40 bytes)
– Power consumption: 0.2 W per IP lookup
  – 0.008 W per stage × 25 stages = 0.2 W
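The arithmetic on this slide can be checked directly; a small sketch (the
13 + 5 bit-per-node split and the 0.008 W per-stage figure are taken from
the slide as given):

```python
# Check the slide's numbers (decimal units: 1 Mb = 10^6 bits).

bits_per_node   = 13 + 5      # per the slide
nodes_per_stage = 2 ** 13
stages, pipelines = 25, 4

memory_bits = bits_per_node * nodes_per_stage * stages * pipelines
print(memory_bits / 1e6)                              # 14.7456 (~14.75 Mb)
print(memory_bits / 8 / 1e6)                          # 1.8432  (~1.8 MB)
print(memory_bits / 8 / (stages * pipelines) / 1e3)   # 18.432  (~18 KB/stage)

pps = 7.5 * 2.5               # packets/clock x GHz -> 18.75 G packets/sec
print(pps * 40 * 8 / 1000)    # 6.0 Tbps at 40-byte packets

print(0.008 * stages)         # 0.2 W
```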