Multi-Engine Packet Classification Hardware

Scalable Pattern-Matching via Dynamic Differentiated Distributed Detection (D4)
Author: Kai Zheng, Hongbin Lu
Publisher: GLOBECOM 2008
Presenter: Han-Chen Chen
Date: 2009/12/23
Introduction
• Because network flow sizes are unbalanced, the traditional flow-based
data-parallel processing/programming model cannot fully exploit a
multicore platform's computing power, which results in poor
performance scalability.
• The pattern set is pre-partitioned and multiple candidate PM methods
handle the subsets; a Detection Mode is then selected specifically
for each incoming flow at run time.
Primitive idea of Distributed Detection
Traditional flow-based load balancing.
Reallocating/balancing the workload via Distributed Detection (D2).
Overhead of Distributed Detection
1. Overhead from the OS/system, owing to the increased number of
memory references needed to address the data structures of the subsets.
2. The higher the mode used, the higher the overhead may be.
Architecture of Differentiated Distributed Detection
Task-info Queue: stores the information denoting which flow to
inspect and which pattern set/subset to detect against.
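A minimal sketch of what a Task-info Queue entry might look like, to make the description concrete. The names `TaskInfo`, `flow_id`, `subset_id` and `enqueue_flow` are hypothetical, and the sketch assumes that Mode m means the flow is inspected against m pattern subsets, one task per subset; the slides do not specify the actual record layout.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class TaskInfo:
    """One unit of work for a pattern-matching engine thread (hypothetical layout)."""
    flow_id: int    # which flow to inspect
    subset_id: int  # which pattern subset to detect against


def enqueue_flow(task_queue: Queue, flow_id: int, mode: int) -> None:
    """Post one task per pattern subset.

    Assumption: Mode m splits the pattern set into m subsets, so a flow
    handled in Mode m produces m tasks that can run in parallel.
    """
    for subset_id in range(mode):
        task_queue.put(TaskInfo(flow_id, subset_id))


tasks: Queue = Queue()
enqueue_flow(tasks, flow_id=7, mode=3)  # three tasks for flow 7
```

A worker thread would then pop a `TaskInfo`, look up the flow's data and the subset's tables, and run the matcher on that pair.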
Methods of Differentiated Distributed Detection
Aho-Corasick (AC) algorithm:
The AC algorithm consumes much more memory and delivers relatively
lower average performance, especially when dealing with huge
pattern sets.
Modified-Wu-Manber (MWM) algorithm:
MWM has a much lower memory requirement, but it is a poor fit, and
its performance becomes non-deterministic, when dealing with
short patterns (since the bad-character shifts are bounded by the
minimum pattern length of the set) and when hash collisions occur
heavily.
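The remark about short patterns can be illustrated with a little arithmetic. In a Wu-Manber style matcher that reads blocks of B characters, the largest possible skip is m - B + 1, where m is the shortest pattern length in the set. The helper name `max_shift` below is hypothetical; it only evaluates that bound.

```python
def max_shift(pattern_lengths, block_size):
    """Upper bound on a Wu-Manber skip: m - B + 1, with m the shortest pattern."""
    m = min(pattern_lengths)
    if m < block_size:
        raise ValueError("every pattern must be at least block_size long")
    return m - block_size + 1


print(max_shift([7, 7], block_size=3))     # two 7-byte patterns
print(max_shift([7, 7, 3], block_size=3))  # one 3-byte pattern added
```

Adding a single 3-byte pattern to a set of 7-byte patterns drops the bound from 5 to 1, so the matcher can no longer skip ahead by more than one position.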
Wu-Manber Algorithm
• Builds on the basic idea of the Boyer-Moore algorithm. It uses a
SHIFT table, a HASH table, and a PREFIX table.
• We impose a requirement that all patterns have the same length.
• The text is examined in blocks of B characters.
• Each string of size B is mapped (using a hash function) to an integer
used as an index into the SHIFT table.
• The same integer is used to index another table, called HASH. The
i-th entry of the HASH table, HASH[i], contains a pointer to a list of
patterns whose last B characters hash to i.
• Because suffixes such as ‘ion’ or ‘ing’ are very common in English
text, we also map the first B’ characters of all patterns into the
PREFIX table.
• It is much less common for different patterns to share both the
same prefix and the same suffix.
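The table structure described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the presenters' implementation: block strings are used directly as dictionary keys in place of an explicit integer hash, the prefix is stored next to each pattern in its HASH bucket rather than in a separate PREFIX table, and only the first m characters of each pattern are indexed (m being the shortest pattern length) so that patterns of unequal length still work. The names `build_tables` and `search` are hypothetical.

```python
def build_tables(patterns, B=3, B_prime=2):
    """Build SHIFT and HASH tables over the first m characters of each pattern."""
    m = min(len(p) for p in patterns)
    if m < B:
        raise ValueError("every pattern must be at least B characters long")
    default_shift = m - B + 1          # skip used for blocks that occur in no pattern
    shift = {}                         # block -> safe skip distance
    buckets = {}                       # last block -> [(prefix, full pattern), ...]
    for p in patterns:
        head = p[:m]
        for j in range(B, m + 1):      # block ending at offset j within head
            block = head[j - B:j]
            shift[block] = min(shift.get(block, default_shift), m - j)
        buckets.setdefault(head[m - B:], []).append((head[:B_prime], p))
    return m, default_shift, shift, buckets


def search(text, patterns, B=3, B_prime=2):
    """Return (start, pattern) pairs for every occurrence of a pattern in text."""
    m, default_shift, shift, buckets = build_tables(patterns, B, B_prime)
    hits = []
    pos = m                            # exclusive end of the current m-character window
    while pos <= len(text):
        block = text[pos - B:pos]
        skip = shift.get(block, default_shift)
        if skip > 0:
            pos += skip                # block cannot end a match here: jump ahead
            continue
        start = pos - m
        prefix = text[start:start + B_prime]
        for stored_prefix, pattern in buckets.get(block, ()):
            # Cheap prefix filter first, then a full comparison.
            if stored_prefix == prefix and text.startswith(pattern, start):
                hits.append((start, pattern))
        pos += 1
    return hits


print(search("we are working and talking", ["talking", "working"]))
```

For an input such as "abcding", the block "ing" has shift 0, but the prefix "ab" matches neither stored prefix, so both candidates are rejected without a full comparison.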
Wu-Manber Algorithm
Ex: pattern set: talking, working
input string: abcding
B=3;
B’=2;
hash[“ing”]=i;
if(Shift[i]>0)
    shift by Shift[i];
else
{
    calculate the hash value k of the prefix “ab”;
    look in the i-th bucket of the hash table for patterns whose prefix hash value is k;
    check whether those patterns actually match;
}
[Figure: Shift table and hash table. The last B characters of each pattern hash to index i; bucket i of the hash table holds “talking” and “working”.]
Pseudo-code of the prototyped PSP algorithm
[Figure labels: Temp bucket; IS1, IS2, …, ISNint; AC; PS1, PS2, PS3, …, PSm-1, PSm]
Step 2 example
• Pattern1: talking, Pattern2: working
• K = Hash[“ing”] = 15, Nint = 5
• For “ring”, calculate the hash key k = 15
    I[15] = (I[15]+1)%5 = 1
    Add “ring” to IS1
• For “working”, calculate the hash key k = 15
    I[15] = (I[15]+1)%5 = 2
    Add “working” to IS2
[Figure labels: PSorig – PSmode-m(1); talking; working]
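The counter update in the Step 2 example can be sketched as follows. This is a sketch of that one step only, not the full PSP algorithm (whose pseudo-code did not survive in this transcript). The function name `spread_by_suffix_hash` and the `hash_fn` parameter are hypothetical, and keeping counter value 0 as its own slot is an assumption, since the slides only show values 1 and 2.

```python
from collections import defaultdict


def spread_by_suffix_hash(patterns, n_int, block_size=3, hash_fn=hash):
    """Distribute patterns that share a suffix hash across intermediate sets.

    For each pattern, k is the hash of its last block_size characters.
    The per-key counter I[k] advances round-robin modulo n_int, and the
    pattern is added to the intermediate set selected by the new counter.
    """
    counters = defaultdict(int)        # I[k], starting at 0
    intermediate = defaultdict(list)   # counter value -> patterns (IS1, IS2, ...)
    for pattern in patterns:
        k = hash_fn(pattern[-block_size:])
        counters[k] = (counters[k] + 1) % n_int
        intermediate[counters[k]].append(pattern)
    return dict(intermediate)


# Stand-in hash that reproduces the slide's value Hash["ing"] = 15.
def toy_hash(block):
    return 15 if block == "ing" else 0


print(spread_by_suffix_hash(["ring", "working"], n_int=5, hash_fn=toy_hash))
```

Patterns ending in the same block would otherwise pile into one hash bucket; rotating them across intermediate sets spreads those collisions out.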
Implementation of Mode Selector & Scheduler
• Applying D2 to small flows is rarely worthwhile, since small flows
are easy to schedule and are less likely to cause
“out-of-balance” issues. (Small flows: tens of KBs.)
• The system may not always be ready for D2, even for large flows.
D2 only provides a way to raise CPU utilization; if the system is
already very busy and will remain busy for a while, applying D2
would merely exhaust the system.
• The MSS should also take the characteristics of the system into
account, or try to “adapt” to the system; e.g. a pre-test on the
system (using certain sample traces) may be necessary when
determining the parameters for dynamic mode selection.
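The three considerations above suggest a decision rule along the following lines. This is a hedged sketch only: the function `select_mode`, its parameters, and the default thresholds (64 KiB for "small", 90% utilization for "busy") are placeholder assumptions, and in practice such parameters would come from the kind of pre-test the slide mentions.

```python
def select_mode(flow_bytes, cpu_utilization, idle_threads,
                max_mode=4, small_flow_bytes=64 * 1024, busy_threshold=0.9):
    """Pick a Detection Mode for one flow (1 means no distributed detection).

    All thresholds are hypothetical tuning parameters.
    """
    if flow_bytes <= small_flow_bytes:
        return 1                      # small flows are easy to schedule as-is
    if cpu_utilization >= busy_threshold:
        return 1                      # system already busy: D2 adds only overhead
    # Otherwise use the idle threads, capped by the highest supported mode.
    return max(1, min(max_mode, idle_threads))


print(select_mode(10 * 1024, 0.20, idle_threads=4))         # small flow
print(select_mode(10 * 1024 * 1024, 0.95, idle_threads=4))  # busy system
print(select_mode(10 * 1024 * 1024, 0.30, idle_threads=3))  # large flow, idle capacity
```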
Schematic of Mode Selector & Scheduler
Performance
Throughput scalability comparison among different MWM-based parallel PM schemes:
1. The straightforward per-flow load-balancing scheme (i.e. the non-D2 scheme,
using only Mode 1).
2. The brute-force D2 scheme, in which the Detection Mode equals the number of
PME threads used.
3. The dynamic D2 scheme, in which Detection Modes are selected at run time.
4. D4, which is similar to the dynamic D2 scheme except that patterns no
larger than 9 bytes are processed by the AC algorithm when Mode > 1.
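The rule that distinguishes D4 from dynamic D2 in the list above can be written as a small partition step. The function name `split_patterns_for_d4` is hypothetical, and the sketch assumes that in Mode 1 all patterns stay with MWM, which is how the non-D2 baseline is described.

```python
def split_patterns_for_d4(patterns, mode, short_limit=9):
    """Return (ac_patterns, mwm_patterns) for one flow.

    When mode > 1, patterns of at most short_limit bytes go to the
    Aho-Corasick matcher and the rest stay with MWM. In Mode 1 nothing
    is diverted (assumption based on the baseline description).
    """
    if mode <= 1:
        return [], list(patterns)
    ac = [p for p in patterns if len(p) <= short_limit]
    mwm = [p for p in patterns if len(p) > short_limit]
    return ac, mwm


rules = [b"GET", b"cmd.exe", b"/etc/passwd", b"SELECT * FROM"]
print(split_patterns_for_d4(rules, mode=2))
```

Moving the short patterns out of the MWM set raises its minimum pattern length, which in turn allows larger shifts.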
Thank you for listening