Combining SIMD and Many/Multi-core Parallelism for
Finite State Machines with Enumerative Speculation
Peng Jiang
Gagan Agrawal
Department of Computer Science and Engineering
The Ohio State University, USA
Motivation - FSM
• Finite state machines (FSMs) are at the core of many practical applications
  • Regular expression matching
  • Huffman decoding
  • XML validation
  • Text tokenization
  • …
Motivation - FSM
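• Example (Div7): the state is the remainder modulo 7 of the binary number read so far, so next state = (2 × state + bit) mod 7
  state   0  1  2  3  4  5  6
  Bit 0   0  2  4  6  1  3  5
  Bit 1   1  3  5  0  2  4  6
  [Figure: example run of Div7 over an input bit string, showing the state reached after each bit]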
state = 0;
for (i in input_text) {
    state = T[i][state];
}
Embarrassingly sequential: the current ‘state’ depends on the ‘state’ of the previous iteration and on the current input
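As a minimal sketch of the loop above, the Python below runs Div7 sequentially. It assumes the table is stored as T[bit][state], following the slide's T[i][state], with entries (2 × state + bit) mod 7; the helper name run_fsm is hypothetical.

```python
# Div7 transition table indexed as T[input_bit][state], matching the slide's
# T[i][state]; each entry is (2 * state + bit) mod 7.
N_STATES = 7
T = [[(2 * s + bit) % N_STATES for s in range(N_STATES)] for bit in (0, 1)]


def run_fsm(table, bits, start=0):
    """Sequential FSM execution: every step needs the previous state."""
    state = start
    for i in bits:
        state = table[i][state]
    return state


bits = [1, 0, 1, 1, 0, 1]   # binary 101101 = 45
print(run_fsm(T, bits))     # 45 mod 7 = 3, so this prints 3
```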
Background on Parallelizing FSMs
• Speculation
  • Divide the input into several blocks
  • GUESS the starting state of each block except the first one (getting a good guess can be expensive)
  • Perform the computation on each block in parallel
  • Re-execute any block that started with a wrong state
  • Has mainly been implemented at the thread/process level (Zhao et al. ASPLOS’14, ’15)
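A minimal sketch of block-level speculation, simulated sequentially in Python. The fixed guess used for every non-first block is an assumption standing in for a real predictor, and the names speculative_run and run_fsm are hypothetical; the point is the structure: independent block runs followed by an in-order validation pass that re-executes mispredicted blocks.

```python
N_STATES = 7
T = [[(2 * s + bit) % N_STATES for s in range(N_STATES)] for bit in (0, 1)]


def run_fsm(table, bits, start):
    state = start
    for i in bits:
        state = table[i][state]
    return state


def speculative_run(table, bits, n_blocks, guess=0):
    """Block-level speculation; the parallel phase is simulated with a loop.

    Block 0 starts from the true start state 0; every other block starts from
    the (hypothetical) fixed `guess`. Returns (final_state, re_executions).
    """
    assert bits and n_blocks >= 1
    size = -(-len(bits) // n_blocks)                      # ceiling division
    blocks = [bits[k:k + size] for k in range(0, len(bits), size)]
    starts = [0] + [guess] * (len(blocks) - 1)
    # "Parallel" phase: each block runs independently from its assumed start.
    ends = [run_fsm(table, b, s) for b, s in zip(blocks, starts)]
    # Validation phase: walk blocks in order, re-executing wrong guesses.
    state, reexecuted = ends[0], 0
    for k in range(1, len(blocks)):
        if starts[k] != state:
            ends[k] = run_fsm(table, blocks[k], state)
            reexecuted += 1
        state = ends[k]
    return state, reexecuted


bits = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1]   # 2899 in binary
print(speculative_run(T, bits, 2, guess=0))   # (1, 1)
print(speculative_run(T, bits, 2, guess=3))   # (1, 0)
```

Splitting the 12-bit input into two blocks, guessing state 0 for block 2 forces one re-execution, while guessing 3 (the true final state of block 1) forces none; both return 1, since 2899 mod 7 = 1.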
Background on Parallelizing FSMs
• Enumeration
  • Divide the input into several blocks
  • Perform the computation on each block in parallel, starting from ALL possible states of the FSM
  • The active states may converge to a small number
  • Has utilized SIMD to perform multiple state transitions simultaneously (Mytkowicz et al. ASPLOS’14)
  • Redundant work may outweigh the benefit of parallelization
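The enumeration approach can be sketched in Python by running each block from every state, which yields a start-to-end mapping per block, and then composing the mappings in order. No block is ever re-executed, but each block costs N transitions per input symbol. The function names are hypothetical.

```python
N_STATES = 7
T = [[(2 * s + bit) % N_STATES for s in range(N_STATES)] for bit in (0, 1)]


def enumerate_block(table, bits, n_states):
    """Run one block from ALL start states; result[s] is the end state for start s."""
    current = list(range(n_states))
    for i in bits:
        current = [table[i][q] for q in current]
    return current


def enumerative_run(table, bits, n_blocks, n_states, start=0):
    size = -(-len(bits) // n_blocks)
    blocks = [bits[k:k + size] for k in range(0, len(bits), size)]
    # Independent per-block work (the part that can run in parallel).
    maps = [enumerate_block(table, b, n_states) for b in blocks]
    # Cheap sequential composition; no re-execution is ever needed.
    state = start
    for m in maps:
        state = m[state]
    return state


def active_states(table, bits, n_states):
    """Number of distinct states still tracked at the end of a block."""
    return len(set(enumerate_block(table, bits, n_states)))
```

For Div7 each input bit permutes the seven states (doubling is invertible modulo 7), so active_states always stays at 7: convergence requires transitions that map different states to the same next state, which Div7 never does.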
Exploiting Fine-grain Parallelism (SIMD)
• More powerful SIMD features (Intel Xeon Phi)
  • Wider SIMD vectors (512 bits)
  • Gather/scatter operations enable irregular memory accesses
  • Mask data type and operations enable computation on specified SIMD lanes
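To make the gather and mask semantics concrete, here is a lane-by-lane Python emulation of a masked gather. It is a sketch of the semantics only, not an intrinsic: the function name masked_gather is hypothetical, and treating a 512-bit vector as 16 lanes of 32-bit values is an assumption about the element width.

```python
def masked_gather(src, mask, table, index):
    """Lane-wise emulation of a masked gather.

    Lanes whose mask bit is set load table[index[lane]] (an irregular access);
    lanes whose mask bit is clear keep their old value from `src`.
    """
    return [table[idx] if m else old
            for old, m, idx in zip(src, mask, index)]


WIDTH = 16                                        # assumed: 512 bits / 32-bit lanes
table = list(range(100, 164))
src = [-1] * WIDTH
mask = [lane % 2 == 0 for lane in range(WIDTH)]   # only even lanes are active
index = [3 * lane for lane in range(WIDTH)]       # non-contiguous addresses
print(masked_gather(src, mask, table, index)[:4]) # [100, -1, 106, -1]
```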
Exploiting Fine-grain Parallelism (SIMD)
• SIMD state transitions
  • Each lane stores a state
  • Calculate the addresses of the next states simultaneously
  • Gather the new states from the transition table
  [Figure: input vector vinput (in0, in1, …, in(n-1)) and state vector vstate (state0, state1, …, state(m-1)), one element per SIMD lane]
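A Python emulation of one SIMD state-transition step, assuming the transition table is flattened row by row so the address of T[i][s] is i × N + s (the flattening and the name simd_step are illustrative, not taken from the slides). Each lane computes its address independently, and a single gather then loads all next states.

```python
N_STATES = 7
# Flattened Div7 table: T_flat[i * N_STATES + s] is the next state for input i in state s.
T_flat = [(2 * s + i) % N_STATES for i in (0, 1) for s in range(N_STATES)]


def simd_step(vstate, vinput, table_flat, n_states):
    """One vectorized transition: per-lane address computation, then a gather."""
    addr = [i * n_states + s for s, i in zip(vstate, vinput)]
    return [table_flat[a] for a in addr]


vstate = [0, 1, 2, 3, 4, 5, 6, 0]
vinput = [0, 0, 0, 0, 0, 0, 0, 1]
print(simd_step(vstate, vinput, T_flat, N_STATES))   # [0, 2, 4, 6, 1, 3, 5, 1]
```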
Our Contribution
• Enumerative speculation
  • Lies between speculation (1 state) and enumeration (all N states)
  • Computes with n states per block (1 < n < N)
  • Makes prediction easier
  • Limits the amount of redundant work
• Efficient mapping of enumerative speculation for FSMs to SIMD architectures
  • Accommodates more than one speculation task in one SIMD vector
  • Trades off redundant work against speculation success rate
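A sketch of enumerative speculation for a single block in Python: the block is run from n candidate start states at once, and speculation succeeds if the true start state is among them. The names are hypothetical, and the candidates are simply passed in by the caller; how they are predicted is not modeled.

```python
N_STATES = 7
T = [[(2 * s + bit) % N_STATES for s in range(N_STATES)] for bit in (0, 1)]


def run_from(table, bits, starts):
    """Run one block from several start states at once (one lane per start)."""
    current = list(starts)
    for i in bits:
        current = [table[i][q] for q in current]
    return dict(zip(starts, current))            # start state -> end state


def enum_spec_block(table, bits, candidates, true_start):
    """Enumerative speculation with n = len(candidates) guessed start states.

    Returns (end_state, success). On a miss, the block is re-executed from
    the true start state, as in plain speculation.
    """
    ends = run_from(table, bits, candidates)
    if true_start in ends:
        return ends[true_start], True
    return run_from(table, bits, [true_start])[true_start], False


block2 = [0, 1, 0, 0, 1, 1]
print(enum_spec_block(T, block2, [2, 3, 4], true_start=3))   # (1, True)
print(enum_spec_block(T, block2, [0], true_start=3))         # (1, False)
```

With n = 3 candidates the true start state 3 is covered and no re-execution is needed; a single guess (n = 1, plain speculation) misses and the block is re-run.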
Enumerative Speculation (SIMD implementation)
• Enumerative speculation in a SIMD vector
  • State vector (1-way): [correct state for block i | guessed states for block i+1]
  • State vector (2-way): [correct state for block i | guessed states for block i+1 | guessed states for block i+2]
  • State vector (3-way): [correct state for block i | guessed states for block i+1 | guessed states for block i+2 | guessed states for block i+3]
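One possible lane assignment for the k-way layouts, sketched in Python. The even split and the policy of leaving leftover lanes idle are assumptions (the slides do not say how an uneven split is handled), and lane_layout is a hypothetical name.

```python
def lane_layout(width, ways):
    """Split SIMD lanes for k-way enumerative speculation.

    Lane 0 carries the correct state of block i; the remaining width-1 lanes
    are divided evenly among blocks i+1 .. i+ways. Leftover lanes (when
    width-1 is not divisible by `ways`) stay idle in this sketch.
    Returns a list of (block_offset, lanes) pairs.
    """
    per_way = (width - 1) // ways
    if per_way == 0:
        raise ValueError("more speculative blocks than available lanes")
    layout = [(0, [0])]
    for w in range(ways):
        first = 1 + w * per_way
        layout.append((w + 1, list(range(first, first + per_way))))
    return layout


for ways in (1, 2, 3):
    print(ways, [(blk, len(lanes)) for blk, lanes in lane_layout(16, ways)])
```

With 16 lanes, 1-way speculation gives block i+1 fifteen guessed states, 2-way gives blocks i+1 and i+2 seven each (one lane idle), and 3-way gives five each: more blocks per vector means fewer guesses per block.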
Enumerative Speculation (SIMD implementation)
• SIMD speculation (Div7 example, 8 SIMD lanes)
  • Input: block 1 = 1 0 1 1 0 1, block 2 = 0 1 0 0 1 1
  • Lane 0 runs block 1 from the known start state 0; lanes 1-7 run block 2 from every possible start state (0 to 6)
  • State of each lane after each input bit:

            start  1  2  3  4  5  6
    lane 0    0    1  2  5  4  1  3
    lane 1    0    0  1  2  4  2  5
    lane 2    1    2  5  3  6  6  6
    lane 3    2    4  2  4  1  3  0
    lane 4    3    6  6  5  3  0  1   <- selected
    lane 5    4    1  3  6  5  4  2
    lane 6    5    3  0  0  0  1  3
    lane 7    6    5  4  1  2  5  4

  • The final state of block 1 is 3, so select the speculation of block 2 that started with state 3 (lane 4); its final state, 1, is the correct final state
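The walk-through above can be reproduced with a short Python emulation of the 8-lane vector. The lane count and inputs come from the example; the list-based vector and variable names are illustrative.

```python
N_STATES = 7
T = [[(2 * s + bit) % N_STATES for s in range(N_STATES)] for bit in (0, 1)]

block1 = [1, 0, 1, 1, 0, 1]
block2 = [0, 1, 0, 0, 1, 1]
LANES = 8

# Lane 0 runs block 1 from the known start state 0; lanes 1-7 run block 2
# from every possible start state 0-6 (1-way enumerative speculation).
vstart = [0] + list(range(N_STATES))
vstate = list(vstart)
for step in range(len(block1)):                 # both blocks have 6 bits here
    vinput = [block1[step]] + [block2[step]] * (LANES - 1)
    vstate = [T[i][s] for i, s in zip(vinput, vstate)]

block1_end = vstate[0]                          # correct final state of block 1
lane = 1 + vstart[1:].index(block1_end)         # lane whose guess matches
print(block1_end, lane, vstate[lane])           # 3 4 1
```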
Enumerative Speculation (SIMD implementation)
• Correctness checking
  [Figure: vectors vf = (vf[0], vf[1], …, vf[m]) and vs = (vs[0], vs[1], …, vs[m]); vf[0] is broadcast to lanes 1 to m and compared with vs[1] … vs[m] in a single SIMD comparison]
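A Python sketch of the check, assuming (as in the Div7 walk-through) that vf holds the final state of each lane and vs the start state of each lane, so vf[0] is the correct final state of block i and vs[1..m] are the guesses for block i+1. The name check_speculation is hypothetical.

```python
def check_speculation(vf, vs):
    """Broadcast vf[0] and compare it with vs[1..m] lane by lane.

    Returns the comparison mask and the first matching lane, or None if every
    guess was wrong (the next block must then be re-executed).
    """
    broadcast = [vf[0]] * len(vs)                     # vf[0] in every lane
    mask = [b == s for b, s in zip(broadcast, vs)]    # SIMD comparison
    mask[0] = False                                   # lane 0 is not a guess
    hit = next((lane for lane, m in enumerate(mask) if m), None)
    return mask, hit


vf = [3, 5, 6, 0, 1, 2, 3, 4]
vs = [0, 0, 1, 2, 3, 4, 5, 6]
mask, hit = check_speculation(vf, vs)
print(hit, vf[hit])   # 4 1
```

For the Div7 walk-through only lane 4 matches, and vf[4] = 1 is the correct final state of block 2.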
Enumerative Speculation (MIMD implementation)
• Extend to multithreading
  • The first thread performs SIMD speculation
    • The first lane stores the correct state
    • The remaining lanes do speculation
  • The other threads speculate with SIMD
    • Their SIMD lanes are evenly divided into groups
    • Each group speculates on a different input block
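A control-flow sketch of thread-level enumerative speculation in Python, using concurrent.futures.ThreadPoolExecutor. It simplifies the scheme above to one task per block rather than several lane groups per thread, and in CPython the GIL prevents any real speedup for this CPU-bound loop, so it only illustrates how blocks are speculated independently and then stitched together in order. Function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

N_STATES = 7
T = [[(2 * s + bit) % N_STATES for s in range(N_STATES)] for bit in (0, 1)]


def run_from(bits, starts):
    """One SIMD vector: every lane runs `bits` from one start state."""
    current = list(starts)
    for i in bits:
        current = [T[i][q] for q in current]
    return dict(zip(starts, current))


def parallel_enum_spec(bits, n_blocks, guesses, n_threads=4):
    """Block 0 runs from the true start state 0; other blocks from `guesses`.

    Returns (final_state, misses), where misses counts re-executed blocks.
    """
    size = -(-len(bits) // n_blocks)
    blocks = [bits[k:k + size] for k in range(0, len(bits), size)]
    starts = [[0]] + [list(guesses)] * (len(blocks) - 1)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(run_from, blocks, starts))   # order preserved
    state, misses = results[0][0], 0
    for blk, res in zip(blocks[1:], results[1:]):
        if state not in res:              # true start state was not guessed
            res = run_from(blk, [state])
            misses += 1
        state = res[state]
    return state, misses


bits = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
print(parallel_enum_spec(bits, 4, range(7)))   # (1, 0)
print(parallel_enum_spec(bits, 4, [0]))        # (1, 3)
```

Enumerating all seven states never misses, while guessing only state 0 makes every non-first block miss and re-execute.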
Results
• Platform
  • Intel Xeon Phi SE10P coprocessor (KNC)
  • 61 cores, 1.1 GHz
  • 8 GB GDDR5
  • Intel ICC 13.1.0, -O3 enabled
• Applications
  • Huffman decoding
  • Regular expression matching
  • HTML tokenization
  • Div7
Huffman Decoding
• Single-thread performance, speedup from SIMD
  [Figure: execution time (sec, 0 to 0.18) of Serial, 1-way EnumSpec, 1-way PureSpec, and 15-way PureSpec on inputs 76.txt.utf-8 and 50247.txt.utf-8; percentage labels on the bars: 7.69%, 7.69%, 61.85%, 97.61%, 61.96%, 98.02%]
Huffman Decoding
• Multi-threaded performance, speedup from SIMD+MIMD
  [Figure: speedups over Serial (log scale, 1 to 128) versus number of threads (1, 2, 4, 8, 16, 32, 60) for EnumSpec (2-way) and EnumConv]
Regular Expression Matching
• Single-thread performance, speedup from SIMD
  [Figure: execution time (sec, 0 to 0.05) of Serial, 1-way PureSpec, 3-way PureSpec, 15-way PureSpec, 1-way EnumSpec, and 3-way EnumSpec on inputs 76.txt.utf-8 and 50247.txt.utf-8; percentage labels on the bars: 7.59%, 7.36%, 26.49%, 52.79%, 72.95%, 100%, 52.91%, 100%, 26.97%, 80.72%]
  Results for regular expression (.*l.*i.*k.*e)|(.*a.*p.*p.*l.*e)
Conclusions
• Enumerative speculation
  • Trades off redundant work against speculation success rate in a SIMD vector
• Efficient SIMD implementation of enumerative speculation
  • Conducts one or more speculation tasks on a SIMD vector
• Performance
  • Achieves a consistently higher speculation success rate than pure speculation
  • Better SIMD utilization than pure enumeration
Thanks for your attention!
Q?
• Peng Jiang
• Gagan Agrawal