SPolly: Speculative Optimizations in the
Polyhedral Model
Johannes Doerfert, Clemens Hammacher,
Kevin Streit, Sebastian Hack
Saarland University, Germany
January 21, 2013
The Problem
int A[256][256], B[256][256], C[256][256];
void matmul() {
for (int i=0; i<256; i++)
for (int j=0; j<256; j++)
for (int k=0; k<256; k++)
C[i][j] += A[k][i] * B[j][k];
}
2/16
The Problem
int A[65536], B[65536], C[65536];
void matmul() {
for (int i=0; i<256; i++)
for (int j=0; j<256; j++)
for (int k=0; k<256; k++)
C[i*256+j] += A[k*256+i] * B[j*256+k];
}
2/16
The Problem
void matmul(int* A, int* B, int* C) {
for (int i=0; i<256; i++)
for (int j=0; j<256; j++)
for (int k=0; k<256; k++)
C[i*256+j] += A[k*256+i] * B[j*256+k];
}
2/16
The Problem
void matmul(int* A, int* B, int* C, int N) {
for (int i=0; i<N; i++)
for (int j=0; j<N; j++)
for (int k=0; k<N; k++)
C[i*N+j] += A[k*N+i] * B[j*N+k];
}
2/16
How urgent is this problem?
3/16
How urgent is this problem?
14.8%
85.2%
Valid Regions
Invalid Regions
3/16
How urgent is this problem?
Setup
I
Polly
I
I
state-of-the-art polyhedral optimizer integrated in LLVM
SPEC 2000
I
I
industry standard benchmark suite
nine real world programs:
ammp, art, bzip2, crafty, equake, gzip, mcf, mesa, twolf
4/16
How urgent is this problem?
Setup
I
Polly
I
I
SPEC 2000
I
I
I
state-of-the-art polyhedral optimizer integrated in LLVM
industry standard benchmark suite
nine real world programs:
ammp, art, bzip2, crafty, equake, gzip, mcf, mesa, twolf
Research questions
I
I
number of Static Control Parts
(SCoPs := code regions amenable to polyhedral optimizations)
impact of individual rejection causes
4/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
B
C
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
B
C
for (i = 0; i < N; i++)
A[i*N ] += B[i];
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
B
C
void f(int* A , int* B ){
A[0] = B[5];
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
B
C
for (i = 0; i < N*M; i++)
A[i] += B[i];
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
B
C
for (i = 0; i < N; i++)
A[i] += g(i);
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
B
C
for (i=0; i<N; i+=i/2+1 )
A[i] += A[i+1];
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
B
C
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
1230
1093
840
532
384
253
199
1
B
C
(66%)
(59%)
(45%)
(29%)
(21%)
(14%)
(11%)
( 0%)
A #regions where condition i is violated.
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
1230
1093
840
532
384
253
199
1
B
C
(66%)
(59%)
(45%)
(29%)
(21%)
(14%)
(11%)
( 0%)
A #regions where condition i is violated.
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
1230
1093
840
532
384
253
199
1
(66%)
(59%)
(45%)
(29%)
(21%)
(14%)
(11%)
( 0%)
B
84
207
6
72
0
31
0
0
C
( 5%)
(11%)
( 0%)
( 4%)
( 0%)
( 2%)
( 0%)
( 0%)
A #regions where condition i is violated.
B #regions where only condition i is violated.
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
1230
1093
840
532
384
253
199
1
(66%)
(59%)
(45%)
(29%)
(21%)
(14%)
(11%)
( 0%)
B
84
207
6
72
0
31
0
0
C
( 5%)
(11%)
( 0%)
( 4%)
( 0%)
( 2%)
( 0%)
( 0%)
A #regions where condition i is violated.
B #regions where only condition i is violated.
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
1230
1093
840
532
384
253
199
1
(66%)
(59%)
(45%)
(29%)
(21%)
(14%)
(11%)
( 0%)
B
84
207
6
72
0
31
0
0
( 5%)
(11%)
( 0%)
( 4%)
( 0%)
( 2%)
( 0%)
( 0%)
C
84
510
660
928
1174
1387
1586
1587
( 5%)
(27%)
(35%)
(50%)
(63%)
(74%)
(85%)
(85%)
A #regions where condition i is violated.
B #regions where only condition i is violated.
C #regions where only conditions 0 to i are violated.
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
1230
1093
840
532
384
253
199
1
(66%)
(59%)
(45%)
(29%)
(21%)
(14%)
(11%)
( 0%)
B
84
207
6
72
0
31
0
0
( 5%)
(11%)
( 0%)
( 4%)
( 0%)
( 2%)
( 0%)
( 0%)
C
84
510
660
928
1174
1387
1586
1587
( 5%)
(27%)
(35%)
(50%)
(63%)
(74%)
(85%)
(85%)
A #regions where condition i is violated.
B #regions where only condition i is violated.
C #regions where only conditions 0 to i are violated.
5/16
How urgent is this problem?
SCoP rejection causes found in 1862 regions
i
0
1
2
3
4
5
6
7
Rejection cause
Non-affine expressions
Aliasing
Non-affine loop bounds
Function call
Non-canonical indvars
Complex CFG
Unsigned comparison
Others
A
1230
1093
840
532
384
253
199
1
(66%)
(59%)
(45%)
(29%)
(21%)
(14%)
(11%)
( 0%)
B
84
207
6
72
0
31
0
0
( 5%)
(11%)
( 0%)
( 4%)
( 0%)
( 2%)
( 0%)
( 0%)
C
84
510
660
928
1174
1387
1586
1587
( 5%)
(27%)
(35%)
(50%)
(63%)
(74%)
(85%)
(85%)
A #regions where condition i is violated.
B #regions where only condition i is violated.
C #regions where only conditions 0 to i are violated.
5/16
How urgent is this problem?
Conclusion
14.8%
49.8%
35.4%
Valid Regions
Targeted Regions
Invalid Regions
6/16
How to allow more polyhedral optimizations?
Example
void f(int* A, int* B) {
for (int i=0; i < 2048; i++)
A[i] += B[i];
}
7/16
How to allow more polyhedral optimizations?
Example
1. speculatively assume properties (e.g., constant parameters)
void f(int* A, int* B) {
for (int i=0; i < 2048; i++)
A[i] += B[i];
}
7/16
How to allow more polyhedral optimizations?
Example
1. speculatively assume properties (e.g., constant parameters)
2. derive specialized versions
void f_spec(int* restrict A, int* restrict B) {
for (int i=0; i < 2048; i++)
A[i] += B[i];
}
7/16
How to allow more polyhedral optimizations?
Example
1. speculatively assume properties (e.g., constant parameters)
2. derive specialized versions
3. apply polyhedral optimizations
void f_opt(int* restrict A, int* restrict B) {
parfor (int j=0; j < 2048; j+=32)
for (int i=j; i < 32 + j; i++)
A[i] += B[i];
}
7/16
How to allow more polyhedral optimizations?
Example
1. speculatively assume properties (e.g., constant parameters)
2. derive specialized versions
3. apply polyhedral optimizations
4. add runtime dispatcher
void f_dispatcher(int* A, int* B) {
if (overlap(A, B, 2048))
f(A, B);
else
f_opt(A, B);
}
7/16
How to allow more polyhedral optimizations?
Implementation
8/16
How to allow more polyhedral optimizations?
Implementation
LLVM-IR
Polly
SPolly
SCoP Detection
Valid SCoPs
Polyhedral Optimizations
Code Generation
Program
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid SCoPs
Polyhedral Optimizations
Code Generation
Program
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
Valid SCoPs
Polyhedral Optimizations
sSCoP Detection
Valid sSCoPs
Region Speculation
Code Generation
Program
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Program
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Program
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Profiling Versions
Program
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Profiling Versions
Program
JIT-Environment
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Profiling Versions
Profiling Information
Program
JIT-Environment
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Profiling Versions
Profiling Information
Program
JIT-Environment
8/16
How to allow more polyhedral optimizations?
Implementation
Polly
LLVM-IR
SPolly
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Profiling Versions
Profiling Information
Program
JIT-Environment
8/16
Does it work?
9/16
Does it work?
Almost.
9/16
Does it work?
Applicability on SPEC 2000
14.8%
19.3%
65.9%
Valid SCoPs
Additional sSCoPs
Invalid sSCoPs
10/16
Does it work?
Applicability on SPEC 2000
300
250
Number of valid regions
283
Polly
SPolly
200
150
106
100
50
50
22
41
28 30 37
7
0 ammp art
123
58
24 29 31
10
bzip2 crafty equake gzip
29
1 1
mcf
mesa twolf
11/16
Does it work?
Runtime Results on SPEC 2000
Speedup relatively to Polly
2
Polly
SPolly
1
0 ammp
art
bzip2 crafty equake gzip
mcf
mesa twolf
12/16
Does it work?
Runtime Results on SPEC 2000
Polly
SPolly
art
bzip2 crafty equake gzip
mcf
Polly crashes
0 ammp
SPolly crashes
1
SPolly crashes
Speedup relatively to Polly
2
mesa twolf
12/16
Speedup relatively to Polly
1
0 ammp
art
bzip2 crafty equake gzip
SPolly crashes
SPolly crashes
Polly crashes with additional information
Polly crashes with additional information
Polly crashes with additional information
2
mcf
mesa twolf
Polly crashes
Does it work?
Runtime Results on SPEC 2000
Polly
SPolly
12/16
Does it work?
Case Study – Setup
Algorithm 2D derivation computation
(basic image processing block)
Inputs are given in 2 different resolutions
Evaluated speedup of SPolly normalized against Polly
13/16
Does it work?
Case Study – Results
3.5
Speedup relatively to clang
3.0
Polly
SPolly
2.5
2.0
1.5
1.0
0.5
0.0 512x4096 512x4096 4096x512 4096x512 512x4096 4096x512
Input image size
14/16
Does it work?
Case Study – Results
3.5
Speedup relatively to clang
3.0
Polly
SPolly
2.5
2.0
1.5
1.0
0.5
0.0 512x4096 512x4096 4096x512 4096x512 512x4096 4096x512
Input image size
14/16
Does it work?
Case Study – Results
3.5
Speedup relatively to clang
3.0
Polly
SPolly
2.5
2.0
1.5
1.0
0.5
0.0 512x4096 512x4096 4096x512 4096x512 512x4096 4096x512
Input image size
14/16
Does it work?
Case Study – Results
3.5
Speedup relatively to clang
3.0
Polly
SPolly
2.5
2.0
1.5
1.0
0.5
0.0 512x4096 512x4096 4096x512 4096x512 512x4096 4096x512
Input image size
14/16
Does it work?
Case Study – Results
3.5
Speedup relatively to clang
3.0
Polly
SPolly
2.5
2.0
1.5
1.0
0.5
0.0 512x4096 512x4096 4096x512 4096x512 512x4096 4096x512
Input image size
14/16
Does it work?
Case Study – Results
3.5
Speedup relatively to clang
3.0
Polly
SPolly
2.5
2.0
1.5
1.0
0.5
0.0 512x4096 512x4096 4096x512 4096x512 512x4096 4096x512
Input image size
14/16
Does it work?
Case Study – Results
3.5
Speedup relatively to clang
3.0
Polly
SPolly
2.5
2.0
1.5
1.0
0.5
0.0 512x4096 512x4096 4096x512 4096x512 512x4096 4096x512
Input image size
14/16
Speedup relatively to clang
bicg
12
11
10
9
8
7
6
5
4
3
2
1
0
Geomean: [1st run] 1.134
durbin
dynprog
gramschmidt
correlation
reg_detect
gemver
mvt
2mm
trisolv
fdtd-2d
atax
jacobi-2d-imper
covariance
ludcmp
floyd-warshall
gesummv
adi
gemm
seidel-1d
doitgen
lu
3mm
fdtd-apml
cholesky
syr2k
symm
trmm
jacobi-1d-imper
syrk
Does it work?
Runtime Results on Polybench
SPolly 1st run
SPolly 2nd run
[2nd run] 1.481
15/16
Summary
16/16
Summary
14.8%
85.2%
16/16
Summary
Polly
LLVM-IR
SPolly
14.8%
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Profiling Versions
85.2%
Profiling Information
Program
JIT-Environment
16/16
Summary
Polly
LLVM-IR
SPolly
14.8%
Invalid SCoPs
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Profiling Versions
85.2%
Profiling Information
Program
JIT-Environment
14.8%
19.3%
65.9%
16/16
Summary
Polly
LLVM-IR
SPolly
Invalid SCoPs
14.8%
SCoP Detection
sSCoP Detection
Valid sSCoPs
Valid SCoPs
Specialized
Versions
Polyhedral Optimizations
Region Speculation
Code Generation
Runtime Dispatcher
Profiling Versions
85.2%
Profiling Information
Program
JIT-Environment
durbin
dynprog
gramschmidt
mvt
gemver
reg_detect
correlation
atax
2mm
trisolv
fdtd-2d
ludcmp
jacobi-2d-imper
gesummv
covariance
floyd-warshall
lu
adi
3mm
gemm
doitgen
seidel-1d
fdtd-apml
trmm
syr2k
symm
cholesky
bicg
Speedup relatively to clang
65.9%
SPolly 1st run
SPolly 2nd run
syrk
19.3%
12
11
10
9
8
7
6
5
4
3
2
1
0
jacobi-1d-imper
14.8%
16/16
Does it work?
Case Study – Setup continued
I
Convolution kernel of size 3x3
I
Applied to all channels of an RGBA image (e.g., png)
I
Measured on a Intel(R) Core(TM) i5 CPU M 560
Image source:
https://sonnati.wordpress.com/2010/10/06/flash-h-264-h-264-squared-%E2%80%93-part-iii/
Concept
Alias tests
Concept
Alias tests
f o r ( i = 0 ; i < N ; i ++) {
f o r ( j = 0 ; j < N ; j ++) {
// I 1
C[ i ] [ j ] = 0;
f o r ( k = 0 ; k < N ; k++) {
//
I2
I3
I4
C [ i ] [ j ] += A [ i ] [ k ] ∗ B [ k ] [ j ] ;
}
}
}
Concept
Alias tests
f o r ( i = 0 ; i < N ; i ++) {
f o r ( j = 0 ; j < N ; j ++) {
Acc
// I 1
I1 and I2
C[ i ] [ j ] = 0;
I3
f o r ( k = 0 ; k < N ; k++) {
I4
//
I2
I3
I4
C [ i ] [ j ] += A [ i ] [ k ] ∗ B [ k ] [ j ] ;
}
}
}
bp
C
A
B
ma
0
0
0
Ma
N∗N−1
N∗N−1
N∗N−1
Concept
Alias tests
f o r ( i = 0 ; i < N ; i ++) {
f o r ( j = 0 ; j < N ; j ++) {
Acc
// I 1
I1 and I2
C[ i ] [ j ] = 0;
I3
f o r ( k = 0 ; k < N ; k++) {
I4
//
I2
I3
I4
C [ i ] [ j ] += A [ i ] [ k ] ∗ B [ k ] [ j ] ;
}
}
}
bool
bool
bool
bool
ab
ac
bc
no
= B [ N∗N−1] < A [ 0 ] | |
= C [ N∗N−1] < A [ 0 ] | |
= B [ N∗N−1] < C [ 0 ] | |
a l i a s f o u n d = ab &&
B[0]
C[0]
B[0]
ac &&
bp
C
A
B
ma
0
0
0
Ma
N∗N−1
N∗N−1
N∗N−1
> A [ N∗N− 1 ] ;
> A [ N∗N− 1 ] ;
> C [ N∗N− 1 ] ;
bc ;
© Copyright 2026 Paperzz