Randomized Accuracy-Aware
Program Transformations for
Efficient Approximate Computations
Sasa Misailovic
Joint work with Zeyuan Allen Zhu, Jonathan Kelner, and Martin Rinard
MIT CSAIL
• Nodes represent computation
• Edges represent flow of data
• Functions – process individual data
• Reduction nodes – aggregate data
[Figure: computation graph with function nodes (f1, f2, f3, …), avg reduction nodes, and a min reduction node]
Function substitution
• Multiple implementations
• Each has expected error/time (𝐸, 𝑇)
Sampling inputs of reduction nodes
• Reductions consume fewer inputs
Tradeoff Space
[Plot: tradeoff space of transformed programs; axes: Time, Error]
Optimal Tradeoff Curve
[Plot: optimal tradeoff curve; axes: Time, Error]
Using the tradeoff curve:
• Minimize time subject to error bound
• Minimize error subject to time bound
Our Result
[Diagram labels: Original program, Transformations, Error bound, Analysis, Optimization, Optimized program]
• Randomized computation
• Guaranteed expected error/time tradeoff
• (1 + ε)-approximation of optimal tradeoff
Outline
Model of Computation
Tradeoff Curve Construction
Optimized Program Selection
Related Work
Model of Computation
[Figure: computation tree with function nodes (f, g, t, u, v, w), avg reduction nodes, and a min reduction node, with labels m, n, and 1]
Structure of Computation
• Computation nodes
  DAGs of functions
  Functions: arbitrary code
  Process individual inputs
• Reduction nodes
  Aggregation functions
  Average, min, max, sum…
• Computation tree
  Computation nodes and reduction nodes
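The structure above can be sketched in Python. This is an illustration only: the class names `ComputationNode` and `ReductionNode` are hypothetical, and each computation node is simplified to a single function rather than a DAG of functions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence, Union


@dataclass
class ComputationNode:
    """Processes one individual input; the function may be arbitrary code."""
    func: Callable[[float], float]

    def evaluate(self, x):
        return self.func(x)


@dataclass
class ReductionNode:
    """Aggregates the results of a child subcomputation over many inputs."""
    aggregate: Callable[[Sequence[float]], float]  # e.g. mean, min, max, sum
    child: Union[ComputationNode, "ReductionNode"]

    def evaluate(self, inputs):
        # Evaluate the child once per input, then aggregate the results.
        return self.aggregate([self.child.evaluate(x) for x in inputs])


# A min over groups, where each group is averaged after applying f(x) = x * x.
tree = ReductionNode(min, ReductionNode(mean, ComputationNode(lambda x: x * x)))
result = tree.evaluate([[1, 2, 3], [0, 2]])
```

Here the root consumes a list of groups and each inner `avg` node consumes one group, mirroring the min-over-avg shape used in the slides.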
Accuracy-Aware Transformations
Function substitution
• Multiple versions f_i
• Execute f_i with probability p_{f_i}
• Each has error/time spec (E_i, T_i)
Reduction sampling
• Consume s < m inputs
  Probability of selecting each input: s/m
• Derived error/time specifications
  Average: E ≤ ψ(1/√s)
  Min/max: E ≤ ψ(c^s), c < 1
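A minimal Python sketch of the two transformations, under stated assumptions: the helper names (`substitute`, `expected_spec`, `sampled_average`) are hypothetical, and the expected error and time of a randomized substitution are taken to be the probability-weighted sums of the per-version specifications.

```python
import random
from statistics import mean


def substitute(versions, probs, rng):
    """Function substitution: pick version f_i with probability p_i."""
    return rng.choices(versions, weights=probs, k=1)[0]


def expected_spec(specs, probs):
    """Expected (error, time) of the randomized choice among versions.

    specs is a list of (E_i, T_i); the result is (sum p_i*E_i, sum p_i*T_i).
    """
    error = sum(p * e for p, (e, _) in zip(probs, specs))
    time = sum(p * t for p, (_, t) in zip(probs, specs))
    return error, time


def sampled_average(inputs, s, rng):
    """Reduction sampling: average s of the m inputs, chosen uniformly
    without replacement, so each input is selected with probability s/m."""
    return mean(rng.sample(inputs, s))


rng = random.Random(0)
specs = [(0.0, 10.0), (0.1, 4.0), (0.5, 1.0)]  # illustrative (E_i, T_i) values
probs = [0.3, 0.6, 0.1]
error, time = expected_spec(specs, probs)
approx = sampled_average(list(range(100)), 10, rng)
```

The specification values are made up for illustration; in the talk they are given per function version.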
Program Configuration Vector
Defines transformed program
• Functions: probability of executing each version
• Reductions: number of elements to sample
Example:
f1: 0.3   f2: 0.6   f3: 0.1   s1: m/2   …
Configuration Vector
• Specifies program version
• Functions: probability of executing each version
• Reductions: number of elements to sample
Find optimal program = Find configuration vector that achieves optimal accuracy vs. performance tradeoff
Tradeoff Curve Construction: Algorithm
Divide and conquer
• For each subcomputation construct tradeoff curve
• Dynamic programming
Properties
• Polynomial time
• (1 + ε)-approximation of true tradeoff curve
[Animation: the computation tree is collapsed bottom-up, one subcomputation at a time, as tradeoff curves are constructed]
Computation Node Optimization
Linear program
Versions of f with specifications (E0, T0), (E1, T1), (E2, T2)
Variables: x_0, x_1, x_2
• Probability to execute each version of f
• Range: x_i ∈ [0, 1]
• Sum: x_0 + x_1 + x_2 = 1
Minimize time subject to error bound δ
Objective:
min x_0 T_{f_0} + x_1 T_{f_1} + x_2 T_{f_2}
Constraint:
β_f (x_0 E_{f_0} + x_1 E_{f_1} + x_2 E_{f_2}) ≤ δ
Minimize error subject to time bound θ
Objective:
min β_f (x_0 E_{f_0} + x_1 E_{f_1} + x_2 E_{f_2})
Constraint:
x_0 T_{f_0} + x_1 T_{f_1} + x_2 T_{f_2} ≤ θ
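The time-minimizing linear program can be sketched without an LP solver. The assumption here is that there is a single error constraint on top of the probability simplex, so an optimal solution can be found among the feasible single versions and the two-version mixes that meet the error bound exactly. The function name `min_time_mix` and the example numbers are hypothetical.

```python
from itertools import combinations


def min_time_mix(specs, delta, beta=1.0):
    """Minimize sum x_i*T_i subject to beta * sum x_i*E_i <= delta,
    sum x_i = 1, and x_i >= 0. specs is a list of (E_i, T_i).

    Returns (time, probabilities), or None if no mix meets the bound.
    """
    n = len(specs)
    best = None

    # Candidate vertices that run a single version all the time.
    for i, (e, t) in enumerate(specs):
        if beta * e <= delta and (best is None or t < best[0]):
            x = [0.0] * n
            x[i] = 1.0
            best = (t, x)

    # Candidate vertices that mix two versions so the error equals delta.
    for i, j in combinations(range(n), 2):
        if specs[i][0] > specs[j][0]:
            i, j = j, i  # make version i the more accurate one
        (ei, ti), (ej, tj) = specs[i], specs[j]
        if beta * ei < delta < beta * ej:
            kappa = (delta - beta * ei) / (beta * (ej - ei))
            t = (1 - kappa) * ti + kappa * tj
            if best is None or t < best[0]:
                x = [0.0] * n
                x[i], x[j] = 1 - kappa, kappa
                best = (t, x)
    return best


specs = [(0.0, 10.0), (0.1, 4.0), (0.5, 1.0)]  # illustrative (E_i, T_i) values
solution = min_time_mix(specs, delta=0.05)
```

With these illustrative numbers the best mix runs the exact version and the second version half of the time each. The symmetric problem (minimize error subject to the time bound θ) can be handled the same way with the roles of E and T swapped.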
The Algorithm: Reduction Nodes
Given error bound δ, find number of elements to sample s.
[Figure: avg reduction node over m inputs; its subcomputation has specification (E_sub, T_sub)]
min T_sub × s
s.t.
s ∈ {1, 2, …, m}
E_sub + E_local(s) ≤ δ
From approximate tradeoff curve:
T_sub^(i) = a_i ⋅ E_sub^(i) + b_i   for i ∈ {1, 2, …, n_seg}
Substituting into the objective:
min (a_i ⋅ (δ − E_local(s)) + b_i) × s
s.t.
s ∈ {s_l^(i), …, s_u^(i)}
i ∈ {1, …, n_seg}
• Univariate optimization problem
• Analogously, minimize error subject to θ
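A brute-force Python sketch of the reduction-node problem, scanning every s rather than solving the per-segment univariate problem from the slide. The names `t_sub` and `best_sample_size`, the segment tuple layout, and the example local error E_local(s) = 1/√s are assumptions made for illustration.

```python
import math


def t_sub(e_sub, segments):
    """Piecewise-linear tradeoff curve of the subcomputation.

    segments is a list of (e_lo, e_hi, a, b) with T = a * E + b on [e_lo, e_hi].
    """
    for e_lo, e_hi, a, b in segments:
        if e_lo <= e_sub <= e_hi:
            return a * e_sub + b
    return None


def best_sample_size(m, delta, segments, e_local):
    """Find s in {1, ..., m} minimizing T_sub * s such that
    E_sub + E_local(s) <= delta. Returns (cost, s) or None."""
    e_max = max(e_hi for _, e_hi, _, _ in segments)
    best = None
    for s in range(1, m + 1):
        budget = delta - e_local(s)  # error left for the subcomputation
        if budget < 0:
            continue  # sampling only s inputs already exceeds the bound
        # Error budget beyond the end of the curve cannot buy more speed.
        t = t_sub(min(budget, e_max), segments)
        if t is None:
            continue
        cost = t * s
        if best is None or cost < best[0]:
            best = (cost, s)
    return best


segments = [(0.0, 0.5, -10.0, 6.0)]  # T falls from 6 to 1 as E grows to 0.5
best = best_sample_size(16, 0.5, segments, lambda s: 1 / math.sqrt(s))
```

The scan costs O(m · n_seg); the per-segment formulation on the slide avoids visiting every s.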
Approximate Tradeoff Curve
Bidimensional Discretization
Take elements at regular intervals
• δ_{i+1} = δ_i ⋅ (1 + ε)
• θ_{i+1} = θ_i ⋅ (1 + ε)
[Plot: discretization grid over the tradeoff space; axes: Time, Error]
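The discretization rule can be illustrated with a short Python sketch. The function names `geometric_grid` and `round_up` are hypothetical, and the lower end of the range is assumed to be strictly positive.

```python
def geometric_grid(lo, hi, eps):
    """Grid points lo, lo*(1+eps), lo*(1+eps)^2, ... until hi is covered.

    Assumes lo > 0 and eps > 0.
    """
    points = [lo]
    while points[-1] < hi:
        points.append(points[-1] * (1 + eps))
    return points


def round_up(value, grid):
    """Smallest grid point >= value; for values inside the grid range this
    overestimates by at most a factor of (1 + eps)."""
    for g in grid:
        if g >= value:
            return g
    raise ValueError("value lies above the grid range")


error_grid = geometric_grid(1.0, 8.0, 1.0)
snapped = round_up(3.0, error_grid)
```

Because consecutive points differ by a factor of (1 + ε), the number of points grows with log(hi/lo) / log(1 + ε) rather than with hi − lo. The same construction applies to the time axis with θ in place of δ.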
Approximating Tradeoff Curve
Randomized configuration:
• Execute A with probability 1 − κ
• Execute B with probability κ
[Plot: points A and B, with point C on the segment between them split in proportion κ and 1 − κ; axes: Time, Error]
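A small Python sketch of the randomized configuration, assuming the expected error and time of the mix are the κ-weighted combinations of the two endpoints. `mix_point` and `run_randomized` are hypothetical names, and the endpoint values are made up.

```python
import random


def mix_point(a, b, kappa):
    """Expected (error, time) when running A with probability 1 - kappa
    and B with probability kappa. a and b are (error, time) pairs."""
    (ea, ta), (eb, tb) = a, b
    return ((1 - kappa) * ea + kappa * eb, (1 - kappa) * ta + kappa * tb)


def run_randomized(run_a, run_b, kappa, rng):
    """Execute configuration B with probability kappa, otherwise A."""
    return run_b() if rng.random() < kappa else run_a()


c = mix_point((0.0, 10.0), (0.4, 2.0), 0.25)
outcome = run_randomized(lambda: "A", lambda: "B", 0.25, random.Random(0))
```

Varying κ from 0 to 1 moves the expected (error, time) point along the straight segment from A to B, which is why points between two stored configurations are reachable.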
Approximating Tradeoff Curve
[Plots: a curve point (E, T) approximated to within εE along the Error axis and to within εT along the Time axis; axes: Time, Error]
Properties of the Algorithm
Performance
• Number of tradeoff curve points: n_p = O((n log n) / ε)
• Most expensive operation: bidimensional discretization
  Calling LP solver O(n_p ⋅ n) times
  Each call can have O(n_p ⋅ n) variables
Precision
• Precision decreases linearly with the number of nodes n
• To obtain (1 + ε)-approximation set intermediate ε′ = ε / Ω(n)
Space
• Storing tradeoff curves: O(n_p^2 ⋅ n)
Obtaining Optimized Programs
Tradeoff curves for all subcomputations:
• Each curve contains partial configuration
Probability of executing local function nodes
Number of inputs to sample from reduction node
Error tolerated by subcomputation
• Distribution over optimal program configurations
Incrementally construct configuration vector:
• For every execution
• Traverse the tree, starting from root
• Time to get full vector: 𝑂(𝑛)
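The traversal can be sketched in Python under strong assumptions about how the curves are stored. Everything here is hypothetical: the `Node` class, the `options` table keyed by tolerated error, and the layout of each entry as (probability, local setting, error bounds passed to the children).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # options[error_bound] -> list of (probability, local_setting, child_bounds)
    options: dict
    children: list = field(default_factory=list)


def build_configuration(node, bound, rng, config=None):
    """Visit each node once, starting from the root, sampling one stored
    partial configuration per node and passing the tolerated error down."""
    if config is None:
        config = {}
    entries = node.options[bound]
    weights = [prob for prob, _, _ in entries]
    _, setting, child_bounds = rng.choices(entries, weights=weights, k=1)[0]
    config[node.name] = setting
    for child, child_bound in zip(node.children, child_bounds):
        build_configuration(child, child_bound, rng, config)
    return config


f = Node("f", {0.1: [(1.0, {"f1": 0.3, "f2": 0.7}, [])]})
avg = Node("avg", {0.2: [(1.0, {"s": 50}, [0.1])]}, [f])
config = build_configuration(avg, 0.2, random.Random(0))
```

Each node is visited once, which matches the O(n) bound stated above for a tree with n nodes.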
Related Work
Accuracy-aware transformations
• Empirical justification: training/test input set
[Rinard ICS ’06, Rinard OOPSLA ’07, Ansel et al. PLDI ’09, Misailovic et al. ICSE ’10, Baek & Chilimbi PLDI ’10,
Hoffmann et al. ASPLOS ’11, Sidiroglou et al. FSE ’11]
• Probabilistic accuracy analysis for loop perforation
[Misailovic et al. SAS ’11, Chaudhuri et al. FSE ’11]
Ensuring safety of transformed programs
• Separating critical and approximate parts of program
[Carbin & Rinard ISSTA ’10, Sampson et al. PLDI ’11]
• Verifying relaxed semantics of programs [Carbin et al. CSAIL-TR ’11]
Analytic properties of programs
[Majumdar & Saha RTSS ’09, Chaudhuri et al. POPL ’10, Ivancic et al. MEMOCODE ’10, Reed & Pierce ICFP ’10,
Chaudhuri & Solar-Lezama PLDI ’10, Chaudhuri et al. FSE ’11]
Summary
Model of Computation
• Accuracy-aware program transformations
• Effects on overall accuracy and execution time
Explore and Exploit Optimal Tradeoffs
• Approximate optimal tradeoff curve construction
• Polynomial, dynamic programming algorithm
• Randomized program configurations to achieve tradeoffs
Envisioned Applications
• Image and video processing, numerical algorithms,
queries on big data sets, machine learning, …
• Optimization, fault tolerance, dynamic adaptation