Constrained Sampling and Counting

Moshe Y. Vardi
Rice University
Joint work with: Supratik Chakraborty (IIT Bombay), Daniel Fremont (UC Berkeley),
Kuldeep Meel (Rice), and Sanjit Seshia (UC Berkeley)
How do we guarantee that systems work correctly?
Functional Verification
• Formal verification
  • Challenges: formal requirements, scalability
  • ~10-15% of verification effort
• Dynamic verification: dominant approach
Dynamic Verification
• Design is simulated with test vectors
• Test vectors represent different verification scenarios
• Results from simulation are compared to intended results
• Challenge: Exceedingly large test spaces!
Motivating Example
[Figure: circuit with 64-bit inputs a and b and 64-bit output c = f(a,b)]
How do we test that the circuit works?
• Try all values of a and b
• 2^128 possibilities
• Sun will go nova before done!
• Not scalable
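To make the 2^128 figure concrete, the short calculation below estimates how long exhaustive simulation would take. The throughput of one billion test vectors per second is an assumed number chosen only for illustration; it does not come from the slides.

```python
# Assumed throughput: one billion test vectors per second (hypothetical figure).
TESTS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def exhaustive_test_years(input_bits: int, tests_per_second: int = TESTS_PER_SECOND) -> int:
    """Whole years needed to simulate every input combination exactly once."""
    total_vectors = 2 ** input_bits
    return total_vectors // (tests_per_second * SECONDS_PER_YEAR)


# Two 64-bit inputs give 128 input bits in total.
years = exhaustive_test_years(128)
print(f"{years:.3e} years")
```

Even at this optimistic rate the result is on the order of 10^22 years, which is why exhaustive testing is not an option.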
Constrained-Random Simulation
[Figure: circuit with 64-bit inputs a and b and 64-bit output c = f(a,b)]
Sources for Constraints (operator subscripts denote bit-width)
• Designers:
  1. a +_64 11 *_32 b = 12
  2. a <_64 (b >> 4)
• Past Experience:
  1. 40 <_64 34 + a <_64 5050
  2. 120 <_64 b <_64 230
• Users:
  1. 232 *_32 a + b != 1100
  2. 1020 <_64 (b /_64 2) +_64 a <_64 2200
• Test vectors: solutions of constraints
• Proposed by Lichtenstein, Malka, Aharon (IAAI '94)
Problem: How can we uniformly sample the values of a and b
satisfying the above constraints?
Problem Formulation
[Figure: the set of constraints on the circuit c = f(a,b), with 64-bit a, b, and c, is encoded as a SAT formula]
SAT Sampling: sample satisfying assignments of the SAT formula uniformly at random
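As a baseline for what SAT sampling asks for, the sketch below enumerates every satisfying assignment of a tiny CNF formula and picks one uniformly. The formula `toy_cnf` and the function names are hypothetical, and full enumeration only works for a handful of variables; it is shown to fix the problem statement, not as a practical method.

```python
import itertools
import random


def satisfies(assignment, cnf):
    """assignment maps a 1-based variable index to a bool; cnf is a list of
    clauses, each a list of nonzero ints in DIMACS style (negative = negated)."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf)


def all_solutions(cnf, num_vars):
    """Enumerate all 2^num_vars assignments and keep the satisfying ones."""
    solutions = []
    for bits in itertools.product([False, True], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if satisfies(assignment, cnf):
            solutions.append(bits)
    return solutions


def sample_uniform(cnf, num_vars, rng):
    """Exactly uniform sample, at the cost of full enumeration."""
    return rng.choice(all_solutions(cnf, num_vars))


# Hypothetical toy formula: (x1 or x2) and (not x1 or x3)
toy_cnf = [[1, 2], [-1, 3]]
rng = random.Random(0)
sample = sample_uniform(toy_cnf, 3, rng)
print(sample)
```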
Roadmap
• SAT Sampling
• SAT Counting
• Extensions
• Future Directions
Diverse Applications
SAT Sampling is used in:
• Constrained-random simulation
• Search-based synthesis
• Automatic problem generation
• Probabilistic inference
• Planning under uncertainty
Prior Work
[Figure: prior approaches (BDD, BGP, MCMC, SAT-based) and UniGen compared along two axes: guarantees and performance]
Core Idea
Partitioning into cells
Cells should be roughly equal in size and small
enough to enumerate completely
Partitioning into cells
Pick a random cell
Pick a random solution from this cell
How to Partition?
How to partition into roughly equal
small cells of solutions without
knowing the distribution of
solutions?
Universal Hashing
[Carter-Wegman 1979]
Universal Hashing
• Hash functions: mapping {0,1}^n to {0,1}^m
  • (2^n elements to 2^m cells)
• Random inputs => All cells are roughly equal (in expectation)
• Universal family of hash functions:
  • Choose hash function randomly from family
  • For arbitrary distribution on inputs => All cells are roughly equal (in expectation)
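One standard universal family is the set of affine maps h(x) = Ax + b over GF(2). The sketch below enumerates that whole family for toy sizes (n = 3, m = 2, both chosen here for illustration) and checks that two fixed, distinct inputs collide under exactly a 2^-m fraction of the functions. This is the property that keeps cells balanced regardless of how the inputs are distributed.

```python
import itertools

N_BITS, M_BITS = 3, 2  # toy sizes: {0,1}^3 -> {0,1}^2


def apply_hash(matrix, offset, x):
    """h(x) = A x + b over GF(2); matrix has M_BITS rows of N_BITS bits each."""
    return tuple((sum(a & xi for a, xi in zip(row, x)) + b) % 2
                 for row, b in zip(matrix, offset))


def hash_family():
    """Enumerate every (A, b) pair, i.e. the entire family."""
    rows = list(itertools.product([0, 1], repeat=N_BITS))
    for matrix in itertools.product(rows, repeat=M_BITS):
        for offset in itertools.product([0, 1], repeat=M_BITS):
            yield matrix, offset


def collision_fraction(x, y):
    """Fraction of functions in the family that send x and y to the same cell."""
    family = list(hash_family())
    hits = sum(apply_hash(A, b, x) == apply_hash(A, b, y) for A, b in family)
    return hits / len(family)


print(collision_fraction((0, 0, 1), (1, 1, 0)))  # 0.25, i.e. 2^-M_BITS
```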
Strong Universality
• H(n,m,r): Family of r-universal hash functions mapping {0,1}^n to {0,1}^m (2^n elements to 2^m cells)
  • r: degree of independence of hashed inputs
  • Higher r => Stronger guarantee on range of size of cells
• r-wise universality => Polynomials of degree r-1
• Stronger universality => Higher complexity
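The link between r-wise universality and polynomials of degree r-1 can be checked by hand on a small field. The sketch below uses the prime field GF(5) with r = 3. That is a simplification made here for illustration, since hashing bit-vectors would use a field of size 2^n. It enumerates all degree-2 polynomials and confirms that the hash values of any three distinct points are jointly uniform.

```python
import itertools
from collections import Counter

P = 5  # small prime field, assumed for illustration only
R = 3  # desired degree of independence


def poly_hash(coeffs, x):
    """Evaluate c0 + c1*x + ... + c_{R-1}*x^(R-1) mod P using Horner's rule."""
    value = 0
    for c in reversed(coeffs):
        value = (value * x + c) % P
    return value


def joint_distribution(points):
    """Count how often each tuple of hash values occurs over the whole family."""
    counts = Counter()
    for coeffs in itertools.product(range(P), repeat=R):
        counts[tuple(poly_hash(coeffs, x) for x in points)] += 1
    return counts


dist = joint_distribution((0, 1, 2))
# Every one of the P^R possible value triples appears exactly once.
print(len(dist), set(dist.values()))
```

With four points instead of three the joint distribution is no longer uniform, which is the sense in which degree r-1 gives exactly r-wise independence.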
Hashing-based Approaches
BGP Algorithm (Bellare et al., 2000)
• n-universal hashing over the solution space
• All cells required to be small
• Uniform generation
Scaling to ~0.8M Variables
From tens of variables to ~0.8M variables!
BGP Algorithm
• n-universal hashing over the solution space
• All cells are small
• Uniform generation
UniGen
• 3-universal hashing over the solution space
• Only a randomly chosen cell needs to be “small”
• Almost-uniform generation
Underlying Hash Functions
• A cell can be represented as the conjunction of:
  • Input formula F
  • m random XOR constraints
• 2^m is the number of cells desired
• Use CryptoMiniSAT for CNF + XOR formulas
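The cell representation can be illustrated on a toy formula: conjoin F with m random XOR constraints, enumerate the solutions that survive, and pick one at random. This is a simplified sketch, not UniGen itself. It uses brute-force enumeration in place of a CNF+XOR solver, omits the cell-size thresholds, and simply retries when a cell is empty; all function names are hypothetical.

```python
import itertools
import random


def satisfies(bits, cnf):
    """bits is a tuple of bools; cnf uses DIMACS-style signed 1-based literals."""
    return all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf)


def random_xors(num_vars, m, rng):
    """m random XOR constraints: each includes every variable with
    probability 1/2 and has a random target parity."""
    return [([rng.random() < 0.5 for _ in range(num_vars)], rng.random() < 0.5)
            for _ in range(m)]


def in_cell(bits, xors):
    """True when the assignment meets every XOR constraint."""
    return all((sum(b for b, used in zip(bits, mask) if used) % 2 == 1) == parity
               for mask, parity in xors)


def sample_from_random_cell(cnf, num_vars, m, rng, max_tries=100):
    """Pick a random cell (F and m random XORs), then a random solution in it."""
    for _ in range(max_tries):
        xors = random_xors(num_vars, m, rng)
        cell = [bits for bits in itertools.product([False, True], repeat=num_vars)
                if satisfies(bits, cnf) and in_cell(bits, xors)]
        if cell:
            return rng.choice(cell), xors
    return None, None


toy_cnf = [[1, 2], [-1, 3]]  # hypothetical formula: (x1 or x2) and (not x1 or x3)
solution, xors = sample_from_random_cell(toy_cnf, 3, 1, random.Random(1))
print(solution)
```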
Strong Theoretical Guarantees
• Uniformity
• Almost-uniformity
• UniGen succeeds with probability > 0.52
2-3 Orders of Magnitude Faster
[Figure: running time in seconds (log scale, 0.18 to 18000) of UniGen and XORSample' on benchmarks case47, case_3_b14_3, case105, case8, case203, case145, case61, case9, case15, case140, case_2_b14_1, case_3_b14_1, squaring14, squaring7, case_2_ptb_1, case_1_ptb_1, case_2_b14_2, case_3_b14_2. Timeout: 18000 seconds.]
Results: Uniformity
[Figure: histogram comparing US and UniGen; axes labeled Frequency (0 to 500) and #Solutions (184 to 288)]
• Benchmark: case110.cnf; #var: 287; #clauses: 1263
• Total Runs: 4x10^6; Total Solutions: 16384
What is SAT Counting?
• Given a SAT formula F
• R_F: Set of all solutions of F
• Problem (#SAT): Estimate the number of solutions of F (#F), i.e., what is the cardinality of R_F?
• E.g., F = (a ∨ b)
• R_F = {(0,1), (1,0), (1,1)}
• The number of solutions (#F) = 3
#P: The class of counting problems for decision problems in NP!
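The F = (a ∨ b) example can be reproduced by direct enumeration, as in the sketch below. Exhaustive enumeration is exponential in the number of variables, so this only illustrates the definition of #F.

```python
import itertools


def formula(a: bool, b: bool) -> bool:
    """F = (a or b)."""
    return a or b


# R_F: all satisfying assignments, written as 0/1 pairs (a, b).
solutions = [(int(a), int(b))
             for a, b in itertools.product([False, True], repeat=2)
             if formula(a, b)]

print(solutions)       # [(0, 1), (1, 0), (1, 1)]
print(len(solutions))  # 3
```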
Practical Applications
Wide range of applications!
• Estimating coverage achieved
• Probabilistic reasoning/Bayesian inference
• Planning with uncertainty
• Multi-agent/adversarial reasoning
[Roth 96, Sang 04, Bacchus 04, Domshlak 07]
Counting through Partitioning
Pick a random cell
Total # of solutions = # of solutions in the cell × total # of cells
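The estimate "solutions in the cell times number of cells" can be sketched on a toy formula as follows. This is an illustration of the partitioning idea only, not ApproxMC: it uses a single random cell, a fixed m, and brute-force enumeration instead of a SAT solver, and all names are hypothetical.

```python
import itertools
import random


def satisfies(bits, cnf):
    """bits is a tuple of bools; cnf uses DIMACS-style signed 1-based literals."""
    return all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf)


def random_xors(num_vars, m, rng):
    """m random XOR constraints, each a (variable mask, target parity) pair."""
    return [([rng.random() < 0.5 for _ in range(num_vars)], rng.random() < 0.5)
            for _ in range(m)]


def in_cell(bits, xors):
    """True when the assignment meets every XOR constraint."""
    return all((sum(b for b, used in zip(bits, mask) if used) % 2 == 1) == parity
               for mask, parity in xors)


def estimate_count(cnf, num_vars, m, rng):
    """Count the solutions in one random cell and scale by the 2^m cells."""
    xors = random_xors(num_vars, m, rng)
    cell_size = sum(1 for bits in itertools.product([False, True], repeat=num_vars)
                    if satisfies(bits, cnf) and in_cell(bits, xors))
    return cell_size * 2 ** m


toy_cnf = [[1, 2], [-1, 3]]  # hypothetical formula with 4 solutions
rng = random.Random(0)
print(estimate_count(toy_cnf, 3, 1, rng))
```

With m = 0 there is a single cell and the estimate is exact; larger m trades accuracy of a single run for smaller cells.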
Strong Theoretical Results
ApproxMC (CNF: F, tolerance: ε, confidence: δ)
Suppose ApproxMC(F, ε, δ) returns C. Then
Pr[ #F/(1+ε) ≤ C ≤ (1+ε)·#F ] ≥ δ
ApproxMC runs in time polynomial in log((1-δ)^-1), |F|, and ε^-1 relative to a SAT oracle
Mean Error: Only 4% (ε: 0.75)
[Figure: solution counts (log scale, 1.0E+00 to 3.6E+16) per benchmark (index 0 to 90); ApproxMC counts plotted against the bounds Cachet*1.75 and Cachet/1.75]
Mean error: 4%, much smaller than the theoretical guarantee of 75%
Extensions
• Improved performance of UniGen
• Distributed sampling after preprocessing
• ~20 SAT calls per sample
• Weighted (non-uniform) sampling and counting
Extension to More Expressive Domains (SMT, CSP, ASP)
• Efficient strongly-universal hashing schemes
• Solvers to handle F + Hash efficiently
  • CryptoMiniSAT has fueled progress for the SAT domain
  • Similar solvers for other domains?
Key Takeaways
• Constrained sampling and counting are fundamental problems with a wide variety of applications
• Prior methods failed to scale or offered very weak theoretical guarantees
• UniGen: The first scalable generator with theoretical guarantees of almost-uniformity
• ApproxMC: The first scalable approximate SAT counter
• Extensions of underlying techniques in different contexts
• Visit: www.cs.rice.edu/~kgm2/ for papers/tools/source code!
Backup Slides
Can Solve a Large Class of Problems
[Figure: time in seconds (0 to 70000) per benchmark (index 0 to 190) for ApproxMC and Cachet]
A large class of problems lies beyond the reach of exact counters but can be solved by ApproxMC
Exploring CNF+XOR
• Very little understanding as of now
• Eager/Lazy approach for XORs?
• How to reduce size of XORs further?
Weighted Counting
Ref: “Distribution-Aware Sampling and Weighted Model Counting for SAT” (In Proc. of AAAI 2014)
Weighted Counting
Given
• CNF Formula F
• Weight Function W over assignments
Problem
• What is the sum of weights of satisfying assignments?
Example
• F = (a ∨ b)
• W([0,1]) = W([1,0]) = 1/3; W([1,1]) = W([0,0]) = 1/6
• W(F) = 1/3 + 1/3 + 1/6 = 5/6
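The weighted count in the example can be verified with exact rational arithmetic. The sketch below sums the weights of the satisfying assignments of F = (a ∨ b) by enumeration; the dictionary layout is just one convenient way to write the weight function W.

```python
import itertools
from fractions import Fraction

# Weight function W from the example, keyed by the assignment (a, b).
weights = {
    (0, 1): Fraction(1, 3),
    (1, 0): Fraction(1, 3),
    (1, 1): Fraction(1, 6),
    (0, 0): Fraction(1, 6),
}


def formula(a: int, b: int) -> bool:
    """F = (a or b)."""
    return bool(a or b)


# W(F): sum of weights over satisfying assignments only.
weighted_count = sum(weights[(a, b)]
                     for a, b in itertools.product([0, 1], repeat=2)
                     if formula(a, b))
print(weighted_count)  # 5/6
```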
Partition into (weighted) equal
“small” cells
Pick a random cell
Pick (by weight) a random solution from this cell
Can you always achieve partitioning?
What if one solution dominates the entire solution space?
Tilt = w_max / w_min
Small tilt → All solutions contribute
How to handle large tilt?
[Figure: solution space with solution weights .001, .001, .001, .001, .002, .001, .002, .002, .992]
Tilt = .992/.001 = 992
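The tilt of 992 follows directly from the weights shown on the slide, taking tilt as the ratio of the largest to the smallest solution weight. The check below uses exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction

# Solution weights as listed on the slide.
weights = [Fraction(s) for s in
           ["0.001", "0.001", "0.001", "0.001", "0.002",
            "0.001", "0.002", "0.002", "0.992"]]


def tilt(ws):
    """Ratio of the largest to the smallest solution weight."""
    return max(ws) / min(ws)


print(tilt(weights))  # 992
```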
Handling Large Tilt
[Figure: solutions with weights .001, .001, .002, .001, .001, .001, .002, .002, .992 grouped into weight bands such as .001 ≤ wt < .002 and .50 ≤ wt < 1]
Can be achieved with a Pseudo-Boolean Solver
Still a SAT problem, not Optimization
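One way to picture the weight bands is to group solutions so that each band spans less than a factor of two in weight, which keeps the tilt inside every band below 2. The sketch below anchors the bands at the minimum weight. That anchoring is an assumption made here and may differ from the exact band boundaries on the slide; the restriction to a band would in practice be expressed as a constraint for a pseudo-Boolean solver rather than computed by enumeration.

```python
from collections import defaultdict
from fractions import Fraction


def weight_bands(ws):
    """Group weights into bands [w_min * 2^k, w_min * 2^(k+1)),
    so the tilt within each band is below 2."""
    w_min = min(ws)
    bands = defaultdict(list)
    for w in ws:
        k = 0
        while w >= w_min * 2 ** (k + 1):
            k += 1
        bands[k].append(w)
    return dict(bands)


weights = [Fraction(s) for s in
           ["0.001", "0.001", "0.002", "0.001", "0.001",
            "0.001", "0.002", "0.002", "0.992"]]
bands = weight_bands(weights)
for k in sorted(bands):
    print(k, [str(w) for w in bands[k]])
```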