
Adaptive annealing: a near-optimal connection between sampling and counting

Daniel Štefankovič (University of Rochester)
Santosh Vempala, Eric Vigoda (Georgia Tech)
Counting
independent sets
spanning trees
matchings
perfect matchings
k-colorings
Compute the number of independent sets (hard-core gas model)

independent set = subset S of the vertices of a graph
such that no two vertices in S are neighbors

Example (small graph): # independent sets = 7
Example (larger graph): # independent sets = 5598861
graph G → # independent sets in G

#P-complete, even for 3-regular graphs (Dyer, Greenhill, 1997).

Way out: approximation + randomization.
We would like to know Q.

Goal: a random variable Y such that

P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ.

We then say "Y gives a (1±ε)-estimate."
(approx) counting ⇔ sampling

Valleau, Card '72 (physical chemistry),
Babai '79 (for matchings and colorings),
Jerrum, Valiant, V. Vazirani '86
The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that

1) E[X1 X2 ... Xt] = "WANTED"

2) the Xi are easy to estimate:

   V[Xi] / E[Xi]^2 = O(1)   (squared coefficient of variation, SCV)
(approx) counting ⇔ sampling

Given 1) E[X1 X2 ... Xt] = "WANTED" and 2) V[Xi]/E[Xi]^2 = O(1):

Theorem (Dyer-Frieze '91)
O(t^2/ε^2) samples in total (O(t/ε^2) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
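To make the theorem concrete, here is a minimal sketch of the product estimator it describes: average O(t/ε^2) draws from each factor and multiply the sample means. The sampler functions and the toy Bernoulli factors are our own stand-ins, not part of the talk.

```python
import random

def product_estimator(samplers, eps):
    """Estimate E[X1]*...*E[Xt]: average O(t/eps^2) draws of each factor
    and multiply the sample means (Dyer-Frieze style)."""
    t = len(samplers)
    m = max(1, int(4 * t / eps ** 2))  # O(t/eps^2) samples per factor
    est = 1.0
    for draw in samplers:
        est *= sum(draw() for _ in range(m)) / m
    return est

# Toy usage: each Xi is Bernoulli(1/2), so E[X1...Xt] = 2^-t.
random.seed(0)
t = 5
samplers = [lambda: float(random.random() < 0.5)] * t
print(product_estimator(samplers, eps=0.1), 0.5 ** t)
```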
JVV for independent sets

GOAL: given a graph G, estimate the number of independent sets of G.

# independent sets = 1 / P( S = ∅ ),

where S is a uniformly random independent set of G (the slide pictures the empty set).
By the chain rule P(A∩B) = P(A) P(B|A), this probability factors one vertex at a time (the slides work through a 4-vertex example):

P( S = ∅ ) = P(v1 ∉ S) P(v2 ∉ S | v1 ∉ S) P(v3 ∉ S | v1, v2 ∉ S) ...

Each factor is estimated by a random variable Xi: the indicator that a sampled independent set excludes vi, conditioned on excluding v1,...,v(i-1). Then Xi ∈ [0,1] and E[Xi] ≥ 1/2 (removing vi maps the independent sets containing vi injectively into those excluding it), so

V[Xi] / E[Xi]^2 = O(1).
Self-reducibility for independent sets

For a uniformly random independent set S of the example graph G,

P( v ∉ S ) = (# independent sets of G - v) / (# independent sets of G) = 5/7,

so, inverting,

# independent sets of G = (7/5) · (# independent sets of G - v).

Repeating on the smaller graphs (the slides remove one vertex at a time, with counts 7, 5, 3, 2): P( v' ∉ S' ) = 3/5, so # independent sets of G - v = (5/3) · (# independent sets of G - v - v'), and so on. Telescoping:

7 = (7/5) · (5/3) · (3/2) · 2.

Counting is reduced to estimating vertex-exclusion probabilities, each observable from a uniform sampler.
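A runnable sketch of this telescoping (ours, not from the talk): the uniform sampler is faked by brute-force enumeration, so it only illustrates the reduction on tiny graphs; on the 3-vertex path it recovers #IS = (5/3)·(3/2)·2 = 5.

```python
import itertools, random

def independent_sets(vertices, edges):
    """Enumerate all independent sets of the graph (exponential; toy only)."""
    sets = []
    for r in range(len(vertices) + 1):
        for s in itertools.combinations(vertices, r):
            if all(not (u in s and v in s) for u, v in edges):
                sets.append(frozenset(s))
    return sets

def count_by_self_reducibility(vertices, edges, samples=20000):
    """#IS(G) = product over removed vertices v of 1 / P(v not in S),
    each probability estimated from a uniform independent-set sampler."""
    vertices = list(vertices)
    estimate = 1.0
    while vertices:
        v = vertices[-1]
        population = independent_sets(vertices, edges)  # idealized sampler
        hits = sum(v not in random.choice(population) for _ in range(samples))
        estimate *= samples / hits      # 1 / P(v not in S) = #IS(G) / #IS(G - v)
        vertices.pop()
        edges = [e for e in edges if v not in e]
    return estimate   # the empty graph has exactly 1 independent set

random.seed(1)
V, E = [0, 1, 2], [(0, 1), (1, 2)]      # path on 3 vertices: 5 independent sets
print(count_by_self_reducibility(V, E)) # ~5
```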
JVV: If we have a sampler oracle:

graph G → SAMPLER ORACLE → random independent set of G

then FPRAS using O(n^2) samples.
ŠVV: If we have a sampler oracle:

β, graph G → SAMPLER ORACLE → set from the gas-model Gibbs distribution at β

then FPRAS using O*(n) samples.
Application – independent sets

O*( |V| ) samples suffice for counting.
Cost per sample (Vigoda '01, Dyer-Greenhill '01): time O*( |V| ) for graphs of degree ≤ 4.
Total running time: O*( |V|^2 ).
Other applications (total running times)

matchings: O*(n^2 m) (using Jerrum, Sinclair '89)
spin systems, e.g. Ising model: O*(n^2) for β < β_C (using Marinelli, Olivieri '95)
k-colorings: O*(n^2) for k > 2Δ (using Jerrum '95)
easy = hot, hard = cold

Hamiltonian H : Ω → {0,...,n} assigns each element of the big set Ω an energy level (the slide pictures levels 4, 2, 1, 0).

Goal: estimate |H⁻¹(0)|, written as a product

|H⁻¹(0)| = E[X1] ... E[Xt].
Distributions between hot and cold

β = inverse temperature
β = 0 ~ hot ~ uniform on Ω
β = ∞ ~ cold ~ uniform on H⁻¹(0)

μ_β(x) ∝ exp(-β H(x))   (Gibbs distributions)

μ_β(x) = exp(-β H(x)) / Z(β)

Normalizing factor = partition function:

Z(β) = Σ_{x∈Ω} exp(-β H(x))
Partition function

Z(β) = Σ_{x∈Ω} exp(-β H(x))

have:  Z(0) = |Ω|
want:  Z(∞) = |H⁻¹(0)|
Assumption: we have a sampler oracle for μ_β(x) = exp(-β H(x)) / Z(β):

β, graph G → SAMPLER ORACLE → subset of V distributed according to μ_β

Draw W ← μ_β and set X = exp( H(W)(β - β') ). Then we can obtain the following ratio:

E[X] = Σ_s μ_β(s) X(s) = Σ_s exp(-β' H(s)) / Z(β) = Z(β') / Z(β).
Our goal restated

Partition function: Z(β) = Σ_x exp(-β H(x)). Goal: estimate Z(∞) = |H⁻¹(0)|.

Z(∞) = [Z(β1)/Z(β0)] · [Z(β2)/Z(β1)] · ... · [Z(βt)/Z(β(t-1))] · Z(β0)

Cooling schedule: 0 = β0 < β1 < β2 < ... < βt = ∞.
How to choose the cooling schedule?

Minimize its length t while keeping each factor easy to estimate:

V[Xi] / E[Xi]^2 = O(1), where E[Xi] = Z(βi) / Z(β(i-1)).
Parameters: A and n

Z(0) = A,  H : Ω → {0,...,n}. With a_k = |H⁻¹(k)|,

Z(β) = Σ_{k=0}^{n} a_k e^(-β k).
Parameters

Z(0) = A,  H : Ω → {0,...,n}

                    A        n
independent sets    2^|V|    |E|
matchings           ≈ |V|!   |V|
perfect matchings   |V|!     |V|
k-colorings         k^|V|    |E|
Previous cooling schedules

Z(0) = A,  H : Ω → {0,...,n},  0 = β0 < β1 < β2 < ... < βt = ∞

"Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06):

β → β + 1/n
β → β (1 + 1/ln A)
β = ln A → ∞

These yield cooling schedules of length O(n ln A) (additive steps) and O((ln n)(ln A)) (multiplicative steps) (Bezáková, Štefankovič, Vigoda, V. Vazirani '06).
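The safe steps translate directly into a schedule generator. A sketch (ours, not from the talk) of the multiplicative O((ln n)(ln A))-length schedule:

```python
import math

def nonadaptive_schedule(A, n):
    """BSVV'06-style safe steps: one additive step to 1/n, then multiply
    by (1 + 1/ln A) until beta >= ln A, then jump to infinity."""
    lnA = math.log(A)
    schedule = [0.0, 1.0 / n]
    while schedule[-1] < lnA:
        schedule.append(schedule[-1] * (1 + 1.0 / lnA))
    schedule.append(math.inf)
    return schedule

print(len(nonadaptive_schedule(A=2.0 ** 100, n=1000)))  # O((ln n)(ln A)) steps
```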
No better fixed schedule possible

Z(0) = A,  H : Ω → {0,...,n}. A schedule that works for all of the partition functions

Z_a(β) = (A / (1+a)) (1 + a e^(-β n))   (with a ∈ [0, A-1])

has LENGTH ≥ Ω( (ln n)(ln A) ).
Parameters

Z(0) = A,  H : Ω → {0,...,n}

Our main result: we can get an adaptive schedule of length O*( (ln A)^(1/2) ).

Previously: non-adaptive schedules of length Ω*( ln A ).
Related work

Lovász-Vempala: volume of convex bodies in O*(n^4), using a schedule of length O(n^(1/2)) (a non-adaptive cooling schedule). Compare our adaptive schedule of length O*( (ln A)^(1/2) ).
Existential part

Lemma: for every partition function there exists a cooling schedule of length O*( (ln A)^(1/2) ).

(This is the existential half of the adaptive-schedule result.)
Express SCV using the partition function (going from β to β')

W ← μ_β,  X = exp( H(W)(β - β') ),  E[X] = Z(β')/Z(β), and

E[X^2] / E[X]^2 = Z(2β'-β) Z(β) / Z(β')^2 ≤ C,

a condition on the three points β ≤ β' ≤ 2β'-β. In terms of f(β) = ln Z(β) it reads

f(2β'-β) + f(β) - 2 f(β') ≤ ln C.
Proof: let C' = (ln C)/2 and f(β) = ln Z(β).

f is decreasing, f is convex, f'(0) ≥ -n, f(0) ≤ ln A.

In each step either f or f' changes a lot: letting K := the change in f, the change in ln |f'| is at least ≈ 1/K.

Lemma: f : [a,b] → R convex and decreasing can be "approximated" using

( (f(a) - f(b)) · ln( f'(a)/f'(b) ) )^(1/2)

segments.
Technicality: getting to 2β'-β

Proof: the SCV condition involves the point 2β'-β, which need not itself be a schedule point. The slides interleave auxiliary inverse temperatures (βi, β(i+1), β(i+2), β(i+3), ...) between β, β' and 2β'-β; this costs only about ln ln A extra steps.
Existential → Algorithmic

Existentially, an adaptive schedule of length O*( (ln A)^(1/2) ) exists; next we construct one algorithmically.
Algorithmic construction

Our main result: using a sampler oracle for μ_β(x) = exp(-β H(x)) / Z(β), we can construct a cooling schedule of length

≤ 38 (ln A)^(1/2) (ln ln A)(ln n).

Total number of oracle calls: ≤ 10^7 (ln A)(ln ln A + ln n)^7 ln(1/δ).
Algorithmic construction

Let β be the current inverse temperature. Ideally we move to a β' such that

B1 ≤ E[X^2]/E[X]^2 ≤ B2, where E[X] = Z(β')/Z(β).

The upper bound B2 keeps X "easy to estimate"; the lower bound guarantees we make progress (assuming B1 > 1).

Since E[X^2]/E[X]^2 = [Z(2β'-β)/Z(β)] / [Z(β')/Z(β)]^2, we need to construct a "feeler" for ratios of the form Z(β')/Z(β) and Z(2β'-β)/Z(β). Estimating them naively through X is a bad "feeler": its relative variance is exactly the quantity we are trying to measure.
Rough estimator for Z(β')/Z(β)

Z(β) = Σ_{k=0}^{n} a_k e^(-β k).

For W ← μ_β  we have  P( H(W) = k ) = a_k e^(-β k) / Z(β).
For U ← μ_β' we have  P( H(U) = k ) = a_k e^(-β' k) / Z(β').

Hence

P( H(U) = k ) / P( H(W) = k ) = e^(k(β-β')) · Z(β)/Z(β').

If the event H(·) = k is likely at both β and β', this gives a rough estimator of the ratio.
Z()
Rough estimator for
Z()
n
Z() =

ak e- k
k=0
For W   we have
P(H(W)[c,d]) =
d
 ak e- k
k=c
Z()
Rough estimator for
If |-| |d-c|  1 then
Z()
Z()
Z()
1 Z()
P(H(U)[c,d]) ec(-)

e
P(H(W)[c,d])
e Z()
Z()
We also need P( H(U) ∈ [c,d] ) and P( H(W) ∈ [c,d] ) to be large.

Split {0,1,...,n} into h ≈ 4 (ln n)(ln A) intervals

[0], [1], [2], ..., [c, c(1 + 1/ln A)], ...

For any inverse temperature β there exists an interval I with P( H(W) ∈ I ) ≥ 1/(8h). We say that I is HEAVY for β.
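A sketch (ours, not from the talk) of locating a heavy interval empirically: build the geometrically growing intervals, then pick one that captures at least a 1/(8h) fraction of sampled energies H(W).

```python
import math

def build_intervals(n, A):
    """Split {0,...,n} into singletons followed by geometrically growing
    intervals [c, c(1 + 1/ln A)]; roughly h ~ 4(ln n)(ln A) intervals."""
    lnA = math.log(A)
    intervals, c = [], 0
    while c <= n:
        d = max(c, min(n, int(c * (1 + 1.0 / lnA))))
        intervals.append((c, d))
        c = d + 1
    return intervals

def find_heavy_interval(energy_samples, intervals):
    """Return an interval with empirical mass >= 1/(8h); by the pigeonhole
    argument on the slide, one exists for the true distribution."""
    h, m = len(intervals), len(energy_samples)
    for c, d in intervals:
        if sum(c <= k <= d for k in energy_samples) >= m / (8 * h):
            return (c, d)
    return None

print(find_heavy_interval([0, 0, 1, 2, 2, 3], build_intervals(10, 2.0 ** 20)))
```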
Algorithm

repeat:
    find an interval I which is heavy for the current inverse temperature β
    see how far I stays heavy (until some β*)
    use the interval I as the feeler for Z(β')/Z(β) and Z(2β'-β)/Z(β)
    then either
    * make progress, or
    * eliminate the interval I, or
    * make a "long move"
If we have sampler oracles for μ_β, then we can get an adaptive schedule of length t = O*( (ln A)^(1/2) ). Resulting total running times:

independent sets: O*(n^2) (using Vigoda '01, Dyer-Greenhill '01)
matchings: O*(n^2 m) (using Jerrum, Sinclair '89)
spin systems, e.g. Ising model: O*(n^2) for β < β_C (using Marinelli, Olivieri '95)
k-colorings: O*(n^2) for k > 2Δ (using Jerrum '95)
Appendix – proof of:

Theorem (Dyer-Frieze '91). Suppose 1) E[X1 X2 ... Xt] = "WANTED" and 2) the Xi are easy to estimate: V[Xi]/E[Xi]^2 = O(1). Then O(t^2/ε^2) samples in total (O(t/ε^2) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
The Bienaymé-Chebyshev inequality

For Y = (X1 + X2 + ... + Xn) / n,

P( Y gives a (1±ε)-estimate ) ≥ 1 - V[Y] / (ε^2 E[Y]^2),

and the squared coefficient of variation (SCV) of the mean satisfies

V[Y] / E[Y]^2 = (1/n) · V[X] / E[X]^2.
The Bienaymé-Chebyshev inequality

Let X1,...,Xn,X be independent, identically distributed random variables, Q = E[X]. Let

Y = (X1 + X2 + ... + Xn) / n.

Then

P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - (V[X]/E[X]^2) · 1/(ε^2 n).
Chernoff's bound

Let X1,...,Xn,X be independent, identically distributed random variables, 0 ≤ X ≤ 1, Q = E[X]. Let

Y = (X1 + X2 + ... + Xn) / n.

Then

P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - e^(-ε^2 · n · E[X] / 3).
Samples needed for a (1±ε)-estimate with probability ≥ 1-δ:

Bienaymé-Chebyshev:              n = (V[X]/E[X]^2) · (1/ε^2) · (1/δ)
Bienaymé-Chebyshev, 0 ≤ X ≤ 1:   n = (1/E[X]) · (1/ε^2) · (1/δ)
Chernoff, 0 ≤ X ≤ 1:             n = (1/E[X]) · (3/ε^2) · ln(1/δ)

(For 0 ≤ X ≤ 1 we have V[X] ≤ E[X], hence V[X]/E[X]^2 ≤ 1/E[X].)
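Read as concrete sample-size formulas, the comparison is easy to tabulate (our calculator, not from the talk):

```python
import math

def n_chebyshev(scv, eps, delta):
    """Bienaymé-Chebyshev: n = (V[X]/E[X]^2) * (1/eps^2) * (1/delta)."""
    return math.ceil(scv / (eps ** 2 * delta))

def n_chernoff(mean, eps, delta):
    """Chernoff, 0 <= X <= 1: n = (1/E[X]) * (3/eps^2) * ln(1/delta)."""
    return math.ceil(3 * math.log(1 / delta) / (mean * eps ** 2))

# For 0 <= X <= 1, V[X] <= E[X], so SCV <= 1/E[X]:
mean, eps, delta = 0.5, 0.1, 0.01
print(n_chebyshev(1 / mean, eps, delta))   # 1/delta dependence: 20000
print(n_chernoff(mean, eps, delta))        # ln(1/delta) dependence: ~2764
```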
Median "boosting trick"

Take n = (1/E[X]) · (4/ε^2) and Y = (X1 + X2 + ... + Xn) / n. By Bienaymé-Chebyshev,

P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4.
Median trick – repeat 2T times

P( one estimate lands in [(1-ε)Q, (1+ε)Q] ) ≥ 3/4
⇒ P( more than T of the 2T estimates land in the interval ) ≥ 1 - e^(-T/4)
⇒ P( the median of the 2T estimates is in the interval ) ≥ 1 - e^(-T/4).
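A compact sketch of the median trick (ours): run 2T independent copies of a 3/4-confidence estimator and return the median.

```python
import math, random, statistics

def boosted_estimate(single_estimator, delta):
    """Take the median of 2T runs of a 3/4-confidence estimator so that
    the failure probability e^(-T/4) drops below delta."""
    T = max(1, math.ceil(4 * math.log(1 / delta)))
    return statistics.median(single_estimator() for _ in range(2 * T))

# Toy usage: a crude estimator of Q = 0.5 from 50 coin flips.
random.seed(3)
crude = lambda: sum(random.random() < 0.5 for _ in range(50)) / 50
print(boosted_estimate(crude, delta=1e-6))
```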
0X1
n=
1
E[X]
+ median trick
n=
0X1
1
E[X]
3
2
ln (1/)
32
ln (1/)
2

n=
V[X] 32
ln
(1/)
E[X]2 2
+ median trick
n=
0X1
1
E[X]
3
2
ln (1/)
How precise do the Xi have to be?

First attempt – Chernoff's bound. Main idea: estimate each factor to within (1 ± ε/t), since

(1 ± ε/t)(1 ± ε/t)(1 ± ε/t) ... (1 ± ε/t) ≈ 1 ± ε.

Plugging ε/t into n = (1/E[X]) · (1/ε^2) · ln(1/δ): each term needs Ω(t^2) samples, so Ω(t^3) in total.
How precise do the Xi have to be?

Bienaymé-Chebyshev is better (Dyer-Frieze '91). Let X = X1 X2 ... Xt. GOAL: SCV(X) ≤ ε^2/4, since then

P( X gives a (1±ε)-estimate ) ≥ 1 - V[X]/(ε^2 E[X]^2) ≥ 3/4.

Main idea: with SCV(X) = V[X]/E[X]^2 = E[X^2]/E[X]^2 - 1 and the Xi independent,

SCV(X) = (1 + SCV(X1)) · ... · (1 + SCV(Xt)) - 1.

So SCV(Xi) ≤ (ε^2/4)/t for every i gives SCV(X) ≤ e^(ε^2/4) - 1 ≈ ε^2/4.
Since averaging m independent copies divides the SCV by m, and SCV(Xi) = O(1), reaching SCV(Xi) ≤ (ε^2/4)/t costs O(t/ε^2) samples per term, so O(t^2/ε^2) in total.
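The SCV product identity behind this argument checks out numerically (our quick simulation with independent factors, not from the talk):

```python
import random

random.seed(4)
t, m = 4, 100000
factors = [[random.uniform(0.5, 1.5) for _ in range(m)] for _ in range(t)]

def scv(xs):
    """Empirical squared coefficient of variation: E[X^2]/E[X]^2 - 1."""
    mean = sum(xs) / len(xs)
    return sum(x * x for x in xs) / len(xs) / mean ** 2 - 1

prod = [a * b * c * d for a, b, c, d in zip(*factors)]
rhs = 1.0
for xs in factors:
    rhs *= 1 + scv(xs)
print(scv(prod), rhs - 1)   # both ~ (1 + SCV_i)^4 - 1
```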
Summary: sampler oracles for μ_β yield an adaptive cooling schedule of length t = O*( (ln A)^(1/2) ), giving the application running times listed earlier.