Adaptive annealing: a near-optimal
connection between
sampling and counting
Daniel Štefankovič
(University of Rochester)
Santosh Vempala
Eric Vigoda
(Georgia Tech)
Counting
independent sets
spanning trees
matchings
perfect matchings
k-colorings
Compute the number of independent sets
of a graph (hard-core gas model)

independent set = subset S of vertices,
no two in S are neighbors

(small example graph: # independent sets = 7)
(larger example graph: # independent sets = 5598861)
graph G → # independent sets in G
#P-complete,
even for 3-regular graphs
(Dyer, Greenhill, 1997)

graph G → # independent sets in G?
→ settle for approximation, using randomization
We would like to know Q.
Goal: a random variable Y such that
P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ
“Y gives a (1±ε)-estimate”
(approx) counting ⇔ sampling
Valleau, Card’72 (physical chemistry),
Babai’79 (for matchings and colorings),
Jerrum, Valiant, V.Vazirani’86

The outcome of the JVV reduction:
random variables X₁, X₂, ..., Xₜ such that
1) E[X₁ X₂ ⋯ Xₜ] = “WANTED”
2) the Xᵢ are easy to estimate:
V[Xᵢ] / E[Xᵢ]² = O(1)
(squared coefficient of variation, SCV)
(approx) counting ⇔ sampling
1) E[X₁ X₂ ⋯ Xₜ] = “WANTED”
2) the Xᵢ are easy to estimate:
V[Xᵢ] / E[Xᵢ]² = O(1)

Theorem (Dyer-Frieze’91)
O(t²/ε²) samples (O(t/ε²) from each Xᵢ)
give a (1±ε)-estimator of “WANTED” with probability ≥ 3/4.
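To make the reduction concrete, here is a minimal sketch of this product estimator, assuming we are handed a sampler for each Xᵢ; the sampler functions and the constant in the sample count are illustrative, not from the paper.

```python
import random

def product_estimator(samplers, eps):
    """Estimate E[X1]*...*E[Xt] by averaging O(t/eps^2) draws of each X_i
    and multiplying the sample means (Dyer-Frieze style)."""
    t = len(samplers)
    n = max(1, int(4 * t / eps**2))  # O(t/eps^2) draws per variable
    est = 1.0
    for sample in samplers:
        est *= sum(sample() for _ in range(n)) / n
    return est

# Toy usage: three biased coins whose means multiply to 0.105.
samplers = [lambda p=p: 1.0 if random.random() < p else 0.0
            for p in (0.5, 0.6, 0.35)]
print(product_estimator(samplers, eps=0.1))  # ≈ 0.5 * 0.6 * 0.35 = 0.105
```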
JVV for independent sets

GOAL: given a graph G, estimate the
number of independent sets of G

# independent sets = 1 / P(S = ∅),
where S is a uniformly random independent set of G.

Using P(A∩B) = P(A) P(B|A), write this probability as a
product of conditional probabilities, one per vertex
(figures of the shrinking graph omitted):

P(S = ∅) = X₁ X₂ X₃ X₄ ⋯

Each Xᵢ ∈ [0,1] and E[Xᵢ] ≥ ½, hence
V[Xᵢ] / E[Xᵢ]² = O(1).
Self-reducibility for independent sets

For a uniformly random independent set S of the example
graph G (figures omitted):

P( v ∉ S ) = (# independent sets of G - v) / (# independent sets of G)
           = 5/7

Inverting, 7 = (7/5) · 5, and recursing on the smaller graph
(5 = (5/3) · 3, and so on) gives the telescoping product

7 = (7/5) · (5/3) · (3/2) · 2

so estimates of the ratios, each a probability of the form
P(v ∉ S), multiply out to the count.
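A minimal sketch of this self-reducibility pipeline, assuming access to a sampler for uniform independent sets (here brute-force enumeration stands in for the sampler oracle); the graph representation and sample count are illustrative.

```python
import random
from itertools import combinations

def independent_sets(vertices, edges):
    """All independent sets of the graph (brute force; stands in for a sampler oracle)."""
    sets = []
    for r in range(len(vertices) + 1):
        for s in combinations(vertices, r):
            s = set(s)
            if all(not (u in s and v in s) for u, v in edges):
                sets.append(s)
    return sets

def estimate_count(vertices, edges, samples=20000):
    """Estimate #IS(G) as a telescoping product of ratios #IS(G)/#IS(G - v),
    each ratio 1/P(v not in S) estimated from random independent sets."""
    vertices, edges = list(vertices), list(edges)
    est = 1.0
    while vertices:
        v = vertices[0]
        pool = independent_sets(vertices, edges)          # sampler for current graph
        hits = sum(v not in random.choice(pool) for _ in range(samples))
        est *= samples / hits                              # ≈ #IS(G) / #IS(G - v)
        vertices = vertices[1:]                            # delete v ...
        edges = [e for e in edges if v not in e]           # ... and its edges
    return est                                             # empty graph: exactly 1 independent set

# Toy usage: the path on 3 vertices has 5 independent sets.
print(estimate_count([0, 1, 2], [(0, 1), (1, 2)]))  # ≈ 5
```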
JVV: If we have a sampler oracle:
graph G → SAMPLER ORACLE → random independent set of G,
then FPRAS using O(n²) samples.
ŠVV: If we have a sampler oracle:
β, graph G → SAMPLER ORACLE → set from the gas-model Gibbs distribution at β,
then FPRAS using O*(n) samples.
Application – independent sets
O*( |V| ) samples suffice for counting.
Cost per sample (Vigoda’01, Dyer-Greenhill’01):
time = O*( |V| ) for graphs of degree ≤ 4.
Total running time: O*( |V|² ).
Other applications (total running time):
matchings: O*(n²m) (using Jerrum, Sinclair’89)
spin systems, e.g. Ising model: O*(n²) for β < β_C (using Marinelli, Olivieri’95)
k-colorings: O*(n²) for k > 2Δ (using Jerrum’95)
easy = hot, hard = cold

Hamiltonian H : Ω → {0,...,n}
(figure: Ω partitioned into level sets H = 4, 2, 1, 0;
the big set is Ω)

Goal: estimate |H⁻¹(0)|
|H⁻¹(0)| = E[X₁] ⋯ E[Xₜ]
Distributions between hot and cold
β = inverse temperature
β = 0: hot, uniform on Ω
β = ∞: cold, uniform on H⁻¹(0)
μ_β(x) ∝ exp(-βH(x))   (Gibbs distributions)

μ_β(x) = exp(-βH(x)) / Z(β)

Normalizing factor = partition function:
Z(β) = Σ_x exp(-βH(x))
Partition function
Z(β) = Σ_x exp(-βH(x))
have: Z(0) = |Ω|
want: Z(∞) = |H⁻¹(0)|
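As a sanity check, a small brute-force computation of Z(β) for the hard-core setup, where H(S) counts the edges inside S, so Z(0) = 2^|V| and Z(β) → # independent sets as β → ∞; the tiny instance is illustrative.

```python
import math
from itertools import combinations

def Z(beta, vertices, edges):
    """Partition function Z(beta) = sum over all subsets S of exp(-beta * H(S)),
    with H(S) = number of edges with both endpoints in S."""
    total = 0.0
    for r in range(len(vertices) + 1):
        for s in combinations(vertices, r):
            s = set(s)
            h = sum(u in s and v in s for u, v in edges)
            total += math.exp(-beta * h)
    return total

V, E = [0, 1, 2], [(0, 1), (1, 2)]   # path graph, 5 independent sets
print(Z(0.0, V, E))    # 8.0 = 2^|V| = |Omega|
print(Z(50.0, V, E))   # ≈ 5.0 = |H^-1(0)| = # independent sets
```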
Assumption:
we have a sampler oracle for
μ_β(x) = exp(-βH(x)) / Z(β)

β, graph G → SAMPLER ORACLE → subset of V drawn from μ_β

For W drawn from μ_β, let
X = exp( H(W) (β - β') ).
Then we can obtain the following ratio:
E[X] = Σ_s μ_β(s) X(s) = Z(β') / Z(β)
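Filling in the one-line computation behind this identity (a standard calculation, not specific to the paper):

```latex
\mathbb{E}[X]
= \sum_{x\in\Omega} \frac{e^{-\beta H(x)}}{Z(\beta)}\, e^{H(x)(\beta-\beta')}
= \frac{1}{Z(\beta)} \sum_{x\in\Omega} e^{-\beta' H(x)}
= \frac{Z(\beta')}{Z(\beta)} .
```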
Our goal restated
Partition function: Z(β) = Σ_x exp(-βH(x))
Goal: estimate Z(∞) = |H⁻¹(0)|

Z(∞) = Z(0) · [Z(β₁)/Z(β₀)] · [Z(β₂)/Z(β₁)] ⋯ [Z(βₜ)/Z(βₜ₋₁)]

where 0 = β₀ < β₁ < β₂ < ... < βₜ = ∞
Our goal restated

Z(∞) = Z(0) · [Z(β₁)/Z(β₀)] · [Z(β₂)/Z(β₁)] ⋯ [Z(βₜ)/Z(βₜ₋₁)]

Cooling schedule:
0 = β₀ < β₁ < β₂ < ... < βₜ = ∞

How to choose the cooling schedule?
Minimize the length t, while satisfying
V[Xᵢ] / E[Xᵢ]² = O(1), where E[Xᵢ] = Z(βᵢ) / Z(βᵢ₋₁)
Parameters: A and n
Z(β) = Σ_x exp(-βH(x))
Z(0) = A
H: Ω → {0,...,n}

Z(β) = Σ_{k=0}^{n} a_k e^{-βk}, where a_k = |H⁻¹(k)|
Parameters
Z(0) = A
H: Ω → {0,...,n}

                      A        n
independent sets      2^|V|    |E|
matchings             |V|!     |V|
perfect matchings     |V|!     |V|
k-colorings           k^|V|    |E|
Previous cooling schedules
Z(0) = A
H: Ω → {0,...,n}
0 = β₀ < β₁ < β₂ < ... < βₜ = ∞

“Safe steps” (Bezáková, Štefankovič, Vigoda, V.Vazirani’06):
β → β + 1/n
β → β (1 + 1/ln A)
ln A → ∞

Cooling schedules of length
O( n ln A ) and O( (ln n)(ln A) )
(Bezáková, Štefankovič, Vigoda, V.Vazirani’06)
No better fixed schedule possible
Z(0) = A
H: Ω → {0,...,n}

A schedule that works for all of the partition functions
Z_a(β) = (1 + a e^{-βn}) / (1 + a)   (with a ∈ [0, A-1])
has LENGTH ≥ Ω( (ln n)(ln A) )
Parameters
Z(0) = A
H: Ω → {0,...,n}

Our main result:
can get adaptive schedule of length O*( (ln A)^{1/2} )

Previously:
non-adaptive schedules of length Ω*( ln A )
Related work
can get adaptive schedule of length O*( (ln A)^{1/2} )

Lovász-Vempala:
volume of convex bodies in O*(n⁴),
schedule of length O(n^{1/2})
(non-adaptive cooling schedule)
Existential part
Lemma:
for every partition function there exists
a cooling schedule of length O*( (ln A)^{1/2} )
Express SCV using the partition function
(going from β to β'):
for W drawn from μ_β and X = exp(H(W)(β - β')),

E[X] = Z(β') / Z(β)

E[X²] / E[X]² = Z(2β'-β) Z(β) / Z(β')² ≤ C

(figure: β, β', 2β'-β marked on the inverse-temperature axis)
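The second-moment identity follows by the same computation as for E[X] (again standard):

```latex
\mathbb{E}[X^2]
= \sum_{x\in\Omega} \frac{e^{-\beta H(x)}}{Z(\beta)}\, e^{2H(x)(\beta-\beta')}
= \frac{Z(2\beta'-\beta)}{Z(\beta)},
\qquad
\frac{\mathbb{E}[X^2]}{\mathbb{E}[X]^2}
= \frac{Z(2\beta'-\beta)\,Z(\beta)}{Z(\beta')^2}.
```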
Proof:
Let f(β) = ln Z(β) and C' = (ln C)/2.
f is decreasing, f is convex, f'(0) ≥ -n, f(0) ≤ ln A.
In each step of the schedule, either f or ln |f'| changes a lot.

Key fact: a convex decreasing f: [a,b] → R
can be “approximated” using
( (f(a)-f(b)) · ln( f'(a)/f'(b) ) )^{1/2}
segments.
Technicality: getting to 2β'-β
Proof: the schedule moves from βᵢ to βᵢ₊₁, but the SCV bound
involves 2βᵢ₊₁ - βᵢ, which may overshoot; patching this up costs
ln ln A extra steps
(figure: βᵢ, βᵢ₊₁, βᵢ₊₂, βᵢ₊₃ with 2β'-β marked on the axis)
Existential → Algorithmic
can get adaptive schedule of length O*( (ln A)^{1/2} )
Algorithmic construction
Our main result:
using a sampler oracle for
μ_β(x) = exp(-βH(x)) / Z(β),
we can construct a cooling schedule of length
≤ 38 (ln A)^{1/2} (ln ln A)(ln n).
Total number of oracle calls:
≤ 10⁷ (ln A) (ln ln A + ln n)⁷ ln(1/δ)
Algorithmic construction
current inverse temperature β;
ideally move to β' such that

B₁ ≤ E[X²] / E[X]² ≤ B₂,   where E[X] = Z(β') / Z(β)

* the upper bound B₂ keeps X “easy to estimate”
* the lower bound B₁ > 1 ensures we make progress

need to construct a “feeler” for the ratios
Z(β')/Z(β) and Z(2β'-β)/Z(β)
(a bad “feeler” here would spoil the estimate, so it must
be chosen carefully)
Rough estimator for Z(β')/Z(β)

Z(β) = Σ_{k=0}^{n} a_k e^{-βk}

For W drawn from μ_β:  P(H(W)=k) = a_k e^{-βk} / Z(β)
For U drawn from μ_β': P(H(U)=k) = a_k e^{-β'k} / Z(β')

If the event H = k is likely at both β and β', then

P(H(U)=k) / P(H(W)=k) = e^{k(β-β')} · Z(β)/Z(β')

gives a rough estimator of the ratio.
Rough estimator for Z(β')/Z(β), interval version

For W drawn from μ_β:
P(H(W)∈[c,d]) = Σ_{k=c}^{d} a_k e^{-βk} / Z(β)

If |β-β'| · |d-c| ≤ 1, then

(1/e) · Z(β)/Z(β') ≤ e^{c(β'-β)} · P(H(U)∈[c,d]) / P(H(W)∈[c,d]) ≤ e · Z(β)/Z(β')

We also need P(H(U)∈[c,d]) and P(H(W)∈[c,d]) to be large.
Split {0,1,...,n} into h ≤ 4 (ln n)(ln A) intervals
[0], [1], [2], ..., [c, c(1+1/ln A)], ...

For any inverse temperature β there
exists an interval I with P(H(W) ∈ I) ≥ 1/(8h).
We say that I is HEAVY for β.
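A minimal sketch of such a feeler, assuming a sampler for μ_β (exact sampling stands in for the oracle) and a fixed interval [c,d]; the toy Hamiltonian and sample counts are illustrative, not the paper's tuned parameters.

```python
import math
import random

a = [5, 10, 40, 30]                      # a_k = |H^-1(k)| for a toy Hamiltonian

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def sample_H(beta, m):
    """m draws of H(W) for W ~ mu_beta (exact sampling stands in for the oracle)."""
    w = [ak * math.exp(-beta * k) for k, ak in enumerate(a)]
    return random.choices(range(len(a)), weights=w, k=m)

def feeler(beta, beta2, c, d, m=200_000):
    """Estimate Z(beta)/Z(beta2) from P(H in [c,d]) under both distributions;
    within a factor e whenever |beta-beta2|*|d-c| <= 1 and [c,d] is heavy for both."""
    in_I = lambda ks: sum(c <= k <= d for k in ks) / len(ks)
    p_w, p_u = in_I(sample_H(beta, m)), in_I(sample_H(beta2, m))
    return math.exp(c * (beta2 - beta)) * p_u / p_w

beta, beta2, c, d = 0.5, 1.0, 0, 1
print(feeler(beta, beta2, c, d))   # rough estimate ...
print(Z(beta) / Z(beta2))          # ... of the true ratio
```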
Algorithm
repeat:
  find an interval I which is heavy for
  the current inverse temperature β;
  see how far I stays heavy (until some β*);
  use the interval I for the feeler
  (estimating Z(β')/Z(β) and Z(2β'-β)/Z(β));
  either
  * make progress, or
  * eliminate the interval I, or
  * make a “long move”
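To convey the shape of the adaptive loop, here is a deliberately simplified sketch: it reuses the toy exact sampler above in place of the oracle, estimates the SCV directly instead of going through heavy intervals, and doubles or halves the step so the estimated SCV lands near [B₁, B₂]; all constants (and the finite stand-in for β = ∞) are illustrative.

```python
import math
import random

a = [5, 10, 40, 30]                      # toy Hamiltonian as before

def sample_H(beta, m):
    w = [ak * math.exp(-beta * k) for k, ak in enumerate(a)]
    return random.choices(range(len(a)), weights=w, k=m)

def scv_estimate(beta, beta2, m=50_000):
    """Estimate E[X^2]/E[X]^2 for X = exp(H(W)(beta - beta2)), W ~ mu_beta."""
    xs = [math.exp(k * (beta - beta2)) for k in sample_H(beta, m)]
    mean = sum(xs) / m
    return sum(x * x for x in xs) / m / mean**2

def adaptive_schedule(beta_max=8.0, B1=1.5, B2=4.0):
    """Grow the cooling schedule adaptively: double the step while the
    estimated SCV is still below B1, halve it when it overshoots B2."""
    schedule, beta, step = [0.0], 0.0, 0.1
    while beta < beta_max:
        while (beta + step < beta_max
               and scv_estimate(beta, beta + step) < B1):
            step *= 2                    # too easy: move faster
        while scv_estimate(beta, min(beta + step, beta_max)) > B2:
            step /= 2                    # too hard: move slower
        beta = min(beta + step, beta_max)
        schedule.append(beta)
    return schedule

print(adaptive_schedule())   # big steps where Z is flat, small steps where it drops
```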
If we have sampler oracles for μ_β, then we can get an adaptive schedule
of length t = O*( (ln A)^{1/2} ).

independent sets: O*(n²) (using Vigoda’01, Dyer-Greenhill’01)
matchings: O*(n²m) (using Jerrum, Sinclair’89)
spin systems, e.g. Ising model: O*(n²) for β < β_C (using Marinelli, Olivieri’95)
k-colorings: O*(n²) for k > 2Δ (using Jerrum’95)
Appendix – proof of:
1) E[X₁ X₂ ⋯ Xₜ] = “WANTED”
2) the Xᵢ are easy to estimate:
V[Xᵢ] / E[Xᵢ]² = O(1)

Theorem (Dyer-Frieze’91)
O(t²/ε²) samples (O(t/ε²) from each Xᵢ)
give a (1±ε)-estimator of “WANTED” with probability ≥ 3/4.
The Bienaymé-Chebyshev inequality

P( Y gives a (1±ε)-estimate ) ≥ 1 - V[Y] / (ε² E[Y]²)

where Y = (X₁ + X₂ + ... + Xₙ) / n, and

V[Y] / E[Y]² = (1/n) · V[X] / E[X]²
(squared coefficient of variation, SCV)
The Bienaymé-Chebyshev inequality
Let X₁,...,Xₙ,X be independent, identically
distributed random variables, Q = E[X]. Let
Y = (X₁ + X₂ + ... + Xₙ) / n.
Then
P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - (1/n) · V[X] / (ε² E[X]²)
Chernoff’s bound
Let X₁,...,Xₙ,X be independent, identically
distributed random variables, 0 ≤ X ≤ 1,
Q = E[X]. Let
Y = (X₁ + X₂ + ... + Xₙ) / n.
Then
P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - e^{-ε² · n · E[X] / 3}
Number of samples for a (1±ε)-estimate with probability ≥ 1-δ:

Chebyshev:  n = (V[X]/E[X]²) · (1/ε²) · (1/δ)
Chernoff:   n = (1/E[X]) · (3/ε²) · ln(1/δ)   (requires 0 ≤ X ≤ 1)

For 0 ≤ X ≤ 1 we have V[X]/E[X]² ≤ 1/E[X], so Chernoff’s
ln(1/δ) dependence is much better than Chebyshev’s 1/δ.
Median “boosting trick”
For 0 ≤ X ≤ 1, take n = (1/E[X]) · (4/ε²). Then by Chebyshev,
Y = (X₁ + X₂ + ... + Xₙ) / n
satisfies
P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4
Median trick – repeat 2T times:
P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4
⇒ P( more than T of the 2T estimates fall in the interval ) ≥ 1 - e^{-T/4}
⇒ P( the median falls in the interval ) ≥ 1 - e^{-T/4}
For 0 ≤ X ≤ 1:
Chebyshev + median trick: n = (1/E[X]) · (32/ε²) · ln(1/δ)
Chernoff:                 n = (1/E[X]) · (3/ε²) · ln(1/δ)

In general (no boundedness assumption),
Chebyshev + median trick: n = (V[X]/E[X]²) · (32/ε²) · ln(1/δ)
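A small runnable illustration of the median trick (constants chosen for the demo, not tuned to the bounds above):

```python
import random
import statistics

def mean_estimate(sample, n):
    return sum(sample() for _ in range(n)) / n

def median_of_means(sample, n, reps):
    """Boost a 3/4-confidence mean estimator by taking the median of reps copies."""
    return statistics.median(mean_estimate(sample, n) for _ in range(reps))

# X in [0,1] with E[X] = 0.1: n = 4/(E[X]*eps^2) gives a 3/4-confidence estimate,
# and the median over 2T = 30 repetitions boosts confidence to about 1 - e^{-T/4}.
sample = lambda: 1.0 if random.random() < 0.1 else 0.0
eps = 0.2
n = int(4 / (0.1 * eps**2))
print(median_of_means(sample, n, reps=30))  # ≈ 0.1 with high probability
```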
How precise do the Xᵢ have to be?
First attempt – Chernoff’s bound.
Main idea:
(1 ± ε/t)(1 ± ε/t) ⋯ (1 ± ε/t) ≈ 1 ± ε,
so estimate each Xᵢ within (1 ± ε/t):
n = (1/E[X]) · (3/(ε/t)²) · ln(1/δ)
⇒ each term Θ(t²/ε²) samples ⇒ Θ(t³/ε²) total.
How precise do the Xᵢ have to be?
Bienaymé-Chebyshev is better (Dyer-Frieze’1991).

X = X₁ X₂ ⋯ Xₜ
GOAL: SCV(X) ≤ ε²/4, where
SCV(X) = V[X] / E[X]² = E[X²]/E[X]² - 1
(squared coefficient of variation).
Then
P( X gives a (1±ε)-estimate ) ≥ 1 - V[X]/(ε² E[X]²) ≥ 3/4.

Main idea: for independent X₁,...,Xₜ,
SCV(X) = (1 + SCV(X₁)) ⋯ (1 + SCV(Xₜ)) - 1,
so SCV(Xᵢ) ≤ ε²/(4t) for each i gives (approximately) SCV(X) < ε²/4.

⇒ each term O(t/ε²) samples ⇒ O(t²/ε²) total.
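The product rule used here follows from independence (a standard identity):

```latex
\frac{\mathbb{E}[X^2]}{\mathbb{E}[X]^2}
= \prod_{i=1}^{t} \frac{\mathbb{E}[X_i^2]}{\mathbb{E}[X_i]^2}
= \prod_{i=1}^{t} \bigl(1 + \mathrm{SCV}(X_i)\bigr),
\qquad
\mathrm{SCV}(X) = \prod_{i=1}^{t}\bigl(1 + \mathrm{SCV}(X_i)\bigr) - 1 .
```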