Recent Progress on the Sampling Problem
Yin Tat Lee (MSR/UW), Santosh Vempala (Gatech)
My Dream
Tell the complexity of a convex problem by looking at the formula.
Example
Minimum Cost Flow Problem:
• This is a linear program in which each row has two non-zeros.
It can be solved in Õ(m√n) time. [LS14] (Previous: Õ(m^{1.5}), for a graph with m edges and n vertices.)
My Dream
Tell the complexity of a convex problem by looking at the formula.
Example
Submodular Minimization: minimize f(S) over S ⊆ [n],
where f satisfies diminishing returns, i.e.
f(T ∪ {e}) − f(T) ≤ f(S ∪ {e}) − f(S)   for all S ⊆ T, e ∉ T.
Fundamental in combinatorial optimization. Worth ≥ 2 Fulkerson prizes.
• f can be extended to a convex function on [0,1]ⁿ (the Lovász extension).
• A subgradient of this extension can be computed in O(n²) time (a sketch follows below).
It can be solved in Õ(n³) time. [LSW15] (Previous: Õ(n⁵).)
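As a concrete sketch of the two bullets above (not from the talk): Edmonds' greedy rule computes a subgradient of the Lovász extension with n evaluation-oracle calls plus a sort. The oracle `f` and the example function below are illustrative placeholders.

```python
import numpy as np

def lovasz_subgradient(f, x):
    """Subgradient of the Lovasz extension of a submodular f at x in [0,1]^n.

    f: evaluation oracle mapping a frozenset of indices to a real number.
    Greedy rule: sort coordinates of x in decreasing order, then take the
    marginal gains f(S + i) - f(S) along the resulting chain of sets.
    """
    n = len(x)
    order = np.argsort(-np.asarray(x))   # coordinates in decreasing order
    g = np.zeros(n)
    S = set()
    prev = f(frozenset(S))
    for i in order:
        S.add(int(i))
        cur = f(frozenset(S))
        g[i] = cur - prev                # marginal gain of adding coordinate i
        prev = cur
    return g

# Example: f(S) = sqrt(|S|) satisfies diminishing returns.
f = lambda S: len(S) ** 0.5
print(lovasz_subgradient(f, np.array([0.9, 0.1, 0.5])))
```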
Algorithmic Convex Geometry
To describe a formula, we need some operations.
Given a convex set K, we have the following operations:
• Membership(x): Check if x ∈ K.
• Separation(x): Assert x ∈ K, or find a hyperplane separating x from K.
• Width(c): Compute min_{x∈K} cᵀx.
• Optimize(c): Compute argmin_{x∈K} cᵀx.
• Sample(g): Sample according to g(x)·1_K. (assume g is logconcave)
• Integrate(g): Compute ∫_K g(x) dx. (assume g is logconcave)
Theorem: They are all equivalent under polynomial-time algorithms.
One of the Major Sources of Polynomial-Time Algorithms!
Algorithmic Convex Geometry
Traditionally viewed as impractical.
Now, we have an efficient version of the ellipsoid method.
Why those operations?
For any convex f, define the conjugate f*(θ) = max_x θᵀx − f(x), and let ℓ_K = ∞·1_{K^c}
(the convex indicator of K: 0 on K, +∞ outside).
Progress: We are getting the tight polynomial equivalence between the four on the left.
[Diagram: the six oracles and the functions they evaluate —
Membership: ℓ_K(x);   Width: ℓ_K*(θ);   Separation: ∂ℓ_K(x);   Optimization: ∂ℓ_K*(θ)   (convex optimization);
Integration: ∫_K g(x) dx;   Sample: ~ e^{−f}·1_K.]
Problem: Sampling
Input: a convex set K.
Output: a point sampled from the uniform distribution on K.
Generalized Problem:
Input: a logconcave distribution p.
Output: a point sampled according to p.
Why? Useful for optimization, integration/counting, learning, rounding.
• Best known way to minimize a convex function given a noisy value oracle.
• Only known way to compute the volume of a convex set.
Non-trivial application: Convex Bandit
Game: In each round t = 1, 2, ⋯, T:
• The adversary selects a convex loss function ℓ_t.
• The player chooses (possibly randomly) x_t from the unit ball in n dimensions, based on past observations.
• The player receives the loss/observation ℓ_t(x_t) ∈ [0,1].
• Nothing else about ℓ_t is revealed!
(Joint work with Sébastien Bubeck and Ronen Eldan.)
Measure performance by the regret: Regret = Σ_{t≤T} ℓ_t(x_t) − min_x Σ_{t≤T} ℓ_t(x).
There is a good fixed action, but
• we only learn one point each iteration!
• the adversary can give confusing information!
The gold standard is getting Õ(√T) regret.
Namely, n^{1000}·√T is better than n·T^{2/3}.
Non-trivial application: Convex Bandit (continued)
After a decade of research, we have
Regret = Õ(n^{10.5}·√T).
(The first algorithm that is both polynomial-time and achieves √T regret.)
How to Input the Set
Oracle Setting:
• A membership oracle: answers YES/NO to "x ∈ K?".
• A ball x₀ + rB such that x₀ + rB ⊆ K ⊆ x₀ + poly(n)·rB.
Explicit Setting:
• Given explicitly, such as polytopes, spectrahedra, …
• In this talk, we focus on polytopes {Ax ≥ b}. (m = # of constraints)
Outline
• Oracle Setting:
  • Introduce the ball walk
  • KLS conjecture and its related conjectures
  • Main result
• Explicit Setting: (the originally promised talk)
  • Introduce the geodesic walk
  • Bound the # of iterations
  • Bound the cost per iteration
Sampling Problem
Input: a convex set K given by a membership oracle.
Output: a point sampled from the uniform distribution on K.
Conjectured Lower Bound: n².
Generalized Problem: Given a logconcave distribution p, sample x from p.
Conjectured Optimal Algorithm: Ball Walk
At x, pick a random y from x + δBₙ;
if y is in K, go to y;
otherwise, sample again.
(This walk may get trapped on one side if the set is not convex.)
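A minimal sketch of the ball walk just described, assuming only a membership oracle; the example set and all constants are illustrative, not the tuned parameters from the theory.

```python
import numpy as np

def ball_walk(membership, x0, delta, steps, seed=0):
    """Ball walk: at x, propose y uniform in x + delta*B_n; move iff y is in K."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(steps):
        u = rng.standard_normal(n)
        u *= rng.random() ** (1.0 / n) / np.linalg.norm(u)  # uniform in the unit ball
        y = x + delta * u
        if membership(y):   # otherwise stay at x and sample again
            x = y
    return x

# Example: sample from the cube [0,1]^10 via its membership oracle.
in_cube = lambda y: bool(np.all((y >= 0) & (y <= 1)))
print(ball_walk(in_cube, x0=np.full(10, 0.5), delta=0.1, steps=2000))
```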
Isoperimetric Constant
For any set K, we define the isoperimetric constant φ_K by
φ_K = min_S Area(∂S) / min(vol(S), vol(Sᶜ)).
[Figure: φ large — hard to cut the set; φ small — easy to cut the set.]
Theorem
Given a random point in K, we can generate another in
Õ( n/(δ²φ_K²) · log(1/ε) )
iterations of the ball walk, where δ is the step size.
• The larger φ_K or δ, the faster the walk mixes.
• δ cannot be too large; otherwise, the failure probability is ~1.
Isoperimetric Constant of a Convex Set
Note that φ_K is not affine-invariant and can be arbitrarily small.
[Figure: a 1 × L box has φ_K = 1/L.]
However, you can renormalize K such that Cov(K) = I.
Definition: K is isotropic if it has mean 0 and Cov(K) = I.
Theorem: If δ < 0.001/√n, the ball walk stays inside the set with constant probability.
Theorem: Given a random point in an isotropic K, we can generate another in Õ( n²/φ_K² · log(1/ε) ) iterations.
To make the body isotropic, we can sample the body to estimate its covariance.
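A sketch of that renormalization step: estimate the mean and covariance from samples, then apply the affine map x ↦ Cov^{−1/2}(x − μ). In the real algorithm the samples come from the walk itself; `samples` is a placeholder array.

```python
import numpy as np

def to_isotropic(samples):
    """Affine map putting the empirical distribution in isotropic position."""
    mu = samples.mean(axis=0)
    w, V = np.linalg.eigh(np.cov(samples.T))
    T = V @ np.diag(w ** -0.5) @ V.T        # inverse square root of the covariance
    return (samples - mu) @ T.T, mu, T      # apply the same map T(K - mu) to the body
```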
KLS Conjecture
Kannan-Lovász-Simonovits Conjecture:
For any isotropic convex K, φ_K = Ω(1).
If this is true, the ball walk takes Õ(n²) iterations for isotropic K.
(This matches the believed information-theoretic lower bound.)
To get the "tight" reduction from membership to sampling,
it suffices to prove the KLS conjecture.
KLS Conjecture and its Related Conjectures
Slicing Conjecture:
Any unit-volume convex set K has a slice with volume Ω(1).
Thin-Shell Conjecture:
For isotropic convex K, E((‖x‖ − √n)²) = O(1).
Generalized Lévy Concentration:
For a logconcave distribution p and a 1-Lipschitz f with Ef = 0,
P(|f(x) − Ef| > t) ≤ exp(−Ω(t)).
Essentially, these conjectures ask whether all convex sets look like ellipsoids.
Main Result
[Lovász, Simonovits 93]          φ = Ω(1)·n^{−1/2}
[Klartag 2006]                   φ = Ω(1)·n^{−1/2}·log^{1/2} n
[Fleury, Guédon, Paouris 2006]   φ = Ω(1)·n^{−1/2}·log^{1/6} n·(log log n)^{−2}
[Klartag 2006]                   φ = Ω(1)·n^{−0.4}
[Fleury 2010]                    φ = Ω(1)·n^{−0.375}
[Guédon, Milman 2010]            φ = Ω(1)·n^{−0.333}
[Eldan 2012]                     φ = Ω̃(1)·n^{−0.333}
[Lee, Vempala 2016]              φ = Ω(1)·n^{−0.25}
(Side note: what if we cut the body by spheres only? Then φ_K^{sphere} ≈ n/√Var(‖x‖²) ≥ φ_K.)
In particular, we have Õ(n^{2.5}) mixing for the ball walk.
Do you know a better way to bound the mixing time of the ball walk?
Outline
• Oracle Setting:
  • Introduce the ball walk
  • KLS conjecture and its related conjectures
  • Main result
• Explicit Setting:
  • Introduce the geodesic walk
  • Bound the # of iterations
  • Bound the cost per iteration
Problem: Sampling
Input: a polytope {Ax ≥ b} with m constraints and n variables.
Output: a point sampled from the uniform distribution on K.

Algorithm             | Iterations | Time per iteration
----------------------|------------|--------------------------------------
Dikin walk [KN09]     | mn         | mn^{1.38} (cost of matrix inversion)
Ball walk [LV16]      | n^{2.5}    | mn
Geodesic walk [LV16]  | mn^{0.75}  | mn^{1.38}

First sub-quadratic algorithm.
How does nature mix particles?
Brownian motion.
It works for sampling on ℝⁿ.
However, a convex set has a boundary. ☹
Option 1) Reflect the motion when it hits the boundary.
However, this needs tiny steps for the discretization.
Option 2) Remove the boundary by blowing up.
However, this requires explicit polytopes.
Blowing Up?
[Figure: the original polytope, with the uniform distribution on [0,1], is blown up to the real line, giving a non-uniform distribution on ℝ.]
The distortion makes the hard constraints become "soft".
Enter Riemannian Manifolds
• An n-dimensional manifold M is an n-dimensional surface.
• Each point p has a tangent space T_p M of dimension n, the local linear
  approximation of M at p; tangents of curves in M lie in T_p M.
• The inner product in T_p M depends on p: ⟨u, v⟩_p.
Informally, you can think of this as assigning a unit ball to every point.
• The length of a curve c : [0,1] → M is L(c) = ∫₀¹ ‖(dc/dt)(t)‖_{c(t)} dt.
• The distance d(x, y) is the infimum of the lengths of all paths in M between x and y.
"Generalized" Ball Walk
At x, pick a random y from D_x, where D_x = {y : d(x, y) ≤ 1}.
Hessian Manifold
Hessian manifold: a subset of ℝⁿ with
inner product defined by ⟨u, v⟩_p = uᵀ ∇²φ(p) v.
For the polytope {a_iᵀx ≥ b_i ∀i}, we use the log barrier function
φ(x) = Σ_{i=1}^m log( 1/s_i(x) ).
• s_i(x) = a_iᵀx − b_i is the distance from x to constraint i.
• φ blows up when x is close to the boundary.
• Our walk is slower when it is close to the boundary.
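For concreteness, a small sketch (our notation, not the paper's code) of the log-barrier Hessian ∇²φ(x) = Aᵀ S⁻² A with S = diag(Ax − b), which defines the local inner product and hence the Dikin ellipsoid used below.

```python
import numpy as np

def log_barrier_hessian(A, b, x):
    """Hessian of phi(x) = sum_i log(1/(a_i^T x - b_i)) on {Ax >= b}."""
    s = A @ x - b                          # slacks s_i(x); must all be positive
    assert np.all(s > 0), "x must be strictly inside the polytope"
    return A.T @ (A / s[:, None] ** 2)     # sum_i a_i a_i^T / s_i^2

# Dikin ellipsoid at x: D_x = {y : (y - x)^T H(x) (y - x) <= 1}.
```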
Suggested Algorithm
At x, pick a random y from D_x,
where D_x = {y : d(x, y) ≤ 1} is induced by the log barrier
(D_x is called the Dikin ellipsoid).
Doesn't work!
[Figure: a random walk on the real line and the corresponding Hessian manifold of the original polytope.]
It converges to the boundary, since the volume of the "boundary" is +∞.
Getting the Uniform Distribution
Lemma: If p(x → y) = p(y → x), then the stationary distribution is uniform.
To make a Markov chain p symmetric, we use
p̃(x → y) = min( p(x → y), p(y → x) )   if x ≠ y,
with the leftover probability kept at x.
To implement it, we sample y according to p(x → y):
if p(x → y) < p(y → x), go to y;
else, go to y with probability p(y → x)/p(x → y);
stay at x otherwise.
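A generic sketch of this symmetrization (a Metropolis filter whose stationary distribution is uniform); `propose` and `density` are placeholder callables for drawing from and evaluating p(x → y).

```python
import numpy as np

def metropolis_step(x, propose, density, rng):
    """One step of the symmetrized chain: propose y ~ p(x -> .), then accept
    with probability min(1, p(y -> x)/p(x -> y)); otherwise stay at x."""
    y = propose(x)
    if rng.random() < min(1.0, density(y, x) / density(x, y)):
        return y
    return x
```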
Dikin Walk
At x, pick a random y from D_x;
if x ∉ D_y, reject y;
else, accept y with probability min(1, vol(D_x)/vol(D_y)).
[Copied from KN09]
[KN09] proved it takes Õ(mn) steps.
Better than the previous best Õ(n^{2.5}) for the oracle setting.
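A runnable sketch in the spirit of the Dikin walk (the exact proposal and constants of [KN09] differ): it uses the log-barrier Hessian from the earlier sketch and the identity vol(D_x)/vol(D_y) = √(det H(y)/det H(x)).

```python
import numpy as np

def dikin_walk(A, b, x, steps, r=0.9, seed=0):
    """Sketch of a Dikin-style walk on {Ax >= b}; r is an illustrative step size."""
    rng = np.random.default_rng(seed)
    n = len(x)
    H = lambda z: A.T @ (A / ((A @ z - b)[:, None] ** 2))    # log-barrier Hessian
    for _ in range(steps):
        Hx = H(x)
        u = rng.standard_normal(n)
        u *= rng.random() ** (1.0 / n) / np.linalg.norm(u)   # uniform in the unit ball
        y = x + r * np.linalg.solve(np.linalg.cholesky(Hx).T, u)  # uniform in r*D_x
        if np.any(A @ y - b <= 0):
            continue                                         # y left the polytope
        Hy = H(y)
        if (x - y) @ Hy @ (x - y) > r ** 2:
            continue                                         # x not in D_y: reject
        # acceptance: min(1, vol(D_x)/vol(D_y)) = min(1, sqrt(det Hy / det Hx))
        log_ratio = 0.5 * (np.linalg.slogdet(Hy)[1] - np.linalg.slogdet(Hx)[1])
        if rng.random() < np.exp(min(0.0, log_ratio)):
            x = y
    return x

# Example: the square [0,1]^2 written as {Ax >= b}.
A = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])
b = np.array([0., -1., 0., -1.])
print(dikin_walk(A, b, np.array([0.5, 0.5]), steps=1000))
```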
Dikin Walk and its Limitation
(Recall: at x, pick a random y from D_x; if x ∉ D_y, reject y; else accept y with probability min(1, vol(D_x)/vol(D_y)).)
The Dikin ellipsoid is fully contained in K.
Idea: Pick the next step y from a blown-up Dikin ellipsoid.
We can afford to blow up by ~ √(n/log n); with high probability y ∈ K.
In high dimension, vol(D_x) is not that smooth. (Worst case: [0,1]ⁿ.)
Any larger step makes the success probability exponentially small!
[0,1]ⁿ is the worst case for the ball walk, hit-and-run, and the Dikin walk. ☹
Going Back to Brownian Motion
The walk is not symmetric in "space": it has a tendency to go to the center.
[Figure: the original polytope and the corresponding Hessian manifold.]
Taking the step size to 0, the Dikin walk becomes a stochastic differential equation:
dx_t = μ(x_t) dt + σ(x_t) dW_t
where σ(x_t) = φ''(x_t)^{−1/2} and μ(x_t) is the drift towards the center.
What is the drift? The Fokker-Planck equation
The probability distribution of the SDE
dx_t = μ(x_t) dt + σ(x_t) dW_t
is given by
∂p/∂t (x, t) = −(∂/∂x)[μ(x) p(x, t)] + (1/2)(∂²/∂x²)[σ²(x) p(x, t)].
To make the stationary distribution constant, we need
−(∂/∂x) μ(x) + (1/2)(∂²/∂x²) σ²(x) = 0.
Hence, we have μ = σσ'.
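A quick symbolic check of that computation: with μ = σσ', the Fokker-Planck right-hand side vanishes for a constant density, so the uniform distribution is stationary.

```python
import sympy as sp

x = sp.symbols('x')
sigma = sp.Function('sigma')(x)
mu = sigma * sp.diff(sigma, x)                # the claimed drift mu = sigma * sigma'
p = sp.Integer(1)                             # constant (uniform) density
rhs = -sp.diff(mu * p, x) + sp.Rational(1, 2) * sp.diff(sigma**2 * p, x, 2)
print(sp.simplify(rhs))                       # prints 0
```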
A New Walk
A new walk:
x_{t+h} = x_t + h·μ(x_t) + σ(x_t)·z
with z ~ N(0, hI).
It doesn't make sense: the manifold has no "straight lines", and the step can leave the polytope.
Exponential Map
• The exponential map exp_p : T_p M → M is defined as exp_p(v) = γ_v(1),
  where γ_v is the unique geodesic (shortest path) from p with initial velocity v.
Geodesic Walk
A new walk:
x_{t+h} = exp_{x_t}( (h/2)·μ(x_t) + σ(x_t)·z )
with z ~ N(0, hI).
However, this walk has discretization error,
so we apply a Metropolis filter afterwards.
Since our walk is complicated, the filter is super complicated.
(Is there any way to avoid using the filter?)
Outline
• Oracle Setting:
  • Introduce the ball walk
  • KLS conjecture and its related conjectures
  • Main result
• Explicit Setting: (the originally promised talk)
  • Introduce the geodesic walk
  • Bound the # of iterations
  • Bound the cost per iteration
Geodesic Walk
A new walk:
x_{t+h} = exp_{x_t}( (h/2)·μ(x_t) + z )
with z ~ N(0, hI).
A geodesic is better than a "straight line":
1) It extends infinitely.
2) It gives a massive cancellation.
Key Lemma 1: Provably Long Geodesics
A straight line is defined only until it hits the boundary; a geodesic is defined for all time.
Thm [LV16]: For the manifold induced by the log barrier,
a random geodesic γ starting from x satisfies
a_iᵀγ'(t) ≤ O(n^{−1/4})·(a_iᵀx − b_i)   for 0 ≤ t ≤ O(n^{1/4}).
Namely, the geodesic is well-behaved for a long time.
Remark:
If the central path in IPMs had this property, we would have an m^{5/4}-time algorithm for MaxFlow!
Key Lemma 2: Massive Cancellation
Consider an SDE on the 1-dimensional real line (NOT a manifold):
dx_t = μ(x_t) dt + σ(x_t) dW_t.
How good is the "Euler method", namely x_0 + h·μ(x_0) + √h·σ(x_0)·z?
By a "Taylor" expansion, we have
x_h = x_0 + h·μ(x_0) + √h·σ(x_0)·z + (h/2)·σ(x_0)·σ'(x_0)·(z² − 1) + O(h^{1.5}).
If σ'(x_0) ≠ 0, the error is O(h).
If σ'(x_0) = 0, the error is O(h^{1.5}).
For the geodesic walk, σ'(x_0) = 0 (Christoffel symbols vanish in normal coordinates).
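A numerical illustration of this expansion on the test SDE dx = x dW (so σ(x) = x, σ' ≠ 0, and the exact one-step solution x_h = x_0·exp(√h·z − h/2) is known): the Euler one-step error scales like h, while adding the (h/2)σσ'(z² − 1) correction (the Milstein term) brings it to h^{1.5}.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)
x0 = 1.0
for h in [0.1, 0.01, 0.001]:
    exact = x0 * np.exp(np.sqrt(h) * z - h / 2)      # dx = x dW, sigma(x) = x
    euler = x0 * (1 + np.sqrt(h) * z)
    milstein = euler + x0 * (h / 2) * (z**2 - 1)     # + (h/2) sigma sigma' (z^2 - 1)
    print(h, np.abs(exact - euler).mean(), np.abs(exact - milstein).mean())
```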
Convergence Theorem
Thm [LV16]: For the log barrier, the geodesic walk mixes in Õ(mn^{0.75}) steps.
Thm [LV16]: For the log barrier on [0,1]ⁿ, it mixes in Õ(n^{1/3}) steps. ☺
The best bound for the ball walk, hit-and-run, and the Dikin walk is Õ(n²) steps for [0,1]ⁿ.
Our walk is similar to the Milstein method.
(Are higher-order methods for SDEs used in MCMC?)
Outline
• Oracle Setting:
  • Introduce the ball walk
  • KLS conjecture and its related conjectures
  • Main result
• Explicit Setting: (the originally promised talk)
  • Introduce the geodesic walk
  • Bound the # of iterations
  • Bound the cost per iteration
How to Implement the Algorithm
Can we simply do a Taylor expansion? In high dimension, it may take nᵏ time to compute the k-th derivatives.
In the tangent plane at x:
1. Pick w ∼ N_x(0, I), i.e., a standard Gaussian in ⟨·,·⟩_x.
2. Compute y = exp_x( (h/2)·μ(x) + √h·w ).
3. Accept with probability min(1, p(y → x)/p(x → y)).
How do we compute the geodesic and the rejection probability?
We need high accuracy for the rejection probability ☹ due to the "directedness".
The geodesic is given by the geodesic equation; the probability is given by the Jacobi field.
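As a toy instance of "the geodesic is given by the geodesic equation", here is the one-dimensional log-barrier metric g(x) = 1/x² + 1/(1−x)² on (0,1), where the geodesic equation x'' = −Γ(x)(x')² with Γ = g'/(2g) can be integrated numerically; the geodesic extends forever without reaching the boundary. This is our illustration, not the paper's implementation.

```python
import numpy as np
from scipy.integrate import solve_ivp

g  = lambda x: 1 / x**2 + 1 / (1 - x)**2      # log-barrier metric on (0,1)
dg = lambda x: -2 / x**3 + 2 / (1 - x)**3

def geodesic_rhs(t, state):
    x, v = state
    return [v, -dg(x) / (2 * g(x)) * v**2]    # x'' = -Gamma(x) * (x')^2

x0 = 0.5
v0 = 1 / np.sqrt(g(x0))                        # unit speed in the metric: |v0|_x0 = 1
sol = solve_ivp(geodesic_rhs, (0, 5.0), [x0, v0], rtol=1e-9)
print(sol.y[0, -1])                            # stays strictly inside (0,1)
```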
Collocation Method for ODEs
A weakly polynomial time algorithm for some ODEs.
Consider the ODE y' = f(t, y(t)) with y(0) = y₀.
Given a degree-d polynomial q and distinct points t₁, t₂, ⋯, t_d,
let T(q) be the unique degree-d polynomial p s.t.
p'(t) = f(t, q(t))   on t = t₁, t₂, ⋯, t_d,
p(0) = q(0).
Lem [LV16]: T is well defined. If the t_i are Chebyshev points on [0,1], then
Lip(T) = O(Lip(f)).
Thm [LV16]: If Lip(f) ≤ 0.001, we can find a fixed point of T efficiently.
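A small sketch of the operator T and its fixed-point iteration on Chebyshev points, in the lemma's small-Lipschitz regime; the degree, iteration count, and test ODE below are arbitrary illustrative choices.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def collocation_solve(f, y0, d=8, iters=30):
    """Fixed point of T: the degree-d polynomial p with p(0) = y0 and
    p'(t_i) = f(t_i, p(t_i)) at Chebyshev points t_i of [0,1]."""
    t = (1 - np.cos((2 * np.arange(d) + 1) * np.pi / (2 * d))) / 2  # Chebyshev points
    p = np.array([y0])                          # start from the constant polynomial
    for _ in range(iters):
        dvals = f(t, P.polyval(t, p))           # desired values of p' at the nodes
        dp = P.polyfit(t, dvals, d - 1)         # interpolate p' in the power basis
        p = P.polyint(dp)                       # integrate to get p
        p[0] = y0                               # enforce p(0) = y0
    return p                                    # power-series coefficients of p

# Example: y' = -y + t, y(0) = 1; exact solution y(t) = t - 1 + 2 e^{-t}.
p = collocation_solve(lambda t, y: -y + t, 1.0)
print(P.polyval(1.0, p), 2 * np.exp(-1.0))      # the two values agree closely
```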
Collocation Method for ODEs (continued)
Consider the ODE y' = f(t, y(t)) with y(0) = y₀.
Thm [LV16]: Suppose that
• Lip(f) ≤ 0.001;
• there is a degree-d polynomial p such that ‖p' − y'‖ ≤ ε.
Then, we can find a ȳ such that ‖ȳ − y‖₁ = O(ε) in time
O(d log²(d/ε)) with O(d log(d/ε)) evaluations of f.
Remark: No need to compute f'!
In general, the runtime is O(d·n·Lip^{O(1)}(f)) instead.
How can we bound the k-th derivatives?
For a function of one variable, we can estimate the k-th derivatives easily.
Idea: reduce estimating derivatives of general functions to the one-variable case.
In general, we write F ⪯_x a if ‖Dᵏ F(x)‖ ≤ aᵏ·a₀ for all k.
Calculus rule: if F ⪯_x a and G ⪯_{F(x)} b, then G ∘ F ⪯_x b·(a − a₀).
Implementation Theorem
Using the trick above, we show that the geodesic can be approximated by an O(1)-degree polynomial.
Hence, the collocation method finds it in O(1) steps.
Thm [LV16]: If h ≤ n^{−1/2}, one step of the geodesic walk can be implemented
in matrix multiplication time.
For the hypercube, h ≤ O(1) suffices.
Questions
• We have no background in numerical ODEs/SDEs or Riemannian geometry,
  so the running time should be easily improvable.
• How can we avoid the filtering step?
• Is there a way to tell whether a walk has mixed or not?
  (i.e., even if we cannot prove KLS, the algorithm can still stop early.)
• Are higher-order methods for SDEs useful in MCMC?
• Any other suggestions/heuristics for sampling a convex set?