Multiplicative weights method:
A meta algorithm with applications to linear
and semi-definite programming
Sanjeev Arora
Princeton University
Based upon:
Fast algorithms for Approximate SDP [FOCS '05]
O(√log n) approximation to SPARSEST CUT in Õ(n²) time [FOCS '04]
The multiplicative weights update method and its applications ['05]
See also recent papers by Hazan and Kale.
Multiplicative update rule (long history)

n agents, each carrying a weight w_1, w_2, ..., w_n.

Update the weights according to performance:
  w_i^{t+1} ← w_i^t (1 + ε · performance of i)
Applications: approximate solutions to LPs and SDPs, flow problems,
online learning (boosting), derandomization & Chernoff bounds,
online convex optimization, computational geometry,
metric embeddings, portfolio management... (see our survey)
Simplest setting – predicting the market

$1 for a correct prediction
$0 for an incorrect one
N "experts" on TV
Can we perform as well as the best expert?
Weighted majority algorithm [LW '94]

"Predict according to the weighted majority."

Multiplicative update (initially all w_i = 1):
  If expert i predicted correctly:  w_i^{t+1} ← w_i^t
  If incorrectly:                   w_i^{t+1} ← w_i^t (1 − ε)

Claim: #mistakes by algorithm ≈ 2(1+ε) · (#mistakes by the best expert)

Potential: Φ^t = sum of weights = Σ_i w_i^t   (initially n)
If the algorithm predicts incorrectly ⇒ Φ^{t+1} ≤ Φ^t − ε Φ^t / 2
So Φ^T ≤ (1 − ε/2)^{m(A)} · n,  where m(A) = #mistakes by the algorithm
Also Φ^T ≥ (1 − ε)^{m_i},  where m_i = #mistakes of expert i
⇒ m(A) ≤ 2(1+ε) m_i + O(log n / ε)
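A minimal Python sketch of the weighted majority rule above (illustrative only; the expert predictions and outcomes are assumed to be 0/1 sequences, and eps plays the role of ε):

def weighted_majority(expert_preds, outcomes, eps=0.1):
    """expert_preds[t][i] = prediction (0/1) of expert i at time t;
    outcomes[t] = true outcome (0/1). Returns #mistakes of the algorithm."""
    n = len(expert_preds[0])
    w = [1.0] * n                      # initially all weights are 1
    mistakes = 0
    for preds, outcome in zip(expert_preds, outcomes):
        # predict with the weighted majority
        vote1 = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote0 = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if vote1 >= vote0 else 0
        if guess != outcome:
            mistakes += 1
        # multiplicative update: penalize experts that predicted wrongly
        w = [wi * (1 - eps) if p != outcome else wi
             for wi, p in zip(w, preds)]
    return mistakes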
Generalized Weighted majority [A., Hazan, Kale '05]

n agents (experts); a set of events (possibly infinite).
When event j occurs, expert i receives payoff M(i,j).

Algorithm: plays a distribution (p_1, ..., p_n) on the experts.
Payoff for event j:  Σ_i p_i M(i,j)

Update rule:  p_i^{t+1} ← p_i^t (1 + ε · M(i,j))

Claim: After T iterations,
algorithm payoff ≥ (1 − ε) · (payoff of the best expert) − O(log n / ε)
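A minimal Python sketch of the generalized update for a finite payoff matrix M (the assumption that payoffs lie in [0, 1] and the function name are for illustration only):

def mw_play(M, events, eps=0.1):
    """M[i][j] = payoff to expert i under event j; `events` is the
    observed sequence of event indices. Returns total algorithm payoff."""
    n = len(M)
    w = [1.0] * n
    total = 0.0
    for j in events:
        # play the distribution proportional to the current weights
        W = sum(w)
        p = [wi / W for wi in w]
        total += sum(pi * M[i][j] for i, pi in enumerate(p))
        # multiplicative update driven by each expert's payoff
        w = [wi * (1 + eps * M[i][j]) for i, wi in enumerate(w)]
    return total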
Applications of the MW meta-algorithm: game playing and online optimization,
Lagrangean relaxation, gradient descent, Chernoff bounds, games with matrix
payoffs, boosting, fast solutions to LPs and SDPs.
Common features of MW algorithms
“competition” amongst n experts
Appearance of terms like  exp(− ε Σ_t (performance at time t))
Time to get ε-approximate solutions is proportional to 1/ε².
Boosting and AdaBoost
(Schapire '91, Freund–Schapire '97)

"Weak learning ⇒ Strong learning"

Input: 0/1 vectors with a "Yes/No" label
  (white, tall, vegetarian, smoker, ...)          "Has disease"
  (nonwhite, short, vegetarian, nonsmoker, ...)   "No disease"

Desired: a short OR-of-AND (i.e., DNF) formula that describes a given
fraction of the data (if one exists), e.g.
  (white ∧ vegetarian) ∨ (nonwhite ∧ nonsmoker)

fraction = 1 − ε : "Strong Learning"
fraction = ½ + ε : "Weak Learning"
Main idea in boosting: data points are “experts”,
weak learners are “events.”
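A minimal Python sketch of this boosting view, assuming a weak_learner(examples, labels, dist) routine that returns a 0/1 hypothesis h(x); the interface and the plain (1 − ε) down-weighting are illustrative assumptions rather than the exact AdaBoost weighting:

def boost(examples, labels, weak_learner, rounds, eps=0.1):
    """Data points play the role of experts; each weak hypothesis is an event.
    Points the current hypothesis gets right have their weight cut down,
    so later rounds focus on the hard points."""
    m = len(examples)
    w = [1.0] * m
    hypotheses = []
    for _ in range(rounds):
        W = sum(w)
        dist = [wi / W for wi in w]
        h = weak_learner(examples, labels, dist)   # assumed interface
        hypotheses.append(h)
        # multiplicative update: down-weight points classified correctly
        w = [wi * (1 - eps) if h(x) == y else wi
             for wi, x, y in zip(w, examples, labels)]
    # final strong hypothesis: majority vote of the weak ones
    def strong(x):
        votes = sum(1 if h(x) == 1 else -1 for h in hypotheses)
        return 1 if votes >= 0 else 0
    return strong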
Approximate solutions to convex programming
e.g., Plotkin–Shmoys–Tardos '91, Young '97, Garg–Koenemann '99, Fleischer '99

The MW Meta-Algorithm gives a unified view.

Note: a distribution {p_i} with Σ_i p_i = 1 corresponds to a convex combination:
if P is a convex set and each x_i ∈ P, then Σ_i p_i x_i ∈ P.
Solving LPs (feasibility)

P = convex domain.  One weight w_k per constraint:
  w_1 :  a_1 · x ≥ b_1
  w_2 :  a_2 · x ≥ b_2
  ...
  w_m :  a_m · x ≥ b_m
  x ∈ P

Event/Oracle: given the weights, find x ∈ P with  Σ_k w_k (a_k · x − b_k) ≥ 0
Solving LPs (feasibility), contd.

The oracle returns x^0 ∈ P with Σ_k w_k (a_k · x^0 − b_k) ≥ 0.
Width: ρ is a bound with |a_k · x^0 − b_k| ≤ ρ for all k.

Update each weight:
  w_k ← [1 − ε (a_k · x^0 − b_k)/ρ] × w_k

Final solution = average of the x vectors returned by the oracle.

Performance guarantees
In O(ρ² log(n)/ε²) iterations, the average x is feasible (each constraint is
satisfied up to an additive ε).
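A minimal Python sketch of this feasibility loop, assuming NumPy data A and b, a problem-specific oracle(w) that returns some x ∈ P with Σ_k w_k (a_k·x − b_k) ≥ 0, and a known width bound rho (all of these are assumptions for illustration):

import numpy as np

def mw_lp_feasibility(A, b, oracle, rho, eps=0.1, iters=1000):
    """Multiplicative-weights loop for LP feasibility: keep one weight per
    constraint, call the oracle on the weighted combination, then decrease
    the weight of satisfied constraints and increase that of violated ones."""
    m = A.shape[0]
    w = np.ones(m)
    xs = []
    for _ in range(iters):
        x = oracle(w)                     # x in P with sum_k w_k (a_k.x - b_k) >= 0
        xs.append(x)
        slack = A @ x - b                 # per-constraint slack at this point
        w = w * (1 - eps * slack / rho)   # update rule from the slide
    return np.mean(xs, axis=0)            # final solution = average x vector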
Packing-covering LPs [Plotkin, Shmoys, Tardos '91]

∃? x ∈ P :  a_j · x ≥ 1  for j = 1, 2, ..., m      (covering problem)

Want to find x ∈ P s.t. a_j · x ≥ 1 − ε.
Assume: ∀ x ∈ P: 0 ≤ a_j · x ≤ ρ.
The MW algorithm gets such an x in O(ρ log(n)/ε²) iterations.
Connection to Chernoff bounds and derandomization

Deterministic approximation algorithms for 0/1 packing/covering problems
à la Raghavan–Thompson:
  Solve the LP relaxation of the integer program; obtain a solution (y_i).
  Randomized rounding (repeated O(log n) times):
    x_i ← 1 with probability y_i;  x_i ← 0 with probability 1 − y_i.
  Derandomize using pessimistic estimators of the form exp(Σ_i t · f(y_i)).

Young ['95]: "Randomized rounding without solving the LP."
The MW update rule mimics the pessimistic estimator.
Application 2:
Semidefinite programming (Klein–Lu '97)

  a_1 · x ≥ b_1
  a_2 · x ≥ b_2
  ...
  a_m · x ≥ b_m
  x ∈ P

a_j and x : symmetric matrices in R^{n×n}
P = {x : x psd; tr(x) = 1}

Oracle: max Σ_j w_j (a_j · x) over P   (an eigenvalue computation!)
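A minimal small-scale Python sketch of this oracle with dense NumPy matrices (the fast versions discussed later use Lanczos and sparsification instead of a full eigendecomposition):

import numpy as np

def sdp_oracle(A_list, w):
    """Maximize sum_j w_j <A_j, x> over {x psd, tr(x) = 1}.
    The maximizer is x = v v^T, with v the top eigenvector of C = sum_j w_j A_j."""
    C = sum(wj * Aj for wj, Aj in zip(w, A_list))
    eigvals, eigvecs = np.linalg.eigh(C)      # symmetric eigendecomposition
    v = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    return np.outer(v, v)                     # rank-one maximizer in P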
The promise of SDPs...

0.878-approximation to MAX-CUT (GW '94),
7/8-approximation to MAX-3SAT (KZ '00),
O(√log n)-approximation to SPARSEST CUT and BALANCED SEPARATOR (ARV '04),
O(√log n)-approximation to MIN-2CNF DELETION and MIN-UNCUT (ACMM '05),
MIN-VERTEX SEPARATOR (FHL '06), ...
etc. etc....
The pitfall…
High running times: as high as n^4.5 in recent works
Solving SDP relaxations more efficiently using MW [AHK '05]

Problem                  Using Interior Point    Our result
MAXQP (e.g. MAX-CUT)     Õ(n^3.5)                Õ(n^1.5 N/ε^2.5) or Õ(n^3/ε*^3.5)
HAPLOFREQ                Õ(n^4)                  Õ(n^2.5/ε^2.5)
SCP                      Õ(n^4)                  Õ(n^1.5 N/ε^4.5)
EMBEDDING                Õ(n^4)                  Õ(n^3/d^5 ε^3.5)
SPARSEST CUT             Õ(n^4.5)                Õ(n^3/ε^2)
MIN UNCUT etc.           Õ(n^4.5)                Õ(n^3.5/ε^2)
Key difference between efficient and
not-so-efficient implementations
of the MW idea: Width management.
(e.g., the difference between PST’91 and
GK’99)
Recall: issue of width

MW with constraints a_1·x ≥ b_1, a_2·x ≥ b_2, ..., a_m·x ≥ b_m.
Oracle: find x ∈ P with Σ_k w_k (a_k·x − b_k) ≥ 0.

Õ(ρ²/ε²) iterations to obtain a feasible x, where
  ρ = max_k |a_k·x − b_k|
is the width.  Problem: ρ is too large!!
Issue 1: Dealing with width

Only a few constraints have high width.  Run a hybrid of MW and
ellipsoid/Vaidya: poly(m, log(ρ/ε)) iterations to obtain a feasible x.
(Oracle as before: find x ∈ P with Σ_k w_k (a_k·x − b_k) ≥ 0.)
Dealing with width (contd)

Handle the high-width constraints with a dual ellipsoid/Vaidya method and the
rest with MW.  The hybrid of MW and Vaidya takes Õ(L²/ε²) iterations to obtain
a feasible x, where L ≪ ρ.
Issue 2: Efficient implementation of the Oracle: fast eigenvalue computation
via matrix sparsification

Random sampling: replace C by a sparse C' with O(n Σ_ij |C_ij| / ε) non-zero
entries such that ‖C − C'‖ ≤ ε.
The Lanczos algorithm effectively uses the sparsity of C.
Similar to Achlioptas–McSherry ['01], but better in some situations
(also an easier analysis).
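A minimal Python sketch of one sampling scheme of this flavor; the exact probabilities used in [AHK '05] may differ, so treat the keep_scale parameter and the scheme itself as illustrative assumptions:

import numpy as np

def sparsify(C, keep_scale):
    """Randomly sparsify a symmetric matrix: keep entry C[i,j] with probability
    p_ij = min(1, keep_scale * |C[i,j]|) and rescale kept entries by 1/p_ij,
    so E[C'] = C while most small entries are dropped."""
    n = C.shape[0]
    Cp = np.zeros(C.shape)
    for i in range(n):
        for j in range(i, n):
            p = min(1.0, keep_scale * abs(C[i, j]))
            if p > 0 and np.random.rand() < p:
                Cp[i, j] = C[i, j] / p
                Cp[j, i] = Cp[i, j]        # keep the matrix symmetric
    return Cp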
An O(n²)-time algorithm to compute an O(√log n)-approximation to SPARSEST CUT
[A., Hazan, Kale '04]
(combinatorial algorithm; improves upon the O(n^4.5) algorithm of
A., Rao, Vazirani)
Sparsest Cut

[Figure: example graph with a cut S; the sparsest cut S shown has Φ(G) = 2/5.]

O(log n) approximation [Leighton–Rao '88]
O(√log n) approximation [A., Rao, Vazirani '04]
O(√log n) approximation in O(n²) time
(actually, finds expander flows) [A., Hazan, Kale '05]
MW algorithm to find expander flows

Events: triples (s, w, z) of weights on vertices, edges, and cuts.
Experts: pairs of vertices (i, j).
Payoff (for weights d_{i,j} on the experts): computed from shortest paths
according to the edge weights w_e and from the cuts separating i and j.

Fact: If the events are chosen optimally, the distribution d_{i,j} on the
experts converges to a demand graph which is an "expander flow".
[By the results of Arora–Rao–Vazirani '04, this suffices to produce an
approximately sparsest cut.]
New: Online games with matrix payoffs
(joint w/ Satyen Kale '06)

Payoff is a matrix, and so is the "distribution" on experts!
Uses matrix analogues of the usual inequalities:
  1 + x ≤ e^x   becomes   I + A ⪯ e^A
Can be used to give a new "primal-dual" view of SDP relaxations
(⇒ easier, faster approximation algorithms)
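A minimal Python sketch of a matrix-MW style update, where the "distribution" on experts is a density matrix formed from a matrix exponential; this is an illustrative analogue, not the exact algorithm from the paper:

import numpy as np
from scipy.linalg import expm

def matrix_mw_density(payoff_matrices, eps=0.1):
    """Maintain a density matrix (psd, trace 1) proportional to
    exp(eps * sum of past payoff matrices): the matrix analogue of MW weights."""
    n = payoff_matrices[0].shape[0]
    S = np.zeros((n, n))                   # running sum of observed payoff matrices
    densities = []
    for M in payoff_matrices:
        W = expm(eps * S)                  # matrix-exponential "weights"
        densities.append(W / np.trace(W))  # normalize to a density matrix
        S = S + M
    return densities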
New: Faster algorithms for online learning and portfolio management
(Agarwal–Hazan '06, Agarwal–Hazan–Kalai–Kale '06)

Framework for online optimization inspired by Newton's method (2nd-order
optimization).  (Note: MW ≈ gradient descent.)
Fast algorithms for portfolio management and other online optimization
problems.
Open problems
Better approaches to width management?
Faster run times?
Lower bounds?
THANK YOU
Connection to Chernoff bounds and derandomization

Deterministic approximation algorithms à la Raghavan–Thompson.
Packing/covering IP with variables x_i = 0/1:
  ∃? x ∈ P: ∀ j ∈ [m], f_j(x) ≥ 0
Solve the LP relaxation using variables y_i ∈ [0, 1].
Randomized rounding:
  x_i = 1 w.p. y_i;  x_i = 0 w.p. 1 − y_i
Chernoff: O(log m) sampling iterations suffice.

Derandomization [Young '95]
Can derandomize the rounding using exp(t Σ_j f_j(x)) as a pessimistic
estimator of the failure probability.
By minimizing the estimator in every iteration, we mimic the random
experiment, so O(log m) iterations suffice.
The structure of the estimator obviates the need to solve the LP:
randomized rounding without solving the linear program.
Punchline: the resulting algorithm is the MW algorithm!
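A minimal Python sketch of the method of conditional expectations with this exponential estimator, for packing constraints of the form Σ_i A[j][i]·x_i ≤ b_j (the constraint form, the fixed parameter t, and all names are illustrative assumptions):

import math

def derandomized_rounding(A, b, y, t=1.0):
    """Round the LP solution y to 0/1 values x, fixing one variable at a time
    so the exponential (Chernoff-style) pessimistic estimator of the failure
    probability never increases. A is a list of constraint rows, b the bounds."""
    m, n = len(A), len(y)
    x = [None] * n

    def estimator(x_partial):
        # sum over constraints of exp(-t*b_j) * prod_i E[exp(t*A_ji*X_i)],
        # where X_i is fixed if already rounded, else Bernoulli(y_i)
        total = 0.0
        for j in range(m):
            prod = math.exp(-t * b[j])
            for i in range(n):
                a = A[j][i]
                if x_partial[i] is None:
                    prod *= (1 - y[i]) + y[i] * math.exp(t * a)
                else:
                    prod *= math.exp(t * a * x_partial[i])
            total += prod
        return total

    for i in range(n):
        # fix x_i to whichever value gives the smaller pessimistic estimator
        x[i] = 1
        est1 = estimator(x)
        x[i] = 0
        est0 = estimator(x)
        x[i] = 1 if est1 <= est0 else 0
    return x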
Weighted majority [LW '94]

If the algorithm lost at time t:  Φ^{t+1} ≤ (1 − ε/2) Φ^t
At time T:  Φ^T ≤ (1 − ε/2)^{#mistakes} · Φ^0
Overall:  #mistakes ≤ log(n)/ε + (1 + ε) m_i
          (m_i = #mistakes of expert i)
Semidefinite programming

The vectors a_j and x are symmetric matrices in R^{n×n}, with x ⪰ 0.
Assume: Tr(x) ≤ 1.  Set P = {x : x ⪰ 0, Tr(x) ≤ 1}.

Oracle: max Σ_j w_j (a_j · x) over P.
Optimum: x = vv^T, where v is the eigenvector corresponding to the largest
eigenvalue of Σ_j w_j a_j.
Efficiently implementing the oracle

Optimum: x = vv^T, where v is the top eigenvector of some matrix C.
It suffices to find a vector v such that v^T C v ≥ 0.
The Lanczos algorithm with a random starting vector is ideal for this.
Advantage: it uses only matrix-vector products.
It exploits sparsity (see also the sparsification procedure above).
Use the analysis of Kuczynski and Wozniakowski ['92].
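A minimal matrix-vector-product-only Python sketch of this step (plain power iteration with a random start, as a simple stand-in for Lanczos; it assumes C has a dominant positive eigenvalue, otherwise one would shift C or use Lanczos proper):

import numpy as np

def approx_top_eigvec(C, iters=200, seed=0):
    """Power iteration with a random start: only needs matrix-vector products,
    so it benefits directly from a sparse (or sparsified) C."""
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = C @ v                        # the only operation touching C
        v /= np.linalg.norm(v)
    return v                             # caller can then check v.T @ (C @ v) >= 0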