Convex Optimization on Large-Scale Domains
Given by Linear Minimization Oracles
Arkadi Nemirovski
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
Joint research with Anatoli Juditsky†
† University J. Fourier, Grenoble
London Optimization Workshop
King’s College, London, June 9-10, 2014
Overview
• Problems of interest and motivation
• Linear Minimization Oracles and classical Conditional Gradient Algorithm
• Fenchel-type representations and LMO-based Convex Optimization
  – nonsmooth convex minimization
  – variational inequalities with monotone operators and convex-concave saddle points
Motivation
♣ Problem of Interest: Variational Inequality with monotone operator
Find x_* ∈ X : ⟨Φ(x), x − x_*⟩ ≥ 0 ∀x ∈ X    VI(Φ, X)
• X : convex compact subset of Euclidean space E
• Φ : X → E is monotone: ⟨Φ(x) − Φ(x′), x − x′⟩ ≥ 0 ∀x, x′ ∈ X
Examples:
♠ Convex Minimization: Φ(x) ∈ ∂f (x), x ∈ X , for a convex Lipschitz
continuous function f : X → R
⇒ solutions to VI(Φ, X ) are exactly the minimizers of f on X
♠ Convex-Concave Saddle Points:
X = U × V, Φ(u, v) = [Φ_u(u, v) ∈ ∂_u f(u, v); Φ_v(u, v) ∈ ∂_v[−f(u, v)]]
for a convex-concave Lipschitz continuous f (u, v ) : X → R
⇒ solutions to VI(Φ, X ) are exactly the saddle points of f on U × V .
♣ When problem sizes make Interior Point algorithms prohibitively time
consuming, First Order Methods (FOMs) become the methods of choice.
Reasons: Under favorable circumstances, FOM’s (a) have cheap steps
and (b) exhibit nearly dimension independent sublinear convergence rate.
Note: Perhaps one could survive without (b), but (a) is a must!
Proximal FOMs
Find x_* ∈ X : ⟨Φ(x), x − x_*⟩ ≥ 0 ∀x ∈ X    VI(Φ, X)
• X : convex compact subset of Euclidean space E
• Φ : X → E is monotone: ⟨Φ(x) − Φ(x′), x − x′⟩ ≥ 0 ∀x, x′ ∈ X
♣ Fact: Most FOMs for large-scale convex optimization (Subgradient
Descent, Mirror Descent, Nesterov’s Fast Gradient Methods,...) are
proximal algorithms.
♠ To allow for proximal methods with cheap iterations, X should admit
cheap proximal setup — a C^1 strongly convex distance generating
function (d.g.f.) ω(·) : X → R leading to easy-to-compute Bregman
projections e ↦ argmin_{x∈X} [ω(x) + ⟨e, x⟩]
♣ Note: If X admits a cheap proximal setup, then X admits a cheap Linear
Minimization Oracle (LMO) capable of minimizing linear forms over X.
Proximal FOMs: bottlenecks
Find x_* ∈ X : ⟨Φ(x), x − x_*⟩ ≥ 0 ∀x ∈ X    VI(Φ, X)
• X : convex compact subset of Euclidean space E
• Φ : X → E is monotone: ⟨Φ(x) − Φ(x′), x − x′⟩ ≥ 0 ∀x, x′ ∈ X
♠ In several important cases, X does not admit cheap proximal setup, but
does allow for cheap LMO:
Example 1: X ⊂ R^{m×n} is the nuclear norm ball, or the spectahedron – the set of
symmetric psd m × m matrices with unit trace. Here Bregman projection
requires full singular value decomposition of an m × n matrix, resp., full
eigenvalue decomposition of a symmetric m × m matrix.
LMO is much cheaper: it reduces to computing (e.g., by Power method)
the leading pair of singular vectors (resp., the leading eigenvector) of a
matrix.
Example 2: X is Total Variation ball in the space of m × n zero mean
images. Here already the simplest Bregman projection reduces to highly
computationally demanding metric projection onto the TV ball.
LMO is much cheaper: it reduces to solving a max flow problem on a
simple mn-node network with ≈ 2mn arcs.
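♠ Illustration (sketch): a minimal LMO for the nuclear-norm ball, built on the leading singular pair only; the function name and the use of scipy.sparse.linalg.svds are choices of this sketch, not prescribed above.

```python
import numpy as np
from scipy.sparse.linalg import svds

def lmo_nuclear_ball(g, radius=1.0):
    """LMO for X = {x : ||x||_nuc <= radius}: a minimizer of <g, x> over X.
    The minimum equals -radius*sigma_1(g), attained at -radius * u1 v1^T,
    so only the leading singular pair of g is needed (no full SVD)."""
    u, s, vt = svds(g, k=1)                       # leading singular triplet only
    return -radius * np.outer(u[:, 0], vt[0, :])
```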
Illustration: LMO vs. Bregman projection
Computing leading pair of singular vectors of an 8192 × 8192 matrix
takes 64.4 sec — by factor 7.5 cheaper than computing the full
singular value decomposition.
Computing leading eigenvector of an 8192 × 8192 symmetric matrix
takes 10.9 sec — by factor 13.0 cheaper than computing the full
eigenvalue decomposition.
Minimizing a linear form over the TV ball in the space of 1024 × 1024
images takes 55.6 sec — by factor 20.6 cheaper than computing
metric projection onto the ball.
Platform: 4 × 3.40 GHz CPU, 16.0 GB RAM, 64-bit Windows 7
♣ Our goal: Solving large-scale problems with convex structure (convex
minimization, convex-concave saddle points, variational inequalities with
monotone operators) on LMO-represented domains.
Beyond Proximal FOMs: Conditional Gradient
Conditional Gradient Algorithm
♣ Seemingly the only standard technique for handling LMO-represented
domains is the Conditional Gradient Algorithm [Frank&Wolfe ’58] solving
smooth convex minimization problems
Opt = min_{x∈X} f(x)    (P)
♠ CGA is the recurrence
X ∋ x_t ↦ ∇f(x_t), x_t^+ ∈ Argmin_{x∈X} ⟨∇f(x_t), x⟩
↦ x_{t+1} : f(x_{t+1}) ≤ f(x_t + (2/(t+1))[x_t^+ − x_t]) & x_{t+1} ∈ X, t = 1, 2, ...
♠ Theorem [well known]: Let f(·) be convex and (κ, L) smooth, κ ∈ (1, 2]:
∀x, y ∈ X : f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/κ)‖x − y‖_X^κ
[‖·‖_X : norm on Lin(X) with the unit ball (1/2)[X − X]]
Then f(x_t) − Opt ≤ [2^{2κ} L] / [κ(3 − κ)(t + 1)^{κ−1}], t = 2, 3, ...
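♠ Illustration (sketch) of the recurrence above, assuming a gradient oracle grad_f and an LMO lmo(g) returning a minimizer of ⟨g, x⟩ over X; the simplest admissible choice x_{t+1} = x_t + (2/(t+1))[x_t^+ − x_t] is used.

```python
def conditional_gradient(grad_f, lmo, x0, n_iters=100):
    """Classical Conditional Gradient (Frank-Wolfe):
    x_t^+ in Argmin_{x in X} <grad f(x_t), x>, then a step of length 2/(t+1)
    towards x_t^+.  x0 must lie in X; all iterates then stay in X."""
    x = x0
    for t in range(1, n_iters + 1):
        x_plus = lmo(grad_f(x))                 # one LMO call per iteration
        x = x + (2.0 / (t + 1)) * (x_plus - x)  # simplest admissible x_{t+1}
    return x
```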
♠ CGA was extended recently [Harchaoui, Juditsky, Nem. ’13] to norm-regularized
problems like
min_{x∈K} [f(x) + ‖x‖]
• K: cone with LMO-represented K ∩ {x : ‖x‖ ≤ 1}; • f : convex and smooth.
Fenchel-Type Representations of Functions
♠ Question: How to carry out nonsmooth convex minimization and solve
other smooth/nonsmooth problems with convex structure on LMO-represented domains?
♠ Proposed answer: Use Fenchel-type representations.
♣ Fenchel representation (F.r.) of a function f : R^n → R ∪ {+∞} is
f(x) = sup_y [⟨x, y⟩ − f^*(y)]
• f^* : proper convex lower semicontinuous.
♠ Fenchel-type representation (F-t.r.) of f is
f(x) = sup_y [⟨x, Ay + a⟩ − φ(y)]
• φ : proper convex lower semicontinuous.
♠ Good F-t.r.: Y := Dom φ is compact & φ ∈ Lip(Y).
♣ The F.r. of a proper convex lower semicontinuous f “exists in nature” and is
unique, but usually is not available numerically. In contrast, F-t.r.’s admit
fully algorithmic calculus: all basic convexity-preserving operations as
applied to operands given by F-t.r.’s yield explicit F-t.r. of the result.
⇒ Typical “well-structured” convex functions admit explicit good F-t.r.’s
(even with affine φ’s).
♠ Example: The F.r. of f_1 + f_2 is given by the computationally demanding
inf-convolution:
(f_1 + f_2)^*(y) = inf_{y_1+y_2=y} [f_1^*(y_1) + f_2^*(y_2)]
• In contrast, an F-t.r. of f_1 + f_2 is readily given by F-t.r.’s of f_1, f_2:
f_i(x) = sup_{y_i∈Y_i} [⟨x, A_i y_i + a_i⟩ − φ_i(y_i)], i = 1, 2
⇓
f_1(x) + f_2(x) = sup_{y=[y_1;y_2] ∈ Y := Y_1×Y_2} [ ⟨x, A_1 y_1 + a_1 + A_2 y_2 + a_2⟩ − [φ_1(y_1) + φ_2(y_2)] ]
with Ay + a := A_1 y_1 + a_1 + A_2 y_2 + a_2 and φ(y) := φ_1(y_1) + φ_2(y_2).
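♠ Illustration (sketch) of this calculus rule: storing an F-t.r. as a pair (affine, φ) with affine(y) = Ay + a, the representation of f_1 + f_2 is assembled mechanically (names are illustrative).

```python
def ftr_sum(ftr1, ftr2):
    """Each F-t.r. is a pair (affine, phi) with affine(y) = A y + a, so that
    f(x) = sup_{y in Y} [<x, affine(y)> - phi(y)].
    The F-t.r. of f1 + f2 lives on Y = Y1 x Y2 with y = (y1, y2)."""
    (aff1, phi1), (aff2, phi2) = ftr1, ftr2
    affine = lambda y: aff1(y[0]) + aff2(y[1])   # A1 y1 + a1 + A2 y2 + a2
    phi    = lambda y: phi1(y[0]) + phi2(y[1])   # phi1(y1) + phi2(y2)
    return affine, phi
```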
Nonsmooth Convex Minimization via Fenchel-Type Representation
♣ When solving convex minimization problem
Opt(P) = min_{x∈X} f(x),    (P)
good F-t.r. of the objective
f(x) = max_{y∈Y=Dom φ} [⟨x, Ay + a⟩ − φ(y)]
gives rise to the dual problem
[−Opt(P) =] Opt(D) = min_{y∈Y} { f_*(y) := φ(y) − min_{x∈X} ⟨x, Ay + a⟩ }    (D)
♠ Observation: LMO for X combines with First Order oracle for φ to
induce First Order oracle for f_*
⇒ When First Order oracle for φ and LMO for X are available, (D) is well
suited for solving by FOMs (e.g., proximal methods, provided Y admits
cheap proximal setup).
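♠ Illustration (sketch) of the Observation, with A a plain matrix and x, y vectors (so A^* = A^T); names are illustrative. The LMO answer x(y) delivers both the value and a subgradient of F(y) = φ(y) − min_{x∈X} ⟨x, Ay + a⟩.

```python
import numpy as np

def dual_fo_oracle(phi, phi_grad, A, a, lmo_X, y):
    """First Order oracle for F(y) = phi(y) - min_{x in X} <x, Ay + a>,
    assembled from an FO oracle for phi and the LMO for X."""
    g = A @ y + a                     # the linear form <., Ay + a> on X
    x_y = lmo_X(g)                    # x(y) in Argmin_{x in X} <x, Ay + a>
    F_val = phi(y) - float(x_y @ g)   # F(y)  = phi(y) - <x(y), Ay + a>
    F_sub = phi_grad(y) - A.T @ x_y   # F'(y) = phi'(y) - A^T x(y)
    return F_val, F_sub, x_y          # keep x(y): used later to recover a primal solution
```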
⇒ Strategy: Solve (D) and then recover a solution to (P).
• Question: How to recover a good solution to the problem of interest (P)
from information acquired when solving (D)?
• Proposed answer: Use accuracy certificates.
Accuracy Certificates
♣ Assume we are applying an N-step FOM to a convex problem
Opt = min_{y∈Y} F(y),    (P)
and have generated search points y_t ∈ Y augmented with first order
information (F(y_t), F′(y_t)), 1 ≤ t ≤ N.
♠ An accuracy certificate for execution protocol I^N = {y_t, F(y_t), F′(y_t)}_{t=1}^N is
a collection λ^N = {λ_t^N}_{t=1}^N of N nonnegative weights summing up to 1.
♠ Accuracy certificate λ^N and execution protocol I^N give rise to
• Resolution  Res(I^N, λ^N) = max_{y∈Y} Σ_{t=1}^N λ_t^N ⟨F′(y_t), y_t − y⟩
• Gap  Gap(I^N, λ^N) = min_{t≤N} F(y_t) − Σ_{t=1}^N λ_t^N F(y_t) + Res(I^N, λ^N)
  ≤ Res(I^N, λ^N)
♠ Simple Theorem I [Nem., Onn, Rothblum ’10]: Let y^N be the best (with
the smallest value of F) of the search points y_1, ..., y_N, and let
ŷ^N = Σ_{t=1}^N λ_t^N y_t. Then y^N, ŷ^N are feasible solutions to (P) satisfying
F(ŷ^N) − Opt ≤ Res(I^N, λ^N),
F(y^N) − Opt ≤ Gap(I^N, λ^N) ≤ Res(I^N, λ^N)
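♠ Illustration (sketch) of how Res and Gap are evaluated from a protocol and a certificate, assuming a support-function oracle support_Y(g) = max_{y∈Y} ⟨g, y⟩ (an assumption of this sketch).

```python
import numpy as np

def resolution_and_gap(support_Y, ys, F_vals, F_subs, lam):
    """Res(I^N, lam^N) = max_{y in Y} sum_t lam_t <F'(y_t), y_t - y>,
    Gap(I^N, lam^N) = min_t F(y_t) - sum_t lam_t F(y_t) + Res(I^N, lam^N)."""
    g = sum(l * s for l, s in zip(lam, F_subs))                        # sum_t lam_t F'(y_t)
    lin = sum(l * float(np.vdot(s, y)) for l, s, y in zip(lam, F_subs, ys))
    res = lin + support_Y(-g)                                          # maximize over y in Y
    gap = min(F_vals) - float(np.dot(lam, F_vals)) + res
    return res, gap
```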
Accuracy Certificates (continued)
Opt(P) = min_{x∈X} [ f(x) := max_{y∈Y} [⟨x, Ay + a⟩ − φ(y)] ]    (P)
♣ Let I^N = {y_t ∈ Y, F(y_t), F′(y_t)}_{t=1}^N be an N-step execution protocol built by
an FOM as applied to
Opt(D) = min_{y∈Y} { F(y) := φ(y) − min_{x∈X} ⟨x, Ay + a⟩ }    (D)
and let x_t ∈ Argmin_{x∈X} ⟨x, Ay_t + a⟩ be the LMO’s answers obtained when
mimicking the First Order oracle for F:
F(y_t) = φ(y_t) − ⟨x_t, Ay_t + a⟩ & F′(y_t) = φ′(y_t) − A^T x_t
♠ Simple Theorem II [Cox, Juditsky, Nem. ’13]: Let λ^N be an accuracy
certificate for I^N and x̂^N = Σ_{t=1}^N λ_t^N x_t. Then x̂^N is feasible for (P) and
f(x̂^N) − Opt(P) ≤ Res(I^N, λ^N)
(in fact, the right hand side can be replaced with Gap(I^N, λ^N)).
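♠ Illustration (sketch) of the resulting scheme under simplifying assumptions: Euclidean proximal setup on Y (so the Bregman projection is a metric projection proj_Y), constant step size, uniform certificate λ_t = 1/N, and A a matrix acting on vectors; all names are illustrative.

```python
import numpy as np

def solve_primal_via_dual(phi, phi_grad, A, a, lmo_X, proj_Y, y0,
                          n_iters=500, step=0.01):
    """Projected subgradient descent on (D) over Y; the LMO answers x_t are
    collected and x^N = sum_t lam_t x_t is returned for lam_t = 1/N.
    By Simple Theorem II, f(x^N) - Opt(P) <= Res(I^N, lam^N)."""
    y = y0
    xs = []
    for t in range(n_iters):
        g = A @ y + a
        x_t = lmo_X(g)                       # one LMO call per iteration
        xs.append(x_t)
        F_sub = phi_grad(y) - A.T @ x_t      # subgradient of F at the current y
        y = proj_Y(y - step * F_sub)         # (Euclidean) proximal step on Y
    return sum(xs) / len(xs)                 # x^N for the uniform certificate
```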
LMO-Based Nonsmooth Convex Minimization (continued)
Opt(P) = min_{x∈X} { f(x) = max_{y∈Y} [⟨x, Ay + a⟩ − φ(y)] }    (P)
[−Opt(P) =] Opt(D) = min_{y∈Y} { F(y) = φ(y) − min_{x∈X} ⟨x, Ay + a⟩ }    (D)
♣ Conclusion: Mimicking the First Order oracle for (D) via the LMO for X and
solving (D) by an FOM producing accuracy certificates, after N = 1, 2, ...
iterations we have at our disposal feasible solutions x̂^N to the problem of
interest (P) such that
f(x̂^N) − Opt(P) ≤ Gap(I^N, λ^N).
♠ Fact: A wide spectrum of FOMs allow for augmenting execution
protocols by good accuracy certificates, meaning that Res(I N , λN ) (and
thus Gap(I N , λN )) obeys the standard efficiency estimates of the
algorithms in question.
• For some FOMs (Subgradient/Mirror Descent, Nesterov’s Fast Gradient
Method for smooth convex minimization, and full memory Mirror Descent Bundle
Level algorithms), good certificates are readily available.
• Several FOMs (polynomial time Cutting Plane algorithms, like Ellipsoid and
Inscribed Ellipsoid methods, and truncated memory Mirror Descent Bundle Level
algorithms) can be modified in order to produce good certificates. The required
modifications are “costless” — the complexity of an iteration remains basically
intact.
LMO-Based Nonsmooth Convex Minimization (continued)
Opt(P) = min_{x∈X} { f(x) = max_{y∈Y} [⟨x, Ay + a⟩ − φ(y)] }    (P)
[−Opt(P) =] Opt(D) = min_{y∈Y} { F(y) = φ(y) − min_{x∈X} ⟨x, Ay + a⟩ }    (D)
♣ Let Y be equipped with cheap proximal setup
⇒ (P) can be solved by applying to (D) a proximal algorithm with good
accuracy certificates (e.g., various versions of Mirror Descent) and
recovering from the certificates approximate solutions to (P).
• With this approach, an iteration requires a single call to the LMO for X
and a single computation of the Bregman projection
ξ ↦ argmin_{y∈Y} [⟨ξ, y⟩ + ω(y)].
♠ An alternative is
• to use F-t.r. of f and proximal setup for Y to approximate f by
f̃_δ(x) = max_{y∈Y} { ⟨x, Ay + a⟩ − φ(y) − δω(y) }
• to minimize the C^{1,1} function f̃_δ(·) over X by Conditional Gradient.
Note: The alternative is just Nesterov’s smoothing with smooth minimization by the
LMO-based Conditional Gradient rather than by proximal Fast Gradients.
♠ Fact: When φ is affine (quite typical!), both approaches result in
methods with the same iteration complexity and the same O(1/√t)
efficiency estimate.
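♠ Illustration (sketch) of the alternative, assuming an oracle smooth_max_Y(ξ, δ) returning argmax_{y∈Y} [⟨ξ, y⟩ − φ(y) − δω(y)] (cheap when Y admits a cheap proximal setup and φ is simple, e.g., affine); names are illustrative.

```python
def smoothed_cg(smooth_max_Y, A, a, lmo_X, x0, delta=1e-2, n_iters=500):
    """Conditional Gradient on the smoothed objective
    f~_delta(x) = max_{y in Y} [<x, Ay + a> - phi(y) - delta*omega(y)];
    by Danskin's theorem, grad f~_delta(x) = A y_delta(x) + a."""
    x = x0
    for t in range(1, n_iters + 1):
        y = smooth_max_Y(A.T @ x, delta)     # y_delta(x): maximizer defining f~_delta(x)
        grad = A @ y + a                     # gradient of the smoothed objective at x
        x_plus = lmo_X(grad)                 # LMO step of Conditional Gradient
        x = x + (2.0 / (t + 1)) * (x_plus - x)
    return x
```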
LMO-Based Nonsmooth Convex Minimization: How It Works
♣ Test problems: Matrix Completion with uniform fit
Opt = min_{x∈R^{p×p}: ‖x‖_nuc ≤ 1} { f(x) := max_{(i,j)∈Ω} |x_ij − a_ij| }
    = min_{x∈R^{p×p}: ‖x‖_nuc ≤ 1} max_{y∈Y} Σ_{(i,j)∈Ω} y_ij (x_ij − a_ij),
Y = { y = {y_ij : (i, j) ∈ Ω} : Σ_{(i,j)∈Ω} |y_ij| ≤ 1 }
• Ω: N-element collection of cells in a p × p matrix.
• Ω: N-element collection of cells in a p × p matrix.
♠ Results, I: Restricted Memory Bundle-Level algorithm on low size
(p = 512, N = 512) Matrix Completion:

Memory depth        1      33      65      129
Gap_1/Gap_1024    114     164     350     3253
♠ Results, II: Subgradient Descent on Matrix Completion:

   p       N      Gap_1      Gap_1/Gap_32   Gap_1/Gap_128   Gap_1/Gap_1024   CPU, sec
 2048    8192    1.81e-1        171.2           213.8            451.4          521.3
 4096   16384    3.74e-1        335.4          1060.8           1287.3         1524.8
 8192   16384    2.54e-1         37.8           875.8           1183.6         3644.0
Platform: desktop PC with 4 × 3.40 GHz Intel Core2 CPU and 16 GB RAM, Windows 7-64 OS.
From Nonsmooth LMO-Based Convex Minimization to Variational Inequalities and
Saddle Points
Motivating Example
♣ Consider the following Matrix Completion problem:
Opt = min_{u: ‖u‖_nuc ≤ 1} [ f(u) := ‖Au − b‖_{2,2} ]
• u ↦ Au : R^{n×n} → R^{m×m}, e.g., Au = Σ_{i=1}^k ℓ_i u r_i^T
• ‖·‖_{2,2} : spectral norm (largest singular value) of a matrix
♠ Fenchel-type representation of f is immediate:
f(u) = max_{‖v‖_nuc ≤ 1} ⟨v, Au − b⟩
⇒ the problem of interest reduces to the bilinear saddle point problem
min_{u∈U} max_{v∈V} ⟨v, Au − b⟩
• U = {u ∈ R^{n×n} : ‖u‖_nuc ≤ 1}, V = {v ∈ R^{m×m} : ‖v‖_nuc ≤ 1}
where both U and V admit computationally cheap LMO’s, but do not admit
computationally cheap proximal setups
⇒ Our previous approach (same as any other known approach) is
inapplicable – we needed Y ≡ V to be “proximal-friendly”...
⇒ (?) How to solve convex-concave saddle point problems on products of
LMO-represented domains?
Fenchel-Type Representation of Monotone Operator: Definition
Definitions
♣ Fenchel-type representation: Let X ⊂ E be a convex compact set in
Euclidean space, and Φ : X → E be a vector field on X . A Fenchel-type
representation of Φ on X is
Φ(x) = Ay(x) + a    (∗)
• y ↦ Ay + a : F → E : affine mapping from Euclidean space F into E
• y(x): a strong solution to VI(G(·) − A^*x, Y)
• Y ⊂ F : convex & compact, G(·) : Y → F : monotone
♠ F, Y, A, a, y(·), G(·) is the data of the representation.
Definition
♣ Dual operator induced by F-t.r. (∗) is
Θ(y) = G(y) − A^*x(y) : Y → F,   x(y) ∈ Argmin_{x∈X} ⟨Ay + a, x⟩
♠ The v.i. VI(Θ, Y) is called the dual, induced by (∗), of the primal v.i.
VI(Φ, X).
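♠ Illustration (sketch) of evaluating the dual operator, with A given as a matrix acting on vectors (so A^* = A^T) and illustrative names; one LMO call per evaluation.

```python
def dual_operator(G, A, a, lmo_X, y):
    """Theta(y) = G(y) - A^* x(y),  x(y) in Argmin_{x in X} <Ay + a, x>."""
    x_y = lmo_X(A @ y + a)            # single call to the LMO representing X
    return G(y) - A.T @ x_y, x_y      # keep x(y): used later to recover a primal solution
```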
Fenchel-Type Representation of Monotone Operator (continued)
Facts:
♠ If an operator Φ : X → E admits a Fenchel-type representation on a convex
compact set X ⊂ E, then Φ is monotone on X
♠ The dual operator Θ induced by a Fenchel-type representation of a
monotone operator is monotone. Θ is bounded, provided G(·) is so.
Calculus of Fenchel-type Representations:
♠ F-t.r.’s of monotone operators admit fully algorithmic calculus: F-t.r.’s of
operands of basic monotonicity-preserving operations:
• summation with nonnegative coefficients,
• direct summation,
• affine substitution of variables
can be straightforwardly converted to an F-t.r. of the result.
♠ An affine monotone operator admits explicit F-t.r. on every compact
domain.
♠ A good F-t.r. f(x) = max_{y∈Y} [⟨x, Ay + a⟩ − φ(y)] of a convex function
f : X → R induces an F-t.r. of a subgradient field of f, provided φ ∈ C^1(Y).
A Digression: Variational Inequalities with Monotone Operators: Accuracy Measures
Find x_* ∈ X : ⟨Φ(x), x − x_*⟩ ≥ 0 ∀x ∈ X    VI(Φ, X)
♠ A natural measure of (in)accuracy of a candidate solution x̄ ∈ X to
VI(Φ, X) is the dual gap function ε_vi(x̄ | Φ, X) = sup_{x∈X} ⟨Φ(x), x̄ − x⟩
♠ When VI(Φ, X ) comes from convex-concave saddle point problem:
• X = U × V for convex compact sets U, V , and
• Φ(u, v) = [Φ_u(u, v) ∈ ∂_u f(u, v); Φ_v(u, v) ∈ ∂_v[−f(u, v)]] for Lipschitz
continuous convex-concave function
f (u, v ) : X = U × V → R,
another natural accuracy measure is the saddle point inaccuracy
ε_sad(x̄ = [ū; v̄] | f, U, V) := max_{v∈V} f(ū, v) − min_{u∈U} f(u, v̄)
Explanation: Convex-concave saddle point problem gives rise to two dual
to each other convex programs
Opt(P) = min_{u∈U} [ f̄(u) := max_{v∈V} f(u, v) ]    (P)
Opt(D) = max_{v∈V} [ f̲(v) := min_{u∈U} f(u, v) ]    (D)
with equal optimal values: Opt(P) = Opt(D).
⇒ ε_sad(ū, v̄ | f, U, V) is the sum of non-optimalities, in terms of respective
objectives, of ū ∈ U as a solution to (P) and v̄ ∈ V as a solution to (D).
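♠ Illustration (sketch): for a bilinear f(u, v) = ⟨a, u⟩ + ⟨b, v⟩ + ⟨v, Au⟩ (the case arising below), ε_sad itself is computable with two LMO calls; vectors/matrix and names are illustrative.

```python
import numpy as np

def eps_sad_bilinear(a, b, A, lmo_U, lmo_V, u_bar, v_bar):
    """Saddle point inaccuracy of (u_bar, v_bar) for f(u,v) = <a,u> + <b,v> + <v, Au>.
    lmo_U(g), lmo_V(g) return minimizers of the linear form <g, .> over U, resp. V."""
    # max_v f(u_bar, v): maximize <b + A u_bar, v> over V
    v_star = lmo_V(-(b + A @ u_bar))
    upper = float(a @ u_bar) + float((b + A @ u_bar) @ v_star)
    # min_u f(u, v_bar): minimize <a + A^T v_bar, u> over U
    u_star = lmo_U(a + A.T @ v_bar)
    lower = float(b @ v_bar) + float((a + A.T @ v_bar) @ u_star)
    return upper - lower
```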
Why Accuracy Certificates Certify Accuracy?
♠ Fact: Let v.i. VI(Ψ, Z) with monotone operator Ψ and convex compact
Z be solved by an N-step FOM, let I^N = {z_i ∈ Z, Ψ(z_i)}_{i=1}^N be the execution
protocol, and λ^N = {λ_i ≥ 0}_{i=1}^N, Σ_i λ_i = 1, be an accuracy certificate.
Then z^N = Σ_{i=1}^N λ_i z_i is a feasible solution to VI(Ψ, Z), and
ε_vi(z^N | Ψ, Z) ≤ Res(I^N, λ^N) := max_{z∈Z} Σ_{i=1}^N λ_i ⟨Ψ(z_i), z_i − z⟩
When Ψ is associated with convex-concave saddle point problem
min_{u∈U} max_{v∈V} f(u, v), we also have ε_sad(z^N | f, U, V) ≤ Res(I^N, λ^N).
♠ Fact: Let Ψ be a bounded vector field on a convex compact domain Z.
For every N = 1, 2, ..., a properly designed N-step proximal FOM (Mirror
Descent) as applied to VI(Ψ, Z) generates an execution protocol I^N and
accuracy certificate λ^N such that
Res(I^N, λ^N) ≤ O(1/√N)
If Ψ is Lipschitz continuous on Z, then for properly selected N-step FOM
(Mirror Prox) the efficiency estimate improves to
Res(I^N, λ^N) ≤ O(1/N).
In both cases, factors hidden in O(·) are explicitly given by parameters of the
proximal setup and the magnitude of Ψ (first case), or the Lipschitz
constant of Ψ (second case).
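♠ Illustration (sketch) of the simplest certificate-producing method: Euclidean Mirror Descent with a constant step, for which the natural certificate is uniform; step size and names are illustrative only.

```python
import numpy as np

def mirror_descent_vi(Psi, proj_Z, z0, n_iters=1000, step=0.01):
    """Euclidean Mirror Descent for VI(Psi, Z): z_{t+1} = Proj_Z(z_t - step*Psi(z_t)).
    Returns the certificate-weighted point z^N together with the execution
    protocol {z_i, Psi(z_i)} and the uniform certificate lam_i = 1/N."""
    z = z0
    zs, Psis = [], []
    for _ in range(n_iters):
        g = Psi(z)
        zs.append(z)
        Psis.append(g)
        z = proj_Z(z - step * g)
    lam = np.full(n_iters, 1.0 / n_iters)
    z_bar = sum(l * zz for l, zz in zip(lam, zs))
    return z_bar, zs, Psis, lam
```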
Solving Monotone Variational Inequalities on LMO-Represented Domains
♠ In order to solve a primal v.i. VI(Φ, X) given an F-t.r.
Φ(x) = Ay(x) + a, where y(x) ∈ Y and ⟨G(y(x)) − A^*x, y − y(x)⟩ ≥ 0 ∀y ∈ Y,
we solve the dual v.i. VI(Θ, Y),
Θ(y) = G(y) − A^*x(y), where x(y) ∈ Argmin_{x∈X} ⟨x, Ay + a⟩
Note: Computing Θ(y) reduces to computing G(y), multiplying by A and
A^*, and a single call to the LMO representing X.
♣ Theorem [Juditsky, Nem. ’13]: Let I^N = {y_i ∈ Y, Θ(y_i)}_{i=1}^N be execution
protocol of an FOM applied to the dual v.i. VI(Θ, Y), and λ^N = {λ_i ≥ 0}_{i=1}^N,
Σ_i λ_i = 1, be associated accuracy certificate.
Then x^N = Σ_{i=1}^N λ_i x(y_i) is a feasible solution to the primal v.i. VI(Φ, X)
and
ε_vi(x^N | Φ, X) ≤ Res(I^N, λ^N) := max_{y∈Y} Σ_{i=1}^N λ_i ⟨Θ(y_i), y_i − y⟩
If Φ is associated with bilinear convex-concave saddle point problem
min_{u∈U} max_{v∈V} [ f(u, v) = ⟨a, u⟩ + ⟨b, v⟩ + ⟨v, Au⟩ ],
then also
ε_sad(x^N | f, U, V) ≤ Res(I^N, λ^N)
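♠ Illustration (sketch) of the overall scheme of the Theorem, under the same simplifying assumptions as before (Euclidean projection on Y, constant step, uniform certificate, A a matrix acting on vectors); names are illustrative.

```python
def solve_vi_via_dual(G, A, a, lmo_X, proj_Y, y0, n_iters=1000, step=0.01):
    """Run a simple FOM on the dual v.i. VI(Theta, Y) and return
    x^N = (1/N) sum_i x(y_i); by the Theorem above, x^N is a feasible
    approximate solution to the primal v.i. VI(Phi, X)."""
    y = y0
    xs = []
    for _ in range(n_iters):
        x_y = lmo_X(A @ y + a)            # x(y_i): one LMO call per step
        xs.append(x_y)
        theta = G(y) - A.T @ x_y          # Theta(y_i)
        y = proj_Y(y - step * theta)      # Mirror-Descent-type step on Y
    return sum(xs) / len(xs)              # x^N for the uniform certificate
```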
How it Works
♠ As applied to the Motivating Example
Opt = min_{u∈R^{n×n}, ‖u‖_nuc ≤ 1} [ f(u) := ‖Au − b‖_{2,2} ]
    = min_{u∈R^{n×n}, ‖u‖_nuc ≤ 1} max_{v∈R^{m×m}, ‖v‖_nuc ≤ 1} ⟨v, Au − b⟩,   Au = Σ_{i=1}^k ℓ_i u r_i^T,
our approach results in a method yielding in N = 1, 2, ... steps feasible
approximate solutions u^N to the problem of interest and lower bounds
Opt_N on Opt such that Gap_N ≡ f(u^N) − Opt_N ≤ O(1)‖A‖_{2,2}/√N
Iteration count N          1      65     129     193     257     321     385     449     512

m = 512, n = 1024, k = 2
  Gap_N                0.1269  0.0239  0.0145  0.0103  0.0075  0.0063  0.0042  0.0040  0.0040
  Gap_1/Gap_N            1.00    5.31    8.78   12.38   17.03   20.20   29.98   31.41   31.66
  cpu, sec                0.2     9.5    27.6    69.1   112.6   218.1   326.2   432.6   536.4

m = 1024, n = 2048, k = 2
  Gap_N                0.1329  0.0196  0.0119  0.0075  0.0053  0.0041  0.0036  0.0034  0.0027
  Gap_1/Gap_N            1.00    6.79   11.21   17.81   25.09   32.29   37.23   38.70   50.06
  cpu, sec                0.7    38.0   101.1   206.3   314.1   508.9   699.0   884.9  1070.0

m = 2048, n = 4096, k = 2
  Gap_N                0.1239  0.0222  0.0139  0.0108  0.0086  0.0041  0.0037  0.0035  0.0035
  Gap_1/Gap_N            1.00    5.57    8.93   11.48   14.40   30.48   33.14   35.76   35.77
  cpu, sec                2.2   103.5   257.6   496.9   742.5  1147.8  1564.4  1981.4  2401.0

m = 4096, n = 8192, k = 2
  Gap_N                0.1193  0.0232  0.0134  0.0108  0.0054  0.0040  0.0035  0.0034  0.0034
  Gap_1/Gap_N            1.00    5.14    8.90   11.08   22.00   29.83   33.93   34.85   35.14
  cpu, sec                6.5   289.9   683.8  1238.1  1816.0  2724.5  3648.3  4572.2  5490.8

m = 8192, n = 16384, k = 2
  Gap_N               0.11959 0.02136 0.01460 0.01011 0.00853
  Gap_1/Gap_N            1.00    5.60    8.19   11.82   14.01
  cpu, sec               21.7   920.4  2050.2  3492.4  4902.2
Platform: 4 x 3.40 GHz desktop with 16 GB RAM, 64 bit Windows 7 OS.
Note: The design dimension of the largest instance is 2^28 = 268 435 456.
References
Bruce Cox, Anatoli Juditsky, Arkadi Nemirovski, Dual subgradient
algorithms for large-scale nonsmooth learning problems – to appear
in Mathematical Programming Series B, arXiv:1302.2349, Aug. 2013
Anatoli Juditsky, Arkadi Nemirovski, Solving variational inequalities
with monotone operators on domains given by Linear Minimization
Oracles – submitted to Mathematical Programming, arXiv:1312.1073,
Dec. 2013