Open Seminar
on
Non-Convex Optimization Over Networks
Sandeep Kumar
Supervisor: Dr. Ketan Rajawat
Dept. of Electrical Engineering,
IIT Kanpur
Uttar Pradesh, India
1 Non-Convex optimization challenges and requirements
2 Asynchronous optimization over heterogeneous Networks
3 Distributed Interference Alignment Over MIMO Networks
4 Stochastic Non-Convex ADMM
5 Extension to Multidimensional Scaling (MDS)
6 Conclusion
7 References
A generic optimization problem
A generic optimization problem can be written as
minimize   f(x)
subject to x ∈ X                                    (1)

Cost function f : R^N → R, constraint set X:
X = dom(f) ∩ {x | h_1(x) = 0, . . . , h_m(x) = 0} ∩ {x | g_1(x) ≤ 0, . . . , g_n(x) ≤ 0}    (2)
If f and X are convex, the formulation is a convex optimization problem,
e.g., linear programming, quadratic programming, semidefinite programming.
If either requirement is not met, the problem is non-convex (NC).
Bertsekas, Dimitri P. Convex optimization theory. Belmont: Athena Scientific, 2009.
Convex functions
A function f : R^n → R, with x, y ∈ dom(f), is convex if, for 0 ≤ θ ≤ 1,
- Zeroth order: f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
- First order: f(x) + ∇f(x)^T (y − x) ≤ f(y)
- Second order: ∇²f(x) ⪰ 0
Convex set
Convex set: A set X is convex if, for any x, y ∈ X and 0 ≤ θ ≤ 1, θx + (1 − θ)y ∈ X.
Figure: Examples of a non-convex set and a convex set
What’s so special about convex?
- A convex function has no local minima that are not global.
- The existence of a global minimum of a convex function over a convex set is conveniently characterized in terms of directions of recession.
- A real-valued convex function is continuous and has nice differentiability properties.
- A convex set has a nonempty relative interior.
- A convex set is connected and has feasible directions at any point.
- A nonconvex function can be convexified while maintaining the optimality of its global minima.
[1]Bertsekas, Dimitri P. Convex optimization theory. Belmont: Athena Scientific, 2009.
Is Convex Optimization Sufficient?
Convex optimization problems are well structured, but many practical problems are inherently non-convex [2, 3, 4, 11, 10, 12].
Non-convex (NC) frameworks
1. Sparse regression
2. Robust regression
3. Dictionary learning
4. Low-rank matrix problems (regression, completion, factorization, approximation)
5. Dimensionality reduction (MDS)
6. Non-concave utility maximization
7. Principal component analysis (PCA)
[2] Loh, Po-Ling, and Martin J. Wainwright. “High-dimensional regression with noisy and missing data: Provable guarantees with
non-convexity.” Advances in Neural Information Processing Systems. 2011.
[10] Ghadimi, Saeed, and Guanghui Lan. “Stochastic first-and zeroth-order methods for nonconvex stochastic programming.” SIAM Journal on
Optimization 23.4 (2013): 2341-2368.
[12]Hong, Mingyi. ”A distributed, asynchronous and incremental algorithm for nonconvex optimization: An ADMM based approach.” arXiv
preprint arXiv:1412.6058 (2014).
[11]Hong, Mingyi. ”Decomposing linearly constrained nonconvex problems by a proximal primal dual approach: Algorithms, convergence, and
applications.” arXiv preprint arXiv:1604.00543 (2016).
Non-convex examples
Consider an optimization problem, where f(·) is any data-fidelity function:

min f(x)   s.t.   x ∈ X    (3)

1. Linear classification: X = {x | ‖x‖₀ ≤ s}, NC: ‖x‖₀.
2. Matrix completion: X = {X | rank(X) ≤ r}, NC: rank(X).
3. Robust PCA: X = {X = L + S | rank(L) ≤ r, ‖S‖₀ ≤ s}, NC: rank(L) and ‖S‖₀.
4. Low-rank matrix factorization: given a set of entries m_ij from M ∈ R^{m×n}, obtain LR^T ∈ R^{m×n}, with L ∈ R^{m×r}, R^T ∈ R^{r×n}, and r ≪ min(m, n):

min_{L,R} ‖M − LR^T‖²_F + λ₁ ‖L‖²_F + λ₂ ‖R‖²_F    (4)
Network non-convex example
(1) Distributed cooperative localization over networks. Given {δ_{k,j}}_{k,j=1}^{N}, obtain X such that

X̂ = arg min_{X∈X} Σ_{k=1}^{K} Σ_{j∈N_k} w_kj (δ_kj − d_{k,j})²    (5)

- Non-convexity due to the objective function, with d_{k,j} := ‖X_k − X_j‖₂.

(2) Alternate formulation

X̂ = arg min_{X, Y, {δ_kj}, {r_kj}} Σ_{k=1}^{K} Σ_{j∈N_k} w_kj (δ_kj² − 2δ_kj d_{k,j} + d_{k,j}²)
subject to  Y_kk + Y_jj − 2Y_kj = r_kj,   r_kj = δ_kj²,   r_kj ≥ 0,   j ∈ N_k,
            Y = X^T X    (6)

- Non-convexity due to the constraint Y = X^T X.
Non-convex optimization challenges and methods
Challenges:
- Many local minima, no global solution guarantee.
- No well-defined general-purpose methods.
- Every problem is unique (unstructured).
Methods
1. Convex relaxation [5]
2. Projected (sub)gradient method [7]
3. Alternate minimization [12]
4. Majorization-minimization [6]
[5] Vandenberghe, Lieven, and Stephen Boyd. “Semidefinite programming.” SIAM review 38.1 (1996): 49-95.
[12] Hong, Mingyi. ”A distributed, asynchronous and incremental algorithm for nonconvex optimization: An ADMM based approach.” arXiv
preprint arXiv:1412.6058 (2014).
[6] Mairal, Julien. “Incremental majorization-minimization optimization with application to large-scale machine learning.” SIAM Journal on
Optimization 25.2 (2015): 829-855.
[7] Bianchi, Pascal, and Jérémie Jakubowicz. “Convergence of a multi-agent projected stochastic gradient algorithm for non-convex optimization.”
IEEE Transactions on Automatic Control 58.2 (2013): 391-405.
Convex Relaxation
Advantages:
- Convex optimization: polynomial time
- Generic tools available for optimization
- Systematic analysis
Disadvantages:
- Optimizes over a much bigger set, not scalable to large problems
- No convergence guarantees
- Cannot be applied to all non-convex problems
Don’t Relax for Non-Convex (NC)
Approaches
- Alternate minimization
- Majorization-minimization
Challenges: analysis and convergence guarantees are much harder.
Advantages:
- Scalability
- Computationally cheap
- Convergence guarantees
- Better modeling of problems
Non-convex optimization: challenging and rewarding.
Alternate minimization
min_{x,y} f(x, y)    (7)

Alternate minimization
- Fix x, optimize for y:   y^t := arg min_y f(x^t, y)    (8)
- Fix y, optimize for x:   x^{t+1} := arg min_x f(x, y^t)    (9)

Advantages
- If each subproblem is easy, iterates are cheap
- Parallelization is easy
- For each subproblem, an approximate solution by linearization/majorization is easy to obtain
- A stationary point is often guaranteed
A minimal numerical sketch is given after the reference below.
[4]Jain, Prateek, Praneeth Netrapalli, and Sujay Sanghavi. “Low-rank matrix completion using alternating minimization.” Proceedings of the
forty-fifth annual ACM symposium on Theory of computing. ACM, 2013.
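To make the alternating scheme concrete, the following is a minimal sketch on a toy bilinear least-squares objective f(x, y) = ‖M − xyᵀ‖²_F; the toy objective and random data are assumptions for illustration only (not the seminar's application), and each subproblem has a closed-form least-squares solution.

```python
# A minimal sketch of alternating minimization, assuming the toy objective
# f(x, y) = ||M - x y^T||_F^2; each fixed-block subproblem is solved in
# closed form (a simple least-squares fit).
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((20, 15))
x = rng.standard_normal(20)          # initialize x
y = rng.standard_normal(15)          # initialize y

for t in range(100):
    # Fix x, optimize y:  y = argmin_y ||M - x y^T||_F^2
    y = M.T @ x / (x @ x)
    # Fix y, optimize x:  x = argmin_x ||M - x y^T||_F^2
    x = M @ y / (y @ y)

print("final objective:", np.linalg.norm(M - np.outer(x, y), "fro") ** 2)
```

Because each step is an exact minimization over one block, the objective is monotonically non-increasing, although the coupled problem is non-convex and only a stationary point is guaranteed.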
Majorization-Minimization
Replace f(x) iteratively by an auxiliary function g(x, z), where z in g(x, z) is some fixed value.
- The function g(x, z) should be easy to minimize.
- f(x) ≤ g(x, z)
- f(x)|_{x=z} = g(z, z)
A minimal sketch with a quadratic majorizer is given after the reference below.
[35] I. Borg, and P. Groenen.. modern multidimensional scaling theory and applications. Springer, New York, (2005)
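As an illustration of the surrogate conditions above, the sketch below uses the standard quadratic majorizer g(x, z) = f(z) + ⟨∇f(z), x − z⟩ + (L/2)‖x − z‖², valid whenever ∇f is L-Lipschitz; the toy non-convex f and the constant L = 2 are assumptions chosen so the bound holds, and minimizing the surrogate reduces to a gradient step.

```python
# A minimal majorization-minimization sketch, assuming f has an L-Lipschitz
# gradient so the quadratic surrogate
#   g(x, z) = f(z) + <grad f(z), x - z> + (L/2)||x - z||^2
# satisfies f(x) <= g(x, z) and f(z) = g(z, z).  Minimizing g over x gives
# x = z - grad f(z)/L.  The toy smooth non-convex f below is illustrative.
import numpy as np

def f(x):
    return np.sum(x**2 / (1.0 + x**2))        # non-convex, smooth

def grad_f(x):
    return 2.0 * x / (1.0 + x**2) ** 2

L = 2.0                                        # valid Lipschitz bound for this f
x = np.array([3.0, -2.0, 0.5])
for t in range(200):
    z = x.copy()
    x = z - grad_f(z) / L                      # argmin_x g(x, z)
    assert f(x) <= f(z) + 1e-12                # MM guarantees descent

print("stationary point estimate:", x, "f(x) =", f(x))
```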
Convex Vs Non-Convex
Given observed entries {m_ij}, (i, j) ∈ Ω, of M ∈ R^{m×n} with [M]_ij = m_ij, obtain a low-rank matrix Z ∈ R^{m×n} such that M_ij = Z_ij:

min_Z rank(Z)   s.t.   Z_ij = M_ij    (10)

Convex relaxation [8]: ‖Z‖_* = Σ_{k=1}^{K} σ_k(Z)

min_Z ‖Z‖_*   s.t.   Z_ij = M_ij    (11)

Storing and computing the SVD of an m × n matrix is challenging.

Non-convex [4]: Z = XY^T, where X ∈ R^{m×k}, Y ∈ R^{n×k} and k ≪ min(m, n):

min_{X,Y} (1/2) Σ_{(i,j)∈Ω} (M_ij − (XY^T)_ij)²    (12)

Smaller dimensions, alternate minimization, convex subproblems; a sketch is given after the references below.
[8]Cands, Emmanuel J., and Terence Tao. “The power of convex relaxation: Near-optimal matrix completion.” IEEE Transactions on
Information Theory 56.5 (2010): 2053-2080.
[4]Jain, Prateek, Praneeth Netrapalli, and Sujay Sanghavi. “Low-rank matrix completion using alternating minimization.” Proceedings of the
forty-fifth annual ACM symposium on Theory of computing. ACM, 2013.
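Below is a minimal alternating-minimization sketch for the factored objective (12): each row of X and Y is updated by a small least-squares solve over the observed entries. The synthetic data, the sampling pattern, and the tiny ridge term eps (added only for numerical stability) are assumptions; this illustrates the approach rather than reproducing the method of [4].

```python
# A minimal alternating-minimization sketch for (12):
#   min_{X,Y} 0.5 * sum_{(i,j) in Omega} (M_ij - (X Y^T)_ij)^2.
# Each subproblem is a per-row least-squares solve; `eps` is an assumed
# ridge term for numerical stability, not part of the slide's formulation.
import numpy as np

rng = np.random.default_rng(1)
m, n, k, eps = 30, 20, 3, 1e-8
M_true = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
mask = rng.random((m, n)) < 0.5                # observed set Omega

X = rng.standard_normal((m, k))
Y = rng.standard_normal((n, k))
for t in range(50):
    for i in range(m):                         # update row i of X
        cols = np.where(mask[i])[0]
        A = Y[cols]                            # |Omega_i| x k design matrix
        X[i] = np.linalg.solve(A.T @ A + eps * np.eye(k), A.T @ M_true[i, cols])
    for j in range(n):                         # update row j of Y
        rows = np.where(mask[:, j])[0]
        B = X[rows]
        Y[j] = np.linalg.solve(B.T @ B + eps * np.eye(k), B.T @ M_true[rows, j])

err = np.linalg.norm(mask * (M_true - X @ Y.T)) / np.linalg.norm(mask * M_true)
print("relative fit on observed entries:", err)
```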
Is a local-minimum guarantee enough?
1. Finding the globally optimal solution is NP-hard.
2. Large-scale optimization [12]
3. Time-varying optimization [39] (e.g., tracking, localization)
4. Motivation from deep learning [44].

Figure 1: Deep learning example
[44] Huang, Feihu, Songcan Chen, and Zhaosong Lu.“Stochastic Alternating Direction Method of Multipliers with Variance Reduction for
Nonconvex Optimization.” arXiv preprint arXiv:1610.02758 (2016).
[12]Hong, Mingyi. ”A distributed, asynchronous and incremental algorithm for nonconvex optimization: An ADMM based approach.” arXiv
preprint arXiv:1412.6058 (2014).
[39] Sandeep Kumar, R. Kumar and K. Rajawat, “Cooperative Localization of Mobile Networks Via Velocity-Assisted Multidimensional Scaling,”
in IEEE Trans. on Signal Proc., vol. 64, no. 7, April, 1, 2016.
Optimization over Networks:
Challenges and Requirements
The performance of an algorithm over a network is largely dictated by two major classes of constraints:
1. Spatial constraints
2. Temporal constraints
Optimization over Networks (Spatial Constraints)
Spatial Constraints
- Locally observable objective functions and variables
- Distributed information processing
- Uncertainties due to the information flow across the network

X* = arg min_{X∈X} Σ_{k=1}^{K} f_k(x_k)    (13)
X = [x_1, . . . , x_K]    (14)

Requirements
- Decentralized algorithm
- Local communication/message passing
- Asynchronous communication protocol
[23] Sayed, Ali H. “Adaptation, learning, and optimization over networks.” Foundations and Trends in Machine Learning 7.4-5 (2014): 311-801
Optimization over Networks (Temporal Constraints)
Temporal Constraints
In a dynamic environment, another class of uncertainties arises from temporal variations in the objective function and the constraints:
- Dynamic topology
- Dynamic channel environment
- Heterogeneity of the nodes in the multi-agent system

Existing approaches
- Asynchronous algorithms [27]
- Stochastic methods [37]
- Bayesian methods [41]
- Online optimization [24]
[24] M. Akbari; B. Gharesifard; T. Linder,“Distributed Online Convex Optimization on Time-Varying Directed Graphs,” in IEEE Transactions on
Control of Network Systems , vol.PP, no.99, 2015
[37] Simonetto, Andrea, Leon Kester, and Geert Leus. “Distributed Time-Varying Stochastic Optimization and Utility-based Communication.”
arXiv preprint arXiv:1408.5294 (2014).
[27]Nedic, Angelia. “Asynchronous broadcast-based convex optimization over a network.” IEEE Transactions on Automatic Control 56.6 (2011):
1337-1351.
[41] Sayed, Ali H. Adaptive filters. John Wiley & Sons, 2011.
Contribution
Contribution
1. Asynchronous Optimization over Heterogeneous Networks
Sandeep Kumar, R. Jain, and K. Rajawat “Asynchronous Optimization Over Heterogeneous
Networks via Consensus ADMM” IEEE Trans. on Signal and Inf Proc. over Networks, 2016
2. Distributed Interference Alignment over Cellular Network
Sandeep Kumar and Ketan Rajawat, “Distributed Interference Alignment For MIMO Cellular
Networks Via Consensus ADMM.” IEEE GlobalSIP, 2016, Washington, U.S.
3. Distributed Non-Convex Stochastic ADMM
Sandeep Kumar, Sandeep Gudla and Ketan Rajawat “Distributed Stochastic Non-Convex
ADMM ”, IEEE Trans. on Signal Proc.(To be submitted).
4. Cooperative Mobile Network Localization
Sandeep Kumar, R. Kumar and K. Rajawat, “Cooperative Localization of Mobile Networks Via
Velocity-Assisted Multidimensional Scaling,” in IEEE Trans. on Signal Proc., vol. 64, no. 7,
April, 1, 2016.
Sandeep Kumar and Ketan Rajawat,“Velocity-assisted multidimensional scaling .” IEEE 16th
International Workshop on SPAWC, Stockholm, Sweden, 2015.
5. Stochastic Multi Dimensional Scaling
Ketan Rajawat, and Sandeep Kumar. “Stochastic Multidimensional Scaling”, Special issues on
distributed information processing for social networks, IEEE Trans. on Signal and Inf Proc. over
Networks (Under review).
Asynchronous Non-Convex Optimization over Heterogeneous Networks
Distributed Optimization
Figure: Centralized architecture (with a master node) vs. decentralized architecture for distributed optimization
- A multi-agent system with k = 1, . . . , K nodes.
- Each agent k could be a sensor node, a processor, a robot, etc.
- Applications: machine learning, robotics, economics, big-data analytics, network optimization, signal processing.
- The heterogeneity of the nodes and the resource, spatial, and temporal constraints motivate distributed and asynchronous decision making.
[25] Bertsekas, Dimitri P., and John N. Tsitsiklis. “Parallel and distributed computation: numerical methods. Vol. 23. Englewood Cliffs, NJ:
Prentice hall, 1989.
Distributed and Asynchronous Optimization Literature
Centralized architecture, problem formulation, with the set of variables {x_k}_{k=1}^{K}, where x_k ∈ x:

x* = arg min_{x∈X} Σ_{k=1}^{K} g_k(x_k) + h(x)    (15)

Decentralized architecture, problem formulation:

x* = arg min_{x∈X} Σ_{k=1}^{K} g_k(x_k) + Σ_{k=1}^{K} h(x_k)    (16)

When g_k(·) is convex
- Huge literature for both types of architecture (see the figure on the previous slide).
- Consensus based, diffusion, gossip based, incremental (sub)gradient, distributed (sub)gradient, distributed ADMM, dual averaging, proximal dual, block coordinate, mirror descent, alternate minimization [13]-[22]
Non-convex Distributed Optimization
g_k(x) is non-convex
- Very recently, provably convergent solutions, but only for the centralized architecture.
- Non-convex ADMM, stochastic subgradient, proximal primal-dual [11, 10, 12].
- Applicable to many applications, including machine learning.

Limitations of the centralized architecture
- Needs a master node, or assumes sharing of a global database/variable among all nodes.
- Not suited for communication and sensor network applications.
[10] Ghadimi, Saeed, and Guanghui Lan. “Stochastic first-and zeroth-order methods for nonconvex stochastic programming.” SIAM Journal on
Optimization 23.4 (2013): 2341-2368.
[12]Hong, Mingyi. ”A distributed, asynchronous and incremental algorithm for nonconvex optimization: An ADMM based approach.” arXiv
preprint arXiv:1412.6058 (2014).
[11]Hong, Mingyi. ”Decomposing linearly constrained nonconvex problems by a proximal primal dual approach: Algorithms, convergence, and
applications.” arXiv preprint arXiv:1604.00543 (2016).
Optimization over Distributed Networks
- X* = arg min Σ_{k=1}^{K} f_k
- Locally observable information; local communication possible.
- Each node optimizes its function with local information.
- For node k, let N_k denote the neighbor set (e.g., N_1 = {1, 2, 3}).
- f_k = F({x_j}_{j∈N_k}), e.g., f_1 = F(x_1, x_2, x_3)
- Variables x ∈ R^N such that x = {x_n}_{n=1}^{N}.
- Partition the variables {x_n}_{n=1}^{N} among the nodes; each variable is of interest to exactly one node.
- {S_k}_{k=1}^{K} denote disjoint subsets; {x_n | x_n ∈ S_k} are local to node k.
- S'_k := ∪_{j∈{k}∪N_k} S_j is the extended set containing the neighborhood.
- g_k(·) at node k depends on the neighborhood N_k.

P = min_x Σ_{k=1}^{K} g_k({x_n}_{n∈S'_k}) + h_k({x_n}_{n∈S_k})    (17)
    s. t. {x_n}_{n∈S_k} ∈ X_k,   k = 1, 2, . . . , K.
[28]Kumar, Sandeep, Rahul Jain, and Ketan Rajawat. ”Asynchronous Optimization Over Heterogeneous Networks via Consensus ADMM.” IEEE
Transactions on Signal Information Processing over Networks., 2016
Factor graph representation of the objective function
Figure: Bipartite factor graph representation of the partially separable objective function Σ_{k=1}^{K} g_k({x_n}_{n∈S'_k}) + h_k({x_n}_{n∈S_k}).
Check nodes → summands g_k
Variable nodes → sets S_k
Decentralized Consensus Problem Formulation
- Introduce copies x_kj of the variable x_j, j ∈ N_k.
- The consensus variable z = {z_k}, ∀ k = 1, . . . , K.
- x_k = {x_kj}_{j∈N_k}, z_k = {z_j}_{j∈N_k} and y_k = {y_kj}_{j∈N_k}
min_{{x_k}, z} Σ_{k=1}^{K} g_k({x_kj}_{j∈N_k}) + h_k(z_k)    (18)
s. t.  x_kj = z_j,   j ∈ N_k
       z_k ∈ X_k,    k = 1, . . . , K

Figure: Local copies x_kj at each summand g_k are constrained to equal the consensus variables z_j.
Fundamentals of ADMM
- The alternating direction method of multipliers (ADMM) blends the decomposability of dual ascent with the superior convergence properties of the method of multipliers.
- The algorithm solves problems of the form

min f(x) + g(z)   s.t.   Ax + Bz = c    (19)

with variables x ∈ R^n and z ∈ R^m, and A ∈ R^{p×n}, B ∈ R^{p×m}, c ∈ R^p.
- The augmented Lagrangian is

L_ρ(x, z, y) = f(x) + g(z) + y^T(Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖²₂    (20)

1. z^{k+1} := arg min_z L_ρ(x^k, z, y^k)   (z-minimization step)
2. x^{k+1} := arg min_x L_ρ(x, z^{k+1}, y^k)   (x-minimization step)
3. y^{k+1} := y^k + ρ(Ax^{k+1} + Bz^{k+1} − c)   (dual update, ρ > 0)

A small single-machine sketch follows the reference below.
[21]Boyd, Stephen, et al. “Distributed optimization and statistical learning via the alternating direction method of multipliers.” Foundations and
Trends in Machine Learning 3.1 (2011): 1-122.
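For concreteness, here is a small single-machine sketch of the three ADMM steps for one instance of (19): the lasso split f(x) = ½‖Dx − b‖², g(z) = λ‖z‖₁ with A = I, B = −I, c = 0. The data and parameter values are assumptions, and the update order follows the slides (z, then x, then the dual).

```python
# A minimal ADMM sketch for one concrete instance of (19): the lasso split
# f(x) = 0.5||Dx - b||^2, g(z) = lam*||z||_1, A = I, B = -I, c = 0 (x = z).
# Illustrative only; not the seminar's network algorithm.
import numpy as np

rng = np.random.default_rng(2)
D = rng.standard_normal((40, 10))
b = rng.standard_normal(40)
lam, rho = 0.5, 1.0

x = np.zeros(10); z = np.zeros(10); y = np.zeros(10)
DtD, Dtb = D.T @ D, D.T @ b
for t in range(200):
    # z-minimization: soft-thresholding (prox of the l1 norm)
    v = x + y / rho
    z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
    # x-minimization: quadratic, solved exactly
    x = np.linalg.solve(DtD + rho * np.eye(10), Dtb + rho * z - y)
    # dual ascent on the constraint x - z = 0
    y = y + rho * (x - z)

print("constraint violation:", np.linalg.norm(x - z))
```

The z-step reduces to soft-thresholding, i.e., the proximal operator of the ℓ₁ norm, which is why splitting methods like ADMM pair naturally with non-smooth regularizers.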
ADMM Lagrangian Formulation
min_{{x_k}, z} Σ_{k=1}^{K} g_k({x_kj}_{j∈N_k}) + h_k(z_k)    (21)
s. t.  x_kj = z_j,   j ∈ N_k    (22)
       z_k ∈ X_k,    k = 1, . . . , K    (23)

- g_k(·), ∀ k = 1, . . . , K, is a non-convex, differentiable function.
- h_k(·) is convex and non-differentiable.
- The set X is closed, compact, and convex.
- x_k = {x_kj}_{j∈N_k}, z_k = {z_j}_{j∈N_k} and y_k = {y_kj}_{j∈N_k}
[28] Sandeep Kumar, R. Jain, and K. Rajawat “Asynchronous Optimization Over Heterogeneous Networks via Consensus ADMM” IEEE Trans.
on Signal and Inf Proc. over Networks, vol. 64, 2016
Lagrangian formulation
L({x_k}, z, {y_k}) = Σ_k ℓ_k(x_k, z_k, y_k) + Σ_k h_k(z_k)    (24)
                   = Σ_j g_j(x_j) + Σ_j ℓ̌_j(x̌_j, z_j, y̌_j)    (25)

where x_k = {x_kj}_{j∈N_k}, z_k = {z_j}_{j∈N_k}, y_k = {y_kj}_{j∈N_k}, and [x̌_j]_k := x_kj, [y̌_j]_k := y_kj for all k ∈ N_j.

Inward view for node k:
ℓ_k(x_k, z_k, y_k) := g_k(x_k) + Σ_{j∈N_k} ⟨y_kj, x_kj − z_j⟩ + Σ_{j∈N_k} (ρ_k/2) ‖x_kj − z_j‖²    (26)

Outward view for node j:
ℓ̌_j(x̌_j, z_j, y̌_j) := Σ_{k∈N_j} ⟨y_kj, x_kj − z_j⟩ + Σ_{k∈N_j} (ρ_k/2) ‖x_kj − z_j‖² + h_j(z_j)    (28)
Lagrangian through example
Figure: (a) Inward view and (b) outward view at node 1.
x_1 = {x_11, x_12, x_13}, y_1 = {y_11, y_12, y_13} and z_1 = {z_1, z_2, z_3}
x̌_1 = {x_11, x_21, x_31}, y̌_1 = {y_11, y_21, y_31}

ℓ_1(x_1, z_1, y_1) = g_1(x_1) + ⟨y_11, x_11 − z_1⟩ + ⟨y_12, x_12 − z_2⟩ + ⟨y_13, x_13 − z_3⟩
                     + (ρ/2)‖x_11 − z_1‖² + (ρ/2)‖x_12 − z_2‖² + (ρ/2)‖x_13 − z_3‖²

ℓ̌_1(x̌_1, z_1, y̌_1) = h_1(z_1) + ⟨y_11, x_11 − z_1⟩ + ⟨y_21, x_21 − z_1⟩ + ⟨y_31, x_31 − z_1⟩
                      + (ρ/2)‖x_11 − z_1‖² + (ρ/2)‖x_21 − z_1‖² + (ρ/2)‖x_31 − z_1‖²
Proximal Update Rule
For a K-node system with partially separable variables:
Primal variables: {x_k}_{k=1}^{K}, z = [z_1, . . . , z_K], where for each node x_k = {x_kj}_{j∈N_k}.
Dual variables: {y_k}_{k=1}^{K}, where for each node y_k = {y_kj}_{j∈N_k}.

Starting with arbitrary {x_k^1} and {y_kj^1}, the update for z_j^{t+1} is

z_j^{t+1} = arg min_{z_j ∈ X_j}  h_j(z_j) + Σ_{k∈N_j} ⟨y_kj^t, x_kj^t − z_j⟩ + Σ_{k∈N_j} (ρ_k/2) ‖x_kj^t − z_j‖²
          = prox_j( Σ_{k∈N_j} (ρ_k x_kj^t + y_kj^t) / Σ_{k∈N_j} ρ_k )    (29)

where the proximal point function prox_j(·) is defined as [26]

prox_j(x) := arg min_{u∈X_j}  h(u) + (1/2) ‖x − u‖².    (30)
[26]Parikh, Neal, and Stephen P. Boyd. “Proximal Algorithms.” Foundations and Trends in optimization 1.3 (2014): 127-239.
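Two standard instances of the proximal map (30), assuming h is either λ‖·‖₁ (giving soft-thresholding) or the indicator of a box constraint (giving a projection); these closed forms are what make the z-update inexpensive in practice.

```python
# A minimal sketch of the proximal map in (30) for two common cases,
# assumed for illustration: h(u) = lam*||u||_1, and h the indicator of a box.
import numpy as np

def prox_l1(x, lam):
    """argmin_u lam*||u||_1 + 0.5*||x - u||^2  (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_box(x, lo, hi):
    """argmin_{lo <= u <= hi} 0.5*||x - u||^2  (projection onto a box)."""
    return np.clip(x, lo, hi)

x = np.array([-2.0, 0.3, 1.5])
print(prox_l1(x, 0.5))         # [-1.5, 0.0, 1.0]
print(prox_box(x, -1.0, 1.0))  # [-1.0, 0.3, 1.0]
```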
Primal and Dual updates..... linearization
x_k^{t+1} is obtained by minimizing the Lagrangian with respect to x_k:

x_k^{t+1} = arg min_{x_k}  g_k(x_k) + Σ_{j∈N_k} ⟨y_kj^t, x_kj − z_j^{t+1}⟩ + Σ_{j∈N_k} (ρ_k/2) ‖x_kj − z_j^{t+1}‖²

- g_k(·) is non-convex, so the above problem is hard to solve exactly.
- Linearizing, g_k(x_k) ≈ g_k(z_k^{t+1}) + ⟨∇g_k(z_k^{t+1}), x_k − z_k^{t+1}⟩:

x_k^{t+1} ≈ arg min_{x_k}  g_k(z_k^{t+1}) + ⟨∇g_k(z_k^{t+1}), x_k − z_k^{t+1}⟩
            + Σ_{j∈N_k} ⟨y_kj^t, x_kj − z_j^{t+1}⟩ + Σ_{j∈N_k} (ρ_k/2) ‖x_kj − z_j^{t+1}‖²    (31)

- The vector [z_k]_j := z_j for all j ∈ N_k and zero otherwise.
- The nodal functions g_k(·) depend only on {x_n}_{n∈N_k}.
- The gradient vector is defined as

[∇g_k(z_k^{t+1})]_j := (∂/∂x_kj) g_k(x_k) |_{x_k = z_k^{t+1}},  j ∈ N_k;   0,  j ∉ N_k.    (32)
The approximate update of x_k^{t+1} = {x_kj^{t+1}} thus becomes

x_kj^{t+1} = z_j^{t+1} − (1/ρ_k) ( [∇g_k(z_k^{t+1})]_j + y_kj^t ),  j ∈ N_k;   x_kj^{t+1} = 0,  j ∉ N_k.    (33)

The dual updates are

y_kj^{t+1} = y_kj^t + ρ_k ( x_kj^{t+1} − z_j^{t+1} ),  j ∈ N_k    (34)
Primal and Dual updates...Majorization
- In many problems, it is possible to upper bound the non-convex component functions g_k(x_k) with an appropriate convex surrogate function.
- The surrogate/majorizing function f_k is such that g_k(x_k) ≤ f_k(x_k, z_k) for all x_k, and satisfies

f_k(z_k, z_k) = g_k(z_k)    (35)
∇f_k(z_k, z_k) := ∇_x f_k(x, z_k) |_{x=z_k} = ∇_x g_k(x) |_{x=z_k}.    (36)

- With such a majorizer, the x_k^{t+1} update can be carried out accurately as

x_k^{t+1} = arg min_{x_k}  f_k(x_k, z_k^{[t+1]}) + Σ_{j∈N_k} ⟨y_kj^t, x_kj − z_j^{t+1}⟩ + (ρ_k/2) Σ_{j∈N_k} ‖x_kj − z_j^{t+1}‖²    (37)

- The z_k and y_kj updates are the same as before.
Distributed Synchronous Algorithm
1: Initialize {x_kj^1, y_kj^1}, z_k for all j ∈ N_k.
2: for t = 1, 2, . . . do
3:   Send {ρ_k x_kj^t + y_kj^t} to neighbors j ∈ N_k
4:   Upon receiving {ρ_j x_jk^t + y_jk^t} for all j ∈ N_k,
5:   update z_k^{t+1} as
       z_k^{t+1} = prox_k( Σ_{j∈N_k} (ρ_j x_jk^t + y_jk^t) / Σ_{j∈N_k} ρ_j )    (38)
6:   Transmit z_k^{t+1} to its neighbors j ∈ N_k
7:   Upon receiving z_j^{t+1} from all neighbors j ∈ N_k,
8:   update the primal variable x_k^{t+1} as
       x_kj^{t+1} = z_j^{t+1} − (1/ρ_k) ( [∇g_k(z_k^{t+1})]_j + y_kj^t ),  j ∈ N_k;   x_kj^{t+1} = 0,  j ∉ N_k    (39)
Distributed Synchronous Algorithm
9:   Update the dual variable y_kj^{t+1} as
       y_kj^{t+1} = y_kj^t + ρ_k ( x_kj^{t+1} − z_j^{t+1} ),  j ∈ N_k    (40)
10:  if the stopping criterion is met then
11:    terminate loop
12:  end if
13: end for
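The following runnable toy illustrates the z/x/y ordering of updates (38)-(40). As a simplifying assumption it uses a single shared variable z that every node holds a copy x_k of, h = 0 (so the prox is the identity), and convex local costs g_k(x) = ½(x − a_k)²; the general algorithm instead keeps per-edge copies x_kj and communicates only with neighbors.

```python
# A toy sketch of the update pattern in (38)-(40), specialized (as an
# assumption, for brevity) to one shared variable z copied by all K nodes,
# with h = 0 (prox = identity) and g_k(x) = 0.5*(x - a_k)^2.
import numpy as np

K, rho = 5, 2.0
a = np.array([1.0, 2.0, 3.0, 4.0, 10.0])     # local data at each node
x = np.zeros(K)                               # local primal copies x_k
y = np.zeros(K)                               # dual variables y_k
z = 0.0                                       # consensus variable

for t in range(100):
    z = np.sum(rho * x + y) / (rho * K)       # (38) with prox = identity
    grad = z - a                               # grad g_k evaluated at z
    x = z - (grad + y) / rho                   # linearized x-update (39)
    y = y + rho * (x - z)                      # dual update (40)

print("consensus value:", z, " (average of a:", a.mean(), ")")
```

With these choices the iterates converge to the minimizer of Σ_k g_k, i.e., the average of the a_k.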
Synchronous Limitations
The above algorithm is distributed but synchronous: the progress of the algorithm is limited by the slowest nodes.
Its applicability to heterogeneous networks is therefore limited, due to issues such as:
(S1) For some nodes, calculating ∇gk (·) or proxk (·) may be
computationally demanding. In such cases, all nodes in the network
must wait for the slowest node to carry out its update.
(S2) Each node is required to transmit two messages to each of its
neighbors per-iteration. This might be excessive for nodes operating
on a power-budget.
(S3) Delays in function computation or communication may also arise
because of the heterogeneity of nodes in a network.
An asynchronous algorithm is needed.
Asynchronous updates
- Skipping of ∇g_k(·) or prox_k(·) calculations, and of communications, is allowed for some iterations.
- Let S^t be the set of nodes that carry out the update at time t; the z_j^{t+1} update can then be written as

z_j^{t+1} = prox_j( Σ_{k∈N_j} (ρ_k x_kj^t + y_kj^t) / Σ_{k∈N_j} ρ_k ),  j ∈ S^t;   z_j^{t+1} = z_j^t,  j ∉ S^t    (41)

- Bounded delay: t + 1 − T_k ≤ [t + 1]_k ≤ t + 1 for some T_k < ∞.
- Use the latest available gradient ∇g_k(z_k^{[t+1]}) for the x_k update:

x_kj^{t+1} = z_j^{t+1} − (1/ρ_k) ( [∇g_k(z_k^{[t+1]})]_j + y_kj^t ),  j ∈ N_k;   x_kj^{t+1} = 0,  j ∉ N_k    (42)

y_kj^{t+1} = y_kj^t + ρ_k ( x_kj^{t+1} − z_j^{t+1} ),  j ∈ N_k    (43)
Distributed Asynchronous Algorithm with Optional Updates
1: Set t = 1, initialize {x_kj^1, y_kj^1, z_j^1} for all j ∈ N_k.
2: for t = 1, 2, . . . do
3:   (Optional) Send {ρ_k x_kj^t + y_kj^t} to neighbors j ∈ N_k
4:   if {ρ_j x_jk^t + y_jk^t} received from all j ∈ N_k then
5:     (Optional) Update z_k^{t+1} as
         z_k^{t+1} = prox_k( Σ_{j∈N_k} (ρ_j x_jk^t + y_jk^t) / Σ_{j∈N_k} ρ_j ),  k ∈ S^t;   z_k^{t+1} = z_k^t,  k ∉ S^t    (44)
6:     (Optional) Transmit z_k^{t+1} to each j ∈ N_k
7:   end if
8:   if z_j^{t+1} not received from some j ∈ N_k then
9:     set z_j^{t+1} = z_j^t
10:  end if
11:  (Optional) Calculate the gradient ∇g_k(z_k^{t+1})
Distributed Asynchronous Algorithm with Optional Updates
12:  Update the primal variable x_k^{t+1} as
       x_kj^{t+1} = z_j^{t+1} − (1/ρ_k) ( [∇g_k(z_k^{[t+1]})]_j + y_kj^t ),  j ∈ N_k;   x_kj^{t+1} = 0,  j ∉ N_k    (45)
13:  Update the dual variable y_kj^{t+1} as in (34)
14:  if ‖x_k^{t+1} − x_k^t‖ ≤ δ then
15:    terminate loop
16:  end if
17: end for
Convergence Analysis
Linearized updates
Assumption
For each node k, the component function gradient ∇g_k(x) is Lipschitz continuous; that is, there exists L_k > 0 such that for all x, x' ∈ dom g_k,
‖∇g_k(x) − ∇g_k(x')‖ ≤ L_k ‖x − x'‖.    (46)

Assumption
The set X is closed, convex, and compact. The functions g_k(x) are bounded from below over X.
Assumption
For node k, the step size ρ_k is chosen large enough that α_k > 0 and β_k > 0, where
α_k := ρ_k f_k / 2 − 7L_k / 2 − (1/ρ_k²) |N_k| L_k² (T_k + 1)² − |N_k| L_k T_k² / (2ρ_k)
β_k := (ρ_k − 7L_k) / 2    (47)
Majorized updates
Assumption
For each node k, there exists a constant L_k ≥ 0 such that for all x, x', z, z', the following inequalities are satisfied:
‖∇g_k(x) − ∇g_k(x')‖ ≤ L_k ‖x − x'‖    (48a)
‖∇f(x, z) − ∇f(x', z)‖ ≤ L_k ‖x − x'‖    (48b)
‖∇f(x, z) − ∇f(x, z')‖ ≤ L_k ‖z − z'‖.    (48c)
Assumption
For node k, the step size ρ_k is chosen large enough that α_k > 0 and β_k > 0, where
α_k := ρ_k f_k / 2 − |N_k| (8L_k/ρ_k² + 1/ρ_k) L_k² (T_k + 1)² − L_k T_k² / 2
β_k := (ρ_k − 9L_k) / 2 − 8L_k³ / ρ_k².    (49)
Lemma
(a) Starting from any time t = t_0, there exists T < ∞ such that
L({x_k^{T+t_0}}; z^{T+t_0}, {y_k^{T+t_0}}) − L({x_k^{t_0}}; z^{t_0}, {y_k^{t_0}})
  ≤ − Σ_{i=t_0}^{T+t_0−1} Σ_{k=1}^{K} (β_k/2) Σ_{j∈N_k} ‖x_kj^{i+1} − x_kj^i‖²
    − Σ_{i=t_0}^{T+t_0} Σ_{k=1}^{K} α_k Σ_{j∈N_k} ‖z_j^{i+1} − z_j^i‖².    (50)

(b) The augmented Lagrangian values are bounded from below; i.e., for any time t ≥ 1, the Lagrangian satisfies
L({x_k^t}; z^t, {y_k^t}) ≥ P − (L_k/2) Σ_{j∈N_k} diam²(X) > −∞
Theorem
(a) The iterates generated by the asynchronous algorithm converge in the following sense:
lim_{t→∞} ‖z_k^{t+1} − z_k^t‖ = 0,  ∀ k    (51a)
lim_{t→∞} ‖x_kj^{t+1} − x_kj^t‖ = 0,  j ∈ N_k, ∀ k    (51b)
lim_{t→∞} ‖y_kj^{t+1} − y_kj^t‖ = 0,  j ∈ N_k, ∀ k    (51c)

(b) For each k ∈ K and j ∈ N_k, denote the limit points of the sequences {z_k^t}, {x_kj^t}, and {y_kj^t} by z_k*, x_kj*, and y_kj*, respectively. Then {{z_k*}, {x_kj*}, {y_kj*}} is a stationary point of (18) and satisfies
∇g_k(x_k*) + y_k* = 0,  k = 1, . . . , K    (52a)
Σ_{j∈N_k} y_jk* ∈ ∂(h_k(z)) |_{z=z_k*},  k = 1, . . . , K    (52b)
x_kj* = z_j* ∈ X_j,  j ∈ N_k, k = 1, . . . , K    (52c)
Asynchronous Localization over
WSNs
Asynchronous Localization over WSNs
- Distributed cooperative localization over networks:

X̂ = arg min_{X∈B} Σ_{k=1}^{K} g_k({x_j}_{j∈N_k})    (53)

g_k({x_j}_{j∈N_k}) = Σ_{j∈N'_k} w_kj ( δ_kj − √(‖x_k − x_j‖² + ε) )²    (54)

A sketch of the local cost and its gradient follows the figure below.
Figure 2: Cooperative Localization Example
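A small sketch of the smoothed local stress (54) and of its gradient with respect to node k's own position is given below; the smoothing constant eps and the toy neighbourhood data are illustrative assumptions.

```python
# A small sketch of the smoothed local stress in (54) and its gradient
# w.r.t. x_k; `eps` and the toy data are assumptions for illustration.
import numpy as np

def local_stress_and_grad(xk, neighbours, delta, w, eps=1e-9):
    """g_k = sum_j w_kj (delta_kj - sqrt(||x_k - x_j||^2 + eps))^2 and d g_k / d x_k."""
    g, grad = 0.0, np.zeros_like(xk)
    for j, xj in neighbours.items():
        diff = xk - xj
        d = np.sqrt(diff @ diff + eps)          # smoothed distance
        r = delta[j] - d
        g += w[j] * r ** 2
        grad += -2.0 * w[j] * r * diff / d      # chain rule through the sqrt
    return g, grad

xk = np.array([0.0, 0.0])
neigh = {1: np.array([1.0, 0.0]), 2: np.array([0.0, 2.0])}
delta = {1: 1.2, 2: 1.8}
w = {1: 1.0, 2: 1.0}
print(local_stress_and_grad(xk, neigh, delta, w))
```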
NRMSE := √( E[ ‖X̂ − X*‖²_F ] / E[ ‖X*‖²_F ] )
Figure 3: NRMSE performance
1. Proposed ADMM-based method
2. DwMDS [38], a distributed MDS-based incremental approach.
3. E-ML [37], an edge-based semidefinite relaxation (E-SDP) technique.
[37] Simonetto, Andrea, and Geert Leus. “Distributed maximum likelihood sensor network localization.” IEEE Transactions on Signal
Processing 62.6 (2014): 1424-1437.
[38] Costa, Jose A., Neal Patwari, and Alfred O. Hero III. “Distributed weighted-multidimensional scaling for node localization in sensor
networks.” ACM Transactions on Sensor Networks (TOSN) 2.1 (2006): 39-64.
Convergence criteria
The convergence rate is analyzed by plotting the following stopping criteria against the iteration index t:
- ψ(t) := ‖z^{t+1} − z^t‖
- φ(t) := (1/N) Σ_{k=1}^{N} ‖X_k^{t+1} − X_k^t‖_F
- Ω(t) := (1/N) Σ_{k=1}^{N} ‖X_k^{t+1} − Z*‖_F
Z* is the optimal point, obtained after running the algorithm for 5000 iterations.
The cooperative distributed gradient descent (C-DGD) [29, Ch. 10]:

X_k^{t+1} = (1/|N_k|) Σ_{j∈N_k} X_j^t − α ∇g_k^t(X_k)
Figure 4: Convergence of Majorized and Proximal Variant and C-DGD
[29] Nedic, Angelia, and Asuman Ozdaglar. “cooperative distributed multi-agent.” Convex Optimization in Signal Processing and
Communications 340 (2010)
Figure 5: Effect of choosing ρk for Tk = 4 and fk = 0.75
Figure 6: Effect of asynchrony for ρk = 10. The top figure uses Tk = 4, while the
bottom figure uses fk = 0.75.
Distributed Interference
Alignment
Interference alignment (IA) Over Cellular MIMO Network
Figure 7: A 6-cell cellular network, showing desired signal and interference links
- (G, K, N, M) cellular system.
- Each BS g serves K users, denoted by the set U_g.
- The backhaul allows communication within a neighborhood.
- We consider the uplink case, where user j ∈ U_g transmits a precoded signal vector over an M × N channel, and is received by the set of BSs N'_g := N_g ∪ {g}.
- Interference due to neighboring cells is considered.
When symbol extensions over time and frequency are not allowed, IA entails obtaining matrices {V_j, U_j}_j that satisfy [31, 32]

U_k^H H_jg V_j = 0,   j ∉ U_g, k ∈ U_g    (55)
rank(U_k^H H_kg V_k) = d,   k ∈ U_g    (56)

The above problem is NP-hard and can be solved only approximately.
The proposed distributed rank-minimization formulation [33]:

min_{{V_j},{C_g},{D_gj}} Σ_{g=1}^{G} Σ_{i∈N_g} Σ_{j∈U_i} ‖H_jg V_j − C_g D_gj‖²_F
s.t.  V_k(1:d, 1:d) = I,   ∀ g, k ∈ U_g    (57)
[31]Cadambe, Viveck R and Jafar, Syed Ali. ”Interference alignment and degrees of freedom of the-user interference channel.” IEEE Trans. on
Information Theory, 2008, 3425–3441.
[32]Sridharan, Gokul and Yu, Wei. ”Linear Beamformer Design for Interference Alignment via Rank Minimization.” IEEE Trans. on Signal
Processing , 2015, 5910–5923.
[33]Sandeep Kumar and Ketan Rajawat, “Distributed Interference Alignment For MIMO Cellular Networks Via Consensus ADMM.” IEEE
GlobalSIP, 2016, Washington.
The number of interference-free dimensions is defined as η_1 − η_2, where
η_1 := {n | Σ_k σ_n(U_k^H H_gk V_k) > 10^{−2}} for all k, and
η_2 := {n | σ_n(U_k^H [R_g, {H_ng V_n}_{n∈U_g, n≠k}]) > 10^{−5}}.
Figure 8: Instances of network-wide alignment (fraction achieving SIR > 20, 40, and 60 dB) and interference-free dimensions vs. iterations
Distributed Stochastic
Non-Convex ADMM
Stochastic Non-Convex ADMM
In this work we are interested in minimizing the following optimization problem:

min_{x∈X} E_ξ[f(x, ξ)] + g(x)    (58)

where
- X is a compact convex set,
- f is the objective function and g is the regularizer,
- f(x) ≡ E_ξ[f(x, ξ)] is the expectation of the instantaneous function value f(x, ξ) with respect to ξ,
- ξ is a random observation belonging to the probability space (Ω, F, P), where the distribution P is unknown,
- it is assumed that a sequence of independent and identically distributed (i.i.d.) observations can be drawn from P,
- F^0 ⊂ F^1 ⊂ . . . is an increasing sequence of sub-σ-fields of F, where F^t = σ(ξ^1, ξ^2, . . . , ξ^t).
Stochastic Non-convex Consensus ADMM
Centralized formulation

minimize   E_ξ[f(x, ξ)] + g(z)
subject to x = z;   x, z ∈ X    (59)

where g(·) and E_ξ[f(x, ξ)] are non-convex, differentiable functions.

Distributed formulation

min_{x∈X}  g(z) + Σ_{k=1}^{K} E[f_k(x_k, ξ_k)]
s.t.  x_k = z,   ∀ k = 1, . . . , K;   z ∈ X    (60)

where g(·) is a convex and possibly non-differentiable function and E_ξ[f_k(x_k, ξ_k)] is a non-convex, smooth function with Lipschitz constant L_k for all k = 1, . . . , K. A toy sketch of the stochastic updates follows the reference below.
[36] Sandeep Kumar, Sandeep Gudla, and Ketan Rajawat “Distributed Stochastic Non-Convex ADMM ”, IEEE Trans. on Signal Proc.(To be
submitted).
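A toy sketch of one common stochastic-ADMM template for the x = z split above: the x-step uses a single sampled gradient of E_ξ[f(x, ξ)] plus a proximal term, the z-step applies the prox of g, and y is the dual. The instance f(x, ξ) = ½‖x − ξ‖² with g(z) = λ‖z‖₁, the step-size rule, and the noise model are all assumptions for illustration; this is not the exact update of the proposed algorithm.

```python
# A toy stochastic-ADMM template sketch for the split x = z, assuming
# f(x, xi) = 0.5*||x - xi||^2 with xi ~ N(a, 0.1^2 I) and g(z) = lam*||z||_1.
import numpy as np

rng = np.random.default_rng(4)
a = np.array([2.0, -0.3, 0.0, 1.0])
lam, rho, eta = 0.3, 1.0, 0.05

x = np.zeros(4); z = np.zeros(4); y = np.zeros(4)
for t in range(2000):
    xi = a + 0.1 * rng.standard_normal(4)       # one stochastic observation
    g_samp = x - xi                              # grad_x f(x, xi)
    # linearized x-step with proximal term (1/(2*eta))*||x - x_prev||^2
    x = (rho * z - y - g_samp + x / eta) / (rho + 1.0 / eta)
    # z-step: prox of the l1 regularizer
    v = x + y / rho
    z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
    y = y + rho * (x - z)                        # dual update

print("z estimate:", z)  # roughly soft-threshold(a, lam), componentwise
```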
Figure 9: Reconstructed image of Mars. ECD: SNR 15.4103; PCA: SNR 14.7448; NC-DSADMM: SNR 15.4754; NC-SADMM: SNR 15.9434.
[45] Schizas, Ioannis D., and Georgios B. Giannakis. “Covariance eigenvector sparsity for compression and denoising.” IEEE Transactions on
Signal Processing 60.5 (2012): 2408-2421.
Multidimensional Scaling
MDS
Given {δ_ij}, MDS finds an embedding X ∈ R^{N×m}, where the x_i ∈ R^m are coordinates that closely approximate the distances/dissimilarities:
- X̂ = arg min_{X∈R^{N×m}} σ(X) = Σ_{i<j} ω_ij (δ_ij − ‖x_i − x_j‖₂)²
- MDS as a dimensionality-reduction technique: F : d_{R^{m₀}}(i, j) → {x_1, . . . , x_n} ⊂ R^m   (m < m₀)
- MDS as a visualization technique: F : d_{R^m}(i, j) → {x_1, . . . , x_n} ⊂ R^m   (m = 2)
A classical SMACOF-style sketch is given after the reference below.
[34] T.F. Cox and M.A.A.Cox, Multidimensional Scaling, CRC/ Chapman and Hall, 2001.
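For reference, the sketch below is the classical unweighted SMACOF iteration (all ω_ij = 1), which majorizes the raw stress and minimizes the surrogate via the Guttman transform; it is a textbook illustration of MDS stress minimization, not the thesis implementation.

```python
# A minimal unweighted SMACOF sketch (all omega_ij = 1), using the classical
# Guttman-transform update; assumptions: noiseless distances, random init.
import numpy as np

def pairwise_dist(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def smacof(delta, m=2, iters=200, seed=0):
    n = delta.shape[0]
    X = np.random.default_rng(seed).standard_normal((n, m))
    for _ in range(iters):
        D = pairwise_dist(X)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D > 0, delta / D, 0.0)
        B = -ratio
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))
        X = B @ X / n                      # Guttman transform (majorizer minimum)
    return X

# Recover a 2-D configuration from its own (noiseless) distance matrix.
truth = np.random.default_rng(1).standard_normal((10, 2))
X_hat = smacof(pairwise_dist(truth))
print("final raw stress:", np.sum((pairwise_dist(truth) - pairwise_dist(X_hat)) ** 2) / 2)
```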
Relative velocity
- Noise-free relative velocity measurements between nodes i and j:

υ_ij^{(t)} = ‖v_i^{(t)} − v_j^{(t)}‖ cos θ_ij    (61)

- θ_ij is the angle between the relative velocity and the relative position.

Figure 10: Calculation of the angle θ
[39] Sandeep Kumar, R. Kumar and K. Rajawat, “Cooperative Localization of Mobile Networks Via Velocity-Assisted Multidimensional Scaling,”
in IEEE Trans. on Signal Proc., vol. 64, no. 7, April, 1, 2016.
[40] Sandeep Kumar and Ketan Rajawat, “Velocity-assisted multidimensional scaling.” IEEE 16th International Workshop on SPAWC, Stockholm, Sweden, 2015.
Velocity-assisted MDS (vMDS)
vMDS stress function

σ(X^{(t)}) = Σ_{i<j} ω_ij ( δ_ij^{(t)} − d_ij(X^{(t)}) )² + λ Σ_{i<j} ν_ij ( υ_ij^{(t)} − ‖v_i^{(t)} − v_j^{(t)}‖ cos θ )²    (62)

The vMDS problem is formulated as

X̂^{(t)} = arg min_{X^{(t)}} σ(X^{(t)})    (63)

- Relative velocity information is incorporated in the MDS framework.
- The vMDS stress function is non-convex.
- Minimization is done by a majorization approach.

A low-complexity, model-independent, dynamic localization algorithm with a convergence guarantee. The framework is also extended to a distributed implementation.
Figure 11: NRMSE of vMDS and EKF vs. R, compared with the CRLB (υmax = 0.03 m/s and 0.1 m/s)
Figure 12: Per-localization-instant run times of vMDS and EKF vs. the number of nodes (R = 0.4 and R = 1.4)
Stochastic Multidimensional Scaling(SMDS)
- Here we consider the MDS problem in a stochastic setting.
- The weights and the dissimilarities (distance measurements) are random variables with unknown distributions. Specifically, given {δ_mn(t)} and {w_mn(t)},

min_X σ̄(X) := Σ_{m<n} E[ w_mn(t) (δ_mn(t) − ‖x_m − x_n‖)² ].    (64)

- In the absence of distribution information, the expression for σ̄(X) cannot be evaluated in closed form, and the SMACOF algorithm cannot be applied.
- We have proposed an online version of the SMACOF algorithm.
- The algorithm is provably convergent, has linear complexity, and is simple to implement (a bare stochastic-gradient sketch is given after the reference below).
- Applications to network tracking, dynamic network visualization, and big-data embedding are explored.
[42] Ketan Rajawat, and Sandeep Kumar. “Stochastic Multidimensional Scaling”, Special issues on distributed information processing for social
networks, IEEE Trans. on Signal and Inf Proc. over Networks (Under review).
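As a bare-bones illustration of operating on one random pair per time instant, the sketch below runs plain stochastic gradient descent on the sampled stress term with unit weights; the step-size rule and noise model are assumptions, and the actual SMDS method of [42] is an online SMACOF (majorization) variant rather than plain SGD.

```python
# A bare stochastic-gradient sketch for the sampled stress term in (64):
# at each time t one random pair (m, n) reveals a noisy dissimilarity and
# only x_m, x_n are updated.  Weights, noise, and step sizes are assumed.
import numpy as np

rng = np.random.default_rng(3)
truth = rng.standard_normal((30, 2))          # unknown true configuration
X = rng.standard_normal((30, 2))              # running embedding estimate

for t in range(1, 20001):
    m, n = rng.choice(30, size=2, replace=False)
    delta = np.linalg.norm(truth[m] - truth[n]) + 0.01 * rng.standard_normal()
    diff = X[m] - X[n]
    d = np.linalg.norm(diff) + 1e-12
    g = -2.0 * (delta - d) * diff / d         # gradient of (delta - d)^2 w.r.t. x_m
    step = 0.5 / np.sqrt(t)                   # diminishing step size (assumed)
    X[m] = X[m] - step * g
    X[n] = X[n] + step * g                    # gradient w.r.t. x_n is the negative

D_true = np.linalg.norm(truth[:, None] - truth[None, :], axis=-1)
D_hat = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
print("final raw stress:", np.sum((D_true - D_hat) ** 2) / 2)
```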
PubChem data with 8 million data points
Conclusion
- The area of non-convex optimization is very fascinating (great challenges, great rewards).
- Non-convex optimization over networks is even more fascinating, with great opportunities.
- Discussed alternate-minimization and majorization-minimization based methods for non-convex optimization.
- Proposed deterministic and stochastic ADMM-based frameworks for non-convex optimization.
- Extended the MDS framework to dynamic and large-scale network applications and big-data visualization.
Thank You
References
[1]
Bertsekas, Dimitri P. Convex optimization theory. Belmont: Athena Scientific, 2009.
[2]
Loh, Po-Ling, and Martin J. Wainwright. “High-dimensional regression with noisy and missing data: Provable guarantees with
non-convexity.” Advances in Neural Information Processing Systems. 2011.
[3]
Huber, Peter J. “Robust methods of estimation of regression coefficients.” Statistics: A Journal of Theoretical and Applied Statistics
8.1 (1977): 41-53.
[4]
Jain, Prateek, Praneeth Netrapalli, and Sujay Sanghavi. “Low-rank matrix completion using alternating minimization.” Proceedings of
the forty-fifth annual ACM symposium on Theory of computing. ACM, 2013.
[5]
Vandenberghe, Lieven, and Stephen Boyd. “Semidefinite programming.” SIAM review 38.1 (1996): 49-95.
[6]
Mairal, Julien. “Incremental majorization-minimization optimization with application to large-scale machine learning.” SIAM Journal on
Optimization 25.2 (2015): 829-855.
[7]
Bianchi, Pascal, and Jérémie Jakubowicz. “Convergence of a multi-agent projected stochastic gradient algorithm for non-convex
optimization.” IEEE Transactions on Automatic Control 58.2 (2013): 391-405.
[8]
Candès, Emmanuel J., and Terence Tao. “The power of convex relaxation: Near-optimal matrix completion.” IEEE Transactions on
Information Theory 56.5 (2010): 2053-2080.
[9]
Davis, Damek. “The Asynchronous PALM Algorithm for Nonsmooth Nonconvex Problems.” arXiv preprint arXiv:1604.00526 (2016).
[10]
Ghadimi, Saeed, and Guanghui Lan. “Stochastic first-and zeroth-order methods for nonconvex stochastic programming.” SIAM Journal
on Optimization 23.4 (2013): 2341-2368.
[11]
Hong, Mingyi. ”Decomposing linearly constrained nonconvex problems by a proximal primal dual approach: Algorithms, convergence, and
applications.” arXiv preprint arXiv:1604.00543 (2016).
[12]
Hong, Mingyi. ”A distributed, asynchronous and incremental algorithm for nonconvex optimization: An ADMM based approach.” arXiv
preprint arXiv:1412.6058 (2014).
[13]
Chang, Tsung-Hui, Angelia Nedić, and Anna Scaglione. ”Distributed constrained optimization by consensus-based primal-dual
perturbation method.” IEEE Transactions on Automatic Control 59.6 (2014): 1524-1538.
[14]
Lobel, Ilan, and Asuman Ozdaglar. ”Distributed subgradient methods for convex optimization over random networks.” IEEE Transactions
on Automatic Control 56.6 (2011): 1291-1306.
[15]
Nedic, Angelia, Dimitri P. Bertsekas, and Vivek S. Borkar. ”Distributed asynchronous incremental subgradient methods.” (2000).
[16]
Duchi, John C., Alekh Agarwal, and Martin J. Wainwright. ”Dual averaging for distributed optimization: convergence analysis and
network scaling.” IEEE Transactions on Automatic Control 57.3 (2012): 592-606.
[17]
Chen, Jianshu, and Ali H. Sayed. ”Diffusion adaptation strategies for distributed optimization and learning over networks.” IEEE
Transactions on Signal Processing 60.8 (2012): 4289-4305.
[18]
Zhang, Ruiliang, and James T. Kwok. ”Asynchronous Distributed ADMM for Consensus Optimization.” ICML. 2014.
[19]
Richtárik, Peter, and Martin Takáč. ”Distributed coordinate descent method for learning with big data.” (2013).
[20]
Dekel, Ofer, et al. ”Optimal distributed online prediction using mini-batches.” Journal of Machine Learning Research 13.Jan (2012):
165-202.
[21]
Boyd, Stephen, et al. ”Distributed optimization and statistical learning via the alternating direction method of multipliers.” Foundations
and Trends in Machine Learning 3.1 (2011): 1-122.
[22]
Boyd, Stephen, et al. ”Gossip algorithms: Design, analysis and applications.” Proceedings IEEE 24th Annual Joint Conference of the
IEEE Computer and Communications Societies.. Vol. 3. IEEE, 2005.
[23]
Sayed, Ali H. “Adaptation, learning, and optimization over networks.” Foundations and Trends in Machine Learning 7.4-5 (2014):
311-801.
[24]
M. Akbari; B. Gharesifard; T. Linder,“Distributed Online Convex Optimization on Time-Varying Directed Graphs,” in IEEE Transactions
on Control of Network Systems , vol.PP, no.99, 2015
[25]
Bertsekas, Dimitri P., and John N. Tsitsiklis. “Parallel and distributed computation: numerical methods. Vol. 23. Englewood Cliffs, NJ:
Prentice hall, 1989.
[26]
Parikh, Neal, and Stephen P. Boyd. “Proximal Algorithms.” Foundations and Trends in optimization 1.3 (2014): 127-239.
[27]
Nedic, Angelia. “Asynchronous broadcast-based convex optimization over a network.” IEEE Transactions on Automatic Control 56.6
(2011): 1337-1351.
[28]
Sandeep Kumar, R. Jain, and K. Rajawat “Asynchronous Optimization Over Heterogeneous Networks via Consensus ADMM” IEEE
Trans. on Signal and Inf Proc. over Networks, vol. 64, 2016
[29]
Nedic, Angelia, and Asuman Ozdaglar. “cooperative distributed multi-agent.” Convex Optimization in Signal Processing and
Communications 340 (2010)
[30]
Simonetto, Andrea, Leon Kester, and Geert Leus. “Distributed Time-Varying Stochastic Optimization and Utility-based
Communication.” arXiv preprint arXiv:1408.5294 (2014).
[31]
Cadambe, Viveck R and Jafar, Syed Ali. ”Interference alignment and degrees of freedom of the-user interference channel.” IEEE Trans.
on Information Theory, 2008, 3425–3441.
[32]
Sridharan, Gokul and Yu, Wei. ”Linear Beamformer Design for Interference Alignment via Rank Minimization.” IEEE Trans. on Signal
Processing, 2015, 5910–5923.
[33]
Sandeep Kumar and Ketan Rajawat, “Distributed Interference Alignment For MIMO Cellular Networks Via Consensus ADMM.” IEEE
GlobalSIP, 2016, Washington.
[34]
T.F. Cox and M.A.A.Cox, Multidimensional Scaling, CRC/ Chapman and Hall, 2001.
[35]
I. Borg, and P. Groenen. modern multidimensional scaling theory and applications. Springer, New York, (2005)
[36]
Sandeep Kumar, and Ketan Rajawat “Distributed Stochastic Non-Convex ADMM ”, IEEE Trans. on Signal Proc.(To be submitted).
[37]
Simonetto, Andrea, and Geert Leus. “Distributed maximum likelihood sensor network localization.” IEEE Transactions on Signal
Processing 62.6 (2014): 1424-1437.
[38]
Costa, Jose A., Neal Patwari, and Alfred O. Hero III. “Distributed weighted-multidimensional scaling for node localization in sensor
networks.” ACM Transactions on Sensor Networks (TOSN) 2.1 (2006): 39-64.
[39]
Sandeep Kumar, R. Kumar and K. Rajawat, “Cooperative Localization of Mobile Networks Via Velocity-Assisted Multidimensional
Scaling,” in IEEE Trans. on Signal Proc., vol. 64, no. 7, April, 1, 2016.
[40]
Sandeep Kumar and Ketan Rajawat,“Velocity-assisted multidimensional scaling .” IEEE 16th International Workshop on SPAWC,
Stockholm, Sweden, 2015.
[41]
Sayed, Ali H. Adaptive filters. John Wiley & Sons, 2011.
[42]
Ketan Rajawat, and Sandeep Kumar. “Stochastic Multidimensional Scaling”, Special issues on distributed information processing for
social networks, IEEE Trans. on Signal and Inf Proc. over Networks (Under review).
[43]
Netrapalli, Praneeth, et al. “Non-convex robust PCA.” Advances in Neural Information Processing Systems. 2014.
[44]
Huang, Feihu, Songcan Chen, and Zhaosong Lu.“Stochastic Alternating Direction Method of Multipliers with Variance Reduction for
Nonconvex Optimization.” arXiv preprint arXiv:1610.02758 (2016).
[45]
Schizas, Ioannis D., and Georgios B. Giannakis. “Covariance eigenvector sparsity for compression and denoising.” IEEE Transactions on
Signal Processing 60.5 (2012): 2408-2421.