Semidefinite Programming
OR 6327 Spring 2012
Lecture 27, May 1, 2012
Scribe: Daniel Fleischman

1 The Sparse Covariance Selection Problem
Suppose we have $n$ random variables, and we want to detect their dependence on each other as well as estimate their mean and covariance parameters.
Note that we may (wrongly) detect a dependence. Suppose variables $x_1$ and $x_2$ are (conditionally) independent of each other, but $x_1$ and $x_3$ are dependent, $x_3$ and $x_4$ are dependent, and $x_4$ and $x_2$ are dependent. Then $x_1$ and $x_2$ will typically be correlated through the chain $x_1 \to x_3 \to x_4 \to x_2$, even though they are conditionally independent. (Henceforth we'll use lower-case letters for random variables to avoid confusion with matrices.)
We will assume that the variables are multivariate normal, parametrized by mean $\mu$ and covariance matrix $\Sigma$ ($\Sigma$ pd). Then the density is
\[
f(x) = c_1 \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}
\]
for some normalizing constant $c_1$.
Let $y = (x_i; x_j)$ consist of two of the variables and let $K = \{1, 2, \dots, n\} \setminus \{i, j\}$. Then the density of $y$ given $x_K = \bar{x}_K$ is
\[
g(y) = c_2 \exp\left\{ -\frac{1}{2} (y - m)^T \left(\Sigma^{-1}\right)_{\{i,j\},\{i,j\}} (y - m) \right\}
\]
for some $m$ and $c_2$. Hence, $x_i$ and $x_j$ are independent given $x_K = \bar{x}_K$ if and only if $(\Sigma^{-1})_{ij} = 0$.
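To see the distinction concretely, consider the chain example above. The following small NumPy illustration (my addition, not from the lecture) builds a precision matrix $\Sigma^{-1}$ whose only off-diagonal nonzeros are the chain edges $x_1$-$x_3$, $x_3$-$x_4$, $x_4$-$x_2$; the $(1,2)$ entry of $\Sigma^{-1}$ is zero, yet the $(1,2)$ entry of $\Sigma$ is not, so $x_1$ and $x_2$ are marginally correlated while being conditionally independent.

```python
import numpy as np

# Precision matrix (inverse covariance) for (x1, x2, x3, x4): off-diagonal
# nonzeros only on the chain edges x1-x3, x3-x4, x4-x2. It is symmetric and
# strictly diagonally dominant, hence positive definite.
P = np.array([[ 2.0,  0.0, -0.8,  0.0],
              [ 0.0,  2.0,  0.0, -0.8],
              [-0.8,  0.0,  2.0, -0.8],
              [ 0.0, -0.8, -0.8,  2.0]])
Sigma = np.linalg.inv(P)

print(P[0, 1])      # 0.0: x1 and x2 are conditionally independent given x3, x4
print(Sigma[0, 1])  # nonzero: x1 and x2 are nonetheless marginally correlated
```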
We sample $N$ times to get $x_1, \dots, x_N \in \mathbb{R}^n$ and compute the sample mean $\bar{x} = (x_1 + \dots + x_N)/N$ and the sample covariance $\hat{\Sigma} = ((x_1 - \bar{x})(x_1 - \bar{x})^T + \dots + (x_N - \bar{x})(x_N - \bar{x})^T)/N$, and use maximum likelihood. The likelihood function given the data is:
\[
\ell(\mu, \Sigma) := c_3 \det(\Sigma)^{-N/2} \prod_{k=1}^{N} \exp\left\{ -\frac{1}{2} (x_k - \mu)^T \Sigma^{-1} (x_k - \mu) \right\}
\]
for yet another normalizing constant $c_3$. For any pd $\Sigma$ this is maximized over $\mu$ by $\mu = \bar{x}$. So we want to maximize:
\[
\hat{\ell}(\Sigma) := \ell(\bar{x}, \Sigma) = c_3 \det(\Sigma)^{-N/2} \exp\left\{ -\frac{1}{2}\, \Sigma^{-1} \bullet N \hat{\Sigma} \right\} = \exp\left\{ c_4 - \frac{N}{2} \ln\det \Sigma - \frac{N}{2}\, \Sigma^{-1} \bullet \hat{\Sigma} \right\}.
\]
Since the exponential function is monotone, we want to maximize its argument. This is not concave in $\Sigma$. So we change variables: let $X = \Sigma^{-1}$. Then we want to minimize $-\ln\det X + \hat{\Sigma} \bullet X$, which is convex.
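As a quick check on this change of variables (my addition): setting the gradient $-X^{-1} + \hat{\Sigma}$ of the new objective to zero gives $X = \hat{\Sigma}^{-1}$, i.e., without a sparsity penalty the minimizer is just the inverse sample covariance. A minimal NumPy sketch, assuming the samples form the rows of a data matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 4))           # N = 500 samples of n = 4 variables
x_bar = data.mean(axis=0)                      # sample mean
centered = data - x_bar
Sigma_hat = centered.T @ centered / len(data)  # sample covariance (divide by N)

def f(X):
    """The convex objective -lndet X + Sigma_hat . X."""
    return -np.linalg.slogdet(X)[1] + np.sum(Sigma_hat * X)

X_star = np.linalg.inv(Sigma_hat)                # stationary point of f
print(f(X_star) < f(X_star + 0.01 * np.eye(4)))  # True: perturbing increases f
```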
Parametrizing our optimization by the inverse covariance has another advantage, since this is the matrix we expect (or hope) to be sparse. We will penalize the presence of nonzeros in $X$ by adding an "$\ell_1$" term: $|||X|||_1 := ||\mathrm{vec}(X)||_1$. So, at the end, we want to solve:
\[
\min_{X \succ 0} \; \underbrace{-\ln\det X + \hat{\Sigma} \bullet X}_{\text{smooth}} \; + \; \underbrace{\rho\, |||X|||_1}_{\text{nice nonsmooth convex}}.
\]
We will use variable splitting and the alternating direction augmented Lagrangian method.
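Before deriving the method, note that on small instances the whole problem can be prototyped in a generic convex modeling tool, which is convenient for checking a custom solver. A hedged sketch using cvxpy (my addition; it assumes Sigma_hat and rho are already defined):

```python
import cvxpy as cp

n = Sigma_hat.shape[0]             # Sigma_hat: the sample covariance, assumed given
X = cp.Variable((n, n), PSD=True)  # symmetric positive semidefinite matrix variable
objective = cp.Minimize(-cp.log_det(X)
                        + cp.trace(Sigma_hat @ X)
                        + rho * cp.norm1(cp.vec(X)))
cp.Problem(objective).solve()
print(X.value)   # entries near zero suggest conditional independence
```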
2 The Method
Reformulate as:
\[
\begin{array}{ll}
\min & -\ln\det X + \hat{\Sigma} \bullet X + \rho\, |||Y|||_1 \\
\text{s.t.} & X - Y = 0, \quad X \succ 0,
\end{array}
\]
and consider the augmented Lagrangian:
\[
L(X, Y, S) := -\ln\det X + \hat{\Sigma} \bullet X + \rho\, |||Y|||_1 - S \bullet (X - Y) + \frac{\beta}{2} ||X - Y||_F^2
\]
for fixed $\beta > 0$.
The idea is to start with $X_0 = Y_0 = I$ and $S_0 = 0$ and in each step compute $(X_{k+1}, Y_{k+1}) = \arg\min_{X,Y} L(X, Y, S_k)$ and then $S_{k+1} := S_k - \beta (X_{k+1} - Y_{k+1})$.
But it is more efficient to minimize separately over $X$ and $Y$, to get:
• Start with $X_0 = Y_0 = I$ and $S_0 = 0$.
• In iteration $k$ do:
  – $X_{k+1} = \arg\min_X L(X, Y_k, S_k)$
  – $Y_{k+1} = \arg\min_Y L(X_{k+1}, Y, S_k)$
  – $S_{k+1} = S_k - \beta (X_{k+1} - Y_{k+1})$.
The two optimization steps are given by the following two propositions. First note that:
\[
\begin{aligned}
L(X, Y, S) &= -\ln\det X - (S - \hat{\Sigma}) \bullet X + \frac{\beta}{2} ||X - Y||_F^2 + h(Y, S) \\
&= -\ln\det X + \frac{\beta}{2} \Big|\Big| X - \Big( Y + \frac{1}{\beta}(S - \hat{\Sigma}) \Big) \Big|\Big|_F^2 + h'(Y, S)
\end{aligned}
\]
where $h(Y, S)$ and $h'(Y, S)$ are functions independent of $X$.
Proposition 1 Let $B := Y_k + \frac{1}{\beta}(S_k - \hat{\Sigma})$ have eigenvalue decomposition $U \Lambda U^T$. Then the minimizer of $-\ln\det X + \frac{\beta}{2} ||X - B||_F^2$ is $X = U D U^T$, where $D = \mathrm{Diag}(d)$ and $d_i = \frac{1}{2}\left( \lambda_i + \sqrt{\lambda_i^2 + 4\beta^{-1}} \right)$ for all $i$.
Proof: The objective function is convex and smooth, with derivative $-X^{-1} + \beta(X - B)$, so it is minimized when $\beta X - X^{-1} = \beta B$.
Let $X = U D U^T$; then by premultiplying by $U^T$ and postmultiplying by $U$ we get $\beta D - D^{-1} = \beta \Lambda$. This is solved by a diagonal matrix $D = \mathrm{Diag}(d)$ with $\beta d_i - d_i^{-1} = \beta \lambda_i$ for all $i$, with roots as given above. □
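In code, this $X$-update is one symmetric eigendecomposition plus a scalar formula per eigenvalue. A minimal NumPy sketch (my illustration; the function name is hypothetical):

```python
import numpy as np

def x_update(Y, S, Sigma_hat, beta):
    """Minimize -lndet X + (beta/2)||X - B||_F^2 for B = Y + (S - Sigma_hat)/beta,
    as in Proposition 1."""
    B = Y + (S - Sigma_hat) / beta
    lam, U = np.linalg.eigh(B)                      # B = U Diag(lam) U^T
    d = 0.5 * (lam + np.sqrt(lam**2 + 4.0 / beta))  # positive root of beta*d - 1/d = beta*lam
    return (U * d) @ U.T                            # X = U Diag(d) U^T
```

Note that $d_i > 0$ for every $i$, so the iterate $X_{k+1}$ is automatically positive definite.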
Now, for the minimization problem over $Y$, note that:
\[
\begin{aligned}
L(X, Y, S) &= -S \bullet (X - Y) + \rho\, |||Y|||_1 + \frac{\beta}{2} ||X - Y||_F^2 + g(X, S) \\
&= \rho\, |||Y|||_1 + \frac{\beta}{2} \Big|\Big| Y - \Big( X - \frac{1}{\beta} S \Big) \Big|\Big|_F^2 + g'(X, S)
\end{aligned}
\]
for $g(X, S)$ and $g'(X, S)$ independent of $Y$.
So we have:
Proposition 2 Let $C = \beta X_{k+1} - S_k$. Then the minimizer of $\rho\, |||Y|||_1 + \frac{\beta}{2} ||Y - \frac{1}{\beta} C||_F^2$ is $Y = \frac{1}{\beta}\left( C - P_{B_\infty^\rho}(C) \right)$, where $B_\infty^\rho = \{ Z \in \mathcal{S}^n : |z_{ij}| \le \rho \ \forall i, j \}$ and $P_{B_\infty^\rho}$ is the projection onto that set.
Proof: Note that $C - P_{B_\infty^\rho}(C) = D$, where $d_{ij}$ is $0$ if $-\rho \le c_{ij} \le \rho$, is $c_{ij} + \rho$ if $c_{ij} < -\rho$, and is $c_{ij} - \rho$ if $c_{ij} > \rho$.
The objective function separates completely over each entry of $Y$ and $C$, so we want:
\[
\min_{w_{ij}} \; \rho |w_{ij}| + \frac{\beta}{2} \left( w_{ij} - \frac{1}{\beta} c_{ij} \right)^2.
\]
This is convex, and its subdifferential is:
\[
\partial f(w_{ij}) = \begin{cases} \{ \rho + (\beta w_{ij} - c_{ij}) \} & \text{if } w_{ij} > 0, \\ \{ -\rho + (\beta w_{ij} - c_{ij}) \} & \text{if } w_{ij} < 0, \\ [-\rho, \rho] + \{ \beta w_{ij} - c_{ij} \} & \text{if } w_{ij} = 0. \end{cases}
\]
In all cases we see that $Y$ as in the statement of the proposition has entry $w_{ij}$ with a subgradient of $0$. □
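Combining the two propositions gives the complete method. The sketch below (my illustration; the function name, default parameters, and fixed iteration count are not from the notes) reuses x_update from the sketch after Proposition 1; the $Y$-update is entrywise soft-thresholding, i.e., $\frac{1}{\beta}(C - P_{B_\infty^\rho}(C))$.

```python
import numpy as np

def y_update(X_new, S, rho, beta):
    """Minimize rho*|||Y|||_1 + (beta/2)||Y - C/beta||_F^2 for C = beta*X_new - S,
    as in Proposition 2: subtract the projection of C onto [-rho, rho] entrywise."""
    C = beta * X_new - S
    return (C - np.clip(C, -rho, rho)) / beta

def sparse_inv_cov(Sigma_hat, rho, beta=1.0, iters=500):
    """Alternating direction augmented Lagrangian method from these notes:
    start at X_0 = Y_0 = I, S_0 = 0, then alternate the three updates."""
    n = Sigma_hat.shape[0]
    X = np.eye(n)
    Y = np.eye(n)
    S = np.zeros((n, n))
    for _ in range(iters):
        X = x_update(Y, S, Sigma_hat, beta)  # Proposition 1 (sketch above)
        Y = y_update(X, S, rho, beta)        # Proposition 2
        S = S - beta * (X - Y)               # multiplier update
    return Y
```

The $Y$ iterate is returned because the soft-thresholding step produces exact zeros, so the estimated sparsity pattern of the inverse covariance is read off from $Y$ rather than $X$.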