MATRIX DECOMPOSITION BY CONVEX PROGRAMMING
JON LEE, KY VU
Date: December 12, 2014.
Assume that a matrix $C \in \mathbb{R}^{m \times n}$ is the sum of $k$ unknown matrices $A_1^*, \dots, A_k^*$. We know in advance that the image of each $A_i^*$ under an associated linear map $f_i$ is a low-rank matrix. We would like to find conditions under which we can recover these matrices exactly, in particular by solving the convex optimization problem
\[
\min \ \sum_{i=1}^{k} \lambda_i \| f_i(A_i) \| \quad \text{s.t.} \quad A_1 + \dots + A_k = C,
\]
for properly selected weights $\lambda_i > 0$. The norm $\| \cdot \|$ is chosen to be the nuclear norm, which is the best convex surrogate of the rank function.
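This program is straightforward to prototype with an off-the-shelf convex solver. The following is a minimal CVXPY sketch of the program above, assuming purely for illustration that $k = 2$, $f_1$ is the identity, $f_2(X) = MX$ for a hypothetical fixed matrix $M$, and $\lambda_1 = \lambda_2 = 1$:

```python
# Minimal CVXPY prototype of the decomposition program above.
# Illustrative assumptions: k = 2, f1 = identity, f2(X) = M @ X for a
# hypothetical fixed matrix M, and unit weights lambda_1 = lambda_2 = 1.
import cvxpy as cp
import numpy as np

m, n = 10, 8
rng = np.random.default_rng(0)
C = rng.standard_normal((m, n))      # observed sum; in practice C = A1* + A2*
M = rng.standard_normal((m, m))      # defines the linear map f2(X) = M @ X

A1 = cp.Variable((m, n))
A2 = cp.Variable((m, n))
objective = cp.Minimize(cp.normNuc(A1) + cp.normNuc(M @ A2))
problem = cp.Problem(objective, [A1 + A2 == C])
problem.solve()
print(problem.status, problem.value)
```

CVXPY reformulates the nuclear-norm objective as a semidefinite program, so a sketch like this is practical only for moderate dimensions.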
1. Matrix seminorms induced by linear transformations
Let $f$ be a linear transformation from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^{p \times q}$. For any matrix norm $\| \cdot \|$, we define
\[
\| A \|_f := \| f(A) \|.
\]
We immediately have the following property:

Lemma 1.1. $\| \cdot \|_f$ defines a matrix seminorm. Furthermore, if $f$ is injective, then $\| \cdot \|_f$ defines a matrix norm.
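For example, with the nuclear norm and the illustrative map $f(A) = MA$ (the matrix $M$ below is an arbitrary stand-in, not part of the setup above), $\| A \|_f$ is directly computable, and the seminorm axioms can be checked numerically:

```python
# Numerical evaluation of ||A||_f = ||f(A)||_* for an illustrative map
# f(A) = M @ A; M is an arbitrary example, not part of the paper's setup.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = rng.standard_normal((5, 4))

def seminorm_f(A, M):
    # Nuclear norm of f(A): sum of singular values of M @ A.
    return np.linalg.norm(M @ A, ord="nuc")

# Seminorm properties: absolute homogeneity and the triangle inequality.
B = rng.standard_normal((5, 4))
assert np.isclose(seminorm_f(2.0 * A, M), 2.0 * seminorm_f(A, M))
assert seminorm_f(A + B, M) <= seminorm_f(A, M) + seminorm_f(B, M) + 1e-9
```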
Since a seminorm is a convex function, we would like to know what its subgradients look like. In the following, we concentrate on the nuclear norm, which is denoted by $\| \cdot \|_*$. By the transformation property of subgradients, for any matrix $A$ we have
\[
\partial \| A \|_{*f} = f^t\, \partial \| f(A) \|_*.
\]
Let $U \Sigma V^t$ be the singular value decomposition of $f(A)$, where $U \in \mathbb{R}^{p \times k}$, $V \in \mathbb{R}^{q \times k}$ satisfy $U^t U = V^t V = I_{k \times k}$ and $\Sigma \in \mathbb{R}^{k \times k}$ is a diagonal matrix with positive entries. Let $P_U := U U^t$ and $P_V := V V^t$; then we have
\[
\partial \| f(A) \|_* = \{ U V^t + W : \| W \|_{\sim} \le 1,\ P_U W = 0,\ W P_V = 0 \},
\]
where $\| \cdot \|_{\sim}$ is the operator norm. Denote by
\[
\Omega := \{ U X^t + Y V^t \mid X \in \mathbb{R}^{q \times k} \text{ and } Y \in \mathbb{R}^{p \times k} \}.
\]
Then we can write the above subgradient in a more convenient form:
\[
\partial \| f(A) \|_* = \{ U V^t + W : \| W \|_{\sim} \le 1,\ W \in \Omega^\perp \}.
\]
Note that $U V^t \in \Omega$; therefore a matrix $Q \in \partial \| f(A) \|_*$ if and only if
\[
P_\Omega(Q) = U V^t \quad \text{and} \quad \| P_{\Omega^\perp}(Q) \|_{\sim} \le 1.
\]
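These two membership conditions are easy to test numerically. The sketch below checks them for a random low-rank matrix (playing the role of $f(A)$), using the standard closed form $P_\Omega(Z) = P_U Z + Z P_V - P_U Z P_V$ for the projection onto $\Omega$; this closed form is a known identity added here for convenience, not stated above:

```python
# Numerical check of the membership conditions P_Omega(Q) = U V^t and
# ||P_{Omega^perp}(Q)||_op <= 1 for Q = U V^t + W with W in Omega^perp.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))  # rank-2 example
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))
U, V = U[:, :r], Vt[:r, :].T
PU, PV = U @ U.T, V @ V.T

def P_Omega(Z):
    # Closed-form projection onto Omega = {U X^t + Y V^t}.
    return PU @ Z + Z @ PV - PU @ Z @ PV

W = rng.standard_normal(A.shape)
W = W - P_Omega(W)                     # push W into Omega^perp
W = 0.5 * W / np.linalg.norm(W, 2)     # operator norm 0.5 < 1

Q = U @ V.T + W
assert np.allclose(P_Omega(Q), U @ V.T)
assert np.linalg.norm(Q - P_Omega(Q), 2) <= 1.0

# Consequently Q is a nuclear-norm subgradient at A (the case f = identity):
for _ in range(5):
    D = rng.standard_normal(A.shape)
    lhs = np.linalg.norm(A + D, ord="nuc")
    rhs = np.linalg.norm(A, ord="nuc") + np.sum(Q * D)
    assert lhs >= rhs - 1e-9
```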
From the above arguments, we have the following proposition:
Proposition 1.2. Let $X, Q \in \mathbb{R}^{m \times n}$ and let $f$ be a linear transformation from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^{p \times q}$. Assume that $U \Sigma V^t$ is the SVD of $f(X)$ with $\Sigma \in \mathbb{R}^{k \times k}$, and
\[
\Omega_f := \{ U Y_1^t + Y_2 V^t \mid Y_1 \in \mathbb{R}^{q \times k} \text{ and } Y_2 \in \mathbb{R}^{p \times k} \}.
\]
Then $Q \in \partial \| X \|_{*f}$ if and only if there is a matrix $R \in \mathbb{R}^{p \times q}$ such that $Q = f^t(R)$,
\[
P_{\Omega_f}(R) = U V^t \quad \text{and} \quad \| P_{\Omega_f^\perp}(R) \|_{\sim} \le 1.
\]
2. A sufficient condition for unique decomposition
Assume that $A_1^*, A_2^*, \dots, A_k^* \in \mathbb{R}^{m \times n}$ are such that
\[
\sum_{i=1}^{k} A_i^* = C,
\]
and for each $i = 1, 2, \dots, k$, let $f_i$ be a linear transformation from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^{p_i \times q_i}$. We would like to find conditions so that $(A_1^*, \dots, A_k^*)$ is the unique solution of the following convex program:
\[
\min \ \sum_{i=1}^{k} \lambda_i \| f_i(A_i) \| \quad \text{s.t.} \quad A_1 + \dots + A_k = C, \tag{1}
\]
for some properly selected weights $\lambda_i > 0$.

For each $i$, consider the SVD of $f_i(A_i^*)$, which is given by
\[
f_i(A_i^*) = U_i \Sigma_i V_i^t,
\]
where $U_i \in \mathbb{R}^{p_i \times k_i}$, $V_i \in \mathbb{R}^{q_i \times k_i}$ satisfy $U_i^t U_i = V_i^t V_i = I_{k_i}$ and $\Sigma_i \in \mathbb{R}^{k_i \times k_i}$ is a diagonal matrix with positive entries. We then denote
\[
\Omega_{f_i} := \{ U_i X^t + Y V_i^t \mid X \in \mathbb{R}^{q_i \times k_i},\ Y \in \mathbb{R}^{p_i \times k_i} \},
\]
and
\[
\Omega_{f_i}^* := f_i^{-1} [\Omega_{f_i} \cap \mathrm{Im}(f_i)].
\]
The following proposition provides a sufficient condition for the uniqueness of solutions of problem (1):

Proposition 2.1. Let $A_i^*, f_i, U_i, V_i, \Omega_{f_i}$, $i = 1, \dots, k$, be defined as above. Then $(A_1^*, \dots, A_k^*)$ is the unique solution of problem (1) if the following conditions hold:
(1) $\sum_{i=1}^{k} \Omega_{f_i}^* = \bigoplus_{i=1}^{k} \Omega_{f_i}^*$, and
(2) there are matrices $Q^*, R_1^*, R_2^*, \dots, R_k^*$ such that for each $i = 1, 2, \dots, k$:
\[
Q^* = f_i^t(R_i^*), \qquad P_{\Omega_{f_i}}(R_i^*) = \lambda_i U_i V_i^t \qquad \text{and} \qquad \| P_{\Omega_{f_i}^\perp}(R_i^*) \|_{\sim} < \lambda_i.
\]
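Remark. Condition (2) can be tested directly for candidate certificates. The sketch below is a minimal numerical check under the simplifying assumption, made only for illustration, that every $f_i$ is the identity map, so that $f_i^t(R_i^*) = R_i^*$ and each $R_i^*$ must equal $Q^*$; the projections use the closed form for $P_{\Omega_{f_i}}$ noted in Section 1.

```python
# Check condition (2) of Proposition 2.1 numerically, assuming f_i = identity
# for all i (then R_i^* = Q^* and the projections use the tangent space of A_i^*).
import numpy as np

def tangent_projector(Ai):
    U, s, Vt = np.linalg.svd(Ai, full_matrices=False)
    r = int(np.sum(s > 1e-10))
    U, V = U[:, :r], Vt[:r, :].T
    PU, PV = U @ U.T, V @ V.T
    P = lambda Z: PU @ Z + Z @ PV - PU @ Z @ PV
    return P, U @ V.T

def certificate_holds(Q, A_list, lam, tol=1e-8):
    for Ai, li in zip(A_list, lam):
        P, UVt = tangent_projector(Ai)
        if not np.allclose(P(Q), li * UVt, atol=tol):
            return False
        if np.linalg.norm(Q - P(Q), 2) >= li:   # strict operator-norm bound
            return False
    return True
```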
Proof. Assume that all the above hypotheses are satisfied. We first show that $A^* = (A_1^*, \dots, A_k^*)$ is an optimum of (1). By checking the subgradient optimality conditions at $(A_1^*, \dots, A_k^*)$, it is sufficient to show that there exists a dual $Q$ such that
\[
Q \in \lambda_i \, \partial \| A_i^* \|_{*f_i} \quad \text{for all } 1 \le i \le k. \tag{2}
\]
It is easy to see that the matrix $Q^*$ from the second hypothesis satisfies all these conditions, due to Proposition 1.2. Therefore, $A^*$ is an optimum of (1).
Now, assume for contradiction that $A^* + N = (A_1^* + N_1, \dots, A_k^* + N_k)$ is another feasible solution of (1). It follows immediately that $N_1 + \dots + N_k = 0$.

Denote by $F(A) := \lambda_1 \| f_1(A_1) \|_* + \dots + \lambda_k \| f_k(A_k) \|_*$ a function of $A = (A_1, \dots, A_k)$. Let $(Q_1, \dots, Q_k)$ be an arbitrary subgradient of $F(\cdot)$ at $(A_1^*, \dots, A_k^*)$; then by definition we have
\[
F(A^* + N) \ge F(A^*) + \sum_{i=1}^{k} \langle Q_i, N_i \rangle. \tag{3}
\]
By Proposition 1.2, for each $1 \le i \le k$ there is a matrix $R_i$ such that $Q_i = f_i^t(R_i)$ and
\[
P_{\Omega_{f_i}}(R_i) = \lambda_i U_i V_i^t, \qquad \| P_{\Omega_{f_i}^\perp}(R_i) \|_{\sim} \le \lambda_i, \tag{4}
\]
so we have
\begin{align*}
\langle Q_i, N_i \rangle &= \langle f_i^t(R_i), N_i \rangle = \langle R_i, f_i(N_i) \rangle \\
&= \langle \lambda_i U_i V_i^t + P_{\Omega_{f_i}^\perp}(R_i),\ f_i(N_i) \rangle \\
&= \langle R_i^* - P_{\Omega_{f_i}^\perp}(R_i^*) + P_{\Omega_{f_i}^\perp}(R_i),\ f_i(N_i) \rangle,
\end{align*}
where the last equality holds because $Q^* \in \lambda_i \, \partial \| A_i^* \|_{*f_i}$ implies $P_{\Omega_{f_i}}(R_i^*) = \lambda_i U_i V_i^t$, i.e. $\lambda_i U_i V_i^t = R_i^* - P_{\Omega_{f_i}^\perp}(R_i^*)$.
Therefore
\begin{align*}
\sum_{i=1}^{k} \langle Q_i, N_i \rangle
&= \sum_{i=1}^{k} \langle R_i^* - P_{\Omega_{f_i}^\perp}(R_i^*) + P_{\Omega_{f_i}^\perp}(R_i),\ f_i(N_i) \rangle \\
&= \sum_{i=1}^{k} \langle P_{\Omega_{f_i}^\perp}(R_i) - P_{\Omega_{f_i}^\perp}(R_i^*),\ f_i(N_i) \rangle + \sum_{i=1}^{k} \langle R_i^*, f_i(N_i) \rangle \\
&= \sum_{i=1}^{k} \langle P_{\Omega_{f_i}^\perp}(R_i) - P_{\Omega_{f_i}^\perp}(R_i^*),\ f_i(N_i) \rangle + \sum_{i=1}^{k} \langle Q^*, N_i \rangle \\
&= \sum_{i=1}^{k} \langle P_{\Omega_{f_i}^\perp}(R_i) - P_{\Omega_{f_i}^\perp}(R_i^*),\ f_i(N_i) \rangle + \Big\langle Q^*, \sum_{i=1}^{k} N_i \Big\rangle \\
&= \sum_{i=1}^{k} \langle P_{\Omega_{f_i}^\perp}(R_i) - P_{\Omega_{f_i}^\perp}(R_i^*),\ f_i(N_i) \rangle \qquad \Big(\text{since } \sum_{i=1}^{k} N_i = 0\Big) \\
&= \sum_{i=1}^{k} \langle P_{\Omega_{f_i}^\perp}(R_i) - P_{\Omega_{f_i}^\perp}(R_i^*),\ P_{\Omega_{f_i}^\perp}(f_i(N_i)) \rangle.
\end{align*}
Note that we are free to choose the $R_i$, as long as they satisfy conditions (4). For each $i = 1, \dots, k$, consider the SVD of $P_{\Omega_{f_i}^\perp}(f_i(N_i))$:
\[
P_{\Omega_{f_i}^\perp}(f_i(N_i)) = \tilde{U}_i \tilde{\Sigma}_i \tilde{V}_i^t. \tag{5}
\]
We choose $R_i = \lambda_i U_i V_i^t + \lambda_i \tilde{U}_i \tilde{V}_i^t$. In order to show that $R_i$ satisfies the conditions in (4), it is sufficient to verify that $\tilde{U}_i \tilde{V}_i^t \in \Omega_{f_i}^\perp$ (it is obvious that $\| \tilde{U}_i \tilde{V}_i^t \|_{\sim} = 1$).
Since $\tilde{U}_i \tilde{\Sigma}_i \tilde{V}_i^t \in \Omega_{f_i}^\perp$, it follows that
\[
\langle \tilde{U}_i \tilde{\Sigma}_i \tilde{V}_i^t, W \rangle = 0 \quad \text{for all } W \in \Omega_{f_i}.
\]
Therefore, by Lemma 6.1 in the Appendix,
\[
P_{U_i} \tilde{U}_i \tilde{\Sigma}_i \tilde{V}_i^t = 0 \quad \text{and} \quad \tilde{U}_i \tilde{\Sigma}_i \tilde{V}_i^t P_{V_i} = 0,
\]
where $P_{U_i} := U_i U_i^t$ and $P_{V_i} := V_i V_i^t$ (note that $U_i, V_i$ come from the SVD of $f_i(A_i^*)$). Then we have
\[
P_{U_i} (\tilde{U}_i \tilde{V}_i^t) = (P_{U_i} \tilde{U}_i) \tilde{V}_i^t = (P_{U_i} \tilde{U}_i \tilde{\Sigma}_i \tilde{V}_i^t)\, \tilde{V}_i \tilde{\Sigma}_i^{-1} \tilde{V}_i^t = 0
\]
and
\[
(\tilde{U}_i \tilde{V}_i^t) P_{V_i} = \tilde{U}_i (\tilde{V}_i^t P_{V_i}) = \tilde{U}_i \tilde{\Sigma}_i^{-1} \tilde{U}_i^t (\tilde{U}_i \tilde{\Sigma}_i \tilde{V}_i^t P_{V_i}) = 0.
\]
By using Lemma 6.1 again, we have $\tilde{U}_i \tilde{V}_i^t \perp \Omega_{f_i}$, i.e. $\tilde{U}_i \tilde{V}_i^t \in \Omega_{f_i}^\perp$.
With this choice of $R_i$:
\[
\langle P_{\Omega_{f_i}^\perp}(R_i),\ P_{\Omega_{f_i}^\perp}(f_i(N_i)) \rangle = \lambda_i \langle \tilde{U}_i \tilde{V}_i^t,\ \tilde{U}_i \tilde{\Sigma}_i \tilde{V}_i^t \rangle = \lambda_i \operatorname{trace} \tilde{\Sigma}_i = \lambda_i \| P_{\Omega_{f_i}^\perp}(f_i(N_i)) \|_*.
\]
Therefore we have
\begin{align*}
\sum_{i=1}^{k} \langle Q_i, N_i \rangle
&= \sum_{i=1}^{k} \langle P_{\Omega_{f_i}^\perp}(R_i),\ P_{\Omega_{f_i}^\perp}(f_i(N_i)) \rangle - \sum_{i=1}^{k} \langle P_{\Omega_{f_i}^\perp}(R_i^*),\ P_{\Omega_{f_i}^\perp}(f_i(N_i)) \rangle \\
&= \sum_{i=1}^{k} \lambda_i \| P_{\Omega_{f_i}^\perp}(f_i(N_i)) \|_* - \sum_{i=1}^{k} \langle P_{\Omega_{f_i}^\perp}(R_i^*),\ P_{\Omega_{f_i}^\perp}(f_i(N_i)) \rangle \\
&\ge \sum_{i=1}^{k} \lambda_i \| P_{\Omega_{f_i}^\perp}(f_i(N_i)) \|_* - \sum_{i=1}^{k} \| P_{\Omega_{f_i}^\perp}(R_i^*) \|_{\sim} \cdot \| P_{\Omega_{f_i}^\perp}(f_i(N_i)) \|_* \\
&= \sum_{i=1}^{k} \big( \lambda_i - \| P_{\Omega_{f_i}^\perp}(R_i^*) \|_{\sim} \big) \cdot \| P_{\Omega_{f_i}^\perp}(f_i(N_i)) \|_*.
\end{align*}
The inequality above follows from the fact that the dual of the nuclear norm is the operator norm, i.e. $\langle X, Y \rangle \le \| X \|_* \cdot \| Y \|_{\sim}$. By assumption, this sum is strictly positive unless $\| P_{\Omega_{f_i}^\perp}(f_i(N_i)) \|_* = 0$ for all $1 \le i \le k$. This is equivalent to $f_i(N_i) \in \Omega_{f_i}$. But we also have $f_i(N_i) \in \mathrm{Im}(f_i)$; therefore $f_i(N_i) \in \Omega_{f_i} \cap \mathrm{Im}(f_i)$. In other words,
\[
N_i \in f_i^{-1}[\Omega_{f_i} \cap \mathrm{Im}(f_i)] = \Omega_{f_i}^* \quad \text{for all } 1 \le i \le k. \tag{6}
\]
Since the sum of the $\Omega_{f_i}^*$ is a direct sum by assumption and $N_1 + \dots + N_k = 0$, (6) occurs if and only if $N_i = 0$ for all $i$. Therefore, if the $N_i$ are not all equal to $0$, then $F(A^* + N) > F(A^*)$, so $A^*$ is the unique solution of (1). This finishes the proof.
3. Incoherence conditions for unique decomposition
In this section, we assume that for each $1 \le i \le k$, $f_i$ is a bijection from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^{p_i \times q_i}$. This makes our problem much easier to deal with, but also excludes many important classes of functions. In the next section, we will try to generalize to arbitrary functions.
Assume that the sum $\sum_{i=1}^{k} \Omega_{f_i}^*$ is direct; then there is a matrix $\hat{Q} \in \sum_{i=1}^{k} \Omega_{f_i}^*$ such that
\[
P_{\Omega_{f_i}^*}(\hat{Q}) = \lambda_i P_{\Omega_{f_i}^*}(f_i^t U_i V_i^t) \quad \text{for each } 1 \le i \le k.
\]
For convenience, we will denote $g_i := (f_i^t)^{-1}$. Select
\[
S_i := g_i \big( P_{\Omega_{f_i}^{*\perp}}(\hat{Q}) - \lambda_i P_{\Omega_{f_i}^{*\perp}}(f_i^t U_i V_i^t) \big) \in \Omega_{f_i}^\perp,
\]
and denote $\hat{R}_i := \lambda_i U_i V_i^t + S_i$. We will find conditions such that $\hat{Q}, \hat{R}_1, \dots, \hat{R}_k$ satisfy the requirements of Proposition 2.1. In fact, we have
\begin{align*}
\hat{Q} &= \lambda_i P_{\Omega_{f_i}^*}(f_i^t U_i V_i^t) + P_{\Omega_{f_i}^{*\perp}}(\hat{Q}) \\
&= \lambda_i f_i^t U_i V_i^t + P_{\Omega_{f_i}^{*\perp}}(\hat{Q}) - \lambda_i P_{\Omega_{f_i}^{*\perp}}(f_i^t U_i V_i^t) \\
&= \lambda_i f_i^t U_i V_i^t + f_i^t S_i \\
&= f_i^t \hat{R}_i,
\end{align*}
and $P_{\Omega_{f_i}}(\hat{R}_i) = \lambda_i U_i V_i^t$, $P_{\Omega_{f_i}^\perp}(\hat{R}_i) = S_i$. Therefore, it is sufficient to find conditions so that $\| S_i \|_{\sim} < \lambda_i$ for all $1 \le i \le k$.
Since $\hat{Q} \in \sum_{i=1}^{k} \Omega_{f_i}^*$, there are matrices $Q_i \in \Omega_{f_i}^*$ such that $\hat{Q} = \sum_{i=1}^{k} Q_i$. We set
\[
\epsilon_i := Q_i - \lambda_i P_{\Omega_{f_i}^*}(f_i^t U_i V_i^t);
\]
then $\epsilon_i \in \Omega_{f_i}^*$, and from the equation $P_{\Omega_{f_i}^*}(\hat{Q}) = P_{\Omega_{f_i}^*}\big( \sum_{j=1}^{k} Q_j \big)$ we get
\[
\epsilon_i = - P_{\Omega_{f_i}^*}\Big( \sum_{j \ne i} \big( \lambda_j P_{\Omega_{f_j}^*}(f_j^t U_j V_j^t) + \epsilon_j \big) \Big).
\]
Denote
\[
\mu_{ij}^* = \max_{X \in \Omega_{f_j}^*} \frac{\| X \|_{\sim g_i}}{\| X \|_{\sim g_j}}, \qquad
\alpha_i = \max_{X \in \mathbb{R}^{m \times n}} \frac{\| P_{\Omega_{f_i}^{*\perp}}(X) \|_{\sim g_i}}{\| X \|_{\sim g_i}} \qquad \text{and} \qquad
\beta_i = \| P_{\Omega_{f_i}^{*\perp}}(f_i^t U_i V_i^t) \|_{\sim g_i}.
\]
Therefore
\begin{align*}
\| \epsilon_i \|_{\sim g_i} &\le \sum_{j \ne i} (1 + \alpha_i) \| Q_j \|_{\sim g_i} \\
&\le \sum_{j \ne i} (1 + \alpha_i) \mu_{ij}^* \| Q_j \|_{\sim g_j} \\
&= \sum_{j \ne i} (1 + \alpha_i) \mu_{ij}^* \| \lambda_j P_{\Omega_{f_j}^*}(f_j^t U_j V_j^t) + \epsilon_j \|_{\sim g_j} \\
&\le \sum_{j \ne i} (1 + \alpha_i) \mu_{ij}^* \big( \lambda_j (1 + \beta_j) + \| \epsilon_j \|_{\sim g_j} \big).
\end{align*}
On the other hand,
\begin{align*}
\| S_i \|_{\sim} &= \| P_{\Omega_{f_i}^{*\perp}}(\hat{Q}) - \lambda_i P_{\Omega_{f_i}^{*\perp}}(f_i^t U_i V_i^t) \|_{\sim g_i} \\
&\le \| P_{\Omega_{f_i}^{*\perp}}(\hat{Q}) \|_{\sim g_i} + \lambda_i \| P_{\Omega_{f_i}^{*\perp}}(f_i^t U_i V_i^t) \|_{\sim g_i} \\
&= \Big\| P_{\Omega_{f_i}^{*\perp}}\Big( \sum_{j \ne i} Q_j \Big) \Big\|_{\sim g_i} + \lambda_i \beta_i \\
&\le \alpha_i \Big\| \sum_{j \ne i} Q_j \Big\|_{\sim g_i} + \lambda_i \beta_i \\
&\le \alpha_i \sum_{j \ne i} \| Q_j \|_{\sim g_i} + \lambda_i \beta_i \\
&\le \alpha_i \sum_{j \ne i} \mu_{ij}^* \| Q_j \|_{\sim g_j} + \lambda_i \beta_i \\
&\le \alpha_i \sum_{j \ne i} \mu_{ij}^* \big( \lambda_j \| P_{\Omega_{f_j}^*}(f_j^t U_j V_j^t) \|_{\sim g_j} + \| \epsilon_j \|_{\sim g_j} \big) + \lambda_i \beta_i \\
&\le \alpha_i \sum_{j \ne i} \mu_{ij}^* \big( \lambda_j (1 + \beta_j) + \| \epsilon_j \|_{\sim g_j} \big) + \lambda_i \beta_i.
\end{align*}
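The constants $\mu_{ij}^*$, $\alpha_i$, $\beta_i$ have no closed form in general. A crude Monte Carlo lower bound for $\mu_{12}^*$ can be obtained by sampling, as in the sketch below; the choice $f_i(X) = M_i X$ with invertible $M_i$ is purely illustrative and not assumed anywhere above:

```python
# Monte Carlo lower bound for mu*_12 = max_{N in Omega*_{f_2}} ||N||_{~g_1}/||N||_{~g_2},
# where g_i = (f_i^t)^{-1}. Illustrative assumption: f_i(X) = M_i @ X with invertible
# M_i, so ||N||_{~g_i} = ||inv(M_i.T) @ N||_op and
# N in Omega*_{f_2} iff M_2 @ N = U_2 @ Y1.T + Y2 @ V_2.T.
import numpy as np

rng = np.random.default_rng(4)
m, n, r = 8, 6, 2
M1, M2 = rng.standard_normal((m, m)), rng.standard_normal((m, m))
U2, _ = np.linalg.qr(rng.standard_normal((m, r)))   # factors from the SVD of f_2(A_2*)
V2, _ = np.linalg.qr(rng.standard_normal((n, r)))

best = 0.0
for _ in range(2000):
    Y1 = rng.standard_normal((n, r))
    Y2 = rng.standard_normal((m, r))
    N = np.linalg.solve(M2, U2 @ Y1.T + Y2 @ V2.T)      # element of Omega*_{f_2}
    num = np.linalg.norm(np.linalg.solve(M1.T, N), 2)   # ||N||_{~g_1}
    den = np.linalg.norm(np.linalg.solve(M2.T, N), 2)   # ||N||_{~g_2}
    best = max(best, num / den)
print("sampled lower bound for mu*_12:", best)
```

Sampling only explores a subset of $\Omega_{f_2}^*$, so the printed value is a lower bound on $\mu_{12}^*$, not the maximum itself.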
Lemma 3.1. Given a matrix $M \in \mathbb{R}^{m \times n}$, with $\Omega_{f_i}^*$ defined as above, if $f_i$ is orthogonal then
\[
P_{\Omega_{f_i}^{*\perp}}(M) = f_i^{-1}\big( (I_{p_i \times p_i} - U_i U_i^t)\, f_i(M)\, (I_{q_i \times q_i} - V_i V_i^t) \big).
\]
In particular,
\[
\| P_{\Omega_{f_i}^{*\perp}}(M) \|_{\sim f_i} \le \| M \|_{\sim f_i}.
\]
Proof. We know that $\Omega_{f_i}^{*\perp} = f_i^t \Omega_{f_i}^\perp$, so $P_{\Omega_{f_i}^{*\perp}}(M)$ must have the form
\[
P_{\Omega_{f_i}^{*\perp}}(M) = f_i^t X_* \quad \text{for some } X_* \in \Omega_{f_i}^\perp.
\]
It means that $M - f_i^t X_* \in \Omega_{f_i}^*$, so for all $X \in \Omega_{f_i}^\perp$:
\[
\langle M - f_i^t X_*,\ f_i^t(X) \rangle = 0, \quad \text{or} \quad \langle f_i(M) - f_i f_i^t(X_*),\ X \rangle = 0.
\]
Therefore, $X_* = P_{\Omega_{f_i}^\perp}[f_i(M)]$, and it is well known (see [1], for example) that
\[
X_* = (I_{p_i \times p_i} - P_{U_i})\, f_i(M)\, (I_{q_i \times q_i} - P_{V_i}).
\]
We then conclude that
\[
P_{\Omega_{f_i}^{*\perp}}(M) = f_i^{-1}\big( (I_{p_i \times p_i} - U_i U_i^t)\, f_i(M)\, (I_{q_i \times q_i} - V_i V_i^t) \big).
\]
Note that the operator norm is submultiplicative, so we have
\begin{align*}
\| P_{\Omega_{f_i}^{*\perp}}(M) \|_{\sim f_i} &= \big\| (I_{p_i \times p_i} - U_i U_i^t)\, f_i(M)\, (I_{q_i \times q_i} - V_i V_i^t) \big\|_{\sim} \\
&\le \| I_{p_i \times p_i} - U_i U_i^t \|_{\sim} \cdot \| f_i(M) \|_{\sim} \cdot \| I_{q_i \times q_i} - V_i V_i^t \|_{\sim}.
\end{align*}
We use the following claim: for any symmetric matrix $A$ such that $A^t A = A$, we have $\| A \|_{\sim} \le 1$. Indeed, since $A$ is symmetric, there is an orthogonal matrix $U$ such that
\[
A = U \operatorname{diag}(\sigma_1, \dots, \sigma_n) U^t;
\]
thus $A^t A = U \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2) U^t$, so $\sigma_i^2 = \sigma_i$ for all $i$. It means that $\sigma_{\max}(A) \le 1$, and the claim is proved.

Applying the claim to $A = I_{p_i \times p_i} - U_i U_i^t$ and $A = I_{q_i \times q_i} - V_i V_i^t$, we have
\[
\| I_{p_i \times p_i} - U_i U_i^t \|_{\sim} \le 1 \quad \text{and} \quad \| I_{q_i \times q_i} - V_i V_i^t \|_{\sim} \le 1,
\]
and the proof follows.
Define
\[
\mu_{ij}^* = \max_{N \in \Omega_{f_j}^*,\ \| N \|_{\sim f_j} \le 1} \| N \|_{\sim f_i} = \max_{N \in \Omega_{f_j}^*} \frac{\| N \|_{\sim f_i}}{\| N \|_{\sim f_j}}.
\]
Assume we have the direct sum $\oplus_{i=1}^{k} \Omega_{f_i}^*$; then there is a unique matrix $\hat{Q} \in \oplus_{i=1}^{k} \Omega_{f_i}^*$ such that
\[
P_{\Omega_{f_i}^*}(\hat{Q}) = \lambda_i f_i^t(U_i V_i^t) \quad \text{for each } 1 \le i \le k.
\]
We can express $\hat{Q}$ uniquely as
\[
\hat{Q} = Q_1 + \dots + Q_k, \quad \text{in which } Q_i \in \Omega_{f_i}^*.
\]
Denote
\[
\delta_i = \begin{cases} 1 & \text{if } f_i \text{ is an orthogonal transformation} \\ \| A_i \|_{\sim} & \text{if } f_i(X) = A_i X \text{ or } f_i(X) = X A_i. \end{cases}
\]
For each $i$, let $Q_i = \lambda_i f_i^t(U_i V_i^t) + \epsilon_i$; then $\epsilon_i = Q_i - \lambda_i f_i^t(U_i V_i^t) \in \Omega_{f_i}^*$. We have
\begin{align*}
\| P_{\Omega_{f_i}^{*\perp}}(\hat{Q}) \|_{\sim f_i} &= \Big\| P_{\Omega_{f_i}^{*\perp}}\Big( \sum_{j \ne i} \big( \lambda_j f_j^t(U_j V_j^t) + \epsilon_j \big) \Big) \Big\|_{\sim f_i} \\
&\le \sum_{j \ne i} \| \lambda_j f_j^t(U_j V_j^t) + \epsilon_j \|_{\sim f_i} \\
&\le \sum_{j \ne i} \mu_{ij}^* \| \lambda_j f_j^t(U_j V_j^t) + \epsilon_j \|_{\sim f_j} \\
&= \sum_{j \ne i} \mu_{ij}^* \| \lambda_j U_j V_j^t + f_j \epsilon_j \|_{\sim} \\
&\le \sum_{j \ne i} \mu_{ij}^* \big( \lambda_j + \| \epsilon_j \|_{\sim f_j} \big).
\end{align*}
On the other hand, $\lambda_i f_i^t(U_i V_i^t) = P_{\Omega_{f_i}^*}(\hat{Q}) = Q_i + \sum_{j \ne i} P_{\Omega_{f_i}^*}(Q_j)$, so
\begin{align*}
\| \epsilon_i \|_{\sim f_i} &= \Big\| P_{\Omega_{f_i}^*}\Big( \sum_{j \ne i} \big( \lambda_j f_j^t(U_j V_j^t) + \epsilon_j \big) \Big) \Big\|_{\sim f_i} \\
&\le \Big\| \sum_{j \ne i} \big( \lambda_j f_j^t(U_j V_j^t) + \epsilon_j \big) \Big\|_{\sim f_i} + \Big\| P_{\Omega_{f_i}^{*\perp}}\Big( \sum_{j \ne i} \big( \lambda_j f_j^t(U_j V_j^t) + \epsilon_j \big) \Big) \Big\|_{\sim f_i} \\
&\le 2 \sum_{j \ne i} \| \lambda_j f_j^t(U_j V_j^t) + \epsilon_j \|_{\sim f_i} \\
&\le 2 \sum_{j \ne i} \mu_{ij}^* \| \lambda_j f_j^t(U_j V_j^t) + \epsilon_j \|_{\sim f_j} \\
&\le 2 \sum_{j \ne i} \mu_{ij}^* \big( \lambda_j + \| \epsilon_j \|_{\sim f_j} \big).
\end{align*}
This is equivalent to
\[
\lambda_i + \| \epsilon_i \|_{\sim f_i} \le \lambda_i + \sum_{j \ne i} 2 \mu_{ij}^* \big( \lambda_j + \| \epsilon_j \|_{\sim f_j} \big). \tag{7}
\]
Denote $z_i := \lambda_i + \| \epsilon_i \|_{\sim f_i}$, $\lambda = (\lambda_1, \dots, \lambda_k)^t$ and
\[
\Theta := \begin{pmatrix} 1 & -2\mu_{12}^* & \dots & -2\mu_{1k}^* \\ -2\mu_{21}^* & 1 & \dots & -2\mu_{2k}^* \\ \vdots & \vdots & \ddots & \vdots \\ -2\mu_{k1}^* & -2\mu_{k2}^* & \dots & 1 \end{pmatrix};
\]
then we can write (7) in a more compact way as
\[
\Theta z \le \lambda. \tag{8}
\]
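For concreteness, the linear system underlying (8) can be set up and solved numerically; the $\mu^*$ values below are arbitrary placeholders, not derived from any particular maps $f_i$:

```python
# Build Theta from pairwise incoherence constants mu*_ij and solve Theta z = lambda.
# The mu values below are arbitrary placeholders, not computed from any f_i.
import numpy as np

mu = np.array([[0.0, 0.05, 0.04],
               [0.06, 0.0, 0.03],
               [0.05, 0.04, 0.0]])          # mu[i, j] stands for mu*_{ij}, i != j
lam = np.array([1.0, 1.0, 1.0])

Theta = np.eye(3) - 2.0 * mu
z_star = np.linalg.solve(Theta, lam)        # unique solution of Theta z = lambda
print("z* =", z_star)                       # componentwise bound for z when (8) holds
```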
4. Incoherence of matrix decomposition
We need to find conditions on the matrix $\Theta$ and $\lambda$ so that if $(z_1^*, \dots, z_k^*)$ is the unique solution of
\[
\Theta z = \lambda,
\]
then (8) implies
\[
z \le z^*.
\]
Using $z^*$ as an upper bound for $z$, we obtain an upper bound for $\| P_{\Omega_{f_i}^\perp}(\hat{Q}) \|_{\sim g_i}$:
\[
\| P_{\Omega_{f_i}^\perp}(\hat{Q}) \|_{\sim g_i} \le \| g_i \|_{\sim} \sum_{j \ne i} \mu_{ij}^* z_j \le \| g_i \|_{\sim} \sum_{j \ne i} \mu_{ij}^* z_j^* = \frac{\| g_i \|_{\sim}}{1 + \| g_i \|_{\sim}} (z_i^* - \lambda_i).
\]
Moreover, $\Theta$ and $\lambda$ must be chosen so that the value on the right-hand side is bounded by $\lambda_i$. Similar arguments apply to $\| P_{\Omega_{f_i}^\perp}(\hat{Q}) \|_{\sim f_i}$.
5. Computations with small k
In this section, we attempt some computations for small values of $k$.
5.1. The case k = 2. The system of inequalities (8) is equivalent to
\[
\begin{cases} z_1 \le \theta_{12} z_2 + \lambda_1 \\ z_2 \le \theta_{21} z_1 + \lambda_2, \end{cases}
\]
so
\[
\begin{cases} (1 - \theta_{12}\theta_{21})\, z_1 \le \theta_{12}\lambda_2 + \lambda_1 \\ (1 - \theta_{12}\theta_{21})\, z_2 \le \theta_{21}\lambda_1 + \lambda_2. \end{cases}
\]
The first requirement should be $\theta_{12}\theta_{21} < 1$; then we have
\[
\begin{cases} z_1 \le \dfrac{\theta_{12}\lambda_2 + \lambda_1}{1 - \theta_{12}\theta_{21}} \\[2mm] z_2 \le \dfrac{\theta_{21}\lambda_1 + \lambda_2}{1 - \theta_{12}\theta_{21}}. \end{cases}
\]
We have
\[
\frac{\| g_1 \|_{\sim}}{1 + \| g_1 \|_{\sim}} (z_1 - \lambda_1) \le \frac{\| g_1 \|_{\sim}}{1 + \| g_1 \|_{\sim}} \Big( \frac{\theta_{12}\lambda_2 + \lambda_1}{1 - \theta_{12}\theta_{21}} - \lambda_1 \Big) = \frac{\| g_1 \|_{\sim}}{1 + \| g_1 \|_{\sim}} \cdot \frac{\theta_{12}\lambda_2 + \lambda_1 \theta_{12}\theta_{21}}{1 - \theta_{12}\theta_{21}};
\]
the last value is not greater than $\lambda_1$ if and only if
\[
\frac{\lambda_2}{\lambda_1} \le \frac{1 - (1 + 2c_1)(1 + c_2)\mu_{12}\mu_{21}}{c_1 \mu_{12}},
\]
where $c_i = \| g_i \|_{\sim}$. Similarly, we must have
\[
\frac{\lambda_1}{\lambda_2} \le \frac{1 - (1 + 2c_2)(1 + c_1)\mu_{12}\mu_{21}}{c_2 \mu_{21}}.
\]
Such $\lambda_1, \lambda_2$ exist if
\[
\big(1 - (1 + 2c_1)(1 + c_2)\mu_{12}\mu_{21}\big)\big(1 - (1 + 2c_2)(1 + c_1)\mu_{12}\mu_{21}\big) \ge c_1 c_2 \mu_{12}\mu_{21}.
\]
This is a quadratic inequality in $\mu_{12}\mu_{21}$, and it is easy to see that it holds when we select
\[
\mu_{12}\mu_{21} < \frac{1}{(1 + c_1)(1 + c_2)(2 + c_1 + c_2 + c_1 c_2)}.
\]
In the special case when both $f_1, f_2$ are orthogonal, we have $c_1 = c_2 = 1$, and then we need to select
\[
\mu_{12}\mu_{21} < \frac{1}{20}.
\]
Note that in [1], one can use $0$ instead of $c_1$; thus we select
\[
\mu_{12}\mu_{21} < \frac{1}{(1 + c_2)(2 + c_2)} = \frac{1}{6}.
\]
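The arithmetic behind the thresholds $1/20$ and $1/6$, and the sufficiency of the stated bound for the quadratic inequality at sampled points, can be checked mechanically:

```python
# Check the arithmetic of the k = 2 bounds: the quadratic condition on mu12*mu21
# holds below the stated thresholds, 1/20 (c1 = c2 = 1) and 1/6 (c1 = 0, c2 = 1).
def threshold(c1, c2):
    return 1.0 / ((1 + c1) * (1 + c2) * (2 + c1 + c2 + c1 * c2))

def quadratic_ok(p, c1, c2):
    # p stands for the product mu12 * mu21
    lhs = (1 - (1 + 2 * c1) * (1 + c2) * p) * (1 - (1 + 2 * c2) * (1 + c1) * p)
    return lhs >= c1 * c2 * p

assert abs(threshold(1, 1) - 1 / 20) < 1e-15
assert abs(threshold(0, 1) - 1 / 6) < 1e-15
# sample products strictly below the threshold and check the quadratic inequality
for c1, c2 in [(1, 1), (0, 1), (0.5, 2.0)]:
    for t in [0.1, 0.5, 0.9]:
        assert quadratic_ok(t * threshold(c1, c2), c1, c2)
```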
5.2. The case k = 3.

Lemma 5.1. If $0 < 1 - \theta_{12}\theta_{21},\ 1 - \theta_{13}\theta_{31},\ 1 - \theta_{23}\theta_{32}$ and $0 < 1 - \theta_{12}\theta_{21} - \theta_{13}\theta_{31} - \theta_{23}\theta_{32} - \theta_{12}\theta_{23}\theta_{31} - \theta_{13}\theta_{32}\theta_{21}$, then
References

[1] Chandrasekaran, Venkat; Sanghavi, Sujay; Parrilo, Pablo A.; Willsky, Alan S.: Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21 (2011), no. 2, 572–596.
[2] Recht, Benjamin; Fazel, Maryam; Parrilo, Pablo A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 (2010), no. 3, 471–501.
[3] Watson, G. A.: Characterization of the subdifferential of some matrix norms. Linear Algebra Appl. 170 (1992), 33–45.
[4] Vandenberghe, Lieven; Boyd, Stephen: Semidefinite programming. SIAM Rev. 38 (1996), no. 1, 49–95.
In this work, we are interested in certain matrix norms, such as the nuclear norm and the operator norm. Since a norm is a convex function, we would like to answer some basic questions, for example:
• what is the dual norm of $\| \cdot \|_f$?
• what is the subgradient $\partial \| X \|_{*f}$?
It is well known that the dual of the operator norm $\| \cdot \|_{\sim}$ in $\mathbb{R}^{m \times n}$ is the nuclear norm $\| \cdot \|_*$. We would like to obtain a similar result for $\| \cdot \|_{\sim f}$. Let $f_I^t$ be the restriction of $f^t$ to the image space $I = \mathrm{Im}(f)$ and $g := (f_I^t)^{-1}$, where $f^t : \mathbb{R}^{p \times q} \to \mathbb{R}^{m \times n}$ is the adjoint of $f$. Since $f$ is injective, $f_I^t$ is a linear isomorphism. The dual of $\| \cdot \|_{\sim f}$ is defined by a norm $p(\cdot)$:
\[
p(X) := \sup \big\{ \operatorname{trace}(X^t Y) \ \big| \ Y \in \mathbb{R}^{m \times n},\ \| Y \|_{\sim f} \le 1 \big\}
\]
for all $X \in \mathbb{R}^{m \times n}$. We have the following proposition:
Proposition 5.2. Let $f$ be an injective linear transformation from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^{p \times q}$. Then the dual norm of $\| \cdot \|_{\sim f}$ is given by
\[
p(X) = \min \{ \| W \|_* : f^t(W) = X \}. \tag{9}
\]
In particular, the dual norm of $\| \cdot \|_{\sim f}$ is bounded above by $\| \cdot \|_{*g}$, i.e.
\[
p(X) \le \| X \|_{*g} \quad \text{for all } X \in \mathbb{R}^{m \times n},
\]
where $g := (f_I^t)^{-1}$.

Proof. The proof is standard.
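Although the proof is omitted, Proposition 5.2 can be verified numerically in the bijective special case (where it reduces to Corollary 5.3 below). The sketch assumes, purely for illustration, $f(X) = MX$ with $M$ invertible, so that $\| X \|_{\sim f} = \| MX \|_{\sim}$ and the claimed dual value at $Y$ is $\| (M^t)^{-1} Y \|_*$:

```python
# Numerical check of Proposition 5.2 / Corollary 5.3 in a bijective special case:
# illustrative assumption f(X) = M @ X with M invertible, so ||X||_{~f} = ||M X||_op
# and the claimed dual of ||.||_{~f} at Y is ||inv(M.T) @ Y||_* (= ||Y||_{*g}).
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n)) + 3 * np.eye(n)   # well-conditioned, invertible
Y = rng.standard_normal((n, n))

# p(Y) = sup { <X, Y> : ||M X||_op <= 1 }, computed with CVXPY
X = cp.Variable((n, n))
prob = cp.Problem(cp.Maximize(cp.trace(X.T @ Y)), [cp.sigma_max(M @ X) <= 1])
prob.solve()

closed_form = np.linalg.norm(np.linalg.solve(M.T, Y), ord="nuc")
print(prob.value, closed_form)   # the two values should agree up to solver tolerance
```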
Corollary 5.3. Let $f$ be a bijective linear transformation from $\mathbb{R}^{m \times n}$ to $\mathbb{R}^{p \times q}$. Then the dual norm of $\| \cdot \|_{*f}$ is $\| \cdot \|_{\sim g}$, where $g := (f^t)^{-1}$.

Now we turn to the second question of determining the subgradient $\partial \| X \|_{*f}$. With this argument, we immediately have the following proposition, which specifies the subgradient of $\| \cdot \|_{*f}$:
6. Appendix: some simple proofs

Recall that if $A = U \sigma(A) V^t$ is the singular value decomposition (SVD) of $A$, where $\sigma(A) \in \mathbb{R}^{k \times k}$ is a diagonal matrix with positive entries and $U \in \mathbb{R}^{m \times k}$, $V \in \mathbb{R}^{n \times k}$ satisfy $U^t U = V^t V = I_{k \times k}$, then $\| A \|_* := \| \sigma(A) \|_1$ and $\| A \|_{\sim} := \| \sigma(A) \|_\infty$ are the nuclear norm and the operator norm of $A$, respectively.
Lemma 6.1. Let $U \in \mathbb{R}^{p \times k}$, $V \in \mathbb{R}^{q \times k}$ satisfy $U^t U = V^t V = I_{k \times k}$. We define $P_U := U U^t$ and $P_V := V V^t$. Then for any matrix $W \in \mathbb{R}^{p \times q}$, the following are equivalent:
(i) $P_U W = 0$ and $W P_V = 0$;
(ii) $\langle W, U X^t + Y V^t \rangle = 0$ for all $X \in \mathbb{R}^{q \times k}$ and $Y \in \mathbb{R}^{p \times k}$.
Proof. (i) $\Rightarrow$ (ii): We have
\[
P_U W = 0 \Rightarrow U^t P_U W = U^t W = 0, \qquad W P_V = 0 \Rightarrow W P_V V = W V = 0.
\]
Therefore, for all $X \in \mathbb{R}^{q \times k}$ and $Y \in \mathbb{R}^{p \times k}$:
\[
\langle W, U X^t + Y V^t \rangle = \langle W, U X^t \rangle + \langle W, Y V^t \rangle = \langle U^t W, X^t \rangle + \langle W V, Y \rangle = \langle 0, X^t \rangle + \langle 0, Y \rangle = 0.
\]
(ii) $\Rightarrow$ (i): Select $X^t = U^t W$ and $Y = W V$; we have
\[
0 = \langle W, U X^t + Y V^t \rangle = \langle U^t W, X^t \rangle + \langle W V, Y \rangle = \| U^t W \|^2 + \| W V \|^2.
\]
Therefore, $U^t W = 0$ and $W V = 0$. But this implies that $U U^t W = P_U W = 0$ and $W V V^t = W P_V = 0$.
Lemma 6.2. If $A$ is an $n \times n$ invertible matrix and $\sigma(A) = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$, then
\[
\sigma(A^{-1}) = \operatorname{diag}\Big( \frac{1}{\sigma_n(A)}, \dots, \frac{1}{\sigma_1(A)} \Big).
\]
Proof. The proof is straightforward.
Lemma 6.3. For any two full column-rank matrices $A, B$ we have
\[
\sigma_{\min}(AB) \ge \sigma_{\min}(A)\, \sigma_{\min}(B).
\]
Proof. We know that for any matrix $M$,
\[
\sigma_{\min}(M) := \min_{x \ne 0} \frac{\| M x \|_2}{\| x \|_2}.
\]
Therefore, there is a vector $x_* \ne 0$ such that
\[
\sigma_{\min}(AB) = \frac{\| A B x_* \|_2}{\| x_* \|_2}.
\]
Since $x_* \ne 0$ and $A, B$ are full column-rank, both $B x_*$ and $A B x_*$ are nonzero, and we have
\[
\sigma_{\min}(AB) = \frac{\| A B x_* \|_2}{\| B x_* \|_2} \cdot \frac{\| B x_* \|_2}{\| x_* \|_2} \ge \sigma_{\min}(A)\, \sigma_{\min}(B).
\]
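A quick numerical sanity check of Lemma 6.3 on random full column-rank matrices:

```python
# Check sigma_min(A @ B) >= sigma_min(A) * sigma_min(B) on random examples.
import numpy as np

rng = np.random.default_rng(6)
for _ in range(100):
    A = rng.standard_normal((7, 5))   # full column rank with probability 1
    B = rng.standard_normal((5, 3))
    s = lambda M: np.linalg.svd(M, compute_uv=False).min()
    assert s(A @ B) >= s(A) * s(B) - 1e-12
```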
Lemma 6.4. Let $U \in \mathbb{R}^{p \times k}$ satisfy $U^t U = I_{k \times k}$ and let $A \in \mathbb{R}^{p \times p}$ be invertible. Then
\[
\| (U^t A A^t U)^{-1} \|_{\sim} \le \| (A^t)^{-1} \|_{\sim}^2.
\]
Proof. By the above lemmas, we have
\[
\| (U^t A A^t U)^{-1} \|_{\sim} = \frac{1}{\sigma_{\min}(U^t A A^t U)} = \frac{1}{\sigma_{\min}\big( (A^t U)^t (A^t U) \big)} = \frac{1}{\sigma_{\min}^2(A^t U)} \le \frac{1}{\sigma_{\min}^2(A^t)\, \sigma_{\min}^2(U)} = \frac{1}{\sigma_{\min}^2(A)} = \sigma_{\max}^2(A^{-1}) = \| A^{-1} \|_{\sim}^2.
\]
Here we used Lemma 6.3, noting that both $A^t$ and $U$ are full column-rank.