Appendix S1.

Proof of Theorems
Consider a matrix $\Lambda$ of non-negative entries and a parameter $\lambda$ for L1-norm regularization:
$$\hat{W} = \arg\min_{W} g(W) = f(W) + \lambda \|\Lambda \circ W\|_1 = \sum_{j=1}^{n} \|W x_j - u_j\|^2 + \lambda \|\Lambda \circ W\|_1 \tag{1.0}$$
This equation is equivalent to:
$$\hat{W} = \arg\min_{W} g(W) = \operatorname{tr}(W^T W \Sigma) - 2\operatorname{tr}(W U^T) + \lambda \|\Lambda \circ W\|_1 \tag{1.1}$$
with matrices $\Sigma$ and $U$ defined to be:
$$\Sigma = \sum_{j=1}^{n} x_j x_j^T, \qquad U = \sum_{j=1}^{n} u_j x_j^T \tag{1.2}$$
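The equivalence of Eqs. (1.0) and (1.1) can be checked numerically. The sketch below is only an illustration: it assumes $\Sigma = \sum_j x_j x_j^T$ and $U = \sum_j u_j x_j^T$ (so that $\operatorname{tr}(W U^T) = \sum_j u_j^T W x_j$), uses hypothetical sizes and random data, and restores the constant $\sum_j \|u_j\|^2$ that Eq. (1.1) drops because it does not depend on $W$.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def outer_sum(left, right):
    """Sum over j of the outer products left[j] right[j]^T."""
    rows, cols = len(left[0]), len(right[0])
    return [[sum(l[r] * v[c] for l, v in zip(left, right)) for c in range(cols)]
            for r in range(rows)]

random.seed(0)
q, p, n = 2, 3, 5  # hypothetical sizes: W is q x p, x_j in R^p, u_j in R^q
W = [[random.gauss(0, 1) for _ in range(p)] for _ in range(q)]
xs = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
us = [[random.gauss(0, 1) for _ in range(q)] for _ in range(n)]

# Sum of squared residuals, sum_j ||W x_j - u_j||^2, computed directly.
direct = 0.0
for x, u in zip(xs, us):
    Wx = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    direct += sum((a - b) ** 2 for a, b in zip(Wx, u))

Sigma = outer_sum(xs, xs)                  # sum_j x_j x_j^T  (p x p)
U = outer_sum(us, xs)                      # sum_j u_j x_j^T  (q x p)
const = sum(v * v for u in us for v in u)  # sum_j ||u_j||^2, independent of W

# Trace form of the same quantity, with the W-independent constant restored.
trace_form = (trace(matmul(matmul(transpose(W), W), Sigma))
              - 2 * trace(matmul(W, transpose(U))) + const)
print(abs(direct - trace_form))
```

The two values agree to rounding error, so the two objectives differ only by a constant and share the same minimizer.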
Consider the following noise model:
$$u = W x + \varepsilon \tag{1.3}$$
where the noise term $\varepsilon \sim N(0, \sigma^2 I)$ follows a normal distribution with a fixed variance $\sigma^2$.
Theorem 1
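If $\lambda / \sqrt{n} \to \lambda_0 \ge 0$, $\Lambda = O(1)$, and
$$C = \lim_{n \to \infty} \left( \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T \right) \tag{1.4}$$
is non-singular, then when $n \to \infty$,
$$\sqrt{n}\,(\hat{W} - W) \xrightarrow{D} \arg\min(V) \tag{1.5}$$
where
$$V(Z) = \operatorname{tr}(Z^T Z C) - 2\operatorname{tr}(Z^T \Omega) + \lambda_0 \operatorname{tr}\left\{ \left[ I(W \ne 0) \circ \operatorname{sgn}(W) \circ Z + I(W = 0) \circ |Z| \right] \Lambda^T \right\} \tag{1.6}$$
and $\Omega$ is a random matrix with normal distribution of mean 0 and covariance $E[\Omega_{ij} \Omega_{kl}] = \delta_{ik} C_{jl} \sigma^2$ for all $i, j, k, l$.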
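Proof. Let $\Delta\hat{W} = \hat{W} - W$. Then it follows from Eq. (1.1) and the noise model (1.3) that $\Delta\hat{W}$ is the solution of
$$\Delta\hat{W} = \arg\min_{\Delta W} g(\Delta W) = n \operatorname{tr}(\Delta W^T \Delta W\, C^m) - 2\sqrt{n}\, \operatorname{tr}(\Delta W^T \Omega) + \lambda \|\Lambda \circ (W + \Delta W)\|_1 \tag{1.7}$$
where
$$\Omega = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varepsilon_i x_i^T \tag{1.8}$$
is a random matrix with mean 0 and covariance $E[\Omega_{ab} \Omega_{cd}] = C^m_{bd}\, \delta_{ac}\, \sigma^2$ for all $a, b, c, d$.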

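Here $C^m = \sum_{i=1}^{n} x_i x_i^T / n$ is the empirical covariance matrix of the input data. As $n \to \infty$, $C^m \to C$ and $\lambda / \sqrt{n} \to \lambda_0$. Consequently, we must have $\Delta\hat{W} \to 0$ in Eq. (1.7) almost surely when $C$ is non-singular. When $\Delta W \to 0$, the last term in the RHS of Eq. (1.7) can be rewritten, with the first term on the right independent of $\Delta W$, as
$$\|\Lambda \circ (W + \Delta W)\|_1 = \|\Lambda \circ W\|_1 + \operatorname{tr}\left\{ \left[ I(W \ne 0) \circ \operatorname{sgn}(W) \circ \Delta W + I(W = 0) \circ |\Delta W| \right] \Lambda^T \right\} \tag{1.9}$$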
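The local expansion of the weighted L1 penalty in Eq. (1.9) can be verified on a small example. The sketch below uses hypothetical 2 x 2 matrices for the weights, the true $W$, and the perturbation $\Delta W$. It compares the exact change in the penalty with the trace term, which equals $\sum_{ij} \Lambda_{ij} \left[ I(W_{ij} \ne 0)\operatorname{sgn}(W_{ij}) \Delta W_{ij} + I(W_{ij} = 0) |\Delta W_{ij}| \right]$. The identity is exact as long as each perturbation is smaller in magnitude than the corresponding non-zero entry of $W$.

```python
def sgn(v):
    """Sign of a scalar: -1, 0 or 1."""
    return (v > 0) - (v < 0)

def weighted_l1(Lam, M):
    """Entrywise weighted L1 norm: sum_ij Lam_ij * |M_ij|."""
    return sum(l * abs(m) for lrow, mrow in zip(Lam, M) for l, m in zip(lrow, mrow))

# Hypothetical values: W has two zero entries and two non-zero entries.
W = [[1.5, 0.0], [-0.7, 0.0]]
Lam = [[1.0, 2.0], [0.5, 3.0]]
dW = [[0.01, -0.02], [0.03, 0.04]]  # small perturbation, |dW_ij| < |W_ij| where W_ij != 0

W_new = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, dW)]
exact_change = weighted_l1(Lam, W_new) - weighted_l1(Lam, W)

# Trace term: linear in dW on the support of W, absolute value off the support.
trace_term = sum(
    Lam[i][j] * (sgn(W[i][j]) * dW[i][j] if W[i][j] != 0 else abs(dW[i][j]))
    for i in range(2) for j in range(2)
)
print(exact_change, trace_term)
```

On the support of $W$ the penalty is locally linear, while off the support it keeps its kink at zero, which is what produces the non-smooth term in the limiting objective.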

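Now define $Z = \sqrt{n}\, \Delta W$. Substituting into Eq. (1.7), dropping terms independent of $Z$, and using $\lambda / \sqrt{n} \to \lambda_0$, we find that as $n \to \infty$, $\sqrt{n}\,(\hat{W} - W)$ minimizes
$$\operatorname{tr}(Z^T Z C) - 2\operatorname{tr}(Z^T \Omega) + \lambda_0 \operatorname{tr}\left\{ \left[ I(W \ne 0) \circ \operatorname{sgn}(W) \circ Z + I(W = 0) \circ |Z| \right] \Lambda^T \right\} \tag{1.10}$$
which is $V(Z)$ in Eq. (1.6).
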
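Given a matrix $\Lambda$ of non-negative entries and a single non-negative parameter $\lambda$, we can formulate the following minimization problem:
$$\hat{\Theta} = \arg\min_{\Theta \succ 0} f(\Theta) = -\frac{n}{2} \log\det(\Theta) + \frac{n}{2} \operatorname{tr}(S\Theta) + \lambda \|\Lambda \circ \Theta\|_1 \tag{1.11}$$
where $\Lambda \circ \Theta$ is the component-wise product of the two matrices.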
Theorem 2
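If $\lambda / \sqrt{n} \to \lambda_0 \ge 0$, $\Lambda = O(1)$, and $\Theta$ is non-singular, then
$$\sqrt{n}\,(\hat{\Theta} - \Theta) \xrightarrow{D} \arg\min(V) \tag{1.12}$$
where
$$V(\Delta) = \frac{1}{4} \operatorname{tr}(\Delta \Sigma \Delta \Sigma) + \frac{1}{2} \operatorname{tr}(\Delta Z) + \lambda_0 \operatorname{tr}\left\{ \left[ I(\Theta = 0) \circ |\Delta| + I(\Theta \ne 0) \circ \operatorname{sgn}(\Theta) \circ \Delta \right] \Lambda^T \right\} \tag{1.13}$$
and $Z$ is a random matrix with normal distribution of mean 0 and covariance $E[Z_{ij} Z_{kl}] = \Sigma_{ik} \Sigma_{jl} + \Sigma_{il} \Sigma_{jk}$ for all $i, j, k, l$.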
Proof. Let $\hat{Y} = \hat{\Theta} - \Theta$. Then from Eq. (1.11) we have:
$$\hat{Y} = \arg\min_{Y} f(Y) = -\frac{n}{2} \log\det(\Theta + Y) + \frac{n}{2} \operatorname{tr}(SY) + \lambda \|\Lambda \circ (\Theta + Y)\|_1 \tag{1.14}$$
Here $S = \sum_{j=1}^{n} x_j x_j^T / n$ is a random variable. When $n \to \infty$, $S$ is asymptotically normal and converges in distribution as:
$$S \xrightarrow{D} \Sigma + \frac{1}{\sqrt{n}} Z \tag{1.15}$$
where $Z$ follows a normal distribution with mean 0 and covariance $E[Z_{ij} Z_{kl}] = \Sigma_{ik} \Sigma_{jl} + \Sigma_{il} \Sigma_{jk}$ for all $i, j, k, l \in [1, p]$. Hence, we have:
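$$n \operatorname{tr}(SY) = n \operatorname{tr}(\Sigma Y) + \operatorname{tr}(\sqrt{n}\, Y Z) \tag{1.16}$$
When $n \to \infty$, $Y \to 0$ almost surely. Thus, we can perform a Taylor expansion of $\log\det(\Theta + Y)$ around $\Theta$, which, with $\Sigma = \Theta^{-1}$, leads to:
$$\begin{aligned}
\log\det(\Theta + Y) &= \log\det(\Theta) + \operatorname{tr}(\Theta^{-1} Y) - \frac{1}{2} \operatorname{tr}(\Theta^{-1} Y \Theta^{-1} Y) + o\left(\|Y\|_F^2\right) \\
&= \log\det(\Theta) + \operatorname{tr}(\Sigma Y) - \frac{1}{2} \operatorname{tr}(\Sigma Y \Sigma Y) + o\left(\|Y\|_F^2\right)
\end{aligned} \tag{1.17}$$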
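The second-order expansion in Eq. (1.17) can be illustrated numerically. The sketch below uses a hypothetical 2 x 2 positive-definite matrix for the point of expansion and a hypothetical symmetric perturbation direction, and takes the inverse of the expansion point as the matrix appearing in the trace terms. It measures the gap between $\log\det$ and its quadratic approximation at two perturbation scales. Since the remainder is of third order, shrinking the perturbation by a factor of 10 should shrink the gap by roughly a factor of 1000.

```python
import math

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inv2(A):
    d = det2(A)
    return [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace2(A):
    return A[0][0] + A[1][1]

Theta = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical positive-definite expansion point
Sigma = inv2(Theta)                # inverse of Theta
Y0 = [[0.3, -0.1], [-0.1, 0.2]]    # hypothetical symmetric perturbation direction

def remainder(t):
    """Absolute gap between log det(Theta + t*Y0) and its second-order expansion."""
    Y = [[t * y for y in row] for row in Y0]
    shifted = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(Theta, Y)]
    SY = matmul2(Sigma, Y)
    exact = math.log(det2(shifted))
    approx = math.log(det2(Theta)) + trace2(SY) - 0.5 * trace2(matmul2(SY, SY))
    return abs(exact - approx)

r_coarse = remainder(1e-1)
r_fine = remainder(1e-2)
print(r_coarse, r_fine)
```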
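When $Y$ is small, the term $|\Lambda \circ (\Theta + Y)|$ can be rewritten as:
$$|\Lambda \circ (\Theta + Y)|_{ij} = \begin{cases} \Lambda_{ij} |Y_{ij}| & \Theta_{ij} = 0 \\ \Lambda_{ij} \operatorname{sgn}(\Theta_{ij}) (\Theta_{ij} + Y_{ij}) & \Theta_{ij} \ne 0 \end{cases} \tag{1.18}$$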
Substituting Eqs. (1.16), (1.17) and (1.18) into Eq. (1.14) and ignoring terms independent
of Y, we have:


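$$\hat{Y} = \arg\min_{Y} f(Y) = \frac{1}{4} \operatorname{tr}\left( \sqrt{n}\,Y\, \Sigma\, \sqrt{n}\,Y\, \Sigma \right) + \frac{1}{2} \operatorname{tr}\left( \sqrt{n}\, Y Z \right) + \lambda_0 \sum_{i=1}^{p} \sum_{j=1}^{p} \left[ I(\Theta_{ij} = 0)\, \Lambda_{ij} \left| \sqrt{n}\, Y_{ij} \right| + I(\Theta_{ij} \ne 0)\, \Lambda_{ij} \operatorname{sgn}(\Theta_{ij})\, \sqrt{n}\, Y_{ij} \right] \tag{1.19}$$
If we define $\Delta = \sqrt{n}\, Y = \sqrt{n}\,(\hat{\Theta} - \Theta)$, then we have $\hat{\Delta} = \arg\min V(\Delta)$.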
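The substitution that produces Eq. (1.19) can be written out step by step. The derivation below is a sketch in assumed notation: $\Theta$ is the true precision matrix, $\Sigma = \Theta^{-1}$, $\Lambda$ and $\lambda$ are the penalty weights and parameter, and $\Delta = \sqrt{n}\, Y$. Terms that do not depend on $Y$ are collected in "const". The linear terms $\mp \frac{n}{2} \operatorname{tr}(\Sigma Y)$ from Eqs. (1.16) and (1.17) cancel, which is why only the quadratic and noise terms survive.

```latex
\begin{aligned}
f(Y) &= -\frac{n}{2}\left[ \operatorname{tr}(\Sigma Y) - \frac{1}{2}\operatorname{tr}(\Sigma Y \Sigma Y) \right]
        + \frac{n}{2}\operatorname{tr}(\Sigma Y) + \frac{\sqrt{n}}{2}\operatorname{tr}(Y Z)
        + \lambda \|\Lambda \circ (\Theta + Y)\|_1 + \text{const} + o\!\left(n\|Y\|_F^2\right) \\
     &= \frac{n}{4}\operatorname{tr}(\Sigma Y \Sigma Y) + \frac{\sqrt{n}}{2}\operatorname{tr}(Y Z)
        + \lambda \sum_{i,j} \Lambda_{ij}\left[ I(\Theta_{ij}=0)\,|Y_{ij}|
        + I(\Theta_{ij}\neq 0)\,\operatorname{sgn}(\Theta_{ij})\,Y_{ij} \right] + \text{const} + o\!\left(n\|Y\|_F^2\right) \\
     &= \frac{1}{4}\operatorname{tr}(\Delta \Sigma \Delta \Sigma) + \frac{1}{2}\operatorname{tr}(\Delta Z)
        + \frac{\lambda}{\sqrt{n}} \sum_{i,j} \Lambda_{ij}\left[ I(\Theta_{ij}=0)\,|\Delta_{ij}|
        + I(\Theta_{ij}\neq 0)\,\operatorname{sgn}(\Theta_{ij})\,\Delta_{ij} \right] + \text{const} + o\!\left(\|\Delta\|_F^2\right)
\end{aligned}
```

With $\lambda / \sqrt{n} \to \lambda_0$, the last line tends to $V(\Delta)$ plus a constant, using $\operatorname{tr}(A \Lambda^T) = \sum_{i,j} A_{ij} \Lambda_{ij}$ to write the double sum as a trace.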