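Appendix S1 Proof of Theorems

Consider a matrix of non-negative entries $\Lambda$ and a parameter $\lambda$ for L1-norm regularization:

$$\hat{W} = \arg\min_{W} g(W) = f(W) + \lambda \|\Lambda \circ W\|_1 = \sum_{j=1}^{n} \|W x_j - u_j\|^2 + \lambda \|\Lambda \circ W\|_1 \tag{1.0}$$

where $\circ$ is the component-wise product of two matrices. Up to a term that does not depend on $W$, this equation is equivalent to:

$$\hat{W} = \arg\min_{W} g(W) = \mathrm{tr}(W^T W \Sigma) - 2\,\mathrm{tr}(W U^T) + \lambda \|\Lambda \circ W\|_1 \tag{1.1}$$

with matrices $\Sigma$ and $U$ defined to be:

$$\Sigma = \sum_{j=1}^{n} x_j x_j^T, \qquad U = \sum_{j=1}^{n} u_j x_j^T \tag{1.2}$$

Consider the following noise model:

$$u = W x + \varepsilon \tag{1.3}$$

where the noise term $\varepsilon \sim N(0, \sigma^2 I)$ follows a normal distribution with a fixed variance $\sigma^2$, and $W$ now denotes the true parameter matrix.

Theorem 1. If $\lambda_n / \sqrt{n} \to \lambda_0 \ge 0$, $\Lambda = O(1)$, and

$$C = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T \tag{1.4}$$

is non-singular, then as $n \to \infty$,

$$\sqrt{n}\,(\hat{W} - W) \xrightarrow{D} \arg\min_{Z} V(Z) \tag{1.5}$$

where

$$V(Z) = \mathrm{tr}(Z^T Z C) - 2\,\mathrm{tr}(Z^T \varepsilon) + \lambda_0\, \mathrm{tr}\Big\{ \big[ I(W \ne 0) \circ \mathrm{sgn}(W) \circ Z + I(W = 0) \circ |Z| \big] \Lambda^T \Big\} \tag{1.6}$$

and $\varepsilon$ is a random matrix with normal distribution of mean 0 and covariance $E[\varepsilon_{ij} \varepsilon_{kl}] = \delta_{ik} C_{jl} \sigma^2$ for all $i, j, k, l$.

Proof. Let $\Delta\hat{W} = \hat{W} - W$. Then it follows from Eq. (1.1) that $\Delta\hat{W}$ is the solution of

$$\Delta\hat{W} = \arg\min_{\Delta W} g(\Delta W) = n\, \mathrm{tr}(\Delta W^T \Delta W\, C^m) - 2\sqrt{n}\, \mathrm{tr}(\Delta W^T \varepsilon) + \lambda_n \|\Lambda \circ (W + \Delta W)\|_1 \tag{1.7}$$

where

$$\varepsilon = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \varepsilon_i x_i^T \tag{1.8}$$

is a random matrix with mean 0 and covariance $E[\varepsilon_{ab} \varepsilon_{cd}] = C^m_{bd}\, \delta_{ac}\, \sigma^2$ for all $a, b, c, d$, and $\varepsilon_i$ is the noise vector of the $i$-th sample. Here $C^m = \sum_{i=1}^{n} x_i x_i^T / n$ is the empirical covariance matrix of the input data. As $n \to \infty$, $C^m \to C$ and $\lambda_n / n \to 0$. Consequently, we must have $\Delta\hat{W} \to 0$ in Eq. (1.7) almost surely when $C$ is non-singular.

When $\Delta W \to 0$, the last term in the RHS of Eq. (1.7) can be rewritten, up to the constant $\|\Lambda \circ W\|_1$ that does not depend on $\Delta W$, as

$$\|\Lambda \circ (W + \Delta W)\|_1 - \|\Lambda \circ W\|_1 = \mathrm{tr}\Big\{ \big[ I(W \ne 0) \circ \mathrm{sgn}(W) \circ \Delta W + I(W = 0) \circ |\Delta W| \big] \Lambda^T \Big\} \tag{1.9}$$

Now define $Z = \sqrt{n}\, \Delta W$. Then as $n \to \infty$, $\sqrt{n}\,(\hat{W} - W)$ minimizes

$$\mathrm{tr}(Z^T Z C) - 2\,\mathrm{tr}(Z^T \varepsilon) + \lambda_0\, \mathrm{tr}\Big\{ \big[ I(W \ne 0) \circ \mathrm{sgn}(W) \circ Z + I(W = 0) \circ |Z| \big] \Lambda^T \Big\} \tag{1.10}$$

which is $V(Z)$ of Eq. (1.6).

Given a matrix of non-negative entries $\Lambda$ and a single non-negative parameter $\lambda$, we can formulate the following minimization problem:

$$\hat{\Theta} = \arg\min_{\Theta \succ 0} f(\Theta) = -\frac{n}{2} \log\det(\Theta) + \frac{n}{2}\, \mathrm{tr}(S\Theta) + \lambda \|\Lambda \circ \Theta\|_1 \tag{1.11}$$

where $\circ$ is the component-wise product of the two matrices.

Theorem 2. If $\lambda_n / \sqrt{n} \to \lambda_0 \ge 0$, $\Lambda = O(1)$, and the true $\Theta$ is non-singular with $\Sigma = \Theta^{-1}$, then

$$\sqrt{n}\,(\hat{\Theta} - \Theta) \xrightarrow{D} \arg\min_{\Delta} V(\Delta) \tag{1.12}$$

where

$$V(\Delta) = \frac{1}{4}\, \mathrm{tr}(\Delta \Sigma \Delta \Sigma) + \frac{1}{2}\, \mathrm{tr}(\Delta Z) + \lambda_0\, \mathrm{tr}\Big\{ \big[ I(\Theta = 0) \circ |\Delta| + I(\Theta \ne 0) \circ \mathrm{sgn}(\Theta) \circ \Delta \big] \Lambda^T \Big\} \tag{1.13}$$

and $Z$ is a random matrix with normal distribution of mean 0 and covariance $E[Z_{ij} Z_{kl}] = \Sigma_{ik} \Sigma_{jl} + \Sigma_{il} \Sigma_{jk}$ for all $i, j, k, l$.

Proof. Let $\hat{Y} = \hat{\Theta} - \Theta$. Then from Eq. (1.11) we have, up to terms that do not depend on $Y$:

$$\hat{Y} = \arg\min_{Y} f(Y) = -\frac{n}{2} \log\det(\Theta + Y) + \frac{n}{2}\, \mathrm{tr}(SY) + \lambda_n \|\Lambda \circ (\Theta + Y)\|_1 \tag{1.14}$$

Here $S = \sum_{j=1}^{n} x_j x_j^T / n$ is a random variable. As $n \to \infty$, $S$ is asymptotically normal and satisfies, in distribution:

$$S = \Sigma + \frac{1}{\sqrt{n}} Z \tag{1.15}$$

where $Z$ follows a normal distribution with mean 0 and covariance $E[Z_{ij} Z_{kl}] = \Sigma_{ik} \Sigma_{jl} + \Sigma_{il} \Sigma_{jk}$ for all $i, j, k, l \in [1, p]$. Hence, we have:

$$n\, \mathrm{tr}(SY) = n\, \mathrm{tr}(\Sigma Y) + \mathrm{tr}(\sqrt{n}\, Y Z) \tag{1.16}$$

When $n \to \infty$, $Y \to 0$ almost surely. Thus, we can perform a Taylor expansion of $\log\det(\Theta + Y)$ around $\Theta$, which leads to:

$$\begin{aligned} \log\det(\Theta + Y) &= \log\det(\Theta) + \mathrm{tr}(\Theta^{-1} Y) - \frac{1}{2}\, \mathrm{tr}(\Theta^{-1} Y \Theta^{-1} Y) + o(\|Y\|_F^2) \\ &= \log\det(\Theta) + \mathrm{tr}(\Sigma Y) - \frac{1}{2}\, \mathrm{tr}(\Sigma Y \Sigma Y) + o(\|Y\|_F^2) \end{aligned} \tag{1.17}$$

When $Y$ is small, each entry of the term $\|\Lambda \circ (\Theta + Y)\|_1$ can be rewritten as:

$$\big| \Lambda_{ij} (\Theta_{ij} + Y_{ij}) \big| = \begin{cases} \Lambda_{ij}\, |Y_{ij}|, & \Theta_{ij} = 0 \\ \Lambda_{ij}\, \mathrm{sgn}(\Theta_{ij})\, (\Theta_{ij} + Y_{ij}), & \Theta_{ij} \ne 0 \end{cases} \tag{1.18}$$

Substituting Eqs. (1.16), (1.17) and (1.18) into Eq. (1.14), ignoring terms independent of $Y$, and using $\lambda_n / \sqrt{n} \to \lambda_0$, we have:

$$\hat{Y} = \arg\min_{Y} f(Y) = \frac{1}{4}\, \mathrm{tr}\big(\sqrt{n}\, Y \Sigma\, \sqrt{n}\, Y \Sigma\big) + \frac{1}{2}\, \mathrm{tr}\big(\sqrt{n}\, Y Z\big) + \lambda_0 \sum_{i=1}^{p} \sum_{j=1}^{p} \Big[ I(\Theta_{ij} = 0)\, \Lambda_{ij}\, \big|\sqrt{n}\, Y_{ij}\big| + I(\Theta_{ij} \ne 0)\, \Lambda_{ij}\, \mathrm{sgn}(\Theta_{ij})\, \sqrt{n}\, Y_{ij} \Big] \tag{1.19}$$

If we define $\Delta = \sqrt{n}\, Y = \sqrt{n}\,(\hat{\Theta} - \Theta)$, then we have $\hat{\Delta} = \arg\min_{\Delta} V(\Delta)$.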
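Two algebraic steps used in the proofs above can be checked numerically: the equivalence of the data-fit terms in Eqs. (1.0) and (1.1), and the second-order expansion of the log-determinant in Eq. (1.17). The following is a minimal sketch, not part of the original derivation. The function names `quadratic_gap` and `logdet_taylor_error`, the synthetic random data, and the convention that $U$ is accumulated as the sum of $u_j x_j^T$ are assumptions made for illustration.

```python
import numpy as np


def quadratic_gap(W, X, U_obs):
    """Difference between the squared-error term of Eq. (1.0) and the
    trace form of Eq. (1.1) plus the W-independent constant sum_j ||u_j||^2.

    X has rows x_j (shape n x p), U_obs has rows u_j (shape n x m),
    and W has shape m x p. The result should be zero up to round-off.
    """
    Sigma = X.T @ X        # sum_j x_j x_j^T, shape (p, p)
    U = U_obs.T @ X        # sum_j u_j x_j^T, shape (m, p) (assumed convention)
    direct = np.sum((X @ W.T - U_obs) ** 2)
    trace_form = np.trace(W.T @ W @ Sigma) - 2.0 * np.trace(W @ U.T)
    constant = np.sum(U_obs ** 2)
    return direct - (trace_form + constant)


def logdet_taylor_error(Theta, Y):
    """Absolute error of the second-order expansion of log det(Theta + Y)
    around Theta, with Sigma = Theta^{-1} as in Eq. (1.17)."""
    Sigma = np.linalg.inv(Theta)
    SY = Sigma @ Y
    exact = np.linalg.slogdet(Theta + Y)[1]
    approx = np.linalg.slogdet(Theta)[1] + np.trace(SY) - 0.5 * np.trace(SY @ SY)
    return abs(exact - approx)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n, p, m = 50, 4, 3
X = rng.normal(size=(n, p))
W_true = rng.normal(size=(m, p))
U_obs = X @ W_true.T + 0.1 * rng.normal(size=(n, m))   # noise model of Eq. (1.3)
W = rng.normal(size=(m, p))                            # arbitrary candidate W
gap = quadratic_gap(W, X, U_obs)

A = rng.normal(size=(p, p))
Theta = A @ A.T + p * np.eye(p)        # symmetric positive definite
B = rng.normal(size=(p, p))
Y_dir = 0.5 * (B + B.T)                # symmetric perturbation direction
err_small = logdet_taylor_error(Theta, 1e-3 * Y_dir)
err_large = logdet_taylor_error(Theta, 1e-1 * Y_dir)

print("gap between Eq. (1.0) and Eq. (1.1) data terms:", gap)
print("Taylor error for small and large Y:", err_small, err_large)
```

The gap should vanish up to floating-point error for any `W`, and the Taylor error should shrink much faster than the perturbation size, which is the behavior the $o(\|Y\|_F^2)$ remainder in Eq. (1.17) describes.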