Non-convex Compressed Sensing with the Sum-of-Squares Method

Tasuku Soma*   Yuichi Yoshida†

* Graduate School of Information Science and Technology, The University of Tokyo (tasuku [email protected]). Supported by JSPS Grant-in-Aid for JSPS Fellows.
† National Institute of Informatics and Preferred Infrastructure, Inc. ([email protected]). Supported by MEXT Grant-in-Aid for Scientific Research on Innovative Areas (24106001), JST CREST (Foundations of Innovative Algorithms for Big Data), and JST ERATO (Kawarabayashi Large Graph Project).

Abstract

We consider stable signal recovery in $\ell_q$ quasi-norm for $0 < q \le 1$. In this problem, given a measurement vector $y = Ax$ for some unknown signal vector $x \in \mathbb{R}^n$ and a known matrix $A \in \mathbb{R}^{m \times n}$, we want to recover $z \in \mathbb{R}^n$ with $\|x - z\|_q = O(\|x - x^*\|_q)$ from the measurement vector, where $x^*$ is the $s$-sparse vector closest to $x$ in $\ell_q$ quasi-norm. Although a small value of $q$ is favorable for measuring the distance to sparse vectors, previous methods for $q < 1$ involve $\ell_q$ quasi-norm minimization, which is computationally intractable. In this paper, we overcome this issue by using the sum-of-squares method, and give the first polynomial-time stable recovery scheme for a large class of matrices $A$ in $\ell_q$ quasi-norm for any fixed constant $0 < q \le 1$.

1 Introduction

Compressed sensing, the subject of recovering a sparse signal vector $x \in \mathbb{R}^n$ from its measurement vector $y = Ax$ for some known matrix $A \in \mathbb{R}^{m \times n}$ with $m \ll n$, has been intensively studied in the last decade [10, 22, 23]. It is well known that, if $A$ is a random Gaussian or Rademacher matrix with $m = \Theta(s \log(n/s))$, then with high probability (here and henceforth, "with high probability" means with a constant probability sufficiently close to one) we can exactly recover any $s$-sparse signal vector, that is, a vector with at most $s$ non-zero entries [13]. This means that we can significantly compress signal vectors if they are sparse (in an appropriate choice of basis).

In a realistic scenario, however, we can only assume that the signal vector $x$ is not exactly sparse but close to a sparse vector. In such cases, we want to recover $x$ with an error controlled by its distance to $s$-sparse vectors; this feature is called stability [23] or instance optimality [20] of a recovery scheme. For $q > 0$, an integer $s \in \mathbb{Z}_+$, and a vector $x \in \mathbb{R}^n$, we define

  $\sigma_s(x)_q = \min\{\|x - x'\|_q \mid x' \in \mathbb{R}^n,\ \|x'\|_0 \le s\}.$

That is, $\sigma_s(x)_q$ is the distance of $x$ to the closest $s$-sparse vector in $\ell_q$ (quasi-)norm. The objective of stable signal recovery in $\ell_q$ quasi-norm is, given a measurement vector $y = Ax$ generated from an unknown signal vector $x \in \mathbb{R}^n$, to recover a vector $z \in \mathbb{R}^n$ such that $\|z - x\|_q = O(\sigma_s(x)_q)$.

It is desirable to take a small $q$, for example, in the sparse noise model; that is, $x = x_0 + e$, where $x_0$ is the best $s$-sparse approximation to $x$ and $e$ is an $s'$-sparse noise with $s' = O(1)$. To see this, let us compare the cases $q = 1/2$ and $q = 1$ as an illustrative example. Note that

  $\|z - x\|_1 \le \|z - x\|_{1/2} \le n \|z - x\|_1$  and  $\sigma_s(x)_{1/2} = \|e\|_{1/2} \le s' \|e\|_1 = s' \sigma_s(x)_1.$

Hence, $\|z - x\|_{1/2} = O(\sigma_s(x)_{1/2})$ implies $\|z - x\|_1 = O(\sigma_s(x)_1)$, whereas $\|z - x\|_1 = O(\sigma_s(x)_1)$ only implies the weaker bound $\|z - x\|_{1/2} = O(n \sigma_s(x)_{1/2})$. It is reported that the noise vector $e$ is often sparse in practice [18, 29]. Intuitively speaking, a smaller $q$ works better in the sparse noise model because $\|z - x\|_q^q \to \|z - x\|_0$ and $\sigma_s(x)_q^q \to s'$ as $q \to 0$, and hence $z$ converges to an $(s + O(s'))$-sparse vector.
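To make these quantities concrete, here is a small numerical sketch (ours, not from the paper; all names are our own). It computes $\sigma_s(x)_q$ via the fact that the closest $s$-sparse vector keeps the $s$ largest-magnitude entries, and checks the $\ell_{1/2}$-versus-$\ell_1$ comparison above on a sparse-plus-sparse-noise input.

```python
# Sketch (ours): sigma_s(x)_q and the l_{1/2} vs l_1 comparison from the text.
import numpy as np

def lq(v, q):
    """l_q quasi-norm ||v||_q = (sum_i |v_i|^q)^(1/q) for 0 < q <= 1."""
    return np.sum(np.abs(v) ** q) ** (1.0 / q)

def sigma_s(x, s, q):
    """sigma_s(x)_q: the best s-sparse approximation keeps the s
    largest-magnitude entries, so the error is the l_q quasi-norm
    of the remaining tail."""
    tail = np.sort(np.abs(x))[::-1][s:]
    return np.sum(tail ** q) ** (1.0 / q)

rng = np.random.default_rng(0)
n, s, s_noise = 1000, 10, 3
x0 = np.zeros(n)
x0[rng.choice(n, s, replace=False)] = rng.uniform(2.0, 3.0, s)   # s-sparse signal
e = np.zeros(n)
e[rng.choice(n, s_noise, replace=False)] = 0.5                    # s'-sparse noise
x = x0 + e
# sigma_s(x)_{1/2} <= s' * sigma_s(x)_1, as claimed in the text:
print(sigma_s(x, s, 0.5), s_noise * sigma_s(x, s, 1.0))
```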
A common approach to stable signal recovery is to relax the $\ell_0$ quasi-norm to an $\ell_q$ quasi-norm for $0 < q \le 1$:

  $(P_q)$  minimize $\|z\|_q$  subject to $Az = y$ and $z \in \mathbb{R}^n$.

For $q = 1$, this approach is called basis pursuit, a well-studied and central algorithm in compressed sensing. If $A$ satisfies a technical condition called the stable null space property, then the solution $z$ to $(P_q)$ satisfies $\|z - x\|_q = O(\sigma_s(x)_q)$. It is folklore that the following result can be obtained by extending the results in [11, 13], which handle the case $q = 1$.

Theorem 1.1. Let $q \in (0, 1]$ and $s, n \in \mathbb{Z}_+$. Then a random Gaussian or Rademacher matrix $A \in \mathbb{R}^{m \times n}$ with $m = \Theta(s \log(n/s))$ satisfies the following property with high probability: for any vectors $x \in \mathbb{R}^n$ and $y = Ax$, the solution $z$ to $(P_q)$ satisfies $\|z - x\|_q = O(\sigma_s(x)_q)$.

When $q = 1$, i.e., for basis pursuit, the program $(P_q)$ can be represented as a linear program, and we can find its solution in polynomial time. When $q < 1$, however, solving $(P_q)$ is NP-hard in general, and Theorem 1.1 is currently only of theoretical interest. The main contribution of this paper is to show that, even when $q < 1$, there is a recovery scheme whose theoretical guarantee almost matches that of Theorem 1.1.

Theorem 1.2 (main theorem, informal version). Let $q \in (0, 1]$ be a constant of the form $q = 2^{-k}$ for some $k \in \mathbb{Z}_+$, let $s, n \in \mathbb{Z}_+$, and let $\epsilon > 0$. Then a random (normalized) Rademacher matrix $A \in \mathbb{R}^{m \times n}$ with $m = O(s^{2/q} \log n)$ satisfies the following property with high probability: for any vector $y = Ax$, where $x \in [-1, 1]^n$ is an unknown vector, we can recover $z$ with $\|z - x\|_q \le O(\sigma_s(x)_q) + \epsilon$ in polynomial time.

To our knowledge, our algorithm is the first polynomial-time stable recovery scheme in $\ell_q$ quasi-norm for $q < 1$. The assumption on $x$ can easily be relaxed to $x \in [-R, R]^n$ for any $R$ polynomial in $n$, keeping the running time polynomial in $n$. Our algorithm is based on the sum-of-squares method, which we describe next.
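For the tractable endpoint $q = 1$, basis pursuit is an ordinary convex program. The following is a minimal sketch (ours; cvxpy is simply our choice of modeling tool, not one the paper prescribes) of solving $(P_1)$ for a normalized Rademacher matrix.

```python
# Basis pursuit, the q = 1 instance of (P_q); a hedged sketch using cvxpy.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, m, s = 200, 80, 5
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)   # normalized Rademacher
x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
y = A @ x

z = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm(z, 1)), [A @ z == y]).solve()
print("l1 recovery error:", np.linalg.norm(z.value - x, 1))
```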
1.1 Sum of Squares Method

As we cannot exactly solve the non-convex program $(P_q)$ when $q < 1$, we use the sum-of-squares (SoS) semidefinite programming hierarchy [30, 31, 33, 39], or simply the SoS method. Here we adopt the description of the SoS method introduced in [3, 4]. The SoS method attempts to solve a polynomial optimization problem (POP) via a convex relaxation called the moment relaxation. We specify an even integer $d$, called the degree or the level, as a parameter. The SoS method finds a linear operator $L$ from the set of polynomials of degree at most $d$ to $\mathbb{R}$ which shares some properties of expectation operators and satisfies the given system of polynomial constraints. Such linear operators are called pseudoexpectations.

Definition 1.1 (pseudoexpectation). Let $\mathbb{R}[z]$ denote the polynomial ring over the reals in variables $z = (z(1), \ldots, z(n))$, and let $\mathbb{R}[z]_d$ denote the set of polynomials in $\mathbb{R}[z]$ of degree at most $d$. A degree-$d$ pseudoexpectation operator for $\mathbb{R}[z]$ is a linear operator $\tilde{\mathbb{E}} : \mathbb{R}[z]_d \to \mathbb{R}$ such that $\tilde{\mathbb{E}}(1) = 1$ and $\tilde{\mathbb{E}}(P^2) \ge 0$ for every polynomial $P$ of degree at most $d/2$.

For notational simplicity, it is convenient to regard a pseudoexpectation operator as the "expectation" operator of a so-called pseudodistribution $D$ (also called a locally consistent distribution [19, 35]). Note that it is possible that no actual distribution realizes a pseudoexpectation operator as its expectation operator (see, e.g., [19, 35]). Also note that we cannot sample from a pseudodistribution; only its low-degree moments are available. We use the subscript in $\tilde{\mathbb{E}}_D$ to emphasize the underlying pseudodistribution, and we write $\tilde{\mathbb{E}}_{z \sim D} P(z)$ to emphasize the variable $z$. We say that a degree-$d$ pseudodistribution $D$ satisfies the constraint $\{P = 0\}$ if $\tilde{\mathbb{E}} PQ = 0$ for all $Q$ of degree at most $d - \deg P$. We say that $D$ satisfies $\{P \ge 0\}$ if it satisfies the constraint $\{P - S = 0\}$ for some SoS polynomial $S \in \mathbb{R}_d[z]$.

A pseudoexpectation can be compactly represented by its moment matrix, an $n^{O(d/2)} \times n^{O(d/2)}$ symmetric matrix $M$ containing the values for all monomials of degree at most $d$. The condition $\tilde{\mathbb{E}} P^2 \ge 0$ is equivalent to requiring that $M$ is positive semidefinite, and the other conditions can be represented by linear constraints among the entries of $M$. The following theorem states that we can efficiently optimize over pseudodistributions, essentially via semidefinite programming.

Theorem 1.3 (the SoS method [30, 31, 33, 39]). For every $\epsilon > 0$, $d, n, m, M \in \mathbb{N}$, and $n$-variate polynomials $P_1, \ldots, P_m$ in $\mathbb{R}_d[z]$ whose coefficients are in $\{0, \ldots, M\}$, if there exists a degree-$d$ pseudodistribution $D$ for $z$ satisfying the constraint $\{P_i = 0\}$ for every $i \in [m]$, then we can compute, in $(n \cdot \mathrm{poly}\log(M/\epsilon))^{O(d)}$ time, a degree-$d$ pseudodistribution $D'$ for $z$ satisfying $P_i \le \epsilon$ and $P_i \ge -\epsilon$ for every $i \in [m]$.

By using binary search, we can also handle objective functions. We ignore numerical issues, since they never affect our arguments (at least from a theoretical perspective); we simply assume that the SoS method finds an optimal degree-$d$ pseudodistribution in $\mathrm{poly}(n^{O(d)})$ time.

The dual of the SoS method corresponds to the SoS proof system [27], an automatizable and restricted proof system for refuting a system of polynomial constraints. The SoS proof system is built on several deep results in real algebraic geometry, such as the Positivstellensatz and its variants [34, 37]. Roughly speaking, any fact that is provable in the (bounded-degree) SoS proof system is also reflected in pseudodistributions. This approach lies at the heart of recent surprising results [3, 32]: known proofs for integrality gap instances were imported into the SoS proof system, showing that the (low-degree) SoS hierarchy solves those integrality gap instances.
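To make the moment-matrix representation described above concrete, the following sketch (ours) writes down the simplest case, degree $d = 2$, where the matrix is indexed by the monomials $(1, z(1), \ldots, z(n))$; the function name and interface are hypothetical.

```python
# Sketch (ours): the degree-2 moment relaxation of {Az = y} as an SDP.
# M is indexed by the monomials (1, z_1, ..., z_n): M >> 0 encodes
# E~[P^2] >= 0 for linear P, M[0,0] = 1 encodes E~[1] = 1, and each
# constraint a_i^T z - y_i = 0 is tested against every degree-<=1 monomial.
import cvxpy as cp
import numpy as np

def degree2_moment_relaxation(A, y, c):
    m, n = A.shape
    M = cp.Variable((n + 1, n + 1), symmetric=True)   # moment matrix
    cons = [M >> 0, M[0, 0] == 1]
    for i in range(m):
        for j in range(n + 1):                        # monomial 1 or z_j
            # E~[(a_i^T z - y_i) * w_j] = 0
            cons.append(A[i, :] @ M[1:, j] == y[i] * M[0, j])
    # minimize the linear objective c^T E~[z] = c^T M[1:, 0]
    cp.Problem(cp.Minimize(c @ M[1:, 0]), cons).solve()
    return M.value
```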
1.2 Proof Outline

We now give a proof outline of Theorem 1.2. First we recall basic facts about stable signal recovery using the non-convex program $(P_q)$. For a set $S \subseteq [n]$, we denote by $\bar{S}$ the complement $[n] \setminus S$. For a vector $v \in \mathbb{R}^n$ and a set $S \subseteq [n]$, $v_S$ denotes the vector restricted to $S$. We abuse this notation and regard $v_S$ both as the vector in $\mathbb{R}^S$ with $v_S(i) = v(i)$ for $i \in S$, and as the vector in $\mathbb{R}^n$ with $v_S(i) = v(i)$ for $i \in S$ and $v_S(i) = 0$ for $i \in \bar{S}$. The following notion is important in stable signal recovery.

Definition 1.2 (robust null space property). Let $q \in (0, 1]$, $\rho, \tau > 0$, and $s \in \mathbb{Z}_+$. A matrix $A \in \mathbb{R}^{m \times n}$ is said to satisfy the $(\rho, \tau, s)$-robust null space property (RNSP) in $\ell_q$ quasi-norm if

  (1.1)  $\|v_S\|_q^q \le \rho \|v_{\bar{S}}\|_q^q + \tau \|Av\|_2^q$

holds for any $v \in \mathbb{R}^n$ and $S \subseteq [n]$ with $|S| \le s$.

If $A$ satisfies the $(\rho, \tau, s)$-robust null space property for some $\rho < 1$, then the solution to the program $(P_q)$ satisfies $\|z - x\|_q = O(\sigma_s(x)_q)$, and we establish Theorem 1.1. Indeed, a random Gaussian or Rademacher matrix will do as long as $m = \Omega(s \log(n/s))$ [13].

The main idea of our algorithm is that we can convert the proof of Theorem 1.1 into a sum-of-squares (SoS) proof, a proof in which all the polynomials appearing have bounded degree and all the inequalities involved are implied by the SoS partial order. By the duality between SoS proofs and the SoS semidefinite programming hierarchy (see, e.g., [32]), any consequence of an SoS proof derived from a given set of polynomial constraints is satisfied by the pseudodistribution obtained by running the SoS method on those constraints. Let us consider the following SoS version of $(P_q)$:

  $(\tilde{P}_q)$  minimize $\tilde{\mathbb{E}}_{z \sim D} \|z\|_q^q$  subject to $D$ satisfies $\{Az = y\}$,

where the optimization is over pseudodistributions $D$ for the variable $z$. In our algorithm, we solve $(\tilde{P}_q)$ with additional technical constraints using the SoS method, and then recover a vector from the obtained pseudoexpectation operator. A subtle issue here is that we need to deal with terms of the form $|z(i)|^q$ for fractional $q$, which are not polynomial in $z$. A remedy is to add auxiliary variables and constraints among them (details are explained later); for now we assume that we can exactly solve the program $(\tilde{P}_q)$.

As we can only execute SoS proofs by using the SoS method, we need an SoS version of the robust null space property.

Definition 1.3 (informal version). Let $q \in (0, 1]$, $\rho, \tau > 0$, and $s \in \mathbb{Z}_+$. A matrix $A \in \mathbb{R}^{m \times n}$ is said to satisfy the $(\rho, \tau, s)$-pseudo robust null space property in $\ell_q$ quasi-norm ($\ell_q$-$(\rho, \tau, s)$-PRNSP for short) if

  (1.2)  $\tilde{\mathbb{E}}_{v \sim D} \|v_S\|_q^q \le \rho\, \tilde{\mathbb{E}}_{v \sim D} \|v_{\bar{S}}\|_q^q + \tau \bigl(\tilde{\mathbb{E}}_{v \sim D} \|Av\|_2^2\bigr)^{q/2}$

holds for any pseudodistribution $D$ of degree at least $2/q$ and any $S \subseteq [n]$ with $|S| \le s$.

In the formal definition of the PRNSP, we also allow imposing some other technical constraints. Note that the PRNSP is a stronger condition than the robust null space property, because the latter considers only actual distributions consisting of single vectors.

If the measurement matrix $A \in \mathbb{R}^{m \times n}$ satisfies the PRNSP, then the resulting pseudodistribution $D$ satisfies an SoS version of stable signal recovery. Moreover, we can round $D$ to a vector close to the signal vector, as shown in the following theorem.

Theorem 1.4 (informal version). Let $q \in (0, 1]$ be a constant of the form $q = 2^{-k}$ for some $k \in \mathbb{Z}_+$, $0 < \rho < 1$, $\tau > 0$, $s \in \mathbb{Z}_+$, and $\epsilon > 0$. If $A$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP, then, from the pseudodistribution $D$ obtained by solving $(\tilde{P}_q)$ with additional constraints depending on $\epsilon$, we can compute a vector $\bar{z} \in \mathbb{R}^n$ such that

  $\|\bar{z} - x\|_q \le \dfrac{2 + 2\rho}{1 - \rho}\, \sigma_s(x)_q + \epsilon$

in polynomial time.

We then show that there exists a family of matrices satisfying the PRNSP, namely the family of Rademacher matrices.

Theorem 1.5 (informal version). Let $q \in (0, 1]$ and $s \in \mathbb{Z}_+$. For some $0 < \rho < 1$ and $\tau = O(s)$, a (normalized) Rademacher matrix $A \in \mathbb{R}^{m \times n}$ with $m = \Omega(s^{2/q} \log n)$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with high probability.

Theorem 1.2 is a consequence of Theorems 1.4 and 1.5.

1.3 Technical Contributions

In order to prove Theorem 1.4, we would like to follow the proof of Theorem 1.1 in the SoS proof system. However, many steps of the proof require non-trivial modifications for the SoS proof system to go through. Let us explain in more detail below.

Handling the $\ell_q$ quasi-norm. The first obvious obstacle is that the program $(\tilde{P}_q)$ contains non-polynomial terms such as $|z(i)|^q$ in the objective function. This is easily resolved by lifting; i.e., we introduce a variable $t_{z(i)}$, which is supposed to represent $|z(i)|^q$, and impose the constraints $t_{z(i)}^{2/q} = z(i)^2$ and $t_{z(i)} \ge 0$ (recall that $q$ is of the form $2^{-k}$ for some $k \in \mathbb{Z}_+$, so $2/q$ is an integer).
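As a worked instance (ours, not spelled out in the paper): for $q = 1/2$, i.e., $k = 1$, we have $2/q = 4$, and the lifted constraints pin $t_{z(i)}$ down exactly:

```latex
t_{z(i)} \ge 0, \qquad t_{z(i)}^{4} = z(i)^{2}
\quad\Longrightarrow\quad t_{z(i)} = \bigl(z(i)^{2}\bigr)^{1/4} = |z(i)|^{1/2},
```

since the unique nonnegative real solution of $t^4 = z(i)^2$ is $|z(i)|^{1/2}$.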
Imposing the triangle inequality for the $\ell_q^q$ metric. The second problem is that the added variable $t_{z(i)}$ does not always behave as expected. When proving Theorem 1.1, we frequently use the triangle inequality $\|u + v\|_q^q \le \|u\|_q^q + \|v\|_q^q$ for vectors $u$ and $v$, or in other words, $\sum_i |u(i) + v(i)|^q \le \sum_i |u(i)|^q + \sum_i |v(i)|^q$. Unfortunately, we cannot derive its SoS version $\tilde{\mathbb{E}} \sum_i t_{u(i)+v(i)} \le \tilde{\mathbb{E}} \sum_i t_{u(i)} + \tilde{\mathbb{E}} \sum_i t_{v(i)}$ in the SoS proof system. This is because, even if we impose the constraints $t_{u(i)}^{2/q} = u(i)^2$, $t_{v(i)}^{2/q} = v(i)^2$, and $t_{u(i)+v(i)}^{2/q} = (u(i) + v(i))^2$, these constraints do not entail any relation among $t_{u(i)}$, $t_{v(i)}$, and $t_{u(i)+v(i)}$. (Instead, we can derive $\tilde{\mathbb{E}} \bigl(\sum_i t_{u(i)+v(i)}^{2/q}\bigr)^{q/2} \le \tilde{\mathbb{E}} \bigl(\sum_i t_{u(i)}^{2/q}\bigr)^{q/2} + \tilde{\mathbb{E}} \bigl(\sum_i t_{v(i)}^{2/q}\bigr)^{q/2}$ in the SoS proof system, which is not very useful.)

To overcome this issue, we use the fact that we only need the triangle inequality among $z - x$, $z$, and $x$, where $z$ is the variable of $(\tilde{P}_q)$ and $x$ is the signal vector. Of course, we do not know $x$ in advance; we only know that $x$ lies in $[-1, 1]^n$. We therefore discretize the region $[-1, 1]^n$ with a small step $\delta \ll 1/n^{2/q}$ and explicitly add the constraints $\tilde{\mathbb{E}}\, t_{z(i)-b} \le \tilde{\mathbb{E}}\, t_{z(i)} + |b|^q$ for each $i \in [n]$ and each multiple $b \in [-1, 1]$ of $\delta$. Instead of $x$, we use an approximate vector $x^L$, each coordinate of which is a multiple of $\delta$, with $\|x - x^L\|$ small. The SoS proof system augmented with these triangle inequalities can then follow the proof of Theorem 1.1.

Rounding. Rounding a pseudodistribution to a feasible solution is a challenging task in general. Fortunately, we can easily round the pseudodistribution $D$ obtained by solving the program $(\tilde{P}_q)$ to a vector $\bar{z}$ close to the signal vector by fixing each coordinate $\bar{z}(i)$ to $\arg\min_b \tilde{\mathbb{E}}_D |z(i) - b|^q$, where $b \in [-1, 1]$ runs over multiples of $\delta$.
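A minimal sketch (ours) of this rounding step is given below; `pe_abs_q(i, b)` is a hypothetical accessor for the solver's value of $\tilde{\mathbb{E}}_D\, t_{z(i)-b}$, not an interface the paper defines.

```python
# Rounding sketch (ours): set z_bar(i) to the grid point b in L that
# minimizes the pseudo-moment E~|z(i) - b|^q, coordinate by coordinate.
import numpy as np

def round_pseudodistribution(pe_abs_q, n, delta):
    grid = np.arange(-1.0, 1.0 + delta / 2, delta)   # the grid L in [-1, 1]
    z_bar = np.empty(n)
    for i in range(n):
        vals = np.array([pe_abs_q(i, b) for b in grid])
        z_bar[i] = grid[int(np.argmin(vals))]        # argmin_b E~|z(i) - b|^q
    return z_bar
```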
Imposing the PRNSP. In order to prove Theorem 1.5, we show that a (normalized) Rademacher matrix with small coherence satisfies the PRNSP. Here, the coherence of a matrix $A \in \mathbb{R}^{m \times n}$ with columns $a_1, \ldots, a_n$ is defined as $\mu = \max_{i \ne j} |\langle a_i, a_j \rangle|$. Again our approach is to rewrite a proof that matrices with small coherence satisfy the robust null space property into an SoS proof. Here is the last obstacle: it seems inevitable to add exponentially many extra variables and constraints in order to convert the known proofs. However, we show that the number of extra variables is polynomially bounded and that the extra constraints admit a separation oracle when $A$ is a Rademacher matrix. Thanks to the ellipsoid method, we can obtain the desired SoS proof for Rademacher matrices in polynomial time.

1.4 Related Work

The notion of stable signal recovery in $\ell_1$ norm was introduced in [11, 12], where it is shown that basis pursuit is a stable recovery scheme for a random Gaussian matrix $A \in \mathbb{R}^{m \times n}$ provided that $m = \Omega(s \log(n/s))$. The notion of stable signal recovery was extended to $\ell_q$ quasi-norm in [20, 36]. Stable recovery schemes with almost linear encoding and recovery time have been proposed using sparse matrices [8, 9, 28]. See [23, 26] for surveys of this area.

Besides the fact that $\|\cdot\|_q^q$ converges to the $\ell_0$ norm as $q \to 0$, an advantage of using the $\ell_q$ quasi-norm is that we can use a smaller measurement matrix for signal recovery (at least for exact recovery). Chartrand and Staneva [16] showed that $\ell_q$ quasi-norm minimization is a stable recovery scheme for a certain matrix $A \in \mathbb{R}^{m \times n}$ with $m = \Omega(s + qs \log(n/s))$. (Although they do not state this result explicitly, it can be shown by a simple adaptation of their result to a proof of stability of $\ell_1$ minimization.) Hence, as $q \to 0$, the required dimension of the measurement vector scales like $O(s)$. As stated at the beginning of this section, $\ell_q$ quasi-norm minimization is effective in the sparse noise model. Other methods (e.g., Prony's method [21]) are also applicable to the sparse noise model, but they require a priori information on the sparsity of the noise. In order to implement $\ell_q$ quasi-norm minimization for $q < 1$, several heuristic methods have been proposed, including a gradient method [14], the iteratively reweighted least squares method (IRLS) [17], an operator-splitting algorithm [15], and a method that computes a local minimum [24].

Recently, the sum-of-squares method has been found to be useful in several unsupervised learning tasks such as the planted sparse vector problem [4], dictionary learning [5], and tensor decompositions [5, 6, 25, 40]. The surveys [19, 35] provide detailed information on the Lasserre and other hierarchies. The handbook [2] contains various articles on semidefinite programming, conic programming, and the SoS method, from both theoretical and practical points of view.

1.5 Organization

In Section 2, we review basic facts on the SoS proof system. In Section 3, we show that stable signal recovery is achieved if the measurement matrix $A$ satisfies the PRNSP. Then, in Section 4, we show that a Rademacher matrix satisfies the PRNSP with high probability.

2 Preliminaries

For an integer $k \in \mathbb{N}$, we denote by $[k]$ the set $\{1, 2, \ldots, k\}$. For a matrix $A$, $\sigma_{\max}(A)$ denotes its largest singular value. We write $P \succeq 0$ if $P$ is a sum-of-squares polynomial, and similarly we write $P \succeq Q$ if $P - Q \succeq 0$.

We now present several useful lemmas on pseudodistributions. Most of these results were proved in the series of works by Barak and his coauthors [3, 4, 5, 6]. We basically follow the terminology and notation of the recent survey by Barak and Steurer [7].

Lemma 2.1 (pseudo Cauchy–Schwarz [3]). Let $d \in \mathbb{Z}_+$ be an integer, let $P, Q \in \mathbb{R}[x]_d$ be polynomials of degree $d$, and let $D$ be a pseudodistribution of degree at least $2d$. Then we have

  $\tilde{\mathbb{E}}_{x \sim D} P(x) Q(x) \le \bigl(\tilde{\mathbb{E}}_{x \sim D} P(x)^2\bigr)^{1/2} \bigl(\tilde{\mathbb{E}}_{x \sim D} Q(x)^2\bigr)^{1/2}.$

Lemma 2.2 ([3]). Let $d \in \mathbb{Z}_+$ be an integer, and let $u$ and $v$ be vectors whose entries are polynomials in $\mathbb{R}[x]_d$. Then for any pseudodistribution $D$ of degree at least $2d$, we have

  $\tilde{\mathbb{E}} \langle u, v \rangle \le \bigl(\tilde{\mathbb{E}} \|u\|_2^2\bigr)^{1/2} \bigl(\tilde{\mathbb{E}} \|v\|_2^2\bigr)^{1/2}.$

Lemma 2.3 ([3]). Let $u$, $v$, and $D$ be the same as above. Then we have

  $\bigl(\tilde{\mathbb{E}}_{x \sim D} \|u + v\|_2^2\bigr)^{1/2} \le \bigl(\tilde{\mathbb{E}}_{x \sim D} \|u\|_2^2\bigr)^{1/2} + \bigl(\tilde{\mathbb{E}}_{x \sim D} \|v\|_2^2\bigr)^{1/2}.$

Lemma 2.4. For any pseudodistribution $D$ of degree at least 4, we have

  $\tilde{\mathbb{E}}_{(u,v) \sim D} \langle u, v \rangle^2 \le \tilde{\mathbb{E}}_{(u,v) \sim D} \|u\|_2^2 \|v\|_2^2.$

Proof. By the Lagrange identity, we have

  $\|u\|_2^2 \|v\|_2^2 - \langle u, v \rangle^2 = \sum_{i < j} \bigl(u(i) v(j) - u(j) v(i)\bigr)^2.$

Hence, the statement has an SoS proof of degree 4. ∎
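The Lagrange identity is an exact polynomial identity, which is precisely why the proof above is a degree-4 SoS proof; a quick numerical sanity check (ours):

```python
# Numerical check (ours) of the Lagrange identity behind Lemma 2.4:
# ||u||^2 ||v||^2 - <u,v>^2 = sum_{i<j} (u(i)v(j) - u(j)v(i))^2.
import numpy as np

rng = np.random.default_rng(1)
u, v = rng.standard_normal(8), rng.standard_normal(8)
lhs = (u @ u) * (v @ v) - (u @ v) ** 2
rhs = sum((u[i] * v[j] - u[j] * v[i]) ** 2
          for i in range(8) for j in range(i + 1, 8))
assert np.isclose(lhs, rhs)
```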
3 Signal Recovery Based on the Pseudo Robust Null Space Property

In this section, we prove Theorem 1.4; its formal statement is provided in Theorem 3.1. In the rest of this section, we fix $q \in (0, 1]$ and an error parameter $\epsilon > 0$. We also fix a matrix $A \in \mathbb{R}^{m \times n}$ and the unknown signal vector $x \in [-1, 1]^n$, and hence the measurement vector $y = Ax$.

Let $\delta > 0$ be a parameter chosen later, and let $L$ be the set of multiples of $\delta$ in $[-1, 1]$. Clearly $|L| = O(1/\delta)$, and for any $x \in [-1, 1]$ there exists $\bar{x} \in L$ with $|x - \bar{x}| \le \delta$. Let $x^L \in \mathbb{R}^n$ be the point in $L^n$ closest to $x$. Our strategy is to recover $x^L$ instead of $x$. As we know that $x^L \in L^n$ and each coordinate of $x^L$ is discretized, we can add several useful constraints, such as triangle inequalities, to the moment relaxation.

For the measurement vector $y = Ax$, we have

  $\|Ax^L - y\|_2 = \|A(x^L - x)\|_2 \le \sigma_{\max}(A)\, \|x^L - x\|_2 \le \sigma_{\max}(A) \sqrt{s}\, \delta.$

To recover a vector close to $x^L$ in $\ell_q$ quasi-norm from $y$, we consider the following non-convex program:

  $(P_{q,\eta})$  minimize $\|z\|_q^q$  subject to $\|Az - y\|_2^2 \le \eta^2$,

where $\eta = \sigma_{\max}(A) \sqrt{s}\, \delta$. Note that the desired vector $x^L$ is a feasible solution of this program. In Section 3.1, we show how to formulate the non-convex program $(P_{q,\eta})$ as a POP. Although the form of the POP depends on $\epsilon$, we can solve its moment relaxation using the SoS method. We then show that, if $A$ satisfies a certain property, which we call the pseudo robust null space property, then the vector $z$ "sampled" from the pseudodistribution obtained by the SoS method is close to the vector $x^L$ in $\ell_q$ quasi-norm. In Section 3.2, we show how to round the pseudodistribution to an actual vector $\bar{z}$ so that $\|\bar{z} - x\|_q = O(\sigma_s(x)_q) + \epsilon$ for small $\epsilon > 0$.

3.1 Formulating $\ell_q$ quasi-norm minimization as a POP

To formulate $(P_{q,\eta})$ as a POP, we introduce additional variables. Formally, we define $\|z\|_q^q$ as a shorthand for $\sum_{i=1}^n t_{z(i)}$, where $t_{z(i)}$ for $i \in [n]$ is an additional variable corresponding to $|z(i)|^q$. We also impose the constraints $t_{z(i)} \ge 0$ and $t_{z(i)}^{2/q} = z(i)^2$ for each $i \in [n]$.

Unfortunately, these constraints are not sufficient to derive formulas connecting $\tilde{\mathbb{E}}\, t_{z(i)}$ and $z(i)$ in the SoS proof system, because $\tilde{\mathbb{E}}\, t_{z(i)}$ and $\tilde{\mathbb{E}}\, t_{z(i)}^{2/q}$ are unrelated in general. To address this issue, we add several other variables and constraints. Specifically, for each $i \in [n]$ and $b \in L$, we add a new variable $t_{z(i)-b}$ along with the following constraints:

• $t_{z(i)-b} \ge 0$,
• $t_{z(i)-b}^{2/q} = (z(i) - b)^2$,
• $t_{z(i)-b} \le t_{z(i)-b'} + |b - b'|^q$ for every $b' \in L$,
• $|b - b'|^q \le t_{z(i)-b} + t_{z(i)-b'}$ for every $b' \in L$.

Intuitively, $t_{z(i)-b}$ represents $|z(i) - b|^q$. The last two families of constraints encode triangle inequalities; for example, the first of them corresponds to $|z(i) - b|^q \le |z(i) - b'|^q + |b - b'|^q$. This increases the number of variables and constraints by $O(n/\delta)$ and $O(n/\delta^2)$, respectively (see the counting sketch below). For the sake of notational simplicity, we will denote $t_{z(i)-b}$ by $|z(i) - b|^q$ for $i \in [n]$ and $b \in L$ in POPs. Note that these constraints are valid in the sense that the desired solution $x^L$ satisfies all of them.
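To make the bookkeeping concrete, here is a small counting sketch (ours) that enumerates the lifted variables and triangle constraints for a given $n$ and $\delta$:

```python
# Counting sketch (ours) for the lifted POP of Section 3.1: one variable
# t_{z(i)-b} per pair (i, b), plus two triangle constraints per (i, b, b');
# the nonnegativity and power constraints add only O(n/delta) more.
import numpy as np

def lifted_counts(n, delta):
    grid = np.arange(-1.0, 1.0 + delta / 2, delta)       # the grid L
    num_vars = n * len(grid)                             # O(n/delta)
    num_triangles = 2 * n * len(grid) * (len(grid) - 1)  # O(n/delta^2)
    return num_vars, num_triangles

print(lifted_counts(n=100, delta=0.1))   # -> (2100, 84000)
```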
With these variables and constraints in place, we arrive at the following POP:

  minimize $\|z\|_q^q$  subject to $z$ satisfies the system $\{\|Az - y\|_2^2 \le \eta^2\} \cup \mathcal{P}$,

where $\mathcal{P}$ is a system of constraints. Here, $\mathcal{P}$ contains all the constraints on $t_{z(i)-b}$ mentioned above, but is not limited to them; indeed, in Section 4, we add other (valid) constraints to $\mathcal{P}$ so that the matrix $A$ satisfies the pseudo robust null space property. From now on, we also fix $\mathcal{P}$. We now consider the following moment relaxation:

  $(\tilde{P}_{q,\eta})$  minimize $\tilde{\mathbb{E}}_D \|z\|_q^q$  subject to $D$ satisfies the system $\{\|Az - y\|_2^2 \le \eta^2\} \cup \mathcal{P}$.

Proposition 3.1. Let $n_{\mathcal{P}}$ be the number of variables in $\mathcal{P}$ and $d_{\mathcal{P}}$ be the maximum degree that appears in the system $\mathcal{P}$. If there is a polynomial-time separation oracle for the constraints in $\mathcal{P}$, then we can solve the moment relaxation $(\tilde{P}_{q,\eta})$ in $(n/\delta + n_{\mathcal{P}})^{O(2/q + d_{\mathcal{P}})}$ time.

Proof. Immediate from Theorem 1.3. ∎

Since $x^L \in L^n$, the value $\tilde{\mathbb{E}} \|z - x^L\|_q^q$ is well defined. Furthermore, $D$ satisfies the following triangle inequalities:

  $\tilde{\mathbb{E}} \|z\|_q^q \le \tilde{\mathbb{E}} \|z - x^L\|_q^q + \|x^L\|_q^q,$
  $\tilde{\mathbb{E}} \|z - x^L\|_q^q \le \tilde{\mathbb{E}} \|z\|_q^q + \|x^L\|_q^q,$
  $\|x^L\|_q^q \le \tilde{\mathbb{E}} \|z - x^L\|_q^q + \tilde{\mathbb{E}} \|z\|_q^q.$

Now we formally define the pseudo robust null space property.

Definition 3.1. Let $\rho, \tau > 0$ and $s \in \mathbb{Z}_+$. A matrix $A \in \mathbb{R}^{m \times n}$ is said to satisfy the $(\rho, \tau, s)$-pseudo robust null space property in $\ell_q$ quasi-norm ($\ell_q$-$(\rho, \tau, s)$-PRNSP for short) with respect to the system $\mathcal{P}$ if

  $\tilde{\mathbb{E}}_D \|v_S\|_q^q \le \rho\, \tilde{\mathbb{E}}_D \|v_{\bar{S}}\|_q^q + \tau \bigl(\tilde{\mathbb{E}}_D \|Av\|_2^2\bigr)^{q/2}$

holds for any pseudodistribution $D$ of degree at least $2/q$ satisfying $\mathcal{P}$, all $v$ of the form $z - b$ ($b \in L^n$), and any $S \subseteq [n]$ with $|S| = s$.

Lemma 3.1. Let $0 < \rho < 1$, $\tau > 0$, and $s \in \mathbb{Z}_+$. If the matrix $A \in \mathbb{R}^{m \times n}$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to the system $\mathcal{P}$, then we have

  $\tilde{\mathbb{E}} \|z - x^L\|_q^q \le \dfrac{1 + \rho}{1 - \rho} \bigl(\tilde{\mathbb{E}} \|z\|_q^q - \|x^L\|_q^q + 2 \|x^L_{\bar{S}}\|_q^q\bigr) + \dfrac{2\tau}{1 - \rho} \bigl(\tilde{\mathbb{E}} \|A(z - x^L)\|_2^2\bigr)^{q/2}$

for any $S \subseteq [n]$ with $|S| \le s$.

Proof. Let $v := z - x^L$. Then, from the PRNSP of $A$, we have

  (3.3)  $\tilde{\mathbb{E}} \|v_S\|_q^q \le \rho\, \tilde{\mathbb{E}} \|v_{\bar{S}}\|_q^q + \tau \bigl(\tilde{\mathbb{E}} \|Av\|_2^2\bigr)^{q/2}.$

Note that

  $\|x^L\|_q^q = \|x^L_S\|_q^q + \|x^L_{\bar{S}}\|_q^q \le \tilde{\mathbb{E}} \|(z - x^L)_S\|_q^q + \tilde{\mathbb{E}} \|z_S\|_q^q + \|x^L_{\bar{S}}\|_q^q = \tilde{\mathbb{E}} \|v_S\|_q^q + \tilde{\mathbb{E}} \|z_S\|_q^q + \|x^L_{\bar{S}}\|_q^q$

and

  $\tilde{\mathbb{E}} \|v_{\bar{S}}\|_q^q = \tilde{\mathbb{E}} \|(z - x^L)_{\bar{S}}\|_q^q \le \|x^L_{\bar{S}}\|_q^q + \tilde{\mathbb{E}} \|z_{\bar{S}}\|_q^q.$

Summing up these two inequalities, we get

  (3.4)  $\tilde{\mathbb{E}} \|v_{\bar{S}}\|_q^q \le \tilde{\mathbb{E}} \|z\|_q^q - \|x^L\|_q^q + \tilde{\mathbb{E}} \|v_S\|_q^q + 2 \|x^L_{\bar{S}}\|_q^q.$

Combining (3.3) and (3.4), we get

  $\tilde{\mathbb{E}} \|v_{\bar{S}}\|_q^q \le \dfrac{1}{1 - \rho} \bigl(\tilde{\mathbb{E}} \|z\|_q^q - \|x^L\|_q^q + 2 \|x^L_{\bar{S}}\|_q^q\bigr) + \dfrac{\tau}{1 - \rho} \bigl(\tilde{\mathbb{E}} \|Av\|_2^2\bigr)^{q/2}.$

Therefore,

  $\tilde{\mathbb{E}} \|v\|_q^q = \tilde{\mathbb{E}} \|v_S\|_q^q + \tilde{\mathbb{E}} \|v_{\bar{S}}\|_q^q \le (1 + \rho)\, \tilde{\mathbb{E}} \|v_{\bar{S}}\|_q^q + \tau \bigl(\tilde{\mathbb{E}} \|Av\|_2^2\bigr)^{q/2} \le \dfrac{1 + \rho}{1 - \rho} \bigl(\tilde{\mathbb{E}} \|z\|_q^q - \|x^L\|_q^q + 2 \|x^L_{\bar{S}}\|_q^q\bigr) + \dfrac{2\tau}{1 - \rho} \bigl(\tilde{\mathbb{E}} \|Av\|_2^2\bigr)^{q/2}.$ ∎

Corollary 3.1. Let $0 < \rho < 1$, $\tau > 0$, and $s \in \mathbb{Z}_+$. If the matrix $A \in \mathbb{R}^{m \times n}$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to the system $\mathcal{P}$, then we have

  $\tilde{\mathbb{E}} \|z - x^L\|_q^q \le \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x^L)_q^q + \dfrac{2^{1+q} \tau}{1 - \rho}\, \eta^q.$

Proof. Note that $\tilde{\mathbb{E}} \|z\|_q^q - \|x^L\|_q^q \le 0$, since $x^L$ is feasible for $(\tilde{P}_{q,\eta})$. Moreover, by Lemma 2.3,

  $\bigl(\tilde{\mathbb{E}} \|A(z - x^L)\|_2^2\bigr)^{1/2} \le \bigl(\tilde{\mathbb{E}} \|Az - y\|_2^2\bigr)^{1/2} + \|y - Ax^L\|_2 \le 2\eta,$

and hence $\bigl(\tilde{\mathbb{E}} \|A(z - x^L)\|_2^2\bigr)^{q/2} \le 2^q \eta^q$. The statement is immediate from the previous lemma with $S$ an index set of the $s$ largest-magnitude entries of $x^L$. ∎

3.2 Rounding

Suppose that we have obtained a pseudodistribution $D$ as a solution to $(\tilde{P}_{q,\eta})$. Then, we use the following simple rounding scheme to extract an actual vector $\bar{z} \in \mathbb{R}^n$: for each $i \in [n]$, find $b \in L$ that minimizes $\tilde{\mathbb{E}}_{z \sim D} |z(i) - b|^q$ and set $\bar{z}(i) = b$. In other words, $\bar{z}$ is a minimizer of $\tilde{\mathbb{E}}_{z \sim D} \|z - b\|_q^q$, where $b$ runs over vectors in $L^n$. Clearly, we can find $\bar{z}$ in $O(n/\delta)$ time. We also have the following guarantee on its distance to $x^L$.

Lemma 3.2. Let $0 < \rho < 1$, $\tau > 0$, and $s \in \mathbb{Z}_+$. If the matrix $A \in \mathbb{R}^{m \times n}$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to the system $\mathcal{P}$, then the vector $\bar{z} \in L^n$ satisfies

  (3.5)  $\|\bar{z} - x^L\|_q^q \le 2\, \dfrac{1 + \rho}{1 - \rho}\, \sigma_s(x^L)_q^q + \dfrac{2^{1+q} \tau}{1 - \rho}\, \eta^q.$

Proof. Corollary 3.1 bounds $\tilde{\mathbb{E}} \|z - x^L\|_q^q$. Since $x^L \in L^n$ and $\bar{z}$ is a minimizer of $\tilde{\mathbb{E}} \|z - b\|_q^q$ over $b \in L^n$, we also have $\tilde{\mathbb{E}} \|z - \bar{z}\|_q^q \le \tilde{\mathbb{E}} \|z - x^L\|_q^q$. Then, by the triangle inequality,

  $\|\bar{z} - x^L\|_q^q \le \tilde{\mathbb{E}} \|z - \bar{z}\|_q^q + \tilde{\mathbb{E}} \|z - x^L\|_q^q,$

and (3.5) follows. ∎
Then, we show that $\bar{z}$ is close to the signal vector $x$ in the $\ell_q^q$ metric.

Lemma 3.3. Let $0 < \rho < 1$, $\tau > 0$, and $s \in \mathbb{Z}_+$. If the matrix $A \in \mathbb{R}^{m \times n}$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to the system $\mathcal{P}$, then the vector $\bar{z}$ satisfies

  $\|\bar{z} - x\|_q^q \le \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x)_q^q + O(\epsilon)$

by choosing $\delta = \min\bigl\{\bigl((1 - \rho)\epsilon/\tau\bigr)^{1/q} / \bigl(\sigma_{\max}(A)\sqrt{s}\bigr),\ \bigl((1 - \rho)\epsilon/n\bigr)^{1/q}\bigr\}$.

Proof. From the choice of $\delta$, we have $\eta = \sigma_{\max}(A)\sqrt{s}\,\delta \le \bigl((1 - \rho)\epsilon/\tau\bigr)^{1/q}$. Then, from Lemma 3.2, we get

  $\|\bar{z} - x^L\|_q^q \le \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x^L)_q^q + O(\epsilon).$

Note that $\|x^L - x\|_q^q \le n \delta^q$ and $|\sigma_s(x)_q^q - \sigma_s(x^L)_q^q| = O(n \delta^q)$. Then we have

  $\|\bar{z} - x\|_q^q \le \|\bar{z} - x^L\|_q^q + \|x^L - x\|_q^q \le \dfrac{2(1 + \rho)}{1 - \rho} \bigl(\sigma_s(x)_q^q + O(n \delta^q)\bigr) + O(\epsilon) + n \delta^q = \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x)_q^q + O\Bigl(\dfrac{n \delta^q}{1 - \rho}\Bigr) + O(\epsilon) = \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x)_q^q + O(\epsilon).$ ∎

Corollary 3.2. Let $0 < \rho < 1$, $\tau > 0$, and $s \in \mathbb{Z}_+$. If the matrix $A \in \mathbb{R}^{m \times n}$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to the system $\mathcal{P}$, then the vector $\bar{z}$ satisfies

  $\|\bar{z} - x\|_q \le \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x)_q + O(\epsilon)$

by choosing $\delta = O\bigl((1 - \rho)^{1/q + 1}\, \epsilon^{1/q} / \bigl(\tau\, n^{1/q}\, \sigma_{\max}(A)\sqrt{s}\bigr)\bigr)$.

Proof. Invoke Lemma 3.3 with

  $\epsilon' = \Bigl(\dfrac{1 - \rho}{2(1 + \rho)\, \sigma_s(x)_q^q}\Bigr)^{1/q - 1} \cdot \dfrac{(1 - \rho)\, \epsilon^{1/q}}{n} = \Omega\Bigl(\dfrac{\epsilon^{1/q}}{n}\Bigr)$

in place of $\epsilon$. Then, we get a vector $\bar{z}$ such that

  $\|\bar{z} - x\|_q \le \Bigl(\dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x)_q^q + \epsilon'\Bigr)^{1/q} \le \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x)_q + O\Bigl(\Bigl(\dfrac{2(1 + \rho)}{1 - \rho}\Bigr)^{1/q - 1} \epsilon'\, \sigma_s(x)_q^{1 - q}\Bigr) \le \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x)_q + O(\epsilon).$

Note that the required $\delta$ is

  $\Omega\Bigl(\min\Bigl\{\dfrac{\bigl((1 - \rho)\epsilon'/\tau\bigr)^{1/q}}{\sigma_{\max}(A)\sqrt{s}},\ \Bigl(\dfrac{(1 - \rho)\epsilon'}{n}\Bigr)^{1/q}\Bigr\}\Bigr) = \Omega\Bigl(\dfrac{(1 - \rho)^{1/q + 1}\, \epsilon^{1/q}}{\tau\, n^{1/q}\, \sigma_{\max}(A)\sqrt{s}}\Bigr).$ ∎

Combining Proposition 3.1 and Corollary 3.2, we get the following theorem.

Theorem 3.1 (formal version of Theorem 1.4). Let $0 < \rho < 1$, $\tau > 0$, and $s \in \mathbb{Z}_+$. If the matrix $A \in \mathbb{R}^{m \times n}$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to the system $\mathcal{P}$ and $\mathcal{P}$ admits a polynomial-time separation oracle, then we can compute a vector $\bar{z} \in \mathbb{R}^n$ such that

  $\|\bar{z} - x\|_q \le \dfrac{2(1 + \rho)}{1 - \rho}\, \sigma_s(x)_q + \epsilon.$

The running time is $(n/\delta + n_{\mathcal{P}})^{O(2/q + d_{\mathcal{P}})}$, where $n_{\mathcal{P}}$ is the number of variables in $\mathcal{P}$, $d_{\mathcal{P}}$ is the maximum degree that appears in the constraints of $\mathcal{P}$, and $\delta = \Omega\bigl((1 - \rho)^{1/q + 1}\, \epsilon^{1/q} / \bigl(\tau\, n^{1/q}\, \sigma_{\max}(A)\sqrt{s}\bigr)\bigr)$.

4 Imposing the Pseudo Robust Null Space Property

A normalized Rademacher matrix $A \in \mathbb{R}^{m \times n}$ is a random matrix whose entries are i.i.d. samples from $\{\pm 1/\sqrt{m}\}$. In this section, we show that we can impose the PRNSP on a normalized Rademacher matrix by introducing additional variables and constraints to pseudodistributions. We first introduce coherence and review properties of a normalized Rademacher matrix.

Definition 4.1 (coherence). Let $A \in \mathbb{R}^{m \times n}$ be a matrix whose columns $a_1, \ldots, a_n$ are $\ell_2$-normalized. The $\ell_q$-coherence function of $A$ is defined as

  $\mu_q^q(s) := \max_{i \in [n]}\, \max\Bigl\{\sum_{j \in S} |\langle a_i, a_j \rangle|^q : S \subseteq [n],\ |S| = s,\ i \notin S\Bigr\}$

for $s = 1, \ldots, n - 1$.

Lemma 4.1 ([1]). Let $n \in \mathbb{Z}_+$ and $\epsilon > 0$. If $m = \Omega(\log n / \epsilon^2)$, then a normalized Rademacher matrix $A \in \mathbb{R}^{m \times n}$ with columns $a_1, \ldots, a_n$ satisfies $|\langle a_i, a_j \rangle| \le \epsilon$ for all $i \ne j$ with high probability.

Corollary 4.1. Let $s, n \in \mathbb{Z}_+$ and $\epsilon > 0$. If $m = \Omega(s^{2/q} \log n / \epsilon^{2/q})$, then a normalized Rademacher matrix $A \in \mathbb{R}^{m \times n}$ with columns $a_1, \ldots, a_n$ satisfies $\mu_q^q(s) \le \epsilon$ with high probability.

Lemma 4.2 ([38]). A normalized Rademacher matrix $A \in \mathbb{R}^{m \times n}$ with $m \le n$ satisfies $\sigma_{\max}(A) = O\bigl(\sqrt{(n \log m)/m}\bigr)$.
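As an empirical sanity check of Lemma 4.1 (ours, purely illustrative), the maximum off-diagonal column inner product of a normalized Rademacher matrix indeed sits at the $\sqrt{\log n / m}$ scale:

```python
# Empirical check (ours) of Lemma 4.1: the coherence of a normalized
# Rademacher matrix concentrates at the sqrt(log n / m) scale.
import numpy as np

rng = np.random.default_rng(2)
m, n = 400, 100
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)
G = np.abs(A.T @ A - np.eye(n))    # |<a_i, a_j>| for i != j (diagonal zeroed)
print("max_{i!=j} |<a_i,a_j>| =", G.max())
print("sqrt(log n / m)        =", np.sqrt(np.log(n) / m))
```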
In what follows, we fix $q \in (0, 1]$, $s \in \mathbb{Z}_+$, $\epsilon > 0$, and a matrix $A \in \mathbb{R}^{m \times n}$ that satisfies the properties of Corollary 4.1 and Lemma 4.2. Let $a_1, \ldots, a_n$ be the columns of $A$.

For $b \in L^n$, we define $v^b = z - b$, where $z$ is the variable of the POP. To impose the PRNSP on $A$, for each $i \in [n]$ and $b \in L^n$, we add a variable $t_{\langle Av^b, a_i \rangle}$ and the constraints $t_{\langle Av^b, a_i \rangle}^{2/q} = \langle Av^b, a_i \rangle^2$ and $t_{\langle Av^b, a_i \rangle} \ge 0$. Intuitively, $t_{\langle Av^b, a_i \rangle}$ represents $|\langle Av^b, a_i \rangle|^q$, and as in the previous section, we will denote $t_{\langle Av^b, a_i \rangle}$ by $|\langle Av^b, a_i \rangle|^q$ in POPs for notational simplicity. Furthermore, we add the following constraints for each $i \in [n]$ and $b \in L^n$:

  (4.6)  $|v^b(i)|^q \le |\langle Av^b, a_i \rangle|^q + \sum_{j \ne i} |v^b(j)|^q\, |\langle a_j, a_i \rangle|^q.$

These constraints are valid since $v^b(i)\, a_i = Av^b - \sum_{j \ne i} v^b(j)\, a_j$ and $\langle a_i, a_i \rangle = 1$.

At first sight, the resulting moment relaxation is intractable, since there are exponentially many new variables. However, this is not the case. To see this, note that $\langle Av^b, a_i \rangle = \sum_{j=1}^n z(j) \langle a_j, a_i \rangle - \sum_{j=1}^n b(j) \langle a_j, a_i \rangle$. Also, the number of possible values of $\sum_{j=1}^n b(j) \langle a_j, a_i \rangle$ is $O(nm/\delta)$, because $A$ is $\pm 1/\sqrt{m}$-valued and each $b(j) \in [-1, 1]$ is a multiple of $\delta$. Hence, by identifying the variables $t_{\langle Av^b, a_i \rangle}$ and $t_{\langle Av^{b'}, a_i \rangle}$ whenever $\sum_{j=1}^n b(j) \langle a_j, a_i \rangle = \sum_{j=1}^n b'(j) \langle a_j, a_i \rangle$, we only have to add $O(nm/\delta)$ variables.

Another issue is that we have added exponentially many constraints. The following lemma says that we can design a separation oracle for (4.6), and hence we can use the ellipsoid method to solve SoS relaxations involving these constraints.

Lemma 4.3. There is a polynomial-time algorithm that checks whether (4.6) is satisfied by a given pseudodistribution $D$ and, if not, finds $i \in [n]$ and $b \in L^n$ such that

  $\tilde{\mathbb{E}} |v^b(i)|^q > \tilde{\mathbb{E}} |\langle Av^b, a_i \rangle|^q + \sum_{j \ne i} \tilde{\mathbb{E}} |v^b(j)|^q\, |\langle a_j, a_i \rangle|^q.$

Proof. Since there are only polynomially many choices for the values of $i$, $b(i)$, and $\sum_{j=1}^n b(j) \langle a_j, a_i \rangle$, we can exhaustively check all of them. In what follows, we fix $i \in [n]$, $b_i := b(i)$, and $c := \sum_{j=1}^n b(j) \langle a_j, a_i \rangle$. Note that, by fixing $c$, we have also fixed the value of $\tilde{\mathbb{E}} |\langle Av^b, a_i \rangle|^q$ to $c' := \tilde{\mathbb{E}} |\langle Az, a_i \rangle - c|^q$. Now we want to check whether there exists $b \in L^n$ with $b(i) = b_i$ and $\sum_{j=1}^n b(j) \langle a_j, a_i \rangle = c$ such that $\tilde{\mathbb{E}} |z(i) - b_i|^q > c' + \sum_{j \ne i} \tilde{\mathbb{E}} |z(j) - b(j)|^q\, |\langle a_j, a_i \rangle|^q$. We can solve this problem by dynamic programming, using a table $T[\cdot, \cdot]$ in which $T[k, a]$ stores the minimum value of $\sum_{j \in [k],\, j \ne i} \tilde{\mathbb{E}} |z(j) - b(j)|^q\, |\langle a_j, a_i \rangle|^q$ conditioned on $\sum_{j \in [k],\, j \ne i} b(j) \langle a_j, a_i \rangle = a$. Since the size of $T$ is $O(n \cdot nm/\delta)$, the dynamic program runs in polynomial time. ∎
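The following schematic sketch (ours) shows the shape of this dynamic program. Here `inner[j]` stands for $\langle a_j, a_i \rangle$ and `pe_cost(j, b)` for $\tilde{\mathbb{E}} |z(j) - b|^q\, |\langle a_j, a_i \rangle|^q$; both are hypothetical inputs, and the discretization of the partial sums is what keeps the state space of size $O(nm/\delta)$.

```python
# DP sketch (ours) behind the separation oracle of Lemma 4.3: for a fixed
# column i, compute, for every achievable partial sum a of b(j)<a_j, a_i>,
# the minimum accumulated cost over grid vectors b, scanning j = 1..n.
def separation_dp(n, i, grid, inner, pe_cost):
    table = {0.0: 0.0}               # partial sum -> min accumulated cost
    for j in range(n):
        if j == i:
            continue
        nxt = {}
        for a, cost in table.items():
            for b in grid:
                key = round(a + b * inner[j], 10)   # discretized partial sum
                c = cost + pe_cost(j, b)
                if c < nxt.get(key, float("inf")):
                    nxt[key] = c
        table = nxt
    return table   # table[a] = min cost s.t. sum_{j != i} b(j)<a_j,a_i> = a
```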
Let $\mathcal{P}'$ be the system of all the added constraints on $t_{z(i)-b}$ (given in Section 3) and the constraints on $t_{\langle Av^b, a_i \rangle}$. Then, $A$ satisfies the PRNSP by imposing the system $\mathcal{P}'$.

Theorem 4.1 (formal version of Theorem 1.5). The matrix $A$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to $\mathcal{P}'$ with $\rho = \mu_q^q(s)/(1 - \mu_q^q(s-1)) = O(\epsilon)$ and $\tau = s/(1 - \mu_q^q(s-1)) = O(s)$.

Proof. The proof basically follows the standard proof in compressed sensing (see [23]). Fix $S \subseteq [n]$ of size $s$ and $b \in L^n$. Then, for any pseudodistribution $D$ satisfying the above constraints and any $i \in S$, we have

  $\tilde{\mathbb{E}} |v^b(i)|^q \le \tilde{\mathbb{E}} |\langle Av^b, a_i \rangle|^q + \tilde{\mathbb{E}} \sum_{l \in \bar{S}} |v^b(l)|^q\, |\langle a_l, a_i \rangle|^q + \tilde{\mathbb{E}} \sum_{j \in S : j \ne i} |v^b(j)|^q\, |\langle a_j, a_i \rangle|^q.$

Using the pseudo Cauchy–Schwarz inequality $k = -\log_2 q$ times, we get

  $\tilde{\mathbb{E}} |\langle Av^b, a_i \rangle|^q \le \bigl(\tilde{\mathbb{E}} \langle Av^b, a_i \rangle^2\bigr)^{q/2}.$

By Lemma 2.4, $\tilde{\mathbb{E}} \langle Av^b, a_i \rangle^2 \le \tilde{\mathbb{E}} \|Av^b\|_2^2\, \|a_i\|_2^2 = \tilde{\mathbb{E}} \|Av^b\|_2^2$. Thus we have

  $\tilde{\mathbb{E}} |v^b(i)|^q \le \bigl(\tilde{\mathbb{E}} \|Av^b\|_2^2\bigr)^{q/2} + \tilde{\mathbb{E}} \sum_{l \in \bar{S}} |v^b(l)|^q\, |\langle a_l, a_i \rangle|^q + \tilde{\mathbb{E}} \sum_{j \in S : j \ne i} |v^b(j)|^q\, |\langle a_j, a_i \rangle|^q.$

Summing up these inequalities over $i \in S$ yields

  $\tilde{\mathbb{E}} \|v^b_S\|_q^q \le s \bigl(\tilde{\mathbb{E}} \|Av^b\|_2^2\bigr)^{q/2} + \sum_{l \in \bar{S}} \tilde{\mathbb{E}} |v^b(l)|^q \sum_{i \in S} |\langle a_l, a_i \rangle|^q + \sum_{j \in S} \tilde{\mathbb{E}} |v^b(j)|^q \sum_{i \in S : i \ne j} |\langle a_i, a_j \rangle|^q \le s \bigl(\tilde{\mathbb{E}} \|Av^b\|_2^2\bigr)^{q/2} + \mu_q^q(s)\, \tilde{\mathbb{E}} \|v^b_{\bar{S}}\|_q^q + \mu_q^q(s-1)\, \tilde{\mathbb{E}} \|v^b_S\|_q^q.$

Rearranging this inequality finishes the proof. ∎

Combining Lemma 4.2 and Theorem 4.1, we immediately have the following.

Corollary 4.2. The matrix $A$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to $\mathcal{P}'$ with $\rho = O(\epsilon)$ and $\tau = O(s)$. Moreover, $\sigma_{\max}(A) = O\bigl(\sqrt{(n \log m)/m}\bigr)$.

Now we establish our main theorem.

Theorem 4.2 (formal version of Theorem 1.2). Let $q \in (0, 1]$ be a constant of the form $q = 2^{-k}$ for some $k \in \mathbb{Z}_+$, let $s, n \in \mathbb{Z}_+$, and let $\epsilon > 0$. Then a normalized Rademacher matrix $A \in \mathbb{R}^{m \times n}$ with $m = O(s^{2/q} \log n)$ satisfies the following property with high probability: for any vector $y = Ax$, where $x \in [-1, 1]^n$ is an unknown vector, we can recover $\bar{z}$ with $\|\bar{z} - x\|_q \le O(\sigma_s(x)_q) + \epsilon$. The running time is $(nm/\delta)^{O(2/q)}$, where

  $\delta = O\Bigl(\dfrac{\epsilon^{1/q}}{s\, n^{1/q}} \sqrt{\dfrac{m}{s\, n \log m}}\Bigr).$

Proof. Solve the SoS relaxation $(\tilde{P}_{q,\eta})$ with the system $\mathcal{P}'$. Then, with high probability, the matrix $A$ satisfies the $\ell_q$-$(\rho, \tau, s)$-PRNSP with respect to $\mathcal{P}'$ for some constant $\rho < 1$ and $\tau = O(s)$, and $\sigma_{\max}(A) = O\bigl(\sqrt{(n \log m)/m}\bigr)$, by Corollary 4.2 (instantiated with a sufficiently small constant in place of $\epsilon$). Then, we can recover a vector $\bar{z} \in \mathbb{R}^n$ with the desired property by Theorem 3.1. Note that the number $n_{\mathcal{P}'}$ of added variables in the system $\mathcal{P}'$ is $O(nm/\delta)$ and the maximum degree $d_{\mathcal{P}'}$ that appears in a constraint of $\mathcal{P}'$ is $2/q$. Hence, the running time becomes as stated. ∎

Acknowledgements

The authors thank Satoru Iwata and Yuji Nakatsukasa for reading an earlier version of this paper.

References

[1] D. Achlioptas. Database-friendly random projections: Johnson–Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003.
[2] M. F. Anjos and J. B. Lasserre. Handbook on Semidefinite, Conic and Polynomial Optimization. Springer Science & Business Media, 2011.
[3] B. Barak, F. G. S. L. Brandao, A. W. Harrow, J. Kelner, D. Steurer, and Y. Zhou. Hypercontractivity, sum-of-squares proofs, and their applications. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC), pages 307–326, 2012.
[4] B. Barak, J. A. Kelner, and D. Steurer. Rounding sum-of-squares relaxations. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), pages 31–40, 2014.
[5] B. Barak, J. A. Kelner, and D. Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 143–151, 2015.
[6] B. Barak and A. Moitra. Tensor prediction, Rademacher complexity and random 3-XOR. CoRR, abs/1501.06521, 2015.
[7] B. Barak and D. Steurer. Sum-of-squares proofs and the quest toward optimal algorithms. In Proceedings of the International Congress of Mathematicians (ICM), pages 509–533, 2014.
[8] R. Berinde and P. Indyk. Sequential sparse matching pursuit. In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, pages 36–43, 2009.
[9] R. Berinde, P. Indyk, and M. Ruzic. Practical near-optimal sparse recovery in the L1 norm. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing, pages 198–205, 2008.
[10] A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34–81, 2009.
[11] E. J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9-10):589–592, 2008.
[12] E. J. Candès, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
[13] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
[14] R. Chartrand. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10):707–710, 2007.
[15] R. Chartrand. Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data. In Proceedings of the IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI), pages 262–265, 2009.
[16] R. Chartrand and V. Staneva. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, 24(3), 2008.
[17] R. Chartrand and W. Yin. Iteratively reweighted algorithms for compressive sensing. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3869–3872, 2008.
[18] A. Cherian, S. Sra, and N. Papanikolopoulos. Denoising sparse noise via online dictionary learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2060–2063, 2011.
[19] E. Chlamtac and M. Tulsiani. Convex relaxations and integrality gaps. In Handbook on Semidefinite, Conic and Polynomial Optimization. 2012.
[20] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1):211–231, 2009.
[21] B. G. R. de Prony. Essai expérimental et analytique: sur les lois de la dilatabilité de fluides élastiques et sur celles de la force expansive de la vapeur de l'alcool, à différentes températures. Journal de l'École Polytechnique, pages 24–76, 1795.
[22] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
[23] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Springer, 2013.
[24] D. Ge, X. Jiang, and Y. Ye. A note on the complexity of Lp minimization. Mathematical Programming, 129(2):285–299, 2011.
[25] R. Ge and T. Ma. Decomposing overcomplete 3rd order tensors using sum-of-squares algorithms. CoRR, abs/1504.05287, 2015.
[26] A. Gilbert and P. Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937–947, 2010.
[27] D. Grigoriev and N. Vorobjov. Complexity of Null- and Positivstellensatz proofs. Annals of Pure and Applied Logic, 113(1):153–160, 2001.
[28] P. Indyk and M. Ružić. Near-optimal sparse recovery in the L1 norm. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 199–207, 2008.
[29] J. N. Laska, M. A. Davenport, and R. G. Baraniuk. Exact signal recovery from sparsely corrupted measurements through the pursuit of justice. In Conference Record of the 43rd Asilomar Conference on Signals, Systems and Computers (ACSSC), pages 1556–1560, 2009.
[30] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3):796–817, 2001.
[31] Y. Nesterov. Squared functional systems and optimization problems. In High Performance Optimization, pages 405–440. Springer, 2000.
[32] R. O'Donnell and Y. Zhou. Approximability and proof complexity. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1537–1556, 2013.
[33] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.
[34] M. Putinar. Positive polynomials on compact semi-algebraic sets. Indiana University Mathematics Journal, 42(3):969–984, 1993.
[35] T. Rothvoß. The Lasserre hierarchy in approximation algorithms. Lecture Notes for the MAPSP, pages 1–25, 2013.
[36] R. Saab, R. Chartrand, and O. Yilmaz. Stable sparse approximations via nonconvex optimization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3885–3888, 2008.
[37] K. Schmüdgen. The K-moment problem for compact semi-algebraic sets. Mathematische Annalen, 289(1):203–206, 1991.
[38] Y. Seginer. The expected norm of random matrices. Combinatorics, Probability and Computing, 9(2):149–166, 2000.
[39] N. Z. Shor. An approach to obtaining global extremums in polynomial mathematical programming problems. Cybernetics, 23(5):695–700, 1987.
[40] G. Tang and P. Shah. Guaranteed tensor decomposition: A moment approach. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 1491–1500, 2015.