Non-convex Compressed Sensing with the Sum-of-Squares Method
Tasuku Soma∗
Abstract

We consider stable signal recovery in the ℓ_q quasi-norm for 0 < q ≤ 1. In this problem, given a measurement vector y = Ax for some unknown signal vector x ∈ R^n and a known matrix A ∈ R^{m×n}, we want to recover z ∈ R^n with ‖x − z‖_q = O(‖x − x*‖_q) from the measurement vector, where x* is the s-sparse vector closest to x in the ℓ_q quasi-norm.

Although a small value of q is favorable for measuring the distance to sparse vectors, previous methods for q < 1 involve ℓ_q quasi-norm minimization, which is computationally intractable. In this paper, we overcome this issue by using the sum-of-squares method, and give the first polynomial-time stable recovery scheme for a large class of matrices A in the ℓ_q quasi-norm for any fixed constant 0 < q ≤ 1.
1 Introduction

Compressed sensing, the subject of recovering a sparse signal vector x ∈ R^n from its measurement vector y = Ax for some known matrix A ∈ R^{m×n} with m ≪ n, has been intensively studied in the last decade [10, 22, 23]. It is well known that, if A is a random Gaussian or Rademacher matrix with m = Θ(s log(n/s)), then with high probability¹ we can exactly recover any s-sparse signal vector, that is, a vector with at most s non-zero elements [13]. This means that we can significantly compress signal vectors if they are sparse (in an appropriate choice of basis). In a realistic scenario, however, we can assume only that the signal vector x is not exactly sparse but close to a sparse vector. In such cases, we want to recover x with an error controlled by its distance to s-sparse vectors, and this feature is called stability [23] or instance optimality [20]
of a recovery scheme.

Yuichi Yoshida†

∗ Graduate School of Information Science and Technology, The University of Tokyo (tasuku [email protected]). Supported by JSPS Grant-in-Aid for JSPS Fellows.
† National Institute of Informatics and Preferred Infrastructure, Inc. ([email protected]). Supported by MEXT Grant-in-Aid for Scientific Research on Innovative Areas (24106001), JST, CREST, Foundations of Innovative Algorithms for Big Data, and JST, ERATO, Kawarabayashi Large Graph Project.
1 Here and henceforth, “with high probability” means with a constant probability sufficiently close to one.

For q > 0, an integer s ∈ Z_+, and a vector x ∈ R^n, we define

    σ_s(x)_q = min{ ‖x − x′‖_q | x′ ∈ R^n, ‖x′‖_0 ≤ s }.

That is, σ_s(x)_q is the distance from x to the closest s-sparse vector in the ℓ_q (quasi-)norm. The objective of stable
signal recovery in the ℓ_q quasi-norm is, given a measurement vector y = Ax generated from an unknown signal vector x ∈ R^n, to recover a vector z ∈ R^n such that ‖z − x‖_q = O(σ_s(x)_q). It is desirable to take a small q, for example, in the sparse noise model; that is, x = x0 + e, where x0 is the best s-sparse approximation to x and e is an s′-sparse noise vector with s′ = O(1). To see this, let us compare the cases q = 1/2 and q = 1 as an illustrative example. Note that ‖z − x‖_1 ≤ ‖z − x‖_{1/2} ≤ n‖z − x‖_1 and σ_s(x)_{1/2} = ‖e‖_{1/2} ≤ s′‖e‖_1 = s′σ_s(x)_1. Hence, ‖z − x‖_{1/2} = O(σ_s(x)_{1/2}) implies ‖z − x‖_1 = O(σ_s(x)_1), whereas ‖z − x‖_1 = O(σ_s(x)_1) only implies the weaker bound ‖z − x‖_{1/2} = O(nσ_s(x)_{1/2}). It is reported that the noise vector e is often sparse in practice [18, 29]. Intuitively speaking, a smaller q works better in the sparse noise model because ‖z − x‖_q^q → ‖z − x‖_0 and σ_s(x)_q^q → s′ as q → 0, and hence z converges to an (s + O(s′))-sparse vector.
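As a quick numerical illustration (ours, not from the paper) of the quantities just discussed, the following snippet computes σ_s(x)_q for q = 1 and q = 1/2 on a vector consisting of an s-sparse signal plus an s′-sparse noise; the closest s-sparse vector keeps the s largest-magnitude entries for any q > 0.

```python
def lq_norm(v, q):
    # ℓ_q quasi-norm for 0 < q ≤ 1: (Σ |v_i|^q)^(1/q)
    return sum(abs(t) ** q for t in v) ** (1.0 / q)

def sigma_s(x, s, q):
    # Distance of x to the closest s-sparse vector: drop the s
    # largest-magnitude entries and measure the rest in ℓ_q.
    rest = sorted((abs(t) for t in x), reverse=True)[s:]
    return sum(t ** q for t in rest) ** (1.0 / q)

# 3-sparse signal plus an s'-sparse noise with s' = 2
x = [5.0, 1.0, 0.5, 3.0, 0.0, 2.0, 0.0, 0.0]
s, s_noise = 3, 2
print(sigma_s(x, s, 1.0))            # σ_s(x)_1   = 1.5
print(sigma_s(x, s, 0.5))            # σ_s(x)_1/2 ≤ s'·σ_s(x)_1 = 3
```

Consistent with the bound σ_s(x)_{1/2} ≤ s′σ_s(x)_1 in the text, the second value is at most twice the first here.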
A common approach to stable signal recovery is to relax the ℓ_0 quasi-norm by an ℓ_q quasi-norm for 0 < q ≤ 1:

    (P_q)    minimize ‖z‖_q  subject to Az = y and z ∈ R^n.

For q = 1, this approach is called basis pursuit, a well-studied and central algorithm in compressed sensing. If A satisfies a technical condition called the stable null space property, then the solution z to (P_q) satisfies ‖z − x‖_q = O(σ_s(x)_q). It is folklore that the following result can be obtained by extending the results in [11, 13], which handle the case q = 1.
Theorem 1.1. Let q ∈ (0, 1] and s, n ∈ Z_+. Then, a random Gaussian or Rademacher matrix A ∈ R^{m×n} with m = Θ(s log(n/s)) satisfies the following property with high probability: for any vectors x ∈ R^n and y = Ax, the solution z to (P_q) satisfies ‖z − x‖_q = O(σ_s(x)_q).

When q = 1, i.e., in basis pursuit, the program (P_q) can be represented as a linear program, and we can find its solution in polynomial time. When q < 1, however, solving (P_q) is NP-hard in general, and Theorem 1.1 is currently only of theoretical interest.

The main contribution of this paper is showing that, even when q < 1, there is a recovery scheme whose theoretical guarantee almost matches that of Theorem 1.1:
Theorem 1.2. (Main theorem, informal version) Let q ∈ (0, 1] be a constant of the form q = 2^{−k} for some k ∈ Z_+, s, n ∈ Z_+, and ε > 0. Then, a random (normalized) Rademacher matrix A ∈ R^{m×n} with m = O(s^{2/q} log n) satisfies the following property with high probability: for any vector y = Ax, where x ∈ [−1, 1]^n is an unknown vector, we can recover z with ‖z − x‖_q ≤ O(σ_s(x)_q) + ε in polynomial time.

To our knowledge, our algorithm is the first polynomial-time stable recovery scheme in the ℓ_q quasi-norm for q < 1. Our algorithm is based on the sum-of-squares method, which we describe next. The assumption on x can be easily relaxed to x ∈ [−R, R]^n for a polynomial R in n, keeping the running time polynomial in n.
1.1 Sum of Squares Method  As we cannot exactly solve the non-convex program (P_q) when q < 1, we use the sum-of-squares (SoS) semidefinite programming hierarchy [30, 31, 33, 39], or simply the SoS method. Here, we adopt the description of the SoS method introduced in [3, 4]. The SoS method attempts to solve a polynomial optimization problem (POP) via a convex relaxation called the moment relaxation. We specify an even integer d, called the degree or the level, as a parameter. The SoS method will find a linear operator L from the set of polynomials of degree at most d to R, which shares some properties of expectation operators and satisfies the system of polynomials. Such linear operators are called pseudoexpectations.

Definition 1.1. (Pseudoexpectation) Let R[z] denote the polynomial ring over the reals in variables z = (z(1), ..., z(n)), and let R[z]_d denote the set of polynomials in R[z] of degree at most d. A degree-d pseudoexpectation operator for R[z] is a linear operator Ẽ : R[z]_d → R satisfying Ẽ(1) = 1 and Ẽ(P^2) ≥ 0 for every polynomial P of degree at most d/2.

For notational simplicity, it is convenient to regard a pseudoexpectation operator as the “expectation” operator of a so-called pseudodistribution² D. Note that it is possible that no actual distribution realizes a pseudoexpectation operator as its expectation operator (see, e.g., [19, 35]). Also note that we cannot sample from a pseudodistribution; only its low-degree moments are available. We use the subscript in Ẽ_D to emphasize the underlying pseudodistribution. Also, we write Ẽ_{z∼D} P(z) to emphasize the variable z. We say that a degree-d pseudodistribution D satisfies the constraint {P = 0} if Ẽ PQ = 0 for all Q of degree at most d − deg P. We say that D satisfies {P ≥ 0} if it satisfies the constraint {P − S = 0} for some SoS polynomial S ∈ R_d[z].

A pseudoexpectation can be compactly represented by its moment matrix, an n^{O(d/2)} × n^{O(d/2)} symmetric matrix M containing all the values for monomials of degree at most d. The above condition Ẽ P^2 ≥ 0 is equivalent to requiring that M is positive semidefinite, and the other conditions can be represented by linear constraints among the entries of M. The following theorem states that we can efficiently optimize over pseudodistributions, basically via semidefinite programming.

Theorem 1.3. (The SoS method [30, 31, 33, 39]) For every ε > 0, d, n, m, M ∈ N and n-variate polynomials P_1, ..., P_m in R_d[z] whose coefficients are in {0, ..., M}, if there exists a degree-d pseudodistribution D for z satisfying the constraint P_i = 0 for every i ∈ [m], then we can find in (n · polylog(M/ε))^{O(d)} time a degree-d pseudodistribution D′ for z satisfying P_i ≤ ε and P_i ≥ −ε for every i ∈ [m].

By using binary search, we can also handle objective functions. We ignore numerical issues since they never affect our arguments (at least from a theoretical perspective). We simply assume that the SoS method finds an optimal degree-d pseudodistribution in poly(n^{O(d)}) time.

The dual of the SoS method corresponds to the SoS proof system [27], an automatizable and restricted proof system for refuting a system of polynomials. The SoS proof system is built on several deep results in real algebraic geometry, such as the Positivstellensatz and its variants [34, 37]. Roughly speaking, any fact that is provable in the (bounded-degree) SoS proof system is also reflected in pseudodistributions. This approach lies at the heart of recent surprising results [3, 32]; they imported known proofs for integrality gap instances into the SoS proof system and showed that the (low-degree) SoS hierarchy solves the integrality gap instances.
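The moment-matrix view is easy to make concrete. The sketch below (our illustration) builds the degree-2 moment matrix of an actual distribution over z ∈ R^2, i.e., M = E[v v^T] with v = (1, z(1), z(2)); it satisfies exactly the conditions a pseudoexpectation must: E[1] = 1 and positive semidefiniteness.

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((10000, 2))     # samples of an actual distribution

# Degree-2 moment matrix over the monomials (1, z1, z2):
# M[a, b] = E[m_a * m_b], i.e., M = E[v v^T] with v = (1, z1, z2).
V = np.hstack([np.ones((Z.shape[0], 1)), Z])
M = V.T @ V / Z.shape[0]

eigs = np.linalg.eigvalsh(M)
print(M[0, 0], eigs.min() >= -1e-9)     # E[1] = 1, and M is PSD
```

A pseudoexpectation is any matrix with these properties (plus the linear constraints), whether or not it arises from samples — which is why optimizing over moment matrices is a semidefinite program.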
1.2 Proof Outline  Now we give a proof outline of Theorem 1.2. First, we recall basic facts on stable signal recovery using the non-convex program (P_q). For a set S ⊆ [n], we denote by S̄ the complement set [n] \ S. For a vector v ∈ R^n and a set S ⊆ [n], v_S denotes the vector restricted to S. We abuse this notation and regard v_S both as the vector in R^S with v_S(i) = v(i) for i ∈ S and as the vector in R^n with v_S(i) = v(i) for i ∈ S and v_S(i) = 0 for i ∈ S̄. The following notion is important in stable signal recovery.

2 It is also called a locally consistent distribution [19, 35].
Definition 1.2. (Robust null space property) Let q ∈ (0, 1], ρ, τ > 0, and s ∈ Z_+. A matrix A ∈ R^{m×n} is said to satisfy the (ρ, τ, s)-robust null space property (RNSP) in the ℓ_q quasi-norm if

    (1.1)    ‖v_S‖_q^q ≤ ρ‖v_S̄‖_q^q + τ‖Av‖_2^q

holds for any v ∈ R^n and S ⊆ [n] with |S| ≤ s.

If A satisfies the (ρ, τ, s)-robust null space property for ρ < 1, then the solution to the program (P_q) satisfies ‖z − x‖_q = O(σ_s(x)_q), and we establish Theorem 1.1. Indeed, a random Gaussian or Rademacher matrix will do as long as m = Ω(s log(n/s)) [13].

The main idea of our algorithm is that we can convert the proof of Theorem 1.1 into a sum-of-squares (SoS) proof, a proof in which all the polynomials appearing have bounded degree and all the inequalities involved are simply implied by the SoS partial order. Due to the duality between SoS proofs and the SoS semidefinite programming hierarchy (see, e.g., [32]), any consequence of an SoS proof derived from a given set of polynomial constraints is satisfied by the pseudodistribution obtained by the SoS method on these constraints.

Let us consider the following SoS version of (P_q):

    (P̃_q)    minimize Ẽ_{z∼D} ‖z‖_q^q  subject to D satisfies Az = y,

where D is over pseudodistributions for the variable z. In our algorithm, we solve (P̃_q) with additional technical constraints using the SoS method, and then recover a vector from the obtained pseudoexpectation operator. A subtle issue here is that we need to deal with terms of the form |z(i)|^q for fractional q, which is not polynomial in z. A remedy for this is adding auxiliary variables and constraints among them (details are explained later), and for now we assume that we can exactly solve the program (P̃_q).

As we can only execute SoS proofs by using the SoS method, we need an SoS version of the robust null space property:

Definition 1.3. (informal version) Let q ∈ (0, 1], ρ, τ > 0, and s ∈ Z_+. A matrix A ∈ R^{m×n} is said to satisfy the (ρ, τ, s)-pseudo robust null space property in the ℓ_q quasi-norm (ℓ_q-(ρ, τ, s)-PRNSP for short) if

    (1.2)    Ẽ_{v∼D} ‖v_S‖_q^q ≤ ρ Ẽ_{v∼D} ‖v_S̄‖_q^q + τ (Ẽ_{v∼D} ‖Av‖_2^2)^{q/2}

holds for any pseudodistribution D of degree at least 2/q and S ⊆ [n] with |S| ≤ s.

In the formal definition of the PRNSP, we allow imposing some other technical constraints. Note that the PRNSP is a stronger condition than the robust null space property because the latter considers only actual distributions consisting of single vectors.

If the measurement matrix A ∈ R^{m×n} satisfies the PRNSP, then the resulting pseudodistribution D satisfies an SoS version of stable signal recovery. Moreover, we can round D to a vector close to the signal vector, as shown in the following theorem.

Theorem 1.4. (informal version) Let q ∈ (0, 1] be a constant of the form q = 2^{−k} for some k ∈ Z_+, 0 < ρ < 1, τ > 0, s ∈ Z_+, and ε > 0. If A satisfies the ℓ_q-(ρ, τ, s)-PRNSP, then, from the pseudodistribution D obtained by solving (P̃_q) with additional constraints depending on ε, we can compute a vector z̄ ∈ R^n such that

    ‖z̄ − x‖_q ≤ ((2 + 2ρ)/(1 − ρ)) σ_s(x)_q + ε

in polynomial time.

We then show that there exists a family of matrices satisfying the PRNSP, namely the family of Rademacher matrices.

Theorem 1.5. (informal version) Let q ∈ (0, 1] and s ∈ Z_+. For some 0 < ρ < 1 and τ = O(s), a (normalized) Rademacher matrix A ∈ R^{m×n} with m = Ω(s^{2/q} log n) satisfies the ℓ_q-(ρ, τ, s)-PRNSP with high probability.

Theorem 1.2 is a consequence of Theorems 1.4 and 1.5.

1.3 Technical Contributions  In order to prove Theorem 1.4, we would like to follow the proof of Theorem 1.1 in the SoS proof system. However, many steps of the proof require non-trivial modifications for the SoS proof system to go through. Let us explain in more detail below.

Handling the ℓ_q quasi-norm  The first obvious obstacle is that the program (P̃_q) contains non-polynomial terms such as |z(i)|^q in the objective function. This is easily resolved by lifting; i.e., we introduce a variable t_{z(i)}, which is supposed to represent |z(i)|^q, and impose the constraints t_{z(i)}^{2/q} = z(i)^2 and t_{z(i)} ≥ 0 (recall that q is of the form 2^{−k} for some k ∈ Z_+).

Imposing the triangle inequality for the ℓ_q^q metric  The second problem is that the added variable t_{z(i)} does not always behave as expected. When proving Theorem 1.1, we frequently use the triangle inequality ‖u + v‖_q^q ≤ ‖u‖_q^q + ‖v‖_q^q for vectors u and v, or in other words, Σ_i |u(i) + v(i)|^q ≤ Σ_i |u(i)|^q + Σ_i |v(i)|^q. Unfortunately, we cannot derive its SoS version Ẽ Σ_i t_{u(i)+v(i)} ≤ Ẽ Σ_i t_{u(i)} + Ẽ Σ_i t_{v(i)} in the SoS proof system. This is because, even if we impose the constraints t_{u(i)}^{2/q} = u(i)^2, t_{v(i)}^{2/q} = v(i)^2, and t_{u(i)+v(i)}^{2/q} = (u(i) + v(i))^2, these constraints do not involve any constraint among t_{u(i)}, t_{v(i)}, and t_{u(i)+v(i)}. (Instead, we can derive Ẽ (Σ_i t_{u(i)+v(i)}^{2/q})^{q/2} ≤ Ẽ (Σ_i t_{u(i)}^{2/q})^{q/2} + Ẽ (Σ_i t_{v(i)}^{2/q})^{q/2} in the SoS proof system, which is not very useful.)

To overcome this issue, we use the fact that we only need the triangle inequality among z − x, z, and x, where z is the variable in (P̃_q) and x is the signal vector. Of course, we do not know x in advance; we only know that x lies in [−1, 1]^n. We will discretize the region [−1, 1]^n using a small δ ≪ 1/n^{2/q} and explicitly add the constraints Ẽ t_{z(i)−b} ≤ Ẽ t_{z(i)} + |b|^q for each i ∈ [n] and multiple b ∈ [−1, 1] of δ. Instead of x, we use an approximate vector x^L, each coordinate of which is a multiple of δ, with ‖x − x^L‖ small. The SoS proof system augmented with these triangle inequalities can now follow the proof of Theorem 1.1.
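The lifting device above is easy to sanity-check numerically (our sketch): for q = 2^{−k}, the exponent 2/q = 2^{k+1} is an integer, so t^{2/q} = z^2 is a genuine polynomial constraint, and together with t ≥ 0 it pins down t = |z|^q.

```python
# For q = 2^{-k}, the constraints t >= 0 and t^(2/q) = z^2 force t = |z|^q,
# and 2/q = 2^(k+1) is an integer, so the constraint is polynomial.
k = 1
q = 2.0 ** (-k)            # q = 1/2, so 2/q = 4

for z in [-1.7, -0.3, 0.0, 0.25, 2.0]:
    t = abs(z) ** q        # intended value of the lifted variable t_z
    assert t >= 0
    assert abs(t ** (2 / q) - z ** 2) < 1e-9   # t^4 = z^2 holds exactly
print("lifting constraints consistent for q = 1/2")
```

Note that the constraints alone only relate t^{2/q} to z^2; as the text explains, they say nothing about how the lifted variables for u, v, and u + v interact, which is why the explicit triangle-inequality constraints are needed.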
Rounding  Rounding a pseudodistribution to a feasible solution is a challenging task in general. Fortunately, we can easily round the pseudodistribution D obtained by solving the program (P̃_q) to a vector z̄ close to the signal vector by fixing each coordinate z̄(i) to arg min_b Ẽ_D |z(i) − b|^q, where b ∈ [−1, 1] runs over multiples of δ.

Imposing the PRNSP  In order to prove Theorem 1.5, we show that a (normalized) Rademacher matrix with a small coherence satisfies the PRNSP. Here, the coherence of a matrix A ∈ R^{m×n} with columns a_1, ..., a_n is defined as µ = max_{i≠j} |⟨a_i, a_j⟩|. Again, our approach is to rewrite a proof that matrices with a small coherence satisfy the robust null space property into an SoS proof. Here is the last obstacle: it seems inevitable to add exponentially many extra variables and constraints in order to convert known proofs. However, we will show that the number of extra variables is polynomially bounded and that the extra constraints admit a separation oracle if A is a Rademacher matrix. Thanks to the ellipsoid method, we can obtain the desired SoS proof for Rademacher matrices in polynomial time.
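The rounding rule just described has a direct empirical analogue. In the sketch below (ours), i.i.d. samples stand in for the pseudodistribution — all the rounding step actually consumes is the moments Ẽ_D|z(i) − b|^q, which the SoS method provides.

```python
# Round a "distribution" over z in [-1,1]^n to the grid L (multiples of
# delta) by per-coordinate minimization of E|z(i) - b|^q.
q, delta = 0.5, 0.25
L = [round(j * delta, 10) for j in range(-4, 5)]   # multiples of delta in [-1,1]

# Toy stand-in for D: a few samples of z (n = 2 coordinates).
samples = [[0.52, -0.24], [0.48, -0.26], [0.50, -0.25], [0.49, -0.27]]

n = len(samples[0])
z_bar = []
for i in range(n):
    # b minimizing the empirical moment E|z(i) - b|^q over the grid L
    best_b = min(L, key=lambda b: sum(abs(z[i] - b) ** q for z in samples))
    z_bar.append(best_b)
print(z_bar)  # each coordinate snaps to the best grid value: [0.5, -0.25]
```

The real algorithm replaces the empirical averages by pseudoexpectations of the lifted variables t_{z(i)−b}; the per-coordinate minimization over O(1/δ) grid values is unchanged.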
1.4 Related work  The notion of stable signal recovery in the ℓ_1 norm was introduced in [11, 12], where it is shown that basis pursuit is a stable recovery scheme for a random Gaussian matrix A ∈ R^{m×n} provided that m = Ω(s log(n/s)). The notion of stable signal recovery was extended to the ℓ_q quasi-norm in [20, 36]. Stable recovery schemes with almost linear encoding time and recovery time have been proposed by using sparse matrices [8, 9, 28]. See [23, 26] for a survey of this area.

Besides the fact that ‖·‖_q^q converges to the ℓ_0 norm as q → 0, an advantage of using the ℓ_q quasi-norm is that we can use a smaller measurement matrix for signal recovery (at least for the exact case). Chartrand and Staneva [16] showed that ℓ_q quasi-norm minimization is a stable recovery scheme for a certain matrix A ∈ R^{m×n} with m = Ω(s + qs log(n/s)).³ Hence, as q → 0, the required dimension of the measurement vector scales like O(s). As stated in the beginning of this section, ℓ_q quasi-norm minimization is effective in the sparse noise model. Other methods (e.g., Prony’s method [21]) are also applicable to the sparse noise model, but they require a priori information on the sparsity of the noise.

In order to implement ℓ_q quasi-norm minimization for q < 1, several heuristic methods have been proposed, including the gradient method [14], the iteratively reweighted least squares method (IRLS) [17], an operator-splitting algorithm [15], and a method that computes a local minimum [24].

Recently, the sum-of-squares method has been found to be useful in several unsupervised learning tasks such as the planted sparse vector problem [4], dictionary learning [5], and tensor decompositions [5, 6, 25, 40]. The surveys [19, 35] provide detailed information on the Lasserre and other hierarchies. A handbook [2] contains various articles on semidefinite programming, conic programming, and the SoS method, from both the theoretical and the practical points of view.

3 Although they do not give this result explicitly, it can be shown by a simple adaptation of their result to a proof for stability of ℓ_1 minimization.

1.5 Organization  In Section 2, we review basic facts on the SoS proof system. In Section 3, we show that stable signal recovery is achieved if the measurement matrix A satisfies the PRNSP. Then, we show that a Rademacher matrix satisfies the PRNSP with high probability in Section 4.

2 Preliminaries

For an integer k ∈ N, we denote by [k] the set {1, 2, ..., k}. For a matrix A, σ_max(A) denotes its largest singular value. We write P ⪰ 0 if P is a sum-of-squares polynomial, and similarly we write P ⪰ Q if P − Q ⪰ 0.

We now present several useful lemmas for pseudodistributions. Most of these results were proved in a series of works by Barak and his coauthors [3, 4, 5, 6]. We basically follow the terminology and notation in the recent survey by Barak and Steurer [7].

Lemma 2.1. (Pseudo Cauchy-Schwarz [3]) Let
d ∈ Z_+ be an integer, let P, Q ∈ R[x]_d be polynomials of degree d, and let D be a pseudodistribution of degree at least 2d. Then, we have

    Ẽ_{x∼D} P(x)Q(x) ≤ (Ẽ_{x∼D} P(x)^2)^{1/2} (Ẽ_{x∼D} Q(x)^2)^{1/2}.

Lemma 2.2. ([3]) Let d ∈ Z_+ be an integer, and let u and v be vectors whose entries are polynomials in R[x]_d. Then, for any pseudodistribution D of degree at least 2d, we have

    Ẽ_{x∼D} ⟨u, v⟩ ≤ (Ẽ_{x∼D} ‖u‖_2^2)^{1/2} (Ẽ_{x∼D} ‖v‖_2^2)^{1/2}.

Lemma 2.3. ([3]) Let u, v, and D be the same as above. Then, we have

    (Ẽ_{x∼D} ‖u + v‖_2^2)^{1/2} ≤ (Ẽ_{x∼D} ‖u‖_2^2)^{1/2} + (Ẽ_{x∼D} ‖v‖_2^2)^{1/2}.

Lemma 2.4. For any pseudodistribution D of degree at least 4, we have

    Ẽ_{(u,v)∼D} ⟨u, v⟩^2 ≤ Ẽ_{(u,v)∼D} ‖u‖_2^2 ‖v‖_2^2.

Proof. By the Lagrange identity, we have

    ‖u‖_2^2 ‖v‖_2^2 − ⟨u, v⟩^2 = Σ_{i<j} (u(i)v(j) − u(j)v(i))^2.

Hence, the statement has an SoS proof of degree 4.

3 Signal Recovery Based on Pseudo Robust Null Space Property

In this section, we prove Theorem 1.4. The formal statement of Theorem 1.4 is provided in Theorem 3.1. In the rest of this section, we fix q ∈ (0, 1] and an error parameter ε > 0. We also fix a matrix A ∈ R^{m×n} and the unknown signal vector x ∈ [−1, 1]^n, and hence the measurement vector y = Ax.

Let δ > 0 be a parameter chosen later, and let L be the set of multiples of δ in [−1, 1]. Clearly |L| = O(1/δ), and for any x ∈ [−1, 1] there exists x̄ ∈ L with |x − x̄| ≤ δ. Let x^L ∈ R^n be the point in L^n closest to x. Our strategy is to recover x^L instead of x. As we know that x^L ∈ L^n and each coordinate of x^L is discretized, we can add several useful constraints, such as the triangle inequality, to the moment relaxation.

For the measurement vector y = Ax, we have

    ‖Ax^L − y‖_2 = ‖A(x^L − x)‖_2 ≤ σ_max(A)‖x^L − x‖_2 ≤ σ_max(A)√s δ.

To recover a vector close to x^L in the ℓ_q quasi-norm from y, we consider the following non-convex program:

    (P_{q,η})    minimize ‖z‖_q^q  subject to ‖Az − y‖_2^2 ≤ η^2,

where η = σ_max(A)√s δ. Note that the desired vector x^L is a feasible solution of this program.

In Section 3.1, we show how to formulate the non-convex program (P_{q,η}) as a POP. Although the form of the POP depends on ε, we can solve its moment relaxation using the SoS method. Then, we show that, if A satisfies a certain property, which we call the pseudo robust null space property, then the vector z “sampled” from the pseudodistribution obtained by the SoS method is close to the vector x^L in the ℓ_q quasi-norm. Then, in Section 3.2, we show how to round the pseudodistribution to an actual vector z̄ so that ‖z̄ − x‖_q = O(σ_s(x)_q) + ε for small ε > 0.

3.1 Formulating ℓ_q quasi-norm minimization as a POP  To formulate (P_{q,η}) as a POP, we introduce additional variables. Formally, we define ‖z‖_q^q as a shorthand for Σ_{i=1}^n t_{z(i)}, where t_{z(i)} for i ∈ [n] is an additional variable corresponding to |z(i)|^q. We also impose the constraints t_{z(i)} ≥ 0 and t_{z(i)}^{2/q} = z(i)^2 for each i ∈ [n].

Unfortunately, these constraints are not sufficient to derive formulas connecting Ẽ t_{z(i)} and z(i) in the SoS proof system, because Ẽ t_{z(i)} and Ẽ t_{z(i)}^{2/q} are unrelated in general. To address this issue, we add several other variables and constraints. Specifically, for each i ∈ [n] and b ∈ L, we add a new variable t_{z(i)−b} along with the following constraints:

• t_{z(i)−b} ≥ 0,

• t_{z(i)−b}^{2/q} = (z(i) − b)^2,

• t_{z(i)−b} ≤ t_{z(i)−b′} + |b − b′|^q  (b′ ∈ L),

• |b − b′|^q ≤ t_{z(i)−b} + t_{z(i)−b′}  (b′ ∈ L).

Intuitively, t_{z(i)−b} represents |z(i) − b|^q. The last two constraints encode triangle inequalities; for example, the first of them corresponds to |z(i) − b|^q ≤ |z(i) − b′|^q + |b − b′|^q. This increases the number of variables and constraints by O(n/δ) and O(n/δ^2), respectively. For the sake of notational simplicity, we will denote t_{z(i)−b} by |z(i) − b|^q for i ∈ [n] and b ∈ L in POPs. Note that these constraints are valid in the sense that the desired solution x^L satisfies all of them.
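The validity claim just made is easy to verify directly: with t_{z−b} := |z − b|^q and q ≤ 1, both families of triangle-inequality constraints hold for every grid pair b, b′, by subadditivity of |·|^q. A small check of ours:

```python
# Check that t_{z-b} := |z - b|^q satisfies the two triangle-inequality
# constraints added above, for all grid pairs b, b' in L.
q, delta = 0.5, 0.25
L = [j * delta for j in range(-4, 5)]     # multiples of delta in [-1, 1]

ok = True
for z in [-0.9, -0.33, 0.0, 0.41, 1.0]:   # sample values of z(i)
    for b in L:
        for b2 in L:
            t_b, t_b2 = abs(z - b) ** q, abs(z - b2) ** q
            ok = ok and t_b <= t_b2 + abs(b - b2) ** q + 1e-12
            ok = ok and abs(b - b2) ** q <= t_b + t_b2 + 1e-12
print(ok)  # True: any real z (in particular x^L) satisfies them all
```

This is exactly why the constraints are valid for x^L: they are true numeric inequalities for the intended values of the lifted variables.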
Thus, we arrive at the following POP:

    minimize ‖z‖_q^q  subject to z satisfies the system {‖Az − y‖_2^2 ≤ η^2} ∪ P,

where P is a system of constraints. Here, P contains all the constraints on t_{z(i)−b} mentioned above, but is not limited to them. Indeed, in Section 4, we add other (valid) constraints to P so that the matrix A satisfies the pseudo robust null space property. From now on, we also fix P.

Now, we consider the following moment relaxation:

    (P̃_{q,η})    minimize Ẽ_D ‖z‖_q^q  subject to D satisfies the system {‖Az − y‖_2^2 ≤ η^2} ∪ P.
Proposition 3.1. Let n_P be the number of variables in P and d_P the maximum degree that appears in the system P. If there is a polynomial-time separation oracle for the constraints in P, then we can solve the moment relaxation (P̃_{q,η}) in O(n/δ + n_P)^{O(2/q + d_P)} time.

Proof. Immediate from Theorem 1.3.

Since x^L ∈ L^n, the value Ẽ‖z − x^L‖_q^q is well defined. Furthermore, D satisfies the following triangle inequalities:

    Ẽ‖z‖_q^q ≤ Ẽ‖z − x^L‖_q^q + ‖x^L‖_q^q,
    Ẽ‖z − x^L‖_q^q ≤ Ẽ‖z‖_q^q + ‖x^L‖_q^q,
    ‖x^L‖_q^q ≤ Ẽ‖z − x^L‖_q^q + Ẽ‖z‖_q^q.

Now we formally define the pseudo robust null space property.

Definition 3.1. Let ρ, τ > 0 and s ∈ Z_+. A matrix A ∈ R^{m×n} is said to satisfy the (ρ, τ, s)-pseudo robust null space property in the ℓ_q quasi-norm (ℓ_q-(ρ, τ, s)-PRNSP for short) with respect to the system P if

    Ẽ_D ‖v_S‖_q^q ≤ ρ Ẽ_D ‖v_S̄‖_q^q + τ (Ẽ_D ‖Av‖_2^2)^{q/2}

holds for any pseudodistribution D of degree at least 2/q satisfying P, all v of the form z − b (b ∈ L^n), and any S ⊆ [n] with |S| = s.

Lemma 3.1. Let 0 < ρ < 1, τ > 0, and s ∈ Z_+. If the matrix A ∈ R^{m×n} satisfies the ℓ_q-(ρ, τ, s)-PRNSP with respect to the system P, then we have

    Ẽ‖z − x^L‖_q^q ≤ (1 + ρ)/(1 − ρ) (Ẽ‖z‖_q^q − ‖x^L‖_q^q + 2‖x^L_S̄‖_q^q) + 2τ/(1 − ρ) (Ẽ‖A(z − x^L)‖_2^2)^{q/2}

for any S ⊆ [n] with |S| ≤ s.

Proof. Let v := z − x^L. Then, from the PRNSP of A, we have

    (3.3)    Ẽ‖v_S‖_q^q ≤ ρ Ẽ‖v_S̄‖_q^q + τ (Ẽ‖Av‖_2^2)^{q/2}.

Note that

    ‖x^L‖_q^q = ‖x^L_S‖_q^q + ‖x^L_S̄‖_q^q ≤ Ẽ‖(z − x^L)_S‖_q^q + Ẽ‖z_S‖_q^q + ‖x^L_S̄‖_q^q = Ẽ‖v_S‖_q^q + Ẽ‖z_S‖_q^q + ‖x^L_S̄‖_q^q,
    Ẽ‖v_S̄‖_q^q = Ẽ‖(z − x^L)_S̄‖_q^q ≤ ‖x^L_S̄‖_q^q + Ẽ‖z_S̄‖_q^q.

Summing up these two inequalities, we get

    (3.4)    Ẽ‖v_S̄‖_q^q ≤ Ẽ‖z‖_q^q − ‖x^L‖_q^q + Ẽ‖v_S‖_q^q + 2‖x^L_S̄‖_q^q.

Combining (3.3) and (3.4), we get

    Ẽ‖v_S̄‖_q^q ≤ 1/(1 − ρ) (Ẽ‖z‖_q^q − ‖x^L‖_q^q + 2‖x^L_S̄‖_q^q) + τ/(1 − ρ) (Ẽ‖Av‖_2^2)^{q/2}.

Therefore,

    Ẽ‖v‖_q^q = Ẽ‖v_S‖_q^q + Ẽ‖v_S̄‖_q^q ≤ (1 + ρ) Ẽ‖v_S̄‖_q^q + τ (Ẽ‖Av‖_2^2)^{q/2}
             ≤ (1 + ρ)/(1 − ρ) (Ẽ‖z‖_q^q − ‖x^L‖_q^q + 2‖x^L_S̄‖_q^q) + 2τ/(1 − ρ) (Ẽ‖Av‖_2^2)^{q/2}.

Corollary 3.1. Let 0 < ρ < 1, τ > 0, and s ∈ Z_+. If the matrix A ∈ R^{m×n} satisfies the ℓ_q-(ρ, τ, s)-PRNSP with respect to the system P, then we have

    Ẽ‖z − x^L‖_q^q ≤ 2(1 + ρ)/(1 − ρ) σ_s(x^L)_q^q + 2^{1+q} τ/(1 − ρ) η^q.

Proof. Note that Ẽ‖z‖_q^q − ‖x^L‖_q^q ≤ 0 by the optimality of D and the feasibility of x^L, and that

    (Ẽ‖A(z − x^L)‖_2^2)^{q/2} ≤ ((Ẽ‖Az − y‖_2^2)^{1/2} + ‖y − Ax^L‖_2)^q ≤ (2η)^q = 2^q η^q

by Lemma 2.3. The statement is immediate from the previous lemma with S being an index set of the top-s largest absolute entries of x^L.

3.2 Rounding  Suppose that we have obtained a pseudodistribution D as a solution to (P̃_{q,η}). Then, we use the following simple rounding scheme to extract an actual vector z̄ ∈ R^n: for each i ∈ [n], find b ∈ L that minimizes Ẽ_{z∼D} |z(i) − b|^q and set z̄(i) = b. In other words, z̄ is a minimizer of Ẽ_{z∼D} ‖z − b‖_q^q, where b runs over vectors in L^n. Clearly, we can find z̄ in O(n/δ) time. Also, we have the following guarantee on its distance to x^L.

Lemma 3.2. Let 0 < ρ < 1, τ > 0, and s ∈ Z_+. If the matrix A ∈ R^{m×n} satisfies the ℓ_q-(ρ, τ, s)-PRNSP with respect to the system P, then the vector z̄ ∈ L^n satisfies

    (3.5)    ‖z̄ − x^L‖_q^q ≤ 4(1 + ρ)/(1 − ρ) σ_s(x^L)_q^q + 2^{2+q} τ/(1 − ρ) η^q.

Proof. By Corollary 3.1, we have

    Ẽ‖z − x^L‖_q^q ≤ 2(1 + ρ)/(1 − ρ) σ_s(x^L)_q^q + 2^{1+q} τ/(1 − ρ) η^q.

Since x^L ∈ L^n holds and z̄ is a minimizer of Ẽ‖z − b‖_q^q over b ∈ L^n, we also have

    Ẽ‖z − z̄‖_q^q ≤ 2(1 + ρ)/(1 − ρ) σ_s(x^L)_q^q + 2^{1+q} τ/(1 − ρ) η^q.

Then, by the triangle inequality,

    ‖z̄ − x^L‖_q^q ≤ Ẽ‖z − z̄‖_q^q + Ẽ‖z − x^L‖_q^q ≤ 4(1 + ρ)/(1 − ρ) σ_s(x^L)_q^q + 2^{2+q} τ/(1 − ρ) η^q.

Then, we show that z̄ is close to the signal vector x in the ℓ_q^q metric.

Lemma 3.3. Let 0 < ρ < 1, τ > 0, and s ∈ Z_+. If the matrix A ∈ R^{m×n} satisfies the ℓ_q-(ρ, τ, s)-PRNSP with respect to the system P, then the vector z̄ satisfies

    ‖z̄ − x‖_q^q ≤ 4(1 + ρ)/(1 − ρ) σ_s(x)_q^q + O(ε)

by choosing δ = min{ ((1 − ρ)ε/τ)^{1/q}/(σ_max(A)√s), ((1 − ρ)ε/n)^{1/q} }.

Proof. From the choice of δ, we have η = σ_max(A)√s δ ≤ ((1 − ρ)ε/τ)^{1/q}. Then, from Lemma 3.2, we get

    ‖z̄ − x^L‖_q^q ≤ 4(1 + ρ)/(1 − ρ) σ_s(x^L)_q^q + O(ε).

Note that ‖x^L − x‖_q^q ≤ nδ^q and |σ_s(x)_q^q − σ_s(x^L)_q^q| = O(nδ^q). Then, we have

    ‖z̄ − x‖_q^q ≤ ‖z̄ − x^L‖_q^q + ‖x^L − x‖_q^q
               ≤ 4(1 + ρ)/(1 − ρ) (σ_s(x)_q^q + nδ^q) + O(ε) + nδ^q
               = 4(1 + ρ)/(1 − ρ) σ_s(x)_q^q + O(ε + nδ^q/(1 − ρ))
               = 4(1 + ρ)/(1 − ρ) σ_s(x)_q^q + O(ε).

Finally, we show that z̄ is close to x in the ℓ_q quasi-norm.

Corollary 3.2. Let 0 < ρ < 1, τ > 0, and s ∈ Z_+. If the matrix A ∈ R^{m×n} satisfies the ℓ_q-(ρ, τ, s)-PRNSP with respect to the system P, then the vector z̄ satisfies

    ‖z̄ − x‖_q ≤ (4(1 + ρ)/(1 − ρ))^{1/q} σ_s(x)_q + O(ε)

by choosing δ = O((1 − ρ)^{1/q+1} ε^{1/q}/(τ n^{1/q} σ_max(A)√s)).

Proof. Invoke Lemma 3.3 with a suitable ε′ = Ω(ε^{1/q}/n) (chosen as a function of σ_s(x)_q). Then, we get a vector z̄ such that

    ‖z̄ − x‖_q ≤ (4(1 + ρ)/(1 − ρ) σ_s(x)_q^q + ε′)^{1/q} ≤ (4(1 + ρ)/(1 − ρ))^{1/q} σ_s(x)_q + O(ε).

Note that the required δ is

    Ω(min{ ((1 − ρ)ε′/τ)^{1/q}, ((1 − ρ)ε′/n)^{1/q} }/(σ_max(A)√s)) = Ω((1 − ρ)^{1/q+1} ε^{1/q}/(τ n^{1/q} σ_max(A)√s)).

Combining Proposition 3.1 and Corollary 3.2, we get the following theorem.

Theorem 3.1. (formal version of Theorem 1.4) Let 0 < ρ < 1, τ > 0, and s ∈ Z_+. If the matrix A ∈ R^{m×n} satisfies the ℓ_q-(ρ, τ, s)-PRNSP with respect to the system P and P has a polynomial-time separation oracle, then we can compute a vector z̄ ∈ R^n such that

    ‖z̄ − x‖_q ≤ (4(1 + ρ)/(1 − ρ))^{1/q} σ_s(x)_q + ε.

The running time is (n/δ + n_P)^{O(2/q + d_P)}, where n_P is the number of variables in P, d_P is the maximum degree that appears in the constraints of P, and δ = Ω((1 − ρ)^{1/q+1} ε^{1/q}/(τ n^{1/q} σ_max(A)√s)).

4 Imposing Pseudo Robust Null Space Property

A normalized Rademacher matrix A ∈ R^{m×n} is a random matrix each entry of which is i.i.d. sampled from {±1/√m}. In this section, we show that we can impose the PRNSP on a normalized Rademacher matrix by introducing some additional variables and constraints to pseudodistributions. We now introduce the coherence and review properties of a normalized Rademacher matrix.
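Before the formal definition, a quick empirical look (our sketch) at the two quantities that drive this section for a normalized Rademacher matrix: its columns are exactly unit-norm, and their pairwise inner products are small.

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 256, 40
# Normalized Rademacher matrix: i.i.d. entries from {±1/sqrt(m)}.
A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

G = A.T @ A                         # Gram matrix of the columns
col_norms = np.sqrt(np.diag(G))     # exactly 1 for every column
mu = np.abs(G - np.eye(n)).max()    # coherence: max_{i != j} |<a_i, a_j>|

print(col_norms.min(), col_norms.max())   # 1.0 1.0
print(mu < 1.0)                           # True; typically mu = O(sqrt(log n / m))
```

Each off-diagonal entry of G is a sum of m i.i.d. ±1/m terms, which is the concentration behind Lemma 4.1 below.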
by choosing
Definition 4.1. (Coherence) Let A ∈ Rm×n be a
matrix whose columns a1 , . . . , an are `2 -normalized.
The `q -coherence function of A is defined as µqq (s) :=
nP
o
q
maxi∈[n] max
/S
j∈S |hai , aj i| : S ⊆ [n], |S| = s, i ∈
for s = 1, . . . , n − 1.
can design a separation oracle for (4.6), and hence we
can use the ellipsoid method to solve SoS relaxations
Now we review properties of a normalized Rademacher matrix.

Lemma 4.1. ([1]) Let n ∈ Z+ and ε > 0. If m = Ω(log n/ε²), then a normalized Rademacher matrix A ∈ R^{m×n} with columns a_1, …, a_n satisfies |⟨a_i, a_j⟩| ≤ ε for all i ≠ j with high probability.

Corollary 4.1. Let s, n ∈ Z+ and ε > 0. If m = Ω(s^{2/q} log n/ε^{2/q}), then a normalized Rademacher matrix A ∈ R^{m×n} with columns a_1, …, a_n satisfies μ_q(s) ≤ ε with high probability.

Lemma 4.2. ([38]) A normalized Rademacher matrix A ∈ R^{m×n} with m ≤ n satisfies σ_max(A) = O((n log m)/m).

In what follows, we fix q ∈ (0, 1], s ∈ Z+, ε > 0, and a matrix A ∈ R^{m×n} that satisfies the properties of Corollary 4.1 and Lemma 4.2. Let a_1, …, a_n be the columns of A.

For b ∈ L^n, we define v^b = z − b, where z is a variable in POP. To impose the PRNSP on A, for each i ∈ [n] and b ∈ L^n, we add a variable t_{⟨Av^b, a_i⟩} and the constraints t_{⟨Av^b, a_i⟩}^{2/q} = ⟨Av^b, a_i⟩² and t_{⟨Av^b, a_i⟩} ≥ 0. Intuitively, t_{⟨Av^b, a_i⟩} represents |⟨Av^b, a_i⟩|^q, and as in the previous section, we will denote t_{⟨Av^b, a_i⟩} by |⟨Av^b, a_i⟩|^q in POPs for notational simplicity.

Furthermore, we add the following constraint for each i ∈ [n] and b ∈ L^n:

(4.6)  |v^b(i)|^q ≤ |⟨Av^b, a_i⟩|^q + Σ_{j≠i} |v^b(j)|^q |⟨a_j, a_i⟩|^q.

These constraints are valid since v^b(i) a_i = Av^b − Σ_{j≠i} v^b(j) a_j and ⟨a_i, a_i⟩ = 1.

At first sight, the resulting moment relaxation is intractable since there are exponentially many new variables. However, this is not the case. To see this, note that ⟨Av^b, a_i⟩ = Σ_{j=1}^n z(j)⟨a_j, a_i⟩ − Σ_{j=1}^n b(j)⟨a_j, a_i⟩. Also, the number of possible values of Σ_{j=1}^n b(j)⟨a_j, a_i⟩ is O(nm/δ) because A is ±1/√m-valued and each b(j) ∈ [−1, 1] is a multiple of δ. Hence, identifying the variables t_{⟨Av^b, a_i⟩} and t_{⟨Av^{b′}, a_i⟩} whenever Σ_{j=1}^n b(j)⟨a_j, a_i⟩ = Σ_{j=1}^n b′(j)⟨a_j, a_i⟩, we only have to add O(nm/δ) variables.

Another issue is that we have added exponentially many constraints. The following lemma says that we can efficiently solve the separation problem involving these constraints.

Lemma 4.3. There is a polynomial-time algorithm that validates whether (4.6) is satisfied by a given pseudodistribution D, and if not, finds i ∈ [n] and b ∈ L^n such that

Ẽ|v^b(i)|^q > Ẽ|⟨Av^b, a_i⟩|^q + Σ_{j≠i} Ẽ|v^b(j)|^q |⟨a_j, a_i⟩|^q.

Proof. Since there are only polynomially many choices for the values of i, b(i), and Σ_{j=1}^n b(j)⟨a_j, a_i⟩, we can exhaustively check all of them. Hence, in what follows, we fix i ∈ [n], b_i := b(i), and c := Σ_{j=1}^n b(j)⟨a_j, a_i⟩. Note that, by fixing c, we have also fixed the value of Ẽ|⟨Av^b, a_i⟩|^q to c′ := Ẽ|⟨Az, a_i⟩ − c|^q. Now we want to check whether there exists b ∈ L^n with b(i) = b_i and Σ_{j=1}^n b(j)⟨a_j, a_i⟩ = c such that

Ẽ|z(i) − b_i|^q > c′ + Σ_{j≠i} Ẽ|z(j) − b(j)|^q |⟨a_j, a_i⟩|^q.

We can solve this problem by dynamic programming using a table T[·, ·], where T[k, a] stores the minimum value of Σ_{j∈[k], j≠i} Ẽ|z(j) − b(j)|^q |⟨a_j, a_i⟩|^q conditioned on Σ_{j∈[k], j≠i} b(j)⟨a_j, a_i⟩ = a. Since the size of T is O(n · nm/δ), the dynamic programming can be solved in polynomial time.

Let P′ be the system of all the added constraints on t_{z(i)−b} (given in Section 3) and the constraints on t_{⟨Av^b, a_i⟩}. Then, A satisfies the PRNSP by imposing the system P′.

Theorem 4.1. (formal version of Theorem 1.5) The matrix A satisfies the ℓq-(ρ, τ, s)-PRNSP with respect to P′ with ρ = μ_q^q(s)/(1 − μ_q^q(s−1)) = O(ε) and τ = s/(1 − μ_q^q(s−1)) = O(s).
Proof. The proof basically follows the standard proof in compressed sensing (see [23]). Let us fix S ⊆ [n] of size s and b ∈ L^n. Then for any pseudodistribution D satisfying the above constraints, we have

Ẽ|v^b(i)|^q ≤ Ẽ|⟨Av^b, a_i⟩|^q + Ẽ Σ_{l∈S̄} |v^b(l)|^q |⟨a_l, a_i⟩|^q + Ẽ Σ_{j∈S: j≠i} |v^b(j)|^q |⟨a_j, a_i⟩|^q

for i ∈ S. Using the pseudo Cauchy–Schwarz inequality k = −log₂ q times, we get

Ẽ|⟨Av^b, a_i⟩|^q ≤ (Ẽ⟨Av^b, a_i⟩²)^{q/2}.

By Lemma 2.4, Ẽ⟨Av^b, a_i⟩² ≤ Ẽ‖Av^b‖₂² ‖a_i‖₂² = Ẽ‖Av^b‖₂². Thus we have

Ẽ|v^b(i)|^q ≤ (Ẽ‖Av^b‖₂²)^{q/2} + Ẽ Σ_{l∈S̄} |v^b(l)|^q |⟨a_l, a_i⟩|^q + Ẽ Σ_{j∈S: j≠i} |v^b(j)|^q |⟨a_j, a_i⟩|^q.

Summing up these inequalities yields

Ẽ‖v^b_S‖_q^q ≤ s(Ẽ‖Av^b‖₂²)^{q/2} + Σ_{l∈S̄} Ẽ|v^b(l)|^q Σ_{i∈S} |⟨a_l, a_i⟩|^q + Σ_{j∈S} Ẽ|v^b(j)|^q Σ_{i∈S: i≠j} |⟨a_j, a_i⟩|^q
  ≤ s(Ẽ‖Av^b‖₂²)^{q/2} + μ_q^q(s) Ẽ‖v^b_{S̄}‖_q^q + μ_q^q(s−1) Ẽ‖v^b_S‖_q^q.

Rearranging this inequality finishes the proof.
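To make the table-based separation routine from the proof of Lemma 4.3 concrete, here is a minimal Python sketch. It is an illustration, not the paper's implementation: the function name and its inputs are hypothetical, `inner` and `pexp_moment` stand in for the inner products ⟨a_j, a_i⟩ and the pseudo-moments Ẽ|z(j) − b(j)|^q (which in the actual algorithm come from the SoS solution), and floating-point rounding replaces the exact δ/m-grid indexing of the table T.

```python
def separation_dp(i, inner, pexp_moment, grid, q):
    """Sketch of the dynamic program from the proof of Lemma 4.3 (simplified).

    inner[j]          : stands in for <a_j, a_i>
    pexp_moment[j][b] : stands in for the pseudo-moment E~|z(j) - b|^q
    grid              : the discretized values that each b(j) may take

    Returns a dict mapping each achievable partial sum
    a = sum_{j != i} b(j) <a_j, a_i> (rounded to suppress float noise) to
    the minimum of sum_{j != i} E~|z(j) - b(j)|^q |<a_j, a_i>|^q over all
    choices of b attaining that sum -- the quantity T[n, a] in the proof.
    """
    T = {0.0: 0.0}  # T[a] = minimum accumulated objective with partial sum a
    for j in range(len(inner)):
        if j == i:
            continue  # the coordinate i is fixed separately in the proof
        w = abs(inner[j]) ** q  # the weight |<a_j, a_i>|^q
        new_T = {}
        for a, cost in T.items():
            for b in grid:  # choose b(j) from the grid
                a2 = round(a + b * inner[j], 9)
                c2 = cost + pexp_moment[j][b] * w
                if a2 not in new_T or c2 < new_T[a2]:
                    new_T[a2] = c2
        T = new_T
    return T
```

For each fixed i, b_i, and target value c, comparing Ẽ|z(i) − b_i|^q against c′ plus the table entry at c then detects a violated constraint (4.6), exactly as in the proof.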
Combining Lemma 4.2 and Theorem 4.1, we immediately have the following.

Corollary 4.2. The matrix A satisfies the ℓq-(ρ, τ, s)-PRNSP with respect to P′ with ρ = O(ε) and τ = O(s). Moreover, σ_max(A) = O((n log m)/m).
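The two quantities behind Corollary 4.2, the pairwise column coherence of Lemma 4.1 and σ_max(A), are easy to inspect numerically. The following sketch is an illustration only (the dimensions and random seed are arbitrary choices, not values from the paper): it draws a normalized Rademacher matrix and evaluates both quantities.

```python
import numpy as np

def normalized_rademacher(m, n, rng):
    """Matrix with i.i.d. +-1/sqrt(m) entries, so every column is a unit vector."""
    return rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

def coherence(A):
    """max_{i != j} |<a_i, a_j>| over the columns of A (the quantity in Lemma 4.1)."""
    G = A.T @ A               # Gram matrix; its diagonal entries are 1
    np.fill_diagonal(G, 0.0)  # ignore <a_i, a_i>
    return float(np.abs(G).max())

rng = np.random.default_rng(0)
m, n = 200, 400               # m <= n, as in Lemma 4.2
A = normalized_rademacher(m, n, rng)
print(coherence(A))           # small: on the order of sqrt(log n / m)
print(np.linalg.norm(A, 2))   # sigma_max(A), well below the (n log m)/m scale
```

Repeating the draw with larger m shows the coherence shrinking like 1/√m, matching the m = Ω(log n/ε²) requirement of Lemma 4.1.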
Now we establish our main theorem.

Theorem 4.2. (formal version of Theorem 1.2) Let q ∈ (0, 1] be a constant of the form q = 2^{−k} for some k ∈ Z+, s, n ∈ Z+, and ε > 0. Then, a normalized Rademacher matrix A ∈ R^{m×n} with m = O(s^{2/q} log n) satisfies the following property with high probability: For any vector y = Ax, where x ∈ [−1, 1]^n is an unknown vector, we can recover z̄ with ‖z̄ − x‖_q ≤ O(σ_s(x)_q) + ε. The running time is (nm/δ)^{O(2/q)}, where δ = O(ε^{1/q} m/(s n^{1/q} √(n log m))).

Proof. Solve the SoS relaxation (P̃_{q,η}) with the system P′. With high probability, the matrix A satisfies the ℓq-(ρ, τ, s)-PRNSP with respect to P′ for ρ = O(1) and τ = O(s), and σ_max(A) = O((n log m)/m), by Corollary 4.2. Then, we can recover a vector z̄ ∈ R^n with the desired property by Theorem 3.1. Note that the number n_{P′} of added variables in the system P′ is O(nm/δ) and the maximum degree d_{P′} that appears in a constraint of P′ is 2/q. Hence, the running time is as stated.
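The error benchmark σ_s(x)_q in Theorem 4.2 is itself simple to compute: for any ℓ_q quasi-norm, the best s-sparse approximation of x keeps its s largest entries in magnitude, because the objective decomposes coordinatewise. A small sketch (the helper names are ours, chosen for illustration):

```python
import numpy as np

def lq_quasinorm(x, q):
    """(sum_i |x_i|^q)^(1/q); a quasi-norm for 0 < q < 1, the usual norm at q = 1."""
    return float(np.sum(np.abs(x) ** q) ** (1.0 / q))

def sigma_s(x, s, q):
    """sigma_s(x)_q: the l_q distance from x to the nearest s-sparse vector.

    The best s-term approximation keeps the s largest entries in magnitude,
    so the error is the l_q quasi-norm of the remaining tail.
    """
    tail = np.sort(np.abs(x))[:-s] if s > 0 else np.abs(x)
    return float(np.sum(tail ** q) ** (1.0 / q))
```

For a nearly sparse signal, σ_s(x)_q is tiny, and Theorem 4.2 then guarantees recovery of x up to O(σ_s(x)_q) + ε; a small q penalizes the tail more aggressively, which is why the ℓ_q quasi-norm is the natural error measure here.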
Acknowledgements

The authors thank Satoru Iwata and Yuji Nakatsukasa for reading an earlier version of this paper.
References

[1] D. Achlioptas. Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66(4):671–687, 2003.
[2] M. F. Anjos and J. B. Lasserre. Handbook on Semidefinite, Conic and Polynomial Optimization. Springer Science & Business Media, 2011.
[3] B. Barak, F. G. S. L. Brandao, A. W. Harrow, J. Kelner, D. Steurer, and Y. Zhou. Hypercontractivity, sum-of-squares proofs, and their applications. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC), pages 307–326, 2012.
[4] B. Barak, J. A. Kelner, and D. Steurer. Rounding sum-of-squares relaxations. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), pages 31–40, 2014.
[5] B. Barak, J. A. Kelner, and D. Steurer. Dictionary learning and tensor decomposition via the sum-of-squares method. In Proceedings of the 47th Annual ACM Symposium on Theory of Computing (STOC), pages 143–151, 2015.
[6] B. Barak and A. Moitra. Tensor prediction, Rademacher complexity and random 3-XOR. CoRR, abs/1501.06521, 2015.
[7] B. Barak and D. Steurer. Sum-of-squares proofs and the quest toward optimal algorithms. In Proceedings of the International Congress of Mathematicians (ICM), pages 509–533, 2014.
[8] R. Berinde and P. Indyk. Sequential sparse matching pursuit. In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, pages 36–43, 2009.
[9] R. Berinde, P. Indyk, and M. Ruzic. Practical near-optimal sparse recovery in the L1 norm. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing, pages 198–205, 2008.
[10] A. M. Bruckstein, D. L. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34–81, 2009.
[11] E. J. Candès. The restricted isometry property and its implications for compressed sensing. Comptes Rendus Mathematique, 346(9–10):589–592, 2008.
[12] E. J. Candès, J. K. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Communications on Pure and Applied Mathematics, 59(8):1207–1223, 2006.
[13] E. J. Candès and T. Tao. Decoding by linear programming. IEEE Transactions on Information Theory, 51(12):4203–4215, 2005.
[14] R. Chartrand. Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters, 14(10):707–710, 2007.
[15] R. Chartrand. Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data. In Proceedings of the IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI), pages 262–265, 2009.
[16] R. Chartrand and V. Staneva. Restricted isometry properties and nonconvex compressive sensing. Inverse Problems, 24(3), 2008.
[17] R. Chartrand and W. Yin. Iteratively reweighted algorithms for compressive sensing. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3869–3872, 2008.
[18] A. Cherian, S. Sra, and N. Papanikolopoulos. Denoising sparse noise via online dictionary learning. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2060–2063, 2011.
[19] E. Chlamtac and M. Tulsiani. Convex relaxations and integrality gaps. In Handbook on Semidefinite, Conic and Polynomial Optimization. 2012.
[20] A. Cohen, W. Dahmen, and R. DeVore. Compressed sensing and best k-term approximation. Journal of the American Mathematical Society, 22(1):211–231, 2009.
[21] B. G. R. de Prony. Essai expérimental et analytique: sur les lois de la dilatabilité de fluides élastiques et sur celles de la force expansive de la vapeur de l'alcool, à différentes températures. Journal de l'École Polytechnique, pages 24–76, 1795.
[22] D. L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
[23] S. Foucart and H. Rauhut. A Mathematical Introduction to Compressive Sensing. Springer, 2013.
[24] D. Ge, X. Jiang, and Y. Ye. A note on the complexity of Lp minimization. Mathematical Programming, 129(2):285–299, 2011.
[25] R. Ge and T. Ma. Decomposing overcomplete 3rd order tensors using sum-of-squares algorithms. CoRR, abs/1504.05287, 2015.
[26] A. Gilbert and P. Indyk. Sparse recovery using sparse matrices. Proceedings of the IEEE, 98(6):937–947, 2010.
[27] D. Grigoriev and N. Vorobjov. Complexity of Null- and Positivstellensatz proofs. Annals of Pure and Applied Logic, 113(1):153–160, 2001.
[28] P. Indyk and M. Ružić. Near-optimal sparse recovery in the L1 norm. In Proceedings of the 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 199–207, 2008.
[29] J. N. Laska, M. A. Davenport, and R. G. Baraniuk. Exact signal recovery from sparsely corrupted measurements through the pursuit of justice. In Conference Record of the 43rd Asilomar Conference on Signals, Systems and Computers (ACSSC), pages 1556–1560, 2009.
[30] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM Journal on Optimization, 11(3):796–817, 2001.
[31] Y. Nesterov. Squared functional systems and optimization problems. In High Performance Optimization, pages 405–440. Springer, 2000.
[32] R. O'Donnell and Y. Zhou. Approximability and proof complexity. In Proceedings of the 24th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1537–1556, 2013.
[33] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, 2000.
[34] M. Putinar. Positive polynomials on compact semi-algebraic sets. Indiana University Mathematics Journal, 42(3):969–984, 1993.
[35] T. Rothvoß. The Lasserre hierarchy in approximation algorithms. Lecture Notes for MAPSP, pages 1–25, 2013.
[36] R. Saab, R. Chartrand, and O. Yilmaz. Stable sparse approximations via nonconvex optimization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3885–3888, 2008.
[37] K. Schmüdgen. The K-moment problem for compact semi-algebraic sets. Mathematische Annalen, 289(1):203–206, 1991.
[38] Y. Seginer. The expected norm of random matrices. Combinatorics, Probability and Computing, 9(2):149–166, 2000.
[39] N. Z. Shor. An approach to obtaining global extremums in polynomial mathematical programming problems. Cybernetics, 23(5):695–700, 1987.
[40] G. Tang and P. Shah. Guaranteed tensor decomposition: A moment approach. In Proceedings of the 32nd International Conference on Machine Learning (ICML), pages 1491–1500, 2015.