To appear in Proc. 42nd Annu. Conf. Information Sciences and Systems (CISS’08), Princeton, NJ, Mar. 19-21, 2008
Compressed Channel Sensing
Waheed U. Bajwa† , Jarvis Haupt† , Gil Raz‡ and Robert Nowak†
† Department of Electrical and Computer Engineering, University of Wisconsin-Madison
‡ GMR Research and Technology, Concord, MA 01742-3819
Abstract—Reliable wireless communications often requires
accurate knowledge of the underlying multipath channel. This
typically involves probing of the channel with a known training
waveform and linear processing of the input probe and channel output to estimate the impulse response. Many real-world
channels of practical interest tend to exhibit impulse responses
characterized by a relatively small number of nonzero channel
coefficients. Conventional linear channel estimation strategies,
such as the least squares, are ill-suited to fully exploiting the
inherent low-dimensionality of these sparse channels. In contrast,
this paper proposes sparse channel estimation methods based
on convex/linear programming. Quantitative error bounds for
the proposed schemes are derived by adapting recent advances
from the theory of compressed sensing. The bounds come within
a logarithmic factor of the performance of an ideal channel
estimator and reveal significant advantages of the proposed
methods over the conventional channel estimation schemes.
I. CHANNEL SENSING AND ESTIMATION
Optimal demodulation and decoding in wireless communication systems often requires accurate knowledge of channel
impulse response functions. Typically, an impulse response is
estimated by probing the channel with known training waveforms and comparing the transmitted and received signals.
The importance of channel identification is underscored by the
number of research papers dedicated to this topic. Searching
for “channel estimation” turns up more than 36 000 articles
in Google Scholar and over 700 IEEE journal articles in
IEEE Xplore.
There are two salient aspects to this channel identification
problem, namely, sensing and estimation. Sensing corresponds
to the design of waveforms used to probe the channel. Estimation is the problem of processing the input probe and
channel output to recover the impulse response. The ability
to accurately identify a channel critically depends on both the
design of appropriate probes and the application of effective
estimation methods. In particular, probing schemes and estimation strategies that are tailored to the anticipated characteristics
of the underlying channel yield better estimates than generic
procedures. Grappling with these issues is central to most of
the papers written on this topic.
Multipath wireless channels tend to exhibit impulse responses dominated by a relatively small number of clusters
of significant paths, especially when operating at large bandwidths and signaling durations and/or with numbers of antenna
elements [1]–[3]. These are often called “sparse” channels,
This research was supported in part by the DARPA A2I Program. E-mails:
{wubajwa, jdhaupt}@wisc.edu, [email protected], [email protected]
since most of the channel coefficients are either zero or nearly
zero. In the simplest setting, which is the primary focus of this
paper, consider frequency-selective channels (channels with a
large delay spread relative to the inverse of the communication
bandwidth) with most of the impulse response energy localized
to relatively small regions in delay. Sparse channel models of
this type have received considerable attention lately [4]–[6].
This paper tackles the problem of sensing and estimating sparse (single-antenna) frequency-selective channels. A
number of authors have recently addressed this problem [4]–
[6]. Lacking from the previous investigations is a quantitative theoretical analysis of the performance of the proposed
sparse channel estimation methods. The main results of this
paper adapt recent advances from the theory of compressed
sensing to devise quantitative error bounds for convex/linear
programming based sparse channel estimation schemes. The
bounds come within a logarithmic factor of the performance of
an ideal channel estimator and clearly reveal the relationship
between the input probes and the accuracy of the channel
estimates.
II. MULTIPATH WIRELESS CHANNEL MODELING
In this section, we review mathematical models for multipath wireless channels. In particular, the so-called “virtual
channel model” is a discrete approximation of the continuous-time channel at the delay resolution dictated by the channel
bandwidth. The virtual channel model captures the relationship
between clustering of physical paths and sparsity of dominant
channel coefficients and sets the stage for the application of
compressed sensing theory and methods.
We consider single-antenna communication channels, which
are often characterized as linear, time-varying systems [7]. The
corresponding (complex) baseband transmitted and received
signals are related as
$$y(t) = \int_{0}^{T_m} h(t, \tau)\, x(t - \tau)\, d\tau + z(t) \tag{1}$$
where h(t, τ ) is the time-varying channel impulse response,
and x(t), y(t) and z(t) represent the transmitted, received
and additive white Gaussian noise (AWGN) waveforms, respectively. The maximum delay spread Tm is defined as the
maximum possible nonzero delay. The focus of this paper is
the identification of frequency-selective channels. A channel
is said to be (purely) frequency-selective if (i) the channel
impulse response remains constant over the time duration
of interest, i.e., h(t, τ) ≈ h(0, τ) = h(τ); and (ii) the
channel delay spread is large relative to the inverse of the
communication bandwidth W, i.e., Tm W ≥ 1.
delay resolution bin, as illustrated in Fig. 1. Note that this
approximation gets more accurate with increasing W, due to
higher delay resolution.
B. Discrete-Time Representation
Fig. 1. A schematic illustrating the virtual representation of a single-antenna,
frequency-selective channel. The virtual channel coefficients {βj} correspond
to uniformly-spaced samples of a smoothed version of the channel impulse
response taken at the virtual delays {τ̂j = j/W} in the delay space.
A. Virtual Channel Modeling
Frequency-selective channels generate multiple delayed and
attenuated copies of the transmitted waveform. For such
“multipath” channels, h(τ ) is modeled as
$$h(\tau) = \sum_{i=1}^{N_{\mathrm{path}}} \alpha_i\, \delta(\tau - \tau_i) \tag{2}$$
and the transmitted and received waveforms are related by
$$y(t) = \sum_{i=1}^{N_{\mathrm{path}}} \alpha_i\, x(t - \tau_i) + z(t) \tag{3}$$
which corresponds to signal propagation along Npath physical
paths, where αi ∈ C and τi ∈ [0, Tm ] are the complex path
gain and the delay associated with the i-th physical path,
respectively.
The discrete path model (2), while realistic, is difficult to
analyze and identify due to nonlinear dependence on the real-valued delay parameters {τi}. However, because the communication bandwidth W is limited, the continuous-time channel
can be accurately approximated by a discrete counterpart,
known as a virtual channel model, with the aid of sampling
theorems and/or power series expansions—see, e.g., [7], [8].
The key idea behind virtual channel modeling is to provide
a discrete approximation of frequency-selective channels by
uniformly sampling the physical multipath environment in
delay at a resolution commensurate with W , i.e.,
$$y(t) \approx \sum_{j=0}^{p-1} \beta_j\, x(t - j/W) + z(t) \tag{4}$$
$$\beta_j \approx \sum_{i \in S_{\tau,j}} \alpha_i\, \mathrm{sinc}(j - W\tau_i) \tag{5}$$
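As an illustrative sketch of the binning in (5), and not part of the original development, the following computes virtual coefficients from a list of physical path gains and delays. It takes p = ⌈Tm W⌉ + 1, assigns each path to the delay bin of width 1/W centered at j/W, and uses sinc(a) = sin(πa)/(πa). The function name `virtual_coefficients` and the example path parameters are hypothetical.

```python
import math
import numpy as np

def virtual_coefficients(alphas, taus, W, Tm):
    """Approximate the virtual channel coefficients beta_j as in (5)."""
    p = math.ceil(Tm * W) + 1              # number of resolvable delays
    beta = np.zeros(p, dtype=complex)
    for alpha, tau in zip(alphas, taus):
        # Index of the bin [j/W - 1/(2W), j/W + 1/(2W)) that contains tau.
        j = int(math.floor(W * tau + 0.5))
        # np.sinc(a) = sin(pi a) / (pi a), matching the sinc used in (5).
        beta[j] += alpha * np.sinc(j - W * tau)
    return beta

# Hypothetical example: three physical paths, W = 10, Tm = 1 (arbitrary units).
beta = virtual_coefficients(
    alphas=[1.0, 0.5j, -0.25], taus=[0.30, 0.32, 0.80], W=10.0, Tm=1.0)
```

The first two paths fall in the same delay bin, so only two of the eleven coefficients are nonzero in this example.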
The continuous-time received signal y(t) can be sampled every
1/W seconds (i.e., at a rate of W) at the receiver to obtain an equivalent discrete-time representation of (4)
$$y = x \ast \beta + z \quad \Longrightarrow \quad y = X\beta + z \tag{6}$$
where ∗ denotes the discrete convolution operator, β ∈ C^p
is the vector of channel coefficients, and x, y, and z are the
vectors of samples of x(t), y(t) and z(t), respectively. The
dimensions of x, y and z are dictated by the input signaling
duration Ts, where x(t) = 0 for all t ∉ [0, Ts]. We use
n (= ⌈Ts W⌉) to denote the number of nonzero samples of
x(t), resulting in x ∈ C^n and y, z ∈ C^(n+p−1). Finally, the
matrix X is an (n+p−1)×p (Toeplitz-structured) convolution
matrix formed from vector x.
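The equivalence between the convolution and matrix forms in (6) can be checked numerically. The sketch below builds the (n + p − 1) × p convolution matrix column by column; the helper name `convolution_matrix`, the dimensions, and the random seed are illustrative assumptions.

```python
import numpy as np

def convolution_matrix(x, p):
    """Return the (n + p - 1) x p Toeplitz matrix X with X @ beta == x * beta."""
    n = len(x)
    X = np.zeros((n + p - 1, p), dtype=np.result_type(x, float))
    for j in range(p):
        X[j:j + n, j] = x              # column j is x delayed by j samples
    return X

rng = np.random.default_rng(0)         # seed chosen arbitrarily
n, p = 8, 4
x = rng.choice([-1.0, 1.0], size=n)
beta = rng.standard_normal(p) + 1j * rng.standard_normal(p)
X = convolution_matrix(x, p)
y_noiseless = X @ beta                 # same as np.convolve(x, beta)
```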
In the following, we shall model β as a deterministic but
unknown vector. In practical communication systems, there is
also a power constraint on the transmitted signal x(t) that
can be readily translated into an average power constraint
on the entries of x. Without loss of generality, we assume
that max_k E[|x_k|²] ≤ 1 and the entries of z correspond
to an independent and identically distributed (i.i.d.) complex
Gaussian white noise sequence. We denote their distribution
as CN(0, σ²), where σ² > 0 is the noise power.
The virtual representation of a frequency-selective channel
captures its essential characteristics in terms of the channel
coefficients {βj }. Identifying a frequency-selective channel,
therefore, becomes equivalent to designing the (discrete) input
probe x and estimating β from the output y. Three types
of input probes are commonly employed for channel sensing,
namely, impulses, pseudo-random inputs, and multitone signals. Channel estimates are usually obtained by solving the
least squares (LS) problem (or a variant of it)
$$\hat{\beta}_{LS} = (X^H X)^{-1} X^H y. \tag{7}$$
It is easy to show that the mean squared error (MSE) of an
LS channel estimator obeys
$$E\left[\|\hat{\beta}_{LS} - \beta\|_{\ell_2}^2\right] = \mathrm{trace}\left((X^H X)^{-1}\right) \sigma^2. \tag{8}$$
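For concreteness, the LS estimate (7) and the MSE expression (8) can be evaluated as in the sketch below. The Rademacher probe, the dimensions, the two nonzero taps, and the noise level are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)          # seed chosen arbitrarily
n, p, sigma = 64, 16, 0.5
x = rng.choice([-1.0, 1.0], size=n)
X = np.zeros((n + p - 1, p))
for j in range(p):
    X[j:j + n, j] = x                   # Toeplitz convolution matrix

beta = np.zeros(p, dtype=complex)
beta[[2, 9]] = [1.0, -0.5j]             # hypothetical channel taps
# CN(0, sigma^2) noise: real and imaginary parts each have variance sigma^2 / 2.
z = sigma * (rng.standard_normal(n + p - 1)
             + 1j * rng.standard_normal(n + p - 1)) / np.sqrt(2)
y = X @ beta + z

gram = X.conj().T @ X
beta_ls = np.linalg.solve(gram, X.conj().T @ y)               # estimate (7)
mse_theory = np.trace(np.linalg.inv(gram)).real * sigma**2    # expression (8)
```

Solving the normal equations directly mirrors (7); in practice a least-squares routine such as `np.linalg.lstsq` is numerically preferable.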
III. SPARSE MULTIPATH CHANNELS
(5)
where p = ⌈Tm W ⌉ + 1, sinc(a) = sin(πa)/πa and Sτ,j =
{i : τi ∈ [j/W − 1/2W, j/W + 1/2W )} denotes the set of
all physical paths whose delays lie within the delay resolution
bin of width ∆τ = 1/W centered around the j-th resolvable
virtual delay, τ̂j = j/W . Henceforth, {βj } are termed as the
virtual channel coefficients in the delay space. The expression
(5) states that the coefficient βj approximately consists of the
sum of gains of all paths whose delays lie within the j-th
Channel measurement results dating as far back as 1987 [1]
and as recent as 2007 [9] suggest that multipath components
tend to be distributed in clusters rather than uniformly over the
channel delay spread. These clusters of paths physically correspond to large-scale objects in the scattering environment (e.g.,
buildings and hills in an outdoor propagation environment),
while multipath components within a cluster arise as a result
of scattering from small-scale structures of the corresponding
large-scale reflector (e.g., windows of a building, trees on a
hill).
3
Based on the interarrival times between different multipath
clusters within the delay spread, wireless channels can be
categorized as either “rich” or “sparse”. In a rich multipath
channel, the interarrival times are smaller than the inverse of
the communication bandwidth W . Sparse multipath channels,
on the other hand, exhibit interarrival times that are larger than
1/W . Therefore, similar to the setting in Fig. 1, not every
delay bin of width ∆τ = 1/W contains a multipath component in this case. In particular, since a channel coefficient
consists of the sum of gains of all paths falling within its
respective delay bin, sparse frequency-selective channels tend
to have far fewer nonzero channel coefficients than p at any
fixed (but large enough) bandwidth. We formalize this notion
of multipath sparsity as follows.
Definition 1 (Sparse Multipath Channels): Let W be the
operating bandwidth and p = ⌈Tm W ⌉ + 1 be the number
of resolvable paths (channel coefficients) within the channel
delay spread. We say that a multipath channel is S-sparse if
‖β‖_ℓ0 = S < p, where ‖·‖_ℓ0 counts the number of nonzero
entries in a vector.
Many real-world channels of practical interest, such as
underwater acoustic channels [10], digital television channels
[11] and residential ultrawideband channels [2], in fact, tend
to be sparse or approximately sparse, with S ≪ p. However,
conventional LS based channel estimation schemes, while appropriate for rich channels, fail to capitalize on the anticipated
sparsity of the abovementioned channels. To get an idea of the
potential MSE gains to be had by incorporating the sparsity
assumption into the channel estimation strategy, we compare
the performance of an LS based channel estimator to that of
a channel estimation strategy that has been equipped with
an oracle. The oracle does not reveal the true β, but does
inform us of the indices of nonzero entries of β. Clearly this
represents an ideal estimation strategy and one cannot expect
to attain its performance level. Nevertheless, it is the target
that one should consider.
We begin with an application of arithmetic-harmonic means
inequality to the MSE expression in (8) and note that
$$E\left[\|\hat{\beta}_{LS} - \beta\|_{\ell_2}^2\right] \ge \frac{p^2 \sigma^2}{\mathrm{trace}(X^H X)} = \frac{p\, \sigma^2}{\|x\|_{\ell_2}^2} \tag{9}$$
with equality if and only if $X^H X = \|x\|_{\ell_2}^2 I_p$.
Now let T⋆ ⊂ {1, . . . , p} be the set of indices of nonzero entries of β and
suppose that an oracle provides us with T⋆. Then an ideal
channel estimator β⋆ can be obtained from y by first forming
a restricted LS estimator
$$\beta_{T_\star} = (X_{T_\star}^H X_{T_\star})^{-1} X_{T_\star}^H y \tag{10}$$
where X T⋆ is a submatrix obtained by extracting the S
columns of X corresponding to the indices in T⋆ , and then
setting β ⋆ to β T⋆ on the indices in T⋆ and zero on the indices
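in $T_\star^c$. Clearly, the MSE of this oracle channel estimator obeys
$$E\left[\|\beta_\star - \beta\|_{\ell_2}^2\right] = \mathrm{trace}\left((X_{T_\star}^H X_{T_\star})^{-1}\right) \sigma^2 \ge \frac{S^2 \sigma^2}{\mathrm{trace}(X_{T_\star}^H X_{T_\star})} = \frac{S\, \sigma^2}{\|x\|_{\ell_2}^2} \tag{11}$$
with equality if and only if $X_{T_\star}^H X_{T_\star} = \|x\|_{\ell_2}^2 I_S$. Comparing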
the MSE lower bounds (9) and (11) shows that LS based conventional channel estimates may be at a significant disadvantage when it comes to identifying sparse channels. And while
the oracle channel estimator β⋆ is impossible to construct
in practice, results from the theories of adaptive denoising
and compressed sensing can be readily adapted to construct
channel estimates using impulse probes and discrete multitone
probes, respectively, which come within a logarithmic factor
of the ideal MSE in (11). Below, we briefly summarize these
results which are essentially direct applications of existing
theory and methods.
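The oracle construction in (10) and the two lower bounds (9) and (11) can be sketched as follows. The example is real-valued for simplicity, and the helper name `oracle_estimate`, the probe, the support, and the noise level are all illustrative assumptions.

```python
import numpy as np

def oracle_estimate(X, y, support):
    """Restricted LS on the known support T*, zero elsewhere, as in (10)."""
    X_T = X[:, support]
    beta_T, *_ = np.linalg.lstsq(X_T, y, rcond=None)
    beta_star = np.zeros(X.shape[1], dtype=beta_T.dtype)
    beta_star[support] = beta_T
    return beta_star

rng = np.random.default_rng(2)          # seed chosen arbitrarily
n, p, S, sigma = 64, 32, 3, 0.5
x = rng.choice([-1.0, 1.0], size=n)
X = np.zeros((n + p - 1, p))
for j in range(p):
    X[j:j + n, j] = x
support = np.sort(rng.choice(p, size=S, replace=False))
beta = np.zeros(p)
beta[support] = 1.0
y = X @ beta + sigma * rng.standard_normal(n + p - 1)   # real noise for simplicity

beta_star = oracle_estimate(X, y, support)
ls_bound = p * sigma**2 / np.sum(x**2)       # right-hand side of (9)
oracle_bound = S * sigma**2 / np.sum(x**2)   # right-hand side of (11)
```

The ratio `ls_bound / oracle_bound` equals p/S, which is the potential gain that the sparse estimators discussed next try to approach.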
A. Impulse Probing
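The channel output corresponding to an impulse probe,
$x_k = \delta_k$, is given by $y = \beta + z$. Trivially, $\hat{\beta}_{LS} = y$ and
$E\left[\|\hat{\beta}_{LS} - \beta\|_{\ell_2}^2\right] = p\, \sigma^2$. On the other hand, let
$\lambda(p, \sigma) = \sqrt{2 \log p} \cdot \sigma$ and define a hard-threshold estimator $\hat{\beta}_\lambda$ by setting
$$\hat{\beta}_{\lambda,i} = y_i\, 1_{\{|y_i| > \lambda\}}, \quad i = 1, \dots, p. \tag{12}$$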
It is well-known that the MSE of this thresholded estimator
obeys the following upper bound [12]:
$$E\left[\|\hat{\beta}_\lambda - \beta\|_{\ell_2}^2\right] \le \mathrm{const} \cdot \log p \cdot S\, \sigma^2 \tag{13}$$
which comes within a factor of log p of the ideal (oracle based)
MSE of S σ² (note that ‖x‖²_ℓ2 = 1 in this case), and shows
an MSE improvement by a factor of (roughly) O(p/S) over
the LS MSE of p σ².
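A minimal sketch of the hard-threshold estimator (12) under impulse probing is given below. The channel length, sparsity, tap values, and noise level are hypothetical, and the function name `hard_threshold` is introduced only for this example.

```python
import numpy as np

def hard_threshold(y, sigma):
    """Keep entries of y with magnitude above lambda = sqrt(2 log p) * sigma, as in (12)."""
    p = len(y)
    lam = np.sqrt(2.0 * np.log(p)) * sigma
    return np.where(np.abs(y) > lam, y, 0.0), lam

rng = np.random.default_rng(3)          # seed chosen arbitrarily
p, S, sigma = 128, 10, 0.1
beta = np.zeros(p, dtype=complex)
beta[rng.choice(p, size=S, replace=False)] = 1.0
z = sigma * (rng.standard_normal(p) + 1j * rng.standard_normal(p)) / np.sqrt(2)
y = beta + z                            # impulse probe: y = beta + z
beta_lam, lam = hard_threshold(y, sigma)
```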
B. Discrete Multitone Probing
While the thresholded estimator $\hat{\beta}_\lambda$ performs significantly
better than the LS estimator, the power constraint on the entries of
an input probe makes impulse probing highly unattractive for
channel estimation purposes (since ‖x‖²_ℓ2 = 1), except in very
high signal-to-noise ratio (SNR) scenarios (σ² ≪ 1). Instead,
an attractive alternative is to employ discrete multitone probes
of the form
$$x_k = \frac{1}{\sqrt{M}} \sum_{m=1}^{M} t_m\, e^{j \omega_m k}, \quad k = 0, \dots, n - 1$$
where the input probe duration n ≥ p, the number of tones
M ≤ p, the frequencies {ω_m} are randomly selected from
$\{2\pi \ell / n : \ell = 0, \dots, n - 1\}$, and the amplitudes {t_m} are i.i.d.
binary random variables taking the values ±1 with probability
1/2 each. Stated in the language of orthogonal frequency
division multiplexing (OFDM) communication systems, this
is equivalent to assigning M OFDM tones (out of a possible
n) as pilot tones.
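The multitone probe can be generated and its energy checked as in the sketch below. It assumes that the M frequencies are drawn without replacement from the grid, so that the tones are distinct; the function name `multitone_probe` and the sizes are illustrative.

```python
import numpy as np

def multitone_probe(n, M, rng):
    """Random multitone probe: M distinct tones on the grid 2*pi*l/n, amplitudes +/-1."""
    tones = rng.choice(n, size=M, replace=False)     # distinct grid indices l
    omega = 2.0 * np.pi * tones / n
    t = rng.choice([-1.0, 1.0], size=M)
    k = np.arange(n)
    x = (t[None, :] * np.exp(1j * np.outer(k, omega))).sum(axis=1) / np.sqrt(M)
    return x, tones

rng = np.random.default_rng(4)          # seed chosen arbitrarily
n, M = 64, 8
x, tones = multitone_probe(n, M, rng)
energy = np.sum(np.abs(x) ** 2)         # equals n for distinct on-grid tones
spectrum = np.fft.fft(x)                # nonzero only at the selected pilot tones
```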
It is straightforward to check that $\|x\|_{\ell_2}^2 = n$ in this case and
hence, from (9), $E\left[\|\hat{\beta}_{LS} - \beta\|_{\ell_2}^2\right] \ge p\, \sigma^2 / n$. Recent results
from compressed sensing, however, show that one can improve
on the LS estimator using sparse estimation methods based
on convex/linear programming. The crucial observation here
is that a multitone probe essentially samples the frequency
response of a multipath channel at M locations. This can
be easily seen by transforming the time domain channel
estimation problem into the frequency domain through an n
point discrete Fourier transform of the channel output y, which
results in
$$Y(\omega_m) = \sqrt{n}\, t_m\, \beta(\omega_m) + Z(\omega_m), \quad m = 1, \dots, M. \tag{14}$$
Now suppose that the number of frequency samples M ≥
const · (log p)^4 · S. Then knowledge of the channel frequency
response at M frequencies is sufficient to construct an estimator
$\hat{\beta}_{CS}$ of the sparse channel by solving either an ℓ1 penalized
LS problem (the “lasso” estimator [13]) or a convex program
called the “Dantzig selector” [14]; see (16). Further, it has
been shown that, with high probability, the resulting estimator
obeys the following upper bound [14], [15]:
$$\|\hat{\beta}_{CS} - \beta\|_{\ell_2}^2 \le \mathrm{const} \cdot \log p \cdot \frac{S\, \sigma^2}{n} \tag{15}$$
which is within a factor of log p of the oracle estimator’s lower
bound of S σ²/n. Note that the key to understanding this result
is the well-known time-frequency duality: signals concentrated
in the time domain are spread out in the frequency domain and
vice versa. Sampling an S-sparse multipath channel at roughly
O(S) locations in the frequency domain, therefore, suffices to
capture its salient information.
IV. COMPRESSED CHANNEL SENSING
A pseudo-random sequence $\{x_k\}_{k=0}^{n-1}$ is another form of
input that is generally used to probe a channel, where n is
the input probe duration and the x_k’s are i.i.d. realizations from a
zero mean, unit variance distribution f (x). For the sake of this
exposition, we limit ourselves to f (x) being the Rademacher
distribution, i.e., xk ’s take values +1 or −1 with probability
1/2 each, but extending the main results of the paper to other
distributions is straightforward. The energy in the input probe
in this case is again given by $\|x\|_{\ell_2}^2 = n$, resulting in the LS
lower bound of $E\left[\|\hat{\beta}_{LS} - \beta\|_{\ell_2}^2\right] \ge p\, \sigma^2 / n$.
We now show that it is possible to obtain a more reliable
estimator of β as a solution to the convex program
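$$\hat{\beta} = \arg\min_{\tilde{\beta} \in \mathbb{C}^p} \|\tilde{\beta}\|_{\ell_1} \quad \text{subject to} \quad \|X^H r\|_{\ell_\infty} \le \lambda \tag{16}$$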
where λ(p, σ) > 0 and r is the (n + p − 1)-dimensional vector
of residuals: r = y − X β̃. This optimization program goes
by the name of Dantzig selector (DS) and is computationally
tractable since it can be recast as a linear program [14]. We
state our main results in terms of the DS primarily because
it provides the cleanest and most interpretable error bounds
that we know. Note, however, that similar bounds also hold
for the lasso estimator [15] which can sometimes be more
computationally attractive because of the availability of a wide
array of efficient software packages for solving it [16]. The
key to proving the efficacy of the DS is in showing that
the Toeplitz matrix X generated by the pseudo-random input
probe x satisfies the so-called “restricted isometry property”
(RIP) with sufficiently small value of 3S-restricted isometry
constant.
Definition 2 (Restricted Isometry Constant): Suppose that
the columns of X are normalized to unit ℓ2 norm. The 3S-restricted
isometry constant of X, denoted by δ_3S, is defined
as the smallest value such that
$$(1 - \delta_{3S}) \|\tilde{\beta}\|_{\ell_2}^2 \le \|X \tilde{\beta}\|_{\ell_2}^2 \le (1 + \delta_{3S}) \|\tilde{\beta}\|_{\ell_2}^2 \tag{17}$$
holds for all 3S-sparse vectors β̃ ∈ C^p. The matrix X is said
to satisfy RIP of order 3S if δ_3S ∈ [0, 1).
Note that if any two columns of X happen to be linearly
dependent then δ3S ≥ 1. Loosely speaking, the RIP essentially
requires that mutual coherence between the columns of X is
sufficiently small so that X H X (approximately) behaves like
a multiple of identity on the set of sparse vectors. The main
result in [14] asserts that the DS solution is highly accurate
in this case.
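For very small problems, the restricted isometry constant of Definition 2 can be computed by brute force, which may help build intuition. The sketch below enumerates every subset of m columns (here m plays the role of 3S) and is exponential in cost, so it is not a practical test; the function name `restricted_isometry_constant` and the example sizes are assumptions.

```python
import itertools
import numpy as np

def restricted_isometry_constant(X, m):
    """Brute-force delta_m of the column-normalized X (feasible only for small p)."""
    Xn = X / np.linalg.norm(X, axis=0)           # unit l2-norm columns
    delta = 0.0
    for T in itertools.combinations(range(X.shape[1]), m):
        cols = Xn[:, list(T)]
        G = cols.conj().T @ cols                 # Gram matrix of the selected columns
        eig = np.linalg.eigvalsh(G)              # eigenvalues in ascending order
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta

rng = np.random.default_rng(7)                   # seed chosen arbitrarily
n, p, m = 16, 8, 3
x = rng.choice([-1.0, 1.0], size=n)
X = np.zeros((n + p - 1, p))
for j in range(p):
    X[j:j + n, j] = x
delta_small = restricted_isometry_constant(X, m)
```

Restricting to subsets of size exactly m suffices because the eigenvalues of a smaller Gram submatrix are bracketed by those of any larger one containing it.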
Theorem 1 ([14]): Suppose that the column-normalized
version of X satisfies RIP of order 3S with δ_3S < 1/2.
Choose $\lambda(p, \sigma) = \sqrt{2(1 + a) \log p} \cdot \sigma$ for any a ≥ 0. Then,
with probability exceeding $1 - (\sqrt{\pi \log p} \cdot p^a)^{-1}$, the DS
estimator $\hat{\beta}$ obeys
$$\|\hat{\beta} - \beta\|_{\ell_2}^2 \le \mathrm{const} \cdot \log p \cdot \frac{S\, \sigma^2}{n}. \tag{18}$$
Theorem 1 states that the DS based channel estimator can
potentially achieve squared error within a factor of log p of
the oracle based MSE lower bound of S σ²/n. However, it
remains to be seen whether the convolution matrix X formed
from vector x satisfies the conditions of this theorem. The
main thesis of this paper is that this is indeed the case.
Theorem 2: Let $\{x_k\}_{k=0}^{n-1}$ be a sequence of i.i.d. random
variables drawn from the Rademacher distribution and X be
the (n + p − 1) × p Toeplitz matrix generated by this sequence
(as described in Section II-B). Suppose that the duration of
the input probe n ≥ c1 · log p · S². Then, with probability
exceeding 1 − exp(−c2 · n), the column-normalized version
of X satisfies RIP of order 3S with δ_3S < 1/2. Here, c1 and
c2 are constants that do not depend on n or p.
Theorem 2 is, in fact, a strengthened version of the results
first reported in [17] for Toeplitz-structured matrices. A detailed proof of the theorem, which leverages a few key ideas
from [17], is given in the Appendix. Theorem 2, combined
with Theorem 1, shows that the DS estimator $\hat{\beta}$ corresponding
to a pseudo-random input probe does remarkably better than
the LS estimator in learning an S-sparse channel: assuming
that the input probe duration n ≥ const · log p · S², the
improvement is roughly by a factor of O(p/S).
The appeal of the DS estimator, however, goes beyond the
estimation of truly sparse channels. Indeed, it is to be expected
that physical channels in certain scattering environments happen to be only approximately sparse [2]. One such scenario
could be, for example, that the magnitudes of the ordered
channel coefficients exhibit a power law decay, i.e., the ℓ-th
largest coefficient obeys $|\beta_{(\ell)}| \le R \cdot \ell^{-1/q}$ for some R > 0
and q ≤ 1. Define S = |{j : |β_j| > σ}|. Then, using a
[Fig. 2 panels: Least Squares Reconstruction; Compressed Channel Sensing Reconstruction]
formed by retaining the columns indexed by the entries of
T . The RIP condition essentially requires that, for all subsets
T satisfying |T | = m, the eigenvalues of the Gram matrix
G(T) = X_T′ X_T lie in the interval [1 − δ, 1 + δ]. For a fixed
subset T , this condition can be established using Gershgorin’s
circle theorem, which states that the eigenvalues of an m × m
matrix G all lie in the union of m discs, where the i-th disc
is centered at the diagonal entry Gi,i and has radius
$$R(i) = \sum_{j=1,\, j \neq i}^{m} |G_{i,j}|. \tag{19}$$
Fig. 2. An illustrative example contrasting the sparse reconstruction abilities
of LS and lasso (n = p = 128, S = 10, SNR = −10 dB).
pseudo-random input probe with n ≥ const · log p · S², the DS
estimator achieves the minimax rate over the class of objects
exhibiting the power law decay [14, Th. 1.3].
V. NUMERICAL RESULTS AND DISCUSSION
Convex programming based channel estimators, such as the
DS (or lasso) estimator, are inherently tuned to yield sparse
solutions. This is in stark contrast to linear channel estimators,
such as the LS estimator, wherein each coefficient of the
channel estimate will be nonzero in general. To this end, an
example contrasting the sparse reconstruction abilities of LS
and lasso is illustrated in Fig. 2. The setup corresponds to
using an n = 128 length pseudo-random input probe to sense
a p = 128 length channel that has only S = 10 nonzero
channel coefficients. The output of the channel is observed
at an SNR of −10 dB (SNR = 10 log10(1/σ²)), and LS
and lasso estimates are obtained by pseudo-inverting X and
executing GPSR [16], respectively. As can be seen from Fig. 2,
the lasso estimate is able to identify every nonzero channel
coefficient (shown as an impulse), as well as reject all but one
of the noise-only coefficients. In contrast, it is easy to see that
even a clairvoyantly thresholded LS estimate would be unable
to identify all the nonzero channel coefficients in this case.
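A rough sketch of an experiment in the spirit of the setup above (n = p = 128, S = 10, SNR = −10 dB) is given below. It is not the code used to produce Fig. 2: the lasso is solved here with plain iterative soft thresholding (ISTA) as a stand-in for GPSR, the signals are real-valued, and the regularization level `lam`, the ±1 tap values, the iteration count, and the random seed are all assumptions.

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = b + step * (X.T @ (y - X @ b))       # gradient step on the quadratic term
        b = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)   # soft threshold
    return b

def lasso_objective(X, y, b, lam):
    return 0.5 * np.sum((y - X @ b) ** 2) + lam * np.sum(np.abs(b))

rng = np.random.default_rng(5)                   # seed chosen arbitrarily
n, p, S = 128, 128, 10
sigma = np.sqrt(10.0)                            # SNR = 10 log10(1/sigma^2) = -10 dB
x = rng.choice([-1.0, 1.0], size=n)
X = np.zeros((n + p - 1, p))
for j in range(p):
    X[j:j + n, j] = x
beta = np.zeros(p)
beta[rng.choice(p, size=S, replace=False)] = rng.choice([-1.0, 1.0], size=S)
y = X @ beta + sigma * rng.standard_normal(n + p - 1)

beta_ls = np.linalg.pinv(X) @ y                  # LS via pseudo-inverse
lam = sigma * np.sqrt(2.0 * n * np.log(p))       # assumed regularization level
beta_lasso = ista_lasso(X, y, lam)
```

Plotting `beta_ls` and `beta_lasso` against `beta` gives a qualitative comparison of the two estimators; the outcome depends on the seed and on the choice of `lam`.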
We conclude this paper by noting that with the advent of
modern-day wireless transceivers capable of communicating at
large spatio-spectral-temporal dimensions, multipath sparsity
is becoming more and more pronounced across a broad
range of communication systems. As such, we expect similar
performance gains by applying sparse estimation strategies to
these systems and propose to extend the results of this and
related works [18], [19] to estimating sparse channels in time,
frequency, and space.
APPENDIX
Proof of Theorem 2: Suppose that the columns of X are
normalized to unit ℓ2 norm. For ease of notation, we index the
generative sequence of X as $\{x_k\}_{k=1}^{n}$, where the x_k’s are now
i.i.d. binary random variables taking the values ±1/√n with
probability 1/2 each. Define δ = δ_3S ∈ (0, 1/2) and m = 3S.
Let T ⊂ {1, 2, . . . , p} be a subset of indices of cardinality
|T |, and let X T be the (n + p − 1) × |T | submatrix of X
m
X
|Gi,j |.
(19)
Notice that by choice of the xk ’s, Gi,i (T ) = 1 deterministically. Thus, to establish that the eigenvalues lie in [1 − δ, 1 + δ]
for a fixed T , it is sufficient to show that the off-diagonal
entries of G(T ) are all less than δ/m in absolute value, since
this would imply R(i) ≤ (m − 1)(δ/m) < δ for all i.
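The Gershgorin argument can be checked numerically on a single realization. The sketch below forms the full Gram matrix of a Toeplitz matrix with ±1/√n entries, takes the largest off-diagonal magnitude, and confirms that the eigenvalues of one Gram submatrix lie within (m − 1) times that magnitude of one. The dimensions and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)                   # seed chosen arbitrarily
n, p, m = 256, 32, 6
x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n) # entries +/- 1/sqrt(n): unit-norm columns
X = np.zeros((n + p - 1, p))
for j in range(p):
    X[j:j + n, j] = x

G = X.T @ X                                      # full p x p Gram matrix
mu = np.max(np.abs(G - np.diag(np.diag(G))))     # largest off-diagonal magnitude

T = np.sort(rng.choice(p, size=m, replace=False))  # one arbitrary subset with |T| = m
eig = np.linalg.eigvalsh(G[np.ix_(T, T)])
radius = (m - 1) * mu                            # upper bound on every Gershgorin radius R(i)
```

If `mu` is below δ/m, then `radius` is below δ and the eigenvalues of every such submatrix fall in [1 − δ, 1 + δ].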
To guarantee the RIP condition for X, however, the eigenvalue bounds must hold for all subsets T that satisfy |T | = m.
To this end, we consider the full p × p Gram matrix of X,
G = X ′ X, and show that the off-diagonal entries of G are
all bounded above by δ/m in absolute value. The implication
is that, since the Gram matrix G(T ) corresponding to any
subset T satisfying |T | = m is itself a submatrix of G, G(T )
has bounded off-diagonals and, therefore, the eigenvalues of
all $\binom{p}{m}$ Gram matrices G(T) lie in [1 − δ, 1 + δ].
To proceed, notice that each off-diagonal term of G is
simply the inner product between i-th and j-th column of X,
and thus Gi,j = Gj,i . We can write an expression for the
off-diagonal element Gi,j , assuming p ≥ j > i ≥ 1, as
$$G_{i,j} = \sum_{k=1}^{n} x_k\, x_{k+(j-i)}, \tag{20}$$
where we define x_ℓ = 0 for ℓ > n. Standard concentration
inequalities are not directly applicable here because not all of the
entries in the sum are mutually independent. For example,
consider i = 1, j = 2, and n = 5. Then G1,2 = x1 x2 +x2 x3 +
x3 x4 + x4 x5 and the first two terms are dependent (through
x2 ), as are the second and third (through x3 ), etc. But notice
that the first and third terms are independent as are the second
and fourth, suggesting that each sum may be split into two
sums of i.i.d. random variables.
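The split in this example can be written out explicitly. The sketch below assigns each term x_k x_{k+lag} to one of two classes so that no two terms in the same class share a variable. It is only an illustration: the helper name `split_terms` is hypothetical, and for lags larger than one this particular assignment gives a proper coloring whose classes need not be equal in size, whereas the argument that follows relies on an equitable coloring.

```python
def split_terms(n, lag):
    """2-color the terms x_k * x_{k+lag}, k = 1..n-lag; same-color terms share no index."""
    classes = ([], [])
    for k in range(1, n - lag + 1):
        color = ((k - 1) // lag) % 2      # terms k and k + lag always get different colors
        classes[color].append((k, k + lag))
    return classes

# The example from the text: i = 1, j = 2 (lag 1) and n = 5.
first, second = split_terms(n=5, lag=1)
```

Here `first` holds the index pairs of x1 x2 and x3 x4, and `second` holds those of x2 x3 and x4 x5, matching the first/third and second/fourth grouping described above.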
This is true in general, and to establish the claim we leverage
a result from the theory of graph coloring. A proper coloring
of a graph is a coloring (or labeling) of vertices so that two
vertices have different colors (labels) if they are connected by
an edge; a proper coloring forms a partition of the vertices
into color classes. An equitable coloring is a proper coloring
where the difference in size between the smallest and largest
color classes is at most one.
We begin by associating a graph to each sum Gi,j , where
each term in the sum corresponds to a vertex in the graph.
Two vertices are connected by an edge if the corresponding
terms in the sum are statistically dependent. Notice that in
this setting, the maximum number of edges originating from
any vertex (the degree of the graph) is one, since each term
is dependent with at most one other term.
The problem of splitting the original sum into sums of
independent terms is equivalent to coloring this graph. To
obtain sums of approximately the same size we consider
equitable coloring, and we leverage a result of Hajnal and
Szemerédi which states that a graph with degree ∆ can be
equitably colored with ∆ + 1 colors [20]. Here, this result
guarantees that two colors are sufficient.
To proceed, we rewrite the expression in (20) to explicitly
show the number of nonzero terms in the sum, and obtain
$$G_{i,j} = \begin{cases} 0, & n < p \text{ and } j - i \ge n \\ \sum_{k=1}^{n-(j-i)} g_k, & \text{otherwise} \end{cases}, \tag{21}$$
where {gk } are (dependent) random variables taking values
±1/n each with probability 1/2. Because we are interested
in obtaining absolute upper bounds on Gi,j , only the nonzero
situation requires further analysis. Partition {gk } into two mutually independent subsets according to the equitable coloring,
and appropriately reindex the terms to obtain
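$$G_{i,j} = \sum_{k_1=1}^{q_1} g'_{k_1} + \sum_{k_2=1}^{q_2} g'_{k_2}, \qquad q_1 = q_2 = \frac{n-(j-i)}{2} \tag{22}$$
when n − (j − i) is even, and
$$G_{i,j} = \sum_{k_1=1}^{q_1} g'_{k_1} + \sum_{k_2=1}^{q_2} g'_{k_2}, \qquad q_1 = \frac{n-(j-i)+1}{2}, \quad q_2 = \frac{n-(j-i)-1}{2} \tag{23}$$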
when n − (j − i) is odd. Generically, we write $G_{i,j} = G^1_{i,j} + G^2_{i,j}$.
We analyze each component sum using Hoeffding’s (two-sided) inequality for bounded random variables to obtain, for
example,
$$\Pr\left(|G^1_{i,j}| > \epsilon\right) \le 2 \exp\left(-\frac{\epsilon^2 n^2}{2 q_1}\right), \tag{24}$$
and choosing ε = δ/2m yields
$$\Pr\left(|G^1_{i,j}| > \delta/2m\right) \le 2 \exp\left(-\frac{\delta^2 n^2}{8 q_1 m^2}\right). \tag{25}$$
Considering both sums, we can write
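$$\begin{aligned}
\Pr\left(|G_{i,j}| > \delta/m\right) &\le \Pr\left(\{|G^1_{i,j}| > \delta/2m\} \text{ or } \{|G^2_{i,j}| > \delta/2m\}\right) \\
&\le 2 \max\left\{\Pr\left(|G^1_{i,j}| > \delta/2m\right),\ \Pr\left(|G^2_{i,j}| > \delta/2m\right)\right\} \\
&\le 2 \max\left\{2 \exp\left(-\frac{\delta^2 n^2}{8 q_1 m^2}\right),\ 2 \exp\left(-\frac{\delta^2 n^2}{8 q_2 m^2}\right)\right\}.
\end{aligned} \tag{26}$$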
Notice that smaller values of q1 and q2 lead to tighter bounds,
and thus the slowest rate of concentration occurs when the
number of nonzero terms in Gi,j is n − 1. In this case, we
have (n − 1)/2 ≤ q1 ≤ q2 ≤ (n + 1)/2, implying that the
worst case sum has at least (n − 1)/2 terms. As a result
$$\Pr\left(|G_{i,j}| > \delta/m\right) \le 4 \exp\left(-\frac{\delta^2 n^2}{4(n-1) m^2}\right) \le 4 \exp\left(-\frac{\delta^2 n}{4 m^2}\right). \tag{27}$$
4m2
To establish RIP we require that each of the p(p − 1)/2
unique off-diagonal terms Gi,j satisfy this bound. Applying
the union bound yields
$$\Pr\left(\text{any } |G_{i,j}| > \delta/m\right) \le 2 p^2 \exp\left(-\frac{\delta^2 n}{4 m^2}\right) \le \exp\left(-\frac{\delta^2 n}{4 m^2} + 3 \log p\right), \tag{28}$$
where the last step follows under the mild assumption that
p ≥ 2. Now, notice that whenever δ²n/4m² > 3 log p, or
$n > 12 m^2 \log p / \delta^2$, RIP is satisfied with probability at least
$$1 - \exp\left(-\frac{\delta^2 n}{4 m^2} + 3 \log p\right). \tag{29}$$
This success probability is nonzero and can be very close to
one when n is large compared to m².
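To get a feel for the numbers, the condition on n and the bound (29) can be evaluated directly. The values of p, S, and δ below are hypothetical, as are the helper names; the constants are those appearing in the proof, which are not claimed to be tight.

```python
import math

def min_probe_length(p, S, delta):
    """Smallest integer n with n > 12 m^2 log(p) / delta^2, where m = 3S."""
    m = 3 * S
    return math.floor(12 * m**2 * math.log(p) / delta**2) + 1

def rip_success_lower_bound(n, p, S, delta):
    """Lower bound (29) on the probability that RIP holds."""
    m = 3 * S
    return 1.0 - math.exp(-delta**2 * n / (4 * m**2) + 3 * math.log(p))

p, S, delta = 128, 2, 0.49               # hypothetical values with delta < 1/2
n_min = min_probe_length(p, S, delta)
bound = rip_success_lower_bound(2 * n_min, p, S, delta)
```

Doubling the probe length beyond `n_min` already pushes the bound close to one, in line with the remark above.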
REFERENCES
[1] A. Saleh and R. Valenzuela, “A statistical model for indoor multipath
propagation,” IEEE J. Select. Areas Commun., pp. 128–137, Feb. 1987.
[2] A. F. Molisch, “Ultrawideband propagation channels-Theory, measurement, and modeling,” IEEE Trans. Veh. Technol., pp. 1528–1545, Sep.
2005.
[3] N. Czink, X. Yin, H. Ozcelik, M. Herdin, E. Bonek, and B. H. Fleury,
“Cluster characteristics in a MIMO indoor propagation environment,”
IEEE Trans. Wireless Commun., pp. 1465–1475, Apr. 2007.
[4] S. F. Cotter and B. D. Rao, “Sparse channel estimation via matching
pursuit with application to equalization,” IEEE Trans. Commun., pp.
374–377, Mar. 2002.
[5] C. Carbonelli, S. Vedantam, and U. Mitra, “Sparse channel estimation
with zero tap detection,” IEEE Trans. Wireless Commun., pp. 1743–
1753, May 2007.
[6] Y. Selen and E. G. Larsson, “RAKE receiver for channels with a sparse
impulse response,” IEEE Trans. Wireless Commun., pp. 3175–3180, Sep.
2007.
[7] P. Bello, “Characterization of randomly time-variant linear channels,”
IEEE Trans. Commun., pp. 360–393, Dec. 1963.
[8] A. M. Sayeed and B. Aazhang, “Joint multipath-Doppler diversity in
mobile wireless communications,” IEEE Trans. Commun., pp. 123–132,
Jan. 1999.
[9] L. Vuokko, V.-M. Kolmonen, J. Salo, and P. Vainikainen, “Measurement of large-scale cluster power characteristics for geometric channel
models,” IEEE Trans. Antennas Propagat., pp. 3361–3365, Nov. 2007.
[10] D. Kilfoyle and A. Baggeroer, “The state of the art in underwater
acoustic telemetry,” IEEE J. Oceanic Eng., pp. 4–27, Jan. 2000.
[11] Receiver Performance Guidelines, ATSC Recommended Practices for
Digital Television, 2004.
[12] D. Donoho and I. M. Johnstone, “Ideal spatial adaptation by wavelet
shrinkage,” Biometrika, pp. 425–455, 1994.
[13] R. Tibshirani, “Regression shrinkage and selection via the lasso,” J. Roy.
Statist. Soc. Ser. B, pp. 267–288, 1996.
[14] E. Candès and T. Tao, “The Dantzig selector: Statistical estimation when
p is much larger than n,” Ann. Statist., pp. 2313–2351, Dec. 2007.
[15] P. Bickel, Y. Ritov, and A. Tsybakov, “Simultaneous analysis of lasso
and Dantzig selector,” pre-print.
[16] M. A. T. Figueiredo, R. Nowak, and S. Wright, “Gradient projection
for sparse reconstruction: Application to compressed sensing and other
inverse problems,” IEEE J. Select. Topics Signal Processing, pp. 586–
597, Dec. 2007.
[17] W. U. Bajwa, J. Haupt, G. Raz, S. Wright, and R. Nowak, “Toeplitz-structured compressed sensing matrices,” in Proc. 14th IEEE/SP Workshop on Statistical Signal Processing (SSP’07), Aug. 2007, pp. 294–298.
[18] G. E. Pfander, H. Rauhut, and J. Tanner, “Identification of matrices
having a sparse representation,” pre-print.
[19] M. Herman and T. Strohmer, “High resolution radar via compressed
sensing,” pre-print.
[20] A. Hajnal and E. Szemerédi, “Proof of a conjecture of P. Erdős,” in
Combinatorial Theory and its Application, 1970, pp. 601–623.