Sparse Sampling: Theory, Algorithms and Applications

Tutorial ICASSP 2009
Sparse Sampling:
Theory, Algorithms and Applications
Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU),
Martin Vetterli (EPFL & UCB)
Tutorial Outline
Introduction
Part 1: Overview (Martin Vetterli)
– Classical sampling revisited
– Generalizations of classical sampling
– Sparse sampling: FRI, CS
– Diverse applications
Part 2: Sampling Signals with Finite Rate of Innovation (Pina Marziliano)
– Definitions
– Periodic stream of Dirac pulses
– Applications: compression of EEG signals, removal of shot noise in band-limited signals
Part 3: Recovery of noisy sparse signals (Thierry Blu)
– Acquisition in the noisy case
– Denoising strategies
– Cramér-Rao bounds
– Applications: optical coherence tomography
Part 4: Variations and extensions (Pier-Luigi Dragotti)
– Alternative sampling kernels
– Sampling FRI signals with polynomial reproducing kernels
– Application: image super-resolution
Summary and Conclusions, Q&A
Tutorial ICASSP 2009
Sparse Sampling:
Theory, Algorithms and Applications
Part 1
Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU),
Martin Vetterli (EPFL & UCB)
Acknowledgements
• Sponsors
– NSF Switzerland: thank you!
• Collaborations
– C. Lee, B. Aryan, D. Julian, S. Rangan (Qualcomm)
– J. Kusuma, MIT/Schlumberger
– I. Maravic, Deutsche Bank
– P. Prandoni, Quividi
• Discussions and Interactions
– V. Goyal, MIT
– A.Fletcher, UCB
– M. Unser, EPFL
Spring-09 - 2
Outline
1. Introduction
Shannon and beyond…
2. Signals with finite rate of innovation
Shift-invariant spaces and Poisson spikes
Sparsity in CT and DT
3. Sampling signals at their rate of innovation
The basic set up: periodic set of Diracs, sampled at Occam’s rate
Variations on the theme: Gaussians, splines and Strang-Fix
4. The noisy case
Total least squares and Cadzow
5. Sparse sampling, compressed sensing, and compressive sampling
Sparsity, fat matrices, and frames
Norms and algorithms
6. Applications and technology
Wideband communications, UWB, superresolution
Technology transfer
7. Discussion and conclusions
Spring-09 - 3
The Question!
You are given a class of objects: a class of functions (e.g. bandlimited, BL)
You have a sampling device, as usual to acquire the real world
• Smoothing kernel or lowpass filter
• Regular, uniform sampling
• That is, the workhorse of sampling!
Obvious question:
When does a minimum number of samples uniquely specify the
function?
Spring-09 - 4
The Question!
About the observation kernel:
Given by nature
• Diffusion equation, Green function
Ex: sensor networks
Given by the set up
• Designed by somebody else, it is out there
Ex: Hubble telescope
Given by design
• Pick the best kernel
Ex: engineered systems, but constraints
About the sampling rate:
Given by problem
• Ex: sensor networks
Given by design
• Usually, as low as possible
Ex: digital UWB
Spring-09 - 5
1. Shannon and Beyond… 1948 Foundation paper
If a function x(t) contains no frequencies higher than W cps, it is completely
determined by giving its ordinates at a series of points spaced 1/(2W)
seconds apart.
[if approx. T long, W wide, 2TW numbers specify the function]
It is a representation theorem:
• {sinc(t − n)}n∈Z is an orthogonal basis for BL[−π, π]
• x(t) ∈ BL[−π, π] can be written as
x(t) = Σn∈Z x(n) sinc(t − n)
Note:
• … slow convergence of the series
• Shannon-BW, BL sufficient, not necessary
Spring-09 - 6
Shannon’s Theorem… a bit of History
Whittaker 1915, 1935
Nyquist 1928
Kotelnikov 1933
Raabe 1938
Gabor 1946
Shannon 1948
So… you have never heard of Herbert Raabe?
– Wrote his PhD on the sampling theorem…
– Wrong time
– Wrong country
– Wrong lab…
Spring-09 - 7
Shannon’s Theorem: Variations on the subspace theme
• Non uniform
– Kadec 1/4 theorem
• Derivative sampling
– Sample signal and derivative…
… at half the rate
• Stochastic
– Power spectrum is good enough
• Shift-invariant subspaces
– Generalize from sinc to other
shift-invariant Riesz bases
(ortho. and biorthogonal)
– Structurally, it is the same thing!
• Multichannel sampling
– Known shift: easy
– Unknown shift: interesting
(superresolution imaging)
Spring-09 - 8
A Sampling Nightmare….
Shannon BL case: bandlimited to [−π/T, π/T],
or 1/T degrees of freedom per unit time
• But: a single discontinuity, and no more sampling theorem…
• Are there other signals with finite number of degrees of freedom per
unit of time that allow exact sampling results?
Spring-09 - 9
Examples of non-bandlimited signals
Spring-09 - 10
1. Shannon and Beyond…
• Is there a sampling theory beyond Shannon?
– Shannon: bandlimitedness is sufficient but not necessary
– Shannon bandwidth: dimension of subspace
– Shift-invariant spaces popular with wavelets etc
• Thus, develop a sampling theory for classes of
non-bandlimited but sparse signals!
[Figure: x(t) with weights x0, …, xN−1 at locations t0, …, tN−1]
Generic, continuous-time sparse signal
Spring-09 - 11
Related work
Spectral estimation, Fourier analysis, sampling
– Prony 1795
– classic high resolution spectral estimation (Stoica/Moses)
– uncertainty principle (Donoho and Stark, 1989)
– spectrum blind sampling (Bresler et al, 1996, 1999)
Error correction coding
– Errors are sparse…Reed-Solomon, Berlekamp-Massey
Randomized algorithms for fast estimation
– Faster Fourier transforms, Gilbert et al 2005
– Sketches of databases, Indyk et al
Compressed sensing, compressive sampling
– Gribonval, Nielsen 2003; Donoho, Elad 2003
– Donoho et al 2006; Candès et al 2006
– etc!
Spring-09 - 12
Outline
1. Introduction
2. Signals with finite rate of innovation
Poisson spikes
Rate of innovation ρ
The set up and the key questions
3. Sampling signals at their rate of innovation
4. The noisy case
5. Sparse sampling, compressed sensing, and compressive
sampling
6. Applications and technology
7. Discussion and conclusions
Spring-09 - 13
2. Sparsity and Signals with Finite Rate of Innovation
Sparsity:
CT: parametric class with a finite number of degrees of freedom
(e.g. K Diracs, or 2K degrees of freedom)
DT: N-dimensional vector x, and its l0 norm K = ||x||0.
Then K/N indicates sparsity
Compressibility:
• Object expressed in an ortho-basis has a fast-decaying non-linear approximation (NLA)
For ex., decay of ordered wavelet coefficients in O(k^−α), α > 1
ρ: rate of innovation, or degrees of freedom per unit of time.
Call CT the number of degrees of freedom in the interval [−T/2, T/2]; then
ρ = lim T→∞ CT / T
Note: relation to the information rate in Shannon-48
Spring-09 - 14
2. Signals with Finite Rate of Innovation
Ex: Poisson process: a set of Dirac pulses with |tk − tk+1| exponentially distributed
Innovations: times {tk}; rate of innovation = average number of Diracs per second.
Call CT the number of Diracs in the interval [−T/2, T/2]; then ρ = lim T→∞ CT / T
Poisson process: average interarrival time is 1/λ, thus ρ = λ
Weighted Diracs: x(t) = Σk xk δ(t − tk); two degrees of freedom per Dirac, thus ρ = 2λ
Obvious question: is there a sampling theorem at ρ samples per second?
Necessity: obvious! Sufficiency?
Spring-09 - 15
2. Signals with Finite Rate of Innovation
The set up:
yn = ⟨x(t), φ(nT − t)⟩
where x(t): signal, φ(t): sampling kernel,
y(t) = (x ∗ φ)(t): filtered signal, and yn = y(nT): samples
For a sparse input, like a weighted sum of Diracs (in DT and CT)
• When is there a one-to-one map yn ⇔ x(t)?
• Efficient algorithm?
• Stable reconstruction?
• Robustness to noise?
• Optimality of recovery?
Spring-09 - 16
Outline
1. Introduction
2. Signals with finite rate of innovation
3. Sampling signals at their rate of innovation
The basic set up: periodic set of Diracs, sampled at their rate of innovation
A representation theorem [VMB:02]
Toeplitz system
Annihilating filter and root finding
Total least squares and SVD
Variations
4. The noisy case
5. Sparse sampling, compressed sensing, and compressive sampling
6. Applications and technology
7. Discussion and conclusions
Spring-09 - 17
An example and overview
K Diracs on the interval: 2K degrees of freedom. Periodic case:
[Figure: τ-periodic x(t) with weights xk at locations tk]
• Key: the Fourier series of x(t) is a weighted sum of K exponentials
• Result: taking 2K+1 samples of a lowpass version of bandwidth
B = (2K+1)/τ allows to perfectly recover x(t)
Spring-09 - 18
A Representation Theorem [VMB:02]
Theorem
Given a periodic stream of K Diracs, of period τ, with weights {xk} and locations {tk},
take a sampling kernel of bandwidth B, with Bτ an odd integer > 2K
(e.g. a Dirichlet kernel). Then the N samples, N ≥ Bτ, taken T = τ/N apart,
are a sufficient characterization of x(t).
Spring-09 - 19
Note: Linear versus Non-Linear Problem
Problem is non-linear in tk, and linear in xk given tk
Consider two such streams of K Diracs, of period τ,
and weights and locations {xk,tk} and {x’k,t’k}, respectively
The sum is in general a stream with 2K Diracs.
But, given a set of locations {tk} then the problem is linear in {xk}.
The key to the solution:
Separability of non-linear from linear problem
Note: Finding locations is a key task in estimation/retrieval of sparse
signals, but also in registration, feature detection etc
Spring-09 - 20
Notes on Proof
The procedure is constructive, and leads to an algorithm:
1. Take 2K+1 samples yn from the Dirichlet kernel output
2. Compute the DFT to obtain the Fourier series coefficients m = −K, …, K
3. Solve a K × K Toeplitz system of equations to get H(z)
4. Find the roots of H(z) by factorization, to get the uk and tk
5. Solve a K × K Vandermonde system of equations to get the xk
The complexity is:
1. Analog-to-digital conversion
2. O(K log K)
3. O(K²)
4. O(K³) (can be accelerated)
5. O(K²)
Or polynomial in K!
Note: for a size-N vector with K Diracs, O(K³) complexity, noiseless
Spring-09 - 21
Generalizations [VMB:02]
For the class of periodic FRI signals which includes
– Sequences of Diracs
– Non-uniform or free knot splines
– Piecewise polynomials
There are sampling schemes with sampling at the rate of
innovation with perfect recovery and polynomial complexity
• Variations: finite length, 2D, local kernels etc
Spring-09 - 22
Generalizations [VMB:02]
Sinc
Gaussian
Splines
Spring-09 - 23
Generalizations [DVB:07]
The return of Strang-Fix!
Local, polynomial-complexity reconstruction, for Diracs and piecewise
polynomials
Pure discrete-time processing!
Spring-09 - 24
Generalizations: 2D extensions [MV:06]
Note: true 2D processing!
Spring-09 - 25
Outline
1. Introduction
2. Signals with finite rate of innovation
3. Sampling signals at their rate of innovation
4. The noisy case
Total least squares
Cadzow or model based denoising
Uncertainty principles on location and amplitude of Diracs
Algorithms
Examples
5. Sparse sampling, compressed sensing, and compressive sampling
6. Applications and technology
7. Discussion and conclusions
Spring-09 - 26
The noisy case…
Acquisition in the noisy case:
where “analog” noise is before acquisition (e.g. communication noise
on a channel) and digital noise is due to acquisition (ADC, etc.)
Example: Ultrawide band (UWB) communication….
Spring-09 - 27
The noisy case: The non-linear estimation problem
Data:
yn = Σk=1..K xk φ(nT − tk) + εn, n = 1, …, N
where T = τ/N, φ(t) is the Dirichlet kernel, and N/(ρτ) is the redundancy (oversampling)
Estimation: for the noisy samples yn,
find the set {xk, tk}k=1..K
such that Σn |yn − Σk xk φ(nT − tk)|² is minimized
This is: a non-linear estimation problem in {tk},
a linear estimation problem in {xk} given {tk}.
Spring-09 - 28
Outline
1. Introduction
2. Signals with finite rate of innovation
3. Sampling signals at their rate of innovation
4. The noisy case
5. Sparse sampling, compressed sensing, and compressive sampling
Discrete, finite size set up: sparse vectors
Fat matrices and frames
Unicity and conditioning
Reconstruction algorithms
Compressed sensing and compressive sampling
6. Applications and technology
7. Discussion and conclusions
Spring-09 - 29
Sparsity revisited… Discrete, finite-dimensional
Consider a discrete-time, finite dimensional set up
Model:
– World is discrete, finite dimensional of size N
– x ∈ RN, with ||x||0 = K ≪ N: K-sparse in a basis Φ
Method:
– Take M measurements, where K < M << N
– Measurement matrix F: a fat matrix of size M x N
[Figure: y (M×1 measurements) = F (M×N measurement matrix) · x (N×1 sparse signal,
K non-zeros), with sparsity basis Φ]
Spring-09 - 30
Geometry of the problem
This is a vastly under-determined system…
– F is a frame matrix, of size M × N, M ≪ N
– Key issue: the C(N,K) submatrices formed by K out of the N columns of F
– Consider the set of submatrices {Fk}, k = 1, …, C(N,K)
– Calculate the projection of y onto the range of Fk
– If y lies in the range of Fk, possible solution
– In general, choose k such that ||y − PFk y|| is minimized
– Note: this is hopeless in general ;)
Necessary conditions (for most inputs, or with probability 1):
– M > K
– all Fk must have rank K
– all ranges of Fk must be different
Spring-09 - 31
Geometry of the problem
Easy solution… Vandermonde matrices!
– Each submatrix Fk is of rank K and spans a different subspace
– Particular case: Fourier matrix or DFT matrix, or a slice of it
Spring-09 - 32
The power of random matrices! Donoho, Candes et al
Set up: x ∈ RN, with ||x||0 = K ≪ N: K-sparse; F of size M × N
Measurement matrix with random entries (Gaussian, Bernoulli)
– Pick M = O(K log N/K)
– With high probability, this matrix is good!
Condition on the matrix F
– Uniform uncertainty principle, or restricted isometry property (RIP):
all K-sparse vectors x satisfy an approximate norm conservation
(1 − δ)||x||² ≤ ||Fx||² ≤ (1 + δ)||x||²
Reconstruction method:
– Solve the linear program min ||x||1
under the constraint y = Fx
Strong result: l1 minimization finds, with high probability, the sparse
solution, i.e. the l0 and l1 problems have the same solution
Spring-09 - 33
The power of random matrices!
Notes:
Works for signals sparse in any basis
– Just replace F by FΘ, where Θ is the sparsity basis
Works for approximately sparse signals
– For ex., fast decay of wavelet coefficients
Works for the noisy case
– LP will work by picking large coefficients (like wavelet denoising)
Price
– Unstructured matrices (e.g. Fx can be expensive)
– Some redundancy (log N factor)
Gain
– Universality of measurement
– General set up
Spring-09 - 34
Relation to compressed sensing: Differences
Sparse Sampling of Signal Innovations
+ Continuous or discrete, infinite or finite dimensional
+ Lower bounds (CRB) provably optimal reconstruction
+ Close to “real” sampling, deterministic
– Not universal, designer matrices
Compressed sensing
+ Universal and more general
± Probabilistic, can be complex
– Discrete, redundant
The real game:
In the space of frame measurement matrices F
• Best matrix? (constrained Grassmannian manifold)
• Tradeoff for M: 2K, K log N, K log² N
• Complexity (measurement, reconstruction)
Spring-09 - 35
Outline
1. Introduction
2. Signals with finite rate of innovation
3. Sampling signals at their rate of innovation
4. The noisy case
5. Sparse sampling, compressed sensing, and compressive sampling
6. Applications and technology
Superresolution
Joint sparsity estimation
Wideband communications, ranging, channel estimation
Acoustic tomography
A technology transfer story
7. Discussion and conclusions
Spring-09 - 36
Applications: The proof of the pudding is in the eating…
1. Look for sparsity!
– In either domain, in any basis
– Different singularities
– Model is key
– Physics might be at play
2. Choose operating point, algorithm,
– Low noise, high noise?
– Model precise or approximate?
– Choice of kernel?
– Algorithmic complexity? Iterative?
3. It is not a black box….
– It is a toolbox
– Handcraft the solution
– No free lunch!
Spring-09 - 37
Applications: Joint Sparsity Estimation
Two non-sparse signals, but related by sparse convolution
– Often encountered in practice
– Distributed sensing
– Room impulse measurement
Question: What is the sampling rate region?
Similar to Slepian-Wolf problem in information theory
Spring-09 - 38
Applications: Joint Sparsity Estimation
Sampling rate region:
– Universal (all signals): no gain!
– Almost sure recovery (undecodable signals have measure zero): gain,
via the annihilating filter method
[Figure: sampling rate regions for universal vs. almost-sure recovery]
[Roy, Hormati, Lu et al, ICASSP 2009]
Spring-09 - 39
Applications: Joint Sparsity Estimation
Ranging, room impulse response, UWB:
– Know signal x1, low rate acquisition of x2
Spring-09 - 40
Conclusions and Outlook
Sampling at the rate of innovation
– Cool!
– Sharp theorems
– Robust algorithms
– Provable optimality over wide SNR ranges
Many actual and potential applications
– Fit the model (needs some work)
– Apply the “right” algorithm
– Catch the essence!
Still a number of good questions open, from the fundamental
to the algorithmic and the applications
A slogan:
Sort the wheat from the chaff… at Occam’s rate!
Spring-09 - 41
Publications
Special issue on Compressive Sampling:
– R. Baraniuk, E. Candès, R. Nowak and M. Vetterli (Eds.),
IEEE Signal Processing Magazine, March 2008.
10 papers overviewing the theory and practice of Sparse Sampling,
Compressed Sensing and Compressive Sampling (CS).
Basic paper:
– M. Vetterli, P. Marziliano and T. Blu, “Sampling Signals with Finite
Rate of Innovation,” IEEE Transactions on Signal Processing, June 2002.
Main paper, with comprehensive review:
– T. Blu, P.L. Dragotti, M. Vetterli, P. Marziliano and L. Coulot,
“Sparse Sampling of Signal Innovations: Theory, Algorithms and
Performance Bounds,” IEEE Signal Processing Magazine, Special issue
on Compressive Sampling, March 2008.
Spring-09 - 42
Publications
For more details:
– P.L. Dragotti, M. Vetterli and T. Blu, “Sampling Moments and
Reconstructing Signals of Finite Rate of Innovation: Shannon Meets
Strang-Fix,” IEEE Transactions on Signal Processing, May 2007.
– P. Marziliano, M. Vetterli and T. Blu, “Sampling and Exact
Reconstruction of Bandlimited Signals with Shot Noise,” IEEE
Transactions on Information Theory, Vol. 52, No. 5, pp. 2230–2233, 2006.
– I. Maravic and M. Vetterli, “Sampling and Reconstruction of Signals
with Finite Rate of Innovation in the Presence of Noise,” IEEE
Transactions on Signal Processing, Aug. 2005.
– I. Maravic and M. Vetterli, “Exact Sampling Results for Some Classes
of Parametric Non-Bandlimited 2-D Signals,” IEEE Transactions on
Signal Processing, Vol. 52, No. 1, pp. 175–189, 2004.
Spring-09 - 43
Thank you for your attention!
http://panorama.epfl.ch
Questions?
Spring-09 - 44
Tutorial ICASSP 2009
Sparse Sampling:
Theory, Algorithms and Applications
Part 2
Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU),
Martin Vetterli (EPFL & UCB)
Outline
• Mathematical background
– Dirac pulses and applications
– Fourier series, Z-transform
– Toeplitz and Vandermonde matrices
– Classical sampling theorem
• Signals with Finite Rate of Innovation
– Definition and examples
• Periodic stream of Dirac pulses
– Sampling and reconstruction scheme: step by step
• Applications
– Modeling and compression of neonatal EEG seizure signals
– Removal of shot noise in band-limited signals
– Demo
ICASSP 2009 - 2
Signal Representations
1822 Jean Baptiste Joseph Fourier: Fourier Transform
1872 Ludwig Boltzmann: Entropy of a single gas particle
1915 E. T. Whittaker: Functions and their expansions in shift orthonormal spaces
1928 Harry Nyquist: Distortionless transmission in band-limited channels
1929 Norbert Wiener: Hermitian polynomials and Fourier analysis
1937 E. U. Condon: Fractional Fourier Kernel
1948 C. E. Shannon: Sampling theorem and theory of communication
1985 The Wavelet Era
2000 The POST Wavelet Era: Sparse Signal Representations/Compressive Sampling
ICASSP 2009 - 3
(Paul) Dirac’s-Delta Function (Unit Impulse Function)
Definition:
∫−∞..∞ δ(t) φ(t) dt = φ(0)

The delta function is the continuous analogue of the discrete Kronecker delta:
δ(t) = ∞ for t = 0, and δ(t) = 0 for t ≠ 0 (informally)

Self-Fourier property of a stream of Diracs:
Σn∈Z δ(t − nT)  —F→  (2π/T) Σk∈Z δ(ω − k 2π/T)
ICASSP 2009 - 4
Example : Multipath Signal Propagation
N 1
h(t )
y(t )  h(t )  x(t )   ne n x (t   n )
i
n 0
Complex
Amplitude
N – multipath
x(t )
t
The mathematical model of the multipath can be presented using a stream of Diracs.
h (t ) 
N 1
 e
n 0
n
in
(t   n )
N
: No. of possible paths
τn
: Time delay of nth channel
λn e
iφ n
ne
n
in
 N 1
t
: Complex amplitude of nth received pulse
ICASSP 2009 - 5
Example : Pulse Position Modulation
Signal of Interest
Unmodulated Pulse Train
Pulse Amplitude Modulated Signal
Pulse Position Modulated Signal
Modulation of a pulse carrier wherein the value of each instantaneous sample of a modulating
wave varies the position in time of a pulse relative to its unmodulated time of occurrence.
ICASSP 2009 - 6
Useful matrices
Defn.: A matrix is Toeplitz if each diagonal has a constant value:

AToeplitz = [ a0    a1     a2     …  aK
              a−1   a0     a1     …  aK−1
              a−2   a−1    a0     …  aK−2
               ⋮                      ⋮
              a−K   a−K+1  a−K+2  …  a0   ]

Defn.: A matrix is Vandermonde if each row is an elementwise power of a fixed vector of nodes:

AVandermonde = [ 1     1     1     …  1
                 a1    a2    a3    …  aK
                 a1²   a2²   a3²   …  aK²
                  ⋮                    ⋮
                 a1^K  a2^K  a3^K  …  aK^K ]
Note:
• A Vandermonde system of equations admits a unique solution when
am ≠ an for all m ≠ n
ICASSP 2009 - 7
Review : Fourier and Z-transform domain
Continuous-time:

Fourier Transform (FT or CTFT), non-periodic:
xc(t) = (1/2π) ∫−∞..∞ Xc(ω) e^{iωt} dω  ⟷  Xc(ω) = ∫−∞..∞ xc(t) e^{−iωt} dt, ω ∈ R

Fourier Series (FS or CTFS), τ-periodic:
xp(t) = Σm∈Z x̂m e^{i2πmt/τ}, t ∈ [t0, t0 + τ]  ⟷  x̂m = (1/τ) ∫t0..t0+τ xp(t) e^{−i2πmt/τ} dt, m ∈ Z

Discrete-time:

Discrete-Time Fourier Transform (DTFT), non-periodic:
x[n] = (1/2π) ∫−π..π X(e^{iω}) e^{iωn} dω, n ∈ Z  ⟷  X(e^{iω}) = Σn∈Z x[n] e^{−iωn}, ω ∈ [−π, π]

Z-transform, non-periodic:
x[n] = (1/2πi) ∮C X(z) z^{n−1} dz, n ∈ Z  ⟷  X(z) = Σn∈Z x[n] z^{−n}, z ∈ C

Discrete Fourier Series (DFS), N-periodic:
xp[n] = (1/N) Σk=0..N−1 ck e^{i2πnk/N}, n = 0, …, N−1  ⟷  ck = Σn=0..N−1 xp[n] e^{−i2πkn/N}, k = 0, …, N−1

Discrete Fourier Transform (DFT), finite length N:
x[n] = (1/N) Σk=0..N−1 X[k] e^{i2πnk/N}, n = 0, …, N−1  ⟷  X[k] = Σn=0..N−1 x[n] e^{−i2πkn/N}, k = 0, …, N−1
ICASSP 2009 - 8
Classical sampling theorem
Thm. 1:
A bandlimited signal x(t) ∈ BL[−ωM, ωM] := {x : X(ω) = 0, |ω| > ωM}
is perfectly recovered from the set of samples x(nT), taken T apart,
if the sampling rate ωs = 2π/T is greater than or equal to twice the
maximum frequency:
ωs ≥ 2ωM  ⇔  T ≤ π/ωM
ICASSP 2009 - 9
Sampling and reconstruction of band-limited signals
Time domain:
yn = x(nT)
x(t) = Σn∈Z yn sinc((t − nT)/T)
[Diagram: x(t) → sampling with Σn δ(t − nT) → sinc reconstruction filter → x(t)]

Frequency domain:
Ys(ω) = (1/T) Σk∈Z X(ω − kωs)
X(ω) = T Ys(ω) for |ω| ≤ ωs/2 (ideal rect filter)
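The sinc-series reconstruction can be checked numerically. A truncated-series sketch (the test signal and constants are illustrative; the truncation reflects the "slow convergence of the series" noted earlier):

```python
import numpy as np

T = 0.1
n = np.arange(-400, 401)                      # truncated index range
x = lambda t: np.sinc((t - 0.33) / T)         # bandlimited to |w| <= pi/T
y = x(n * T)                                  # uniform samples x(nT)
x_rec = lambda t: np.sum(y * np.sinc((t - n * T) / T))   # truncated sinc series
err = abs(x_rec(0.123) - x(0.123))            # small, limited by truncation
```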
ICASSP 2009 - 10
What about non band-limited signals?
In practice, we force a band-limit!
yn = y(nT) = ⟨x(t), sinc((t − nT)/T)⟩
[Diagram: x(t) → sinc anti-aliasing filter → y(t) → sampling → yn → sinc reconstruction]
Reconstruction not perfect!
ICASSP 2009 - 11
Signals with Finite Rate of Innovation
Defn.: Rate of innovation
ρ = number of degrees of freedom per unit of time
A signal with finite rate of innovation has a parametric representation, with ρ < ∞.

Example: Consider x(t) band-limited to [−B/2, B/2]; then
x(t) = Σn∈Z xn sinc(Bt − n)
is exactly defined by the samples
xn = ⟨x(t), B sinc(Bt − n)⟩ = x(nT), T = 1/B
⇒ ρ = B = 1/T degrees of freedom per second
ICASSP 2009 - 12
Examples of signals with FRI
Stream of Dirac pulses: x(t)
Ex: 4 Diracs ⇒ 4 weights + 4 locations = 8 degrees of freedom
Piecewise constant: x(1)(t) = (d/dt) x(t) is a stream of Diracs
Piecewise linear: x(2)(t) = (d²/dt²) x(t) is a stream of Diracs
[Figure: a piecewise-linear signal and its first and second derivatives]
ICASSP 2009 - 13
More examples of FRI signals
All sparse in some sense!
– Bilevel signals: PPM, CDMA
– Streams of Dirac pulses: Poisson processes, action potentials
– Piecewise constant: PAM
– Piecewise polynomials: 1D, 2D
– Bandlimited + piecewise linear: ECG signals
– Filtered streams of Diracs: ultra-wide band, neural spikes
[Figure: example waveforms for each class]
ICASSP 2009 - 14
Sampling theorem for periodic stream of Diracs
Thm. 2: Consider a τ-periodic stream of K Diracs with amplitudes xk,
located at distinct times tk ∈ [0, τ):
x(t) = Σk=1..K xk Σk′∈Z δ(t − tk − k′τ),
with rate of innovation ρ = 2K/τ.
Take a sampling kernel of bandwidth B ≥ ρ, i.e. the Dirichlet kernel
φ(t) = Σk′∈Z sinc(B(t − k′τ)) = sin(πBt) / (Bτ sin(πt/τ)),
such that Bτ = 2M + 1 > 2K.
Then the N samples, N ≥ Bτ,
yn = ⟨x(t), φ(nT − t)⟩ = Σk=1..K xk φ(nT − tk), n = 1, 2, …, N,
are a sufficient characterization of x(t).
ICASSP 2009 - 15
Sampling and reconstruction of a periodic stream of Diracs
[Figure: x(t) → Dirichlet kernel φ(t) → y(t) → uniform sampling → yn = ⟨x(t), φ(nT − t)⟩]

Step 1: Find at least 2K spectral values of x(t)
Step 2: Find the K annihilating filter coefficients
Step 3: Find the K Dirac locations tk
Step 4: Find the K Dirac weights xk
ICASSP 2009 - 16
τ-Periodic Stream of K Weighted Dirac Pulses
A periodic stream of K weighted Dirac pulses is defined by
x(t) = Σk=1..K xk Σk′∈Z δ(t − tk − k′τ) = Σm∈Z x̂m e^{i2πmt/τ}
where
x̂m = (1/τ) ∫0..τ x(t) e^{−i2πmt/τ} dt = (1/τ) Σk=1..K xk e^{−i2πmtk/τ}, m ∈ Z
are the Fourier series (spectral) values of x(t).
[Figure: one period of x(t) with weights x1, …, xK at locations t1, …, tK]
The rate of innovation is given by K locations + K weights for each period:
ρ = 2K/τ
ICASSP 2009 - 17
Periodic sinc sampling kernel
Consider the Dirichlet kernel, or periodic sinc, as sampling kernel:
φ(t) = Σk′∈Z sinc(B(t − k′τ)) = sin(πBt) / (Bτ sin(πt/τ))
Its Fourier coefficients are
φ̂m = 1 if −M ≤ m ≤ M, and 0 otherwise,
where M = ⌊Bτ/2⌋ and B ≥ ρ = 2K/τ, i.e. take Bτ = 2M + 1.
[Figure: φ(t) in time, and its Fourier coefficients φ̂m, flat over −M ≤ m ≤ M]
ICASSP 2009 - 18
Find the N sample values
The N sample values are obtained by uniformly sampling the filtered signal
y(t), with NT = τ, or by evaluating
yn = y(nT) = x(t) ∗ φ(t) |t=nT = ⟨x(t), φ(nT − t)⟩
   = ∫0..τ x(t) sinc(B(nT − t)) dt, n = 1, 2, …, N
ICASSP 2009 - 19
Step 1: Find at least 2K spectral values of x(t)
Since N ≥ Bτ = 2M + 1, substituting x(t) = Σm∈Z x̂m e^{i2πmt/τ} into
yn = ⟨x(t), φ(nT − t)⟩ gives
yn = Σm=−M..M x̂m e^{i2πmnT/τ}, n = 1, 2, …, N
⇒ a system of N equations in the 2M + 1 unknown contiguous spectral values
x̂m, m = −M, …, M.
If N = τ/T, then
ŷm = DFT(yn) = τ x̂m for −M ≤ m ≤ M, and 0 for other m ∈ [−N/2, N/2]
[Figure: M = 3, N = 2M + 1 = 7; the DFT of the samples returns the 2M + 1 spectral values]
ICASSP 2009 - 20
Step 2: Find K annihilating filter coefficients
But what is an annihilating filter?
Defn.: A filter hm is called an annihilating filter for a sequence x̂m when
(h ∗ x̂)m = Σk∈Z hk x̂m−k = 0 for all m ∈ Z
Example: Verify that the filter hm with Z-transform H(z) = 1 − u1 z^{−1}
annihilates the exponential sequence x̂m = u1^m:
hm ∗ x̂m = h0 x̂m + h1 x̂m−1 = 1·u1^m + (−u1)·u1^{m−1} = 0
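The example can also be checked numerically; the value of u1 below is arbitrary:

```python
import numpy as np

u1 = np.exp(-2j * np.pi * 0.3)       # u1 = exp(-i 2 pi t1 / tau), with t1/tau = 0.3
m = np.arange(10)
x_hat = u1 ** m                      # exponential sequence x_hat[m] = u1**m
h = np.array([1.0, -u1])             # filter with H(z) = 1 - u1 z^{-1}
conv = np.convolve(h, x_hat)         # (h * x_hat)[m]
interior = conv[1:-1]                # full-overlap outputs: exactly zero
```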
ICASSP 2009 - 21
Step 2: Find K annihilating filter coefficients
In general, a filter with coefficients {hm}m=0,1,…,K and Z-transform
H(z) = Σm=0..K hm z^{−m} = Πk=1..K (1 − uk z^{−1})
annihilates the linear combination of exponentials
x̂m = (1/τ) Σk=1..K xk uk^m, m ∈ Z, where uk = e^{−i2πtk/τ}.
Verification:
hm ∗ x̂m = Σl=0..K hl x̂m−l = (1/τ) Σk=1..K xk uk^m Σl=0..K hl uk^{−l} = 0
since H(uk) = 0.
ICASSP 2009 - 22
How do we find the annihilating filter coefficients?
Using the 2K spectral values obtained in Step 1, solve the Toeplitz system of equations
(h ∗ x̂)m = Σk=0..K hk x̂m−k = 0  ⇔  Σk=1..K hk x̂m−k = −h0 x̂m for m = 1, …, K
Example: Take K = 3, let h0 = 1 and m = 1, …, 3; solve
[ x̂0  x̂−1  x̂−2 ] [ h1 ]     [ x̂1 ]
[ x̂1  x̂0   x̂−1 ] [ h2 ] = − [ x̂2 ]
[ x̂2  x̂1   x̂0  ] [ h3 ]     [ x̂3 ]
We obtain the K annihilating filter coefficients {h1, h2, …, hK}.
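This Toeplitz solve is a one-liner with scipy; a sketch with illustrative locations and weights (not from the tutorial):

```python
import numpy as np
from scipy.linalg import toeplitz

tau, K = 1.0, 3
t_k = np.array([0.1, 0.35, 0.8])          # illustrative Dirac locations
a_k = np.array([1.0, -0.5, 2.0])          # illustrative weights
u = np.exp(-2j * np.pi * t_k / tau)
xh = lambda m: np.sum(a_k * u ** m) / tau            # spectral values xh[m]
# rows m = 1..K of  sum_{k=1..K} h_k xh[m-k] = -xh[m]   (with h0 = 1)
A = toeplitz([xh(m) for m in range(K)],              # first column: xh[0], xh[1], xh[2]
             [xh(-m) for m in range(K)])             # first row:    xh[0], xh[-1], xh[-2]
h = np.linalg.solve(A, [-xh(m) for m in range(1, K + 1)])
H = np.concatenate(([1.0], h))            # H(z) coefficients; its roots are the u_k
```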
ICASSP 2009 - 23
Step 3 : Find K locations of the Diracs
The roots of the Z-transform of the annihilating filter
H(z) = Σm=0..K hm z^{−m} = Πk=1..K (1 − uk z^{−1}) = 0
⇔ z = uk = e^{−i2πtk/τ}, k = 1, …, K
uniquely define the K locations tk of the Diracs.
Note:
• Nonlinear problem
• Annihilating filter ~ error locator polynomial, spectral estimation (Prony’s method)
ICASSP 2009 - 24
Step 4: Find K weights of the Diracs
From the K spectral values {x̂m}m=1,…,K and the K distinct locations {tk}k=1,…,K,
solve the Vandermonde system of K equations in the K unknowns {xk}k=1,…,K:
x̂m = (1/τ) Σk=1..K xk uk^m, m = 1, …, K, where uk = e^{−i2πtk/τ}
Example: K = 3
[ x̂1 ]         [ u1   u2   u3  ] [ x1 ]
[ x̂2 ] = (1/τ) [ u1²  u2²  u3² ] [ x2 ]
[ x̂3 ]         [ u1³  u2³  u3³ ] [ x3 ]
Note:
• Linear problem
• The Vandermonde system admits a unique solution since the locations tk are distinct.
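The final linear solve, continuing the same illustrative numbers used for the Toeplitz step (locations, weights, and τ are assumptions for the demo):

```python
import numpy as np

tau, K = 1.0, 3
t_k = np.array([0.1, 0.35, 0.8])               # locations found in Step 3 (illustrative)
x_true = np.array([1.0, -0.5, 2.0])            # ground-truth weights for the check
u = np.exp(-2j * np.pi * t_k / tau)
xh = np.array([np.sum(x_true * u ** m) for m in range(1, K + 1)]) / tau
V = np.vstack([u ** m for m in range(1, K + 1)]) / tau   # K x K Vandermonde system
x_rec = np.linalg.solve(V, xh).real            # unique since the t_k are distinct
```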
ICASSP 2009 - 25
Notes on FRI sampling and reconstruction scheme
The complexity is:
• N samples: analog-to-digital converter
• 2K spectral values from the DFT of the samples: O(K log K)
• K annihilating filter coefficients from the Toeplitz system: O(K²)
• K locations from the roots of the Z-transform of the annihilating filter
(nonlinear problem): O(K³)
• K weights from the Vandermonde system (linear problem): O(K²)
Polynomial in K!
The key to the solution:
Separability of the non-linear from the linear problem
Note: Finding the locations is a key task in estimation/retrieval of sparse signals,
but also in registration, feature detection, etc.
ICASSP 2009 - 26
FRI Applications
• Modeling and compression of neonatal EEG
seizure signals
• Removing shot noise from bandlimited signals
ICASSP 2009 - 27
Neonatal EEG seizure as FRI signal
[Figure, three panels vs. time samples, amplitude in microvolts:
(1) a section of the neonatal EEG seizure signal;
(2) the extracted stream of Dirac pulses;
(3) a non-uniform linear spline approximation of the seizure signal]
ICASSP 2009 - 28
Sampling and reconstruction non-uniform linear spline
R = degree of the spline = 1
[Diagram: x(t) → φ(t) → y(t) → uniform sampling → yn = ⟨x(t), φ(nT − t)⟩]
Step 1: Find at least 2K spectral values {x̂m}, m = −K, …, K
Step 1a: Differentiate (R + 1) times, in the Fourier domain:
x̂m^(R+1) = (i2πm/τ)^(R+1) x̂m, m = −K, …, K
Step 2: Find the annihilating filter
Step 3: Find the K locations
Step 4: Find the K weights
Step 4a: Integrate (R + 1) times:
x(t) = ∫∫…∫ x^(R+1)(t) dt^(R+1)
ICASSP 2009 - 29
Close-up of reconstruction neonatal EEG seizure
[Figure: close-up of the reconstructed neonatal EEG seizure signal
(FRI method vs. sinc approximation vs. actual seizure signal),
amplitude in microvolts vs. time]
Compression ratio: 83%
Error, FRI method: 17%
Error, sinc method: 37%
ICASSP 2009 - 30
Band-limited signal with additive shot-noise
x(t) = xBL(t) + xD(t)
xBL(t) ∈ BL[−L, L] (τ-periodic, harmonics |m| ≤ L)
xD(t): stream of K weighted Diracs
[Figure: band-limited signal plus shot noise]
ICASSP 2009 - 31
Band-limited signal with additive shot-noise
Spectrum of BL + shot noise:
[Figure: x̂BL[m] occupies the band −L ≤ m ≤ L; x̂D[m] spreads over all frequencies]
Degrees of freedom for x(t): (2L + 1) for xBL(t), plus 2K for xD(t)
1. From the uniform set of samples, find the Fourier series {x̂[m]}, m ∈ [−(L + 2K), L + 2K]
2. Using the annihilating filter method, determine the stream of Diracs from the 2K Fourier
series coefficients outside the band [−L, L], i.e. {x̂D[m] = x̂[m]}, m ∈ [L + 1, L + 2K]
3. Determine the band-limited signal by taking the inverse Fourier series of
{x̂BL[m] = x̂[m] − x̂D[m]}, m ∈ [−L, L]
ICASSP 2009 - 32
Demo: FRI GUI
ICASSP 2009 - 33
Summary Part 2
• Streams of Dirac pulses are a very important model
• Signals with FRI allow uniform sampling after
appropriate smoothing and perfect reconstruction from
the samples
• Methods rely on separating the innovation in a signal
from the rest and identifying the innovative part solely
from the samples
• The annihilating filter method is a key component in
isolating the innovative part of the signals
• Applications
ICASSP 2009 - 34
Greetings from Sunny Singapore!
Tutorial ICASSP 2009
Sparse Sampling:
Theory, Algorithms and Applications
Part 3
Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU),
Martin Vetterli (EPFL & UCB)
Acquisition in the noisy case
Denoising strategies
Cramér-Rao bounds
Applications
Recovery of noisy sparse signals
Sparse Sampling: Theory, Algorithms and Applications
Thierry Blu
The Chinese University of Hong Kong
April 2009
Thierry Blu
Recovery of noisy sparse signals
Outline
1. Acquisition in the noisy case
2. Denoising strategies
3. Cramér-Rao bounds
4. Applications
FRI with noise

Schematic acquisition of a τ-periodic FRI signal with noise:

Σ_{k=1}^{K} x_k δ(t − t_k) → [+ analog noise] → sampling kernel ϕ(t) → y(t) → [sampling at interval T, + digital noise] → y_n

Modelization

y_n = Σ_{k=1}^{K} x_k ϕ(nT − t_k) + ε_n

NOTE: N measurements y_n for n = 1, 2, ..., N.
FRI with noise

Estimation problem
Find estimates ȳ_n, x̄_k and t̄_k of y_n, x_k and t_k such that
ȳ_n = Σ_{k=1}^{K} x̄_k ϕ(nT − t̄_k)
and ‖ȳ − y‖_{ℓ2} is as small as possible.

Reminder
• τ = NT is the periodicity of the filtered signal
• ϕ(t) = sin(π(2M+1)t/τ) / ((2M+1)τ sin(πt/τ)) is the Dirichlet kernel
• M ≥ K is some integer characterizing the bandwidth of ϕ(t)
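For the experiments that follow, it is convenient to generate noisy FRI samples with this kernel. A minimal sketch (illustrative values; the Dirichlet kernel is evaluated through its equivalent exponential sum to avoid the 0/0 at multiples of τ):

```python
import numpy as np

def dirichlet(t, tau=1.0, M=10):
    """Dirichlet kernel of the slide: sin(pi(2M+1)t/tau) / ((2M+1) tau sin(pi t/tau)).
    Evaluated as (1/((2M+1)tau)) * sum_{|m|<=M} e^{2j pi m t/tau}, which is the
    same function but well defined at multiples of tau."""
    m = np.arange(-M, M + 1)
    e = np.exp(2j * np.pi * np.outer(np.atleast_1d(t), m) / tau)
    return np.real(e.sum(axis=1)) / ((2 * M + 1) * tau)

# Illustrative tau-periodic FRI signal: K = 2 Diracs, N noisy samples.
tau, N, M = 1.0, 21, 10
T = tau / N
tk, xk = np.array([0.2, 0.7]), np.array([1.0, 0.5])
rng = np.random.default_rng(1)
n = np.arange(1, N + 1)
y_clean = sum(x * dirichlet(n * T - t, tau, M) for x, t in zip(xk, tk))
y = y_clean + 0.01 * rng.standard_normal(N)   # digital noise eps_n
```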
Annihilation

Without noise, this problem can be brought back to
AH = 0
where
• the roots of H(z) (built from H) provide the t_k
• A is built using the DFT coefficients of y_n

In the presence of noise, the matrix A is only approximately singular.
Total least-squares

Replace the annihilation equation AH = 0 by
min_H ‖AH‖₂   under the constraint ‖H‖₂ = 1

Solution
Perform a Singular Value Decomposition
A = USVᵀ
and choose the last column of V for H.
• U is unitary, of the same size as A
• S is diagonal (with decreasing coefficients) and of size (K + 1) × (K + 1)
• V is unitary and of size (K + 1) × (K + 1)
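A minimal numerical sketch of this SVD step (noiseless toy data with hypothetical values; in the noisy case one would use more rows of A):

```python
import numpy as np

# Hypothetical DFT coefficients of K = 2 Diracs: s_m = sum_k xk uk^m,
# with uk = e^{-2j pi tk / tau} as on the previous slides.
tau, K = 1.0, 2
tk, xk = np.array([0.2, 0.7]), np.array([1.0, 0.5])
uk = np.exp(-2j * np.pi * tk / tau)
m = np.arange(0, 2 * K + 1)
s = (uk[None, :] ** m[:, None]) @ xk

# Toeplitz annihilation matrix with K + 1 columns: row i is [s_{K+i}, ..., s_i].
A = np.array([[s[K + i - j] for j in range(K + 1)] for i in range(K)])

# Total least squares: H = last column of V in the SVD A = U S V^H.
U, S, Vh = np.linalg.svd(A)
H = Vh[-1].conj()
tk_est = np.sort(np.mod(-np.angle(np.roots(H)) * tau / (2 * np.pi), tau))
```

The roots of the recovered annihilating filter lie on the unit circle, and their phases give the locations t_k.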
Total least-squares

The estimates of the innovations are then obtained as follows:
• t_k: by finding the roots of the polynomial H(z)
• x_k: by least-squares minimization of ‖Φx − y‖, where

Φ = [ ϕ(T − t₁)    ϕ(T − t₂)    ···   ϕ(T − t_K)
      ϕ(2T − t₁)   ϕ(2T − t₂)   ···   ϕ(2T − t_K)
      ...
      ϕ(NT − t₁)   ϕ(NT − t₂)   ···   ϕ(NT − t_K) ],

x = [x₁, x₂, ..., x_K]ᵀ and y = [y₁, y₂, ..., y_N]ᵀ.

NOTE:
• Related to the Pisarenko method
• Not robust with respect to noise ⇒ need for extra denoising
Cadzow iterated denoising

Without noise, the annihilation property AH = 0 still holds if length(H) = L + 1 is larger than K + 1. We have the properties:
• A is still of rank K
• A is a band-diagonal (Toeplitz) matrix
• conversely, if A is Toeplitz and has rank K, then the y_n are the samples of an FRI signal

Rank-K "projection" algorithm
1. Perform the SVD of A: A = USVᵀ
2. Set to zero the L − K + 1 smallest diagonal elements of S ⇒ S′
3. Build A′ = US′Vᵀ
4. Find the band-diagonal matrix that is closest to A′ and go to step 1
Cadzow iterated denoising

Essential details
• Iterations of the projection algorithm are performed until the matrix A is of "effective" rank K
• L is chosen maximal, i.e., L = M

Schematic view of the whole retrieval algorithm:

y_n → FFT → Cadzow denoising → ŷ_n → still too noisy? (yes: iterate Cadzow) → no: annihilating filter method → t_k → linear system → x_k
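The rank-K "projection" loop above can be sketched as follows (an illustration with hypothetical parameters, not the authors' implementation):

```python
import numpy as np

def cadzow(s, K, L, n_iter=50):
    """Cadzow iterated denoising sketch: alternate between the rank-K truncation
    of the Toeplitz matrix built from the coefficient sequence s, and re-imposing
    the Toeplitz (constant-diagonal) structure by diagonal averaging."""
    s = np.asarray(s, dtype=complex).copy()
    rows = len(s) - L
    for _ in range(n_iter):
        # Toeplitz matrix A[i, j] = s[L + i - j], of size rows x (L + 1)
        A = np.array([[s[L + i - j] for j in range(L + 1)] for i in range(rows)])
        U, S, Vh = np.linalg.svd(A, full_matrices=False)
        S[K:] = 0.0                      # keep only the K largest singular values
        A = (U * S) @ Vh
        # closest Toeplitz matrix: average each diagonal back into s
        for k in range(len(s)):
            vals = [A[i, j] for i in range(rows) for j in range(L + 1) if L + i - j == k]
            s[k] = np.mean(vals)
    return s

# Illustrative test sequence: K = 2 complex exponentials plus noise.
rng = np.random.default_rng(2)
K, L, n_coef = 2, 4, 15
m = np.arange(n_coef)
s_clean = 1.0 * np.exp(-2j * np.pi * 0.2 * m) + 0.5 * np.exp(-2j * np.pi * 0.7 * m)
s_noisy = s_clean + 0.05 * (rng.standard_normal(n_coef) + 1j * rng.standard_normal(n_coef))
s_den = cadzow(s_noisy, K, L)
```

After a few iterations the denoised sequence is substantially closer to the clean one, and the associated Toeplitz matrix has effective rank K.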
Examples

[Figure: original and estimated signal innovations at SNR = 5 dB (left); noiseless and noisy samples (right)]

Retrieval of an FRI signal with 7 Diracs (left) from 71 noisy (SNR = 5 dB) samples (right).
Examples

[Figure: original and estimated signal innovations at SNR = 20 dB (left); 1001 noisy samples, signal and noise (right)]

Retrieval of an FRI signal with 100 Diracs (left) from 1001 noisy (SNR = 20 dB) samples (right).
General uncertainty formula

Consider the noisy FRI retrieval problem
y_n = Σ_{k=1}^{K} x_k ϕ(nT − t_k) + ε_n,   for n = 1, 2, ..., N,
where ε_n is a zero-mean Gaussian noise with correlation matrix R.

This interpolation/approximation problem is actually a parametric estimation problem of the vector X = [x₁, ..., x_K, t₁, ..., t_K]ᵀ, based on the vector of noisy measurements Y = [y₁, y₂, ..., y_N]ᵀ.

Any estimate X̂ = Θ(Y) of X will be random, and it is possible to evaluate its minimal "uncertainty".
General uncertainty formula

Cramér-Rao lower bound
The covariance of any unbiased estimate of X based on the noisy measurements y_n is lower-bounded using Fisher's matrix:
cov{Θ(Y)} ≥ (Φᵀ R⁻¹ Φ)⁻¹
where Φ is the N × 2K matrix

Φ = [ ϕ(T − t₁)    ···  ϕ(T − t_K)    −x₁ϕ′(T − t₁)    ···  −x_Kϕ′(T − t_K)
      ϕ(2T − t₁)   ···  ϕ(2T − t_K)   −x₁ϕ′(2T − t₁)   ···  −x_Kϕ′(2T − t_K)
      ...
      ϕ(NT − t₁)   ···  ϕ(NT − t_K)   −x₁ϕ′(NT − t₁)   ···  −x_Kϕ′(NT − t_K) ].

No periodicity requirement, arbitrary differentiable kernel ϕ(t).
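The bound is straightforward to evaluate numerically. The sketch below uses a Gaussian kernel and white noise R = σ²I as illustrative choices (all parameter values are hypothetical):

```python
import numpy as np

# Illustrative kernel: Gaussian of width w (any differentiable kernel works).
w = 0.05
phi  = lambda t: np.exp(-t**2 / (2 * w**2))
dphi = lambda t: -t / w**2 * np.exp(-t**2 / (2 * w**2))

# One period, N samples, K = 2 Diracs (hypothetical values).
N, T = 64, 1.0 / 64
tk, xk = np.array([0.3, 0.7]), np.array([1.0, 0.5])
n = np.arange(1, N + 1)

# Phi is N x 2K, columns [phi(nT - tk), -xk phi'(nT - tk)], matching X = (x, t).
Phi = np.hstack([phi(n[:, None] * T - tk[None, :]),
                 -xk[None, :] * dphi(n[:, None] * T - tk[None, :])])
sigma = 0.01
R_inv = np.eye(N) / sigma**2
crb = np.linalg.inv(Phi.T @ R_inv @ Phi)   # lower bound on cov of (x1..xK, t1..tK)
std_x = np.sqrt(np.diag(crb)[:2])          # minimal amplitude uncertainties
std_t = np.sqrt(np.diag(crb)[2:])          # minimal location uncertainties
```

For white noise the bound reduces to σ²(ΦᵀΦ)⁻¹, so all uncertainties scale linearly with the noise level σ.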
The one-Dirac case

Under periodicity assumptions, the lower bounding matrix is diagonal.

Minimal location and amplitude uncertainties

Δt₁/τ ≥ (Bτ / (2π|x₁|√N)) · ( Σ_{|m|≤⌊Bτ/2⌋} m²/r̂_m )^{−1/2}

Δx₁ ≥ (Bτ/√N) · ( Σ_{|m|≤⌊Bτ/2⌋} 1/r̂_m )^{−1/2}

Here, r_m = E{ε_n ε_{n−m}} and r̂_m is its DFT.

NOTE: If the Diracs are sufficiently well separated (typ. 2/N), the one-Dirac case formula is a good approximation of the multi-Dirac case.
The one-Dirac case

Digital white noise

Δt₁/τ ≥ (1/π) · √( 3Bτ / (N(B²τ² − 1)) ) · PSNR^{−1/2}   and   Δx₁/|x₁| ≥ √(Bτ/N) · PSNR^{−1/2}

Analog (filtered) white noise

Δt₁/τ ≥ (1/π) · √( 3 / (B²τ² − 1) ) · PSNR^{−1/2}   and   Δx₁/|x₁| ≥ PSNR^{−1/2}

In both cases, increasing the bandwidth of the antialiasing filter improves the temporal resolution.

NOTE: uncertainty relation N · PSNR^{1/2} · Δt₁/τ ≥ √3/π
Simulations: Quasi-optimality

[Figure: 2 Diracs / 21 noisy samples — retrieved locations and Cramér-Rao bounds vs. input SNR (dB); observed standard deviation vs. Cramér-Rao bound for the first and second Dirac]

Retrieval of the locations of an FRI signal. Left: scatterplot of the locations; right: standard deviation (averaged over 10000 realizations) compared to the Cramér-Rao lower bounds.

⇒ Quasi-optimality of the algorithm.
Simulations: Robustness

[Figure: original and estimated signal innovations at SNR = −5 dB ('observed' SNR: −5.0847); 290 noiseless and noisy samples; average distance between original and retrieved positions: 0.15467]

290 samples of an FRI signal in −5 dB noise. Left: retrieved locations and amplitudes; right: noiseless and noisy signal.

At high noise levels, the algorithm is still able to accurately find a substantial proportion of the Diracs.
Optical Coherence Tomography: Principle

Detection of coherent backscattered waves from an object by making them interfere with a low-coherence reference wave. The measurement is performed with a standard Michelson interferometer.
• Axial (depth) resolution: ∝ coherence length of the reference wave;
• Transversal resolution: width of the optical beam.

NOTE: Very high sensitivity (low SNR), noninvasive, low-depth penetration ⇒ biomedical applications (ophthalmology, dentistry, skin).

Axial resolution¹: 10–20 µm. Better resolution → better diagnoses.

OCT is a ranging application!

¹ with SuperLuminescent Diodes (SLD): low cost, compact, easy to use.
OCT: Experimental Setup

[Figure: Michelson interferometer — low-coherence source light (SLD); reference path with moving mirror producing ψ_R(t); object producing ψ_O(t) = (h ∗ ψ_R)(t); photodetector]

The photodetector measures ⟨|ψ_R(t − z/c) + ψ_O(t − z₀/c)|²⟩.

NOTE: z, z₀ are the optical path lengths of the reference and the object wave.
OCT: Mathematical setting

Measured intensity:
I_photo(z₀ − z) = ⟨|ψ_R(t − z/c) + ψ_O(t − z₀/c)|²⟩
              = const + 2ℜ{(h ∗ G)((z − z₀)/c)}
where z is the moving-mirror variable and the second term is the OCT signal.

G(t) is the temporal coherence function of the reference wave:
G(t′ − t) = ⟨ψ_R(t′), ψ_R(t)⟩
Typically, G(t) = A(t)e^{2iπν₀t} is a bandpass function.

A deconvolution/superresolution problem
Retrieve h(t) from the uniform samples of the OCT signal.
OCT: Resolution

Resolution limit of OCT (two interfaces): L_c = c × FWHM of A(t)

[Figure: interference signal vs. acquisition time (ms)]

NOTE: Because the light travels twice (forward then backward) in the object, the actual physical resolution is L_c/2. Moreover, a larger value of the refraction index inside the object further divides the resolution limit.
OCT: Finite Rate of Innovation Signal

A succession of K layers of varying depth, each of constant refractive index (optical paths b₁, b₂, ..., b_K):

ĥ(ν) = Σ_{k=1}^{K} a_k e^{−2iπb_kν/c}   for all |ν − ν₀| ≤ c/L_c

Problem: retrieve the innovations a_k and b_k from the uniform samples of the OCT signal.

NOTE: A physical modelization (1D wave equation) shows that the a_k and b_k are related to the depth (×2) and refractive index of the layers.
OCT: Gaussian kernel hypothesis

We assume that the coherence function has a Gaussian envelope:
G(t) ∝ e^{−4 ln(2) (c²/L_c²) t² + 2iπν₀t}

This is a very reasonable hypothesis for SLDs:

[Figure: OCT mirror signal and its difference with a Gabor function, vs. acquisition time (ms)]
OCT: A standard FRI retrieval problem

Denote the OCT signal by I(z); then
I(z) = 2ℜ{(h ∗ G)((z − z₀)/c)}
     = 2ℜ{ Σ_{k=1}^{K} α_k e^{−(z − β_k)²/(2σ²)} }   (a sum of Gabor functions)
where the complex numbers α_k and β_k are related to a_k and b_k, and where σ = L_c/(2√(2 log 2)).

An FRI problem
Retrieve the amplitudes α_k and "locations" β_k from I(lΔz), where Δz = sampling step.
OCT: Example of Processing

[Figure: two separated Gaussians located at z₀ and z₁ → their sum → + noise → retrieved Gaussians → resynthesized signal]
OCT: Simulation example

Simulation examples: two interfaces distant by 7 µm (1 ms below).

[Figure: OCT signal, difference with model, and retrieved positions vs. acquisition time (ms), at PSNR = 40 dB, 30 dB and 25 dB]
OCT: Real Data Processing

SLD source of central wavelength 0.814 µm and coherence length 25 µm
⇒ OCT resolution of 12.5 µm.

Depth scan of a 4 µm thick pellicle beamsplitter of an optical² depth of 6.6 µm
⇒ approximately half the OCT resolution.

Calibration part:
• Depth scan of 1 interface ⇒ effective coherence function;
• High-coherence interferometer ⇒ accurate position of the moving mirror.

² refractive index 1.65.
OCT: Real Data Processing

Example of superresolution (preliminary results³): data-model PSNR = 31 dB

[Figure: OCT signal, difference with model, and retrieved positions vs. acquisition time (ms)]

The two retrieved interfaces are distant by 17 interference fringes
⇒ 17 × 0.814/2 = 6.9 µm.

³ without correction for the movement of the mirror (HC phase unreliable).
Tutorial ICASSP 2009
Sparse Sampling:
Theory, Algorithms and Applications
Part 4
Thierry Blu (CUHK), Pier-Luigi Dragotti (ICL), Pina Marziliano (NTU),
Martin Vetterli (EPFL & UCB)
Part 4: Variations and Extensions
• Problem Statement and Motivation
• Alternative Sampling Kernels
• Kernels Reproducing Polynomials
– The family of B-Splines
• Sampling FRI Signals with Polynomial Reproducing Kernels
– Sampling Streams of Diracs
– Sampling Piecewise Polynomial Signals
• Kernels Reproducing Exponentials and Kernels with Rational Fourier Transform
• Estimation of FRI signal at the output of an RC circuit
• Application: Image Super-Resolution
• Summary and Conclusions
Problem Statement and Motivation

You are given a class of functions. You have a sampling device, typically a low-pass filter. Given the measurements y_n = ⟨ϕ(t/T − n), x(t)⟩, you want to reconstruct x(t).

Acquisition device: x(t) → h(t) = ϕ(−t/T) → y(t) → [sampling at interval T] → y_n = ⟨x(t), ϕ(t/T − n)⟩

Natural questions:
• When is there a one-to-one mapping between x(t) and y_n?
• What signals can be sampled and what kernels ϕ(t) can be used?
• What reconstruction algorithms?

In this part of the tutorial we aim to extend the choices of valid sampling kernels.
Problem Statement and Motivation

Digital cameras (acquisition system): observed scene → lens → CCD array → samples

• The low-quality lens blurs the images. Lenses are, to some extent, equivalent to the sampling kernel. Lenses do not necessarily behave like the sinc function.
Problem Statement and Motivation

[Figure: (a) original (2014 × 3039); (b) point spread function (in black), pixels (perpendicular) from −4 to 6]

The distortion introduced by a lens can be well approximated with splines.
Problem Statement and Motivation

A-to-D converters (simplified sample-and-hold converter):

[Circuit: op-amp stage with resistor R at the input (IN) and capacitor C at the output (OUT)]

• The electric circuits in many practical devices modify the measured signal. Can we model this distortion with a function that suits our sampling framework?
Alternative Sampling Kernels

Possible classes of kernels (we want to be as general as possible):

1. Any kernel ϕ(t) that can reproduce polynomials:
   Σ_n c_{m,n} ϕ(t − n) = t^m,   m = 0, 1, ..., L,
   for a proper choice of coefficients c_{m,n}.

2. Any kernel ϕ(t) that can reproduce exponentials:
   Σ_n c_{m,n} ϕ(t − n) = e^{a_m t},   a_m = α₀ + mλ and m = 0, 1, ..., L,
   for a proper choice of coefficients c_{m,n}.

3. Any kernel with rational Fourier transform:
   ϕ̂(ω) = Π_i (jω − b_i) / Π_m (jω − a_m),   a_m = α₀ + mλ and m = 0, 1, ..., L.
Alternative Sampling Kernels

Class 1 is made of all functions ϕ(t) satisfying the Strang-Fix conditions. This includes any scaling function generating a wavelet with L + 1 vanishing moments (e.g., splines and Daubechies scaling functions).

Strang-Fix conditions:
ϕ̂(0) ≠ 0   and   ϕ̂^{(m)}(2nπ) = 0 for n ≠ 0 and m = 0, 1, ..., L,
where ϕ̂(ω) is the Fourier transform of ϕ(t).

[Figure: the B-splines β₀(t), β₁(t), β₂(t), β₃(t)]

B-splines satisfy the Strang-Fix conditions since β̂_n(ω) = sinc(ω/2)^{n+1}.
B-splines Reproduce Polynomials

[Figure: scaled and shifted linear B-splines summing to the constant 1 and to the ramp t]

c_{0,n} = (1, 1, 1, 1, 1, 1, 1)
c_{1,n} = (−3, −2, −1, 0, 1, 2, 3)

The linear spline reproduces polynomials up to degree L = 1:
Σ_n c_{m,n} β₁(t − n) = t^m,   m = 0, 1,
for a proper choice of coefficients c_{m,n} (in this example n = −3, −2, ..., 1, 2, 3).

Notice: c_{m,n} = ⟨ϕ̃(t − n), t^m⟩ where ϕ̃(t) is biorthogonal to ϕ(t): ⟨ϕ̃(t), ϕ(t − n)⟩ = δ_n.
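The reproduction formula on this slide can be checked numerically. A minimal sketch (illustrative grid and shift range) for the linear B-spline with c_{0,n} = 1 and c_{1,n} = n:

```python
import numpy as np

# Centered linear B-spline ("hat" function).
beta1 = lambda t: np.maximum(0.0, 1.0 - np.abs(t))

# Coefficients from the slide: c_{0,n} = 1 and c_{1,n} = n.
n = np.arange(-5, 6)
t = np.linspace(-3.0, 3.0, 601)
shifted = beta1(t[:, None] - n[None, :])          # beta1(t - n) for every shift
const = shifted @ np.ones_like(n, dtype=float)    # should equal 1 (partition of unity)
ramp  = shifted @ n.astype(float)                 # should equal t
```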
B-splines Reproduce Polynomials

[Figure: scaled and shifted cubic B-splines reproducing t^m for m = 0, 1, 2, 3]

c_{0,n} = (1, 1, 1, 1, 1, 1, 1)
c_{1,n} = (−3, −2, −1, 0, 1, 2, 3)
c_{2,n} ∼ (8.7, 3.7, 0.7, −0.333, 0.7, 3.7, 8.7)
c_{3,n} ∼ (−24, −6, −0.001, 0, 0.001, 6, 24)

The cubic spline reproduces polynomials up to degree L = 3:
Σ_n c_{m,n} β₃(t − n) = t^m,   m = 0, 1, 2, 3.
Basic Set-up: Sampling Streams of Diracs

• Assume that x(t) is a stream of K Diracs on the interval [0, τ[:
  x(t) = Σ_{k=0}^{K−1} x_k δ(t − t_k).
  The signal has 2K degrees of freedom.

• Assume that ϕ(t) is any function that can reproduce polynomials of maximum degree L ≥ 2K − 1:
  Σ_n c_{m,n} ϕ(t − n) = t^m,   m = 0, 1, ..., L,
  where c_{m,n} = ⟨ϕ̃(t − n), t^m⟩ and ϕ̃(t) is biorthogonal to ϕ(t).

• We want to retrieve x(t) from the samples y_n = ⟨x(t), ϕ(t/T − n)⟩ with n = 0, 1, ..., N − 1 and TN = τ.
Sampling Streams of Diracs

Assume for simplicity that T = 1 and define s_m = Σ_n c_{m,n} y_n, m = 0, 1, ..., L. We have that

s_m = Σ_{n=0}^{N−1} c_{m,n} y_n
    = ⟨x(t), Σ_{n=0}^{N−1} c_{m,n} ϕ(t − n)⟩   (1)
    = ∫_{−∞}^{∞} x(t) t^m dt
    = Σ_{k=0}^{K−1} x_k t_k^m,   m = 0, 1, ..., L.

We thus observe the moments s_m = Σ_{k=0}^{K−1} x_k t_k^m, m = 0, 1, ..., L.
Sampling Streams of Diracs

[Figure: a single Dirac x₀δ(t − t₀) against the shifted kernels reproducing 1 and t]

Σ_n y_n = ⟨x₀δ(t − t₀), Σ_n ϕ(t − n)⟩ = x₀ Σ_n ϕ(t₀ − n) = x₀

Σ_n c_{1,n} y_n = ⟨x₀δ(t − t₀), Σ_n c_{1,n} ϕ(t − n)⟩ = x₀ Σ_n c_{1,n} ϕ(t₀ − n) = x₀t₀
Sampling Streams of Diracs

[Figure: two Diracs x₀δ(t − t₀) and x₁δ(t − t₁) against the kernels reproducing t^m, m = 0, 1, 2, 3]

s₀ = Σ_n y_n = x₀ + x₁
s₁ = Σ_n c_{1,n} y_n = x₀t₀ + x₁t₁
s₂ = Σ_n c_{2,n} y_n = x₀t₀² + x₁t₁²
s₃ = Σ_n c_{3,n} y_n = x₀t₀³ + x₁t₁³
Sampling Streams of Diracs - Toy Examples
√
Example: T = 1, x(t) = x0 δ(t − t0 ) with x0 = 2 and t0 =
The sampling kernel is a linear spline.
2
2.
2
x δ(t−t )
0
1.5
0
1.5
1
yn
1
0.5
0.5
0
0
−0.5
−6
In this case
and
−4
−2
0
t
2
4
6
−0.5
−6
−4
−2
0
n
2
4
6

n 6= 1, 2
 0
√
2(2
yn = hx(t), ϕ(t − n)i = x0 ϕ(t0 − n) =
√− 2), n = 1,

2( 2 − 1)
n = 2.
P
P
s0 =
c
y
=
y = y1 + y2 = x0 = 2,
n
√
Pn 0,n
Pn n
2.
s1 =
c
y
=
ny
=
y
+
2y
=
x
t
=
2
n
n
1,n
1
2
0
0
n
n
14
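The toy example above can be verified in a few lines (the kernel is the centered linear spline, an assumption consistent with the figure):

```python
import numpy as np

# Toy example: T = 1, x(t) = x0 delta(t - t0), x0 = 2, t0 = sqrt(2),
# linear-spline kernel beta1(t) = max(0, 1 - |t|).
beta1 = lambda t: max(0.0, 1.0 - abs(t))
x0, t0 = 2.0, np.sqrt(2.0)

n = np.arange(-6, 7)
y = np.array([x0 * beta1(t0 - nn) for nn in n])   # y_n = x0 phi(t0 - n)

s0 = y.sum()                                      # c_{0,n} = 1
s1 = (n * y).sum()                                # c_{1,n} = n
```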
The Annihilating Filter Method

The quantity
s_m = Σ_{k=0}^{K−1} x_k t_k^m,   m = 0, 1, ..., L
is a power-sum series.

Notice the similarity with the sinc case, where we were finding the quantity:
x̂_m = (1/τ) Σ_{k=0}^{K−1} x_k e^{−j2πmt_k/τ} = Σ_{k=0}^{K−1} x_k u_k^m.

As for the sinc case, we can retrieve the locations t_k and the amplitudes x_k with the annihilating filter method.
The Annihilating Filter Method

1. Call h_m the filter with z-transform H(z) = Π_{k=0}^{K−1} (1 − t_k z^{−1}).
We have that h_m ∗ s_m = Σ_{i=0}^{K} h_i s_{m−i} = 0. This filter is thus called the annihilating filter.

In matrix/vector form we have that SH = HS = 0, or alternatively, using the fact that h₀ = 1:

[ s_{K−1}  s_{K−2}  ···  s₀
  s_K      s_{K−1}  ···  s₁
  ...
  s_{L−1}  s_{L−2}  ···  s_{L−K} ] [h₁, h₂, ..., h_K]ᵀ = −[s_K, s_{K+1}, ..., s_L]ᵀ.

Solve the above Toeplitz system to find the coefficients of the annihilating filter.
The Annihilating Filter Method

2. Given the coefficients {1, h₁, h₂, ..., h_K}, we get the locations t_k by finding the roots of H(z).

3. Solve the first K equations in s_m = Σ_{k=0}^{K−1} x_k t_k^m to find the amplitudes x_k. In matrix/vector form:

[ 1          1          ···  1
  t₀         t₁         ···  t_{K−1}
  ...
  t₀^{K−1}   t₁^{K−1}   ···  t_{K−1}^{K−1} ] [x₀, x₁, ..., x_{K−1}]ᵀ = [s₀, s₁, ..., s_{K−1}]ᵀ.   (2)

Classic Vandermonde system. Unique solution for distinct t_k's.
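Steps 1-3 above can be implemented in a few lines. A sketch under noiseless assumptions, with a hypothetical K = 2 example:

```python
import numpy as np

def annihilating_filter(s, K):
    """From the power sums s_m = sum_k xk tk^m (m = 0..L, L >= 2K-1),
    recover the locations tk and amplitudes xk (noiseless sketch)."""
    L = len(s) - 1
    # Step 1: solve the Toeplitz system for h1..hK (h0 = 1); rows m = K..L
    A = np.array([[s[K - 1 + i - j] for j in range(K)] for i in range(L - K + 1)])
    b = -np.array([s[K + i] for i in range(L - K + 1)])
    h = np.linalg.lstsq(A, b, rcond=None)[0]
    # Step 2: the roots of H(z) give the locations
    tk = np.roots(np.concatenate(([1.0], h)))
    # Step 3: the Vandermonde system gives the amplitudes
    V = np.vander(tk, K, increasing=True).T       # rows [tk^m], m = 0..K-1
    xk = np.linalg.solve(V, s[:K])
    return np.real(tk), np.real(xk)

# Hypothetical K = 2 Diracs
tk_true, xk_true = np.array([0.3, 0.8]), np.array([1.0, 2.0])
s = np.array([(xk_true * tk_true**m).sum() for m in range(4)])
tk, xk = annihilating_filter(s, K=2)
order = np.argsort(tk)
tk, xk = tk[order], xk[order]
```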
Sampling Streams of Diracs: Numerical Example

[Figure: (a) original signal; (b) sampling kernel β₇(t); (c) samples; (d) reconstructed signal]
Sampling Streams of Diracs: Closely Spaced Diracs

[Figure: (a) original signal; (b) sampling kernel β₇(t); (c) samples; (d) reconstructed signal]
Sampling Streams of Diracs: Sequential Reconstruction

[Figure: (a) original signal; (b) sampling kernel β₇(t); (c) samples; (d) reconstructed signal]
Sinc Kernel vs B-Spline Kernel

Sinc kernel:
x(t) → [acquisition at interval T] → {y_n} → DFT → {x̂_m} (Fourier coefficients of x(t)) → AF → t₀..t_K → VS → x₀..x_K

B-spline kernel:
x(t) → [acquisition at interval T] → {y_n} → C → {s_m} (moments of x(t)) → AF → t₀..t_K → VS → x₀..x_K

(AF = annihilating filter; VS = Vandermonde system; C = combination of the samples with the coefficients c_{m,n}.)
Sampling Piecewise Constant Signals

Insight: the derivative of a piecewise constant signal is a stream of Diracs. Thus, if we can compute the derivative of piecewise constant signals, we can sample them.

Given the samples y_n, compute the finite difference z_n = y_{n+1} − y_n. We have that

z_n = y_{n+1} − y_n
    = ⟨x(t), ϕ(t − n − 1) − ϕ(t − n)⟩
    = (1/2π) ⟨x̂(ω), ϕ̂(ω)e^{−jωn}(e^{−jω} − 1)⟩
    = (1/2π) ⟨x̂(ω), −jω ϕ̂(ω)e^{−jωn} (1 − e^{−jω})/(jω)⟩
    = −⟨x(t), d/dt [ϕ(t − n) ∗ β₀(t − n)]⟩
    = ⟨dx(t)/dt, ϕ(t − n) ∗ β₀(t − n)⟩.

Thus the samples z_n are related to the derivative of x(t).

Notice: only discrete-time processing!
Sampling Piecewise Constant Signals

[Block diagram] Sampling x(t) with the kernel ϕ(t) at interval T and taking the finite difference z_n of the samples y_n is equivalent to sampling dx(t)/dt at interval T with the kernel ϕ(t) ∗ β₀(t).
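The finite-difference route can be checked on a minimal example. The sketch below assumes a box kernel β₀ and a single jump (both illustrative choices): the differences z_n then coincide with the samples of the jump Dirac through β₀ ∗ β₀ = β₁, and its degree-0 and degree-1 moments recover the jump height and position.

```python
import numpy as np

# Illustrative piecewise constant signal: value c0, one jump to c1 at t0.
c0, c1, t0 = 1.0, 3.0, 5.3      # hypothetical values
a = c1 - c0                     # height of the Dirac in dx/dt
n = np.arange(0, 12)

def y_sample(nn):
    """y_n = integral of x(t) over [n, n+1) (box kernel beta0)."""
    lo, hi = float(nn), float(nn) + 1.0
    return c0 * max(0.0, min(hi, t0) - lo) + c1 * max(0.0, hi - max(lo, t0))

y = np.array([y_sample(nn) for nn in n])
z = y[1:] - y[:-1]              # z_n = y_{n+1} - y_n

# z_n = a * beta1(t0 - (n+1)): moments of z recover a and a * t0
s0 = z.sum()
s1 = ((n[:-1] + 1) * z).sum()
```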
Piecewise Constant Signals. Example

[Figure: (a) original signal; (b) measured samples; (c) finite difference; (d) reconstructed signal]
Sampling Diracs vs Sampling PCS

Sampling Diracs (B-spline kernel):
x(t) → [acquisition at interval T] → {y_n} → C → {s_m} (moments of x(t)) → AF → t₀..t_K → VS → x₀..x_K

Sampling piecewise constant signals (B-spline kernel):
x(t) → [acquisition at interval T] → {y_n} → digital filter (1 − z) → {z_n} → C → {s_m} (moments of dx/dt) → AF → t₀..t_K → VS → x₀..x_K
Kernels Reproducing Exponentials

Class 2 includes any function ϕ(t) that can reproduce exponentials:
Σ_n c_{m,n} ϕ(t − n) = e^{a_m t},   a_m = α₀ + mλ and m = 0, 1, ..., L.

This includes any composite kernel of the form φ(t) ∗ β_ᾱ(t), where β_ᾱ(t) = β_{α₀}(t) ∗ β_{α₁}(t) ∗ ... ∗ β_{α_L}(t) and β_{αᵢ}(t) is an exponential spline (E-spline) of first order [UnserB:05]:

β_α(t) ⟷ β̂_α(ω) = (1 − e^{α−jω}) / (jω − α)

Notice:
• α can be complex.
• The E-spline has compact support.
• The E-spline reduces to the classical polynomial spline when α = 0.
Kernels Reproducing Exponentials

[Figure: scaled and shifted first-order E-splines summing to the exponential e^{α₀t}]

The E-spline of first order β_{α₀}(t) reproduces the exponential e^{α₀t}:
Σ_n c_{0,n} β_{α₀}(t − n) = e^{α₀t}.

In this case c_{0,n} = e^{α₀n}. In general, c_{m,n} = c_{m,0} e^{α_m n}.
Trigonometric E-Splines

• The exponent α of the E-splines can be complex. This means β_α(t) can be a complex function.
• However, if pairs of exponents are chosen to be complex conjugate, then the spline stays real.
• Example:

β_{α₀+jω₀}(t) ∗ β_{α₀−jω₀}(t) = { (sin(ω₀t)/ω₀) e^{α₀t},          0 ≤ t < 1
                                   −(sin(ω₀(t − 2))/ω₀) e^{α₀t},   1 ≤ t < 2
                                   0,                              otherwise }

When α₀ = 0 (i.e., purely imaginary exponents), the spline is called a trigonometric spline.
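The closed form above can be verified by numerical convolution; the sketch below takes α₀ = 0 and an arbitrary ω₀ (both illustrative choices):

```python
import numpy as np

# Numerical check of the closed form for alpha0 = 0 (trigonometric spline).
w0 = 0.2
dt = 1e-3
t = np.arange(0.0, 1.0, dt)
f = np.exp(1j * w0 * t)            # beta_{+j w0}: e^{j w0 t} on [0, 1)
g = np.exp(-1j * w0 * t)           # beta_{-j w0}: e^{-j w0 t} on [0, 1)
conv = np.convolve(f, g) * dt      # Riemann-sum approximation, support [0, 2)

tt = np.arange(len(conv)) * dt
closed = np.where(tt < 1.0, np.sin(w0 * tt) / w0, -np.sin(w0 * (tt - 2.0)) / w0)
err = np.max(np.abs(conv - closed))
```

The convolution of the conjugate pair is real, and the numerical result matches the piecewise closed form up to the discretization error of the Riemann sum.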
Trigonometric E-Splines

[Figure: scaled and shifted E-splines reproducing the exponentials e^{jmω₀t}]

Here ᾱ = (−jω₀, jω₀) and ω₀ = 0.2:
Σ_n c_{m,n} β_ᾱ(t − n) = e^{jmω₀t},   m = −1, 1.

Notice: β_ᾱ(t) is a real function, but the coefficients c_{m,n} are complex.
Sampling Streams of Diracs with E-splines

• Assume that x(t) is a stream of K Diracs on the interval [0, τ[:
  x(t) = Σ_{k=0}^{K−1} x_k δ(t − t_k).
  The signal has 2K degrees of freedom.

• Assume that ϕ(t) is any function that can reproduce exponentials of the form
  Σ_n c_{m,n} ϕ(t − n) = e^{a_m t},   a_m = mλ and m = 0, 1, ..., L,
  where L ≥ 2K − 1.

• We want to retrieve x(t) from the samples y_n = ⟨x(t), ϕ(t/T − n)⟩ with n = 0, 1, ..., N − 1 and TN = τ.
Sampling Diracs with E-splines

Assume that x(t) = Σ_{k=0}^{K−1} x_k δ(t − t_k). We have that

s_m = Σ_n c_{m,n} y_n
    = ⟨x(t), Σ_n c_{m,n} ϕ(t − n)⟩
    = ∫_{−∞}^{∞} x(t) e^{α_m t} dt
    = Σ_{k=0}^{K−1} x_k e^{α_m t_k}
    = Σ_{k=0}^{K−1} x_k e^{λmt_k} = Σ_{k=0}^{K−1} x_k u_k^m,   m = 0, 1, ..., L,

and we can again use the annihilating filter method to reconstruct the Diracs. In matrix form, HCY = 0.

Notice: if α_m = jmω₀ with m = 0, 1, ..., N, then s_m = x̂(mω₀) = x̂_m. In this case C ∼ DFT.
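The same annihilating-filter machinery applies to these exponential moments, with u_k = e^{λt_k} in place of t_k. A sketch with hypothetical λ, t_k and x_k:

```python
import numpy as np

# Exponential moments s_m = sum_k xk uk^m with uk = e^{lambda tk}.
lam = 0.5
tk_true, xk_true = np.array([0.2, 0.6]), np.array([1.0, -0.5])
uk = np.exp(lam * tk_true)
s = np.array([(xk_true * uk**m).sum() for m in range(4)])

# Annihilating filter for K = 2 (Toeplitz system as in the polynomial case)
K = 2
A = np.array([[s[1], s[0]], [s[2], s[1]]])
h = np.linalg.solve(A, -s[2:4])
uk_est = np.sort(np.real(np.roots(np.concatenate(([1.0], h)))))
tk_est = np.log(uk_est) / lam               # invert uk = e^{lambda tk}

# Amplitudes from the Vandermonde system in the uk
V = np.vander(uk_est, K, increasing=True).T
xk_est = np.linalg.solve(V, s[:K])
```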
Alternative Sampling Kernels

Class 3 includes any linear differential acquisition device, that is, any device for which the input and output are related by linear differential equations. This includes most electrical, mechanical or acoustic systems.

A class 3 kernel ϕ̂(ω) = Π_i (jω − b_i) / Π_m (jω − a_m) can be converted into the class 2 kernel Π_i (jω − b_i) β̂_ᾱ(ω) by filtering the samples y_n = ⟨x(t), ϕ(t − n)⟩ with an FIR filter H(z) = Π_{m=0}^{L} (1 − e^{a_m} z).

Example:
Assume ϕ̂(ω) = 1/(jω − α) and y_n = ⟨x(t), ϕ(t − n)⟩. Then

z_n = h_n ∗ y_n = y_n − e^α y_{n+1} = ⟨x(t), ϕ(t − n) − e^α ϕ(t − n − 1)⟩
    = (1/2π) ⟨x̂(ω), e^{−jωn} (1 − e^{α−jω})/(jω − α)⟩ = ⟨x(t), β_α(t − n)⟩.
Reconstruction of FRI signals at the output of an RC circuit

[Block diagram: x(t) → RC circuit → y(t) → sampling (T = 1) → y_n → digital filter g_n^{(1)} → annihilating filter method → A_i, t_i]

A bit of circuit analysis ;-)
• Input x(t), samples y_n = ⟨x(t), ϕ(t − n)⟩ with ϕ̂(ω) = α/(α − jω) and α = 1/RC; ϕ(t) leads to the E-spline β_α(t) with α = 1/RC.
• An ideal resonant circuit (LC circuit) leads to E-splines with purely imaginary exponents: α = ±jω₀ and ω₀ = 1/√(LC). With this we can estimate x̂(ω₀).
• A classical RLC circuit leads to E-splines with complex exponents.
• More sophisticated passive or active circuits give higher order splines.
Application: Image Super-Resolution

Super-resolution is a multichannel sampling problem with unknown shifts. Use moments to retrieve the shifts or the geometric transformation between images.

[Figure: (a) original (512 × 512); (b) low-res. (64 × 64); (c) super-res (PSNR = 24.2 dB)]

• Forty low-resolution and shifted versions of the original.
• Intuition: the disparity between images has a finite rate of innovation and can be retrieved exactly.
• Accurate registration is achieved by retrieving the continuous moments of the 'Tiger' from the samples.
• The registered images are interpolated to achieve super-resolution.
Application: Image Super-Resolution

• A sample P_{m,n} in the blurred image is given by
  P_{m,n} = ⟨I(x, y), ϕ(x/2^J − n, y/2^J − m)⟩.

• The scaling function ϕ(x, y) can reproduce polynomials:
  Σ_n Σ_m c_{m,n}^{(l,j)} ϕ(x − n, y − m) = x^l y^j,   l = 0, 1, ..., N; j = 0, 1, ..., N.

• Retrieve the exact moments of I(x, y) from P_{m,n}:
  τ_{l,j} = Σ_n Σ_m c_{m,n}^{(l,j)} P_{m,n} = ⟨I(x, y), Σ_n Σ_m c_{m,n}^{(l,j)} ϕ(x/2^J − n, y/2^J − m)⟩ = ∫∫ I(x, y) x^l y^j dx dy.

• Register the low-resolution images using the retrieved moments.
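In its simplest form, moment-based registration reduces to comparing first-order moments: the centroids of two shifted images differ exactly by the shift. The sketch below illustrates that idea only (it is not the paper's full registration algorithm, and the image content is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
img = np.zeros((32, 32))
img[10:20, 8:18] = rng.random((10, 10)) + 0.5   # hypothetical image content
dy, dx = 3, 5                                    # ground-truth integer shift
shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def centroid(I):
    """First-order moments normalized by the zeroth moment."""
    yy, xx = np.mgrid[:I.shape[0], :I.shape[1]]
    m00 = I.sum()
    return (yy * I).sum() / m00, (xx * I).sum() / m00

cy0, cx0 = centroid(img)
cy1, cx1 = centroid(shifted)
est_dy, est_dx = cy1 - cy0, cx1 - cx0
```

Since the content stays away from the borders, the roll does not wrap and the centroid difference recovers the shift exactly.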
Application: Image Super-Resolution

[Figure: (a) original (2014 × 3039); (b) point spread function, pixels (perpendicular) from −4 to 6]
Application: Image Super-Resolution
(a)Original (128 × 128)
(b) Super-res (1024 × 1024)
Application: Image Super-Resolution
(a)Original (48 × 48)
(b) Super-res (480 × 480)
Summary
• We have introduced alternative sampling kernels (e.g., B-splines, E-splines)
• Potential advantages:
– Kernels with compact support ⇒ reconstruction algorithms with low
complexity.
– Very close to real devices (e.g., Electric circuits, Digital cameras)
• Promising Application: Image Super-resolution.
• Many other potential applications, such as parallel A-to-D converters.
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 50, NO. 6, JUNE 2002
1417
Sampling Signals With Finite Rate of Innovation
Martin Vetterli, Fellow, IEEE, Pina Marziliano, and Thierry Blu, Member, IEEE
Abstract—Consider classes of signals that have a finite number
of degrees of freedom per unit of time and call this number the rate
of innovation. Examples of signals with a finite rate of innovation
include streams of Diracs (e.g., the Poisson process), nonuniform
splines, and piecewise polynomials.
Even though these signals are not bandlimited, we show that they
can be sampled uniformly at (or above) the rate of innovation using
an appropriate kernel and then be perfectly reconstructed. Thus,
we prove sampling theorems for classes of signals and kernels that
generalize the classic “bandlimited and sinc kernel” case. In particular, we show how to sample and reconstruct periodic and finite-length streams of Diracs, nonuniform splines, and piecewise
polynomials using sinc and Gaussian kernels. For infinite-length
signals with finite local rate of innovation, we show local sampling
and reconstruction based on spline kernels.
The key in all constructions is to identify the innovative part of
a signal (e.g., time instants and weights of Diracs) using an annihilating or locator filter: a device well known in spectral analysis and
error-correction coding. This leads to standard computational procedures for solving the sampling problem, which we show through
experimental results.
Applications of these new sampling results can be found in signal
processing, communications systems, and biological systems.
Index Terms—Analog-to-digital conversion, annihilating filters, generalized sampling, nonbandlimited signals, nonuniform splines, piecewise polynomials, Poisson processes, sampling.
I. INTRODUCTION
Most continuous-time phenomena can only be seen through sampling the continuous-time waveform, and typically, the sampling is uniform. Very often, instead of the waveform itself, one has only access to a smoothed or filtered version of it. This may be due to the physical setup of the measurement or may be by design.
Calling x(t) the original waveform, its filtered version is y(t) = (h̃ ∗ x)(t), where h̃(t) = h(−t) is the convolution kernel. Then, uniform sampling with a sampling interval T leads to samples y_s(nT) given by

y_s(nT) = ⟨ h(t − nT), x(t) ⟩.   (1)

The intermediate signal y(t) before sampling is given by

y(t) = (h̃ ∗ x)(t) = ∫ h(u − t) x(u) du.   (2)

This setup is shown in Fig. 1.

Fig. 1. Sampling setup: x(t) is the continuous-time signal; h̃(t) = h(−t) is the smoothing kernel; y(t) is the filtered signal; T is the sampling interval; y_s(t) is the sampled version of y(t); and y_s(nT), n ∈ ℤ, are the sample values. The box C/D stands for continuous-to-discrete transformation and corresponds to reading out the sample values y_s(nT) from y(t).

When no smoothing kernel is used, we simply have y(t) = x(t), which is equivalent to (1) with h(t) = δ(t).
model for having access to the continuous-time world is typical for acquisition devices in many areas of science and technology, including scientific measurements, medical and biological signal processing, and analog-to-digital converters.
The key question is, of course, whether the samples y_s(nT) are a faithful representation of the original signal x(t). If so, how can we reconstruct x(t) from y_s(nT), and if not, what approximation do we get based on the samples y_s(nT)? This question is at the heart of signal processing, and the dominant result is the well-known sampling theorem of Whittaker et al., which states that if x(t) is bandlimited, or X(ω) = 0 for |ω| ≥ ω_max, then the samples y_s(nT) = x(nT) with T = π/ω_max are sufficient to reconstruct x(t) [5], [15], [18]. Calling B the bandwidth of x(t) in cycles per second (B = ω_max/π), we see that B samples per second are a sufficient representation of x(t).
Manuscript received May 16, 2001; revised February 11, 2002. The work of
P. Marziliano was supported by the Swiss National Science Foundation. The
associate editor coordinating the review of this paper and approving it for publication was Dr. Paulo J. S. G. Ferreira.
M. Vetterli is with the LCAV, DSC, Swiss Federal Institute of Technology, Lausanne, Switzerland, and also with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, CA 94720 USA (e-mail: [email protected]).
P. Marziliano was with the Laboratory for Audio Visual Communication
(LCAV), Swiss Federal Institute of Technology, Lausanne, Switzerland. She is
now with Genista Corporation, Genimedia SA, Lausanne, Switzerland (e-mail:
[email protected]).
T. Blu is with the Biomedical Imaging Group, DMT, Swiss Federal Institute
of Technology, Lausanne, Switzerland (e-mail: [email protected]).
Publisher Item Identifier S 1053-587X(02)04392-1.
The reconstruction formula corresponding to an idealized lowpass filtering is given by

x(t) = Σ_{n∈ℤ} y_s(nT) sinc(t/T − n)   (3)

where sinc(t) = sin(πt)/(πt). If x(t) is not bandlimited, convolution with sinc(t/T) (an ideal lowpass filter with support [−π/T, π/T] in frequency) allows us to apply sampling and reconstruction to y(t), which is the lowpass approximation of x(t). This restriction of X(ω) to the interval [−π/T, π/T] provides the best approximation in the least squares sense of x(t) in the sinc space [18].
A possible interpretation of the interpolation formula (3) is the following. Any real bandlimited signal can be seen as having B degrees of freedom per unit of time, which is the number of samples per unit of time that specify it. In the present paper, this number of degrees of freedom per unit of time is called the rate of innovation of a signal¹ and is denoted by ρ. In the bandlimited case above, the rate of innovation is ρ = B.
In the sequel, we are interested in signals that have a finite rate of innovation, either on intervals or on average. Take a Poisson process, which generates Diracs with independent and identically distributed (i.i.d.) interarrival times, the distribution being exponential with probability density function p(t) = λe^{−λt}, t ≥ 0. The expected interarrival time is given by 1/λ. Thus, the rate of innovation is ρ = λ since, on average, λ real numbers per unit of time fully describe the process.
Given a signal with a finite rate of innovation ρ, it seems attractive to sample it with a rate of ρ samples per unit of time. We know it will work with bandlimited signals, but will it work with a larger class of signals? Thus, the natural questions to pursue are the following.
1) What classes of signals of finite rate of innovation can be sampled uniquely, in particular, using uniform sampling?
2) What kernels h(t) allow for such sampling schemes?
3) What algorithms allow the reconstruction of the signal based on the samples?
In the present paper, we concentrate on streams of Diracs,
nonuniform splines, and piecewise polynomials. These are
signal classes for which we are able to derive exact sampling
theorems under certain conditions. The kernels involved are the
sinc, the Gaussian, and the spline kernels. The algorithms, although they are more complex than the standard sinc sampling
of bandlimited signals, are still reasonable (structured linear
systems) but also often involve root finding.
As will become apparent, some of the techniques we use in
this paper are borrowed from spectral estimation [16] as well
as error-correction coding [1]. In particular, we make use of the
annihilating filter method (see Appendices A and B), which is
standard in high-resolution spectral estimation. The usefulness
of this method in a signal processing context has been previously pointed out by Ferreira and Vieira [24] and applied to error
detection and correction in bandlimited images [3]. In array
signal processing, retrieval of time of arrival has been studied,
which is related to one of our problems. In that case, a statistical framework using multiple sensors is derived (see, for example, [14] and [25]) and an estimation problem is solved. In
our case, we assume deterministic signals and derive exact sampling formulas.
The outline of the paper is as follows. Section II formally
defines signal classes with finite rate of innovation treated in
the sequel. Section III considers periodic signals in continuous
time and derives sampling theorems for streams of Diracs,2
nonuniform splines, and piecewise polynomials. These types
of signals have a finite number of degrees of freedom, and a
sampling rate that is sufficiently high to capture these degrees
of freedom, together with appropriate sampling kernels, allows
perfect reconstruction. Section IV addresses the sampling of
finite-length signals having a finite number of degrees of
freedom using infinitely supported kernels like the sinc kernel
and the Gaussian kernel. Again, if the critical number of
1This is defined in Section II and is different from the rate used in rate-distortion theory [2]. Here, rate corresponds to a degree of freedom that is specified
by real numbers. In rate-distortion theory, the rate corresponds to bits used in a
discrete approximation of a continuously valued signal.
2We are using the Dirac distribution and all of the operations, including
derivatives, must be understood in the distribution sense.
samples is taken, we can derive sampling theorems. Section V
concentrates on local reconstruction schemes. Given that the
local rate of innovation is bounded, local reconstruction is
possible, using, for example, spline kernels. In the Appendix, we
review the “annihilating filter” method from spectral analysis
and error-correction coding with an extension to multiple zeros.
This method will be used in several of the proofs in the paper.
II. SIGNALS WITH FINITE RATE OF INNOVATION
In the Introduction, we informally discussed the intuitive notion of signals with finite rate of innovation. Let us introduce more precisely sets of signals having a finite rate of innovation. Consider a known function f(t) and signals of the form

x(t) = Σ_{n∈ℤ} c_n f(t − nT)   (4)

of which bandlimited signals [see (3)] are a particular case. Clearly, the rate of innovation is ρ = 1/T. There are many examples of such signals, for example, when f(t) is a scaling function in a wavelet multiresolution framework [8], [20] and in approximation theory [17].

A more general case appears when we allow arbitrary shifts t_n, or

x(t) = Σ_{n∈ℤ} c_n f(t − t_n).   (5)

For example, when f(t) = δ(t), c_n = 1, and the interarrival times t_{n+1} − t_n are i.i.d. with exponential density, then we have the Poisson process of rate λ.

Allowing a set of functions {f_r(t)}, r = 0, …, R − 1, and arbitrary shifts t_n, we obtain

x(t) = Σ_{n∈ℤ} Σ_{r=0}^{R−1} c_{nr} f_r(t − t_n).   (6)

Finally, we will be considering signals that are built using a noninnovative part, for example, a polynomial, and an innovative part such as (5) for nonuniform splines or, more generally, (6) for piecewise polynomials of degree D and arbitrary intervals (in that case, R = D + 1, and the f_r(t) are one-sided power functions).

Assuming the functions {f_r(t)} are known, it is clear that the only degrees of freedom in a signal x(t) as in (6) are the time instants t_n and the coefficients c_{nr}. Introducing a counting function C_x(t_a, t_b) that counts the number of degrees of freedom in x(t) over the interval [t_a, t_b], we can define the rate of innovation as

ρ = lim_{τ→∞} (1/τ) C_x(−τ/2, τ/2).   (7)

Definition 1: A signal with a finite rate of innovation is a signal whose parametric representation is given in (5) and (6) and with a finite ρ, as defined in (7).

If we consider finite-length or periodic signals of length τ, then the number of degrees of freedom is finite, and the rate of innovation is ρ = C_x(0, τ)/τ. Note that in the uniform case given in (4), the instants, being predefined, are not degrees of freedom.
One can also define a local rate of innovation with respect to a moving window of size τ. Given a window of size τ, the local rate of innovation at time t is

ρ_τ(t) = (1/τ) C_x(t − τ/2, t + τ/2).   (8)

In this case, one is often interested in the maximal local rate

ρ_max = max_t ρ_τ(t).   (9)

As τ → ∞, ρ_τ(t) tends to ρ. To illustrate the differences between ρ and ρ_max, consider again the Poisson process with expected interarrival time 1/λ. The rate of innovation is given by ρ = λ. However, for any finite τ, there is no bound on ρ_max.

The reason for introducing the rate of innovation is that one hopes to be able to "measure" a signal by taking ρ samples per unit of time and be able to reconstruct it. We know this to be true in the uniform case given in (4). The main contribution of this paper is to show that it is also possible for many cases of interest in the more general cases given by (5) and (6).

III. PERIODIC CONTINUOUS-TIME CASE

In this section, we consider τ-periodic signals that are either made of Diracs or polynomial pieces. The natural representation of such signals is through Fourier series

x(t) = Σ_{m∈ℤ} X[m] e^{j2πmt/τ}.   (10)

We show that such signals can be recovered uniquely from their projection onto a subspace of appropriate dimension. The subspace's dimension equals the number of degrees of freedom of the signal, and we show how to recover the signal from its projection. One choice of the subspace is the lowpass approximation of the signal, which leads to a sampling theorem for periodic streams of Diracs, nonuniform splines, derivatives of Diracs, and piecewise polynomials.

A. Stream of Diracs

Consider a stream of K Diracs periodized with period τ, x(t) = Σ_{n∈ℤ} Σ_{k=0}^{K−1} c_k δ(t − t_k − nτ), where t_k ∈ [0, τ) and k = 0, …, K − 1. This signal has 2K degrees of freedom per period, and thus, the rate of innovation is

ρ = 2K/τ.   (11)

From Poisson's summation formula, the periodic stream of Diracs can be rewritten as

x(t) = (1/τ) Σ_{m∈ℤ} Σ_{k=0}^{K−1} c_k e^{j2πm(t − t_k)/τ}.   (12)

The Fourier series coefficients X[m] are thus given by

X[m] = (1/τ) Σ_{k=0}^{K−1} c_k e^{−j2πm t_k/τ}   (13)

that is, a linear combination of K complex exponentials u_k^m, with u_k = e^{−j2π t_k/τ}. Consider now a finite filter A[k], k = 0, …, K, with z-transform

A(z) = Σ_{k=0}^{K} A[k] z^{−k}   (14)

having K zeros at the u_k, that is

A(z) = Π_{k=0}^{K−1} (1 − u_k z^{−1}).   (15)

Note that A(z) is the convolution of K elementary filters with coefficients (1, −u_k) and that the convolution of such a filter with the exponential u_k^m is zero

u_k^m − u_k · u_k^{m−1} = 0.   (16)

Therefore, because X[m] is the sum of K exponentials, each being zeroed out by one of the roots of A(z), it follows that

A[m] ∗ X[m] = Σ_{k=0}^{K} A[k] X[m − k] = 0.   (17)

The filter A[m] is thus called an annihilating filter since it annihilates K exponentials [16]. A more detailed discussion of annihilating filters is given in Appendix A. In error-correction coding, it is called the error locator polynomial [1]. The periodic time-domain function a(t) is obtained by inversion of the Fourier series, or equivalently, by evaluating A(z) for z = e^{j2πt/τ}. Thus

a(t) = Σ_{k=0}^{K} A[k] e^{−j2πkt/τ}   (18)

has zeros at t = t_k, k = 0, …, K − 1. Thus, in the time domain, we have the equivalent of (17) as a(t) x(t) = 0. Note that a(t) has K + 1 nonzero coefficients A[k] and that a(t) is unique for a given x(t) with nonzero weights c_k and distinct locations t_k. This follows from the uniqueness of the set of roots in (15). Further, define the sinc sampling kernel of bandwidth B as h_B(t) = sin(πBt)/(πBt). We are now ready to prove a sampling result for periodic streams of Diracs.

Theorem 1: Consider x(t), which is a periodic stream of K Diracs of period τ, with Diracs of weight c_k and at location t_k, as in (12). Take as a sampling kernel h_B(t), where B is chosen such that it is greater or equal to the rate of innovation given by (11), and sample x(t) at N ≥ Bτ uniform locations t_n = nT, where T = τ/N and n = 0, …, N − 1. Then, the samples

y_s(nT) = ⟨ h_B(t − nT), x(t) ⟩, n = 0, …, N − 1   (19)

are a sufficient characterization of x(t).
Proof: We first show how to obtain the Fourier series coefficients X[m], |m| ≤ M = ⌊Bτ/2⌋, from the samples y_s(nT). Based on these Fourier series coefficients, we then show how to obtain the annihilating filter A[m], which leads to the locations of the Diracs. Finally, the weights of the Diracs are found, thus specifying x(t) uniquely.

1) Finding X[m] from y_s(nT): Using (10) in (19), and the fact that the sinc kernel of bandwidth B retains only the Fourier series coefficients with |m| ≤ M, we have

y_s(nT) = Σ_{m=−M}^{M} X[m] e^{j2πmn/N}, n = 0, …, N − 1.   (20)–(22)

This is a system of N equations in the 2M + 1 unknowns X[m]. Since N ≥ Bτ ≥ 2M + 1 and the system matrix is built from distinct complex exponentials, the equations are of maximal rank 2M + 1, and the X[m], |m| ≤ M, are uniquely determined; in the critically sampled case, this step is simply an inverse discrete-time Fourier transform (IDTFT) of the samples.

2) Finding the coefficients of the filter A[m] that annihilates X[m]: We need to solve (17) for A[k], k = 1, …, K, given A[0] = 1. For example, pick K = 2 to illustrate; writing (17) for m = 1, 2, we have that (17) is equivalent to

[ X[0]  X[−1] ] [ A[1] ]     [ X[1] ]
[ X[1]  X[0]  ] [ A[2] ]  = −[ X[2] ].   (23)

In general, at critical sampling, we have a system given by

[ X[0]     X[−1]   …  X[−K+1] ] [ A[1] ]     [ X[1] ]
[ X[1]     X[0]    …  X[−K+2] ] [ A[2] ]  = −[ X[2] ]
[   ⋮         ⋮            ⋮  ] [   ⋮  ]     [   ⋮  ]
[ X[K−1]   X[K−2]  …  X[0]    ] [ A[K] ]     [ X[K] ].   (24)

This is a classic Yule–Walker system [4], which in our case has a unique solution when there are K distinct Diracs in x(t) because there is a unique annihilating filter.

3) Factoring A(z): Given the coefficients 1, A[1], …, A[K], we factor the z-transform A(z) into its roots

A(z) = Π_{k=0}^{K−1} (1 − u_k z^{−1})   (25)

where u_k = e^{−j2π t_k/τ}, which leads to the locations t_k.

4) Finding the weights c_k: Given the locations t_k, we can write K of the values of X[m] as linear combinations of exponentials following (13). Again, for K = 2 and m = 0, 1,

[ 1    1   ] [ c_0 ]       [ X[0] ]
[ u_0  u_1 ] [ c_1 ]  = τ ·[ X[1] ].   (26)

In general, using m = 0, …, K − 1, the system of equations is given by

[ 1           1          …  1            ] [ c_0     ]       [ X[0]   ]
[ u_0         u_1        …  u_{K−1}      ] [ c_1     ]  = τ ·[ X[1]   ]
[  ⋮           ⋮                ⋮        ] [  ⋮      ]       [  ⋮     ]
[ u_0^{K−1}   u_1^{K−1}  …  u_{K−1}^{K−1}] [ c_{K−1} ]       [ X[K−1] ]   (27)

which is a Vandermonde system, which always has a solution when the u_k's are distinct.

An interpretation of the above result is the following. Take x(t), and project it onto the lowpass subspace corresponding to its Fourier series coefficients −M to M. This projection, which is denoted by y(t), is a unique representation of x(t). Since it is lowpass with bandwidth B, it can be sampled with a sampling period smaller than 1/B. Note that Step 2 in the proof required 2K adjacent coefficients of X[m]. We chose the ones around the origin for simplicity, but any set will do. In particular, if the set is of the form {m_0, m_0 + 1, …, m_0 + 2K − 1}, then one can use bandpass sampling and recover x(t) as well.
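The four steps above can be sketched numerically. The following is a minimal illustrative sketch (not the authors' code) using NumPy: it synthesizes the Fourier coefficients (13) of a known Dirac stream, solves the Yule–Walker system (24) for the annihilating filter, roots the filter to get the locations, and solves the Vandermonde system (27) for the weights. The test signal and all variable names are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: K Diracs in one period tau (locations t_k, weights c_k).
K, tau = 4, 1.0
t_true = np.sort(rng.uniform(0, tau, K))
c_true = rng.uniform(1, 2, K)

# Fourier series coefficients X[m], m = -M..M, with M = K (eq. (13)).
M = K
m = np.arange(-M, M + 1)
X = (c_true[None, :] * np.exp(-2j * np.pi * np.outer(m, t_true) / tau)).sum(1) / tau

# Step 2: annihilating filter from the Yule-Walker system (24):
# sum_k A[k] X[m-k] = 0 with A[0] = 1 (indices into X are offset by M).
T_mat = np.array([[X[M + i - k] for k in range(1, K + 1)] for i in range(1, K + 1)])
rhs = -np.array([X[M + i] for i in range(1, K + 1)])
A = np.concatenate(([1.0], np.linalg.solve(T_mat, rhs)))

# Step 3: roots of A(z) are u_k = exp(-j 2 pi t_k / tau), hence the locations.
u = np.roots(A)
t_rec = np.sort(np.mod(-np.angle(u) * tau / (2 * np.pi), tau))

# Step 4: weights from the Vandermonde system (27) on K coefficients.
V = np.exp(-2j * np.pi * np.outer(m[:K], t_rec) / tau) / tau
c_rec = np.real(np.linalg.lstsq(V, X[:K], rcond=None)[0])

assert np.allclose(t_rec, t_true, atol=1e-6)
assert np.allclose(c_rec, c_true, atol=1e-5)
```

In the noiseless, critically sampled case, the recovery is exact to numerical precision, which mirrors the "machine precision" reconstructions reported in the figures.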
Example 1: Consider a periodic stream of K = 8 weighted Diracs with period τ = 1024 and a sinc sampling kernel, as illustrated in Fig. 2(a) and (b). The lowpass approximation is obtained by filtering the stream of Diracs with the sampling kernel, as illustrated in Fig. 2(c). The reconstruction of x(t) from the samples is exact to machine precision and, thus, is not shown. The annihilating filter is illustrated in Fig. 3, and it can be seen that the locations of the Diracs are exactly the roots of the filter.

In the third step of the proof of Theorem 1, there is a factorization step. This can be avoided by a method that is familiar in the coding literature and known as the Berlekamp–Massey algorithm [1]. In our case, it amounts to a spectral extrapolation procedure.

Corollary 1: Given the annihilating filter A[k], k = 0, …, K, and the spectrum X[m], |m| ≤ M, one can recover the entire spectrum of x(t) by the following recursion:

X[m] = −Σ_{k=1}^{K} A[k] X[m − k], m > M.   (28)

For negative m, use X[−m] = X*[m] since x(t) is real.
Proof: The Toeplitz system specifying the annihilating filter shows that the recursion must be satisfied for all m. This is a Kth-order recursive difference equation. Such a difference equation is uniquely specified by K initial conditions, in our case by X[0], …, X[K − 1], showing that the entire Fourier series is specified by (28).
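The recursion (28) can be checked directly. The sketch below (an illustrative assumption, not code from the paper) extrapolates the spectrum of a two-Dirac stream from its first two coefficients and compares against the directly computed coefficients:

```python
import numpy as np

# Two Diracs in a period tau = 1: locations and weights chosen for the demo.
tau = 1.0
t = np.array([0.2, 0.7])
c = np.array([1.0, 0.5])

def Xcoef(m):
    # Fourier series coefficient X[m] of the periodic Dirac stream (eq. (13)).
    return (c * np.exp(-2j * np.pi * m * t / tau)).sum() / tau

# Annihilating filter A(z) = prod_k (1 - u_k z^{-1}), u_k = exp(-j 2 pi t_k / tau).
u = np.exp(-2j * np.pi * t / tau)
A = np.poly(u)            # coefficients [1, A[1], A[2]]

# Corollary 1: extrapolate X[m] for m = 2, 3, ... from X[0], X[1] via
# X[m] = -A[1] X[m-1] - A[2] X[m-2]  (eq. (28)).
X = [Xcoef(0), Xcoef(1)]
for m in range(2, 12):
    X.append(-(A[1] * X[-1] + A[2] * X[-2]))

direct = np.array([Xcoef(m) for m in range(12)])
assert np.allclose(np.array(X), direct, atol=1e-9)
```

With exact data the extrapolation matches; as the corollary's discussion notes, the poles of 1/A(z) on the unit circle make this marginally stable, so in floating point the recursion should only be run for a modest number of steps.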
Fig. 2. Stream of Diracs. (a) Periodic stream of K = 8 weighted Diracs with period τ = 1024. (b) Sinc sampling kernel h_B(t). (c) Lowpass approximation y(t); dashed lines are samples y_s(nT), T = 32. Sample values y_s(nT) = ⟨h_B(t − nT), x(t)⟩. (d) Fourier series X[m] (dB). (e) Central Fourier series H_B[m], m = −50, …, 50 (dB). (f) Central Fourier series of sample values Y[m], m = −50, …, 50 (dB).

Fig. 3. Real and imaginary parts of the annihilating filter a(t) = Σ_k A[k] e^{−j2πkt/τ}. The roots of the annihilating filter are exactly the locations of the Diracs.

The above recursion is routinely used in error-correction coding in order to fill in the error spectrum based on the error locator polynomial. In that application, finite fields are involved, and there is no problem with numerical precision. In our case, the real and complex field is involved, and numerical stability is an issue. More precisely, A(z) has zeros on the unit circle, and the recursion (28) corresponds to convolution with 1/A(z), that is, a filter with poles on the unit circle. Such a filter is marginally stable and will lead to numerical instability. Thus, Corollary 1 is mostly of theoretical interest.

B. Nonuniform Splines

In this section, we consider periodic nonuniform splines of period τ. A signal x(t) is a periodic nonuniform spline of degree R with knots at t_k, k = 0, …, K − 1, if and only if its (R + 1)th derivative is a periodic stream of K weighted Diracs

x^{(R+1)}(t) = Σ_{n∈ℤ} Σ_{k=0}^{K−1} c_k δ(t − t_k − nτ).

Thus, the rate of innovation is

ρ = 2K/τ.   (29)

Using (12), we can state that the Fourier series coefficients of x^{(R+1)}(t) are X^{(R+1)}[m] = (1/τ) Σ_{k=0}^{K−1} c_k e^{−j2πm t_k/τ}. Differentiating (10) R + 1 times shows that these coefficients are

X^{(R+1)}[m] = (j2πm/τ)^{R+1} X[m].   (30)

This shows that X^{(R+1)}[m] can be annihilated by a filter A[m] of length K + 1. From Theorem 1, we can recover the periodic stream of K Diracs from the Fourier series coefficients X^{(R+1)}[m], and thus follows the periodic nonuniform spline.

Theorem 2: Consider a periodic nonuniform spline x(t) with period τ, containing K pieces of maximum degree R. Take a sinc sampling kernel h_B(t) such that B is greater or equal to the rate of innovation given by (29), and sample x(t) at N ≥ Bτ uniform locations t_n = nT, where T = τ/N, and n = 0, …, N − 1. Then, x(t) is uniquely represented by the samples

y_s(nT) = ⟨ h_B(t − nT), x(t) ⟩, n = 0, …, N − 1.   (31)

Proof: The proof is similar to the proof of Theorem 1. We determine the Fourier series coefficients X[m], |m| ≤ M, from the samples y_s(nT), exactly as in Step 1 in the proof of Theorem 1.
Fig. 4. Nonuniform spline. (a) Periodic nonuniform linear spline with K = 4 knots or transitions with period τ = 1024. (b) Sinc sampling kernel h_B(t). (c) Lowpass approximation y(t); dashed lines are samples y_s(nT), T = 64. Sample values y_s(nT) = ⟨h_B(t − nT), x(t)⟩. (d) Fourier series X[m] (dB). (e) Central Fourier series H_B[m], m = −50, …, 50 (dB). (f) Central Fourier series Y[m], m = −50, …, 50 (dB). Reconstruction is within machine precision.
Note that the Fourier series coefficients of the (R + 1) times differentiated nonuniform spline are given by (30); therefore, the values X[m] provide X^{(R+1)}[m]. We substitute X^{(R+1)}[m] for X[m] in Steps 2–4 in the proof of Theorem 1 and thus obtain the stream of K Diracs, that is, the locations t_k and weights c_k. The nonuniform spline x(t) (see Fig. 4) is obtained using (30) to get the missing Fourier series coefficients X[m] (that is, |m| > M; note that X[0] is obtained directly) and substituting these values in (10).
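The key step peculiar to splines is the differentiation relation (30), which converts spline coefficients into Dirac-stream coefficients and back. A minimal sketch (illustrative only; the spectrum here is a random stand-in, not an actual spline):

```python
import numpy as np

# Relation (30): if x(t) is a periodic nonuniform spline of degree R, then
#     Xd[m] = (j 2 pi m / tau)^(R+1) * X[m]
# are the Fourier coefficients of its (R+1)th derivative.  Once the Dirac
# stream is recovered (hence Xd[m] for all m), the spline coefficients
# follow by division -- except X[0], which is obtained directly from the
# samples (it is the mean of the signal).
tau, R = 1.0, 1
m = np.arange(-5, 6)

rng = np.random.default_rng(1)
X = rng.normal(size=m.size) + 1j * rng.normal(size=m.size)  # stand-in spectrum

factor = (2j * np.pi * m / tau) ** (R + 1)
Xd = factor * X                       # forward direction of (30)

X_rec = np.zeros_like(X)
nz = m != 0
X_rec[nz] = Xd[nz] / factor[nz]       # invert (30) for m != 0
X_rec[~nz] = X[~nz]                   # X[0] is obtained directly

assert np.allclose(X_rec, X)
```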
C. Derivative of Diracs

Derivatives of Diracs are considered in this section to set the grounds for the following section on piecewise polynomial signals. The Dirac δ(t) is a distribution whose rth derivative δ^{(r)}(t) has the property ⟨δ^{(r)}(t − t_0), f(t)⟩ = (−1)^r f^{(r)}(t_0), where f(t) is r times continuously differentiable [9].

Consider a periodic stream of K differentiated Diracs

x(t) = Σ_{n∈ℤ} Σ_{k=0}^{K−1} Σ_{r=0}^{R−1} c_{kr} δ^{(r)}(t − t_k − nτ)   (32)

with the usual periodicity conditions, locations t_k ∈ [0, τ), and weights c_{kr}. Note that there are K locations and KR weights; that makes at most K(R + 1) degrees of freedom per period τ, that is, the rate of innovation is

ρ = K(R + 1)/τ.   (33)

The corresponding Fourier series coefficients are given by

X[m] = (1/τ) Σ_{k=0}^{K−1} Σ_{r=0}^{R−1} c_{kr} (j2πm/τ)^r e^{−j2πm t_k/τ}.   (34)

Let u_k = e^{−j2π t_k/τ}; then, the Fourier series simplifies to

X[m] = Σ_{k=0}^{K−1} p_k(m) u_k^m   (35)

where p_k(m) = (1/τ) Σ_{r=0}^{R−1} c_{kr} (j2πm/τ)^r is a polynomial of degree R − 1 in m. From Proposition 4 in Appendix A, the filter (1 − u_k z^{−1})^R annihilates the exponential p_k(m) u_k^m, and therefore, the filter defined by

A(z) = Π_{k=0}^{K−1} (1 − u_k z^{−1})^R   (36)

with K zeros of multiplicity R at the u_k annihilates X[m]. The locations of the differentiated Diracs are obtained by first finding the annihilating filter coefficients A[k], k = 0, …, KR, and then finding the roots of A(z). The weights c_{kr}, on the other hand, are found by solving the system in (35) for the c_{kr}.

Theorem 3: Consider a periodic stream of differentiated Diracs x(t) with period τ, as in (32). Take as a sampling kernel h_B(t), where B is greater or equal to the rate of innovation given by (33), and sample x(t) at N ≥ Bτ uniform locations t_n = nT, where T = τ/N and n = 0, …, N − 1. Then, the samples

y_s(nT) = ⟨ h_B(t − nT), x(t) ⟩   (37)

are a sufficient characterization of x(t).
Fig. 5. Piecewise polynomial spline. (a) Periodic piecewise linear spline with K = 4 knots or transitions with period τ = 1024. (b) Sinc sampling kernel h_B(t). (c) Lowpass approximation y(t); dashed lines are samples y_s(nT), T = 32. Sample values y_s(nT) = ⟨h_B(t − nT), x(t)⟩. (d) Fourier series X[m] (dB). (e) Central Fourier series H_B[m], m = −50, …, 50 (dB). (f) Central Fourier series of sample values Y[m], m = −50, …, 50 (dB). Reconstruction from the samples is within machine precision.
Proof: We follow the same steps as in the proof of Theorem 1. From the N samples y_s(nT), with N ≥ Bτ, we obtain the values X[m], |m| ≤ M, from which we determine the annihilating filter coefficients A[k], k = 0, …, KR. In Step 3, the annihilating filter is factored as in (36), from which we obtain the multiple roots u_k and, therefore, the K locations t_k. The KR weights c_{kr} are found by solving the generalized Vandermonde system in (35) for the c_{kr}. Similar to usual Vandermonde matrices, the determinant of the matrix given by the system vanishes only when u_k = u_l for some k ≠ l. This is not our case, and thus, the matrix is nonsingular and, therefore, admits a unique solution.
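The fact behind (36) — a filter with a repeated zero annihilates a polynomially weighted exponential — can be verified numerically. A small illustrative check (values chosen for the demo):

```python
import numpy as np
from math import comb

# A filter with an (R+1)-fold zero at u annihilates any sequence of the
# form p(m) u^m with deg(p) <= R; this is the fact behind the multiple-root
# annihilator (36) used for streams of differentiated Diracs.
R = 2
u = np.exp(-2j * np.pi * 0.3)        # u_k for a location t_k = 0.3, tau = 1
m = np.arange(20)
p = 1.5 - 0.7 * m + 0.2 * m**2       # polynomial weight of degree R
seq = p * u**m

# Coefficients of (1 - u z^{-1})^(R+1) by the binomial theorem.
a = np.array([comb(R + 1, k) * (-u) ** k for k in range(R + 2)])

# The fully overlapping part of the convolution is identically zero.
out = np.convolve(a, seq)[R + 1 : len(seq)]
assert np.max(np.abs(out)) < 1e-9
```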
D. Piecewise Polynomials

Similar to the definition of a periodic nonuniform spline, a signal x(t) is a periodic piecewise polynomial with K pieces, each of maximum degree R, if and only if its (R + 1)th derivative is a stream of differentiated Diracs, that is, given by

x^{(R+1)}(t) = Σ_{n∈ℤ} Σ_{k=0}^{K−1} Σ_{r=0}^{R} c_{kr} δ^{(r)}(t − t_k − nτ)

with the usual periodicity conditions. The degrees of freedom per period are K from the locations and K(R + 1) from the weights; thus, the rate of innovation is

ρ = K(R + 2)/τ.   (38)

By analogy with nonuniform splines, we have the following.

Theorem 4: Consider a periodic piecewise polynomial x(t) with period τ, containing K pieces of maximum degree R. Take a sinc sampling kernel h_B(t) such that B is greater or equal to the rate of innovation given by (38), and sample x(t) at N ≥ Bτ uniform locations t_n = nT, where T = τ/N, and n = 0, …, N − 1. Then, x(t) is uniquely represented by the samples

y_s(nT) = ⟨ h_B(t − nT), x(t) ⟩.   (39)
Proof: The proof follows the same steps as the proofs of Theorems 2 and 3. First, from the N samples y_s(nT), we obtain the Fourier series coefficients X[m] of the periodic piecewise polynomial signal. Then, using (30), we obtain the Fourier series coefficients X^{(R+1)}[m] of the (R + 1) times differentiated signal, which is a stream of differentiated Diracs. Using Theorem 3, we are able to recover the stream of differentiated Diracs from X^{(R+1)}[m]. Similar to the periodic nonuniform spline, the periodic piecewise polynomial x(t) (see Fig. 5) is obtained using (30) to get the missing X[m] (that is, |m| > M) and substituting these values in (10).
IV. FINITE-LENGTH SIGNALS WITH FINITE RATE OF INNOVATION

A finite-length signal with finite rate of innovation clearly has a finite number of degrees of freedom. The question of interest is as follows: Given a sampling kernel with infinite support, is there a finite set of samples that uniquely specifies the signal? In the following sections, we sample signals made of a finite number of weighted Diracs using infinitely supported sampling kernels such as the sinc and the Gaussian.
A. Sinc Sampling Kernel

Consider a continuous-time signal with a finite number of weighted Diracs

x(t) = Σ_{k=0}^{K−1} c_k δ(t − t_k)   (40)

and the sinc sampling kernel. The sample values are obtained by filtering the signal with a sinc sampling kernel. This is equivalent to taking the inner product between a shifted version of the sinc and the signal, that is, y_n = ⟨ x(t), sinc(t/T − n) ⟩, n = 0, …, N − 1. The question that arises is the following: How many of these samples do we need to recover the signal? The signal has 2K degrees of freedom: K from the weights and K from the locations of the Diracs; thus, N ≥ 2K + 1 samples will be sufficient to recover the signal. Similar to the previous cases, the reconstruction method will require solving two systems of linear equations: one for the locations of the Diracs involving a matrix S and one for the weights of the Diracs involving a matrix V. These systems admit solutions if the following conditions are satisfied.

C1] The matrix S defined by (45) has rank K, so that the solution of (46) is unique up to a scalar factor.
C2] The matrix V defined by (44) has rank K.

Theorem 5: Given a finite stream of K weighted Diracs and a sinc sampling kernel sinc(t/T), if conditions C1] and C2] are satisfied, then the N ≥ 2K + 1 samples

y_n = ⟨ x(t), sinc(t/T − n) ⟩, n = 0, …, N − 1   (41)

are a sufficient representation of the signal.

Proof: The sample values in (41) are equivalent to

y_n = Σ_{k=0}^{K−1} c_k sinc(t_k/T − n).   (42)

Since sin(π(t_k/T − n)) = (−1)^n sin(π t_k/T), each term is of the form λ_k/(t_k/T − n). Let us define the degree-K polynomial P(t) = Π_{k=0}^{K−1} (t_k/T − t), whose coefficients involve the unknown locations. Multiplying both sides of (42) by (−1)^n P(n), we find an expression in terms of the interpolating (Lagrange-type) polynomials

(−1)^n y_n P(n) = Σ_{k=0}^{K−1} (c_k sin(π t_k/T)/π) Π_{l≠k} (t_l/T − n)   (43)

while the weights themselves satisfy the linear system

y = V c, with V_{nk} = sinc(t_k/T − n).   (44)

Since the right-hand side of (43) is a polynomial of degree K − 1 in the variable n, applying K finite differences makes the left-hand side vanish,⁴ that is,

Δ^K [ (−1)^n y_n P(n) ] = 0.   (45)

⁴Note that the finite-difference operator plays the same role as the annihilating filter in the previous section.

If we expand P(n) = Σ_{r=0}^{K} p_r n^r, then this annihilating equation is equivalent to

S p = 0   (46)

where p = (p_0, …, p_K)^T and S is a matrix built from the finite differences of the weighted samples. The system (46) admits a nontrivial solution under condition C1]. Therefore, (45) can be used to find the unknown coefficients p_r, which lead to the locations t_k since the t_k/T are the roots of P(t). Once the locations are determined, the weights of the Diracs are found by solving the system in (44) for c_0, …, c_{K−1}, which admits a solution from condition C2].

Note that the result does not depend on T. This of course holds only in theory since, in practice, the matrix S may be ill-conditioned if T is not chosen appropriately. A natural solution to this conditioning problem is to take more than the critical number of samples and solve (46) using a singular value decomposition (SVD). This is also the method of choice when noise is present in the signal. The matrix V, which is used to find the weights of the Diracs, is less sensitive to the value of T and better conditioned on average than S.

B. Gaussian Sampling Kernel

Consider sampling the same signal as in (40) but, this time, with a Gaussian sampling kernel h_σ(t) = e^{−t²/(2σ²)}. Similar to the sinc sampling kernel, the samples are obtained by filtering the signal with a Gaussian kernel. Since there are 2K unknown variables, we show next that N ≥ 2K samples are sufficient to represent the signal.

Theorem 6: Given a finite stream of K weighted Diracs and a Gaussian sampling kernel h_σ(t) = e^{−t²/(2σ²)}. If N ≥ 2K, then the N sample values

y_n = ⟨ x(t), h_σ(t − nT) ⟩, n = 0, …, N − 1   (47)

are sufficient to reconstruct the signal.

Proof: The sample values are given by

y_n = Σ_{k=0}^{K−1} c_k e^{−(nT − t_k)²/(2σ²)}.   (48)

If we let a_k = c_k e^{−t_k²/(2σ²)} and u_k = e^{t_k T/σ²}, then (48) is equivalent to

y_n e^{n²T²/(2σ²)} = Σ_{k=0}^{K−1} a_k u_k^n.   (49)

Note that we reduced the expression to a linear combination of K real exponentials u_k^n. Since N ≥ 2K, the annihilating filter method described in Appendix B allows us to determine the u_k and a_k. In addition, note that the Toeplitz system in (68) has K real exponential components, and therefore, a solution exists when the number of equations is greater than the number of unknowns, that is, N − K ≥ K, and the rank of the system is equal to K, which is the case by hypothesis. Furthermore, σ must be carefully chosen; otherwise, the system is ill-conditioned. The locations are then given by

t_k = σ² ln(u_k)/T   (50)

and the weights of the Diracs are simply given by

c_k = a_k e^{t_k²/(2σ²)}.   (51)

Here, unlike in the sinc case, we have an almost local reconstruction because of the exponential decay of the Gaussian sampling kernel, which brings us to the next topic.

Fig. 6. (a) Hat spline sampling kernel β₁(t/T), T = 1. (b) Bilevel signal with up to two transitions in an interval [n, n + 1] sampled with a hat sampling kernel.
V. INFINITE-LENGTH BILEVEL SIGNALS WITH FINITE LOCAL RATE OF INNOVATION

In this section, we consider the dual problem of Section IV, that is, infinite-length signals x(t) with a finite local rate of innovation and sampling kernels with compact support [19]. In particular, the B-splines of different degree d are considered [17]

β_d(t) = (β_0 ∗ β_0 ∗ ⋯ ∗ β_0)(t)  (d + 1 factors)   (52)

where the box spline is defined by β_0(t) = 1 for t ∈ [0, 1) and 0 elsewhere.

We develop local reconstruction algorithms that depend on moving intervals equal to the size of the support of the sampling kernel.⁵ The advantage of local reconstruction algorithms is that their complexity does not depend on the length of the signal. In this section, we consider bilevel signals, but similar to the previous sections, the results can be extended to piecewise polynomials.

⁵The size of the support of β_d(t/T) is equal to (d + 1)T.

Let x(t) be an infinite-length continuous-time signal that takes on two values, 0 and 1, with initial condition x(0) = 0 and with a finite local rate of innovation ρ_max. These are called bilevel signals and are completely represented by their transition values t_k.

Suppose a bilevel signal is sampled with a box spline β_0(t/T). The sample values obtained are y_s[n] = ⟨ x(t), β_0(t/T − n) ⟩, which correspond to the area occupied by the signal in the interval [nT, (n + 1)T). Thus, if there is at most one transition per box, then we can recover the transition from the sample. This leads us to the following proposition.
Proposition 1: A bilevel signal x(t) with initial condition x(0) = 0 is uniquely determined from the samples y_s[n] = ⟨ x(t), β_0(t/T − n) ⟩, where β_0 is the box spline, if and only if there is at most one transition t_k in each interval [nT, (n + 1)T).

Proof: For simplicity, let T = 1, and suppose that x(n) = 0. If there are no transitions in the interval [n, n + 1), then the sample value is y_s[n] = 0. If there is one transition t_k in the interval [n, n + 1), then the sample value is equal to y_s[n] = n + 1 − t_k, from which we uniquely obtain the transition value t_k = n + 1 − y_s[n].

To show necessity, suppose there are two transitions t_k, t_{k+1} in the interval [n, n + 1); then, the sample value is equal to y_s[n] = t_{k+1} − t_k and is not sufficient to determine both transitions. Thus, there must be at most one transition in an interval [n, n + 1) to uniquely define the signal.
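The decoding in Proposition 1 is purely local and can be sketched in a few lines. A minimal sketch for T = 1 (function name and test signal are illustrative):

```python
import numpy as np

# Local decoding for Proposition 1: T = 1 box kernel, at most one
# transition per interval [n, n+1), initial condition x(0) = 0.
def decode_box_samples(y, tol=1e-12):
    """Recover the transition instants of a bilevel signal from box samples."""
    state = 0                     # signal value at the left edge of the box
    transitions = []
    for n, yn in enumerate(y):
        if abs(yn - state) < tol:
            continue              # no transition in [n, n+1)
        if state == 0:            # rising edge: sample is the area after it
            transitions.append(n + 1 - yn)
        else:                     # falling edge: sample is the area before it
            transitions.append(n + yn)
        state = 1 - state
    return transitions

# Bilevel signal rising at t = 0.25 and falling at t = 2.6: the samples
# are the occupied area in each unit interval.
y = [0.75, 1.0, 0.6, 0.0]
t = decode_box_samples(y)
assert np.allclose(t, [0.25, 2.6])
```

As the proposition states, the scheme runs interval by interval, so its cost is independent of the signal length.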
Now, consider shifting the bilevel signal by an unknown shift; then, there can be two transitions in an interval of length T, and one box function will not be sufficient to recover the transitions. Suppose we double the sampling rate; then, the support of the box sampling kernel covers two sampling intervals, and we have two sample values y_s[n], y_s[n + 1] covering the interval [n, n + 1], but these values are identical (they measure the same areas). Therefore, increasing the sampling rate is still insufficient.

This brings us to consider a sampling kernel not only with a larger support but with added information. For example, the hat spline function β_1(t) = t for t ∈ [0, 1], 2 − t for t ∈ [1, 2], and 0 elsewhere leads to the sample values defined by y_s[n] = ⟨ x(t), β_1(t/T − n) ⟩. Fig. 6 illustrates that there are two sample values covering the interval [n, n + 1] from which we can uniquely determine the signal.
Proposition 2: An infinite-length bilevel signal x(t), with initial condition x(0) = 0, is uniquely determined from the samples defined by y_s[n] = ⟨ x(t), β_1(t/T − n) ⟩, where β_1 is the hat sampling kernel, if and only if there are at most two transitions t_k in each interval [nT, (n + 2)T).

Proof: Again, for simplicity, let T = 1, and suppose the signal is known for t < n and x(n) = 0. First, we show sufficiency by showing the existence and uniqueness of a solution. Then, we show necessity by a counterexample.

Similar to the box sampling kernel, the sample values y_s[n] will depend on the configuration of the transitions in the interval [n, n + 2]. If there are at most two transitions in the interval [n, n + 2], then the possible configurations are (0,0), (0,1), (0,2), (1,0), (1,1), (2,0), where the first and
second component indicate the number of transitions in the
,
, respectively. Furthermore,
intervals
since the hat sampling kernel is of degree one, we obtain
for each configuration a quadratic system of equations with
variables ,
(53)
(54)
which admits a solution in the given interval.
As for uniqueness, the pair of sample values singles out the configuration. If the two sample values are consistent with no transition at all, then this implies configuration (0,0). If they are consistent with transitions only in the second subinterval, then the possible configurations are (0,1) and (0,2); by hypothesis, there are at most two transitions in the interval [n, n + 2], and the precise sample values decide whether the configuration is (0,1) or (0,2). Likewise, the remaining sample-value patterns imply configurations (1,0), (1,1), and (2,0).
Necessity is shown by counterexample. Consider a bilevel signal with three transitions in the interval [0,2], all three lying in the interval [0,1]. Then, the quadratic system of equations, (55) and (56), does not admit a unique solution but an infinite number of solutions. Thus, there must be at most two transitions in an interval [0,2].
The pseudocode for sampling bilevel signals using the box and hat functions is given in full detail in [10] and [22]. A stronger condition than the one in Proposition 2 is to bound the local rate of innovation over every interval of length 2T; in that case, we are ensured that on any interval of length 2T, there are at most two transitions, and therefore, the reconstruction is unique. Based on Propositions 1 and 2, we conjecture that, using splines of degree R, a corresponding bound on the local rate of innovation ensures unicity of the reconstruction.
VI. CONCLUSION
We considered signals with finite rate of innovation that
allow uniform sampling after appropriate smoothing and perfect reconstruction from the samples. For signals like streams
of Diracs, nonuniform splines, and piecewise polynomials, we
were able to derive exact reconstruction formulas, even though
these signals are nonbandlimited.
The methods rely on separating the innovation in a signal
from the rest and identifying the innovative part from the samples only. In particular, the annihilating filter method plays a
key role in isolating the innovation. Some extensions of these
results, like to piecewise bandlimited signals, are presented in
[11] and [21], and more details, including a full treatment of the
discrete-time case, are available in [10], [22], and [23].
To prove the sampling theorems, we assumed deterministic,
noiseless signals. In practice, noise will be present, and this can
be dealt with by using oversampling and solving the various systems involved using the singular value decomposition (SVD).
Such techniques are standard, for example, in noisy spectral estimation [16]. Initial investigations using these techniques in our
sampling problem are promising [13].
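The oversampling-plus-SVD strategy mentioned here can be sketched on toy data (illustrative values, not the experiments of [13]): the annihilating-filter system is solved in the total least-squares sense, min ‖S h‖ subject to ‖h‖ = 1, whose minimizer is the right singular vector of S associated with the smallest singular value.

```python
import numpy as np

# Noisy sum of two unit-modulus exponential modes (toy data)
rng = np.random.default_rng(0)
u = np.exp(2j * np.pi * np.array([0.2, 0.45]))
s = np.array([np.sum(u**m) for m in range(20)])
s = s + 0.01 * (rng.standard_normal(20) + 1j * rng.standard_normal(20))

# Toeplitz system with rows (s_m, s_{m-1}, s_{m-2}), m = 2, ..., 19
S = np.array([[s[m - l] for l in range(3)] for m in range(2, 20)])

# Total least-squares filter: right singular vector for the smallest
# singular value of S (rows of Vh are conjugated right singular vectors)
Vh = np.linalg.svd(S)[2]
h = Vh[-1].conj()
u_est = np.roots(h)           # mode locations = roots of the filter
print(np.sort_complex(u_est))
```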
It is also of interest to compare our notion of “finite rate
of innovation” with the classic Shannon bandwidth [12]. The
Shannon bandwidth finds the dimension of the subspace (per
unit of time) that allows us to represent the space of signals of
interest. For bandpass signals, where Nyquist’s rate is too large,
Shannon’s bandwidth is the correct notion, as it is for certain
other spread-spectrum signals. For pulse position modulation
(PPM) [7], Shannon’s bandwidth is proportional to the number
of possible positions, whereas the rate of innovation is fixed per
interval.6 Thus, the rate of innovation coincides with our intuition for degrees of freedom.
This discussion also indicates that an obvious application of
our results is in communications systems, like, for example, in
ultrawideband communication (UWB). In such a system, a very narrow pulse is generated, and its position carries the information. In [6], initial results indicate that a decoder can work at a much lower rate than Nyquist's rate by using our sampling
results. Finally, filtered streams of Diracs, known as shot noise
[7], can also be decoded with low sampling rates while still recovering the exact positions of the Diracs. Therefore, we expect
the first applications of our sampling results to appear in wideband communications.
Finally, the results presented so far raise a number of questions for further research. What other signals with finite rate of
innovation can be sampled and perfectly reconstructed? What
tools other than the annihilating filter can be used to identify innovation, and what other types of innovation can be identified?
What are the multidimensional equivalents of the above results,
and are they computationally feasible? These topics are currently under investigation. Thus, the connection between sampling theory on the one hand, and spectral analysis and error
correction coding on the other hand, is quite fruitful.
APPENDIX A
ANNIHILATING FILTERS
Here, we give a brief overview of annihilating filters (see [16]
for more details). First, we have the following definition.
Definition 2: A filter (h_m) is called an annihilating filter of a signal (s_m) when

(h ∗ s)_m = Σ_k h_k s_{m−k} = 0, for all m.   (57)
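A toy instance of Definition 2 (illustrative values u1, u2, x1, x2): the two-exponential signal s_m = x1·u1^m + x2·u2^m is annihilated by the filter whose z-transform is (1 − u1 z⁻¹)(1 − u2 z⁻¹), i.e., h = (1, −(u1 + u2), u1·u2).

```python
# Check that the convolution (h * s)_m vanishes for a two-exponential signal
u1, u2, x1, x2 = 0.9, -0.5, 2.0, 3.0
s = [x1 * u1**m + x2 * u2**m for m in range(12)]
h = [1.0, -(u1 + u2), u1 * u2]
conv = [sum(h[k] * s[m - k] for k in range(3)) for m in range(2, 12)]
print(max(abs(c) for c in conv))   # ~ 0, up to round-off
```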
6 The number of degrees of freedom per interval is one or two, depending on whether the amplitude is fixed or free.
Next, we give annihilating filters for signals that are linear combinations of exponentials.
Proposition 3: The signal s_m = Σ_{k=1}^{K} x_k u_k^m, where the u_k are distinct, is annihilated by the filter

H(z) = Σ_{k=0}^{K} h_k z^{−k} = Π_{k=1}^{K} (1 − u_k z^{−1}).   (58)

Proof: Note that

(h ∗ s)_m = Σ_{l=0}^{K} h_l s_{m−l}   (59)
= Σ_{l=0}^{K} h_l Σ_{k=1}^{K} x_k u_k^{m−l}   (60)
= Σ_{k=1}^{K} x_k u_k^m Σ_{l=0}^{K} h_l u_k^{−l} = Σ_{k=1}^{K} x_k u_k^m H(u_k) = 0,   (61)

since H(u_k) = 0 for k = 1, 2, …, K. Thus, H(z) annihilates s_m.

Next, we show that the signal s_m = m^r u^m, with a repeated pole u, is annihilated by the filter (1 − u z^{−1})^{r+1}.

Proposition 4: The signal s_m = m^r u^m is annihilated by the filter

H(z) = (1 − u z^{−1})^{r+1}.   (62)

Proof: Note that the impulse response of H(z) is

h_l = C(r+1, l) (−u)^l, l = 0, 1, …, r + 1,   (63)

where C(r+1, l) denotes the binomial coefficient, so that

(h ∗ s)_m = Σ_{l=0}^{r+1} C(r+1, l) (−u)^l (m − l)^r u^{m−l} = u^m Σ_{l=0}^{r+1} C(r+1, l) (−1)^l (m − l)^r.   (64)

By differentiating the identity (1 − x)^{r+1} = Σ_{l=0}^{r+1} C(r+1, l) (−1)^l x^l up to r times and evaluating at x = 1, we easily see that

Σ_{l=0}^{r+1} C(r+1, l) (−1)^l p(l) = 0.   (65)

This is true for all polynomials p of degree less than or equal to r; in particular, for p(l) = (m − l)^r. Thus, (1 − u z^{−1})^{r+1} annihilates s_m.

It follows from Propositions 3 and 4 that the filter H(z) = Π_{k=1}^{K} (1 − u_k z^{−1})^{r_k + 1} annihilates the signal s_m = Σ_{k=1}^{K} p_k(m) u_k^m, where p_k is a polynomial of degree r_k.

APPENDIX B
ANNIHILATING FILTER METHOD

The annihilating filter method consists of finding the values u_k and x_k in

s_m = Σ_{k=1}^{K} x_k u_k^m   (66)

and is composed of three parts: First, we need to find the annihilating filter, which involves solving a linear system of equations; second, we need to find the roots of the z-transform of the annihilating filter, which is a nonlinear function; third, we must solve another linear system of equations to find the weights.

1) Finding the annihilating filter: The filter coefficients h_k in H(z) must be such that (57) is satisfied, or

Σ_{l=0}^{K} h_l s_{m−l} = 0, for all m.   (67)

In matrix/vector form, the system in (67) is equivalent to the Toeplitz system

S h = 0,   (68)

where the rows of S contain successive shifts of the sequence s_m and h = (h_0, h_1, …, h_K)^T. Suppose values s_0, s_1, …, s_{N−1} are available. Since there are K unknown filter coefficients (recall that h_0 = 1 in (58)), we need at least K equations, and therefore, N must be greater than or equal to 2K. Define the appropriate submatrix; then, the system S h = 0 will admit a solution when Rank S = K.

In practice, this system is solved using an SVD, where the matrix is decomposed into S = U Σ V^H; the solution h is the right singular vector associated with the zero singular value of S. The method can be adapted to the noisy case by minimizing ‖S h‖ subject to ‖h‖ = 1, in which case h is given by the eigenvector associated with the smallest eigenvalue of S^H S.

2) Finding the u_k: Once the filter coefficients h_k are found, the values u_k are simply the roots of the annihilating filter H(z).

3) Finding the x_k: To determine the weights x_k, it suffices to take K equations in (66) and solve the system for x_1, x_2, …, x_K. Taking m = 0, 1, …, K − 1, we obtain, in matrix/vector form, the following Vandermonde system:

[ 1          1          ⋯  1          ] [ x_1 ]   [ s_0     ]
[ u_1        u_2        ⋯  u_K        ] [ x_2 ]   [ s_1     ]
[ ⋮          ⋮          ⋱  ⋮          ] [ ⋮   ] = [ ⋮       ]   (69)
[ u_1^{K−1}  u_2^{K−1}  ⋯  u_K^{K−1}  ] [ x_K ]   [ s_{K−1} ]

which has a unique solution when

u_k ≠ u_l, for k ≠ l.   (70)

This concludes the annihilating filter method.
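The three steps can be sketched in a few lines of Python on toy data (the values of u_k and x_k are illustrative; since the data are noiseless, step 1 uses the square system with h_0 = 1 rather than the SVD variant):

```python
import numpy as np

# Toy signal s_m = sum_k x_k u_k^m with K = 3 distinct modes
K = 3
u_true = np.exp(2j * np.pi * np.array([0.11, 0.35, 0.62]))
x_true = np.array([1.0, -2.0, 0.5])
N = 2 * K                                   # 2K values of s_m suffice
s = np.array([np.sum(x_true * u_true**m) for m in range(N)])

# 1) Annihilating filter: sum_{l=0}^{K} h_l s_{m-l} = 0 with h_0 = 1 gives
#    K linear equations (m = K, ..., N-1) in the unknowns h_1, ..., h_K.
S = np.array([[s[m - l] for l in range(1, K + 1)] for m in range(K, N)])
h = np.concatenate(([1.0], np.linalg.solve(S, -s[K:N])))

# 2) The u_k are the roots of H(z); np.roots expects descending powers,
#    which matches h = (h_0, ..., h_K).
u_rec = np.roots(h)

# 3) Weights from the K x K Vandermonde system s_m = sum_k x_k u_k^m
V = np.vander(u_rec, K, increasing=True).T  # V[m, k] = u_k^m
x_rec = np.linalg.solve(V, s[:K])
print(np.sort_complex(u_rec))               # the assumed u_k, up to ordering
```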
ACKNOWLEDGMENT
The authors would like to thank the reviewers for their
constructive remarks. Suggestions by Prof. S. Shamai on the
Shannon bandwidth and by Prof. E. Telatar and Prof. R. Urbanke
on the connection to error correction codes are gratefully acknowledged. A special thank you goes to Prof. P. J. S. G. Ferreira
for his invaluable suggestions.
REFERENCES
[1] R. E. Blahut, Theory and Practice of Error Control Codes. Reading,
MA: Addison-Wesley, 1983.
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory. New
York: Wiley Interscience, 1991.
[3] P. J. S. G. Ferreira and J. M. N. Vieira, “Locating and correcting errors in
images,” in Proc. IEEE Int. Conf. Image Process., vol. I, Santa Barbara,
CA, Oct 1997, pp. 691–694.
[4] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1996.
[5] A. J. Jerri, “The Shannon sampling theorem-its various extensions and
applications: A tutorial review,” Proc. IEEE, vol. 65, pp. 1565–1597,
1977.
[6] J. Kusuma, A. Ridolfi, and M. Vetterli, “Sampling of communication
systems with bandwidth expansion,” presented at the Int. Contr. Conf.,
2002.
[7] E. Lee and D. Messerschmitt, Digital Communication. Boston, MA:
Kluwer, 1994.
[8] S. Mallat, A Wavelet Tour of Signal Processing. San Diego, CA: Academic, 1998.
[9] J. E. Marsden, Elementary Classical Analysis. New York: Springer-Verlag, 1974.
[10] P. Marziliano, “Sampling Innovations,” Ph.D. dissertation, Swiss Fed.
Inst. Technol., Audiovis. Commun. Lab., Lausanne, Switzerland, 2001.
[11] P. Marziliano, M. Vetterli, and T. Blu, “A sampling theorem for a class
of piecewise bandlimited signals,” submitted for publication.
[12] J. L. Massey, “Toward an information theory of spread-spectrum systems,” in Code Division Multiple Access Communications, S.G. Glisic
and P.A. Leppanen, Eds. Boston, MA: Kluwer, 1995, pp. 29–46.
[13] A. Ridolfi, I. Maravic, J. Kusuma, and M. Vetterli, “Sampling signals
with finite rate of innovation: the noisy case,” Proc. ICASSP, 2002.
[14] T. J. Shan, A. M. Bruckstein, and T. Kailath, “Adaptive resolution of
overlapping echoes,” in Proc. IEEE Int. Symp. Circuits Syst., 1986.
[15] C. E. Shannon, “A mathematical theory of communication,” Bell Syst.
Tech. J., vol. 27, 1948.
[16] P. Stoica and R. Moses, Introduction to Spectral Analysis. Englewood
Cliffs, NJ: Prentice-Hall, 2000.
[17] M. Unser, “Splines: A perfect fit for signal and image processing,” IEEE
Signal Processing Mag., vol. 16, pp. 22–38, Nov. 1999.
[18] ——, “Sampling—50 years after Shannon,” Proc. IEEE, vol. 88, pp. 569–587, Apr. 2000.
[19] M. Vetterli, “Sampling of piecewise polynomial signals,” Swiss Fed.
Inst. Technol., Lausanne, Switzerland, LCAV, 1999.
[20] M. Vetterli and J. Kovačević, Wavelets and Subband
Coding. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[21] M. Vetterli, P. Marziliano, and T. Blu, “Sampling discrete-time piecewise bandlimited signals,” in Proc. Sampling Theory and Applications
Workshop, Orlando, FL, May 2001, pp. 97–102.
[22] ——, “Sampling signals with finite rate of innovation,” Swiss Fed. Inst. Technol., Lausanne, Switzerland, Tech. Rep. DSC-017, 2001.
[23] ——, “A sampling theorem for periodic piecewise polynomial signals,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Salt Lake City, UT, May 2001.
[24] J. M. N. Vieira and P. J. S. G. Ferreira, “Interpolation, spectrum analysis,
error-control coding, and fault-tolerant computing,” in Proc. IEEE Int.
Conf. Acoust., Speech, Signal Process., vol. 3, 1997, pp. 1831–1834.
[25] A. J. Weiss, A. S. Willsky, and B. C. Levy, “Eigenstructure approach
for array-processing with unknown intensity coefficients,” IEEE Trans.
Acoust., Speech, Signal Processing, vol. 36, pp. 1613–1617, Oct. 1988.
Martin Vetterli (F’95) received the Dipl. El.-Ing. degree from ETH Zürich (ETHZ), Zürich, Switzerland,
in 1981, the M.S. degree from Stanford University,
Stanford, CA, in 1982, and the D.Sc. degree from
the Swiss Federal Institute of Technology, (EPFL)
Lausanne, Switzerland, in 1986.
He was a Research Assistant at Stanford and EPFL
and has worked for Siemens and AT&T Bell Laboratories. In 1986, he joined Columbia University, New
York, NY, where he was last an Associate Professor
of electrical engineering and co-director of the Image
and Advanced Television Laboratory. In 1993, he joined the University of California, Berkeley, where he was a Professor with the Department of Electrical Engineering and Computer Science until 1997. He now holds an Adjunct Professor
position at Berkeley. Since 1995, he has been a Professor of communication systems at EPFL, where he chaired the Communications Systems Division from
1996 to 1997 and heads the Audio-Visual Communications Laboratory. Since
2001, he has directed the National Competence Center in Research on mobile
information and communication systems (www.terminodes.org). He has held
visiting positions at ETHZ (1990) and Stanford (1998). He is on the editorial
boards of Annals of Telecommunications, Applied and Computational Harmonic
Analysis, and The Journal of Fourier Analysis and Applications. He is the coauthor, with J. Kovačević, of the book Wavelets and Subband Coding (Englewood Cliffs, NJ: Prentice-Hall, 1995). He has published about 85 journal papers
on a variety of topics in signal/image processing and communications and holds
seven patents. His research interests include sampling, wavelets, multirate signal
processing, computational complexity, signal processing for communications,
digital video processing, and joint source/channel coding.
Dr. Vetterli is a member of SIAM and was the Area Editor for Speech,
Image, Video, and Signal Processing of the IEEE TRANSACTIONS ON
COMMUNICATIONS. He received the Best Paper Award from EURASIP in 1984
for his paper on multidimensional subband coding, the Research Prize of the
Brown Bovery Corporation (Switzerland) in 1986 for his doctoral thesis, and
the IEEE Signal Processing Society’s Senior Awards in 1991 and in 1996
(for papers with D. LeGall and K. Ramchandran, respectively). He won the
Swiss National Latsis Prize in 1996, the SPIE Presidential award in 1999, and
the IEEE Signal Processing Technical Achievement Award in 2001. He is a
member of the Swiss Council on Science and Technology (www.swtr.ch). He
was a plenary speaker at various conferences (e.g., 1992 IEEE ICASSP).
Pina Marziliano was born in Montréal, QC,
Canada, in 1971. She received the B.Sc. degree in
applied mathematics in 1994 and the M.Sc. degree
in computer science (operations research) in 1996,
both from the Université de Montréal. She received
the Ph.D. degree in communication systems from
the Swiss Federal Institute of Technology, Lausanne,
in 2001.
She is presently doing research on perceptual
quality metrics for images and video at Genista
Corporation, Genimedia SA, Lausanne. Her research
interests include Fourier and wavelet analysis, sampling theory applied to
audio-visual communications, and perceptual quality metrics.
Thierry Blu (M’96) was born in Orléans, France, in
1964. He received the Diplôme d’ingénieur degree
from École Polytechnique, Paris, France, in 1986 and
from Télécom Paris (ENST) in 1988. He received the
Ph.D. degree in electrical engineering in 1996 from
ENST for a study on iterated rational filterbanks applied to wideband audio coding.
He is currently with the Biomedical Imaging
Group at the Swiss Federal Institute of Technology
(EPFL), Lausanne, on leave from France Télécom
National Centre for Telecommunication Studies
(CNET), Issy-les-Moulineaux, France. His interests include (multi)wavelets,
multiresolution analysis, multirate filterbanks, approximation and sampling
theory, and psychoacoustics.
Sparse Sampling of Signal Innovations
[Theory, algorithms, and performance bounds]

[Thierry Blu, Pier-Luigi Dragotti, Martin Vetterli, Pina Marziliano, and Lionel Coulot]
Signal acquisition and reconstruction is at the heart of signal processing, and
sampling theorems provide the bridge between the continuous and the discrete-time worlds. The most celebrated and widely used sampling theorem is
often attributed to Shannon (and many others, from Whittaker to Kotel’nikov
and Nyquist, to name a few) and gives a sufficient condition, namely bandlimitedness, for an exact sampling and interpolation formula. The sampling rate, at twice the
maximum frequency present in the signal, is usually called the Nyquist rate.
Bandlimitedness, however, is not necessary, as is well known, but this is only rarely taken advantage of [1]. In this broader, nonbandlimited view, the question is: when can we acquire a
signal using a sampling kernel followed by uniform sampling and perfectly reconstruct it?
The Shannon case is a particular example, where any signal from the subspace of
bandlimited signals, denoted by BL, can be acquired through sampling and perfectly
Digital Object Identifier 10.1109/MSP.2007.914998
1053-5888/08/$25.00©2008IEEE
IEEE SIGNAL PROCESSING MAGAZINE [31] MARCH 2008
interpolated from the samples. Using the sinc kernel, or ideal low-pass filter, nonbandlimited signals will be projected onto the subspace BL. The question is: can we beat Shannon at this game, namely, acquire signals from outside of BL and still perfectly reconstruct? An obvious case is bandpass sampling and variations thereof. Less obvious are sampling schemes taking advantage of some sort of sparsity in the signal, and this is the central theme of this article. That is, instead of generic bandlimited signals, we consider the sampling of classes of nonbandlimited parametric signals. This allows us to circumvent Nyquist and perfectly sample and reconstruct signals using sparse sampling, at a rate characterized by how sparse they are per unit of time. In some sense, we sample at the rate of innovation of the signal by complying with Occam's razor principle [known as Lex Parcimoniæ or Law of Parsimony: Entia non sunt multiplicanda præter necessitatem, or, “Entities should not be multiplied beyond necessity” (from Wikipedia)].

Besides Shannon's sampling theorem, a second basic result that permeates signal processing is certainly Heisenberg's uncertainty principle, which suggests that a singular event in the frequency domain will necessarily be widely spread in the time domain. A superficial interpretation might lead one to believe that perfect frequency localization requires a very long time observation. That this is not necessary is demonstrated by high-resolution spectral analysis methods, which achieve very precise frequency localization using finite observation windows [2], [3]. The way around Heisenberg resides in a parametric approach, where the prior that the signal is a linear combination of sinusoids is put to contribution.

If by now you feel uneasy about slaloming around Nyquist, Shannon, and Heisenberg, do not worry. Estimation of sparse data is a classic problem in signal processing and communications, from estimating sinusoids in noise to locating errors in digital transmissions. Thus, there is a wide variety of available techniques and algorithms. Also, the best possible performance is given by the Cramér-Rao lower bounds for this parametric estimation problem, and one can thus check how close to optimal a solution actually is.

We are thus ready to pose the basic questions of this article. Assume a sparse signal (be it in continuous or discrete time) observed through a sampling device that is a smoothing kernel followed by regular or uniform sampling. What is the minimum sampling rate (as opposed to Nyquist's rate, which is often infinite in cases of interest) that allows us to recover the signal? What classes of sparse signals are possible? What are good observation kernels, and what are efficient and stable recovery algorithms? How does observation noise influence recovery, and what algorithms will approach optimal performance? How will these new techniques impact practical applications, from inverse problems to wideband communications? And finally, what is the relationship between the presented methods and classic methods, as well as the recent advances in compressed sensing and sampling?

SIGNALS WITH FINITE RATE OF INNOVATION
Using the sinc kernel (defined as sinc(t) = sin(πt)/(πt)), a signal x(t) bandlimited to [−B/2, B/2] can be expressed as

x(t) = Σ_{k∈Z} x_k sinc(Bt − k),   (1)

where x_k = ⟨B sinc(Bt − k), x(t)⟩ = x(k/B), as stated by Shannon in his classic 1948 paper [4]. Alternatively, we can say that x(t) has B degrees of freedom per second, since x(t) is exactly defined by a sequence of real numbers {x_k}_{k∈Z}, spaced T = 1/B seconds apart. It is natural to call this the rate of innovation of the bandlimited process, denoted by ρ and equal to B.

A generalization of the space of bandlimited signals is the space of shift-invariant signals. Given a basis function ϕ(t) that is orthogonal to its shifts by multiples of T, or ⟨ϕ(t − kT), ϕ(t − k′T)⟩ = δ_{k−k′}, the space of functions obtained by replacing sinc with ϕ in (1) defines a shift-invariant space S. For such functions, the rate of innovation is again equal to ρ = 1/T.

Now, let us turn our attention to a generic sparse source, namely a Poisson process, which is a set of Dirac pulses, Σ_{k∈Z} δ(t − t_k), where t_k − t_{k−1} is exponentially distributed with p.d.f. λe^{−λt}. Here, the innovations are the set of positions {t_k}_{k∈Z}. Thus, the rate of innovation is the average number of Diracs per unit of time: ρ = lim_{T→∞} C_T/T, where C_T is the number of Diracs in the interval [−T/2, T/2]. This parallels the notion of information rate of a source based on the average entropy per unit of time introduced by Shannon in the same 1948 paper. In the Poisson case with decay rate λ, the average delay between two Diracs is 1/λ; thus, the rate of innovation ρ is equal to λ. A generalization involves weighted Diracs, or

x(t) = Σ_{k∈Z} x_k δ(t − t_k).

By similar arguments, ρ = 2λ in this case, since both positions and weights are degrees of freedom. Note that this class of signals is not a subspace, and its estimation is a nonlinear problem.

Now comes the obvious question: is there a sampling theorem for the type of sparse processes just seen? That is, can we acquire such a process by taking about ρ samples per unit of time, and perfectly reconstruct the original process, just as the Shannon sampling procedure does?
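The rate-of-innovation computation for the Poisson stream can be checked empirically (the rate λ and the horizon T below are illustrative choices): the count of Diracs per unit of time converges to λ.

```python
import random

# Draw exponential inter-arrival times and count Diracs up to time T
random.seed(1)
lam, T = 3.0, 10000.0
t, C_T = 0.0, 0
while True:
    t += random.expovariate(lam)   # gap with p.d.f. lam * exp(-lam * t)
    if t > T:
        break
    C_T += 1
print(C_T / T)                     # close to lam = 3.0
```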
The necessary sampling rate is clearly ρ, the rate of innovation. That it is also sufficient can be shown in a number of cases of interest. The archetypal sparse signal is
the sum of Diracs, observed through a suitable sampling
kernel. In this case, sampling theorems at the rate of innovation can be proven. Beyond the question of a representation theorem, we also derive efficient computational
procedures, showing the practicality of the approach. Next
comes the question of robustness to noise and optimal estimation procedures under these conditions. We propose
algorithms to estimate sparse signals in noise that achieve
performance close to optimal. This is done by computing
Cramér-Rao bounds that indicate the best performance of
an unbiased estimation of the innovation parameters. Note
that when the signal-to-noise ratio (SNR) is poor, the algorithms are iterative and thus trade computational complexity for estimation performance.
To easily navigate through the article, the reader will find the
most frequent notations in Table 1 that will be used in the sequel.
SAMPLING SIGNALS AT THEIR RATE OF INNOVATION
We consider a τ-periodic stream of K Diracs, with amplitudes x_k, located at times t_k ∈ [0, τ[:

x(t) = Σ_{k′∈Z} Σ_{k=1}^{K} x_k δ(t − t_k − k′τ).   (2)

We assume that the signal x(t) is convolved with a sinc window of bandwidth B, where Bτ is an odd integer, and is uniformly sampled with sampling period T = τ/N. (We will use this hypothesis throughout the article to simplify the expressions and because it allows convergence of the τ-periodized sum of sinc kernels.) We therefore want to retrieve the innovations x_k and t_k from the n = 1, 2, …, N measurements

y_n = ⟨x(t), sinc(B(nT − t))⟩ = Σ_{k=1}^{K} x_k ϕ(nT − t_k),   (3)

where

ϕ(t) = Σ_{k∈Z} sinc(B(t − kτ)) = sin(πBt) / (Bτ sin(πt/τ))   (4)

is the τ-periodic sinc function, or Dirichlet kernel. Clearly, x(t) has a rate of innovation ρ = 2K/τ, and we aim to devise a sampling scheme that is able to retrieve the innovations of x(t) by operating at a sampling rate that is as close as possible to ρ.

[TABLE 1] FREQUENTLY USED NOTATIONS.
x(t), τ, x̂_m — τ-periodic finite-rate-of-innovation signal and its Fourier coefficients.
K, t_k, x_k, and ρ — innovation parameters: x(t) = Σ_{k=1}^{K} x_k δ(t − t_k) for t ∈ [0, τ[, and rate of innovation of the signal: ρ = 2K/τ.
ϕ(t), B — “antialiasing” filter, prior to sampling; typically, ϕ(t) = sinc Bt. Note: B × τ is restricted to be an odd integer.
y_n, ŷ_m, N, T — (noisy) samples {y_n}_{n=1,2,…,N} of (ϕ ∗ x)(t) at multiples of T = τ/N [see (14)] and their DFT coefficients ŷ_m.
A, L — rectangular annihilation matrix with L + 1 columns [see (12)].
H(z), h_k, and H — annihilating filter: z-transform, impulse response, and vector representation.

Since x(t) is periodic, we can use the Fourier series to represent it. By expressing the Fourier series coefficients of x(t), we thus have

x(t) = Σ_{m∈Z} x̂_m e^{j2πmt/τ}, where x̂_m = (1/τ) Σ_{k=1}^{K} x_k e^{−j2πmt_k/τ} = (1/τ) Σ_{k=1}^{K} x_k u_k^m.   (5)

We observe that the signal x(t) is completely determined by the knowledge of the K amplitudes x_k and the K locations t_k or, equivalently, u_k. By considering 2K contiguous values of x̂_m in (5), we can build a system of 2K equations in 2K unknowns that is linear in the weights x_k but is highly nonlinear in the locations t_k and therefore cannot be solved using classical linear algebra. Such a system, however, admits a unique solution when the Dirac locations are distinct, which is obtained by using a method known in spectral estimation as Prony's method [2], [3], [5], [6], and which we choose to call the annihilating filter method for the reason clarified below. Call {h_k}_{k=0,1,…,K} the filter coefficients with z-transform

H(z) = Σ_{k=0}^{K} h_k z^{−k} = Π_{k=1}^{K} (1 − u_k z^{−1}).   (6)

That is, the roots of H(z) correspond to the locations u_k = e^{−j2πt_k/τ}. It clearly follows that

(h ∗ x̂)_m = Σ_{k=0}^{K} h_k x̂_{m−k} = Σ_{k=0}^{K} Σ_{k′=1}^{K} h_k (x_{k′}/τ) u_{k′}^{m−k} = Σ_{k′=1}^{K} (x_{k′}/τ) u_{k′}^m Σ_{k=0}^{K} h_k u_{k′}^{−k} = 0,   (7)

since the inner sum is H(u_{k′}) = 0. The filter h_m is thus called annihilating filter, since it annihilates the discrete signal x̂_m. The zeros of this filter uniquely define the locations t_k of the Diracs. Since h_0 = 1, the filter coefficients h_m are found from (7) by involving at least 2K consecutive values of x̂_m, leading to a linear system of equations; e.g., if
we have x̂_m for m = −K, −K + 1, …, K − 1, this system can be written in square Toeplitz matrix form as follows:

[ x̂_{−1}   x̂_{−2}   ⋯  x̂_{−K}   ] [ h_1 ]     [ x̂_0     ]
[ x̂_0     x̂_{−1}   ⋯  x̂_{−K+1} ] [ h_2 ]     [ x̂_1     ]
[ ⋮        ⋮        ⋱  ⋮        ] [ ⋮   ] = − [ ⋮       ]   (8)
[ x̂_{K−2}  x̂_{K−3}  ⋯  x̂_{−1}   ] [ h_K ]     [ x̂_{K−1} ]

If the x_k do not vanish, this K × K system of equations has a unique solution because any h_m satisfying it is also such that H(u_k) = 0 for k = 1, 2, …, K. Given the filter coefficients h_m, the locations t_k are retrieved from the zeros u_k of the z-transform in (6). The weights x_k are then obtained by considering, for instance, K consecutive Fourier-series coefficients as given in (5). By writing the expression of these K coefficients in vector form, we obtain a Vandermonde system of equations that yields a unique solution for the weights x_k since the u_k are distinct. Notice that we need, in total, no more than 2K consecutive coefficients x̂_m to solve both the Toeplitz system (8) and the Vandermonde system. This confirms our original intuition that the knowledge of only 2K Fourier-series coefficients is sufficient to retrieve x(t).

We are now close to solving our original sampling question; the only remaining issue is to find a way to relate the Fourier-series coefficients x̂_m to the actual measurements y_n. Assume N ≥ Bτ; then, for n = 1, 2, …, N, we have that

y_n = ⟨x(t), sinc(B(nT − t))⟩ = Σ_{|m|≤Bτ/2} T x̂_m e^{j2πmn/N}.   (9)

Up to a factor NT = τ, this is simply the inverse discrete Fourier transform (DFT) of a discrete signal bandlimited to [−Bτ/2, Bτ/2] and which coincides with x̂_m in this bandwidth. As a consequence, the discrete Fourier coefficients of y_n provide Bτ consecutive coefficients of the Fourier series of x(t) according to

ŷ_m = Σ_{n=1}^{N} y_n e^{−j2πmn/N} = τ x̂_m if |m| ≤ Bτ/2, and 0 for other m ∈ [−N/2, N/2].   (10)

Let us now analyze the complete retrieval scheme more precisely and draw some conclusions. First of all, since we need at least 2K consecutive coefficients x̂_m to use the annihilating filter method, this means that Bτ ≥ 2K. Thus, the bandwidth of the sinc kernel, B, is always larger than 2K/τ = ρ, the rate of innovation. However, since Bτ is odd, the minimum number of samples per period is actually one sample larger: N ≥ B_min τ = 2K + 1, which is the next best thing to critical sampling. Moreover, the reconstruction algorithm is fast and does not involve any iterative procedures. Typically, the only step that depends on the number of samples, N, is the computation of the DFT coefficients of the samples y_n, which can of course be implemented in O(N log₂ N) elementary operations using the fast Fourier transform algorithm. All the other steps of the algorithm (in particular, polynomial rooting) depend on K only, i.e., on the rate of innovation ρ.

MORE ON ANNIHILATION
A closer look at (7) indicates that any nontrivial filter {h_k}_{k=0,1,…,L} with L ≥ K that has the u_k = e^{−j2πt_k/τ} as zeros will annihilate the Fourier series coefficients of x(t). The converse is true: any filter with transfer function H(z) that annihilates the x̂_m is automatically such that H(u_k) = 0 for k = 1, 2, …, K. Taking (10) into account, this means that for such filters

Σ_{k=0}^{L} h_k ŷ_{m−k} = 0, for all |m| ≤ Bτ/2.   (11)

These linear equations can be expressed using a matrix formalism: let A be the Toeplitz matrix with 2M − L + 1 rows and L + 1 columns

    [ ŷ_{−M+L}    ŷ_{−M+L−1}  ⋯  ŷ_{−M}   ]
    [ ŷ_{−M+L+1}  ŷ_{−M+L}    ⋯  ŷ_{−M+1} ]
A = [ ⋮           ⋮           ⋱  ⋮        ] , where M = Bτ/2,   (12)
    [ ŷ_M         ŷ_{M−1}     ⋯  ŷ_{M−L}  ]

and H = [h_0, h_1, …, h_L]^T the vector containing the coefficients of the annihilating filter; then, (11) is equivalent to

A H = 0,   (13)

which can be seen as a rectangular extension of (8). Note that, unlike (6), H is not restricted to satisfy h_0 = 1. Now, if we choose L > K, there are L − K + 1 independent polynomials of degree L with zeros at {u_k}_{k=1,2,…,K}, which means that there are L − K + 1 independent vectors H which satisfy (13). As a consequence, the rank of the matrix A never exceeds K. This provides a simple way to determine K when it is not known a priori: find the smallest L such that the matrix A built according to (12) is singular; then, K = L − 1.
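The complete noiseless retrieval scheme can be sketched as follows (the parameters τ = 1, K = 2, N = 11 and the innovation values are illustrative; the weights are recovered here by a least-squares fit of the sample equation at the estimated locations, a convenient stand-in for the Vandermonde solve):

```python
import numpy as np

tau, K, N = 1.0, 2, 11
B = (2 * K + 1) / tau                      # minimum odd B*tau = 2K + 1
T = tau / N
t_true, x_true = np.array([0.17, 0.64]), np.array([1.0, -0.8])

def dirichlet(t):
    # tau-periodic sinc (Dirichlet) kernel of (4)
    t = np.asarray(t, dtype=float)
    num, den = np.sin(np.pi * B * t), B * tau * np.sin(np.pi * t / tau)
    out = np.ones_like(t)                  # limiting value at t = 0 mod tau
    nz = np.abs(den) > 1e-12
    out[nz] = num[nz] / den[nz]
    return out

n = np.arange(1, N + 1)
y = sum(xk * dirichlet(n * T - tk) for xk, tk in zip(x_true, t_true))  # (3)

# DFT coefficients: for |m| <= B*tau/2 they are proportional to the Fourier
# coefficients of x(t) (cf. (10)); the annihilating filter is insensitive
# to that constant factor.
m = np.arange(-K, K + 1)
yhat = np.array([np.sum(y * np.exp(-2j * np.pi * mm * n / N)) for mm in m])

# Toeplitz system (8) with h_0 = 1; index i of yhat is frequency i - K
A = np.array([[yhat[K + i - 1], yhat[K + i - 2]] for i in (0, 1)])
h = np.concatenate(([1.0], np.linalg.solve(A, -yhat[K:K + 2])))

# Locations from the roots u_k = exp(-2j*pi*t_k/tau) of H(z)
t_rec = np.sort(np.mod(-np.angle(np.roots(h)) * tau / (2 * np.pi), tau))

# Weights by least squares on the samples at the recovered locations
Phi = np.column_stack([dirichlet(n * T - tk) for tk in t_rec])
x_rec = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(t_rec, x_rec)    # close to [0.17, 0.64] and [1.0, -0.8]
```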
The annihilation property (11) satisfied by the DFT coefficients ŷm is narrowly linked to the periodized sinc-Dirichlet
window used prior to sampling. Importantly, this approach can
be generalized to other kernels such as the (nonperiodized) sinc,
the Gaussian windows [7], and more recently any window that
satisfies a Strang-Fix like condition [8].
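The rank property of the annihilation matrix A in (12) is easy to check numerically on toy data (K = 3 Diracs, M = 6, and L = 5 > K are all illustrative choices): the rank equals K, not L + 1.

```python
import numpy as np

K, M, L, tau = 3, 6, 5, 1.0
t_k = np.array([0.13, 0.42, 0.77])
x_k = np.array([1.0, 0.5, -2.0])
u_k = np.exp(-2j * np.pi * t_k / tau)

def yhat(m):
    # proportional to the Fourier coefficients (5) of the Dirac stream
    return np.sum(x_k * u_k**m) / tau

# A has 2M - L + 1 rows and L + 1 columns, with entries yhat(m - j)
A = np.array([[yhat(m - j) for j in range(L + 1)]
              for m in range(-M + L, M + 1)])
rank = np.linalg.matrix_rank(A, tol=1e-9)
print(A.shape, rank)    # (8, 6), rank equal to K = 3
```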
Digital Noise
Analog Noise
∑kxkδ(t−tk)
Sampling Kernel
ϕ(t )
y (t )
T
yn
Linear System
[FIG1] Block diagram representation of the sampling of an FRI
FRI SIGNALS WITH NOISE
signal, with indications of potential noise perturbations in the
“Noise,” or more generally model mismatch are unfortunately
analog, and in the digital part.
omnipresent in data acquisition, making the solution presented in the previous section only ideal. Schematically,
perturbations to the finite rate of innoCadzow
Yes
vation (FRI) model may arise both in
ŷn
the analog domain during, e.g., a transtk
yn
Annihilating
No
Too
tk
FFT
Noisy?
Filter Method
mission procedure, and in the digital
domain after sampling (see Figure 1)—
xk
in this respect, quantization is a source
of corruption as well. There is then no
other option but to increase the sam[FIG2] Schematical view of the FRI retrieval algorithm. The data are considered “too noisy”
pling rate in order to achieve robust- until they satisfy (11) almost exactly.
ness against noise.
Thus, we consider the signal resulting from the convolution of the τ-periodic FRI signal (2) and a sinc window of bandwidth B, where Bτ is an odd integer. Due to noise corruption, (3) becomes

y_n = \sum_{k=1}^{K} x_k \varphi(nT - t_k) + \varepsilon_n \quad \text{for } n = 1, 2, \ldots, N, \qquad (14)
where T = τ/N and ϕ(t ) is the Dirichlet kernel (4). Given that
the rate of innovation of the signal is ρ , we will consider
N > ρτ samples to fight the perturbation εn, making the data
redundant by a factor of N/(ρτ ). At this point, we do not make
specific assumptions—in particular, of statistical nature—on εn.
What kind of algorithms can be applied to efficiently exploit this
extra redundancy and what is their performance?
A related problem has already been encountered decades
ago by researchers in spectral analysis where the problem of
finding sinusoids in noise is classic [9]. Thus we will not try to
propose new approaches regarding the algorithms. One of the
difficulties is that there is as yet no unanimously agreed optimal algorithm for retrieving sinusoids in noise, although
there have been numerous evaluations of the different methods
(see, e.g., [10]). For this reason, our choice falls on the
simplest approach, the total least-squares approximation
(implemented using a singular value decomposition (SVD), an
approach initiated by Pisarenko in [11]), possibly enhanced by
an initial “denoising” (more exactly, model matching) step
provided by what we call Cadzow’s iterated algorithm [12].
The full algorithm, depicted in Figure 2, is also detailed in its two main components in “Annihilating Filter: Total Least-Squares Method” and “Cadzow’s Iterative Denoising.”
By computing the theoretical minimal uncertainties, known as Cramér-Rao bounds, on the innovation parameters, we will see that these algorithms exhibit a quasi-optimal behavior down to noise levels of the order of 5 dB (depending on the number of samples). In particular, these bounds tell us how to choose the bandwidth of the sampling filter.
TOTAL LEAST-SQUARES APPROACH
In the presence of noise, the annihilation equation (13) is not satisfied exactly, yet it is still reasonable to expect that minimizing the Euclidean norm ‖AH‖₂ under the constraint ‖H‖₂ = 1 yields an interesting estimate of H. Of particular interest is the solution for L = K (the annihilating filter of minimal size) because the K zeros of the resulting filter provide a unique estimate of the K locations t_k. It is known that this minimization can be solved by performing an SVD of the matrix A defined in (12).
ANNIHILATING FILTER: TOTAL LEAST-SQUARES METHOD
An algorithm for retrieving the innovations xk and tk from the
noisy samples of (14).
1) Compute the N-DFT coefficients of the samples ŷ_m = \sum_{n=1}^{N} y_n e^{-j2\pi nm/N}.
2) Choose L = K and build the rectangular Toeplitz matrix A according to (12).
3) Perform the SVD of A and choose the eigenvector [h_0, h_1, ..., h_K]^T corresponding to the smallest eigenvalue, i.e., the annihilating filter coefficients.
4) Compute the roots e^{-j2\pi t_k/\tau} of the z-transform H(z) = \sum_{k=0}^{K} h_k z^{-k} and deduce {t_k}_{k=1,...,K}.
5) Compute the least mean square solution x_k of the N equations y_n = \sum_k x_k \varphi(nT - t_k), n = 1, 2, ..., N.
When the measurements y_n are very noisy, it is necessary to denoise them first by performing a few iterations of Cadzow’s algorithm (see “Cadzow’s Iterative Denoising”) before applying the above procedure.
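The five steps of the box can be sketched in a few lines. This is a hedged, noiseless illustration with parameter values of our own choosing, not the article's implementation:

```python
import numpy as np

# Hedged sketch of the boxed procedure for a tau-periodic stream of K Diracs,
# noiseless case. Parameter values are illustrative, not from the article.
K, tau, N = 3, 1.0, 21                       # B*tau = N (odd), N samples
M = N // 2
t_true = np.array([0.12, 0.37, 0.83])        # Dirac locations
x_true = np.array([1.0, 0.6, 1.4])           # Dirac amplitudes

# Dirichlet-kernel sampling gives access to the DFT coefficients
# hat_y[m] = sum_k x_k exp(-2j*pi*m*t_k/tau) for |m| <= M.
m = np.arange(-M, M + 1)
hat_y = (x_true[None, :] * np.exp(-2j * np.pi * np.outer(m, t_true) / tau)).sum(1)

# Steps 2-3: Toeplitz system A H = 0, solved in the TLS sense via the SVD.
L = K
A = np.array([[hat_y[L + i - l] for l in range(L + 1)]
              for i in range(2 * M - L + 1)])
h = np.linalg.svd(A)[2][-1].conj()           # right singular vector of A

# Step 4: the roots of H(z) encode the locations t_k.
t_est = np.sort(np.mod(-np.angle(np.roots(h)) * tau / (2 * np.pi), tau))

# Step 5: amplitudes by least squares on hat_y = V x.
V = np.exp(-2j * np.pi * np.outer(m, t_est) / tau)
x_est = np.linalg.lstsq(V, hat_y, rcond=None)[0].real
```

In the noiseless case the recovered locations and amplitudes match the true innovations to machine precision; with noise, the same code serves as the TLS step of the full algorithm.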
More exactly, one performs an eigenvalue decomposition of the matrix AᵀA and chooses for H the eigenvector corresponding to the smallest eigenvalue. More specifically, if A = USVᵀ, where U is a (Bτ − K) × (K + 1) unitary matrix, S is a (K + 1) × (K + 1) diagonal matrix with decreasing positive elements, and V is a (K + 1) × (K + 1) unitary matrix, then H is the last column of V. Once the t_k are retrieved, the x_k follow from a least mean square minimization of the difference between the samples y_n and the FRI model (14).
This approach, summarized in “Annihilating Filter: Total Least-Squares Method,” is closely related to Pisarenko’s method [11]. Although its cost is much larger than that of the simple solution presented earlier, it is still essentially linear in N (excluding the cost of the initial DFT).

CADZOW’S ITERATIVE DENOISING
Algorithm for “denoising” the samples y_n of “Annihilating Filter: Total Least-Squares Method.”
1) Compute the N-DFT coefficients of the samples ŷ_m = \sum_{n=1}^{N} y_n e^{-j2\pi nm/N}.
2) Choose an integer L in [K, Bτ/2] and build the rectangular Toeplitz matrix A according to (12).
3) Perform the SVD of A = USVᵀ, where U is a (2M − L + 1) × (L + 1) unitary matrix, S is a diagonal (L + 1) × (L + 1) matrix, and V is an (L + 1) × (L + 1) unitary matrix.
4) Build the diagonal matrix S′ from S by keeping only the K most significant diagonal elements, and deduce the total least-squares approximation of A by A′ = US′Vᵀ.
5) Build a denoised approximation ŷ′_n of ŷ_n by averaging the diagonals of the matrix A′.
6) Iterate step 2 until, e.g., the (K + 1)th largest diagonal element of S is smaller than the Kth largest diagonal element by some prescribed factor.
The number of iterations needed is usually small (less than ten). Note that, experimentally, the best choice for L in step 2 is L = M.

[FIG3] Retrieval of an FRI signal with (a) seven Diracs from (b) 71 noisy (SNR = 5 dB) samples.
EXTRA DENOISING: CADZOW
The previous algorithm works quite well for moderate noise levels (a level that depends on the number of Diracs). However, for a small SNR, the results may become unreliable, and it is advisable to apply a robust procedure that “projects” the noisy samples onto the sampled FRI model of (14). This iterative procedure was already suggested by Tufts and Kumaresan in [13] and analyzed in [12].
As noticed earlier, the noiseless matrix A in (12) is of rank K whenever L ≥ K. The idea thus consists in performing the SVD of A, say A = USVᵀ, and forcing to zero the L + 1 − K smallest diagonal coefficients of the matrix S to yield S′. The resulting matrix A′ = US′Vᵀ is not Toeplitz anymore, but its best Toeplitz approximation is obtained by averaging the diagonals of A′. This leads to a new “denoised” sequence ŷ′_n that matches the noiseless FRI sample model better than the original ŷ_n. A few of these iterations lead to samples that can be expressed almost exactly as bandlimited samples of an FRI signal. Our observation is that this FRI signal is all the closer to the noiseless one as A is closer to a square matrix, i.e., L = Bτ/2.
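A minimal sketch of this rank-truncation/diagonal-averaging iteration, with illustrative parameters of our own (not the article's code):

```python
import numpy as np

# Hedged sketch of Cadzow's iterative denoising: alternate between a rank-K
# truncation of the Toeplitz matrix of DFT coefficients and a projection back
# onto Toeplitz structure (diagonal averaging). Parameters are illustrative.
def cadzow_denoise(hat_y, K, L, n_iter=30):
    """hat_y: DFT coefficients indexed m = -M..M (length 2M + 1)."""
    M = (len(hat_y) - 1) // 2
    y = hat_y.copy()
    for _ in range(n_iter):
        # Toeplitz matrix with rows [y_m, y_{m-1}, ..., y_{m-L}]
        A = np.array([[y[L + i - l] for l in range(L + 1)]
                      for i in range(2 * M - L + 1)])
        U, s, Vh = np.linalg.svd(A, full_matrices=False)
        s[K:] = 0.0                      # keep the K largest singular values
        A = (U * s) @ Vh                 # best rank-K approximation of A
        # Best Toeplitz approximation: average along the diagonals.
        y = np.zeros_like(hat_y)
        counts = np.zeros(2 * M + 1)
        for i in range(A.shape[0]):
            for l in range(A.shape[1]):
                y[L + i - l] += A[i, l]
                counts[L + i - l] += 1
        y /= counts
    return y

# Illustrative use: K = 2 Diracs, mildly noisy DFT coefficients.
rng = np.random.default_rng(3)
M, K = 10, 2
m = np.arange(-M, M + 1)
clean = np.exp(-2j * np.pi * m * 0.2) + 0.8 * np.exp(-2j * np.pi * m * 0.6)
noisy = clean + 0.05 * (rng.standard_normal(2 * M + 1)
                        + 1j * rng.standard_normal(2 * M + 1))
denoised = cadzow_denoise(noisy, K=K, L=M)
```

After a few iterations the (K + 1)th singular value of the Toeplitz matrix collapses, i.e., the denoised coefficients satisfy the annihilation property almost exactly.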
The computational cost of this algorithm, summarized in “Cadzow’s Iterative Denoising,” is higher than that of the annihilating filter method, since it requires performing the SVD of a square matrix of large size, typically half the number of samples. However, using modern computers we can expect to perform the SVD of a square matrix with a few hundred columns in less than a second. We show in Figure 3 an example of the reconstruction of an FRI signal with seven Diracs from 71 samples buried in noise at 5 dB SNR (redundancy ≈ 5): the
total computation time is 0.9 s on a PowerMacintosh G5 at 1.8
GHz. Another more striking example is shown in Figure 4
where we use 1,001 noisy (SNR = 20 dB) samples to reconstruct 100 Diracs (redundancy ≈ 5): the total computation
time is 61 s. Although it is not easy to check on a crowded
graph, all the Dirac locations have been retrieved very precisely, while a few amplitudes are wrong. The fact that the Diracs
are sufficiently far apart (≥ 2/N) ensures the stability of the
retrieval of the Dirac locations.
CRAMÉR-RAO BOUNDS
The sensitivity of the FRI model to noise can be evaluated theoretically by choosing a statistical model for this perturbation.
The result is that any unbiased algorithm able to retrieve the
innovations of the FRI signal from its noisy samples exhibits a
covariance matrix that is lower bounded by Cramér-Rao Bounds
(see “Cramér-Rao Lower Bounds”). As can be seen in Figure 5,
the retrieval of an FRI signal made of two Diracs is almost optimal for SNR levels above 5 dB since the uncertainty on these
[FIG4] Retrieval of an FRI signal with (a) 100 Diracs from (b) 1,001 noisy (SNR = 20 dB) samples.
CRAMÉR-RAO LOWER BOUNDS
We are considering noisy real measurements Y = [y_1, y_2, ..., y_N] of the form

y_n = \sum_{k=1}^{K} x_k \varphi(nT - t_k) + \varepsilon_n,

where ε_n is a zero-mean Gaussian noise of covariance R; usually the noise is assumed to be stationary: [R]_{n,n′} = r_{n−n′} where r_n = E{ε_{n′+n} ε_{n′}}. Then any unbiased estimate Θ̂(Y) of the unknown parameters [x_1, x_2, ..., x_K]ᵀ and [t_1, t_2, ..., t_K]ᵀ has a covariance matrix that is lower bounded by the inverse of the Fisher information matrix (adaptation of [21, (6)])

\mathrm{cov}\{\hat{\Theta}\} \geq (\Phi^T R^{-1} \Phi)^{-1},

where

\Phi = \begin{bmatrix}
\varphi(T-t_1) & \cdots & \varphi(T-t_K) & -x_1\varphi'(T-t_1) & \cdots & -x_K\varphi'(T-t_K) \\
\varphi(2T-t_1) & \cdots & \varphi(2T-t_K) & -x_1\varphi'(2T-t_1) & \cdots & -x_K\varphi'(2T-t_K) \\
\vdots & & \vdots & \vdots & & \vdots \\
\varphi(NT-t_1) & \cdots & \varphi(NT-t_K) & -x_1\varphi'(NT-t_1) & \cdots & -x_K\varphi'(NT-t_K)
\end{bmatrix}.

Note that this expression holds quite generally: it does not require that ϕ(t) be periodic or bandlimited, and the noise does not need to be stationary.
One-Dirac periodized sinc case: if we make the hypothesis that ε_n is N-periodic and ϕ(t) is the Dirichlet kernel (4), then the 2 × 2 Fisher matrix becomes diagonal. The minimal uncertainties on the location of one Dirac, t_1, and on its amplitude, x_1, are then given by

\Delta t_1 \geq \frac{B\tau\,\tau}{2\pi |x_1| \sqrt{N}} \Big( \sum_{|m| \leq B\tau/2} \frac{m^2}{\hat{r}_m} \Big)^{-1/2}
\quad\text{and}\quad
\Delta x_1 \geq \frac{B\tau}{\sqrt{N}} \Big( \sum_{|m| \leq B\tau/2} \frac{1}{\hat{r}_m} \Big)^{-1/2}.
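The Fisher-matrix bound can be evaluated numerically. A sketch under simplifying assumptions of our own (white noise and a Gaussian kernel instead of the Dirichlet kernel; as noted above, the expression does not require periodicity):

```python
import numpy as np

# Hedged numeric evaluation of the Fisher-matrix bound above, under our own
# simplifying assumptions: white noise (R = sigma2 * I) and a Gaussian kernel
# (the boxed expression does not require a periodic or bandlimited phi).
def crb(t, x, phi, dphi, T, N, sigma2):
    n = np.arange(1, N + 1)
    Phi = np.hstack([
        np.column_stack([phi(n * T - tk) for tk in t]),
        np.column_stack([-xk * dphi(n * T - tk) for xk, tk in zip(x, t)]),
    ])
    # cov >= (Phi^T R^{-1} Phi)^{-1} = sigma2 * (Phi^T Phi)^{-1} for white noise
    return sigma2 * np.linalg.inv(Phi.T @ Phi)

s = 0.05                                        # kernel width (assumption)
phi = lambda u: np.exp(-u ** 2 / (2 * s ** 2))
dphi = lambda u: -u / s ** 2 * np.exp(-u ** 2 / (2 * s ** 2))
C = crb(t=np.array([0.3, 0.7]), x=np.array([1.0, 0.8]),
        phi=phi, dphi=dphi, T=1 / 64, N=64, sigma2=1e-4)
# diag(C)[:2] bounds the amplitude variances, diag(C)[2:] the location variances.
```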
locations reaches the (unbiased) theoretical minimum given by Cramér-Rao bounds. Such a property has already been observed for high-resolution spectral algorithms (and notably those using a maximum likelihood approach) by Tufts and Kumaresan [13].
It is particularly instructive to make the explicit computation for signals that have exactly two innovations per period τ and where the samples are corrupted by a white Gaussian noise. The results, which involve the same arguments as in [14], are given in “Uncertainty Relation for the One-Dirac Case” and essentially state that the uncertainty on the location of the Dirac is proportional to 1/√(NBτ) when the sampling noise is dominant (white noise case), and to 1/(Bτ) when the transmission noise is dominant (ϕ(t)-filtered white noise). In both cases, it appears that it is better to maximize the bandwidth B of the sinc kernel in order to minimize the uncertainty on the location of the Dirac. A closer inspection of the white noise case shows that the improved time resolution is obtained at the cost of a loss of amplitude accuracy by a √(Bτ) factor.
When K ≥ 2, the Cramér-Rao formula for one Dirac still holds approximately when the locations are sufficiently far apart. Empirically, if the minimal difference (modulo τ) between two of the Dirac locations is larger than, say, 2/N, then the maximal (Cramér-Rao) uncertainty on the retrieval of these locations is obtained using the formula given in “Uncertainty Relation for the One-Dirac Case.”

[FIG5] Retrieval of the locations of an FRI signal (two Diracs, 21 noisy samples). (a) Scatterplot of the retrieved locations versus input SNR, with Cramér-Rao bounds; (b) standard deviation of each Dirac location (averages over 10,000 realizations) compared to the Cramér-Rao lower bounds.

DISCUSSION
APPLICATIONS
Let us turn to applications of the methods developed so far.
The key feature to look for is sparsity together with a good
model of the acquisition process and of the noise present in the
system. For a real application, this means a thorough understanding of the setup and of the physics involved (remember
that we assume a continuous-time problem, and we do not start
from a set of samples or a finite vector).
One main application of the theory presented in this article is ultra-wideband (UWB) communications. This communication method uses pulse position modulation (PPM) with very wideband pulses (up to several gigahertz of bandwidth). Designing a digital receiver using conventional sampling theory would require analog-to-digital conversion (ADC) running at over 5 GHz, which would be very expensive and power hungry. A simple model of an UWB pulse is a Dirac convolved
with a wideband, zero mean pulse. At the receiver, the signal is
the convolution of the original pulse with the channel impulse
response, which includes many reflections, and all this buried in
high levels of noise. Initial work on UWB using an FRI framework
was presented in [15]. The technology described in the present
article is currently being transferred to Qualcomm Inc.
The other applications that we would like to mention, namely electroencephalography (EEG) and optical coherence tomography (OCT), use kernels other than the Dirichlet window and, as such, require a slight adaptation of what has been presented here.
EEG measurements during neuronal events like epileptic
seizures can be modeled reasonably well by an FRI excitation to
a Poisson equation, and it turns out that these measurements
satisfy an annihilation property [16]. Obviously, accurate localization of the activation loci is important for the surgical treatment of such impairments.
In OCT, the measured signal can be expressed as a convolution between the (low-)coherence function of the sensing laser beam (typically, a Gabor function, which satisfies an annihilation property) and an FRI signal whose innovations are the locations of refractive index changes and their range within the object imaged [17]. Depending on the noise level and the model adequacy, the annihilation technique makes it possible to reach a resolution that is potentially well below the “physical” resolution implied by the coherence length of the laser beam.

RELATION WITH COMPRESSED SENSING
One may wonder whether the approach described here could be addressed using the compressed sensing tools developed in [18] and [19]. Obviously, FRI signals can be seen as “sparse” in the time domain. However, differently from the compressed sensing framework, this domain is not discrete: the innovation times may assume arbitrary real values. Yet, assuming that these innovations fall on some discrete grid {θ_n}_{n=0,1,...,N−1} known a priori, one may try to address our FRI interpolation problem as

\min_{x_0, x_1, \ldots, x_{N-1}} \sum_{n=0}^{N-1} |x_n| \quad \text{under the constraint} \quad \sum_{n=1}^{N} \Big( y_n - \sum_{n'=0}^{N-1} x_{n'} \varphi(nT - \theta_{n'}) \Big)^2 \leq N\sigma^2, \qquad (15)

where σ² is an estimate of the noise power.
In the absence of noise, it has been shown that this minimization provides the parameters of the innovation with “overwhelming” probability [19] using O(K log N) measurements. Yet this method is not as direct as the annihilating filter method, which does not require any iteration. Moreover, the compressed-sensing approach does not reach the critical sampling rate, unlike the method proposed here, which almost achieves this goal (2K + 1 samples for 2K innovations). On the other hand, compressed sensing is not limited to uniform measurements of the form (14) and could potentially accommodate arbitrary sampling kernels, not only the few that satisfy an annihilation property. This flexibility is certainly an attractive feature of compressed sensing.
In the presence of noise, the beneficial contribution of the ℓ1 norm is less obvious, since the quadratic program (15) does not provide an exactly K-sparse solution anymore, although ℓ1/ℓ2 stable recovery of the x_k is statistically guaranteed [20]. Moreover, unlike the method we are proposing here, which is able to reach the Cramér-Rao lower bounds (computed in “Cramér-Rao Lower Bounds”), there is no evidence that the ℓ1 strategy shares this optimal behavior. In particular, it is of interest to note that, in practice, the compressed sensing strategy involves random measurement selection, whereas arguments obtained from the Cramér-Rao bounds computation, namely on the optimal bandwidth of the sinc kernel, indicate that, on the contrary, it might be worth optimizing the sensing matrix.

CONCLUSIONS
Sparse sampling of continuous-time sparse signals has been addressed. In particular, it was shown that sampling at the rate of innovation is possible, in some sense applying Occam’s razor to the sampling of sparse signals. The noisy case has been analyzed and solved, proposing methods that reach the optimal performance given by the Cramér-Rao bounds. Finally, a number of applications have been discussed where sparsity can be taken advantage of. The comprehensive coverage given in this article should lead to further research in sparse sampling, as well as new applications.

UNCERTAINTY RELATION FOR THE ONE-DIRAC CASE
We consider the FRI problem of finding [x_1, t_1] from the N noisy measurements [y_1, y_2, ..., y_N],

y_n = \mu_n + \varepsilon_n \quad \text{with} \quad \mu_n = x_1 \varphi(n\tau/N - t_1),

where ϕ(t) is the τ-periodic, B-bandlimited Dirichlet kernel and ε_n is a stationary Gaussian noise. Any unbiased algorithm that estimates t_1 and x_1 will do so up to an error quantified by their standard deviations Δt_1 and Δx_1, lower bounded by Cramér-Rao formulas (see “Cramér-Rao Lower Bounds”). Denoting the noise power by σ² and the peak SNR by PSNR = |x_1|²/σ², two cases are especially interesting:
• The noise is white, i.e., its power spectrum density is constant and equals σ². Then we find

\frac{\Delta t_1}{\tau} \geq \frac{1}{\pi} \sqrt{\frac{3B\tau}{N(B^2\tau^2 - 1)}} \cdot \mathrm{PSNR}^{-1/2}
\quad\text{and}\quad
\frac{\Delta x_1}{|x_1|} \geq \sqrt{\frac{B\tau}{N}} \cdot \mathrm{PSNR}^{-1/2}.

• The noise is a white noise filtered by ϕ(t); then we find

\frac{\Delta t_1}{\tau} \geq \frac{1}{\pi} \sqrt{\frac{3}{B^2\tau^2 - 1}} \cdot \mathrm{PSNR}^{-1/2}
\quad\text{and}\quad
\frac{\Delta x_1}{|x_1|} \geq \mathrm{PSNR}^{-1/2}.

In both configurations, we conclude that, in order to minimize the uncertainty on t_1, it is better to maximize the bandwidth of the Dirichlet kernel, i.e., choose B such that Bτ = N if N is odd, or such that Bτ = N − 1 if N is even. Since Bτ ≤ N, we always have the following uncertainty relation

N \cdot \mathrm{PSNR}^{1/2} \cdot \frac{\Delta t_1}{\tau} \geq \frac{\sqrt{3}}{\pi},

involving the number of measurements N, the noise level, and the uncertainty on the position.
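For illustration, the on-grid program (15) can be approximated by its Lagrangian (lasso) form and solved with a simple ISTA loop; the grid, the Gaussian kernel, and all parameter values below are our assumptions, not the article's:

```python
import numpy as np

# Hedged sketch of the on-grid l1 relaxation (15), solved here in its
# Lagrangian (lasso) form  min 0.5*||y - Phi x||^2 + lam*||x||_1  via ISTA.
# Grid, kernel, and parameters are illustrative assumptions.
rng = np.random.default_rng(2)
Ngrid, N, s = 64, 64, 0.01
theta = np.arange(Ngrid) / Ngrid                 # a priori location grid
tn = np.arange(1, N + 1) / N                     # sample times nT, T = 1/N
Phi = np.exp(-((tn[:, None] - theta[None, :]) ** 2) / (2 * s ** 2))

x_true = np.zeros(Ngrid)
x_true[[10, 40]] = [1.0, 0.8]                    # K = 2 on-grid Diracs
y = Phi @ x_true + 1e-3 * rng.standard_normal(N)

lam = 0.01
step = 1.0 / np.linalg.norm(Phi, 2) ** 2         # ISTA step size, 1/||Phi||^2
x = np.zeros(Ngrid)
for _ in range(3000):
    z = x - step * (Phi.T @ (Phi @ x - y))       # gradient step on the data fit
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
# The two largest coefficients should sit on the true grid positions.
```

Note that this only recovers innovations that lie exactly on the grid; off-grid locations, which the annihilating filter method handles natively, are not addressed by this relaxation.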
AUTHORS
Thierry Blu ([email protected]) received the “Diplôme d’ingénieur” from École Polytechnique, France, in 1986 and the Ph.D. from Télécom Paris in 1996. His main field of research is approximation/interpolation/sampling theory and wavelets. He was with the Biomedical Imaging Group at the Swiss Federal Institute of Technology, Lausanne, and is now with the Department of Electronic Engineering, the Chinese University of Hong Kong. He received two best paper awards from the IEEE Signal Processing Society (2003 and 2006).
Pier-Luigi Dragotti ([email protected]) is a senior lecturer (associate professor) in the Electrical and Electronic Engineering Department at Imperial College London. He received the laurea degree in electrical engineering from the University Federico II, Naples, Italy, in 1997 and the master’s and Ph.D. degrees from the École Polytechnique Fédérale de Lausanne, in 1998 and 2002, respectively. He was a visiting student at Stanford University and a summer researcher in the Mathematics of Communications Department at Bell Labs, Lucent Technologies. His research interests include wavelet theory, sampling and approximation theory, image and video processing, and compression. He is an associate editor of IEEE Transactions on Image Processing.
Martin Vetterli ([email protected]) received his engineering degree from Eidgenössische Technische Hochschule Zürich, his M.S. from Stanford University, and his Doctorate from École Polytechnique Fédérale de Lausanne (EPFL). He was with Columbia University, New York, and the University of California at Berkeley before joining EPFL. He works on signal processing and communications, e.g., wavelet theory and applications, joint source-channel coding, and sensor networks for environmental monitoring. He received best paper awards from the IEEE Signal Processing Society (1991, 1996, and 2006). He is the coauthor of the book Wavelets and Subband Coding (Prentice-Hall, 1995) and the forthcoming textbook Fourier and Wavelets: Theory, Algorithms and Applications.
Pina Marziliano ([email protected]) received the B.Sc. in applied mathematics and the M.Sc. in computer science from the Université de Montréal, Canada, in 1994 and 1996, respectively. In 2001, she completed the Ph.D. from the École Polytechnique Fédérale de Lausanne, Switzerland. She joined Genimedia in 2002. Since 2003, she has been an assistant professor in the School of Electrical and Electronic Engineering at the Nanyang Technological University in Singapore. She received the IEEE Signal Processing Society 2006 Best Paper Award. Her research interests include sampling theory and applications in communications and biomedical engineering, watermarking, and perceptual quality metrics for multimedia.
Lionel Coulot ([email protected]) received the B.Sc. and the M.Sc. degrees in 2005 and 2007, respectively, in communication systems from the Swiss Federal Institute of Technology, Lausanne, where he is currently a Ph.D. candidate. He spent one year on exchange at Carnegie Mellon University and was an intern at Microsoft Research Asia, Beijing, PRC. His research interests include image and signal processing as well as auction and game theory.

REFERENCES
[1] M. Unser, “Sampling—50 years after Shannon,” Proc. IEEE, vol. 88, pp. 569–587, Apr. 2000.
[2] S.M. Kay, Modern Spectral Estimation—Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[3] P. Stoica and R.L. Moses, Introduction to Spectral Analysis. Upper Saddle River, NJ: Prentice-Hall, 1997.
[4] C.E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J., vol. 27, pp. 379–423 and 623–656, Jul. and Oct. 1948.
[5] R. Prony, “Essai expérimental et analytique,” Ann. École Polytechnique, vol. 1, no. 2, p. 24, 1795.
[6] S.M. Kay and S.L. Marple, “Spectrum analysis—A modern perspective,” Proc. IEEE, vol. 69, pp. 1380–1419, Nov. 1981.
[7] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate of innovation,” IEEE Trans. Signal Processing, vol. 50, pp. 1417–1428, June 2002.
[8] P.-L. Dragotti, M. Vetterli, and T. Blu, “Sampling moments and reconstructing signals of finite rate of innovation: Shannon meets Strang-Fix,” IEEE Trans. Signal Processing, vol. 55, pp. 1741–1757, May 2007.
[9] Special Issue on Spectral Estimation, Proc. IEEE, vol. 70, Sept. 1982.
[10] H. Clergeot, S. Tressens, and A. Ouamri, “Performance of high resolution frequencies estimation methods compared to the Cramér-Rao bounds,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1703–1720, Nov. 1989.
[11] V.F. Pisarenko, “The retrieval of harmonics from a covariance function,” Geophys. J., vol. 33, pp. 347–366, Sept. 1973.
[12] J.A. Cadzow, “Signal enhancement—A composite property mapping algorithm,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 49–62, Jan. 1988.
[13] D.W. Tufts and R. Kumaresan, “Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood,” Proc. IEEE, vol. 70, pp. 975–989, Sept. 1982.
[14] P. Stoica and A. Nehorai, “MUSIC, maximum likelihood, and Cramér-Rao bound,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 720–741, May 1989.
[15] I. Maravić, J. Kusuma, and M. Vetterli, “Low-sampling rate UWB channel characterization and synchronization,” J. Comm. Netw., vol. 5, no. 4, pp. 319–327, 2003.
[16] D. Kandaswamy, T. Blu, and D. Van De Ville, “Analytic sensing: Reconstructing pointwise sources from boundary Laplace measurements,” in Proc. SPIE—Wavelet XII, vol. 6701, San Diego, CA, 2007, pp. 670114-1/67011Y.
[17] T. Blu, H. Bay, and M. Unser, “A new high-resolution processing method for the deconvolution of optical coherence tomography signals,” in Proc. ISBI’02, vol. III, Washington, D.C., 2002, pp. 777–780.
[18] D.L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, pp. 1289–1306, Apr. 2006.
[19] E.J. Candès, J.K. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inform. Theory, vol. 52, pp. 489–509, Feb. 2006.
[20] E.J. Candès, J.K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Comm. Pure Appl. Math., vol. 59, pp. 1207–1223, Mar. 2006.
[21] B. Porat and B. Friedlander, “Computation of the exact information matrix of Gaussian time series with stationary random components,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, pp. 118–130, Feb. 1986.
[SP]