Comparison of Uniform and Random Sampling for Speech

2017 International Conference on Sampling Theory and Applications (SampTA)
Comparison of Uniform and Random Sampling
for Speech and Music Signals
arXiv:1705.01457v2 [cs.SD] 15 May 2017
Nematollah Zarmehi, Sina Shahsavari, and Farokh Marvasti
Advanced Communication Research Institute
Department of Electrical Engineering
Sharif University of Technology, Tehran, Iran
Email: http://zarmehi.ir/contact.html
Abstract—In this paper, we will provide a comparison between
uniform and random sampling for speech and music signals.
There are various sampling and recovery methods for audio
signals. Here, we only investigate uniform and random schemes
for sampling and basic low-pass filtering and iterative method
with adaptive thresholding for recovery. The simulation results
indicate that uniform sampling with cubic spline interpolation
outperforms other sampling and recovery methods.
TABLE I
S AMPLING AND RECOVERY SCHEMES .
I. I NTRODUCTION
Various sampling and recovery methods proposed in the
field of signal processing. The Nyquist-Shannon theorem
proposes condition to recover a band-limited signal from
its samples [1]–[3]. The uniform sampling using an antialiasing filter and low-pass (LP) filtering were used for some
decades. After that, other sampling methods such as nonuniform sampling [4]–[6], periodic non-uniform sampling [7],
[8], and random sampling [9], [10] were proposed.
In this paper, we provide a comparison between uniform and
random sampling for speech and music signals. We use basic
LP filtering and spline interpolation for uniform sampling
and Iterative Method with Adaptive Thresholding (IMAT) for
random sampling.
Short Name
Sampling
Description
U-AF-FFT-Sp
AF & Uniform
FFT as AF & Spline interpolation
U-AF-FFT
AF & Uniform
FFT as AF & LP-filtering
U-AF-FIR-Sp
AF & Uniform
FIR as AF & Spline interpolation
U-AF-FIR
AF & Uniform
FIR as AF & LP-filtering
R-IMATI
Random
IMATI
R-Sp
Random
Spline interpolation
TABLE II
O BJECTIVE PERFORMANCE METRIC CRITERI
II. S AMPLING AND R ECOVERY M ETHODS
This Section introduces the sampling and recovery methods
that we are going to compare. We compare uniform and
random sampling schemes for speech and music signals along
with different recovery methods such as basic LP filtering,
spline interpolation, and IMAT [11], [12]. IMATI is a version of IMAT algorithm that uses interpolation operator in
each iteration [13]. Table I shows the sampling and recovery
schemes used in this paper. The abbreviations AF stands for
Anti-aliasing Filter.
We have used some objective performance metric criteria
for comparison of above methods. They are listed in Table II.
The recovered “.WAV” files are also saved on memory disk
for subjective evaluations.
Name
Quantity
SNR
dB
More description
kxk
SN R(x, x̂) = 20 log kx−x̂k
PESQ
Dimensionless
A score between 1.0 (worst) up to 4.5 (best)
CPU Time
second
Using tic and toc commands in MATLAB
A. Speech Signal
All sampling and recovery methods are simulated on our
speech dataset and the results are presented in Fig. 1 in terms
of SNR.
Fig. 1 shows the SNR of all methods vs. sampling rate. In
uniform sampling scheme, we used periodic uniform sampling
for sampling rates greater than 0.5. According to Fig. 1,
uniform sampling with spline interpolation outperforms the
other methods. Another observation is that spline interpolation
works well with uniform samples but its performance degraded
in case of random sampling.
The Perceptual Evaluation of Speech Quality (PESQ) metric
is employed to assess the quality of recovered speech signals.
PESQ is approved as ITU-T Rec. P.862 [14]. The voice quality
is rated by a value ranging from 1 (bad) to 5 (excellent). The
results are shown in Fig 2.
III. S IMULATION R ESULTS
In this section, we present simulation results. We have used
44.1–48kHz speech and music “.WAV” signals. Frame size is
1024 and the simulations are done in MATLAB R2015a on
Intel(R) Core(TM) i7-5960X @ 3GHz with 23GB-RAM.
1
2017 International Conference on Sampling Theory and Applications (SampTA)
50
60
50
40
U-NAF-Sp
U-NAF
U-AF-FFT-Sp
U-AF-FFT
R-IMATI
R-Sp
U-AF-FIR-Sp
U-AF-FIR
30
20
10
0
0.1
SNR (dB)
SNR (dB)
40
0.2
0.3
0.4
0.5
0.6
0.7
0.8
30
U-NAF-Sp
U-NAF
U-AF-FFT-Sp
U-AF-FFT
R-IMATI
R-Sp
U-AF-FIR-Sp
U-AF-FIR
20
10
0
0.1
0.9
0.2
0.3
Sampling Rate
0.4
0.5
0.6
0.7
0.8
0.9
Sampling Rate
Fig. 1. SNR vs. sampling rate.
Fig. 3. SNR of all methods.
3.5
4.5
4
3
3.5
3
U-NAF-Sp
U-NAF
U-AF-FFT-Sp
U-AF-FFT
U-AF-FIR-Sp
U-AF-FIR
R-IMATI
R-Sp
2
1.5
1
0.5
0.1
PESQ
PESQ
2.5
0.2
0.3
0.4
0.5
0.6
0.7
0.8
U-NAF-Sp
U-NAF
U-AF-FFT-Sp
U-AF-FFT
U-AF-FIR-Sp
U-AF-FIR
R-IMATI
R-Sp
2.5
2
1.5
1
0.5
0.1
0.9
Sampling Rate
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Sampling Rate
Fig. 2. PESQ vs. sampling rate.
Fig. 4. PESQ of all methods.
B. Music Signal
Sampling rate = 0.5
35
We have also compared uniform and random sampling with
music signals. The SNR of all methods is compared in Fig. 4.
In uniform sampling scheme, the high frequency components
will be filtered; Therefore, we expect spline does not work for
music signal as well as for speech signal.
Although the PESQ is not coincident with music, we measured PESQ for the recovered music signals. Fig. 4 presents
the PESQ of all methods. It can be seen that uniform sampling
with FIR filter as anti-aliasing filter has the best value of PESQ
between these sampling and recovery methods.
We also test different FIR and IIR filters as anti-aliasing
filter before uniform sampling. The original signal is recovered
by cubic spline interpolation. The results are shown in Figs.
5-8. It can be seen that using FIR filter as anti-aliasing filter
leads to better performance in terms of SNR and PESQ.
SNR (dB)
30
25
20
Maximally flat
Window
Least square
Generalized Eq. ripple
15
10
50
60
70
80
90
100
110
120
Filter Order
Fig. 5. Different FIR Filters.
C. Complexity Comparison
seen that the simulation time does not change as sampling
rate changes. This is true almost for all methods.
The normalized CPU time is also charted in Fig. 10.
Random sampling with IMATI for reconstruction takes more
We provide a complexity comparison between the methods
of Table I by measuring the run time of simulations. Fig.
9 shows the run time for different sampling rate. It can be
2
2017 International Conference on Sampling Theory and Applications (SampTA)
0.14
Sampling rate = 0.5
X: 0.2
Y: 0.1185
2.85
Time (sec) Frame-Size = 1024
0.12
PESQ
2.8
2.75
2.7
2.65
2.6
50
Maximally flat
Window
Least square
Generalized Eq. ripple
60
0.1
IMATI
IMAT-Spline
IMAT
Spline
Uniform
0.08
0.06
0.04
X: 0.2
Y: 0.01287
0.02
70
80
90
100
110
0
0.1
120
0.15
0.2X: 0.2
0.25
Y: 3.446e-05
Filter Order
0.3
0.35
0.4
0.45
0.5
Sampling Rate
Fig. 9. CPU time of all methods.
Fig. 6. Different FIR Filters.
1.5
13
Normalized CPU Time
12
11
SNR (dB)
X: 0.2
Y: 0.04263
10
9
1
0.5
0.38
0.11
Butterworth
Elliptic
Chebyshev
8
1
0
IMATI
IMAT-Spline
IMAT
0.006
0.0003
Spline
Uniform-LP
7
5
10
15
20
25
30
Fig. 10. Normalized CPU time of all methods.
Filter Order
Fig. 7. Different IIR Filters.
well matches to the samples. Thus, in cases that the signal is
band-limited, the cubic spline leads to smaller error. As the
name suggests, in the cubic spline, the interpolated value at
each missing sample is a cubic interpolation of the values at
neighboring samples. Given a set of n + 1 data points (xi , yi )
where x0 < x1 < . . . < xn , the spline interpolator S(x) is a
polynomial of degree 3 on each subinterval [xi−1 , xi ] where
i = 1, . . . , n, i.e.,

C1 (x)
x0 ≤ x ≤ x1





Ci (x)
xi−1 ≤ x ≤ xi
S(x) =
(1)





Cn (x)
xn−1 ≤ x ≤ xn ,
2.5
Butterworth
Elliptic
Chebyshev
2.4
PESQ
2.3
2.2
2.1
2
1.9
5
10
15
20
25
30
Filter Order
where Ci (x) = ai + bi x + ci x2 + di x3 (di 6= 0) is a cubic
function. An example of cubic spline interpolation is shown
in Fig. 11.
We have generated an artificial 64-sparse signal with length
1024 and sampled its inverse transformed version uniformly.
The original, sampled, and reconstructed signal is shown in
Fig. 12. It can be seen that the original signal is not smooth at
all. In other words, it is not a nearly band-limited or LP signal.
Consider two samples highlighted in Fig. 12. We expect the
original signal have a value near zero that is between the value
of these two samples. But it is about -30.
Fig. 8. Different IIR Filters.
time than the uniform sampling with basic LP filtering or
spline interpolation.
IV. C UBIC S PLINE I NTERPOLATION
We observed that the spline interpolation does not work
well in random sampling case. Among all spline interpolations,
the cubic spline gives smoother interpolating polynomial that
3
2017 International Conference on Sampling Theory and Applications (SampTA)
[4] A. J. Jerri, “The shannon sampling theorem: Its various extensions and
applications: A tutorial review,” Proceedings of the IEEE, vol. 65, no. 11,
pp. 1565–1596, November 1977.
[5] M. B. Mashhadi, N. Salarieh, E. S. Farahani, and F. A. Marvasti, “Level
crossing speech sampling and its sparsity promoting reconstruction using
an iterative method with adaptive thresholding,” IET Signal Processing,
2017.
[6] F. Marvasti, “Spectrum of nonuniform samples,” Electronics Letters,
vol. 20, no. 21, pp. 896–897, October 1984.
[7] J. L. Yen, “On nonuniform sampling of bandwidth-limited signals,” IRE
Transaction on Circuit Theory, vol. CT-3, pp. 251–257, December 1956.
[8] A. Papoulis, Signal Analysis. New York: McGraw-Hill, 1997.
[9] E. Masry, “Random sampling and reconstruction of spectra,” Inf. Contr.,
vol. 19, pp. 275–288, 1971.
[10] F. Marvasti, “Spectral analysis of random sampling and error free
recovery by an iterative method,” IECE Transaction of Inst. Electron.
Commun. Engs. Japan, vol. E 69, no. 2, 1986.
[11] F. Marvasti, A. Amini, F. Haddadi, M. Soltanolkotabi, B. Khalaj,
A. Aldroubi, S. Sanei, and J. Chambers, “A unified approach to sparse
signal processing,” EURASIP Journal on Advances in Signal Processing,
2012.
[12] IMAT algorithm, “Iterative method with adaptive thresholding algorithm
for sparse signal reconstruction,” 2017, [Online]. Available: http://ee.
sharif.ir/$\sim$imat. [Accessed: 07-January-2017].
[13] M. Azghani and F.Marvasti, “Iterative methods for random sampling
and compressed sensing recovery,” in 10th International Conference on
Sampling Theory and Applications (SAMPTA 2013), July 2013.
[14] A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, “Perceptual evaluation of speech quality (PESQ)-a new method for speech
quality assessment of telephone networks and codecs,” in 2001 IEEE
International Conference on Acoustics, Speech, and Signal Processing.
Proceedings (Cat. No.01CH37221), vol. 2, 2001, pp. 749–752.
Original Signal
0.04
Random Sampled Signal
Reconstructed Signal by Cubic Spline
0.03
0.02
0.01
0
-0.01
-0.02
237
238
239
240
241
242
243
Fig. 11. An example of cubic spline interpolation.
Fig. 12. An example of cubic spline interpolation for pure sparse signal.
V. C ONCLUSION
In this paper, we compared uniform and random sampling
schemes for speech and music signals. We also used different
reconstruction techniques for both sampling schemes. In case
the speech signal is divided into frames, both objective performance metric criteria and subjective evaluations proposed that
the uniform sampling with spline interpolation outperforms
other methods. This is true due to the fact that speech and
music signals are LP signals. However, this is not true if the
signal has high frequency components or it is sparse. In case,
the signal is not purely low-pass or sparse, one can use subband coding in which the low and high frequency components
are sampled uniformly and randomly, respectively.
R EFERENCES
[1] C. E. Shannon, “Communication in the presence of noise,” in Proc. IRE,
vol. 37, January 1949, pp. 10–21.
[2] F. Marvasti, Nonuniform Sampling: Theory and Practice. Springer,
formerly Kluwer Academic/Plenum Publishers, 2001.
[3] H. J. Landau, “Necessary density conditions for sampling and interpolation of certain entire functions,” Acta Mathematica, vol. 117, no. 1,
pp. 37–52, 1967.
4