An Iterative Multichannel Subspace

An Iterative Multichannel Subspace-Based
Covariance Subtraction Method for
Relative Transfer Function Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
International Audio Laboratories Erlangen
March 1, 2017
Introduction and Motivation
Source extraction in noisy environments is
ubiquitous in hands-free applications
Desired
source
Noise source
To estimate the desired source we need to
estimate the transfer functions Hm
To extract the desired source as received
by the first microphones we only need to
estimate Hm /H1
RTFs can be estimated from the data when the source is active
We summarise state-of-the art estimators and propose an efficient
iterative RTF estimator suitable for real-time applications
© AudioLabs 2017
Slide 1
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Content
1. Signal Model and Source Extraction
2. Existing RTF Estimators
3. Proposed RTF Estimator
4. Performance Evaluation
5. Conclusions
© AudioLabs 2017
Slide 2
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Content
1. Signal Model and Source Extraction
2. Existing RTF Estimators
3. Proposed RTF Estimator
4. Performance Evaluation
5. Conclusions
© AudioLabs 2017
Slide 3
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Signal Model
Desired speech and noise signals captured by M microphones
STFT-domain signal at time n, frequency k :
y(n, k) = x(n, k) + v(n, k)
= h(n, k) S(n, k) + v(n, k)
= g(n, k) X1 (n, k) + v(n, k)
The RTF vector can be expressed in terms of the acoustic transfer
functions Hm (n, k):
T
H2 (n, k)
HM (n, k)
g(n, k) = 1,
, ··· ,
H1 (n, k)
H1 (n, k)
The RTF vector is time-dependent to model source movements
© AudioLabs 2017
Slide 4
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Signal Model
The power spectral density (PSD) matrices Φy and Φv are required
for RTF estimation
The PSD matrix of the received signal:
Φy (n, k) = Φx (n, k) + Φv (n, k)
The PSD matrix of the desired signal:
Φx (n, k) = φx1 (n, k) g(n, k)g H (n, k) with φx1 = E |X1 |2
The PSD matrix of the undesired signal, Φv , can be estimated
during speech absence, or using speech presence
probability-controlled recursive averaging (Souden et al., 2011)
© AudioLabs 2017
Slide 5
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Source Extraction
Estimate of the desired signal:
b1 (n, k) = wH (n, k) y(n, k)
X
= wH (n, k) g(n, k) X1 (n, k) + v(n, k)
Distortionless response if wH g = 1
Minimum Variance Distortionless Response (MVDR) filter:
w(n, k) = arg min wH Φv (n, k)w subject to wH g(n, k) = 1
w
=
Φ−1
v (n, k) g(n, k)
g(n, k)H Φ−1
v (n, k) g(n, k)
For real-time applications, the RTF vector needs to be efficiently
estimated online using the microphone signals y(n, k)
© AudioLabs 2017
Slide 6
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Content
1. Signal Model and Source Extraction
2. Existing RTF Estimators
3. Proposed RTF Estimator
4. Performance Evaluation
5. Conclusions
© AudioLabs 2017
Slide 7
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Existing RTF Estimators
Method 1: Covariance Subtraction
Recall the definition:
Φx (n, k) = φx1 (n, k)g(n, k)g H (n, k)
The RTF can be obtained by
g CS (n, k) =
with e1 = [1, 0, . . . , 0]T
bx = Φ
by − Φ
bv
In practice Φx can be estimated using Φ
© AudioLabs 2017
Slide 8
Φx (n, k) e1
eT
1 Φx (n, k) e1
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Existing RTF Estimators
Method 2: Covariance Subtraction with EVD
The RTF vector g is proportional to the principal eigenvector of Φx
An estimate of the RTF vector is given by the principal eigenvector
bx = Φ
by − Φ
bv
umax of Φ
g CS−EVD (n, k) =
umax (n, k)
eT
1 umax (n, k)
by − Φ
b v provides better performance
The principal eigenvector of Φ
by − Φ
b v (Serizel et al., 2014)
in spatial filtering than the column of Φ
R. Serizel et al., ”Low-rank approximation based multichannel Wiener filter algorithms for
noise reduction with application in cochlear implants”, IEEE/ACM Transactions on ASLP,
2014
© AudioLabs 2017
Slide 9
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Existing RTF Estimators
Method 3: Covariance Whitening
A generalized eigenvalue problem:
(φx1 gg H + Φv ) u = λΦv u
|
{z
}
Φy
In theory: Only one eigenvalue λ 6= 1
In practice: Use the principal eigenvector umax of Φ−1
v Φy
g CW (n, k) =
© AudioLabs 2017
Slide 10
Φv (n, k) umax (n, k)
T
b v (n, k) umax (n, k)
e1 Φ
b
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Existing RTF Estimators
Method 4: Covariance Whitening using PM
Use power method to estimate the GEVD (Krueger et al., 2011)
b −1 (n, k)Φ
b y (n, k)
Iteration matrix: Acw (n, k) = Φ
v
b max (n, k) =
Power iteration: u
Compute the RTF vector:
g PM−CW (n, k) =
Acw (n,k)b
umax (n−1,k)
kAcw (n,k)b
umax (n−1,k)k
b v (n, k) u
b max (n, k)
Φ
T
b
b max (n, k)
e Φv (n, k) u
1
Krueger et al., ”Speech enhancement with a GSC-like structure employing
eigenvector-based transfer function ratios estimation”, IEEE Transactions on ASLP, 2011
© AudioLabs 2017
Slide 11
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Existing RTF Estimators
Summary
Covariance-Subtraction: gCS
I Computationally efficient
Covariance-Subtraction with EVD: gCS−EVD
I More accurate than gCS (Serizel et al., 2014)
I Requires EVD
Covariance-Whitening: gCW
I More accurate than g
CS (Markovich-Golan et al., 2015)
I Requires GEVD
Covariance-Whitening with PM: gPM−CW (Krueger et al., 2011)
S. Markovich-Golan et al., ”Performance analysis of the CS method for relative transfer
function estimation and comparison to the CW method”, IEEE Transactions on ASLP, 2015
© AudioLabs 2017
Slide 12
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Content
1. Signal Model and Source Extraction
2. Existing RTF Estimators
3. Proposed RTF Estimator
4. Performance Evaluation
5. Conclusions
© AudioLabs 2017
Slide 13
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Proposed RTF Estimator
Computing g PM−CW is less complex than computing g CW
b v to compute Acw , and
It still involves the inversion of Φ
b
multiplication by Φv to obtain g PM−CW from the eigenvector umax
We propose to estimate g CS−EVD using the power method
I
Iteration matrix:
b y (n, k) − Φ
b v (n, k)
Acs (n, k) = Φ
I
Power iteration:
b max (n, k) =
u
g PM−CS (n, k) =
© AudioLabs 2017
Slide 14
Acs (n,k)b
umax (n−1,k)
kAcs (n,k)b
umax (n−1,k)k
b max (n, k)
u
b max (n, k)
eT
1 u
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Content
1. Signal Model and Source Extraction
2. Existing RTF Estimators
3. Proposed RTF Estimator
4. Performance Evaluation
5. Conclusions
© AudioLabs 2017
Slide 15
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Experimental Setup
Simulated room 4.5 × 4 × 3 m3 , reverberation time T60 = 0.3 s
Uniform 5-element linear array, inter-microphone distance 4 cm
Microphone signals contain desired speech, directional interferer
(fan noise), and sensor noise
I
I
signal-to-interference ratio (SIR): {0, ∞} dB
signal-to-sensor noise ratios (SNRs): [−5, 15] dB
In all experiments, source-array distance was 1-1.2 m
STFT frame-size is 128 ms, overlap 50%, sampling rate 16 kHz
Noise PSD matrix:
1. Estimated in advance during speech absence (denoted by ”Id”)
2. Estimated using speech presence probability-based framework
© AudioLabs 2017
Slide 16
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Results: Distance Measure
Hermitian Angle: Θ(n, k) = arccos
Without
interferer
Only
sensor
noise
Withwith
interferer
Fan Noise
SIR = 0 dB
1.2
1
Average Error (radian)
Average Error (radian)
1.2
0.8
0.6
0.4
0.2
0
b(n, k)|
|g H (n, k) g
kg(n, k)k kb
g (n, k)k
CSId
CS-EVDId
PM-CS Id
-5
0
5
10
15
1
0.8
0.6
0.4
0.2
0
CSId
CS-EVDId
PM-CS Id
-5
0
SNR [dB]
10
15
Averaged Θ(n, k) over time segment of 15 s for all n and k
CS-EVD outperforms CS and the error of the proposed PM-CS lies
between the two methods
© AudioLabs 2017
Slide 17
5
SNR [dB]
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Results: Source Extraction Using MVDR
MVDR filters using different RTF estimates
Objective quality evaluation:
Speech distortion (SD) index
I
Signal to interference-plus-noise ratio (SINR) improvement
compared to the reference microphone
The measures are computed for non-overlapping 30 ms frames and
are then averaged over all frames (15 seconds)
© AudioLabs 2017
Slide 18
I
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Speech distortion (fan noise with 0 dB SIR)
0.2
0.20
Speech distortion index
0.18
CW
CW
PM-CW
PM-CW
CS-EVD
CS-EVD
PM-CS
PM-CS
CS
CS
estimated noise PSD
SPP-based noise PSD
0.16
0.16
ideal noise PSD
0.14
0.12
0.12
0.1
0.08
0.08
0.06
0.04
0.04
0.02
00
-5
-5
0
0
55
SNR [dB]
10
10
15
15
Signal-to-noise ratio (SNR) [dB]
Ideal noise PSD matrix: The proposed PM-CS causes similar or
larger SD than the CS-EVD, but smaller than the CS
Estimated noise PSD matrix: The distortion of PM-CS and CS-EVD
is comparable
Estimated noise PSD matrix: PM-CS causes lower SD than CW
and PM-CW
SINR improvement (fan noise with 0 dB SIR)
20
20
SINR improvement [dB]
18
CW
CW
PM-CW
PM-CW
CS-EVD
CS-EVD
SPP-based noise PSD
16
16
PM-CS
PM-CS
CS
CS
estimated noise PSD
ideal noise PSD
14
12
12
10
88
6
44
2
00
-5
-5
00
55
SNR [dB]
10
10
15
15
Signal-to-noise ratio (SNR) [dB]
CS provides less SINR improvement than the alternatives which is
consistent with (Markovich-Golan et al., 2015)
Estimated noise PSD matrix: The proposed PM-CS has similar
SINR improvement than CS-EVD
Content
1. Signal Model and Source Extraction
2. Existing RTF Estimators
3. Proposed RTF Estimator
4. Performance Evaluation
5. Conclusions
© AudioLabs 2017
Slide 21
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Conclusions
Motivated by the advantage of g CS−EVD compared to g CS , we
proposed an iterative estimator to reduce the complexity
Although the proposed PM-CS estimator has a greater
computationally complexity than the CS estimator, it is less complex
than the PM-CW estimator
When the noise statistics are estimated, the performance of the
proposed estimator is comparable to the CS-EVD estimator
© AudioLabs 2017
Slide 22
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets
Thank you for your attention.
© AudioLabs 2017
Slide 23
An Iterative Covariance Subtraction Method for RTF Estimation
Reza Varzandeh, Maja Taseska and Emanuël Habets