BGU
Multiple Pitch Tracking for Blind Source
Separation Using a Single Microphone
Joseph Tabrikian
Dept. of Electrical and Computer Engineering
Ben-Gurion University of the Negev
Workshop on:
Speech Enhancement and Multichannel Audio Processing
Technion 22.2.2007
Outline





Motivation
Single source pitch estimation and tracking
Multiple source pitch estimation and tracking
Experiments
Conclusion
Motivation


Speech enhancement
Many audio processing algorithms are sensitive to interference. For example:
Automatic speech/speaker recognition
Speech/music compression
Single-microphone blind source separation (BSS)
Karaoke
Single Source - Modeling

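Voiced frames - harmonic model:
$$y(t_n) = \sum_{k=1}^{K} b_k \cos(k\omega t_n + \phi_k) + v(t_n), \quad n = 1, \ldots, N$$
$v(t_n)$ - additive Gaussian noise
In matrix notation: $\mathbf{y} = \mathbf{A}(\omega)\mathbf{b} + \mathbf{v}$, $\mathbf{v} \sim N(\mathbf{0}, \mathbf{R}_v)$
$$\mathbf{A}(\omega) = \begin{bmatrix}
1 & \cos\omega t_1 & \cdots & \cos K\omega t_1 & \sin\omega t_1 & \cdots & \sin K\omega t_1 \\
1 & \cos\omega t_2 & \cdots & \cos K\omega t_2 & \sin\omega t_2 & \cdots & \sin K\omega t_2 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
1 & \cos\omega t_N & \cdots & \cos K\omega t_N & \sin\omega t_N & \cdots & \sin K\omega t_N
\end{bmatrix}$$
$$\mathbf{b} = \begin{bmatrix} b_{c0} & b_{c1} & \cdots & b_{cK} & b_{s1} & \cdots & b_{sK} \end{bmatrix}^T$$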
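The matrix form of the harmonic model can be sketched numerically as follows. This is a minimal illustration, not the author's code: the function name `harmonic_matrix`, the sampling rate, the frame length, and the amplitudes are all assumptions chosen for the example. It uses the identity $b_k\cos(k\omega t + \phi_k) = b_{ck}\cos k\omega t + b_{sk}\sin k\omega t$ with $b_{ck} = b_k\cos\phi_k$ and $b_{sk} = -b_k\sin\phi_k$.

```python
import numpy as np

def harmonic_matrix(f0, t, K):
    """Build A(omega): columns [1, cos(k w t) for k=1..K, sin(k w t) for k=1..K]."""
    w = 2.0 * np.pi * f0                      # fundamental frequency in rad/s
    k = np.arange(1, K + 1)
    phase = w * np.outer(t, k)                # shape (N, K): k * w * t_n
    return np.hstack([np.ones((t.size, 1)), np.cos(phase), np.sin(phase)])

# Hypothetical frame: 400 samples at 8 kHz, 5 harmonics of a 200 Hz pitch.
fs, N, K = 8000.0, 400, 5
t = np.arange(N) / fs
A = harmonic_matrix(200.0, t, K)

# b = [b_c0, b_c1..b_cK, b_s1..b_sK]; here unit cosine amplitudes, no DC.
b = np.concatenate([[0.0], np.ones(K), np.zeros(K)])

# y = A(omega) b + v, with white Gaussian noise standing in for v.
rng = np.random.default_rng(0)
y = A @ b + 0.1 * rng.standard_normal(N)
```

The matrix has $2K+1$ columns, which is the model order that appears later in the subframe condition for multiple sources.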
Single Source – Pitch Tracking

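Maximum Likelihood (ML) estimator:
$$\hat{\omega} = \arg\max_{\omega} \left\| \mathbf{P}_{\mathbf{R}_v^{-1/2}\mathbf{A}(\omega)} \, \mathbf{R}_v^{-1/2} \mathbf{y} \right\|^2$$
$$\mathbf{P}_{\mathbf{R}_v^{-1/2}\mathbf{A}(\omega)} = \mathbf{R}_v^{-1/2}\mathbf{A}(\omega) \left[ \mathbf{A}^H(\omega) \mathbf{R}_v^{-1} \mathbf{A}(\omega) \right]^{-1} \mathbf{A}^H(\omega) \mathbf{R}_v^{-1/2}$$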
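Pitch tracking:
The data vector at the mth frame:
$$\mathbf{y}_m = \mathbf{A}(\omega_m)\mathbf{b}_m + \mathbf{v}_m, \quad m = 1, \ldots, M$$
$\{\omega_m\}_{m=1}^{M}$ - first-order Markov process:
$$f(\omega_1, \ldots, \omega_M) = \prod_{m=1}^{M} f(\omega_m \mid \omega_{m-1})$$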
Maximum A-posteriori Probability (MAP) pitch tracking
via the Viterbi algorithm.
(Tabrikian-Dubnov-Dickalov 2004)
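A minimal sketch of MAP tracking with the Viterbi algorithm over a discrete pitch grid is given below. It assumes the per-frame log-likelihoods have already been evaluated on the grid, and it uses a Gaussian random-walk transition prior on the grid index; that prior, the uniform initial prior, and all names are assumptions made for illustration, since the slides do not specify $f(\omega_m \mid \omega_{m-1})$.

```python
import numpy as np

def viterbi_map(loglik, log_trans, log_prior=None):
    """MAP state sequence for loglik[m, i] = log f(y_m | state i)."""
    M, F = loglik.shape
    if log_prior is None:
        log_prior = np.full(F, -np.log(F))     # uniform initial prior (assumed)
    delta = log_prior + loglik[0]              # best score ending in each state
    back = np.zeros((M, F), dtype=int)         # backpointers
    for m in range(1, M):
        scores = delta[:, None] + log_trans    # scores[i, j]: from state i to j
        back[m] = np.argmax(scores, axis=0)
        delta = scores[back[m], np.arange(F)] + loglik[m]
    path = np.empty(M, dtype=int)
    path[-1] = np.argmax(delta)
    for m in range(M - 1, 0, -1):              # backtrack
        path[m - 1] = back[m, path[m]]
    return path

def random_walk_log_trans(F, sigma_bins):
    """Row-normalized Gaussian random-walk log-transition matrix (assumed prior)."""
    idx = np.arange(F)
    logp = -0.5 * ((idx[:, None] - idx[None, :]) / sigma_bins) ** 2
    return logp - np.log(np.sum(np.exp(logp), axis=1, keepdims=True))

# Toy example: 4 frames, 5 pitch bins; frame 2 has a spurious peak at bin 4.
loglik = np.full((4, 5), -10.0)
loglik[[0, 1, 3], 2] = 0.0
loglik[2, 4] = 0.0
loglik[2, 2] = -1.0

path = viterbi_map(loglik, random_walk_log_trans(5, 0.5))
framewise = np.argmax(loglik, axis=1)
```

In this toy case the frame-by-frame maximum jumps to bin 4 in the third frame, while the MAP track stays on bin 2 because the transition prior penalizes the jump.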
Single Source - Voicing Decision

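Unvoiced model
Colored Gaussian noise model: $\mathbf{y} \sim N(\mathbf{0}, \mathbf{R}_y)$
Voiced/unvoiced decision by the
Generalized Likelihood Ratio Test (GLRT):
$$\mathrm{GLRT} = \frac{\max_{\omega, \mathbf{b}, \sigma_v^2} f(\mathbf{y} \mid \omega, \mathbf{b}, \sigma_v^2; H_{voiced})}{\max_{\mathbf{R}_y} f(\mathbf{y} \mid \mathbf{R}_y; H_{unvoiced})}$$
$$\frac{\|\mathbf{y}\|^2}{\left\| \left( \mathbf{I} - \mathbf{P}_{\mathbf{A}(\omega)} \right) \mathbf{y} \right\|^2} \underset{H_{unvoiced}}{\overset{H_{voiced}}{\gtrless}} \gamma$$
(Fisher-Tabrikian-Dubnov 2006)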
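The energy-ratio statistic can be illustrated with a short sketch. It evaluates $\|\mathbf{y}\|^2 / \|(\mathbf{I} - \mathbf{P}_{\mathbf{A}(\omega)})\mathbf{y}\|^2$ at the pitch that maximizes it over a grid, which is an assumption about how $\omega$ is chosen. The threshold value, the white-noise stand-in for the unvoiced frame, and the function names are likewise hypothetical.

```python
import numpy as np

def harmonic_matrix(f0, t, K):
    """A(omega) with columns [1, cos(k w t), sin(k w t)], k = 1..K."""
    phase = 2.0 * np.pi * f0 * np.outer(t, np.arange(1, K + 1))
    return np.hstack([np.ones((t.size, 1)), np.cos(phase), np.sin(phase)])

def voicing_statistic(y, t, K, f_grid):
    """Largest ||y||^2 / ||(I - P_A(omega)) y||^2 over the pitch grid."""
    total = np.sum(y ** 2)
    best = 0.0
    for f0 in f_grid:
        Q, _ = np.linalg.qr(harmonic_matrix(f0, t, K))
        residual = total - np.sum((Q.T @ y) ** 2)   # energy outside span(A)
        best = max(best, total / residual)
    return best

fs, N, K = 8000.0, 400, 5
t = np.arange(N) / fs
f_grid = np.arange(100.0, 401.0, 1.0)
rng = np.random.default_rng(1)

# Hypothetical voiced frame (harmonics of 200 Hz) and unvoiced frame (noise only).
voiced = sum(np.cos(2 * np.pi * 200.0 * k * t) for k in range(1, K + 1))
voiced = voiced + 0.1 * rng.standard_normal(N)
unvoiced = rng.standard_normal(N)

gamma = 10.0                                        # illustrative threshold
stat_voiced = voicing_statistic(voiced, t, K, f_grid)
stat_unvoiced = voicing_statistic(unvoiced, t, K, f_grid)
is_voiced = [stat_voiced > gamma, stat_unvoiced > gamma]
```

In practice the threshold would be set from a target false-alarm rate rather than fixed by hand.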

Multiple Sources

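ML estimator of $\theta$ from $\{\mathbf{y}_j\}_{j=1}^{J}$ under the
model $\mathbf{y}_j = \mathbf{a}_\theta s_j + \mathbf{v}_j$, with unknown signal and
unknown (Gaussian) noise covariance:
$$\hat{\theta}_{ML} = \arg\max_{\theta} \left\{ -\sum_{l=1}^{L-1} \left[ \log\max(\lambda_l^{G}, \sigma^2) + \frac{\lambda_l^{G}}{\max(\lambda_l^{G}, \sigma^2)} \right] \right\}$$
$$\mathbf{G} = \mathbf{T}_A^T \mathbf{R}_y \mathbf{T}_A, \quad \mathbf{T}_A = \mathrm{svd}(\mathbf{I} - \mathbf{a}_\theta \mathbf{a}_\theta^T), \quad \mathbf{T}_A : L \times (L-1)$$
$$\sigma^2 \to 0 \;\Rightarrow\; \hat{\theta} = \arg\max_{\theta} \left\{ -\sum_{l=1}^{L-1} \log \lambda_l^{G} \right\}$$
$$J \ge L \;\Rightarrow\; \hat{\theta} = \arg\max_{\theta} \frac{1}{\mathbf{a}_\theta^T \mathbf{R}_y^{-1} \mathbf{a}_\theta} \quad \text{(MVDR)}$$
(Harmanci-Tabrikian-Krolik 2000)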
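The step from the sum of log-eigenvalues to the MVDR form rests on a determinant identity: for a unit-norm steering vector and an invertible $\mathbf{R}_y$, $\log\det(\mathbf{T}_A^T \mathbf{R}_y \mathbf{T}_A) = \log\det\mathbf{R}_y + \log(\mathbf{a}^T \mathbf{R}_y^{-1} \mathbf{a})$. Since $\det\mathbf{R}_y$ does not depend on the parameter, minimizing the left side is the same as maximizing $1/(\mathbf{a}^T \mathbf{R}_y^{-1} \mathbf{a})$. The check below is a numerical sketch under those assumptions, with random data and arbitrary sizes chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, J = 6, 50                                   # vector length, snapshots (J >= L)

a = rng.standard_normal(L)
a /= np.linalg.norm(a)                         # unit-norm steering vector (assumed)

snapshots = rng.standard_normal((L, J))
R = snapshots @ snapshots.T / J                # sample covariance, invertible here

# T_A: orthonormal basis of the complement of a, from the SVD of I - a a^T.
# The singular values are 1 (L-1 times) and 0, sorted in descending order.
U, s, _ = np.linalg.svd(np.eye(L) - np.outer(a, a))
T = U[:, : L - 1]

G = T.T @ R @ T
sum_log_eigs = np.sum(np.log(np.linalg.eigvalsh(G)))          # sum_l log lambda_l(G)
mvdr_form = np.linalg.slogdet(R)[1] + np.log(a @ np.linalg.solve(R, a))
```

The two quantities agree to numerical precision, so ranking candidates by the eigenvalue sum or by the MVDR spectrum gives the same maximizer when the steering vector has constant norm.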
Multiple Sources


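Voiced model: $\mathbf{y} = \mathbf{A}(\omega)\mathbf{b} + \mathbf{v}$, $\mathbf{v} \sim N(\mathbf{0}, \mathbf{R}_v)$
$\mathbf{v}$ includes other interferences. $\mathbf{R}_v$ is unknown.
Using $J$ overlapping subframes of size $L_s$ ($2K+1 < J < L_s$):
$$\mathbf{R}_y = \frac{1}{J} \mathbf{Y}\mathbf{Y}^T$$
jth column of $\mathbf{Y}$: $[y_j, y_{j+1}, \ldots, y_{j+L_s-1}]^T$
$$\hat{\omega}_{ML} = \arg\max_{\omega} \left\{ -\sum_{j=1}^{J} \log \lambda_j^{G}(\omega) \right\}$$
$$\mathbf{G}(\omega) = \frac{1}{J} \mathbf{Y}^T \left[ \mathbf{I} - \mathbf{U}_A(\omega)\mathbf{U}_A^T(\omega) \right] \mathbf{Y}$$
$$\mathbf{A}(\omega) = \mathbf{U}_A(\omega) \mathbf{\Lambda}_A(\omega) \mathbf{V}_A^T(\omega)$$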
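The subframe-based criterion can be sketched as follows. The code stacks overlapping subframes into $\mathbf{Y}$, removes the harmonic subspace, and scores each candidate pitch by $-\log\det\mathbf{G}(\omega)$, which equals the negative sum of the log-eigenvalues. The one-sample hop between subframes, the use of a QR basis in place of $\mathbf{U}_A$ from the SVD (both give the same projector), the signal parameters, and the function names are assumptions for illustration.

```python
import numpy as np

def harmonic_matrix(f0, t, K):
    """A(omega) with columns [1, cos(k w t), sin(k w t)], k = 1..K."""
    phase = 2.0 * np.pi * f0 * np.outer(t, np.arange(1, K + 1))
    return np.hstack([np.ones((t.size, 1)), np.cos(phase), np.sin(phase)])

def subframe_matrix(y, J):
    """Y with J columns; column j is the subframe y[j : j + Ls] (hop of one sample)."""
    Ls = y.size - J + 1
    return np.column_stack([y[j : j + Ls] for j in range(J)])

def neg_logdet_G(Y, f0, K, fs):
    """-sum_j log lambda_j(G), with G = Y^T (I - U U^T) Y / J."""
    Ls, J = Y.shape
    U, _ = np.linalg.qr(harmonic_matrix(f0, np.arange(Ls) / fs, K))
    residual = Y - U @ (U.T @ Y)               # (I - U U^T) Y
    G = residual.T @ residual / J
    return -np.linalg.slogdet(G)[1]

# Hypothetical mixture: strong source at 220 Hz, weaker one at 310 Hz, plus noise.
fs, N, J, K = 8000.0, 800, 100, 4
t = np.arange(N) / fs
rng = np.random.default_rng(0)
strong = 3.0 * sum(np.cos(2 * np.pi * 220.0 * k * t + 0.3 * k) for k in range(1, K + 1))
weak = sum(np.cos(2 * np.pi * 310.0 * k * t + 0.7 * k) for k in range(1, K + 1))
y = strong + weak + 0.1 * rng.standard_normal(N)

Y = subframe_matrix(y, J)
f_grid = np.arange(150.0, 401.0, 1.0)
cost = np.array([neg_logdet_G(Y, f0, K, fs) for f0 in f_grid])
f_hat = f_grid[np.argmax(cost)]
```

With these settings the global maximum falls at the pitch of the stronger source, and the weaker source shows up as a secondary peak of the same function.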
Multiple Sources

Pitch tracking:
The data vector at the mth frame:
$$\mathbf{y}_m = \mathbf{A}(\omega_m)\mathbf{b}_m + \mathbf{v}_m, \quad m = 1, \ldots, M$$
$\{\omega_m\}_{m=1}^{M}$ - first-order Markov process
$\Rightarrow$ Maximum A-posteriori Probability (MAP) pitch
tracking via the Viterbi algorithm
Multiple Sources - Voicing Decision

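Unvoiced model
Colored Gaussian noise model: $\mathbf{y} \sim N(\mathbf{0}, \mathbf{R}_y)$
Voiced/unvoiced decision by the GLRT:
$$\mathrm{GLRT} = \frac{\max_{\omega, \mathbf{b}, \mathbf{R}_v} f(\mathbf{y} \mid \omega, \mathbf{b}, \mathbf{R}_v; H_{voiced})}{\max_{\mathbf{R}_v} f(\mathbf{y} \mid \mathbf{R}_v; H_{unvoiced})}$$
$$\prod_{j=1}^{J} \frac{\lambda_j^{R_y}}{\lambda_j^{G}} \underset{H_{unvoiced}}{\overset{H_{voiced}}{\gtrless}} \gamma$$
(Fisher-Tabrikian-Dubnov 2007)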
Multiple Source Models
Exact ML for the strongest voiced signal, and
“locally ML” for other voiced signals
[Figure: likelihood function, with peaks marked $\hat{\omega}_{ML} = \hat{\omega}_{1,LML}$ and $\hat{\omega}_{2,LML}$]
Experiments – Single Source
Experiments - Two Sources
[Figure: "Two voiced sources". Normalized log-likelihood (axis from 0 to -90) versus frequency (150 to 350 Hz).]
Experiments – Voicing Decision
Experiments – Voicing Decision
Conclusions




ML pitch estimators for single and multiple sources
have been developed under the harmonic model for
voiced frames.
The likelihood functions derived under the two
models allow implementation of the Viterbi
algorithm for MAP pitch tracking.
The GLRT for the voicing decision is derived under the
two models.
Future work:
Development of multiple hypothesis tracking methods for
single-microphone BSS.
Adaptive estimation of the number of harmonics.