Multiple Pitch Tracking for Blind Source Separation Using a Single Microphone
Joseph Tabrikian
Dept. of Electrical and Computer Engineering, Ben-Gurion University of the Negev
Workshop on Speech Enhancement and Multichannel Audio Processing, Technion, 22.2.2007

Outline
- Motivation
- Single-source pitch estimation and tracking
- Multiple-source pitch estimation and tracking
- Experiments
- Conclusions

Motivation
- Speech enhancement.
- Sensitivity of many audio processing algorithms to interference, for example:
  - automatic speech/speaker recognition
  - speech/music compression
- Single-microphone blind source separation (BSS).
- Karaoke.

Single Source - Modeling
- Voiced frames are described by a harmonic model:

  y(t_n) = b_{c0} + \sum_{k=1}^{K} [ b_{ck} \cos(k \omega t_n) + b_{sk} \sin(k \omega t_n) ] + v(t_n),   n = 1, ..., N

  where v(t_n) is additive Gaussian noise.
- In matrix notation:

  y = A(\omega) b + v,   v ~ N(0, R_v)

  A(\omega) = [ 1   \cos(\omega t_1)  ...  \cos(K \omega t_1)   \sin(\omega t_1)  ...  \sin(K \omega t_1)
                1   \cos(\omega t_2)  ...  \cos(K \omega t_2)   \sin(\omega t_2)  ...  \sin(K \omega t_2)
                ...
                1   \cos(\omega t_N)  ...  \cos(K \omega t_N)   \sin(\omega t_N)  ...  \sin(K \omega t_N) ]

  b = [ b_{c0}  b_{c1}  ...  b_{cK}  b_{s1}  ...  b_{sK} ]^T

Single Source - Pitch Tracking
- Maximum likelihood (ML) estimator:

  \hat{\omega} = \arg\max_{\omega} \| P_{R_v^{-1/2} A(\omega)} R_v^{-1/2} y \|^2

  P_{R_v^{-1/2} A(\omega)} = R_v^{-1/2} A(\omega) [ A^H(\omega) R_v^{-1} A(\omega) ]^{-1} A^H(\omega) R_v^{-1/2}

- Pitch tracking: the data vector at the mth frame is

  y_m = A(\omega_m) b_m + v_m,   m = 1, ..., M

  and {\omega_m} is modeled as a first-order Markov process:

  f(\omega_1, ..., \omega_M) = f(\omega_1) \prod_{m=2}^{M} f(\omega_m | \omega_{m-1})

- Maximum a-posteriori probability (MAP) pitch tracking via the Viterbi algorithm (Tabrikian-Dubnov-Dickalov 2004).

Single Source - Voicing Decision
- Unvoiced model: colored Gaussian noise, y ~ N(0, R_y).
- Voiced/unvoiced decision by the generalized likelihood ratio test (GLRT):

  GLRT = \frac{ \max_{\omega, b, \sigma_v^2} f(y | \omega, b, \sigma_v^2 ; H_{voiced}) }{ \max_{R_y} f(y | R_y ; H_{unvoiced}) } \underset{H_{unvoiced}}{\overset{H_{voiced}}{\gtrless}} \gamma

  which reduces to comparing \| y \|^2 / \| (I - P_{A(\hat{\omega})}) y \|^2 against a threshold (Fisher-Tabrikian-Dubnov 2006).

Multiple Sources
- ML estimation of \omega from observations {y_j}_{j=1}^{J} under the model y_j = a(\omega) s_j + v_j, with unknown signal and unknown (Gaussian) noise covariance:

  \hat{\omega}_{ML} = \arg\max_{\omega} \{ - \sum_{l=1}^{L-1} \log G_l(\omega) \}

  G(\omega) = T_A^T(\omega) \hat{R}_y T_A(\omega),   T_A(\omega) = svd( I - a(\omega) a^T(\omega) ),   T_A : L \times (L-1)

  where the G_l(\omega) are the eigenvalues of G(\omega).
- Related MVDR estimator:

  \hat{\omega} = \arg\max_{\omega} \frac{1}{ a^T(\omega) \hat{R}_y^{-1} a(\omega) }

  (Harmanci-Tabrikian-Krolik 2000)

Multiple Sources
- Voiced model: y = A(\omega) b + v,  v ~ N(0, R_v), where v includes the other interferences and R_v is unknown.
- Using J overlapping subframes of size L_s (2K+1 < J < L_s):

  \hat{R}_y = \frac{1}{J} Y Y^T,   jth column of Y:  [ y_j, y_{j+1}, ..., y_{j+N-J} ]^T

- ML pitch estimator (see the numerical sketch below):

  \hat{\omega}_{ML} = \arg\max_{\omega} \{ - \sum_{j=1}^{J} \log G_j(\omega) \}

  G(\omega) = \frac{1}{J} Y^T [ I - U_A(\omega) U_A^T(\omega) ] Y,   A(\omega) = U_A(\omega) \Lambda_A(\omega) V_A^T(\omega)

  where G_j(\omega) is the jth diagonal element of G(\omega).

Multiple Sources
- Pitch tracking: the data vector at the mth frame is

  y_m = A(\omega_m) b_m + v_m,   m = 1, ..., M

  and {\omega_m} is a first-order Markov process.
- MAP pitch tracking via the Viterbi algorithm.

Multiple Sources - Voicing Decision
- Unvoiced model: colored Gaussian noise, y ~ N(0, R_y).
- Voiced/unvoiced decision by the GLRT:

  GLRT = \frac{ \max_{\omega, b, R_v} f(y | \omega, b, R_v ; H_{voiced}) }{ \max_{R_y} f(y | R_y ; H_{unvoiced}) } \underset{H_{unvoiced}}{\overset{H_{voiced}}{\gtrless}} \gamma

  (Fisher-Tabrikian-Dubnov 2007)

Multiple Source Models
- Exact ML for the strongest voiced signal, and "locally ML" for the other voiced signals.
- [Figure: likelihood function, with the exact ML estimate \hat{\omega}_{ML} and the locally ML estimates \hat{\omega}_{1,LML}, \hat{\omega}_{2,LML} marked.]
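Before turning to the experiments, the multiple-source ML criterion above can be illustrated with a small numerical sketch. It builds the harmonic matrix A(\omega) on a grid of candidate pitch frequencies, projects the overlapping subframes onto the orthogonal complement of its column space, and scores each candidate by -\sum_j \log G_j(\omega). The sampling rate, number of harmonics, subframe length, pitch grid, and all function names below are illustrative assumptions rather than details from the talk; this is a minimal sketch, not the authors' implementation.

```python
# Minimal sketch (Python/NumPy) of the multiple-source pitch likelihood scan.
# Assumed, not from the talk: sampling rate fs, number of harmonics K,
# subframe length Ls, the 60-400 Hz candidate grid, and all names below.
import numpy as np

def harmonic_basis(f0, K, Ls, fs):
    """A(omega): DC column plus cos/sin columns of the first K harmonics."""
    t = np.arange(Ls) / fs
    cols = [np.ones(Ls)]
    for k in range(1, K + 1):
        cols.append(np.cos(2 * np.pi * k * f0 * t))
        cols.append(np.sin(2 * np.pi * k * f0 * t))
    return np.column_stack(cols)

def make_subframes(y, Ls):
    """Y: Ls x J matrix of J overlapping subframes (hop of one sample)."""
    J = len(y) - Ls + 1
    return np.column_stack([y[j:j + Ls] for j in range(J)])

def neg_log_residual(Y, f0, K, fs):
    """-sum_j log G_j, with G = (1/J) Y^T (I - U_A U_A^T) Y."""
    Ls, J = Y.shape
    A = harmonic_basis(f0, K, Ls, fs)
    U, _, _ = np.linalg.svd(A, full_matrices=False)  # orthonormal basis U_A
    resid = Y - U @ (U.T @ Y)                        # (I - U_A U_A^T) Y
    G_diag = np.sum(resid ** 2, axis=0) / J          # diagonal of G(omega)
    return -np.sum(np.log(G_diag + 1e-12))

def scan_pitch_likelihood(y, fs, K=5, Ls=640, f_grid=None):
    """Evaluate the criterion over a pitch grid; choose Ls so 2K+1 < J < Ls."""
    if f_grid is None:
        f_grid = np.arange(60.0, 400.0, 1.0)         # typical speech pitch range
    Y = make_subframes(y, Ls)
    vals = np.array([neg_log_residual(Y, f0, K, fs) for f0 in f_grid])
    return f_grid, vals

if __name__ == "__main__":
    # Two synthetic voiced sources (120 Hz and 210 Hz) in white noise
    fs, N = 8000, 1024
    t = np.arange(N) / fs
    y = sum(np.cos(2 * np.pi * k * 120 * t) / k for k in range(1, 6))
    y = y + sum(np.cos(2 * np.pi * k * 210 * t) / k for k in range(1, 6))
    y = y + 0.05 * np.random.randn(N)
    f_grid, vals = scan_pitch_likelihood(y, fs)
    print("strongest pitch candidate near", f_grid[np.argmax(vals)], "Hz")
```

In the two-source case the scan is expected to show peaks near both pitches, mirroring the exact-ML / locally-ML picture on the Multiple Source Models slide and the two-source experiment below.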
Experiments - Single Source

Experiments - Two Sources
- Two voiced sources.
- [Figure: normalized log-likelihood versus frequency, 150-350 Hz.]

Experiments - Voicing Decision

Conclusions
- ML pitch estimation for single and multiple sources has been developed under the harmonic model for voiced frames.
- The derived likelihood functions under the two models allow implementation of the Viterbi algorithm for MAP pitch tracking.
- The GLRT for the voicing decision is derived under the two models.
- Future work:
  - development of multiple-hypothesis tracking methods for single-microphone BSS
  - adaptive estimation of the number of harmonics
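As noted in the conclusions, the per-frame likelihood functions feed directly into Viterbi-based MAP tracking. The sketch below shows one way to do this, assuming a per-frame log-likelihood matrix over a common pitch grid (for instance built from scan_pitch_likelihood above) and a Gaussian random-walk transition prior; the transition model, its width sigma, and the name viterbi_pitch_track are illustrative assumptions, not details from the talk.

```python
# Minimal sketch of MAP pitch tracking with the Viterbi algorithm over a
# fixed pitch grid. The Gaussian transition prior is an assumed stand-in for
# the first-order Markov model f(omega_m | omega_{m-1}) used in the talk.
import numpy as np

def viterbi_pitch_track(loglik, f_grid, sigma=10.0):
    """loglik: (M frames x F grid points) per-frame log-likelihoods."""
    M, F = loglik.shape
    # Log transition prior: penalize large frame-to-frame pitch jumps
    d = f_grid[None, :] - f_grid[:, None]
    log_trans = -0.5 * (d / sigma) ** 2
    log_trans -= np.log(np.sum(np.exp(log_trans), axis=1, keepdims=True))

    delta = np.empty((M, F))           # best partial-path score per state
    psi = np.zeros((M, F), dtype=int)  # back-pointers
    delta[0] = loglik[0]               # uniform prior over the first frame
    for m in range(1, M):
        cand = delta[m - 1][:, None] + log_trans   # score of i -> j moves
        psi[m] = np.argmax(cand, axis=0)
        delta[m] = cand[psi[m], np.arange(F)] + loglik[m]

    # Backtrack the MAP pitch trajectory
    path = np.empty(M, dtype=int)
    path[-1] = int(np.argmax(delta[-1]))
    for m in range(M - 2, -1, -1):
        path[m] = psi[m + 1, path[m + 1]]
    return f_grid[path]
```

In use, loglik would be assembled by stacking the values returned by the likelihood scan for consecutive frames; frames flagged as unvoiced by the GLRT would be excluded or bridged in a fuller implementation.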