Organizing Gaussian mixture models into a tree for
scaling up speaker retrieval
Jamal Rougui, Marc Gelgon, D. Aboutajdine, Noureddine Mouaddib, M.
Rziza
To cite this version:
Jamal Rougui, Marc Gelgon, D. Aboutajdine, Noureddine Mouaddib, M. Rziza. Organizing
Gaussian mixture models into a tree for scaling up speaker retrieval. Pattern Recognition
Letters, Elsevier, 2007, 28 (11), pp.1314-1319.
HAL Id: hal-00416675
https://hal.archives-ouvertes.fr/hal-00416675
Submitted on 29 Oct 2014
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
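The database holds $M$ speaker models $S_1, \dots, S_M$; the aim is a search cost below the $O(M)$ of an exhaustive evaluation of all $M$ models.

Each speaker $k$ is modelled by a Gaussian mixture

$$S_k(x) = \sum_{i=1}^{m_k} w_k^i \, N_k^i(x),$$

where $N_k^i(x)$ is the $i$-th Gaussian component, with mean $\mu_k^i$ and covariance $\Sigma_k^i$, $w_k^i$ is its weight, and $D$ denotes the data used in the likelihoods $p(D|S_k)$.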
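The mixture density above can be evaluated directly from its definition. The following is a minimal sketch, assuming diagonal covariances (a simplification made here so that only the standard library is needed); the function names and the toy parameter values are illustrative and do not come from the paper.

```python
import math


def gaussian_pdf_diag(x, mean, var):
    """Density at x of a Gaussian with diagonal covariance (assumption)."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return math.exp(log_p)


def gmm_pdf(x, weights, means, variances):
    """S_k(x) = sum_i w_k^i N_k^i(x) for one mixture."""
    return sum(
        w * gaussian_pdf_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Toy one-dimensional mixture with two identical components (illustrative values).
weights = [0.5, 0.5]
means = [[0.0], [0.0]]
variances = [[1.0], [1.0]]
density_at_zero = gmm_pdf([0.0], weights, means, variances)
```

Because both toy components are standard normals, `density_at_zero` coincides with the standard normal density at the origin, which gives a quick sanity check of the weighting.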
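Two of the $M$ models, $S_1$ and $S_2$, are summarised by a common parent model $S_{12}$. The likelihoods involved are $p(D|S_1)$, $p(D|S_2)$ and $p(D|S_{12})$. The parent $S_{12}$ has $m_{12}$ components, whereas $S_1$ and $S_2$ together have $m_1 + m_2$.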
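The loss in expected log-likelihood incurred by replacing $S_k$ with $S_{12}$ is

$$E_{S_k}[\,\ln p(D|S_k)\,] - E_{S_k}[\,\ln p(D|S_{12})\,], \quad k = 1, 2,$$

so the parent model is sought as

$$\hat{S}_{12} = \arg\min_{S_{12}} \left[ -\int S_1(x) \ln S_{12}(x)\,dx - \int S_2(x) \ln S_{12}(x)\,dx \right].$$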
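The same minimiser is obtained from the divergence $KL(S_{1+2} \| S_{12})$, where

$$S_{1+2}(x) = \frac{1}{2}\left(S_1(x) + S_2(x)\right),$$

$$\hat{S}_{12} = \arg\min_{S_{12}} \left[ -\int S_{1+2}(x) \ln \frac{S_{12}(x)}{S_{1+2}(x)}\,dx \right] = \arg\min_{S_{12}} \left[ -\sum_{i=1}^{m_1+m_2} w_{1+2}^i \int N_{1+2}^i(x) \ln S_{12}(x)\,dx \right],$$

and $w_{1+2}^i$, $N_{1+2}^i$ are the weights and Gaussian components of $S_{1+2}$.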
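The step from the two cross-entropy terms to a single divergence can be spelled out as follows. This is a sketch of the standard argument, assuming $S_{1+2}$ is the equal-weight average of $S_1$ and $S_2$; the entropy symbol $H(\cdot)$ is notation introduced here for illustration.

```latex
\begin{align*}
-\int S_1(x)\ln S_{12}(x)\,dx - \int S_2(x)\ln S_{12}(x)\,dx
  &= -2\int S_{1+2}(x)\ln S_{12}(x)\,dx \\
  &= 2\,KL(S_{1+2}\,\|\,S_{12}) + 2\,H(S_{1+2}),
\qquad H(S_{1+2}) = -\int S_{1+2}(x)\ln S_{1+2}(x)\,dx .
\end{align*}
```

Since $H(S_{1+2})$ does not depend on $S_{12}$, minimising the left-hand side over $S_{12}$ and minimising $KL(S_{1+2} \| S_{12})$ select the same model.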
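Since $KL$ between two mixtures has no closed form, it is replaced by the approximation $KL_m$:

$$\hat{S}_{12} = \arg\min_{S_{12}} \left[ KL_m(S_{1+2} \| S_{12}) \right] = \arg\min_{S_{12}} \left[ \sum_{i=1}^{m_1+m_2} w_{1+2}^i \min_{j=1}^{m_{12}} KL(N_{1+2}^i \| N_{12}^j) \right],$$

$$KL_m(S_k \| S_{12}) = \sum_{i=1}^{m_k} w_k^i \min_{j=1}^{m_{12}} KL(N_k^i \| N_{12}^j), \quad k = 1, 2.$$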
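The Kullback-Leibler divergence between two Gaussians with parameters $(\mu_1, \Sigma_1)$ and $(\mu_2, \Sigma_2)$ is

$$\frac{1}{2}\left( \log\frac{|\Sigma_2|}{|\Sigma_1|} + Tr(\Sigma_2^{-1}\Sigma_1) + (\mu_1 - \mu_2)^T \Sigma_2^{-1} (\mu_1 - \mu_2) - \delta \right),$$

where $\delta$ is the dimension of the feature space.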
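The closed-form divergence between two Gaussians and the mixture-level approximation $KL_m$ can be sketched together. The code below assumes diagonal covariances, for which the determinant, trace and quadratic terms reduce to sums over dimensions; the representation of a mixture as a list of `(weight, mean, variances)` tuples and the function names are choices made for this illustration only.

```python
import math


def kl_gauss_diag(mu1, var1, mu2, var2):
    """KL(N1 || N2) for Gaussians with diagonal covariances (assumption)."""
    delta = len(mu1)  # feature dimension
    log_det_ratio = sum(math.log(b / a) for a, b in zip(var1, var2))
    trace = sum(a / b for a, b in zip(var1, var2))
    maha = sum((m1 - m2) ** 2 / b for m1, m2, b in zip(mu1, mu2, var2))
    return 0.5 * (log_det_ratio + trace + maha - delta)


def kl_m(src, dst):
    """KL_m(src || dst): each source component is charged the divergence
    to its closest target component, weighted by the source weight."""
    return sum(
        w * min(kl_gauss_diag(mu, var, mu_j, var_j) for _, mu_j, var_j in dst)
        for w, mu, var in src
    )


# Toy one-dimensional mixtures (illustrative values only).
s_a = [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])]
s_b = [(1.0, [1.0], [1.0])]
kl_ab = kl_m(s_a, s_b)
kl_ba = kl_m(s_b, s_a)
```

On this toy pair the two directions differ, which illustrates that $KL_m$, like $KL$, is not symmetric.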
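The $m_1 + m_2$ components of $S_{1+2}$ are assigned to the $m_{12}$ $(< m_1 + m_2)$ components of $S_{12}$ by a mapping $\pi$. Finding $S_{12}$ therefore amounts to minimising $KL_m$ between $S_{1+2}$ and $S_{12}$ jointly over the model $S$ and the mapping, starting from an initial mapping $\pi_0$.

Dissimilarities between models are collected in an $M \times M$ matrix, whose entry for two models $S_1$ and $S_2$ is the symmetrised quantity

$$KL_m(S_1 \| S_2) + KL_m(S_2 \| S_1).$$

A balanced binary tree over $M$ models has about $\log_2(M)$ levels. Each parent $S_{12}$ is obtained from $S_{1+2}$ by minimising $KL_m$, starting from the initial mapping $\hat{\pi}^0$.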
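One plausible use of the symmetrised dissimilarity is to pick, among all pairs of models, the pair that should share a parent. The sketch below assumes that the closest pair is selected; the surviving text does not spell out this step, and `closest_pair` and `toy_divergence` are hypothetical names. The divergence is passed in as a function, so any implementation of $KL_m$ can be plugged in.

```python
def closest_pair(models, divergence):
    """Return (i, j, d) for the pair with the smallest symmetrised divergence
    d = divergence(a, b) + divergence(b, a)."""
    best = None
    for i in range(len(models)):
        for j in range(i + 1, len(models)):
            d = divergence(models[i], models[j]) + divergence(models[j], models[i])
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


def toy_divergence(a, b):
    """Stand-in divergence on scalars, for illustration only (not KL_m)."""
    return (a - b) ** 2


pair = closest_pair([0.0, 5.0, 5.5, 9.0], toy_divergence)
```

Scanning all pairs costs a number of divergence evaluations quadratic in the number of models, which is acceptable for an index built offline.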
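At iteration $it$ (starting from $it = 0$), the model $S_{12}$ is updated for the current mapping $\hat{\pi}^{it}$:

$$\hat{S}_{12}^{it} = \arg\min_{S_{12} \in \mathcal{S}_{m_{12}}} KL_m(S_{1+2}, S_{12}, \hat{\pi}^{it}),$$

where $\mathcal{S}_{m_{12}}$ is the set of mixtures with $m_{12}$ components.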
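The parameters of each component $j$ of $\hat{S}_{12}$ are

$$\hat{w}_{12}^j = \sum_{i \in \pi^{-1}(j)} w_{1+2}^i,$$

$$\hat{\mu}_{12}^j = \frac{\sum_{i \in \pi^{-1}(j)} w_{1+2}^i \, \mu_{1+2}^i}{\hat{w}_{12}^j},$$

$$\hat{\Sigma}_{12}^j = \frac{\sum_{i \in \pi^{-1}(j)} w_{1+2}^i \left( \Sigma_{1+2}^i + (\mu_{1+2}^i - \hat{\mu}_{12}^j)(\mu_{1+2}^i - \hat{\mu}_{12}^j)^T \right)}{\hat{w}_{12}^j},$$

where $\pi^{-1}(j)$, taken at the current iteration as $\hat{\pi}^{-1,it}(j)$, is the set of components of $S_{1+2}$ mapped to component $j$ of $S_{12}$.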
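The mapping from the components $\{1, \dots, m_1 + m_2\}$ of $S_{1+2}$ to the components $\{1, \dots, m_{12}\}$ of $\hat{S}_{12}^{it}$ is then updated:

$$\hat{\pi}^{it+1} = \arg\min_{\pi} KL_m(S_{1+2}, \hat{S}_{12}^{it}, \pi),$$

that is, each component $i$ of $S_{1+2}$ is assigned to the closest component $j$ of $\hat{S}_{12}^{it}$:

$$\pi^{it+1}(i) = \arg\min_{j} KL(N_{1+2}^i \| N_{12}^j).$$

The iterations stop when $\pi^{it+1} = \pi^{it}$.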
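The alternation between updating the parent model and updating the mapping can be sketched as a short loop. This is an illustration under stated assumptions, not the authors' implementation: covariances are diagonal (only the diagonal of the moment-matched covariance is kept), a mixture is a list of `(weight, mean, variances)` tuples, and the names `refit`, `regroup` and `merge` as well as the `max_iter` safeguard are introduced here.

```python
import math


def kl_gauss_diag(mu1, var1, mu2, var2):
    """KL(N1 || N2) for diagonal-covariance Gaussians (assumption)."""
    return 0.5 * sum(
        math.log(b / a) + a / b + (m1 - m2) ** 2 / b - 1.0
        for m1, a, m2, b in zip(mu1, var1, mu2, var2)
    )


def refit(source, mapping):
    """Moment-matched weight, mean and variances of each target component,
    computed from the source components mapped to it."""
    groups = {}
    for comp, j in zip(source, mapping):
        groups.setdefault(j, []).append(comp)
    target = []
    for j in sorted(groups):  # labels with no component produce no target
        group = groups[j]
        w = sum(wi for wi, _, _ in group)
        dim = len(group[0][1])
        mu = [sum(wi * mi[d] for wi, mi, _ in group) / w for d in range(dim)]
        var = [
            sum(wi * (vi[d] + (mi[d] - mu[d]) ** 2) for wi, mi, vi in group) / w
            for d in range(dim)
        ]
        target.append((w, mu, var))
    return target


def regroup(source, target):
    """Assign each source component to its closest target component."""
    return [
        min(range(len(target)),
            key=lambda j: kl_gauss_diag(mu, var, target[j][1], target[j][2]))
        for _, mu, var in source
    ]


def merge(source, initial_mapping, max_iter=100):
    """Alternate refit and regroup until the mapping no longer changes."""
    mapping = list(initial_mapping)
    for _ in range(max_iter):
        new_mapping = regroup(source, refit(source, mapping))
        if new_mapping == mapping:
            break
        mapping = new_mapping
    return refit(source, mapping), mapping


# Toy S_{1+2}: four unit-variance components in one dimension (illustrative).
s_sum = [(0.25, [0.0], [1.0]), (0.25, [10.0], [1.0]),
         (0.25, [1.0], [1.0]), (0.25, [11.0], [1.0])]
parent, final_mapping = merge(s_sum, [0, 1, 1, 1])
```

As with other alternating schemes of this kind, the result depends on the initial mapping: the loop stops at the first mapping that reproduces itself, which need not be the global minimiser.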
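[Figure: construction of the index. Panel labels: binary tree of the speaker GMMs built using the similarity criterion; binary tree to N-tree transformation, by tree level; grouping of the speaker GMMs according to the binary tree map.]

The quantity accumulated over parent and child models is the divergence $KL_m(S_{parent} \| S_{child})$:

$$\sum_{S_p \in \text{parents}} \; \sum_{S_c \in \dots} KL_m(S_p \| S_c).$$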
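For a parent model $S_p$ with children $\{S_1, S_2, \dots\}$, the likelihood of the parent is used in place of those of its children:

$$\log p(D|S_p) \approx \log p(D|S_c).$$

The child log-likelihoods $\log p(D|S_k)$ are approximated from $\log p(D|S_p)$ as

$$\log \tilde{p}(D|S_k) \approx \log p(D|S_p) + KL(S_p \| S_k), \quad k = 1, 2, \dots,$$

where the divergences $KL(S_p \| S_k)$ are evaluated with $KL_m$. With

$$MinERR = \min_k KL(S_p \| S_k),$$

the approximations $\log \tilde{p}(D|S_k)$ lie in the interval

$$[\,\log p(D|S_p) + MinERR,\ \log p(D|S_p) + MaxERR\,].$$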
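The approximation of the child scores from the parent score is simple arithmetic and can be sketched as follows. The definition of $MaxERR$ does not survive in the text; the code assumes it is the largest divergence over the children, by symmetry with $MinERR$. The function name and the numeric values are illustrative only.

```python
def approximate_child_scores(parent_loglik, child_divergences):
    """Approximate log-likelihoods of the children and the interval holding them.

    parent_loglik: log p(D|S_p)
    child_divergences: KL(S_p || S_k) for each child k (e.g. computed with KL_m)
    """
    approx = [parent_loglik + kl for kl in child_divergences]
    min_err = min(child_divergences)
    max_err = max(child_divergences)  # assumption: MaxERR is the largest divergence
    return approx, (parent_loglik + min_err, parent_loglik + max_err)


# Illustrative values: one parent score and three parent-to-child divergences.
approx, interval = approximate_child_scores(-120.0, [0.5, 2.0, 1.25])
```

The divergences depend only on the models, so they can be computed once when the tree is built; at query time only the parent likelihood has to be evaluated on the data.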