Conditional random fields

Neural networks
Conditional random fields - factor graph
MARKOV NETWORK VISUALIZATION
Topics: Markov network
• Illustration for K=5
[Figure: Markov network over the inputs x1, ..., x5 (observed) and the labels y1, ..., y5]
• Each edge is associated with one or more factors
  ‣ $\phi_f(\mathbf{y}, \mathbf{X}) = \phi(y_k, x_{k,i}) = \exp\left(W^{(L+1,0)}_{c,i}\, x_{k,i}\, 1_{y_k=c}\right)$, for $k = 1, \dots, K$
  ‣ $\phi_f(\mathbf{y}, \mathbf{X}) = \phi(y_k, x_{k-1,i}) = \exp\left(W^{(L+1,-1)}_{c,i}\, x_{k-1,i}\, 1_{y_k=c}\right)$, for $k = 2, \dots, K$
  ‣ $\phi_f(\mathbf{y}, \mathbf{X}) = \phi(y_k, x_{k+1,i}) = \exp\left(W^{(L+1,+1)}_{c,i}\, x_{k+1,i}\, 1_{y_k=c}\right)$, for $k = 1, \dots, K-1$
  ‣ $\phi_f(\mathbf{y}, \mathbf{X}) = \phi(y_k, y_{k+1}) = \exp\left(V_{c,c'}\, 1_{y_k=c}\, 1_{y_{k+1}=c'}\right)$, for $k = 1, \dots, K-1$
  ‣ $\phi_f(\mathbf{y}, \mathbf{X}) = \phi(y_k) = \exp\left(b^{(L+1)}_{c}\, 1_{y_k=c}\right)$, for $k = 1, \dots, K$
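The edge structure of the Markov network can be enumerated with a small sketch. It assumes (this is inferred from the factor definitions, not read off the figure) that each label y_k is linked to the inputs x_{k-1}, x_k, x_{k+1} where they exist, and to the next label y_{k+1}. The function name `markov_network_edges` is hypothetical.

```python
def markov_network_edges(K):
    """List the edges of the chain-structured Markov network for a length-K sequence.

    Assumed structure: y_k is linked to x_{k-1}, x_k, x_{k+1} (when they exist)
    and to y_{k+1}. Each edge carries one or more factors.
    """
    edges = []
    for k in range(1, K + 1):
        edges.append((f"y{k}", f"x{k}"))            # factors phi(y_k, x_{k,i})
        if k > 1:
            edges.append((f"y{k}", f"x{k - 1}"))    # factors phi(y_k, x_{k-1,i})
        if k < K:
            edges.append((f"y{k}", f"x{k + 1}"))    # factors phi(y_k, x_{k+1,i})
            edges.append((f"y{k}", f"y{k + 1}"))    # factors phi(y_k, y_{k+1})
    return edges


edges = markov_network_edges(5)
for edge in edges:
    print(edge)
```

Under these assumptions, K=5 gives 5 edges to the aligned input, 4 to the previous input, 4 to the next input and 4 label-to-label edges.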
• Representation is ambiguous:
  ‣ clique between $y_1$, $y_2$ and $x_2$, but we didn't explicitly define a joint factor $\phi(y_1, y_2, x_2)$
  ‣ in fact, we defined $\phi(y_1, y_2)$, $\phi(y_2, x_2)$ and $\phi(y_1, x_2)$, which is equivalent to having:
    $\phi(y_1, y_2, x_2) = \phi(y_1, y_2)\, \phi(y_2, x_2)\, \phi(y_1, x_2)$
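The equivalence between three pairwise factors and one joint factor over the clique can be checked numerically. The sketch below is illustrative only: the tables are made-up numbers, and x2 is treated as a binary variable purely so that every factor fits in a small table (in the slides it is an observed input).

```python
import itertools
import math

# Hypothetical pairwise factor tables over binary variables.
phi_y1_y2 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
phi_y2_x2 = {(0, 0): 1.5, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 1.8}
phi_y1_x2 = {(0, 0): 0.9, (0, 1): 1.2, (1, 0): 1.1, (1, 1): 0.8}

configs = list(itertools.product([0, 1], repeat=3))  # (y1, y2, x2)

# Single joint factor over the clique, defined as the product of the pairwise ones.
phi_joint = {
    (y1, y2, x2): phi_y1_y2[(y1, y2)] * phi_y2_x2[(y2, x2)] * phi_y1_x2[(y1, x2)]
    for y1, y2, x2 in configs
}


def pairwise_score(y1, y2, x2):
    """Unnormalized score using the three pairwise factors."""
    return phi_y1_y2[(y1, y2)] * phi_y2_x2[(y2, x2)] * phi_y1_x2[(y1, x2)]


def joint_score(y1, y2, x2):
    """Unnormalized score using the single joint factor."""
    return phi_joint[(y1, y2, x2)]


def conditional(score, x2):
    """p(y1, y2 | x2) obtained by normalizing an unnormalized score over (y1, y2)."""
    z = sum(score(a, b, x2) for a in (0, 1) for b in (0, 1))
    return {(a, b): score(a, b, x2) / z for a in (0, 1) for b in (0, 1)}


# Both parameterizations define the same conditional distribution.
same = all(
    math.isclose(conditional(pairwise_score, x2)[yy], conditional(joint_score, x2)[yy])
    for x2 in (0, 1)
    for yy in itertools.product([0, 1], repeat=2)
)
print(same)
```

The Markov network draws the same clique in both cases, which is why the graph alone does not say which parameterization was used.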
FACTOR GRAPH VISUALIZATION
Topics: factor graph
• Factor graphs better represent factors
[Figure: factor graph over the inputs x1, ..., x5 (observed) and the labels y1, ..., y5; the legend marks which nodes are observed and which are factors]
• This is less ambiguous:
[Figure: a Markov network shown alongside two factor graphs, labeled "Factor graph 1" and "Factor graph 2"]
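One way to see why a factor graph is less ambiguous is to store each factor with its scope and derive the Markov network from it. The sketch below is a hedged illustration: the two factor lists are hypothetical examples (they are not claimed to be the "Factor graph 1" and "Factor graph 2" of the figure), and `markov_edges` is a name introduced here.

```python
from itertools import combinations


def markov_edges(factor_scopes):
    """Edges of the Markov network implied by a factor graph.

    Two variables are connected whenever they appear together
    in the scope of at least one factor.
    """
    edges = set()
    for scope in factor_scopes:
        for u, v in combinations(sorted(scope), 2):
            edges.add((u, v))
    return edges


# Two hypothetical factor graphs over the same three variables.
one_joint_factor = [("y1", "y2", "x2")]
three_pairwise_factors = [("y1", "y2"), ("y2", "x2"), ("y1", "x2")]

# Different factor graphs, identical Markov network.
same_markov_network = (
    markov_edges(one_joint_factor) == markov_edges(three_pairwise_factors)
)
print(same_markov_network)
print(sorted(markov_edges(one_joint_factor)))
```

The factor lists differ while the derived edge sets coincide, so the factor graph keeps information that the Markov network discards.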
• $a_u(y_k) = a^{(L+1,0)}(\mathbf{x}_k)_{y_k} + 1_{k>1}\, a^{(L+1,-1)}(\mathbf{x}_{k-1})_{y_k} + 1_{k<K}\, a^{(L+1,+1)}(\mathbf{x}_{k+1})_{y_k}$
• $a_p(y_k, y_{k+1}) = 1_{1 \le k < K}\, V_{y_k, y_{k+1}}$
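The unary and pairwise log-factors $a_u$ and $a_p$, including their boundary indicators, can be sketched as follows. This is a minimal illustration under stated assumptions: the activations are given as plain nested lists indexed by input position, the numbers are made up, and the normalization $p(\mathbf{y}|\mathbf{X}) \propto \exp\left(\sum_k a_u(y_k) + \sum_k a_p(y_k, y_{k+1})\right)$ is assumed from the standard linear-chain CRF definition and computed by brute force, which is only feasible for tiny problems.

```python
import itertools
import math


def a_u(k, y, a0, a_prev, a_next, K):
    """Unary log-factor a_u(y_k); k is 1-based.

    a0[j][y], a_prev[j][y], a_next[j][y] hold a^(L+1,0), a^(L+1,-1), a^(L+1,+1)
    evaluated on input x_{j+1} (0-based list index j).
    """
    total = a0[k - 1][y]
    if k > 1:                       # indicator 1_{k>1}
        total += a_prev[k - 2][y]   # a^(L+1,-1)(x_{k-1})_{y_k}
    if k < K:                       # indicator 1_{k<K}
        total += a_next[k][y]       # a^(L+1,+1)(x_{k+1})_{y_k}
    return total


def a_p(k, y, y_next, V, K):
    """Pairwise log-factor a_p(y_k, y_{k+1}) = 1_{1<=k<K} V[y_k][y_{k+1}]."""
    return V[y][y_next] if 1 <= k < K else 0.0


def log_score(ys, a0, a_prev, a_next, V):
    """Sum of all unary and pairwise log-factors for a label sequence ys."""
    K = len(ys)
    s = sum(a_u(k, ys[k - 1], a0, a_prev, a_next, K) for k in range(1, K + 1))
    s += sum(a_p(k, ys[k - 1], ys[k], V, K) for k in range(1, K))
    return s


def conditional_prob(ys, a0, a_prev, a_next, V, C):
    """p(y|X) by brute-force normalization over all C**K label sequences."""
    K = len(ys)
    Z = sum(math.exp(log_score(cand, a0, a_prev, a_next, V))
            for cand in itertools.product(range(C), repeat=K))
    return math.exp(log_score(ys, a0, a_prev, a_next, V)) / Z


# Made-up activations for K=3 positions and C=2 classes.
K, C = 3, 2
a0 = [[0.2, -0.1], [0.0, 0.5], [0.3, 0.3]]
a_prev = [[0.1, 0.0], [-0.2, 0.4], [0.0, 0.0]]
a_next = [[0.0, 0.0], [0.3, -0.3], [0.1, 0.2]]
V = [[0.5, -0.5], [-0.5, 0.5]]

total = sum(conditional_prob(ys, a0, a_prev, a_next, V, C)
            for ys in itertools.product(range(C), repeat=K))
print(total)
```

The boundary checks mirror the indicators: the first position has no previous input, the last position has no next input, and there is no pairwise term starting at position K.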
• $\exp\left(a^{(L+1,0)}(\mathbf{x}_3)_{y_3}\right)\, \exp\left(a^{(L+1,-1)}(\mathbf{x}_2)_{y_3}\right)\, \exp\left(a^{(L+1,+1)}(\mathbf{x}_4)_{y_3}\right)\, \exp\left(V_{y_3,y_4}\right)$

Hugo Larochelle, Département d'informatique, Université de Sherbrooke
September 26, 2012
Abstract: Math for my slides “Training CRFs”.