Conditional random fields

Neural networks
Conditional random fields - linear chain CRF
CONDITIONAL RANDOM FIELD
Topics: sequence classification
• For a given example $(X, y)$:
  • $X = [x_1, \dots, x_K]$
  • $y = [y_1, \dots, y_K]$
• The labels are modelled jointly: $p(y|X) = p(y_1, \dots, y_K | x_1, \dots, x_K)$
• Training set: $\{(X^{(t)}, y^{(t)})\}$, with $X^{(t)} = [x_1^{(t)}, \dots, x_{K_t}^{(t)}]$ and $y^{(t)} = [y_1^{(t)}, \dots, y_{K_t}^{(t)}]$, where $K_t$ is the length of the $t$-th sequence
[Figure: a chain of inputs $x_{k-1}$, $x_k$, $x_{k+1}$, each mapped to its output pre-activation $a^{(L+1)}(x_{k-1})$, $a^{(L+1)}(x_k)$, $a^{(L+1)}(x_{k+1})$.]
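To make the notation for a sequence example $(X, y)$ concrete, here is a minimal sketch in plain Python. The toy values and the names `W`, `b`, and `output_preactivation` are assumptions for illustration only: a single linear layer stands in for whatever network produces $a^{(L+1)}(x_k)$.

```python
from itertools import product

# Hypothetical toy example: K = 3 positions, 2-dimensional inputs, C = 2 classes.
X = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]   # X = [x_1, ..., x_K]
y = [0, 1, 1]                                # y = [y_1, ..., y_K]

# Assumed stand-in for the network: a single linear output layer.
W = [[1.0, -0.5], [-1.0, 0.5]]   # one row of weights per class
b = [0.0, 0.1]                   # one bias per class


def output_preactivation(x):
    """Return a(x): one unnormalized score per class for the input x."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_c
            for row, b_c in zip(W, b)]


# The same function is applied at every position k of the sequence.
A = [output_preactivation(x_k) for x_k in X]

# p(y|X) is a distribution over whole label sequences: C**K candidates.
num_classes = len(b)
all_label_sequences = list(product(range(num_classes), repeat=len(X)))
```

Here `A[k]` plays the role of $a^{(L+1)}(x_k)$, and `all_label_sequences` lists every value of $y$ that $p(y|X)$ has to assign a probability to.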
LINEAR CHAIN CRF
Topics: lateral weights
• Regular classification:
$$p(y|X) = \prod_k p(y_k|x_k) = \prod_k \exp\left(a^{(L+1)}(x_k)_{y_k}\right) / Z(x_k) = \exp\left(\sum_{k=1}^{K} a^{(L+1)}(x_k)_{y_k}\right) \Big/ \prod_{k=1}^{K} Z(x_k)$$
• Sequence classification with linear chain:
$$p(y|X) = \exp\left(\sum_{k=1}^{K} a^{(L+1)}(x_k)_{y_k} + \sum_{k=1}^{K-1} V_{y_k, y_{k+1}}\right) \Big/ Z(X)$$
  • first sum: is $y_k$ likely given input $x_k$?
  • second sum: is $y_k$ followed by $y_{k+1}$ likely? ($V$ holds the lateral weights)
  • $Z(X)$: partition function
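The two formulas for $p(y|X)$ can be compared numerically on a tiny example. The sketch below is illustrative only: the scores in `A` and `V` are made-up values, the function names are hypothetical, and $Z(X)$ is taken to be the sum of the exponentiated score over all label sequences, computed here by brute-force enumeration.

```python
import math
from itertools import product


def regular_prob(A, y):
    """Regular classification: product over positions of softmax(a(x_k))[y_k]."""
    p = 1.0
    for a_k, y_k in zip(A, y):
        Z_k = sum(math.exp(s) for s in a_k)      # local normalizer Z(x_k)
        p *= math.exp(a_k[y_k]) / Z_k
    return p


def chain_score(A, V, y):
    """Unnormalized log-score: unary terms plus lateral (pairwise) terms."""
    unary = sum(a_k[y_k] for a_k, y_k in zip(A, y))
    lateral = sum(V[y[k]][y[k + 1]] for k in range(len(y) - 1))
    return unary + lateral


def chain_prob(A, V, y):
    """Linear chain CRF probability, with Z(X) computed by brute force."""
    C = len(A[0])
    Z = sum(math.exp(chain_score(A, V, y_alt))
            for y_alt in product(range(C), repeat=len(A)))
    return math.exp(chain_score(A, V, y)) / Z


# Hypothetical scores for K = 3 positions and C = 2 classes.
A = [[1.0, -0.9], [0.2, 0.4], [-0.5, 0.7]]   # A[k][c] stands for a(x_k)_c
V = [[0.8, -0.3], [-0.6, 1.1]]               # V[i][j]: label i followed by label j
V_zero = [[0.0, 0.0], [0.0, 0.0]]
y = [0, 1, 1]

p_regular = regular_prob(A, y)
p_chain = chain_prob(A, V, y)
p_chain_no_lateral = chain_prob(A, V_zero, y)
```

With `V_zero` the chain model reduces to the product of per-position softmaxes, so `p_chain_no_lateral` matches `p_regular` up to rounding. Brute-force enumeration visits $C^K$ label sequences, so it is only practical for very small examples like this one.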
[Figure: linear chain CRF. Inputs $x_{k-1}$, $x_k$, $x_{k+1}$ are mapped to $a^{(L+1)}(x_{k-1})$, $a^{(L+1)}(x_k)$, $a^{(L+1)}(x_{k+1})$; the diagram is labelled with the parameters $V$, $W$, $b$.]