Lecture Slides for
Introduction to Machine Learning 2e
ETHEM ALPAYDIN
© The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Introduction
 Modeling dependencies in input; no longer iid
 Sequences:
 Temporal: In speech, phonemes in a word (dictionary), words in a sentence (syntax, semantics of the language); in handwriting, pen movements
 Spatial: In a DNA sequence, base pairs
Discrete Markov Process
 N states: S1, S2, ..., SN
 First-order Markov
State at “time” t, qt = Si
P(qt+1=Sj | qt=Si, qt-1=Sk ,...) = P(qt+1=Sj | qt=Si)
 Transition probabilities
aij ≡ P(qt+1=Sj | qt=Si)
aij ≥ 0 and Σj=1N aij=1
 Initial probabilities
πi ≡ P(q1=Si)
Σi=1N πi=1
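As a small illustration (not part of the original slides; the function and variable names are made up), a first-order Markov chain defined by Π and A can be sampled as follows:

import numpy as np

def sample_markov_chain(pi, A, T, rng=None):
    """Sample q1..qT given initial probabilities pi and transition matrix A."""
    rng = np.random.default_rng() if rng is None else rng
    q = np.empty(T, dtype=int)
    q[0] = rng.choice(len(pi), p=pi)             # q1 ~ pi
    for t in range(1, T):
        q[t] = rng.choice(len(pi), p=A[q[t-1]])  # qt ~ row of A for the previous state
    return q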
Stochastic Automaton
P(O = Q | A, Π) = P(q1) ∏t=2T P(qt | qt-1) = πq1 aq1q2 ∙∙∙ aqT-1qT
Example: Balls and Urns
 Three urns each full of balls of one color
S1: red, S2: blue, S3: green
Π = [0.5, 0.2, 0.3]T
O = {S1, S1, S3, S3}
     [0.4 0.3 0.3]
A =  [0.2 0.6 0.2]
     [0.1 0.1 0.8]
P(O | A, Π) = P(S1) ∙ P(S1 | S1) ∙ P(S3 | S1) ∙ P(S3 | S3)
            = π1 ∙ a11 ∙ a13 ∙ a33
            = 0.5 ∙ 0.4 ∙ 0.3 ∙ 0.8 = 0.048
Balls and Urns: Learning
 Given K example sequences of length T
π̂i = (# sequences starting with Si) / (# sequences)
   = Σk 1(q1k = Si) / K
âij = (# transitions from Si to Sj) / (# transitions from Si)
   = Σk Σt=1T-1 1(qtk = Si and qt+1k = Sj) / Σk Σt=1T-1 1(qtk = Si)
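These counts translate directly into code; a short sketch assuming each sequence is a list of 0-indexed states (the names are illustrative, not from the slides):

import numpy as np

def estimate_pi_A(sequences, N):
    """ML estimates of pi and A from K fully observed state sequences."""
    pi = np.zeros(N)
    trans = np.zeros((N, N))
    for q in sequences:
        pi[q[0]] += 1                   # sequence starts in state q[0]
        for prev, cur in zip(q, q[1:]):
            trans[prev, cur] += 1       # count transition Si -> Sj
    pi /= len(sequences)
    # normalize each row; assumes every state occurs at least once as a source
    A = trans / trans.sum(axis=1, keepdims=True)
    return pi, A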
Hidden Markov Models
 States are not observable
 Discrete observations {v1,v2,...,vM} are recorded; a
probabilistic function of the state
 Emission probabilities
bj(m) ≡ P(Ot=vm | qt=Sj)
 Example: In each urn, there are balls of different colors,
but with different probabilities.
 For each observation sequence, there are multiple state
sequences
HMM Unfolded in Time
Elements of an HMM
 N: Number of states
 M: Number of observation symbols
 A = [aij]: N by N state transition probability matrix
 B = [bj(m)]: N by M observation probability matrix
 Π = [πi]: N by 1 initial state probability vector
λ = (A, B, Π), parameter set of HMM
Three Basic Problems of HMMs
1. Evaluation: Given λ, and O, calculate P (O | λ)
2. State sequence: Given λ, and O, find Q* such that
P (Q* | O, λ ) = maxQ P (Q | O , λ )
3. Learning: Given X={Ok}k, find λ* such that
P ( X | λ* )=maxλ P ( X | λ )
(Rabiner, 1989)
Evaluation
 Forward variable:
αt(i) ≡ P(O1∙∙∙Ot, qt = Si | λ)
Initialization:
α1(i) = πi bi(O1)
Recursion:
αt+1(j) = [Σi=1N αt(i) aij] bj(Ot+1)
P(O | λ) = Σi=1N αT(i)
 Backward variable:
βt(i) ≡ P(Ot+1∙∙∙OT | qt = Si, λ)
Initialization:
βT(i) = 1
Recursion:
βt(i) = Σj=1N aij bj(Ot+1) βt+1(j)
Finding the State Sequence
γt(i) ≡ P(qt = Si | O, λ) = αt(i) βt(i) / Σj=1N αt(j) βt(j)
Choose the state that has the highest probability for each time step:
qt* = arg maxi γt(i)
No! The individually most likely states need not form a valid path; consecutive choices may have aij = 0.
Viterbi’s Algorithm
δt(i) ≡ maxq1q2∙∙∙ qt-1 p(q1q2∙∙∙qt-1,qt =Si,O1∙∙∙Ot | λ)
 Initialization:
δ1(i) = πibi(O1), ψ1(i) = 0
 Recursion:
δt(j) = maxi δt-1(i)aijbj(Ot), ψt(j) = argmaxi δt-1(i)aij
 Termination:
p* = maxi δT(i), qT*= argmaxi δT (i)
 Path backtracking:
qt* = ψt+1(qt+1* ), t=T-1, T-2, ..., 1
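An illustrative sketch of Viterbi decoding under the same conventions as the forward/backward sketches above (0-indexed states and observations; not from the slides):

import numpy as np

def viterbi(O, pi, A, B):
    """Return the most probable state path q* and its probability p*."""
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                 # delta_1(i) = pi_i b_i(O_1)
    for t in range(1, T):
        scores = delta[t-1][:, None] * A       # delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)         # best predecessor for each j
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    q = np.zeros(T, dtype=int)
    q[-1] = delta[-1].argmax()                 # q_T* = argmax_i delta_T(i)
    for t in range(T - 2, -1, -1):
        q[t] = psi[t+1][q[t+1]]                # backtrack
    return q, delta[-1].max()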
Learning
ξt(i, j) ≡ P(qt = Si, qt+1 = Sj | O, λ)
ξt(i, j) = αt(i) aij bj(Ot+1) βt+1(j) / Σk Σl αt(k) akl bl(Ot+1) βt+1(l)
Baum-Welch algorithm (EM):
zti = 1 if qt = Si, 0 otherwise
ztij = 1 if qt = Si and qt+1 = Sj, 0 otherwise
Baum-Welch (EM)
E-step:
E[zti] = γt(i)
E[ztij] = ξt(i, j)
M-step:
π̂i = Σk γ1k(i) / K
âij = Σk Σt=1Tk-1 ξtk(i, j) / Σk Σt=1Tk-1 γtk(i)
b̂j(m) = Σk Σt=1Tk-1 γtk(j) 1(Otk = vm) / Σk Σt=1Tk-1 γtk(j)
Continuous Observations
 Discrete:
rtm = 1 if Ot = vm, 0 otherwise
P(Ot | qt = Sj, λ) = ∏m=1M bj(m)rtm
 Gaussian mixture (discretize using k-means):
P(Ot | qt = Sj, λ) = Σl=1L P(Gjl) p(Ot | qt = Sj, Gl, λ), with p(Ot | qt = Sj, Gl, λ) ~ N(μl, Σl)
 Continuous:
P(Ot | qt = Sj, λ) ~ N(μj, σj2)
Use EM to learn the parameters, e.g.,
μ̂j = Σt γt(j) Ot / Σt γt(j)
HMM with Input
 Input-dependent observations:
P(Ot | qt = Sj, xt, λ) ~ N(gj(xt | θj), σj2)
 Input-dependent transitions (Meila and Jordan, 1996; Bengio and Frasconi, 1996):
P(qt+1 = Sj | qt = Si, xt)
 Time-delay input:
xt = f(Ot-τ, ..., Ot-1)
Model Selection in HMM
 Left-to-right HMMs:
     [a11 a12 a13 0  ]
A =  [0   a22 a23 a24]
     [0   0   a33 a34]
     [0   0   0   a44]
 In classification, for each Ci, estimate P (O | λi) by a
separate HMM and use Bayes’ rule
P O | i P i 
P i | O  
 j P O |  j P  j 