CS621: Artificial Intelligence
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 38-39: Baum-Welch Algorithm; HMM Training

Baum-Welch algorithm
Training a Hidden Markov Model (not structure learning; the structure of the HMM is pre-given). This involves learning the probability values ONLY.
Correspondence with PCFG: we do not learn the production rules, only the probabilities associated with them.
The training algorithm for PCFGs is called the Inside-Outside algorithm.

Key Intuition
[Figure: a two-state HMM with states q and r; the transitions between and within them emit the symbols a and b.]
Given: a training sequence.
Initialization: initial probability values.
Compute: Pr(state seq | training seq); from it the expected count of each transition; from the counts the rule probabilities.
Approach: initialize the probabilities and recompute them iteratively... an EM-like approach.

Building blocks: Probabilities to be used
1. Forward probability: $\alpha_i(t) = P(W_{1,t-1}, S_t = s^i)$, $t \geq 1$.
   Initialization: $\alpha_i(1) = 1.0$ if $s^i$ is the start state, $0$ otherwise.
   $P(W_{1,n}) = \sum_{i=1}^{T} P(W_{1,n}, S_{n+1} = s^i) = \sum_{i=1}^{T} \alpha_i(n+1)$
   Recurrence: $\alpha_j(t+1) = \sum_{i=1}^{T} \alpha_i(t)\, P(s^i \xrightarrow{w_k} s^j)$, where $w_k = W_t$.
[Figure: trellis with states $S_1, S_2, \ldots, S_n, S_{n+1}$ and outputs $W_1, W_2, \ldots, W_n$ emitted on the transitions.]

Probabilities to be used, contd.
2. Backward probability: $\beta_i(t) = P(W_{t,n} \mid S_t = s^i)$, $t \geq 1$.
   $\beta_1(1) = P(W_{1,n} \mid S_1 = s^1) = P(W_{1,n})$
   $\beta_i(n+1) = P(\epsilon \mid S_{n+1} = s^i) = 1$
   Recurrence: $\beta_i(t) = \sum_{j=1}^{T} P(s^i \xrightarrow{w_k} s^j)\, \beta_j(t+1)$, where $w_k = W_t$.

Exercise 1: Prove that $P(W_{1,n}) = \sum_{j=1}^{T} \alpha_j(t)\, \beta_j(t)$ for any $t$.

Start of the Baum-Welch algorithm
[Figure: deterministic two-state HMM with transitions q -a-> r, q -b-> q, r -a-> q and r -b-> q.]
String = abb aaa bbb aaa
Sequence of states with respect to the input symbols:
  o/p seq:    a b b a a a b b b a a a
  state seq:  q r q q r q r q q q r q r

Calculating probabilities from the table
Table of counts:
  Src  Dest  O/P  Count
  q    r     a    5
  q    q     b    3
  r    q     a    3
  r    q     b    2

From the counts, $P(q \xrightarrow{a} r) = 5/8$ and $P(q \xrightarrow{b} q) = 3/8$. In general,
$P(s^i \xrightarrow{w_k} s^j) = \dfrac{c(s^i \xrightarrow{w_k} s^j)}{\sum_{l=1}^{T} \sum_{m=1}^{A} c(s^i \xrightarrow{w_m} s^l)}$,
where $T$ = number of states and $A$ = number of alphabet symbols.

Now, if the transitions are non-deterministic, multiple state sequences are possible for the given output sequence (refer to the previous slide's figure). Our aim is then to find the expected counts.

Interplay Between Two Equations
$P(s^i \xrightarrow{W_k} s^j) = \dfrac{C(s^i \xrightarrow{W_k} s^j)}{\sum_{l=1}^{T} \sum_{m=1}^{A} C(s^i \xrightarrow{W_m} s^l)}$
$C(s^i \xrightarrow{W_k} s^j) = \sum_{S_{1,n+1}} P(S_{1,n+1} \mid W_{1,n})\; n(s^i \xrightarrow{W_k} s^j, S_{1,n+1}, W_{1,n})$
where $n(s^i \xrightarrow{W_k} s^j, S_{1,n+1}, W_{1,n})$ is the number of times the transition $s^i \xrightarrow{W_k} s^j$ occurs in the state sequence $S_{1,n+1}$ for the string $W_{1,n}$.

Learning probabilities
[Figure: actual (desired) HMM over states q and r with arc probabilities q -a-> q: 0.67, q -b-> q: 0.17, q -a-> r: 0.16, r -b-> q: 1.0; initial guess HMM with q -a-> q: 0.48, q -b-> q: 0.48, q -a-> r: 0.04, r -b-> q: 1.0, consistent with the path probabilities below.]

One run of the Baum-Welch algorithm on the string "ababa":

  State seq          P(path)    q -a-> r   r -b-> q   q -b-> q   q -a-> q
  q r q r q q        0.00077    0.00154    0.00154    0          0.00077
  q r q q q q        0.00442    0.00442    0.00442    0.00442    0.00884
  q q q r q q        0.00442    0.00442    0.00442    0.00442    0.00884
  q q q q q q        0.02548    0.0        0.0        0.05096    0.07644
  Rounded total      0.035      0.01       0.01       0.06       0.095
  New probabilities             0.06       1.0        0.36       0.58

Each transition cell is the expected count contributed by that state sequence, i.e. (number of times the transition occurs in the sequence) x P(path). For example, the new probability P(q -a-> r) = 0.01 / (0.01 + 0.06 + 0.095) ≈ 0.06, and P(r -b-> q) = 1.0 because it is the only transition out of r.
* is considered as the starting and ending symbol of the input sequence string.
Through multiple such iterations the probability values converge.

Applying Naïve Bayes
$P(S_{1,n+1} \mid W_{1,n}) = \dfrac{P(S_{1,n+1}, W_{1,n})}{P(W_{1,n})} = \dfrac{1}{P(W_{1,n})}\, P(S_1)\, P(S_2 W_1 \mid S_1)\, P(S_3 W_2 \mid S_1 S_2 W_1)\, P(S_4 W_3 \mid S_1 S_2 S_3 W_1 W_2) \cdots = \dfrac{P(S_1)}{P(W_{1,n})} \prod_{i=1}^{n} P(S_{i+1} W_i \mid S_i)$
by the Markov assumption. Hence multiplying the transition probabilities is valid.

Discussions
1. Symmetry breaking: symmetric initial values can lead to no change at all under re-estimation.
   [Figure: desired three-state HMM with arcs a:0.5, b:1.0, a:1.0, b:0.5; symmetric initial HMM with arcs a:0.25, b:0.25, a:0.5, b:0.5 repeated across the three states.]
2. Getting stuck in local maxima.
3. Label bias problem: probabilities have to sum to 1, so some values can rise only at the cost of a fall in the others.
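The forward and backward recurrences above translate directly into code. Below is a minimal sketch (not from the lecture) of an arc-emission HMM, i.e. probabilities of the form P(s_i --w--> s_j) as in the slides, that computes alpha and beta for the initial-guess model and checks the identity of Exercise 1. The dictionary representation, the function names forward/backward, and the choice of q as start state are illustrative assumptions; the sketch also ignores the '*' boundary marker, so its total probability differs slightly from the table above.

```python
# A minimal sketch (assumed code, not from the lecture) of forward/backward
# probabilities for an arc-emission HMM, using the "initial guess" model:
#   q -a-> q: 0.48,  q -b-> q: 0.48,  q -a-> r: 0.04,  r -b-> q: 1.0

# trans[(src, symbol, dst)] = P(src --symbol--> dst)
trans = {
    ('q', 'a', 'q'): 0.48,
    ('q', 'b', 'q'): 0.48,
    ('q', 'a', 'r'): 0.04,
    ('r', 'b', 'q'): 1.00,
}
states = ['q', 'r']

def forward(word, start='q'):
    """alpha[t][s] ~ P(W_1..W_t, S_{t+1} = s); alpha[0] encodes the start state."""
    n = len(word)
    alpha = [{s: 0.0 for s in states} for _ in range(n + 1)]
    alpha[0][start] = 1.0                              # alpha_i(1) = 1 for the start state
    for t in range(n):                                 # the transition at position t emits word[t]
        for i in states:
            for j in states:
                alpha[t + 1][j] += alpha[t][i] * trans.get((i, word[t], j), 0.0)
    return alpha

def backward(word):
    """beta[t][s] ~ P(W_{t+1}..W_n | S_{t+1} = s); beta[n] is 1 for every state."""
    n = len(word)
    beta = [{s: 0.0 for s in states} for _ in range(n + 1)]
    for s in states:
        beta[n][s] = 1.0                               # beta_i(n+1) = 1
    for t in range(n - 1, -1, -1):
        for i in states:
            for j in states:
                beta[t][i] += trans.get((i, word[t], j), 0.0) * beta[t + 1][j]
    return beta

word = "ababa"
alpha, beta = forward(word), backward(word)
# Exercise 1: P(W_1..n) = sum_j alpha_j(t) * beta_j(t), the same value for every t
# (about 0.038 here; the slide's table restricts paths via the '*' marker, giving ~0.035).
for t in range(len(word) + 1):
    print(t, sum(alpha[t][s] * beta[t][s] for s in states))
```

Printing the sum at every position t is a direct numerical check of Exercise 1: the value is constant across t because each product alpha*beta is the probability of the whole string jointly with being in a particular state at that position.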
Computational part
Starting from the initialized probabilities, the expected count is computed as
$C(s^i \xrightarrow{W_k} s^j) = \dfrac{1}{P(W_{1,n})} \sum_{S_{1,n+1}} \Big[ P(S_{1,n+1}, W_{1,n})\; n(s^i \xrightarrow{W_k} s^j, S_{1,n+1}, W_{1,n}) \Big]$
and the sum over state sequences reduces to a sum over string positions:
$\sum_{S_{1,n+1}} P(S_{1,n+1}, W_{1,n})\; n(s^i \xrightarrow{W_k} s^j, S_{1,n+1}, W_{1,n}) = \sum_{t=1}^{n} P(S_t = s^i,\, S_{t+1} = s^j,\, W_t = w_k,\, W_{1,n}) = \sum_{t:\, W_t = w_k} \alpha_i(t)\, P(s^i \xrightarrow{w_k} s^j)\, \beta_j(t+1)$
(the terms with $W_t \ne w_k$ are zero).

Exercise 2: What is the complexity of calculating the above expression?
Hint: first solve Exercise 1, i.e., understand how the probability of a given string can be represented as $\sum_{i=1}^{T} \alpha_i(t)\, \beta_i(t)$.
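The expected-count formula above is the E-step of Baum-Welch, and dividing by the totals per source state is the M-step. Below is a sketch (not from the lecture) of one such iteration; it reuses trans, states, forward() and backward() from the previous sketch, and the helper names expected_counts/reestimate are illustrative assumptions. As before, the '*' boundary marker is ignored, so the numbers only approximate the slide's table.

```python
# A sketch (assumed code) of the computational part: expected transition counts
# from alpha/beta, then re-normalisation per source state (one EM step).

def expected_counts(word, start='q'):
    """C(s_i --w_k--> s_j) = (1/P(W)) * sum_t alpha_i(t) P(s_i --w_k--> s_j) beta_j(t+1)."""
    alpha, beta = forward(word, start), backward(word)
    p_w = sum(alpha[len(word)][s] for s in states)       # P(W_1..n)
    counts = {arc: 0.0 for arc in trans}
    for t, w in enumerate(word):                         # only positions with W_t = w_k contribute
        for (i, k, j), p in trans.items():
            if k == w:
                counts[(i, k, j)] += alpha[t][i] * p * beta[t + 1][j] / p_w
    return counts

def reestimate(counts):
    """New P(s_i --w_k--> s_j) = C(s_i --w_k--> s_j) / sum of C over all arcs leaving s_i."""
    new = {}
    for (i, k, j), c in counts.items():
        denom = sum(c2 for (i2, _, _), c2 in counts.items() if i2 == i)
        new[(i, k, j)] = c / denom if denom > 0 else 0.0
    return new

print(reestimate(expected_counts("ababa")))   # one step moves the initial guess in the
                                              # direction of the slide's new probabilities
```

Note that once alpha and beta are available, each expected count is just a single sum over the positions of the string, which is what Exercise 2 is pointing at.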
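The slide's closing remark, that the probability values converge over multiple iterations, can be illustrated by wrapping the E-step and M-step in a loop. This is a sketch reusing expected_counts() and reestimate() from the previous block; the training string, the iteration cap, the tolerance, and the use of the module-level trans table are illustrative choices, not from the lecture.

```python
# A sketch (assumed code) of iterating re-estimation until the values stop changing.

def baum_welch(word, iters=50, tol=1e-6):
    global trans                                   # the earlier sketches read this module-level table
    for _ in range(iters):
        new = reestimate(expected_counts(word))
        delta = max(abs(new[arc] - trans[arc]) for arc in trans)
        trans = new
        if delta < tol:                            # converged: probabilities no longer change
            break
    return trans

print(baum_welch("ababa"))
```

As the Discussions slide warns, the point this converges to is only a local maximum of the likelihood for the given training string, and symmetric initializations may not move at all.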