CS626/449: Natural Language Processing, Speech and the Web / Topics in AI
Lecture 31: POS Tagging
(discussion to assist the CMU pronunciation dictionary assignment)
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
Lexicon
• Example: ^_ Some_ People_ Jump_ High_ ._

  Lexicon/Dictionary   Lexical Tag                            Example
  ------------------   ------------------------------------   ------------------------------------------------
  Some                 A (Adjective) / {Quantifier}           --
  People               N (Noun); V (Verb)                     "lot of people"; "peopled the city with soldiers"
  Jump                 V (Verb); N (Noun)                     "he jumped high"; "This was a good jump"
  High                 R (Adverb); A (Adjective); N (Noun)    "He jumped high"; "high mountain"; "Bombay high"; "on a high"
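A lexicon like this is naturally stored as a map from each word to the set of tags it can take. Below is a minimal Python sketch; the words and tag abbreviations are the ones from the table above, while the fallback for unknown words is an added assumption:

```python
# Possible POS tags for each word, as listed in the lexicon above.
lexicon = {
    "some":   {"A"},             # quantifier/adjective
    "people": {"N", "V"},        # "lot of people" / "peopled the city with soldiers"
    "jump":   {"V", "N"},        # "he jumped high" / "this was a good jump"
    "high":   {"R", "A", "N"},   # "he jumped high" / "high mountain" / "Bombay high"
}

def possible_tags(word):
    """Tags a word may take; unknown words stay open to all tags (an assumed fallback)."""
    return lexicon.get(word.lower(), {"N", "V", "A", "R"})
```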
Bigram Assumption
Best tag sequence
  T* = argmax_T P(T|W)
     = argmax_T P(T) P(W|T)                                    (by Bayes' Theorem)

P(T) = P(t_0 = ^, t_1, t_2, ..., t_n, t_{n+1} = .)
     = P(t_0) P(t_1|t_0) P(t_2|t_1, t_0) P(t_3|t_2, t_1, t_0) ... P(t_n|t_{n-1}, ..., t_0) P(t_{n+1}|t_n, ..., t_0)
     ≈ P(t_0) P(t_1|t_0) P(t_2|t_1) ... P(t_n|t_{n-1}) P(t_{n+1}|t_n)          (Bigram Assumption)
     = ∏_{i=0}^{n+1} P(t_i|t_{i-1}),  with the convention P(t_0|t_{-1}) = P(t_0 = ^) = 1
Lexical Probability Assumption
P(W|T) = P(w_0|t_0 ... t_{n+1}) P(w_1|w_0, t_0 ... t_{n+1}) P(w_2|w_1, w_0, t_0 ... t_{n+1}) ...
         P(w_n|w_0 ... w_{n-1}, t_0 ... t_{n+1}) P(w_{n+1}|w_0 ... w_n, t_0 ... t_{n+1})

Assumption: a word is determined completely by its tag. This is inspired by speech recognition.

P(W|T) ≈ P(w_0|t_0) P(w_1|t_1) ... P(w_{n+1}|t_{n+1})
       = ∏_{i=0}^{n+1} P(w_i|t_i)                              (Lexical Probability Assumption)

Thus,
  T* = argmax_T P(T) P(W|T) = argmax_T ∏_{i=0}^{n+1} P(t_i|t_{i-1}) P(w_i|t_i)
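A minimal Python sketch of this combined score for one candidate tag sequence; the function and parameter names are illustrative, and bigram_p and lexical_p would be filled from tables like the ones on the next slides:

```python
def sequence_score(words, tags, bigram_p, lexical_p):
    """P(T) * P(W|T) under the bigram and lexical-probability assumptions.

    words : [w_0 = '^', w_1, ..., w_n, w_{n+1} = '.']
    tags  : [t_0 = '^', t_1, ..., t_n, t_{n+1} = '.']
    bigram_p[(t_prev, t)] ~ P(t_i | t_{i-1});  lexical_p[(w, t)] ~ P(w_i | t_i).
    """
    score = 1.0
    # Bigram factors over consecutive tags.
    for i in range(1, len(tags)):
        score *= bigram_p.get((tags[i - 1], tags[i]), 0.0)
    # Lexical factors; the sentinels ^ and . emit themselves with probability 1.
    for w, t in zip(words, tags):
        score *= 1.0 if w in ("^", ".") else lexical_p.get((w, t), 0.0)
    return score
```

T* is then the candidate tag sequence maximizing this score; for sentences of any real length the maximization is done with dynamic programming (Viterbi) rather than by scoring every sequence.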
Generative Model
^_^ People_N Jump_V High_R ._.

[Figure: a trellis with the candidate tag states N, V, A beneath each word, bounded by ^ and .; arcs between successive tag states carry the bigram probabilities, and arcs from a tag state down to its word carry the lexical probabilities.]
This model is called a generative model: the words are observed as emissions from the tags, which act as states. This is similar to an HMM.
Bigram probabilities
• Values in each cell are P(column heading | row heading), i.e. P(t_i | t_{i-1}):

        N     V     A
  N    0.2   0.7   0.1
  V    0.6   0.2   0.2
  A    0.5   0.2   0.3
Lexical Probability
• Values in each cell are P(column heading | row heading), i.e. P(word | tag):

        people       jump          high
  N     10^-5        0.4 × 10^-3   10^-7
  V     10^-7        10^-2         10^-7
  A     0            0             10^-1
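A minimal sketch that plugs in the two tables above and enumerates every tag sequence for "people jump high". The tables give no probabilities for the sentinel transitions from ^ and into ., so those factors are left out here, which is a simplification:

```python
from itertools import product

TAGS = ["N", "V", "A"]

# Bigram table above: bigram_p[(row, column)] = P(column | row).
bigram_p = {
    ("N", "N"): 0.2, ("N", "V"): 0.7, ("N", "A"): 0.1,
    ("V", "N"): 0.6, ("V", "V"): 0.2, ("V", "A"): 0.2,
    ("A", "N"): 0.5, ("A", "V"): 0.2, ("A", "A"): 0.3,
}

# Lexical table above: lexical_p[(word, tag)] = P(word | tag).
lexical_p = {
    ("people", "N"): 1e-5,   ("people", "V"): 1e-7, ("people", "A"): 0.0,
    ("jump",   "N"): 0.4e-3, ("jump",   "V"): 1e-2, ("jump",   "A"): 0.0,
    ("high",   "N"): 1e-7,   ("high",   "V"): 1e-7, ("high",   "A"): 0.1,
}

words = ["people", "jump", "high"]

def score(tags):
    """Product of lexical probabilities and bigram transitions between the chosen tags."""
    s = 1.0
    for i, (w, t) in enumerate(zip(words, tags)):
        s *= lexical_p[(w, t)]
        if i > 0:
            s *= bigram_p[(tags[i - 1], t)]
    return s

best = max(product(TAGS, repeat=len(words)), key=score)
print(best, score(best))   # ('N', 'V', 'A') 1.4e-09
```

With these numbers the winner is People_N Jump_V High_A, with score 1.4 × 10^-9. Exhaustive enumeration is only feasible for toy sentences; in practice the same argmax is computed with the Viterbi algorithm.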
Calculation from actual data
• Corpus
  – ^ Ram got many NLP books. He found them all very interesting.
• POS tagged
  – ^ N V A N N . ^ N V N A R A .
Recording numbers
Counts of tag bigrams, count(row → column):

        ^    N    V    A    R    .
  ^     0    2    0    0    0    0
  N     0    1    2    1    0    1
  V     0    1    0    1    0    0
  A     0    1    0    0    1    1
  R     0    0    0    1    0    0
  .     1    0    0    0    0    0
Probabilities
P(column | row) = count(row → column) / count(row):

        ^    N     V     A     R     .
  ^     0    1     0     0     0     0
  N     0    1/5   2/5   1/5   0     1/5
  V     0    1/2   0     1/2   0     0
  A     0    1/3   0     0     1/3   1/3
  R     0    0     0     1     0     0
  .     1    0     0     0     0     0
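The two tables above can be reproduced with a few lines of Python; a minimal sketch over the toy tagged corpus (maximum-likelihood estimates, no smoothing):

```python
from collections import Counter

# Tag sequence of the toy corpus: both sentences, each opened by ^ and closed by .
tags = "^ N V A N N . ^ N V N A R A .".split()

# Count tag bigrams t_{i-1} -> t_i  (the "Recording numbers" table).
bigram_counts = Counter(zip(tags, tags[1:]))

# Denominator: how often each tag occurs as the *previous* tag.
prev_counts = Counter(tags[:-1])

# Maximum-likelihood estimate P(t_i | t_{i-1}) = count(t_{i-1} -> t_i) / count(t_{i-1}).
bigram_p = {(prev, cur): n / prev_counts[prev] for (prev, cur), n in bigram_counts.items()}

for (prev, cur), p in sorted(bigram_p.items()):
    print(f"P({cur} | {prev}) = {p:.3f}")   # e.g. P(V | N) = 0.400, P(A | N) = 0.200
```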
Compare with the Pronunciation Dictionary Assignment

A few entries from the CMU pronunciation dictionary:

  Phoneme   Example   Translation
  -------   -------   -----------
  AE        at        AE T
  AH        hut       HH AH T
  AO        ought     AO T
  AW        cow       K AW
  AY        hide      HH AY D
  B         be        B IY

In POS tagging, the labels are already given on the words: the "alignment" of words with labels is already given. In the assignment, the most likely alignment has to be discovered first, followed by the best possible mapping.
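For the assignment, the first step is simply reading the dictionary. A minimal sketch, assuming the standard plain-text cmudict format (one entry per line, the word followed by its ARPAbet phonemes, ';;;' comment lines, '(2)'-style markers for alternate pronunciations) and an illustrative file name:

```python
def load_cmudict(path="cmudict-0.7b"):
    """Map each word to its list of pronunciations (each a list of phonemes)."""
    pronunciations = {}
    with open(path, encoding="latin-1") as f:    # the distributed file is not pure ASCII
        for line in f:
            if not line.strip() or line.startswith(";;;"):
                continue                          # skip blanks and comments
            word, *phones = line.split()
            word = word.split("(")[0]             # fold "WORD(2)" variants onto WORD
            pronunciations.setdefault(word, []).append(phones)
    return pronunciations

# e.g. load_cmudict()["HIDE"] gives something like [['HH', 'AY1', 'D']],
# corresponding to the "hide" row above.
```

The letter-to-phoneme alignment that the assignment asks for is then learned over these (word, phoneme sequence) pairs.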