PCFG Background Slides

General Information on Context-free and
Probabilistic Context-free Grammars
İbrahim Hoça
CENG784, Fall 2013
Outline
• Basics of Context-free Grammars (CFGs)
• Tree Structure
• Convenience of Tree Structures
• Natural Language Examples
• Probabilistic Context-free Grammars (PCFGs)
• PCFG Rules
• Computing the Probabilities in PCFGs
• Some Aspects of PCFGs
Basics of Context-free Grammars
A CFG is a quadruple (V, Σ, P, S) where
- V is a finite set of variables,
- Σ (the alphabet) is a finite set of terminal
symbols,
- P is a finite set of rules, and
- S is a distinguished element of V called the start
symbol.
Basics of Context-free Grammars
- A rule is an element of the set V × (V ∪ Σ)*.
- The rule [A, w] is written as A → w.
- Lambda (null) rules are also possible:
A → λ
- The rules are written using the shorthand
A → u | v
to abbreviate A → u and A → v, the vertical
bar being ‘or’.
Sample Derivations
G =(V, Σ, P, S)
V = {S,A}
Σ = {a,b}
P: S → AA
A → AAA | bA | Ab | a
S => AA
=> aA
=> aAAA
=> abAAA
=> abaAA
=> ababAA
=> ababaA
=> ababaa
(a)
S => AA
=> AAAA
=> aAAA
=> abAAA
=> abaAA
=> ababAA
=> ababaA
=> ababaa
(b)
S => AA
=> Aa
=> AAAa
=> AAbAa
=> AAbaa
=> AbAbaa
=> Ababaa
=> ababaa
(c)
S => AA
=> aA
=> aAAA
=> aAAa
=> abAAa
=> abAbAa
=> ababAa
=> ababaa
(d)
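The four derivations above can be checked mechanically. The following is a minimal sketch, assuming sentential forms are written as plain strings with one character per symbol; the names `RULES`, `one_step`, `is_derivation`, and `DERIVATIONS` are hypothetical helpers introduced only for this illustration.

```python
# Grammar from the slide: S -> AA, A -> AAA | bA | Ab | a
# (one character per symbol, so a sentential form is a plain string)
RULES = {"S": ["AA"], "A": ["AAA", "bA", "Ab", "a"]}

def one_step(before, after):
    """True if `after` is obtained from `before` by rewriting exactly one variable."""
    for i, symbol in enumerate(before):
        for rhs in RULES.get(symbol, []):
            if before[:i] + rhs + before[i + 1:] == after:
                return True
    return False

def is_derivation(steps):
    """True if every consecutive pair of sentential forms is one rule application."""
    return all(one_step(x, y) for x, y in zip(steps, steps[1:]))

# The four derivations (a)-(d), copied from the slide
DERIVATIONS = {
    "a": ["S", "AA", "aA", "aAAA", "abAAA", "abaAA", "ababAA", "ababaA", "ababaa"],
    "b": ["S", "AA", "AAAA", "aAAA", "abAAA", "abaAA", "ababAA", "ababaA", "ababaa"],
    "c": ["S", "AA", "Aa", "AAAa", "AAbAa", "AAbaa", "AbAbaa", "Ababaa", "ababaa"],
    "d": ["S", "AA", "aA", "aAAA", "aAAa", "abAAa", "abAbAa", "ababAa", "ababaa"],
}

for label, steps in DERIVATIONS.items():
    print(label, is_derivation(steps))
```

All four derivations yield the same string, ababaa, even though they apply the rules in different orders.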
Tree Structure
Trees corresponding to the derivations in the previous slide.
Applying a CFG to Natural Language
Let’s consider the following sentence:
‘nice dogs like cats’
Rules:
S → NP VP
NP → Adj N
NP → N
VP → V NP
N → dogs | cats
V → like
Adj → nice
Tree (in bracket notation):
[S [NP [Adj nice] [N dogs]] [VP [V like] [NP [N cats]]]]
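As an illustration, the parse of ‘nice dogs like cats’ under these rules can be reproduced with a small top-down parser. This is only a sketch: it works here because the grammar has no left-recursive rules, and the names `GRAMMAR`, `parse`, `expand`, and `bracket` are hypothetical helpers, not part of the slides.

```python
# Rules from the slide; any symbol that is not a key is treated as a terminal word
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Adj", "N"], ["N"]],
    "VP": [["V", "NP"]],
    "N": [["dogs"], ["cats"]],
    "V": [["like"]],
    "Adj": [["nice"]],
}

def parse(symbol, words, start):
    """Yield (tree, end) for every way `symbol` can derive words[start:end]."""
    if symbol not in GRAMMAR:  # terminal: must match the next word
        if start < len(words) and words[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, words, start)

def expand(symbol, rhs, words, start):
    """Parse the right-hand side symbols left to right, keeping all alternatives."""
    partial = [([], start)]
    for child in rhs:
        partial = [(kids + [tree], end)
                   for kids, pos in partial
                   for tree, end in parse(child, words, pos)]
    for kids, end in partial:
        yield (symbol, *kids), end

def bracket(tree):
    """Render a tree as a bracketed string."""
    if isinstance(tree, str):
        return tree
    return "[" + tree[0] + " " + " ".join(bracket(k) for k in tree[1:]) + "]"

words = "nice dogs like cats".split()
trees = [t for t, end in parse("S", words, 0) if end == len(words)]
for t in trees:
    print(bracket(t))
```

The sentence has exactly one parse under this grammar, so a single bracketed tree is printed.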
Convenience of Tree Structures
- Natural languages have a recursive structure.
- Tree structures, and hence CFGs, allow us to
extend the context according to the properties
of the relevant head nodes rather than
limiting it to an arbitrary number of
adjacent words.
Convenience of Tree Structures
Consider the verb agreement in the following
construction:
‘Velocity of the seismic waves rises to …’
bigram: P(rises | waves)
trigram: P(rises | seismic waves)
quadrigram: P(rises | the seismic waves)
Convenience of Tree Structures
- The verb ‘rises’ agrees with a singular noun,
which is ‘velocity’ in this case, not with the
adjacent plural noun ‘waves’. None of the n-gram
contexts above reaches ‘velocity’.
- A CFG allows us to capture this relationship
between non-adjacent words.
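A short sketch makes the limitation concrete: it lists the words each n-gram model conditions on when predicting ‘rises’, and finds the smallest n whose window reaches ‘velocity’. The sentence is lowercased and truncated at ‘rises’; `ngram_context` is a hypothetical helper name.

```python
words = "velocity of the seismic waves rises".split()
target = words.index("rises")

def ngram_context(n):
    """Words an n-gram model conditions on when predicting words[target]."""
    return words[max(0, target - (n - 1)):target]

# Bigram, trigram, and quadrigram contexts: none of them contains 'velocity'
for n in (2, 3, 4):
    print(n, ngram_context(n), "velocity" in ngram_context(n))

# Smallest n whose context window reaches the head noun
smallest_n = next(n for n in range(2, len(words) + 1)
                  if "velocity" in ngram_context(n))
print(smallest_n)
```

For this sentence a 6-gram would be needed, and the required n grows with every word added between the head noun and the verb.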
Probabilistic Context-free Grammars (PCFG)
- A PCFG is simply a CFG with probabilities added
to the rules, indicating how likely different
rewritings are.
PCFG
A PCFG G consists of:
• A set of terminals: {w^k}, k = 1, …, V
• A set of non-terminals: {N^i}, i = 1, …, n
• A designated start symbol: N^1
• A set of rules: {N^i → ζ^j} (where ζ^j is a sequence of
terminals and non-terminals)
• A corresponding set of probabilities on rules such
that, for every i:
$\sum_j P(N^i \to \zeta^j) = 1$
PCFG Rules
Rule               Probability
S → NP VP          1.0
NP → NP PP         0.4
PP → P NP          1.0
VP → V NP          0.7
VP → VP PP         0.3
NP → astronomers   0.1
NP → ears          0.18
NP → saw           0.04
NP → stars         0.18
NP → telescopes    0.1
V → saw            1.0
P → with           1.0
Note that the NP rules are chosen so that
the grammar complies with Chomsky
Normal Form, which basically allows:
A → BC
A → w
S → λ
where A, B, and C are non-terminals,
w is a terminal symbol, and the λ rule is
permitted only for the start symbol S.
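Both properties of this rule set can be verified with a few lines of code: the probabilities of the rules sharing a left-hand side should sum to 1, and every rule should have either two non-terminals or one terminal on its right-hand side. This is a sketch under one stated assumption: the rule VP → VP PP with probability 0.3 is included because the P(t2) computation later in the slides uses a factor of 0.3 and the VP rules would otherwise sum to only 0.7. The names `PCFG`, `totals`, and `is_cnf` are hypothetical.

```python
import math
from collections import defaultdict

# (left-hand side, right-hand side, probability)
PCFG = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("NP", "PP"), 0.4),
    ("PP", ("P", "NP"), 1.0),
    ("VP", ("V", "NP"), 0.7),
    ("VP", ("VP", "PP"), 0.3),  # assumption: inferred from the 0.3 factor in P(t2)
    ("NP", ("astronomers",), 0.1),
    ("NP", ("ears",), 0.18),
    ("NP", ("saw",), 0.04),
    ("NP", ("stars",), 0.18),
    ("NP", ("telescopes",), 0.1),
    ("V", ("saw",), 1.0),
    ("P", ("with",), 1.0),
]
NONTERMINALS = {lhs for lhs, _, _ in PCFG}

# Sum the probabilities of all rules that share a left-hand side
totals = defaultdict(float)
for lhs, _, p in PCFG:
    totals[lhs] += p

def is_cnf(rhs):
    """True for A -> B C (two non-terminals) or A -> w (one terminal)."""
    if len(rhs) == 2:
        return all(s in NONTERMINALS for s in rhs)
    return len(rhs) == 1 and rhs[0] not in NONTERMINALS

for lhs, total in sorted(totals.items()):
    print(lhs, round(total, 10))
print(all(is_cnf(rhs) for _, rhs, _ in PCFG))
```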
Computing the Probabilities in PCFG
$P(w_{1m}) = \sum_t P(w_{1m}, t) = \sum_{t:\ \mathrm{yield}(t) = w_{1m}} P(t)$
where t is a parse tree of the sentence and w_{1m} is the sentence from w_1 to w_m.
- This formula gives us the total probability of a sentence.
- The probability of each tree is found by multiplying the
probabilities of the rules that created the tree.
Computing the Probabilities in PCFG
P(t1) = 1.0 × 0.1 × 0.7 × 1.0 × 0.4 × 0.18 × 1.0 × 1.0 × 0.18
= 0.0009072
Computing the Probabilities in PCFG
P(t2) = 1.0 × 0.1 × 0.3 × 0.7 × 1.0 × 0.18 × 1.0 × 1.0 × 0.18
= 0.0006804
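The two products above can be reproduced by walking each tree and multiplying the probability of the rule used at every node. The sketch below rests on two assumptions that the surviving slide text does not state outright: the sentence is taken to be ‘astronomers saw stars with ears’, which is consistent with the factors in P(t1) and P(t2), and the rule VP → VP PP is given probability 0.3, matching the 0.3 factor in P(t2). `RULE_PROB`, `tree_probability`, `t1`, and `t2` are hypothetical names.

```python
import math

# Rule probabilities keyed by (left-hand side, right-hand side)
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("NP", "PP")): 0.4,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,  # assumption: inferred from the 0.3 factor in P(t2)
    ("NP", ("astronomers",)): 0.1,
    ("NP", ("ears",)): 0.18,
    ("NP", ("saw",)): 0.04,
    ("NP", ("stars",)): 0.18,
    ("NP", ("telescopes",)): 0.1,
    ("V", ("saw",)): 1.0,
    ("P", ("with",)): 1.0,
}

def tree_probability(tree):
    """Multiply the probabilities of all rules used in `tree`.

    A tree is (label, child, ...); a child that is a plain string is a word.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[label, rhs]
    for child in children:
        if not isinstance(child, str):
            p *= tree_probability(child)
    return p

# t1: the PP 'with ears' attaches to the noun phrase 'stars' (assumed sentence)
t1 = ("S", ("NP", "astronomers"),
           ("VP", ("V", "saw"),
                  ("NP", ("NP", "stars"),
                         ("PP", ("P", "with"), ("NP", "ears")))))
# t2: the PP 'with ears' attaches to the verb phrase 'saw stars'
t2 = ("S", ("NP", "astronomers"),
           ("VP", ("VP", ("V", "saw"), ("NP", "stars")),
                  ("PP", ("P", "with"), ("NP", "ears"))))

print(tree_probability(t1), tree_probability(t2))
```

Up to floating-point rounding, the two values agree with the hand-computed 0.0009072 and 0.0006804.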
Computing the Probabilities in PCFG
P(w_{15}) = P(t1) + P(t2) = 0.0015876
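The same total can be obtained without listing the parse trees one by one. The sketch below uses the inside algorithm, a CYK-style dynamic program that these slides do not cover, and it relies on the grammar being in Chomsky Normal Form. As assumptions, the sentence is taken to be ‘astronomers saw stars with ears’ and VP → VP PP is given probability 0.3 (the factor that appears in P(t2)); `BINARY`, `LEXICAL`, and `sentence_probability` are hypothetical names.

```python
from collections import defaultdict

# Binary rules A -> B C and lexical rules A -> w, with their probabilities
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "NP", "PP"): 0.4,
    ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,  # assumption: inferred from the 0.3 factor in P(t2)
}
LEXICAL = {
    ("NP", "astronomers"): 0.1,
    ("NP", "ears"): 0.18,
    ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18,
    ("NP", "telescopes"): 0.1,
    ("V", "saw"): 1.0,
    ("P", "with"): 1.0,
}

def sentence_probability(words, start="S"):
    """Sum of the probabilities of all parses of `words` (inside algorithm)."""
    n = len(words)
    # inside[i, j, A] = probability that A derives words[i:j]
    inside = defaultdict(float)
    for i, word in enumerate(words):
        for (a, w), p in LEXICAL.items():
            if w == word:
                inside[i, i + 1, a] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for (a, b, c), p in BINARY.items():
                    inside[i, j, a] += p * inside[i, k, b] * inside[k, j, c]
    return inside[0, n, start]

# Assumed example sentence (not stated in the surviving slide text)
words = "astronomers saw stars with ears".split()
print(sentence_probability(words))
```

Up to floating-point rounding, the result matches P(t1) + P(t2) = 0.0015876, since these two trees are the only parses of the assumed sentence under this grammar.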
Some Aspects of PCFG
+ As grammars expand to cover a large and diverse corpus of
text, they become increasingly ambiguous. A PCFG gives some
idea of the plausibility of different parses.
- Nevertheless, a PCFG does not offer a very good idea of
plausibility in itself, since its probability estimates are based
purely on structural factors and do not include lexical co-occurrence.
Some Aspects of PCFG
+ Real text tends to have grammatical mistakes, disfluencies and
errors. A PCFG can mitigate this problem to some extent: instead of
ruling sentences out, the grammar can admit them and simply
give implausible sentences a low probability.
- In a PCFG, the probability of a smaller tree is generally greater than
that of a larger tree, because every additional rule application multiplies
in a factor of at most 1. Yet the most frequent length for Wall Street
Journal sentences is around 23 words, so a PCFG gives too much of
the probability mass to very short sentences.
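The bias toward small trees follows directly from the product of rule probabilities. As a small sketch, compare a five-rule tree with a nine-rule tree that contains it, using probabilities from the PCFG rules slide. The word sequences ‘astronomers saw stars’ and ‘astronomers saw stars with ears’ are assumed here for illustration, and `short_tree` and `long_tree` are hypothetical names.

```python
import math  # math.prod requires Python 3.8 or later

# Rule probabilities used by a tree for 'astronomers saw stars' (assumed sentence):
# S -> NP VP, NP -> astronomers, VP -> V NP, V -> saw, NP -> stars
short_tree = [1.0, 0.1, 0.7, 1.0, 0.18]

# Extending the object to 'stars with ears' adds four more rule applications:
# NP -> NP PP, PP -> P NP, P -> with, NP -> ears
long_tree = short_tree + [0.4, 1.0, 1.0, 0.18]

p_short = math.prod(short_tree)
p_long = math.prod(long_tree)
print(p_short, p_long, p_short / p_long)
```

Every extra factor is at most 1, so the product can only stay the same or shrink as the tree grows; here the longer tree is roughly fourteen times less probable.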