Dependency parsing

CMSC 723 / LING 723 / INST 725
MARINE CARPUAT
[email protected]
Slides credit: Joakim Nivre & Ryan McDonald
Today’s Agenda
• Midterm
• P1
• Dependency grammars & parsing
Two Types of Parsing
• Dependency: focuses on relations between words

I saw a girl with a telescope

• Constituency: focuses on identifying constituents and their recursive structure

(S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope)))))

I saw a girl with a telescope
Why is parsing difficult?
Ambiguity: the same sentence can have more than one analysis.

I saw a girl with a telescope (the seeing is done with the telescope)
I saw a girl with a telescope (the girl has the telescope)
Constituency: Example
The funicular which goes to the top
of Victoria Peak is one of the
longest in the world.
Dependency Grammars
• Syntactic structure = lexical items linked by
binary asymmetrical relations called
dependencies
Example Dependency Parse
They hid the letter on the shelf
Compare with constituent parse… What’s the relation?
Criteria for dependency
• D is likely to be the dependent of head H in
construction C
– H can often replace C and determines the
category of C
– H gives semantic specification of C, D specifies H
– H is obligatory, D may be optional
– H selects D and determines whether D is
obligatory
– The form of D depends on H (agreement)
Some intuitive dependencies…
Some trickier dependencies…
• Complex verb groups
• Subordination
• Coordination
• Prepositions
• Punctuation
Dependencies
• Typed: label indicating the relationship between words

nsubj(saw, I)  dobj(saw, girl)  det(girl, a)  prep(saw, with)  pobj(with, telescope)  det(telescope, a)

I saw a girl with a telescope

• Untyped: only which words depend on which

I saw a girl with a telescope
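A dependency parse is often stored as one head index per word (untyped), with an optional label per word (typed). A minimal sketch for the running example, assuming the instrument reading where the PP attaches to "saw":

```python
# 1-indexed words; head 0 means ROOT. The attachment of "with" to "saw"
# (instrument reading) is an assumption; labels follow the typed example.
words  = ["I", "saw", "a", "girl", "with", "a", "telescope"]
heads  = [2, 0, 4, 2, 2, 7, 5]       # untyped: one head index per word
labels = ["nsubj", "ROOT", "det", "dobj", "prep", "det", "pobj"]

# word 1 ("I") depends on word 2 ("saw") with relation "nsubj"
```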
Dependency Treebanks

CoNLL File Format
• Standard format for dependencies
• Tab-separated columns, sentences separated by a blank line

ID  Word     Base     POS  POS2  Head  Type
1   ms.      ms.      N    NNP   2     DEP
2   haag     haag     N    NNP   3     NP-SBJ
3   plays    play     V    VBZ   0     ROOT
4   elianti  elianti  N    NNP   3     NP-OBJ
5   .        .        .    .     3     DEP
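The format above can be read with a small parser; a sketch assuming the column order shown in the table (ID, Word, Base, POS, POS2, Head, Type):

```python
# Minimal reader for the tab-separated CoNLL-style format above.
def read_conll(text):
    """Yield one sentence at a time as a list of token dicts."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:                # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        cols = line.split("\t")
        sentence.append({
            "id": int(cols[0]),
            "word": cols[1],
            "base": cols[2],
            "pos": cols[3],
            "pos2": cols[4],
            "head": int(cols[5]),
            "type": cols[6],
        })
    if sentence:                    # last sentence may lack a trailing blank line
        yield sentence

sample = ("1\tms.\tms.\tN\tNNP\t2\tDEP\n"
          "2\thaag\thaag\tN\tNNP\t3\tNP-SBJ\n"
          "3\tplays\tplay\tV\tVBZ\t0\tROOT\n")
sents = list(read_conll(sample))
```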
Data-driven dependency parsing
Goal: learn a good predictor of dependency graphs
Input: sentence x
Output: dependency graph/tree G
Parsing strategies
• Transition-based
  – Learn to predict transitions given input and history
  – Predict new graphs using a deterministic parsing algorithm
• Graph-based
  – Learn to predict entire graphs given the input
  – Predict new graphs using spanning tree algorithms
Shift-Reduce
• Process words one by one, left to right
• Two data structures
  – Queue [buffer] of unprocessed words
  – Stack of partially processed words
• At each point choose one action
  – shift: move one word from queue to stack
  – reduce left: top word on stack is head of second word
  – reduce right: second word on stack is head of top word
• Learn how to choose each action with a classifier
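The three actions above can be sketched as plain list operations; this assumes tokens are (id, word, POS) tuples and `heads[i]` stores the head id of token i:

```python
# Sketch of the three shift-reduce actions on Python lists.
def shift(stack, queue, heads):
    stack.append(queue.pop(0))      # move the first queue word onto the stack

def reduce_left(stack, queue, heads):
    # top word on stack is head of second word: second word is attached, removed
    head, dep = stack[-1], stack.pop(-2)
    heads[dep[0]] = head[0]

def reduce_right(stack, queue, heads):
    # second word on stack is head of top word: top word is attached, removed
    dep, head = stack.pop(), stack[-1]
    heads[dep[0]] = head[0]
```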
Shift Reduce Example

Parsing “I saw a girl”:

Action    Stack        Queue
(start)                I saw a girl
shift     I            saw a girl
shift     I saw        a girl
r left    saw          a girl        (saw is head of I)
shift     saw a        girl
shift     saw a girl
r left    saw girl                   (girl is head of a)
r right   saw                        (saw is head of girl)
Classification for Shift-Reduce
• Given a state, e.g. stack: saw a girl (with “I” already attached to “saw”), queue: empty
• Which action do we choose? shift? r left? r right? Each choice leads to a different partial tree.
• Correct actions → correct tree
Classification for Shift-Reduce
• We have a weight vector for each action: “shift” ws, “reduce left” wl, “reduce right” wr
• Calculate feature functions from the queue and stack: φ(queue, stack)
• Take the dot product with each weight vector to get scores: ss = ws · φ(queue, stack)
• Take the highest score: if ss > sl and ss > sr → do shift
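A sketch of this scoring step, assuming features and weight vectors are stored as plain dicts (feature name → value):

```python
# Dot product between a weight dict and a feature dict; missing features
# contribute zero.
def score(w, feats):
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

# Score all three actions and return the name of the highest-scoring one.
def choose_action(ws, wl, wr, feats):
    scores = {"SHIFT": score(ws, feats),
              "LEFT": score(wl, feats),
              "RIGHT": score(wr, feats)}
    return max(scores, key=scores.get)
```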
Features for Shift Reduce
• Features should generally cover at least the last two stack entries and the first queue entry

        stack[-2]   stack[-1]   queue[0]
Word:   saw         a           girl
POS:    VBD         DET         NN

(-2 → second-to-last, -1 → last, 0 → first)

Active features:
φ(W-2=saw, W-1=a) = 1      φ(W-1=a, W0=girl) = 1
φ(W-2=saw, P-1=DET) = 1    φ(W-1=a, P0=NN) = 1
φ(P-2=VBD, W-1=a) = 1      φ(P-1=DET, W0=girl) = 1
φ(P-2=VBD, P-1=DET) = 1    φ(P-1=DET, P0=NN) = 1
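A feature extractor producing these eight word/POS pair features, assuming tokens are (id, word, POS) tuples; the feature-name strings are illustrative:

```python
# Build the pair features over stack[-2], stack[-1], and queue[0].
# "NULL" padding covers states where the stack or queue is too short.
def make_feats(stack, queue):
    s2 = stack[-2] if len(stack) >= 2 else (None, "NULL", "NULL")
    s1 = stack[-1] if len(stack) >= 1 else (None, "NULL", "NULL")
    q0 = queue[0] if queue else (None, "NULL", "NULL")
    feats = {}
    feats["W-2=%s,W-1=%s" % (s2[1], s1[1])] = 1
    feats["W-1=%s,W0=%s" % (s1[1], q0[1])] = 1
    feats["W-2=%s,P-1=%s" % (s2[1], s1[2])] = 1
    feats["W-1=%s,P0=%s" % (s1[1], q0[2])] = 1
    feats["P-2=%s,W-1=%s" % (s2[2], s1[1])] = 1
    feats["P-1=%s,W0=%s" % (s1[2], q0[1])] = 1
    feats["P-2=%s,P-1=%s" % (s2[2], s1[2])] = 1
    feats["P-1=%s,P0=%s" % (s1[2], q0[2])] = 1
    return feats
```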
Algorithm Definition
• The algorithm ShiftReduce takes as input:
  – weights ws, wl, wr
  – a queue = [ (1, word1, POS1), (2, word2, POS2), … ]
• starts with a stack holding the special ROOT symbol:
  – stack = [ (0, “ROOT”, “ROOT”) ]
• processes and returns:
  – heads = [ -1, head1, head2, … ]
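The definition above can be turned into a runnable sketch. The trained classifier is passed in as `classify(stack, queue) -> action`; the forced shift on a short stack, and falling back to a left reduce when the classifier asks for an impossible shift, are assumptions to keep the loop total:

```python
# Sketch of ShiftReduce: tokens are (id, word, POS) tuples, heads[i] is the
# head id of token i, heads[0] stays -1 for ROOT.
def shift_reduce(queue, classify):
    stack = [(0, "ROOT", "ROOT")]
    heads = [-1] * (len(queue) + 1)
    queue = list(queue)
    while queue or len(stack) > 1:
        if len(stack) < 2:                # too few stack items: must shift
            action = "SHIFT"
        else:
            action = classify(stack, queue)
            if action == "SHIFT" and not queue:
                action = "LEFT"           # cannot shift from an empty queue
        if action == "SHIFT":
            stack.append(queue.pop(0))
        elif action == "LEFT":            # top of stack heads the second word
            heads[stack[-2][0]] = stack[-1][0]
            del stack[-2]
        else:                             # RIGHT: second word heads the top
            heads[stack[-1][0]] = stack[-2][0]
            stack.pop()
    return heads
```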
Training Shift-Reduce
• Can be trained using the perceptron algorithm
• Do parsing; if the correct answer corr differs from the classifier answer ans, update the weights
• e.g. if ans = SHIFT and corr = LEFT:
  ws -= φ(queue, stack)
  wl += φ(queue, stack)
Keeping Track of the Correct Answer (Initial Attempt)
Assume we know the correct head of each stack entry:
• stack[-1].head == stack[-2].id (left is head of right) → corr = RIGHT
• stack[-2].head == stack[-1].id (right is head of left) → corr = LEFT
• else → corr = SHIFT

Problem: too greedy for right-branching dependencies

Example: “go to school” (id: 1 2 3, head: 0 1 2)
With stack[-2] = go, stack[-1] = to, queue[0] = school, the rule fires RIGHT and removes “to” before its dependent “school” has been attached.
Keeping Track of the Correct Answer (Revised)
Count the number of unprocessed children:
• stack[-1].head == stack[-2].id (left is head of right)
  and stack[-1].unproc == 0 (right has no unprocessed children)
  → corr = RIGHT
• stack[-2].head == stack[-1].id (right is head of left)
  and stack[-2].unproc == 0 (left has no unprocessed children)
  → corr = LEFT
• else
  → corr = SHIFT
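A sketch of this revised rule, assuming each stack entry is a dict carrying its gold "head" id and an "unproc" count of children not yet attached (the caller decrements "unproc" on the surviving head after each reduce):

```python
# Return the correct action for the current stack under the revised rule.
def get_corr(stack):
    if len(stack) >= 2:
        s2, s1 = stack[-2], stack[-1]     # second-to-top, top
        if s1["head"] == s2["id"] and s1["unproc"] == 0:
            return "RIGHT"                # left heads right; right is complete
        if s2["head"] == s1["id"] and s2["unproc"] == 0:
            return "LEFT"                 # right heads left; left is complete
    return "SHIFT"
```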
Shift Reduce Training Algorithm

ShiftReduceTrain(queue):
  initialize stack
  while |queue| > 0 or |stack| > 1:
    feats = MakeFeats(stack, queue)
    calculate ans
    calculate corr
    if ans != corr:
      w_ans -= feats
      w_corr += feats
    perform action according to corr
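The pseudocode above can be sketched as a runnable loop. `make_feats` and `get_corr` are supplied by the caller (a feature extractor and the rule for the correct action, as in the previous slides); `weights` maps each action name to a feature-weight dict:

```python
# One pass of perceptron training over a single sentence.
def shift_reduce_train(queue, weights, make_feats, get_corr):
    stack = [(0, "ROOT", "ROOT")]
    queue = list(queue)
    while queue or len(stack) > 1:
        feats = make_feats(stack, queue)
        scores = {a: sum(w.get(f, 0.0) * v for f, v in feats.items())
                  for a, w in weights.items()}
        ans = max(scores, key=scores.get)
        corr = get_corr(stack, queue)
        if ans != corr:                   # perceptron update on a mistake
            for f, v in feats.items():
                weights[ans][f] = weights[ans].get(f, 0.0) - v
                weights[corr][f] = weights[corr].get(f, 0.0) + v
        if corr == "SHIFT":               # always follow the correct action
            stack.append(queue.pop(0))
        elif corr == "LEFT":              # top of stack heads the second word
            del stack[-2]
        else:                             # RIGHT: second word heads the top
            stack.pop()
    return weights
```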
Recap
• Dependency grammars
  – What’s a dependency
  – Some relations are clearer than others
• Dependency treebanks
• An example of a transition-based parser
  – Shift-Reduce parser
  – Combined with a classifier
  – Can be trained using the perceptron algorithm