Semantic Role Labelling Using Shallow Information

Semantic Role Labelling Using Chunk Sequences
Ulrike Baldewein, Katrin Erk, Sebastian Padó
Saarland University, Saarbrücken

Detlef Prescher
University of Amsterdam, Amsterdam
1. Representation for Classification

- Usual choice: classify constituents
  - Intuition: one argument ↔ one constituent
  - Not available in this task
- Classify words?
  - Too fine-grained
- Classify chunks?
  - Data analysis: not always the right level
    - 34% of arguments span more than one chunk
    - 13% of arguments do not respect chunk boundaries
Chunk sequences as classification instances

- Sequences of chunks and chunk parts
- Adaptive level of structure
- "Potential constituents"

Example:
  [NP Britain's] [NP manufacturing industry] [VP is transforming] [NP itself] [VP to boost] [NP exports]

  ARG0 = NP_NP
  V    = VP[VBG]
  ARG1 = NP
  ARG2 = VP_NP
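To make the sequence labels concrete, here is a minimal sketch (not the system's actual code) of how a label such as NP_NP could be read off BIO chunk tags for a candidate span; the BIO input format and the helper function are assumptions for illustration.

```python
# Minimal sketch: derive a chunk-sequence label (e.g. 'NP_NP') for a candidate span
# from its BIO chunk tags. The BIO format and this helper are illustrative assumptions.
def chunk_sequence_label(bio_tags):
    """['B-NP', 'I-NP', 'B-NP', 'I-NP'] -> 'NP_NP'"""
    chunks = []
    for tag in bio_tags:
        if tag == 'O':
            chunks.append('O')
        elif tag.startswith('B-') or not chunks:
            chunks.append(tag.split('-', 1)[-1])
        # 'I-' tags continue the current chunk and contribute nothing new
    return '_'.join(chunks)

# chunk_sequence_label(['B-NP', 'I-NP', 'B-NP', 'I-NP'])  # -> 'NP_NP'
# chunk_sequence_label(['B-VP', 'I-VP', 'B-NP'])          # -> 'VP_NP'
```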
Frequency-based chunk sequence filtering

- Filter 1: Use only sequence types which realise arguments in the training set
  - 1089 types, Zipf distributed:
    NP (23,000), S (5,000), ..., NP_PP_NP_PP_NP_VP_PP_NP_NP (1)
- Filter 2: Use only frequent sequence types (f(s) > 10)
- Examine the material between sequence and target: divider sequences
  - Also Zipf distributed: empty divider (14,000), NP (10,000), ...
  - Similar to the "Path" feature
- Filter 3: Use only sequences with a frequent divider (f(d) > 10)
- Filter 4: Use only sequences co-occurring frequently with some divider (f(s,d) > 5)
  (the four filters are sketched in code below)
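A sketch of how the four filters could be applied, assuming the training material is available as (sequence type, divider type) pairs for argument realisations; the thresholds follow the slide, while the exact counting scheme is an assumption.

```python
# Sketch of the four frequency filters, assuming the training material is available
# as (sequence_type, divider_type) pairs observed as argument realisations.
# Thresholds follow the slide; the exact counting scheme is an assumption.
from collections import Counter

def filter_sequences(arg_realisations, min_seq=10, min_div=10, min_pair=5):
    """arg_realisations: list of (sequence_type, divider_type) pairs, e.g. ('NP', '')."""
    seq_counts = Counter(s for s, _ in arg_realisations)   # Filter 1: attested types only
    div_counts = Counter(d for _, d in arg_realisations)
    pair_counts = Counter(arg_realisations)
    kept = set()
    for (s, d), n in pair_counts.items():
        if (seq_counts[s] > min_seq            # Filter 2: frequent sequence type
                and div_counts[d] > min_div    # Filter 3: frequent divider
                and n > min_pair):             # Filter 4: frequent (sequence, divider) pair
            kept.add(s)
    return kept
```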
Results of filtering

- Leaves 87 sequence types (was 1089)
  - 43,777 sequence tokens in the development set (about 1 sequence per word)
  - 8,698 of them are proper arguments (about 20%)
- Bad news: the representation loses 16% of proper arguments
2. Features

- "Shallow features": simple properties
- "Higher-level features": syntactic properties (mostly heuristic)
- "Divider features": shallow and higher-level properties of dividers
  (a feature-dict sketch follows below)
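As a rough illustration only: a hypothetical feature dict for one sequence candidate. The concrete features shown (sequence type, length, first/last word, divider type) are assumptions, not the system's documented feature set.

```python
# Hypothetical sketch of a feature dict for one chunk sequence candidate.
# The concrete features here are illustrative assumptions, not the system's exact set.
def extract_features(seq_chunks, seq_words, divider_chunks, target_lemma):
    return {
        # "shallow" features: simple surface properties of the sequence
        'seq_type': '_'.join(seq_chunks),           # e.g. 'NP_NP'
        'seq_length': len(seq_words),
        'first_word': seq_words[0].lower(),
        'last_word': seq_words[-1].lower(),
        # "divider" features: properties of the material between sequence and target
        'divider_type': '_'.join(divider_chunks) or 'EMPTY',
        'target': target_lemma,
    }

# extract_features(['NP', 'NP'], ['Britain', "'s", 'manufacturing', 'industry'],
#                  [], 'transform')
```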
EM-based clustering

- Measure the fit between objects y1 (pred:arg) and y2 (sequence)
  - Example: how well does NP fit give:A1?
- y1 and y2 are independent and generated by a latent cluster c:
    p(y1, y2) = Σ_c p(c) p(y1|c) p(y2|c)
- EM derives the clusters from the training data
- Intention: generalisation within clusters
- Features: e.g. "most likely argument slot for this sequence for this predicate"
  (an EM sketch follows below)
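How the clusters could be obtained is sketched below: a minimal EM implementation of the latent-class model p(y1, y2) = Σ_c p(c) p(y1|c) p(y2|c) from the slide. The corpus format, number of clusters, and iteration count are assumptions for illustration, not the system's actual settings.

```python
# Minimal EM sketch for the latent-class model p(y1, y2) = sum_c p(c) p(y1|c) p(y2|c),
# with y1 a predicate:argument pair (e.g. 'give:A1') and y2 a chunk sequence type
# (e.g. 'NP'). Cluster count and iteration count are illustrative assumptions.
import random
from collections import defaultdict

def normalise(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()}

def em_cluster(pairs, n_clusters=10, n_iters=20):
    """pairs: list of observed (y1, y2) tuples from the training set."""
    y1_vals = {a for a, _ in pairs}
    y2_vals = {b for _, b in pairs}
    # random initialisation of p(c), p(y1|c), p(y2|c)
    p_c = [1.0 / n_clusters] * n_clusters
    p_y1 = [normalise({v: random.random() for v in y1_vals}) for _ in range(n_clusters)]
    p_y2 = [normalise({v: random.random() for v in y2_vals}) for _ in range(n_clusters)]

    for _ in range(n_iters):
        exp_c = [0.0] * n_clusters
        exp_y1 = [defaultdict(float) for _ in range(n_clusters)]
        exp_y2 = [defaultdict(float) for _ in range(n_clusters)]
        for y1, y2 in pairs:
            # E-step: posterior over clusters for this (y1, y2) pair
            post = [p_c[c] * p_y1[c][y1] * p_y2[c][y2] for c in range(n_clusters)]
            z = sum(post)
            for c in range(n_clusters):
                r = post[c] / z
                exp_c[c] += r
                exp_y1[c][y1] += r
                exp_y2[c][y2] += r
        # M-step: re-estimate parameters from the expected counts
        total = sum(exp_c)
        p_c = [e / total for e in exp_c]
        p_y1 = [normalise(exp_y1[c]) for c in range(n_clusters)]
        p_y2 = [normalise(exp_y2[c]) for c in range(n_clusters)]
    return p_c, p_y1, p_y2
```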
3. Procedure

- Filter sequences from the training set
- Compute features for sequence tokens and their dividers (training + development + test set)
- Estimate a Maximum Entropy model on the training set
- Classify sequences from the development / test set
- Recover semantic parses
Two-step classification procedure

- Classifier 1: Argument recognition
  - Binary decision about argumenthood
  - All argument classes conflated into ARG
- Classifier 2: Argument labelling
  - Considers only sequences assigned ARG by step 1
  (sketched in code below)
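A minimal sketch of the two-step setup, using scikit-learn's LogisticRegression as a stand-in for the maximum entropy model (logistic regression is a maxent classifier, but the system's implementation and feature set differ); the feature-dict input format is an assumption.

```python
# Minimal sketch of the two-step classification, assuming one feature dict per sequence.
# LogisticRegression stands in for the maximum entropy model used in the system.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_two_step(train_feats, train_labels):
    """train_feats: list of feature dicts; train_labels: role labels or 'NOLABEL'."""
    # Step 1: argument recognition -- conflate all role labels into ARG vs. NOLABEL
    arg_labels = ['NOLABEL' if l == 'NOLABEL' else 'ARG' for l in train_labels]
    recogniser = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    recogniser.fit(train_feats, arg_labels)
    # Step 2: argument labelling -- trained on proper arguments only
    arg_feats = [f for f, l in zip(train_feats, train_labels) if l != 'NOLABEL']
    arg_roles = [l for l in train_labels if l != 'NOLABEL']
    labeller = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    labeller.fit(arg_feats, arg_roles)
    return recogniser, labeller

def classify(recogniser, labeller, feats):
    """Run the labeller only on sequences recognised as ARG in step 1."""
    out = []
    for f in feats:
        if recogniser.predict([f])[0] == 'ARG':
            out.append(labeller.predict([f])[0])
        else:
            out.append('NOLABEL')
    return out
```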
4. Classification result: Sequence chart

[Figure: sequence chart over "The man with the beard sleeps", showing overlapping candidate sequences with their label distributions, e.g. A0 (70%), A1 (20%); A0 (65%), A1 (25%); A0 (60%), NOLABEL (40%); NOLABEL (70%), AM-MOD (25%)]

- Need to find the optimal "semantic parse" of argument labels
Semantic parse recovery

- Find the most probable semantic parse p = (l1, l2, ...)
- Step 1: Beam search
  - Simple probability model with an independence assumption:
      Pbs(l1, l2, ...) = Π_i Pc(li)
- Step 2: Re-estimation
  - Global considerations (e.g. a parse containing [A0 A0])
  - Use counts from the training set:
      P(l1, l2, ...) = Pbs(l1, l2, ...) * Ptr(l1, l2, ...)
  (both steps are sketched below)
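A minimal sketch of the two-step parse recovery, assuming each candidate sequence comes with a label distribution Pc from the classifier (as in the chart above). The re-estimation score Ptr is represented by a hypothetical reestimate() hook, since the exact training-set counts are not given on the slide.

```python
# Sketch of semantic parse recovery: beam search over per-sequence label distributions,
# then rescoring with a training-set based factor. The reestimate() hook is hypothetical.
import heapq
from math import log

def beam_search(label_dists, beam_width=10):
    """label_dists: list of dicts {label: prob}, one per sequence token (left to right)."""
    beam = [(0.0, [])]  # (log Pbs, partial parse)
    for dist in label_dists:
        candidates = []
        for score, parse in beam:
            for label, prob in dist.items():
                if prob > 0:
                    candidates.append((score + log(prob), parse + [label]))
        beam = heapq.nlargest(beam_width, candidates, key=lambda x: x[0])
    return beam  # n-best list of (log Pbs, parse)

def rescore(beam, reestimate):
    """Step 2: multiply Pbs by a training-set based score Ptr (here: reestimate(parse))."""
    rescored = [(score + log(reestimate(parse)), parse) for score, parse in beam]
    return max(rescored, key=lambda x: x[0])[1]

# Hypothetical usage with distributions like those in the sequence chart:
dists = [{'A0': 0.7, 'A1': 0.2, 'NOLABEL': 0.1},
         {'NOLABEL': 0.7, 'AM-MOD': 0.25, 'A0': 0.05}]
parses = beam_search(dists)
best = rescore(parses, lambda parse: 0.1 if parse.count('A0') > 1 else 1.0)
```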
5. Results (Development Set)

                      Precision   Recall   F-score
  Upper Bound            100.0     83.3      90.9
  Step 1 (ARG only)       77.3     60.1      66.1
  Final                   64.9     41.6      50.7


- Upper bound: given by the lost chunk sequences
- But the filtering is necessary
  - With only sequence-frequency filtering (filters 1 and 2):
    - Good news: only 9% of arguments are lost (vs. 16% now)
    - Bad news: 127,000 sequences (vs. 44,000 now)
      - Argument recognition becomes much more difficult
      - F-score with the same features: only 0.38
Results (continued)

- The two steps have different profiles
  - Argument identification: shallow and divider features important
  - Argument labelling: shallow and higher-level features important
- Clustering features unsuccessful: they increase precision at the cost of recall
  - Feature: "most probable label for this sequence"
  - Was successful in a SENSEVAL-3 model
- The largest problem is recall
What I talked about... and more

- Chunk sequences for SRL
  - Adaptive representation with "higher-level" features
  - Recall problem (the filtering loses proper arguments)
  - EM-based features promising, but currently not helpful
- Since submission
  - Maxent vs. memory-based learner: virtually the same result
- Left to do
  - Detailed error analysis
  - More intelligent filtering
  - Better features