Exploring Syntactic Features for Relation Extraction
using a Convolution Tree Kernel
Min Zhang, Jie Zhang, Jian Su
Institute for Infocomm Research
In Proceedings of HLT 2006
presented by Andy Schlaikjer
Relation Extraction (ACE 03)
• Given entity mentions (and their types), identify relations between them
• 5 main relation types (AT, NEAR, PART, ROLE, SOCIAL) covering 24 subtypes

Akyetsu testified he was powerless to stop the merger of an estimated 2000 ethnic Tutsi’s [A] in the district [B] of Tawba.

AT.LOCATED( A, B )
Syntax for Relation Extraction
• Generative models with syntax (Miller et al., 2000)
• Maxent with syntax (Kambhatla, 2004)
• SVM with syntax (Zhou et al., 2005)
• Tree kernel (Zelenko et al., 2003; Culotta and Sorensen, 2004)
• Shortest path dependency kernel (Bunescu and Mooney, 2005)
Kernels for Feature Spaces
• Conditional estimation and discriminative techniques allow greater flexibility in features, but how to choose them?
• Kernels allow large feature spaces to be used easily, without precise feature engineering
• So why didn’t those last two applications of kernels using syntax for relation extraction work?
Shortcomings of Previous Kernels
• Only those nodes at the same depth may be compared (Culotta and Sorensen, 2004)
• Paths to be compared must have the same length (Bunescu and Mooney, 2005)

These kernel function constraints limit the amount of syntactic information used, possibly ignoring important qualities of syntax trees.
Let’s Explore our Options…
• What kinds of syntax are useful for Relation Extraction?
• Experiment with various transformations of input trees which cover different spaces of tree features
• 7 spaces investigated (details soon)
• Before we get carried away, why might limiting the feature space defined by the kernel be important?
The Feature Spaces Considered
• Minimum Complete Tree (MCT)
• Path-enclosed Tree (PT)
• Chunking Tree (CT)
• Context-Sensitive PT (CPT)
• Context-Sensitive CT (CCT)
• Flattened PT (FPT)
• Flattened CPT (FCPT)
The Feature Spaces Considered
Akyetsu testified he was powerless to stop the merger of an estimated 2000 ethnic Tutsi’s [A] in the district [B] of Tawba.

AT.LOCATED( A, B )
Minimum Complete Tree
[parse tree figure over “an estimated 2000 ethnic Tutsi’s in the district of Tawba”]

Path-enclosed Tree
[parse tree figure over the same span]

Chunking Tree
[parse tree figure over the same span]

Context-Sensitive PT
[parse tree figure over the same span]

Context-Sensitive CT
[parse tree figure over the same span]

Flattened PT
[parse tree figure over the same span]

Flattened CPT
[parse tree figure over the same span]
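As a rough illustration of the Path-enclosed Tree (PT) transformation above: keep only the smallest structure spanning the two entity mentions, pruning everything outside their span. The minimal `Tree` class, the `path_enclosed` helper, and the leaf-index convention below are illustrative assumptions for this sketch, not the authors’ implementation.

```python
class Tree:
    """Toy constituency tree; a node with no children is a leaf (word)."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def leaves(self):
        if not self.children:
            return [self.label]
        return [w for c in self.children for w in c.leaves()]

def path_enclosed(tree, lo, hi):
    """Prune `tree` down to the material enclosed by leaf indices [lo, hi],
    the span between (and including) the two entity mentions."""
    def prune(node, start):
        if not node.children:
            # keep a leaf only if it lies inside the entity span
            return (Tree(node.label), 1) if lo <= start <= hi else (None, 1)
        kept, width = [], 0
        for child in node.children:
            sub, w = prune(child, start + width)
            width += w
            if sub is not None:
                kept.append(sub)
        return (Tree(node.label, kept) if kept else None, width)
    pruned, _ = prune(tree, 0)
    return pruned

# Toy example: entities at leaf 1 ("merger") and leaf 4 ("district").
t = Tree("NP", [
    Tree("NP", [Tree("the"), Tree("merger")]),
    Tree("PP", [Tree("of"),
                Tree("NP", [Tree("the"), Tree("district")])]),
])
print(path_enclosed(t, 1, 4).leaves())  # -> ['merger', 'of', 'the', 'district']
```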
How to Compute the Kernel?
• Convolution Tree Kernel used for all types of tree spaces (Collins and Duffy, 2001; Moschitti, 2004).
• Feature space (vector representation) for a parse tree looks like this:

    Φ(T) = ( c(t_1), …, c(t_i), …, c(t_n) )

where c(t_i) is the count of sub-trees of type i in T.
How to Compute the Kernel?
• The kernel function is defined:

    K(T_1, T_2) = Σ_{n1 ∈ N1} Σ_{n2 ∈ N2} Σ_i I_i(n1) · I_i(n2)

where N1 and N2 are the sets of all nodes in trees T1 and T2 respectively, and I_i(n) indicates whether a sub-tree of type i is rooted at node n.
• This can be computed in O(|N1| × |N2|) time (Collins and Duffy, 2002).
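A minimal sketch of how this sum can be evaluated with the standard Collins–Duffy recursion: instead of enumerating sub-tree types explicitly, a recursive Δ(n1, n2) counts matching sub-trees rooted at each node pair, giving the O(|N1| × |N2|) cost on the slide. The tuple tree encoding, function names, and the decay parameter `lam` (λ = 1.0 here) are assumptions for illustration, not the authors’ code.

```python
# Trees are nested tuples ("LABEL", child, child, ...); leaves are strings.

def nodes(t):
    """All internal (non-leaf) nodes of a tuple-encoded tree."""
    if isinstance(t, str):
        return []
    out = [t]
    for child in t[1:]:
        out.extend(nodes(child))
    return out

def production(t):
    """Grammar production at a node: its label plus its children's labels."""
    return (t[0],) + tuple(c if isinstance(c, str) else c[0] for c in t[1:])

def delta(n1, n2, lam=1.0):
    """Number of common sub-trees rooted at the node pair (n1, n2)."""
    if production(n1) != production(n2):
        return 0.0
    if all(isinstance(c, str) for c in n1[1:]):  # pre-terminal node
        return lam
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            score *= 1.0 + delta(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=1.0):
    """K(T1, T2): sum Delta over all node pairs (Collins and Duffy)."""
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

t1 = ("NP", ("DT", "the"), ("NN", "dog"))
t2 = ("NP", ("DT", "the"), ("NN", "cat"))
print(tree_kernel(t1, t2))  # -> 3.0  (DT->the, NP->DT NN, NP->(DT the) NN)
```

With λ = 1.0 the kernel is an exact count of shared sub-trees; λ < 1 downweights larger fragments, which is how Collins and Duffy keep large trees from dominating.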
Experiments (ACE 03)
• Train: 647 docs, 9683 relations.
• Test: 97 docs, 1386 relations.
• 5 entity types: Person, Organization, Location, Facility, GPE.
• 5 relation types: AT, NEAR, PART, ROLE, SOCIAL (with 24 sub-types).
• Treated as multi-class classification.
• Charniak parser, SVM, best of 1-vs-all.
Experiments (ACE 03)
Symmetric relations…?
Results (ACE 03)
Only syntax tree features, and still reasonable performance.

Results (ACE 03)
PT does best, and CT performance drops significantly, suggesting syntax deeper than chunking is important for this task.

Results (ACE 03)
Using context blindly is harmful, but perhaps there is a better way to determine when to incorporate it…

Results (ACE 03)
Incorporating other features of the entities within the trees boosts performance.

Results (ACE 03)
Quasi-comparison with other Relation Extraction research. The datasets used may be slightly different, so these figures can’t really be compared directly…
Conclusions
• Tune your feature space to your task (this goes for kernel approaches, too).
• Syntax is important for Relation Extraction; specifically, the path-enclosed tree between two entities contains useful information.
• Convolution Tree Kernels allow elegant coverage of a very large, structured feature space.
Questions?
• How can context be used more intelligently? Should we always ignore it?
• Can feature spaces be designed according to task? Is there a methodology one can follow?
• What other information would be useful to tie into the syntax tree?
• It seems unnatural to include extra nodes for non-syntax-based features. Can we consider nodes in the tree as sets of attributes, instead of single class labels?