Exploring Syntactic Features for Relation Extraction using a Convolution Tree Kernel
Min Zhang, Jie Zhang, Jian Su (Institute for Infocomm Research)
In Proceedings of HLT 2006
Presented by Andy Schlaikjer

Relation Extraction (ACE 03)
• Given entity mentions (and their types), identify relations between them
• 5 main relation types (AT, NEAR, PART, ROLE, SOCIAL) covering 24 subtypes
• Example: "Akyetsu testified he was powerless to stop the merger of an estimated 2000 ethnic Tutsi's [A] in the district [B] of Tawba." → AT.LOCATED(A, B)

Syntax for Relation Extraction
• Generative models with syntax (Miller et al., 2000)
• Maxent with syntax (Kambhatla, 2004)
• SVM with syntax (Zhou et al., 2005)
• Tree kernels (Zelenko et al., 2003; Culotta and Sorensen, 2004)
• Shortest path dependency kernel (Bunescu and Mooney, 2005)

Kernels for Feature Spaces
• Conditional estimation and discriminative techniques allow greater flexibility in features, but how do we choose them?
• Kernels allow large feature spaces to be used easily, without precise feature engineering
• So why didn't the last two applications of kernels over syntax to relation extraction work better?

Shortcomings of Previous Kernels
• Only nodes at the same depth may be compared (Culotta and Sorensen, 2004)
• Paths to be compared must have the same length (Bunescu and Mooney, 2005)
• These constraints on the kernel function limit the amount of syntactic information used, possibly ignoring important qualities of syntax trees

Let's Explore our Options…
• What kinds of syntax are useful for Relation Extraction?
• Experiment with various transformations of the input trees, each covering a different space of tree features
• 7 spaces investigated (details soon)
• Before we get carried away: why might limiting the feature space defined by the kernel be important?

The Feature Spaces Considered
• Minimum Complete Tree (MCT)
• Path-enclosed Tree (PT)
• Chunking Tree (CT)
• Context-Sensitive PT (CPT)
• Context-Sensitive CT (CCT)
• Flattened PT (FPT)
• Flattened CPT (FCPT)

Running example: "Akyetsu testified he was powerless to stop the merger of an estimated 2000 ethnic Tutsi's [A] in the district [B] of Tawba." → AT.LOCATED(A, B)

[Figure slides: one per feature space (MCT, PT, CT, CPT, CCT, FPT, FCPT), each showing the corresponding tree over the fragment "an estimated 2000 ethnic Tutsi's in the district of Tawba"]

How to Compute the Kernel?
• The Convolution Tree Kernel is used for all tree spaces (Collins and Duffy, 2001; Moschitti, 2004)
• The feature space (vector representation) of a parse tree T is
  φ(T) = (c(t₁), …, c(tᵢ), …, c(tₙ))
  where c(tᵢ) is the count of sub-trees of type i in T
• The kernel function is defined as
  K(T₁, T₂) = Σ_{n₁ ∈ N₁} Σ_{n₂ ∈ N₂} Σᵢ Iᵢ(n₁) · Iᵢ(n₂)
  where N₁ and N₂ are the sets of all nodes in trees T₁ and T₂, and Iᵢ(n) indicates whether a sub-tree of type i is rooted at node n
• This can be computed in O(|N₁|×|N₂|) time (Collins and Duffy, 2002)

Experiments (ACE 03)
• Train: 647 docs, 9,683 relations
• Test: 97 docs, 1,386 relations
• 5 entity types: Person, Organization, Location, Facility, GPE
• 5 relation types: AT, NEAR, PART, ROLE, SOCIAL (with 24 sub-types)
• Treated as multi-class classification
• Charniak parser; SVM; best of one-vs-all

Experiments (ACE 03)
[Table slides, not reproduced; presenter's note: "Symmetric relations..?"]

Results (ACE 03)
• Only syntax tree features, and still reasonable performance
• PT does best, and CT performance drops significantly, suggesting that syntax deeper than chunking is important for this task
• Using context blindly is harmful, but perhaps there is a better way to determine when to incorporate it
• Incorporating other features of the entities within the trees boosts performance
• Quasi-comparison with other relation extraction research: the datasets used may be slightly different, so these figures can't really be compared

Conclusions
• Tune your feature space to your task (this goes for kernel approaches, too)
• Syntax is important for Relation Extraction; specifically, the path-enclosed tree between two entities contains useful information
• Convolution Tree Kernels allow elegant coverage of a very large, structured feature space

Questions?
• How can context be used more intelligently? Should we always ignore it?
• Can feature spaces be designed according to the task? Is there a methodology one can follow?
• What other information would be useful to tie into the syntax tree?
• It seems unnatural to include extra nodes for non-syntax-based features. Can we consider nodes in the tree as sets of attributes, instead of single class labels?
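Appendix: the path-enclosed tree (PT), the best-performing space above, keeps only the portion of the parse covering the words between (and including) the two entity mentions. A minimal sketch of carving a PT out of a full parse, assuming a tuple-based tree encoding and a word-index interface that are my own illustration, not the paper's implementation:

```python
# A parse tree is a nested tuple: (label, child1, child2, ...); leaves are
# word strings, e.g. ("PP", ("IN", "of"), ("NP", ("DT", "the"), ("NN", "district"))).

def leaves(tree):
    """Left-to-right word sequence under this node."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:
        out.extend(leaves(child))
    return out

def num_leaves(tree):
    return len(leaves(tree))

def path_enclosed(tree, lo, hi):
    """Sub-tree enclosing word positions lo..hi (the two entity mentions):
    descend to the lowest node covering the whole span, then prune away
    children lying entirely outside it."""
    def prune(node, start):
        if isinstance(node, str):
            return node
        kept, pos = [], start
        for child in node[1:]:
            n = num_leaves(child)
            if pos <= hi and pos + n - 1 >= lo:  # child overlaps [lo, hi]
                kept.append(prune(child, pos))
            pos += n
        return (node[0],) + tuple(kept)

    node, start = tree, 0
    while True:  # walk down while some child still covers the whole span
        pos, next_node = start, None
        for child in node[1:]:
            if isinstance(child, str):
                break
            n = num_leaves(child)
            if pos <= lo and hi <= pos + n - 1:
                next_node, start = child, pos
                break
            pos += n
        if next_node is None:
            break
        node = next_node
    return prune(node, start)
```

For "the merger of the district" with entity heads at word positions 1 ("merger") and 4 ("district"), pruning drops the initial "the" but keeps the connecting structure, mirroring how the PT trims the minimum complete tree down to the path between the entities.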
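Appendix: the O(|N₁|×|N₂|) figure on the kernel slide comes from the Collins–Duffy recursion: rather than enumerating sub-tree types i explicitly, C(n₁, n₂) = Σᵢ Iᵢ(n₁)·Iᵢ(n₂) is computed recursively over node pairs. A minimal sketch under the same tuple tree encoding; the decay parameter default (lam=1.0, i.e. a plain sub-tree count) is my assumption, not from the slides:

```python
# A parse tree is a nested tuple: (label, child1, ...); a preterminal's
# single child is a word string, e.g. ("NN", "district").

def production(node):
    """The grammar production expanded at this node: label plus child labels."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def nodes(tree):
    """All internal nodes (word strings are terminals, not nodes)."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def tree_kernel(t1, t2, lam=1.0):
    """Collins-Duffy convolution tree kernel: K(T1, T2) = sum over node
    pairs of C(n1, n2), the (lam-downweighted) number of common sub-trees
    rooted at n1 and n2."""
    memo = {}

    def C(n1, n2):
        key = (id(n1), id(n2))
        if key in memo:
            return memo[key]
        if production(n1) != production(n2):
            result = 0.0
        elif all(isinstance(c, str) for c in n1[1:]):  # matching preterminals
            result = lam
        else:  # same production: multiply matches over aligned children
            result = lam
            for c1, c2 in zip(n1[1:], n2[1:]):
                result *= 1.0 + C(c1, c2)
        memo[key] = result
        return result

    return sum(C(n1, n2) for n1 in nodes(t1) for n2 in nodes(t2))
```

Memoizing C over node pairs gives the quadratic bound cited from Collins and Duffy; with lam < 1 the contribution of large sub-trees is damped, which is the usual remedy for the kernel being dominated by near-identical trees.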