InToEventS: An Interactive Toolkit for Discovering and Building Event Schemas

Germán Ferrero¹, Audi Primadhanty², Ariadna Quattoni³
¹ Universidad Nacional de Córdoba, Córdoba, Argentina
² Universitat Politècnica de Catalunya, Barcelona, Spain
³ Xerox Research Centre Europe, Grenoble, France

Summary
• Goal: event schema induction
• Main challenge: no supervision
• Traditional approaches: require document-level supervision
• Problems:
  • No annotated data
  • The user does not know in advance which event schemas he/she is interested in
• Contributions:
  • An interactive event schema induction system that non-experts can use to explore a corpus and easily build event schemas and their corresponding extractors

Event Schema
Event Triggers
Set of atomic predicates associated with an event. Each predicate has:
• Literal (e.g. explode)
• Real-valued word vector representation
• Distance threshold (defines a ball around the literal in the word vector space)
Event Slots
Set of participating entities involved in the event. Each slot has:
• Entity type (e.g. person, organization or object)
• A set of predicates, each with:
  • Literal
  • Real-valued word vector representation
  • Distance threshold
  • Syntactic relation

First Step: Event Induction
Idea
Observations:
• Literals that tend to appear near each other in a document usually play a role in the same event description
• Literals with similar meanings usually describe the same atomic predicates
System
• Extract predicate literals: all unique verbs, and all nouns with a corresponding WordNet synset labeled noun.event or noun.act
• Compute distances between predicates:
  • Distance in the corpus (how close the literals tend to appear in documents)
  • Distance in the word embedding vector space
• Agglomerative clustering based on corpus distance
User
• Explore the resulting dendrogram:
  • Choose a distance threshold
  • Choose an initial partition of event triggers
  • Merge or split the initial clusters
• Select and label clusters
• Expand each event trigger set by adding predicate literals that are close in the word vector space

Second Step: Role Induction
Example: the victim of a bombing is "someone who dies, is attacked or injured", that is, "PERSON: subject of die, object of attack, object of injure".
System
• For each predicate in the event trigger set, extract from the corpus all unique tuples <predicate, syntactic relation, entity type>
• Compute distances between tuples based on the average word embeddings of the arguments observed in the corpus
• Agglomerative clustering
User
• Explore different cluster settings and store those that represent the slots he/she is interested in
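An event trigger predicate is described above as a literal, its word vector, and a distance threshold that defines a ball around the literal. The following is a minimal sketch of that membership test, assuming cosine distance (the poster does not name the metric) and a toy in-memory embedding table. `AtomicPredicate` and `is_trigger` are hypothetical names used here for illustration, not part of InToEventS.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, List

Embeddings = Dict[str, List[float]]


def cosine_distance(u: List[float], v: List[float]) -> float:
    """1 - cosine similarity (assumed metric; vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


@dataclass
class AtomicPredicate:
    """Hypothetical container for one event trigger predicate."""
    literal: str          # e.g. "explode"
    vector: List[float]   # real-valued word vector of the literal
    threshold: float      # radius of the ball around the literal

    def matches(self, word: str, embeddings: Embeddings) -> bool:
        # A word fires the predicate if its vector falls inside the ball.
        if word not in embeddings:
            return False
        return cosine_distance(self.vector, embeddings[word]) <= self.threshold


def is_trigger(word: str, predicates: Iterable[AtomicPredicate],
               embeddings: Embeddings) -> bool:
    """True if the word falls inside the ball of any predicate in the set."""
    return any(p.matches(word, embeddings) for p in predicates)
```

With toy vectors such as `explode = [1.0, 0.0]`, `detonate = [0.9, 0.1]` and `eat = [0.0, 1.0]`, a ball of radius 0.1 around `explode` contains `detonate` but not `eat`.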
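An event slot pairs an entity type with predicates that also carry a syntactic relation, as in the bombing victim "PERSON: subject of die, object of attack, object of injure". The sketch below checks whether one extracted (predicate, relation, entity type) observation fills such a slot. It assumes cosine distance, looks up each literal's vector in a toy embedding table instead of storing it, and uses hypothetical relation labels `subj`/`obj`. `SlotPredicate` and `EventSlot` are illustrative names only.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

Embeddings = Dict[str, List[float]]


def cosine_distance(u: List[float], v: List[float]) -> float:
    """1 - cosine similarity (assumed metric; vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


@dataclass
class SlotPredicate:
    literal: str      # e.g. "attack"
    relation: str     # hypothetical label, e.g. "obj"
    threshold: float  # ball radius around the literal's vector


@dataclass
class EventSlot:
    name: str
    entity_type: str
    predicates: List[SlotPredicate]

    def fills(self, predicate: str, relation: str, entity_type: str,
              embeddings: Embeddings) -> bool:
        # The entity type must match exactly.
        if entity_type != self.entity_type or predicate not in embeddings:
            return False
        # Some slot predicate must share the relation and contain the word.
        for p in self.predicates:
            if p.relation != relation or p.literal not in embeddings:
                continue
            if cosine_distance(embeddings[p.literal], embeddings[predicate]) <= p.threshold:
                return True
        return False
```

Literals without a vector (here `injure` in a small table) are simply skipped, so a slot still works with partial vocabulary coverage.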
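The poster does not spell out how the distance in the corpus is computed. One plausible reading of the observation that related literals appear near each other is sketched below: the average, over documents in which both literals occur, of the smallest token gap between them. This definition and the names `min_gap` and `corpus_distance` are assumptions for illustration; documents are taken to be already tokenized (and ideally lemmatized) lists of strings.

```python
import math
from typing import List, Sequence


def min_gap(doc: Sequence[str], a: str, b: str) -> float:
    """Smallest token distance between any occurrence of a and any of b."""
    pos_a = [i for i, tok in enumerate(doc) if tok == a]
    pos_b = [i for i, tok in enumerate(doc) if tok == b]
    if not pos_a or not pos_b:
        return math.inf  # the pair does not co-occur in this document
    return min(abs(i - j) for i in pos_a for j in pos_b)


def corpus_distance(docs: List[Sequence[str]], a: str, b: str) -> float:
    """Average minimum gap over documents where both literals occur (assumed)."""
    gaps = [g for g in (min_gap(d, a, b) for d in docs) if g != math.inf]
    if not gaps:
        return math.inf  # never co-occur: maximally distant
    return sum(gaps) / len(gaps)
```

For example, with the documents `["bomb", "explode", "kill"]` and `["explode", "x", "x", "kill"]`, the gaps between `explode` and `kill` are 1 and 3, giving a corpus distance of 2.0.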
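Agglomerative clustering with a user-chosen distance threshold can be sketched in plain Python. Stopping once the closest pair of clusters is farther apart than the threshold corresponds to cutting the dendrogram at that height, which mirrors the "choose a distance threshold" step above. Average linkage is an assumption (the poster does not state the linkage), and `agglomerative` is a hypothetical name. The distance is passed in as a function, so a corpus distance or an embedding distance could be plugged in.

```python
from itertools import combinations
from typing import Callable, List


def agglomerative(items: List[str],
                  dist: Callable[[str, str], float],
                  threshold: float) -> List[List[str]]:
    """Average-linkage agglomerative clustering, stopped at `threshold`."""
    clusters: List[List[str]] = [[item] for item in items]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average pairwise distance.
        best_d, best_i, best_j = float("inf"), -1, -1
        for i, j in combinations(range(len(clusters)), 2):
            pair = [dist(a, b) for a in clusters[i] for b in clusters[j]]
            d = sum(pair) / len(pair)
            if d < best_d:
                best_d, best_i, best_j = d, i, j
        # Cut the dendrogram: no remaining pair is close enough to merge.
        if best_i < 0 or best_d > threshold:
            break
        clusters[best_i].extend(clusters[best_j])
        del clusters[best_j]  # best_j > best_i, so best_i's index is unaffected
    return clusters
```

This quadratic-per-step loop is meant for readability; for a realistic vocabulary, a library implementation of hierarchical clustering that also returns the full dendrogram for interactive exploration would be the more practical choice.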
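Expanding a selected event trigger set by adding predicate literals that are close in the word vector space can be sketched as a radius search around each selected trigger. Cosine distance, a single shared radius, and comparing candidates only against the original triggers (no transitive growth) are assumptions; `expand_triggers` is a hypothetical helper.

```python
import math
from typing import Dict, List, Set

Embeddings = Dict[str, List[float]]


def cosine_distance(u: List[float], v: List[float]) -> float:
    """1 - cosine similarity (assumed metric; vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def expand_triggers(triggers: Set[str], embeddings: Embeddings,
                    radius: float) -> Set[str]:
    """Add every vocabulary word within `radius` of some original trigger."""
    seeds = [t for t in triggers if t in embeddings]
    expanded = set(triggers)
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        # Compare only against the original triggers, not newly added words.
        if any(cosine_distance(embeddings[t], vec) <= radius for t in seeds):
            expanded.add(word)
    return expanded
```

Restricting the search to the original triggers keeps the expansion predictable: raising the radius grows the set, but one newly added word cannot pull in further neighbors of its own.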
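In the role induction step, each <predicate, syntactic relation, entity type> tuple is compared with the others through the average embedding of the arguments observed with it in the corpus. The following is a minimal sketch of that tuple distance, assuming cosine distance, a toy embedding table, and hypothetical relation labels `subj`/`obj`. The names `mean_vector` and `tuple_distance` are illustrative. The resulting pairwise distances would then feed the agglomerative clustering step that the poster describes.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Embeddings = Dict[str, List[float]]
SlotTuple = Tuple[str, str, str]  # (predicate, syntactic relation, entity type)


def cosine_distance(u: List[float], v: List[float]) -> float:
    """1 - cosine similarity (assumed metric; vectors must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms


def mean_vector(words: Sequence[str], embeddings: Embeddings) -> Optional[List[float]]:
    """Average embedding of the argument words that have a vector."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def tuple_distance(t1: SlotTuple, t2: SlotTuple,
                   arguments: Dict[SlotTuple, List[str]],
                   embeddings: Embeddings) -> float:
    """Distance between two tuples via their averaged argument embeddings."""
    u = mean_vector(arguments[t1], embeddings)
    v = mean_vector(arguments[t2], embeddings)
    if u is None or v is None:
        return math.inf  # no usable argument vectors: treat as unrelated
    return cosine_distance(u, v)
```

With toy data, the arguments of "subject of die" and "object of attack" (soldiers, civilians, policemen) average to nearby vectors, while "object of elect" (mayors) lands far away, which is what lets both tuples end up in one victim-like cluster.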