N-document Merging process

Overview of the merger prototype
Overview
• Backgrounds: The MUMIS project
• Cross document annotation merging
  • Alignment of parallel fragments
  • Unification of aligned fragments
  • Clean up unified fragments
• Reasoning
• Evaluation
• Future work & Conclusions
The MUMIS project
Semantic access to a multimedia database.
Subject: Soccer
Corpus: Video recordings of matches, formal
texts, ‘ticker’ texts.
Approach:
• Extract knowledge from textual sources
• Align this (time based) knowledge with video
• Do retrieval on annotation, returning
corresponding video fragments to user
Main subject of this presentation: Merging the
annotations resulting from separate texts
into one cross-document annotation.
Merging
Intention of merging:
- start with various texts
- annotate each text individually
- combine annotations
Example match:
Netherlands – Yugoslavia
(European Championship 2000)
Two types of text in merger:
• Formal texts
• Ticker texts
Example formal text
Netherlands-Yugoslavia
Final score: 6-1
Referee: Garcia Aranda
Goals:
24' Patrick Kluivert
90' Marc Overmars
91' Savo Milosevic
Substitutions:
53' out : Nisa Saveljic
in : Jovan Stankovic
58' out : Patrick Kluivert
in : Roy Makaay
Yellow Cards:
Paul Bosvelt
Example ticker text (BBC)
19 mins: Bergkamp scuffs his left-foot shot but still forces Kralj
into a diving save low down to his left.
20 mins: Edgar Davids wastes the best chance of the game so
far when he blazes over with just the goalkeeper to beat
after being put through by Bergkamp.
24 mins: Kluivert puts Holland in front after latching onto a
wonderful chip from Bergkamp and then planting a right-foot
shot past Kralj from eight yards.
25 mins: Boudewijn Zenden comes close to doubling Holland's
lead when he fires in low, right-foot shot which Kralj just
about hangs onto.
Example of parallel fragments
BBC - 15:
Van der Sar pulls of great save to block Mijatovic's shot after Savo
Milosevic has cut through the Dutch defence like a knife.
Guardian - 17:
Mijatovic, played in with a quick square ball from Milosevic, finds himself
one-on-one with van der Sar 10 yards out. He picks his spot, but
unfortunately for Mijatovic, it's the spot occupied by van der Sar. A
great save and Yugoslavia should be one-nil up.
Kickers 15:
Milosevic auf Mijatovic, doch der Stuermer vom AC Florenz scheitert aus
12 Metern freistehend an van der Sar.
WEBTEC 15:
Milosevic filtreert door de Nederlandse defensie door één beweging en
legt af voor Mijatovic. Deze laatste trapt op van der Sar.
Merging process:
overview
• 2-document alignment
• N-document alignment
• Unification of events from separate sources
• Special situations
Merging process:
2-document alignment
Step 1 of the merging process: merge
annotations of 2 texts
[Figure: fragments of Source A and Source B]
The strongest binding is selected, ruling out
certain other bindings.
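The selection step can be illustrated with a small greedy sketch. It assumes that every candidate binding between a fragment of Source A and a fragment of Source B carries a numeric strength, and that choosing a binding rules out all other bindings that share one of its fragments. The `Binding` type, the fragment ids, and the scores are hypothetical; the slides do not say how strength is computed or exactly which bindings are ruled out.

```python
from typing import NamedTuple


class Binding(NamedTuple):
    a: str            # fragment id in Source A (hypothetical)
    b: str            # fragment id in Source B (hypothetical)
    strength: float   # assumed similarity score, higher is stronger


def select_bindings(candidates: list[Binding]) -> list[Binding]:
    """Pick the strongest binding first, then the next strongest that
    does not reuse a fragment already bound."""
    chosen: list[Binding] = []
    used_a: set[str] = set()
    used_b: set[str] = set()
    for binding in sorted(candidates, key=lambda c: c.strength, reverse=True):
        if binding.a in used_a or binding.b in used_b:
            continue  # ruled out by a stronger binding
        chosen.append(binding)
        used_a.add(binding.a)
        used_b.add(binding.b)
    return chosen


# Invented scores for four candidate bindings.
candidates = [
    Binding("A15", "B17", 0.9),
    Binding("A15", "B19", 0.4),
    Binding("A19", "B17", 0.3),
    Binding("A19", "B19", 0.6),
]
selected = select_bindings(candidates)
```

Here the binding A15/B17 is selected first, which rules out A15/B19 and A19/B17; A19/B19 remains and is selected next.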
Merging process:
N-document
Given the 2-document alignments for each pair
of sources, find the N-document alignment
in which all fragments describing the same scene
in the separate sources are aligned.
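One simple way to picture this step is to treat fragments as nodes and pairwise alignments as edges, and to group fragments that are connected. The sketch below uses a union-find structure for that grouping. This is a simplification chosen for illustration, not necessarily what the merger does: it ignores binding strength and does not handle incomplete or conflicting graphs. The (BBC, 24) and (Guardian, 25) fragments are invented.

```python
from collections import defaultdict

Fragment = tuple[str, int]  # (source name, minute marker)


def n_document_alignment(
    pairs: list[tuple[Fragment, Fragment]],
) -> list[list[Fragment]]:
    """Group fragments linked by pairwise alignments into scenes."""
    parent: dict[Fragment, Fragment] = {}

    def find(x: Fragment) -> Fragment:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups: dict[Fragment, set[Fragment]] = defaultdict(set)
    for fragment in list(parent):
        groups[find(fragment)].add(fragment)
    return sorted(sorted(group) for group in groups.values())


pairwise = [
    (("BBC", 15), ("Guardian", 17)),
    (("Guardian", 17), ("Kickers", 15)),
    (("Kickers", 15), ("WEBTEC", 15)),
    (("BBC", 24), ("Guardian", 25)),  # invented second scene
]
scenes = n_document_alignment(pairwise)
```

With these pairs, the four fragments about the Mijatovic chance end up in one scene even though BBC and WEBTEC were never aligned directly.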
Merging process:
Unification of scenes
Merging and reasoning:
types of rules
• Within events or scenes:
  Player1 and Player2 will not be the same
  person, a player performing a save will not
  score a goal in the same scene, etc.
• Role of teams and events:
  offensive vs. defensive
• Combinations of events that probably have
  the same player: ShotOnGoal+Goal,
  Penalty+HitThePost
• Terminology of authors may vary:
  Cross/Pass, Save/Clearance
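Two of these rule types lend themselves to a short sketch: mapping varying terminology onto one event type, and copying a player between events that probably share one. The `Event` class, the direction of the synonym mapping (Cross to Pass, Clearance to Save), and the example scene are assumptions made for illustration; the actual rule format of the merger is not shown in the slides.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed direction: map the variant term onto one canonical event type.
SYNONYMS = {"Cross": "Pass", "Clearance": "Save"}
# Event combinations that probably have the same player.
SAME_PLAYER = [("ShotOnGoal", "Goal"), ("Penalty", "HitThePost")]


@dataclass
class Event:
    kind: str
    player: Optional[str] = None


def normalize(event: Event) -> Event:
    """Replace an author's term with its canonical event type."""
    return Event(SYNONYMS.get(event.kind, event.kind), event.player)


def fill_same_player(scene: list[Event]) -> list[Event]:
    """If one event of a same-player pair lacks a player, copy it over."""
    by_kind = {event.kind: event for event in scene}
    for first, second in SAME_PLAYER:
        a, b = by_kind.get(first), by_kind.get(second)
        if a is None or b is None:
            continue
        if a.player is None:
            a.player = b.player
        elif b.player is None:
            b.player = a.player
    return scene


# Hypothetical extraction result for the 24th-minute scene.
raw = [Event("Cross", "Bergkamp"), Event("ShotOnGoal"), Event("Goal", "Kluivert")]
scene = fill_same_player([normalize(e) for e in raw])
```

After both steps the scene reads Pass (Bergkamp), ShotOnGoal (Kluivert), Goal (Kluivert).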
Merging and reasoning:
example rules
Reasoning:
mistakes in IE
Sometimes the information extraction
component makes mistakes. Example rules
have been applied to solve some of these.
Fix: The goal attributed to Kralj (the Yugoslavian
keeper) is removed
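A fix of this kind could look like the sketch below. It assumes a rule that discards any Goal event credited to a known goalkeeper; the slides do not state which rule actually removed the Kralj goal, so both the rule and the `GOALKEEPERS` set are illustrative, and the example scene is a hypothetical extraction result.

```python
# Hypothetical knowledge: the goalkeepers of the two teams in this match.
GOALKEEPERS = {"Kralj", "Van der Sar"}

Event = tuple[str, str]  # (event type, player)


def remove_keeper_goals(scene: list[Event]) -> list[Event]:
    """Drop Goal events that the IE component credited to a goalkeeper."""
    return [
        (kind, player)
        for kind, player in scene
        if not (kind == "Goal" and player in GOALKEEPERS)
    ]


# Assumed faulty extraction: the shot went "past Kralj", and the
# keeper was wrongly extracted as a second goal scorer.
scene = [("Pass", "Bergkamp"), ("Goal", "Kluivert"), ("Goal", "Kralj")]
cleaned = remove_keeper_goals(scene)
```

The cleaned scene keeps the pass by Bergkamp and the goal by Kluivert and no longer contains a goal by Kralj.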
Evaluation:
What do we want to know?
Quality of the merger in itself
The advantages and disadvantages of merging
Evaluation:
Quality of the merger
• Quality of alignments
• Quality of unification
• The effect of the quality of the original
  information extraction on both
Evaluation:
Approach
• Create gold standard annotations for single
  sources
• Create gold standard merged annotation of
  all sources
• Run merger in different conditions
• Compare everything with everything
Evaluation:
Results
Alignments based on machine IE

             Version 1   Version 2   Version 3
Manual          210         210         210
Automatic       104         187         189
Overlap          82         172         172
Precision (%)   78.8        92.0        91.0
Recall (%)      39.0        81.9        81.9
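The precision and recall figures for machine IE can be reproduced from the counts, assuming the usual definitions: precision is the overlap divided by the number of automatic alignments, and recall is the overlap divided by the number of manual (gold standard) alignments. The function name is illustrative.

```python
def precision_recall(manual: int, automatic: int, overlap: int) -> tuple[float, float]:
    """Return (precision, recall) in percent, rounded to one decimal."""
    precision = 100.0 * overlap / automatic
    recall = 100.0 * overlap / manual
    return round(precision, 1), round(recall, 1)


# Counts for the three merger versions run on machine IE.
results = {
    "Version 1": precision_recall(manual=210, automatic=104, overlap=82),
    "Version 2": precision_recall(manual=210, automatic=187, overlap=172),
    "Version 3": precision_recall(manual=210, automatic=189, overlap=172),
}

for version, (precision, recall) in results.items():
    print(f"{version}: precision {precision}%, recall {recall}%")
```

Version 1 finds only 104 alignments, which explains its low recall; Versions 2 and 3 find nearly twice as many while also being more precise.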
Evaluation:
Results
Alignments based on manual IE

             Version 1
Manual          210
Automatic       188
Overlap         174
Precision (%)   92.6
Recall (%)      82.9
Evaluation:
Conclusions
• Quality of alignments is good: precision above 90%
  and recall above 80% in the best conditions.
• Better IE improves alignments.
• Low-quality IE does not degrade
  alignments much.
MORE TO COME….
----- Extra Sheets -----
Extra example – 15th min.
Extra example – graph
Extra example – unification
Extra example – the source
BBC - 15:
Van der Sar pulls of …
Milosevic has cut …
Pass        Milosevic
ShotOnGoal  Mijatovic
Save        Van der Sar
Guardian - 17:
Mijatovic, played in with a quick square ball from Milosevic, finds himself
one-on-one with van der Sar 10 yards out. He picks his spot, but
unfortunately for Mijatovic, it's the spot occupied by van der Sar. A
great save and Yugoslavia should be one-nil up.
Kickers 15:
Milosevic auf Mijatovic, doch der Stuermer vom AC Florenz scheitert aus
12 Metern freistehend an van der Sar.
WEBTEC 15:
Milosevic filtreert door de Nederlandse defensie door één beweging en
legt af voor Mijatovic. Deze laatste trapt op van der Sar.
Reasoning:
incomplete graphs
Reasoning:
own goal
Reordering
Observation from corpus:
• Scenes in correct order
• Events within scenes often in wrong order
Manual annotation of several matches
Pass, Shot-on-goal, Goal
Pass, Shot-on-goal, Save
Shot-on-goal, Hitting-the-post
Foul, Free-kick, Shot-on-goal, Corner
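The sequences above could be used to put the events of a scene back into a plausible order. The sketch below is one possible approach, not the merger's implementation: it assumes each listed sequence is a canonical order, picks the first one that contains every event type in the scene, and sorts the scene accordingly. Scenes that match no sequence are left unchanged.

```python
CANONICAL_ORDERS = [
    ["Pass", "Shot-on-goal", "Goal"],
    ["Pass", "Shot-on-goal", "Save"],
    ["Shot-on-goal", "Hitting-the-post"],
    ["Foul", "Free-kick", "Shot-on-goal", "Corner"],
]


def reorder(scene: list[str]) -> list[str]:
    """Sort event types by the first canonical order that covers them all."""
    for order in CANONICAL_ORDERS:
        if all(kind in order for kind in scene):
            return sorted(scene, key=order.index)
    return list(scene)  # no known pattern: keep the text order


# The BBC fragment mentions the save first, then the shot, then the pass.
text_order = ["Save", "Shot-on-goal", "Pass"]
fixed = reorder(text_order)
```

For the BBC fragment this yields Pass, Shot-on-goal, Save.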
Not fully implemented yet in the merger.