MUMIS
Multimedia Indexing and Searching
Franciska de Jong & Thijs Westerveld
University of Twente
OBJECTIVES
• Automatic indexing of video
• Data from different media sources (paper, radio, TV)
• Domain: soccer
• Digitise + ASR
• Extract significant events
• Merge annotations
• Store final annotations
• UI for searching
FACTS SHEET
Title: MUMIS: Multimedia Indexing and Searching Environment
Funding: EU Language Engineering Sector of TAP
Duration: 30 months, July 2000 – January 2003
Volume: 2.4 M Euro, 385 Person months
Languages: Dutch, English, German (Swedish)
Consortium
• University of Twente (NL)
• Sheffield University (UK)
• University of Nijmegen (NL)
• DFKI LT-Lab (DE)
• Max Planck Institute for Psycholinguistics (DE)
• Esteam (SE)
• VDA (NL)
Offline Processing
[Diagram: formal texts, free texts and speech signals in DE, NL and EN. Automatic Speech Recognition (ASR) turns the speech signals into speech transcripts. Information Extraction (IE) turns the formal texts, free texts and speech transcripts into annotations. Merging combines these annotations into merged annotations.]
DOMAIN MODELLING
DATA: text, video, audio
[Diagram: ontology with the top-level classes ENTITY, EVENT and RELATION]
• ENTITY: Location, Time, Person, Score, Object; further classes shown: Date, Player, Defender, Stopper, Official, Artifact
• EVENT: Goal (Player:…, Cause:…, Time:…, ...), Foul (Player:…, Consequence:…, Time:…, ..., Location:...)
• RELATION
• Used for: Multilingual Lexicons, Annotations, Multilingual IE, Multilingual Search
<?xml version=…>
<mumis-ontology>
<version>…</version>
...
<class>
<name>Defender</name>
<documentation>a ’Defender’ is a …</documentation>
<subclass-of>Player</subclass-of>
</class>
</mumis-ontology>
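As a rough illustration of how such an ontology file could be consumed, the sketch below reads `<class>` entries and follows their `<subclass-of>` links. It is not the project's own tooling: the sample document, the `Person` class entry and the function names `load_parents` and `ancestors` are assumptions made for this example; only the element names come from the fragment above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of the fragment above. Only the
# Defender entry is taken from the slide; the others are assumptions.
SAMPLE = """<mumis-ontology>
  <class><name>Person</name></class>
  <class><name>Player</name><subclass-of>Person</subclass-of></class>
  <class><name>Defender</name><subclass-of>Player</subclass-of></class>
</mumis-ontology>"""


def load_parents(xml_text):
    """Map each class name to its direct superclass (None for root classes)."""
    root = ET.fromstring(xml_text)
    parents = {}
    for cls in root.iter("class"):
        # findtext returns None when the child element is absent.
        parents[cls.findtext("name")] = cls.findtext("subclass-of")
    return parents


def ancestors(parents, name):
    """Return the superclasses of `name`, nearest first."""
    chain = []
    current = parents.get(name)
    # The second condition guards against accidental cycles in the data.
    while current is not None and current not in chain:
        chain.append(current)
        current = parents.get(current)
    return chain


parents = load_parents(SAMPLE)
print(ancestors(parents, "Defender"))  # ['Player', 'Person']
```

A lookup like this would let a search for "Player" events also match annotations that mention a "Defender".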
SPEECH RECOGNITION
• Large-vocabulary
• Speaker independent
• Phoneme-based
• Hidden Markov models
  • acoustic model
  • language model
• Emotionally coloured speech
• Domain language model
• Match specific vocabularies (player names)
INFORMATION EXTRACTION
• multilingual
• formal descriptions
• closed captions
• tickers
• newspapers
• ASR output (radio/TV comment)
IE DATA
Ticker
24 Scholes beats Jens Jeremies wonderfully, dragging the ball around and past the Bayern Munich man. He then finds Michael Owen on the right wing, but Owen's cross is poor.
TV report
Scholes. Past Jeremies. Owen.
Owen header pushed onto the post
Newspaper
Deisler brought the German supporters to their feet with a buccaneering run down the right. Moments later Dietmar Hamann managed the first shot on target but it was straight at David Seaman. Mehmet Scholl should have done better after getting goalside of Phil Neville inside the area from Jens Jeremies’ astute pass but he scuffed his shot.
Formal text
Schoten op doel (shots on goal)     4   4
Schoten naast doel (shots wide)     6   7
Overtredingen (fouls)              23  15
Gele kaarten (yellow cards)         1   1
Rode kaarten (red cards)            0   1
Hoekschoppen (corners)              3   5
Buitenspel (offside)                4   1
IE Techniques & resources
Example (ticker): 24 Scholes beats Jens Jeremies wonderfully, dragging the ball around and past the Bayern Munich man. He then finds Michael Owen on the right wing, but Owen's cross is poor.
• Tokenisation: 24 | Scholes | beats | Jens | Jeremies | wonderfully | , | dragging | …
• Lemmatisation: beats → beat, dragging → drag, wonderfully → wonderfull
• POS + morphology: 24 NUM, Scholes PROP, beats VERB (3p sing), wonderfully ADV, "," PUNCT
• Named Entities: 24 = time; Scholes, Jens Jeremies, Michael Owen = player
• Shallow parsing: chunks labelled NP and VP
• Co-reference resolution: He = Scholes
• Template filling: PASS, player1 = Scholes, player2 = Owen
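The steps listed above can be sketched end to end on the ticker example. This is a deliberately simplified illustration, not the MUMIS system: the surname gazetteer `PLAYERS`, the rule that a sentence-initial "He" refers to the first player of the previous sentence, and the trigger word "finds" for a PASS template are all assumptions chosen so that the example fits in a few lines.

```python
import re

# Hypothetical gazetteer of player surnames (assumption for illustration).
PLAYERS = {"Scholes", "Jeremies", "Owen"}

TEXT = ("24 Scholes beats Jens Jeremies wonderfully, dragging the ball around "
        "and past the Bayern Munich man. He then finds Michael Owen on the "
        "right wing, but Owen's cross is poor.")


def split_sentences(text):
    """Split after full stops; good enough for this short ticker line."""
    return [s for s in re.split(r"(?<=\.)\s+", text) if s]


def tokenise(sentence):
    """Tokenisation: words, numbers and single punctuation marks."""
    return re.findall(r"[A-Za-z]+|\d+|[^\w\s]", sentence)


def players_in(tokens):
    """Named entities: gazetteer surnames, in text order."""
    return [t for t in tokens if t in PLAYERS]


def extract_passes(text):
    """Template filling: PASS(time, player1, player2) from '... finds ...'."""
    templates, minute, previous_subject = [], None, None
    for sentence in split_sentences(text):
        tokens = tokenise(sentence)
        if tokens and tokens[0].isdigit():
            minute = int(tokens[0])  # a leading number is the match minute
        mentioned = players_in(tokens)
        # Co-reference: a sentence-initial "He" is resolved to the first
        # player of the previous sentence (a crude subject heuristic).
        if tokens and tokens[0] == "He":
            subject = previous_subject
        else:
            subject = mentioned[0] if mentioned else None
        if "finds" in tokens and subject is not None:
            receivers = players_in(tokens[tokens.index("finds") + 1:])
            if receivers:
                templates.append({"event": "PASS", "time": minute,
                                  "player1": subject, "player2": receivers[0]})
        previous_subject = subject
    return templates


print(extract_passes(TEXT))
# [{'event': 'PASS', 'time': 24, 'player1': 'Scholes', 'player2': 'Owen'}]
```

A real pipeline would replace each heuristic with the corresponding component (lemmatiser, POS tagger, shallow parser), but the data flow from tokens to a filled template is the same.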
MERGING
• Fuse annotations and recover from errors and differences:
• Multiple annotations of the same event (possibly with different attributes, e.g. time).
• Wrong event descriptions because of information extraction errors.
• Merging multiple partial annotations, e.g. by resolving unresolved references like “star player”.
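One simple way to picture the merging step is a grouping pass followed by a vote per attribute. The sketch below is an assumption about how this could be done, not the project's algorithm: the sample annotations are invented, and the two-minute `window` and the majority vote are illustrative choices.

```python
from collections import Counter

# Hypothetical annotations of one match from three sources
# (all values are made up for illustration).
ANNOTATIONS = [
    {"source": "ticker", "event": "Goal", "player": "Overmars", "time": 23, "cause": "Head"},
    {"source": "newspaper", "event": "Goal", "player": "Overmars", "time": 24},
    {"source": "asr", "event": "Goal", "player": "Overmars", "time": 23, "cause": "Head"},
    {"source": "ticker", "event": "Foul", "player": "Player X", "time": 60},
]


def merge(annotations, window=2):
    """Group annotations of the same event and fuse their attributes."""
    groups = []
    for ann in sorted(annotations, key=lambda a: a["time"]):
        for group in groups:
            first = group[0]
            # Same event type and player, reported within `window` minutes.
            if (first["event"] == ann["event"]
                    and first["player"] == ann["player"]
                    and abs(first["time"] - ann["time"]) <= window):
                group.append(ann)
                break
        else:
            groups.append([ann])

    merged = []
    for group in groups:
        record = {}
        keys = {k for ann in group for k in ann if k != "source"}
        for key in sorted(keys):
            # Majority vote; attributes missing in one source are filled
            # in from the sources that do provide them.
            votes = Counter(ann[key] for ann in group if key in ann)
            record[key] = votes.most_common(1)[0][0]
        record["sources"] = sorted(ann["source"] for ann in group)
        merged.append(record)
    return merged


merged = merge(ANNOTATIONS)
print(merged[0])
```

Here the three goal reports collapse into one event at minute 23 with cause "Head", even though the newspaper annotation disagrees on the time and lacks the cause.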
ON-LINE TASKS
Multilingual Search and Display
• Search for interesting events with formal questions (user interface in many languages)
• Indicate hits by thumbnails & let user select scene
• Play scene via the Internet & allow scrolling
Example question: Give me all goals by Overmars scored with his head in the first half.
Formal query: Event=Goal; Scorer=Overmars; Cause=Head; Time<=45
Hits (thumbnails): PSV - Ajax 1995, Ned - Eng 1998, Ned - Ger 1998
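The formal query shown above is a list of `Field<op>Value` conditions separated by semicolons. A minimal sketch of how such a query could be parsed and matched against stored annotations follows; the archive records, the match names and the functions `parse_query` and `matches` are hypothetical, and the set of supported operators is an assumption.

```python
import operator
import re

OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
       ">": operator.gt, "=": operator.eq}

# Hypothetical archive of stored annotations (values are made up).
ARCHIVE = [
    {"Match": "Match A", "Event": "Goal", "Scorer": "Overmars", "Cause": "Head", "Time": 30},
    {"Match": "Match B", "Event": "Goal", "Scorer": "Overmars", "Cause": "Foot", "Time": 12},
    {"Match": "Match C", "Event": "Goal", "Scorer": "Overmars", "Cause": "Head", "Time": 70},
]


def parse_query(query):
    """Parse 'Field=Value; Field<=Number' into (field, op, value) triples."""
    conditions = []
    for part in filter(None, (p.strip() for p in query.split(";"))):
        match = re.fullmatch(r"(\w+)\s*(<=|>=|<|>|=)\s*(.+)", part)
        if match is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        field, op, value = match.groups()
        value = value.strip()
        # Purely numeric values (such as minutes) are compared as integers.
        conditions.append((field, op, int(value) if value.isdigit() else value))
    return conditions


def matches(annotation, conditions):
    """True if the annotation satisfies every condition of the query."""
    for field, op, value in conditions:
        actual = annotation.get(field)
        # A missing field or a type mismatch counts as "no match".
        if type(actual) is not type(value) or not OPS[op](actual, value):
            return False
    return True


conditions = parse_query("Event=Goal; Scorer=Overmars; Cause=Head; Time<=45")
hits = [a["Match"] for a in ARCHIVE if matches(a, conditions)]
print(hits)  # ['Match A']
```

Each hit would then be shown as a thumbnail from which the user selects the scene to play.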
SUMMARY
• Multimedia and multilingual
• ASR on emotionally coloured speech
• IE on ASR output
• Merging different annotations
• Search archives and play video online
http://parlevink.cs.utwente.nl/projects/mumis.ht