
The Automatic Generation of Formal
Annotations in a MultiMedia Indexing
and Searching Environment
Thierry Declerck, Peter Wittenburg and
Hamish Cunningham
DFKI GmbH, Max-Planck-Institut für Psycholinguistik and
University of Sheffield
ACL/EACL2001 Workshop on Human Language
Technology and Knowledge Management
The MUMIS Consortium
• CTIT | University of Twente, Enschede, NL | NLP/IE
• TSI | University of Nijmegen, Nijmegen, NL | ASR
• DFKI | Saarbrücken, D | NLP/IE
• MPI | Nijmegen, NL | Online SW
• DCS | University of Sheffield, UK | NLP/IE
• ESTEAM | Gothenburg, SE (location Athens, GR) | Translation Software
• VDA | Hilversum, NL | Dissemination
Objectives
• Technology development to automatically index (with
formal annotations) lengthy multimedia recordings
(off-line process)
Find and annotate relevant events, together with the involved
entities and relations
• Technology development to exploit indexed
multimedia archives (on-line process)
Search for interesting scenes and play them via Internet
Test Domain: Soccer Games / UEFA Tournament 2000
Off-line Task
• Automatic Speech Recognition (Radio/TV
Broadcasts)
Automatically transforms speech signals into text (for 3
languages: Dutch, English and German)
• Natural Language Processing (Information
Extraction)
Analyse all available textual documents (newspapers,
speech transcripts, tickers, formal texts ...), identify and
extract interesting entities, relations and events
• Merging all the annotations produced so far
• Create a database with formal annotations
The Generation of
Formal Annotations
• Metadata (type of game, teams, date, final score,
players etc.), which can be used, among other things, for
classifying and filtering videos in the MM digital archive
• Events (particular actions with time codes,
involved entities and related events), as they can be
extracted from the video sequences
• All Formal Annotations available in XML Standard
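As a rough illustration of what an XML-encoded event annotation could look like, the sketch below serializes one goal event with Python's standard library. The element and attribute names (`event`, `type`, `time`, `player`, `team`, `score`) are assumptions made for this example, not the MUMIS schema.

```python
import xml.etree.ElementTree as ET


def event_to_xml(event_type, time_min, **slots):
    """Build an <event> element; all element and attribute names are illustrative."""
    elem = ET.Element("event", {"type": event_type, "time": str(time_min)})
    for name, value in slots.items():
        # Each filled slot becomes a child element holding its value as text.
        child = ET.SubElement(elem, name)
        child.text = str(value)
    return elem


goal = event_to_xml("goal", 18, player="Basler", team="Germany", score="1:0")
xml_text = ET.tostring(goal, encoding="unicode")
print(xml_text)
```

Time codes are kept as an attribute here so that events can be filtered by time without parsing the slot elements; a real schema might choose differently.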
The Event Table
Related to domain ontology and multilingual
terminology. Guiding the generation of formal
annotations
Event | ID | Time | Subcat/Modification | Metadata
Final whistle | # | 90>t>120 | Subj=referee, score etc… | Final score
Goal kick | # | 0>t>120 | Subj=pl, loc=loc, cons=cons,.. |
Dribbling | # | 0>t>120 | Subj=pl, loc=loc, … |
Substitution | # | 0>t>120 | Subj=pl, I.obj=pl, cause=c, … | Team (adding pl)
Red Card | # | 0>t>120 | Subj=ref, I.obj=pl, cause=c, … | Team (red at t)
Goal | # | 0>t>=pen | Subj=pl, I.obj=team, score=s, … | Order of goal
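One way to picture an event table row is as a small template with an allowed time window and a set of slot names. The sketch below is an assumption-laden illustration: it reads the Time column as a range in minutes (for example, the final whistle between minute 90 and 120), and the class name `EventTemplate` and the lower-case slot names are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventTemplate:
    """One event table row: allowed time window (minutes) and known slot names."""
    name: str
    min_time: int
    max_time: int
    slots: tuple

    def validate(self, time_min, filled):
        """Return a list of problems found in a candidate annotation."""
        problems = []
        if not (self.min_time <= time_min <= self.max_time):
            problems.append(
                f"time {time_min} outside {self.min_time}-{self.max_time}")
        for slot in filled:
            if slot not in self.slots:
                problems.append(f"unknown slot {slot!r}")
        return problems


# Hypothetical encodings of two rows of the table.
RED_CARD = EventTemplate("Red Card", 0, 120, ("subj", "iobj", "cause"))
FINAL_WHISTLE = EventTemplate("Final whistle", 90, 120, ("subj", "score"))

print(RED_CARD.validate(63, {"subj": "referee", "iobj": "player"}))
print(FINAL_WHISTLE.validate(45, {"subj": "referee"}))
```

A check of this kind could guide annotation generation by rejecting candidates that do not fit any row of the table.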
Off-line Task
[Figure: off-line processing flow]
Sources (3 languages each):
• Newspaper texts
• Audio commenting (TV, Radio)
• Close caption text
multilingual IE => event tables
Merging of Annotations
Events indexed in video recording
Event tables shown in the figure:
• Event = goal; Player = Basler; Dist. = 25 m; Time = 18; Score = 1:0
• Event = goal; Type = Freekick; Player = Basler; Dist. = 25 m; Time = 17; Score: leading
• Event = goal; Type = Freekick; Player = Basler; Team = Germany; Time = 18; Score = 1:0; Final score = 1:0; Distance = 25 m
• Event = goal; Player = Basler; Team = Germany; Time = 18; Score = 1:0; Final score = 1:0
Timeline labels of the indexed video: Freekick, Goal, Pass, Defense; 17 min, 18 min, 24 min, 28 min; 1:0; Foul, Freekick; Neville, Basler, Matthäus, Campbell, Scholl; Dribbling; 25 m, 25 m, 60 m
The Role of IE in
MUMIS
• Information Extraction (IE) is the task of identifying,
collecting and normalizing relevant information for
a specific application or user.
• The relevant information is typically represented in
the form of predefined “templates”, which are filled by
means of Natural Language (NL) analysis (Template
= Event Table in MUMIS)
• IE combines pattern matching mechanisms,
(shallow) NLP and domain knowledge (terminology
and ontology).
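The combination of pattern matching and domain terminology can be sketched in a few lines. Everything specific here is an assumption for illustration: the ticker line format ("minute. text"), the sample sentence, and the tiny three-language term list are invented and do not come from the MUMIS resources.

```python
import re

# Tiny multilingual terminology: surface term -> normalized event type (illustrative).
TERMINOLOGY = {
    "free kick": "Freekick",
    "freistoß": "Freekick",
    "vrije trap": "Freekick",
    "goal": "Goal",
    "tor": "Goal",
    "doelpunt": "Goal",
}

# Hypothetical ticker format: "<minute>. <text>"
TICKER_LINE = re.compile(r"^(?P<minute>\d{1,3})\.\s+(?P<text>.+)$")
SCORE = re.compile(r"\b\d+:\d+\b")


def extract(line, known_players):
    """Fill a flat template from one ticker line; return None if it does not match."""
    match = TICKER_LINE.match(line.strip())
    if match is None:
        return None
    text = match.group("text")
    lowered = text.lower()
    template = {"time": int(match.group("minute"))}
    # Terminology lookup with word boundaries, so "tor" does not fire inside "factor".
    events = sorted({
        event_type for term, event_type in TERMINOLOGY.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    })
    if events:
        template["events"] = events
    # Domain knowledge: a list of player names known for this game.
    players = [name for name in known_players if name.lower() in lowered]
    if players:
        template["players"] = players
    score = SCORE.search(text)
    if score:
        template["score"] = score.group(0)
    return template


result = extract("18. Basler scores a goal from a free kick, 1:0",
                 ["Basler", "Scholl"])
print(result)
```

A real IE system would add shallow linguistic analysis on top of such patterns; this sketch only shows how terminology normalizes different surface forms to one event type.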
Extension of our IE
system in MUMIS
• Multilingual and multisource IE. Incremental
information building
• Cross-document co-reference resolution
• Combine Metadata and event extraction => better
organisation and dynamic updating of information
(KM)
• Multiple presentation of results: Template, Event
table and Hyperlinks (Named Entities, rel. to KM)
Example of Processing
Formal Texts
• Formal Text
• The Formal Text annotated with domain-specific information
Example of Processing
Semi-Formal Texts
• Semi-Formal Text
• The Semi-Formal Text annotated with
domain-specific information
Merging Component
• Acting on the generated formal annotations
(Metadata and Events), but also interleaving
with the generation process of those
• Checking consistency, eliminating
redundancy (Template Merging), in
accordance with domain ontology
• Completing the information with domain
knowledge, inference machine
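Template merging can be illustrated with a minimal sketch. The assumptions are stated plainly: two partial templates are treated as the same event when event type and player agree and the times differ by at most one minute, and conflicting slot values are resolved by majority vote. Both rules are simplifications chosen for this example, not the MUMIS merging strategy. The input data is loosely based on the Basler goal example from the slides.

```python
from collections import Counter


def same_event(a, b, time_tolerance=1):
    """Heuristic match: same event type and player, times within the tolerance."""
    return (a["event"] == b["event"]
            and a["player"] == b["player"]
            and abs(a["time"] - b["time"]) <= time_tolerance)


def merge_templates(templates, time_tolerance=1):
    """Group templates describing the same event, then merge each group slot by slot."""
    groups = []
    for template in templates:
        for group in groups:
            if same_event(group[0], template, time_tolerance):
                group.append(template)
                break
        else:
            groups.append([template])

    merged = []
    for group in groups:
        result = {}
        for template in group:
            for slot in template:
                if slot in result:
                    continue
                # Majority vote over all sources that filled this slot.
                votes = Counter(t[slot] for t in group if slot in t)
                result[slot] = votes.most_common(1)[0][0]
        merged.append(result)
    return merged


sources = [
    {"event": "goal", "player": "Basler", "distance": "25 m", "time": 18, "score": "1:0"},
    {"event": "goal", "type": "Freekick", "player": "Basler", "distance": "25 m", "time": 17},
    {"event": "goal", "player": "Basler", "team": "Germany", "time": 18,
     "score": "1:0", "final_score": "1:0"},
]
merged = merge_templates(sources)
print(merged)
```

The three partial templates collapse into one event that carries the union of their slots, with the time set to 18 because two of the three sources agree on it.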
On-line Tasks
Searching and Displaying
• Search for interesting events with formal queries
Give me all goals by Overmars scored with his head in the first half.
Event=Goal; Player=Overmars; Time<=45; Previous-Event=Headball
• Indicate hits by thumbnails & let user select scene
• Play scene via the Internet & allow scrolling etc
Of course: slow motion, fast play, start/stop, etc
• User Guidance (Lexica and Ontology)
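To make the formal query above concrete, here is a minimal sketch that parses the `Key=Value; Key<=Value` syntax and filters a list of events. The parser, the in-memory event list and the reading of `Previous-Event` as "the event directly before this one" are all assumptions for illustration; the actual query engine is not described in the slides.

```python
import operator
import re

OPERATORS = {"<=": operator.le, ">=": operator.ge, "=": operator.eq,
             "<": operator.lt, ">": operator.gt}
CONDITION = re.compile(r"^\s*([\w-]+)\s*(<=|>=|=|<|>)\s*(.+?)\s*$")


def parse_query(query):
    """Turn 'Event=Goal; Time<=45' into a list of (key, compare, value) triples."""
    conditions = []
    for part in query.split(";"):
        if not part.strip():
            continue
        match = CONDITION.match(part)
        if match is None:
            raise ValueError(f"cannot parse condition: {part!r}")
        key, op, value = match.groups()
        value = int(value) if value.isdigit() else value
        conditions.append((key.lower(), OPERATORS[op], value))
    return conditions


def search(events, query):
    """Return the events (in time order) that satisfy every condition of the query."""
    conditions = parse_query(query)
    hits = []
    for index, event in enumerate(events):
        record = dict(event)
        # Assumed meaning of Previous-Event: type of the directly preceding event.
        record["previous-event"] = events[index - 1]["event"] if index > 0 else None
        if all(record.get(key) is not None and compare(record[key], value)
               for key, compare, value in conditions):
            hits.append(event)
    return hits


# Invented sample annotations for one game.
events = [
    {"event": "Pass", "player": "Bergkamp", "time": 10},
    {"event": "Goal", "player": "Overmars", "time": 11},
    {"event": "Headball", "player": "Overmars", "time": 30},
    {"event": "Goal", "player": "Overmars", "time": 30},
    {"event": "Goal", "player": "Overmars", "time": 70},
]
hits = search(events, "Event=Goal; Player=Overmars; Time<=45; Previous-Event=Headball")
print(hits)
```

Only the goal in minute 30 is returned: the goal in minute 11 follows a pass, and the goal in minute 70 falls outside the first half.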
On-line Tasks
Knowledge Guided User Interface & Search Engine
[Figure: search hits (München - Ajax 1998; München - Porto 1996; Deutschland - Brasilien 1998); for a selected hit, "Play Movie Fragment of that Game" with the event timeline: Freekick, Goal, Pass, Defense; 17 min, 18 min, 24 min, 28 min; 1:0; Foul, Freekick; Neville, Basler, Matthäus, Campbell, Scholl; Dribbling; 25 m, 25 m, 60 m]
Prototype Demo
More about MPEG (Moving Picture Coding Experts Group)
• MPEG-1: low-level media encoding and
compression format (VHS quality, ~ 2-3 Mbps,
good for streaming)
• MPEG-2: improved media encoding and
compression format (S-VHS quality, ~ 5-10
Mbps, digital TV and DVD standard)
• MPEG-4: codes content as objects and
enables those objects to be manipulated at
the client; optimized compression
On-line SW Architecture
Client-Server structure:
• fully distributed
• JMF media presentation
• RMI-based interaction
[Architecture diagram. Components: Client (Applet, JMF, Client Objects; Ontology, Lexica); WWW Server; Java Server (Query Engine Objects, Media Server Objects); DB Server (rDBMS: Annotations, Metadata); Media Servers and File Server (MPEG1: MPEG Movies, Keyframes). Connection labels: HTTP, RMI, RMI (RTP, RTSP), JDBC.]
Query interface:
• automatic pre-selection
• guided by domain knowledge
• interactive, visual feedback
On-line HW Architecture
[Hardware diagram. Components: Media Servers, RAID, Gb-Switch (1 Gbps), FC Switch, Tape Library, Router, Internet.]
• efficient & reliable storage management
(near-line capacity, media change, second location)
• high storage capacity (n TB, 1 h MPEG1 = 1 GB)
• powerful media servers / powerful network
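The rule of thumb "1 h MPEG1 = 1 GB" can be checked against the bitrates quoted in the MPEG overview. The short calculation below assumes a constant bitrate and decimal units (1 GB = 10^9 bytes); the function name is invented for this sketch.

```python
def gigabytes_per_hour(bitrate_mbps):
    """Storage needed for one hour of video at a constant bitrate, in decimal GB."""
    bits = bitrate_mbps * 1_000_000 * 3600  # bits in one hour
    return bits / 8 / 1_000_000_000        # bits -> bytes -> GB


for label, mbps in [("MPEG-1, 2 Mbps", 2), ("MPEG-1, 3 Mbps", 3),
                    ("MPEG-2, 5 Mbps", 5), ("MPEG-2, 10 Mbps", 10)]:
    print(f"{label}: {gigabytes_per_hour(mbps):.2f} GB per hour")
```

At 2 to 3 Mbps this gives roughly 0.9 to 1.35 GB per hour, which is consistent with the 1 GB estimate; MPEG-2 at 5 to 10 Mbps would need about 2.25 to 4.5 GB per hour.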
UI / Annotation
• UI optimization
• thumbnails not that informative
• which thumbnail? (several around time mark)
• automatic thumbnail adjustment?
• seamless operation for user
• lexicon/ontology guidance
• user driven input
• Manual annotation tools
• MediaTagger
• EUDICO
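For the open question of which thumbnail to show when several keyframes lie around a time mark, one simple option is to pick the keyframe closest in time. This is only a sketch of that one heuristic, assuming keyframe times are available as a sorted list of seconds; the function name and sample values are invented.

```python
import bisect


def nearest_keyframe(keyframe_times, time_mark):
    """Return the keyframe time closest to time_mark.

    keyframe_times must be sorted in ascending order (seconds).
    """
    if not keyframe_times:
        raise ValueError("no keyframes available")
    pos = bisect.bisect_left(keyframe_times, time_mark)
    # Only the neighbours directly before and after the mark can be closest.
    candidates = keyframe_times[max(pos - 1, 0):pos + 1]
    return min(candidates, key=lambda t: abs(t - time_mark))


print(nearest_keyframe([1000, 1075, 1082, 1090], 1080))
```

Whether the closest keyframe is also the most informative one is exactly the issue raised above; the heuristic does not address that.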
Gain - User Group
Current Procedure:
• Manual Video Annotation
• Integration Central DB
• Query via PC
• Results on PC
• Contact Video Archive
• Get Video Tapes
• Search on Tape on VCR
• Segment & Play
MUMIS Procedure:
• Automatic Video Annotation and DB Integration
• Query via PC
• Results on PC and Select & Play
• What gets lost? Is it necessary?
• Potential: direct Internet Service, fewer dependencies
Acknowledgements
• UEFA
• DFB, FA, KNVB
• EBU, WDR, NOS, SWR
Allez les Bleus!!