
A Proposal for a Video Modeling for
Composing Multimedia Document
Cécile ROISIN - Tien TRAN_THUONG - Lionel VILLARD
Presented by: Tien TRAN THUONG
Project OPERA - INRIA
Grenoble - France
Work Context

Theme: Multimedia Document (Madeus)
- Authoring system for structured multimedia documents
- Basic media: sound, video, text, image, etc.
- Document composed by relations

Need: composition of semantic video fragments with other basic media elements (image, text, sound, ...)
Temporal Synchronization Example
INRIA’s positions document: pictures and titles synchronized with parts of the video presentation.
Logical organization of document
[Figure: logical tree of the InriaIntroduction document. Node labels: Buildings, Overview, Video Presentation; a Title, Text and Picture (Image) for Rocq. and for RhôneAlpes; and, for "Locations of INRIA’s units", the video frames in which Rocq., Rennes, S.A., Lorraine and R.A. appear.]
Time line view of the document
[Figure: timeline of the document. The raw video is split into the video fragments Introduction, "Locations of INRIA’s units" (Rocq. appears, Ren. appears, S.A. appears, Lorraine appears, R.A. appears), ..., Conclusion. The Title & Picture of Rocquencourt, Rennes, Sophia-Antipolis, Lorraine and Rhône-Alpes are shown on the same time axis, and the texts grow while they are displayed.]
Spatial Synchronization examples
- Hyperlink
- Tracking: the text follows a character

Spatial layout of text following a video object
[Figure: layout of the document. Labels: Document Region, Text Region, Video Region and Video Object Region; coordinate annotations (Left, Top, Width, Height) and (Width, Height); the alignment Right-Top-Align; and the object trajectory {x(t), y(t)}.]
{x(t), y(t)}: location of the video object region, which moves within the video region.
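The tracking layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trajectory is taken to give the top-left corner (x(t), y(t)) of the object region in video coordinates, and "Right-Top-Align" is read as "the text sits against the right edge of the object, with tops aligned", which is an interpretation rather than the Madeus definition. The `Region` class, the `text_region_at` function and the object size are hypothetical; the video and text sizes are the ones used in the demo document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle in document coordinates (hypothetical helper)."""
    left: float
    top: float
    width: float
    height: float


def text_region_at(video, obj_xy, obj_size, text_size):
    """Place the text to the right of the tracked object, tops aligned.

    video     -- Region of the video in the document
    obj_xy    -- (x, y) of the object region, relative to the video region
    obj_size  -- (width, height) of the object region
    text_size -- (width, height) of the text region
    """
    x, y = obj_xy
    obj_width, _ = obj_size
    text_width, text_height = text_size
    # Convert from video coordinates to document coordinates.
    left = video.left + x + obj_width
    top = video.top + y
    return Region(left, top, text_width, text_height)


video = Region(left=206, top=140, width=352, height=288)
# Sampled trajectory {x(t), y(t)}; the values are made up for illustration.
trajectory = {0.0: (10, 20), 1.0: (30, 25)}
for t, xy in trajectory.items():
    print(t, text_region_at(video, xy, obj_size=(40, 60), text_size=(69, 16)))
```

Recomputing the text region at every sampled instant is the simplest way to make the text follow the object; a real player would more likely interpolate between samples.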
Objective and plan of this work

Research and development on video modeling for describing the video content relevant to multimedia applications:
- Video modeling: video description for multimedia composition,
- Multimedia application: our VideoMadeus is an editing and presentation system.
Video Description
Video => Analysis -> Description -> Applications
Scheme of audiovisual applications

- Dublin Core: a semantic indexing schema for video content description.
- MPEG-7: the future standard, whose tools will make it possible to define semantic schemas for the description of audiovisual information.
- Our video modeling for composing multimedia documents.
Methodology

- Specification of a model for the description of video content:
  - multi-level structuring,
  - temporal and spatial relations,
  - interactive actions on the video elements.
- Specification in XML
- Experimentation in Madeus (VideoMadeus)
Multi-level Structuring
[Figure: a raw video described at three levels: Structure (occurrences Occ.1 to Occ.4), Semantic (Irene, Nabil) and Thesaurus (Researcher).]

Video Content Description
<!-- XML schema for the description of VideoContent -->
<!ELEMENT VideoContent (MetaInfo, MediaInfo, Summary, Structure, Semantic, Thesaurus)>

- Structural Description
<!ELEMENT Structure (Sequence+, Relation?)>
- Semantic Description
<!ELEMENT Semantic (VideoObject*, EventSemantics*)>
- Thesaurus
<!ELEMENT Thesaurus (ReferenceDictionary*, UserDictionary*)>
Video Structure Description

- Motivation: for composition, the basis is the Structure description level.
- Semantic and Thesaurus are needed more for retrieval applications, or as a support for the structure level.
- The first step is the Structure description.
High Level Description
Video Structure
[Figure: a Video is divided into Sequences, Sequences into Scenes, and Scenes into Shots.]

- Sequences
- Scenes
- Shots

<!-- XML schema for the description of the high-level structure -->
<!ELEMENT Structure (Sequence+, Relation?)>
<!ELEMENT Sequence (Scene+, Relation?)>
<!ELEMENT Scene (Shot+, Event*, Relation?)>
<!ELEMENT Shot (Transition?, Event*, Occurrence*, Background?, Relation?)>
Shot Content Description
[Figure: content of a Shot: Transition, Event, SpatialLayout, CameraWork, Occurrence and Background; other labels: Semantic, Index, Reference.]

<!-- XML Shot description -->
<!ELEMENT Shot (Transition?, Event*, Occurrence*, Background?, Relation?)>
<!ELEMENT Transition EMPTY>
<!ELEMENT Event EMPTY>
<!ELEMENT Background (Region+)>
<!ELEMENT Occurrence (Region+, Trajectory?, Occurrence*)>
<!ELEMENT SpatialLayout (2DBStringDS+)>
<!ELEMENT CameraWork (CameraMotion?)>
Occurrence Content Description
[Figure: content of an Occurrence: Trajectory, Regions (Contour, Color, Texture, Centroid, nested Regions) and nested Occurrences.]

<!-- XML Occurrence description -->
<!ELEMENT Occurrence (Trajectory*, Region+, Occurrence*)>
<!ELEMENT Region (Contour+, Color*, Texture*, Centroid, Region*)>
<!ELEMENT Trajectory … >
<!ELEMENT Contour … >
<!ELEMENT Color … >
<!ELEMENT Texture … >
<!ELEMENT Centroid … >
Model summary

- The model focuses on the description of video elements useful for composing a multimedia document (shot, scene, occurrence, event, relation, etc.).
- It has an XML specification, which makes it independent and easy to apply to multimedia applications (e.g. our VideoMadeus).
Experimentation of the model in Madeus - VideoMadeus
Madeus Architecture
[Figure: Madeus architecture. Editor/presentation tools: Execution view, Time line view, Hierarchical view, Video structured view, ... Model management: logic, temporal and spatial structuration management, and event management. Parsers and Save operate on the Madeus document. Java tools: Xerces, JMF.]
To extend Madeus to VideoMadeus, video content description is handled in both the composition and the presentation parts.
Madeus Document Model
Structured document organized according to the dimensions: logical, temporal, spatial.

<Madeus>
<Content> … </Content>
<Actor> . . . </Actor>
<Temporal> . . . </Temporal>
<Spatial> . . . </Spatial>
</Madeus>

- Content describes the content information of the document
- Actor defines how this basic information in the content part is used in the document (style information, link, etc.)
- Temporal is for the synchronization between document parts
- Spatial is for layout specification

Relations

- Temporal relations (Allen extension): meets, starts, equals, during, overlaps, parmin, etc.
- Spatial relations: left_align, right_align, center_v, center_h, top_align, bottom_align, etc.

<Temporal> …
<Relations>
<start Interval1="a" Interval2="b" />
<meet Interval1="b" Interval2="d" /> …
</Relations>
</Temporal>
<Spatial> …
<Relations>
<left-align Region1="b" Region2="d" />
…
</Relations>
</Spatial>
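To make the temporal relations concrete, here is a minimal Python sketch of five of them, using the standard definitions from Allen's interval algebra. It is not the Madeus implementation: the Madeus extensions (such as parmin) are not covered, and the `Interval` type and function names are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed time interval in seconds (hypothetical helper)."""
    start: float
    end: float


def meets(a, b):
    """a ends exactly when b starts."""
    return a.end == b.start


def starts(a, b):
    """a and b start together and a ends first."""
    return a.start == b.start and a.end < b.end


def equals(a, b):
    """a and b cover exactly the same time span."""
    return a.start == b.start and a.end == b.end


def during(a, b):
    """a lies strictly inside b."""
    return b.start < a.start and a.end < b.end


def overlaps(a, b):
    """a starts first and ends inside b."""
    return a.start < b.start < a.end < b.end


# Illustrative values for the intervals a, b and d used above.
a = Interval(0, 5)
b = Interval(0, 10)
d = Interval(10, 15)
print(starts(a, b), meets(b, d))
```

In an authoring system the relations are constraints to be solved rather than predicates to be tested, so a sketch like this is mainly useful for checking a computed schedule.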
Overview of VideoMadeus
[Figure: VideoMadeus overview.
- Editing and Presentation Tools: Execution View; Video edition View (Structure View, Semantic View, Thesaurus View) with Edit, Play and Search.
- Element Management; Synchronization Management (Temporal, Spatial); Behavior Management (Hyperlink, Follow-up, Erase, Display, etc.).
- Data Management: XML description of video content, Parser, Index on video, Video, Internal Structure (MODEL).
- Arrow labels: Requested descriptions, Modified description, Modify.]
VideoMadeus document
<Madeus>
<Content> . . .
<VideoContentDS> . . .
<Scene ID="MyScene" ... > . . . </Scene>
</VideoContentDS>
</Content>
<Actor> . . .
<VideoElement ID="SceneVideo" Content="MyScene" . . . > . . .</VideoElement>
</Actor>
<Temporal>
<Interval ID="ScenceInt" Actor="SceneVideo" Duration="..." … />
<Relations> . . . </Relations>
</Temporal>
<Spatial>
<Region ID="ScenceReg" Actor="SceneVideo" Height="288" Width="352" … />
<Relations> . . . </Relations>
</Spatial>
</Madeus>
Editing features

- Editing of the video description
  - shot detection (automatic or manual)
  - manual extraction of video objects, events, spatialLayout, etc.
- Creation of semantic groups (manual)
  - grouping shots into a scene and scenes into a sequence
  - detecting the occurrences of a character (grouping occurrences into objects)
  - creating the other semantic indexes
  - classifying the video elements (thesaurus)
- Scenario editing (composing)
  - setting temporal and spatial relations between video elements and other media
  - setting actions on the video elements
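As an illustration of the grouping step, the sketch below builds a scene from a list of shots and derives the scene's time span from them. It rests on an assumption not stated in the slides: that the shots of a scene must be contiguous in time. The `Shot`, `Scene` and `group_shots` names are hypothetical; the time values are those of Scene2 in the example document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    id: str
    start: float  # seconds
    stop: float   # seconds


@dataclass
class Scene:
    id: str
    shots: List[Shot]

    @property
    def start(self):
        return self.shots[0].start

    @property
    def stop(self):
        return self.shots[-1].stop


def group_shots(scene_id, shots):
    """Group shots into a scene, ordered by start time.

    Assumption: a scene is made of shots with no gap or overlap between them.
    """
    if not shots:
        raise ValueError("a scene needs at least one shot")
    ordered = sorted(shots, key=lambda s: s.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.stop != nxt.start:
            raise ValueError(f"{prev.id} and {nxt.id} are not contiguous")
    return Scene(scene_id, ordered)


scene = group_shots("Scene2", [Shot("Shotii", 8.71, 11.09), Shot("Shoti", 4.91, 8.71)])
print(scene.id, scene.start, scene.stop)
```

The same function could group scenes into a sequence, since both levels are described by a start time and a stop time.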

Conclusion

- Provide support for deeper access into video data in the multimedia authoring system:
  - temporal/spatial synchronization with the other media elements (image, text, sound, etc.),
  - actions on the video elements (hyperlink, follow-up, erasing, etc.).
- Experimentally develop the video editing view to help the user create and modify descriptions of video data in accordance with our video model.
Perspectives

- More experimentation with spatial synchronization,
- Extension and experimentation of the semantic parts (Semantic and Thesaurus) -> semantic queries,
- Use of the MPEG-7 tools to specify our video model,
- Development of the video content description editing tool:
  - integration and adaptation of video analysis algorithms to generate the video elements as automatically as possible,
  - timeline editing view for the video structure, etc.
- Semantic queries for playing part of a video over the network.
Video content description in a Madeus document
<?xml version="1.0"?>
<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600">
<Content>
<VideoContent ID="InriaInfoco" … >
<Structure ID="InriaInfocoStruc" … >
<Sequence ID="Seq" Start_Time ="0" Stop_Time ="76.69" … >
<Scene ID="Scene1" Start_Time ="0" Stop_Time ="4.91" … > </Scene>
<Scene ID="Scene2" Start_Time ="4.91" Stop_Time ="11.09" … >
<Shot ID="Shoti" Start_Time ="4.91" Stop_Time ="8.71" … />
<Shot ID="Shotii" Start_Time ="8.71" Stop_Time ="11.09" … />
</Scene>
<Scene ID="Scene3" Start_Time ="11.09" Stop_Time ="29.07" … > … </Scene>
…
</Sequence>
</Structure>
…
</VideoContent >
<VideoContent ID="InriaGen" … > … </VideoContent>
…
</Content>
…
</Madeus>
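A description like the one above can be read with any XML parser. The Python sketch below uses the standard `xml.etree.ElementTree` module to list the time span of every Sequence, Scene and Shot. The embedded document is a simplified, well-formed copy of the example with the elided parts ("…") left out, and the `fragments` function is a hypothetical helper, not part of VideoMadeus.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the example above; elided attributes and elements are omitted.
DOC = """<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600">
  <Content>
    <VideoContent ID="InriaInfoco">
      <Structure ID="InriaInfocoStruc">
        <Sequence ID="Seq" Start_Time="0" Stop_Time="76.69">
          <Scene ID="Scene1" Start_Time="0" Stop_Time="4.91"/>
          <Scene ID="Scene2" Start_Time="4.91" Stop_Time="11.09">
            <Shot ID="Shoti" Start_Time="4.91" Stop_Time="8.71"/>
            <Shot ID="Shotii" Start_Time="8.71" Stop_Time="11.09"/>
          </Scene>
        </Sequence>
      </Structure>
    </VideoContent>
  </Content>
</Madeus>"""


def fragments(xml_text):
    """Map each Sequence/Scene/Shot ID to its (start, stop) time in seconds."""
    root = ET.fromstring(xml_text)
    result = {}
    for elem in root.iter():
        if elem.tag in ("Sequence", "Scene", "Shot"):
            start = float(elem.get("Start_Time"))
            stop = float(elem.get("Stop_Time"))
            result[elem.get("ID")] = (start, stop)
    return result


for frag_id, (start, stop) in fragments(DOC).items():
    print(f"{frag_id}: {start} -> {stop}")
```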
Video element definition
<?xml version="1.0"?>
<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600">
<Content> . . . </Content>
<Actor>
<VideoElement
ID="WesternScene"
Content="WesternDS.Seq.Scene1"
TypeRenderer="LightWeight". . . >
<VideoObject ID="VO1" Object="Shot2.ActorOcc1"
Actions="Follow-up;Hyperlink;..."
HRef="file:///C:/Users/ttran/Multimedia/Madeus/opera.html"
/>
...
</VideoElement>
...
</Actor>
<Temporal> . . . </Temporal>
<Spatial> . . . </Spatial>
</Madeus>

Operations can be defined on an instance of the described video: Hyperlink, Tracking, Erasing, Jumping, etc.
Temporal part of the Inria introduction document
<?xml version="1.0"?>
<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600"> ...
<Temporal>
<T-Group ID="Temporal" Duration="pref:20s min:15s max:22s">
<!-- Interval of three hypertexts -->
<Interval ID="ControlOperaInterval" Actor="ControlOperaInfo" Duration="pref:20s min:15s max:22s"/>
…
<!-- Interval of the video element -->
<Interval ID="MovieInriaURScene4" Actor ="InriaURScene4" Fill="freeze" Duration="pref:20s min:15s max:22s"/>
<!-- Interval of the texts -->
<Interval ID="txtInriaURScene4Shotii" Actor ="TxtInriaURScene4" Duration="pref:20s min:15s max:22s" />
...
<Interval ID="txtInriaURScene4Shotvi" Actor ="TxtInriaURScene4Shotvi" Duration="pref:20s min:15s max:22s" />
<!-- Interval of the images -->
<Interval ID="ImgInriaURScene4Shotii" Actor ="ImgRoc" Duration="pref:20s min:15s max:22s" />
...
<Interval ID="ImgInriaURScene4Shotvi" Actor ="ImgRA" Duration="pref:20s min:15s max:22s"/>
<Relations>
<!-- Equals relations of the texts with the video elements -->
<Equals Interval1="MovieInriaURScene4.Shotii" Interval2="txtInriaURScene4Shotii.SizeAnimation" />
...
<Equals Interval1="MovieInriaURScene4.Shotvii" Interval2="txtInriaURScene4Shotvii.SizeAnimation" />
<!-- Start relations of the images with the video elements -->
<Starts Interval1="MovieInriaURScene4.Shotii" Interval2="ImgInriaURScene4Shotii" />
...
<Starts Interval1="MovieInriaURScene4.Shotvi" Interval2="ImgInriaURScene4Shotvi" />
</Relations>
</T-Group>
</Temporal>
<Spatial> … </Spatial>
</Madeus>
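The Duration attributes above combine three values. The Python sketch below parses such a string, assuming from the attribute names alone that pref, min and max are the preferred, minimum and maximum durations in seconds; the exact Madeus semantics may differ. `parse_duration` and `clamp` are hypothetical helpers.

```python
import re

_FIELD = re.compile(r"(pref|min|max):(\d+(?:\.\d+)?)s")


def parse_duration(spec):
    """Parse 'pref:20s min:15s max:22s' into a dict of floats (seconds)."""
    values = {name: float(number) for name, number in _FIELD.findall(spec)}
    if set(values) != {"pref", "min", "max"}:
        raise ValueError(f"incomplete duration: {spec!r}")
    if not values["min"] <= values["pref"] <= values["max"]:
        raise ValueError(f"inconsistent duration: {spec!r}")
    return values


def clamp(duration, requested):
    """Bring a requested duration back into the [min, max] range."""
    return min(max(requested, duration["min"]), duration["max"])


d = parse_duration("pref:20s min:15s max:22s")
print(d, clamp(d, 30))
```

Keeping a range rather than a single value leaves the scheduler some freedom to satisfy relations such as Equals and Starts between intervals.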
Spatial part of the Spatio-Temporal Relation Demo document
<?xml version="1.0"?>
<Madeus Name="DocMadeus" Version="2.0" Width="800" Height="600">
...
<Spatial>
<S-Group ID="TOTOSpatial">
<!-- Video region -->
<Region ID= "WesternVideoRegion" Actor="WesternVideo" Left="206" Top="140" Height="288" Width="352" Depth="1"/>
<!-- Three hypertext regions-->
<Region ID="LinkOperaInfoRegion" Actor ="ControlOperaInfo" Left="236.0" Top="492.0" Width="210.0" Depth="2.0"/>
<Region ID="LinkAutoCitroenRegion" Actor ="ControlAutoCitroen" Left="36.0" Top="492.0" Width="210.0" Depth="2.0"/>
<Region ID="LinkSTRST" Actor ="ControlSpatioTemp" Left="472.0" Top="492.0" Width="236.0" Depth="2.0"/>
<Region ID="TxtOperaIntroRegion" Actor ="TxtOperaIntro" Left="168" Top="46" Height="42" Width="429" Depth="2.0"/>
<!-- Regions of the text following the video object-->
<Region ID="TxtMotionRegion" Actor ="TxtMotion" Height="16.0" Width="69" Depth="2.0"/>
<Relations>
<Top_align Region1="WesternVideoRegion.Shot1.Obj" Region2="TxtMotionRegion" />
</Relations>
</S-Group >
</Spatial>
</Madeus>