
To appear in Transactions on Knowledge and Data Engineering, DS-8 Special Issue, IEEE, 2000
ZYX: A Multimedia Document Model for
Reuse and Adaptation of Multimedia Content
Susanne Boll, Wolfgang Klas
Database and Information Systems (DBIS)
University of Ulm, Computer Science Department, Ulm, Germany
fboll,[email protected]
Abstract: Advanced multimedia applications require adequate support for the modeling of multimedia content by multimedia document models. This support increasingly calls not only for adequate modeling of the temporal and spatial course of a multimedia presentation and its interactions, but also for the partial reuse of multimedia documents and their adaptation to a given user context. Our thorough investigation of existing standards for multimedia document models such as HTML, MHEG, SMIL, and HyTime, however, leads us to the conclusion that these standard models do not provide sufficient modeling support for reuse and adaptation.
Therefore, we propose a new approach for the modeling of adaptable and reusable multimedia content, the ZYX model. Beyond the more or less common primitives for temporal, spatial, and interaction modeling, the model offers primitives that provide varied support for reuse of structure and layout of document fragments and for the adaptation of the content and its presentation to the user context. We present the model in detail and illustrate the application and effectiveness of these concepts by samples taken from our Cardio-OP application in the domain of cardiac surgery.
With the ZYX model, we developed a comprehensive means for advanced multimedia content creation: support for template-driven authoring of multimedia content, and support for flexible, dynamic composition of multimedia documents customized to the user's local context and needs. The approach significantly impacts and supports the authoring process in terms of methodology and economic aspects.
Keywords: multimedia document model, reuse, adaptation, multimedia database system
I. Introduction
Multimedia applications need data models for the representation of the composition of media elements, i.e., multimedia document models. They are employed to model the semantic relationships between the media elements participating in a multimedia presentation. The initial requirements to multimedia documents are the modeling of the temporal and spatial course of a multimedia presentation and also the modeling of user interaction. However, the requirements of multimedia applications have evolved: as authoring of multimedia information is a very time-consuming and costly task, attention has been drawn to the reuse of multimedia documents for efficiency and economic reasons. Furthermore, the growing plenitude of multimedia information calls for personalization of the multimedia information according to the user's individual context. Access and distribution of multimedia documents via networks like the Internet require adaptation of the documents to heterogeneous network and system environments.
Our research project "Gallery of Cardiac Surgery" (Cardio-OP, see footnote 1) [1] is an example of an advanced multimedia application that emphasizes this need for reuse and adaptation and explicitly requires a model for multimedia material that supports extensive reuse of the material in different user contexts. The overall goal is to develop an Internet-based and database-driven multimedia information system for physicians, medical lecturers, students, and patients in the domain of cardiac surgery. The system will serve as a common information and education base for its different types of users, in which the users are provided with multimedia information according to their specific request, their different understanding of the selected subject, their geographic location, and their technical infrastructure.
Footnote 1: Partially funded by the German Ministry of Research and Education, grant number 08C58456. Our project partners are the University Hospital of Ulm, Dept. of Cardiac Surgery and Dept. of Cardiology, the University Hospital of Heidelberg, Dept. of Cardiac Surgery, an associated Rehabilitation Hospital, the publishers Barth-Verlag and dpunkt-Verlag, Heidelberg, FAW Ulm, and ENTEC GmbH, St. Augustin. For details see also URL http://www.informatik.uni-ulm.de/dbis/Cardio-OP/
Within this project context, our group is developing concepts and prototypical implementations of a database-driven multimedia repository that integrates modeling, management, and content-based retrieval of multimedia content with flexible dynamic multimedia presentation services that select, deliver, and present the multimedia content according to the user context. Major project requirements are the support for reuse, adaptation, and presentation-neutral description of the structure and content of multimedia documents.
Given the project's requirements, we were looking for suitable modeling support among existing multimedia document standards. Therefore, we elaborated both the traditional and advanced requirements to multimedia document models and, endowed with this metric, analyzed the document models HTML [2], MHEG [3], [4], [5], HyTime [6], [7], and SMIL [8]. The detailed analysis and comparison of the models can be found in [9], [10], [11]. However, the analysis of the models' basic modeling concepts as well as their support for reuse, adaptation, and presentation-neutral description of multimedia content showed that each of the models lacks some significant concepts and does not meet all of the requirements. Therefore, we designed and implemented the ZYX model to overcome these limitations and to have a proper basis from which the multimedia repository can comprehensively provide for reusability and adaptation.
In this paper, we present the ZYX model, which forms the core for the modeling of the multimedia content in our repository. In comparison to existing models, it provides more adequate support for semantic modeling, reusability and flexible composition, adaptation and individualization for presentation, and presentation-neutral storage. We illustrate the application of the model in the domain of cardiac surgery and point out the implications that such a model, which supports reuse and adaptation, has for multimedia authoring and multimedia presentation.
The paper is organized as follows: Section II gives the reader a better understanding of the new requirements we see with next-generation multimedia applications. This leads to a metric that we used to analyze existing multimedia document models. The summary of this analysis is also presented in this section. It motivates the need for our new document model ZYX, which emphasizes the requirements for reuse and adaptation of multimedia documents. Section III presents the basic ideas and design considerations of the ZYX model, and Section IV gives the formal framework for a detailed understanding of the model. Focusing on reuse and adaptation, Section V presents and illustrates the spectrum of application possibilities of ZYX and discusses the advantages this support brings to the creation and delivery of multimedia content. Section VI summarizes our work and gives an outlook on ongoing and future work.
II. Requirements to Multimedia Document
Models and an Analysis of Existing Models
In this section, we present our requirements to multimedia document models. We distinguish basic and advanced requirements. The basic requirements to multimedia document models are the modeling of the temporal and spatial course of a multimedia presentation and the modeling of interaction. The challenging, advanced requirements to multimedia document models are the reusability of the multimedia material, the adaptation to user-specific needs and context, and the presentation-neutral description of the content. As our focus lies on the advanced requirements, we start out by presenting these in Section II-A and only briefly sketch the basic requirements afterwards in Section II-B. Together, the basic and the advanced requirements constitute a metric along which we analyzed selected relevant multimedia document models for their suitability in the project context. This analysis is summarized in Section II-C.
A. Advanced Requirements
In order to support a modular and context-dependent
composition of multimedia documents from media objects
and parts of multimedia documents, document models
need to provide a data model which provides support for
reuse, adaptation, and presentation-neutral description of
the structure and content of multimedia documents.
Reuse. As motivated in the introduction, reuse of multimedia material is an essential requirement for multimedia document models. We characterize reusability of multimedia content along three dimensions: the granularity of reuse, the kind of reuse, and the selection and identification of reusable components.
Granularity: The granularity of reuse determines what
can be reused. Regarding multimedia document models,
we can distinguish at least three levels of granularity of
reusable components: reuse of complete multimedia documents, reuse of fragments of multimedia documents like single scenes or teaching units, and reuse of individual atomic
media elements such as a video or audio and parts of those
media elements such as a scene of a video.
Kind of reusage: For all three levels of granularity we distinguish two different ways of reusing material for the composition of new documents: identical reusage, i.e., the components are reused including all temporal, spatial, design, and interaction relationships and constraints as originally specified by the author(s), and structural reusage by means of separating layout and structure and reusing only structural parts.
Selection and identification: Before we can reuse multimedia components we have to identify and select them within the multimedia information system. This calls for metadata and for mechanisms for classifying, indexing, and querying components. Hence, a document model should provide support for comprehensive and sophisticated annotation of reusable components with metadata.
Adaptation. The presentation of multimedia documents should preferably adapt to the user context, like the user's interest, knowledge level, preferences, the targeted user system environment, and varying resources like available network bandwidth and CPU time. To introduce adaptivity into multimedia presentations, a multimedia document model must offer primitives to specify, generate, or derive in some way presentation alternatives that reflect and meet the different presentation contexts. For an actual presentation the system can use these alternatives to adapt the delivery and the rendering of the presentation to the current user context.
For example, consider a professor on campus who is interested in seeing in-depth multimedia material on coronary artery bypass grafting, and an undergraduate student at home who needs only an abstraction of the same material to pass the upcoming exam. The "story" behind these two different presentations, however, might be the same; some components of the professor's presentation might be (re)used in the student's presentation while others might be substituted or adapted by more abstract representations of the specific content.
For a better understanding, we distinguish adaptation by
the extent to which the adaptability is modeled and when
the adaptability is exploited:
Extent of the adaptability: For the extent of the adaptability we distinguish between adaptation to personal interest, which adapts the contents of a document to the user's interests, knowledge, professional background and the like, and adaptation to technical infrastructure, which adapts to the technical infrastructure available to a user.
In the example above, adaptation to technical infrastructure would be the capability to adapt the document's presentation both to the high-end environment of the professor on campus and the low-end environment of the student at home. Therefore, the presentation should be adaptable by means of technical parameters like resolution of images and frame rate of videos, but also by means of media substitutability like substituting an audio by text or a video by a sequence of pictures or a small animation. Adaptation to personal interest would be an adaptation of the content such that the professor would see a more in-depth presentation of the coronary artery bypass grafting whereas the student would rather get a simplified variant presentation of the operation, thus reflecting the expected background knowledge of the different users.
Static or dynamic adaptability: With regard to the presentation alternatives it is of interest whether all possible alternatives for the adaptation are to be known and
modeled at authoring time of a multimedia document or
whether they are left for generation at the actual presentation time just when the adaptation is needed.
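The static case described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each presentation alternative carries a set of attribute conditions and that the user context is a flat attribute dictionary; the names `Alternative`, `select_alternative`, and the attribute keys are hypothetical and not part of any of the analyzed models.

```python
from dataclasses import dataclass, field


@dataclass
class Alternative:
    """One statically modeled presentation alternative (hypothetical structure)."""
    name: str
    # Conditions on context attributes, e.g. {"knowledge": "expert"}.
    conditions: dict = field(default_factory=dict)


def select_alternative(alternatives, context, default=None):
    """Return the first alternative whose conditions all hold in the context."""
    for alt in alternatives:
        if all(context.get(key) == value for key, value in alt.conditions.items()):
            return alt
    return default


# Alternatives authored in advance (static adaptability).
alternatives = [
    Alternative("in-depth video", {"knowledge": "expert", "bandwidth": "high"}),
    Alternative("image sequence", {"knowledge": "expert", "bandwidth": "low"}),
    Alternative("simplified summary", {"knowledge": "novice"}),
]

# Assumed context descriptions for the two users of the example.
professor = {"knowledge": "expert", "bandwidth": "high", "location": "campus"}
student = {"knowledge": "novice", "bandwidth": "low", "location": "home"}

print(select_alternative(alternatives, professor).name)  # in-depth video
print(select_alternative(alternatives, student).name)    # simplified summary
```

In the dynamic case, the list of alternatives would not be authored in advance but computed from the context at presentation time; the selection step itself could stay the same.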
Presentation-neutral Representation. The multimedia material available has to be presentable in a heterogeneous software and hardware environment as can be found in the Internet. As a consequence, the multimedia material has to be stored in a presentation-neutral way, i.e., independent of the actual realization of a presentation at a client. This calls for a presentation-neutral representation of multimedia content that is convertible into the respective presentation-specific format used for playout of the multimedia material. It is desirable that this conversion is lossless and that a conversion to different "output formats" is possible. The presentation-neutral representation of multimedia content should hence, besides covering rich multimedia functionality, take place on a high level of semantics. The presentation-neutral model should also be open in the sense that it allows for later integration of multimedia functionality expected to be developed in the future.
Multimedia functionality: The multimedia functionality of a multimedia document model describes the expressiveness of its modeling primitives. A document model should have a high multimedia functionality to give sufficient support for modeling multimedia content. With regard to the conversion of a (presentation-neutral) document into another (output) format, this means that if the target document model does not offer multimedia functionality equivalent to that offered by the source model, the conversion will be lossy.
Semantic level: A document model describes a document on a high semantic level if the document's structure is specified rather than its presentation. This is helpful and necessary to allow for an automatic conversion of a document into another document format, as the course of the presentation can then be extracted and converted more easily. If the document has a low semantic level, a conversion needs knowledge about the multimedia content that often only the author will have.
Therefore, the presentation-neutral representation of
multimedia content should have a high multimedia functionality and take place on a high level of semantics.
B. Basic Requirements
The traditional requirements for a temporal and spatial
model as well as interaction modeling are imperative for
a multimedia document model and, hence, are presented
only in short for the sake of completeness.
Temporal model. A temporal model (see also [12], [13], [14], [15]) describes temporal dependencies between the media elements of a multimedia document. One can find four types of temporal models: point-based temporal models, interval-based temporal models, event-based temporal models, and script-based models, in which temporal relations between media elements are specified by scripts, i.e., programs written in a scripting language which can comprise temporal synchronization operations.
Spatial Model. Three approaches of positioning the visual
elements on the presentation medium can be distinguished:
absolute positioning based on a coordinate system, directional relations [16], using relations like strong-north and
weak-north (to specify overlapping), and topological relations [17] using relations like disjoint, meet, and overlap.
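To make the topological approach concrete, the following sketch classifies two axis-aligned rectangles using only the three relations named above. The `Rect` type and the function name are assumptions made for this illustration; a full topological model distinguishes further relations (such as containment), which this sketch lumps together under "overlap".

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def topological_relation(a: Rect, b: Rect) -> str:
    """Classify two rectangles as 'disjoint', 'meet', or 'overlap'."""
    # Extent of the intersection along each axis (negative means a gap).
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    if dx < 0 or dy < 0:
        return "disjoint"
    if dx == 0 or dy == 0:
        return "meet"  # boundaries touch, interiors do not intersect
    return "overlap"


print(topological_relation(Rect(0, 0, 10, 10), Rect(20, 0, 5, 5)))   # disjoint
print(topological_relation(Rect(0, 0, 10, 10), Rect(10, 0, 5, 5)))   # meet
print(topological_relation(Rect(0, 0, 10, 10), Rect(5, 5, 10, 10)))  # overlap
```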
Interaction. Users should be able to interact with presentations in terms of three types of interaction: (1) navigational interactions determining the user-defined flow of a multimedia presentation, (2) design interactions influencing the visual and audible layout of a presentation, and (3) movie interactions affecting the temporal course of the entire presentation. Navigational and design interactions should be specified within multimedia documents, whereas movie interactions are expected to be offered by the presentation engine.
C. Analysis of Existing Models
In this section, we very briefly summarize our analysis of the most relevant existing standards and data models in view of the requirements presented in the previous sections. Both the basic and the advanced requirements constitute a metric along which we analyzed selected multimedia document models. Due to space limitations, we cannot present our comprehensive and detailed discussion of how the models meet the specific requirements in this paper, but refer the reader to [9], [10], [11]. Figure 1 illustrates the results of our analysis of the most relevant existing approaches and shows to which extent HTML/DHTML, MHEG-5/6, HyTime, and SMIL fulfil the basic and advanced requirements. For each of the requirements, the single aspects elaborated in Section II are listed, and for each of the models the figure shows to what extent the requirements are met by the model.
[Figure 1: table with the columns HTML, DHTML, SMIL, MHEG-5, and HyTime. Rows for the basic requirements: Temporal Model, Spatial Model, Interaction (Navigational, Design). Rows for the advanced requirements: Reusability (Granularity: Media Elements, Fragments, Documents; Kind of Reusage: Identical, Structural; Identification/Selection), Adaptation (Parameters of Adaptability: User Interest, Technical Infrastructure; Definition of Alternatives: Static, Dynamic), and Presentation-neutral Representation (Multimedia Functionality, Semantic Level).]
Fig. 1. Summary of the support of the basic and advanced requirements by HTML, DHTML, MHEG-5/6, SMIL, and HyTime (+ support, o partial support, - no support)
The analysis of existing standards, de facto standard formats, and models shows that, although individual formats and models are strong with respect to particular features, they are not capable of meeting all the requirements identified in the previous section, especially those we find with advanced multimedia applications, i.e., support for reuse, adaptation, and presentation-neutral description. This result led to the design and implementation of the ZYX model, which tries to adopt the best features of existing formats and models, especially also recent developments in the area of Internet-applicable models driven by the development of XML and SMIL.
III. The ZYX Model
When designing the ZYX model, we took into account the lessons learned with the models we analyzed. To give the reader an understanding of the design of our model and also the points of contact of ZYX with other approaches in the field, we sketch our design considerations in Section III-A. In Section III-B, we then introduce the reader to the basic concepts of our ZYX data model before we present the detailed formal framework for ZYX in Section IV.
A. Design Considerations
When aiming at the design of a model which fulfils the requirements of reuse, adaptation, and presentation-neutral representation as presented in the previous section, there are still choices open as to how to achieve sufficient support for these requirements by a new data model. In the following, we take up the advanced requirements and discuss how we aim at supporting them in ZYX. With regard to the basic requirements, we present which underlying temporal and spatial model we selected and explain the interaction capabilities.
Presentation-neutral representation. For the supported degree of presentation neutrality of the multimedia document model, the semantic level of the model and the model's abstraction from the actual presentation are crucial. Therefore, we decided to develop a data model that describes a multimedia document on a high semantic level. This allows us a (lossy) export or conversion of our multimedia document into data models like MHEG-5, SMIL, and HTML. To keep the documents independent of the final realization within a multimedia presentation, the model strictly separates modeling of layout information from document structure. To be able to support a rich multimedia functionality, our model is designed to support as much of the multimedia functionality of these models as possible while still keeping a high semantic level.
Reuse. For the structure of the documents, we consider a hierarchical organization of the document as it can be found with XML-based document models. To achieve reuse on an arbitrary level of granularity, the model supports different granules of reusable components, i.e., media elements, document fragments, and entire documents. The model strictly separates modeling of layout information from structure to keep the documents independent of the final realization within a multimedia presentation. Due to this separation of layout information from structure, it is possible both to reuse just the structure and add new layout information to it, and to reuse the different granules directly with the layout information. Hence, the ZYX model supports structural and identical reusage of elements, fragments, and documents. For the selection and identification of the different granules, the model has the capability to annotate the granules with content-descriptive metadata.
Adaptation. With our document model we want to support comprehensive adaptation mechanisms. Adaptability of ZYX is not limited to adaptation to a pre-defined set of discriminating technical attributes that are exploited for adaptation, as can be found with SMIL, but can be specified by an open set of attributes that reflect a complex user and system context. The model offers the static modeling of "presentation alternatives" that can be exploited for adaptation to the different presentation contexts. Additionally, the model offers primitives that determine the needed presentation alternative only at the point in time when the document is actually requested and presented.
Temporal model. We decided to use an interval-based temporal model. In order to fulfil the important requirement to describe the temporal dimension of interaction, we selected the Interval Expressions [14] to form the basis of the underlying temporal model of the ZYX data model. Beyond other interval-based temporal models, it allows us to describe and relate time intervals which possibly have an
unknown duration, a feature that is important for interaction modeling. Selecting an interval-based temporal model does not conflict with the high semantic level of the document model, as an event-based or script-based temporal model would.
Spatial model. For the spatial layout we decided on a point-based description of each visual media entity in a multimedia document. Each visual media entity is assigned a 2-dimensional extension plus a third dimension to specify overlapping of visual media entities. So far, we do not consider the specification of spatial relationships between media entities like right-of or beside. As our model strictly separates structure and layout, and defines clear interfaces to add layout to structure, the model can, however, be extended by a more sophisticated spatial model later.
Interaction. Our model supports the two interaction types:
navigational/decision interactions and design interactions.
This means that our model provides a comprehensive support for these two interaction types comparable with the
interaction capabilities of MHEG-5, but more sophisticated
than those of SMIL.
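[Figure 2: two presentation elements, each drawn with a binding point, variables v1 ... vn (free or bound), and projector variables pv1 ... pvn.]
Fig. 2. Graphical representation of the basic document elements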
B. Basic Concepts of the ZYX Model
In this section, we present the terminology and the basic concepts of the ZYX model. The ZYX model describes a multimedia document by means of a tree. The nodes of the tree are the presentation elements and the edges of the tree bind the presentation elements together in a hierarchical fashion. Each presentation element has one binding point with which it can be bound to another presentation element. It also has one or more variables with which it can bind other presentation elements. Additionally, each presentation element can bind projector variables to specify the element's layout. Figure 2 introduces the graphical representation of these basic elements of the model, which we use in the following to illustrate the model's features. The presentation elements are represented by rectangles; they form the nodes of the document tree. On top of this rectangle, a diamond represents the element's binding point. The variables are represented by the filled circles below the rectangle. The open circles on the right side of each presentation element represent the element's projector variables. The actual connections of variables and projector variables to binding points of other presentation elements are represented by edges in the graphical representation. A variable that is connected to another presentation element is called a bound variable; those variables that are not connected are called free variables.
Presentation elements are the generic elements of the model. They can be media elements that represent the media data but also elements that represent the temporal, spatial, layout, and interactive semantic relationships between the elements of a multimedia document. Consider the simple document tree, a so-called ZYX fragment, in Figure 3. A temporal element, the sequential element seq, binds the media elements Image and Text to its variables v2 and v4, as well as a parallel element par to its variable v3. The par element again synchronizes a Video and an Audio which are bound to its variables v6 and v7. The presentation semantics of each fragment is that it starts with the presentation of the root element, here the sequential element. The specific presentation semantics of the seq element is that the elements bound to its variables v1, v2, v3, v4, and v5 are presented one after the other. That is, the element that will be bound to v1 is presented first, then the image bound to v2, then the par element and so on. The presentation semantics of the par element is that the video and the audio element bound to its variables v6 and v7 are presented in parallel. The sample fragment represents the media elements and the semantic relationships between the four media elements. With the seq element's binding point this fragment can be bound to another presentation element in a more complex multimedia document tree. The variables v1 and v5 of the fragment are still unbound. Here, an(other) author could later insert, e.g., a title at the beginning and a summary at the end of the sequence.
[Figure 3: a seq element with variables v1 to v5 that binds Text, Image, and par elements; the par element binds a Video and an Audio element to its variables v6 and v7.]
Fig. 3. Simple document tree: a ZYX fragment
We now explain the modeling capabilities of our model with regard to our specific requirements of reusability, adaptation, and presentation-neutral representation, as well as temporal and spatial modeling, and interaction.
Reusability. First, we describe the elements of ZYX that support different granularity of reusable components of multimedia documents.
Reusability on the level of media elements is supported by means of selector elements: These are presentation elements that determine what is presented, that is, which part of a media element. They can be used to select and thereby (re)use a specific part of an audio or a specific area of an image. To select a part of a continuous media element, the temporal selector temporal-s specifies start and duration of the selected sequence. Figure 4 illustrates the
usage and semantics of a temporal selector element: The
temporal selector selects a scene of a video of a duration of
40sec beginning with second 10 of the original video.
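[Figure 4: a temporal-s element with parameters [10,40] bound through its variable v1 to a Video element; on a time axis (t in sec, 0 to 50) the interval selected by temporal-s is marked within the video.]
Fig. 4. Temporal selector element temporal-s and its semantics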
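The semantics of the temporal selector can be sketched as follows. The class name `TemporalSelector`, its fields, and the assumed source length of 60 seconds are hypothetical choices for this illustration and are not taken from the formal ZYX framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalSelector:
    start: float      # seconds from the beginning of the source medium
    duration: float   # length of the selected sequence in seconds

    def interval(self, media_length: float) -> tuple:
        """Return the selected (begin, end) interval, validated against the source."""
        end = self.start + self.duration
        if self.start < 0 or self.duration <= 0 or end > media_length:
            raise ValueError("selection lies outside the source medium")
        return (self.start, end)


# The selector of the example: 40 seconds beginning with second 10.
scene = TemporalSelector(start=10, duration=40)
print(scene.interval(media_length=60))  # (10, 50)
```

Because the selector only stores start and duration, the same video can be reused in several documents with different selectors without copying or cutting the media data.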
To select a spatial fraction of a visual media element, the spatial selector specifies the selected area by a polygon. In Figure 5, a spatial selector spatial-s is applied to an image media element to select a rectangular area from the image.
The selectors can also be applied to fragments, e.g., to
select two minutes of an existing slide presentation or a
fraction of a composite visual element.
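For the rectangular case shown in Figure 5, a spatial selector could be sketched as below. The model allows general polygons; restricting the sketch to the parameters [x, y, w, h] is a simplification, and the class name, method name, and image size are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialSelector:
    x: int  # left edge of the selected area, relative to the image origin (0,0)
    y: int  # top edge of the selected area
    w: int  # width of the selected area
    h: int  # height of the selected area

    def region(self, image_width: int, image_height: int) -> tuple:
        """Return (left, top, right, bottom) of the selected area."""
        right, bottom = self.x + self.w, self.y + self.h
        if min(self.x, self.y) < 0 or self.w <= 0 or self.h <= 0:
            raise ValueError("invalid selection")
        if right > image_width or bottom > image_height:
            raise ValueError("selection exceeds the image")
        return (self.x, self.y, right, bottom)


detail = SpatialSelector(x=100, y=50, w=200, h=150)
print(detail.region(image_width=640, image_height=480))  # (100, 50, 300, 200)
```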
(0,0)
spatial-s
[x,y,w,h]
image part - selected by
spatial-s
fragment
seq
height
v1
of variables. In the example in Figure 6, dierent presentation elements of the fragment leave variables unbound,
which makes it a template as described above. Here also
the encapsulation of fragment by complex media elements
is of help: To make later \lling" of such templates easier,
a template can also be encapsulated. The free variables
of the fragment are exported and form the variables of the
complex media element. Figure 6 illustrates how a complex media element encapsulates a complex fragment. A
complex media element somehow is the black box view to
a possibly complex presentation fragment. The concepts of
free variables in combination with complex media element
guarantee comprehensive and workable reusability on the
level of presentation fragments.
Analogously, an external media element encapsulates a
specication of a fragment that was composed in another
external document format. This allows the inclusion of
existing documents of another document format into our
model. What, however, is encapsulated by the external
media element is dependent of the external document format.
Fragments and documents: And, of course, fragments
entire documents can be reused by binding the root element
of the document to a free variable in another document.
(x,y)
v1
v2
v3
v4
v5
h
Image
w
seq
par
Image
v7
v6
v8
v9
width
Fig. 5. Spatial selector element spatial-s and its semantics
Audio
encapsulate
Reuse is also supported on the level of fragments. Here
templates, complex media elements, and external media elements provide for the support of reusability of fragments:
Templates: In the ZYX model, not all of the variables of
a presentation element must be bound at authoring time.
In Figure 3 the variables v1 and v5 , e.g., the title and the
summary of the presentation are still unbound. This means
that the sequence element seq can later be completed by
binding presentation elements to the free variables v1 and
v5 . This makes the simple fragment in Figure 3 a \template" for later (re)use. This is an important feature for
building reusable fragments that can be applied in dierent multimedia documents by (a kind of late) binding of
the free variables dierently corresponding to the current
context.
Complex and external media elements: It is of course
possible to form more complex fragments like the one
shown in Figure 6. To make reuse more easy and make it
easier to handle large documents fragments can be encapsulated by complex media elements. Then, an encapsulated
fragment appears like a single presentation element in the
specication tree with one binding point and possibly a set
Fig. 6. Complex fragment encapsulated in a complex media element
With regard to the kind of reuse, the model supports
both identical and structural reuse. Besides the
selector elements, the ZYX data model therefore offers projector elements that influence the visual and audible layout in a presentation of a multimedia document. Projector elements
determine how a media element or a fragment is presented.
They determine, for example, the presentation speed of a
video or the spatial position of an image on the screen.
Projectors are bound to the projector variables of presentation elements. Each presentation element can have one or
more projector variables to which projectors can be bound.
A projector applies not only to the presentation element it
is bound to but also to its subtree. Because projectors can be nested arbitrarily, authoring tools should provide support for
consistency checking to avoid contradicting layout specifications.
Figure 7 illustrates the usage of projector elements and
the separation of structure and layout. In this example, a
fragment defines the parallel presentation of an audio and
a video. Two projector elements are bound to the parallel element par, a spatial projector spatial-p and an acoustic
projector acoustic-p. Each of the projectors applies only
to those elements in the same tree that can be affected
by it. Therefore, the spatial projector affects the spatial
layout of the video. The acoustic projector applies to the
audio element and determines the volume, bass, treble, and balance for presentation. By changing or adding projector elements one can change the layout of the document.
This allows for reusability of the same structure with different presentation layouts, i.e., it implements structural reuse.
This follows the idea of separating structure from layout
information as can be found with SGML and XML and
also complies with our requirement for presentation-neutral
representation of the documents.
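The way projectors reach down into a subtree can be illustrated with a small Python sketch. It is not part of the ZYX model itself: the nested-dictionary representation, the function name effective_layout, and the table APPLIES_TO (which media types each projector can affect) are assumptions made for illustration only. The projector parameters are the ones shown in Figure 7 and are passed through uninterpreted.

```python
# Hypothetical table: which media types a projector type can affect (assumption).
APPLIES_TO = {
    "spatial-p": {"Video", "Image", "Text", "Animation"},
    "acoustic-p": {"Audio"},
}


def effective_layout(node, inherited=()):
    """Collect, for every leaf media element, the projectors that affect it.

    A projector bound to an element is inherited by the whole subtree, but it
    only takes effect on elements of a media type it can influence.
    """
    projectors = tuple(inherited) + tuple(node.get("projectors", ()))
    children = node.get("children")
    if not children:
        applicable = [p for p in projectors if node["type"] in APPLIES_TO[p[0]]]
        return {node["name"]: applicable}
    layout = {}
    for child in children:
        layout.update(effective_layout(child, projectors))
    return layout


# Structure of Figure 7: a par element with two projectors and two media elements.
figure7 = {
    "type": "par",
    "projectors": [("spatial-p", (10, 10, 30, 30)), ("acoustic-p", (20, 0, 0, 0))],
    "children": [
        {"type": "Video", "name": "video"},
        {"type": "Audio", "name": "audio"},
    ],
}
layout = effective_layout(figure7)
```

Replacing the two projector entries while leaving the par subtree untouched yields a different layout for the same structure, which is the structural reuse described above.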
Fig. 7. Simple fragment with spatial and acoustic projector elements and their semantics
As we have outlined in the requirements, reuse needs support for identification and selection of the multimedia content to be reused; hence, metadata is needed. Therefore, each ZYX fragment is assigned a set of metadata that describes its content by means of attribute-value pairs.
Adaptation.
Adaptation means that the ZYX document that is delivered for presentation should best match the context of the user who requested the document. To support this kind of adaptation, both a description of the user context and a multimedia document that can be adapted to this context are needed.
The context of a user is captured in a so-called user profile, i.e., metadata that describes the user's topics of interest, presentation system environment, network connection characteristics and the like. This metadata is organized as key-value pairs just like the metadata that is assigned to the multimedia content.
The ZYX data model provides two presentation elements for adapting the document to a user profile: the switch element and the query element. The switch element allows the author to specify different alternatives for a specific part of the document. Each of the alternatives under a switch element is associated with metadata that describes the context in which this specific alternative is the best choice for presentation. This metadata is specified as a set of discriminating attribute-value pairs for each alternative. During presentation, the user profile is evaluated against the metadata of the switch, and the alternative whose discriminating attributes best match the current user profile is selected for presentation. An illustration of the switch element is given in Figure 8. The switch element specifies two presentation alternatives: the first alternative, bound to v1, is associated with a seminar-like teaching style (type, seminar) and the second one, bound to v2, with a lecture-like type of teaching (type, lecture). When the document is presented, either the left or the right subtree is presented, depending on the preferred type of teaching that is reflected in the user's current profile. As the switch element can specify an arbitrary number of alternatives, each of which is described by an arbitrary number of attribute-value pairs, it provides a very comprehensive extent of adaptability: almost every aspect of a user and the environment can be distinguished and later be evaluated for adaptation during presentation.
Fig. 8. Specification of presentation alternatives with the switch element
A switch element can be used only if all alternatives can be modeled at authoring time, in advance of the presentation. Hence, the switch element implements the requirement for static adaptability of the model. However, there might be the case that an author cannot or does not want to exactly specify a part of the presentation but only describe the desired fragments and defer the actual selection of suitable fragments to the point in time when the document is requested for presentation. For example, an author might wish to specify that at a specific point in the presentation about "cardiac surgery" a digression into physiology is to be made; however, the author does not want to specify which fragments are relevant to this but have the most suitable one selected out of a pool of available fragments just before presentation. This can be specified with a query element. By means of metadata the query represents the fragment that is expected at this point in the presentation. When the document is selected for presentation, the query element is evaluated and the element is replaced by the fragment best matching the metadata given by the query element. An illustration of the query element is given in Figure 9. The sample query element is the placeholder for the fragment best matching the query with topic "physiology in cardiac surgery", of type "lecture" and of 5 minutes duration. The more metadata tuples are used, the more specific the query is. The query element provides for the dynamic adaptability of the model, as the evaluation of the query and the selection of the fragment take place just before presentation.
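The selection performed by the switch and query elements can be sketched in Python as follows. The text only requires that the "best matching" alternative or fragment is chosen; the scoring rule used here (counting attribute-value pairs that agree) is an assumption, as are the function names and the sample fragment pool.

```python
def match_score(required, offered):
    """Count the attribute-value pairs of `required` that `offered` satisfies.

    Assumption: the paper does not define the matching function, so a plain
    count of agreeing pairs stands in for "best match".
    """
    return sum(1 for key, value in required.items() if offered.get(key) == value)


def select_alternative(alternatives, user_profile):
    """Static adaptation (switch): pick the subtree whose metadata fits the profile best."""
    return max(alternatives, key=lambda alt: match_score(alt[0], user_profile))[1]


def evaluate_query(query, fragment_pool):
    """Dynamic adaptation (query): pick the fragment whose metadata fits the query best."""
    return max(fragment_pool, key=lambda item: match_score(query, item[0]))[1]


# Switch element of Figure 8: two alternatives, bound to v1 and v2.
switch = [({"type": "seminar"}, "subtree at v1"), ({"type": "lecture"}, "subtree at v2")]
profile = {"type": "lecture", "network": "modem"}  # hypothetical user profile
chosen = select_alternative(switch, profile)

# Query element of Figure 9, evaluated against a hypothetical pool of fragments.
query = {"topic": "physiology in cardiac surgery", "type": "lecture", "duration": "5min"}
pool = [
    ({"topic": "physiology in cardiac surgery", "type": "seminar", "duration": "5min"}, "fragment A"),
    ({"topic": "physiology in cardiac surgery", "type": "lecture", "duration": "5min"}, "fragment B"),
    ({"topic": "anatomy", "type": "lecture", "duration": "20min"}, "fragment C"),
]
replacement = evaluate_query(query, pool)
```

Both functions share the same matching step; they differ only in when the candidates are known: the switch carries its alternatives in the document, whereas the query is resolved against whatever fragments are available when the document is requested.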
Fig. 9. Specification of presentation alternatives with the query element: evaluation of the query element and replacement by the selected ZYX fragment
Presentation-neutral representation.
The requirement of presentation-neutral representation is strongly interrelated with structural reuse (see also Figure 7). The explicit separation of structure and layout allows for presentation-neutral representation. As outlined before, the variables of a presentation element need not be bound in the first place; this also applies to the projector variables. It is possible to specify the presentation-neutral course of the presentation and, later, bind the presentation-dependent layout just when the document is selected for presentation. Then the presentation-neutral structure of the document is bound via projector variables to the presentation-dependent layout defined by a set of projectors.
Temporal and spatial modeling. Based on the Interval Expressions [14], the model offers the primitives seq, par, loop, and delay to specify temporal interval relationships. These presentation elements can be nested to specify any arbitrary temporal course of the multimedia presentation.
For the spatial model we use the spatial projectors as presented above. They realize the absolute positioning we decided to use for the ZYX model. A spatial projector determines the spatial layout of the presentation element it is applied to, and the layout applies to the entire subtree of the presentation element.
Interaction. The requirement to support the modeling of interactive multimedia presentations is met by the data model's interaction elements. The model offers two types of interaction elements, navigational interactive elements and design interactive elements. The basic navigational element is the genericLink element, which specifies the transition from the document to an arbitrary link target. Note that this element is not interactive. Based on the genericLink, the menu element supports the interactive selection of one out of a set of visual elements and follows the presentation path that is associated with the selected element. The elements hotspot and hypertext define fine-grained interactive visual areas in images and text. The design interactive elements are the interactive versions of the projector elements. For example, for the typographic projector, which specifies font, size, and style of a text, the interactive typographic projector element specifies that these settings can be altered interactively when the document is presented.
IV. Formal Framework of the ZYX Model
In this section, we present the formal framework of the ZYX model. We first introduce the basic terminology and formalism of the basic elements of the model and then present the elements for modeling the temporal course, the layout, interaction, and the adaptation of the presentation. Figure 10 gives an overview of the definitions to follow. They are listed along the requirements and design criteria presented in Section II, which were used for the comparison of document models illustrated in Figure 1.
A. Basic Terminology
The presentation elements are the generic elements of the ZYX model. Each presentation element $p$ has assigned exactly one binding point $b_p$. This is the connector with which a presentation element can be bound to another presentation element. A presentation element furthermore has 0 to n variables $v$, which are used to bind other presentation elements to it. To add layout information to a presentation element, it can optionally have 0 to n projector variables $pv$ that can be used to bind projector elements to the element. The projector variables are treated separately because structure and layout are kept separate.
The symbols introduced in Definition 1 are used in the definitions to follow.
Definition 1 (Symbols)
Let $B$ denote the set of all binding points, $VAR$ the set of all variables, $PVAR$ the set of all projector variables, $T$ the set of all element types, $MT$ the set of media types, $MED$ the set of all raw media data, $ZYXDOC$ the set of all ZYX documents, $EXT$ the set of multimedia documents in an external document format, $PT \subseteq OT$ the set of all projector element types, $ATTRIBUTES$ the set of all possible attribute names, and $COLORS$ the set of all possible colors.
A presentation element $p$ is defined as follows:
Definition 2 (Presentation element)
A presentation element $p$ is a tuple $p : [t_p, b_p, V_p, PV_p]$ with $t_p \in T$ denoting the type of $p$, $b_p \in B$ denoting the binding point of $p$, $V_p \subseteq VAR$ denoting the set of variables of $p$, and $PV_p \subseteq PVAR$ denoting the set of projector variables of $p$. The tuple $p$ can be augmented with further tuple elements depending on the type $t_p$ of the presentation element.
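As an illustration of Definition 2, the tuple of a presentation element can be mirrored by a small Python class. This is only a sketch: the class name, the dictionary representation of variables, and the boolean flag that stands in for the binding point are assumptions, not part of the formal model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class PresentationElement:
    """Hypothetical rendering of the tuple [t_p, b_p, V_p, PV_p]."""

    type: str  # t_p
    variables: Dict[str, Optional["PresentationElement"]] = field(default_factory=dict)  # V_p
    projector_variables: Dict[str, Optional["PresentationElement"]] = field(default_factory=dict)  # PV_p
    bound: bool = False  # whether the single binding point b_p is already in use

    def bind(self, variable: str, child: "PresentationElement") -> None:
        """Connect one (projector) variable of this element to the binding point of child."""
        slots = self.variables if variable in self.variables else self.projector_variables
        if variable not in slots:
            raise KeyError(variable)
        if slots[variable] is not None:
            raise ValueError("variable is already bound")
        if child is self or child.bound:
            raise ValueError("binding point is not available")
        slots[variable] = child
        child.bound = True

    def free_variables(self) -> List[str]:
        """Variables that are still unbound, i.e. the slots of a template."""
        return [name for name, child in self.variables.items() if child is None]


# A sequence element with three variables of which only the middle one is bound.
seq = PresentationElement("seq", variables={"v1": None, "v2": None, "v3": None})
video = PresentationElement("Video")
seq.bind("v2", video)
template_slots = seq.free_variables()
```

Here seq acts as a template in the sense described earlier: v1 and v3 remain free and can be bound later.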
A presentation element $p$ can be an atomic media element, a complex media element, an external media element, or a specific operator element that builds up the temporal, structural, and interactive relationships or serves for the specification of adaptation. These kinds are distinguished by the type $t_p$ in the definition of a presentation element $p$.

Fig. 10. Summary of definitions of ZYX elements
Label | Name | Description | Definition No.
Basic Primitives
p | presentation element | generic element of the ZYX model | 2
c | connection | interconnecting presentation elements | 4
Basic Elements
am | atomic media element | represents a media element | 3
cm | complex media element | encapsulates a ZYX fragment | 6
em | external media element | encapsulates an external fragment in another document format | 7
Temporal Model
par | parallel operator element | specification of parallel presentations | 8
seq | sequential operator element | specification of sequential presentations | 9
loop | loop operator element | specification of repetitive presentations | 10
delay | delay operator element | specification of a temporal gap | 11
Spatial Model
spatial-p | spatial projector element | projects the visual presentation to a rectangular area | 16
Interaction: Navigational
gen_link | generic link element | specifies the non-interactive transition to a target element or document | 20
ZYX_link | ZYX link element | specifies the non-interactive transition to a ZYX document | 21
menu | menu interaction element | specifies an interactive menu for selection of a presentation path | 22
hotspot | hotspot interaction element | specifies an interactive region in a visual element | 23
hypertext | hypertext interaction element | specifies an interactive text region | 24
Interaction: Design
temporal-pi | temporal interactive projector element | specifies the interactive adjustment of speed and direction | 25
spatial-si | spatial interactive selector element | specifies the interactive scaling of a visual element | 26
Reusability: Granularity, Media Elements
am | atomic media element | represents a media element | 3
temporal-s | temporal selector element | selects a temporal part of a continuous media element or fragment | 13
spatial-s | spatial selector element | selects a visual area of a visual media element or fragment | 14
textual-s | textual selector element | selects a continuous text passage of a text element | 15
Reusability: Granularity, Fragments and Documents
f = (P, C, M) | fragment | fragment specification | 5
cm | complex media element | encapsulates a ZYX fragment | 6
em | external media element | encapsulates an external fragment | 7
Reusability: Kind of Reusage
spatial-p | spatial projector element | projects the visual presentation to a rectangular area | 16
temporal-p | temporal projector element | determines the playback direction and speed factor of the presentation | 17
acoustic-p | acoustic projector element | determines the volume, balance, bass, and treble of the presentation | 18
typographic-p | typographic projector element | determines the font, size, style, color, etc. of the presentation | 19
Reusability: Identification/Selection
f = (P, C, M) | fragment | fragment specification with metadata for identification and selection | 5
query | query element | specifies a query for a presentation fragment | 30
Adaptation: Parameters of Adaptability (User Interest, Technical Infrastructure)
switch | switch element | specifies presentation alternatives for continuous adaptation | 28
decide | decide element | specifies presentation alternatives for adaptation | 29
query | query element | specifies a query for a presentation fragment | 30
Adaptation: Definition of Alternatives, Static
switch | switch element | specifies presentation alternatives for continuous adaptation | 28
decide | decide element | specifies presentation alternatives for adaptation | 29
Adaptation: Definition of Alternatives, Dynamic
query | query element | specifies a query for a presentation fragment | 30
Presentation-neutral Representation
Multimedia Functionality: the model provides a comprehensive set of elements for providing high multimedia functionality
Semantic Level: the model separates structure from layout by separating structural composition from projectors

The basic units of a ZYX multimedia document are the atomic media elements. An atomic media element is an instantiation of a media type. An atomic media element in our model abstracts from the raw media data and just represents the media element and its media-specific characteristics. The formal definition of an atomic media element is given in Definition 3.
Definition 3 (Atomic media element)
An atomic media element $am : [t_{am}, b_{am}, V_{am}, PV_{am}, m]$ is a presentation element with $t_{am} \in MT = \{Audio, Video, Image, Text, Animation\} \subseteq T$, $V_{am} = \emptyset$, and $m \in MED$ denoting the media data represented by $am$.
Presentation elements are interconnected using their variables and binding points. Each variable and also each projector variable of a presentation element can be bound to exactly one binding point of another presentation element. Each binding point of a presentation element can be bound to exactly one variable or projector variable of another presentation element. A connection binds one variable to a binding point and is formally defined in Definition 4:
Definition 4 (Connection)
A connection $c = [v, b_{p'}]$ connects the (projector) variable $v \in V_p \cup PV_p$ of a presentation element $p$ with the binding point $b_{p'}$ of a presentation element $p' \neq p$.
The result of interconnecting presentation elements is a specification tree that describes a reusable fragment of a multimedia document. A fragment can comprise a single media element, a part of a multimedia document, or an entire multimedia document. The formal description of a valid fragment is given in Definition 5.
Definition 5 (Fragment)
A fragment $f = (P, C)$ is an acyclic, undirected graph that describes a part of or an entire multimedia document with:
$P$ the set of presentation elements that are part of the tree.
$C \subseteq \{[v, b_{p'}] \mid p, p' \in P, p \neq p', v \in V_p \cup PV_p\}$ the set of connections in the tree.
For a valid fragment $f = (P, C)$ the following conditions must hold:
1. If $c_1, c_2 \in C$, $c_1 = [v_1, b_p]$, $c_2 = [v_2, b_p]$, $p \in P$ then $v_1 = v_2$, i.e., each binding point can be bound to only one variable.
2. If $c_1, c_2 \in C$, $p, p' \in P$ and $c_1 = [v, b_p]$, $c_2 = [v, b_{p'}]$ then $p = p'$, i.e., each variable can be bound to only one binding point.
3. $Unbound_f = \{p \in P \mid \neg\exists v \in \bigcup_{p' \in P} V_{p'} : [v, b_p] \in C\}$ and $|Unbound_f| = 1$, $root_f \in Unbound_f \wedge t_{root_f} \notin PT$.
There is exactly one presentation element $p \in P$ of the fragment $f$ that is not bound to any other presentation element. This unbound presentation element is called the root element, denoted $root_f$, of the fragment and has the binding point $b_{root_f}$ that forms the "entry point" of the fragment; note that projector elements cannot be root elements.
4. There is no sequence of connections $c_1, \ldots, c_n$ such that $c_i = [v_i, b_{p_i}]$, $i = 1, \ldots, n-1$, with $v_{i+1} \in V_{p_i}$, and $v_1 \in V_{p_n}$. This means that $f$ is acyclic.
5. $\forall pv \in \bigcup_{p \in P} PV_p : \exists [pv, b_{p'}] \in C \Rightarrow t_{p'} \in PT$.
Projector variables of a presentation element can bind only projector elements.
6. $\forall v \in \bigcup_{p \in P} V_p : \exists [v, b_{p'}] \in C \Rightarrow t_{p'} \notin PT$.
Variables of a presentation element cannot bind projector elements.
7. $\forall p \in P : t_p \in PT \Rightarrow V_p = PV_p = \emptyset$.
A projector element cannot bind any other presentation element.
Fragments form the building blocks of a multimedia document. They are the units that can be reused and recomposed in different multimedia documents. To ease this reuse of ZYX fragments, we introduce the definition of a complex media element. A complex media element $cm$ encapsulates a fragment $f = (P, C)$ within the definition of a presentation element, somewhat like a container. With this definition, an encapsulated fragment can simply be reused like a single presentation element in any other fragment. A complex media element $cm$ is defined as follows (Definition 6):
Definition 6 (Complex media element)
A complex media element $cm : [t_{cm}, b_{cm}, V_{cm}, PV_{cm}, f]$ is a presentation element that encapsulates the fragment $f = (P, C)$ with $t_{cm} = Complex \in T$, $b_{cm} = b_{root_f}$, $V_{cm} = \{v \in \bigcup_{p \in P} V_p \mid \forall q \in P : [v, b_q] \notin C\}$, and $PV_{cm} = \{pv \in \bigcup_{p \in P} PV_p \mid \forall q \in P : [pv, b_q] \notin C\}$.
That is, the binding point $b_{root_f}$ of the root of the encapsulated fragment $f$ becomes the binding point $b_{cm}$ of the complex media element $cm$. All variables and all projector variables in the fragment $f$ that are not bound are exported and form the free variables $V_{cm}$ and projector variables $PV_{cm}$ of the complex media element. For an illustration recall Figure 6: The binding point of the seq element becomes the binding point of the complex media element, and the unbound variables $v_1$, $v_6$, $v_8$, $v_9$, and $v_5$ become the free variables of the complex media element.
As complex media elements encapsulate ZYX fragments, they offer a means of abstraction. The export of free variables allows for a later completion of the complex media element. Hence, complex media elements can form templates which can be "filled" later by binding media elements, other complex media elements, and fragments to the free variables. This "late binding" of presentation elements to the free variables finally instantiates the actual ZYX document.
To encapsulate fragments that are specified in an external format we define external media elements (Definition 7). An external media element $em$ is also a complex media element. It encapsulates, however, not a fragment specified in ZYX, but the specification of an external fragment available in another data model. Like the complex media element, the external media element has assigned a set of variables $V_{em}$, projector variables $PV_{em}$, and one binding point $b_{em}$. However, the meaning of the variables and projector variables depends on the external document format.
Definition 7 (External media element)
An external media element $em : [t_{em}, b_{em}, V_{em}, PV_{em}, f]$ is a presentation element that encapsulates the fragment $f \in EXT$ with $t_{em} = External \in T$, $b_{em}$ the binding point of the external fragment, $V_{em}$ the variables of the external fragment, and $PV_{em}$ the projector variables of the external fragment.
With the definitions given so far it is possible to compose presentation elements by means of connections. The interconnection of presentation elements via their variables and binding points puts these presentation elements in a relationship; the semantics of this relationship, however, is not yet defined. Therefore, our data model offers different types of presentation elements, the operator elements, with which presentation elements can be interconnected with a certain semantics.
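The validity conditions of Definition 5 and the export of free variables in Definition 6 can be checked mechanically. The following Python sketch assumes a hypothetical representation: elements is a dictionary from element names to their type and variable sets, connections is a set of (variable, target element) pairs, and variable names are unique across the fragment. In the root test it treats an element bound via a projector variable as bound as well, which is an interpretation rather than a literal transcription of condition 3.

```python
PROJECTOR_TYPES = {"spatial-p", "temporal-p", "acoustic-p", "typographic-p"}


def owner(elements, variable):
    """Name of the element that owns the given (projector) variable."""
    for name, element in elements.items():
        if variable in element["vars"] or variable in element["pvars"]:
            return name
    raise KeyError(variable)


def is_valid_fragment(elements, connections):
    sources = [v for v, _ in connections]
    targets = [t for _, t in connections]
    # Conditions 1 and 2: one variable per binding point, one binding point per variable.
    if len(set(sources)) != len(sources) or len(set(targets)) != len(targets):
        return False
    # Condition 3: exactly one unbound element (the root), and it is not a projector.
    unbound = [name for name in elements if name not in targets]
    if len(unbound) != 1 or elements[unbound[0]]["type"] in PROJECTOR_TYPES:
        return False
    # Conditions 5 and 6: projector variables bind projectors, variables bind everything else.
    for variable, target in connections:
        via_pvar = variable in elements[owner(elements, variable)]["pvars"]
        if via_pvar != (elements[target]["type"] in PROJECTOR_TYPES):
            return False
    # Condition 7: projector elements have no variables of their own.
    for element in elements.values():
        if element["type"] in PROJECTOR_TYPES and (element["vars"] or element["pvars"]):
            return False
    # Condition 4: with a single root and one parent per element, the fragment is
    # acyclic exactly if every element is reachable from the root.
    children = {}
    for variable, target in connections:
        children.setdefault(owner(elements, variable), []).append(target)
    seen, stack = set(), [unbound[0]]
    while stack:
        name = stack.pop()
        seen.add(name)
        stack.extend(children.get(name, []))
    return seen == set(elements)


def free_variables(elements, connections):
    """Exported sets V_cm and PV_cm of Definition 6: everything that is not bound."""
    bound = {v for v, _ in connections}
    v_cm = {v for e in elements.values() for v in e["vars"]} - bound
    pv_cm = {pv for e in elements.values() for pv in e["pvars"]} - bound
    return v_cm, pv_cm


# Hypothetical fragment: seq -> par -> video, with a spatial projector on par.
elements = {
    "seq": {"type": "seq", "vars": {"v1", "v2", "v3"}, "pvars": set()},
    "par": {"type": "par", "vars": {"v4", "v5"}, "pvars": {"pv1"}},
    "video": {"type": "Video", "vars": set(), "pvars": set()},
    "sp": {"type": "spatial-p", "vars": set(), "pvars": set()},
}
connections = {("v2", "par"), ("v4", "video"), ("pv1", "sp")}
valid = is_valid_fragment(elements, connections)
exported = free_variables(elements, connections)
```

In this example v1, v3, and v5 are exported, so a complex media element wrapping the fragment would offer exactly these three free variables.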
In the following, we present the element definitions of temporal operators, projectors, selectors, interaction elements, and adaptation elements. These elements determine the semantics that have to be interpreted by a presentation environment and mapped into the spatial, temporal, structural, interaction, and adaptive domain of a multimedia presentation. The different operator elements are defined in the tuple notation as already introduced for the generic presentation element. Again, the type distinguishes the different operator elements. For the different elements the tuple carries additional operator type-specific values that characterize the element's specific semantics. To avoid repetition in the definitions to follow, only the domains of the newly introduced tuple elements are given.
B. Temporal Operator Elements
The temporal operator elements determine the temporal relationships between the presentation elements. As outlined above, our temporal model is based on Interval Expressions [14]. In the following, we present the definition of the temporal operator elements par, seq, loop, and delay, their specific parameters, and semantics. An illustration of these temporal operator elements is shown in Figure 11.
The presentation semantics of the par operator element (Definition 8) is that the presentation elements bound to its variables are to be presented in parallel.
Definition 8 (Temporal operator element: par)
The temporal operator element $par : [t_{par}, b_{par}, V_{par}, PV_{par}, finish, lipsync]$ is a presentation element with $t_{par} = Par \in OT$, $V_{par} = \{v_1, \ldots, v_n\} \subseteq VAR$, $finish \in \{1, \ldots, n, min, max\}$, and $lipsync \in \mathbb{N}_0$.
The par operator element offers the two parameters $finish$ and $lipsync$ to control the synchronization of the parallel presentation. The parameter $finish$ determines which one of the n presentation elements bound to $v_1, \ldots, v_n$ terminates the parallel presentation. If $finish$ is set to min or max, the parallel presentation stops when the element with the minimal or, respectively, the maximal presentation time stops. By setting $finish = i$, $i \in \{1, \ldots, n\}$, the presentation stops when the presentation of the dedicated presentation element bound to $v_i$ stops. The second parameter $lipsync$ determines the element that forms the master of a continuous fine synchronization during playout of the par operator. If $lipsync$ equals 0, no lip synchronization is specified. If the value of $lipsync$ is $i$, $i > 0$, the presentation of the presentation elements bound to $v_1, \ldots, v_n$ is carried out in lip synchronization and the presentation element bound to $v_i$ forms the master of this synchronization.
The presentation semantics of the seq operator element (Definition 9) is that the presentation elements that are bound to it are presented in sequence. The presentation of a seq operator element starts the sequential presentation of the presentation elements that are bound to the variables $v_i$, $i = 1, \ldots, n$, in the order $v_1, v_2, \ldots, v_n$. The presentation of the seq operator element begins with the presentation of the presentation element bound to $v_1$ and ends with the end of the presentation of the element bound to $v_n$.
Definition 9 (Temporal operator element: seq)
The temporal operator element $seq : [t_{seq}, b_{seq}, V_{seq}, PV_{seq}]$ is a presentation element with $t_{seq} = Seq \in OT$, and $V_{seq} = \{v_1, \ldots, v_n\} \subseteq VAR$.
The presentation semantics of the loop operator element (Definition 10) is that its presentation starts the repeated presentation of the single presentation element bound to $v \in V_{loop}$. The presentation is repeated $r$ times and stops after the $r$th presentation of the presentation element. If $r$ is set to $\infty$, the presentation of the element loops forever.
Definition 10 (Temporal operator element: loop)
The temporal operator element $loop : [t_{loop}, b_{loop}, V_{loop}, PV_{loop}, r]$ is a presentation element with $t_{loop} = Loop \in OT$, $|V_{loop}| = 1$, and $r \in \mathbb{N} \cup \{\infty\}$.
The delay operator element (Definition 11) models a temporal delay of $t$ milliseconds. It can be seen as an "empty" media element that is presented for a duration of $t$ milliseconds.
Definition 11 (Temporal operator element: delay)
The temporal operator element $delay : [t_{delay}, b_{delay}, V_{delay}, PV_{delay}, t]$ is a presentation element with $t_{delay} = Delay \in OT$, $V_{delay} = \emptyset = PV_{delay}$, and $t \in \mathbb{N}$.
Fig. 11. Fragment illustrating the usage of the temporal operator elements
Figure 11 illustrates the different temporal operators defined above. The loop element that forms the root of the sample fragment specifies that the subtree is repeated 10 times. This subtree comprises a sequence of two videos with accompanying texts, each followed by a short temporal gap of 50 ms for the transition.
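The combined semantics of par, seq, loop, and delay can be made concrete by computing the presentation duration of a fragment. The sketch below is an illustration under assumptions: the tuple encoding of the elements, the function name duration, the media durations, and the choice finish = max for both par elements are hypothetical and not taken from Figure 11. Lip synchronization does not influence the duration and is left out.

```python
import math


def duration(node):
    """Presentation duration in milliseconds of a temporal specification tree."""
    kind = node[0]
    if kind == "media":  # ("media", name, duration_ms)
        return node[2]
    if kind == "delay":  # ("delay", t)
        return node[1]
    if kind == "seq":  # ("seq", [children]): children are presented one after another
        return sum(duration(child) for child in node[1])
    if kind == "loop":  # ("loop", r, child): r repetitions, math.inf loops forever
        return math.inf if node[1] == math.inf else node[1] * duration(node[2])
    if kind == "par":  # ("par", finish, [children])
        durations = [duration(child) for child in node[2]]
        finish = node[1]
        if finish == "min":
            return min(durations)
        if finish == "max":
            return max(durations)
        return durations[finish - 1]  # finish = i: the element bound to v_i ends the par
    raise ValueError(f"unknown element type: {kind}")


# Structure described for Figure 11, with assumed media durations.
figure11 = ("loop", 10, ("seq", [
    ("par", "max", [("media", "Video 1", 4000), ("media", "Text 1", 3000)]),
    ("delay", 50),
    ("par", "max", [("media", "Video 2", 6000), ("media", "Text 2", 1000)]),
    ("delay", 50),
]))
total_ms = duration(figure11)
```

With these assumed durations one pass through the sequence takes 4000 + 50 + 6000 + 50 = 10100 ms, so the ten repetitions last 101000 ms.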
C. Selectors
The model offers selector elements to reuse parts of media elements and fragments, i.e., spatial regions and temporal intervals.
First, Definition 12 introduces the notion of a successor in a fragment, which is needed for subsequent definitions.
Definition 12 (Successor)
Let $F$ denote the set of all fragments. We then define a function $expand : F \rightarrow F$ that computes for a fragment $f$ the fragment that is semantically equivalent to $f$ but does not contain any complex media element. The function $expand(f)$ recursively replaces each complex media element in $f$ by the fragment that the complex media element encapsulates.
Let $f \in F$ be a fragment, $expand(f) = (P, C)$ the expanded fragment, and $p, p' \in P$ presentation elements. Then the following direct and indirect successor relationships hold:
1. $p'$ is direct successor of $p$ $\iff \exists [v, b_{p'}] \in C : v \in V_p$.
2. $p'$ is indirect successor of $p$ $\iff$ $p'$ is not a direct successor of $p$ and there exists a sequence $succ_1, \ldots, succ_n$, $n \in \mathbb{N}$, where $succ_1$ is direct successor of $p$, $succ_i$ is direct successor of $succ_{i-1}$, $i = 2, \ldots, n$, and $p'$ is direct successor of $succ_n$.
3. $p'$ is successor of $p$ $\iff$ $p'$ is direct or indirect successor of $p$.
For example, in Figure 11 the seq element is a direct
successor of the loop element. The video and text elements
are indirect successors of the loop element and direct
successors of their respective par element. There is no successor
relationship between image and the audio media element.
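The successor relations of Definition 12 amount to a reachability computation over the connections. The following Python sketch assumes that the fragment is already expanded, that connections are given as (variable, target element) pairs, and that a hypothetical dictionary owner_of maps each variable to the element it belongs to; the element and variable names are illustrative.

```python
def direct_successors(connections, owner_of, p):
    """Elements bound to one of the variables of p."""
    return {target for variable, target in connections if owner_of[variable] == p}


def successors(connections, owner_of, p):
    """All direct and indirect successors of p."""
    found = set()
    frontier = direct_successors(connections, owner_of, p)
    while frontier:
        element = frontier.pop()
        if element not in found:
            found.add(element)
            frontier |= direct_successors(connections, owner_of, element)
    return found


def indirect_successors(connections, owner_of, p):
    return successors(connections, owner_of, p) - direct_successors(connections, owner_of, p)


# Tree shaped like the fragment described for Figure 11 (names are assumptions).
owner_of = {"v0": "loop", "v1": "seq", "v2": "seq", "v3": "seq", "v4": "seq",
            "v5": "par1", "v6": "par1", "v7": "par2", "v8": "par2"}
connections = {("v0", "seq"), ("v1", "par1"), ("v2", "delay1"), ("v3", "par2"),
               ("v4", "delay2"), ("v5", "video1"), ("v6", "text1"),
               ("v7", "video2"), ("v8", "text2")}
loop_direct = direct_successors(connections, owner_of, "loop")
loop_indirect = indirect_successors(connections, owner_of, "loop")
```

The expansion of complex media elements by expand(f) is omitted here; it would have to run before these functions are applied.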
Now we can define the different selector elements, the temporal selector, spatial selector, textual selector, and the acoustic selector. A temporal selector temporal-s (Definition 13) is a presentation element that can bind exactly one other presentation element $p$. The presentation semantics of this element is that the presentation of the direct and indirect successors of $p$ is started $start$ milliseconds after the original starting point of the fragment and lasts for $duration$ milliseconds.
Definition 13 (Temporal selector element: temporal-s)
The temporal selector element $temporal\text{-}s : [t_{temporal-s}, b_{temporal-s}, V_{temporal-s}, PV_{temporal-s}, start, duration]$ is a presentation element with $|V_{temporal-s}| = 1$, $t_{temporal-s} = Temporal\text{-}S \in OT$, and $start, duration \in \mathbb{N}_0$.
A spatial selector element spatial-s (Definition 14) can bind exactly one other presentation element $p$, which can be a visual media element like an image or a video but also a complex media element with visual appearance. The spatial selector selects a spatial area from $p$. The presentation semantics of the spatial selector is that only those visual parts of $p$ and its successors are presented that are visible within the rectangular area specified by the element's parameters $x$, $y$, $width$, and $height$. For an illustration of the spatial selector see Figure 5.
Definition 14 (Spatial selector element: spatial-s)
The spatial selector element $spatial\text{-}s : [t_{spatial-s}, b_{spatial-s}, V_{spatial-s}, PV_{spatial-s}, x, y, width, height]$ is a presentation element with $t_{spatial-s} = Spatial\text{-}S \in OT$, $|V_{spatial-s}| = 1$, $x, y \in \mathbb{N}_0$, and $width, height \in \mathbb{N}$.
The application of temporal and spatial selector elements is context sensitive. That is, a selector applies to the entire subtree of the presentation element bound to it. Selector elements can be organized in a hierarchy, and each selector element is applied in the context of the subtree it is bound to. For an illustration, consider the example given in Figure 12: Two temporal selector elements $s_1$ and $s_2$ with $s_1 = [Temporal\text{-}S, b_{s_1}, \{v_{s_1}\}, \emptyset, 10, 25]$ and $s_2 = [Temporal\text{-}S, b_{s_2}, \{v_{s_2}\}, \emptyset, 10, 40]$ (time in seconds) are nested with $s_2$ being a direct or indirect successor of $s_1$. Then the temporal interval selected by $s_1$ is defined relative to the temporal interval specified by $s_2$. That is, the start time of 10 seconds of $s_1$ is relative to the beginning of the interval already selected by $s_2$.
Fig. 12. Sample fragment illustrating the usage and semantics of nesting temporal selector elements $s_1 : [\ldots, 10, 25]$, $s_2 : [\ldots, 10, 40]$
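The relative interpretation of nested temporal selectors can be traced with a few lines of Python. The function name resolve_selection and the clipping of an outer selection to the interval left by the inner one are assumptions; the text only states that the outer start time is relative to the interval selected by the inner selector. The video length of 60 seconds is also assumed.

```python
def resolve_selection(selectors, media_duration):
    """Map nested temporal selectors to an absolute interval of the media element.

    selectors lists (start, duration) pairs from the outermost selector to the
    innermost one; the result is (begin, end) on the time line of the media.
    """
    begin, end = 0, media_duration
    for start, length in reversed(selectors):  # evaluate the innermost selector first
        new_begin = min(begin + start, end)
        end = min(begin + start + length, end)  # assumption: clip to what is left
        begin = new_begin
    return begin, end


# Figure 12: s1 = [..., 10, 25] is bound above s2 = [..., 10, 40]; times in seconds.
interval = resolve_selection([(10, 25), (10, 40)], media_duration=60)
```

Here s2 first selects seconds 10 to 50 of the video; s1 then starts 10 seconds into that selection and lasts 25 seconds, which gives the absolute interval from second 20 to second 45.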
To also be able to reuse parts of text, a textual selector textual-s (Definition 15) selects a continuous passage from the text media element bound to its variable. The presentation semantics is that only the selected part of the text is presented, i.e., the text passage that begins at the text position $start$ and has the given $length$ in characters.
Definition 15 (Textual selector element: textual-s)
The textual selector element $textual\text{-}s : [t_{textual-s}, b_{textual-s}, V_{textual-s}, PV_{textual-s}, start, length]$ is a presentation element with $t_{textual-s} = Textual\text{-}S \in OT$, $|V_{textual-s}| = 1$, $start \in \mathbb{N}_0$, and $length \in \mathbb{N}$.
D. Projectors
To add layout information to a presentation element, its 0 to n projector variables $pv$ can be used to bind projector elements to the presentation element. Projector elements are presentation elements that determine how presentation elements are presented. The model offers the four different projector elements spatial-p, temporal-p, acoustic-p, and typographic-p to specify the spatial, temporal, acoustic, and typographic layout of a presentation, which we define in the following.
The presentation semantics of the spatial projector element spatial-p (Definition 16) is that the visual presentation of $p$, the presentation element it is bound to, is "projected" onto the rectangular presentation area defined by the projector element. The parameters x and y define the position of the upper left corner of a rectangle with the given width and height. The parameter priority defines the order of overlapping visual objects such that an object with a higher priority value covers objects with a lower priority value. The value of the parameter unit determines whether the values of the parameters x, y, width, and height are given in pixels or in percent of a presentation window.
Definition 16 (Spatial projector element - spatial-p)
The spatial projector element spatial-p : $[t_{\text{spatial-p}}, b_{\text{spatial-p}}, V_{\text{spatial-p}}, PV_{\text{spatial-p}}, x, y, width, height, priority, unit]$ is a presentation element with $t_{\text{spatial-p}} = \text{Spatial-P} \in PT$, $V_{\text{spatial-p}} = PV_{\text{spatial-p}} = \emptyset$, $x, y, priority \in \mathbb{N}_0$, $width, height \in \mathbb{N}$, and $unit \in \{pixel, percent\}$.
The spatial projector, like all projector elements, applies not only to the presentation element $p$ it is bound to but also to all successors of $p$. That is, with regard to the spatial projection it affects the entire subtree of which $p$ is the root element. The visual parts of $p$ and possibly of its successors are scaled to the presentation area defined by the projector's parameters.
If spatial projectors are nested, then each spatial projector spatial-p is evaluated in its context. Figure 13 illustrates the usage and the semantics of nesting spatial projector elements. In the example, the root par element has a spatial projector bound to it that specifies the rectangular presentation area for the subtree as $[x = 10, y = 10, w = 100, h = 100]$. This area is indicated in the right part of the figure by a dotted rectangle. The two images that are successors of the par element each have their own spatial projector. The spatial projector of Image 1 defines a presentation area $[x = 0, y = 0, w = 40, h = 40]$, and that of the second image a presentation area $[x = 60, y = 60, w = 40, h = 40]$. In consequence, both spatial projectors of the images are evaluated in the context of the spatial projector bound to the par element. Therefore, the areas of the two images are projected within the area defined by the spatial projector of the par element.
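[Figure 13: tree diagram and resulting presentation. A par element with spatial-p [10,10,100,100] bound to its projector variable pv has two successors v1 and v2: Image 1 with spatial-p [0,0,40,40] and Image 2 with spatial-p [60,60,40,40].]
Fig. 13. Sample fragment illustrating the usage and semantics of nesting spatial projector elements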
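The evaluation of a nested spatial projector in the context of its enclosing projector can be sketched as a coordinate transformation. The sketch below makes simplifying assumptions that the model description does not spell out: all values are in pixels, the origin of a nested projector is the upper left corner of the enclosing area, and no scaling or clipping takes place. `Rect` and `project` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def project(parent: Rect, child: Rect) -> Rect:
    """Translate the child's area into the coordinate system of the window,
    assuming the child's x and y are offsets from the parent's upper left corner."""
    return Rect(parent.x + child.x, parent.y + child.y, child.width, child.height)


par_area = Rect(10, 10, 100, 100)
print(project(par_area, Rect(0, 0, 40, 40)))    # area of Image 1
print(project(par_area, Rect(60, 60, 40, 40)))  # area of Image 2
```

Under these assumptions, Image 1 ends up in the upper left corner and Image 2 in the lower right corner of the area of the par element, which matches the arrangement described for Figure 13.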
The presentation semantics of the temporal projector element temporal-p (Definition 17) bound to a presentation element $p$ is that the element $p$ is presented with the given playback direction and speed. The parameter direction specifies whether the presentation element (and its subtree) is presented in forward direction ($direction = 1$) or in backward direction ($direction = -1$). The actual playback speed is computed by multiplying the original playback speed by the factor given by the speed parameter.
Definition 17 (Temporal projector element - temporal-p)
The temporal projector element temporal-p : $[t_{\text{temporal-p}}, b_{\text{temporal-p}}, V_{\text{temporal-p}}, PV_{\text{temporal-p}}, direction, speed]$ is a presentation element with $t_{\text{temporal-p}} = \text{Temporal-P} \in PT$, $V_{\text{temporal-p}} = PV_{\text{temporal-p}} = \emptyset$, $direction \in \{-1, 1\}$, and $speed \in \mathbb{R}^+$.
Like the spatial projector element, a temporal projector element applies not only to the presentation element $p$ it is bound to but to all successors of that presentation element. If, for example, the temporal-p projector of a presentation element $p$ defines $speed = 2$ and a successor $p'$ of $p$ has a temporal projector that also defines $speed = 2$, then the successor $p'$ is in fact presented at a speed factor of 4.
The acoustic projector element and the typographic projector element are defined in the same way. The acoustic projector element acoustic-p (Definition 18) determines the volume, balance, base, and treble of the presentation of the presentation element $p$ and all successors of $p$. The typographic projector element typographic-p (Definition 19) affects the parameters font, size, style, background color, and foreground color of the presentation of the presentation element $p$ it is bound to and all successors of $p$.
Definition 18 (Acoustic projector element - acoustic-p)
The acoustic projector element acoustic-p : $[t_{\text{acoustic-p}}, b_{\text{acoustic-p}}, V_{\text{acoustic-p}}, volume, balance, base, treble]$ is a presentation element with $t_{\text{acoustic-p}} = \text{Acoustic-P} \in PT$, $V_{\text{acoustic-p}} = PV_{\text{acoustic-p}} = \emptyset$, $volume \in [0, \ldots, 100]$, and $balance, base, treble \in [-1, \ldots, 1]$.
Definition 19 (Typographic projector - typographic-p)
The typographic projector element typographic-p : $[b_{\text{typographic-p}}, V_{\text{typographic-p}}, font, size, style, bg, fg]$ is a presentation element with $t_{\text{typographic-p}} = \text{Typographic-P} \in PT$, $V_{\text{typographic-p}} = PV_{\text{typographic-p}} = \emptyset$, $font \in FontNames$, $style \in \{normal, italic, bold\}$, $size \in point$, and $bg, fg \in COLORS$.
A projector element first affects the presentation element $p$ it is bound to. If, however, $p$ has successors, then these can be affected too. Each successor of $p$ is affected if the specific projector can actually have an effect on it. For example, a typographic projector affects only those elements in the subtree of $p$ that bear typographic aspects. In Figure 7, a spatial and an acoustic projector element are bound to a par temporal operator. The spatial projector applies only to the video, whereas the acoustic projector applies only to the audio that is bound to the par element.
E. Interaction Elements
To support the requirement of interactive multimedia presentations, the model offers different interaction elements for navigational and design interactions.
The gen_link element (Definition 20) is the basic element for the modeling of navigation in ZYX documents. The generic link is the presentation element that specifies a non-interactive direct transition to a target element. It serves as the basis for the actual "interactive" elements in the following. The gen_link has the two parameters target and mode. The parameter target specifies the target of the transition, and the parameter mode specifies how this transition is to be carried out.
Definition 20 (Generic Link element - gen_link)
The interaction element gen_link : $[t_{\text{gen\_link}}, b_{\text{gen\_link}}, V_{\text{gen\_link}}, PV_{\text{gen\_link}}, target, mode]$ is a presentation element with $t_{\text{gen\_link}} = \text{GenericLink} \in OT$, $V_{\text{gen\_link}} = PV_{\text{gen\_link}} = \emptyset$, $target \in dom(\text{Uniform Resource Identifier})$, and $mode \in \{stop, spawn\}$.
The presentation semantics of the gen_link is that on the presentation of the link element, the link target which
is specified by a Uniform Resource Identifier (URI) is presented. The target need not be a ZYX document but can be an HTML document or an arbitrary application, and it is presented by the browser/viewer that is associated with the target's URI. The mode of the generic link determines whether the current presentation stops and only the target is presented (mode = stop) or whether the target is presented in parallel with the current presentation (mode = spawn). The ZYX sample tree in Figure 14 shows a video-audio presentation which is followed directly by the presentation of the link target, i.e., the presentation of the target specified with anURI.
[Figure 14: tree diagram. A seq element whose first successor is a par element with Video 1 and Audio 1 and whose second successor is a gen_link element with the target anURI.]
Fig. 14. Sample fragment illustrating the usage and semantics of the gen_link interaction element
As the generic link is intended to model transitions to arbitrary link targets, we introduce the ZYX_link (Definition 21) to specify the specific transition to a ZYX document.
Definition 21 (ZYX Link element - ZYX_link)
The interaction element ZYX_link : $[t_{\text{ZYX\_link}}, b_{\text{ZYX\_link}}, V_{\text{ZYX\_link}}, PV_{\text{ZYX\_link}}, target, mode]$ is a presentation element with $t_{\text{ZYX\_link}} = \text{ZYX\_LINK} \in OT$, $V_{\text{ZYX\_link}} = PV_{\text{ZYX\_link}} = \emptyset$, $target \in ZYXDOC$, and $mode \in \{stop, spawn\}$.
The semantics of the ZYX_link is that on its presentation the ZYX document specified by target is presented. The parameter mode describes whether the presentation of the current document stops and the target ZYX document is presented (mode = stop) or whether it is presented in parallel with the current presentation (mode = spawn).
So far the elements gen_link and ZYX_link are used to model a direct, non-interactive transition to a link target. For a link transition initiated by a user interaction with a visual presentation element, we define the menu interaction element.
The menu interaction element (Definition 22) defines a set of variables to which the presentation elements of the visual link anchors are bound and the corresponding presentation elements that are to be presented when the respective link anchor is interactively selected.
Definition 22 (Interaction element - menu)
The interaction element menu : $[t_{menu}, b_{menu}, V_{menu}, PV_{menu}, mode]$ is a presentation element with $t_{menu} = \text{Menu} \in OT$, $mode \in \{vanish, prevail\}$, $V_{menu} = \{v_1, \ldots, v_n, t_1, \ldots, t_n\}$, and $n \in \mathbb{N}$.
The menu interaction element defines a set of selectable presentation elements (link anchors) bound to $v_i \in V_{menu}, i = 1 \ldots n$, representing the menu items. The presentation elements bound to $t_i \in V_{menu}, i = 1 \ldots n$, represent the target elements of the selection. Each selectable menu item bound to $v_k$ corresponds to the target $t_k$. The presentation semantics of the menu element is that on presentation of the menu element, all the elements bound to $v_i \in V_{menu}, i = 1 \ldots n$, are presented in parallel, i.e., the menu is presented. When a user selects one of the menu items bound to $v_j$, the target element of the selection bound to $t_j$ is presented. The parameter mode determines what happens with the current presentation. If mode = vanish, the engine finishes the presentation of all presentation elements bound to $v_i, i = 1 \ldots n$, and starts the presentation of the presentation element bound to $t_j$. If mode = prevail, the engine "merges" the presentation of the presentation element bound to $t_j$ with the currently running presentation. If no element (menu item) is selected by a user, the presentation of the menu element stops as soon as the presentation of all presentation elements bound to $v_i, i = 1 \ldots n$, is finished.
[Figure 15: tree diagram. A menu element with the link anchors Image 1 bound to v1 and Image 2 bound to v2; the target t1 is a par element with Video 1 and Audio 1, and the target t2 is a ZYXLink element with the target document 1.]
Fig. 15. Sample fragment illustrating the usage and semantics of the menu and ZYX_link interaction element
Figure 15 illustrates the usage of the menu element. Image 1 and Image 2 represent the two selectable menu items. On interaction with Image 1, bound to $v_1$, the presentation of the video-audio presentation bound to $t_1$ starts. On interaction with the link anchor Image 2, bound to $v_2$, the presentation of the ZYX_link bound to $t_2$ starts, which results in the presentation of the ZYX document specified as its target.
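The selection semantics of the menu element and its two modes can be sketched as follows. This is an illustrative model only, not the behavior of an actual presentation engine: presentation elements are represented by plain strings, and the names `Menu` and `after_selection` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Menu:
    targets: Dict[str, str]  # menu item bound to v_k -> target element bound to t_k
    mode: str                # "vanish" or "prevail"


def after_selection(menu: Menu, running: List[str], selected: Optional[str]) -> List[str]:
    """Return the elements being presented after the user selected a menu item."""
    if selected is None:
        # No interaction: the running presentation is left unchanged.
        return list(running)
    target = menu.targets[selected]
    if menu.mode == "vanish":
        # The presentation of all menu items is finished, then the target starts.
        return [e for e in running if e not in menu.targets] + [target]
    if menu.mode == "prevail":
        # The target is merged with the currently running presentation.
        return list(running) + [target]
    raise ValueError(f"unknown mode: {menu.mode!r}")


menu = Menu({"Image 1": "par(Video 1, Audio 1)", "Image 2": "ZYX_link[document 1]"}, "vanish")
print(after_selection(menu, ["Image 1", "Image 2"], "Image 1"))
```

With mode = vanish, selecting Image 1 leaves only the video-audio presentation running; with mode = prevail, both images would remain visible next to the selected target.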
The menu interaction element is provided to allow for interaction with visual presentation elements and navigation within a document, i.e., the selection of one out of a set of possible presentation paths. By using the gen_link and ZYX_link as target elements of the menu element, these paths can leave the document and lead to other documents.
So far the appearance of a link is limited to the visual appearance of the presentation element that forms the link anchor. To offer a more fine-grained specification of link anchors, e.g., a region in an image or a word within a text, the ZYX model offers the primitives hotspot and hypertext.
The hotspot element (Definition 23) is a variant of the menu element but refines the interaction-sensitive area to an arbitrary polygon of a visual element. In addition to the link anchors in the menu element, it specifies a set of sensitive areas by polygons. Instead of linking a set of link anchors with a set of targets as in the menu element, the hotspot element interlinks areas of visual presentation elements with link targets.
Definition 23 (Interaction element - hotspot)
The interaction element hotspot : $[t_{hotspot}, b_{hotspot}, V_{hotspot}, PV_{hotspot}, P_1, \ldots, P_n, mode]$ is a presentation element with $t_{hotspot} = \text{HotSpot} \in OT$, $V_{hotspot} = \{v, t_1, \ldots, t_n\}$, $PV_{hotspot} = \emptyset$, $P_i = [[\langle x_1, y_1 \rangle, \ldots, \langle x_m, y_m \rangle], [start, dur]]$, $mode \in \{vanish, prevail\}$, $n \in \mathbb{N}$.
The presentation semantics of the hotspot is the presentation of the link anchor bound to $v$ and of the associated interaction-sensitive areas, which are not necessarily visible. Each of these areas is defined by a tuple $P_i$ that specifies the sensitive area by a polygon $\langle x_1, y_1 \rangle, \ldots, \langle x_m, y_m \rangle$ and the interval $[start, dur]$ for which the sensitive area is active during the presentation. This interval is relative to the beginning of the presentation of the hotspot. On user interaction with the sensitive area specified by $P_i$, the corresponding link target $t_i$ is presented under the given mode (vanish or prevail).
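How a presentation engine might map a user interaction to a link target can be sketched as a hit test that checks both the polygon and the active interval of each tuple $P_i$. The model does not prescribe a hit-testing method; the ray-casting test used here and the names `point_in_polygon` and `hit_target` are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Area = Tuple[Sequence[Point], float, float]  # (polygon, start, dur), as in P_i


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: count how many polygon edges a horizontal ray from p crosses."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def hit_target(areas: List[Area], click: Point, t: float) -> Optional[int]:
    """Return the index i of the first area P_i that is active at time t
    (seconds since the hotspot presentation started) and contains the click."""
    for i, (polygon, start, dur) in enumerate(areas):
        if start <= t < start + dur and point_in_polygon(click, polygon):
            return i
    return None


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
areas = [(square, 0, 5)]            # sensitive during the first 5 seconds
print(hit_target(areas, (5, 5), 2))  # inside the square while it is active
print(hit_target(areas, (5, 5), 6))  # same position after the interval has ended
```

The index returned identifies the link target $t_i$ that is then presented under the given mode.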
A further variant of the menu element is the hypertext element (Definition 24). Just as a hotspot allows an interaction-sensitive region of an image or a video to be associated with a link, the hypertext element offers a means to model sensitive parts within text. Like the hotspot, a hypertext interaction element is sensitive for a specified temporal interval $[start, dur]$.
Definition 24 (Interaction element - hypertext)
The interaction element hypertext : $[t_{hypertext}, b_{hypertext}, V_{hypertext}, PV_{hypertext}, T_1, \ldots, T_n, mode]$ is a presentation element with $t_{hypertext} = \text{HyperText} \in OT$, $V_{hypertext} = \{v, t_1, \ldots, t_n\}$, $PV_{hypertext} = \emptyset$, $T_i = [[start, length], [start, dur]]$, $mode \in \{vanish, prevail\}$, $n \in \mathbb{N}$.
The presentation semantics of the hypertext is that on its presentation the presentation of the text anchor bound to $v$ starts. The hypertext element specifies the sensitive regions of the text by means of tuples $T_i = [[start, length], [start, dur]]$, each defining a sensitive text segment by its starting text position and its length, and the temporal interval for which the sensitive text area is active during the presentation. On user interaction with the sensitive segment of the text defined by $T_i$, the corresponding link target $t_i$ is presented under the given mode (vanish or prevail).
The model provides two further types of interaction elements: interactive projector elements and interactive selector elements. These elements comply in general with the projector and selector elements presented in Definitions 16 to 19, but they have an additional "interactive" aspect, i.e., they can be interactively changed and adjusted by a user. For each projector and selector element, a corresponding interactive element is provided by the model.
An example of an interactive projector element is the interactive temporal projector element temporal-pi (Definition 25), which is an interactive temporal-p projector element. Its presentation semantics is that, in addition to the specified temporal projection, a user can interactively adjust the element's specific parameters direction and speed within their domains during the presentation. For each temporal projector the model offers the corresponding interactive projector element.
Definition 25 (Interaction element - temporal-pi)
The temporal interactive projector element temporal-pi : $[b_{\text{temporal-pi}}, V_{\text{temporal-pi}}, PV_{\text{temporal-pi}}, direction, speed]$ is a presentation element with $V_{\text{temporal-pi}} = \emptyset$, $speed \in \mathbb{R}^+$, and $direction \in \{-1, 1\}$.
An example of an interactive selector is the interaction element spatial-si (Definition 26), which is a special spatial-s selector. Its presentation semantics is that, in addition to the spatial selection, the presentation engine allows a user to interactively adjust the selected spatial area and the overlapping by changing the parameters x, y, width, height, and priority within their domains.
Definition 26 (Interaction element - spatial-si)
The spatial interactive selector element spatial-si : $[b_{\text{spatial-si}}, V_{\text{spatial-si}}, PV_{\text{spatial-si}}, x, y, width, height]$ is a presentation element with $V_{\text{spatial-si}} = \emptyset$, $x, y \in \mathbb{N}_0$, and $width, height \in \mathbb{N}$.
The temporal-si element is defined analogously. The interactive selector elements allow the interactive spatial and temporal scaling of media elements and fragments during the presentation to be modeled.
In addition to the support for navigational interaction by the elements gen_link, ZYX_link, menu, hotspot, and hypertext, the interactive projector and selector elements implement the design interactions of multimedia presentations.
F. Adaptation elements
Our model offers the two elements switch and query, which allow for the adaptation of a multimedia presentation to the user's individual context. This user context, expressing the user's topics of interest, presentation system environment, network connection characteristics, and the like, is described in a global profile GP by means of attribute-value pairs (Definition 27).
Definition 27 (Global profile - GP)
The global profile $GP : [m_1, \ldots, m_n]$ is a set of metadata with $m_i = [attr_i, value_i]$ denoting attribute-value pairs that describe the current user context during a presentation, with $attr_i \in ATTRIBUTES$ and $value_i \in dom(attr_i)$, $i \in \mathbb{N}$.
The switch adaptation element (Definition 28) serves the purpose of specifying different presentation alternatives for different contexts. Under a switch element an author can "collect" different alternatives (media elements or fragments) and add metadata to each alternative that specifies under which presentation conditions the alternative is to be selected. Thereby, an author can define different fragments for conveying the same content under different presentation contexts such as system environment, user language, the user's understanding of the subject, network bandwidth, and the like. The metadata associated with the switch element is evaluated by the presentation environment against the global profile to select the alternative best matching the current context.
Definition 28 (Adaptation element - switch)
The adaptation element switch : $[t_{switch}, b_{switch}, V_{switch}, PV_{switch}, M_1, \ldots, M_n]$ is a presentation element with $t_{switch} = \text{Switch} \in OT$, $M_i$ denoting sets of attribute-value pairs, $V_{switch} = \{v_1, \ldots, v_n, v_{default}\}$, and $n \in \mathbb{N}$.
The presentation semantics of the switch element is that upon its presentation the metadata available with the GP is evaluated against the sets of metadata $M_i, i = 1 \ldots n$, of the switch. Let $M_j, j \in \{1, \ldots, n\}$, be the set of metadata which best matches GP. Then the fragment bound to $v_j$, i.e., the presentation alternative best matching the current presentation context, is presented. If there is no suitable set of metadata among $M_1, \ldots, M_n$, the presentation element bound to $v_{default}$ is selected for presentation. The metadata of the switch element is continuously evaluated against the current, possibly changing global profile, i.e., a changing presentation context such as varying bandwidth. Hence, during the presentation of the switch element the presentation environment can select another, more suitable alternative due to a changed context, e.g., switching from a video to a slide show due to decreasing network bandwidth. The presentation of the switch element finally terminates when the presentation of the selected presentation element is finished.
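The selection step of the switch element can be sketched as follows. What "best match" means is not fixed in the description above; the sketch assumes that an alternative is suitable if all of its attribute-value pairs occur in the global profile and that the suitable alternative with the most pairs is the best match. The attribute names and the function name `select_alternative` are hypothetical.

```python
from typing import Dict, List, Optional

Metadata = Dict[str, str]  # a set of attribute-value pairs


def select_alternative(alternatives: List[Metadata], profile: Metadata) -> Optional[int]:
    """Return the index j of the best matching metadata set M_j, or None if no
    alternative is suitable and the element bound to v_default is to be presented."""
    best_index, best_score = None, 0
    for j, metadata in enumerate(alternatives):
        suitable = all(profile.get(attr) == value for attr, value in metadata.items())
        if suitable and len(metadata) > best_score:
            best_index, best_score = j, len(metadata)
    return best_index


alternatives = [
    {"bandwidth": "high", "language": "en"},  # M_1: video
    {"bandwidth": "low"},                     # M_2: slide show
]
profile = {"bandwidth": "high", "language": "en", "level": "student"}
print(select_alternative(alternatives, profile))

# Continuous adaptation: re-evaluate whenever the global profile changes.
profile["bandwidth"] = "low"
print(select_alternative(alternatives, profile))
```

Calling the function again after the bandwidth attribute has changed models the switch from the video to the slide show mentioned above.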
For cases in which an author does not want to allow this kind of continuous adaptation, the model provides the decide element. Using a decide element instead of the switch element would, e.g., make the presentation stay with the video, once selected, instead of switching to an alternative slide show. The decide element is given in Definition 29:
Definition 29 (Adaptation element - decide)
The adaptation element decide : $[t_{decide}, b_{decide}, V_{decide}, PV_{decide}, M_1, \ldots, M_n]$ is a presentation element with $t_{decide} = \text{Decide} \in OT$, $M_i$ denoting sets of attribute-value pairs, $V_{decide} = \{v_1, \ldots, v_n, v_{default}\}$, and $n \in \mathbb{N}$.
The presentation semantics of the decide element is the same as that of the switch element. However, the evaluation of the sets of metadata against the current global profile GP and the selection of the best match are made only once, at the beginning of the presentation of the decide element.
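The difference between the two elements lies only in when the selection is evaluated. A minimal sketch, with hypothetical class names and an arbitrary selection function passed in, could look like this:

```python
from typing import Callable, Dict, Optional

Profile = Dict[str, str]
Selector = Callable[[Profile], Optional[int]]  # returns the index of an alternative


class SwitchBehavior:
    """Re-evaluates the selection every time the current alternative is requested."""

    def __init__(self, select: Selector) -> None:
        self._select = select

    def current(self, profile: Profile) -> Optional[int]:
        return self._select(profile)


class DecideBehavior:
    """Evaluates the selection once and keeps the result afterwards."""

    def __init__(self, select: Selector) -> None:
        self._select = select
        self._decided = False
        self._choice: Optional[int] = None

    def current(self, profile: Profile) -> Optional[int]:
        if not self._decided:
            self._choice = self._select(profile)
            self._decided = True
        return self._choice


def by_bandwidth(profile: Profile) -> int:
    # Hypothetical rule: 0 = video, 1 = slide show.
    return 0 if profile.get("bandwidth") == "high" else 1


switch, decide = SwitchBehavior(by_bandwidth), DecideBehavior(by_bandwidth)
for bandwidth in ("high", "low"):
    profile = {"bandwidth": bandwidth}
    print(bandwidth, switch.current(profile), decide.current(profile))
```

When the bandwidth drops, the switch behavior moves to the slide show, whereas the decide behavior stays with the video it selected at the beginning.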
For cases in which the presentation alternatives of a document are not known at authoring time, the query element (Definition 30) is provided. The query element is just a "placeholder" for a fragment. It specifies a "query" which selects a fragment from all available fragments just before presentation time. The resulting fragment replaces the query element in the ZYX document.
For the definition of the query element, we extend the definition of a fragment as given in Definition 5 such that a fragment specification also includes metadata, i.e., $f = (P, C, M)$ with $M$ being a set of attribute-value pairs. This metadata describes both the content of a fragment $f$, such as the topics covered, and technical features of the fragment, such as the network bandwidth needed for its presentation.
Definition 30 (Adaptation element - query)
The adaptation element query : $[t_{query}, b_{query}, V_{query}, PV_{query}, M]$ is a presentation element with $t_{query} = \text{Query} \in OT$, $M$ denoting a set of attribute-value pairs, and $V_{query} = \emptyset$.
The semantics of the query element is that before the actual presentation the metadata of the query element together with the global profile, i.e., $M \cup GP$, is evaluated against the metadata given with all fragments known to the system. Then the fragment with the best match with respect to $M$ and the profile GP is selected, and the query element is replaced by the selected fragment. The query element thus makes it possible to dynamically select the most suitable fragment at presentation time, taking into account the actual user interest and system environment.
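The resolution of a query element against a pool of fragments can be sketched in the same spirit. The scoring rule (the number of attribute-value pairs of $M \cup GP$ that a fragment's metadata satisfies), the precedence of the query metadata over the profile for shared attributes, and all names in the example are assumptions made for illustration.

```python
from typing import Dict, Optional

Metadata = Dict[str, str]


def resolve_query(query: Metadata, profile: Metadata,
                  fragments: Dict[str, Metadata]) -> Optional[str]:
    """Return the name of the fragment whose metadata best matches M united with GP,
    or None if no fragment matches a single attribute-value pair."""
    wanted = {**profile, **query}  # M ∪ GP; the query wins on shared attributes (assumption)

    def score(metadata: Metadata) -> int:
        return sum(1 for attr, value in wanted.items() if metadata.get(attr) == value)

    best = max(fragments, key=lambda name: score(fragments[name]), default=None)
    if best is None or score(fragments[best]) == 0:
        return None
    return best


pool = {
    "bypass_video": {"topic": "bypass", "bandwidth": "high"},
    "bypass_slides": {"topic": "bypass", "bandwidth": "low"},
    "valve_video": {"topic": "valve", "bandwidth": "high"},
}
print(resolve_query({"topic": "bypass"}, {"bandwidth": "low"}, pool))
```

In a database-driven system this evaluation would more likely be expressed as a query against the fragment repository; the dictionary stands in for that repository here.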
V. Application of ZYX and Implications to
Authoring and Presentation
We have made clear how important we consider the support for reuse and adaptation by a multimedia document model, requirements we were aiming to meet with the ZYX model. This section illustrates the application of these two features, which we elaborated in detail in Section II-A, in authoring and presenting ZYX multimedia documents. We start by presenting the many different kinds of reuse of ZYX elements and fragments in Section V-A before we come to the various possibilities to employ ZYX for adaptation in Section V-B. Looking at the impact of new document models like ZYX on multimedia content production, we also point out the implications and positive effects this has on multimedia authoring.
A. Reuse
To illustrate the application of ZYX for reuse, we first show how identification and selection are supported by ZYX, as this forms the basis for efficient reuse of media elements, fragments, and documents. Then we show the reuse of ZYX elements at different granularities and present structural vs. identical reuse in ZYX.
A.1 Identification and selection
Support for identification and selection is obligatory for content to be efficiently reused. Reuse of material is possible only if the content can easily be retrieved within the authoring process. Hence, sophisticated metadata must be associated with media elements, fragments, and documents. The metadata for the media elements comes with the modeling of the different media types. At the level of fragments, a set of metadata describes the content of the composition. This metadata is anchored in the definition of a ZYX fragment $f = (P, C, M)$ and relates especially to the content and the targeted user group. The available metadata concerning both the content and the structure of fragments can be employed for the browsing of fragments in an authoring environment and to identify and select fragments for the composition of ZYX documents.
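[Figure 16: tree diagram. A seq element with two par successors. The first par element binds temporal-s [10,15] on the video and textual-s [80,25] on the text; the second par element binds temporal-s [120,20] on the same video and textual-s [483,20] on the same text. Video: showing the opening of a patient's chest. Text: explaining how to open the chest.]
Fig. 16. Reuse of media elements in ZYX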
A.2 Different granularity of reuse
Equipped with the modeling of metadata of the media elements and ZYX fragments, we illustrate how reuse of media elements, fragments, and documents can be extensively applied with ZYX.
Reuse of media elements. Atomic media elements represent the raw media data within ZYX documents. These elements can be reused entirely or only in part. Atomic media elements form the leaves of the document structure. One media element can be used in different branches of the tree. As the atomic media elements only represent the actual media data, only the atomic media elements are used several times in the document, whereas the media data itself exists only once. To select only a part of a media element, the selector elements are used. They select the desired scene, visual area, or sound sequence of a medium. In Figure 16, two different scenes of the same video showing the opening of a patient's chest before the actual operation on the open heart, as well as two different parts of the same text explaining the operative steps, are composed in a ZYX document. The reuse of media elements, especially partial reuse, can avoid redundant preparation of media data for just one single application.
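The document of Figure 16 can be written down as a small tree in which the same two media objects are referenced by several selectors. The classes below are a hypothetical in-memory representation chosen for this sketch, not the data model of the ZYX implementation; the selector values are those shown in Figure 16.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


@dataclass(frozen=True)
class Media:
    name: str


@dataclass(frozen=True)
class Selector:
    kind: str    # "temporal-s" or "textual-s"
    start: int
    extent: int  # second selector parameter, as given in Figure 16
    media: Media


@dataclass
class Operator:
    kind: str    # "seq" or "par"
    children: List[Union["Operator", Selector]]


def selectors(node: Union[Operator, Selector]) -> Iterator[Selector]:
    """Yield all selector leaves of the tree in document order."""
    if isinstance(node, Selector):
        yield node
    else:
        for child in node.children:
            yield from selectors(child)


video = Media("Video: opening of a patient's chest")
text = Media("Text: how to open the chest")

document = Operator("seq", [
    Operator("par", [Selector("temporal-s", 10, 15, video),
                     Selector("textual-s", 80, 25, text)]),
    Operator("par", [Selector("temporal-s", 120, 20, video),
                     Selector("textual-s", 483, 20, text)]),
])

used = list(selectors(document))
print(len(used), len({id(s.media) for s in used}))
```

The four selectors refer to only two media objects, which mirrors the statement that the media data itself exists only once.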
Reuse of fragments and complex media elements. The composition of presentation elements leads to fragments of arbitrary size and complexity. Fragments can be reused as fragments themselves but also encapsulated within complex media elements. Both the fragments and the complex media elements can be bound to any other variable during the composition of a (new) document. Exploiting identification and selection as discussed in Section V-A.1, an authoring environment for ZYX can offer the author fragments or complex media elements that are relevant in the desired context to become part of the newly composed document. The only difference between reusing complex media elements and fragments is that with complex media elements the structure and complexity of the selected subpart of the document are intentionally hidden from the author. Rather, what matters is the semantics of the complex media element, e.g., that it comprises a slide show, and how to fill the unbound variables with presentation elements and fragments. As the structure of all ZYX documents is accessible and explicitly visible, authoring support could go so far that a sophisticated content-based search algorithm identifies those nodes (presentation elements) in other documents that could be of interest to an author and extracts the respective subtree (= fragment) for reuse.
The reuse of fragments and complex media elements of arbitrary size is a feature that relieves an author from cutting and pasting formerly composed documents and opens the way to a composition of multimedia documents much like using a Lego or K'NEX construction set.
Figure 17 illustrates the reuse of fragments and complex media elements. In the example, the fragment already introduced in Figure 16 is reused in a course about operative surgery. Additionally, an already existing complex media element about a bypass operation is inserted as a digression of the course into the specific domain of open heart surgery. The fragment and the complex media element are, e.g., arranged in a sequential order, and this sequence is then, as indicated by the dashed line, part of the entire course.
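[Figure 17: tree diagram. A seq element, connected by a dashed line to the entire course, contains the reused fragment of Figure 16 (a seq of two par elements with temporal-s [10,15], textual-s [80,25], temporal-s [120,20], and textual-s [483,20] on Video and Text) followed by the reused complex media element bypass.]
Fig. 17. Reuse of fragments and complex media elements in ZYX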
Reusable templates. With ZYX an author can define templates that cover, e.g., a didactic unit like a multimedia course, a lecture, a technical guide, a tour through a museum, and the like. Such a template is a regular ZYX fragment but with unbound variables, i.e., the author leaves some of the leaves of the tree unbound. These templates give other authors a basic structure to start with for the composition of a new ZYX document. Consider the sample fragment in Figure 18: It forms a sequence of five presentation elements, two of which are bound to a parallel operator. This fragment is encapsulated into a complex media element denoted aTemplate, which another author then uses to "plug in" the missing presentation elements and thereby form a new document. In Figure 18, two complex media elements, a title and a summary, and two videos with captions are bound to the template aTemplate, e.g., in a semi-automatic authoring process. For this, the author needs only information about the usage of the complex media element but not necessarily about the explicit structure of the template.
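The "plug-in" use of a template can be sketched as a structure that exposes only its unbound variables. The class `Template` and the variable names below are hypothetical and chosen for readability; they do not correspond to the variable names in Figure 18.

```python
from typing import Dict, List, Optional


class Template:
    """A complex media element that exposes only its unbound variables;
    the inner structure of the encapsulated fragment stays hidden."""

    def __init__(self, name: str, variables: List[str]) -> None:
        self.name = name
        self._bindings: Dict[str, Optional[str]] = {v: None for v in variables}

    def bind(self, variable: str, element: str) -> None:
        if variable not in self._bindings:
            raise KeyError(f"{self.name} has no variable {variable!r}")
        if self._bindings[variable] is not None:
            raise ValueError(f"variable {variable!r} is already bound")
        self._bindings[variable] = element

    def unbound(self) -> List[str]:
        return [v for v, e in self._bindings.items() if e is None]


template = Template(
    "aTemplate",
    ["title", "video_1", "caption_1", "video_2", "caption_2", "summary"],
)
template.bind("title", "Title")
template.bind("video_1", "Video 1")
template.bind("caption_1", "Caption 1")
print(template.unbound())
```

An authoring tool could use `unbound()` to prompt the author for exactly those elements that are still missing before the document is complete.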
[Figure 18: tree diagrams. A seq fragment containing two par operators and a delay[100] element, with variables v1 to v9, is encapsulated into the complex media element aTemplate; Title, Video 1, Caption 1, Video 2, Caption 2, and Summary are then bound to the variables of the template.]
Fig. 18. Templates - structural reuse of ZYX fragments
Reuse of documents. As an entire document in ZYX is nothing but a (logically complete) fragment, documents can be reused in any other ZYX document. Reuse can also simply mean that an author arbitrarily alters an existing ZYX document and thereby adjusts it to her specific needs.
A.3 Identical versus structural reuse
One consequence of ZYX's design idea to separate structure from layout is that a multimedia document can be reused with different layouts, e.g., a different look and feel. For example, if the layout designer of our Cardio-OP project changes the concept for the overall presentation of medical content in the project, ideally only the layout of the documents must be changed, without touching their structure and content at all. Another application is the change of the technical presentation medium. Consider a presentation with a screen layout. What happens if the same presentation is to be presented at a point of information with a touch screen? By exchanging the layout, the same fragments can be used in different presentation contexts. As each presentation element distinguishes between its variables and its projector variables, the structural part can easily be separated from the layout part. Hence, an author can choose to use only the structure and assign a new layout of her own to the document or fragment.
With structural reuse of ZYX documents and fragments, the adaptation of a document's appearance to the presentation context is possible; here the relationship between reuse and adaptation becomes obvious. Figure 19 gives a simple example of reusing the same fragment with two different layouts. The presentation of the same fragment then changes depending on the layout bound to it. Structural reuse is also an application of the adaptation of the layout of ZYX documents to a specific user context.
B. Adaptation
In the following, we describe the different adaptation possibilities that result from exploiting the modeling primitives of the ZYX model. The adaptation elements switch and query as well as ZYX templates play the key role in supporting adaptation.
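[Figure 19: tree diagrams and resulting presentations. The same par fragment with Video and Audio is presented with two different layouts: spatial-p [10,10,30,30] with acoustic-p [20,0,0,0], and spatial-p [100,100,40,40] with acoustic-p [70,0,0,0].]
Fig. 19. Reuse of structure with different layouts in ZYX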
B.1 Explicit modeling of presentation alternatives
With the modeling of presentation alternatives, the author of a ZYX document can explicitly model adaptivity to the user context. For example, in the Cardio-OP context a switch can distinguish the alternatives for undergraduate students, graduate students, and researchers. The switch element allows arbitrary discriminating values to be defined. An alternative can also be "labelled" by a combination of discriminating values. This means that adaptation has as many dimensions as the author desires.
However, this means that the document, to be adaptable to many different presentation contexts, needs to model all the different presentation alternatives for the respective contexts under the document's switch elements. To relieve an author from such a time-consuming and seemingly never-ending task, we propose to provide mechanisms to (semi-)automatically augment the document with the necessary alternatives, possibly guided by a user. The idea is that the author concentrates on the initial goal, to compose a multimedia document with a certain content, and then enriches the document, exploiting the switch primitive, with additional fragments for conveying the same information but in different presentation contexts. In the following, we only illustrate how this can be achieved; for further details we refer the reader to [18].
Automatic generation of presentation alternatives: Augmentation
For a fine-grained adaptation to many different user contexts, it is mandatory that a high number of alternatives
is available. However, if an author had to specify all possible alternatives, this would result in a very time-consuming
composition effort and distract the author from the initial
goal, namely the composition of a sound presentation. To
relieve the authors from this additional burden, we propose
to support the automation of the specification of the
alternatives. We call this step augmentation of the multimedia document; it takes place after the document has
been composed by the author. The augmentation process
queries the underlying pool of fragments, exploiting the inherent technical data and the metadata the media elements
have been annotated with, to retrieve potential presentation
alternatives. The alternatives are then inserted into the
document, i.e., the document is augmented by the alternatives to provide for adaptivity in different presentation contexts. However, the suggested alternatives cannot simply
be inserted into the document; to preserve the semantics of the presentation intended by the author, they have to undergo a verification to ensure that the augmented document
is still valid with regard to the representation semantics.
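The three steps of augmentation (query the pool, verify each candidate, insert it as an alternative) can be sketched in Python. This is a simplified illustration and not the formal augmentation scheme of [18]: the class `MediaElement`, its attributes, the function `augment`, and the callback `is_valid`, which merely stands in for the verification step, are hypothetical, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaElement:
    name: str
    topic: str       # annotated metadata (hypothetical attribute)
    user: str        # targeted user group (hypothetical attribute)
    bandwidth: str   # technical data: "high", "medium", or "low"


def augment(original, pool, contexts, is_valid):
    """Collect one verified alternative per (user, bandwidth) context.

    `is_valid(original, candidate)` is a placeholder for the verification
    that the augmented document keeps its intended semantics.
    """
    alternatives = {(original.user, original.bandwidth): original}
    for user, bandwidth in contexts:
        if (user, bandwidth) in alternatives:
            continue
        for candidate in pool:
            matches = (candidate.topic == original.topic
                       and candidate.user == user
                       and candidate.bandwidth == bandwidth)
            if matches and is_valid(original, candidate):
                alternatives[(user, bandwidth)] = candidate
                break
    return alternatives


video1 = MediaElement("Video 1", "bypass", "professor", "high")
pool = [
    MediaElement("SlideShow", "bypass", "professor", "low"),
    MediaElement("StudentVideo", "bypass", "student", "high"),
    MediaElement("UnrelatedText", "anatomy", "student", "low"),
]
contexts = [(user, bandwidth)
            for user in ("professor", "student")
            for bandwidth in ("high", "medium", "low")]

# Accept every candidate here; a real verification would be far stricter.
switch = augment(video1, pool, contexts, is_valid=lambda original, candidate: True)
```

Contexts for which the pool offers no verified candidate simply remain without an alternative, so the resulting switch covers only those contexts that can be served.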
Figure 20 shows a small document which has been augmented by additional fragments. Before the augmentation, the document contained the video Video 1, indicated
in bold face. Then, targeting the document at both a medical professor and a medical student and at the same time
taking into account three different levels of available bandwidth for the presentation, the augmentation results in a
switch element offering these different alternatives. From
the technical side, the augmentation constraint has introduced atomic media elements and fragments for medium
and low bandwidth. Please note that this does not necessarily mean only a different quality of the same medium. For
example, for the professor the alternative for Video 1
at low bandwidth is a complex media element, a slide show
SlideShow. Additionally, the document can also be used
in the context of a medical student. Therefore, for each
available bandwidth a media element has been added
that is targeted at the knowledge and background of a medical student but covers the same topic. The parameters of
the switch element in Figure 20 only indicate the discriminating attributes, as the actual parameter list is too
long for this illustrative example.
[Figure 20: a switch element with the parameters [ ... prof ... student ... high ... medium ... low ] whose alternatives are grouped by user (medical professor, medical student) and by bandwidth (high, medium, low); node labels: Video 1, Video 1’, Video 2, Image 2, SlideShow, Text]
Fig. 20. Augmentation of a ZYX fragment
In a first step, we elaborated an augmentation scheme
to (semi-)automatically augment documents with respect
to different system contexts, which mainly differ in the targeted bandwidth and system power, by providing presentation alternatives on the level of atomic media
elements. We have formalized the verification of this kind
of automatic augmentation of ZYX documents with presentation alternatives in [18]. A much more complicated effort
is to automatically augment ZYX documents with semantically equivalent fragments that cover larger parts of a
presentation. For example, can a subsection of a multimedia presentation intended for a medical doctor be automatically
augmented such that equivalent content is
conveyed to a student who presumably has much less
background in the field? Here, the annotation
of multimedia content possibly must be carried out very carefully
by the experts in the field to give an automatic augmentation model sufficient input to select and insert semantically equivalent presentation alternatives. In addition,
the process of augmentation will be semi-automatic rather than fully automatic,
possibly guided by an author who is an expert in the field.
B.2 Declarative modeling of presentation alternatives
There are two kinds of applications of query elements
for adaptation: The query elements can be used for the
dynamic binding of fragments just before presentation and
can also be used to support the authoring process.
The query element bears the metadata that is to be evaluated for the selection of the best matching fragment. The
formal definition of the query element specifies a set of
metadata to be met by the fragment to replace the query
node. The query semantics, however, are not specified by
the model but left to the application.
Query elements can be used to automatically adjust documents to the current context, i.e., the query elements are
used to select the element that best matches the query at
the latest possible point in time, just before presentation. One of
the advantages of leaving parts of the document as
"a black box" until the actual request for presentation is that the most up-to-date pool of
fragments is always considered in the query evaluation. The evaluation of a query element specified in a document can also be
executed at authoring time to test the later result of the
presentation.
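The effect of late binding can be illustrated with a small Python sketch. Since the model leaves the query semantics to the application, everything here is an assumption made for illustration: the function `resolve_query`, the representation of fragments as dictionaries, and the scoring by number of matching metadata entries are hypothetical.

```python
def resolve_query(required, pool):
    """Return the fragment in `pool` that best matches the required metadata.

    Scoring by the number of matching metadata entries is an assumption of
    this sketch; an application may define entirely different semantics.
    """
    def score(fragment):
        return sum(1 for key, value in required.items()
                   if fragment["metadata"].get(key) == value)

    best = max(pool, key=score, default=None)
    if best is None or score(best) == 0:
        return None
    return best


pool = [
    {"name": "valve_overview", "metadata": {"topic": "valve", "language": "en"}},
]
query = {"topic": "valve", "language": "de"}

# Evaluation at authoring time, to test the later result of the presentation.
preview = resolve_query(query, pool)

# The pool changes before the document is actually requested.
pool.append({"name": "valve_intro_de", "metadata": {"topic": "valve", "language": "de"}})

# Evaluation just before presentation considers the up-to-date pool.
final = resolve_query(query, pool)
```

The same query yields `valve_overview` at authoring time and `valve_intro_de` just before presentation, because the better matching fragment only entered the pool in between.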
In combination with templates, the query element can be
applied for authoring support. Instead of leaving the variables of a template unbound, one could bind these to suitable query elements. The evaluation of the query element
at authoring time can then propose fragments to be placed
at the respective node. In this way, a kind of content-oriented
browsing can be built into the documents, giving, e.g.,
novice users an easy start with the model.
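Binding template variables to query elements for authoring support might look as follows in Python. This is a sketch under assumptions and not the ZYX template mechanism itself: the function `propose`, the representation of a template as a mapping from variable names to required metadata, and the exact-match rule are hypothetical.

```python
def propose(template, pool, limit=3):
    """For each template variable bound to a query, list candidate fragment names."""
    proposals = {}
    for variable, query in template.items():
        matching = [
            fragment["name"]
            for fragment in pool
            if all(fragment["metadata"].get(key) == value
                   for key, value in query.items())
        ]
        proposals[variable] = matching[:limit]
    return proposals


# A template whose variables are bound to queries (invented sample data).
template = {
    "introduction": {"topic": "bypass", "kind": "text"},
    "main_video": {"topic": "bypass", "kind": "video"},
}
pool = [
    {"name": "bypass_summary", "metadata": {"topic": "bypass", "kind": "text"}},
    {"name": "bypass_operation", "metadata": {"topic": "bypass", "kind": "video"}},
    {"name": "bypass_animation", "metadata": {"topic": "bypass", "kind": "video"}},
    {"name": "valve_operation", "metadata": {"topic": "valve", "kind": "video"}},
]

proposals = propose(template, pool)
```

Unlike the selection of a single best match just before presentation, the evaluation at authoring time returns several candidates per variable, from which the author chooses.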
C. Implications to authoring and presentation
The approach we have taken for the modeling of multimedia content significantly impacts authoring and presentation of the multimedia material. Traditional authoring
systems usually aim at the creation of a pre-orchestrated
presentation addressing a dedicated user group. These presentations usually do not allow the logical structure or layout definitions to be exploited for adaptation of the presentation
during playout. Given our approach, the authoring process
has to focus much more on the structural composition of
multimedia material, separating the logical structure of a
multimedia presentation from its layout specifications. The
resulting composition is no longer a fixed pre-orchestrated
presentation. It allows for explicit exploitation of the structural composition in order to adapt the presentation to
individual user needs. In consequence, the authoring system needs to have access to the individual media elements,
fragments, and documents that should be considered for
composition. Hence, the authoring tool has to offer browsing, navigation, and selection mechanisms to the authors
in order to identify those media elements in the multimedia repository that should become part of the presentation.
Obviously, the annotation of media elements, parts of media elements, fragments, and documents gives the necessary
support for content-oriented browsing such that an author can easily identify and select the relevant parts. The
authoring tool can either provide for the construction of a
ZYX document tree from scratch or allow for the completion of pre-defined ZYX templates.
The playout of a ZYX document can be realized in different ways. As a first alternative, the ZYX document
can be transformed into a presentation format that can
be directly interpreted by existing players. This alternative seems to be very interesting for the SMIL format, as
the first SMIL players are already available. Obviously, the
transformation into another document format may result
in the loss of specific features or presentation information
if the target model does not provide the same level of semantic expressiveness as the ZYX model. As
a second alternative, ZYX documents could be played out
by a ZYX-specific presentation engine that is capable of
fully exploiting all the features of the ZYX model with respect to adaptation of a presentation. This allows for the
integration of new business models into the presentation
environment. For example, the end user can be billed for
the actual quality of the multimedia material s/he received.
In the Cardio-OP project we developed a specific ZYX presentation engine.
In summary, the kind of structured authoring that results in adaptive multimedia documents and the presentation features of a ZYX-based presentation tool, both aiming at reuse and adaptation of multimedia material, allow for
cost-effective multimedia authoring and customized presentations.
VI. Conclusion and Future Work
Starting out with the requirements of the Cardio-OP
project, which calls for the support of reusability, adaptation, and presentation-neutral description of the structure and content of multimedia documents, we sketched our
analysis of existing relevant multimedia document models.
As these models do not meet the project's requirements,
we introduced our new ZYX model that gives the necessary
support. We outlined the design considerations of the ZYX
model and the basic concepts followed by a formal framework of the ZYX primitives. Finally, we illustrated the
applicability of ZYX for reuse and adaptation and the challenges and implications these advanced concepts have for
authoring and presentation environments for multimedia
documents.
The ZYX model has been implemented as a DataBlade
module for the object-relational database system Informix
Dynamic Server/Universal Data Option under SUN Solaris [19], following the architectural framework initially
presented in [20], [21]. The formal description served as
the basis for the definition of an XML DTD for the ZYX
model. This will enable access to content stored in the
Cardio-OP repository by future XML-capable browsers,
and we can also think about storing ZYX documents in
an SGML/XML-capable database system in the future,
following the approach taken in [22]. Furthermore, we
have developed a generic presentation engine for ZYX
documents which includes support for continuous MPEG
video streams based on an MPEG-specific extension of the
L/MRP buffer management technique [23].
For content-based managing and querying the underlying
media data, we have been developing a Media Integration
DataBlade module [24] for the IDS/UD which forms an
integration layer offering uniform, homogeneous access to
the different types of media data. Supporting multimedia
authoring, this DataBlade allows for interactive content-based browsing in the multimedia material. With the MediaWorkBench we have been developing a tool in Java on
top of the Media Integration DataBlade module for GUI-supported annotating and browsing the media data.
With regard to the global profile describing the user
context, we have been developing a mathematical model
for the combination of different profiles describing different aspects of a user like user group, user system environment and the like into one semantically correct, conflict-free global profile that can be exploited for presentation of
adaptive ZYX documents.
For adaptation support, we have developed a cross-media
adaptation scheme [18] that can be integrated with the
ZYX model and provides for the automatic augmentation
of ZYX documents by semantically correct presentation alternatives - a process which relieves the authors from a
time-consuming task of comprehensively composing documents for different user and system contexts.
Given this ongoing work, one further goal is to develop
generic composition schemes and, exploiting the metadata
provided with the fragments and the global profile describing the user context, to support (semi-)automatic composition
of documents that are adapted and personalized to
the specific user context.
Acknowledgments.
We would like to thank Utz Westermann for his contributions to the design and implementation of the ZYX model.
We would like to thank Jochen Wandel for his contributions
to the formal framework to support automatic augmentation of multimedia document models. We would also like
to thank Christian Heinlein for his valuable comments on
the paper.
References
[1] W. Klas, C. Greiner, and R. Friedl, "Cardio-OP - Gallery of
Cardiac Surgery," in Proc. of IEEE International Conference
on Multimedia Computing and Systems (ICMCS'99), Florence,
Italy, June 1999, IEEE Computer Society.
[2] D. Raggett, A. Le Hors, and I. Jacobs, HTML 4.0
Specification - W3C Recommendation, revised on 24-April-1998,
W3C, URL: http://www.w3.org/TR/1998/REC-html40-19980424, April 1998.
[3] ISO/IEC JTC1/SC29, Information technology - Coding of multimedia
and hypermedia information - Part 1: MHEG object
representation ISO/IEC 13522-1, ISO/IEC IS, 1997.
[4] ISO/IEC JTC1/SC29/WG12, Information Technology - Coding of Multimedia and Hypermedia Information - Part 6: Support for Enhanced Interactive Applications, ISO/IEC IS 13522-6, ISO/IEC, 1996.
[5] ISO/IEC JTC1/SC29/WG12, Information Technology - Coding
of Multimedia and Hypermedia Information - Part 5: Support
for Base-Level Interactive Applications, ISO/IEC IS 13522-5,
ISO/IEC, 1995.
[6] ISO/IEC, Information Technology - Hypermedia/Time-based
Structuring Language (HyTime), 1992, ISO/IEC IS 10744.
[7] S. R. Newcomb, N. A. Kipp, and V. T. Newcomb, "HyTime
- The Hypermedia/Time-Based Document Structuring Language," Communications of the ACM, vol. 34, no. 11, November
1991.
[8] P. Hoschka, S. Bugaj, D. Bulterman, et al., Synchronized Multimedia Integration Language - W3C Working Draft 2-February-98, W3C, URL: http://www.w3.org/TR/1998/WD-smil-0202,
February 1998.
[9] S. Boll, W. Klas, and U. Westermann, "Multimedia Document
Formats - Sealed Fate or Setting Out for New Shores?," in
Proc. of IEEE International Conference on Multimedia Computing and Systems (ICMCS'99), Florence, Italy, June 1999,
IEEE Computer Society.
[10] S. Boll, W. Klas, and U. Westermann,
"A Comparison of Multimedia Document Models Concerning Advanced
Requirements,"
Technical Report - Ulmer Informatik-Berichte Nr. 99-01, University of Ulm, Germany, February 1999,
http://www.informatik.uni-ulm.de/dbis/CardioOP/publications/TR99-01.ps.gz.
[11] S. Boll, W. Klas, and U. Westermann, "Multimedia Document
Formats - Sealed Fate or Setting Out for New Shores?," Multimedia - Tools and Applications, ICMCS special issue, to appear
in 2000.
[12] T. D. C. Little and A. Ghafoor, "Interval-Based Conceptual
Models for Time-Dependent Multimedia Data," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 4, August
1993.
[13] T. Wahl and K. Rothermel, "Representing Time in Multimedia Systems," in Proc. IEEE International Conference on Multimedia Computing and Systems, Boston, MA, May 1994, pp.
538-543.
[14] A. Duda and C. Keramane, "Structured Temporal Composition
of Multimedia Data," in Proc. IEEE International Workshop
on Multimedia-Database-Management Systems, Blue Mountain
Lake, August 1995.
[15] N. Hirzalla, B. Falchuk, and A. Karmouch, "A Temporal Model
for Interactive Multimedia Scenarios," IEEE Multimedia, vol. 2,
no. 3, pp. 24-31, Fall 1995.
[16] D. Papadias, Y. Theodoridis, T. Sellis, and M. J. Egenhofer,
"Topological Relations in the World of Minimum Bounding
Rectangles: A Study with R-Trees," in Proc. of the ACM SIGMOD Conference on Management of Data, San Jose, May 1995.
[17] M. J. Egenhofer and R. Franzosa, "Point-Set Topological Spatial
Relations," Int. Journal of Geographic Information Systems,
vol. 5, no. 2, March 1991.
[18] S. Boll, W. Klas, and J. Wandel, "A Cross-Media Adaptation
Strategy for Multimedia Presentations," in Proc. of ACM Multimedia'99, Orlando, Florida, USA, November 2-5, 1999.
[19] S. Boll, W. Klas, and U. Westermann, "Exploiting OR-DBMS
Technology to Implement the ZYX Data Model for Multimedia
Documents and Presentations," in Proc. of Datenbanksysteme
in Büro, Technik und Wissenschaft (BTW99), GI-Fachtagung,
Freiburg, Germany, March 1999, Springer.
[20] W. Klas and K. Aberer, "Multimedia and its Impact on
Database System Architectures," in Multimedia Databases in
Perspective, P. M. G. Apers, H. M. Blanken, and M. A. W.
Houtsma, Eds. Springer, London, 1997.
[21] S. Boll, W. Klas, and M. Löhr, "Integrated Database Services for
Multimedia Presentations," in Multimedia Information Storage
and Management, S. M. Chung, Ed. Kluwer Academic Publishers, Dordrecht, 1996.
[22] K. Böhm, K. Aberer, and W. Klas, "Building a Hybrid Database
Application for Structured Documents," Multimedia - Tools and
Applications, vol. 8, no. 1, 1999.
[23] F. Moser, A. Kraiß, and W. Klas, "L/MRP: A Buffer Management Strategy for Interactive Continuous Data Flows in a
Multimedia DBMS," in Proceedings VLDB 1995, USA, 1995,
Morgan Kaufmann.
[24] U. Westermann and W. Klas, "Architecture of a DataBlade
Module for the Integrated Management of Multimedia Assets,"
in Proceedings of the First International Workshop on Multimedia Intelligent Storage and Retrieval Management (MISRM),
Orlando, Florida, October 1999.