To appear in Transactions on Knowledge and Data Engineering, DS-8 Special Issue, IEEE, 2000

ZYX: A Multimedia Document Model for Reuse and Adaptation of Multimedia Content

Susanne Boll, Wolfgang Klas
Database and Information Systems (DBIS), University of Ulm, Computer Science Department, Ulm, Germany

Abstract. Advanced multimedia applications require adequate support for the modeling of multimedia content by multimedia document models. Increasingly, this support must cover not only the temporal and spatial course of a multimedia presentation and its interactions, but also the partial reuse of multimedia documents and their adaptation to a given user context. Our thorough investigation of existing standards for multimedia document models such as HTML, MHEG, SMIL, and HyTime, however, leads us to the conclusion that these standard models do not provide sufficient modeling support for reuse and adaptation. Therefore, we propose a new approach for the modeling of adaptable and reusable multimedia content, the ZYX model. Beyond the more or less common primitives for temporal, spatial, and interaction modeling, the model offers primitives that provide varied support for the reuse of the structure and layout of document fragments and for the adaptation of the content and its presentation to the user context. We present the model in detail and illustrate the application and effectiveness of these concepts with samples taken from our Cardio-OP application in the domain of cardiac surgery. With the ZYX model, we developed a comprehensive means for advanced multimedia content creation: support for template-driven authoring of multimedia content, and support for flexible, dynamic composition of multimedia documents customized to the user's local context and needs. The approach significantly impacts and supports the authoring process in terms of methodology and economic aspects.

Keywords: multimedia document model, reuse, adaptation, multimedia database system

I. Introduction

Multimedia applications need data models for the representation of the composition of media elements, i.e., multimedia document models. They are employed to model the semantic relationships between the media elements participating in a multimedia presentation. The initial requirements to multimedia document models are the modeling of the temporal and spatial course of a multimedia presentation and the modeling of user interaction. However, the requirements of multimedia applications have evolved. As authoring multimedia information is a very time-consuming and costly task, attention has been drawn to the reuse of multimedia documents for efficiency and economic reasons. Furthermore, the growing abundance of multimedia information calls for personalizing it according to the user's individual context. Access to and distribution of multimedia documents via networks like the Internet require adapting the documents to heterogeneous network and system environments.

Our research project "Gallery of Cardiac Surgery" (Cardio-OP¹) [1] is an example of an advanced multimedia application that emphasizes this need for reuse and adaptation and explicitly requires a model for multimedia material that supports extensive reuse of the material in different user contexts. The overall goal is to develop an Internet-based and database-driven multimedia information system for physicians, medical lecturers, students, and patients in the domain of cardiac surgery. The system will serve as a common information and education base for its different types of users, in which the users are provided with multimedia information according to their specific request, their different understanding of the selected subject, their geographic location, and their technical infrastructure. Within this project context, our group is developing concepts and prototypical implementations of a database-driven multimedia repository that integrates modeling, management, and content-based retrieval of multimedia content with flexible, dynamic multimedia presentation services that select, deliver, and present the multimedia content according to the user context. Major project requirements are the support for reuse, adaptation, and presentation-neutral description of the structure and content of multimedia documents.

¹ Partially funded by the German Ministry of Research and Education, grant number 08C58456. Our project partners are the University Hospital of Ulm, Dept. of Cardiac Surgery and Dept. of Cardiology, the University Hospital of Heidelberg, Dept. of Cardiac Surgery, an associated Rehabilitation Hospital, the publishers Barth-Verlag and dpunkt-Verlag, Heidelberg, FAW Ulm, and ENTEC GmbH, St. Augustin. For details see also URL http://www.informatik.uni-ulm.de/dbis/Cardio-OP/

Given the project's requirements, we were looking for suitable modeling support among existing multimedia document standards. Therefore, we elaborated both the traditional and the advanced requirements to multimedia document models and, equipped with these metrics, analyzed the document models HTML [2], MHEG [3], [4], [5], HyTime [6], [7], and SMIL [8]. The detailed analysis and comparison of the models can be found in [9], [10], [11]. However, the analysis of the models' basic modeling concepts as well as their support for reuse, adaptation, and presentation-neutral description of multimedia content showed that each of the models lacks some significant concepts and does not meet all of the requirements.
Therefore, we designed and implemented the ZYX model to overcome these limitations and to have a proper basis from which the multimedia repository can comprehensively provide for reusability and adaptation. In this paper, we present the ZYX model, which forms the core for the modeling of the multimedia content in our repository. In comparison to existing models, it provides more adequate support for semantic modeling, reusability and flexible composition, adaptation and individualization for presentation, and presentation-neutral storage. We illustrate the application of the model in the domain of cardiac surgery and point out the implications that such a model, supporting reuse and adaptation, has for multimedia authoring and multimedia presentation.

The paper is organized as follows: Section II provides the reader with a better understanding of the new requirements we see with next-generation multimedia applications. This leads to a metric that we used to analyze existing multimedia document models; the summary of this analysis is also presented in this section. It motivates the need for our new document model ZYX, which emphasizes the requirements for reuse and adaptation of multimedia documents. Section III presents the basic ideas and design considerations of the ZYX model, and Section IV gives the formal framework for a detailed understanding of the model. Focusing on reuse and adaptation, Section V presents and illustrates the spectrum of application possibilities of ZYX for reuse and adaptation and discusses the advantages this support brings to the creation and delivery of multimedia content. Section VI summarizes our work and gives an outlook on ongoing and future work.

II. Requirements to Multimedia Document Models and an Analysis of Existing Models

In this section, we present our requirements to multimedia document models. Here, we distinguish basic and advanced requirements.
The basic requirements to multimedia document models are the modeling of the temporal and spatial course of a multimedia presentation and the modeling of interaction. The challenging, advanced requirements are the reusability of the multimedia material, the adaptation to user-specific needs and context, and the presentation-neutral description of the content. As our focus lies on the advanced requirements, we start by presenting these in Section II-A and only briefly sketch the basic requirements afterwards in Section II-B. Both the basic and the advanced requirements constitute a metric along which we analyzed selected relevant multimedia document models for their suitability in the project context. This analysis is summarized in Section II-C.

A. Advanced Requirements

In order to support a modular and context-dependent composition of multimedia documents from media objects and parts of multimedia documents, document models need to provide a data model that supports reuse, adaptation, and presentation-neutral description of the structure and content of multimedia documents.

Reuse. As motivated in the introduction, reuse of multimedia material is an unavoidable requirement for multimedia document models. We characterize the reusability of multimedia content along three dimensions: the granularity of reuse, the kind of reuse, and the selection and identification of reusable components.

Granularity: The granularity of reuse determines what can be reused. Regarding multimedia document models, we can distinguish at least three levels of granularity of reusable components: reuse of complete multimedia documents; reuse of fragments of multimedia documents, like single scenes or teaching units; and reuse of individual atomic media elements, such as a video or audio, and of parts of those media elements, such as a scene of a video.
Kind of reuse: For all three levels of granularity, we distinguish two different ways of reusing material for the composition of new documents: identical reuse, i.e., the components are reused including all temporal, spatial, design, and interaction relationships and constraints as originally specified by the author(s); and structural reuse, which separates layout from structure and reuses only the structural parts.

Selection and identification: Before we can reuse multimedia components, we have to identify and select them within the multimedia information system. This calls for metadata and for mechanisms for classifying, indexing, and querying components. Hence, a document model should support comprehensive and sophisticated annotation of reusable components with metadata.

Adaptation. The presentation of multimedia documents should preferably adapt to the user context, such as the user's interests, knowledge level, and preferences, the targeted user system environment, and varying resources like available network bandwidth and CPU time. To introduce adaptivity into multimedia presentations, a multimedia document model must offer primitives to specify, generate, or otherwise derive presentation alternatives that reflect and meet the different presentation contexts. For an actual presentation, the system can use these alternatives to adapt the delivery and the rendering of the presentation to the current user context. For example, consider a professor on campus who is interested in seeing in-depth multimedia material on coronary artery bypass grafting, and an undergraduate student at home who needs only an abstract of the same material to pass the upcoming exam.
In these two different presentations, however, the "story" behind each presentation might be the same: some components of the professor's presentation might be (re)used in the student's presentation, while others might be substituted or adapted by more abstract representations of the specific content. For a better understanding, we distinguish adaptation by the extent to which the adaptability is modeled and by when the adaptability is exploited:

Extent of the adaptability: For the extent of the adaptability, we distinguish between adaptation to personal interest, which adapts the contents of a document to the user's interests, knowledge, professional background, and the like, and adaptation to technical infrastructure, which adapts to the technical infrastructure available to a user. In the example above, adaptation to technical infrastructure would be the capability to adapt the document's presentation both to the high-end environment of the professor on campus and to the low-end environment of the student at home. Therefore, the presentation should be adaptable by means of technical parameters like the resolution of images and the frame rate of videos, but also by means of media substitution, like substituting an audio by text or a video by a sequence of pictures or a small animation. Adaptation to personal interest would adapt the content such that the professor would see a more in-depth presentation of coronary artery bypass grafting, whereas the student would rather get a simplified variant of the presentation of the operation, thus reflecting the expected background knowledge of the different users.

Static or dynamic adaptability: With regard to the presentation alternatives, it is of interest whether all possible alternatives for the adaptation must be known and modeled at authoring time of a multimedia document, or whether they are left for generation at presentation time, just when the adaptation is needed.

Presentation-neutral Representation.
The available multimedia material has to be presentable in a heterogeneous software and hardware environment as can be found in the Internet. As a consequence, the multimedia material has to be stored presentation-neutral, i.e., independent of the actual realization of a presentation at a client. This calls for a presentation-neutral representation of multimedia content that is convertible into the respective presentation-specific format used for playout of the multimedia material. It is desirable that this conversion is lossless and that conversion to different "output formats" is possible. Hence, besides covering rich multimedia functionality, the presentation-neutral representation of multimedia content should take place on a high semantic level. The presentation-neutral model should also be open in the sense that it allows for the later integration of multimedia functionality expected to be developed in the future.

Multimedia functionality: The multimedia functionality of a multimedia document model describes the expressiveness of its modeling primitives. A document model should have a high multimedia functionality to give sufficient support for modeling multimedia content. With regard to the conversion of a (presentation-neutral) document into another output format, this means that if the target document model does not offer multimedia functionality equivalent to that of the source model, the conversion will be lossy.

Semantic level: A document model describes a document on a high semantic level if the document's structure is specified rather than its presentation. This is helpful, and necessary, to allow for an automatic conversion of a document into another document format, as the course of the presentation can then be extracted and converted more easily. If the document has a low semantic level, a conversion requires knowledge about the multimedia content that often only the author has.
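To make the lossy-conversion argument concrete, the following minimal sketch models a target format simply as the set of multimedia features its primitives can express; a conversion is lossy exactly when the source document uses a feature outside that set. The `TargetFormat` class, the `conversion_loss` function, the feature labels, and the format name are all hypothetical and do not describe any real standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TargetFormat:
    """A target document format, reduced (hypothetically) to the set of
    multimedia features its modeling primitives can express."""
    name: str
    features: frozenset[str]


def conversion_loss(used: set[str], target: TargetFormat) -> set[str]:
    """Features used by the source document that the target cannot express.

    An empty result means the conversion can be lossless with respect to
    multimedia functionality; any feature returned is lost in the conversion.
    """
    return set(used) - target.features


# Illustrative feature labels and format name; not taken from any standard.
document = {"interval-timing", "presentation-alternatives", "acoustic-layout"}
target = TargetFormat("format-A", frozenset({"interval-timing", "presentation-alternatives"}))

lost = conversion_loss(document, target)
print(sorted(lost))                     # ['acoustic-layout']
print("lossy" if lost else "lossless")  # lossy
```

The sketch captures only the functionality side of the argument; the semantic-level aspect, i.e., whether the course of the presentation can be extracted automatically at all, is not modeled.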
Therefore, the presentation-neutral representation of multimedia content should have a high multimedia functionality and take place on a high semantic level.

B. Basic Requirements

The traditional requirements for a temporal and spatial model, as well as for interaction modeling, are imperative for a multimedia document model and are, hence, presented only briefly for the sake of completeness.

Temporal model. A temporal model (see also [12], [13], [14], [15]) describes temporal dependencies between the media elements of a multimedia document. One can find four types of temporal models: point-based, interval-based, and event-based temporal models, and script-based specifications, in which temporal relations between media elements are expressed in scripts, i.e., programs written in a scripting language that can comprise temporal synchronization operations.

Spatial Model. Three approaches to positioning the visual elements on the presentation medium can be distinguished: absolute positioning based on a coordinate system; directional relations [16], using relations like strong-north and weak-north (to specify overlapping); and topological relations [17], using relations like disjoint, meet, and overlap.

Interaction. Users should be able to interact with presentations by three types of interaction: (1) navigational interactions, determining the user-defined flow of a multimedia presentation; (2) design interactions, influencing the visual and audible layout of a presentation; and (3) movie interactions, affecting the temporal course of the entire presentation. Navigational and design interactions should be specified within multimedia documents, whereas movie interactions are expected to be offered by the presentation engine.

C. Analysis of Existing Models

In this section, we very briefly summarize our analysis of the most relevant existing standards and data models in view of the requirements presented in the previous section.
Both the basic and the advanced requirements constitute a metric along which we analyzed selected multimedia document models. Due to space limitations, we cannot present our comprehensive and detailed discussion of how the models meet the specific requirements in this paper, but refer the reader to [9], [10], [11]. Figure 1 illustrates the results of our analysis of the most relevant existing approaches and shows to which extent HTML/DHTML, MHEG-5/6, HyTime, and SMIL fulfill the basic and advanced requirements. For each of the requirements, the single aspects elaborated in Section II are listed, and for each of the models the figure shows how, and to what extent, the requirements are met by the model.

[Figure 1: table rating HTML, DHTML, SMIL, MHEG-5, and HyTime against the basic requirements (temporal model; spatial model, where all five models use absolute positioning; navigational and design interaction) and the advanced requirements (reusability: granularity of media elements, fragments, and documents, identical and structural reuse, identification/selection; adaptation: user interest and technical infrastructure as parameters of adaptability, static and dynamic definition of alternatives, with several adaptation entries referring to MHEG-6; presentation-neutral representation: multimedia functionality and semantic level).]
Fig. 1. Summary of the support of the basic and advanced requirements by HTML, DHTML, MHEG-5/6, SMIL, and HyTime (+ support, o partial support, - no support)

The analysis of existing standards, de facto standard formats, and models shows that, although individual formats and models are strong with respect to particular features, they are not capable of meeting all the requirements identified in the previous section, especially those we find with advanced multimedia applications, i.e., support for reuse, adaptation, and presentation-neutral description. This result led to the design and implementation of the ZYX model, which tries to take the pick of the bunch of features of existing formats and models, especially also recent developments in the area of Internet-applicable models driven by the development of XML and SMIL.

III. The ZYX Model

When designing the ZYX model, we were, of course, taking into account the lessons learned with the models we analyzed. To give the reader an understanding of the design of our model and also of the points of contact of ZYX with other approaches in the field, we sketch our design considerations in Section III-A. In Section III-B, we then introduce the reader to the basic concepts of our ZYX data model before we present the detailed formal framework for ZYX in Section IV.

A. Design Considerations

Aiming at the design of a model that fulfills the requirements of reuse, adaptation, and presentation-neutral representation as presented in the previous section, there are still choices open as to how to achieve sufficient support for these requirements with a new data model. In the following, we take up the advanced requirements and discuss how we aim at supporting them in ZYX. With regard to the basic requirements, we present which underlying temporal and spatial model we selected and explain the interaction capabilities.

Presentation-neutral representation. For the supported degree of presentation neutrality of the multimedia document model, the semantic level of the model and the model's abstraction from the actual presentation are crucial. Therefore, we decided to develop a data model that describes a multimedia document on a high semantic level. This allows a (lossy) export or conversion of our multimedia documents into data models like MHEG-5, SMIL, and HTML. To keep the documents independent of the final realization within a multimedia presentation, the model strictly separates the modeling of layout information from the document structure. To be able to support a rich multimedia functionality, our model is designed to support as much of the multimedia functionality of these models as possible while still keeping a high semantic level.

Reuse. For the structure of the documents, we consider a hierarchical organization of the document as it can be found with XML-based document models. To achieve reuse on an arbitrary level of granularity, the model supports different granules of reusable components, i.e., media elements, document fragments, and entire documents. The model strictly separates the modeling of layout information from structure to keep the documents independent of the final realization within a multimedia presentation. Due to this separation, it is possible both to reuse just the structure and add new layout information to it, and to reuse the different granules directly with their layout information. Hence, the ZYX model supports structural and identical reuse of elements, fragments, and documents. For the selection and identification of the different granules, the model can annotate the granules with content-descriptive metadata.

Adaptation. With our document model we want to support comprehensive adaptation mechanisms. Adaptability in ZYX is not limited to a predefined set of discriminating technical attributes that are exploited for adaptation, as can be found with SMIL, but can be specified by an open set of attributes that reflect a complex user and system context. The model offers the static modeling of "presentation alternatives" that can be exploited for adaptation to the different presentation contexts. Additionally, the model offers primitives that determine the needed presentation alternative only at the point in time when the document is actually requested and presented.

Temporal model. We decided to use an interval-based temporal model. In order to fulfill the important requirement of describing the temporal dimension of interaction, we selected the Interval Expressions [14] to form the basis of the underlying temporal model of the ZYX data model. Going beyond other interval-based temporal models, it allows describing and relating time intervals that possibly have an unknown duration, a feature which is of importance for interaction modeling. The selection of an interval-based temporal model does not contradict the high semantic level of the document model, as would be the case with an event-based or script-based temporal model.

Spatial model. For the spatial layout, we decided on a point-based description of each visual media entity in a multimedia document. Each visual media entity is assigned a two-dimensional extension plus a third dimension to specify the overlapping of visual media entities. So far, we do not consider the specification of spatial relationships between media entities like right-of or beside. As our model strictly separates structure and layout and defines clear interfaces to add layout to structure, the model can, however, be extended by a more sophisticated spatial model later.

Interaction. Our model supports two interaction types: navigational/decision interactions and design interactions. It provides comprehensive support for these two interaction types, comparable with the interaction capabilities of MHEG-5 and more sophisticated than those of SMIL.

B. Basic Concepts of the ZYX Model

In this section, we present the terminology and the basic concepts of the ZYX model. The ZYX model describes a multimedia document by means of a tree. The nodes of the tree are the presentation elements, and the edges of the tree bind the presentation elements together in a hierarchical fashion. Each presentation element has one binding point with which it can be bound to another presentation element. It also has one or more variables with which it can bind other presentation elements. Additionally, each presentation element can bind projector variables to specify the element's layout. Figure 2 introduces the graphical representation of these basic elements of the model, which we use in the following to illustrate the model's features. The presentation elements are represented by rectangles; they form the nodes of the document tree. On top of each rectangle, a diamond represents the element's binding point. The variables are represented by the filled circles below the rectangle. The open circles on the right side of each presentation element represent the element's projector variables. The actual connections of variables and projector variables to binding points of other presentation elements are represented by edges in the graphical representation. A variable that is connected to another presentation element is called a bound variable; variables that are not connected are called free variables.

[Figure 2: two presentation elements, each with a binding point, variables v1 ... vn, and projector variables pv1 ... pvn; labels distinguish free variables, bound variables, and projector variables.]
Fig. 2. Graphical representation of the basic document elements

Presentation elements are the generic elements of the model. They can be media elements that represent the media data, but also elements that represent the temporal, spatial, layout, and interactive semantic relationships between the elements of a multimedia document. Consider the simple document tree, a so-called ZYX fragment, in Figure 3. A temporal element, the sequential element seq, binds the media elements Image and Text to its variables v2 and v4, as well as a parallel element par to its variable v3. The par element in turn synchronizes a Video and an Audio, which are bound to its variables v6 and v7. The presentation semantics of each fragment is that it starts with the presentation of the root element, here the sequential element. The specific presentation semantics of the seq element is that the elements bound to its variables v1, v2, v3, v4, and v5 are presented one after the other. That is, the element that will be bound to v1 is presented first, then the image bound to v2, then the par element, and so on. The presentation semantics of the par element is that the video and the audio element bound to its variables v6 and v7 are presented in parallel. The sample fragment represents the media elements and the semantic relationships between the four media elements. With the seq element's binding point, this fragment can be bound to another presentation element in a more complex multimedia document tree. The variables v1 and v5 of the fragment are still unbound. Here, another author could later insert, e.g., a title at the beginning and a summary at the end of the sequence.

[Figure 3: seq with variables v1 to v5; Image, par, and Text bound to v2, v3, and v4; Video and Audio bound to the variables v6 and v7 of par.]
Fig. 3. Simple document tree: a ZYX fragment

We now explain the modeling capabilities of our model with regard to our specific requirements of reusability, adaptation, and presentation-neutral representation, as well as temporal and spatial modeling and interaction.

Reusability. First, we describe the elements of ZYX that support different granularities of reusable components of multimedia documents.

Reusability on the level of media elements is supported by means of selector elements: these are presentation elements that determine what, that is, which part of a media element, is presented. They can be used to select and thereby (re)use a specific part of an audio or a specific area of an image. To select a part of a continuous media element, the temporal selector temporal-s specifies the start and duration of the selected sequence. Figure 4 illustrates the usage and semantics of a temporal selector element: the temporal selector selects a scene of a video with a duration of 40 sec, beginning at second 10 of the original video.

[Figure 4: temporal-s [10,40] bound via v1 to a Video; on a time axis from 0 to 50 sec, the interval selected by temporal-s is marked.]
Fig. 4. Temporal selector element temporal-s and its semantics

To select a spatial fraction of a visual media element, the spatial selector specifies the selected area by a polygon. In Figure 5, a spatial selector spatial-s is applied to an image media element to select a rectangular area from the image. The selectors can also be applied to fragments, e.g., to select two minutes of an existing slide presentation or a fraction of a composite visual element.

[Figure 5: spatial-s [x,y,w,h] bound via v1 to an Image; the image part selected by spatial-s has its corner at (x,y), width w, and height h, within an image of the given width and height with origin (0,0).]
Fig. 5. Spatial selector element spatial-s and its semantics

Reuse is also supported on the level of fragments. Here, templates, complex media elements, and external media elements provide support for the reusability of fragments:

Templates: In the ZYX model, not all of the variables of a presentation element must be bound at authoring time. In Figure 3, the variables v1 and v5, e.g., for the title and the summary of the presentation, are still unbound. This means that the sequence element seq can later be completed by binding presentation elements to the free variables v1 and v5. This makes the simple fragment in Figure 3 a "template" for later (re)use. This is an important feature for building reusable fragments that can be applied in different multimedia documents by (a kind of late) binding of the free variables differently according to the current context.

Complex and external media elements: It is, of course, possible to form more complex fragments like the one shown in Figure 6. To make reuse easier and to make large documents easier to handle, fragments can be encapsulated by complex media elements. An encapsulated fragment then appears like a single presentation element in the specification tree, with one binding point and possibly a set of variables. In the example in Figure 6, different presentation elements of the fragment leave variables unbound, which makes it a template as described above. Here, too, the encapsulation of a fragment by a complex media element helps: to make the later "filling" of such templates easier, a template can also be encapsulated. The free variables of the fragment are exported and form the variables of the complex media element. Figure 6 illustrates how a complex media element encapsulates a complex fragment. A complex media element is, in a sense, the black-box view of a possibly complex presentation fragment. The concept of free variables, in combination with complex media elements, provides comprehensive and workable reusability on the level of presentation fragments. Analogously, an external media element encapsulates the specification of a fragment that was composed in another, external document format. This allows existing documents of another document format to be included in our model. What is encapsulated by the external media element, however, depends on the external document format.

[Figure 6: a fragment rooted in a seq element, containing among others an Image, a par element, and an Audio, is encapsulated in a complex media element whose variables v1, v6, v8, v9, and v5 are the exported free variables of the fragment; the complex media element is then reused.]
Fig. 6. Complex fragment encapsulated in a complex media element

Fragments and documents: And, of course, fragments and entire documents can be reused by binding the root element of the document to a free variable in another document.

With regard to the kind of reuse, the model supports both identical and structural reuse. For this purpose, besides the selector elements, the ZYX data model offers projector elements that influence the visual and audible layout in a presentation of a multimedia document.
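As an illustration of the tree model of Section III-B, the following minimal sketch encodes presentation elements with named variables, binding, free variables (the open slots of a template), and the seq/par presentation semantics for the fragment of Figure 3. It is not the ZYX implementation: the `Element` class, the `bind`, `free_variables`, and `schedule` functions, and the media durations are hypothetical, and projector variables, selectors, and complex media elements are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Element:
    """Hypothetical encoding of a ZYX presentation element.

    kind is 'seq', 'par', or 'media'; variables maps variable names to the
    element bound there, or to None while the variable is still free.
    """
    kind: str
    name: str = ""
    duration: float = 0.0  # assumed playout length, used by media elements only
    variables: dict[str, Optional[Element]] = field(default_factory=dict)


def bind(parent: Element, var: str, child: Element) -> None:
    """Bind child (via its binding point) to a free variable of parent."""
    if var not in parent.variables:
        raise KeyError(f"{parent.kind} has no variable {var}")
    if parent.variables[var] is not None:
        raise ValueError(f"variable {var} is already bound")
    parent.variables[var] = child


def free_variables(el: Element) -> list[str]:
    """All unbound variables in the tree below el, in document order."""
    free: list[str] = []
    for var, child in el.variables.items():
        if child is None:
            free.append(var)
        else:
            free.extend(free_variables(child))
    return free


def schedule(el: Element, start: float = 0.0) -> tuple[list[tuple[str, float, float]], float]:
    """Compute (media name, begin, end) triples and the end time of el.

    seq presents its bound elements one after the other, par presents them
    in parallel; free variables contribute nothing until they are bound.
    """
    if el.kind == "media":
        return [(el.name, start, start + el.duration)], start + el.duration
    items: list[tuple[str, float, float]] = []
    end = start
    for child in el.variables.values():
        if child is None:
            continue
        child_start = end if el.kind == "seq" else start
        child_items, child_end = schedule(child, child_start)
        items.extend(child_items)
        end = max(end, child_end)
    return items, end


# The fragment of Figure 3; the durations (in seconds) are made up.
seq = Element("seq", variables={v: None for v in ("v1", "v2", "v3", "v4", "v5")})
par = Element("par", variables={"v6": None, "v7": None})
bind(seq, "v2", Element("media", "Image", 5))
bind(seq, "v3", par)
bind(seq, "v4", Element("media", "Text", 5))
bind(par, "v6", Element("media", "Video", 40))
bind(par, "v7", Element("media", "Audio", 30))

print(free_variables(seq))  # ['v1', 'v5']: the template's open slots
print(schedule(seq))        # Image 0-5, Video 5-45 with Audio 5-35, then Text 45-50

# Late binding: another author fills the title slot v1 of the template.
bind(seq, "v1", Element("media", "Title", 3))
print(free_variables(seq))  # ['v5']
```

Binding the title to v1 afterwards is the late binding that makes the fragment a template: only v5 remains free, and a new call to `schedule` shifts every later element by the title's duration.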
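The selector semantics described in this section, a temporal selector given by start and duration and a spatial selector that cuts out an area of an image, can be sketched as follows. This is a sketch under stated assumptions: the `TemporalSelector` and `SpatialSelector` classes and their `apply` methods are hypothetical, media time is in seconds, the spatial selector covers only the rectangular [x, y, w, h] case of Figure 5 although ZYX allows arbitrary polygons, and clipping the selection to the bounds of the media element is a policy the paper does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalSelector:
    """Hypothetical encoding of temporal-s: start and duration (in seconds)
    of the part of a continuous media element that is presented."""
    start: float
    duration: float

    def apply(self, media_length: float) -> tuple[float, float]:
        """Return the selected (begin, end) interval in media time,
        clipped to the length of the underlying media element."""
        if self.start < 0 or self.duration < 0:
            raise ValueError("start and duration must be non-negative")
        begin = min(self.start, media_length)
        end = min(self.start + self.duration, media_length)
        return begin, end


@dataclass(frozen=True)
class SpatialSelector:
    """Hypothetical encoding of spatial-s for the rectangular case [x, y, w, h];
    general polygons, which ZYX allows, are not covered here."""
    x: int
    y: int
    w: int
    h: int

    def apply(self, width: int, height: int) -> tuple[int, int, int, int]:
        """Return the selected rectangle (x, y, w, h) clipped to an image
        of the given width and height, with (0, 0) as its origin."""
        x0, y0 = max(self.x, 0), max(self.y, 0)
        x1 = min(self.x + self.w, width)
        y1 = min(self.y + self.h, height)
        if x1 <= x0 or y1 <= y0:
            raise ValueError("selected area lies outside the image")
        return x0, y0, x1 - x0, y1 - y0


# Figure 4: 40 seconds starting at second 10; the 60-second video length is assumed.
print(TemporalSelector(10, 40).apply(60))                # (10, 50)
print(SpatialSelector(20, 30, 100, 50).apply(640, 480))  # (20, 30, 100, 50)
```

Because a selector only describes which part is presented, the same media element can be bound under several selectors without being copied, which is what makes reuse at the level of parts of media elements cheap.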
Projector elements determine how a media element or a fragment is presented, for example the presentation speed of a video or the spatial position of an image on the screen. Projectors are bound to the projector variables of presentation elements; each presentation element can have zero or more projector variables to which projectors can be bound. A projector applies not only to the presentation element it is bound to but also to its subtree. Since projectors can be nested arbitrarily, authoring tools should provide consistency checking to avoid contradictory layout specifications.

Figure 7 illustrates the usage of projector elements and the separation of structure and layout. In this example, a fragment defines the parallel presentation of an audio and a video. Two projector elements are bound to the par element: a spatial projector spatial-p and an acoustic projector acoustic-p. Each of the projectors applies only to those elements in the same tree that it can affect. Therefore, the spatial projector affects the spatial layout of the video, and the acoustic projector applies to the audio element and determines its volume, bass, treble, and balance. By changing or adding projector elements, one can change the layout of the document. This allows the same structure to be reused with different presentation layouts, i.e., it implements structural reuse. This follows the idea of separating structure from layout information found in SGML and XML, and it also complies with our requirement for a presentation-neutral representation of documents.

Fig. 7. Simple fragment with spatial and acoustic projector elements and their semantics (structure: par over a Video and an Audio element; layout: spatial-p [10,10,30,30] and acoustic-p [20,0,0,0] bound via the projector variables pv1 and pv2)

As we have outlined in the requirements, reuse needs support for the identification and selection of the multimedia content to be reused; hence, metadata is needed. Therefore, each ZYX fragment is assigned a set of metadata that describes its content by means of attribute-value pairs.

Adaptation. Adaptation means that the ZYX document that is delivered for presentation should best match the context of the user who requested the document. To support this kind of adaptation, both a description of the user context and a multimedia document that can be adapted to this context are needed.

The context of a user is captured in a so-called user profile, i.e., metadata that describes the user's topics of interest, presentation system environment, network connection characteristics, and the like. This metadata is organized as key-value pairs, just like the metadata that is assigned to the multimedia content.

The ZYX data model provides two presentation elements for adapting the document to a user profile: the switch element and the query element. The switch element allows the author to specify different alternatives for a specific part of the document. Each alternative under a switch element is associated with metadata that describes the context in which this specific alternative is the best choice for presentation. This metadata is specified as a set of discriminating attribute-value pairs for each alternative. During presentation, the user profile is evaluated against the metadata of the switch, and the alternative whose discriminating attributes best match the current user profile is selected for presentation.

An illustration of the switch element is given in Figure 8. The switch element specifies two presentation alternatives: the first alternative, bound to v1, is associated with a seminar-like teaching style (type, "seminar") and the second one with a lecture-like type of teaching (type, "lecture"). When the document is presented, either the left or the right subtree is presented, depending on the preferred type of teaching reflected in the user's current profile. As the switch element can specify an arbitrary number of alternatives, each described by an arbitrary number of attribute-value pairs, it provides a very comprehensive degree of adaptability: almost every aspect of the user and the environment can be distinguished and later evaluated for adaptation during presentation.

Fig. 8. Specification of presentation alternatives with the switch element (switch [(type, "seminar"), (type, "lecture")]; the "seminar" alternative is bound to v1 and the "lecture" alternative to v2, each a par subtree)

A switch element can be used only if all alternatives can be modeled at authoring time, in advance of the presentation. Hence, the switch element implements the requirement for static adaptability of the model. However, an author may not be able or may not want to specify a part of the presentation exactly, but only describe the desired fragments and defer the actual selection of suitable fragments to the point in time when the document is requested for presentation. For example, an author might wish to specify that at a specific point in a presentation about "cardiac surgery" a digression into physiology is to be made, without specifying which fragments are relevant to it; instead, the most suitable one is to be selected out of a pool of available fragments just before presentation. This can be specified with a query element. By means of metadata, the query represents the fragment that is expected at this point in the presentation. When the document is selected for presentation, the query element is evaluated and replaced by the fragment that best matches the metadata given by the query element.

An illustration of the query element is given in Figure 9. The sample query element is the placeholder for the fragment best matching the query with topic "physiology in cardiac surgery", of type "lecture", and of 5 minutes duration. The more metadata tuples are used, the more specific the query is. The query element provides for the dynamic adaptability of the model, as the evaluation of the query and the selection of the fragment take place just before presentation.

Fig. 9. Specification of presentation alternatives with the query element: evaluation of the query element and replacement by the selected ZYX fragment (authoring time: query [(topic, "physiology in cardiac surgery"), (type, "lecture"), (duration, 5 min.)]; presentation time: the query element is replaced by the selected fragment, a seq over an Image, a Text, and a par over Video and Audio)

Presentation-neutral representation. The requirement of presentation-neutral representation is strongly interrelated with structural reuse (see also Figure 7). The explicit separation of structure and layout allows for a presentation-neutral representation. As outlined before, the variables of a presentation element need not be bound in the first place, and this also applies to the projector variables. It is thus possible to specify the presentation-neutral course of the presentation and to bind the presentation-dependent layout later, just when the document is selected for presentation. The presentation-neutral structure of the document is then bound via projector variables to the presentation-dependent layout defined by a set of projectors.

Temporal and spatial modeling. Based on Interval Expressions [14], the model offers the primitives seq, par, loop, and delay to specify temporal interval relationships. These presentation elements can be nested to specify any temporal course of the multimedia presentation. For the spatial model, we use the spatial projectors presented above. They realize the absolute positioning we decided to use for the ZYX model. A spatial projector determines the spatial layout of the presentation element it is applied to, and the layout applies to the entire subtree of that presentation element.

Interaction. The requirement to support the modeling of interactive multimedia presentations is met by the data model's interaction elements. The model offers two types of interaction elements: navigational interactive elements and design interactive elements. The basic navigational element is the genericLink element, which allows the author to specify the transition from the document to an arbitrary link target. Note that this element itself is not interactive. Based on the genericLink, the menu element allows the user to interactively select one out of a set of visual elements and to follow the presentation path that is associated with the selected element. The elements hotspot and hypertext define fine-grained interactive visual areas in images and text. The design interactive elements are the interactive versions of the projector elements. For example, whereas the typographic projector specifies the font, size, and style of a text, the interactive typographic projector element specifies that these settings can be altered interactively when the document is presented.

IV. Formal Framework of the ZYX Model

In this section, we present the formal framework of the ZYX model. We first introduce the basic terminology and formalism of the basic elements of the model and then present the elements for modeling the temporal course, the layout, interaction, and the adaptation of the presentation. Figure 10 gives an overview of the definitions to follow. They are listed along the requirements and design criteria presented in Section II, which were used for the comparison of document models illustrated in Figure 1.

A. Basic Terminology

The presentation elements are the generic elements of the ZYX model. Each presentation element $p$ is assigned exactly one binding point $b_p$. This is the connector with which a presentation element can be bound to another presentation element.
A presentation element furthermore has 0 to $n$ variables $v$, which are used to bind other presentation elements to it. To add layout information to a presentation element, it can optionally have 0 to $n$ projector variables $pv$ that can be used to bind projector elements to the element. The projector variables are treated separately in order to separate structure and layout. The symbols introduced in Definition 1 are used in the definitions to follow.

Definition 1 (Symbols) Let $B$ denote the set of all binding points, $VAR$ the set of all variables, $PVAR$ the set of all projector variables, $T$ the set of all element types, $MT$ the set of media types, $MED$ the set of all raw media data, $ZYXDOC$ the set of all ZYX documents, $EXT$ the set of multimedia documents in an external document format, $PT \subseteq OT$ the set of all projector element types, $ATTRIBUTES$ the set of all possible attribute names, and $COLORS$ the set of all possible colors.

A presentation element $p$ is defined as follows:

Definition 2 (Presentation element) A presentation element $p$ is a tuple $p: [t_p, b_p, V_p, PV_p]$ with $t_p \in T$ denoting the type of $p$, $b_p \in B$ denoting the binding point of $p$, $V_p \subseteq VAR$ denoting the set of variables of $p$, and $PV_p \subseteq PVAR$ denoting the set of projector variables of $p$. The tuple $p$ can be augmented with further tuple elements depending on the type $t_p$ of the presentation element.
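To make Definition 2 concrete, here is a minimal Python sketch of a presentation element as a record with a type, one binding point, variables, and projector variables. As an illustration, it also enforces the binding rules stated in the definitions that follow: a variable binds at most one binding point, a binding point is bound at most once, projector variables bind only projector elements, and ordinary variables never bind them. The class, its fields, and the string identifiers are hypothetical and not part of the ZYX model; only the projector type names (Spatial-P, Temporal-P, Acoustic-P, Typographic-P) are taken from the projector definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Projector element types (the set PT) as named in the projector definitions.
PROJECTOR_TYPES = {"Spatial-P", "Temporal-P", "Acoustic-P", "Typographic-P"}


@dataclass(eq=False)
class PresentationElement:
    """Hypothetical sketch of p = [t_p, b_p, V_p, PV_p] (Definition 2)."""
    elem_type: str                                                # t_p, e.g. "Seq", "Video"
    binding_point: str                                            # b_p
    variables: List[str] = field(default_factory=list)            # V_p
    projector_variables: List[str] = field(default_factory=list)  # PV_p
    bindings: Dict[str, "PresentationElement"] = field(default_factory=dict)
    is_bound: bool = False  # True once b_p is connected to some variable

    def bind(self, var: str, child: "PresentationElement") -> None:
        """Connect variable `var` of this element to the binding point of `child`."""
        if var not in self.variables and var not in self.projector_variables:
            raise ValueError(f"{var} is not a variable of {self.binding_point}")
        if child is self:
            raise ValueError("an element cannot bind itself")
        if var in self.bindings:
            raise ValueError(f"{var} is already bound")
        if child.is_bound:
            raise ValueError(f"{child.binding_point} is already bound")
        is_projector = child.elem_type in PROJECTOR_TYPES
        if var in self.projector_variables and not is_projector:
            raise ValueError("projector variables bind only projector elements")
        if var in self.variables and is_projector:
            raise ValueError("variables cannot bind projector elements")
        self.bindings[var] = child
        child.is_bound = True

    def free_variables(self) -> List[str]:
        """Variables still unbound, i.e., the places where a template can be filled."""
        return [v for v in self.variables if v not in self.bindings]


# Usage: a seq "template" whose first and last variables are left open.
seq = PresentationElement("Seq", "b_seq", ["v1", "v2", "v3", "v4", "v5"], ["pv1"])
for var, kind in [("v2", "Image"), ("v3", "Video"), ("v4", "Text")]:
    seq.bind(var, PresentationElement(kind, f"b_{var}"))
seq.bind("pv1", PresentationElement("Spatial-P", "b_layout"))
print(seq.free_variables())  # ['v1', 'v5']
```

The usage builds a hypothetical five-variable seq element, loosely modeled on the template of Figure 3, in which v1 and v5 stay unbound; `free_variables()` reports exactly the places that a later binding would fill.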
A presentation element $p$ can be an atomic media element, a complex media element, an external media element, a specific operator element that builds up the temporal, structural, and interactive relationships, or an element for the specification of adaptation. This is distinguished by the type $t_p$ in the definition of a presentation element $p$.

Fig. 10. Summary of definitions of ZYX elements, grouped along the requirements of Section II (label, name, purpose, definition number):
Basic primitives: p, presentation element, generic element of the ZYX model (2); c, connection, interconnecting presentation elements (4).
Basic elements: am, atomic media element, represents a media element (3); cm, complex media element, encapsulates a ZYX fragment (6); em, external media element, encapsulates an external fragment in another document format (7).
Temporal model: par, seq, loop, and delay operator elements, specifying parallel presentations, sequential presentations, repetitive presentations, and a temporal gap (8, 9, 10, 11).
Spatial model: spatial-p, spatial projector element, projects the visual presentation to a rectangular area (16).
Interaction, navigational: gen_link, generic link element, non-interactive transition to a target element or document (20); ZYX_link, ZYX link element, non-interactive transition to a ZYX document (21); menu, menu interaction element, interactive menu for selecting a presentation path (22); hotspot, hotspot interaction element, interactive region in a visual element (23); hypertext, hypertext interaction element, interactive text region (24).
Interaction, design: temporal-pi, temporal interactive projector element, interactive adjustment of speed and direction (25); spatial-si, spatial interactive selector element, interactive scaling of a visual element (26).
Reusability, granularity: am (3); temporal-s, temporal selector element, selects a temporal part of a continuous media element or fragment (13); spatial-s, spatial selector element, selects a visual area of a visual media element or fragment (14); textual-s, textual selector element, selects a continuous passage of a text element (15); f = (P, C, M), fragment (5); cm (6); em (7).
Reusability, kind of reuse: spatial-p (16); temporal-p, temporal projector element, playback direction and speed factor (17); acoustic-p, acoustic projector element, volume, balance, bass, and treble (18); typographic-p, typographic projector element, font, size, style, color, etc. (19).
Reusability, identification/selection: f = (P, C, M), fragment specification with metadata for identification and selection (5); query, query element, query for a presentation fragment (30).
Adaptation (parameters of adaptability: user interest, technical infrastructure; definition of alternatives: static, dynamic): switch element (28), decide element (29), query element (30).
Presentation-neutral representation: the model separates structure from layout by separating structural composition from projectors.
Multimedia functionality: the model provides a comprehensive set of elements for high multimedia functionality.

The basic units of a ZYX multimedia document are the atomic media elements. An atomic media element is an instantiation of a media type. In our model, an atomic media element abstracts from the raw media data and just represents the media element and its media-specific characteristics. The formal definition of an atomic media element is given in Definition 3.

Definition 3 (Atomic media element) An atomic media element $am: [t_{am}, b_{am}, V_{am}, PV_{am}, m]$ is a presentation element with $t_{am} \in MT = \{Audio, Video, Image, Text, Animation\} \subseteq T$, $V_{am} = \emptyset$, and $m \in MED$ denoting the media data represented by $am$.

Presentation elements are interconnected using their variables and binding points. Each variable and each projector variable of a presentation element can be bound to exactly one binding point of another presentation element. Each binding point of a presentation element can be bound to exactly one variable or projector variable of another presentation element. A connection binds one variable to a binding point and is formally defined in Definition 4:

Definition 4 (Connection) A connection $c = [v, b_{p'}]$ connects the (projector) variable $v \in V_p \cup PV_p$ of a presentation element $p$ with the binding point $b_{p'}$ of a presentation element $p' \neq p$.

The result of interconnecting presentation elements is a specification tree that describes a reusable fragment of a multimedia document. A fragment can consist of a single media element, a part of a multimedia document, or an entire multimedia document. The formal description of a valid fragment is given in Definition 5.

Definition 5 (Fragment) A fragment $f = (P, C)$ is an acyclic, undirected graph that describes a part of or an entire multimedia document with:
$P$ the set of presentation elements that are part of the tree,
$C \subseteq \{[v, b_{p'}] \mid p, p' \in P, p \neq p', v \in V_p \cup PV_p\}$ the set of connections in the tree.
For a valid fragment $f = (P, C)$ the following conditions must hold:
1. If $c_1, c_2 \in C$, $c_1 = [v_1, b_p]$, $c_2 = [v_2, b_p]$, $p \in P$, then $v_1 = v_2$, i.e., each binding point can be bound to only one variable.
2. If $c_1, c_2 \in C$, $p, p' \in P$, and $c_1 = [v, b_p]$, $c_2 = [v, b_{p'}]$, then $p = p'$, i.e., each variable can be bound to only one binding point.
3. $Unbound_f = \{p \in P \mid \neg\exists v \in \bigcup_{p' \in P} V_{p'} : [v, b_p] \in C\}$ and $|Unbound_f| = 1$, $root_f \in Unbound_f \wedge t_{root_f} \notin PT$.
There is exactly one presentation element $p \in P$ of the fragment $f$ that is not bound to any other presentation element. This unbound presentation element is called the root element, denoted $root_f$, of the fragment and has the binding point $b_{root_f}$ that forms the "entry point" of the fragment; note that projector elements cannot be root elements.
4. There is no sequence of connections $c_1, \ldots, c_n$ such that $c_i = [v_i, b_{p_i}]$, $i = 1, \ldots, n$, with $v_{i+1} \in V_{p_i}$ for $i = 1, \ldots, n-1$, and $v_1 \in V_{p_n}$. This means that $f$ is acyclic.
5. $\forall pv \in \bigcup_{p \in P} PV_p : \exists [pv, b_{p'}] \in C \Rightarrow t_{p'} \in PT$.
Projector variables of a presentation element can bind only projector elements.
6. $\forall v \in \bigcup_{p \in P} V_p : \exists [v, b_{p'}] \in C \Rightarrow t_{p'} \notin PT$.
Variables of a presentation element cannot bind projector elements.
7. $\forall p \in P : t_p \in PT \Rightarrow V_p = PV_p = \emptyset$.
A projector element cannot bind any other presentation element.

Fragments form the building blocks of a multimedia document. They are the units that can be reused and recomposed in different multimedia documents. To ease this reuse of ZYX fragments, we introduce the definition of a complex media element. A complex media element $cm$ encapsulates a fragment $f = (P, C)$ within the definition of a presentation element, much like a container. With this definition, an encapsulated fragment can simply be reused like a single presentation element in any other fragment. A complex media element $cm$ is defined as follows (Definition 6):

Definition 6 (Complex media element) A complex media element $cm: [t_{cm}, b_{cm}, V_{cm}, PV_{cm}, f]$ is a presentation element that encapsulates the fragment $f = (P, C)$ with $t_{cm} = Complex \in T$, $b_{cm} = b_{root_f}$, $V_{cm} = \{v \in \bigcup_{p \in P} V_p \mid \forall q \in P : [v, b_q] \notin C\}$, and $PV_{cm} = \{pv \in \bigcup_{p \in P} PV_p \mid \forall q \in P : [pv, b_q] \notin C\}$.

That is, the binding point $b_{root_f}$ of the root of the encapsulated fragment $f$ becomes the binding point $b_{cm}$ of the complex media element $cm$. All variables and all projector variables in the fragment $f$ that are not bound are exported and form the free variables $V_{cm}$ and projector variables $PV_{cm}$ of the complex media element. For an illustration, recall Figure 6: the binding point of the seq element becomes the binding point of the complex media element, and the unbound variables $v_1$, $v_6$, $v_8$, $v_9$, and $v_5$ become the free variables of the complex media element. As complex media elements encapsulate ZYX fragments, they offer a means of abstraction. The export of free variables allows the complex media element to be completed later. Hence, complex media elements can form templates that can be "filled" later by binding media elements, other complex media elements, and fragments to the free variables. This "late binding" of presentation elements to the free variables finally instantiates the actual ZYX document.

To encapsulate fragments that are specified in an external format, we define external media elements (Definition 7). An external media element $em$ is also a complex media element. However, it encapsulates not a fragment specified in ZYX but the specification of an external fragment available in another data model. Like the complex media element, the external media element is assigned a set of variables $V_{em}$, projector variables $PV_{em}$, and one binding point $b_{em}$. However, the meaning of the variables and projector variables depends on the external document format.

Definition 7 (External media element) An external media element $em: [t_{em}, b_{em}, V_{em}, PV_{em}, f]$ is a presentation element that encapsulates the fragment $f \in EXT$ with $t_{em} = External \in T$, $b_{em}$ the binding point of the external fragment, $V_{em}$ the variables of the external fragment, and $PV_{em}$ the projector variables of the external fragment.

With the definitions given so far, it is possible to compose presentation elements by means of connections. The interconnection of presentation elements via their variables and binding points puts these presentation elements in a relationship; the semantics of this relationship, however, is not yet defined. Therefore, our data model offers different types of presentation elements, the operator elements, with which presentation elements can be interconnected with a specific semantics. In the following, we present the definitions of the temporal operators, projectors, selectors, interaction elements, and adaptation elements. These elements determine the semantics that have to be interpreted by a presentation environment and mapped into the spatial, temporal, structural, interaction, and adaptive domains of a multimedia presentation. The different operator elements are defined in the tuple notation already introduced for the generic presentation element. Again, the type distinguishes the different operator elements. For each element, the tuple carries additional type-specific values that characterize the element's specific semantics. To avoid repetition in the definitions to follow, only the domains of the newly introduced tuple elements are given.

B. Temporal Operator Elements

The temporal operator elements determine the temporal relationships between the presentation elements. As outlined above, our temporal model is based on Interval Expressions [14]. In the following, we present the definitions of the temporal operator elements par, seq, loop, and delay, their specific parameters, and their semantics. An illustration of these temporal operator elements is shown in Figure 11.

The presentation semantics of the par operator element (Definition 8) is that the presentation elements bound to its variables are to be presented in parallel.
Definition 8 (Temporal operator element: par) The temporal operator element $par: [t_{par}, b_{par}, V_{par}, PV_{par}, finish, lipsync]$ is a presentation element with $t_{par} = Par \in OT$, $V_{par} = \{v_1, \ldots, v_n\} \subseteq VAR$, $finish \in \{1, \ldots, n, min, max\}$, and $lipsync \in \mathbb{N}_0$.

The par operator element offers the two parameters $finish$ and $lipsync$ to control the synchronization of the parallel presentation. The parameter $finish$ determines which of the $n$ presentation elements bound to $v_1, \ldots, v_n$ terminates the parallel presentation. If $finish$ is set to $min$ or $max$, the presentation stops when the presentation of the element with the minimal or the maximal presentation time, respectively, stops. With $finish = i$, $i \in \{1, \ldots, n\}$, the presentation stops when the presentation of the dedicated presentation element bound to $v_i$ stops. The second parameter, $lipsync$, determines the element that forms the master of a continuous fine-grained synchronization during playout of the par operator. If $lipsync$ equals 0, no lip synchronization is specified. If $lipsync = i$, $i > 0$, the presentation elements bound to $v_1, \ldots, v_n$ are presented in lip synchronization, and the presentation element bound to $v_i$ forms the master of this synchronization.

The presentation semantics of the seq operator element (Definition 9) is that the presentation elements bound to it are presented in sequence. The presentation of a seq operator element starts the sequential presentation of the presentation elements bound to the variables $v_i$, $i = 1, \ldots, n$, in the order $v_1, v_2, \ldots, v_n$. It begins with the presentation of the presentation element bound to $v_1$ and ends with the end of the presentation of the element bound to $v_n$.
Definition 9 (Temporal operator element: seq) The temporal operator element $seq: [t_{seq}, b_{seq}, V_{seq}, PV_{seq}]$ is a presentation element with $t_{seq} = Seq \in OT$ and $V_{seq} = \{v_1, \ldots, v_n\} \subseteq VAR$.

The presentation semantics of the loop operator element (Definition 10) is that its presentation starts the repeated presentation of the single presentation element bound to $v \in V_{loop}$. The presentation is repeated $r$ times and stops after the $r$-th presentation of the presentation element. If $r$ is set to $\infty$, the presentation of the element loops forever.

Definition 10 (Temporal operator element: loop) The temporal operator element $loop: [t_{loop}, b_{loop}, V_{loop}, PV_{loop}, r]$ is a presentation element with $t_{loop} = Loop \in OT$, $|V_{loop}| = 1$, and $r \in \mathbb{N} \cup \{\infty\}$.

The delay operator element (Definition 11) models a temporal delay of $t$ milliseconds. It can be seen as an "empty" media element that is presented for a duration of $t$ milliseconds.

Definition 11 (Temporal operator element: delay) The temporal operator element $delay: [t_{delay}, b_{delay}, V_{delay}, PV_{delay}, t]$ is a presentation element with $t_{delay} = Delay \in OT$, $V_{delay} = PV_{delay} = \emptyset$, and $t \in \mathbb{N}$.

Fig. 11. Fragment illustrating the usage of the temporal operator elements (loop[10] over a seq of par(Video 1, Text 1), delay[50], par(Video 2, Text 2), delay[50])

Figure 11 illustrates the different temporal operators defined above. The loop element that forms the root of the sample fragment specifies that its subtree is repeated 10 times. This subtree consists of a sequence of two videos with accompanying texts, each followed by a short temporal gap of 50 ms for the transition.

C. Selectors

The model offers selector elements to reuse parts of media elements and fragments, i.e., spatial regions and temporal intervals. First, Definition 12 introduces the notion of a successor in a fragment, which is needed for subsequent definitions.

Definition 12 (Successor) Let $\mathcal{F}$ denote the set of all fragments. We define a function $expand: \mathcal{F} \to \mathcal{F}$ that computes for a fragment $f$ the fragment that is semantically equivalent to $f$ but does not contain any complex media element. The function $expand(f)$ recursively replaces each complex media element in $f$ by the fragment that the complex media element encapsulates. Let $f \in \mathcal{F}$ be a fragment, $expand(f) = (P, C)$ the expanded fragment, and $p, p' \in P$ presentation elements. Then the following direct and indirect successor relationships hold:
1. $p'$ is a direct successor of $p$ $\iff \exists [v, b_{p'}] \in C : v \in V_p$.
2. $p'$ is an indirect successor of $p$ $\iff$ $p'$ is not a direct successor of $p$ and there exists a sequence $succ_1, \ldots, succ_n$, $n \in \mathbb{N}$, with $succ_1$ a direct successor of $p$, $succ_i$ a direct successor of $succ_{i-1}$ for $i = 2, \ldots, n$, and $p'$ a direct successor of $succ_n$.
3. $p'$ is a successor of $p$ $\iff$ $p'$ is a direct or indirect successor of $p$.

For example, in Figure 11 the seq element is a direct successor of the loop element. The video and text elements are indirect successors of the loop element and direct successors of their respective par element. There is no successor relationship between elements in different branches of the tree, e.g., between Video 1 and Video 2.

Now we can define the different selector elements: the temporal selector, spatial selector, textual selector, and acoustic selector. A temporal selector temporal-s (Definition 13) is a presentation element that can bind exactly one other presentation element $p$. Its presentation semantics is that the presentation of $p$ and its direct and indirect successors starts at the point $start$ milliseconds after the original starting point of the fragment and lasts for $duration$ milliseconds.

Definition 13 (Temporal selector element: temporal-s) The temporal selector element $temporal\text{-}s: [t_{temporal\text{-}s}, b_{temporal\text{-}s}, V_{temporal\text{-}s}, PV_{temporal\text{-}s}, start, duration]$ is a presentation element with $|V_{temporal\text{-}s}| = 1$, $t_{temporal\text{-}s} = Temporal\text{-}S \in OT$, and $start, duration \in \mathbb{N}_0$.
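The temporal selector semantics of Definition 13, together with the context-sensitive nesting of temporal selectors that is illustrated with Figure 12 in the following, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: selectors are given as (start, duration) pairs ordered from the innermost selector (closest to the media element) to the outermost, all times use one unit (seconds, as in Figures 4 and 12), and a selection that would extend beyond the interval already selected below it is clipped. The clipping rule, the function name, and the 60-second video length are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (absolute start, duration) on the media element's time axis


def resolve_temporal_selectors(media_duration: int,
                               selectors: List[Interval]) -> Interval:
    """Resolve nested temporal selectors, given innermost first.

    Each selector's start is interpreted relative to the interval selected by
    the selectors below it (the whole media element for the innermost one).
    Assumption: a selection reaching past that interval is clipped to it.
    """
    abs_start, length = 0, media_duration
    for start, duration in selectors:
        if start > length:
            raise ValueError("selector starts after the selected interval ends")
        abs_start += start                       # shift into the enclosing interval
        length = min(duration, length - start)   # clip to what is still available
    return abs_start, length


# Figure 4: temporal-s [10,40] on a (hypothetical) 60 s video selects 10 s to 50 s.
print(resolve_temporal_selectors(60, [(10, 40)]))            # (10, 40)
# Figure 12: s2 = [10,40] is inner, s1 = [10,25] outer; s1 starts 10 s into s2's interval.
print(resolve_temporal_selectors(60, [(10, 40), (10, 25)]))  # (20, 25)
```

In the nested case, the result (20, 25) corresponds to the interval from 20 s to 45 s of the original video: the 10 s start of s1 is added to the 10 s start of the interval already selected by s2.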
A spatial selector spatial-s (Definition 14) can bind exactly one other presentation element $p$, which can be a visual media element like an image or a video but also a complex media element with a visual appearance. The spatial selector selects a spatial area from $p$. Its presentation semantics is that only those visual parts of $p$ and its successors are presented that are visible within the rectangular area specified by the element's parameters $x$, $y$, $width$, and $height$. For an illustration of the spatial selector, see Figure 5.

Definition 14 (Spatial selector element: spatial-s) The spatial selector element $spatial\text{-}s: [t_{spatial\text{-}s}, b_{spatial\text{-}s}, V_{spatial\text{-}s}, PV_{spatial\text{-}s}, x, y, width, height]$ is a presentation element with $t_{spatial\text{-}s} = Spatial\text{-}S \in OT$, $|V_{spatial\text{-}s}| = 1$, $x, y \in \mathbb{N}_0$, and $width, height \in \mathbb{N}$.

The application of temporal and spatial selector elements is context sensitive. That is, they apply to the entire subtree of the presentation element bound to them. Selector elements can be organized in a hierarchy, and each selector element is applied in the context of the subtree it is bound to. For an illustration, consider the example given in Figure 12: two temporal selector elements $s_1$ and $s_2$ with $s_1 = [Temporal\text{-}S, b_{s_1}, \{v_{s_1}\}, \emptyset, 10, 25]$ and $s_2 = [Temporal\text{-}S, b_{s_2}, \{v_{s_2}\}, \emptyset, 10, 40]$ (time in seconds) are nested, with $s_2$ being a direct or indirect successor of $s_1$. Then the temporal interval selected by $s_1$ is defined relative to the temporal interval specified by $s_2$. That is, the start time of 10 sec of $s_1$ is relative to the beginning of the interval already selected by $s_2$.

Fig. 12. Sample fragment illustrating the usage and semantics of nesting temporal selector elements $s_1: [\ldots, 10, 25]$, $s_2: [\ldots, 10, 40]$ (s2 selects the interval from 10 s to 50 s of the video; s1 selects, relative to that interval, 25 s starting at its 10th second)

To also be able to reuse parts of text, a textual selector textual-s (Definition 15) selects a continuous fraction of a text media element bound to its variable. The presentation semantics is that only the selected part of the text is presented, i.e., the text fraction that begins at text position $start$ and has the given $length$ in characters.

Definition 15 (Textual selector element: textual-s) The textual selector element $textual\text{-}s: [t_{textual\text{-}s}, b_{textual\text{-}s}, V_{textual\text{-}s}, PV_{textual\text{-}s}, start, length]$ is a presentation element with $t_{textual\text{-}s} = Textual\text{-}S \in OT$, $|V_{textual\text{-}s}| = 1$, $start \in \mathbb{N}_0$, and $length \in \mathbb{N}$.

D. Projectors

To add layout information to a presentation element, its 0 to $n$ projector variables $pv$ can be used to bind projector elements to the presentation element. Projector elements are presentation elements that determine how presentation elements are presented. The model offers four different projector elements, spatial-p, temporal-p, acoustic-p, and typographic-p, to specify the spatial, temporal, acoustic, and typographic layout of a presentation; we define them in the following.

The presentation semantics of the spatial projector element spatial-p (Definition 16) is that the visual presentation of $p$, the presentation element it is bound to, is "projected" onto the rectangular presentation area defined by the projector element. The parameters $x$ and $y$ define the position of the upper left corner of a rectangle with the given $width$ and $height$. The parameter $priority$ defines the stacking order of overlapping visual objects: an object with a higher priority value covers objects with a lower priority value. The value of the parameter $unit$ determines whether the values of the parameters $x$, $y$, $width$, and $height$ are given in pixels or in percent of a presentation window.
Definition 16 (Spatial projector element, spatial-p) The spatial projector element spatial-p: $[t_{\text{spatial-p}}, b_{\text{spatial-p}}, V_{\text{spatial-p}}, PV_{\text{spatial-p}}, x, y, width, height, priority, unit]$ is a presentation element with $t_{\text{spatial-p}} = \text{Spatial-P} \in PT$, $V_{\text{spatial-p}} = PV_{\text{spatial-p}} = \emptyset$, $x, y, priority \in \mathbb{N}_0$, $width, height \in \mathbb{N}$, and $unit \in \{pixel, percent\}$.

The spatial projector, like all projector elements, applies not only to the presentation element p it is bound to but also to all successors of p. That is, with regard to the spatial projection it affects the entire subtree of which p is the root. The visual parts of p and of its successors, if any, are scaled to the presentation area defined by the projector's parameters. If spatial projectors are nested, each spatial projector spatial-p is evaluated in its context. Figure 13 illustrates the usage and the semantics of nesting spatial projector elements. In the example, the root par element has a spatial projector bound to it that specifies the rectangular presentation area for the subtree as [x = 10, y = 10, w = 100, h = 100]. This area is indicated on the right of the figure by a dotted rectangle. The two images that are successors of the par element each have their own spatial projector. The spatial projector of Image 1 defines the presentation area [x = 0, y = 0, w = 40, h = 40], and that of Image 2 the presentation area [x = 60, y = 60, w = 40, h = 40]. Both spatial projectors of the images are evaluated in the context of the spatial projector bound to the par element. Therefore, the areas of the two images are projected within the area defined by the spatial projector of the par element.

Fig. 13. Sample fragment illustrating the usage and semantics of nesting spatial projector elements.

The presentation semantics of the temporal projector element temporal-p (Definition 17) bound to a presentation element p is that p is presented with the given playback direction and speed. The parameter direction specifies whether the presentation element (and its subtree) is presented forward (direction = 1) or backward (direction = -1). The actual playback speed is computed by multiplying the original playback speed by the factor given by the speed parameter.

Definition 17 (Temporal projector element, temporal-p) The temporal projector element temporal-p: $[t_{\text{temporal-p}}, b_{\text{temporal-p}}, V_{\text{temporal-p}}, PV_{\text{temporal-p}}, direction, speed]$ is a presentation element with $t_{\text{temporal-p}} = \text{Temporal-P} \in PT$, $V_{\text{temporal-p}} = PV_{\text{temporal-p}} = \emptyset$, $direction \in \{-1, 1\}$, and $speed \in \mathbb{R}^+$.

Like the spatial projector element, a temporal projector element applies not only to the presentation element p it is bound to but to all successors of p. If, for example, the temporal-p projector of a presentation element p defines speed = 2 and a successor p' of p has a temporal projector that also defines speed = 2, then p' is in fact presented at a speed factor of 4.

The acoustic projector element and the typographic projector element are defined in the same way. The acoustic projector element acoustic-p (Definition 18) determines the volume, balance, base (bass), and treble of the presentation of the presentation element p and all successors of p. The typographic projector element typographic-p (Definition 19) affects the font, size, style, and background and foreground color of the presentation of the presentation element p it is bound to and all successors of p.

Definition 18 (Acoustic projector element, acoustic-p) The acoustic projector element acoustic-p: $[t_{\text{acoustic-p}}, b_{\text{acoustic-p}}, V_{\text{acoustic-p}}, PV_{\text{acoustic-p}}, volume, balance, base, treble]$ is a presentation element with $t_{\text{acoustic-p}} = \text{Acoustic-P} \in PT$, $V_{\text{acoustic-p}} = PV_{\text{acoustic-p}} = \emptyset$, $volume \in [0, \ldots, 100]$, and $balance, base, treble \in [-1, \ldots, 1]$.

Definition 19 (Typographic projector element, typographic-p) The typographic projector element typographic-p: $[t_{\text{typographic-p}}, b_{\text{typographic-p}}, V_{\text{typographic-p}}, PV_{\text{typographic-p}}, font, size, style, bg, fg]$ is a presentation element with $t_{\text{typographic-p}} = \text{Typographic-P} \in PT$, $V_{\text{typographic-p}} = PV_{\text{typographic-p}} = \emptyset$, $font \in FontNames$, $style \in \{normal, italic, bold\}$, $size \in point$, and $bg, fg \in COLORS$.

A projector element first affects the presentation element p it is bound to. If p has successors, these can be affected too: each successor of p is affected if the specific projector can actually have an effect on it. For example, a typographic projector affects only those elements in the subtree of p that have typographic aspects. In Figure 7, a spatial and an acoustic projector element are bound to a par temporal operator. The spatial projector applies only to the video, whereas the acoustic projector applies only to the audio that is bound to the par element.

E. Interaction Elements

To support the requirement of interactive multimedia presentations, the model offers different interaction elements for navigational and design interactions. The gen_link element (Definition 20) is the basic element for modeling navigation in ZYX documents. The generic link is the presentation element that specifies a non-interactive, direct transition to a target element. It serves as the basis for the actual "interactive" elements described in the following. The gen_link has the two parameters target and mode. The parameter target specifies the target of the transition, and the parameter mode specifies how this transition is to be carried out.

Definition 20 (Generic link element, gen_link) The interaction element gen_link: $[t_{\text{gen\_link}}, b_{\text{gen\_link}}, V_{\text{gen\_link}}, PV_{\text{gen\_link}}, target, mode]$ is a presentation element with $t_{\text{gen\_link}} = \text{GenericLink} \in OT$, $V_{\text{gen\_link}} = PV_{\text{gen\_link}} = \emptyset$, $target \in dom(\text{UniformResourceIdentifier})$, and $mode \in \{stop, spawn\}$.

The presentation semantics of the gen_link is that, when the link element is presented, the link target specified by a Uniform Resource Identifier (URI) is presented. The target need not be a ZYX document; it can be an HTML document or an arbitrary application, and it is presented by the browser or viewer associated with the target's URI. The mode of the generic link determines whether the current presentation stops and only the target is presented (mode = stop) or whether the target is presented in parallel with the current presentation (mode = spawn). The ZYX sample tree in Figure 14 shows a video-audio presentation that is directly followed by the presentation of the link target, i.e., the presentation of the target specified by anURI.

Fig. 14. Sample fragment illustrating the usage and semantics of the gen_link interaction element.

As the generic link is intended to model transitions to arbitrary link targets, we introduce the ZYX_link (Definition 21) to specify the specific transition to a ZYX document.

Definition 21 (ZYX link element, ZYX_link) The interaction element ZYX_link: $[t_{\text{ZYX\_link}}, b_{\text{ZYX\_link}}, V_{\text{ZYX\_link}}, PV_{\text{ZYX\_link}}, target, mode]$ is a presentation element with $t_{\text{ZYX\_link}} = \text{ZYX\_LINK} \in OT$, $V_{\text{ZYX\_link}} = PV_{\text{ZYX\_link}} = \emptyset$, $target \in ZYXDOC$, and $mode \in \{stop, spawn\}$.

The semantics of the ZYX_link is that, on its presentation, the ZYX document specified by target is presented. The parameter mode describes whether the presentation of the current document stops and the target ZYX document is presented (mode = stop) or whether the target is presented in parallel with the current presentation (mode = spawn).

So far, the elements gen_link and ZYX_link model a direct, non-interactive transition to a link target. For a link transition initiated by a user interaction with a visual presentation element, we define the menu interaction element. The menu interaction element (Definition 22) defines a set of variables to which the presentation elements of the visual link anchors are bound, together with the corresponding presentation elements that are to be presented when the respective link anchor is interactively selected.

Definition 22 (Interaction element, menu) The interaction element menu: $[t_{\text{menu}}, b_{\text{menu}}, V_{\text{menu}}, PV_{\text{menu}}, mode]$ is a presentation element with $t_{\text{menu}} = \text{Menu} \in OT$, $mode \in \{vanish, prevail\}$, $V_{\text{menu}} = \{v_1, \ldots, v_n, t_1, \ldots, t_n\}$, and $n \in \mathbb{N}$.

The menu interaction element defines a set of selectable presentation elements (link anchors) bound to $v_i \in V_{\text{menu}}, i = 1, \ldots, n$, representing the menu items. The presentation elements bound to $t_i \in V_{\text{menu}}, i = 1, \ldots, n$, represent the target elements of the selection. Each selectable menu item bound to $v_k$ corresponds to the target $t_k$. The presentation semantics of the menu element is that, on presentation of the menu element, all the elements bound to $v_i, i = 1, \ldots, n$, are presented in parallel, i.e., the menu is presented. When a user selects one of the menu items, say the one bound to $v_j$, the target element of the selection bound to $t_j$ is presented. The parameter mode determines what happens with the current presentation. If mode = vanish, the engine finishes the presentation of all presentation elements bound to $v_i, i = 1, \ldots, n$, and starts the presentation of the presentation element bound to $t_j$. If mode = prevail, the engine "merges" the presentation of the presentation element bound to $t_j$ with the currently running presentation. If no menu item is selected by the user, the presentation of the menu element stops as soon as the presentation of all presentation elements bound to $v_i, i = 1, \ldots, n$, is finished.

Fig. 15. Sample fragment illustrating the usage and semantics of the menu and ZYX_link interaction elements.

Figure 15 illustrates the usage of the menu element. Image 1 and Image 2, bound to $v_1$ and $v_2$, represent the two selectable menu items. On interaction with Image 1, the video-audio presentation bound to $t_1$ starts. On interaction with the link anchor Image 2, bound to $v_2$, the presentation of the ZYX_link bound to $t_2$ starts, which results in the presentation of its target ZYX document. The menu interaction element allows for interaction with visual presentation elements and for navigation within a document, i.e., the selection of one out of a set of possible presentation paths. By using gen_link and ZYX_link elements as target elements of the menu element, these paths can leave the document and lead to other documents.

So far, the appearance of a link is limited to the visual appearance of the presentation element that forms the link anchor. To offer a more fine-grained specification of link anchors, e.g., a region in an image or a word within a text, the ZYX model offers the primitives hotspot and hypertext. The hotspot element (Definition 23) is a variant of the menu element that refines the interaction-sensitive area to an arbitrary polygon of a visual element. In addition to the link anchor, it specifies a set of sensitive areas by polygons. Instead of linking a set of link anchors with a set of targets as the menu element does, the hotspot element links areas of visual presentation elements with link targets.
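Resolving a user interaction against a hotspot amounts to a point-in-polygon test combined with a check of each area's activity interval, i.e., of the tuples $P_i$ given in Definition 23. The Python sketch below uses even-odd ray casting as one possible hit test; the ZYX model does not prescribe a particular test, and `SensitiveArea`, `inside`, and `hotspot_target` are hypothetical names. It returns the 1-based index i of the target $t_i$ to present, picking the first matching area if several overlap (also an assumption).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class SensitiveArea:
    """One tuple P_i = [[<x1,y1>, ..., <xm,ym>], [start, dur]] of a hotspot."""
    polygon: List[Point]
    start: float  # seconds, relative to the start of the hotspot's presentation
    dur: float


def inside(polygon: List[Point], x: float, y: float) -> bool:
    """Even-odd (ray casting) point-in-polygon test."""
    hit = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            cross_x = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < cross_x:  # crossing lies to the right of the point
                hit = not hit
    return hit


def hotspot_target(areas: List[SensitiveArea], x: float, y: float,
                   t: float) -> Optional[int]:
    """Return i such that target t_i is presented, or None if nothing was hit."""
    for i, area in enumerate(areas, start=1):
        active = area.start <= t < area.start + area.dur
        if active and inside(area.polygon, x, y):
            return i
    return None
```

Because the activity interval is checked first, a click inside a polygon whose interval has already elapsed is simply ignored, which matches the requirement that an area is sensitive only during [start, dur].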
Definition 23 (Interaction element, hotspot) The interaction element hotspot: $[t_{\text{hotspot}}, b_{\text{hotspot}}, V_{\text{hotspot}}, PV_{\text{hotspot}}, P_1, \ldots, P_n, mode]$ is a presentation element with $t_{\text{hotspot}} = \text{HotSpot} \in OT$, $V_{\text{hotspot}} = \{v, t_1, \ldots, t_n\}$, $PV_{\text{hotspot}} = \emptyset$, $P_i = [[\langle x_1, y_1 \rangle, \ldots, \langle x_m, y_m \rangle], [start, dur]]$, $mode \in \{vanish, prevail\}$, and $n \in \mathbb{N}$.

The presentation semantics of the hotspot is the presentation of the link anchor bound to v together with the associated, not necessarily visible, interaction-sensitive areas. Each of these areas is defined by a tuple $P_i$ that specifies the sensitive area by a polygon $\langle x_1, y_1 \rangle, \ldots, \langle x_m, y_m \rangle$ and the interval [start, dur] during which the sensitive area is active. This interval is relative to the beginning of the presentation of the hotspot. On user interaction with the sensitive area specified by $P_i$, the corresponding link target $t_i$ is presented under the given mode (vanish or prevail).

A further variant of the menu element is the hypertext element (Definition 24). Just as a hotspot associates an interaction-sensitive region of an image or a video with a link, the hypertext element offers a means to model sensitive parts within a text. Like the hotspot, a hypertext interaction element is sensitive for a specified temporal interval [start, dur].

Definition 24 (Interaction element, hypertext) The interaction element hypertext: $[t_{\text{hypertext}}, b_{\text{hypertext}}, V_{\text{hypertext}}, PV_{\text{hypertext}}, T_1, \ldots, T_n, mode]$ is a presentation element with $t_{\text{hypertext}} = \text{HyperText} \in OT$, $V_{\text{hypertext}} = \{v, t_1, \ldots, t_n\}$, $PV_{\text{hypertext}} = \emptyset$, $T_i = [[start, length], [start, dur]]$, $mode \in \{vanish, prevail\}$, and $n \in \mathbb{N}$.

The presentation semantics of the hypertext is that, on its presentation, the presentation of the text anchor bound to v starts.
The hypertext element specifies the sensitive regions of the text by means of tuples $T_i = [[start, length], [start, dur]]$, each defining a sensitive text segment by its starting text position and its length, together with the temporal interval during which the segment is active. On user interaction with the sensitive text segment defined by $T_i$, the corresponding link target $t_i$ is presented under the given mode (vanish or prevail).

The model provides two further types of interaction elements, interactive projector elements and interactive selector elements. These elements generally correspond to the selector elements and to the projector elements of Definitions 16 to 19, but they have an additional "interactive" aspect, i.e., their parameters can be interactively changed and adjusted by a user. For each projector and selector element, the model provides a corresponding interactive element. An example of an interactive projector element is the interactive temporal projector element temporal-pi (Definition 25), an interactive variant of the temporal-p projector element. Its presentation semantics is that, in addition to the specified temporal projection, a user can interactively adjust the element's parameters direction and speed within their domains during the presentation.

Definition 25 (Interaction element, temporal-pi) The interactive temporal projector element temporal-pi: $[b_{\text{temporal-pi}}, V_{\text{temporal-pi}}, PV_{\text{temporal-pi}}, direction, speed]$ is a presentation element with $V_{\text{temporal-pi}} = \emptyset$, $speed \in \mathbb{R}^+$, and $direction \in \{-1, 1\}$.

An example of an interactive selector is the interaction element spatial-si (Definition 26), a special spatial-s selector.
Its presentation semantics is that, in addition to the spatial selection, the presentation engine allows a user to interactively adjust the selected spatial area and the overlapping by changing the parameters x, y, width, height, and priority within their domains.

Definition 26 (Interaction element, spatial-si) The interactive spatial selector element spatial-si: $[b_{\text{spatial-si}}, V_{\text{spatial-si}}, PV_{\text{spatial-si}}, x, y, width, height]$ is a presentation element with $V_{\text{spatial-si}} = \emptyset$, $x, y \in \mathbb{N}_0$, and $width, height \in \mathbb{N}$.

The temporal-si element is defined analogously. The interactive selector elements allow modeling the interactive spatial and temporal scaling of media elements and fragments during the presentation. While the elements gen_link, ZYX_link, menu, hotspot, and hypertext support navigational interaction, the interactive projector and selector elements implement the design interactions of multimedia presentations.

F. Adaptation elements

Our model offers the two elements switch and query, which allow for the adaptation of a multimedia presentation to the user's individual context. This user context, expressing the user's topics of interest, presentation system environment, network connection characteristics, and the like, is described in a global profile GP by means of attribute-value pairs (Definition 27).

Definition 27 (Global profile, GP) The global profile $GP: [m_1, \ldots, m_n]$ is a set of metadata with $m_i = [attr_i, value_i]$ denoting attribute-value pairs that describe the current user context during a presentation, with $attr_i \in ATTRIBUTES$, $value_i \in dom(attr_i)$, and $i \in \mathbb{N}$.

The switch adaptation element (Definition 28) serves to specify different presentation alternatives for different contexts. Under a switch element, an author can "collect" different alternatives (media elements or fragments) and add metadata to each alternative that specifies under which presentation conditions the alternative is to be selected.
Thereby, an author can define different fragments that convey the same content under different presentation contexts, such as the system environment, the user's language, the user's understanding of the subject, the network bandwidth, and the like. The metadata associated with the switch element is evaluated by the presentation environment against the global profile to select the alternative that best matches the current context.

Definition 28 (Adaptation element, switch) The adaptation element switch: $[t_{\text{switch}}, b_{\text{switch}}, V_{\text{switch}}, PV_{\text{switch}}, M_1, \ldots, M_n]$ is a presentation element with $t_{\text{switch}} = \text{Switch} \in OT$, $M_i$ denoting sets of attribute-value pairs, $V_{\text{switch}} = \{v_1, \ldots, v_n, v_{default}\}$, and $n \in \mathbb{N}$.

The presentation semantics of the switch element is that, upon its presentation, the metadata available in GP is evaluated against the sets of metadata $M_i, i = 1, \ldots, n$, of the switch. Let $M_j, j \in \{1, \ldots, n\}$, be the set of metadata that best matches GP. Then the fragment bound to $v_j$, i.e., the presentation alternative best matching the current presentation context, is presented. If there is no suitable set of metadata among $M_1, \ldots, M_n$, the presentation element bound to $v_{default}$ is selected for presentation. The metadata of the switch element is continuously evaluated against the current, possibly changing global profile, which reflects a changing presentation context such as varying bandwidth. Hence, during the presentation of the switch element, the presentation environment can select another, more suitable alternative when the context changes, e.g., switch from a video to a slide show due to decreasing network bandwidth. The presentation of the switch element terminates when the presentation of the selected presentation element is finished. For cases in which an author does not want to allow this kind of continuous adaptation, the model provides the decide element.
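The difference between the continuous re-evaluation of switch and the one-time evaluation of decide can be sketched in Python. The model leaves the matching function open, so the score used here is purely an assumption: the number of attribute-value pairs of $M_i$ that also occur in GP, where a best score of zero counts as "no suitable set" and falls back to $v_{default}$. `select`, `Switch`, and `Decide` are hypothetical names, and alternatives are indexed from 0.

```python
from typing import Dict, List, Optional

Profile = Dict[str, str]  # attribute -> value, e.g. the global profile GP


def match_score(metadata: Profile, gp: Profile) -> int:
    """Assumed measure: how many pairs of M_i also occur in GP."""
    return sum(1 for attr, value in metadata.items() if gp.get(attr) == value)


def select(alternatives: List[Profile], gp: Profile) -> Optional[int]:
    """0-based index of the best-matching M_j, or None for v_default."""
    best, best_score = None, 0
    for j, metadata in enumerate(alternatives):
        score = match_score(metadata, gp)
        if score > best_score:  # ties keep the earlier alternative
            best, best_score = j, score
    return best


class Switch:
    """Re-evaluates its alternatives on every change of the global profile."""

    def __init__(self, alternatives: List[Profile]):
        self.alternatives = alternatives
        self.current: Optional[int] = None

    def on_profile(self, gp: Profile) -> Optional[int]:
        self.current = select(self.alternatives, gp)
        return self.current


class Decide(Switch):
    """Evaluates only once, at the beginning of its presentation."""

    def __init__(self, alternatives: List[Profile]):
        super().__init__(alternatives)
        self.decided = False

    def on_profile(self, gp: Profile) -> Optional[int]:
        if not self.decided:
            self.current = select(self.alternatives, gp)
            self.decided = True
        return self.current
```

With alternatives labeled `{"bandwidth": "high"}` and `{"bandwidth": "low"}`, a drop in bandwidth moves the switch from the first alternative to the second, while the decide element stays with its initial choice.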
Using a decide element instead of the switch element would, e.g., make the presentation stay with the video once it has been selected, instead of switching to an alternative slide show. The decide element is given in Definition 29.

Definition 29 (Adaptation element, decide) The adaptation element decide: $[t_{\text{decide}}, b_{\text{decide}}, V_{\text{decide}}, PV_{\text{decide}}, M_1, \ldots, M_n]$ is a presentation element with $t_{\text{decide}} = \text{Decide} \in OT$, $M_i$ denoting sets of attribute-value pairs, $V_{\text{decide}} = \{v_1, \ldots, v_n, v_{default}\}$, and $n \in \mathbb{N}$.

The presentation semantics of the decide element is the same as that of the switch element, except that the sets of metadata are evaluated against the current global profile GP, and the best match is selected, only once, at the beginning of the presentation of the decide element.

For cases in which the presentation alternatives of a document are not known at authoring time, the query element (Definition 30) is provided. The query element is merely a "placeholder" for a fragment. It specifies a "query" that selects a fragment from all available fragments just before presentation time. The resulting fragment replaces the query element in the ZYX document. For the definition of the query element, we extend the definition of a fragment given in Definition 5 such that a fragment specification also includes metadata, i.e., $f = (P, C, M)$ with M being a set of attribute-value pairs. This metadata describes both the content of a fragment f, such as the topics covered, and technical features of the fragment, such as the network bandwidth needed for its presentation.

Definition 30 (Adaptation element, query) The adaptation element query: $[t_{\text{query}}, b_{\text{query}}, V_{\text{query}}, PV_{\text{query}}, M]$ is a presentation element with $t_{\text{query}} = \text{Query} \in OT$, M denoting a set of attribute-value pairs, and $V_{\text{query}} = \emptyset$.
The semantics of the query element is that, before the actual presentation, the metadata of the query element together with the global profile, $M \cup GP$, is evaluated against the metadata of all fragments known to the system. The fragment that best matches M and the profile GP is then selected, and the query element is replaced by the selected fragment. The query element thus allows the most suitable fragment to be selected dynamically at presentation time, taking into account the user's actual interests and system environment.

V. Application of ZYX and Implications to Authoring and Presentation

We have argued that support for reuse and adaptation is essential for a multimedia document model; these are the requirements we aimed to meet with the ZYX model. This section illustrates how these two features, elaborated in detail in Section II-A, are applied in authoring and presenting ZYX multimedia documents. We start with the many different kinds of reuse of ZYX elements and fragments in Section V-A before we turn to the various ways of employing ZYX for adaptation in Section V-B. Considering the impact of new document models like ZYX on multimedia content production, we also point out their implications and benefits for multimedia authoring.

A. Reuse

We first show how ZYX supports identification and selection, as this forms the basis for efficient reuse of media elements, fragments, and documents. Then we show how ZYX elements can be reused at different granularities and contrast structural with identical reuse in ZYX.

A.1 Identification and selection

Support for identification and selection is indispensable for content to be reused efficiently: material can only be reused if it can easily be retrieved within the authoring process. Hence, sophisticated metadata must be associated with media elements, fragments, and documents.
The metadata for the media elements comes with the modeling of the different media types. At the level of fragments, a set of metadata describes the content of the composition. This metadata is anchored in the definition of a ZYX fragment $f = (P, C, M)$ and relates in particular to the content and the targeted user group. The available metadata concerning both the content and the structure of fragments can be employed for browsing fragments in an authoring environment and for identifying and selecting fragments for the composition of ZYX documents.

Fig. 16. Reuse of media elements in ZYX (a seq of two par elements that combine the selections temporal-s [10,15], textual-s [80,25], temporal-s [120,20], and textual-s [483,20] of one video, showing the opening of a patient's chest, and one text, explaining how to open the chest).

A.2 Different granularity of reuse

Equipped with the metadata of media elements and ZYX fragments, we now illustrate how reuse of media elements, fragments, and documents can be applied extensively with ZYX.

Reuse of media elements. Atomic media elements represent the raw media data within ZYX documents. These elements can be reused entirely or only in part. Atomic media elements form the leaves of the document structure, and one media element can be used in different branches of the tree. As the atomic media elements only represent the actual media data, an atomic media element can be used several times in a document while the media data itself exists only once. To select only a part of a media element, the selector elements are used. They select the desired scene, visual area, or sound sequence of a medium. In Figure 16, two different scenes of the same video, showing the opening of a patient's chest before the actual operation on the open heart, and two different parts of the same text, explaining the operative steps, are composed in a ZYX document. The reuse of media elements, especially partial reuse, avoids preparing redundant media data for a single application.
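A minimal Python sketch of partial reuse: one atomic text element stored once and referenced by two textual selectors, as in the text branch of Figure 16 (textual-s [80,25] and [483,20]), plus the composition rule for nested temporal selectors illustrated in Figure 12. The sketch assumes, by analogy with textual-s, that the last two parameters of a temporal selector are a start offset and a duration, and that an outer selection is clipped to the interval selected by the inner selector; `TextMedia`, `TextualSelector`, and `nest_temporal` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TextMedia:
    """Atomic text media element; its data is stored exactly once."""
    data: str


@dataclass
class TextualSelector:
    """textual-s: presents `length` characters starting at position `start`."""
    start: int
    length: int
    source: TextMedia  # the element bound to the selector's single variable

    def render(self) -> str:
        return self.source.data[self.start:self.start + self.length]


Interval = Tuple[float, float]  # (start, duration) in seconds (assumption)


def nest_temporal(inner: Interval, outer: Interval) -> Interval:
    """Compose nested temporal selectors into one interval in media time.

    inner: interval selected by the selector closest to the media element.
    outer: interval of the enclosing selector, relative to the inner start.
    """
    inner_start, inner_dur = inner
    outer_start, outer_dur = outer
    inner_end = inner_start + inner_dur
    start = min(inner_start + outer_start, inner_end)
    end = min(start + outer_dur, inner_end)  # clip to the inner selection
    return (start, end - start)
```

Under this reading, the selectors of Figure 12 present the video from second 20 to second 45, and both textual selectors read from the same `TextMedia` object rather than from copies of the text.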
Reuse of fragments and complex media elements. The composition of presentation elements leads to fragments of arbitrary size and complexity. Fragments can be reused as fragments themselves, but also encapsulated within complex media elements. Both fragments and complex media elements can be bound to any other variable during the composition of a (new) document. Exploiting identification and selection as discussed in Section V-A.1, an authoring environment for ZYX can offer the author fragments or complex media elements that are relevant in the desired context for inclusion in the newly composed document. The only difference between reusing complex media elements and reusing fragments is that, with complex media elements, the structure and complexity of the selected subpart of the document are intentionally hidden from the author. What matters instead is the semantics of the complex media element, e.g., that it comprises a slide show, and how to fill its unbound variables with presentation elements and fragments. As the structure of all ZYX documents is accessible and explicitly visible, authoring support could go so far that a sophisticated content-based search algorithm identifies those nodes (presentation elements) in other documents that could be of interest to an author and extracts the respective subtree (i.e., fragment) for reuse. The reuse of fragments and complex media elements of arbitrary size relieves an author from cutting and pasting formerly composed documents and opens the way to composing multimedia documents much like building with a Lego or K'NEX construction set. Figure 17 illustrates the reuse of fragments and complex media elements. In the example, the fragment already introduced in Figure 16 is reused in a course about operative surgery. Additionally, an existing complex media element about a bypass operation is inserted as a digression of the course into the specific domain of open heart surgery.
The fragment and the complex media element are, e.g., arranged in sequential order, and this sequence is then part of the entire course, as indicated by the dashed line.

Fig. 17. Reuse of fragments and complex media elements in ZYX.

Reusable templates. With ZYX, an author can define templates that cover, e.g., a didactic unit such as a multimedia course, a lecture, a technical guide, a tour through a museum, and the like. Such a template is a regular ZYX fragment with unbound variables, i.e., the author leaves some of the leaves of the tree unbound. These templates give other authors a basic structure to start from when composing a new ZYX document. Consider the sample fragment in Figure 18: it forms a sequence of five presentation elements, two of which are bound to a parallel operator. This fragment is encapsulated into a complex media element denoted aTemplate, into which another author then "plugs in" the missing presentation elements and thereby forms a new document. In Figure 18, two complex media elements, a title and a summary, and two videos with captions are bound to the template aTemplate, e.g., in a semi-automatic authoring process. For this, the author needs only information about the usage of the complex media element, not necessarily about the explicit structure of the template.

Fig. 18. Templates: structural reuse of ZYX fragments.

Reuse of documents.
As an entire ZYX document is nothing but a (logically complete) fragment, documents can be reused in any other ZYX document. Reuse can also simply mean that an author alters an existing ZYX document and thereby adjusts it to her specific needs.

A.3 Identical versus structural reuse

One of ZYX's design ideas, the separation of structure from layout, allows a multimedia document to be reused with different layouts, e.g., with a different look and feel. For example, if the layout designer of our Cardio-OP project changes the concept for the overall presentation of medical content in the project, ideally only the layout of the documents must be changed, without touching the document structure at all. Another application is a change of the technical presentation medium. Consider a presentation with a screen layout: what happens if the same presentation is to be shown at an information point with a touch screen? By exchanging the layout, the same fragments can be used in different presentation contexts. As each presentation element distinguishes between its variables and its projector variables, the structural part can easily be separated from the layout part. An author can hence choose to use only the structure and assign a new layout of her own to the document or fragment. With structural reuse of ZYX documents and fragments, the appearance of a document can be adapted to the presentation context; here the relationship between reuse and adaptation becomes obvious. Figure 19 gives a simple example of reusing the same fragment with two different layouts: the presentation of the fragment changes depending on the layout bound to it. Structural reuse is thus also an application of adapting the layout of ZYX documents to a specific user context.

B. Adaptation

In the following, we describe the different adaptation possibilities that arise from exploiting the modeling primitives of the ZYX model.
The adaptation elements switch and query as well as ZYX templates play the key role in supporting adaptation.

Fig. 19. Reuse of structure with different layouts in ZYX.

B.1 Explicit modeling of presentation alternatives

By modeling presentation alternatives, the author of a ZYX document can explicitly model adaptivity to the user context. For example, in the Cardio-OP context, a switch can distinguish alternatives for undergraduate students, graduate students, and researchers. The switch element allows arbitrary discriminating values to be defined, and an alternative can also be "labeled" by a combination of discriminating values. Adaptation can therefore have as many dimensions as the author desires. However, to be adaptable to many different presentation contexts, the document must model all the different presentation alternatives for the respective contexts under its switch elements. To relieve an author from this time-consuming and virtually never-ending task, we propose mechanisms that (semi-)automatically augment the document with the necessary alternatives, possibly guided by a user. The idea is that the author concentrates on the initial goal, composing a multimedia document with a certain content, and then enriches the document, exploiting the switch primitive, with additional fragments that convey the same information in different presentation contexts. In the following, we only illustrate how this can be achieved; for further details we refer the reader to [18].

Automatic generation of presentation alternatives: augmentation. For a fine-grained adaptation to many different user contexts, a large number of alternatives must be available. However, if an author had to specify all possible alternatives, this would result in a very time-consuming composition effort and divert the author from the initial goal, namely the composition of a sound presentation.
To relieve the authors of this additional burden, we propose to support the automation of the specification of the alternatives. We call this step augmentation of the multimedia document; it takes place after the document has been composed by the author. The augmentation process queries the underlying pool of fragments, exploiting the inherent technical data and the metadata the media elements have been annotated with, to obtain potential presentation alternatives. The alternatives are then inserted into the document, i.e., the document is augmented by the alternatives to provide for adaptivity in different presentation contexts. However, the suggested alternatives cannot simply be inserted into the document: to preserve the semantics of the presentation intended by the author, they have to undergo a verification that assures that the augmented document is still valid with regard to the representation semantics. Figure 20 shows a small document which has been augmented by additional fragments. Before the augmentation, the document contained the video Video 1, indicated in bold face. Then, targeting the document at both a medical professor and a medical student, and at the same time taking into account three different levels of available bandwidth for the presentation, the augmentation results in a switch element offering such different alternatives. From the technical side, taking the bandwidth constraint into account, the augmentation has introduced atomic media elements and fragments for medium and low bandwidth. Note that this does not necessarily mean only a different quality of the same medium. For example, for the professor, the alternative to Video 1 at low bandwidth is a complex media element, a slide show SlideShow. Additionally, the document can also be used in the context of a medical student. Therefore, for each available bandwidth, a media element has been added that is targeted at the knowledge and background of a medical student but covers the same topic.
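The augmentation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the scheme of [18]: the `Fragment` class, its `topic`/`audience` annotations, the `BANDWIDTH` budgets, and the functions `augment`, `fits_time_slot`, and `select` are all hypothetical. In particular, `fits_time_slot` is only a placeholder standing in for the verification formalized in [18]; it merely checks that an alternative does not outlast the element it replaces.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Fragment:
    name: str
    topic: str          # hypothetical metadata annotation
    audience: str       # hypothetical metadata annotation
    bitrate_kbps: int   # inherent technical data
    duration_s: float   # inherent technical data


# Hypothetical bandwidth budgets (kbit/s) for the three system contexts.
BANDWIDTH = {"high": 1500, "medium": 400, "low": 64}


def fits_time_slot(original, candidate):
    """Placeholder verification (our assumption, not the check of [18]):
    an alternative must not outlast the element it stands in for."""
    return candidate.duration_s <= original.duration_s


def augment(original, pool, audiences, verify=fits_time_slot):
    """Build switch alternatives keyed by (audience, bandwidth level)."""
    switch = {}
    for audience, (level, budget) in product(audiences, BANDWIDTH.items()):
        # Query the pool (including the original) via metadata and technical data.
        candidates = [f for f in (original, *pool)
                      if f.topic == original.topic
                      and f.audience == audience
                      and f.bitrate_kbps <= budget
                      and verify(original, f)]
        if candidates:
            # Insert the richest verified alternative that fits the budget.
            switch[(audience, level)] = max(candidates, key=lambda f: f.bitrate_kbps)
    return switch


def select(switch, context):
    """Pick the alternative labelled with the context's discriminating values."""
    return switch.get((context["audience"], context["bandwidth"]))


# Usage with names borrowed from Figure 20; all numbers are invented.
video1 = Fragment("Video 1", "bypass", "prof", 1200, 60)
pool = [
    Fragment("Video 1'", "bypass", "prof", 300, 60),
    Fragment("SlideShow", "bypass", "prof", 40, 60),
    Fragment("Video 2", "bypass", "student", 1000, 90),  # rejected: too long
    Fragment("Text", "bypass", "student", 5, 30),
]
switch = augment(video1, pool, ["prof", "student"])
print(select(switch, {"audience": "prof", "bandwidth": "low"}).name)
```

Because candidates of any medium compete for the same slot, the low-bandwidth professor alternative falls back to the slide show, mirroring the cross-media character of Figure 20; the rejected `Video 2` shows where verification prunes a candidate before insertion.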
The parameters of the switch element in Figure 20 only indicate the discriminating attributes; the actual parameter list is too long for this illustrative example.

[Figure 20: switch element with the discriminating parameters prof/student and high/medium/low bandwidth; alternatives Video 1, Video 2, Image 2, Video 1', SlideShow, and Text for the medical professor and the medical student at high, medium, and low bandwidth.]
Fig. 20. Augmentation of a ZYX fragment

In a first step, we elaborated an augmentation scheme to (semi-)automatically augment documents with respect to different system contexts, which mainly differ in the targeted bandwidth and system power, by providing presentation alternatives on the level of atomic media elements. We have formalized the verification of this kind of automatic augmentation of ZYX documents with presentation alternatives in [18]. A much more complicated effort is to automatically augment ZYX documents with semantically equivalent fragments that cover larger parts of a presentation. For example, can a subsection of a multimedia presentation intended for a medical doctor be automatically augmented such that an equivalent content is conveyed to a student, who presumably has much less background in the field? Here, the multimedia content possibly must be annotated very carefully by experts in the field to give an automatic augmentation model sufficient input to select and insert semantically equivalent presentation alternatives. Additionally, the process of augmentation will rather be semi-automatic, possibly guided by an author who is an expert in the field.

B.2 Declarative modeling of presentation alternatives

There are two kinds of applications of query elements for adaptation: query elements can be used for the dynamic binding of fragments just before presentation, and they can also be used to support the authoring process. The query element bears the metadata that is to be evaluated for the selection of the best matching fragment.
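Because ZYX does not prescribe how the metadata of a query element is matched, the following sketch assumes one possible, application-defined semantics: a fragment must meet all `required` attributes and is ranked by how many `preferred` attributes it matches. `QueryElement`, `rank`, `bind`, and the fragment pool are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class QueryElement:
    """Hypothetical query node: metadata a fragment is expected to meet."""
    required: dict                                  # must match exactly
    preferred: dict = field(default_factory=dict)   # each match raises the score


def rank(query, pool):
    """Return names of matching fragments, best match first.

    One possible, application-defined query semantics (an assumption):
    filter on the required metadata, then order by the number of matching
    preferred attributes. sorted() is stable, so ties keep pool order.
    """
    def meets(meta):
        return all(meta.get(k) == v for k, v in query.required.items())

    def score(meta):
        return sum(meta.get(k) == v for k, v in query.preferred.items())

    hits = [(name, meta) for name, meta in pool.items() if meets(meta)]
    return [name for name, _ in sorted(hits, key=lambda h: score(h[1]), reverse=True)]


def bind(query, pool):
    """Late binding just before presentation: replace the query node by the best match."""
    ranked = rank(query, pool)
    return ranked[0] if ranked else None


# Hypothetical fragment pool with metadata annotations.
pool = {
    "valve_video_student": {"topic": "valve", "medium": "video", "audience": "student"},
    "valve_text_student": {"topic": "valve", "medium": "text", "audience": "student"},
    "valve_video_prof": {"topic": "valve", "medium": "video", "audience": "prof"},
    "bypass_video_student": {"topic": "bypass", "medium": "video", "audience": "student"},
}
query = QueryElement(required={"topic": "valve"},
                     preferred={"audience": "student", "medium": "video"})
print(bind(query, pool))   # dynamic binding
print(rank(query, pool))   # candidate list, e.g. for proposals at authoring time
```

Since `bind` runs against whatever the pool contains at call time, a fragment added to the repository later is automatically considered; calling `rank` instead returns the full candidate list, which is how such a query could also propose fragments to an author.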
The formal definition of the query element specifies a set of metadata to be met by the fragment that is to replace the query node. The query semantics, however, are not specified by the model but left to the application. Query elements can be used to automatically adjust documents to the current context, i.e., the query elements are used to select the element that best matches the query at the latest point in time, just before presentation. One advantage of leaving parts of the document a "black box" until the actual request for presentation is that the most up-to-date pool of fragments is always considered in the query evaluation. A query element specified in a document can also be evaluated at authoring time to test the later result of the presentation. In combination with templates, the query element can be applied for authoring support. Instead of leaving the variables of a template unbound, one could bind them to suitable query elements. The evaluation of the query element at authoring time can then propose fragments to be placed at the respective node. In this way, a kind of content-oriented browsing can be embedded in the documents, allowing, e.g., novice users an easy start with the model.

C. Implications for authoring and presentation

The approach we have taken for the modeling of multimedia content significantly impacts the authoring and presentation of the multimedia material. Traditional authoring systems usually aim at the creation of a pre-orchestrated presentation addressing a dedicated user group. These presentations usually do not allow the logical structure or layout definitions to be exploited for adapting the presentation during playout. Given our approach, the authoring process has to focus much more on the structural composition of multimedia material, separating the logical structure of a multimedia presentation from its layout specifications. The resulting composition is no longer a fixed pre-orchestrated presentation.
It allows for explicit exploitation of the structural composition in order to adapt the presentation to individual user needs. In consequence, the authoring system needs to have access to the individual media elements, fragments, and documents that should be considered for composition. Hence, the authoring tool has to offer browsing, navigation, and selection mechanisms to the authors in order to identify those media elements in the multimedia repository that should become part of the presentation. Obviously, the annotation of media elements, parts of media elements, fragments, and documents gives the necessary support for content-oriented browsing, such that an author can easily identify and select the relevant parts. The authoring tool can either provide for the construction of a ZYX document tree from scratch or allow for the completion of predefined ZYX templates.

The playout of a ZYX document can be realized in different ways. As a first alternative, the ZYX document can be transformed into a presentation format that can be directly interpreted by existing players. This alternative seems to be very interesting for the SMIL format, as the first SMIL players are already available. Obviously, the transformation into another document format may result in the loss of specific features or presentation information if the target model does not provide the same level of semantic expressiveness as the ZYX model. As a second alternative, ZYX documents could be played out by a ZYX-specific presentation engine that is capable of fully exploiting all the features of the ZYX model with respect to adaptation of a presentation. This allows for the integration of new business models into the presentation environment. For example, the end user can be billed for the actual quality of the multimedia material s/he received. In the Cardio-OP project, we developed a specific ZYX presentation engine.
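The first playout alternative, transformation into SMIL, and the loss of information it may entail can be illustrated with a small export of switch alternatives. The sketch assumes a hypothetical mapping in which a bitrate label becomes SMIL 1.0's `system-bitrate` test attribute (in bit/s), while discriminating values such as the audience have no counterpart among SMIL 1.0's predefined test attributes and are therefore dropped. `switch_to_smil` and the file names are invented for illustration; the mapping is not the authors' transformation.

```python
import xml.etree.ElementTree as ET


def switch_to_smil(alternatives):
    """Map switch alternatives to a SMIL 1.0 <switch> element (hypothetical mapping).

    alternatives: list of (tag, src, labels); labels holds the discriminating
    values, e.g. {"audience": "prof", "bitrate": 1200000} with bitrate in bit/s.
    Returns the <switch> element and the label keys that could not be expressed.
    """
    smil_switch = ET.Element("switch")
    lost = set()
    # A SMIL player plays the first acceptable child, so richer variants go first.
    ordered = sorted(alternatives, key=lambda alt: alt[2].get("bitrate", 0), reverse=True)
    for tag, src, labels in ordered:
        child = ET.SubElement(smil_switch, tag, src=src)
        for key, value in labels.items():
            if key == "bitrate":
                child.set("system-bitrate", str(value))  # SMIL 1.0 test attribute
            else:
                lost.add(key)  # no predefined SMIL 1.0 test attribute for it
    return smil_switch, lost


alternatives = [
    ("video", "video1.mpg", {"audience": "prof", "bitrate": 1200000}),
    ("img", "slideshow.gif", {"audience": "prof", "bitrate": 40000}),
    ("video", "video1b.mpg", {"audience": "prof", "bitrate": 300000}),
]
smil_switch, lost = switch_to_smil(alternatives)
print(ET.tostring(smil_switch, encoding="unicode"))
print(lost)
```

The bandwidth dimension survives the export, but the audience dimension silently disappears, so professor and student alternatives could no longer be told apart by the SMIL player; a ZYX-specific presentation engine avoids this loss.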
In summary, the kind of structured authoring that results in adaptive multimedia documents and the presentation features of a ZYX-based presentation tool, both aiming at reuse and adaptation of multimedia material, allow for cost-effective multimedia authoring and customized presentations.

VI. Conclusion and Future Work

Starting out with the requirements of the Cardio-OP project, which calls for the support of reusability, adaptation, and presentation-neutral description of the structure and content of multimedia documents, we sketched our analysis of existing relevant multimedia document models. As these models do not meet the project's requirements, we introduced our new ZYX model that gives the necessary support. We outlined the design considerations of the ZYX model and the basic concepts, followed by a formal framework of the ZYX primitives. Finally, we illustrated the applicability of ZYX for reuse and adaptation and the challenges and implications that these advanced concepts have for authoring and presentation environments for multimedia documents.

The ZYX model has been implemented as a DataBlade module for the object-relational database system Informix Dynamic Server/Universal Data Option under SUN Solaris [19], following the architectural framework initially presented in [20], [21]. The formal description served as the basis for the definition of an XML DTD for the ZYX model. This will enable access to content stored in the Cardio-OP repository by future XML-capable browsers, and we can also think about storing ZYX documents in an SGML/XML-capable database system in the future, following the approach taken in [22]. Furthermore, we have developed a generic presentation engine for ZYX documents which includes support for continuous MPEG video streams based on an MPEG-specific extension of the L/MRP buffer management technique [23]. For content-based management and querying of the underlying media data, we have been developing a Media Integration DataBlade module [24] for the IDS/UD, which forms an integration layer offering uniform, homogeneous access to the different types of media data. Supporting multimedia authoring, this DataBlade allows for interactive content-based browsing of the multimedia material. With the MediaWorkBench, we have been developing a tool in Java on top of the Media Integration DataBlade module for GUI-supported annotation and browsing of the media data. With regard to the global profile describing the user context, we have been developing a mathematical model for combining different profiles, which describe different aspects of a user such as user group, user system environment, and the like, into one semantically correct, conflict-free global profile that can be exploited for the presentation of adaptive ZYX documents. For adaptation support, we have developed a cross-media adaptation scheme [18] that can be integrated with the ZYX model and provides for the automatic augmentation of ZYX documents by semantically correct presentation alternatives, a process which relieves the authors of the time-consuming task of comprehensively composing documents for different user and system contexts. Given this ongoing work, one further goal is to develop generic composition schemes and, exploiting the metadata provided with the fragments and the global profile describing the user context, to support (semi-)automatic composition of documents that are adapted and personalized to the specific user context.

Acknowledgments. We would like to thank Utz Westermann for his contributions to the design and implementation of the ZYX model. We would like to thank Jochen Wandel for his contributions to the formal framework to support automatic augmentation of multimedia document models. We would also like to thank Christian Heinlein for his valuable comments on the paper.

References

[1] W. Klas, C. Greiner, and R. Friedl, "Cardio-OP – Gallery of Cardiac Surgery," in Proc. of IEEE International Conference on Multimedia Computing and Systems (ICMCS'99), Florence, Italy, June 1999, IEEE Computer Society.
[2] D. Raggett, A. Le Hors, and I. Jacobs, HTML 4.0 Specification – W3C Recommendation, revised on 24-April-1998, W3C, URL: http://www.w3.org/TR/1998/REC-html40-19980424, April 1998.
[3] ISO/IEC JTC1/SC29, Information Technology – Coding of Multimedia and Hypermedia Information – Part 1: MHEG Object Representation, ISO/IEC IS 13522-1, ISO/IEC, 1997.
[4] ISO/IEC JTC1/SC29/WG12, Information Technology – Coding of Multimedia and Hypermedia Information – Part 6: Support for Enhanced Interactive Applications, ISO/IEC IS 13522-6, ISO/IEC, 1996.
[5] ISO/IEC JTC1/SC29/WG12, Information Technology – Coding of Multimedia and Hypermedia Information – Part 5: Support for Base-Level Interactive Applications, ISO/IEC IS 13522-5, ISO/IEC, 1995.
[6] ISO/IEC, Information Technology – Hypermedia/Time-based Structuring Language (HyTime), ISO/IEC IS 10744, 1992.
[7] S. R. Newcomb, N. A. Kipp, and V. T. Newcomb, "HyTime – The Hypermedia/Time-Based Document Structuring Language," Communications of the ACM, vol. 34, no. 11, November 1991.
[8] P. Hoschka, S. Bugaj, D. Bulterman, et al., Synchronized Multimedia Integration Language – W3C Working Draft 2-February-98, W3C, URL: http://www.w3.org/TR/1998/WD-smil-0202, February 1998.
[9] S. Boll, W. Klas, and U. Westermann, "Multimedia Document Formats – Sealed Fate or Setting Out for New Shores?," in Proc. of IEEE International Conference on Multimedia Computing and Systems (ICMCS'99), Florence, Italy, June 1999, IEEE Computer Society.
[10] S. Boll, W. Klas, and U. Westermann, "A Comparison of Multimedia Document Models Concerning Advanced Requirements," Technical Report, Ulmer Informatik-Berichte Nr. 99-01, University of Ulm, Germany, February 1999, http://www.informatik.uni-ulm.de/dbis/CardioOP/publications/TR99-01.ps.gz.
[11] S. Boll, W. Klas, and U. Westermann, "Multimedia Document Formats – Sealed Fate or Setting Out for New Shores?," Multimedia Tools and Applications, ICMCS special issue, to appear in 2000.
[12] T. D. C. Little and A. Ghafoor, "Interval-Based Conceptual Models for Time-Dependent Multimedia Data," IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 4, August 1993.
[13] T. Wahl and K. Rothermel, "Representing Time in Multimedia Systems," in Proc. IEEE International Conference on Multimedia Computing and Systems, Boston, MA, May 1994, pp. 538–543.
[14] A. Duda and C. Keramane, "Structured Temporal Composition of Multimedia Data," in Proc. IEEE International Workshop on Multimedia-Database-Management Systems, Blue Mountain Lake, August 1995.
[15] N. Hirzalla, B. Falchuk, and A. Karmouch, "A Temporal Model for Interactive Multimedia Scenarios," IEEE Multimedia, vol. 2, no. 3, pp. 24–31, Fall 1995.
[16] D. Papadias, Y. Theodoridis, T. Sellis, and M. J. Egenhofer, "Topological Relations in the World of Minimum Bounding Rectangles: A Study with R-Trees," in Proc. of the ACM SIGMOD Conference on Management of Data, San Jose, May 1995.
[17] M. J. Egenhofer and R. Franzosa, "Point-Set Topological Spatial Relations," Int. Journal of Geographic Information Systems, vol. 5, no. 2, March 1991.
[18] S. Boll, W. Klas, and J. Wandel, "A Cross-Media Adaptation Strategy for Multimedia Presentations," in Proc. of ACM Multimedia'99, Orlando, Florida, USA, November 2–5, 1999.
[19] S. Boll, W. Klas, and U. Westermann, "Exploiting OR-DBMS Technology to Implement the ZYX Data Model for Multimedia Documents and Presentations," in Proc. of Datenbanksysteme in Büro, Technik und Wissenschaft (BTW99), GI-Fachtagung, Freiburg, Germany, March 1999, Springer.
[20] W. Klas and K. Aberer, "Multimedia and its Impact on Database System Architectures," in Multimedia Databases in Perspective, P. M. G. Apers, H. M. Blanken, and M. A. W. Houtsma, Eds., Springer, London, 1997.
[21] S. Boll, W. Klas, and M. Lohr, "Integrated Database Services for Multimedia Presentations," in Multimedia Information Storage and Management, S. M. Chung, Ed., Kluwer Academic Publishers, Dordrecht, 1996.
[22] K. Böhm, K. Aberer, and W. Klas, "Building a Hybrid Database Application for Structured Documents," Multimedia Tools and Applications, vol. 8, no. 1, 1999.
[23] F. Moser, A. Kraiß, and W. Klas, "L/MRP: A Buffer Management Strategy for Interactive Continuous Data Flows in a Multimedia DBMS," in Proceedings VLDB 1995, USA, 1995, Morgan Kaufmann.
[24] U. Westermann and W. Klas, "Architecture of a DataBlade Module for the Integrated Management of Multimedia Assets," in Proceedings of the First International Workshop on Multimedia Intelligent Storage and Retrieval Management (MISRM), Orlando, Florida, October 1999.