Multimedia Montage - Counterpoint Synthesis of Movies

Ryotaro Suzuki, Yuichi Iwadate
ATR Media Integration & Communications Research Laboratories

Abstract

In this paper, we propose "Multimedia Montage", the structural synthesis in time and space of multimedia components such as movies and sounds, as a new image expression method for communications. In this study, we introduce counterpoint theory from music to compose movie structures, and we use scripts to describe the structures. The essence of counterpoint lies in the parallelism of autonomous elements, which fits today's multimedia technologies very well. We confirm the effectiveness of our method by making example movies based on counterpoint theory, developing a prototype system, and conducting movie synthesis experiments with the prototype. Our next goal is to achieve automatic synchronization among multimedia components based on their internal rhythms, which we observed in our experiments.

1. Introduction

In our laboratory, various possibilities are being researched for image expression in communications utilizing multimedia technology. Our study of Multimedia Montage is one such research effort, started in 1997 to develop a new image expression method using multimedia.

Image communications differ greatly from language communications. A typical style of language communication is a kind of storytelling in which a speaker tries to build one exact meaning in a sequential manner. This method, of course, is effective in image communications, too. However, image communications also have a potential of a different kind, one based on the parallel structures that are part of their nature. In our study, we focus on this characteristic and attempt to utilize multimedia technology for image expression of this type.

In this paper, we first describe the notion of Multimedia Montage and introduce counterpoint theory from music as a useful method for image expression using multimedia. Then, we describe movie synthesis based on the counterpoint method and a prototype system for Multimedia Montage. Finally, we review the results of movie synthesis with the prototype system and discuss some of the research in progress.

2. Multimedia Montage and Counterpoint

2.1. Multimedia Montage

Image expression methods based on parallel structures can typically be found in composition methods in art, such as collage, assemblage, and montage. Today's multimedia technology goes a long way towards achieving this kind of image expression not only in the space domain but also in the time domain [8]-[14]. Consequently, focusing on the relationship between present multimedia technology and this image composition concept, we name the structural synthesis of multimedia components in time and space "Multimedia Montage".

2.2. Eisenstein's montage theory

The notion of composition in movies appears most typically in the montage theory, which was studied mainly in the 1920s by Russian movie directors such as S. M. Eisenstein [1]. In these early studies, the montage theory was based on compositions of movie shots. After 1930, Eisenstein extended the theory to involve multimedia, including colors and sounds. In particular, in movies such as "Aleksandr Nevskii", Eisenstein emphasized the importance of counterpoint composition in the synthesis of movies and sounds: sounds must not obey movies; the two must be equal and independent of each other.
2.3. Counterpoint concept

Generally speaking, the word "counterpoint" has two meanings. One is an abstract concept of an expression method; the other is the meaning used in music. The former emphasizes the contrast between two elements to be combined in a presentation. The latter is specified in classic European music theory and has several different aspects. What is common and most important in the counterpoint concept is that every element of what is to be expressed must be autonomous and independent of the other elements. An entity is composed of multiple elements, which temporally coexist in parallel and are not tied to any fixed dependency. Therefore, the main characteristic of counterpoint is its parallel structure, especially along the time axis.

3. Counterpoint Movie

There are several suggestive examples of counterpoint applied to other media, such as Glenn Gould's radio documentary "The Idea of North", which is called a "counterpoint documentary" [2][3]. Another example is Samuel Beckett's movie "Quad" [6], in which the actors walk as if dancing in a ceremony based on a canon form. However, it is difficult to find applications of counterpoint to movie structures.

Counterpoint movies are the production target of the Multimedia Montage prototype system based on counterpoint theory. In this study, we generally call a movie based on counterpoint theory a "counterpoint movie". Before producing such a movie with the prototype system, we analyzed the characteristics of counterpoint theory in music and constructed some examples of counterpoint movies with a commercial non-linear movie editing tool (Adobe Premiere), based on the results of our analysis.

3.1. Counterpoint in music

Counterpoint theory in classical music has grown since the European Middle Ages; J. S. Bach accomplished the integration of counterpoint and harmony. The common characteristics of the theory can be summarized as follows, especially from the viewpoint of multimedia handling methods [2][5]:
- Composition of multiple parts, each with autonomy and independence.
- Temporal relationships among the notes to be matched.
- Harmony among the notes to be matched.
- Dux and comes (subject and answer).
- Transformation for a following voice.

As is typical in canons, various temporal or pitch transformations and their combinations are applied to a leading voice melody to form the corresponding following voice melody. Examples of the transformations used in canons are parallel (follow), augmentation, diminution, retrograde, and inversion (of pitch) (Fig. 1).

Fig. 1 Transformations in canons

3.2. Counterpoint movie examples

We have made the following counterpoint movie examples to examine the possibility of applying counterpoint theory to movies.

1) Dance Canonica
"Dance Canonica" [7] (Fig. 2) is a basic version of a counterpoint movie, one that uses the transformation variations of canons. The same dance shot is overlapped with the original by changing its starting time, speed, etc. The movie consists of five scenes corresponding to canon forms: parallel, augmentation, retrograde, diminution, and retrograde & inversion (of color). A sketch of how such canon forms might map to clip playback parameters is given at the end of this subsection.

2) Kazoku Game Game
"Kazoku Game Game" (Fig. 3) is a practical version of a counterpoint movie. It is an aggregation of diverse variations of movie compositions, like a fugue, based on a parody of the famous Japanese movie "Kazoku Game". "Kazoku" means family, and "Kazoku Game" is a sort of home comedy. The actual structure of a typical version of the movie is presented in the next chapter. The target of "Kazoku Game Game" is not only a movie but also a sort of communication game based on movie making using multimedia technology.
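The following is a minimal sketch, not part of the original system, of how the canon transformations used in "Dance Canonica" might be expressed as playback parameters of a movie clip. The Clip class and its fields are hypothetical illustrations rather than the prototype's actual data model.

from dataclasses import dataclass, replace

@dataclass
class Clip:
    """A hypothetical movie clip with canon-style playback parameters."""
    source: str           # source file of the shot
    start_offset: float   # delay (seconds) relative to the leading voice
    speed: float          # 1.0 = original tempo, <1.0 slower, >1.0 faster
    reverse: bool = False
    invert: bool = False  # e.g., color inversion as the pitch-inversion analogue

def parallel(lead: Clip, delay: float) -> Clip:
    """Follow the leading clip unchanged, after a fixed delay."""
    return replace(lead, start_offset=lead.start_offset + delay)

def augmentation(lead: Clip, factor: float = 2.0) -> Clip:
    """Play the following clip more slowly (temporal augmentation)."""
    return replace(lead, speed=lead.speed / factor)

def diminution(lead: Clip, factor: float = 2.0) -> Clip:
    """Play the following clip faster (temporal diminution)."""
    return replace(lead, speed=lead.speed * factor)

def retrograde(lead: Clip) -> Clip:
    """Play the following clip backwards in time."""
    return replace(lead, reverse=not lead.reverse)

def inversion(lead: Clip) -> Clip:
    """Invert a pictorial property (here, color) as an analogue of pitch inversion."""
    return replace(lead, invert=not lead.invert)

# One leading voice and several canon-style following voices,
# roughly mirroring the five scenes of "Dance Canonica".
lead = Clip(source="dance_shot.mov", start_offset=0.0, speed=1.0)
voices = [
    lead,
    parallel(lead, delay=2.0),
    augmentation(lead),
    retrograde(lead),
    inversion(retrograde(lead)),
]

In the prototype described below, such parameters correspond to the start_time, speed, and reverse attributes of Meta Script nodes.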
3.3. Characteristics of counterpoint movies

From the above analysis and experiments, we have reached the conclusion that most concepts of counterpoint theory in music can be adapted to counterpoint movies, especially in the aspect of time.

Fig. 2 Scenes from "Dance Canonica"

Fig. 3 Scenes from "Kazoku Game Game"

The characteristics of counterpoint movies, corresponding to those in music, are summarized as follows.
1) Synthesis of multiple independent movie elements.
2) Temporal synchronization among the movie elements that are synthesized.
3) Adjustment in the presentation of synchronizing movie elements.
4) Communication among movie elements based on subject reference and repetition.
5) Temporal transformations such as translation, scaling, and reversion in the reference.

These are the most fundamental characteristics of counterpoint movies. Not all five conditions are necessarily covered all of the time; some may be achieved by indirect suggestion or simply omitted.

4. Multimedia Montage Prototype System

We have developed a prototype system that produces counterpoint movies as a case study in Multimedia Montage.

4.1. Configuration

The present prototype consists of the following parts (Fig. 4).

1) Movie Modeller
The Movie Modeller consists of two main modules, a Movie Graph GUI and a Script Converter.

1.1) Movie Graph GUI
The Movie Graph GUI is a GUI for editing Movie Graphs (Fig. 7), each of which is an internal form of a Meta Script (Fig. 5). The Movie Graph GUI can read from and write to the Meta Script, so users are able to make and arrange Meta Scripts using the Movie Graph GUI.

1.2) Script Converter
The Script Converter converts the Meta Script into a Raw Script, which is in a machine-dependent form. In the conversion, the Script Converter retrieves multimedia components that match the Meta Script descriptions.

2) Movie Renderer
The Movie Renderer synthesizes the multimedia components in the Multimedia Database into one movie according to the Raw Script descriptions. The resultant movie is saved into the Multimedia Database as a standard movie file, so that it can be played by a standard movie player, or it is displayed directly.

3) Multimedia Database
The Multimedia Database stores multimedia components and their attribute information, which corresponds to the attributes defined in the Meta Script.

Fig. 4 Multimedia Montage Prototype System

4.2. Meta Script

A Meta Script is a hierarchical aggregation of node definitions and describes the structure of a synthesized movie. Fig. 5 illustrates an example of a Meta Script corresponding to one typical version of "Kazoku Game Game" (Figs. 6-7 correspond to Fig. 5). We originally designed the Meta Script format in our Multimedia Montage study. The format is similar to VRML [8] or the Open Inventor file format [9], and its details are arranged according to our own requirements. In VRML, the node hierarchy represents a 3D space structure; in the Meta Script, in contrast, it represents a temporal structure.
The combination of sequential and parallel structures is very similar to SMIL [12]. The main function of the Meta Script, however, is the description of the synthesis (superimposition) of multiple movies, whereas SMIL describes the 2D spatial layout of multimedia components on a web page while considering temporal synchronization.

The general description of a node is as follows:

Node Type Name ( Node Name ) {
    Descriptions of node attributes.
    Descriptions of other nodes.
}

ex.)
# Study Shot
Shot(Study) {
    type(SOUNDMOVIE)
    duration(40.0)
    mix(CHROMAKEY)
    Symbol(student)
}

Every node has its own name except for the Define Node. Any line starting with "#" is ignored as a comment line (except for the #define macro). Once a node is defined in a Meta Script (whether inside or outside of the Define Node), it can thereafter be referenced by simply writing its name ("Symbol(student)" in the example). When a defined node is referenced, all of its attribute values are inherited, and any of them can be changed by adding the block "{...}".

The types of nodes used in the Meta Script (and in the Movie Graphs) are as follows.

1) Movie
This is the root of any whole movie structure.

2) Define
The Define Node defines information of scene, shot, or symbol nodes to be referenced elsewhere. Nodes under the Define Node are not directly traversed. The Define Node can have any number of offset attributes. An offset attribute defines a temporal offset value corresponding to its label name, which can be referred to as a value of a start_time attribute in any descendant node of the Define Node's parent.

ex.)
Define {
    offset( Development, 45.0 )
    offset( Coda, 75.0 )
}

3) Scene
The Scene Node makes a hierarchical structure by becoming the parent of other scene nodes. It has such attributes as timing, start_time, duration, speed, reverse, repeat, and mix. The last attribute, mix, represents the image mixture (superimposition) mode against other parallel nodes (OVERLAY or CHROMAKEY). This attribute corresponds to an attribute of the same name in the Multimedia Database; if it is defined both in the database and in the script, the definition in the Meta Script has priority. The other attributes correspond to the temporal transformations mentioned above. The timing attribute has a value of SEQUENTIAL or PARALLEL, which determines the temporal relationship of the child nodes. Through this specification, sequential structures and parallel structures can be freely combined into one movie.

4) Shot
The Shot Node is the bottom-end child node of a scene hierarchy and corresponds to an actual component in the Multimedia Database. It can have Symbol Nodes as its children; the set of symbols is then matched to a component by Multimedia Database retrieval (a sketch of such matching is given after this list). Like the Scene Node, it has such attributes as timing, start_time, duration, speed, reverse, repeat, and mix. It also has a type attribute and a file_name attribute. The type attribute represents the content type of the component to be matched, such as SOUNDMOVIE, SILENTMOVIE, STILL, or SOUND. When a known component in the Multimedia Database is to be used, the file name of the component can be specified with the file_name attribute.

5) Symbol
The Symbol Node represents abstract symbol information that will appear in the parent Shot Node. It has a casting attribute, which represents a person (or a thing) that acts as the symbol. It can also be used to represent a property of the parent; in such a case, the casting attribute represents the value of the property. The combination of the Symbol Node name and the casting attribute is utilized for the retrieval of multimedia components that have corresponding attribute data in the Multimedia Database.
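To make the matching mentioned under the Shot and Symbol Nodes concrete, the following is a minimal sketch, not taken from the prototype, of how a Shot's type and its Symbol/casting pairs could be matched against component attribute records in the Multimedia Database. The record layout, file names, and function name are assumptions for illustration; the casting names merely echo the "Kazoku Game Game" example.

# Hypothetical attribute records of components in the Multimedia Database.
# Each record carries a content type and symbol -> casting attribute data.
DATABASE = [
    {"file": "dining_foreground.mov", "type": "SOUNDMOVIE",
     "symbols": {"tutor": "Yonei", "father": "Suzuki", "mother": "Uemura"}},
    {"file": "study.mov", "type": "SOUNDMOVIE",
     "symbols": {"student": None}},
]

def retrieve_component(shot_type, required_symbols):
    """Return the first component whose type matches and whose attribute data
    covers every Symbol name (and casting value, when one is specified)."""
    for record in DATABASE:
        if record["type"] != shot_type:
            continue
        ok = True
        for name, casting in required_symbols.items():
            if name not in record["symbols"]:
                ok = False
                break
            if casting is not None and record["symbols"][name] != casting:
                ok = False
                break
        if ok:
            return record["file"]
    return None  # no matching component found

# A Shot of type SOUNDMOVIE with Symbol(tutor){casting(Yonei)} and Symbol(father){casting(Suzuki)}.
print(retrieve_component("SOUNDMOVIE", {"tutor": "Yonei", "father": "Suzuki"}))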
Fig. 5 Meta Script of "Kazoku Game Game"

Fig. 6 Time Table of "Kazoku Game Game"

4.3. Movie Graph

A Movie Graph is an internal form of a Meta Script, represented as a tree graph (Fig. 7). The node types and the structure of a Movie Graph correspond directly to those of the Meta Script. The idea of Movie Graphs is taken from the "Scene Graph" of Open Inventor [9]. In the Movie Graph example, each node drawn to the lower right is the first child of the node to its upper left. A vertical node sequence represents the parallel relationships of child nodes, while a horizontal node sequence represents their sequential relationships. With this presentation format, the difference between parallel and sequential can be viewed clearly, even though the graph itself is a simple tree. The format was originally designed in our Multimedia Montage study for visualization in the Movie Graph GUI. It does not constrain the actual Movie Graph description form in Multimedia Montage programs, even though this presentation itself is sometimes also called a Movie Graph. A sketch of how such a parallel/sequential hierarchy maps onto a timeline is given below.

Fig. 7 Movie Graph of "Kazoku Game Game"
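To make the temporal semantics of the timing attribute and the Movie Graph concrete, here is a minimal sketch under simplifying assumptions of our own (a reduced node model, seconds as the time unit, no speed, reverse, or repeat handling) of how a SEQUENTIAL/PARALLEL hierarchy could be flattened into absolute start times. It is an illustration, not the prototype's Script Converter or Movie Renderer, and the node names only loosely follow the "Kazoku Game Game" time table.

from dataclasses import dataclass, field

@dataclass
class Node:
    """Simplified stand-in for the Scene/Shot nodes of a Movie Graph."""
    name: str
    duration: float = 0.0        # leaf (Shot) duration in seconds
    timing: str = "SEQUENTIAL"   # temporal relationship of the children
    children: list = field(default_factory=list)

def schedule(node, start=0.0, table=None):
    """Flatten the hierarchy into (name, start, end) rows; return (end, rows)."""
    if table is None:
        table = []
    if not node.children:                  # a leaf shot
        end = start + node.duration
        table.append((node.name, start, end))
        return end, table
    if node.timing == "PARALLEL":          # children start together
        end = start
        for child in node.children:
            child_end, _ = schedule(child, start, table)
            end = max(end, child_end)
        return end, table
    t = start                              # SEQUENTIAL: children are chained
    for child in node.children:
        t, _ = schedule(child, t, table)
    return t, table

# A tiny structure: a title shot followed by a dining scene whose
# foreground and background layers run in parallel.
movie = Node("Game", timing="SEQUENTIAL", children=[
    Node("Title", duration=5.0),
    Node("Dining", timing="PARALLEL", children=[
        Node("Foreground", duration=40.0),
        Node("Background", duration=40.0),
    ]),
])

total, rows = schedule(movie)
for name, s, e in rows:
    print(f"{name:12s} {s:6.1f} -> {e:6.1f}")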
5. Experiment on the Prototype System

To verify the effectiveness of the prototype system as an image expression tool using multimedia, we reproduced our counterpoint movie samples "Dance Canonica" and "Kazoku Game Game". In addition, we produced a new movie named "Invention", which is based on the counterpoint structure of J. S. Bach's "3 Part Invention No. 4", using the prototype. The results are summarized as follows.

1) Structure
We recreated both movies with exactly the same structures as the former samples, including temporal synchronization parameters such as speed, start time offset, etc. It is much easier for the prototype system to produce movies with such temporal structures, because present non-linear movie editing tools have no functions for defining structures and reusing them. Our prototype is especially effective in this respect.

The creation of "Invention" was an important experiment for confirming the effectiveness of the Meta Script in describing the structure of an actual contrapuntal musical piece. As a result of this experiment, we found that a synchronization definition mechanism is necessary for arbitrary pairs of separated nodes in a whole Movie Graph. The offset attribute of the Define Node was added after the experiment for this reason.

2) Abstract expression
As part of the structure description function, our prototype can express the abstract content of a multimedia component using Symbol Nodes. Present non-linear movie editing tools have no such database-related function. In our experiment, we verified that the sample movies could be recreated using this abstract expression. The function of a Symbol Node is very general and can be utilized in many ways: in "Dance Canonica" we used it for color information, in "Kazoku Game Game" for player casting information, and in "Invention" for emotion, motion, and camera angle information. It is therefore very effective for image expression, which is often ambiguous.

3) Texture expression
The expression of detailed textures is very important in image expression. Present non-linear movie editing tools have various kinds of filters and superimposing modes that support image texture expression. Our prototype, in contrast, is a very simple tool with far fewer functions. As a result, the texture expression produced by the overlay and chroma-key functions of our prototype in this experiment is very similar to, but not strictly the same as, that of our former output. (A sketch of these two mixing modes is given at the end of this section.)

4) Process time and manipulation
The total process time depends on the rendering time, which in turn depends on the renderer's implementation environment. The method taken in our prototype system has no special merits or demerits on this point. What is important is the throughput of the prototype's batch processing compared with that of present non-linear movie editing tools using GUI manipulation. Creating and editing a structure, and substituting the structure's elements, are much faster with our prototype. On the other hand, especially in the early stage of movie design, the non-linear tool GUIs have merits for partial verification. Therefore, for actual use, a combination of the two will be the most effective.
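As an illustration of the two mixing modes named by the mix attribute (OVERLAY and CHROMAKEY), the following is a minimal per-pixel sketch under assumptions of our own (RGB tuples in 0-255, a fixed 50% blend for overlay, a plain green key with a simple distance threshold). It is not the renderer's actual implementation.

def overlay(fg, bg, alpha=0.5):
    """Blend a foreground pixel over a background pixel (simple 50% mix)."""
    return tuple(int(alpha * f + (1.0 - alpha) * b) for f, b in zip(fg, bg))

def chromakey(fg, bg, key=(0, 255, 0), threshold=100):
    """Keep the background pixel wherever the foreground is close to the key color."""
    distance = sum(abs(f - k) for f, k in zip(fg, key))
    return bg if distance < threshold else fg

# Mixing one pixel in each mode.
foreground = (10, 250, 10)   # near the green key
background = (200, 120, 80)
print(overlay(foreground, background))    # blended pixel
print(chromakey(foreground, background))  # background shows through the key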
6. Conclusion and Work in Progress

We have confirmed that counterpoint theory in music can be applied to movie composition utilizing multimedia technology, thereby creating a new way of image expression. We have also confirmed that the script language we have created is effective for describing the structures of counterpoint movies.

The most important remaining issue in applying counterpoint theory to multimedia is to find a proper substitute for the harmony rule of counterpoint theory in music. We have found some clues for solving this issue in our experiments on producing counterpoint movies. In these experiments, we observed that some pairs of recorded movie or sound components fit together much better than expected, given certain time lags and speed rates. We want to explain this phenomenon with what C. G. Jung calls "synchronicity" [15], which exists even diachronically among past records and can be replayed, just as Glenn Gould experimented with in "The Idea of North" [2][3]. We think it is caused by the fact that each image element has its own rhythm, both in the outer world and in each person's mind, and that these rhythms can be synchronized to generate a whole image [16]. We call this hypothetical concept the "Image Wave".

Consequently, we have started new research based on the Image Wave concept. In this research, we are analyzing rhythmic information and its synchronization conditions in movies. For Multimedia Montage, we plan to create a Counterpoint Synchronizer, which will automatically synchronize multimedia components based on this research. We hope this mechanism will work as a harmony rule for counterpoint movies.

References

[1] Eisenstein, S. M.: The Works of Sergei Eisenstein, Part 2, Kinema Junposha Inc. (1980-1993)
[2] McGreevy, J., Ed.: Glenn Gould Variations, Doubleday (1983)
[3] Guertin, G., Ed.: Glenn Gould Pluriel, Verdun, Quebec: Louise Courteau, Editrice Inc. (1988)
[4] Nattiez, J. J.: Musicologie generale et semiologie, Christian Bourgois Editeur (1987)
[5] Greenberg, B. S.: Johann Sebastian Bach FAQ, USENET newsgroup alt.music.j-s-bach (1996, 1997)
[6] Beckett, S.: Quad, Curtis Brown Ltd. (1984)
[7] Suzuki, R.: Dance Canonica, The 6th ACM International Multimedia Conference - Art Demos, Technical Demos, Poster Papers -, p. 37 (1998)
[8] ISO: ISO/IEC 14772-1, The Virtual Reality Modeling Language, ISO (1998)
[9] Wernecke, J.: The Inventor Mentor, Addison-Wesley Publishing Co. (1994)
[10] Apple Computer, Inc.: QuickTime 3 Reference, Apple Computer, Inc. (1998)
[11] ISO: ISO/IEC 14478, Presentation Environments for Multimedia Objects (PREMO), ISO (1998)
[12] W3C: Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, W3C (1998)
[13] Ackermann, P.: Developing Object-Oriented Multimedia Software - Based on the MET++ Application Framework, dpunkt Verlag (1996)
[14] Gibbs, S.: Composite Multimedia and Active Objects, OOPSLA '91, pp. 97-112 (1991)
[15] Peat, F. D.: Synchronicity, Bantam Books Inc. (1987)
[16] Pribram, K. H.: Languages of the Brain, Prentice-Hall Inc. (1971)