Multimedia Montage - Counterpoint synthesis of movies

Ryotaro Suzuki    Yuichi Iwadate
[email protected]    [email protected]
ATR Media Integration & Communications Research Laboratories
Abstract
In this paper, we propose “Multimedia Montage”, the
structural synthesis in time and space of multimedia
components, such as movies and sounds, as a new image
expression method for communications. In this study, we
introduce the counterpoint theory of music to compose movie
structures, and we use scripts to describe the structures. The
nature of counterpoint lies in the parallelism of autonomous
elements, which fits today’s multimedia technologies very
well. We confirm the effectiveness of our method by making
example movies based on the counterpoint theory, developing
a prototype system, and conducting movie synthesis
experiments with the prototype. Our next goal is to achieve
automatic synchronization among multimedia components
based on their internal rhythms, which we observed in our
experiments.
1. Introduction
In our laboratory, various possibilities for image
expression in communications utilizing multimedia technology
are being researched. Our study of Multimedia Montage is
one such research project, started in 1997 to develop a new
image expression method using multimedia.
Image communications are quite different from
language communications. A typical style of language
communication is a kind of storytelling in which a speaker
tries to convey one exact meaning in a sequential manner. This
method, of course, is effective in image communications, too.
However, image communications also hold a different kind of
potential, one based on their inherently parallel structure.
In our study, we focus on this characteristic and attempt to
utilize multimedia technology for image expression of this
type.
In this paper, we first describe the notion of Multimedia
Montage and introduce the counterpoint theory in music as a
useful method for image expression using multimedia. Then,
we describe movie synthesis based on the counterpoint
method and a prototype system of Multimedia Montage.
Finally, we review results of the proposed movie synthesis by
the prototype system and discuss some of the research in
progress.
2. Multimedia Montage and Counterpoint
2.1. Multimedia Montage
Image expression methods based on parallel structures
can typically be found in compositional methods in art, such as
collage, assemblage, and montage. Today’s multimedia
technology goes a long way towards achieving this kind of
image expression not only in the space domain but also in the
time domain [8]-[14]. Consequently, focusing on the
relationship between present multimedia technology and
this image composition concept, we call the structural
synthesis of multimedia components in time and space
“Multimedia Montage”.
2.2. Eisenstein’s montage theory
The notion of composition in movies appears most
typically in the montage theory, which was studied mainly in
the 1920s by Russian movie directors such as S. M.
Eisenstein [1].
In these early studies, the montage theory was based on
the composition of movie shots. After 1930, Eisenstein extended the
theory to involve multimedia, including colors and sounds.
In particular, in movies such as “Aleksandr Nevskii”,
Eisenstein emphasized the importance of counterpoint
composition in the synthesis of movies and sounds:
sounds must not obey movies; the two must be equal and
independent of each other.
2.3. Counterpoint concept
Generally speaking, the word “counterpoint” has two
meanings. One is “an abstract concept of an expression
method”; the other is its specific meaning in music.
The former emphasizes the contrast between two elements to
be combined in some presentation. The latter is specified in
classical European music theory and has several different
aspects.
What is common and most important in the counterpoint
concept is that every element of what is to be expressed must
be autonomous and independent of other elements. Some
entity is composed of multiple elements, which temporally
coexist in parallel and are not tied to any fixed dependency.
Therefore, the main characteristic of counterpoint is its
parallel structures, especially in the time axis.
3. Counterpoint Movie
There are several suggestive examples of counterpoint
applied to other media, such as Glenn Gould’s radio
documentary “The Idea of North”, which is called a
“counterpoint documentary” [2][3]. Another example is
Samuel Beckett’s movie “Quad” [6], where the actors walk as
if dancing in a ceremony based on a canon form. However, it
is difficult to find applications of counterpoint to movie structures.
Counterpoint movies are the production target of the
Multimedia Montage prototype system based on the
counterpoint theory. In this study, we generally call a movie
based on the counterpoint theory a “counterpoint movie”.
Before producing such a movie with the prototype system,
we analyze the characteristics of the counterpoint theory in music
and construct some examples of counterpoint movies using a
commercial non-linear movie editing tool (Adobe Premiere),
based on the results of our analysis.
3.1. Counterpoint in music
The counterpoint theory in classical music has developed
since the European Middle Ages, and J. S. Bach accomplished
the integration of counterpoint and harmony. The common
characteristics of the theory can be summarized as follows,
especially from the viewpoint of multimedia handling methods [2][5].
1) Composition of multiple parts, each with autonomy and independence.
2) Temporal relationships among the notes to be matched.
3) Harmony among the notes to be matched.
4) Dux and comes (subject and answer).
5) Transformation for a following voice.
As is typical in canons, various temporal or pitch
transformations, and their combinations, are applied to a
leading voice melody to form the corresponding following
voice melody. Examples of the transformations used in canons
are parallel (follow), augmentation, diminution,
retrograde, and inversion (of pitch) (Fig. 1). For example, in
augmentation the following voice states the subject in
proportionally longer note values, whereas in retrograde it
states the subject backwards.
Fig. 1 Transformations in canons
3.2. Counterpoint movie examples
We have made the following counterpoint movie examples
to examine the possibility of applying the counterpoint theory
to movies.
1) Dance Canonica
“Dance Canonica” [7] (Fig. 2) is a basic version of a
counterpoint movie, one that uses the transformation variations
of canons. The same dance shot is made to overlap the original
one by changing its starting time, speed, etc. The
movie consists of five scenes corresponding to canon forms,
i.e., parallel, augmentation, retrograde, diminution, and
retrograde & inversion (of color).
2) Kazoku Game Game
“Kazoku Game Game” (Fig. 3) is a practical version of a
counterpoint movie. It is an aggregation of diverse variations
of movie compositions, like a fugue, based on a parody of the
famous Japanese movie “Kazoku Game”. “Kazoku” means
family, and “Kazoku Game” is a sort of home comedy. The
actual structure of a typical version of the movie is
presented in the next chapter. The target of “Kazoku Game
Game” is not only a movie but also a sort of communication
game based on movie making with multimedia technology.
3.3. Characteristics of counterpoint movies
From the above analysis and experiments, we
have concluded that most concepts of the
counterpoint theory in music can be adapted to counterpoint
movies, especially in the aspect of time.
Fig. 2 Scenes from “Dance Canonica”
Fig. 3 Scenes from “Kazoku Game Game”
The characteristics of counterpoint movies are
summarized as follows corresponding to those in music.
1) Synthesis of multiple independent movie elements.
2) Temporal synchronization among the movie elements that are synthesized.
3) Adjustment in the presentation of synchronizing movie elements.
4) Communications among movie elements based on subject reference and repetition.
5) Temporal transformations such as translation, scaling, and reversion in the reference.
These are the most fundamental characteristics of
counterpoint movies. Not all five conditions are
necessarily satisfied at all times; in some cases a condition is
achieved by indirect suggestion or is simply omitted.
4. Multimedia Montage Prototype System
We have developed a prototype system to produce
counterpoint movies, as a case study in Multimedia Montage.
4.1. Configuration
The present prototype consists of the following parts (Fig. 4).
1) Movie Modeller
The Movie Modeller consists of two main modules, that
is, a Movie Graph GUI and a Script Converter.
1.1) Movie Graph GUI
The Movie Graph GUI is a GUI for editing Movie
Graphs (Fig. 7), each of which is an internal form of the Meta
Script (Fig. 5). The Movie Graph GUI can read from and write
to the Meta Script, so users are able to make and arrange Meta
Scripts using the Movie Graph GUI.
1.2) Script Converter
The Script Converter converts the Meta Script into a
Raw Script, which is in a machine-dependent form. In the
conversion, the Script Converter retrieves multimedia
components that match the Meta Script descriptions.
2) Movie Renderer
The Movie Renderer synthesizes multimedia
components in the Multimedia Database into one movie
according to the Raw Script descriptions. The resultant movie
is saved into the Multimedia Database as a standard movie file,
so as to be played by a standard movie player, or directly
displayed.
3) Multimedia Database
The Multimedia Database stores multimedia components
and their attribute information, which corresponds to the
attributes defined in the Meta Script.
Fig. 4 Multimedia Montage Prototype System
4.2. Meta Script
A Meta Script is a hierarchical aggregation of node
definitions and describes the structure of a synthesized movie.
Fig. 5 illustrates an example of a Meta Script that
corresponds to one typical version of “Kazoku Game Game”
(Figs. 6-7 correspond to Fig. 5).
We originally designed the Meta Script format in our
Multimedia Montage study. The format is similar to VRML
[8] or the Open Inventor file format [9] and its details are
arranged corresponding to our own requirements. In VRML,
the node hierarchy represents a 3D space structure. In the
Meta Script, in contrast, it represents a temporal structure.
The combination of sequential and parallel structures is very
similar to SMIL [12]. The main function of the Meta Script,
however, is to describe the synthesis
(superimposition) of multiple movies, whereas SMIL describes
the 2D spatial layout of multimedia components on a web page
together with their temporal synchronization.
The general description of a node is as follows.
Node Type Name ( Node Name ) {
Descriptions of node attributes.
Descriptions of other nodes.
}
ex.)
# Study Shot
Shot(Study) {
    type(SOUNDMOVIE)
    duration(40.0)
    mix(CHROMAKEY)
    Symbol(student)
}
Every node has its own name except for the Define Node.
Any line starting with “#” is ignored as a comment line
(except the #define macro). Once a node is defined in a Meta
Script (whether inside or outside of a Define Node), it can
thereafter be referenced by simply writing its name
(“Symbol(student)” in the example above). When a defined node is
referenced, all of its attribute values are inherited, and any one
of them can be changed by adding the block “{...}”.
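For instance, once Shot(Study) has been defined as above, a later part of the same script can reference it and override individual attributes. The following is a minimal illustrative sketch; the overriding duration value is hypothetical.
ex.)
# Reference the defined shot and override one of its inherited attributes.
Shot(Study) {
    duration(20.0)
}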
The types of nodes to be used in the Meta Script (and in
the Movie Graphs) are as follows.
1) Movie
This is the root of any whole movie structure.
2) Define
The Define Node defines the information of scene, shot, or
symbol nodes that are to be referenced elsewhere.
Nodes under the Define Node are not directly traversed.
The Define Node can have any number of offset
attributes. An offset attribute defines a temporal
offset value corresponding to its label name, which can be
referred to as the value of a start_time attribute in any
descendant node of the Define Node’s parent.
ex.)
Define {
    offset( Development, 45.0 )
    offset( Coda, 75.0 )
}
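Any descendant of the Define Node’s parent can then refer to these labels as start_time values. The fragment below is an illustrative sketch (the node names are hypothetical, and both shots are assumed to be descendants of the Define Node’s parent); it shows how even nodes that are far apart in the Movie Graph can be synchronized to the same offset.
ex.)
Scene(Voices) {
    Shot(Answer_Movie) {
        start_time(Development)   # starts at the Development offset (45.0)
    }
}
Shot(Development_BGM) {
    start_time(Development)       # a separated node synchronized to the same offset
}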
3) Scene
The Scene Node makes a hierarchical structure by
becoming the parent of other scene nodes. It has such
attributes as timing, start_time, duration, speed, reverse,
repeat, and mix. The last attribute, mix, represents the image
mixing (superimposition) mode against other parallel nodes
(OVERLAY or CHROMAKEY). This attribute corresponds
to an attribute of the same name in the Multimedia Database;
if it is defined both in the database and in the script, the
definition in the Meta Script has priority.
The other attributes correspond to the temporal
transformations mentioned above. The timing attribute has a value of
SEQUENTIAL or PARALLEL, which determines the
temporal relationship of the child nodes. With this
specification, sequential and parallel structures can be
freely combined into one movie.
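As an illustrative fragment (the node names are modeled on the dining scene of “Kazoku Game Game”, and the attribute values are hypothetical), a PARALLEL scene superimposes a chroma-keyed foreground shot on a background scene; giving the parent scene timing(SEQUENTIAL) instead would play the same children one after another.
ex.)
Scene(Dining) {
    timing(PARALLEL)          # child nodes are presented simultaneously
    Scene(Background) {
        # background shots of the dining room would be listed here
    }
    Shot(Foreground) {
        type(SOUNDMOVIE)
        mix(CHROMAKEY)        # superimposed on the parallel Background scene
    }
}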
4) Shot
The Shot Node is the bottom-level child of a scene
hierarchy and corresponds to an actual component in the
Multimedia Database. It can have Symbol Nodes as its
children; the set of symbols can then be matched to a component
by Multimedia Database retrieval.
It has such attributes as timing, start_time, duration,
speed, reverse, repeat, and mix, like the Scene Node. It also
has a type attribute and a file_name attribute. The type
attribute represents the content type of a component to be
matched such as SOUNDMOVIE, SILENTMOVIE, STILL,
or SOUND. When a known component in the Multimedia
Database is used, the file name of the component can be
specified using the file_name attribute.
5) Symbol
The Symbol Node represents abstract symbol
information that will appear in the parent Shot Node. It has a
casting attribute which represents a person (or a thing) that
acts as the symbol. It can also be used to represent a property
of the parent. In such a case, the casting attribute represents
the value of the property. The combination of the Symbol
Node name and the casting attribute is utilized for the
retrieval of multimedia components, which have
corresponding attribute data in the Multimedia Database.
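As an illustrative sketch (the shot and symbol names and the attribute values are hypothetical), a shot can be described only by the symbols it must contain and left to Multimedia Database retrieval, or pinned to a known component with the file_name attribute.
ex.)
# Retrieved from the Multimedia Database by matching the symbol and casting attributes.
Shot(Dinner_Table) {
    type(SILENTMOVIE)
    duration(12.0)
    Symbol(father) {
        casting(Suzuki)
    }
    Symbol(mother) {
        casting(Uemura)
    }
}
# A known component specified directly by its file name.
Shot(Title) {
    type(STILL)
    file_name(Title.tif)
    duration(5.0)
}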
Fig. 5 Meta Script of “Kazoku Game Game”
4.3. Movie Graph
Fig. 6 Time Table of “Kazoku Game Game”
Fig. 7 Movie Graph of “Kazoku Game Game”
A Movie Graph is the internal form of a Meta Script and
is represented as a tree graph (Fig. 7). The node types
and structure of a Movie Graph correspond directly
to those of the Meta Script. The idea of Movie Graphs is
taken from the “Scene Graph” of Open Inventor [9].
In the Movie Graph example, each lower-right node is
the first child of its upper-left node. A vertical node
sequence represents the parallel relationship of child nodes,
while a horizontal node sequence represents their sequential
relationship. With this presentation format, the
difference between parallel and sequential can be clearly
seen, although the graph itself is a simple tree.
This format was originally designed in our Multimedia
Montage study for visualization in the Movie Graph GUI. It
does not constrain the actual Movie Graph representation used
inside Multimedia Montage programs, even though this
presentation itself is sometimes also called a Movie Graph.
5. Experiment on the Prototype System
To verify the effectiveness of the prototype system as an
image expression tool using multimedia, we reproduced our
counterpoint movie samples “Dance Canonica” and “Kazoku
Game Game”. In addition, we produced a new movie named
“Invention”, which is based on the counterpoint structure of J.
S. Bach’s “3 Part Invention No. 4” using the prototype. The
results are summarized as follows.
1) Structure
We recreated both movies with exactly the same
structures as the former samples, including the temporal
synchronization parameters such as speed and start time offset.
It is much easier to produce movies with such defined
temporal structures using the prototype system, because present
non-linear movie editing tools have no functions for defining
structures and reusing them. Our prototype is especially
effective in this respect.
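Although the production scripts are not reproduced here, one canon scene of “Dance Canonica” can be sketched in the Meta Script roughly as follows. The attribute values are hypothetical, and we assume that numeric start_time values are allowed and that speed(0.5) means half playback speed: the same dance shot is referenced twice, with the following voice delayed and slowed down to form the augmentation scene.
ex.)
Define {
    Shot(Dance) {
        type(SILENTMOVIE)
        duration(20.0)
    }
}
Scene(Augmentation) {
    timing(PARALLEL)
    Shot(Dance)                   # leading voice: the original dance shot
    Shot(Dance) {
        start_time(5.0)           # following voice enters later
        speed(0.5)                # assumed to mean half speed (augmentation)
        mix(OVERLAY)              # overlaid on the leading voice
    }
}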
The creation of “Invention” was a very important
experiment for confirming the effectiveness of the Meta Script in
describing the structure of an actual contrapuntal musical
piece. As a result of this experiment, we found that a
synchronization definition mechanism is necessary for any
pair of separated nodes in a whole Movie Graph. The offset
attribute of the Define Node was added for this reason after
the experiment.
2) Abstract expression
As a part of its structure description function, our
prototype can express the abstract content of a
multimedia component using Symbol Nodes. Present non-linear
movie editing tools do not have such a database-related function.
In our experiment, we verified that the sample
movies could be recreated utilizing this abstract expression.
The function of a Symbol Node is very general and can
be utilized in many ways. In “Dance Canonica”, we used it for
color information; in “Kazoku Game Game”, for player
casting information; and in “Invention”, for emotion, motion,
and camera angle information. It is therefore very effective for
image expression, whose content is often hard to state precisely.
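As a purely illustrative sketch (the node name and casting values are hypothetical and are not taken from the actual “Invention” script), such abstract content is described by using the casting attribute of a Symbol Node as a property value.
ex.)
Shot(Answer_Phrase) {
    type(SILENTMOVIE)
    Symbol(emotion) {
        casting(calm)             # abstract emotional content of the shot
    }
    Symbol(camera_angle) {
        casting(close_up)         # requested camera angle
    }
}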
3) Texture expression
The expression of detailed textures is very important in
image expression. Present non-linear movie editing tools have
various kinds of filters and superimposing modes that support
image texture expression. Our prototype, in contrast, is a very
simple tool with far fewer functions. As a result, the texture
expression produced by the overlay and chroma-key functions
of our prototype in this experiment is very similar to, but not
strictly the same as, that of our former output.
4) Process time and manipulation
The total processing time depends on the rendering time,
which in turn depends on the renderer’s implementation
environment. The approach taken in our prototype system has
no special merits or demerits in this respect.
What is important is the throughput of the prototype’s
batch processing compared with the GUI manipulation of
present non-linear movie editing tools. Creating and
editing a structure, and substituting its elements,
are much faster with our prototype. In contrast, the GUIs
of non-linear tools are advantageous for partial verification,
especially in the early stages of movie design. Therefore, for actual
use, a combination of the two will be the most effective.
6. Conclusion and Work in Progress
We have confirmed that the counterpoint theory in music can
be applied to movie composition utilizing multimedia technology,
thereby creating a new way of image expression. We have also
confirmed that the script language that we have created is
effective for describing the structures of counterpoint movies.
The most important issue for applying the counterpoint
theory to multimedia is to find a proper substitution rule for
the harmony rule in the counterpoint theory in music. We
have found some clues for solving this issue in our
experiments on producing counterpoint movies.
In our experiments, we observed that some pairs of
recorded movie or sound components fit together much better than
expected, given certain time lags and speed rates. We want to
explain this phenomenon with what C. G. Jung calls
“synchronicity” [15], which exists even diachronically among
past records and can be replayed, just as Glenn Gould
experimented with in “The Idea of North” [2][3]. We think
that it is caused by the fact that each image element has its
own rhythm, both in the outer world and in each person’s mind,
and that these rhythms can be synchronized to generate a whole
image [16]. We call this hypothetical concept the “Image Wave”.
Consequently, we have started new research based on the
Image Wave concept. In this research, we are analyzing rhythmic
information and its synchronization conditions in movies. For
Multimedia Montage, we plan to create a Counterpoint
Synchronizer which will automatically synchronize multimedia
components based on this research. We hope this mechanism
will work as a harmony rule in counterpoint movies.
References
[1] Eisenstein, S.M.: The Works of Sergei Eisenstein Part 2, Kinema-Junposha Inc. (1980-1993)
[2] McGreevy, J., Ed.: Glenn Gould Variations, Doubleday (1983)
[3] Guertin, G., Ed.: Glenn Gould Pluriel, Verdun, Quebec: Louise Courteau Editrice Inc. (1988)
[4] Nattiez, J.J.: Musicologie générale et sémiologie, Christian Bourgois Editeur (1987)
[5] Greenberg, B.S.: Johann Sebastian Bach FAQ, USENET newsgroup alt.music.j-s-bach (1996, 1997)
[6] Beckett, S.: Quad, Curtis Brown Ltd. (1984)
[7] Suzuki, R.: Dance Canonica, The 6th ACM International Multimedia Conference - Art Demos, Technical Demos, Poster Papers -, p. 37 (1998)
[8] ISO: ISO/IEC 14772-1, The Virtual Reality Modeling Language,
ISO (1998)
[9] Wernecke, J.: The Inventor Mentor, Addison-Wesley Publishing
Co. (1994)
[10] Apple Computer, Inc.: QuickTime3 Reference, Apple Computer,
Inc. (1998)
[11] ISO: ISO/IEC 14478, Presentation Environments for Multimedia
Objects (PREMO), ISO (1998)
[12] W3C: Synchronized Multimedia Integration Language (SMIL)
1.0 Specification, W3C (1998)
[13] Ackermann, P.: Developing Object-Oriented Multimedia Software - Based on the MET++ Application Framework, dpunkt Verlag (1996)
[14] Gibbs, S.: Composite Multimedia and Active Objects, OOPSLA'91, pp. 97-112 (1991)
[15] Peat, F.D.: Synchronicity, Bantam Books Inc. (1987)
[16] Pribram, K.H.: Languages of the Brain, Prentice-Hall Inc. (1971)