A Methodology Centered on Modularization of QoS Constraints for the
Development and Performance Evaluation of Multimedia Systems
Giancarlo Fortino† and Libero Nigro
Laboratorio di Ingegneria del Software
DEIS - Università della Calabria, I-87036, Rende (CS), Italy
Email: {g.fortino, l.nigro}@unical.it
† Corresponding author: phone +39.0984.494684, fax +39.0984.494713.
Abstract
This paper proposes a methodology for the development of
multimedia systems. It is based on a time-sensitive, reflective
actor framework that centers on lightweight actors, non-overkilling concurrency and customizable constraint-directed
scheduling. A multimedia system can be visualized as a
collection of media actors, i.e., autonomous, concurrent and
computational processing entities, involved in multimedia
sessions. QoS requirements associated with a multimedia session
are incorporated in reflective actors called QoSsynchronizers
that manage and enforce application-dependent QoS
parameters. Timing QoS parameters are first specified by using
Time Stream Petri Nets and then translated into a
QoSsynchronizer. In order to support QoS constraints analysis
and validation, media actors and QoSsynchronizers are
prototyped under simulation by exploiting the flexibility of the
adopted framework to operate transparently under virtual or
real time. Thus a seamless transition from the modeling to the
implementation stages can be obtained. The paper describes
the use of the methodology for the development of live and on-demand multimedia applications.
1. Introduction
Nowadays, advances in computer and network technology
have fueled the development of distributed multimedia systems
such as interactive video conferencing, media on-demand, etc.
Such applications are characterized by spatial and, notably,
timing Quality of Service (QoS) requirements that need to be
fulfilled in order to meet the user’s expectations. Timing QoS
encompasses multimedia synchronization parameters (jitter and
skew) and interactivity (end-to-end delay) [9].
A major challenge is to build multimedia systems on
asynchronous and heterogeneous environments like the Internet [3].
In such an environment, multimedia sessions experience
violations of QoS parameters due to unbounded delays,
interference between sessions, packet losses and congestion.
Recovery mechanisms and strategies can be developed to
smooth those effects so as to provide an acceptable level of
service to the user [7]. Further problems contributing to
unpredictable behavior arise from the use of common, general-purpose operating systems whose task scheduling can hardly be
controlled to ensure real-time constraints.
In this work, a time-sensitive, reflective actor framework [8]
is adopted to model, analyze and implement multimedia
systems [4][5][6]. Novel in this approach is a customizable,
non-preemptable scheduling/dispatching control mechanism
that can be tuned to multimedia applications.
Media actors are introduced to perform computational tasks
such as media capturing, coding, networking, and presenting
functions. They communicate via asynchronous message
passing and are not aware of timing aspects. Timing QoS
constraints are handled by reflective actors, or QoSsynchronizers [11]. They enforce QoS requirements (e.g.,
skew and jitter) by filtering message exchanges inside groups
of actors, directly affecting the scheduling. Actors can be
prototyped and operated either under simulation or in real-time
by specializing the time notion in the scheduler and the runtime control machine.
In this paper, the methodology is applied to the modeling,
simulation and timing QoS evaluation of multimedia
applications. Most emphasis is devoted to a multimedia
synchronization system, which ensures lip synchronization
between two correlated A/V streams. In order to formalize
timing QoS specification, the Time Stream Petri Nets (TSPN)
model [9] is used.
The network traffic can be generated according to (i)
previously dumped and archived Real-time Transport Protocol (RTP)-based multimedia sessions that are composed of two
synchronized audio/video streams; (ii) traces produced by
means of a network simulator (e.g., ns-2) [2].
The remainder of this paper is organized as follows. §2 summarizes
the concepts of the actor-based framework. §3 describes the
modeling of distributed multimedia systems. §4 introduces
multimedia synchronization issues and the TSPN model. §5
provides a QoSsynchronizer embedding a TSPN specification.
§6 reports experimental results about the fulfillment of the QoS
requirements, obtained from analysis through simulation.
Conclusions and directions of on-going work are finally
presented.
2. A time-sensitive, reflective actor framework
The actor framework [8][4] is based on a variant of the
Actor model [1] that centers on lightweight actors and a
modular approach to synchronization and timing constraints.
Actors are reactive entities modeled as finite state machines.
The arrival of an event (i.e., a message) causes a state
transition and the execution of an atomic action. At the action
completion the actor is ready to process a next message and so
forth. Actors do not have internal threads for message
processing. At most one action can be in progress in an actor at
a given time.
Actors can be grouped into clusters (i.e., subsystems). A
subsystem is allocated to a distinct physical processor. It is
regulated by a control machine (Figure 1) that hosts a time
notion and is responsible for message scheduling and
dispatching. The control machine can be customized through
programming. For instance, in [8] a specialization of the control
machine for hard real-time systems is proposed, where
scheduling is based on messages time-stamped by a time
validity window [tmin, tmax] expressing the interval of
admissible delivery times. Message selection and dispatching is
based on an Earliest Deadline First strategy. Within a
subsystem, actor concurrency depends on message processing
interleaving. True parallelism is possible among actors
belonging to distinct subsystems.
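The following minimal sketch (class names are assumptions, not the framework's actual API) illustrates the idea: pending messages carry a validity window [tmin, tmax], the queue is ordered by tmax, and a message is dispatched only when its window contains the current time; the triggered action runs to completion.

import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative message carrying a time validity window [tmin, tmax].
final class TimedMessage {
    final long tmin, tmax;     // admissible delivery interval
    final Runnable action;     // actor reaction triggered by this message
    TimedMessage(long tmin, long tmax, Runnable action) {
        this.tmin = tmin; this.tmax = tmax; this.action = action;
    }
}

// Earliest Deadline First selection: smallest tmax first.
final class EdfScheduler {
    private final PriorityQueue<TimedMessage> pending =
        new PriorityQueue<>(Comparator.comparingLong((TimedMessage m) -> m.tmax));

    void schedule(TimedMessage m) { pending.add(m); }

    // Dispatch the most urgent message whose window [tmin, tmax] contains now;
    // the action is atomic and non-preemptable.
    void dispatchOne(long now) {
        TimedMessage m = pending.peek();
        if (m != null && m.tmin <= now && now <= m.tmax) {
            pending.poll();
            m.action.run();
        }
    }
}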
Figure 1. Actor run-time architecture.
The actor framework can be supported by a few Java base classes in a simple yet efficient way. In order to avoid dependencies on unpredictable Java features, no threads are used, messages are pre-allocated and reused, and recourse to dynamic memory management is avoided. Actors are implemented as reactive objects provided with message handlers. A message handler implements the state transition diagram of the corresponding actor. The lightweight nature of actors simplifies the realization of mobile actors (see Figure 1), which, through the Actor Builder block based on Java Object Serialization and the Dynamic Class Loader, can easily be serialized and then reconstructed in the context of the receiving subsystem. Mobile actors are required to satisfy security policies constraining their activities.
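To make the flavor of such reactive objects concrete, the following is a minimal sketch (assumed names, not the framework's published classes): no internal thread, a single handler invoked by the control machine, and state changes expressed as become-like transitions.

abstract class Message { /* pre-allocated and reused by the control machine */ }

abstract class Actor {
    protected int state;   // current state of the finite state machine

    // Invoked by the control machine when a message is dispatched to this actor;
    // the body is an atomic action and must run to completion without blocking.
    public abstract void handle(Message msg);

    protected void become(int newState) { state = newState; }
}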
3. Modeling multimedia systems
Actors are the basic building blocks used to structure multimedia systems such as videoconference applications, video-on-demand systems [5], virtual acquisition systems [6], etc. These systems
can be viewed as a collection of autonomous, concurrent
information processing actors called media-actors involved in
multimedia sessions. A multimedia session consists of a set of
media-actors that interact with each other to provide the
required service (Figure 2). Typically, QoS requirements are
associated with a multimedia session. For this purpose, a special
actor called QoSsynchronizer [11] that encapsulates the timing
QoS requirements for a session is introduced. A
QoSsynchronizer can control the execution of media actors by
observing transitions, in terms of message exchanges, on the
media actors in the session. It is different from an actor in that
it does not interfere with the behavior of the underlying
computational actors that it observes.
Figure 2. Structure of an actor-based multimedia system.
In particular, the QoSsynchronizer cannot send messages to
or receive messages from the underlying media actors. It
impacts execution by constraining when the media actors in the
session react and respond to events and thereby restricts when a
particular computational behavior can occur. In addition, the
internal state of a QoSsynchronizer may be updated as a result
of observing events on media actors. Using the concept of
QoSsynchronizers permits the modularization of QoS
requirements and encourages separation of orthogonal concerns:
functionality and execution constraints.
A distinguishing feature of the actor framework is the modular handling of timing constraints. Application actors are developed according to functional issues only; they are not aware of when they are activated by a message. Timing requirements are the responsibility of RTsynchronizers, i.e., special actors which capture just-sent messages (including messages received from the network) and apply timing constraints to them, thereby affecting scheduling. The control machines of a distributed system can be interconnected by a network and a real-time protocol so as to fulfil system-wide timing constraints.
In a system with multiple sessions, the QoS constraints of each session need to be satisfied. To accomplish this, an actor called the QoS broker [11] can be introduced (Figure 2). It acts as a coordinator for all the ongoing sessions and performs admission control for new incoming ones. Since the overall system QoS cannot be violated, every QoSsynchronizer must interact with the QoS broker.
A multimedia system, from a programming-in-the-large viewpoint, can be visualized as a collection of two kinds of macro-components: transmitters and receivers. For instance, a multi-user videoconference system is composed of a set of transmitter/receiver pairs, whereas a video-on-demand system consists of one or more transmitters (server complex) and a set of receivers (or clients). The transmitter is typically devoted to acquiring the multimedia data, e.g., from a capture device or from media archives, and to sending it over the network. On the other hand, the receiver continuously waits for multimedia data to be displayed for the final presentation.
In the proposed modeling, each subsystem, which can be
composed of transmitter and/or receiver components in
particular configurations, hosts a multimedia control machine
equipped with suitable QoSsynchronizers.
Transmitter and receiver components that are remotely located are connected by bindings, i.e., logical communication channels. Bindings can be point-to-point (i.e., unicast) or point-to-multipoint (i.e., multicast). A binding is created by a bind operation originated from media-actors called Binders. A binder governs the on-going flow of data (e.g., continuous media or control messages) sent into the binding. It hides particular transmission mechanisms (e.g., network and transport protocols). A binder can also monitor the binding QoS so as to make available information such as throughput, jitter, latency and packet loss statistics.
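A possible shape for such a binder, sketched here only for illustration (the interface and its names are assumptions, not the framework's API), is the following.

interface Binder {
    // Create the underlying binding (unicast or multicast logical channel).
    void bind(String address, int port, boolean multicast);

    // Push a data unit (e.g., an RTP packet) into the binding.
    void send(byte[] packet);

    // QoS figures observed on the binding.
    double throughput();   // bits per second
    double jitter();       // milliseconds
    double latency();      // milliseconds
    double lossRate();     // fraction of lost packets
}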
A Streamer is a periodic actor that accesses digital media information through passive media objects (e.g., MediaFile, providing access to multimedia archives, and MediaDevice, supplying support for capturing and presenting multimedia data), encodes it and sends it to Binders or Presenters.
Presenters are media-actors specialized to render media objects. Presenters can be synchronous or asynchronous. The former atomically consume the multimedia data and do not allow checking the status of the presentation (e.g., a vic [3] agent can belong to this category). The latter can poll the current presentation status while the received multimedia objects are being displayed.
The notion of a multimedia presentation is encapsulated in a
media-actor called Manager. It orchestrates the media objects
(time-dependent and time-independent) by interacting with the
media-actors.
Figure 3 (a). Architecture of a “live” unidirectional one-to-many videoconferencing system.
Figure 3 (b). Architecture of an on-demand system.
Figure 3(a) portrays a multimedia system concerned with
unidirectional remote videoconferencing (e.g., a teleteaching
session) over the Internet MBone. Figure 3(b) shows a video
on-demand system, which allows a user to connect to a server,
requests and displays a movie. In both cases, the multimedia
session consists of two synchronized A/V streams. In figure
3(a), the Transmitter and Receiver(s) are connected by two
multicast data streaming bindings. In figure 3(b) the transmitter
(or Sender) and the receiver (or Client) are linked by a unicast
data streaming binding and a unicast control streaming binding.
Data bindings are based on the RTP/RTCP (Real Time Control
Protocol) protocol [12], whereas the control binding uses RTSP
(Real Time Streaming Protocol) [3]. The transmitter subsystem is responsible for the streaming process (capturing or reading) and
the enforcement of timing constraints upon the media streams
to fulfil the requirements of the multimedia presentation. On
the remote site, the receiver subsystem resynchronizes, renders,
and controls the requested multimedia session. The multimedia
session can be described by the Session Description Protocol
(SDP) [3]. The presentation description file contains
information about the media streams within the presentation,
such as the set of codings, network addresses, inter-stream
synchronization relations and information about the content.
Media-actors called Controllers are introduced to handle events
generated by the user through a graphical interface (e.g.,
remote control panel of an on-demand session).
Figure 4 portrays a schema of the proposed multimedia
synchronization system (see §5) at the receiver. The schema
highlights the actors and their relationships, i.e., messages and
constraints. It can be common to both the systems depicted in
figures 3(a) and 3(b).
Figure 4. Actor-based multimedia synchronization system at the receiver.
Rate synchronizers (e.g., RateSynch and PollSynch) are
introduced for timing the acquisition, transmission, reception,
and presentation operations of the media actors.
4. Specifying timing Quality of Service (QoS)
requirements
Multimedia temporal synchronization includes two types, i.e., intra-medium and inter-media. Intra-medium synchronization influences the rate of presentation. If the arrival rate is irregular due to network delay, the jitter phenomenon occurs. Inter-media synchronization deals with
maintaining the requirements of the temporal relationship
between two or more media, such as lip-sync. Due to the
cumulative effect of jitters on a per media stream basis, skew
occurs. Subjective studies showed that video and audio streams
do not have to be exactly tied, but that a skew of 80-120 ms is
below the limit of human perception. The end-to-end delay
(EED) is defined as the time between the grabbing of a data
unit (e.g., video frame, audio sample) on the sender and its
presentation on the receiver. In order to deliver a certain degree
of interactivity, the EED must not be greater than a given
threshold value. The acceptable value depends on the kind of
the multimedia session, e.g., 500ms for live conferencing, up to
5s for video on-demand.
From an application standpoint, a main issue is to provide
synchronized playout of correlated audio and video streams
under a maximum EED. This is typically achieved by (i)
smoothing the network jitter and (ii) applying intra-medium
and inter-media synchronization constraints. Network jitter
smoothing is accomplished [10] by buffering received audio
and video data for enough time so that “most” of the data will
have been available before their scheduled playout times. This
additional artificial delay until playout can either be fixed
during a multimedia session, or it may adaptively change. Data which is not received before its scheduled playout time is either considered lost (i.e., if it arrives later, it is discarded) or is replaced by previously arrived data.
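As an illustration of the fixed-delay variant only (a sketch under assumed names, not the paper's mechanism), each data unit generated at sender time t is scheduled for playout at t plus an artificial delay D, and units arriving after that instant are treated as lost.

final class FixedPlayoutPolicy {
    private final long playoutDelayMs;   // the artificial buffering delay D

    FixedPlayoutPolicy(long playoutDelayMs) { this.playoutDelayMs = playoutDelayMs; }

    // Playout instant of a unit generated at the given (media) time.
    long playoutTime(long generationTimeMs) { return generationTimeMs + playoutDelayMs; }

    // A unit is kept only if it arrives before its scheduled playout time;
    // otherwise it is considered lost (or replaced by previously arrived data).
    boolean onTime(long arrivalTimeMs, long generationTimeMs) {
        return arrivalTimeMs <= playoutTime(generationTimeMs);
    }
}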
Intra-medium synchronization policies have two goals: (i) waiting for late media units within a predefined interval; (ii) stopping the presentation of a media unit if it exceeds a duration equal to the nominal time plus a tolerance, and/or keeping the presentation of a unit going until its nominal time minus a tolerance. Policy (i) is called restricted blocking. Policy (ii) is a mechanism which constrains the presentation of a media unit within a temporal window.
Inter-media synchronization is required to maintain the temporal relationships among streams. Inter-media synchronization policies include (i) parallel first, (ii) restricted parallel first, (iii) parallel last, and (iv) master. Policy (i) means that all streams must keep pace with the first stream that terminates in order to preserve synchronization. Policy (ii) is a parallel first policy with some delay tolerance: when the first stream terminates, i.e., it reaches the synchronization point, a delay is waited instead of immediately terminating the other streams. Policy (iii) makes the last media stream the reference for the others. Policy (iv) establishes a master stream, which is the reference media stream for the others. For example, audio/video fine-grain synchronization (lip sync) typically considers the audio to be the master stream; in fact, humans normally prefer jerky video to noisy audio.
Figure 5 shows a desired multimedia presentation specified by using the TSPN model [9]. The throughput is 10 frames per second; thus, the nominal presentation time of each frame is 100 ms. The acceptable jitter on audio and video is 10 ms. The skew must be less than or equal to 100 ms. The inter-media synchronization policy is of the “audio AND-master” type.
[The TSPN consists of five video places (Video 1 … Video 5) and five audio places (Audio 1 … Audio 5); every arc carries the temporal interval [90, 100, 110], and the two streams are joined by an AUDIO AND-MASTER inter-stream transition.]
Figure 5. A TSPN presentation model.
The semantics of the model are as follows:
- when a token arrives in a place, the associated presentation starts;
- the presentation duration belongs to the validity interval of an outgoing arc;
- the firing instant is scheduled with respect to the temporal interval [minimal, nominal, maximal] and the chosen inter-stream transition rule.
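Purely as an illustration of how such a specification can be carried into code (the class and method names are assumptions, not part of the paper's framework), the [minimal, nominal, maximal] interval attached to each arc can be modeled as a small value object.

final class TemporalInterval {
    final long min, nominal, max;   // e.g., [90, 100, 110] ms in Figure 5

    TemporalInterval(long min, long nominal, long max) {
        this.min = min; this.nominal = nominal; this.max = max;
    }

    // True if a measured presentation duration respects the validity interval.
    boolean contains(long durationMs) { return durationMs >= min && durationMs <= max; }

    // Bound a duration to the validity interval.
    long clamp(long durationMs) { return Math.max(min, Math.min(max, durationMs)); }
}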
4.1. Multimedia synchronization in RTP sessions
RTP (Real-Time Transport Protocol) [12] is an application-level protocol used by the majority of the multimedia tools (e.g., vic, vat, rat, etc.) over the Internet MBone. In the IP protocol stack, RTP lies above UDP. It consists of two protocols: RTP for real-time transmission of data packets with no “guarantees” and RTCP (Real Time Control Protocol) for monitoring QoS and for conveying participants’ identities in a session. An RTP data packet is composed of a header followed by payload data, which can be either a video frame (or a part of it) or several audio samples. The main fields in an RTP header are:
- Timestamp (T): reflects the sampling instant of the first octet in the data packet. It is media-specific and is used to provide receiver-based synchronization.
- Sequence number (SN): incremented by one for each data packet sent. It can be used to detect losses, duplicated packets and out-of-order packets.
- Payload type (PT): identifies the format of the data payload, e.g., H.261 or JPEG for video streams, PCMU or GSM for audio streams.
- Marker (M): signals significant events for the payload, e.g., the end of a frame for video or the beginning of a talkspurt for audio.
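These fields occupy fixed positions in the standard 12-byte RTP fixed header; a minimal Java parser, given here only as an illustration and not as part of the paper's framework, could be written as follows.

final class RtpHeader {
    final boolean marker;        // M
    final int payloadType;       // PT
    final int sequenceNumber;    // SN
    final long timestamp;        // T
    final long ssrc;             // synchronization source identifier

    RtpHeader(byte[] p) {
        if (p.length < 12 || ((p[0] >> 6) & 0x03) != 2)
            throw new IllegalArgumentException("not an RTP version 2 packet");
        marker         = (p[1] & 0x80) != 0;
        payloadType    =  p[1] & 0x7F;
        sequenceNumber = ((p[2] & 0xFF) << 8) | (p[3] & 0xFF);
        timestamp      = ((long)(p[4] & 0xFF) << 24) | ((p[5] & 0xFF) << 16)
                       | ((p[6] & 0xFF) << 8)        |  (p[7] & 0xFF);
        ssrc           = ((long)(p[8] & 0xFF) << 24) | ((p[9] & 0xFF) << 16)
                       | ((p[10] & 0xFF) << 8)       |  (p[11] & 0xFF);
    }
}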
According to the Audio/Video profile [13], the default packetization interval of the audio should have a duration of 20 ms. If a PCM encoding (PT=0) is used, the RTP packet contains 12 bytes of header and 160 bytes of data. The MTU (Maximum Transmission Unit) adopted for RTP is 1024 bytes. Thus, an audio sample block is encapsulated in one RTP packet, whereas a video frame can be split into several packets (sub-frames) in order to be sent on the network. Owing to the use of RTP, both the video and the audio objects of Figure 5 are actually split into elementary schedulable units, namely RTP packets.
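A small back-of-the-envelope helper (illustrative only; it assumes the 1024-byte MTU and the 12-byte fixed header mentioned above and ignores any payload-specific headers) gives the number of RTP packets a video frame is split into.

final class RtpPacketization {
    static final int MTU = 1024;      // maximum transmission unit assumed above
    static final int HEADER = 12;     // RTP fixed header size in bytes

    // Ceiling division: number of RTP packets (sub-frames) needed for a frame.
    static int packetsPerFrame(int frameBytes) {
        int maxPayload = MTU - HEADER;   // 1012 payload bytes per packet
        return (frameBytes + maxPayload - 1) / maxPayload;
    }
}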
5. Embedding QoS specifications in
QoSsynchronizers
Multimedia synchronization specifications based on the
TSPN model can be embedded in QoSsynchronizers. In the
following, a QoSsynchronizer (figure 4), which embodies the
TSPN specification of Figure 5, is described.
Indeed, two different implementations have been realized
which take into account synchronous [4] and asynchronous
media presenters (see §5.1). Synchronous media presenters are not equipped with methods for testing and controlling the presentation, so that the presentation time must be known “a priori”. Current distributed systems (networks and operating systems) are asynchronous; moreover, current trends exploit asynchronism to boost performance, and asynchronism seems to be the future of computing. For this reason, asynchronous media presenters have been introduced. For example, a media presenter can poll the presentation process of a media object (video or audio) to determine its state (Terminated, On-going). It is also provided with a Stop method to terminate an on-going presentation. When the media presenter detects that the presentation is over, it sends itself a Finished message.
It can be assumed that the playtime (decode + render time)
of one full RTP packet payload is constant. If the total playtime
computed for a frame exceeds the maximum presentation time
(MPT), applicative losses are generated to limit the
presentation to its MPT.
In the synchronous case, in order to fulfill the MPT
requirements, some RTP packets (messages) are not scheduled
(or discarded). Conversely, in the asynchronous case, the
presentation can be stopped if it exceeds the MPT.
The QoSsynchronizer filters the following messages (see Figure 4):
- vmfVBtoVP. Video message from VideoBinder to VideoPresenter. It contains a video RTP packet.
- amfABtoAP. Audio message from AudioBinder to AudioPresenter. It contains an audio RTP packet.
- FinishedV. The message the VideoPresenter sends itself when it detects that the video presentation process is over.
- FinishedA. The message the AudioPresenter sends itself when it detects that the audio presentation process is over.
- StopV. The message the VideoPresenter sends itself to stop the video presentation process.
- StopA. The message the AudioPresenter sends itself to stop the audio presentation process.
The QoSsynchronizer also schedules three internal messages:
- InterTimer. The timer firing the inter-stream synchronization transition.
- IntraATimer. The timer firing the intra-audio synchronization transition.
- IntraVTimer. The timer firing the intra-video synchronization transition.
The QoSsynchronizer (Figure 6) has 3 states: CREATED,
INITIALIZED, and ACTIVE. The Init event makes the QoSsynchronizer pass from the CREATED state to the INITIALIZED state. The action associated with the transition consists of (i) initializing the
QoS parameters (maxJitter, maxSkew), the media stream
throughput (fps, sps), the maxBufTime (i.e., the buffering
time), (ii) computing the nominal video and audio object
duration (VDU, ADU), (iii) creating the audio and video
buffers (mv and ma). The arrival of the first message
(amfABtoAP or vmfVBtoVP) triggers the state transition from
the INITIALIZED to the ACTIVE state. The associated scheduling
action is as follows: the message is stored into the appropriate
buffer by the method put(Message) and the first InterTimer is
scheduled to fire at the time now() (which returns the current
time) + maxBufTime. As soon as the first InterTimer fires, the
QoSsynchronizer begins scheduling. This point in time
corresponds to the first inter-stream synchronization transition
of Figure 5, which signals the start of the inter-stream
synchronization period. The associated scheduling actions are
as follows: (i) setting the next IntraTimers (audio and video)
related to the intra-stream synchronization transition; (ii)
scheduling the messages to be presented. The audio and video
messages are fetched from the respective media buffer by the
method get(). This method returns a message containing all the
RTP packets which belong to the current media object. The
IntraVTimer and the IntraATimer are scheduled respectively to
now()+VDU-maxJitter and now()+ADU-maxJitter. All of this
guarantees the minimum presentation time to be always
reached. The audio and video messages are scheduled now().
The variables FTAT and FTVT store the instants of the last
audio and video fired transitions. When an IntraATimer
expires, the boolean A_Finished that takes into account the
completion of the audio object presentation, is tested. If it is
true, the audio presentation is over and the next audio object,
provided it is not the last one of the synchronization period, is
scheduled, i.e., the IntraATimer is set to its minimum
presentation time and the audio message is scheduled now(). If
A_Finished is false, the message StopA is scheduled to be
dispatched after 2*maxJitter time units to allow the audio
presentation process to terminate within the maximum
allowable time. When a FinishedA message is captured, the
audio object presentation time is checked. If it belongs to the
range ]90,110[ and the next audio object to be scheduled is not
the last one, the StopA message is descheduled and the
IntraATimer is set to now(). If the played audio object
presentation time does not belong to the range ]90,110[ and the
next audio object is not the last one, the IntraATimer is set to
now(). Otherwise, if the next audio object is the last one (i.e., NA is equal to nObjects), the InterTimer is scheduled to fire after 110-(FinishedA.iTime()-FTAT) time units, where iTime() returns the invocation time of FinishedA.
The scheduling actions performed in the case of the
IntraVTimer expiration are similar to those carried out for the
IntraATimer. However, owing to the audio-master semantics, the InterTimer is not handled.
State: CREATED
Event: Init
Action: maxBufTime=500; maxJitter=10; maxSkew=100; fps=10; sps=10;
VDU=1000/fps; ADU=1000/sps; nObjects=maxSkew/(2*maxJitter); //nominal durations (ms) and objects per synchronization period
mv=new RTPVideoBuffer(maxBufTime , fps);
ma=new RTPAudioBuffer(maxBufTime , sps);
first=true; Become(INITIALIZED);
State: INITIALIZED
Event: vmfVBtoVP
Action: Schedule( InterTimer, now()+maxBufTime );
mv.put(vmfVBtoVP); Become(ACTIVE);
State: INITIALIZED
Event: amfABtoAP
Action: Schedule( InterTimer, now()+maxBufTime );
ma.put(amfABtoAP); Become(ACTIVE);
State: ACTIVE
Event: vmfVBtoVP
Action: mv.put(vmfVBtoVP);
State: ACTIVE
Event: amfABtoAP
Action: ma.put(amfABtoAP);
State: ACTIVE
Event: InterTimer
Action: if ((!first) && (!V_Finished)) Schedule( StopV, now() );
if (first) first=false;
NA=1; NV=1;
Schedule( IntraATimer, now()+ADU-maxJitter );
Schedule( IntraVTimer, now()+VDU- maxJitter );
FTAT=now(); //Fired Transition Audio Time
Schedule( ma.get(), now() );
FTVT=now(); //Fired Transition Video Time
Schedule( mv.get(), now() );
A_Finished=false; V_Finished=false;
State: ACTIVE
Event: IntraATimer
Action: if ( A_Finished ) {
NA++;
FTAT=now();
Schedule( IntraATimer, now()+ADU-maxJitter );
Schedule( ma.get(), now() );
A_Finished=false;
}
else Schedule( StopA, now()+2* maxJitter );
State: ACTIVE
Event: IntraVTimer
Action:
if (V_Finished) {
NV++;
if ( NV!=nObjects )
Schedule( IntraVTimer, now()+VDU-maxJitter );
FTVT=now();
Schedule( mv.get(), now() );
V_Finished=false;
}
else Schedule( StopV, now()+2* maxJitter );
State: ACTIVE
Event: FinishedA
Action: A_Finished=true;
if (NA!=nObjects){
if ( (FinishedA.iTime()-FTAT>90) &&
(FinishedA.iTime()-FTAT<110) ) {
Deschedule(StopA); Schedule( IntraATimer, now() );
}
else Schedule( IntraATimer, now() );
}
else{
Deschedule(StopA);
Schedule( InterTimer, now()+110-FinishedA.iTime()+FTAT );
}
State: ACTIVE
Event: FinishedV
Action: V_Finished=true;
if (( FinishedV.iTime()-FTVT>90 ) &&
( FinishedV.iTime()-FTVT<110 )) Deschedule(StopV);
if ( NV!= nObjects ) Schedule( IntraVTimer, now() );
Figure 6. QoSsynchronizer’s state/event/action model.
5.1. Media Presenters
The behavior of the media presenters is shown in Figure 7. The message mfMBtoMP (message from Media Binder to Media Presenter) refers to both the audio and video messages. It is worth highlighting that the message mfMBtoMP bundles all the RTP packets of a media object (e.g., a video frame).
State: CREATED
Event: Init
Action:
//Parameter initialization …
Become( INITIALIZED );
State: INITIALIZED
Event: mfMBtoMP
Action:
Play(mfMBtoMP.getData());
Send(this, Poll);
Become( PLAYING );
State: PLAYING
Event: Poll
Action:
if (isFinished()) {
Send(this, FinishedM);
Become(WAITING);
} else Send( this, Poll );
State: PLAYING
Event: Mstop
Action:
Stop();
Send(this, FinishedM);
Become(WAITING);
State: WAITING
Event: mfMBtoMP
Action:
Play(mfMBtoMP.getData());
Send( this, Poll );
Become( PLAYING );
Figure 7. Media Presenters. (a) State/Event/Action model; (b) State diagram; (c) Interaction diagram.
5.2. Buffer management
Media buffers (i.e., mv and ma in Figure 6) allow media objects to be temporarily stored. The method bufferData puts an RTP packet in the proper position in the media buffer according to the sequence number (SN) and the timestamp (T). It is able to cope with duplicated and out-of-order packets. A lost video packet is replaced by the previous one in the stream. Figure 8 reports a Java code fragment concerning an implementation of the bufferData method (for simplicity, the management of losses, duplications and misordering is not shown). Care is taken in restricting dynamic object creation and de-referencing so as to control the Java garbage collector activity. According to the session QoS parameters, the media buffers and the RTPcache objects, which store the RTP packets belonging to the same video frame, are pre-allocated.
public void bufferData(RTPpacket packet){
  if ((num_elem==dim) && (lastTS!=packet.getTimestamp())) {
    //buffer full and a new frame starts: handle overflow by replacing the oldest frame
    in=(in+1)%dim;
    buffer[in].addElement(packet);
    lastTS=packet.getTimestamp();
    out=(in+1)%dim;
  }
  else {
    if (lastTS!=packet.getTimestamp()){
      //first packet of a new frame: advance to the next buffer cell
      in=(in+1)%dim;
      num_elem++;
    }
    buffer[in].addElement(packet);
    lastTS=packet.getTimestamp();
  }
}
Figure 8. The method bufferData.
6. Simulation and timing QoS evaluation
Media actors, messages and RTsynchronizers of figure 4
have been prototyped according to the actor framework under
virtual time. Design parameters of the multimedia
synchronization system are the maxPlaytime (i.e., the
maximum time to play a full RTP packet payload), the media
buffer dimension, and the maxBufTime. The simulation (fig. 9)
is driven by RTP traces dumped from real multimedia sessions
over IP-multicast generated by RTP-based tools (e.g., vic and vat [3]). The traces used consist of a JPEG-encoded video stream and a related PCM (Pulse Code Modulation) encoded audio stream. Since the input traces have been gathered on a local testbed (two-switched Fast Ethernet), they are not affected by losses, misordering or duplications.
However, the main goal of the simulation presented in this
paper is to evaluate the lip synchronization mechanism at
presentation level by varying the tuning parameters (fig. 9).
The simulation outputs trace files of the most significant
events during the session, which allow analyzing the audio and
video presentation jitter, the audio/video presentation skew, the
buffer growth, the applicative losses (i.e., the losses, in terms
of non-presented milliseconds of frames, that the multimedia
synchronization system generates to bound the presentation of
media objects within their MPT).
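For illustration only (this record type is an assumption, not part of the paper's simulator), each entry of a stored RTP trace can be seen as a pair <arrival time, packet info>, with the packet-info fields listed in the input section of Figure 9.

final class TracedPacket {
    final long arrivalTimeMs;    // AT: arrival time of the packet at the receiver
    final long rtpTimestamp;     // T
    final int  sequenceNumber;   // SN
    final boolean marker;        // M
    final int  payloadType;      // payload
    final int  dataLength;       // payload length in bytes

    TracedPacket(long at, long t, int sn, boolean m, int pt, int len) {
        arrivalTimeMs = at; rtpTimestamp = t; sequenceNumber = sn;
        marker = m; payloadType = pt; dataLength = len;
    }
}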
[Figure 9 depicts the simulator with an input section (stored RTP video/audio traces or network-simulator traces, each packet described by arrival time, timestamp, sequence number, marker, payload and data length), a tuning section (maxPlaytime, bufferSize, maxBufTime) and an output section (buffer growth, applicative losses, audio jitter, video jitter, skew).]
Figure 9. Multimedia system simulator.
6.1. Output data analysis
The analysis of the output traces confirms that all the
synchronization mechanisms meet the temporal presentation
requirements, and in particular, the quality of the intra and
inter-stream synchronization. The results shown refer to the
same sample multimedia session and the tuning parameters
fixed as follows: maxPlayTime=20ms, maxBufferTime=100ms,
bufferSize=10.
Figure 10 portrays respectively the video and the audio
presentation jitter of the first 10s of the considered multimedia
session. The jitter always stays within the range [-10, +10] ms.
Figure 11 shows the A/V inter-stream skew at the
synchronization points of the first 50s of the multimedia
session. The skew requirement is always respected: the skew never exceeds ±50 ms, well below the maximum allowed value of 100 ms.
Figure 12 depicts the video buffer growth of the first 50s of
the multimedia session in two cases:
(i) the audio is played according to its nominal time. The
video buffer, which was set to 10 media objects, can be
dimensioned to 2 media objects
(ii) the audio is played according to its nominal time plus a
random quantity, which introduces a drift effect. In this case,
the audio drift leads to a buffer overflow if the buffer is
dimensioned to a size less than 4 video objects.
Figure 10. The audio and the video presentation jitter.
Figure 11. The audio/video presentation skew.
Figure 12. Video buffer growth.
Figure 13 highlights the actual video frame duration in the case of synchronous and asynchronous media presenters. In the asynchronous case, a higher presentation quality is obtained by minimizing frame corruption. In fact, when a frame being displayed is going to overtake its maximum presentation time (MPT), the presentation is always stopped at its MPT (i.e., 110 ms). Conversely, in the synchronous case, the presentation cannot be stopped and some RTP packets (i.e., atomic presentation units) have to be discarded in order to stay below the MPT. This process causes the video frame duration to have the MPT only as an upper bound.
Figure 13. Synch/asynch video frame durations.
Figure 14 reports the ratio R between the total amount of truncated milliseconds of presentation and the total number of video frames, which accounts for the video applicative losses generated by the multimedia synchronization mechanism, versus the maxPlaytime, in the case of asynchronous and synchronous media presenters.
Figure 14. Applicative video losses vs. maxPlaytime.
7. Conclusions
This work claims that an application-level approach, which integrates a multimedia application with its operating software (i.e., run-time support and customizable scheduling), and is based on a formal and rigorous model for open distributed systems, namely the Actor model, can be an effective methodology for the modeling, analysis and implementation of Internet-based multimedia systems.
The paper proposes an actor framework which favors modularity by separating concerns between application media actors and QoSsynchronizers. The approach supports multiple operating environments for prototyping, temporal validation and concrete implementation of a multimedia system. Each environment operates on the same runtime representation of the actors and relies on a specialization of the time notion and of the message scheduling/dispatching control structure.
The design of a receiver component, suitable for both live and on-demand multimedia systems, which acts as a lip-sync filter, is reported. The multimedia synchronization mechanisms follow from a TSPN specification and take into account real operating parameters. The simulator can be fed both by recorded RTP traces of real multimedia sessions and by traces produced by a network simulator.
On-going work aims at:
- continuing the modeling and analysis of actor-based multimedia systems: sizing adaptively the buffer used to smooth the jitter within the allowed end-to-end delay, coping with losses, examining the user-system interaction, and changing the QoS parameters of an on-going session;
- integrating the proposed methodology with ns-2 [2] to perform interactive simulations;
- experimenting with concrete implementations in Java of the analyzed multimedia systems [5][6].
8. References
[1] G. Agha, “Actors: a model for concurrent computation in
distributed systems,” MIT Press, 1986.
[2] S. Bajaj, D. Estrin, S. Floyd, S. McCanne, D. Zappala et al., “Improving Simulation for Network Research,” Technical Report 99-702, University of Southern California, March 1999.
[3] J. Crowcroft, M. Handley and I. Wakeman,
“Internetworking Multimedia,” UCL Press, 1999.
[4] G. Fortino and L. Nigro, “Modeling, Analysis and
Implementation of Actor-based Multimedia Systems, ” In Proc.
of the International Conference on Parallel and Distributed
Processing Techniques and Applications (PDPTA’99), June 28
– July 1, 1999.
[5] G. Fortino and L. Nigro, “ViCRO: an interactive and
cooperative videorecording on demand system over Internet
MBone,” to appear in Int. J. Informatica, 1999.
[6] G. Fortino and L. Nigro, “Development of Virtual Data
Acquisition Systems based on Multimedia Internetworking,” to appear in Computer Standards & Interfaces, Elsevier, 1999.
[7] I. Kouvelas, V. Hardman, and A. Watson, “Lip
Synchronization for use over Internet: Analysis and
Implementation,” In Proc. of IEEE Globecom, London (UK),
November 1996.
[8] L. Nigro and F. Pupo, “A Modular Approach to Real-time
Programming using Actors and Java,” Control Engineering
Practice, Vol. 6, N° 12, pp. 1485-1491, December 1998.
[9] P. Owezarsky, M. Diaz, and C. Chassot, “A Time Efficient
Architecture for Multimedia Applications,” IEEE Journal on
Selected Areas in Communication, Special Issue on Protocols
and Architectures for Applications of the 21st Century, Vol.16,
N°3, April 1998.
[10] S. B. Moon, J. Kurose, and D. Towsley, “Packet audio playout delay adjustment: performance bounds and algorithms,” ACM/Springer Multimedia Systems, vol. 5, no. 1, pp. 17-28, Jan. 1998.
[11] S. Ren, N. Venkatasubramanian, and G. Agha,
“Formalising multimedia QoS constraints using actors,” in
Proc. of 2nd IFIP Workshop on Formal Methods for Open
Object-based Distributed Systems, Canterbury (UK),
Proceedings by Chapman & Hall, pp. 139-153, July 1997.
[12] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson,
“RTP: a Transport Protocol for Real-Time Applications,” IETF
Internet Draft, September 1999.
[13] H. Schulzrinne, “RTP Profile for Audio and Video
Conferences with Minimal Control,” IETF Internet Draft,
September 1999.