Web Radio: Technology and Performance
Paula Cortés Camino
LITH-ISY-EX-3327-2002
2002-08-30

Division, Department: Bildkodning (Image Coding Group), Institutionen för systemteknik (Department of Electrical Engineering)
Date: August 2002
Language: English
Report category: Examensarbete (Master thesis)
ISRN: LiTH-ISY-EX-3327-2002
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2002/3327/

Title: Web Radio: Technology and Performance
(Swedish title: Web-radio: Tekniker och prestanda)
Author: Paula Cortés Camino

Abstract
We review some popular techniques involved in web radio, from the compression methods (both open and proprietary) to the network protocols. The implementation of a web radio station is also addressed.

Keywords: Web radio, streaming audio, Windows Media, RealAudio, SHOUTcast, MP3, AAC, WMA, audio coding, RTP, RTSP
Web Radio: Technology and Performance

Master thesis at the Image Coding Group,
Linköping Institute of Technology

by
Paula Cortés Camino

LITH-ISY-EX-3327-2002

Supervisor: Dr. Robert Forchheimer, Image Coding Group
Examiner: Dr. Robert Forchheimer, Image Coding Group

Linköping, August 2002
Acknowledgements
I would like to express my sincere gratitude to Dr. Robert Forchheimer for accepting me into the Image Coding Group. He found the right project for me and helped me throughout with his suggestions and corrections. The first time we met I got an excellent impression of him, not only as a professor but also as a person, an impression that has only improved with time and the talks we have had. Thanks also to Peter for agreeing to be my opponent, and to Jonas for his corrections.
Thanks to all the people who have made my time in Linköping so special and unforgettable, my friends from the university and my host family. There are many people I have to be grateful to for being so close to me in spite of the distance: all my friends from Castillejo del Romeral, Guadalajara, my university, all my family... the list would be endless.
I want to dedicate this thesis to my parents and my "mormor" (grandmother). They have always been there, listening to me when I have needed it, and also here visiting me. Also to Jesús, who has been with me during the summer; without his support this thesis (and many other things) would never have been possible. I love all of you so much.
TACK SÅ MYCKET!
¡MUCHAS GRACIAS!
Linköping, August 2002
Abstract
The march of electronic technology and the explosive growth of the Internet are changing how audio communications are delivered. For decades radio was delivered through analog signals sent over the airwaves, but this is being transformed. Today, streaming technologies allow more and more people to listen to their favorite radio station through the Internet.
The development of audio broadcasting via the web is probably the biggest revolution in broadcasting since the advent of FM. Web radio is radio with a lot of potential, with the ability to hear stations of all formats, musical genres, and political and social orientations, from every part of the world. Web radio also differs greatly from radio heard by other means in terms of quality, quantity, and variety. It is becoming more and more popular and in the future it may even replace traditional radio.
The aim of this thesis is to review the techniques involved in web radio, from the compression methods (both open and proprietary) to the network protocols. The implementation of a web radio station is also addressed.
Before the audio gets to the listeners, it undergoes many transformations, jitter, delays, etc. that degrade its quality. At the end of the thesis the main performance parameters and factors affecting the quality of real-time audio are reviewed, and some measurements of quality parameters are analyzed.
Key words: Audio, Compression, Bandwidth, Streaming, Broadcasting, Web Radio Station, RTSP, RTP, Streaming Server, Player, Encoder, Web Server, Audio File Format, QoS, Multicasting.
Contents

1  Introduction                                                   1
   1.1  Disposition of the Thesis                                 1
   1.2  Audio Overview                                            2
2  Audio Compression                                              5
   2.1  Classification of Compression Techniques                  6
        2.1.1  Lossless Compression Techniques                    7
        2.1.2  Lossy Compression Techniques                       7
   2.2  Audio Compression on the Internet                        10
        2.2.1  MPEG                                              10
        2.2.2  Windows Media Audio                               26
        2.2.3  RealAudio                                         26
3  Compression Techniques Comparison                             29
   3.1  Coders Comparison                                        29
   3.2  Formats Comparison                                       35
   3.3  Final Conclusions                                        39
4  Internet Audio Transmission                                   41
   4.1  Introduction                                             41
   4.2  Streaming Audio                                          42
        4.2.1  Overview                                          42
        4.2.2  Streaming Improvement                             44
   4.3  Broadcasting                                             50
   4.4  Multimedia Protocols                                     50
        4.4.1  Physical/Data Link Layer                          51
        4.4.2  Internet Layer                                    51
        4.4.3  Transport Layer                                   54
        4.4.4  Session/Presentation/Application Layer            61
        4.4.5  Asynchronous Transfer Mode (ATM)                  64
   4.5  Media Servers                                            65
        4.5.1  Web Servers                                       65
        4.5.2  Streaming Servers                                 67
        4.5.3  Web Servers vs. Streaming Servers                 68
   4.6  Standards Organizations                                  68
5  Web Radio Stations                                            71
   5.1  Introduction                                             71
   5.2  How do they work?                                        72
6  Implementing a Web Radio Station                              75
   6.1  Elements                                                 75
   6.2  Steps                                                    76
   6.3  Commercial Tools                                         79
        6.3.1  Windows Media                                     80
        6.3.2  Real Networks                                     81
        6.3.3  SHOUTcast and Icecast                             86
   6.4  Comparisons                                              87
7  Performance                                                   89
8  Conclusions                                                   97
Appendix: Audio File Formats                                     99
Acronyms                                                        101
References                                                      115
1
Introduction
In this chapter, the thesis is presented and an overview
of audio processing and transmission is given.
1.1
Disposition of the Thesis
The march of electronic technology and the explosive growth of the Internet [119] are changing how audio communications are delivered. For decades radio was delivered through analog signals sent over the airwaves, but this is being transformed. Today, streaming technologies allow more and more people to listen to their favorite radio station through the Internet.
Web radio is radio with a lot of potential, with the ability to hear stations of all formats, musical genres, and political and social orientations, from every part of the world. Web radio also differs greatly from radio heard by other means in terms of quality, quantity, and variety [123]. In the future it may even replace traditional radio.
This thesis presents an overview of Internet radio systems. Chapter 1 reviews the processing of audio, from analog sound to packets ready to be transmitted through the Internet. Chapter 2 emphasizes the necessity of compression in Internet audio transmission, classifies compression techniques and presents the main techniques used today. In Chapter 3 both audio formats and the encoders available for each format are compared. In Chapter 4 the actual transmission of the audio packets through the Internet is addressed, introducing the concept of streaming and the protocols used; the chapter ends by taking a look at the standards organizations for audio compression and transmission. Chapter 5 introduces web radio and explains the general operation of a web radio station. Chapter 6 covers the process of creating an Internet radio station with three of the main streaming systems: SHOUTcast/Icecast, Windows Media and Real Networks. Factors affecting the QoS of web radio stations are studied in Chapter 7, and some performance tests are analyzed. The thesis finishes with conclusions about the current state and future of Internet radio, together with future work, in Chapter 8. The main audio file formats found on the Internet and their main characteristics are listed in the Appendix.
1.2
Audio Overview
Sounds are rapid pressure variations in the atmosphere, such as those produced by many natural processes or man-made systems. The human ear responds to atmospheric pressure variations when they are in the frequency range between 20 Hz and about 20 kHz, which is called the audio bandwidth.
Sound is analog by nature, but it is best converted into a digital signal before transmission. The conversion from analog to digital is done by an ADC (Analog-to-Digital Converter), which filters, samples, quantizes and encodes the audio.
In this digitizing process, some things should be taken into account to get high audio quality. In the sampling stage, a higher sampling frequency increases fidelity. The sampling frequency is the number of times the audio signal is sampled within a given time period. The sampling frequency is related to the audio signal bandwidth by the Nyquist limit, which states that the sampling frequency should be at least two times the maximum frequency of the signal (or its bandwidth).
When recording music, the choice of the sampling frequency is crucial since musical instruments produce a wide range of frequencies. The sampling frequency should be above 40 kHz, giving a highest reproduced frequency of 20.0 kHz, approximately the top of the human hearing range [2]. If recording speech, however, a lower sampling frequency is enough [3]. Table 1.1 summarizes some sampling frequencies and their common use [2].
Sampling frequency   Common use
11.025 kHz           Minimum quality currently used on personal computers
22.050 kHz           Very common in computer sound file formats
24 kHz               Minimum acceptable quality needed for speech recognition
44.1 kHz             The standard for audio compact discs and high-quality personal computer sound (CD quality)

Table 1.1: Sampling frequencies and their common use.
Quantization refers to the process of approximating the continuous set of values in the data with a finite set of values. The instantaneous amplitude of the analog signal at each sampling instant is rounded off to the nearest of several predetermined levels. The number of levels is usually a power of 2, so that the levels can be represented by three, four, five, or more bits when encoding.
There are two types of quantization: scalar and vector quantization. In scalar quantization each input symbol is treated separately to give the output, while in vector quantization the input symbols are combined in groups called vectors, which are then processed to give the output [4].
Obviously, as this is a process of approximation, it introduces quantization error. By increasing the word length (the number of bits used to encode each sample), the quantization error can be reduced. In large-amplitude signals the correlation between the signal and the quantization error is small; the error is random and sounds like analog white noise. However, in low-level signals the error becomes correlated with the signal, which leads to distortion [5].
To decorrelate this error, a technique called "dithering" is used. Dithering distributes the error across the entire spectrum by adding some noise prior to quantization. This method raises the noise floor equally at all frequencies. However, since the ear is not equally sensitive to all frequencies, it makes sense to push the majority of this dither noise to frequencies where the ear is least sensitive, and remove noise where the ear is most sensitive. This is done by another technique called "noise shaping" [6].
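The following is a minimal sketch, assuming Python with numpy, of the two effects just described: uniform scalar quantization of a low-level signal, with and without TPDF dither. The signal level, word length and dither amplitude are illustrative choices, not values from the thesis.

```python
# A minimal sketch of quantization error decorrelation by dithering.
# All parameter choices here are illustrative.
import numpy as np
rng = np.random.default_rng(0)

def quantize(x, step):
    """Uniform (scalar) quantizer: round to the nearest level."""
    return step * np.round(x / step)

t = np.arange(48_000) / 48_000
signal = 0.01 * np.sin(2 * np.pi * 440 * t)   # low-level 440 Hz tone
step = 2.0 / 2 ** 8                           # 8-bit quantizer step on [-1, 1]

plain = quantize(signal, step)
# TPDF dither: the sum of two independent uniform noises of +/- 1/2 LSB
dither = rng.uniform(-step / 2, step / 2, t.size) \
       + rng.uniform(-step / 2, step / 2, t.size)
dithered = quantize(signal + dither, step)

# Without dither, the error of a low-level signal is correlated with the
# signal (audible distortion); with dither it behaves like random noise.
print("error-signal correlation, plain:   ",
      np.corrcoef(signal, plain - signal)[0, 1])
print("error-signal correlation, dithered:",
      np.corrcoef(signal, dithered - signal)[0, 1])
```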
This complete approach to digitizing is called PCM (Pulse Code Modulation) [7]. The digital audio is then stored in files in a standard format. However, the size of the files can be quite large [3], so the files usually need to be compressed.
Once the audio file is compressed, it is divided into packets before being transmitted through the Internet. The conventional process of transferring and listening to an audio file involves first transferring the file from one computer to the next, or downloading files from a server via FTP (File Transfer Protocol). Another option is to listen to an audio file as it is transferred, in real time, using a technique called streaming.
2
Audio Compression
In this chapter the necessity of audio compression on
the Internet is emphasized and audio compression
techniques are introduced. Compression techniques
can be classified as lossy or lossless; typical
examples of both are presented. At the end of the
chapter, three of the main approaches to audio
compression for the Internet (MPEG, WMA and
RealAudio) and their main techniques are analyzed and
explained in more detail.
Storing a 3-minute song on your hard disk with CD quality (44.1 kHz, stereo, 16 bits per sample) takes:

44,100 samples/s * 2 channels (stereo) * 2 bytes/sample * 3 * 60 s = around 30 MBytes of storage space.

Downloading it over the Internet with an average 56K modem would then take:

30,000,000 bytes * 8 bits/byte / (56,000 bits/s * 60 s/min) = around 70 minutes.
Sound Quality          Stereo 16 bit   Stereo 8 bit   Mono 16 bit   Mono 8 bit
CD (44.1 kHz)          10 MBytes       5 MBytes       5 MBytes      2.5 MBytes
FM Radio (22.05 kHz)   5 MBytes        2.5 MBytes     2.5 MBytes    1.25 MBytes
AM Radio (8 kHz)       1.8 MBytes      900 KBytes     900 KBytes    450 KBytes

Table 2.1: Approximate file sizes of a one-minute sound file [3].
Looking at the previous example and also at Table 2.1, the need for compression is clear. Audio compression (also called "audio coding") reduces the amount of memory required to store an audio file, which also reduces the time required to transfer it and the bandwidth needed for the transmission.
The term bitrate is used to express the strength of the compression [9]. Bitrate denotes the average number of bits that one second of audio data consumes. For PCM, the bitrate in bps is fsampling (Hz) * word length (bits/sample); e.g. for a digital audio signal from a CD, the bitrate is 1411.2 Kbps (44.1 K * 2 * 16).
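The figures above can be checked with a few lines of Python; this is just arithmetic on the numbers in the text, assuming an ideal 56 Kbps modem with no overhead.

```python
# Sanity-checking the storage, bitrate and download-time figures above.
fs = 44_100            # sampling frequency, samples/s
channels = 2           # stereo
bytes_per_sample = 2   # 16 bits
seconds = 3 * 60       # a 3-minute song

size_bytes = fs * channels * bytes_per_sample * seconds
pcm_bitrate = fs * channels * 16            # bps = fsampling * word length
download_min = size_bytes * 8 / 56_000 / 60

print(f"file size:     {size_bytes / 1e6:.1f} MBytes")   # ~31.8, "around 30"
print(f"PCM bitrate:   {pcm_bitrate / 1000:.1f} Kbps")   # 1411.2
print(f"download time: {download_min:.0f} minutes")      # ~76 over 56K
```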
The first proposals for reducing audio followed those for speech coding, but speech and music have different properties. Speech can be coded very efficiently because a speech production model is available, whereas nothing similar exists for general audio signals [10].
High coding efficiency is achieved with algorithms exploiting signal redundancies. These redundancies can be [11]:
• Spatial: exploits correlation between neighboring data items.
• Spectral: uses the frequency domain to exploit relationships between the frequency components of the data.
• Psychoacoustic: exploits perceptual properties of the human auditory system.
• Temporal: exploits correlation between successive samples or frames.
2.1 Classification of Compression Techniques
Compression can be categorized in two ways [11, 8, 4]:
• Lossless compression: the compressed data can be reconstructed (uncompressed) without loss of information. It is also referred to as reversible compression.
• Lossy compression: the aim is to obtain the best possible fidelity for a given bitrate, or to minimize the bitrate needed to achieve a given fidelity. The compression is not reversible; the decompressed file is not the same as the original file.
2.1.1 Lossless Compression Techniques
These methods are fairly straightforward to understand and implement. Their simplicity is their downfall in terms of attaining the best compression ratios. Some lossless compression techniques are:
• Simple Repetition Suppression: replaces a sequence of n successive occurrences of a token with the token and a count of the occurrences. This technique is used when there is silence in sound files.
• Pattern Substitution: substitutes frequently repeated patterns with codes shorter than the patterns [11].
• Entropy Encoding: based on information-theoretic techniques [11, 8], e.g.:
  o Huffman Coding: uses fewer bits to encode the data that occurs more frequently (see the sketch after this list). The basic Huffman algorithm has been extended to Adaptive Huffman Coding, because the basic algorithm requires statistical knowledge of the source, which is often not available (e.g. in live audio).
  o Arithmetic Coding: maps entire messages to real numbers based on statistics.
  o LZW (Lempel-Ziv-Welch): a dictionary-based compression method. It maps a variable number of symbols to a fixed-length code.
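The following is a minimal sketch of Huffman coding in Python, referenced from the list above; the toy symbol string stands in for quantized audio samples and is purely illustrative.

```python
# A minimal Huffman coder: frequent symbols get shorter codewords.
import heapq
from collections import Counter

def huffman_codes(data):
    freq = Counter(data)
    # Heap entries: (subtree frequency, tie-breaker, {symbol: codeword})
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)      # the two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

samples = "AAAAABBBCCD"                      # stand-in for quantized samples
codes = huffman_codes(samples)
encoded = "".join(codes[s] for s in samples)
print(codes)                                 # shorter codes for frequent symbols
print(len(encoded), "bits vs", 8 * len(samples), "bits uncoded")
```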
2.1.2 Lossy Compression Techniques
Traditional lossless compression methods don't work well on audio because their compression ratios are not high enough. Lossy compression methods use source encoding techniques that may involve transform encoding, differential encoding or vector quantization.
Perceptual techniques (based on psychoacoustics) are also used (e.g. in the MPEG standards), achieving higher compression. The sensitivity of the human auditory system varies in the frequency domain: it is high for frequencies between 2.5 and 5 kHz and decreases above and below this frequency band. Therefore, some tones are masked by others, and are then inaudible. There are two main masking effects:
• Frequency masking: tones near a loud tone are masked.
• Temporal masking: after hearing a loud sound, it takes a little while until humans can hear a soft tone nearby.
There is a "threshold in quiet", and any tone below this threshold won't be perceived. For every tone in the audio signal a "masking threshold" can be calculated [see Figure 2.1]. Tones lying below this masking threshold can be eliminated by the encoder because they will be masked and are thus irrelevant for human perception [9].
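As an illustration of the threshold in quiet, the following sketch evaluates Terhardt's well-known analytic approximation of the curve; the formula is a standard psychoacoustics approximation, not taken from the thesis.

```python
# The "threshold in quiet" (absolute hearing threshold), using Terhardt's
# approximation. Components below this level need not be coded at all.
import numpy as np

def threshold_in_quiet_db(f_hz):
    f = f_hz / 1000.0                         # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

for f in (100, 500, 1000, 4000, 10000, 16000):
    print(f"{f:>6} Hz: {threshold_in_quiet_db(f):7.1f} dB SPL")
# The curve dips lowest around 2.5-5 kHz, where the ear is most sensitive.
```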
The following are some of the lossy methods applied to audio compression:
• Silence Compression: detects "silence", similar to Simple Repetition Suppression.
• ADPCM (Adaptive Differential Pulse Code Modulation): a derivative of DPCM (Differential Pulse Code Modulation). It encodes the difference between two consecutive samples, and adapts the quantization so that fewer bits are used when the difference is smaller [7]. Used in CCITT G.721 at 16 or 32 Kbps and in G.723 at 24 and 40 Kbps.
• LPC (Linear Predictive Coding): fits the signal to a speech model and then transmits the parameters of the model [7].
• CELP (Code Excited Linear Prediction): does LPC, but also transmits an error term [7].
• ITU-T G.711, mu-law and A-law: mu-law is an encoding commonly used in North America and Japan for digital telephony. mu-law samples are logarithmically encoded in 8 bits; however, their dynamic range corresponds to 14-bit linear data. A-law is similar to mu-law and is used as a European telephony standard. This encoding comes out to 64 Kbps at 8 kHz.
• Transform coding (e.g. frequency-domain coders): the spectral characteristics of the source signal and the masking properties of the human ear are exploited to reduce the transmitted bitrate [12].
The time-domain audio signal is transformed to the frequency domain before quantization [13]. The reason for transforming the signal is that the input samples are highly correlated, and the time-to-frequency transform produces coefficients that are less correlated. There are also more coefficients with values near zero, which can be coded as zero without introducing great distortion.
The spectrum is split into frequency bands that are quantized separately. Therefore the quantization noise associated with a particular band is contained within that band.

Figure 2.1: Masking effects in the human ear [9].
The total number of bits available for quantizing (usually fixed by design) is distributed by a dynamic bit allocation over a number of signal-component quantizers, so that the audibility of the quantization noise is minimized. The number of bits used to encode each frequency component varies: components that are subjectively more important are quantized more finely, while components that are subjectively less important have fewer bits allocated, or may not be encoded at all [12]. This results in the highest possible audio quality for a given number of bits [14].
Transform coders differ in the strategies used for quantization of the spectral components and for masking the resulting coding errors [15]. Some examples are:
  o SBC (Subband Coding): eliminates information about frequencies which are masked, according to psychoacoustic models. It is used in ISO/MPEG Audio Coding Layers I and II.
  o Adaptive transform coding: used in Dolby AC-2 coding and AT&T's Perceptual Audio Coder.
  o Hybrid (subband/transform) coding: a combination of discrete transform and filter bank implementations. It is used by Sony's Adaptive Transform Acoustic Coding (used in MiniDisc) and by MPEG-1 Layer III.
Transforms other than the Fourier transform can also be used, such as the DCT (Discrete Cosine Transform), the DST (Discrete Sine Transform), and the DWT (Discrete Wavelet Transform). There are also nested approaches or multilevel data compression techniques, such as applying the DCT several times [16].
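The following is a minimal sketch of the transform-coding idea, assuming numpy: a correlated frame is transformed with a DCT, the coefficients are coarsely quantized, and the frame is reconstructed. The frame length and step size are illustrative choices.

```python
# Transform coding in miniature: DCT -> uniform quantization -> inverse DCT.
import numpy as np

N = 32
n = np.arange(N)
k = np.arange(N)
C = np.cos(np.pi / N * (n[None, :] + 0.5) * k[:, None])  # DCT-II basis, C[k, n]

x = np.sin(2 * np.pi * 3 * n / N)          # a smooth, highly correlated frame
X = C @ x                                  # most energy lands in few coefficients

step = 0.5                                 # coarse quantizer (illustrative)
Xq = step * np.round(X / step)             # many small coefficients become 0

# Inverse transform (DCT-III with the usual normalization)
x_rec = Xq[0] / N + (2.0 / N) * (C[1:].T @ Xq[1:])
print("nonzero coefficients:", int(np.count_nonzero(Xq)), "of", N)
print("max reconstruction error:", float(np.max(np.abs(x - x_rec))))
```

Only the few nonzero quantized coefficients would need to be transmitted, which is exactly the saving the paragraph above describes.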
Finally, coding methods can also be divided into CBR (constant bitrate) and VBR (variable bitrate) methods. CBR techniques vary the quality level in order to ensure a consistent bitrate throughout an encoded file. Difficult passages (e.g. passages containing a relatively wide stereo separation) may be encoded with fewer than the optimum number of bits, and easy passages (passages containing silence or a relatively narrow stereo separation) are encoded using more bits than necessary. Consequently, difficult passages may suffer a decrease in quality, while easy passages may waste bits [17].
VBR techniques ensure consistently high audio quality throughout an encoded file, at the cost of a variable bitrate. Difficult passages in the audio source are allocated additional bits and easy passages fewer bits, thus reducing wasted bits. VBR encoding produces an overall higher quality level than CBR encoding, and should be used when consistent audio quality is the top priority and a constant or predictable encoded file size is not critical [17].
2.2 Audio Compression on the Internet
When the Internet was young, the only way to deliver audio in an acceptable time was to precede the compression with some reduction scheme: reducing the sampling rate, converting from stereo to mono, reducing the resolution from 16 down to 8 bits per sample, or all of the above. But every reduction in the above parameters resulted in lower quality sound [3].
Not long after, however, groups began to create highly complex algorithms that allowed a reduction in the size of the file while retaining the highest quality possible. An early effort came from MPEG (the Moving Picture Experts Group), which set out to develop a compression scheme oriented first to video and later also to audio [3].
The next step was creating an acceptable compression scheme for streaming technology [3]. For the first time, hour-long sound files could be played on a computer with a 28.8 Kbps modem. The higher the bandwidth available to the users, the better the sound quality they got.
The following is a review of the MPEG audio compression standards and two proprietary audio compression formats, Windows Media Audio and RealAudio [111].
2.2.1 MPEG
MPEG is a working group of ISO/IEC in charge of the development of standards for the coded representation of digital audio and video. Established in 1988, the group has produced MPEG-1, the standard on which such products as Video CD and MP3 are based; MPEG-2, the standard on which digital television set-top boxes and DVD are based; and MPEG-4, the standard for multimedia [18]. Furthermore, the MPEG audio compression algorithm has been considered by the European Broadcasting Union as a standard for use in digital audio broadcasting [19]. The group has developed more standards that are not reviewed here, since they are not related to audio.
MPEG-1 (ISO/IEC 11172): Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbps
MPEG-1 is a standard for efficient storage and retrieval of audio and video on CD, consisting of several parts: part 1 (Systems), part 2 (Video), part 3 (Audio), part 4 (Conformance Testing), and part 5 (Reference Software). The Systems part provides multiplexing and synchronization support to elementary audio and video streams. The Audio part provides lossy encoding of stereo audio with transparency (subjective quality similar to the original stereo) at some predefined bitrates [see Table 2.2], and also provides a free bitrate mode to support fixed bitrates other than the predefined ones. From Layer I to Layer III, the codec complexity increases, the overall codec delay increases [see Table 2.2], and the performance increases. The sampling frequencies used in the three layers are 32, 44.1 and 48 kHz.
Layer             Bitrates                        Minimum coder delay
Layer I           384 Kbps (1:4)                  < 50 ms
Layer II          256-192 Kbps (1:6 - 1:8)        < 100 ms
Layer III (MP3)   128-112 Kbps (1:10 - 1:12)      < 150 ms

Table 2.2: Bitrates and theoretical minimum coder delay in the MPEG-1 layers [7].
MPEG-1 Audio Layer I (MP1)
The Layer I encoder is the basic MPEG encoder [see Figure 2.2]. The audio stream passes through a PQMF (Polyphase Quadrature Mirror Filterbank) that divides the input into 32 equal-width frequency subbands, which does not reflect the human auditory model, where critical bands are not of equal width [20]. Each subband has 12 frequency samples, obtained by calculating the DFT (Discrete Fourier Transform) of 12 input PCM samples. This means that a Layer I frame has 384 (12 * 32) PCM audio samples. In Layer I the DFT is calculated with 512 points.
In addition, the filter bank determines the maximum amplitude of the 12 subband samples in each subband. This value is known as the scaling factor of the subband and is passed both to the psychoacoustic model and, together with the set of frequency samples in each subband, to the quantizer and coding block [7].
The input audio stream simultaneously passes through a psychoacoustic model that determines the ratio of the signal energy to the masking threshold for each subband. The quantizer and coding block use a bit allocation algorithm that takes the signal-to-mask ratios and decides how to distribute the total number of bits available for the quantization of the subband signals, so as to minimize the audibility of the quantization noise and maximize the compression at the same time.
Figure 2.2: MPEG basic audio coder and decoder [21].
Finally, the last block takes the representation of the quantized subband samples and
formats this data and side information (bit allocation, scale factors) into a coded bitstream
[see Figure 2.3].
Figure 2.3: MPEG-1 Layer I frame structure [21], valid for 384 PCM audio input samples. Duration: 8 ms at a sampling rate of 48 kHz.
The decoder reverses this process, except that no psychoacoustic model is required. The bitstream is unpacked and passed through the filter bank to reconstruct the time-domain PCM samples.
Layer I is the same as the PASC (Precision Adaptive Sub-band Coding) compression used in Digital Compact Cassettes. Typical applications of Layer I include digital recording on tapes, hard disks, or magneto-optical disks, which can tolerate the high bitrate [21].
MPEG-1 Audio Layer II (MP2)
The Layer II algorithm is a straightforward enhancement of Layer I. It codes the audio data in larger groups (1152 PCM samples) and imposes some restrictions on the possible bit allocations for values from the middle and higher subbands. It also represents the bit allocation, the scale factor values and the quantized samples with more compact code. Layer II gets better audio quality by saving bits in these areas, so that more code bits are available to represent the quantized subband values [20].
The resulting frame structure is represented in Figure 2.4. Typical applications of Layer II include audio broadcasting, television, consumer and professional recording, and multimedia [21].
Figure 2.4: MPEG-1 Layer II frame structure [21], valid for 1152 PCM audio input samples. Duration: 24 ms at a sampling rate of 48 kHz.
MPEG-1 Audio Layer III (MP3)
The Layer III algorithm is a much more refined approach, derived from ASPEC (Audio Spectral Perceptual Entropy Coding). Although based on the same filter bank found in the other layers (for reasons of compatibility), Layer III compensates for some filter bank deficiencies by processing the filter outputs with an MDCT (Modified Discrete Cosine Transform) [see Figure 2.5] [22]. It can be said that the filter bank used in MPEG Layer III is a hybrid filter bank consisting of a polyphase filter bank and an MDCT [20].
Unlike the polyphase filter bank, the MDCT transform is lossless in the absence of quantization. The MDCT further subdivides the subband outputs in frequency to provide better spectral resolution [20].
Figure 2.5: Block structure of ISO/MPEG audio encoder and decoder, Layer III [15].
Besides the MDCT processing, other enhancements over the Layer I and II algorithms include the following [20]:
• Alias reduction: Layer III specifies a method of processing the MDCT values to remove some artifacts caused by the overlapping bands of the polyphase filter bank.
• Non-uniform quantization.
• Scale-factor bands: these bands cover several MDCT coefficients and have approximately critical-band widths. In Layer III, scale factors serve to color the quantization noise to fit the varying frequency contours of the masking threshold. Values for these scale factors are adjusted as part of the noise-allocation process.
• Entropy coding of data values: to get better data compression, Layer III uses variable-length Huffman codes to encode the quantized samples. The Huffman code tables assign shorter words to more frequent values. If the number of bits resulting from the coding operation exceeds the number of bits available to code a given block of data, this can be corrected by adjusting the global gain, resulting in a larger quantization step size and thus smaller quantized values. This operation is repeated with different quantization step sizes until the resulting bit demand for Huffman coding is small enough. This loop is called the rate loop because it modifies the overall coder rate until it is small enough [23] (see the sketch after this list).
• Use of a bit "reservoir": the coder encodes at a different bitrate when needed. If a frame is easy, it is assigned fewer bits and the unused bits are put into a reservoir buffer. When a frame comes along that needs more than the average number of bits, the reservoir is tapped for extra capacity.
• Ancillary data: held in a separate buffer and gated onto the output bitstream using some of the bits allocated for the reservoir buffer when they are not required for audio.
• Noise allocation: the encoder iteratively varies the quantizers in an orderly way, quantizes the spectral values, counts the number of Huffman code bits required to code the audio data, and calculates the resulting noise. If, after quantization, some scale-factor bands still have more than the allowed distortion, the encoder amplifies the values in those scale-factor bands, effectively decreasing the quantizer step size for those bands. Then the process repeats. The process stops if any of the following three conditions is true: none of the scale-factor bands has more than the allowed distortion, the next iteration would cause the amplification of any band to exceed the maximum allowed value, or the next iteration would require all the scale-factor bands to be amplified.
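The rate loop mentioned above can be sketched as follows. The bit-counting function is a hypothetical stand-in for the real Huffman tables, and the step increment of 2^(1/4) is the kind of value an encoder might use, not a figure from the thesis.

```python
# A minimal sketch of the Layer III inner "rate loop": enlarge the global
# quantizer step until the Huffman-coded spectrum fits the bit budget.
import numpy as np

def count_bits(quantized):
    # Hypothetical stand-in for Huffman-table lookup: smaller quantized
    # values cost fewer bits, mimicking the real tables' behavior.
    return int(np.sum(2 * np.log2(np.abs(quantized) + 1) + 1))

def rate_loop(spectrum, bit_budget, step=1.0):
    while True:
        quantized = np.round(spectrum / step)
        if count_bits(quantized) <= bit_budget:
            return quantized, step            # fits: coding can proceed
        step *= 2 ** 0.25                     # larger step -> smaller values

rng = np.random.default_rng(0)
spectrum = rng.normal(0, 50, 576)             # 576 MDCT lines per granule
q, final_step = rate_loop(spectrum, bit_budget=1500)
print("final step:", round(final_step, 2), "bits used:", count_bits(q))
```

In the real encoder this inner loop is nested inside the noise-allocation (outer) loop described in the last bullet above.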
As with Layer II, Layer III processes the audio data in frames of 1152 samples. The arrangement of the bit fields in the bitstream is: header (32 bits), CRC (0 or 16 bits), side information (136 or 256 bits) and main data.
Layer III is intended for applications where a critical need for a low bitrate justifies the expensive and sophisticated encoding system. It allows high-quality results at bitrates as low as 64 Kbps. Typical applications are in telecommunications and professional audio, such as commercially published music and video [20].
MPEG-2 (ISO/IEC 13818): Generic coding of moving pictures and associated audio information
MPEG-2 has 9 parts. Part 1 addresses the combining of one or more elementary streams of video and audio, as well as other data, into single or multiple streams suitable for storage or transmission. This is specified in two forms: the Program Stream (PS) and the Transport Stream (TS), each optimized for a different set of applications. Part 2 builds on the powerful video compression capabilities of the MPEG-1 standard to offer a wide range of coding tools.
Part 3, the Audio part, provides support for encoding multichannel audio in such a way that it is a backwards compatible multichannel extension of MPEG-1 Audio, using the same family of audio codecs: Layers I, II and III. The new audio features of MPEG-2 are:
• a "low sample rate extension" to address very low bitrate applications with limited bandwidth requirements. The new sampling frequencies are 16, 22.05 and 24 kHz, and the bitrates extend down to 8 Kbps;
• a "multichannel extension" to address surround sound applications with up to 5 main audio channels (left, center, right, left surround, right surround);
• optionally, one extra "low frequency enhancement" (LFE) channel;
• a "multilingual extension" to allow the inclusion of up to 7 more audio channels [25].
Parts 4 and 5 correspond to parts 4 and 5 of MPEG-1 [23]. Part 6, DSM-CC (Digital Storage Media Command and Control), provides protocols for session setup across different networks and for remote control of a server containing MPEG-2 content. Part 7, "Advanced Audio Coding" (AAC), provides a new multichannel audio coding that is not backwards compatible with MPEG-1 Audio; it is explained below together with the MPEG-4 standard, which defines an improvement of AAC. Part 8 was intended to support video coding when samples are represented with an accuracy of more than 8 bits, but its development was discontinued when the interest of the industry that had requested it did not materialize. Part 9, "Real Time Interface", provides a standard interface between an MPEG-2 Transport Stream and a decoder.
MPEG-2 provides broadcast-quality audio and video at higher data rates. Parts 1, 2 and 3 are used in digital television set-top boxes and DVD (Digital Versatile Disc). Some MPEG-2 encoders are very costly professional equipment, and some are inexpensive PC boards sold with video editing software. AAC has been adopted by Japan for a national digital television standard and by several manufacturers of secure digital music.
As the MPEG-2 standard defines 16 kHz as the lowest sample rate, a further extension has been introduced, again dividing the low sample rates of MPEG-2 by 2: 8, 11.025, and 12 kHz [23]. This extension is named "MPEG 2.5", but it is not part of the official ISO standard.
MPEG-1, MPEG-2 Audio Frame
An MPEG audio file is built up from smaller independent parts called frames, each one with its own header and audio information. There is no file header, and therefore any part of an MPEG file can be cut out and played correctly [26, 27].
To get information about an MPEG file, it is usually enough to find the first frame, read its header and assume that the other frames are the same. However, this may not always be the case: VBR MPEG files may use "bitrate switching", which means that the bitrate changes according to the content of each frame. Layer III decoders must support this method, and Layer I and II decoders may support it.
The frame header consists of the first four bytes in a frame. Here is a "graphical" presentation of the header content, where the characters A to M indicate the different fields. Table 2.3 shows the details of each field [26].
AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM

Sign  Length (bits)  Position (bits)  Description

A     11             (31-21)          Frame sync (all bits set): used for synchronization.
                                      To avoid false frame sync, 2 or more frames in a row
                                      must be checked.

B     2              (20,19)          MPEG Audio version ID
                                        00 - MPEG Version 2.5
                                        01 - reserved
                                        10 - MPEG Version 2 (ISO/IEC 13818-3)
                                        11 - MPEG Version 1 (ISO/IEC 11172-3)

C     2              (18,17)          Layer description
                                        00 - reserved
                                        01 - Layer III
                                        10 - Layer II
                                        11 - Layer I

D     1              (16)             Protection bit
                                        0 - Protected by CRC (16-bit CRC follows header)
                                        1 - Not protected

E     4              (15-12)          Bitrate index (all values in Kbps):

                                        bits   V1,L1  V1,L2  V1,L3  V2,L1  V2,L2&L3
                                        0000   free   free   free   free   free
                                        0001   32     32     32     32     8
                                        0010   64     48     40     48     16
                                        0011   96     56     48     56     24
                                        0100   128    64     56     64     32
                                        0101   160    80     64     80     40
                                        0110   192    96     80     96     48
                                        0111   224    112    96     112    56
                                        1000   256    128    112    128    64
                                        1001   288    160    128    144    80
                                        1010   320    192    160    160    96
                                        1011   352    224    192    176    112
                                        1100   384    256    224    192    128
                                        1101   416    320    256    224    144
                                        1110   448    384    320    256    160
                                        1111   bad    bad    bad    bad    bad

                                      NOTES: V1 - MPEG Version 1, V2 - MPEG Version 2 and 2.5;
                                      L1 - Layer I, L2 - Layer II, L3 - Layer III.

                                      For Layer II some combinations of bitrate and mode are
                                      not allowed. The allowed combinations are:

                                        bitrate (Kbps)  allowed modes
                                        free            all
                                        32              single channel
                                        48              single channel
                                        56              single channel
                                        64              all
                                        80              single channel
                                        96              all
                                        112             all
                                        128             all
                                        160             all
                                        192             all
                                        224             stereo, intensity stereo, dual channel
                                        256             stereo, intensity stereo, dual channel
                                        320             stereo, intensity stereo, dual channel
                                        384             stereo, intensity stereo, dual channel

F     2              (11,10)          Sampling rate frequency index (values in Hz):

                                        bits   MPEG1    MPEG2    MPEG2.5
                                        00     44100    22050    11025
                                        01     48000    24000    12000
                                        10     32000    16000    8000
                                        11     reserved reserved reserved

G     1              (9)              Padding bit
                                        0 - frame is not padded
                                        1 - frame is padded with one extra slot

H     1              (8)              Private bit. May be freely used for specific needs of an
                                      application, e.g. to trigger application-specific events.

I     2              (7,6)            Channel mode
                                        00 - Stereo
                                        01 - Joint stereo (Stereo)
                                        10 - Dual channel (Stereo)
                                        11 - Single channel (Mono)

J     2              (5,4)            Mode extension (only used in Joint stereo mode). The mode
                                      extension joins information that is of no use for the
                                      stereo effect, thus reducing the needed resources. These
                                      bits are dynamically determined by the encoder in Joint
                                      stereo mode. The complete frequency range of an MPEG file
                                      is divided into 32 subbands. For Layers I and II these two
                                      bits determine the frequency bands where intensity stereo
                                      is applied. For Layer III they determine which type of
                                      joint stereo is used (intensity stereo or MS stereo); the
                                      frequency range is determined within the decompression
                                      algorithm.

                                        value  Layer I & II     Layer III (intensity / MS)
                                        00     bands 4 to 31    off / off
                                        01     bands 8 to 31    on / off
                                        10     bands 12 to 31   off / on
                                        11     bands 16 to 31   on / on

K     1              (3)              Copyright
                                        0 - Audio is not copyrighted
                                        1 - Audio is copyrighted

L     1              (2)              Original
                                        0 - Copy of original media
                                        1 - Original media

M     2              (1,0)            Emphasis
                                        00 - none
                                        01 - 50/15 ms
                                        10 - reserved
                                        11 - CCITT J.17

Table 2.3: MPEG-1 and -2 frame header [26].
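The following is a minimal sketch, in Python, of decoding the header fields of Table 2.3; for brevity it handles only the MPEG-1 Layer III columns, and the example header bytes are illustrative.

```python
# Parsing the 4-byte MPEG audio frame header of Table 2.3
# (MPEG-1 Layer III columns only, for brevity).
import struct

BITRATES_V1_L3 = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                  128, 160, 192, 224, 256, 320, None]     # Kbps (V1, L3)
SAMPLE_RATES_V1 = [44100, 48000, 32000, None]             # Hz (MPEG-1)

def parse_header(four_bytes):
    h = struct.unpack(">I", four_bytes)[0]                # 32 bits, big-endian
    if (h >> 21) & 0x7FF != 0x7FF:                        # A: frame sync
        raise ValueError("no frame sync")
    version = (h >> 19) & 0b11                            # B
    layer = (h >> 17) & 0b11                              # C
    has_crc = ((h >> 16) & 1) == 0                        # D: 0 = CRC present
    bitrate = BITRATES_V1_L3[(h >> 12) & 0xF]             # E
    fs = SAMPLE_RATES_V1[(h >> 10) & 0b11]                # F
    padding = (h >> 9) & 1                                # G
    mode = (h >> 6) & 0b11                                # I
    assert version == 0b11 and layer == 0b01, "only MPEG-1 Layer III here"
    # Layer III frame: 1152 samples -> 144 * bitrate / fs bytes (+ padding)
    frame_bytes = 144 * bitrate * 1000 // fs + padding
    return dict(bitrate_kbps=bitrate, sample_rate=fs, crc=has_crc,
                channel_mode=mode, frame_bytes=frame_bytes)

# 0xFF 0xFB 0x90 0x00: sync, MPEG-1, Layer III, no CRC, 128 Kbps, 44.1 kHz
print(parse_header(bytes([0xFF, 0xFB, 0x90, 0x00])))
```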
Frames may have a CRC check after the header (depending on the protection bit), and other fields in the formats shown for each layer [see Figures 2.3 and 2.4]. Then comes the audio data, and after the data follows tag information, used to describe the MPEG audio file. The structure of the MPEG Audio Tag ID3v1 is:
AAABBBBB BBBBBBBB BBBBBBBB BBBBBBBB
BCCCCCCC CCCCCCCC CCCCCCCC CCCCCCCD
DDDDDDDD DDDDDDDD DDDDDDDD DDDDDEEE
EFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFG
Sign  Length (bytes)  Position (bytes)  Description
A     3               (0-2)             Tag identification. Must contain 'TAG' if the tag
                                        exists and is correct.
B     30              (3-32)            Title
C     30              (33-62)           Artist
D     30              (63-92)           Album
E     4               (93-96)           Year
F     30              (97-126)          Comment
G     1               (127)             Genre (1=Classic Rock, 2=Country, 3=Dance, ...)

Table 2.4: MPEG Audio Tag ID3v1 [26].
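A minimal reader for this tag, assuming Python, looks as follows; the field offsets follow Table 2.4, and the Latin-1 text decoding is an assumption, since ID3v1 does not specify an encoding.

```python
# Reading an ID3v1 tag: the last 128 bytes of the file, laid out
# as in Table 2.4.
def read_id3v1(path):
    with open(path, "rb") as f:
        f.seek(-128, 2)                       # tag sits at the very end
        tag = f.read(128)
    if tag[:3] != b"TAG":                     # field A: identification
        return None                           # no ID3v1 tag present
    def text(raw):
        return raw.split(b"\x00")[0].decode("latin-1").strip()
    return {
        "title":   text(tag[3:33]),           # B
        "artist":  text(tag[33:63]),          # C
        "album":   text(tag[63:93]),          # D
        "year":    text(tag[93:97]),          # E
        "comment": text(tag[97:127]),         # F
        "genre":   tag[127],                  # G: numeric genre code
    }
```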
The ID3v1 tag has some limitations and drawbacks. It has a fixed size of 128 bytes and supports only a few fields of information, which are limited to 30 characters, making it impossible to correctly describe many titles and authors. Furthermore, since the ID3v1 tag is located at the end of the audio file, it will also be the last thing to arrive when the file is being transmitted.
ID3v2 is a newer tagging system designed to be expandable and more flexible. It improves on ID3v1 by enlarging the tag to up to 256 MBytes and dividing the tag into smaller pieces, called frames, which can contain any kind of information and data, such as title, lyrics, images, web links etc. It still keeps files small by being byte-conservative and by supporting data compression. What is more, the tag resides at the beginning of the audio file, making it suitable for streaming.
MPEG Audio Stereo Redundancy Coding
MPEG Audio works with both mono and stereo signals. It supports two types of stereo redundancy coding: intensity stereo coding and MS (middle/side) stereo coding. All layers support intensity stereo coding; Layer III also supports MS stereo coding.
Both forms of redundancy coding exploit psychoacoustics: above 2 kHz and within each critical band, the human auditory system bases its perception of stereo more on the temporal envelope of the audio signal than on its temporal fine structure.
In intensity stereo mode the encoder replaces the left and right signals by a single representative signal plus directional information [38]. The MS stereo mode encodes the left and right channel signals in certain frequency ranges as middle (sum of left and right, L+R) and side (difference of left and right, L-R) channels.
A technique called joint stereo coding is used in Layer III to achieve a more efficient combined coding of the left and right channels of a stereophonic audio signal. It takes advantage of the redundancy in stereo material: the encoder switches dynamically from discrete L/R to a matrixed L+R/L-R mode, depending on the material.
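The matrixing itself is trivial and exactly invertible, as the following sketch shows; the 1/2 normalization of the sum and difference is one common convention, chosen here so that the decoder recovers L and R exactly.

```python
# MS (middle/side) stereo matrixing: the encoder sends mid and side
# instead of left and right; the decoder inverts the matrix exactly.
import numpy as np

left = np.array([0.50, 0.52, 0.49, 0.51])
right = np.array([0.48, 0.51, 0.47, 0.50])   # highly correlated with left

mid = (left + right) / 2                     # L+R: carries most of the energy
side = (left - right) / 2                    # L-R: small, cheap to code

l_rec, r_rec = mid + side, mid - side
assert np.allclose(l_rec, left) and np.allclose(r_rec, right)
print("side RMS / mid RMS:",
      float(np.sqrt(np.mean(side**2) / np.mean(mid**2))))   # << 1
```

For correlated stereo material the side channel has far less energy than the mid channel, which is exactly the redundancy the paragraph above describes.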
MPEG-4 (ISO/IEC 14496): Coding of audio-visual objects
The first 6 parts of the standard correspond to those of MPEG-2, and it is backwards compatible with MPEG-1 and MPEG-2. There are, however, a number of significant differences in content. MPEG-4 enables the coding of individual objects, which means that the video information does not need to be of rectangular shape as in MPEG-1 and MPEG-2 Video. The same applies to audio: MPEG-4 provides all the tools to encode speech and audio at rates from 2 to 64 Kbps and with different functionality, including MPEG-4 AAC, an extension of MPEG-2 AAC [34].
Part 5 is a complete software implementation of both encoders and decoders. Compared with the reference software of MPEG-1 and MPEG-2, whose value is purely informative, the MPEG-4 Reference Software has the same normative value as the textual parts of the standard. The software may also be used for commercial products, and the copyright of the software is licensed at no cost by ISO/IEC for products conforming to the standard.
So far the industry has enthusiastically adopted MPEG-4 Video, which has been selected by several industries when setting standards for next-generation mobile communication and is being used to develop solutions for video on demand and related applications. Some people call it the future "global multimedia language" [29].
MPEG-4 Audio provides several "profiles" to allow the optimal use of MPEG-4 in different applications. At the same time, the number of profiles is kept as low as possible in order to maintain maximum interoperability. Some of the profiles that MPEG-4 offers are Speech Audio, Synthesis Audio, Main Audio, High Quality Audio, Low Delay Audio, Natural Audio, Mobile Audio Internetworking, etc. [36].
MPEG-4 Audio has two work items underway for improving audio coding efficiency [35]: bandwidth extension, for both general audio signals and speech signals, and parametric coding, to extend the capabilities currently provided by HILN (Harmonic and Individual Lines plus Noise) [35].
MPEG-2/4 AAC (Advanced Audio Coding)
The AAC system is the highest performance coding method within MPEG [37]. AAC works with a wide range of sampling rates, from 8 to 96 kHz, bitrates from 16 to 576 Kbps (achieving indistinguishable audio quality at 96 Kbps per channel), and from 1 to 48 audio channels. Because it uses a modular approach, users may pick and choose among the component tools to make a product with an appropriate performance/complexity ratio [37]. Due to its high coding efficiency, AAC is a prime candidate for any digital broadcasting system and for the delivery of high-quality music via the Internet [29].
MPEG-2/4 AAC has no backward compatibility with MPEG-1 and -2, but as it is built on a structure similar to Layer III, it retains some of MP3's powerful features: redundancy reduction using Huffman encoding, the bit "reservoir", non-uniform quantization, ancillary data and the joint stereo mode [37]. However, it improves on MP3 in many details and uses new coding tools [see Figure 2.7]. The crucial differences are [29]:
• Filter bank: AAC uses a plain MDCT (Modified Discrete Cosine Transform) [29] and an increased window length (2048 samples instead of 1152).
• TNS (Temporal Noise Shaping): shapes the distribution of quantization noise in time by prediction in the frequency domain, and transmits the prediction residual of the spectral coefficients instead of the coefficients themselves [29].
• Prediction: benefits from the fact that certain types of audio signals (stationary or semi-stationary) are easy to predict [29, 30]; instead of repeating such information for sequential windows, a simple repeat instruction can be passed.
• Quantization: finer control of the resolution using an iteration method.
AAC is a block-oriented, VBR coding algorithm, but rate control can be used in the encoder such that the output bitrate is averaged to a predetermined rate (as for CBR). Each block of AAC compressed bits is called a "raw data block" and can be decoded "stand-alone" (without knowledge of information in prior bitstream blocks), which facilitates encoder and decoder synchronization; if any packet is lost, this doesn't affect the decodability of adjacent packets [32]. The syntax of an AAC bitstream is as follows:

<AAC_bitstream>  => <raw_data_block> <AAC_bitstream>
<raw_data_block> => [<element>] <END> <PAD>

Here [] indicates one or more occurrences. <END> indicates the end of a raw_data_block, and <PAD> forces the total length of a raw_data_block to be an integral number of bytes. The <element> is a string of bits of varying length, indicating whether the represented data is from a single audio channel, stereo, multi-channel, user data, etc. [32].

Figure 2.6: MPEG-2 AAC audio coding [1].
The standard defines two example formats for the transport of AAC audio data [33]:
• ADIF (Audio Data Interchange Format) puts all data controlling the decoder (sampling frequency, mode, etc.) into a header preceding the audio stream. This is useful for file exchange, but not for streaming.
• ADTS (Audio Data Transport Stream) packs AAC data into frames with headers (like the MPEG-1 and -2 format), which is more suitable for streaming.
MPEG-2/4 AAC-LD
The MPEG-4 AAC-LD (Low Delay Audio Coder) is designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. The codec is closely derived from MPEG-2 AAC [41], but the contributors to the delay (such as frame length and window shape) have been addressed and modified [37].
MPEG-4 General Audio Coder
The MPEG-4 General Audio coder is derived from MPEG-2 AAC, and is backward compatible with it, but adds several enhancements:
• PNS (Perceptual Noise Substitution): PNS is based on the observation that one noise signal sounds much like another, so the actual fine structure of a noise signal is of minor importance for its subjective perception. Consequently, instead of transmitting the actual spectral components of a noise-like signal, the bitstream just signals that the frequency region is noise-like and gives some additional information on the total power in that band [38] (see the sketch after this list).
• LTP (Long Term Prediction): LTP is an efficient tool for reducing the redundancy of a signal between successive coding frames. It is especially effective for the parts of a signal which have a clear pitch property [38].
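The PNS idea can be sketched in a few lines, as referenced above. Assuming numpy, the encoder sends only the band energy and the decoder synthesizes noise of matching power; the band size and signal are illustrative.

```python
# Perceptual Noise Substitution in miniature: transmit one energy value
# per noise-like band instead of all its spectral coefficients.
import numpy as np
rng = np.random.default_rng(0)

band = rng.normal(0.0, 0.3, 64)        # noise-like MDCT coefficients
energy = float(np.sum(band ** 2))      # the only value transmitted

noise = rng.normal(0.0, 1.0, 64)       # decoder-side synthetic noise
noise *= np.sqrt(energy / np.sum(noise ** 2))   # match the band's power

print(f"original band energy:    {energy:.3f}")
print(f"synthesized band energy: {float(np.sum(noise ** 2)):.3f}")
```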
The structure of the coder is represented in Figure 2.7. The same building blocks are present in the decoder implementation, performing the inverse processing steps.
To increase coding efficiency for musical signals at very low bitrates (below 16 Kbps per channel) [38, 41], TwinVQ-based coding tools are part of MPEG-4 General Audio [40]. The basic idea is to replace the conventional encoding of scale factors and spectral data used in MPEG-4 AAC by an interleaved vector quantization applied to a normalized spectrum [38]. The input signal vector (spectral coefficients) is interleaved into subvectors that are then quantized using vector quantizers [38]. The rest of the processing chain remains identical, as can be seen in Figure 2.7.
Figure 2.7: Building blocks of the MPEG-4 General Audio Coder [38].

Figure 2.8: Weighted interleaved vector quantization [38].
2.2.2 Windows Media Audio
Microsoft Windows Media Audio (WMA) delivers audio for streaming and download. According to Microsoft, WMA offers [47]:
• "Near-CD quality" at 48 Kbps, and "CD quality" at 64 Kbps.
• High scalability, with bandwidths from 5 Kbps to 192 Kbps and sampling rates from 8 kHz to 48 kHz for high-quality stereo music [47]. This allows users to choose the best combination of bandwidth and sampling rate for their content.
Microsoft claims that the WMA codec is very resistant to degradation due to packet loss, which makes it excellent for use with streaming content. In addition, by using an improved encoding algorithm, the codec encodes and decodes much faster than others.
Windows Media audio files can also support ID3 metadata: if a source .mp3 file is encoded using the Windows Media Audio codec, any ID3 properties are included in the Windows Media audio file [47].
As it is a proprietary format, no information can be obtained about the coder and the compression technique used; all the information presented here was obtained from Microsoft.
2.2.3 RealAudio
RealAudio is a proprietary encoding format created by Real Networks. It was the first compression format to support live audio over the Internet and thus gained considerable support, but it requires proprietary server software to provide the real-time playback facility. It was first designed for voice applications on the web, but later it was developed into music and video algorithms as well [48].
RealAudio uses a "lossy" compression scheme that provides high audio quality from source material with high sampling rates (11 kHz, 22 kHz and 44 kHz). The RealAudio Encoder compression scheme works by making educated guesses about what is most important in the sound file [48]:
• It knows how much room there is in the destination stream and fills that available bandwidth with as much sound information as it can.
• Any sound information that doesn't fit is lost.
• The user can help the encoder with its task by emphasizing the most important parts of the recording.
It offers bitrates from 16 Kbps upwards, but the recommended bitrate is 96 Kbps.
As it is a proprietary format, no information can be obtained about the coder and the compression technique used.
3
Compression Techniques Comparison
With so many compression techniques and coders on
the market, it is useful to have some comparison
between them to help the user decide which format
to choose for each application. In this chapter the
important issues to look at when measuring audio
quality are introduced. Different coders and
compression techniques are discussed and compared.
The chapter ends with conclusions and reflections
about the future of compression.
3.1
Coders Comparison
Measuring the audio quality of different encoders has developed into an art of its own over the last ten years. Basically, there are three methods to measure audio quality [23]:
• Listening tests: evaluate the performance of coders under worst-case conditions.
• Simple objective measurement techniques: measure encoder quality by looking at parameters such as the signal-to-noise ratio or the bandwidth of the decoded signal. If the coder is perceptual, these measurements are not useful.
• Perceptual measurement techniques: a very useful supplement to listening tests which, in some cases, can replace them. The ITU-R Task Group 10/4 has produced a Recommendation for a quality measurement system called PEAQ (Perceptual Evaluation of Audio Quality).
Another important parameter when comparing different coders is the coding/decoding time, which should be as low as possible.
The pure compliance of an encoder with, e.g., an MPEG audio standard does not guarantee any particular quality of the compressed music. Audio quality differs depending on parameters such as the bitrate of the compressed audio and the sophistication of the encoder, even when encoders work with the same set of basic parameters [23]. In recent years many coders have appeared on the market, trying to improve on the original compression formats. Here we present some comparisons between coders.
TEST_1: Comparing MP3 encoders
HOW THE TEST WAS PERFORMED
The coders under test were: Fraunhofer, Lame, BladeEnc and Gogo. Fraunhofer and
BladeEnc used stereo encoding, while Lame and Gogo used joint stereo encoding. First the
output spectrum of the encoders was compared to the spectrum of the original signal when encoding music and noise. Then the time needed to encode an 8-minute song at different fixed bitrates (160 Kbps and 128 Kbps) was measured for each encoder. Lame and Gogo were also tested with the VBR option.
RESULTS
The results for the encoding time are shown in Figures 3.1-3.3.
Figure 3.1: Encoding time at 160 Kbps required to encode an 8:05 minute stereo song (using a Pentium III 600 MHz) [57].
Figure 3.2: Encoding time at 128 Kbps required to encode an 8:05 minute stereo song (using a Pentium III 600 MHz) [57].
Figure 3.3: Encoding time at VBR required to encode an 8:05 minute stereo song (using a Pentium III 600 MHz) [57].
These figures show that Gogo is the encoder with the lowest encoding time at every bitrate. The results of the quality tests show that Lame is probably the best encoder [see 57 for figures], followed by Gogo, which also gives good quality but offers fewer encoding options than Lame.
VBR encoding offers better quality than CBR, which was easy to predict, since it adapts the bitrate to the complexity of the audio signal being encoded.
CONCLUSIONS
If encoding time is not the priority, Lame with VBR should be chosen, since it offers slightly better quality than Gogo. On the other hand, if encoding time is the priority, Gogo with CBR is the best option [57].
TEST_2: Comparing MPEG-2 AAC encoders
HOW THE TEST WAS PERFORMED
The methodology used for these tests was based on the ITU-R Recommendation BS.1116,
which specifies that for the greatest listener sensitivity to artifacts each listener should be
tested on his/her own and should be free to switch at any time between the stimuli under
assessment.
The encoders tested were: AAC Main Profile and AAC Low Complexity Profile at 96 Kbps
and 128 Kbps, AAC SSR at 128 Kbps, Layer II (MP2) at 192 Kbps and Layer III (MP3) at
128 Kbps. Ten audio clips of different durations (always less than a minute) and different “encoding difficulty” (speech, castanets, accordion, Dire Straits, Tracy Chapman...) were used for the tests.
RESULTS [58]
• AAC Main 128: Better than MP2 for 3 items, worse for no items, equivalent for 7 items. Better than MP3 for 3 items, worse for no items, equivalent for 7 items.
• AAC Main 96: Better than MP2 for 1 item, worse for 1 item, equivalent for 8 items. Better than MP3 for 1 item, worse for no items, equivalent for 9 items.
• AAC LC 128: Better than MP2 for 3 items, worse for no items, equivalent for 7 items. Better than MP3 for 3 items, worse for no items, equivalent for 7 items.
• AAC LC 96: Better than MP2 for no items, worse for no items, equivalent for 10 items. Better than MP3 for 1 item, worse for no items, equivalent for 9 items.
• AAC SSR 128: Better than MP2 for 1 item, worse for no items, equivalent for 9 items. Better than MP3 for 2 items, worse for no items, equivalent for 9 items.
CONCLUSIONS [58]
Of the AAC codecs, only Main 96 is outperformed by an MP2 or MP3 codec on any of these examples. AAC Main 128, AAC LC 128, and AAC SSR 128 give significantly better performance than MP2 192 or MP3 128. In addition, AAC Main 96 gives better results than MP3 128. There is no statistically significant difference between AAC LC 96 and the MPEG-1 codecs.
Within the AAC codec group, AAC Main 128, AAC LC 128, and AAC SSR 128 are all
superior to AAC LC 96. In addition, AAC Main 128 and AAC LC 128 are superior to
AAC Main 96.
The final ranking would then be: AAC Main 128, AAC LC 128, AAC SSR 128, AAC Main 96, AAC LC 96, MP2 192, MP3 128, although there are no statistically significant differences between consecutive pairs in this ordering.
TEST_3: Comparing MPEG-4 encoders
HOW THE TEST WAS PERFORMED
The coders were divided into four groups by coding scheme and bitrate:
• Group A tests the codecs at 6 and 8 Kbps mono and contains HILN, Twin-VQ and
MPEG Layer III. The reference is MPEG Layer III (MP3).
• Group B tests the codecs at 16 Kbps mono and contains HILN, AAC, and G.722 at
48 Kbps as a reference.
Group C and D belong to the same coding system, but are separated because the lowest
layer is a mono layer while the higher layers are stereo layers:
• Group C tests the mono core layer of the AAC large step scalable coder against an unscaled AAC coder and MP3. The reference coder is MP3.
• Group D tests the upper layers of the scalable coders against unscaled coders and contains AAC, the AAC large step scalable coder, the AAC-BSAC fine granule scalable coder and MP3. The reference coder is MP3. The AAC-BSAC coder has no counterpart in the C test since it is based on an unscaled stereo AAC coder and therefore does not provide mono/stereo scalability.
The codecs tested were thus: HILN at 6 Kbps (mono, 8 kHz) and 16 Kbps (mono, 16 kHz); Twin-VQ (16 kHz); AAC (24 kHz for all bitrates; 24 Kbps (mono), 40 Kbps (stereo) and 56 Kbps (stereo)); AAC scalable (24 kHz for all bitrates; 24 Kbps (mono), 40 Kbps (stereo), 56 Kbps (stereo)); AAC scalable BSAC (24 kHz for all bitrates; 40 Kbps, 56 Kbps); G.722 at 48 Kbps; and MP3 for the A test at 8 kHz for 8 Kbps coding (MPEG 2.5).
RESULTS [56]
• Test A: TwinVQ performed best overall; TwinVQ and Layer III performed equally well on average, but TwinVQ needs 25% less bitrate.
• Test B: AAC performed best overall.
• Test C: AAC 24 main performed slightly better than AAC 24 scalable overall. Both AAC coders performed much better overall, and on most items, than MP3.
• Test D: AAC 56 performed better than AAC scal 56 overall, but performance was similar on almost all items. AAC 40 performed better than AAC scal 40 overall, but performance was also similar on almost all items.
Neither pair demonstrated a statistically significant difference, although a trend shows AAC 56 performing slightly better overall. AAC 40 did, however, perform much better than AAC BSAC 40 overall, and that performance difference is significant. Since BSAC did not use a mono/stereo scalable mode, but a small step scalable mode based on an unscaled stereo AAC coder, its results cannot be compared directly to those of the AAC scalable coder.
CONCLUSIONS [56]
The following conclusions can be drawn from the test results [56]:
• Test A: Twin VQ at 6 Kbps shows statistically the same quality as Layer III at 8 Kbps. Twin VQ is therefore a valuable MPEG-4 tool for improved coding efficiency at the lowest bitrates.
• Test B: AAC at 16 Kbps performed 0.6 grades worse than G.722, but operated at one third of the bitrate. It can therefore be concluded that AAC is a valuable MPEG-4 tool for coding music signals at bitrates as low as 16 Kbps.
• Tests C and D: At all three bitrates, AAC audio coding shows significantly better audio quality than MPEG Layer III.
The Large Step Scalable System (AAC Scalable) shows almost the same quality as unscaled AAC at the lower (mono) layer and worse quality at the higher (stereo) layers. Still, all layers perform slightly better (highest layer) or significantly better (lower and mid layers) than MPEG Layer III. The scalable system therefore shows good performance compared to older standards while providing the additional functionality of mono/stereo scalable coding.
The BSAC (Small Step Scalable System) performed very well at the highest bitrate of 56 Kbps. At the lower bitrate of 40 Kbps, however, BSAC performed worse than expected. Although mainly designed for bitrates of 40-64 Kbps mono at a 48 kHz sampling rate, the BSAC tool was still expected to show reasonably good performance when going from 56 Kbps stereo to 40 Kbps stereo at a 24 kHz sampling rate. The conclusion is therefore that the integration of BSAC in the MPEG-4 audio framework needs further investigation, to check whether the integration is incomplete or needs changes.
3.2 Formats Comparison
There are many features to consider when choosing the audio format that fits a specific application: bitrate, sampling frequency, compression ratio, whether the format is free or proprietary, etc. Depending on the kind of music compressed, the quality of the audio may vary, e.g. one format may be best for classical music and another for speech, so the music used in a test must be specified.
There are also differences in quality depending on the bandwidth, e.g. audio transmitted over a 14.4 Kbps analogue modem is rated "very annoying", independently of the audio coding scheme and test material, even under optimum network conditions. The audio quality improves significantly if a 28.8 Kbps modem is used, with quality levels of "slightly annoying" and "annoying". The quality of systems with an ISDN line, and with DSL and cable network connections, is much better [49].
The way you play the audio back also affects its quality. A compressed file will sound
considerably different when played through a high-end sound card and headphones than it
will through a portable MP3 player with a pair of cheap earpieces.
The following tests are subjective (or perceptual) quality tests, which means that users listen to music in different formats and at different bitrates and subjectively judge the quality.
TEST_1: WMA vs. RealAudio vs. MP3 vs. AAC
HOW THE TEST_A WAS PERFORMED
Five 30 sec. clips of music were selected: an acoustic version of Daughter (Pearl Jam),
Radioactivity (Kraftwerk), a cellos version of Wherever I May Roam (Apocalyptica), a live
version of Time (Pink Floyd), and O Fortuna from Carl Orff's Carmina Burana. These
presented a range of challenges to the compression programs, as they ranged from subtle
acoustic sounds to full-on orchestral splendor [50].
Each clip was compressed to a variety of bitrates in the following formats: WMA, RealAudio, MP3 and AAC. 30 people were asked, in a blind test, if they could tell the difference between the compressed and uncompressed audio versions [50]. The testers listened to the sound clips through Sony MDR-7506 headphones.
RESULTS
Codec/Bitrate    AAC     RealAudio¹    MP3     WMA
64 Kbps          12%     36%           5%      9%
96 Kbps          52%     N/A           N/A     11%
128 Kbps         70%     69%           69%     14%
192 Kbps         N/A     81%           70%     N/A
256 Kbps         N/A     79%           73%     N/A

Table 3.1: Test_A results [50]. This table shows the percentage of testers who were unable to tell the compressed version from the uncompressed one.

¹ The RealAudio codec supports slightly different bitrates to the others: 64, 96, 132, 176, and 264 Kbps.
It is not surprising that we found that the higher the bitrate, the higher the percentage of
testers who couldn't distinguish between the files [50].
The live Pink Floyd and acoustic Pearl Jam tracks were particularly easy to distinguish
because both pieces contained a variety of subtle sounds (such as audience noise in the
background of the Pink Floyd live track) that were lost or garbled in the compressed
version.
The codec that came off worst in the tests was MP3. The other interesting note is that the
RealAudio codec did the best at the lowest bitrate [50].
HOW THE TEST_B WAS PERFORMED
Four samples of music were compressed and then 30 people were asked to rate the compressed version against the uncompressed one. The scores are averages of the judges' ratings. The clips were rated on a scale of 1 to 5 according to ITU-R Recommendation BS.562.3. This method uses a five-grade scale for scoring:
BS.562.3 quality scale
5   Excellent
4   Good
3   Fair
2   Poor
1   Bad

Table 3.2: Scoring of subjective sound quality [56].
RESULTS
Codec/Bitrate    AAC     MP3     RealAudio¹    WMA
64 Kbps          3.4     2.2     4.1           3.6
96 Kbps          4.6     N/A     N/A           4.1
128 Kbps         4.8     4.7     4.8           4.1
192 Kbps         N/A     4.8     4.9           N/A
256 Kbps         N/A     4.9     4.8           N/A

Table 3.3: Test_B results [50].
When the testers rated what they guessed were compressed tracks, all of the formats scored
above four at a bitrate of 128 Kbps or higher. MP3 came out on top with a score of 4.9 at
the highest bitrate of 256 Kbps. However, RealAudio encoded at 64 Kbps also scored an
average of 4.1, significantly better than the other formats at that low bitrate [50].
Interestingly, the testers did not rate WMA files at 128 Kbps any higher than WMA files at 96 Kbps, and at 128 Kbps that format was rated significantly lower than the other formats, even MP3 [50, 51].
37
3 Compression Techniques Comparison
Although MP3 is the most widespread format, it performed the worst at 64 Kbps, achieving
an average score of only 2.2 [50].
TEST_2: MP3 vs. RA G2 vs. WMA v.8 vs. OGG vs. MP3pro vs. MPC
HOW THE TEST WAS PERFORMED
The test used a classical music sample, which forms a very complex waveform in terms of frequency content, stereo image and dynamic range [52]. The formats were graded from 1 to 5 according to Table 3.4.
RESULTS
The results of the test are shown in Table 3.4.
Quality            Format
5 ("CD quality")   MP3 at 85 Kbps (FhG VBR); RA at 96 Kbps; WMA at 96 Kbps; MP3 at 113 Kbps (lame VBR); OGG at 118 Kbps
4                  MP3 at 56 Kbps and 80 Kbps (MP3 PRO); WMA at 64 Kbps; RA at 64 Kbps; MP3 at 128 Kbps
3 (FM radio)       WMA at 48 Kbps; MPC at 85 Kbps (VBR)
2                  WMA at 32 Kbps; RA at 45 Kbps; MP3 at 48 Kbps (MP3 PRO)
1 (AM radio)       WMA at 20 Kbps; RA at 32 Kbps; MP3 at 40 Kbps

Table 3.4: Codec performance on a typical multimedia computer [52, 53, 54].
More of these interesting audio tests can be found on the Internet [50-55]; some of them even offer samples of the audio so that you can assess the quality yourself.
3.3 Final Conclusions
Some discrepancies can be found in these tests, and in other tests available on the Internet, but I will try to give some general conclusions based on the previous results.
Judging from the tests, most people will find music compressed at higher bitrates, especially if VBR algorithms are used, indistinguishable from the original versions. It is also clear that no algorithm is perfect for everything; users must therefore choose the best one for their particular application [see Table 3.5].
Appliance                           Main requirements                   Format of choice
Share through CDs, large storage    Quality, Popular                    MP3
Share through Internet              Quality, Compression                MP3, MP3PRO, AAC, WMA
Portable players, small storage     Compression                         MP3PRO, AAC
Publishing audio on the Internet    Quality, Compression, Popular       MP3
Broadcasting on the Internet        Popular, Compression, Streamable    RA-G2, WMA

Table 3.5: Best choices of algorithm per appliance [52].
The goal was to find the best format for broadcasting audio on the Internet, so the choice should be a format that is popular (so that users will already have the player installed), highly compressed (users may have just a modem connection) and streamable (to allow real-time transmission). WMA and RealAudio fulfil all these requirements, with the drawback of being proprietary and not free. MP3 offers less compression, but has the advantage of being an open and very popular format.
Finally, I have used the term "CD quality" for some compressed audio bitrates because it is defined that way on many web pages. Strictly, however, the term "CD quality" should not be applied to any lossy algorithm (such as the algorithms used on the Internet); if used, it should mean that the audio has a quality that is indistinguishable, for humans, from CD quality. That said, I believe that for the majority of music and listening conditions, MP3 properly implemented at 128 Kbps, and WMA, RealAudio and AAC at 96 Kbps, will achieve this "CD quality". For classical music, however, the bitrate may need to be increased to 128 Kbps, even for AAC, WMA and RealAudio.
4 Internet Audio Transmission
The chapter begins with an introduction to the Internet and how it works. Then an overview of the basic streaming techniques is given, and some advanced techniques to improve streaming are presented. Broadcasting is then defined. Some things need to be changed in the conventional model of the Internet when streaming audio: additional protocols and a new concept of servers must be defined. The main protocols used for multimedia transmission are explained, and web servers and streaming servers are introduced and compared. The chapter ends with a discussion of the need for standardization in streaming media and some current approaches.
4.1 Introduction
The Internet is a network of networks that spans the globe. These networks are
interconnected so that it is always possible to route data across the networks from one point
to another by some route [63].
The information which needs to be sent across the networks is a large piece of binary data, e.g. text, images or video. This piece of data is divided into many smaller pieces called packets (or datagrams). Each packet is given a header containing its destination and source addresses and some other important information [63], and is sent independently (packets belonging to the same message may take different routes depending on network congestion, router failures...) across the network to the receiving machine, where the packets are reassembled to form the original piece of data.
Connecting computers together can be difficult, since the computers may be produced by different companies, have different data representations, different voltage levels for encoding 1 and 0, etc. To provide connectivity between computers, the International Organization for Standardization (ISO) defined an abstract model for computer communication, the Open Systems Interconnection (OSI) Reference Model, which divides communication into 7 layers: application, presentation, session, transport, network, data link and physical. This model provides a framework for the development of Open Systems protocol standards.
The corresponding (peer) layers in the communicating computers interact by means of a set of rules that forms a protocol and dictates data format, control information and timing. Under the
ISO model, messages are delivered by the application at the source node to the highest
layer. The message then travels down the different layers. Each layer in the source adds a
header containing information for its peer in the destination. At the destination node, the
message is received by the lowest layer. The message travels up the layers, with each layer
processing and stripping off the header added by its peer at the source [84].
Traditional communications on the Internet use a suite of protocols called TCP/IP, the Transmission Control Protocol and the Internet Protocol. However, these protocols are not suitable for real-time multimedia applications, so new protocols must be used, such as RTSP (Real Time Streaming Protocol) and RTP (Real Time Protocol).
4.2 Streaming Audio
4.2.1 Overview
Audio distribution over the Internet is usually based on a client/server model. In this model, programs called servers wait for requests from other programs called clients, which are generally running on a different computer elsewhere in the network. Whenever a server receives a request, it sends a response, which provides some service or data to the client [65].
The server machine makes its services available using numbered ports, one for each service. Clients connect to the server’s IP address on a specific port number to get the audio available on the server. There are three ways of getting the audio to the user [68]:
• Download: The entire file is downloaded over the net, saved on the user’s machine, and then played from the hard disk. This method has some advantages: it allows any bitrate, the audio has much better quality, and the user can listen to the saved file whenever he wants [118].
The main drawback is the extremely long download time when high quality audio is provided over a low bandwidth network, because the complete file has to be downloaded before listening can begin. With this method there is no possibility of providing a live service [131].
• Progressive Download (also known as Progressive Streaming): the file is saved locally, as if it were downloaded, but playback begins as soon as part of the file has been downloaded, before the download finishes. The user can play the part of the file that has been downloaded at a given time, but can't jump ahead to portions that haven't been transferred yet. Progressive download is useful for listening to short files at high quality, but it is not a good solution for material that the user may want to access randomly, or for live broadcasts; it is strictly an on-demand technology. It is the method used by common web servers.
• Streaming (also known as Real Time Streaming): the file is played directly from the network as it arrives at the machine, which allows the user to hear the audio without lengthy download times. The file is never saved on the local hard disk; the server streams the data to the user as soon as it gets it. Streaming is the only delivery method for live material, and it is also well suited for random access material.
Figure 4.1: Streaming file formats [69].
Streaming requires the use of special servers, called streaming servers, such as QuickTime Streaming Server, RealServer or Windows Media Server, and also uses special network protocols, such as RTSP or MMS (Microsoft Media Server). How both web and streaming servers work will be explained later in this chapter.
To date, streaming audio is much more pleasant to experience than streaming video. A
future technology is streaming delivered by email, e.g. Real email (from Real Networks),
which intends to deliver streaming audio and video directly into the body of an email
without requiring the recipient to launch a web browser, download or install any executable
files, or have any preinstalled media players or plug-ins [70].
4.2.2 Streaming Improvement
Buffering
When a file is streamed, buffering in network elements may cause variations in the packets' arrival rate (jitter) because of the multiplexing of packets from many sources. Additionally, network buffers may overflow, causing packets to be lost with a corresponding change in timing relationships. Packets to the same destination may also traverse different paths and arrive out of order. Timing relationships for audio streams must be restored at the receiver to ensure coherent reception. This consideration is much more important than compensating for the loss of a packet, which is often ignored and appears to the listener as an imperceptible silence [1].
A technique called buffering is used to solve these problems. On the receiving side, the player software stores some of the stream, e.g. 10 seconds (the playout delay), in a buffer before playing it. Thus, the player always has 10 seconds of material to play if there is a network problem, and the listener will never notice it. Some sort of time stamping of the data is necessary to enable the receiver to play it back at the appropriate time [1].
The size of this buffer depends on the application: by using a larger buffer and delaying the playout of the data until the buffer is nearly full, jitter variations can be smoothed out. The downside of using a large buffer is that it introduces delay into the audio stream; this delay is not a problem for one-way audio broadcasts (such as radio), but it is for interactive audio conferences [77]. There are also algorithms that vary the buffer length when needed.
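To make the buffering idea concrete, here is a minimal Python sketch of a playout buffer, assuming integer media timestamps and a fixed playout delay; the class and method names are illustrative, not from any particular player:

import heapq
import time

class JitterBuffer:
    def __init__(self, delay=10.0):
        self.delay = delay        # playout delay in seconds (e.g. 10 s as above)
        self.heap = []            # min-heap of (media timestamp, playout time, payload)
        self.last_played = -1     # highest media timestamp already handed to the player

    def insert(self, media_ts, payload):
        # A packet older than anything already played arrives too late:
        # count it as lost, so loss recovery or concealment can kick in.
        if media_ts <= self.last_played:
            return False
        heapq.heappush(self.heap, (media_ts, time.time() + self.delay, payload))
        return True

    def pop_ready(self):
        # Hand over, in timestamp order, every packet whose playout time has come.
        ready = []
        now = time.time()
        while self.heap and self.heap[0][1] <= now:
            media_ts, _, payload = heapq.heappop(self.heap)
            self.last_played = media_ts
            ready.append(payload)
        return ready

Packets rejected by insert() are exactly the late packets discussed next.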
Late packets that cannot be handled by the buffer must be considered lost, and some form of packet-loss recovery process may be initiated. Packet-loss recovery processes may be classified as redundant or nonredundant [88]:
• Redundant or "error resilience" methods require the transmission of extra information (e.g. a repetition of the previous frame) that may be used when packets are lost. The repeated information is generally coded at a lower bitrate than the original stream. Using compression can make the increased bitrate requirements of transmitting redundant information more acceptable.
• Nonredundant or "error concealment" techniques don't require the transmission of extra information, but instead rely on processing within the player, such as repeating previous waveforms or parameters, to reduce the effects of packet loss.
Multi-rate files
To stream audio data, the transmission line has to provide the full bandwidth of the stream during the whole transmission period, which imposes a limit on the bandwidth and hence on the quality of the audio stream [131]. Traditionally, in order to meet the bandwidth capabilities of a wide variety of users, streaming data was encoded at a variety of bandwidths. Each of these different speeds required a separate link on the web page. The users had to know how fast their connection was and choose among all the links offered.
It is possible to create streaming files that can be streamed at different speeds, so-called multi-speed or multi-rate files, obviating the need to offer multiple bandwidths [143]. This scheme is used in the "SureStream" and "Intelligent Streaming" technologies of RealMedia and Windows Media respectively. In a nutshell, it stores multiple versions of the content, encoded at every desired bandwidth, in a single file. When the user requests the content, the player and server negotiate the appropriate bandwidth based on the user's capabilities. In fact, if network congestion causes interruptions in delivery, the rate can be re-negotiated to a lower value [3].
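As a rough sketch of this negotiation in Python (the bitrate ladder and function name are hypothetical, not SureStream's or Intelligent Streaming's actual internals), the server keeps one encoding per supported bandwidth and always serves the highest one that fits the client's currently reported capacity:

# Hypothetical bitrates (Kbps) encoded into one multi-rate file.
VARIANTS = [16, 20, 32, 45, 80]

def pick_variant(available_kbps):
    """Highest encoded bitrate that fits the client's bandwidth."""
    fitting = [v for v in VARIANTS if v <= available_kbps]
    return max(fitting) if fitting else min(VARIANTS)

rate = pick_variant(45)          # 56 Kbps modem user -> 45 Kbps stream
# Congestion detected: re-negotiate downwards, with no new file or URL.
rate = pick_variant(rate * 0.6)  # -> 20 Kbps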
Increasing connection bandwidth
Congested networks and overloaded servers, resulting from the growing number of Internet users, contribute to the lack of good quality audio streaming over the Internet. Advanced compression mechanisms can reduce the load on the network, but not on the server.
A possible user-side solution is to install a faster connection. A cable modem offers speeds up to 10 Mbps. ISDN (Integrated Services Digital Network), unlike standard modem and cable connections, offers faster connections without sharing bandwidth, giving significantly more fluid streaming [71].
Satellite vendors are offering Internet Service Providers (ISPs) non-terrestrial networks to maximize bandwidth usage. Instead of using standard ground cables with standard bandwidth restrictions, ISPs who use satellite networking can theoretically deliver any type of bandwidth imaginable, thereby significantly relieving network traffic congestion.
Target connection speed    Recommended maximum bitrate for streaming clips
14.4 Kbps Modem            10 Kbps
28.8 Kbps Modem            20 Kbps
56 Kbps Modem              32 Kbps
56 Kbps ISDN               45 Kbps
64 Kbps ISDN               56 Kbps
112 Kbps ISDN              80 Kbps
128 Kbps ISDN              96 Kbps

Table 4.1: Typical maximum bitrates for the signal content recommended for streaming multimedia presentations, depending on modem speed [125].
Multicasting
Multicasting is a method for transmitting streaming content that helps conserve bandwidth. The traditional method for broadcasting on the Internet is unicasting, a technology that sends a separate packet stream to each user that requests it. Networks also support broadcasting, where a single copy of the data is sent to all clients on the network. When the same data needs to be sent to only a portion of the clients on the network, both of these methods waste network bandwidth: unicast wastes bandwidth by sending multiple copies of the data and may also produce server overload; broadcast wastes bandwidth by sending the data to the whole network whether or not the data is wanted [72].
In multicast streaming, instead of broadcasting thousands of streams, the server sends out only one packet, which is duplicated along the way whenever the paths to different users diverge. For this to be possible, hosts must be assigned to host groups, with multicast addresses identifying groups instead of single hosts (a range of IP addresses is reserved for this purpose). Bandwidth use is thus decreased not only at the server, but also over the entire network, as can be observed in Figure 4.3.
Some things must be redefined when using multicast. At the lowest layer, the Ethernet addresses will be multicast addresses. The IP Multicast protocol is used at the network layer, where new routing protocols are also needed. At the transport layer, TCP is not suitable for multicasting, so UDP (User Datagram Protocol) is used [73].
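On the receiving side, joining a multicast group is only a few lines of socket code. The Python sketch below (the group address and port are made-up example values from the administratively scoped range) asks the operating system, via IGMP, to join the group and then reads the incoming UDP datagrams:

import socket
import struct

GROUP, PORT = "239.1.2.3", 5004   # example multicast address and port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# IP_ADD_MEMBERSHIP makes the host join the group; multicast routers
# then start forwarding the group's packets towards this network.
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    datagram, sender = sock.recvfrom(2048)   # one packet of the audio stream
    # ... pass the datagram to the RTP layer / playout buffer ...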
Figure 4.2: Comparison between the network load per client when unicasting and
multicasting an 8-Kbps PCM audio stream [72].
The user needs information about how to join the multicast. The Session Description Protocol (SDP) is in charge of providing this information via .sdp files, containing the group address, port number, name and description, protocol used, etc. SDP files are commonly posted on web servers to announce upcoming multicasts [74]. The IGMP (Internet Group Management Protocol) manages the multicast groups.
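As an illustration, a hypothetical .sdp file announcing a multicast audio session might look as follows (all addresses, ports and names are invented; payload type 14 is the static RTP type registered for MPEG audio):

v=0
o=- 20020830 1 IN IP4 192.0.2.10
s=Example web radio station
i=Live music stream
c=IN IP4 239.1.2.3/127
t=0 0
m=audio 5004 RTP/AVP 14
a=rtpmap:14 MPA/90000

The c= line carries the group address (with a TTL of 127) and the m= line the port and payload format, which is all a player needs in order to join the transmission.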
Figure 4.3: Unicast vs. Multicast [75].
The heterogeneous nature of the Internet makes multicast transmission a challenge.
Different receivers of the same multicast data may have different processing capabilities,
loss tolerance and bandwidth available in the paths leading to them. The sender application
must treat the group of receivers fairly. An adaptive mechanism can be used: the sender application may transmit one multicast stream and determine which transmission rate satisfies most of the receivers, or it may transmit at multiple rates, etc. [78].
Furthermore, multicast addresses are recognized only by multicast routers, and not all routers support multicast. The MBONE is a virtual network that allows multicast packets to travel through routers that are set up to handle only unicast traffic. To achieve this, each router supporting multicast encapsulates the multicast message in a unicast message addressed to the next multicast router.
Users behind routers that don't implement multicasting can also receive multicast streams by requesting them from a reflector. A reflector is an RTSP server that joins a multicast and then converts the multicast into a series of unicasts, passing the streams to users who request them [see Figure 4.4].
Figure 4.4: You can also receive a multicast through a reflector [74].
Caching and replication
Caching and replication (or mirroring, splitting) technologies are also being used to reduce
the load of networks and servers and the response times.
Replication involves setting up a source audio server that sends the streams, along with multiple splitting servers in other locations. The streams are then split from the source server to each splitting server, which rebroadcasts them to its own users. Because stream splitting requires fewer direct connections to the origin server, a large amount of bandwidth can be conserved by enabling this feature [79].
Unfortunately these proposals and implementations do not currently use multicast technology, to their detriment [1]. Caching cannot be used for live transmissions either, just for on-demand content.
Figure 4.5: Stream replication [80].
Error spreading
Researchers continue to devise ways to improve streaming technology, some of them oriented towards error handling and coding methods. One of the most annoying disturbances when listening to audio is bursty losses caused by congestion. Error spreading is a technique that permutes the input sequence of packets, from a continuous stream of data, before transmission. At the receiver the packets are unscrambled. This technique ensures that bursty losses in the transformed domain get spread all over the sequence in the original domain, thus improving the perceptual quality of the stream [81].
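A minimal sketch of such a permutation is block interleaving, shown below in Python (the depth of 4 is an arbitrary choice, and this is only one possible permutation, not the specific scheme of [81]): packets that travel next to each other on the wire come from positions far apart in the original stream, so a burst of consecutive losses becomes isolated gaps after de-interleaving:

def interleave(seq, depth=4):
    # Transmit every depth-th packet together.
    assert len(seq) % depth == 0
    return [p for i in range(depth) for p in seq[i::depth]]

def deinterleave(seq, depth=4):
    # Invert the permutation at the receiver.
    cols = len(seq) // depth
    out = [None] * len(seq)
    for pos, p in enumerate(seq):
        row, col = divmod(pos, cols)
        out[col * depth + row] = p
    return out

original = list(range(12))
sent = interleave(original)           # [0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11]
assert deinterleave(sent) == original
# Losing the burst sent[4:6] (packets 5 and 9) leaves two isolated gaps
# in the original order, which concealment handles far better than two
# adjacent ones.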
4.3 Broadcasting
When an audio file is delivered on demand, it starts from its beginning when the user clicks
the presentation link in a web page. Each user can receive the file at any time and use the
player's controls to fast-forward or rewind through the file [137].
In an Internet radio station, however, the audio is broadcast (even if there are some audio
clips stored on the web page). In a broadcast, the broadcaster starts the audio at a certain
time. Users who click the audio link join the broadcast in progress. Before the broadcast
begins and after it completes, the audio URL is not valid. During the broadcast the player’s
fast-forward and rewind controls do not function. To make an analogy, on-demand content is like a song on a tape: the user can listen to it at any time, skip forward, rewind, and pause. A broadcast, though, is like a song broadcast on a radio channel. There are two types of streaming media broadcasts:
• Live content (also known as real-time content): live broadcasting occurs when encoding is done from a live source and the encoded stream is broadcast over the Internet at the time of encoding.
• Pre-recorded content (also known as pre-stored content): pre-recorded content consists of audio recorded and written to a digitized clip. The clip can be edited before converting it to a streaming format and broadcasting it across the network. To the user, the audio looks just like a live broadcast.
4.4 Multimedia Protocols
The multimedia protocols define the establishment of the connection and the transmission of the media from the server to the clients. Multimedia protocols differ in some respects from the traditional protocols used for Internet transmission, and must fulfil some requirements:
• The protocols should provide a way for a sender to tell a receiver which coding scheme it wants to use, so that they can interoperate.
• The receiver should be able to determine the timing relationship among the received data, in order to reconstruct the audio information.
• The receiver should also be able to determine whether packets have been lost. This information is also needed to detect congestion.
• The bandwidth should be used efficiently. Audio packets tend to be small, to reduce the time it takes to fill them with samples, so if a protocol added a long header, a large amount of link bandwidth would be wasted.
Session/Presentation/Application layer    HTTP, RTSP, SDP, SIP, SAP
Transport layer                           TCP, UDP, RTP, RTCP, RSVP
Internet/Network layer                    IP, ICMP, IP Multicast
Physical/Data link layer                  Ethernet

Table 4.2: Multimedia protocols at each Internet ISO layer.
In Table 4.2 we can see the layered scheme of the multimedia protocols according to the ISO architecture. These protocols, presented here as they apply to audio, which is the purpose of this thesis, are explained in the following sections.
4.4.1 Physical/ Data Link Layer
Ethernet
The physical layer specifies protocols for the actual transmission of the audio data across the link [84]. Most computers today are connected to a LAN, where the most popular technology is Ethernet, which offers bandwidths of up to 1 Gbps [83]. Both multicast and unicast are supported by Ethernet technology.
4.4.2 Internet/ Network Layer
Internet Protocol (IP)
IP is a connectionless protocol, and it is also referred to as an unreliable protocol, because it
relies on other layers to provide handshake and error detection and correction. The IETF
(Internet Engineering Task Force) describes IP in RFC 791 [86].
The functions that IP performs include:
• Defining a packet and an addressing scheme.
• Moving data between transport layer and network access layer protocols.
• Routing packets to remote hosts.
• Fragmentation and reassembly of packets.
←------------------------------- 32 bits -------------------------------→
Version | IHL | Type of Service | Total Length
Identification | Flags | Fragment Offset
Time to Live | Protocol | Header Checksum
Source Address
Destination Address
Options (+ padding)
Data (variable)

Table 4.3: IPv4 packet format [85].
The fields of an IPv4 packet are shown in Table 4.3 and their use is described below:
• Version: Indicates the version of IP currently used.
• IP Header Length (IHL): Indicates the header length in 32-bit words.
• Type of Service: Specifies how an upper-layer protocol would like the current packet to be handled, assigning various levels of importance to packets.
• Total Length: Specifies the length, in bytes, of the entire IP packet, including the data and header.
• Identification: Contains an integer that identifies the current packet. This number is used to reassemble packet fragments, which are created when the size of the packet is greater than the MTU (Maximum Transfer Unit) of the destination access network or an intermediate network and the original packet has to be divided.
• Flags (3 bits): The low-order bit specifies whether the packet can be fragmented. The middle bit specifies whether the packet is the last fragment in a series of fragmented packets. The third or high-order bit is not used.
• Fragment Offset: Indicates the position of the fragment's data relative to the beginning of the data in the original packet, which allows the destination IP process to properly reconstruct the original packet.
• Time to Live: Maintains a counter that is gradually decremented down to zero, at which point the packet is discarded (this keeps packets from looping endlessly in the network).
• Protocol: Indicates which upper-layer protocol receives incoming packets after IP processing is complete.
• Header Checksum: Helps ensure IP header integrity.
• Source Address: Specifies the sender's IP address.
• Destination Address: Specifies the receiver's IP address.
• Options: Allows IP to support various options, such as security.
• Data: Contains upper-layer information (≤ 65535 bytes).
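As an illustration of this layout, the following Python sketch unpacks the fixed 20-byte IPv4 header into the fields listed above (a sketch only; real code must also honour the IHL field when options are present):

import struct

def parse_ipv4_header(packet: bytes) -> dict:
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,               # 4 for IPv4
        "ihl": ver_ihl & 0x0F,                 # header length in 32-bit words
        "type_of_service": tos,
        "total_length": total_len,
        "identification": ident,
        "flags": flags_frag >> 13,             # the 3 flag bits
        "fragment_offset": flags_frag & 0x1FFF,
        "time_to_live": ttl,
        "protocol": proto,                     # e.g. 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }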
The only other protocol that is generally described as being at the Internet Layer of the ISO
model is the Internet Control Message Protocol (ICMP), a protocol used to communicate
control messages between IP systems. ICMP messages generally contain information about
routing difficulties with IP datagrams or simple exchanges such as time-stamp or echo
transactions. The IETF describes ICMP in RFC 792 [87].
With the Internet's massive growth, the Internet address space was being consumed, so the IETF defined a new version of IP, called IP version 6 (IPv6). IPv6 increases the IP address size from 32 bits to 128 bits, to support more levels of addressing hierarchy, a much greater number of addressable nodes and simpler auto-configuration of addresses. Scalability of multicast addresses was introduced, and a new type of address called an anycast address, which allows sending a packet to any one of a group of nodes, was also defined.
IPv6 replaces the 'Type of Service' field of the IPv4 header with two new fields, 'Flow Label' (24 bits, to distinguish between different flows) and 'Priority' (4 bits, to assign different priority levels to flows), to improve QoS. The IETF describes IPv6 in RFC 1883 [90].
ICMP was revised during the definition of IPv6. The multicast control functions of the IPv4 Internet Group Management Protocol (IGMP) were incorporated into ICMPv6, which is defined in RFC 2463 [91].
IP Multicast
We have seen the suitability of multicast technology for audio streaming. The IP Multicast protocol was created to extend the IP protocol to cater for transmission to multiple users [84]. It was defined by the IETF in RFC 2365 [92].
While IP works by assigning each host a unique address called an IP address, IP Multicast extends IP by assigning group IP addresses to groups of users, so that information needs to be sent only once, to the group IP address, for all users in that group to receive the data, using network resources and bandwidth more efficiently. The IP Multicast packet format is similar to the IP packet format, but with multicast IP addresses.
4.4.3 Transport Layer
Transmission Control Protocol (TCP)
TCP is a transport-level protocol that controls the transmission and the flow of data between two hosts. TCP is reliable because it ensures that every single packet is delivered, by the use of sequence numbers, ACKs and retransmissions. It also provides flow control, congestion control and error recovery [84]. The IETF describes TCP in RFC 793 [93].
Figure 4.6: The Transmission Control Protocol [63].
The TCP packet format is represented in Table 4.4 and the packet's fields are:

←------------------------------- 32 bits -------------------------------→
Source Port | Destination Port
Sequence Number
Acknowledgment Number
Data Offset | Reserved | Flags | Window
Checksum | Urgent Pointer
Options (+ Padding)
Data

Table 4.4: TCP packet format [89].
• Source Port: The source port number.
• Destination Port: The destination port number.
• Sequence Number: The sequence number of the first data octet (except when SYN is present). If SYN is present, the sequence number is the initial sequence number (ISN) and the first data octet is ISN+1.
• Acknowledgment Number: If the ACK control bit is set, this field contains the value of the next sequence number that the sender is expecting to receive. Once a connection is established, this value is always sent.
• Data Offset (4 bits): The number of 32-bit words in the TCP header, which indicates where the data begins. The TCP header's length is always an integral number of 32-bit words.
• Reserved (6 bits): Reserved for future use. Must be zero.
• Flags (6 bits): The control bits are (from right to left):
U (URG)   Urgent pointer field.
A (ACK)   Acknowledgment field.
P (PSH)   Push function.
R (RST)   Reset the connection.
S (SYN)   Synchronize sequence numbers.
F (FIN)   No more data from sender.

Table 4.5: Control bits in the TCP packet [89].
• Window (16 bits): The number of data octets, beginning with the octet indicated in the acknowledgment field, that the sender is willing to accept.
• Checksum (16 bits): The checksum field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header and data. If a segment contains an odd number of header and data octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes. The pad is not transmitted. While computing the checksum, the checksum field itself is replaced with zeros. (A sketch of this computation follows the list.)
• Urgent Pointer (16 bits): This field stores the current value of the urgent pointer as a positive offset from the sequence number. The urgent pointer points to the sequence number of the octet following the urgent data. This field can only be interpreted when the URG control bit has been set.
• Options: Options may be transmitted at the end of the TCP header and always have a length that is a multiple of 8 bits. All options are included in the checksum. An option may begin on any octet boundary.
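The one's complement checksum described above is short enough to show in full. This Python sketch follows the RFC 793 wording: pad an odd final octet, sum the 16-bit words while folding the carry back in, then complement:

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"                # pad the odd final octet with zeros
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # one 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF             # one's complement of the sum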
The use of TCP in multimedia communications is restricted to the transmission of control messages between clients and servers [84]. It is not well suited to continuous audio streaming precisely because of its reliability [95]: a lost packet causes a dropout in the audio, but this is more acceptable than retransmitting the packet, which would probably arrive too late to be useful and would pause the audio in the meantime. TCP is not suited to multicast either [102].
User Datagram Protocol (UDP)
In order to stream, several things need to be modified from the traditional TCP/IP model; current streaming technology relies on UDP rather than TCP. UDP is an unreliable transport protocol that forsakes the reliability and flow control of TCP and simply sends the data out as a continuous stream of packets to the receiver. This results in a continuous stream of packets whose rate depends only on the bandwidth of the connection, regardless of network congestion. The IETF describes UDP in RFC 768 [97].
Figure 4.7: The User Datagram Protocol [63].
←------------------------------- 32 bits -------------------------------→
Source Port | Destination Port
Length | Checksum
Data

Table 4.6: UDP packet format [89].
The UDP packet fields are represented in Table 4.6 and are:
• Source and Destination Port (16 bits each): These ports have a meaning within the context of a particular IP source/destination address pair.
• Length (16 bits): The length in octets of this packet, including header and data.
• Checksum: The 16-bit one's complement of the one's complement sum of a pseudo header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets. The pseudo header contains the source address, the destination address, the protocol, and the UDP length.
One problem that does exist with UDP is that many network administrators see UDP as a security risk and choose to combat it by blocking all or most UDP traffic with a firewall. If the receiver's network is running a firewall that blocks UDP packets, the receiver cannot get them and the sender has to resort to TCP [63].
Since UDP does not implement congestion control, applications implemented over UDP should detect and react to congestion in the network. Ideally, they should do so in a way that ensures fairness when competing with existing Internet traffic; otherwise such applications may obtain larger portions of the available bandwidth than TCP-based applications.
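One well-known rule of thumb for such fairness is the simplified TCP throughput model of Mathis et al., rate ≈ 1.22 · MTU / (RTT · √p): a UDP streaming sender that stays below this rate consumes roughly what a TCP flow would experience under the same round-trip time and loss rate p. A small sketch:

import math

def tcp_fair_rate(mtu_bytes=1500, rtt_s=0.1, loss=0.02):
    """Approximate TCP-fair sending rate in bytes per second."""
    return 1.22 * mtu_bytes / (rtt_s * math.sqrt(loss))

# 100 ms round trip and 2% loss allow roughly 126 KB/s:
print(tcp_fair_rate() / 1024)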
Real Time Protocol (RTP)
RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio or video, over multicast or unicast network services. RTP does not address resource reservation and does not guarantee QoS.
Since RTP does not have all the functions of a transport protocol, it is typically used on top of UDP (occasionally TCP) and is thus connectionless. RTP is not part of the TCP/IP protocol stack, so applications must add and recognize a new 12-byte header in each UDP packet [83].
The sender fills in each header, whose format can be observed in Table 4.7; it contains:
• V (Version): Identifies the RTP version.
• P (Padding): When set, the packet contains one or more additional padding octets at the end that are not part of the payload. In the case of padding, the complete length of the RTP header, data and padding is transported by the lower-layer protocol header (the UDP header), and the last byte of the padding contains a count of how many bytes should be ignored. This approach removes any need for a length field in the RTP header; in the common case of no padding, the length is deduced from the lower-layer protocol.
←------------------------------- 32 bits -------------------------------→
V | P | X | CSRC count | M | Payload type | Sequence number (2 bytes)
Timestamp (4 bytes)
SSRC (4 bytes)
CSRC list (0-60 bytes)

Table 4.7: RTP header format [89].
• X (Extension bit): When set, the fixed header is followed by exactly one header extension.
• CSRC count: Contains the number of CSRC identifiers that follow the fixed header.
• M (Marker): The interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream.
• Payload type: Describes the type of data transported, such as voice, audio or video, and how it is encoded.
• Sequence number: Increments by one for each RTP data packet sent, and may be used by the receiver to detect lost, out-of-order and duplicate packets.
• Timestamp: Reflects the sampling instant of the first octet in the RTP data packet. It is used to reconstruct the timing of the original audio at the receiver.
• SSRC: Identifies the synchronization source. This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier. By making the source identifier something other than the network or transport address of the source, RTP ensures independence from the lower-layer protocol and enables a single node with multiple sources to distinguish those sources.
• CSRC: The contributing source identifier list. Identifies the contributing sources for the payload contained in this packet.
After this fixed header there may be optional header extensions. The intention is that the fixed header contains only the fields that are likely to be used by many applications, since anything very specific to a single application would be more efficiently carried in the RTP payload for that application only [89]. The payload of the RTP packet follows the header.
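To make the header layout concrete, this Python sketch decodes the fixed 12-byte header and the optional CSRC list exactly as described above:

import struct

def parse_rtp_header(packet: bytes) -> dict:
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    header = {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,   # e.g. 14 = MPEG audio
        "sequence": seq,             # detects loss and reordering
        "timestamp": ts,             # restores playback timing
        "ssrc": ssrc,
    }
    n = header["csrc_count"]         # CSRC identifiers, 4 bytes each
    header["csrc"] = list(struct.unpack("!%dI" % n, packet[12:12 + 4 * n]))
    return header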
Because RTP is designed to support a wide variety of applications, it provides a flexible
mechanism by which new applications can be developed without repeatedly revising the
RTP protocol itself. For each class of application, such as audio, it defines a profile and one
or more formats. The profile provides a range of information that ensures a common
understanding of the fields in the RTP header for that application class. The format
specification explains how the data in the payload is to be interpreted [83], e.g. there is a
payload format defined for: MPEG-2 AAC [103], MPEG-4 [104, 105], MP3 [106],
MPEG1/MPEG2 Video [107], etc.
The protocol also includes a provision to support “translators” and “mixers”. These devices
interconnect different networks that may support different transports or codecs. A translator allows user machines to interact with each other: it accepts the traffic of one machine and translates (encodes) it into a format that matches the bandwidth limitations of the transit network and/or the receiving machine. A mixer receives streams of RTP packets
from one or more sources, possibly changes the data format, combines the streams in some
manner and then forwards the combined stream. Translators and mixers can also convert
multicast addresses to multiple unicast addresses so as to reach participants not on a
multicast network [1, 96].
The IETF describes the basics of RTP in RFC 1889 [99]. The International Telecommunication Union (ITU) employs RTP in the multimedia communications standard H.323, and it is also recommended by the Internet Streaming Media Alliance (ISMA).
Real Time Control Protocol (RTCP)
RTCP, the Real Time Control Protocol, is used in conjunction with RTP as its control protocol, and is also defined in RFC 1889 [99]. RTCP data provides information such as
sender/receiver reports and statistics about the connection, e.g. number of packets
lost/successfully delivered. The sender may then modify its transmission method based on
information provided by RTCP, e.g. in rate adaptive applications it may decide to use more
aggressive compression scheme to reduce congestion or to send higher quality stream when
there is little congestion.
RTCP data is sent out periodically between the RTP data, and makes up at most 5% of the overall session bandwidth; for a 64 Kbps audio session, for example, this caps the RTCP reports at about 3.2 Kbps. This limitation is important because RTCP information must not be allowed to overwhelm the connection, thereby slowing down the audio information carried by RTP [99].
←------------------------------- 32 bits -------------------------------→
Version | P | Reception report count | Packet type | Length

Table 4.8: RTCP header format [89].
RTCP defines a number of different packets depending on their function: sender report, receiver report, etc. The header fields that are common to all the packets are represented in Table 4.8 and are:
• Version: Identifies the RTP version, which is the same as the one in RTP packets.
• P (Padding): When set, this RTCP packet contains some additional padding octets at the end that are not part of the control information. The last octet of the padding is a count of how many padding octets should be ignored.
• Reception report count: The number of reception report blocks contained in this packet. A value of zero is valid.
• Packet type: Contains a constant identifying the kind of RTCP packet: 200 for a sender report, 201 for a receiver report, etc.
• Length: The length of this RTCP packet in 32-bit words minus one, including the header and any padding.
Resource Reservation Protocol (RSVP)
Streaming applications require a minimum bandwidth to transmit a stream. RSVP was created to allow a stream to reserve bandwidth hop by hop, from the receiver to the source, so as to provide an acceptable QoS [83]. It works with both unicast and multicast; the only difference is that in multicast environments an application uses RSVP to reserve bandwidth after it has joined a multicast group [1].
When a client wants to reserve bandwidth, it invokes RSVP by sending an RSVP request along the proposed delivery path to the server. All the routers on the path process this request. When a router receives an RSVP request, it tries to make the reservation on the downstream path to the client. If the reservation cannot be made, an error is propagated back to the client, and the client may try an alternative path. If the reservation is successful, the router forwards the request to the next router. After a successful reservation, the client receives a message specifying the path through the network on which the reservations have been made [84].
RSVP is described by the IETF in RFC 2205 [109].
4.4.4 Session/ Presentation/ Application Layer
Hypertext Transfer Protocol (HTTP)
HTTP is an application-level protocol traditionally used to transfer data on the Internet. In spite of being well suited to the transfer of web pages, it is not suitable for real-time audio streaming, because it is entirely based on TCP. Nevertheless, HTTP can still be used for streaming in two ways [111]:
• HTTP streaming: Used in progressive streaming.
• HTTP tunneling: Used by many streaming servers (RealServer, Microsoft Media Server...) in case UDP or TCP streaming fails (streaming protocols typically use a port that is not permitted by the user's firewall). The server then wraps its packets inside the HTTP protocol so that they look like regular HTTP traffic and can bypass the user's firewall.
HTTP is defined by IETF in RFC 2616 [110].
Real Time Streaming Protocol (RTSP)
RTSP was designed to control a continuous multimedia (e.g. audio) stream over the Internet in real time. It supports many underlying protocols, such as UDP, TCP, RTP, RSVP and multicast. It is described by the IETF in RFC 2326 [112].
RTSP acts like a network remote control requesting audio streams, in the same way that a user might request a web page from a web server using HTTP. For on-demand audio it provides the functionality needed to stop, start, pause or go to a particular point in the stream; for live audio streams, to schedule a time at which to start playing. It does not actually carry any audio itself; this is carried on a separate network connection by RTP and RTCP. This is a benefit because control requests can be made with RTSP using a reliable protocol (TCP) on one connection, while the audio is streamed, uninterrupted, using an unreliable protocol (UDP) on another connection [65].
The overall audio presentation, and the properties of the audio streams it is made up of, are defined by a presentation description file, whose format is usually defined by the Session Description Protocol (SDP) [65]. The presentation description file contains information about the data streams, including their encoding, language, etc. In this presentation description, each audio stream that is individually controllable by RTSP is identified by an rtsp:// URL, which tells an application that the presentation is located externally on a streaming server and uses RTSP. This URL points to the server handling that particular audio stream and names the stream stored on that server, e.g. rtsp://myserver/myaudio.rm (.rm indicates RealAudio) [147].
Besides the audio parameters, the network destination IP address and port need to be determined to establish the connection. Several modes of operation can be distinguished:
• Unicast: The audio is transmitted to the source of the RTSP request, with the port number chosen by the client.
• Multicast: The server picks the multicast address and port. This is the typical case for a live transmission, where the client must join the multicast before being able to listen to the audio [135].
RTSP is used to establish the client/server connection. Initially, when the client clicks on the audio link, the client and server establish a TCP connection [84]. The presentation description file is typically obtained by the client using HTTP, or by other means such as email.
Figure 4.8: RTSP operation [113]. Protocols used: HTTP/TCP, RTSP/TCP, RTP/UDP.
Once it has the presentation description, the client sends a SETUP command to the server specifying the desired audio presentation (typically specified as a URL), the protocol that should be used to transmit the audio and the port number the client has made
available to receive the audio. The server makes any resource allocation needed and
responds with a message that includes a session identifier (an arbitrary string that is used
for both client and server to identify messages associated with the same session), the
selected protocol, and acknowledgement of the client's port numbers. In addition, the server
provides port numbers for feedback sent by the client. Everything is then in place for the
streaming to begin [65].
In the simplest mode of operation, the client sends a PLAY request to cause the server to begin
sending data, and PAUSE requests to temporarily halt it. A PLAY request may include a
header specifying a range within the duration of the stream.
A session is ended by the client sending a TEARDOWN request, which causes the server to
deallocate any resources associated with the session.
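To make the exchange concrete, a minimal RTSP dialogue for an on demand clip might look as follows (the URL, port numbers and session identifier are hypothetical):

C->S: SETUP rtsp://myserver/myaudio.rm RTSP/1.0
      CSeq: 1
      Transport: RTP/AVP;unicast;client_port=4588-4589

S->C: RTSP/1.0 200 OK
      CSeq: 1
      Session: 12345678
      Transport: RTP/AVP;unicast;client_port=4588-4589;server_port=6256-6257

C->S: PLAY rtsp://myserver/myaudio.rm RTSP/1.0
      CSeq: 2
      Session: 12345678
      Range: npt=0-

C->S: TEARDOWN rtsp://myserver/myaudio.rm RTSP/1.0
      CSeq: 3
      Session: 12345678

The audio itself does not appear in this dialogue; it flows as RTP packets over UDP to the client ports negotiated in the SETUP step.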
Session Control Protocols
While the protocols above provide a wide range of the functionality needed for multimedia applications, there is one aspect of multimedia that they do not address: session control, i.e. information about addresses, ports, encoding etc. The protocols providing this functionality are:
Session Description Protocol (SDP)
The Session Description Protocol is purely a format for session description; it does not incorporate a transport protocol, and is intended to be conveyed by other protocols, including the Session Announcement Protocol (SAP), the Session Initiation Protocol (SIP), RTSP, electronic mail, and HTTP. It is described in RFC 2327 [114].
The purpose of SDP is to convey information about media streams in multimedia sessions
to allow the recipients of a session description to participate in the session. The information
provided by SDP includes:
• Session name and purpose.
• Time(s) the session is active.
• The media comprising the session: type of media (audio, video), transport protocol (RTP/UDP/IP), and format of the media compression if used (MPEG etc.).
• Information needed to receive those media (IP unicast/multicast addresses, ports, formats etc.).
• Transport methods that the server is capable of understanding.
As the resources necessary to participate in a session may be limited, some additional
information such as the bandwidth to be used may also be desirable.
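For illustration, a minimal session description for a multicast audio session might look like this (the names, addresses and ports are hypothetical; payload type 14 denotes MPEG audio in the RTP/AVP profile):

v=0
o=- 2890844526 2890842807 IN IP4 192.0.2.10
s=Example Web Radio
i=Live MPEG audio stream
c=IN IP4 224.2.17.12/127
b=AS:128
t=0 0
m=audio 49170 RTP/AVP 14

Each line is a <type>=<value> pair: v gives the protocol version, o the origin, s and i the session name and description, c the connection data (here a multicast address with TTL 127), b the bandwidth in kbps, t the time the session is active, and m the media line (audio on port 49170, carried over RTP/AVP).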
Session Announcement Protocol (SAP)
The Session Announcement Protocol is used to assist the advertisement of multicast sessions and to communicate the relevant session setup information to prospective participants. SAP periodically multicasts an announcement packet, containing a session description, to a well-known multicast address and port. SAP services provide information on all the known servers throughout the entire network [77, 89]. The protocol is described in RFC 2974 [115].
Session Initiation Protocol (SIP)
SIP is a signaling protocol for initiating, managing and terminating sessions across packet
networks. It is described in RFC 2543 [116]. SIP sessions involve one or more participants
and can use both unicast and multicast.
4.4.5 Asynchronous Transfer Mode (ATM)
ATM is an alternative network technology to TCP/IP, based on transferring data in cells of a fixed size. ATM is a connection oriented technology, which means that a fixed channel, or route, is created between two points whenever data transfer begins. This differs from the TCP/IP connectionless model, in which messages are divided into packets and each packet can take a different route from source to destination. This difference makes it easier to track and bill data usage across an ATM network, but it makes it less adaptable in case of network congestion [142].
The small, constant cell size allows ATM equipment to transmit video, audio, and computer data over the same network, and ensures that no single type of data hogs the connection.
Because of this, some people think that ATM holds the answer to the Internet bandwidth
problem.
ATM also offers QoS. You generally have a choice of four different types of service:
• Constant Bitrate (CBR) specifies a fixed bitrate.
• Variable Bitrate (VBR) provides a specified throughput capacity, but data is not sent at a constant bitrate.
• Unspecified Bitrate (UBR) does not guarantee any throughput. This is used for applications, such as file transfer, that can tolerate delays.
• Available Bitrate (ABR) provides a guaranteed minimum bitrate but allows data to be bursted at higher bitrates when the network is free.
ATM’s scalability (it works at different speeds over different media), bandwidth efficiency and support for multiple traffic types make it a suitable choice for streaming. International standards exist (ITU-T J.82) which define how the MPEG-2 Transport Stream is carried over ATM using the AAL [141].
4.5 Media Servers
Web pages (and other on demand content) are stored on web servers; when users connect and request them, the pages are transferred using HTTP [66]. Applications involving live streamed audio instead require special servers, called streaming servers. The operation of web servers and streaming servers when working with audio is analyzed and compared here [7].
4.5.1 Web Servers
A web server stores the compressed audio file and the web page containing the audio file's URL [7, 117]. When a user clicks on a hyperlink in a page for an audio file, the user's browser connects to the server named in the hyperlink using HTTP and requests the contents of the file named in the hyperlink with a GET request message. The server responds by returning the contents of the file in a GET response message. On receipt of this, the browser determines from the header of the message the type of data (audio) and the compression method used. The browser then invokes the media player, which decompresses the contents of the file and outputs the resulting byte stream to the sound card.
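As an illustration (the names and sizes are hypothetical, and the Content-Type depends on the audio format served), this exchange might look like:

GET /media/concert.rm HTTP/1.1
Host: www.company.com

HTTP/1.1 200 OK
Content-Type: audio/x-pn-realaudio
Content-Length: 3414726

(...compressed audio bytes...)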
The disadvantage of this approach is that, since the browser must first receive the contents of the file in its entirety, an unacceptably long delay is introduced if the contents are of significant size. Hence, for larger files, an alternative approach is used which enables the file to be sent directly to the media player rather than through the browser [7].
Figure 4.9: Streaming from a Web Server [113].
Using this approach, when an audio file is created a second file, called a metafile, is also created. The metafile contains the URL of the original file holding the compressed audio, together with a specification of the content type stored in the file. The metafile also has a URL associated with it and, when a hyperlink to audio is included in a web page, the URL of the metafile is used rather than the URL of the original file.
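Such a metafile can be as simple as a one-line text file. For a hypothetical RealAudio clip stored on a web server, a .ram metafile might contain nothing more than the line:

http://www.myserver.com/media/myaudio.rm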
Figure 4.10: Metafile requests [113].
Thus, when the user clicks on the hyperlink, the GET response message contains the contents of the metafile. The browser accesses the metafile and invokes the player as before, but this time it simply passes the presentation description in the metafile to the media player. The media player, on determining that this is a metafile, reads the URL of the original file and then proceeds to obtain the contents of the original file in the normal way using HTTP/TCP. On receipt of the file contents, the media player simply streams the received compressed contents into its buffer. After a predefined delay, to allow the buffer to partially fill, it starts reading the stream from the buffer and, after decompression, outputs the resulting byte stream to the audio card.
This approach removes the delays that are introduced when the file contents are accessed through the browser. The limiting factor is that, since the audio file is accessed using HTTP/TCP, long delays can still be introduced by TCP retransmissions. In general, for files containing real time information UDP is used, meaning that HTTP cannot be used and so a different server, called a streaming server, must be chosen [7].
4.5.2 Streaming Servers
These servers use the RTSP, RTP, UDP and IP protocols, and their operation is the same as described with RTSP [see 4.4.4].
Figure 4.11: Streaming from a Streaming Server [113].
4.5.3 Web Servers vs. Streaming Servers
The primary advantage of the web server approach is that it requires one less software
component (the streaming media server) to learn and manage [117].
The streaming server approach has these advantages:
• More efficient use of the network bandwidth, since it does not use retransmissions.
• Better audio quality, due to advanced features like detailed reporting and multi-speed audio content. Files created to be streamed at multiple speeds (multi-rate or multi-speed files) can only be streamed from a streaming server; files created to be streamed at a single speed (single-rate encoding) can be streamed from both a web server and a streaming server. This is because the streaming server is specifically designed to determine the speed of the connection, and to stream the file at the optimum speed for that particular connection [143].
• Support for a large number of users.
• Support for multicasting.
• Support for both live and on-demand content.
4.6 Standards Organizations
As has been shown, there are many different file formats and multimedia protocols on the
Internet. Internet appliances would benefit from a single standard since audio devices often
cannot afford to have e.g. multiple streaming media players installed to listen to differently
compressed audio content from the web, or understand different protocols [126].
The Internet Society (ISOC) is the organizational home for the groups responsible for Internet infrastructure standards, including the Internet Engineering Task Force (IETF), the European Telecommunications Standards Institute (ETSI), the ITU and the Internet Architecture Board (IAB) [127]. Some standards are already widely used, such as those for
speech transmission, H.323 for videoconferencing or H.324 for multimedia
communications, but the streaming audio standard is not that clear yet.
MPEG, a working group of the ISO/ IEC that develops the MPEG audio/ video coding
standards, has already been discussed.
For streaming media, the Internet Streaming Media Alliance (ISMA) [126] was created
with the purpose of accomplishing standards for rich Internet content, streaming video and
audio. “The Alliance believes that by creating an interoperable approach for transporting
and listening streaming media, content creators, product developers and service providers
will have easier access to the expanding commercial and consumer markets for streaming
media services” [126].
“Standards for many of the fundamental pieces needed for a streaming media over IP
solution do exist. The ISMA adopts parts or all of those existing standards and contributes
to those still in development in order to complete, publish and promote a systemic, end-to-end specification that enables cross-platform and multi-vendor interoperability. The first
specification from the ISMA defines an agreement for streaming MPEG-4 video and audio
over IP networks” [126]. It also promotes the use of RTP and RTSP as the protocols for
streaming multimedia.
5 Web Radio Stations
The goal of this chapter is to introduce web radio
broadcasting as a growing technology and the general
operation of an Internet radio station.
5.1 Introduction
The development of audio broadcasting via the web is probably the biggest revolution in
broadcasting since the advent of FM. An incredible variety of stations stream their audio on
the Internet, from small stations to national radio, from broadcasters already known for
their FM services to Internet-only stations. Additionally, numerous web radio stations have
their own web sites with archives of their programs for on demand listening [123].
Web radio is becoming more and more popular. A look at some statistics shows that the number of listeners has grown considerably in recent years and promises to continue
growing. “The streaming media audience in the US is following the same pattern as the
surge in Internet popularity” [139].
Listening Sessions                                    351,934 sessions
Time Spent Listening Per Session                      1 hour 17 minutes
Time Spent Listening Per Unique Listener Per Month    9 hours 2 minutes
Monthly Aggregate Hours Listened                      451,648 hours
Table 5.1: Web radio station statistics from May, 2002 [132].
Today, there are approximately 2500 Internet radio stations listed by Real Networks, around 3000 stations (Internet radio and television) listed by Windows Media and almost 3000 listed by SHOUTcast, according to data offered on their web sites.
These numbers are growing at a rapid pace. There are several factors that contribute to this
growth of web radio stations [119]:
1. Web radio eliminates the coverage restriction found with FM radio stations. A
radio station on the network can be accessed from any computer with Internet
access.
2. Another factor is the ease of setting up a radio station server. Cheap hardware
with Internet access along with free/ low-cost high-quality software enable any
user to create his/ her own web radio station.
5.2 How do they work?
To summarize all the previous chapters and put all the pieces in order, we take a look here at the operation of a web radio system with both live broadcasting and automatically updated audio archives.
How the client initially connects to the streaming server using RTSP has already been explained in previous chapters.
An audio source (e.g. a satellite receiver, a radio tuner or live content) delivers the content
to be broadcast on the Internet. In the broadcaster's equipment the analog audio signal is
captured, digitized and encoded in real time by the ADC and the encoder. The output of the
encoder is compressed digital audio in a streamable format. As the audio transmission
begins, the contents are broken into RTP packets, containing information about the coding
technique, sequence numbers and timestamping, and each packet is sent as soon as it is
prepared [131]. The packets are sent over UDP, to either a multicast or unicast destination
address. Broadcasters will usually create .sdp files containing all the needed information
about this live presentation [135].
There may also be a reflector that takes the audio stream and repeats it. This reflector allows a client that cannot join the multicast to listen to the stream as a normal unicast stream coming from the streaming server [135].
The streaming server then transmits the audio to the client(s), if there are any currently
demanding the live stream. Otherwise the data is discarded [131].
The user’s browser must have a streaming media player installed. The encoded audio data samples in each arriving RTP packet are placed in the player’s buffer together with previously received audio samples. The samples are placed in the buffer in contiguous order, based on their sequence number and timestamp, so that the original audio can be recovered [77]. When the buffer is sufficiently full, the samples are decoded and sent to the sound card, which converts the bitstream back to an analog signal, amplifies it and outputs it to the speakers.
Further packets continue to arrive; thus the buffer is being filled and emptied simultaneously as playback continues, usually uninterrupted. In case of network congestion, playback may stop and the user will experience a pause in the audio while the player attempts to refill the buffer [131].
A copy of the original stream is passed to the recorder, which stores the audio on hard disk according to a schedule that determines file names and start and end times. When a client clicks on the URL, demanding a stored audio stream, the client’s browser sends a request to the streaming server, which reads the audio from hard disk and transmits it to the client [131].
6 Implementing a Web Radio Station
This chapter covers all the issues related to the
implementation of a web radio station. The basic
elements are reviewed and all the steps to create a
web radio station – with both live and on demand
content – are explained. There are many commercial
systems available for streaming; the most commonly
used – Windows Media, Real Networks and
SHOUTcast/Icecast – are described and their main
advantages and disadvantages are discussed at the
end of the chapter.
6.1 Elements
A streaming technology consists of hardware and software components that work together
to create, store and deliver media files over the web. It has three main components:
• Client side or player: It decompresses the audio bitstream taken from the buffer and passes it to the sound card, which converts the decompressed bitstream back to an analog signal, amplifies it and outputs it to the speakers [7].
Players can be either a browser plug-in or a helper application that the user must download and install. Most of them are installed automatically by the browser software, and launched when the browser detects an incoming encoded audio file of the appropriate format.
A player must know the codec used to compress a stream before it can decompress and play it. However, most players will download new codecs as required.
Players can also be divided into embedded, when they look like part of the web page, and non-embedded, when the player is displayed in a separate window from the web page [143].
The elements constituting a streaming media player are a streaming protocol decoder that communicates with the streaming server, a bitstream decoder/controller, an audio/video compression decoder, an audio/video post processor (which corrects the audio data if some of it has been lost during transport), a user interface, a play-out buffer and an overall system controller [111].
When listening to audio on demand, the user requires control of the playout process through features such as pause and rewind. Hence it is necessary for the player to have another element that displays and monitors the buttons on the screen. When a button is selected, it first adapts the playout process (e.g. stops output if the pause button is activated) and then passes the appropriate command to the server using RTSP [7].
Many players can be downloaded and are free [118], e.g. Winamp and some
versions of Real Player and Windows Media Player.
• Streaming server: As explained in a previous chapter [see 4.5.2], it is a program that streams files, and is made to serve a large number of simultaneous users (depending on the bandwidth) [118].
• Encoder, converter or producer: It is a program that converts files from a downloadable format (e.g. MPEG) into streamable formats. Many encoder programs come bundled with the whole client/server system [118], but usually other encoders can be added in hardware or software.
6.2 Steps
Practically all audio streaming systems work the same way. The implementation process
follows a series of steps [118]:
• Install the streaming server, encoder and client programs. The server part can be skipped when using HTTP streaming (a server-less system).
The maximum bandwidth used for streaming will be equal to the bitrate served multiplied by the maximum number of users allowed to connect to the station + 1 [138]; a worked example is given below.
If the radio service is provided from only one server, it will become a bottleneck as the number of users increases. It is then recommended to use distributed streaming, by setting up relay servers. Users connect to the nearest relay server, which receives a copy of the audio data from the radio station and sends it to the users [130].
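As a worked example with hypothetical numbers: a station serving a 128 Kbps stream to at most 50 simultaneous listeners needs about 128 × (50 + 1) ≈ 6.5 Mbps of upstream capacity, which quickly outgrows a single server's connection and motivates the relay setup described above.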
• Capture, create and edit the audio. Recording the audio can be accomplished with normal audio recording equipment [143]. For capturing live audio, the audio source must be attached to the computer’s audio card.
• Convert the files into streaming formats using the encoder. For live content, a real time encoder must be used.
The files can be multi-speed or single-speed. In either case, the broadcaster must decide which speed(s) to target, depending on the Internet connection of the audience [143]. When streaming from a web server, one link for each speed must be included, but if a streaming server is used, just one link to the multi-speed file is needed.
Several tools can be used to create the streamable files. Windows Media Producer (free) has a wizard that guides users through the creation of single-speed and multi-speed .asf files. Real Producer (free) also uses a wizard that helps users to create single-speed .rm files, while Real Producer Plus G2 (not free) can be used to create multi-speed .rm files [143].
• Create a text pointer file with the URL of the audio file. For example, if the audio file is called concert:
http://www.company.com/media/concert.rm
http://www.company.com/media/concert.asf
for on demand audio, working with Real Networks and Windows Media respectively, with the audio stored on a web server, and
rtsp://realserver.company.com/ramgen/media/concert.rm
mms://windowsserver.company.com/media/concert.asf
for live audio, working with Real Networks and Windows Media respectively, with the audio coming from a streaming server. The URL scheme at the start specifies the protocol used for streaming, which is RTSP for Real Networks and MMS for Windows Media; then comes the server name, then the folder on the server storing the file, and the audio file name [150].
Save this metafile with extension .ram for Real Networks and .asx for Windows Media. RealSystem G2 has a program called Ramgen that avoids the creation of .ram files. It uses a specially configured URL that causes the browser to launch the player and stream using RTSP [151].
If Windows Media audio content with a .wma extension (instead of .asf) is streamed, the only difference is that it is accessed by metafiles with a .wax file name extension [47]; the rest of the process described here is the same.
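To illustrate, using the hypothetical names above: the .ram metafile is a plain text file containing just the clip's URL, e.g.

rtsp://realserver.company.com/ramgen/media/concert.rm

while the .asx metafile is a small XML document:

<ASX version="3.0">
  <Entry>
    <Ref href="mms://windowsserver.company.com/media/concert.asf" />
  </Entry>
</ASX>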
• Create the web page with a link to the metafile (not to the streaming file, since browsers cannot make RTSP requests, only HTTP):
<a href="http://www.company.com/media/concert.ram">
Click here</a> to listen.
<a href="http://www.company.com/media/concert.asx">
Click here</a> to listen.
• Send the web page, the metafile (.ram/.asx) and the streaming (.rm/.asf) files to the web server, or to the streaming server if working with live audio.
• Add or change MIME types on the server, so that browsers can recognize your new streaming type; an example is given below.
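For example, on an Apache web server the new types can be registered in the mime.types file; the two metafile types used above are commonly mapped as:

audio/x-pn-realaudio    ram
video/x-ms-asf          asx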
Finally, test by clicking on the link to verify that it works. The player will launch and, after a few seconds of buffering, will play the audio [149]. Table 6.1 summarizes the formats and techniques used.
                            Windows Media Server               Real Server
Streaming File Format       .asf                               .rm
Linking File (metafile)     .asx                               .ram
Media Player                Windows Media Player               RealOne Player
Type of Streaming           Single Speed or Intelligent        Single Speed or SureStream
                            Streaming
Location of Streaming File  Single Speed – Web server;         Single Speed – Web server;
                            Multi-Speed – Streaming Media      Multi-Speed – Streaming Media
                            Server                             Server
Streaming protocol          MMS                                RTSP
Technique                   Create hyperlink to .asx file in   Create hyperlink to .ram file in
                            HTML page; in .asx file create     HTML page; in .ram file create
                            hyperlink to .asf file.            hyperlink to .rm file.
Table 6.1: Streaming media with Windows Media and Real Networks [143].
A web radio station can broadcast its content either using unicast or multicast (over the
Mbone). Once the station is set up, it has to give information describing its attributes
(metadata), content, content-type, media-type etc. The player will use this information to
select the appropriate audio decoder and provide the listener information on the content
(station name, current playlist ...). There are several mechanisms for a station to advertise
its metadata [119]:
• Listeners select well-known IP addresses or web sites to access the radio station. Typically, this information is obtained by advertisement, word of mouth, or from portal sites and content provider sites (e.g. broadcast.com).
• The radio station can register itself with a well-known directory server. Examples of this model include Nullsoft's SHOUTcast and Icecast. These systems provide a directory server that maintains a database keeping track of radio stations and their attributes, as well as a mechanism for a radio station to register itself.
• The radio station is hosted on a well-known address. Companies like live365.com host radio stations on their site. They offer features like reliability, quality etc.
• The station can announce its attributes using the SAP protocol on a well-known multicast address.
6.3 Commercial Tools
There are three main platforms that offer complete streaming systems (server, client and
encoder): Real Networks [137], Windows Media [136] and SHOUTcast/Icecast [138, 140].
The Real Networks and Windows Media systems offer servers for their specific client systems, the Real Player and the Windows Media Player. Icecast and SHOUTcast provide MP3 streaming servers [130]. A short discussion of each of them is given now, followed by a comparison between them.
There are more encoders, players and servers available on the market that may provide even better quality for specific file formats or applications, but these are the most commonly used nowadays.
6.3.1 Windows Media
To build a radio-type solution using Windows Media, you can put together components in
this way:
Figure 6.1: A basic solution using Windows Media components (encoder, server and player connected over the network).
Windows Media Encoder
Windows Media Encoder is a free application [143] that compresses live or stored audio
and video content into Windows Media format files or streams. After the digital media has
been compressed and encoded, it can be saved as a Windows Media file or broadcast to the
Internet. If the digital media is a live broadcast, the media is delivered in real time to a
Windows Media Server that streams it to the players requesting the audio [136].
The following file formats can be used with Windows Media Encoder: .wma, .wmv, .asf, .avi, .mpg, .mp3 and .bmp [148].
Windows Media Encoder uses two advanced features: Intelligent streaming and multiple
bitrates. In Intelligent streaming the server and the client communicate with each other to
establish the actual network throughput and automatically adjust the properties of the
audio/video stream to maximize quality.
To take full advantage of intelligent streaming, the content must be encoded using multiple
bitrates. A single Windows Media file is created, containing multiple streams that are
encoded at different bitrates. When the player receives the multi-rate Windows Media file
or live stream, it only plays the stream encoded at the bitrate that best matches the user’s
connection [148].
Windows Media encoding software also has the option to transmit the stream uncompressed, in which case users need 1.4 Mbps of available bandwidth to be able to listen to it. KEXP is the first radio station in the world to offer uncompressed audio on the Internet.
Windows Media Player
The Windows Media Player not only plays Windows Media format files (.wmv, .wma), but
also other multimedia formats including AVI, MOV, ASF, WAV, MPEG-1, MPEG-2,
MIDI, MP3 and QuickTime. Additional hardware is recommended to decode MPEG-2 files
[136].
Windows Media Server
The Windows Media Encoder works in conjunction with Windows Media Server; it first
compresses audio data into the Windows Media format and then passes it to the Windows
Media Streaming Server that converts it into .asf format, which is suitable for streaming
[118].
The Microsoft Windows Media streaming server uses some proprietary streaming protocols:
• Microsoft MMS (Microsoft Media Server protocol): MMS has both a data delivery mechanism, to ensure that packets from the Windows Media Server reach the client, and a control mechanism, to handle client commands such as stop or play. MMS works both over UDP (MMSU) and TCP (MMST) [163].
With the “protocol rollover” functionality, the server switches from one protocol to another when it fails to make a connection using a particular protocol; an example is given after this list.
• Microsoft MSBD (Microsoft Media Stream Broadcast Distribution protocol): The MSBD protocol was used to transfer streams from the Windows Media Encoder to the Windows Media Server and between servers. However, Windows Media Encoder 7 no longer supports MSBD and uses HTTP instead [163].
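As an illustration of protocol rollover (using the hypothetical server name from earlier): when a player opens mms://windowsserver.company.com/media/concert.asf, it typically tries MMS over UDP (mmsu://) first, falls back to MMS over TCP (mmst://) if that fails, and finally falls back to HTTP if neither streaming transport gets through the user's firewall.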
Detailed information about broadcasting using the Windows Media Service is given in the
documentation and tutorials from the Windows Media site [136].
6.3.2 Real Networks
Real Networks pioneered streaming audio with RealAudio, the first streaming media
product for the Internet. The Real Broadcast Network (RBN) offers users the ability to
broadcast on demand and live audio/video content.
The Real Networks system consists of software only [131], and its main tools are:
RealSystem Producer, RealOne Player and RealSystem Server.
RealSystem Producer
Some sound-editing programs can create Real clips (file extension .rm, although older clips
may use .ra instead), but RealSystem Producer Basic (free) and RealSystem Producer Plus
are the most widely used tools [151]. Both producers support real time encoding for live
content. The resulting audio/video stream is transmitted via a network interface to the
streaming server (RealSystem server).
RealSystem Producer accepts many common audio/video formats apart from its own
formats. These may vary by operating system, though, e.g. RealSystem Producer on
Macintosh accepts the formats widely used on the Macintosh, such as QuickTime, whereas
RealSystem Producer on Windows or Unix supports the formats most used on those
operating systems. Some of the accepted formats are: Audio Interchange Format (.aiff), Audio (.au), MPEG-1 (.mpg, .mp3), QuickTime (.mov), Sound (.snd) and WAV (.wav) [151].
When encoding audio clips with RealSystem Producer, the target audiences to reach can be selected [see Table 6.2]. RealSystem Producer then determines which RealAudio codecs (also called encoders) are best to use depending on the compression and quality needed [151]. On the receiving end, RealOne Player uses the same codec to decode the audio. Table 6.3 lists the music codecs used in RealAudio 8.
Target Audience            Voice Only   Voice & Music   Mono Music   Stereo Music
28.8 Kbps modem            16 Kbps      20 Kbps         20 Kbps      20 Kbps
56 Kbps modem              32 Kbps      32 Kbps         32 Kbps      32 Kbps
64 Kbps single ISDN        44 Kbps      64 Kbps         44 Kbps      44 Kbps
112 Kbps dual ISDN         64 Kbps      96 Kbps         64 Kbps      64 Kbps
Corporate LAN              -            -               96 Kbps      132 Kbps
256 Kbps DSL/cable modem   -            -               -            176 Kbps
384 Kbps DSL/cable modem   -            -               -            264 Kbps
512 Kbps DSL/cable modem   -            -               -            352 Kbps
Table 6.2: RealAudio standard bitrates [151].
RealSystem Producer has some advanced features to improve audio quality, such as SureStream, a technology that works in the same way as Intelligent Streaming in Windows Media. It also allows multi-rate encoding, which is needed for SureStream.
RealAudio 8 codec                      Sampling frequency   Frequency response
16 Kbps Stereo Music                   22.05 kHz            4.3 kHz
20 Kbps Stereo Music                   22.05 kHz            8.6 kHz
20 Kbps Stereo Music—High Response     22.05 kHz            9.9 kHz
32 Kbps Stereo Music                   22.05 kHz            10.3 kHz
32 Kbps Stereo Music—High Response     44.1 kHz             13.8 kHz
44 Kbps Stereo Music                   44.1 kHz             13.8 kHz
44 Kbps Stereo Music—High Response     44.1 kHz             16.0 kHz
64 Kbps Stereo Music                   44.1 kHz             16.0 kHz
96 Kbps Stereo Music                   44.1 kHz             16.0 kHz
105 Kbps Stereo Music                  44.1 kHz             13.7 kHz
132 Kbps Stereo Music                  44.1 kHz             16.5 kHz
146 Kbps Stereo Music                  44.1 kHz             16.5 kHz
176 Kbps Stereo Music                  44.1 kHz             19.2 kHz
264 Kbps Stereo Music                  44.1 kHz             22.0 kHz
352 Kbps Stereo Music                  44.1 kHz             22.0 kHz
Table 6.3: RealAudio 8 Stereo Music Codecs [151].
RealOne Player
Real Networks claims that their RealOne Player “provides the most advanced media
playback possibilities available, combining streaming media, digital downloads, and Web
browsing” [151]. RealOne Player also provides several means for delivering information
about a presentation, such as its title, author, and copyright. On their web page, Real Networks claims to have 285,000,000 players installed.
RealOne Player can play many types of audio, video, streaming media formats and
compression formats for audio files. Table 6.4 lists all these formats.
File type                                Extension(s)
A2B                                      .mes
Active Stream Format                     .asf
Audio File                               .au
Blue Matter Files                        .bmo, .bmr, .bmt
GIF File Format                          .gif
IBM EMMS Files                           .emm
Liquid Audio                             .lqt
MJuice Files                             .mjf
MP3 Playlist Files                       .m3u, .pls, .xpl
MPEG Files                               .mp3, .mpeg, .mpa, .mp2, .mpv, .mx3
MPEG Playlist File                       .pls
MPEG URL (MIME Audio File)               .m3u
Macromedia Flash                         .swf
Portable Network Graphics image files    .png
QuickTime Files                          .avi, .aiff
RAM Metafile                             .ram, .rmm
RealAudio, Real Media                    .ra, .rm, .rmx, .rmj, .rms
RealOne Music                            .mnd
RealPix                                  .rp
RealText                                 .rt
WAVE                                     .wav
Windows Media Audio                      .wma
Other compatible formats:
RealJukebox Skins                        .rjs
RealJukebox Track Info Packages          .rmp
Table 6.4: Media Formats RealOne Player supports [137].
For presentations in which RealOne Player pops up as a separate application, .ram is used
as the metafile extension. When the clip or presentation is embedded in a web page,
however, the metafile uses the file extension .rpm. RealOne Player still plays the
presentation, but it does not launch as a separate application. Instead, the browser appears
to play the clips [151]. To link to an embedded clip (e.g. audio.rm) from a web page, the
<EMBED> tag is used:
<EMBED SRC="play_audio.rpm" WIDTH=300 HEIGHT=134>
where play_audio.rpm is the metafile. This metafile then gives RealOne Player either the
RTSP URL to the clip if it is stored on a RealSystem Server:
rtsp://realserver.example.com/audio.rm
or the HTTP URL to the clip if it is stored on a web server:
http://www.example.com/audio.rm
One of the players’ drawbacks is that they are CPU intensive, so they cannot be used in
many systems [48].
RealSystem Server
RealSystem Server streams the clips created by RealSystem Producer. It runs on Windows
NT/2000 and many Unix platforms, including Linux [151].
The protocol used by the RealSystem Server is RTSP. It can stream several audio formats
in addition to RealAudio: .aif, .au, and .wav. Plug-ins may exist for additional audio
formats [149].
Figure 6.2 shows an example of the implementation of a web radio station using the RealSystem.
Figure 6.2: Set up of a web radio station using the RealSystem [131].
6.3.3 SHOUTcast and Icecast
SHOUTcast is Nullsoft's free Winamp-based distributed streaming audio system. It allows anyone with an Internet connection to broadcast audio from their PC to listeners across the Internet or any other IP-based network (office LANs, college campuses, etc.). SHOUTcast's underlying technology for audio delivery is MP3. It can deliver both live audio and on demand audio for archived broadcasts [138].
The SHOUTcast system has three elements: Nullsoft's Winamp, the SHOUTcast Source Plug-in for Winamp, and an MP3 codec [138].
Listeners tune in to SHOUTcast broadcasts by using a player compatible with streaming MP3 audio. Users can visit the SHOUTcast directory to locate a stream they'd like to listen to. The recommended player is Winamp (for Windows users) [138].
Users wanting to broadcast will need to run their own server. Once a server is in place, broadcasters use Winamp to stream music. A plug-in called the SHOUTcast Source for Winamp converts the stream and sends it from Winamp to the SHOUTcast server. The SHOUTcast server can stream any format supported by Winamp: MPEG audio layers 1, 2 and 3, MOD/S3M/XM/IT (digital synthesized music formats), MIDI/MID (musical instrument digital interface), WAV/VOC (digital audio files), CDA (compact disc audio), WMA (Windows Media Audio), AS/ASFS (Audiosoft secure MP3 files) and so on. Through the use of specialized SHOUTcast broadcasting plug-ins, audio from a microphone, as well as from any device attached to the broadcaster’s sound card, can also be streamed [138].
The last element of the SHOUTcast system is the SHOUTcast Distributed Network Audio
Server (DNAS). This software runs on a server attached to an IP network with lots of
bandwidth, and is responsible for receiving audio from a broadcaster, updating the
SHOUTcast directory with information about what the broadcaster is sending, and sending
the broadcast out to listeners [138]. With this system, SHOUTcast server responsibilities
are easily distributed, and users can stream content to additional servers to allow for more
listeners, making it feasible for someone with only a modem connection to broadcast to a
large number of listeners.
The SHOUTcast server is not a streaming server; instead it uses the HTTP protocol to stream, which is why “HTTP streaming” is also known as the “SHOUTcast streaming protocol” [111].
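For example (with a hypothetical host name), a listener can simply open http://myserver.example.com:8000/listen.pls in a compatible player; the SHOUTcast server answers with an ordinary HTTP-style response whose body is the continuous MP3 bitstream.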
The server software runs on Macintosh, Windows and Unix, and Winamp is available for the Macintosh as well as for Windows. Both the server and player software are free, a substantial advantage over Real and Windows Media (which charge money for high-performance server software).
Icecast is an open source Internet audio streaming server based on MPEG audio technology
that is completely compatible with SHOUTcast. The difference is that SHOUTcast is more
Windows and Solaris oriented while Icecast is Unix oriented.
Icecast has some limitations due to the lack of a good free encoder for Linux/Unix: for example, streams cannot be re-encoded to the bitrate you specify, as they can with SHOUTcast, because no encoder exists that can properly re-encode them at a lower bitrate. On Windows, Winamp uses Microsoft's licensed MP3 codec, which cannot be used on Linux/Unix [140].
Icecast's advantages over SHOUTcast include lower CPU and memory usage, several streams per server, the possibility to use multiple directory servers, and simple administration through either telnet, the console, or the web interface [157].
More information about broadcasting either with SHOUTcast or Icecast can be found in the
documentation [138, 140]. There are also good tutorials on the Internet [144].
6.4 Comparison
The best systems are those delivering the highest quality audio for a given bandwidth, offering low delay, no jitter, low frame loss, low computational complexity [162], good audio synchronization, etc. In addition, the ability to provide the best possible audio quality over a range of networks/bandwidths (scalability) without content duplication is also desirable. The popularity of the chosen system is another important factor, since it helps to interconnect and transmit to the highest number of users. It is also important to keep in mind that some systems are free and others are not.
Real Networks and Windows Media streaming technologies each have their own proprietary server designed to stream files in their proprietary format. Therefore, the media files created should match a specific server's file format requirements. Their players are also proprietary, e.g. a Windows Media Player will not normally play RealAudio and vice versa [147].
This is their main disadvantage: since their codecs are not open source, their use requires licensing, and that typically means that broadcasters have to pay for the software to run them. The MP3 codec and Winamp player used in SHOUTcast are practically free in comparison [144]. Besides, to run a Real Server, for example, the broadcaster must pay a per-stream license fee (per concurrent listener) [144].
For broadcasters wanting to stream other people's music without purchasing their own licenses, there are sites like Live 365 [145] or Wired Planet. They handle all the paperwork concerning copyrights, and also provide the bandwidth required to stream to thousands of listeners. This is a good choice for broadcasters requiring a lot of bandwidth, and they are not very expensive ($10 to $80 per month) [144].
The main advantage of both the Real Networks and Windows Media systems is their popularity; they are very consolidated and are the main systems used for streaming on Internet sites and by users nowadays. Using their formats avoids users having to download a new plug-in every time they want to listen to audio from the Internet, which is really annoying. The Windows system is a very good choice especially for organizations that are already using Microsoft Windows [118]. Real Player is cross-platform and is installed by default with many web browsers, so many listeners will be able to access it without installing additional software.
My guess is that in the future all the platforms will tend to be open source, since this is the market trend right now. Real Networks has new software that can distribute streamed audio and video in a range of formats, including the Windows Media format, and has announced a shared source code initiative. The media delivery platform, called the Helix Platform, is based on their current client and server software, and is the first to support commonly used technologies and applications, such as MPEG-4 and Windows Media [158].
“This is an attempt by Real Networks to prevent Windows Media from achieving market domination. A perception of open-ness will make more people consider Real Networks products as standards rather than just products. But Real Networks may not be able to afford to be open enough; their revenue today depends on licensing fees for the use of their software, and unless they can change their business model somewhat, it will be difficult for them to achieve a real partnership with the Open Source community. That community has little to gain by replacing Microsoft's proprietary audio format with Real Networks' still-proprietary audio format” [159].
7 Performance
Before the audio gets to the listeners, it suffers many
transformations, delays etc. that may degrade its
quality. The main performance parameters and factors
affecting the quality of real time audio are reviewed
here. At the end of the chapter a practical experiment
of measuring these parameters is analyzed.
The quality of the audio that we listen to when we connect to a web radio station depends on many parameters. Some audio quality is lost before transmission, in the process of capturing, digitizing and compressing. These effects can be minimized, but not suppressed, depending on the equipment used, the strength of the compression etc.
When the audio packets are transmitted over the Internet, they are subject to the vagaries of network performance, which depends on many factors, many of them unpredictable (such as congestion). One of these factors is delay. Whereas traditional network applications (email, file transfer) can tolerate considerable delays, streamed audio is much less tolerant [65], especially in the case of two-way communication (such as conferencing), where the maximum acceptable delay is 250 ms [25]. In radio broadcasting applications the delay increases the time users must wait before starting to listen to the audio, but it does not affect the quality of the audio itself. The total delay for real time audio broadcasting can be calculated as [122]:
calculated as [122]:
Tdelay: Tsample (ADC) + Tencode + Tpacketize + Ttransmission + Tpropagation +
Tprocess in the network elements (routers, hops...) + Tbuffer + Tdepacketize + Tdecode +
Tpresent (DAC)
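As a purely illustrative example with hypothetical numbers: a few milliseconds each for sampling, encoding and packetization, on the order of 100 ms for transmission, propagation and processing in the network, and a 5 s play-out buffer at the receiver give a total delay dominated by the buffering term, so the listener hears the audio roughly five seconds after it is broadcast.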
More than the delay itself, variations in this delay cause problems for audio traffic. The variation in delay is called jitter, and means that the time between the arrival of successive packets will vary, which may cause samples to be played back at the wrong time. One way to deal with this at the receiver end is to buffer [see 5.2] [65]. For high quality stereo music this delay variance should be less than 1 ms [25].
In addition to delay and jitter, there may be packet loss or errors, since the audio is not transported using a reliable protocol such as TCP. However, audio has a high tolerance to this, compared to data packets [65].
At the receiving end, the user needs enough connection bandwidth to get the audio at its full quality. If the audio is encoded at a higher bitrate than the user's connection, the user will experience frequent pauses in playback, or may be unable to play the content at all [149].
Some experiments can be made to measure the QoS of Internet audio [see 164]. One of the most important metrics influencing user perception of audio is the average packet audio playout delay vs. the packet loss rate. Other relevant metrics are the receiving buffer capacity vs. the packet loss rate, the average packet audio playout delay vs. the lateness of discarded packets (the difference between the value of the receiver's clock upon the arrival of a discarded packet and its timestamp), and the waiting time in the receiver buffer for the played packets vs. the average packet audio playout delay.
In Italy, researchers compared the performance of three different playout delay control mechanisms [165, 166, 167], designed to dynamically adjust the receiver's buffer depth to compensate for the highly variable Internet packet delays. Due to the generally distributed delays typically experienced by audio samples over the Internet, the analysis was conducted via simulation. The three mechanisms were evaluated using both experimentally obtained delay measurements (trace-driven simulation) and delays randomly generated according to Gaussian and exponential distributions. The study was oriented towards voice traffic, but I present here the results they obtained that can be extrapolated to real time Internet audio transmission. In Figures 7.1 – 7.9, mechanism #1 [165] is represented in yellow, #2 [167] in green, and #3 [166] in black. In Figure 7.10, Gaussian traffic is represented in yellow, exponential traffic in green and trace-driven traffic in black.
Figure 7.1: Average playout delay (ms.) vs. loss rate (%): Gaussian traffic [164].
Figure 7.2: Average playout delay (ms.) vs. loss rate (%): Exponential traffic [164].
Figure 7.3: Average playout delay vs. loss rate (%): Trace-driven simulation [164].
Figure 7.4: Lateness of discarded packets (ms.) vs. average playout delay (ms.): Gaussian traffic [164].
Figure 7.5: Lateness of discarded packets (ms.) vs. average playout delay (ms.):
Exponential traffic [164].
Figure 7.6: Lateness of discarded packets (ms.) vs. average playout delay (ms.): Trace-driven simulation [164].
Figures 7.1 and 7.2 show the average playout delay vs. the loss rate for Gaussian and exponential traffic, respectively, where the Gaussian delays have an expected value of 100 ms and a standard deviation of 7 ms, and the exponential delays have an expected value of 100 ms and a standard deviation of 10 ms. Figure 7.3 shows the same metrics averaged over a set of traffic traces gathered experimentally from an IP-based interconnection with 16 hops that is quite lossy. To provide an understanding of the effects that various playout delays and loss rates (as well as buffer dimensions) have on the quality of the audio, Figures 7.1 to 7.3 include an approximate, intuitive representation of three different ranges of audio quality. These ranges are for voice communications, such as conferencing, where delays larger than 350 ms impede a conversation. It can be concluded that the longer the playout delay, the smaller the loss rate.
Figures 7.4 to 7.6 show, for each mechanism, the average increment in playout delay that must be paid to reduce lateness and, consequently, avoid discarded audio packets. Under Gaussian traffic (Figure 7.4), all the plotted curves show that a small increase in playout delay drastically reduces the percentage of packet loss. However, under the other traffic scenarios (exponential and trace-driven), a large playout delay must be introduced in order to obtain an appreciable reduction in the packet loss rate.
One of the primary performance metrics is the percentage of packets lost at the destination.
Such loss may either be due to packets that arrive too late for presentation or packets that
arrive too far in advance of their playout time. In the latter case, packet loss results from
limitations on the finite size of the buffer (that is, buffer overflow due to the premature
arrival of packets). Figures 7.7 through 7.9 quantify, for each mechanism, the tradeoff
between the loss rate and the maximum buffer capacity. Figure 7.9 shows that the
dimensioning of the receiving buffer must be adjusted dynamically, according to the
playout delay value computed by the mechanism. But with Gaussian and exponential
traffic, a threshold value seems to exist beyond which the increase in buffer capacity
provides no benefit.
Figure 7.10 shows the total amount of time that nondiscarded packets wait in the receiver
buffer before their playout time. It can be concluded that the larger the average playout
delay, the larger the waiting time in the buffer.
Figure 7.7: Loss rate (%) vs. buffer capacity (number of packets): Gaussian traffic [164].
Figure 7.8: Loss rate (%) vs. buffer capacity (number of packets): Exponential
traffic [164].
Figure 7.9: Loss rate (%) vs. buffer capacity (number of packets): Trace-driven
simulation [164].
Figure 7.10: Wait Time in buffer (ms.) vs. average playout delay (ms) for
mechanism #3 [164].
8 Conclusions
We have seen that radio broadcasting on the Internet is an expanding technology, and is
attracting more and more people, both listeners and broadcasters.
A key characteristic of the streaming audio commercial products is the diversity in
technological infrastructure, e.g. networks, protocols and compression standards [162].
Compatibility between products has been limited because of the use of proprietary
standards. However, recent products have been designed to enable new and various codecs
to be easily incorporated into their framework [162]. Streaming products now tend to be
open, to provide compatibility and improve performance.
Future efforts will focus on improving the existing networking technologies to provide
better services at lower cost. Development of new communication protocols in conjunction
with better compression techniques will also be important in the future. This will hopefully
lead to radio without dropouts, servers that don’t suffer overload etc. and will make
listening to Internet radio a much more pleasant experience.
Not much seems to have been written about the performance of Internet radio stations. Simulations on networks to measure the QoS, like the one analyzed above, could be done for web radio stations using network simulators (such as COMNET or NS). The previous simulation model could be extended to test different streaming protocols (MMS, RTSP etc.).
Moreover, a web audio broadcast should exploit the features of the Internet, e.g. the information about the content, station etc. could be used to fulfill listeners’ preferences by providing them with private channels [130]. However, there may be a copyright problem in giving listeners total control of what they listen to, which could lead to a situation where nobody buys a CD again.
The increasing number of web radio stations and listeners is causing a lot of controversy.
“During the last year, radio broadcasters and the recording industry have been locked in a
battle over the fees [160]. In July, the Librarian of Congress decided what royalty rates
webcasters will have to pay to record companies and performers to stream music out onto
the Internet. The fee level will be hard to justify for even the larger commercial stations
with strong advertising revenue. Even the lower fees applied to some noncommercial
webcasters are impossible for small experimental stations” [161]. Will this slow down the
growth of web radio technology?
Appendix: Audio File Formats
In this appendix the main audio file formats and their
basic characteristics - file extension, origin,
recommended bitrate, compression, if they stream or
not and if they are proprietary or not - are listed.
A format is a known or defined method of setting things out. Different files on a computer are arranged in their own formats, which every piece of software capable of reading that format understands [69].
A media file format might be something like a BMP picture file, an AVI video file, or an
AIFF audio file. It holds the information used to describe a sound or a picture in a special
known format, that everyone must know before they can decode it [69].
A compression format holds the same information describing a piece of audio or a picture, except that it has been compressed, which changes the arrangement of the data bits and therefore the format. A JPEG picture file, an MPEG video file or an RA encoded audio file are examples of this [69]. The data in the compressed media file needs to be decompressed before it becomes a media format once again.
A streaming file format is one that has been specially encoded so that it can be played while
it downloads, instead of having to wait for the whole file to download. Such formats
usually also include some compression. It is possible to stream some standard media file
formats, however it is usually more efficient to encode them into streaming file formats
[69]. Some examples of streaming file formats are ASF, RA, etc.
A media delivery format is the unique way that audio data has been arranged. It contains information about the timing, synchronization, copyright etc. The actual audio data may be located in the same file, or in a separate file [69]. This is the prospect of streaming multimedia, because it is hoped that an open media delivery format will be adopted by all commercial streaming products in order to provide a de facto method for delivery of media types that use different standards of compression (such as MPEG) and different media file formats (QuickTime, AVI, AIFF, etc.) [69]. This format also has the benefit of being able to synchronize many different streams of different types in the same format. Some examples of media delivery formats are ASF, SMIL, etc.
Historically, almost every type of machine used its own file format for audio data, but some
file formats are more generally applicable, and in general it is possible to define
conversions between almost any pair of file formats [59]. There are two types of file
formats [59]:
• Self-describing file formats generally define a family of data encodings, where a header field indicates the particular encoding variant used. Some self-describing formats are those with extensions .au or .snd, .aif(f), .aifc, .mp2, .mp3, .ra, .wav, WAVE, etc.
• Headerless formats (sometimes called “raw”) define a single encoding. Some headerless formats are: .snd, .fssd (Mac, PC), .snd (Amiga).
The table below reviews the main characteristics of some of the audio file formats. There are many others, but those most used on the Internet today are mentioned here.
File Extension                      Origin      Recommended bitrate     Compression   Streaming   Price
.au, .snd (audio)                   UNIX, Sun   720 Kbps                Yes (2:1)     No          O.F.
.aif(f), AIFF (Audio Interchange
File Format)                        Apple       1.4 Mbps                No            No          O.F.
.aifc, AIFC                         Apple       236 Kbps                Yes (1:6)     No          O.F.
.ra                                 RealAudio   96 Kbps                 Yes (1:14)    Yes         Proprietary
.wav, WAVE, RIFF                    Microsoft   128 Kbps                Yes (1:10)    No          O.F.
.cda (CD Audio Track)               -           1.4 Mbps                No            No          -
.asf (Audio Stream Format), .wma
(Windows Media Audio v.8)           Microsoft   96 Kbps                 Yes (1:14)    Yes         Proprietary
.mp1, .mpa                          MP1         384 Kbps                Yes (1:4)     Yes         O.F.
.mpeg2, .mp2, .mpa                  MP2         192 Kbps                Yes (1:8)     Yes         O.F.
.mp3 (.m3u for playlists) stereo    MP3         128 Kbps                Yes (1:10)    Yes         O.F.
.aac stereo                         MPEG-2      128 Kbps                Yes           Yes         O.F.
.aac mono                           MPEG-2      64 Kbps, 96 Kbps [23]   Yes (1:20)    Yes         O.F.
Table A.1: Audio File Formats [2, 59, 60, 61, 62, 69]. (O.F. = Open Format)
References
[1] Multicast Networking and Applications
C. Kenneth Miller. Addison Wesley Longman, Inc., 1999
[2] Audio on the Internet
http://www.noisebetweenstations.com/personal/essays/audio_on_the_internet/
[3] Building a Community Information Network: A Guidebook
http://www.mel.org/toolkit/content/book/
[4] Principles of digital audio and video
Arch C. Luther. Artech House, Inc., 1997
[5] Fundamentals of digital audio
http://www.cs.tut.fi/~ypsilon/80545/FundamentalsOfDA.html#HDR0
[6] Dithering and Noise Shaping: The Basics
http://www.glowingcoast.co.uk/audio/theory/dither/index.htm
[7] Multimedia Communications : Applications, Network, Protocols and Standards
Fred Halsall. Addison-Wesley, 2001
[8] Digital video and audio compression
Stephen J. Solari. McGraw-Hill, 1997
[9] Basics about MPEG Perceptual Audio Coding: The purpose of Audio Compression
Fraunhofer Institute, http://www.iis.fhg.de/amm/techinf/basics.html
[10] Principles in Low Bitrate Coding
http://www.cs.tut.fi/~ypsilon/80545/WBACoding.html#HDR1
[11] Introduction to Multimedia
http://www.cs.cf.ac.uk/Dave/Multimedia/
[12] Coding of Audio Signals
http://www.cs.tut.fi/~ypsilon/80545/CodingOfAS.html#HDR3
[13] Bitrate scalability
http://www.scalatech.co.uk/technology.htm
[14] Perception Based Bit Allocation Algorithms for Audio Coding
Stephen Voran. Institute for Telecommunication Sciences, U.S.
http://www.its.bldrdoc.gov/home/programs/audio/pdf/waspaa97.pdf
[15] Wideband Speech and Audio Coding
Esin Darici Haritaoglu, http://www.umiacs.umd.edu/~desin/Speech1/node13.html
[16] Multilevel data compression techniques for transmission of audio over networks
Wunnava, S.V.; Craig Chin. Proceedings, IEEE, 2001. Page(s): 234–238
[17] Constant and Variable Bitrate Encoding
http://service.real.com/help/faq/rjbvbrfaq.html
[18] The MPEG Home Page
http://mpeg.telecomitalialab.com/
[19] An Overview of the MPEG/audio Compression Algorithm
Pan, D. Applications of Signal Processing to Audio and Acoustics, 1991,
IEEE ASSP Workshop on. Page(s): 0_79–0_80
[20] A tutorial on MPEG/audio compression
Pan, D. IEEE Multimedia, Volume: 2, Issue: 2, Summer 1995. Page(s): 60–74
[21] Subband Coding Tutorial
http://www.otolith.com/pub/u/howitt/sbc.tutorial.html
[22] MPEG Audio Layer-3
Fraunhofer Institute, http://www.iis.fhg.de/amm/techinf/layer3/
[23] An Introduction to MPEG Layer-3
K. Brandenburg, H. Popp. Fraunhofer Institut für Integrierte Schaltungen (IIS)
[24] Short MPEG-2 description
Leonardo Chiariglione, ISO/IEC JTC1/SC29/WG11 N, MPEG 00/, October 2000
http://mpeg.telecomitalialab.com/standards/mpeg-2/mpeg-2.htm
[25] Multimedia networks: fundamentals and future directions
Nalin Sharda, Victoria Univ. of Technology. 1999
[26] MPEG Audio Frame Header
http://www.dv.co.yu/mpgscript/mpeghdr.htm
[27] The private life of MP3 frames
http://www.id3.org/mp3frame.html
[28] Tagging introduction
http://www.id3.org/intro.html
[29] MPEG-2 AAC
Fraunhofer Institute, http://www.iis.fhg.de/amm/techinf/aac/index.html
[30] Advanced Audio Coding
http://www.aac-audio.com/technology/
[31] Source Coding of Audio
Technologies and Services on Digital Broadcasting,
http://www.nhk.or.jp/strl/publica/bt/en/le0010-1.html
[32] RTP Payload Format for MPEG-2 AAC Streams
IETF, 1999
http://www.cs.columbia.edu/~hgs/rtp/drafts/draft-ietf-avt-rtp-mpeg2aac-00.txt
[33] MP3 and AAC explained
Karlheinz Brandenburg, Fraunhofer Institute for Integrated Circuits
http://www.ece.cmu.edu/~ee545/MP3/docs/mp3_explained.pdf
[34] High quality audio for multimedia: key technologies and MPEG standards
Noll, P. Global Telecommunications Conference, 1999. GLOBECOM '99, Volume: 4,
1999. Page(s): 2045–2050, vol. 4
[35] MPEG-4 Overview
ISO/IEC JTC1/SC29/WG11 N4668, March 2002
http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm
[36] MPEG-4 Audio
Fraunhofer Institute, http://www.iis.fhg.de/amm/techinf/mpeg4/index.html
[37] MPEG-4 AAC
Steve Church, Telos Systems.
http://www.broadcastpapers.com/audio/TelosAAC04.htm
[38] MPEG-4 Natural Audio Coding
Karlheinz Brandenburg, Oliver Kunz, Akihiko Sugiyama
http://leonardo.telecomitalialab.com/icjfiles/mpeg-4_si/9-natural_audio_paper/index.html
[39] Applications of MPEG-4: digital multimedia broadcasting
Grube, M.; Siepen, P.; Mittendorf, C.; Boltz, M.; Srinivasan, M. Consumer Electronics,
IEEE Transactions on, Volume: 47, Issue: 3, Aug. 2001. Page(s): 474–484
[40] MPEG Audio FAQ. MPEG-4 Audio: coding of natural and synthetic sound
http://www.tnt.uni-hannover.de/project/mpeg/audio/faq/mpeg4.html
[41] AAC Low Delay
Fraunhofer Institute, http://www.iis.fhg.de/amm/techinf/mpeg4/aac_ld.html
[42] MPEG-4 General Audio Coding
Jürgen Herre, Fraunhofer Institute for Integrated Circuits (IIS)
http://www.tnt.uni-hannover.de/project/mpeg/audio/general/aes106_1-GeneralAudio.pdf
[43] Overview of the MPEG-7 Standard (version 6.0)
ISO/IEC JTC1/SC29/WG11. N4509. Pattaya, December 2001
http://mpeg.telecomitalialab.com/standards/mpeg-7/mpeg-7.htm#_Toc533998965
[44] MPEG Audio FAQ. MPEG-7: description of meta-information on sound
http://www.tnt.uni-hannover.de/project/mpeg/audio/faq/mpeg7.html#7A
[45] MPEG-7 Applications Document v.10
ISO/IEC JTC1/SC29/WG11/N3934. January 2001/Pisa.
http://ipsi.fhg.de/delite/Projects/MPEG7/Documents/W3934.htm
[46] MPEG-21 Overview v.4
ISO/IEC JTC1/SC29/WG11/N4801. Fairfax, May 2002
http://mpeg.telecomitalialab.com/standards/mpeg-21/mpeg-21.htm
[47] About Windows Media Audio Codec
Starr Andersen Microsoft Corporation. Updated October 2000
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/msaudio.asp
[48] Real Audio
www.realaudio.com
[49] Internet radio and excellent audio quality: dreamboat or reality?
Stoll, G.; Felderhoff, U.; Spikofski, G. International Broadcasting Convention, 1997.
Page(s): 192–201
[50] Compressed Audio vs. CDs: Can You Tell the Difference?
Ramon Mcleod and Richard Baguley
http://webcenter.pcworld.aol.com/computing/aolcom/article/0,aid,64123,pg,3,00.asp
[51] WMA vs. MP3 –3
Peter Bridger,
http://www.hardwarecentral.com/hardwarecentral/reviews/2606/7/
[52] Which is the best low-bitrate audio compression algorithm? OGG vs. MP3 vs.
WMA vs. RA
Panos Stokas, http://ekei.com/audio/
[53] Comparison Testing of Codecs For Microsoft Windows Media Technologies 4.0,
RealNetworks RealSystem G2, and MP3
April, 1999, http://www.nstl.com/downloads/Final_MSAudio_Report.pdf
[54] Music, Internet Audio, Web Design
http://www.midisound.com/index.html
[55] WMA, MP3, OGG, VQF, WAV
http://www.litexmedia.com/article/
[56] MPEG-4 Audio verification test results: Audio on Internet
Eric Scheirer, Sang-Wook Kim, Martin Dietz. Source: Audio and Test subgroup.
ISO/IEC JTC1/SC29/WG11. MPEG98/N2425. Atlantic City - October 1998
[57] MP3 encoders comparison
http://members.tripod.com/Milaa/mp3Comparison/encoders.html
[58] Report on the MPEG-2 AAC Stereo Verification Tests
David Meares, BBC R&D, Kingswood Warren, UK; Kaoru Watanabe, NHK, Tokyo,
Japan; Eric Scheirer, MIT Media Labs, USA. ISO/IEC JTC1/SC29/WG11. N2006.
February 1998
[59] Audio File Formats FAQ
Chris Bagwell, http://home.attbi.com/~chris.bagwell/AudioFormats.html#
[60] Introduction to Multimedia Systems
Gaurav Bhatnagar, Shikha Mehta and Sugata Mitra. Academic Press, 2001
[61] A comparison of Internet audio compression formats
Andrew Pam, http://www.sericyb.com.au/audio.html
[62] Music formats
http://www.wotsit.org/search.asp?s=music
[63] Networking fundamentals
http://www.streamdemon.co.uk/structure.html
[64] How Internet Infrastructure Works
Jeff Tyson, http://www.howstuffworks.com/internet-infrastructure.htm
[65] Digital multimedia
Nigel Chapman and Jenny Chapman. John Wiley & Sons, 2000
[66] White paper- Streaming media technology
Steve Cresswell, June 2000
http://whitepapers.smart421.com/smart421-streamingmedia.pdf
[67] Streaming media
AT&T. July 2000,
http://www.ipservices.att.com/techviews/whitepapers/StreamingMedia.pdf
[68] Streaming media frequently asked questions
http://www.kexp.org/listen/faq.htm
[69] Streaming file formats
http://www.streamdemon.co.uk/avdata.html
[70] Internet terms: Streaming
http://www.clienthelpdesk.com/dictionary/streaming.html
[71] Streaming optimization
http://howto.lycos.com/lycos/step/1,,1+11+26063+24067+9945,00.html
[72] Multicast Streaming: An Introduction
http://www.microsoft.com/windows/windowsmedia/serve/multiwp.asp
[73] Multicast communication: Protocols and applications
Ralph Wittmann and Martina Zitterbart. Morgan Kaufmann Publishers, 2001
[74] Multicast streaming
QuickTime API Documentation
http://developer.apple.com/techpubs/quicktime/qtdevdocs/REF/Streaming.4.htm
[75] IP Multicast
http://www.mbone.ru/tech/intro.shtml
[76] MBONE: Multicasting Tomorrow's Internet
Kevin Savetz, Neil Randall, and Yves Lepage,
http://www.savetz.com/mbone/toc.html
[77] Developing IP Multicast Networks
Volume I, Beau Williamson. Cisco Press, 2000
[78] A Mechanism for Multicast Multimedia Data with adaptive QoS Characteristics
Christos Bouras and A.Gkamas, 2001
[79] Multichannel Splitting algorithm for AAC and AAC-LD encoded audio
Anton Thimet and Joseph Zolyak
http://www.telos-systems.com/?/techtalk/split/default.htm
[80] Streaming Media Optimization with CacheFlow Internet Caching Appliances
http://www.cacheflow.com/technology/whitepapers/streaming.cfm
[81] Error Spreading: A Perception-Driven Approach to Handling Error in Continuous
Media Streaming
Srivatsan Varadarajan, Hung Q. Ngo, and Jaideep Srivastava
[82] Packet loss resilient, scalable audio compression and streaming for IP networks
Leslie, B.; Sandler, M. 3G Mobile Communication Technologies, 2001,
Second International Conference on (Conf. Publ. No. 477), 2001. Page(s): 119–123
[83] Computer networks : a systems approach
Larry L. Peterson & Bruce S. Davie,
The Morgan Kaufmann Series in Networking. Second Edition, 2000
[84] Multimedia Servers : Applications, environments and design
Dinkar Sitaram and Asit Dan
The Morgan Kaufmann Series in Multimedia Information and Systems, 2000
[85] Internet Protocols
http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/ip.htm
[86] RFC 791: Internet protocol
September 1981, http://www.ietf.org/rfc/rfc791.txt
[87] RFC 792: Internet Control Message Protocol
September 1981, http://www.ietf.org/rfc/rfc792.txt
[88] IP Audio
INET'99 from the Internet Society
http://www.isoc.org/isoc/conferences/inet/99/proceedings/4p/index.htm
[89] Protocol directory
http://www.protocols.com/pbook/index.htm
[90] RFC 1883: Internet Protocol, Version 6 (IPv6)
December 1995, http://www.ietf.org/rfc/rfc1883
[91] RFC 2463: Internet Control Message Protocol (ICMPv6) for the Internet Protocol
Version 6 (IPv6)
December 1998, http://www.ietf.org/rfc/rfc2463
[92] RFC 2365: Administratively Scoped IP Multicast
July 1998, http://www.ietf.org/rfc/rfc2365.txt
[93] RFC 793: Transmission Control Protocol
September 1981, http://www.ietf.org/rfc/rfc793.txt
[94] A TCP/IP Tutorial
T. Socolofsky, C. Kale. Spider Systems. January 1991,
http://www.faqs.org/rfcs/rfc1180.html
[95] TCP-friendly Congestion Control for Real-time Streaming Applications
Deepak Bansal and Hari Balakrishnan
M.I.T. Laboratory for Computer Science. Cambridge, MA 02139
[96] Advanced internet technologies
Uyless Black. Prentice Hall series in advanced communications technologies, 1999
[97] RFC 768: User Datagram Protocol
J. Postel, ISI. August 1980, http://www.ietf.org/rfc/rfc768
[98] Time-lined TCP for the TCP-friendly Delivery of Streaming Media
Biswaroop Mukherjee and Tim Brecht. Department of Computer Science, University
of Waterloo, Waterloo, Ontario, Canada N2L 3G1
Appears in the Proceedings of the International Conference on Network Protocols
(ICNP), Osaka, Japan, pp. 165–176, November 2000
http://www.nmsl.cs.ucsb.edu/~ksarac/icnp/2000/papers/2000-15.pdf
[99] RFC 1889: RTP: A Transport Protocol for Real-Time Applications
H. Schulzrinne (GMD Fokus), S. Casner (Precept Software, Inc.), R. Frederick (Xerox
Palo Alto Research Center), V. Jacobson (Lawrence Berkeley National Laboratory).
IETF, January 1996, http://www.ietf.org/rfc/rfc1889.txt
[100] Protocol ensures safer multimedia delivery
John Walker and Jeff Hicks. Network World, 1999
http://www.nwfusion.com/news/tech/1101tech.html
[101] Streaming media protocols
Neil Ridgway,
http://www.mmrg.ecs.soton.ac.uk/publications/archive/ridgway1998/html/node26.html
[102] Some frequently asked questions about RTP
http://www.cs.columbia.edu/~hgs/rtp/faq.html
[103] RTP Payload Format for MPEG-2 AAC Streams
Kretschmer-AT&T/Basso-AT&T, Civanlar-AT&T/Quackenbush-AT&T,
Snyder-AT&T. IETF, June 25, 1999
http://www.cs.columbia.edu/~hgs/rtp/drafts/draft-ietf-avt-rtp-mpeg2aac-00.txt
[104] RTP Payload Format for MPEG-4 Streams
Civanlar-AT&T/Basso-AT&T, Casner-Packet Design, Herpel-Thomson/Perkins-ISI.
IETF, July 13, 2000
http://www.cs.columbia.edu/~hgs/rtp/drafts/draft-ietf-avt-rtp-mpeg4-03.txt
[105] RFC 3016: RTP Payload Format for MPEG-4 Audio/Visual Streams
Y. Kikuchi (Toshiba), T. Nomura (NEC), Fukunaga (Oki), Y. Matsui (Matsushita),
H. Kimata (NTT). IETF, November 2000
http://www.faqs.org/rfcs/rfc3016.html
[106] A More Loss-Tolerant RTP Payload Format for MP3 Audio
R. Finlayson. IETF, June 2001, http://www.live.com/rtp-mp3/rtp-mp3.txt
[107] RFC 2250: RTP Payload Format for MPEG1/MPEG2 Video
D. Hoffman, G. Fernando, V. Goyal, M. Civanlar. IETF. January 1998
http://www.networksorcery.com/enp/rfc/rfc2250.txt
[108] RSVP Protocol Overview
http://www.isi.edu/div7/rsvp/overview.html
[109] RFC 2205: Resource ReSerVation Protocol (RSVP)
R. Braden, Ed., L. Zhang, S. Berson, S. Herzog, S. Jamin. IETF, September 1997
http://www.ietf.org/rfc/rfc2205.txt
[110] RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1
R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee
IETF, June 1999, http://www.ietf.org/rfc/rfc2616.txt
[111] Streaming, Transmission and other Network Protocols
http://streamingmedialand.com/technology.html
[112] RFC 2326: Real Time Streaming Protocol (RTSP)
H. Schulzrinne, A. Rao, R. Lanphier. IETF, April 1998, www.ietf.org/rfc/rfc2326.txt
[113] Multimedia Applications
Marcel Waldvogel, http://classes.cec.wustl.edu/~cs423/FL2000/Chapter6a/img0.html
[114] RFC 2327: SDP: Session Description Protocol
M. Handley, V. Jacobson. IETF, April 1998, http://www.ietf.org/rfc/rfc2327.txt
[115] RFC 2974: Session Announcement Protocol
M. Handley, C. Perkins, E. Whelan. IETF, October 2000
http://www.ietf.org/rfc/rfc2974.txt
[116] RFC 2543: SIP: Session Initiation Protocol
M. Handley, H. Schulzrinne, E. Schooler, J. Rosenberg. IETF, March 1999
http://www.ietf.org/rfc/rfc2543.txt
[117] Streaming Methods: Web Server vs. Streaming Media Server
http://www.microsoft.com/windows/windowsmedia/compare/webservvstreamserv.asp
[118] Web Developer.com Guide to Streaming Multimedia
José Alvear. Wiley Computer Publishing, 1998
[119] Customized Internet radio
Venky Krishnan and S. Grace Chang. Hewlett-Packard Laboratories
[120] Video and image processing in multimedia systems
Borko Furth, Stephen W. Smoliar and Hongjiang Zhang
Kluwer Academic Publishers, 1995
[121] Management of multimedia on the internet
4th IFIP/IEEE International Conference on Management of Multimedia Networks and
Services, MMNS 2001, Chicago, IL, USA, October/November 2001
[122] Multimedia Information Networking
Nalin K. Sharda. Prentice Hall, 1999
[123] Internet radio
http://www.radioenthusiast.com/internet_radio.htm
[124] Streaming audio@Internet: Perspectives for the broadcasters
Gerhard Stoll. Institut für Rundfunktechnik, IRT GmbH, München, Germany
http://radio.irt.de/aida/docs/AES17Sto.pdf
[125] From Broadcasting to Webcasting: The EBU experience on Audio@Internet
Gerhard Stoll. Institut für Rundfunktechnik, München, Germany
IBC’2000 Tutorial Day “The Internet Delivery”, 7 Sept. 2000, Amsterdam
http://radio.irt.de/aida/docs/IBC2000.pdf
[126] ISMA: Internet Streaming Media Alliance
http://ism-alliance.tv/index.html
[127] Internet Society
http://www.isoc.org
[128] Music on the Internet and the intellectual property protection problem
Lacy, J.; Snyder, J.H.; Maher, D.P. Industrial Electronics, 1997. ISIE '97,
Proceedings of the IEEE International Symposium on, Volume: 1, 1997.
Page(s): SS77–SS83, vol. 1
[129] The development of an interactive music station over Internet
Chung-Ming Huang; Pei-Chuan Liu; Pi-Fung Shih. Information Networking, 2001.
Proceedings, 15th International Conference on, 2001. Page(s): 279–284
[130] Issues in Internet Radio
Yasushi Ichikawa, Kensuke Arakawa, Keisuke Wano, and Yuko Murayama, 2002
[131] Audio streaming on the Internet. Experiences with real-time streaming of audio
streams
Jonas, K.; Kanzow, P.; Kretschmer, M. Industrial Electronics, 1997. ISIE '97,
Proceedings of the IEEE International Symposium on, Volume: 1, 1997.
Page(s): SS71–SS76, vol. 1
[132] Beethoven.com Web Site Statistics
http://www.beethoven.com/statistics.htm
[133] Clare FM Audience Statistics
http://www.clarefm.ie/about/perf.htm#jnlr
[134] The Lion 90.7fm Webcasting Statistics
http://www.lion-radio.com/webstats.php
[135] FAQs about Streaming Server
http://developer.apple.com/darwin/projects/streaming/faq.html
[136] Windows Media Technologies
http://www.microsoft.com/windows/windowsmedia/default.asp
[137] Real Networks
www.real.com, www.realnetworks.com, http://service.real.com/
[138] Shoutcast
http://www.shoutcast.com/, http://www.shoutcast.com/support/docs/
[139] Web Radio to Resume Tunes in July
Frank Thorsberg, PCWorld.com
http://webcenter.pcworld.icq.com/computing/icq/article/0,aid,54328,00.asp
[140] Icecast
www.icecast.org
[141] Networks and transfer protocols
Final Report of the EBU / SMPTE Task Force for Harmonized Standards for the
Exchange of Television Programme Material as Bitstreams
http://www.ebu.ch/pmc_tfrep_part_5.pdf
[142] Webopedia
http://www.webopedia.com
[143] Creating Streaming Media at CSU
http://csu.colstate.edu/webdevelop/streamingmedia/#Recording%20the%20Media
[144] Streaming audio tutorial
http://hotwired.lycos.com/webmonkey/00/45/index3a_page5.html?tw=multimedia
[145] Live 365
http://www.live365.com/home/index.live
[146] Wired Planet
http://www.wiredplanet.com/
[147] Streaming Media tutorial
http://www.doit.wisc.edu/services/streaming/tutorial/tutorial3.htm
[148] Windows Media Encoder
http://www.visionlb.ca/ISE/Sarpal/WindowMediaEncoder7UserManual.doc
[149] Real System G2 production guide
http://csu.colstate.edu/webdevelop/streamingmedia/realsys/production/
[150] The WebDeveloper.com Secret Guide to RealAudio
David Fiedler,
http://www.webdeveloper.com/multimedia/multimedia_guide_realaudio.html
[151] RealSystem Production Guide With RealOne Player
http://service.real.com/help/library/guides/realone/ProductionGuide/HTML/realpgd.htm
[152] Players comparison overview
http://www.microsoft.com/windows/windowsmedia/press/compare/playcomp.asp
[153] Servers and delivery comparison overview
http://www.microsoft.com/windows/windowsmedia/press/compare/servercomp.asp
[154] Comparative Cost Analysis: Windows Media Technologies Vs. Real System G2
http://www.microsoft.com/windows/windowsmedia/compare/wmtcostcompare.asp
[155] Microsoft Windows Media and RealNetworks RealSystem Feature Comparison
http://www.approach.com/content/expertise/digital.asp
[156] Microsoft: Windows Media Player for Windows XP versus Real Jukebox 2.0 Basic
and RealONE performance comparison test
http://www.etestinglabs.com/main/reports/mswxpmp8.pdf
[157] Icecast - Streaming media server based on the MP3 audio codec
http://www.gnu.org/directory/Audio/Mp3/icecast.html
[158] RealNetworks shares code, streams Windows format
Joris Evers. IDG News Service, 07/22/02
http://www.nwfusion.com/news/2002/0722realshare.html
[159] How open is RealNetworks' new "open" software?
Adam Gaffin. Network World Fusion, 07/24/02
http://napps.nwfusion.com/compendium/archives/00000210.html
[160] Webcasting Royalty Rates Set--For Now
Scarlet Pruitt, IDG News Service
http://webcenter.pcworld.icq.com/computing/icq/article/0,aid,102146,00.asp
[161] The final (for some) report
Scott Bradner. Network World, 07/01/02
http://www.nwfusion.com/columnists/2002/0701bradner.html
[162] A Review of Video Streaming over the Internet
Jane Hunter , Varuni Witana , Mark Antoniades. SuperNOVA Project
DSTC Technical Report TR97-10, August 1997
http://archive.dstc.edu.au/RDU/staff/jane-hunter/video-streaming.html
[163] Streaming multimedia data
http://www.teamsolutions.co.uk/streaming.html
[164] Comparing the QoS of Internet Audio Mechanisms via Formal Methods
Alessandro Aldini, Roberto Gorrieri, Marco Roccetti and Marco Bernardo
[165] Adaptive playout mechanisms for packetized audio applications in wide-area
networks
Ramjee, R., Kurose, J., Towsley, D., and Schulzrinne, H., 1994. In Proceedings of
the Conference on INFOCOM ’94 (Montreal, Canada).
[166] Design and experimental evaluation of an adaptive playout delay control mechanism
for packetized audio for use over the internet
Roccetti, M., Ghini, V., Pau, G., Salomoni, P., and Bonfigli, M., 1999
[167] Packet audio playout delay adjustment: Performance bounds and algorithms
Moon, S.B., Kurose, J., and Towsley, D. 1998
Acronyms
AAC: Advanced Audio Coding.
AAC LC: Advanced Audio Coding Low Complexity.
ABR: Available Bitrate.
ACK: Acknowledgement.
ADC: Analog-to-Digital Converter.
ADIF: Audio Data Interchange Format.
ADPCM: Adaptive Differential Pulse Code Modulation.
ADTS: Audio Data Transport Stream.
AIFF: Audio Interchange File Format.
ASF: Advanced Streaming Format.
ASPEC: Adaptive Spectral Perceptual Entropy Coding.
ASR: Automatic Speech Recognition.
ATM: Asynchronous Transfer Mode.
AVI: Audio Video Interleave.
BMP: Bit-mapped graphics format.
BW: Bandwidth.
CBR: Constant Bitrate.
CCITT: Comité Consultatif International Téléphonique et Télégraphique.
CD: Compact Disk.
CELP: Code Excited Linear Prediction.
Codec: Coder – Decoder.
CPU: Central Processing Unit.
CRC: Cyclic Redundancy Check.
DAB: Digital Audio Broadcasting.
DCT: Discrete Cosine Transform.
DFT: Discrete Fourier Transform.
DNAS: Distributed Network Audio Server.
DPCM: Differential Pulse Code Modulation.
DSM-CC: Digital Storage Media Command and Control.
DST: Discrete Sine Transform.
DVD: Digital Versatile Disk.
DWT: Discrete Wavelet Transform.
ETSI: European Telecommunications Standards Institute.
FFT: Fast Fourier Transform.
FTP: File Transfer Protocol.
HILN: Harmonic and Individual Lines plus Noise.
HTML: HyperText Markup Language.
HTTP: Hypertext Transfer Protocol.
IAB: Internet Architecture Board.
ICMP: Internet Control Message Protocol.
IEC: International Electrotechnical Commission.
IETF: Internet Engineering Task Force.
IGMP: Internet Group Management Protocol.
IP: Internet Protocol.
IPv4: Internet Protocol, Version 4.
IPv6: Internet Protocol, Version 6.
ISDN: Integrated Services Digital Network.
ISMA: Internet Streaming Media Alliance.
ISN: Initial Sequence Number.
ISO: International Organization for Standardization.
ISOC: Internet Society.
ISP: Internet Service Provider.
ITU: International Telecommunication Union.
JPEG: Joint Photographic Experts Group.
KBD: Kaiser-Bessel Derived.
Kbps: Kilobits per second.
kHz: Kilohertz.
LAN: Local Area Network.
LD: Low Delay.
LFE: Low Frequency Enhancement.
LPC: Linear Predictive Coding.
LTP: Long Term Prediction.
LZW: Lempel-Ziv-Welch.
MDCT: Modified Discrete Cosine Transform.
MMS: Microsoft Media Server.
MMSU: MMS over UDP.
MMST: MMS over TCP.
MPEG: Moving Picture Experts Group.
MS: Mid/Side (stereo coding).
MSBD: Microsoft Media Stream Broadcast Distribution Protocol.
MUSICAM: Masking pattern adapted Universal Subband Integrated Coding
And Multiplexing.
NS: Network Simulator.
PASC: Precision Adaptive Sub-band Coding.
PCM: Pulse Code Modulation.
PEAQ: Perceptual Evaluation of Audio Quality.
PNS: Perceptual Noise Substitution.
PQMF: Polyphase Quadrature Mirror Filterbank.
PS: Program Stream.
QoS: Quality of Service.
RA: Real Audio.
RBN: Real Broadcast Network.
RFC: Request for Comments.
RSVP: Resource Reservation Protocol.
RTCP: RTP Control Protocol.
RTP: Real-time Transport Protocol.
RTSP: Real Time Streaming Protocol.
SAP: Session Announcement Protocol.
SBC: Subband Coding.
SCFSI: Scale Factor Selection Information.
SDP: Session Description Protocol.
SIP: Session Initiation Protocol.
SMIL: Synchronized Multimedia Integration Language.
SMR: Signal-to-Mask Ratio.
TCP: Transmission Control Protocol.
TNS: Temporal Noise Shaping.
TS: Transport Stream.
TwinVQ: Transform-domain Weighted interleaved Vector Quantization.
UBR: Unspecified Bitrate.
UDP: User Datagram Protocol.
URL: Uniform Resource Locator.
VBR: Variable Bitrate.
WMA: Windows Media Audio.