Web Radio: Technology and Performance

Paula Cortés Camino

LITH-ISY-EX-3327-2002
2002-08-30

Division: Bildkodning (Image Coding Group), Institutionen för systemteknik (Division of Electrical Engineering)
Date: August 2002
Language: English
Report category: Examensarbete (Master's thesis)
ISRN: LiTH-ISY-EX-3327-2002
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2002/3327/

Titel (Swedish title): Web-radio: Tekniker och prestanda
Title: Web Radio: Technology and Performance
Author: Paula Cortés Camino

Abstract

We review some popular techniques involved in web radio, from the compression methods (both open and proprietary) to the network protocols. The implementation of a web radio station is also addressed.

Keywords: Web radio, streaming audio, Windows Media, RealAudio, SHOUTcast, MP3, AAC, WMA, audio coding, RTP, RTSP

Master thesis in the Image Coding Group
Linköping Institute of Technology

Supervisor: Dr. Robert Forchheimer, Image Coding Group
Examiner: Dr. Robert Forchheimer, Image Coding Group

Linköping, August 2002

Acknowledgements

I would like to express my sincere gratitude to Dr. Robert Forchheimer for accepting me into the Image Coding Group. He found the right project for me and helped me throughout with his suggestions and corrections. From the first time we met I had an excellent impression of him, not only as a professor but also as a person, an impression that has only improved with time and the talks we have had. Thanks also to Peter for his effort as my opponent, and to Jonas for his corrections. Thanks to all the people who have made my time in Linköping so special and unforgettable, my friends from the university and my host family.
There are many people I have to be grateful to for being so close to me in spite of the distance: all my friends from Castillejo del Romeral, Guadalajara, my university, all my family... the list would be endless. I want to dedicate this thesis to my parents and my "mormor" (abuela, grandmother). They have always been there, listening to me when I have needed it, and also here visiting me. Also to Jesús, who has been with me during the summer; without his support this thesis (and many other things) would never have been possible. I love all of you so much. TACK SÅ MYCKET! ¡MUCHAS GRACIAS! (Thank you so much!)

Linköping, August 2002

Abstract

The march of electronic technology and the explosive growth of the Internet are changing how audio communications are delivered. For decades radio was delivered through analog signals sent over the airwaves, but this is being transformed. Today, streaming technologies allow more and more people to listen to their favorite radio station through the Internet. The development of audio broadcasting via the web is probably the biggest revolution in broadcasting since the advent of FM. Web radio is radio with a lot of potential, with the ability to hear stations of all formats, musical genres, and political and social orientations, from every part of the world. Web radio also differs greatly from radio heard by other means, in terms of quality, quantity, and variety. It is becoming more and more popular, and in the future it may even replace traditional radio.

The aim of this thesis is to review the techniques involved in web radio, from the compression methods (both open and proprietary) to the network protocols. The implementation of a web radio station is also addressed. Before the audio gets to the listeners, it undergoes many transformations, jitter, delays, etc., that degrade its quality. At the end of the thesis the main performance parameters and factors affecting the quality of real-time audio are reviewed, and some measurements of quality parameters are analyzed.
Key words: Audio, Compression, Bandwidth, Streaming, Broadcasting, Web Radio Station, RTSP, RTP, Streaming Server, Player, Encoder, Web Server, Audio File Format, QoS, Multicasting.

Contents

1 Introduction 1
  1.1 Disposition of the Thesis 1
  1.2 Audio Overview 2
2 Audio Compression 5
  2.1 Classification of Compression Techniques 6
    2.1.1 Lossless Compression Techniques 7
    2.1.2 Lossy Compression Techniques 7
  2.2 Audio Compression on the Internet 10
    2.2.1 MPEG 10
    2.2.2 Windows Media Audio 26
    2.2.3 RealAudio 26
3 Compression Techniques Comparison 29
  3.1 Coders Comparison 29
  3.2 Formats Comparison 35
  3.3 Final Conclusions 39
4 Internet Audio Transmission 41
  4.1 Introduction 41
  4.2 Streaming Audio 42
    4.2.1 Overview 42
    4.2.2 Streaming Improvement 44
  4.3 Broadcasting 50
  4.4 Multimedia Protocols 50
    4.4.1 Physical/Data Link Layer 51
    4.4.2 Internet Layer 51
    4.4.3 Transport Layer 54
    4.4.4 Session/Presentation/Application Layer 61
    4.4.5 Asynchronous Transfer Mode (ATM) 64
  4.5 Media Servers 65
    4.5.1 Web Servers 65
    4.5.2 Streaming Servers 67
    4.5.3 Web Servers vs. Streaming Servers 68
  4.6 Standards Organizations 68
5 Web Radio Stations 71
  5.1 Introduction 71
  5.2 How do they work? 72
6 Implementing a Web Radio Station 75
  6.1 Elements 75
  6.2 Steps 76
  6.3 Commercial Tools 79
    6.3.1 Windows Media 80
    6.3.2 Real Networks 81
    6.3.3 SHOUTcast and Icecast 86
  6.4 Comparisons 87
7 Performance 89
8 Conclusions 97
Appendix: Audio File Formats 99
Acronyms 101
References 115

1 Introduction

In this chapter, the thesis is presented and an overview of audio processing and transmission is given.

1.1 Disposition of the Thesis

The march of electronic technology and the explosive growth of the Internet [119] are changing how audio communications are delivered. For decades radio was delivered through analog signals sent over the airwaves, but this is being transformed. Today, streaming technologies allow more and more people to listen to their favorite radio station through the Internet. Web radio is radio with a lot of potential, with the ability to hear stations of all formats, musical genres, and political and social orientations, from every part of the world. Web radio also differs greatly from radio heard by other means, in terms of quality, quantity, and variety [123]. In the future it may even replace traditional radio.

This thesis presents an overview of Internet radio systems. Chapter 1 reviews the processing of audio, from analog sound to packets ready to be transmitted through the Internet. Chapter 2 emphasizes the necessity of compression in Internet audio transmission, classifies compression techniques and presents the main techniques used nowadays.
In Chapter 3 both audio formats and the encoders available for each format are compared. In Chapter 4 the actual transmission of the audio packets through the Internet is addressed, introducing the concept of streaming and the protocols used; the chapter ends by taking a look at the standards organizations for audio compression and transmission. Chapter 5 introduces web radio and explains the general operation of a web radio station. Chapter 6 covers the process of creating an Internet radio station with three of the main streaming systems: SHOUTcast/Icecast, Windows Media and Real Networks. Factors affecting the QoS of web radio stations are studied in Chapter 7, and some tests regarding performance are analyzed. The thesis finishes in Chapter 8 with conclusions about the current state and the future of Internet radio, together with future work. The appendix lists the main audio file formats found on the Internet and their main characteristics.

1.2 Audio Overview

Sounds are rapid pressure variations in the atmosphere, such as are produced by many natural processes or man-made systems. The human ear responds to atmospheric pressure variations in the frequency range between 20 Hz and about 20 kHz, which is called the audio bandwidth. Sound is analog by nature, but it is best converted into a digital signal before transmission. The conversion from analog to digital is done by an ADC (Analog-to-Digital Converter), which filters, samples, quantizes and encodes the audio. In this digitizing process, some things should be taken into account to obtain high audio quality. In the sampling stage, a higher sampling frequency increases fidelity. The sampling frequency is the number of times per second the audio signal is sampled.
The sampling frequency is related to the audio signal bandwidth by the Nyquist limit, which states that the sampling frequency should be at least two times the maximum frequency of the signal (i.e., its bandwidth). When recording music, the choice of the sampling frequency is crucial, since musical instruments produce a wide range of frequencies. The sampling frequency should be above 40 kHz, giving a highest reproducible frequency of 20 kHz, approximately the top of the human hearing range [2]. When recording speech, however, a lower sampling frequency is enough [3]. Table 1.1 summarizes some sampling frequencies and their common use [2].

Sampling frequency | Common use
11.1 kHz           | Minimum quality currently used on personal computers
22.050 kHz         | Very common in computer sound file formats
24 kHz             | Minimum acceptable quality needed for speech recognition
44.1 kHz           | The standard for audio compact discs and high quality personal computer sound ("CD quality")

Table 1.1: Sampling frequencies and their common use.

Quantization refers to the process of approximating the continuous set of values in the data with a finite set of values. The instantaneous amplitude of the analog signal at each sampling instant is rounded off to the nearest of several predetermined levels. The number of levels is usually a power of 2, so that these values can be represented by three, four, five, or more bits when encoding. There are two types of quantization, scalar and vector quantization. In scalar quantization, each input symbol is treated separately to give the output, while in vector quantization the input symbols are combined together in groups called vectors, and then processed to give the output [4]. Since this is a process of approximation, it introduces quantization error. By increasing the word length (the number of bits used to encode each sample), the quantization error can be reduced.
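The effect of word length on quantization error can be sketched numerically. The following is a minimal illustration of my own (not from the thesis): it uniformly quantizes a sine tone at two word lengths and measures the resulting signal-to-noise ratio, which improves by roughly 6 dB for each added bit.

```python
import math

def quantize(x, bits):
    # Uniform quantizer for samples in [-1.0, 1.0): 2**bits levels
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

fs = 44100  # sampling frequency (Hz)
# One second of a 1 kHz sine tone at 90% full scale
samples = [0.9 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]

snr = {}
for bits in (8, 16):
    sig = sum(s * s for s in samples)
    err = sum((s - quantize(s, bits)) ** 2 for s in samples)
    snr[bits] = 10 * math.log10(sig / err)
    print(f"{bits}-bit: SNR = {snr[bits]:.1f} dB")   # ~49 dB and ~97 dB
```

The measured values track the familiar rule of thumb SNR ≈ 6.02·b + 1.76 dB for a full-scale sine with b-bit quantization.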
In large-amplitude signals the correlation between the signal and the quantization error is small; the error is random and sounds like analog white noise. However, in low-level signals the error becomes correlated with the signal, which leads to distortion [5]. To decorrelate this error, a technique called "dithering" is used. Dithering distributes the error across the entire spectrum by adding some noise prior to quantization. This method raises the noise floor equally at all frequencies; however, since the ear is not equally sensitive to all frequencies, it makes sense to push the majority of this dither noise to frequencies where the ear is least sensitive, and remove noise where the ear is most sensitive. This is done by another technique called "noise shaping" [6]. The complete approach to digitizing is called PCM (Pulse Code Modulation) [7].

The digital audio is then stored in files in a standard format. However, the size of the files can be quite large [3], so the files usually need to be compressed. Once the audio file is compressed, it is divided into packets before transmitting it through the Internet. The conventional process of transferring and listening to an audio file involves first transferring the file from one computer to the next, or downloading it from a server via FTP (File Transfer Protocol). Another option is to listen to an audio file as it is transferred (real-time transfer), using a technique called streaming.

2 Audio Compression

In this chapter the necessity of audio compression on the Internet is emphasized and audio compression techniques are introduced. Compression techniques can be classified into lossy and lossless; typical examples of both are presented. At the end of the chapter three of the main approaches in audio compression for the Internet (MPEG, WMA and RealAudio) and their main techniques are analyzed and explained in more detail.
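As a toy illustration of why dithering matters (my own sketch, not from the thesis): a tone whose amplitude stays below half a quantization step vanishes entirely under plain rounding, while TPDF dither (the sum of two uniform random values, spanning ±1 step) preserves it, on average, as a noisy but recoverable signal.

```python
import math, random

def quantize(x, step):
    # Plain uniform rounding to the nearest quantizer level
    return round(x / step) * step

def quantize_dithered(x, step, rng):
    # TPDF dither: difference of two uniform variables, range (-step, +step)
    d = (rng.random() - rng.random()) * step
    return quantize(x + d, step)

rng = random.Random(0)
step = 2.0 / 256                       # 8-bit quantizer step
# A very low-level 100 Hz tone, below half a quantizer step
x = [0.4 * step * math.sin(2 * math.pi * 100 * n / 8000) for n in range(8000)]

plain = [quantize(s, step) for s in x]              # collapses to all zeros
dith = [quantize_dithered(s, step, rng) for s in x]
plain_nonzero = sum(1 for s in plain if s != 0)
dith_nonzero = sum(1 for s in dith if s != 0)
print("undithered nonzero samples:", plain_nonzero)  # 0: the tone is lost
print("dithered nonzero samples:  ", dith_nonzero)   # many: the tone survives
```

The undithered output is silent because every sample rounds to the same level; the dithered output is noisier but still carries the tone, which is exactly the trade-off the text describes.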
Storing a 3-minute song on your hard disk with CD quality (44.1 kHz, stereo, 16 bits per sample) takes up:

44,100 samples/s * 2 channels (stereo) * 2 bytes/sample * 3 * 60 s = around 30 MBytes of storage space.

Downloading it over the Internet with an average 56K modem would then take:

30,000,000 bytes * 8 bits/byte / (56,000 bits/s * 60 s/min) = around 70 minutes.

Sound Quality        | Stereo 16-bit | Stereo 8-bit | Mono 16-bit | Mono 8-bit
CD (44.1 kHz)        | 10 MBytes     | 5 MBytes     | 5 MBytes    | 2.5 MBytes
FM Radio (22.05 kHz) | 5 MBytes      | 2.5 MBytes   | 2.5 MBytes  | 1.25 MBytes
AM Radio (8 kHz)     | 1.8 MBytes    | 900 KBytes   | 900 KBytes  | 450 KBytes

Table 2.1: Approximate file sizes of a one-minute sound file [3].

Looking at the previous example and also at Table 2.1, the need for compression is clear. Audio compression (also called "audio coding") reduces the amount of memory required to store an audio file, and thereby also the time required to transfer it and the bandwidth needed for the transmission. The term bitrate is used to express the strength of the compression [9]. Bitrate denotes the average number of bits that one second of audio data consumes. For PCM, the bitrate in bps is f_sampling (Hz) * word length (bits/sample) * channels; e.g., for a digital audio signal from a CD, the bitrate is 1411.2 Kbps (44.1 K * 2 * 16).

The first proposals to compress audio followed those for speech coding, but speech and music have different properties. Furthermore, speech can be coded very efficiently because a speech production model is available, whereas nothing similar exists for general audio signals [10]. High coding efficiency is achieved with algorithms exploiting signal redundancies. These redundancies can be [11]:

• Spatial: exploits correlation between neighboring data items.
• Spectral: uses the frequency domain to exploit relationships between frequencies of change in the data.
• Psychoacoustic: exploits perceptual properties of the human auditory system.
• Temporal: exploits correlation between successive samples in time.
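The storage and download-time arithmetic above can be wrapped in a small helper. This is a sketch of my own (function names are not from the thesis):

```python
def audio_bitrate(fs_hz, bits_per_sample, channels):
    """Uncompressed PCM bitrate in bits per second."""
    return fs_hz * bits_per_sample * channels

def file_size_bytes(fs_hz, bits_per_sample, channels, seconds):
    """Uncompressed PCM file size in bytes."""
    return audio_bitrate(fs_hz, bits_per_sample, channels) * seconds // 8

# 3-minute CD-quality song (44.1 kHz, stereo, 16 bits per sample)
size = file_size_bytes(44100, 16, 2, 3 * 60)
print(f"size: {size / 1e6:.1f} MB")                      # 31.8 MB ("around 30")

# Download time over a 56,000 bit/s modem
print(f"download: {size * 8 / 56000 / 60:.0f} minutes")  # 76 minutes

# CD PCM bitrate
print(f"bitrate: {audio_bitrate(44100, 16, 2) / 1000} kbps")  # 1411.2
```

The exact figures (31.8 MB, about 76 minutes) round to the "around 30 MBytes" and "around 70 minutes" quoted in the text.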
2.1 Classification of Compression Techniques

Compression can be categorized in two ways [11, 8, 4]:

• Lossless compression: the compressed data can be reconstructed (uncompressed) without loss of information. It is also referred to as reversible compression.
• Lossy compression: the aim is to obtain the best possible fidelity for a given bitrate, or to minimize the bitrate needed to achieve a given fidelity. The compression is not reversible; the decompressed file is not the same as the original file.

2.1.1 Lossless Compression Techniques

These methods are fairly straightforward to understand and implement. Their simplicity is their downfall in terms of attaining the best compression ratios. Some lossless compression techniques are:

• Simple Repetition Suppression: replaces a sequence of n successive identical tokens with a single token and a count of the number of occurrences. This technique is used when there is silence in sound files.
• Pattern Substitution: substitutes frequently repeated patterns with codes shorter than the patterns [11].
• Entropy Encoding: based on information-theoretic techniques [11, 8], e.g.:
  o Huffman Coding: uses fewer bits to encode the data that occur more frequently. The basic Huffman algorithm has been extended to Adaptive Huffman Coding, because the basic algorithm requires statistical knowledge of the source, which is often not available (e.g. in live audio).
  o Arithmetic Coding: maps entire messages to real numbers based on statistics.
  o LZW (Lempel-Ziv-Welch): a dictionary-based compression method. It maps a variable number of symbols to a fixed-length code.

2.1.2 Lossy Compression Techniques

Traditional lossless compression methods don't work well on audio because their compression ratio is not high enough. Lossy compression methods use source encoding techniques that may involve transform encoding, differential encoding or vector quantization.
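The Simple Repetition Suppression idea above amounts to run-length encoding. A minimal sketch of my own (the function names are illustrative, not from any standard library):

```python
def rle_encode(data):
    """Replace each run of identical tokens with a (token, count) pair."""
    out = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        out.append((data[i], j - i))
        i = j
    return out

def rle_decode(pairs):
    """Expand (token, count) pairs back into the original sequence."""
    return [tok for tok, n in pairs for _ in range(n)]

# Silence (a long run of zero samples) compresses very well;
# varying samples barely compress at all.
samples = [0] * 1000 + [3, 7, 7, 2]
encoded = rle_encode(samples)
print(encoded)                        # [(0, 1000), (3, 1), (7, 2), (2, 1)]
assert rle_decode(encoded) == samples  # lossless: fully reversible
```

The round trip demonstrates the "reversible compression" property: the decoded output is bit-identical to the input.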
Perceptual techniques (based on psychoacoustics) are also used (e.g. in the MPEG standards), achieving higher compression. The sensitivity of the human auditory system varies across the frequency domain: it is high for frequencies between 2.5 and 5 kHz, and decreases above and below this band. Therefore, some tones are masked by others and become inaudible. There are two main masking effects:

• Frequency masking: tones near a loud tone are masked.
• Temporal masking: after hearing a loud sound, it takes a little while until humans can hear a nearby soft tone.

There is a "threshold in quiet", and any tone below this threshold will not be perceived. For every tone in the audio signal a "masking threshold" can be calculated [see Figure 2.1]. Tones lying below this masking threshold can be eliminated by the encoder, because they will be masked and are therefore irrelevant to human perception [9]. The following are some of the lossy methods applied to audio compression:

• Silence Compression: detects "silence"; similar to Simple Repetition Suppression.
• ADPCM (Adaptive Differential Pulse Code Modulation): a derivative of DPCM (Differential Pulse Code Modulation). It encodes the difference between two consecutive samples, and adapts the quantization so that fewer bits are used when the value is smaller [7]. Used in CCITT G.721 at 16 or 32 Kbps and in G.723 at 24 and 40 Kbps.
• LPC (Linear Predictive Coding): fits the signal to a speech model and then transmits the parameters of the model [7].
• CELP (Code Excited Linear Prediction): does LPC, but also transmits an error term [7].
• ITU-T G.711, mu-law and A-law: mu-law is an encoding commonly used in North America and Japan for digital telephony. mu-law samples are logarithmically encoded in 8 bits; however, their dynamic range corresponds to 14 bits of linear data. A-law is similar to mu-law, and is used as a European telephony standard.
This encoding comes out to 64 Kbps at an 8 kHz sampling rate.
• Transform coding (e.g. frequency-domain coders): the spectral characteristics of the source signal and the masking properties of the human ear are exploited to reduce the transmitted bitrate [12]. The time-domain audio signal is transformed to the frequency domain before quantization [13]. The reason for transforming the signal is that the input samples are highly correlated, while the time-to-frequency transform produces coefficients that are less correlated. There are also more coefficients with values near zero, which can be coded as zero without introducing great distortion. The spectrum is split into frequency bands that are quantized separately; therefore the quantization noise associated with a particular band is contained within that band.

Figure 2.1: Masking effects in the human ear [9].

The total number of bits available for quantization (usually fixed by design) is distributed by a dynamic bit allocation over a number of signal component quantizers, so that the audibility of the quantization noise is minimized. The number of bits used to encode each frequency component varies: components that are subjectively more important are quantized more finely, while components that are subjectively less important have fewer bits allocated, or may not be encoded at all [12]. This results in the highest possible audio quality for a given number of bits [14]. Transform coders differ in the strategies used for quantization of the spectral components and for masking the resulting coding errors [15]. Some examples are:

  o SBC (Subband Coding): eliminates information about frequencies which are masked, according to psychoacoustic models. It is used in ISO/MPEG Audio Coding Layers I and II.
  o Adaptive Transform Coding: used in Dolby AC-2 coding and AT&T's Perceptual Audio Coder.
  o Hybrid (subband/transform) coding: a combination of discrete transform and filter bank implementations.
It is used by Sony’s Adaptive Transform Acoustic Coding used in minidisk and by MPEG-1 Layer III. Other transformations can be applied instead of frequency, such as DCT (Discrete Cosine Transform) DST (Discrete Sine Transform), and DWT (Discrete Wavelet Transform). There are also nested approaches or multilevel data compression techniques, such as applying the DCT several times [16]. Finally, coding methods can also be divided in CBR (Constant bitrate) and VBR (Variable bitrate). CBR techniques vary the quality level in order to ensure a consistent bitrate throughout an encoded file. Difficult passages (e.g. passages containing a relatively wide 9 2 Audio Compression stereo separation) may be encoded with fewer than the optimum number of bits, and easy passages (passages containing silence or a relatively narrow stereo separation) are encoded using more bits than necessary. Consequently, difficult passages may experience a decrease in quality, while easy passages may include unused bits [17]. VBR techniques ensure consistent high audio quality throughout an encoded file, at the cost of variable bitrate. Difficult passages in the audio source are allocated additional bits and easy passages fewer bits, thus reducing unused bits. VBR encoding produces an overall higher quality level than CBR encoding, and should be used when consistent audio quality is the top priority and constant or predictable encoded file size is not critical [17]. 2.2 Audio Compression on the Internet When the Internet was young, the only way to deliver audio in an acceptable time was to precede the compression by some reduction scheme, reducing the sampling rate, converting from stereo to mono, reducing the resolution from 16 down to 8 bits per sample, or all of the above. But every reduction in the above parameters resulted in a lower quality sound [3]. 
Not long after, however, groups began to create highly complex algorithms that allowed a reduction in the size of the file while retaining the highest quality possible. An early effort was by the MPEG (Moving Picture Experts Group), which set out to develop a compression scheme first oriented to video and later also to audio [3]. The next step was creating an acceptable compression scheme for streaming technology [3]. For the first time, hour-long sound files could be played on a computer with a 28.8 Kbps modem. The higher the bandwidth available to the users, the better the quality of the sound they got. The following is a review of the MPEG audio compression standards and two proprietary audio compression formats, Windows Media Audio and RealAudio [111].

2.2.1 MPEG

The MPEG is a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video. Established in 1988, the group has produced MPEG-1, the standard on which such products as Video CD and MP3 are based; MPEG-2, the standard on which digital television set-top boxes and DVD are based; and MPEG-4, the standard for multimedia [18]. Furthermore, the MPEG audio compression algorithm is being considered by the European Broadcasting Union as a standard for use in digital audio broadcasting [19]. The group has developed more standards that are not reviewed here, since they are not related to audio.

MPEG-1 (ISO/IEC 11172): Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbps

MPEG-1 is a standard for efficient storage and retrieval of audio and video on CD, consisting of several parts: part 1 (Systems), part 2 (Video), part 3 (Audio), part 4 (Conformance Testing), and part 5 (Reference Software). The Systems part provides multiplexing and synchronization support to elementary audio and video streams.
The Audio part provides lossy encoding of stereo audio with transparency (subjective quality similar to the original stereo) at some predefined bitrates [see Table 2.2], and also provides a free bitrate mode to support fixed bitrates other than the predefined ones. From Layer I to Layer III, the codec complexity increases, the overall codec delay increases [see Table 2.2], and the performance increases. The sampling frequencies used in the three layers are 32, 44.1 and 48 kHz.

Layer           | Bitrates                 | Minimum coder delay
Layer I         | 384 Kbps (1:4)           | < 50 ms
Layer II        | 256-192 Kbps (1:6-1:8)   | < 100 ms
Layer III (MP3) | 128-112 Kbps (1:10-1:12) | < 150 ms

Table 2.2: Bitrates and theoretical minimum coder delay in the MPEG-1 layers [7].

MPEG-1 Audio Layer I (MP1)

The Layer I encoder is the basic MPEG encoder [see Figure 2.2]. The audio stream passes through a PQMF (Polyphase Quadrature Mirror Filterbank) that divides the input into 32 equal-width frequency subbands, which do not reflect the human auditory model, where critical bands are not of equal width [20]. Each subband has 12 frequency samples, obtained by calculating the DFT (Discrete Fourier Transform) of 12 input PCM samples; this means that a Layer I frame holds 384 (12 * 32) PCM audio samples. In Layer I the DFT is calculated with 512 points. In addition, the filter bank also determines the maximum amplitude of the 12 subband samples in each subband. This value is known as the scaling factor for the subband and is passed both to the psychoacoustic model and, together with the set of frequency samples in each subband, to the corresponding quantizer and coding block [7]. The input audio stream simultaneously passes through a psychoacoustic model that determines the ratio of the signal energy to the masking threshold for each subband.
The quantizer and coding block use a bit allocation algorithm that takes the signal-to-mask ratios and decides how to distribute the total number of bits available for the quantization of the subband signals, so as to minimize the audibility of the quantization noise and at the same time maximize the compression.

Figure 2.2: MPEG basic audio coder and decoder [21].

Finally, the last block takes the representation of the quantized subband samples and formats this data and side information (bit allocation, scale factors) into a coded bitstream [see Figure 2.3].

Figure 2.3: MPEG-1 Layer I frame structure [21]: valid for 384 PCM audio input samples. Duration: 8 ms at a sampling rate of 48 kHz.

The decoder is the reverse of this process, except that no psychoacoustic model is required. The bitstream is unpacked and passed through the filter bank to reconstruct the time-domain PCM samples. Layer I is the same as the PASC (Precision Adaptive Sub-band Coding) compression used in Digital Compact Cassettes. Typical applications of Layer I include digital recording on tapes, hard disks, or magneto-optical disks, which can tolerate the high bitrate [21].

MPEG-1 Audio Layer II (MP2)

The Layer II algorithm is a straightforward enhancement of Layer I. It codes the audio data in larger groups (1152 PCM samples) and imposes some restrictions on the possible bit allocations for values from the middle and higher subbands. It also represents the bit allocation, the scale factor values and the quantized samples with more compact code. Layer II gets better audio quality by saving bits in these areas, so more code bits are available to represent the quantized subband values [20]. The resulting frame structure is represented in Figure 2.4. Typical applications of Layer II include audio broadcasting, television, consumer and professional recording, and multimedia [21].
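The frame sizes above (384 samples for Layer I, 1152 for Layers II and III) fix the frame duration and, together with the bitrate, the frame's size in bytes. A small sketch of my own (helper names are illustrative):

```python
def frame_duration_ms(samples_per_frame, fs_hz):
    """Duration of one audio frame in milliseconds."""
    return 1000.0 * samples_per_frame / fs_hz

def frame_bytes(samples_per_frame, fs_hz, bitrate_bps):
    """Size of one frame: bits per frame = bitrate * frame duration."""
    return bitrate_bps * samples_per_frame / fs_hz / 8

fs = 48000
print(frame_duration_ms(384, fs))     # Layer I: 8.0 ms (matches Figure 2.3)
print(frame_duration_ms(1152, fs))    # Layers II/III: 24.0 ms (Figure 2.4)
# A 192 kbps Layer II frame at 48 kHz occupies:
print(frame_bytes(1152, fs, 192000))  # 576.0 bytes
```

The 8 ms and 24 ms results match the frame durations quoted in the figure captions for a 48 kHz sampling rate.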
Figure 2.4: MPEG-1 Layer II frame structure [21]: valid for 1152 PCM audio input samples. Duration: 24 ms at a sampling rate of 48 kHz.

MPEG-1 Audio Layer III (MP3)

The Layer III algorithm is a much more refined approach, derived from ASPEC (Audio Spectral Perceptual Entropy Coding). Although based on the same filter bank found in the other layers (for reasons of compatibility), Layer III compensates for some filter bank deficiencies by processing the filter outputs with an MDCT (Modified Discrete Cosine Transform) [see Figure 2.5] [22]. It can be said that the filter bank used in MPEG Layer III is a hybrid filter bank, consisting of a polyphase filter bank and an MDCT [20]. Unlike the polyphase filter bank, the MDCT transformation is, without quantization, lossless. The MDCT further subdivides the subband outputs in frequency to provide better spectral resolution [20].

Figure 2.5: Block structure of the ISO/MPEG audio encoder and decoder, Layer III [15].

Besides the MDCT processing, other enhancements over the Layer I and II algorithms include the following [20]:

• Alias reduction: Layer III specifies a method of processing the MDCT values to remove some artifacts caused by the overlapping bands of the polyphase filter bank.
• Non-uniform quantization.
• Scale-factor bands: these bands cover several MDCT coefficients and have approximately critical-band widths. In Layer III, scale factors serve to color the quantization noise to fit the varying frequency contours of the masking threshold. Values for these scale factors are adjusted as part of the noise-allocation process.
• Entropy coding of data values: to get better data compression, Layer III uses variable-length Huffman codes to encode the quantized samples. The Huffman code tables assign smaller words to more frequent values.
If the number of bits resulting from the coding operation exceeds the number of bits available to code a given block of data, this can be corrected by adjusting the global gain to obtain a larger quantization step size, leading to smaller quantized values. This operation is repeated with different quantization step sizes until the resulting bit demand for Huffman coding is small enough. This loop is called the rate loop because it modifies the overall coder rate until it is small enough [23].

• Use of a bit "reservoir": The coder encodes at different bitrates when needed. If a frame is easy, it is assigned fewer bits and the unused bits are put into a reservoir buffer. When a frame comes along that needs more than the average number of bits, the reservoir is tapped for extra capacity.

• Ancillary data: This is held in a separate buffer and gated onto the output bitstream using some of the bits allocated for the reservoir buffer when they are not required for audio.

• Noise allocation: The encoder iteratively varies the quantizers in an orderly way, quantizes the spectral values, counts the number of Huffman code bits required to code the audio data, and calculates the resulting noise. If, after quantization, some scale-factor bands still have more than the allowed distortion, the encoder amplifies the values in those scale-factor bands, effectively decreasing the quantizer step size for those bands. Then the process repeats. The process stops if any of the following three conditions is true: none of the scale-factor bands has more than the allowed distortion; the next iteration would cause the amplification of any of the bands to exceed the maximum allowed value; or the next iteration would require all the scale factors to be amplified.

As with Layer II, Layer III processes the audio data in frames of 1152 samples. The arrangement of the bit fields in the bitstream is as follows: header (32 bits), CRC (0 or 16 bits), side information (136 or 256 bits) and main data.
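As a worked example of how these 1152-sample frames are sized: for MPEG-1 Layer II/III the frame length in bytes follows directly from the bitrate, the sampling rate and the padding bit. The sketch below illustrates the standard 144·bitrate/fs relation; the function and variable names are our own, not from any particular library:

```python
def mp3_frame_length(bitrate_bps, sample_rate_hz, padding=0):
    """Frame length in bytes for an MPEG-1 Layer II/III frame.

    Each frame carries 1152 PCM samples, so it spans 1152/fs seconds;
    at the given bitrate that is bitrate*1152/(8*fs) = 144*bitrate/fs
    bytes, truncated, plus one optional padding byte (the padding bit
    in the frame header).
    """
    return 144 * bitrate_bps // sample_rate_hz + padding

# 128 Kbps at 44.1 kHz: 144*128000/44100 = 417.96 -> 417 bytes (418 padded)
length = mp3_frame_length(128000, 44100)
```

The truncation is why 44.1 kHz streams need the padding bit: some frames must carry one extra byte so that the long-run average hits the nominal bitrate exactly.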
Layer III is intended for applications where a critical need for low bitrate justifies the expensive and sophisticated encoding system. It allows high quality results at bitrates as low as 64 Kbps. Typical applications are in telecommunication and professional audio, such as commercially published music and video [20].

MPEG-2 (ISO/IEC 13818) Generic coding of moving pictures and associated audio information

MPEG-2 has 9 parts. Part 1 addresses the combining of one or more elementary streams of video and audio, as well as other data, into single or multiple streams suitable for storage or transmission. This is specified in two forms: the Program Stream (PS) and the Transport Stream (TS), each optimized for a different set of applications. Part 2 builds on the powerful video compression capabilities of the MPEG-1 standard to offer a wide range of coding tools. Part 3, the Audio part, supports the encoding of multichannel audio as a backwards compatible multichannel extension of MPEG-1 Audio, using the same family of audio codecs: Layers I, II and III. The new audio features of MPEG-2 are:

• a "low sample rate extension" to address very low bitrate applications with limited bandwidth requirements. The new sampling frequencies are 16, 22.05 and 24 kHz, and the bitrates extend down to 8 Kbps,

• a "multichannel extension" to address surround sound applications with up to 5 main audio channels (left, center, right, left surround, right surround),

• optionally one extra "low frequency enhancement" or LFE channel,

• a "multilingual extension" to allow the inclusion of up to 7 more audio channels [25].

Parts 4 and 5 correspond to Parts 4 and 5 of MPEG-1 [23]. Part 6, DSM-CC (Digital Storage Media Command and Control), provides protocols for session set-up across different networks and for remote control of a server containing MPEG-2 content.
Part 7, "Advanced Audio Coding" (AAC), provides a new multichannel audio coding that is not backwards compatible with MPEG-1 Audio. It will be explained within the MPEG-4 standard, which defines an improvement of AAC. Part 8 was intended to support video coding when samples are represented with an accuracy of more than 8 bits, but its development was discontinued when the interest of the industry that had requested it did not materialize. Part 9, "Real Time Interface", provides a standard interface between an MPEG-2 Transport Stream and a decoder.

MPEG-2 provides broadcast quality audio and video at higher data rates. Parts 1, 2 and 3 are used in digital television set-top boxes and DVD (Digital Versatile Disc). Some MPEG-2 encoders are very costly professional equipment, while others are inexpensive PC boards sold with video editing software. AAC has been adopted by Japan for a national digital television standard and by several manufacturers of secure digital music.

As the MPEG-2 standard defines 16 kHz as the lowest sample rate, a further extension has been introduced, again dividing the low sample rates of MPEG-2 by 2: 8, 11.025, and 12 kHz [23]. This extension is named "MPEG 2.5" but it is not part of the official ISO standard.

MPEG-1, MPEG-2 Audio Frame

An MPEG audio file is built up from smaller independent parts called frames, each one with its own header and audio information. There is no file header, and therefore any part of an MPEG file can be cut out and played correctly [26, 27]. To get the information about an MPEG file, it is usually enough to find the first frame, read its header and assume that the other frames are the same. However, this may not always be the case: VBR MPEG files may use "bitrate switching", which means that the bitrate changes according to the content of each frame. Layer III decoders must support this method, and Layer I & II decoders may support it.
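Finding the first frame amounts to scanning for the 11-bit sync pattern (all ones) at a byte boundary. A minimal sketch follows; as the text notes, a real parser should also decode the header fields and confirm that a second frame follows where the first one ends, to reject false syncs:

```python
def find_frame_sync(data: bytes, start: int = 0) -> int:
    """Return the offset of the first plausible MPEG frame header, or -1.

    The 11 sync bits mean: first byte 0xFF, and the top 3 bits of the
    second byte set.  This is only a candidate; a robust parser would
    validate the remaining header fields and check that another frame
    begins where this one ends.
    """
    for i in range(start, len(data) - 1):
        if data[i] == 0xFF and (data[i + 1] & 0xE0) == 0xE0:
            return i
    return -1

# Junk (or a tag) may precede the first frame; skip ahead to the sync.
stream = b"junk" + bytes([0xFF, 0xFB, 0x90, 0x00]) + b"audio..."
offset = find_frame_sync(stream)
```

Since there is no file header, the same scan works from any position in the file, which is what makes cut-and-play and streaming of MPEG audio possible.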
The frame header consists of the first four bytes of a frame. Here is a "graphical" presentation of the header content, where the characters A to M indicate the different fields. Table 2.3 shows the details about the content of each field [26].

AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM

Sign  Length (bits)  Position (bits)  Description

A     11             (31-21)          Frame sync (all bits set): used for synchronization. To avoid false frame sync, 2 or more frames in a row must be checked.

B     2              (20,19)          MPEG Audio version ID
                                      00 - MPEG Version 2.5
                                      01 - reserved
                                      10 - MPEG Version 2 (ISO/IEC 13818-3)
                                      11 - MPEG Version 1 (ISO/IEC 11172-3)

C     2              (18,17)          Layer description
                                      00 - reserved
                                      01 - Layer III
                                      10 - Layer II
                                      11 - Layer I

D     1              (16)             Protection bit
                                      0 - Protected by CRC (16 bit CRC follows header)
                                      1 - Not protected

E     4              (15-12)          Bitrate index:

bits  V1,L1  V1,L2  V1,L3  V2,L1  V2,L2&L3
0000  free   free   free   free   free
0001  32     32     32     32     8
0010  64     48     40     48     16
0011  96     56     48     56     24
0100  128    64     56     64     32
0101  160    80     64     80     40
0110  192    96     80     96     48
0111  224    112    96     112    56
1000  256    128    112    128    64
1001  288    160    128    144    80
1010  320    192    160    160    96
1011  352    224    192    176    112
1100  384    256    224    192    128
1101  416    320    256    224    144
1110  448    384    320    256    160
1111  bad    bad    bad    bad    bad

NOTES: All values are in Kbps. V1 - MPEG Version 1, V2 - MPEG Version 2 and 2.5. L1 - Layer I, L2 - Layer II, L3 - Layer III.

For Layer II there are some combinations of bitrate and mode which are not allowed. Here is a list of the allowed combinations:

bitrate  allowed modes
free     all
32       single channel
48       single channel
56       single channel
64       all
80       single channel
96       all
112      all
128      all
160      all
192      all
224      stereo, intensity stereo, dual channel
256      stereo, intensity stereo, dual channel
320      stereo, intensity stereo, dual channel
384      stereo, intensity stereo, dual channel

F     2              (11,10)          Sampling rate frequency index (values in Hz)

bits  MPEG1    MPEG2    MPEG2.5
00    44100    22050    11025
01    48000    24000    12000
10    32000    16000    8000
11    reserv.  reserv.  reserv.

G     1              (9)              Padding bit
                                      0 - frame is not padded
                                      1 - frame is padded with one extra slot

H     1              (8)              Private bit. It may be freely used for the specific needs of an application, e.g. if it has to trigger some application-specific events.

I     2              (7,6)            Channel Mode
                                      00 - Stereo
                                      01 - Joint stereo (Stereo)
                                      10 - Dual channel (Stereo)
                                      11 - Single channel (Mono)

J     2              (5,4)            Mode extension (only if Joint stereo). The mode extension is used to join information that is of no use for the stereo effect, thus reducing the needed resources. These bits are dynamically determined by an encoder in Joint stereo mode. The complete frequency range of an MPEG file is divided into 32 subbands. For Layers I & II these two bits determine the frequency range (bands) where intensity stereo is applied. For Layer III these two bits determine which type of joint stereo is used (intensity stereo or m/s stereo); the frequency range is determined within the decompression algorithm.

value  Layer I & II    Layer III intensity stereo  Layer III MS stereo
00     bands 4 to 31   off                         off
01     bands 8 to 31   on                          off
10     bands 12 to 31  off                         on
11     bands 16 to 31  on                          on

K     1              (3)              Copyright
                                      0 - Audio is not copyrighted
                                      1 - Audio is copyrighted

L     1              (2)              Original
                                      0 - Copy of original media
                                      1 - Original media

M     2              (1,0)            Emphasis
                                      00 - none
                                      01 - 50/15 ms
                                      10 - reserved
                                      11 - CCIT J.17

Table 2.3: MPEG-1 and -2 frame header [26].

Frames usually have a CRC check after the header, and other fields in different formats as has been shown for each layer [see Figures 2.3 and 2.4]. Then comes the audio data, and after the data follows tag information, used to describe the MPEG Audio file.
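A few of the fields in Table 2.3 can be decoded with plain bit shifts over the four header bytes. The sketch below covers MPEG-1 Layer III only (the bitrate and sampling-rate lookup tables are the V1,L3 and MPEG1 columns above); the names are our own:

```python
# "None" stands for the "free" and "bad" entries of the bitrate table.
BITRATES_V1_L3 = [None, 32, 40, 48, 56, 64, 80, 96,
                  112, 128, 160, 192, 224, 256, 320, None]  # Kbps
SAMPLE_RATES_V1 = [44100, 48000, 32000, None]               # Hz

def parse_mpeg1_layer3_header(h: bytes):
    """Decode a few fields of an MPEG-1 Layer III frame header.

    h: the 4 header bytes.  Bit positions follow the A..M layout of
    Table 2.3 (bit 31 is the first transmitted bit).
    """
    word = int.from_bytes(h, "big")
    assert word >> 21 == 0x7FF, "frame sync not found"   # A: bits 31-21
    version = (word >> 19) & 0x3                          # B: 11 = MPEG-1
    layer = (word >> 17) & 0x3                            # C: 01 = Layer III
    protected = not ((word >> 16) & 0x1)                  # D: 0 = CRC follows
    bitrate = BITRATES_V1_L3[(word >> 12) & 0xF]          # E: bitrate index
    sample_rate = SAMPLE_RATES_V1[(word >> 10) & 0x3]     # F
    padding = (word >> 9) & 0x1                           # G
    return dict(version=version, layer=layer, protected=protected,
                bitrate_kbps=bitrate, sample_rate=sample_rate,
                padding=padding)

# 0xFF 0xFB 0x90 0x00: MPEG-1, Layer III, no CRC, 128 Kbps, 44.1 kHz
info = parse_mpeg1_layer3_header(bytes([0xFF, 0xFB, 0x90, 0x00]))
```

A full parser would also branch on the version bits to select the MPEG-2/2.5 columns of the tables, and decode the channel mode and mode extension fields.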
The structure of the MPEG Audio Tag ID3v1 is:

AAABBBBB BBBBBBBB BBBBBBBB BBBBBBBB BCCCCCCC CCCCCCCC CCCCCCCC CCCCCCCD DDDDDDDD DDDDDDDD DDDDDDDD DDDDDEEE EFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFG

Sign  Length (bytes)  Position (bytes)  Description
A     3               (0-2)             Tag identification. Must contain 'TAG' if the tag exists and is correct.
B     30              (3-32)            Title
C     30              (33-62)           Artist
D     30              (63-92)           Album
E     4               (93-96)           Year
F     30              (97-126)          Comment
G     1               (127)             Genre (1=Classic Rock, 2=Country, 3=Dance…)

Table 2.4: MPEG Audio Tag ID3v1 [26].

The ID3v1 tag has some limitations and drawbacks. It has a fixed size of 128 bytes and supports only a few fields of information, each limited to 30 characters, making it impossible to correctly describe many titles and artists. Furthermore, since the ID3v1 tag is positioned at the end of the audio file, it will also be the last thing to arrive when the file is being transmitted.

ID3v2 is a newer tagging system designed to be expandable and more flexible. It improves on ID3v1 by enlarging the tag to up to 256 MBytes and dividing it into smaller pieces, called frames, which can contain any kind of information and data, such as title, lyrics, images, web links etc. It still keeps files small by being byte conservative and by having the capability to compress data. What is more, the tag resides at the beginning of the audio file, thus making it suitable for streaming.

MPEG Audio stereo redundancy coding

MPEG Audio works with both mono and stereo signals. It supports two types of stereo redundancy coding: intensity and MS (middle/side) stereo coding. All layers support intensity stereo coding; Layer III also supports MS stereo coding. Both forms of redundancy coding exploit psychoacoustics: above 2 kHz and within each critical band, the human auditory system bases its perception of stereo more on the temporal envelope of the audio signal than on its temporal fine structure.
In intensity stereo mode the encoder replaces the left and right signals by a single representative signal plus directional information [38]. The MS stereo mode encodes the left and right channel signals in certain frequency ranges as middle (sum of left and right, L+R) and side (difference of left and right, L-R) channels. A technique called joint stereo coding is used in Layer III to achieve a more efficient combined coding of the left and right channels of a stereophonic audio signal. It takes advantage of the redundancy in stereo material: the encoder switches dynamically from discrete L/R to a matrixed L+R/L-R mode, depending on the material.

MPEG-4 (ISO/IEC 14496): Coding of audio-visual objects

The first 6 parts of the standard correspond to those of MPEG-2, and it is backwards compatible with MPEG-1 and MPEG-2. There are, however, a number of significant differences in content. MPEG-4 enables the coding of individual objects, which means that the video information does not need to be of rectangular shape as in MPEG-1 and MPEG-2 Video. The same applies to audio: MPEG-4 provides all the tools to encode speech and audio at different rates from 2 to 64 Kbps and with different functionality, including MPEG-4 AAC, an extension of MPEG-2 AAC [34]. Part 5 is a complete software implementation of both encoders and decoders. Compared with the reference software of MPEG-1 and MPEG-2, whose value is purely informative, the MPEG-4 Reference Software has the same normative value as the textual parts of the standard. The software may also be used for commercial products, and the copyright of the software is licensed at no cost by ISO/IEC for products conforming to the standard. So far the industry has enthusiastically adopted MPEG-4 Video, which has been selected by several industries for setting standards for next generation mobile communication and is being utilized to develop solutions for video on demand and related applications.
Some people call it the future "global multimedia language" [29]. MPEG-4 Audio provides several "profiles" to allow the optimal use of MPEG-4 in different applications. At the same time the number of profiles is kept as low as possible in order to maintain maximum interoperability. Some of the profiles that MPEG-4 offers are Speech Audio, Synthesis Audio, Main Audio, High Quality Audio, Low Delay Audio, Natural Audio, Mobile Audio Internetworking etc. [36]. MPEG-4 Audio has two work items underway for improving audio coding efficiency [35]: bandwidth extension, for both general audio signals and speech signals, and parametric coding, to extend the capabilities currently provided by HILN (Harmonic and Individual Lines plus Noise) [35].

MPEG-2/4 AAC (Advanced Audio Coding)

The AAC system is the highest performance coding method within MPEG [37]. AAC works with a wide range of sampling rates from 8 to 96 kHz, bitrates from 16 to 576 Kbps (achieving indistinguishable audio quality at 96 Kbps per channel), and from 1 to 48 audio channels. Because it uses a modular approach, users may pick and choose among the component tools to make a product with an appropriate performance/complexity ratio [37]. Due to its high coding efficiency, AAC is a prime candidate for any digital broadcasting system and for the delivery of high-quality music via the Internet [29]. MPEG-2/4 AAC has no backward compatibility with MPEG-1 and -2, but as it is built on a structure similar to Layer III, it retains some of MP3's powerful features: redundancy reduction using Huffman encoding, the bit "reservoir", non-uniform quantization, ancillary data and the joint stereo mode [37]. However, it improves on MP3 in many details and uses new coding tools [see Figure 2.7]. The crucial differences are [29]:

• Filter bank: AAC uses a plain MDCT (Modified Discrete Cosine Transform) [29], with an increased window length (2048 instead of 1152).
• TNS (Temporal Noise Shaping): It shapes the distribution of quantization noise in time by prediction in the frequency domain, transmitting the prediction residual of the spectral coefficients instead of the coefficients themselves [29].

• Prediction: It benefits from the fact that certain types of audio signals (stationary or semi-stationary) are easy to predict [29, 30]; then, instead of repeating such information for sequential windows, a simple repeat instruction can be passed.

• Quantization: A finer control of the resolution is achieved using an iterative method.

AAC is a block oriented, VBR coding algorithm, but rate control can be used in the encoder such that the output bitrate is averaged to a predetermined rate (as for CBR). Each block of AAC compressed bits is called a "raw data block" and can be decoded "stand-alone" (without knowledge of information in prior bitstream blocks), which facilitates encoder and decoder synchronization; moreover, if any packet is lost this does not affect the decodability of adjacent packets [32]. The syntax of an AAC bitstream is as follows:

<AAC_bitstream> => <raw_data_block><AAC_bitstream>
<raw_data_block> => [<element>]<END><PAD>

where [] indicates one or more occurrences, <END> indicates the end of a raw_data_block and <PAD> forces the total length of a raw_data_block to be an integral number of bytes. The <element> is a string of bits of varying length, indicating whether the represented data is from a single audio channel, stereo, multi-channel, user data, etc. [32].

Figure 2.6: MPEG-2 AAC audio coding [1]. Legend: Data / Control.

The standard defines two examples of formats for the transport of audio data [33]:

• ADIF (Audio Data Interchange Format) puts all the data controlling the decoder (sampling frequency, mode etc.) into a header preceding the audio stream. This is useful for file exchange, but not for streaming.
• ADTS (Audio Data Transport Stream) packs the AAC data into frames with headers (like the MPEG-1, -2 format), which is more suitable for streaming.

MPEG-2/4 AAC-LD

The MPEG-4 AAC-LD (Low Delay Audio Coder) is designed to combine the advantages of perceptual audio coding with the low delay necessary for two-way communication. The codec is closely derived from MPEG-2 AAC [41], but the contributors to the delay (such as frame length or window shape) have been addressed and modified [37].

MPEG-4 General Audio coder

The MPEG-4 General Audio coder is derived from MPEG-2 AAC, and is backward compatible with it, but adds several enhancements:

• PNS (Perceptual Noise Substitution): PNS is based on the observation that one noise signal sounds much like another, so the actual fine structure of a noise signal is of minor importance for its subjective perception. Consequently, instead of transmitting the actual spectral components of a noisy signal, the bitstream just signals that the frequency region is noise-like and gives some additional information on the total power in that band [38].

• LTP (Long Term Prediction): LTP is an efficient tool for reducing the redundancy of a signal between successive coding frames. It is especially effective for the parts of a signal which have a clear pitch property [38].

The structure of the coder is represented in Figure 2.7. The same building blocks are present in the decoder implementation, performing the inverse processing steps. To increase the coding efficiency for musical signals at very low bitrates (below 16 Kbps per channel) [38, 41], TwinVQ based coding tools are part of MPEG-4 General Audio [40]. The basic idea is to replace the conventional encoding of scale factors and spectral data used in MPEG-4 AAC by an interleaved vector quantization applied to a normalized spectrum [38]. The input signal vector (spectral coefficients) is interleaved into subvectors that are then quantized using vector quantizers [38].
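The interleaving step described above can be sketched as follows. This is a simplified illustration only: the actual TwinVQ subvector sizes, weighting and codebooks are fixed by the standard, and the names below are our own:

```python
def interleave(spectrum, n_subvectors):
    """Split a spectral coefficient vector into interleaved subvectors.

    Coefficient i goes to subvector i % n_subvectors, so every subvector
    samples the whole frequency range rather than one contiguous band.
    Simplified sketch; the real TwinVQ parameters come from the standard.
    """
    return [spectrum[k::n_subvectors] for k in range(n_subvectors)]

subvecs = interleave(list(range(8)), 2)
# Each subvector spans low-to-high frequencies: [0,2,4,6] and [1,3,5,7]
```

Because each subvector covers the whole spectrum, the statistics of the subvectors are similar, which is what allows them to share the same vector quantizer codebooks.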
The rest of the processing chain remains identical, as can be seen in Figure 2.7.

Figure 2.7: Building Blocks of the MPEG-4 General Audio Coder [38]. Legend: Data / Control.

Figure 2.8: Weighted interleaved vector quantization [38].

2.2.2 Windows Media Audio

Microsoft Windows Media Audio (WMA) delivers audio for streaming and download. According to Microsoft, WMA offers [47]:

• "Near-CD quality" at 48 Kbps, and "CD quality" at 64 Kbps.

• High scalability, with bandwidths from 5 Kbps to 192 Kbps and sampling rates from 8 kHz to 48 kHz for high-quality stereo music [47]. This allows users to choose the best combination of bandwidth and sampling rate for their content.

Microsoft claims that the WMA codec is very resistant to degradation due to packet loss, which makes it excellent for use with streaming content. In addition, by using an improved encoding algorithm, this codec encodes and decodes much faster than others. Windows Media audio files can also support ID3 metadata: if a source .mp3 file is encoded using the Windows Media Audio codec, any ID3 properties are included in the Windows Media audio file [47]. As it is a proprietary format, no information is available about the coder and the compression technique used; all the information presented here was obtained from Microsoft.

2.2.3 RealAudio

RealAudio is a proprietary encoding format created by RealNetworks. It was the first compression format to support live audio over the Internet and thus gained considerable support, but it requires proprietary server software in order to provide the real-time playback facility. It was first designed for voice applications on the web, but it later developed into music and video algorithms as well [48]. RealAudio uses a "lossy" compression scheme that provides high audio quality from source material with high sampling rates (11 kHz, 22 kHz and 44 kHz).
The RealAudio Encoder compression scheme works by making educated guesses about what is most important in the sound file [48]:

• It knows how much room there is in the destination stream and fills that available bandwidth with as much sound information as it can.

• Any sound information that doesn't fit is lost.

• The user can help the encoder with its task by emphasizing the most important parts of the recording.

It offers bitrates from 16 Kbps, but the recommended bitrate is 96 Kbps. As it is a proprietary format, no information is available about the coder and the compression technique used.

3 Compression Techniques Comparison

With so many compression techniques and coders on the market, it would be good to have some comparison between them that helps the user decide which format to choose for each application. In this chapter the important issues to look at when measuring audio quality are introduced. Different coders and compression techniques are discussed and compared. The chapter ends with conclusions and reflections about the future of compression.

3.1 Coders Comparison

Measuring the audio quality of different encoders has developed into an art of its own over the last ten years. Basically, there are three methods to measure audio quality [23]:

• Listening tests: evaluate the performance of coders under worst-case conditions.

• Simple objective measurement techniques: measure encoder quality by looking at parameters such as the signal-to-noise ratio or the bandwidth of the decoded signal. If the coder is perceptual, these measurements are not useful.

• Perceptual measurement techniques: a very useful supplement to listening tests which, in some cases, can replace them. The ITU-R Task Group 10/4 has produced a Recommendation for a quality measurement system called PEAQ (Perceptual Evaluation of Audio Quality).
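The simple objective measurement mentioned above, the signal-to-noise ratio of the decoded signal, can be computed as in the generic sketch below. As the text notes, such a measure says little about perceived quality for perceptual coders, which deliberately shape (ideally inaudible) noise:

```python
import math

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between original and decoded samples.

    SNR = 10*log10( sum(s^2) / sum((s - s_hat)^2) ).  A purely
    waveform-based measure: a perceptual coder can score a low SNR
    here while still sounding transparent.
    """
    signal_power = sum(s * s for s in original)
    noise_power = sum((s - d) ** 2 for s, d in zip(original, decoded))
    if noise_power == 0:
        return float("inf")  # identical signals: no measurable noise
    return 10 * math.log10(signal_power / noise_power)
```

For example, halving one of two unit samples yields about 3 dB of SNR regardless of whether the change is audible, which is exactly why perceptual measures such as PEAQ were developed.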
Another important parameter for comparing different coders is the coding/decoding time, which should be as low as possible. The mere compliance of an encoder with, e.g., an MPEG audio standard does not guarantee any quality of the compressed music. Audio quality differs depending on parameters such as the bitrate of the compressed audio and the sophistication of different encoders, even if they work with the same set of basic parameters [23]. In recent years many coders have appeared on the market, trying to improve on the original compression format. Here we present some comparisons between coders.

TEST_1: Comparing MP3 encoders

HOW THE TEST WAS PERFORMED

The coders under test were: Fraunhofer, Lame, BladeEnc and Gogo. Fraunhofer and BladeEnc used stereo encoding, while Lame and Gogo used joint stereo encoding. First the output spectrum of the encoders was compared to the spectrum of the original signal, when encoding music and noise. Then the encoding time of an 8-minute song was measured for each of the encoders at different fixed bitrates (160 Kbps and 128 Kbps). Lame and Gogo were also tested with the VBR option.

RESULTS

The results for the encoding time are shown in Figures 3.1-3.3.

Figure 3.1: Encoding time at 160 Kbps required to encode an 8:05 minutes stereo song (using a Pentium III 600 MHz) [57].

Figure 3.2: Encoding time at 128 Kbps required to encode an 8:05 minutes stereo song (using a Pentium III 600 MHz) [57].

Figure 3.3: Encoding time at VBR required to encode an 8:05 minutes stereo song (using a Pentium III 600 MHz) [57].

These figures show that Gogo has the lowest encoding time at every bitrate. The results of the quality tests show that Lame is probably the best encoder [see 57 for figures], followed by Gogo, which also gives good quality but offers fewer encoding options than Lame.
VBR encoding offers better quality than CBR, which was easy to predict, since the bitrate is adapted to the complexity of the audio signal being encoded.

CONCLUSIONS

If the encoding time is not the priority, Lame with VBR should be chosen, since it offers slightly better quality than Gogo. On the other hand, if the encoding time is the priority, Gogo with CBR is the best option [57].

TEST_2: Comparing MPEG-2 AAC encoders

HOW THE TEST WAS PERFORMED

The methodology used for these tests was based on ITU-R Recommendation BS.1116, which specifies that, for the greatest listener sensitivity to artifacts, each listener should be tested on his/her own and should be free to switch at any time between the stimuli under assessment. The encoders tested were: AAC Main Profile and AAC Low Complexity Profile at 96 Kbps and 128 Kbps, AAC SSR at 128 Kbps, Layer II (MP2) at 192 Kbps and Layer III (MP3) at 128 Kbps. Ten audio clips of different durations (always less than a minute) and different "encoding difficulty" (speech, castanets, accordion, Dire Straits, Tracy Chapman...) were used for the tests.

RESULTS [58]

• AAC Main 128: Better than MP2 for 3 items, worse for no items, equivalent for 7 items. Better than MP3 for 3 items, worse for no items, equivalent for 7 items.

• AAC Main 96: Better than MP2 for 1 item, worse for 1 item, equivalent for 8 items. Better than MP3 for 1 item, worse for no items, equivalent for 9 items.

• AAC LC 128: Better than MP2 for 3 items, worse for no items, equivalent for 7 items. Better than MP3 for 3 items, worse for no items, equivalent for 7 items.

• AAC LC 96: Better than MP2 for no items, worse for no items, equivalent for 10 items. Better than MP3 for 1 item, worse for no items, equivalent for 9 items.

• AAC SSR 128: Better than MP2 for 1 item, worse for no items, equivalent for 9 items. Better than MP3 for 2 items, worse for no items, equivalent for 9 items.
CONCLUSIONS [58]

Only the Main 96 codec is outperformed by MP2 or MP3 for any of these examples. AAC Main 128, AAC LC 128, and AAC SSR 128 give significantly better performance than MP2 192 or MP3 128. In addition, AAC Main 96 gives better results than MP3 128. There is no statistically significant improvement of AAC LC 96 over the MPEG-1 codecs. Within the AAC codec group, AAC Main 128, AAC LC 128, and AAC SSR 128 are all superior to AAC LC 96. In addition, AAC Main 128 and AAC LC 128 are superior to AAC Main 96. The final ranking would then be: AAC Main 128, AAC LC 128, AAC SSR 128, AAC Main 96, AAC LC 96, MP2 192, MP3 128. However, there are no statistically significant differences between adjacent pairs in this ordering.

TEST_3: Comparing MPEG-4 encoders

HOW THE TEST WAS PERFORMED

The coders were divided into four groups of coding scheme/bitrate:

• Group A tests the codecs at 6 and 8 Kbps mono and contains HILN, TwinVQ and MPEG Layer III. The reference is MPEG Layer III (MP3).

• Group B tests the codecs at 16 Kbps mono and contains HILN, AAC, and G.722 at 48 Kbps as a reference.

Groups C and D belong to the same coding system, but are separated because the lowest layer is a mono layer while the higher layers are stereo layers:

• Group C tests the mono core layer of the AAC large step scalable coder against an unscaled AAC coder and MP3. The reference coder is MP3.

• Group D tests the upper layers of the scalable coders against unscaled coders and contains AAC, the AAC large step scalable coder, the AAC-BSAC fine granule scalable coder and MP3. The reference coder is MP3. The AAC-BSAC coder has no counterpart in the C test since it is based on an unscaled stereo AAC coder and therefore does not provide mono/stereo scalability.
The codecs tested were thus: HILN at 6 Kbps (mono, 8 kHz) and 16 Kbps (mono, 16 kHz); TwinVQ (16 kHz); AAC (24 kHz for all bitrates; 24 Kbps (mono), 40 Kbps (stereo) and 56 Kbps (stereo)); AAC scal (24 kHz for all bitrates; 24 Kbps (mono), 40 Kbps (stereo), 56 Kbps (stereo)); AAC scal BSAC (24 kHz for all bitrates; 40 Kbps, 56 Kbps); G.722 at 48 Kbps; and, for the A test, MP3 at 8 kHz for 8 Kbps coding (MPEG 2.5).

RESULTS [56]

• Test A: TwinVQ performed best overall. TwinVQ and Layer III performed equally well, but TwinVQ needs 25% less bitrate.

• Test B: AAC performed best overall.

• Test C: AAC 24 main performed slightly better than AAC 24 scalable overall. Both AAC coders performed much better overall and on most items compared to MP3.

• Test D: AAC 56 performed better than AAC scal 56 overall, but performance was similar on almost all items. AAC 40 performed better than AAC scal 40 overall, again with similar performance on almost all items. These two codecs did not demonstrate a statistical difference, although a trend shows AAC 56 performing slightly better overall. AAC 40 performed much better than AAC BSAC 40 overall, and this difference is significant. Since BSAC did not use a mono/stereo scalable mode but a small step scalable mode based on a stereo coder, the BSAC results cannot be compared directly to those of the AAC scalable coder.

CONCLUSIONS [56]

The following conclusions can be drawn from the test results [56]:

• Test A: TwinVQ at 6 Kbps shows statistically the same quality as Layer III at 8 Kbps. TwinVQ is therefore a valuable MPEG-4 tool for improved coding efficiency at the lowest bitrates.

• Test B: AAC at 16 Kbps performed 0.6 grades worse than G.722, but operated at one third of the bitrate.
It can therefore be concluded that AAC is a valuable MPEG-4 tool for coding music signals at bitrates as low as 16 Kbps.

• Tests C and D: At all three bitrates, AAC audio coding shows significantly better audio quality than MPEG Layer III. The Large Step Scalable System (AAC Scalable) shows almost the same quality as unscaled AAC at the lower (mono) layer and worse quality at the higher (stereo) layers. Still, all layers perform slightly better (highest layer) or significantly better (lower and mid layers) than MPEG Layer III. The scalable system therefore shows good performance compared to older standards while providing the additional functionality of mono/stereo scalable coding. The BSAC (Small Step Scalable System) performed very well at the highest bitrate of 56 Kbps. At the lower bitrate of 40 Kbps, however, BSAC performed worse than expected. Although mainly designed for bitrates from 40 to 64 Kbps mono at a 48 kHz sampling rate, the BSAC tool was still expected to show reasonably good performance when going from 56 Kbps stereo to 40 Kbps stereo at a 24 kHz sampling rate. The conclusion is therefore that the integration of BSAC in the MPEG-4 audio framework needs further investigation to check whether the integration is incomplete or needs changes.

3.2 Formats Comparison

There are many features to look at when choosing the audio format that fits a specific application: bitrate, sampling frequency, compression ratio, whether the format is free or proprietary, etc. The quality of the audio may vary depending on the kind of music compressed, e.g. one format may be the best for classical music and another for speech, so the music used in a test must be specified. There are also differences in quality depending on the bandwidth: e.g., audio transmitted over a 14.4 Kbps analogue modem is rated "very annoying", independently of the audio coding scheme and test material, even under optimum network conditions.
The audio quality improves significantly if a 28.8 Kbps modem is used, reaching quality levels of "slightly annoying" and "annoying". The quality is much better for systems with an ISDN line, or with DSL and cable network connections [49]. The way the audio is played back also affects its quality: a compressed file will sound considerably different when played through a high-end sound card and headphones than through a portable MP3 player with a pair of cheap earpieces. The following tests are subjective (or perceptual) quality tests, which means that a group of users listen to music in different formats and at different bitrates and subjectively judge the quality.

TEST_1: WMA vs. RealAudio vs. MP3 vs. AAC

HOW TEST_A WAS PERFORMED

Five 30 sec. clips of music were selected: an acoustic version of Daughter (Pearl Jam), Radioactivity (Kraftwerk), a cello version of Wherever I May Roam (Apocalyptica), a live version of Time (Pink Floyd), and O Fortuna from Carl Orff's Carmina Burana. These presented a range of challenges to the compression programs, as they ranged from subtle acoustic sounds to full-on orchestral splendor [50]. Each clip was compressed to a variety of bitrates in the following formats: WMA, RealAudio, MP3 and AAC. 30 people were then asked, in a blind test, if they could tell the difference between the compressed and uncompressed audio versions [50]. The testers listened to the sound clips through Sony MDR-7506 headphones.

RESULTS

Codec/Bitrate    AAC     RealAudio¹   MP3     WMA
64 Kbps          12%     36%          5%      9%
96 Kbps          52%     N/A          N/A     11%
128 Kbps         70%     69%          69%     14%
192 Kbps         N/A     81%          70%     N/A
256 Kbps         N/A     79%          73%     N/A

Table 3.1: Test_A results [50]. This table shows the percentage of testers who were unable to tell the compressed version from the uncompressed one.

¹ The RealAudio codec supports slightly different bitrates to the others: 64, 96, 132, 176, and 264 Kbps.
It is not surprising that the higher the bitrate, the higher the percentage of testers who couldn't distinguish between the files [50]. The live Pink Floyd and acoustic Pearl Jam tracks were particularly easy to distinguish, because both pieces contained a variety of subtle sounds (such as audience noise in the background of the Pink Floyd live track) that were lost or garbled in the compressed version. The codec that came off worst in the tests was MP3. The other interesting note is that the RealAudio codec did the best at the lowest bitrate [50].

HOW TEST_B WAS PERFORMED

Four samples of music were compressed, and then 30 people were asked to rate the compressed version against the uncompressed one. The scores are averages of the judges' ratings. The clips were rated on a scale of 1 to 5 according to ITU Recommendation BS.562.3, which uses a five-grade scale for scoring:

BS.562.3 Quality scale
5   Excellent
4   Good
3   Fair
2   Poor
1   Bad

Table 3.2: Scoring of subjective sound quality [56].

RESULTS

Codec/Bitrate    AAC     MP3     RealAudio¹   WMA
64 Kbps          3.4     2.2     4.1          3.6
96 Kbps          4.6     N/A     N/A          4.1
128 Kbps         4.8     4.7     4.8          4.1
192 Kbps         N/A     4.8     4.9          N/A
256 Kbps         N/A     4.9     4.8          N/A

Table 3.3: Test_B results [50].

When the testers rated what they guessed were compressed tracks, all of the formats scored above four at a bitrate of 128 Kbps or higher. MP3 came out on top with a score of 4.9 at the highest bitrate of 256 Kbps. However, RealAudio encoded at 64 Kbps also scored an average of 4.1, significantly better than the other formats at that low bitrate [50]. Interestingly, the testers did not rate WMA files at 128 Kbps any higher than WMA files at 96 Kbps, and at 128 Kbps that format was rated significantly lower than the other formats, even MP3 [50, 51]. Although MP3 is the most widespread format, it performed the worst at 64 Kbps, achieving an average score of only 2.2 [50].

TEST_2: MP3 vs. RA G2 vs.
WMA v.8 vs. OGG vs. MP3pro vs. MPC

HOW THE TEST WAS PERFORMED

The test was made on a classical music sample, which forms a very complex waveform in terms of frequency content, stereo image and dynamic range [52]. The formats were graded from 1 to 5.

RESULTS

The results of the test are shown in Table 3.4.

Quality             Format
5 ("CD quality")    MP3 at 85 Kbps (FhG VBR); RA at 96 Kbps; WMA at 96 Kbps; MP3 at 113 Kbps (lame VBR); OGG at 118 Kbps
4                   MP3 at 56 Kbps and 80 Kbps (MP3 PRO); WMA at 64 Kbps; RA at 64 Kbps; MP3 at 128 Kbps
3 (FM radio)        WMA at 48 Kbps; MPC at 85 Kbps (VBR)
2                   WMA at 32 Kbps; RA at 45 Kbps; MP3 at 48 Kbps (MP3 PRO)
1 (AM radio)        WMA at 20 Kbps; RA at 32 Kbps; MP3 at 40 Kbps

Table 3.4: Codec performance on a typical multimedia computer [52, 53, 54].

More of these interesting audio tests can be found on the Internet [50-55]; some of them even offer samples of the audio so that you can assess the quality yourself.

3.3 Final Conclusions

Some discrepancies can be found in these tests, and in other tests available on the Internet, but I will try to give some general conclusions based on the previous results. Judging from the tests, most people will find music compressed at the higher bitrates, especially if VBR algorithms are used, indistinguishable from the original versions. It is also clear that no algorithm is perfect for everything; users must choose the best one for their particular application (see Table 3.5).

Appliance                           Main requirements                   Format of Choice
Share through CDs, large storage    Quality, Popular                    MP3
Share through the Internet          Quality, Compression                MP3, MP3PRO, AAC, WMA
Portable players, small storage     Compression                         MP3PRO, AAC
Publishing audio on the Internet    Quality, Compression, Popular       MP3
Broadcasting on the Internet        Popular, Compression, Streamable    RA-G2, WMA

Table 3.5: Best choices of algorithm per appliance [52].
The goal was to find the best format for broadcasting audio on the Internet, so the choice should be a format that is popular (so that users will already have the player for it installed), highly compressed (users may have just a modem connection) and streamable (to allow for real-time transmission). WMA and RealAudio fulfil all these requirements, with the drawback of being proprietary and not free. MP3 offers less compression, but has the advantage of being an open and very popular format. Finally, I have used the adjective "CD quality" for some compressed-audio bitrates because it is defined this way on many web pages. However, the term "CD quality" should not be applied to any lossy algorithm (such as the algorithms used on the Internet); if used, the meaning should be that the audio has a quality that is indistinguishable, for humans, from CD quality. That said, I believe that for the majority of music and listening conditions, MP3 properly implemented at 128 Kbps, and WMA, RealAudio and AAC at 96 Kbps, will achieve this "CD quality". For classical music, however, the bitrate may need to be increased to 128 Kbps, even for AAC, WMA and RealAudio.

4 Internet Audio Transmission

The chapter begins with an introduction to the Internet and how it works. Then an overview of the basic streaming techniques is given, and some advanced techniques to improve them are presented. Broadcasting is then defined. Some things need to change in the conventional model of the Internet when streaming audio: additional protocols and a new kind of server must be defined. The main protocols used for multimedia transmission are explained, and web servers and streaming servers are introduced and compared. To end the chapter, the necessity of standardization in streaming media and some current approaches are discussed.

4.1 Introduction

The Internet is a network of networks that spans the globe.
These networks are interconnected so that it is always possible to route data across the networks from one point to another by some route [63]. The information that needs to be sent across the networks is a large piece of binary data, e.g. text, images or video. This piece of data is divided into many smaller pieces called packets (or datagrams). Each packet is given a header containing its destination and source addresses and some other important information [63], and is sent independently across the network (packets belonging to the same message may take different routes depending on network congestion, router failures...) to the receiving machine, where the packets are reassembled to form the original piece of data. Connecting computers together can be difficult, since the computers may be produced by different companies, have different data representations, different voltage levels for encoding 1 and 0, etc. To provide connectivity between computers, the International Organization for Standardization (ISO) defined an abstract model for computer communication, the Open Systems Interconnection (OSI) Reference Model, which divides communication into 7 layers: application, presentation, session, transport, network, data link and physical. This model provides a framework for the development of open systems protocol standards. The corresponding (peer) layers in the computers communicate by means of a set of rules that form a protocol and dictate data format, control information and timing. Under the OSI model, messages are delivered by the application at the source node to the highest layer. The message then travels down the different layers; each layer in the source adds a header containing information for its peer in the destination. At the destination node, the message is received by the lowest layer. The message travels up the layers, with each layer processing and stripping off the header added by its peer at the source [84].
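The header-per-layer encapsulation just described can be sketched in a few lines of Python. This is a toy illustration, not a real protocol stack: the layer names are the real 7-layer names, but the "[layer-hdr]" strings are placeholders for each layer's actual binary header.

```python
# Toy model of layered encapsulation: each layer wraps on the way down,
# and its peer at the destination strips that wrapper on the way up.
LAYERS = ["application", "presentation", "session", "transport",
          "network", "data link", "physical"]

def send(message: str) -> str:
    """Travel down the stack: each layer prepends its header."""
    frame = message
    for layer in LAYERS:
        frame = f"[{layer}-hdr]" + frame
    return frame  # what goes on the wire; the physical header ends up outermost

def receive(frame: str) -> str:
    """Travel up the stack: each layer strips the header added by its peer."""
    for layer in reversed(LAYERS):
        header = f"[{layer}-hdr]"
        assert frame.startswith(header), f"malformed frame at the {layer} layer"
        frame = frame[len(header):]
    return frame

wire = send("10 ms of audio samples")
assert receive(wire) == "10 ms of audio samples"
```

In the real stack the headers are binary structures (such as the IPv4, TCP and UDP headers shown later in this chapter), but the wrap/unwrap symmetry between peer layers is exactly the same.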
Traditional communications on the Internet use a suite of protocols called TCP/IP, the Transmission Control Protocol and the Internet Protocol. However, these protocols are not suitable for real-time multimedia applications, and new protocols must be used, such as RTSP (Real Time Streaming Protocol) and RTP (Real-time Transport Protocol).

4.2 Streaming Audio

4.2.1 Overview

Audio distribution over the Internet is usually based on a client/server model. In this model, programs called servers wait for requests from other programs called clients, which are generally running on a different computer elsewhere in the network. Whenever a server receives a request, it sends a response, which provides some service or data to the client [65]. The server machine makes its services available using numbered ports, one for each service. Clients connect to the server's IP address at a specific port number to get the audio available on the server. There are three ways of getting the audio to the user [68]: • Download: The entire file is downloaded over the net, saved on the user's machine, and then played from the hard disk. This method has some advantages: it allows any bitrate, the audio has much better quality, and the user can listen to the saved file whenever he wants [118]. The main drawback is the extremely long download time if high quality audio is provided over a low bandwidth network, because the complete file has to be downloaded before listening to it. With this method there is no possibility of providing a live service [131]. • Progressive Download (also known as Progressive Streaming): The file is saved locally, as if it was downloaded, but playback begins as soon as part of the file has been downloaded, before the download finishes. The user can see the part of the file that has downloaded at a given time, but can't jump ahead to portions that haven't been transferred yet.
Progressive download is useful for listening to short files at high quality, but it is not a good solution for material that the user may want to access randomly, or for live broadcasts; it is strictly an on-demand technology. It is the method used by common web servers. • Streaming (also known as Real Time Streaming): The file is played directly from the network as it gets to the machine, which allows the user to hear the audio without lengthy download times. The file is never saved on the local hard disk; the server streams the data to the player as soon as it gets it. Streaming is the only delivery method for live material, and it is also well suited for random access material. Figure 4.1: Streaming file formats [69]. Streaming requires the use of special servers, called streaming servers, such as QuickTime Streaming Server, RealServer or Windows Media Server, and also uses special network protocols, such as RTSP or MMS (Microsoft Media Server). How both web and streaming servers work will be explained later in this chapter. To date, streaming audio is much more pleasant to experience than streaming video. A future technology is streaming delivered by email, e.g. Real email (from RealNetworks), which intends to deliver streaming audio and video directly into the body of an email without requiring the recipient to launch a web browser, download or install any executable files, or have any preinstalled media players or plug-ins [70].

4.2.2 Streaming Improvement

Buffering

When a file is streamed, buffering in network elements may cause variations in the packets' arrival rate (jitter), because of the multiplexing of packets from many sources. Additionally, network buffers may overflow, causing packets to be lost, with a corresponding change in the timing relationship. Packets to the same destination may also traverse different paths and arrive out of order.
Timing relationships for audio streams must be restored at the receiver to ensure coherent reception. This consideration is much more important than compensating for the loss of a packet, which is often ignored and appears to the listener as an imperceptible silence [1]. A technique called buffering is used to solve these problems. On the receiving side, the player software stores some of the stream, e.g. 10 seconds (the playout delay), in a buffer before playing it. Thus, the player always has 10 seconds of material to play if there is a network problem, and the listener will never notice the problem. Some sort of time stamping of the data is necessary to enable the receiver to play it back at the appropriate time [1]. The size of this buffer depends on the application: by using a larger buffer and delaying the playout of the data until the buffer is nearly full, variations in jitter can be smoothed out. The downside of a large buffer is that it introduces delay in the audio stream; this delay is not a problem for one-way audio broadcasts (such as radio), but it is for interactive audio conferences [77]. There are also algorithms that vary the buffer length when needed. Late packets that cannot be handled by the buffer must be considered lost, and some form of packet-loss recovery process may be initiated. Packet-loss recovery processes may be classified as redundant or nonredundant [88]: • Redundant or "error resilience" methods require the transmission of extra information (e.g. a repetition of the previous frame) that may be used when packets are lost. The repeated information is generally coded at a lower bitrate than the original stream. Using compression in this way makes the increased bitrate requirements of transmitting redundant information more acceptable.
• Nonredundant or "error concealment" techniques don't require the transmission of extra information, but instead rely on processing within the player, such as repeating previous waveforms or parameters, to reduce the effects of packet loss.

Multi-rate files

To stream audio data, the transmission line has to provide the full bandwidth of the stream during the whole transmission period, which imposes a limit on the bandwidth and hence on the quality of the audio stream [131]. Traditionally, in order to meet the bandwidth capabilities of a wide variety of users, streaming data was encoded at a variety of bandwidths. Each of these different speeds required a separate link on the web page: users had to know how fast their connection was and choose among all the links offered. It is, however, possible to create streaming files that can be streamed at different speeds, so-called multi-speed or multi-rate files, obviating the need for offering multiple bandwidths [143]. This scheme is used in the "SureStream" and "Intelligent Streaming" technologies of RealMedia and Windows Media, respectively. In a nutshell, it stores multiple versions of the content, encoded at every desired bandwidth, in a single file. When the user requests the content, the player and server negotiate the appropriate bandwidth based on the user's capabilities. In fact, if network congestion causes interruptions in delivery, the rate can be re-negotiated to a lower value [3].

Increasing connection bandwidth

Congested networks and overloaded servers, resulting from the growing number of Internet users, contribute to the lack of good quality audio streaming over the Internet. Advanced compression mechanisms can reduce the load on the network, but not on the server. A possible user solution is to install a faster connection. A cable modem offers speeds up to 10 Mbps.
ISDN (Integrated Services Digital Network), unlike standard modem and cable connections, offers faster connections without sharing bandwidth, giving significantly more fluid streaming [71]. Satellite vendors are offering Internet Service Providers (ISPs) non-terrestrial networks to maximize bandwidth usage. Instead of using standard ground cables with standard bandwidth restrictions, ISPs who use satellite networking can theoretically deliver any type of bandwidth imaginable, thereby significantly relieving network traffic congestion.

Target connection speed    Recommended maximum bitrate for streaming clips
14.4 Kbps Modem            10 Kbps
28.8 Kbps Modem            20 Kbps
56 Kbps Modem              32 Kbps
56 Kbps ISDN               45 Kbps
64 Kbps ISDN               56 Kbps
112 Kbps ISDN              80 Kbps
128 Kbps ISDN              96 Kbps

Table 4.1: Typical maximum bitrates for the signal content recommended for streaming multimedia presentations, depending on modem speed [125].

Multicasting

Multicasting is a method for transmitting streaming content that helps in the conservation of bandwidth. The traditional method for broadcasting on the Internet is unicasting, a technology that sends a separate stream of packets to each user that requests it. Networks also support broadcasting, where a single copy of the data is sent to all clients on the network. When the same data needs to be sent to only a portion of the clients on the network, both of these methods waste network bandwidth: unicast wastes bandwidth by sending multiple copies of the data, and may also overload the server; broadcast wastes bandwidth by sending the data to the whole network whether or not the data is wanted [72]. In multicast streaming, instead of broadcasting thousands of streams, the server sends out a single copy of each packet, which is duplicated along the way whenever the paths to different users diverge.
For this to be possible, hosts must be assigned to host groups, with multicast addresses identifying groups instead of single hosts (a range of IP addresses is reserved for this purpose). Bandwidth is thus decreased not only at the server, but also on the entire network, as can be observed in Figure 4.3. Some things must be redefined when using multicast. In the lowest layer, the Ethernet addresses will be multicast addresses. The IP Multicast protocol is used in the network layer, where new routing protocols are also needed. In the transport layer, TCP is not suitable for multicasting, and thus UDP (User Datagram Protocol) is used [73]. Figure 4.2: Comparison between the network load per client when unicasting and multicasting an 8-Kbps PCM audio stream [72]. The user needs information about how to join the multicast. The Session Description Protocol (SDP) is in charge of providing this information via .sdp files, containing the group address, port number, session name and description, the protocol used, etc. SDP files are commonly posted on web servers to announce upcoming multicasts [74]. The IGMP (Internet Group Management Protocol) manages the multicast groups. Figure 4.3: Unicast vs. Multicast [75]. The heterogeneous nature of the Internet makes multicast transmission a challenge. Different receivers of the same multicast data may have different processing capabilities, loss tolerances and available bandwidths in the paths leading to them. The sender application must treat the group of receivers with fairness. An adaptive mechanism can be used: the sender application may transmit one multicast stream and determine which transmission rate satisfies most of the receivers, may transmit at multiple rates, etc. [78]. Furthermore, multicast addresses are recognized only by multicast routers, and not all routers support multicast.
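At the socket level, joining a host group amounts to setting one option. The sketch below (Python; the group address and port are made up for illustration, and would normally come from an announced .sdp file) packs the ip_mreq structure and asks the kernel to issue the IGMP membership report for the group.

```python
import socket
import struct

def make_membership_request(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack the ip_mreq structure passed to IP_ADD_MEMBERSHIP; setting this
    option makes the host emit an IGMP membership report for the group."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))

def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Return a UDP socket subscribed to `group`, ready to receive the stream."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # receive datagrams addressed to group:port
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    make_membership_request(group))
    return sock

# Hypothetical usage, on a multicast-capable network:
#   rx = open_multicast_receiver("239.1.2.3", 5004)
#   packet, sender = rx.recvfrom(2048)
```

Once the group has been joined, the receiver simply reads the audio datagrams; leaving the group (IP_DROP_MEMBERSHIP) causes a corresponding IGMP leave.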
The MBONE is a virtual network that allows multicast packets to travel through routers that are set up to handle only unicast traffic. To achieve this, each router supporting multicast encapsulates the multicast message into a unicast message addressed to the next multicast router. Users behind routers that don't implement multicasting can also receive multicast streams by requesting them from a reflector. A reflector is an RTSP server that joins a multicast, and then converts the multicast into a series of unicasts, passing the streams to the users who request them [see Figure 4.4]. Figure 4.4: You can also receive a multicast through a reflector [74].

Caching and replication

Caching and replication (or mirroring, splitting) technologies are also being used to reduce the load on networks and servers and the response times. Replication involves setting up a source audio server that sends the streams, along with multiple splitting servers in other locations. The streams are then split from the source server to each splitting server, which rebroadcasts them to its own users. Because stream splitting requires fewer direct connections to the origin server, a large amount of bandwidth can be conserved by enabling this feature [79]. Unfortunately these proposals and implementations do not currently use multicast technology, to their detriment [1]. Caching cannot be used for live transmission either, just for on-demand content. Figure 4.5: Stream replication [80].

Error spreading

Researchers continue to look for ways to improve streaming technology, some of them oriented towards error handling and coding methods. One of the most annoying disturbances when listening to audio is bursty losses caused by congestion. Error spreading is a technique that permutes the input sequence of packets of a continuous stream of data before transmission; at the receiver the packets are unscrambled.
This technique ensures that bursty losses in the transformed domain get spread all over the sequence in the original domain, thus improving the perceptual quality of the stream [81].

4.3 Broadcasting

When an audio file is delivered on demand, it starts from its beginning when the user clicks the presentation link in a web page. Each user can receive the file at any time and use the player's controls to fast-forward or rewind through the file [137]. In an Internet radio station, however, the audio is broadcast (even if there are some audio clips stored on the web page). In a broadcast, the broadcaster starts the audio at a certain time, and users who click the audio link join the broadcast in progress. Before the broadcast begins and after it completes, the audio URL is not valid, and during the broadcast the player's fast-forward and rewind controls do not function. To make an analogy, on-demand content is like a song on a tape: the user can listen to it at any time, skip forward, rewind, and pause. A broadcast, though, is like a song broadcast on a radio channel. As with radio, there are two types of streaming media broadcasts: • Live content (also known as real time content): Live broadcasting occurs when encoding is from a live source and the encoded stream is broadcast over the Internet at the time of encoding. • Pre-recorded content (also known as pre-stored content): Pre-recorded content consists of audio recorded and written to a digitized clip. The clip can be edited before converting it to a streaming format and broadcasting it across a network. To the user, the audio sounds just like a live broadcast.

4.4 Multimedia Protocols

The multimedia protocols define the establishment of the connection and the transmission of the media from the server to the clients.
Multimedia protocols differ in some respects from the traditional protocols used for Internet transmissions, and must fulfil some requirements: • The protocols should provide a way for a sender to tell a receiver which coding scheme it wants to use, so that they can interoperate. • The receiver should be able to determine the timing relationship among the received data, to be able to reconstruct the audio information. • The receiver should also be able to determine whether there is packet loss. This information is also needed to detect congestion. • The bandwidth should be used efficiently. Audio packets tend to be small, to reduce the time it takes to fill them with samples; if a long header were added by a protocol, a large amount of link bandwidth would be wasted.

Session/Presentation/Application Layer   HTTP, RTSP, SDP, SIP, SAP
Transport Layer                          TCP, UDP, RTP, RTCP, RSVP
Internet/Network Layer                   IP, ICMP, IP multicast
Physical/Data Link Layer                 Ethernet

Table 4.2: Multimedia protocols at each Internet OSI layer.

In Table 4.2 we can see the layered scheme of the multimedia protocols according to the OSI architecture. These protocols, as applied to audio (which is the focus of this thesis), are explained in the following.

4.4.1 Physical/Data Link Layer

Ethernet

The physical layer specifies protocols for the actual transmission of the audio data across the link [84]. Most computers today are connected to a LAN, where the most popular technology is Ethernet, which offers high bandwidth, up to 1 Gbps [83]. Both multicast and unicast are supported by the Ethernet technology.

4.4.2 Internet/Network Layer

Internet Protocol (IP)

IP is a connectionless protocol, and it is also referred to as an unreliable protocol, because it relies on other layers to provide handshaking and error detection and correction. The IETF (Internet Engineering Task Force) describes IP in RFC 791 [86].
The functions that IP performs include: • Defining a packet and an addressing scheme. • Moving data between transport layer and network access layer protocols. • Routing packets to remote hosts. • The fragmentation and reassembly of packets.

←--------------------------------------- 32 bits ---------------------------------------→
Version | IHL | Type of Service | Total Length
Identification | Flags | Fragment Offset
Time to Live | Protocol | Header Checksum
Source Address
Destination Address
Options (+ padding)
Data (variable)

Table 4.3: IPv4 packet format [85].

The fields in an IPv4 packet are represented in Table 4.3, and their use is described below: • Version: Indicates the version of IP currently used. • IP Header Length (IHL): Indicates the header length in 32-bit words. • Type-of-Service: Specifies how an upper-layer protocol would like a current packet to be handled, assigning various levels of importance to the packets. • Total Length: Specifies the length, in bytes, of the entire IP packet, including the data and header. • Identification: Contains an integer that identifies the current packet. This number is used to reassemble packet fragments, which are created when the size of the packet is greater than the MTU (Maximum Transfer Unit) of the destination access network or an intermediate network, and the original packet has to be divided. • Flags (3 bits): The low-order bit specifies whether the packet can be fragmented. The middle bit specifies whether the packet is the last fragment in a series of fragmented packets. The third, high-order bit is not used. • Fragment Offset: Indicates the position of the fragment's data relative to the beginning of the data in the original packet, which allows the destination IP process to properly reconstruct the original packet.
• Time-to-Live: Maintains a counter that is gradually decremented down to zero, at which point the packet is discarded (this keeps packets from looping endlessly in the network). • Protocol: Indicates which upper-layer protocol receives incoming packets after IP processing is complete. • Header Checksum: Helps ensure IP header integrity. • Source Address: Specifies the sender's IP address. • Destination Address: Specifies the receiver's IP address. • Options: Allows IP to support various options, such as security. • Data: Contains upper-layer information (≤ 65535 bytes). The only other protocol generally placed at the Internet layer is the Internet Control Message Protocol (ICMP), a protocol used to communicate control messages between IP systems. ICMP messages generally contain information about routing difficulties with IP datagrams, or simple exchanges such as time-stamp or echo transactions. The IETF describes ICMP in RFC 792 [87]. With the Internet's massive growth, the Internet address space was being exhausted, so the IETF designed a new version of IP, called IP version 6 (IPv6). IPv6 increases the IP address size from 32 bits to 128 bits, to support more levels of addressing hierarchy, a much greater number of addressable nodes and simpler auto-configuration of addresses. Scalability of multicast addresses was introduced, and a new type of address, the anycast address, which allows sending a packet to any one of a group of nodes, was also defined. IPv6 replaces the 'Type of Service' field of the IPv4 header with two new fields, 'Flow Label' (24 bits, to distinguish between different flows) and 'Priority' (4 bits, to assign different priority levels to the flows), to improve the QoS. The IETF describes IPv6 in RFC 1883 [90]. The ICMP was revised during the definition of IPv6.
The multicast control functions of the IPv4 Internet Group Management Protocol (IGMP) were incorporated into ICMPv6, which is defined in RFC 2463 [91].

IP Multicast

We have seen the suitability of multicast technology for audio streaming transmission. The IP Multicast protocol was created to extend the IP protocol to cater for multiple-user transmission [84]. It was defined by the IETF in RFC 2365 [92]. While IP works by assigning each host a unique address called an IP address, IP Multicast extends IP by assigning group IP addresses to groups of users, so that the information only needs to be sent once, to the group IP address, for all users in that group to receive the data, using network resources and bandwidth more efficiently. The IP Multicast packet format is similar to the IP packet, but with multicast IP addresses.

4.4.3 Transport Layer

Transmission Control Protocol (TCP)

TCP is a transport level protocol that controls the transmission and the flow of data between two hosts. The TCP protocol is reliable, because it ensures that every single packet is delivered, through the use of sequence numbers, ACKs and retransmissions. It also provides flow control, congestion control and error recovery [84]. The IETF describes TCP in RFC 793 [93]. Figure 4.6: The Transmission Control Protocol [63].

←--------------------------------------- 32 bits ---------------------------------------→
Source Port | Destination Port
Sequence Number
Acknowledgment Number
Data Offset | Reserved | Flags | Window
Checksum | Urgent Pointer
Options (+ padding)
Data

Table 4.4: TCP packet format [89].

The TCP packet format is represented in Table 4.4, and the packet's fields are: • Source port: Source port number. • Destination port: Destination port number. • Sequence number: The sequence number of the first data octet (except when SYN is present).
If SYN is present, the sequence number is the initial sequence number (ISN) and the first data octet is ISN+1.
• Acknowledgment number: If the ACK control bit is set, this field contains the value of the next sequence number that the sender is expecting to receive. Once a connection is established, this value is always sent.
• Data offset (4 bits): The number of 32-bit words in the TCP header, which indicates where the data begins. The TCP header's length is always an integral number of 32-bit words.
• Reserved (6 bits): Reserved for future use. Must be zero.
• Flags (6 bits): The control bits are:

U (URG): Urgent pointer field.
A (ACK): Acknowledgment field.
P (PSH): Push function.
R (RST): Reset the connection.
S (SYN): Synchronize sequence numbers.
F (FIN): No more data from sender.
Table 4.5: Control bits in the TCP packet [89].

• Window (16 bits): The number of data octets that the sender is willing to accept, beginning with the octet indicated in the acknowledgment field.
• Checksum (16 bits): The checksum field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header and data. If a segment contains an odd number of header and data octets to be checksummed, the last octet is padded on the right with zeros to form a 16-bit word for checksum purposes. The pad is not transmitted. While computing the checksum, the checksum field itself is replaced with zeros.
• Urgent Pointer (16 bits): This field stores the current value of the urgent pointer as a positive offset from the sequence number. The urgent pointer points to the sequence number of the octet following the urgent data. This field can only be interpreted when the URG control bit has been set.
• Options: Options may be transmitted at the end of the TCP header and always have a length which is a multiple of 8 bits. All options are included in the checksum. An option may begin on any octet boundary.
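As an illustration of the checksum computation described above (the same one's complement sum is used by IP, TCP and UDP), a minimal sketch in Python; this is an illustrative implementation, not code from the thesis:

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:                    # odd length: pad with a zero octet
        data += b"\x00"                  # (the pad is not transmitted)
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # each pair is a 16-bit word
    while total >> 16:                   # fold carry bits back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF               # one's complement of the sum

# Example words from RFC 1071: 0x0001, 0xF203, 0xF4F5, 0xF6F7
print(hex(internet_checksum(b"\x00\x01\xf2\x03\xf4\xf5\xf6\xf7")))  # 0x220d
```

To compute the checksum of an actual TCP segment, the checksum field would first be zeroed and the pseudo-header prepended, as the text describes.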
The use of TCP in multimedia communications is restricted to the transmission of control messages between clients and servers [84]. It is not well suited for continuous audio streaming, because of its reliability [95]: a lost packet causes a dropout in the audio, but this is more acceptable than retransmitting the packet, which would probably arrive too late to be useful while pausing the audio in the meantime. TCP is not suited to multicast either [102].

User Datagram Protocol (UDP)

In order to stream, several things need to be changed with respect to the traditional TCP/IP model; current streaming technology relies on UDP rather than TCP. UDP is an unreliable transport protocol that forsakes the reliability and flow control of TCP and simply sends out the data as a continuous stream of packets to the receiver. The resulting packet stream depends only on the bandwidth of the connection, which is in turn affected by network congestion. The IETF describes UDP in RFC 768 [97].

Figure 4.7: The User Datagram Protocol [63].

←------------------------------- 32 bits -------------------------------→
Source port                        | Destination port
Length                             | Checksum
Data
Table 4.6: UDP packet format [89].

The UDP packet fields are represented in Table 4.6 and are:
• Source and Destination port (16 bits): These ports have a meaning within the context of a particular IP source/destination address.
• Length (16 bits): The length in octets of this packet, including header and data.
• Checksum: The 16-bit one's complement of the one's complement sum of a pseudo-header of information from the IP header, the UDP header, and the data, padded with zero octets at the end (if necessary) to make a multiple of two octets. The pseudo-header contains the source address, the destination address, the protocol, and the UDP length.
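UDP's connectionless model can be sketched with Python sockets; the loopback address and the message contents below are arbitrary choices for illustration:

```python
import socket

# Receiver: a UDP socket simply binds and reads datagrams -- no handshake,
# no acknowledgements, no retransmissions.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
rx.settimeout(5)
port = rx.getsockname()[1]

# Sender: each sendto() is one independent datagram; the sender never
# learns whether it arrived.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"audio-frame-0001", ("127.0.0.1", port))

data, addr = rx.recvfrom(2048)
print(data)                          # b'audio-frame-0001'
tx.close()
rx.close()
```

On a loopback interface the datagram arrives reliably; over a real network, a streaming application must itself cope with loss and reordering, which is exactly what RTP's sequence numbers (below) are for.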
One problem that does exist with UDP is that many network administrators see UDP as a security risk, and choose to combat it by blocking all or most UDP traffic with a firewall. If the receiving network runs a firewall that blocks UDP packets, the client cannot receive them and the sender will have to fall back to TCP [63]. Since UDP does not implement congestion control, applications implemented over UDP should themselves detect and react to congestion in the network. Ideally, they should do so in a way that ensures fairness when competing with existing Internet traffic; otherwise such applications may obtain larger portions of the available bandwidth than TCP-based applications.

Real-time Transport Protocol (RTP)

RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio or video, over multicast or unicast technologies. RTP does not address resource reservation and does not guarantee QoS. Since RTP does not have all the functions of a transport protocol, it is typically used on top of TCP or UDP, usually UDP, and is thus connectionless. RTP is not part of the TCP/IP protocol stack, so applications must add and recognize a new 12-byte header in each UDP packet [83]. The sender fills in each header, whose format can be observed in Table 4.7, and which contains:

• V (Version): Identifies the RTP version.
• P (Padding): When set, the packet contains one or more additional padding octets at the end that are not part of the payload. In the case of padding, the complete length of the RTP header, data and padding is carried by the lower-layer protocol header (the UDP header), and the last byte of the padding contains a count of how many bytes should be ignored. This approach removes any need for a length field in the RTP header; in the common case of no padding, the length is deduced from the lower-layer protocol.
←------------------------------- 32 bits -------------------------------→
V | P | X | CSRC count | M | Payload type | Sequence number (2 bytes)
Timestamp (4 bytes)
SSRC (4 bytes)
CSRC list (0-60 bytes)
Table 4.7: RTP header format [89].

• X (Extension bit): When set, the fixed header is followed by exactly one header extension.
• CSRC count: Contains the number of CSRC identifiers that follow the fixed header.
• M (Marker): The interpretation of the marker is defined by a profile. It is intended to allow significant events, such as frame boundaries, to be marked in the packet stream.
• Payload type: Describes the type of data transported, such as voice, audio or video, and how it is encoded.
• Sequence number: Increments by one for each RTP data packet sent, and may be used by the receiver to detect lost, out-of-order and duplicate packets.
• Timestamp: Reflects the sampling instant of the first octet in the RTP data packet. It is used to reconstruct the timing of the original audio in the receiver.
• SSRC: Identifies the synchronization source. This identifier is chosen randomly, with the intent that no two synchronization sources within the same RTP session will have the same SSRC identifier. By making the source identifier something other than the network or transport address of the source, RTP ensures independence from the lower-layer protocol, and enables a single node with multiple sources to distinguish those sources.
• CSRC: Contributing source identifier list. Identifies the contributing sources for the payload contained in this packet.

After this header there may be optional header extensions. The intention of the fixed header is that it contains only the fields that are likely to be used by many applications, since anything that is very specific to a single application would be more efficiently carried in the RTP payload for that application only [89]. The payload follows the header in the RTP packet.
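A sketch of reading the fixed 12-byte RTP header described above, using Python's struct module; the sample field values (payload type 14, i.e. MPEG audio, sequence number, timestamp and SSRC) are invented for illustration:

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Unpack the fixed 12-byte RTP header laid out in Table 4.7."""
    v_p_x_cc, m_pt, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version":      v_p_x_cc >> 6,        # V: 2 bits
        "padding":      (v_p_x_cc >> 5) & 1,  # P: 1 bit
        "extension":    (v_p_x_cc >> 4) & 1,  # X: 1 bit
        "csrc_count":   v_p_x_cc & 0x0F,      # CSRC count: 4 bits
        "marker":       m_pt >> 7,            # M: 1 bit
        "payload_type": m_pt & 0x7F,          # Payload type: 7 bits
        "sequence":     seq,                  # 16 bits
        "timestamp":    ts,                   # 32 bits
        "ssrc":         ssrc,                 # 32 bits
    }

# Build a sample header: version 2, no padding/extension/CSRCs, PT 14
hdr = struct.pack("!BBHII", 0x80, 14, 1234, 160000, 0xDEADBEEF)
print(parse_rtp_header(hdr)["payload_type"])   # 14
```

A receiver would apply exactly this parsing to each incoming UDP payload, then use the sequence number and timestamp to reorder packets in the play-out buffer.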
Because RTP is designed to support a wide variety of applications, it provides a flexible mechanism by which new applications can be developed without repeatedly revising the RTP protocol itself. For each class of application, such as audio, it defines a profile and one or more payload formats. The profile provides a range of information that ensures a common understanding of the fields in the RTP header for that application class. The format specification explains how the data in the payload is to be interpreted [83]; for example, there are payload formats defined for MPEG-2 AAC [103], MPEG-4 [104, 105], MP3 [106], MPEG-1/MPEG-2 video [107], etc.

The protocol also includes a provision to support "translators" and "mixers". These devices interconnect different networks that may support different transports or codecs. A translator allows user machines to interact with each other: it accepts the traffic of one machine and translates (re-encodes) it into a format consistent with the bandwidth limitations of the transit network and/or the receiving machine. A mixer receives streams of RTP packets from one or more sources, possibly changes the data format, combines the streams in some manner and then forwards the combined stream. Translators and mixers can also convert multicast addresses to multiple unicast addresses so as to reach participants not on a multicast network [1, 96].

The IETF describes the basis of RTP in RFC 1889 [99]. The International Telecommunication Union (ITU) employs RTP in the multimedia communications standard H.323, and it is also recommended by the Internet Streaming Media Alliance (ISMA).

Real Time Control Protocol (RTCP)

The Real Time Control Protocol (RTCP) is used in conjunction with RTP, as its control protocol, and is also defined in RFC 1889 [99]. RTCP provides information such as sender/receiver reports and statistics about the connection, e.g. the number of packets lost or successfully delivered.
The sender may then modify its transmission based on the information provided by RTCP; e.g. in rate-adaptive applications it may decide to use a more aggressive compression scheme to reduce congestion, or to send a higher-quality stream when there is little congestion. RTCP data is sent out periodically between the RTP data, and is limited to 5% of the overall session bandwidth. This limitation is important because RTCP information must not be allowed to overwhelm the connection, thereby slowing down the audio information carried by RTP [99].

←------------------------------- 32 bits -------------------------------→
Version | P | Reception report count | Packet type | Length (2 bytes)
Table 4.8: RTCP header format [89].

RTCP defines a number of different packets depending on the function: sender report, receiver report etc. The header fields that are common to all the packets are represented in Table 4.8 and are:

• Version: Identifies the RTP version, which is the same as the one in RTP packets.
• P (Padding): When set, this RTCP packet contains some additional padding octets at the end that are not part of the control information. The last octet of the padding is a count of how many padding octets should be ignored.
• Reception report count: The number of reception report blocks contained in this packet. A value of zero is valid.
• Packet type: Contains a constant identifying the kind of RTCP packet: 200 for a sender report, 201 for a receiver report etc.
• Length: The length of this RTCP packet in 32-bit words minus one, including the header and any padding.

Resource Reservation Protocol (RSVP)

Streaming applications require a minimum bandwidth to transmit a stream. RSVP was created to allow a stream to reserve bandwidth hop by hop, from the receiver to the source, so as to provide an acceptable QoS [83].
It works both with unicast and multicast; the only difference is that in multicast environments an application uses RSVP to reserve bandwidth after it has joined a multicast group [1]. When a client wants to reserve bandwidth, it invokes RSVP by sending an RSVP request along the proposed delivery path to the server. All the routers on the path process this request. When a router receives an RSVP request, it tries to make the reservation on the downstream path to the client. If the reservation cannot be made, an error is propagated back to the client, and the client may try an alternative path. If the reservation is successful, the router forwards the request to the next router. After a successful reservation, the client receives a message specifying the path through the network on which the reservations have been made [84]. RSVP is described by the IETF in RFC 2205 [109].

4.4.4 Session/Presentation/Application Layer

Hypertext Transfer Protocol (HTTP)

HTTP is an application-level protocol traditionally used to transfer data on the Internet. In spite of being well suited for the transfer of web pages, it is not suitable for real-time audio streaming, because it is entirely based on TCP. Nevertheless, HTTP can still be used for streaming in two ways [111]:

• HTTP streaming: Used in progressive streaming.
• HTTP tunneling: Used by many streaming servers (Real Server, Microsoft Media Server…) in case UDP or TCP streaming fails (streaming protocols typically use a port that is not permitted by the user's firewall). The server then wraps its packets inside HTTP messages so that they look like regular HTTP traffic and can bypass the user's firewall.

HTTP is defined by the IETF in RFC 2616 [110].

Real Time Streaming Protocol (RTSP)

RTSP was designed to control a continuous multimedia (e.g. audio) stream over the Internet in real time. It supports many underlying protocols, like UDP, TCP, RTP, RSVP and multicast.
It is described by the IETF in RFC 2326 [112]. RTSP acts like a network remote control requesting audio streams, in much the same way that a user requests a web page from a web server using HTTP. For on-demand audio it provides the functionality necessary to stop, start, pause or go to a particular point in the stream; for live audio streams, to schedule a time at which to start playback. It does not actually carry any audio itself; this is carried on a separate network connection by RTP and RTCP. This is a benefit because control requests can be made with RTSP using a reliable protocol (TCP) on one connection, while the audio is streamed, uninterrupted, using an unreliable protocol (UDP) on another connection [65].

The overall audio presentation, and the properties of the audio streams it is made up of, are defined by a presentation description file, whose format is usually defined by the Session Description Protocol (SDP) [65]. The presentation description file contains information about the data streams, including their encoding, language etc. In this presentation description, each audio stream that is individually controllable by RTSP is identified by an rtsp:// URL, which tells an application that the presentation is located externally on a streaming server and is using RTSP. This URL points to the server handling that particular audio stream and names the stream stored on that server, e.g. rtsp://myserver/myaudio.rm (.rm indicates RealAudio) [147]. Besides the audio parameters, the network destination IP address and port need to be determined to establish the connection. Several modes of operation can be distinguished:

• Unicast: The audio is transmitted to the source of the RTSP request, with the port number chosen by the client.
• Multicast: The server picks the multicast address and port.
This is the typical case for a live transmission, where the client must join the multicast before being able to listen to the audio [135]. RTSP is used to establish the client/server connection. Initially, when the client clicks on the audio link, the client and server establish a TCP connection [84]. The presentation description file is typically obtained by the client using HTTP, or by other means such as email.

Figure 4.8: RTSP operation [113]. Protocols: HTTP/TCP, RTSP/TCP, RTP/UDP.

Once it has the presentation description, the client sends a SETUP command to the server specifying the desired audio presentation (typically specified as a URL), the selected protocol that should be used to transmit the audio, and the port number the client has made available to receive the audio. The server makes any resource allocation needed and responds with a message that includes a session identifier (an arbitrary string used by both client and server to identify messages associated with the same session), the selected protocol, and an acknowledgement of the client's port numbers. In addition, the server provides port numbers for feedback sent by the client. Everything is then in place for the streaming to begin [65]. In the simplest mode of operation, the client sends a PLAY request to cause the server to begin sending data, and PAUSE requests to temporarily halt it. A PLAY request may include a header specifying a range within the duration of the stream. A session is ended by the client sending a TEARDOWN request, which causes the server to deallocate any resources associated with the session.

Session Control Protocols

While the protocols above provide a wide range of the functionality needed for multimedia applications, there is an aspect of multimedia that they do not address: session control, i.e. information about addresses, ports, encoding etc.
The protocols involved in this functionality are:

Session Description Protocol (SDP)

SDP is purely a format for session description; it does not incorporate a transport protocol, and is intended to be carried by various other protocols, including the Session Announcement Protocol (SAP), the Session Initiation Protocol (SIP), RTSP, electronic mail, and HTTP. It is described in RFC 2327 [114]. The purpose of SDP is to convey information about media streams in multimedia sessions, to allow the recipients of a session description to participate in the session. The information provided by SDP includes:

• Session name and purpose.
• Time(s) the session is active.
• The media comprising the session: type of media (audio, video), transport protocol (RTP/UDP/IP), format of the media compression if used (MPEG etc.).
• Information needed to receive those media (IP unicast/multicast addresses, ports, formats etc.).
• Transport methods that the server is capable of understanding.

As the resources necessary to participate in a session may be limited, some additional information, such as the bandwidth to be used, may also be desirable.

Session Announcement Protocol (SAP)

SAP is used to assist the advertisement of multicast sessions, and to communicate the relevant session setup information to prospective participants. SAP periodically multicasts an announcement packet, containing a session description, to a well-known multicast address and port. SAP services provide information on all the known servers throughout the entire network [77, 89]. The protocol is described in RFC 2974 [115].

Session Initiation Protocol (SIP)

SIP is a signaling protocol for initiating, managing and terminating sessions across packet networks. It is described in RFC 2543 [116]. SIP sessions involve one or more participants and can use both unicast and multicast.
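As an illustration of the information SDP carries, a minimal, entirely hypothetical description of a multicast MPEG-audio radio session might look like the following (addresses, timestamps and the session name are invented; payload type 14 is the static RTP payload number for MPEG audio):

```
v=0
o=- 2890844526 2890842807 IN IP4 10.0.1.5
s=Example Web Radio
c=IN IP4 224.2.17.12/127
t=0 0
m=audio 5004 RTP/AVP 14
a=rtpmap:14 MPA/90000
```

The `m=` line names the media type, port and transport (RTP under the audio/video profile), while the `c=` line gives the multicast address and TTL a client needs in order to join the session.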
4.4.5 Asynchronous Transfer Mode (ATM)

ATM is an alternative network technology to TCP/IP, based on transferring data in cells of a fixed size. ATM is a connection-oriented technology: a fixed channel, or route, is created between two points whenever data transfer begins. This differs from the TCP/IP connectionless model, in which messages are divided into packets and each packet can take a different route from source to destination. This difference makes it easier to track and bill data usage across an ATM network, but makes it less adaptable in case of network congestion [142]. The small, constant cell size allows ATM equipment to transmit video, audio and computer data over the same network, and ensures that no single type of data hogs the connection. Because of this, some people think that ATM holds the answer to the Internet bandwidth problem. ATM also offers QoS, generally with a choice of four different types of service:

• Constant Bitrate (CBR) specifies a fixed bitrate.
• Variable Bitrate (VBR) provides a specified throughput capacity, but data is not sent at a constant bitrate.
• Unspecified Bitrate (UBR) does not guarantee any throughput. This is used for applications, such as file transfer, that can tolerate delays.
• Available Bitrate (ABR) provides a guaranteed minimum bitrate, but allows data to be bursted at higher bitrates when the network is free.

ATM's scalability (it works at different speeds in different media), bandwidth efficiency and support for multiple traffic types make it a suitable choice for streaming. An international standard exists (ITU-T J.82) which defines how the MPEG-2 Transport Stream is streamed over ATM using the AAL [141].

4.5 Media Servers

When users connect and request web pages (or any other on-demand content), these are stored on web servers, and the pages are transferred using HTTP [66].
In applications involving live audio streaming, special servers called streaming servers have to be used. The operation of web servers and streaming servers when serving audio is analyzed and compared here [7].

4.5.1 Web Servers

A web server stores the compressed audio file, and the web page containing the audio file's URL [7, 117]. When a user clicks on a hyperlink for an audio file, the user's browser connects to the server named in the hyperlink using HTTP, and requests the contents of the file named in the hyperlink using a GET request message. The server responds by returning the contents of the file in a GET response message. On receipt of this, the browser determines from the header of the message the type of data (audio) and the compression method used. The browser then invokes the media player, which decompresses the contents of the file and outputs the resulting byte stream to the sound card. The disadvantage of this approach is that, since the browser must first receive the contents of the file in its entirety, an unacceptably long delay is introduced if the contents are of significant size. Hence, for larger files, an alternative approach is used which enables the file to be sent directly to the media player rather than through the browser [7].

Figure 4.9: Streaming from a Web Server [113].

Using this approach, when an audio file is created a second file is also created. This second file, called a metafile, holds the URL of the original file containing the compressed audio, and a specification of the content type stored in the file. The metafile also has a URL associated with it and, when a hyperlink to audio is included in a web page, the URL of the metafile is used rather than the URL of the original file.

Figure 4.10: Metafile requests [113].

Thus when the user clicks on the hyperlink, the GET response message contains the contents of the metafile.
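A metafile is typically nothing more than a short text file holding the URL of the actual audio. As a hypothetical example (the server name and path are invented), a .m3u metafile for an MP3 stream could contain just:

```
http://myserver:8000/listen.mp3
```

The web server returns this one-line file with a content type that causes the browser to hand it to the media player rather than render it itself.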
The browser accesses the metafile and invokes the player as before, but this time it simply passes the presentation description in the metafile to the media player. The media player, on determining that this is a metafile, reads the URL of the original file and then proceeds to obtain the contents of the original file in the normal way using HTTP/TCP. On receipt of the file contents, the media player streams the received compressed contents into its buffer. After a predefined delay, to allow the buffer to partially fill, it starts reading the stream from the buffer and, after decompression, outputs the resulting byte stream to the audio card. This approach removes the delays that are introduced when the file contents are accessed through the browser. The limiting factor with this approach is that, since the audio file is accessed using HTTP/TCP, long delays can be introduced by TCP retransmissions. In general, for files containing real-time information UDP is used, meaning that HTTP cannot be used and so a different server, called a streaming server, must be chosen [7].

4.5.2 Streaming Servers

These servers use the RTSP, RTP, UDP and IP protocols, and their operation is the same as described for RTSP [see 4.4.4].

Figure 4.11: Streaming from a Streaming Server [113].

4.5.3 Web Servers vs. Streaming Servers

The primary advantage of the web server approach is that it requires one less software component (the streaming media server) to learn and manage [117]. The streaming server approach has these advantages:

• More efficient use of the network bandwidth, since it does not use retransmissions.
• Better audio quality, due to advanced features like detailed reporting and multi-speed audio content.
Files created to be streamed at multiple speeds (multi-rate or multi-speed files) can only be streamed from a streaming server; files created to be streamed at a single speed (single-rate encoding) can be streamed from both a web server and a streaming server. This is because the streaming server is specifically designed to determine the speed of the connection, and to stream the file at the optimum speed for that particular connection [143].
• Support for a large number of users.
• Support for multicasting.
• Support for both live and on-demand content.

4.6 Standards Organizations

As has been shown, there are many different file formats and multimedia protocols on the Internet. Internet appliances would benefit from a single standard, since audio devices often cannot afford to have e.g. multiple streaming media players installed to listen to differently compressed audio content from the web, or to understand different protocols [126]. The Internet Society (ISOC) is the organizational home for the groups responsible for Internet infrastructure standards, including the Internet Engineering Task Force (IETF), the European Telecommunications Standards Institute (ETSI), the ITU and the Internet Architecture Board (IAB) [127]. Some standards are already widely used, such as those for speech transmission, H.323 for videoconferencing or H.324 for multimedia communications, but the streaming audio standard is not that clear yet. MPEG, a working group of the ISO/IEC that develops the MPEG audio/video coding standards, has already been discussed. For streaming media, the Internet Streaming Media Alliance (ISMA) [126] was created with the purpose of accomplishing standards for rich Internet content, streaming video and audio.
“The Alliance believes that by creating an interoperable approach for transporting and listening streaming media, content creators, product developers and service providers will have easier access to the expanding commercial and consumer markets for streaming media services” [126]. “Standards for many of the fundamental pieces needed for a streaming media over IP solution do exist. The ISMA adopts parts or all of those existing standards and contributes to those still in development in order to complete, publish and promote a systemic, end-to-end specification that enables cross-platform and multi-vendor interoperability. The first specification from the ISMA defines an agreement for streaming MPEG-4 video and audio over IP networks” [126]. It also promotes the use of RTP and RTSP as the protocols for streaming multimedia.

5 Web Radio Stations

The goal of this chapter is to introduce web radio broadcasting as a growing technology, and the general operation of an Internet radio station.

5.1 Introduction

The development of audio broadcasting via the web is probably the biggest revolution in broadcasting since the advent of FM. An incredible variety of stations stream their audio on the Internet, from small stations to national radio, from broadcasters already known for their FM services to Internet-only stations. Additionally, numerous web radio stations have their own web sites with archives of their programs for on-demand listening [123]. Web radio is becoming more and more popular. Taking a look at some statistics, the number of listeners has grown considerably in the last years and promises to continue growing. “The streaming media audience in the US is following the same pattern as the surge in Internet popularity” [139].
Listening Sessions: 351,934 sessions
Time Spent Listening Per Session: 1 hour 17 minutes
Time Spent Listening Per Unique Listener Per Month: 9 hours 2 minutes
Monthly Aggregate Hours Listened: 451,648 hours
Table 5.1: Web radio station statistics from May 2002 [132].

Today, there are approximately 2500 Internet radio stations in Real Networks, around 3000 stations (Internet radio and television) in Windows Media and almost 3000 in SHOUTcast, according to data offered on their web sites. These numbers are growing at a rapid pace. There are several factors that contribute to this growth of web radio stations [119]:

1. Web radio eliminates the coverage restriction found with FM radio stations. A radio station on the network can be accessed from any computer with Internet access.
2. Another factor is the ease of setting up a radio station server. Cheap hardware with Internet access, along with free/low-cost high-quality software, enables any user to create his/her own web radio station.

5.2 How do they work?

To summarize all the previous chapters and put all the bits in order, a look is taken here at the operation of a web radio system, with both live broadcasting and automatically updated audio archives. How the client initially connects to the streaming server using RTSP has already been explained in previous chapters. An audio source (e.g. a satellite receiver, a radio tuner or live content) delivers the content to be broadcast on the Internet. In the broadcaster's equipment the analog audio signal is captured, digitized and encoded in real time by the ADC and the encoder. The output of the encoder is compressed digital audio in a streamable format. As the audio transmission begins, the contents are broken into RTP packets, containing information about the coding technique, sequence numbers and timestamps, and each packet is sent as soon as it is prepared [131].
The packets are sent over UDP, to either a multicast or a unicast destination address. Broadcasters will usually create .sdp files containing all the needed information about the live presentation [135]. There may also be a reflector that takes the audio stream and repeats it. This reflector allows a multicast client to listen to the stream as a normal unicast stream coming from the streaming server [135]. The streaming server then transmits the audio to the client(s), if there are any currently demanding the live stream. Otherwise the data is discarded [131].

The user's browser must have a streaming media player installed. The encoded audio data samples in each RTP packet are placed in the player's buffer together with previously received audio samples. The samples are placed in the buffer in contiguous order, based on their sequence numbers and timestamps, so that the original audio can be recovered [77]. When the buffer is sufficiently full, the samples are decoded and sent to the sound card, which converts the bitstream back to an analog signal, amplifies it and outputs it to the speakers. Further packets continue to arrive; thus the buffer is being filled and emptied simultaneously as playback continues, usually uninterrupted. In case of network congestion, playback may stop and the user will experience a pause in the audio while the player attempts to refill the buffer [131].

A copy of the original stream is passed to the recorder, which stores the audio on hard disk according to a schedule that determines file names and start and end times. When a client clicks on the URL, demanding a stored audio stream, the client's browser sends a request to the streaming server, which reads the audio from hard disk and transmits it to the client [131].

6 Implementing a Web Radio Station

This chapter covers all the issues related to the implementation of a web radio station.
The basic elements are reviewed and all the steps to create a web radio station, with both live and on-demand content, are explained. There are many commercial systems available for streaming; the most commonly used systems (Windows Media, Real Networks and SHOUTcast/Icecast) are explained and their main advantages and disadvantages are discussed at the end of the chapter.

6.1 Elements

A streaming technology consists of hardware and software components that work together to create, store and deliver media files over the web. It has three main components:

• Client side or player: It decompresses the audio bitstream taken from the buffer and passes it to the sound card, which converts the decompressed bitstream back to an analog signal, amplifies it and outputs it to the speakers [7]. Players can be either a browser plug-in or a helper application that the user must download and install. Most of them are installed automatically by the browser software, and launched when the browser detects an incoming encoded audio file of the appropriate format. Players must know the codec used to compress the stream before they can decompress and play it. However, most players will download new codecs as required. Players can also be divided into embedded, when they look like part of the web page, or non-embedded, when the player is displayed in a separate window from the web page [143]. The elements constituting a streaming media player are a streaming protocol decoder that communicates with the streaming server, a bitstream decoder/controller, an audio/video compression decoder, an audio/video post-processor (that corrects the audio data if data has been lost during transport), a user interface, a play-out buffer and an overall system controller [111]. When listening to audio on demand, the user requires control of the playout process using features such as pause and rewind.
Hence it is necessary for the player to have another element that displays and monitors the buttons on the screen. When a button is selected, it first adapts the playout process (e.g. stops output if the pause button is activated) and then passes the appropriate command to the server using RTSP [7]. Many players can be downloaded for free [118], e.g. Winamp and some versions of Real Player and Windows Media Player.

• Streaming server: As was explained in a previous chapter [see 5.5.2], it is a program that allows streaming files, and is made to serve a large number of simultaneous users (depending on the bandwidth) [118].

• Encoder, converter or producer: It is a program that converts files from downloadable formats (e.g. MPEG) into streamable formats. Many encoder programs come bundled with the whole client/server system [118], but usually other encoders can be added as hardware or software.

6.2 Steps

Practically all audio streaming systems work the same way. The implementation process follows a series of steps [118]:

• Install the streaming server, encoder and client programs. The server part can be ignored when using HTTP streaming (server-less system). The maximum bandwidth used for streaming will be equal to the bitrate served multiplied by the maximum number of users allowed to connect to the station + 1 [138]; for example, a 128 Kbps stream with at most 100 simultaneous listeners requires about 128 × 101 ≈ 12.9 Mbps. If the radio service is provided from only one server, it will become a bottleneck as the number of users increases. It is then recommended to use distributed streaming, by setting up relay servers. Users connect to the nearest relay server, which receives a copy of the audio data from the radio station and sends it to the users [130].

• Capture, create and edit the audio. Recording the audio can be accomplished with normal audio recording equipment [143]. For capturing live audio, the audio source must be attached to the computer's audio card.

• Convert the files into streaming formats using the encoder.
If it is live content, a real time encoder must be used. The files can be multi-speed or single-speed. In either case, the broadcaster must decide the speed(s) to target, depending on the Internet connections of the audience [143]. When streaming from a web server, one link for each speed must be included, but if a streaming server is used, just one link to the multi-speed file is needed. Several tools can be used to create the streamable files. Windows Media Producer (free) has a wizard that guides users through the creation of single-speed and multi-speed .asf files. Real Producer (free) also uses a wizard that helps users create single-speed .rm files, while Real Producer Plus G2 (not free) can be used to create multi-speed .rm files [143].

• Create a text pointer file with the URL of the audio file. For example, if the audio file is called concert and the on-demand audio is stored on a web server, the URLs for Real Networks and Windows Media respectively are:

http://www.company.com/media/concert.rm
http://www.company.com/media/concert.asf

while for live audio coming from a streaming server they are:

rtsp://realserver.company.com/ramgen/media/concert.rm
mms://windowsserver.company.com/media/concert.asf

The first part of the URL specifies the protocol used for streaming, which is RTSP for Real Networks and MMS for Windows Media; then come the server name, the folder on the server storing the file, and the audio file name [150]. Save this metafile with extension .ram for Real Networks and .asx for Windows Media. RealSystem G2 has a program called Ramgen that avoids the creation of .ram files. It uses a specially configured URL that causes the browser to launch the player and stream using RTSP [151].
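Creating these metafiles can easily be scripted. A minimal sketch (the host and file names are the hypothetical ones from the example above; note that newer .asx metafiles are small XML documents rather than bare URLs):

```python
def write_ram(path, media_url):
    """A .ram metafile is simply a text file holding the clip's URL."""
    with open(path, "w") as f:
        f.write(media_url + "\n")

def write_asx(path, media_url):
    """A simple one-entry .asx metafile in its XML form."""
    with open(path, "w") as f:
        f.write('<ASX version="3.0">\n'
                '  <Entry><Ref href="%s"/></Entry>\n'
                '</ASX>\n' % media_url)

# Live audio served from the (hypothetical) streaming servers:
write_ram("concert.ram", "rtsp://realserver.company.com/ramgen/media/concert.rm")
write_asx("concert.asx", "mms://windowsserver.company.com/media/concert.asf")
```

The web page then links to these small files, not to the media itself, so the browser can hand the URL over to the player.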
The only difference if Windows Media audio content with a .wma extension (instead of .asf) is streamed is that it is accessed by metafiles with a .wax file name extension [47]; the rest of the process described here is the same.

• Create the web page with a link to the metafile (not to the streaming file, since browsers cannot make RTSP requests, only HTTP requests):

<a href="http://www.company.com/media/concert.ram"> Click here</a> to listen.
<a href="http://www.company.com/media/concert.asx"> Click here</a> to listen.

• Send the web page, the metafile (.ram/.asx) and the streaming (.rm/.asf) files to the web server, or to the streaming server if working with live audio.

• Add or change MIME types on the server, so that browsers can recognize the new streaming type. Finally, test by clicking on the link to verify that it works. The player will launch and, after a few seconds of buffering, will play the audio [149].

Table 6.1 summarizes the formats and techniques used.

                          Windows Media Server            Real Server
Streaming File Format     .asf                            .rm
Linking File (metafile)   .asx                            .ram
Media Player              Windows Media Player            RealOne Player
Type of Streaming         Single Speed or                 Single Speed or
                          Intelligent Streaming           SureStream
Location of Streaming     Single Speed - Web server;      Single Speed - Web server;
File                      Multi-Speed - Streaming         Multi-Speed - Streaming
                          Media Server                    Media Server
Streaming Protocol        MMS                             RTSP
Technique                 Create hyperlink to .asx file   Create hyperlink to .ram file
                          in HTML page; in .asx file      in HTML page; in .ram file
                          create hyperlink to .asf file   create hyperlink to .rm file

Table 6.1: Streaming media with Windows Media and Real Networks [143].

A web radio station can broadcast its content using either unicast or multicast (over the Mbone). Once the station is set up, it has to give information describing its attributes (metadata): content, content-type, media-type etc.
The player will use this information to select the appropriate audio decoder and provide the listener with information about the content (station name, current playlist, ...). There are several mechanisms for a station to advertise its metadata [119]:

• Listeners select well-known IP addresses or web sites to access the radio station. Typically, this information is obtained through advertisement, word of mouth, or from portal sites and content provider sites (e.g. broadcast.com).

• The radio station can register itself with a well-known directory server. Examples of this model include Nullsoft's SHOUTcast and Icecast. These systems provide a directory server that maintains a database keeping track of radio stations and their attributes, as well as a mechanism for a radio station to register itself.

• The radio station is hosted at a well-known address. Companies like live365.com host radio stations on their site. They offer features like reliability, quality etc.

• The station can announce its attributes using the SAP protocol on a well-known multicast address.

6.3 Commercial Tools

There are three main platforms that offer complete streaming systems (server, client and encoder): Real Networks [137], Windows Media [136] and SHOUTcast/Icecast [138, 140]. The Real Networks and Windows Media systems offer servers for their specific client systems, the Real Player and Windows Media Player. Icecast and SHOUTcast provide MP3 streaming servers [130]. A short discussion of each is given now, followed by a comparison between them. There are more encoders, players and servers available on the market that may provide even better quality for specific file formats or applications, but these are the most commonly used nowadays.

6.3.1 Windows Media

To build a radio-type solution using Windows Media, the components can be put together as in Figure 6.1.

Figure 6.1: A basic solution using Windows Media components.
Windows Media Encoder

Windows Media Encoder is a free application [143] that compresses live or stored audio and video content into Windows Media format files or streams. After the digital media has been compressed and encoded, it can be saved as a Windows Media file or broadcast to the Internet. If the digital media is a live broadcast, the media is delivered in real time to a Windows Media Server that streams it to the players requesting the audio [136]. The following file formats can be used with Windows Media Encoder: .wma, .wmv, .asf, .avi, .mpg, .mp3 and .bmp [148]. Windows Media Encoder uses two advanced features: intelligent streaming and multiple bitrates. In intelligent streaming the server and the client communicate with each other to establish the actual network throughput and automatically adjust the properties of the audio/video stream to maximize quality. To take full advantage of intelligent streaming, the content must be encoded at multiple bitrates. A single Windows Media file is created, containing multiple streams encoded at different bitrates. When the player receives the multi-rate Windows Media file or live stream, it plays only the stream encoded at the bitrate that best matches the user's connection [148]. The Windows Media encoding software also has the option to transmit the stream uncompressed, in which case users need 1.4 Mbps of available bandwidth to listen to it. KEXP was the first radio station in the world to offer uncompressed audio on the Internet.

Windows Media Player

The Windows Media Player not only plays Windows Media format files (.wmv, .wma), but also other multimedia formats including AVI, MOV, ASF, WAV, MPEG-1, MPEG-2, MIDI, MP3 and QuickTime. Additional hardware is recommended to decode MPEG-2 files [136].
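The stream-selection step of intelligent streaming, described above, can be sketched as follows. This is only an illustration of the idea; the actual negotiation between server and player is proprietary, and the bitrates below are hypothetical encodings inside one multi-rate file:

```python
def pick_stream(encoded_bitrates, throughput_kbps):
    """Pick the highest encoded bitrate that fits the measured throughput.

    encoded_bitrates: bitrates (Kbps) of the streams inside one
    multi-rate file; throughput_kbps: network throughput established
    by the server/client handshake. Falls back to the lowest stream
    when even that one does not fit.
    """
    fitting = [b for b in encoded_bitrates if b <= throughput_kbps]
    return max(fitting) if fitting else min(encoded_bitrates)

rates = [20, 32, 64, 128]        # hypothetical multi-rate encodings (Kbps)
chosen = pick_stream(rates, 56)  # a 56 Kbps modem user gets the 32 Kbps stream
```

The same selection can be repeated whenever the measured throughput changes, which is what lets the stream degrade gracefully instead of stalling.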
Windows Media Server

The Windows Media Encoder works in conjunction with the Windows Media Server: it first compresses audio data into the Windows Media format and then passes it to the Windows Media streaming server, which converts it into the .asf format, suitable for streaming [118]. The Microsoft Windows Media streaming server uses some proprietary streaming protocols:

• Microsoft MMS (Microsoft Media Server protocol): MMS has both a data delivery mechanism to ensure that packets from the Windows Media Server reach the client, and a control mechanism to handle client commands such as stop or play. MMS works both over UDP (MMSU) and TCP (MMST) [163]. With the "protocol rollover" functionality, the server switches from one protocol to another when it fails to make a connection using a particular protocol.

• Microsoft MSBD (Microsoft Media Stream Broadcast Distribution protocol): The MSBD protocol was used to transfer streams from the Windows Media Encoder to the Windows Media Server and between servers. However, Windows Media Encoder 7 no longer supports MSBD and uses HTTP instead [163].

Detailed information about broadcasting using the Windows Media Service is given in the documentation and tutorials on the Windows Media site [136].

6.3.2 Real Networks

Real Networks pioneered streaming audio with RealAudio, the first streaming media product for the Internet. The Real Broadcast Network (RBN) offers users the ability to broadcast on-demand and live audio/video content. The Real Networks system consists of software only [131], and its main tools are: RealSystem Producer, RealOne Player and RealSystem Server.

RealSystem Producer

Some sound-editing programs can create Real clips (file extension .rm, although older clips may use .ra instead), but RealSystem Producer Basic (free) and RealSystem Producer Plus are the most widely used tools [151]. Both producers support real time encoding for live content.
The resulting audio/video stream is transmitted via a network interface to the streaming server (RealSystem Server). RealSystem Producer accepts many common audio/video formats apart from its own. These may vary by operating system: e.g. RealSystem Producer on Macintosh accepts the formats widely used on the Macintosh, such as QuickTime, whereas RealSystem Producer on Windows or Unix supports the formats most used on those operating systems. Some of the accepted formats are: Audio Interchange Format (.aiff), Audio (.au), MPEG-1 (.mpg, .mp3), QuickTime (.mov), Sound (.snd) and WAV (.wav) [151]. When encoding audio clips with RealSystem Producer, the target audiences to reach can be selected [see Table 6.2]. RealSystem Producer then determines which RealAudio codecs (also called encoders) are best to use depending on the compression and quality needed [151]. On the receiving end, RealOne Player uses the same codec to decode the audio. Table 6.3 lists the music codecs used in RealAudio 8.

Target audiences: 28.8 Kbps modem; 56 Kbps modem; 64 Kbps single ISDN; 112 Kbps dual ISDN; Corporate LAN; 256 Kbps DSL/cable modem; 384 Kbps DSL/cable modem; 512 Kbps DSL/cable modem.
Bitrates per content type (increasing with target audience): Voice Only / Voice & Music: 16, 20, 32, 32, 44, 64, 64 and 96 Kbps; Mono Music: 20, 32, 44, 64 and 96 Kbps; Stereo Music: 20, 32, 44, 64, 132, 176, 264 and 352 Kbps.

Table 6.2: RealAudio standard bitrates [151].

RealSystem Producer has some advanced features to improve audio quality, such as SureStream, a technology that works in the same way as intelligent streaming in Windows Media. It also allows multi-rate encoding, which is needed for SureStream.
RealAudio 8 codec                       Sampling frequency   Frequency response
16 Kbps Stereo Music                    22.05 kHz            4.3 kHz
20 Kbps Stereo Music                    22.05 kHz            8.6 kHz
20 Kbps Stereo Music - High Response    22.05 kHz            9.9 kHz
32 Kbps Stereo Music                    22.05 kHz            10.3 kHz
32 Kbps Stereo Music - High Response    44.1 kHz             13.8 kHz
44 Kbps Stereo Music                    44.1 kHz             13.8 kHz
44 Kbps Stereo Music - High Response    44.1 kHz             16.0 kHz
64 Kbps Stereo Music                    44.1 kHz             16.0 kHz
96 Kbps Stereo Music                    44.1 kHz             16.0 kHz
105 Kbps Stereo Music                   44.1 kHz             13.7 kHz
132 Kbps Stereo Music                   44.1 kHz             16.5 kHz
146 Kbps Stereo Music                   44.1 kHz             16.5 kHz
176 Kbps Stereo Music                   44.1 kHz             19.2 kHz
264 Kbps Stereo Music                   44.1 kHz             22.0 kHz
352 Kbps Stereo Music                   44.1 kHz             22.0 kHz

Table 6.3: RealAudio 8 Stereo Music Codecs [151].

RealOne Player

Real Networks claims that their RealOne Player "provides the most advanced media playback possibilities available, combining streaming media, digital downloads, and Web browsing" [151]. RealOne Player also provides several means for delivering information about a presentation, such as its title, author, and copyright. On their web page Real Networks claims to have 285,000,000 players installed. RealOne Player can play many types of audio, video and streaming media formats, as well as compression formats for audio files. Table 6.4 lists all these formats.
File type                                 Extension(s)
A2B                                       .mes
Active Stream Format                      .asf
Audio File                                .au
Blue Matter Files                         .bmo, .bmr, .bmt
GIF File Format                           .gif
IBM EMMS Files                            .emm
Liquid Audio                              .lqt
MJuice Files                              .mjf
MP3 Playlist Files                        .m3u, .pls, .xpl
MPEG Files                                .mp3, .mpeg, .mpa, .mp2, .mpv, .mx3
MPEG Playlist File                        .pls
MPEG URL (MIME Audio File)                .m3u
Macromedia Flash                          .swf
Portable Network Graphics image files     .png
QuickTime Files                           .avi, .aiff
RAM Metafile                              .ram, .rmm
RealAudio, Real Media                     .ra, .rm, .rmx, .rmj, .rms
RealOne Music                             .mnd
RealPix                                   .rp
RealText                                  .rt
WAVE                                      .wav
Windows Media Audio                       .wma
Other compatible formats:
RealJukebox Skins                         .rjs
RealJukebox Track Info Packages           .rmp

Table 6.4: Media formats RealOne Player supports [137].

For presentations in which RealOne Player pops up as a separate application, .ram is used as the metafile extension. When the clip or presentation is embedded in a web page, however, the metafile uses the file extension .rpm. RealOne Player still plays the presentation, but it does not launch as a separate application; instead, the browser appears to play the clips [151]. To link to an embedded clip (e.g. audio.rm) from a web page, the <EMBED> tag is used:

<EMBED SRC="play_audio.rpm" WIDTH=300 HEIGHT=134>

where play_audio.rpm is the metafile. This metafile then gives RealOne Player either the RTSP URL to the clip if it is stored on a RealSystem Server:

rtsp://realserver.example.com/audio.rm

or the HTTP URL to the clip if it is stored on a web server:

http://www.example.com/audio.rm

One drawback of the players is that they are CPU intensive, so they cannot be used on many systems [48].

RealSystem Server

RealSystem Server streams the clips created by RealSystem Producer. It runs on Windows NT/2000 and many Unix platforms, including Linux [151]. The protocol used by the RealSystem Server is RTSP. It can stream several audio formats in addition to RealAudio: .aif, .au, and .wav.
Plug-ins may exist for additional audio formats [149]. Figure 6.2 shows an example of the implementation of a web radio station using the RealSystem.

Figure 6.2: Set-up of a web radio station using the RealSystem [131].

6.3.3 SHOUTcast and Icecast

SHOUTcast is Nullsoft's free Winamp-based distributed streaming audio system. It allows anyone with an Internet connection to broadcast audio from their PC to listeners across the Internet or any other IP-based network (office LANs, college campuses, etc.). SHOUTcast's underlying technology for audio delivery is MP3. It can deliver both live audio and on-demand audio for archived broadcasts [138]. The SHOUTcast system has three elements: Nullsoft's Winamp, the SHOUTcast Source plug-in for Winamp, and an MP3 codec [138]. Listeners tune in to SHOUTcast broadcasts by using a player compatible with streaming MP3 audio. Users can visit the SHOUTcast directory to locate a stream they would like to listen to. The recommended player is Winamp (for Windows users) [138]. Users wanting to broadcast will need to run their own server. Once a server is located, broadcasters use Winamp to stream music. A plug-in called the SHOUTcast Source for Winamp converts the stream and sends it from Winamp to the SHOUTcast server. The SHOUTcast server can stream any format supported by Winamp: MPEG audio layers 1, 2, and 3, MOD/S3M/XM/IT (digital synthesized music formats), MIDI/MID (musical instrument digital interface), WAV/VOC (digital audio files), CDA (compact disc audio), WMA (Windows Media Audio), AS/ASFS (Audiosoft secure MP3 files) and so on. Through the use of specialized SHOUTcast broadcasting plug-ins, audio from a microphone, as well as from any device attached to the broadcaster's sound card, can also be streamed [138]. The last element of the SHOUTcast system is the SHOUTcast Distributed Network Audio Server (DNAS).
This software runs on a server attached to an IP network with plenty of bandwidth, and is responsible for receiving audio from a broadcaster, updating the SHOUTcast directory with information about what the broadcaster is sending, and sending the broadcast out to listeners [138]. With this system, SHOUTcast server responsibilities are easily distributed, and users can stream content to additional servers to allow for more listeners, making it feasible for someone with only a modem connection to broadcast to a large number of listeners. The SHOUTcast server is not a streaming server; instead it uses the HTTP protocol to stream, which is why "HTTP streaming" is also known as the "SHOUTcast streaming protocol" [111]. The server software runs on Macintosh, Windows and Unix, and Winamp is available for the Macintosh as well as for Windows. Both the server and player software are free, a substantial advantage over Real and Windows (which charge money for high-performance server software).

Icecast is an open source Internet audio streaming server based on MPEG audio technology that is completely compatible with SHOUTcast. The difference is that SHOUTcast is more Windows and Solaris oriented while Icecast is Unix oriented. Icecast has some limitations due to the lack of a good free encoder for Linux/Unix, e.g. the streams cannot be re-encoded to the bitrate you specify, as they can with SHOUTcast, because no encoder exists that can properly re-encode them at a lower bitrate. On Windows, Winamp uses Microsoft's licensed MP3 codec, which cannot be used on Linux/Unix [140]. Icecast's advantages over SHOUTcast include: less CPU and memory usage, several streams per server, the possibility to use multiple directory servers and simple administration through telnet, the console, or the web interface [157]. More information about broadcasting with either SHOUTcast or Icecast can be found in the documentation [138, 140].
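Because the SHOUTcast server streams over plain HTTP, its reply to a listener is just an HTTP-like status line plus a few headers, followed by raw MP3 data. A sketch of building such a response header (the header names follow the de-facto ICY convention; the station metadata here is made up):

```python
def icy_response_header(station, genre, bitrate_kbps):
    """Build a SHOUTcast-style response header (the de-facto ICY format).

    After these headers and a blank line, the server simply writes the
    raw MP3 data to the socket; the player reads it as an endless stream.
    """
    lines = [
        "ICY 200 OK",
        "icy-name:" + station,
        "icy-genre:" + genre,
        "icy-br:" + str(bitrate_kbps),
        "content-type:audio/mpeg",
        "",  # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines)

header = icy_response_header("My Web Radio", "Pop", 128)
```

This is why any player that understands streaming MP3 over HTTP can tune in, without a dedicated streaming protocol stack.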
There are also good tutorials on the Internet [144].

6.4 Comparison

The best systems are those delivering the highest quality audio for a given bandwidth, offering low delay, no jitter, low frame loss, low computational complexity [162], good audio synchronization, etc. In addition, the ability to provide the best possible audio quality over a range of networks/bandwidths (scalability) without content duplication is also desirable. The popularity of the system chosen is another important factor, since it helps to interconnect and transmit to the highest number of users. It is also important to keep in mind that some systems are free and others are not. The Real Networks and Windows Media streaming technologies each have their own proprietary server designed to stream files in their proprietary format. Therefore, the media files created should match a specific server's file format requirements. Their players are also proprietary: e.g. a Windows Media Player will not normally play RealAudio and vice versa [147]. This is their main disadvantage; since their codecs are not open source, their use requires licensing, and that typically means that broadcasters have to pay for the software to run them. The MP3 codec and Winamp player used in SHOUTcast are practically free in comparison [144]. Besides, to run a Real Server, for example, the broadcaster must pay a license fee per stream (per concurrent listener) [144].

For broadcasters wanting to stream other people's music without purchasing their own licenses, there are sites like Live365 [145] or Wired Planet. They handle all the paperwork concerning copyrights, and also provide the bandwidth required to stream to thousands of listeners. This is a good choice for broadcasters requiring a lot of bandwidth, and they are not very expensive (from $10 to $80 per month) [144].
The main advantage of both the Real Networks and Windows Media systems is their popularity; they are very well established and are the main systems used for streaming on Internet sites and by users nowadays. Using their formats avoids users having to download a new plug-in every time they want to listen to audio from the Internet, which is really annoying. The Windows system is a very good system, especially for organizations that are already using Microsoft Windows [118]. Real Player is cross-platform and is installed by default with many web browsers, so many listeners will be able to access it without installing additional software. My guess is that in the future all the platforms will tend to be open source, since this is the market trend right now. Real Networks has new software that can distribute streamed audio and video in a range of formats, including the Windows Media format, and has announced a shared source code initiative. The media delivery platform, called the Helix Platform, is based on their current client and server software, and is the first to support commonly used technologies and applications, such as MPEG-4 and Windows Media [158]. "This is an attempt by Real Networks to prevent Windows Media from achieving market domination. A perception of openness will make more people consider Real Networks products as standards rather than just products. But Real Networks may not be able to afford to be open enough; their revenue today depends on licensing fees for the use of their software, and unless they can change their business model somewhat, it will be difficult for them to achieve a real partnership with the Open Source community. That community has little to gain by replacing Microsoft's proprietary audio format with Real Networks' still-proprietary audio format" [159].

7 Performance

Before the audio gets to the listeners, it suffers many transformations, delays etc. that may degrade its quality.
The main performance parameters and factors affecting the quality of real time audio are reviewed here. At the end of the chapter a practical experiment measuring these parameters is analyzed. The quality of the audio that we listen to when we connect to a web radio station depends on many parameters. Some audio quality is lost before transmission in the process of capturing, digitizing and compressing. These effects can be minimized, but not suppressed, depending on the equipment used, the strength of the compression etc. When the audio packets are transmitted over the Internet, they are subject to the vagaries of network performance, which depends on many factors, many of them unpredictable (such as congestion). One of these factors is the delay. Whereas traditional network applications (email, file transfer) can tolerate considerable delays, streamed audio is much less tolerant [65], especially in the case of two-way communication (such as conferencing), where the maximum acceptable delay is 250 ms [25]. In radio broadcasting applications the delay increases the time users must wait before starting to listen to the audio, but it does not affect the quality of the audio itself. The total delay for real time audio broadcasting can be calculated as [122]:

Tdelay = Tsample (ADC) + Tencode + Tpacketize + Ttransmission + Tpropagation + Tnetwork (routers, hops, ...) + Tbuffer + Tdepacketize + Tdecode + Tpresent (DAC)

More than the delay itself, variations in this delay cause problems for audio traffic. The variation in delay is called jitter, and it means that the time between the arrivals of successive packets will vary, which may cause samples to be played back at the wrong time. One way to deal with this at the receiver end is to buffer [see 5.2.2] [65]. For high quality stereo music this delay variance should be less than 1 ms [25].
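Jitter is commonly quantified at the receiver with the smoothed interarrival-jitter estimator defined for RTP in RFC 3550, which compares the spacing of packet arrivals with the spacing of their RTP timestamps. A sketch (the transit times below are made-up values in milliseconds):

```python
def update_jitter(jitter, transit_prev, transit_now):
    """One step of the RFC 3550 interarrival jitter estimator.

    transit = arrival time - RTP timestamp (in the same time units);
    the estimate moves 1/16 of the way toward each new sample, which
    smooths out isolated spikes.
    """
    d = abs(transit_now - transit_prev)
    return jitter + (d - jitter) / 16.0

# Feed it a sequence of per-packet transit times (ms, hypothetical):
transits = [100, 103, 99, 110, 101]
j = 0.0
for prev, now in zip(transits, transits[1:]):
    j = update_jitter(j, prev, now)
```

A player (or a server sending RTCP reports) can watch this running estimate to decide how deep its playout buffer needs to be.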
In addition to delay and jitter, there may be packet loss or errors, since the audio is not transported using a reliable protocol such as TCP. However, audio has a high tolerance to this, compared to data packets [65]. On the receiving end, the user needs enough connection bandwidth to get the audio at its full quality. If the audio is encoded at a higher bitrate than the user's connection, the user will experience frequent pauses in playback, or may be unable to play the content at all [149]. Some experiments can be made to measure the QoS of Internet audio [see 164]. One of the most important metrics influencing user perception of audio is the average packet audio playout delay vs. the packet loss rate. Other relevant metrics are the receiving buffer capacity vs. the packet loss rate, the average packet audio playout delay vs. the lateness (the difference between the value of the receiver's clock upon the arrival of a discarded packet and its timestamp) of discarded packets, and the waiting time in the receiver buffer for the played packets vs. the average packet audio playout delay. In Italy, researchers compared the performance of three different playout delay control mechanisms [165, 166, 167], designed to dynamically adjust the receiver's buffer depth to compensate for the highly variable Internet packet delays. Because of the generally distributed delays typically experienced by audio samples over the Internet, the analysis was conducted via simulation. The three mechanisms were evaluated using both experimentally obtained delay measurements (trace-driven simulation) and delays randomly generated according to Gaussian and exponential distributions. The study was oriented to voice traffic, but the results presented here can be extrapolated to real time Internet audio transmission. In Figures 7.1 - 7.9, mechanism #1 [165] is represented in yellow, #2 [167] in green, and #3 [166] in black.
In Figure 7.10, Gaussian traffic is represented in yellow, exponential traffic in green and trace-driven traffic in black.

Figure 7.1: Average playout delay (ms) vs. loss rate (%): Gaussian traffic [164].
Figure 7.2: Average playout delay (ms) vs. loss rate (%): Exponential traffic [164].
Figure 7.3: Average playout delay (ms) vs. loss rate (%): Trace-driven simulation [164].
Figure 7.4: Lateness of discarded packets (ms) vs. average playout delay (ms): Gaussian traffic [164].
Figure 7.5: Lateness of discarded packets (ms) vs. average playout delay (ms): Exponential traffic [164].
Figure 7.6: Lateness of discarded packets (ms) vs. average playout delay (ms): Trace-driven simulation [164].

Figures 7.1 and 7.2 show the average playout delay vs. the loss rate for Gaussian and exponential traffic, respectively, where the Gaussian delays have an expected value of 100 ms and a standard deviation of 7 ms, and the exponential delays have an expected value of 100 ms and a standard deviation of 10 ms. Figure 7.3 shows the same metrics averaged over a set of traffic traces gathered experimentally from a quite lossy IP-based interconnection with 16 hops. To provide an understanding of the effects that various playout delays and loss rates (as well as buffer dimensions) have on audio quality, Figures 7.1 to 7.3 also present an approximate, intuitive representation of three different audio quality ranges. These ranges are for voice communications, such as conferencing, where delays larger than 350 ms impede a conversation. It can be concluded that the longer the playout delay, the smaller the loss rate. Figures 7.4 to 7.6 show, for each mechanism, the average increment in playout delay that must be paid to reduce lateness and, consequently, avoid discarded audio packets.
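Playout-delay control mechanisms of this kind commonly track the mean and variation of the observed packet delays with exponentially weighted averages and schedule playout a few deviations above the mean. A sketch of that classic approach (the weighting factor 0.998002 and the margin of four variations follow the well-known adaptive playout algorithm from the literature; the delay samples are made up, and the mechanisms compared in [165, 166, 167] differ in their details):

```python
ALPHA = 0.998002   # classic weighting factor from the literature

def update_estimates(d, v, delay_sample):
    """Exponentially weighted estimates of delay mean (d) and variation (v)."""
    d = ALPHA * d + (1 - ALPHA) * delay_sample
    v = ALPHA * v + (1 - ALPHA) * abs(delay_sample - d)
    return d, v

def playout_delay(d, v, k=4.0):
    """Playout delay: mean plus a safety margin of k delay variations.

    A larger k buys a lower loss rate at the price of a longer delay,
    which is exactly the tradeoff shown in the figures.
    """
    return d + k * v

d, v = 100.0, 0.0                     # initial estimates (ms, hypothetical)
for sample in [100, 104, 98, 120, 101]:
    d, v = update_estimates(d, v, sample)
p = playout_delay(d, v)
```

Packets arriving after their scheduled playout time are the "late" packets that get discarded; raising k trades average delay for fewer of them.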
Under Gaussian traffic (Figure 7.4), all the plotted curves show that a small increase in playout delay drastically reduces the percentage of packet loss. Under the other traffic scenarios (exponential and trace-driven), however, a large playout delay must be introduced in order to obtain an appreciable reduction in the packet loss rate. One of the primary performance metrics is the percentage of packets lost at the destination. Such loss may be due either to packets that arrive too late for presentation or to packets that arrive too far in advance of their playout time. In the latter case, packet loss results from the finite size of the buffer (that is, buffer overflow due to the premature arrival of packets). Figures 7.7 through 7.9 quantify, for each mechanism, the trade-off between the loss rate and the maximum buffer capacity. Figure 7.9 shows that the dimensioning of the receiving buffer must be adjusted dynamically, according to the playout delay value computed by the mechanism. With Gaussian and exponential traffic, however, a threshold value seems to exist beyond which an increase in buffer capacity provides no benefit. Figure 7.10 shows the total amount of time that non-discarded packets wait in the receiver buffer before their playout time. It can be concluded that the larger the average playout delay, the larger the waiting time in the buffer.

Figure 7.7: Loss rate (%) vs. buffer capacity (number of packets): Gaussian traffic [164].
Figure 7.8: Loss rate (%) vs. buffer capacity (number of packets): Exponential traffic [164].
Figure 7.9: Loss rate (%) vs. buffer capacity (number of packets): Trace-driven simulation [164].
Figure 7.10: Wait time in buffer (ms) vs. average playout delay (ms) for mechanism #3 [164].

8 Conclusions

We have seen that radio broadcasting on the Internet is an expanding technology, and is attracting more and more people, both listeners and broadcasters.
A key characteristic of commercial streaming audio products is the diversity of their technological infrastructure, e.g. networks, protocols and compression standards [162]. Compatibility between products has been limited by the use of proprietary standards. However, recent products have been designed so that new and varied codecs can easily be incorporated into their frameworks [162]. Streaming products now tend to be open, to provide compatibility and improve performance. Future efforts will focus on improving the existing networking technologies to provide better services at lower cost. The development of new communication protocols, together with better compression techniques, will also be important. This will hopefully lead to radio without dropouts, servers that do not suffer overload, etc., and will make listening to Internet radio a much more pleasant experience. Not much seems to have been written about the performance of Internet radio stations. Network simulations to measure QoS, like the one analyzed above, could be carried out for web radio stations using network simulators (such as COMNET or NS). The previous simulation model could be extended to test different streaming protocols (MMS, RTSP, etc.). Moreover, a web audio broadcast should exploit the features of the Internet: information about the content, the station, etc. could be used to match listeners' preferences by providing them with private channels [130]. However, giving listeners total control of what they listen to may raise a copyright problem, and could even lead to a situation where nobody buys a CD again. The increasing number of web radio stations and listeners is causing a lot of controversy. "During the last year, radio broadcasters and the recording industry have been locked in a battle over the fees [160].
In July, the Librarian of Congress decided what royalty rates webcasters will have to pay to record companies and performers to stream music out onto the Internet. The fee level will be hard to justify for even the larger commercial stations with strong advertising revenue. Even the lower fees applied to some noncommercial webcasters are impossible for small experimental stations" [161]. Will this slow down the growth of web radio technology?

Appendix: Audio File Formats

This appendix lists the main audio file formats and their basic characteristics: file extension, origin, recommended bitrate, compression, whether they stream, and whether they are proprietary. A format is a known, defined method of setting data out. Files on a computer are arranged in formats that any piece of software capable of reading those formats understands [69]. A media file format might be something like a BMP picture file, an AVI video file, or an AIFF audio file. It holds the information used to describe a sound or a picture in a specific, known format that everyone must know before they can decode it [69]. A compression format holds the same information describing a piece of audio or a picture, except that it has been compressed, which changes the arrangement of the data bits and therefore the format. A JPEG picture file, an MPEG video file, or an RA encoded audio file are examples of this [69]. The data in the compressed media file needs to be decompressed before it becomes a media format once again. A streaming file format is one that has been specially encoded so that it can be played while it downloads, instead of having to wait for the whole file to download. Such formats usually also include some compression. It is possible to stream some standard media file formats, but it is usually more efficient to encode them into streaming file formats [69]. Some examples of streaming file formats are ASF, RA, etc.
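The practical difference between downloading and streaming can be quantified with a back-of-the-envelope model (an idealisation of my own, assuming constant encode and link rates): a progressively downloaded clip encoded at a higher bitrate than the link can carry must be pre-buffered before playback starts, or it will stall.

```python
def startup_delay_s(duration_s, encode_kbps, link_kbps):
    """Seconds of pre-buffering needed so that a progressive download,
    played at encode_kbps over a link_kbps connection, never stalls.
    If the link is at least as fast as the encode rate, playback can
    begin immediately."""
    if link_kbps >= encode_kbps:
        return 0.0
    # The download must stay ahead of playback right up to the last
    # byte, so the worst case is the end of the clip.
    return duration_s * (encode_kbps / link_kbps - 1)
```

For example, a 180-second clip encoded at 128 Kbps and fetched over a 56 Kbps link needs roughly 231 seconds of pre-buffering, longer than the clip itself, which is why streamed audio must be encoded below the listener's connection bandwidth [149].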
A media delivery format is a particular way that audio data is arranged. It contains information about timing, synchronization, copyright, etc. The actual audio data may be located in the same file or in a separate file [69]. This is the hope for streaming multimedia: that an open media delivery format will be adopted by all commercial streaming products, providing a de facto method for delivering media types that use different compression standards (such as MPEG) and different media file formats (QuickTime, AVI, AIFF, etc.) [69]. Such a format also has the benefit of being able to synchronize many streams of different types within the same format. Some examples of media delivery formats are ASF, SMIL, etc. Historically, almost every type of machine used its own file format for audio data, but some file formats are more generally applicable, and in general it is possible to define conversions between almost any pair of file formats [59]. There are two types of file formats [59]:
• Self-describing file formats generally define a family of data encodings, where a header field indicates the particular encoding variant used. Some self-describing formats are those with the extensions .au or .snd, .aif(f), .aifc, .mp2, .mp3, .ra, .wav (WAVE), etc.
• Headerless formats (sometimes called "raw") define a single encoding. Some headerless formats are: .snd, .fssd (Mac, PC), .snd (Amiga).
The table below reviews the main characteristics of some audio file formats. There are many others, but those mentioned here are the most used on the Internet today.
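In a self-describing format, the header field can be recognised directly from the file's leading "magic" bytes. The sketch below is my own and covers only a few well-known signatures (for MP3 it checks only an ID3v2 tag or two common frame-sync byte pairs; a robust detector would parse the whole frame header):

```python
def sniff_audio_format(header: bytes) -> str:
    """Identify a few self-describing audio formats from their
    leading bytes (the 'header field' mentioned above)."""
    if header[:4] == b'RIFF' and header[8:12] == b'WAVE':
        return 'wav'                      # RIFF container, WAVE form
    if header[:4] == b'FORM' and header[8:12] in (b'AIFF', b'AIFC'):
        return 'aiff'                     # IFF container, AIFF/AIFC form
    if header[:4] == b'.snd':
        return 'au'                       # Sun/NeXT audio
    if header[:3] == b'ID3' or header[:2] in (b'\xff\xfb', b'\xff\xfa'):
        return 'mp3'                      # ID3v2 tag or MPEG-1 Layer III sync
    return 'unknown'
```

A headerless format such as .fssd cannot be identified this way; the receiver simply has to know the encoding in advance.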
| File extension | Origin | Recommended bitrate | Compression | Streaming | Price |
| .au, .snd (audio) | UNIX, Sun | 720 Kbps | Yes (2:1) | No | O.F. |
| .aif(f), AIFF (Audio Interchange File Format) | Apple | 1.4 Mbps | No | No | O.F. |
| .aifc, AIFC | Apple | 236 Kbps | Yes (1:6) | No | O.F. |
| .ra | RealAudio | 96 Kbps | Yes (1:14) | Yes | Proprietary |
| .wav, WAVE, RIFF | Microsoft | 128 Kbps | Yes (1:10) | No | O.F. |
| .cda (CD Audio Track) | - | 1.4 Mbps | No | No | - |
| .asf (Audio Stream Format), .wma (Windows Media Audio v.8) | Microsoft | 96 Kbps | Yes (1:14) | Yes | Proprietary |
| .mp1, .mpa | MP1 | 384 Kbps | Yes (1:4) | Yes | O.F. |
| .mpeg2, .mp2, .mpa | MP2 | 192 Kbps | Yes (1:8) | Yes | O.F. |
| .mp3 (.m3u for playlists) | MP3 | 128 Kbps | Yes (1:10) | Yes | O.F. |
| .aac stereo | MPEG-2 | 128 Kbps | Yes | Yes | O.F. |
| .aac mono | MPEG-2 | 64 Kbps, 96 Kbps [23] | Yes (1:20) | Yes | O.F. |

Table A.1: Audio File Formats [2, 59, 60, 61, 62, 69]. (O.F. = Open Format)

References

[1] Multicast Networking and Applications. C. Kenneth Miller. Addison Wesley Longman, Inc., 1999
[2] Audio on the Internet. http://www.noisebetweenstations.com/personal/essays/audio_on_the_internet/
[3] Building a Community Information Network: A Guidebook. http://www.mel.org/toolkit/content/book/
[4] Principles of digital audio and video. Arch C. Luther. Artech House, Inc., 1997
[5] Fundamentals of digital audio. http://www.cs.tut.fi/~ypsilon/80545/FundamentalsOfDA.html#HDR0
[6] Dithering and Noise Shaping: The Basics. http://www.glowingcoast.co.uk/audio/theory/dither/index.htm
[7] Multimedia Communications: Applications, Networks, Protocols and Standards. Fred Halsall. Addison-Wesley, 2001
[8] Digital video and audio compression. Stephen J. Solari.
McGraw-Hill, 1997
[9] Basics about MPEG Perceptual Audio Coding: The Purpose of Audio Compression. Fraunhofer Institute. http://www.iis.fhg.de/amm/techinf/basics.html
[10] Principles in Low Bitrate Coding. http://www.cs.tut.fi/~ypsilon/80545/WBACoding.html#HDR1
[11] Introduction to Multimedia. http://www.cs.cf.ac.uk/Dave/Multimedia/
[12] Coding of Audio Signals. http://www.cs.tut.fi/~ypsilon/80545/CodingOfAS.html#HDR3
[13] Bitrate scalability. http://www.scalatech.co.uk/technology.htm
[14] Perception Based Bit Allocation Algorithms for Audio Coding. Stephen Voran. Institute for Telecommunication Sciences, U.S. http://www.its.bldrdoc.gov/home/programs/audio/pdf/waspaa97.pdf
[15] Wideband Speech and Audio Coding. Esin Darici Haritaoglu. http://www.umiacs.umd.edu/~desin/Speech1/node13.html
[16] Multilevel data compression techniques for transmission of audio over networks. S. V. Wunnava, Craig Chin. Proceedings, IEEE, 2001, pp. 234-238
[17] Constant and Variable Bitrate Encoding. http://service.real.com/help/faq/rjbvbrfaq.html
[18] The MPEG Home Page. http://mpeg.telecomitalialab.com/
[19] An Overview of the MPEG/audio Compression Algorithm. D. Pan. Applications of Signal Processing to Audio and Acoustics, 1991 IEEE ASSP Workshop on, pp. 0_79-0_80
[20] A tutorial on MPEG/audio compression. D. Pan. IEEE Multimedia, Volume 2, Issue 2, Summer 1995, pp. 60-74
[21] Subband Coding Tutorial. http://www.otolith.com/pub/u/howitt/sbc.tutorial.html
[22] MPEG Audio Layer-3. Fraunhofer Institute. http://www.iis.fhg.de/amm/techinf/layer3/
[23] An Introduction to MPEG Layer-3. K. Brandenburg, H. Popp. Fraunhofer Institut für Integrierte Schaltungen (IIS)
[24] Short MPEG-2 description. Leonardo Chiariglione. ISO/IEC TC1/SC29/WG11 N, MPEG 00/, October 2000. http://mpeg.telecomitalialab.com/standards/mpeg-2/mpeg-2.htm
[25] Multimedia networks: fundamentals and future directions. Nalin Sharda, Victoria Univ. of Technology.
1999
[26] MPEG Audio Frame Header. http://www.dv.co.yu/mpgscript/mpeghdr.htm
[27] The private life of MP3 frames. http://www.id3.org/mp3frame.html
[28] Tagging introduction. http://www.id3.org/intro.html
[29] MPEG-2 AAC. Fraunhofer Institute. http://www.iis.fhg.de/amm/techinf/aac/index.html
[30] Advanced Audio Coding. http://www.aac-audio.com/technology/
[31] Source Coding of Audio. Technologies and Services on Digital Broadcasting. http://www.nhk.or.jp/strl/publica/bt/en/le0010-1.html
[32] RTP Payload Format for MPEG-2 AAC Streams. IETF, 1999. http://www.cs.columbia.edu/~hgs/rtp/drafts/draft-ietf-avt-rtp-mpeg2aac-00.txt
[33] MP3 and AAC explained. Karlheinz Brandenburg, Fraunhofer Institute for Integrated Circuits. http://www.ece.cmu.edu/~ee545/MP3/docs/mp3_explained.pdf
[34] High quality audio for multimedia: key technologies and MPEG standards. P. Noll. Global Telecommunications Conference, 1999. GLOBECOM '99, Volume 4, 1999, pp. 2045-2050
[35] MPEG-4 Overview. ISO/IEC JTC1/SC29/WG11 N4668, March 2002. http://mpeg.telecomitalialab.com/standards/mpeg-4/mpeg-4.htm
[36] MPEG-4 Audio. Fraunhofer Institute. http://www.iis.fhg.de/amm/techinf/mpeg4/index.html
[37] MPEG-4 AAC. Steve Church, Telos Systems. http://www.broadcastpapers.com/audio/TelosAAC04.htm
[38] MPEG-4 Natural Audio Coding. Karlheinz Brandenburg, Oliver Kunz, Akihiko Sugiyama. http://leonardo.telecomitalialab.com/icjfiles/mpeg-4_si/9-natural_audio_paper/index.html
[39] Applications of MPEG-4: digital multimedia broadcasting. M. Grube, P. Siepen, C. Mittendorf, M. Boltz, M. Srinivasan. Consumer Electronics, IEEE Transactions on, Volume 47, Issue 3, Aug.
2001, pp. 474-484
[40] MPEG Audio FAQ. MPEG-4 Audio: coding of natural and synthetic sound. http://www.tnt.uni-hannover.de/project/mpeg/audio/faq/mpeg4.html
[41] AAC Low Delay. Fraunhofer Institute. http://www.iis.fhg.de/amm/techinf/mpeg4/aac_ld.html
[42] MPEG-4 General Audio Coding. Jürgen Herre, Fraunhofer Institute for Integrated Circuits (IIS). http://www.tnt.uni-hannover.de/project/mpeg/audio/general/aes106_1-GeneralAudio.pdf
[43] Overview of the MPEG-7 Standard (version 6.0). ISO/IEC JTC1/SC29/WG11 N4509. Pattaya, December 2001. http://mpeg.telecomitalialab.com/standards/mpeg-7/mpeg-7.htm#_Toc533998965
[44] MPEG Audio FAQ. MPEG-7: description of meta-information on sound. http://www.tnt.uni-hannover.de/project/mpeg/audio/faq/mpeg7.html#7A
[45] MPEG-7 Applications Document v.10. ISO/IEC JTC1/SC29/WG11 N3934. January 2001, Pisa. http://ipsi.fhg.de/delite/Projects/MPEG7/Documents/W3934.htm
[46] MPEG-21 Overview v.4. ISO/IEC JTC1/SC29/WG11 N4801. Fairfax, May 2002. http://mpeg.telecomitalialab.com/standards/mpeg-21/mpeg-21.htm
[47] About Windows Media Audio Codec. Starr Andersen, Microsoft Corporation. Updated October 2000. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnwmt/html/msaudio.asp
[48] Real Audio. www.realaudio.com
[49] Internet radio and excellent audio quality: dreamboat or reality? G. Stoll, U. Felderhoff, G. Spikofski. International Broadcasting Convention, 1997, pp. 192-201
[50] Compressed Audio vs. CDs: Can You Tell the Difference? Ramon Mcleod and Richard Baguley. http://webcenter.pcworld.aol.com/computing/aolcom/article/0,aid,64123,pg,3,00.asp
[51] WMA vs. MP3. Peter Bridger. http://www.hardwarecentral.com/hardwarecentral/reviews/2606/7/
[52] Which is the best low-bitrate audio compression algorithm? OGG vs. MP3 vs. WMA vs.
RA. Panos Stokas. http://ekei.com/audio/
[53] Comparison Testing of Codecs For Microsoft Windows Media Technologies 4.0, RealNetworks RealSystem G2, and MP3. April 1999. http://www.nstl.com/downloads/Final_MSAudio_Report.pdf
[54] Music, Internet Audio, Web Design. http://www.midisound.com/index.html
[55] WMA, MP3, OGG, VQF, WAV. http://www.litexmedia.com/article/
[56] MPEG-4 Audio verification test results: Audio on Internet. Eric Scheirer, Sang-Wook Kim, Martin Dietz. Source: Audio and Test subgroup. ISO/IEC JTC1/SC29/WG11, MPEG98/N2425. Atlantic City, October 1998
[57] MP3 encoders comparison. http://members.tripod.com/Milaa/mp3Comparison/encoders.html
[58] Report on the MPEG-2 AAC Stereo Verification Tests. David Meares, BBC R&D, Kingswood Warren, UK; Kaoru Watanabe, NHK, Tokyo, Japan; Eric Scheirer, MIT Media Labs, USA. ISO/IEC JTC1/SC29/WG11 N2006. February 1998
[59] Audio File Formats FAQ. Chris Bagwell. http://home.attbi.com/~chris.bagwell/AudioFormats.html#
[60] Introduction to Multimedia Systems. Gaurav Bhatnagar, Shikha Mehta and Sugata Mitra. Academic Press, 2001
[61] A comparison of Internet audio compression formats. Andrew Pam. http://www.sericyb.com.au/audio.html
[62] Music formats. http://www.wotsit.org/search.asp?s=music
[63] Networking fundamentals. http://www.streamdemon.co.uk/structure.html
[64] How Internet Infrastructure Works. Jeff Tyson. http://www.howstuffworks.com/internet-infrastructure.htm
[65] Digital multimedia. Nigel Chapman and Jenny Chapman. John Wiley & Sons, 2000
[66] White paper: Streaming media technology. Steve Cresswell, June 2000. http://whitepapers.smart421.com/smart421-streamingmedia.pdf
[67] Streaming media. AT&T.
July 2000. http://www.ipservices.att.com/techviews/whitepapers/StreamingMedia.pdf
[68] Streaming media frequently asked questions. http://www.kexp.org/listen/faq.htm
[69] Streaming file formats. http://www.streamdemon.co.uk/avdata.html
[70] Internet terms: Streaming. http://www.clienthelpdesk.com/dictionary/streaming.html
[71] Streaming optimization. http://howto.lycos.com/lycos/step/1,,1+11+26063+24067+9945,00.html
[72] Multicast Streaming: An Introduction. http://www.microsoft.com/windows/windowsmedia/serve/multiwp.asp
[73] Multicast communication: Protocols and applications. Ralph Wittman and Martina Ziterbart. Morgan Kaufmann Publishers, 2001
[74] Multicast streaming. QuickTime API Documentation. http://developer.apple.com/techpubs/quicktime/qtdevdocs/REF/Streaming.4.htm
[75] IP Multicast. http://www.mbone.ru/tech/intro.shtml
[76] MBONE: Multicasting Tomorrow's Internet. Kevin Savetz, Neil Randall, and Yves Lepage. http://www.savetz.com/mbone/toc.html
[77] Developing IP Multicast Networks, Volume I. Beau Williamson. Cisco Press, 2000
[78] A Mechanism for Multicast Multimedia Data with adaptive QoS Characteristics. Christos Bouras and A. Gkamas, 2001
[79] Multichannel Splitting algorithm for AAC and AAC-LD encoded audio. Anton Thimet and Joseph Zolyak. http://www.telos-systems.com/?/techtalk/split/default.htm
[80] Streaming Media Optimization with CacheFlow Internet Caching Appliances. http://www.cacheflow.com/technology/whitepapers/streaming.cfm
[81] Error Spreading: A Perception-Driven Approach to Handling Error in Continuous Media Streaming. Srivatsan Varadarajan, Hung Q. Ngo, and Jaideep Srivastava
[82] Packet loss resilient, scalable audio compression and streaming for IP networks. B. Leslie, M. Sandler. 3G Mobile Communication Technologies, 2001 Second International Conference on (Conf. Publ. No. 477), 2001, pp. 119-123
[83] Computer networks: a systems approach. Larry L. Peterson & Bruce S.
Davie. The Morgan Kaufmann Series in Networking, Second Edition, 2000
[84] Multimedia Servers: Applications, environments and design. Dinkar Sitaram and Asit Dan. The Morgan Kaufmann Series in Multimedia Information and Systems, 2000
[85] Internet Protocols. http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito_doc/ip.htm
[86] RFC 791: Internet Protocol. September 1981. http://www.ietf.org/rfc/rfc791.txt
[87] RFC 792: Internet Control Message Protocol. September 1981. http://www.ietf.org/rfc/rfc792.txt
[88] IP Audio. INET'99, from the Internet Society. http://www.isoc.org/isoc/conferences/inet/99/proceedings/4p/index.htm
[89] Protocol directory. http://www.protocols.com/pbook/index.htm
[90] RFC 1883: Internet Protocol, Version 6 (IPv6). December 1995. http://www.ietf.org/rfc/rfc1883
[91] RFC 2463: Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6). December 1998. http://www.ietf.org/rfc/rfc2463
[92] RFC 2365: Administratively Scoped IP Multicast. July 1998. http://www.ietf.org/rfc/rfc2365.txt
[93] RFC 793: Transmission Control Protocol. September 1981. http://www.ietf.org/rfc/rfc793.txt
[94] A TCP/IP Tutorial. T. Socolofsky, C. Kale, Spider Systems. January 1991. http://www.faqs.org/rfcs/rfc1180.html
[95] TCP-friendly Congestion Control for Real-time Streaming Applications. Deepak Bansal and Hari Balakrishnan. M.I.T. Laboratory for Computer Science, Cambridge, MA 02139
[96] Advanced internet technologies. Uyless Black. Prentice Hall series in advanced communications technologies, 1999
[97] RFC 768: User Datagram Protocol. J. Postel, ISI. August 1980. http://www.ietf.org/rfc/rfc768
[98] Time-lined TCP for the TCP-friendly Delivery of Streaming Media. Biswaroop Mukherjee and Tim Brecht. Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1. Appears in the Proceedings of the International Conference on Network Protocols (ICNP), Osaka, Japan, pp.
165-176, November 2000. http://www.nmsl.cs.ucsb.edu/~ksarac/icnp/2000/papers/2000-15.pdf
[99] RFC 1889: RTP: A Transport Protocol for Real-Time Applications. H. Schulzrinne (GMD Fokus), S. Casner (Precept Software, Inc.), R. Frederick (Xerox Palo Alto Research Center), V. Jacobson (Lawrence Berkeley National Laboratory). IETF, January 1996. http://www.ietf.org/rfc/rfc1889.txt
[100] Protocol ensures safer multimedia delivery. John Walker and Jeff Hicks. Network World, 1999. http://www.nwfusion.com/news/tech/1101tech.html
[101] Streaming media protocols. Neil Ridgway. http://www.mmrg.ecs.soton.ac.uk/publications/archive/ridgway1998/html/node26.html
[102] Some frequently asked questions about RTP. http://www.cs.columbia.edu/~hgs/rtp/faq.html
[103] RTP Payload Format for MPEG-2 AAC Streams. Kretschmer-AT&T/Basso-AT&T, Civanlar-AT&T/Quackenbush-AT&T, Snyder-AT&T. IETF, June 25, 1999. http://www.cs.columbia.edu/~hgs/rtp/drafts/draft-ietf-avt-rtp-mpeg2aac-00.txt
[104] RTP Payload Format for MPEG-4 Streams. Civanlar-AT&T/Basso-AT&T, Casner-Packet Design, Herpel-Thomson/Perkins-ISI. IETF, July 13, 2000. http://www.cs.columbia.edu/~hgs/rtp/drafts/draft-ietf-avt-rtp-mpeg4-03.txt
[105] RFC 3016: RTP Payload Format for MPEG-4 Audio/Visual Streams. Y. Kikuchi (Toshiba), T. Nomura (NEC), Fukunaga (Oki), Y. Matsui (Matsushita), H. Kimata (NTT). IETF, November 2000. http://www.faqs.org/rfcs/rfc3016.html
[106] A More Loss-Tolerant RTP Payload Format for MP3 Audio. R. Finlayson. IETF, June 2001. http://www.live.com/rtp-mp3/rtp-mp3.txt
[107] RFC 2250: RTP Payload Format for MPEG1/MPEG2 Video. D. Hoffman, G. Fernando, V. Goyal, M. Civanlar. IETF, January 1998. http://www.networksorcery.com/enp/rfc/rfc2250.txt
[108] RSVP Protocol Overview. http://www.isi.edu/div7/rsvp/overview.html
[109] RFC 2205: Resource ReSerVation Protocol (RSVP). R. Braden, Ed., L. Zhang, S. Berson, S. Herzog, S. Jamin.
IETF, September 1997. http://www.ietf.org/rfc/rfc2205.txt
[110] RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee. IETF, June 1999. http://www.ietf.org/rfc/rfc2616.txt
[111] Streaming, Transmission and other Network Protocols. http://streamingmedialand.com/technology.html
[112] RFC 2326: Real Time Streaming Protocol (RTSP). H. Schulzrinne, A. Rao, R. Lanphier. IETF, April 1998. www.ietf.org/rfc/rfc2326.txt
[113] Multimedia Applications. Marcel Waldvogel. http://classes.cec.wustl.edu/~cs423/FL2000/Chapter6a/img0.html
[114] RFC 2327: SDP: Session Description Protocol. M. Handley, V. Jacobson. IETF, April 1998. http://www.ietf.org/rfc/rfc2327.txt
[115] RFC 2974: Session Announcement Protocol. M. Handley, C. Perkins, E. Whelan. IETF, October 2000. http://www.ietf.org/rfc/rfc2974.txt
[116] RFC 2543: SIP: Session Initiation Protocol. M. Handley, H. Schulzrinne, E. Schooler, J. Rosenberg. IETF, March 1999. http://www.ietf.org/rfc/rfc2543.txt
[117] Streaming Methods: Web Server vs. Streaming Media Server. http://www.microsoft.com/windows/windowsmedia/compare/webservvstreamserv.asp
[118] Web Developer.com Guide to Streaming Multimedia. José Alvear. Wiley Computer Publishing, 1998
[119] Customized Internet radio. Venky Krishnan and S. Grace Chang. Hewlett-Packard Laboratories
[120] Video and image processing in multimedia systems. Borko Furth, Stephen W. Smoliar and Hongjiang Zhang. Kluwer Academic Publishers, 1995
[121] Management of multimedia on the internet. 4th IFIP/IEEE International Conference on Management of Multimedia Networks and Services, MMNS 2001, Chicago, IL, USA, October/November 2001
[122] Multimedia Information Networking. Nalin K. Sharda. Prentice Hall, 1999
[123] Internet radio. http://www.radioenthusiast.com/internet_radio.htm
[124] Streaming audio@Internet: Perspectives for the broadcasters. Gerhard Stoll.
Institut für Rundfunktechnik, IRT GmbH, München, Germany. http://radio.irt.de/aida/docs/AES17Sto.pdf
[125] From Broadcasting to Webcasting: The EBU experience on Audio@Internet. Gerhard Stoll. Institut für Rundfunktechnik, München, Germany. IBC'2000 Tutorial Day "The Internet Delivery", 7 Sept. 2000, Amsterdam. http://radio.irt.de/aida/docs/IBC2000.pdf
[126] ISMA: Internet Streaming Media Alliance. http://ism-alliance.tv/index.html
[127] Internet Society. http://www.isoc.org
[128] Music on the Internet and the intellectual property protection problem. J. Lacy, J. H. Snyder, D. P. Maher. Industrial Electronics, 1997. ISIE '97, Proceedings of the IEEE International Symposium on, Volume 1, 1997, pp. SS77-SS83
[129] The development of an interactive music station over Internet. Chung-Ming Huang, Pei-Chuan Liu, Pi-Fung Shih. Information Networking, 2001. Proceedings, 15th International Conference on, 2001, pp. 279-284
[130] Issues in Internet Radio. Yasushi Ichikawa, Kensuke Arakawa, Keisuke Wano, and Yuko Murayama, 2002
[131] Audio streaming on the Internet. Experiences with real-time streaming of audio streams. K. Jonas, P. Kanzow, M. Kretschmer. Industrial Electronics, 1997. ISIE '97, Proceedings of the IEEE International Symposium on, Volume 1, 1997.
pp. SS71-SS76, vol. 1
[132] Beethoven.com Web Site Statistics. http://www.beethoven.com/statistics.htm
[133] Clare FM Audience Statistics. http://www.clarefm.ie/about/perf.htm#jnlr
[134] The Lion 90.7fm Webcasting Statistics. http://www.lion-radio.com/webstats.php
[135] FAQs about Streaming Server. http://developer.apple.com/darwin/projects/streaming/faq.html
[136] Windows Media Technologies. http://www.microsoft.com/windows/windowsmedia/default.asp
[137] Real Networks. www.real.com, www.realnetworks.com, http://service.real.com/
[138] Shoutcast. http://www.shoutcast.com/, http://www.shoutcast.com/support/docs/
[139] Web Radio to Resume Tunes in July. Frank Thorsberg, PCWorld.com. http://webcenter.pcworld.icq.com/computing/icq/article/0,aid,54328,00.asp
[140] Icecast. www.icecast.org
[141] Networks and transfer protocols. Final Report of the EBU/SMPTE Task Force for Harmonized Standards for the Exchange of Television Programme Material as Bitstreams. http://www.ebu.ch/pmc_tfrep_part_5.pdf
[142] Webopedia. http://www.webopedia.com
[143] Creating Streaming Media at CSU. http://csu.colstate.edu/webdevelop/streamingmedia/#Recording%20the%20Media
[144] Streaming audio tutorial. http://hotwired.lycos.com/webmonkey/00/45/index3a_page5.html?tw=multimedia
[145] Live 365. http://www.live365.com/home/index.live
[146] Wired Planet. http://www.wiredplanet.com/
[147] Streaming Media tutorial. http://www.doit.wisc.edu/services/streaming/tutorial/tutorial3.htm
[148] Windows Media Encoder. http://www.visionlb.ca/ISE/Sarpal/WindowMediaEncoder7UserManual.doc
[149] Real System G2 production guide. http://csu.colstate.edu/webdevelop/streamingmedia/realsys/production/
[150] The WebDeveloper.com Secret Guide to RealAudio. David Fiedler. http://www.webdeveloper.com/multimedia/multimedia_guide_realaudio.html
[151] RealSystem Production Guide With RealOne Player. http://service.real.com/help/library/guides/realone/ProductionGuide/HTML/realpgd.htm
[152] Players comparison overview.
http://www.microsoft.com/windows/windowsmedia/press/compare/playcomp.asp
[153] Servers and delivery comparison overview. http://www.microsoft.com/windows/windowsmedia/press/compare/servercomp.asp
[154] Comparative Cost Analysis: Windows Media Technologies vs. Real System G2. http://www.microsoft.com/windows/windowsmedia/compare/wmtcostcompare.asp
[155] Microsoft Windows Media and RealNetworks RealSystem Feature Comparison. http://www.approach.com/content/expertise/digital.asp
[156] Microsoft: Windows Media Player for Windows XP versus Real Jukebox 2.0 Basic and RealONE performance comparison test. http://www.etestinglabs.com/main/reports/mswxpmp8.pdf
[157] Icecast - Streaming media server based on the MP3 audio code. http://www.gnu.org/directory/Audio/Mp3/icecast.html
[158] RealNetworks shares code, streams Windows format. Joris Evers. IDG News Service, 07/22/02. http://www.nwfusion.com/news/2002/0722realshare.html
[159] How open is RealNetworks' new "open" software? Adam Gaffin. Network World Fusion, 07/24/02. http://napps.nwfusion.com/compendium/archives/00000210.html
[160] Webcasting Royalty Rates Set--For Now. Scarlet Pruitt, IDG News Service. http://webcenter.pcworld.icq.com/computing/icq/article/0,aid,102146,00.asp
[161] The final (for some) report. Scott Bradner. Network World, 07/01/02. http://www.nwfusion.com/columnists/2002/0701bradner.html
[162] A Review of Video Streaming over the Internet. Jane Hunter, Varuni Witana, Mark Antoniades.
SuperNOVA Project, DSTC Technical Report TR97-10, August 1997. http://archive.dstc.edu.au/RDU/staff/jane-hunter/video-streaming.html
[163] Streaming multimedia data. http://www.teamsolutions.co.uk/streaming.html
[164] Comparing the QoS of Internet Audio Mechanisms via Formal Methods. Alessandro Aldini, Roberto Gorrieri, Marco Roccetti and Marco Bernardo
[165] Adaptive playout mechanisms for packetized audio applications in wide-area networks. R. Ramjee, J. Kurose, D. Towsley, and H. Schulzrinne, 1994. In Proceedings of the Conference on INFOCOM '94 (Montreal, Canada)
[166] Design and experimental evaluation of an adaptive playout delay control mechanism for packetized audio for use over the internet. M. Roccetti, V. Ghini, G. Pau, P. Salomoni, and M. Bonfigli, 1999
[167] Packet audio playout delay adjustment: Performance bounds and algorithms. S. B. Moon, J. Kurose, and D. Towsley, 1998

Acronyms

AAC: Advanced Audio Coding.
AAC LC: Advanced Audio Coding Low Complexity.
ABR: Available Bitrate.
ACK: Acknowledgement.
ADC: Analog-to-Digital Converter.
ADIF: Audio Data Interchange Format.
ADPCM: Adaptive Differential Pulse Code Modulation.
ADTS: Audio Data Transport Stream.
AIFF: Audio Interchange File Format.
ASF: Advanced Streaming Format.
ASPEC: Adaptive Spectral Perceptual Entropy Coding.
ASR: Automatic Speech Recognition.
ATM: Asynchronous Transfer Mode.
AVI: Audio Video Interleave.
BMP: Bit-mapped graphics format.
BW: Bandwidth.
CBR: Constant Bitrate.
CCITT: Comité Consultatif International Téléphonique et Télégraphique.
CD: Compact Disk.
CELP: Code Excited Linear Prediction.
Codec: Coder-Decoder.
CPU: Central Processing Unit.
CRC: Cyclic Redundancy Code.
DAB: Digital Audio Broadcasting.
DCT: Discrete Cosine Transform.
DFT: Discrete Fourier Transform.
DNAS: Distributed Network Audio Server.
DPCM: Differential Pulse Code Modulation.
DSM-CC: Digital Storage Media Command and Control.
DST: Discrete Sine Transform.
DVD: Digital Versatile Disk.
DWT: Discrete Wavelet Transform.
ETSI: European Telecommunications Standards Institute.
FTP: File Transfer Protocol.
FFT: Fast Fourier Transform.
HILN: Harmonic and Individual Lines plus Noise.
HTML: HyperText Markup Language.
HTTP: Hypertext Transfer Protocol.
IAB: Internet Architecture Board.
ICMP: Internet Control Message Protocol.
IEC: International Electrotechnical Commission.
IETF: Internet Engineering Task Force.
IGMP: Internet Group Management Protocol.
IP: Internet Protocol.
IPv4: Internet Protocol, Version 4.
IPv6: Internet Protocol, Version 6.
ISDN: Integrated Services Digital Network.
ISMA: Internet Streaming Media Alliance.
ISN: Initial Sequence Number.
ISO: International Organization for Standardization.
ISOC: Internet Society.
ISP: Internet Service Provider.
ITU: International Telecommunication Union.
JPEG: Joint Photographic Experts Group.
KBD: Kaiser-Bessel Derived.
Kbps: Kilobits per second.
kHz: Kilohertz.
LAN: Local Area Network.
LD: Low Delay.
LFE: Low Frequency Enhancement.
LPC: Linear Predictive Coding.
LTP: Long Term Prediction.
LZW: Lempel-Ziv-Welch.
MDCT: Modified Discrete Cosine Transform.
MMS: Microsoft Media Server.
MMSU: MMS over UDP.
MMST: MMS over TCP.
MPEG: Moving Picture Experts Group.
MS: Middle Side.
MSBD: Microsoft Media Stream Broadcast Distribution protocol.
MUSICAM: Masking pattern adapted Universal Subband Integrated Coding And Multiplexing.
NS: Network Simulator.
PASC: Precision Adaptive Sub-band Coding.
PCM: Pulse Code Modulation.
PEAQ: Perceptual Evaluation of Audio Quality.
PNS: Perceptual Noise Substitution.
PQMF: Polyphase Quadrature Mirror Filterbank.
PS: Program Stream.
QoS: Quality of Service.
RA: Real Audio.
RBN: Real Broadcast Network.
RFC: Request for Comments.
RSVP: Resource Reservation Protocol.
RTCP: RTP Control Protocol.
RTP: Real-time Transport Protocol.
RTSP: Real Time Streaming Protocol.
SAP: Session Announcement Protocol.
SBC: Subband Coding.
SCFSI: Scale Factor Selection Information.
SDP: Session Description Protocol.
SIP: Session Initiation Protocol.
SMIL: Synchronized Multimedia Integration Language.
SMR: Signal to Mask Ratio.
TCP: Transmission Control Protocol.
TNS: Temporal Noise Shaping.
TS: Transport Stream.
TwinVQ: Transform-domain Weighted interleaved Vector Quantization.
UBR: Unspecified Bitrate.
UDP: User Datagram Protocol.
URL: Uniform Resource Locator.
VBR: Variable Bitrate.
WMA: Windows Media Audio.