Presented By, Ashwini Urs Under guidance of Dr.K.R.Rao Motivation  Exponential rise in demand for video streaming over internet and mobile devices.  To counter the bandwidth limitations, an efficient video and audio compression scheme is essential.  Exploiting the open source benefits of Dirac video codec. Outline  Applications  Overview of Dirac video codec  Overview of AAC audio codec  Multiplexing process  De-multiplexing process  Results  Conclusions  Future work  References Areas of Application  Digital video broadcasting : DVB –T (Terrestrial), DVB – S (Satellite), DVB-C (cable). [10] [21]  IPTV.[24]  ATSC- M/H (Mobile/Handheld). [16][17] [18] Block diagram of Multiplexing /De-multiplexing Overview of Dirac video codec  Dirac [31] is a hybrid video codec developed by British Broadcasting Corporation (BBC). [1]  Dirac is an open technology, meaning that it can be used without payment of licensing fees.  Dirac uses modern techniques like, wavelets for transform and arithmetic coding for entropy coding. [2]  Dirac video codec has two software implementations, namely, Dirac research and Schrodinger. Dirac research is used in this thesis. [28] Features of Dirac  Dirac video codec achieves state of the art performance - good quality at low bitrates, leading to lower costs. [26]  The applications of Dirac range from high definition television (HDTV) to web streaming due to its flexibility. [1]  Performance of Dirac for HD media seems as apt as H.264, due to absence of large and intolerable variations between the two codecs [2].  With industry plans for “On demand” TV and enormous demand for streaming over internet, open platform technologies like Dirac have become highly significant.[26] Figure.1 Block diagram of Dirac encoder [1] Figure.2 Block diagram of Dirac decoder Dirac Bit- stream syntax  The Dirac bit stream begins with a sequence header followed by the byte stream data in the form of frames and ends with end of a sequence.  The start of the frame data is signaled by the parse info header. Parse info header shall contain 4 byte code i.e. 0x42, 0x42,0x43 and 0x44. [23]  This is followed by parse code which indicates the type of picture : intra (I), predictive (P) or bi-directional (B) frame.  Next parse offset consists of 4 bytes signaling the data from first byte of current parse info header to the first byte of the next parse info header, if present. It shall be zero, if there is no further parse info header. [23]  Previous parse offset consists of 4 bytes signaling the data from first byte of current parse info header to the first byte of the previous parse info header, if present. It shall be zero, if there is no further parse info header. [23] Table.1 Parse codes. [23] Overview of Advanced audio coder (AAC)  AAC [4][37] consists of three profiles, namely: main profile, low- complexity profile and scalable sampling rate (SSR) profile.  The key feature of low-complexity profile is, it eliminates the use of prediction tool and reduces the complexity for the temporal noise shaping tool  Supports Sample frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz). [4][37]  Superior performance at bit rates > 64 kbps and at bit rates reaching as low as 16 kbps. Features of AAC  AAC provides improved compression over earlier audio codecs and provides higher-quality audio at low bit rates. [4]  AAC can accommodate two basic bit stream formats: Audio data interchange format (ADIF) and Audio data transport stream (ADTS) . [4]  The feature of ADTS is that it has one header for each frame followed by raw block of data. Figure.3 Block Diagram AAC encoder [4] Figure.4 Block Diagram AAC decoder [4] Field Number of Description bits syncword 12 Always ‘111111111111’ ID 1 0:MPEG-4 1: MPEG-2 ADTS fixed layer 2 Always”00” header protection_absent 1 profile 2 sampling_frequency_index 4 private_bit 1 channel_configuration 3 original/copy 1 home 1 copyright_identification_bit 1 copyright_identification_start 1 aac_frame_length 13 Explained below ADTS variable Length of frame including header header (in bytes) adts_buffer_fullness 11 no_raw_data_blocks_in_frame 2 crc_check 16 raw_data_blocks 0x7FF indicates VBR Only if protection_absent ==0 Variable size Table.2 ADTS bit stream header format [36] [37] Profile bits Bits ID == 1 (MPEG -2 profile) ID ==0 (MPEG-4 object type) 00(0) Main profile 01(1) Low complexity profile 10(2) Scalable sampling rate 11(3) reserved Table.3 AAC profile bits expansion. [36] [37] Figure.5 Block diagram of ATSC transreception showing MPEG-2 Transport stream(TS) multiplexer [20] Factors to be considered during multiplexing data to a single stream  Each one of video/audio elementary stream data needs to be prioritized equally. This is essential to maintain buffer fullness at the receiver’s end.  Information in the form of time stamps is embedded to the PES header which will assist in maintaining lip sync.  Finally, since the multiplexed data will be transmitted in an error prone network, a provision for error detection and correction methods is to be made. Figure.6 Encapsulation of PES from elementary streams [9] Header fields Size Description (Bytes) PES packet start prefix 3 0x000001 Video/audio stream ID 1 Unique for Video/audio PES packet length 2 Unbounded when exceeds 65536 bytes and value of 0 is indicated under such circumstances. Time stamps 2 Frame numbers used a s time stamps Table.6 PES header description [19] Figure.7 Splitting of PES packet to several TS packets [9] Time stamps  Frame numbers are used as time stamps.  Time stamp information is embedded in the PES header of each frame.  Time stamp information helps synchronization during playback. to achieve Advantages of using frame numbers over program clock reference (PCR)  Less complex.  Can be easily incorporated in software and is well suited.  Savings in PES header bytes. (In case of PCR, time information needs to be sent periodically)  Synchronization can be effectively achieved due to absence of clock jitters and inaccurate clock samples. Figure.8 MPEG transport stream and standard structure of MPEG-TS packet [12] [24]. Figure.9 Structure of TS packet header used in this thesis. AFC : Adaptation field control Header fields Size Sync byte 8 bits PUSI 1bit AFC 1bit CC 4 bits PID (Unique for video/audio) 10 bits Byte offset (optional) 8 bits (If AFC ==1 && LPES < 185 bytes) Stuffing bytes Added in case of byte offset Table.7 TS packet header Example of multiplexing process Consider, A video PES packet of 500 bytes @ frame rate = 25 frames per second (fps). Number of TS packets for PES = 500/185 = 2.7 ~= 3 TS packets. Video PES duration = 1/25 = 0.04 = 40ms. Duration of each TS packet = 40/3 = 13.33ms. Consider , An audio PES packet of 600 bytes @ sampling frequency = 44100 Hz. Number of TS packets for PES = 600/185 = 3.2 ~ =4 TS packets. Audio PES duration = 1024/44100 = 0.04 = 23.2ms. Duration of each TS packet = 23.2/4 = 5.8ms. De-multiplexing  De-multiplexing is the process of retrieving video and audio elementary streams from the transmitted elementary stream.  Two buffers maintained separately to extract video and audio data respectively.  While extracting video/audio data, the video and audio buffers need to be constantly monitored to ensure no buffer overflow or underflow of data takes place.  In this thesis, the data can be de-multiplexed from start to end as well as from any TS packet. Figure.10 Flowchart of the de-multiplexer Buffer fullness check  When the buffer fullness of the data is monitored at the de- multiplexer, the video and audio content playback time is calculated as follows :  Video content playback time = Number of video frames × (1/fps)  Audio content playback time = Number of audio frames × (1024/sampling frequency) Calculation of presentation time The data is de-multiplexed from any TS packet, video and audio presentation time for the chosen frame number is calculated as follows: Lip synchronization  Calculating the delay between audio and video presentation time at the de-multiplexer, helps synchronization during playback. to determine audio/video  The maximum delay that is allowed by MPEG-2 standard is ± 40ms while synchronization between audio and video. [25] Results Clip 1 Clip 2 (Shrek) (Despicable me) Clip Duration (seconds) 50 36 Resolution 480 × 204 480 × 270 Chroma sub-sampling format YUV 420P YUV 420P No: of video frames 1206 885 No: of audio frames 2166 1589 Video frame rate (fps) 24 24000/1001 Audio sampling frequency(kHz) 44.1 44.1 YUV file size (kB) 172985.63 168011.72 WAV file size (kB) 8658.04 6349.54 .DRC file size(kB) 4041.28 2629.67 .AAC file size (kB) 676.32 563.95 Video compression ratio 43:1 64:1 Audio compression ratio 13:1 12:1 DRC video bitrate (kbps) 80.83 73.05 AAC audio bitrate (kbps) 13.53 15.67 Video TS packets 23042 15039 Audio TS packets 4772 3927 TS packets 27814 18966 TS file size(kB) 5106.48 3482.039 TS bitrate(kbps) 102.13 96.72 Original AVI file size (kB) 4227 3199 Details of test clips Table .8 Inter coding: test clips characteristics Clip 1 Clip 2 (Despicable me) (Despicable me) Clip Duration (seconds) 36 10 Resolution 480 × 270 480 × 360 Chroma sub-sampling format YUV 420P YUV 420P No: of video frames 885 224 No: of audio frames 1589 385 Video frame rate (fps) 24000/1001 25 Audio sampling frequency(kHz) 44.1 44.1 YUV file size (kB) 168011.72 56700 WAV file size (kB) 6349.54 1534.54 .DRC file size(kB) 16290.97 5496.87 .AAC file size (kB) 563.95 126.15 Video compression ratio 10:1 10:1 Audio compression ratio 12:1 12:1 DRC video bitrate (kbps) 452.53 549.69 AAC audio bitrate (kbps) 15.67 12.62 Video TS packets 90714 30563 Audio TS packets 3927 842 TS packets 94641 31405 TS file size (kB) 17375.5 5765.76 TS bitrate (kbps) 482.65 576.57 Original AVI file size (kB) 3199 1083 Details of test clips Table .9 Intra coding: test clips characteristics Figure.11 Frame 140 : luma component of Clip2 : 36 sec Figure. 12 Frame 80 : luma component of Clip2 :10 sec Check for buffer fullness # of Video Frames in buffer # of Audio frames in buffer Size of Video buffer (kB) Size of Audio buffer (kB) Video content playback time* (in sec) Audio content playback time (in sec) 100 185 277.9268 64.7295 4.2960 4.2957 200 367 631.0459 130.1924 8.5085 8.5217 400 732 1055.0078 257.5605 17.0170 16.9970 500 916 1425.7266 328.4414 21.2713 21.2695 600 1098 1670.6875 392.0518 25.4838 25.4955 700 1281 2068.3242 456.8135 29.7381 29.7448 800 1464 2382.5303 524.0430 33.9923 33.9940 Table. 10 Clip2: Check for buffer fullness and video/audio content playback time (Inter coding) *Video content playback time (sec) = (# of video frames + # of SPS/PPS) × (1/fps) No: of Video Frames in buffer No : of Audio frames in buffer Size of Video buffer (kB) Size of Audio buffer (kB) Video content playback time* (in sec) Audio content playback time (in sec) 30 56 711.6436 16.6016 1.2800 1.3003 60 112 1419.9668 35.0361 2.6000 2.6006 100 188 2369.0645 60.0449 4.3600 4.3654 150 283 3545.3672 92.5586 6.5600 6.5712 200 378 4873.7568 123.8281 8.7600 8.7771 Table .11 Clip2: Check for buffer fullness and video/audio content playback time (Intra coding) De- multiplexer output TS packet number Video IDR frame number chosen Synchronized audio frame number Chosen video frame presentation time (sec) Chosen audio frame presentation time (sec) Delay (ms) Visual delay Perceptible? 1 1 2 0.0417 0.0464 4.7 No 100 65 117 2.7110 2.7167 5.7 No 2000 146 262 6.0894 6.0836 5.8 No 10000 491 882 20.4788 20.4800 1.2 No 12000 611 1097 25.4838 25.4723 11.5 No Table.12 Clip 2: Output of de-multiplexer (Inter coding) IDR : Instantaneous decoder refresh TS packet number Video IDR frame number chosen Synchronized audio frame number Chosen video frame presentation time (sec) Chosen audio frame presentation time (sec) Delay (ms) Visual delay Perceptible? 1 1 2 0.0400 0.0464 6.4 No 5000 38 65 1.5200 1.5093 10.7 No 10000 75 129 3.0000 2.9954 4.6 No 20000 149 257 5.9600 5.9675 7.5 No 25000 182 314 7.2800 7.2911 11.1 No Table.13 Clip 2: Output of de-multiplexer (Intra coding) Conclusions  Non-NAL (network abstraction layer) syntax format of Dirac was used for multiplexing process and there is no need for conversion to NAL syntax, if not available.  Visually no delay is perceptible between video and audio.  Lip synchronization was effectively achieved using time stamps as frame numbers and from any TS packet. Future Work  Multiplexing algorithm can be modified to support for multiple program elementary streams.  Mkvmerge [32] supports Intra (I) and Predictive (P) pictures and is ignoring the fact that Dirac video has bidirectional predicted (B) frames. Hence, mkvmerge can be modified to support for B pictures of Dirac.  In practice, the data is broadcasted in an error prone network. Rugged error correction codes can be incorporated to this implementation. References [1] T. Borer and T. Davies, “Dirac video compression using open technology”, BBC EBU Technical Review, July 2005. [2] K.Oriantar, K.K. Loo and Z.Xue, “Performance comparison of emerging Dirac video codec with H.264/AVC”, School of engineering and design, Brunel university, UK. [3] A. Ravi, "Performance analysis and comparison of Dirac video codec with H.264/ MPEG-4 Part 10 AVC", M.S.E.E Thesis, University of Texas at Arlington, Arlington, TX, Aug. 2009. [4] MPEG. Information technology - Generic coding of moving pictures and associated audio information, part 4: Conformance testing. International Standard IS 13818–4, ISO/IEC JTC1/SC29 WG11, 1998. [5] M. Bosi and M. Goldberg, “Introduction to digital audio coding and standards”, Boston: Kluwer, 2003. [6] H. Eeckhaut et al, “Speeding up Dirac’s entropy coder”, proceedings of the 5th WSEAS conference on multimedia , internet and video technologies, Corfu, Greece, pp: 120 -125, August 2005. [7] “Encapsulation of Dirac in ISO/IEC 13818-1”, British Broadcasting Corporation. [8] “Dirac ISO/IEC 13818-1 Transport stream mapping”, encapsulation of Dirac video content and time code markers in ISO/IEC 13818-1 transport streams, draft, Feb 2007. [9] P.A. Sarginson, “MPEG-2: Overview of systems layer”, BBC RD 1996/2. [10] J.M. Boyce, “The United States television broadcasting transition”, IEEE signal processing magazine, vol.26, pp. 110-112, May 2009. [11] K. Brandenburg, “MP3 and AAC Explained”, AES 17th International conference, Florence, Italy, Sept. 1999. [12] “Introduction to MPEG”, excerpted from IPTV basics, Lawrence Harte. [13] Website: http://www.atsc.org/cms/ ATSC : Advanced television systems committee. [14] Dirac information : http://en.wikipedia.org/wiki/Dirac_%28video_compression_format%29 [15] Audio sampling frequency table link : http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio#Sampling_Frequencies [16] DVB-H link : http://dvb-h.org/ [17] ATSC-M/H link : http://atsc.org/cms/ [18] Mobile video DTV : http://www.openmobilevideo.com/about-mobile-dtv/standards/ [19] Information Technology – Generic coding of moving pictures and associated audio: Systems, International Standard 13818-1, ISO/IEC JTC1/SC29/WG11 N0801, 1994. [20] B. J. Lechner, et al, “The ATSC transport layer, including program and system information (PSIP),” Proceedings of the IEEE , vol. 94, no. 1, pp.77–101, Jan. 2006. [21] “Special issue on global digital television: technology and emerging services”, Proceedings of the IEEE, vol. 94, pp. 5-332, Jan. 2006. [22] H. Murugan, “Multiplexing H264 video bit-stream with AAC audio bit-stream, demultiplexing and achieving lip sync during playback”, M.S.E.E Thesis, University of Texas at Arlington, Arlington, TX, May 2007. [23] Dirac specification: Version 2.3, Issued: Sept.2008. [24] MPEG transport stream: http://www.iptvdictionary.com/iptv_dictionary_MPEG_Transport_Stream_TS_definition.html [25] MPEG. Information technology — generic coding of moving pictures and associated audio information, part 4: Conformance testing .International Standard IS 13818–4, ISO/IEC JTC1/SC29 WG11, 1998. [26] Dirac document: “Dirac compression family using open technology”. http://www.bbc.co.uk/rd/projects/dirac/index.shtml [27] Digital Video Systems Characteristics Standard for Cable Television, ANSI/SCTE 432004. [28] Dirac website: www.diracvideo.org [29] Digital audio compression standard (AC-3, E-AC-3), revision B, ATSC Document A/52B, Advanced Television Systems Committee, Washington, D.C., Jun. 14, 2005 : www.atsc.org [30] /CMPT 365 Course Slides/, School of Computing Science, Simon Fraser ... fig3: http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap4/Chap4.3/chap 4.3.html [31] Dirac software: http://sourceforge.net/projects/dirac/ [32] Mkvmerge software: http://www.bunkus.org/videotools/mkvtoolnix/downloads.html [33] I.H.Witten, R.M. Neal and J.G.Cleary: “Arithmetic coding for data compression”, Communications of the ACM, Vol.30, No.6, pp.520-540, June 1987. [34] A. Puri, X. Chen and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression standard”, Signal processing: image communication, vol. 19, issue 9, pp. 793-849, Oct. 2004. [35] G. A. Davidson, “Digital audio coding: Dolby AC-3,” in the Digital Signal Processing Handbook, V. K. Madisetti and D. B. Williams, Eds. Boca Raton, FL: CRC, pp. 41-1–41-21, 1998. [36] MPEG-4:ISO/IEC JTC1/SC29 14496-3: Information technology – coding of audio-visual objectspart 3 : Audio, amendment 4:audio lossless coding(ALS), new audio profiles and ASAC extensions. [37] MPEG-2: ISO/IEC JTC1/SC29 13818-7: Advanced audio coding, AAC. IS WG11, 1997. [38] ffmpeg software: http://www.ffmpeg.org/ [39] FAAC software : www.audiocoding.com [40] VLC mediaplayer : www.videolan.org [41] A.Ravi and K.R.Rao, “Performance analysis and comparison of the Dirac video codec with H.264/ MPEG-4, part 10”, for the book “Advances in reasoning-based image processing, analysis and intelligent systems: conventional and intelligent paradigms”, 2011. [42] H.Murugan and K.R.Rao, “ Multiplexing H.264 video with AAC audio bit streams, de-multiplexing and achieving lip sync”, ICEAST 2007, Bangkok, Thailand, 21 – 23 Nov.2007. [43] H. Kalva, et al, “Implementing multiplexing, streaming and server interaction for MPEG-4”, IEEE Trans. on circuits and systems for video technology, vol. 9, No. 8, pp. 1299-1311, Dec. 1999. [44] C. C. Todd, et al, “AC-3: perceptual coding for audio transmission and storage,” presented at the 96th Conv. Audio Engineering Society., 1994, preprint 3796. [45] M. Uehara, “Application of MPEG-2 systems to terrestrial ISDB (ISDB-T)”, Proceedings of the IEEE, vol.94, pp. 261-268, Jan. 2006. [46] G.A. Davidson, et al, “ATSC video and audio coding”, Proceedings of the IEEE, vol. 94, pp. 60 76, Jan. 2006. [47] J.Jain and P. Desale, “Low power transport de-multiplexer for ATSC and DVB broadcast format”, LSI research and development Pune pvt. Ltd. [48] A.Ravi and K.R.Rao, “Performance analysis and comparison of the Dirac video codec with H.264/ MPEG-4, part 10”, IJWMIP accepted : July 2011. [49] AVI sequences: www.youtube.com [50] AVAREX YUV PLAYER: http://www.avarexgroup.com/en/solutions.php?sub=2&product=rawplayer [51] Information technology- advanced coding of audio and video part -7: mobile video [52] Geraold Blakowski et.al “A media synchronization survey: Reference model, specification and case studies”, IEEE journal on selected areas in communications, vol.14, no.1, January 1996. Questions ? Thank You!
© Copyright 2025 Paperzz