PPT - UTA.edu

Presented By,
Ashwini Urs
Under guidance of
Dr.K.R.Rao
Motivation
 Exponential rise in demand for video streaming over internet
and mobile devices.
 To counter the bandwidth limitations, an efficient video and
audio compression scheme is essential.
 Exploiting the open source benefits of Dirac video codec.
Outline
 Applications
 Overview of Dirac video codec
 Overview of AAC audio codec
 Multiplexing process
 De-multiplexing process
 Results
 Conclusions
 Future work
 References
Areas of Application
 Digital video broadcasting :
DVB –T (Terrestrial), DVB – S (Satellite),
DVB-C (cable). [10] [21]
 IPTV.[24]
 ATSC- M/H (Mobile/Handheld). [16][17] [18]
Block diagram of Multiplexing /De-multiplexing
Overview of Dirac video codec
 Dirac [31] is a hybrid video codec developed by British
Broadcasting Corporation (BBC). [1]
 Dirac is an open technology, meaning that it can be used
without payment of licensing fees.
 Dirac uses modern techniques like, wavelets for transform and
arithmetic coding for entropy coding. [2]
 Dirac video codec has two software implementations, namely,
Dirac research and Schrodinger. Dirac research is used in this
thesis. [28]
Features of Dirac
 Dirac video codec achieves state of the art performance - good
quality at low bitrates, leading to lower costs. [26]
 The applications of Dirac range from high definition television
(HDTV) to web streaming due to its flexibility. [1]
 Performance of Dirac for HD media seems as apt as H.264, due
to absence of large and intolerable variations between the two
codecs [2].
 With industry plans for “On demand” TV and enormous
demand for streaming over internet, open platform
technologies like Dirac have become highly significant.[26]
Figure.1 Block diagram of Dirac encoder [1]
Figure.2 Block diagram of Dirac decoder
Dirac Bit- stream syntax
 The Dirac bit stream begins with a sequence header followed by the byte
stream data in the form of frames and ends with end of a sequence.
 The start of the frame data is signaled by the parse info header. Parse info
header shall contain 4 byte code i.e. 0x42, 0x42,0x43 and 0x44. [23]
 This is followed by parse code which indicates the type of picture : intra (I),
predictive (P) or bi-directional (B) frame.
 Next parse offset consists of 4 bytes signaling the data from first byte of
current parse info header to the first byte of the next parse info header, if
present. It shall be zero, if there is no further parse info header. [23]
 Previous parse offset consists of 4 bytes signaling the data from first byte of
current parse info header to the first byte of the previous parse info header,
if present. It shall be zero, if there is no further parse info header. [23]
Table.1 Parse codes. [23]
Overview of Advanced audio coder (AAC)
 AAC [4][37] consists of three profiles, namely: main profile, low-
complexity profile and scalable sampling rate (SSR) profile.
 The key feature of low-complexity profile is, it eliminates the use
of prediction tool and reduces the complexity for the temporal
noise shaping tool
 Supports Sample frequencies from 8 kHz to 96 kHz (official
MP3: 16 kHz to 48 kHz). [4][37]
 Superior performance at bit rates > 64 kbps and at bit rates
reaching as low as 16 kbps.
Features of AAC
 AAC provides improved compression over earlier
audio codecs and provides higher-quality audio at low
bit rates. [4]
 AAC can accommodate two basic bit stream formats:
Audio data interchange format (ADIF) and Audio data
transport stream (ADTS) . [4]
 The feature of ADTS is that it has one header for each
frame followed by raw block of data.
Figure.3 Block Diagram AAC encoder [4]
Figure.4 Block Diagram AAC decoder [4]
Field
Number of
Description
bits
syncword
12
Always ‘111111111111’
ID
1
0:MPEG-4 1: MPEG-2
ADTS fixed
layer
2
Always”00”
header
protection_absent
1
profile
2
sampling_frequency_index
4
private_bit
1
channel_configuration
3
original/copy
1
home
1
copyright_identification_bit
1
copyright_identification_start
1
aac_frame_length
13
Explained below
ADTS variable
Length of frame including
header
header (in bytes)
adts_buffer_fullness
11
no_raw_data_blocks_in_frame
2
crc_check
16
raw_data_blocks
0x7FF indicates VBR
Only if protection_absent ==0
Variable size
Table.2 ADTS bit stream header format [36] [37]
Profile bits
Bits ID == 1 (MPEG -2 profile)
ID ==0 (MPEG-4 object type)
00(0)
Main profile
01(1)
Low complexity profile
10(2)
Scalable sampling rate
11(3)
reserved
Table.3 AAC profile bits expansion. [36] [37]
Figure.5 Block diagram of ATSC transreception showing
MPEG-2 Transport stream(TS) multiplexer [20]
Factors to be considered during multiplexing data
to a single stream
 Each one of video/audio elementary stream data needs to
be prioritized equally. This is essential to maintain buffer
fullness at the receiver’s end.
 Information in the form of time stamps is embedded to the
PES header which will assist in maintaining lip sync.
 Finally, since the multiplexed data will be transmitted in
an error prone network, a provision for error detection and
correction methods is to be made.
Figure.6 Encapsulation of PES from elementary streams [9]
Header fields
Size
Description
(Bytes)
PES packet start prefix
3
0x000001
Video/audio stream ID
1
Unique for Video/audio
PES packet length
2
Unbounded when exceeds 65536 bytes and
value of 0 is indicated under such
circumstances.
Time stamps
2
Frame numbers used a s time stamps
Table.6 PES header description [19]
Figure.7 Splitting of PES packet to several TS packets [9]
Time stamps
 Frame numbers are used as time stamps.
 Time stamp information is embedded in the PES
header of each frame.
 Time
stamp information helps
synchronization during playback.
to
achieve
Advantages of using frame numbers over program clock
reference (PCR)
 Less complex.
 Can be easily incorporated in software and is well suited.
 Savings in PES header bytes. (In case of PCR, time
information needs to be sent periodically)
 Synchronization can be effectively achieved due to
absence of clock jitters and inaccurate clock samples.
Figure.8 MPEG transport stream and standard structure of MPEG-TS packet [12] [24].
Figure.9 Structure of TS packet header used in this thesis.
AFC : Adaptation field control
Header fields
Size
Sync byte
8 bits
PUSI
1bit
AFC
1bit
CC
4 bits
PID (Unique for video/audio)
10 bits
Byte offset (optional)
8 bits
(If AFC ==1 && LPES < 185 bytes)
Stuffing bytes
Added in case of byte offset
Table.7 TS packet header
Example of multiplexing process
Consider,
A video PES packet of 500 bytes @ frame rate = 25 frames per second (fps).
Number of TS packets for PES = 500/185 = 2.7 ~= 3 TS packets.
Video PES duration = 1/25 = 0.04 = 40ms.
Duration of each TS packet = 40/3 = 13.33ms.
Consider ,
An audio PES packet of 600 bytes @ sampling frequency = 44100 Hz.
Number of TS packets for PES = 600/185 = 3.2 ~ =4 TS packets.
Audio PES duration = 1024/44100 = 0.04 = 23.2ms.
Duration of each TS packet = 23.2/4 = 5.8ms.
De-multiplexing
 De-multiplexing is the process of retrieving video and
audio elementary streams from the transmitted elementary
stream.
 Two buffers maintained separately to extract video and
audio data respectively.
 While extracting video/audio data, the video and audio
buffers need to be constantly monitored to ensure no
buffer overflow or underflow of data takes place.
 In this thesis, the data can be de-multiplexed from start to
end as well as from any TS packet.
Figure.10 Flowchart of the de-multiplexer
Buffer fullness check
 When the buffer fullness of the data is monitored at the de-
multiplexer, the video and audio content playback time is
calculated as follows :
 Video content playback time = Number of video frames ×
(1/fps)
 Audio content playback time = Number of audio frames ×
(1024/sampling frequency)
Calculation of presentation time
The data is de-multiplexed from any TS packet, video and audio
presentation time for the chosen frame number is calculated as follows:
Lip synchronization
 Calculating the delay between audio and video presentation time
at the de-multiplexer, helps
synchronization during playback.
to
determine
audio/video
 The maximum delay that is allowed by MPEG-2 standard is
± 40ms while synchronization between audio and video. [25]
Results
Clip 1
Clip 2
(Shrek)
(Despicable me)
Clip Duration (seconds)
50
36
Resolution
480 × 204
480 × 270
Chroma sub-sampling format
YUV 420P
YUV 420P
No: of video frames
1206
885
No: of audio frames
2166
1589
Video frame rate (fps)
24
24000/1001
Audio sampling frequency(kHz)
44.1
44.1
YUV file size (kB)
172985.63
168011.72
WAV file size (kB)
8658.04
6349.54
.DRC file size(kB)
4041.28
2629.67
.AAC file size (kB)
676.32
563.95
Video compression ratio
43:1
64:1
Audio compression ratio
13:1
12:1
DRC video bitrate (kbps)
80.83
73.05
AAC audio bitrate (kbps)
13.53
15.67
Video TS packets
23042
15039
Audio TS packets
4772
3927
TS packets
27814
18966
TS file size(kB)
5106.48
3482.039
TS bitrate(kbps)
102.13
96.72
Original AVI file size (kB)
4227
3199
Details of test clips
Table .8 Inter coding: test clips characteristics
Clip 1
Clip 2
(Despicable me)
(Despicable me)
Clip Duration (seconds)
36
10
Resolution
480 × 270
480 × 360
Chroma sub-sampling format
YUV 420P
YUV 420P
No: of video frames
885
224
No: of audio frames
1589
385
Video frame rate (fps)
24000/1001
25
Audio sampling frequency(kHz)
44.1
44.1
YUV file size (kB)
168011.72
56700
WAV file size (kB)
6349.54
1534.54
.DRC file size(kB)
16290.97
5496.87
.AAC file size (kB)
563.95
126.15
Video compression ratio
10:1
10:1
Audio compression ratio
12:1
12:1
DRC video bitrate (kbps)
452.53
549.69
AAC audio bitrate (kbps)
15.67
12.62
Video TS packets
90714
30563
Audio TS packets
3927
842
TS packets
94641
31405
TS file size (kB)
17375.5
5765.76
TS bitrate (kbps)
482.65
576.57
Original AVI file size (kB)
3199
1083
Details of test clips
Table .9 Intra coding: test clips characteristics
Figure.11 Frame 140 : luma component of Clip2 : 36 sec
Figure. 12 Frame 80 : luma component of Clip2 :10 sec
Check for buffer fullness
# of Video
Frames in
buffer
# of Audio
frames in
buffer
Size of Video
buffer
(kB)
Size of Audio
buffer
(kB)
Video content
playback time*
(in sec)
Audio content
playback time
(in sec)
100
185
277.9268
64.7295
4.2960
4.2957
200
367
631.0459
130.1924
8.5085
8.5217
400
732
1055.0078
257.5605
17.0170
16.9970
500
916
1425.7266
328.4414
21.2713
21.2695
600
1098
1670.6875
392.0518
25.4838
25.4955
700
1281
2068.3242
456.8135
29.7381
29.7448
800
1464
2382.5303
524.0430
33.9923
33.9940
Table. 10 Clip2: Check for buffer fullness and
video/audio content playback time (Inter coding)
*Video content playback time (sec) = (# of video frames + # of SPS/PPS) × (1/fps)
No: of Video
Frames in
buffer
No : of Audio
frames in
buffer
Size of Video
buffer
(kB)
Size of Audio
buffer
(kB)
Video content
playback time*
(in sec)
Audio content
playback time
(in sec)
30
56
711.6436
16.6016
1.2800
1.3003
60
112
1419.9668
35.0361
2.6000
2.6006
100
188
2369.0645
60.0449
4.3600
4.3654
150
283
3545.3672
92.5586
6.5600
6.5712
200
378
4873.7568
123.8281
8.7600
8.7771
Table .11 Clip2: Check for buffer fullness and
video/audio content playback time (Intra coding)
De- multiplexer output
TS
packet
number
Video IDR
frame
number
chosen
Synchronized
audio frame
number
Chosen video
frame
presentation
time (sec)
Chosen audio
frame
presentation
time (sec)
Delay
(ms)
Visual delay
Perceptible?
1
1
2
0.0417
0.0464
4.7
No
100
65
117
2.7110
2.7167
5.7
No
2000
146
262
6.0894
6.0836
5.8
No
10000
491
882
20.4788
20.4800
1.2
No
12000
611
1097
25.4838
25.4723
11.5
No
Table.12 Clip 2: Output of de-multiplexer (Inter coding)
IDR : Instantaneous decoder refresh
TS
packet
number
Video IDR
frame
number
chosen
Synchronized
audio frame
number
Chosen video
frame
presentation
time (sec)
Chosen audio
frame
presentation
time (sec)
Delay
(ms)
Visual delay
Perceptible?
1
1
2
0.0400
0.0464
6.4
No
5000
38
65
1.5200
1.5093
10.7
No
10000
75
129
3.0000
2.9954
4.6
No
20000
149
257
5.9600
5.9675
7.5
No
25000
182
314
7.2800
7.2911
11.1
No
Table.13 Clip 2: Output of de-multiplexer (Intra coding)
Conclusions
 Non-NAL (network abstraction layer) syntax format of
Dirac was used for multiplexing process and there is no
need for conversion to NAL syntax, if not available.
 Visually no delay is perceptible between video and audio.
 Lip synchronization was effectively achieved using time
stamps as frame numbers and from any TS packet.
Future Work
 Multiplexing algorithm can be modified to support for
multiple program elementary streams.
 Mkvmerge [32] supports Intra (I) and Predictive (P)
pictures and is ignoring the fact that Dirac video has bidirectional predicted (B) frames. Hence, mkvmerge can be
modified to support for B pictures of Dirac.
 In practice, the data is broadcasted in an error prone
network. Rugged error correction codes can be
incorporated to this implementation.
References
[1] T. Borer and T. Davies, “Dirac video compression using open technology”, BBC EBU Technical Review,
July 2005.
[2] K.Oriantar, K.K. Loo and Z.Xue, “Performance comparison of emerging Dirac video codec with
H.264/AVC”, School of engineering and design, Brunel university, UK.
[3] A. Ravi, "Performance analysis and comparison of Dirac video codec with H.264/ MPEG-4 Part 10 AVC",
M.S.E.E Thesis, University of Texas at Arlington, Arlington, TX, Aug. 2009.
[4] MPEG. Information technology - Generic coding of moving pictures and associated audio information, part
4: Conformance testing. International Standard IS 13818–4, ISO/IEC JTC1/SC29 WG11, 1998.
[5] M. Bosi and M. Goldberg, “Introduction to digital audio coding and standards”, Boston: Kluwer, 2003.
[6] H. Eeckhaut et al, “Speeding up Dirac’s entropy coder”, proceedings of the 5th WSEAS conference on
multimedia , internet and video technologies, Corfu, Greece, pp: 120 -125, August 2005.
[7] “Encapsulation of Dirac in ISO/IEC 13818-1”, British Broadcasting Corporation.
[8] “Dirac ISO/IEC 13818-1 Transport stream mapping”, encapsulation of Dirac video content and time code
markers in ISO/IEC 13818-1 transport streams, draft, Feb 2007.
[9] P.A. Sarginson, “MPEG-2: Overview of systems layer”, BBC RD 1996/2.
[10] J.M. Boyce, “The United States television broadcasting transition”, IEEE signal processing magazine,
vol.26, pp. 110-112, May 2009.
[11] K. Brandenburg, “MP3 and AAC Explained”, AES 17th International conference, Florence, Italy, Sept.
1999.
[12] “Introduction to MPEG”, excerpted from IPTV basics, Lawrence Harte.
[13] Website: http://www.atsc.org/cms/ ATSC : Advanced television systems committee.
[14] Dirac information : http://en.wikipedia.org/wiki/Dirac_%28video_compression_format%29
[15] Audio sampling frequency table link :
http://wiki.multimedia.cx/index.php?title=MPEG-4_Audio#Sampling_Frequencies
[16] DVB-H link : http://dvb-h.org/
[17] ATSC-M/H link : http://atsc.org/cms/
[18] Mobile video DTV : http://www.openmobilevideo.com/about-mobile-dtv/standards/
[19] Information Technology – Generic coding of moving pictures and associated audio: Systems,
International Standard 13818-1, ISO/IEC JTC1/SC29/WG11 N0801, 1994.
[20] B. J. Lechner, et al, “The ATSC transport layer, including program and system information (PSIP),”
Proceedings of the IEEE , vol. 94, no. 1, pp.77–101, Jan. 2006.
[21] “Special issue on global digital television: technology and emerging services”, Proceedings of the IEEE,
vol. 94, pp. 5-332, Jan. 2006.
[22] H. Murugan, “Multiplexing H264 video bit-stream with AAC audio bit-stream, demultiplexing and
achieving lip sync during playback”, M.S.E.E Thesis, University of Texas at Arlington, Arlington, TX, May
2007.
[23] Dirac specification: Version 2.3, Issued: Sept.2008.
[24] MPEG transport stream:
http://www.iptvdictionary.com/iptv_dictionary_MPEG_Transport_Stream_TS_definition.html
[25] MPEG. Information technology — generic coding of moving pictures and associated audio information,
part 4: Conformance testing .International Standard IS 13818–4, ISO/IEC JTC1/SC29 WG11, 1998.
[26] Dirac document: “Dirac compression family using open technology”.
http://www.bbc.co.uk/rd/projects/dirac/index.shtml
[27] Digital Video Systems Characteristics Standard for Cable Television, ANSI/SCTE 432004.
[28] Dirac website: www.diracvideo.org
[29] Digital audio compression standard (AC-3, E-AC-3), revision B, ATSC Document A/52B, Advanced
Television Systems Committee, Washington, D.C., Jun. 14, 2005 : www.atsc.org
[30] /CMPT 365 Course Slides/, School of Computing Science, Simon Fraser ... fig3:
http://www.cs.sfu.ca/CourseCentral/365/li/material/notes/Chap4/Chap4.3/chap 4.3.html
[31] Dirac software: http://sourceforge.net/projects/dirac/
[32] Mkvmerge software: http://www.bunkus.org/videotools/mkvtoolnix/downloads.html
[33] I.H.Witten, R.M. Neal and J.G.Cleary: “Arithmetic coding for data compression”, Communications of
the ACM, Vol.30, No.6, pp.520-540, June 1987.
[34] A. Puri, X. Chen and A. Luthra, “Video coding using the H.264/MPEG-4 AVC compression standard”,
Signal processing: image communication, vol. 19, issue 9, pp. 793-849, Oct. 2004.
[35] G. A. Davidson, “Digital audio coding: Dolby AC-3,” in the Digital Signal Processing Handbook, V. K.
Madisetti and D. B. Williams, Eds. Boca Raton, FL: CRC, pp. 41-1–41-21, 1998.
[36] MPEG-4:ISO/IEC JTC1/SC29 14496-3: Information technology – coding of audio-visual objectspart 3 : Audio, amendment 4:audio lossless coding(ALS), new audio profiles and ASAC extensions.
[37] MPEG-2: ISO/IEC JTC1/SC29 13818-7: Advanced audio coding, AAC. IS WG11, 1997.
[38] ffmpeg software: http://www.ffmpeg.org/
[39] FAAC software : www.audiocoding.com
[40] VLC mediaplayer : www.videolan.org
[41] A.Ravi and K.R.Rao, “Performance analysis and comparison of the Dirac video codec with H.264/
MPEG-4, part 10”, for the book “Advances in reasoning-based image processing, analysis and intelligent
systems: conventional and intelligent paradigms”, 2011.
[42] H.Murugan and K.R.Rao, “ Multiplexing H.264 video with AAC audio bit streams, de-multiplexing
and achieving lip sync”, ICEAST 2007, Bangkok, Thailand, 21 – 23 Nov.2007.
[43] H. Kalva, et al, “Implementing multiplexing, streaming and server interaction for MPEG-4”, IEEE
Trans. on circuits and systems for video technology, vol. 9, No. 8, pp. 1299-1311, Dec. 1999.
[44] C. C. Todd, et al, “AC-3: perceptual coding for audio transmission and storage,” presented at the 96th
Conv. Audio Engineering Society., 1994, preprint 3796.
[45] M. Uehara, “Application of MPEG-2 systems to terrestrial ISDB (ISDB-T)”, Proceedings of the
IEEE, vol.94, pp. 261-268, Jan. 2006.
[46] G.A. Davidson, et al, “ATSC video and audio coding”, Proceedings of the IEEE, vol. 94, pp. 60 76, Jan. 2006.
[47] J.Jain and P. Desale, “Low power transport de-multiplexer for ATSC and DVB broadcast format”,
LSI research and development Pune pvt. Ltd.
[48] A.Ravi and K.R.Rao, “Performance analysis and comparison of the Dirac video codec with
H.264/ MPEG-4, part 10”, IJWMIP accepted : July 2011.
[49] AVI sequences: www.youtube.com
[50] AVAREX YUV PLAYER:
http://www.avarexgroup.com/en/solutions.php?sub=2&product=rawplayer
[51] Information technology- advanced coding of audio and video part -7: mobile video
[52] Geraold Blakowski et.al “A media synchronization survey: Reference model, specification and
case studies”, IEEE journal on selected areas in communications, vol.14, no.1, January 1996.
Questions ?
Thank You!