Advanced Image Coding - The University of Texas at Arlington

TUSHAR SAXENA
FALL 2011
THESIS PROPOSAL
Reducing the encoding time of H.264 Baseline profile
using parallel programming techniques
INSTRUCTOR: DR. K. R. RAO
Tushar Saxena
Department of Electrical Engineering
University of Texas at Arlington
Email: [email protected]
LIST OF ACRONYMS
ASO: Arbitrary slice ordering
API: Application programming interface
BMA: Block-matching algorithm
CPU: Central processing unit
CUDA: Compute Unified Device Architecture
EE: Electrical Engineering
FMO: Flexible macroblock ordering
GPU: Graphics processing unit
HD: High definition
ME: Motion estimation
M.S.: Master of Science
NAL: Network abstraction layer
OpenMP: Open Multi Processing
UTA: University of Texas at Arlington
Reducing the encoding time of H.264 Baseline profile
using parallel programming techniques
Abstract:
H.264 [5] is a video compression standard for the recording, compression and
distribution of high definition video. Its extensions also support multiview coding and
scalable coding. The Baseline and Extended profiles are designed for handheld devices,
video streaming, etc. The standard reduces the amount of information required to
reproduce the input video by exploiting redundancy in the pictures it is encoding, both
spatially (within the same picture) and temporally (between pictures).
These computations are very complex, however. They increase the encoding time to the
point that the use of H.264 in real-time applications is restricted.
To make it suitable for real-time applications, the encoding time of the H.264 video codec
should be reduced. This can be achieved by encoding video frames in parallel instead of
sequentially, so that more than one video frame (depending on the number of cores on
the system) is encoded in the time otherwise needed for a single video frame. With
advances in technology and the steadily growing need for bandwidth and processing
power, many parallel programming techniques [10] are now available. Since the scope of
this thesis is to reduce the encoding time of the H.264 Baseline profile on central
processing units (CPU) only, the Open Multi Processing (OpenMP) application
programming interface will be used for this purpose.
What is H.264:
H.264 [5] is a video compression standard that can achieve high video quality at
relatively low bit rates. One of the main reasons is a salient feature of the standard:
variable block-size motion compensation with small block sizes [see fig. 1].
Fig.1: Block sizes available for motion prediction in H.264 [2]
The encoder [see fig. 2] splits each input picture into macroblocks of 16x16 pixels.
Inter-coded macroblocks are predicted using motion estimation (ME) [1], an essential
part of inter-picture prediction that contributes greatly to reducing the bit rate.
FIG.2: H.264 VIDEO ENCODER BLOCK DIAGRAM [2]
Once the encoding of a frame is done, the coded video data is organized into network
abstraction layer (NAL) units [see fig. 3], each of which is effectively a packet that
contains an integer number of bytes. The first byte of each NAL unit is a header byte that
contains an indication of the type of data in the NAL unit and the remaining bytes contain
payload data of the type indicated by the header.
FIG.3: NAL unit interface between encoder and decoder [2]
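The header byte described above can be unpacked with a few shifts and masks. The following is a minimal sketch in C, assuming the one-byte header layout of H.264 (1-bit forbidden_zero_bit, 2-bit nal_ref_idc, 5-bit nal_unit_type). The struct NalHeader and the function parse_nal_header are hypothetical names introduced for illustration; they are not taken from the JM software.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the one-byte H.264 NAL unit header (illustrative type). */
typedef struct {
    unsigned forbidden_zero_bit; /* 1 bit, 0 in a valid stream           */
    unsigned nal_ref_idc;        /* 2 bits, 0 = not used as a reference  */
    unsigned nal_unit_type;      /* 5 bits, e.g. 5 = IDR slice, 7 = SPS  */
} NalHeader;

/* Split the first byte of a NAL unit into its three fields. */
NalHeader parse_nal_header(const uint8_t *nal, size_t len)
{
    assert(nal != NULL && len >= 1);
    NalHeader h;
    h.forbidden_zero_bit = (nal[0] >> 7) & 0x01u;
    h.nal_ref_idc        = (nal[0] >> 5) & 0x03u;
    h.nal_unit_type      =  nal[0]       & 0x1Fu;
    return h;
}
```

For example, a NAL unit whose first byte is 0x65 yields nal_ref_idc = 3 and nal_unit_type = 5; the remaining bytes are the payload and are not interpreted by this sketch.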
The NAL units are then made available at the input of the decoder [see fig. 4], where the
encoded data is decoded to reconstruct the frame.
FIG.4: H.264 VIDEO DECODER BLOCK DIAGRAM [2]
ME [see fig. 5] is an important part of inter-picture prediction. It is the process of
determining the motion vectors (MV) that best describe the transformation from one
frame to another. An MV, written (dx, dy), is the displacement vector of a moving object.
A block-matching algorithm (BMA) [3] is used in H.264 encoders to locate, in a reference
frame, the block that best matches a macroblock of the current frame, searching around
the position of that macroblock.
FIG.5: Multi frame ME [4]
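The block-matching idea can be sketched as an exhaustive (full) search that minimizes the sum of absolute differences (SAD). This is only one possible BMA and is shown under stated assumptions: 8-bit luma samples stored row by row, a single reference frame, integer-pixel accuracy, and the convention that the matching block lies at (cx + dx, cy + dy) in the reference frame. The names MotionVector, block_sad and full_search are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { int dx, dy; unsigned sad; } MotionVector;

/* SAD between the n x n block of cur at (cx, cy) and of ref at (rx, ry). */
static unsigned block_sad(const uint8_t *cur, const uint8_t *ref, int stride,
                          int cx, int cy, int rx, int ry, int n)
{
    unsigned sad = 0;
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            sad += (unsigned)abs((int)cur[(cy + y) * stride + cx + x] -
                                 (int)ref[(ry + y) * stride + rx + x]);
    return sad;
}

/* Full search over displacements of +/- range pixels.
 * Candidates that would fall outside the frame are skipped. */
MotionVector full_search(const uint8_t *cur, const uint8_t *ref,
                         int width, int height,
                         int cx, int cy, int n, int range)
{
    assert(range >= 0);
    assert(cx >= 0 && cy >= 0 && cx + n <= width && cy + n <= height);
    MotionVector best = {0, 0, UINT_MAX};
    for (int dy = -range; dy <= range; dy++) {
        for (int dx = -range; dx <= range; dx++) {
            int rx = cx + dx, ry = cy + dy;
            if (rx < 0 || ry < 0 || rx + n > width || ry + n > height)
                continue;
            unsigned sad = block_sad(cur, ref, width, cx, cy, rx, ry, n);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

The two nested displacement loops are the reason ME dominates the encoding time: for a 16x16 block and a search range of 16 pixels, up to 33 x 33 candidate positions are evaluated, each costing 256 absolute differences.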
H.264 exploits both spatial and temporal redundancy. Temporal redundancy is exploited
using ME [see fig. 5], whereas spatial redundancy is exploited using the intra prediction
modes [see fig. 6]. Since these computations are very complex, they increase the
encoding time drastically, restricting the use of H.264 in real-time applications. For
homogeneous regions, 16x16 blocks with four intra prediction modes are used
[see fig. 7]; for non-homogeneous regions, 4x4 blocks with nine prediction modes are
used [see fig. 6].
Fig.6: Nine intra prediction modes for 4x4 block sizes [11]
FIG.7: Four intra prediction modes for 16x16 block sizes [12]
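To illustrate how an intra prediction mode is chosen, the sketch below builds three of the nine 4x4 modes (vertical, horizontal and DC) from the neighbouring pixels and keeps the one with the smallest SAD. It is a simplification made for illustration: the six directional diagonal modes are omitted, both the top and left neighbours are assumed to be available, and a real encoder would typically use a rate-distortion cost rather than SAD alone. The function names intra4_predict and intra4_best_mode are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { INTRA4_VERTICAL = 0, INTRA4_HORIZONTAL = 1, INTRA4_DC = 2 };

/* Build the 4x4 prediction for one mode from the four pixels above
 * the block (top) and the four pixels to its left (left). */
void intra4_predict(int mode, const uint8_t top[4], const uint8_t left[4],
                    uint8_t pred[16])
{
    assert(mode >= INTRA4_VERTICAL && mode <= INTRA4_DC);
    int dc = 4; /* rounding offset for the mean of 8 neighbours */
    for (int i = 0; i < 4; i++)
        dc += top[i] + left[i];
    dc >>= 3;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            if (mode == INTRA4_VERTICAL)
                pred[y * 4 + x] = top[x];   /* copy the row above downwards  */
            else if (mode == INTRA4_HORIZONTAL)
                pred[y * 4 + x] = left[y];  /* copy the left column rightwards */
            else
                pred[y * 4 + x] = (uint8_t)dc;
        }
    }
}

/* Return the mode (of the three sketched here) with the smallest SAD. */
int intra4_best_mode(const uint8_t block[16], const uint8_t top[4],
                     const uint8_t left[4])
{
    int best_mode = INTRA4_VERTICAL;
    unsigned best_sad = ~0u;
    for (int mode = INTRA4_VERTICAL; mode <= INTRA4_DC; mode++) {
        uint8_t pred[16];
        unsigned sad = 0;
        intra4_predict(mode, top, left, pred);
        for (int i = 0; i < 16; i++)
            sad += (unsigned)abs((int)block[i] - (int)pred[i]);
        if (sad < best_sad) {
            best_sad = sad;
            best_mode = mode;
        }
    }
    return best_mode;
}
```

A block of vertical stripes that continues the row above it is predicted exactly by the vertical mode, so only a near-zero residual remains to be transformed and coded.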
Profiles of H.264:
FIG.8: Profiles of H.264 [5]
H.264 has four main profiles [see fig. 8]: Baseline, Main, Extended and High.
1] Baseline Profile: Intended primarily for low-cost applications that require additional
robustness to data loss, this profile is used in some videoconferencing and mobile
applications. It includes all features that are supported in the Constrained Baseline
Profile, plus three additional features, namely flexible macroblock ordering (FMO),
arbitrary slice ordering (ASO) and redundant slices, that can be used for loss robustness
(or for other purposes such as low-delay multi-point video stream compositing).
2] Main Profile: It is designed for digital storage media and television broadcasting.
The Main profile, which is a subset of the High profile, was designed with coding
efficiency as its main target.
3] Extended Profile: Intended as the streaming video profile, this profile has relatively
high compression capability and some extra features for robustness to data losses and
for server stream switching.
4] High Profile: The High profile offers higher compression efficiency than the other
profiles, so more visual communication can be carried with fewer network resources,
limiting or avoiding costly network upgrades. High definition (HD) systems benefit the
most from this profile, which is expected to accelerate the adoption of HD
communication across organizations.
Advantages and Disadvantages of H.264:
H.264 encoding and decoding are more computationally complex than those of some
other codecs such as MPEG-4 Part 2 [13]. This is mainly because of the variable
block-size motion compensation with small block sizes and the adaptive directional intra
prediction. This complexity limits its use for real-time applications.
However, the compression performance of H.264 is significantly better than that of these
codecs, so the choice depends on the requirements of the application.
Changes adopted in H.264 Baseline profile video codec to reduce the encoding time:
To make H.264 suitable for real-time applications, the encoding time must be reduced.
This can be achieved by encoding several frames in parallel. From the software point of
view, this can be done by incorporating parallel programming techniques into the
encoding algorithm; one such technique is the use of OpenMP.
The strategy adopted for encoding the frames in parallel is as follows:
Step 1] Divide the total number of frames to encode into 2 equal sets. For example, if
the total number of frames to encode is 30, then set 1 contains frames 1 to 15 and set 2
contains frames 16 to 30.
Step 2] Perform intra coding in parallel on frame 1 and frame 16 only. Frame 1 then
serves as the reference frame for frame 2, and frame 16 as the reference frame for
frame 17.
Step 3] Perform inter coding in parallel on frame 2 and frame 17 by incorporating
OpenMP into the encoding algorithm. Repeat for frame 3 and frame 18, and so on, until
all the frames are encoded.
Set 1: FRAME 1 (INTRA), FRAME 2 (INTER), FRAME 3 (INTER), ..., FRAME 15 (INTER)
Set 2: FRAME 16 (INTRA), FRAME 17 (INTER), FRAME 18 (INTER), ..., FRAME 30 (INTER)
Frames at the same position in the two sets are encoded in parallel.
FIG.9: Parallel processing of frames to reduce encoding time
As seen from fig. 9, the 30 frames are divided into 2 sets, with set 1 containing frames 1
to 15 and set 2 containing frames 16 to 30. Frame 1 and frame 16 are intra coded in
parallel and act as reference frames for frame 2 and frame 17. Similarly, frame 2 and
frame 17 are inter coded in parallel and act as reference frames for frame 3 and
frame 18. This process is carried out until all the frames are encoded. Hence, ideally, all
the frames can be encoded in about half the time required to encode them sequentially.
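The three steps can be sketched with an OpenMP work-sharing loop in which each thread encodes one set as an independent chain of frames. This is a minimal sketch under stated assumptions, not the JM encoder: encode_frame is a hypothetical stand-in that only records the coding decision, frames are indexed from 0 (so frame 1 of the text is index 0), and the two chains run independently rather than in lock-step pairs, which produces the same frame types and reference frames.

```c
#include <assert.h>

typedef enum { FRAME_INTRA = 'I', FRAME_INTER = 'P' } FrameType;

typedef struct {
    FrameType type;
    int reference;  /* index of the reference frame, -1 for intra frames */
} FramePlan;

/* Hypothetical stand-in for the real encoder call: it only records
 * how the frame would be coded. */
static void encode_frame(FramePlan *plan, int frame, int reference)
{
    plan[frame].type = (reference < 0) ? FRAME_INTRA : FRAME_INTER;
    plan[frame].reference = reference;
}

/* Encode num_frames frames as num_sets independent chains,
 * one OpenMP thread per chain. */
void encode_in_sets(FramePlan *plan, int num_frames, int num_sets)
{
    assert(plan != 0 && num_sets > 0 && num_frames >= num_sets);
    #pragma omp parallel for num_threads(num_sets) schedule(static, 1)
    for (int s = 0; s < num_sets; s++) {
        int first = s * num_frames / num_sets;        /* Step 1: set bounds */
        int last  = (s + 1) * num_frames / num_sets;  /* exclusive          */
        encode_frame(plan, first, -1);                /* Step 2: intra      */
        for (int f = first + 1; f < last; f++)
            encode_frame(plan, f, f - 1);             /* Step 3: inter      */
    }
}
```

Each thread writes only to the entries of its own set, so no synchronization is needed inside the loop. With GCC the pragma takes effect when the code is compiled with -fopenmp; without that flag the loop simply runs sequentially and gives the same result. The price of the scheme is one additional intra frame per set, which costs more bits than an inter frame.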
What is OpenMP:
OpenMP is an application programming interface (API) that supports multi-platform
shared memory multiprocessing programming in C, C++ and Fortran on many
architectures, including Unix and Microsoft Windows platforms.
FIG.10: ILLUSTRATION OF MULTITHREADING IN PARALLEL [6]
OpenMP is an implementation of multithreading [see fig. 10], a method of parallelization
whereby the master "thread" (a series of instructions executed consecutively) "forks" a
specified number of worker "threads" and a task is divided among them. The threads then
run concurrently, with the runtime environment allocating threads to different processors.
The section of code that is meant to run in parallel is marked accordingly with a
compiler directive that causes the threads to be formed before the section is executed.
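A minimal sketch of this fork-join model in C is given below. The directive in front of the loop marks the parallel section: the iterations are divided among the threads, and the reduction clause combines the per-thread partial sums when the threads join. The function name parallel_sum is hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array with an OpenMP parallel loop.
 * Each thread accumulates a private partial sum; the partial sums
 * are added into total when the threads join. */
long parallel_sum(const int *values, int count)
{
    assert(values != NULL && count >= 0);
    long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

With GCC the directive is honoured when the file is compiled with the -fopenmp option; a compiler without OpenMP support ignores the pragma and the loop runs on a single thread with the same result.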
Previous research work carried out:
1] Name: D. Han et al.
Title: “Low complexity H.264 encoder using machine learning”.
Year: Sept. 2010, Sejong University, Seoul, Korea.
Description: [7]
2] Name: P.R. Ramolia
Title: Low Complexity AVS-M Using Machine Learning Algorithm C4.5.
Year: May 2011, Electrical Engineering (EE) Department, University of Texas at
Arlington (UTA), Arlington, USA.
Description: [8]
3] Name: Hitesh Yadav
Title: Optimization of the deblocking filter in H.264 codec for real time
implementation
Year: May 2006, Master of Science (M.S.) Thesis, EE Department, UTA.
Description: Reduce the encoding time by enhancing the algorithm of the deblocking
filter so as to make it suitable for real-time applications.
4] Name: Suchethan Swaroop
Title: Low complexity H.264 encoder using machine learning for streaming
applications
Year: May 2011, Master of Science (M.S.) Thesis, EE Department, UTA.
Description: Machine learning was adopted to reduce the encoder complexity of
H.264. Machine learning is a branch of artificial intelligence that is concerned with the
design and development of algorithms that allow computers to evolve behaviors.
5] Name: Amruta Kulkarni
Title: Implementation of fast inter-prediction mode decision algorithm in H.264/AVC
video encoder.
Year: Expected Dec. 2011, M.S., Thesis EE, UTA.
Description: It is proposed to implement a complexity reduction algorithm for inter
mode selection in H.264/AVC video coding.
6] Name: Tejas Sathe
Title: Complexity reduction in H.264 encoder using Compute Unified Device
Architecture (CUDA) programming
Year: Expected Dec. 2011, M.S., Thesis EE, UTA.
Description: CUDA programming is used for general purpose computing on a
Graphics Processing Unit (GPU). To make H.264 applicable for real-time applications,
the encoding time needs to be minimized. This can be achieved by CUDA programming
as it reduces the computational complexity.
REFERENCES:
[1] F. Dufaux and F. Moscheni, “Motion estimation techniques for digital TV: a review and a
new contribution”, Proceedings of the IEEE, Vol. 83, No. 6, pp. 858-876, Jun. 1995.
[2] M. Wien, “Variable Block-Size Transforms for H.264/AVC”, IEEE Transactions on Circuits
and Systems for Video Technology, Vol. 13, No. 7, pp. 564-567, July 2003.
[3] J. Vanne, “A Configurable Motion Estimation Architecture for Block-Matching Algorithms”,
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 19, No. 4, pp. 620-628,
April 2009.
[4] T. Wiegand et al., “Overview of the H.264/AVC Video Coding Standard”, IEEE Transactions
on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003.
[5] I. E. G. Richardson, “H.264 and MPEG-4 Video Compression: Video Coding for Next
Generation Multimedia”, Wiley 2nd edition, August 2010.
[6] Wikipedia, It is a free, web-based, collaborative, multilingual encyclopedia project
supported by the non-profit Wikimedia Foundation.
[7] D. Han et al., “Low complexity H.264 encoder using machine learning”, IEEE SPA,
Poznan, Poland, pp. 40-43, Sept. 2010.
[8] P. R. Ramolia and K.R. Rao, “Low Complexity AVS-M Using Machine Learning
Algorithm C4.5”, TELSIKS 2011, Nis, Serbia, 5-8 Oct. 2011.
[9] H. Kalva, “Parallel programming for multimedia applications”, Springer Science and
Business Media, Florida Atlantic University, Florida, USA, Dec 2010.
[10] JM Software version 18.0, H.264/AVC codec software, Website:
http://iphome.hhi.de/suehring/tml/.
[11] C.C. Cheng, “Fast three step intra prediction algorithm for 4×4 blocks in H.264”,
Circuits and Systems, ISCAS, IEEE International Symposium, Vol. 2, pp. 1509-1512,
May 2005.
[12] M. Jafani and S. Kasaei, “Fast Intra-Prediction Mode Decision in H.264 Advanced
Video Coding”, IEEE Communication Systems, Singapore International Conference, pp.
1-6, Oct. 2006.
[13] M. Roitzsch and M. Pohlack, “Principles for the Prediction of Video Decoding
Times Applied to MPEG-1/2 and MPEG-4 Part 2 Video”, Real-Time Systems
Symposium, RTSS, 27th IEEE International, pp. 271-280, Dec. 2006.