H.264/AVC Baseline Profile Decoder Complexity Analysis

Michael Horowitz, Anthony Joch,
Faouzi Kossentini, and Antti Hallapuro
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS,
JULY 2003
Outline
 Introduction
 H.264/AVC decoder overview
 Storage requirements
 Time complexity
 Comparative analysis
 Experimental analysis
 Conclusion
Introduction
 To estimate the computational complexity of an H.264/AVC baseline decoder, it is important to understand its two major components:
   Time complexity
   Storage complexity
Introduction
 Time complexity
 Time complexity is measured by the
approximate number of operations
required to execute a specific
implementation of an algorithm.
 Storage complexity
 Storage complexity is measured by
the approximate amount of memory
required to implement an algorithm.
Introduction
 Develop and validate a
methodology for estimating decoder
time complexity.
 Study the relationship between
decoder time complexity and
encoder characteristics, source
content, resolution and bit rate
H.264/AVC decoder overview
 The H.264/AVC decoding process consists of two primary paths:
   the generation of the predicted video blocks
   the decoding of the coded residual blocks
H.264/AVC decoder overview
 The decoding process first includes the parsing and decoding of the entropy-coded bitstream
   UVLC (complexity of CAVLC = 2 x UVLC)
 Depending on the coding mode (I or P) of each macroblock, the predicted macroblock is generated either temporally (inter-coding) or spatially (intra-coding).
H.264/AVC decoder overview
 Inter-coded MB:
   Block sizes from 16x16 down to 4x4
   Quarter-sample accuracy
   Motion vectors are coded differentially using either median or directional prediction
   Multiple reference frames
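The differential motion vector coding listed above can be illustrated with a minimal sketch of the median case. The neighbour names (`mv_a` left, `mv_b` above, `mv_c` above-right) and both function names are assumptions made for illustration; neighbour availability rules and the directional prediction used for some partition shapes are omitted.

```python
from statistics import median

def predict_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three neighbouring motion vectors.

    Each vector is an (x, y) tuple in quarter-sample units.
    """
    return (median([mv_a[0], mv_b[0], mv_c[0]]),
            median([mv_a[1], mv_b[1], mv_c[1]]))

def reconstruct_mv(mvd, mv_a, mv_b, mv_c):
    """Add the decoded difference (mvd) to the median predictor."""
    pred_x, pred_y = predict_mv(mv_a, mv_b, mv_c)
    return (pred_x + mvd[0], pred_y + mvd[1])

if __name__ == "__main__":
    # Hypothetical neighbours and difference, chosen only for the example.
    print(reconstruct_mv((1, -2), (4, 0), (2, 8), (6, -4)))
```

Because only the difference from the predictor is transmitted, the decoder cost of this step is a few comparisons and two additions per motion vector.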
H.264/AVC decoder overview
 Intra-coded MB:
   16x16 or 4x4 intra-coding mode
   9 possible modes for 4x4, 4 possible modes for 16x16 (e.g., DC, vertical, horizontal, ...)
 Decoding of residual
   Inverse transform
 Deblocking filter
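For the residual path, the following is a minimal sketch of the 4x4 inverse integer transform followed by reconstruction, assuming the input coefficients have already been inverse quantized. The function names are chosen for illustration. The point of the sketch is that the transform needs only additions and shifts, which is why it maps well onto an operation-count analysis.

```python
def inverse_1d(w0, w1, w2, w3):
    """One 1-D pass of the 4x4 inverse transform (adds and shifts only)."""
    e0 = w0 + w2
    e1 = w0 - w2
    e2 = (w1 >> 1) - w3
    e3 = w1 + (w3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

def inverse_transform_4x4(coeffs):
    """Apply the 1-D pass to rows, then columns, then round and scale by 64."""
    rows = [inverse_1d(*row) for row in coeffs]
    cols = [inverse_1d(*col) for col in zip(*rows)]
    # cols[j][i] holds the sample at row i, column j.
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]

def reconstruct(pred, residual):
    """Add the residual to the prediction and clip to the 8-bit range."""
    return [[min(255, max(0, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]

if __name__ == "__main__":
    dc_only = [[64, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]
    residual = inverse_transform_4x4(dc_only)
    print(reconstruct([[128] * 4 for _ in range(4)], residual))
```

When a block has no nonzero coefficients, the transform can be skipped entirely, which is one reason the cost depends on how many blocks are actually coded.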
Storage requirements
 The storage required by an H.264/AVC baseline decoder is divided into:
   Memory needed for the whole frame
   Memory needed for one line of macroblocks
   Memory needed for a macroblock
   Memory needed for constant data
Storage requirements
 Frame buffers dominate the storage requirements, particularly for high-resolution video
   95% of total storage for QCIF, 98% for CIF
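To give a sense of scale for the frame buffers, here is a rough sketch of the size of a single decoded frame, assuming 8-bit samples and 4:2:0 chroma subsampling. The function name is hypothetical, and the sketch does not try to reproduce the 95% and 98% figures, which also depend on how many reference frames are stored and on the other memory categories.

```python
def frame_bytes(width, height):
    """Bytes for one 8-bit 4:2:0 frame: full-size luma plus two
    chroma planes subsampled by two in each direction."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return luma + chroma

if __name__ == "__main__":
    for name, (w, h) in {"QCIF": (176, 144), "CIF": (352, 288)}.items():
        print(name, frame_bytes(w, h), "bytes per frame")
```

A CIF frame is four times the size of a QCIF frame, while per-macroblock and constant-data storage do not grow with resolution, which is consistent with the frame buffer share rising at higher resolution.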
Time complexity
 Table descriptions
   Decoder subfunction tables
   Operation count table
   Execution subunit table
Time complexity
 Analysis methodology
   First: compute the number of cycles required to execute a particular subfunction on a chosen hardware platform
   Second: derive the cycle count estimate by multiplying the result from the first step by the frequency with which the subfunction is used
Time complexity
 Example: 4x4 inverse transform and reconstruct on TRIMEDIA
 Two cases:
   inverse transform and reconstruct
   inverse transform only (no nonzero coefficients)
Time complexity
 Case 1: inverse transform and reconstruct
\[
c = \max\left[
\frac{A+S+ST+L}{5},\ \frac{A+S+ST}{5},\ \frac{A+S+L}{5},\ \frac{A+ST+L}{5},\ \frac{S+ST+L}{4},\
\frac{A+S}{5},\ \frac{A+ST}{5},\ \frac{A+L}{5},\ \frac{S+ST}{4},\ \frac{S+L}{4},\ \frac{ST+L}{2},\
\frac{A}{5},\ \frac{S}{2},\ \frac{ST}{2},\ \frac{L}{2}
\right] = 38.4 \text{ cycles}
\]
Time complexity
 Case 2: inverse transform
\[
c = \max\left[ \frac{ST+L}{2},\ \frac{ST}{2},\ \frac{L}{2} \right] = 16 \text{ cycles}
\]
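The cycle estimates for the two cases take the maximum, over groups of operation types, of the operation count divided by the number of operations of that group that can be issued per cycle. The sketch below encodes the divisors as they appear in the Case 1 expression. The symbols A, S, ST and L are presumably add, shift, store and load counts (an assumption, since the slides do not define them), and the operation counts in the example are hypothetical values chosen only so that the Case 2 result of 16 cycles is reproduced.

```python
# Divisor for each group of operation types, as read from the Case 1 expression.
CAPACITY = {
    frozenset(group): per_cycle for group, per_cycle in [
        (("A", "S", "ST", "L"), 5),
        (("A", "S", "ST"), 5), (("A", "S", "L"), 5),
        (("A", "ST", "L"), 5), (("S", "ST", "L"), 4),
        (("A", "S"), 5), (("A", "ST"), 5), (("A", "L"), 5),
        (("S", "ST"), 4), (("S", "L"), 4), (("ST", "L"), 2),
        (("A",), 5), (("S",), 2), (("ST",), 2), (("L",), 2),
    ]
}

def cycle_bound(counts):
    """Lower bound on cycles: the most constrained group sets the pace."""
    return max(
        sum(counts.get(op, 0) for op in group) / per_cycle
        for group, per_cycle in CAPACITY.items()
    )

if __name__ == "__main__":
    # Hypothetical counts: 16 stores and 16 loads, no adds or shifts.
    print(cycle_bound({"ST": 16, "L": 16}))
```

With no adds or shifts, only the three terms that appear in the Case 2 expression can be the maximum, so the full table reduces to that shorter form.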
Time complexity
 Number of 4x4 inverse transforms: 42165.7 (28266.6 luminance and 13899.1 chrominance; Mobile, QP = 21)
 42165.7 x 39 = 1644462.3
 242954.3 x 16 = 3887268.8
 1644462.3 + 3887268.8 = 5531731.1
Comparative analysis
 Measure the cycles of the 4x4 inverse transform and reconstruct on the P3
   Using the proposed method: 6.9 million cycles
   Using VTune Performance Analyzer: 28.23 million cycles
   The ratio: 28.23 / 6.9 = 4.05
Comparative analysis
 The ratio is due to:
   The operation count table contains data for only fundamental operations. Overhead operations such as loop overhead, flow control, and boundary condition handling are not included.
   The estimate assumes software designed so that the overhead due to instruction cache misses is negligible, hardware register counts are not exceeded, and operation latency is hidden.
Comparative analysis
 An analysis summary for the P3 shows that the theoretical estimates are approximately 2 to 6 times lower than the experimental results.
 The specific factor depends mainly on the characteristics of the subfunction, such as the regularity of its operations and the amount of overhead.
Experimental analysis
 Our experimental analysis shows that the time complexity of the decoder and its major subfunctions is strongly dependent upon the average bit rate of the coded bitstream
 The optimality of the motion estimation and mode decision processes in the source encoder does not have a significant impact on decoder complexity.
Experimental analysis
 One of the most important pieces of information in the complexity analysis is the distribution of time complexity among subsystems
   Loop filtering: 33%
   Interpolation: 25%
   Entropy decoding: 13%
   Inverse transforms and reconstruction: 13%
Experimental analysis
 Factors that affect the complexity of each subfunction
   Inverse quantization, transforms, and reconstruction: the number of blocks and macroblocks that contain nonzero coefficients
   Bitstream parsing and entropy decoding: bit rate (more time is spent on coefficients at higher bit rates)
Experimental analysis
 Interpolation :
Experimental analysis
 Loop filter

The most important factor in this variability
is the percentage of edges that might be
filtered due to the boundary strength
determined for that edge.
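How the boundary strength decides which edges are candidates for filtering can be sketched as follows. This is a simplified rendering of the rules for progressive P-slice content, written from general knowledge of the H.264/AVC deblocking filter rather than from these slides. The `Block4x4` type and both function names are hypothetical. An edge with nonzero strength is still subject to sample-level threshold tests, which is why it only "might be" filtered.

```python
from dataclasses import dataclass

@dataclass
class Block4x4:
    intra: bool        # block belongs to an intra-coded macroblock
    has_coeffs: bool   # block contains nonzero coefficients
    ref_frame: int     # index of the reference frame used
    mv: tuple          # (x, y) motion vector in quarter-sample units

def boundary_strength(p, q, on_mb_edge):
    """Simplified boundary strength for the edge between blocks p and q."""
    if p.intra or q.intra:
        return 4 if on_mb_edge else 3
    if p.has_coeffs or q.has_coeffs:
        return 2
    if p.ref_frame != q.ref_frame:
        return 1
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0  # edge is skipped by the filter

def candidate_fraction(edges):
    """Fraction of (p, q, on_mb_edge) edges with nonzero boundary strength."""
    if not edges:
        return 0.0
    return sum(boundary_strength(*e) > 0 for e in edges) / len(edges)

if __name__ == "__main__":
    still = Block4x4(False, False, 0, (0, 0))
    coded = Block4x4(False, True, 0, (0, 0))
    print(candidate_fraction([(still, still, True), (still, coded, False)]))
```

Content with many skipped, static blocks yields many zero-strength edges, so the filter does little work; intra-heavy or high bit rate content pushes the fraction up.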
Conclusions
 Study the computational complexity of
the H.264/AVC baseline decoder using
both theoretical and experimental
methods