The University of Texas at Arlington

Department of Electrical Engineering
The University of Texas at Arlington
A Project Proposal on
Early termination for TZSearch in HEVC motion
estimation.
Under the guidance of Dr. K. R. Rao
For the fulfillment of the course Multimedia Processing (EE5359)
Spring 2016
Submitted by
Rajath Shivananda (1001096626)
TABLE OF CONTENTS
1. Objective of the project
2. H.265 / High Efficiency Video Coding
   2.1 Introduction
   2.2 Encoder and Decoder in HEVC
   2.3 Features of HEVC
   2.3.1 Coding tree units and coding tree block (CTB) structure
   2.3.2 Coding units (CUs) and coding blocks (CBs)
   2.3.3 Prediction units and prediction blocks (PBs)
   2.3.4 TUs and transform blocks
3. Inter Picture Prediction in HEVC
   3.1 Introduction
   3.2 Motion Vector Prediction
   3.3 Advanced Motion Vector Prediction Process
   3.4 Merge Mode
   3.5 Motion Compensation
   3.6 Fractional Sample Interpolation
   3.7 Weighted Sample Prediction
4. Block Matching [22][23]
   4.1 Motion Estimation Algorithms [22][23]
   4.2 Full Search Algorithm
   4.3 TZSearch Algorithm [25]
   4.3.1 Diamond Search Algorithm
5. Proposed Algorithm [16]
   5.1 Introduction [16]
   5.2 Median Predictors [16]
6. Configuration Profile
   6.1 Introduction
   6.1.1 Low Delay
   6.1.2 Random Access
   6.1.3 Custom Profile (Random Access Early)
7. Test Sequences
8. Experimental Results
   8.1 Test Conditions
   8.2 Comparison of low delay, random access and random access early for different QP values and different video sequences
Conclusion
REFERENCES
Acknowledgement
I would like to acknowledge Dr. K. R. Rao for his continuous support
and guidance during the course of the project, and thank him for
providing necessary feedback and dedicating his precious time to
reviewing the reports and presentation slides at each step.
I would also like to extend my gratitude to Mr. Tuan Ho for
helping me understand inter prediction and for addressing other issues
faced during the course of the project, without which the project
would not have been successful.
List of Acronyms and Abbreviations
AMVP - Advanced Motion Vector Prediction
AP - Above Predictor
ARP - Above-Right Predictor
AVC - Advanced Video Coding
B-frame - Bi-predictive frame
BMA - Block Matching Algorithm
CABAC - Context Adaptive Binary Arithmetic Coding
CB - Coding Block
CTB - Coding Tree Block
CTU - Coding Tree Unit
CU - Coding Unit
DCT - Discrete Cosine Transform
GOP - Group of Pictures
HDTV - High Definition Television
HEVC - High Efficiency Video Coding
HM - HEVC Test Model
I-frame - Intra-coded frame
ICASSP - International Conference on Acoustics, Speech and Signal Processing
JCT - Joint Collaborative Team
JCT-VC - Joint Collaborative Team on Video Coding
JM - H.264 Test Model
JPEG - Joint Photographic Experts Group
KBPS - Kilobits Per Second
LCU - Largest Coding Unit
LDSP - Large Diamond Search Pattern
LP - Left Predictor
MC - Motion Compensation
ME - Motion Estimation
MP - Median Predictor
MPEG - Moving Picture Experts Group
MV - Motion Vector
P-frame - Predicted frame
PB - Prediction Block
PC - Prediction Chunking
PSNR - Peak Signal to Noise Ratio
PU - Prediction Unit
QP - Quantization Parameter
RD - Rate Distortion
SAD - Sum of Absolute Differences
SCU - Smallest Coding Unit
SDSP - Small Diamond Search Pattern
SSD - Sum of Squared Differences
TB - Transform Block
TU - Transform Unit
TZSearch - Test Zone Search
WP - Weighted Prediction
Abstract
The TZSearch algorithm was adopted in the High Efficiency Video Coding reference software HM as a
fast motion estimation (ME) algorithm for its excellent performance in reducing ME time while
maintaining comparable rate distortion (RD) performance. However, the multiple initial search point
decision and the hybrid block matching search contribute a relatively high computational complexity to
TZSearch. Based on a statistical analysis of the probability of the median predictor being selected as the
final best point in large coding units (CUs) (64×64, 32×32) and small CUs (16×16, 8×8), as well as
the center-biased characteristic of the final best search point in the ME process, two early terminations for
TZSearch are proposed. Experimental results show that 38.96% of encoding time is saved, while the RD
performance degradation is quite acceptable [16].
1. Objective of the project
In this project, TZSearch is used as the block matching algorithm, since it performs best compared to the
full search algorithm. The proposed algorithm [16] terminates TZSearch early to reduce the computational
time by 38.96%. The median predictor serves as the best initial search point in about 67.41% of cases on
average. The experiment is conducted using three configuration profiles: random access, low delay and a
custom configuration profile. The results for these configuration profiles are compared and the best
configuration profile is selected. Lastly, different video sequences are used and their PSNR and bitrate
are compared.
2. H.265 / High Efficiency Video Coding
2.1 Introduction
High Efficiency Video Coding (HEVC) [2] is an international standard for video compression
developed by a working group of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG
(Video Coding Experts Group). The main goal of the HEVC standard is to significantly improve
compression performance compared to existing standards (such as H.264/Advanced Video Coding [2]),
in the range of a 50% bit rate reduction at similar visual quality [4].
HEVC is designed to address existing applications of H.264/MPEG-4 AVC and to focus on two key
issues: increased video resolution and increased use of parallel processing architectures [4]. It primarily
targets consumer applications, as pixel formats are limited to 4:2:0 8-bit and 4:2:0 10-bit. The next
revision of the standard will enable new use cases with the support of additional pixel formats such as
4:2:2 and 4:4:4, bit depths higher than 10-bit [5], embedded bit-stream scalability and 3D video [6].
Figure 1. YUV format for 10-bit 4:2:2, 8-bit 4:2:2 and 8-bit 4:2:0 [43].
Figure 2. Evolution of video coding standards over the years [13]
Figure 3. Comparison of 10-bit and 8-bit color [43].
Figure 4. Comparison of YUV format for 4:2:2 and 4:4:4 [43].
2.2 Encoder and Decoder in HEVC
Figure 5. Block diagram of HEVC CODEC [7]
Source video, consisting of a sequence of video frames, is encoded or compressed by a video encoder
to create a compressed video bit stream. The compressed bit stream is stored or transmitted. A video
decoder decompresses the bit stream to create a sequence of decoded frames [7].
The video encoder performs the following steps:
• Partitioning each picture into multiple units.
• Predicting each unit using inter or intra prediction, and subtracting the prediction from the unit.
• Transforming and quantizing the residual (the difference between the original picture unit and the prediction).
• Entropy encoding the transform output, prediction information, mode information and headers.
The video decoder performs the following steps:
• Entropy decoding and extracting the elements of the coded sequence.
• Rescaling and inverting the transform stage.
• Predicting each unit and adding the prediction to the output of the inverse transform.
• Reconstructing a decoded video image.
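As a toy illustration of how these encoder steps mirror the decoder loop, the following compilable C++ sketch pushes one block through stub stages. Every function here is an illustrative stand-in, not HM code:

#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// Stub stages standing in for real prediction, transform and entropy coding.
Vec predict(const Vec& blk)              { return Vec(blk.size(), 128); }
Vec subtract(const Vec& a, const Vec& b) { Vec r(a.size()); for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] - b[i]; return r; }
Vec add(const Vec& a, const Vec& b)      { Vec r(a.size()); for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i]; return r; }
Vec transformQuantize(const Vec& r)      { Vec c(r); for (int& v : c) v /= 4; return c; } // coarse stand-in for DCT + Q
Vec dequantizeInverse(const Vec& c)      { Vec r(c); for (int& v : r) v *= 4; return r; } // stand-in for IQ + IDCT
void entropyEncode(const Vec&)           { /* coefficients and side info would be written here */ }

// One block through the hybrid loop; the return value is the reconstruction
// that both encoder and decoder keep, so their predictions stay identical.
Vec encodeBlock(const Vec& block) {
    Vec p = predict(block);                 // inter or intra prediction
    Vec c = transformQuantize(subtract(block, p));
    entropyEncode(c);
    return add(p, dequantizeInverse(c));    // decoder-mirroring reconstruction
}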
Figures 6 and 7 show the detailed block diagrams of the HEVC encoder and decoder,
respectively:
Figure 6. Block Diagram of HEVC Encoder [2]
Figure 7. Block Diagram of HEVC Decoder [8]
2.3 Features of HEVC
The video coding layer of HEVC employs the same hybrid approach (inter-/intra-picture prediction and 2D transform coding) used in all modern video compression standards. Figure 6 depicts the block diagram of a hybrid video encoder, which can create a bit stream conforming to the HEVC standard, and Figure 7 shows the HEVC decoder block diagram. An encoding algorithm producing an HEVC-compliant bit stream would typically proceed as follows. Each picture is split into block-shaped regions, with the exact block partitioning being conveyed to the decoder. The first picture of a video sequence (and the first picture at each clean random access point in a video sequence) is coded using only intra-picture prediction (which uses prediction of data spatially from region to region within the same picture but has no dependence on other pictures). For all remaining pictures of a sequence, or between random access points, inter-picture temporally predictive coding modes are typically used for most blocks.

The encoding process for inter-picture prediction consists of choosing motion data comprising the selected reference picture and motion vector (MV) to be applied for predicting the samples of each block. The encoder and decoder generate identical inter-picture prediction signals by applying MC using the MV and mode decision data, which are transmitted as side information. The residual signal of the intra- or inter-picture prediction, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantized, entropy coded, and transmitted together with the prediction information. The encoder duplicates the decoder processing loop (see the gray-shaded boxes in Figure 6) such that both will generate identical predictions for subsequent data. Therefore, the quantized transform coefficients are reconstructed by inverse scaling and are then inverse transformed to duplicate the decoded approximation of the residual signal. The residual is then added to the prediction, and the result of that addition may then be fed into one or two loop filters to smooth out artifacts induced by block-wise processing and quantization.

The final picture representation (a duplicate of the output of the decoder) is stored in a decoded picture buffer to be used for the prediction of subsequent pictures. In general, the order of encoding or decoding of pictures often differs from the order in which they arrive from the source, necessitating a distinction between the decoding order (i.e., bit-stream order) and the output order (i.e., display order) for a decoder. Video material to be encoded by HEVC is generally expected to be input as progressive scan imagery (either because the source video originates in that format or because it results from de-interlacing prior to encoding). No explicit coding features are present in the HEVC design to support the use of interlaced scanning, as interlaced scanning is no longer used for displays and is becoming substantially less common for distribution. However, a metadata syntax is provided in HEVC to allow an encoder to indicate that interlace-scanned video has been sent either by coding each field (i.e., the even or odd numbered lines of each video frame) of interlaced video as a separate picture, or by coding each interlaced frame as an HEVC coded picture. This provides an efficient method of coding interlaced video without burdening decoders with a need to support a special decoding process for it. In the following subsections, the various features involved in hybrid video coding using HEVC are highlighted.
2.3.1 Coding tree units and coding tree block (CTB) structure:
The core of the coding layer in previous standards was the macroblock, containing a 16×16 block of luma
samples and, in the usual case of 4:2:0 colour sampling, two corresponding 8×8 blocks of chroma samples;
whereas the analogous structure in HEVC is the coding tree unit (CTU), which has a size selected by the
encoder and can be larger than a traditional macroblock. The CTU consists of a luma CTB and the
corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be chosen as L = 16,
32, or 64 samples, with the larger sizes typically enabling better compression. HEVC then supports a
partitioning of the CTBs into smaller blocks using a tree structure and quadtree-like signalling [10]. The
partitioning of CTBs into CBs, ranging from 64×64 down to 8×8, is shown in Figure 8.
Figure 8. 64×64 CTB split into CBs [9]
2.3.2 Coding units (CUs) and coding blocks (CBs):
The quad tree syntax of the CTU specifies the size and positions of its luma and chroma CBs. The root of
the quadtree is associated with the CTU. Hence, the size of the luma CTB is the largest supported size for
a luma CB. The splitting of a CTU into luma and chroma CBs is signalled jointly. One luma CB and
ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU) as shown in Figure
9. A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated
partitioning into prediction units (PUs) and a tree of transform units (TUs).
Figure 9. CUs split into CBs [9]
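As a hedged illustration of this quadtree structure (the types and the split callback below are ours, not HM's), a CTU can be walked recursively down to the minimum CU size:

#include <functional>

struct Blk { int x, y, size; };

// Recursive CU quadtree walk: a CU either stays whole or splits into four
// square children, as the split flags of the CTU signal.
void traverseCU(const Blk& cu, int minSize, const std::function<bool(const Blk&)>& shouldSplit) {
    if (cu.size > minSize && shouldSplit(cu)) {
        const int h = cu.size / 2;  // child edge length
        traverseCU({cu.x,     cu.y,     h}, minSize, shouldSplit);
        traverseCU({cu.x + h, cu.y,     h}, minSize, shouldSplit);
        traverseCU({cu.x,     cu.y + h, h}, minSize, shouldSplit);
        traverseCU({cu.x + h, cu.y + h, h}, minSize, shouldSplit);
    }
    // else: leaf CU, which then carries its own PU partitioning and TU tree
}

// e.g. traverseCU({0, 0, 64}, 8, splitFlag) walks a 64x64 CTU down to 8x8 CUs.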
2.3.3 Prediction units and prediction blocks (PBs):
The decision whether to code a picture area using inter-picture or intra-picture prediction is made at the CU
level. A PU partitioning structure has its root at the CU level. Depending on the basic prediction-type
decision, the luma and chroma CBs can then be further split in size and predicted from luma and chroma
prediction blocks (PBs), as shown in Figure 10. HEVC supports variable PB sizes from 64×64 down to 4×4
samples.
Figure 10. Partitioning of Prediction Blocks from Coding Blocks [9]
2.3.4 TUs and transform blocks:
The prediction residual is coded using block transforms. A TU tree structure has its root at the CU level.
The luma CB residual may be identical to the luma transform block (TB) or may be further split into smaller
luma TBs as shown in Figure 11. The same applies to the chroma TBs. Integer basis functions similar to
those of the discrete cosine transform (DCT) are defined for the square TB sizes 4×4, 8×8, 16×16, and
32×32. For the 4×4 transform of luma intrapicture prediction residuals, an integer transform derived from
a form of discrete sine transform (DST) is alternatively specified.
Figure 11. Partitioning of Transform Blocks from Coding Blocks [9]
3. Inter Picture Prediction in HEVC
3.1 Introduction
Motion estimation (ME) and motion compensation are among the most important methods of
exploiting redundancy in video. Their importance is so high that 50% to 70% of encoder complexity
is dedicated to the motion estimation process [18]. As we move towards higher resolution videos,
computational complexity is becoming a bigger concern, which is why motion estimation is seen as a major
savings area in terms of computational expense. In HEVC, as in the previous video coding standard
H.264/MPEG-4 AVC, motion estimation using multiple reference frames was introduced, which
added to the complexity of the motion estimation process: while it provides the ability to improve PSNR,
it also adds extra computational cost.
3.2 Motion Vector Prediction
Like AVC, the HEVC standard has two reference lists: L0 and L1. They can hold 16 references each,
but the maximum total number of unique pictures is 8. This means that to fill the lists completely, the same
picture has to be added more than once. The encoder may choose to do this to be able to predict from the same
picture with different weights (weighted prediction). The HEVC standard uses more complex motion
prediction than AVC, based on candidate list indexing. There are two MV prediction
modes: merge and AMVP (advanced motion vector prediction). The encoder decides between these two
modes for each PU and signals the choice in the bit stream with a flag. Only the AMVP process can result in an
arbitrary MV, since it is the only one that codes an MV delta. Each mode builds a list of candidate MVs and
then selects one of them using an index coded in the bit stream.
Figure 12. Position of spatial candidates of motion information [19]
Figure 13. Quad-tree splitting flag. 1 - Level 1 (L1), 0 - Level 0 (L0) [20]
3.3 Advanced Motion Vector Prediction Process
The AMVP process is performed once for each MV: once for a unidirectional (L0 or L1) PU, or twice for
a bidirectional PU. The bit stream specifies the reference picture to use for each MV. A two-deep candidate
list is formed as follows.

First, a left predictor is obtained: a0 is preferred over a1, the same reference list is preferred over the opposite list, and
a neighbor that points to the same picture is preferred over one that does not. If no neighbor points to the
same picture, the motion vector is scaled to match the picture distance (a process similar to the AVC temporal
direct mode). If this results in a valid candidate, the motion vector is added to the candidate list.

Second, an upper predictor is obtained: b0 is preferred over b1, b1 is preferred over b2, and a neighbor MV that
points to the same picture is preferred over one that does not. Neighbor scaling for the upper predictor is only
done if it was not done for the left neighbor, ensuring no more than one scaling operation per PU. If a candidate
is found, it is added to the list.

If the list still contains fewer than two candidates, a temporal candidate (an MV scaled according to picture
distance) is obtained, co-located with the bottom right of the PU. If that candidate lies outside the CTB row or
outside the picture, or if the co-located PU is intra coded, the center position is tried instead. If a temporal
candidate is found, it is added to the list. If the candidate list is still not full, (0, 0) motion vectors are added
until it is.

Finally, the candidate indicated by the transmitted index is selected, and the transmitted MV delta is added to
it. A hedged sketch of this list construction follows.
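The following compilable C++ sketch illustrates the two-deep AMVP list construction described above. The helper functions are hypothetical stand-ins for the spatial/temporal derivations, not HM functions:

#include <vector>

struct Mv { int x = 0, y = 0; bool valid = false; };

// Hypothetical stand-ins for the derivations described above (not HM code).
Mv leftPredictor(bool& usedScaling)  { usedScaling = false; return {}; } // a0 over a1, may scale
Mv upperPredictor(bool allowScaling) { (void)allowScaling; return {}; }  // b0 over b1 over b2
Mv temporalPredictor()               { return {}; }                      // bottom right, else centre

std::vector<Mv> buildAmvpList() {
    std::vector<Mv> list;
    bool leftScaled = false;
    Mv left = leftPredictor(leftScaled);
    if (left.valid) list.push_back(left);
    // the upper predictor may only scale if the left one did not (one scaling per PU)
    Mv up = upperPredictor(!leftScaled);
    if (up.valid && list.size() < 2) list.push_back(up);
    if (list.size() < 2) {                                 // temporal fallback
        Mv t = temporalPredictor();
        if (t.valid) list.push_back(t);
    }
    while (list.size() < 2) list.push_back({0, 0, true});  // pad with zero MVs
    return list;  // the transmitted index picks one; the MV delta is added to it
}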
3.4 Merge Mode
The merge process results in a candidate list of up to five entries; the depth is configured in the slice header. Each
entry ends up being L0, L1 or bidirectional. Spatial candidates are added in the order a1, b1, b0, a0, with b2
considered only if one of the preceding candidates is unavailable, so at most four spatial candidates are added.
A candidate is not added to the list if it is the same as one of the earlier candidates. Then, if the list still has
room, a temporal candidate is added, which is found by the same process as in the AMVP (advanced motion
vector prediction) process. Then, if the list still has room, bidirectional candidates are added, formed
by combining the L0 and L1 vectors of the other candidates already in the list. Finally, if
the list is still not full, (0, 0) MVs are added with increasing reference indices. The final motion vector is
obtained by picking one of the up-to-five candidates as signaled in the bit stream.
HEVC sub-samples the temporal motion vectors on a 16x16 grid. Thus, the decoder only needs to make
room for two motion vectors (L0 and L1) per 16x16 region of the picture when it allocates the temporal
motion vector buffer. When the decoder calculates the co-located position, the lower 4 bits of the x/y position
are zeroed out, snapping the location to a multiple of 16, as illustrated below; the picture considered
co-located is signaled in the slice header.
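Zeroing the lower 4 bits is a simple bit operation; a tiny illustration:

// Snapping a co-located position to the 16x16 temporal-MV grid (illustrative).
inline int snapTo16(int coord) { return coord & ~15; } // zero the lower 4 bits
// e.g. snapTo16(37) == 32, snapTo16(47) == 32, snapTo16(48) == 48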
3.5 Motion Compensation
Like H.264/MPEG-4 AVC [1], HEVC specifies motion vectors in quarter-pel units, but uses an 8-tap filter for luma
(all fractional positions) and a 4-tap 1/8-pel filter for chroma. Because of the 8-tap filter, any given
NxM block requires extra pixels on all sides (3 left/above, 4 right/below) to provide the filter with
the data it needs. For small blocks such as 8x4, (8+7) x (4+7) = 15x11 pixels are needed. The HEVC
standard therefore restricts the smallest blocks to be uni-directional, and 4x4 inter blocks are not supported,
since smaller blocks require more memory reads, and hence more memory bandwidth, time and power.
The HEVC standard also supports weighted prediction for both uni- and bidirectional PUs. However, the
weights are always explicitly transmitted in the slice header; there is no implicit weighted prediction as
in H.264/MPEG-4 AVC [19]. Quarter-sample precision is used for the motion vectors, and 7-tap (weights: -1, 4, -10,
58, 17, -5, 1) or 8-tap (weights: -1, 4, -11, 40, 40, -11, 4, -1) filters are used for interpolation of
fractional-sample positions, as shown in Figure 14. Similar to H.264/MPEG-4 AVC [19], multiple reference pictures
are used, as shown in Figure 15. For each PB, either one or two motion vectors can be transmitted, resulting
in uni-predictive or bi-predictive coding, respectively. A scaling and offset operation can be applied to
the prediction signal(s) in a manner known as weighted prediction.
Figure 14. Integer (Ai,j) and fractional (ai,j) sample positions for luma interpolation [1]
Figure 15. Multiple pictures used as reference for the current picture for motion
compensation [2]
Figure 14 shows the positions labeled with upper-case letters, Ai,j, representing the available
luma samples at integer sample locations, whereas the other positions labeled with lower-case letters
represent samples at non-integer sample locations, which need to be generated by interpolation. The
samples labeled a0,j, b0,j, c0,j, d0,0, h0,0, and n0,0 are derived from the samples Ai,j by applying the
eight-tap filter for the half-sample positions and the seven-tap filter for the quarter-sample positions,
for example [19]:

a0,j = ( -A-3,j + 4A-2,j - 10A-1,j + 58A0,j + 17A1,j - 5A2,j + A3,j ) >> (B - 8)
b0,j = ( -A-3,j + 4A-2,j - 11A-1,j + 40A0,j + 40A1,j - 11A2,j + 4A3,j - A4,j ) >> (B - 8)
c0,j = ( A-2,j - 5A-1,j + 17A0,j + 58A1,j - 10A2,j + 4A3,j - A4,j ) >> (B - 8)

and d0,0, h0,0 and n0,0 are obtained by applying the same filters vertically. Here the constant B ≥ 8 is the
bit depth of the reference samples (typically B = 8 for most applications), and >> denotes an arithmetic
right shift operation. The samples labeled e0,0, f0,0, g0,0, i0,0, j0,0, k0,0, p0,0, q0,0, and r0,0 are derived
by applying the corresponding filters to samples located at vertically adjacent a0,j, b0,j and c0,j positions [19].
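The following is a short C++ sketch of how the half-sample filter above can be applied, simplified to a single 8-bit stage with immediate normalization (HM keeps higher intermediate precision for the two-stage 2D case):

#include <algorithm>
#include <cstdint>

static const int kHalfPel[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };  // filter gain 64

// Horizontal half-sample value at integer column x of row A (bounds assumed valid).
uint8_t halfPelSample(const uint8_t* A, int x) {
    int sum = 0;
    for (int i = 0; i < 8; ++i)
        sum += kHalfPel[i] * A[x - 3 + i];  // taps span A[x-3] .. A[x+4]
    return static_cast<uint8_t>(std::clamp((sum + 32) >> 6, 0, 255)); // normalise by 64, round, clip
}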
3.6 Fractional Sample Interpolation
Interpolation tasks arise naturally in the context of video coding because the true displacements of objects
from one picture to another are independent of the sampling grid of cameras. Therefore, in motion-compensated
prediction (MCP), fractional-sample accuracy is used to more accurately capture continuous motion. Samples
available at integer positions are filtered to estimate values at fractional positions. This spatial domain operation
can be seen in the frequency domain as introducing phase delays to individual frequency components. An ideal
interpolation filter for band-limited signals induces a constant phase delay to all frequencies and does not
alter their magnitudes. The efficiency of MCP is limited by many factors: the spectral content of original
and already reconstructed pictures, camera noise level, motion blur, quantization noise in reconstructed
pictures, etc. Similar to H.264/AVC, HEVC supports motion vectors with quarter-pixel accuracy for the
luma component and one-eighth pixel accuracy for the chroma components. If the motion vector has half or
quarter-pixel accuracy, samples at fractional positions need to be interpolated using the samples at
integer-sample positions. The interpolation process in HEVC introduces several improvements over H.264/AVC
that contribute to the significant coding efficiency increase of HEVC [21].
3.7 Weighted Sample Prediction
Similar to H.264/AVC, HEVC includes a weighted prediction (WP) tool that is particularly useful for
coding sequences with fades. In WP, a multiplicative weighting factor and an additive offset are applied to
the motion compensated prediction. In principle, WP replaces the inter prediction signal P by the linearly
weighted prediction signal

P' = w * P + o

where w is an illumination compensation weight and o is an offset.
The inputs to the WP process are: the width and the height of the luma prediction block, prediction samples
to be weighted, the prediction list utilization flags, the reference indices for each list, and the color
component index. Weighting factors w0 and w1, and offsets o0 and o1 are determined using the data
transmitted in the bit stream. The subscripts indicate the reference picture list to which the weight and the
offset are applied. The output of this process is the array of prediction sample values. In HEVC weight and
offset parameters are explicitly signaled (explicit mode). Optimal solutions are obtained when the
Illumination Compensation weights, motion estimation and Rate Distortion Optimization (RDO) are
considered jointly [21]. However, practical systems usually employ simplified techniques, such as
determining approximate weights by considering picture-to-picture mean variation [21].
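A hedged C++ sketch of the explicit WP formula P' = w * P + o for one uni-directional 8-bit sample follows; the fixed-point shift logWD and the names are illustrative, not the exact HM signalling:

#include <algorithm>
#include <cstdint>

uint8_t weightedSample(int pred, int w, int logWD, int offset) {
    // apply the fixed-point weight with rounding, then the additive offset, then clip
    const int weighted = (logWD >= 1)
        ? (pred * w + (1 << (logWD - 1))) >> logWD
        : pred * w;
    return static_cast<uint8_t>(std::clamp(weighted + offset, 0, 255));
}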
4. Block Matching [22][23]
The MPEG and H.26x standards [20] use the block-matching technique for motion estimation and compensation.
In the block-matching technique, each current frame is divided into equal-size blocks, called source blocks.
Each source block is associated with a search region in the reference frame. The objective of block-matching
is to find a candidate block in the search region best matched to the source block. The relative
distances between a source block and its candidate blocks are called motion vectors.
Figure 16. Block matching scenario [22]
X: Source block for block-matching
Bx: Search area associated with X
MV: Motion vector
4.1 Motion Estimation Algorithms [22] [23]
The motion estimation algorithms considered are:
• Full Search Algorithm
• TZSearch Algorithm
4.2 Full Search Algorithm
In video coding, the full search algorithm based on block matching finds optimal motion vectors which
minimize the matching differences between reference blocks and candidate blocks in the search area. The full
search algorithm has been widely used in video coding applications because of its simplicity and easy
hardware implementation. However, the high computational cost of the full search algorithm over a very large
search area is a serious obstacle to fast real-time video coding.
In Figure 17 an example of the full search algorithm is shown, where the blue pixels mark previously
evaluated SAD positions of the current block and the purple pixel is the best match. It
can be seen that numerous computations are needed before the best match is found. A minimal sketch of the
procedure follows the figure.
Figure 17. Full Search algorithm [23]
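The following is a minimal C++ sketch of full-search block matching with SAD (illustrative, not HM code; the caller is assumed to keep the search window inside the reference frame):

#include <climits>
#include <cstdint>

struct MV { int dx, dy; };

// SAD between an NxN source block and one candidate block, both stored in
// frames of width `stride`.
static int sad(const uint8_t* cur, const uint8_t* ref, int stride, int n) {
    int s = 0;
    for (int y = 0; y < n; ++y)
        for (int x = 0; x < n; ++x) {
            const int d = cur[y * stride + x] - ref[y * stride + x];
            s += (d < 0) ? -d : d;
        }
    return s;
}

// Exhaustively evaluate every candidate in a (2*range+1)^2 window.
MV fullSearch(const uint8_t* cur, const uint8_t* refCenter, int stride, int n, int range) {
    MV best{0, 0};
    int bestSad = INT_MAX;
    for (int dy = -range; dy <= range; ++dy)
        for (int dx = -range; dx <= range; ++dx) {
            const int s = sad(cur, refCenter + dy * stride + dx, stride, n);
            if (s < bestSad) { bestSad = s; best = {dx, dy}; }
        }
    return best;
}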
4.3 TZSearch Algorithm [25]
Motion estimation is an essential process in HEVC. It finds the best matched block position in past (or
future) frames for every block in the current video frame. Full search algorithms, which examine all the blocks
in the reference frame search window, can find the most accurate matching block, but they are too
time-consuming [26]. Therefore fast motion estimation algorithms, which search only blocks that are likely to be
the best match, are widely used. The TZ search method is adopted as the fast integer-pixel motion
estimation method in HM. It has four steps, described in the following [27] (a compact sketch of this control
flow is given after the list):

1. Start search center: Establish a set of search centers, including the motion vector obtained from
median prediction; the motion vectors of the left, above and above-right positions in the corresponding
block of the reference frame; and the motion vector at the (0, 0) position. Choose the point with the smallest
matching error as the search center for the next step.

2. Diamond or square search: Determine the search range and the search pattern. Run the search with
stride lengths from 1 through 64, in multiples of 2. If the optimal point is adjacent to the search center
(stride length 1), perform a 2-point search to check only the two untested points. The point with the
smallest matching error becomes the search center for step 3.

3. Raster search: If the distance between the optimal point obtained from step 2 and the current search center,
called the best distance, is 0, stop the search. Otherwise, if it is greater than the (suitably chosen) value of
iRaster, a raster search is performed, with the value of iRaster used as the stride length of the raster search.

4. Raster/star refinement: Set the optimal point from step 3 as the starting point. Raster refinement
performs an 8-point diamond or square search with the best distance halving at each step and the current
starting point location updated until the best distance is 0. Star refinement is similar to step 2 except that
the optimal point becomes the starting point of every round. The refinement process only starts if the best
distance is greater than zero; when the best distance equals 0, the search stops.
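The following compact C++ skeleton illustrates these four stages in heavily simplified form; `evalCost` stands in for the SAD/RD-cost evaluation, only 4 of HM's 8 diamond points are shown, and the real xTZSearch in Section 5 has many more controls:

#include <climits>
#include <utility>

struct BestPoint { int x = 0, y = 0, dist = 0; int cost = INT_MAX; };

// Try one candidate; remember it (and its distance from the centre) if cheaper.
static void check(BestPoint& b, int x, int y, int d, int (*evalCost)(int, int)) {
    const int c = evalCost(x, y);
    if (c < b.cost) { b.cost = c; b.x = x; b.y = y; b.dist = d; }
}

BestPoint tzSearchSketch(int range, int iRaster, int (*evalCost)(int, int)) {
    BestPoint b;
    check(b, 0, 0, 0, evalCost);  // step 1: start-centre decision (predictors omitted)
    // step 2: expanding diamond search around the start centre
    int cx = b.x, cy = b.y;
    for (int d = 1; d <= range; d *= 2) {
        const std::pair<int,int> pts[4] = {{0,-d},{0,d},{-d,0},{d,0}};
        for (auto [ox, oy] : pts) check(b, cx + ox, cy + oy, d, evalCost);
    }
    // step 3: raster scan if the best point is still far from its centre
    if (b.dist > iRaster)
        for (int y = -range; y <= range; y += iRaster)
            for (int x = -range; x <= range; x += iRaster)
                check(b, x, y, iRaster, evalCost);
    // step 4: star refinement until a round brings no improvement (dist stays 0)
    while (b.dist > 0) {
        cx = b.x; cy = b.y; b.dist = 0;
        for (int d = 1; d <= range; d *= 2) {
            const std::pair<int,int> pts[4] = {{0,-d},{0,d},{-d,0},{d,0}};
            for (auto [ox, oy] : pts) check(b, cx + ox, cy + oy, d, evalCost);
        }
    }
    return b;
}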
Some of the commonly used TZSearch patterns are shown in the Figure 18.
Figure 18. TZSearch patterns
4.3.1 Diamond Search Algorithm
Figure 19. Diamond search scenario for ME [28][29]
The diamond search algorithm employs two search patterns:
1. The large diamond search pattern (LDSP) comprises nine checking points, of which eight surround the
center point to compose a diamond shape.
2. The small diamond search pattern (SDSP) consists of five checking points forming a smaller diamond
shape. The checking-point offsets of both patterns are listed below.
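The offsets (x, y) of the two patterns relative to the search centre are simply:

// Checking-point offsets of the two diamond patterns (relative to the centre).
static const int LDSP[9][2] = { {0,0}, {0,-2}, {0,2}, {-2,0}, {2,0},
                                {-1,-1}, {-1,1}, {1,-1}, {1,1} };      // 9-point large diamond
static const int SDSP[5][2] = { {0,0}, {0,-1}, {0,1}, {-1,0}, {1,0} }; // 5-point small diamond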
5. Proposed Algorithm [16]
5.1 Introduction [16]
The TZSearch algorithm was adopted in the High Efficiency Video Coding reference software HM as a fast
motion estimation (ME) algorithm for its excellent performance in reducing ME time while maintaining
comparable rate distortion (RD) performance. However, the multiple initial search point decision and the
hybrid block matching search contribute a relatively high computational complexity to TZSearch. Based on
a statistical analysis of the probability of the median predictor being selected as the final best point in large
coding units (CUs) (64×64, 32×32) and small CUs (16×16, 8×8), as well as the center-biased characteristic
of the final best search point in the ME process, two early terminations for TZSearch are proposed.
5.2 Median Predictors [16]
The flexible block size representation makes motion estimation the largest contributor to encoding time in the
HEVC encoder. The search procedure of TZSearch includes two steps: initial search point decision and block
matching search. The first step determines the initial search point by using a set of predictors which
includes the Median Predictor (MP) [10], Left Predictor (LP), Above Predictor (AP), Above-Right Predictor
(ARP) and (0, 0). LP, AP and ARP correspond to the motion vectors (MVs) of the left, top and top-right blocks
of the current block, respectively. After the initial search point is determined, the hybrid block matching
search, including multiple diamond/square searches and a raster search, is used to locate the best matching
block, i.e. the one with the minimum RD cost. However, the computational complexity of the multiple initial
search point decision and the hybrid block matching search is still relatively high. If these two processes can
be simplified, much more encoding time can be saved.
Figure 20. Median Predictors [30]
In video sequences there are a large number of blocks with static or quite slow motion activity. Such blocks
have the highest probability of selecting the MP as the final best point in the ME process. From
experimental results it can be seen that approximately 62.26% of the final best search points are median
predictors. Table 1 shows the probability of selecting the median predictor as the final best search point for
different video sequences and different QP values.
Sequence     QP 24     QP 28     QP 32     QP 36     Average
BQMall       58.74     60.99     60.66     62.70     60.77
Johnny       61.44     64.15     67.35     70.84     65.94
Mobisode2    65.55     68.15     70.55     70.24     68.62
ParkScene    42.22     46.84     45.24     49.66     45.99
Average      56.9875   60.0325   60.95     63.36     60.33

Table 1. Probability of selecting the median predictor as the final best search point, unit (%).
Code used: The code below is taken from HM 16.0. This part of the code determines the initial
best point from which TZSearch starts.
Void TEncSearch::xTZSearch( const TComDataCU* const  pcCU,
                            const TComPattern* const pcPatternKey,
                            const Pel* const         piRefY,
                            const Int                iRefStride,
                            const TComMv* const      pcMvSrchRngLT,
                            const TComMv* const      pcMvSrchRngRB,
                            TComMv&                  rcMv,
                            Distortion&              ruiSAD,
                            const TComMv* const      pIntegerMv2Nx2NPred,
                            const Bool               bExtendedSettings)
{
const Bool bUseAdaptiveRaster = bExtendedSettings;
const Int iRaster = 5;
const Bool bTestOtherPredictedMV = bExtendedSettings;
const Bool bTestZeroVector = true;
const Bool bTestZeroVectorStart = bExtendedSettings;
const Bool bTestZeroVectorStop = false;
const Bool bFirstSearchDiamond = true; // 1 = xTZ8PointDiamondSearch 0 = xTZ8PointSquareSearch
const Bool bFirstCornersForDiamondDist1 = bExtendedSettings;
const Bool bFirstSearchStop = m_pcEncCfg->getFastMEAssumingSmootherMVEnabled();
const UInt uiFirstSearchRounds = 3; // first search stop X rounds after best match (must be >=1)
const Bool bEnableRasterSearch = true;
const Bool bAlwaysRasterSearch = bExtendedSettings; // true: BETTER but factor 2 slower
const Bool bRasterRefinementEnable = false; // enable either raster refinement or star refinement
const Bool bRasterRefinementDiamond = false; // 1 = xTZ8PointDiamondSearch 0 = xTZ8PointSquareSearch
const Bool bRasterRefinementCornersForDiamondDist1 = bExtendedSettings;
const Bool bStarRefinementEnable = true; // enable either star refinement or raster refinement
const Bool bStarRefinementDiamond = true; // 1 = xTZ8PointDiamondSearch 0 = xTZ8PointSquareSearch
const Bool bStarRefinementCornersForDiamondDist1 = bExtendedSettings;
const Bool bStarRefinementStop = false;
const UInt uiStarRefinementRounds = 2; // star refinement stop X rounds after best match (must be >=1)
const Bool bNewZeroNeighbourhoodTest = bExtendedSettings;
UInt uiSearchRange = m_iSearchRange;
pcCU->clipMv(rcMv);
#if ME_ENABLE_ROUNDING_OF_MVS
rcMv.divideByPowerOf2(2);
#else
rcMv >>= 2;
#endif
// init TZSearchStruct
IntTZSearchStruct cStruct;
cStruct.iYStride = iRefStride;
cStruct.piRefY = piRefY;
cStruct.uiBestSad = MAX_UINT;
// set rcMv (Median predictor) as start point and as best point
xTZSearchHelp(pcPatternKey, cStruct, rcMv.getHor(), rcMv.getVer(), 0, 0);
// test whether one of PRED_A, PRED_B, PRED_C MV is better start point than Median predictor
if (bTestOtherPredictedMV)
{
for (UInt index = 0; index < NUM_MV_PREDICTORS; index++)
{
TComMv cMv = m_acMvPredictors[index];
pcCU->clipMv(cMv);
#if ME_ENABLE_ROUNDING_OF_MVS
cMv.divideByPowerOf2(2);
#else
cMv >>= 2;
#endif
if (cMv != rcMv && (cMv.getHor() != cStruct.iBestX && cMv.getVer() != cStruct.iBestY))
{
// only test cMV if not obviously previously tested.
xTZSearchHelp(pcPatternKey, cStruct, cMv.getHor(), cMv.getVer(), 0, 0);
}
}
}
// test whether zero Mv is better start point than Median predictor
if (bTestZeroVector)
{
if ((rcMv.getHor() != 0 || rcMv.getVer() != 0) &&
(0 != cStruct.iBestX || 0 != cStruct.iBestY))
{
// only test 0-vector if not obviously previously tested.
xTZSearchHelp(pcPatternKey, cStruct, 0, 0, 0, 0);
}
}
Int iSrchRngHorLeft   = pcMvSrchRngLT->getHor();
Int iSrchRngHorRight  = pcMvSrchRngRB->getHor();
Int iSrchRngVerTop    = pcMvSrchRngLT->getVer();
Int iSrchRngVerBottom = pcMvSrchRngRB->getVer();
if (pIntegerMv2Nx2NPred != 0)
{
TComMv integerMv2Nx2NPred = *pIntegerMv2Nx2NPred;
integerMv2Nx2NPred <<= 2;
pcCU->clipMv(integerMv2Nx2NPred);
#if ME_ENABLE_ROUNDING_OF_MVS
integerMv2Nx2NPred.divideByPowerOf2(2);
#else
integerMv2Nx2NPred >>= 2;
#endif
if ((rcMv != integerMv2Nx2NPred) &&
(integerMv2Nx2NPred.getHor() != cStruct.iBestX || integerMv2Nx2NPred.getVer() != cStruct.iBestY))
{
// only test integerMv2Nx2NPred if not obviously previously tested.
xTZSearchHelp(pcPatternKey, cStruct, integerMv2Nx2NPred.getHor(), integerMv2Nx2NPred.getVer(), 0, 0);
}
// reset search range
TComMv cMvSrchRngLT;
TComMv cMvSrchRngRB;
Int iSrchRng = m_iSearchRange;
TComMv currBestMv(cStruct.iBestX, cStruct.iBestY);
currBestMv <<= 2;
xSetSearchRange(pcCU, currBestMv, iSrchRng, cMvSrchRngLT, cMvSrchRngRB);
iSrchRngHorLeft = cMvSrchRngLT.getHor();
iSrchRngHorRight = cMvSrchRngRB.getHor();
iSrchRngVerTop = cMvSrchRngLT.getVer();
iSrchRngVerBottom = cMvSrchRngRB.getVer();
}
// start search
Int iDist = 0;
Int iStartX = cStruct.iBestX;
Int iStartY = cStruct.iBestY;
const Bool bBestCandidateZero = (cStruct.iBestX == 0) && (cStruct.iBestY == 0);
// first search around best position up to now.
// The following works as a "subsampled/log" window search around the best candidate
for (iDist = 1; iDist <= (Int)uiSearchRange; iDist *= 2)
{
if (bFirstSearchDiamond == 1)
{
xTZ8PointDiamondSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB, iStartX, iStartY, iDist,
bFirstCornersForDiamondDist1);
}
else
{
xTZ8PointSquareSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB, iStartX, iStartY, iDist);
}
if (bFirstSearchStop && (cStruct.uiBestRound >= uiFirstSearchRounds)) // stop criterion
{
break;
}
}
if (!bNewZeroNeighbourhoodTest)
{
// test whether zero Mv is a better start point than Median predictor
if (bTestZeroVectorStart && ((cStruct.iBestX != 0) || (cStruct.iBestY != 0)))
{
xTZSearchHelp(pcPatternKey, cStruct, 0, 0, 0, 0);
if ((cStruct.iBestX == 0) && (cStruct.iBestY == 0))
{
// test its neighborhood
for (iDist = 1; iDist <= (Int)uiSearchRange; iDist *= 2)
{
xTZ8PointDiamondSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB, 0, 0, iDist, false);
if (bTestZeroVectorStop && (cStruct.uiBestRound > 0)) // stop criterion
{
break;
}
}
}
}
}
else
{
// Test also zero neighbourhood but with half the range
// It was reported that the original (above) search scheme using bTestZeroVectorStart did not
// make sense since one would have already checked the zero candidate earlier
// and thus the conditions for that test would have not been satisfied
if (bTestZeroVectorStart == true && bBestCandidateZero != true)
{
for (iDist = 1; iDist <= ((Int)uiSearchRange >> 1); iDist *= 2)
{
xTZ8PointDiamondSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB, 0, 0, iDist, false);
if (bTestZeroVectorStop && (cStruct.uiBestRound > 2)) // stop criterion
{
break;
}
}
}
}
// calculate only 2 missing points instead 8 points if cStruct.uiBestDistance == 1
if (cStruct.uiBestDistance == 1)
{
cStruct.uiBestDistance = 0;
xTZ2PointSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB);
}
// raster search if distance is too big
if (bUseAdaptiveRaster)
{
int iWindowSize = iRaster;
Int iSrchRngRasterLeft = iSrchRngHorLeft;
Int iSrchRngRasterRight = iSrchRngHorRight;
Int iSrchRngRasterTop = iSrchRngVerTop;
Int iSrchRngRasterBottom = iSrchRngVerBottom;
if (!(bEnableRasterSearch && (((Int)(cStruct.uiBestDistance) > iRaster))))
{
iWindowSize++;
iSrchRngRasterLeft /= 2;
iSrchRngRasterRight /= 2;
iSrchRngRasterTop /= 2;
iSrchRngRasterBottom /= 2;
}
cStruct.uiBestDistance = iWindowSize;
for (iStartY = iSrchRngRasterTop; iStartY <= iSrchRngRasterBottom; iStartY += iWindowSize)
{
for (iStartX = iSrchRngRasterLeft; iStartX <= iSrchRngRasterRight; iStartX += iWindowSize)
{
xTZSearchHelp(pcPatternKey, cStruct, iStartX, iStartY, 0, iWindowSize);
}
}
}
else
{
if (bEnableRasterSearch && (((Int)(cStruct.uiBestDistance) > iRaster) || bAlwaysRasterSearch))
{
cStruct.uiBestDistance = iRaster;
for (iStartY = iSrchRngVerTop; iStartY <= iSrchRngVerBottom; iStartY += iRaster)
{
for (iStartX = iSrchRngHorLeft; iStartX <= iSrchRngHorRight; iStartX += iRaster)
{
xTZSearchHelp(pcPatternKey, cStruct, iStartX, iStartY, 0, iRaster);
}
}
}
}
// raster refinement
if (bRasterRefinementEnable && cStruct.uiBestDistance > 0)
{
while (cStruct.uiBestDistance > 0)
{
iStartX = cStruct.iBestX;
iStartY = cStruct.iBestY;
if (cStruct.uiBestDistance > 1)
{
iDist = cStruct.uiBestDistance >>= 1;
if (bRasterRefinementDiamond == 1)
{
xTZ8PointDiamondSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB, iStartX, iStartY, iDist,
bRasterRefinementCornersForDiamondDist1);
}
else
{
xTZ8PointSquareSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB, iStartX, iStartY, iDist);
}
}
// calculate only 2 missing points instead 8 points if cStruct.uiBestDistance == 1
if (cStruct.uiBestDistance == 1)
{
cStruct.uiBestDistance = 0;
if (cStruct.ucPointNr != 0)
{
xTZ2PointSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB);
}
}
}
}
// star refinement
if (bStarRefinementEnable && cStruct.uiBestDistance > 0)
{
while (cStruct.uiBestDistance > 0)
{
iStartX = cStruct.iBestX;
iStartY = cStruct.iBestY;
cStruct.uiBestDistance = 0;
cStruct.ucPointNr = 0;
for (iDist = 1; iDist < (Int)uiSearchRange + 1; iDist *= 2)
{
if (bStarRefinementDiamond == 1)
{
xTZ8PointDiamondSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB, iStartX, iStartY, iDist,
bStarRefinementCornersForDiamondDist1);
}
else
{
xTZ8PointSquareSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB, iStartX, iStartY, iDist);
}
if (bStarRefinementStop && (cStruct.uiBestRound >= uiStarRefinementRounds)) // stop criterion
{
break;
}
}
// calculate only 2 missing points instead 8 points if cStruct.uiBestDistance == 1
if (cStruct.uiBestDistance == 1)
{
cStruct.uiBestDistance = 0;
if (cStruct.ucPointNr != 0)
{
xTZ2PointSearch(pcPatternKey, cStruct, pcMvSrchRngLT, pcMvSrchRngRB);
}
}
}
}
// write out best match
rcMv.set(cStruct.iBestX, cStruct.iBestY);
static unsigned long int top = 0, topright = 0, topleft = 0, down = 0, downleft = 0, downright = 0, left = 0, right = 0; // static variables declared to count all the prediction directions
if (cStruct.iBestX <= -1)
{
if (cStruct.iBestY == 0)
{
cout << "Left-";
left++;
cout << left << endl;
}
}
if (cStruct.iBestX == 0)
{
if (cStruct.iBestY >= 1)
{
cout << "TOP-";
top++;
cout << top << endl;
}
}
if (cStruct.iBestX >= 1)
{
if (cStruct.iBestY >= 1)
{
cout << "TOP_ABOVE_RIGHT-";
topright++;
cout << topright << endl;
}
}
/*if (cStruct.iBestX ==0)
{
if (cStruct.iBestY == 0)
{
cout <<"Center" << endl;
}
}*/
if (cStruct.iBestX == 0)
{
if (cStruct.iBestY <= -1)
{
cout << "Down-";
down++;
cout << down << endl;
}
}
if (cStruct.iBestX >= 1)
{
if (cStruct.iBestY <= -1)
{
cout << "Down_Right-";
downright++;
cout << downright << endl;
}
}
if (cStruct.iBestX <= -1)
{
if (cStruct.iBestY <= -1)
{
cout << "Down_Left-";
downleft++;
cout << downleft << endl;
}
}
if (cStruct.iBestX <= -1)
{
if (cStruct.iBestY >= 1)
{
cout << "Above_Left-";
topleft++;
cout << topleft<<endl;
}
}
if (cStruct.iBestX >= 1)
{
if (cStruct.iBestY >= 1)
{
cout << "Right-";
right++;
cout << right<<endl;
}
}
ruiSAD = cStruct.uiBestSad - m_pcRdCost->getCostOfVectorWithPredictor(cStruct.iBestX, cStruct.iBestY);
}
In the code above, static variables count how often the final best point lies in each prediction direction
relative to the search center. TZSearch is then terminated early using the algorithm shown in Figure 21.
Figure 21. Algorithm to terminate TZSearch.
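Since Figure 21 is only summarized here, the following is a hypothetical C++ sketch of how such an early-termination check could be wired into xTZSearch, motivated by the Table 1 observation; the structure and the size threshold are assumptions, not the exact algorithm of [16]:

// Hypothetical early-termination check after the start-centre decision: if the
// best initial point is still the median predictor and the CU is one of the
// large sizes where this dominates (Table 1), skip the remaining
// diamond/raster/refinement stages and return the predictor directly.
bool terminateAfterPredictors(int bestX, int bestY, int mpX, int mpY, int cuSize) {
    const bool bestIsMedian = (bestX == mpX) && (bestY == mpY);
    const bool largeCU      = (cuSize >= 32);  // 64x64 and 32x32 CUs (assumed threshold)
    return bestIsMedian && largeCU;
}
// Inside xTZSearch this check would sit right after the predictor evaluation;
// when it fires, the function would write out rcMv and return immediately.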
6. Configuration Profile
6.1 Introduction
A configuration profile is required by the HM software to perform encoding and decoding of the video
sequences based on the parameter values defined in the configuration file. In this project three configuration
profiles are considered, each with different quantization parameter (QP) values.
6.1.1 Low Delay
In this profile, one I-frame is inserted at the beginning of the sequence, followed by B-frames or P-frames.
An I-frame is an intra-coded frame and a B-frame is a bi-predictive frame. In this profile the PSNR is degraded
because there is only one I-frame followed by B- or P-frames.
Figure 22 shows the summary of the low delay profile, where the PSNR of the I-frame is higher than the PSNR
of the B-frames.
Figure 22. Low Delay profile summary. Bit rate in kbps; Y-, U- and V-PSNR in dB; total of 150 frames.
6.1.2 Random Access
In this profile, I-frames are inserted in between B-frames or P-frames, here every 30 B- or P-frames. This
helps to improve the overall PSNR because an I-frame does not use any reference picture for prediction; it is
predicted only from its own picture, so the possibility of error is lower. In B-frames and P-frames, prediction
is based on reference pictures (B-frames and P-frames respectively), so the possibility of error is higher.
In Figure 23 it can be seen that the average PSNR of the random access profile is improved compared to the
low delay profile.
Figure 23. Random Access profile summary. Bit rate in kbps; Y-, U- and V-PSNR in dB; total of 150 frames.
6.1.3 Custom Profile (Random Access Early)
In this profile, the coding unit search is terminated as soon as the best match is found, which saves about 40%
of the encoding time. In Figure 24, the encoding time is reduced by 40% with a negligible decrease in PSNR.
Figure 24. Custom profile summary. Bit rate in kbps; Y-, U- and V-PSNR in dB; total of 150 frames.
7. Test Sequences
Figure 25. A frame from Mobisode2 video sequence. Resolution – 416x240.
Figure 26. A frame from BQMall video sequence. Resolution – 832x480.
Figure 27. A frame from Johnny video sequence. Resolution – 1280x720.
Figure 28. A frame from Park Scene video sequence. Resolution – 1920x1080.
8. Experimental Results
8.1 Test Conditions
Parameter                Value
Frame Rate               30
Total Number of Frames   60
GOP Size                 8
Search Range             64
CU Size / Depth          64 / 4
Inter Frame Interval     32
QP                       24, 28, 32, 36

Table 2. Test conditions.
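For reference, these test conditions correspond roughly to the following HM encoder configuration keys. This is a hedged, partial sketch, not the exact configuration file used; in particular the mapping of the frame-interval row to IntraPeriod is an assumption:

# Partial HM-style encoder configuration matching Table 2 (illustrative).
FrameRate           : 30   # frames per second
FramesToBeEncoded   : 60   # total number of frames
GOPSize             : 8
SearchRange         : 64   # ME search range
MaxCUWidth          : 64   # CU size 64, depth 4
MaxCUHeight         : 64
MaxPartitionDepth   : 4
IntraPeriod         : 32   # assumed mapping of the frame-interval row
QP                  : 24   # repeated for 28, 32 and 36
FastSearch          : 1    # 0: full search, 1: TZSearch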
8.2 Comparison of low delay, random access and random access early for different QP
values and different video sequences.
Video sequence: Mobisode2

PSNR (dB) for different QP values
Profile               QP 24     QP 28     QP 32     QP 36
Low_delay             44.0872   41.8229   39.8071   37.9595
Random_access         44.6688   42.4969   40.5565   38.7318
Random_access_early   44.6429   42.4997   40.5335   38.6617
Table 3. PSNR for the low delay, random access and random access early profiles (Mobisode2).

Bitrate (kbps) for different QP values
Profile               QP 24      QP 28     QP 32     QP 36
Low_delay             154.0520   88.0560   53.4080   34.7000
Random_access         131.4360   77.4760   46.8400   30.4360
Random_access_early   134.7200   78.7480   47.5880   30.3840
Table 4. Bitrate for the low delay, random access and random access early profiles (Mobisode2).

Encoding time (seconds) for different QP values
Profile                   QP 24      QP 28      QP 32      QP 36
Low_delay                 213.243    154.464    179.762    138.741
Random_access             123.502    115.912    104.691    113.527
Random_access_early       78.859     72.575     63.303     66.9806
Encoding time saved (%)   36.14759   37.38785   39.53348   41.00029
Table 5. Encoding time for the low delay, random access and random access early profiles (Mobisode2).

Video sequence: BQMall

PSNR (dB) for different QP values
Profile               QP 24     QP 28     QP 32     QP 36
Low_delay             39.3297   36.9636   34.6412   32.4062
Random_access         39.4750   37.3016   35.0993   32.9294
Random_access_early   39.4804   37.2947   35.0340   32.8468
Table 6. PSNR for the low delay, random access and random access early profiles (BQMall).

Bitrate (kbps) for different QP values
Profile               QP 24       QP 28       QP 32      QP 36
Low_delay             2049.2600   1118.9640   634.7920   371.7720
Random_access         1877.1480   1070.1760   631.3680   383.0400
Random_access_early   1906.9080   1083.3560   632.6000   382.9560
Table 7. Bitrate for the low delay, random access and random access early profiles (BQMall).

Encoding time (seconds) for different QP values
Profile                   QP 24      QP 28      QP 32      QP 36
Low_delay                 1115.699   852.405    767.365    703.206
Random_access             628.543    538.078    475.976    424.812
Random_access_early       392.471    335.318    294.552    254.845
Encoding time saved (%)   37.55861   37.68227   38.11621   40.00993
Table 8. Encoding time for the low delay, random access and random access early profiles (BQMall).

Video sequence: Johnny

PSNR (dB) for different QP values
Profile               QP 24     QP 28     QP 32     QP 36
Low_delay             39.3297   36.9636   34.6412   32.4062
Random_access         43.4299   42.1872   40.7531   39.0332
Random_access_early   43.4189   42.1770   40.7362   39.0224
Table 9. PSNR for the low delay, random access and random access early profiles (Johnny).

Bitrate (kbps) for different QP values
Profile               QP 24       QP 28       QP 32      QP 36
Low_delay             2049.2600   1118.9640   634.7920   371.7720
Random_access         700.2360    346.9480    201.4520   125.3560
Random_access_early   713.6160    348.2760    201.6160   125.0160
Table 10. Bitrate for the low delay, random access and random access early profiles (Johnny).

Encoding time (seconds) for different QP values
Profile                   QP 24      QP 28     QP 32      QP 36
Low_delay                 1115.699   852.405   767.365    703.206
Random_access             900.794    856.603   813.568    872.784
Random_access_early       601.5399   563.924   487.1408   513.1097
Encoding time saved (%)   33.22114   34.1674   40.12292   41.21
Table 11. Encoding time for the low delay, random access and random access early profiles (Johnny).

Video sequence: ParkScene

PSNR (dB) for different QP values
Profile               QP 24     QP 28     QP 32     QP 36
Low_delay             39.4567   37.3035   35.2910   33.4236
Random_access         39.7192   37.7745   35.8569   34.0347
Random_access_early   39.7196   37.7595   35.8355   34.0067
Table 12. PSNR for the low delay, random access and random access early profiles (ParkScene).

Bitrate (kbps) for different QP values
Profile               QP 24      QP 28      QP 32       QP 36
Low_delay             5042.252   4590.547   4266.833    3280.501
Random_access         2871.684   2722.541   2267.504    2324.282
Random_access_early   1782.362   1700.226   1360.5024   1324.68
Table 13. Bitrate for the low delay, random access and random access early profiles (ParkScene).

Encoding time (seconds) for different QP values
Profile                   QP 24      QP 28      QP 32      QP 36
Low_delay                 5042.252   4590.547   4266.833   3280.501
Random_access             2871.684   2722.541   2267.504   2324.282
Random_access_early       1782.362   1700.226   1358.502   1324.68
Encoding time saved (%)   37.93321   37.55003   40.0882    43.00692
Table 14. Encoding time for the low delay, random access and random access early profiles (ParkScene).
Conclusion
From the results shown in Table 1 it can be seen that about 60% of the final best
search points are median predictors. Terminating TZSearch early when the median
predictor is the best point saves about 38% of the encoding time, as can be seen
from Table 3 through Table 14.
REFERENCES
[1] HEVC overview:
http://www.apsipa2013.org/wp-content/uploads/2013/09/Tutorial_8_NextGenerationVideoCoding_Part_2.pdf
[2] D. Marpe et al, “The H.264/MPEG4 advanced video coding standard and its applications”, IEEE
Communications Magazine, Vol. 44, pp. 134-143, Aug. 2006.
[3] B. Bross et al, “High Efficiency Video Coding (HEVC) Text Specification Draft 10”, Document
JCTVC-L1003, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC),Mar.2013.
[4] G.J. Sullivan et al, “Overview of the high efficiency video coding (HEVC) standard”, IEEE Trans.
Circuits and Systems for Video Technology, vol. 22,no.12, pp. 1649 – 1668, Dec 2012.
[5] HEVC white paper - Ateme: http://www.ateme.com/an-introduction-to-uhdtv-and-hevc
[6] G.J. Sullivan et al, “Standardized Extensions of High Efficiency Video Coding (HEVC)”, IEEE Journal
of selected topics in Signal Processing, Vol. 7, No. 6, pp. 1001-1016, Dec. 2013.
[7] HEVC tutorial by I.E.G. Richardson: http://www.vcodex.com/h265.html
[8] C. Fogg, “Suggested figures for the HEVC specification”, ITU-T / ISO-IEC Document: JCTVC
J0292r1, July 2012.
[9] U.S.M. Dayananda, "Study and performance comparison of HEVC and H.264 video codecs", Final project
report, EE Dept., UTA, Arlington, TX, Dec. 2011, available on
http://www.uta.edu/faculty/krrao/dip/Courses/EE5359/index_tem.html
[10] HM Software Manual - https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/
[11] Visual studio: http://www.dreamspark.com
[12] Tortoise SVN: http://tortoisesvn.net/downloads.html
[13] Multimedia processing course website: http://www.uta.edu/faculty/krrao/dip/
[14] C. E. Rhee et al, “A Survey of Fast Mode Decision Algorithms for Inter-Prediction and Their
Applications to High Efficiency Video Coding”, IEEE Transactions on Consumer Electronics, vol 58, no.
4, pp 1375-1383, Dec. 2012.
[15] R. Li, B. Zeng, and M.L. Lio, “A new three-step search algorithm for block motion estimation” IEEE
Trans. Circuits and Systems for Video Technology, vol. 4, no. 4, pp. 438-443,August 1994.
[16] Z. Pan et al, “Early termination for TZSearch in HEVC motion estimation”, IEEE ICASSP 2013, pp.
1389-1392, June 2013.
[17] X. Zhang, S. Wang, S. Ma, “Early termination of coding unit splitting for HEVC”, Signal &
Information Processing Association Annual Summit and Conference. Page(s):1-4, December 2012.
[18] A. Asghar, M. Atiq, R. A. Khan and N. A. Khan, "Motion estimation and inter prediction mode selection
in HEVC", Recent Researches in Telecommunications, Informatics, Electronics and Signal Processing,
pp. 351-357, Dec. 2013.
[19] G. Sullivan et al, “Overview of the high efficiency video coding (HEVC) standard”, IEEE
Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp 1649-1668, December
2012.
[20] HEVC tutorial:
http://www.apsipa2013.org/wp-content/uploads/2013/09/Tutorial_8_NextGenerationVideoCoding_Part_2.pdf
[21] V. Sze, M. Budagavi and G.J. Sullivan (editors), “High Efficiency Video Coding (HEVC) Algorithms
and Architectures” Springer, 2014.
[22] L.C. Manikandan et al, "A new survey on block matching algorithms in video coding", International
Journal of Engineering Research, vol. 3, pp. 121-125, Feb. 2014.
[23] M. Jakubowski and G. Pastuszak, “Block-based motion estimation algorithms-a survey,” Journal of
Opto-Electronics Review, vol. 21, pp 86-102, March 2013.
[24] M. Santamaria and M. Trujillo, "A comparison of block-matching motion estimation algorithms",
Oct. 2012.
[25] L. Xufeng et al, "Fast motion estimation for HEVC", 2014 IEEE International Symposium on Broadband
Multimedia Systems and Broadcasting (BMSB), Dec. 2014.
[26] N. Purnachand et al, "Fast motion estimation algorithm for HEVC", 2012 IEEE International Conference
on Consumer Electronics - Berlin (ICCE-Berlin), Berlin, pp. 34-37, Sept. 2012.
[27] X.-L. Tang et al, "An analysis of TZ search algorithm in JMVC", 2010 International Conference on Green
Circuits and Systems (ICGCS), Shanghai, pp. 516-520, Sept. 2010.
[28] L.N.A. Alves and A. Navarro, "Fast motion estimation algorithm for HEVC", Proc. IEEE International
Conference on Consumer Electronics - Berlin (ICCE-Berlin), pp. 11-14, Germany, Sept. 2012.
[29] M. Jakubowski and G. Pastuszak, “Block-based motion estimation algorithms-a survey,” Journal of
Opto-Electronics Review, vol. 21, pp 86-102, March 2013.
[30] Z. Pan, S. Kwong, L. Xu, Y. Zhang, T. Zhao, “Predictive and distribution-oriented fast motion
estimation for H.264/AVC” Journal of Real-Time Image Processing, vol. 9, page(s): 597 – 607, December
2014.
[31] P. Nalluri et al, “High Speed SAD Architectures for variable block size estimation in HEVC video
coding”, IEEE International Conference on Image Processing (ICIP). Page(s): 1233 – 1237, October 2014.
[32] M. A. B. Ayed et al, "TZ search pattern search improvement for HEVC motion estimation modules",
Advanced Technologies for Signal and Image Processing (ATSIP), pp. 95-99, March 2014.
[33] Introduction to Motion estimation and Motion compensation--->
http://www.cmlab.csie.ntu.edu.tw/cml/dsp/training/coding/motion/me1.html
[34] HM Software Manual - https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/
[35] Visual studio: http://www.dreamspark.com
[36] Tortoise SVN: http://tortoisesvn.net/downloads.html
[37] Tutorials---> N. Ling, “High efficiency video coding and its 3D extension: A research perspective,”
Keynote Speech, IEEE Conference on Industrial Electronics and Applications, Singapore, July 2012.
[38] Tutorials---> X. Wang et al, “Paralleling variable block size motion estimation of HEVC on CPU plus
GPU platform”, IEEE International Conference on Multimedia and Expo workshop, 2013.
[39] Tutorials---> H.R. Tohidpour, et al, “Content adaptive complexity reduction scheme for quality/fidelity
scalable HEVC”, IEEE International Conference on Image Processing, pp. 1744-1748, June 2013.
[40] HEVC tutorial 2014 ISCAS --->
http://www.rle.mit.edu/eems/wp-content/uploads/2014/06/H.265-HEVC-Tutorial-2014-ISCAS.pdf
[41] Video lecture on digital voice and picture communication by Prof. S. Sengupta, Department of
Electronics and Electrical Communication Engineering, IIT Kharagpur:
https://www.youtube.com/watch?v=Tm4C2ZFd3zE
[42] Lecture on video coding standards ->http://nptel.iitm.ac.in
[43] YUV format figures -> http://www.dvxuser.com/V6/showthread.php?336009-Request-4K-10-bit-4-22-internal-for-the-DVX200!/page6.