Video Compression
Hoda Roodaki
[email protected]
Video Redundancies
• Spatial
• Neighboring pixels in a frame are statistically related.
• Temporal
• Pixels in consecutive frames are statistically related.
• One can achieve higher compression ratios by exploiting both spatial
and temporal redundancies.
H.261 Standard
• Developed by the CCITT (International Telegraph and Telephone Consultative
Committee, now ITU-T) in 1988–1990
• Designed for videoconferencing, video-telephone applications over
ISDN telephone lines.
• Low bitrates and low delay
• Bit rate is p × 64 kbit/s, where p ranges from 1 to 30.
• Constitutes the basic framework for most current video compression
methods.
H.261 Standard
• Only two picture formats are allowed:
• CIF (Common Intermediate Format) (352x288)
• QCIF (Quarter-CIF) (176x144)
• Color components:
• YUV (or YCbCr)
• Y = 0.257R + 0.504G + 0.098B + 16
• Cb = -0.148R - 0.291G + 0.439B + 128
• Cr = 0.439R - 0.368G - 0.071B + 128
• 4:2:0 chroma format (a conversion sketch follows below)
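As a concrete illustration of the conversion and subsampling above, here is a minimal NumPy sketch; the function names, the QCIF frame size used in the example, and the simple 2×2 averaging used for 4:2:0 subsampling are illustrative choices, not something mandated by H.261.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an HxWx3 uint8 RGB frame to Y, Cb, Cr planes (formulas above)."""
    R = rgb[..., 0].astype(np.float64)
    G = rgb[..., 1].astype(np.float64)
    B = rgb[..., 2].astype(np.float64)
    Y  =  0.257 * R + 0.504 * G + 0.098 * B + 16
    Cb = -0.148 * R - 0.291 * G + 0.439 * B + 128
    Cr =  0.439 * R - 0.368 * G - 0.071 * B + 128
    return Y, Cb, Cr

def subsample_420(plane):
    """4:2:0 chroma subsampling by averaging each 2x2 block (one common choice)."""
    h, w = plane.shape
    return plane.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Example on a random QCIF-sized frame (176x144).
frame = np.random.randint(0, 256, size=(144, 176, 3), dtype=np.uint8)
Y, Cb, Cr = rgb_to_ycbcr(frame)
Cb, Cr = subsample_420(Cb), subsample_420(Cr)
print(Y.shape, Cb.shape, Cr.shape)   # (144, 176) (72, 88) (72, 88)
```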
Inter and Intra coding
• To exploit spatial redundancies within a frame (intra coding):
• 8×8 DCT, similar to JPEG
• To exploit temporal redundancies between frames (inter coding):
• Motion estimation
Frame Types
• Two frame types:
• Intra-frames (I-frames): an I-frame provides a random-access point; it is coded
essentially like a JPEG image.
• Inter-frames (P-frames): a P-frame codes the differences from the previous frame,
so frames depend on each other.
Intra-Frame Coding
• DC coefficients are quantized with a constant step size of 8; all other coefficients
use a single uniform quantizer, i.e., there is no quantization table as in JPEG
(see the sketch below).
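A minimal sketch of such an intra block transform and quantization, using an 8×8 DCT from SciPy; the helper names and the AC step size of 16 are illustrative assumptions (H.261 signals the actual step via its quantizer parameter), while the DC step of 8 follows the rule stated above.

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    """2-D type-II DCT of an 8x8 block (orthonormal)."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

def quantize_intra(coeffs, ac_step=16):
    """DC coefficient: fixed step of 8; all AC coefficients: one uniform step."""
    q = np.round(coeffs / ac_step)
    q[0, 0] = np.round(coeffs[0, 0] / 8)   # DC handled separately
    return q.astype(int)

def dequantize_intra(q, ac_step=16):
    rec = (q * ac_step).astype(float)
    rec[0, 0] = q[0, 0] * 8
    return rec

block = np.random.randint(0, 256, size=(8, 8)).astype(float)
q = quantize_intra(dct2(block))
reconstructed = idct2(dequantize_intra(q))
print(np.abs(block - reconstructed).max())   # only quantization error remains
```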
Blocks, Macroblocks
• In the YCbCr color space with 4:2:0 chroma subsampling, a 16×16 macroblock
consists of:
• 16×16 luma (Y) samples
• one 8×8 block of Cb samples and one 8×8 block of Cr samples (see the sketch below)
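A small sketch of how a 4:2:0 frame could be walked macroblock by macroblock; the generator name and the QCIF plane sizes in the example are assumptions for illustration.

```python
import numpy as np

def macroblocks(Y, Cb, Cr):
    """Yield (16x16 Y block, 8x8 Cb block, 8x8 Cr block) for each macroblock."""
    rows, cols = Y.shape[0] // 16, Y.shape[1] // 16
    for r in range(rows):
        for c in range(cols):
            yield (Y[16*r:16*r+16, 16*c:16*c+16],
                   Cb[8*r:8*r+8, 8*c:8*c+8],
                   Cr[8*r:8*r+8, 8*c:8*c+8])

# QCIF example: 176x144 luma, 88x72 chroma -> 11 x 9 = 99 macroblocks.
Y  = np.zeros((144, 176))
Cb = np.zeros((72, 88))
Cr = np.zeros((72, 88))
print(sum(1 for _ in macroblocks(Y, Cb, Cr)))   # 99
```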
Inter-frame Motion Estimation
• Partition the image into 16x16 blocks
• For each block in the current frame (the target, or predicted, image), search for
the best match in the previous frame (the reference image).
• The best match can be chosen using MAE (mean absolute error) or MSE (mean
squared error); both are sketched below.
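The two matching criteria can be written in a few lines of NumPy; the function names are illustrative, and the 16×16 blocks in the example correspond to the macroblock size used for the search.

```python
import numpy as np

def mae(block_a, block_b):
    """Mean absolute error between two equally sized blocks."""
    return np.mean(np.abs(block_a.astype(float) - block_b.astype(float)))

def mse(block_a, block_b):
    """Mean squared error between two equally sized blocks."""
    return np.mean((block_a.astype(float) - block_b.astype(float)) ** 2)

a = np.random.randint(0, 256, (16, 16))
b = np.random.randint(0, 256, (16, 16))
print(mae(a, b), mse(a, b))
```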
Inter-frame Motion Estimation
• The displacement between the matched block in the reference frame and the
original block in the target frame is called a motion vector (MV).
• For P-frames we encode the motion vectors.
• The P-frame is then reconstructed at the encoder side using these MVs.
• The difference between this reconstructed frame and the original frame, called
the error residual, is intra-coded (a sketch of the prediction and residual follows below).
• Decoded frames, not the originals, must be used as the reference, so that the
encoder's prediction matches the decoder's.
• ME is performed only on the Y component.
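A hedged sketch of forming the motion-compensated prediction and the error residual for one 16×16 block, given a motion vector that has already been found; the function names, the direct array indexing, and the synthetic test frames are illustrative assumptions.

```python
import numpy as np

def predict_block(reference, top, left, mv, size=16):
    """Fetch the matching block from the (decoded) reference frame, displaced by mv."""
    dy, dx = mv
    return reference[top + dy: top + dy + size, left + dx: left + dx + size]

def residual_block(target, reference, top, left, mv, size=16):
    """Error residual = target block minus its motion-compensated prediction."""
    pred = predict_block(reference, top, left, mv, size)
    return target[top:top + size, left:left + size].astype(int) - pred.astype(int)

reference = np.random.randint(0, 256, (144, 176))
target = np.roll(reference, shift=(2, 3), axis=(0, 1))    # fake global motion
res = residual_block(target, reference, top=32, left=48, mv=(-2, -3))
print(np.abs(res).max())   # 0: the prediction is exact for pure translation
```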
MV Encoding
• MV components have integer (full-pel) values
• Each component is coded differentially with respect to the previous MV
• The differences are entropy-coded with a VLC (variable-length code) table
(see the sketch below)
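A minimal sketch of the differential coding step; the real encoder would then map each difference to a codeword from the H.261 VLC table, which is not reproduced here.

```python
def mv_differences(mvs):
    """Differentially code a list of (dy, dx) motion vectors component-wise.

    The first vector is coded as-is; each later one as the difference from its
    predecessor.  Each difference would then be mapped to a VLC codeword.
    """
    diffs, prev = [], (0, 0)
    for dy, dx in mvs:
        diffs.append((dy - prev[0], dx - prev[1]))
        prev = (dy, dx)
    return diffs

print(mv_differences([(1, 3), (1, 4), (0, 4), (-2, 1)]))
# [(1, 3), (0, 1), (-1, 0), (-2, -3)]
```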
H.261 Encoder
• I-frame
H.261 Encoder
• Control: regulates the bit rate.
• If the transmission buffer becomes too full, the bit rate is reduced by increasing
the quantization step size (a rough sketch follows below).
• Memory: stores the reconstructed image (blocks) for the motion vector search of
the next P-frame.
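A very rough sketch of the buffer-based control idea; the thresholds, step adjustments, and function name are purely illustrative assumptions, since H.261 does not mandate a particular rate-control algorithm (only the quantizer range 1–31 is standard).

```python
def adjust_quantizer(quant, buffer_fullness, low=0.25, high=0.75, qmin=1, qmax=31):
    """Raise the quantization step when the buffer fills, lower it when it drains."""
    if buffer_fullness > high:
        quant = min(quant + 2, qmax)   # coarser quantization -> fewer bits
    elif buffer_fullness < low:
        quant = max(quant - 1, qmin)   # finer quantization -> better quality
    return quant

print(adjust_quantizer(10, buffer_fullness=0.9))  # 12
print(adjust_quantizer(10, buffer_fullness=0.1))  # 9
```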
H.261 Encoder
• P-frames
Motion estimation and Motion compensation
• Motion estimation is the process of determining motion vectors that
describe the motion from one 2D image to another; usually from
adjacent frames in a video sequence.
• Motion compensation is the use of the motion estimation
information to achieve compression.
MV Search Methods
• Full Search
• Sequentially search the whole [-p, p] region → exhaustive but very slow
(see the sketch below)
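A straightforward sketch of the exhaustive search over the [-p, p] window using the MAE criterion; the function name and the handling of window positions that fall outside the frame (simply skipped) are illustrative choices.

```python
import numpy as np

def full_search(target, reference, top, left, p=7, size=16):
    """Exhaustively test every displacement in [-p, p] x [-p, p]; return best MV."""
    cur = target[top:top + size, left:left + size].astype(float)
    best_mv, best_cost = (0, 0), np.inf
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > reference.shape[0] or x + size > reference.shape[1]:
                continue                          # candidate outside the frame
            cand = reference[y:y + size, x:x + size].astype(float)
            cost = np.mean(np.abs(cur - cand))    # MAE
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost

reference = np.random.randint(0, 256, (144, 176))
target = np.roll(reference, shift=(3, -2), axis=(0, 1))
print(full_search(target, reference, top=64, left=80))   # ((-3, 2), 0.0)
```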
MV Search Methods
• Two dimensional Logarithmic Search
1. The MAE is computed at nine locations: the centre of the current search region
and eight points spaced half the region's size (initially p/2) around it.
2. Find the one of the nine locations that yields the minimum MAE.
3. Form a new search region with half of the previous size, centred at the location
found in step 2.
4. Repeat steps 1–3 until the spacing shrinks to one pixel; the last winning location
gives the MV (a sketch follows below).
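A sketch of the two-dimensional logarithmic search following the steps above, halving the spacing each round until it reaches one pixel; the helper names, the boundary handling, and the smooth synthetic test image are illustrative assumptions.

```python
import numpy as np

def mae_at(target, reference, top, left, dy, dx, size=16):
    cur = target[top:top + size, left:left + size].astype(float)
    cand = reference[top + dy:top + dy + size, left + dx:left + dx + size].astype(float)
    return np.mean(np.abs(cur - cand)) if cand.shape == cur.shape else np.inf

def log_search(target, reference, top, left, p=8, size=16):
    """Test 9 points spaced `step` apart, recentre on the best, halve the step."""
    cy, cx = 0, 0                        # current centre of the search region
    step = max(p // 2, 1)
    while True:
        candidates = [(cy + sy * step, cx + sx * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        cy, cx = min(candidates,
                     key=lambda mv: mae_at(target, reference, top, left, mv[0], mv[1], size))
        if step == 1:
            return (cy, cx)
        step //= 2

# Smooth test image, so the error surface is well behaved for the coarse steps.
yy, xx = np.mgrid[0:144, 0:176]
reference = np.sin(yy / 7.0) * np.cos(xx / 5.0) * 127 + 128
target = np.roll(reference, shift=(4, -3), axis=(0, 1))
print(log_search(target, reference, top=64, left=80))   # (-4, 3): the imposed shift
```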
Hierarchical Motion Estimation
• A pyramid of successively downsampled copies of the target and reference frames
is built; the search starts at the coarsest level.
• Perform a heuristic search in a small area to locate the best match for this level.
• The vector is propagated to the next level by multiplying both of its coordinates
by the scaling factor (in most cases, 2).
• The process of propagating and refining the vector is repeated until a result for
level 0 (full resolution) is reached, where the process ends (a sketch follows below).
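A compact sketch of the hierarchical scheme, reusing a small full search at every level of a two-times-downsampled pyramid; the 2×2-averaging pyramid filter, the ±2 refinement window, and the synthetic test image are illustrative assumptions.

```python
import numpy as np

def downsample(img):
    """Halve resolution by averaging 2x2 blocks (one simple pyramid filter)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(target, reference, top, left, cy, cx, p, size):
    """Full search in a small window of [-p, p] around the predicted MV (cy, cx)."""
    cur = target[top:top + size, left:left + size]
    best, best_cost = (cy, cx), np.inf
    for dy in range(cy - p, cy + p + 1):
        for dx in range(cx - p, cx + p + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > reference.shape[0] or x + size > reference.shape[1]:
                continue
            cost = np.mean(np.abs(cur - reference[y:y + size, x:x + size]))
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best

def hierarchical_me(target, reference, top, left, levels=3, size=16, p=2):
    # Build the pyramids (level 0 = full resolution, highest level = coarsest).
    tgt, ref = [target], [reference]
    for _ in range(levels - 1):
        tgt.append(downsample(tgt[-1]))
        ref.append(downsample(ref[-1]))
    mv = (0, 0)
    for lvl in range(levels - 1, -1, -1):          # coarsest -> finest
        scale = 2 ** lvl
        mv = search(tgt[lvl], ref[lvl], top // scale, left // scale,
                    mv[0], mv[1], p, max(size // scale, 4))
        if lvl > 0:
            mv = (mv[0] * 2, mv[1] * 2)            # propagate to the next level
    return mv

yy, xx = np.mgrid[0:144, 0:176]
reference = np.sin(yy / 9.0) * np.cos(xx / 6.0) * 127 + 128
target = np.roll(reference, shift=(5, -6), axis=(0, 1))
print(hierarchical_me(target, reference, top=64, left=80))   # (-5, 6): the imposed shift
```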
Intra coded frames (I-frame)
• I-frames are coded without reference to any frame except
themselves.
• May be generated by an encoder to create a random access point (to
allow a decoder to start decoding properly from scratch at that
picture location).
• Typically require more bits to encode than other frame types.
Predicted frames (P-frames)
• P-frame is the name given to forward-predicted pictures.
• The prediction is made from an earlier picture, typically an I-frame, so P-frames
require less coding data (≈50% of the I-frame size).
• The data needed for this prediction consists of
• Motion vectors
• Transform coefficients describing prediction correction
Macroblock Types
• Intra:
• In I-frames (all macroblocks)
• In P-frames, when no good match is found (usually at scene cuts) or for better
error resilience
• Inter:
• In P-frames: has an MV and error-residual DCT coefficients
• Skipped:
• In P-frames: the macroblock is simply copied from the previous frame
Macroblock Inter/intra decision
• The intra error measure of the MB is compared with the inter (motion-compensated)
error measure; the mode with the smaller error is chosen (a small sketch follows below).
• When the two errors are close, inter-frame coding is preferred.
• Error(inter) = Σ(pel − me_pel)² / 256, where me_pel is the motion-compensated
prediction of pixel pel
• Error(intra) = Σ(pel − blk_avg)² / 256, where blk_avg is the mean of the 16×16 block
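A small sketch of this mode decision with the two error measures computed as in the formulas above for a 16×16 macroblock; the plain "pick the smaller" rule is a simplification of the actual decision curve.

```python
import numpy as np

def intra_error(mb):
    """Variance-like measure: mean squared deviation from the block average."""
    mb = mb.astype(float)
    return np.sum((mb - mb.mean()) ** 2) / 256

def inter_error(mb, prediction):
    """Mean squared error of the motion-compensated prediction."""
    return np.sum((mb.astype(float) - prediction.astype(float)) ** 2) / 256

def choose_mode(mb, prediction):
    return 'inter' if inter_error(mb, prediction) <= intra_error(mb) else 'intra'

mb = np.random.randint(0, 256, (16, 16))
good_prediction = mb + np.random.randint(-2, 3, (16, 16))   # small residual
print(choose_mode(mb, good_prediction))                     # 'inter'
```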
H.261 bitstream
• Picture Start Code (PSC)
• TR (Temporal Reference): a timestamp for the picture (used later for audio
synchronization)
• PType: picture type, i.e., is this a P-frame or an I-frame?
• GOB: the picture is divided into regions of 11 × 3 macroblocks called Groups of
Blocks
• Group #: whole groups may be skipped, so a group number is sent
• GQuant: one quantization value may be used for a whole group, so a group
quantization value is sent
• Overall, the bitstream is designed so that data can be skipped whenever possible.
H.261 Quality vs. bitrate
Loop Filtering
• To reduce the blockiness effect at low bit rates (below 6 × 64 kbit/s), loop
filtering is introduced after decoding, inside the prediction loop
• Has a blurring effect
• Should be activated only in blocks with motion
• Is carried out at the macroblock level (a sketch of the filter follows below)
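For reference, the H.261 loop filter is a separable smoothing filter; the sketch below applies a [1, 2, 1]/4 kernel horizontally and then vertically to the interior of an 8×8 block and leaves the edge pixels untouched, which is a simplified rendering of the standard's border handling.

```python
import numpy as np

def loop_filter(block):
    """Separable [1, 2, 1] / 4 smoothing applied to the interior of an 8x8 block."""
    out = block.astype(float)
    # Horizontal pass (interior columns only; edge columns are left unfiltered).
    h = out.copy()
    h[:, 1:-1] = (out[:, :-2] + 2 * out[:, 1:-1] + out[:, 2:]) / 4
    # Vertical pass (interior rows only).
    v = h.copy()
    v[1:-1, :] = (h[:-2, :] + 2 * h[1:-1, :] + h[2:, :]) / 4
    return np.round(v)

block = np.random.randint(0, 256, (8, 8))
print(loop_filter(block))
```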
H.263
• Started around Nov 1993 by ITU-T SG 15
• Target: PSTN and mobile networks, about 10–24 kbit/s
• Adopted in March 1996.
• Compared to H.261
• More picture formats, different GOP structures
• Half-pel motion compensation, no loop filtering
• Performance
• About 3–4 dB better PSNR than H.261 at bit rates below 64 kbit/s
H.263 Picture Formats
Half-pel ME Resolution
• Bilinear interpolation to fill in pixels
a = A
b = (A + B) / 2
c = (A + C) / 2
d = (A + B + C + D) / 4
• The resolution of MVs is half-pixel (see the sketch below)
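A sketch of the bilinear half-pel interpolation defined above: the frame is upsampled by two so that the new positions hold the averages b, c and d; the rounding/truncation details of the standard are glossed over here.

```python
import numpy as np

def half_pel_interpolate(frame):
    """Upsample a frame by 2 so that half-pel positions hold bilinear averages."""
    H, W = frame.shape
    up = np.zeros((2 * H - 1, 2 * W - 1))
    f = frame.astype(float)
    up[0::2, 0::2] = f                                   # a = A (integer positions)
    up[0::2, 1::2] = (f[:, :-1] + f[:, 1:]) / 2          # b = (A + B) / 2
    up[1::2, 0::2] = (f[:-1, :] + f[1:, :]) / 2          # c = (A + C) / 2
    up[1::2, 1::2] = (f[:-1, :-1] + f[:-1, 1:] +
                      f[1:, :-1] + f[1:, 1:]) / 4        # d = (A + B + C + D) / 4
    return up

frame = np.array([[10.0, 20.0],
                  [30.0, 40.0]])
print(half_pel_interpolate(frame))
# [[10. 15. 20.]
#  [20. 25. 30.]
#  [30. 35. 40.]]
```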
Unrestricted motion vector mode (UMV mode)
• In the default prediction mode of H.263, motion vectors must not
reference pixels outside the coded picture area. This restriction need
not hold if the UMV-mode is switched on.
• If a motion vector component points outside the coded picture area,
the edge pixel referenced by the truncated vector component is used
instead.
• The unrestricted motion vector mode is useful when moving objects are entering
or leaving the picture across the frame border; a better prediction candidate can
then be found in the extended border area (see the sketch below).
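A tiny sketch of the edge-pixel behaviour described above: reference coordinates that fall outside the picture are clamped to the picture area, so border pixels are effectively repeated; the per-pixel clamping formulation is an illustrative simplification.

```python
import numpy as np

def umv_fetch(reference, top, left, mv, size=16):
    """Fetch a prediction block, clamping out-of-picture coordinates to the edge."""
    H, W = reference.shape
    rows = np.clip(np.arange(top + mv[0], top + mv[0] + size), 0, H - 1)
    cols = np.clip(np.arange(left + mv[1], left + mv[1] + size), 0, W - 1)
    return reference[np.ix_(rows, cols)]

reference = np.arange(144 * 176).reshape(144, 176)
block = umv_fetch(reference, top=0, left=0, mv=(-3, -5))   # points above-left of the frame
print(block.shape)   # (16, 16): edge rows and columns are repeated
```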
Advanced Prediction Mode
• Four motion vectors for each MB (one for each 8×8 block)
• Use more bits, but give better prediction.
• The encoder decides, for each MB, whether one 16×16 vector or four 8×8 vectors
are used.
PB-Frame (PB) Mode
• A PB-frame consists of two pictures
• P-picture: Predicted from the last decoded picture
• B-picture: Predicted from both the last decoded picture and the P-picture
currently being coded
References
• Mohammed Ghanbari, Video Coding: An Introduction to Standard Codecs, The
Institution of Electrical Engineers, 1999.
• Tsuhan Chen, Multimedia Communications: Coding, Systems, and Networking.