part2spring2009.pdf

Image Block Transformations
The image block transformations can be broken down into two parts:
the geometric part, which is applied first, and the massic part,
which is applied next.
Geometric part: scales down the 2n × 2n domain block to an n × n
range block by setting each range block pixel value to the average
value of the group of four pixels at the corresponding location in the
domain block. This transformation has a contractivity of 1.
Massic part: includes one or more of the following operations:
• Absorption at gray level g0, where 0 ≤ g0 ≤ 255 for an 8-bit
image. This operation simply sets the value of all the pixels in the
block to some uniform gray level g0. For this operation, s = 0.
• Contrast scaling by α. The value of each pixel in the block is
multiplied by α and the resulting value is clipped to the range (0,
255). For this operation, s = α².
• Luminance shift by Δg, where the value of Δg is added to each
pixel value (−255 ≤ Δg ≤ 255), with the resulting values
clipped to (0, 255). For this operation, s = 1.
• One of the eight isometric transformations: identity (1),
reflections (4), and rotations (3). For these operations, s = 1.
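As an illustration, the geometric averaging step and the massic operations can be sketched as below. This is a minimal sketch, not Jacquin's exact implementation: the function names are invented, and a single rotation parameter stands in for the full set of eight isometries.

```python
import numpy as np

def geometric(domain):
    """Shrink a 2n x 2n domain block to n x n by averaging each
    non-overlapping 2 x 2 group of pixels."""
    return (domain[0::2, 0::2] + domain[0::2, 1::2] +
            domain[1::2, 0::2] + domain[1::2, 1::2]) / 4.0

def massic(block, alpha=1.0, dg=0.0, rotations=0):
    """Contrast scaling by alpha and luminance shift by dg, with the
    result clipped to (0, 255), followed by an isometric rotation."""
    out = np.clip(block * alpha + dg, 0, 255)
    return np.rot90(out, rotations)

# A 2n x 2n (here 4 x 4) domain block mapped to an n x n range block.
domain = np.arange(16, dtype=float).reshape(4, 4)
range_block = massic(geometric(domain), alpha=0.5, dg=50, rotations=1)
```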
151
Transformation Example
The following example represents an overall contractive
transformation consisting of a geometric transformation followed by
a massic transformation: a contrast scaling by α = 0.50, followed by
a luminance shift of Δg = 50, followed by an isometric rotation of 90
degrees.
152
Jacquin’s Fractal Compression Technique
The goal of the encoding algorithm is to find, for each n × n block of
the image, another 2n × 2n block, which when transformed by a
combination of the previously mentioned geometric and massic
transforms, will result in a good approximation of the original block.
The most appropriate domain block indicated by the search is called
a matching block. To this end:
• The image is segmented into n × n range blocks (usually, 8 × 8).
• Each 2n × 2n (e.g., 16 × 16) domain block in the image is
transformed by a combination of geometric and massic
transformations with different parameters until the best
matching block and the best set of transform parameters are
found.
• The parameters of the geometric and massic transforms, along
with the coordinates of the matching domain block are
encoded.
• To improve the reconstructed image quality, each n × n (parent)
block is further divided into four n / 2 × n / 2 (child) blocks. If
distortion between a reconstructed child block and the original
child block is larger than a threshold, the child block is
independently encoded with the results of the child block
encoding superseding that portion of the parent block. If more
than two child blocks need to be independently encoded, the
fractal code for the parent block is discarded and the region is
encoded as a collection of four child blocks.
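The encoder search described above can be sketched as follows. This is a much-simplified illustration: it fits only the contrast scale and luminance shift by least squares, omits the isometries, block classification, and quadtree splitting, and all names and parameters are invented for the example.

```python
import numpy as np

def encode_range_block(image, r0, c0, n=8, step=8):
    """Exhaustive search for the 2n x 2n domain block whose shrunken
    version best matches the n x n range block at (r0, c0).
    Returns (mse, domain_row, domain_col, alpha, dg)."""
    R = image[r0:r0 + n, c0:c0 + n].astype(float)
    H, W = image.shape
    best = None
    for r in range(0, H - 2 * n + 1, step):
        for c in range(0, W - 2 * n + 1, step):
            D = image[r:r + 2 * n, c:c + 2 * n].astype(float)
            # geometric part: average 2 x 2 pixel groups
            S = (D[0::2, 0::2] + D[0::2, 1::2] +
                 D[1::2, 0::2] + D[1::2, 1::2]) / 4.0
            # least-squares fit of R ~ alpha * S + dg (massic part)
            s, x = S.ravel(), R.ravel()
            var = s.var()
            alpha = 0.0 if var == 0 else \
                ((s - s.mean()) * (x - x.mean())).mean() / var
            dg = x.mean() - alpha * s.mean()
            err = ((alpha * S + dg - R) ** 2).mean()
            if best is None or err < best[0]:
                best = (err, r, c, alpha, dg)
    return best
```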
153
Encoding of Fractal Parameters
Domain coordinates: The probability that a domain block is the
best match for a given range block is almost inversely proportional
to the distance between them. Entropy coding of this parameter
can result in significant bit rate reduction.
Contrast scale factor: The distribution of this parameter is also
nonuniform and usually has a spike at its maximum allowable
value, thus benefiting from entropy coding.
Luminance offset: This parameter has an asymmetric distribution,
usually with a peak at zero.
Isometries: The distribution of this parameter is fairly image
dependent, but usually the identity transform is used the most and
the 90-degree rotations are used the least often.
Encoding speed: There are several ways to speed up the encoder
search time. For example, the range and domain blocks can be
classified into edge, shade, and mid-range classes based on their
image content. For instance, when searching for the best match for
an edge block, only the pool of the edge domain blocks needs to be
considered. Also, there is no need to find a domain block for the
shade blocks since they only consist of a constant brightness value.
154
VECTOR QUANTIZATION
IMAGE COMPRESSION
155
Basic VQ Structure
• An image is divided into nonoverlapping blocks or vectors;
each vector X is of size n.
• Each image vector is compared with Nc codevectors or templates
Yi, i = 1,…, Nc, from a codebook, for a best match using
a minimum distortion rule, i.e., choose Yk such that d(X, Yk) ≤
d(X, Yj) for all j = 1,…, Nc.
• Transmit the index k of the codevector, and use this index as
an entry to a look-up table at the receiver to reproduce Yk. This
results in a rate of R = (log2 Nc) / n bits/pixel.
• Example: For a 4 × 4 block size (n = 16), and a codebook with
256 codevectors (Nc = 256), the bit rate is:
R = (log2 256) / 16 = 0.5 bits/pixel.
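The minimum distortion rule and the rate formula can be sketched as below; the tiny identity-matrix codebook is a made-up illustration, not a trained codebook.

```python
import numpy as np

def vq_encode(x, codebook):
    """Return the index k of the codevector minimizing d(X, Yk)
    (squared Euclidean distortion): the minimum distortion rule."""
    d = ((codebook - x) ** 2).sum(axis=1)
    return int(np.argmin(d))

def vq_rate(num_codevectors, vector_size):
    """Bit rate in bits/pixel: R = log2(Nc) / n."""
    return np.log2(num_codevectors) / vector_size

# Toy codebook of four 4-D codevectors (n = 4, Nc = 4).
codebook = np.eye(4)
k = vq_encode(np.array([0.9, 0.1, 0.0, 0.0]), codebook)
reconstruction = codebook[k]   # the decoder's table lookup
```

With Nc = 256 and n = 16, `vq_rate` reproduces the 0.5 bits/pixel of the example above.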
[Figure: Basic VQ block diagram. The encoder minimizes d(X, Yi) over
the codebook Yi, i = 1, …, Nc, and sends the index K over the channel;
the decoder uses K for a table lookup in an identical codebook to
output YK.]
156
Product Codes
For a constant rate R = (log2 Nc) / n, the performance of VQ
improves as the block size n increases. However, Nc = 2^(Rn), which
implies that the codebook size (and the encoder complexity) grows
exponentially with n. This imposes a severe restriction on the
performance of basic VQ.
One solution is to use multiple codebooks with a product
structure. If a vector can be characterized by certain
independent features, a separate codebook may be developed and
used to encode each feature. The final codeword would be the
concatenation of all the different encoder outputs. Product codes
are usually suboptimum, but result in substantial savings in
storage and computation requirements; e.g., if a codebook of size
Nc can be decomposed into two product codebooks of sizes N1
and N2, with Nc = N1 N2, the storage and computation complexity
would be proportional to N1 + N2 rather than Nc. Examples of
VQ schemes which implement a product structure are:
• Mean/Residual VQ (M/RVQ)
• Mean/Reflected-Residual VQ (M/RRVQ)
• Gain/Shape VQ (G/SVQ)
• Interpolative/Residual VQ (I/RVQ)
Sabin and Gray, IEEE Trans. Acoust. Speech Sig. Proc., Vol. ASSP-32, pp. 474, 1984.
157
Gain/Shape VQ (G/SVQ)
In G/SVQ, separate, but interdependent, codebooks are used to code
the shape and gain of the vector. The shape is defined as the
original vector normalized by removal of a gain term, such as the ac
energy of the vector. Given the gain and shape codebooks, the
encoding proceeds as follows: First, a unit-energy shape vector
yi is chosen from the shape codebook to match the input vector
by maximizing the inner product over the codewords. Given the
resulting shape vector, a scalar gain codeword σj is selected so as
to minimize the distortion between the original vector x and the
reproduction vector σj yi.
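A compact sketch of this two-stage selection follows; the shape and gain codebooks here are invented toy values (real G/SVQ codebooks are trained).

```python
import numpy as np

def gsvq_encode(x, shape_codebook, gain_codebook):
    """Gain/Shape VQ sketch: pick the unit-energy shape vector
    maximizing the inner product with x, then the scalar gain
    minimizing || x - gain * shape ||^2."""
    i = int(np.argmax(shape_codebook @ x))
    y = shape_codebook[i]
    errs = [((x - g * y) ** 2).sum() for g in gain_codebook]
    j = int(np.argmin(errs))
    return i, j

shapes = np.array([[1.0, 0.0], [0.0, 1.0],
                   [0.7071, 0.7071]])   # toy unit-energy shapes
gains = np.array([0.5, 1.0, 2.0, 4.0])  # toy gain codebook
i, j = gsvq_encode(np.array([2.1, 1.9]), shapes, gains)
x_hat = gains[j] * shapes[i]            # decoder reproduction
```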
[Figure: G/SVQ encoder and decoder. The encoder maximizes the inner
product x^T yi over the shape codebook yi, i = 1, …, 2^Rc, then
selects the gain index j, j = 1, …, 2^Rg, minimizing the distortion;
the index pair (i, j) is transmitted, and the decoder reconstructs
x̂ = σj yi by ROM lookup.]
Sabin and Gray, IEEE Trans. Acoust. Speech Sig. Proc., Vol. ASSP-32, pp. 474, 1984.
158
Mean/Residual VQ (M/RVQ)
• In M/RVQ, the mean of each image vector is quantized separately
using a scalar quantizer.
• The quantized mean is subtracted to yield a residual (shape)
vector with approximately zero sample mean. This new vector
is quantized using VQ.
• The index of the coded residual vector along with the sample
mean is transmitted to the receiver. A conventional coding
scheme (e.g., DPCM) can be used to encode the block means to
further reduce the bit rate.
• A motivation for M/RVQ is that many image vectors have a
similar shape and differ only in mean level. By removing the
mean, fewer codevectors are required to represent these image
vectors.
Gray, IEEE ASSP Magazine, pp. 4, April 1984.
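The M/RVQ steps above can be sketched as below; the tiny residual codebook and the helper names are invented for the illustration.

```python
import numpy as np

def mrvq_encode(block, residual_codebook):
    """Mean/Residual VQ sketch: scalar-quantize the block mean,
    subtract it, then VQ the (approximately zero-mean) residual."""
    qmean = round(block.mean())   # stand-in for uniform 8-bit quantization
    residual = block - qmean
    d = ((residual_codebook - residual.ravel()) ** 2).sum(axis=1)
    return qmean, int(np.argmin(d))

def mrvq_decode(qmean, k, residual_codebook, shape):
    """Receiver: add the mean back to the looked-up residual vector."""
    return qmean + residual_codebook[k].reshape(shape)
```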
159
M/RVQ Bitrate/Error Table

LENA
                          Included          Excluded
Technique    Bitrate      RMSE     SNR      RMSE     SNR
             (bits/pixel) (0-255)  (dB)     (0-255)  (dB)
M/RVQ lev 0  0.50         11.52    26.90    11.52    26.90
M/RVQ lev 1  0.69          8.62    29.42     8.75    29.29
M/RVQ lev 2  0.87          6.60    31.74     6.64    31.68
M/RVQ lev 3  1.06          5.25    33.72     5.37    33.53
M/RVQ lev 4  1.25          4.27    35.51     4.63    34.82
M/RVQ lev 5  1.44          3.27    37.85     4.24    35.58

BOOTS
                          Included          Excluded
Technique    Bitrate      RMSE     SNR      RMSE     SNR
             (bits/pixel) (0-255)  (dB)     (0-255)  (dB)
M/RVQ lev 0  0.50         19.59    22.29    19.59    22.29
M/RVQ lev 1  0.69         14.70    24.79    14.72    24.77
M/RVQ lev 2  0.87         11.46    26.95    11.61    26.83
M/RVQ lev 3  1.06          9.29    28.77     9.61    28.47
M/RVQ lev 4  1.25          7.40    30.75     8.45    29.59
M/RVQ lev 5  1.44          4.86    34.39     7.85    30.24
• The mean is uniformly quantized to 8 bits. VQ is performed on 4 ×
4 residual blocks. The codebook is of size 2^15 with an oct-tree
structure. The tree has 5 levels with a bit increment of 3/16
bits/pixel/level. The final bitrate is 3/16 × 5 + 8/16 = 1.438 bits/pixel.
Two training sequences are used, each consisting of eight 512 × 512
images. The desired image is included in one set and excluded
from the other.
• SNR is defined as 10 log10(255²/mse) dB.
160
Interpolative/Residual VQ (I/RVQ)
• An approximation (prediction) to the original image is made by
first subsampling it by a factor l (e.g., l = 8) in each direction,
and then expanding it to the original size by bilinear
interpolation.
• A residual image is formed by subtracting the interpolated
image from the original image. This residual image is encoded
using VQ.
• The subsampled image values and the VQ indices of the encoded
residual image are transmitted to the receiver.
• For a residual-image codebook of size NR, VQ blocks of size n, a
subsampling factor l, and an 8-bit representation of each
subsampled value, the bit rate in bits/pixel is

  R = 8 / l² + (log2 NR) / n.

• If the residual image is included in the training set, I/RVQ
usually results in a better quality reconstructed image than
M/RVQ or G/SVQ for the same bit rate.
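The bit-rate formula can be checked numerically; with l = 8, 4 × 4 blocks (n = 16), and NR = 2^15 it reproduces the 1.0625 bits/pixel quoted in the table notes. The function name is invented for the illustration.

```python
import math

def irvq_rate(codebook_size, block_size, subsample_factor, sample_bits=8):
    """I/RVQ bit rate: 8 bits per subsampled value, spread over an
    l x l area, plus log2(N_R)/n bits/pixel for the residual VQ."""
    return (sample_bits / subsample_factor ** 2 +
            math.log2(codebook_size) / block_size)

r = irvq_rate(2 ** 15, 16, 8)   # 8/64 + 15/16 = 1.0625 bits/pixel
```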
Hang and Haskell, IEEE Trans. Communications, Vol. 36, pp. 465, 1988.
161
I/RVQ Bitrate/Error Table

LENA
                          Included          Excluded
Technique    Bitrate      RMSE     SNR      RMSE     SNR
             (bits/pixel) (0-255)  (dB)     (0-255)  (dB)
I/RVQ lev 0  0.12         16.21    23.94    16.21    23.94
I/RVQ lev 1  0.31         10.68    27.56    10.75    27.50
I/RVQ lev 2  0.50          8.05    30.02     8.11    29.95
I/RVQ lev 3  0.69          6.41    31.99     6.53    31.83
I/RVQ lev 4  0.87          5.19    33.83     5.65    33.09
I/RVQ lev 5  1.06          3.96    36.18     5.23    33.77

BOOTS
                          Included          Excluded
Technique    Bitrate      RMSE     SNR      RMSE     SNR
             (bits/pixel) (0-255)  (dB)     (0-255)  (dB)
I/RVQ lev 0  0.12         27.62    19.31    27.62    19.31
I/RVQ lev 1  0.31         18.18    22.94    18.09    22.98
I/RVQ lev 2  0.50         14.09    25.15    14.14    25.12
I/RVQ lev 3  0.69         11.39    27.00    11.69    26.78
I/RVQ lev 4  0.87          9.06    28.99    10.25    27.92
I/RVQ lev 5  1.06          5.99    32.58     9.57    28.52
• The original image is subsampled by 8 in each direction and each
sample is transmitted using 8 bits. VQ is performed on 4 × 4
blocks. The codebook is of size 2^15 with an oct-tree structure. The
tree has 5 levels with a bit increment of 3/16 bits/pixel/level. The
final bitrate is 3/16 × 5 + 8/64 = 1.0625 bits/pixel. Two training
sequences are used, each consisting of eight 512 × 512 residual
images. The desired image is included in one set and excluded
from the other.
• SNR is defined as 10 log10(255²/mse) dB.
162
PROGRESSIVE TRANSMISSION
163
Progressive Transmission (PT)
• The transmission of digital images over low-bandwidth
channels, e.g., telephone lines, requires a significant amount of
time, which is a problem in cases where quick recognition is
important.
• Examples of this scenario are telebrowsing, where an image
data base is searched for a particular image or images, and
teleconferencing, where images need to be sent in near
real time to avoid undesirable delays in the discussion.
• Conventional compression algorithms, such as transform or
predictive coding, decrease transmission time, but generally
require the majority of an image to be reconstructed to achieve
recognition.
• A solution to this problem is Progressive Transmission (PT).
164
Progressive Transmission Techniques
Spatial Hierarchy Techniques:
• Subsampling
• Averaging over blocks
• Knowlton’s technique
Amplitude Hierarchy Techniques:
• Bit-plane encoding
Frequency Domain Techniques:
• Transform coding (DCT, Hadamard, etc.)
• Sub-band coding
Other Techniques:
• Laplacian pyramid
• Tree-structured VQ
165
166
Based on Laplacian Pyramid
1. A set of lowpass filtered images is formed by recursively
filtering (normally with a Gaussian-shaped filter) the original
image and subsampling after each filtering operation. This set
of images is called the Gaussian pyramid.
2. A set of difference images is formed by first expanding each
subsampled image to the next larger size in the pyramid and
then subtracting the adjacent levels in the Gaussian pyramid.
This set is called the Laplacian pyramid. Essentially each
lowpass filtered image is being used as a prediction image for
the adjacent level and each level in the Laplacian pyramid is a
quasi-bandpassed image.
3. The Laplacian pyramid levels are encoded (e.g. using a
DPCM scheme) and transmitted, starting with the lowest
resolution level.
4. The images are reconstructed by expanding and summing the
levels recursively.
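Steps 1-4 can be sketched as follows. To keep the sketch short, a 2 × 2 box filter and pixel replication stand in for the Gaussian filtering and the interpolation normally used; without quantization the reconstruction is exact.

```python
import numpy as np

def reduce(img):
    """One Gaussian-pyramid step (sketch): box filter and subsample."""
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def expand(img):
    """Expand to twice the size by pixel replication (sketch)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    """Build the Gaussian pyramid, then difference adjacent levels."""
    gauss = [img]
    for _ in range(levels - 1):
        gauss.append(reduce(gauss[-1]))
    lap = [gauss[i] - expand(gauss[i + 1]) for i in range(levels - 1)]
    lap.append(gauss[-1])   # top level: the lowest-resolution image itself
    return lap

def reconstruct(lap):
    """Expand and sum the levels recursively, coarsest first."""
    img = lap[-1]
    for diff in reversed(lap[:-1]):
        img = expand(img) + diff
    return img
```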
167
168
Hierarchical Progressive Mode
• In the hierarchical mode, the image is encoded at multiple
resolutions using a pyramid structure. The base of the pyramid
represents the full-resolution image, and as one moves up the
pyramid, the images decrease in spatial resolution by a factor
of two in each horizontal and vertical dimension. This is
particularly suited to a multiuse environment supporting devices
of varying resolutions, e.g.,
—
a high-resolution output writer at 2048 × 2048 pixels,
—
an HDTV video monitor at 1024 × 1024 pixels,
—
a low-resolution TV monitor at 512 × 512 pixels, etc.
• In order to create the lower-resolution images, the size of the
original image needs to be reduced by filtering and
downsampling. The proposed JPEG standard does not specify
the structure of the downsampling filters, leaving the choice to
each implementer.
• In order to reconstruct the high-resolution images, the size of
the low-resolution reconstructed images must be increased by
upsampling. The JPEG standard has specified bilinear interpolation filters which must be used for the upsampling process.
169
170
Sequential Progressive Mode
In sequential progressive transmission, partial image information
is transmitted in stages, and at each stage, an approximation to
the original image is reconstructed at the decoder. The
transmission can be stopped if an intermediate version of the
image is satisfactory or if the image is found to be of no interest.
It is motivated by the need to transmit images over low-bandwidth
channels, e.g., telephone lines, in cases where quick
recognition is important or total transmission time might be
limited. Applications include PACS for medical imaging,
photojournalism, real estate, and military imaging. The JPEG
extended system specifies two methods for the sequential
progressive build-up of images:
• Spectral selection: Where each block of DCT coefficients
is segmented into bands and each band is encoded in a
separate scan.
• Successive approximation: Where each coefficient is first
sent with reduced precision n (specifiable by the user) and
the precision is increased by one bit in each subsequent
scan.
These two procedures may either be used separately, or
combined. Note that in the progressive mode there is the need for
an image-sized buffer memory that can hold all the quantized
DCT coefficients with full accuracy (which could be as much as
3 bits/pixel more than the input image precision).
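The successive-approximation idea can be sketched for a single quantized coefficient magnitude. This illustrates the bit-refinement principle only, not the actual JPEG bitstream syntax; the names and the 8-bit/4-bit split are invented for the example.

```python
def successive_approximation_scans(coeff, total_bits=8, first_precision=4):
    """First scan sends the coefficient at reduced precision (its top
    bits); each subsequent scan appends one refinement bit."""
    scans = []
    shift = total_bits - first_precision
    scans.append(coeff >> shift)            # reduced-precision first pass
    while shift > 0:
        shift -= 1
        scans.append((coeff >> shift) & 1)  # one refinement bit per scan
    return scans

def reconstruct_after(scans, k, total_bits=8, first_precision=4):
    """Decoder's approximation after the first k+1 scans."""
    value, bits = scans[0], first_precision
    for b in scans[1:k + 1]:
        value = (value << 1) | b
        bits += 1
    return value << (total_bits - bits)
```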
171
JPEG Sequential Progressive Build-up¹
¹ G. Wallace, Communications of the ACM, Vol. 34, pp. 30, 1991.
172
Choosing a Compression Algorithm
• Image quality vs. bit-rate
• Implementation complexity: software implementation;
hardware (DSP chip) implementation; ASIC implementation;
memory requirements.
• Encoder/decoder asymmetry
• Constant bit-rate vs. constant quality
• Flexible bit-rate/quality tradeoff
• Robustness to post-processing
• Robustness to input image types: contone, halftone, text;
different levels of input noise; multiple resolutions; etc.
• Nature of artifacts
• Effect of multiple coding
• Channel error tolerance
• Progressive transmission capability
173
VIDEO COMPRESSION
Definition of Video/Image Sequence
An image sequence (or video) is a series of 2-D images that are
sequentially ordered in time.
Image sequences can be acquired by video or motion picture
cameras, or generated by sequentially ordering 2-D still images
as in computer graphics and animation.
174
The Need for Compression
Video must be significantly compressed for efficient storage and
transmission, as well as for efficient data transfer among various
components of a video system.
Examples
• Motion Picture:
One frame of a Super 35 format motion picture may be digitized
(via Telecine equipment) to a 3112 lines by 4096 pels/color, 10
bits/color image.
As a result, 1 sec. of the movie takes ≈ 1 Gbyte!
• HDTV:
A typical progressive scan (non-interlaced) HDTV sequence may
have 720 lines and 1280 pixels with 8 bits per luminance and
chroma channels.
The data rate corresponding to a frame rate of 60 frames/sec is
720 × 1280 × 3 × 60 = 165 Mbytes/sec!
175
Major Components of A Video Compression
System
[Figure: Block diagram of a video compression system. The input video
passes through a Preprocessor and an Encoder (a Compression Module
with a Motion Estimator), then Channel Encoding & Modulation onto the
channel; the receiver performs Channel Decoding & Demodulation,
Decoding, and Postprocessing to produce the output video. Motion
estimation is used at the preprocessor, the compression module, and
the postprocessor.]
176
Components of A Video Compression System
Motion Estimation
• Motion information is utilized at Pre-Processing, Compression,
and also at Post-Processing.
Pre-processing
• Noise suppression (motion-compensated)
• Correction of undesirable motion jitter
• Spatial and temporal subsampling for data reduction (e.g.,
skipping frames)
Video Compression
• Intraframe and Interframe compression are utilized.
Interframe compression uses motion information for motion-compensated
temporal prediction.
Post-processing
• Spatial and temporal interpolation for display:
— Frame interpolation for reconstructing skipped frames,
or for frame-rate conversion.
• Compression artifact reduction
177
MOTION ESTIMATION
178
Motion Estimation
In general, we speak of motion of objects in the 3-D real world. In
this section, we are concerned with the “projected motion” of
3-D objects onto the 2-D plane of an imaging sensor.
By motion estimation, we mean the estimation of the
displacement (or velocity) of image structures from one frame to
another in a time-sequence of 2-D images.
In the literature, this projected motion is referred to as “apparent
motion”, “2-D image motion”, or “optical flow”.
In the following, optical flow estimation, motion estimation, 2-D
motion estimation, and apparent motion estimation have
equivalent meanings.
179
Motion Estimation
[Figure: Pixel P in the frame at time t and its displaced location in
the frame at time t + Δt, with displacement d̂ = [d̂x d̂y]^T.]
The 2-D displacement of the pixel located at point P from the frame
at time t to the frame at time t + Δt.
180
Section Outline
• Fundamental concepts in motion estimation
—
Image intensity conservation: The optical flow constraint
equation
—
Ambiguities resulting from intensity conservation
—
The use of a priori information in motion estimation
• Overview of selected algorithms
—
Block Matching
—
Hierarchical Block Matching
—
A Gradient-Based Algorithm
• Evaluation of the performance of a motion estimation
algorithm
181
Fundamental Concepts in Motion Estimation
Notation
Let us denote the displacement vector at location r in the frame at
time t, describing the motion from the frame at t to the frame at
t + Δt, as
d(r) = [dx(r) dy(r)]^T
where r = [x y]^T denotes the continuous spatial coordinates, and
superscript T denotes vector transposition. Note the dependency of
the displacement vector on the spatial coordinates; for each
location r in the frame at t, a displacement vector can be defined.
[Figure: Location r in the frame at time t and its displacement d(r)
into the frame at time t + Δt.]
182
Fundamental Concepts in Motion Estimation
Image Intensity Conservation: The Optical Flow Constraint
A possible first approach to motion estimation is to assume
that the motion is locally translational, and that the image
intensity is invariant to motion, i.e., it is conserved along motion
trajectories:
an estimate d̂(r) solves
g(r + d(r), t + Δt) = g(r, t)
where g(r, t) denotes the image intensity at location r and time t,
or
an estimate d̂(r) minimizes the displaced frame difference
dfd(r) = g(r + d(r), t + Δt) − g(r, t).
Such estimates are said to satisfy the optical flow constraint.
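For discrete frames, the displaced frame difference can be sketched as below for a single global integer displacement (an illustration only; real motion fields vary from pixel to pixel, and the function name is invented).

```python
import numpy as np

def dfd(frame_t, frame_t1, dx, dy):
    """Displaced frame difference for a global integer displacement
    d = (dx, dy): dfd(r) = g(r + d, t + dt) - g(r, t), evaluated over
    the region where both frames are defined."""
    H, W = frame_t.shape
    r0, r1 = max(0, -dy), H - max(0, dy)
    c0, c1 = max(0, -dx), W - max(0, dx)
    return (frame_t1[r0 + dy:r1 + dy, c0 + dx:c1 + dx]
            - frame_t[r0:r1, c0:c1])

# If the scene moves one pixel to the right, d = (1, 0) makes the DFD zero.
f0 = np.arange(25, dtype=float).reshape(5, 5)
f1 = np.roll(f0, 1, axis=1)
residual = dfd(f0, f1, 1, 0)
```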
183
Fundamental Concepts in Motion Estimation (cont’d.)
Problems with the Optical Flow Constraint
• The effects of noise and occlusions are ignored.
[Figure: The optical flow constraint holds over most of the frame but
does not hold in the uncovered region behind a moving object.]
• Ambiguity: The two unknowns, d̂x and d̂y, cannot be uniquely
determined from a single equation.
[Figure: Point p in the frame at time t and two candidate matching
points a and b in the frame at time t + Δt.]
For the point at location p, the optical flow equation is satisfied
by vectors pointing at locations a and b, resulting in two
different displacement vectors. BUT, an a priori smoothness
constraint would resolve the ambiguity in favor of the vector
pointing at a.
184
Overview of Selected Algorithms
• In the following, we will consider the first two classes,
feature/region matching and gradient-based methods, since the
algorithms that are most commonly used in video image
processing belong to these two classes.
In particular, we will give an overview of the following
algorithms:
—
Block Matching
—
Hierarchical Block Matching
—
A Gradient-Based Algorithm (Lucas-Kanade)
These are the most commonly used practical algorithms in video
processing.
185
A PRIORI INFO AND CONSTRAINTS
Image properties:
-- Image intensity is invariant to motion
-- Image intensity is invariant to motion within a constant
multiplier plus a bias
-- Multispectral (color) bands have common motion fields
-- Image intensity distribution can be modeled via stochastic
models, such as Markov random field models
Models of the motion field:
-- Smooth motion field
-- Directionally smooth motion field
-- Stochastic models, such as Markov random field models
-- Local translational motion
-- Global translational motion
-- Global pan, zoom, rotation
-- Regional affine transform model
-- Regional perspective transform model
3-D objects and 3-D-to-2-D projection:
-- Rigid object
-- No occlusion
-- Planar (or higher order) surface
-- Orthographic projection
-- Perspective projection
Motion estimation algorithms use at least one of these types of
a priori information.
186
Block Matching
Basic Principles
The basic principles of motion estimation by block matching are
the following:
• Locally translational motion and rigid body assumptions are
made.
• To determine the displacement of a particular pixel p in the frame
at time t, a block of pixels centered at p is considered. The
frame at time t + Δt is searched for the best matching block of
the same size. In the matching process, it is assumed that all
pixels belonging to the block are displaced by the same
amount (coherence/smoothness).
• Matching is performed by either maximizing the cross
correlation function or minimizing an error criterion
(conservation of image intensity).
The most commonly used error criteria are the mean square
error (MSE) and the mean absolute difference (MAD).
• The search for the blocks is performed following a well-defined
procedure.
187
Block Matching
Illustration of The Basic Idea
[Figure: A block centered at (xp, yp) in the frame at time t and its
best matching block (minimum error or maximum correlation) in the
frame at time t + Δt, defining d̂ = [d̂x d̂y]^T.]
188
Block Matching
Issues of Importance
1. Matching criterion
2. Search procedure
3. Block size
4. Spatial resolution of the displacement field
(Do we obtain an estimate for every pixel location? Every
other pixel location? etc.)
5. Amplitude resolution of the displacement field
(integer versus real-valued displacement vectors)
189
Block Matching
Matching Criteria
For an N × N block B,
MSE:
  MSE(m′, n′) = (1/N²) Σ_{(m,n)∈B} [g(m + m′, n + n′, t + Δt) − g(m, n, t)]²
MAD:
  MAD(m′, n′) = (1/N²) Σ_{(m,n)∈B} |g(m + m′, n + n′, t + Δt) − g(m, n, t)|
where −d_m^max ≤ m′ ≤ d_m^max, −d_n^max ≤ n′ ≤ d_n^max, and
[d_m^max d_n^max] is the maximum allowable (absolute) displacement.
In both cases, the displacement vector estimate [d̂m d̂n]^T is
defined as [d̂m d̂n]^T = [m′ n′]^T, where [m′ n′]^T
minimizes the appropriate criterion.
We use discrete spatial coordinates due to the discrete nature of block matching.
190
Example (for a figure with reference block X, with pixels ai, and
candidate blocks Y and Z, with pixels bi and ci, respectively):
MAD1 = 1/9 [ |a1 − b1| + |a2 − b2| + … + |a9 − b9| ]
MAD2 = 1/9 [ |a1 − c1| + |a2 − c2| + … + |a9 − c9| ]
If MAD2 < MAD1, then X is displaced to location Z with displacement (3, 1).
If MAD1 < MAD2, then X is displaced to location Y with displacement (1, 3).
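The full-search MAD procedure can be sketched as below; the function and parameter names are invented for the illustration.

```python
import numpy as np

def block_match(prev, curr, r0, c0, N=8, dmax=7):
    """Full-search block matching: find the integer displacement
    (dm, dn) within +/- dmax minimizing the MAD between the N x N
    block at (r0, c0) in the frame at time t and the displaced block
    in the frame at time t + dt. Returns (MAD, dm, dn)."""
    ref = prev[r0:r0 + N, c0:c0 + N].astype(float)
    H, W = curr.shape
    best = (float("inf"), 0, 0)
    for dm in range(-dmax, dmax + 1):
        for dn in range(-dmax, dmax + 1):
            r, c = r0 + dm, c0 + dn
            if r < 0 or c < 0 or r + N > H or c + N > W:
                continue   # candidate block falls outside the frame
            cand = curr[r:r + N, c:c + N].astype(float)
            mad = np.abs(cand - ref).mean()
            if mad < best[0]:
                best = (mad, dm, dn)
    return best
```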
191
Block Matching
Matching Criteria (cont’d.)
Note that block matching utilizes the fundamental constraints of
optical flow and smoothness:
•
Minimizing the sum in MSE and MAD can be viewed as
imposing the optical flow constraint (conservation of image
intensity), collectively, on the pixels of the block, and hence
the use of the optical flow constraint in block matching.
•
The very idea of using a block of pixels and assuming a
common displacement for them in the matching process
corresponds to a local smoothness (coherence) constraint on
the displacement vector field.
192
Block Matching
Search Procedures
Full Search: Exhaustive search within a predetermined
maximum displacement range:
For N × N measurement windows (i.e., blocks) and [d_m^max d_n^max]
as the maximum absolute displacement, full search calls for the
evaluation of the matching criterion at
(2 d_m^max + 1) × (2 d_n^max + 1) points.
Suboptimal (non-exhaustive) search procedures are specified to
decrease the search time.² These are particularly useful for
software implementations. Most hardware implementations
utilize full search.
² T. Koga et al., “Motion compensated interframe coding for video
conferencing,” in Proc. Nat. Telecommun. Conf., New Orleans,
pp. G5.3.1–G5.3.5, 1981.
193
Block Matching
Full Search
Full search grid for a block centered at location X in the frame
from which the displacement vectors originate (the frame at time t).
The grid points represent the locations of the centers of the blocks
considered in the matching process (i.e., candidate blocks). This
grid is overlaid on the frame that the displacement vectors
point at (the frame at time t + Δt).
Example: grid points correspond to centers of candidate blocks
considered for a match.
194
Block Matching
The Block Size Issue
1. Small Blocks: It is possible in this case that a match may be
erroneously established between blocks containing similar
gray-level patterns which are otherwise unrelated in the
sense of motion (e.g., imagine the extreme case of a 1 × 1
block = pixel). Therefore, the block size should be sufficiently
large.
[Figure: A small reference block in the frame at time t incorrectly
matched to a similar-looking but unrelated block in the frame at
time t + Δt.]
195
Block Matching
The Block Size Issue (cont’d.): Hierarchical Block Matching
2. Large Blocks: If the motion varies locally (possibly within
the measurement block), for instance due to independently
moving smaller-size structures within the block, it is
obvious that block matching will provide inaccurate motion
estimates when large blocks are used. Therefore, sufficiently
small blocks should be used in this case.
[Figure: A large reference block in the frame at time t containing
independently moving structures, matched to a block in the frame
at time t + Δt.]
Images may contain different amounts and types of motion ⇒
there are conflicting requirements on the size of the
measurement window. Hierarchical block matching aims at
resolving this issue.
196
Hierarchical Block Matching (HBM)
Main Mechanism
• Multiresolution representations of the two frames of interest
are generated. A 3-level representation is shown below.
Rectangular uniform convolution kernels of sizes K1 × K1, K2 × K2,
and K3 × K3 can be used to perform the low-pass filtering. When
K3 ≥ K2 ≥ K1, the 3rd level becomes the lowest resolution level.
• Hierarchical block matching is first applied to the two
frames at their lowest resolution level. From low to high
resolution, the result of each level is passed on to the next
higher resolution level and used as initial estimates of the
displacement vector field. These initial estimates are then updated
by the values estimated at the present level and then passed
on to the next level, and so on.
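The two ideas above (matching at the lowest resolution level first, then propagating and refining the estimate) can be sketched for two levels; the box-filter pyramid and all names are simplifications invented for the example.

```python
import numpy as np

def shrink(img):
    """Low-pass (2 x 2 box) filter and subsample: one pyramid level."""
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def match(prev, curr, r0, c0, N, dmax, init=(0, 0)):
    """MAD full search around an initial displacement estimate."""
    ref = prev[r0:r0 + N, c0:c0 + N]
    H, W = curr.shape
    best = (float("inf"), init[0], init[1])
    for dm in range(init[0] - dmax, init[0] + dmax + 1):
        for dn in range(init[1] - dmax, init[1] + dmax + 1):
            r, c = r0 + dm, c0 + dn
            if 0 <= r and 0 <= c and r + N <= H and c + N <= W:
                mad = np.abs(curr[r:r + N, c:c + N] - ref).mean()
                if mad < best[0]:
                    best = (mad, dm, dn)
    return best

def hbm(prev, curr, r0, c0, N=8, dmax=2, levels=2):
    """Hierarchical sketch: estimate at the lowest resolution level,
    then double the estimate and refine at each finer level."""
    pyr = [(prev, curr)]
    for _ in range(levels - 1):
        pyr.append((shrink(pyr[-1][0]), shrink(pyr[-1][1])))
    dm, dn = 0, 0
    for lvl in range(levels - 1, -1, -1):
        p, c = pyr[lvl]
        s = 2 ** lvl
        _, dm, dn = match(p, c, r0 // s, c0 // s, max(N // s, 2),
                          dmax, (dm, dn))
        if lvl:
            dm, dn = 2 * dm, 2 * dn   # propagate to the finer level
    return dm, dn
```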
197
Hierarchical Block Matching (HBM)
Main Mechanism (cont’d.)
• Larger block sizes are used at lower resolution levels, whereas
smaller block sizes are used at higher resolution levels. The
purpose of the lower resolution levels is to estimate the major
part of the displacement.
It is appropriate to use larger size blocks at lower resolution
levels since the independently moving smaller structures within
the blocks may have been eliminated by low-pass filtering.
198
Hierarchical Block Matching (HBM)
Main Mechanism (cont’d.)
•
At higher resolution levels, the significant sensitivity
associated with a small block size should not cause “incorrect
match problems” since the progression through the hierarchy
provides good initial estimates at higher levels, steering the
matching process to the “right” direction. The purpose of the
higher resolution levels is to fine-tune the estimates obtained at
lower resolution levels.
Also note that
• At lower resolution levels, displacement estimates can be determined at a
sparse set of image points due to the expected lack of significant variations.
• Low-pass filtering in creating the multiresolution representation reduces the
effects of the noise.
199
Hierarchical Block Matching (HBM)
Propagation of Estimates
Propagation of Estimates is illustrated here for three levels.
200