Image Block Transformations

The image block transformations can be broken down into two parts: the geometric part, which is applied first, and the massic part, which is applied next.

Geometric part: scales down the 2n × 2n domain block to an n × n range block by setting each range-block pixel value to the average value of the group of four pixels at the corresponding location in the domain block. This transformation has a contractivity of 1.

Massic part: includes one or more of the following operations:
• Absorption at gray level g0, where 0 <= g0 <= 255 for an 8-bit image. This operation simply sets the value of all the pixels in the block to some uniform gray level g0. For this operation, s = 0.
• Contrast scaling by α: the value of each pixel in the block is multiplied by α and the resulting value is clipped to the range (0, 255). For this operation, s = α^2.
• Luminance shift by Δg, where the value Δg is added to each pixel value (−255 <= Δg <= 255), with the resulting values clipped to (0, 255). For this operation, s = 1.
• One of the eight isometric transformations: identity (1), reflections (4), and rotations (3). For these operations, s = 1.

151

Transformation Example

The following example represents an overall contractive transformation consisting of a geometric transformation, followed by a contrast scaling by α = 0.50, followed by a luminance shift of Δg = 50, followed by an isometric rotation of 90 degrees.

152

Jacquin’s Fractal Compression Technique

The goal of the encoding algorithm is to find, for each n × n block of the image, another 2n × 2n block which, when transformed by a combination of the previously mentioned geometric and massic transforms, results in a good approximation of the original block. The most appropriate domain block indicated by the search is called a matching block. To this end:
• The image is segmented into n × n range blocks (usually 8 × 8).
• Each 2n × 2n (e.g., 16 × 16) domain block in the image is transformed by a combination of geometric and massic transformations with different parameters until the best matching block and the best set of transform parameters are found.
• The parameters of the geometric and massic transforms, along with the coordinates of the matching domain block, are encoded.
• To improve the reconstructed image quality, each n × n (parent) block is further divided into four n/2 × n/2 (child) blocks. If the distortion between a reconstructed child block and the original child block is larger than a threshold, the child block is independently encoded, with the results of the child-block encoding superseding that portion of the parent block. If more than two child blocks need to be independently encoded, the fractal for the parent block is discarded and the region is encoded as a collection of four child blocks.

153

Encoding of Fractal Parameters

Domain coordinates: The probability that a domain block is the best match for a given range block is almost inversely proportional to the distance between them. Entropy coding of this parameter can therefore result in significant bit-rate reduction.

Contrast scale factor: The distribution of this parameter is also nonuniform and usually has a spike at its maximum allowable value, thus benefiting from entropy coding.

Luminance offset: This parameter has an asymmetric distribution, usually with a peak at zero.

Isometries: The distribution of this parameter is fairly image dependent, but usually the identity transform is used the most and the 90-degree rotations are used the least often.

Encoding speed: There are several ways to speed up the encoder search time. For example, the range and domain blocks can be classified into edge, shade, and mid-range classes based on their image content. For instance, when searching for the best match for an edge block, only the pool of edge domain blocks needs to be considered.
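The encoder search described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not Jacquin's exact procedure: the isometries, child-block splitting, classification, and entropy coding are omitted, and a least-squares fit is used as one common way to choose the contrast scale α and luminance shift Δg.

```python
import numpy as np

def geometric_part(domain):
    """Shrink a 2n x 2n domain block to n x n by averaging each
    2 x 2 group of pixels."""
    h, w = domain.shape
    return domain.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def massic_fit(d, r):
    """Least-squares contrast scale (alpha) and luminance shift (dg)
    mapping the shrunken domain block d onto the range block r."""
    dm, rm = d.mean(), r.mean()
    var = ((d - dm) ** 2).sum()
    alpha = ((d - dm) * (r - rm)).sum() / var if var > 0 else 0.0
    dg = rm - alpha * dm
    return alpha, dg

def best_match(image, r0, c0, n=4, step=4):
    """Exhaustive search over domain blocks for the n x n range block
    at (r0, c0); returns (row, col, alpha, dg, mse) of the best match."""
    rng = image[r0:r0 + n, c0:c0 + n].astype(float)
    best = None
    H, W = image.shape
    for i in range(0, H - 2 * n + 1, step):
        for j in range(0, W - 2 * n + 1, step):
            d = geometric_part(image[i:i + 2 * n, j:j + 2 * n].astype(float))
            a, g = massic_fit(d, rng)
            err = ((np.clip(a * d + g, 0, 255) - rng) ** 2).mean()
            if best is None or err < best[-1]:
                best = (i, j, a, g, err)
    return best
```

The stored parameters per range block are exactly those of the slides: the domain coordinates (i, j), the contrast scale, and the luminance offset.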
Also, there is no need to find a domain block for the shade blocks, since they consist only of a constant brightness value.

154

VECTOR QUANTIZATION IMAGE COMPRESSION

155

Basic VQ Structure

• An image is divided into nonoverlapping blocks or vectors; each vector X is of size n.
• Each image vector is compared with Nc codevectors or templates from a codebook Yi, i = 1, …, Nc, for a best match using a minimum-distortion rule, i.e., choose Yk such that d(X, Yk) <= d(X, Yj) for all j = 1, …, Nc.
• Transmit the index k of the codevector, and use this index as an entry to a look-up table at the receiver to reproduce Yk. This results in a rate of R = (log2 Nc)/n bits/pixel.
• Example: For a 4 × 4 block size (n = 16) and a codebook with 256 codevectors (Nc = 256), the bit rate is R = (log2 256)/16 = 0.5 bits/pixel.

[Figure: Basic VQ block diagram — the encoder minimizes d(X, Yi) over the codebook Yi, i = 1, …, Nc, transmits the index K over the channel, and the decoder reproduces YK by table lookup from a copy of the same codebook.]

156

Product Codes

For a constant rate R = (log2 Nc)/n, the performance of VQ improves as the block size n increases. However, Nc = 2^(Rn), which implies that the codebook size (and the encoder complexity) grows exponentially with n. This imposes a severe restriction on the performance of basic VQ.

One solution is to use multiple codebooks with a product structure. If a vector can be characterized by certain independent features, a separate codebook may be developed and used to encode each feature. The final codeword would be the concatenation of all the different encoder outputs. Product codes are usually suboptimum, but result in substantial savings in storage and computation requirements; e.g., if a codebook of size Nc can be decomposed into two product codebooks of sizes N1 and N2, with Nc = N1 N2, the storage and computation complexity would be proportional to N1 + N2 rather than Nc.
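The minimum-distortion rule and the rate formula above can be sketched as follows (hypothetical helper names, NumPy assumed; the codebook design itself, e.g. by generalized Lloyd/LBG training, is outside this sketch):

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Minimum-distortion rule: map each input vector X to the index k
    of the codevector Yk minimizing the squared error d(X, Yk)."""
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_decode(indices, codebook):
    """Receiver side: reproduce Yk by a simple table lookup."""
    return codebook[indices]

def vq_rate(Nc, n):
    """Rate in bits/pixel for a codebook of Nc codevectors of size n."""
    return np.log2(Nc) / n
```

With Nc = 256 and n = 16, `vq_rate` reproduces the 0.5 bits/pixel of the example above.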
Examples of VQ schemes which implement a product structure are:
• Mean/Residual VQ (M/RVQ)
• Mean/Reflected-Residual VQ (M/RRVQ)
• Gain/Shape VQ (G/SVQ)
• Interpolative/Residual VQ (I/RVQ)

Sabin and Gray, IEEE Trans. Acoust. Speech Sig. Proc., Vol. ASSP-32, pp. 474, 1984.

157

Gain/Shape VQ (G/SVQ)

In G/SVQ, separate, but interdependent, codebooks are used to code the shape and gain of the vector. The shape is defined as the original vector normalized by removal of a gain term such as the ac energy of a codevector.

Given the gain and shape codebooks, the encoding proceeds as follows. First, a unit-energy shape vector yi is chosen from the shape codebook to match the input vector by maximizing the inner product over the codewords. Given the resulting shape vector, a scalar gain codeword σj is selected so as to minimize the distortion between the original vector x and the reproduction vector σj yi.

[Figure: G/SVQ block diagram — the encoder first maximizes the inner product x^T yi over the shape codebook yi, i = 1, …, 2^Rc, then minimizes the distortion over the gain codebook σj, j = 1, …, 2^Rg; the decoder forms the reproduction x̂ = σj yi by table (ROM) lookup.]

Sabin and Gray, IEEE Trans. Acoust. Speech Sig. Proc., Vol. ASSP-32, pp. 474, 1984.

158

Mean/Residual VQ (M/RVQ)

• In M/RVQ, the mean of each image vector is quantized separately using a scalar quantizer.
• The quantized mean is subtracted to yield a residual (shape) vector with approximately zero sample mean. This new vector is quantized using VQ.
• The index of the coded residual vector along with the sample mean is transmitted to the receiver. A conventional coding scheme (e.g., DPCM) can be used to encode the block means to further reduce the bit rate.
• A motivation for M/RVQ is that many image vectors have a similar shape and differ only in mean level. By removing the mean, fewer codevectors are required to represent these image vectors.

Gray, IEEE ASSP Magazine, pp. 4, April 1984.
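A minimal M/RVQ sketch along the lines above (hypothetical helpers; the residual codebook is assumed given, and the optional DPCM coding of the means is omitted):

```python
import numpy as np

def mrvq_encode(block, residual_codebook):
    """M/RVQ encoder: scalar-quantize the block mean to 8 bits, subtract
    it, and vector-quantize the (approximately zero-mean) residual."""
    qmean = int(round(float(np.clip(block.mean(), 0, 255))))  # 8-bit scalar
    residual = block.astype(float).ravel() - qmean
    d2 = ((residual_codebook - residual) ** 2).sum(axis=1)
    return qmean, int(d2.argmin())

def mrvq_decode(qmean, index, residual_codebook):
    """Add the transmitted mean back to the looked-up residual vector."""
    n = residual_codebook.shape[1]
    side = int(np.sqrt(n))
    return (qmean + residual_codebook[index]).reshape(side, side)
```

Transmitting (qmean, index) per block is exactly the product structure: one scalar code for the mean feature, one vector code for the shape feature.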
159

M/RVQ Bitrate/Error Table

LENA
Technique     Bitrate        Included              Excluded
              (bits/pixel)   RMSE (0-255) SNR (dB) RMSE (0-255) SNR (dB)
M/RVQ lev 0   0.50           11.52        26.90    11.52        26.90
M/RVQ lev 1   0.69            8.62        29.42     8.75        29.29
M/RVQ lev 2   0.87            6.60        31.74     6.64        31.68
M/RVQ lev 3   1.06            5.25        33.72     5.37        33.53
M/RVQ lev 4   1.25            4.27        35.51     4.63        34.82
M/RVQ lev 5   1.44            3.27        37.85     4.24        35.58

BOOTS
Technique     Bitrate        Included              Excluded
              (bits/pixel)   RMSE (0-255) SNR (dB) RMSE (0-255) SNR (dB)
M/RVQ lev 0   0.50           19.59        22.29    19.59        22.29
M/RVQ lev 1   0.69           14.70        24.79    14.72        24.77
M/RVQ lev 2   0.87           11.46        26.95    11.61        26.83
M/RVQ lev 3   1.06            9.29        28.77     9.61        28.47
M/RVQ lev 4   1.25            7.40        30.75     8.45        29.59
M/RVQ lev 5   1.44            4.86        34.39     7.85        30.24

• The mean is uniformly quantized to 8 bits. VQ is performed on 4 × 4 residual blocks. The codebook is of size 2^15 with an oct-tree structure. The tree has 5 levels with a bit increment of 3/16 bits/pixel/level. The final bitrate is 3/16 × 5 + 8/16 = 1.4375 bits/pixel. Two training sequences are used, each consisting of eight 512 × 512 images. The desired image is included in one set and excluded from the other.
• SNR is defined as 10 log10(255^2/mse) dB.

160

Interpolative/Residual VQ (I/RVQ)

• An approximation (prediction) to the original image is made by first subsampling it by a factor l (e.g., l = 8) in each direction, and then expanding it to the original size by bilinear interpolation.
• A residual image is formed by subtracting the interpolated image from the original image. This residual image is encoded using VQ.
• The subsampled image values and the VQ indices of the encoded residual image are transmitted to the receiver.
• For a residual-image codebook of size NR, VQ blocks of size n, a subsampling factor l, and an 8-bit representation of each subsampled value, the bit rate in bits/pixel is R = 8/l^2 + (log2 NR)/n.
• If the residual image is included in the training set, I/RVQ usually results in a better-quality reconstructed image than M/RVQ or G/SVQ for the same bit rate.
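The I/RVQ rate formula can be checked directly; with l = 8, 4 × 4 blocks (n = 16), and a 2^15-entry residual codebook, it reproduces the 1.0625 bits/pixel final rate quoted with the I/RVQ table:

```python
import math

def irvq_rate(N_R, n, l, mean_bits=8):
    """I/RVQ bit rate: mean_bits per subsampled value (one per l x l
    pixels) plus log2(N_R)/n bits/pixel for the residual VQ indices."""
    return mean_bits / l ** 2 + math.log2(N_R) / n

# e.g. a 2^15 codebook, 4 x 4 blocks, subsampling by 8:
# 8/64 + 15/16 = 0.125 + 0.9375 = 1.0625 bits/pixel
```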
Hang and Haskell, IEEE Trans. Communications, Vol. 36, pp. 465, 1988.

161

I/RVQ Bitrate/Error Table

LENA
Technique     Bitrate        Included              Excluded
              (bits/pixel)   RMSE (0-255) SNR (dB) RMSE (0-255) SNR (dB)
I/RVQ lev 0   0.12           16.21        23.94    16.21        23.94
I/RVQ lev 1   0.31           10.68        27.56    10.75        27.50
I/RVQ lev 2   0.50            8.05        30.02     8.11        29.95
I/RVQ lev 3   0.69            6.41        31.99     6.53        31.83
I/RVQ lev 4   0.87            5.19        33.83     5.65        33.09
I/RVQ lev 5   1.06            3.96        36.18     5.23        33.77

BOOTS
Technique     Bitrate        Included              Excluded
              (bits/pixel)   RMSE (0-255) SNR (dB) RMSE (0-255) SNR (dB)
I/RVQ lev 0   0.12           27.62        19.31    27.62        19.31
I/RVQ lev 1   0.31           18.18        22.94    18.09        22.98
I/RVQ lev 2   0.50           14.09        25.15    14.14        25.12
I/RVQ lev 3   0.69           11.39        27.00    11.69        26.78
I/RVQ lev 4   0.87            9.06        28.99    10.25        27.92
I/RVQ lev 5   1.06            5.99        32.58     9.57        28.52

• The original image is subsampled by 8 in each direction and each sample is transmitted using 8 bits. VQ is performed on 4 × 4 blocks. The codebook is of size 2^15 with an oct-tree structure. The tree has 5 levels with a bit increment of 3/16 bits/pixel/level. The final bitrate is 3/16 × 5 + 8/64 = 1.0625 bits/pixel. Two training sequences are used, each consisting of eight 512 × 512 residual images. The desired image is included in one set and excluded from the other.
• SNR is defined as 10 log10(255^2/mse) dB.

162

PROGRESSIVE TRANSMISSION

163

Progressive Transmission (PT)

• The transmission of digital images over low-bandwidth channels, e.g., telephone lines, requires a significant amount of time in cases where quick recognition is important.
• Examples of this scenario are telebrowsing, where an image database is searched for a particular image or images, and teleconferencing, where images need to be sent in near real time to avoid undesirable delays in the discussion.
• Conventional compression algorithms, such as transform or predictive coding, decrease transmission time, but generally require the majority of an image to be reconstructed to achieve recognition.
• A solution to this problem is Progressive Transmission (PT).

164

Progressive Transmission Techniques

Spatial Hierarchy Techniques:
• Subsampling
• Averaging over blocks
• Knowlton’s technique

Amplitude Hierarchy Techniques:
• Bit-plane encoding

Frequency Domain Techniques:
• Transform coding (DCT, Hadamard, etc.)
• Sub-band coding

Other Techniques:
• Laplacian pyramid
• Tree-structured VQ

165

166

PT Based on the Laplacian Pyramid

1. A set of lowpass-filtered images is formed by recursively filtering the original image (normally with a Gaussian-shaped filter) and subsampling after each filtering operation. This set of images is called the Gaussian pyramid.
2. A set of difference images is formed by first expanding each subsampled image to the next larger size in the pyramid and then subtracting the adjacent levels in the Gaussian pyramid. This set is called the Laplacian pyramid. Essentially, each lowpass-filtered image is used as a prediction image for the adjacent level, and each level in the Laplacian pyramid is a quasi-bandpassed image.
3. The Laplacian pyramid levels are encoded (e.g., using a DPCM scheme) and transmitted, starting with the lowest-resolution level.
4. The images are reconstructed by expanding and summing the levels recursively.

167

168

Hierarchical Progressive Mode

• In the hierarchical mode, the image is encoded at multiple resolutions using a pyramid structure. The base of the pyramid represents the full-resolution image, and as one moves up the pyramid, the images decrease in spatial resolution by a factor of two in each horizontal and vertical dimension. This is particularly suited to a multiuse environment supporting devices of varying resolutions, e.g.,
— a high-resolution output writer at 2048 × 2048 pixels,
— an HDTV video monitor at 1024 × 1024 pixels,
— a low-resolution TV monitor at 512 × 512 pixels, etc.
• In order to create the lower-resolution images, the size of the original image needs to be reduced by filtering and downsampling.
The JPEG proposed standard does not specify the structure of the downsampling filters and has left them to the skill of each implementer.
• In order to reconstruct the high-resolution images, the size of the low-resolution reconstructed images must be increased by upsampling. The JPEG standard has specified bilinear interpolation filters which must be used for the upsampling process.

169

170

Sequential Progressive Mode

In sequential progressive transmission, partial image information is transmitted in stages, and at each stage an approximation to the original image is reconstructed at the decoder. The transmission can be stopped if an intermediate version of the image is satisfactory or if the image is found to be of no interest. It is motivated by the need to transmit images over low-bandwidth channels, e.g., telephone lines, in cases where quick recognition is important or total transmission time might be limited. Applications include PACS for medical imaging, photojournalism, real estate, and military imaging.

The JPEG extended system specifies two methods for the sequential progressive build-up of images:
• Spectral selection: each block of DCT coefficients is segmented into bands, and each band is encoded in a separate scan.
• Successive approximation: each coefficient is first sent with reduced precision (specifiable by the user), and the precision is increased by one bit in each subsequent scan.

These two procedures may be used either separately or combined. Note that in the progressive mode there is the need for an image-sized buffer memory that can hold all the quantized DCT coefficients with full accuracy (which could be as much as 3 bits/pixel more than the input image precision).

171

JPEG Sequential Progressive Build-up¹

¹ G. Wallace, Communications of the ACM, Vol. 34, pp. 30, 1991.

172

Choosing a Compression Algorithm

• Image quality vs.
bit-rate
• Implementation complexity: software implementation; hardware (DSP chip) implementation; ASIC implementation; memory requirements
• Encoder/decoder asymmetry
• Constant bit-rate vs. constant quality
• Flexible bit-rate/quality tradeoff
• Robustness to post-processing
• Robustness to input image types: contone, halftone, text; different levels of input noise; multiple resolutions; etc.
• Nature of artifacts
• Effect of multiple coding
• Channel error tolerance
• Progressive transmission capability

173

VIDEO COMPRESSION

Definition of Video/Image Sequence

An image sequence (or video) is a series of 2-D images that are sequentially ordered in time. Image sequences can be acquired by video or motion picture cameras, or generated by sequentially ordering 2-D still images, as in computer graphics and animation.

174

The Need for Compression

Video must be significantly compressed for efficient storage and transmission, as well as for efficient data transfer among the various components of a video system.

Examples

• Motion Picture: One frame of a Super 35 format motion picture may be digitized (via telecine equipment) to a 3112-line by 4096-pels/color, 10-bits/color image. As a result, 1 sec of the movie takes about 1 Gbyte!

• HDTV: A typical progressive-scan (non-interlaced) HDTV sequence may have 720 lines and 1280 pixels with 8 bits per luminance and chroma channel. The data rate corresponding to a frame rate of 60 frames/sec is 720 × 1280 × 3 × 60 bytes ≈ 166 Mbytes/sec!

175

Major Components of a Video Compression System

[Figure: Block diagram — the input video passes through a Preprocessor to the Encoder (the compression module, which includes a Motion Estimator), then through Channel Encoding & Modulation onto the channel; at the receiver, Channel Decoding & Demodulation feed the Decoder (also using motion information), followed by a Postprocessor that produces the output video.]

176

Components of a Video Compression System

Motion Estimation
• Motion information is utilized in compression, and also in pre- and post-processing.
Pre-processing
• Noise suppression (motion-compensated)
• Correction of undesirable motion jitter
• Spatial and temporal subsampling for data reduction (e.g., skipping frames)

Video Compression
• Intraframe and interframe compression are utilized. Interframe compression uses motion information for motion-compensated temporal prediction.

Post-processing
• Spatial and temporal interpolation for display: frame interpolation for reconstructing skipped frames, or for frame-rate conversion
• Compression artifact reduction

177

MOTION ESTIMATION

178

Motion Estimation

In general, we speak of the motion of objects in the 3-D real world. In this section, we are concerned with the “projected motion” of 3-D objects onto the 2-D plane of an imaging sensor. By motion estimation, we mean the estimation of the displacement (or velocity) of image structures from one frame to another in a time sequence of 2-D images. In the literature, this projected motion is referred to as “apparent motion”, “2-D image motion”, or “optical flow”. In the following, optical flow estimation, motion estimation, 2-D motion estimation, and apparent motion estimation have equivalent meanings.
179

Motion Estimation

[Figure: The 2-D displacement d̂ = [d̂x d̂y]^T of the pixel located at point P from the frame at time t to the frame at time t + Δt.]

180

Section Outline

• Fundamental concepts in motion estimation
— Image intensity conservation: the optical flow constraint equation
— Ambiguities resulting from intensity conservation
— The use of a priori information in motion estimation
• Overview of selected algorithms
— Block Matching
— Hierarchical Block Matching
— A Gradient-Based Algorithm
• Evaluation of the performance of a motion estimation algorithm

181

Fundamental Concepts in Motion Estimation

Notation

Let us denote the displacement vector at location r in the frame at time t, describing the motion from the frame at t to the frame at t + Δt, as

d(r) = [dx(r) dy(r)]^T

where r = [x y]^T denotes the continuous spatial coordinates, and the superscript T denotes vector transposition. Note the dependency of the displacement vector on the spatial coordinates: for each location r in the frame at t, a displacement vector can be defined.

[Figure: The displacement vector d(r) at location r, from the frame at time t to the frame at time t + Δt.]

182

Fundamental Concepts in Motion Estimation

Image Intensity Conservation: The Optical Flow Constraint

A possible first approach to motion estimation is to assume that the motion is locally translational, and that the image intensity is invariant to motion, i.e., it is conserved along motion trajectories: an estimate d̂(r) solves

g(r + d(r), t + Δt) = g(r, t)

where g(r, t) denotes the image intensity at location r at time t; equivalently, an estimate d̂(r) minimizes the displaced frame difference

dfd(r) = g(r + d(r), t + Δt) − g(r, t)

Such estimates are said to satisfy the optical flow constraint.

183

Fundamental Concepts in Motion Estimation (cont’d.)

Problems with the Optical Flow Constraint

• The effects of noise and occlusions are ignored. [Figure: The optical flow constraint holds over most of the frame, but does not hold in uncovered (occlusion) regions.]
• Ambiguity: two unknowns, d̂x and d̂y, cannot be uniquely determined from a single equation.
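Numerically, the displaced frame difference for a candidate integer displacement is just a lookup-and-subtract; a small sketch (hypothetical helper names):

```python
import numpy as np

def dfd(frame_t, frame_t1, r, d):
    """Displaced frame difference g(r + d, t + dt) - g(r, t) at pixel
    r = (row, col) for an integer candidate displacement d = (dy, dx)."""
    (y, x), (dy, dx) = r, d
    return float(frame_t1[y + dy, x + dx]) - float(frame_t[y, x])

# A point translated by (1, 2): the dfd vanishes for the true
# displacement and not for a wrong one.
f0 = np.zeros((6, 6)); f0[2, 2] = 50.0
f1 = np.zeros((6, 6)); f1[3, 4] = 50.0
```

Note that in this example every zero-valued background pixel also yields dfd = 0 for many displacements, which is exactly the ambiguity discussed above.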
[Figure: For the point at location p in the frame at t, the optical flow equation is satisfied by vectors pointing at locations a and b in the frame at t + Δt, resulting in two different displacement vectors. BUT an a priori smoothness constraint would resolve the ambiguity in favor of the vector pointing at a.]

184

Overview of Selected Algorithms

• In the following, we will consider the first two classes, feature/region matching and gradient-based methods, since the algorithms that are most commonly used in video image processing belong to these two classes. In particular, we will give an overview of the following algorithms:
— Block Matching
— Hierarchical Block Matching
— A Gradient-Based Algorithm (Lucas-Kanade)
These are the most commonly used practical algorithms in video processing.

185

A PRIORI INFO AND CONSTRAINTS

Image properties:
-- Image intensity is invariant to motion
-- Image intensity is invariant to motion within a constant multiplier plus a bias
-- Multispectral (color) bands have common motion fields
-- Image intensity distribution can be modeled via stochastic models, such as Markov random field models

Models of the Motion Field:
-- Smooth motion field
-- Directionally smooth motion field
-- Stochastic models, such as Markov random field models
-- Local translational motion
-- Global translational motion
-- Global pan, zoom, rotation
-- Regional affine transform model
-- Regional perspective transform model

3-D Objects and 3-D-to-2-D Projection:
-- Rigid object
-- No occlusion
-- Planar (or higher-order) surface
-- Orthographic projection
-- Perspective projection

Motion estimation algorithms use at least one, or more, of these pieces of a priori information.

186

Block Matching

Basic Principles

The basic principles of motion estimation by block matching are the following:
• Locally translational motion and rigid-body assumptions are made.
• To determine the displacement of a particular pixel p in the frame at time t, a block of pixels centered at p is considered.
The frame at time t + Δt is searched for the best matching block of the same size. In the matching process, it is assumed that all pixels belonging to the block are displaced by the same amount (coherence/smoothness).
• Matching is performed by either maximizing the cross-correlation function or minimizing an error criterion — conservation of image intensity. The most commonly used error criteria are the mean square error (MSE) and the minimum absolute difference (MAD).
• The search for the blocks is performed following a well-defined procedure.

187

Block Matching

Illustration of the Basic Idea

[Figure: The block centered at (xp, yp) in the frame at time t and its best match in the frame at time t + Δt, related by the displacement d̂ = [d̂x d̂y]^T found by minimum error or maximum correlation.]

188

Block Matching

Issues of Importance

1. Matching criterion
2. Search procedure
3. Block size
4. Spatial resolution of the displacement field (Do we obtain an estimate for every pixel location? Every other pixel location? etc.)
5. Amplitude resolution of the displacement field (integer- versus real-valued displacement vectors)

189

Block Matching

Matching Criteria

For an N × N block B and a candidate displacement (m̃, ñ):

MSE(m̃, ñ) = (1/N^2) Σ_{(m,n)∈B} [g(m + m̃, n + ñ, t + Δt) − g(m, n, t)]^2

MAD(m̃, ñ) = (1/N^2) Σ_{(m,n)∈B} |g(m + m̃, n + ñ, t + Δt) − g(m, n, t)|

where −dm_max <= m̃ <= dm_max and −dn_max <= ñ <= dn_max, and [dm_max dn_max] is the maximum allowable (absolute) displacement. In both cases, the displacement vector estimate is defined as [d̂m d̂n]^T = [m̃* ñ*]^T, where [m̃* ñ*]^T minimizes the appropriate criterion. We use discrete spatial coordinates due to the discrete nature of block matching.

190

Example: For a 3 × 3 block with pixels a1, …, a9 in the frame at t, compared against the candidate blocks b1, …, b9 (centered at Y) and c1, …, c9 (centered at Z) in the frame at t + Δt:

MAD1 = (1/9) [ |a1 − b1| + |a2 − b2| + … + |a9 − b9| ]
MAD2 = (1/9) [ |a1 − c1| + |a2 − c2| + … + |a9 − c9| ]

If MAD2 < MAD1, then X is displaced to location Z with (3, 1).
If MAD1 < MAD2, then X is displaced to location Y with (1, 3).

191

Block Matching

Matching Criteria (cont’d.)
Note that block matching utilizes the fundamental constraints of optical flow and smoothness:
• Minimizing the sum in MSE or MAD can be viewed as imposing the optical flow constraint (conservation of image intensity) collectively on the pixels of the block; hence the use of the optical flow constraint in block matching.
• The very idea of using a block of pixels and assuming a common displacement for them in the matching process corresponds to a local smoothness (coherence) constraint on the displacement vector field.

192

Block Matching

Search Procedures

Full Search: Exhaustive search within a predetermined maximum displacement range. For N × N measurement windows (i.e., blocks) and a maximum absolute displacement [dm_max dn_max], full search calls for the evaluation of the matching criterion at (2 dm_max + 1) × (2 dn_max + 1) points.

Suboptimal (non-exhaustive) search procedures have been devised to decrease the search time.² These are particularly useful for software implementations. Most hardware implementations utilize full search.

² T. Koga et al., “Motion compensated interframe coding for video conferencing,” in Proc. Nat. Telecommun. Conf., New Orleans, pp. G.5.3.1–G.5.3.5, 1981.

193

Block Matching

Full Search

[Figure: Full-search grid for a block centered at location X in the frame from which the displacement vectors originate (frame at time t). The grid points represent the locations of the centers of the blocks considered in the matching process (i.e., candidate blocks). This grid is overlaid on the frame that the displacement vectors point at (frame at time t + Δt).]

194

Block Matching

The Block Size Issue

1. Small blocks: It is possible in this case that a match may be erroneously established between blocks containing similar gray-level patterns which are otherwise unrelated in the sense of motion (e.g., imagine the extreme case of a 1 × 1 block = a pixel).
Therefore, the block size should be sufficiently large.

[Figure: A reference block at time t erroneously matched to a similar-looking but motion-unrelated block at time t + Δt.]

195

Block Matching

The Block Size Issue (cont’d.): Hierarchical Block Matching

2. Large blocks: If the motion varies locally (possibly within the measurement block), for instance due to independently moving smaller structures within the block, it is obvious that block matching will provide inaccurate motion estimates when large blocks are used. Therefore, sufficiently small blocks should be used in this case.

[Figure: A reference block at time t containing independently moving structures and its matching block at time t + Δt.]

Since images may contain different amounts and types of motion, there are conflicting requirements on the size of the measurement window. Hierarchical block matching aims at resolving this issue.

196

Hierarchical Block Matching (HBM)

Main Mechanism

• Multiresolution representations of the two frames of interest are generated. A 3-level representation is shown below. Rectangular uniform convolution kernels of sizes K1 × K1, K2 × K2, and K3 × K3 can be used to perform the lowpass filtering. When K3 > K2 > K1, the 3rd level becomes the lowest-resolution level.
• Hierarchical block matching is first applied to the two frames at their lowest resolution level. From low to high resolution, the result of each level is passed on to the next-higher resolution level and used as the initial estimate of the displacement vector field. These initial values are then updated by the values estimated at the present level, passed on to the next level, and so on.

197

Hierarchical Block Matching (HBM)

Main Mechanism (cont’d.)

• Larger block sizes are used at lower resolution levels, whereas smaller block sizes are used at higher resolution levels. The purpose of the lower resolution levels is to estimate the major part of the displacement. It is appropriate to use larger blocks at lower resolution levels, since the independently moving smaller structures within the blocks may have been eliminated by the lowpass filtering.
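The operation refined at every level of the hierarchy is still the full-search matching of the earlier slides; a sketch with the block size N exposed as a parameter (a hypothetical NumPy helper, integer-pixel accuracy only):

```python
import numpy as np

def block_match(prev, curr, top, left, N=8, dmax=7):
    """Full-search MAD block matching: find the displacement (dy, dx)
    of the N x N block at (top, left) in `prev` within `curr`."""
    ref = prev[top:top + N, left:left + N].astype(float)
    best, best_mad = (0, 0), float("inf")
    H, W = curr.shape
    for dy in range(-dmax, dmax + 1):
        for dx in range(-dmax, dmax + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + N > H or x + N > W:
                continue  # candidate block falls outside the frame
            mad = np.abs(curr[y:y + N, x:x + N] - ref).mean()
            if mad < best_mad:
                best, best_mad = (dy, dx), mad
    return best, best_mad
```

Varying N directly exposes the tradeoff above: a tiny N risks spurious matches between unrelated patterns, while a large N averages over independently moving structures.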
198

Hierarchical Block Matching (HBM)

Main Mechanism (cont’d.)

• At higher resolution levels, the significant sensitivity associated with a small block size should not cause “incorrect match problems”, since the progression through the hierarchy provides good initial estimates at the higher levels, steering the matching process in the “right” direction. The purpose of the higher resolution levels is to fine-tune the estimates obtained at lower resolution levels.

Also note that
• At lower resolution levels, displacement estimates can be determined at a sparse set of image points, due to the expected lack of significant variations.
• The lowpass filtering used in creating the multiresolution representation reduces the effects of noise.

199

Hierarchical Block Matching (HBM)

Propagation of Estimates

Propagation of estimates is illustrated here for three levels.

200
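The coarse-to-fine mechanism of these slides can be sketched end to end. This is a minimal illustration under stated assumptions, not the exact algorithm: a 2 × 2 box filter stands in for the uniform convolution kernels, the search radius is fixed per level, and a single displacement (rather than a full displacement field) is estimated.

```python
import numpy as np

def shrink(img):
    """One pyramid level: 2 x 2 averaging (a simple lowpass) + subsample."""
    H, W = img.shape
    return img[:H // 2 * 2, :W // 2 * 2].reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def mad_search(prev, curr, top, left, N, center, radius):
    """MAD full search in a window of `radius` around the initial
    displacement estimate `center`."""
    ref = prev[top:top + N, left:left + N]
    best, best_mad = center, float("inf")
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + N > curr.shape[0] or x + N > curr.shape[1]:
                continue
            mad = np.abs(curr[y:y + N, x:x + N] - ref).mean()
            if mad < best_mad:
                best, best_mad = (dy, dx), mad
    return best

def hierarchical_match(prev, curr, top, left, N=8, levels=3, radius=2):
    """Estimate at the lowest-resolution level first, then double the
    estimate and refine it at each higher-resolution level."""
    pyr = [(prev.astype(float), curr.astype(float))]
    for _ in range(levels - 1):
        pyr.append((shrink(pyr[-1][0]), shrink(pyr[-1][1])))
    d = (0, 0)
    for lev in range(levels - 1, -1, -1):
        s = 2 ** lev
        p, c = pyr[lev]
        d = mad_search(p, c, top // s, left // s, max(N // s, 2), d, radius)
        if lev:
            d = (2 * d[0], 2 * d[1])  # propagate estimate to the next finer level
    return d
```

Note how a small per-level radius suffices: a displacement of (6, 4) pixels is found even though no single level searches further than 2 pixels from its initial estimate.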