SUMMARY (CHAPTER 1)

1. Basic Concepts of Images
Images can be obtained from the real world in various forms and manners using different observing and capturing systems. A 2-D image is the projection of a 3-D scene and can be represented by a 2-D array f(x, y). In a 2-D digital image, x, y, and f all take values in an integer set. The gray-levels of an image represent the brightness of its pixels: higher levels correspond to brighter pixels, lower levels to darker pixels. A general image representation function is a vector function f(x, y, z, t, λ) with five variables.

2. Image Engineering
Image Engineering (IE) includes three related layers of image techniques: Image Processing (IP), Image Analysis (IA), and Image Understanding (IU). Image processing concerns the manipulation of an image to produce another (improved) image. Image analysis concerns the extraction of information from an image. Image understanding concerns the interpretation of the original scene and the decisions and actions taken according to that interpretation. Image engineering is related to a number of disciplines; the closest ones include Computer Graphics (CG), Pattern Recognition (PR), and Computer Vision (CV). Based on a survey of the image engineering literature, more than 20 subcategories of image engineering can be defined.

3. Organization and Overview of the Book
According to the general framework of image engineering, four parts are identified in this book: image fundamentals, image processing, image analysis, and image understanding.

SUMMARY (CHAPTER 2)

1. Spatial Relationship in Image Formation
Perspective transformation describes the projection of a 3-D scene onto a 2-D image. Three coordinate systems are involved in image capturing by a camera: the world coordinate system, the camera coordinate system, and the image coordinate system. In the general camera model, all three systems can be separated. Using homogeneous coordinates, the transformations for a perspective projection can be expressed in a linear matrix form. Inverse projection means determining the coordinates of a 3-D point from its projection onto a 2-D image. In the general case, the perspective transformation is composed of translation, rotation, and projection.

2. Image Brightness
Radiometry measures the energy of electromagnetic radiation, while photometry measures the radiant energy of visible light. A point source is sufficiently small in scale and/or sufficiently distant from the observer that the eye cannot identify its form. An extended source has a finite emitting surface, so it is more realistic. Brightness specifies the quantity of light emitted from a light source, while illumination specifies the quantity of light received by a surface illuminated by the light source. As the intensity of an image is a measurement of radiation energy, its value should be positive and finite. The brightness of an image is proportional to the illumination component (the amount of light incident on the viewed scene) and the reflection component (the amount of light reflected by the objects in the scene).

3. Sampling and Quantization
An image f(x, y) must be digitized both in space and in amplitude to be processed by computers. Sampling is the digitization of the spatial coordinates (x, y), and quantization is the digitization of the amplitude f.
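As a minimal sketch of these two digitization steps (assuming Python with NumPy; the helper name sample_and_quantize is hypothetical, not from the book), the following sub-samples the spatial coordinates and uniformly quantizes the amplitude to 2^k levels:

```python
import numpy as np

def sample_and_quantize(f, step, k):
    """Digitize a gray-level image: sub-sample the spatial coordinates by
    'step' and uniformly quantize the amplitude to 2**k levels.
    Assumes f is a 2-D float array with values in [0, 1]."""
    sampled = f[::step, ::step]                      # spatial sampling
    levels = 2 ** k
    quantized = np.floor(sampled * (levels - 1) + 0.5).astype(np.uint8)
    return quantized                                 # integer gray levels 0 .. 2**k - 1
```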
The number of bits needed to store an image is the product of the number of bits used to represent the image intensity and the image width and height. The influence of sampling on image quality shows up as a checkerboard effect with graininess, especially around region boundaries. The influence of quantization on image quality is characterized by ridge-like structures and false contouring in areas with smooth gray-levels. The Shannon sampling theorem states that it is possible to completely recover the original frequency-limited function from its samples. However, for any signal and its associated Fourier spectrum, the signal can be limited in extent (space-limited) or the spectrum can be limited in extent (band-limited), but not both. Applying the sampling theorem strictly is therefore impossible, due to a finite camera aperture, the use of a finite amount of data, or both.

4. Stereo Imaging
Stereo imaging uses two or more cameras to capture the depth information that is often lost in a normal image acquisition process. The parallel horizontal model is the most popular model used in stereo imaging. Other models include the angular-scanning model, the focused horizontal model, and the axis model. In stereo imaging, the disparity between different cameras is the essential quantity to be computed. The depth information can be derived from the disparity by triangulation.

SUMMARY (CHAPTER 3)

1. Relationships between Pixels
The neighbors of a pixel p at coordinates (x, y) are the pixels closest to p. Two common neighborhoods are the 4-neighbors of p, N4(p), and the 8-neighbors of p, N8(p). If pixel q is a neighbor of pixel p, then p and q are adjacent. Adjacency depends on the neighborhood used. Connectivity is more general than adjacency: it takes into account both the spatial relationships between pixels and the pixel properties. A connected component of an image forms a region inside the image, in which any two pixels are connected via the connectivity of the pixels inside the region. The definitions of neighbor, neighborhood, adjacency, connectivity, and connected component for the 2-D case can easily be extended to 3-D or even higher dimensions.

2. Distances
Distance measures how far apart two pixels are in an image. In a digital image, city-block distance and chessboard distance are generally used instead of Euclidean distance. The 4-neighbors and 8-neighbors of a pixel can also be defined using city-block distance and chessboard distance, respectively. Using the knight distance, an even wider neighborhood of a pixel can be defined. A distance disc centered at a pixel consists of all surrounding pixels whose distance to that pixel does not exceed a given value. The chamfer distance is an integer approximation of the Euclidean distance. The element for computing the chamfer distance is the move, which is the length of the path from one pixel to another. The Farey sequence provides a suitable way to extend the moves of the chamfer distance to higher orders.

3. Image Coordinate Transformations
Common coordinate transformations include translation, rotation, and scaling. In 3-D space, all these transformations can be represented by a 4 × 4 transformation matrix. Various coordinate transformations can be combined by cascading. Since matrix multiplication is generally not commutative, the result of the cascading depends on the order of the transformations.
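A small illustration of cascading homogeneous transformations (a Python/NumPy sketch with hypothetical helper names translation and rotation_z; not code from the book) shows that swapping the order of a translation and a rotation changes the result:

```python
import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[0, 0], R[0, 1], R[1, 0], R[1, 1] = c, -s, s, c
    return R

p = np.array([1.0, 0.0, 0.0, 1.0])                   # homogeneous point
A = translation(2, 0, 0) @ rotation_z(np.pi / 2)     # rotate first, then translate
B = rotation_z(np.pi / 2) @ translation(2, 0, 0)     # translate first, then rotate
print(A @ p)   # approximately [2, 1, 0, 1]
print(B @ p)   # approximately [0, 3, 0, 1]  (different: cascading order matters)
```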
4. Distance Transforms
The distance transform is a special transform that maps a binary image into a gray-level image. It takes the distance from a pixel to a reference point or set as the gray-level value of that pixel. The distance transform is a global operation, but it can be computed locally by using a small mask whose size should be an odd value. This computation can be implemented either sequentially or in parallel. In the sequential process, the mask is divided into two symmetric sub-masks; one forward pass and one backward pass are performed using the respective sub-masks. In the parallel process, the propagation of distance values from the boundary to the center is performed iteratively for all pixels in the mask.

5. Geometric Transformations
Geometric transformation corrects the geometric distortion of an image. It consists of two steps: spatial transformation and gray-level interpolation. Spatial transformation rearranges the pixels in the distorted image to recover the original spatial relations between pixels. Gray-level interpolation assigns values to pixels in the geometrically corrected image, according to the values in the distorted image, to recover the original properties of the image. Various gray-level interpolation schemes exist, ranging from zero order to higher orders with increasing accuracy and computational cost.

6. General Morphic Transform
Morphic transformations are a group of transformations that map one plane to another plane. The projective transformation, the affine transformation, the similarity transformation, and the isometric transformation (including the rigid-body transformation and the Euclidean transformation) form a hierarchy of morphic transformations. A projective transformation determines the coordinate transformations in the projection; it is specified by eight parameters. An affine transformation is a non-singular linear transformation followed by a translation; it has six degrees of freedom. A similarity transformation is an equi-form transformation; it has four degrees of freedom. An isometric transformation has three degrees of freedom.

SUMMARY (CHAPTER 4)

1. Separable and Orthogonal Transforms
Separable transforms between functions f(x, y) and T(u, v) are made with the forward transformation kernel and the inverse transformation kernel, respectively. These two kernels depend only on the indexes x, y, u, and v, not on the values of f(x, y) or T(u, v). A 2-D transform with a separable kernel can be computed in two steps, each requiring only a 1-D transform. If A is a real matrix and A^(-1) = A^T, then A is an orthogonal matrix and the corresponding transform is called an orthogonal transform.

2. The Fourier Transform
The Fourier transform is a particular separable and symmetric transform. Both the Fourier transform and its inverse are separable and symmetric. Typical theorems for the 2-D Fourier transform include the shift, rotation, scaling, shearing, and affine transform theorems.

3. Walsh and Hadamard Transforms
Both the Walsh transform and the Hadamard transform are separable, symmetric, and orthogonal. The kernel of the Walsh transform can be considered a set of basis functions. The Hadamard transformation matrices can be generated with the help of a simple recursive relationship.
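The recursive construction of Hadamard matrices mentioned above can be sketched as follows (Python/NumPy; the function name hadamard is hypothetical):

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order 2**n via the recursion
    H_{2m} = [[H_m, H_m], [H_m, -H_m]], starting from H_1 = [1]."""
    H = np.array([[1]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

print(hadamard(2))   # 4 x 4 Hadamard matrix with +1 / -1 entries
```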
4. Discrete Cosine Transform
The discrete cosine transform (DCT) is a separable, symmetric, and orthogonal transform. Its computation can be realized using the real part of the discrete Fourier transform (DFT). The kernels of the DCT are cosine functions. Since cosine functions are even functions, the DCT implicitly has a 2N-point periodicity.

5. Gabor Transform
The short-time Fourier transform (STFT) of a function f(t) with respect to the window function r(t), evaluated at the location (b, v) in the time-frequency plane, gives the approximate spectrum of f near t = b. The Gabor transform uses the Gaussian function as the window function. Since the Gaussian function has the minimum-size time-frequency window, it has the highest concentration of energy in the t-f plane. In real applications, the Gabor transform of images is computed for a number of scales and a number of orientations.

6. Wavelet Transform
A function can be represented by a linear combination of real-valued expansion functions weighted by real-valued expansion coefficients. Any function f(x) can be decomposed into two parts: one approximates f(x) using the scaling functions, and the other is a difference that can be represented by a sum of wavelet functions. Two typical wavelet transforms are the wavelet series expansion, which maps continuous functions to a sequence of expansion coefficients, and the discrete wavelet transform, which converts a sequence of data into a sequence of coefficients. The 2-D wavelet transform involves one 2-D scaling function and three wavelet functions, which are products of a 1-D separable scaling function with the corresponding 1-D separable and directionally sensitive wavelet functions. Wavelet decomposition of an image can be implemented in different ways: pyramid-structured decomposition, non-complete tree-structured decomposition, and complete tree-structured decomposition.

7. The Hotelling Transform
The Hotelling transform is computed using the mean vector and the orthonormal eigenvectors of the covariance matrix. It forms a new coordinate system in which the principal axes are aligned with the eigenvectors by rotation.

8. Radon Transform
The Radon transform of f(x, y) is defined as the line integral of f(x, y) over all lines defined by p = x cos θ + y sin θ. The 2-D Fourier transform of f(x, y) is equivalent to the Radon transform of f(x, y) followed by a 1-D Fourier transform on the variable p. Basic theorems of the Radon transform include linearity, similarity, symmetry, shifting, differentiation, and convolution. Inversion of the Radon transform yields information about an object in the image space when a probe has been used to produce the projection data.

SUMMARY (CHAPTER 5)

1. Image Operations
Image operations use images as operands and work pixel-by-pixel. Arithmetic operations include addition, subtraction, multiplication, and division. The basic logic operations include COMPLEMENT, AND, OR, and XOR. Image operations can be used for noise removal, motion detection, edge detection, etc.

2. Direct Gray-Level Mapping
Gray-level mapping changes the visual impression of an image by changing the gray-level value of each pixel (a point-based operation) according to certain transform functions. The design of the mapping law should depend on the enhancement requirements.

3. Histogram Transforms
The histogram of an image is a statistical representation of the image. A gray-level histogram provides the statistics (the number of pixels for each gray-level) of the image. The cumulative histogram of an image is a partial summation of the histogram of this image.
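A minimal sketch of computing a gray-level histogram and its cumulative version (Python/NumPy; gray_histogram is a hypothetical helper, and the image is assumed to hold integer gray levels):

```python
import numpy as np

def gray_histogram(img, levels=256):
    """Gray-level histogram and its cumulative (running-sum) version.
    Assumes img holds integer gray levels in [0, levels - 1]."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cumulative = np.cumsum(hist)          # partial sums of the histogram
    return hist, cumulative
```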
Histogram equalization tries to make the histogram more evenly distributed, so that the dynamic range and the contrast of the image are increased automatically. Histogram specification transforms the histogram into a specific form to enhance the image in a pre-defined manner. Histogram equalization can be considered a special case of histogram specification. Compared with the single mapping law (SML), the group mapping law (GML) is less biased in mapping, produces smaller mapping differences, and has a smaller error expectation.

4. Frequency Filtering
Through the Fourier transform, an image can be represented by its frequency components. Enhancement in the frequency domain can be achieved by removing or keeping specified frequency components. Low-pass filtering keeps low-frequency components and removes high-frequency components; the boundary frequency is called the cutoff frequency. High-pass filtering keeps high-frequency components and removes low-frequency components.

5. Linear Spatial Filtering
Enhancement in the image domain can be based on the properties of the pixels and the relations among neighbors. The relations among pixels are often represented by masks. The filtering process is carried out by mask convolution, which performs a linear combination of the computation results. The simplest smoothing filter takes the average value of the neighboring pixels as the output of the mask. An improvement is to emphasize the importance of each mask element by using a specified weight.

6. Non-Linear Spatial Filtering
A non-linear filter provides a logic combination of the computation results in a neighborhood. The median filter is a popular non-linear filter; it replaces the value of a pixel by the median of the gray-levels in the neighborhood of that pixel. Order-statistic filters are a group of non-linear filters of which the median filter is an example; typical examples also include max and min filters, as well as midpoint filters.

7. Color Image Enhancement
Each color can be represented as a point in a color space (color model). Hardware-oriented color models are used mainly for color monitors, printers, etc.; typical examples are the RGB model and the CMY model. Perception-oriented models are more suitable for describing colors in terms of human interpretation and are used mainly for image processing; typical examples include the HSI model and the L*a*b* model. Pseudo-color enhancement consists of assigning different colors to different gray-levels in an image to emphasize their differences. Full color images can be enhanced either by processing each color component individually or by processing the pixel values of the color image as vectors.

SUMMARY (CHAPTER 6)

1. Degradation and Noise
Images can be degraded in many ways and at various steps during image acquisition and operations. Noise is one of the most common sources of degradation; it is, in general, considered to be a disturbing/annoying signal added to the required signal. The signal-to-noise ratio (SNR) is a useful indication of image quality in the presence of noise. To describe the statistical behavior of the noise component of an image, the probability density function (PDF) is used. Typical examples include Gaussian noise, uniform noise, and impulse (salt-and-pepper) noise.

2. Degradation Model and Restoration Computation
In a simple model of image degradation, an operator H, which acts on the input image f(x, y), and an additive noise term n(x, y) jointly produce the degraded image g(x, y).
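A hedged sketch of this degradation model, with H taken to be a linear, position-invariant blur and n additive Gaussian noise (Python with NumPy/SciPy; the helper name degrade and the 3 × 3 uniform blur are illustrative choices, not the book's):

```python
import numpy as np
from scipy.ndimage import convolve

def degrade(f, psf, noise_sigma, seed=0):
    """Simple degradation model: g(x, y) = (H f)(x, y) + n(x, y),
    with H realized as a linear, position-invariant blur (convolution)
    and n as additive zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    blurred = convolve(f.astype(float), psf, mode="reflect")   # H acting on f
    noise = rng.normal(0.0, noise_sigma, size=f.shape)         # n(x, y)
    return blurred + noise

# e.g. a 3 x 3 uniform blur kernel as the point spread function
psf = np.full((3, 3), 1.0 / 9.0)
```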
The properties of a degradation system may include linearity, additivity, homogeneity, and position invariance. The computation of a degradation model can be carried out with the convolution of the circulant matrix (for the 1-D case) or the block-circulant matrix (for the 2-D case). Both a circulant matrix and a block-circulant matrix can be diagonalized. The effect of the diagonalization is that the degradation model can be solved with the help of a few discrete Fourier transforms.

3. Techniques for Unconstrained Restoration
In unconstrained restoration, no a priori knowledge about the noise is assumed. The restoration is carried out in a least squares sense of the estimation error. Inverse filtering is a commonly used restoration approach, which can be implemented in the Fourier domain. Removal of the blur caused by uniform linear motion is a typical application of unconstrained restoration in closed form.

4. Techniques for Constrained Restoration
In constrained restoration, some a priori knowledge of the noise is used to constrain the least squares computation. The restoration is carried out using the method of Lagrange multipliers. Wiener filtering is a statistical method for constrained restoration; it is based on the correlation matrices of the image and the noise. When there is no noise, the Wiener filter degrades to the ideal inverse filter. Constrained least squares restoration requires knowledge of only the noise's mean and variance, yet the restoration is optimal for each image.

5. Interactive Restoration
Interactive restoration uses the advantage of human intuition to control the restoration process. It is suitable for eliminating a 2-D sinusoidal interference pattern (coherent noise).

SUMMARY (CHAPTER 7)

1. Introduction
The techniques for reconstruction from projections are different from the techniques for 3-D reconstruction from depth images, which are discussed in later chapters. There are various modes of reconstruction from projections. Typical examples are Transmission Computed Tomography (TCT); Emission Computed Tomography (ECT), including Positron Emission Tomography (PET) and Single Photon Emission Computed Tomography (SPECT); Reflection Computed Tomography (RCT); Magnetic Resonance Imaging (MRI); and Electrical Impedance Tomography (EIT). In reconstruction from projections, information about a 2-D region is collected by a number of line integrations over this region along different directions. The reconstruction is carried out by solving the integral equations. 3-D reconstruction from projections makes true 3-D imaging possible.

2. Reconstruction by Fourier Inversion
Fourier inversion is a typical reconstruction method among the group of transform methods. It starts by establishing a continuous model, then solves this model by inversion, and finally adapts the result to discrete data. The basis of transform methods is the projection theorem for the Fourier transform, which relates the 1-D Fourier transform of a projection (line integrals along one direction) to a slice of the 2-D Fourier transform of the imaged function. Fourier inversion follows a typical transform-based approach: transform the data, process the transformed data, and transform the result back. Phantoms are used to study algorithms for reconstruction from projections; one commonly used example is the Shepp-Logan head model.

3. Convolution and Back-Projection
The idea behind back-projection is to obtain a distribution from the inverse process of the projection.
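A minimal sketch of unfiltered back-projection, assuming a sinogram sampled on integer detector positions (Python/NumPy; the function back_project and its sampling conventions are illustrative assumptions, not the book's algorithm):

```python
import numpy as np

def back_project(sinogram, thetas):
    """Unfiltered back-projection: smear every projection p(t, theta) back
    across the image plane and accumulate.  'sinogram' has one row per angle,
    each row sampled at integer detector positions t = 0 .. size - 1."""
    n_angles, size = sinogram.shape
    recon = np.zeros((size, size))
    c = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    xs, ys = xs - c, ys - c
    for p, theta in zip(sinogram, thetas):
        t = xs * np.cos(theta) + ys * np.sin(theta)   # p = x cos(theta) + y sin(theta)
        idx = np.clip(np.round(t + c).astype(int), 0, size - 1)
        recon += p[idx]
    return recon * np.pi / n_angles
```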
Convolution back-projection is the most widely used back-projection technique; it consists of a convolution of the projections followed by a back-projection process. Filtering the back-projections is another back-projection technique, which consists of a back-projection process followed by a filtering or convolution process. Back-projection of the filtered projections is a further technique, which consists of a filtering process followed by a back-projection process.

4. Algebraic Reconstruction
The algebraic reconstruction technique (ART) is also called a finite series-expansion reconstruction method. ART is performed in a discrete space from the beginning, in contrast to other transform or projection methods, which start in a continuous space. ART without relaxation uses an iterative step to update the attenuation vector using one single ray at a time, changing only the pixels that intersect with this ray. ART with relaxation extends ART without relaxation by adding a relaxation coefficient that controls the convergence speed of the iterative process.

5. Combined Reconstruction
Different reconstruction methods can be combined to form new reconstruction methods. One typical combined method is the iterative transform method. It is similar to transform methods in that the algorithm for discrete data is derived from the reconstruction formula; its iterative nature makes it similar to ART.

SUMMARY (CHAPTER 8)

1. Fundamentals
Data are closely related to information but are different from information. Data redundancy occurs when some data provide non-essential or already known information. There are three basic data redundancies: coding redundancy, inter-pixel redundancy, and psycho-visual redundancy. In a lossless coding process, no information is lost. In a lossy coding process, some information is lost. To judge the fidelity of the decompressed image to the original image, objective or subjective criteria can be used.

2. Variable-Length Coding
Variable-length coding is based on statistical theory and is also called entropy coding. It represents high-probability events with few bits and low-probability events with many bits. The Huffman code can be constructed by iteratively building a binary tree from the probabilities of the source symbols. Sub-optimal Huffman coding sacrifices coding efficiency for simplicity in code construction. The Shannon-Fano code can be constructed by repeatedly dividing the source symbols into groups of nearly equal probability and assigning 0 and 1 to the groups. Arithmetic coding assigns a single arithmetic code word to an entire sequence of source symbols; as a result, it generates non-block codes.

3. Bit-Plane Coding
Bit-plane coding decomposes a gray-level image into a sequence of bit-planes (binary images) and then codes each bit-plane. A binary code decomposition of an m-bit image produces m 1-bit bit-planes. A Gray code decomposition of an image can reduce the effect of small gray-level variations on each bit-plane; in Gray code, successive code words differ in only one bit position. Constant area coding (CAC) divides the image into three types of blocks and uses 1-bit or 2-bit code words for all-white or all-black blocks. In 1-D run-length coding, the white or black runs are coded by their lengths; the run lengths can be further coded by variable-length coding.
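A minimal sketch of 1-D run-length coding of a binary row (plain Python; run_length_encode is a hypothetical helper):

```python
def run_length_encode(row):
    """1-D run-length coding of a binary row: each run of identical values
    is replaced by the pair (value, run length)."""
    runs = []
    current, length = row[0], 1
    for v in row[1:]:
        if v == current:
            length += 1
        else:
            runs.append((current, length))
            current, length = v, 1
    runs.append((current, length))
    return runs

print(run_length_encode([0, 0, 0, 1, 1, 0, 1, 1, 1, 1]))
# -> [(0, 3), (1, 2), (0, 1), (1, 4)]
```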
2-D run-length coding extends 1-D run-length coding by tracking the binary transitions that begin and end each black and white run in the previous line.

4. Predictive Coding
Predictive coding codes only the new information in each pixel with respect to the predicted value of this pixel, in order to reduce or remove inter-pixel redundancy. In lossless predictive coding, the predictor produces the predicted/estimated value of the current pixel according to the values of several past input pixels. A lossy predictive coding system is formed by adding a quantizer to the lossless predictive coding system; the quantizer maps the prediction error into a limited range of outputs. In designing the optimal encoder, the encoder's mean-square prediction error is minimized to achieve optimal encoding performance. In designing the optimal quantizer, the mean-square quantization error is minimized to achieve optimal quantizing performance.

5. Transform Coding
Transform coding is a frequency-domain technique and is lossy in general. The encoder of a typical transform coding system consists of four operation modules: sub-image decomposition, transformation, quantization, and symbol coding. The decoder of a typical transform coding system consists of three operation modules: symbol decoding, inverse transformation, and sub-image merging. In transform coding, compression is achieved during the quantization step, not the transform step. The selection of the transform depends on the allowed coding error and the computational complexity. Since the wavelet transform is computationally efficient and has inherent local characteristics, there is no sub-image decomposition module in wavelet transform coding systems.

SUMMARY (CHAPTER 9)

1. Definition and Classification
Image segmentation is defined as the process of sub-dividing an image into its constituent parts and extracting the parts of interest (objects). Image segmentation can be defined more formally with the help of uniformity predicates and set theory. The classification of segmentation algorithms can be considered a partition of a set into subsets. Image segmentation algorithms can be classified into four groups: boundary-based parallel, boundary-based sequential, region-based parallel, and region-based sequential.

2. Basic Technique Groups
Edge detection is the first step of boundary-based parallel segmentation algorithms. Differential edge detectors (DED), such as the Sobel detector, are widely employed to detect edges. The Hough transform is a global technique for boundary detection. The major components of a boundary-based sequential segmentation algorithm consist of selecting a good initial point, choosing the next point according to the dependence structure, and terminating the process according to a pre-defined criterion. Graph search and dynamic programming form the basis of a global approach combining edge detection and edge linking for image segmentation. Thresholding is a popular segmentation approach belonging to the region-based parallel group. Determining appropriate threshold values is the most important task in thresholding techniques; the threshold values can be point-dependent, region-dependent, or coordinate-dependent.
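One simple point-dependent scheme, shown here only as an illustrative sketch (Python/NumPy; iterative_threshold is a hypothetical helper and assumes the image contains both classes), picks a global threshold iteratively from the two class means:

```python
import numpy as np

def iterative_threshold(img, eps=0.5):
    """Pick a global threshold as the average of the mean gray levels of the
    two classes it separates, and iterate until the threshold stabilizes.
    Assumes the image actually contains pixels on both sides of the threshold."""
    t = img.mean()
    while True:
        low, high = img[img <= t], img[img > t]
        t_new = 0.5 * (low.mean() + high.mean())
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

def apply_threshold(img, t):
    return (img > t).astype(np.uint8)      # 1 = object, 0 = background
```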
A split, merge, and group (SMG) approach is a typical region-based sequential segmentation algorithm, which consists of five phases: initialization, merging, splitting, conversion, and grouping.

3. Extension and Generalization
The modifications required to extend 2-D algorithms to 3-D applications depend on the nature of the 2-D algorithms. In extending 2-D differential edge detectors to 3-D, the number of masks and the size of the masks (counting 6-, 18-, and 26-neighborhoods) should be modified. In extending 2-D thresholding techniques to the 3-D case, the primitive of 3-D images, the voxel, should replace the pixel of 2-D images, and 3-D local operators should replace 2-D local operators. In extending the 2-D SMG algorithm to 3-D, the 2-D quad-tree structure should be replaced with a 3-D octree structure. Pixel-level edge detection schemes can be generalized to detect sub-pixel edges by using statistical information along either the normal direction or the tangent direction. The basic Hough transform technique for detecting curves with closed-form expressions can be generalized to detect curves of any form by using an R-table.

4. Segmentation Evaluation
The development of segmentation techniques has traditionally been an ad hoc and problem-oriented process, as no general theory of segmentation exists. Segmentation algorithms can be evaluated analytically or empirically; the methods used in the latter approach can further be classified into goodness methods and discrepancy methods. Many criteria for segmentation evaluation have been proposed; they can be classified according to the groups of evaluation methods. A general evaluation framework consists of three related modules: image generation, algorithm testing, and performance assessment. Systematic comparison of various segmentation evaluation methods is one level above segmentation evaluation itself, and can be made either on evaluation groups or on single evaluation methods. Experimental comparison of some empirical evaluation methods shows that the discrepancy methods are better than the goodness methods for segmentation evaluation.

SUMMARY (CHAPTER 10)

1. Classification of Representation and Description
Representation can be external or internal. The former focuses on shape characteristics, while the latter focuses on region properties. Description expresses the characteristics of an object region based on its representation; the properties of the region can be obtained via the computation of descriptors. Both representation and description schemes can be divided into boundary-based, region-based, and transform-based schemes.

2. Boundary-Based Representation
Boundary-based representation approaches can be divided into parametric boundaries, sets of boundary points, and curve approximations. Chain codes, boundary segments, polygonal approximation, signatures, and landmarks are frequently used techniques for boundary-based representation.

3. Region-Based Representation
Region-based representation approaches can be divided into region decomposition, bounding regions, and internal features. Bounding regions, quad-trees, pyramids, and skeletons are frequently used techniques for region-based representation.

4. Transform-Based Representation
In transform-based approaches, some transform is used to represent the shape in terms of the transform coefficients. The Fourier transform-based approach is a basic technique for boundary-based representation in the frequency domain.
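A minimal sketch of Fourier descriptors of a boundary, treating the boundary points as complex numbers and keeping only the low-frequency coefficients (Python/NumPy; the function name and the truncation convention are illustrative assumptions):

```python
import numpy as np

def fourier_descriptors(boundary, keep):
    """Treat boundary points (x_k, y_k) as complex numbers s(k) = x_k + j*y_k,
    take their DFT, and keep only about 'keep' lowest-frequency coefficients."""
    s = boundary[:, 0] + 1j * boundary[:, 1]
    coeffs = np.fft.fft(s)
    truncated = np.zeros_like(coeffs)
    truncated[:keep // 2] = coeffs[:keep // 2]          # low positive frequencies
    truncated[-keep // 2:] = coeffs[-keep // 2:]        # low negative frequencies
    approx = np.fft.ifft(truncated)                     # smoothed, reconstructed boundary
    return coeffs, np.column_stack([approx.real, approx.imag])
```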
Fourier descriptors of the boundary follow simple transformation rules under various geometric changes.

5. Descriptors for Boundary
Boundary length, boundary diameter, and boundary curvature are some simple boundary descriptors. The shape number is based on the chain-code representation; it is the smallest value of the first difference of the chain code. The boundary moments of an object are computed from the segments composing the boundary.

6. Descriptors for Region
There are various region properties, such as geometric, shape, topological, and intensity properties. The area, centroid, and intensity of an object region are some basic region descriptors. The seven moments that are invariant to translation, rotation, and scaling are composed of normalized second- and third-order central moments. Topological descriptors are independent of distance measurements, so they are unaffected by (rubber-sheet) deformations of object regions.

SUMMARY (CHAPTER 11)

1. Direct and Indirect Measurements
Direct measurement quantifies the property of objects directly. The measurement can be made on an image in general or on an object in particular. Derived measurements are indirect measurements, as they are derived from direct measurements by combination; the number of possible combinations is unlimited. Measurement combinations are based on combining metrics to form a new metric.

2. Accuracy and Precision
Accuracy is defined as the agreement between the measurement and some objective standard taken as the "truth"; high accuracy indicates a tendency toward unbiasedness. Precision is defined in terms of repeatability, the ability of the measurement process to duplicate the same measurement and produce the same result; high precision provides highly efficient measurements. Accuracy and precision are somewhat independent. A highly precise but inaccurate measurement is generally useless. The statistical error is associated with precision, as it describes the scatter of the measured values for repeated measurements. The systematic error is associated with accuracy, as it is indicated by the difference between the true value and the average of the measured values.

3. Two Types of Connectivity
In a square lattice, 4-direction connectivity and 8-direction connectivity are commonly used. In image measurement, the internal pixels and boundary pixels should be judged with different types of connectivity to avoid ambiguity. Similar statements can be made for object points and background points, for 4-connected components and 8-connected arcs, for 4-connected components and 8-connected curves, and for open sets and closed sets.

4. Feature Measurement Error
Along the path from scene to data, many factors influence the accuracy of measurements. Important factors that make the real data and the estimated data different include the natural variation of the scene, the image acquisition process, the image processing and analysis procedures, the different measurement practices and feature computation formulas, as well as noise and other degradations. The applicability of the sampling theorem cannot be guaranteed in image analysis, so the sampling theorem is not a proper reference for choosing a sampling density. Segmentation is the basis for feature measurement, so the quality of image segmentation directly influences the accuracy of feature measurements; this influence also depends on the types of measurements, that is, the features to be measured.
The computation formulas used in indirect measurements are also critical for accurate and precise feature measurement. Different influence factors can have combined effects on feature measurements.

5. Error Analysis
For a given feature measurement task, there is an upper bound and a lower bound on the error. Some approximation errors can be analyzed in closed form, though most of them can only be obtained by numerical computation.

SUMMARY (CHAPTER 12)

1. Concepts and Classification
Texture is a word rich in meanings but with no unique formal definition. The perception of texture depends on the scale at which the texture is viewed. Two types of texture can be distinguished based on scale: micro-texture and macro-texture. Research on texture can be classified into four groups: texture description, texture segmentation, texture synthesis, and shape from texture. Approaches to texture analysis can be classified into three categories: statistical techniques, structural techniques, and spectral techniques.

2. Statistical Approaches
The statistical approach to texture analysis uses the statistical properties of the whole texture (not of its elements) to describe and classify texture patterns. The gray-level co-occurrence matrix reflects the spatial distribution of pixels with different gray-levels and is thus the basis for many texture descriptors (a small computational sketch is given further below). Laws' texture energy is computed using fixed-size windows: 1-D masks are first defined, then 2-D convolution is performed to give an energy measurement of a local texture pattern.

3. Structural Approaches
The structural approach to texture analysis uses a two-layered structure. The first layer characterizes the gray-level primitives to specify the local properties, and the second layer describes the organization of the primitive elements with arrangement rules. A texture element is a connected component of an image; examples range from a single pixel to a more complex neighborhood of a pixel. Arrangement rules can be defined by a specific language/grammar, which specifies which variables can be replaced by which elements and variables. One typical structural approach tessellates texture patterns on the plane in a certain order.

4. Spectral Approaches
Spectral approaches to texture analysis transform the original image into spectral spaces and describe texture properties using attributes in the new spaces. The peaks of the Fourier spectrum can provide the principal direction of the texture patterns and the fundamental spatial period of the pattern. Many texture descriptors based on the Bessel-Fourier function can be obtained; they can indicate many texture properties, such as rotational and translational symmetry, coarseness, contrast, roughness, regularity, etc. By using a set of 2-D Gabor filters, an image can be decomposed into a sequence of frequency bands, and further texture descriptions can be derived from these bands.

5. Texture Categorization
Three categories of texture can be distinguished: globally ordered textures, locally ordered textures, and disordered textures. Locally ordered textures can be modeled neither statistically nor structurally; they can be analyzed by a combination of two types of images: the angle image and the coherence image. Textures can be composed from three types of primitive textures: by linear combination, by functional combination, and by opaque overlap.
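As announced under the statistical approaches above, a small sketch of the gray-level co-occurrence matrix for a single displacement (Python/NumPy; cooccurrence is a hypothetical helper and assumes integer gray levels):

```python
import numpy as np

def cooccurrence(img, dx, dy, levels):
    """Gray-level co-occurrence matrix for the displacement (dx, dy):
    P[i, j] counts how often gray level i occurs together with gray level j
    at the displaced position.  Assumes img holds integers in [0, levels - 1]."""
    P = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = img.shape
    for y in range(max(0, -dy), min(rows, rows - dy)):
        for x in range(max(0, -dx), min(cols, cols - dx)):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P
```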
6. Texture Segmentation
Texture segmentation aims to separate regions with different textures. As texture is defined over a neighborhood, the result of texture segmentation has the resolution of a group of pixels. Techniques for texture segmentation can be divided into supervised and unsupervised ones; both perform texture classification. In supervised texture segmentation, two steps, pre-segmentation and post-segmentation, are needed. In unsupervised texture segmentation, the number of texture types is not known a priori, and determining the number of texture types is a key task.

SUMMARY (CHAPTER 13)

1. Definitions and Tasks
A shape means the shape of an entity in the real world or the shape of a region in an image. An object's shape involves all aspects of the object except its position, orientation, and size. A shape in an image involves all aspects except its mean gray-level and intensity scale factor. An operable definition takes an object's shape as the pattern formed by the points on the boundary of the object. Shape analysis involves shape pre-processing, shape representation and description, and shape classification.

2. Different Classes of 2-D Shape
A classification of planar shapes forms a tree-like structure. Shapes are categorized as thin or thick shapes. Thin shapes are categorized as consisting of a single curve or of composed parametric curves. Single-curve shapes are classified as being open or closed, smooth or not, Jordan or not, and regular or not.

3. Shape Property Description
Shape property description consists of describing one property of a shape using different techniques/descriptors. To describe the compactness property, descriptors such as form factor, sphericity, circularity, and eccentricity can be used. To describe the complexity property, descriptors such as the area-to-perimeter ratio, thinness ratio, rectangularity, mean distance to the boundary, and temperature can be used.

4. Technique-Based Descriptors
Technique-based descriptors are sets of descriptors belonging to the same technique group. Many shape descriptors are based on the polygonal approximation representation; the shape number and the shape signature are two of them. The curvature can be extracted from the contour of an object; direct or indirect measurements of curvature carry many geometric meanings of the object contour. The computation of discrete curvatures needs some approximation techniques; the order of a curvature is an index of the approximation order. Descriptors based on curvature computations include curvature statistics; maxima, minima, and inflection points; symmetry measurements; and bending energy.

5. Wavelet Boundary Descriptors
Wavelet boundary descriptors can be normalized for both translation and scale invariances. The two important properties of wavelet boundary descriptors are their ability to represent a boundary and their ability to measure the similarity between two boundaries. Compared to the Fourier boundary descriptor, the wavelet boundary descriptor is less affected by local deformations of a boundary and describes the boundary more precisely with fewer coefficients.

6. Fractal Geometry
The topological dimension corresponds to the number of degrees of freedom of a point in space. The fractal dimension of a complex point set is often larger than its topological dimension. The box-counting approach is a practical method for estimating the fractal dimension.
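A minimal sketch of the box-counting estimate (Python/NumPy; box_counting_dimension, the box sizes, and the log-log fit are illustrative assumptions, and the mask is assumed non-empty):

```python
import numpy as np

def box_counting_dimension(mask, sizes=(2, 4, 8, 16, 32)):
    """Estimate the fractal (box-counting) dimension of a binary point set:
    count the boxes of side s that contain at least one set pixel, then fit
    log N(s) against log(1/s).  'mask' is a square binary array."""
    counts = []
    for s in sizes:
        n = 0
        for i in range(0, mask.shape[0], s):
            for j in range(0, mask.shape[1], s):
                if mask[i:i + s, j:j + s].any():
                    n += 1
        counts.append(n)
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope     # the estimated fractal dimension
```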
Experimentally obtained diagrams for the box-counting method show three distinct regions depending on the scale, from a non-fractal region, via a fractal region, to a region with zero dimension.

SUMMARY (CHAPTER 14)

1. Modules of Stereo Vision
A complete stereo vision system must perform six tasks: camera calibration, image capture, feature extraction, stereo matching, 3-D information recovery, and post-processing.

2. Region-Based Binocular Matching
In stereo vision, more than one image is used; stereo matching consists of establishing the correspondence among these images. The basic model of stereo vision is the parallel horizontal model using binocular images; the techniques used for this case can easily be extended to other stereo models. Region-based matching uses gray-level correlation to determine the correspondence; a simple and basic example is template matching. In region-based matching, a number of constraints, such as compatibility constraints, uniqueness constraints, continuity constraints, and epipolar line constraints, can be used to reduce the computational complexity.

3. Feature-Based Binocular Matching
Feature-based matching relies on particular feature points for establishing the correspondence between images. Once the correspondence between the feature points in two images is established, the depth of these feature points can be calculated and interpolated to obtain the whole depth map. Matching of feature points can be considered an optimization problem and solved using dynamic programming techniques.

4. Horizontal Multiple Stereo Matching
Horizontal multiple stereo vision is a direct extension of the basic stereo vision model, with the purpose of improving the accuracy of the disparity measurement. The key in multiple stereo matching is the introduction of the inverse distance into the calculation of the sum of squared differences. The uncertainty problem caused by repetitive patterns in stereo matching can be solved by using two different baselines.

5. Orthogonal Trinocular Matching
Stereo matching along a line can encounter problems when the gray-level values along this direction are smooth. One technique for solving this problem is to also perform matching in the perpendicular direction. Orthogonal trinocular matching conducts two complementary matching processes; it can reduce the mismatching caused by smooth regions and periodic patterns. Fast orthogonal matching can be attained by classifying the gradient direction.

6. Computing Sub-Pixel Level Disparity
Sub-pixel level disparity is needed for precise measurements. Sub-pixel level disparities can be obtained by using information from local variation patterns of image intensity and disparity.

7. Error Detection and Correction
In the process of obtaining the disparity map, various factors can cause errors; hence, post-processing of the depth map is necessary. A general and fast error detection and correction algorithm for disparity maps, independent of any particular matching algorithm, was introduced.

SUMMARY (CHAPTER 15)

1. Photometric Stereo
Photometric stereo is an important method for recovering surface orientation. It needs a set of images taken from the same viewing angle but under different lighting conditions. Scene radiance and image irradiance are related by the bidirectional reflectance distribution function. The reflectance map gives the relation between object brightness and surface orientation.
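A hedged sketch of a reflectance map for an ideal Lambertian surface in gradient space (Python/NumPy; the formula follows the standard Lambertian form and the function name is hypothetical):

```python
import numpy as np

def reflectance_map(p, q, ps, qs):
    """Reflectance map of an ideal Lambertian surface in gradient space:
    surface orientation (p, q), light-source direction (ps, qs).
    R(p, q) = (1 + p*ps + q*qs) / (sqrt(1 + p^2 + q^2) * sqrt(1 + ps^2 + qs^2)),
    clipped at zero for self-shadowed orientations."""
    num = 1 + p * ps + q * qs
    den = np.sqrt(1 + p**2 + q**2) * np.sqrt(1 + ps**2 + qs**2)
    return np.maximum(num / den, 0.0)
```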
2. Structure from Motion
Optical flow and motion field are closely related but different concepts: the former brings on the latter, but the latter does not necessarily induce the former. The optical flow constraint equation relates the gray-level gradients along the X, Y, and T directions. The optical flow constraint equation can only be solved in particular cases, such as rigid bodies and smooth motion. Determining the optical flow over a surface makes it possible to find the surface orientation.

3. Shape from Shading
Shape from shading can recover the shape information of a 3-D scene from a single image. The image brightness constraint equation can be solved with one image under certain smoothness constraints. The principle used to solve the image brightness constraint equation can also be used to solve the optical flow constraint equation.

4. Texture and Surface Orientation
Shape from texture is based on the texture distortion caused by projection, which changes with surface orientation. Surface orientation is determined from the change in appearance of texture elements; three groups of changes can be distinguished: change in size, change in shape, and change in spatial relation. If the texture is composed of a regular grid of texture elements, then surface orientation information can be recovered by computing the vanishing point, which in turn can be obtained with the help of the Hough transform.

5. Depth from Focal Length
A change of the focal length of a lens blurs the object image, and the blur depends on the distance of the object from the lens.

6. Pose from Three Pixels
When a geometric model of the object and the focal length of the camera are known, the pose of a 3-D surface can be recovered from a 2-D image. The P3P problem can be reduced to solving three quadratic equations, which in turn can be solved by an iterative algorithm performing a nonlinear optimization.

SUMMARY (CHAPTER 16)

1. Fundamentals of Matching
Matching consists of finding the correspondence between the objects in a scene and established models in order to understand the meaning of the scene. There are different levels of matching, such as image matching, object matching, relation matching, concept matching, and semantic matching. At a detailed scale, matching and registration are different: registration has a narrower meaning and can be considered a special case of matching. Registration can be performed either in the image space or in a transformed space; in the Fourier space, both phase correlation and magnitude correlation are used for registration.

2. Object Matching
Object matching takes an object as a unit; hence, the objects of interest should first be detected and extracted. The Hausdorff distance is a suitable measurement of the similarity of point sets and is useful when objects are considered sets of points (a small sketch is given at the end of this item). If objects are decomposed into parts, object matching can be performed at the structure level. String matching matches two sequences of feature points or two object boundaries symbol by symbol. Matching of inertia-equivalent ellipses is a matching technique using all points of the object regions; all parameters needed for the coordinate transform can be calculated from the coordinates, the orientation angles, and the ratio of the major-axis lengths of the equivalent ellipses.
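The sketch promised above for the Hausdorff distance between two point sets (Python/NumPy; hausdorff is a hypothetical helper operating on N × 2 coordinate arrays):

```python
import numpy as np

def hausdorff(A, B):
    """Hausdorff distance between two point sets A and B (N x 2 and M x 2 arrays):
    H(A, B) = max(h(A, B), h(B, A)), where the directed distance h(A, B) is the
    largest distance from a point of A to its nearest point in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)   # pairwise distances
    h_ab = d.min(axis=1).max()     # directed distance A -> B
    h_ba = d.min(axis=0).max()     # directed distance B -> A
    return max(h_ab, h_ba)
```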
3. Dynamic Pattern Matching
In dynamic pattern matching, the patterns to be matched are constructed dynamically from the data during the matching process. The dynamic pattern is constructed for each object from its coordinates, several distances measured to its surrounding objects, and several angles between the adjacent connecting lines. By discarding the coordinates of the central object, the absolute pattern becomes a relative pattern that is translation invariant.

4. Relation Matching
Relation matching concentrates on matching the distinctive relations among objects, instead of the characteristics of a single object. In relation matching, all relations are represented by components of a vector, and the matching is performed component by component. Relation matching consists of four steps: determine the relations among components, determine the transform for matching relations, determine the transform set for matching relation sets, and determine the model.

5. Graph Isomorphism
A graph is a kind of data structure for describing relations. A graph is defined by a finite, non-empty vertex set and a finite edge set; in some more general cases, more than one vertex set and more than one edge set are permitted. A graph can be represented geometrically, in which case different properties of vertices and edges are distinguished by different geometric forms. Graph isomorphism is a technique for matching relations with the help of identical graphs (graphs that have the same structure and differ only in the labels of the vertices and/or edges). Three forms of isomorphism can be distinguished: graph isomorphism, sub-graph isomorphism, and double-sub-graph isomorphism.

6. Labeling of Line Drawings
A line drawing is a simplified representation of objects and their relations; it is based on the labeling of object contours. Based on 2-D line drawings, the relations among 3-D objects in the world system can be derived by structural reasoning. Three labels, representing non-closed convex lines, non-closed concave lines, and closed lines, can be used to label line drawings. Labeling with sequential backtracking is a method for the automatic labeling of line drawings.

SUMMARY (CHAPTER 17)

1. Summary of Information Fusion
Information fusion consists of fusing various complementary information/data (often from a mixture of sensors) to get an optimal result. Multi-sensor information fusion is a basic ability of human beings. Information fusion can be distinguished into various layers; one example is to distinguish fusion in the detection, position, object recognition, posture evaluation, and menace estimation layers. Active vision takes advantage of an "active" observer, "qualitative" metrics, and narrow "purposive" vision. Image understanding can be based on active fusion, which actively selects the sources to be analyzed and fuses the data from different representational levels. The advantages of fusing multi-sensor information are that it can enhance reliability and credibility, increase spatial coverage, augment information quantity and reduce fuzziness, and improve spatial resolution. Information from multiple sensors can be redundant information, complementary information, or cooperative information. The state model, the observation model, and the correlation model are three multi-sensor models based on probability theory.

2. Image Fusion
Image fusion is a particular type of multi-sensor fusion that takes images as the operating objects.
Image fusion consists of three steps: image pre-processing, image registration, and image combination. Image fusion has, from low to high, three layers: the pixel-based fusion layer, the feature-based fusion layer, and the decision-based fusion layer. Two groups of fusion evaluation are subjective evaluation (based on the subjective feeling of observers) and objective evaluation (based on some computable criteria). In subjective evaluation, the precision of image registration, the distribution of the global color appearance, the global contrast of brightness and color, the abundance of texture and color information, and the definition (sharpness) of the fused image are considered. Objective evaluation can be based on statistics of the fused images or on the information quantities contained in the fused images.

3. Pixel-Layer Fusion
Pixel-layer fusion is performed pixel by pixel, for example, by using image arithmetic and logic operations. Typical methods of pixel-layer fusion include weighted average fusion, pyramid fusion, HSI transform fusion, PCA-based fusion, and wavelet transform fusion. Different fusion methods can be combined to increase the performance of fusion techniques; two examples are combining HSI transform fusion with wavelet transform fusion, and combining PCA-based fusion with wavelet transform fusion. In wavelet transform fusion, images are decomposed into different levels; an appropriate level of decomposition can balance the spectral properties and the spatial details in the fused images.

4. Feature-Layer and Decision-Layer Fusions
Some techniques used in feature-layer fusion and decision-layer fusion are common to both; among them, the Bayesian method and evidence reasoning are popular. The Bayesian method considers the multi-sensor decision as a partition of the sample space and solves the decision problem by using Bayesian conditional probability. Evidence reasoning adopts half-additivity to treat the case where the reliabilities of two opposite propositions are both small. The rough set theory is a mathematical theory; it can analyze complementary information and compress redundant information to avoid combinatorial explosion.

SUMMARY (CHAPTER 18)

1. Feature-Based Image Retrieval
Typical features used in feature-based image retrieval include the color and texture of an image, the shape of objects, the spatial relationships among objects, and the structure of each object. Color features can be matched using histogram intersection, histogram distance, central moments, or reference color tables (a minimal histogram-intersection sketch is given further below). There are many texture features used to describe images; they can be combined into texture vectors for feature-based retrieval. Shapes can be described, in the image space domain, by either boundary/contour-based descriptors or region-based descriptors. Shape descriptors can also be defined in the transform domain; one of them is based on the wavelet modulus maxima and their invariant moments.

2. Motion-Feature-Based Video Retrieval
Motion features are unique to video. Two types of motion feature can be distinguished: the global motion feature and the local motion feature. Global motion is also called camera motion, as it is caused by the manipulation of the camera; it is characterized by the mass movement of all points in a frame and can be modeled by a 2-D motion vector. Local motion is also called object motion, as it is caused by the movement of objects in the scene; it can be quite complicated, since many different objects can move differently and each of them should be represented by a vector field.
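The histogram-intersection sketch announced under feature-based retrieval above (Python/NumPy; a minimal, illustrative form of the bin-wise minimum measure, normalized by the model histogram):

```python
import numpy as np

def histogram_intersection(query_hist, model_hist):
    """Histogram intersection for color matching: sum the bin-wise minima of
    the two histograms and normalize by the model histogram; larger values
    mean a better match (1.0 for identical normalized histograms)."""
    return np.minimum(query_hist, model_hist).sum() / model_hist.sum()
```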
3. Object-Based Retrieval
Feature-based image retrieval is based on low-level image features, while human beings often describe the content of an image in high-level semantic terms; there is a semantic gap between the two. To bridge the semantic gap, an image is analyzed hierarchically and progressively based on a multi-layer description model. A typical multi-layer description model consists of four layers: the original image layer, the meaningful region layer, the visual perception layer, and the object layer. In object-based retrieval, object recognition should be performed in the object layer; once the objects are recognized, the retrieval can be conducted by object matching and/or object relation matching.

4. Video Analysis and Retrieval
Video can be considered an extension of images. Many types of video programs exist; three of them, news programs, sport matches, and home videos, are discussed. A news program has a relatively fixed, regular structure: it is composed of a sequence of news items, each started by an anchor shot. A three-step detection method for anchor shots includes the detection of all main speaker close-ups (MSC), the clustering and listing of the MSC, and the analysis of the time distribution of shots. Sport match videos are structured (the matches are limited either in space or in time), with particularities for each sport (different limits). The most interesting aspects of a sport match are its climactic events; detecting and retrieving these events is often the goal of sport match video analysis. Detecting and tracking balls and players, and ranking scenes according to the motion of balls and players, are the essential steps in retrieving the highlight shots of, for example, table tennis. Home video can have unrestricted content and equally important shots; consequently, discovering the scene structure above the shot level plays a key role in home video analysis. To organize the shots in a home video, it is important to detect motion attention regions and to treat the attention regions and the remaining regions separately.