
SUMMARY (CHAPTER 1)
1. Basic Concepts of Images
Images can be obtained using different observing and capturing systems
from the real world in various forms and manners.
A 2-D image is the projection result of a 3-D scene, and can be represented
by a 2-D array f (x, y).
In a 2-D digital image, x, y, and f will all take values in an integer set.
The gray-levels of an image represent its brightness: higher levels correspond to
brighter pixels, while lower levels correspond to darker pixels.
A general image representation function is a vector function f (x, y, z, t, λ)
with five variables.
2. Image Engineering
Image Engineering (IE) includes three related layers of image techniques:
Image Processing (IP), Image Analysis (IA), and Image Understanding
(IU).
Image processing concerns the manipulation of an image to produce
another (improved) image.
Image analysis concerns the extraction of information from an image.
Image understanding concerns the interpretation of the original scene and
the decision and actions according to the interpretation of the images.
Image engineering is related to a number of disciplines. The closest ones
include Computer Graphics (CG), Pattern Recognition (PR), and
Computer Vision (CV).
Based on a survey of image engineering literature, more than 20 subcategories of image engineering can be defined.
3. Organization and Overview of the Book
According to the general framework of image engineering, four parts are
identified in this book: image fundamentals, image processing, image
analysis, and image understanding.
SUMMARY (CHAPTER 2)
1. Spatial Relationship in Image Formation
Perspective transformation describes the process of the projection of a 3-D
scene on to a 2-D image.
Three coordinate systems are involved in image capturing by a camera:
The world coordinate system, the camera coordinate system, and the
image coordinate system. In the general camera model, all three systems
can be separated.
Using the homogeneous coordinates, the transformations for a perspective
projection can be expressed in a linear matrix form.
Inverse projection means to determine the coordinates of a 3-D point
according to its projection on to a 2-D image.
Perspective transformation is composed of translation, rotation, and
projection in the general case.
2. Image Brightness
Radiometry measures the energy of electromagnetic radiation, while
photometry measures the energy of visible light as perceived by the human eye.
A point source is sufficiently small in scale and/or sufficiently distant from the
observer that the eye cannot identify its form.
An extended source has a finite emitting surface and is therefore a more realistic model.
Brightness specifies the quantity of light emitted from a light source while
illumination specifies the quantity of light received by a surface, which is
illuminated by the light source.
As the intensity of an image is a measurement of the radiation energy, its
value should be non-zero and finite.
The brightness of an image is proportional to the illumination component
(the amount of light incident on the viewing scene) and the reflection
component (the amount of light reflected by the objects in the scene).
3. Sampling and Quantization
An image f (x, y) must be digitized both in space and in amplitude to be
processed by computers. Sampling is the process of digitization of the
spatial coordinates (x, y), and quantization is the process of digitization of
the amplitude f .
The number of bits needed to store an image is the product of the number
of bits to represent the image intensity and the image width and height.
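As a simple illustration, the storage requirement follows directly from this product; a minimal Python sketch (the example values are illustrative, not from the text):

```python
def image_storage_bits(width, height, bits_per_pixel):
    """Number of bits needed to store an uncompressed digital image."""
    return width * height * bits_per_pixel

# Example: a 512 x 512 image with 8-bit gray-levels
# needs 512 * 512 * 8 = 2,097,152 bits (256 KB).
print(image_storage_bits(512, 512, 8))
```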
The influence of sampling on the quality of an image is marked by a
checkerboard effect with graininess, especially around the region
boundary in images.
The influence of quantization on the quality of an image is characterized
by the appearance of ridge-like structures and false contouring around the
areas with smooth gray-levels.
The Shannon sampling theorem states that it is possible to completely
recover the original frequency-limited function from its samples.
However, for any signal and its associated Fourier spectrum, the signal
can be limited in extent (space-limited) or the spectrum can be limited in
extent (band-limited) but not both.
Applying the sampling theorem in a proper way is impossible in practice, due to
either a finite camera aperture, the use of a finite amount of data, or both.
4. Stereo Imaging
Stereo imaging uses two or more cameras for capturing the depth
information that is often lost in a normal image acquisition process.
The parallel horizontal model is the most popular model used in stereo
imaging. Other models include the angular-scanning model, the focused
horizontal model, and the axis model.
In stereo imaging, the disparity between different cameras is the essential
quantity to be computed. The depth information can be derived from the
disparity by triangulation.
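As a sketch of the triangulation step under the parallel horizontal model, depth follows Z = f B / d, where B is the baseline between the cameras, f the focal length, and d the disparity (a standard relation; the numeric values below are hypothetical):

```python
def depth_from_disparity(focal_length, baseline, disparity):
    """Depth by triangulation in the parallel horizontal stereo model.

    Z = f * B / d, where d is the disparity between the two image
    coordinates of the same scene point (values must use consistent units,
    e.g. pixels for f and d, meters for B and Z).
    """
    if disparity == 0:
        raise ValueError("zero disparity corresponds to a point at infinity")
    return focal_length * baseline / disparity

# A point with 4-pixel disparity, f = 800 pixels, B = 0.1 m -> Z = 20 m.
print(depth_from_disparity(800.0, 0.1, 4.0))
```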
SUMMARY (CHAPTER 3)
1. Relationships between Pixels
Neighbors of a pixel p at coordinates (x, y) consist of the set of pixels that
are closest to p. Two common neighborhoods are 4-neighbors of p: N4(p)
and 8-neighbors of p: N8(p).
If pixel q is a neighbor of pixel p, p and q are adjacent. Adjacency depends
on the number of neighbors.
Connectivity is more general than adjacency. Connectivity counts both
spatial relationships between pixels and the pixel properties.
A connected component of an image forms a region inside the image, in
which any two pixels are connected via the connectivity of pixels inside
the region.
The definitions of neighbor, neighborhood, adjacency, connectivity, and
connected component for the 2-D case can be easily extended to the case
of 3-D or even higher dimensions.
2. Distances
Distance measures how far apart two pixels are in an image.
In a digital image, instead of Euclidean distance, city-block distance and
chessboard distance are generally used.
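A minimal sketch of these two digital distances, with the Euclidean distance shown for comparison (function names are illustrative):

```python
def d4(p, q):
    """City-block (D4) distance between pixels p = (x1, y1) and q = (x2, y2)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d8(p, q):
    """Chessboard (D8) distance."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def de(p, q):
    """Euclidean distance, for comparison."""
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

p, q = (0, 0), (3, 4)
print(d4(p, q), d8(p, q), de(p, q))   # 7 4 5.0
```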
The 4-neighbors and 8-neighbors of a pixel can also be defined using cityblock distance and chessboard distance, respectively.
Using the knight-distance measurement, an even wider neighborhood of a
pixel can be defined.
A distance disc centered at a pixel consists of all surrounding pixels whose
distance to that pixel does not exceed a given value.
The chamfer distance is an integer approximation of the Euclidean
distance. The element for computing the chamfer distance is the move, which
measures the length of the path from one pixel to another.
The Farey sequence provides a suitable way to extend the move in the
chamfer distance to high orders.
3. Image Coordinate Transformations
Common coordinate transformations include translation, rotation, and
scaling. In a 3-D space, all these transformation can be represented by a 4
× 4 transformation matrix.
Various coordinate transformations can be combined by cascading. Since
matrix multiplication is generally not commutative, the result of
cascading depends on the order of the transformations.
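A minimal Python sketch of this order dependence, using 4 × 4 homogeneous matrices for translation and rotation about the z-axis (the specific transforms and point are illustrative):

```python
import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

p = np.array([1.0, 0.0, 0.0, 1.0])                 # homogeneous 3-D point
A = rotation_z(np.pi / 2) @ translation(1, 0, 0)   # translate, then rotate
B = translation(1, 0, 0) @ rotation_z(np.pi / 2)   # rotate, then translate
print(A @ p)   # (0, 2, 0, 1): differs from B @ p, so cascading order matters
print(B @ p)   # (1, 1, 0, 1)
```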
4. Distance Transforms
Distance transform is a special transform that maps a binary image into a
gray-level image. It takes the value of the distance from a pixel to a
reference point as the gray-level value of that pixel.
Distance transform is a global operation, but it can be computed locally by
using a small mask whose size should be an odd value. This process can
be implemented either sequentially or in parallel.
In the sequential process, the mask is divided into two symmetric sub-masks.
One forward pass and one backward pass are performed using the respective sub-masks.
In the parallel process, the propagation of distance value from the
boundary to the center is performed for all pixels in the mask iteratively.
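A minimal sketch of the sequential two-pass process described above, for the city-block distance on a binary image (the function and argument names are illustrative):

```python
import numpy as np

def distance_transform_d4(binary):
    """Two-pass (forward/backward) city-block distance transform.

    `binary` is a 2-D boolean array where True marks reference pixels; the
    result holds each pixel's D4 distance to the nearest reference pixel.
    """
    big = binary.shape[0] * binary.shape[1]        # effectively "infinity"
    d = np.where(binary, 0, big).astype(int)
    rows, cols = d.shape
    # forward pass: upper-left sub-mask
    for i in range(rows):
        for j in range(cols):
            if i > 0:
                d[i, j] = min(d[i, j], d[i - 1, j] + 1)
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j - 1] + 1)
    # backward pass: lower-right sub-mask
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if i < rows - 1:
                d[i, j] = min(d[i, j], d[i + 1, j] + 1)
            if j < cols - 1:
                d[i, j] = min(d[i, j], d[i, j + 1] + 1)
    return d
```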
5. Geometric Transformations
Geometric transformation corrects the geometric distortion of an image. It
consists of two steps: spatial transformation and gray-level interpolation.
Spatial transformation rearranges pixels in the distorted image to recover
the original spatial relation between pixels.
Gray-level interpolation assigns values to pixels in the geometrically
corrected image, according to the values in the distorted image, to get
back the original properties of images.
Various gray-level interpolation schemes exist. They range from zero
order to high order with increased accuracy and computational costs.
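For instance, a first-order (bilinear) scheme interpolates from the four surrounding pixels; a minimal sketch (array layout and names are assumptions for illustration):

```python
import numpy as np

def bilinear_interpolate(img, x, y):
    """First-order (bilinear) gray-level interpolation at a non-integer
    position (x, y), where x indexes columns and y indexes rows."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    a, b = x - x0, y - y0
    return ((1 - a) * (1 - b) * img[y0, x0] + a * (1 - b) * img[y0, x1]
            + (1 - a) * b * img[y1, x0] + a * b * img[y1, x1])
```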
6. General Morphic Transform
Morphic transformations are a group of transformations that map a plane
to another plane.
The projective transformation, the affine transformation, the similarity
transformation, and the isometric transformation (including the rigid-body
transformation and the Euclidean transformation) form a hierarchy
of morphic transformations.
A projective transformation determines the coordinate transformations in
the projection. It is specified by eight parameters.
An affine transformation is a non-singular linear transformation followed
by a translation. It has six degrees of freedom.
A similarity transformation is an equi-form transformation. It has four
degrees of freedom.
An isometric transformation has three degrees of freedom.
SUMMARY (CHAPTER 4)
1. Separable and Orthogonal Transforms
Separable transforms between functions f (x, y) and T (u, v) are made with
the forward transformation kernel and the inverse transformation kernel,
respectively. These two kernels depend only on the indexes x, y, u, and v,
but not on the values of f (x, y) or T (u, v).
A 2-D transform with a separable kernel can be computed in two steps,
each requiring a 1-D transform.
If A is a real matrix and A−1 = AT, then the matrix A is an orthogonal
matrix and the corresponding transform is called the orthogonal
transform.
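A minimal sketch of how a separable, symmetric kernel reduces the 2-D transform to two 1-D transforms in matrix form (notation chosen here for illustration):

```python
import numpy as np

def separable_transform_2d(f, A):
    """2-D transform with a separable, symmetric kernel, computed as two
    1-D transforms: first along the columns, then along the rows.

    In matrix form this is T = A f A^T; for an orthogonal A (A^{-1} = A^T)
    the inverse transform is f = A^T T A.
    """
    return A @ f @ A.T

def inverse_separable_transform_2d(T, A):
    """Inverse transform, assuming A is orthogonal."""
    return A.T @ T @ A
```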
2. The Fourier Transform
The Fourier transform is a particular separable and symmetric transform.
Both the Fourier transform and its inverse transform are separable and
symmetric.
Typical theorems for the 2-D Fourier transform include shift, rotation,
scale, shear, and the affine transform.
3. Walsh and Hadamard Transforms
Both the Walsh transform and the Hadamard transform are separable,
symmetric, and orthogonal.
The kernel of the Walsh transform can be considered a set of basis
functions.
The Hadamard transformation matrices can be generated with the help of
a simple recursive relationship.
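A minimal sketch of that recursive relationship, H_2n = [[H_n, H_n], [H_n, -H_n]] (unnormalized matrices; the function name is illustrative):

```python
import numpy as np

def hadamard(n):
    """Hadamard matrix of order n (n a power of 2), built from the
    recursive relationship H_2n = [[H_n, H_n], [H_n, -H_n]]."""
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

print(hadamard(4))
```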
4. Discrete Cosine Transform
The discrete cosine transform (DCT) is a separable, symmetric, and
orthogonal transform. Its computation can be realized using the real part
of the discrete Fourier transform (DFT).
The kernels of the DCT are real cosine functions.
Cosine functions are even functions, so the DCT has implicitly a 2N-point
periodicity.
5. Gabor Transform
The short-time Fourier transform (STFT) of a function f (t) with respect to
the window function r(t), evaluated at the location (b, v) in the time-frequency
plane, gives the approximate spectrum of f near t = b.
The Gabor transform uses the Gaussian function as the window function.
Since the Gaussian function has the minimum size of a time-frequency
window, it has the highest concentration of energy in the t -f plane.
In real applications, the Gabor transform of images is computed for a
number of scales and a number of orientations.
6. Wavelet Transform
One function can be represented by a linear combination of real-valued
expansion functions weighted by real-valued expansion coefficients.
Any function f (x) can be decomposed into two parts: one approximates f
(x) obtained by using the scaling functions and the other is a difference,
which can be represented by the sum of the wavelet functions.
Two typical wavelet transforms are wavelet series expansion, which maps
continuous functions to a sequence of expansion coefficients, and discrete
wavelet transform, which converts a sequence of data to a sequence of
coefficients.
2-D wavelet transform involves one 2-D scaling function and three
wavelet functions, which are the product of a 1-D separable scaling
function with corresponding 1-D separable and directionally sensitive
wavelet functions.
Wavelet decomposition of an image can be implemented in different
ways: the pyramid-structured decomposition, the non-complete tree-structured
decomposition, and the complete tree-structured decomposition.
7. The Hotelling Transform
The Hotelling transform is computed using the mean vector and the
orthonormal eigenvectors of the covariance matrix.
The Hotelling transform forms a new coordinate system, in which the
principal axes are aligned with the eigenvectors by rotation.
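A minimal sketch of the computation from the mean vector and the covariance eigenvectors (data layout and names are assumptions for illustration):

```python
import numpy as np

def hotelling_transform(X):
    """Hotelling (principal component) transform of data vectors.

    X is an (n_samples, n_dims) array; each row is one vector. The new
    coordinates are obtained by subtracting the mean vector and projecting
    onto the orthonormal eigenvectors of the covariance matrix.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # principal axes first
    A = eigvecs[:, order].T                  # rows are eigenvectors
    return (X - mean) @ A.T, mean, A         # y = A (x - mean)
```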
8. Radon Transform
The Radon transform of f (x, y) is defined as the line integral of f (x, y) for
all lines defined by p = x cos θ + y sin θ.
The 2-D Fourier transform of f (x, y) is equivalent to the Radon transform
of f (x, y) followed by a 1-D Fourier transform on the variable p.
Basic theorems of the Radon transform include linearity, similarity,
symmetry, shifting, differentiation, and convolution.
Inversion of the Radon transform yields information about an object in the
image space when a probe has been used to produce the projection data.
SUMMARY (CHAPTER 5)
1. Image Operations
Image operation uses images as operands. It works pixel-by-pixel.
Arithmetic operations include addition, subtraction, multiplication, and
division.
The basic logic operations include COMPLEMENT, AND, OR, and XOR.
Image operations can be used for noise removing, motion detection, edge
detection, etc.
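For example, motion detection can be sketched with a simple image subtraction; a minimal Python illustration (the threshold value is an arbitrary assumption):

```python
import numpy as np

def motion_mask(frame_a, frame_b, threshold=20):
    """Simple motion detection by image subtraction: pixels whose absolute
    gray-level difference between two frames exceeds a threshold are
    marked as moving (the threshold value here is illustrative)."""
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int))
    return diff > threshold
```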
2. Direct Gray-Level Mapping
Gray-level mapping changes the visual impression of an image by
changing the gray-level value of each pixel (a point-based operation)
according to certain transform functions.
Design of the mapping law should depend on enhancement requirements.
3. Histogram Transforms
The histogram of an image is a statistical representation of the image. A
gray-level histogram provides the statistics (the number of pixels for each
gray-level) of the image.
The cumulative histogram of an image is a partial summation of the
histogram of this image.
Histogram equalization tries to make the histogram more evenly
distributed, so that the dynamic range and the contrast of the image can
be increased automatically.
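A minimal sketch of equalization via the cumulative histogram, assuming an unsigned-integer gray-level image (names and the 256-level default are illustrative):

```python
import numpy as np

def equalize_histogram(img, levels=256):
    """Histogram equalization of a gray-level image via the cumulative
    histogram: each level r is mapped to s = (L - 1) * CDF(r), rounded."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum() / img.size
    mapping = np.round((levels - 1) * cdf).astype(img.dtype)
    return mapping[img]
```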
Histogram specification transforms the histogram into a specific form to
enhance the image in a pre-defined manner. Histogram equalization can
be considered a special case of histogram specification.
Compared with the single mapping law (SML), the group mapping law (GML) is less
biased in mapping, produces smaller mapping differences, and has smaller expected errors.
4. Frequency Filtering
By the Fourier transform, an image can be represented using various
frequency components. Enhancement in the frequency domain can be
achieved by removing or keeping the specified frequency components.
Low-pass filtering will keep low-frequency components and remove high-frequency
components. The boundary frequency is called the cutoff frequency.
High-pass filtering will keep high-frequency components and remove
low-frequency components.
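A minimal sketch of this idea with an ideal low-pass filter applied in the Fourier domain (the cutoff value and names are illustrative):

```python
import numpy as np

def ideal_lowpass(img, cutoff):
    """Ideal low-pass filtering: keep frequency components whose distance
    from the spectrum center is below the cutoff frequency, remove the rest."""
    F = np.fft.fftshift(np.fft.fft2(img))
    rows, cols = img.shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    D = np.sqrt(u[:, None] ** 2 + v[None, :] ** 2)
    F[D > cutoff] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))
```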
5. Linear Spatial Filtering
Enhancement in the image domain can be based on the property of the
pixels and the relations among the neighbors.
The relations among the pixels are often represented by masks.
The filtering process is carried out by mask convolution, which performs a
linear combination of the computation results.
The simplest smoothing filter takes the average value of the neighboring
pixels as the output of the mask. An improvement is to emphasize the
importance of each mask element by using a specified weight.
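A minimal sketch of such a weighted averaging mask and its convolution with the image (the particular weights are a common illustrative choice, not prescribed by the text):

```python
import numpy as np
from scipy.ndimage import convolve

# A 3 x 3 weighted averaging mask that emphasizes the central pixel;
# the weights sum to 16, so the result is normalized by that sum.
mask = np.array([[1, 2, 1],
                 [2, 4, 2],
                 [1, 2, 1]]) / 16.0

def smooth(img):
    """Linear spatial smoothing by mask convolution."""
    return convolve(img.astype(float), mask, mode='nearest')
```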
6. Non-Linear Spatial Filtering
A non-linear filter provides a logic combination of the computation results
in a neighborhood.
The median filter is a popularly used non-linear filter. It replaces the value
of a pixel by the median of the gray-levels in the neighborhood of that pixel.
Order-statistic filters are a group of non-linear filters; the median filter is
one example. Typical examples also include the max and min filters, as well as
the midpoint filter.
7. Color Image Enhancement
Each color can be represented as a point in the color space (color model).
Hardware-oriented color models are used mainly for color monitors,
printers, etc. Typical examples are the RGB model and the CMY model.
Perception-oriented models are more suitable for describing colors in
terms of human interpretation, and are used mainly for image processing.
Typical examples include the HSI model and the L*a*b* model.
Pseudo-color enhancements consist of assigning different colors to
different gray-levels in an image to emphasize their differences.
Full color images can be enhanced either by processing each color
component individually or by processing the pixel magnitude of the color
image as a vector.
SUMMARY (CHAPTER 6)
1. Degradation and Noise
Images can be degraded in many ways and in various steps during image
acquisition and operations.
Noise is one of the most popular sources of degradation, which is, in
general, considered to be the disturbing/annoying signals of the required
signals.
Signal-to-noise ratio (SNR) is a useful indication of the image quality with
the presence of noise.
To describe the statistical behavior of the noise component of an image,
the probability density function (PDF) is used. Typical examples include
Gaussian noise, uniform noise, and impulse (salt-and-pepper) noise.
2. Degradation Model and Restoration Computation
In a simple model of image degradation, an operator H, which acts on the
input image f (x, y) and an additive noise n(x, y) jointly produce the
degraded image g(x, y).
The properties of a degradation system may include linearity, additivity,
homogeneity, and the position invariance.
The computation of a degradation model can be carried out with the
convolution of the circulant matrix (for a 1-D case) or the block-circulant
matrix (for a 2-D case).
Both a circulant matrix and a block-circulant matrix can be diagonalized.
The effect of the diagonalization is that the degradation model can be
solved with the help of a few discrete Fourier transforms.
3. Techniques for Unconstrained Restoration
In unconstrained restoration, no a priori knowledge about the noise is
assumed. The restoration is carried out in a least squares sense of the
estimation error.
Inverse filtering is a commonly used restoration approach, which can be
implemented in the Fourier domain.
Removal of the blur caused by uniform linear motion is a typical
application of the techniques for the unconstrained restoration in a closed
form.
4. Techniques for Constrained Restoration
In constrained restoration, some a priori knowledge of the noise is used to
constrain the least square computation. The restoration is carried out
using the method of Lagrange multipliers.
Wiener filtering is a statistical method for constrained restoration. It is
based on the correlation matrices of the image and the noise. When there
is no noise, the Wiener filter degrades to the ideal inverse filter.
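A minimal sketch of the Wiener filter in its common frequency-domain form, written here with a scalar noise-to-signal ratio (the parameterization is an illustrative simplification):

```python
import numpy as np

def wiener_filter(G, H, nsr):
    """Frequency-domain Wiener filtering.

    G   : Fourier transform of the degraded image
    H   : Fourier transform of the degradation function
    nsr : noise-to-signal power ratio (scalar or array); when nsr = 0 the
          filter reduces to the ideal inverse filter 1 / H.
    """
    return np.conj(H) / (np.abs(H) ** 2 + nsr) * G
```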
The constrained least square restoration only requires knowledge of the
noise’s mean and variance, but the restoration is optimal for each image.
5. Interactive Restoration
Interactive restoration uses the advantage of human intuition to control
the restoration process.
Interactive restoration is suitable for eliminating the occurrence of a 2-D
sinusoidal interference pattern (coherent noise).
SUMMARY (CHAPTER 7)
1. Introduction
The techniques for reconstruction from projections are different from the
techniques for 3-D reconstruction from depth images, which will be
discussed in later chapters.
There are various modes of reconstruction from projections. Typical
examples are Transmission Computed Tomography (TCT), Emission
Computed Tomography (ECT), including Positron Emission Tomography
(PET) and Single Photon Emission CT (SPECT), Reflection Computed
Tomography (RCT), Magnetic Resonance Imaging (MRI), and Electrical
Impedance Tomography (EIT).
In reconstruction from projections, information about a 2-D region is
collected by a number of line integrations over this region along different
directions. The reconstruction is carried out by solving integration
equations.
3-D reconstruction from projections makes true 3-D imaging possible.
2. Reconstruction by Fourier Inversion
Fourier inversion is a typical reconstruction method among the group of
transform methods. It starts by establishing a continuous model, then
solves this model by using inversion, and finally adapts to the result for
discrete data.
The basis of transform methods is the projection theorem for the Fourier
transform, which relates the 2-D Fourier transform of the original data to
the 1-D Fourier transform of its projections (line integrations).
Fourier inversion follows a typical transform-based approach, which
transforms the data, processes the transformed data, and transforms the
result inversely.
Phantoms are used to study the algorithms for reconstruction from
projection. One commonly used example is the Shepp-Logan head model.
3. Convolution and Back-Projection
The idea behind back-projection is to obtain a distribution from the
inverse process of the projection.
Convolution back-projection is the most widely used technique for back-projection;
it consists of a convolution of the projection data followed by a back-projection process.
Filtering the back-projections is also a technique for back-projection; it
consists of a back-projection process followed by a filtering or convolution
process.
Back-projection of the filtered projections is another technique for back-projection; it consists of a filtering process followed by a back-projection process.
4. Algebraic Reconstruction
The algebraic reconstruction technique (ART) is also called a finite series-expansion reconstruction method.
ART is performed in a discrete space from the beginning, in contrast to
other transform or projection methods which start in a continuous space.
ART without relaxation uses an iterative step to update the attenuation
vector by using one single ray at each time and changing only the pixels
that intersect with this ray.
ART with relaxation is extended from ART without relaxation by adding a
relaxation coefficient that controls the convergence speed of the iterative
process.
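A minimal sketch of a Kaczmarz-style ART iteration consistent with the two variants above (matrix layout and names are assumptions for illustration; relaxation = 1 corresponds to ART without relaxation):

```python
import numpy as np

def art(A, p, n_iter=10, relaxation=1.0):
    """Algebraic reconstruction technique (Kaczmarz-style iteration).

    A : (n_rays, n_pixels) matrix; row i holds the contribution of each
        pixel to ray i.
    p : measured projection value for each ray.
    Each step uses one single ray and changes only the pixels that
    intersect that ray; `relaxation` controls the convergence speed.
    """
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        for i in range(A.shape[0]):
            a = A[i]
            norm = a @ a
            if norm > 0:
                x += relaxation * (p[i] - a @ x) / norm * a
    return x
```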
5. Combined Reconstruction
Different reconstruction methods can be combined to form new methods
for reconstruction.
One typical combined reconstruction method is called the iterative
transform method. It is similar to transform methods in that the algorithm for
discrete data is derived from the reconstruction formula, while its iterative
nature is shared with ART.
SUMMARY (CHAPTER 8)
1. Fundamentals
Data are closely related to information but are different from information.
Data redundancy occurs when some data provide non-essential or already
known information.
There are three basic data redundancies: The coding redundancy, the
inter-pixel redundancy, and the psycho-visual redundancy.
In a lossless coding process, no information is lost. In a lossy coding
process, some information is lost.
To judge the fidelity of the decompressed image to the original image,
objective or subjective criteria can be used.
2. Variable-Length Coding
Variable-length coding is based on statistical theories and is called entropy
coding. It represents high probability events with fewer bits and low
probability events with many bits.
Huffman code can be constructed by iteratively constructing a binary tree
and computing the probability of source symbols.
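A minimal sketch of that construction, merging the two least probable nodes at each step (data structures and the example probabilities are illustrative):

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a Huffman code from a dict {symbol: probability} by repeatedly
    merging the two least probable nodes of a binary tree."""
    tiebreak = count()
    heap = [(prob, next(tiebreak), {sym: ""}) for sym, prob in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

print(huffman_code({"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}))
```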
Sub-optimal Huffman coding sacrifices coding efficiency for simplicity in
code construction.
Shannon-Fano code can be constructed by repeatedly splitting the symbols into
two groups of nearly equal probability and assigning 0 and 1 to the two groups.
Arithmetic coding assigns an entire sequence of the source symbols with a
single arithmetic code word. As a result, it generates non-block codes.
3. Bit-Plane Coding
Bit-plane coding decomposes one gray-level image into a sequence of bit-planes (binary images) and then codes each bit-plane.
A binary code decomposition of an image produces an m-bit image in the
form of m 1-bit bit-planes.
A Gray code decomposition of an image can reduce the effect of small
gray-level variations on each bit-plane. In Gray code, successive code
words differ by only one bit position.
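A minimal sketch of a Gray code bit-plane decomposition, assuming an unsigned-integer image (the Gray code of value v is v XOR (v >> 1)):

```python
import numpy as np

def gray_code_bit_planes(img, bits=8):
    """Decompose a gray-level image into Gray-coded bit-planes.

    Successive Gray code words differ in only one bit position, which
    reduces the effect of small gray-level variations on each bit-plane.
    """
    gray = img ^ (img >> 1)
    return [(gray >> b) & 1 for b in range(bits)]   # plane 0 = least significant
```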
Constant area coding (CAC) divides the image into three types of blocks
and uses 1-bit or 2-bit code words for all white or all black blocks.
In 1-D run-length coding, the white or black runs are coded by their
lengths. The length of the run can be further coded by variable-length
coding. 2-D run-length coding can be extended from 1-D run-length
coding by tracking the binary transitions that begin and end each black
and white run in a previous line.
4. Predictive Coding
Predictive coding only codes the new information in each pixel with
respect to the predictive value of this pixel to reduce or to remove interpixel redundancy.
In lossless predictive coding, the predictor produces the
predictive/estimation value of the current pixel according to the values of
several past input pixels.
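A minimal sketch using the simplest predictor, the value of the previous pixel along a row (the predictor choice is an illustrative assumption):

```python
import numpy as np

def predictive_encode(row):
    """Lossless predictive coding of one image row using the previous-pixel
    predictor. Only the prediction error (the new information) is coded."""
    row = row.astype(int)
    errors = np.empty_like(row)
    errors[0] = row[0]                 # first pixel sent as-is
    errors[1:] = row[1:] - row[:-1]    # prediction error
    return errors

def predictive_decode(errors):
    """Exact reconstruction by accumulating the prediction errors."""
    return np.cumsum(errors)
```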
A system of lossy predictive coding is formed by adding a quantizer into
the system of the lossless predictive coding. The quantizer maps the
prediction error into a limited range of outputs.
In designing the optimal encoder, the encoder’s mean-square prediction
error is minimized to achieve optimal encoding performance.
In designing the optimal quantizer, the mean-square quantization error is
minimized to achieve optimal quantizing performance.
5. Transform Coding
Transform coding is a frequency domain technique, and is lossy in
general.
The encoder of a typical transform coding system consists of four
operation modules: sub-image decomposition, transformation,
quantization, and symbol coding.
The decoder of a typical transform coding system consists of three
operation modules: symbol decoding, inverse transformation, and sub-image merging.
In transform coding, compression is achieved during the quantization step
but not the transform step.
The selection of the transform depends on the coding error allowed and
the computational complexity.
Since wavelet transform is efficient in computation and has inherent local
characteristics, there is no module for sub-image decomposition in
wavelet transform coding systems.
SUMMARY (CHAPTER 9)
1. Definition and Classification
Image segmentation is defined as the process used to sub-divide an image
into its constituent parts and extract those parts of interest (objects).
Image segmentation can be more formally defined with the help of
uniformity predicates and set theory.
The classification of segmentation algorithms can be considered a
partition of a set into subsets.
Image segmentation algorithms can be classified into four groups:
Boundary-based parallel, boundary-based sequential, region-based
parallel, and region-based sequential.
2. Basic Technique Groups
Edge detection is the first step of the boundary-based parallel
segmentation algorithm.
Differential edge detectors (DED), such as the Sobel detector, are widely
employed to detect edges.
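A minimal sketch of Sobel edge detection by mask convolution (the threshold is an arbitrary illustrative parameter):

```python
import numpy as np
from scipy.ndimage import convolve

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
SOBEL_Y = SOBEL_X.T

def sobel_edges(img, threshold):
    """Differential edge detection with the Sobel masks: compute the
    gradient magnitude and keep pixels above a chosen threshold."""
    gx = convolve(img.astype(float), SOBEL_X)
    gy = convolve(img.astype(float), SOBEL_Y)
    magnitude = np.hypot(gx, gy)
    return magnitude > threshold
```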
The Hough transform is a global technique for boundary detection.
The major components of a boundary-based sequential segmentation
algorithm consist of the selection of a good initial point, choosing the
next point according to the dependence structure, and terminating the
process according to a pre-defined criterion.
Graph search and dynamic programming form the basis of a global
approach combining edge detection and edge linking for image
segmentation.
Thresholding is a popular segmentation algorithm, which belongs to the
region-based parallel group.
Determination of appropriate threshold values is the most important task
involved in thresholding techniques. The threshold values can be point-dependent, region-dependent, or coordinate-dependent.
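As one simple point-dependent scheme, a global threshold can be chosen iteratively from the class means; a minimal sketch (this particular selection rule is an illustrative example, not the only option):

```python
import numpy as np

def iterative_threshold(img, eps=0.5):
    """Iteratively set the threshold to the mean of the average gray-levels
    of the two classes it produces, until the threshold stabilizes."""
    t = img.mean()
    while True:
        low, high = img[img <= t], img[img > t]
        if low.size == 0 or high.size == 0:
            return t
        t_new = 0.5 * (low.mean() + high.mean())
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

# The segmentation itself is then: object_mask = img > iterative_threshold(img)
```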
A split, merge, and group approach is a typical region-based sequential
segmentation algorithm, which consists of five phases: initialization,
merging, splitting, conversion, and grouping.
3. Extension and Generalization
The modifications required to extend 2-D algorithms to 3-D applications
depend on the nature of 2-D algorithms.
In extending 2-D differential edge detectors to 3-D, the number of masks
and the size of masks (in counting 6-, 18-, and 26-neighborhoods) should
be modified.
In extending 2-D thresholding techniques to the 3-D case, the primitive of
3-D images, that is, the voxel, should replace the pixel of 2-D images, and 3-D
local operators should replace 2-D local operators.
In extending 2-D SMG algorithm to 3-D, the 2-D quad-tree structure
should be substituted with a 3-D octree structure.
Generalizing pixel-level edge detection schemes to detect sub-pixel edges
can be accomplished by using statistical information along either the normal
direction or the tangent direction.
The basic Hough transform technique for detecting curves with closed-form
expressions can be generalized to detect curves of any form by using an R-table.
4. Segmentation Evaluation
The development of segmentation techniques has traditionally been an ad
hoc and problem-oriented process, as no general theory exists for
segmentation.
Segmentation algorithms can be evaluated analytically or empirically. The
methods used in the latter approach can still be classified as the goodness
methods and the discrepancy methods.
Many criteria for segmentation evaluation have been proposed. They can
be classified according to the groups of evaluation methods.
A general evaluation framework consists of three related modules: Image
generation, algorithm testing, and performance assessment.
Systematic comparison of various segmentation evaluation methods is one
level up for the segmentation evaluation, and can be made either on the
evaluation group or on the single evaluation methods.
Experimental comparison of some empirical evaluation methods shows
that the discrepancy methods are better than the goodness methods in
segmentation evaluation.
SUMMARY (CHAPTER 10)
1. Classification of Representation and Description
Representation can be external or internal. The former focuses on shape
characteristics while the latter focuses on region properties.
Description expresses the characteristics of an object region based on its
representation. The properties of the region can be obtained via the
computation of descriptors.
Both representation and description schemes can be divided into
boundary-based, region-based, and transform-based representations.
2. Boundary-Based Representation
Boundary-based representation approaches can be divided into
parametric boundaries, a set of boundary points, and a curve
approximation.
Chain codes, boundary segments, polygonal approximation, signatures,
and landmarks are frequently used techniques for boundary-based
representation.
3. Region-Based Representation
Region-based representation approaches can be divided into region
decomposition, bounding regions, and internal features.
Bounding regions, quad-trees, pyramids, and skeletons are frequently
used techniques for region-based representation.
4. Transform-Based Representation
In transform-based approaches, some transforms are used to represent the
shape in terms of the transform coefficients.
The Fourier transform-based approach is a basic technique for boundary-based representation in the frequency domain.
Fourier descriptors of the boundary, for different geometric changes,
follow some simple transformation rules.
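A minimal sketch of computing Fourier descriptors from boundary points treated as complex numbers (the data layout is an assumption for illustration):

```python
import numpy as np

def fourier_descriptors(boundary):
    """Fourier descriptors of a closed boundary given as an (N, 2) array of
    (x, y) points: each point is treated as the complex number x + jy and
    the descriptors are the DFT coefficients of that sequence."""
    s = boundary[:, 0] + 1j * boundary[:, 1]
    return np.fft.fft(s)

def boundary_from_descriptors(d):
    """The inverse transform recovers the boundary points."""
    z = np.fft.ifft(d)
    return np.column_stack([z.real, z.imag])
```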
5. Descriptors for Boundary
Boundary length, boundary diameter, and boundary curvature are some
simple descriptors for boundary.
Shape number is based on chain-code representation. It is the smallest
value of the first difference of chain codes.
The boundary moment of an object is computed from the segments
composing the boundary.
6. Descriptors for Region
There are various region properties, such as the geometric property, shape
property, topological property, and intensity property.
The area, centroid, and intensity of an object region are some basic
descriptors for regions.
The seven moments that are invariant to translation, rotation, and scaling
are composed of normalized second and third central moments.
Topological descriptors are independent of distance measurements, so they are
unaffected by deformations of object regions, as long as the regions are not torn apart or joined.
SUMMARY (CHAPTER 11)
1. Direct and Indirect Measurements
Direct measurement quantifies directly the property of objects. The
measurement can be made on an image in general or on the object in
particular.
Derived measurements are indirect measurements as they are derived
from direct measurements by combination. The number of combinations
can be unlimited.
Measurement combinations are based on combining metrics to form a
new metric.
2. Accuracy and Precision
Accuracy is defined as the agreement between the measurement and some
objective standard taken as the “truth.” High accuracy shows a tendency
toward unbiasedness.
Precision is defined in terms of repeatability, the ability of the
measurement process to duplicate the same measurement and produce
the same result. High precision in measurement provides highly efficient
measurements.
Accuracy and precision in measurements are largely independent. A
highly precise but inaccurate measurement is generally useless.
The statistical error is associated with precision as it describes the scatter
of the measured value for repeated measurements.
The systematic error is associated with accuracy as it is indicated by the
difference between the true value and the average of the measured values.
3. Two Types of Connectivity
In a square lattice, 4-direction connectivity and 8-direction connectivity are
commonly used.
In image measurement, the internal pixels and boundary pixels should be
judged with different types of connectivity to avoid ambiguity.
Similar statements can also be made for object points and background points,
for 4-connected components and 8-connected arcs, for 4-connected components
and 8-connected curves, and for open sets and closed sets.
4. Feature Measurement Error
Along the process from scene to data, many factors will influence the
accuracy of measurements.
Important factors that make the real data and estimated data different
include the natural variation of the scene, the image acquisition process,
the image processing and analysis procedures, the different measurement
practices and feature computation formulas, as well as noise and other
degradations.
The applicability of the sampling theorem cannot be guaranteed in image
analysis, so the sampling theorem is not a proper reference for choosing a
sampling density.
Segmentation is the basis for feature measurement, so the quality of image
segmentation directly influences the accuracy of feature measurements.
This influence depends also on the types of measurements, that is, the
features to be measured.
The computation formulas used in indirect measurements are also critical
for an accurate and precise feature measurement.
Different influence factors can have combined effects on feature
measurements.
5. Error Analysis
For a given feature measurement task, there would be an upper bound
and a lower bound for the error.
Some approximation errors can be analyzed in closed forms, though most
of them can only be obtained by numerical computations.
SUMMARY (CHAPTER 12)
1. Concepts and Classification
Texture is a word rich in meanings but having no unique formal
definition.
The perception of texture depends on the scale at which the texture is
viewed. Two types of textures can be distinguished based on scales. They
are micro-texture and macro-texture.
Research on texture can be classified into four groups: texture description,
texture segmentation, texture synthesis, and shape from texture.
Approaches to texture analysis can be classified into three categories:
statistical techniques, structural techniques, and spectral techniques.
2. Statistical Approaches
The statistical approach for texture analysis uses the statistical property of
the whole texture (not the element) to describe and classify texture
patterns.
The co-occurrence matrix of gray-levels reflects the spatial distribution of
pixels with different gray-levels, and thus is the basis for many texture
descriptors.
Laws’ texture energy is computed by using fixed-size windows. 1-D
masks are first defined, then 2-D convolution is performed to give an
energy measurement of a local texture pattern.
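A minimal sketch of that procedure, shown with two of Laws’ standard 1-D masks (the specific mask pair and window size are illustrative choices):

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

# Two of Laws' 1-D masks (level and edge); spot and ripple masks are
# formed in the same way.
L5 = np.array([1, 4, 6, 4, 1], dtype=float)
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)

def laws_energy(img, row_mask=L5, col_mask=E5, window=15):
    """Laws texture energy: build a 2-D mask as the outer product of two
    1-D masks, convolve it with the image, and average the absolute
    response over a fixed-size window."""
    mask2d = np.outer(row_mask, col_mask)
    response = convolve(img.astype(float), mask2d)
    return uniform_filter(np.abs(response), size=window)
```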
3. Structural Approaches
Structural approach for texture analysis uses a two-layered structure. The
first layer characterizes the gray-level primitive to specify the local
properties, and the second layer describes the organization of primitive
elements with some arrangement rules.
A texture element is a connected component of an image. Examples range
from the simplest pixel to a more complex neighborhood of a pixel.
Arrangement rules can be defined by specific language/grammar, which
specify what variables can be replaced by certain elements and variables.
One typical structural approach tessellates texture patterns on the plane
in a certain order.
4. Spectral Approaches
Spectral approaches for texture analysis transform the original image into
spectral spaces and describes texture property using texture attributes in
new spaces.
The peaks of the Fourier spectrum can provide the principal direction of the
texture patterns and the fundamental spatial period of the pattern.
Many texture descriptors based on the Bessel-Fourier function can be
obtained. They can indicate many properties of textures, such as rotational
and translational symmetry, coarseness, contrast, roughness, regularity,
etc.
By using a set of 2-D Gabor filters, an image can be decomposed into a
sequence of frequency bands. Further texture descriptions can be
conducted from these bands.
5. Texture Categorization
Three categories of textures can be distinguished: globally ordered
textures, locally ordered textures, and disordered textures.
Locally ordered textures can be modeled neither statistically nor
structurally. They can be analyzed by a combination of two types of
images: The angle image and the coherence image.
A texture can be composed from primitive textures in three ways: by linear
combination, by functional combination, and by opaque overlap.
6. Texture Segmentation
Texture segmentation aims to separate regions with different textures. As
the texture is defined in a neighborhood, the result of texture
segmentation has a resolution of a group of pixels.
Techniques for texture segmentation can be divided into supervised and
unsupervised ones. Both of them perform texture classifications.
In supervised texture segmentation, two steps, the pre-segmentation and
post-segmentation steps, are needed.
In unsupervised texture segmentation, the number of texture types is not
known a priori, and to determine the number of texture types is a key task.
SUMMARY (CHAPTER 13)
1. Definitions and Tasks
A shape means the shape of an entity in the real world or the shape of a
region in an image.
An object’s shape involves all aspects of an object, except its position,
orientation, and size. A shape in an image involves all aspects, except its
mean gray-level and intensity scale factor.
An operable definition of shape takes an object’s shape as the pattern
formed by the points on the boundary of this object.
Shape analysis involves shape pre-processing, shape representation and
descriptions, and shape classifications.
2. Different Classes of 2-D Shape
A classification of planar shapes forms a tree-like structure.
Shapes are categorized as thin and thick shapes. Thin shapes are further
categorized as consisting of a single curve or of composed parametric curves.
The single curve shapes are classified as being open or closed, smooth or
not, Jordan or not, and regular or not.
3. Shape Property Description
Shape property description consists of describing one property of a shape
by using different techniques/descriptors.
To describe the compactness property, descriptors such as form factor,
sphericity, circularity, and eccentricity can be used.
To describe the complexity property, descriptors such as area to perimeter
ratio, thinness ratio, rectangularity, mean distance to the boundary, and
temperature can be used.
4. Technique-Based Descriptors
Technique-based descriptors are a set of descriptors belonging to the same
technique group.
Many shape descriptors are based on the polygonal approximation
representation. The shape number and the shape signature are two of
them.
The curvature can be extracted from the contour of an object.
Direct or indirect measurements of curvature capture many geometric
aspects of object contours.
The computation of discrete curvatures needs some approximation
techniques. The order of a curvature is an index of the approximation
order.
Descriptors based on curvature computations include curvature statistics;
maxima, minima, and inflection points; symmetry measurement and
bending energy.
5. Wavelet Boundary Descriptors
Wavelet boundary descriptors can be normalized for both translation and
scale in-variances.
The two important properties of wavelet boundary descriptors are their
ability to represent a boundary and their measurable ability for similarity
between two boundaries.
Compared to the Fourier boundary descriptor, the wavelet boundary
descriptor is less affected by local deformations of a boundary, and describes
the boundary more precisely with fewer coefficients.
6. Fractal Geometry
The topological dimension corresponds to the number of degrees of
freedom of a point in space. The fractal dimension is often larger than its
topological dimension for a complex point set.
The box-counting approach is a practical method used to estimate the
fractal dimensions in practice.
Experimentally obtained diagrams for the box-counting method have
three distinct regions depending on the scales, from a non-fractal region,
via a fractal region, to a region with zero dimensions.
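A minimal sketch of the box-counting estimate for a 2-D point set, fitting log N(s) against log(1/s) over several box sizes (the function and scale choices are illustrative):

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Estimate the fractal (box-counting) dimension of a 2-D point set:
    count the boxes of side s needed to cover the set at several scales
    and fit log N(s) against log(1/s)."""
    counts = []
    for s in scales:
        boxes = set(map(tuple, np.floor(points / s).astype(int)))
        counts.append(len(boxes))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(scales)), np.log(counts), 1)
    return slope
```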
SUMMARY (CHAPTER 14)
1. Modules of Stereo Vision
A complete stereo vision system must perform six tasks: camera
calibration, image capture, feature extraction, stereo matching, 3-D
information recovery, and post-processing.
2. Region-Based Binocular Matching
In stereo vision, more than one image is used. Stereo matching consists of
establishing the correspondence among these images.
The basic model of stereo vision is the parallel horizontal model that uses
binocular images. The techniques used for this case can be easily extended
to other stereo models.
Region-based matching uses gray-level correlation to determine the
correspondence. A simple and basic example is template matching.
In region-based matching, a number of constraints, such as compatibility
constraints, uniqueness constraints, continuity constraints, and epipolar
line constraints, can be used to reduce computational complexity.
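As a sketch of how these ideas combine, the snippet below matches one pixel by comparing gray-level windows along the same row (the epipolar line under the parallel horizontal model) and picking the disparity with the smallest sum of squared differences; window size and disparity range are illustrative assumptions:

```python
import numpy as np

def match_along_epipolar(left, right, y, x, half=3, max_disp=32):
    """Region-based matching for one pixel: compare a (2*half+1)^2 window
    around (x, y) in the left image with windows shifted along the same row
    of the right image, and return the disparity with the smallest SSD."""
    patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(float)
    best_d, best_ssd = 0, np.inf
    for d in range(0, max_disp + 1):
        xr = x - d
        if xr - half < 0:
            break
        cand = right[y - half:y + half + 1, xr - half:xr + half + 1].astype(float)
        ssd = np.sum((patch - cand) ** 2)
        if ssd < best_ssd:
            best_ssd, best_d = ssd, d
    return best_d
```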
3. Feature-Based Binocular Matching
Feature-based matching relies on some particular feature points for
establishing the correspondence between images.
Once the correspondence between the feature points in two images is
established, the depth of these feature points can be calculated and
interpolated to obtain the whole depth map.
Matching of feature points can be considered an optimization problem
and be solved using dynamic programming techniques.
4. Horizontal Multiple Stereo Matching
Horizontal multiple stereo vision is a direct extension of the basic stereo
vision model, with the purpose of improving the accuracy of the disparity
measurement.
The key in multiple stereo matching is the introduction of the inverse
distance in the calculation of the sum of the squared difference.
The uncertainty problem caused by repeat patterns in stereo matching can
be solved by using two different baselines.
5. Orthogonal Trinocular Matching
Stereo matching along a line can encounter problems when the gray-level
values along this direction are smooth. One technique for solving this
problem is to perform a matching that is also in the perpendicular
direction.
Orthogonal trinocular matching conducts two complementary matching
processes. It can reduce the mismatching caused by smooth regions and
periodic patterns.
Fast orthogonal matching can be attained by classifying the gradient directions.
6. Computing Sub-Pixel Level Disparity
Sub-pixel level disparity is needed for precise measurements.
Sub-pixel level disparities can be obtained by using the information from
local variation patterns of image intensity and disparity.
7. Error Detection and Correction
In the process used to obtain the disparity map, various factors can cause
errors; hence, post-processing of the depth map is necessary.
A general and fast error detection and correction algorithm, which is
independent of any matching algorithm, for a disparity map was
introduced.
SUMMARY (CHAPTER 15)
1. Photometric Stereo
Photometric stereo is an important method, which needs a set of images,
taken at the same view angles but with different lighting conditions, for
recovering surface orientation.
Scene radiance and image irradiance are related by the bidirectional
reflectance distribution function.
The reflectance map gives the relation between object brightness and
surface orientation.
2. Structure from Motion
Optical flow and motion field are closely related but they are different
concepts. The former brings on the latter, but the latter does not
necessarily induce the former.
The optical flow constraint equation determines the relation among the
gray-level gradients along the X, Y, and T directions.
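For reference, with image brightness f(x, y, t) and flow components u = dx/dt and v = dy/dt, the constraint equation can be written in the standard form (notation chosen here):

```latex
f_x u + f_y v + f_t = 0
```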
The optical flow constraint equation can only be solved in some particular
cases, such as rigid bodies and smooth motion.
The determination of the optical flow on a surface makes it possible to
find surface orientation.
3. Shape from Shading
Shape from shading can recover the shape information of a 3-D scene
from a single image.
The image brightness constraint equation can be solved with one image
under certain smooth constraints.
The principle used to solve the image brightness constraint equation can
also be used to solve the optical flow constraint equation.
4. Texture and Surface Orientation
Shape from texture is based on the texture distortion caused by projective
texture changes with respect to surface orientation.
Surface orientation is determined from the appearance change of texture
elements. Three groups can be distinguished: the change in size, the
change in shape, and the change in spatial relation.
If the texture is composed of a regular grid of texture elements, then
surface orientation information can be recovered by computing the
vanishing point, which in turn can be obtained with the help of the Hough
transform.
5. Depth from Focal Length
The change of the focal length of a lens creates blurring of the object
image, which is dependent on the distance of the object from the lens.
6. Pose from Three Pixels
In cases where a geometric model of the object and the focal length of the
camera are known, the pose of a 3-D surface can be recovered from a 2-D
image.
The problem of P3P can be reduced to solving three quadratic equations,
which in turn can be solved by an iterative algorithm performing a non-linear optimization.
SUMMARY (CHAPTER 16)
1. Fundamentals of Matching
Matching consists of finding the correspondence between the objects in a
scene and the established models to understand the meaning of the scene.
There are different levels of matching, such as image matching, object
matching, relation matching, concept matching, and semantic matching.
At a detailed scale, matching and registration are different. Registration has
a narrower meaning and can be considered a special case of matching.
Registration can be performed either in the image space or in the
transformed space. In the Fourier space, both the phase correlation and
magnitude correlation are used for registration.
2. Object Matching
Object matching takes an object as a unit; hence, interesting objects should
be first detected and extracted.
Hausdorff distance is a suitable measurement of the similarity of point
sets and is useful in cases where objects are considered a set of points.
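A minimal sketch of the Hausdorff distance between two point sets (the array layout is an assumption for illustration):

```python
import numpy as np

def hausdorff_distance(A, B):
    """Hausdorff distance between point sets A and B, each an (n, 2) array:
    the larger of the two directed distances
    h(A, B) = max_a min_b ||a - b|| and h(B, A)."""
    diff = A[:, None, :] - B[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return max(dist.min(axis=1).max(), dist.min(axis=0).max())
```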
If objects are decomposed into parts, object matching can be performed at
the structure level.
String matching is used to match two sequences of feature points or two
boundaries of objects in a symbol-by-symbol manner.
Matching of inertia equivalent ellipses is a matching technique using all
points of object regions. All parameters needed to coordinate transforms
can be calculated independently from the coordinates, the orientation
angle, and the ratio of the major axis lengths of equivalent ellipses.
3. Dynamic Pattern Matching
In dynamic pattern matching, the patterns to be matched are constructed
dynamically based on the data to be matched and made during the match
process.
The dynamic pattern is constructed for each object with its coordinates,
several distances measured from its surrounding objects, and several
angles between the adjacent connecting lines.
By discarding the coordinates of the central object, the absolute pattern
becomes a relative pattern that is translation invariant.
4. Relation Matching
Relation matching concentrates on the matching of distinctive relations
among objects, instead of the characteristics of a single object.
In relation matching, all relations are represented by components of a
vector and the matching is performed component-by-component.
Relation matching consists of four steps: determine the relations among
components, determine the transform for matching relations, determine
the transform set for matching relation sets, and determine the model.
5. Graph Isomorphism
A graph is a kind of data structure for describing relations.
A graph is defined by a finite, non-empty vertex set and a finite edge
set. In some general cases, more than one vertex set and more than one
edge set are permitted.
A graph can be represented geometrically, in which different properties of
vertexes and edges are distinguished by different geometric forms.
Graph isomorphism is a technique for matching relations with the help of
identical graphs (graphs that have the same structure and differ only in the
labels of their vertexes and/or edges).
Three forms of isomorphism can be distinguished: Graph isomorphism,
sub-graph isomorphism, and double-sub-graph isomorphism.
6. Labeling of Line Drawings
A line drawing is a simplified representation of objects and their relations.
A line drawing is based on the labeling of the object contour.
Based on line drawings in 2-D, the relations among 3-D objects in the
world system can be derived by structure reasoning.
Three labels representing non-closed convex line, non-closed concave line,
and closed line can be used to label line drawings.
Labeling with sequential backtracking is a method for automatic labeling
of line drawings.
SUMMARY (CHAPTER 17)
1. Summary of Information Fusion
Information fusion consists of fusing various complementary information
data (often from a mixture of sensors) to get an optimal result.
Multi-sensor information fusion is a basic ability of human beings.
Information fusion can be distinguished in various layers. One example is
to distinguish fusion in detection, position, object recognition, posture
evaluation, and menace estimation layers.
Active vision takes advantage of an “active” observer, “qualitative”
metrics, and narrow “purposive” vision.
Image understanding can be based on active fusion, which actively selects
the sources to be analyzed and fuses the data from different
representational levels.
The advantages of fusing multi-sensor information are that it can enhance
reliability and credibility, increase spatial coverage, augment information
quantity and reduce fuzziness, and improve spatial resolution.
Information from multi-sensors can be redundant information,
complementary information, or cooperative information.
State model, observation model, and correlation model are three multi-sensor models based on the probability theory.
2. Image Fusion
Image fusion is a particular type of multi-sensor fusion, which takes
images as the operating objects.
Image fusion consists of three steps: image pre-processing, image
registration, and image combination.
Image fusion has, from low to high, three layers: pixel-based fusion layer,
feature-based fusion layer, and decision-based fusion layer.
Two groups of fusion evaluation are subjective evaluation (based on the subjective
feeling of observers) and objective evaluation (based on some computable
criteria).
In subjective evaluation, the precision of image registration, the
distribution of global color appearance, the global contrast of brightness
and color, the abundance of texture and color information, and the
definition of fused image are considered.
Objective evaluation can be based on the statistics of fused images or
information quantities contained in fused images.
3. Pixel-Layer Fusion
Pixel-layer fusion performs pixel-by-pixel, for example, by using image
arithmetic and logical operations.
Typical methods of pixel-layer fusion include weighted average fusion,
pyramid fusion, HSI transform fusion, PCA-based fusion, and wavelet
transform fusion.
Different fusion methods can be combined to increase the performance of
fusion techniques. Two examples are combining HSI transform fusion
and wavelet transform fusion as well as combining PCA-based fusion
and wavelet transform fusion.
In wavelet transform fusion, images are decomposed into different levels.
An appropriate level of decomposition can balance the spectrum property
and the space details in fused images.
4. Feature-Layer and Decision-Layer Fusions
Some techniques used in feature-layer fusion and decision-layer fusion
are common. Of them, the Bayesian method and evidence reasoning are
popular.
The Bayesian method considers the multi-sensor decision as a partition of
the sample space and solves the decision problem by using the Bayesian
conditional probability.
Evidence reasoning adopts half-additivity to treat the case where the
reliabilities of two opposite propositions are both small.
The rough set theory is a mathematical theory. It can analyze
complementary information and compress redundant information to
avoid combinatorial explosion.
SUMMARY (CHAPTER 18)
1. Feature-Based Image Retrieval
Typical features used in feature-based image retrieval include the color
and texture of an image, the shape of the object, the spatial relationship
among objects, and the structure of each object.
Color features can be matched using histogram intersection, histogram
distance, central moments, or reference color tables.
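A minimal sketch of the histogram intersection similarity between a query histogram and a model histogram (names are illustrative):

```python
import numpy as np

def histogram_intersection(h_query, h_model):
    """Histogram intersection similarity for color-based retrieval: sum the
    element-wise minima of the two histograms, normalized by the model
    histogram; values close to 1 indicate similar color distributions."""
    return np.minimum(h_query, h_model).sum() / h_model.sum()
```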
There are many texture features used to describe images. They can be
combined to form texture vectors for a feature-based retrieval.
Shapes can be described, in the image space domain, by either
boundary/contour-based descriptors or region-based descriptors. Shape
descriptors can also be defined in the transform domain. One of them is
based on the wavelet modulus maxima and their invariant moments.
2. Motion-Feature-Based Video Retrieval
Motion feature is unique for video. Two types of motion feature can be
distinguished: the global motion feature and the local motion feature.
Global motion is also called camera motion, as it is caused by the
manipulation of the camera.
Global motion is characterized by the mass movement of all points in a
frame, and can be modeled by a 2-D motion vector.
Local motion is also called object motion, as it is caused by the movement
of objects in a scene. Local motion can be quite complicated since many
different objects can move differently and each of them should be
represented by a vector field.
3. Object-Based Retrieval
Feature-based image retrieval is based on low-level image features, while
human beings often describe the content of an image in a high-level
semantic. There is a semantic gap.
To fill the semantic gap, an image is analyzed hierarchically and
progressively based on the multi-layer description model.
A typical multi-layer description model consists of four layers: the original
image layer, the meaningful region layer, the visual perception layer, and
the object layer.
In object-based retrieval, object recognition should be performed in the
object layer. Once the objects are recognized, the retrieval can be
conducted by object matching and/or object relation matching.
4. Video Analysis and Retrieval
Video can be considered an extension of an image. Many types of video
programs exist. Three of them—news program, sport match, and home
video—are discussed.
A news program has a relatively fixed regular structure. One news
program is composed of a sequence of news items, which is started by an
anchor shot.
A three-step detection method for anchor shots includes the detection of
all main speaker close-ups (MSC), the clustering and listing of MSC, and
the analysis of the time distribution of shots.
Sport match videos are structured (the matches are limited either in space
or in time), with particularities for each sport (different limits).
The most interesting aspects of a sport match are climactic events.
Detecting and retrieving these events are often the goals of sport match
video analysis.
Detecting and tracking balls and players, and ranking scenes according to
the motion of balls and players are the essential steps in retrieving the
highlight shots for table tennis.
Home video can have unrestricted content and equally important shots.
Consequently, discovering the scene structure above the shot level plays a
key role in home video analysis.
To organize the shots in a home video, it is important to detect motion
attention regions and separately treat the attention regions and other
remaining regions.