Master Thesis
Electrical Engineering
Implementation of No-Reference
Image Quality Assessment in
Contourlet Domain
UDAY CHAITANYA DORNADULA
PRIYANKA DEVUNURI
This thesis is presented as part of Degree of Master of Science in Electrical
Engineering with emphasis on Signal Processing
Blekinge Institute of Technology
November 2013
Blekinge Institute of Technology
School of Engineering
Department of Electrical Engineering
Supervisor: Muhammad Shahid
Examiner: Dr. Benny Lövström
ABSTRACT
In image processing, efficiency refers to the ability to capture the information that is significant to
the human visual system with a compact description. Natural images or scenes contain intrinsic
geometrical structures (contours) that are key features of visual information. Existing transform
methods such as the Fourier transform, wavelets, curvelets and ridgelets have limitations in capturing
the directional information in an image and in their compatibility with compression methods.
Hence, to capture the directional information or natural scene statistics of an image and to remain
compatible across distortion methods, the Contourlet Transform (CT) is a promising approach. The
goal of no-reference image quality assessment using the contourlet transform (NR IQACT) is to
establish a rational computational model that predicts the visual quality of an image. In this thesis we
implemented an improved Natural Scene Statistics (NSS) model that blindly measures image quality
using the Contourlet Transform (CT). Natural scenes contain nonlinear dependencies that are
disturbed by a compression process. This disturbance can be quantified and related to human
perception of quality.
Key Words:
Image Quality, Contourlets, NR IQACT, CT, NSS, Directional Information, Intrinsic Geometrical
Structures.
We would like to dedicate this work to Almighty and to our parents
ACKNOWLEDGEMENTS
We would like to thank our Examiner Dr Benny Lövström and our Supervisor Mr Muhammad Shahid
for giving us the opportunity to be part of this research and for providing valuable support throughout
our work. We would also like to thank our friend Mr Sridhar Bitra for his support in our thesis.
We would like to convey our deepest gratitude to our family members. Their comfort and
encouragement helped us to reach here.
Uday Chaitanya Dornadula,
Priyanka Devunuri.
Karlskrona, November 2013
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
CHAPTER 1 - INTRODUCTION
1.1. Literature Survey
1.2. Problem Statement
1.3. Research Question
1.4. Thesis Outline
CHAPTER 2 - BACKGROUND
2.1. CT Construction using Filter Banks
2.1.1. Laplacian Pyramid (LP)
2.1.1. A. Construction of LP
2.1.2. Iterated Directional Filter Banks
2.1.2. A. Multirate Identities
2.1.2. B. Quincunx Filter Bank (QFB)
2.1.2. C. New Construction of DFB, proposed by Minh N. Do and Martin Vetterli
2.2. Contourlet Coefficients
2.2.1. Structure and definitions of relationship among Contourlet coefficients
2.2.2. Statistics of Contourlet Coefficients
2.2.2. A. Marginal Statistics
2.2.2. B. Joint Statistical Distribution
2.2.2. C. Mutual Information among coefficients
CHAPTER 3 - IMPLEMENTATION
3.1. Implementation model of NR IQACT
3.2. Image data set and its explanation
3.3. NR-IQACT system model
3.3.1. Contourlet Transform
3.3.2. Contourlet coefficients
3.3.3. Statistics of contourlet coefficients in distorted natural images
3.3.4. Image-dependent threshold
3.4. Training methodology
3.5. Testing method and algorithm to calibrate image quality
3.6. Subjective image quality assessment
3.7. Image quality assessment using SSIM
3.8. Evaluation of Test Results
3.8.1. Pearson’s Correlation Coefficient
CHAPTER 4 - RESULTS
4.1. No Reference Image Quality Assessment using Contourlet Transform (NR-IQACT)
CHAPTER 5 - CONCLUSION AND FUTURE WORK
5.1. Conclusion
5.2. Future Work
APPENDIX A
BIBLIOGRAPHY
LIST OF FIGURES
2.1 Image decomposition using CT
2.2 The effect of “Frequency Scrambling” in 1-D case
2.3 Generation of outputs in between Wavelet and Laplacian filter banks at multi-dimensional case
2.4 The first level of image decomposition using Laplacian Pyramid
2.5 A new optimal linear reconstruction of LP for synthesis
2.6 Graphical representation of Pythagorean Theorem
2.7 The New Construction scheme for LP
2.8 Implementation of tree decomposition at $l = 3$ levels using wedge shaped frequency partitioning
2.9 Two-dimensional spectrum partition using quincunx filter banks with fan filters
2.10 Example of shearing operation that is used like a rotation operation for DFB decomposition
2.11 Multi-dimensional multirate identity for interchange of downsampling and filtering
2.12 Two possible support configurations for the filters in the QFB. Each region represents the ideal frequency support of a filter in the pair
2.13 First and second level of the DFB
2.14 Support configuration of the equivalent filters in the first two levels of the DFB
2.15 QFBs with resampling operations that are used in the DFB starting from the third level
2.16 Left: The analysis side of the two resampled QFB’s that are used from the third level in the first half channels of the DFB. Right: The equivalent filter banks using parallelogram filters. The black regions represent the ideal frequency supports of the filters
2.17 Impulse responses of 32 equivalent filters for the first half of channels in 6-levels DFB that use the Haar filters
2.18 Contourlet Transform of Barbara Image. This image was decomposed into three levels and eight directional subbands
2.19 (a) Contourlet coefficients with their relationships. (b) Wavelet coefficients relationships
2.20 Histograms of the Barbara Image contourlet coefficients. Histograms (a) to (c) processed from first level; (d) to (f) from second level and (g) to (i) from third level
2.21 Joint scatter graphs conditioned on parent, cousins and neighbours
3.1 System model for NR-IQA in contourlet domain
3.2 Joint histograms of ( , ) for different distorted images [1]. (a) Natural (b) JPEG2000
3.3 (a) The subband serial number used in subband enumeration of (b)
3.3 (b) ( (| |)) versus subband enumeration index
4.1 Shows the Contourlet coefficients subbands at 4 different levels
4.2 Shows Joint histograms of ( , ) at level 1 for 2 subbands
4.3 Joint histograms of ( , ) at level 2 for 4 subbands
4.4 Graphical representation of subjective and objective scores for the videos considered
4.5 Validation of the results using neural network toolbox
4.6 ( (| |)) versus subband enumeration index falls off with decrease of scale from original image to least distorted image with respect to subjective value
4.7 Partition of Significant, Insignificant portions in P and C space
4.8 Shows first set of building images that were taken for testing. First one is Original image and their respective distorted images
4.9 Shows NR-IQAT, subjective values and SSIM metric for the images in Fig 4.8. X-axis indicates 8 images and Y-label indicates quality in scale of 0 to 1
4.10 Shows second set of flowersonih35 images that were taken for testing. First one is Original image and their respective distorted images
4.11 Shows NR-IQAT, subjective values and SSIM metric for the images in Fig 4.10. X-axis indicates 8 images and Y-label indicates quality in scale of 0 to 1
4.12 Shows Metric assessments, Pearson correlation coefficient between objective value and its corresponding subjective values and Pearson correlation coefficient between objective value and SSIM obtained for all 10 image sets
LIST OF TABLES
Table 4.1 Shows correlation coefficient of 7 different sets of images that were taken for testing
Table 4.2 Shows results of the K, U and T fitting parameters obtained after using the MATLAB command ‘fminsearch’ for 8 subbands; these subbands are the average of 24 images taken for training that can be trained from the training set. The 8 subbands taken are 1, 3, 5, 7, 9, 11, 13, 14; this numbering is as per Fig 3.4 (a)
NOMENCLATURE LIST
CT          Contourlet Transform
NR IQACT    No Reference Image Quality Assessment Using Contourlet Transform
NR IQA      No Reference Image Quality Assessment
NR          No Reference
RR          Reduced Reference
FR          Full Reference
NSS         Natural Scene Statistics
PSNR        Peak Signal to Noise Ratio
JND         Just Noticeable Difference
SSIM        Structural Similarity
IFC         Information Fidelity Criterion
VIF         Visual Information Fidelity
PDFB        Pyramidal Directional Filter Bank
LP          Laplacian Pyramid
DFB         Directional Filter Bank
QFB         Quincunx Filter Bank
PC          Parent Coefficient
GC          Grandparent Coefficient
NC          Neighbour Coefficient
CC          Cousin Coefficient
CHAPTER 1 - INTRODUCTION
In most applications, the visual information in an image is ultimately received by human beings, so
it is intuitively reasonable to score image quality subjectively (by humans) [2, 3]. In real scenarios,
however, subjective image quality assessment (IQA) is expensive, and most real-time applications
implement the alternative, objective IQA. There are numerous objective IQA techniques in image
processing, and they can be classified in many ways. For example, data metrics such as PSNR and
MSE measure the fidelity of the signal and ignore the visual content, whereas picture metrics consider
the visual information in the signal [6]. In addition, objective metrics can be classified by how much
reference information they use in calculating image quality: full-reference (FR) [4], reduced-reference
(RR) and no-reference (NR) [5] methods. In this thesis we perform no-reference image quality
assessment using the contourlet transform (NR IQACT) based on natural scene statistics (NSS).
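As a rough illustration of what a data metric computes, the following minimal sketch evaluates MSE and PSNR on flattened pixel lists. The function names, the peak value of 255 (8-bit images) and the sample pixel values are assumptions made for this example only; they are not taken from the thesis implementation.

```python
import math

def mse(reference, distorted):
    """Mean squared error between two equally sized lists of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, distorted)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / err)

# Hypothetical 8-bit pixel values (flattened images), for illustration only.
reference = [10, 20, 30, 40]
distorted = [12, 18, 30, 44]
print(mse(reference, distorted), psnr(reference, distorted))
```

Both functions depend only on pixel-wise differences, which is why such metrics are blind to where in the image, and on what kind of structure, the errors fall.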
For one-dimensional piecewise smooth signals, like scan-lines of an image, wavelets have been
established as the right tool, because they provide an optimal representation for these signals in a
certain sense. However, natural images are not simply stacks of 1-D piecewise smooth scan-lines:
their discontinuity points (i.e. edges) are typically located along smooth curves (i.e. contours) owing
to the smooth boundaries of physical objects. Thus, natural images contain intrinsic geometrical
structures that are key features in visual information [1].
With reference to studies of the human visual system, natural image statistics and existing
transform methods, the CT offers a high degree of directionality (basis elements oriented in a variety
of directions) and anisotropy (capturing smooth contours in the image representation) [1, 3].
Compared with wavelets, the CT improves efficiency by adding directionality and anisotropy to the
wavelet properties of multi-scale representation and time-frequency localization.
1.1. Literature Survey
Over the years, many researchers have contributed different mathematical models and approaches
to analyse images and assess their quality. Among those, we found no-reference image quality
assessment using the contourlet transform (NR IQACT) to be a recent and sensible approach to
analysing an image and assessing its quality. Here, we summarise the views of different researchers
regarding image quality assessment (IQA) and the mathematical models used in analysing images.
In [7], the authors approached the QA problem as an information fidelity problem. In their
formulation, a natural image source communicates with a receiver through a channel that limits the
amount of information that can flow from the source (the reference image) to the receiver (the human
observer). Using models of the signal source and of the distortion (channel), they showed that the
mutual information between the reference and test images can be quantified as the information
fidelity criterion (IFC). The IFC can in turn be used to quantify perceptual quality. In their work, they
modelled the source and channel in the wavelet domain. This work helped us to understand the
significance of mutual information between sub-bands in image quality assessment. In our thesis, we
used the concept of mutual information to study how a coefficient’s neighbours are distributed over
the sub-bands in terms of scale, space and direction.
In [8], the authors implemented a reduced-reference model to find the quality of an image. They used
a so-called quality-aware image: certain features are extracted from the original image and embedded
into the image data as invisible hidden messages. When a distorted version of the quality-aware
image is given as input, the proposed algorithm decodes the hidden message and uses it to provide an
objective score for the distorted image. From this work, we learned the importance of the features of
an original image and one way of using them to perform IQA with a reduced-reference model.
In [9], a performance evaluation study of ten image quality assessment algorithms shows that there
is a large difference between machine and human evaluation of image quality. Among the ten
algorithms that the authors considered in [24], DCTune (a technique for visual optimization of DCT
quantization matrices for individual images) performs statistically worse than peak signal-to-noise
ratio (PSNR), whereas just noticeable difference (JND), structural similarity (SSIM), information
fidelity criterion (IFC) and visual information fidelity (VIF) perform much better than the rest of the
algorithms. They found VIF to be the best of the ten algorithms considered. This work helped us to
understand the different signal processing algorithms available for image quality assessment. Because
perceptually relevant objective metrics are desirable, we used the Structural Similarity Index (SSIM)
to validate our results.
In [2], Minh N. Do and Martin Vetterli proposed a two-dimensional transform called contourlet
transform (CT) that can capture the intrinsic geometrical structure that is key in visual information.
They realized this concept with a discrete-domain multi-resolution and multi-direction expansion
using non-separable filter banks. They claimed that, with parabolic scaling and sufficient directional
vanishing moments, contourlets can achieve the optimal approximation rate for piecewise smooth
functions with discontinuities along twice continuously differentiable curves. This work helped us in
understanding the concept (construction) of contourlet transform. In our thesis, we used the concept of
contourlets proposed in this work to decompose an image.
In [5], the authors proposed a model that uses NSS to blindly measure the quality of images
compressed with the JPEG2000 standard. They observed that natural scenes contain nonlinear
dependencies, and that the compression process disturbs these dependencies in a way that can be
quantified. In addition, they claim that a rational relationship exists between this quantified
disturbance and human perception of quality. From this work we learned how wavelets are used in
no-reference image quality assessment, which helped us in analysing the IQA process using the
contourlet transform.
In [1], the authors presented a model that performs no-reference image quality assessment by
capturing the structural information of the image. They used the contourlet transform to decompose
an image into multiscale and multidirectional subbands that contain nonlinear dependencies. They
then captured these nonlinear dependencies using joint histograms of the reference and estimated
contourlet coefficients. An image-dependent threshold was employed to eliminate the influence of
content. Finally, the objective quality was calculated as a nonlinear combination of the extracted
features.
In our thesis, we studied and analysed the CT and implemented the model proposed in [1]. We then
tested our algorithm on a new database of images, verified the results using SSIM and evaluated them
using Pearson’s correlation coefficient. The advantage of our work over the similar work proposed
in [5] is that NR IQACT can be used across different compression techniques.
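The evaluation step mentioned above relies on Pearson’s correlation coefficient between objective and subjective scores. A minimal sketch of that computation is given here; the function name and the score values are hypothetical and serve only to illustrate the formula, not to reproduce any result of this thesis.

```python
import math

def pearson(xs, ys):
    """Pearson's linear correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and standard deviations (the common 1/N factor cancels out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)

# Hypothetical objective and subjective quality scores on a 0 to 1 scale.
objective = [0.91, 0.75, 0.60, 0.42, 0.30]
subjective = [0.95, 0.80, 0.55, 0.45, 0.25]
r = pearson(objective, subjective)
print(r)
```

A value of r close to 1 indicates that the objective metric ranks and spaces the images much as the human observers did.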
1.2. Problem Statement
The primary goal of our thesis is to perform No-Reference Image Quality Assessment using the
Contourlet Transform (NR IQACT).
Nonlinear and structured transforms make an image representation efficient. The CT admits fast
algorithms and can represent piecewise smooth signals or functions that resemble images.
In addition, the CT satisfies the “wish list” [1, 10] for an image representation, namely
multiresolution, localization, critical sampling, directionality and anisotropy. A further advantage of
NR IQACT is that it is inexpensive compared to FR and RR image quality assessment.
Sheikh et al. [5] presented an NR IQA metric that uses NSS for JPEG2000 compressed images. This
existing state-of-the-art approach applies an NSS model to NR IQA [1]. The NSS model is
based on the concept that “natural images exhibit certain common characteristics which can be
represented by a mathematical model”. In our thesis, the implemented NR IQACT can be used across
different image compression techniques and is not restricted to JPEG2000 as in [5].
1.3. Research Question
How can No-Reference Image Quality Assessment using the Contourlet Transform (NR IQACT) be
performed using the Natural Scene Statistics (NSS) of an image, and how does it perform compared
to other existing methods?
1.4. Thesis Outline
Our thesis is organised as follows.
In Chapter 2, we focus on the background of our thesis, the CT. We discuss the construction of the
Pyramidal Directional Filter Bank (PDFB), or CT, using the Laplacian Pyramid (LP) and the
Directional Filter Bank (DFB). For the LP, we first describe the LP introduced by Burt and Adelson
[11], which has the limitation of frame bounds, and then present a modified construction of the LP [2]
that overcomes this limitation. For the DFB, we discuss the two-dimensional DFB proposed by
Bamberger and Smith [11] and its extended version introduced by Do and Vetterli [10], which
provides orthogonal bases. Then we present the contourlet coefficients, their statistical properties and
the relationships among them.
In Chapter 3, we discuss the implementation of no-reference image quality assessment (NR-IQA)
using contourlet coefficients. This chapter states the approach, i.e. the system model that summarizes
the process of decomposing an image into contourlet coefficients and finding its no-reference quality.
Using the statistical properties of these coefficients, we build a joint histogram to which an
image-dependent threshold is applied. With this image-dependent threshold and the offset parameters,
the significant information needed to find the quality of an image can be determined. We then state
the implementation aspects and simulation details, including a step-by-step algorithm for finding the
image quality as described above.
In Chapter 4, we present and discuss the obtained results.
Finally, in Chapter 5, we conclude our thesis with recommendations for extending this research.
CHAPTER 2 - BACKGROUND
Before implementing the contourlet transform (CT) to decompose an image, we present its
theoretical concepts. This chapter gives an overview of the CT construction using filter banks and of
the statistics of contourlet coefficients.
The CT was proposed by Minh N. Do and Martin Vetterli [1]. It is an improvement over wavelets in
terms of efficiency in representing the multi-scale, local and directional contour segments to which
human eyes are sensitive. The primary objective of the CT was to obtain a sparse expansion for
typical images that are piecewise smooth with smooth contours.
2.1. CT Construction using Filter Banks
The CT employs a double filter bank structure in which the Laplacian pyramid (LP) is first used to
capture the point discontinuities, and a directional filter bank (DFB) is then used to link these point
discontinuities (which are correlated with each other in terms of coefficient magnitude) into linear
structures. Figure 2.1 shows the decomposition process that takes place in the double filter bank
structure.
2.1.1. Laplacian Pyramid (LP)
The Laplacian pyramid (LP) was introduced by Burt and Adelson [11]. At each level, the LP
decomposition generates a down-sampled low-pass version of the original and the difference between
the original and the prediction, which results in a band-pass image.
The reasons for choosing the LP instead of a wavelet filter bank are:
First, to avoid the frequency scrambling that happens when a high-pass channel is folded back into
the low-frequency band after down-sampling. The LP avoids this drawback in its band-pass signals
by down-sampling only the low-pass channel [2, 12]. Figure 2.2 illustrates frequency scrambling in
the 1-D case.
Second, in contrast to the wavelet filter bank, the LP generates only one isometric detail signal at
each level in any dimension, as shown in Figure 2.3 [2, 12].
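The folding of high frequencies into the low band can be checked numerically. The sketch below, written under our own assumptions (pure tones as test signals, a direct DFT, and the helper names `dominant_bin` and `normalized_frequency`), downsamples two high-frequency tones by 2 and reports where their energy lands relative to the Nyquist frequency.

```python
import math

def dominant_bin(x):
    """Index (0..N/2) of the strongest DFT bin of a real sequence x."""
    n = len(x)

    def magnitude(k):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        return math.hypot(re, im)

    return max(range(n // 2 + 1), key=magnitude)

def normalized_frequency(x):
    """Dominant frequency as a fraction of Nyquist (0 = DC, 1 = Nyquist)."""
    return dominant_bin(x) / (len(x) // 2)

N = 32
nyquist_tone = [math.cos(math.pi * t) for t in range(N)]            # at Nyquist
high_tone = [math.cos(2 * math.pi * 12 * t / N) for t in range(N)]  # 0.75 of Nyquist

for name, tone in (("Nyquist tone", nyquist_tone), ("0.75 Nyquist tone", high_tone)):
    before = normalized_frequency(tone)
    after = normalized_frequency(tone[::2])  # downsample by 2 without filtering
    print(name, before, "->", after)
```

In this sketch the tone at the Nyquist frequency ends up at DC and the tone at 0.75 of Nyquist ends up at 0.5, so the order of the high-pass frequencies is reversed after down-sampling, which is the scrambling effect described above.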
Figure 2.1: Image decomposition using CT [1, 2].
The drawbacks of the LP are as follows.
First, noise from the high-pass sub-bands of a multi-dimensional LP appears as broadband noise in
the reconstructed signal instead of remaining in these sub-bands [12].
Second, the LP implies oversampling [2]. After one step of the LP, the coarse signal c[n] is
$1/|M|$ times the size of the input, and the difference or band-pass signal has the same size as the
input [15]. When the scheme is iterated, the redundancy ratio becomes

$$1 + \frac{1}{|M|} + \frac{1}{|M|^{2}} + \cdots \rightarrow \frac{|M|}{|M| - 1} \qquad (2.1)$$

Here, M is the integer sampling matrix and $|M|$ denotes $|\det(M)|$.
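The redundancy ratio can be evaluated for a finite number of levels. The following sketch assumes the usual bookkeeping (one full-size band-pass signal at the first level, each further band-pass signal smaller by a factor |M|, plus the final coarse signal); the function names are ours.

```python
def lp_redundancy(det_m, levels):
    """Number of LP coefficients per input sample after `levels` iterations.

    det_m is |det(M)|, e.g. 4 for dyadic down-sampling of a 2-D image.
    """
    if det_m < 2 or levels < 1:
        raise ValueError("need |det(M)| >= 2 and at least one level")
    # Band-pass signal at level j has relative size 1 / |M|^j (j = 0 .. levels-1).
    bandpass = sum(1.0 / det_m ** j for j in range(levels))
    # The last coarse signal has relative size 1 / |M|^levels.
    coarse = 1.0 / det_m ** levels
    return bandpass + coarse

def lp_redundancy_limit(det_m):
    """Limit of the redundancy ratio as the number of levels grows."""
    return det_m / (det_m - 1)

for levels in (1, 2, 3, 10):
    print(levels, lp_redundancy(4, levels))
print("limit", lp_redundancy_limit(4))
```

For |M| = 4 the ratio stays below 4/3, so the LP is at most about 33% redundant for images down-sampled by 2 in each dimension.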
As our work concerns only the analysis part, we did not face these drawbacks in our thesis. However,
they can be avoided by using the modified version of the usual LP proposed in [10].
Figure 2.2: The effect of “Frequency Scrambling” in 1-D case [15].
Upper: Spectrum after high-pass filtering. Lower: Spectrum after down sampling. The filled regions
indicate that the high frequency is folded back into the low frequency.
Figure 2.3: Generation of outputs in between Wavelet and Laplacian filter banks at multi-dimensional
case [10, 11].
2.1.1. A. Construction of LP:
The LP uses the concept of oversampled filter banks and the theory of frames [13]. In this section,
we first explain the LP introduced by Burt and Adelson [11]. We then present the modified
construction of the LP proposed by Minh N. Do and Martin Vetterli [2], which overcomes the
limitation of frame bounds.
In Figure 2.4, H and G are the (low-pass) analysis and synthesis filters respectively. They are
orthogonal with respect to the sampling matrix M (i.e., the analysis filter is the time reversal of the
synthesis filter, h[n] = g[−n]) [2].
Figure 2.4: The first level of image decomposition using Laplacian Pyramid. The outputs are a coarse
approximation c[n] and a difference or band-pass d[n] between the original signal and the prediction
[2].
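Filtering and down-sampling the input x[n] yields the coarse approximation c[n] of the signal.
The approximation signal c[n] from Figure 2.4 is given by [13],

$$c[n] = \sum_{k} x[k]\, h[Mn - k] = \langle x,\ \tilde{h}[\cdot - Mn] \rangle \qquad (2.2)$$

where we denote $\tilde{h}[n] = h[-n]$; due to the orthogonal filters H and G, $\tilde{h}[n] = g[n]$.
Performing the up-sampling and filtering operation on c[n] results in the prediction signal p[n],
given by,

$$p[n] = \sum_{k} c[k]\, g[n - Mk] \qquad (2.3)$$

Writing the above signals in matrix form gives $c = Hx$ and $p = Gc$, where the H and G matrices
correspond to the down-sampling and up-sampling processes respectively.
Using this matrix notation, the difference or residual signal of the LP can be written as

$$d = x - p = (I - GH)\,x \qquad (2.4)$$

From the previous relations, we can write the analysis operator of the LP as follows:

$$\begin{pmatrix} c \\ d \end{pmatrix} = \begin{pmatrix} H \\ I - GH \end{pmatrix} x \qquad (2.5)$$

Now, let us denote these matrices as

$$y = \begin{pmatrix} c \\ d \end{pmatrix}, \qquad A = \begin{pmatrix} H \\ I - GH \end{pmatrix}, \qquad \text{so that } y = Ax$$

The inverse transform of the LP [refer to Figure 2.5] can be written as

$$\hat{x} = Gc + d$$

It can also be written in matrix form as

$$\hat{x} = \begin{pmatrix} G & I \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix}$$

Let us denote

$$S_{1} = \begin{pmatrix} G & I \end{pmatrix}$$

Therefore, we can conclude that $S_{1} A = GH + (I - GH) = I$, that is, perfect reconstruction for
any H and G.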
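One level of this analysis and the usual reconstruction can be traced on a short 1-D signal. The sketch below is an illustration under our own assumptions: it uses the orthonormal Haar low-pass filter with down-sampling by 2 (M = 2) and an arbitrary test signal; the function names are not taken from any toolbox.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar low-pass taps: g = [S, S], h[n] = g[-n]

def analysis_lowpass(x):
    """c = Hx: filter with h and down-sample by 2."""
    return [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]

def synthesis_lowpass(c):
    """p = Gc: up-sample by 2 and filter with g."""
    p = []
    for ck in c:
        p.extend([S * ck, S * ck])
    return p

def lp_analysis(x):
    """One LP level: coarse signal c and difference signal d = x - Gc."""
    c = analysis_lowpass(x)
    p = synthesis_lowpass(c)
    d = [xi - pi for xi, pi in zip(x, p)]
    return c, d

def lp_usual_reconstruction(c, d):
    """Usual LP synthesis: add the difference back to the prediction."""
    return [pi + di for pi, di in zip(synthesis_lowpass(c), d)]

x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 8.0]  # arbitrary even-length test signal
c, d = lp_analysis(x)
x_hat = lp_usual_reconstruction(c, d)
print(len(x), "input samples ->", len(c) + len(d), "LP coefficients")
print(max(abs(a - b) for a, b in zip(x, x_hat)))
```

The 8 input samples produce 4 coarse and 8 difference coefficients, which shows the oversampling of the LP, and the reconstruction error is at the level of floating-point rounding.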
Note that with orthogonal filters H and G, the LP is a tight frame with frame bounds equal to 1.
For this case, Minh N. Do and Martin Vetterli [2] proposed an optimal linear reconstruction of the
LP that uses the dual frame operator (or pseudo-inverse), as shown in Figure 2.5.
Figure 2.5: Usual reconstruction of LP for synthesis [2].
In Figure 2.5, the signal is obtained by simply adding the difference back to the prediction from the
coarse signal, i.e., $\hat{x} = Gc + d$. The reconstruction based on the dual frame operator is an
improved model over this usual reconstruction in the presence of noise [2, 10].
As mentioned before, the LP is a frame operator with redundancy, so it admits an infinite number of
left inverses.
Consider S as an arbitrary left inverse of A, i.e., $SA = I$.
In a noisy environment, where the coefficients are perturbed as $\hat{y} = y + \eta$, equation 2.7(a)
can be written as

$$\hat{x} = S\hat{y} = S(Ax + \eta) = x + S\eta$$

Among those infinitely many left inverses, the most important is the dual frame operator, also called
the pseudo-inverse or optimal left inverse in the presence of noise, of the matrix A, i.e.

$$A^{\dagger} = (A^{T} A)^{-1} A^{T}$$
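However, reconstruction using the pseudo-inverse without a tight frame is computationally expensive.
In the LP, the orthogonal filters H and G satisfy

$$\langle g[\cdot - Mk],\ g[\cdot - Ml] \rangle = \delta[k - l], \qquad h[n] = g[-n]$$

or, in matrix form,

$$G^{T} G = I, \qquad H = G^{T}, \qquad HG = I$$

From equation (2.2), we can write

$$c[n] = \langle x,\ g[\cdot - Mn] \rangle$$

In a geometrical interpretation, the prediction signal

$$p = \sum_{n} \langle x,\ g[\cdot - Mn] \rangle\, g[\cdot - Mn]$$

is the orthogonal projection of x onto the subspace spanned by the translates of g, so the difference
$d = x - p$ is orthogonal to p.
Applying the Pythagorean theorem to the triangle in Figure 2.6,

$$\|x\|^{2} = \|p\|^{2} + \|d\|^{2} = \|c\|^{2} + \|d\|^{2}$$

Here, the condition $\|p\| = \|c\|$ (frame bounds) comes from the fact that c represents the
coefficients in the orthogonal expansion of p.
As a result, the pseudo-inverse of A is simply its transpose:

$$A^{\dagger} = A^{T} = \begin{pmatrix} H \\ I - GH \end{pmatrix}^{T} = \begin{pmatrix} H^{T} & I - H^{T} G^{T} \end{pmatrix} = \begin{pmatrix} G & I - GH \end{pmatrix}$$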
Figure 2.6: Graphical representation of Pythagorean Theorem [10].
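Therefore, the optimal reconstruction is,

$$\hat{x} = A^{\dagger} y = \begin{pmatrix} G & I - GH \end{pmatrix} \begin{pmatrix} c \\ d \end{pmatrix} = Gc + (I - GH)\,d = G(c - Hd) + d \qquad (2.17)$$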
This optimal reconstruction stated in equation 2.17 can be realized as shown in Figure 2.7.
Figure 2.7: The new reconstruction scheme for the LP [13, 14].
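To see how the reconstruction of equation 2.17 behaves, the sketch below compares it with the usual reconstruction when the difference signal is perturbed. It is an illustration under our own assumptions: orthonormal Haar filters with M = 2, an arbitrary test signal, and a fixed (not random) noise vector chosen only for this example.

```python
import math

S = 1.0 / math.sqrt(2.0)  # orthonormal Haar low-pass taps (assumed for this sketch)

def H(x):
    """Analysis: filter with h and down-sample by 2."""
    return [S * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]

def G(c):
    """Synthesis: up-sample by 2 and filter with g."""
    return [S * ck for ck in c for _ in range(2)]

def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def sub(a, b):
    return [ai - bi for ai, bi in zip(a, b)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def usual_reconstruction(c, d):
    """x_hat = Gc + d."""
    return add(G(c), d)

def optimal_reconstruction(c, d):
    """x_hat = G(c - Hd) + d, the pseudo-inverse reconstruction of equation 2.17."""
    return add(G(sub(c, H(d))), d)

x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 8.0]
c = H(x)
d = sub(x, G(c))

# Tight frame check: ||x||^2 = ||c||^2 + ||d||^2.
energy_gap = abs(norm(x) ** 2 - (norm(c) ** 2 + norm(d) ** 2))

# Hypothetical noise added to the difference signal only.
eta = [0.4, -0.1, 0.3, 0.2, -0.2, 0.1, 0.0, -0.3]
d_noisy = add(d, eta)
err_usual = norm(sub(usual_reconstruction(c, d_noisy), x))
err_optimal = norm(sub(optimal_reconstruction(c, d_noisy), x))
print(energy_gap, err_usual, err_optimal)
```

In this example both reconstructions are exact without noise, while with noise the pseudo-inverse reconstruction removes the component of the noise that lies in the low-pass subspace and therefore gives the smaller error.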
2.1.2. Iterated Directional Filter Banks
In the CT construction, the LP is applied before the DFB because DFBs are designed to capture the
high-frequency content (which carries the directionality) of the given image, so the low-frequency
content would be neglected or poorly handled. It is therefore sensible to apply the LP prior to the
DFB to remove the low-frequency content.
In 1992, a 2-D DFB was constructed by Bamberger and Smith. In their construction, they used
quincunx filter banks (to modulate the image) with diamond-shaped filters [14]. They implemented it
using an $l$-level binary tree decomposition that leads to $2^{l}$ sub-bands with wedge-shaped
frequency partitioning as shown in Figure 2.8.
Minh N. Do and Martin Vetterli [10] proposed a new construction that avoids image modulation and has a simple rule for expanding the decomposition tree. This new DFB is built from two building blocks: a two-channel quincunx filter bank and a shearing operator. The two-channel quincunx filter bank with fan filters divides the spectrum into horizontal and vertical directions (see Figure 2.9). The second building block, the shearing operator, reorders the image samples (see Figure 2.10) [2].
Figure 2.8: Implementation of tree decomposition at l = 3 levels using wedge-shaped frequency partitioning [10].
In Figure 2.8, the centre panel shows the frequency partitioning at level l = 3 and its corresponding $2^{3} = 8$ real wedge-shaped frequency bands. Subbands 0–3 correspond to the mostly horizontal directions, while subbands 4–7 correspond to the mostly vertical directions. The left panel is the zone plate image and the right panel is the zone plate image decomposed by a DFB with 4 levels, which leads to 16 subbands.
In Figure 2.10, we can observe how the shearing operator reorders image edges with a 45° direction into a vertical direction. Hence, by applying a shearing operator before the 2-D filter bank in Figure 2.8 and its inverse (un-shearing) after it, we can obtain a directional frequency partition while maintaining perfect reconstruction [2, 10].
Figure 2.9: Two-dimensional spectrum partition using quincunx filter banks with fan filters. The black
regions represent the ideal frequency supports of each filter. Q is a quincunx sampling matrix [2].
Figure 2.10: Example of a shearing operation that is used like a rotation operation for DFB decomposition. (a) The “Cameraman” image. (b) The “Cameraman” image after a shearing operation [2].
2.1.2. A. Multirate Identities:
Multirate identities allow filtering and sampling operations to be interchanged [10, 16]. Consider a signal that is downsampled by M and then filtered by a filter $H(\omega)$. The result is the same as filtering the signal with $H(M^{T}\omega)$, which is obtained by upsampling $H(\omega)$ by M, and then downsampling by M. This concept is illustrated in Figure 2.11.
Figure 2.11: Multi-dimensional multirate identity for interchange of down-sampling and filtering [10].
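The identity can be checked numerically in one dimension, where the sampling matrix reduces to an integer M and upsampling the filter means inserting M − 1 zeros between its taps. The following is a minimal sketch under that simplification; the signal, the filter taps and the helper names are made up for illustration.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def downsample(x, m):
    """Keep every m-th sample, starting at index 0."""
    return x[::m]


def upsample_filter(h, m):
    """Insert m - 1 zeros between taps, i.e. H(w) -> H(m * w)."""
    up = []
    for k, tap in enumerate(h):
        up.append(tap)
        if k < len(h) - 1:
            up.extend([0.0] * (m - 1))
    return up


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]  # hypothetical input
h = [1.0, -2.0, 0.5]                               # hypothetical filter
M = 2

# Path A: downsample first, then filter with h.
path_a = convolve(downsample(x, M), h)
# Path B: filter with the upsampled filter, then downsample.
path_b = downsample(convolve(x, upsample_filter(h, M)), M)
```

Both paths give the same output sequence, which is what allows the filters of later DFB levels to be moved in front of the earlier downsamplers.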
Proposition 2.1: $\mathrm{LAT}(A) = \mathrm{LAT}(B)$ if and only if $A = BE$, where E is a unimodular integer matrix [10].
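To fulfil the rotation operations, four basic unimodular matrices are used in the DFB. They are:
\[ R_{0} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \quad R_{1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \quad R_{2} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad R_{3} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \]
Here, we need to note that $R_{0}R_{1} = R_{2}R_{3} = I_{2}$, where $I_{2}$ denotes the $2 \times 2$ identity matrix. From this observation we can say that, for example, upsampling by $R_{0}$ is equivalent to downsampling by $R_{1}$.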
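The inverse relations between the resampling matrices are easy to verify numerically. The sketch below assumes the four matrices as given in [10] (R0 and R1 shear along one axis, R2 and R3 along the other) and uses a hypothetical helper for 2×2 integer products.

```python
def matmul(a, b):
    """Product of two 2x2 integer matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


# Resampling matrices, assumed to be those of [10].
R0 = [[1, 1], [0, 1]]
R1 = [[1, -1], [0, 1]]
R2 = [[1, 0], [1, 1]]
R3 = [[1, 0], [-1, 1]]
I2 = [[1, 0], [0, 1]]

products = {"R0*R1": matmul(R0, R1), "R2*R3": matmul(R2, R3)}
determinants = [det(r) for r in (R0, R1, R2, R3)]
```

All four determinants equal 1, so each matrix is unimodular and resampling with it only reorders samples without changing the sampling density.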
2.1.2. B. Quincunx Filter Bank (QFB):
The QFB can be used to split the frequency spectrum of the input signal into a lowpass and a highpass
channel using a diamond-shaped filter pair, or into a horizontal and a vertical channel using a fan
filter pair [2, 10]. Frequency characteristics of these filters are shown in Figure 2.12.
Note that we can obtain one filter pair from the other by simply modulating the filters by $\pi$ in either the frequency variable $\omega_{0}$ or $\omega_{1}$.
2.1.2. C. New Construction of DFB, proposed by Minh N. Do and Martin Vetterli:
In the new construction, the DFB is based only on quincunx filter banks (QFBs) with fan filters. This construction avoids modulating the input image and has a simpler rule for expanding the decomposition tree. Because the synthesis side mirrors the analysis side, M. N. Do and M. Vetterli focused only on the analysis side. Intuitively, the wedge-shaped frequency partition of the DFB is realized by an appropriate combination of directional frequency splitting by the fan QFBs and the “rotation” operations done by resampling [2, 10].
Figure 2.12: Two possible support configurations for the filters in the QFB. Each region represents the ideal frequency support of a filter in the pair. (a) Diamond-shaped filter pair. (b) Fan filter pair [10].
In their construction, the first two levels, which produce a four-directional frequency partitioning, are shown in Figure 2.13. The sampling matrices in the first and second level are $Q_{0}$ and $Q_{1}$, respectively. Hence, the overall sampling after two levels is $Q_{0}Q_{1} = 2I_{2}$, or downsampling by two in each dimension.
Using the multirate identity, the filters at the second level can be interchanged with the sampling matrix $Q_{0}$. Because of this interchange, a fan filter transforms into an equivalent filter with a quadrant frequency response. Figure 2.14 shows the result of all these combinations.
To achieve a finer frequency partition from the third level onwards, the authors used quincunx filter banks together with resampling operations, as shown in Figure 2.15. There are four types of resampled QFBs, corresponding to the four resampling matrices in proposition 2.1.
- Resampled QFBs of type 0 and 1 are used in the first half of the DFB channels, which generate subbands in the mostly horizontal directions, i.e. directions between +45° and −45°.
- Resampled QFBs of type 2 and 3 are used in the second half of the DFB channels, which generate subbands in the remaining directions.
Figure 2.13: First and second level of the DFB. At each level, QFB’s with fan filters are used. The
black regions represent the ideal frequency supports of the filters [10].
Figure 2.14: Support configuration of the equivalent filters in the first two levels of the DFB. (a) First level: fan filters. (b) Second level: quadrant filters. (c) Combining the supports of the two levels. The equivalent filters corresponding to those four directions are denoted by $K_{i}$, i = 0, 1, 2, 3 [10].
- From the third level of the DFB onwards, the authors constructed the second half of the DFB channels by simply swapping the two dimensions, n0 and n1, of the corresponding channels in the first half. Swapping applies to both the sampling matrices (for example, R0 becomes R2 and Q0 becomes Q1) and the filters in the QFBs.
Figure 2.15: QFBs with resampling operations that are used in the DFB starting from the third level [10].
Therefore, from the above discussion, it is sufficient to focus on the first half of the DFB channels.
Figure 2.16 shows, on the left, the analysis side of the two resampled QFBs that are used from the third level in the first half of the DFB channels and, on the right, the equivalent filter banks using parallelogram filters. Note the order of the frequency supports of the fan filters in each QFB. In the iterated DFB, from the third level onwards, the upper channel at each node in the first half is expanded using the type 0 filter bank, while the lower channel is expanded using the type 1 filter bank [10].
Figure 2.16 is obtained by applying the multirate identity to the resampled QFBs of Figure 2.15. From these figures we can state the following mathematical relation [10]:
\[ F_{k}^{(i)}(\omega) = F_{k}(R_{i}^{T}\omega), \] (2.17)
This filtering is followed by downsampling, given by:
\[ P_{i} = R_{i}Q_{i}, \quad \text{for } i \in \{0, 1\} \] (2.18)
Here, the filters $F_{k}^{(i)}(\omega)$ are obtained by resampling the fan filters $F_{k}(\omega)$ and are called “parallelogram filters”.
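To simplify the sampling matrices for the resampled QFBs, we use the Smith form of the quincunx matrices as shown in [10], i.e.
\[ Q_{0} = R_{1}D_{0}R_{2} = R_{2}D_{1}R_{1} \] (2.19)
\[ Q_{1} = R_{0}D_{0}R_{3} = R_{3}D_{1}R_{0} \] (2.20)
where $D_{0} = \mathrm{diag}(2, 1)$ and $D_{1} = \mathrm{diag}(1, 2)$.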
Using the proposition which states that the LP with stable filters (a stable filter produces a bounded output for any bounded input) provides a frame expansion in $l^{2}(\mathbb{Z}^{d})$ [10], i.e.
\[ P_{0} = R_{0}Q_{0} = D_{0}R_{2}, \qquad P_{1} = R_{1}Q_{1} = D_{0}R_{3} \]
So, from the above equation we can say that the sampling lattices for the resampled QFBs of type 0 and 1 are equivalent to downsampling by 2 along the $n_{0}$ dimension. Here, we considered only the horizontal direction.
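The Smith-form factorizations and the resulting sampling matrices can be verified with a few integer matrix products. This sketch assumes the quincunx matrices Q0 = [[1, −1], [1, 1]] and Q1 = [[1, 1], [−1, 1]], the resampling matrices R0 to R3 and the diagonal matrices D0 = diag(2, 1) and D1 = diag(1, 2) as used in [10]; the helper `mul` is hypothetical.

```python
def mul(*mats):
    """Left-to-right product of 2x2 integer matrices (nested lists)."""
    out = [[1, 0], [0, 1]]
    for m in mats:
        out = [[sum(out[i][k] * m[k][j] for k in range(2)) for j in range(2)]
               for i in range(2)]
    return out


# Matrices assumed from [10].
Q0 = [[1, -1], [1, 1]]
Q1 = [[1, 1], [-1, 1]]
R0 = [[1, 1], [0, 1]]
R1 = [[1, -1], [0, 1]]
R2 = [[1, 0], [1, 1]]
R3 = [[1, 0], [-1, 1]]
D0 = [[2, 0], [0, 1]]
D1 = [[1, 0], [0, 2]]

smith_q0 = (mul(R1, D0, R2), mul(R2, D1, R1))  # both should equal Q0
smith_q1 = (mul(R0, D0, R3), mul(R3, D1, R0))  # both should equal Q1
two_levels = mul(Q0, Q1)                       # overall sampling after two levels
P0 = mul(R0, Q0)                               # type 0 resampled QFB
P1 = mul(R1, Q1)                               # type 1 resampled QFB
```

Since P0 = D0·R2 and P1 = D0·R3 with R2 and R3 unimodular, both sampling lattices coincide with that of D0, i.e. downsampling by 2 along one dimension only.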
In [10], the channels in the first half are indexed by k, bounded as $0 \le k < 2^{l-1}$, where l indicates the number of levels of the DFB. This index k is associated with a sequence of path types $(t_{2}, t_{3}, \dots, t_{l})$, each either type 0 or type 1, of the filter banks from the second level leading to that channel. Therefore, using the expanding rule, k can be rewritten as
\[ k = \sum_{i=2}^{l} t_{i}\, 2^{l-i} \]
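The relation between a channel index and its path types is just a binary expansion, which the following minimal sketch makes explicit. It assumes the indexing convention described above (first-half channels, path types counted from the second level); the function names are hypothetical.

```python
def path_types(k, levels):
    """Return (t_2, ..., t_l), the binary digits of k, most significant first."""
    if not 0 <= k < 2 ** (levels - 1):
        raise ValueError("k must satisfy 0 <= k < 2**(levels - 1)")
    return [(k >> (levels - i)) & 1 for i in range(2, levels + 1)]


def channel_index(types, levels):
    """Inverse mapping: k = sum over i of t_i * 2**(levels - i)."""
    return sum(t * 2 ** (levels - i)
               for i, t in zip(range(2, levels + 1), types))


types_k5 = path_types(5, 4)        # channel 5 of a 4-level DFB
k_back = channel_index(types_k5, 4)
```

For a 4-level DFB, each first-half channel is therefore reached by a unique sequence of three type 0 / type 1 choices.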
So, using above path type with index k and from Figure 2.15, for the channel k, the sequence of
filtering and downsampling can be written as [15],
(
)
(
)
(
)
(
)
From this, and using the multirate identity recursively, we can write the analysis side of channel k as a single filtering followed by a single downsampling:
\[ H_{k}^{(l)}(\omega) \;\rightarrow\; \downarrow M_{k}^{(l)} \]
where
\[ M_{k}^{(l)} = 2I_{2}\prod_{i=3}^{l} P_{t_{i}}, \qquad H_{k}^{(l)}(\omega) = K_{t_{2}}(\omega)\prod_{i=3}^{l} F^{(t_{i})}\big((M_{k}^{(i-1)})^{T}\omega\big) \]
$H_{k}^{(l)}(\omega)$ is the single filtering form of the analysis side of channel k. It is followed by downsampling by the overall sampling matrix $M_{k}^{(l)}$, in which $M_{k}^{(i)}$ is the partial product of the overall sampling matrix.
Figure 2.16: Left: The analysis side of the two resampled QFBs that are used from the third level in the first half channels of the DFB. Right: The equivalent filter banks using parallelogram filters. The black regions represent the ideal frequency supports of the filters. (a) Type 0. (b) Type 1 [10].
In practical applications, filters with non-ideal frequency responses are used. The upsampling operations applied to the filters then shear their impulse responses in different directions, as shown in the example of Figure 2.17 [2, 10]. This shearing of the impulse responses produces equivalent filters that have linear supports in space and span all directions.
The DFBs introduced by Bamberger and Smith generate distorted subband images because the modulation and resampling operations introduce “frequency scrambling”. In the new construction of the DFB, the modulation problem is solved by using the equivalent, already modulated filters at each level of the DFB. To fix the resampling problem, the authors of [10] proposed back-sampling operations at the end of the analysis side of the DFB. Because of this, the overall sampling matrices of all channels become diagonal.
Figure 2.17: Impulse responses of 32 equivalent filters for the first half of channels in 6-levels DFB
that use the Haar filters. Black and gray squares correspond to +1 and −1, respectively [10].
Therefore, to correct the resampling problem (i.e. to exclude the effect of frequency scrambling), the explicit formula for the back-sampling matrices in the new construction of the DFB is as follows:
( )
()
( ( )
)
Here, we have to note that these matrices are for the first half of the channels, with $0 \le k < 2^{l-1}$; those for the second half of the channels are obtained by transposing these matrices. By appending a downsampling by $B_{k}^{(l)}$ at the end of the analysis side of channel k in the DFB, the channel becomes equivalent to filtering by $H_{k}^{(l)}(\omega)$ followed by downsampling [10] by
\[ S_{k}^{(l)} = M_{k}^{(l)}B_{k}^{(l)} = \begin{cases} \mathrm{diag}(2^{l-1},\, 2) & \text{for } 0 \le k < 2^{l-1} \\ \mathrm{diag}(2,\, 2^{l-1}) & \text{for } 2^{l-1} \le k < 2^{l} \end{cases} \]
Because the $B_{k}^{(l)}$ are unimodular matrices, sampling with these matrices only rearranges the coefficients in the DFB subbands, which improves their visualization.
Figure 2.18 shows an image decomposition using the CT. For the LP (multiscale decomposition), we used the 5-3 biorthogonal filters with three levels of decomposition; for the DFB (directional decomposition), we used directional filters obtained from the 5-3 filters by the McClellan transform [2].
Figure 2.18: Contourlet Transform of Barbara Image. This image was decomposed into three levels
and eight directional subbands [1, 2].
2.2. Contourlet Coefficients
2.2.1. Structure and definitions of relationship among Contourlet
coefficients
Decomposing an image with the CT generates CT coefficients in different subbands. These coefficients are correlated with each other across scale and direction [1]. As the wavelet concept is well known in signal processing, we use it as a reference to explain the concept of contourlet coefficients [18].
Figure 2.19: (a) Contourlet coefficients with their relationships. (b) Wavelet coefficients relationships
[18].
For a given contourlet coefficient C, the following relationships hold with other coefficients in the CT decomposition [18].
- With reference to the spatial location of C, the coefficient at the same spatial location in the immediately coarser scale is defined as its parent (PC), while those at the same spatial location in the immediately finer scale are its children. A parent in turn has a grandparent coefficient (GC) in the next coarser scale.
- Within the same subband, C is surrounded by eight adjacent coefficients called its neighbors (NC). Coefficients at the same scale and spatial location but in different directional subbands are defined as cousins (CC) of each other.
Figure 2.19 illustrates the relationships between contourlet coefficients and how they differ from those in a wavelet decomposition. In the contourlet transform, a parent has four children spread over two separate directional subbands, whereas in the wavelet transform every parent has its children in the same direction [18]. Therefore, we can say that the contourlet transform represents images with diversity in scale, space and orientation.
2.2.2. Statistics of Contourlet Coefficients
To design a system model that decomposes an image, the statistical information of each coefficient and the distribution of its neighbours over subbands in terms of scale, space and direction is very important. Marginal statistics provide the statistical information of individual coefficients. Joint statistics and mutual information describe how a coefficient depends on its neighbours over subbands in terms of scale, space and direction [1].
2.2.2. A. Marginal Statistics:
From Figure 2.20 [1, 18], we can observe the distribution of contourlet coefficients at different levels and subbands. Each histogram is sharply peaked around zero, where the majority of the coefficients at that scale lie, and has heavy tails. We can also observe that the spread of the distribution changes as the level increases. Together, these observations indicate that the transform is sparse.
2.2.2. B. Joint Statistical Distribution
Despite the decorrelation properties of the CT, the contourlet coefficients in Figure 2.18 are not statistically independent [18]. Large-magnitude contourlet coefficients occur where the contourlet functions overlap and align with image edges [18]. These large-magnitude coefficients tend to be correlated with the coefficients at the same spatial location and orientation in the next level.
Figure 2.21 shows the joint scatter graphs conditioned on parent, cousins and neighbours for the decomposition in Figure 2.18 [18].
From Figure 2.21, we can conclude:
1. In all three scatter graphs, the coefficients are clustered near the zero-amplitude point. Hence, in the inter-scale case, i.e. (C | PC), if the parent has a small magnitude, its children are very likely to be small too; in the intra-scale case, i.e. (C | NC), if a coefficient has a small magnitude, then its neighbours tend to be small too [18].
2. Both strong and weak dependencies conditioned on parents, cousins and neighbors can be found when the coefficients are at higher amplitudes. This indicates that the dependencies in these three kinds of conditional distributions are local, especially when the CT coefficients, including the reference coefficient and its parents, cousins and neighbors, are small [5, 18].
Figure 2.20: Histograms of the contourlet coefficients of the Barbara image. Histograms (a) to (c) are from the first level, (d) to (f) from the second level and (g) to (i) from the third level.
Figure 2.21: Joint scatter graphs conditioned on parent (C | PC), cousins (C | CC) and neighbours (C | NC) [18].
2.2.2. C. Mutual Information among coefficients
Mutual information among coefficients will provide coefficients neighbour’s distribution over
subbands in terms of scale, space and direction. Mutual information can be used as measure of
dependencies [18].
The mutual information between two variables X and Y can be defined as
\[ I(X; Y) = \iint p(x, y)\, \log \frac{p(x, y)}{p(x)\, p(y)}\, dx\, dy \]
The mutual information between two variables can also be estimated using entropy. For two coefficients it can be calculated as:
\[ I(X; Y) = H(X) + H(Y) - H(X, Y) \]
where H(X), H(Y) and H(X, Y) are the entropy of variable X, the entropy of variable Y and their joint entropy, respectively [5]. The entropy of variable X can be calculated as [18]:
\[ H(X) = -\sum_{i} p(x_{i})\, \log p(x_{i}) \]
where $p(x_{i})$ is the probability of variable X being in the state $x_{i}$.
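As an illustration of the entropy-based estimate, the sketch below computes the mutual information from a small table of joint counts, such as a coarsely binned joint histogram of a coefficient and its parent. The counts are made up, base-2 logarithms are assumed so that the result is in bits, and the function names are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def mutual_information(joint_counts):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a 2-D table of joint counts."""
    total = sum(sum(row) for row in joint_counts)
    joint = [[c / total for c in row] for row in joint_counts]
    px = [sum(row) for row in joint]            # marginal over rows
    py = [sum(col) for col in zip(*joint)]      # marginal over columns
    hxy = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - hxy


mi_independent = mutual_information([[1, 1], [1, 1]])   # no dependency
mi_dependent = mutual_information([[5, 0], [0, 5]])     # fully dependent
```

Independent variables give zero mutual information, while two binary variables that always agree share exactly one bit.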
CHAPTER 3 – IMPLEMENTATION
3.1. Implementation model of NR IQACT
This chapter explains the system model that we used to find the quality of an image. We modelled the
concept of No-Reference Image Quality Assessment using Contourlet Transform (NR IQACT) as
shown in Figure 3.1.
3.2 Image data set and its explanation
Our algorithm is validated on the Laboratory for Image and Video Engineering (LIVE) database [20]. This database contains 29 high-resolution 24 bits/pixel RGB images as reference (original) images and 227 corresponding JPEG2000 compressed images. The difference mean opinion score (DMOS) of each image is provided to describe the subjective quality of the degraded images.
The database has been divided into two sets to simplify the process of training and testing. The training set contains 10 sets of different images; each set contains 7 to 8 images, one original and the remaining distorted. Similarly, for testing we took another 10 different sets of images, each containing 7 to 8 images, one original and the remaining distorted. In total, we considered 78 images for training and 78 images for testing.
The luminance component of each image was normalized to have a root-mean-squared (RMS) value of 1.0 per pixel, and these normalized images were used to validate our algorithm. The difference between scales used for estimating the subband coefficients is learned from the uncompressed (original) images in the training set [1].
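The RMS normalization of the luminance component can be sketched as follows. The sketch treats the luminance as a flat list of pixel values; the sample values and the function name are hypothetical.

```python
import math


def normalize_rms(luminance):
    """Scale the pixel values so that their root-mean-square equals 1.0."""
    rms = math.sqrt(sum(v * v for v in luminance) / len(luminance))
    if rms == 0:
        raise ValueError("cannot normalize an all-zero image")
    return [v / rms for v in luminance]


pixels = [12.0, 200.0, 87.0, 45.0, 130.0, 255.0]   # hypothetical luminance values
normalized = normalize_rms(pixels)
rms_after = math.sqrt(sum(v * v for v in normalized) / len(normalized))
```

After this scaling, images with different overall brightness contribute coefficients on a comparable scale to the statistics computed later.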
3.3 NR-IQACT system model
As shown in Figure 3.1,
(1) The CT is used to optimally approximate the given input image as piecewise smooth functions, using its contourlet coefficients in different subbands at different levels.
(2) The relationship between contourlet coefficients in different subbands and levels is represented by a conditional histogram or joint statistical distribution.
Figure 3.1: System model for NR-IQACT in contourlet domain [1]. Blocks: Input Image, Contourlet Transform, Relationship between CT Coefficients, Joint Histogram, Image Dependent Threshold, Non-linear Combination, Image Quality.
(3) The statistical information of contourlet coefficients is used to indicate the variation of image
quality.
(4) Image dependent threshold has been adopted to reduce the effect of image content while
calibrating the image quality from joint histogram.
(5) Image quality is calculated by combining the extracted features of the given image in each subband nonlinearly.
3.3.1 Contourlet Transform
The CT, or pyramidal directional filter bank (PDFB), is an efficient representation of image geometric structure. It employs the LP and the DFB to perform a multi-scale and multi-directional decomposition in the frequency domain. The LP captures point discontinuities in an image and the DFB links the captured point discontinuities into linear structures [1, 2]. Figure 2.1 explains the concept of multi-scale and multi-directional decomposition using the LP and DFB.
3.3.2 Contourlet coefficients
As mentioned in the system model, our second step is finding the relationship between contourlet
coefficients using joint statistics distribution.
In our thesis, the magnitude of a contourlet coefficient C is modelled conditioned on the magnitude of its linear prediction P, which combines all the correlated coefficients, as follows [2, 20]:
\[ C = MP + N \]
\[ P = \sum_{i=1}^{n} l_{i}\, |C_{i}| \]
In the above equations, M and N are independent zero-mean random variables, $l_{i}$ are the linear prediction parameters and $C_{i}$ comes from the n-coefficient neighborhood of C in space, scale and orientation [1].
Based on the statistical dependencies between contourlet coefficients, the mutual information can be predicted using the joint histogram.
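The linear prediction of a coefficient magnitude from its neighbourhood can be sketched in a few lines. It assumes the prediction is a weighted sum of the neighbour magnitudes; the neighbour values and the weights below are made up, whereas in the model the weights would be learned.

```python
def linear_prediction(neighbours, weights):
    """P = sum of l_i * |C_i| over the neighbourhood of a coefficient."""
    if len(neighbours) != len(weights):
        raise ValueError("one weight per neighbour is required")
    return sum(l * abs(c) for l, c in zip(weights, neighbours))


# Hypothetical neighbourhood: parent, one cousin, one spatial neighbour.
neighbourhood = [2.0, -4.0, 1.0]
weights = [0.5, 0.25, 0.25]
p_value = linear_prediction(neighbourhood, weights)
```

Because only magnitudes enter the sum, P is non-negative, which is what allows its logarithm to be used in the joint histogram.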
3.3.3 Statistics of contourlet coefficients in distorted natural images
The statistical dependency of C and P in terms of scale, direction and space can be described by their logarithmic joint histogram. Figure 3.2 shows the joint histogram of (Log2P, Log2C) for a natural image and its corresponding JPEG2000 distorted version. In Figure 3.3(b), we can observe that, on logarithmic axes, natural images have a strong nonlinear dependency between C and P.
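A logarithmic joint histogram of the kind described above can be built as in the following sketch. The bin count, the value range and the sample pairs are assumptions made only for illustration; pairs with a zero magnitude are skipped because their logarithm is undefined.

```python
import math


def joint_log_histogram(p_vals, c_vals, bins, lo, hi):
    """Count (log2 P, log2 |C|) pairs on a bins x bins grid over [lo, hi)."""
    hist = [[0] * bins for _ in range(bins)]
    width = (hi - lo) / bins
    for p, c in zip(p_vals, c_vals):
        if p <= 0 or c == 0:
            continue  # log2 undefined for zero magnitudes
        log_p, log_c = math.log2(p), math.log2(abs(c))
        col = min(max(int((log_p - lo) / width), 0), bins - 1)
        row = min(max(int((log_c - lo) / width), 0), bins - 1)
        hist[row][col] += 1   # rows index log2|C|, columns index log2 P
    return hist


p_samples = [1.0, 4.0, 4.0, 0.0]      # hypothetical predictions P
c_samples = [1.0, -4.0, 8.0, 2.0]     # hypothetical coefficients C
hist = joint_log_histogram(p_samples, c_samples, bins=4, lo=0.0, hi=4.0)
```

Values outside the chosen range are clamped into the border bins here; an implementation could equally discard them.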
Now, an image-dependent threshold is used to divide the joint histogram into four parts. From this division, to find the significant information, we used the following relations [1]:
\[ p_{ss} = \frac{n_{ss}}{N} \] (3.3)
\[ p_{ii} = \frac{n_{ii}}{N} \] (3.4)
Here in equations 3.3 and 3.4, $n_{ss}$ and $n_{ii}$ are the numbers of significant and insignificant C and P coefficients in the considered subband; N represents the total number of coefficients in the considered subband and T is the image-dependent threshold applied to the joint histogram. In the implemented model, we used the significant information ($p_{ss}$) as a quality indicator for JPEG2000 compressed images. The insignificant information can be used as a quality indicator for images disturbed by white noise, because white noise adds extra information at high frequencies.
3.3.4 Image-dependent threshold
The statistical distribution of the joint histogram discussed in the previous section changes not only with the distortion of the image but also with the image content. So, we need an image-dependent threshold that can rationally separate significant and insignificant information in the joint histogram. Therefore, balancing complexity against the required precision, we employed $\mathrm{mean}(\log_{2}(|C|))$ as the image-dependent threshold [1].
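The threshold and the four-way partition of the joint histogram can be sketched together. The sketch assumes that the same threshold, the mean of log2|C| over the subband, is applied to both log2 P and log2|C|, and that a pair counts as significant on an axis when its logarithm exceeds the threshold; the sample values and the dictionary keys are hypothetical.

```python
import math


def image_dependent_threshold(coeffs):
    """Mean of log2|C| over the non-zero coefficients of a subband."""
    logs = [math.log2(abs(c)) for c in coeffs if c != 0]
    return sum(logs) / len(logs)


def quadrant_fractions(p_vals, c_vals, threshold):
    """Fractions of (P, C) pairs in the four significance quadrants."""
    counts = {"ss": 0, "ii": 0, "sig_p_insig_c": 0, "insig_p_sig_c": 0}
    total = 0
    for p, c in zip(p_vals, c_vals):
        if p <= 0 or c == 0:
            continue  # logarithm undefined
        total += 1
        sig_p = math.log2(p) > threshold
        sig_c = math.log2(abs(c)) > threshold
        if sig_p and sig_c:
            counts["ss"] += 1
        elif not sig_p and not sig_c:
            counts["ii"] += 1
        elif sig_p:
            counts["sig_p_insig_c"] += 1
        else:
            counts["insig_p_sig_c"] += 1
    return {name: n / total for name, n in counts.items()}


c_samples = [1.0, 2.0, 4.0, 8.0]     # hypothetical coefficients C
p_samples = [1.0, 8.0, 1.0, 8.0]     # hypothetical predictions P
t = image_dependent_threshold(c_samples)
fractions = quadrant_fractions(p_samples, c_samples, t)
```

The entry "ss" plays the role of the significant-information fraction and "ii" that of the insignificant one; the two mixed quadrants are kept so that any of the four regions can be tried as a quality indicator.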
From Figure 3.3, we can observe that, because of the LP decomposition, the logarithmic mean of the CT coefficients differs most between scales, and that this difference grows further with image content. On the other hand, the logarithmic magnitude of CT coefficients within the same scale changes in a more or less regular way because of the DFB [1].
Figure 3.2: Joint histograms of (Log2P, Log2C) for different distorted images. (a) Natural. (b) JPEG2000.
Figure 3.3(a): The subband serial number used in the subband enumeration of Figure 3.3(b) [1].
Another important observation from Figure 3.3(b) is that the degradation of the image is visible in almost all subbands of the CT, whereas in the wavelet domain only some subbands are affected by distortions. Therefore, we need a more precise statistical model to evaluate image quality [1].
Figure 3.3(b): $\mathrm{mean}(\log_{2}(|C|))$ versus subband enumeration index [1].
We train different parameters between scales using the natural images in the training set [1]. However, in some subbands the mean value of the contourlet coefficients deviates from $\mathrm{mean}(\log_{2}(|C|))$.
3.4. Training methodology
We used a nonlinear combination to find the image quality. Because the distortion is different in each scale and direction, a nonlinear combination is needed to integrate these features. The nonlinear transform of these features is used to calculate the quality of each subband as follows [1]:
\[ q_{i} = K_{i}\left(1 - \exp\left(-\frac{p_{i} - u_{i}}{T_{i}}\right)\right) \] (3.7)
where $q_{i}$ is the predicted image quality for the i-th subband; $p_{i}$ ($p_{ss}$ or $p_{ii}$) is the probability of significant or insignificant coefficients for the i-th subband; and $K_{i}$, $T_{i}$ and $u_{i}$ are fitting parameters for the i-th subband that can be trained from the training set [1].
To calculate the final quality
(
Where,
)
subband.
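The per-subband nonlinear mapping can be sketched as below. The saturating exponential form with parameters k, u and t is an assumption based on the fitting parameters named in this chapter and on [1]; the parameter values are made up and are not trained values.

```python
import math


def subband_quality(p, k, u, t):
    """Assumed mapping q = k * (1 - exp(-(p - u) / t)) for one subband."""
    return k * (1.0 - math.exp(-(p - u) / t))


# Hypothetical parameters for one subband.
K, U, T = 2.0, 0.05, 0.1

q_at_offset = subband_quality(U, K, U, T)          # feature equal to the offset
q_one_scale = subband_quality(U + T, K, U, T)      # one scale unit above offset
q_large = subband_quality(U + 50 * T, K, U, T)     # far above the offset
```

With this form the predicted quality is zero when the feature equals the offset u, grows monotonically with the feature, and saturates at k.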
3.5 Testing method and algorithm to calibrate image quality
The following algorithm explains the method that we followed to process the image database that we
mentioned in 3.2.
1. From the reference database [20], we randomly selected 78 images for training and another 78
images (with corresponding subjective values) for testing. These two sets consist of original
and JPEG2000 compression images.
2. We employed CT or PDFB to decompose the given image. Here, we considered
decomposition at 3 levels using biorthogonal 9/7 filters in LP and ladder or pkva filters in
DFB.
3. Using the concept of relationship in-between contourlet coefficients, we modeled the mutual
information of C and P using their joint histogram.
4. We employed image dependent threshold on joint histogram to find the significant
information using equations (3.3) and (3.4).
5. To optimise the threshold offsets and fitting parameters, we used unconstrained nonlinear minimization (MATLAB command “fminsearch”) [1, 20].
Using the fminsearch function and the subband quality equation (3.7), we obtained the k, u and t parameter values for the 14 subbands of each image, and thus for the whole training set. We then calculated the average of the k, u and t values for the 14 subbands over all the images as the final K, U and T values. These final K, U and T values are used as input for testing, i.e. to predict the quality of a different set of images [20].
6. We obtained q by applying the trained K, U and T parameter values to the currently tested image. Here, we used the MATLAB command “fitfunction”.
3.6 Subjective image quality assessment
Quality assessment research strongly depends upon subjective experiments to provide calibrated data
as well as a testing mechanism. After all, the goal of all QA research is to build a rational agreement
between quality predictions and subjective opinion of human observers. In order to calibrate an image
quality using QA algorithms and to test their performance, a data set of images for which quality has
been ranked by human subjects is required. The QA algorithm may be trained using this data set, and
could be tested on distorted images [20].
3.7. Image quality assessment using SSIM
The Structural Similarity Index (SSIM) is a quality metric which measures the structural similarity
between two images. SSIM is still used as an alternative for evaluation of perceptual quality
assessment. SSIM considers quality degradations in the images as perceived changes in the variation
of structural information between two images.
The SSIM metric is calculated on various windows of an image. The measure between two windows x and y of common size N×N is:
\[ \mathrm{SSIM}(x, y) = \frac{(2\mu_{x}\mu_{y} + c_{1})(2\sigma_{xy} + c_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2})} \] (3.9)
where $\mu_{x}$ is the average of x, $\mu_{y}$ the average of y, $\sigma_{x}^{2}$ the variance of x, $\sigma_{y}^{2}$ the variance of y and $\sigma_{xy}$ the covariance of x and y. The terms $c_{1} = (k_{1}L)^{2}$ and $c_{2} = (k_{2}L)^{2}$ are two variables that stabilize the division when the denominator is weak, L is the dynamic range of the pixel values (typically $2^{\#\text{bits per pixel}} - 1$), and k1 = 0.01 and k2 = 0.03 by default. In our work, we considered X as the input data that was given to NR IQACT and Y as the subjective results.
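For a single pair of windows, the SSIM measure can be computed as in the following sketch. It takes the windows as flat lists of pixel values, uses population (divide-by-N) variance and covariance, which is one common convention, and assumes 8 bits per pixel; the pixel values are made up.

```python
def ssim_window(x, y, bits_per_pixel=8, k1=0.01, k2=0.03):
    """SSIM between two equally sized windows given as flat lists."""
    n = len(x)
    dynamic_range = 2 ** bits_per_pixel - 1
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


window = [52.0, 55.0, 61.0, 59.0, 79.0, 61.0, 76.0, 61.0, 62.0]   # hypothetical 3x3 window
brighter = [v + 40.0 for v in window]                            # luminance shift
ssim_same = ssim_window(window, window)
ssim_shifted = ssim_window(window, brighter)
```

An image-level score is usually obtained by averaging this measure over all windows of the image.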
3.8. Evaluation of Test Results
We evaluated our obtained results using Pearson’s Correlation coefficient.
3.8.1. Pearson’s Correlation Coefficient
Pearson’s correlation coefficient [21] measures the strength of the linear dependence between two variables:
\[ \rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_{X}\, \sigma_{Y}} \] (3.10)
where X and Y are the two variables, σ is the standard deviation and cov is the covariance.
In this work, we evaluated our results using the Pearson correlation coefficient in two ways. First, we considered X as the objective data, i.e. the results obtained using NR IQACT, and Y as the subjective scores. Then, we considered X as the results obtained using SSIM (Section 3.7) and Y as the NR IQACT results.
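The evaluation metric itself is short to implement. The sketch below computes the Pearson correlation coefficient between two score lists; the objective and subjective scores shown are made-up numbers, not results from this work.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient: cov(X, Y) / (sigma_X * sigma_Y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


objective = [0.91, 0.74, 0.55, 0.38, 0.20]     # hypothetical predicted quality
subjective = [0.95, 0.70, 0.60, 0.35, 0.15]    # hypothetical subjective scores
rho = pearson(objective, subjective)
```

The common 1/N factors of the covariance and of the two standard deviations cancel, so they are omitted in the code.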
CHAPTER 4 – RESULTS
In this chapter we present the results that we obtained in each step of the system model of NR IQACT
or implementation that we discussed in chapter 3.
4.1. No Reference Image Quality Assessment using Contourlet
Transform (NR-IQACT)
In Fig 4.1, the coin-fountain image is decomposed into three pyramidal levels, which are then decomposed into two, four and eight directional subbands. Small-magnitude coefficients are shown in black and large coefficients in white. The process of training and testing was repeated numerous times on various sets of images to demonstrate the performance of the algorithm. We then validated the obtained results against the subjective scores and SSIM.
Figure 4.1: Contourlet coefficients of 14 directional subbands at 3 different levels (1, 2 and 3) of the coin-fountain image. Here the LP works at level 0 and the DFB at levels 1, 2 and 3.
We implemented the NR IQACT model on an image called coin-fountain. Fig 4.2 shows the joint histograms of (log2(P), log2(C)) for the two subbands at level 1. Fig 4.3 shows the joint histograms of (log2(P), log2(C)) for the 4 subbands at level 2. Fig 4.4 shows the joint histograms of (log2(P), log2(C)) for the 8 subbands at level 3 of the coin-fountain image.
Figure 4.2: Joint histograms of (log2(P), log2(C)) for the two subbands at level 1 of the coin-fountain image (panels: Level 1 Sub-Band 2 and Level 1 Sub-Band 3; axes: Log2(P) versus Log2(C)). From left to right, the first is the vertical and the second the horizontal subband.
Figure 4.3: Joint histograms of (log2(P), log2(C)) for the 4 subbands at level 2 of the coin-fountain image. From top to bottom, the first two are horizontal subbands and the remaining two are vertical subbands.
Fig 4.5 shows the coin-fountain images for which the calibrated quality was degraded due to the image content (the background information of the image). Fig 4.6 shows the histograms of the 14th subband (chosen because it is at the finest scale of our decomposition) in level 3 of the respective images shown in Fig 4.5. Here, we can observe that the shapes of the histograms are affected by the distortion introduced by the compression process.
Figure 4.4: Joint histograms of (log2(P), log2(C)) for the 8 subbands at level 3 of the coin-fountain image (panels: Level 3 Sub-Band 8 to Sub-Band 15; axes: Log2(P) versus Log2(C)). From top to bottom, the first four are horizontal subbands and the next four are vertical subbands.
Figure 4.5: The coin-fountain images considered to find the effect of image content. The left image is the original and the right is its corresponding distorted version.
Figure 4.6: Histogram of one subband of the original coin-fountain image and of its 5 corresponding distorted versions. The shape of the joint histogram is affected by distortion.
Fig 4.7 shows the partition of significant and insignificant information in a joint histogram using the image-dependent threshold. The four resulting quadrants are: significant C and significant P; insignificant C and insignificant P; significant P and insignificant C; and insignificant P and significant C. We used the partition with significant P and insignificant C as the quality indicator because this region produces better results than the other three regions.
Figure 4.7: The partition of significant and insignificant (P and C) information (axes: Log(P) versus Log(C); marked regions: Sig C,P and InSig C,P). An image-dependent threshold is introduced to divide the joint histogram plane into four parts.
In Fig 4.9, the quality prediction is plotted on a normalized scale from 0 to 1, which can equivalently be read as a scale from 0 to 100. Here, an image quality nearer to 1 is better and nearer to 0 is worse. Table 4.1 shows the 10 different image sets that were taken for testing. Each of these 10 sets contains one natural image and its corresponding 7 to 8 different distorted images.
Figure 4.8: First set (bikes) of images taken for testing: the original image followed by its respective distorted images.
Figure 4.9: NR-IQACT, subjective values and SSIM metric for the images in Fig 4.8 (plot title: Quality Comparison: Subjective, NR-IQACT, SSIM Image Quality). The X-axis indicates the 8 images and the Y-axis indicates quality on a scale of 0 to 1.
Appendix A presents the complete sets of images that we considered, each with its quality graph. For
ease of understanding, the first two image sets, bikes and parrots, are presented in this chapter. The
closer the Pearson correlation coefficient is to 1, the better the predicted quality agrees with the
values it is compared against.
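As an illustration of how such a coefficient is computed, the following minimal sketch implements the standard Pearson formula in plain Python. The function name and the score lists are hypothetical and are not thesis data.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return numerator / denominator


# Illustrative scores for one image set on the 0 to 1 scale (made-up values).
predicted = [0.95, 0.80, 0.62, 0.45, 0.30]
subjective = [0.90, 0.78, 0.65, 0.40, 0.33]
r = pearson(predicted, subjective)
```

A value of `r` close to 1 means the predicted scores rise and fall almost linearly with the subjective scores; it does not by itself say that the two sets of scores are equal.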
Figure 4.10: Second set of parrot images that were taken for testing. The first image is the original
and the remaining images are its respective distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 8); Y-axis: Quality (0 to 1).]
Figure 4.11: NR-IQACT, subjective values and SSIM metric for the images in Fig 4.10. The X-axis
indicates the 8 images and the Y-axis indicates quality on a scale of 0 to 1.
[Plot: Pearson correlation coefficient comparison, NR-IQACT vs Subjective and NR-IQACT vs SSIM.
X-axis: Testing image set (ticks 1 to 10); Y-axis: Pearson correlation coefficient (0.8 to 1).]
Figure 4.12: Metric assessment: Pearson correlation coefficient between the objective values and the
corresponding subjective values, and Pearson correlation coefficient between the objective values and
SSIM, obtained for all 10 image sets.
Table 4.1 shows only the original image of each set. Table 4.2 shows the fitted K, U and T values
obtained from Equation 3.7. The 8 subbands that were considered here are 1, 3, 5, 7, 9, 11, 13,
Image set (original image;     Pearson correlation between     Pearson correlation between
1 set has 6 to 7               Subjective and NR-IQACT         SSIM and NR-IQACT
distorted images)              quality                         quality
1                              0.8735                          0.8835
2                              0.9784                          0.9446
3                              0.9748                          0.9575
4                              0.8057                          0.9156
5                              0.9856                          0.9636
6                              0.9787                          0.9606
7                              0.9137                          0.9606
8                              0.9282                          0.9385
9                              0.8964                          0.9245
10                             0.9772                          0.9728
Table 4.1: Correlation coefficients of the 10 different sets of images that were taken for testing.
Here, Pearson NR_IQACT vs. SUB = 0.93120 and Pearson NR_IQACT vs. SSIM = 0.93554.
Subbands (trained with coarser and finer level)
Subband   1         2         3         4         5         6         7         8
K         78.7086   79.0211   78.2711   78.0685   82.0376   81.2842   80.8235   81.1865
U         -0.1613   -0.1616   -0.1623   -0.1627   -0.1581   -0.1583   -0.1587   -0.1584
T         0.2149    0.2141    0.2149    0.2145    0.2122    0.2135    0.2140    0.2136
Table 4.2: Results of the K, U and T fitting parameters obtained using the MATLAB command
‘fminsearch’ for 8 subbands.
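For readers without MATLAB, the kind of fit reported in Table 4.2 can be sketched in Python. MATLAB's fminsearch is based on the Nelder-Mead simplex method, so the sketch below uses a small hand-written Nelder-Mead search. Everything else is an assumption made for illustration: the saturating-exponential form of `quality_model` stands in for Equation 3.7, which is not reproduced in this chapter, and the features, scores and starting point are synthetic values chosen only to be of the same order as the entries in Table 4.2.

```python
import math


def quality_model(p, K, u, T):
    """ASSUMED mapping from a feature p to a quality score (stand-in for Eq. 3.7)."""
    return K * (1.0 - math.exp(-(p - u) / T))


def sum_sq_error(params, feats, scores):
    """Sum of squared errors between model predictions and target scores."""
    K, u, T = params
    if T <= 0:                       # keep the exponent well defined
        return float("inf")
    try:
        return sum((quality_model(p, K, u, T) - s) ** 2
                   for p, s in zip(feats, scores))
    except OverflowError:            # exponent too large for a float
        return float("inf")


def nelder_mead(f, x0, rel_step=0.1, max_iter=2000, tol=1e-12):
    """Minimise f with a basic derivative-free Nelder-Mead simplex search."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):               # one extra vertex per parameter
        vertex = list(x0)
        vertex[i] += rel_step * (abs(vertex[i]) or 1.0)
        simplex.append(vertex)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, second_worst, worst = simplex[0], simplex[-2], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(second_worst):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:                    # shrink every vertex towards the best one
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, v)]
                                    for v in simplex[1:]]
    return min(simplex, key=f)


# Synthetic data generated from known parameters (not thesis measurements).
true_params = (80.0, -0.16, 0.21)
feats = [0.0, 0.05, 0.1, 0.2, 0.3, 0.45, 0.6]
scores = [quality_model(p, *true_params) for p in feats]

start = [70.0, -0.1, 0.3]            # arbitrary initial guess for K, u, T


def objective(params):
    return sum_sq_error(params, feats, scores)


fitted = nelder_mead(objective, start)
```

Because the search always keeps its best vertex, the fitted parameters never have a larger error than the starting guess. How close they get to the generating parameters depends on the starting point and the iteration budget, which is also true of fminsearch.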
CHAPTER 5 – CONCLUSION AND FUTURE WORK
5.1 Conclusion
The CT represents the basic elements of natural images, namely multiscale, local and directional
contour segments, to which the human eye is also sensitive. The CT is constructed as a double filter
bank: a Laplacian pyramid first captures the point discontinuities, and a directional filter bank then
links these point discontinuities into linear structures. In this thesis, an improved system model was
implemented to perform no-reference image quality assessment using natural scene statistics. We used
the contourlet transform to decompose an image into CT coefficients, which provides a flexible
multiscale and multidirectional representation. We then used the joint histogram to capture the
nonlinear dependencies between CT coefficients across subbands, and the statistics of the CT
coefficients to observe the variation of image quality. We applied an image-dependent threshold to the
joint histogram to reduce the effect of image content during the calibration of image quality. Finally,
we calibrated the image quality by combining all the extracted features nonlinearly. We used 78 images
for training and 78 images for testing, together with their corresponding subjective scores. We then
compared our calibrated results with SSIM and validated them using Pearson's correlation coefficient.
The results obtained with the implemented model suggest that the contourlet transform is a sensible
choice for no-reference image quality assessment.
5.2 Future Work
This approach can be applied to images compressed with different image compression techniques. The
same approach can also be applied to videos compressed with different compression techniques.
APPENDIX A
Set 1
Figure A.1: First set of bikes images that were taken for testing. The first image is the original and
the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 8); Y-axis: Quality (0 to 1).]
Figure A.2: Quality prediction for the bikes images above in comparison with their subjective values
and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a scale of
0 to 1.
Set 2
Figure A.3: Second set of parrot images that were taken for testing. The first image is the original
and the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 8); Y-axis: Quality (0 to 1).]
Figure A.4: Quality prediction for the parrots images above in comparison with their subjective values
and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a scale of
0 to 1.
Set 3
Figure A.5: Third set of house images that were taken for testing. The first image is the original and
the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 8); Y-axis: Quality (0 to 1).]
Figure A.6: Quality prediction for the house images above in comparison with their subjective values
and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a scale of
0 to 1.
Set 4
Figure A.7: Fourth set of woman images that were taken for testing. The first image is the original and
the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 8); Y-axis: Quality (0 to 1).]
Figure A.8: Quality prediction for the woman images above in comparison with their subjective values
and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a scale of
0 to 1.
Set 5
Figure A.9: Fifth set of statue images that were taken for testing. The first image is the original and
the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 8); Y-axis: Quality (0 to 1).]
Figure A.10: Quality prediction for the statue images above in comparison with their subjective values
and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a scale of
0 to 1.
Set 6
Figure A.11: Sixth set of woman hat images that were taken for testing. The first image is the original
and the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 8); Y-axis: Quality (0 to 1).]
Figure A.12: Quality prediction for the woman hat images above in comparison with their subjective
values and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a
scale of 0 to 1.
Set 7
Figure A.13: Seventh set of monarch images that were taken for testing. The first image is the original
and the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 7); Y-axis: Quality (0 to 1).]
Figure A.14: Quality prediction for the monarch images above in comparison with their subjective
values and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a
scale of 0 to 1.
Set 8
Figure A.15: Eighth set of sailing4 images that were taken for testing. The first image is the original
and the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 7); Y-axis: Quality (0 to 1).]
Figure A.16: Quality prediction for the sailing4 images above in comparison with their subjective
values and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a
scale of 0 to 1.
Set 9
Figure A.17: Ninth set of lighthouse2 images that were taken for testing. The first image is the
original and the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 8); Y-axis: Quality (0 to 1).]
Figure A.18: Quality prediction for the lighthouse2 images above in comparison with their subjective
values and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a
scale of 0 to 1.
Set 10
Figure A.19: Tenth set of ocean images that were taken for testing. The first image is the original and
the remaining images are its corresponding distorted versions.
[Plot: Quality comparison of Subjective, NR-IQACT and SSIM image quality. X-axis: Image set (ticks 1
to 7); Y-axis: Quality (0 to 1).]
Figure A.20: Quality prediction for the ocean images above in comparison with their subjective values
and the SSIM metric. The X-axis indicates the 8 images and the Y-axis indicates quality on a scale of
0 to 1.
BIBLIOGRAPHY
[1] Wen Lu, Kai Zeng, Dacheng Tao, Yuan Yuan, Xinbo Gao. (2009, June). No-reference image quality
assessment in contourlet domain. Neurocomputing. [Online]. 73(4-6), pp. 784-794.
[2] Minh N. Do, Martin Vetterli, “The Contourlet Transform: An Efficient Directional Multiresolution
Image Representation,” IEEE Transactions on Image Processing, vol. 14, no. 12, pp. 2091-2106,
December 2005.
[3] Stefan Winkler, “Video Quality,” in Digital Video Quality. West Sussex, England: Wiley, 2005,
pp. 35-70.
[4] E.P. Simoncelli, Statistical modeling of photographic images, in: A.C. Bovik (Ed.), Handbook of
Image and Video Processing, second ed., Academic Press, NY, 2005, pp. 431–441.
[5] H.R. Sheikh, A.C. Bovik, L. Cormack, No-reference quality assessment using natural scene
statistics: JPEG2000, IEEE Transactions on Image Processing 14 (11) (2005) 1918–1927.
[6] Muhammad Shahid, Andreas Rossholm, and Benny Lövström, “A no-reference machine learning based
video quality predictor,” in IEEE 5th International Workshop on Quality of Multimedia Experience,
Oct. 2013, pp. 176–181.
[7] H.R. Sheikh, A.C. Bovik, Gustavo de Veciana, An Information Fidelity Criterion for Image Quality
Assessment Using Natural Scene Statistics, IEEE Transactions on Image Processing 14 (12) (2005)
2117–2128.
[8] Zhou Wang, Guixing Wu, Hamid Rahim Sheikh, Eero P. Simoncelli, En-Hui Yang, Alan Conrad Bovik,
Quality-Aware Images, IEEE Transactions on Image Processing 15 (06) (2006) 1680–1689.
[9] H.R. Sheikh, Muhammad Farooq Sabir, A.C. Bovik, A Statistical Evaluation of Recent Full Reference
Image Quality Assessment Algorithms, IEEE Transactions on Image Processing 15 (11) (2006) 3441–3452.
[10] M. N. Do, “Directional Multi-resolution Image Representations,” Ph.D. Thesis, Department of
Communication Systems, Swiss Federal Institute of Technology Lausanne, November 2001,
http://www.ifp.uiuc.edu/~minhdo/publications.
[11] R. J. Duffin and A. C. Schaeffer, A class of nonharmonic Fourier series, Trans. Amer. Math. Soc.,
72:341–366, 1952.
[12] P. J. Burt and E. H. Adelson, “The Laplacian pyramid as a compact image code,” IEEE Trans.
Commun., vol. 31, no. 4, pp. 532–540, April 1983.
[13] Minh N. Do, “A Friendly Guide to the Frame Theory and Its Application to Signal Processing,”
Signal Processing Seminar, University of Illinois at Urbana-Champaign, Feb. 2003.
[14] M. N. Do and M. Vetterli, “Framing pyramids,” IEEE Trans. Signal Proc., pp. 2329–2342, Sep. 2003.
[15] R. H. Bamberger and M. J. T. Smith, “A filter bank for the directional decomposition of images:
Theory and design,” IEEE Trans. Signal Proc., vol. 40, no. 4, pp. 882–893, April 1992.
[16] P. P. Vaidyanathan, “Multirate System Fundamentals,” in Multirate Systems and Filter Banks.
Englewood Cliffs, NJ: Prentice-Hall, 1993, pp. 100-178.
[17] S. Park, M. J. T. Smith, and R. M. Mersereau, A new directional filterbank for image analysis and
classification, in Proc. IEEE Int. Conf. Acoust., Speech, and Signal Proc., pages 1417–1420, 1999.
[18] Wentong Xue, Jianshe Song, Lihai Yuan, Tao Shen, “Statistical Dependences of Images Coefficients
in Contourlet Domain: Analyzing and Modeling,” Intelligent Control and Automation, 2006. WCICA 2006.
The Sixth World Congress on, vol. 2, pp. 10121-10125.
[19] R.W. Buccigrossi, E.P. Simoncelli, Image compression via joint statistical characterization in the
wavelet domain, IEEE Transactions on Image Processing 8 (12) (1999) 1688–1701.
[20] H.R. Sheikh, Z. Wang, L. Cormack, A.C. Bovik, LIVE Image Quality Assessment Database, 2003,
available online: http://live.ece.utexas.edu/research/quality.
[21] W. Wen-Jie, Y. Xu, “Correlation analysis of visual verbs' sub categorization based on Pearson's
correlation coefficient,” International Conference on Machine Learning and Cybernetics (ICMLC),
vol. 4, pp. 2042-2046, July 2010.