F-MAD: A Feature-Based Extension of the Most Apparent
Distortion Algorithm for Image Quality Assessment
Punit Singh and Damon M. Chandler
Laboratory of Computational Perception and Image Quality,
School of Electrical and Computer Engineering,
Oklahoma State University, Stillwater, OK 74078 USA
ABSTRACT
In this paper, we describe the results of a study designed to investigate the effectiveness of peak signal-to-noise
ratio (PSNR) as a quality estimator when measured in various feature domains. Although PSNR is well known to
be a poor predictor of image quality, PSNR has been shown to be quite effective for additive, pixel-based distortions.
We hypothesized that PSNR might also be effective for other types of distortions which induce changes to other
visual features, as long as PSNR is measured between local measures of such features. Given a reference and
distorted image, five feature maps are measured for each image (lightness distance, color distance, contrast, edge
strength, and sharpness). We describe a variant of PSNR in which quality is estimated based on the extent to
which these feature maps for the reference image differ from the corresponding maps for the distorted image.
We demonstrate how this feature-based approach can lead to improved estimators of image quality.
1. INTRODUCTION
A crucial requirement for any system that processes images is a means of assessing the impacts of such processing
on the visual quality of the resulting images. Over the last several decades, numerous algorithms for image quality
assessment (IQA) have been developed to meet this requirement. IQA algorithms aim to predict the quality
of an image in a manner that agrees with quality as judged by human subjects. Here, we specifically focus on
full-reference IQA algorithms which require both a reference image and a distorted image.
The simplest approach to full-reference IQA is to measure local pixelwise differences, and then to collapse
these local measurements into a scalar which represents the overall quality. The mean-squared error (MSE) and
its log-based counterpart, peak signal-to-noise ratio (PSNR), were the earliest and simplest measures of local
pixelwise differences. To improve predictive performance, variants of MSE/PSNR have been measured in the
luminance domain,1 with frequency weighting based on the human contrast sensitivity function (see, e.g.,
Ref. 2), and with further adjustments for other low-level properties of the human visual system (HVS) (e.g.,
Ref. 3).
More recent and complete IQA algorithms have employed a wide variety of approaches. Numerous IQA
algorithms have been designed based on computational models of the HVS (e.g., Refs. 2, 4–9). Numerous IQA
algorithms have also been designed based on structural similarity (e.g., Refs. 10, 11). Other IQA algorithms
have been designed based on various statistical and information-theoretic-based approaches (e.g., Refs. 12, 13),
based on machine-learning (e.g., Refs. 14, 15), and based on many other techniques (see Ref. 16 for a review).
All of the aforementioned IQA approaches have been shown to outperform PSNR when tested across images
and distortion-types from various IQA databases. However, one important observation when examining the
performances of these IQA algorithms vs. PSNR is that the latter is still quite competitive (and can even
outperform most IQA algorithms) on certain types of distortions, most notably additive noise. For example,
on the TID database,17 PSNR outperforms the vast majority of IQA algorithms on most additive noise types
(white grayscale noise, white color noise, correlated noise, impulse noise, high-frequency noise, etc.). Thus,
PSNR, which is measured between the pixel values of the reference vs. distorted images, appears to be quite
effective at capturing quality differences when the changes are perceived as pixel-based degradation. Following
this argument, it would seem that PSNR might also be effective when measured between feature values of the
reference vs. distorted images, when the changes are perceived as degradations to the corresponding features
(e.g., degradation of perceived contrast, perceived sharpness, perceived edge clarity, etc.).
P. S.: E-mail: [email protected]; D.M.C.: E-mail: [email protected]
In this paper, we describe the results of a study designed to investigate the effectiveness of PSNR as a quality
estimator when PSNR is measured in various feature domains. We specifically investigated measuring PSNR
between the same feature maps used in our algorithm for detecting main subjects in images.18 Given a reference
and distorted image, we measure, for each block in each image, five low-level features: (1) lightness distance,
(2) color distance, (3) contrast, (4) edge strength, and (5) sharpness. These block-based measures thus result
in five feature maps for the reference image, and five feature maps for the distorted image. From these feature
maps, quality is estimated based on the extent to which the feature maps for the reference image differ from the
corresponding maps for the distorted image. We specifically present a measure of quality—F-PSNR—in which
the differences between the feature maps are quantified based on a combination of the average PSNR and the
average Pearson correlation coefficient. We also describe a straightforward technique of integrating F-PSNR into
the MAD (Most Apparent Distortion) IQA algorithm,9 resulting in what we have termed F-MAD. As we will
demonstrate, this feature-based approach can lead to improved estimates of quality.
This paper is organized as follows. Section 2 describes the feature maps and how they are used to estimate
quality (F-PSNR and F-MAD). Section 3 analyzes the performances of these feature-based IQA measures in
predicting subjective ratings of quality. General conclusions are provided in Section 4.
2. ALGORITHM
In this section, we first provide details of the feature maps (Section 2.1), and then we describe how these maps
are used in the F-PSNR and F-MAD measures of quality (Sections 2.2 and 2.3).
2.1. Feature Maps
Given a reference and distorted image, five feature maps are computed for each image: (1) a map of local lightness
distance (the distance between the average lightness of a region and the average lightness of the entire image);
(2) a map of local color distance; (3) a map of local luminance contrast; (4) a map of local edge-strength; and
(5) a map of local sharpness. We have previously shown that these feature maps are effective for detecting main
subjects in images.18 Here, we argue that these maps can also be effective for quality assessment.
Let X denote a (reference or distorted) image, and let x denote an 8x8 block of X with 50% overlap between
neighboring blocks. Let fi (x), i ∈ [1, 5], denote the ith feature value measured for x. From all fi (x), x ∈ X, we
form the ith feature map, which we denote as fi (X).
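For concreteness, the following is a minimal Python/NumPy sketch of this block tiling; the border handling and data layout are illustrative assumptions rather than details taken from the text.

```python
import numpy as np

def overlapping_blocks(image, block_size=8, overlap=0.5):
    """Yield (row, col, block) for square blocks with the given overlap.

    A minimal sketch of the tiling described above; 50% overlap on 8x8
    blocks corresponds to a stride of 4 pixels.
    """
    stride = int(block_size * (1 - overlap))
    h, w = image.shape[:2]
    for r in range(0, h - block_size + 1, stride):
        for c in range(0, w - block_size + 1, stride):
            yield r, c, image[r:r + block_size, c:c + block_size]
```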
2.1.1. Lightness and Color Distance
Let f1 (x) denote the Euclidean distance between the average lightness of block x and the average lightness of
the entire image. Let f2 (x) denote the Euclidean distance between the average color of block x and the average
color of the entire image. These two features are given by:
f_1(x) = \left| \bar{L}^{*}_{x} - \bar{L}^{*}_{X} \right| ,    (1)

f_2(x) = \sqrt{ \left( \bar{a}^{*}_{x} - \bar{a}^{*}_{X} \right)^{2} + \left( \bar{b}^{*}_{x} - \bar{b}^{*}_{X} \right)^{2} } ,    (2)
where L̄∗ , ā∗ , b̄∗ denote the average L∗ , a∗ , b∗ measured in the CIE 1976 (L∗ , a∗ , b∗ ) color space (CIELAB). See
Ref. 18 for details on how RGB values are converted to (L∗ , a∗ , b∗ ) values.
The second and third rows of Figure 1 show lightness-distance maps f1 (X) and color-distance maps f2 (X),
respectively, for various images.
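A hedged sketch of how the f1(X) and f2(X) maps might be computed follows. The CIELAB conversion used here is scikit-image's rgb2lab rather than the conversion described in Ref. 18, and the block tiling mirrors the 50% overlap described above; both are illustrative assumptions.

```python
import numpy as np
from skimage.color import rgb2lab  # assumed conversion; Ref. 18 describes its own RGB-to-CIELAB step

def lightness_color_distance_maps(rgb_image, block_size=8, stride=4):
    """Sketch of the f1 (lightness-distance) and f2 (color-distance) maps (Eqs. 1-2)."""
    lab = rgb2lab(rgb_image)                         # assumes RGB values in [0, 1]
    L_img, a_img, b_img = lab.reshape(-1, 3).mean(axis=0)
    f1_vals, f2_vals = [], []
    h, w = lab.shape[:2]
    for r in range(0, h - block_size + 1, stride):
        for c in range(0, w - block_size + 1, stride):
            blk = lab[r:r + block_size, c:c + block_size].reshape(-1, 3).mean(axis=0)
            f1_vals.append(abs(blk[0] - L_img))                        # Eq. (1)
            f2_vals.append(np.hypot(blk[1] - a_img, blk[2] - b_img))   # Eq. (2)
    n_rows = len(range(0, h - block_size + 1, stride))
    return (np.array(f1_vals).reshape(n_rows, -1),
            np.array(f2_vals).reshape(n_rows, -1))
```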
Figure 1. Example images and their feature maps. Images in the first row are select reference images from the CSIQ database19 (Monument, Fisher, Sparrow, Swarm, and NativeUS). The second through sixth rows show maps of lightness distance, color distance, contrast, edge strength, and sharpness.
2.1.2. Contrast
Local contrast can also be an important factor which influences an image’s visual appearance. To measure this,
we first convert the image into the luminance domain. Then, the root mean square (RMS) contrast of each block
is given by the ratio of the standard deviation of luminances to the mean luminance of the respective block. The
result is a map in which each value represents local RMS contrast.
Specifically, let f3 (x) denote the RMS contrast of block x. In order to compute f3 (x), we first convert the
image X into a grayscale image Xg via Xg = 0.299R + 0.587G + 0.114B.
Let xg denote the corresponding block in Xg. Let l(x) = (b + k xg)^γ denote the luminance-valued block, with
b = 0.7297, k = 0.0376 and γ = 2.2 assuming sRGB display conditions. The quantity f3 (x) is then computed
via:
f_3(x) = \begin{cases} \sigma_{l(x)} / \mu_{l(x)} , & \mu_{l(x)} > 0 , \\ 0 , & \mu_{l(x)} = 0 , \end{cases}    (3)
where σl(x) and μl(x) denote the standard deviation and the mean of l(x), respectively.
The fourth row of Figure 1 shows contrast maps f3 (X) for various images.
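The sketch below illustrates Eq. (3) under the sRGB display model given above. Taking grayscale pixel values in [0, 255] before the (b + k·xg)^γ mapping is consistent with the listed constants but is an assumption of this sketch.

```python
import numpy as np

def rms_contrast_map(gray_image, block_size=8, stride=4,
                     b=0.7297, k=0.0376, gamma=2.2):
    """Sketch of the f3 (RMS contrast) map, Eq. (3).

    `gray_image` is the 0.299R + 0.587G + 0.114B grayscale image,
    assumed here to have values in [0, 255].
    """
    luminance = (b + k * gray_image) ** gamma
    rows = []
    h, w = luminance.shape
    for r in range(0, h - block_size + 1, stride):
        row = []
        for c in range(0, w - block_size + 1, stride):
            blk = luminance[r:r + block_size, c:c + block_size]
            mu = blk.mean()
            row.append(blk.std() / mu if mu > 0 else 0.0)    # Eq. (3)
        rows.append(row)
    return np.array(rows)
```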
2.1.3. Edge Strength
To quantify similarity between object boundaries, we use maps of local edge strength. First, edges are detected
by using the Roberts edge detector.20 Then the edge strength of each block is computed as the average of the binary edge-map values within that block (i.e., the fraction of detected edge pixels). The result is a map in which each value represents local edge strength.
Specifically, let f4 (x) denote the edge strength of block x. Let E(X) denote the binary edge map computed
by running Roberts edge detector on X. The feature f4 (x) is then given by:
f_4(x) = \mu_{E(x)} = \frac{1}{m^{2}} \sum_{j} e_{j} ,    (4)

where E(x) is the block of E(X) corresponding to x, e_j is the jth pixel of E(x), and m^2 is the number of pixels in the block (here, m = 8).
The fifth row of Figure 1 shows edge-strength maps f4 (X) for various images.
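A sketch of Eq. (4) follows. Note that scikit-image's roberts filter returns an edge-magnitude image rather than a binary map, so the threshold used here to binarize it is an assumption, not a value from the paper.

```python
import numpy as np
from skimage.filters import roberts

def edge_strength_map(gray_image, block_size=8, stride=4, threshold=0.1):
    """Sketch of the f4 (edge-strength) map, Eq. (4).

    Assumes `gray_image` is a float image in [0, 1]; the binarization
    threshold is an illustrative choice.
    """
    edges = roberts(gray_image) > threshold          # binary edge map E(X)
    rows = []
    h, w = edges.shape
    for r in range(0, h - block_size + 1, stride):
        row = [edges[r:r + block_size, c:c + block_size].mean()   # Eq. (4)
               for c in range(0, w - block_size + 1, stride)]
        rows.append(row)
    return np.array(rows)
```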
2.1.4. Sharpness
In general, the sharper an image, the better its quality. If an image is blurred, it is difficult to distinguish
between neighboring objects; blurring also reduces the ability to visually recognize objects. Thus, sharpness can
potentially be a useful feature for estimating image quality.
Let f5 (X) denote the sharpness map for image X. For measuring local sharpness, we employ our own S3
sharpness estimator21 in which local sharpness is measured in both the frequency domain and the spatial domain.
In the frequency domain, the image is divided into 32x32 pixel blocks with 75% overlap. The slope of the power
spectrum averaged across all orientations serves as the spectral sharpness measure. In the spatial domain, the
image is divided into 8x8 pixel blocks, and then a measure of the local total variation serves as the spatial
sharpness measure. The two sharpness measures are then combined via a geometric mean. The result is a map
in which each value represents local sharpness.
The sixth row of Figure 1 shows sharpness maps f5 (X) for various images.
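The following is only a crude, single-block illustration in the spirit of S3; it omits the contrast gating, the sigmoidal mapping of the spectral slope, and the exact block geometry described in Ref. 21, and the rescaling of the slope below is a stand-in of our own.

```python
import numpy as np

def block_sharpness(gray_block):
    """A crude, single-block sketch in the spirit of S3 (Ref. 21); not the real algorithm."""
    # Spectral measure: slope of the orientation-averaged power spectrum in log-log coordinates.
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray_block))) ** 2
    h, w = gray_block.shape
    yy, xx = np.indices((h, w))
    radius = np.hypot(yy - h / 2, xx - w / 2).astype(int)
    sums = np.bincount(radius.ravel(), spectrum.ravel())
    counts = np.bincount(radius.ravel())
    radial_power = (sums / np.maximum(counts, 1))[1:min(h, w) // 2]
    freqs = np.arange(1, len(radial_power) + 1)
    slope, _ = np.polyfit(np.log(freqs), np.log(radial_power + 1e-12), 1)
    # Sharper content has a slower spectral fall-off (slope closer to 0);
    # this linear rescaling replaces S3's sigmoidal mapping.
    spectral = np.clip(1.0 + slope / 3.0, 0.0, 1.0)

    # Spatial measure: mean absolute difference between neighboring pixels (local total variation).
    spatial = (np.abs(np.diff(gray_block, axis=0)).mean() +
               np.abs(np.diff(gray_block, axis=1)).mean())

    return np.sqrt(spectral * spatial)   # geometric mean of the two measures
```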
2.2. PSNR and Correlation Between Feature Maps
Given the five feature maps, we estimate quality based on the extent to which the feature maps of the distorted
image differ from the feature maps of the reference image. We employ PSNR and Pearson correlation coefficient
to quantify the overall difference between each pair of maps (distorted image’s map vs. reference image’s map).
Let F-PSNR denote this feature-based quality measure. A block diagram of the F-PSNR computation is shown
in Figure 2.
Let Xr and Xd denote the reference and distorted images, respectively. The PSNR between each feature
map is given by
\mathrm{PSNR}\left( f_i(X_r), f_i(X_d) \right) = 10 \log_{10} \frac{R^{2}}{\mathrm{MSE}} ,    (5)

where f_i(X_r) and f_i(X_d) denote the ith feature map for images X_r and X_d, respectively; R denotes the peak value of the signal; and MSE denotes the mean-squared error between f_i(X_r) and f_i(X_d).
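A sketch of Eq. (5) applied to a pair of feature maps is given below; the choice of the peak value R for each feature domain is not specified in the text, so using the reference map's maximum is an assumption of this sketch.

```python
import numpy as np

def feature_psnr(map_ref, map_dst):
    """Sketch of Eq. (5): PSNR between a reference and a distorted feature map."""
    mse = np.mean((map_ref - map_dst) ** 2)
    if mse == 0:
        return np.inf                      # identical maps
    R = map_ref.max()                      # assumed peak value for this feature domain
    return 10.0 * np.log10(R ** 2 / mse)
```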
Figure 2. Block diagram of the F-PSNR quality measure.
We also compute the linear correlation coefficient between the corresponding maps from the two images,
given by
\mathrm{CORR}\left( f_i(X_r), f_i(X_d) \right) = \frac{ \sum_{n_1} \sum_{n_2} \left( f_i(X_r)_{n_1,n_2} - \overline{f_i(X_r)} \right) \left( f_i(X_d)_{n_1,n_2} - \overline{f_i(X_d)} \right) }{ \sqrt{ \sum_{n_1} \sum_{n_2} \left( f_i(X_r)_{n_1,n_2} - \overline{f_i(X_r)} \right)^{2} } \, \sqrt{ \sum_{n_1} \sum_{n_2} \left( f_i(X_d)_{n_1,n_2} - \overline{f_i(X_d)} \right)^{2} } } ,    (6)

where f_i(X_r)_{n_1,n_2} and f_i(X_d)_{n_1,n_2} denote the (n_1, n_2) element of f_i(X_r) and f_i(X_d), respectively; and where \overline{f_i(X_r)} and \overline{f_i(X_d)} denote the means of f_i(X_r) and f_i(X_d), respectively.
Finally, F-PSNR is computed by multiplying the correlation coefficients with the corresponding PSNRs, and
then averaging the products:
F\text{-PSNR} = \frac{1}{5} \sum_{i=1}^{5} \mathrm{PSNR}\left( f_i(X_r), f_i(X_d) \right) \times \mathrm{CORR}\left( f_i(X_r), f_i(X_d) \right) .    (7)
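Putting the pieces together, Eq. (7) can be sketched as follows, reusing the feature_psnr sketch given after Eq. (5) and NumPy's corrcoef for the Pearson correlation of Eq. (6).

```python
import numpy as np

def f_psnr(ref_maps, dst_maps):
    """Sketch of Eq. (7): average of correlation-weighted PSNRs over the five feature maps.

    `ref_maps` and `dst_maps` are lists of the five feature maps (NumPy arrays)
    for the reference and distorted images.
    """
    scores = []
    for m_ref, m_dst in zip(ref_maps, dst_maps):
        psnr = feature_psnr(m_ref, m_dst)                            # Eq. (5)
        corr = np.corrcoef(m_ref.ravel(), m_dst.ravel())[0, 1]      # Eq. (6)
        scores.append(psnr * corr)
    return np.mean(scores)
```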
As we will demonstrate in Section 3, F-PSNR on its own performs quite competitively with current state-of-the-art IQA algorithms in predicting quality. However, additional improvements in predictive performance
can potentially be gained by combining F-PSNR with an existing IQA algorithm. In the following section, we
describe a combination of F-PSNR and the MAD IQA algorithm.9
2.3. F-MAD: Augmenting MAD with F-PSNR
To investigate the effectiveness of F-PSNR as a supplement to existing IQA algorithms, we augmented the MAD
(Most Apparent Distortion)9 algorithm with F-PSNR.
MAD was one of the first algorithms to demonstrate that quality can be predicted by modeling two strategies
employed by the HVS, and by adapting these strategies based on the amount of distortion. For high-quality
images, in which the distortion is less noticeable, the image is most apparent, and thus the HVS attempts to look
past the image and look for the distortion—a detection-based strategy. For low-quality images, the distortion
is most apparent, and thus the HVS attempts to look past the distortion and look for the image’s subject
matter—an appearance-based strategy.
In MAD, two main stages are employed: (1) a detection-based stage, which computes the perceived distortion
due to visual detection of distortions ddetect ; and (2) an appearance-based stage, which computes the perceived
distortion due to visual appearance changes dappear . The detection-based stage of MAD computes ddetect by using
a masking-weighted block-based mean square error which is computed in the lightness domain. The appearancebased stage of MAD computes dappear by computing the average difference between the block-based log-Gabor
statistics of the original image to those of the distorted image.
To augment MAD with F-PSNR, we employ the following weighted geometric mean:
F\text{-MAD} = \left( d_{\mathrm{detect}} \right)^{\alpha} \left( d_{\mathrm{appear}} \right)^{\beta} \left( F\text{-PSNR} \right)^{-\gamma} ,    (8)

where ddetect and dappear denote the outputs of MAD's detection-based and appearance-based stages, respectively.
The parameters β and γ are given by β = (1 − α)/2 and γ = 1 − α − β, where α is the blending parameter computed
in the original MAD algorithm:

\alpha = \frac{1}{1 + \beta_1 \left( d_{\mathrm{detect}} \right)^{\beta_2}} ,    (9)
where β1 = 0.32 and β2 = 0.132.
As argued in Ref. 9, Equation (9) was designed to give greater weight to ddetect for high-quality images
and greater weight to dappear for low-quality images. Here, because F-PSNR does not take into account visual
masking, we chose β = (1 − α)/2 and γ = 1 − α − β so that F-PSNR supplements MAD's appearance-based stage
rather than its detection-based stage.
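Equations (8) and (9) can be sketched as follows; how ddetect, dappear, and F-PSNR are normalized relative to one another in the authors' implementation is not specified in the text, so the sketch simply applies the formulas literally.

```python
def f_mad(d_detect, d_appear, f_psnr_value, beta1=0.32, beta2=0.132):
    """Sketch of Eqs. (8)-(9): blending MAD's two stages with F-PSNR."""
    alpha = 1.0 / (1.0 + beta1 * d_detect ** beta2)   # Eq. (9)
    beta = (1.0 - alpha) / 2.0
    gamma = 1.0 - alpha - beta                        # equals (1 - alpha) / 2
    return (d_detect ** alpha) * (d_appear ** beta) * (f_psnr_value ** -gamma)   # Eq. (8)
```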
3. RESULTS
We applied F-PSNR and F-MAD to two publicly available databases of subjective image quality: LIVE22 and
CSIQ.19 We compared F-PSNR and F-MAD with standard PSNR and five other modern full-reference IQA
algorithms for which code is publicly available: SSIM,10 MS-SSIM,11 VIF,12 VSNR,7 and MAD.9 Four
measures of performance were employed: Pearson correlation coefficient (CC), Spearman rank order correlation
coefficient (SROCC), outlier ratio (OR), and outlier distance (OD). For all IQA algorithms, a four-parameter
sigmoid was applied before computing CC, OR, and OD to compensate for nonlinear relations between the
predictions and subjective scores.
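The exact form of the four-parameter sigmoid used in this study is not given in the text; the sketch below uses one common choice (a four-parameter logistic fitted with SciPy) purely for illustration of the procedure.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic4(x, a, b, c, d):
    """A common four-parameter logistic; the exact form used in the study is an assumption."""
    return (a - b) / (1.0 + np.exp(-(x - c) / d)) + b

def fit_and_correlate(objective_scores, subjective_scores):
    """Map objective scores through the fitted logistic, then compute CC and SROCC."""
    p0 = [max(subjective_scores), min(subjective_scores),
          np.mean(objective_scores), np.std(objective_scores)]
    params, _ = curve_fit(logistic4, objective_scores, subjective_scores,
                          p0=p0, maxfev=10000)
    fitted = logistic4(np.asarray(objective_scores), *params)
    cc = pearsonr(fitted, subjective_scores)[0]
    srocc = spearmanr(objective_scores, subjective_scores)[0]   # rank-based; no fit needed
    return cc, srocc
```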
Table 1 lists the resulting CC, SROCC, OR, and OD of each algorithm on each database. Notice from Table
1 that F-PSNR outperforms PSNR, SSIM, MS-SSIM, and VSNR. In terms of CC, F-PSNR yields values of
0.949 and 0.931 on LIVE and CSIQ, respectively. This finding suggests that changes to the feature maps caused
by the distortions can be an effective proxy for estimating quality.
For F-MAD, the results in Table 1 demonstrate that the combination of F-PSNR and MAD may or may not
lead to improved predictions over MAD alone. In terms of CC, F-MAD yields CC values of 0.970 and 0.962 on
LIVE and CSIQ, respectively; and MAD alone yields 0.968 and 0.950 on these databases. The improvement on
the LIVE database is negligible; however, the improvement on the CSIQ database is significant. (For comparison,
the next overall best performer, VIF, yields CC values of 0.960 and 0.925 on the respective databases.) Although
F-PSNR on its own shows promise, there is clearly a need to further research proper techniques of combining
F-PSNR with existing IQA algorithms.
Table 1. Performances of F-MAD and other quality assessment algorithms on images from the LIVE and CSIQ databases.
The results in the Average rows denote averages weighted by the number of images in the databases. The best performances
are bolded.

                  PSNR    SSIM  MS-SSIM    VSNR     VIF     MAD  F-PSNR   F-MAD
CC      LIVE     0.871   0.938    0.933   0.923   0.960   0.968   0.949   0.970
        CSIQ     0.800   0.815    0.897   0.800   0.925   0.950   0.931   0.962
        Average  0.835   0.876    0.915   0.862   0.942   0.959   0.940   0.966
SROCC   LIVE     0.876   0.947    0.944   0.928   0.963   0.968   0.953   0.970
        CSIQ     0.806   0.837    0.914   0.811   0.919   0.947   0.929   0.956
        Average  0.841   0.892    0.929   0.869   0.941   0.957   0.941   0.963
OR      LIVE     0.682   0.592    0.619   0.588   0.546   0.415   0.557   0.398
        CSIQ     0.343   0.335    0.245   0.311   0.226   0.180   0.216   0.170
        Average  0.512   0.463    0.432   0.449   0.386   0.297   0.387   0.284
OD      LIVE      4943    2814     2960    3247    1890    1370    2331    1282
        CSIQ      3178    2896     1528    3325    1218     626     936     579
4. CONCLUSIONS
This paper described the results of a study designed to investigate the effectiveness of PSNR as a quality estimator
when measured between feature maps for the reference and distorted images. Given a reference and distorted
image, five feature maps are measured for each image (lightness distance, color distance, contrast, edge strength,
and sharpness). Quality is then estimated based on the extent to which these feature maps for the reference
image differ from the corresponding maps for the distorted image. We demonstrated how this feature-map-based
approach (F-PSNR) can yield a competitive IQA strategy, and how it can be used to augment and improve an
existing IQA algorithm (F-MAD).
5. ACKNOWLEDGMENTS
This material is based upon work supported by, or in part by, the National Science Foundation Award 0917014.
REFERENCES
1. B. Moulden, F. A. A. Kingdom, and L. F. Gatley, “The standard deviation of luminance as a metric for
contrast in random-dot images,” Perception 19, pp. 79–101, 1990.
2. N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik, “Image quality assessment
based on a degradation model,” IEEE Transactions on Image Processing 9, 2000.
3. K. Egiazarian, J. Astola, N. Ponomarenko, V. Lukin, F. Battisti, and M. Carli, "New full-reference quality metrics based on HVS," in Proceedings of the Second International Workshop on Video Processing and Quality Metrics, Scottsdale, AZ, USA.
4. P. LeCallet, A. Saadane, and D. Barba, “Frequency and spatial pooling of visual differences for still image
quality assessment,” Proc. SPIE Human Vision and Electronic Imaging V 3959, pp. 595–603, 2000.
5. “JNDMetrix technology.” Sarnoff Corporation.
6. A. Ninassi, O. Le Meur, P. Le Callet, and D. Barba, “Which semi-local visual masking model for wavelet
based image quality metric?,” in Image Processing, 2008. ICIP 2008. 15th IEEE International Conference
on, pp. 1180–1183, Oct. 2008.
7. D. M. Chandler and S. S. Hemami, "VSNR: A wavelet-based visual signal-to-noise ratio for natural images,"
IEEE Transactions on Image Processing 16(9), pp. 2284–2298, 2007.
8. V. Laparra, J. Muñoz-Marí, and J. Malo, "Divisive normalization image quality metric revisited," J. Opt.
Soc. Am. A 27, pp. 852–864, Apr. 2010.
9. E. C. Larson and D. M. Chandler, “Most apparent distortion: full-reference image quality assessment and
the role of strategy,” Journal of Electronic Imaging 19(1), p. 011006, 2010.
10. Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: From error visibility to
structural similarity,” IEEE Transactions on Image Processing 13, pp. 600–612, 2004.
11. Z. Wang, E. Simoncelli, and A. Bovik, “Multiscale structural similarity for image quality assessment,”
in Conference Record of the Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2,
pp. 1398–1402, Nov. 2003.
12. H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Transactions on Image
Processing 15(2), pp. 430–444, 2006.
13. A. Shnayderman, A. Gusev, and A. M. Eskicioglu, “An SVD-based grayscale image quality measure for
local and global assessment,” IEEE Transactions on Image Processing 15(2), pp. 422–429, 2006.
14. M. Liu and X. Yang, "A new image quality approach based on decision fusion," in Fourth International Conference on Fuzzy Systems and Knowledge Discovery 4, pp. 10–14, 2008.
15. P. Peng and Z. Li, “Image quality assessment based on distortion-aware decision fusion,” in Proceedings
of the Second Sino-foreign-interchange conference on Intelligent Science and Intelligent Data Engineering,
pp. 644–651, 2012.
16. D. M. Chandler, "Seven challenges in image quality assessment: Past, present, and future research," ISRN Signal Processing, 2012, in press.
17. N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti, "TID2008 - A database
for evaluation of full-reference visual quality assessment metrics,” Advances of Modern Radioelectronics 10,
pp. 30–45, 2009.
18. C. Vu and D. M. Chandler, “Main subject detection via adaptive feature refinement,” Journal of Electronic
Imaging 20, Mar. 2011.
19. E. C. Larson and D. M. Chandler, “Categorical subjective image quality CSIQ database,” 2009.
20. L. G. Roberts, "Machine perception of three-dimensional solids," in Optical and Electro-Optical Information Processing, MIT Press, Cambridge, MA, 1965.
21. C. T. Vu, T. D. Phan, and D. M. Chandler, "S3: A spectral and spatial measure of local perceived sharpness in natural images," IEEE Transactions on Image Processing 21, pp. 934–945, Mar. 2012.
22. H. Sheikh, Z. Wang, L. Cormack, and A. Bovik, "LIVE image quality assessment database, Release 2."