EUSAR 2014
Unsupervised clustering of PolSAR data using Polarimetric G Distribution and Markov Random Fields
Salman Khan, Surrey Space Centre, University of Surrey, United Kingdom
Anthony Paul Doulgeris, University of Tromsø, Norway
Abstract
In this paper, an unsupervised PolSAR data clustering algorithm utilizing the flexible polarimetric G distribution is proposed for the first time. This algorithm has been demonstrated in earlier contributions using other non-Gaussian distributions, such as the K, G0, and U distributions. The K and G0 distributions have limited modeling capability because they have only one shape parameter, while the U distribution, although as flexible as the G model, has a very cumbersome probability density function, which makes its software implementation difficult and its computation slow. The proposed algorithm with the G distribution offers non-Gaussian modeling accuracy similar to that of the U model, a more easily implementable probability density function, and a much faster computation time. The only disadvantage is that the log-cumulants of the G model can only be computed by numerical differentiation; hence, fractional moment estimators are used in this analysis.
1 Introduction
Synthetic Aperture Radar (SAR) data are of significant interest because SAR sensors operate independently of weather and daylight. These data offer a viable alternative in situations where other sensors (e.g., optical) are hampered by cloud cover, rain, smoke, or poor lighting conditions. High-resolution SAR sensors now provide data with sub-meter resolution, comparable to that of some optical sensors. Further, polarimetric SAR (PolSAR) data provide more diverse information through the different transmit and receive polarization pairs, which helps to differentiate between the physical scattering mechanisms occurring at the target of interest.
SAR data are inherently statistical due to the presence of speckle, a characteristic phenomenon of coherent imaging systems. It is therefore natural to analyze such data with a probabilistic approach. Gaussian statistics model low-resolution SAR data reasonably well; however, when the resolution increases and the central limit theorem is no longer strictly satisfied, non-Gaussian statistics are observed. Consequently, many non-Gaussian probability models have been used to describe both single-channel and PolSAR data. For multilook PolSAR data, which is the format analyzed in this research, the underlying Gaussian statistics are modeled by the scaled complex Wishart distribution, sWd, while the non-Gaussian statistics are derived using the product model [1], which states that the backscattered signal is the product of a Gaussian speckle noise random variate and a positive scalar texture random variable.
The non-Gaussian multilook polarimetric Kd [2] and Gd0 [3] distributions are more flexible than the sWd distribution (one texture parameter each) and successfully model many PolSAR scenes. However, it has been noted that real PolSAR data sometimes require more modeling flexibility. In this regard, the multilook polarimetric Ud [4] and Gd [3] distributions are more flexible still, with two texture shape parameters each, and include the sWd, Kd, and Gd0 distributions as special cases. In fact, as shown recently in [5], the modeling flexibility of the Ud and Gd distributions is very similar.
ISBN 978-3-8007-3607-2 / ISSN 2197-4403
In many applications, clustering or segmentation of SAR data is of interest. These include land monitoring, mapping, change detection, damage assessment and detection, and rescue and recovery operations. Some recent algorithms are presented in [4, 6–10]. The clustering algorithm of interest in this paper is a modified version of the unsupervised expectation maximization (EM) algorithm. It was proposed by Doulgeris et al. in [7] and later extended in [10] to incorporate contextual smoothing through Markov random fields (MRF). The algorithm uses one of the aforementioned probability distributions as the underlying statistical model, and has recently been demonstrated with the flexible Ud distribution in [11]. However, the Ud pdf is computationally challenging because it involves Kummer-U functions, which do not have readily available logarithmic implementations and are only computable through numerical integration. This is the reason for the slow computation time of that segmentation algorithm, as noted in [11]. In contrast, the similarly flexible Gd pdf contains modified Bessel functions of the second kind, which have stable and well-tested logarithmic implementations in the GNU Scientific Library (GSL) [12]. It is therefore expected to be computationally faster with a similar modeling capability, which is exactly the motivation behind the current research.
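Why a logarithmic Bessel implementation matters can be shown with a small Python sketch. In C, GSL provides `gsl_sf_bessel_lnKnu`; the sketch below instead assumes SciPy and uses the exponentially scaled `scipy.special.kve`, which returns K_ν(x)·e^x, so that log K_ν(x) = log(kve(ν, x)) − x. The value K_0.5(1000) is about 2e-436, far below the smallest double, so evaluating `kv` directly and then taking the log cannot work, while the scaled form stays finite. The helper name `log_bessel_k` is hypothetical.

```python
import math
from scipy.special import kv, kve

def log_bessel_k(nu, x):
    """Log of the modified Bessel function K_nu(x) for x > 0.

    kve(nu, x) = K_nu(x) * exp(x), so subtracting x in the log domain
    avoids the underflow of K_nu(x) itself for large arguments.
    """
    return math.log(kve(nu, x)) - x

# Large argument: K_0.5(x) = sqrt(pi / (2x)) * exp(-x) has a closed form to compare with.
big = log_bessel_k(0.5, 1000.0)
# Moderate argument: agrees with the direct evaluation.
small = log_bessel_k(2.0, 3.0)
```

The scaling only protects against large arguments; very large orders (for example |α − Ld| in the thousands) can still overflow, so such cases need separate care.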
The rest of the paper is organized as follows. Section 2 gives a brief overview of the clustering algorithm. The Gd distribution and its estimator are presented in Section 3. Section 4 presents the application of the algorithm to PolSAR data. Section 5 discusses the numerical inaccuracies observed during parameter estimation, while Section 6 lists some conclusions and possible future work.
© VDE VERLAG GMBH ∙ Berlin ∙ Offenbach, Germany
2 Clustering Algorithm
The clustering algorithm is developed for multilook polarimetric data available in the form of polarimetric covariance matrices. It is also assumed that the scalar product model is valid and that the multilooking procedure is a simple box-car averaging of single-look scattering vectors. Currently, the equivalent number of looks (ENL) is estimated only once, from a homogeneous area in the image, and is used throughout the run of the clustering algorithm, although adaptive ENL estimation could be used instead.
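The ENL estimator itself is not specified here. One simple option, shown purely as an assumption, is the single-channel moment estimator ENL = mean² / variance of intensity over a homogeneous area. The Python sketch below (NumPy; hypothetical function names) also shows box-car multilooking of single-look intensities using non-overlapping windows, chosen for simplicity so that each output averages independent looks.

```python
import numpy as np

def block_multilook(intensity, looks):
    """Non-overlapping box-car average of `looks` consecutive single-look intensities."""
    n = intensity.size // looks
    return intensity[: n * looks].reshape(n, looks).mean(axis=1)

def enl_moments(intensity):
    """Moment-based ENL estimate: squared mean over variance of a homogeneous area."""
    return intensity.mean() ** 2 / intensity.var()

# Single-look intensity of fully developed speckle is exponentially distributed;
# averaging 6 independent looks should give an ENL estimate close to 6.
rng = np.random.default_rng(0)
single_look = rng.exponential(1.0, size=6 * 200_000)
multilook = block_multilook(single_look, 6)
```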
The clustering algorithm uses the method of multivariate fractional moments (MoMFM), recently proposed in [13], to estimate the texture shape parameters of the Gd distribution within the expectation maximization framework. It is useful to introduce the two-dimensional matrix log-cumulant (2D-MLC) diagram here. Figure 1 shows the 2D-MLC diagram, depicting the color-coded manifolds spanned by the theoretical MLCs of several matrix-variate compound distributions. The dimension of each manifold equals the number of texture parameters in the compound pdf. The two texture shape parameters of the Gd distribution only need to be estimated when the sample matrix log-cumulants fall within the Gd distribution manifold in the 2D-MLC diagram; in this case, MoMFM is used to estimate them. Outside this domain, only one texture parameter needs to be estimated, namely the texture parameter of the Kd or Gd0 distribution, depending on whether the sample matrix log-cumulants fall on the left or the right side of the Gd manifold, respectively. In this case, the corresponding method of matrix log-cumulants (MoMLC) is used for texture shape parameter estimation [14].
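The sample coordinates in the 2D-MLC diagram are the second and third cumulants of ln|C|. A minimal Python sketch, assuming NumPy/SciPy and hypothetical function names, computes them with unbiased k-statistics and compares them with the scaled complex Wishart reference values κν = Σ_{i=0}^{d−1} ψ^(ν−1)(L − i), where ψ^(m) is the polygamma function. A texture adds its own contribution to these values, which is what moves a cluster away from the Wishart point.

```python
import numpy as np
from scipy.special import polygamma
from scipy.stats import kstat

def sample_mlc(c):
    """Sample matrix log-cumulants (kappa_2, kappa_3): k-statistics of ln|C|."""
    _, logdet = np.linalg.slogdet(c)  # stacked (n, d, d) Hermitian matrices
    return kstat(logdet, 2), kstat(logdet, 3)

def wishart_mlc(looks, d):
    """Theoretical (kappa_2, kappa_3) of ln|C| for the scaled complex Wishart model."""
    orders = looks - np.arange(d)  # L, L-1, ..., L-d+1
    return polygamma(1, orders).sum(), polygamma(2, orders).sum()

# Pure Wishart data (no texture) should land on the Wishart reference point.
rng = np.random.default_rng(0)
n, L, d = 100_000, 6, 3
z = (rng.standard_normal((n, L, d)) + 1j * rng.standard_normal((n, L, d))) / np.sqrt(2)
c = np.einsum("nli,nlj->nij", z, z.conj()) / L
```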
The algorithm separates the image pixels into clusters based on the Gd distribution. It uses a modified version of the iterative expectation maximization algorithm detailed in [7], and contextual smoothing is achieved with an MRF approach that uses the Gd distribution to model the statistics of each image cluster and a Potts model for the spatial context.
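How per-cluster likelihoods and a Potts prior can be combined is sketched below in Python (NumPy). This is a generic illustration, not necessarily the update used in [7, 10]: the 4-neighbourhood, the interaction strength `beta`, and the function name `potts_posterior` are assumptions. Each pixel's log-posterior is its per-cluster log-likelihood plus `beta` times the number of neighbours currently carrying that label.

```python
import numpy as np

def potts_posterior(loglik, labels, beta):
    """Combine per-pixel cluster log-likelihoods with a Potts prior.

    loglik: (H, W, K) log-likelihood of each pixel under each cluster model.
    labels: (H, W) current integer labels in [0, K).
    beta:   Potts interaction strength (assumed value, tuned by the user).
    Returns (H, W, K) normalised posterior probabilities.
    """
    _, _, k = loglik.shape
    onehot = np.eye(k)[labels]                             # (H, W, K)
    padded = np.pad(onehot, ((1, 1), (1, 1), (0, 0)))      # zeros outside the image
    counts = (padded[:-2, 1:-1] + padded[2:, 1:-1]         # neighbours above and below
              + padded[1:-1, :-2] + padded[1:-1, 2:])      # neighbours left and right
    logpost = loglik + beta * counts
    logpost -= logpost.max(axis=2, keepdims=True)          # stabilise before exp
    post = np.exp(logpost)
    return post / post.sum(axis=2, keepdims=True)
```

With uninformative likelihoods, a pixel surrounded by label 0 is pulled towards label 0, more strongly in the interior (4 neighbours) than at a corner (2 neighbours).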
Goodness-of-fit (GoF) testing in the algorithm is performed using Pearson's chi-squared GoF test instead of the matrix log-cumulant based GoF procedures used in [11]. The primary reason for not using matrix log-cumulant based GoF testing is that the theoretical matrix log-cumulants of the Gd distribution do not have closed forms and can only be computed using numerical differentiation. Pearson's test compares the fitted model with the histogram of the determinants of the multilook polarimetric covariance matrices.
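A Pearson chi-squared test on a histogram can be sketched in Python as follows, assuming NumPy/SciPy. The binning, the degrees-of-freedom rule (bins − 1 − number of estimated parameters), and the generic `model_cdf` argument are assumptions for illustration; for the Gd model, the CDF of |C| would have to be supplied separately. The function name `pearson_gof` is hypothetical.

```python
import numpy as np
from scipy import stats

def pearson_gof(samples, edges, model_cdf, n_params):
    """Pearson chi-squared goodness-of-fit of a fitted model to a histogram.

    samples:   e.g. determinants |C| of the pixels in one cluster.
    edges:     increasing bin edges covering all samples (last edge may be np.inf).
    model_cdf: CDF of the fitted model, evaluated element-wise on the edges.
    n_params:  number of model parameters estimated from the same data.
    """
    samples = np.asarray(samples, dtype=float)
    edges = np.asarray(edges, dtype=float)
    n_bins = edges.size - 1
    idx = np.searchsorted(edges, samples, side="right") - 1  # bin of each sample
    observed = np.bincount(idx, minlength=n_bins)[:n_bins]
    expected = samples.size * np.diff(model_cdf(edges))      # model counts per bin
    statistic = np.sum((observed - expected) ** 2 / expected)
    dof = n_bins - 1 - n_params
    return statistic, stats.chi2.sf(statistic, dof)

# Deterministic check: evenly spaced quantiles of an exponential model fit it
# almost perfectly, while the same data strongly reject a mis-scaled model.
n = 10_000
x = stats.expon.ppf((np.arange(n) + 0.5) / n)
edges = np.r_[0.0, stats.expon.ppf(np.linspace(0.1, 0.9, 9)), np.inf]
```

A cluster would be flagged as a poor fit when the p-value falls below the chosen significance level (e.g., 0.05 for 95% confidence).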
The GoF of each cluster is used to automatically determine the number of significant clusters within the dataset. Poorly fitting clusters are split into two, and the EM algorithm is re-run to convergence. The algorithm stops when all clusters are considered good fits to the data histograms. Consistent initialisation is achieved by always starting with one cluster. This yields as many statistically distinct classes as allowed by the chosen underlying pdf, the number of data samples, and the chosen confidence level, e.g., 95%. The algorithm optionally includes adaptive sensitivity and sub-sampling, as explained in [7].
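The split-and-refit control flow can be sketched in Python on one-dimensional toy data. This is not the paper's implementation: hard-assignment k-means stands in for the Gd-based EM step, a hypothetical standard-deviation threshold `max_std` stands in for the chi-squared GoF decision, and splitting at mean ± one standard deviation is an assumption. What the sketch does share with the description above is the structure: start from one cluster, split every poorly fitting cluster in two, re-run to convergence, and stop when every cluster passes.

```python
import numpy as np

def kmeans_1d(x, means, iters=50):
    """Stand-in for the EM step: hard-assignment updates of 1-D cluster means."""
    means = np.asarray(means, dtype=float)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - means[None, :]), axis=1)
        means = np.array([x[labels == k].mean() if np.any(labels == k) else means[k]
                          for k in range(len(means))])
    return means, labels

def split_until_good(x, max_std=1.0, max_clusters=10):
    means, labels = kmeans_1d(x, [x.mean()])        # consistent start: one cluster
    while len(means) < max_clusters:
        stds = np.array([x[labels == k].std() for k in range(len(means))])
        bad = set(np.flatnonzero(stds > max_std).tolist())  # stand-in for failed GoF
        if not bad:
            break                                   # every cluster is a good fit
        new = [m for k, m in enumerate(means) if k not in bad]
        for k in sorted(bad):                       # split each poor fit in two
            new += [means[k] - stds[k], means[k] + stds[k]]
        means, labels = kmeans_1d(x, new)           # re-run to convergence
    return np.sort(means)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(m, 0.5, 1000) for m in (0.0, 4.0, 20.0)])
```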
3 The Gd Distribution
The multilook polarimetric Gd distribution was initially proposed in [3]. It has two texture shape parameters, α ∈ ℝ and ω > 0. When ω → 0+, it reduces to the Kd or Gd0 distribution if α is positive or negative, respectively. When |α| → ∞ or ω → ∞, it reduces to the Gaussian case, the sWd distribution. Its pdf is given by [5, 13]:
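$$
p_C(C; L, \Gamma, \alpha, \omega, \eta) = \frac{L^{Ld}\,|C|^{L-d}}{\Gamma_d(L)\,|\Gamma|^{L}\,\eta^{\alpha} K_{\alpha}(\omega)} \left( \frac{2L\operatorname{Tr}(\Gamma^{-1}C) + \omega\eta}{\omega/\eta} \right)^{\frac{\alpha - Ld}{2}} \times K_{\alpha - Ld}\!\left( \sqrt{\frac{\omega}{\eta}\left(2L\operatorname{Tr}(\Gamma^{-1}C) + \omega\eta\right)} \right) \qquad (1)
$$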
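where L is the number of looks, d is the dimension of C (the number of polarimetric channels), Γ is the normalized sample covariance matrix, η is the scale parameter, Γd(L) is the complex multivariate gamma function, and Kν(·) is the modified Bessel function of the second kind and order ν.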
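Equation (1) can be evaluated in the log domain, which keeps the product of very large and very small factors from overflowing. The following is a minimal Python sketch, assuming NumPy/SciPy, with `scipy.special.kve` used in place of the GSL routines mentioned earlier; function names are hypothetical. The complex multivariate gamma function is written out explicitly as Γd(L) = π^{d(d−1)/2} ∏_{i=0}^{d−1} Γ(L − i), since SciPy's `multigammaln` is the real-valued variant.

```python
import numpy as np
from scipy.special import gammaln, kve

def log_bessel_k(order, x):
    """Stable log K_order(x) via the exponentially scaled kve."""
    return np.log(kve(order, x)) - x

def log_multigamma_complex(L, d):
    """log Gamma_d(L) = (d(d-1)/2) log(pi) + sum_{i=0}^{d-1} log Gamma(L - i)."""
    return 0.5 * d * (d - 1) * np.log(np.pi) + sum(gammaln(L - i) for i in range(d))

def gd_logpdf(C, L, Gamma, alpha, omega, eta):
    """Log of the Gd pdf in (1) for one d x d Hermitian matrix C."""
    d = C.shape[0]
    _, logdet_c = np.linalg.slogdet(C)
    _, logdet_g = np.linalg.slogdet(Gamma)
    t = np.trace(np.linalg.solve(Gamma, C)).real  # Tr(Gamma^{-1} C)
    b = 2.0 * L * t + omega * eta                 # 2L Tr(Gamma^{-1} C) + omega*eta
    a = omega / eta
    nu = alpha - L * d
    return (L * d * np.log(L) + (L - d) * logdet_c
            - log_multigamma_complex(L, d) - L * logdet_g
            - alpha * np.log(eta) - log_bessel_k(alpha, omega)
            + 0.5 * nu * np.log(b / a) + log_bessel_k(nu, np.sqrt(a * b)))
```

A quick sanity check is the single-channel case d = 1, where the density of the scalar intensity should integrate to one for any valid (α, ω, η).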
Its MoMFM estimator can be derived from the following
equation:
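$$
E\{[\operatorname{Tr}(\Sigma^{-1}C)]^{\nu}\} = \frac{K_{\hat{\alpha}+\nu}(\hat{\omega})\,K_{\hat{\alpha}}^{\nu-1}(\hat{\omega})}{K_{\hat{\alpha}+1}^{\nu}(\hat{\omega})} \times \frac{\Gamma(Ld+\nu)}{L^{\nu}\,\Gamma(Ld)}, \qquad (2)
$$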
by simultaneously solving the two equations obtained with ν = 1/8 and ν = 1/4. Outside the Gd distribution manifold, ω → 0+; therefore, only α needs to be estimated, which can easily be done using the MoMLC estimators for the Kd and Gd0 distributions listed in [14]. The MoMLC estimator for the texture shape parameters of the Ud distribution is also listed in [14]. The performance analysis of these estimators on simulated PolSAR data can be found in [13, 14].
Figure 1: Manifolds of different models in the matrix log-cumulant diagram (axes κ3 and κ2; models Wis, K, G0, U/G, W, and M). The U and G models have the same manifold. The W and M models are currently considered invalid and are ignored.
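Solving the two fractional-moment equations of (2) can be sketched in Python, assuming NumPy/SciPy. The root finder (`scipy.optimize.fsolve`), the reparametrization ω = exp(s) to keep ω positive, and all function names are assumptions, since the text does not state how the system is solved. The texture factor in (2) equals E{τ^ν} for a generalized inverse Gaussian texture normalized to unit mean, and the speckle factor is the ν-th moment of a Gamma variable with shape Ld and scale 1/L; at ν = 1 the right-hand side reduces to d.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import gammaln, kve

NUS = (1 / 8, 1 / 4)  # fractional orders given in the text

def log_bessel_k(order, x):
    return np.log(kve(order, x)) - x  # stable log K_order(x)

def log_mpwf_moment(nu, alpha, omega, L, d):
    """Log of the right-hand side of (2): texture factor times speckle factor."""
    texture = (log_bessel_k(alpha + nu, omega) + (nu - 1) * log_bessel_k(alpha, omega)
               - nu * log_bessel_k(alpha + 1, omega))
    speckle = gammaln(L * d + nu) - nu * np.log(L) - gammaln(L * d)
    return texture + speckle

def sample_log_moments(mpwf, nus=NUS):
    """Log sample fractional moments of Tr(Sigma^{-1} C) for one cluster."""
    mpwf = np.asarray(mpwf, dtype=float)
    return [np.log(np.mean(mpwf ** nu)) for nu in nus]

def momfm_estimate(log_moments, L, d, x0=(1.0, 1.0), nus=NUS):
    """Solve the two moment equations for (alpha, omega); omega = exp(s) > 0."""
    def residual(p):
        alpha, s = p
        return [log_mpwf_moment(nu, alpha, np.exp(s), L, d) - m
                for nu, m in zip(nus, log_moments)]
    alpha, s = fsolve(residual, [x0[0], np.log(x0[1])])
    return alpha, np.exp(s)
```

In use, `sample_log_moments` would be applied to the values Tr(Σ̂⁻¹C) of the pixels in a cluster and the result passed to `momfm_estimate` together with L and d.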
4 Results
The clustering algorithm using the Gd distribution has been applied to both simulated and real PolSAR data. Selected results for each case are shown below.
4.1 Simulated PolSAR Data
Synthetic dual-pol PolSAR data of 250 × 250 pixels with six distinct Gd-distributed classes were generated using 5 looks. The parameters chosen for the simulated data were collected from real data samples. The results of the clustering algorithm with a sub-sampling of four, after MRF smoothing, are shown in Figure 2. The labeled classes show nearly perfect clustering performance. The algorithm took approximately 1 minute and 2 seconds to compute the shown results on an Intel quad-core 3.1 GHz processor with 8 GB RAM, using MATLAB. The fit of the G pdf to the six detected clusters is shown in Figure 3.
Figure 2: Clustering of simulated Gd data with 6 classes, sub-sampling = 4.
Figure 3: Fitting of estimated G pdf to cluster histograms of simulated data (one panel per cluster, titled with the estimated α, ω, and |Σ|^(1/d)).
Figure 4: (Top) False color Pauli RGB image. (Bottom) Clustering of quad-pol TerraSAR-X data with 6 looks and sub-sampling = 8.
Figure 5: Fitting of estimated G pdf to cluster histograms of real data (one panel per cluster, titled with the estimated α, ω, and |Σ|^(1/d)).
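Simulating Gd data as in Section 4.1 requires texture samples. A minimal Python sketch, assuming SciPy and the generalized inverse Gaussian (GIG) texture that underlies (1), is given below; the function names and the parameter values in the check are arbitrary, not those used for Figure 2. SciPy's `geninvgauss` has density x^{p−1} exp(−b(x + 1/x)/2) / (2K_p(b)); with p = α, b = ω and scale η = K_α(ω)/K_{α+1}(ω), the texture has unit mean, and its fractional moments match the texture factor of (2). Multiplying such samples by Wishart speckle, as in the product model, yields Gd-distributed matrices.

```python
import numpy as np
from scipy.special import kve
from scipy.stats import geninvgauss

def unit_mean_gig_texture(alpha, omega, size, seed=None):
    """Draw GIG texture samples tau with E[tau] = 1.

    eta = K_alpha(omega) / K_{alpha+1}(omega); the exp(omega) factors in kve cancel.
    """
    eta = kve(alpha, omega) / kve(alpha + 1, omega)
    return geninvgauss.rvs(alpha, omega, scale=eta, size=size, random_state=seed)

def texture_fractional_moment(nu, alpha, omega):
    """E[tau^nu] for the unit-mean GIG texture (texture factor of (2))."""
    return (kve(alpha + nu, omega) * kve(alpha, omega) ** (nu - 1)
            / kve(alpha + 1, omega) ** nu)

tau = unit_mean_gig_texture(3.0, 2.0, 200_000, seed=0)
```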
4.2 Real PolSAR Data
The algorithm is also applied to quad-pol TerraSAR-X data of 300 × 400 pixels with an ENL of six, using a sub-sampling of eight. The results are shown in Figure 4, where they can be compared with the corresponding Pauli-decomposed false color image. Nine distinct classes were found, with the first class containing only one pixel. A comparison of the clustering results with the Pauli-decomposed image shows visually acceptable performance. The algorithm took 5 minutes and 11 seconds to compute the shown results on the same computing platform. The fit of the G pdf to the nine detected clusters is shown in Figure 5.
5 Numerical Inaccuracy of Parameter Estimation
It has been observed experimentally that parameter estimation using multivariate fractional moments has slight numerical inaccuracies, which are accentuated on real PolSAR data. The authors agree that, for practical purposes, the fast computation time of the MoMFM estimators outweighs their slight numerical inaccuracy. In the proposed version of the algorithm, this effect has been mitigated by using sub-sampling and by limiting the maximum sample size to 10,000 pixels. This reduces the sensitivity of the GoF test enough to offset the small estimation inaccuracy. However, improving the accuracy of these estimators is a suitable subject for future research, as the fast computation time is highly desirable.
6 Conclusions and Future Work
A fast unsupervised clustering algorithm for multilook PolSAR data has been proposed using the flexible G distribution for the first time. The results on simulated and real PolSAR data look very promising. The computation has been found to be fast and the software implementation straightforward. The only drawback is the numerical inaccuracy during parameter estimation, which will form a topic of further investigation.
7 Acknowledgments
This work has been funded by the EC FP7 project Demining ToolBOX (D-BOX), grant agreement no. 284996, and the TerraSAR-X dataset has been provided by DLR.
References
[1] C. Oliver and S. Quegan, Understanding Synthetic Aperture Radar Images, 2nd ed. Raleigh, NC: SciTech Publishing, 2004.
[2] J. Lee, D. Schuler, R. Lang, and K. Ranson, "K-distribution for multi-look processed polarimetric SAR imagery," in Proc. IGARSS, vol. 4, Pasadena, CA, Aug. 1994, pp. 2179–2181.
[3] C. Freitas, A. Frery, and A. Correia, "The polarimetric G distribution for SAR data analysis," Environmetrics, vol. 16, no. 1, pp. 13–31, Feb. 2005.
[4] L. Bombrun, G. Vasile, M. Gay, and F. Totir, "Hierarchical segmentation of polarimetric SAR images using heterogeneous clutter models," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 2, pp. 726–737, Feb. 2011.
[5] S. Khan and R. Guida, "Application of Mellin-kind statistics to polarimetric G distribution for SAR data," IEEE Trans. Geosci. Remote Sens., vol. PP, no. 99, pp. 1–16, 2013.
[6] J.-M. Beaulieu and R. Touzi, "Segmentation of textured polarimetric SAR scenes by likelihood approximation," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 10, pp. 2063–2072, Oct. 2004.
[7] A. Doulgeris, S. Anfinsen, and T. Eltoft, "Automated non-Gaussian clustering of polarimetric synthetic aperture radar images," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3665–3676, Oct. 2011.
[8] A. C. Frery, J. Jacobo-Berlles, J. Gambini, and M. Mejail, "Polarimetric SAR image segmentation with B-splines and a new statistical model," Multidimensional Syst. Signal Process., vol. 21, no. 4, pp. 319–342, Dec. 2010.
[9] O. Harant, L. Bombrun, M. Gay, R. Fallourd, E. Trouvé, and F. Tupin, "Segmentation and classification of polarimetric SAR data based on the KummerU distribution," in Proc. PolInSAR, Frascati, Italy, 2011.
[10] V. Akbari, A. P. Doulgeris, G. Moser, T. Eltoft, S. N. Anfinsen, and S. B. Serpico, "A textural-contextual model for unsupervised segmentation of multipolarization synthetic aperture radar images," IEEE Trans. Geosci. Remote Sens., no. 99, pp. 1–12, 2012.
[11] A. Doulgeris, V. Akbari, and T. Eltoft, "Automatic PolSAR segmentation with the U-distribution and Markov Random Fields," in Proc. EUSAR, Nuremberg, Germany, Apr. 2012, pp. 183–186.
[12] B. Gough, GNU Scientific Library Reference Manual, 3rd ed. Network Theory Ltd., 2009.
[13] S. Khan and R. Guida, "On fractional moments of multilook polarimetric whitening filter for polarimetric SAR data," IEEE Trans. Geosci. Remote Sens., vol. PP, no. 99, pp. 1–11, 2013.
[14] S. Anfinsen and T. Eltoft, "Application of the matrix-variate Mellin transform to analysis of polarimetric radar images," IEEE Trans. Geosci. Remote Sens., vol. 49, no. 6, pp. 2281–2295, Jun. 2011.