A GRAPH THEORETIC APPROACH TO ANOMALY DETECTION IN HYPERSPECTRAL
IMAGERY
D. W. Messinger, J. Albano
Chester F. Carlson Center for Imaging Science
Rochester Institute of Technology
54 Lomb Memorial Blvd., Rochester, NY 14623
ABSTRACT
For many applications in spectral image analysis, the quantitative model used to describe the data is based on first and
second order statistics, linear mixture models (i.e., the convex hull), and / or linear subspaces. An example of this for
anomaly detection is the well known RX algorithm, a statistical measure of the anomalousness of individual pixels when
compared to the mean and covariance of the image. While
these models perform well for several applications, their simple
assumptions are not necessarily well met as sensor resolution
improves and complex clutter in the image has a stronger
impact on the result. Here, we propose a
novel approach to spectral image processing that instead models the data using the concept of a high dimensional graph.
The pixels are considered in the spectral space as the nodes
(vertices) of the graph and edges are created connecting them
if two nodes satisfy some similarity criterion. Given this spectral graph, there are several metrics that can be computed related to the overall graph connectivity, as well as the connectivity of individual pixels to the graph. This latter metric is
used here as the anomaly detection metric. A hyperspectral
image is tiled (thus providing a computational advantage in
addition to a spatially adaptive background model) and the
graph computed per tile. Each pixel in the tile is then assigned an anomalousness score based on its weighted vertex
volume, or connectivity, to the graph. Results are presented
for a reflective hyperspectral image with known targets and
are shown to be comparable to the RX algorithm.
Index Terms— hyperspectral, anomaly detection, graph
theory
1. INTRODUCTION
Quantitative analysis of hyperspectral imagery requires the
development of mathematical models of the data. Traditional
models include first and second order statistics, linear subspaces, and the linear mixture model[1, 2, 3]. These models
allow for per-pixel decisions to be made regarding the class
to which the pixel belongs, the likelihood that the pixel is a
target of interest, or the likelihood that the pixel has changed,
for example. These data models impose assumptions on the
data, such as multivariate normality and linearity, that are
increasingly difficult to meet as hyperspectral
sensors improve in spatial resolution. While higher spatial
resolution allows for improved object recognition through spatial
patterns and more “full pixel” targets, it also increases the spectral
variability with which the world is sampled. This changes the
nature of the image through reduced spatial averaging, and
typically results in a stronger impact of the image “clutter”
on algorithm performance.
Anomaly detection is another such application that has
been widely studied. The traditional RX algorithm[4, 3] uses
the image mean and covariance as a statistical model of the
image and attempts to identify pixels that are statistically “different” from the background, and thus anomalous. There have
been several updates to the RX algorithm. The Kernel-RX
algorithm[5] utilizes a kernel mapping function to improve
separability between the background and the anomalies. The
Subspace-RX algorithm[6] uses a subspace approach with a
similar goal. However, both of these algorithms use the same
essential model of the data. The Topological Anomaly Detection
(TAD) algorithm[7, 8] develops a different model of the data and
has been shown to perform as well as, if not better than, the RX
algorithm. In TAD, the data
are modeled using a topological construction - the simplicial
complex. The data are collected into background components
based on a pixel spectral similarity calculation, and anomalies are identified as being “outside” the background complex.
Their measure of anomalousness is the codensity - essentially
the distance to the outer “shell” of the nearest background
component. A similar approach to unsupervised spectral image clustering, based on the graph modularity, has also been
demonstrated[9]. Here, the application is different, but the
approach is similar in that a graph of the data in the spectral
space is created and edges are selectively “cut” to produce
clusters at various levels of detail.
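For reference, the RX statistic discussed at the start of this paragraph is commonly written as a Mahalanobis distance of a pixel from the image mean. The symbols below are notation introduced here for illustration and do not appear elsewhere in this paper: $\mathbf{x}$ is a pixel spectrum, $\boldsymbol{\mu}$ is the image mean, and $\boldsymbol{\Sigma}$ is the image covariance.

```latex
\delta_{RX}(\mathbf{x}) = (\mathbf{x} - \boldsymbol{\mu})^{T} \, \boldsymbol{\Sigma}^{-1} \, (\mathbf{x} - \boldsymbol{\mu})
```

Large values indicate pixels that are statistically far from the background under the mean and covariance model.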
The anomaly detection approach presented here is similar in nature to the TAD algorithm, with some modifications.
Here, we develop a connected graph of the pixels in the spectral
space (described below) and measure the connectivity of
individual pixels to the graph. The hypothesis is that “background” pixels will sit in the large clusters and will be very
highly connected, while anomalous pixels will be less well
connected to the graph. The input image is spatially tiled
and the background graph is computed per tile. This provides
two benefits: a computational advantage and a spatially
adaptive background model.
This paper is organized in the following way. Section 2
presents the algorithm theory and implementation. Section 3 describes the experiment used to test the algorithm
and presents the results. We conclude with a summary in
Section 4.
2. METHODOLOGY
As stated above, traditional quantitative algorithms for use in
spectral imagery model the data using first and second order
statistics, linear mixture models (i.e., the convex hull model),
or linear subspaces[1, 2, 3]. These models can be very useful
in many cases but the assumptions behind them are not necessarily well-met, particularly in high resolution hyperspectral
imagery. An alternate approach presented here is to model the
data using the concept of a high dimensional graph[10, 11].
In this model, individual pixels, when considered in the spectral space, are the nodes or vertices in the graph, $G$. Two
nodes are connected with an edge if they satisfy some similarity criterion. Here, we use spectral Euclidean distance. The
graph can then be described as a vertex set, $V_G$, and an edge
set, $E_G$. Once the graph is created (described below), the degree of each node $u$, $d_u$, is the number of edges to which it is
connected. For each edge between nodes $u$ and $v$, the weight
of the edge, $w_{u,v}$, is the length of the edge. Figure 1 shows a
notional simple graph of points in 2D, including an anomaly.
Fig. 1. Notional anomaly in a graph theory context. Circles
are pixels in the spectral space and the red point is anomalous.
Edges are shown as straight lines. Graph was constructed with
3 nearest neighbors.
Given a set of pixels, the graph that describes those pixels
is constructed. Once the graph is constructed, several metrics
can be computed from the graph[12] that describe the connectivity of the graph. Some metrics are evaluated based on the
entire graph, while others are computed per pixel. The metrics presented here are:
Vertex Volume:
$$\mathrm{Vol}_{vv}(G) = \sum_{u \in V_G} d_u \tag{1}$$
Edge Volume:
$$\mathrm{Vol}_{ev}(G) = \sum_{\{u,v\} \in E_G} w_{u,v} \tag{2}$$
Normalized Edge Volume:
$$\mathrm{Vol}_{nev}(G) = \frac{\mathrm{Vol}_{ev}}{\mathrm{Vol}_{vv}} = \frac{\sum_{\{u,v\} \in E_G} w_{u,v}}{\sum_{u \in V_G} d_u} \tag{3}$$
Weighted Vertex Volume:
$$\mathrm{Vol}_{wvv}(u) = \frac{d_u}{\sum_{v \in N(u)} w_{u,v}}, \quad \forall u \in V_G \tag{4}$$
The Vertex Volume, Edge Volume, and Normalized Edge Volume are computed for a graph and describe the overall connectivity of an entire graph. The Weighted Vertex Volume is
computed per pixel and is a measure of the connectivity of a
given vertex to the rest of the graph through its neighborhood
N (u) (i.e., those pixels connected to the pixel under test with
an edge). These metrics can be used for various applications
(e.g., change detection[13] and large area search). Here, the
Weighted Vertex Volume (WVV) is used as a per pixel measure of the anomalousness of each pixel in the image.
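As a concrete illustration of the four metrics, the following minimal Python sketch evaluates them on a toy graph. The edge-list representation, the function name `graph_metrics`, and the example graph are assumptions made for illustration; they are not taken from the implementation used in this work.

```python
from collections import defaultdict

def graph_metrics(edges):
    """Compute the four volume metrics for an undirected weighted graph.

    edges: list of (u, v, w) tuples, one per undirected edge, where w is
    the edge length (hypothetical representation chosen for this sketch).
    """
    degree = defaultdict(int)      # d_u: number of edges incident to u
    incident = defaultdict(float)  # sum of w_{u,v} over the neighborhood N(u)
    for u, v, w in edges:
        for node in (u, v):
            degree[node] += 1
            incident[node] += w

    vol_vv = sum(degree.values())          # Eq. (1): vertex volume
    vol_ev = sum(w for _, _, w in edges)   # Eq. (2): edge volume
    vol_nev = vol_ev / vol_vv              # Eq. (3): normalized edge volume
    # Eq. (4): weighted vertex volume, one value per vertex
    vol_wvv = {u: degree[u] / incident[u] for u in degree}
    return vol_vv, vol_ev, vol_nev, vol_wvv

# Toy graph: a tight triangle (a, b, c) plus a distant vertex z tied to a.
edges = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0), ("a", "z", 5.0)]
vol_vv, vol_ev, vol_nev, vol_wvv = graph_metrics(edges)
least_connected = min(vol_wvv, key=vol_wvv.get)
```

In this toy case the isolated vertex `z` has one long edge, so it receives the lowest Weighted Vertex Volume, which is the behavior the anomaly detector relies on.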
The algorithm proceeds as shown in Figure 2. An input
image is sectioned into tiles. For each tile, the graph of the
data in the spectral space is constructed. While there are several considerations to take into account when constructing this
graph (see Mercovich et al.[14], in these proceedings), here we use a
nearest neighbor approach. This is important because, for this application, we require the graph to be connected in the sense that each
vertex has at least one edge connecting it to another vertex.
For each pixel in the set, the 20 nearest neighbors are identified using the ATRIA[15] algorithm, and the distances to
those neighbors are computed. The number of nearest neighbors
essentially limits the size of the anomalies (i.e., a cluster with
more than 20 pixels will look like background here). For each
pixel, we then know both the degree of the pixel vertex ($d_u$) and the lengths of all of the edges associated with it
($w_{u,v}$). Thus, for each pixel we can compute its WVV using
equation 4. Referring back to Figure 1, an anomalous pixel will
have a low degree and the edges connecting it to the rest of
the graph will be long. The WVV for anomalies will then be
a low value; consequently, we take the inverse of the WVV as
a measure of anomalousness.
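The per-tile procedure described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the IDL / ENVI implementation: a brute-force neighbor search stands in for ATRIA, an edge is assumed to exist when either endpoint lists the other among its k nearest neighbors, partial tiles at the image border are kept, and the names `iter_tiles`, `knn_edges`, and `anomaly_scores` are hypothetical.

```python
import math

def iter_tiles(rows, cols, size=30):
    """Yield (row_slice, col_slice) pairs covering the image with square tiles.
    Border tiles may be smaller than size x size (an assumption)."""
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            yield slice(r, min(r + size, rows)), slice(c, min(c + size, cols))

def knn_edges(pixels, k):
    """Build an undirected k-nearest-neighbor graph over a list of spectra.

    Brute-force search replaces ATRIA here. Returns {(u, v): length}, u < v,
    where length is the spectral Euclidean distance.
    """
    edges = {}
    for u, pu in enumerate(pixels):
        dists = sorted(
            (math.dist(pu, pv), v) for v, pv in enumerate(pixels) if v != u
        )
        for w, v in dists[:k]:
            edges[(min(u, v), max(u, v))] = w
    return edges

def anomaly_scores(pixels, k=20):
    """Inverse WVV per pixel: summed edge length divided by degree.
    Requires at least two pixels so that every vertex has an edge."""
    degree = [0] * len(pixels)
    length = [0.0] * len(pixels)
    for (u, v), w in knn_edges(pixels, k).items():
        for node in (u, v):
            degree[node] += 1
            length[node] += w
    return [length[u] / degree[u] for u in range(len(pixels))]

# Toy "tile": a compact 5 x 5 cluster in a 2-band space plus one outlier.
tile = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)] + [(3.0, 3.0)]
scores = anomaly_scores(tile, k=3)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

The outlier (index 25) has only the edges it created itself, all of them long, so it receives the largest inverse WVV. In a full run, `anomaly_scores` would be applied to the spectra of each tile produced by `iter_tiles` and the scores written back to the corresponding map locations.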
Note that this is similar to the metric computed using
the Topological Anomaly Detection (TAD) algorithm[7, 8].
However, in the TAD algorithm, background components are
identified on a graph developed by sampling the data from
the entire image, and the measure of anomalousness is the
co-density between the anomalous pixels and the nearest
background pixels. Here, we use a similar metric (summation
of the edge lengths), but the WVV also takes into account
the degree of each pixel (i.e., how many connections it has)
and assigns an anomalousness score to every pixel. Additionally, due to the tiled nature of the algorithm, the “background”
graph against which pixels are compared is spatially adaptive.
Fig. 2. Algorithm scheme for anomaly detection. The input
image is tiled and a graph is computed for each tile. Then,
the anomalousness of each pixel in the tile is estimated and a
map is produced.
3. EXPERIMENTAL RESULTS
The algorithm described above was implemented into the IDL
/ ENVI spectral image processing package and was tested
against the Forest Radiance hyperspectral image shown in
Figure 3[3]. This well-studied scene contains a large grass
field, a road, and a treeline. There are several targets of various materials and sizes placed in the scene in regular increments providing a particularly useful anomaly detection
scene. The image was sectioned into tiles that were 30 × 30
pixels in size, and processed using the algorithm described
above. Additionally, the image was processed using the RX
algorithm for comparison.
Fig. 3. Hyperspectral image used to test the anomaly detection scheme.
Fig. 4. Anomaly detection results. (a) RX algorithm, linearly
stretched (b) RX, thresholded at top 1% (c) WVV, linearly
stretched (d) WVV, thresholded at top 1%.
The results are shown in Figure 4. Figures 4(a)
& 4(b) show the results from processing the image with the
RX anomaly detection scheme while figures 4(c) & 4(d) show
the WVV results. In each case, the full range of the result is
shown (linearly scaled) as well as a thresholded binary map
indicating the top 1% of the detections. Both methods
detect the majority of the targets with a low false alarm rate,
although it is evident that they produce false alarms on different phenomena in the scene. The RX algorithm is sensitive
to noise in the scene (evident as stripes down the left side of
the result) while the WVV detection is more sensitive to point
anomalies in the forest and to some bright pixels near the bottom of the image. The WVV result likely suppresses the striping
noise because enough noisy pixels occupy the
same portion of the spectral space that they are well connected
to each other in the spectral graph. In the fully scaled results,
the separation between the targets and the background is visually more enhanced in the WVV detection plane (i.e., the
background pixels are darker relative to the targets), but a few
of the targets are obviously not in the top 1% of detections, as
evidenced by the thresholded image.
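The thresholded maps in Figure 4 retain the top 1% of detection scores. A minimal Python sketch of such a threshold is given below; the function name `top_fraction_mask`, the rounding rule, and the handling of ties are assumptions made for illustration and may differ from how the maps in Figure 4 were produced.

```python
def top_fraction_mask(scores, fraction=0.01):
    """Return a boolean list flagging the highest-scoring pixels.

    At least one pixel is always kept; pixels tied with the cutoff value
    are all flagged, so slightly more than `fraction` may be returned.
    """
    n_keep = max(1, round(len(scores) * fraction))
    cutoff = sorted(scores, reverse=True)[n_keep - 1]
    return [s >= cutoff for s in scores]

# Example: 200 distinct scores, so the top 1% corresponds to two pixels.
example_scores = [float(i) for i in range(200)]
mask = top_fraction_mask(example_scores)
```

The same fraction would be applied to both the RX and the WVV score maps so that the two binary maps contain a comparable number of detections.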
Although the mathematical model being used and the metric
being computed are dramatically different from the statistical
distance that is the basis of the RX algorithm, the results from
the WVV anomaly detection method are comparable to the RX
results for this case. It is anticipated
that this method will outperform RX in more complicated scenarios (e.g., those with more spatial / spectral clutter) where
the assumptions of multivariate normality of the data are less
well met and the RX algorithm performs poorly.
4. SUMMARY
This paper presents a novel mathematical model for use in
developing algorithms for analysis of hyperspectral imagery.
Instead of the traditional models, such as statistics, linear subspaces, and the linear mixture model, the graph theory approach here seeks to impose as few assumptions on the data
as possible. The application described here is anomaly detection, but others are under development. Here, a tiled approach
is implemented. For each tile, a connected graph is computed
on the pixels in the spectral space. Given this graph per tile,
the connectivity of each pixel (in the tile) to this graph is estimated through calculation of the Weighted Vertex Volume
(WVV) for that pixel. This metric accounts for not only the
number of connections (edges) between the pixel under test
and the graph, but also the length of those edges, thus identifying
as anomalous those pixels that are both isolated and far away from
the background components in the image tile. The results
are demonstrated against an airborne reflective hyperspectral
image and compared to RX. While the results are not dramatically better than RX, the test here is a relatively simple one
of a fairly uniform, low clutter background with targets of interest in the scene. However, the demonstration that this new
technique, based on a new mathematical formulation of the
data, performs as well as the standard algorithm is encouraging. It is anticipated that the algorithm will outperform RX in
situations where the multivariate normality assumptions behind RX are less well met.
5. REFERENCES
[1] D. Landgrebe, “Hyperspectral image data analysis,”
Signal Processing Magazine, IEEE, vol. 19, no. 1, pp.
17–28, Jan. 2002.
[2] John R. Schott, Remote Sensing: the Image Chain Approach, Oxford University Press, 2nd edition, May 2007.
[3] Dimitris Manolakis, David Marden, and Gary A. Shaw,
“Hyperspectral image processing for automatic target
detection applications,” MIT Lincoln Laboratory Journal, vol. 14, no. 1, pp. 79–116, 2003.
[4] I.S. Reed and X. Yu, “Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution,” IEEE Trans. Acoust., Speech, & Sig. Proc.,
vol. 38, no. 10, pp. 1760–1770, 1990.
[5] H. Kwon and N.M. Nasrabadi, “Kernel RX-algorithm:
A nonlinear anomaly detector for hyperspectral imagery,” IEEE Transactions on Geoscience and Remote
Sensing, vol. 43, no. 2, pp. 388–397, Feb. 2005.
[6] A. Schaum, “Hyperspectral anomaly detection: Beyond
RX,” in Algorithms and Technologies for Multispectral,
Hyperspectral, and Ultraspectral Imagery XIII, S. Shen,
Ed. SPIE, April 2007, vol. 6565.
[7] B. Basener, E. Ientilucci, and D.W. Messinger,
“Anomaly detection using topology,” in Algorithms and
Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XIII, S. Shen, Ed. SPIE, April 2007,
vol. 6565.
[8] Bill Basener and David W. Messinger, “Enhanced detection and visualization of anomalies in spectral imagery,”
in Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XV, Orlando,
Florida, United States, April 2009, SPIE, vol. 7334.
[9] Ryan A. Mercovich, Anthony A. Harkin, and Dave W.
Messinger, “Automatic clustering of multispectral imagery by maximization of the graph modularity,” in
Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XVII, S. Shen, Ed.
SPIE, April 2011, vol. 8048.
[10] Douglas B. West, Introduction to Graph Theory, Prentice Hall, 2nd edition, September 2000.
[11] Jonathan L. Gross and Jay Yellen, Graph Theory and
Its Applications, Discrete Mathematics and Its Applications. Chapman and Hall, 2nd edition, 2006.
[12] Fan R. K. Chung, Spectral Graph Theory, Regional
Conference Series in Mathematics no. 92. American
Mathematical Society, December 1996.
[13] J. Albano, D. W. Messinger, A. Schlamm, and
B. Basener, “Graph theoretic metrics for spectral imagery with application to change detection,” in Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XVII, S. Shen, Ed. SPIE,
April 2011, vol. 8048.
[14] R. Mercovich, J. Albano, and D. W. Messinger, “Techniques for the graph representation of spectral imagery,”
submitted to WHISPERS 2011. IEEE, June 2011.
[15] C. Merkwirth, U. Parlitz, and W. Lauterborn, “Fast
nearest-neighbor searching for nonlinear signal processing,” Phys. Rev. E, vol. 62, no. 2, pp. 2089–2097, 2000.