Modeling Spatial Context in Classification/Prediction
Using MRF and SAR Techniques
Shashi Shekhar
Weili Wu
Sanjay Chawla
Ranga Raju Vatsavai
Department of Computer Science
Army HPC Research Center
University of Minnesota
Outline
- Introduction
- Problem Definition
- Supervised Classification
- Spatial Context
- Markov Random Fields (MRF)
- Spatial Autoregression (SAR)
- Results
Introduction
- Spatial Databases
  - Maps, ground observations, multi-spectral/multi-temporal remote sensing images
  - e.g. Ecology (wetlands), forest inventory, aerial photographs, and satellite remote sensing images
- Objectives
  - Predict the spatial distribution of marsh-breeding birds
  - Thematic classification: identification of objects in imagery
- Techniques
  - Supervised, unsupervised
  - Statistical, neural
  - Knowledge based
Example Datasets
[Figure slides: example datasets]
Problem Definition
- Given
  - A spatial framework S consisting of {s1, …, sn} for an underlying geographic space G
  - A collection of explanatory functions fXk : S -> Rk, k = 1, …, K, where Rk is the range
  - A dependent class variable fL : S -> L = {l1, …, lM}
  - A value for the parameter α, the relative importance of spatial accuracy
- Find
  - Classification model: f^L : R1 × … × RK -> L
- Objective
  - Maximize similarity = (1 − α)·classification_acc(f^L, fL) + α·spatial_acc(f^L, fL)
- Constraints
  - S is a multi-dimensional Euclidean space
  - The explanatory functions and the response function may not be independent, i.e. spatial autocorrelation exists
- Accuracy Measure
  - ADNP (Average Distance to Nearest Prediction; sketched below)
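For concreteness, here is a minimal Python sketch of the ADNP spatial-accuracy measure, assuming the usual definition: average, over the locations where a class actually occurs, the distance to the nearest location predicted for that class. The function name and array layout are illustrative, not from the slides.

import numpy as np

def adnp(actual_locs, predicted_locs):
    # For each actual site, find the Euclidean distance to the nearest
    # predicted site, then average; 0 means perfect spatial overlap.
    # actual_locs, predicted_locs: (n, 2) arrays of (row, col) coordinates.
    dists = []
    for a in actual_locs:
        d = np.sqrt(((predicted_locs - a) ** 2).sum(axis=1))
        dists.append(d.min())
    return float(np.mean(dists))

# Toy example: predictions shifted roughly one pixel from the actual sites
actual = np.array([[0, 0], [0, 1], [5, 5]])
predicted = np.array([[0, 1], [1, 1], [5, 6]])
print(adnp(actual, predicted))  # average nearest-prediction distance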
Problem Definition
- Given
  - A multi-spectral image X composed of N-dimensional pixels: X(i, j) = [bv1, …, bvN]'
- Find
  - The appropriate label λi, i ∈ {1, …, M}
Classical Techniques
- Logistic Regression (see the sketch below)
  - y = Xβ + ε
  - Assumptions: the errors ε are independent, identically distributed, zero-mean, and normal, i.e. εi ~ N(0, σ²)
- Bayesian Classification
  - Pr(li | X) = Pr(X | li) Pr(li) / Pr(X)
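A minimal sketch of the logistic-regression baseline, assuming scikit-learn as the fitting library (the slides do not prescribe one); the two-class spectral training data is invented for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: each row is a pixel's spectral feature vector,
# each entry of y is its class label (non-spatial, i.i.d. assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)),   # class 0 pixels
               rng.normal(2.0, 1.0, (50, 3))])  # class 1 pixels
y = np.array([0] * 50 + [1] * 50)

model = LogisticRegression().fit(X, y)   # estimates the coefficients beta
print(model.coef_, model.intercept_)     # fitted beta
print(model.predict_proba(X[:3]))        # Pr(li | X) for the first pixels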
Classification Schemes
Pixel Classification
- The class to which the pixel at location x belongs is determined by the conditional probability p(ωi | x), i ∈ {1, …, M}
- Decision Rule
  - x ∈ ωi iff p(ωi | x) ≥ p(ωj | x) ∀ j ≠ i
- How to compute p(ωi | x)? Bayes' theorem:
  p(ωi | x) = p(x | ωi) p(ωi) / p(x)
  where p(x), the probability of finding a pixel from any class at location x, is
  p(x) = Σ (i = 1 to M) p(x | ωi) p(ωi)
Classification
- p(x | ωi) is the class-conditional distribution and can be estimated from the training data
- p(ωi): prior probability distribution
- Modeling both terms is important for accurate classification
- Decision Rule:
  x ∈ ωi iff p(x | ωi) p(ωi) ≥ p(x | ωj) p(ωj) ∀ j ≠ i
- Simplified, with the discriminant gi(x) = ln p(x | ωi) + ln p(ωi):
  x ∈ ωi iff gi(x) ≥ gj(x) ∀ j ≠ i
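A minimal sketch of this decision rule, assuming Gaussian class-conditional densities; the means, covariances, and priors are illustrative values, not estimates from any real training set.

import numpy as np
from scipy.stats import multivariate_normal

# Per-class statistics that would normally be estimated from training data
means  = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]  # m_i
covs   = [np.eye(2), np.eye(2)]                         # Sigma_i
priors = [0.6, 0.4]                                     # p(w_i)

def classify(x):
    # g_i(x) = ln p(x | w_i) + ln p(w_i); assign x to the maximizing class
    g = [multivariate_normal.logpdf(x, m, c) + np.log(p)
         for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(g))

print(classify(np.array([1.8, 2.1])))  # -> 1 (closer to the class-1 mean)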
Comparison

Criteria              Logistic Regression               Bayesian
Input                 fx1, …, fxk, fl                   fx1, …, fxk, fl
Intermediate results  β                                 Pr(li), Pr(X | li)
Output                Pr(li | X) based on β             Pr(li | X) based on Pr(li), Pr(X | li)
Decision              select most likely class          select most likely class
                      for a given feature value         for a given feature value
Assumptions:
  Pr(X | li)          exponential family                —
  Class boundary      linearly separable                —
  Autocorrelation     none                              none
Spatial Context
- What is spatial context?
  - Observations at location i depend on observations at locations j ≠ i
  - Formally, yi = f(yj), j = 1, 2, …, n, j ≠ i
- Why?
  - Natural objects (species) occur together
  - Higher sensor resolution (i.e., the object is bigger than a pixel)
Prior Distribution Model
- For a Markov random field ω, the conditional distribution of a point in the field given all other points depends only on its neighbors:
  p{ω(s) | ω(S − s)} = p{ω(s) | ω(δs)}
  where S is the image lattice, S − s denotes the set of points in S excluding s, and δs denotes the neighbors of s
[Diagrams: first-order (4-neighbor), second-order (8-neighbor), and third-order (12-neighbor) neighborhood systems around a site s]
MRF Continued
- A clique is defined as a subset of points in S such that if s and r are two points contained in clique C, then s and r are neighbors
Gibbs Distribution
- For a given neighborhood system, a Gibbs distribution is defined as any distribution of ω(s) that can be expressed as
  p(ω) = (1/Z) e^(−U(ω)/T)
- The Markov property gives
  p(ω(i, j) | ω(k, l); {k, l} ≠ {i, j}) = p(ω(i, j) | ω(k, l); {k, l} ∈ δs)
- Maximizing p(ω(i, j) | ω(k, l)) is equivalent to minimizing U(ω)
- U(ω) = Σ (c ∈ C) Vc(ω), where Vc is an arbitrary function of ω on the clique c and C is the set of all cliques
Hammersley-Clifford Theorem
- The H-C theorem states that ω is an MRF with a strictly positive distribution and a neighborhood system {δs : s ∈ S} iff the distribution of ω can be written as a Gibbs distribution with cliques induced by the neighborhood system
- Therefore, if p(ω) is formulated as a Gibbs distribution, ω has the properties of an MRF
Gibbs Distribution
- For a first-order neighborhood system,
  p(ω) = (1/Z) e^(−β Σc tc(ω))    (eq. 1)
  where tc(ω) counts the horizontally and vertically neighboring points of different value in ω in clique c:
  tc(ω) = 1 if ω(i, j) ≠ ω(k, l), 0 otherwise
- eq. 1 is a Gibbs distribution, and ω is therefore an MRF
- β is an empirically determined weight
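To make eq. 1 concrete, here is a small sketch of the energy U(ω) = β Σc tc(ω) under the first-order pair-clique system; counting each neighboring pair once is an assumption about the convention.

import numpy as np

def gibbs_energy(labels, beta):
    # U(omega) = beta * (number of horizontally/vertically adjacent
    # pixel pairs with different labels), each pair counted once.
    horiz = np.sum(labels[:, :-1] != labels[:, 1:])  # left-right cliques
    vert  = np.sum(labels[:-1, :] != labels[1:, :])  # up-down cliques
    return beta * (horiz + vert)

labels = np.array([[0, 0, 1],
                   [0, 1, 1]])
print(gibbs_energy(labels, beta=1.5))  # 3 disagreeing pairs -> 4.5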
ICM
1. Compute p(x | ωi)
2. For all pixels (i, j), update the label ω(i, j)
3. Repeat step 2 until convergence
- Assuming a multivariate normal distribution:
  p(x | ωi) = (2π)^(−N/2) |Σi|^(−1/2) exp{ −(1/2) (x − mi)ᵀ Σi⁻¹ (x − mi) }
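A compact sketch of the ICM loop for a one-band, two-class image, combining a Gaussian likelihood with the first-order Gibbs prior; the parameter values and the fixed iteration count are illustrative assumptions, not the authors' settings.

import numpy as np
from scipy.stats import norm

def icm(image, means, sigma, beta, n_iter=5):
    # Iterated Conditional Modes: each pixel takes the label maximizing
    # ln p(x | w_k) - beta * (number of 4-neighbors with a different label).
    labels = np.argmin(np.abs(image[..., None] - np.asarray(means)), axis=-1)
    H, W = image.shape
    for _ in range(n_iter):
        for i in range(H):
            for j in range(W):
                nbrs = [labels[a, b]
                        for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                        if 0 <= a < H and 0 <= b < W]
                scores = [norm.logpdf(image[i, j], means[k], sigma)
                          - beta * sum(n != k for n in nbrs)
                          for k in range(len(means))]
                labels[i, j] = int(np.argmax(scores))
    return labels

rng = np.random.default_rng(1)
truth = np.zeros((8, 8), int); truth[:, 4:] = 1     # two-region ground truth
noisy = rng.normal(truth.astype(float), 0.8)        # noisy observations
print(icm(noisy, means=[0.0, 1.0], sigma=0.8, beta=1.0))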
SAR
- y = ρWy + Xβ + ε
  - W is the contiguity matrix
  - ρ measures the strength of the spatial dependencies
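A minimal sketch that simulates the SAR model via its reduced form y = (I − ρW)⁻¹(Xβ + ε) on a small grid; the row-normalized 4-neighbor contiguity matrix and the parameter values are assumptions for illustration.

import numpy as np

def contiguity_matrix(h, w):
    # Row-normalized 4-neighbor contiguity matrix W for an h x w grid
    n = h * w
    W = np.zeros((n, n))
    for i in range(h):
        for j in range(w):
            for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                if 0 <= a < h and 0 <= b < w:
                    W[i*w + j, a*w + b] = 1.0
    return W / W.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
h = w = 5
W = contiguity_matrix(h, w)
X = rng.normal(size=(h*w, 2))
beta = np.array([1.0, -0.5])
rho = 0.6                                   # strength of spatial dependence
eps = rng.normal(scale=0.1, size=h*w)

# Reduced form: y = (I - rho W)^(-1) (X beta + eps)
y = np.linalg.solve(np.eye(h*w) - rho * W, X @ beta + eps)
print(y.reshape(h, w).round(2))             # spatially autocorrelated field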
Experimental Results
- Experimental Setup
- Results
[Figure slides: experimental setup and classification results]
Conclusion
- Comparison on a common probabilistic framework
- Modeling spatial dependencies
  - SAR makes more restrictive assumptions than MRF
  - MRF allows flexible modeling of spatial context
  - The relationship between SAR and MRF is analogous to that between logistic regression and Bayesian classifiers
- Efficient solution procedures
  - Graph-cut
  - Extend graph-cut to SAR