Ji, Chuanshu (1990). "Sieve Estimators for Pair-Interaction Potentials and Local Characteristics in Gibbs Random Fields."

SIEVE ESTIMATORS FOR PAIR-INTERACTION POTENTIALS AND LOCAL
CHARACTERISTICS IN GIBBS RANDOM FIELDS

CHUANSHU JI
Department of Statistics
University of North Carolina
Chapel Hill, NC 27599-3260
USA
The problems of estimating certain infinite-dimensional unknown parameters (such as interaction potentials, local characteristics, etc.) in Gibbs random fields are considered. We apply Grenander's method of sieves to construct strongly consistent estimators for those parameters. Exponential rates of consistency are established by using the conditional mixing property of the Gibbs random fields. The approach of the paper is applicable to image models with pair-potentials of short range or long range and with simple degradation structure, such as texture segmentation models. The results in the paper hold regardless of phase transition and symmetry breaking in the Gibbs random fields.
Key words and phrases: Gibbs random fields, method of sieves, consistent
estimator, texture segmentation, image analysis.
Abbreviated Title: Estimation for Gibbs Random Fields.
AMS (1980) subject classification: primary 62G05, 60G60; secondary 68G10, 82A25, 82A67.
1. Introduction.
Since the pioneering work of Geman and Geman (1984), Gibbs random fields (GRF) have been extensively used in imaging problems. GRF were originally introduced as models in statistical mechanics. A particle on each site of a two-dimensional integer lattice may represent the "spins" of a magnet, and there are usually interactions between particles at different sites. A configuration consists of particles on all sites. GRF are probability distributions on the set of all configurations. In imaging problems, particles are replaced by picture elements ("pixels"); the intrinsic properties of GRF are preserved. In terms of the Bayesian paradigm, GRF play the role of the prior distribution on relevant scene attributes to capture the tendencies and constraints that characterize the scene of interest. Image processing is then guided by the prior, which, if properly conceived, enormously limits the plausible restorations and interpretations.
The mean or mode(s) of the prior, if regarded as the true scene(s), usually can only be inferred based upon partial or corrupted observations, i.e. the corresponding posterior mean or posterior mode(s) are taken as the estimates of the true scene. For instance, in emission tomography, the true scene is represented by the spatial distribution of isotope in a target region of the body. The observations (photon counts), whose probability law is specified by the likelihood function given the true scene, have a mean function that is an attenuated Radon transform of the isotope intensity. In the texture segmentation problems, an image consists of the pixel intensity array and a corresponding array of texture labels. Each label gives the texture type of the associated pixel. The grey-levels of the pixels are observed, but not the labels. There is a substantial literature on imaging problems from the viewpoint of probabilists and statisticians. We refer the reader to Geman and Geman (1984), Besag (1986) and Geman (1990) for more complete discussions of general framework and methodology; Geman and McClure (1987), Johnstone and Silverman (1990) for tomography; and Geman and Graffigne (1986) for texture segmentation.
The quality of image processing will clearly depend on choices made at the modeling stage, i.e. how to specify GRF as a prior. This may include two aspects: parameter estimation and model selection. The former is to estimate unknown parameters contained in the energy functions of GRF. The latter is to choose one from several candidate GRF as the true model. In this paper we only consider parameter estimation. Model selection will be discussed later on in Seymour and Ji (1990).
The parameter estimation for GRF is complicated by phase transition and by degradation that creates corrupted scenes as indirect observations. Phase transition means that for certain values of the true parameters there may exist more than one "infinite-volume" GRF that generate the data. It produces an interesting class of spatial statistical models with long-range dependence, which cannot be detected solely by the data in advance. Therefore good estimators should have nice asymptotic properties which hold regardless of phase transitions, and meanwhile be computationally feasible.
So far all available estimation procedures have been parametric, i.e. they are based on the assumption that the energy functions in the GRF are parametrized by a finite number of unknown parameters. These procedures include maximum likelihood estimators (MLE), maximum pseudo-likelihood estimators (MPLE), etc. Computationally, MLE are intractable in their basic forms. Some work has been done to remedy this drawback, while MPLE turn out to be very effective. Gidas (1988) established the consistency, asymptotic normality and asymptotic efficiency of MLE for fully observed data. Comets and Gidas (1988) proved the consistency of MLE for degraded data. For MPLE, Geman and Graffigne (1986) and Gidas (1986) obtained the consistency respectively by using different approaches.
Recently, Comets (1989) applied the large deviation theory to derive the exponential convergence rates for the consistency of MLE and MPLE, and also studied the connection with the Bahadur efficiency. It is noteworthy that the consistency of those estimators is not affected by phase transition under some identifiability conditions, but the asymptotic normality is.

As mentioned in Gidas (1988) and Geman (1990), an open problem along this line is nonparametric estimation, in which the unknown parameters contained in the energy functions of the GRF are infinite-dimensional: either infinite sequences or smooth functions. Why should we consider infinite-dimensional parameters? Mainly because it may be a good starting point when we know very little about the energy functions.
In this paper, Grenander's method of sieves [Grenander (1981), Geman and Hwang (1982)] is adopted to construct strongly consistent estimators for the local characteristics (as unknown functions) in GRF; the results then are used to produce strongly consistent estimates for a countable sequence of unknown coefficients of the pair-potentials in the energy functions. Consistency of MPLE is also proved as a by-product for the case in which the energy functions contain a large number of parameters. This generalizes some results in the papers aforementioned. The main ingredient of the sieve method in our context is to choose an increasing sequence of sieves, each being a constrained finite-dimensional subspace of the original infinite-dimensional parameter space, so that every sieve implicitly corresponds to a Markov random field (MRF) induced by interactions of finite range. As the sample size increases, so does the sieve size, but at a slower rate. The relation between the growth rate of the sieve size and the convergence rate in the consistency of our sieve estimators is carefully demonstrated, which indicates how far this method can go.

This paper may be the first one to make an attempt on nonparametric estimation for GRF. The results hold regardless of phase transition and symmetry breaking (GRF need not be stationary). The estimators we constructed are quite tractable computationally.
However, a few limitations should be mentioned here.

(i) The sieve estimators in this paper apply only to the cases of fully observed data, or of incomplete data such as grey-levels in texture segmentation problems, where the degradation is simply a projection and the parameters of interest are only related to the grey-levels, so it can be dealt with as if there were no degradation. Nonparametric estimation problems for GRF with more complex degradation structure are yet to be tackled.

(ii) For the data set consisting of observations taken from an n×n square lattice, we choose a k×k square lattice as a sieve compatible with the sample size, where k behaves like √(c log n) for large n. To determine the constant c, which depends on the unknown parameters, a bound for the sum of the external field coefficient and the pair-potentials is assumed to be known a priori. This is similar to the situation in density estimation, where the bandwidth selection is based on the knowledge of a bound on the derivative of the density to be estimated. Therefore, effort in the future should be made to replace this assumption by sensible adaptive procedures to determine c from the data.
In Section 2 we give the background of GRF and formulate the related estimation problems. The main results are stated in Section 3. A heuristic argument for choosing the sieve size is also given there. In Section 4, we derive the rate of consistency for the sieve estimators of the local characteristics, using the conditional mixing properties of the GRF and a lower bound on the probabilities of possible configurations restricted to each sieve. In Section 5, we deal with the consistent estimators for pair-potentials in GRF of certain types, including the general Ising model and the one introduced by Geman and Graffigne (1986) for texture segmentation. Generalization of the consistency for MPLE is discussed. Finally, in Section 6, we make some concluding remarks and mention some possible issues for future research.

The term "nonparametric" is usually used for "model-free" statistics. However, using GRF to build up image models tends to comply with the "model-based" principle. In a sense the estimation procedures in this paper could be called "infinite-parametric". Nevertheless, following the tradition, we still adopt "nonparametric" here.
2. Mathematical model and preliminaries.

2.1 GRF induced by pair-interaction potentials.
Let ℤ² be the two-dimensional integer lattice. With each pixel site f ∈ ℤ² we associate a random variable X_f taking values in a finite set S = {±1, ..., ±r}, where r ∈ ℕ. Let Ω = S^{ℤ²} be the configuration space and each x ∈ Ω be a realization of X. Here x = (x_f, f ∈ ℤ²) and X = (X_f, f ∈ ℤ²). For every Λ ⊂ ℤ², let Ω_Λ = S^Λ, x_Λ = (x_f, f ∈ Λ) ∈ Ω_Λ, and X_Λ = (X_f, f ∈ Λ). In imaging problems, x_f may represent the grey-level of the pixel f.
GRF will be the probability distributions of X defined on Ω via pair-interaction potentials (or simply pair-potentials in what follows).

Let J = {J(f), f ∈ ℤ²} be a real sequence satisfying

(A1) J(f) = J(−f), ∀ f ∈ ℤ², and J(ω) = 0, where ω is the origin of ℤ²;

(A2) Σ_{f∈ℤ²} |J(f)| e^{a‖f‖} < ∞ for some a > 0 (in particular, J is summable),

where the norm ‖·‖ on ℤ² is defined by ‖f‖ = max{|f₁|, |f₂|} for f = (f₁, f₂).

Let V : S → ℝ and U : S × S → ℝ be two functions satisfying

(A3) V is either identically zero or nonconstant. Moreover, U is symmetric, and for every z̃ ∈ S,

max_{z∈S} U(z, z̃) > min_{z∈S} U(z, z̃).
(A4) ∃ z₁, z₂ ∈ S, called double extremal points, such that one of the following two cases holds:

Case 1: V ≡ 0; U(z₁, z̃) = max_{z∈S} U(z, z̃) and U(z₁, ẑ) = min_{z∈S} U(z, ẑ) for some z̃, ẑ ∈ S.

Case 2: V is nonconstant, V(z₁) = max_{z∈S} V(z), V(z₂) = min_{z∈S} V(z); meanwhile, U(z₁, z̃) = max_{z∈S} U(z, z̃) and U(z₂, z*) = min_{z∈S} U(z, z*) for some z̃, z* ∈ S.

The pair-potential between the pixels f and ξ is assumed to have the form J(f−ξ) U(x_f, x_ξ). Apparently, such potentials are translation-invariant. They also generalize the potentials of finite range, in which J(f) = 0 for all f with sufficiently large norm ‖f‖.
For every finite subset Λ of ℤ² and each x ∈ Ω, define the energy

(2.1)  H_Λ(x) = −h Σ_{ξ∈Λ} V(x_ξ) − Σ_{ξ∈Λ, f≠ξ} J(f−ξ) U(x_f, x_ξ),

where h ∈ ℝ is called the coefficient of external field. H_Λ(x) may be interpreted as the contribution of the pixels in Λ to the total energy associated with the configuration x.

The finite-volume Gibbs distribution in the volume Λ with the external condition x_{Λᶜ} is given by

(2.2)  f_Λ(y_Λ | x_{Λᶜ}) = Z⁻¹_{Λ,x_{Λᶜ}} exp{−H_Λ(y_Λ ⊕ x_{Λᶜ})},

where the combined configuration y_Λ ⊕ x_{Λᶜ} agrees with y_Λ in Λ and with x_{Λᶜ} in Λᶜ; the normalizing factor

(2.3)  Z_{Λ,x_{Λᶜ}} = Σ_{y_Λ∈Ω_Λ} exp{−H_Λ(y_Λ ⊕ x_{Λᶜ})}

is called the partition function on Λ given x_{Λᶜ}. In particular, when Λ = {f} (a singleton), then for a given x_{{f}ᶜ},

(2.4)  f_f(y_f | x_{{f}ᶜ}) = Z⁻¹_{{f},x_{{f}ᶜ}} e^{−H_{{f}}(y_f ⊕ x_{{f}ᶜ})},  y_f ∈ S,

is called the local characteristics at f. It is known that the local characteristics in (2.4) determine the finite-volume Gibbs distributions in (2.2).
Example 2.1. Let V(z) = z and U(z, z̃) = z z̃, z, z̃ ∈ S. This corresponds to the general Ising models (GIM). This is Case 2. The double extremal points in (A4) are z₁ = r with z̃ = r, while z₂ = −r with z* = r.

Example 2.2. Let V(z) = 0, ∀ z ∈ S, and U(z, z̃) = 1/(1 + a(z − z̃)²), z, z̃ ∈ S, where a is a positive constant. The pair-potentials are connected with the texture segmentation models described in Geman and Graffigne (1986). This is Case 1. The existence of the double extremal point z₁ can also be verified: for instance, z₁ = r with z̃ = r and ẑ = −r.
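As a concrete illustration of the local characteristics (2.4) for the GIM of Example 2.1, the following sketch computes f_ω(· | neighborhood) at a single site. The spin set, field strength and decay rate of J are hypothetical choices for illustration, not values taken from the paper.

```python
import math

# Illustrative GIM: S = {-1, +1} (r = 1); h and J are hypothetical.
S = [-1, 1]
h = 0.3

def J(f):
    # Exponentially decaying pair-potential in the norm ||f|| = max(|f1|, |f2|)
    d = max(abs(f[0]), abs(f[1]))
    return 0.5 * math.exp(-d) if d > 0 else 0.0

def local_characteristic(z, neighbors):
    """P(X_w = z | surrounding spins) for the GIM: V(z) = z, U(z, z') = z z'."""
    def site_energy(y):
        # single-site energy: -h V(y) - sum_f J(f) U(y, x_f)
        return -h * y - sum(J(f) * y * xf for f, xf in neighbors.items())
    weights = {y: math.exp(-site_energy(y)) for y in S}
    # the denominator is exactly the single-site partition function of (2.4)
    return weights[z] / sum(weights.values())

# All-up 3x3 neighborhood: the origin is strongly pushed toward +1.
nbrs = {(u, v): 1 for u in (-1, 0, 1) for v in (-1, 0, 1) if (u, v) != (0, 0)}
p_up = local_characteristic(1, nbrs)
```

The two weights always normalize to one over S, and with h = 0 an all-down neighborhood would give the mirror-image probability by symmetry.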
For the potentials specified by U, V, J, h, let 𝒢 = 𝒢(U,V;J,h) be the set of corresponding infinite-volume GRF on Ω, so that P ∈ 𝒢 if for each finite Λ ⊂ ℤ² and every x_{Λᶜ} ∈ Ω_{Λᶜ},

(2.5)  P(X_Λ = y_Λ | X_{Λᶜ} = x_{Λᶜ}) = f_Λ(y_Λ | x_{Λᶜ}).

P is said to be stationary if

(2.6)  P(X_{Λ+f} = y_Λ) = P(X_Λ = y_Λ), ∀ finite Λ ⊂ ℤ² and f ∈ ℤ²,

where X_{Λ+f} = (X_{ξ+f}, ξ ∈ Λ). Note that the translation invariant potentials need not induce stationary GRF (possibility of symmetry breaking). Meanwhile, under our assumptions, 𝒢 is always non-empty, but need not be a singleton (possibility of phase transition). In general, 𝒢 is a convex, compact Choquet simplex.

Remark. The definition of GRF under more general conditions is given in Ruelle (1978) and Georgii (1988). Both the configuration space and the potentials can be brought into a more general set-up. However, the framework given here is convenient for the nonparametric estimation problems related to GRF.
2.2 The nonparametric estimation problems for GRF.

Suppose h and J are unknown parameters which induce the set of GRF 𝒢 on Ω. Let Λ_n be the n×n symmetric square lattice centered at the origin ω of ℤ²; here without loss of generality we assume n ∈ ℕ is odd. Based on the data X(n) = X_{Λ_n} generated by a GRF P ∈ 𝒢, two nonparametric estimation problems are considered:

(I) estimating the local characteristics f_ω(x), x ∈ Ω, where f_ω(x) = f_ω(x_ω | x_{{ω}ᶜ});

(II) estimating h and J.

Note that the translation invariance of the potentials implies the translation invariance of the local characteristics, i.e.

(2.7)  f_f((t_f x)_f | (t_f x)_{{f}ᶜ}) = f_ω(x_ω | x_{{ω}ᶜ}), ∀ f ∈ ℤ², x ∈ Ω,

where t_f is the translation operator defined by (t_f x)_ξ = x_{ξ−f}, ∀ ξ ∈ ℤ². So (I) amounts to estimating f_f(x_f | x_{{f}ᶜ}), ∀ f ∈ ℤ².

A random function T_n on Ω constructed from X(n) is said to be a strongly consistent estimator of f_ω if

(2.8)  ‖T_n − f_ω‖ → 0  a.s. under P as n → ∞,

where ‖T_n − f_ω‖ = sup_{x∈Ω} |T_n(x) − f_ω(x)|. A random sequence (ĥ_n; Ĵ_n(f), f ∈ ℤ²) is called a strongly consistent estimator of (h; J) if

(2.9)  |ĥ_n − h| + Σ_{f∈ℤ²} |Ĵ_n(f) − J(f)| → 0  a.s. under P as n → ∞.

Here we adopt the ℓ¹-distance because (A2) implies the summability of J.
There is another estimation problem which is closely related to (II) but practically more appealing. Suppose X(n) is generated by a MRF with respect to a set of pair-potentials of a finite but large range, i.e. the energy function contains a finite but large number of unknown parameters. To formulate this quantitatively, we may assume that the range of the potentials, or equivalently the number of unknown parameters, is d_n: it increases along with the sample size, but at a slower rate. This is essentially a nonparametric estimation problem even though it looks like a parametric model superficially. Intuitively, the question here is: given a large data set, what is our limitation in terms of the compatible number of unknown parameters which we can estimate consistently? We call the problem

(III) estimating h and J = {J(f) : f ∈ ℤ², ‖f‖ ≤ d_n}.

It should be noticed that we do not consider the problem of estimating the GRF themselves here. Due to the phase transitions, the GRF themselves are generally non-identifiable. See Gidas (1988) for further discussions.

3. Main results.

3.1. Construction of T_n and selection of the sieve size k.
We write f(x) for f_ω(x), x ∈ Ω, in the problem (I) of Section 2.2 when there is no confusion. By (2.5), f(x) is the conditional probability P(X_ω = x_ω | X_{{ω}ᶜ} = x_{{ω}ᶜ}), which can naturally be approximated by

(3.1)  f^{(k)}(x) = P(X_ω = x_ω | X_{Λ_k\{ω}} = x_{Λ_k\{ω}})

for large k ∈ ℕ, where Λ_k (the sieve) is a k×k square lattice centered at ω. Therefore we construct the empirical measure from the sample X(n), and use the "sample conditional frequency" of x_ω appearing at ω, given that x_{Λ_k\{ω}} appears in Λ_k\{ω}, to estimate f^{(k)}(x). More specifically, we first extend X(n) by periodization outside Λ_n into a periodic configuration X̃ⁿ, then define

(3.2)  R_{n,X} = |Λ_n|⁻¹ Σ_{f∈Λ_n} δ_{t_f X̃ⁿ},

where δ_x is the Dirac mass at x ∈ Ω and |Λ| is the cardinality of Λ ⊂ ℤ². R_{n,X} is the empirical measure defined on Ω constructed from X(n). For every x_{Λ_k} ∈ Ω_{Λ_k}, let

(3.3)  A_k = {y ∈ Ω : y_{Λ_k} = x_{Λ_k}},

(3.4)  A′_k = {y ∈ Ω : y_{Λ_k\{ω}} = x_{Λ_k\{ω}}}.

Define T_n : Ω → ℝ by

(3.5)  T_n(x) = R_{n,X}(A_k)/R_{n,X}(A′_k) if R_{n,X}(A′_k) > 0, and T_n(x) = a otherwise,

where a ∈ (0,1) can be set arbitrarily; its value is not important. T_n, as the "sample conditional frequency" mentioned before, is our estimator for f.
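A minimal sketch of the sample conditional frequency on a toy lattice: the observation is periodized, every translate is matched against the k×k pattern, and the ratio of full matches to neighborhood matches is returned. The data, window size and default value are illustrative choices, not taken from the paper.

```python
import itertools

def sieve_estimate(X, k, x_pattern, a=0.5):
    """Sample conditional frequency T_n for the center value of a k x k
    pattern, from an n x n observation X with periodic extension.
    X: dict (i, j) -> value on {0..n-1}^2; x_pattern: dict offset -> value
    over the k x k window centered at the origin."""
    n = max(i for i, _ in X) + 1
    half = k // 2
    offsets = [(u, v) for u in range(-half, half + 1)
                      for v in range(-half, half + 1)]
    matches_full = matches_ring = 0
    for i, j in itertools.product(range(n), repeat=2):
        # periodization: indices are taken modulo n
        window = {o: X[((i + o[0]) % n, (j + o[1]) % n)] for o in offsets}
        if all(window[o] == x_pattern[o] for o in offsets if o != (0, 0)):
            matches_ring += 1                      # neighborhood agrees
            if window[(0, 0)] == x_pattern[(0, 0)]:
                matches_full += 1                  # center agrees too
    return matches_full / matches_ring if matches_ring > 0 else a

# Toy data: a constant image; the all-ones pattern then has frequency 1.
n = 8
X = {(i, j): 1 for i in range(n) for j in range(n)}
pattern = {(u, v): 1 for u in (-1, 0, 1) for v in (-1, 0, 1)}
t = sieve_estimate(X, 3, pattern)
```

The two counters play the roles of the counts of the sets A_k and A′_k under the empirical measure; the arbitrary default a is only returned when the neighborhood never occurs.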
Now we give a heuristic argument that for large n, the sieve size k should behave like √(c log n) to guarantee the consistency of T_n.

3.1.1 Why √(log n)?

Suppose X(n) comes from an extremal point P ∈ 𝒢, so X is ergodic under P. Let

(3.6)  P(x(k)) = P(X_{Λ_k} = x_{Λ_k}),  x_{Λ_k} ∈ Ω_{Λ_k}.

By the Shannon-McMillan-Breiman Theorem [cf. Föllmer (1973)], for P-almost all x ∈ Ω,

(3.7)  −|Λ_k|⁻¹ log P(x(k)) → h(P) as k → ∞,

and hence

(3.8)  P(x(k)) ≈ e^{−|Λ_k| h(P)},

where h(P) is the specific entropy of P, which is positive except for the trivial cases. For the consistency of T_n, we need to have enough "empirical counts" |Λ_n| R_{n,X}(A_k) for all sub-configurations x_{Λ_k}. Hence the expectation of |Λ_n| R_{n,X}(A_k) under P, which is just |Λ_n| P(x(k)), needs to be large. Combined with (3.7), this means k should be of order √(log n) at most. Meanwhile, orders slower than √(log n) are not desirable because ‖f^{(k)} − f‖ would tend to zero too slowly.
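The order of magnitude can be checked numerically: with |Λ_n| = n² sites and P(x(k)) ≈ e^{−k²h(P)}, the expected count of a fixed k×k sub-configuration stays large only if k² is at most of order log n. The entropy value and lattice size below are arbitrary illustrations.

```python
import math

def expected_count(n, k, hP):
    """Rough expected number of occurrences of one k x k sub-configuration
    in an n x n sample, using P(x(k)) ~ exp(-k^2 h(P))."""
    return n**2 * math.exp(-k**2 * hP)

hP = 0.5                                        # hypothetical specific entropy h(P)
n = 10**6
k_good = int(math.sqrt(2 * math.log(n) / hP))   # k^2 ~ (2 / h(P)) log n
k_bad = 4 * k_good                              # a sieve growing too fast
```

With these numbers, k_good gives an expected count well above one, while k_bad drives the count to essentially zero, so no pattern would ever be observed.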
3.1.2 How to determine c?

Uniform upper and lower bounds for f_{Λ_k}(x_{Λ_k} | x_{Λ_kᶜ}) and P(x(k)), x ∈ Ω, are derived by the following lemma.

Lemma 3.1. There exist two constants 0 < b₂ < b₁ < ∞ such that

(3.9)  e^{−b₁|Λ_k|} ≤ P(x(k)) ≤ e^{−b₂|Λ_k|}

and

(3.10)  e^{−b₁|Λ_k|} ≤ f_{Λ_k}(x_{Λ_k} | x_{Λ_kᶜ}) ≤ e^{−b₂|Λ_k|},

uniformly ∀ k ∈ ℕ and x ∈ Ω.

Proof. By the DLR equations [cf. Ruelle (1978)],

(3.11)  P(x(k)) = ∫_Ω f_{Λ_k}(x_{Λ_k} | y_{Λ_kᶜ}) P(dy),  ∀ x(k) ∈ Ω_{Λ_k}.

Hence it suffices to show (3.10). For a bounded function g, let g* = sup g − inf g; then it follows from (2.4) that

(3.12)  |H_{{ω}}(y ⊕ x_{{ω}ᶜ}) − H_{{ω}}(y′ ⊕ x_{{ω}ᶜ})| ≤ |h| V* + Σ_{f≠ω} |J(f)| U*,  ∀ y, y′ ∈ S.

Therefore,

(3.13)  e^{−b₁} ≤ f(x) ≤ e^{−b₂}, uniformly ∀ x ∈ Ω,

where

(3.14)  b₁ = log{1 + (2r−1) exp[|h| V* + Σ_{f≠ω} |J(f)| U*]},
        b₂ = log{1 + (2r−1) exp[−|h| V* − Σ_{f≠ω} |J(f)| U*]}.

Give an arbitrary order 1, ..., |Λ_k| to the set Λ_k and write f_{Λ_k}(x_{Λ_k} | x_{Λ_kᶜ}) as a product of conditional probabilities under P in an obvious way. By (3.13) and (2.7), we obtain (3.10) by taking further conditional expectations in each factor of f_{Λ_k}(x_{Λ_k} | x_{Λ_kᶜ}). □

Lemma 3.1 implies that for all x(k) ∈ Ω_{Λ_k}, |Λ_n| P(x(k)) is bounded below by n² e^{−b₁ c log n} = n^{2−b₁c} → ∞ as n → ∞, provided b₁c < 2.

In practice, since b₁ depends on the unknown parameters h and J, we need to make the assumption:

(A5) ∃ a known constant b > 0 such that b₁ ≤ b, ∀ (h,J) of interest.

(A5) enables us to choose c such that

(3.15)  0 < bc < 1.
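To see how (A5) translates into a concrete choice of c, the sketch below evaluates b₁ of (3.14) for a GIM with S = {−1, +1} (so V* = 2 and U* = 2) and picks c with bc safely below 1. The field strength and potential are hypothetical.

```python
import math

# Hypothetical GIM: S = {-1, +1}, so r = 1, V* = 2, U* = 2.
r, V_star, U_star = 1, 2.0, 2.0
h = 0.3

# sum_{f != w} |J(f)| for an exponentially decaying J (truncated numerically)
J_sum = sum(0.5 * math.exp(-max(abs(u), abs(v)))
            for u in range(-20, 21) for v in range(-20, 21)
            if (u, v) != (0, 0))

# (3.14): b1 = log{1 + (2r - 1) exp[|h| V* + sum |J(f)| U*]}
b1 = math.log(1 + (2 * r - 1) * math.exp(abs(h) * V_star + J_sum * U_star))

b = 1.1 * b1      # (A5): a known a-priori bound b >= b1
c = 0.5 / b       # keep b*c well below 1, so the sieve counts n^(1-bc) grow
```

The looseness of the bound b directly shrinks c and hence the sieve size, which is why replacing (A5) by an adaptive choice of c is flagged as an open problem.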
Remark. (A1)-(A5) are sufficient conditions for the results in this paper, but not necessary. They merely simplify our argument and could be weakened. Among them, the knowledge of b in (A5) is particularly stringent, and to reduce it appears to be challenging. See Section 6 for further comment.

3.2. Consistency of T_n and (ĥ_n; Ĵ_n(f), f ∈ ℤ²).
The key in this paper is

Theorem 3.1. Assume (A1)-(A5) and define T_n by (3.5) with k = [√(c log n)], the integer part of √(c log n), where c satisfies (3.15). Then for every ξ > 0, ∃ n₀ ∈ ℕ, C > 0, α > 0, β > 0, such that

(3.16)  P(‖T_n − f‖ > ξ) ≤ C e^{−α n^β},  ∀ n ≥ n₀.

Here P ∈ 𝒢 is a GRF corresponding to (h,J).

Remark. (3.16) gives the rate of consistency for T_n in (2.8). The proof will be given in Section 4.
Now we construct the estimator (ĥ_n; Ĵ_n(f), f ∈ ℤ²) via the method of pseudo-likelihood.

To illustrate the idea, suppose the pair-potential J has a finite range. Then θ = (h,J) is a finite-dimensional parameter, and the GRF P corresponding to θ is a MRF. For every x ∈ Ω, define the pseudo-likelihood function

(3.17)  PL_n(x; θ) = Π_{f∈Λ_n} f_f(x_f | x_{{f}ᶜ}).

PL_n(x; θ) is a concave function in θ. For the given sample X(n) and an arbitrary x_{Λ_nᶜ}, any θ̂ which maximizes PL_n(X(n) ⊕ x_{Λ_nᶜ}; θ) is called a MPLE of θ. It is a good alternative to the MLE of θ, and has a great computational advantage over the MLE.
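For a finite-range model, the pseudo-likelihood of (3.17) can be sketched directly; below, a nearest-neighbour Ising pseudo-likelihood on a torus is maximized by a crude grid search. The model, data and grid are hypothetical illustrations, and a real implementation would exploit concavity with a convex optimizer instead.

```python
import itertools
import math

def neg_log_pl(X, h, J1):
    """Negative log pseudo-likelihood for a nearest-neighbour Ising model on
    an n x n torus: each factor is the local characteristic of one site
    given its four neighbours."""
    n = len(X)
    nll = 0.0
    for i, j in itertools.product(range(n), repeat=2):
        s = X[(i-1) % n][j] + X[(i+1) % n][j] + X[i][(j-1) % n] + X[i][(j+1) % n]
        field = h + J1 * s
        # log f_site(x | neighbours) = x*field - log(2 cosh(field))
        nll -= X[i][j] * field - math.log(2 * math.cosh(field))
    return nll

def mple(X, grid):
    """Grid-search MPLE: the (h, J1) minimizing the negative log PL."""
    return min(grid, key=lambda p: neg_log_pl(X, *p))

# Toy data: a mostly-+1 sample with a single flipped spin.
X = [[1] * 6 for _ in range(6)]
X[2][3] = -1
grid = [(h, J) for h in (-0.5, 0.0, 0.5) for J in (0.0, 0.25, 0.5)]
h_hat, J_hat = mple(X, grid)
```

Each factor needs only one site and its neighbourhood, so no partition function over the whole lattice is ever computed; this is the computational advantage over the MLE mentioned above.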
In the model specified in Section 2, J is infinite-dimensional. The above method of pseudo-likelihood needs to be modified.
For the k chosen in Theorem 3.1, truncate (h,J) by

(3.18)  θ_k = (h; J(f), 0 < ‖f‖ ≤ d_n).

Then, similar to (2.1)-(2.4), define

(3.19)  H^{(c)}_Λ(x) = −h Σ_{ξ∈Λ} V(x_ξ) − Σ_{ξ∈Λ} Σ_{0<‖f−ξ‖≤d_n} J(f−ξ) U(x_f, x_ξ),

(3.20)  g_f(x; θ_k) = Z^{(c)−1}_{{f}} exp{−H^{(c)}_{{f}}(x)},

where Z^{(c)}_{{f}} is the corresponding normalizing factor. Note that the g_f(x; θ_k) are the local characteristics that induce some MRF P′, so that if P′ generates X then ∀ x ∈ Ω, f ∈ ℤ²,

(3.21)  P′(X_f = x_f | X_{{f}ᶜ} = x_{{f}ᶜ}) = g_f(x; θ_k).

In general, P′ ≠ P, and the local characteristics f_f(x_f | x_{{f}ᶜ}) may still depend on the entire sequence J. Nevertheless, for large n the two functions f_f(·) and g_f(·; θ_k) are close to each other. This motivates the following construction.

Given the periodic configuration X̃ⁿ, let

(3.22)  PL_n(X̃ⁿ; θ) = Π_{f∈Λ_n} g_f(X̃ⁿ; θ),

where θ ranges over Θ̄, the closure of a bounded, simply-connected open region Θ in ℝ^K, K being the dimension of θ_k. Let

(3.23)  θ̂_k = (ĥ_n; Ĵ_n(f), ‖f‖ ≤ d_n) be a maximizer of PL_n(X̃ⁿ; θ) over θ ∈ Θ̄,

and

(3.24)  Ĵ_n(f) = 0, ∀ f ∈ ℤ² with ‖f‖ > d_n.

Note that both θ and Θ depend on n. Apparently, for fixed n, PL_n(X̃ⁿ; θ) is a bounded continuous function in θ. Hence θ̂_k exists on Θ̄, but need not be unique.
Theorem 3.2. Assume (A1)-(A5) and let (ĥ_n; Ĵ_n(f), f ∈ ℤ²) be defined by (3.23) and (3.24). Then for every ξ > 0, ∃ n′ ∈ ℕ, C′ > 0, μ > 0, ν > 0, such that

(3.25)  P(|ĥ_n − h| + Σ_{f∈ℤ²} |Ĵ_n(f) − J(f)| > ξ) ≤ C′ e^{−μ n^ν},  ∀ n ≥ n′.

Again, P ∈ 𝒢 is induced by (h,J).

Having Theorem 3.2 established, we obtain a solution to the problem (III) as well:

Corollary 3.1. Suppose the sample X(n) is generated by a MRF P induced by (h,J) with J(f) = 0, ∀ f ∈ ℤ² with ‖f‖ > d_n. Then

(3.26)  |ĥ_n − h| + Σ_{f∈ℤ²} |Ĵ_n(f) − J(f)| → 0  a.s. under P as n → ∞,

i.e. (ĥ_n; Ĵ_n(f), f ∈ ℤ²) in (3.23) and (3.24) is a strongly consistent estimator of (h,J). Moreover, (3.25) holds.

The proof of Theorem 3.2 will be given in Section 5.
4. The rate of consistency of T_n.

The aim of this section is to prove Theorem 3.1, which provides the rate of consistency for T_n under P ∈ 𝒢. As we mentioned before, X may bear long-range dependence due to the possible phase transitions. This difficulty can be overcome by studying the asymptotics of the conditional probability of some sub-configurations given their complements.

The following proposition may be called a "conditional mixing lemma".

Proposition 4.1. Let B₁, ..., B_L be bounded, simply-connected regions in ℤ² and B = ⋃_{t=1}^L B_t. Suppose the distance d(B_t, B_t′) ≥ γ_n, ∀ t ≠ t′, where

(4.1)  γ_n → ∞ as n → ∞.

Then for any bounded measurable functions u_t : Ω_{B_t} → ℝ, t = 1, ..., L, we have

(4.2)  E(Π_{t=1}^L u_t(X_{B_t}) | x_{Bᶜ}) = [Π_{t=1}^L E(u_t(X_{B_t}) | x_{Bᶜ})] (1 + o(1))

uniformly ∀ x_{Bᶜ} ∈ Ω_{Bᶜ} as n → ∞, where E(·|x_{Bᶜ}) is the conditional expectation with respect to P(·|x_{Bᶜ}), and the o(1) error term in (4.2) is controlled by

(4.3)  δ_n = O((Σ_{t=1}^L |B_t|) Σ_{‖f‖≥γ_n} |J(f)|),

which tends to 0 by (A2) and (4.1).
Proof. Suppose K₁, K₂ form a disjoint decomposition of B such that K₁ is bounded and simply-connected and d(K₁, K₂) ≥ γ_n. Then for all x ∈ Ω, x′ ∈ Ω,

(4.4)  |H(x_{K₁} ⊕ x′_{K₂} ⊕ x_{(K₁∪K₂)ᶜ}) − H(x_{K₁} ⊕ x_{K₂} ⊕ x_{(K₁∪K₂)ᶜ})|
        ≤ Σ_{ξ∈K₁} Σ_{η∈K₂} |J(ξ−η)| |U(x_ξ, x′_η) − U(x_ξ, x_η)| = Δ_n,

where the notation ⊕ means the configuration consisting of the several indicated parts, similar to that in (2.2). Hence, ∀ t = 1, ..., L,

(4.5)  P(X_{B_t} = x_{B_t} | x_{Bᶜ} ⊕ x_{B_1} ⊕ ... ⊕ x_{B_{t−1}}) = P(X_{B_t} = x_{B_t} | x_{Bᶜ}) (1 + δ_n)

as n → ∞, which implies

(4.6)  P(X_{B_t} = x_{B_t}, t = 1, ..., L | x_{Bᶜ}) = [Π_{t=1}^L P(X_{B_t} = x_{B_t} | x_{Bᶜ})] (1 + δ_n)^L.

Therefore, (4.2) follows trivially. □
To create an environment for the application of the conditional mixing lemma, we decompose Λ_n in a "self-similar" fashion. For technical convenience, we suppose n = m² for some m ∈ ℕ. Λ_n then is partitioned into m×m squares D₁, ..., D_n. Each D_i contains n sites, ordered by 1, ..., n; all D_i's keep the same way of ordering. Hence every f ∈ Λ_n is indexed by a pair (i,j), referred to as the j-th site in D_i, i, j = 1, ..., n. From (3.2)-(3.5), define Y_ij (resp. Y′_ij) to be the indicator that t_f X̃ⁿ ∈ A_k (resp. t_f X̃ⁿ ∈ A′_k), where f is the j-th site of D_i, and

(4.7)  N_j = Σ_{i=1}^n Y_ij,  j = 1, ..., n,

(4.8)  N′_j = Σ_{i=1}^n Y′_ij,  j = 1, ..., n.

Note that all Y_ij, Y′_ij, N_j, N′_j depend on n and x_{Λ_k}.
By (3.5), with N = Σ_{j=1}^n N_j and N′ = Σ_{j=1}^n N′_j,

(4.9)  T_n(x) = (N₁ + ... + N_n)/(N′₁ + ... + N′_n)  when N′ > 0.

Therefore,

(4.10)  |T_n(x) − f(x)| ≤ D_n^{(1)}(x) + D_n^{(2)}(x) + D_n^{(3)}(x) + D_n^{(4)}(x),

where

(4.11)  D_n^{(1)}(x) = |f^{(k)}(x) − f(x)|,
        D_n^{(2)}(x) = 1_{{N′=0}} |a − f^{(k)}(x)|,
        D_n^{(3)}(x) = Σ_{j=1}^n 1_{{N′_j=0, N′>0}} (N_j/N′) f^{(k)}(x),
        D_n^{(4)}(x) = Σ_{j=1}^n 1_{{N′_j>0}} (N′_j/N′) |N_j/N′_j − f^{(k)}(x)|.

First of all, by setting K₁ = {ω} and γ_n = [k/2] in (4.4), we average out x_{K₂} and obtain

(4.12)  f^{(k)}(x) = f(x)(1 + Δ_n), uniformly ∀ x ∈ Ω,

with

(4.13)  Δ_n = O(e^{−c₀ k}) as n → ∞, for some c₀ > 0 (by (A2)).

Hence ‖D_n^{(1)}‖ → 0 as n → ∞.
The next lemma is needed for D_n^{(2)}(x), D_n^{(3)}(x) and D_n^{(4)}(x).

Lemma 4.1. Let λ_n = n^{1−bc}. Then for every ξ ∈ (0,1), ∃ n₁ ∈ ℕ, α₁ > 0, β₁ > 0, C₁ > 0, such that

(4.14)  P(N′_j/λ_n < 1−ξ) ≤ C₁ e^{−α₁ n^{β₁}},

uniformly ∀ n ≥ n₁, ∀ j = 1, ..., n, and ∀ x_{Λ_k}.

Proof. Denote the k×k square lattice centered at the j-th site of D_i by Q_{i,j}, i = 1, ..., n, and let

(4.15)  Q_j = ⋃_{i=1}^n Q_{i,j}.

Notice that Y′_ij depends only on the sub-configuration on Q_{i,j}, and ∀ i₁ ≠ i₂,

(4.16)  d(Q_{i₁,j}, Q_{i₂,j}) ≥ m − k.

For every x_{Q_jᶜ}, it follows from (4.2) that

(4.17)  E(e^{−t Σ_i Y′_ij} | x_{Q_jᶜ}) = [Π_{i=1}^n E(e^{−t Y′_ij} | x_{Q_jᶜ})] (1 + δ_n),

with δ_n = O(k² Σ_{‖f‖≥m−k} |J(f)|). Furthermore, for large n, Taylor expansion gives

(4.18)  E(e^{−t Y′_ij} | x_{Q_jᶜ}) ≤ exp{−t E(Y′_ij | x_{Q_jᶜ})(1 − θt)}  (|θ| ≤ 1 depends on Y′_ij).

Since E(Y′_ij | x_{Q_jᶜ}) ≥ e^{−b₁|Λ_k|} ≥ n^{−bc} by (2.7), (3.10), (4.4) and (A5), an exponential Chebyshev inequality applied to (4.17), with a suitable choice of t, yields (4.14), the exponent being positive since λ_n = n^{1−bc} → ∞ by (3.15). The uniformity with respect to j and x_{Λ_k} is obvious. □

Lemma 4.2. For every ξ ∈ (0,1), ∃ n₂ ∈ ℕ, α₂ > 0, β₂ > 0, C₂ > 0, such that

(4.19)  P(‖D_n^{(ℓ)}‖ > ξ/4) ≤ C₂ e^{−α₂ n^{β₂}},  ℓ = 2, 3.

Proof. By Lemma 4.1,

P(N′_j = 0) ≤ P(N′_j/λ_n < 1−ξ) ≤ C₁ e^{−α₁ n^{β₁}},  ∀ j = 1, ..., n.

Therefore,

P(‖D_n^{(2)}‖ > ξ/4) ≤ (2r)^{|Λ_k|} P(N′ = 0),
P(‖D_n^{(3)}‖ > ξ/4) ≤ (2r)^{|Λ_k|} Σ_{j=1}^n P(N′_j = 0),

where the factor (2r)^{|Λ_k|} ≤ (2r)^{c log n} counts the sub-configurations x_{Λ_k} and grows only polynomially in n. Lemma 4.2 follows trivially. □
The similar conditioning argument can also apply to D_n^{(4)}(·), as shown in the following lemma.

Lemma 4.3. For every ξ ∈ (0,1), ∃ n₃ ∈ ℕ, α₃ > 0, β₃ > 0, C₃ > 0, such that

(4.20)  P(‖D_n^{(4)}‖ > ξ/4) ≤ C₃ e^{−α₃ n^{β₃}},

where τ = ξ(1−ξ)/4 and W_ij = Y_ij − f^{(k)}(x) Y′_ij.

Proof. By (4.14), we only need to study the sums Σ_i W_ij. First, by (4.4),

(4.21)  E(W_ij | x_{Q_jᶜ}) = O(ρ_n), where ρ_n = O(Σ_{‖f‖≥m−k} |J(f)|) = o(√λ_n e^{−λ_n/2}),

uniformly ∀ x ∈ Ω and ∀ j = 1, ..., n. Taylor expansion implies

(4.22)  E(e^{±t W_ij/λ_n} | x_{Q_jᶜ}) = 1 + (t/λ_n) O(ρ_n) + (t²/(2λ_n²)) E(W_ij² | x_{Q_jᶜ})(1 + θ)  (|θ| ≤ 1 depends on W_ij).

Therefore, by the conditional mixing (4.2) and an exponential Chebyshev inequality,

(4.23)  P(Σ_{i=1}^n W_ij/λ_n < −τ) ≤ C₄ e^{−α₄ n^{β₄}},

and by the same token,

(4.24)  P(Σ_{i=1}^n W_ij/λ_n > τ) ≤ C₄ e^{−α₄ n^{β₄}},

where the exponents are positive by (3.15). Finally, with (4.14), (4.23) and (4.24) together we obtain (4.20). □

Thus we have completed the proof of Theorem 3.1 by combining (4.11), (4.13), (4.19) and (4.20).
The following strengthened result is needed in Section 5.

Corollary 4.1. For every ξ′ > 0, ∃ C₅ > 0, α₅ > 0, β₅ > 0, such that

(4.25)  P(sup_{x∈Ω} |1_{{N′>0}} (N/N′) − g(x; θ_k)| > ξ′ n^{−bc}) ≤ C₅ e^{−α₅ n^{β₅}}.

Proof. For every x,

|1_{{N′>0}} (N/N′) − g(x; θ_k)| ≤ M_n^{(1)}(x) + M_n^{(2)}(x) + M_n^{(3)}(x),

where

M_n^{(1)}(x) = 1_{{N′=0}} g(x; θ_k),
M_n^{(2)}(x) = Σ_{j=1}^n 1_{{N′_j=0, N′>0}} (N_j/N′) g(x; θ_k),

and M_n^{(2)}(x), M_n^{(3)}(x) are just D_n^{(3)}(x) and D_n^{(4)}(x) respectively, with f^{(k)}(x) replaced by g(x; θ_k). If we also replace ξ, ξ′ by quantities of order n^{−bc} in Lemmas 4.1-4.3, then all those lemmas still hold; in particular, the constant τ in (4.23) and (4.24) becomes O(n^{−bc}), and (3.15) is needed to guarantee the exponential decay there. Since the argument would be exactly the same, we omit the details. □

5. The rate of consistency of (ĥ_n; Ĵ_n(f), f ∈ ℤ²).
In this section, we prove Theorem 3.2 by making use of the analysis carried out in Section 4 to refine the argument given in Geman and Graffigne (1986).

By (3.24) and (A2),

(5.1)  Σ_{‖f‖>d_n} |Ĵ_n(f) − J(f)| = O(d_n e^{−a d_n}) → 0 as n → ∞.

Hence to obtain (3.25), it suffices to show that

(5.2)  P(‖θ̂_k − θ_k‖ > ξ) ≤ C′ e^{−μ n^ν},

where ‖·‖ is the ℓ¹-norm.
For θ ∈ Θ̄, define

(5.3)  G_n(θ) = Σ_{x_{Λ_k\{ω}}} (N′/|Λ_n|) Σ_{x_ω} g(x; θ_k) log [g(x;θ)/g(x;θ_k)],

where g(·) = g_ω(·) and N′ depends on the sub-configuration x_{Λ_k\{ω}} (cf. (4.8)); and

(5.4)  F_n(θ) = Σ_{x_{Λ_k\{ω}}} (N′/|Λ_n|) Σ_{x_ω} 1_{{N′>0}} (N/N′) log [g(x;θ)/g(x;θ_k)].

Lemma 5.1. Given the sample X(n) and fixed n,

(i) G_n(θ_k) = 0 and F_n(θ_k) = 0;

(ii) G_n(θ) ≤ 0, ∀ θ; G_n(θ) is concave in θ; F_n(θ) is concave in θ; and

(5.5)  F_n(θ) = |Λ_n|⁻¹ [log PL_n(X̃ⁿ; θ) − log PL_n(X̃ⁿ; θ_k)].

Proof. G_n(θ_k) = 0 and F_n(θ_k) = 0 are obvious. G_n(θ) ≤ 0 follows from Jensen's inequality. The concavity of G_n(θ) and F_n(θ) follows from the concavity of log g(x;θ) as a function of θ, which can be verified easily. Finally, by (4.7) and (4.8), summing the counts N over all sub-configurations x_{Λ_k} reproduces the sum over the sites (i,j) ∈ Λ_n, each counted once, and g_f(t_f x; θ) = g(x; θ); thus (5.5) holds. □
Now define the event

ℰ_n = {N′/|Λ_n| ≥ 1/(2n^{bc}), ∀ x_{Λ_k}}.

Lemma 5.2. ∃ q > 0, which depends only on (h,J), V, U, such that for all sufficiently large n and ∀ θ ∈ Θ̄,

(5.6)  sup_{‖v‖=1} [vᵀ ∇²G_n(θ) v] ≤ −q/n^{bc}

and

(5.7)  sup_{‖v‖=1} [vᵀ ∇²F_n(θ) v] ≤ −q/n^{bc}

hold on ℰ_n, where v ∈ ℝ^K, vᵀ is the transpose of v, and ∇²G_n(θ), ∇²F_n(θ) are the Hessian matrices of G_n(θ) and F_n(θ).
Proof. For notational convenience, let s = x_{Λ_k\{ω}}, and let

φ(x_ω, s) = (V(x_ω); U(x_ω, x_f), 0 < ‖f‖ ≤ d_n),

formally regarded as a vector in ℝ^K. By standard calculation,

(5.8)  vᵀ(−∇²G_n(θ))v = Σ_s (N′/|Λ_n|) E_θ{[vᵀ(φ(X_ω,s) − E_θ(φ(X_ω,s)|s))]² | s},

and similarly for vᵀ(−∇²F_n(θ))v, where E_θ(·|s) is the conditional expectation on S with respect to g_ω(·|s; θ) in (3.20). Thus it suffices to show (5.6).

The vector φ(x_ω,s) − E_θ[φ(X_ω,s)|s] has the following components:

(5.9)  V(x_ω) − E_θ[V(X_ω)|s] = Σ_{z∈S} p(z|s)[V(x_ω) − V(z)];

(5.10)  U(x_ω, x_f) − E_θ[U(X_ω, x_f)|s] = Σ_{z∈S} p(z|s)[U(x_ω, x_f) − U(z, x_f)],  0 < ‖f‖ ≤ d_n,

where p(z|s) = P(X_ω = z | X_{Λ_k\{ω}} = s) = g_ω(z|s; θ). Given an arbitrary unit vector v ∈ ℝ^K, we can always find a sub-configuration x_{Λ_k} such that the corresponding components in v and in φ(x_ω,s) − E_θ[φ(X_ω,s)|s] have the same sign. The scheme is as follows. Assume (A3) and Case 2 in (A4). Let v_ω and v_f, f ∈ Λ_k\{ω}, be the components of v corresponding to (5.9) and (5.10) respectively. Set x_ω = z₁ if v_ω ≥ 0 and x_ω = z₂ if v_ω < 0, and for each f choose x_f among the extremal points z̃, ẑ, z* of (A4) according to the sign of v_f, so that each component has the required sign. Case 1 of (A4) can be treated in the same way. It follows from (A3), (3.13), (5.9) and (5.10) that

(5.11)  |vᵀ(φ(x_ω,s) − E_θ[φ(X_ω,s)|s])| ≥ ‖v‖ e^{−b₁} q₁ for some q₁ > 0,

where q₁ is determined by the gaps V(z₁) − V(z₂) and max_{z∈S} U(z,z̃) − min_{z∈S} U(z,z̃), z̃ ∈ S. Therefore, on the set ℰ_n, (5.6) follows from (5.8) and (5.11) by letting q be a suitable multiple of e^{−2b₁} q₁². □

Remark 5.1. Lemma 4.1 implies that

(5.12)  P(ℰ_nᶜ) ≤ C₇ e^{−α₇ n^{β₇}}

for some C₇ > 0, α₇ > 0, β₇ > 0, and all sufficiently large n. Therefore, by Lemma 5.2, with an arbitrarily large probability both G_n(θ) and F_n(θ) are strictly concave in θ. In particular, θ_k would be the unique maximizer of G_n(θ).
Lemma 5.3. For every ξ > 0 and all sufficiently large n,

(5.13)  sup_{‖θ−θ_k‖≥ξ} [G_n(θ) − G_n(θ_k)] ≤ −q ξ²/(2 n^{bc})  holds on ℰ_n.

Proof. Since θ_k maximizes G_n, ∇G_n(θ_k) = 0, and by Taylor expansion,

G_n(θ) − G_n(θ_k) = ½ (θ − θ_k)ᵀ ∇²G_n(θ̄) (θ − θ_k)

for some θ̄ with ‖θ̄ − θ_k‖ ≤ ‖θ − θ_k‖. The bound then follows from (5.6) and the concavity of G_n. □
Lemma 5.4. For every ξ > 0 and all sufficiently large n,

(5.14)  P(sup_{‖θ−θ_k‖=ξ} |F_n(θ) − G_n(θ)| > q ξ²/(4 n^{bc}); ℰ_n) ≤ C₈ e^{−α₈ n^{β₈}},

for some C₈ > 0, α₈ > 0, β₈ > 0.

Proof. Since |log [g(x;θ)/g(x;θ_k)]| ≤ 2b₁, ∀ θ ∈ Θ̄ and x ∈ Ω by (3.13),

|F_n(θ) − G_n(θ)| ≤ 2b₁ Σ_{x_{Λ_k\{ω}}} (N′/|Λ_n|) Σ_{x_ω} |1_{{N′>0}} (N/N′) − g(x;θ_k)|.

Hence (5.14) follows from Corollary 4.1 with a suitable choice of ξ′ in (4.25). □
Proof of Theorem 3.2. Take an arbitrary ξ > 0 such that {θ : ‖θ−θ_k‖ ≤ ξ} ⊂ Θ. By the concavity of F_n, if F_n(θ) < F_n(θ_k) on the sphere ‖θ−θ_k‖ = ξ, then the maximizer θ̂_k satisfies ‖θ̂_k − θ_k‖ ≤ ξ. Since F_n(θ_k) = G_n(θ_k) = 0,

P(sup_{‖θ−θ_k‖=ξ} [F_n(θ) − F_n(θ_k)] ≥ 0)
  ≤ P(sup_{‖θ−θ_k‖=ξ} |F_n(θ) − G_n(θ)| > q ξ²/(4 n^{bc}); ℰ_n) + P(ℰ_nᶜ).

(5.2) then follows from Lemma 5.4, Lemma 5.3 and (5.12). □
Remark 5.2. In general, the following identifiability condition needs to be imposed:

(A6) For every n ∈ ℕ, θ′_k = θ_k holds whenever g(x; θ′_k) = g(x; θ_k) ∀ x ∈ Ω.

Nevertheless, (A6) is not used in our proof of Theorem 3.2 explicitly. (A6) is tightly connected with the equivalence of potentials [cf. Georgii (1988) and Gidas (1988)]. In fact, (A3) and (A4) imply that each equivalence class of the pair-potentials is a singleton, hence (A6) holds.
In some special cases, consistent estimators for (h,J) can be computed
directly without using MPLE.
Example 2.1 (the GIM revisited).  Let

    T_3 = {x_f = −1, ∀ f ∈ Λ_k};
    T_4 = {x_w = −1; x_f = 1, ∀ f ∈ Λ_k\{w}};

and, for f ∈ Λ_k\{w},

    T_1f = T_1;
    T_2f = {x_f = −1; x_f′ = 1, ∀ f′ ∈ Λ_k\{f}};
    T_3f = {x_w = −1, x_f = −1; x_f′ = 1, ∀ f′ ∈ Λ_k\{w,f}};
    T_4f = T_4.

Then (5.15) and (5.16) give explicit expressions for h and for J(f), f ∈ Λ_k\{w}, in terms of the probabilities of these events.
Based on these explicit expressions, the construction of (ĥ_n; Ĵ_n(f), f ∈ ℤ²) consists of two steps:

Step 1.  Truncate (h,J) by θ_k as in (3.18) and estimate g(x;θ_k) by T_n(x) defined in (3.8).

Step 2.  Define ĥ_n by (5.17) and Ĵ_n(f), f ∈ Λ_k\{w}, by (5.18), substituting the estimates from Step 1 into (5.15) and (5.16).

Then the exponential rate of consistency for (ĥ_n; Ĵ_n(f), f ∈ ℤ²) in the sense of (3.25) can be derived by repeating the argument in Section 4.
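The two-step recipe can be illustrated with a small sketch. Since the explicit formulas (5.15)-(5.18) are specific to the paper and are not reproduced here, the Python snippet below (all names hypothetical) only demonstrates the underlying idea for a homogeneous nearest-neighbor Ising model: the log-odds of the local characteristic is linear in (h, J), so (h, J) can be read off from the conditional probabilities of two neighborhood patterns; in Step 1 these probabilities would be replaced by the empirical frequencies T_n.

```python
import math
from itertools import product

def cond_prob_plus(h, J, neighbors):
    """P(x_w = +1 | neighbors), assuming the homogeneous Ising local
    characteristic exp(h + J*sum(s)) / (2*cosh(h + J*sum(s)))."""
    a = h + J * sum(neighbors)
    return math.exp(a) / (2.0 * math.cosh(a))

def recover_h_J(p):
    """Invert the log-odds log[p/(1-p)] = 2*(h + J*sum(s)) using two
    neighborhood patterns; p maps a 4-neighbor tuple to P(x_w=+1|s)."""
    logodds = lambda s: math.log(p[s] / (1.0 - p[s]))
    L1 = logodds((1, 1, 1, 1))    # = 2*(h + 4J)
    L2 = logodds((-1, 1, 1, 1))   # = 2*(h + 2J)
    J = (L1 - L2) / 4.0
    h = L2 / 2.0 - 2.0 * J
    return h, J

# Exact conditional probabilities stand in for the empirical T_n of Step 1.
h_true, J_true = 0.3, 0.7
table = {s: cond_prob_plus(h_true, J_true, s)
         for s in product((-1, 1), repeat=4)}
h_hat, J_hat = recover_h_J(table)   # recovers (0.3, 0.7)
```

With data, the table would hold relative frequencies as in (3.8), and the exponential rate of consistency transfers to the plug-in estimates because the map from the conditional probabilities to (h, J) is smooth.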
6.  Concluding Remarks.

This paper has provided a solution to the open problem of nonparametric estimation for GRF induced by pair-potentials. The results hold regardless of phase transition and symmetry breaking. The conditions (A1)-(A4) are satisfied in many examples of interest. They can be modified in many other image models without much difficulty, so that the argument in this paper still applies.

As mentioned in Section 1, one major unresolved issue is to replace (A5) by some data-driven procedures. Another important topic for future attention is to generalize the results to models with more complicated degradation structure. It should also be pointed out that besides the grid size n and the range k of the neighboring system, there is a third factor in the asymptotic analysis: the grey level r. In practice r might also be large along with n and k. To explore the relation among n, k and r in various imaging problems should be very interesting and challenging as well.
Acknowledgement.  Grant support from ONR (N00014-89-J-1760) is gratefully acknowledged. The author particularly thanks Stuart Geman and Basilis Gidas for many stimulating discussions.
References

Besag, J. (1986). On the statistical analysis of dirty pictures (with discussion). J. Roy. Stat. Soc., Series B, 48, 259-302.

Comets, F. (1989). On consistency of a class of estimators for exponential families of Markov random fields on the lattice. Preprint, Univ. Paris-X.

Comets, F. and Gidas, B. (1989). Parameter estimation for Gibbs distributions from partially observed data. Preprint, Brown Univ.

Follmer, H. (1973). On entropy and information gain in random fields. Z. Wahrsch. Verw. Geb., 26, 207-217.

Geman, D. (1990). Random Fields and Inverse Problems in Imaging. To appear in Lecture Notes in Math., Springer-Verlag, New York.

Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. PAMI-6, 721-741.

Geman, S. and Graffigne, C. (1986). Markov random field image models and their applications to computer vision. Proceedings of the International Congress of Mathematicians, 1986, ed. A.M. Gleason, AMS, Providence.

Geman, S. and Hwang, C. (1982). Nonparametric maximum likelihood estimation by the method of sieves. Ann. Stat., 10, 401-414.

Geman, S. and McClure, D. (1987). Statistical methods for tomographic image reconstruction. Proceedings of the 46th Session of the International Statistical Institute, Bulletin of the ISI, Vol. 52.

Georgii, H.O. (1988). Gibbs Measures and Phase Transitions. Walter de Gruyter, Berlin-New York.

Gidas, B. (1986). Consistency of maximum likelihood and pseudo-likelihood estimators for Gibbs distributions. Proceedings of the Workshop on Stochastic Differential Systems with Applications in Electrical/Computer Engineering, Control Theory, and Operations Research, IMA, Univ. of Minnesota.

Gidas, B. (1988). Parameter estimation for Gibbs distributions. Preprint, Brown Univ.

Grenander, U. (1981). Abstract Inference. Wiley, New York.

Johnstone, I. and Silverman, B. (1990). Speed of estimation in positron emission tomography and related inverse problems. Ann. Stat., 18, 251-280.

Ruelle, D. (1978). Thermodynamic Formalism. Addison-Wesley, Reading, Massachusetts.

Seymour, L. and Ji, C. (1990). Nearly optimal procedures for selecting Markov random fields in texture segmentation models. In preparation.