Kriging theory
1. Some statistical properties of the sum of random variables
Our notation denotes a single random variable by a capital letter, X, and its realization by a lower-case letter, x; vectors and matrices are in bold face, for example $\mathbf{X} = [X_1,\dots,X_n]^T$ and $\mathbf{x} = [x_1,\dots,x_n]^T$.
Let $\boldsymbol{\lambda}$ be a vector of n deterministic numbers, i.e. $\boldsymbol{\lambda} = [\lambda_1\ \lambda_2\ \dots\ \lambda_n]^T$.
Property 1.1:
$\operatorname{var}[Y] = E[Y^2] - E[Y]^2$
Proof:
$\operatorname{var}[Y] = E[(Y - E[Y])^2] = E[Y^2 - 2\,Y E[Y] + E[Y]^2] = E[Y^2] - 2E[Y]E[Y] + E[Y]^2 = E[Y^2] - E[Y]^2$
Property 1.2:
$\operatorname{cov}[X_i, X_j] = E[X_i X_j] - E[X_i]E[X_j]$
Proof:
$\operatorname{cov}[X_i, X_j] = E[(X_i - E[X_i])(X_j - E[X_j])] = E[X_i X_j - X_i E[X_j] - E[X_i] X_j + E[X_i]E[X_j]]$
$= E[X_i X_j] - E[X_i]E[X_j] - E[X_i]E[X_j] + E[X_i]E[X_j] = E[X_i X_j] - E[X_i]E[X_j]$
Property 1.3:
$$E\Big[\sum_{i=1}^n \lambda_i\, g_i(X)\Big] = \sum_{i=1}^n \lambda_i\, E[g_i(X)]$$
Proof:
$$E\Big[\sum_{i=1}^n \lambda_i\, g_i(X)\Big] = \int dx \sum_{i=1}^n \lambda_i\, g_i(x) f(x) = \sum_{i=1}^n \lambda_i \int dx\, g_i(x) f(x) = \sum_{i=1}^n \lambda_i\, E[g_i(X)]$$
where f(x) is the probability density function of X.
The application of property 1.3 leads to properties 1.3.1 and 1.3.2 below.
Property 1.3.1:
$$E\Big[\sum_{i=1}^n \lambda_i X_i\Big] = \sum_{i=1}^n \lambda_i\, E[X_i]$$
Property 1.3.2:
$$E\Big[\sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j X_i X_j\Big] = \sum_{i=1}^n \lambda_i\, E\Big[\sum_{j=1}^n \lambda_j X_i X_j\Big] = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j\, E[X_i X_j]$$
Property 1.4:
$$\operatorname{var}\Big(\sum_{i=1}^n \lambda_i X_i\Big) = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \operatorname{cov}[X_i, X_j]\, \lambda_j$$
Proof:
$$\operatorname{var}\Big(\sum_{i=1}^n \lambda_i X_i\Big) = E\Big[\Big(\sum_{i=1}^n \lambda_i X_i\Big)^2\Big] - \Big(E\Big[\sum_{i=1}^n \lambda_i X_i\Big]\Big)^2 = E\Big[\sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j X_i X_j\Big] - \sum_{i=1}^n \lambda_i E[X_i] \sum_{j=1}^n \lambda_j E[X_j]$$
$$= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j E[X_i X_j] - \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j E[X_i]E[X_j] = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \lambda_j \big(E[X_i X_j] - E[X_i]E[X_j]\big) = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \operatorname{cov}[X_i, X_j]\, \lambda_j$$
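Property 1.4 can also be checked numerically with a short Monte Carlo experiment, comparing the sample variance of the linear combination with the quadratic form of property 1.4; the MATLAB sketch below is our illustration (the variable names lam, C and L are ours):
% Numerical check of property 1.4: var(sum_i lam_i*X_i) = lam'*C*lam
n=3;
A=randn(n); C=A*A';               % a valid (positive definite) covariance matrix
lam=[0.5; -1; 2];                 % arbitrary deterministic weights
L=chol(C,'lower');                % factor C so that L*L' = C
X=L*randn(n,1e6);                 % 1e6 zero-mean realizations with cov(X) = C
Y=lam'*X;                         % realizations of the linear combination
sampleVar=var(Y)                  % Monte Carlo estimate
theoryVar=lam'*C*lam              % quadratic form of property 1.4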
Property 1.5:
$$\operatorname{cov}\Big(\sum_{i=1}^n \lambda_i X_i,\ \sum_{j=1}^n \beta_j Y_j\Big) = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \operatorname{cov}(X_i, Y_j)\, \beta_j$$
Proof:
$$\operatorname{cov}\Big(\sum_{i=1}^n \lambda_i X_i,\ \sum_{j=1}^n \beta_j Y_j\Big) = E\Big[\sum_{i=1}^n \lambda_i X_i \sum_{j=1}^n \beta_j Y_j\Big] - E\Big[\sum_{i=1}^n \lambda_i X_i\Big]\, E\Big[\sum_{j=1}^n \beta_j Y_j\Big]$$
$$= \sum_{i=1}^n \sum_{j=1}^n \lambda_i \beta_j \big(E[X_i Y_j] - E[X_i]E[Y_j]\big) = \sum_{i=1}^n \sum_{j=1}^n \lambda_i \operatorname{cov}(X_i, Y_j)\, \beta_j$$
2. Vector and matrix notation and properties
Bold variables will usually denote a vector of values, i.e. $\boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{bmatrix}$, or a matrix of values, i.e. $\mathbf{c} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1m} \\ c_{21} & c_{22} & & \vdots \\ \vdots & & \ddots & \\ c_{n1} & \cdots & & c_{nm} \end{bmatrix}$.
The superscript T denotes the transpose, i.e. $\boldsymbol{\lambda}^T = [\lambda_1\ \lambda_2\ \dots\ \lambda_n]$ and $\mathbf{c}^T = \begin{bmatrix} c_{11} & c_{21} & \cdots & c_{n1} \\ c_{12} & c_{22} & & \vdots \\ \vdots & & \ddots & \\ c_{1m} & \cdots & & c_{nm} \end{bmatrix}$.
The size of a matrix is given by its number of rows and columns, i.e. size($\boldsymbol{\lambda}$)=[n,1], size($\boldsymbol{\lambda}^T$)=[1,n], size($\mathbf{c}$)=[n,m], size($\mathbf{c}^T$)=[m,n].
The product of a matrix $\mathbf{a}$ of size [n,q] by a matrix $\mathbf{b}$ of size [q,m] is a matrix $\mathbf{c} = \mathbf{a}\mathbf{b}$ of size [n,m] with elements $c_{ij} = \sum_{k=1}^q a_{ik} b_{kj}$, where i=1,…,n and j=1,…,m.
Note that for a product a b to be valid, a must have the same number q of columns as there are rows
in b. We call q the inner dimension of the product a b.
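As an illustration of this definition, the MATLAB sketch below (ours) computes the product element by element with three nested loops and compares it with MATLAB's built-in product a*b:
% Element-by-element matrix product: c_ij = sum_k a_ik*b_kj
n=3; q=4; m=2;
a=rand(n,q); b=rand(q,m);
c=zeros(n,m);
for i=1:n
  for j=1:m
    for k=1:q                     % sum over the inner dimension q
      c(i,j)=c(i,j)+a(i,k)*b(k,j);
    end
  end
end
maxErr=max(max(abs(c-a*b)))       % should be ~0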
Example 2.1:
Consider $\boldsymbol{\lambda}^T$ as a matrix of size [1,n] with elements $\lambda_{1i}$, where i=1,…,n, and assume that $\mathbf{c}$ is a square matrix of size [n,n] with elements $c_{ij}$, where i=1,…,n and j=1,…,n.
Then $\boldsymbol{\lambda}^T \mathbf{c}$ is a matrix of size [1,n] with elements $(\boldsymbol{\lambda}^T \mathbf{c})_{1j} = \sum_{i=1}^n \lambda_{1i}\, c_{ij}$, where j=1,…,n.
Example 2.2:
Consider $\boldsymbol{\lambda}$ as a matrix of size [n,1] with elements $\lambda_{j1}$, where j=1,…,n.
Since $\boldsymbol{\lambda}^T \mathbf{c}$ is a matrix of size [1,n] and $\boldsymbol{\lambda}$ is a matrix of size [n,1], it follows that $\boldsymbol{\lambda}^T \mathbf{c} \boldsymbol{\lambda}$ is a matrix of size [1,1] (also called a scalar) with value $(\boldsymbol{\lambda}^T \mathbf{c} \boldsymbol{\lambda})_{11} = \sum_{j=1}^n \sum_{i=1}^n \lambda_{1i}\, c_{ij}\, \lambda_{j1}$.
For simplicity we drop the subscript 1 and swap the sums, so that we have
$$\boldsymbol{\lambda}^T \mathbf{c} \boldsymbol{\lambda} = \sum_{i=1}^n \sum_{j=1}^n \lambda_i\, c_{ij}\, \lambda_j.$$
Using vector and matrix notation, we can rewrite properties 1.4 and 1.5 as follows.
Property 2.1 (same as property 1.4):
$\operatorname{var}(\boldsymbol{\lambda}^T \mathbf{X}) = \boldsymbol{\lambda}^T \mathbf{c}_{X,X}\, \boldsymbol{\lambda}$, where $\mathbf{c}_{X,X} = \operatorname{cov}(\mathbf{X}, \mathbf{X})$ is the [n,n] symmetric covariance matrix for $\mathbf{X}$, i.e. it has elements $(\mathbf{c}_{X,X})_{ij} = \operatorname{cov}(X_i, X_j)$.
Property 2.2 (same as property 1.5):
$\operatorname{cov}(\boldsymbol{\lambda}^T \mathbf{X}, \boldsymbol{\beta}^T \mathbf{Y}) = \boldsymbol{\lambda}^T \mathbf{c}_{X,Y}\, \boldsymbol{\beta}$, where $\mathbf{c}_{X,Y} = \operatorname{cov}(\mathbf{X}, \mathbf{Y})$ is the covariance matrix between $\mathbf{X}$ and $\mathbf{Y}$, i.e. it has elements $(\mathbf{c}_{X,Y})_{ij} = \operatorname{cov}(X_i, Y_j)$.
The following two properties pertain to taking derivatives of matrix products with respect to a vector.
Property 2.3:
Let $\boldsymbol{\lambda}^T = [\lambda_1\ \dots\ \lambda_n]$ be a [1,n] matrix, let $\mathbf{q} = [q_1\ \dots\ q_n]^T$ be a [n,1] matrix, let the scalar $\boldsymbol{\lambda}^T \mathbf{q}$ be their product, and let $\frac{\partial\, \boldsymbol{\lambda}^T \mathbf{q}}{\partial \boldsymbol{\lambda}}$ be the [n,1] matrix with elements $\frac{\partial\, \boldsymbol{\lambda}^T \mathbf{q}}{\partial \lambda_i}$; then we have
$$\frac{\partial\, \boldsymbol{\lambda}^T \mathbf{q}}{\partial \boldsymbol{\lambda}} = \mathbf{q}$$
Proof:
$$\frac{\partial\, \boldsymbol{\lambda}^T \mathbf{q}}{\partial \lambda_i} = \frac{\partial}{\partial \lambda_i} \sum_{k=1}^n \lambda_k q_k = q_i$$
Property 2.4:
Let $\boldsymbol{\lambda}^T = [\lambda_1\ \dots\ \lambda_n]$ be a [1,n] matrix and let $\mathbf{q}$ be a symmetric [n,n] matrix; then $\boldsymbol{\lambda}^T \mathbf{q} \boldsymbol{\lambda}$ is a scalar and $\frac{\partial\, \boldsymbol{\lambda}^T \mathbf{q} \boldsymbol{\lambda}}{\partial \boldsymbol{\lambda}}$ is a [n,1] matrix given by
$$\frac{\partial\, \boldsymbol{\lambda}^T \mathbf{q} \boldsymbol{\lambda}}{\partial \boldsymbol{\lambda}} = 2\, \mathbf{q} \boldsymbol{\lambda}$$
Proof:
$$\frac{\partial\, \boldsymbol{\lambda}^T \mathbf{q} \boldsymbol{\lambda}}{\partial \lambda_i} = \frac{\partial}{\partial \lambda_i} \sum_{k=1}^n \sum_{l=1}^n \lambda_k q_{kl} \lambda_l = \frac{\partial}{\partial \lambda_i} \sum_{k \neq i} \sum_{l \neq i} \lambda_k q_{kl} \lambda_l + \frac{\partial}{\partial \lambda_i} \sum_{l \neq i} \lambda_i q_{il} \lambda_l + \frac{\partial}{\partial \lambda_i} \sum_{k \neq i} \lambda_k q_{ki} \lambda_i + \frac{\partial}{\partial \lambda_i} \lambda_i q_{ii} \lambda_i$$
$$= 0 + \sum_{l \neq i} q_{il} \lambda_l + \sum_{k \neq i} \lambda_k q_{ki} + 2 q_{ii} \lambda_i = 2 \sum_{l \neq i} q_{il} \lambda_l + 2 q_{ii} \lambda_i = 2 \sum_{l=1}^n q_{il} \lambda_l$$
where we used the symmetry $q_{ki} = q_{ik}$.
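Property 2.4 can be checked numerically by comparing the analytical gradient 2qλ with a finite-difference approximation of the derivative; the MATLAB sketch below is our illustration (the step eps0 is an assumed value):
% Numerical check of property 2.4: d(lam'*q*lam)/dlam = 2*q*lam for symmetric q
n=4;
A=randn(n); q=(A+A')/2;           % a symmetric matrix
lam=randn(n,1);
f=@(v) v'*q*v;                    % the scalar quadratic form
g=zeros(n,1); eps0=1e-6;
for i=1:n
  e=zeros(n,1); e(i)=eps0;        % perturb the i-th component of lam
  g(i)=(f(lam+e)-f(lam-e))/(2*eps0);   % central finite difference
end
maxErr=max(abs(g-2*q*lam))        % should be ~0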
3. Simple kriging
Suppose that the random variable Xk and the random vector Xh=[ Xh1, …, Xhn]T are correlated with
known covariance.
We seek the best linear unbiased estimate x̂k for Xk given measured values xh for Xh.
The main feature of simple kriging (SK) is that we assume that the expected value of Xk and Xh are
known, i.e. the simple kriging model assumption is
E[Xk]=mk and E[Xh]=mh, where mk and mh are known values.
We then apply the properties that SK is linear, that it is unbiased, and that it is the best estimator in
that it minimizes the estimation error variance. These three properties lead to the following
equations:
SK is a linear estimator
We define X̂ k as a random variable equal to the linear combination of Xh, i.e.
$\hat{X}_k = \lambda_0 + \boldsymbol{\lambda}^T \mathbf{X}_h$, where $\lambda_0$ and $\boldsymbol{\lambda}^T = [\lambda_1, \dots, \lambda_n]$ are the kriging weights
SK is unbiased
$E[\hat{X}_k] = E[X_k]$
$\Rightarrow\ \hat{X}_k = m_k + \boldsymbol{\lambda}^T (\mathbf{X}_h - \mathbf{m}_h)$
Proof: $E[\hat{X}_k] = E[X_k] \Rightarrow E[\lambda_0 + \boldsymbol{\lambda}^T \mathbf{X}_h] = E[X_k] \Rightarrow \lambda_0 + \boldsymbol{\lambda}^T \mathbf{m}_h = m_k \Rightarrow \lambda_0 = m_k - \boldsymbol{\lambda}^T \mathbf{m}_h$, which once substituted back into $\hat{X}_k = \lambda_0 + \boldsymbol{\lambda}^T \mathbf{X}_h$ gives the desired result.
SK minimizes the estimation variance
Let $E_k = X_k - \hat{X}_k$ be the estimation error. Since the estimate $\hat{X}_k$ is unbiased, it follows that $E[E_k] = 0$, and that the estimation error variance is $\hat{v}_k = \operatorname{var}(E_k) = E[(X_k - \hat{X}_k)^2]$. Substituting $\hat{X}_k$ with its expression we get
$$E_k = X_k - \hat{X}_k = [1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} X_k - m_k \\ \mathbf{X}_h - \mathbf{m}_h \end{bmatrix}, \text{ and}$$
$$\hat{v}_k = \operatorname{var}(E_k) = \operatorname{var}\Big([1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} X_k - m_k \\ \mathbf{X}_h - \mathbf{m}_h \end{bmatrix}\Big) = [1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} c_{kk} & \mathbf{c}_{kh} \\ \mathbf{c}_{hk} & \mathbf{c}_{hh} \end{bmatrix} \begin{bmatrix} 1 \\ -\boldsymbol{\lambda} \end{bmatrix} = c_{kk} + \boldsymbol{\lambda}^T \mathbf{c}_{hh} \boldsymbol{\lambda} - 2\boldsymbol{\lambda}^T \mathbf{c}_{hk}$$
(using property 2.1)
where ckk=cov(Xk, Xk)=var(Xk), chh=cov(Xh, Xh) and chk=cov(Xh, Xk) have known covariance values.
The kriging weights that minimize v̂ k verify the following set of n equations
$$\frac{\partial \hat{v}_k}{\partial \boldsymbol{\lambda}} = \mathbf{0}$$
Substituting v̂ k for its expression, using properties 2.3 and 2.4, and rearranging, we obtain the
following equation for the SK weights
$$\boldsymbol{\lambda} = \mathbf{c}_{hh}^{-1} \mathbf{c}_{hk}$$
Proof:
$$\frac{\partial \hat{v}_k}{\partial \boldsymbol{\lambda}} = \mathbf{0} \ \Rightarrow\ \frac{\partial}{\partial \boldsymbol{\lambda}}\big(c_{kk} + \boldsymbol{\lambda}^T \mathbf{c}_{hh} \boldsymbol{\lambda} - 2\boldsymbol{\lambda}^T \mathbf{c}_{hk}\big) = \mathbf{0} \ \Rightarrow\ 2\mathbf{c}_{hh}\boldsymbol{\lambda} - 2\mathbf{c}_{hk} = \mathbf{0} \ \Rightarrow\ \boldsymbol{\lambda} = \mathbf{c}_{hh}^{-1}\mathbf{c}_{hk}$$
Finally, substituting the above expression for $\boldsymbol{\lambda}$ in the equations for $\hat{X}_k$ and $\hat{v}_k$, and using $\hat{x}_k$ and $\mathbf{x}_h$ in place of $\hat{X}_k$ and $\mathbf{X}_h$, respectively, we obtain
the Simple Kriging (SK) estimator:
$$\hat{x}_k = m_k + \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}(\mathbf{x}_h - \mathbf{m}_h)$$
$$\hat{v}_k = c_{kk} - \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk}$$
Example 3.1
Assume that n=1, mk=10, mh=10, ckk=2, chh=2, chk = ckh =1.5, xh=12. Calculate x̂ k and v̂ k.
$\hat{x}_k = m_k + c_{kh} c_{hh}^{-1} (x_h - m_h) = 10 + (1.5/2)(12-10) = 10 + 1.5 = 11.5$
$\hat{v}_k = c_{kk} - c_{kh} c_{hh}^{-1} c_{hk} = 2 - (1.5/2)(1.5) = 2 - 1.125 = 0.875$
MATLAB Program:
mk=10; mh=10;
Ckk=2; Chh=2; Chk=1.5; Ckh=Chk; xh=12;
xk=mk+Ckh*inv(Chh)*(xh-mh)
vk=Ckk-Ckh*inv(Chh)*Chk
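One remark on the MATLAB programs in these examples: for larger systems it is numerically preferable not to form inv(Chh) explicitly, but to solve the linear system with MATLAB's backslash operator, e.g. the equivalent variant
xk=mk+Ckh*(Chh\(xh-mh))           % same result as Ckh*inv(Chh)*(xh-mh)
vk=Ckk-Ckh*(Chh\Chk)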
Example 3.2
Assume that n=2, $m_k=10$, $\mathbf{m}_h = \begin{bmatrix} 10 \\ 10 \end{bmatrix}$, $c_{kk}=2$, $\mathbf{c}_{hh} = \begin{bmatrix} 2 & 1.1 \\ 1.1 & 2 \end{bmatrix}$, $\mathbf{c}_{hk} = \mathbf{c}_{kh}^T = \begin{bmatrix} 1.5 \\ 1.5 \end{bmatrix}$, $\mathbf{x}_h = \begin{bmatrix} 12 \\ 12 \end{bmatrix}$. Calculate $\hat{x}_k$ and $\hat{v}_k$.
$$\hat{x}_k = m_k + \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}(\mathbf{x}_h - \mathbf{m}_h) = 10 + [1.5\ \ 1.5]\begin{bmatrix} 2 & 1.1 \\ 1.1 & 2 \end{bmatrix}^{-1}\begin{bmatrix} 12-10 \\ 12-10 \end{bmatrix} = 10 + 1.936 = 11.936$$
$$\hat{v}_k = c_{kk} - \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk} = 2 - [1.5\ \ 1.5]\begin{bmatrix} 2 & 1.1 \\ 1.1 & 2 \end{bmatrix}^{-1}\begin{bmatrix} 1.5 \\ 1.5 \end{bmatrix} = 2 - 1.452 = 0.548$$
MATLAB Program:
mk=10; mh=[10;10];
Ckk=2; Chh=[2 1.1;1.1 2]; Chk=[1.5;1.5]; Ckh=Chk';
xh=[12;12];
xk=mk+Ckh*inv(Chh)*(xh-mh)
vk=Ckk-Ckh*inv(Chh)*Chk
Example 3.3
Let's consider the stationary temporal random field (TRF) X(t) representing microbial contamination (in CFU), where t is time (in hr). Let's assume that the mean trend of X(t) is $m_X(t) = m_X = 10$ CFU, and that its covariance between times t and t' is given by the covariance model
$$c_X(t,t') = c_X(|t-t'|) = c_X(\tau) = 20\ \mathrm{CFU}^2\, \exp(-3\tau/10\ \mathrm{hr}).$$
Let's assume that samples collected at times $t_1 = 5$ hr and $t_2 = 20$ hr were analyzed and provided the following measurements: $X(t_1) = 19$ CFU and $X(t_2) = 9$ CFU. What is the simple kriging estimate of the microbial concentration at time 10 hr?
Let $t_k = 10$ hr be the estimation time, and let $\mathbf{t}_h = \begin{bmatrix} t_1 \\ t_2 \end{bmatrix} = \begin{bmatrix} 5\ \mathrm{hr} \\ 20\ \mathrm{hr} \end{bmatrix}$ be the times at which the hard data are available. Then let $X_k = X(t_k)$ and $\mathbf{X}_h = \mathbf{X}(\mathbf{t}_h) = \begin{bmatrix} X(t_1) \\ X(t_2) \end{bmatrix}$ represent the TRF at the estimation time and the hard data times, respectively.
Since the TRF has a constant mean of 10 CFU, it follows that the expected values for $X_k$ and $\mathbf{X}_h$ are $m_k = 10$ and $\mathbf{m}_h = \begin{bmatrix} 10 \\ 10 \end{bmatrix}$, respectively.
We then need to calculate the covariance matrix $\mathbf{c}_{hh}$. Using matrix notation we have
$$\mathbf{c}_{hh} = 20\ \mathrm{CFU}^2\, \exp(-3\,\mathbf{d}_{hh}/10\ \mathrm{hr}), \quad \text{where } \mathbf{d}_{hh} = \operatorname{dist}(\mathbf{t}_h, \mathbf{t}_h) = \begin{bmatrix} |t_1 - t_1| & |t_1 - t_2| \\ |t_2 - t_1| & |t_2 - t_2| \end{bmatrix}$$
and the exponential is applied element by element.
Proof:
$$\mathbf{c}_{hh} = \operatorname{cov}(\mathbf{X}_h, \mathbf{X}_h) = \operatorname{cov}\Big(\begin{bmatrix} X(t_1) \\ X(t_2) \end{bmatrix}, \begin{bmatrix} X(t_1) \\ X(t_2) \end{bmatrix}\Big) = \begin{bmatrix} \operatorname{cov}(X(t_1),X(t_1)) & \operatorname{cov}(X(t_1),X(t_2)) \\ \operatorname{cov}(X(t_2),X(t_1)) & \operatorname{cov}(X(t_2),X(t_2)) \end{bmatrix}$$
$$= \begin{bmatrix} c_X(t_1,t_1) & c_X(t_1,t_2) \\ c_X(t_2,t_1) & c_X(t_2,t_2) \end{bmatrix} = \begin{bmatrix} 20\exp(-3|t_1-t_1|/10) & 20\exp(-3|t_1-t_2|/10) \\ 20\exp(-3|t_2-t_1|/10) & 20\exp(-3|t_2-t_2|/10) \end{bmatrix} = 20\exp(-3\,\mathbf{d}_{hh}/10)$$
Substituting $t_1 = 5$ hr and $t_2 = 20$ hr we obtain
$$\mathbf{d}_{hh} = \begin{bmatrix} |5-5| & |5-20| \\ |20-5| & |20-20| \end{bmatrix} = \begin{bmatrix} 0 & 15 \\ 15 & 0 \end{bmatrix}, \text{ and}$$
$$\mathbf{c}_{hh} = 20\ \mathrm{CFU}^2\, \exp(-3\,\mathbf{d}_{hh}/10\ \mathrm{hr}) = \begin{bmatrix} 20 & 0.222 \\ 0.222 & 20 \end{bmatrix}.$$
Similarly we have
$$d_{kk} = 0$$
$$\mathbf{d}_{hk} = \begin{bmatrix} |t_1 - t_k| \\ |t_2 - t_k| \end{bmatrix} = \begin{bmatrix} |5-10| \\ |20-10| \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \end{bmatrix}$$
$$\mathbf{d}_{kh} = \mathbf{d}_{hk}^T = [5\ \ 10]$$
and
$$c_{kk} = 20\ \mathrm{CFU}^2\, \exp(-3\,d_{kk}/10\ \mathrm{hr}) = 20$$
$$\mathbf{c}_{hk} = 20\ \mathrm{CFU}^2\, \exp(-3\,\mathbf{d}_{hk}/10\ \mathrm{hr}) = \begin{bmatrix} 4.463 \\ 0.996 \end{bmatrix}$$
$$\mathbf{c}_{kh} = \mathbf{c}_{hk}^T = [4.463\ \ 0.996]$$
Finally we can calculate the simple kriging estimate and its estimation error variance
$$\hat{x}_k = m_k + \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}(\mathbf{x}_h - \mathbf{m}_h) = 10 + [4.463\ \ 0.996]\begin{bmatrix} 20 & 0.222 \\ 0.222 & 20 \end{bmatrix}^{-1}\begin{bmatrix} 19-10 \\ 9-10 \end{bmatrix} = 11.95\ \mathrm{CFU}$$
$$\hat{v}_k = c_{kk} - \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk} = 20 - [4.463\ \ 0.996]\begin{bmatrix} 20 & 0.222 \\ 0.222 & 20 \end{bmatrix}^{-1}\begin{bmatrix} 4.463 \\ 0.996 \end{bmatrix} = 18.96$$
MATLAB Program:
tk=10; th=[5;20]; xh=[19;9]; mk=10; mh=[10;10]; Ckk=20;
Dhh=coord2dist(th,th)
Chh=20*exp(-3*Dhh/10)
Dhk=coord2dist(th,tk)
Chk=20*exp(-3*Dhk/10)
Ckh=Chk';
xk=mk+Ckh*inv(Chh)*(xh-mh)
vk=Ckk-Ckh*inv(Chh)*Chk
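Note that coord2dist(c1,c2) is assumed here to return the matrix of pairwise distances between two sets of coordinates; it is not a built-in MATLAB function. For the 1-D temporal coordinates of this example an equivalent is:
% Pairwise-distance equivalent of coord2dist for 1-D (temporal) coordinates
Dhh=abs(th*ones(1,length(th))-ones(length(th),1)*th');   % |t_i - t_j|
Dhk=abs(th*ones(1,length(tk))-ones(length(th),1)*tk');   % |t_i - t_k|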
Example 3.4
Let X(t) be a stationary TRF with mean $m_X = 19$ CFU and covariance $c_X(\tau) = 20\ \mathrm{CFU}^2\, \exp(-3\tau/25\ \mathrm{hr})$.
Let's assume that at times $t_1 = 2$ hr and $t_2 = 15$ hr we have the measured values $X(t_1) = 29$ CFU and $X(t_2) = 16$ CFU. Plot the simple kriging estimate +/- its standard deviation from times 0 to 20 hr.
MATLAB function (save as example3_4.m)
[Figure: plot of the SK estimate +/- one standard deviation versus time, with the data points]
function example3_4
th=[2;15]; xh=[29;16]; mh=[19;19];
Ckk=20;
Dhh=coord2dist(th,th);
Chh=20*exp(-3*Dhh/25);
tk=0:.1:20;
mk=0*tk+19;
for i=1:length(tk)
Dhk=coord2dist(th,tk(i));
Chk=20*exp(-3*Dhk/25);
Ckh=Chk';
[xk(i),vk(i)]=SK(xh,mk(i),mh,Ckk,Ckh,Chh);
end
figure;
set(gca,'FontSize',16);
hold on;
plot(th,xh,'s');
plot(tk,xk);
plot(tk,xk-sqrt(vk),':');
plot(tk,xk+sqrt(vk),':');
xlabel('Time (hr)');
ylabel('X (CFU)');
function [xk,vk]=SK(xh,mk,mh,Ckk,Ckh,Chh)
xk=mk+Ckh*inv(Chh)*(xh-mh);
vk=Ckk-Ckh*inv(Chh)*Ckh';
Exercise 3.1:
Assume that n=1, mk=10, mh=10, ckk=2, chh=2, chk = ckh =0.5, xh=12. Calculate x̂ k and v̂ k.
Exercise 3.2
14 
 20 11 
11
15 
 , chk = ckhT =   , xh=   . Calculate x̂ k and v̂ k.
Let n=2, mk=14, mh=   , ckk=20, chh= 
14 
 11 20 
11
15 
Exercise 3.3
Let X(t) be a stationary TRF with mean $m_X = 10$ CFU and covariance $c_X(\tau) = 20\ \mathrm{CFU}^2\, \exp(-3\tau/5\ \mathrm{hr})$.
Let's assume that at times $t_1 = 2$ hr, $t_2 = 10$ hr and $t_3 = 15$ hr we have the measured values $X(t_1) = 29$ CFU, $X(t_2) = 12$ CFU and $X(t_3) = 16$ CFU.
a. Calculate the simple kriging estimate and its estimation error variance at time $t_k = 8$ hr.
b. Plot the simple kriging estimate +/- its standard deviation from times 0 to 20 hr.
Exercise 3.4
Due to a traffic accident, a truck carrying the toxic agent A contaminates a lake in Orange County. Investigation after this accident indicates that the distribution across space of the agent can be modeled as a Spatial Random Field (SRF) X(s), where s denotes the spatial coordinate, with a constant mean of 10 ppm, and a covariance function $c_X(\mathbf{s},\mathbf{s}') = c_X(\|\mathbf{s}-\mathbf{s}'\|) = c_X(r) = 20\ \mathrm{ppm}^2\, \exp(-3r/15\ \mathrm{km})$, where r is the distance between points s and s' expressed in km.
An infant resides in a house at a distance of 2 km from a point in the lake where the concentration is measured to be 50 ppm. What are the simple kriging estimate and associated estimation error variance at the house where the infant resides?
4. Ordinary kriging
In ordinary kriging (OK) we still suppose that the random variable Xk and the random vector Xh=[
Xh1, …, Xhn]T are correlated with known covariance, and we still seek the best linear unbiased
estimate x̂ k for Xk given measured values for Xh.
The difference with simple kriging is that in ordinary kriging we assume that the expected value of
the random field is unknown and constant, i.e. the ordinary kriging model assumption is
1
 
E[Xk]=m and E[Xh]=1 m , where m is the unknown mean, and 1=  ...  is a n by 1 column vector of
1
 
1s.
We then apply the properties that OK is linear, that it is unbiased, and that it is the best estimator in
that it minimizes the estimation error variance. These three properties lead to the following
equations:
OK is a linear estimator
We define X̂ k as a random variable equal to the linear combination of Xh, i.e.
$\hat{X}_k = \boldsymbol{\lambda}^T \mathbf{X}_h$, where $\boldsymbol{\lambda}^T = [\lambda_1, \dots, \lambda_n]$ are the kriging weights
OK is unbiased
$E[\hat{X}_k] = E[X_k]$
$\Rightarrow\ \boldsymbol{\lambda}^T \mathbf{1} - 1 = 0$, which means that the kriging weights must sum to 1.
Proof: $E[\hat{X}_k] = E[X_k] \Rightarrow E[\boldsymbol{\lambda}^T \mathbf{X}_h] = E[X_k] \Rightarrow \boldsymbol{\lambda}^T \mathbf{1}\, m = m \Rightarrow \boldsymbol{\lambda}^T \mathbf{1} = 1 \Rightarrow \boldsymbol{\lambda}^T \mathbf{1} - 1 = 0$
OK minimizes the estimation variance
We again let Ek=(Xk- X̂ k ) be the estimation error, and v̂ k=var(Ek)=E[(Xk- X̂ k )2] be the estimation
error variance given by
$$E_k = X_k - \hat{X}_k = [1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} X_k \\ \mathbf{X}_h \end{bmatrix}, \text{ and}$$
$$\hat{v}_k = \operatorname{var}(E_k) = \operatorname{var}\Big([1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} X_k \\ \mathbf{X}_h \end{bmatrix}\Big) = [1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} c_{kk} & \mathbf{c}_{kh} \\ \mathbf{c}_{hk} & \mathbf{c}_{hh} \end{bmatrix} \begin{bmatrix} 1 \\ -\boldsymbol{\lambda} \end{bmatrix} = c_{kk} + \boldsymbol{\lambda}^T \mathbf{c}_{hh} \boldsymbol{\lambda} - 2\boldsymbol{\lambda}^T \mathbf{c}_{hk}$$
(using property 2.1)
where ckk=cov(Xk, Xk)=var(Xk), chh=cov(Xh, Xh) and chk=cov(Xh, Xk) have known covariance values.
We seek the kriging weights which minimize $\hat{v}_k$ while also satisfying the constraint that $\boldsymbol{\lambda}^T \mathbf{1} - 1 = 0$. These kriging weights are obtained by minimizing the Lagrange function
$$l = \operatorname{var}(E_k) + 2\mu(\boldsymbol{\lambda}^T \mathbf{1} - 1)$$
where $\mu$ is an unknown parameter that is often called the Lagrange multiplier. See "An Introduction to Lagrange Multipliers" by Steuard Jensen for some background on Lagrange multipliers.
The kriging weights that minimize l verify the following set of n+1 equations
$$\begin{cases} \dfrac{\partial l}{\partial \boldsymbol{\lambda}} = \mathbf{0} \\[2mm] \dfrac{\partial l}{\partial \mu} = 0 \end{cases} \ \Rightarrow\ \begin{cases} -2\mathbf{c}_{hk} + 2\mathbf{c}_{hh}\boldsymbol{\lambda} + 2\mathbf{1}\mu = \mathbf{0} \\ 2(\boldsymbol{\lambda}^T \mathbf{1} - 1) = 0 \end{cases} \ \Rightarrow\ \begin{cases} \mathbf{c}_{hh}\boldsymbol{\lambda} + \mathbf{1}\mu = \mathbf{c}_{hk} \\ \mathbf{1}^T \boldsymbol{\lambda} = 1 \end{cases}$$
$$\Rightarrow\ \begin{bmatrix} \mathbf{c}_{hh} & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix} \begin{bmatrix} \boldsymbol{\lambda} \\ \mu \end{bmatrix} = \begin{bmatrix} \mathbf{c}_{hk} \\ 1 \end{bmatrix} \ \Rightarrow\ \begin{bmatrix} \boldsymbol{\lambda} \\ \mu \end{bmatrix} = \begin{bmatrix} \mathbf{c}_{hh} & \mathbf{1} \\ \mathbf{1}^T & 0 \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{c}_{hk} \\ 1 \end{bmatrix}$$
The above set of n+1 linear equations can be solved numerically to obtain the values of the ordinary kriging weights $\boldsymbol{\lambda}$, which once substituted in $\hat{x}_k = \boldsymbol{\lambda}^T \mathbf{x}_h$ and $\hat{v}_k = c_{kk} + \boldsymbol{\lambda}^T\mathbf{c}_{hh}\boldsymbol{\lambda} - 2\boldsymbol{\lambda}^T\mathbf{c}_{hk}$ provide the ordinary kriging estimated value and its associated error variance.
In the following we obtain an algebraic solution for $\boldsymbol{\lambda}$. To take the inverse of the square n+1 by n+1 partitioned matrix we recall the following property for a symmetric partitioned matrix (Searle, 1982, Matrix Algebra Useful for Statistics)
$$\begin{bmatrix} \mathbf{a} & \mathbf{b} \\ \mathbf{b}^T & \mathbf{c} \end{bmatrix}^{-1} = \begin{bmatrix} \mathbf{a}^{-1} + \mathbf{a}^{-1}\mathbf{b}\,\mathbf{w}\,\mathbf{b}^T\mathbf{a}^{-1} & -\mathbf{a}^{-1}\mathbf{b}\,\mathbf{w} \\ -\mathbf{w}\,\mathbf{b}^T\mathbf{a}^{-1} & \mathbf{w} \end{bmatrix}, \quad \text{with } \mathbf{w} = (\mathbf{c} - \mathbf{b}^T\mathbf{a}^{-1}\mathbf{b})^{-1}$$
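This identity is easy to verify numerically; the MATLAB sketch below (ours, with arbitrary test blocks) compares the block formula with a direct inverse:
% Numerical check of the symmetric partitioned-inverse identity
n=4; p=2;
A0=randn(n); a=A0*A0'+n*eye(n);          % symmetric positive definite block a
b=randn(n,p);                            % off-diagonal block b
C0=randn(p); c=C0*C0'+p*eye(p);          % symmetric block c
w=inv(c-b'*inv(a)*b);                    % w = (c - b'*inv(a)*b)^(-1)
Minv=[inv(a)+inv(a)*b*w*b'*inv(a), -inv(a)*b*w; -w*b'*inv(a), w];
maxErr=max(max(abs(Minv-inv([a b; b' c]))))   % should be ~0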
Using this property we get
    chh
    Τ
  1
1

0 
1
 chk   c hh 1  c hh 11w1Τ c hh 1
  = 
 w1Τ c hh 1
 1  
 chh 11w   chk 
1


with
w
=
 1 
w
1Τ chh 11
 
1
Τ

(
1

1
c
c hk ) 
1 
hh



c
c

1

hh  hk
 
1Τ c hh 11 


1
1
Τ
Τ
   1 c hh c hk  1 / 1 c hh 1



Substituting the expression for $\boldsymbol{\lambda}$ in the equation for $\hat{X}_k$ we obtain the OK estimate
$$\hat{X}_k = \boldsymbol{\lambda}^T \mathbf{X}_h = \Big(\mathbf{c}_{kh}\mathbf{c}_{hh}^{-1} + \frac{1 - \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}\mathbf{1}}{\mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{1}}\,\mathbf{1}^T\mathbf{c}_{hh}^{-1}\Big)\mathbf{X}_h$$
$$\Rightarrow\ \hat{X}_k = \hat{M} + \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}(\mathbf{X}_h - \mathbf{1}\hat{M}), \quad \text{with } \hat{M} = \mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{X}_h \,/\, \mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{1}$$
(where we used the symmetry $\mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk} = \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}\mathbf{1}$)
Furthermore note that from the OK equations we have $\mathbf{c}_{hh}\boldsymbol{\lambda} = \mathbf{c}_{hk} - \mathbf{1}\mu \Rightarrow \boldsymbol{\lambda}^T\mathbf{c}_{hh}\boldsymbol{\lambda} = \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \boldsymbol{\lambda}^T\mathbf{1}\mu = \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \mu$, which when substituted in $\hat{v}_k$ gives
$$\hat{v}_k = c_{kk} + \boldsymbol{\lambda}^T\mathbf{c}_{hh}\boldsymbol{\lambda} - 2\boldsymbol{\lambda}^T\mathbf{c}_{hk} = c_{kk} + \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \mu - 2\boldsymbol{\lambda}^T\mathbf{c}_{hk} = c_{kk} - \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \mu$$
Hence, using $\hat{x}_k$, $\mathbf{x}_h$ and $\hat{m}$ in place of $\hat{X}_k$, $\mathbf{X}_h$ and $\hat{M}$, respectively, we have
the Ordinary Kriging (OK) estimator:
$$\hat{x}_k = \hat{m} + \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}(\mathbf{x}_h - \mathbf{1}\hat{m})$$
$$\hat{v}_k = c_{kk} - \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \mu$$
where
$$\hat{m} = \mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{x}_h \,/\, \mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{1}$$
$$\boldsymbol{\lambda} = \mathbf{c}_{hh}^{-1}\Big(\mathbf{c}_{hk} + \mathbf{1}\,\frac{1 - \mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk}}{\mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{1}}\Big)$$
$$\mu = \big(\mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk} - 1\big)\big/\big(\mathbf{1}^T\mathbf{c}_{hh}^{-1}\mathbf{1}\big)$$
We note that the OK estimate is nothing but the SK estimate, with the constant mean set to its Generalized Least Squares (GLS) estimate $\hat{m}$.
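For Exercise 4 below, a minimal MATLAB sketch of an OK function (our illustration, written in the style of the SK subfunction of example 3.4) can solve the n+1 kriging system directly:
function [xk,vk]=OK(xh,Ckk,Ckh,Chh)
% Ordinary kriging: solve [Chh 1; 1' 0]*[lambda; mu] = [Chk; 1]
n=length(xh);
A=[Chh ones(n,1); ones(1,n) 0];
b=[Ckh'; 1];                      % Chk = Ckh'
lm=A\b;                           % n weights and the Lagrange multiplier
lambda=lm(1:n); mu=lm(n+1);
xk=lambda'*xh;                    % OK estimate
vk=Ckk-lambda'*Ckh'-mu;           % OK estimation error variance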
Exercise 4
Do exercises 3.1 to 3.4 replacing the SK estimator with the OK estimator
5. Universal kriging
In universal kriging (UK) we once more suppose that the random variable Xk and the random vector
Xh=[ Xh1, …, Xhn]T are correlated with known covariance, and we still seek the best linear unbiased
estimate X̂ k for Xk given measured values xh for the random vector Xh.
The difference with simple kriging or ordinary kriging is that in universal kriging we assume that the expected value of the random field is unknown and varies linearly with respect to p known variables at each data and estimation point, i.e. the universal kriging model assumption is
$E[\mathbf{X}_h] = \mathbf{g}\boldsymbol{\beta}$ and $E[X_k] = \mathbf{g}_0\boldsymbol{\beta}$, where $\boldsymbol{\beta} = [\beta_1\ \dots\ \beta_p]^T$ are p unknown linear coefficients for p known predictor variables, $\mathbf{g}$ is a n by p design matrix whose element $g_{ij}$ provides the j-th known predictor variable at the i-th data point, and $\mathbf{g}_0$ is a 1 by p row vector whose element $g_{0j}$ provides the j-th known predictor variable at the estimation point.
For example, if $X_k$ represents a TRF at time $t_k$, and if $\mathbf{X}_h$ represents a TRF at times $\mathbf{t}_h = [t_1\ t_2\ \dots\ t_n]$, then time raised to any power is a known variable that can be used as a predictor at each data point and at the estimation point. If we use time raised to the powers 0, 1 and 2, we create 3 known predictor variables at each data and estimation point. In that case we have p=3, and the linear coefficients $\boldsymbol{\beta}$ and the design matrices $\mathbf{g}$ and $\mathbf{g}_0$ are given by
$$\boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}, \quad \mathbf{g} = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_n & t_n^2 \end{bmatrix} \quad \text{and} \quad \mathbf{g}_0 = [1\ \ t_k\ \ t_k^2],$$
which models the mean trend as a quadratic drift in time. Note that ordinary kriging is the special case of universal kriging with
$$p=1, \quad \boldsymbol{\beta} = m, \quad \mathbf{g} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} \quad \text{and} \quad \mathbf{g}_0 = 1.$$
We then apply the properties that UK is linear, that it is unbiased, and that it is the best estimator in
that it minimizes the estimation error variance. These three properties lead to the following
equations:
UK is a linear estimator
We define X̂ k as a random variable equal to the linear combination of Xh, i.e.
$\hat{X}_k = \boldsymbol{\lambda}^T \mathbf{X}_h$, where $\boldsymbol{\lambda}^T = [\lambda_1, \dots, \lambda_n]$ are the kriging weights
UK is unbiased
$E[\hat{X}_k] = E[X_k]$
$\Rightarrow\ \boldsymbol{\lambda}^T \mathbf{g} - \mathbf{g}_0 = \mathbf{0}$, which is a set of p equations: $\boldsymbol{\lambda}^T \mathbf{g}_j - g_{0j} = 0$ for j=1,…,p, where $\mathbf{g}_j$ denotes the j-th column of $\mathbf{g}$.
Proof: $E[\hat{X}_k] = E[X_k] \Rightarrow E[\boldsymbol{\lambda}^T \mathbf{X}_h] = E[X_k] \Rightarrow \boldsymbol{\lambda}^T \mathbf{g}\boldsymbol{\beta} = \mathbf{g}_0\boldsymbol{\beta} \Rightarrow \sum_{j=1}^p \boldsymbol{\lambda}^T \mathbf{g}_j\, \beta_j = \sum_{j=1}^p g_{0j}\, \beta_j$; since this must hold for whatever values the unknown coefficients $\beta_j$ take, it follows that $\boldsymbol{\lambda}^T \mathbf{g}_j = g_{0j}$ for j=1,…,p.
UK minimizes the estimation variance
We again let Ek=(Xk- X̂ k ) be the estimation error, and v̂ k=var(Ek)=E[(Xk- X̂ k )2] be the estimation
error variance given by
$$E_k = X_k - \hat{X}_k = [1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} X_k \\ \mathbf{X}_h \end{bmatrix}, \text{ and}$$
$$\hat{v}_k = \operatorname{var}(E_k) = \operatorname{var}\Big([1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} X_k \\ \mathbf{X}_h \end{bmatrix}\Big) = [1\ \ -\boldsymbol{\lambda}^T] \begin{bmatrix} c_{kk} & \mathbf{c}_{kh} \\ \mathbf{c}_{hk} & \mathbf{c}_{hh} \end{bmatrix} \begin{bmatrix} 1 \\ -\boldsymbol{\lambda} \end{bmatrix} = c_{kk} + \boldsymbol{\lambda}^T \mathbf{c}_{hh} \boldsymbol{\lambda} - 2\boldsymbol{\lambda}^T \mathbf{c}_{hk}$$
(using property 2.1)
where ckk=cov(Xk, Xk)=var(Xk), chh=cov(Xh, Xh) and chk=cov(Xh, Xk) have known covariance values.
We seek the kriging weights which minimize $\hat{v}_k$ while also satisfying the constraint that $\boldsymbol{\lambda}^T \mathbf{g} - \mathbf{g}_0 = \mathbf{0}$. These kriging weights are obtained by minimizing the Lagrange function
$$l = \operatorname{var}(E_k) + 2\boldsymbol{\mu}^T(\mathbf{g}^T\boldsymbol{\lambda} - \mathbf{g}_0^T)$$
where $\boldsymbol{\mu}^T$ is a 1 by p row vector of p unknown Lagrange multipliers.
The kriging weights that minimize l verify the following set of n+p equations
$$\begin{cases} \dfrac{\partial l}{\partial \boldsymbol{\lambda}} = \mathbf{0} \\[2mm] \dfrac{\partial l}{\partial \boldsymbol{\mu}} = \mathbf{0} \end{cases} \ \Rightarrow\ \begin{cases} -2\mathbf{c}_{hk} + 2\mathbf{c}_{hh}\boldsymbol{\lambda} + 2\mathbf{g}\boldsymbol{\mu} = \mathbf{0} \\ 2(\mathbf{g}^T\boldsymbol{\lambda} - \mathbf{g}_0^T) = \mathbf{0} \end{cases} \ \Rightarrow\ \begin{cases} \mathbf{c}_{hh}\boldsymbol{\lambda} + \mathbf{g}\boldsymbol{\mu} = \mathbf{c}_{hk} \\ \mathbf{g}^T\boldsymbol{\lambda} = \mathbf{g}_0^T \end{cases}$$
$$\Rightarrow\ \begin{bmatrix} \mathbf{c}_{hh} & \mathbf{g} \\ \mathbf{g}^T & \mathbf{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{\lambda} \\ \boldsymbol{\mu} \end{bmatrix} = \begin{bmatrix} \mathbf{c}_{hk} \\ \mathbf{g}_0^T \end{bmatrix} \ \Rightarrow\ \begin{bmatrix} \boldsymbol{\lambda} \\ \boldsymbol{\mu} \end{bmatrix} = \begin{bmatrix} \mathbf{c}_{hh} & \mathbf{g} \\ \mathbf{g}^T & \mathbf{0} \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{c}_{hk} \\ \mathbf{g}_0^T \end{bmatrix}$$
The above set of n+p linear equations can be solved numerically to obtain the values of the universal kriging weights $\boldsymbol{\lambda}$, which once substituted in $\hat{x}_k = \boldsymbol{\lambda}^T \mathbf{x}_h$ and $\hat{v}_k = c_{kk} + \boldsymbol{\lambda}^T\mathbf{c}_{hh}\boldsymbol{\lambda} - 2\boldsymbol{\lambda}^T\mathbf{c}_{hk}$ provide the universal kriging estimated value and its associated error variance.
In the following we obtain an algebraic solution for $\boldsymbol{\lambda}$. To take the inverse of the square n+p by n+p partitioned matrix we use again the property of symmetric partitioned matrices to obtain
$$\begin{bmatrix} \boldsymbol{\lambda} \\ \boldsymbol{\mu} \end{bmatrix} = \begin{bmatrix} \mathbf{c}_{hh} & \mathbf{g} \\ \mathbf{g}^T & \mathbf{0} \end{bmatrix}^{-1}\begin{bmatrix} \mathbf{c}_{hk} \\ \mathbf{g}_0^T \end{bmatrix} = \begin{bmatrix} \mathbf{c}_{hh}^{-1} + \mathbf{c}_{hh}^{-1}\mathbf{g}\,\mathbf{w}\,\mathbf{g}^T\mathbf{c}_{hh}^{-1} & -\mathbf{c}_{hh}^{-1}\mathbf{g}\,\mathbf{w} \\ -\mathbf{w}\,\mathbf{g}^T\mathbf{c}_{hh}^{-1} & \mathbf{w} \end{bmatrix}\begin{bmatrix} \mathbf{c}_{hk} \\ \mathbf{g}_0^T \end{bmatrix}$$
$$\Rightarrow\ \boldsymbol{\lambda} = \mathbf{c}_{hh}^{-1}\big(\mathbf{c}_{hk} + \mathbf{g}\,\mathbf{w}\,(\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk} - \mathbf{g}_0^T)\big), \qquad \boldsymbol{\mu} = -\mathbf{w}\,(\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk} - \mathbf{g}_0^T)$$
with $\mathbf{w} = -(\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{g})^{-1}$
Substituting the expression for $\boldsymbol{\lambda}$ in the equation for $\hat{X}_k$, and using $\hat{x}_k$ and $\mathbf{x}_h$ in place of $\hat{X}_k$ and $\mathbf{X}_h$, respectively, we obtain the UK estimate
$$\hat{x}_k = \boldsymbol{\lambda}^T\mathbf{x}_h = \big(\mathbf{c}_{kh}\mathbf{c}_{hh}^{-1} + \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}\mathbf{g}\,\mathbf{w}\,\mathbf{g}^T\mathbf{c}_{hh}^{-1} - \mathbf{g}_0\,\mathbf{w}\,\mathbf{g}^T\mathbf{c}_{hh}^{-1}\big)\,\mathbf{x}_h = \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}(\mathbf{x}_h - \mathbf{g}\hat{\boldsymbol{\beta}}) + \mathbf{g}_0\hat{\boldsymbol{\beta}},$$
$$\text{with } \hat{\boldsymbol{\beta}} = -\mathbf{w}\,\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{x}_h = (\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{g})^{-1}\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{x}_h$$
$$\Rightarrow\ \hat{x}_k = \mathbf{g}_0\hat{\boldsymbol{\beta}} + \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}(\mathbf{x}_h - \mathbf{g}\hat{\boldsymbol{\beta}}), \quad \text{with } \hat{\boldsymbol{\beta}} = (\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{g})^{-1}\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{x}_h$$
Furthermore note that from the UK equations we have $\mathbf{c}_{hh}\boldsymbol{\lambda} = \mathbf{c}_{hk} - \mathbf{g}\boldsymbol{\mu} \Rightarrow \boldsymbol{\lambda}^T\mathbf{c}_{hh}\boldsymbol{\lambda} = \boldsymbol{\lambda}^T\mathbf{c}_{hk} - (\boldsymbol{\lambda}^T\mathbf{g})\boldsymbol{\mu} = \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \mathbf{g}_0\boldsymbol{\mu}$, which when substituted in $\hat{v}_k$ gives
$$\hat{v}_k = c_{kk} + \boldsymbol{\lambda}^T\mathbf{c}_{hh}\boldsymbol{\lambda} - 2\boldsymbol{\lambda}^T\mathbf{c}_{hk} = c_{kk} + \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \mathbf{g}_0\boldsymbol{\mu} - 2\boldsymbol{\lambda}^T\mathbf{c}_{hk} = c_{kk} - \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \mathbf{g}_0\boldsymbol{\mu}$$
Hence, by way of summary, we have
the Universal Kriging (UK) estimator:
$$\hat{x}_k = \mathbf{g}_0\hat{\boldsymbol{\beta}} + \mathbf{c}_{kh}\mathbf{c}_{hh}^{-1}(\mathbf{x}_h - \mathbf{g}\hat{\boldsymbol{\beta}})$$
$$\hat{v}_k = c_{kk} - \boldsymbol{\lambda}^T\mathbf{c}_{hk} - \mathbf{g}_0\boldsymbol{\mu}$$
where
$$\hat{\boldsymbol{\beta}} = (\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{g})^{-1}\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{x}_h$$
$$\mathbf{w} = -(\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{g})^{-1}$$
$$\boldsymbol{\lambda} = \mathbf{c}_{hh}^{-1}\big(\mathbf{c}_{hk} + \mathbf{g}\,\mathbf{w}\,(\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk} - \mathbf{g}_0^T)\big)$$
$$\boldsymbol{\mu} = -\mathbf{w}\,(\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{c}_{hk} - \mathbf{g}_0^T)$$
We note that the UK estimate is nothing but the SK estimate, with the mean trend set to its Generalized Least Squares (GLS) estimate, i.e. $\mathbf{m}_h = \mathbf{g}\hat{\boldsymbol{\beta}}$ and $m_k = \mathbf{g}_0\hat{\boldsymbol{\beta}}$, with $\hat{\boldsymbol{\beta}} = (\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{g})^{-1}\mathbf{g}^T\mathbf{c}_{hh}^{-1}\mathbf{x}_h$.
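For Exercise 5 below, a minimal MATLAB sketch of a UK function (our illustration, solving the n+p kriging system directly) might look as follows:
function [xk,vk]=UK(xh,Ckk,Ckh,Chh,g,g0)
% Universal kriging: solve [Chh g; g' 0]*[lambda; mu] = [Chk; g0']
[n,p]=size(g);
A=[Chh g; g' zeros(p)];
b=[Ckh'; g0'];                    % Chk = Ckh'
lm=A\b;                           % n weights and p Lagrange multipliers
lambda=lm(1:n); mu=lm(n+1:n+p);
xk=lambda'*xh;                    % UK estimate
vk=Ckk-lambda'*Ckh'-g0*mu;        % UK estimation error variance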
Exercise 5
Do exercise 3.3 again, replacing the SK estimator with the UK estimator having a linear drift in the mean trend.