Interpolating {x,y} Data with Suavity
Peter K. Ott
Forest Analysis & Inventory Branch
BC Ministry of FLNRO
Victoria, BC
30/11/2015
The Goal
• Given a set of points $(x_i, y_i),\ i = 1, 2, \ldots, n$, find a function that passes through the points, affording the prediction of $y$ at new values of $x$
• Regression or smoothing is a related but different problem: the fitted function need not pass through every point
Outline
• Example Data
• Linear Interpolation
• Thin Plate Splines (TPS)
• Ordinary Kriging (OK)
  • Two implementations
• Conclusion
data fake;
input x y;
cards;
0 41
1 26
2 19
3 18
4 18.5
5 17.5
6 18
7 18.5
8 19
;
run;
ods html;
ods graphics on;
proc sgplot data=fake noautolegend;
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
title 'Y versus X';
run;
ods graphics off;
ods html close;
Our Data
Linear Interpolation (~200 yrs BC)
• Form a straight line between pairs of known points
• What is $y_0 \mid x_0$, where $x_0$ lies between $x_{-1}$ and $x_{+1}$?
• The slope must be constant, so
$$\frac{y_0 - y_{-1}}{x_0 - x_{-1}} = \frac{y_{+1} - y_0}{x_{+1} - x_0}$$
• Solve for $y_0$:
$$y_0 = \frac{y_{-1}(x_{+1} - x_0) + y_{+1}(x_0 - x_{-1})}{x_{+1} - x_{-1}}$$
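As a quick check of the formula, here is a minimal Python sketch (standard library only, not the SAS workflow used in these slides) that applies it to the example data. The function name `lin_interp` is hypothetical, and the sketch assumes the x values are sorted and that x0 lies within their range.

```python
# Linear interpolation using the slide's formula (a Python sketch, not SAS).
from bisect import bisect_right

XS = [0, 1, 2, 3, 4, 5, 6, 7, 8]
YS = [41, 26, 19, 18, 18.5, 17.5, 18, 18.5, 19]


def lin_interp(x0, xs, ys):
    """Return y0 at x0 from the straight line joining its two neighbours.

    xs must be sorted in ascending order and x0 must lie in [xs[0], xs[-1]].
    """
    if not xs[0] <= x0 <= xs[-1]:
        raise ValueError("x0 lies outside the range of the data")
    # index of the right-hand neighbour x_{+1}; clamp so that x0 == xs[-1] works
    k = min(bisect_right(xs, x0), len(xs) - 1)
    x_lo, x_hi = xs[k - 1], xs[k]  # x_{-1}, x_{+1}
    y_lo, y_hi = ys[k - 1], ys[k]  # y_{-1}, y_{+1}
    return (y_lo * (x_hi - x0) + y_hi * (x0 - x_lo)) / (x_hi - x_lo)


if __name__ == "__main__":
    for x0 in (0.5, 2.5, 3.3):
        print(x0, lin_interp(x0, XS, YS))
```

For x0 = 0.5 the neighbours are (0, 41) and (1, 26), so the formula gives (41 × 0.5 + 26 × 0.5) / 1 = 33.5.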
data interp0; *denser range of x to be interpolated;
do x=0 to 8 by 0.1;
output;
end;
run;
proc sql;
create table lin_pred(drop=x0) as
select *
from interp0 left join fake(rename=(x=x0))
on put(interp0.x, 6.3) = put(fake.x0, 6.3)
;
quit;
proc print data=lin_pred(obs=34) noobs;
run;
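The slides do not say why the join compares `put(x, 6.3)` values rather than x itself. A likely reason, sketched here in Python under the assumption that the grid is built by repeatedly adding 0.1, is that accumulated binary floating-point error makes some grid values miss the integer data values under exact equality, while values formatted to three decimals still match. The variable names are illustrative only.

```python
# Joining a 0.1-step grid to integer-valued data: exact float equality
# versus matching on values formatted to 3 decimals (like put(x, 6.3)).
# ASSUMPTION: the grid is built by repeatedly adding 0.1.
data_x = [0, 1, 2, 3, 4, 5, 6, 7, 8]

grid = []
x = 0.0
while x <= 8.0 + 1e-9:  # small tolerance so the end point is kept
    grid.append(x)
    x += 0.1

# exact equality: misses grid points that drifted away from the integers
exact = [g for g in grid if g in data_x]

# formatted comparison: both sides rounded to 3 decimals before matching
keys = {format(v, "6.3f") for v in data_x}
formatted = [g for g in grid if format(g, "6.3f") in keys]

print(len(grid), len(exact), len(formatted))
print(grid[10])  # ten additions of 0.1 give 0.9999999999999999, not 1.0
```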
x     y
0.0   41
0.1   .
0.2   .
0.3   .
0.4   .
0.5   .
0.6   .
0.7   .
0.8   .
0.9   .
1.0   26
1.1   .
1.2   .
1.3   .
1.4   .
1.5   .
1.6   .
1.7   .
1.8   .
1.9   .
2.0   19
2.1   .
2.2   .
2.3   .
2.4   .
2.5   .
2.6   .
2.7   .
2.8   .
2.9   .
3.0   18
3.1   .
3.2   .
3.3   .
proc expand data=lin_pred(keep=x y) out=lin_interp;
convert y=linear / method=join;
id x; *data must be sorted by x;
run;
ods html;
ods graphics on;
proc sgplot data=lin_interp noautolegend;
series y=linear x=x /
lineattrs=GraphPrediction(pattern=2 thickness=1pt color=red);
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
title 'Linear interpolated values';
run;
ods graphics off;
ods html close;
Linear Interpolation
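Thin Plate Splines (1970s)
• Want a function $f$ that minimizes:
$$L = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2 + \lambda \int f''(x)^2 \, dx$$
• or, more generally,
$$L = \sum_{i=1}^{n} \left( y_i - f(\mathbf{x}_i) \right)^2 + \lambda J_m(f)$$
• where, for $m = 2$ (and two predictors $x_1, x_2$),
$$J_2(f) = \iint \left[ \left( \frac{\partial^2 f}{\partial x_1^2} \right)^2 + 2 \left( \frac{\partial^2 f}{\partial x_1 \partial x_2} \right)^2 + \left( \frac{\partial^2 f}{\partial x_2^2} \right)^2 \right] dx_1 \, dx_2$$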
where $\lambda \ge 0$ is a smoothing parameter that controls the wiggliness; $\lambda = 0$ forces $f$ through every data point
• The solution to this problem is a function built from radial basis functions centred at the data points, so (with $\lambda = 0$) it passes through the data without any knots having to be chosen
• One-dimensional example:
$$y_0 \mid x_0 = \alpha_0 + \alpha_1 x_0 + \frac{1}{12} \sum_{i=1}^{n} \beta_i \, |x_0 - x_i|^3$$
• Two-dimensional example:
$$y_0 \mid x_{01}, x_{02} = \alpha_0 + \alpha_1 x_{01} + \alpha_2 x_{02} + \frac{1}{8\pi} \sum_{i=1}^{n} \beta_i \, z_i^2 \log z_i$$
• where
$$z_i = \sqrt{(x_{01} - x_{1i})^2 + (x_{02} - x_{2i})^2}$$
proc tpspline data=fake;
model y = (x) / lambda0=1e-15;
*a lambda0 of (essentially) zero is necessary for interpolation;
score data=interp0 out=tps_pred pred;
*this will yield the interpolated points and more;
output out=tps_coef pred coef;
run;
proc print data=tps_coef noobs;
run;
*the coefficients are alpha[0], alpha[1], beta[1], ..., beta[n], with the
betas aligned with the sorted (unique) x[i];
*note also that sum(beta[i])=0 and sum(beta[i]*x[i])=0;
x  y     P_y      Coef_y
0  41.0  41.0000   27.8513
1  26.0  26.0000   -8.1071
2  19.0  19.0000   10.5088
3  18.0  18.0000  -15.0530
4  18.5  18.5000    0.2121
5  17.5  17.5000   -0.7953
6  18.0  18.0000   11.9690
7  18.5  18.5000  -11.0810
8  19.0  19.0000    5.3549
.  .     .         -1.3387
.  .     .          0.2231
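To connect the printed coefficients with the one-dimensional TPS formula, the following Python sketch (standard library only; `solve`, `tps_fit_1d` and `tps_predict_1d` are hypothetical helper names) solves for α0, α1 and the βs with λ = 0, imposing the side conditions sum(β_i) = 0 and sum(β_i x_i) = 0 noted in the SAS comments. Under these assumptions the solution should agree with the Coef_y column to about four decimals (α0 ≈ 27.8513, α1 ≈ -8.1071, then β1, ..., β9).

```python
# One-dimensional interpolating TPS (lambda = 0):
#   y0 = a0 + a1*x0 + (1/12) * sum_i b_i * |x0 - x_i|**3
# subject to sum(b_i) = 0 and sum(b_i * x_i) = 0.
XS = [0, 1, 2, 3, 4, 5, 6, 7, 8]
YS = [41, 26, 19, 18, 18.5, 17.5, 18, 18.5, 19]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def tps_fit_1d(xs, ys):
    """Return (a0, a1, betas) for the interpolating cubic TPS."""
    n = len(xs)
    a = [[0.0] * (n + 2) for _ in range(n + 2)]
    for i in range(n):
        for j in range(n):
            a[i][j] = abs(xs[i] - xs[j]) ** 3 / 12.0  # radial basis part
        a[i][n], a[i][n + 1] = 1.0, float(xs[i])      # linear part a0 + a1*x
        a[n][i], a[n + 1][i] = 1.0, float(xs[i])      # side conditions on betas
    sol = solve(a, [float(y) for y in ys] + [0.0, 0.0])
    return sol[n], sol[n + 1], sol[:n]


def tps_predict_1d(x0, xs, a0, a1, betas):
    """Evaluate the fitted TPS at x0."""
    rbf = sum(b * abs(x0 - xi) ** 3 for b, xi in zip(betas, xs)) / 12.0
    return a0 + a1 * x0 + rbf


if __name__ == "__main__":
    a0, a1, betas = tps_fit_1d(XS, YS)
    print(round(a0, 4), round(a1, 4), [round(b, 4) for b in betas])
    print(tps_predict_1d(2.5, XS, a0, a1, betas))
```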
proc sql;
create table tps_pred2(drop=x0) as
select *
from tps_pred left join fake(rename=(x=x0))
on put(tps_pred.x, 6.3) = put(fake.x0, 6.3)
;
quit;
ods html;
ods graphics on;
proc sgplot data=tps_pred2 noautolegend;
series y=p_y x=x /
lineattrs=GraphPrediction(pattern=2 thickness=1pt color=red);
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
title 'Interpolated values';
run;
ods graphics off;
ods html close;
Thin Plate Spline
Ordinary Kriging (1960s)
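• Consider $y_i \mid x_i$ as a multivariate Gaussian process:
$$y_i \mid x_i = \mathbf{y} \sim N_n(\mu \mathbf{1}, \boldsymbol{\Sigma})$$
• Find the estimator $\hat{y}_0 \mid x_0 = \sum_{i=1}^{n} w_i y_i = \mathbf{w}'\mathbf{y}$ such that:
  $E(\hat{y}_0 \mid x_0) = \mu$ (unbiased), and
  the prediction error variance, $\mathrm{Var}(\hat{y}_0 - y_0)$, is minimized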
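• It turns out:
$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}\mathbf{c}_0 - \boldsymbol{\Sigma}^{-1}\mathbf{1}\left(\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}\right)^{-1}\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{c}_0 + \boldsymbol{\Sigma}^{-1}\mathbf{1}\left(\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}\right)^{-1} \quad \text{(ugly)}$$
$$\hat{y}_0 \mid x_0 = \hat{\mu} + \mathbf{c}_0'\boldsymbol{\Sigma}^{-1}\left(\mathbf{y} - \hat{\mu}\mathbf{1}\right) \quad \text{(better)}$$
• where
$$\hat{\mu} = \left(\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}\right)^{-1}\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{y} \quad \text{and} \quad \mathbf{c}_0 = \begin{pmatrix} \mathrm{Cov}(y_1, y_0) \\ \mathrm{Cov}(y_2, y_0) \\ \vdots \\ \mathrm{Cov}(y_n, y_0) \end{pmatrix}$$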
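A minimal Python sketch of the two forms of the kriging predictor, applied to the example data. The covariance model is an assumption made only for illustration: an exponential covariance C(h) = 50·exp(-|h|/2) with no nugget, which is not the model fitted later in these slides. The variance line uses the standard ordinary-kriging expression σ² - c0'Σ⁻¹c0 + (1 - 1'Σ⁻¹c0)² / (1'Σ⁻¹1), which does not appear on the slide. All helper names are hypothetical.

```python
# Ordinary kriging in 1-D using the slide's formulas (standard library only).
# ASSUMPTION: exponential covariance C(h) = SIGMA2 * exp(-|h| / A) with
# hypothetical SIGMA2 = 50 and A = 2 and no nugget; not fitted to the data.
import math

XS = [0, 1, 2, 3, 4, 5, 6, 7, 8]
YS = [41, 26, 19, 18, 18.5, 17.5, 18, 18.5, 19]
SIGMA2, A = 50.0, 2.0


def cov(h):
    """Hypothetical covariance model as a function of distance h."""
    return SIGMA2 * math.exp(-abs(h) / A)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ok_predict(x0, xs, ys):
    """Return (prediction, kriging variance, weights) at location x0."""
    sigma = [[cov(xi - xj) for xj in xs] for xi in xs]  # Sigma
    c0 = [cov(xi - x0) for xi in xs]                     # c_0
    si_1 = solve(sigma, [1.0] * len(xs))                 # Sigma^-1 1
    si_c0 = solve(sigma, c0)                             # Sigma^-1 c_0
    si_y = solve(sigma, [float(y) for y in ys])          # Sigma^-1 y
    q = sum(si_1)                                        # 1' Sigma^-1 1
    mu_hat = sum(si_y) / q                               # GLS estimate of mu
    # "better" form: mu_hat + c0' Sigma^-1 (y - mu_hat 1); Sigma is symmetric
    pred = mu_hat + sum(s * (y - mu_hat) for s, y in zip(si_c0, ys))
    # "ugly" form: explicit weights w (they sum to one)
    lam = (1.0 - sum(si_c0)) / q
    w = [s + lam * t for s, t in zip(si_c0, si_1)]
    # standard ordinary-kriging variance, Var(y0_hat - y0)
    var = SIGMA2 - sum(c * s for c, s in zip(c0, si_c0)) + (1.0 - sum(si_c0)) ** 2 / q
    return pred, var, w


if __name__ == "__main__":
    for x0 in (3, 0.5, 2.5):
        pred, var, w = ok_predict(x0, XS, YS)
        print(x0, round(pred, 4), round(var, 4), round(sum(w), 6))
```

At a data location such as x0 = 3 the weights collapse onto that observation, so the prediction equals the observed 18 and the kriging variance is zero: with no nugget, kriging interpolates exactly. Between data points (e.g. x0 = 0.5) the variance is positive.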
• How do we determine $\mathbf{c}_0$? We'll need to model the covariance structure as a function of distance, say $h$
• Tradition is to use semivariances (semivariogram) instead of covariances (covariogram) or correlations (correlogram):
$$\gamma_{ij} = \sigma^2 - \sigma_{ij} = \sigma^2 (1 - \rho_{ij})$$
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ y(x_i + h) - y(x_i) \right]^2$$
where $N(h)$ is the number of pairs of points separated by distance $h$
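The estimator γ(h) can be checked by hand against the PROC VARIOGRAM output for these data. In this Python sketch (the function name is hypothetical), each pair of points contributes to the lag equal to its exact separation; that shortcut works here only because the x values are integers one unit apart, whereas PROC VARIOGRAM groups pairs into lag classes of width LAGDIST.

```python
# Empirical semivariogram for 1-D data with exactly spaced points:
#   gamma(h) = 1 / (2 N(h)) * sum over the N(h) pairs of (y(x_i + h) - y(x_i))**2
from collections import defaultdict

XS = [0, 1, 2, 3, 4, 5, 6, 7, 8]
YS = [41, 26, 19, 18, 18.5, 17.5, 18, 18.5, 19]


def empirical_semivariogram(xs, ys):
    """Return {h: (N(h), gamma(h))}, grouping pairs by exact distance h."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(xs[j] - xs[i])
            sums[h] += (ys[j] - ys[i]) ** 2
            counts[h] += 1
    return {h: (counts[h], sums[h] / (2 * counts[h])) for h in sorted(counts)}


if __name__ == "__main__":
    for h, (npairs, gamma) in empirical_semivariogram(XS, YS).items():
        print(h, npairs, round(gamma, 3))
```

For h = 1 the eight squared differences sum to 277, so γ(1) = 277 / 16 = 17.3125, matching the 17.313 reported by PROC VARIOGRAM for these data.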
Semivariogram
Features of the (Semi)variogram
• Nugget: discontinuity at the origin. Can't have one for interpolation with kriging!
• Range: distance at which the variogram levels off (reaches its asymptote)
• Sill: value of the variogram at the asymptote ($= \sigma^2 = \mathrm{var}(y_0)$). When a nugget is present, sill = partial sill + nugget
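To make the three features concrete, here is a Python sketch of a spherical semivariogram model. The spherical form and the parameter values are illustrative assumptions (PROC VARIOGRAM chooses among several forms); the point is that γ(0) = 0 while γ(h) jumps to the nugget for any h > 0, and that the curve reaches the sill, nugget + partial sill, exactly at the range. The function name is hypothetical.

```python
# Spherical semivariogram model illustrating nugget, range and sill.
# ASSUMPTION: the spherical form and the values used below are illustrative
# only; they were not fitted to the example data.


def spherical(h, nugget, psill, rng):
    """Spherical semivariogram: 0 at h = 0, nugget + psill for h >= rng."""
    h = abs(h)
    if h == 0:
        return 0.0                 # gamma(0) = 0 by definition
    if h >= rng:
        return nugget + psill      # the sill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)


if __name__ == "__main__":
    # with a nugget of 5 the curve jumps from 0 to about 5 just off the origin
    print([round(spherical(h, 5.0, 45.0, 4.0), 3) for h in (0, 1e-9, 2, 4, 6)])
    # with no nugget (needed for kriging interpolation) it starts continuously at 0
    print([round(spherical(h, 0.0, 45.0, 4.0), 3) for h in (0, 1e-9, 2, 4, 6)])
```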
Ordinary Kriging
Implementation: two options
1. Use both proc variogram and proc krige2d
• need to create a second coordinate variable (x2) with a constant value, because both procedures expect two-dimensional coordinates
2. Use a mixed model procedure (proc mixed)
• does not provide empirical and fitted variograms automatically
data fake2;
set fake;
x2=1; *constant value;
run;
ods html;
ods graphics on;
proc variogram data=fake2 outvar=look;
store out=semivar_store;
directions 90(0); *not really needed;
compute lagdist=1 maxlag=10;
*lagdist should be ~ 2*min norm and maxlag should be ~ max norm among xs;
coordinates xc=x yc=x2;
var y;
model nugget=0 form=auto(mlist=(gau,pow,she) nest=2) choose=(AIC SSE
STATUS);
*important that nugget is zero for interpolation;
run;
proc krige2d data=fake2 outest=kr_pred(rename=(gxc=x estimate=y_est));
restore in=semivar_store;
coordinates xc=x yc=x2;
predict var=y;
model storeselect;
grid x=0 to 8 by 0.01 y=1 to 1 by 1;
run;
The VARIOGRAM Procedure
Dependent Variable: y
Empirical Semivariogram at Angle=90

Lag Class  Pair Count  Average Distance  Semivariance
0          0           .                 .
1          8           1                 17.313
2          7           2                 39.339
3          6           3                 49.146
4          5           4                 58.000
5          4           5                 77.188
6          3           6                 97.542
7          2           7                 138.813
8          1           8                 242.000
9          0           .                 .
10         0           .                 .
proc sql;
create table kr_pred2(drop=x0) as
select *, (y_est+1.96*stderr) as cl_upp, (y_est-1.96*stderr) as cl_low
from kr_pred(keep=x y_est stderr) left join fake2(keep=x y rename=(x=x0))
on put(kr_pred.x, 6.3) = put(fake2.x0, 6.3)
;
quit;
proc sgplot data=kr_pred2 noautolegend;
series y=y_est x=x /
lineattrs=graphprediction(pattern=2 thickness=1pt color=red);
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
series y=cl_upp x=x /
lineattrs=graphprediction(pattern=2 thickness=1pt color=green);
series y=cl_low x=x /
lineattrs=graphprediction(pattern=2 thickness=1pt color=green);
title 'Interpolated values';
run;
ods graphics off;
ods html close;
Getting set up for proc mixed
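data fake_fmixed;
set fake end=last;
output;
*after the last observation, append a fine grid of x values with missing y;
*proc mixed will produce kriged predictions for these rows;
if last then do x=0 to 8 by 0.01;
y=.;
output;
end;
run;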
proc print data=fake_fmixed(obs=34) noobs;
run;
x     y
0.00  41.0
1.00  26.0
2.00  19.0
3.00  18.0
4.00  18.5
5.00  17.5
6.00  18.0
7.00  18.5
8.00  19.0
0.00  .
0.01  .
0.02  .
0.03  .
0.04  .
0.05  .
0.06  .
0.07  .
0.08  .
0.09  .
0.10  .
0.11  .
0.12  .
0.13  .
0.14  .
0.15  .
0.16  .
0.17  .
0.18  .
0.19  .
0.20  .
0.21  .
0.22  .
0.23  .
0.24  .
proc mixed data=fake_fmixed;
model y = / outp=ok_preds; *output predictions, including kriged values where y is missing;
repeated / subject=intercept type=sp(matern)(x);
title 'Ordinary Kriging in Proc Mixed';
run;
proc print data=ok_preds(obs=34) noobs;
run;
ods html;
ods graphics on;
proc sgplot data=ok_preds(where=(resid=.)) noautolegend;
series y=pred x=x /
lineattrs=GraphPrediction(pattern=2 thickness=1pt color=red);
series y=lower x=x /
lineattrs=graphprediction(pattern=2 thickness=1pt color=green);
series y=upper x=x /
lineattrs=graphprediction(pattern=2 thickness=1pt color=green);
title 'Kriged values via Mixed Model';
run;
ods graphics off;
ods html close;
x     y     Pred     StdErr Pred  DF  Alpha  Lower     Upper    Resid
0.00  41.0  63.5257  59.7589      8   0.05   -74.2785  201.330  -22.5257
1.00  26.0  63.5257  59.7589      8   0.05   -74.2785  201.330  -37.5257
2.00  19.0  63.5257  59.7589      8   0.05   -74.2785  201.330  -44.5257
3.00  18.0  63.5257  59.7589      8   0.05   -74.2785  201.330  -45.5257
4.00  18.5  63.5257  59.7589      8   0.05   -74.2785  201.330  -45.0257
5.00  17.5  63.5257  59.7589      8   0.05   -74.2785  201.330  -46.0257
6.00  18.0  63.5257  59.7589      8   0.05   -74.2785  201.330  -45.5257
7.00  18.5  63.5257  59.7589      8   0.05   -74.2785  201.330  -45.0257
8.00  19.0  63.5257  59.7589      8   0.05   -74.2785  201.330  -44.5257
0.00  .     41.0000  .            8   0.05   .         .        .
0.01  .     40.8198  0.0104       8   0.05   40.7957   40.844   .
0.02  .     40.6401  0.0206       8   0.05   40.5925   40.688   .
0.03  .     40.4607  0.0305       8   0.05   40.3904   40.531   .
0.04  .     40.2818  0.0401       8   0.05   40.1894   40.374   .
0.05  .     40.1034  0.0494       8   0.05   39.9895   40.217   .
0.06  .     39.9254  0.0584       8   0.05   39.7906   40.060   .
0.07  .     39.7478  0.0672       8   0.05   39.5929   39.903   .
0.08  .     39.5707  0.0757       8   0.05   39.3962   39.745   .
0.09  .     39.3941  0.0839       8   0.05   39.2006   39.588   .
0.10  .     39.2179  0.0918       8   0.05   39.0062   39.430   .
0.11  .     39.0422  0.0995       8   0.05   38.8128   39.272   .
0.12  .     38.8670  0.1069       8   0.05   38.6206   39.113   .
0.13  .     38.6923  0.1140       8   0.05   38.4295   38.955   .
0.14  .     38.5181  0.1208       8   0.05   38.2394   38.797   .
0.15  .     38.3443  0.1274       8   0.05   38.0505   38.638   .
0.16  .     38.1711  0.1337       8   0.05   37.8628   38.479   .
0.17  .     37.9984  0.1398       8   0.05   37.6761   38.321   .
0.18  .     37.8262  0.1456       8   0.05   37.4906   38.162   .
0.19  .     37.6546  0.1511       8   0.05   37.3061   38.003   .
0.20  .     37.4834  0.1564       8   0.05   37.1229   37.844   .
0.21  .     37.3128  0.1614       8   0.05   36.9407   37.685   .
0.22  .     37.1428  0.1661       8   0.05   36.7597   37.526   .
0.23  .     36.9733  0.1706       8   0.05   36.5798   37.367   .
0.24  .     36.8043  0.1749       8   0.05   36.4010   37.208   .
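The krige2d bands were built as y_est ± 1.96·stderr, whereas the Lower and Upper columns in this PROC MIXED output appear to use a t multiplier with the listed DF = 8. The Python sketch below back-calculates the multiplier (Upper - Lower) / (2·StdErr Pred) from a few printed rows; the results sit near 2.306, the 0.975 quantile of a t distribution with 8 degrees of freedom, rather than 1.96 (small departures come from rounding in the printed values). The row labels and variable names are illustrative.

```python
# Implied multiplier (Upper - Lower) / (2 * StdErr Pred) for a few rows of
# the PROC MIXED output, compared with the normal 1.96 used for krige2d.
from statistics import NormalDist

# (StdErr Pred, Lower, Upper) copied from the printed table
rows = {
    "x = 0..8 (observed)": (59.7589, -74.2785, 201.330),
    "x = 0.01": (0.0104, 40.7957, 40.844),
    "x = 0.24": (0.1749, 36.4010, 37.208),
}
multipliers = {k: (hi - lo) / (2 * se) for k, (se, lo, hi) in rows.items()}
z = NormalDist().inv_cdf(0.975)  # normal 0.975 quantile, about 1.96

for k, m in multipliers.items():
    print(f"{k}: {m:.3f} (normal: {z:.3f})")
```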
Comparison of all 4 approaches
Conclusions
• TPSs and OK are both capable of interpolation and smoothing
• TPSs require no distributional assumptions, but predictions can be overly “wiggly” when $\lambda = 0$
• OK takes a bit more effort and practice, but is powerful when a suitable model can be fitted to the empirical variogram
• Consider TPS and OK over linear interpolation!
Thanks!