Interpolation {x,y} Data with Suavity

Peter K. Ott
Forest Analysis & Inventory Branch
BC Ministry of FLNRO
Victoria, BC
[email protected]
30/11/2015
The Goal
• Given a set of points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, find a function that passes through the points, affording the prediction of $y$ at new values of $x$
• Regression or smoothing is a related but different problem
Outline
• Example Data
• Linear Interpolation
• Thin Plate Splines (TPS)
• Ordinary Kriging (OK)
  • Two implementations
• Conclusion
data fake;
input x y;
cards;
0 41
1 26
2 19
3 18
4 18.5
5 17.5
6 18
7 18.5
8 19
;
run;
ods html;
ods graphics on;
proc sgplot data=fake noautolegend;
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
title 'Y versus X';
run;
ods graphics off;
ods html close;
Our Data
Linear Interpolation (~200 yrs BC)
• Form a straight line between pairs of known points
• What is $y_0 \mid x_0$, where $x_0$ lies between $x_{-1}$ and $x_{+1}$?
• Slope must be constant, so
  $$\frac{y_0 - y_{-1}}{x_0 - x_{-1}} = \frac{y_{+1} - y_0}{x_{+1} - x_0}$$
• Solve for $y_0$:
  $$y_0 = \frac{y_{-1}\,(x_{+1} - x_0) + y_{+1}\,(x_0 - x_{-1})}{x_{+1} - x_{-1}}$$
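As a small illustration of the solved formula, the following data step is a minimal sketch that interpolates between the first two example points, (0, 41) and (1, 26). The data set name lin_manual and the variable names x_lo, y_lo, x_hi and y_hi are chosen here for illustration only.

```sas
data lin_manual;
  /* bracketing points (x_lo, y_lo) and (x_hi, y_hi): hypothetical names */
  x_lo = 0; y_lo = 41;
  x_hi = 1; y_hi = 26;
  do x0 = 0 to 1 by 0.25;
    /* weighted average of the two known y values */
    y0 = (y_lo*(x_hi - x0) + y_hi*(x0 - x_lo)) / (x_hi - x_lo);
    output;
  end;
run;

proc print data=lin_manual noobs;
  var x0 y0;
run;
```

At x0 = 0.5 the formula gives y0 = (41(0.5) + 26(0.5))/1 = 33.5, the midpoint of the two observed values.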
data interp0; *denser range of x to be interpolated;
do x=0 to 8 by 0.1;
output;
end;
run;
proc sql;
create table lin_pred(drop=x0) as
select *
from interp0 left join fake(rename=(x=x0))
on put(interp0.x, 6.3) = put(fake.x0, 6.3)
;
quit;
proc print data=lin_pred(obs=34) noobs;
run;
  x     y
0.0    41
0.1     .
0.2     .
0.3     .
0.4     .
0.5     .
0.6     .
0.7     .
0.8     .
0.9     .
1.0    26
1.1     .
1.2     .
1.3     .
1.4     .
1.5     .
1.6     .
1.7     .
1.8     .
1.9     .
2.0    19
2.1     .
2.2     .
2.3     .
2.4     .
2.5     .
2.6     .
2.7     .
2.8     .
2.9     .
3.0    18
3.1     .
3.2     .
3.3     .
proc expand data=lin_pred(keep=x y) out=lin_interp;
convert y=linear / method=join;
id x; *data must be sorted by x;
run;
ods html;
ods graphics on;
proc sgplot data=lin_interp noautolegend;
series y=linear x=x /
lineattrs=(pattern=2 thickness=1pt color=red)
lineattrs=GraphPrediction;
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
title 'Linear interpolated values';
run;
ods graphics off;
ods html close;
Linear Interpolation
Thin Plate Splines (1970s)
• Want a function to minimize:
  $$L = \sum_{i=1}^{n} \left[y_i - f(x_i)\right]^2 + \lambda \int \left[f''(x)\right]^2 dx$$
• or more generally
  $$L = \sum_{i=1}^{n} \left[y_i - f(\mathbf{x}_i)\right]^2 + \lambda \cdot J(f)$$
• where, for $d = 2$,
  $$J(f) = \iint \left[\left(\frac{\partial^2 f}{\partial x_1^2}\right)^2 + 2\left(\frac{\partial^2 f}{\partial x_1 \partial x_2}\right)^2 + \left(\frac{\partial^2 f}{\partial x_2^2}\right)^2\right] dx_1\, dx_2$$
• and $\lambda \ge 0$ is an unknown parameter that controls the wiggliness (larger $\lambda$ gives a smoother fit)
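The role of $\lambda$ can be seen from the two limiting cases of the one-dimensional criterion. This is a standard property of penalized splines, stated here as a brief aside; $a$ and $b$ simply denote an intercept and a slope.

$$
\begin{aligned}
\lambda \to 0 &: \quad f(x_i) = y_i \ \text{for all } i \quad \text{(the penalty vanishes and the fit interpolates the data)} \\
\lambda \to \infty &: \quad f''(x) = 0 \ \Rightarrow\ f(x) = a + b\,x \quad \text{(the fit tends to the least-squares straight line)}
\end{aligned}
$$

Interpolation therefore corresponds to the $\lambda \to 0$ end of this range.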
• The solution to this problem is a function built from radial basis functions; it passes through the data without requiring knots
• One-dimensional example:
  $$\hat{y}_0 \mid x_0 = \alpha_0 + \alpha_1 x_0 + \frac{1}{12}\sum_{i=1}^{n} \beta_i \left|x_0 - x_i\right|^3$$
• Two-dimensional example:
  $$\hat{y}_0 \mid (x_{01}, x_{02}) = \alpha_0 + \alpha_1 x_{01} + \alpha_2 x_{02} + \frac{1}{8\pi}\sum_{i=1}^{n} \beta_i\, z_i^2 \log z_i$$
• where
  $$z_i = \sqrt{(x_{01} - x_{1i})^2 + (x_{02} - x_{2i})^2}$$
proc tpspline data=fake;
model y =(x); */ lambda0=1e-15;
*the lambda0= option above is commented out as written. Setting lambda0 to (nearly) zero is necessary for interpolation;
score data=interp0 out=tps_pred pred;
*this will yield the interpolated points and more;
output out=tps_coef pred coef;
run;
proc print data=tps_coef noobs;
run;
*the output coefficients are alpha[0], alpha[1], beta[1], ..., beta[n], with the
betas aligned with the sorted (unique) x[i];
*note also that sum(beta[i])=0 and sum(beta[i]*x[i])=0;
x      y       P_y     Coef_y
0   41.0   41.0000    27.8513
1   26.0   26.0000    -8.1071
2   19.0   19.0000    10.5088
3   18.0   18.0000   -15.0530
4   18.5   18.5000     0.2121
5   17.5   17.5000    -0.7953
6   18.0   18.0000    11.9690
7   18.5   18.5000   -11.0810
8   19.0   19.0000     5.3549
.      .         .    -1.3387
.      .         .     0.2231
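The printed coefficients can be checked against the one-dimensional formula. The following data step is a minimal sketch that plugs the rounded Coef_y values (alpha[0], alpha[1], then beta[1] to beta[9]) into alpha[0] + alpha[1]*x0 + (1/12)*sum(beta[i]*|x0 - x[i]|**3). The data set name tps_check and the array names xk and beta are chosen here for illustration.

```sas
data tps_check;
  /* knots = the observed x values; betas copied from the Coef_y column */
  array xk[9] _temporary_ (0 1 2 3 4 5 6 7 8);
  array beta[9] _temporary_
    (10.5088 -15.0530 0.2121 -0.7953 11.9690 -11.0810 5.3549 -1.3387 0.2231);
  alpha0 = 27.8513;
  alpha1 = -8.1071;
  do x0 = 0 to 8 by 0.5;
    /* linear part plus the cubic radial basis terms */
    f = alpha0 + alpha1*x0;
    do i = 1 to 9;
      f = f + beta[i]*abs(x0 - xk[i])**3 / 12;
    end;
    output;
  end;
  keep x0 f;
run;

proc print data=tps_check noobs;
run;
```

Because the coefficients are rounded to four decimals, f at the integer values of x0 should agree with the observed y only to about two or three decimal places (for example, roughly 41 at x0 = 0 and roughly 26 at x0 = 1).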
proc sql;
create table tps_pred2(drop=x0) as
select *
from tps_pred left join fake(rename=(x=x0))
on put(tps_pred.x, 6.3) = put(fake.x0, 6.3)
;
quit;
ods html;
ods graphics on;
proc sgplot data=tps_pred2 noautolegend;
series y=p_y x=x /
lineattrs=(pattern=2 thickness=1pt color=red) lineattrs=GraphPrediction;
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
title 'Interpolated values';
run;
ods graphics off;
ods html close;
Thin Plate Spline
Ordinary Kriging (1960s)
• Consider $y_i \mid x_i$ as a multivariate Gaussian process:
  $$\{y_i \mid x_i\} = \mathbf{y} \sim N_n(\mu\mathbf{1}, \mathbf{R})$$
• Find the estimator $\hat{y}_0 \mid x_0 = \sum_{i=1}^{n} w_i y_i = \mathbf{w}'\mathbf{y}$ such that:
  $E[\hat{y}_0 \mid x_0] = \mu$ (unbiased), and
  the prediction error variance, $\mathrm{Var}(\hat{y}_0 - y_0)$, is minimized
• It turns out:
  $$\mathbf{w} = \mathbf{R}^{-1}\mathbf{c}_0 - (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\,\mathbf{R}^{-1}\mathbf{1}\mathbf{1}'\mathbf{R}^{-1}\mathbf{c}_0 + (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\,\mathbf{R}^{-1}\mathbf{1} \quad \text{(ugly)}$$
  $$\hat{y}_0 \mid x_0 = \hat{\mu} + \mathbf{c}_0'\mathbf{R}^{-1}(\mathbf{y} - \hat{\mu}\mathbf{1}) \quad \text{(better)}$$
• where
  $$\hat{\mu} = (\mathbf{1}'\mathbf{R}^{-1}\mathbf{1})^{-1}\,\mathbf{1}'\mathbf{R}^{-1}\mathbf{y} \quad \text{and} \quad \mathbf{c}_0 = \begin{bmatrix} \mathrm{Cov}(y_1, y_0) \\ \mathrm{Cov}(y_2, y_0) \\ \vdots \\ \mathrm{Cov}(y_n, y_0) \end{bmatrix}$$
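A short check of why the "better" form interpolates: suppose $x_0$ coincides with an observed location $x_k$ and the covariance model has no nugget. Then $\mathbf{c}_0$ is the $k$-th column of $\mathbf{R}$, that is $\mathbf{c}_0 = \mathbf{R}\mathbf{e}_k$, where $\mathbf{e}_k$ (notation introduced here) is the $k$-th unit vector. Using the symmetry of $\mathbf{R}$:

$$
\hat{y}_0 \mid x_k = \hat{\mu} + \mathbf{e}_k'\mathbf{R}\,\mathbf{R}^{-1}(\mathbf{y} - \hat{\mu}\mathbf{1}) = \hat{\mu} + (y_k - \hat{\mu}) = y_k
$$

so the predictor returns the observed value at every data point.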
• How do we determine $\mathbf{c}_0$? We'll need to model the covariance structure as a function of distance, say $h$
• Tradition is to use semivariances (semivariogram) instead of covariances (covariogram) or correlations (correlogram):
  $$\gamma_{ij} = \sigma^2 - \sigma_{ij} = \sigma^2 (1 - \rho_{ij})$$
  $$\hat{\gamma}(h) = \frac{1}{2\, n(h)} \sum_{i=1}^{n(h)} \left[y(x_i + h) - y(x_i)\right]^2$$
  where $n(h)$ is the number of pairs of points separated by distance $h$
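For the example data, whose x values are equally spaced, the empirical semivariogram can be computed directly from its definition. The following proc sql step is a minimal sketch that forms every pair of points in the fake data set and averages the squared differences by separation distance; the table name semivar_manual and the column names h, n_pairs and gamma are chosen here for illustration.

```sas
proc sql;
  create table semivar_manual as
  select (b.x - a.x) as h,                                /* separation distance */
         count(*) as n_pairs,                             /* n(h) */
         sum((b.y - a.y)**2) / (2*count(*)) as gamma      /* semivariance */
  from fake as a, fake as b
  where a.x < b.x                                         /* each pair once */
  group by calculated h
  order by h;
quit;

proc print data=semivar_manual noobs;
run;
```

At h = 1 there are 8 pairs with squared differences summing to 277, so gamma = 277/16 = 17.3125; at h = 8 the single pair gives (41 - 19)**2/2 = 242. These agree with the values reported by proc variogram for this data set.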
Semivariogram
Features of the (Semi)variogram
• Nugget: discontinuity at the origin. Can't have this for interpolation with kriging!
• Range: distance it takes for the variogram to level off (reach its asymptote)
• Sill: value of the variogram at the asymptote ($= \sigma^2 = \mathrm{Var}(y_0)$). When a nugget is present, sill = partial sill + nugget
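As one concrete instance of these features, consider an exponential semivariogram model with nugget $c_n$, partial sill $c_p$ and range parameter $a$ (symbols chosen here for illustration; other model forms behave analogously):

$$
\gamma(h) =
\begin{cases}
0, & h = 0 \\
c_n + c_p\left[1 - \exp(-h/a)\right], & h > 0
\end{cases}
$$

As $h \to \infty$, $\gamma(h) \to c_n + c_p$, which is the sill; as $h \downarrow 0$, $\gamma(h) \to c_n$, which is the jump at the origin. Setting $c_n = 0$ makes $\gamma$ continuous at the origin, which is the condition needed for kriging to interpolate.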
Ordinary Kriging
Implementation - two options:
1. Use both proc variogram & proc krige2d
   • need to create a second variable (x2) with constant values
2. Use a mixed model procedure (proc mixed)
   • empirical and fitted variograms are not provided automatically
data fake2;
set fake;
x2=1; *constant value;
run;
ods html;
ods graphics on;
proc variogram data=fake2 outvar=look;
store out=semivar_store;
directions 90(0); *not really needed;
compute lagdist=1 maxlag=10;
*lagdist should be ~ 2*min norm and maxlag should be ~ max norm among xs;
coordinates xc=x yc=x2;
var y;
model nugget=0 form=auto(mlist=(gau,pow,she) nest=2) choose=(AIC SSE
STATUS);
*important that nugget is zero for interpolation;
run;
proc krige2d data=fake2 outest=kr_pred(rename=(gxc=x estimate=y_est));
restore in=semivar_store;
coordinates xc=x yc=x2;
predict var=y;
model storeselect;
grid x=0 to 8 by 0.01 y=1 to 1 by 1;
run;
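To see what the kriging predictor does once a covariance model is fixed, the "better" form, y0_hat = mu_hat + c0' R**(-1) (y - mu_hat*1), can be evaluated directly. The sketch below assumes SAS/IML is licensed and uses an exponential covariance with hypothetical parameters (sigma2 = 100, a = 2) rather than the model selected by proc variogram, so its predictions are not expected to match the krige2d output; it only illustrates the algebra.

```sas
proc iml;
  x = {0, 1, 2, 3, 4, 5, 6, 7, 8};
  y = {41, 26, 19, 18, 18.5, 17.5, 18, 18.5, 19};
  n = nrow(x);

  sigma2 = 100;   /* hypothetical sill (no nugget) */
  a      = 2;     /* hypothetical range parameter  */

  /* pairwise distances and exponential covariance matrix R */
  D    = abs(repeat(x, 1, n) - repeat(x`, n, 1));
  R    = sigma2 # exp(-D / a);
  one  = j(n, 1, 1);
  Rinv = inv(R);

  /* generalized least squares estimate of the constant mean */
  mu = (one` * Rinv * y) / (one` * Rinv * one);

  /* prediction at a new location x0 */
  x0 = 2.5;
  c0 = sigma2 # exp(-abs(x - x0) / a);
  y0 = mu + c0` * Rinv * (y - mu # one);

  print mu y0;
quit;
```

Setting x0 to an observed location (for example x0 = 2) should return the observed y at that point, up to floating-point error, because the covariance model has no nugget.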
The VARIOGRAM Procedure
Dependent Variable: y
Empirical Semivariogram at Angle=90

Lag Class   Pair Count   Average Distance   Semivariance
        0            0                  .              .
        1            8                  1         17.313
        2            7                  2         39.339
        3            6                  3         49.146
        4            5                  4         58.000
        5            4                  5         77.188
        6            3                  6         97.542
        7            2                  7        138.813
        8            1                  8        242.000
        9            0                  .              .
       10            0                  .              .
proc sql;
create table kr_pred2(drop=x0) as
select *, (y_est+1.96*stderr) as cl_upp, (y_est-1.96*stderr) as cl_low
from kr_pred(keep=x y_est stderr) left join fake2(keep=x y rename=(x=x0))
on put(kr_pred.x, 6.3) = put(fake2.x0, 6.3)
;
quit;
proc sgplot data=kr_pred2 noautolegend;
series y=y_est x=x /
lineattrs=(pattern=2 thickness=1pt color=red) lineattrs=graphprediction;
scatter y=y x=x /
markerattrs=(symbol=circle size=4pt color=blue);
series y=cl_upp x=x /
lineattrs=(pattern=2 thickness=1pt color=green) lineattrs=graphprediction;
series y=cl_low x=x /
lineattrs=(pattern=2 thickness=1pt color=green) lineattrs=graphprediction;
title 'Interpolated values';
run;
ods graphics off;
ods html close;
Getting set up for proc mixed
data fake_fmixed;
set fake end=last;
output;
if last then do x=0 to 8 by 0.01;
y=.;
output;
end;
run;
proc print data=fake_fmixed(obs=34) noobs;
run;
   x      y
0.00   41.0
1.00   26.0
2.00   19.0
3.00   18.0
4.00   18.5
5.00   17.5
6.00   18.0
7.00   18.5
8.00   19.0
0.00      .
0.01      .
0.02      .
0.03      .
0.04      .
0.05      .
0.06      .
0.07      .
0.08      .
0.09      .
0.10      .
0.11      .
0.12      .
0.13      .
0.14      .
0.15      .
0.16      .
0.17      .
0.18      .
0.19      .
0.20      .
0.21      .
0.22      .
0.23      .
0.24      .
proc mixed data=fake_fmixed;
model y = / outp=ok_preds; *outputting predictions;
repeated / subject=intercept type=sp(matern)(x);
title 'Ordinary Kriging in Proc Mixed';
run;
proc print data=ok_preds(obs=34) noobs;
run;
ods html;
ods graphics on;
proc sgplot data=ok_preds(where=(resid=.)) noautolegend;
series y=pred x=x /
lineattrs=(pattern=2 thickness=1pt color=red) lineattrs=GraphPrediction;
series y=lower x=x /
lineattrs=(pattern=2 thickness=1pt color=green) lineattrs=graphprediction;
series y=upper x=x /
lineattrs=(pattern=2 thickness=1pt color=green) lineattrs=graphprediction;
title 'Kriged values via Mixed Model';
run;
ods graphics off;
ods html close;
   x      y      Pred   StdErrPred   DF   Alpha      Lower     Upper      Resid
0.00   41.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -22.5257
1.00   26.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -37.5257
2.00   19.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -44.5257
3.00   18.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.5257
4.00   18.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.0257
5.00   17.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -46.0257
6.00   18.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.5257
7.00   18.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.0257
8.00   19.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -44.5257
0.00      .   41.0000            .    8    0.05          .         .          .
0.01      .   40.8198       0.0104    8    0.05    40.7957    40.844          .
0.02      .   40.6401       0.0206    8    0.05    40.5925    40.688          .
0.03      .   40.4607       0.0305    8    0.05    40.3904    40.531          .
0.04      .   40.2818       0.0401    8    0.05    40.1894    40.374          .
0.05      .   40.1034       0.0494    8    0.05    39.9895    40.217          .
0.06      .   39.9254       0.0584    8    0.05    39.7906    40.060          .
0.07      .   39.7478       0.0672    8    0.05    39.5929    39.903          .
0.08      .   39.5707       0.0757    8    0.05    39.3962    39.745          .
0.09      .   39.3941       0.0839    8    0.05    39.2006    39.588          .
0.10      .   39.2179       0.0918    8    0.05    39.0062    39.430          .
0.11      .   39.0422       0.0995    8    0.05    38.8128    39.272          .
0.12      .   38.8670       0.1069    8    0.05    38.6206    39.113          .
0.13      .   38.6923       0.1140    8    0.05    38.4295    38.955          .
0.14      .   38.5181       0.1208    8    0.05    38.2394    38.797          .
0.15      .   38.3443       0.1274    8    0.05    38.0505    38.638          .
0.16      .   38.1711       0.1337    8    0.05    37.8628    38.479          .
0.17      .   37.9984       0.1398    8    0.05    37.6761    38.321          .
0.18      .   37.8262       0.1456    8    0.05    37.4906    38.162          .
0.19      .   37.6546       0.1511    8    0.05    37.3061    38.003          .
0.20      .   37.4834       0.1564    8    0.05    37.1229    37.844          .
0.21      .   37.3128       0.1614    8    0.05    36.9407    37.685          .
0.22      .   37.1428       0.1661    8    0.05    36.7597    37.526          .
0.23      .   36.9733       0.1706    8    0.05    36.5798    37.367          .
0.24      .   36.8043       0.1749    8    0.05    36.4010    37.208          .
Comparison of all 4 approaches
Conclusions
• TPSs and OK are both capable of interpolation and smoothing
• TPSs require no distributional assumptions, but predictions can be overly “wiggly” when $\lambda = 0$
• OK takes a bit more effort/practice but is powerful when a suitable model is available for the empirical variogram
• Consider TPS and OK over linear interpolation!
Thanks!