Interpolation {x,y} Data with Suavity

Peter K. Ott
Forest Analysis & Inventory Branch
BC Ministry of FLNRO
Victoria, BC
[email protected]
30/11/2015

The Goal

• Given a set of points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, find a function that passes through the points, affording the prediction of $y$ at new values of $x$
• Regression or smoothing is a related but different problem

Outline

• Example Data
• Linear Interpolation
• Thin Plate Splines (TPS)
• Ordinary Kriging (OK)
    • Two implementations
• Conclusion

    data fake;
      input x y;
      cards;
    0 41
    1 26
    2 19
    3 18
    4 18.5
    5 17.5
    6 18
    7 18.5
    8 19
    ;
    run;

    ods html;
    ods graphics on;
    proc sgplot data=fake noautolegend;
      scatter y=y x=x / markerattrs=(symbol=circle size=4pt color=blue);
      title 'Y versus X';
    run;
    ods graphics off;
    ods html close;

Our Data

[Figure: scatter plot of y versus x]

Linear Interpolation (~200 yrs BC)

• Form a straight line between pairs of known points
• What is $y_0 \mid x_0$ where $x_0$ lies between $x_{-1}$ and $x_{+1}$?
• Slope must be constant, so

$$\frac{y_0 - y_{-1}}{x_0 - x_{-1}} = \frac{y_{+1} - y_0}{x_{+1} - x_0}$$

• Solve for $y_0$:

$$y_0 = \frac{y_{-1}\,(x_{+1} - x_0) + y_{+1}\,(x_0 - x_{-1})}{x_{+1} - x_{-1}}$$

    data interp0; *denser range of x to be interpolated;
      do x=0 to 8 by 0.1;
        output;
      end;
    run;

    proc sql;
      create table lin_pred(drop=x0) as
      select *
      from interp0 left join fake(rename=(x=x0))
      on put(interp0.x, 6.3) = put(fake.x0, 6.3)
      ;
    quit;

    proc print data=lin_pred(obs=34) noobs;
    run;

      x     y
    0.0    41
    0.1     .
    0.2     .
    0.3     .
    0.4     .
    0.5     .
    0.6     .
    0.7     .
    0.8     .
    0.9     .
    1.0    26
    1.1     .
    1.2     .
    1.3     .
    1.4     .
    1.5     .
    1.6     .
    1.7     .
    1.8     .
    1.9     .
    2.0    19
    2.1     .
    2.2     .
    2.3     .
    2.4     .
    2.5     .
    2.6     .
    2.7     .
    2.8     .
    2.9     .
    3.0    18
    3.1     .
    3.2     .
    3.3     .
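As a quick check of the solved interpolation formula, the following minimal sketch evaluates it at a single new point. It assumes the bracketing points (0, 41) and (1, 26) from the example data and a new value of 0.5; the variable names x_lo, y_lo, x_hi, y_hi and x0 are illustrative and do not appear in the slides.

```sas
data _null_;
  /* bracketing known points, taken from the example data */
  x_lo = 0; y_lo = 41;
  x_hi = 1; y_hi = 26;
  /* new x at which y is wanted (illustrative choice) */
  x0 = 0.5;
  /* solved form of the constant-slope condition */
  y0 = (y_lo*(x_hi - x0) + y_hi*(x0 - x_lo)) / (x_hi - x_lo);
  put x0= y0=;  /* writes x0=0.5 y0=33.5 to the log */
run;
```

Halfway between the two known points the formula returns their average, 33.5, which is the value a straight line joining them should give at x = 0.5.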
    proc expand data=lin_pred(keep=x y) out=lin_interp;
      convert y=linear / method=join;
      id x; *data must be sorted by x;
    run;

    ods html;
    ods graphics on;
    proc sgplot data=lin_interp noautolegend;
      series y=linear x=x / lineattrs=(pattern=2 thickness=1pt color=red)
                            lineattrs=GraphPrediction;
      scatter y=y x=x / markerattrs=(symbol=circle size=4pt color=blue);
      title 'Linear interpolated values';
    run;
    ods graphics off;
    ods html close;

Linear Interpolation

[Figure: linearly interpolated values with the observed points]

Thin Plate Splines (1970s)

• Want a function to minimize:

$$L = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2 + \lambda \int \left( f''(x) \right)^2 dx$$

• or more generally

$$L = \sum_{i=1}^{n} \left( y_i - f(\mathbf{x}_i) \right)^2 + \lambda \, J(f)$$

• where, for $d = 2$,

$$J(f) = \iint \left[ \left( \frac{\partial^2 f}{\partial x_1^2} \right)^2 + 2 \left( \frac{\partial^2 f}{\partial x_1 \, \partial x_2} \right)^2 + \left( \frac{\partial^2 f}{\partial x_2^2} \right)^2 \right] dx_1 \, dx_2$$

• where $\lambda \ge 0$ is an unknown parameter that controls the wiggliness

• The solution to this problem is a function that relies on radial basis functions, and it passes through the data without knots
• One dimension example:

$$y_0 \mid x_0 = \alpha_0 + \alpha_1 x_0 + \frac{1}{12} \sum_{i=1}^{n} \beta_i \, |x_0 - x_i|^3$$

• Two dimension example:

$$y_0 \mid x_{01}, x_{02} = \alpha_0 + \alpha_1 x_{01} + \alpha_2 x_{02} + \frac{1}{8\pi} \sum_{i=1}^{n} \beta_i \, z_i^2 \log z_i$$

• where $z_i = \sqrt{(x_{01} - x_{1i})^2 + (x_{02} - x_{2i})^2}$

    proc tpspline data=fake;
      model y =(x); */ lambda0=1e-15; *setting lambda0 to zero is necessary for interpolation;
      score data=interp0 out=tps_pred pred; *this will yield the interpolated points and more;
      output out=tps_coef pred coef;
    run;

    proc print data=tps_coef noobs;
    run;
    *output are alpha[0], alpha[1], beta[1], ..., beta[n], with the beta aligned with sorted (unique) x[i];
    *Note also that sum(beta[i])=0 and sum(beta[i]*x[i])=0;

    x      y       P_y     Coef_y
    0   41.0   41.0000    27.8513
    1   26.0   26.0000    -8.1071
    2   19.0   19.0000    10.5088
    3   18.0   18.0000   -15.0530
    4   18.5   18.5000     0.2121
    5   17.5   17.5000    -0.7953
    6   18.0   18.0000    11.9690
    7   18.5   18.5000   -11.0810
    8   19.0   19.0000     5.3549
    .      .         .    -1.3387
    .      .         .     0.2231

    proc sql;
      create table tps_pred2(drop=x0) as
      select *
      from tps_pred left join fake(rename=(x=x0))
      on put(tps_pred.x, 6.3) = put(fake.x0, 6.3)
      ;
    quit;

    ods html;
    ods graphics on;
    proc sgplot data=tps_pred2 noautolegend;
      series y=p_y x=x / lineattrs=(pattern=2 thickness=1pt color=red)
                         lineattrs=GraphPrediction;
      scatter y=y x=x / markerattrs=(symbol=circle size=4pt color=blue);
      title 'Interpolated values';
    run;
    ods graphics off;
    ods html close;

Thin Plate Spline

[Figure: thin plate spline interpolated values with the observed points]

Ordinary Kriging (1960s)

• Consider $y_i \mid x_i$ as a multivariate Gaussian process:

$$\{ y_i \mid x_i \} = \mathbf{y} \sim N_n(\mu \mathbf{1}, \boldsymbol{\Sigma})$$

• Find the estimator $\hat{y}_0 \mid x_0 = \sum_{i=1}^{n} w_i y_i = \mathbf{w}'\mathbf{y}$ such that:
    $E(\hat{y}_0 \mid x_0) = \mu$ (unbiased), and
    the prediction error, $Var(\hat{y}_0 - y_0)$, is minimized

• It turns out:

$$\mathbf{w} = \boldsymbol{\Sigma}^{-1}\mathbf{c}_0 - \left( \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1} \right)^{-1} \boldsymbol{\Sigma}^{-1}\mathbf{1}\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{c}_0 + \left( \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1} \right)^{-1} \boldsymbol{\Sigma}^{-1}\mathbf{1} \quad \text{(ugly)}$$

$$\hat{y}_0 \mid x_0 = \hat{\mu} + \mathbf{c}_0'\boldsymbol{\Sigma}^{-1}\left( \mathbf{y} - \hat{\mu}\mathbf{1} \right) \quad \text{(better)}$$

• where

$$\hat{\mu} = \left( \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1} \right)^{-1} \mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{y} \quad \text{and} \quad \mathbf{c}_0 = \begin{bmatrix} Cov(y_1, y_0) \\ Cov(y_2, y_0) \\ \vdots \\ Cov(y_n, y_0) \end{bmatrix}$$

• How do we determine $\mathbf{c}_0$? We'll need to model the covariance structure as a function of distance, say $h$
• Tradition is to use semivariances (semivariogram) instead of covariances (covariogram) or correlations (correlogram):

$$\gamma_{ij} = \sigma^2 - \sigma_{ij} = \sigma^2 \left( 1 - \rho_{ij} \right)$$

$$\hat{\gamma}(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} \left[ y(x_i + h) - y(x_i) \right]^2$$

Semivariogram

[Figure: semivariogram]

Features of the (Semi)variogram

• Nugget: discontinuity at the origin. Can't have this for interpolation with kriging!
• Range: distance it takes for the variogram to level off (reach asymptote)
• Sill: value of variogram at asymptote ($= \sigma^2 = var(y_0)$). When a nugget is present, sill = partial sill + nugget

Ordinary Kriging Implementation - two options:

1. Use both proc variogram & proc krige2d
    • need to create a second variable (x2) with constant values
2. Use a mixed model procedure (proc mixed)
    • empirical and fitted variograms are not provided automatically

    data fake2;
      set fake;
      x2=1; *constant value;
    run;

    ods html;
    ods graphics on;
    proc variogram data=fake2 outvar=look;
      store out=semivar_store;
      directions 90(0); *not really needed;
      compute lagdist=1 maxlag=10; *lagdist should be ~ 2*min norm and maxlag should be ~ max norm among xs;
      coordinates xc=x yc=x2;
      var y;
      model nugget=0 form=auto(mlist=(gau,pow,she) nest=2)
            choose=(AIC SSE STATUS); *important that nugget is zero for interpolation;
    run;

    proc krige2d data=fake2 outest=kr_pred(rename=(gxc=x estimate=y_est));
      restore in=semivar_store;
      coordinates xc=x yc=x2;
      predict var=y;
      model storeselect;
      grid x=0 to 8 by 0.01 y=1 to 1 by 1;
    run;

The VARIOGRAM Procedure
Dependent Variable: y
Empirical Semivariogram at Angle=90

    Lag Class   Pair Count   Average Distance   Semivariance
            0            0                  .              .
            1            8                  1         17.313
            2            7                  2         39.339
            3            6                  3         49.146
            4            5                  4         58.000
            5            4                  5         77.188
            6            3                  6         97.542
            7            2                  7        138.813
            8            1                  8        242.000
            9            0                  .              .
           10            0                  .              .
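The empirical semivariances reported by proc variogram can be cross-checked against the definition of the empirical semivariogram: half the average squared difference in y over all pairs of points separated by distance h. The sketch below is one way to do that with proc sql on the original fake data set. It assumes the x values are equally spaced integers, so that each lag class contains exactly the pairs at that distance. The table name semivar_byhand and the column names lag, pair_count and semivariance are illustrative and not from the slides.

```sas
proc sql;
  create table semivar_byhand as
  select b.x - a.x as lag,                                      /* separation distance h */
         count(*) as pair_count,                                /* n(h)                  */
         sum((b.y - a.y)**2) / (2*count(*)) as semivariance     /* empirical gamma(h)    */
  from fake as a, fake as b
  where a.x < b.x            /* count each pair of points once */
  group by calculated lag
  order by lag;
quit;

proc print data=semivar_byhand noobs;
run;
```

For lag 1 the eight squared differences sum to 277, so the semivariance is 277/16 = 17.3125, which agrees with the 17.313 in the printed output. For lag 8 the single pair gives (19 - 41)**2 / 2 = 242.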
    proc sql;
      create table kr_pred2(drop=x0) as
      select *,
             (y_est+1.96*stderr) as cl_upp,
             (y_est-1.96*stderr) as cl_low
      from kr_pred(keep=x y_est stderr) left join fake2(keep=x y rename=(x=x0))
      on put(kr_pred.x, 6.3) = put(fake2.x0, 6.3)
      ;
    quit;

    proc sgplot data=kr_pred2 noautolegend;
      series y=y_est x=x / lineattrs=(pattern=2 thickness=1pt color=red)
                           lineattrs=graphprediction;
      scatter y=y x=x / markerattrs=(symbol=circle size=4pt color=blue);
      series y=cl_upp x=x / lineattrs=(pattern=2 thickness=1pt color=green)
                            lineattrs=graphprediction;
      series y=cl_low x=x / lineattrs=(pattern=2 thickness=1pt color=green)
                            lineattrs=graphprediction;
      title 'Interpolated values';
    run;
    ods graphics off;
    ods html close;

Getting set up for proc mixed

    data fake_fmixed;
      set fake end=last;
      output;
      if last then do x=0 to 8 by 0.01;
        y=.;
        output;
      end;
    run;

    proc print data=fake_fmixed(obs=34) noobs;
    run;

       x       y
    0.00    41.0
    1.00    26.0
    2.00    19.0
    3.00    18.0
    4.00    18.5
    5.00    17.5
    6.00    18.0
    7.00    18.5
    8.00    19.0
    0.00       .
    0.01       .
    0.02       .
    0.03       .
    0.04       .
    0.05       .
    0.06       .
    0.07       .
    0.08       .
    0.09       .
    0.10       .
    0.11       .
    0.12       .
    0.13       .
    0.14       .
    0.15       .
    0.16       .
    0.17       .
    0.18       .
    0.19       .
    0.20       .
    0.21       .
    0.22       .
    0.23       .
    0.24       .
    proc mixed data=fake_fmixed;
      model y = / outp=ok_preds; *outputting predictions;
      repeated / subject=intercept type=sp(matern)(x);
      title 'Ordinary Kriging in Proc Mixed';
    run;

    proc print data=ok_preds(obs=34) noobs;
    run;

    ods html;
    ods graphics on;
    proc sgplot data=ok_preds(where=(resid=.)) noautolegend;
      series y=pred x=x / lineattrs=(pattern=2 thickness=1pt color=red)
                          lineattrs=GraphPrediction;
      series y=lower x=x / lineattrs=(pattern=2 thickness=1pt color=green)
                           lineattrs=graphprediction;
      series y=upper x=x / lineattrs=(pattern=2 thickness=1pt color=green)
                           lineattrs=graphprediction;
      title 'Kriged values via Mixed Model';
    run;
    ods graphics off;
    ods html close;

       x      y      Pred   StdErrPred   DF   Alpha      Lower     Upper      Resid
    0.00   41.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -22.5257
    1.00   26.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -37.5257
    2.00   19.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -44.5257
    3.00   18.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.5257
    4.00   18.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.0257
    5.00   17.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -46.0257
    6.00   18.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.5257
    7.00   18.5   63.5257      59.7589    8    0.05   -74.2785   201.330   -45.0257
    8.00   19.0   63.5257      59.7589    8    0.05   -74.2785   201.330   -44.5257
    0.00      .   41.0000            .    8    0.05          .         .          .
    0.01      .   40.8198       0.0104    8    0.05    40.7957    40.844          .
    0.02      .   40.6401       0.0206    8    0.05    40.5925    40.688          .
    0.03      .   40.4607       0.0305    8    0.05    40.3904    40.531          .
    0.04      .   40.2818       0.0401    8    0.05    40.1894    40.374          .
    0.05      .   40.1034       0.0494    8    0.05    39.9895    40.217          .
    0.06      .   39.9254       0.0584    8    0.05    39.7906    40.060          .
    0.07      .   39.7478       0.0672    8    0.05    39.5929    39.903          .
    0.08      .   39.5707       0.0757    8    0.05    39.3962    39.745          .
    0.09      .   39.3941       0.0839    8    0.05    39.2006    39.588          .
    0.10      .   39.2179       0.0918    8    0.05    39.0062    39.430          .
    0.11      .   39.0422       0.0995    8    0.05    38.8128    39.272          .
    0.12      .   38.8670       0.1069    8    0.05    38.6206    39.113          .
    0.13      .   38.6923       0.1140    8    0.05    38.4295    38.955          .
    0.14      .   38.5181       0.1208    8    0.05    38.2394    38.797          .
    0.15      .   38.3443       0.1274    8    0.05    38.0505    38.638          .
    0.16      .   38.1711       0.1337    8    0.05    37.8628    38.479          .
    0.17      .   37.9984       0.1398    8    0.05    37.6761    38.321          .
    0.18      .   37.8262       0.1456    8    0.05    37.4906    38.162          .
    0.19      .   37.6546       0.1511    8    0.05    37.3061    38.003          .
    0.20      .   37.4834       0.1564    8    0.05    37.1229    37.844          .
    0.21      .   37.3128       0.1614    8    0.05    36.9407    37.685          .
    0.22      .   37.1428       0.1661    8    0.05    36.7597    37.526          .
    0.23      .   36.9733       0.1706    8    0.05    36.5798    37.367          .
    0.24      .   36.8043       0.1749    8    0.05    36.4010    37.208          .

Comparison of all 4 approaches

[Figure]

Conclusions

• TPSs and OK are both capable of interpolation and smoothing
• TPSs require no distributional assumptions, but predictions can be overly "wiggly" when $\lambda = 0$
• OK takes a bit more effort/practice but is powerful when a suitable model is available for the empirical variogram
• Consider TPS and OK over linear interpolation!

Thanks!