Model-assisted Estimation
of Forest Resources with
Generalized Additive
Models
Jean Opsomer, Jay Breidt, Gretchen Moisen,
Göran Kauermann
August 9, 2006
Outline
1. Forest surveys
2. Sampling from spatial domain
3. Model-assisted estimation
4. GAM estimation for forest inventory data
5. Variance estimation for systematic samples
Research Project
• Collaboration between academic and US
Forest Service statisticians
• Goal: apply on-going modeling efforts by
Forest Service staff to improve efficiency of
survey estimators
1. Forest Inventory and
Analysis (FIA)
• Forest Inventory and Analysis
is annual survey of all forest
lands in US
• Multi-phase survey,
including field visits phase
with approximately 1 plot/
6,000 acres
• Expensive: $68 million in
2004 (nation-wide)
Inference for Surveys?
Specific Inference
• expensive, high quality
• using “custom-built” method (or model) to achieve best possible estimator for particular variable(s)
• willing to defend estimates/inference
• targeted to specific application and/or scientific question
Inference for Surveys
Generic Inference
• cheap, reasonable quality, good for many purposes
• using method appropriate for large number of variables
• provide reasonable answers to many possible scientific questions
• validity of estimates resistant to model misspecification; model independent
Survey Estimation
• Classical methods depend only on sampling
design (Horvitz-Thompson; Hájek)
• Improved methods are still design-based but take
advantage of auxiliary information
  • ratio, regression, post-stratification
  • model-assisted (Särndal et al., 1992)
  • calibration (Deville and Särndal, 1992)
  • nonparametric (Breidt and Opsomer, 2000), nonlinear/generalized (Wu and Sitter, 2001), ...
Current Dataset
• 2.5 million ha ecological
region in Utah
• Contains 968 FIA field
plots on 5x5km grid
• FIA plots embedded in
24,980 remote sensing
locations on 1x1km grid
Current Dataset (2)
Remote Sensing Variables
‣ Elevation
‣ Slope
‣ Aspect
‣ Location
‣ Vegetation Index
‣ TM spectral bands
Field Plot Variables
‣ Forest/non-forest
‣ Total wood volume
‣ Tree basal area
‣ Biomass
‣ Percent crown cover
‣ Mean diameter
‣ ...
Systematic Sampling
• Common in natural resource and other
spatial surveys
• Advantages:
• Simple to implement, intuitive
• Easy to “nest” within GIS environment
• Ensures proportional representation of domains
• Optimal for certain stationary processes
Systematic Sampling (2)
• Disadvantages
• Inflexible, can miss rare features in region
• Does not capture spatial relationships at fine scales (modeling)
• No design-based variance estimator
2. Sampling from Spatial
Domain
• Phase I sample G1 is systematic from continuous domain U ⊆ D = [0, L1] × [0, L2]
• Phase II sample G2 is systematic (discrete) sub-sample of G1
Conditional on G1, only 25 possible Phase II samples
Sampling from Spatial
Domain (2)
• Phase I sample G1(u), with u = (u1, u2) uniform random variable on [0, 1] × [0, 1] and sampling intervals (δ1, δ2)
$$G_1(u) = \{((u_1 + i_1)\delta_1,\ (u_2 + i_2)\delta_2) : i_1, i_2 = 0, 1, \ldots\}$$
• Phase II sample G2(u, d), where d = (d1, d2) discrete uniform on {1, 2, …, h1} × {1, 2, …, h2}
$$G_2(u, d) = \{((u_1 + d_1 + j_1 h_1)\delta_1,\ (u_2 + d_2 + j_2 h_2)\delta_2) : j_1, j_2 = 0, 1, \ldots\}$$
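A minimal sketch of the two nested grids, assuming a hypothetical 20 × 30 rectangle of Phase I points with unit spacing and 5 × 5 sub-sampling. The function names are illustrative, and the Phase II offset is taken 0-based (the slide's d minus 1) so that the 25 sub-grids partition the Phase I grid exactly.

```python
import random

def phase1_grid(u, delta, n):
    """Phase I points ((u1 + i1) * delta1, (u2 + i2) * delta2), i1 < n[0], i2 < n[1]."""
    return [((u[0] + i1) * delta[0], (u[1] + i2) * delta[1])
            for i1 in range(n[0]) for i2 in range(n[1])]

def phase2_grid(u, d, delta, h, n):
    """Every h1-th by h2-th Phase I point, starting at the 0-based offset d."""
    return [((u[0] + i1) * delta[0], (u[1] + i2) * delta[1])
            for i1 in range(d[0], n[0], h[0])
            for i2 in range(d[1], n[1], h[1])]

rng = random.Random(0)
delta, h, n = (1.0, 1.0), (5, 5), (20, 30)   # hypothetical spacing and grid size
u = (rng.random(), rng.random())             # random start of the Phase I grid
d = (rng.randrange(h[0]), rng.randrange(h[1]))  # random Phase II offset

g1 = phase1_grid(u, delta, n)
g2 = phase2_grid(u, d, delta, h, n)

# Conditional on the Phase I grid there are only h1 * h2 = 25 possible Phase II samples.
all_g2 = [phase2_grid(u, (a, b), delta, h, n)
          for a in range(h[0]) for b in range(h[1])]
```

Each Phase II sample here holds 24 of the 600 Phase I points, and every Phase I point belongs to exactly one of the 25 candidate samples.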
Population
Characteristics
• Interested in estimating finite population total for variable z(v) on D
$$\theta_z = \int_D z(v)\,dv$$
• Total θz can be “gridded” into cells $D_{i_1 i_2}$
$$\theta_z = \sum_{i_1}\sum_{i_2} \int_{D_{i_1 i_2}} z(v)\,dv = \delta_1\delta_2 \int_{[0,1]\times[0,1]} \sum_{s \in G_1(u)} z(s)\,du$$
Survey Estimation
• Phase I expansion estimator
$$\hat\theta_{1z}(u) = \sum_{s \in G_1(u)} \frac{z(s)}{1/(\delta_1\delta_2)}$$
(unfeasible for Phase II variables)
• Two-phase expansion estimator
$$\hat\theta_{2z}(u, d) = \sum_{s \in G_2(u,d)} \frac{z(s)}{1/(\delta_1\delta_2 h_1 h_2)}$$
• Both unbiased, have exact variance formula
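The two expansion estimators can be sketched as follows, assuming a hypothetical smooth surface `z` and a hypothetical 20 × 30 Phase I grid with 0-based Phase II offsets. Dividing by the inclusion density 1/(δ1δ2) is the same as multiplying each observation by the cell area δ1δ2.

```python
import math

def z(v):
    """Hypothetical smooth surface standing in for a survey variable."""
    return 1.0 + math.sin(v[0] / 3.0) * math.cos(v[1] / 5.0)

delta, h, n = (1.0, 1.0), (5, 5), (20, 30)   # illustrative design constants
u = (0.3, 0.7)                               # one realised Phase I start

def points(offset=(0, 0), step=(1, 1)):
    """Grid points ((u1 + i1) * delta1, (u2 + i2) * delta2) for the selected indices."""
    return [((u[0] + i1) * delta[0], (u[1] + i2) * delta[1])
            for i1 in range(offset[0], n[0], step[0])
            for i2 in range(offset[1], n[1], step[1])]

cell = delta[0] * delta[1]

# Phase I expansion estimator: each point represents one cell of area delta1 * delta2.
theta1 = cell * sum(z(s) for s in points())

# Two-phase expansion estimator for each of the 25 possible offsets:
# each point now represents h1 * h2 cells.
theta2 = {(d1, d2): cell * h[0] * h[1] * sum(z(s) for s in points((d1, d2), h))
          for d1 in range(h[0]) for d2 in range(h[1])}

# Averaging over the equally likely offsets recovers the Phase I estimator,
# which is the conditional unbiasedness of the two-phase estimator given G1.
mean_theta2 = sum(theta2.values()) / len(theta2)
```

In practice `theta1` cannot be computed for a Phase II variable, because z is observed only on the Phase II sample; it is available here only because z is simulated.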
3. Model-Assisted
Estimation
• Variables X(v) observed on Phase I can
improve precision of survey estimators for
Phase II variables
• Model-assisted approach provides
convenient framework for incorporating
auxiliary information within design-based
(generic) inference
Model-Assisted
Estimation (2)
1. Assume working model Eξ (z(v)) = µ(X(v))
2. Fit model on {z(s), X(s) : s ∈ G2 (u, d)}
to predict µ̂(s), s ∈ G1 (u)
3. Construct model-assisted estimator
$$\hat\theta_{MA,z} = \sum_{s \in G_1(u)} \frac{\hat\mu(s)}{1/(\delta_1\delta_2)} + \sum_{s \in G_2(u,d)} \frac{z(s) - \hat\mu(s)}{1/(\delta_1\delta_2 h_1 h_2)}$$
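The three steps above can be sketched with a simple linear working model in place of a GAM. Everything here is an assumption made for illustration: the covariate `x`, the response `z`, the grid constants and the 0-based Phase II offset are hypothetical.

```python
import math

delta, h, n = (1.0, 1.0), (5, 5), (20, 30)   # illustrative design constants
u, d = (0.3, 0.7), (2, 4)                    # realised start and 0-based offset

def x(v):
    """Hypothetical Phase I covariate (think of elevation)."""
    return v[0] + 0.5 * v[1]

def z(v):
    """Hypothetical Phase II variable related to x."""
    return 2.0 + 0.3 * x(v) + math.sin(v[0]) * math.cos(v[1])

def points(offset=(0, 0), step=(1, 1)):
    return [((u[0] + i1) * delta[0], (u[1] + i2) * delta[1])
            for i1 in range(offset[0], n[0], step[0])
            for i2 in range(offset[1], n[1], step[1])]

g1, g2 = points(), points(d, h)

# Step 2: fit the working model mu(x) = b0 + b1 * x by least squares on G2.
xbar = sum(x(s) for s in g2) / len(g2)
zbar = sum(z(s) for s in g2) / len(g2)
b1 = (sum((x(s) - xbar) * (z(s) - zbar) for s in g2)
      / sum((x(s) - xbar) ** 2 for s in g2))
b0 = zbar - b1 * xbar

def mu_hat(s):
    return b0 + b1 * x(s)

# Step 3: Phase I total of predictions plus expanded Phase II residuals.
a1 = delta[0] * delta[1]          # weight of a Phase I point
a2 = a1 * h[0] * h[1]             # weight of a Phase II point
prediction_term = a1 * sum(mu_hat(s) for s in g1)
residual_term = a2 * sum(z(s) - mu_hat(s) for s in g2)
theta_ma = prediction_term + residual_term
```

With an intercept in a least-squares fit the Phase II residuals sum to zero, so `residual_term` vanishes here; for a GAM or any model fitted differently the residual term generally does not vanish and is what protects the estimator against model misspecification.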
Properties of Model-Assisted Estimator
• Estimator θ̂MA,z is approximately design
unbiased for large classes of models, with
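approximate design variance
$$\mathrm{Var}(\hat\theta_{MA,z}) \approx \mathrm{Var}(\hat\theta_{1z}(u)) + \frac{|D|^2}{n_2^2}\left(1 - \frac{1}{h_1 h_2}\right) E(S^2(u))$$
where
$$S^2(u) = \frac{1}{h_1 h_2 - 1} \sum_{d_1=1}^{h_1} \sum_{d_2=1}^{h_2} \left(t_{d_1 d_2}(u) - \bar t(u)\right)^2, \qquad t_{d_1 d_2}(u) = \sum_{s \in G_2(u,d)} \left(z(s) - \hat\mu(s)\right)$$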
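As a check on the Phase II term, the sketch below treats the residual z(s) − µ̂(s) as a fixed hypothetical function `resid` (in practice µ̂ is refitted for each sample, which is why the slide's formula is only approximate). Under that assumption, the variance of the expanded residual total over the 25 equally likely offsets equals (|D|/n2)² (1 − 1/(h1h2)) S²(u) exactly, with |D|/n2 = δ1δ2h1h2.

```python
import math

delta, h, n = (1.0, 1.0), (5, 5), (20, 30)   # illustrative design constants
u = (0.3, 0.7)                               # one realised Phase I start

def resid(v):
    """Hypothetical residual z(v) - mu_hat(v), held fixed across samples."""
    return math.sin(v[0]) * math.cos(v[1] / 2.0)

def points(offset=(0, 0), step=(1, 1)):
    return [((u[0] + i1) * delta[0], (u[1] + i2) * delta[1])
            for i1 in range(offset[0], n[0], step[0])
            for i2 in range(offset[1], n[1], step[1])]

H = h[0] * h[1]
a2 = delta[0] * delta[1] * H     # plays the role of |D| / n2

# Residual totals t_{d1 d2}(u) for every possible Phase II offset.
t = [sum(resid(s) for s in points((d1, d2), h))
     for d1 in range(h[0]) for d2 in range(h[1])]
t_bar = sum(t) / H
S2 = sum((td - t_bar) ** 2 for td in t) / (H - 1)

# Phase II variance term from the slide, for this fixed u.
formula = a2 ** 2 * (1.0 - 1.0 / H) * S2

# Direct variance of the expanded residual total over the 25 offsets.
estimates = [a2 * td for td in t]
mean_est = sum(estimates) / H
exact = sum((e - mean_est) ** 2 for e in estimates) / H
```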
Applying Model-Assisted Estimation
• In typical survey context, many variables of
interest instead of single z(v)
• Express estimator θ̂MA,z in the form
$$\hat\theta_{MA,z} = \sum_{s \in G_2(u,d)} w(s)\, z(s)$$
(automatic for linear estimators)
• Survey variables “related to” Phase I variables
X(v) will benefit from improved efficiency
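For a linear working model fitted by least squares, the weights have a closed form that does not involve z, so one set of weights serves every survey variable. The sketch below assumes a working model with an intercept and a single hypothetical covariate `x`; with a1 = δ1δ2 and a2 = δ1δ2h1h2, the weight is w(s) = a2 + x(s)'(X'X)⁻¹(a1 T1 − a2 T2), where T1 and T2 are the Phase I and Phase II totals of the design rows (1, x(s)).

```python
import math

delta, h, n = (1.0, 1.0), (5, 5), (20, 30)   # illustrative design constants
u, d = (0.3, 0.7), (2, 4)                    # realised start and 0-based offset

def x(v):
    """Hypothetical Phase I covariate."""
    return v[0] + 0.5 * v[1]

def z(v):
    """Hypothetical Phase II variable."""
    return 2.0 + 0.3 * x(v) + math.sin(v[0]) * math.cos(v[1])

def points(offset=(0, 0), step=(1, 1)):
    return [((u[0] + i1) * delta[0], (u[1] + i2) * delta[1])
            for i1 in range(offset[0], n[0], step[0])
            for i2 in range(offset[1], n[1], step[1])]

g1, g2 = points(), points(d, h)
a1 = delta[0] * delta[1]
a2 = a1 * h[0] * h[1]

# Entries of X'X for design rows (1, x(s)) on the Phase II sample.
n2 = len(g2)
sx = sum(x(s) for s in g2)
sxx = sum(x(s) ** 2 for s in g2)
det = n2 * sxx - sx ** 2

# c = a1 * (Phase I totals of (1, x)) - a2 * (Phase II totals of (1, x)).
c0 = a1 * len(g1) - a2 * n2
c1 = a1 * sum(x(s) for s in g1) - a2 * sx

# coef = (X'X)^{-1} c, using the closed-form inverse of a 2 x 2 matrix.
coef0 = (sxx * c0 - sx * c1) / det
coef1 = (-sx * c0 + n2 * c1) / det

# One weight per Phase II point; z does not enter the weights.
w = {s: a2 + coef0 + coef1 * x(s) for s in g2}
theta_w = sum(w[s] * z(s) for s in g2)

# Calibration: weighted Phase II totals reproduce the Phase I totals.
calib_count = sum(w.values())
calib_x = sum(w[s] * x(s) for s in g2)
```

The same dictionary `w` could be applied to any other Phase II variable observed at the points of `g2`.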
4. Estimation for Forest
Inventory Data
• Forest Service researchers are investigating
predictive models for forest characteristics
based on remote sensing data
• Key variable in this survey: FOREST indicator $I_{FOR}(v)$
‣ Many other variables not recorded when $I_{FOR}(v) = 0$
GAM Variables for
FOREST
‣ (X,Y) coordinates (bivariate)
‣ ELEV90CU elevation
‣ TRASP90 aspect (transformed)
‣ SLP90CU slope
‣ MRLC00B5 TM satellite band 5
‣ NDVI vegetation index (TM)
‣ NLDC7 vegetation classes (TM)
GAM Model for
FOREST
• Model
$$E_\xi(I_{FOR}(v)) \equiv \mu_{FOR}(v) = g\big(m_1(x_1(v)) + \ldots + m_6(x_6(v)) + x_7(v)'\beta\big)$$
with g(·) logistic link and xk (v) Phase I
variables
• Fitted in S-Plus using gam() with lo()
smoothers, to obtain prediction µ̂FOR (s)
for s ∈ G1 (u)
FOREST Model
Components
[Figure: fitted GAM component functions for FOREST, one panel each for lo(Xs, Ys, span = 0.5), s(ELEV90CU, df = 4), s(SLP90CU, df = 4), s(TRASP90, df = 4), s(NDVI, df = 4) and s(MRLC00B5, df = 4)]
Other Phase II Variables
‣ NVOLTOT total wood volume (cuft/acre)
‣ BA tree basal area (per acre)
‣ BIOMASS total wood biomass (ton/acre)
‣ CRCOV percent crown cover (%)
‣ QMDALL quadratic mean diameter (in)
Modeling Other
Variables
Approaches considered:
1. “Classical” model-assisted
2. Model-assisted with FOREST
prediction as auxiliary variable
3. Model-assisted with FOREST
prediction indicator
We didn’t do...
1. “Classical” model-assisted
$$E_\xi(z(v)) = m_1(x_1(v)) + \ldots + m_6(x_6(v)) + x_7(v)'\beta$$
with m1(·), …, m6(·) parametric or nonparametric
2. Model-assisted with FOREST prediction as auxiliary variable
$$E_\xi(z(v)) = m_1(x_1(v)) + \ldots + m_6(x_6(v)) + x_7(v)'\beta + \hat\mu_{FOR}(v)\gamma$$
Selected Method
• Construct indicator for FOREST prediction
$$\hat I_{FOR}(v) = \begin{cases} 1 & \text{if } \hat\mu_{FOR}(v) \ge \hat\theta_{2,FOR}(u, d)/|D| \\ 0 & \text{otherwise} \end{cases}$$
and FOREST-interaction Phase I variables
$$x_k^{*F}(v) = \hat I_{FOR}(v)\, x_k(v)$$
• Working model
$$E_\xi(z(v)) \equiv \mu(X(v)) = X^{*F}(v)\beta$$
‣ model predicts 0 whenever $\hat I_{FOR}(v) = 0$
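A small sketch of the construction, using made-up numbers: the predicted forest probabilities, the covariate rows, the threshold and the coefficient vector are all hypothetical. The threshold stands for the estimated forest proportion θ̂2,FOR(u, d)/|D|.

```python
def forest_indicator(mu_for, forest_fraction):
    """1 where the predicted forest probability reaches the estimated forest fraction."""
    return [1 if m >= forest_fraction else 0 for m in mu_for]

def interact(indicator, x_rows):
    """FOREST-interaction variables: each covariate row multiplied by the indicator."""
    return [[i * xk for xk in row] for i, row in zip(indicator, x_rows)]

# Hypothetical GAM predictions at four Phase I locations.
mu_for = [0.90, 0.20, 0.55, 0.54]
forest_fraction = 0.54            # illustrative value of theta_2,FOR / |D|

# Hypothetical covariate rows (intercept, elevation).
x_rows = [[1.0, 2100.0], [1.0, 1500.0], [1.0, 2600.0], [1.0, 1800.0]]

indicator = forest_indicator(mu_for, forest_fraction)
x_star = interact(indicator, x_rows)

# Any linear predictor built on the interaction variables is 0 at predicted
# non-forest locations, whatever the coefficients are.
beta = [5.0, 0.01]                # hypothetical coefficients
predictions = [sum(b * xk for b, xk in zip(beta, row)) for row in x_star]
```

Because the intercept column is also multiplied by the indicator, no separate constraint is needed to force zero predictions at predicted non-forest locations.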
Selected Method (2)
• Estimator constructed as
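$$\hat\theta_{MA,z} = \sum_{s \in G_1(u)} \frac{\hat\mu(s)}{1/(\delta_1\delta_2)} + \sum_{s \in G_2(u,d)} \frac{z(s) - \hat\mu(s)}{1/(\delta_1\delta_2 h_1 h_2)} = \sum_{s \in G_2(u,d)} w(s)\, z(s)$$
• Only approximately linear, due to presence of $\hat I_{FOR}(v)$ on RHS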
Calibration Properties
• Weights w(s) calibrated for Phase I totals of auxiliary variables $x_k^{*F}(v)$
$$\sum_{s \in G_2(u,d)} w(s)\, x_k^{*F}(s) = \sum_{s \in G_1(u)} \frac{x_k^{*F}(s)}{1/(\delta_1\delta_2)}$$
and approximately for $\hat\mu_{FOR}(v)$ and $x_k(v)$
• Estimators for domains Uh ⊆ D improved over
simpler estimators, by incorporating forest/
non-forest prediction locally
Comparing Estimators
1. EXP: expansion estimator θ̂2z (u, d)
2. PS: Model-assisted using NLDC7 categories
only (=post-stratified, current FIA method)
3. REG: Model-assisted with linear model
using Phase I variables
4. GAM/REGI: Model-assisted with linear
model using Phase I variables interacted
with GAM forest/non-forest prediction
Results
Study Variable                                Estimator   Estimated Mean   Estimated Standard Error   Est. Relative Efficiency of GAM/REGI
FOREST (forest/non-forest binary)             EXP         0.51             0.02                       1.83
                                              PS          0.54             0.01                       1.38
                                              REG         0.54             0.01                       1.18
                                              GAM         0.54             0.01
NVOLTOT (total wood volume in cuft/acre)      EXP         845.81           44.07                      1.79
                                              PS          877.41           39.10                      1.41
                                              REG         877.67           35.35                      1.15
                                              REGI        853.85           32.98
BA (tree basal area per acre)                 EXP         45.19            2.01                       1.70
                                              PS          47.12            1.77                       1.33
                                              REG         47.29            1.63                       1.12
                                              REGI        46.01            1.54
BIOMASS (total wood biomass in tons/acre)     EXP         13.51            0.69                       1.96
                                              PS          14.01            0.60                       1.51
                                              REG         14.00            0.54                       1.19
                                              REGI        13.60            0.49
CRCOV (percent crown cover)                   EXP         21.02            0.86                       1.73
                                              PS          22.03            0.77                       1.39
                                              REG         22.18            0.68                       1.09
                                              REGI        21.64            0.65
QMDALL (quadratic mean diameter in inches)    EXP         3.77             0.15                       1.26
                                              PS          3.95             0.14                       1.08
                                              REG         3.96             0.14                       1.01
                                              REGI        3.89             0.14
5. Variance Estimation
for Systematic Samples
• No design-based variance estimator for
systematic sampling
‣ in each phase, sample contains only one
of all possible grids in population/phase
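$$\theta_z = \int_{[0,1]\times[0,1]} \sum_{s \in G_1(u)} \frac{z(s)}{1/(\delta_1\delta_2)}\,du$$
$$\hat\theta_{1z}(u) = \sum_{s \in G_1(u)} \frac{z(s)}{1/(\delta_1\delta_2)}$$
$$\hat\theta_{2z}(u, d) = \sum_{s \in G_2(u,d)} \frac{z(s)}{1/(\delta_1\delta_2 h_1 h_2)}$$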
Simple Random
Sampling Approximation
• Efficiency comparison relied on simple
random sampling approximation for
variance estimation
• For large numbers of possible samples and
stationary populations, approximation is
good on average
• Deviations can be significant for individual
samples
Simple Random Sampling
Approximation (2)
• Stationarity is reasonable for model-
assisted estimators (model removes trend)
• But: only 25 possible samples in Phase II
• Approximate variance relies on asymptotic
arguments
Alternative Approach to
Assess Efficiency Gains
• Ignore Phase I variance component Var(θ̂1z (u)):
identical across all estimators
• Generate “synthetic” population and compute
exact Phase II variance over 25 samples
‣ avoid both asymptotic and simple random
sampling approximations
‣ depends on appropriateness of model
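The idea can be sketched on a toy synthetic population. The surface `z`, the covariate `x` and the grid constants below are hypothetical stand-ins, not the polynomial models fitted in the study. Because the synthetic z is known at every Phase I point, each estimator can be evaluated on all 25 possible Phase II samples and its Phase II variance computed exactly, with the working model refitted for every sample.

```python
import math

delta, h, n = (1.0, 1.0), (5, 5), (20, 30)   # illustrative design constants
u = (0.3, 0.7)                               # fixed Phase I start

def x(v):
    """Hypothetical Phase I covariate."""
    return v[0] + 0.5 * v[1]

def z(v):
    """Hypothetical synthetic population value at location v."""
    return 2.0 + 0.3 * x(v) + 0.1 * math.sin(v[0]) * math.cos(v[1])

def points(offset=(0, 0), step=(1, 1)):
    return [((u[0] + i1) * delta[0], (u[1] + i2) * delta[1])
            for i1 in range(offset[0], n[0], step[0])
            for i2 in range(offset[1], n[1], step[1])]

g1 = points()
a1 = delta[0] * delta[1]
a2 = a1 * h[0] * h[1]

def exp_estimate(g2):
    """Two-phase expansion estimator (EXP)."""
    return a2 * sum(z(s) for s in g2)

def reg_estimate(g2):
    """Model-assisted estimator with a linear working model refitted on g2."""
    xbar = sum(x(s) for s in g2) / len(g2)
    zbar = sum(z(s) for s in g2) / len(g2)
    b1 = (sum((x(s) - xbar) * (z(s) - zbar) for s in g2)
          / sum((x(s) - xbar) ** 2 for s in g2))
    b0 = zbar - b1 * xbar
    return (a1 * sum(b0 + b1 * x(s) for s in g1)
            + a2 * sum(z(s) - b0 - b1 * x(s) for s in g2))

def exact_variance(estimates):
    """Variance over the equally likely Phase II samples (divisor = number of samples)."""
    m = sum(estimates) / len(estimates)
    return sum((e - m) ** 2 for e in estimates) / len(estimates)

samples = [points((d1, d2), h) for d1 in range(h[0]) for d2 in range(h[1])]
var_exp = exact_variance([exp_estimate(g) for g in samples])
var_reg = exact_variance([reg_estimate(g) for g in samples])

# Relative efficiency of the model-assisted estimator versus EXP.
rel_eff = var_exp / var_reg
```

No asymptotic or simple random sampling approximation enters this calculation, but the answer is only as good as the synthetic population.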
Synthetic Population
• Higher order polynomial models fitted to
sample data
• logistic model for FOREST
• remaining variables fitted only on locations with $I_{FOR}(v) = 1$
• Predict variables for Phase I
• Sample means of variables approximately
match those of the original population
Approximation Bias
Simulated Variable                            Estimator   Relative Efficiency of GAM/REGI   Percent Bias of Variance Estimator
FOREST (forest/non-forest binary)             EXP         4.51                              12.62
                                              PS          3.13                              9.33
                                              REG         1.92                              24.77
                                              GAM                                           -31.01
NVOLTOT (total wood volume in cuft/acre)      EXP         1.31                              -23.71
                                              PS          1.14                              -32.11
                                              REG         1.07                              -43.88
                                              REGI                                          -52.71
BA (tree basal area per acre)                 EXP         2.02                              19.35
                                              PS          1.55                              19.81
                                              REG         1.19                              20.97
                                              REGI                                          8.93
BIOMASS (total wood biomass in tons/acre)     EXP         1.67                              17.48
                                              PS          1.12                              34.31
                                              REG         1.14                              -1.32
                                              REGI                                          -14.61
CRCOV (percent crown cover)                   EXP         1.49                              -4.04
                                              PS          1.36                              -20.93
                                              REG         1.17                              -31.38
                                              REGI                                          -44.01
QMDALL (quadratic mean diameter in inches)    EXP         2.55                              5.92
                                              PS          1.79                              23.70
                                              REG         1.20                              50.32
                                              REGI                                          13.27
6. Conclusions: model-assisted estimation
• Model-assisted framework provides flexible
approach to incorporate sophisticated
models in survey estimation
• Nonparametric models make it possible to
capture complex patterns in forest
resource data
• Use on-going spatial modeling efforts by
forestry researchers to improve tabular
data
6. Conclusions:
systematic sampling
• Systematic sampling is popular in natural
resource surveys, but does not allow for a
design-based variance estimator
• Synthetic population approach provided
ad hoc solution in this case
• On-going research: predict design-based
variance under nonparametric model (Li,
2006)
• Almost-final version of this paper available at
http://www.public.iastate.edu/~jopsomer/research.html
• Contact info: [email protected]