Lesser, Virginia M. (1992). "A Comparison of Periodic Survey Designs Employing Multi-Stage Sampling."

A COMPARISON OF PERIODIC SURVEY DESIGNS EMPLOYING
MULTI-STAGE SAMPLING
by
Virginia M. Lesser
A dissertation submitted to the faculty of the University of North Carolina in partial
fulfillment of the requirements for the degree of Doctorate of Public Health in the
Department of Biostatistics.
Chapel Hill
1992
Approved by:
Advisor
Reader
Reader
VIRGINIA M. LESSER. A Comparison of Periodic Survey Designs Employing
Multi-Stage Sampling (Under the direction of William D. Kalsbeek.)
ABSTRACT
Due to increasing environmental awareness, the Environmental Protection Agency
(EPA) has determined the need to establish the current status and future changes of our
nation's ecological resources on both regional and national scales. In order to accomplish
these goals, a long term survey is proposed to monitor eight different ecosystems across the
United States. This research was initiated to determine the best type of sample design to
assess the agricultural component of this monitoring program.
Three types of panel survey designs, a longitudinal design and two types of
mixed-longitudinal design, were considered in this research. The design options were
compared with regard to cost-efficiency and other issues, such as nonsampling errors,
that need to be considered in planning a long term survey. Each of the three panel
designs was evaluated within the framework of two-stage sampling. In order to compare
precision, the underlying variance of a simple estimator of mean difference was derived
and compared among the design options. A cost model, developed for each design,
accounted for the different rotational sampling schemes across designs. The cost models
were combined with the error models to determine measures of cost-efficiency. The
cost-efficiencies for each design were computed over a range of statistical and cost
parameters in order to determine the most cost-efficient design over a range of conditions.
ACKNOWLEDGEMENTS
First, I wish to express my gratitude to my major professor, Dr. William D.
Kalsbeek, for his guidance, support, and supervision of this research. In addition, I would
like to thank my committee members, Dr. Lisa LeVange, Dr. Ron Helms, Dr. Gary Koch,
and Dr. Walter Heck, for their support and constructive comments on this dissertation.
I also wish to express my sincere gratitude to Dr. John O. Rawlings from N.C. State
University, for his guidance and support throughout my graduate education. I want to also
especially thank Dr. Rick A. Linthurst for his encouragement, friendship, and support in the
pursuit of my doctoral degree.
I am grateful to both John and Rick for sparking my
interest in environmental statistics.
I also express my thanks to my loving family for their support and encouragement
throughout my graduate education.
Finally, I extend my gratitude to all my dear friends
for their friendship and support through these years.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
Chapter
I. INTRODUCTION AND LITERATURE REVIEW
    1.1 Review of Periodic Survey Designs
        1.1.1 Description of Periodic Survey Designs
        1.1.2 Advantages and Disadvantages of Periodic Survey Designs
    1.2 Periodic Survey Design Comparisons
        1.2.1 Statistical Efficiency Comparisons
        1.2.2 Consideration of Sampling Design
    1.3 USDA - EPA Examples
        1.3.1 Description of the Design Options
        1.3.2 Initial Comparison of the Design Options
    1.4 Proposed Research
II. PRECISION COMPARISON AMONG DESIGNS
    2.1 Consideration of Possible Estimators
    2.2 General Assumptions and Notation
    2.3 Results and Derivation of the Underlying Variance
        2.3.1 Impact of Sample Overlap
        2.3.2 Derivation of the Underlying Variance
    2.4 Simplification of Underlying Variance
        2.4.1 For Large N
        2.4.2 In Terms of Temporal Correlation
        2.4.3 In Terms of Roh
        2.4.4 For Equal Year Variance
    2.5 Comparison of the Underlying Variance Among Designs
    2.6 Alternative Estimators of Mean Difference
III. COST MODEL DEVELOPMENT AND OPTIMUM STAGE ALLOCATION
    3.1 Cost Model Development
        3.1.1 Cost Models Considered
        3.1.2 Proposed Cost Model
    3.2 Optimum Sample Allocation Among Stages
        3.2.1 Cost and Error Models
        3.2.2 Analytical Methods
        3.2.3 Empirical Methods
    3.3 Sample Allocation Assessment
IV. COST-EFFICIENCY RESULTS
    4.1 Measure of Cost-Efficiency
    4.2 Cost-Efficiency Comparisons
        4.2.1 Proposed Comparative Measures
        4.2.2 Comparisons of Cost-Efficiency Among Designs
            4.2.2.1 Relative Efficiencies for b=1 Using Average Cost Per Design
            4.2.2.2 Relative Efficiencies for b=1 Using a Fixed Cost
            4.2.2.3 Relative Efficiencies for b>1
        4.2.3 Effect of Variable Cost and Variance on Cost-Efficiency Results
            4.2.3.1 Variance Comparisons at Various Cost Levels
            4.2.3.2 Cost Comparisons at Various Variance Levels
    4.4 Summary of Cost-Efficiency Results
V. SUMMARY AND DISCUSSION
    5.1 Discussion of Findings
        5.1.1 Sampling Error
        5.1.2 Other Sources of Error
    5.2 Self-Assessment of Cost-Efficiency Analysis
        5.2.1 Assessment of Future Work
        5.2.2 Alternative Panel Designs
APPENDIX A: Procedures for Development of NASS and EMAP Area Frame and Sampling Strategies
APPENDIX B: Results of the Cost-Efficiency Analysis for b=2 and b=3
REFERENCES
LIST OF TABLES
Table 2.1: Notation used for analytical work
Table 3.1: Cost estimates for each design option
Table 3.2: Cost estimates for each design option assuming 800 segments are monitored
across 50 states over 12 years
Table 3.3: Estimates of total cost over a scenario of different years and different number
of sample points per year for all designs
Table 3.4: Estimates of a_opt per year for different values of b
Table 4.1: Ratio of cost-efficiencies for the ENASS to EMAP design, assuming an annual
average cost for each design, over a range of annual autocorrelation, ρ
Table 4.2: Ratio of cost-efficiencies for the ENASS to NASS design, assuming an annual
average cost for each design, over a range of annual autocorrelation, ρ
Table 4.3: Ratio of cost-efficiencies for the NASS to EMAP design, assuming an annual
average cost for each design, over a range of annual autocorrelation, ρ
Table 4.4: Ratio of cost-efficiencies for the ENASS to EMAP design, assuming an annual
fixed cost for each design, over a range of annual autocorrelation, ρ
Table 4.5: Ratio of cost-efficiencies for the ENASS to NASS design, assuming an annual
fixed cost for each design, over a range of annual autocorrelation, ρ
Table 4.6: Ratio of cost-efficiencies for the NASS to EMAP design, assuming an annual
fixed cost for each design, over a range of annual autocorrelation, ρ
Table 5.1: Relative comparison of all sources of survey error among designs
Table 5.2: Ratio of cost-efficiencies for the Lesser design to the EMAP design, assuming
an annual average cost for each design, over a range of annual autocorrelation, ρ
Table 5.3: Ratio of cost-efficiencies for the Lesser design to the ENASS design, assuming
an annual average cost for each design, over a range of annual autocorrelation, ρ
Table 5.4: Ratio of cost-efficiencies for the Lesser design to the NASS design, assuming
an annual average cost for each design, over a range of annual autocorrelation, ρ
LIST OF FIGURES
Figure 1.1: Cross-sectional, longitudinal, and mixed-longitudinal designs
Figure 1.2: Three design options under consideration for the agroecosystem component of
EMAP
Figure 3.1: V(C)' versus b for the EMAP design (q=0)
Figure 3.2: V(C)' versus b for the EMAP design (q=1)
Figure 3.3: V(C)' versus b for varying fixed cost for the NASS design (q=0)
Figure 4.1: ENASS to EMAP variance ratio by cost for four year comparison
Figure 4.2: ENASS to NASS variance ratio by cost for four year comparison
Figure 4.3: NASS to EMAP variance ratio by cost for four year comparison
Figure 4.4: ENASS to EMAP variance ratio by cost for three year comparison
Figure 4.5: ENASS to NASS variance ratio by cost for three year comparison
Figure 4.6: NASS to EMAP variance ratio by cost for three year comparison
Figure 4.7: ENASS to EMAP variance ratio by cost for eight year comparison
Figure 4.8: NASS to EMAP variance ratio by cost for eight year comparison
Figure 4.9: ENASS to EMAP cost ratio by variance for four year comparison
Figure 4.10: ENASS to NASS cost ratio by variance for four year comparison
Figure 4.11: NASS to EMAP cost ratio by variance for four year comparison
Figure 4.12: ENASS to EMAP cost ratio by variance for three year comparison
Figure 4.13: ENASS to NASS cost ratio by variance for three year comparison
Figure 4.14: NASS to EMAP cost ratio by variance for three year comparison
Figure 4.15: ENASS to EMAP cost ratio by variance for eight year comparison
Figure 4.16: NASS to EMAP cost ratio by variance for eight year comparison
Figure 5.1: Alternative design option proposed for the agroecosystem component of EMAP
CHAPTER 1
INTRODUCTION AND LITERATURE REVIEW
Much information is given in the literature on longitudinal and cross-sectional
designs, and more recently on mixed-longitudinal designs. The differences among these survey
types relate to the design of each survey, which in turn affects the survey estimates.
Specifically, the efficiency of a mixed-longitudinal design compared to a longitudinal design
with interpenetrating samples is evaluated in this work.
The first section of the literature review presents a summary of the two types of
panel surveys, the longitudinal and mixed-longitudinal designs. This section reviews the
advantages and disadvantages of these designs with respect to the precision of the estimates.
The second section reviews the research previously published on comparisons of the
efficiencies of these designs. This section also includes a summary of research that
incorporates the structure of the survey design in the analysis of the data. This research was
initiated to address a design dilemma currently being examined in the context of a national
monitoring program to be implemented by the Environmental Protection Agency (EPA).
The design options being considered, a longitudinal design with interpenetrating replicates
and a mixed-longitudinal design with planned missing observations, are discussed in the
third section. In the last section, a proposal for this research to determine the optimum
design for this program is discussed.
1.1 Review of Periodic Survey Designs
1.1.1 Description of Periodic Survey Designs
A number of survey types exist to provide measurements of interest in the
population (Bailar, 1989). The cross-sectional design collects information using a different
(independent) sample of individuals at each point on a time scale.
This type of survey
provides data to obtain estimates of status. Repeated and longitudinal surveys are recurring
surveys, which collect information using the same set of individuals at each time point.
The repeated survey, which is shorter in duration than a longitudinal survey, additionally
provides estimates of change, similar to the longitudinal survey. The cross-sectional design
is shown in Figure 1.1 (A) and the repeated/longitudinal design is shown in Figure 1.1 (B).
Kodlin and Thompson (1958) discussed the difficulties of obtaining observations in a
longitudinal design, specifically the practical problems associated with obtaining repeated
observations on the same subject. Bell (1953) employed a series of cross-sectional studies in
order to accelerate the longitudinal approach in obtaining estimates of growth norms for
children at varying ages.
However, the use of many panels in estimating growth rates
causes confounding of change over time with the panel (Baltes, 1968; Goldstein, 1968).
Rao and Rao (1966) noted the limitations of the longitudinal and cross-sectional
designs in estimating the growth norms and rates of school age children. Neither design
demonstrated greater precision for both estimates. They proposed a mixture of these two
designs, which they called the linked cross-sectional design (also referred to as the
mixed-longitudinal design). This design employs partial overlap, as shown in Figure 1.1 (C).
The authors argued that the disadvantages associated with the longitudinal design,
such as the expense and the complex planning involved with this design, can be partially
offset by their proposed design. Prahl-Anderson and Kowalski (1973) summarized the
concerns with longitudinal and cross-sectional studies and agreed that the best solution to
these issues is the mixed-longitudinal design proposed by Rao and Rao (1966).
Figure 1.1: Cross-sectional, longitudinal, and mixed-longitudinal designs.
[Figure: three panel-by-year grids (rows: Panel; columns: YEAR 1-8). A. Cross-sectional design.
B. Longitudinal design. C. Mixed-longitudinal design. X denotes sample observed.]
The advantages and disadvantages of these three designs can be evaluated with
respect to meeting the objectives of the surveys in which they are implemented. If the
population characteristics are unchanging, a single cross-sectional survey gathering
information at one point of time would be the simplest design, since these results would
apply over time. However, this design would not meet the objectives of a survey of a
changing population, since the single cross-sectional design does not account for the
changing dynamics of the population. Duncan and Kalton (1987) evaluated the
longitudinal, mixed-longitudinal, and cross-sectional designs with respect to the following
objectives, which typically justify the use of these designs:
(a) to provide estimates of population parameters at distinct time points or during
distinct periods of time within which changes are treated as negligible;
(b) to provide estimates of population parameters averaged across a period of time;
(c) to measure net change (i.e., change at the aggregate level);
(d) to measure various components of individual change, which include gross change
at the element level between two time points, the average change for each individual,
and a measure of instability for each individual;
(e) to aggregate data for individuals over time;
(f) to measure the frequency, timing and duration of events occurring within a given
time period;
(g) to cumulate samples over time, especially samples of rare populations.
They concluded that the choice of design depends on the objectives of concern, on practical
considerations, such as feasibility and cost, and on the level of precision provided by the
design.
1.1.2 Advantages and Disadvantages of Periodic Survey Designs
Longitudinal designs have two principal motivations: to increase the precision of
treatment contrasts by eliminating interindividual variation and to examine the individual's
changing response over time (Cook and Ware, 1983). Duncan and Kalton (1987) suggest
that the major strengths of the longitudinal survey are that it enables components of
individual change to be measured, allows the summation of a variable across time to be
measured, and provides more precise estimates of temporal change than a
mixed-longitudinal or cross-sectional study of the same size.
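The precision claim can be illustrated with a standard textbook result, offered here as a sketch rather than as the derivation used later in this work. Assume simple random samples of size n at each of two time points, a common element variance σ², a correlation ρ between repeated measurements on the same element, and a fraction P of elements common to both samples (all of these symbols are assumptions introduced for illustration):

```latex
\operatorname{Var}(\bar{y}_2 - \bar{y}_1)
  = \frac{\sigma^2}{n} + \frac{\sigma^2}{n} - 2\,\frac{P\rho\,\sigma^2}{n}
  = \frac{2\sigma^2}{n}\,(1 - P\rho)
```

Under these assumptions and with ρ > 0, the variance of the estimated change is smallest when P = 1 (complete overlap, as in a longitudinal design) and largest when P = 0 (independent cross-sections); a partially overlapping design falls between the two.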
Major disadvantages of the longitudinal design are losses through nonresponse and
the failure to bring elements that enter the population over time into the sample. This last
disadvantage is a basic sampling problem associated with longitudinal data, as discussed by
Goldfarb (1960) and Namboodri (1978). Since the population is to be followed over an
extended period of time, a sample of a population-to-be is needed. However, the sample
obtained in the longitudinal study is a sample of the current population and not of a future
population.
There is a difference of opinion regarding cost in a longitudinal survey. Pearson and
Boruch (1980) and Kodlin and Thompson (1958) argue that the cost associated with this
type of survey is much greater than that which would be incurred by carrying out the same
number of interviews with replicated cross-sections. This is attributed to the amount of
effort devoted to follow-up, particularly tracking individuals through the population.
However, other authors argue that longitudinal surveys are less expensive than repeated
cross-sectional surveys. For example, Goldfarb (1960) suggests that a first time
interview will be more expensive than a second time interview in most studies.
The answer to this debate will depend on the particular study under
investigation, particularly on the level of effort devoted to follow-up.
For a mixed-longitudinal design, sample elements are rotated into the survey,
evaluated for a fixed number of time points, and later rotated out of the survey. An
example of a survey utilizing this design is the Current Population Survey (CPS).
Individuals are evaluated on a monthly basis for four months, rotated out for eight months,
and finally rotated back into the survey for another four months (U.S. Bureau of the
Census, Technical Paper 40). New panels are continually added to the survey to offset the
panels which are eventually rotated out of the survey. Duncan and Kalton (1987) discuss
the advantages associated with this survey. The problems of panel conditioning and panel
loss are reduced in comparison with a longitudinal survey, since there is less burden on the
participants. The introduction of new samples helps to maintain an up-to-date sample of a
changing population over the course of the study. Data accumulated over time can improve
evaluations of rare populations. Since new individuals are continually rotated into this
type of design, it is more likely to obtain a higher number of individuals from rarer
subgroups of the population than a longitudinal design.
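The four-in, eight-out, four-in rotation described above can be sketched as follows. This is a minimal illustration, not the CPS's own procedure: months are plain integers, each panel is labelled by the month in which it enters, and one new panel is assumed to enter every month (an assumption made here so that overlap can be computed). The function names are hypothetical.

```python
def cps_in_sample_months(entry_month: int) -> list[int]:
    """Months in which a panel entering at `entry_month` is interviewed (4-8-4 pattern)."""
    first_spell = [entry_month + k for k in range(4)]        # four consecutive months in
    second_spell = [entry_month + 12 + k for k in range(4)]  # eight months out, four more in
    return first_spell + second_spell


def panels_in_sample(month: int) -> list[int]:
    """Entry months of all panels interviewed in `month`.

    Assumes one new panel enters every month (illustrative assumption).
    """
    return [e for e in range(month - 15, month + 1) if month in cps_in_sample_months(e)]


def overlap_fraction(month_a: int, month_b: int) -> float:
    """Share of the panels interviewed in `month_a` that are also interviewed in `month_b`."""
    a = set(panels_in_sample(month_a))
    b = set(panels_in_sample(month_b))
    return len(a & b) / len(a)


month_to_month = overlap_fraction(20, 21)  # consecutive months
year_to_year = overlap_fraction(20, 32)    # same month one year later
```

Under these assumptions eight panels are in the sample in any month, three quarters of them are shared between consecutive months, and half are shared between the same months of consecutive years.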
The major disadvantage of this design is the inability to measure components of
individual change over an extended time period. There is also a lack of information with
which to aggregate data for individuals across time. Since no individual participates for the
duration of the study, individual change cannot be estimated. Only net change can be
estimated with this design, since individual points are rotated out of the system.
Cross-sectional surveys provide estimates of status at distinct time points. If a
series of cross-sectional surveys is conducted, these estimates could be averaged to provide
estimates across time. However, no estimates of trends or change can be obtained with this
design.
1.2 Periodic Survey Design Comparisons
1.2.1 Statistical Efficiency Comparisons
Panel surveys, in which similar measurements are made on the same sample each
year, provide data for the analyses of change. Longitudinal and mixed-longitudinal surveys
are both considered types of panel surveys (Kasprzyk et al., 1989). Medical researchers have
employed longitudinal surveys to investigate individual change over time, and analyses of
these data have been well documented (Hoel, 1964; Goldstein, 1979; Nesselroade and Baltes,
1979; Ware, 1985; Laird and Ware, 1983). Procedures for testing hypotheses in
mixed-longitudinal designs have also been documented (Kleinbaum, 1973; McCarroll, 1985; Clark
and Woolson, 1987).
Comparisons of the efficiency of longitudinal and mixed-longitudinal designs have
been investigated by Rao and Rao (1966), Machin (1975), Woolson and Leeper (1980), and
Berger (1986). For the two time periods under investigation by Rao and Rao (1966), the
proportion of individuals to measure at both time periods, which will result in maximum
precision of the estimates, was obtained. They assumed a covariance structure with equal
variances and covariances dependent only on the separation between observations. This
proportion varied depending on which estimate, either norm or growth rate, was being
considered.
Assuming that the data are generated by a multivariate normal distribution, that linear
growth is estimated, and that a generalized variance of the estimated parameters is used as
the comparison criterion, Machin (1975) investigated the efficiencies of the longitudinal
design relative to the mixed-longitudinal and cross-sectional designs. The generalized
variance for these designs was derived assuming the covariance structure was first order
auto-regressive. He also assumed overlapping design points among the panels. His results
indicated that when the correlation parameter in the covariance matrix is greater than zero,
the mixed-longitudinal and cross-sectional designs were more efficient than the longitudinal
design. He attributed this increase in precision to the increased number of subjects involved
in those designs as compared to the longitudinal design.
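The comparison criterion can be made concrete with a small numerical sketch. It is not Machin's calculation: it assumes unit variance, equally spaced time points, a first order auto-regressive correlation ρ^|i-j| within a subject, independence between subjects, and generalized least squares estimation of an intercept and slope. All function names are hypothetical.

```python
def ar1_inverse(n_times: int, rho: float) -> list[list[float]]:
    """Closed-form inverse of the AR(1) correlation matrix with entries rho**|i-j|."""
    if n_times == 1:
        return [[1.0]]
    scale = 1.0 / (1.0 - rho * rho)
    inv = [[0.0] * n_times for _ in range(n_times)]
    for i in range(n_times):
        interior = 0 < i < n_times - 1
        inv[i][i] = scale * (1.0 + rho * rho if interior else 1.0)
        if i + 1 < n_times:
            inv[i][i + 1] = inv[i + 1][i] = -rho * scale
    return inv


def information(times: list[float], rho: float) -> list[list[float]]:
    """2x2 information matrix X' R^{-1} X for (intercept, slope), one subject."""
    weights = ar1_inverse(len(times), rho)
    design = [(1.0, t) for t in times]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for i, row_i in enumerate(design):
        for j, row_j in enumerate(design):
            for p in range(2):
                for q in range(2):
                    info[p][q] += row_i[p] * weights[i][j] * row_j[q]
    return info


def generalized_variance(schedules: list[list[float]], rho: float) -> float:
    """Determinant of the covariance matrix of (intercept, slope) over independent subjects."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for times in schedules:
        info = information(times, rho)
        for p in range(2):
            for q in range(2):
                total[p][q] += info[p][q]
    return 1.0 / (total[0][0] * total[1][1] - total[0][1] * total[1][0])


times = [-1.0, 0.0, 1.0]
longitudinal = [times]                  # one subject measured at all three times
cross_sectional = [[t] for t in times]  # three subjects, one measurement each
ratio = (generalized_variance(longitudinal, 0.5)
         / generalized_variance(cross_sectional, 0.5))
```

With three time points and ρ = 0.5, the ratio is 1.35, so in this toy setting the cross-sectional layout has the smaller generalized variance for the same number of measurements; at ρ = 0 the two layouts coincide.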
Woolson and Leeper (1980) extended Machin's work to non-overlapping designs.
Assuming a first order auto-regressive covariance structure in estimating linear growth, they
also compared the ratio of the generalized variances of the longitudinal and
mixed-longitudinal designs. They showed that the mixed-longitudinal design is more efficient than
the longitudinal design for all correlations greater than zero, but less efficient than the
overlapping designs described by Machin. Other results indicated that for positive
correlations the efficiency is increased by increasing the number of independent subsets
rotated into the survey.
Berger (1986) compared the efficiencies of the longitudinal, mixed-longitudinal, and
cross-sectional designs when estimating a polynomial response over time. He suggested that
the efficiency of a design depends on the covariance structure, the size of the correlation, the
number of time points at which measurements are taken, and the degree of the polynomial. He
assumed non-overlapping time points and computed relative efficiencies by comparing the
ratio of the generalized variances of the various designs. He considered three covariance
patterns: the simplex covariance structure, the circumplex structure, and a
uniform covariance structure. His results indicated that the assumed degree of the
polynomial is crucial for the choice of the most efficient design. As the degree of the polynomial
approaches the number of time points, the longitudinal design is most efficient. For a linear
function and small positive correlations among measurements, the cross-sectional design is
most efficient.
1.2.2 Consideration of Survey Design
The work mentioned in the previous section has been developed on the assumption
of simple random sampling. However, independence of sample elements
is seldom found in survey work. For example, complex designs, such as cluster designs,
typically contain homogeneities within clusters. Kish and Frankel (1974) conclude that in
designs such as these, the independence assumption fails. Therefore, the results described
above comparing the designs may be invalid.
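The effect of within-cluster homogeneity on precision can be illustrated with Kish's design effect, sketched here under the assumption of equal-sized clusters. The names `cluster_size` (elements sampled per cluster) and `roh` (rate of homogeneity within clusters), and the numerical values, are chosen only for illustration.

```python
def design_effect(cluster_size: float, roh: float) -> float:
    """Variance inflation relative to simple random sampling: 1 + (b - 1) * roh."""
    return 1.0 + (cluster_size - 1.0) * roh


def effective_sample_size(n: int, cluster_size: float, roh: float) -> float:
    """Size of the simple random sample giving the same variance as n clustered elements."""
    return n / design_effect(cluster_size, roh)


# Illustrative values only: 800 elements in clusters of 5 with roh = 0.1.
deff = design_effect(5, 0.1)
n_eff = effective_sample_size(800, 5, 0.1)
```

When each cluster contributes a single element (`cluster_size` = 1), the design effect is 1 regardless of `roh`; any positive homogeneity with larger clusters inflates the variance above its simple random sampling value.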
Methodologies have been developed for analyzing data that take into account the
survey design. For example, methods to compute statistics collected from the Current
Population Survey have been developed (U.S. Bureau of the Census, Technical Paper 40).
Methodologies combining other types of data analysis with the complex design have been
discussed by others. Nathan and Holt (1980) and Holt and Scott (1981) discuss regression
analyses of data from complex surveys. Rao et al. (1989) discuss the analysis of categorical
response data within a complex survey design.
Appropriate methodologies need to be developed to compare a longitudinal,
mixed-longitudinal, or cross-sectional design nested within a survey design. The primary focus of
this research is to address this question within a specific design context. The designs to be
compared are discussed in the next section.
1.3 USDA - EPA Examples
1.3.1 Description of the Design Options
With increasing concern about the status of environmental conditions, it has become
critical to initiate monitoring programs to provide quantitative, scientific assessments of the
complex effects of pollutants on ecosystems. In order to monitor ecological status and
trends and establish baseline environmental conditions against which future changes can be
documented, EPA established a program to coordinate these efforts called the
Environmental Monitoring and Assessment Program (EMAP) (Overton et al., 1990;
Bromberg, 1990). The major objectives of the EMAP program are:
(a) to estimate current status, extent, changes, and trends in indicators of the
condition of the nation's ecological resources on a regional basis with known
confidence;
(b) to monitor indicators of pollutant exposure and to seek associations between
human-induced stresses and ecological conditions that identify possible causes of
adverse effects;
(c) to provide periodic statistical summaries and interpretive reports on status and
trends to the EPA Administrator and the public.
The program intends to assess the status of a number of different ecological resources,
including surface waters, wetlands, forests, near coastal wetlands, agroecosystems, and arid
lands. EMAP is designed to provide statistically unbiased estimates of status, trends, and
relationships with quantifiable confidence limits on national and regional scales over periods
of years to decades over all these resources. Presently, an overall design strategy is under
consideration to sample all resources.
The agroecosystem component of EMAP has initiated the work discussed in this
research.
Each EMAP component (e.g., agroecosystem, forest, wetland) has the task of
reviewing past and currently operational monitoring programs with similar objectives as the
EMAP program. In addition, these programs are to be compared with the design strategy
proposed within the EMAP program, designed to sample all resource groups, to determine
the best design to monitor each EMAP component across the US.
The objective of this
research is to address this issue for the agroecosystem component of EMAP.
Three design options will be examined in this research, which are illustrated in
Figure 1.2. In this figure, N represents the sample observed. The year of the program is
shown in the superscript, as $N^j$ for the j-th year. The panel or replicate number, which
will be explained in subsequent paragraphs of this section, is shown in the subscript, as
the i-th panel. Therefore, the sample observed in the j-th year for the i-th panel is
denoted as $N_i^j$. Each design option provides an area frame covering the total area of the
US (Kish, 1965). A description of each of the design options follows.
A. NASS Design.
The first design option, presented in Figure 1.2 (A), is a currently operational
monitoring program conducted jointly by the United States Department of Agriculture -
Agricultural Research Service (USDA-ARS) and the National Agricultural Statistics Service
(NASS) (Cotter and Nealon, 1987; USDA Pub. 1308). This is an annual national scale
survey which collects and disseminates current statistics on the Nation's agriculture. In
order to ensure complete coverage of land area in the US, NASS uses an area sampling
frame. Since 1965, NASS has developed area frame surveys for each state in order to
provide yearly agricultural estimates by state, which are combined for national estimates.
The first step in the development of an area frame for a state is the stratification of
land across the state. By dividing the land area of a state into strata and optimally
allocating the total sample to strata, the precision of survey estimates is improved.
Satellite imagery, aerial photography, topographic maps, and county highway maps are
some of the materials utilized to perform the stratification process. To simplify this
procedure, stratification is performed by county across the state. Quality boundaries,
defined as boundaries easily found and identified by an interviewer (e.g., paved highways,
rivers, railroad tracks), are used in defining strata. A more detailed description of
this process, which describes some of these strata, is presented in Appendix A.
Once stratification is completed for each county, the strata are subdivided into
primary sampling units (PSUs). A PSU is defined as a land area constructed by NASS
that is identified by quality boundaries, as described above for the strata. The number
of PSUs in each stratum varies, causing the size of the PSU to vary as well. For example, the
size of a PSU in the general cropland strata is 6-8 square miles, while the PSU size in
residential areas is generally 0.5-1 square miles. Another level of stratification is utilized,
which involves ordering the population of sampling units using a criterion of agricultural
similarity. These substrata improve the precision of the estimates of individual
commodities, especially in areas of intensive cultivation where cropland content varies across
the state.
Figure 1.2: Three design options under consideration for the agroecosystem component of
EMAP.
[Figure: panel-by-year layouts (rows: Panel; columns: YEAR 1-12) for A. NASS Design,
B. EMAP Design, and C. ENASS Design. N represents the observed sample; the superscript
represents year and the subscript represents panel number, denoting $N_i^j$.]
The PSUs are finally subdivided into secondary sampling units (SSUs, also referred
to as ultimate sampling units or segments) of approximately uniform size. A segment is
defined as a land area of approximately one square mile that is easily
identified by quality boundaries. The segments are the actual units selected for sampling.
The first stage of sampling is to randomly select a PSU from within each substratum with
probability proportional to the number of segments within each PSU. Segment boundaries
are delineated only for the selected PSUs. At the second stage of sampling, a segment is
randomly selected from each chosen PSU.
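The two sampling stages just described can be sketched in a few lines. This is an illustration under stated assumptions, not NASS's operational procedure: the PSU labels and segment counts are invented, one PSU and one segment are drawn per substratum, and the second-stage draw is assumed to give every segment in the chosen PSU an equal chance.

```python
import random


def select_segment(psu_segment_counts: dict[str, int], rng: random.Random) -> tuple[str, int]:
    """Select one segment from a substratum in two stages."""
    psus = list(psu_segment_counts)
    sizes = [psu_segment_counts[p] for p in psus]
    # Stage 1: choose a PSU with probability proportional to its number of segments.
    psu = rng.choices(psus, weights=sizes, k=1)[0]
    # Stage 2: choose one segment within that PSU with equal probability.
    segment = rng.randrange(psu_segment_counts[psu])
    return psu, segment


def segment_selection_probability(psu_segment_counts: dict[str, int], psu: str) -> float:
    """Overall chance that a particular segment in `psu` is the one selected."""
    total = sum(psu_segment_counts.values())
    stage1 = psu_segment_counts[psu] / total
    stage2 = 1.0 / psu_segment_counts[psu]
    return stage1 * stage2


# Hypothetical substratum: labels and segment counts are invented for illustration.
substratum = {"PSU-A": 6, "PSU-B": 8, "PSU-C": 7}
psu, segment = select_segment(substratum, random.Random(1992))
```

Under these assumptions the two stages combine so that every segment in the substratum has the same overall selection probability (1/21 in this example), whichever PSU it belongs to.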
Once the required number of segments has been selected, all segments
are assigned to replicates (panels). Each replicate is a small portion of the sample, which
could provide an independent estimate for the entire sample, although with less precision.
This allows the rotation of sample segments surveyed each year, which NASS refers to as
replicated sampling. Each year one of these replicates is introduced into the survey and
measured annually for five years. At the same time, a replicate that has been measured
for five years is rotated out of the survey. Thus, the same number of segments is
measured each year. Approximately 20 percent of the replicates in each land-use stratum
are replaced annually. This results in an 80 percent overlap of panel elements from year to
year. A detailed description of these procedures is given in Appendix A.
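The rotation arithmetic can be checked with a short sketch. It assumes equal-sized replicates and a survey that has been running for at least five years (steady state); each replicate is labelled by the year in which it enters, and the function names are hypothetical.

```python
def panels_in_year(year: int, years_in_sample: int = 5) -> set[int]:
    """Replicates measured in `year`, each labelled by its entry year."""
    return set(range(year - years_in_sample + 1, year + 1))


def overlap_fraction(year_a: int, year_b: int, years_in_sample: int = 5) -> float:
    """Share of the replicates measured in `year_a` that are also measured in `year_b`."""
    a = panels_in_year(year_a, years_in_sample)
    b = panels_in_year(year_b, years_in_sample)
    return len(a & b) / len(a)


one_year_apart = overlap_fraction(8, 9)     # four of five replicates in common
four_years_apart = overlap_fraction(8, 12)  # one of five replicates in common
five_years_apart = overlap_fraction(8, 13)  # no replicates in common
```

Under these assumptions the overlap falls by 20 percentage points for each additional year of separation, from 80 percent for consecutive years to zero for years five or more apart.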
B. EMAP Design.
The frame of the EMAP design imposes an imaginary hexagon pattern over the
entire US. The hexagon grid consists of 12,600 hexagons, each composed of approximately
15
640 square hectares of land area. The total area of the US is enclosed by these hexagons,
thus providing an area frame with complete coverage of land across the US. Hexagons of a
smaller size, one-sixteenth the size of the larger hexagon, are defined at the center of the
larger hexagon. The collection of the smaller hexagons is considered the frame population
for the EMAP design.
The EMAP design is based on a hierarchical structure, in which distinct tiers are
identified. At the first tier, landscape characterizations will be made on all sample points
(12,600 hexagon areal units) to provide baseline descriptions of all ecological resources, to
identify populations of concern, and to provide a base for assessment of land-use change
(Overton et al., 1990). The characterizations at this tier will be based primarily on remote
sensing data.
A subsample of the 12,600 hexagons, which contains a resource unit for that
ecosystem, will be selected from the Tier 1 sample in the EMAP design. The boundaries of
these hexagons, which consist of 40 square kilometers, are imaginary.
Approximately 3200
hexagons will be selected at this tier. At this point, the selection process utilizes the NASS
frame. The centroid of the hexagon will be used to identify the NASS PSUs at this location.
These PSUs are the same PSUs discussed for the NASS design.
They have identifiable
boundaries and usually consist of a 6-8 square mile area. These PSUs are further subdivided
into NASS segments and the segment enclosing the centroid is considered the sample
segment or SSU. These are the same SSUs defined in the NASS design. The SSUs have
identifiable boundaries and average one square mile in area.
Thus, the two-stage design
discussed in the NASS design is nested into the second tier of the EMAP design.
A suite of indicators to assess health for each ecosystem will be measured at the
segment level.
In this design option, the 3200 points will be separated into four
interpenetrating subsamples (replicates), each consisting of 800 points.
These
replicates are visited in successive years, one replicate per year. In the fifth year of the survey, the first replicate is
remeasured.
Thus, the four interpenetrating replicates are measured within a repeating
cycle length of four years.
Figure 1.2 (B) illustrates this sampling pattern. The EMAP
design will be referred to as a longitudinal design with interpenetrating replicates.
This
approach ensures a nearly uniform spatial coverage for each annual subsample, thus
providing annual estimates of population parameters over every geographical region. The
measurements made at Tier 2 will form the basis for producing regional and national
estimates of status, change, and trends, and for identifying additional subpopulations of
interest.
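The four-year visiting cycle of the interpenetrating replicates can be sketched as follows. This is a minimal illustration, assuming survey years are numbered from 1 and replicates from 1 to 4; the function name is hypothetical.

```python
N_POINTS = 3200      # Tier 2 sample points, as stated in the text
N_REPLICATES = 4     # interpenetrating replicates


def emap_replicate(year, n_replicates=N_REPLICATES):
    """Replicate (1-based) visited in survey year `year` (1-based)."""
    if year < 1:
        raise ValueError("year must be >= 1")
    return (year - 1) % n_replicates + 1


points_per_replicate = N_POINTS // N_REPLICATES
# Which replicate is visited in each of the first eight survey years.
schedule = {year: emap_replicate(year) for year in range(1, 9)}
print(points_per_replicate, schedule)
```

Under these assumptions the first replicate is visited in years 1, 5, 9, and so on, so a given point is remeasured only every fourth year.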
C. ENASS Design
The ENASS design is very similar to the NASS design. The area frame and sample
segments selected are the same as discussed for the NASS design.
The difference in the
designs is due to the number of repeated measurements on the same observational unit.
Sampling segments are measured only at the first and last year of the NASS rotating cycle.
Figure 1.2 (C) illustrates this sampling strategy.
1.3.2 Initial Comparisons of the Design Options
The pattern of remeasuring the selected segments is primarily what distinguishes the
three design options.
This research will focus on the comparison of this rotation pattern
between the three design options. With reference to the previous section, the EMAP design
is a longitudinal survey. The interpenetrating replicates allow for repeated measurements on
the same sampling unit every fourth year. The NASS design is a mixed-longitudinal design
with an 80 percent overlap of panel elements from year to year. The ENASS design is also
a mixed-longitudinal design with a 50 percent overlap every fourth year, but has no
year to year overlap.
The replicated sampling strategy utilized by NASS is also referred to as a repeated
survey with partial overlap. The advantages of using this strategy instead of an alternative,
such as a longitudinal design, include (USDA Pub.1308):
(a) Sample rotation.
This scheme reduces respondent burden caused by repeated
interviewing, avoids the expense of selecting a completely new area sample each year,
and provides reliable measures of change in the production of agricultural
commodities from year to year through the use of the ratio estimator.
(b) Methodology research. This sampling provides the capability to test alternative
survey procedures or evaluate current methodology since different replicates can be
assigned to the operational methods.
(c) Quality assurance. Replicated sampling facilitates quality assurance analysis by
allowing data comparisons among years in order to determine if significant differences
in survey processes exist over time.
(d) Sample management. Replication allows easy management of the sample due to
the replicate numbering scheme.
(e) Variance estimation. Replicated sampling provides a simple, unbiased method for
estimating the sampling variance using replicate means or totals.
(f) Rotation effects.
Replication readily provides NASS the vehicle for evaluating
sample rotation effects.
An advantage of having the agroecosystem component of EMAP follow the EMAP
design is consistency with the other components (e.g., forests, wetlands). Future analyses of
EMAP data could involve integrating data across ecosystems to investigate possible
associations.
Since all ecosystems would be based on the same sampling frame, sampling
units for some resource groups will be chosen from the same hexagons. This would allow for
an evaluation of data between resource groups with minimal location variability. However,
the likelihood of this occurrence is quite small.
An advantage of EPA using the NASS design is the coordination of research with
another Federal agency in order to reduce the amount of similar effort in estimating the
status of agroecosystem health.
The area frame adopted by NASS assures complete
coverage of the US and has demonstrated its effectiveness in producing estimates of national
crop production. Some of the information collected by NASS is compatible with the major
objectives of the agroecosystem component of EMAP.
Thus, the only additional cost
expected for EMAP to obtain information on agroecosystem health is the training of the
interviewers to obtain the additional indicators required by the agroecosystem component of
EMAP.
The advantages and disadvantages of the ENASS design are similar to the NASS
design discussed above.
However, a disadvantage of this design is that no information is
collected in the intervening years, which, as with the EMAP design, removes the
opportunity to obtain estimates of year to year change.
A larger number of new sampling units are
measured for this design at each year than with the NASS design to compensate for the
reduced level of repeated observations on the same segments.
1.4 Proposed Research
The primary difference between these design options is the number of measurements
collected over time on the sampling units. This results in a comparison of a longitudinal
design with two types of mixed-longitudinal design.
The statistical efficiency of these
designs has been discussed in the literature, but not within the context of multi-stage
sampling.
Many researchers, as mentioned earlier, have discussed the need to
consider the survey design when applying statistical methods to the data.
The proposal for this research is to compare the statistical efficiency of these mixed-
longitudinal and longitudinal designs in the context of a relatively complex sampling
process. A measure of precision will be derived for each of the design options. Data from
previous NASS annual surveys will be used to verify assumptions made in this work, to
obtain estimates of year to year correlations used in the comparisons, and to compute
variances under each design option. The cost of each of these designs is also expected to
differ. A cost model will be developed for each design option. Both the variance and cost
models will be combined for each design to determine and recommend to EPA the best
design, in terms of precision and practical considerations, to evaluate the ecological
condition of our Nation's agriculture.
CHAPTER 2
ASSUMPTIONS AND DERIVATIONS OF THE UNDERLYING VARIANCE
In this chapter, the estimator used in this research is defined.
The underlying
variance of this estimator is derived for each of the design options. Prior to obtaining these
results, notation and assumptions of the three design options are outlined.
Some results,
which are used in this derivation, are presented.
Some assumptions, which are listed in the fourth section, are made in order to simplify
these formulas. The comparison of the underlying variance among the design options is
made in the fifth section of this chapter. This chapter concludes with a discussion of some
alternative estimators of differences in yearly means, which may affect the comparisons of
these design options.
2.1 Consideration of Possible Estimators
Like most surveys, EMAP has many purposes. The agroecosystem component of
EMAP will obtain a variety of measurements on many crops and on the management
practices used in these fields. Examples of these measurements, defined as indicators in the
EMAP program, are land use, productivity, quantity of fertilizer and pesticide application,
amount and type of irrigation applied, and nematode density. Estimates will be obtained
on a number of statistics on the same variable, such as means, totals, and changes over
time. There are a number of difficulties addressed in the literature in choosing the best type
of design in a multipurpose study (Kish, 1988; Kish, 1990; Kish, 1986; Kish and Anderson,
1978). For example, the different purposes need to be defined in statistical terms in order to
serve for comparisons, estimates of variance and cost are needed for each purpose, and these
values need to be combined into a single formulation to solve for an optimal design.
Little discussion of multipurpose designs is found in the literature due to these
difficulties. Kish (1988) points out that in some surveys, a single statistic is considered the
principal variable of interest and assumes zero importance to all other purposes in obtaining
an optimal design. This does not appear reasonable unless arguments can justify that other
variables would result in a similar design.
The principal variables of interest in the EMAP program are estimates of status,
changes, and trends for indicators of agroecosystem health. In this research, measurements
are assumed to be taken on the same number of sampling units for each of the design
options discussed in Chapter 1. Three fields will be sampled from each selected segment for
each design. Under this assumption, no differences in the precision of the estimate of status
are expected among the designs.
Assuming a positive covariance or correlation in two estimated population values
between years, estimates of change and trend are obtained with better precision for designs
which repeat measurements on the same sampling unit. This is apparent from the general
formula of the variance of the difference between two parameters. Assume $\gamma$ refers to any
population parameter (e.g., mean, total) and $i$ and $j$ represent two time points. The
variance of an estimator of the difference, ${}_{ij}\tau$, between two time points is defined as:

$$\mathrm{Var}({}_{ij}\tau) = \mathrm{Var}({}_{j}\hat{\gamma} - {}_{i}\hat{\gamma}) = \mathrm{Var}({}_{j}\hat{\gamma}) + \mathrm{Var}({}_{i}\hat{\gamma}) - 2\,\mathrm{Cov}({}_{j}\hat{\gamma}, {}_{i}\hat{\gamma}) = {}_{j}\sigma^2 + {}_{i}\sigma^2 - 2\;{}_{ij}\rho\;{}_{j}\sigma\;{}_{i}\sigma$$

In this equation, $\sigma^2$ refers to the variance of $\hat{\gamma}$ at a specified time point, denoted as either $i$
or $j$, while ${}_{ij}\rho$ refers to the correlation of estimates among all possible samples between time
points $i$ and $j$. This equation suggests that the estimate of the difference between two time
points is most precise for high positive correlations. For a rotating panel, only a proportion of the
sample is overlapped and this must be accounted for in the formula. The above formula
would include an additional element in the covariance term to indicate the fraction of
overlap. Hansen et al. (1955) point out that for two occasions, the estimate of change will be
most precise for full overlap when $\rho$ is positive.
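The effect of correlation and overlap on the variance of a difference can be checked numerically. The sketch below is illustrative only: it assumes that partial overlap enters as a simple multiplier on the covariance term, and the function name and input values are hypothetical.

```python
import math


def var_change(var_i, var_j, rho, overlap=1.0):
    """Variance of the difference between two estimates.

    var_i, var_j : variances of the estimates at time points i and j
    rho          : correlation between the two estimates
    overlap      : fraction of the sample common to both time points
                   (assumed here to scale the covariance term)
    """
    cov = overlap * rho * math.sqrt(var_i * var_j)
    return var_i + var_j - 2.0 * cov


print(var_change(1.0, 1.0, rho=0.5))               # full overlap
print(var_change(1.0, 1.0, rho=0.5, overlap=0.8))  # 80 percent overlap
print(var_change(1.0, 1.0, rho=0.5, overlap=0.0))  # independent samples
```

For a positive correlation the variance is smallest under full overlap and largest when the two samples share no units.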
Although one of the objectives of EMAP is to estimate trend, the pattern of
expected trend has not been specified. Since no information exists on long term observations
of many environmental measurements, the expected pattern of trend can only be speculated.
As Berger (1986) summarized in his work, the degree of polynomial specified is critical to
determining the optimal design.
Consequently, comparison of a trend estimator of an
unknown degree among the three designs is not an optimal choice for comparison of design
efficiency for this study.
Estimates of year to year change are also of primary interest to EMAP and are
expected to differ with the three design options. In order to compare the efficiency of these
three design options, the variance of a simple estimator of year to year mean difference for
each of these designs will be computed and used in this comparison.
2.2 General Assumptions and Notation
In this research, assume that a survey population exists, which consists of N
elements.
Each element is characterized by a number of variables.
For example, in this
survey, these variables are called indicators of agricultural health, such as crop yield, pest
density, and erosion index (Agroecosystem Research Plan, 1991). Let one of these variables
be referred to as the $Y_z$ variable ($z = 1, 2, \ldots, N$). The population value of interest in this
research is the difference in this population value, specifically a mean value, between two
time points, which are years in this survey.
Since yearly measurements will be obtained,
assume i and j represent two different years. The difference in a population mean between
years, i and j, is defined as:
The notation used in subsequent work is listed in Table 2.1. In this research, each of
the design options assumes the following:
1. Two-stage design.
2. Equal size clusters.
3. Sampling is unrestricted random sampling at the first stage.
4. Sampling is simple random sampling at the second stage.
5. No stratification at either stage.
6. The same sample size is assumed for each year for each design option.
7. Each replicate (panel) is considered an independent sample.
8. Epsem design.
Table 2.1 Notation used for analytical work.(a)

Population                                      Sample                                          Description
$A$                                             $a$                                             total # of PSUs
$B$                                             $b$                                             # of elements per PSU
$N = AB$                                        $n = ab$                                        total # of population elements
$Y_{\alpha\beta}$                               $y_{\alpha\beta}$                               value of measurement for $\beta$-th element in $\alpha$-th PSU
$Y_{\alpha} = \sum_{\beta}^{B} Y_{\alpha\beta}$   $y_{\alpha} = \sum_{\beta}^{b} y_{\alpha\beta}$   total of measurement for $\alpha$-th PSU
$Y = \sum_{\alpha}^{A} Y_{\alpha}$              $y = \sum_{\alpha}^{a} y_{\alpha}$              total of measurement
$\bar{Y} = Y/N$                                 $\bar{y} = y/n$                                 average of measurement per element
$\bar{Y}_{\alpha} = Y_{\alpha}/B$               $\bar{y}_{\alpha} = y_{\alpha}/b$               average of measurement per element in $\alpha$-th PSU
${}_{i}Y_{\alpha\beta}$                         ${}_{i}y_{\alpha\beta}$                         value of measurement for $\beta$-th element in $\alpha$-th PSU for $i$-th year
${}_{i}^{k}Y_{\alpha\beta}$                     ${}_{i}^{k}y_{\alpha\beta}$                     value of measurement for $\beta$-th element in $\alpha$-th PSU for $i$-th year and $k$-th panel

(a) Let $\alpha$ = subscript indexing PSU and $\beta$ = subscript indexing element within PSU. The
totals and averages described in this table can also be defined for specific years ($i$ or $j$)
and/or panel ($k$) by adding the subscripts $i$, $j$, or $k$ respectively.
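As a small illustration of the sample-side notation in Table 2.1, the following Python sketch computes the PSU totals, PSU means, overall total, and overall mean for a made-up sample. The data values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical sample: a = 2 PSUs, b = 3 elements per PSU.
y = [
    [4.0, 6.0, 5.0],    # y_{1,beta}
    [10.0, 8.0, 9.0],   # y_{2,beta}
]

a = len(y)        # number of sampled PSUs
b = len(y[0])     # number of sampled elements per PSU
n = a * b         # total number of sampled elements

psu_totals = [sum(row) for row in y]          # y_alpha
psu_means = [t / b for t in psu_totals]       # ybar_alpha = y_alpha / b
total = sum(psu_totals)                       # y
mean = total / n                              # ybar = y / n

print(psu_totals, psu_means, total, mean)
```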
An equal probability sample of elements will be selected from the survey population.
Also, assume that the set of observations $y_z$ ($z = 1, 2, \ldots, n$) is obtained for the $Y_z$
variable. The estimator of the population value defined above is obtained from the sample
and defined as:

$${}_{ij}\delta = \sum_{z}^{n} {}_{j}y_z / n \; - \; \sum_{z}^{n} {}_{i}y_z / n$$

In order to compare the precision of the three design options discussed in this research, the
underlying sampling variance of this estimator is derived. The general formula for the
underlying variance of this estimator within the context of a two-stage design consists of a
between- and a within-PSU component of variance:

$$\mathrm{Var}({}_{ij}\delta) = E_1\left[\mathrm{Var}_{(2|\alpha)}({}_{ij}\delta)\right] + \mathrm{Var}_1\left[E_{(2|\alpha)}({}_{ij}\delta)\right]$$

In this formula, $E_1$ and $\mathrm{Var}_1$ denote the expectation and variance over all possible PSU
samples of size $a$, while $E_{(2|\alpha)}$ and $\mathrm{Var}_{(2|\alpha)}$ denote the expectation and variance given a
particular PSU sample. The first component obtains the expectation of the conditional
variance at the second stage over all $a$ PSUs and the second component obtains the
variance over all $a$ PSUs of the conditional expectation at the second stage.
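The decomposition into an expected conditional variance and a variance of conditional expectations can be verified by brute force on a toy population. The sketch below is only an illustration: the population values are hypothetical, a single year is used, and PSUs are drawn without replacement for ease of enumeration (the identity itself does not depend on that choice).

```python
from itertools import combinations, product
from statistics import mean, pvariance

# Hypothetical toy population: A = 3 PSUs, each with B = 2 elements.
population = [[1.0, 3.0], [2.0, 6.0], [5.0, 9.0]]
a, b = 2, 1   # PSUs sampled, elements sampled per selected PSU


def second_stage_samples(psu):
    """All equally likely samples of b elements from one PSU."""
    return list(combinations(psu, b))


estimates = []                   # sample mean for every possible two-stage sample
cond_means, cond_vars = [], []   # conditional moments for each PSU sample
for psus in combinations(population, a):
    outcomes = [
        mean(x for elems in choice for x in elems)
        for choice in product(*(second_stage_samples(p) for p in psus))
    ]
    estimates.extend(outcomes)
    cond_means.append(mean(outcomes))       # E_2( . | PSU sample)
    cond_vars.append(pvariance(outcomes))   # Var_2( . | PSU sample)

total = pvariance(estimates)        # Var of the estimator
within = mean(cond_vars)            # E_1[Var_2]
between = pvariance(cond_means)     # Var_1[E_2]
print(total, within + between)
```

Because every PSU sample has the same number of second-stage outcomes, pooling all outcomes with equal weight gives the unconditional variance, which matches the sum of the two components.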
For this work, assume that the estimator, $\bar{y}$, is measured at $p$ time points. Let the
sampling distribution of $\bar{y}$ have a mean vector $\mu$, with dimensions $p \times 1$, and covariance
matrix $\Sigma$, with dimensions $p \times p$. In order to obtain the differences in yearly values, assume
that a conformable matrix $A$ consists of contrast coefficients, such as 1, 0, and -1, in order to
indicate year to year change over time. The mean and variance matrix for the mean
differences discussed in this research are derived using:

$$E({}_{ij}\boldsymbol{\delta}) = A\mu$$

$$V({}_{ij}\boldsymbol{\delta}) = A \Sigma A'$$
As suggested in previous research, the efficiency of these designs depends on the covariance
structure between points in time for individual population members (Rao and Rao, 1966;
Machin, 1975; Berger, 1986). A pattern expected for the biological indicators collected over
time is a simplex or first order auto-regressive structure. This pattern has been discussed
and used for biological measurements over time and appears justified for this study of
environmental data (Patterson, 1950; Potthoff and Roy, 1964; Machin, 1975; and Berger,
1986).
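To make the contrast-matrix calculation concrete, the following Python sketch builds a first order auto-regressive covariance matrix for four yearly estimates and applies a matrix of successive-year contrasts. It is an illustration under assumed values (unit variance, correlation 0.5, the same units measured every year); the helper names are hypothetical.

```python
def ar1_covariance(sigma2, rho, p):
    """p x p covariance matrix with entries sigma2 * rho**|r - c|."""
    return [[sigma2 * rho ** abs(r - c) for c in range(p)] for r in range(p)]


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


p = 4
Sigma = ar1_covariance(sigma2=1.0, rho=0.5, p=p)

# Each row of A contrasts one year with the preceding year.
A = [
    [-1, 1, 0, 0],
    [0, -1, 1, 0],
    [0, 0, -1, 1],
]

V = matmul(matmul(A, Sigma), transpose(A))   # A Sigma A'
for row in V:
    print(row)
```

The diagonal of the result gives the variance of each year to year difference, here $2\sigma^2(1 - \rho) = 1$, and the off-diagonal entries give the covariances between successive differences.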
2.3 Results and Derivation of the Underlying Variances
2.3.1 Impact of Sample Overlap
The variance of the difference between two years is a function of the covariance
between the two years. Due to the structure of the designs, the covariance term will differ
for each of the designs and for different years within each design. Full overlap occurs in the
EMAP design for all the sampling units measured at four year intervals.
However, no
overlap exists for comparisons that are not a multiple of four years apart.
The covariance term needs to account for the partial overlap of sampling units
found with both the NASS and ENASS designs. The amount of overlap is dependent on
which years are compared in each of the designs. For both designs, no overlap exists for
comparisons greater than four years apart.
In the ENASS design, there is a 50 percent
overlap for all comparisons four years apart, but no overlap for any other comparison. In
the NASS design, the amount of overlap will vary for each comparison four years or less
apart. The overlap for a one year comparison is 80 percent, for a two year comparison the
overlap is 60 percent, for a three year comparison the overlap is 40 percent, and for a four
year comparison the overlap is 20 percent. This overlap is introduced into the covariance
term (specifically in a term defined as q, which represents the degree of overlap) as shown in
the next section.
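The overlap proportions described in this section can be collected into a single lookup. The Python sketch below is a minimal illustration; the function name and the design labels are hypothetical, and `m` denotes the number of years between the two years being compared.

```python
def overlap_q(design, m):
    """Proportion of overlapping sample units for a comparison m years apart."""
    if m < 1:
        raise ValueError("m must be a positive number of years")
    if design == "EMAP":
        # Full overlap only at multiples of the four-year cycle.
        return 1.0 if m % 4 == 0 else 0.0
    if design == "ENASS":
        # One of the two panels is remeasured exactly four years later.
        return 0.5 if m == 4 else 0.0
    if design == "NASS":
        # One fifth of the replicates is replaced each year.
        return (5 - m) / 5 if m <= 4 else 0.0
    raise ValueError("unknown design: " + design)


for m in range(1, 9):
    print(m, [overlap_q(d, m) for d in ("EMAP", "ENASS", "NASS")])
```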
"
2.3.2 Derivation of the Underlying Variances
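The variance of the difference between two means in a two-stage design consists of
two components. In the following, each component will be derived separately for each
design and then combined. Assuming ${}_{ij}\delta$ is the estimator of the difference between a sample
mean at two time points, $i$ and $j$:

$$\mathrm{Var}({}_{ij}\delta) = E_1\left[\mathrm{Var}_2({}_{ij}\delta \mid \alpha)\right] + \mathrm{Var}_1\left[E_2({}_{ij}\delta \mid \alpha)\right] \tag{2.1}$$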
1. EMAP Design
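a. The variance at the element level:

$$\mathrm{Var}_2({}_{ij}\delta \mid \alpha) = \mathrm{Var}_2({}_{j}\bar{y} \mid \alpha) + \mathrm{Var}_2({}_{i}\bar{y} \mid \alpha) - 2\,\mathrm{Cov}_2({}_{j}\bar{y}, {}_{i}\bar{y} \mid \alpha)$$

$$= \sum_{\alpha}^{a} \frac{(1-f_b)}{a^2 b}\;{}_{j}S_{\alpha b}^2 + \sum_{\alpha}^{a} \frac{(1-f_b)}{a^2 b}\;{}_{i}S_{\alpha b}^2 - 2 \sum_{\alpha}^{a} \frac{(1-f_b)}{a^2 b}\;{}_{ij}S_{\alpha b} \tag{2.2}$$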
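where the variance at year $i$ or $j$ (denoted as $w$) is defined as:

$${}_{w}S_{\alpha b}^2 = \frac{\sum_{\beta}^{B} ({}_{w}Y_{\alpha\beta} - {}_{w}\bar{Y}_{\alpha})^2}{B-1} \tag{2.3}$$

and the covariance between year $i$ and $j$ is defined as:

$${}_{ij}S_{\alpha b} = \frac{\sum_{\beta}^{B} ({}_{i}Y_{\alpha\beta} - {}_{i}\bar{Y}_{\alpha})({}_{j}Y_{\alpha\beta} - {}_{j}\bar{Y}_{\alpha})}{B-1} \tag{2.4}$$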
b. Since the second stage fractions are assumed uniform for all first stage units, taking
expectation over all possible PSU samples of size $a$ results in:

$$E_1\left[\mathrm{Var}_2({}_{ij}\delta \mid \alpha)\right] = \frac{(1-f_b)}{ab}\;{}_{j}S_b^2 + \frac{(1-f_b)}{ab}\;{}_{i}S_b^2 - 2\,\frac{(1-f_b)}{ab}\;{}_{ij}S_b \tag{2.5}$$
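where the variance at year $i$ or $j$ (denoted as $w$) is defined as:

$${}_{w}S_b^2 = \frac{\sum_{\alpha}^{A} \sum_{\beta}^{B} ({}_{w}Y_{\alpha\beta} - {}_{w}\bar{Y}_{\alpha})^2}{A(B-1)} \tag{2.6}$$

and the covariance between year $i$ and $j$ is defined as:

$${}_{ij}S_b = \frac{\sum_{\alpha}^{A} \sum_{\beta}^{B} ({}_{i}Y_{\alpha\beta} - {}_{i}\bar{Y}_{\alpha})({}_{j}Y_{\alpha\beta} - {}_{j}\bar{Y}_{\alpha})}{A(B-1)} \tag{2.7}$$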
Two situations arise for this estimator when comparing years, as illustrated in Figure
1.2 (B). Since the same units are remeasured at four year intervals, the covariance term,
which is defined as ${}_{ij}S_b$, is expected to be non-zero for comparisons of four year multiples.
However, for comparisons of non-multiples of four year intervals, the covariance term is zero, since no
panels overlap.
2. ENASS Design
a. The variance term at the element level is similar to Equations 2.2-2.4 given for the
EMAP design. Since the total sample size in this design is split between two samples, the
sample size of each panel consists of $a/2$ clusters in contrast to the $a$ clusters for the EMAP
design. However, since summing occurs over two panels, this results in the same summation
(over $a$ clusters) after cancellation, as with the EMAP design. In Section 2.3.1, a discussion
of the partial overlap of sampling units was introduced. This overlap is introduced into the
covariance component of the following equation as $q$, defined as the proportion of
overlapping samples between years $i$ and $j$.

$$\mathrm{Var}_2({}_{ij}\delta \mid \alpha) = \sum_{\alpha}^{a} \frac{(1-f_b)}{a^2 b}\;{}_{j}S_{\alpha b}^2 + \sum_{\alpha}^{a} \frac{(1-f_b)}{a^2 b}\;{}_{i}S_{\alpha b}^2 - 2\,q \sum_{\alpha}^{a} \frac{(1-f_b)}{a^2 b}\;{}_{ij}S_{\alpha b} \tag{2.8}$$

The formulas for $S_{\alpha b}^2$ for years $i$ and $j$, and their covariance, are given in Equations 2.3 and 2.4,
respectively.
b. As with the EMAP design, the second-stage fractions are assumed uniform for all PSUs.
Thus, taking expectation over all possible PSU samples of size $a$ results in:

$$E_1\left[\mathrm{Var}_2({}_{ij}\delta \mid \alpha)\right] = \frac{(1-f_b)}{ab}\left\{{}_{j}S_b^2 + {}_{i}S_b^2 - 2\,q\;{}_{ij}S_b\right\} \tag{2.9}$$

The result for the covariance term in Equation 2.9 is an extension of the argument given by
M. Hansen, W. Hurwitz, and W. Madow in Sample Survey Methods and Theory: Volume
II (1953, Equation 8.9). Two situations arise for this design when comparing years. For
exactly four years apart, $q = 1/2$. Otherwise, $q = 0$, and subsequently the covariance term
falls out. The variance, $S_b^2$, for years $i$ and $j$, and the covariance term are presented in
Equations 2.6 and 2.7 respectively.
3. NASS Design
a. The variance at the element level is similar to that derived for the ENASS design in
Equations 2.8 and 2.9. Since the total sample size in this design is split between five
samples, the sample size of each panel consists of $a/5$ clusters in contrast to the $a$ clusters
for the EMAP design. However, since summing occurs over five panels, this results in the
same summation after cancellation, as with the EMAP design. As with the ENASS design,
$q$ denotes the proportion of overlapping samples.

$$\mathrm{Var}_2({}_{ij}\delta \mid \alpha) = \mathrm{Var}_2({}_{j}\bar{y} \mid \alpha) + \mathrm{Var}_2({}_{i}\bar{y} \mid \alpha) - 2\,\mathrm{Cov}_2({}_{j}\bar{y}, {}_{i}\bar{y} \mid \alpha) \tag{2.10}$$

The formulas for $S_{\alpha b}^2$ for years $i$ and $j$ and the covariance between these two years are
presented in Equations 2.3 and 2.4, respectively.

b. Taking expectation over all possible PSU samples of size $a$:

$$E_1\left[\mathrm{Var}_2({}_{ij}\delta \mid \alpha)\right] = \frac{(1-f_b)}{ab}\left\{{}_{j}S_b^2 + {}_{i}S_b^2 - 2\,q\;{}_{ij}S_b\right\} \tag{2.11}$$

The formulas for $S_b^2$ and the covariance term are presented in Equations 2.6 and 2.7
respectively. With this design, two situations arise when comparing years. For comparisons
of four years or less, the proportion of overlap is $q = (5-m)/5$, where $m$ represents the difference
of years $j$ and $i$. Otherwise, $q = 0$ and hence the covariance term falls out.
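1. EMAP Design
a. The expectation at the element level:

$$E_2({}_{ij}\delta \mid \alpha) = \sum_{\alpha}^{a} \frac{{}_{j}\bar{Y}_{\alpha}}{a} - \sum_{\alpha}^{a} \frac{{}_{i}\bar{Y}_{\alpha}}{a} \tag{2.12}$$

b. Taking the variance over all possible PSU samples results in:

$$\mathrm{Var}_1\left[E_2({}_{ij}\delta \mid \alpha)\right] = \frac{{}_{j}\sigma_a^2}{a} + \frac{{}_{i}\sigma_a^2}{a} - 2\,\frac{{}_{ij}\sigma_a}{a} \tag{2.13}$$

where the variance at year $i$ or $j$ (denoted as $w$) is defined as:

$${}_{w}\sigma_a^2 = \frac{\sum_{\alpha}^{A} ({}_{w}\bar{Y}_{\alpha} - {}_{w}\bar{Y})^2}{A} \tag{2.14}$$

and the covariance between year $i$ and $j$ is defined as:

$${}_{ij}\sigma_a = \frac{\sum_{\alpha}^{A} ({}_{i}\bar{Y}_{\alpha} - {}_{i}\bar{Y})({}_{j}\bar{Y}_{\alpha} - {}_{j}\bar{Y})}{A} \tag{2.15}$$

Two situations arise for this estimator when comparing years. For comparisons of multiples
of four years, the covariance term is expected to be non-zero. For comparisons of
non-multiples of four years apart, since no units overlap, the covariance term falls out.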
2. ENASS Design
a. Since the total sample size in the ENASS design is split between two samples, the sample
size of each panel consists of $a/2$ clusters in contrast to the $a$ clusters for the EMAP design.
However, since summing occurs over two panels, the result is the same as with the EMAP
design (Equation 2.12).

$$E_2({}_{ij}\delta \mid \alpha) = \sum_{\alpha}^{a} \frac{{}_{j}\bar{Y}_{\alpha}}{a} - \sum_{\alpha}^{a} \frac{{}_{i}\bar{Y}_{\alpha}}{a} \tag{2.16}$$
b. The next step in this derivation obtains the variance over all possible PSU samples of size
$a$ of the above difference. In order to obtain this formula, the variance of each of the above
terms and the covariance of these terms over all possible PSU samples are obtained. The
covariance between these two terms incorporates a term to account for the amount of
overlapping samples. The proportion of overlap varies with the years in the comparison.
As shown in Figure 1.2 (C), one of the two panels will be remeasured in a comparison of a
four year difference for the ENASS design. Otherwise, no panels are remeasured. A general
formula for this covariance term, accounting for partial overlap, will first be derived.
Each panel is considered an independent sample. Thus, there is no temporal
covariance between panels. However, a temporal covariance does exist between years $i$ and $j$
within a specific $k$-th panel. This is expressed as:

$$\mathrm{Cov}_1\left\{\sum_{\alpha}^{a} \frac{{}_{j}\bar{Y}_{\alpha}}{a},\; \sum_{\alpha}^{a} \frac{{}_{i}\bar{Y}_{\alpha}}{a}\right\} = \sum_{k}^{K} \mathrm{Cov}_1\left\{\sum_{\alpha \in k} \frac{{}_{j}^{k}\bar{Y}_{\alpha}}{a},\; \sum_{\alpha \in k} \frac{{}_{i}^{k}\bar{Y}_{\alpha}}{a}\right\}$$

where $K$ = number of replicates (panels) which overlapped in year $i$ and $j$, and the
covariance of ${}_{j}\bar{Y}$ and ${}_{i}\bar{Y}$ for the $k$-th replicate is denoted ${}_{ij}^{k}\sigma_a$.
Furthermore, since the $K$ panels are independent, it is plausible to assume that the variance
in each of the $K$ panels is the same and hence the above simplifies to:

$$\mathrm{Cov}_1\left\{\sum_{\alpha}^{a} \frac{{}_{j}\bar{Y}_{\alpha}}{a},\; \sum_{\alpha}^{a} \frac{{}_{i}\bar{Y}_{\alpha}}{a}\right\} = \frac{q\;{}_{ij}\sigma_a}{a}$$

where $q$ = proportion of overlapping samples.
This result is an extension of the argument given by M. Hansen, W. Hurwitz, and
W. Madow in Sample Survey Methods and Theory: Volume II (1953, Equation 8.9). The
proportion of overlap varies with the three designs. As shown in Figure 1.2, the frequency
and timing of the repeated measurements on the panels vary for the three designs. For
example, Figure 1.2 (B) illustrates that a panel in the EMAP design is remeasured only at
four year intervals due to the interpenetrating replicates. For the EMAP design, the $q$ term
can be incorporated into Equation 2.13. In this equation, $q = 1$ for any comparison of a
multiple of four years, and $q = 0$ otherwise. The proportion of overlap varies within the
NASS design due to the changing amount of overlap between comparative years.
Using this covariance term, the variance over all PSU samples of size $a$ for the
ENASS design is defined as:

$$\mathrm{Var}_1\left[E_2({}_{ij}\delta \mid \alpha)\right] = \frac{{}_{j}\sigma_a^2}{a} + \frac{{}_{i}\sigma_a^2}{a} - 2\,q\,\frac{{}_{ij}\sigma_a}{a} \tag{2.17}$$

The formulas for $\sigma_a^2$ for years $j$ and $i$, and the covariance term, are given in Equations 2.14
and 2.15 respectively. Two situations arise for this design when comparing years. For
exactly four years apart, $q = 1/2$. Otherwise, $q = 0$ and subsequently the covariance term
falls out.
3. NASS Design
a. The expectation at the element level is similar to Equation 2.16 above. Since the total
sample size in this design is split between five samples, the sample size of each panel consists
of $a/5$ clusters in contrast to the $a$ clusters for the EMAP design. However, after summing
over five panels, the result is the same as with the EMAP design (Equation 2.12).

$$E_2({}_{ij}\delta \mid \alpha) = \sum_{\alpha}^{a} \frac{{}_{j}\bar{Y}_{\alpha}}{a} - \sum_{\alpha}^{a} \frac{{}_{i}\bar{Y}_{\alpha}}{a} \tag{2.18}$$

b. Taking the variance over all possible PSU samples of size $a$ results in the following. This
result follows directly from the ENASS design above.

$$\mathrm{Var}_1\left[E_2({}_{ij}\delta \mid \alpha)\right] = \frac{{}_{j}\sigma_a^2}{a} + \frac{{}_{i}\sigma_a^2}{a} - 2\,q\,\frac{{}_{ij}\sigma_a}{a} \tag{2.19}$$

The formulas for $\sigma_a^2$ for years $i$ and $j$, and the covariance between these two years, are given
in Equations 2.14 and 2.15 respectively. Two situations arise for this estimator when
comparing years. For comparisons of four years or less, $q = (5-m)/5$. Otherwise, $q = 0$ and
hence the covariance term falls out.
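Combining these components for each of the designs results in the formula for
$\mathrm{Var}({}_{ij}\delta)$ for each of the three designs.
A. EMAP

$$\mathrm{Var}({}_{ij}\delta) = \frac{(1-f_b)}{ab}\left\{{}_{j}S_b^2 + {}_{i}S_b^2 - 2\;{}_{ij}S_b\right\} + \frac{1}{a}\left\{{}_{j}\sigma_a^2 + {}_{i}\sigma_a^2 - 2\;{}_{ij}\sigma_a\right\} \tag{2.20}$$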
Two situations arise:
1. For multiples of 4 years apart, the covariance is expected to be non-zero.
2. For non-multiples of 4 years apart, the covariance term falls out.
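B. ENASS

$$\mathrm{Var}({}_{ij}\delta) = \frac{(1-f_b)}{ab}\left\{{}_{j}S_b^2 + {}_{i}S_b^2 - 2\,q\;{}_{ij}S_b\right\} + \frac{1}{a}\left\{{}_{j}\sigma_a^2 + {}_{i}\sigma_a^2 - 2\,q\;{}_{ij}\sigma_a\right\} \tag{2.21}$$

Two situations arise:
1. For exactly 4 years apart, $q = 1/2$.
2. Otherwise, $q = 0$ and the covariance term falls out.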
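C. NASS

$$\mathrm{Var}({}_{ij}\delta) = \frac{(1-f_b)}{ab}\left\{{}_{j}S_b^2 + {}_{i}S_b^2 - 2\,q\;{}_{ij}S_b\right\} + \frac{1}{a}\left\{{}_{j}\sigma_a^2 + {}_{i}\sigma_a^2 - 2\,q\;{}_{ij}\sigma_a\right\} \tag{2.22}$$

Two situations arise:
1. For $\le 4$ years apart, $q = (5-m)/5$, where $m = j - i$.
2. Otherwise, $q = 0$ and the covariance term falls out.
In these equations, $S_b^2$ and ${}_{ij}S_b$ are defined in Equations 2.6 and 2.7 respectively, while $\sigma_a^2$
and ${}_{ij}\sigma_a$ are defined in Equations 2.14 and 2.15, respectively.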
The above formulas (Equations 2.20-2.22) simplify for $b = 1$ to the following. The
same caveats apply to the respective equations. Noting that for $i$ or $j$, $\sigma_a^2 + \sigma_b^2 = \sigma^2$, we
have the following:
For EMAP (Equation 2.20 simplification):
2.23
For ENASS and NASS (Equation 2.21-2.22 simplification):
2.24
where $\sigma_a^2$ and ${}_{ij}\sigma_a$ are defined in Equations 2.14 and 2.15, respectively, and where
the variance at year $i$ or $j$ (denoted as $w$) is defined as:
2.25
and the covariance between year $i$ and $j$ is defined as:
2.26
2.4 Simplification of Underlying Variances
2.4.1 For Large N
In order to simplify Equations 2.20-2.22 for further comparisons, two assumptions
are used in this section redefining the underlying variances. The first assumes $N$ is large
relative to $n$, resulting in a small sampling fraction $f = n/N$. Therefore, the finite population
factors in the second stage disappear. Secondly, for each year the expectation of $S_b^2$ is
assumed equal to $\sigma_b^2$. In addition, the expectation of ${}_{ij}S_b$ is assumed equal to ${}_{ij}\sigma_b$. This result,
assuming the observations are independently and randomly distributed, can be shown using the
Weak Law of Large Numbers or Slutsky's Theorem. Substitution of these assumptions
results in the following:
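A. EMAP

$$\mathrm{Var}({}_{ij}\delta) = \frac{1}{ab}\left\{{}_{j}\sigma_b^2 + {}_{i}\sigma_b^2 - 2\;{}_{ij}\sigma_b\right\} + \frac{1}{a}\left\{{}_{j}\sigma_a^2 + {}_{i}\sigma_a^2 - 2\;{}_{ij}\sigma_a\right\} \tag{2.27}$$

Two situations arise:
1. For multiples of 4 years apart, the covariance term is expected to be non-zero.
2. For non-multiples of 4 years apart, the covariance term falls out.
B. ENASS

$$\mathrm{Var}({}_{ij}\delta) = \frac{1}{ab}\left\{{}_{j}\sigma_b^2 + {}_{i}\sigma_b^2 - 2\,q\;{}_{ij}\sigma_b\right\} + \frac{1}{a}\left\{{}_{j}\sigma_a^2 + {}_{i}\sigma_a^2 - 2\,q\;{}_{ij}\sigma_a\right\} \tag{2.28}$$

Two situations arise:
1. For exactly 4 years apart, $q = 1/2$.
2. Otherwise, $q = 0$ and the covariance term falls out.
C. NASS
The formula is given in Equation 2.28 with the following caveats:
1. For $\le 4$ years apart, $q = (5-m)/5$, where $m = j - i$.
2. Otherwise, $q = 0$ and the covariance term falls out.
In these equations, $\sigma_b^2$ and ${}_{ij}\sigma_b$ are defined in Equations 2.25 and 2.26, respectively, while $\sigma_a^2$
and ${}_{ij}\sigma_a$ are defined in Equations 2.14 and 2.15, respectively.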
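A minimal Python sketch of the large-$N$ variance may help fix ideas. It assumes the within-PSU component is divided by $ab$ and the between-PSU component by $a$, with the overlap proportion $q$ multiplying both covariance terms; the function name and the numerical inputs are hypothetical.

```python
def var_diff_large_n(a, b, s2b_i, s2b_j, cov_b, s2a_i, s2a_j, cov_a, q=1.0):
    """Variance of the difference in yearly means when N is large.

    a, b         : number of sampled PSUs and elements per PSU
    s2b_i, s2b_j : within-PSU variances in years i and j
    cov_b        : within-PSU covariance between years i and j
    s2a_i, s2a_j : between-PSU variances in years i and j
    cov_a        : between-PSU covariance between years i and j
    q            : proportion of overlapping sample units (q = 1 for EMAP
                   at multiples of four years, q = 0 when nothing overlaps)
    """
    within = (s2b_j + s2b_i - 2.0 * q * cov_b) / (a * b)
    between = (s2a_j + s2a_i - 2.0 * q * cov_a) / a
    return within + between


# Hypothetical inputs: 10 PSUs, one element per PSU.
full = var_diff_large_n(10, 1, 1.0, 1.0, 0.5, 2.0, 2.0, 1.0, q=1.0)
none = var_diff_large_n(10, 1, 1.0, 1.0, 0.5, 2.0, 2.0, 1.0, q=0.0)
print(full, none)
```

With positive covariances, the variance under full overlap is smaller than the variance when no units are shared between the two years.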
2.4.2 In Terms of Temporal Correlation
The covariance terms in Equations 2.27 and 2.28 can be written in terms of the
temporal correlation. For example,

$${}_{ij}\sigma_b = {}_{ij}\rho_b\;{}_{i}\sigma_b\;{}_{j}\sigma_b$$

and

$${}_{ij}\sigma_a = {}_{ij}\rho_a\;{}_{i}\sigma_a\;{}_{j}\sigma_a$$

The formulas for ${}_{ij}\rho_b$ and ${}_{ij}\rho_a$ are defined as:

$${}_{ij}\rho_b = \frac{{}_{ij}\sigma_b}{{}_{i}\sigma_b\;{}_{j}\sigma_b}, \qquad {}_{ij}\rho_a = \frac{{}_{ij}\sigma_a}{{}_{i}\sigma_a\;{}_{j}\sigma_a} = \frac{\sum_{\alpha}^{A} ({}_{i}\bar{Y}_{\alpha} - {}_{i}\bar{Y})({}_{j}\bar{Y}_{\alpha} - {}_{j}\bar{Y})}{A\;{}_{i}\sigma_a\;{}_{j}\sigma_a}$$

The formula for the correlation at the element level does not equal the correlation at
the cluster level. In addition, averaging the element level correlations over a cluster does
not equal the cluster level correlation defined above. However, in this research $b$ will take
on only small values, such as one, two, or three. Therefore, in order to simplify subsequent
results, the element level correlation is assumed to effectively equal the cluster level
correlation. Therefore, in order to simplify Equations 2.27 and 2.28, the covariance terms
are defined in terms of the temporal correlation, which is defined as ${}_{ij}\rho$. This results in:
A. EMAP
2.29
Two situations arise:
1. For multiples of 4 years apart, the covariance term is expected to be
non-zero ($m = 4$, $m = 8$, etc).
2. For non-multiples of 4 years apart, the covariance term falls out.
B. ENASS
2.30
Two situations arise:
1. For exactly 4 years apart, $m = 4$ and $q = 1/2$.
2. Otherwise, $q = 0$ and the covariance term falls out.
C. NASS
The formula is given in Equation 2.30 with the following caveats:
1. For $\le 4$ years apart, $q = (5-m)/5$, where $m = j - i$.
2. Otherwise, $q = 0$ and the covariance term falls out.
In these equations, $\sigma_b^2$ and ${}_{ij}\sigma_b$ are defined in Equations 2.25 and 2.26 respectively, while $\sigma_a^2$
and ${}_{ij}\sigma_a$ are defined in Equations 2.14 and 2.15 respectively.
2.4.3 In Terms of Roh
Another step in the simplification of the variance formulas makes use of the
measure of homogeneity, roh, between second stage units within the first stage units. Assume
that the between and overall components of variance can be rewritten in terms of the overall
variance and roh (Kish, 1965).
Using:
We obtain the following expressions:
The variance formulas simplify to the following:
A. EMAP
Var
(iJ· 8 )
=
{ J"u
2
+ iU2 -
2."'."'"
.pm} {(b-1)(21- O)
.,.,
1
J
ab
IJ
+
(1+0 (b-1)}
ab
2.31
Two situations arise:
1. For multiples of 4 years apart, m=4, m=8, etc.
2. For non-multiples of 4 years apart, the covariance term falls out.
B. ENASS
$$\mathrm{Var}({}_{ij}\hat{d}) = \left\{ {}_{j}\sigma^{2} + {}_{i}\sigma^{2} - 2q\,{}_{i}\sigma\,{}_{j}\sigma\,{}_{ij}\rho^{m} \right\} \left\{ \frac{(b-1)(1-\delta)}{ab^{2}} + \frac{1+\delta(b-1)}{ab} \right\} \qquad (2.32)$$
Two situations arise:
1. For exactly 4 years apart, m=4 and q=1/2.
2. Otherwise, q=0 and the covariance term falls out.
C. NASS
This is given in Equation 2.32 with the following caveats:
1. For ≤ 4 years apart, m=(j-i) and q=(5-m)/5.
2. Otherwise, q=0 and the covariance term falls out.
In these equations, the variance and covariance components are as defined in
Equations 2.25 and 2.26 and in Equations 2.14 and 2.15, respectively.
2.4.4 For Equal Year Variance
To simplify these equations further, assume ${}_{j}\sigma^{2}={}_{i}\sigma^{2}$ and n=ab. Also, the
correlation between the two successive years of interest in a comparison is now denoted ρ
(i.e., j-i=1). The power of ρ, denoted m, accounts for mean comparisons
which are more than one year apart.
A. EMAP
2.33
Two situations arise:
1. For multiples of 4 years apart, m=4, m=8, etc.
2. For non-multiples of 4 years apart, the covariance term falls out.
B. ENASS
2.34
Two situations arise:
1. For exactly 4 years apart, m=4 and q=1/2.
2. Otherwise, q=0 and the covariance term falls out.
C. NASS
This is given in Equation 2.34 with the following caveats:
1. For ≤ 4 years apart, m=(j-i) and q=(5-m)/5.
2. Otherwise, q=0 and the covariance term falls out.
In these equations, the variance and covariance components are as defined in
Equations 2.25 and 2.26 and in Equations 2.14 and 2.15, respectively. Notice that the first
component in each of these equations is the simple random sampling variance of the difference of two
means. The second component can be considered a design effect. For the special case when
b=1, this component equals 1.
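The pieces described in Sections 2.4.2 to 2.4.4 can be sketched in code. The following is a minimal sketch, not the author's implementation. It assumes the equal-variance formulas take the form (2σ²/n)(1 - qρ^m) times a design effect of (b-1)(1-δ)/b + [1 + δ(b-1)], a form inferred from the description of the two components rather than quoted from Equations 2.33 and 2.34. All function and parameter names are hypothetical.

```python
def panel_overlap(design: str, m: int) -> float:
    """Proportion q of the sample shared by two years that are m years apart."""
    if m < 1:
        raise ValueError("m must be a positive number of years")
    if design == "EMAP":
        # The same sites are revisited at every multiple of 4 years.
        return 1.0 if m % 4 == 0 else 0.0
    if design == "ENASS":
        # Half of the sample is shared at exactly 4 years apart.
        return 0.5 if m == 4 else 0.0
    if design == "NASS":
        # Five-year rotation: q = (5 - m) / 5 for m <= 4, otherwise 0.
        return (5 - m) / 5 if m <= 4 else 0.0
    raise ValueError(f"unknown design: {design}")


def design_effect(b: int, delta: float) -> float:
    """Assumed second component of the variance; equals 1 when b = 1."""
    return (b - 1) * (1 - delta) / b + (1 + delta * (b - 1))


def var_mean_difference(design, m, sigma2, rho, a, b, delta):
    """Assumed variance of the difference in means for years m apart."""
    n = a * b
    q = panel_overlap(design, m)
    srs_component = 2 * sigma2 * (1 - q * rho ** m) / n
    return srs_component * design_effect(b, delta)
```

For example, `var_mean_difference("EMAP", 4, 1.0, 0.5, 100, 2, 0.2)` is smaller than the NASS value for the same inputs, because the EMAP sample overlaps completely at a lag of four years.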
2.5 Comparison of the Underlying Variance Among Designs
In order to compare the precision of these designs, a measure such as the difference
or relative difference between the variances is computed and compared for the designs under
consideration. The differences among the design options in the variances presented in Section 2.4.4
are determined below. In each of these equations, the variance and covariance components are
as defined in Equations 2.25 and 2.26 and in Equations 2.14 and 2.15, respectively.
No difference is found between the EMAP and ENASS designs except in comparisons at
multiples of four years. The difference in a comparison of 4 years is the following:
The difference between the EMAP and NASS designs depends on which years are in the
comparison. For some comparisons, the EMAP design is more efficient, while NASS is more
efficient in others. The differences are noted below.
i. For j-i < 4 (m=j-i).
ii. For j-i = 4.
$$\mathrm{Var}_{EMAP}({}_{ij}\hat{d}) - \mathrm{Var}_{NASS}({}_{ij}\hat{d}) = \left\{ -\frac{8\sigma^{2}\rho^{4}}{5n} \right\} \left\{ \frac{(b-1)(1-\delta)}{b} + [1+\delta(b-1)] \right\} \qquad (2.37)$$
iii. For j-i > 4 but ≠ 8, ≠ 12, or any other multiple of 4, the difference is 0.
iv. For j-i = 8. (For a comparison involving a 12 year or higher 4 year multiple difference,
the superscript on the ρ is 12, or the corresponding multiple of 4.)
2.38
As in the previous comparison of the EMAP and ENASS designs, the difference
between these two designs depends on which years are in the comparison.
For some
comparisons, the ENASS design is more efficient, while NASS is more efficient in others.
The differences are noted below.
i. For j-i < 4 (m=j-i).
$$\mathrm{Var}_{ENASS}({}_{ij}\hat{d}) - \mathrm{Var}_{NASS}({}_{ij}\hat{d}) = \left\{ \frac{2\sigma^{2}q\rho^{m}}{n} \right\} \left\{ \frac{(b-1)(1-\delta)}{b} + [1+\delta(b-1)] \right\} \qquad (2.39)$$
ii. For j-i = 4.
$$\mathrm{Var}_{ENASS}({}_{ij}\hat{d}) - \mathrm{Var}_{NASS}({}_{ij}\hat{d}) = \left\{ -\frac{3\sigma^{2}\rho^{4}}{5n} \right\} \left\{ \frac{(b-1)(1-\delta)}{b} + [1+\delta(b-1)] \right\} \qquad (2.40)$$
iii. For j-i > 4 but ≠ 8, ≠ 12, or any other multiple of 4, the difference is 0.
iv. For j-i = 8. (For a comparison involving a 12 year or higher 4 year multiple difference,
the superscript on the ρ is 12, or the corresponding multiple of 4.)
2.41
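As a worked check of the four-year EMAP and NASS comparison, the difference can be obtained directly from the overlap values given in Section 2.4.4. This sketch assumes that each variance is the simple random sampling term $2\sigma^{2}(1-q\rho^{m})/n$ multiplied by a common design effect, written here as $D$ (a symbol introduced only for this illustration), with q=1 for EMAP and q=1/5 for NASS at m=4:

```latex
\begin{aligned}
\mathrm{Var}_{EMAP} - \mathrm{Var}_{NASS}
  &= \frac{2\sigma^{2}}{n}\left[(1-\rho^{4}) - \left(1-\tfrac{1}{5}\rho^{4}\right)\right] D \\
  &= -\frac{2\sigma^{2}}{n}\cdot\frac{4}{5}\,\rho^{4}\, D
   = -\frac{8\sigma^{2}\rho^{4}}{5n}\, D
\end{aligned}
```

For positive ρ the difference is negative, so under these assumptions the EMAP design has the smaller variance in a four-year comparison.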
2.6 Alternative Estimators of Mean Differences
The above measure of the underlying variance for each design option was obtained
for the difference in means between years i and j within a two-stage sample. However,
combining data from a number of periods, where data are collected on the same observational
units, would improve the precision of these estimates. The CPS, capitalizing on its rotation
design, uses such an estimator, called a composite estimator, for status estimates (Hansen
et al., 1955; Waksberg and Pearl, 1964). A few other types of composite estimators have
been investigated to improve the precision of the status estimate (Gurney and Daly, 1965;
Kumar and Lee, 1983; Huang and Ernst, 1981; Breu and Ernst, 1983; Cantwell, 1990). In
all of these cases, the precision of the estimate of year to year change improves when
data from preceding periods are used.
Since the benefits of the composite estimator are related to the type of design, a
variety of different estimators can be used for each of these designs to improve precision.
The use of a composite estimator with the rotating panel designs, such as NASS and
ENASS, would improve the estimate of the underlying variance. The use of a composite
estimator to incorporate data from the four interpenetrating replicates would improve the
precision of the EMAP design.
The use of the simple estimator in the above work
eliminated any bias attributed to the choice of the estimator and allowed for a
straightforward comparison of the design options. However, the existence of the composite
estimators should be acknowledged as a method to improve the precision for all three design
options.
CHAPTER 3
COST MODEL DEVELOPMENT AND OPTIMUM STAGE ALLOCATION
The remaining component to be developed in this research is the cost model. An
important requirement of the cost model is to include all costs associated with the survey.
For example, these might include costs associated with the development of the frame, travel
costs, quality assurance costs, and costs associated with printing reports.
In addition,
differences in costs associated with initial or repeated visits must be accounted for in the
cost model.
A cost model is to be developed for each of the three designs under
consideration in this research.
The first section of this chapter outlines the components of costs which are
considered in the cost model and presents the cost model adopted in this research.
The
second section presents computations for the optimum stage allocation in each design.
Using the adopted cost model and the variance model derived in Chapter 2, methods are
presented to seek exact expressions for the optimum stage allocation. An empirical method
is presented to best approximate the optimum allocations.
3.1 Cost Model Development
3.1.1 Cost Models Considered
The variance model for each of the design options under consideration was derived
in Chapter 2. In this chapter, a cost model is derived for each of these same design options.
All costs of the survey, including both fixed and variable costs, need to be accounted for in
these models. A number of general cost models have been developed and presented in the
literature. For example, Hansen, Hurwitz, and Madow (1953), Kish (1965), Cochran (1977),
and Groves (1989) each presented a general cost model for a two-stage design. In this cost
model, the total cost of the survey has been partitioned between the first and second stages.
Another form of this model adds a component of cost associated with the fixed costs of the
survey, such as overhead costs. A more complex cost model, discussed by each of these
authors, extends this two-stage cost model to account for the added costs associated with
travel to a primary sampling unit.
The development of the cost model is as important as that of the variance model,
since both contribute to the computation of cost efficiencies. Groves (1989) points out that
survey researchers have given much less attention to survey cost models than to
survey error models. Consequently, many cost models do not accurately reflect the cost of a
survey. For example, Groves points out that most existing cost models inappropriately
assume that costs are linear functions of the survey parameters, are continuous in
those parameters, and are deterministic. In addition, most cost models are inapplicable
under some ways of conducting the data collection.
In order to improve designs, Groves (1989) suggests that more attention should be
paid to cost model construction. In this research, several sources were utilized to obtain the
most accurate information on survey costs. The agroecosystem component of EMAP plans
to coordinate with USDA-NASS to collect the information proposed in the EMAP program,
regardless of which design option is adopted.
Therefore, USDA-NASS was requested to
provide accurate estimates on costs for each of the designs under consideration. The cost
estimates in this dissertation were obtained from an internal USDA-NASS document on
survey costs (Garibay and Huffman, 1991), from a summary report on the USDA-NASS
design (Cotter and Nealon, 1987), and from personal communication with Robert Bass,
USDA-NASS.
These combined sources provided the most accurate estimates of costs
currently available for each of the designs.
3.1.2 Proposed Cost Model
The total cost of the survey was partitioned into three categories. The first category
comprises the costs associated with frame development. The second category covers the costs
associated with the interviewer's time and travel in collecting these data. The third
category includes quality assurance, printing, laboratory, and statistical analysis costs.
Costs attributed to frame development were further partitioned into fixed and
variable costs. The fixed cost is the cost attributed to the development of the area frame for
a given state. This cost is not affected by sample size and is a one time cost. The cost
associated with the delineation of a primary sampling unit (PSU) is a variable cost, the cost
varying with the number of PSUs that are delineated.
Each of these components of cost
also incorporated some administrative costs, such as agency overhead.
In order to obtain all of the data from a site for a given year, three visits are
planned to each site. The first visit during any given year will obtain data on land use
characterization. The second visit during a given year will obtain a sample of irrigation
water, while the third visit will obtain a soil sample for laboratory analyses. The cost of the
initial visit for a given year depends on the year the site enters the survey. During this
initial visit, the address of the respondent is obtained. Travel costs are lessened in
subsequent years since the respondent is more easily located. In addition, the amount of
time (and cost) to conduct the survey is reduced since the respondent has previously been
informed about the nature of the survey.
Other costs of the survey are associated with quality assurance, training of
interviewers, printing of questionnaires and final reports, laboratory analysis of soil and
water samples, computer time, and data analysis. Some of these costs are fixed per year,
while other costs, such as printing, vary with the number of segments. The costs for all
three categories are summarized in Table 3.1.
3.2 Optimum Sample Allocation Among Stages
3.2.1 Cost and Error Models
The cost model presented in Section 3.1.2 reflects the complexities of the cost
structure. This model was summarized into the following form:
$$T = C + a\,C_J + n \begin{cases} C_1 & \text{for initial visit} \\ C_2 & \text{for repeated visit} \end{cases} \qquad (3.1)$$
where T reflects total cost of the survey, a is the number of PSUs, C represents the fixed
costs presented in Table 3.1, $C_J$ is the cost of delineating a PSU, n is the sample size
(n=a*b), $C_1$ is the variable cost associated with collection of data in a segment for the first
year the site is in the survey, and $C_2$ is the variable cost attributed to collection of data in a
segment for the second, third, and other subsequent years in the survey. Both $C_1$ and $C_2$
include costs of the three visits in a given year.
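Equation 3.1 can be read as a sum over segment-years, with each segment-year charged at the initial or the repeated rate. The sketch below is one hedged reading of the model, not the author's code. The function name and the split into `n_initial` and `n_repeat` segment-years are assumptions. The example uses the EMAP unit costs from Table 3.2 and assumes 3,200 distinct sites (four annual sets of 800), a count chosen because it reproduces the EMAP total reported in that table.

```python
def total_cost(C, C_J, a, n_initial, n_repeat, C1, C2):
    """Total survey cost under one reading of Equation 3.1.

    C         fixed costs
    C_J       cost of delineating one PSU
    a         number of PSUs delineated
    n_initial segment-years charged at the first-year rate C1
    n_repeat  segment-years charged at the repeat rate C2
    """
    return C + a * C_J + n_initial * C1 + n_repeat * C2


# EMAP, 800 points per year for 12 years, b = 1 (assumed site counts).
emap_total = total_cost(
    C=3_404_000.0,
    C_J=561.74,
    a=3200,          # assumption: four annual sets of 800 sites
    n_initial=3200,  # each site has one first-year visit cycle
    n_repeat=6400,   # remaining 12 * 800 - 3200 segment-years
    C1=865.0,
    C2=839.0,
)
```

Under these assumptions `round(emap_total)` equals 13,339,168, the total shown for EMAP in Table 3.2.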
An attempt was made to provide a more complicated cost model, which accounted
for the varying degrees of overlap associated with different years of interest in a comparison.
However, Groves (1989) points out that cost and error models for survey optimization have
remained simple in practice for three reasons. First, the survey statistician has not had
extensive data on cost and error components; second, closed form solutions to
optimization problems are not available when complex cost and error models are used; and
lastly, complex models add complexity to the already difficult task of designing a complete survey. The
cost model presented in Equation 3.1 accounts for all sources of survey cost and provides a
simple model for use in the optimum allocation computations.

Table 3.1 Cost estimates for each design option.

Component                      NASS and ENASS      EMAP
A. Frame costs
   Fixed                       $12,032.00/state    $60,160.00/state
   Variable                    351.18/k            561.74/k
B. Site visits
   June visit:
     Initial visit             23.00/k             200.00/k
     Subsequent visits         20.00/k             174.00/k
   Midsummer visit:            35.00/f             35.00/f
   December visit:             85.00/f             85.00/f
C. Other costs
   - Quality assurance         6,000.00/year       7,500.00/year
   - Training                  14,000.00/year      17,500.00/year
   - Printing                  20.00/k             20.00/k
   - Laboratory analyses       95.00/f             95.00/f
   - Data analysis /
     computer time             6,000.00/year       8,000.00/year

k=segment, f=field.
The error models presented in Chapter 2, which will be used in this optimization for
each of the design options, are the following:
A. EMAP
3.2
Two situations arise:
1. For multiples of 4 years apart, m=4, m=8, etc.
2. For non-multiples of 4 years apart, the covariance term falls out.
B. ENASS
3.3
Two situations arise:
1. For exactly 4 years apart, m=4 and q=1/2.
2. Otherwise, q=0 and the covariance term falls out.
C. NASS
This is given in Equation 3.3 with the following caveats:
1. For ≤ 4 years apart, m=(j-i) and q=(5-m)/5.
2. Otherwise, q=0 and the covariance term falls out.
3.2.2 Analytical Methods
The allocation of primary and secondary sampling units that minimizes the
variance for a specified cost was investigated analytically. Optimum solutions
can be obtained by using either the Cauchy-Schwarz inequality (Stuart, 1954) or the
Lagrange multiplier (Morrison, 1976). Attempts to determine the optimum stage allocation
for each design utilized both of these approaches.
Cochran (1977) shows that minimization of the variance model for a fixed cost is
equivalent to minimizing the product of the cost and variance model. Furthermore, Stuart
(1954) suggested that this product of the variance-cost equation can be minimized using the
Cauchy-Schwarz inequality. However, this equation was too complicated to allow for a closed
form solution of a and b.
Since the goal of this analysis is to minimize variance subject to a fixed cost, a
Lagrange multiplier was used. This method has been applied to optimization theory in the
areas of economics and operational research, as well as statistics (Fryer and Greenman,
1987). A variance-cost equation was defined that combined the variance model with
the condition that the cost was fixed. The derivatives with respect to a and b were
obtained and set equal to zero. The expression for $a_{opt}$ was derived in terms of b. However, no
easy solution for b was available, since the equation used to solve for $b_{opt}$ contained linear,
quadratic, and cubic terms in b.
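In another approach, the cost model (Equation 3.1) was solved for a and
substituted into the variance model (Equation 3.2). This result is referred to as the
variance-cost V(C) model, which is a presentation of variance in terms of cost. This
equation and its derivative with respect to b were obtained and presented as:
$$V(C) = \frac{2\sigma^{2}(1-q\rho^{m})(2b-2\delta b+\delta b^{2}+\delta-1)\,[C_J+b(C_1+C_2)]}{b^{2}(T-C)} \qquad (3.4)$$
$$\frac{\partial[V(C)]}{\partial b} = \frac{2\sigma^{2}(1-q\rho^{m})(2b-2\delta b+\delta b^{2}+\delta-1)(C_1+C_2)}{b^{2}(T-C)} + \frac{4\sigma^{2}(1-q\rho^{m})(b\delta-\delta+1)\,[C_J+b(C_1+C_2)]}{b^{2}(T-C)} - \frac{4\sigma^{2}(1-q\rho^{m})(2b-2\delta b+\delta b^{2}+\delta-1)\,[C_J+b(C_1+C_2)]}{b^{3}(T-C)} \qquad (3.5)$$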
This derivative was set equal to zero in order to solve for $b_{opt}$. As indicated in Equation
3.5, linear, quadratic, and cubic terms in b are present; consequently, a simple solution for
$b_{opt}$ could not be found. However, a solution was obtained using a software package called
Mathematica (Champaign, Illinois). In the following equations, K is defined as the average
of $C_1$ and $C_2$. The solution resulted in two roots which were imaginary and the following
real root:
3.6
The solution for a opt is the following:
(T - C)
3.7
The second derivative was evaluated at $b_{opt}$ to determine whether the critical point
was a relative maximum or minimum. Results of the second derivative test suggested that
the critical point was a relative maximum. The goal of this analysis was to obtain a value
of b which would minimize the variance-cost model. No minimum point was found,
implying that the results from this analysis are inconclusive in determining $a_{opt}$ and $b_{opt}$.
3.2.3 Empirical Methods
Another approach was used in order to solve this equation for optimum a and b,
and to determine whether these critical points minimized the variance-cost model. In this
approach, a fixed cost for the survey was assumed. First, the objective of the EMAP
program is to monitor ecosystem health across the United States. Therefore, it was
assumed that the area frame would be developed for 50 states. Second, legislation is under
review in Congress to approve spending on the EMAP program for a 10-15 year period, with
a sample size of 800 points each year. To obtain an estimate of the cost to monitor agricultural
health in this research, the total cost of each design was computed assuming 50 states are in
the survey, the duration of the program is 12 years, and funds are available to monitor 800
sample points per year. Since the Congressional approval of this budget is based on the
EMAP design, which is also the most expensive of the three design options, the total
cost estimated for the EMAP design was adopted as the fixed total cost in subsequent
analyses. The total fixed cost to monitor agricultural health for 50 states over 12 years,
using 800 sample points per year, was estimated at $13.3 million.
Estimates of $C_J$, C, $C_1$, and $C_2$ for each design were computed based on the values
presented in Table 3.1 and are shown in Table 3.2. The number of fields required to obtain
water and soil samples was fixed at three for each design. It is apparent from the estimates
presented in Table 3.2 that there is a great cost savings in using either the ENASS or NASS
design. The EMAP design is approximately 37 percent and 59 percent more
expensive than the ENASS and NASS designs, respectively. Since the segments chosen in
the NASS or ENASS design are the same segments visited by USDA, the costs of developing
the frame and visiting these segments are shared by USDA, which provides for the cost
savings of these designs.
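The step from Table 3.1 to the per-segment and fixed costs in Table 3.2 can be sketched as follows. This is an illustrative reconstruction, not the author's computation: the grouping of components into $C_1$, $C_2$, and C is an assumption, adopted because it reproduces the Table 3.2 values when three fields per segment are used. The dictionary keys and function names are hypothetical.

```python
FIELDS_PER_SEGMENT = 3  # fixed at three for each design in the text

# Unit costs transcribed from Table 3.1 (dollars).
TABLE_3_1 = {
    "NASS/ENASS": {
        "frame_per_state": 12032.00,
        "june_initial": 23.00,
        "june_subsequent": 20.00,
        "midsummer_per_field": 35.00,
        "december_per_field": 85.00,
        "printing_per_segment": 20.00,
        "lab_per_field": 95.00,
        "annual_fixed": 6000.00 + 14000.00 + 6000.00,  # QA, training, analysis
    },
    "EMAP": {
        "frame_per_state": 60160.00,
        "june_initial": 200.00,
        "june_subsequent": 174.00,
        "midsummer_per_field": 35.00,
        "december_per_field": 85.00,
        "printing_per_segment": 20.00,
        "lab_per_field": 95.00,
        "annual_fixed": 7500.00 + 17500.00 + 8000.00,
    },
}


def segment_cost(costs, initial):
    """Per-segment yearly cost: C1 if initial is True, otherwise C2."""
    june = costs["june_initial"] if initial else costs["june_subsequent"]
    per_field = (costs["midsummer_per_field"]
                 + costs["december_per_field"]
                 + costs["lab_per_field"])
    return june + costs["printing_per_segment"] + FIELDS_PER_SEGMENT * per_field


def fixed_cost(costs, states=50, years=12):
    """Fixed cost C: frame development plus yearly fixed items."""
    return states * costs["frame_per_state"] + years * costs["annual_fixed"]
```

With these groupings, `segment_cost(TABLE_3_1["EMAP"], True)` gives 865.00 and `fixed_cost(TABLE_3_1["NASS/ENASS"])` gives 913,600, matching Table 3.2.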
Table 3.2. Cost estimates for each design option assuming 800 segments are monitored
across 50 states over 12 years.

Design    C             C_J        C_1        C_2        T
NASS      $913,600      $351.18    $688.00    $685.00    $8,396,301
EMAP      $3,404,000    $561.74    $865.00    $839.00    $13,339,168
ENASS     $913,600      $351.18    $688.00    $685.00    $9,756,352

In order to determine whether this cost savings was consistent for other sets of
plausible design parameters, the total survey cost was computed over a number of other
scenarios. For example, the total number of years of the study was assumed to be 8 and 16
years, and the total number of sites visited each year was varied between 600 and 1000.
The results of this analysis are presented in Table 3.3. The ratio of the cost of the EMAP
design to the cost of the NASS and ENASS designs for each category is presented in parentheses. Using
800 points per year as a comparison, the proportional cost savings of the NASS or ENASS
designs relative to the EMAP design is more substantial for 8 years and less substantial for 16 years.
This pattern is due to the rotating panel in the NASS and ENASS designs. As the number
of years in the survey increases (as well as with increasing sample size), there is an increasing
number of new PSUs entering the survey. This results in an increased cost for these designs
due to PSU delineation. Since no new PSUs are introduced into the EMAP design, the cost
of PSU delineation remains constant across years. Therefore, the proportional cost savings of
NASS and ENASS relative to EMAP decreases with increasing sample size and with an increasing
number of years in the survey.
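The scenario totals can be approximated with a short script. This is a sketch under stated assumptions, not the procedure used in the dissertation: the counts of newly entering sites per design (`new_sites`) are inferred from the rotation schemes and were chosen because they reproduce the totals in Tables 3.2 and 3.3. The unit costs come from Tables 3.1 and 3.2, and all names are hypothetical.

```python
UNIT_COSTS = {
    # frame: cost per state; annual: fixed cost per year (dollars)
    "NASS":  {"frame": 12032.0, "annual": 26000.0, "C_J": 351.18, "C1": 688.0, "C2": 685.0},
    "ENASS": {"frame": 12032.0, "annual": 26000.0, "C_J": 351.18, "C1": 688.0, "C2": 685.0},
    "EMAP":  {"frame": 60160.0, "annual": 33000.0, "C_J": 561.74, "C1": 865.0, "C2": 839.0},
}


def new_sites(design, n, years):
    """Assumed number of distinct sites entering the survey (years >= 4)."""
    if design == "EMAP":
        return 4 * n                            # four annual sets, then revisits
    if design == "ENASS":
        return 4 * n + (years - 4) * (n // 2)   # half of each later year is new
    if design == "NASS":
        return n + (years - 1) * (n // 5)       # one fifth rotates in each year
    raise ValueError(f"unknown design: {design}")


def scenario_cost(design, n, years, states=50):
    """Total cost for n sample points per year over the given number of years."""
    u = UNIT_COSTS[design]
    entering = new_sites(design, n, years)
    repeats = n * years - entering
    fixed = states * u["frame"] + years * u["annual"]
    return fixed + entering * (u["C_J"] + u["C1"]) + repeats * u["C2"]


def emap_ratio(design, n, years):
    """Cost of EMAP relative to another design, as shown in parentheses."""
    return scenario_cost("EMAP", n, years) / scenario_cost(design, n, years)
```

For 800 points per year, `emap_ratio("NASS", 800, years)` falls from about 1.79 at 8 years to about 1.48 at 16 years, which is the declining pattern described above.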
Using the estimates of costs presented in Table 3.2, the variance-cost values were
obtained over a range of b (b=1 to 100), thus describing the behavior of the V(C) function
from Equation 3.4. Since σ² will vary among measures of interest, values of
$$V(C)' = \frac{V(C)}{2\sigma^{2}}$$
were plotted in Figures 3.1-3.3. This function was evaluated over a range of temporal
correlation (rho, ρ), panel overlap (q), and intracluster correlation (delta, δ). For example,
ρ=-1.00 to 1.00, δ=0.10 to 0.95, and all values of q expected for each design were considered. An
indication of the behavior of V(C)' over a range of b is presented in Figure 3.1 for the
EMAP design (q=0) for two values of δ. In both cases, $b_{opt}$=1 resulted in the minimum
variance-cost estimate. The range of the y axis was fixed for these two plots in order to
visually inspect the impact of δ on V(C)'. A high measure of intracluster homogeneity
(δ=0.75) reflects a high degree of similarity within a cluster. High measures of δ result in
high measures of variance, as compared to the same design with low values of δ. The
difference in the slopes of these two lines reflects this effect of δ. The plot for δ=0.75 shows
a much higher slope and a higher V(C)', for the same b, than the plot for δ=0.20.
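The empirical search over b can be sketched as a simple grid evaluation. The following is a hedged illustration rather than the original computation: it assumes V(C)' has the form (1 - qρ^m)(2b - 2δb + δb² + δ - 1)[C_J + b(C_1 + C_2)] / [b²(T - C)], which is a reconstruction from the terms of the derivative in Equation 3.5, and it uses the EMAP cost estimates from Table 3.2. Function names are hypothetical.

```python
def vc_prime(b, delta, q, rho, m, C_J, C1, C2, T, C):
    """Assumed form of V(C)' = V(C) / (2 * sigma^2) as a function of b."""
    overlap_term = 1 - q * rho ** m
    cluster_term = 2 * b - 2 * delta * b + delta * b ** 2 + delta - 1
    cost_term = C_J + b * (C1 + C2)
    return overlap_term * cluster_term * cost_term / (b ** 2 * (T - C))


# EMAP cost estimates from Table 3.2.
EMAP = {"C_J": 561.74, "C1": 865.0, "C2": 839.0, "T": 13_339_168.0, "C": 3_404_000.0}


def best_b(delta, q=0.0, rho=0.0, m=1, b_values=range(1, 101)):
    """Value of b on the grid that minimizes the assumed V(C)'."""
    return min(b_values, key=lambda b: vc_prime(b, delta, q, rho, m, **EMAP))
```

Under this assumed form, `best_b(0.20)` and `best_b(0.75)` both return 1, and for any b greater than 1 the value at δ=0.75 exceeds the value at δ=0.20, in line with the behavior described for Figure 3.1.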
Table 3.3. Estimates of total cost over a scenario of different years and different numbers of
sample points per year for all designs.

                                     YEARS
SAMPLE POINTS   Design    8             12            16
600             NASS      4,607,619     6,525,626     8,443,632
                          (1.89)        (1.66)        (1.54)
                ENASS     5,372,648     7,545,664     9,718,680
                          (1.62)        (1.44)        (1.34)
                EMAP      8,709,776     10,855,376    13,000,976
800             NASS      5,873,626     8,396,301     10,918,976
                          (1.79)        (1.59)        (1.48)
                ENASS     6,893,664     9,756,352     12,619,040
                          (1.53)        (1.37)        (1.28)
                EMAP      10,522,368    13,339,168    16,155,968
1000            NASS      7,139,632     10,266,976    13,394,320
                          (1.73)        (1.54)        (1.44)
                ENASS     8,414,680     11,967,040    15,519,400
                          (1.47)        (1.32)        (1.24)
                EMAP      12,334,960    15,822,960    19,310,960

Figure 3.1. V(C)' versus b for the EMAP design (q=0).
Panels: Delta=0.20; Delta=0.75.

The same optimum for b was obtained for the EMAP design (q=1), presented in
Figures 3.2a and 3.2b. The range of the y axis was fixed for these four plots in order to
visually inspect the impact of δ and ρ on V(C)'. As discussed above, δ had a similar effect
on the function for q=1 at two measures of ρ. The additional effect of ρ on V(C)' was
evaluated in this example. Temporal correlation was assumed to be equal to a high value,
0.90, in Figure 3.2a and a low value, 0.20, in Figure 3.2b. As indicated in Equations 3.2 and
3.3, the variance decreases with increasing values of positive temporal correlation. The
relatively high value of ρ assumed in Figure 3.2a resulted in a smaller estimate of V(C)' as
compared to the low value of ρ presented in Figure 3.2b. This results in a lower slope for the
relationship of V(C)' to b for high measures of ρ.
As indicated in these figures, the relationship of V(C)' to b was linear. However,
the equation for V(C)', and V(C) which was presented in Equation 3.4, indicated this
relationship was nonlinear. In order to explain this result, the V(C)' equation was further
simplified into a linear and a nonlinear component of b. The nonlinear component was a
quadratic function of the inverse of b, which dominates the expression of V(C)' for
b < 1. However, the linear component of the V(C)' equation dominates the expression for
b ≥ 1. Since our objective is to obtain the minimum value of V(C)' for b ≥ 1, only this linear
component of the function is represented in the attached figures. The linear coefficient for b
in this component of the model is :t~. Since the denominator of this expression is quite
large relative to the numerator (Table 3.2), a small rate of change results as indicated in
these plots (see range of y axis).
An increasing linear relationship between b and V(C)' was also obtained for the
NASS and ENASS designs over the same range of ρ and δ as used in the EMAP analysis.
The slope of the relationship of V(C)' and b was greatest with high δ. This same result was
obtained over the measures of q which are expected for these designs. A large percentage of the
costs in the cost model is attributed to the fixed overhead costs and the costs of analysis of
the soil and water samples. These costs are not affected by sample stage allocation.
Therefore, the cost savings of choosing more than one segment per PSU does not make a
significant enough impact to justify sampling more than one segment per PSU. In order to
investigate the impact of a fixed total cost on the relationship between V(C)' and b, the
sample allocation was obtained using a range of total cost ($5,000,000 to $30,000,000).
Examples of these results are presented in Figure 3.3 for the NASS design. The estimate of
V(C)' decreases with increasing dollars, due to the increased sample size influencing V(C)'.
This analysis was repeated for the EMAP and ENASS designs over this range of costs (as
well as δ, q, and ρ) and similar conclusions were obtained. Results of all analyses indicated
that b=1 is the optimum allocation.

Figure 3.2a. V(C)' versus b for the EMAP design (q=1).
Panels: Rho=0.90, Delta=0.75; Rho=0.90, Delta=0.20.

Figure 3.2b. V(C)' versus b for the EMAP design (q=1).
Panels: Rho=0.20, Delta=0.75; Rho=0.20, Delta=0.20.
3.3 Sample Allocation Assessment
Data collected by USDA-NASS in 1990 and 1991 in North Carolina were
obtained to determine measures of intracluster homogeneity, δ, the level of within cluster
homogeneity. The measurement obtained was the acreage of agricultural land use within a
segment. Approximately 1 percent of the PSUs selected by NASS contain two segments per
PSU. These PSUs were identified and measures of intracluster correlation were computed
for 1990 and 1991. The number of PSUs in 1990 was 20, while there were 22 PSUs in 1991.
The measures of δ were 0.80 and 0.69, respectively. These values of δ suggest that there is a
high measure of similarity among segments within a PSU. This is typical for agricultural
land use, since the agricultural practices within the size of a PSU are very similar. It is
expected that most of the variables measured in this program, which are all related to
agricultural land use, would also result in a high measure of intracluster homogeneity. A
high measure of intracluster homogeneity suggests that the number of segments required to
adequately describe a particular PSU is small. Multiple segments provide only
redundant information on the chosen PSU. Therefore, the results of this analysis support
the conclusion that $b_{opt}$=1 is adequate to represent a PSU in this survey.
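A measure of intracluster homogeneity of this kind can be computed from PSUs that contain two segments each. The sketch below uses the one-way analysis of variance estimator of intracluster correlation for equal-size clusters. That choice is an assumption, since the text does not state which estimator was applied. The acreage values in the example are invented for illustration and are not NASS data.

```python
def anova_roh(clusters):
    """One-way ANOVA estimate of intracluster correlation.

    clusters: list of equal-length lists, one list of segment values per PSU.
    """
    a = len(clusters)
    b = len(clusters[0])
    if a < 2 or b < 2 or any(len(c) != b for c in clusters):
        raise ValueError("need at least 2 clusters of equal size >= 2")
    grand_mean = sum(sum(c) for c in clusters) / (a * b)
    cluster_means = [sum(c) / b for c in clusters]
    # Between-cluster and within-cluster mean squares.
    msb = b * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (a - 1)
    msw = sum((y - cm) ** 2
              for c, cm in zip(clusters, cluster_means)
              for y in c) / (a * (b - 1))
    return (msb - msw) / (msb + (b - 1) * msw)


# Hypothetical acreage of agricultural land use for two segments in each PSU.
toy_psus = [[120.0, 130.0], [300.0, 280.0], [50.0, 65.0], [210.0, 200.0]]
toy_roh = anova_roh(toy_psus)
```

In this toy example the two segments within each PSU are much more alike than segments from different PSUs, so the estimate is close to 1. Values such as the 0.80 and 0.69 reported above would indicate somewhat more variation within PSUs.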
Figure 3.3a. V(C)' versus b for varying fixed cost for the NASS design (q=0).
Panels: Total cost = $5,000,000; Total cost = $10,000,000.

Figure 3.3b. V(C)' versus b for varying fixed cost for the NASS design (q=0).
Panels: Total cost = $20,000,000; Total cost = $30,000,000.
There is one disadvantage in choosing $b_{opt}$=1: it will not be possible to assess, from the
survey data, whether this is the appropriate stage allocation. For example,
choosing b>1 will provide data to determine measures of δ. Using data collected from the
survey, the optimum stage allocation can be determined and compared with the initial stage
allocation. However, the choice of $b_{opt}$=1 does not provide the data to obtain estimates of
δ. Since this evaluation could be a consideration in the determination of the best design,
additional values of b will be used to provide a range of stage allocations in the
cost-efficiency comparisons.
The optimum stage allocation analysis indicated $b_{opt}$=1 over a range of parameters
and cost. The number of PSUs was determined, assuming b=2 and b=3, and fixing the
total cost at the estimates provided in Table 3.2 for each design. These results are
presented in Table 3.4.A.
The proposed budget for the EMAP program, submitted to Congress this past year,
adopted the EMAP design to monitor all ecosystems. If this budget is approved, it is
expected that $13.3 million will be devoted to monitoring agricultural health over a 12
year period. As indicated, the cost of a 12 year program using the NASS or ENASS designs
is less than $13.3 million (Table 3.2). Assuming the agriculture component of
EMAP acquires $13.3 million, the number of PSUs which can be evaluated over a 12
year period with this funding was determined for each of the designs over a range of b. The
costs used in this analysis are presented in Table 3.2, which accounted for the differential
costs attributed to repeated visits over time, the cost of PSU delineation, fixed frame costs,
and the travel cost savings attributed to multiple SSUs in a PSU (15 percent savings).
The results of this analysis are presented in Table 3.4.B. As expected, a higher
number of PSUs can be evaluated with a decreasing number of SSUs for all designs. This
analysis also indicates that more PSUs can be evaluated with the ENASS and NASS designs
as compared to the EMAP design when cost is fixed at $13.3 million. Since the cost
of PSU delineation for the ENASS and NASS designs is 60% of the cost of the EMAP
delineation, more funds are available to increase the number of PSUs. The cost savings
discussed for the NASS design decreases with an increasing number of SSUs. The cost of
sampling each additional SSU over a 5 year rotation absorbs this cost savings, and thus
does not provide the resources necessary to increase the number of PSUs.

Table 3.4. Estimates of $a_{opt}$ per year for different values of b.

A. Sample size computed fixing the total survey cost at the costs
presented in Table 3.2.

b      ENASS    NASS     EMAP
1      800      800      800
2      525      495      508
3      371      338      352

B. Sample size computed fixing the total survey cost at $13.3 million
(the cost of the EMAP design).

b      ENASS    NASS     EMAP
1      1124     1328     800
2      738      822      508
3      521      562      352
CHAPTER 4
COST-EFFICIENCY RESULTS
The objective of this chapter is to compare cost-efficiency results among the design
options studied in this research. The variance and cost models for each design were derived
in Chapters 2 and 3, respectively.
The measure of cost-efficiency adopted in this research, along with other
alternatives, is presented in the first section of this chapter. The cost-efficiency comparative
measures are presented in the second section of this chapter. Cost-efficiencies were evaluated
assuming two types of survey costs. The first approach assumed the average cost per design
while the second approach assumed a similar fixed cost across designs. Finally, a
comparison of designs was carried out assuming a range of total survey cost and fixed
variance. The results of the cost-efficiency comparison are presented across a range of
parameters, such as annual autocorrelation. The last section summarizes these
cost-efficiency comparison results.
4.1 Measure of Cost-Efficiency
The purpose of a cost-efficiency measure is to describe efficiency in terms of cost.
This measure is a combination of the variance of interest and an accurate measure of cost.
The variance model used in this analysis for each of the designs was presented in Equations
3.2-3.3. The cost model was presented in Equation 3.1, where the accompanying cost
estimates for this model were presented for each design in Table 3.1.
A method of comparing investment alternatives commonly used in economic
analysis is the benefit-cost ratio method (Canada and White, 1980; Henderson and Quandt,
1971). The benefit-cost ratio can be defined as the ratio of the equivalent worth of benefits
to the equivalent worth of costs. The cost-efficiency analysis proposed in this chapter is
similar to the benefit-cost ratio method. In this analysis, a measure of relatively small
variance can be considered a benefit. The estimate of the total survey cost associated with
this benefit is the estimate of cost.
Measures of cost-efficiency, which are both measures of efficiency per dollar of cost,
considered in this research included:
1. \( CE = \frac{1}{V \cdot C} \)    (4.1a)
2. \( CE = \frac{1}{\sqrt{V} \cdot C} \)    (4.1b)
In these equations, CE refers to cost-efficiency. V and C are defined as the measures of
derived variance and cost, respectively. Precision, or efficiency, is defined by Kish (1965,
Section 1.6) as the inverse of the variance of the survey estimates. However, small measures
of variance can result in very large measures of precision, and variances approaching zero can
result in an undefined measure of precision. Taking the square root of the variance is an
attempt to avoid such extremes. In this research, the second equation was adopted as the
cost-efficiency measure.
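The difference between the two candidate measures can be sketched in a few lines of Python. The sketch reads Equation 4.1a as CE = 1/(V·C) and Equation 4.1b as CE = 1/(√V·C); the function names and the unit cost are illustrative assumptions, and the variance of 0.0004 is simply the smallest value mentioned later in Section 4.2.3.

```python
import math

def ce_inverse(v: float, c: float) -> float:
    """Equation 4.1a as read here: CE = 1 / (V * C)."""
    return 1.0 / (v * c)

def ce_root(v: float, c: float) -> float:
    """Equation 4.1b as read here: CE = 1 / (sqrt(V) * C)."""
    return 1.0 / (math.sqrt(v) * c)

# Hypothetical inputs: a variance of 0.0004 and a cost of 1.0 (arbitrary units).
v, c = 0.0004, 1.0
print(ce_inverse(v, c))  # grows like 1/V as the variance shrinks
print(ce_root(v, c))     # grows like 1/sqrt(V), so it is less extreme
```

With these inputs the first measure is roughly 2500 and the second roughly 50, which illustrates why the square-root form is less sensitive to very small variances.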
4.2 Cost-Efficiency Comparisons
4.2.1 Proposed Comparative Measures
In order to compare designs, several comparative measures of relative efficiency
(RE) were considered. These measures of relative efficiency included:
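1. \( RE = \frac{CE_1}{CE_2} = \frac{1 / (\sqrt{V_1} \cdot C_1)}{1 / (\sqrt{V_2} \cdot C_2)} \)    (4.2a)
2. \( RE = \frac{CE_1 - CE_2}{CE_2} \)    (4.2b)
3. \( RE = CE_1 - CE_2 \)    (4.2c)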
The subscript in the CE refers to any two designs in a comparison. The first
measure is a simple ratio of the cost-efficiency of two designs. The second measure is a
relative ratio of cost-efficiency. This measure evaluates the difference in cost-efficiency of
two designs relative to the cost-efficiency of one of these designs. The third measure is an
estimate of the simple difference of cost-efficiency in two designs. In many cases, a ratio of
cost-efficiencies resulted in a cancellation of some terms. For example, the design effect
(present in the variance model, Equation 3.2) canceled out in Equations 4.2a and 4.2b.
Given these two choices to describe relative efficiencies in this research, the ratio in Equation
4.2a was selected to compare relative efficiency due to its simpler form.
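The three comparative measures can be written as small Python functions. This is only a sketch: the function names and the two cost-efficiency values are hypothetical, and the relative measure is taken relative to the second (reference) design, which is an assumption about which design sits in the denominator.

```python
def re_ratio(ce1: float, ce2: float) -> float:
    """Simple ratio of two cost-efficiencies (Equation 4.2a)."""
    return ce1 / ce2

def re_relative(ce1: float, ce2: float) -> float:
    """Difference relative to design 2; the choice of design 2 is an assumption."""
    return (ce1 - ce2) / ce2

def re_difference(ce1: float, ce2: float) -> float:
    """Simple difference of two cost-efficiencies."""
    return ce1 - ce2

# Hypothetical cost-efficiencies for a candidate design and a reference design.
ce_candidate, ce_reference = 60.0, 50.0
print(re_ratio(ce_candidate, ce_reference))       # above one: candidate is more cost-efficient
print(re_relative(ce_candidate, ce_reference))    # positive: candidate is more cost-efficient
print(re_difference(ce_candidate, ce_reference))  # depends on the scale of CE
```

Only the first two measures are scale-free, which is why a factor common to both designs drops out of them but not out of the simple difference.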
Section 3.2.3 discussed the justification for computing costs for this survey over a 12 year
period, over 50 states, and sampling 800 points per year. The measure of cost in
these analyses was assumed to be the average cost per year of the survey. The cost
estimates for each of the designs for the conditions assumed in this research were presented
in Table 3.2. To obtain the average cost per year, these values were divided by the number
of years in the survey (12).
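As a worked instance of this averaging, take the 12 year total for the EMAP design, $13,339,168, which is quoted in Section 4.2.2.2. The symbol \bar{C} is notation introduced here for the average yearly cost and is not used in the original.

```latex
\[
\bar{C}_{\text{EMAP}} = \frac{\$13{,}339{,}168}{12} \approx \$1{,}111{,}597 \ \text{per year}
\]
```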
The remaining two subsections of Section 4.2 include cost-efficiency results among
the designs. The results of the cost-efficiencies are presented for b=1, b=2, and b=3,
respectively, in Section 4.2.2. For each of these allocations, the cost-efficiencies of the
pairwise comparisons among the design alternatives are presented.
Two topics are addressed in Section 4.2.3. First, variance comparisons among the
designs are presented for a range of various cost levels. These comparisons are discussed for
each pairwise combination of designs over ranges of degree of overlap and temporal
correlation. The comparison of total survey costs at various variance levels is also presented
in this section. These analyses, similar to the variance comparison, were done over ranges of
degree of overlap and temporal correlation.
4.2.2 Comparisons of Cost-Efficiency Among Designs
Each of the components of the adopted relative efficiency measure (Equation 4.2a)
will be described to best understand the results of this analysis. The most efficient design is
associated with the smallest product of variance and cost, V.C. In addition, the most
efficient design is then associated with the largest inverse of V.C. Therefore, ratios greater
than one indicate that the design in the numerator is a more cost-efficient design than the
design in the denominator.
Relative efficiencies for b=1, b=2, and b=3 over a range of annual autocorrelation
are presented for the ENASS to EMAP comparison, the ENASS to NASS comparison, and
the NASS to EMAP comparison, respectively, in this section. Both the NASS and ENASS
designs are presented as alternate designs to the EMAP design to monitor agricultural health
in this program. Therefore, the EMAP design is considered the reference design and appears
in the denominator in these analyses. The established USDA design, the NASS design, is
considered as the reference design in comparison to the ENASS design.
The cost-efficiency results are presented in a series of tables for this section. The
column headings vary according to which two designs are in the comparison. This is due to
the varying degree of overlap over years for each of the designs.
For example, both the
EMAP and ENASS designs have no overlap for comparisons at one, two, and three years.
Therefore, the relative efficiencies for these years are the same and appear under the column
heading labeled no overlap in the following tables. However, since the NASS design does
have varying overlap for each of these years, the relative efficiencies are presented for one,
two, and three years.
4.2.2.1 Relative Efficiencies for b=1 Using Average Cost Per Design
The first analysis of cost-efficiency assumed that the program will survey 50 states
over 12 years, sampling 800 points per year. The total costs of a 12 year design presented in
Table 3.2 were averaged over 12 years. The average yearly cost for each design was
substituted in the cost-efficiency model. Since average yearly cost was used in this cost
comparison, the changes in cost-efficiency over a period of time are attributed to changes in
variance, not cost. Therefore, the changes in the relative efficiencies over time in these
tables are attributed to differences in variances between designs.
1. ENASS to EMAP Comparison.
In Table 4.1, the ratio of the cost-efficiency of the ENASS design to the EMAP
design is greater than one for all correlations of magnitude 0.85 or less at four year
comparisons, suggesting that the ENASS design is more cost-efficient. For correlations of
magnitude 0.90 or greater at four year comparisons, the EMAP design is more cost-efficient.
Full overlap (q=1) occurs in the EMAP design for comparisons at four years. For the ENASS
design, 50 percent overlap occurs at four year comparisons. A smaller variance occurs with a
greater degree of overlapping panels (Equations 3.2-3.3). However, the amount of variance
reduction is influenced by the degree of temporal correlation (also referred to as
autocorrelation). High temporal correlation has the greatest impact on reducing the
variance. Therefore, the EMAP design is more cost-efficient in these situations. For
moderate and low measures of autocorrelation, the advantage of a high degree of
overlapping panels is reduced. Due to the lower cost of the ENASS design, the ENASS
design is more cost-efficient in these situations.
At eight years, the EMAP design maintains full overlap while no overlap exists for
the ENASS design. The impact of the high correlation on reducing the variance for the
EMAP design has decreased as compared to the four year comparisons, due to the
autocorrelation structure. This results in a loss in cost-efficiency for the EMAP design.
This loss in cost-efficiency is also indicated by the comparison at 12 years. These same
results occur for positive and negative correlations, again due to the autocorrelation
structure assumed for the variance. The years of interest in this comparison correspond to
even numbers (e.g., four, eight). (Recall that this number appears as the power of
the measure of correlation, Equation 3.2.) Therefore, the negative sign in the correlation
measure disappears.
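The sign argument can be checked with a short Python sketch. The function name is illustrative; the only modeling step is the one stated in the text, namely that the annual correlation is raised to the power of the difference in years.

```python
def lagged_correlation(rho: float, years: int) -> float:
    """Correlation between surveys `years` apart under a first-order
    autoregressive structure: the annual value raised to the lag."""
    return rho ** years

# Even lags: the sign of the annual correlation no longer matters.
for years in (4, 8, 12):
    pos = lagged_correlation(0.95, years)
    neg = lagged_correlation(-0.95, years)
    print(years, round(pos, 4), round(neg, 4))

# Odd lag: the negative sign survives.
print(3, round(lagged_correlation(-0.95, 3), 4))
```

The lagged correlation also shrinks as the lag grows, which is the reason the benefit of overlap is smaller at eight and 12 years than at four.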
At the beginning of the survey, a comparison among some years for either the
EMAP or ENASS design will have no overlapping panels (q=0). The last column of Table
Table 4.1
Ratio of cost-efficiencies for the ENASS to EMAP design, assuming an annual
average cost for each design, over a range of annual autocorrelation, p.^a

                  Difference in Comparative Years
     p            4          8          12      No overlap
  +/- 0.95      0.765      0.793      0.927       1.367
  +/- 0.90      0.978      1.032      1.158       1.367
  +/- 0.85      1.100      1.166      1.266       1.367
  +/- 0.80      1.178      1.247      1.319       1.367
  +/- 0.50      1.345      1.365      1.367       1.367
  +/- 0.25      1.367      1.367      1.367       1.367
     0          1.367      1.367      1.367       1.367

^a Cost was fixed as the annual average cost for each design as listed in Table 3.2
(b=1, a=800 for each design).
4.1 indicates that ENASS will be more cost-efficient in this situation. Since the variance of
both designs is the same in this situation, this result is solely attributed to the cost savings
of the ENASS design. (Recall that the sample size was constant for all designs in this
analysis, i.e., n=800 for each design.)
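The pattern in Table 4.1 can be approximated with a short Python sketch. It rests on two assumptions that are not quoted from the original: the cost-efficiency measure is read as CE = 1/(√V·C), and the variance of the mean difference is taken to be proportional to 1 - q·p^k, where q is the proportion of overlapping panels and k is the difference in years. Equation 3.2 is not restated in this chapter, so this form is a stand-in chosen because it is consistent with the tabulated ratios. The cost ratio of 1.367 is read from the no-overlap column of Table 4.1.

```python
import math

COST_RATIO_EMAP_TO_ENASS = 1.367  # no-overlap column of Table 4.1

def variance_factor(q: float, rho: float, years: int) -> float:
    """Assumed relative variance of a mean difference: 1 - q * rho**years."""
    return 1.0 - q * rho ** years

def re_enass_to_emap(rho: float, years: int) -> float:
    # Overlap stated in the text: EMAP has full overlap at 4 and 8 years (12 assumed
    # the same); ENASS has 50 percent overlap at 4 years and none at 8 (12 assumed none).
    q_emap = 1.0
    q_enass = 0.5 if years == 4 else 0.0
    v_ratio = variance_factor(q_emap, rho, years) / variance_factor(q_enass, rho, years)
    return COST_RATIO_EMAP_TO_ENASS * math.sqrt(v_ratio)

for rho in (0.95, 0.80, 0.50):
    print(rho, [round(re_enass_to_emap(rho, k), 3) for k in (4, 8, 12)])
```

Under these assumptions the computed ratios fall within about 0.002 of the entries in Table 4.1; the small discrepancies come from using the rounded cost ratio.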
2. ENASS to NASS Comparison.
Table 4.2 compares the cost-efficiency of the ENASS design to the NASS design. In
years one, two, and three, the NASS design consists of overlapping panels, while the ENASS
design has no overlapping panels. The proportion of overlapping panels decreases from one
year (80 percent overlap), to two years (60 percent overlap), to three years (40 percent
overlap). At four years, the ENASS design consists of 50 percent overlapping panels, while
the NASS design consists of 20 percent overlapping panels. Recall that a positive measure
for the annual autocorrelation decreases the variance for all designs (Equations 3.2 and 3.3).
The NASS design is more cost-efficient as compared to the ENASS design, as indicated by
small ratios for years one through three. Since there are more overlapping panels for year
one, this year is the most cost-efficient.
The measure of annual autocorrelation was raised to a power, which was the
difference in years, to account for the first-order autoregressive structure of the variance.
When this power is an odd number and the measure of annual correlation is negative, this
measure inflates the variance. This resulted in a loss of cost-efficiency for the NASS design,
the only design with overlapping panels at one and three years, at negative measures of
correlation for odd years in the comparison. The negative
correlation cancels out for comparisons of even years. This results in a lower variance for
the NASS design at two and four year comparisons. Since there is no difference in the
measure of variance between these two designs when no overlap occurs, the ratio of the
cost-efficiencies solely reflects the differential cost of the two designs.
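The odd and even year behaviour can be illustrated with a Python sketch. As assumptions not taken from the original, the relative variance of a mean difference is modeled as 1 - q·p^k and the cost-efficiency as 1/(√V·C). The overlap proportions are those stated in the text, and the cost ratio of 0.861 is read from the no-overlap column of Table 4.2.

```python
import math

COST_RATIO_NASS_TO_ENASS = 0.861                  # no-overlap column of Table 4.2
NASS_OVERLAP = {1: 0.8, 2: 0.6, 3: 0.4, 4: 0.2}   # stated in the text
ENASS_OVERLAP = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.5}  # stated in the text

def variance_factor(q: float, rho: float, years: int) -> float:
    """Assumed relative variance of a mean difference: 1 - q * rho**years."""
    return 1.0 - q * rho ** years

def re_enass_to_nass(rho: float, years: int) -> float:
    v_nass = variance_factor(NASS_OVERLAP[years], rho, years)
    v_enass = variance_factor(ENASS_OVERLAP[years], rho, years)
    return COST_RATIO_NASS_TO_ENASS * math.sqrt(v_nass / v_enass)

# With a negative annual correlation, the factor for the overlapping design is
# above one at odd lags (variance inflated) and below one at even lags.
for years in (1, 2, 3, 4):
    print(years,
          round(variance_factor(NASS_OVERLAP[years], -0.95, years), 3),
          round(re_enass_to_nass(-0.95, years), 3))
```

Under these assumptions the computed ratios agree with the first row of Table 4.2 to within about 0.002.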
Table 4.2
Ratio of cost-efficiencies for the ENASS to NASS design, assuming an annual
average cost for each design, over a range of annual autocorrelation, p.^a

                  Difference in Comparative Years
     p          1         2         3         4      No overlap
  - 0.95      1.142     0.583     0.997     1.023      0.861
  - 0.90      1.129     0.617     0.978     0.979      0.861
  - 0.85      1.115     0.648     0.961     0.947      0.861
  - 0.80      1.102     0.675     0.945     0.925      0.861
  - 0.50      1.018     0.793     0.882     0.869      0.861
  - 0.25      0.943     0.844     0.863     0.861      0.861
     0        0.861     0.861     0.861     0.861      0.861
  + 0.25      0.770     0.844     0.858     0.861      0.861
  + 0.50      0.667     0.793     0.839     0.869      0.861
  + 0.80      0.516     0.675     0.767     0.925      0.861
  + 0.85      0.487     0.648     0.747     0.947      0.861
  + 0.90      0.455     0.617     0.724     0.979      0.861
  + 0.95      0.422     0.583     0.698     1.023      0.861

^a Cost was fixed as the annual average cost for each design as listed in Table 3.2
(b=1, a=800 for each design).
3. NASS to EMAP Comparison.
The NASS design was compared to the EMAP, or reference, design in Table 4.3.
The NASS design is more cost-efficient as compared to the EMAP design in most situations.
Overlapping panels exist for years one through three for the NASS design, but not for the
EMAP design. The pattern of the ratios in this table is similar to the pattern which was
observed in Table 4.2. However, the ratios in this table are greater due to the greater
differential in cost of NASS to EMAP as compared to NASS to ENASS (Table 3.2).
The effect of shifting the yearly comparison from four to eight to 12 years is
illustrated in this table. The impact of the correlation on decreasing variance is reduced as
the difference in the comparative years increases. A large difference between years in the
comparison, combined with low measures of correlation, reduces the improvement in variance
attributed to the overlapping panels. This occurs for differences at eight and 12 years in
Table 4.3 at moderate levels of correlation. For example, for p=0.50 and at an
eight year comparison, the cost-efficiency ratio is only a reflection of cost (this measure is
the same as the cost-efficiency measure for p=0).
4.2.2.2 Relative Efficiencies for b=1 Using a Fixed Cost.
The cost-efficiencies presented in Tables 4.1-4.3 assumed a total fixed cost for 800
sampling points per year, which varied among designs (Table 3.2). As discussed in Section
3.2, it is possible that the agricultural ecosystem will obtain funding to employ the EMAP
design. This estimate, which was presented in Table 3.2, is $13,339,168. Assuming this
fixed cost, the numbers of PSUs sampled under the NASS and ENASS designs, which were
presented in Table 3.4, were 1328 and 1124, respectively. Using these numbers of PSUs,
cost-efficiencies were recomputed and presented in Tables 4.4-4.6 for each of the three
pairwise comparisons. As discussed for Tables 4.1-4.3, average yearly cost was used in this
analysis. Therefore, the changes in cost-efficiency over a period of time within each
Table 4.3
Ratio of cost-efficiencies for the NASS to EMAP design, assuming an annual
average cost for each design, over a range of annual autocorrelation, p.^a

                         Difference in Comparative Years
     p          1         2         3         4         8        12     No overlap
  - 0.95      1.198     2.346     1.371     0.748     0.922     1.077     1.589
  - 0.90      1.211     2.216     1.398     1.000     1.199     1.346     1.589
  - 0.85      1.226     2.111     1.423     1.161     1.355     1.471     1.589
  - 0.80      1.241     2.024     1.447     1.274     1.449     1.533     1.589
  - 0.50      1.343     1.723     1.550     1.548     1.586     1.589     1.589
  - 0.25      1.450     1.619     1.584     1.586     1.589     1.589     1.589
     0        1.589     1.589     1.589     1.589     1.589     1.589     1.589
  + 0.25      1.776     1.619     1.594     1.586     1.589     1.589     1.589
  + 0.50      2.051     1.723     1.630     1.548     1.589     1.589     1.589
  + 0.80      2.648     2.024     1.782     1.274     1.449     1.533     1.589
  + 0.85      2.808     2.111     1.829     1.161     1.355     1.471     1.589
  + 0.90      3.002     2.216     1.888     1.000     1.199     1.346     1.589
  + 0.95      3.243     2.346     1.960     0.748     0.922     1.077     1.589

^a Cost was fixed as the annual average cost for each design as listed in Table 3.2
(b=1, a=800 for each design).
table are attributed to changes in variance, not cost.
1. ENASS to EMAP Comparison.
Table 4.4 reflects the ratio of cost-efficiency for the ENASS to EMAP design. The
pattern of the ratios shown in this table is the same as the pattern discussed for Table 4.1.
For example, the cost-efficiency ratio increases with decreasing annual autocorrelation.
Since cost is constant for the two designs, this effect is attributed to the change in variance.
As annual autocorrelation decreases, the variance increases for both designs. However, since
the EMAP design has a greater number of overlapping samples (full overlap), the variance
for this design increases at a faster rate with reduced correlation. This results in a higher
cost-efficiency for the ENASS design at low measures of correlation.
The cost-efficiency ratio also increases with an increasing difference in years. The
increasing difference in years reduces the correlation measure. This has the same effect on
the cost-efficiency measure as described above.
The measures which appear in this table are smaller than those presented in Table
4.1. The average yearly cost differs for the ENASS and NASS designs in this analysis, as
compared to Table 4.1. The lower measures observed in Table 4.4 are attributed to the
decreased cost-efficiency of the ENASS design. (Recall that no changes with regard to cost
were made to the cost-efficiency computations for the EMAP design in this analysis.) In the
analysis presented in Table 4.4, an increased number of dollars was available for the ENASS
design. This provided more funding to increase the sample size, which decreased the
variance. However, the cost increase to add these samples was $3.6 million. The
smaller variance did not compensate for this larger cost, which resulted in a larger
measure of V.C as compared to values in Table 4.1. This resulted in the smaller ratios
observed in this table as compared to Table 4.1.
Table 4.4
Ratio of cost-efficiencies for the ENASS to EMAP design, assuming an annual
fixed cost for each design, over a range of annual autocorrelation, p.^a

                  Difference in Comparative Years
     p            4          8          12      No overlap
  +/- 0.95      0.663      0.688      0.804       1.185
  +/- 0.90      0.848      0.895      1.004       1.185
  +/- 0.85      0.953      1.011      1.098       1.185
  +/- 0.80      1.021      1.081      1.144       1.185
  +/- 0.50      1.166      1.183      1.185       1.185
  +/- 0.25      1.184      1.185      1.185       1.185
     0          1.185      1.185      1.185       1.185

^a Cost was fixed as the annual average cost for the EMAP design. Values of PSUs varied
among designs as presented in Table 3.4B (b=1 for each design).
2. ENASS to NASS Comparison.
The pattern of the ratios presented in Table 4.5 is similar to the patterns described
in Table 4.2 for the ENASS to NASS comparison. The ratios in Table 4.5, however, are
greater than those presented in Table 4.2. The extra funds available in this analysis
provided for an increased sample size for both designs, which resulted in a decreased
variance. However, there was a larger sample size available for the NASS design (Table
3.4). This resulted in a greater reduction in variance with the NASS design. Taking a
square root of these values, which were less than one, resulted in a smaller measure of
cost-efficiency for the NASS design. This resulted in larger ratios as observed in this analysis as
compared to Table 4.2.
3. NASS to EMAP Comparison.
The patterns of the cost ratios in Table 4.6 are similar to those described for Table
4.3. However, the ratios in this table are smaller compared to those presented in Table 4.3.
This same phenomenon occurred in the comparison of Table 4.1 to Table 4.4. The
cost-efficiencies of the EMAP design did not change in this analysis. There was an increase in
available dollars for the NASS design in this analysis. This provided more funding to
increase the sample size, which decreased the variance. However, the cost of these additional
sampling units increased by $5 million. The smaller variance did not compensate for
the increased cost, which resulted in a larger measure of V.C as compared to measures
computed for Table 4.3. Consequently, smaller ratios are observed in this table as
compared to Table 4.3.
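The no-overlap columns of Tables 4.4-4.6 can be recovered from the PSU counts alone. The sketch below assumes, as a simplification not quoted from the original, that with no overlapping panels the variance is inversely proportional to the number of PSUs, and that cost-efficiency is 1/(√V·C) with the same cost C for every design. The PSU counts are those given for b=1 in Table 3.4B.

```python
import math

# Number of PSUs at a total cost of $13.3 million, b=1 (Table 3.4B).
PSUS = {"ENASS": 1124, "NASS": 1328, "EMAP": 800}

def no_overlap_ratio(numerator: str, denominator: str) -> float:
    """Cost-efficiency ratio when cost and overlap are equal,
    assuming variance is proportional to 1 / (number of PSUs)."""
    return math.sqrt(PSUS[numerator] / PSUS[denominator])

for num, den in (("ENASS", "EMAP"), ("ENASS", "NASS"), ("NASS", "EMAP")):
    print(num, "to", den, round(no_overlap_ratio(num, den), 3))
```

The three values round to 1.185, 0.920, and 1.288, matching the no-overlap columns of Tables 4.4, 4.5, and 4.6. Each is closer to one than its counterpart in Tables 4.1-4.3 because the cost advantage has been converted into sample size, which enters only through a square root.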
4.2.2.3 Relative Efficiencies for b>1.
Cost-efficiencies were also computed for different sample stage allocations. For b>1,
the variance formula presented in Equation 2.33 includes a term for intracluster
Table 4.5
Ratio of cost-efficiencies for the ENASS to NASS design, assuming an annual
fixed cost for each design, over a range of annual autocorrelation, p.^a

                  Difference in Comparative Years
     p          1         2         3         4      No overlap
  - 0.95      1.221     0.623     1.066     1.093      0.920
  - 0.90      1.207     0.660     1.046     1.046      0.920
  - 0.85      1.192     0.692     1.027     1.013      0.920
  - 0.80      1.178     0.722     1.010     0.989      0.920
  - 0.50      1.089     0.848     0.942     0.929      0.920
  - 0.25      1.008     0.903     0.923     0.921      0.920
     0        0.920     0.920     0.920     0.920      0.920
  + 0.25      0.823     0.903     0.917     0.921      0.920
  + 0.50      0.713     0.848     0.897     0.929      0.920
  + 0.80      0.552     0.722     0.820     0.989      0.920
  + 0.85      0.520     0.692     0.799     1.013      0.920
  + 0.90      0.487     0.660     0.774     1.046      0.920
  + 0.95      0.451     0.623     0.746     1.093      0.920

^a Cost was fixed as the annual average cost for the EMAP design. Values of PSUs varied
among designs as presented in Table 3.4B (b=1 for each design).
Table 4.6
Ratio of cost-efficiencies for the NASS to EMAP design, assuming an annual
fixed cost for each design, over a range of annual autocorrelation, p.^a

                         Difference in Comparative Years
     p          1         2         3         4         8        12     No overlap
  - 0.95      0.971     1.903     1.112     0.607     0.747     0.874     1.288
  - 0.90      0.982     1.797     1.134     0.811     0.972     1.091     1.288
  - 0.85      0.994     1.712     1.154     0.941     1.099     1.193     1.288
  - 0.80      1.006     1.642     1.174     1.033     1.175     1.243     1.288
  - 0.50      1.089     1.397     1.257     1.255     1.286     1.288     1.288
  - 0.25      1.176     1.313     1.284     1.286     1.288     1.288     1.288
     0        1.288     1.288     1.288     1.288     1.288     1.288     1.288
  + 0.25      1.440     1.313     1.292     1.286     1.288     1.288     1.288
  + 0.50      1.663     1.397     1.322     1.255     1.286     1.288     1.288
  + 0.80      2.147     1.642     1.445     1.033     1.175     1.243     1.288
  + 0.85      2.278     1.712     1.483     0.941     1.099     1.193     1.288
  + 0.90      2.435     1.797     1.531     0.811     0.972     1.091     1.288
  + 0.95      2.630     1.903     1.589     0.607     0.747     0.874     1.288

^a Cost was fixed as the annual average cost for the EMAP design. Values of PSUs varied
among designs as presented in Table 3.4B (b=1 for each design).
homogeneity, δ. This measure is the same for both designs in the comparison. Since the
cost-efficiencies are ratios of designs, this parameter cancels out and therefore does not
influence the results.
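The cancellation can be written out in one line. As a sketch only, and not a restatement of Equation 2.33, suppose the homogeneity term enters each variance as a common multiplicative factor D(δ, b), so that V_i = D(δ, b) W_i, where W_i (notation introduced here) is the remaining design-specific part. Reading the cost-efficiency measure as CE = 1/(√V·C), the ratio of two designs becomes:

```latex
\[
RE = \frac{CE_1}{CE_2}
   = \frac{\sqrt{V_2}\, C_2}{\sqrt{V_1}\, C_1}
   = \frac{\sqrt{D(\delta, b)\, W_2}\; C_2}{\sqrt{D(\delta, b)\, W_1}\; C_1}
   = \frac{\sqrt{W_2}\, C_2}{\sqrt{W_1}\, C_1}
\]
```

The factor D(δ, b) appears in both numerator and denominator and drops out, provided both designs use the same allocation b and share the same δ.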
The cost-efficiency ratios for b=2 and b=3 were evaluated. The tables of these
results appear in Appendix B. Cost-efficiency ratios for b=2 are presented in Tables
4.7-4.12, while the results for b=3 are presented in Tables 4.13-4.18. The number of PSUs used
in these analyses for the three designs was presented in Table 3.4.
The results for this analysis are nearly identical to the results presented in Tables
4.1-4.6. The cost-efficiency ratios in both sets of tables follow the same patterns. In fact,
the maximum difference between the ratios which appear in Tables 4.1-4.6 and those in
Tables 4.7-4.12 and Tables 4.13-4.18 is 3 percent. The only difference in these analyses is
attributed to the sample allocation. The sample allocation was consistent for each design in a
comparison. The slight differences observed in these tables are due to the differences in PSU
and SSU cost among the designs. This resulted in a small change in sample size among the
sample allocation schemes. This slight change in the results suggests that sample allocation
does not have a major effect on the cost-efficiency results.
4.2.3 Effect of Variable Cost and Variance on Cost Efficiency Results
In order to further compare these designs, the cost-efficiencies were recomputed.
The cost-efficiency results presented in the previous section were calculated over two
scenarios of a fixed budget allotment, Tables 4.1-4.3 versus Tables 4.4-4.6. It is of interest
to determine the effect of further varying total fixed cost over a broader range of budget
allotments, which is discussed in this section.
The sample size varied for each design over a range of total survey cost. In this
analysis the total survey cost was set to vary between $5 and $30 million. (Recall
that the total cost of conducting this survey using any of these designs ranged from $5 to
$20 million, Table 3.3.) By rearranging Equation 3.1, the sample size was
determined for each design at each measure of fixed cost. Using this sample size, the
variances for each design were computed (Equations 3.2-3.3). The designs were compared
by evaluating the ratios of these variances. The same designs were used as reference designs
as in Section 4.2. The results of these analyses are presented in the first subsection of this
analysis.
In the next subsection of this analysis, the cost ratios over a range of fixed variance
were computed. The range of fixed variance was obtained from the variances computed for
the cost-efficiency results presented in Tables 4.1-4.6. Since σ² will vary among measures of
interest and the ratio of the variances is of importance, these variance measures were
computed without the constant term, 2σ². The measures of variance from these results
ranged from 0.0004 to 0.002. (For example, the lowest measure of variance, 0.0004, was
achieved for a design with full overlap and a high temporal correlation, such as 0.95. The
largest measure of variance resulted from a design with no overlap.) Using measures of
variance observed along this range, sample size was computed by rearranging Equation 3.1.
An estimate of cost to achieve this level of variance was computed for each design. These
designs were compared by evaluating the ratios of cost among designs. The same designs
used as reference designs in the previous analyses were maintained.
The ratios of variances and costs were evaluated over a range of annual
autocorrelation and degree of overlapping panels. The results presented in Section 4.2
evaluated cost-efficiency over measures of annual autocorrelation, which ranged from -0.95
to +0.95. These results were also presented over all possible degrees of rotating panels,
which varied with comparative years. The present analysis evaluated the variance and cost
ratios over this same range of autocorrelation and years. However, the results of the present
analysis are best presented as figures. In order to limit the presentation of
these results, a representative suite of results is presented for each pairwise design
comparison. It is expected that most measurements will have a positive temporal
correlation. Therefore, results for two levels of positive annual autocorrelation are
presented, both a high (0.90) and a low (0.20) measure of correlation. The discussion of these
results will include the full range of autocorrelation which was investigated.
In addition, in order to compare the effect of the degree of overlap among designs,
the results for three different years are presented, which are representative of varying degrees
of overlap. Figures are presented for four, three, and eight year comparisons. Since all
three designs include a degree of overlap for comparisons with a four year difference, this
comparison is of key interest. At three year differences, only the NASS design includes
overlapping panels, while at eight year comparisons, only the EMAP design includes
overlapping panels. The effect of the first-order autoregressive structure can be studied in
the comparison of the four year and eight year figures. Results of the full range of years
studied are discussed in this section.
4.2.3.1 Variance Comparisons at Various Cost Levels
Four Year Comparison
1. ENASS to EMAP Comparison.
The results of the comparison of the variance ratios of the ENASS to EMAP design
at four year comparisons are presented in Figure 4.1. Full overlap exists for the EMAP
design and 50 percent overlap for the ENASS design at four years. A small variance results
when a high degree of overlapping panels and high temporal correlations are assumed
(Equation 3.2). Since the EMAP design has a higher degree of overlap as compared to the
ENASS design, this variance is reduced to a greater degree for nearly the entire range of
assumed total cost (p=0.90). For low values of total cost, the ENASS design is more
efficient. This is due to the larger sample size available for this design at low values of total
cost. Most of the costs for the EMAP design at low values of fixed total cost are allocated to
Figure 4.1. ENASS to EMAP variance ratio by cost for four year comparison.
[Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Millions of Dollars (5 to 25). Vertical axis: variance ratio.]
the fixed frame costs (Table 3.2). Therefore, few resources are available for the number of
samples.
Decreasing the temporal correlation increases the variance for the EMAP design at a
higher rate than the ENASS design, over the entire range of assumed total cost. This effect
is presented for a correlation of p=0.20 in Figure 4.1. As the degree of correlation decreases,
the advantage of overlapping panels on variance reduction is decreased. In addition, at any
given cost along this range, the sample size of the ENASS design is larger than the EMAP
design. (This is true for all correlations investigated.) For example, for a fixed cost of $8
million, the sample size is 370 and 641 for the EMAP and ENASS designs,
respectively. At a fixed cost of $20 million, the sample size is 1336 and 1726 for the
EMAP and ENASS designs, respectively.
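The interplay of overlap and sample size at a fixed cost can be sketched in Python using the four sample sizes quoted above. As an assumption not taken from the original, the relative variance of a four year mean difference is modeled as (1 - q·p^4) divided by the number of PSUs, with q the overlap proportion stated in the text for each design.

```python
# PSU counts quoted in the text for two levels of fixed total cost (millions of dollars).
PSUS = {8: {"EMAP": 370, "ENASS": 641}, 20: {"EMAP": 1336, "ENASS": 1726}}
OVERLAP_AT_4_YEARS = {"EMAP": 1.0, "ENASS": 0.5}  # stated in the text
YEARS = 4

def relative_variance(design: str, cost_millions: int, rho: float) -> float:
    """Assumed form: (1 - q * rho**years) / (number of PSUs)."""
    q = OVERLAP_AT_4_YEARS[design]
    return (1.0 - q * rho ** YEARS) / PSUS[cost_millions][design]

def variance_ratio(cost_millions: int, rho: float) -> float:
    """ENASS-to-EMAP variance ratio at a given fixed total cost."""
    return (relative_variance("ENASS", cost_millions, rho)
            / relative_variance("EMAP", cost_millions, rho))

for rho in (0.90, 0.20):
    print(rho, [round(variance_ratio(cost, rho), 2) for cost in (8, 20)])
```

Under these assumptions the ratio is above one at both cost levels for a correlation of 0.90 (the EMAP design has the smaller variance) and below one for a correlation of 0.20 (the ENASS design has the smaller variance), which is the direction described in the text.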
2. ENASS to NASS Comparison.
The variance ratio of ENASS to NASS at four years is presented in Figure 4.2 over
a range of cost. Since the cost of these designs is nearly identical, the variance ratio remains
constant across the range of cost. In this analysis, there is a 50 percent degree of overlap for
the ENASS design, while there is a 20 percent degree of overlap for the NASS design. For
high correlations, p=0.90, the higher degree of overlapping panels results in a smaller
variance for the ENASS design, and therefore, a variance ratio less than one results. As the
correlation decreases, the variance of the ENASS design increases, resulting in variance
ratios greater than one. This is illustrated for correlations of 0.90 and 0.20 respectively, in
Figure 4.2.
3. NASS to EMAP Comparison.
The variance ratio across a range of cost at a four year comparison for the NASS to
EMAP design is presented in Figure 4.3. There is a 20 percent degree of overlap for the
Figure 4.2. ENASS to NASS variance ratio by cost for four year comparison.
[Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Millions of Dollars (5 to 25). Vertical axis: variance ratio.]
Figure 4.3. NASS to EMAP variance ratio by cost for four year comparison.
[Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Millions of Dollars (5 to 25). Vertical axis: variance ratio.]
NASS design and full overlap for the EMAP design in this comparison. For high
correlations, the variance is much smaller for the EMAP design due to the high degree of
panel overlap. This results in a ratio greater than one for high correlations, such as 0.90.
As the correlation decreases, the NASS design is more efficient over the range of cost
investigated. The variance reduction provided by the overlapping panels is reduced at low
levels of correlation. In addition, the sample size for the NASS design is larger than the
EMAP design for all costs investigated. Therefore, the NASS design has a smaller variance.
(This is true for all correlations investigated.) For example, for a fixed cost of $8 million,
the sample size is 370 and 757 for the EMAP and NASS designs, respectively. At a
fixed cost of $20 million, the sample size is 1336 and 2040 for the EMAP and
NASS designs, respectively.
Three Year Comparison
1. ENASS to EMAP Comparison.
No overlapping panels exist for either the ENASS or EMAP design at three years.
Therefore, the temporal correlation does not influence the variance, as indicated in Figure
4.4. This is illustrated for two measures of temporal correlation. However, the variance
ratio of ENASS to EMAP was less than one over the same range of costs evaluated in the
previous figures. The variance ratio varied from 0.35 at a fixed cost of $5 million to
0.81 at a fixed cost of $29 million. This result was due to the larger sample size
available for the ENASS design at any fixed cost which was considered. For example, for a
fixed cost of $8 million, the sample size is 370 and 641 for the EMAP and ENASS
designs, respectively. At a fixed cost of $20 million, the sample size is 1336 and 1726
for the EMAP and ENASS designs, respectively.
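The quoted range of the variance ratio can be roughly reproduced from the four sample sizes above. The sketch below makes two assumptions that are not taken from the original: the number of PSUs is a linear function of total cost, fitted through the two quoted points for each design (a hypothetical stand-in for rearranging Equation 3.1), and with no overlap the variance is inversely proportional to the number of PSUs.

```python
# PSU counts quoted in the text at fixed total costs of $8 and $20 million.
QUOTED = {"EMAP": {8.0: 370, 20.0: 1336}, "ENASS": {8.0: 641, 20.0: 1726}}

def psus_at_cost(design: str, cost_millions: float) -> float:
    """Hypothetical linear cost model: interpolate or extrapolate the number of
    PSUs from the two quoted (cost, PSU) points for a design."""
    (c0, a0), (c1, a1) = sorted(QUOTED[design].items())
    slope = (a1 - a0) / (c1 - c0)  # PSUs gained per additional $1 million
    return a0 + slope * (cost_millions - c0)

def variance_ratio_no_overlap(cost_millions: float) -> float:
    """ENASS-to-EMAP variance ratio, assuming variance ~ 1 / (number of PSUs)."""
    return psus_at_cost("EMAP", cost_millions) / psus_at_cost("ENASS", cost_millions)

for cost in (5.0, 8.0, 20.0, 29.0):
    print(cost, round(variance_ratio_no_overlap(cost), 2))
```

Under these assumptions the ratio is about 0.35 at $5 million and about 0.81 at $29 million, in line with the range reported above. The ratio rises toward one as the budget grows because the larger fixed cost of the EMAP design becomes a smaller share of the total.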
2. ENASS to NASS Comparison.
Figure 4.4. ENASS to EMAP variance ratio by cost for three year comparison.
[Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Millions of Dollars (5 to 25). Vertical axis: variance ratio.]
The variance ratios of the ENASS to NASS design for a three year comparison are
presented in Figure 4.5. The fixed and variable costs of these designs are identical;
therefore, the variance ratio is nearly constant across the range of cost investigated. The
slight variation is due to changes in sample sizes over the range of cost. A greater number
of new PSUs are delineated in the ENASS design. This PSU delineation cost reduces the
resources available to increase the sample size, so the ENASS sample cannot grow as much
as the NASS sample. (Recall that the NASS design keeps a PSU in the panel for five years,
while the ENASS design keeps a PSU in the panel for two years. Thus, the one-time PSU
delineation cost is lower for the NASS design.)
No overlapping panels exist for the ENASS design at three years, while the NASS
design has a 40 percent degree of overlap. For a high correlation of 0.90, the variance of the
NASS design is much smaller than that of the ENASS design. This is due to the overlapping
panels and the larger sample size. This results in a variance ratio greater than one. As the
correlation decreases, the advantage of the overlapping panels is decreased. This results in a
decrease in the variance ratio.
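As a rough illustration of why this ratio falls with the correlation, suppose that the variance of the estimated difference is proportional to $(1 - \pi\rho^{m})/n$, where $\pi$ is the proportion of overlapping panels, $m$ is the number of years between the two estimates, and $n$ is the sample size. This functional form and these symbols are assumptions made for the sketch; it simplifies, and does not restate, the two-stage variance derived in Chapter 2. With no overlap for ENASS and 40 percent overlap for NASS at $m = 3$:

```latex
\[
\frac{V_{\mathrm{ENASS}}}{V_{\mathrm{NASS}}}
  \approx \frac{1/n_{\mathrm{ENASS}}}{\left(1 - 0.40\,\rho^{3}\right)/n_{\mathrm{NASS}}}
  = \frac{n_{\mathrm{NASS}}}{n_{\mathrm{ENASS}}}
    \cdot \frac{1}{1 - 0.40\,\rho^{3}}
\]
\[
\rho = 0.90:\quad \frac{1}{1 - 0.40\,(0.729)} \approx 1.41,
\qquad
\rho = 0.20:\quad \frac{1}{1 - 0.40\,(0.008)} \approx 1.00
\]
```

Under this assumed form, the overlap multiplies the ratio by about 1.41 at a correlation of 0.90 but leaves it essentially unchanged at 0.20, so that only the sample size advantage of the NASS design remains.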
3. NASS to EMAP Comparison.
In Figure 4.6, the numerator in the comparison is the NASS design, which has a 40
percent degree of overlap, while the denominator is the EMAP design, which has no
overlapping panels. The variance ratio for this comparison is less than one, indicating that
the NASS design has the smaller variance. This result is attributed to both the overlapping
panels and the lower cost of the NASS design. As the correlation decreases, the variance
reduction due to overlapping panels is decreased, resulting in a higher variance for NASS
and, subsequently, a larger variance ratio. This is illustrated for ρ=0.20 in Figure 4.6.
Figure 4.5. ENASS to NASS variance ratio by cost for three year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Millions of Dollars.)
Figure 4.6. NASS to EMAP variance ratio by cost for three year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Millions of Dollars.)
One and Two Year Comparison
1. ENASS to EMAP Comparison.
There are no overlapping panels for the ENASS and EMAP designs at one and two
year comparisons. Therefore, there is no effect of temporal correlation on the variance. The
results discussed at three years are the same for these comparative years (Figure 4.4).
2. ENASS to NASS Comparison.
The variance ratios of the ENASS to NASS design for one and two year comparisons
revealed the same pattern as presented in Figure 4.5. However, since there is a greater
degree of panel overlap for NASS at these years, the variance of the NASS design is smaller.
This results in larger variance ratios for these comparisons relative to the three year
comparisons.
3. NASS to EMAP Comparison.
There is an increasing degree of panel overlap for the NASS design for these years,
while there are no overlapping panels for the EMAP design. Due to the high degree of panel
overlap for the NASS design, the variance for the NASS design is smaller than the variance
computed at three years, which was presented in Figure 4.6. The only difference noted in
the analyses at one and two years is the overall decrease in the variance ratios. This is
attributed to the smaller variance for the NASS design.
Eight Year Comparison
1. ENASS and NASS to EMAP Comparison.
The patterns of the variance ratios for an eight year comparison of the ENASS to
EMAP design (Figure 4.7) and the NASS to EMAP design (Figure 4.8) were very similar to
those presented in Figures 4.1 and 4.3, respectively. Since the effect of the temporal
correlation is decreased over time, due to the autocorrelation structure imposed, the impact
of the overlapping panels is reduced for the EMAP design. The variance ratios computed
for the eight year comparison were less than those computed for four years. This result
implies that the variance of the EMAP design has increased in the eight year comparison as
compared to the four year comparison.
Figure 4.7. ENASS to EMAP variance ratio by cost for eight year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Millions of Dollars.)
Figure 4.8. NASS to EMAP variance ratio by cost for eight year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Millions of Dollars.)
The variance of the EMAP design is influenced only slightly by low measures of
temporal correlation at eight, twelve, and greater yearly differences. Therefore, the results
presented for four years at ρ=0.20 are nearly the same as the results for differences at eight
year comparisons.
2. ENASS to NASS Comparison.
No overlapping panels exist for either the ENASS or NASS designs at eight year
comparisons. Therefore, there was no effect due to the degree of temporal correlation. A
constant variance ratio was revealed over the range of cost investigated, as discussed for
Figure 4.2. Since the ENASS design is more expensive than the NASS design (Table 3.3),
the variance ratio was greater than one (1.18) along the range of cost investigated.
4.2.3.2 Cost Comparisons at Various Variance Levels.
Four Year Comparison
1. ENASS to EMAP Comparison.
The results of the cost ratio of the ENASS to the EMAP design for a four year
comparison are presented in Figure 4.9. Recall that the ENASS design has a 50 percent
degree of overlap and EMAP has full overlap for this comparison. The variance for a four
year comparison at a high correlation was 0.0004 for the EMAP design. It was discussed
previously that the EMAP design was more efficient for this comparison, as compared to the
ENASS design, at high correlations. This was primarily due to the greater degree of panel
overlap for the EMAP design. This result is illustrated at this level of variance in Figure
4.9.
Figure 4.9. ENASS to EMAP cost ratio by variance for four year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Variance.)
As the variance increases, the ENASS design is more cost-efficient as compared to
the EMAP design. The sample size required by both designs decreases as the fixed measure
of variance becomes larger. For example, the sample size for the EMAP and ENASS designs is
748 and 1411, respectively, at a fixed variance equal to 0.0004. The sample size is 114 and
223 for the EMAP and ENASS design, respectively, at a fixed variance equal to 0.003. The
number of samples required to achieve the same fixed variance was smaller for all measures
of variance for the EMAP design, due to the overlapping samples. At high measures of
fixed variance, the cost ratio differences are primarily attributed to the fixed frame costs,
which were quite substantial (Table 3.2). The ENASS design has a much lower frame cost
as compared to the EMAP design, which decreases the cost ratio to values less than one
for high fixed measures of variance.
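The role of the fixed frame cost can be sketched with a linear cost model, total cost = fixed frame cost + (cost per sampling unit) x (sample size). The sketch below is illustrative only: the linear form is a simplification of the Chapter 3 cost model, and the cost parameters are hypothetical placeholders, not the values in Table 3.2. Only the sample sizes are taken from the text.

```python
# HYPOTHETICAL cost parameters in millions of dollars.
# These are placeholders for illustration, NOT the Table 3.2 values.
HYPOTHETICAL_COSTS = {
    "ENASS": {"fixed_frame": 1.0, "per_unit": 0.010},
    "EMAP": {"fixed_frame": 3.0, "per_unit": 0.012},
}

# Sample sizes reported in the text (four year comparison, high correlation),
# keyed by the fixed variance they achieve.
REPORTED_SIZES = {
    0.0004: {"EMAP": 748, "ENASS": 1411},
    0.003: {"EMAP": 114, "ENASS": 223},
}


def total_cost(design: str, n: int) -> float:
    """Fixed frame cost plus a constant cost per sampled unit (assumed linear)."""
    params = HYPOTHETICAL_COSTS[design]
    return params["fixed_frame"] + params["per_unit"] * n


def cost_ratio(fixed_variance: float) -> float:
    """ENASS to EMAP cost ratio at one of the reported fixed variances."""
    sizes = REPORTED_SIZES[fixed_variance]
    return total_cost("ENASS", sizes["ENASS"]) / total_cost("EMAP", sizes["EMAP"])


for variance in REPORTED_SIZES:
    print(f"fixed variance {variance}: cost ratio = {cost_ratio(variance):.2f}")
```

With these placeholder costs the ratio moves from above one at the smaller fixed variance to below one at the larger fixed variance, because the fixed frame cost dominates the total once few sampling units are needed. The actual magnitudes depend on the costs reported in Table 3.2.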
As the correlation decreases, the variance increases for both designs. However, since
the EMAP design has a higher degree of overlap, the impact on this design is greater. Since
the advantages of the overlapping panels are reduced at low measures of temporal
correlation, the ENASS design is more cost-efficient. This is illustrated in Figure 4.9 for
ρ=0.20.
2. ENASS to NASS Comparison.
The cost ratio of the ENASS to the NASS design is presented in Figure 4.10 for a
four year comparison. There is a 50 percent degree of overlap for the ENASS design, while
there is a 20 percent degree of overlap for the NASS design in this comparison. For high
measures of correlation, the ENASS design has a lower variance due to the overlapping
panels. The number of samples to achieve the fixed variance is smaller for the ENASS
design than the NASS design, which results in a lower cost as shown for ρ=0.90. The
sample size is equal to 1888 for NASS and 1460 for ENASS, at a fixed variance equal to
0.0004.
Figure 4.10. ENASS to NASS cost ratio by variance for four year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Variance.)
The cost ratio increases in this plot for ρ=0.90 along the x axis, from 0.92 to 0.94.
This slight increase is attributed to the change in sample size over the range of fixed
variance. At a fixed variance equal to 0.003, the sample size is 288 for NASS and 223 for
the ENASS design. However, since the fixed design costs are the same for these designs, the
cost ratio is nearly unaffected by varying sample size.
The cost ratios are presented for ρ=0.20 in Figure 4.10. As discussed previously,
there is little advantage of the overlapping panels in reducing variance at this level of
correlation. The same sample size is required for each design to achieve a fixed variance of
0.0004 and 0.003, which is 2173 and 332, respectively. Since the average total cost of the
NASS design is lower than the ENASS design, the cost ratio is greater than one.
3. NASS to EMAP Comparison.
Similar results were obtained for the comparison of cost ratios of the NASS to
EMAP designs, as presented in Figure 4.11. In this comparison, there is a 20 percent overlap
for the NASS design and full overlap for the EMAP design. High measures of correlation
result in a smaller variance for the EMAP design. A smaller sample size, and subsequent
lower cost, is observed for the EMAP design. As the variance increases, a smaller sample
size is required for each design. For example, the sample size for EMAP and NASS is 748
and 1888, respectively, for a fixed variance equal to 0.0004. The sample size for EMAP and
NASS is 114 and 288, respectively, for a fixed variance equal to 0.003. The EMAP design
requires a smaller sample size to achieve the same level of variance due to the overlapping
sample at four years. For larger measures of variance, the differences in cost are primarily
attributed to the fixed frame cost, which was much smaller for the NASS design. The
decreasing cost ratio for increasing variance reflects this result.
Figure 4.11. NASS to EMAP cost ratio by variance for four year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Variance.)
There is a larger variance for the EMAP design for decreasing measures of
correlation, as seen in the plot of ρ=0.20. Therefore, an increased sample size, as compared
to the plot of ρ=0.90, is required for the EMAP design to achieve the measures of fixed
variance. For example, the sample size for the EMAP and NASS designs is 331 and 332,
respectively, at a fixed variance of 0.003. Since nearly the same sample size is required for
each design to achieve the measure of fixed variance, the cost differential is attributed to the
fixed frame cost. As noted earlier, this cost was much lower for the NASS design.
Three Year Comparison
1. ENASS to EMAP Comparison.
The cost ratio of the ENASS design to the EMAP design is presented in Figure 4.12.
There is no overlap for either of these designs in a three year comparison. Therefore, the
sample size to achieve any level of fixed variance is the same for both designs. Along the
range of the variance considered in this analysis, the ENASS design was less expensive than
the EMAP design for obtaining the same level of variance. Since both the fixed and
variable costs of the ENASS design are smaller than those of the EMAP design, the cost
ratios are less than one.
Since there are no overlapping panels in this comparison, the degree of correlation
has no effect, as illustrated in Figure 4.12. The sample size to achieve larger measures
of variance is smaller for both designs. Thus, the cost ratio at larger measures of variance is
a reflection of the ratio of the fixed frame costs. The differential in fixed frame cost for the
designs was more substantial than the differential cost of the sampling units.
2. ENASS to NASS Comparison.
The cost ratio of the ENASS to NASS design is presented in Figure 4.13 for a three
year comparison. There are no overlapping panels for the ENASS design at three years, but
60 percent of the panels are overlapping for the NASS design. This results in a smaller
variance, given the same sample size, for the NASS design. The effect of a reduction in
temporal correlation, for example to ρ=0.20, is illustrated in Figure 4.13. The advantage of
a smaller variance due to overlapping panels diminishes as the degree of correlation
decreases.
Figure 4.12. ENASS to EMAP cost ratio by variance for three year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Variance.)
Figure 4.13. ENASS to NASS cost ratio by variance for three year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Variance.)
3. NASS to EMAP Comparison.
The cost ratio of the NASS design to the EMAP design at three year comparisons is
presented in Figure 4.14. There is a 60 percent degree of overlap for the NASS design
compared to no overlapping panels for the EMAP design. This results in a lower variance
for the NASS design, given the same sample size. The cost of the NASS design was less
than that of the EMAP design for all correlations and over the range of the variance
investigated. This is not unexpected since the fixed and variable frame costs for the NASS
design are smaller. This results in a cost ratio less than one for both levels of correlation
presented in Figure 4.14.
One and Two Year Comparison
1. ENASS to EMAP Comparison.
There are no overlapping panels in either year for the ENASS and EMAP designs.
Therefore, the results observed and discussed at three years are the same for one and two
year comparisons presented in Figure 4.12.
2. ENASS to NASS Comparison.
For comparisons at one and two years, the NASS design has some degree of overlap,
while the ENASS design has no overlap. The only difference in the analyses at one and two
years, as compared to the three year comparison, was an increase in the cost differential.
Since the NASS design has a greater degree of overlap at these years, the fixed variance was
achieved at a lower sample size and, subsequently, a lower cost. This resulted in a larger cost
ratio at these years as compared to the three year comparison presented in Figure 4.13.
Figure 4.14. NASS to EMAP cost ratio by variance for three year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Variance.)
3. NASS to EMAP Comparison.
The cost ratio of the NASS to EMAP design was computed for one and two year
comparisons. As discussed for the variance ratio comparisons, there is an increasing degree
of panel overlap for the NASS design for these years, while there are no overlapping panels
for the EMAP design. The variance for the NASS design is smaller due to the increased
degree of panel overlap, as compared to the results presented for three years, which were
discussed for Figure 4.14. The only difference noted in these analyses is the overall decrease
in the cost ratios. This is attributed to the lower variance for the NASS design.
Eight Year Comparison
1. ENASS to EMAP Comparison.
The cost ratio of the ENASS to EMAP design for eight year comparisons is presented
in Figure 4.15. No overlapping panels exist for the ENASS design, while there is full
overlap for the EMAP design at eight years. This overlap decreases the variance for the
EMAP design at high measures of correlation and low measures of fixed variance. This
results in a lower cost for the EMAP design.
As the level of fixed variance increases (i.e., as precision requirements become less
stringent), the ENASS design is more cost-efficient than the EMAP design. This same
result was observed previously in Figure 4.9. To achieve these higher measures of variance,
a smaller sample size is required for both designs. At high measures of fixed variance, the
cost ratio differences are primarily attributed to the fixed frame costs, which were quite
substantial (Table 3.2). The ENASS design has a much lower frame cost as compared to
the EMAP design, which decreases the cost ratio to values less than one for high fixed
measures of variance.
Figure 4.15. ENASS to EMAP cost ratio by variance for eight year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Variance.)
The effect of the autocorrelation structure is illustrated in the comparison of Figure
4.9 and Figure 4.15. In Figure 4.9, the four year comparison fixed m=4, while in Figure
4.15, the eight year comparison was fixed at m=8. (Recall that m influences the correlation
measure, Equation 3.2.) Increasing m decreases the impact of the temporal correlation on
the variance. Therefore, the advantage of the overlapping panels is decreased. The cost ratios
in Figure 4.15 for ρ=0.90 have decreased as compared to the cost ratios in Figure 4.9. This
is attributed to the increased sample size, and subsequent increased costs, to achieve the
corresponding fixed level of variance. For example, at four years, the sample size was 114
and 223 for the EMAP and ENASS designs, respectively, at a fixed level of variance equal to
0.003. For this same measure of variance at eight years, the sample size was 189 and 332
for the EMAP and ENASS designs, respectively.
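The sample sizes quoted in this section are consistent with a simple scaling rule, sketched below. The sketch assumes that the sample size needed for a fixed variance is proportional to 1 - (overlap proportion) x rho^m, where rho^m is the autocorrelation at a lag of m years. This functional form and the function names are assumptions made for illustration; the variance actually used is the two-stage expression derived in Chapter 2. The baseline of 332 is the reported sample size at a fixed variance of 0.003 when overlap has no effect.

```python
BASELINE_N = 332  # reported sample size at fixed variance 0.003 with no overlap effect
RHO = 0.90        # high temporal correlation considered in the text


def overlap_factor(overlap: float, rho: float, m: int) -> float:
    """ASSUMED variance multiplier: 1 - overlap * rho**m."""
    return 1.0 - overlap * rho ** m


def implied_sample_size(overlap: float, m: int) -> int:
    """Sample size implied by the assumed scaling rule, rounded to a whole unit."""
    return round(BASELINE_N * overlap_factor(overlap, RHO, m))


# (design, years between estimates, overlap proportion, reported sample size)
cases = [
    ("EMAP", 4, 1.0, 114),
    ("ENASS", 4, 0.5, 223),
    ("NASS", 4, 0.2, 288),
    ("EMAP", 8, 1.0, 189),
    ("ENASS", 8, 0.0, 332),
]

for design, m, overlap, reported in cases:
    implied = implied_sample_size(overlap, m)
    print(f"{design:5s} m={m}: implied n = {implied}, reported n = {reported}")
```

For each of these five cases the implied value equals the reported sample size, which suggests that the change between Figure 4.9 and Figure 4.15 is driven almost entirely by the drop in rho^m from about 0.656 to about 0.430.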
The effect of a decreasing temporal correlation is also presented in Figure 4.15. The
advantage of the overlapping panels in the EMAP design is nearly zero for ρ=0.20. For
the entire range of fixed variance, the ENASS design is more cost-effective. To achieve a
fixed variance of 0.003 at eight years for ρ=0.20, the sample size for both designs is 332.
This indicates that a correlation of 0.20 does not influence the variance for an eight year
comparison. Since the costs of the ENASS design are smaller than those of the EMAP
design, the cost ratios are less than one in Figure 4.15.
2. ENASS to NASS Comparison.
No overlap exists for the NASS and ENASS designs at eight years. Therefore, there
is no effect of temporal correlation on the variance. Since the ENASS design is more
expensive than the NASS design, the cost ratio is greater than one for all levels of fixed
variance. This is primarily due to the greater number of PSUs delineated with the ENASS
design, which increases cost.
3. NASS to EMAP Comparison.
The cost ratio of the NASS to EMAP design for an eight year comparison is
presented in Figure 4.16. Since the effect of the temporal correlation is reduced at eight
years as compared to four years (e.g., ρ^m = (0.90)^4 = 0.656 versus (0.90)^8 = 0.430), the
sample size of the EMAP design is increased to achieve the same level of fixed variance.
This results in a decrease in the cost ratios in Figure 4.16 as compared to Figure 4.11. Since
low correlations have nearly no impact on the variance at eight years, there is little change
in the cost ratios found at ρ=0.20 as compared to ρ=0.20 at four year comparisons.
4.4 Summary of Cost-Efficiency Results
This section summarizes the results presented in this chapter. Each significant
finding is briefly discussed. To obtain more details of a specific finding, refer to the section
in which the referenced table or figure originated.
Most results, which are presented from a number of angles, indicate that the designs
which piggy-back on the USDA survey, the NASS and ENASS designs, are more
cost-efficient than the EMAP design for most comparisons. This is primarily due to the cost
savings of these designs. A large proportion of the survey costs are shared by USDA. The
differences in costs for these designs were presented in Table 3.2.
Comparisons Less Than Four Years
The NASS design is most cost-efficient of the three designs when yearly comparisons
are less than four years. The only exception is a comparison at one year assuming negative
temporal correlations. For this situation, the ENASS design is more cost-efficient. The
greater cost-efficiency for the NASS design is due to the high degree of overlapping panels,
while the ENASS and EMAP designs have no overlapping panels, for these years. With
decreasing correlation, some efficiency is lost. However, the NASS design remains most
cost-efficient of the three designs even at low correlations (Tables 4.1-4.3; Figures 4.4-4.6).
For any measure of fixed precision assumed in the analysis discussed in Section
4.2.3.2, the NASS design was more cost-efficient. This was true across all levels of temporal
correlation assumed (Figures 4.12-4.14).
Figure 4.16. NASS to EMAP cost ratio by variance for eight year comparison.
(Two panels: Rho=0.90 and Rho=0.20. Horizontal axis: Variance.)
Comparisons at Four Years
The EMAP design is more cost-efficient than either the NASS or the ENASS design
assuming high levels of correlation, ρ>0.90, for comparisons at four years. This efficiency
gain is attributed to the full degree of overlapping panels at four years for the EMAP
design. As the correlation decreases, this efficiency is reduced, which is illustrated in the
tables and figures (Tables 4.1-4.3; Figures 4.1-4.3). For moderate to low temporal
correlations at four year comparisons, the NASS design is most cost-efficient. This is
illustrated for ρ=0.20 in these Figures.
When small fixed measures of variance are assumed, the EMAP design is most
cost-efficient. An example of small measures of variance occurs for a design with full
overlapping panels and a high degree of temporal correlation. As the precision requirements
become less stringent, the ENASS design is more cost-efficient at high measures of temporal
correlation. At low measures of temporal correlation, the NASS design is most cost-efficient
(Figures 4.9-4.11).
Comparisons Greater Than Four Years
The NASS design is most cost-efficient for comparing the difference between two
years when no overlapping panels exist. Since there is no influence of the overlapping panels
on the variance, this result is attributed to the lower cost of the NASS design.
The NASS design is most cost-efficient in nearly all cases at eight year comparisons,
assuming a high temporal correlation. This is presented in Tables 4.1-4.3. When very small
measures of fixed variance are assumed, however, the EMAP design is most cost-efficient
(Figures 4.15-4.16). When low to moderate levels of temporal correlation are assumed, the
NASS design is again most cost-efficient. This is illustrated for a comparison with the
EMAP design for ρ=0.20 in Figure 4.16.
CHAPTER 5
SUMMARY AND DISCUSSION
The objective of this chapter is to summarize the results presented in the previous
chapters. Other potential sources of survey error are briefly discussed relative to the impact
these sources might have on total mean square error among the designs. The first section of
this chapter concludes with a recommendation on the best design to monitor agricultural
health in the United States.
A self assessment of this research is discussed in the second section of this chapter.
The shortcomings and potential criticisms of this work are reviewed. A few proposals for
future research are suggested in this section.
5.1 Discussion of Findings
The objective of this research was to evaluate three sampling designs to monitor
agriculture health in the United States.
The motivation and description of these designs
were presented in Chapter 1. A cost-efficiency analysis was the method used to compare
these designs. In order to compute these analyses, the variance of interest was derived in
Chapter 2, while the cost model was presented in Chapter 3.
The results of the cost-efficiency analysis for the design comparisons were presented
in Chapter 4. These results suggested that the degree of temporal correlation and years of
interest in the comparison are highly influential in determining the best design. This section
summarizes results for these situations and reviews other sources of potential error.
5.1.1 Sampling Error
The underlying variance of the difference between two means was derived for each
of the design options in Chapter 2. The results in Chapter 4 combined these variance
models with the cost models presented in Chapter 3. The following summarizes the
cost-efficiency analysis discussed in this research.
The cost-efficiencies at one, two, and three years were computed for each of the
designs. Since the NASS design consists of overlapping panels for these years, the variance
was smaller for the NASS design than for the other designs. In addition, the cost of the NASS
design was the lowest of the three designs. Consequently, the NASS design was more
cost-efficient than the alternative designs for comparisons at any of these years
(Tables 4.1-4.3).
Assuming high measures of correlation, the EMAP design is the most cost-efficient
design at comparisons of four years. The EMAP design consists of complete overlapping
panels for this comparison, which reduces the variance for the EMAP design (Equation 3.2).
With decreasing levels of correlation, the EMAP design is not as cost-efficient due to the
increase in variance. For comparisons assuming moderate to low measures of temporal
correlation, the NASS design was most cost-efficient at four year comparisons
(Tables 4.1-4.3). This result is primarily due to the lower costs of the NASS design.
When no overlapping panels exist for yearly comparisons (e.g., five, six, seven, and
nine years), the measure of variance is constant across designs. For these years, the
differences in cost-efficiency are attributed to the differential cost among designs. Since the
least expensive design is the NASS design, this design is most cost-efficient for these
comparisons (Table 3.2; Tables 4.1-4.3).
For eight year comparisons and assuming a high degree of temporal correlation, the
EMAP design is most cost-efficient. This is attributed to the lower variance obtained due to
the high degree of overlapping panels. As the correlation decreases, the NASS design is
most cost-efficient (Tables 4.1-4.3). In this situation, the lower cost of the NASS design
provides for the increased cost-efficiency.
Combining all years of interest, it is suggested that the NASS design is the most
cost-efficient design to monitor agriculture health across the United States. This assumes
that the temporal correlation of most measurements over time is not extremely high.
(Referring to Tables 4.1-4.3, this generally occurs for temporal correlations less than 0.90.)
This result is primarily attributed to the cost savings of the NASS design (Table 3.2). In
some situations (e.g., comparisons at longer intervals such as eight years), the EMAP design
has a smaller variance. However, sample size is increased for the NASS design to achieve
this same level of variance. Since the cost of adding sampling units is lower for the NASS
design, the same level of variance among designs is achieved at a lower cost for the NASS
design. Thus, the cost-efficiency of the NASS design is lower in these situations.
5.1.2 Other Sources of Error
A few other sources of error may have potential impact on the relative
cost-efficiencies of these designs. The differential impact among the designs for these
sources of error is of concern in this assessment. Each of these sources is briefly mentioned
and the relative impact of these sources of error across these designs is discussed.
Coverage Error
Error due to incomplete coverage exists because some persons are not part of the list
or frame used to identify the members of the population (Kish, 1965). All three designs
discussed in this research are based on an area frame with 100% coverage of the area across
the United States. Once an area is selected in the sample, the list of farms will be identified
by USDA regardless of which sampling design is selected. It is expected that a slight
amount of coverage error will be present, but no differential effects are expected across
designs.
Nonresponse Error
Groves (1989) defines nonresponse as the failure to obtain complete measurements
on the survey sample. The size of error due to nonresponse is determined by the response
rate and the difference in measurements between nonrespondents and respondents to the
survey.
In addition to nonresponse affecting the estimated value of the statistics,
nonresponse can also affect the variability in estimates over replications of the survey. For
example, different interviewers can result in different levels of nonresponse.
Therefore,
nonresponse errors can have variable components and fixed components, due to replications
of the survey over time and across the interviewers working in the survey. This results in
possible nonresponse variance and nonresponse bias.
No difference is anticipated during the first year of the survey in the response rate
across designs. The designs do, however, vary in their pattern of repeated contacts with the
respondent over time. For example, the NASS design repeats measurements on an annual
basis for five years. In contrast, measurements collected using the EMAP design are
obtained every four years throughout the duration of the survey. The ENASS design repeats
measurements on a site only once, at four year intervals. Therefore, a potential for some
differences in response rate over time is expected.
There is no available literature on the response rate for long term surveys which
have a long lapse between repeated measurements (similar to the pattern in the EMAP and
ENASS designs). The US Bureau of the Census and the Bureau of Labor Statistics have no surveys
with such long intervals of time between repeated measurements (e.g., four
years). Therefore, comparing the degree of nonresponse among designs involves some
speculation. However, there is a known high rate of response for the current NASS design,
which repeats measurements on an annual basis. The response rate varies by state, but the
average rate of response is 87 percent across the US (Garriby and Huffman, 1991). A
possible reason for this high response rate is the knowledge that involvement in this survey
is temporary. After a five year period, these sampling units are rotated out of the system.
It is unknown whether the rate of response will change if the period of respondent
involvement increases to an indefinite period of time (e.g., the EMAP design).
Interviewer Error
This source of error is attributed to the person collecting the data. Groves (1989)
presents some sources of this error, such as the failure to read a question correctly, the
failure to record an answer correctly, and the influence of the interviewer on the respondent's
choice of answer. The agriculture component of EMAP has arranged to hire the
interviewers employed by USDA/NASS regardless of which design is adopted in this
program. Therefore, it is anticipated that there will be no differential effects attributed
to interviewer error across the designs.
Instrument Error
This source of error is attributed to the instrument, or questionnaire, used in the
survey. The same questionnaire, which is developed for this survey, will be used for any
design option which is adopted. Therefore, no differential effects are expected due to this
source of error among designs.
Respondent Error
This source of survey error is attributed to the respondent. There will be different
respondents selected for the design options.
Assuming that the respondents are selected
randomly for each design, it is expected that there will be no differential effects in
respondent error among the designs.
Another source of error attributed to the respondent is due to panel conditioning.
This type of error is expected in this survey due to the repeated visits (Kasprzyk et al.,
1989). In this program, this source of error can have a substantial impact on the survey
measures. For example, a selected respondent will be aware that a monitoring program,
supported by the Federal government, is monitoring his/her agricultural practices over a
period of time. This knowledge could have an impact on how he/she performs agricultural
practices. Since the number of repeated visits varies over these designs, this source of error
could vary across designs. It is speculated that the shorter the period of time a respondent is in
the survey, the less impact this source of error will have on the design. If this assumption is
accurate, the EMAP design would have a higher degree of panel conditioning error
than the other designs.
Data Processing Error
Groves (1989) attributes this source of error to the
processing of data after the data collection process (e.g., coding and editing). The
error associated with laboratory instruments is also placed in this category. For a given year,
no differential effects are expected among designs in measurement error, since the same
instruments would be available for any choice of design. However, differences in
measurement error are expected over time. For example, technology improves over time
and it is expected that instruments with better precision will become available as the program
progresses. In addition, the skill of the laboratory technician improves over
time, which would improve the precision of measurements as the program develops. This
introduces a bias in measurements collected over time attributed to the changing
technology. Since the magnitude of this bias can only be speculated on, the relative
differences in measurement error among the design options are unknown at this time.
Summary of Other Sources of Error
Each of the sources of error is a component of the total mean square error of the design.
For several sources of error, no differences are expected across designs. For some
components, differences are expected. However, since no data have been collected to support
the previous discussion, these differences are speculative. To summarize this section, the
relative impact of these components of error is presented in Table 5.1. A scale is presented
to rank the degree of error among designs. The lowest degree of error across designs was
assigned an L, a moderate level of error was assigned an M, and the highest degree of error
was assigned an H. Results from this section, which are summarized in this table, suggest
that the NASS design is the best sampling design to monitor agricultural health in the US.
Table 5.1. Relative comparison of all sources of survey error among designs.ᵃ

                              Qualitative Assessment of Relative
                                 Size of Error Among Designs
Error Source                     EMAP       NASS       ENASS
A. Nonobservational
   Sampling
      1, 2, 3 years               M          L          M
      4 years                     L          L          L
      5, 6, 7 years               M          L          M
      8 years                     L          L          L
   Coverage                       L          L          L
   Nonresponse                    M          L          L
B. Observational
   Instrument                     L          L          L
   Respondent                     H          M          M
   Data Collection                L          L          L
C. Data Processing                M          M          M
Total MSE

ᵃ L was assigned to represent the lowest measure of expected error, M was assigned to
represent a moderate measure of error, and H was assigned to represent a high degree
of expected error for a design.

5.2 Self Assessment of Cost-Efficiency Analysis
Throughout this analysis, a number of assumptions and specific directions were
taken which potentially might affect the cost-efficiency results. These issues are discussed in
this section. Proposals for future work to extend this research are also discussed.
5.2.1 Assessment and Future Work
The model adopted in this research to compare designs was the variance of change
between two means. A discussion of the choice of this estimator was presented in Chapter
2. Mean differences across years are not the only focus of the EMAP program. Two other
objectives of this program are to monitor annual means (i.e., status) and trends (e.g., linear
or nonlinear response functions) in ecological health. It is expected that the results of the
cost-efficiency analysis would vary for other estimators. For
example, no differences in the variance of a simple estimator of status are expected among
designs, but differences are expected for the variance of an estimate of status if data from
previous years were incorporated into the annual measure. These measures
are referred to as composite estimators. A variety of composite estimators
have been developed and discussed in the literature by the US Bureau of the Census. The
literature illustrates that the precision varies across composite estimators (Gurney and Daly,
1965; Huang and Ernst, 1981). This was briefly discussed in Section 2.6. The changes in
the cost-efficiency analysis among designs for a composite estimator would be an area for
future research.
Differences among designs would also be expected for the variance of an estimator of
trend (e.g., polynomial response across time). This difference is attributed to the pattern of
repeated measurements, which differs among designs. If the objective is to estimate trend
over time, precision is improved if the same sampling points are monitored throughout the
survey, assuming a positive temporal correlation exists between measurements (Section 2.1).
A high positive correlation increases the covariance term, which is subtracted in the variance
model for a difference between two means, and therefore reduces that variance. This effect
was demonstrated in this research
for the estimate of a difference between two means. Therefore, it is speculated that the
findings presented in this research would be similar to an analysis which compared a linear
trend among designs. However, the level of difference in the cost-efficiencies among designs
for a linear trend estimator, as well as higher order polynomials, would be an area for future
research.
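The role of the covariance term can be illustrated with a minimal sketch. It is not the
two-stage variance derived in Chapter 2. It assumes a single-stage sample with simple
(unweighted) means, equal variance and equal sample size in both years, and a first order
autoregressive correlation between repeated measurements on the same unit. The function
name and its parameters are hypothetical, introduced only for this illustration.

```python
def var_mean_difference(sigma2, n, rho, lag, overlap):
    """Variance of the difference between two yearly means, `lag` years apart.

    Simplifying assumptions (not the thesis's two-stage model):
      * each yearly mean is a simple average of n units with variance sigma2;
      * a fraction `overlap` of the units is measured in both years;
      * repeated measurements on one unit have AR(1) correlation rho**lag.
    The covariance of the two means is overlap * rho**lag * sigma2 / n, and it
    enters the variance of the difference with a factor of -2.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be a proportion between 0 and 1")
    lag_correlation = rho ** lag
    return (2.0 * sigma2 / n) * (1.0 - overlap * lag_correlation)


# Four-year comparison with 800 units per year and rho = 0.9.
full_overlap = var_mean_difference(1.0, 800, 0.9, 4, 1.0)
partial_overlap = var_mean_difference(1.0, 800, 0.9, 4, 0.75)
no_overlap = var_mean_difference(1.0, 800, 0.9, 4, 0.0)
```

With a positive correlation, the variance of the difference falls as the overlap rises; with
zero correlation the overlap has no effect in this simplified model.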
There is a clear advantage in the choice of the estimator selected in this research
(Sections 2.1-2.2). It is expected that some bias exists for the variables measured in this
survey. For example, a measurement bias is common in laboratory procedures. This
research investigated the variance of a difference between two means. Under the assumption
that a constant bias is present over time, this bias would drop out in the estimate of a
difference between two time points. It is unknown whether this bias would have a major
impact on the mean square error of the status and trend estimators.
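The cancellation of a constant bias can be shown in one line. The notation is assumed for
this illustration only: mu_t is the true mean in year t, B is a bias that does not change
over time, and y-bar_t is the estimated mean in year t.

```latex
E\left(\bar{y}_{t'} - \bar{y}_{t}\right)
  = \left(\mu_{t'} + B\right) - \left(\mu_{t} + B\right)
  = \mu_{t'} - \mu_{t},
\qquad
\mathrm{MSE}\left(\bar{y}_{t}\right) = \mathrm{Var}\left(\bar{y}_{t}\right) + B^{2}
```

The difference is unbiased under this assumption, whereas the bias remains in the mean
square error of a single-year (status) estimate.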
The temporal covariance matrix, which was adopted in this research, was a first
order autoregressive structure. Biological measurements typically follow this type of
pattern, which justifies the choice of this structure. The results of this research have
illustrated that the degree of temporal correlation greatly influences the cost-efficiency
analysis. Therefore, other types of structures, such as a constant covariance structure, could
have an impact on the results. It is expected that one variable of interest in this program,
soil chemistry, is fairly consistent across time, assuming human activities do not severely
alter the chemical makeup. Therefore, the first order autoregressive structure is
inappropriate to assume for the covariance structure for this variable. Thus, a
recommendation for future research on comparisons of panel surveys would be to evaluate
design options under different covariance structures. For example, two other types proposed
by Berger (1986) were a uniform and circumplex covariance structure.
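The practical difference between two of these structures can be seen by building the
correlation matrices directly. The sketch below covers only the first order autoregressive
and the uniform (constant) structures; the circumplex structure is omitted because its form
is not given in this text. The function names are hypothetical.

```python
def ar1_corr(n_years, rho):
    """First order autoregressive structure: correlation decays as rho**|i - j|."""
    return [[rho ** abs(i - j) for j in range(n_years)] for i in range(n_years)]


def uniform_corr(n_years, rho):
    """Uniform structure: the same correlation rho between any two distinct years."""
    return [[1.0 if i == j else rho for j in range(n_years)] for i in range(n_years)]


# Correlation between year 1 and years 5 and 9 (lags of four and eight years).
ar1 = ar1_corr(12, 0.9)
uniform = uniform_corr(12, 0.9)
ar1_lag4, ar1_lag8 = ar1[0][4], ar1[0][8]
uniform_lag4, uniform_lag8 = uniform[0][4], uniform[0][8]
```

Under the autoregressive structure the correlation available at long lags shrinks quickly,
while under the uniform structure it stays at rho, so overlapping panels would retain
their value at long intervals for a stable variable such as soil chemistry.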
A series of assumptions were made in Section 2.4 in order to simplify the variance
for each of the designs. For example, N was assumed large, the covariance was written in
terms of the correlation, the between and within components of variance were written in
terms of intracluster homogeneity, and finally, the variances in any two years being
compared were assumed equal. Recall that the cost-efficiencies were compared by a simple ratio
of two designs. If any of these simplifications were omitted in the variance derivation, it is
expected that the ratio of cost-efficiencies would remain unchanged. It is not anticipated
that any of these assumptions differ in their impact on the variance derivation for each of
the designs. Therefore, it is expected that the cost-efficiency results would be similar if these
assumptions were omitted from this analysis.
The results of the cost-efficiency analysis were based on a sampling variance. In
order to account for total survey error, a more appropriate error term to compare would be
the total mean square error. However, no data are available to determine these measures of
error quantitatively. The discussion in Section 5.1 addresses potential differences among the
designs qualitatively. A recommendation for future research on comparisons of panel
surveys would be to evaluate designs using total mean square error.
Finally, some interest within the EMAP program has focused on combining the
estimates (e.g., estimates of land use characterization) obtained from one sampling point
with other sampling points across a state in order to develop contour maps of land use
across a state. This area of analysis is known as spatial statistics. Estimates and their
variances are computed based on the location of the sampling points over time. The
sampling points of two designs considered in this research (i.e., NASS and ENASS) rotate
out of the system, while new points rotate into the system over time. However, the
sampling points of the EMAP design remain fixed over time. An area of future research
would be to compare the variances obtained from a spatial analysis from these different
types of panel surveys.
5.2.2 Alternative Panel Designs
This research reviewed three types of panel designs. There are additional types of
sampling patterns which can be investigated to monitor agricultural health. Results of this
research have implied that overlapping panels do increase cost-efficiency, under the
assumption of a high degree of temporal correlation. However, the drawbacks of keeping
panels in the survey for an extended period of time, such as the impact on response rates,
were discussed in Section 5.1.2.
Since the information to compute cost-efficiency for alternative designs has been
presented in this research, various sampling schemes can be compared with any of these
designs. For example, another design of interest combines a degree of overlapping panels
over an indefinite period of time (e.g., EMAP) with a rotational panel (e.g., NASS or
ENASS). The sampling scheme of this design retains 50 percent of the samples throughout
the length of the survey. The remaining 50 percent of the samples rotate similarly to the
ENASS design. This design, referred to as the Lesser design, is presented in Figure 5.1. The
cost of this design to monitor 800 points per year, over 12 years and 50 states is $9,735,140,
which is less than the cost of both the EMAP and ENASS designs.
The Lesser design capitalizes on the NASS design, which provides a cost savings as
compared to the EMAP design. Some panels (50 percent) are maintained throughout the
survey, which provides improved precision on the estimation of long term changes. Finally,
some panels (50 percent) are rotating in and out of the survey, which reduces the impact of
nonresponse error and panel conditioning.
A cost-efficiency analysis was done on this type of design and the results are
presented in Tables 5.2-5.4. The cost-efficiencies for the Lesser design relative to the EMAP
design are presented in Table 5.2 over a range of temporal correlations. The Lesser design is
more cost-efficient than the EMAP design in nearly all comparisons. The only exception is
for high temporal correlations in a four year comparison. For this correlation, the EMAP
design is more cost-efficient primarily due to the full overlap for the EMAP design at four
years. The Lesser design has 75 percent overlap at four years.

Figure 5.1 Alternative design option proposed for the agroecosystem component of EMAP.ᵃ
[Diagram: panels (rows, Panel #) by YEAR 1 to 12 (columns); an entry N marks each year in which a panel is observed.]
ᵃ N represents the observed sample; the superscript represents year and the subscript
represents panel number. The line above the dashed line represents
the sampling scheme for 50 percent of the sample size. The remaining 50 percent of
the sample are measured similar to the pattern illustrated below the dashed line.

Table 5.2 Ratio of cost-efficiencies for the Lesser design to the EMAP design, assuming an
annual average cost for each design, over a range of annual autocorrelation, ρ.ᵃ

                 Difference in Comparative Years
    ρ             4          8       No overlap
 +/- 0.95       0.873      1.065       1.370
 +/- 0.90       1.071      1.225       1.370
 +/- 0.85       1.173      1.292       1.370
 +/- 0.80       1.235      1.326       1.370
 +/- 0.50       1.355      1.369       1.370
 +/- 0.25       1.369      1.370       1.370
     0          1.370      1.370       1.370

ᵃ Cost was fixed as the annual average cost for each design as listed in Table 3.2 (b=1,
a=800 for each design).
Maintaining some panels in the survey over time increases the precision of change
for comparisons of longer intervals, assuming a high temporal correlation exists. The Lesser
design has a greater number of overlapping panels than the ENASS design, for comparisons
at longer intervals. In addition, the total survey cost is less for the Lesser design, as
compared to the ENASS design. As a result, the Lesser design is more cost-efficient for
all comparisons as compared to the ENASS design (Table 5.3).
Since the Lesser design has no overlapping panels at one, two, and three years, the
NASS design is more cost-efficient for these years (Table 5.4). For comparisons at four and
eight years, assuming a high positive correlation, the Lesser design is more cost-efficient.
For years where no overlap occurs, the NASS design is more cost-efficient, which is
primarily due to the cost savings of the NASS design.
The analysis of any alternative design can easily be computed based on the results
presented in this research. The variance of the difference between two temporal means has
been derived and presented in this research for a general design with overlapping panels
(Chapter 2). In addition, an outline of costs based on USDA estimates has been presented
in Chapter 3. As data are collected from this survey, the cost-efficiency analysis for any
alternative designs can be calculated using the information discussed in this research.
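A comparison of this kind can be sketched in a few lines. The sketch assumes that
cost-efficiency is the reciprocal of cost multiplied by variance; this form is an assumption
made for illustration and the exact measure is the one defined earlier in this research. The
function names and the cost and variance values are hypothetical.

```python
def cost_efficiency(cost, variance):
    """Assumed measure: precision (1 / variance) obtained per unit of cost."""
    if cost <= 0 or variance <= 0:
        raise ValueError("cost and variance must be positive")
    return 1.0 / (cost * variance)


def efficiency_ratio(cost_a, var_a, cost_b, var_b):
    """Ratio of the cost-efficiency of design A to that of design B.

    Values above 1 favor design A; values below 1 favor design B.
    """
    return cost_efficiency(cost_a, var_a) / cost_efficiency(cost_b, var_b)


# Hypothetical designs with equal variances: the ratio reduces to cost_b / cost_a.
equal_variance_ratio = efficiency_ratio(100.0, 0.5, 137.0, 0.5)

# Hypothetical designs with equal costs: the ratio reduces to var_b / var_a.
equal_cost_ratio = efficiency_ratio(100.0, 1.0, 100.0, 2.0)
```

Under this assumed form, two designs with no overlapping panels have equal variances, so the
ratio depends on cost alone. That behavior is consistent with the constant "No overlap"
columns in Tables 5.2-5.4.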
Table 5.3 Ratio of cost-efficiencies for the Lesser design to the ENASS design, assuming an
annual average cost for each design, over a range of annual autocorrelation, ρ.ᵃ

                 Difference in Comparative Years
    ρ             4          8       No overlap
 +/- 0.95       1.228      1.737       1.002
 +/- 0.90       1.142      1.342       1.002
 +/- 0.85       1.095      1.187       1.002
 +/- 0.80       1.067      1.108       1.002
 +/- 0.50       1.007      1.003       1.002
 +/- 0.25       1.003      1.002       1.002
     0          1.002      1.002       1.002

ᵃ Cost was fixed as the annual average cost for each design as listed in Table 3.2 (b=1,
a=800 for each design).
Table 5.4 Ratio of cost-efficiencies for the Lesser design to the NASS design, assuming an
annual average cost for each design, over a range of annual autocorrelation, ρ.ᵃ

                        Difference in Comparative Years
    ρ           1        2        3        4        8     No overlap
 - 0.95       1.144    0.584    0.999    1.168    1.155     0.862
 - 0.90       1.131    0.618    0.980    1.072    1.022     0.862
 - 0.85       1.118    0.649    0.963    1.011    0.953     0.862
 - 0.80       1.105    0.677    0.947    0.969    0.915     0.862
 - 0.50       1.020    0.795    0.884    0.876    0.864     0.862
 - 0.25       0.945    0.846    0.865    0.863    0.862     0.862
   0          0.862    0.862    0.862    0.862    0.862     0.862
 + 0.25       0.771    0.846    0.860    0.863    0.862     0.862
 + 0.50       0.668    0.795    0.841    0.876    0.864     0.862
 + 0.80       0.517    0.677    0.769    0.969    0.915     0.862
 + 0.85       0.488    0.649    0.749    1.011    0.953     0.862
 + 0.90       0.456    0.618    0.726    1.072    1.022     0.862
 + 0.95       0.423    0.584    0.699    1.168    1.155     0.862

ᵃ Cost was fixed as the annual average cost for each design as listed in Table 3.2 (b=1,
a=800 for each design).
APPENDIX A
PROCEDURES FOR DEVELOPMENT OF NASS AND EMAP AREA FRAME
AND SAMPLING STRATEGIES
I. NASS DESIGN
1. The United States is divided into 50 states.
2. Stratification.
a. Land-use stratification materials are collected for each state.
b. Stratification is performed by county (for administrative purposes) for each state.
Quality boundaries are used, boundaries which are permanent or long-lasting geographic
features and are easily identified by the interviewer. Eight strata are used in this process:
1. cropland, >75% cultivated
2. cropland, 50-74% cultivated
3. cropland, 15-49% cultivated
4. ag/urban, <15% cultivated
5. residential/commercial, no cultivation
6. range and pasture
7. non-agricultural
8. water
3. Construction of PSU's.
a. The strata are subdivided into PSU's. PSU's are defined on frame maps to have
identifiable boundaries, such as rivers and highways. The size of a PSU varies by stratum and
across counties. They usually consist of 6-8 square miles for heavily agricultural areas and
0.5-1 square mile for less agriculturally developed areas. The PSU is further subdivided into
SSU's or segments. The size of a SSU also varies but typically is 1 square mile. The
advantage of this two-stage process is economic. Only the required number of PSU's, as
determined in the selection process, are randomly selected and further subdivided into
segments.
b. Once the PSU's are delineated for a county, the PSU identification code is
attached. PSU's are numbered in a serpentine manner beginning in the NE corner of the county
and ending in the SE or SW corner. (This is done without regard to strata.)
4. Digitization of PSU's.
a. The purpose is to convert map points into X-Y coordinates.
b. The PSU's are measured to determine the number of segments per PSU for
sampling purposes.
c. The PSU areas are accumulated for each stratum. PSU area is divided by target
segment size for each stratum to obtain the total number of segments in that PSU.
(Summing the number of segments yields the total number of segments in the stratum.)
5. Substratification.
a. Since the agricultural makeup of segments differs depending on the location of
segments within a state, another level of stratification is applied to the area frame
construction process. This process is primarily effective where intense cultivation areas vary
across the state.
b. The PSU's are ordered according to agricultural content (using a criterion of
agricultural similarity). The PSU's are first ordered as described above, followed by
ordering the counties across the state in order to group counties into clusters which have the
same agricultural makeup. In most cases, the ordering proceeds from one county into an
adjacent county.
c. After the population of segments has been determined for each stratum and the
two-stage ordering completed, the number of substrata for each land-use stratum is
established. Selection of the number of substrata for each stratum is established based on
homogeneity of segments in strata, number of sample segments, and number of replicates
within each stratum.
6. Selection of segment.
Each substratum contains the same number of sampling units, except the last.
After the required number of segments has been delineated for the selected PSU, the
segments are numbered in a serpentine order starting in the NE corner. The selection of the
segment to be sampled is performed by selecting a number from a random number table
between one and the number of segments in the PSU.
Replicates (panels) are used in this scheme (see other issues section). A five-year
rotation scheme is used for the sample segments. Rotation is accomplished by replacing
segments from specified replicates within a land-use stratum with newly selected segments.
Preferably, the number of replicates is a multiple of five to provide a constant workload for
sample selection activities. As a rule of thumb, 5 replicates are generally used if the sample
size for a stratum is less than 50 segments, 5 or 10 replicates if the sample size for a stratum
is between 50 and 100 segments, and 10 replicates if the sample size is between 100 and 200
segments.
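The consequence of the five-year rotation for comparisons across years can be sketched as
follows. The sketch assumes that exactly one fifth of the replicates is replaced each year,
which the text notes holds only approximately; the function name is hypothetical.

```python
from fractions import Fraction


def rotation_overlap(lag_years, rotation_length=5):
    """Fraction of sample segments common to two years `lag_years` apart.

    Assumes an idealized rotation in which 1/rotation_length of the replicates
    is replaced every year, so a segment stays in the sample for exactly
    rotation_length consecutive years.
    """
    if lag_years < 0:
        raise ValueError("lag_years must be non-negative")
    if lag_years >= rotation_length:
        return Fraction(0)
    return Fraction(rotation_length - lag_years, rotation_length)


# Overlap for comparisons one to five years apart.
overlaps = [rotation_overlap(lag) for lag in range(1, 6)]
```

Under this idealization the overlap declines by 20 percentage points per year of separation
and disappears for comparisons five or more years apart.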
OTHER ISSUES WITH THE NASS DESIGN
1. Replicated subsampling.
a. This is characterized by the selection of a number of independent subsamples or
replicates from the same population using the same selection procedure for each replicate.
Each replicate (sample) is an unbiased representation of the population. A replicate is
defined as a simple random sample of one segment from each substratum in a land-use
stratum.
b. Approximately 20% of the replicates in each land-use stratum are replaced
annually. This approximation is due to the sample design which does not rotate exactly
20% of the segments because the number of replicates is not always a multiple of five.
2. Selection probabilities.
a. About 95% of the approximately 16,000 segments in the area frame sample are
selected based on the equal probability selection method. Typically a two-step process is
used. The first step involves the selection of a sample of PSU's within each substratum in a
given land-use stratum. Selection is done randomly, with replacement, with probability
proportional to the number of segments in the PSU. After the sample of PSU's is selected,
each selected PSU is divided into the required number of segments. The second step of the
process involves randomly selecting a segment with equal probability from the selected PSU.
Therefore, all segments within a given substratum in a land-use stratum have an equal
probability of selection. If more than 10 segments are in a PSU, another step is added in
order to reduce the number of segments delineated on the map.
b. The other 5% of the segments are selected with an unequal probability selection
method. This occurs mostly in range and non-agricultural strata. This is used since
adequate boundaries are not available in these areas to draw off segments of approximately
the same size. One or two-step procedures are used to select the segment. For the one-step
process, PSU and segment size are synonymous. In a two-step process, PSU's are selected
with replacement using PPS sampling. Point sampling is used at the second stage to select
the segment.
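The equal probability property of the two-step selection in item 2a can be checked with a
small sketch. It computes, for a single draw, the probability of each segment as the product
of the PPS probability of its PSU and the equal probability of the segment within that PSU.
The function name and the PSU sizes in the example are hypothetical.

```python
from fractions import Fraction


def segment_selection_probabilities(segments_per_psu):
    """Per-draw selection probability of every segment in one substratum.

    Step 1: a PSU is drawn with probability proportional to its number of segments.
    Step 2: one segment is drawn with equal probability within the selected PSU.
    """
    total_segments = sum(segments_per_psu)
    probabilities = []
    for n_segments in segments_per_psu:
        p_psu = Fraction(n_segments, total_segments)   # PPS draw of the PSU
        p_segment_given_psu = Fraction(1, n_segments)  # equal chance inside the PSU
        probabilities.extend([p_psu * p_segment_given_psu] * n_segments)
    return probabilities


# Hypothetical substratum with three PSU's of 6, 8 and 7 segments.
example = segment_selection_probabilities([6, 8, 7])
```

The PSU size cancels in the product, so every segment has probability one over the total
number of segments in the substratum, regardless of the size of its PSU.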
II. EMAP DESIGN
OVERVIEW: The EMAP monitoring strategy is based on a hierarchical structure, in which
distinct tiers are identified. The resources of the US are sampled via a hypothetical grid
superimposed over the US. This grid identifies 12,600 locations at which ecological resources
will be identified, classified, and studied. These activities are divided into 2 tiers, the first
directed toward information that essentially can be collected using remote sensing and the
second directed largely toward more intensive data collection at sites selected to represent
specific resources.
1. Sampling Grid.
A fixed position that represents a permanent location for the point grid is
established, and the sampling points to be used by EMAP are generated by a slight random
shift of the entire grid from this base location. The design is systematic; therefore, a single
random shift will be chosen and all points will translate systematically. Hexagons are
constructed on the grid points, each covering approximately 640 km². The hexagons
cover the entire US, thus providing for full coverage of land area. At the grid points,
smaller hexagons (1/16th of the size of the larger hexagons) will be
characterized. There will be 12,600 points establishing the grid density for baseline sampling
in Tier 1 of EMAP.
2. The Sample.
a. Tier 1.
One sixteenth of the area of the US is characterized in terms of land use
development at this tier. This description provides the data for estimation of certain
regional resource parameters. The area of each of the smaller hexagons, which is
characterized at this stage, is approximately 39.7 km².
b. Tier 2. Field collection activities are done at this stage. This sample will be a subset of
the Tier 1 sample for that resource. For example, all of the 12,600 hexagons characterized
in Tier 1 as containing an agricultural makeup will be identified to include in the next step.
This subset of hexagons will be superimposed onto the NASS area frame. The centroids of
these hexagons are used to determine the NASS PSU which is located at this point. These
NASS PSU's, which are located at the center of the hexagon, will serve as the PSU's in the
EMAP design. These PSU's are the same PSU's as described in the NASS design
description. Approximately 3200 PSU's will be selected at this tier. As described for the
NASS design, each of these PSU's is divided into 6-8 segments or SSU's. These SSU's are
the same SSU's which have been identified and discussed in the NASS design. The SSU
located at the centroid of this hexagon will be selected as the ultimate sampling unit.
Therefore, the two-stage sampling design is nested within the second tier of the EMAP
design.
3. Interpenetrating replicates.
The sample will be blocked into 4 interpenetrating replicates. This decomposition
would apply to both the Tier 1 and Tier 2 sample. The repeating cycle length will be 4
years, thus all of the Tier 2 sample sites would be visited during a 4-year period. A second
cycle would begin in the fifth year with revisits to the first interpenetrating subsample. Each
of the interpenetrating samples would be implemented over the entire US.
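The revisit schedule implied by four interpenetrating replicates and a 4-year cycle can be
written as a short sketch. The function names are hypothetical, and years are assumed to be
numbered from 1 at the start of the survey.

```python
def replicate_for_year(year, n_replicates=4):
    """Replicate visited in a given survey year (years and replicates start at 1)."""
    if year < 1:
        raise ValueError("year must be 1 or greater")
    return (year - 1) % n_replicates + 1


def years_for_replicate(replicate, n_years, n_replicates=4):
    """Survey years, within the first n_years, in which a replicate is visited."""
    return [year for year in range(1, n_years + 1)
            if replicate_for_year(year, n_replicates) == replicate]


# Visit years of the first replicate over a 12-year survey.
first_replicate_years = years_for_replicate(1, 12)
```

Each replicate is visited once per cycle, so the first replicate returns in the fifth year,
matching the start of the second cycle described above.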
APPENDIX B
RESULTS OF THE COST-EFFICIENCY ANALYSIS FOR b=2 AND b=3
The tables included in this appendix are results of the cost-efficiency analysis
computed for secondary stage allocations equal to two and three (i.e., b=2 and b=3). These
tables are referred to in Chapter 4, Section 2.
Table 4.7 Ratio of cost-efficiencies for the ENASS to EMAP design when b=2 over a range
of annual autocorrelation, ρ.ᵃ

                    Difference in Comparative Years
    ρ             4          8          12      No overlap
 +/- 0.95       0.778      0.806      0.942       1.390
 +/- 0.90       0.994      1.049      1.177       1.390
 +/- 0.85       1.118      1.186      1.287       1.390
 +/- 0.80       1.198      1.268      1.341       1.390
 +/- 0.50       1.367      1.387      1.390       1.390
 +/- 0.25       1.389      1.390      1.390       1.390
     0          1.390      1.390      1.390       1.390

ᵃ Cost was fixed as the annual average cost for each design as listed in Table 3.2. Number
of PSUs varied among designs, as a result of setting b=2. For ENASS and EMAP, the
number of PSUs were 525 and 508, respectively.
Table 4.8 Ratio of cost-efficiencies for the ENASS to NASS design when b=2 over a range
of annual autocorrelation, ρ.ᵃ

                    Difference in Comparative Years
    ρ           1        2        3        4     No overlap
 - 0.95       1.176    0.600    1.027    1.053     0.886
 - 0.90       1.162    0.635    1.007    1.008     0.886
 - 0.85       1.149    0.667    0.989    0.976     0.886
 - 0.80       1.135    0.696    0.973    0.952     0.886
 - 0.50       1.049    0.817    0.908    0.895     0.886
 - 0.25       0.971    0.870    0.889    0.887     0.886
   0          0.886    0.886    0.886    0.886     0.886
 + 0.25       0.793    0.870    0.884    0.887     0.886
 + 0.50       0.687    0.817    0.864    0.895     0.886
 + 0.80       0.532    0.696    0.790    0.952     0.886
 + 0.85       0.501    0.667    0.770    0.976     0.886
 + 0.90       0.469    0.635    0.746    1.008     0.886
 + 0.95       0.434    0.600    0.718    1.053     0.886

ᵃ Cost was fixed as the annual average cost for each design as listed in Table 3.2. Number
of PSUs varied among designs, as a result of setting b=2. For ENASS and NASS, the
number of PSUs were 525 and 508, respectively.
Table 4.9 Ratio of cost-efficiencies for the NASS to EMAP design when b=2 over a range of
annual autocorrelation, ρ.ᵃ

                           Difference in Comparative Years
    ρ           1        2        3        4        8        12    No overlap
 - 0.95       1.182    2.316    1.353    0.738    0.910    1.063     1.568
 - 0.90       1.196    2.187    1.380    0.987    1.184    1.328     1.568
 - 0.85       1.210    2.084    1.405    1.146    1.338    1.452     1.568
 - 0.80       1.225    1.998    1.429    1.258    1.431    1.513     1.568
 - 0.50       1.325    1.701    1.530    1.528    1.565    1.568     1.568
 - 0.25       1.432    1.598    1.563    1.566    1.568    1.568     1.568
   0          1.568    1.568    1.568    1.568    1.568    1.568     1.568
 + 0.25       1.753    1.598    1.573    1.566    1.568    1.568     1.568
 + 0.50       2.025    1.701    1.609    1.528    1.565    1.568     1.568
 + 0.80       2.614    1.998    1.759    1.258    1.431    1.513     1.568
 + 0.85       2.772    2.084    1.806    1.146    1.338    1.452     1.568
 + 0.90       2.964    2.187    1.863    0.987    1.184    1.328     1.568
 + 0.95       3.201    2.316    1.935    0.738    0.910    1.063     1.568

ᵃ Cost was fixed as the annual average cost for each design as listed in Table 3.2. Number
of PSUs varied among designs, as a result of setting b=2. For NASS and EMAP, the
number of PSUs were 495 and 508, respectively.
Table 4.10 Ratio of cost-efficiencies for the ENASS to EMAP design when b=2 over a range
of annual autocorrelation, ρ.ᵃ

                    Difference in Comparative Years
    ρ             4          8          12      No overlap
 +/- 0.95       0.674      0.699      0.817       1.205
 +/- 0.90       0.862      0.910      1.021       1.205
 +/- 0.85       0.969      1.028      1.116       1.205
 +/- 0.80       1.039      1.100      1.163       1.205
 +/- 0.50       1.186      1.203      1.205       1.205
 +/- 0.25       1.204      1.205      1.205       1.205
     0          1.205      1.205      1.205       1.205

ᵃ Cost was fixed as the annual average cost for the EMAP design. Values of PSUs varied
among designs, as a result of setting b=2. For ENASS and EMAP, the number of
PSUs were 738 and 508, respectively.
Table 4.11 Ratio of cost-efficiencies for the ENASS to NASS design when b=2 over a range
of annual autocorrelation, p. G
Difference in Comparative Years
p
1
2
3
4
No overlap
- 0.95
1.257
0.642
1.098
1.126
0.948
- 0.90
1.243
0.679
1.077
1.077
0.948
- 0.85
1.228
0.713
1.058
1.043
0.948
- 0.80
1.213
0.744
1.040
1.018
0.948
- 0.50
1.121
0.874
0.971
0.957
0.948
- 0.25
1.038
0.930
0.950
0.948
0.948
0
0.948
0.948
0.948
0.948
0.948
+ 0.25
0.848
0.930
0.945
0.948
0.948
+ 0.50
0.734
0.874
0.924
0.957
0.948
+ 0.80
0.569
0.744
0.845
1.018
0.948
+ 0.85
0.536
0.713
0.823
1.043
0.948
+ 0.90
0.501
0.679
0.798
1.077
0.948
+ 0.95
0.464
0.642
0.768
1.126
0.948
^a Cost was fixed as the annual average cost for the EMAP design. The number of PSUs varied
among designs as a result of setting b=2. For ENASS and NASS, the numbers of PSUs
were 738 and 822, respectively.
Table 4.12 Ratio of cost-efficiencies for the NASS to EMAP design when b=2 over a range
of annual autocorrelation, ρ.^a

            Difference in Comparative Years
ρ           1        2        3        4        8        12       No overlap
-0.95       0.959    1.879    1.098    0.599    0.738    0.862    1.272
-0.90       0.970    1.774    1.119    0.800    0.960    1.078    1.272
-0.85       0.981    1.690    1.140    0.929    1.085    1.178    1.272
-0.80       0.993    1.621    1.159    1.020    1.160    1.228    1.272
-0.50       1.075    1.380    1.241    1.239    1.270    1.272    1.272
-0.25       1.161    1.297    1.268    1.270    1.272    1.272    1.272
0           1.272    1.272    1.272    1.272    1.272    1.272    1.272
+0.25       1.422    1.297    1.276    1.270    1.272    1.272    1.272
+0.50       1.642    1.380    1.305    1.239    1.270    1.272    1.272
+0.80       2.120    1.621    1.426    1.020    1.160    1.228    1.272
+0.85       2.249    1.690    1.465    0.929    1.085    1.178    1.272
+0.90       2.404    1.774    1.511    0.800    0.960    1.078    1.272
+0.95       2.597    1.879    1.569    0.599    0.738    0.862    1.272
^a Cost was fixed as the annual average cost for the EMAP design. The number of PSUs varied
among designs as a result of setting b=2. For NASS and EMAP, the numbers of PSUs
were 822 and 508, respectively.
Table 4.13 Ratio of cost-efficiencies for the ENASS to EMAP design when b=3 over a range
of annual autocorrelation, ρ.^a

            Difference in Comparative Years
ρ           4        8        12       No overlap
±0.95       0.785    0.814    0.952    1.404
±0.90       1.004    1.059    1.189    1.404
±0.85       1.129    1.197    1.230    1.404
±0.80       1.209    1.281    1.355    1.404
±0.50       1.381    1.401    1.403    1.404
±0.25       1.402    1.404    1.404    1.404
0           1.404    1.404    1.404    1.404
^a Cost was fixed as the annual average cost for each design, as listed in Table 3.2. The
number of PSUs varied among designs as a result of setting b=3. For ENASS and EMAP, the
numbers of PSUs were 371 and 352, respectively.
Table 4.14 Ratio of cost-efficiencies for the ENASS to NASS design when b=3 over a range
of annual autocorrelation, ρ.^a

            Difference in Comparative Years
ρ           1        2        3        4        No overlap
-0.95       1.196    0.611    1.045    1.071    0.902
-0.90       1.182    0.646    1.025    1.025    0.902
-0.85       1.169    0.679    1.006    0.993    0.902
-0.80       1.155    0.708    0.990    0.969    0.902
-0.50       1.067    0.831    0.924    0.910    0.902
-0.25       0.988    0.885    0.904    0.902    0.902
0           0.902    0.902    0.902    0.902    0.902
+0.25       0.806    0.885    0.899    0.902    0.902
+0.50       0.698    0.831    0.879    0.910    0.902
+0.80       0.541    0.708    0.804    0.969    0.902
+0.85       0.510    0.679    0.783    0.993    0.902
+0.90       0.477    0.646    0.759    1.025    0.902
+0.95       0.442    0.611    0.731    1.071    0.902
^a Cost was fixed as the annual average cost for each design, as listed in Table 3.2. The
number of PSUs varied among designs.
Table 4.15 Ratio of cost-efficiencies for the NASS to EMAP design when b=3 over a range
of annual autocorrelation, ρ.^a

            Difference in Comparative Years
ρ           1        2        3        4        8        12       No overlap
-0.95       1.173    2.299    1.343    0.733    0.903    1.055    1.557
-0.90       1.187    2.171    1.370    0.979    1.175    1.319    1.557
-0.85       1.201    2.068    1.395    1.137    1.328    1.442    1.557
-0.80       1.216    1.984    1.418    1.248    1.420    1.502    1.557
-0.50       1.316    1.689    1.519    1.517    1.554    1.557    1.557
-0.25       1.421    1.587    1.552    1.554    1.557    1.557    1.557
0           1.557    1.557    1.557    1.557    1.557    1.557    1.557
+0.25       1.741    1.587    1.562    1.554    1.557    1.557    1.557
+0.50       2.010    1.689    1.597    1.517    1.554    1.557    1.557
+0.80       2.595    1.984    1.746    1.248    1.420    1.502    1.557
+0.85       2.752    2.068    1.792    1.137    1.328    1.442    1.557
+0.90       2.942    2.171    1.850    0.979    1.175    1.319    1.557
+0.95       3.178    2.299    1.921    0.733    0.903    1.055    1.557

^a Cost was fixed as the annual average cost for each design, as listed in Table 3.2. The
number of PSUs varied among designs as a result of setting b=3. For NASS and EMAP, the
numbers of PSUs were 338 and 352, respectively.
Table 4.16 Ratio of cost-efficiencies for the ENASS to EMAP design when b=3 over a range
of annual autocorrelation, ρ.^a

            Difference in Comparative Years
ρ           4        8        12       No overlap
±0.95       0.681    0.706    0.825    1.217
±0.90       0.870    0.918    1.031    1.217
±0.85       0.978    1.038    1.127    1.217
±0.80       1.048    1.110    1.174    1.217
±0.50       1.197    1.214    1.216    1.217
±0.25       1.215    1.217    1.217    1.217
0           1.217    1.217    1.217    1.217
^a Cost was fixed as the annual average cost for the EMAP design. The number of PSUs varied
among designs as a result of setting b=3. For ENASS and EMAP, the numbers of
PSUs were 521 and 352, respectively.
Table 4.17 Ratio of cost-efficiencies for the ENASS to NASS design when b=3 over a range
of annual autocorrelation, ρ.^a

            Difference in Comparative Years
ρ           1        2        3        4        No overlap
-0.95       1.277    0.652    1.116    1.144    0.963
-0.90       1.263    0.690    1.094    1.095    0.963
-0.85       1.248    0.725    1.075    1.060    0.963
-0.80       1.233    0.756    1.057    1.035    0.963
-0.50       1.139    0.888    0.987    0.972    0.963
-0.25       1.055    0.945    0.966    0.963    0.963
0           0.963    0.963    0.963    0.963    0.963
+0.25       0.861    0.945    0.960    0.963    0.963
+0.50       0.746    0.888    0.938    0.972    0.963
+0.80       0.578    0.756    0.859    1.035    0.963
+0.85       0.545    0.725    0.836    1.060    0.963
+0.90       0.509    0.690    0.810    1.095    0.963
+0.95       0.472    0.652    0.780    1.144    0.963
^a Cost was fixed as the annual average cost for the EMAP design. The number of PSUs varied
among designs as a result of setting b=3. For ENASS and NASS, the numbers of
PSUs were 521 and 562, respectively.
Table 4.18 Ratio of cost-efficiencies for the NASS to EMAP design when b=3 over a
range of annual autocorrelation, ρ.^a

            Difference in Comparative Years
ρ           1        2        3        4        8        12       No overlap
-0.95       0.952    1.866    1.090    0.595    0.733    0.857    1.264
-0.90       0.963    1.762    1.112    0.795    0.954    1.070    1.264
-0.85       0.975    1.679    1.132    0.923    1.078    1.170    1.264
-0.80       0.987    1.610    1.151    1.013    1.153    1.219    1.264
-0.50       1.068    1.371    1.233    1.231    1.261    1.263    1.264
-0.25       1.153    1.288    1.260    1.262    1.264    1.264    1.264
0           1.264    1.264    1.264    1.264    1.264    1.264    1.264
+0.25       1.413    1.288    1.268    1.262    1.264    1.264    1.264
+0.50       1.631    1.371    1.296    1.231    1.261    1.263    1.264
+0.80       2.106    1.610    1.417    1.013    1.153    1.219    1.264
+0.85       2.234    1.679    1.455    0.923    1.078    1.170    1.264
+0.90       2.388    1.772    1.501    0.795    0.954    1.070    1.264
+0.95       2.579    1.866    1.559    0.595    0.733    0.857    1.264
^a Cost was fixed as the annual average cost for the EMAP design. The number of PSUs varied
among designs as a result of setting b=3. For NASS and EMAP, the numbers of PSUs
were 562 and 352, respectively.
REFERENCES
Bailar, B.A. (1989) "Information Needs, Surveys, and Measurement Errors" in Panel
Surveys (D. Kaspryk, G. Duncan, G. Kalton, M.P. Singh, eds.), John Wiley & Sons,
NY, ppl-24.
Baltes, P.B. (1968) "Longitudinal and Cross-Sectional Sequences in the Study of Age and
Generation Effects", Human Development, 11, 145-171.
Bass, R.
(1991)
Personal Communication, National Agricultural Statistics Service.
Washington, D.C.
(1953)
"Convergence:
Bell, R.Q.
Development, 24, 145-152.
An Accelerated Longitudinal Approach", Qtihl
Berger, M.P.F. (1986) "A Comparison of Efficiencies of Longitudinal, Mixed Longitudinal,
and Cross-Sectional Designs", Journal of Educational Statistics, 11, 171-181.
Breau, P., and Ernst, L.R. (1983) "Alternative Estimators to the Current Composite
Estimator" , Proceedings of the Section on Survey Research Methods, American
Statistical Association, 397-402.
Bromberg, S.M. (1990) "Identifying Ecological Indicators: An Environmental Monitoring
and Assessment Program", Journal 2f Waste Management, 40, 976-978.
Canada, J.R., and White, J.R.
(1980)
CaDital Investment Decision Analysis
Management !!:Wi Engineering, Prentice-Hall Inc., Englewood Cliffs, New Jersey.
fm:
Cantwell, P.J. (1990) "Variance Formulae for Composite Estimators in Rotation Designs",
Survey Methodology, 16, 153-163.
Clarke, W.R., Woolson, R.F. "Methodologic Issues in Linked Cross-Sectional Studies:
Experiences from the Muscatine Study", American Statistical Association:
Proceedings of the Section on Survey Research Methods. 1987.
Cochran, W.G. (1977) Sampling Techniques, John Wiley and Sons, New Vork.
Cook, N.R., Ware, J.H. (1983) "Design and Analysis Methods for Longitudinal Research",
Annual Review of Public Health, 4, 1-23.
Cotter, J., Nealon J. (1987): Area Frame Design for Agricultural Surveys, National
Agricultural Statistics Service. Washington, D.C.
Duncan, G.J., Kalton, G. (1987) "Issues of Design and Analysis of Surveys Across Time",
International Statistical Review, 55, 97-117.
Garibay, R., Huffman, J.E. (1991) ~ Agricultural Surveys Survey Administration
Analysis, National Agricultural Statistics Service, Washington, D.C.
Goldfarb, N. (1960) An Introduction 12 Longitudinal Statistical Analysis, Glencoe, Illinois:
Free Press.
Goldstein, H. (1979) The Design and Analysis of Longitudinal Studies, Academic Press,
157
New York.
Groves, R.M. (1989) Survey Errors and Survey Costs, John Wiley and Sons, New York.
Gurney, M., and Daly, J.F. (1965) "A Multivariate Approach to Estimation in Periodic
Sample Surveys", Proceedings of the Social Statistics Section, American Statistical
Association, 242-257.
'(
Hansen, M.H., Hurwitz, W.N., Madow, W.G. (1953) Sample Survey Methods mlS! Theory:
volume ll, John Wiley and Sons, New York.
Henderson, J.M., and Quandt, R.E.
Company, New York.
(1980)
Microeconomic Theory, McGraw-Hill Book
Holt, D., and Scott, A.J. (1981)
Statistician, 30, 169-178.
"Regression Analysis Using Survey Data",
~
Huang, E.T., and Ernst, L.R. (1981) "Comparison of an Alternative Estimator to the
Current Composite Estimator in CPS", Proceedings of the Section on Survey
Research Methods, American Statistical Association, 303-308.
Kasprzyk, D., Duncan, G., Kalton, G., Singh, M.P. (1989) Panel Surveys, John Wiley and
Sons, New York.
Kish, L. (1965) Survey Sampling, John Wiley and Sons, New York.
Kish, L., Frankel, M.R. (1974) "Inference from Complex Samples", Journal of the
Statistical Association, Ser. B, 36, 1-37.
Kish, L. and Anderson, D.W.
J ASA, 73, 24-34.
(1978)
~
"Multivariate and Multipurpose Stratification",
Kish, L. (1986) "Timing of Surveys for Public Policy", Australian Journal of Statistics,
28(1), 1-12.
Kish, L. (1988) "Multipurpose Sample Designs", Survey Methodology, 14, 19-32.
Kish, L. (1990) "Rolling Samples and Censuses", Survey Methodology, 16,63-79.
Kleinbaum, D.G. (1973) "Testing Linear Hypotheses in Generalized Multivariate Linear
Models", Communications in Statistics, 1(5), 433-457.
Kodlin, D., Thompson, D.J. (1958) "An Appraisal of the Longitudinal Approach to
Studies of Growth and Development", Monographs 2f ~ Society fm: Research in
Qill!l Development, 23, (Whole No. 67).
Kowalski, C.J., Guire, K.E. (1974) "Longitudinal Data Analysis", Growth, 38, 131-169.
Kumar, S., and Lee, H. (1983) "Evaluation of Composite Estimation for the Canadian
Labour Force Survey", Proceedings of the Section on Survey Research Methods,
American Statistical Association, 403-408.
Machin, D. (1975) "On a Design Problem in Growth Studies", Biometrics, 31, 749-753.
158
McCarroll, K. (1987) "An Evaluation of Some Approximate F Statistics and Their Small
Sample Distributions for the Mixed Model with Linear Covariance Structure", Ph.D.
thesis, University of North Carolina, Chapel Hill, NC.
Nathan, G., Nolt, D. (1980) "The Effect of Survey Design on Regression Analysis",
Journal of the B.2nl Statistical Society, Ser. B, 42, 377-386.
N.C. State University
Raleigh, NC.
(1991):
Agroecosystem Research Plan 1991. Peer Review Draft.
Overton, W.S., Stevens, D.L., Pereira, C.B., White, D., Olsen, T. (1990) "Design Report
for EMAP, Environmental Monitoring and Assessment Program", March, 1990 Draft.
Patterson, H.D. (1950) "Sampling on Successive Occasions with Partial Replacement of
Units", Journal of ~ &!vAl Statistical Society .H, 12, 241-255.
Pawel, D., Feeso, R. "On the Use of Correlation in Crop Yields", American Statistical
Association Proceedings on the Section of Survey Research Methods, 1987.
Pearson, R.W., Boruch, R.F. (1980)
Survey Research Designs: Towards §
Understanding of Their Costs and Benefits, Springer-Verlag, New York.
~
Potthoff, R.F. and Roy, S.N. (1964) "A Generalized Multivariate Analysis of Variance
Model Useful Especially for Growth Curve Problems", Biometrika, 51, 313-326.
Prahl-Anderson, B., Kowalski, C.J. (1973) "Mixed Longitudinal, Interdisciplinary Study of
the Growth and Development of Dutch Children", Growth, 37, 281-295.
Rao, J.N.K., Kumar, S., Roberts, G. (1989) "Analysis of Sample Survey Data Involving
Categorical Response Variables: Methods and Software", Survey Methodology, 15,
161-186.
Rao, M.N., Rao, C.R. (1966) "Linked Cross-Sectional Study for Determining Norms and
Growth Rates - A Pilot Survey on Indian School-Going Boys", Sankyha, 28, 237-258.
Rubin, R.B. (1987) Multiple Imputation fur Nonresponse in Surveys, John Wiley and
Sons, New York.
Stuart, A. (1954) "A Simple Presentation of Optimum Sampling Results", Jour. RQL.
Stat. Soc., B16, 239-241.
U.S. Bureau of the Census (1963): The Current Population Survey: A. Report !ill
Methodology. Technical Paper No.7, U.S. Government Printing Office, Washington,
D.C.
U.S. Department of Agriculture (1983): Scope and Methods of the Statistical Reporting
Service. Publication No. 1308, Washington, D.C.
van't Hof, M.A., Roaed, M.J., Kowalski, C.J. (1977) "A Mixed Longitudinal Data Analysis
Model", HUID§n Biology, 49, 165-179.
Vollmer, W.M., Johnson, L.R., McCamant, L.E., Buist, A.S. (1988) "Longitudinal Versus
Cross-Sectional Estimation of Lung Function Decline - Further Insights", Statistics in
Medicine, 7, 686-696.
159
Ware, J.B. (1985) "Linear Models for the Analysis of Longitudinal Studies", The
American Statistician, 39, 95-101.
Ware, J.B., Wu, M.C. (1981) "Tracking: Prediction of Future Values from Serial
Measurements", Biometrics, 37, 427-437.
Woolson, R.F., Leeper, J.D. (1980) "Growth Curve Analysis of Complete and Incomplete
Longitudinal Data", Communications in Statistics, A9, 14, 1491-1513.
Woolson, R.F., Leeper, J.D., Clarke, W.R. (1978) "Analysis of Incomplete Data from
Longitudinal and MUted Longitudinal Studies", Journal 2f ~ B.!wll Statistical
Society A, 141, 242-252.
160