This research was partially supported by the National Institute of Child
Health and Human Development through Grant HD-00371 and by the U. S. Bureau
of the Census through Joint Statistical Agreement JSA 74-2.
A DOUBLE SAMPLING SCHEME MODEL FOR ELIMINATING
MEASUREMENT PROCESS BIAS AND ESTIMATING
MEASUREMENT ERRORS IN SURVEYS
By
Judith T. Lessler
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 949
AUGUST 1974
JUDITH T. LESSLER. A Double Sampling Scheme Model for
Eliminating Measurement Process Bias and Estimating
Measurement Errors in Surveys. (Under the direction
of DANIEL HORVITZ.)
A general double sampling scheme model which employs a
combination of an error-free measurement process and a
faulty measurement process is developed.
The model allows
estimation of measurement error variance and elimination of
measurement process bias.
The model is applied to two
specific survey situations, a self-enumeration survey and
an interviewer conducted survey.
Using a cost function
which reflects the relative cost of the error-free measurement process and the faulty measurement process, optimum
values for the sample sizes are derived and the optimum
number of interviewers is indicated.
For various values of
the parameters the DSS model is compared to using only the
faulty measurement process or only the error-free measurement process and the preferred sampling scheme is indicated.
ACKNOWLEDGEMENTS
I wish to express my sincere thanks to Dr. Daniel
Horvitz for the direction provided during the course of this
research.
To Dr. H. Bradley Wells, I extend my thanks for his
continual encouragement and guidance throughout the course
of my work in the department.
Special thanks go to the other members of my committee:
Dr. Gary Koch whose work provides a basis for this paper;
Dr. Dana Quade for his excellent teaching; Dr. Krishnan
Namboodiri whose instruction in my first year provided a
framework for much of my thinking and interest in demography.
I wish to express deep gratitude to Dr. Ahmed Mustafa
for his help and our work together during our graduate
training.
I am indebted to Mr. William H. Brown, the artist for
the tables in Chapter IV, and to Pat Eichman for her typing
and encouragement.
I would also like to extend my thanks to Mrs. Pat
Vandiviere for her moral support.
I wish to thank my husband Ken whose love and encouragement supported me every step along the way.
TABLE OF CONTENTS
Chapter
I. INTRODUCTION
   Review of Literature
   The Research
II. PRESENTATION OF THE DOUBLE SAMPLING SCHEME MODEL, DSS; COMPARISON TO THE KOCH-CBM MODEL; SIMPLE RANDOM SAMPLING CASE
   2.1 Introduction
   2.2 The DSS Model
   2.3 Comparison of DSS Model to Koch-CBM Model
III. SELF-ENUMERATION AND INTERVIEWER DESIGNS FOR ESTIMATING THE VARIANCE COMPONENTS OF THE DSS MODEL
   3.1 Introduction
   3.2 Self-Enumeration Survey
   3.3 Survey Design Using Interviewers
   3.4 Using the DSS Model to Evaluate a Measurement Process
IV. USE OF DSS MODEL; OPTIMUM SIZES FOR ORIGINAL SAMPLE AND SUBSAMPLE IN SELF-ENUMERATION CASE; MODIFICATION OF DSS MODEL FOR ELIMINATING BIAS ONLY; PREFERRED SAMPLING SCHEME FOR FIXED COSTS AMONG DSS, KOCH-CBM, OR MEASURING TRUE VALUES ONLY; TABLES
   4.1 Introduction
   4.2 Optimum Sizes for the Initial Sample and the Subsample in a Self-Enumeration Survey
   4.3 Modification of the DSS Model to Eliminate Bias Only
   4.4 Preferred Survey Procedure
   4.5 Summary
V. OPTIMUM NUMBER OF INTERVIEWERS
   5.1 Introduction
   5.2 $V(Z_t)$ when a Sample from the Interviewer Population is Used
   5.3 Optimum Values for $b$, $r$, and $r_1$
   5.4 Conclusion as to the Optimum Values for $b$, $r$, and $r_1$
VI. SUMMARY AND INDICATIONS FOR FURTHER RESEARCH
   6.1 Summary
   6.2 Indications for Further Research
APPENDICES
   Appendix A. General DSS Model in the Non-Simple Random Sampling Case
   Appendix B. Effect on $V(Z_t)$ in an Interviewer Survey when Randomization is not Done at the Second Phase
BIBLIOGRAPHY
LIST OF TABLES
Table
4.2.1 Optimum Values for $n$ and $n_1$ for Fixed Cost, $S_B^2/S^2 = 0.03$, $C = 1000M$
4.2.2 Optimum Values for $n$ and $n_1$ for Fixed Cost, $S_B^2/S^2 = 0.3$, $C = 1000M$
4.2.3 Optimum Values of $n$ and $n_1$ for Fixed Variance, $V(Z_t) = S^2/50$, $S_B^2/S^2 = 0.03$
4.2.4 Optimum Values of $n$ and $n_1$ for Fixed Variance, $V(Z_t) = S^2/50$, $S_B^2/S^2 = 0.3$
4.4.2.1 Preferred Sampling Scheme
4.4.2.2 Preferred Sampling Scheme
4.4.2.3 Preferred Sampling Scheme
4.4.3.1 Preferred Sampling Scheme, CV = 0.1
4.4.3.2 Preferred Sampling Scheme, CV = 0.1
4.4.3.3 Preferred Sampling Scheme, CV = 0.1
4.4.3.4 Preferred Sampling Scheme, CV = 0.5
4.4.3.5 Preferred Sampling Scheme, CV = 0.5
4.4.3.6 Preferred Sampling Scheme, CV = 0.5
4.4.3.7 Preferred Sampling Scheme, CV = 1.0
4.4.3.8 Preferred Sampling Scheme, CV = 1.0
4.4.3.9 Preferred Sampling Scheme, CV = 1.0
4.4.4.1 Preferred Sampling Scheme, DSS Estimates Variable Errors, CV = 0.5
5.4.1 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.2 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.3 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.4 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.5 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.6 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.7 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.8 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.9 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.10 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.11 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 20$
5.4.12 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 34$
5.4.13 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, $B = 50$
5.4.14 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, All Variance Ratios $= 0.05$, $B = 20$, $c_3 = 41$
5.4.15 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, All Variance Ratios $= 0.05$, $B = 20$, $c_3 = 51$
5.4.16 $V(Z_t)$, $r$, and $r_1$ for Various $b_0$, All Variance Ratios $= 0.05$, $B = 20$, $c_3 = 101$
CHAPTER I
INTRODUCTION
Traditional research in sample survey methods has been
concerned with the use of probability sampling schemes which
allow objective estimates of the sampling errors in the
statistics computed from the data.
Cost models are available
which allow the survey researcher to determine the best
method of allocating his funds in order to minimize the
sampling errors in a survey.
These procedures for optimizing
the sample design assume that the measurements of the
characteristics under study in the survey are made without
error for each unit of observation.
It has long been recognized,
however, that this assumption is usually not valid
and that the measurement errors may be as serious as sampling errors in the results of a survey (Rice, 1929; Deming,
1944).
Advancement in treating measurement errors requires
the development of realistic models for survey procedures
which will allow one to estimate, and consequently develop
methods for eliminating or otherwise controlling, the
measurement errors in a survey.
The researcher conducting
a survey needs information which will allow him to decide
how to best allocate his funds in order to minimize the
effects of both sampling and measurement errors in his
survey results.
This paper presents methods for eliminating
the bias and for estimating the variable errors due to
measurement errors.
In addition, optimum allocation schemes,
which will allow one to minimize the total error (sampling
and measurement) of the survey, are presented.
A short review of the literature on measurement errors
follows.
In this review a short history of the development
of methods for controlling measurement errors is included.
The problem that the major portion of this research
addresses is further delineated, and a discussion of existing models for measurement errors is presented.
Review of Literature
In a 1902 paper on the mathematical theory of measurement errors, Karl Pearson, although not directly concerned
with errors in surveys, introduced several issues and concepts which are still important today in the study of
measurement errors and in the development of models for them.
Pearson began his study on measurement error with the aim
of providing a means by which one might find a value for the
"personal equation" in astronomy.
The "personal equation"
is the average over a large series of judgments or measurements of the errors that an observer makes in the measurement of a fixed quantity; i.e., using Pearson's terminology,
if $\xi$ is the actual value of some physical quantity and $x_1$ its
value according to the judgment of an observer, then the
mean value of $x_1 - \xi$ over a large series of judgments, $p_{01}$,
is the personal equation of the observer.
He writes "if $\xi$
be the actual value of some physical quantity, whether it
can really be determined or not..."
This raises the
issue of the existence of an actual or true value for a
measurement.
Present-day researchers are still concerned
with this issue and the role the concept of a true value
should play in the study of measurement errors (Deming,
1961; Cochran, 1968; Madow, 1965).
Some authors do not like the term "true value" and
instead use the term "ideal value."
The issue seems to boil
down to the question of whether or not true values are
obtainable.
When the term "true value" is discarded, the
term "ideal value" usually takes its place.
What is meant
by this is the value that would be obtained if the ideal (or
preferred, or best) measurement process were carried out
without error.
Deming (1961), Hansen, Hurwitz and Pritzker
(1967), Mosteller (1968), Cochran (1963; 1968), Madow (1965),
Hansen, Hurwitz, and Madow (1953), and Kish (1965) all discuss
this issue.
There are two issues as to whether or not one can
obtain a "true value."
The first is whether a "true value"
can be defined in terms of a measurement.
The second is
whether or not the measurement is obtainable once it is
defined.
In this paper the term "true value" is used.
Those readers who object to this on philosophical grounds
may think of it as the ideal value or best measurement that
could be obtained under ideal measurement conditions for the
characteristic in question.
Bias in the measurement process
is defined as the difference between the "true value" and
the average of the values obtained over many repetitions of
the measurement process, a definition used widely today.
At the time that Pearson conducted his study on
measurement errors it was assumed that these errors would
be normally distributed with mean zero and that they were
statistically independent for those observers who made their
observations independently.
Pearson discovered, however,
that the mean errors are not usually zero, that they are not
generally normally distributed, and that the errors for
observers who were apparently operating independently were
positively correlated.
The recognition of the existence of
positively correlated measurement errors was and is very
important; and all present day models must treat this
issue.
How important are measurement errors in surveys?¹
To
answer this question one must have some model for these
errors which provides a means for assessing their magnitude.
Models are discussed later in this review.
A few results
are presented here to give an idea of the magnitude of these
errors.

¹Since many surveys are conducted by means of questionnaires, the response of a particular individual to a
certain question is the measurement for that individual;
thus measurement errors are often (usually) called response
errors. However, a questionnaire is only one of the tools
with which measurements may be taken. Measurements may be
taken by directly counting, weighing, determining length,
area, amount, etc.; by record checks; or by judgments of enumerators or observers; as well as by asking questions of a
respondent. Thus, the more appropriate term, measurement
error, is used in this paper.

In a study of bias in reports of income received
from welfare assistance, David (1962) found that respondents
understated the amount received by 18% of the actual amount.
Carter, Glick, and Lewit (1955) found that the total number
of marriages reported for males in a certain period was 21%
fewer in the Current Population Survey than was estimated by
the National Office of Vital Statistics.
Maynes (1968)
found that in reports of bank balances the mean balance
reported by respondents was significantly lower than the
mean actual balance.
Ferber (1955), in a study of the
agreement among different respondents from the same household about characteristics of their household, found as low
as 41% agreement on whether or not the family kept a budget
and that in 43% of families of size three, one or more of
the adult members failed to report the purchase of a car.
Borus (1966) found little overall bias in the reporting of
earned income although differences were found for certain
characteristics of the respondent such as age, sex, etc.
Zarkovich (1966) has an extensive review of papers which
examine the relationship between measurement errors and
certain characteristics of the respondents.
Contemporary models for measurement errors usually
assume that correlated measurement errors are due to the
presence of interviewers or enumerators.
These errors may
have an important effect on the total error in survey data.
Kish (1962) showed that even when the interviewer variance
was as small as 7% of the total variance, the variance of
the estimate of the mean was increased by 10%, even with
small interviewer work loads.
Hansen, Hurwitz and Bershad
(1961) compared the relationship of interviewer variability
to sampling variance in the 1950 Census for certain characteristics.
The ratios of interviewer variability to
sampling variance ranged from 0.03 to 4.2.
The above examples show that measurement errors may
indeed be important and that the survey researcher should
make some effort to control their magnitude.
Zarkovich (1966) points out that errors in surveys may
be detected in two different ways, by "post hoc techniques"
and by sampling methods.
Post hoc techniques involve
examining, after completion of the survey, the data that
have been collected and tabulated and comparing them with
other known information.
These techniques depend upon some
previous knowledge of the population characteristics to be
measured or of specific relationships among these characteristics.
They are particularly useful in demographic
studies in which certain known patterns exist between the
characteristics being measured.
Sampling methods for detecting measurement errors rely
upon additional information on the units in the survey
collected by means of a resurvey or by a special sample
design, such as Mahalanobis' (1946) method of interpenetrating samples.
Dalenius (1962) in a review of recent
advances in survey methodology points out that previous to
1962 two lines of work had developed to deal with measurement errors.
The first was the development of appropriate
theory and methods for dealing with specific sources of
measurement errors, such as the methods that have been
developed in order to deal with
non-response; the development
of the technique of interpenetrating samples to deal
with errors introduced by field workers; methods developed
to deal with the biases associated with various types of
sampling units, etc.
The second line of work was the development of a more
comprehensive theory of measurement error, taking the form
of mixed error models.
The models were developed as a basis
for measuring the overall errors in a survey.
The model developed in this paper is a comprehensive
model which may be adapted to handle various sources of
error and which involves the use of special sampling techniques for detecting errors and eliminating bias.
Review of Models for Studying
Measurement Errors
In this section several types of models found in the
literature are mentioned briefly.
These will not be
reviewed in any detail, but are included to give the reader
an idea of the range of work that has been done in this
field.
Following this a more detailed discussion is given
of the several models that provided a basis for the work
presented in this paper.
One of the first measurement error problems for which
models were developed is the problem of non-response.
In
the basic model the population being studied is viewed as
consisting of two strata, a responding stratum for which
measurements are obtained on the initial round of the
survey; and the non-responding stratum for which additional
efforts must be made in order to obtain measurements for
the individuals in that stratum.
The non-response models
provide a means for determining the optimum sampling fraction for the second stratum, the non-responding stratum, in
order to minimize the cost of obtaining these additional
measurements or, alternately, to minimize the variance of
the final estimate of the population mean.
Hansen and
Hurwitz (1946) developed the initial model of this type.
Kish (1965) and Zarkovich (1966) give an extensive review of
the variations of this basic model.
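The basic two-stratum non-response model can be illustrated with the estimator usually attributed to Hansen and Hurwitz (1946): respondents are measured on the first round, a random subsample of the non-respondents is measured on the second round, and the two stratum means are weighted by the observed stratum shares of the initial sample. The sketch below assumes simple random sampling at both phases; the function name and data are hypothetical, and the optimum subsampling fraction is not derived here.

```python
from statistics import mean


def hansen_hurwitz_mean(respondent_values, n_nonrespondents, followup_values):
    """Estimate a population mean from an initial sample with non-response.

    respondent_values: measurements on the n1 units that responded initially.
    n_nonrespondents:  n2, the number of units that did not respond initially.
    followup_values:   measurements on a random subsample of the n2 non-respondents.
    """
    if not respondent_values or not followup_values:
        raise ValueError("both strata need at least one measurement")
    n1 = len(respondent_values)
    n = n1 + n_nonrespondents
    # Weight each stratum mean by its share of the initial sample.
    return (n1 / n) * mean(respondent_values) + (n_nonrespondents / n) * mean(followup_values)


# Hypothetical data: 6 respondents, 4 non-respondents of whom 2 are followed up.
estimate = hansen_hurwitz_mean([10, 12, 11, 9, 13, 11], 4, [20, 22])
print(round(estimate, 2))  # 15.0
```

The follow-up mean stands in for all four non-respondents, which is what makes the subsample worth its extra cost per unit.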
Another general class of models attempts to estimate
the bias that exists due to measurement errors by means of
a record check.
In this type of study the measurement
obtained for a particular individual in the survey is compared to an existing value for the same characteristic which
is available from some other source.
The net disagreement
between these two figures is taken as an indication of the
measurement error that exists in the survey figures.
In
addition such record check studies often examine the relationship of the bias in the characteristic being measured
to other characteristics of the individual members of the
population.
Studies by David (1962), Carter, Glick, and
Lewit (1955), and Borus (1966) are of this general type.
Neter, Maynes, and Ramanathan (1965) examined the effect of
errors in the matching of the individual record to the
individual response on the measurement of the bias.
Maynes
(1968) studied the effect of record consultation, rounding,
and size of balance on the bias in respondent reports of their
bank balance.
Neter and Waksberg (1964) studied the effect
of different types of survey design on the accuracy of
reports on the amount and the time of household
expenditures.
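The record-check logic can be sketched as follows: each survey report is paired with the value found for the same individual in an outside record, and the net disagreement is taken as an estimate of the measurement bias. The sketch assumes that the records are error-free and that reports and records are matched without error (the matching problem studied by Neter, Maynes, and Ramanathan); the function name and figures are hypothetical.

```python
def record_check_bias(survey_values, record_values):
    """Net disagreement between matched survey reports and record values.

    Returns (mean difference, relative difference). A negative value means
    the survey understates the characteristic relative to the records.
    """
    if not survey_values or len(survey_values) != len(record_values):
        raise ValueError("need non-empty, equal-length matched lists")
    diffs = [s - r for s, r in zip(survey_values, record_values)]
    net = sum(diffs) / len(diffs)
    # Net disagreement as a fraction of the record total.
    relative = sum(diffs) / sum(record_values)
    return net, relative


# Hypothetical matched pairs (survey report, record value) for four respondents.
net, relative = record_check_bias([80, 95, 100, 70], [100, 100, 120, 80])
print(net, relative)  # -13.75 -0.1375
```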
Other models involve the use of special measuring
techniques and survey designs to produce unbiased estimates.
These include dual measurement (Tenenbein, 1970; Madow,
1965), the technique of randomized response (Warner, 1965;
Greenberg et al., 1969; Greenberg, Abernathy and Horvitz,
1970; Greenberg et al., 1971), and dual record systems
(Chandrasekar and Deming, 1949; Wells, 1971).
The Tenenbein
model involves the use of two measuring devices which
classify the units in the sample into two mutually exclusive
categories.
The first measuring device is the true classifier
which classifies the units into the two categories
without error.
The second classifier is fallible and may
give classifications which are in error.
A double sampling
scheme is employed in which all units in the sample are
classified by the fallible classifier and a subsample of the
original sample is classified by means of the true classifier.
Tenenbein derives the maximum likelihood estimator
of p, the population proportion falling in one of the
categories, and the optimum values for the sampling fractions for each classifier which minimize measurement costs
for fixed variance or, alternately, minimize variance for
fixed cost.
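The structure of such a double sampling estimator can be illustrated for a two-category characteristic. If the units classified by the true classifier form a random subsample of the original sample, the likelihood factors into a part for the fallible classification, which uses every unit, and a part for the true category given the fallible category, which uses only the subsample. The estimate of p is then the sum, over fallible categories, of the estimated share in that category times the estimated proportion truly in category 1 within it. This is a sketch under those assumptions, in notation chosen for this illustration rather than Tenenbein's, and it does not reproduce his optimum sampling fractions; the function name and counts are hypothetical.

```python
def double_sample_proportion(both, fallible_only):
    """Estimate p = P(true class = 1) from a double sample.

    both:          {(true_class, fallible_class): count} for the subsample
                   classified by both devices (classes are 0 or 1).
    fallible_only: {fallible_class: count} for the units classified
                   by the fallible device only.
    """
    n_total = sum(both.values()) + sum(fallible_only.values())
    p_hat = 0.0
    for f in (0, 1):
        sub_f = both.get((0, f), 0) + both.get((1, f), 0)
        if sub_f == 0:
            raise ValueError(f"no subsample units with fallible class {f}")
        # P(fallible = f), estimated from every unit in the sample.
        share_f = (sub_f + fallible_only.get(f, 0)) / n_total
        # P(true = 1 | fallible = f), estimated from the subsample only.
        p_hat += share_f * both.get((1, f), 0) / sub_f
    return p_hat


# Hypothetical counts: 100 units classified both ways, 400 by the fallible device only.
both = {(1, 1): 40, (0, 1): 10, (1, 0): 5, (0, 0): 45}
estimate = double_sample_proportion(both, {1: 180, 0: 220})
```

Here the fallible device alone would give 230/500 = 0.46, while the corrected estimate is 0.54 × 0.1 + 0.46 × 0.8 = 0.422.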
The randomized response technique was developed to
protect the privacy of the individuals in the survey.
It
permits truthful answers to sensitive questions, thereby
reducing response bias.
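The text does not describe a particular randomized response design. As one illustration, the sketch below assumes Warner's (1965) original scheme: a randomizing device directs each respondent, with known probability P, to answer "I belong to group A" and otherwise "I do not belong to group A." Because the interviewer never learns which statement was answered, a "yes" does not reveal membership, yet the proportion π in group A can be recovered from the overall proportion of "yes" answers, λ = Pπ + (1 − P)(1 − π). The function name and numbers are hypothetical.

```python
def warner_estimate(yes_count, n, p_select):
    """Estimate the proportion pi in the sensitive group under Warner's design.

    p_select is the known probability that the device selects the statement
    "I belong to group A". Solving lambda = P*pi + (1 - P)*(1 - pi) for pi
    gives pi = (lambda - (1 - P)) / (2P - 1).
    """
    if p_select == 0.5:
        raise ValueError("P = 0.5 carries no information about pi")
    lam = yes_count / n  # observed proportion of "yes" answers
    return (lam - (1 - p_select)) / (2 * p_select - 1)


# Hypothetical survey: P = 0.7, 380 "yes" answers out of 1000 respondents.
pi_hat = warner_estimate(380, 1000, 0.7)
```

The closer P is to 0.5, the better the respondent's privacy is protected and the larger the variance of the estimate.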
Chandrasekar and Deming (1949) provide a method for
estimating the total number of vital events by the use of
two independent data collection systems, a survey and a
vital registration system.
Their method has become widely
used in what is termed the "dual record system" (Wells,
1971).
The method may be summarized as follows:
Suppose a survey is conducted in which the number of
vital events occurring in the population is estimated.
This
estimate will usually be an underestimate because some of
the vital events in the population will not be reported.
Suppose the population is also covered by a vital registration system which also underestimates the vital
events.
If
the survey and the registration system are independent, an
unbiased estimate of the total number of vital events in the
population can be obtained under the model.
The independence property, together with an estimate of the probability
that a vital event will be reported with each system, is
used to estimate the events missed by both systems.
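Under the independence assumption, the events missed by both systems can be estimated from the three observed cells (events found by both systems, by the survey only, and by the registration system only), which gives the Chandrasekar-Deming estimate sketched below. The sketch assumes that survey and registration reports are matched without error; the function name and counts are hypothetical.

```python
def chandrasekar_deming(matched, survey_only, registration_only):
    """Dual record system estimate of the total number of vital events.

    matched:           events reported by both the survey and the registration system.
    survey_only:       events reported by the survey only.
    registration_only: events reported by the registration system only.
    """
    if matched == 0:
        raise ValueError("the estimate is undefined when no events are matched")
    # Events missed by both systems, estimated under the independence assumption.
    missed_by_both = survey_only * registration_only / matched
    return matched + survey_only + registration_only + missed_by_both


# Hypothetical counts from a matched survey and registration system.
total = chandrasekar_deming(matched=300, survey_only=100, registration_only=60)
print(total)  # 480.0
```

The result equals (M + S)(M + R)/M, with M, S, and R the three cells above, which is the capture-recapture form of the same estimate.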
The development of models to estimate the increased
variance due to measurement errors has focused to a large
extent on the contribution of enumerators or interviewers to
this error.
Sukhatme and Seth (1952) developed a mixed model
for the effect of interviewers on the values reported for a
particular unit in the sample.
They show how the various
components of their model may be estimated under different
survey designs and give some examples from surveys conducted
in India.
Kish (1962), in a study of the effect of interviewer
variance on certain attitudinal variables, developed
a simple variance components model which demonstrates how
the interviewer affects the variance of the estimate of the
sample mean.
Mahalanobis (1946) developed the technique of
interpenetrating samples by which enumerator effects may be
estimated by analysis of variance techniques.
Models which handle bias and variance simultaneously,
i.e., mean square error models, have also been developed.
Kish and Lansing (1954) have developed such a model.
A similar
model was used by Felischer et al. (1958) to estimate the
measurement errors associated with farmers' reports of
cotton field acreages.
General models for measurement errors are discussed
next.
These models are discussed in detail since they
provide the basis for the research presented in this paper.
Census Bureau Model.
The CBM refers to a series of
models developed by Morris H. Hansen and his colleagues at
the U. S. Bureau of the Census.
The following discussion is
based on two papers which give the most complete explication
of the model (Hansen, Hurwitz and Bershad, 1961; Hansen,
Hurwitz, and Pritzker, 1964).
The model is a univariate
model worked out in terms of a characteristic which takes
only 0,1 values.
The CBM assumes that the general conditions of the
survey are defined.
These include such things as interviewers,
supervisors, publicity, questionnaire design,
processing, etc.
The model is developed under the assumption
that the survey is repeatable under the same general
conditions.
A particular survey is considered as one trial,
t, of the possible repetitions of the survey.
Let

$x_{jtG} = 1$ if the j-th unit is said to have the characteristic at the t-th trial, and $x_{jtG} = 0$ otherwise;

$U_j = 1$ if the j-th unit has the characteristic, and $U_j = 0$ otherwise.

$U_j$ is the true value for the j-th unit, and $E_t(x_{jtG}) = p_{jG}$.
The population parameter to be estimated is

$$\bar{U} = \frac{1}{N}\sum_{j=1}^{N} U_j .$$

From a particular trial of sample size n an estimate for $\bar{U}$
is

$$p_{tG} = \frac{1}{n}\sum_{j=1}^{n} x_{jtG} .$$

The expected value over trials of $p_{tG}$ is $P_G = E(p_{tG})$.
The bias in $p_{tG}$ is $B_G = P_G - \bar{U}$.
The mean square error of a particular survey is

$$MSE_G = E(p_{tG} - \bar{U})^2 = \sigma^2_{p_{tG}} + B_G^2 ,$$

where $\sigma^2_{p_{tG}} = E(p_{tG} - P_G)^2$ is the total variance of
the survey. A goal in any survey is to minimize the mean
square error for the given survey budget.
The total variance of the survey is then broken into
several components:

$$\sigma^2_{p_{tG}} = E[(p_{tG} - p)^2] + E[(p - P_G)^2] + 2E[(p_{tG} - p)(p - P_G)] ,$$

where $p = \frac{1}{n}\sum_{j=1}^{n} p_{jG}$. These three terms are respectively
the response variance, the sampling variance, and the interaction variance.
For a particular survey we may drop the
subscript G.
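To make the decomposition concrete, the following Monte Carlo sketch simulates a hypothetical population and repeats the survey many times, each replicate drawing a new simple random sample and a new trial of 0/1 responses. Responses are generated independently across units, so there is no correlated response component, and the interaction term should be close to zero. The three components add to the total variance exactly because the identity holds replicate by replicate. The function name and all numbers are illustrative assumptions, not part of the CBM papers.

```python
import random


def simulate_cbm(true_values, response_probs, n, trials, seed=0):
    """Monte Carlo sketch of the CBM decomposition for a 0/1 characteristic.

    true_values:    U_j (0 or 1) for every population unit.
    response_probs: p_j = E_t(x_jt), the probability that unit j is reported
                    as having the characteristic at a trial.
    Each replicate draws a simple random sample of size n and one trial of
    independent 0/1 responses for the sampled units.
    """
    rng = random.Random(seed)
    N = len(true_values)
    U_bar = sum(true_values) / N
    pt_list, p_list = [], []
    for _ in range(trials):
        sample = rng.sample(range(N), n)
        # p_t: proportion reported as having the characteristic at this trial.
        pt_list.append(sum(rng.random() < response_probs[j] for j in sample) / n)
        # p: mean of the expected responses p_j over the same sample.
        p_list.append(sum(response_probs[j] for j in sample) / n)
    P = sum(pt_list) / trials  # Monte Carlo estimate of P_G
    pairs = list(zip(pt_list, p_list))
    return {
        "bias": P - U_bar,
        "mse": sum((pt - U_bar) ** 2 for pt in pt_list) / trials,
        "total": sum((pt - P) ** 2 for pt in pt_list) / trials,
        "response": sum((pt - p) ** 2 for pt, p in pairs) / trials,
        "sampling": sum((p - P) ** 2 for _, p in pairs) / trials,
        "interaction": 2 * sum((pt - p) * (p - P) for pt, p in pairs) / trials,
    }


# Hypothetical population: 40 of 100 units have the characteristic; they are
# reported correctly with probability 0.9, the others falsely with probability 0.05.
U = [1] * 40 + [0] * 60
probs = [0.9 if u else 0.05 for u in U]
result = simulate_cbm(U, probs, n=20, trials=20000)
```

In this population $P_G = 0.39$ while $\bar{U} = 0.40$, so the simulated bias should be close to -0.01, and the mean square error equals the total variance plus the squared bias.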
The response variance can be divided into the
simple response variance and a correlated component.
This
is done in terms of a quantity called the response deviation,
$d_{jt} = x_{jt} - p_j$.
In the case of simple random sampling the response variance
is given by

$$E(p_t - p)^2 = E\left(\frac{1}{n}\sum_{j=1}^{n} d_{jt}\right)^2 = \sigma^2_{\bar{d}_t},$$

and $\sigma^2_{\bar{d}_t}$ is shown to be equal to

$$\sigma^2_{\bar{d}_t} = \frac{1}{n}\,\sigma_d^2\,[1 + \rho(n-1)],$$

where $\rho$ is the intraclass correlation coefficient among the
response deviations in a survey, i.e., $\rho = E(d_{jt}d_{j't})/\sigma_d^2$ for $j \neq j'$,
and $\sigma_d^2 = E(d_{jt}^2)$
is the simple response variance.
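The factor $1 + \rho(n-1)$ shows how quickly a small positive correlation among response deviations inflates the response variance of the mean. The sketch below evaluates the formula and checks it by simulation, generating equicorrelated deviations as a shared component plus an independent one. The normal distributions, function names, and parameter values are assumptions chosen for illustration; in the CBM the deviations come from 0/1 responses.

```python
import random


def response_variance_of_mean(sigma2_d, rho, n):
    """Formula from the text: (sigma_d^2 / n) * [1 + rho * (n - 1)]."""
    return sigma2_d / n * (1 + rho * (n - 1))


def simulated_variance_of_mean(sigma2_d, rho, n, trials=10000, seed=1):
    """Monte Carlo check using equicorrelated deviations d_j = a + e_j.

    The shared component a has variance rho * sigma2_d and each e_j has
    variance (1 - rho) * sigma2_d, so Var(d_j) = sigma2_d and every pair of
    deviations in the same trial has correlation rho.
    """
    rng = random.Random(seed)
    a_sd = (rho * sigma2_d) ** 0.5
    e_sd = ((1 - rho) * sigma2_d) ** 0.5
    means = []
    for _ in range(trials):
        a = rng.gauss(0.0, a_sd)
        means.append(sum(a + rng.gauss(0.0, e_sd) for _ in range(n)) / n)
    m = sum(means) / trials
    return sum((x - m) ** 2 for x in means) / (trials - 1)


# Hypothetical values: sigma_d^2 = 1, rho = 0.05, n = 50.
formula = response_variance_of_mean(1.0, 0.05, 50)
simulated = simulated_variance_of_mean(1.0, 0.05, 50)
inflation = formula / response_variance_of_mean(1.0, 0.0, 50)  # 1 + rho*(n - 1)
```

With $\rho = 0.05$ and $n = 50$ the response variance of the mean is 3.45 times what it would be with uncorrelated deviations.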
Hansen et al. do not
demonstrate the nature of the interaction variance term.
However, they state that this term is zero for a complete
census or for a sample survey in which repetitions of the
survey are defined only for a fixed sample of units.
In a
reformulation of this model by Koch (1974) the nature of
this component is demonstrated and these statements are
consistent with the results obtained.
In addition, Hansen et al. (1951) show that if the
response deviations are uncorrelated, the usual estimate (in
the case of simple random sampling) of the sampling variance contains both the sampling variance and simple response
variance as defined in the model.
In the 1964 paper the response variance is further
analyzed.
They derive an "index of inconsistency" which is
seen to be a measure of the unreliability or inconsistency
of classification.
This is defined under a survey situation
in which the correlated response deviations and the
interaction variance are zero.
The index of inconsistency, I, is simply:

$$I = \frac{\text{simple response variance}}{\text{total variance}} .$$

They then derive an estimator of this index when a repetition $t'$ of
the survey $t$ is made under the same general conditions.
This is done in terms of the following table:

                      Repetition
                   = 1      = 0      Total
Survey   = 1        a        b       a + b
         = 0        c        d       c + d
Total             a + c    b + d       n

The "gross difference rate" is defined to be

$$g = \frac{1}{n}\sum_{j=1}^{n} (x_{jtG} - x_{jt'G})^2 .$$

They show that $g = (b + c)/n$ in terms of the preceding table and
that $g/2$ is an unbiased estimator of the simple response
variance.
It is interesting to note that the situation
they have defined is analogous to the replicated self-enumeration survey of Koch (1973), which is discussed below,
except that theirs is restricted to the univariate case and to a
0,1 characteristic.
The two papers have exactly the same
estimate for the simple response variance; however, the Koch
paper arrives at it in a much simpler manner.
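The gross difference rate can be computed either from the 2 × 2 table or directly from its definition on unit-level 0/1 data, and the two must agree. The text does not give the denominator used to estimate the total variance in the index of inconsistency. The sketch assumes the form $p_1 q_2 + p_2 q_1$, which is commonly used in Census Bureau reinterview work. Here $p_1$ and $p_2$ are the proportions classified 1 in the survey and in the repetition, and $q = 1 - p$. Function names and counts are hypothetical.

```python
def gross_difference_rate(a, b, c, d):
    """g = (b + c) / n, read off the survey-by-repetition table."""
    return (b + c) / (a + b + c + d)


def gross_difference_from_units(survey, repetition):
    """g = (1/n) * sum_j (x_jt - x_jt')^2 from unit-level 0/1 data."""
    return sum((x - y) ** 2 for x, y in zip(survey, repetition)) / len(survey)


def index_of_inconsistency(a, b, c, d):
    """Assumed estimator I-hat = g / (p1*q2 + p2*q1)."""
    n = a + b + c + d
    p1 = (a + b) / n  # proportion classified 1 in the survey
    p2 = (a + c) / n  # proportion classified 1 in the repetition
    g = gross_difference_rate(a, b, c, d)
    return g / (p1 * (1 - p2) + p2 * (1 - p1))


# Hypothetical table: a = 80, b = 10, c = 14, d = 96 (n = 200).
a, b, c, d = 80, 10, 14, 96
# Unit-level data laid out to reproduce the table row by row.
survey = [1] * (a + b) + [0] * (c + d)
repetition = [1] * a + [0] * b + [1] * c + [0] * d
g = gross_difference_rate(a, b, c, d)
srv_estimate = g / 2  # unbiased for the simple response variance
```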
In addition to the numerical results that are included
for specific sample designs in these two papers, Bailar
(1968) used the model to examine the effect of different
time lags and different reinterview procedures.
Fellegi (1964) extended the model of Hansen, Hurwitz
and Bershad to cover a repetition of the survey in which he
jointly applied the techniques of interpenetrating samples
and re-enumeration.
Estimators for the parameters of the
model are presented as well as results based on a study
conducted during the 1961 Canadian Census.
Koch's Reformulation of Census Bureau Model.
The discussion of Koch's model is based on a series of 1971 technical reports (Koch, 1971a; 1971b; 1971c) and a summarizing
paper written in 1972 (Koch, 1972).
The model for the case
of simple random sampling without replacement may be
summarized as follows:
For a population containing N elements define for each
individual in the population a p-component vector random
variable $\mathbf{y}_{itG}$ which corresponds to the observations that
would be made on the p characteristics, for the i-th individual, at the t-th trial of a sequence of repeated trials,
under the general survey conditions, G.
The nature of any
particular random sample of this population is formulated in
terms of indicator random variables $U_i$, where

$U_i = 1$ if population element i is in the sample, and $U_i = 0$ otherwise.

The overall population mean vector is

$$\bar{\mathbf{Y}} = \frac{1}{N}\sum_{i=1}^{N} E_t\{\mathbf{y}_{itG}\} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{Y}_i ,$$
where $\mathbf{Y}_i$ is the expected response for the i-th individual
over a series of repeated trials under the same survey conditions, G.
$\mathbf{Y}_{i,0} = E_t\{\mathbf{y}_{it}\}$ is the expected value over
trials under "ideal conditions of measurement," i.e., the
true values of the p characteristics for the i-th individual.
For simplicity the model assumes that $\mathbf{Y}_i = \mathbf{Y}_{i,0}$ and
that there is no bias in the measurement process in the
sense of differences between $\mathbf{Y}_i$ and $\mathbf{Y}_{i,0}$.
Thus the model does not consider
a bias term; however, it is very simple
to extend the model to include this term.
The statistic

$$\bar{\mathbf{y}}_{tG} = \frac{1}{n}\sum_{i=1}^{N} U_i\,\mathbf{y}_{itG}$$

is an unbiased estimator of the overall population mean
vector $\bar{\mathbf{Y}}$.
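The indicator-function form of the estimator makes its unbiasedness easy to check. Since $E_t\{\mathbf{y}_{itG}\} = \mathbf{Y}_i$, averaging over trials replaces each response by $\mathbf{Y}_i$, so it remains to show that $\frac{1}{n}\sum U_i \mathbf{Y}_i$ averages to $\bar{\mathbf{Y}}$ over all simple random samples. For a small hypothetical population, the sketch below enumerates every sample of size n, builds the indicators $U_i$ explicitly, and uses exact rational arithmetic. Names and data are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def koch_mean(U, Y):
    """(1/n) * sum_i U_i * Y_i for p-component vectors, with n = sum_i U_i."""
    n = sum(U)
    p = len(Y[0])
    return [Fraction(sum(u * y[k] for u, y in zip(U, Y)), n) for k in range(p)]


def average_over_all_samples(Y, n):
    """Average koch_mean over every simple random sample of size n."""
    N = len(Y)
    totals = [Fraction(0)] * len(Y[0])
    count = 0
    for combo in combinations(range(N), n):
        chosen = set(combo)
        U = [1 if i in chosen else 0 for i in range(N)]  # indicator variables U_i
        totals = [t + e for t, e in zip(totals, koch_mean(U, Y))]
        count += 1
    return [t / count for t in totals]


# Hypothetical expected-response vectors Y_i (p = 2) for N = 5 individuals.
Y = [(3, 1), (5, 0), (8, 1), (2, 1), (7, 0)]
population_mean = [Fraction(sum(y[k] for y in Y), len(Y)) for k in range(2)]
print(average_over_all_samples(Y, 3) == population_mean)  # True
```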
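The p × p variance-covariance matrix for $\bar{\mathbf{y}}_{tG}$ is then
broken down into three parts:

$$\begin{aligned}
\mathbf{V} &= E_t\{(\bar{\mathbf{y}}_{tG} - \bar{\mathbf{Y}})(\bar{\mathbf{y}}_{tG} - \bar{\mathbf{Y}})'\} \\
&= E_t\{(\bar{\mathbf{y}}_{tG} - \bar{\mathbf{y}})(\bar{\mathbf{y}}_{tG} - \bar{\mathbf{y}})'\} + E_t\{(\bar{\mathbf{y}} - \bar{\mathbf{Y}})(\bar{\mathbf{y}} - \bar{\mathbf{Y}})'\} \\
&\quad + E_t\{(\bar{\mathbf{y}}_{tG} - \bar{\mathbf{y}})(\bar{\mathbf{y}} - \bar{\mathbf{Y}})' + (\bar{\mathbf{y}} - \bar{\mathbf{Y}})(\bar{\mathbf{y}}_{tG} - \bar{\mathbf{y}})'\},
\end{aligned}$$

where $\bar{\mathbf{y}} = \frac{1}{n}\sum_{i=1}^{N} U_i \mathbf{Y}_i$.
These three parts are called the
response variance, the sampling variance, and the interaction variance, respectively.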
The model then considers
each of these three components separately.
The response
variance RV is broken down into two components: SRV, the
simple response variance, which is due to the dispersion of
the $\mathbf{y}_{itG}$ about $\mathbf{Y}_i$; and CRV, the correlated response variance,
which is due to the covariance of the response deviations
for the i-th and i'-th elements of the population.
If the
sampling and response errors interact in the sense that the
expected response for the i-th population element is different for those samples which contain both the i-th and the
i'-th element than for those which contain the i-th and not the
i'-th element, i.e.,

$$\mathbf{Y}_{ii'} = E_t\{\mathbf{y}_{itG} \mid U_i = U_{i'} = 1\} \neq \mathbf{Y}_i ,$$

then the covariance component may be broken down further
into the simple covariance component and an interaction component which reflects the difference between the conditional
and unconditional means.
This occurs in the following
manner:

$$\begin{aligned}
(\mathrm{CRV})_{ii'} &= E_t\{(\mathbf{y}_{itG} - \mathbf{Y}_i)(\mathbf{y}_{i'tG} - \mathbf{Y}_{i'})' \mid U_i = U_{i'} = 1\} \\
&= E_t\{(\mathbf{y}_{itG} - \mathbf{Y}_{ii'})(\mathbf{y}_{i'tG} - \mathbf{Y}_{i'i})' \mid U_i = U_{i'} = 1\} + (\mathbf{Y}_{ii'} - \mathbf{Y}_i)(\mathbf{Y}_{i'i} - \mathbf{Y}_{i'})' .
\end{aligned}$$
The sampling variance component is shown to correspond to the variability that would result if the sample were taken from the population of $\mathbf Y_i$'s without response error rather than from the population of $\{\mathbf Y_{itG}\}$.
The interaction component IV reflects the variability that is due to the relationship between sampling and response errors. This component is due to the differences between $\{\mathbf Y_{ii'}\}$ and $\{\mathbf Y_i\}$, and it equals zero when $\mathbf Y_{ii'} = \mathbf Y_i$ for all $i \neq i'$.
This model is quite similar to the CBM. The Koch model has three principal advantages: it does not restrict itself to 0,1 variables, it extends to the multivariate case, and it uses indicator functions. Indicator functions make the mathematical manipulations much easier to follow, particularly in sampling designs that are more complex than simple random sampling, e.g., single stage cluster sampling (Koch, 1971b, 1972). Moreover, the vector approach facilitates the study of the effect of response errors on sub-class means, differences between sub-class means, and post-stratified means (Koch, 1971c, 1972).
In addition to this very general model, Koch presents some survey designs that allow one to estimate the components of the model (Koch, 1971d, 1973). In the 1973 paper he considers three survey designs: "self-enumeration in a repeated trial survey, random assignment of all interviewer combinations in a repeated trial survey, and random assignment of interviewer combinations in a split-cluster repeated trial survey." To understand how this is done, consider the simplest design presented, "self-enumeration in a repeated trial survey."
From a population of N individuals a simple random sample of size n is drawn. Measurements on each selected individual are obtained on each of two trials. The measurement process is self-enumeration: each individual in the sample responds individually and independently at each trial. Let the random variable $Y_{i\alpha t}$ denote the measurement for the i-th individual on the α-th trial of the survey, α = 1, 2. The index t indexes a sequence of replications of the overall measurement process.
The model for this survey design is

$$Y_{i\alpha t} = \bar Y + H_i + R_{i\alpha t}$$

where

$$\bar Y = \frac{1}{N}\sum_{i=1}^{N} Y_i ,$$

$H_i = (Y_i - \bar Y)$ is a fixed main effect due to the i-th individual, and $R_{i\alpha t} = (Y_{i\alpha t} - Y_i)$ is a random effect due to the combination of the i-th individual and the α-th trial.
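The following assumptions are made for the model:
1. There is no interaction between sampling and response errors, i.e., $Y_{ii'} = Y_i$ for all $i \neq i'$.
2. $Y_i = Y_{i,0}$.
3. The measurement process is equally variable from trial to trial, i.e., $E_t\{R_{i1t}^2\} = E_t\{R_{i2t}^2\}$.
4. The measurement process associated with each individual in the sample is statistically independent of that associated with any other individual, i.e., $R_{i\alpha t}$ and $R_{i'\alpha' t}$ are independent for $i \neq i'$.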
In terms of the general model explicated earlier, these assumptions imply that IV = 0, because the interaction component vanishes under assumption 1. They also imply that CRV = 0, because the correlated response variance component vanishes under assumption 4.
Thus the variance of the overall sample mean

$$\bar y_t = \frac{1}{2n}\sum_{i=1}^{N}\sum_{\alpha=1}^{2} U_i Y_{i\alpha t}$$

has two components in terms of the general model: SRV, the simple response variance, and SV, the sampling variance.
Koch shows how these variance components of the general
model are related to the variance components of the model
for the specific survey design in question.
In addition,
suggested estimators for these components in terms of
quadratic sample statistics are given along with the
formulas for these sample statistics.
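The following variance components are defined for the specific model:

$$V_Y = \frac{1}{N}\sum_{i=1}^{N} E_t\{R_{i\alpha t}^2\} \qquad\text{and}\qquad V_H = \frac{1}{N-1}\sum_{i=1}^{N} H_i^2 .$$

Then, the variance of the overall sample mean is

$$V(\bar y_t) = \left(\frac{1}{2n}\right)V_Y + \left(\frac{1}{n}\right)\left(1-\frac{n}{N}\right)V_H .$$

Estimators for $V_Y$ and $V_H$ are given.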
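The following Python sketch makes the two-component variance concrete. It simulates the self-enumeration design and compares the Monte Carlo variance of $\bar y_t$ with $V_Y/(2n) + (1/n)(1 - n/N)V_H$. It illustrates the formula under stated assumptions and is not Koch's procedure. The population values, the per-individual standard deviations `sigma`, and the normal response deviations are all hypothetical. The two trials are simulated independently, as in the design described above.

```python
import random
import statistics


def simulate_self_enumeration(N=60, n=12, reps=20000, seed=1):
    """Monte Carlo variance of ybar_t versus V_Y/(2n) + (1/n)(1 - n/N) V_H."""
    rng = random.Random(seed)
    # Hypothetical population: expected responses Y_i and trial-to-trial SDs.
    Y = [rng.uniform(10, 30) for _ in range(N)]
    sigma = [rng.uniform(1, 4) for _ in range(N)]
    Ybar = sum(Y) / N
    V_H = sum((y - Ybar) ** 2 for y in Y) / (N - 1)  # (1/(N-1)) sum of H_i^2
    V_Y = sum(s ** 2 for s in sigma) / N             # (1/N) sum of E_t{R_iat^2}
    formula = V_Y / (2 * n) + (1 / n) * (1 - n / N) * V_H

    means = []
    for _ in range(reps):
        sample = rng.sample(range(N), n)             # simple random sample
        total = 0.0
        for i in sample:
            for _trial in (1, 2):                    # alpha = 1, 2
                total += Y[i] + rng.gauss(0.0, sigma[i])  # Ybar + H_i + R_iat
        means.append(total / (2 * n))                # ybar_t
    return statistics.variance(means), formula


simulated, formula = simulate_self_enumeration()
print(f"simulated V(ybar_t) = {simulated:.3f}, formula = {formula:.3f}")
```

With 20,000 replications, the Monte Carlo error in the simulated variance is on the order of one percent. Agreement within a few percent is therefore what one should expect to see.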
The more complicated survey designs presented in the paper are treated in a similar manner. Variance components for the specific model are defined, and estimators and computational formulas for these components are given in terms of the sample statistics. Finally, the relationship of these variance components to the components of the general model is demonstrated.
Koch refers to the model as a response error model. The general model might better be referred to as a measurement error model. In this context the first component of the variance-covariance matrix of $\bar{\mathbf y}_t$ could be referred to as the measurement variance. Under specific survey design models, this component may contain terms due to respondents, interviewers, coders, etc.
In a paper which discusses many aspects of measurement
errors Madow (1965) presents a mean square error model.
The
approach is somewhat different in that the MSE is broken
down in terms of regression coefficients.
Madow then gives
a double sampling model which will allow one to eliminate or
reduce the bias.
The model is worked out in the case of
simple random sampling.
A summary of the model follows.
Let

$\mu_i$ = true value of a characteristic for the i-th member of a population, i = 1, 2, ..., N;
$x_i$ = value obtained for the i-th member, i = 1, 2, ..., N;
$E(x_i) = \alpha_i$.

From a sample of size n one wishes to estimate

$$\mu = \frac{1}{N}\sum_{i=1}^{N}\mu_i .$$
Let $x_i'$ denote the responses for the elements in the sample. Then

$$\bar x' = \frac{1}{n}\sum_{i=1}^{n} x_i', \qquad \bar\alpha' = \frac{1}{n}\sum_{i=1}^{n}\alpha_i', \qquad \bar\mu' = \frac{1}{n}\sum_{i=1}^{n}\mu_i' .$$
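A simple random subsample of $n_1 < n$ units is obtained. Let $x_i''$ denote the responses for the elements in the subsample. Then

$$\bar x'' = \frac{1}{n_1}\sum_{i=1}^{n_1} x_i'', \qquad \bar\alpha'' = \frac{1}{n_1}\sum_{i=1}^{n_1}\alpha_i'', \qquad \bar\mu'' = \frac{1}{n_1}\sum_{i=1}^{n_1}\mu_i'' .$$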
In the first sample one measures only the $x_i'$. In the subsample both $x_i''$ and $\mu_i''$ are measured. Thus,

$$Z' = \bar x' - (\bar x'' - \bar\mu'')$$

is an unbiased estimate of $\mu$.
Madow states that the difference between the variance of $Z'$ and the mean square error of $\bar x'$ is a measure of the gain that occurs by using the double sampling scheme. Madow's general idea provides a basis for the model that is developed in this research.
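A small simulation can illustrate Madow's point that $Z'$ removes the bias that $\bar x'$ carries. The sketch below is hypothetical throughout. The true values, the constant bias of 5 units, and the normal response noise are invented. The subsample responses $x_i''$ are taken to be the first-sample responses of the subsampled units, which is an assumption made only for this illustration.

```python
import random
import statistics


def madow_double_sampling(N=200, n=40, n1=10, bias=5.0, reps=10000, seed=2):
    """Average x-bar' and Z' = x-bar' - (x-bar'' - mu-bar'') over many samples."""
    rng = random.Random(seed)
    mu = [rng.uniform(50, 150) for _ in range(N)]   # hypothetical true values mu_i
    mu_pop = sum(mu) / N                            # target: population mean mu
    xbar_draws, z_draws = [], []
    for _ in range(reps):
        sample = rng.sample(range(N), n)            # first sample, size n
        # Hypothetical faulty responses: x_i' = mu_i + bias + noise
        x = {i: mu[i] + bias + rng.gauss(0.0, 3.0) for i in sample}
        sub = rng.sample(sample, n1)                # simple random subsample
        xbar1 = sum(x.values()) / n                 # x-bar'
        xbar2 = sum(x[i] for i in sub) / n1         # x-bar'' (assumed: same responses)
        mubar2 = sum(mu[i] for i in sub) / n1       # mu-bar'' (true values)
        xbar_draws.append(xbar1)
        z_draws.append(xbar1 - (xbar2 - mubar2))    # Z'
    return mu_pop, statistics.fmean(xbar_draws), statistics.fmean(z_draws)


mu_pop, mean_xbar, mean_z = madow_double_sampling()
print(f"mu = {mu_pop:.2f}, mean of x-bar' = {mean_xbar:.2f}, mean of Z' = {mean_z:.2f}")
```

The average of $\bar x'$ should sit about 5 units above $\mu$, while the average of $Z'$ should sit close to $\mu$.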
Summary of the Review of Literature
In light of the models reviewed in this section, what
can be said about the progress that has been made in handling measurement errors?
Progress has been made in two areas: estimation of errors, and survey designs to eliminate or reduce errors, particularly bias.
The general model
proposed by Koch (Koch-CBM) seems to be the best model in
that he demonstrates how specific components of error in
certain survey designs are related to the general model, and
how these components may be estimated.
The Koch-CBM model
does not consider the bias term, but this is very easily
added.
Madow's idea of the use of double sampling schemes
to eliminate bias is particularly promising.
Often the
survey researcher faces a situation in which he could obtain
values which do not contain measurement errors for the units
in his sample, but it may be very expensive to obtain these
values.
Usually for considerably less money he can obtain
values which contain measurement errors.
One particular question that needs to be answered is how best to allocate the resources available for a specific survey. The goal is to estimate both sampling and measurement errors, and hence to eliminate, minimize, or otherwise reduce them.
The Research
This research presents a general double sampling scheme model which allows one to eliminate bias and estimate the total variable error in the survey, namely, the error due to both sampling and measurement. The model is worked out in a manner similar to the Koch-CBM model. It is offered as an alternative to that model, to be used when the bias is a significant part of the mean square error of the survey estimates.
The general model developed in this research is compared to the Koch-CBM model.
In addition, specific survey
designs are presented which allow one to estimate the components of the model.
The model is a dual sampling scheme
model which involves measurements on an original sample and
repeat measurements on a subsample of the original sample.
Procedures for determining the optimum size of the two
samples are developed.
This is done in terms of a cost
function which reflects the cost of obtaining measurements
without error and measurements which contain errors.
The
procedures which are developed will allow one to minimize
the variance of his estimates for fixed cost.
Three sampling schemes are compared: the dual sampling scheme, the Koch-CBM model, and a sampling scheme in which only measurements that are free of error are obtained. The preferred survey procedure is indicated for various values of the components of the three models and for various cost functions.
CHAPTER II
PRESENTATION OF THE DOUBLE SAMPLING SCHEME MODEL, DSS
COMPARISON TO THE KOCH-CBM MODEL
SIMPLE RANDOM SAMPLING CASE
2.1. Introduction
The Koch-CBM model demonstrates the manner in which
measurement errors affect the variability of sample
estimates of population characteristics.
It is assumed in
that model that the measurement process is unbiased in that
the expected value over repeated trials of the measurements
on a set of characteristics for a particular individual is
equal to the ideal or true values for that individual.
In many cases this assumption may not be valid. If the expected
value over repeated trials is not equal to the true value
there may be a net bias in the sample estimators.
The
square of the net bias may be a considerable proportion of
the mean square error of the survey estimates.
It is
possible in some situations that a survey researcher will
have available a measurement process which yields measurements which are essentially error free and a measurement
process producing measurements which contain measurement
errors.
Using the error free measurement process is usually
much more costly than using the faulty measurement process.
The double sampling scheme model which is presented in this
research is offered as an alternative to the Koch-CBM model
to be used when the faulty measurement process gives
estimates with a net bias the square of which is a large
part of the mean square error, and measurements without
errors can be obtained, although this may be at considerable
expense.
The double sampling scheme employed is the one suggested by Madow (1965).
An original sample is drawn and
measurements which contain errors are taken on the individuals in the sample.
A subsample of the original sample is
drawn, and the measurement process employed in the original
sample is repeated for each individual in the subsample.
This will allow one to estimate the contribution that measurement errors make to the variance of the sample estimators.
In addition, measurements are obtained for each individual
in this subsample by a measurement process which yields
measurements which do not contain errors. These will be called "true values" in this paper. As indicated below,
this sampling scheme allows one to eliminate from the sample
estimates the bias which is due to the measurement process.
The difference between the variance of these estimates and
the mean square error of those in the Koch-CBM model is a
measure of the gain to be had using the DSS model.
Following Koch (1973), the model is formulated in
terms of indicator functions which reflect the nature of the
samples.
The use of indicator functions facilitates the
handling of complex survey designs.
The model is formulated
for a multivariate situation.
To make the formulation of the model easier to follow, this section presents the case of simple random sampling and estimation of the population mean. The general case of the model is presented in Chapter VI, Appendix A.
Following the mathematical formulation of the DSS
model, it is compared to the Koch-CBM model.
2.2. The DSS Model¹

Assume that there exists a fixed population of N individuals. The individual members of this population are indexed by the subscripts i = 1, 2, ..., N. For each individual i, let $\mathbf X_i$ be a p component vector which contains the true values for the p characteristics to be measured in the survey. Thus $\mathbf X_i$ represents the measurements that would be obtained if the measurement process which produces measurements devoid of measurement errors were carried out. In addition, corresponding to each individual i, define a p component vector random variable $\mathbf Y_{i\alpha t}$ which contains the measurements that would be obtained if the faulty measurement process were employed. Here, t indexes a conceptual sequence of repeated trials of the survey process, all carried out under the same general survey conditions.²

In the double sampling scheme the survey will be viewed as consisting of two phases. In the first phase, measurements are obtained for an original sample. In the second phase, repeated measurements are made for a subsample of the original sample. The index α indicates at which phase of the survey the measurements were obtained, α = 1, 2. It is assumed that measurements from trial to trial are uncorrelated, i.e., $\mathbf Y_{i\alpha t}$ and $\mathbf Y_{i\alpha t'}$ are uncorrelated for $t \neq t'$ and any specification of i = 1, 2, ..., N; α = 1, 2.

¹Notation similar to that employed by Koch is used whenever possible. Some changes are made, however, in order to increase clarity.
At the first phase of the survey a sample of size n is drawn and the measurements $\mathbf Y_{i1t}$ are obtained for the sample members. A subsample of size $n_1 \le n$ is drawn for the second phase of the survey, and measurements $\mathbf Y_{i2t}$ and $\mathbf X_i$ are obtained for each element i of the subsample.
The nature of the sample and the subsample is characterized by indicator random variables $U_i$ and $V_i$ where

$$U_i = \begin{cases} 1 & \text{if the ith element is in the original sample}\\ 0 & \text{otherwise} \end{cases} \tag{2.2.1}$$

$$V_i = \begin{cases} 1 & \text{if the ith element is in the subsample}\\ 0 & \text{otherwise.} \end{cases} \tag{2.2.2}$$
The probability distribution of the $U_i$ and $V_i$ reflects the characteristics of the survey design and the sampling errors. For the case of simple random sampling at each phase of the survey, the following relationships hold for the $U_i$ and $V_i$:

$$
\begin{aligned}
\Pr\{U_i=1\} &= \frac{n}{N}, & \Pr\{U_i=0\} &= \frac{N-n}{N},\\
\Pr\{V_i=1 \mid U_i=1\} &= \frac{n_1}{n}, & \Pr\{V_i=0 \mid U_i=1\} &= \frac{n-n_1}{n},\\
\Pr\{V_i=1 \mid U_i=0\} &= 0, & \Pr\{V_i=0 \mid U_i=0\} &= 1.
\end{aligned}
\tag{2.2.3}
$$

²See Hansen, Hurwitz and Bershad (1961) for a discussion of "general survey conditions."
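The above relationships imply the following:

$$
\begin{aligned}
\Pr\{U_i = V_i = 1\} &= \frac{n_1}{N},\\
\Pr\{U_i = U_j = 1\} &= \frac{n}{N}\left(\frac{n-1}{N-1}\right), & i &\neq j,\\
\Pr\{U_i = V_j = 1\} &= \frac{n_1}{N}\left(\frac{n-1}{N-1}\right), & i &\neq j,\\
\Pr\{V_i = V_j = 1\} &= \frac{n_1}{N}\left(\frac{n_1-1}{N-1}\right), & i &\neq j,\\
\Pr\{V_i = 1\} &= \frac{n_1}{N}.
\end{aligned}
\tag{2.2.4}
$$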
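The joint probabilities in (2.2.4) can be checked empirically. This sketch assumes a tiny hypothetical population (N = 8, n = 4, $n_1$ = 2) and uses Python's `random.sample` for both phases. It compares Monte Carlo frequencies for two fixed elements $i \neq j$ with the exact values, which are computed as fractions.

```python
import random
from fractions import Fraction


def exact_probabilities(N, n, n1):
    """Joint inclusion probabilities from (2.2.4) for distinct i, j."""
    return {
        "Ui=Vi=1": Fraction(n1, N),
        "Ui=Uj=1": Fraction(n, N) * Fraction(n - 1, N - 1),
        "Ui=Vj=1": Fraction(n1, N) * Fraction(n - 1, N - 1),
        "Vi=Vj=1": Fraction(n1, N) * Fraction(n1 - 1, N - 1),
        "Vi=1": Fraction(n1, N),
    }


def simulated_probabilities(N, n, n1, reps=100000, seed=3):
    rng = random.Random(seed)
    i, j = 0, 1                                   # two fixed, distinct elements
    counts = dict.fromkeys(exact_probabilities(N, n, n1), 0)
    for _ in range(reps):
        first = rng.sample(range(N), n)           # original sample (U_i = 1)
        sub = set(rng.sample(first, n1))          # subsample of it (V_i = 1)
        U = set(first)
        counts["Ui=Vi=1"] += i in U and i in sub
        counts["Ui=Uj=1"] += i in U and j in U
        counts["Ui=Vj=1"] += i in U and j in sub
        counts["Vi=Vj=1"] += i in sub and j in sub
        counts["Vi=1"] += i in sub
    return {k: c / reps for k, c in counts.items()}


N, n, n1 = 8, 4, 2
exact = exact_probabilities(N, n, n1)
simulated = simulated_probabilities(N, n, n1)
for key in exact:
    print(f"{key}: exact {float(exact[key]):.4f}, simulated {simulated[key]:.4f}")
```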
Two additional assumptions are made for the model. First, it will be assumed that the expected value over trials of $\mathbf Y_{i\alpha t}$ does not depend on α (α = 1, 2), i.e.,

$$E_t\{\mathbf Y_{i\alpha t}\} = \mathbf Y_i . \tag{2.2.5}$$
Second, we will assume that there is no interaction between the sampling and measurement errors. That is, the expected response for the ith population element is the same for samples which contain both the ith and the i'th elements as for samples which contain the ith element but not the i'th, i.e.,

$$E_t\{\mathbf Y_{i\alpha t} \mid U_i = 1,\, U_{i'} = 1\} = \mathbf Y_{ii'} = \mathbf Y_i . \tag{2.2.6}$$
When considering the correlation between the $\mathbf Y_{i\alpha t}$, we will consider two cases. In the first case we will assume that $\mathbf Y_{i1t}$ and $\mathbf Y_{i2t}$ are statistically independent. This implies that the measurements obtained for the ith individual at the two phases of the survey are uncorrelated.

In some survey situations, however, one would not wish to make this assumption. For example, suppose the measurements are respondents' answers to a questionnaire. Individuals in the subsample, on whom repeated measurements are taken at the second phase, may remember the answers they gave at the first phase. They may then repeat these answers or otherwise give responses which depend upon their first-phase answers. In addition, there may be a between phase, between individual correlation, i.e., $\mathbf Y_{i1t}$ and $\mathbf Y_{j2t}$ may be correlated. For example, if interviewers conduct the survey, having the same interviewer for both the initial sample observations and the subsample observations is sufficient reason to suspect such a correlation. Thus, we identify two cases concerning the $\mathbf Y_{i\alpha t}$.
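Case 1. $\mathbf Y_{i1t}, \mathbf Y_{i2t}$ are uncorrelated and $\mathbf Y_{i1t}, \mathbf Y_{j2t}$ are uncorrelated, which implies that

$$
\begin{aligned}
&E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{i2t}-\mathbf Y_i)' \mid U_i = V_i = 1\} = \mathbf 0\\
\text{and}\quad &E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{j2t}-\mathbf Y_j)' \mid U_i = V_j = 1\} = \mathbf 0 .
\end{aligned}
\tag{2.2.7}
$$

Case 2. $\mathbf Y_{i1t}$ and $\mathbf Y_{i2t}$ are correlated, i.e.,

$$
\begin{aligned}
&E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{i2t}-\mathbf Y_i)' \mid U_i = V_i = 1\} \neq \mathbf 0\\
\text{and}\quad &E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{j2t}-\mathbf Y_j)' \mid U_i = V_j = 1\} \neq \mathbf 0 .
\end{aligned}
\tag{2.2.8}
$$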
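Let us now consider the statistic $\bar{\mathbf z}_t$ as an estimator of the overall population mean $\bar{\mathbf X} = \frac{1}{N}\sum_{i=1}^{N}\mathbf X_i$. Let

$$\bar{\mathbf z}_t = \bar{\mathbf y}_{1t} - \bar{\mathbf y}_{2t} + \bar{\mathbf x}_{2t} \tag{2.2.9}$$

where

- $\bar{\mathbf y}_{1t} = \frac{1}{n}\sum_{i=1}^{N} U_i\mathbf Y_{i1t}$ is the average of the elements in the original sample,
- $\bar{\mathbf y}_{2t} = \frac{1}{n_1}\sum_{i=1}^{N} V_i\mathbf Y_{i2t}$ is the average of the elements in the subsample, and
- $\bar{\mathbf x}_{2t} = \frac{1}{n_1}\sum_{i=1}^{N} V_i\mathbf X_i$ is the average of the true values for the elements in the subsample.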
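The statistic $\bar{\mathbf z}_t$ is an unbiased estimator for $\bar{\mathbf X}$, i.e.,

$$
\begin{aligned}
E\{\bar{\mathbf z}_t\} &= E\{\bar{\mathbf y}_{1t}\} - E\{\bar{\mathbf y}_{2t}\} + E\{\bar{\mathbf x}_{2t}\}\\
&= \frac{1}{n}\sum_{i=1}^{N} E(U_i)\,E(\mathbf Y_{i1t} \mid U_i=1) - \frac{1}{n_1}\sum_{i=1}^{N} E(V_i)\,E(\mathbf Y_{i2t} \mid V_i=1) + \frac{1}{n_1}\sum_{i=1}^{N} E(V_i)\,\mathbf X_i\\
&= \frac{1}{N}\sum_{i=1}^{N}\mathbf Y_i - \frac{1}{N}\sum_{i=1}^{N}\mathbf Y_i + \frac{1}{N}\sum_{i=1}^{N}\mathbf X_i = \bar{\mathbf X}.
\end{aligned}
\tag{2.2.10}
$$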
Thus the double sampling scheme allows one to eliminate the bias due to the measurement process. To determine whether this model is preferable to the Koch-CBM model, we need to compare the variance of this estimator with the mean square error of the estimator in the Koch-CBM model.
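The expectation in (2.2.10) involves only the $\mathbf Y_i$ and $\mathbf X_i$ once $E_t$ has been taken. It can therefore be verified exactly for a small population by enumerating every equally likely (sample, subsample) pair under two-phase simple random sampling. The sketch below works with one characteristic (p = 1) and hypothetical integer values. It uses exact `Fraction` arithmetic to avoid rounding.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population (p = 1): true values X_i and expected faulty
# responses Y_i = E_t{Y_iat}. Since Y_i != X_i, the faulty process is biased.
X = [10, 12, 15, 9, 20, 14]
Y = [13, 12, 19, 10, 26, 15]


def expected_estimators(X, Y, n, n1):
    """Average E_t{ybar_1t} and E_t{z_t} over all equally likely
    (sample, subsample) pairs under two-phase simple random sampling."""
    N = len(X)
    total_y1 = total_z = Fraction(0)
    pairs = 0
    for s in combinations(range(N), n):            # original sample
        for s1 in combinations(s, n1):             # subsample of that sample
            y1 = Fraction(sum(Y[i] for i in s), n)     # E_t{ybar_1t}
            y2 = Fraction(sum(Y[i] for i in s1), n1)   # E_t{ybar_2t}
            x2 = Fraction(sum(X[i] for i in s1), n1)   # xbar_2t
            total_y1 += y1
            total_z += y1 - y2 + x2                    # E_t{z_t}, eq. (2.2.9)
            pairs += 1
    return total_y1 / pairs, total_z / pairs


E_y1, E_z = expected_estimators(X, Y, n=3, n1=2)
print("E(ybar_1t) =", E_y1, " Ybar =", Fraction(sum(Y), len(Y)))
print("E(z_t)     =", E_z, " Xbar =", Fraction(sum(X), len(X)))
```

The single-phase mean $\bar{\mathbf y}_{1t}$ is centred on $\bar{\mathbf Y}$, while $\bar{\mathbf z}_t$ is centred on $\bar{\mathbf X}$.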
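Considering the variance-covariance matrix of $\bar{\mathbf z}_t$ we have

$$
\begin{aligned}
\mathbf V &= E\{(\bar{\mathbf z}_t-\bar{\mathbf X})(\bar{\mathbf z}_t-\bar{\mathbf X})'\}\\
&= E\{(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_{2t}+\bar{\mathbf x}_{2t}-\bar{\mathbf X})(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_{2t}+\bar{\mathbf x}_{2t}-\bar{\mathbf X})'\}\\
&= E\{(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1-\bar{\mathbf y}_{2t}+\bar{\mathbf y}_2+\bar{\mathbf y}_1-\bar{\mathbf y}_2+\bar{\mathbf x}_{2t}-\bar{\mathbf X})\\
&\qquad\times(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1-\bar{\mathbf y}_{2t}+\bar{\mathbf y}_2+\bar{\mathbf y}_1-\bar{\mathbf y}_2+\bar{\mathbf x}_{2t}-\bar{\mathbf X})'\}
\end{aligned}
\tag{2.2.11}
$$

where $\bar{\mathbf y}_1 = \frac{1}{n}\sum_{i=1}^{N}U_i\mathbf Y_i$ and $\bar{\mathbf y}_2 = \frac{1}{n_1}\sum_{i=1}^{N}V_i\mathbf Y_i$ are the statistics corresponding to $\bar{\mathbf y}_{1t}$ and $\bar{\mathbf y}_{2t}$ in which the $\mathbf Y_{i\alpha t}$ have been replaced by their expected values.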
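The variance-covariance matrix may be partitioned as the sum of four components in the following manner:

$$
\begin{aligned}
\mathbf V ={}& E\{[(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)][(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)]'\}\\
&+ E\{[(\bar{\mathbf y}_1-\bar{\mathbf y}_2)+(\bar{\mathbf x}_{2t}-\bar{\mathbf X})][(\bar{\mathbf y}_1-\bar{\mathbf y}_2)+(\bar{\mathbf x}_{2t}-\bar{\mathbf X})]'\}\\
&+ E\{[(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)][(\bar{\mathbf y}_1-\bar{\mathbf y}_2)+(\bar{\mathbf x}_{2t}-\bar{\mathbf X})]'\}\\
&+ E\{[(\bar{\mathbf y}_1-\bar{\mathbf y}_2)+(\bar{\mathbf x}_{2t}-\bar{\mathbf X})][(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)]'\}.
\end{aligned}
\tag{2.2.12}
$$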
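Under the assumption that $\mathbf Y_{ii'} = \mathbf Y_i$, the last two terms of the above expression are $\mathbf 0$ and we have

$$
\begin{aligned}
\mathbf V ={}& E\{[(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)][(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)]'\}\\
&+ E\{[(\bar{\mathbf y}_1-\bar{\mathbf y}_2)+(\bar{\mathbf x}_{2t}-\bar{\mathbf X})][(\bar{\mathbf y}_1-\bar{\mathbf y}_2)+(\bar{\mathbf x}_{2t}-\bar{\mathbf X})]'\}.
\end{aligned}
\tag{2.2.13}
$$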
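By adding and subtracting $\bar{\mathbf x}_{1t} = \frac{1}{n}\sum_{i=1}^{N}U_i\mathbf X_i$, the average of a sample of size n of the true values, the last term of this expression is partitioned further to give

$$
\begin{aligned}
\mathbf V ={}& E\{[(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)][(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)]'\}\\
&+ E\{[(\bar{\mathbf y}_1-\bar{\mathbf x}_{1t})-(\bar{\mathbf y}_2-\bar{\mathbf x}_{2t})][(\bar{\mathbf y}_1-\bar{\mathbf x}_{1t})-(\bar{\mathbf y}_2-\bar{\mathbf x}_{2t})]'\}\\
&+ E\{(\bar{\mathbf x}_{1t}-\bar{\mathbf X})(\bar{\mathbf x}_{1t}-\bar{\mathbf X})'\}\\
&+ E\{[(\bar{\mathbf y}_1-\bar{\mathbf x}_{1t})-(\bar{\mathbf y}_2-\bar{\mathbf x}_{2t})](\bar{\mathbf x}_{1t}-\bar{\mathbf X})'\}\\
&+ E\{(\bar{\mathbf x}_{1t}-\bar{\mathbf X})[(\bar{\mathbf y}_1-\bar{\mathbf x}_{1t})-(\bar{\mathbf y}_2-\bar{\mathbf x}_{2t})]'\}.
\end{aligned}
\tag{2.2.14}
$$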
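The last two terms of the above expression for $\mathbf V$ are $\mathbf 0$. The reason is that, conditional on the original sample, the expected value of the subsample mean bias $\bar{\mathbf y}_2-\bar{\mathbf x}_{2t}$ equals the original sample mean bias $\bar{\mathbf y}_1-\bar{\mathbf x}_{1t}$, while $\bar{\mathbf x}_{1t}-\bar{\mathbf X}$ depends only on the original sample. Thus $\mathbf V$ may be expressed as the sum of three components,

$$\mathbf V = (\mathrm{MV}) + (\mathrm{BV}) + (\mathrm{TV}) \tag{2.2.15}$$

where

$$(\mathrm{MV}) = \text{Measurement Variance} = E\{[(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)][(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)]'\} \tag{2.2.16}$$

$$(\mathrm{BV}) = \text{Measurement Bias Variance} = E\{[(\bar{\mathbf y}_1-\bar{\mathbf x}_{1t})-(\bar{\mathbf y}_2-\bar{\mathbf x}_{2t})][(\bar{\mathbf y}_1-\bar{\mathbf x}_{1t})-(\bar{\mathbf y}_2-\bar{\mathbf x}_{2t})]'\} \tag{2.2.17}$$

$$(\mathrm{TV}) = \text{Sampling Variance of true values} = E\{(\bar{\mathbf x}_{1t}-\bar{\mathbf X})(\bar{\mathbf x}_{1t}-\bar{\mathbf X})'\} \tag{2.2.18}$$

We will consider each of these terms in greater detail.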
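2.2.1. Measurement Variance³

The measurement variance reflects the variability in the measurement errors, which are not the same from trial to trial or from phase to phase of the survey. The following shows how this occurs.

$$
\begin{aligned}
(\mathrm{MV}) &= E\{[(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)][(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)-(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)]'\}\\
&= E\{(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)'\} + E\{(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)'\}\\
&\quad - E\{(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)'\} - E\{(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)'\}
\end{aligned}
\tag{2.2.1.1}
$$

³This term is called the Response Variance in the Koch-CBM model. Measurement variance is a more appropriate term, however, since the term response variance seems to imply that the variability is due to some quality of the respondents.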
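Looking at each of these terms separately, we have

$$
\begin{aligned}
E\{(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)'\}
&= E\Big\{\Big(\frac{1}{n}\sum_{i=1}^{N}U_i\mathbf Y_{i1t}-\frac{1}{n}\sum_{i=1}^{N}U_i\mathbf Y_i\Big)\Big(\frac{1}{n}\sum_{i=1}^{N}U_i\mathbf Y_{i1t}-\frac{1}{n}\sum_{i=1}^{N}U_i\mathbf Y_i\Big)'\Big\}\\
&= E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}\sum_{j=1}^{N}U_iU_j(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{j1t}-\mathbf Y_j)'\Big\}\\
&= E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{i1t}-\mathbf Y_i)'\Big\} + E\Big\{\frac{1}{n^2}\sum_{i\neq j}U_iU_j(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{j1t}-\mathbf Y_j)'\Big\}\\
&= \frac{1}{n^2}\,\frac{n}{N}\sum_{i=1}^{N}E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{i1t}-\mathbf Y_i)' \mid U_i=1\}\\
&\quad + \frac{1}{n^2}\,\frac{n(n-1)}{N(N-1)}\sum_{i\neq j}E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{j1t}-\mathbf Y_j)' \mid U_i=U_j=1\}.
\end{aligned}
\tag{2.2.1.2}
$$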
Thus we identify two components: one due to the dispersion of the $\mathbf Y_{i\alpha t}$ around the $\mathbf Y_i$, and one due to the covariance of this dispersion between sample units.
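Let the simple measurement variance, (SMV), be

$$(\mathrm{SMV}) = \frac{1}{N}\sum_{i=1}^{N}E\{(\mathbf Y_{i\alpha t}-\mathbf Y_i)(\mathbf Y_{i\alpha t}-\mathbf Y_i)' \mid U_i=1\} \tag{2.2.1.3}$$

and the correlated measurement variance, (CMV), be

$$(\mathrm{CMV}) = \frac{1}{N(N-1)}\sum_{i\neq j}E\{(\mathbf Y_{i\alpha t}-\mathbf Y_i)(\mathbf Y_{j\alpha t}-\mathbf Y_j)' \mid U_i=U_j=1\}. \tag{2.2.1.4}$$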
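Thus we have

$$E\{(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)'\} = \frac{1}{n}\{(\mathrm{SMV})+(n-1)(\mathrm{CMV})\}. \tag{2.2.1.5}$$

Similarly,

$$E\{(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)'\} = \frac{1}{n_1}\{(\mathrm{SMV})+(n_1-1)(\mathrm{CMV})\}. \tag{2.2.1.6}$$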
The two cross product terms in expression (2.2.1.1) reflect the effect of the between phase correlation of the $\mathbf Y_{i1t}$ and the $\mathbf Y_{i2t}$. This occurs in the following manner:
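$$
\begin{aligned}
E\{(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)'\}
&= \frac{1}{nn_1}\Big[\sum_{i=1}^{N}E(U_iV_i)\,E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{i2t}-\mathbf Y_i)' \mid U_i=V_i=1\}\\
&\qquad + \sum_{i\neq j}E(U_iV_j)\,E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{j2t}-\mathbf Y_j)' \mid U_i=V_j=1\}\Big]\\
&= \frac{1}{nn_1}\Big[\frac{n_1}{N}\sum_{i=1}^{N}E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{i2t}-\mathbf Y_i)' \mid U_i=V_i=1\}\\
&\qquad + \frac{n_1(n-1)}{N(N-1)}\sum_{i\neq j}E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{j2t}-\mathbf Y_j)' \mid U_i=V_j=1\}\Big],
\end{aligned}
\tag{2.2.1.7}
$$

The first term is the between phase covariance of the $\mathbf Y_{i\alpha t}$ for a particular individual. The second is the between individual, between phase covariance of the $\mathbf Y_{i\alpha t}$.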
Let

$$(\mathrm{SMVC}) = \frac{1}{N}\sum_{i=1}^{N}E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{i2t}-\mathbf Y_i)' \mid U_i=V_i=1\}, \tag{2.2.1.8}$$

the between phase covariance of the $\mathbf Y_{i\alpha t}$ for a particular individual, and

$$(\mathrm{CMVC}) = \frac{1}{N(N-1)}\sum_{i\neq j}E\{(\mathbf Y_{i1t}-\mathbf Y_i)(\mathbf Y_{j2t}-\mathbf Y_j)' \mid U_i=V_j=1\}, \tag{2.2.1.9}$$

the between phase, between individual covariance.
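We then have

$$E\{(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)'\} = E\{(\bar{\mathbf y}_{2t}-\bar{\mathbf y}_2)(\bar{\mathbf y}_{1t}-\bar{\mathbf y}_1)'\} = \frac{1}{n}[(\mathrm{SMVC})+(n-1)(\mathrm{CMVC})]. \tag{2.2.1.10}$$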
Hence, (MV) may be expressed as follows.

Case 1. $\mathbf Y_{i1t}, \mathbf Y_{i2t}$ and $\mathbf Y_{i1t}, \mathbf Y_{j2t}$ are uncorrelated. This implies that (SMVC) = (CMVC) = 0, and

$$(\mathrm{MV}) = \frac{1}{n}[(\mathrm{SMV})+(n-1)(\mathrm{CMV})] + \frac{1}{n_1}[(\mathrm{SMV})+(n_1-1)(\mathrm{CMV})]. \tag{2.2.1.11}$$

Case 2. A between phase correlation does exist, and

$$(\mathrm{MV}) = \frac{1}{n}[(\mathrm{SMV})+(n-1)(\mathrm{CMV})] + \frac{1}{n_1}[(\mathrm{SMV})+(n_1-1)(\mathrm{CMV})] - \frac{2}{n}[(\mathrm{SMVC})+(n-1)(\mathrm{CMVC})]. \tag{2.2.1.12}$$

2.2.2. Bias Variance

The bias variance represents the variation of the biases in the measurements for each individual around the net bias in the measurement process. It arises as follows:
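$$
\begin{aligned}
(\mathrm{BV}) ={}& E\Big\{\Big[\frac{1}{n}\sum_{i=1}^{N}U_i(\mathbf Y_i-\mathbf X_i)\Big]\Big[\frac{1}{n}\sum_{i=1}^{N}U_i(\mathbf Y_i-\mathbf X_i)\Big]'\Big\}\\
&+ E\Big\{\Big[\frac{1}{n_1}\sum_{i=1}^{N}V_i(\mathbf Y_i-\mathbf X_i)\Big]\Big[\frac{1}{n_1}\sum_{i=1}^{N}V_i(\mathbf Y_i-\mathbf X_i)\Big]'\Big\}\\
&- E\Big\{\Big[\frac{1}{n_1}\sum_{i=1}^{N}V_i(\mathbf Y_i-\mathbf X_i)\Big]\Big[\frac{1}{n}\sum_{i=1}^{N}U_i(\mathbf Y_i-\mathbf X_i)\Big]'\Big\}\\
&- E\Big\{\Big[\frac{1}{n}\sum_{i=1}^{N}U_i(\mathbf Y_i-\mathbf X_i)\Big]\Big[\frac{1}{n_1}\sum_{i=1}^{N}V_i(\mathbf Y_i-\mathbf X_i)\Big]'\Big\}.
\end{aligned}
\tag{2.2.2.1}
$$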
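Let

$$(\mathrm{IBV}) = \frac{1}{N}\sum_{i=1}^{N}(\mathbf Y_i-\mathbf X_i)(\mathbf Y_i-\mathbf X_i)' \tag{2.2.2.2}$$

and

$$(\mathrm{IIBV}) = \frac{1}{N(N-1)}\sum_{i\neq j}(\mathbf Y_i-\mathbf X_i)(\mathbf Y_j-\mathbf X_j)'. \tag{2.2.2.3}$$

Then

$$(\mathrm{BV}) = \frac{n-n_1}{nn_1}[(\mathrm{IBV})-(\mathrm{IIBV})]. \tag{2.2.2.4}$$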
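Now the net bias in the measurement process is

$$\mathbf B = \bar{\mathbf Y}-\bar{\mathbf X} = \frac{1}{N}\sum_{i=1}^{N}\mathbf Y_i - \frac{1}{N}\sum_{i=1}^{N}\mathbf X_i \tag{2.2.2.5}$$

and the bias for the ith individual is $\mathbf B_i = \mathbf Y_i - \mathbf X_i$. Thus

$$[(\mathrm{IBV})-(\mathrm{IIBV})] = \frac{1}{N-1}\sum_{i=1}^{N}(\mathbf B_i-\mathbf B)(\mathbf B_i-\mathbf B)'. \tag{2.2.2.6}$$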
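The enumeration approach also gives an exact check of (2.2.2.4) and (2.2.2.6) in the scalar case. The population values below are hypothetical. The code computes (IBV) and (IIBV) from their definitions. It then computes (BV) directly as the expected square of $(\bar y_1 - \bar x_{1t}) - (\bar y_2 - \bar x_{2t})$ over all equally likely two-phase samples and compares the results.

```python
from fractions import Fraction
from itertools import combinations

X = [10, 12, 15, 9, 20, 14]          # hypothetical true values X_i
Y = [13, 12, 19, 10, 26, 15]         # hypothetical expected responses Y_i
N, n, n1 = len(X), 3, 2
B_i = [y - x for y, x in zip(Y, X)]  # individual biases B_i = Y_i - X_i
B = Fraction(sum(B_i), N)            # net bias B = Ybar - Xbar

# (IBV) and (IIBV), eqs. (2.2.2.2) and (2.2.2.3), for p = 1
IBV = Fraction(sum(b * b for b in B_i), N)
IIBV = Fraction(sum(B_i[i] * B_i[j] for i in range(N) for j in range(N) if i != j),
                N * (N - 1))

# (BV) computed directly: average of [(ybar_1 - xbar_1t) - (ybar_2 - xbar_2t)]^2
# over all equally likely (sample, subsample) pairs
total, pairs = Fraction(0), 0
for s in combinations(range(N), n):
    for s1 in combinations(s, n1):
        d1 = Fraction(sum(B_i[i] for i in s), n)     # ybar_1 - xbar_1t
        d2 = Fraction(sum(B_i[i] for i in s1), n1)   # ybar_2 - xbar_2t
        total += (d1 - d2) ** 2
        pairs += 1
BV_direct = total / pairs

BV_formula = Fraction(n - n1, n * n1) * (IBV - IIBV)            # eq. (2.2.2.4)
S2_B = sum((b - B) ** 2 for b in B_i) / Fraction(N - 1)         # eq. (2.2.2.6)
print("BV direct:", BV_direct, " BV formula:", BV_formula)
print("(IBV) - (IIBV):", IBV - IIBV, " S_B^2:", S2_B)
```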
2.2.3. Sampling Variance of True Values

(TV) is the sampling variance-covariance matrix of $\bar{\mathbf x}$ as an estimator of the population mean $\bar{\mathbf X}$ for a probability sample of size n from the population of $\mathbf X_i$. That is, it is the variance of $\bar{\mathbf x}$ assuming a sample of size n and a measurement process devoid of measurement errors.
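$$(\mathrm{TV}) = E\{(\bar{\mathbf x}_{1t}-\bar{\mathbf X})(\bar{\mathbf x}_{1t}-\bar{\mathbf X})'\} = E\Big\{\Big[\frac{1}{n}\sum_{i=1}^{N}U_i\mathbf X_i-\frac{1}{N}\sum_{i=1}^{N}\mathbf X_i\Big]\Big[\frac{1}{n}\sum_{i=1}^{N}U_i\mathbf X_i-\frac{1}{N}\sum_{i=1}^{N}\mathbf X_i\Big]'\Big\}. \tag{2.2.3.1}$$

For a simple random sample, (TV) reduces to

$$(\mathrm{TV}) = \frac{1}{n}\left(\frac{N-n}{N}\right)\frac{1}{N-1}\sum_{i=1}^{N}(\mathbf X_i-\bar{\mathbf X})(\mathbf X_i-\bar{\mathbf X})'.$$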
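2.2.4. Final Expression for the Variance-Covariance Matrix of $\bar{\mathbf z}_t$

Combining these terms, we arrive at the following expressions for the variance-covariance matrix of $\bar{\mathbf z}_t$:

Case 1. No between phase correlation.

$$\mathbf V = \frac{1}{n}[(\mathrm{SMV})+(n-1)(\mathrm{CMV})] + \frac{1}{n_1}[(\mathrm{SMV})+(n_1-1)(\mathrm{CMV})] + \frac{n-n_1}{nn_1}[(\mathrm{IBV})-(\mathrm{IIBV})] + (\mathrm{TV}). \tag{2.2.4.1}$$

Case 2. A between phase correlation does exist.

$$
\begin{aligned}
\mathbf V ={}& \frac{1}{n}[(\mathrm{SMV})+(n-1)(\mathrm{CMV})] + \frac{1}{n_1}[(\mathrm{SMV})+(n_1-1)(\mathrm{CMV})] - \frac{2}{n}[(\mathrm{SMVC})+(n-1)(\mathrm{CMVC})]\\
&+ \frac{n-n_1}{nn_1}[(\mathrm{IBV})-(\mathrm{IIBV})] + (\mathrm{TV}).
\end{aligned}
\tag{2.2.4.2}
$$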
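The following Monte Carlo sketch gives a rough numerical check of (2.2.4.1) by simulating Case 1 for a single characteristic. Its assumptions are hypothetical and should be read as such:

- measurement errors are normal and independent across individuals and phases, so (CMV) = (SMVC) = (CMVC) = 0;
- each individual has the same error variance at both phases;
- the expected responses carry a bias that varies with the true value.

```python
import random
import statistics


def dss_case1_check(N=100, n=20, n1=8, reps=20000, seed=4):
    """Monte Carlo mean and variance of z_t versus eq. (2.2.4.1), p = 1."""
    rng = random.Random(seed)
    X = [rng.uniform(20, 80) for _ in range(N)]              # true values X_i
    # Hypothetical expected responses with a bias that grows with X_i
    Y = [x + 2.0 + 0.1 * x + rng.gauss(0.0, 2.0) for x in X]
    sigma = [rng.uniform(1, 3) for _ in range(N)]            # same SD at both phases
    Xbar = sum(X) / N
    B_i = [y - x for y, x in zip(Y, X)]
    B = sum(B_i) / N
    SMV = sum(s * s for s in sigma) / N                      # (SMV); (CMV) = 0 here
    S2_B = sum((b - B) ** 2 for b in B_i) / (N - 1)          # (IBV) - (IIBV)
    TV = (1 / n) * (1 - n / N) * sum((x - Xbar) ** 2 for x in X) / (N - 1)
    formula = SMV / n + SMV / n1 + (n - n1) / (n * n1) * S2_B + TV

    z_draws = []
    for _ in range(reps):
        s = rng.sample(range(N), n)                          # original sample
        s1 = rng.sample(s, n1)                               # subsample
        y1t = sum(Y[i] + rng.gauss(0.0, sigma[i]) for i in s) / n     # phase 1
        y2t = sum(Y[i] + rng.gauss(0.0, sigma[i]) for i in s1) / n1   # phase 2, fresh errors
        x2t = sum(X[i] for i in s1) / n1                     # true values, subsample
        z_draws.append(y1t - y2t + x2t)                      # eq. (2.2.9)
    return Xbar, statistics.fmean(z_draws), statistics.variance(z_draws), formula


Xbar, mean_z, var_z, formula = dss_case1_check()
print(f"Xbar = {Xbar:.3f}, mean z_t = {mean_z:.3f}")
print(f"simulated V = {var_z:.3f}, eq. (2.2.4.1) = {formula:.3f}")
```

Because the bias depends on $X_i$, the single-phase mean would be pulled away from $\bar X$. The simulated $\bar z_t$ should nonetheless stay centred on $\bar X$, with variance close to (2.2.4.1).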
2.3. Comparison of DSS Model to Koch-CBM Model

To compare the Koch-CBM model with the DSS model, the Koch-CBM model must first be modified to include a bias term.

2.3.1. Modification of Koch-CBM Model
The Koch-CBM model has been reviewed in Chapter I. It is based on a survey scheme consisting of a single phase, in which a sample of size n is drawn and measurements $\mathbf Y_{it}$ are obtained. Here $\mathbf Y_{it}$ is a p component vector random variable representing the measurements obtained during the survey on p characteristics for the ith individual. The index t indicates a series of repeated trials of the survey.
A particular random sample is formulated in terms of indicator random variables $U_i$ where

$$U_i = \begin{cases}1 & \text{if the ith element is in the sample}\\ 0 & \text{otherwise.}\end{cases} \tag{2.3.1.1}$$
The Koch-CBM model assumes that the expected measurements for the ith individual over a series of repeated trials are equal to the true values for the ith individual, i.e.,

$$E_t\{\mathbf Y_{it}\} = \mathbf Y_i = \mathbf X_i .$$

We will drop this assumption and instead assume that $\mathbf Y_i \neq \mathbf X_i$. The estimate of the population mean $\bar{\mathbf X}$ is

$$\bar{\mathbf y}_t = \frac{1}{n}\sum_{i=1}^{N}U_i\mathbf Y_{it}. \tag{2.3.1.2}$$

This estimator may be biased when $\mathbf Y_i \neq \mathbf X_i$. Assuming that the net bias is nonzero,

$$E(\bar{\mathbf y}_t) = \frac{1}{N}\sum_{i=1}^{N}\mathbf Y_i = \bar{\mathbf Y} \neq \bar{\mathbf X}. \tag{2.3.1.3}$$
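The mean square error matrix of $\bar{\mathbf y}_t$ is

$$
\begin{aligned}
\mathbf{MSE} &= E\{(\bar{\mathbf y}_t-\bar{\mathbf X})(\bar{\mathbf y}_t-\bar{\mathbf X})'\}\\
&= E\{(\bar{\mathbf y}_t-\bar{\mathbf Y})(\bar{\mathbf y}_t-\bar{\mathbf Y})'\} + E\{(\bar{\mathbf Y}-\bar{\mathbf X})(\bar{\mathbf Y}-\bar{\mathbf X})'\}\\
&\quad + E\{(\bar{\mathbf y}_t-\bar{\mathbf Y})(\bar{\mathbf Y}-\bar{\mathbf X})'\} + E\{(\bar{\mathbf Y}-\bar{\mathbf X})(\bar{\mathbf y}_t-\bar{\mathbf Y})'\}.
\end{aligned}
\tag{2.3.1.4}
$$

The expected value of the last two terms is zero. Thus the MSE matrix of $\bar{\mathbf y}_t$ may be partitioned as the sum of two components,

$$\mathbf{MSE} = \mathbf V + \mathbf B\mathbf B' \tag{2.3.1.5}$$

where $\mathbf V$ is the variance-covariance matrix of $\bar{\mathbf y}_t$ and $\mathbf B = \bar{\mathbf Y}-\bar{\mathbf X}$ is the net bias. The matrix $\mathbf B\mathbf B'$ is symmetric. It contains the square of the net bias for each characteristic on the diagonal and the cross products of the net biases off the diagonal.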
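For the case of simple random sampling, and with the assumption $\mathbf Y_{ii'} = \mathbf Y_i$ that is used in the DSS model, Koch has shown that $\mathbf V$ may be partitioned as follows,

$$\mathbf V = \frac{1}{n}\{(\mathrm{SRV})+(n-1)(\mathrm{CRV})\} + (\mathrm{SV}) \tag{2.3.1.6}$$

where

$$(\mathrm{SRV}) = \frac{1}{N}\sum_{i=1}^{N}E\{(\mathbf Y_{it}-\mathbf Y_i)(\mathbf Y_{it}-\mathbf Y_i)' \mid U_i=1\}, \tag{2.3.1.7}$$

$$(\mathrm{CRV}) = \frac{1}{N(N-1)}\sum_{i\neq j}E\{(\mathbf Y_{it}-\mathbf Y_i)(\mathbf Y_{jt}-\mathbf Y_j)' \mid U_i=U_j=1\}, \tag{2.3.1.8}$$

and

$$(\mathrm{SV}) = E\{(\bar{\mathbf y}-\bar{\mathbf Y})(\bar{\mathbf y}-\bar{\mathbf Y})'\} \tag{2.3.1.9}$$

where $\bar{\mathbf y} = \frac{1}{n}\sum_{i=1}^{N}U_i\mathbf Y_i$. The terms (SRV) and (CRV) are equivalent to (SMV) and (CMV) in the DSS model. Thus, using the notation employed for the DSS model, we have

$$\mathbf{MSE} = \frac{1}{n}\{(\mathrm{SMV})+(n-1)(\mathrm{CMV})\} + (\mathrm{SV}) + \mathbf B\mathbf B'. \tag{2.3.1.10}$$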
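We now consider the term (SV), called sampling variance in the Koch-CBM model, in more detail. Since we do not assume that $\mathbf Y_i = \mathbf X_i$, (SV) may be broken down as follows,

$$
\begin{aligned}
(\mathrm{SV}) &= E\{(\bar{\mathbf y}-\bar{\mathbf Y})(\bar{\mathbf y}-\bar{\mathbf Y})'\}\\
&= E\{[(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})+(\bar{\mathbf x}-\bar{\mathbf X})][(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})+(\bar{\mathbf x}-\bar{\mathbf X})]'\}\\
&= E\{[(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})][(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})]'\} + E\{(\bar{\mathbf x}-\bar{\mathbf X})(\bar{\mathbf x}-\bar{\mathbf X})'\}\\
&\quad + E\{(\bar{\mathbf x}-\bar{\mathbf X})[(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})]'\} + E\{[(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})](\bar{\mathbf x}-\bar{\mathbf X})'\}
\end{aligned}
\tag{2.3.1.11}
$$

where $\bar{\mathbf x} = \frac{1}{n}\sum_{i=1}^{N}U_i\mathbf X_i$.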
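The term $E\{[(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})][(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})]'\}$ represents the variation of the individual bias terms around the net bias, as follows:

$$E\{[(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})][(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})]'\} = E\Big\{\Big[\frac{1}{n}\sum_{i=1}^{N}\Big(U_i-\frac{n}{N}\Big)(\mathbf Y_i-\mathbf X_i)\Big]\Big[\frac{1}{n}\sum_{i=1}^{N}\Big(U_i-\frac{n}{N}\Big)(\mathbf Y_i-\mathbf X_i)\Big]'\Big\}. \tag{2.3.1.12}$$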
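The term $E\{(\bar{\mathbf x}-\bar{\mathbf X})(\bar{\mathbf x}-\bar{\mathbf X})'\}$ equals (TV). The nature of the cross product terms may be demonstrated as follows:

$$
\begin{aligned}
E\{(\bar{\mathbf x}-\bar{\mathbf X})[(\bar{\mathbf y}-\bar{\mathbf x})-(\bar{\mathbf Y}-\bar{\mathbf X})]'\}
&= E\{(\bar{\mathbf x}-\bar{\mathbf X})(\bar{\mathbf y}-\bar{\mathbf x})'\}\\
&= E\Big\{\Big(\frac{1}{n}\sum_{i=1}^{N}\Big(U_i-\frac{n}{N}\Big)\mathbf X_i\Big)\Big(\frac{1}{n}\sum_{i=1}^{N}U_i(\mathbf Y_i-\mathbf X_i)\Big)'\Big\}\\
&= \frac{1}{n}\left(\frac{N-n}{N}\right)\Big[\frac{1}{N}\sum_{i=1}^{N}\mathbf X_i(\mathbf Y_i-\mathbf X_i)' - \frac{1}{N(N-1)}\sum_{i\neq j}\mathbf X_i(\mathbf Y_j-\mathbf X_j)'\Big].
\end{aligned}
\tag{2.3.1.13}
$$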
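Let

$$(\mathrm{BTI}) = \frac{1}{N}\sum_{i=1}^{N}\mathbf X_i(\mathbf Y_i-\mathbf X_i)' - \frac{1}{N(N-1)}\sum_{i\neq j}\mathbf X_i(\mathbf Y_j-\mathbf X_j)'. \tag{2.3.1.14}$$

Thus (BTI) represents the interaction of the bias terms with the true values. Equivalently, $(\mathrm{BTI}) = \frac{1}{N-1}\sum_{i=1}^{N}(\mathbf X_i-\bar{\mathbf X})(\mathbf B_i-\mathbf B)'$, the population covariance between the true values and the individual biases. (BTI) will be nonzero when the bias terms vary with the true values. For example, Fleischer et al. (1958) found that errors in farmers' reports of the sizes of fields tended to vary with the actual size of the field: large fields tended to be underestimated and small fields overestimated.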
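We can then write (SV) as

$$(\mathrm{SV}) = \frac{1}{n}\left(\frac{N-n}{N}\right)\{(\mathrm{IBV})-(\mathrm{IIBV})\} + (\mathrm{TV}) + \frac{1}{n}\left(\frac{N-n}{N}\right)\{(\mathrm{BTI})+(\mathrm{BTI})'\}. \tag{2.3.1.15}$$

We then have the following as the final decomposition of the MSE matrix of $\bar{\mathbf y}_t$:

$$
\begin{aligned}
\mathbf{MSE} ={}& \frac{1}{n}\{(\mathrm{SMV})+(n-1)(\mathrm{CMV})\} + \frac{1}{n}\left(\frac{N-n}{N}\right)\{(\mathrm{IBV})-(\mathrm{IIBV})\} + (\mathrm{TV})\\
&+ \frac{1}{n}\left(\frac{N-n}{N}\right)\{(\mathrm{BTI})+(\mathrm{BTI})'\} + \mathbf B\mathbf B'.
\end{aligned}
\tag{2.3.1.16}
$$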
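The decomposition (2.3.1.16) can be checked by simulation in the scalar case, where $(\mathrm{BTI}) + (\mathrm{BTI})' = 2(\mathrm{BTI})$ and $\mathbf B\mathbf B' = B^2$. The sketch below mimics the pattern reported by Fleischer et al. only qualitatively. The hypothetical bias $6 - 0.1X_i$ makes small values over-reported and large values under-reported, so (BTI) is negative. Errors are assumed normal and independent across individuals, so (CMV) = 0. All numbers are illustrative.

```python
import random
import statistics


def koch_cbm_mse_check(N=100, n=20, reps=20000, seed=5):
    """Monte Carlo MSE of ybar_t about Xbar versus the scalar form of (2.3.1.16)."""
    rng = random.Random(seed)
    X = [rng.uniform(20, 80) for _ in range(N)]          # hypothetical true values
    # Hypothetical bias 6 - 0.1 X_i: small X_i over-reported, large X_i under-reported
    Y = [x + 6.0 - 0.1 * x for x in X]                   # E_t{Y_it}
    sigma = [rng.uniform(1, 3) for _ in range(N)]        # trial-to-trial SDs
    Xbar = sum(X) / N
    B_i = [y - x for y, x in zip(Y, X)]
    B = sum(B_i) / N                                     # net bias Ybar - Xbar
    f = (1 / n) * (N - n) / N                            # (1/n)(N - n)/N
    SMV = sum(s * s for s in sigma) / N                  # (CMV) = 0 in this setup
    S2_B = sum((b - B) ** 2 for b in B_i) / (N - 1)      # (IBV) - (IIBV)
    TV = f * sum((x - Xbar) ** 2 for x in X) / (N - 1)
    BTI = sum((x - Xbar) * (b - B) for x, b in zip(X, B_i)) / (N - 1)
    formula = SMV / n + f * S2_B + TV + 2 * f * BTI + B ** 2

    sq_err = []
    for _ in range(reps):
        s = rng.sample(range(N), n)
        ybar_t = sum(Y[i] + rng.gauss(0.0, sigma[i]) for i in s) / n
        sq_err.append((ybar_t - Xbar) ** 2)
    return statistics.fmean(sq_err), formula, BTI


mse_sim, mse_formula, BTI = koch_cbm_mse_check()
print(f"(BTI) = {BTI:.2f}; simulated MSE = {mse_sim:.3f}, formula = {mse_formula:.3f}")
```

The code computes (BTI) through the covariance form given after (2.3.1.14), which is algebraically identical to the definition.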
2.3.2. Comparison of Mean Square Error of $\bar{\mathbf y}_t$ to Variance of $\bar{\mathbf z}_t$

Suppose a survey is conducted under the DSS model and the two phases of the survey are independent. Then the variance-covariance matrix of $\bar{\mathbf z}_t$, the estimator of $\bar{\mathbf X}$, is given by

$$\mathbf V = \frac{1}{n}\{(\mathrm{SMV})+(n-1)(\mathrm{CMV})\} + \frac{1}{n_1}\{(\mathrm{SMV})+(n_1-1)(\mathrm{CMV})\} + \left(\frac{1}{n_1}-\frac{1}{n}\right)\{(\mathrm{IBV})-(\mathrm{IIBV})\} + (\mathrm{TV}). \tag{2.3.2.1}$$
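When the survey is conducted under the Koch-CBM model, the mean square error of $\bar{\mathbf y}_t$, the estimator of $\bar{\mathbf X}$, is given by

$$
\begin{aligned}
\mathbf{MSE} ={}& \frac{1}{n}\{(\mathrm{SMV})+(n-1)(\mathrm{CMV})\} + \frac{1}{n}\left(\frac{N-n}{N}\right)\{(\mathrm{IBV})-(\mathrm{IIBV})\} + (\mathrm{TV})\\
&+ \frac{1}{n}\left(\frac{N-n}{N}\right)\{(\mathrm{BTI})+(\mathrm{BTI})'\} + \mathbf B\mathbf B'.
\end{aligned}
\tag{2.3.2.2}
$$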
Each expression contains three common terms:

- a term due to the sampling variance of the true values;
- a measurement variance term, due to the trial to trial variability of the measurement errors;
- a term due to the variation of the individual bias terms around the net bias.

In addition, the mean square error of $\bar{\mathbf y}_t$ contains a term due to the square of the bias and one due to the interaction of the bias terms with the true values. This interaction term may be positive or negative.⁴ Both of these extra terms are eliminated from the variance of $\bar{\mathbf z}_t$ by the information obtained in the second phase of a survey conducted under the DSS model. This is because $\bar{\mathbf z}_t$ is formed by subtracting the net subsample bias from the original sample mean, i.e., $\bar{\mathbf z}_t = \bar{\mathbf y}_{1t} - (\bar{\mathbf y}_{2t} - \bar{\mathbf x}_{2t})$.

2.3.3. Gain to be had using DSS Model

The difference between the mean square error of $\bar{\mathbf y}_t$ and the variance of $\bar{\mathbf z}_t$ measures the advantage of using the double sampling scheme to conduct the survey. This advantage will be affected by the relative cost of measuring the true values versus measuring the values which contain measurement errors. Obtaining true values is usually much more expensive. Consequently, the sample sizes under the double sampling scheme cannot be as large as the single sample under the Koch-CBM model. However, if the bias in the $\mathbf Y_{i\alpha t}$ is large, it may be advantageous to use the double sampling scheme. The conditions under which double sampling will be preferred are considered in detail in later chapters.

⁴Consider two measurement processes: one that uniformly gives measurements with a positive bias equal to 10% of the true values, and one with a uniformly negative bias equal to 10% of the true values. Interestingly, the estimate of the mean will have smaller MSE when the latter process is used.
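Footnote 4 and the gain described in 2.3.3 can be illustrated by evaluating (2.3.2.1) and (2.3.2.2) for one characteristic. The sketch assumes hypothetical component values (N, $\bar X$, $S_X^2$, (SMV), and zero (CMV)). It also assumes a proportional bias $B_i = cX_i$, for which $S_B^2 = c^2S_X^2$, (BTI) $= cS_X^2$, and $B = c\bar X$. The sample sizes are hypothetical too. They ignore the cost function that later chapters use to choose them.

```python
def cbm_mse(N, n, SMV, CMV, S2_X, S2_B, BTI, B):
    """Scalar form of (2.3.2.2): MSE of ybar_t under the Koch-CBM model."""
    f = (1 / n) * (N - n) / N
    return (SMV + (n - 1) * CMV) / n + f * S2_B + f * S2_X + 2 * f * BTI + B ** 2


def dss_variance(N, n, n1, SMV, CMV, S2_X, S2_B):
    """Scalar form of (2.3.2.1): variance of z_t with independent phases."""
    TV = (1 / n) * (N - n) / N * S2_X
    return ((SMV + (n - 1) * CMV) / n + (SMV + (n1 - 1) * CMV) / n1
            + (1 / n1 - 1 / n) * S2_B + TV)


# Hypothetical population and measurement-process values
N, Xbar, S2_X, SMV, CMV = 10_000, 50.0, 300.0, 25.0, 0.0

results = {}
for c in (0.10, -0.10):   # proportional bias B_i = c * X_i (footnote 4)
    S2_B, BTI, B = c ** 2 * S2_X, c * S2_X, c * Xbar
    mse = cbm_mse(N, 400, SMV, CMV, S2_X, S2_B, BTI, B)     # single phase, n = 400
    var_z = dss_variance(N, 300, 60, SMV, CMV, S2_X, S2_B)  # DSS, n = 300, n1 = 60
    results[c] = (mse, var_z)
    print(f"c = {c:+.2f}: MSE(ybar_t) = {mse:.4f}, V(z_t) = {var_z:.4f}, "
          f"gain = {mse - var_z:.4f}")
```

Only the (BTI) term changes sign between the two processes. The MSE for the negatively biased process is therefore smaller by $4 \cdot \frac{1}{n}\frac{N-n}{N}\,(0.1)S_X^2$. The variance of $\bar z_t$ is the same in both cases.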
CHAPTER III
SELF ENUMERATION AND INTERVIEWER DESIGNS FOR
ESTIMATING THE VARIANCE COMPONENTS
OF THE DSS MODEL
3.1. Introduction
For the general DSS model to be useful to persons conducting surveys, one must be able to estimate the components of the model for specific survey situations. We also need to know the best survey procedure for a particular population, and in particular when it is best to use the DSS model for conducting the survey. This latter question is examined in Chapter IV. In this chapter two survey procedures are considered: a self-enumeration survey and a survey using interviews. For each, we demonstrate how the components of the DSS model may be estimated.
3.2. Self-Enumeration Survey
Consider the following survey design. For a particular population of N individuals, a survey is conducted using a dual sampling scheme. A simple random sample of size n is drawn for the first phase of the survey, and a subsample of size $n_1$ is drawn from this original sample for the second phase. The measurement process used for the individuals in the sample and subsample is self-enumeration; for example, a mailed questionnaire may be used. In addition, the "true values" of the characteristics being measured are obtained for the individual members of the subsample. This might be done by means of a record check, in which some record is examined to obtain the true value for each individual in the subsample. For example, as in the study by Maynes (1968), respondents might be asked to report their bank balances, with the actual balances determined from bank records. The survey design is thus a particular case of the DSS model and may be viewed in the following manner.
3.2.1. The Model
Let the set of $N$ individuals in the population be indexed by the subscript $i$, $i = 1, 2, \ldots, N$. The sample design may be characterized by random variables $U_i$ and $V_i$, where
$$U_i = \begin{cases} 1 & \text{if the } i\text{th element is in the original sample} \\ 0 & \text{otherwise} \end{cases}$$
and
$$V_i = \begin{cases} 1 & \text{if the } i\text{th element is in the subsample} \\ 0 & \text{otherwise.} \end{cases}$$
For simplicity we will assume that simple random sampling is used at each phase of the survey. A simple random sample of size $n$ is drawn for the first phase of the survey. At the second phase of the survey, a simple random subsample of size $n_1$, $n_1 < n$, is drawn from the original sample of $n$ individuals. Thus the set of relationships (2.2.3) and (2.2.4) hold for $U_i$ and $V_i$.
Let the random variable $Y_{i\alpha t}$ be the measurement obtained for the $i$th individual at the $\alpha$th phase of the survey. Here $\alpha = 1$ corresponds to the first phase of the survey, in which measurements are obtained for each member of the original sample. $\alpha = 2$ corresponds to the second phase, in which measurements are obtained from the subsample members. As in the general model, the index $t$ indicates a series of repeated trials of the survey procedure. In addition, let $X_i$ be the true value for the $i$th individual. We will assume that $Y_{i\alpha t}$ is the sum of a fixed effect and a random residual effect in the following way,
$$Y_{i\alpha t} = X_i + B_{i\alpha t}. \tag{3.2.1.1}$$
In addition,
$$E_t(Y_{i\alpha t}) = Y_i \tag{3.2.1.2}$$
and
$$E_t(B_{i\alpha t}) = B_i.$$
Thus
$$B_i = Y_i - X_i. \tag{3.2.1.3}$$
The following additional assumptions are made:
1. That the measurement process is equally variable from trial to trial, so that $E_t\{(B_{i\alpha t}-B_i)^2\} = \gamma_i^2$ for $\alpha = 1, 2$ and $i = 1, 2, \ldots, N$;
2. That the measurement process for each individual in the sample is statistically independent of that for any other individual and is statistically independent between phases for a particular individual, i.e.,
$$\mathrm{Cov}(B_{i\alpha t}, B_{i'\alpha t}) = \mathrm{Cov}(B_{i\alpha t}, B_{i\alpha' t}) = 0 \quad \text{for } i \ne i',\ \alpha \ne \alpha'.$$
In terms of the general model explicated in Chapter II, this last assumption implies that the measurement variance term includes only the simple measurement variance, the other terms being 0.
3.2.2. Estimator for Population Mean
We can, thus, use the unbiased estimator $Z_t$ to estimate the overall population mean $\bar X = \frac{1}{N}\sum_{i=1}^{N} X_i$. $Z_t$ is formed by subtracting the net subsample bias from the original sample mean:
$$Z_t = \bar Y_{1t} - (\bar Y_{2t} - \bar X_{2t}).$$
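As a concrete illustration of the estimator in 3.2.2, the sketch below computes $Z_t$ from three lists of values. The function name and the argument layout are hypothetical. `statistics.fmean` requires Python 3.8 or later.

```python
from statistics import fmean


def dss_estimate(y_sample, y_subsample, x_subsample):
    """Double sampling estimate Z_t = Ybar_1t - (Ybar_2t - Xbar_2t).

    y_sample    : faulty measurements for the n members of the original sample
    y_subsample : second-phase faulty measurements for the n1 subsample members
    x_subsample : true values for the same n1 subsample members, in the same order
    """
    if len(y_subsample) != len(x_subsample):
        raise ValueError("each subsample measurement needs a matching true value")
    if not 0 < len(y_subsample) < len(y_sample):
        raise ValueError("need 0 < n1 < n")
    net_subsample_bias = fmean(y_subsample) - fmean(x_subsample)
    return fmean(y_sample) - net_subsample_bias


# Hypothetical data: n = 4 self-reported values; n1 = 2 members re-measured and record-checked.
z = dss_estimate([12, 15, 11, 14], [13, 16], [10, 14])
print(z)  # 10.5
```

Here the original-sample mean is 13.0 and the net subsample bias is $14.5 - 12.0 = 2.5$, so $Z_t = 10.5$.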
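3.2.3. Variance of the Estimator $Z_t$ in Terms of the Model
Let us now examine the variance of $Z_t$. The variance
$$E\{(Z_t-\bar X)^2\} = E\{(\bar Y_{1t}-\bar Y_{2t}+\bar X_{2t}-\bar X)^2\} \tag{3.2.3.1}$$
may be partitioned in the following manner:
$$E\{(Z_t-\bar X)^2\} = E\{[(\bar Y_{1t}-\bar Y_1)-(\bar Y_{2t}-\bar Y_2)]^2\} + E\{[(\bar Y_1-\bar Y_2)+(\bar X_{2t}-\bar X)]^2\}, \tag{3.2.3.2}$$
where $\bar Y_1 = \frac{1}{n}\sum_{i=1}^{N}U_iY_i$ and $\bar Y_2 = \frac{1}{n_1}\sum_{i=1}^{N}V_iY_i$ are the sample and subsample means, respectively, across the appropriate set of $Y_i$. The expected value of the cross-product term is zero because, for a fixed sample and subsample, $E_t(\bar Y_{1t}) = \bar Y_1$ and $E_t(\bar Y_{2t}) = \bar Y_2$.
The first term in (3.2.3.2) corresponds to the measurement variance term in the general model. Considering this term, we have
$$(MV) = E\{[(\bar Y_{1t}-\bar Y_1)-(\bar Y_{2t}-\bar Y_2)]^2\} = E\{(\bar Y_{1t}-\bar Y_1)^2\} + E\{(\bar Y_{2t}-\bar Y_2)^2\} - 2E\{(\bar Y_{1t}-\bar Y_1)(\bar Y_{2t}-\bar Y_2)\}. \tag{3.2.3.3}$$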
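Expressing (MV) in terms of the specific model which we are presently considering, and noting that the (CMV) is zero, we have
$$\begin{aligned}
E\{(\bar Y_{1t}-\bar Y_1)^2\} &= E\Big\{\Big(\frac{1}{n}\sum_{i=1}^{N}U_iY_{i1t} - \frac{1}{n}\sum_{i=1}^{N}U_iY_i\Big)^2\Big\}\\
&= E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i(Y_{i1t}-Y_i)^2 + \frac{1}{n^2}\sum_{i\ne j}U_iU_j(Y_{i1t}-Y_i)(Y_{j1t}-Y_j)\Big\}\\
&= \frac{1}{n}\cdot\frac{1}{N}\sum_{i=1}^{N}E\{(B_{i1t}-B_i)^2\} + \frac{1}{n}\cdot\frac{n-1}{N-1}\cdot\frac{1}{N}\sum_{i\ne j}E\{(B_{i1t}-B_i)(B_{j1t}-B_j)\}\\
&= \frac{1}{n}\Big\{\frac{1}{N}\sum_{i=1}^{N}\gamma_i^2\Big\} = \frac{1}{n}(SMV).
\end{aligned}$$
Let $S_\gamma^2 = \frac{1}{N}\sum_{i=1}^{N}\gamma_i^2$. Then
$$E\{(\bar Y_{1t}-\bar Y_1)^2\} = \frac{1}{n}S_\gamma^2. \tag{3.2.3.4}$$
Similarly,
$$E\{(\bar Y_{2t}-\bar Y_2)^2\} = \frac{1}{n_1}S_\gamma^2 = \frac{1}{n_1}(SMV).$$
Looking at the cross product term, we have
$$E\{(\bar Y_{1t}-\bar Y_1)(\bar Y_{2t}-\bar Y_2)\} = E\Big\{\frac{1}{nn_1}\sum_{i=1}^{N}\sum_{j=1}^{N}U_iV_j(Y_{i1t}-Y_i)(Y_{j2t}-Y_j)\Big\} = 0$$
due to assumption 2 above ($\alpha \ne \alpha'$). Thus, we have
$$(MV) = E\{[(\bar Y_{1t}-\bar Y_1)-(\bar Y_{2t}-\bar Y_2)]^2\} = \Big(\frac{1}{n}+\frac{1}{n_1}\Big)(SMV) = \Big(\frac{1}{n}+\frac{1}{n_1}\Big)S_\gamma^2. \tag{3.2.3.5}$$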
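Turning to the second term in expression (3.2.3.2), note that $\bar X_{2t} = \frac{1}{n_1}\sum_{i=1}^{N}V_iX_i$ is the mean over the appropriate set of true values, $X_i$. We may drop the subscript $t$ on $\bar X_{2t}$ since $X_i$, the true value for the $i$th individual, does not vary from trial to trial. Continuing with this term, and writing $\bar X_1 = \frac{1}{n}\sum_{i=1}^{N}U_iX_i$, we have
$$E\{[(\bar Y_1-\bar Y_2)+(\bar X_2-\bar X)]^2\} = E\{[(\bar Y_1-\bar X_1)-(\bar Y_2-\bar X_2)]^2\} + E\{(\bar X_1-\bar X)^2\} + 2E\{[(\bar Y_1-\bar X_1)-(\bar Y_2-\bar X_2)](\bar X_1-\bar X)\}. \tag{3.2.3.6}$$
The expected value of the cross product term is zero, since, conditional on the original sample, the subsample net bias $\bar Y_2-\bar X_2$ has expectation $\bar Y_1-\bar X_1$. Now,
$$(TV) = E\{(\bar X_1-\bar X)^2\} = \frac{1}{n}\Big(\frac{N-n}{N}\Big)S^2,$$
where $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar X)^2$. Thus (TV) is the variance for the mean of a simple random sample of size $n$ from the true values. The remaining term, $E\{[(\bar Y_1-\bar X_1)-(\bar Y_2-\bar X_2)]^2\}$, i.e. (BV), represents the variance of the individual bias terms $B_i$ around the net bias $\bar B = \frac{1}{N}\sum_{i=1}^{N}B_i$ in the following manner:
$$\begin{aligned}
(BV) &= E\{[(\bar Y_1-\bar X_1)-(\bar Y_2-\bar X_2)]^2\}\\
&= E\Big\{\Big[\frac{1}{n}\sum_{i=1}^{N}U_i(Y_i-X_i) - \frac{1}{n_1}\sum_{i=1}^{N}V_i(Y_i-X_i)\Big]^2\Big\}\\
&= E\Big\{\frac{1}{n^2}\Big[\sum_{i=1}^{N}U_iB_i\Big]^2 + \frac{1}{n_1^2}\Big[\sum_{i=1}^{N}V_iB_i\Big]^2 - \frac{2}{nn_1}\sum_{i=1}^{N}U_iB_i\sum_{i=1}^{N}V_iB_i\Big\}\\
&= \frac{n-n_1}{nn_1}\cdot\frac{1}{N}\sum_{i=1}^{N}B_i^2 - \frac{n-n_1}{nn_1}\cdot\frac{1}{N(N-1)}\sum_{i\ne j}B_iB_j\\
&= \frac{n-n_1}{nn_1}S_B^2, \qquad\text{where } S_B^2 = \frac{1}{N-1}\sum_{i=1}^{N}(B_i-\bar B)^2.
\end{aligned}\tag{3.2.3.7}$$
We have thus identified three variance components:
- $S_\gamma^2$, due to the variation in the $Y_{i\alpha t}$ from trial to trial and phase to phase;
- $S^2$, the population variance;
- $S_B^2$, due to the variation of the individual bias terms around the net bias.

All three pertain to the variance of $Z_t$, such that
$$V(Z_t) = \frac{n+n_1}{nn_1}\{S_\gamma^2\} + \frac{n-n_1}{nn_1}\{S_B^2\} + \frac{1}{n}\Big(\frac{N-n}{N}\Big)\{S^2\}. \tag{3.2.3.8}$$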
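Formula (3.2.3.8) can be checked numerically. The sketch below codes the three terms and compares their sum with a Monte Carlo variance of $Z_t$. It is a check under stated assumptions, not the author's procedure. The simulated population, the normal trial-to-trial errors, and the names `var_z_self_enum` and `simulate_var_z` are all hypothetical. The errors are drawn independently for each individual and each phase, so that assumptions 1 and 2 hold.

```python
import random
from statistics import fmean, pvariance, variance


def var_z_self_enum(n, n1, N, s2_gamma, s2_b, s2):
    """Terms of V(Z_t) in (3.2.3.8); returns (MV, BV, TV)."""
    if not 0 < n1 < n <= N:
        raise ValueError("need 0 < n1 < n <= N")
    mv = (n + n1) / (n * n1) * s2_gamma  # measurement variance
    bv = (n - n1) / (n * n1) * s2_b      # bias variance
    tv = (N - n) / (n * N) * s2          # sampling variance of true values
    return mv, bv, tv


def simulate_var_z(X, bias, gamma, n, n1, reps, seed=1):
    """Monte Carlo mean and variance of Z_t over repeated samples and trials.

    Hypothetical error model: Y_iat = X_i + bias_i + e, with e ~ Normal(0, gamma_i**2)
    drawn independently for every individual and phase (assumptions 1 and 2).
    """
    rng = random.Random(seed)
    N = len(X)
    zs = []
    for _ in range(reps):
        sample = rng.sample(range(N), n)  # SRS of size n, in random order ...
        sub = sample[:n1]                 # ... so its first n1 units are an SRS of it
        y1 = [X[i] + bias[i] + rng.gauss(0.0, gamma[i]) for i in sample]
        y2 = [X[i] + bias[i] + rng.gauss(0.0, gamma[i]) for i in sub]
        x2 = [X[i] for i in sub]
        zs.append(fmean(y1) - (fmean(y2) - fmean(x2)))
    return fmean(zs), pvariance(zs)


# Hypothetical population of N = 500 with individual biases B_i and error s.d. gamma_i.
pop = random.Random(0)
N, n, n1 = 500, 50, 10
X = [pop.gauss(50.0, 10.0) for _ in range(N)]
bias = [pop.gauss(3.0, 2.0) for _ in range(N)]
gamma = [pop.uniform(1.0, 3.0) for _ in range(N)]

s2_gamma = fmean([g * g for g in gamma])  # S_gamma^2 = (1/N) sum of gamma_i^2
theory = sum(var_z_self_enum(n, n1, N, s2_gamma, variance(bias), variance(X)))
mean_z, var_z = simulate_var_z(X, bias, gamma, n, n1, reps=10000)
print(f"formula {theory:.3f}  simulated {var_z:.3f}  mean Z {mean_z:.2f}  Xbar {fmean(X):.2f}")
```

With these settings the simulated variance should agree with the formula to within Monte Carlo error, roughly one or two percent for 10,000 replications. The mean of the simulated $Z_t$ should sit close to $\bar X$, as the unbiasedness of $Z_t$ requires.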
3.2.4. Sample Estimators for Variance Components
Sample estimators for the variance components $S_\gamma^2$, $S_B^2$ and $S^2$ may be constructed as follows. Consider first the subsample variance of the true values, $s_x^2$:
$$s_x^2 = \frac{1}{n_1-1}\sum_{i=1}^{N}V_i(X_i-\bar X_2)^2 = \frac{1}{n_1-1}\Big[\frac{n_1-1}{n_1}\sum_{i=1}^{N}V_iX_i^2 - \frac{1}{n_1}\sum_{i\ne j}V_iV_jX_iX_j\Big]. \tag{3.2.4.1}$$
Thus, $s_x^2$ is an unbiased estimator for the population variance $S^2$, i.e.,
$$E(s_x^2) = \frac{1}{N-1}\sum_{i=1}^{N}(X_i-\bar X)^2 = S^2. \tag{3.2.4.2}$$
The subsample variance of the bias terms, $s_B^2$, is an unbiased estimator of the sum $S_\gamma^2 + S_B^2$. The following demonstrates the manner in which this occurs:
$$s_B^2 = \frac{1}{n_1-1}\sum_{i=1}^{N}V_i[(Y_{i2t}-X_i)-(\bar Y_{2t}-\bar X_2)]^2 = \frac{1}{n_1-1}\Big[\frac{n_1-1}{n_1}\sum_{i=1}^{N}V_iB_{i2t}^2 - \frac{1}{n_1}\sum_{i\ne j}V_iV_jB_{i2t}B_{j2t}\Big]. \tag{3.2.4.3}$$
Now,
$$E\{s_B^2\} = \frac{1}{N}\sum_{i=1}^{N}E[B_{i2t}^2] - \frac{1}{N(N-1)}\sum_{i\ne j}E\{B_{i2t}B_{j2t}\}.$$
From assumptions 1 and 2 above we have that $V\{B_{i2t}\} = \gamma_i^2$ and $\mathrm{Cov}(B_{i2t}, B_{j2t}) = 0$ for $i \ne j$. Therefore,
$$E\{s_B^2\} = \frac{1}{N}\sum_{i=1}^{N}\gamma_i^2 + S_B^2 = S_\gamma^2 + S_B^2. \tag{3.2.4.4}$$
Consider next the between phase within individual sum of squares, $s_w^2$:
$$s_w^2 = \sum_{i=1}^{N}V_i(Y_{i1t}-Y_{i2t})^2. \tag{3.2.4.5}$$
Thus,
$$E(s_w^2) = \frac{n_1}{N}\sum_{i=1}^{N}E\{(B_{i1t}-B_i)^2 - 2(B_{i1t}-B_i)(B_{i2t}-B_i) + (B_{i2t}-B_i)^2\} = \frac{n_1}{N}\sum_{i=1}^{N}(\gamma_i^2+\gamma_i^2) \tag{3.2.4.6}$$
$$= 2n_1S_\gamma^2. \tag{3.2.4.7}$$
Considering the above results, we arrive at the following estimators for the variance components:
$$\hat S^2 = s_x^2, \qquad \hat S_\gamma^2 = \frac{s_w^2}{2n_1}, \qquad \hat S_B^2 = s_B^2 - \frac{s_w^2}{2n_1}. \tag{3.2.4.8}$$
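The estimators (3.2.4.8) need only the subsample data. The sketch below assumes that the three lists are aligned by subsample member; the function name is hypothetical. `statistics.variance` uses the divisor $n_1 - 1$, which matches $s_x^2$ and $s_B^2$.

```python
from statistics import variance


def estimate_self_enum_components(y1_sub, y2_sub, x_sub):
    """Estimators (3.2.4.8) computed from the n1 subsample members (hypothetical helper).

    y1_sub : first-phase measurements Y_i1t of the subsample members
    y2_sub : second-phase measurements Y_i2t of the same members
    x_sub  : true values X_i of the same members, in the same order
    Returns (S^2 hat, S_gamma^2 hat, S_B^2 hat).
    """
    n1 = len(x_sub)
    if n1 < 2 or len(y1_sub) != n1 or len(y2_sub) != n1:
        raise ValueError("need at least two subsample members with paired values")
    s2_x = variance(x_sub)  # s_x^2 with divisor n1 - 1
    s2_bias = variance([y - x for y, x in zip(y2_sub, x_sub)])  # s_B^2
    s_w = sum((a - b) ** 2 for a, b in zip(y1_sub, y2_sub))  # s_w^2: between phase, within individual
    s2_gamma_hat = s_w / (2 * n1)
    return s2_x, s2_gamma_hat, s2_bias - s2_gamma_hat


est = estimate_self_enum_components([11, 14, 9], [12, 13, 10], [10, 12, 8])
print(est)
```

With only three members, the estimate of $S_B^2$ here comes out negative ($1/3 - 1/2$). Like other difference-type variance component estimators, $\hat S_B^2$ is unbiased but is not guaranteed to be nonnegative in small subsamples.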
3.3. Survey Design Using Interviewers
Many surveys are conducted with the use of interviewers. It is generally believed that interviewers introduce additional errors into the survey data. For example, an interviewer has his own opinions, ideas, and attitudes regarding the subject matter of a survey, and these may influence the responses that are elicited and recorded. A classic example of this is a study by Rice (1922) in which destitute men were interviewed to determine to what they attributed their destitute state. Those persons interviewed by a prohibitionist tended to report the effects of alcohol as the cause; those interviewed by a socialist attributed their state to social conditions. In addition, the interviewers may make careless errors when recording results.
Respondents to an interview are also influenced by certain characteristics of the interviewer and may vary the answers they give in order to appear socially acceptable to the interviewer. Different interviewers are able to establish different degrees of rapport with the respondents. This will often influence the willingness of the respondent to cooperate with the survey and the effort made to give accurate answers. In this section we examine the measurement error effects of interviewers under the double sampling scheme model.
Consider the following survey design. For a particular population a survey is conducted using a double sampling scheme. An original sample and a subsample are drawn, and measurements are taken on the individual members of the sample and subsample by interviewers. In addition, the true values for the characteristics being measured are obtained for the members of the subsample. This might be done by a record check, or by re-interview by a more highly trained and experienced interviewer whom we are willing to assume obtains values which do not contain measurement errors.
When interviewers are used to conduct a survey, we will assume that a correlated component of the measurement variance exists which is due to the interviewers. We will assume that interviewers operate independently of each other. Thus, the correlated component of the measurement variance results only from the fact that
$$E\{(Y_{i1t}-Y_i)(Y_{i'1t}-Y_{i'}) \mid U_i = U_{i'} = 1\} \ne 0$$
if the $i$th and $i'$th individuals are assigned to the same interviewer. If the $i$th and $i'$th individuals are assigned to different interviewers, then
$$E\{(Y_{i1t}-Y_i)(Y_{i'1t}-Y_{i'}) \mid U_i = U_{i'} = 1\} = 0.$$
The survey design is a specific case of the DSS model and may be viewed in the following manner.
3.3.1. Model Using Interviewers
Let the population consist of $N$ individuals indexed by the subscript $i$, $i = 1, 2, \ldots, N$. Let there be a fixed population of $B$ interviewers indexed by the subscript $j$, $j = 1, 2, \ldots, B$. The sample design and interviewer structure may be characterized by indicator random variables in the following way.
A simple random sample of size $n$ is drawn for the first phase of the survey. This sample is characterized by the random variable $U_i$, where
$$U_i = \begin{cases} 1 & \text{if the } i\text{th element is in the original sample} \\ 0 & \text{otherwise.} \end{cases}$$
Each individual in the sample is assigned at random to one of the interviewers. For simplicity we will assume that each interviewer interviews $r$ individuals, so that $n = Br$. The interviewer structure at this phase of the survey is characterized by the random variable $C_{ij}$, where
$$C_{ij} = \begin{cases} 1 & \text{if the } i\text{th individual is assigned to the } j\text{th interviewer} \\ 0 & \text{otherwise.} \end{cases} \tag{3.3.1.1}$$
From the sample of $n$ individuals, a simple random subsample of $n_1$ individuals is chosen for the second phase of the survey. The random variable $V_i$ characterizes this subsample, where
$$V_i = \begin{cases} 1 & \text{if the } i\text{th individual is in the subsample} \\ 0 & \text{otherwise.} \end{cases}$$
Again, each individual member of the subsample is assigned at random to one of the $B$ interviewers, so that each interviewer has $r_1$ individuals and $n_1 = Br_1$.¹ The interviewer structure at the second phase of the survey is characterized by the random variable $D_{ij}$, where
$$D_{ij} = \begin{cases} 1 & \text{if the } i\text{th element is in the } j\text{th interviewer's subsample} \\ 0 & \text{otherwise.} \end{cases} \tag{3.3.1.2}$$
Assuming simple random sampling at all stages of the survey procedure, the set of relationships (2.2.3) and (2.2.4) concerning the $U_i$ and $V_i$ hold. In addition, we have the following:
$$\begin{gathered}
\Pr\{C_{ij}=1 \mid U_i=1\} = \frac{r}{n}, \qquad \Pr\{C_{ij}=0 \mid U_i=1\} = \frac{n-r}{n}, \qquad \Pr\{C_{ij}=1, U_i=1\} = \frac{r}{N},\\
\Pr\{D_{ij}=1 \mid V_i=1\} = \frac{r_1}{n_1}, \qquad \Pr\{D_{ij}=1, V_i=1\} = \frac{r_1}{N}.
\end{gathered}\tag{3.3.1.3}$$
Each individual is interviewed by only one interviewer at each phase of the survey, so that
$$\Pr\{C_{ij}=1, C_{ij'}=1 \mid U_i=1\} = 0,\ j \ne j'; \qquad \Pr\{D_{ij}=1, D_{ij'}=1 \mid V_i=1\} = 0,\ j \ne j'. \tag{3.3.1.4}$$
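The randomization described by $U_i$, $C_{ij}$, $V_i$ and $D_{ij}$ can be sketched as follows. The helper is illustrative; its name and return format are assumptions. It draws the phase-two assignment independently of the phase-one assignment, which is the randomization across interviewers that this design assumes. The short Monte Carlo loop checks $\Pr\{C_{ij}=1, U_i=1\} = r/N$ and $\Pr\{D_{ij}=1, V_i=1\} = r_1/N$ from (3.3.1.3) for one individual and one interviewer.

```python
import random
from collections import Counter


def draw_interviewer_design(N, B, r, r1, rng=None):
    """One realization of the two-phase interviewer design (hypothetical helper).

    Phase 1: simple random sample of n = B*r units (U_i = 1), split at random so that
             each interviewer j gets exactly r units (C_ij = 1).
    Phase 2: simple random subsample of n1 = B*r1 of those units (V_i = 1), re-split at
             random, independently of phase 1, so each interviewer gets r1 units (D_ij = 1).
    Returns two dicts mapping unit index i -> interviewer index j.
    """
    rng = rng or random.Random()
    n, n1 = B * r, B * r1
    if not 0 < n1 < n <= N:
        raise ValueError("need 0 < B*r1 < B*r <= N")
    sample = rng.sample(range(N), n)  # random order, so consecutive blocks are random groups
    phase1 = {i: k // r for k, i in enumerate(sample)}
    subsample = rng.sample(sample, n1)  # also in random order
    phase2 = {i: k // r1 for k, i in enumerate(subsample)}
    return phase1, phase2


rng = random.Random(42)
phase1, phase2 = draw_interviewer_design(N=60, B=3, r=4, r1=2, rng=rng)
print(Counter(phase1.values()), Counter(phase2.values()))

# Monte Carlo check of Pr{C_ij = 1, U_i = 1} = r/N and Pr{D_ij = 1, V_i = 1} = r1/N
# for individual i = 0 and interviewer j = 0.
reps = 30000
hits_c = hits_d = 0
for _ in range(reps):
    p1, p2 = draw_interviewer_design(60, 3, 4, 2, rng)
    hits_c += p1.get(0) == 0
    hits_d += p2.get(0) == 0
print(hits_c / reps, 4 / 60, hits_d / reps, 2 / 60)
```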
1 If randomization across interviewers is not done at the second phase of the survey, additional components of error are introduced. Appendix B in Chapter VI demonstrates the results if each interviewer re-interviews a subsample of his original sample.

Let the random variable $Y_{ij\alpha t}$ be the measurement obtained for the $i$th individual in the survey by the $j$th interviewer at the $\alpha$th phase of the survey. Again, $t$ indexes a series of repeated trials of the survey. Let $X_i$ be the true value for the $i$th individual for the characteristic being measured. We will assume that $Y_{ij\alpha t}$ may be expressed as the sum of random and fixed effects in the following manner,
$$Y_{ij\alpha t} = \bar X + H_i + \bar B + L_i + Q_j + (IQ)_{ij} + Z_{j\alpha t} + R_{ij\alpha t}, \tag{3.3.1.5}$$
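where the effects in the model are defined using the following,
$$\begin{gathered}
E_t(Y_{ij\alpha t}) = Y_{ij}, \qquad Y_i = \frac{1}{B}\sum_{j=1}^{B}Y_{ij}, \qquad \bar Y_{\cdot j} = \frac{1}{N}\sum_{i=1}^{N}Y_{ij}, \qquad \bar Y_{\cdot j\alpha t} = \frac{1}{N}\sum_{i=1}^{N}Y_{ij\alpha t},\\
\bar Y = \frac{1}{N}\sum_{i=1}^{N}Y_i = \frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}Y_{ij}, \qquad \bar X = \frac{1}{N}\sum_{i=1}^{N}X_i, \qquad B_i = Y_i - X_i, \qquad \bar B = \frac{1}{N}\sum_{i=1}^{N}B_i,
\end{gathered}\tag{3.3.1.6}$$
which gives
$$\begin{gathered}
H_i = X_i - \bar X, \qquad L_i = (Y_i - X_i) - \bar B = B_i - \bar B, \qquad I_i = H_i + L_i = Y_i - \bar Y,\\
Q_j = \bar Y_{\cdot j} - \bar Y, \qquad (IQ)_{ij} = Y_{ij} - \bar Y - I_i - Q_j,\\
Z_{j\alpha t} = \bar Y_{\cdot j\alpha t} - \bar Y_{\cdot j} = \frac{1}{N}\sum_{i=1}^{N}(Y_{ij\alpha t} - Y_{ij}), \qquad R_{ij\alpha t} = Y_{ij\alpha t} - Y_{ij} - Z_{j\alpha t}.
\end{gathered}\tag{3.3.1.7}$$
Thus,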
associated with every individual in the survey we have two fixed effects, $H_i$ and $L_i$. $H_i$ represents the difference between the true value for the $i$th individual and the population mean of the characteristic being measured. There is a net bias $\bar B = \bar Y - \bar X$. This is the difference between the population mean which results from the specific measurement process being used, i.e. $\bar Y$, and $\bar X$, the actual population mean. $L_i$ represents the effect of the $i$th individual on this net bias. Thus $L_i$ is the difference between the bias in the expected measurement for the $i$th individual and the net bias. The sum of $L_i$ and $H_i$ is $I_i$. This is the difference between the expected measurement for the $i$th individual and the population mean $\bar Y$ which results from the faulty measurement process.
Associated with each interviewer is a fixed effect $Q_j$. This is the difference between the mean across all individuals that would be obtained by the $j$th interviewer and $\bar Y$. $(IQ)_{ij}$ is the fixed interaction effect. It is the difference between two quantities: the expected response over trials when the $i$th individual is interviewed by the $j$th interviewer, and the population mean $\bar Y$. From this difference are subtracted the effect uniquely associated with the $i$th individual and the effect associated with the $j$th interviewer. This term is zero when $(Y_{ij} - Y_i) = Q_j$. That is, the difference between the expected response for the $i$th individual interviewed by the $j$th interviewer and its average across interviewers is then equal to the effect of the $j$th interviewer. This implies that the specific combination of the $i$th individual and the $j$th interviewer does not uniquely affect the measurement errors. Similarly, $(IQ)_{ij}$ is zero when $(Y_{ij} - \bar Y_{\cdot j}) = I_i$.
In addition to these fixed effects, we have defined two random effects, $Z_{j\alpha t}$ and $R_{ij\alpha t}$. $Z_{j\alpha t}$ is a random effect associated with the $j$th interviewer at the $\alpha$th phase and $t$th trial. Consider the average, across individuals in the population, of the measurements obtained by the $j$th interviewer at the $\alpha$th phase and $t$th trial; $Z_{j\alpha t}$ is the difference between this average and its expected value across trials. $R_{ij\alpha t}$ is a random residual effect. It represents the difference between the measurement obtained for the $i$th individual by the $j$th interviewer at a particular phase and trial and the expected value of that measurement across trials, minus the random effect associated with the $j$th interviewer.
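The definitions (3.3.1.6) and (3.3.1.7) can be made concrete with a small table of expected measurements $Y_{ij}$. The sketch below is a hypothetical helper applied to made-up numbers. It computes the fixed effects and checks that they reassemble the fixed part of (3.3.1.5), $Y_{ij} = \bar X + H_i + \bar B + L_i + Q_j + (IQ)_{ij}$. It also checks that $(IQ)_{ij}$ sums to zero over every row and every column.

```python
def decompose_expected_measurements(Y, X):
    """Fixed effects of (3.3.1.6)-(3.3.1.7) from expected measurements (hypothetical helper).

    Y[i][j] : Y_ij = E_t(Y_ijat), expected measurement of individual i by interviewer j
    X[i]    : true value X_i
    Returns (Xbar, Bbar, H, L, Q, IQ).
    """
    N, B = len(Y), len(Y[0])
    Y_i = [sum(row) / B for row in Y]  # average over interviewers
    Y_col = [sum(Y[i][j] for i in range(N)) / N for j in range(B)]  # interviewer means
    Y_bar = sum(Y_i) / N
    X_bar = sum(X) / N
    B_bar = Y_bar - X_bar  # net bias
    H = [X[i] - X_bar for i in range(N)]
    L = [(Y_i[i] - X[i]) - B_bar for i in range(N)]
    Q = [Y_col[j] - Y_bar for j in range(B)]
    # (IQ)_ij = Y_ij - Ybar - I_i - Q_j, with I_i = H_i + L_i = Y_i - Ybar
    IQ = [[Y[i][j] - Y_bar - (H[i] + L[i]) - Q[j] for j in range(B)] for i in range(N)]
    return X_bar, B_bar, H, L, Q, IQ


# Hypothetical table: 3 individuals, 2 interviewers.
Y = [[10.0, 12.0], [14.0, 13.0], [9.0, 11.0]]
X = [9.0, 12.0, 8.0]
X_bar, B_bar, H, L, Q, IQ = decompose_expected_measurements(Y, X)
for i in range(len(Y)):
    for j in range(len(Y[0])):
        rebuilt = X_bar + H[i] + B_bar + L[i] + Q[j] + IQ[i][j]
        assert abs(rebuilt - Y[i][j]) < 1e-12
print(B_bar, Q, IQ)
```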
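We make the following additional assumptions concerning the model.
1. That the measurement process is equally variable from trial to trial, such that
$$E_t\{Z_{j\alpha t}^2\} = \xi_j^2 \tag{3.3.1.8}$$
and
$$E_t\{R_{ij\alpha t}^2\} = \eta_{ij}^2. \tag{3.3.1.9}$$
2. That the measurement process associated with each interviewer is statistically independent of that associated with any other interviewer, and is statistically independent from phase to phase of the survey for a particular interviewer, such that
$$\begin{gathered}
\mathrm{Cov}(Z_{j\alpha t}, Z_{j'\alpha t}) = \mathrm{Cov}(Z_{j\alpha t}, Z_{j\alpha' t}) = 0 \quad \text{for } j \ne j',\ \alpha \ne \alpha',\\
\mathrm{Cov}(R_{ij\alpha t}, R_{i'j'\alpha t}) = \mathrm{Cov}(R_{ij\alpha t}, R_{i'j\alpha' t}) = 0 \quad \text{for } j \ne j',\ \alpha \ne \alpha',
\end{gathered}\tag{3.3.1.10}$$
and
$$\mathrm{Cov}(Z_{j\alpha t}, R_{i'j'\alpha' t}) = 0 \quad \text{for } j \ne j',\ \alpha \ne \alpha'.$$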
3.3.2. Estimator of Population Mean
As in the simple self-enumeration case, we can form an unbiased estimator of the population mean by subtracting the net subsample bias from the original sample mean. Thus, if $Z_t$ is the estimator for the population mean, then
$$Z_t = \bar Y_{1t} - (\bar Y_{2t} - \bar X_{2t}),$$
where
$$\bar Y_{1t} = \frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{B}U_iC_{ij}Y_{ij1t}, \qquad \bar Y_{2t} = \frac{1}{n_1}\sum_{i=1}^{N}\sum_{j=1}^{B}V_iD_{ij}Y_{ij2t}, \qquad \text{and} \qquad \bar X_{2t} = \frac{1}{n_1}\sum_{i=1}^{N}V_iX_i.$$
3.3.3. Variance of the Estimator $Z_t$ in Terms of the Model
We will now examine the variance of $Z_t$ to determine how the variance components of the general DSS model are related to the effects of the model for this specific case, as expressed in equation (3.3.1.5). Proceeding in a manner similar to that used for the general model and the self-enumeration case, we find that the variance of $Z_t$ may be partitioned as the sum of three components, giving the following:
$$\begin{aligned}
E\{(Z_t-\bar X)^2\} &= E\{(\bar Y_{1t}-\bar Y_{2t}+\bar X_{2t}-\bar X)^2\}\\
&= E\{[(\bar Y_{1t}-\bar Y_1)-(\bar Y_{2t}-\bar Y_2)]^2\} + E\{[(\bar Y_1-\bar X_1)-(\bar Y_2-\bar X_2)]^2\} + E\{(\bar X_1-\bar X)^2\},
\end{aligned}\tag{3.3.3.1}$$
where
$$\bar Y_1 = \frac{1}{n}\sum_{i=1}^{N}U_iY_i, \qquad \bar Y_2 = \frac{1}{n_1}\sum_{i=1}^{N}V_iY_i, \qquad \bar X_1 = \frac{1}{n}\sum_{i=1}^{N}U_iX_i, \qquad \text{and} \qquad \bar X_2 = \bar X_{2t}.$$
These three terms correspond, respectively, to the measurement variance, the bias variance and the sampling variance of true values from the general model.
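In terms of the specific model (3.3.1.5) for $Y_{ij\alpha t}$, the bias variance may be expressed as follows:
$$\begin{aligned}
(BV) &= E\{[(\bar Y_1-\bar X_1)-(\bar Y_2-\bar X_2)]^2\} = E\Big\{\Big[\frac{1}{n}\sum_{i=1}^{N}U_i(Y_i-X_i) - \frac{1}{n_1}\sum_{i=1}^{N}V_i(Y_i-X_i)\Big]^2\Big\}\\
&= \frac{n-n_1}{nn_1}\Big\{\frac{1}{N-1}\sum_{i=1}^{N}(B_i-\bar B)^2\Big\} = \frac{n-n_1}{nn_1}\Big\{\frac{1}{N-1}\sum_{i=1}^{N}L_i^2\Big\}\\
&= \frac{n-n_1}{nn_1}S_B^2, \qquad \text{where } S_B^2 = \frac{1}{N-1}\sum_{i=1}^{N}L_i^2.
\end{aligned}\tag{3.3.3.2}$$
The sampling variance of the true values may be written:
$$(TV) = E\{(\bar X_1-\bar X)^2\} = \frac{1}{n}\Big(\frac{N-n}{N}\Big)\Big\{\frac{1}{N-1}\sum_{i=1}^{N}H_i^2\Big\}. \tag{3.3.3.3}$$
For the measurement variance we have
$$(MV) = E\{[(\bar Y_{1t}-\bar Y_1)-(\bar Y_{2t}-\bar Y_2)]^2\} = E\{(\bar Y_{1t}-\bar Y_1)^2\} + E\{(\bar Y_{2t}-\bar Y_2)^2\} - 2E\{(\bar Y_{1t}-\bar Y_1)(\bar Y_{2t}-\bar Y_2)\}. \tag{3.3.3.4}$$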
We will consider each of these terms separately. Now,
$$\begin{aligned}
E\{(\bar Y_{1t}-\bar Y_1)^2\} &= E\Big\{\Big(\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{B}U_iC_{ij}Y_{ij1t} - \frac{1}{n}\sum_{i=1}^{N}U_iY_i\Big)^2\Big\} = E\Big\{\Big(\frac{1}{n}\sum_{i=1}^{N}U_i\Big[\Big(\sum_{j=1}^{B}C_{ij}Y_{ij1t}\Big)-Y_i\Big]\Big)^2\Big\}\\
&= E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\Big(\sum_{j=1}^{B}C_{ij}Y_{ij1t}\Big)-Y_i\Big]^2\Big\}\\
&\quad+ E\Big\{\frac{1}{n^2}\sum_{i\ne i'}U_iU_{i'}\Big[\Big(\sum_{j=1}^{B}C_{ij}Y_{ij1t}\Big)-Y_i\Big]\Big[\Big(\sum_{j=1}^{B}C_{i'j}Y_{i'j1t}\Big)-Y_{i'}\Big]\Big\}.
\end{aligned}\tag{3.3.3.5}$$
Note that the first term corresponds to the contribution that the initial sample makes to the simple measurement variance term of the general model, and that the last term corresponds to the correlated measurement variance term. Expanding the simple measurement variance term, we have the following:
$$\begin{aligned}
E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\Big(\sum_{j=1}^{B}C_{ij}Y_{ij1t}\Big)-Y_i\Big]^2\Big\} &= E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\sum_{j=1}^{B}\Big(C_{ij}Y_{ij1t}-C_{ij}Y_{ij}+C_{ij}Y_{ij}-\frac{1}{B}Y_{ij}\Big)\Big]^2\Big\}\\
&= E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\sum_{j=1}^{B}C_{ij}(Y_{ij1t}-Y_{ij}) + \sum_{j=1}^{B}\Big(C_{ij}-\frac{1}{B}\Big)Y_{ij}\Big]^2\Big\}\\
&= E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\sum_{j=1}^{B}C_{ij}(Y_{ij1t}-Y_{ij})\Big]^2\Big\} + E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\sum_{j=1}^{B}\Big(C_{ij}-\frac{1}{B}\Big)Y_{ij}\Big]^2\Big\},
\end{aligned}\tag{3.3.3.6}$$
with the expected value of the cross product term being 0.
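Now,
$$\begin{aligned}
E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\sum_{j=1}^{B}C_{ij}(Y_{ij1t}-Y_{ij})\Big]^2\Big\} &= E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\sum_{j=1}^{B}C_{ij}(Y_{ij1t}-Y_{ij})^2 + \sum_{j\ne j'}C_{ij}C_{ij'}(Y_{ij1t}-Y_{ij})(Y_{ij'1t}-Y_{ij'})\Big]\Big\}\\
&= \frac{1}{n^2}\Big(\frac{r}{N}\Big)\sum_{i=1}^{N}\sum_{j=1}^{B}(\xi_j^2+\eta_{ij}^2)\\
&= \frac{1}{n}\Big\{\frac{1}{B}\sum_{j=1}^{B}\xi_j^2\Big\} + \frac{1}{n}\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\}.
\end{aligned}$$
Also,
$$E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\sum_{j=1}^{B}\Big(C_{ij}-\frac{1}{B}\Big)Y_{ij}\Big]^2\Big\} = \frac{1}{n^2}E\Big\{\sum_{i=1}^{N}U_i\Big[\sum_{j=1}^{B}\Big(C_{ij}^2-\frac{2}{B}C_{ij}+\frac{1}{B^2}\Big)Y_{ij}^2 + \sum_{j\ne j'}\Big(C_{ij}C_{ij'}-\frac{C_{ij}}{B}-\frac{C_{ij'}}{B}+\frac{1}{B^2}\Big)Y_{ij}Y_{ij'}\Big]\Big\} \tag{3.3.3.7}$$
$$\begin{aligned}
&= \frac{1}{n^2}\Big\{\frac{r}{N}\sum_{i=1}^{N}\sum_{j=1}^{B}Y_{ij}^2 - \frac{2}{B}\Big(\frac{r}{N}\Big)\sum_{i=1}^{N}\sum_{j=1}^{B}Y_{ij}^2 + \frac{1}{B^2}\Big(\frac{n}{N}\Big)\sum_{i=1}^{N}\sum_{j=1}^{B}Y_{ij}^2\\
&\qquad + 0 - \frac{2}{B}\Big(\frac{r}{N}\Big)\sum_{i=1}^{N}\sum_{j\ne j'}Y_{ij}Y_{ij'} + \frac{1}{B^2}\Big(\frac{n}{N}\Big)\sum_{i=1}^{N}\sum_{j\ne j'}Y_{ij}Y_{ij'}\Big\}\\
&= \frac{1}{n^2}\Big(\frac{r}{N}\Big)\Big\{\sum_{i=1}^{N}\sum_{j=1}^{B}Y_{ij}^2 - B\sum_{i=1}^{N}Y_i^2\Big\}.
\end{aligned}\tag{3.3.3.8}$$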
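Therefore, the simple measurement variance term may be expressed as follows:
$$\begin{aligned}
E\Big\{\frac{1}{n^2}\sum_{i=1}^{N}U_i\Big[\Big(\sum_{j=1}^{B}C_{ij}Y_{ij1t}\Big)-Y_i\Big]^2\Big\} &= \frac{1}{n}\Big\{\frac{1}{B}\sum_{j=1}^{B}\xi_j^2\Big\} + \frac{1}{n}\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\}\\
&\quad+ \frac{1}{n^2}\Big(\frac{r}{N}\Big)\Big\{\sum_{i=1}^{N}\sum_{j=1}^{B}Y_{ij}^2 - B\sum_{i=1}^{N}Y_i^2\Big\}.
\end{aligned}\tag{3.3.3.9}$$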
For the correlated measurement variance term, we have the following expansion:
$$\begin{aligned}
&E\Big\{\frac{1}{n^2}\sum_{i\ne i'}U_iU_{i'}\Big[\Big(\sum_{j=1}^{B}C_{ij}Y_{ij1t}\Big)-Y_i\Big]\Big[\Big(\sum_{j=1}^{B}C_{i'j}Y_{i'j1t}\Big)-Y_{i'}\Big]\Big\}\\
&\quad= E\Big\{\frac{1}{n^2}\sum_{i\ne i'}U_iU_{i'}\Big[\sum_{j=1}^{B}C_{ij}(Y_{ij1t}-Y_{ij}) + \Big(C_{ij}-\frac{1}{B}\Big)Y_{ij}\Big]\Big[\sum_{j'=1}^{B}C_{i'j'}(Y_{i'j'1t}-Y_{i'j'}) + \Big(C_{i'j'}-\frac{1}{B}\Big)Y_{i'j'}\Big]\Big\}.
\end{aligned}$$
We separate the terms with $j = j'$ from those with $j \ne j'$ and substitute $Y_{ij1t}-Y_{ij} = Z_{j1t}+R_{ij1t}$. The terms that are linear in a random effect have expectation zero over trials and are dropped. This gives
$$\begin{aligned}
E\Big\{&\frac{1}{n^2}\sum_{i\ne i'}\sum_{j=1}^{B}U_iU_{i'}C_{ij}C_{i'j}(R_{ij1t}+Z_{j1t})(R_{i'j1t}+Z_{j1t})\\
&+\frac{1}{n^2}\sum_{i\ne i'}\sum_{j=1}^{B}U_iU_{i'}\Big(C_{ij}C_{i'j}-\frac{C_{ij}}{B}-\frac{C_{i'j}}{B}+\frac{1}{B^2}\Big)Y_{ij}Y_{i'j}\\
&+\frac{1}{n^2}\sum_{i\ne i'}\sum_{j\ne j'}U_iU_{i'}C_{ij}C_{i'j'}(R_{ij1t}+Z_{j1t})(R_{i'j'1t}+Z_{j'1t})\\
&+\frac{1}{n^2}\sum_{i\ne i'}\sum_{j\ne j'}U_iU_{i'}\Big(C_{ij}C_{i'j'}-\frac{C_{ij}}{B}-\frac{C_{i'j'}}{B}+\frac{1}{B^2}\Big)Y_{ij}Y_{i'j'}\Big\}.
\end{aligned}\tag{3.3.3.10}$$
Noticing that the sum over the cross product terms which involve both $Z_{j\alpha t}$ and $R_{ij\alpha t}$ is zero, (3.3.3.10) equals
$$\begin{aligned}
&\frac{1}{n^2}\,\frac{r(r-1)}{N(N-1)}\sum_{j=1}^{B}\sum_{i\ne i'}E(R_{ij1t}R_{i'j1t}) + \frac{1}{n^2}\,\frac{r(r-1)}{N(N-1)}\sum_{j=1}^{B}\sum_{i\ne i'}E(Z_{j1t}^2)\\
&\quad+\frac{1}{n^2}\,\frac{r^2}{N(N-1)}\sum_{i\ne i'}\sum_{j\ne j'}\big[E(R_{ij1t}R_{i'j'1t})+E(Z_{j1t}Z_{j'1t})\big]\\
&\quad-\frac{1}{n^2}\,\frac{r(B-1)}{NB(N-1)}\sum_{i\ne i'}\sum_{j=1}^{B}Y_{ij}Y_{i'j} + \frac{1}{n^2}\,\frac{r}{NB(N-1)}\sum_{i\ne i'}\sum_{j\ne j'}Y_{ij}Y_{i'j'}.
\end{aligned}\tag{3.3.3.11}$$
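Since $\sum_{i=1}^{N}R_{ij\alpha t} = 0$ by the definition of $R_{ij\alpha t}$, we have $\sum_{i\ne i'}E(R_{ij1t}R_{i'j1t}) = -\sum_{i=1}^{N}\eta_{ij}^2$. The terms with $j \ne j'$ vanish by assumption 2. Finally, we find that the correlated measurement variance term equals
$$\begin{aligned}
&\frac{1}{n^2}\Big\{r(r-1)\sum_{j=1}^{B}\xi_j^2 - \frac{r(r-1)}{N(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\}\\
&\quad-\frac{1}{n^2}\Big\{\frac{r(B-1)}{NB(N-1)}\sum_{i\ne i'}\sum_{j=1}^{B}Y_{ij}Y_{i'j}\Big\} + \frac{1}{n^2}\Big\{\frac{r}{NB(N-1)}\sum_{i\ne i'}\sum_{j\ne j'}Y_{ij}Y_{i'j'}\Big\}.
\end{aligned}\tag{3.3.3.12}$$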
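Combining the simple measurement variance (3.3.3.9) and the correlated measurement variance (3.3.3.12), we arrive at the following expression for the contribution of the original sample to the measurement variance:
$$\begin{aligned}
&\frac{1}{n^2}\big(r + r(r-1)\big)\Big\{\sum_{j=1}^{B}\xi_j^2\Big\} + \frac{1}{n}\Big(1-\frac{r-1}{N-1}\Big)\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\} + \frac{1}{n^2}\Big(\frac{r}{N}\Big)\Big\{\sum_{i=1}^{N}\sum_{j=1}^{B}Y_{ij}^2 - B\sum_{i=1}^{N}Y_i^2\Big\}\\
&\quad-\frac{1}{n^2}\Big\{\frac{r(B-1)}{NB(N-1)}\sum_{i\ne i'}\sum_{j=1}^{B}Y_{ij}Y_{i'j}\Big\} + \frac{1}{n^2}\Big\{\frac{r}{NB(N-1)}\sum_{i\ne i'}\sum_{j\ne j'}Y_{ij}Y_{i'j'}\Big\}.
\end{aligned}\tag{3.3.3.13}$$
Simplifying this, we have
$$\frac{1}{B}\Big\{\frac{1}{B}\sum_{j=1}^{B}\xi_j^2\Big\} + \frac{1}{n}\Big(\frac{N-r}{N-1}\Big)\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\} + \frac{1}{n}\Big\{\frac{1}{B(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2\Big\} \tag{3.3.3.14}$$
as the final expression for the contribution of the original sample to the measurement variance. The last term in this expression was arrived at by noting that
$$\frac{1}{n}\Big\{\frac{1}{B(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2\Big\} = \frac{1}{n^2}\Big(\frac{r}{N-1}\Big)\Big\{\sum_{i=1}^{N}\sum_{j=1}^{B}(Y_{ij}-\bar Y_{\cdot j})^2 - B\sum_{i=1}^{N}(Y_i-\bar Y)^2\Big\},$$
and that the last three terms of (3.3.3.13) sum to this quantity.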
Proceeding in a similar manner, we find that the contribution of the subsample to the measurement variance, i.e., $E\{(\bar Y_{2t}-\bar Y_2)^2\}$, is equal to
$$\frac{1}{B}\Big\{\frac{1}{B}\sum_{j=1}^{B}\xi_j^2\Big\} + \frac{1}{n_1}\Big(\frac{N-r_1}{N-1}\Big)\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\} + \frac{1}{n_1}\Big\{\frac{1}{B(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2\Big\}. \tag{3.3.3.15}$$
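In the specific case that we are presently considering, we would expect no contribution to the measurement variance to result from the between phase correlation. The following demonstrates that this is indeed the case:
$$\begin{aligned}
E\{(\bar Y_{1t}-\bar Y_1)(\bar Y_{2t}-\bar Y_2)\} &= E\Big\{\Big(\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{B}U_iC_{ij}Y_{ij1t}-\frac{1}{n}\sum_{i=1}^{N}U_iY_i\Big)\Big(\frac{1}{n_1}\sum_{i=1}^{N}\sum_{j=1}^{B}V_iD_{ij}Y_{ij2t}-\frac{1}{n_1}\sum_{i=1}^{N}V_iY_i\Big)\Big\}\\
&= E\Big\{\Big[\frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{B}U_iC_{ij}(Y_{ij1t}-Y_{ij}) + \frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{B}U_i\Big(C_{ij}-\frac{1}{B}\Big)Y_{ij}\Big]\\
&\qquad\times\Big[\frac{1}{n_1}\sum_{i=1}^{N}\sum_{j=1}^{B}V_iD_{ij}(Y_{ij2t}-Y_{ij}) + \frac{1}{n_1}\sum_{i=1}^{N}\sum_{j=1}^{B}V_i\Big(D_{ij}-\frac{1}{B}\Big)Y_{ij}\Big]\Big\}\\
&= E\Big\{\frac{1}{nn_1}\sum_{i=1}^{N}\sum_{i'=1}^{N}\sum_{j=1}^{B}\sum_{j'=1}^{B}U_iV_{i'}C_{ij}D_{i'j'}(R_{ij1t}+Z_{j1t})(R_{i'j'2t}+Z_{j'2t})\\
&\qquad+\frac{1}{nn_1}\sum_{i=1}^{N}\sum_{i'=1}^{N}\sum_{j=1}^{B}\sum_{j'=1}^{B}U_iV_{i'}\Big(C_{ij}-\frac{1}{B}\Big)\Big(D_{i'j'}-\frac{1}{B}\Big)Y_{ij}Y_{i'j'}\Big\}.
\end{aligned}\tag{3.3.3.16}$$
Here, as before, the terms linear in a random effect have been dropped because their expectation is zero. Using assumption 2, and since $\alpha \ne \alpha'$, the first sum has zero expectation, and (3.3.3.16) reduces to
$$\begin{aligned}
\frac{1}{nn_1}\Big[&\sum_{i=1}^{N}\sum_{j=1}^{B}Y_{ij}^2\,E\Big\{U_iV_i\Big(C_{ij}-\frac{1}{B}\Big)\Big(D_{ij}-\frac{1}{B}\Big)\Big\} + \sum_{i=1}^{N}\sum_{j\ne j'}Y_{ij}Y_{ij'}\,E\Big\{U_iV_i\Big(C_{ij}-\frac{1}{B}\Big)\Big(D_{ij'}-\frac{1}{B}\Big)\Big\}\\
&+\sum_{i\ne i'}\sum_{j=1}^{B}Y_{ij}Y_{i'j}\,E\Big\{U_iV_{i'}\Big(C_{ij}-\frac{1}{B}\Big)\Big(D_{i'j}-\frac{1}{B}\Big)\Big\} + \sum_{i\ne i'}\sum_{j\ne j'}Y_{ij}Y_{i'j'}\,E\Big\{U_iV_{i'}\Big(C_{ij}-\frac{1}{B}\Big)\Big(D_{i'j'}-\frac{1}{B}\Big)\Big\}\Big],
\end{aligned}\tag{3.3.3.17}$$
which in turn equals 0. This is because the second-phase assignment is made at random, independently of the first-phase assignment, so that $E\{D_{i'j'} - \frac{1}{B} \mid V_{i'} = 1, C\} = 0$ for every $i'$ and $j'$.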
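Considering all the above results, we arrive at the following expression for the variance of $Z_t$:
$$\begin{aligned}
V(Z_t) &= \frac{2}{B}\Big\{\frac{1}{B}\sum_{j=1}^{B}\xi_j^2\Big\} + \frac{n(N-r_1)+n_1(N-r)}{nn_1(N-1)}\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\} + \frac{n+n_1}{nn_1}\Big\{\frac{1}{B(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2\Big\}\\
&\quad+ \frac{n-n_1}{nn_1}\Big\{\frac{1}{N-1}\sum_{i=1}^{N}L_i^2\Big\} + \frac{1}{n}\Big(\frac{N-n}{N}\Big)\Big\{\frac{1}{N-1}\sum_{i=1}^{N}H_i^2\Big\}.
\end{aligned}\tag{3.3.3.18}$$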
Thus we see that $V(Z_t)$ is the weighted sum of five variance components, namely,
$S_\xi^2 = \frac{1}{B}\sum_{j=1}^{B}\xi_j^2$ = interviewer random effect variance,
$S_\eta^2 = \frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2$ = interviewer-individual interaction random effect variance,
$S_{IQ}^2 = \frac{1}{B(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2$ = interviewer-individual interaction fixed effect variance,
$S^2 = \frac{1}{N-1}\sum_{i=1}^{N}H_i^2$ = sampling variance, and
$S_B^2 = \frac{1}{N-1}\sum_{i=1}^{N}L_i^2$ = systematic error variance.
Thus,
$$V(Z_t) = \frac{2}{B}\{S_\xi^2\} + \frac{n(N-r_1)+n_1(N-r)}{nn_1(N-1)}\{S_\eta^2\} + \frac{n+n_1}{nn_1}\{S_{IQ}^2\} + \frac{n-n_1}{nn_1}\{S_B^2\} + \frac{1}{n}\Big(\frac{N-n}{N}\Big)\{S^2\}. \tag{3.3.3.19}$$
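Equation (3.3.3.19) is easy to tabulate for trial designs. The function below is a sketch; its name, argument order and dictionary keys are assumptions. It returns each weighted term for given $N$, $B$, $r$, $r_1$ and component values. The numbers in the example are hypothetical.

```python
def var_z_interviewer(N, B, r, r1, s2_xi, s2_eta, s2_iq, s2_b, s2):
    """Weighted terms of V(Z_t) in (3.3.3.19), with n = B*r and n1 = B*r1."""
    n, n1 = B * r, B * r1
    if not 0 < n1 < n <= N:
        raise ValueError("need 0 < B*r1 < B*r <= N")
    return {
        "interviewer": 2 / B * s2_xi,  # interviewer random effect
        "interaction_random": (n * (N - r1) + n1 * (N - r)) / (n * n1 * (N - 1)) * s2_eta,
        "interaction_fixed": (n + n1) / (n * n1) * s2_iq,
        "systematic": (n - n1) / (n * n1) * s2_b,
        "sampling": (N - n) / (n * N) * s2,
    }


# Hypothetical design and component values.
terms = var_z_interviewer(N=1000, B=10, r=20, r1=5, s2_xi=1.0, s2_eta=2.0, s2_iq=3.0, s2_b=4.0, s2=25.0)
for name, value in terms.items():
    print(f"{name:>18}: {value:.4f}")
print(f"{'V(Z_t)':>18}: {sum(terms.values()):.4f}")
```

The weight on $S_\xi^2$ is $2/B$, which does not depend on $n$ or $n_1$. Enlarging the samples while keeping the same interviewers therefore leaves that term unchanged. Under this model, the term is reduced only by using more interviewers.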
3.3.4. Sample Estimators for Variance Components
We will now demonstrate how the above variance components may be estimated using the sample statistics. The subsample variance of the true values, $s_x^2$, is an unbiased estimator for the population variance $S^2$, i.e.,
$$E(s_x^2) = E\Big\{\frac{1}{n_1-1}\sum_{i=1}^{N}[V_i(X_i-\bar X_2)]^2\Big\} = S^2.$$
Consider next the between phase within interviewer within individual sum of squares, BPWII. We have
$$\text{BPWII} = \sum_{i=1}^{N}\sum_{j=1}^{B}U_iV_iC_{ij}D_{ij}(Y_{ij1t}-Y_{ij2t})^2 = \sum_{i=1}^{N}\sum_{j=1}^{B}U_iV_iC_{ij}D_{ij}[(Z_{j1t}+R_{ij1t})-(Z_{j2t}+R_{ij2t})]^2. \tag{3.3.4.1}$$
Taking the expected value of this quantity, we have
$$\begin{aligned}
E\{\text{BPWII}\} &= \frac{r_1}{n}\Big(\frac{r}{N}\Big)\sum_{i=1}^{N}\sum_{j=1}^{B}E[(Z_{j1t}+R_{ij1t})-(Z_{j2t}+R_{ij2t})]^2 = \frac{r_1}{n}\Big(\frac{r}{N}\Big)\sum_{i=1}^{N}\sum_{j=1}^{B}2(\xi_j^2+\eta_{ij}^2)\\
&= 2r_1\Big\{\frac{1}{B}\sum_{j=1}^{B}\xi_j^2\Big\} + 2r_1\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\}.
\end{aligned}\tag{3.3.4.2}$$
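Looking next at the between phase within interviewer between individual sum of squares, BPWIBI, we have
$$\begin{aligned}
\text{BPWIBI} &= \sum_{i\ne i'}\sum_{j=1}^{B}U_iU_{i'}V_iV_{i'}C_{ij}C_{i'j}D_{ij}D_{i'j}[(Y_{ij1t}-Y_{ij2t})-(Y_{i'j1t}-Y_{i'j2t})]^2\\
&= \sum_{i\ne i'}\sum_{j=1}^{B}U_iU_{i'}V_iV_{i'}C_{ij}C_{i'j}D_{ij}D_{i'j}[(R_{ij1t}-R_{ij2t})-(R_{i'j1t}-R_{i'j2t})]^2.
\end{aligned}\tag{3.3.4.3}$$
The expected value of this quantity results in the following:
$$\begin{aligned}
E\{\text{BPWIBI}\} &= \frac{r(r-1)}{n(n-1)}\cdot\frac{r_1(r_1-1)}{N(N-1)}\sum_{i\ne i'}\sum_{j=1}^{B}E\{[(R_{ij1t}-R_{ij2t})-(R_{i'j1t}-R_{i'j2t})]^2\}\\
&= \frac{r(r-1)}{n(n-1)}\cdot\frac{r_1(r_1-1)}{N(N-1)}\,4N\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2 = \frac{4Nr_1(r_1-1)(r-1)}{(n-1)(N-1)}S_\eta^2,
\end{aligned}\tag{3.3.4.4}$$
using, for each phase and each $j$, $\sum_{i\ne i'}E(R_{ij\alpha t}R_{i'j\alpha t}) = -\sum_{i=1}^{N}\eta_{ij}^2$, which follows from $\sum_{i=1}^{N}R_{ij\alpha t} = 0$.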
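For the within phase within interviewer between individual sum of squares, WPWIBI, we have
$$\begin{aligned}
\text{WPWIBI} &= \sum_{i\ne i'}\sum_{j=1}^{B}U_iU_{i'}C_{ij}C_{i'j}[Y_{ij1t}-Y_{i'j1t}]^2\\
&= \sum_{i\ne i'}\sum_{j=1}^{B}U_iU_{i'}C_{ij}C_{i'j}[(H_i-H_{i'}) + (L_i-L_{i'}) + ((IQ)_{ij}-(IQ)_{i'j}) + (R_{ij1t}-R_{i'j1t})]^2.
\end{aligned}\tag{3.3.4.5}$$
Taking the expected value of this quantity, we have
$$\begin{aligned}
E(\text{WPWIBI}) &= \frac{r}{N}\Big(\frac{r-1}{N-1}\Big)\Big[\sum_{j=1}^{B}\sum_{i\ne i'}(H_i-H_{i'})^2 + \sum_{j=1}^{B}\sum_{i\ne i'}(L_i-L_{i'})^2 + \sum_{j=1}^{B}\sum_{i\ne i'}[(IQ)_{ij}-(IQ)_{i'j}]^2 + \sum_{j=1}^{B}\sum_{i\ne i'}E(R_{ij1t}-R_{i'j1t})^2\Big]\\
&= \frac{r}{N}\Big(\frac{r-1}{N-1}\Big)\Big[\sum_{j=1}^{B}2N\sum_{i=1}^{N}H_i^2 + \sum_{j=1}^{B}2N\sum_{i=1}^{N}L_i^2 + 2N\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2 + 2N\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big]\\
&= 2Br(r-1)\Big\{\frac{1}{N-1}\sum_{i=1}^{N}H_i^2\Big\} + 2Br(r-1)\Big\{\frac{1}{N-1}\sum_{i=1}^{N}L_i^2\Big\}\\
&\quad+ 2Br(r-1)\Big\{\frac{1}{B(N-1)}\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2\Big\} + 2Br(r-1)\Big(\frac{N}{N-1}\Big)\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\}\\
&= 2Br(r-1)\Big\{S^2 + S_B^2 + S_{IQ}^2 + \frac{N}{N-1}S_\eta^2\Big\}.
\end{aligned}\tag{3.3.4.6}$$
Here $\sum_{i\ne i'}(H_i-H_{i'})^2 = 2N\sum_i H_i^2$ because $\sum_i H_i = 0$. Analogous results hold for the $L_i$ and, for each $j$, for the $(IQ)_{ij}$. Also $\sum_{i\ne i'}E(R_{ij1t}-R_{i'j1t})^2 = 2N\sum_i\eta_{ij}^2$ because $\sum_i R_{ij1t} = 0$. The cross-product terms involving $(IQ)_{ij}$ or $R_{ij1t}$ vanish identically or in expectation. The derivation takes the remaining term, $2NB\sum_i H_iL_i$, to be zero.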
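Considering next the between phase between interviewer between individual sum of squares, BPBIBI, we have
$$\begin{aligned}
\text{BPBIBI} &= \sum_{i\ne i'}\sum_{j\ne j'}U_iU_{i'}V_iV_{i'}C_{ij}C_{i'j}D_{ij'}D_{i'j'}[(Y_{ij1t}-Y_{i'j1t})-(Y_{ij'2t}-Y_{i'j'2t})]^2\\
&= \sum_{i\ne i'}\sum_{j\ne j'}U_iU_{i'}V_iV_{i'}C_{ij}C_{i'j}D_{ij'}D_{i'j'}\big[((IQ)_{ij}-(IQ)_{i'j}-(IQ)_{ij'}+(IQ)_{i'j'}) + (R_{ij1t}-R_{i'j1t}) - (R_{ij'2t}-R_{i'j'2t})\big]^2.
\end{aligned}\tag{3.3.4.7}$$
And,
$$\begin{aligned}
E(\text{BPBIBI}) &= \frac{r(r-1)}{n(n-1)}\cdot\frac{r_1(r_1-1)}{N(N-1)}\Big[4NB\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2 + 4N(B-1)\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big]\\
&= \frac{4(r-1)r_1(r_1-1)}{n-1}\Big\{\frac{1}{N-1}\sum_{i=1}^{N}\sum_{j=1}^{B}(IQ)_{ij}^2\Big\} + \frac{4(r-1)r_1(r_1-1)(B-1)}{n-1}\Big(\frac{N}{N-1}\Big)\Big\{\frac{1}{NB}\sum_{i=1}^{N}\sum_{j=1}^{B}\eta_{ij}^2\Big\}\\
&= \frac{4(r-1)r_1(r_1-1)}{n-1}\Big\{BS_{IQ}^2 + \frac{(B-1)N}{N-1}S_\eta^2\Big\}.
\end{aligned}\tag{3.3.4.8}$$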
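Using the above results, the following estimators for the variance components of $V(Z_t)$ are constructed:
$$\begin{aligned}
\hat S^2 &= s_x^2,\\
\hat S_\eta^2 &= \frac{(n-1)(N-1)}{4Nr_1(r_1-1)(r-1)}\,\text{BPWIBI},\\
\hat S_\xi^2 &= \frac{\text{BPWII}}{2r_1} - \hat S_\eta^2,\\
\hat S_{IQ}^2 &= \frac{(n-1)\,\text{BPBIBI}}{4(r-1)r_1(r_1-1)B} - \frac{B-1}{B}\Big(\frac{N}{N-1}\Big)\hat S_\eta^2,\\
\hat S_B^2 &= \frac{\text{WPWIBI}}{2Br(r-1)} - \hat S^2 - \hat S_{IQ}^2 - \frac{N}{N-1}\hat S_\eta^2.
\end{aligned}\tag{3.3.4.9}$$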
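The estimators (3.3.4.9) are simple functions of $s_x^2$ and the four sums of squares. In the sketch below, the function names are hypothetical. `interviewer_components` applies (3.3.4.9). `expected_sums_of_squares` evaluates (3.3.4.2), (3.3.4.4), (3.3.4.6) and (3.3.4.8) for chosen component values. A round trip then confirms the algebra: feeding in the expected sums of squares returns the components, up to rounding. In practice the sums of squares would be computed from survey data.

```python
def interviewer_components(s2_x, bpwii, bpwibi, wpwibi, bpbibi, N, B, r, r1):
    """Estimators (3.3.4.9) from s_x^2 and the four sums of squares (hypothetical helper).

    Requires r >= 2 and r1 >= 2 so that the between-individual sums of squares exist.
    """
    if r < 2 or r1 < 2:
        raise ValueError("need r >= 2 and r1 >= 2")
    n = B * r
    s2_eta = (n - 1) * (N - 1) * bpwibi / (4 * N * r1 * (r1 - 1) * (r - 1))
    s2_xi = bpwii / (2 * r1) - s2_eta
    s2_iq = (n - 1) * bpbibi / (4 * (r - 1) * r1 * (r1 - 1) * B) - (B - 1) / B * N / (N - 1) * s2_eta
    s2_b = wpwibi / (2 * B * r * (r - 1)) - s2_x - s2_iq - N / (N - 1) * s2_eta
    return {"S2": s2_x, "S2_xi": s2_xi, "S2_eta": s2_eta, "S2_IQ": s2_iq, "S2_B": s2_b}


def expected_sums_of_squares(N, B, r, r1, S2, S2_xi, S2_eta, S2_IQ, S2_B):
    """Expected values (3.3.4.2), (3.3.4.4), (3.3.4.6), (3.3.4.8) for given components."""
    n = B * r
    bpwii = 2 * r1 * (S2_xi + S2_eta)
    bpwibi = 4 * N * r1 * (r1 - 1) * (r - 1) / ((n - 1) * (N - 1)) * S2_eta
    wpwibi = 2 * B * r * (r - 1) * (S2 + S2_B + S2_IQ + N / (N - 1) * S2_eta)
    bpbibi = 4 * (r - 1) * r1 * (r1 - 1) / (n - 1) * (B * S2_IQ + (B - 1) * N / (N - 1) * S2_eta)
    return bpwii, bpwibi, wpwibi, bpbibi


# Round trip with hypothetical components: expected sums of squares -> (3.3.4.9) -> components.
truth = dict(S2=25.0, S2_xi=1.0, S2_eta=2.0, S2_IQ=3.0, S2_B=4.0)
ss = expected_sums_of_squares(1000, 10, 20, 5, **truth)
est = interviewer_components(truth["S2"], *ss, N=1000, B=10, r=20, r1=5)
print(est)
```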
3.4. Using the DSS Model to Evaluate a Measurement Process
Of what value are the DSS model and the information derived from it to a person engaged in survey research? A particular measurement process may be viewed as consisting of three kinds of elements:
- certain tools for making measurements, e.g., questionnaires;
- instructions as to the actions to be carried out during the measurement process, e.g., self-enumeration or the use of interviewers;
- specifications of the environment under which the measurements are to be made.

We may then examine how these elements affect the measurements obtained. For example, different sets of interviewers may have received different amounts of training. By getting estimates of the variance components resulting from interviewer effects for various sets of interviewers, we can evaluate the effect of the kind and amount of interviewer training on the quality of the survey estimates.
It has long been recognized that the way in which questions are phrased influences the responses that are obtained. We can look at particular kinds of questionnaires and ask, "What is the best type of questionnaire for use in a self-enumeration survey?" or, alternatively, "Does a particular type of questionnaire reduce the effects that interviewers have on measurement errors?"
Before conducting a large-scale survey, the survey researcher may suspect that the measurement process (in the sense defined above) influences the measurements obtained. Such a researcher may use several different measurement processes in pilot studies to determine which has the smallest variable errors, i.e., (MV) and (BV). Although the model is framed in terms of eliminating the measurement process bias, information is also obtained on the magnitude of this bias. If pilot work indicates that the bias is small, one may wish to use the Koch-CBM mean square error model, as described in section 2.3.1, for conducting the survey.
Even if extensive pilot work is not possible, one may use the process which one believes to be the best and allocate some portion of the funds for choosing a subsample and evaluating the process. Publication of the results on the errors associated with the measurement process will allow the accumulation, over time, of a body of knowledge about the best methods of obtaining survey data.
CHAPTER IV
USE OF DSS MODEL
OPTIMUM SIZES FOR ORIGINAL SAMPLE AND SUBSAMPLE
IN SELF-ENUMERATION CASE
MODIFICATION OF DSS MODEL FOR ELIMINATING BIAS ONLY
PREFERRED SAMPLING SCHEME FOR FIXED COST AMONG DSS,
KOCH-CBM, OR MEASURING TRUE VALUES ONLY
TABLES
4.1.
Introduction
In this chapter we will examine several issues that must be considered when one is deciding whether or not to use the DSS model for conducting a survey. For simplicity we will assume that the survey is a self-enumeration survey. A researcher who uses the DSS model for conducting a survey must choose values for the sample sizes of the original sample and subsample. In the next section, optimum values of $n$ and $n_1$ are given which minimize $V(\bar z_t)$ for fixed cost or which minimize the cost for fixed $V(\bar z_t)$. A cost function is used which reflects the relative cost of the faulty measurement process versus the error-free measurement process.
The DSS model as formulated in Chapter II and Chapter III allows one to estimate the specific components of the variance, i.e., (MV), (BV) and (TV), as well as to eliminate the bias due to the measurement process. In some instances one may not be interested in estimating the variance components and may wish to eliminate only the measurement process bias. In this case it is not necessary to repeat the inaccurate measurement process for the subsample; only true values need be obtained for the individuals in the subsample. In section 4.3 the DSS model is modified to treat this case, and the optimum values for $n$ and $n_1$ are given for the modified DSS model in a self-enumeration survey.
Finally, we note that when one is considering using the DSS model to conduct a survey there are two limiting cases of interest.
First, the entire survey could be conducted using the faulty measurement process. If we assume that obtaining values with measurement errors costs less than obtaining the true values, then, for a fixed cost, we can take larger samples if only the faulty measurement process is used. When one conducts the survey entirely by means of the faulty measurement process, the Koch-CBM mean square error model as given in section 2.3.1 applies. Given the large sample sizes that are possible with the Koch-CBM model, $MSE(\bar y_t)$ may be less than $V(\bar z_t)$.
In the second limiting case of interest, the survey is conducted using only the error-free measurement process. We will call this the true value sampling scheme, TVSS. Although for fixed cost much smaller samples would be possible under the TVSS, since the cost per observation is higher, the variance of the estimate, $V(\bar x)$, may be smaller than $V(\bar z_t)$ in certain situations. Section 4.4 examines the conditions under which the DSS, the Koch-CBM, or the TVSS will be the preferred procedure.
4.2. Optimum Sizes for the Initial Sample and the Subsample in a Self-Enumeration Survey
In order to choose the sample sizes for the initial sample and the subsample in a self-enumeration survey, one must consider the relative cost of obtaining measurements by the faulty measurement process versus the error-free measurement process. We will use the following cost function to reflect this relative cost. Suppose that the cost of obtaining the $Y_{i\alpha t}$ is $c_1$ and the cost of obtaining the true values $X_i$ is $c_2$; then the total cost $C$ of conducting the survey using the DSS model is

$$C = nc_1 + n_1(c_1 + c_2), \tag{4.2.1}$$

where $n$ is the size of the initial sample and $n_1$ is the size of the subsample (each subsample member is measured again by the faulty process and also has its true value obtained). The optimum values for $n$ and $n_1$ are those which minimize $V(\bar z_t)$ for a fixed cost $C$.
Alternately, we may wish to minimize $C$ for a fixed variance $V(\bar z_t)$.
Recall that for a self-enumeration survey, as demonstrated in section 3.2.3,

$$V(\bar z_t) = \frac{n+n_1}{nn_1}S_y^2 + \frac{n-n_1}{nn_1}S_B^2 + \frac{1}{n}\left(\frac{N-n}{N}\right)S^2.$$
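If we let

$$c_3 = c_1 + c_2, \tag{4.2.2}$$

then

$$C = nc_1 + n_1c_3. \tag{4.2.3}$$

Using the technique of Lagrangian multipliers to determine the optimum values of $n$ and $n_1$, we minimize the function

$$F = V(\bar z_t) + \lambda(nc_1 + n_1c_3 - C), \tag{4.2.4}$$

i.e.,

$$F = \frac{n+n_1}{nn_1}S_y^2 + \frac{n-n_1}{nn_1}S_B^2 + \frac{1}{n}\left(\frac{N-n}{N}\right)S^2 + \lambda(nc_1 + n_1c_3 - C). \tag{4.2.5}$$

Let

$$k_1 = S^2 + S_y^2 - S_B^2 \quad\text{and}\quad k_2 = S_y^2 + S_B^2,$$

so that $V(\bar z_t) = k_1/n + k_2/n_1 - S^2/N$. Thus

$$F = \frac{k_1}{n} + \frac{k_2}{n_1} - \frac{S^2}{N} + \lambda nc_1 + \lambda n_1c_3 - \lambda C. \tag{4.2.6}$$

Setting the partial derivatives of $F$ with respect to $n$ and $n_1$ equal to zero gives

$$-\frac{k_1}{n^2} + \lambda c_1 = 0, \qquad -\frac{k_2}{n_1^2} + \lambda c_3 = 0. \tag{4.2.7}$$

This gives the following optimum for $n$ and $n_1$:

$$n = \left(\frac{1}{\lambda}\right)^{1/2}\left(\frac{k_1}{c_1}\right)^{1/2} \tag{4.2.8}$$

and

$$n_1 = \left(\frac{1}{\lambda}\right)^{1/2}\left(\frac{k_2}{c_3}\right)^{1/2}. \tag{4.2.9}$$

If we let $n_t = n + n_1$, we have

$$\frac{n}{n_t} = \frac{(k_1/c_1)^{1/2}}{(k_1/c_1)^{1/2} + (k_2/c_3)^{1/2}} \tag{4.2.10}$$

and

$$\frac{n_1}{n_t} = \frac{(k_2/c_3)^{1/2}}{(k_1/c_1)^{1/2} + (k_2/c_3)^{1/2}}. \tag{4.2.11}$$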
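As a numerical sanity check on the Lagrangian solution, the Python sketch below (standard library only) compares the closed-form allocation implied by (4.2.8)-(4.2.11) under the budget (4.2.3) with a brute-force search over whole-number subsample sizes. The inputs $S^2 = 1$, $S_y^2 = 0.05$, $S_B^2 = 0.03$, $c_1 = 1$, $c_3 = 3$, $C = 1000$ are illustrative, the function names are hypothetical, and the search ignores the $n_1 \le n$ restriction, which does not bind for these inputs.

```python
import math


def closed_form_allocation(C, c1, c3, k1, k2):
    """(n, n1) from (4.2.8)-(4.2.9), with lambda eliminated via C = n*c1 + n1*c3."""
    denom = math.sqrt(k1 * c1) + math.sqrt(k2 * c3)
    return C * math.sqrt(k1 / c1) / denom, C * math.sqrt(k2 / c3) / denom


def brute_force_n1(C, c1, c3, k1, k2):
    """Whole-number n1 minimising k1/n + k2/n1 when the rest of the budget goes to n."""
    best_n1, best_value = None, math.inf
    n1 = 1
    while n1 * c3 < C:
        n = (C - n1 * c3) / c1          # budget left over for the initial sample
        value = k1 / n + k2 / n1        # V(z_t) + S^2/N in the k-notation
        if value < best_value:
            best_n1, best_value = n1, value
        n1 += 1
    return best_n1


S2, SY2, SB2 = 1.0, 0.05, 0.03
k1, k2 = S2 + SY2 - SB2, SY2 + SB2
n, n1 = closed_form_allocation(1000, 1, 3, k1, k2)
print(round(n, 1), round(n1, 1), brute_force_n1(1000, 1, 3, k1, k2))
```

Because the objective is convex in $n_1$ along the budget line, the best whole number found by the search should sit within one unit of the continuous optimum.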
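Using the above we find that:
1. For fixed cost $C$ the optimum values of $n$, $n_1$ for minimizing $V(\bar z_t)$ are

$$n = \frac{C\,(k_1/c_1)^{1/2}}{(k_1c_1)^{1/2} + (k_2c_3)^{1/2}} \tag{4.2.12}$$

and

$$n_1 = \frac{C\,(k_2/c_3)^{1/2}}{(k_1c_1)^{1/2} + (k_2c_3)^{1/2}}. \tag{4.2.13}$$

2. For fixed variance $V(\bar z_t)$ the optimum values of $n$, $n_1$ for minimizing the cost $C$ are

$$n = \frac{k_1 + (k_1k_2c_3/c_1)^{1/2}}{V(\bar z_t) + S^2/N} \tag{4.2.14}$$

and

$$n_1 = \frac{k_2 + (k_1k_2c_1/c_3)^{1/2}}{V(\bar z_t) + S^2/N}. \tag{4.2.15}$$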
Since we require that $n_1 \le n$ for the DSS model, if equations (4.2.12), (4.2.13) or (4.2.14), (4.2.15) result in $n_1 > n$, then we set $n_1 = n$, where:
1. For fixed cost $C$,

$$n = n_1 = \frac{C}{c_1 + c_3}. \tag{4.2.16}$$

2. For fixed variance $V(\bar z_t)$,

$$n = n_1 = \frac{k_1 + k_2}{V(\bar z_t) + S^2/N}. \tag{4.2.17}$$

4.2.1. Illustrative Values
The following tables illustrate the optimum values of $n$ and $n_1$ for various ratios of the variance components $S^2$, $S_y^2$ and $S_B^2$. The tables for fixed cost are constructed as follows. Let $c_1$, the cost of obtaining the $Y_{i\alpha t}$, be taken as $M$ standard units of cost. Then

$$K = \frac{c_2}{c_1}$$

is the cost of obtaining the true values relative to the cost of obtaining the faulty values, so that $c_2 = KM$. Let the total amount of money available for conducting the survey be fixed such that

$$C = 1000M = 1000c_1 \text{ standard units of cost}. \tag{4.2.18}$$
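The fixed-cost recipe, (4.2.12)-(4.2.13) with the $n_1 = n$ fallback (4.2.16), can be sketched in Python as follows. The function name and argument order are our own; costs are expressed in units of $M = c_1$, so $c_2 = K$ and $C = 1000$ as in (4.2.18), and $S^2$ is set to 1 so that the variance ratios can be passed directly. Rounding to whole units is our assumption about how the tabled values were obtained; for the two cells used below it reproduces the corresponding entries of Table 4.2.1.

```python
import math


def dss_optimum_fixed_cost(C, c1, c2, s2, sy2, sb2):
    """Optimum (n, n1) for the DSS model in a self-enumeration survey at fixed cost C.

    Implements (4.2.12)-(4.2.13) with c3 = c1 + c2 and, if n1 > n, the
    fallback n = n1 = C / (c1 + c3) of (4.2.16). Sizes are returned unrounded.
    """
    c3 = c1 + c2                      # a subsample unit is remeasured and verified
    k1 = s2 + sy2 - sb2               # coefficient of 1/n in V(z_t)
    k2 = sy2 + sb2                    # coefficient of 1/n1 in V(z_t)
    denom = math.sqrt(k1 * c1) + math.sqrt(k2 * c3)
    n = C * math.sqrt(k1 / c1) / denom
    n1 = C * math.sqrt(k2 / c3) / denom
    if n1 > n:                        # the DSS model requires n1 <= n
        n = n1 = C / (c1 + c3)
    return n, n1


# Table 4.2.1 (S_B^2/S^2 = 0.03, C = 1000M): first column of the first block,
# last column of the last block.
for K, sy2 in [(2, 0.05), (100, 1.0)]:
    n, n1 = dss_optimum_fixed_cost(C=1000, c1=1, c2=K, s2=1.0, sy2=sy2, sb2=0.03)
    print(K, sy2, round(n), round(n1))
```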
Table 4.2.1 and Table 4.2.2 show the optimum values of $n$ and $n_1$ for various values of $K$ and various values of the ratios $S_y^2/S^2$ and $S_B^2/S^2$. As the cost of obtaining the true values increases, the total number of units in both the sample and the subsample declines. In addition, the ratio of the subsample size to the initial sample size, $n_1/n$, decreases with increasing $K$. As the simple measurement variance $S_y^2$ and the variance of the bias terms $S_B^2$ increase relative to $S^2$, the ratio of $n_1$ to $n$ increases. This is because increases in $S_y^2$ and $S_B^2$ have a greater relative effect on $k_2$ than on $k_1$. The behaviour of the ratio follows from (4.2.12) and (4.2.13), which give $n_1/n = \{(k_2/k_1)/(1+K)\}^{1/2}$, since $c_3/c_1 = 1 + K$.

Table 4.2.1
Optimum Values for n and n1 for Fixed Cost
S_B^2/S^2 = 0.03, C = 1000M

K                        2       4      10      20      40      50     100

S_y^2/S^2 = 0.05 (k2/k1 = 0.078)
  n                    673     615     518     438     358     333     262
  n1                   109      77      44      27      16      13       7
  n1/n               0.162   0.125   0.085   0.062   0.045   0.039   0.027

S_y^2/S^2 = 0.25 (k2/k1 = 0.230)
  n                    546     483     386     313     246     226     172
  n1                   151     103      56      33      18      15       8
  n1/n               0.276   0.213   0.145   0.105   0.073   0.066   0.047

S_y^2/S^2 = 0.5 (k2/k1 = 0.361)
  n                    490     427     334     266     206     189     142
  n1                   170     115      61      35      19      16       8
  n1/n               0.347   0.269   0.183   0.132   0.092   0.085   0.056

S_y^2/S^2 = 1.0 (k2/k1 = 0.523)
  n                    444     382     294     232     178     162     121
  n1                   185     124      64      37      20      16       9
  n1/n               0.417   0.324   0.218   0.159   0.112   0.099   0.074

Table 4.2.2
Optimum Values for n and n1 for Fixed Cost
S_B^2/S^2 = 0.3, C = 1000M

K                        2       4      10      20      40      50     100

S_y^2/S^2 = 0.05 (k2/k1 = 0.467)
  n                    458     396     306     242     186     170     127
  n1                   181     121      63      36      20      16       9
  n1/n               0.395   0.306   0.206   0.149   0.108   0.094   0.071

S_y^2/S^2 = 0.25 (k2/k1 = 0.579)
  n                    431     370     284     223     170     155     116
  n1                   190     126      65      37      20      17       9
  n1/n               0.441   0.341   0.229   0.166   0.118   0.110   0.078

S_y^2/S^2 = 0.5 (k2/k1 = 0.667)
  n                    414     354     270     211     161     146     109
  n1                   195     129      66      38      20      17       9
  n1/n               0.471   0.364   0.244   0.180   0.124   0.116   0.083

S_y^2/S^2 = 1.0 (k2/k1 = 0.765)
  n                    396     338     256     200     152     138     102
  n1                   201     132      68      38      21      17       9
  n1/n               0.505   0.391   0.266   0.190   0.138   0.123   0.088
The tables for fixed variance are constructed as follows. Let $V(\bar z_t)$ be fixed such that

$$V(\bar z_t) = V_0 = \frac{S^2}{50}. \tag{4.2.19}$$

As before we let $c_1 = M$ monetary units and $K = c_2/c_1$.

Table 4.2.3 and Table 4.2.4 show the optimum values for $n$ and $n_1$ as $S_y^2$ and $S_B^2$ vary with respect to $S^2$. Exactly the same trends occur for the ratio of $n_1$ to $n$; however, the sum of the sample sizes increases with increased relative cost $K$. Note that, with relative cost $K = 2$, the minimum cost for which we can assure that $V(\bar z_t) = S^2/50$ is only $112M$ when $S_y^2/S^2 = 0.05$ and $S_B^2/S^2 = 0.03$, whereas it is $538M$ when $S_y^2/S^2 = 1.0$ and $S_B^2/S^2 = 0.3$ (by (4.2.1), $76 + 3 \times 12 = 112$ and $214 + 3 \times 108 = 538$ in units of $M$). This illustrates the necessity of keeping measurement errors small.¹

Table 4.2.3
Optimum Values of n and n1 for Fixed Variance
S_B^2/S^2 = 0.03, V(z_t) = S^2/50

K                        2       4      10      20      40      50     100

S_y^2/S^2 = 0.05 (k2/k1 = 0.078)
  n                     76      83      98     117     143     153     195
  n1                    12      10       8       7       6       6       5
  n1/n               0.158   0.120   0.082   0.060   0.042   0.039   0.026

S_y^2/S^2 = 0.25 (k2/k1 = 0.230)
  n                    112     127     158     195     249     270     355
  n1                    31      27      23      20      19      18      17
  n1/n               0.277   0.213   0.146   0.103   0.076   0.067   0.048

S_y^2/S^2 = 0.5 (k2/k1 = 0.361)
  n                    150     172     220     276     356     389     517
  n1                    52      45      40      36      33      33      31
  n1/n               0.347   0.262   0.182   0.130   0.093   0.088   0.060

S_y^2/S^2 = 1.0 (k2/k1 = 0.523)
  n                    221     257     334     424     553     605     812
  n1                    92      83      73      67      63      61      59
  n1/n               0.416   0.323   0.219   0.158   0.114   0.101   0.073

Table 4.2.4
Optimum Values of n and n1 for Fixed Variance
S_B^2/S^2 = 0.3, V(z_t) = S^2/50

K                        2       4      10      20      40      50     100

S_y^2/S^2 = 0.05 (k2/k1 = 0.467)
  n                     82      95     122     155     201     220     295
  n1                    32      29      25      23      21      21      20
  n1/n               0.390   0.305   0.205   0.148   0.104   0.095   0.068

S_y^2/S^2 = 0.25 (k2/k1 = 0.579)
  n                    110     128     167     213     279     306     411
  n1                    48      44      38      35      33      33      31
  n1/n               0.436   0.344   0.228   0.164   0.118   0.108   0.075

S_y^2/S^2 = 0.5 (k2/k1 = 0.667)
  n                    145     170     222     285     374     410     552
  n1                    68      62      55      51      48      47      45
  n1/n               0.469   0.365   0.248   0.179   0.128   0.115   0.082

S_y^2/S^2 = 1.0 (k2/k1 = 0.765)
  n                    214     252     332     426     562     617     834
  n1                   108      98      87      81      77      75      72
  n1/n               0.505   0.388   0.262   0.190   0.137   0.122   0.086
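Similarly, the fixed-variance recipe (4.2.14)-(4.2.15), with the fallback (4.2.17), can be sketched in Python to recover the two minimum costs quoted for $K = 2$. The function names are hypothetical; $N$ is treated as infinite by default, $S^2 = 1$, costs are in units of $M$, and the cost of a design is evaluated with whole-number sample sizes using (4.2.1).

```python
import math


def dss_optimum_fixed_variance(v0, c1, c2, s2, sy2, sb2, N=math.inf):
    """Optimum (n, n1) minimising cost subject to V(z_t) = v0; see (4.2.14)-(4.2.17)."""
    c3 = c1 + c2
    k1 = s2 + sy2 - sb2
    k2 = sy2 + sb2
    target = v0 + (0.0 if math.isinf(N) else s2 / N)   # V(z_t) + S^2/N
    n = (k1 + math.sqrt(k1 * k2 * c3 / c1)) / target
    n1 = (k2 + math.sqrt(k1 * k2 * c1 / c3)) / target
    if n1 > n:                                          # enforce n1 <= n
        n = n1 = (k1 + k2) / target
    return n, n1


def design_cost(n, n1, c1, c2):
    """Total cost (4.2.1) of a design after rounding to whole sample sizes."""
    return round(n) * c1 + round(n1) * (c1 + c2)


# K = 2 and V(z_t) = S^2/50: the two extreme cases of Tables 4.2.3 and 4.2.4.
for sy2, sb2 in [(0.05, 0.03), (1.0, 0.3)]:
    n, n1 = dss_optimum_fixed_variance(1 / 50, 1, 2, 1.0, sy2, sb2)
    print(round(n), round(n1), design_cost(n, n1, 1, 2))
```

At the unrounded optimum the variance constraint holds exactly, $k_1/n + k_2/n_1 = V_0$, which is a convenient check on any implementation.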
4.3. Modification of the DSS Model to Eliminate Bias Only
The formulation of the DSS model as given in Chapter
II allows one to estimate the variable errors due to the
faulty measurement process as well as eliminate the measurement process bias.
In some situations a survey researcher
may only be interested in eliminating the measurement
process bias.
This may be because he already knows what the variable errors are, or because he wishes to estimate $\bar X$ with minimum total error and is not interested in the components of error.
In this section we present a modification of the
DSS model that allows one to eliminate the bias only.
4.3.1
Modification of the General DSS Model in the Case of
Simple Random Sampling
If we are not interested in estimating the variable components of error for the faulty measurement process, it is not necessary to remeasure the $\mathbf{Y}_{i\alpha t}$. In order to eliminate the bias we need only measure the $\mathbf{X}_i$ for our subsample. The modified DSS model is formulated in the following manner.
¹ In the latter case, i.e., $S_y^2/S^2 = 1.0$ and $S_B^2/S^2 = 0.3$ with $K = 2$, the DSS model would not be the preferred sampling scheme. Preferred sampling schemes are discussed in section 4.4.
The assumptions concerning the population and the $\mathbf{X}_i$ are the same as in section 2.2. Since we are not going to repeat any measurements using the faulty measurement process, we may drop the subscript $\alpha$ on $\mathbf{Y}_{i\alpha t}$. Thus, for each individual we have a $p$-component vector random variable $\mathbf{Y}_{it}$, which is the measurement for the $i$th individual at the $t$th trial of the faulty measurement process.
Again we draw an initial simple random sample and a simple random subsample, which are characterized by the random variables $U_i$ and $V_i$. Relationships (2.2.1), (2.2.2), (2.2.3) and (2.2.4) concerning the $U_i$ and $V_i$ apply. There is no between-phase correlation to be concerned with for the $\mathbf{Y}_{it}$. Also, we will assume that (2.2.5) and (2.2.6) hold for the $\mathbf{Y}_{it}$.
Thus, we measure the $\mathbf{Y}_{it}$ for our original sample and measure the $\mathbf{X}_i$ for a subsample of this original sample. We form our estimator for $\bar{\mathbf{X}}$ as follows:

$$\bar{\mathbf{w}}_t = \bar{\mathbf{y}}_t - (\bar{\mathbf{y}}_{1t} - \bar{\mathbf{x}}_{1t}), \tag{4.3.1.1}$$

where
$\bar{\mathbf{y}}_t = \frac{1}{n}\sum_{i=1}^{N} U_i\mathbf{Y}_{it}$, the average of the elements in the initial sample;
$\bar{\mathbf{y}}_{1t} = \frac{1}{n_1}\sum_{i=1}^{N} V_i\mathbf{Y}_{it}$, the average of the elements of the original sample which are also in the subsample; and
$\bar{\mathbf{x}}_{1t} = \frac{1}{n_1}\sum_{i=1}^{N} V_i\mathbf{X}_i$, the average of the true values for the elements in the subsample.
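It follows that $E(\bar{\mathbf{w}}_t) = \bar{\mathbf{X}}$, since $\bar{\mathbf{y}}_t$ and $\bar{\mathbf{y}}_{1t}$ have the same expectation and $E(\bar{\mathbf{x}}_{1t}) = \bar{\mathbf{X}}$.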
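The unbiasedness of (4.3.1.1) can be checked by simulation. The Python sketch below (using NumPy) assumes a hypothetical scalar population ($p = 1$): true values $X_i \sim N(10, 1)$, individual biases $B_i \sim N(0.5, 0.2^2)$ so that the expected measurement is $Y_i = X_i + B_i$, and an independent $N(0, 0.3^2)$ response error on the single faulty trial. Each replicate draws an initial simple random sample, a simple random subsample of it, and forms both $\bar w_t$ and the faulty-process mean $\bar y_t$. All distributions and sizes are illustrative assumptions, not values from this study.

```python
import numpy as np


def simulate_modified_dss(N=10_000, n=400, n1=80, reps=2_000, seed=1):
    """Average w_t = ybar_t - (ybar_1t - xbar_1t) and ybar_t over repeated samples.

    Hypothetical scalar population: X_i ~ Normal(10, 1); individual bias
    B_i ~ Normal(0.5, 0.2**2), so the expected measurement is Y_i = X_i + B_i;
    a single faulty trial adds an independent Normal(0, 0.3**2) response error.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(10.0, 1.0, N)                  # true values X_i (held fixed)
    y_expected = x + rng.normal(0.5, 0.2, N)      # Y_i = X_i + B_i
    w_est = np.empty(reps)
    naive_est = np.empty(reps)
    for r in range(reps):
        sample = rng.choice(N, size=n, replace=False)        # units with U_i = 1
        y_t = y_expected[sample] + rng.normal(0.0, 0.3, n)   # faulty trial t
        sub = rng.choice(n, size=n1, replace=False)          # V_i = 1, within the sample
        x_1t = x[sample][sub].mean()                         # true values, subsample only
        w_est[r] = y_t.mean() - (y_t[sub].mean() - x_1t)     # estimator (4.3.1.1)
        naive_est[r] = y_t.mean()                            # faulty process alone
    return x.mean(), w_est.mean(), naive_est.mean()


x_bar, w_mean, naive_mean = simulate_modified_dss()
print(f"X-bar {x_bar:.3f}  mean of w_t {w_mean:.3f}  mean of y_t {naive_mean:.3f}")
```

With these settings the average of $\bar w_t$ should land close to $\bar X$, while the average of $\bar y_t$ should sit about 0.5 above it, the net bias $B$.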
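Considering the variance-covariance matrix of $\bar{\mathbf{w}}_t$, we have

$$V(\bar{\mathbf{w}}_t) = E\{(\bar{\mathbf{w}}_t - \bar{\mathbf{X}})(\bar{\mathbf{w}}_t - \bar{\mathbf{X}})'\}. \tag{4.3.1.2}$$

Dropping the subscript $t$ on $\bar{\mathbf{x}}_{1t}$ and adding and subtracting

$$\bar{\mathbf{y}} = \frac{1}{n}\sum_{i=1}^{N} U_i\mathbf{Y}_i \quad\text{and}\quad \bar{\mathbf{y}}_1 = \frac{1}{n_1}\sum_{i=1}^{N} V_i\mathbf{Y}_i$$

gives

$$\begin{aligned}
V(\bar{\mathbf{w}}_t) &= E\{[(\bar{\mathbf{y}}_t - \bar{\mathbf{y}}_{1t}) + (\bar{\mathbf{x}}_1 - \bar{\mathbf{X}})][(\bar{\mathbf{y}}_t - \bar{\mathbf{y}}_{1t}) + (\bar{\mathbf{x}}_1 - \bar{\mathbf{X}})]'\}\\
&= E\{[(\bar{\mathbf{y}}_t - \bar{\mathbf{y}}) - (\bar{\mathbf{y}}_{1t} - \bar{\mathbf{y}}_1)][(\bar{\mathbf{y}}_t - \bar{\mathbf{y}}) - (\bar{\mathbf{y}}_{1t} - \bar{\mathbf{y}}_1)]'\}\\
&\quad + E\{[(\bar{\mathbf{y}} - \bar{\mathbf{x}}) - (\bar{\mathbf{y}}_1 - \bar{\mathbf{x}}_1)][(\bar{\mathbf{y}} - \bar{\mathbf{x}}) - (\bar{\mathbf{y}}_1 - \bar{\mathbf{x}}_1)]'\}\\
&\quad + E\{(\bar{\mathbf{x}} - \bar{\mathbf{X}})(\bar{\mathbf{x}} - \bar{\mathbf{X}})'\},
\end{aligned} \tag{4.3.1.3}$$

where $\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{N} U_i\mathbf{X}_i$.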
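The last three terms in (4.3.1.3) are, respectively, (MV), (BV) and (TV). (BV) and (TV) are identical to the terms in the original DSS model, i.e., as in sections 2.2.2 and 2.2.3. Considering (MV) we have

$$\begin{aligned}
(MV) &= E\{(\bar{\mathbf{y}}_t - \bar{\mathbf{y}})(\bar{\mathbf{y}}_t - \bar{\mathbf{y}})'\} + E\{(\bar{\mathbf{y}}_{1t} - \bar{\mathbf{y}}_1)(\bar{\mathbf{y}}_{1t} - \bar{\mathbf{y}}_1)'\}\\
&\quad - E\{(\bar{\mathbf{y}}_{1t} - \bar{\mathbf{y}}_1)(\bar{\mathbf{y}}_t - \bar{\mathbf{y}})'\} - E\{(\bar{\mathbf{y}}_t - \bar{\mathbf{y}})(\bar{\mathbf{y}}_{1t} - \bar{\mathbf{y}}_1)'\}.
\end{aligned} \tag{4.3.1.4}$$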
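Evaluating (4.3.1.4) in the scalar case shows where the coefficient $(n-n_1)/(nn_1)$ of $S_y^2$ in the self-enumeration variance comes from. The derivation below is a sketch under our own simplifying assumptions (response deviations $Y_{it} - Y_i$ uncorrelated across individuals with common variance $S_y^2$, $N$ large). The key step is that the two sample means share the $n_1$ subsample members ($V_i = 1$ implies $U_i = 1$), so the cross terms do not vanish:

```latex
% Scalar sketch of (4.3.1.4); assumes uncorrelated response deviations with
% common variance S_y^2 and N large (our simplification).
\begin{aligned}
E(\bar y_t-\bar y)^2 &= \frac{S_y^2}{n}, \qquad
E(\bar y_{1t}-\bar y_1)^2 = \frac{S_y^2}{n_1},\\
E\{(\bar y_{1t}-\bar y_1)(\bar y_t-\bar y)\}
  &= \frac{1}{nn_1}\sum_{i=1}^{N}U_iV_i\,S_y^2
   = \frac{n_1S_y^2}{nn_1} = \frac{S_y^2}{n},\\
(MV) &= \frac{S_y^2}{n}+\frac{S_y^2}{n_1}-2\,\frac{S_y^2}{n}
      = \frac{n-n_1}{nn_1}\,S_y^2 .
\end{aligned}
```

If the subsample were instead remeasured on an independent trial, the cross terms would be zero and the coefficient would become $(n+n_1)/(nn_1)$, which is consistent with the coefficient of $S_y^2$ in $V(\bar z_t)$ for the unmodified DSS model in section 4.2.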
4.3.2.
A Self-Enumeration Survey Under the Modified DSS
Model
If a self-enumeration survey as described in section
3.2 is carried out under the modified DSS model and letting
(4.3.2.1)
with
= y.21
then we have
$$V(\bar w_t) = \frac{n-n_1}{nn_1}S_y^2 + \frac{n-n_1}{nn_1}S_B^2 + \frac{1}{n}\left(\frac{N-n}{N}\right)S^2. \tag{4.3.2.2}$$

4.3.3. Optimum Values for n and n1 for a Self-Enumeration Survey under the Modified DSS Model
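Under the modified DSS model the cost function is

$$C = nc_1 + n_1c_2, \tag{4.3.3.1}$$

and

$$V(\bar w_t) = \frac{S^2 - S_y^2 - S_B^2}{n} + \frac{S_y^2 + S_B^2}{n_1} - \frac{S^2}{N}. \tag{4.3.3.2}$$

Let

$$k_{1m} = S^2 - S_y^2 - S_B^2 \tag{4.3.3.3}$$

and

$$k_{2m} = S_y^2 + S_B^2. \tag{4.3.3.4}$$

Then we have the following for the optimum values of $n$ and $n_1$:
1. For fixed cost $C$ the optimum values of $n$, $n_1$ for minimizing $V(\bar w_t)$ are

$$n = \frac{C\,(k_{1m}/c_1)^{1/2}}{(k_{1m}c_1)^{1/2} + (k_{2m}c_2)^{1/2}} \tag{4.3.3.5}$$

and

$$n_1 = \frac{C\,(k_{2m}/c_2)^{1/2}}{(k_{1m}c_1)^{1/2} + (k_{2m}c_2)^{1/2}}, \tag{4.3.3.6}$$

provided $n_1 \le n$. If (4.3.3.5) and (4.3.3.6) result in $n_1 > n$, take $n_1 = n$ and

$$n = \frac{C}{c_1 + c_2}. \tag{4.3.3.7}$$

2. For fixed variance $V(\bar w_t)$ the optimum values of $n$, $n_1$ for minimizing the cost $C$ are

$$n = \frac{k_{1m} + (k_{1m}k_{2m}c_2/c_1)^{1/2}}{V(\bar w_t) + S^2/N} \tag{4.3.3.8}$$

and

$$n_1 = \frac{k_{2m} + (k_{1m}k_{2m}c_1/c_2)^{1/2}}{V(\bar w_t) + S^2/N}, \tag{4.3.3.9}$$

with $n_1 = n = (k_{1m} + k_{2m})/\{V(\bar w_t) + S^2/N\}$ if the above give $n_1 > n$.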
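The modified-DSS allocation rules (4.3.3.5)-(4.3.3.9) can be sketched in Python as follows. The function name is hypothetical; passing either a budget `C` or a target variance `v0` selects the fixed-cost or fixed-variance rule. Raising an error when $k_{1m} \le 0$ is our own guard, since the square roots in (4.3.3.5) and (4.3.3.8) require $k_{1m} > 0$.

```python
import math


def modified_dss_optimum(c1, c2, s2, sy2, sb2, C=None, v0=None, N=math.inf):
    """Optimum (n, n1) for the modified DSS model (bias elimination only).

    Give either a budget C (rules 4.3.3.5-4.3.3.7) or a target variance v0
    (rules 4.3.3.8-4.3.3.9). Sizes are returned unrounded.
    """
    k1m = s2 - sy2 - sb2                     # (4.3.3.3)
    k2m = sy2 + sb2                          # (4.3.3.4)
    if k1m <= 0:                             # our guard: formulas need k1m > 0
        raise ValueError("k_1m = S^2 - S_y^2 - S_B^2 must be positive")
    if (C is None) == (v0 is None):
        raise ValueError("specify exactly one of C or v0")
    if C is not None:
        denom = math.sqrt(k1m * c1) + math.sqrt(k2m * c2)
        n = C * math.sqrt(k1m / c1) / denom
        n1 = C * math.sqrt(k2m / c2) / denom
        if n1 > n:                           # (4.3.3.7)
            n = n1 = C / (c1 + c2)
    else:
        target = v0 + (0.0 if math.isinf(N) else s2 / N)   # V(w_t) + S^2/N
        n = (k1m + math.sqrt(k1m * k2m * c2 / c1)) / target
        n1 = (k2m + math.sqrt(k1m * k2m * c1 / c2)) / target
        if n1 > n:                           # n1 = n fallback
            n = n1 = (k1m + k2m) / target
    return n, n1


# K = 2, S^2 = 1, S_y^2 = 0.05, S_B^2 = 0.03, budget 1000 units of c1.
print(modified_dss_optimum(1, 2, 1.0, 0.05, 0.03, C=1000))
```

Compared with the full DSS model, the subsample unit here costs $c_2$ rather than $c_1 + c_2$, and $k_{1m}$, $k_{2m}$ replace $k_1$, $k_2$; otherwise the structure of the rules is the same.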
4.4.
Preferred Survey Procedure
A researcher who is considering using the DSS model to
conduct a survey has available at least two measurement
processes, a faulty measurement process and an error-free
measurement process.
Thus, as well as employing a combination of these two processes in a DSS model for conducting the survey, one could use either of these two processes separately.
If one used only the faulty measurement process, the Koch-CBM mean square error model of section 2.3.1 applies.
If one uses the error-free measurement process all of the
usual results of traditional sampling apply.
For convenience we will call this the true value sampling scheme, denoted TVSS.
In this section we examine which is the
preferable model for a self-enumeration survey.
4.4.1.
Procedures Considered
We will assume that the survey researcher wishes to estimate the mean $\bar X$ of some population with minimum error by means of a self-enumeration survey. Two measurement processes are available.
One is an inexpensive faulty
measurement process, and the second is a relatively more
expensive error-free measurement process.
Thus, the survey
researcher wishes to choose the survey procedure which will
yield the smallest error among the modified DSS model, the
Koch-CBM mean square error model, and the TVSS model.
We
will assume that the researcher has a fixed amount of money
C available for conducting the survey.
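For a self-enumeration survey we have the following:
1. DSS model:

$$\bar w_t = \bar y_t - (\bar y_{1t} - \bar x_{1t}), \tag{4.4.1.1}$$

$$V(\bar w_t) = \frac{n-n_1}{nn_1}S_y^2 + \frac{n-n_1}{nn_1}S_B^2 + \frac{1}{n}\left(\frac{N-n}{N}\right)S^2. \tag{4.4.1.2}$$

The sample sizes $n$ and $n_1$ are chosen according to (4.3.3.5) and (4.3.3.6) using the cost function

$$C = nc_1 + n_1c_2. \tag{4.4.1.3}$$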
2. Koch-CBM-MSE model:²
Assuming fixed cost $C$ and cost $c_1$ for obtaining the $Y_{it}$, the sample size is $n_0 = C/c_1$. We draw a sample of size $n_0$ and measure $Y_{it}$ for each individual in the sample. Our estimate for $\bar X$ is

$$\bar y_t = \frac{1}{n_0}\sum_{i=1}^{N} U_iY_{it}. \tag{4.4.1.4}$$

The estimator $\bar y_t$ may be biased; thus, using the results of section 2.2 and those of Koch (1973), we have

(4.4.1.5)

where
² See Koch (1973). These results are based on those in the Koch paper, with the differences that only one trial of the survey is used and a bias term has been added.
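and

$$S_Y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(Y_i - \bar Y)^2.$$

3. TVSS model:
If the cost of obtaining the true values is $c_2$, then the sample size for the TVSS is

$$n_2 = \frac{C}{c_2}. \tag{4.4.1.6}$$

We use the usual estimate for $\bar X$,

$$\bar x = \frac{1}{n_2}\sum_{i=1}^{N} U_iX_i, \tag{4.4.1.7}$$

which gives

$$V(\bar x) = \frac{1}{n_2}\left(\frac{N-n_2}{N}\right)S^2. \tag{4.4.1.8}$$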
The preferred sampling scheme will be the one among the above which has the smallest mean square error. Since the estimators for the DSS model and the TVSS are unbiased, we must compare $V(\bar w_t)$, $V(\bar x)$ and $MSE(\bar y_t)$ in order to choose the preferred sampling scheme. In the next two sections various values for $S^2$, $S_y^2$, $S_Y^2$, $S_B^2$ and $B = (\bar Y - \bar X)$ are chosen, and the preferred sampling scheme is indicated for several relative costs $K = c_2/c_1$. Choosing the values for $S_y^2$, $S_Y^2$ and $S_B^2$ in relation to $S^2$ is somewhat of a problem, since there are few guidelines from actual practice. Those chosen in the next two sections seem reasonable, and the reasons they were chosen are indicated. The situation in which one wishes to measure the proportion of the population which possesses a certain attribute is considered separately, because the reasons which govern the choice of values for the variable error components are somewhat different from those for non-attribute variables.
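To make the comparison concrete, the Python sketch below evaluates the three criteria for a fixed budget when $N$ is large enough that finite population corrections can be ignored. Two ingredients are our assumptions rather than transcriptions of this section: the Koch-CBM error is taken as $MSE(\bar y_t) \approx (S_Y^2 + S_y^2)/n_0 + B^2$ (uncorrelated response deviations), and the budget $C = 1000c_1$ is borrowed from (4.2.18) purely for illustration, so the output need not match the preferred-scheme tables of sections 4.4.2 and 4.4.3. The DSS criterion uses the fixed-cost optimum (4.3.3.5)-(4.3.3.7). The example inputs follow the attribute construction of section 4.4.2 ($p = 0.1$, $B = +60\%$ of $p$, $S_y^2 = 0.05S^2$, $S_Y^2 = 0.9p_y(1-p_y)$, $S_B^2 = 0.2S_Y^2$); the function name is hypothetical.

```python
import math


def preferred_scheme(C, c1, c2, s2, sy2, s_Y2, sb2, bias):
    """Return the scheme with the smallest error criterion, plus all criteria.

    Assumptions (ours): N is large, so finite population corrections are dropped,
    and MSE(y_t) for the Koch-CBM scheme is (S_Y^2 + S_y^2)/n0 + B^2.
    """
    criteria = {}
    criteria["TVSS"] = s2 / (C / c2)                      # V(x) with n2 = C/c2
    criteria["MSE"] = (s_Y2 + sy2) / (C / c1) + bias**2   # Koch-CBM, n0 = C/c1
    k1m, k2m = s2 - sy2 - sb2, sy2 + sb2
    if k1m > 0:                                           # DSS rules need k1m > 0
        denom = math.sqrt(k1m * c1) + math.sqrt(k2m * c2)
        n = C * math.sqrt(k1m / c1) / denom               # (4.3.3.5)
        n1 = C * math.sqrt(k2m / c2) / denom              # (4.3.3.6)
        if n1 > n:                                        # (4.3.3.7)
            n = n1 = C / (c1 + c2)
        criteria["DSS"] = k1m / n + k2m / n1              # V(w_t), N large
    return min(criteria, key=criteria.get), criteria


# Attribute example: p = 0.1, B = +60% of p, S_y^2 = 0.05 S^2, S_B^2 = 0.2 S_Y^2.
p, beta = 0.1, 0.6
p_y = p * (1 + beta)
s_Y2 = 0.9 * p_y * (1 - p_y)
inputs = dict(C=1000.0, c1=1.0, s2=p * (1 - p), sy2=0.05 * p * (1 - p),
              s_Y2=s_Y2, sb2=0.2 * s_Y2, bias=beta * p)
for K in (2, 10, 100):
    print(K, preferred_scheme(c2=float(K), **inputs)[0])
```

Under these assumptions the winner moves from TVSS at $K = 2$ to DSS at $K = 10$ and to the Koch-CBM scheme at $K = 100$, the same low, middle and high relative-cost pattern reported for the attribute tables.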
4.4.2.
Measuring an Attribute
The proportion p of a population which possesses a
certain attribute is to be measured by a self-enumeration
survey.
Tables (4.4.2.1), (4.4.2.2) and (4.4.2.3) indicate
what would be the preferred survey procedure for various
values of the error components of the different models.
MSE denotes the Koch-CBM-MSE model.
TABLE 4.4.2.1: PREFERRED SAMPLING SCHEME, p = 0.1
TABLE 4.4.2.2: PREFERRED SAMPLING SCHEME, p = 0.3
TABLE 4.4.2.3: PREFERRED SAMPLING SCHEME, p = 0.5
[Graphical tables. Each table consists of subtables for different values of the net bias B (as a percentage of p) and of S_B^2. Within a subtable the rows are S_y^2/S^2 = 0.05, 0.25, 0.50, 1.00, the columns are K = 2, 4, 10, 20, 40, 50, 100, and each region is labelled MSE, DSS or TVSS.]
The data used in constructing the tables were chosen as follows.
1. The population size N.
It is assumed that the population is large enough that the effect of the finite population correction is negligible.³
2. The true proportion p.
Three values of p were used:

$$p = 0.1, \quad p = 0.3 \quad\text{and}\quad p = 0.5. \tag{4.4.2.1}$$

3. The population variance $S^2$.
$S^2 = p(1-p)$ was used.
4. The net bias B.
B was chosen to be a certain percentage of p. Nine values were used: 1%, ±5%, ±20%, ±40% and ±60%. For the larger absolute percent biases the positive and negative biases were considered separately because of the effect of the bias on $S_Y^2$.
5. The simple measurement variance $S_y^2$.
$S_y^2$ was chosen to be a certain proportion of $S^2$. The proportions used were

$$0.05(S^2),\quad 0.25(S^2),\quad 0.50(S^2),\quad 1.0(S^2). \tag{4.4.2.2}$$

Note that when $S_y^2 \ge S^2$ the DSS model will never be the preferred sampling scheme, because $V(\bar x) \le V(\bar w_t)$ for $S_y^2 \ge S^2$.
6. The variance of the population of $Y_i$'s, $S_Y^2$.
The following reasoning governed the values of $S_Y^2$ that were used. The true value $X_i$ for any individual member of the population is either 0 or 1. If the simple measurement variance is greater than 0, then, for some $i$, $0 < Y_i < 1$. If B is a certain percentage $\beta$ of p, then $\bar Y = p + \beta p = p_y$. The maximum variance for the $Y_i$'s is then $p_y(1-p_y)$, which would occur when $S_y^2 = 0$ and $Y_i = 0$ or $1$ for all $i$. Since $S_y^2$ is not 0 in any of the cases we considered, $S_Y^2 < p_y(1-p_y)$. In each case $S_Y^2$ was chosen to be approximately 10% less than $p_y(1-p_y)$.
7. The variance of the population of $B_i$'s, $S_B^2$.
Values for $S_B^2$ were chosen as follows. The values for $B_i$ are restricted such that $-1 \le B_i \le 1$. More specifically, $X_i = 0$ implies that $0 \le B_i \le 1$, and $X_i = 1$ implies that $-1 \le B_i \le 0$. If we assume that $Y_i$ is fairly close to the true value $X_i$ for all individuals $i$, then the range of the $B_i$'s is much smaller than that of the $Y_i$'s. For example, if $Y_i = 0.9$ and $X_i = 1$ then $B_i = -0.1$, and if $Y_i = 0.1$ and $X_i = 0$ then $B_i = +0.1$. Thus we would expect $S_B^2$ to be somewhat smaller than $S_Y^2$. Two values were chosen for $S_B^2$:

$$S_B^2 = (0.2)S_Y^2 \quad\text{and}\quad S_B^2 = (0.5)S_Y^2. \tag{4.4.2.3}$$

³ N = 1 million was used for programming purposes.

Results. We will look first at the trend within tables, i.e., for a particular value of $p$. Table (4.4.2.1) shows the results for $p = 0.1$.
For small percent biases, i.e., 1% and 5%, the MSE model is overwhelmingly the model of choice, as indicated by subtables 1.1-1.6. It loses its position as the preferred model to the TVSS only when the simple measurement variance equals the population variance and the relative cost $K$ is small, i.e., $K = 2$. As the percent bias increases to 20%, the TVSS replaces the MSE model for the small values of $K$. The DSS model is preferred in one instance (subtable 1.7). For high percent biases (40% and 60%) the DSS model and the TVSS continue to displace the MSE model as the procedures of choice. In general, the DSS model is best for middle ranges of relative cost, the MSE model for high relative cost and the TVSS for low relative cost.
Comparing subtables 1.11 with 1.13 and 1.15 with 1.17 illustrates the effect of increases in $S_B^2$ on $V(\bar w_t)$: if the ratio $S_B^2/S_Y^2$ increases, then the MSE model is more likely to be the preferred model because of the increased $V(\bar w_t)$.
For the higher percent biases we see differences between a negative and a positive net bias. If the bias is negative, the MSE model and the DSS model are preferred in more cases, and the TVSS in fewer cases, than when the bias is positive. This is because $S_Y^2$ and $S_B^2$ are smaller when the bias is negative.
Table (4.4.2.2) gives the results for $p = 0.3$. The same general trends are evident as for $p = 0.1$. However, the MSE model is replaced as the model of choice by the DSS model and the TVSS at lower levels of the net bias. This is because, for each level of bias, $B^2/S^2$ is greater for $p = 0.3$ than for $p = 0.1$. Thus, when the bias is 60% of $p$, the MSE model is never preferred. Differences for positive and negative biases are not as marked as in the table for $p = 0.1$. This occurs because the difference between the maximum $S_Y^2$ for a positive and a negative bias of a particular size is smaller for $p = 0.3$ than for $p = 0.1$.
Table (4.4.2.3) presents the results for $p = 0.5$. There are no differences between the maximum $S_Y^2$ for positive and negative bias in this case. Again, the same general trends are observed, with the exception that the MSE model is replaced by the TVSS and the DSS model at even smaller values of $B$, so that when the bias is 40% of $p$ the MSE model is never the model of choice. Again this is because $B^2$ has a greater relative effect on $MSE(\bar y_t)$; e.g., when $p = 0.1$ and $B = 60\%$ of $p$, $B^2/S^2 = 0.04$; when $p = 0.3$ and $B = 60\%$ of $p$, $B^2/S^2 = 0.154$; and when $p = 0.5$ and $B = 60\%$ of $p$, $B^2/S^2 = 0.36$.
A bias of 10% is not indicated in the tables. For $p = 0.1$ the results for a 10% bias are similar to those for a 20% bias. For $p = 0.3$ and $p = 0.5$ a bias of 10% gives results that are intermediate between those for 5% and 20% bias.
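The attribute scenarios described above can be generated mechanically. The Python sketch below (hypothetical function name, large $N$) encodes items 3 to 7, reading "approximately 10% less than $p_y(1-p_y)$" as the factor 0.9. It reproduces the values $B^2/S^2 = 0.04$, $0.154$ and $0.36$ quoted for a 60% bias and confirms that, for $p = 0.5$, positive and negative biases of the same size give the same $S_Y^2$.

```python
import math


def attribute_scenario(p, beta, sy_ratio, sb_ratio):
    """Error components for the attribute case of section 4.4.2 (N large).

    beta     -- net bias as a fraction of p (0.6 for +60%, -0.2 for -20%)
    sy_ratio -- S_y^2 / S^2, one of 0.05, 0.25, 0.50, 1.0 in the tables
    sb_ratio -- S_B^2 / S_Y^2, either 0.2 or 0.5
    """
    s2 = p * (1 - p)                    # item 3: S^2 = p(1 - p)
    bias = beta * p                     # item 4: B as a percentage of p
    p_y = p + bias                      # mean of the Y_i
    s_Y2 = 0.9 * p_y * (1 - p_y)        # item 6: about 10% below p_y(1 - p_y)
    return {"S2": s2, "B": bias, "Sy2": sy_ratio * s2, "SY2": s_Y2,
            "SB2": sb_ratio * s_Y2, "B2_over_S2": bias**2 / s2}


for p in (0.1, 0.3, 0.5):
    print(p, round(attribute_scenario(p, 0.6, 0.05, 0.2)["B2_over_S2"], 3))

plus = attribute_scenario(0.5, 0.2, 0.05, 0.2)["SY2"]
minus = attribute_scenario(0.5, -0.2, 0.05, 0.2)["SY2"]
print(math.isclose(plus, minus))
```

For $p = 0.1$, by contrast, a negative bias lowers $p_y(1-p_y)$ and hence $S_Y^2$ and $S_B^2$, which is the mechanism given above for the sign differences in Table (4.4.2.1).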
4.4.3.
Measuring Non-Attribute Variables
The population mean $\bar X$ is to be estimated by a self-enumeration survey. Tables (4.4.3.1) to (4.4.3.9) indicate the preferred survey procedure, among the three being considered, for various values of the error components. The tables are constructed in terms of the population coefficient of variation, $CV = S/\bar X$.
TABLES 4.4.3.1 TO 4.4.3.9: PREFERRED SAMPLING SCHEME
[Graphical tables for CV = 0.1 (Tables 4.4.3.1 to 4.4.3.3), CV = 0.5 (Tables 4.4.3.4 to 4.4.3.6) and CV = 1.0 (Tables 4.4.3.7 to 4.4.3.9). Each table consists of subtables for different values of S_y^2/S^2 and of S_B^2. Within a subtable the rows are values of the net bias B (in percent), the columns are K = 2, 4, 10, 20, 40, 50, 100, and each region is labelled MSE, DSS or TVSS.]
The data used in constructing Tables (4.4.3.1) to (4.4.3.9) were chosen as follows.
1.
The population size N.
It is assumed that the population is large enough that the effect of the finite population correction is negligible.⁴
2.
The population coefficient of variation, CV.
The relationship of $B^2$ to $S^2$ determines how much the net bias B of the faulty measurement process contributes to the total error of that process. Since B was chosen to be a certain percentage of $\bar X$, the ratio of $B^2$ to $S^2$ differs for different coefficients of variation. The results are therefore presented in terms of CV. Three values for CV were used:
$$\mathrm{CV} = 0.1, \qquad \mathrm{CV} = 0.5 \qquad\text{and}\qquad \mathrm{CV} = 1.0. \tag{4.4.3.1}$$

⁴N = 1 million was used for programming purposes.
3.
The net bias B.
B was chosen to be a certain percentage of $\bar X$. Five values were used: 1%, 5%, 20%, 40% and 50%.
4.
The simple measurement variance $S_y^2$.
$S_y^2$ was chosen to be a certain proportion of $S^2$, as in the tables for p, i.e.,
$$S_y^2 = 0.05(S^2), \qquad 0.25(S^2), \qquad 0.50(S^2), \qquad 1.0(S^2). \tag{4.4.3.2}$$
5.
The variance of the population of $Y_i$'s, $S_Y^2$.
For non-attribute variables the range of the $Y_i$'s is not, in general, restricted, so no limits can be set in relation to the bias. We may note, however, what happens in a simple case. Suppose there is no trial-to-trial variation in the $y_{it}$, so that $y_{it} = Y_i$ for all $i$. If the faulty measurement process then uniformly overestimates the $X_i$ by a proportion $v$ of $X_i$, we get $S_Y^2 = (1+v)^2 S^2$. Similarly, for a uniform underestimate, $S_Y^2 = (1-v)^2 S^2$. We would therefore expect $S_Y^2$ to be sometimes larger than $S^2$, sometimes smaller and sometimes nearly equal to it. Accordingly, $S_Y^2$ was chosen to be a certain proportion of $S^2$. The values used were
$$S_Y^2 = 1.5(S^2), \qquad S_Y^2 = 1.0(S^2) \qquad\text{and}\qquad S_Y^2 = 0.5(S^2). \tag{4.4.3.3}$$
6.
The variance of the population of $B_i$'s, $S_B^2$.
In general we would expect the range of the $B_i$'s to be much smaller than the range of the $Y_i$'s. Thus $S_B^2$ was chosen to be a certain proportion of $S_Y^2$. Three values were used:
$$S_B^2 = 0.02(S_Y^2), \qquad S_B^2 = 0.05(S_Y^2) \qquad\text{and}\qquad S_B^2 = 0.2(S_Y^2). \tag{4.4.3.4}$$
Results.
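Because B is fixed as a percentage of $\bar X$, its weight relative to $S^2$ depends on CV. Write $B = \beta\bar X$ and $S = \mathrm{CV}\cdot\bar X$; then $B^2/S^2 = (\beta/\mathrm{CV})^2$, and $\bar X$ drops out. The short Python sketch below tabulates this ratio for the CV values of (4.4.3.1) and the bias percentages used in the tables. The function name is ours, for illustration only.

```python
# With B = beta * Xbar and S = CV * Xbar, B^2 / S^2 = (beta / CV)^2,
# so the population mean Xbar cancels out.
CV_VALUES = [0.1, 0.5, 1.0]                        # values in (4.4.3.1)
BIAS_FRACTIONS = [0.01, 0.05, 0.20, 0.40, 0.50]    # 1%, 5%, 20%, 40%, 50% of Xbar


def bias_to_variance_ratio(beta: float, cv: float) -> float:
    """Ratio B^2/S^2 when the net bias is beta * Xbar and CV = S / Xbar."""
    return (beta / cv) ** 2


if __name__ == "__main__":
    for cv in CV_VALUES:
        ratios = ", ".join(f"{bias_to_variance_ratio(b, cv):.4g}" for b in BIAS_FRACTIONS)
        print(f"CV = {cv}: B^2/S^2 = {ratios}")
```

For example, a 5% bias gives $B^2/S^2 = 0.25$ when CV = 0.1 but only 0.0025 when CV = 1.0. This is consistent with the MSE model being preferred more often as CV increases.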
Consider first the results for CV = 0.1, as presented in Tables (4.4.3.1), (4.4.3.2) and (4.4.3.3). The MSE model is never the preferred model for biases greater than 1% of $\bar X$. Even for a 1% bias, it is preferred only at relative costs of 20 or more. This is as one would expect given the small value of CV. For small CV, then, one is essentially choosing between the TVSS and the DSS.

Looking along any row of subtables, we see that for a fixed value of $S_B^2$, TVSS replaces DSS as the preferred sampling scheme as $S_y^2$ increases. When $S_y^2/S^2$ has small to moderate values (i.e., 0.05 to 0.5), TVSS is preferred only at lower relative costs. For $S_y^2/S^2 = 1.0$, TVSS is always preferred over DSS.

The columns of subtables show the effect of increasing $S_B^2$ for fixed $S_y^2$. As $S_B^2$ increases, the TVSS becomes the preferred procedure. The replacement of DSS by TVSS occurs first at the lower values of k and spreads to larger values of k as the errors associated with the DSS model increase.
The results for fixed B are illustrated by following a given row of each subtable across the rows of subtables. Again, as the errors associated with the DSS model increase, TVSS replaces it as the preferred model, first at low values of relative cost.
We can therefore conclude that, in general, for small CV the MSE model is preferred only when the bias is very small and the relative cost is high. The TVSS is preferred when the relative cost is small. If the errors associated with the DSS model are quite large, it is also preferred at higher relative costs. When the relative costs are moderate to large, the DSS model is preferred. Whenever the simple measurement variance is as large as the population variance, TVSS is preferred over DSS regardless of the relative cost.
Tables (4.4.3.4), (4.4.3.5) and (4.4.3.6) give the results for CV = 0.5. As we would expect given the increase in CV, the MSE model is preferred more often than for CV = 0.1. The MSE model is almost always the preferred procedure when the bias is 1%. The exceptions occur at small relative cost k and large values of $S_y^2/S^2$ (Table 4.4.3.4). The MSE model is not preferred for the moderate and large percent biases, i.e. $B \geq 20\%$. For a bias of 5% of $\bar X$, the MSE model is preferred at the higher cost levels.
For biases of 20% or greater, the DSS model and the TVSS follow the same general trends as before. As the errors associated with the DSS model increase, the TVSS replaces it as the preferred sampling scheme, first at the lower values of relative cost.
Tables (4.4.3.7), (4.4.3.8) and (4.4.3.9) illustrate the results for CV = 1.0. Again, with the increase in CV, the MSE model becomes the preferred scheme in a greater number of cases. It is always preferred for a bias of 1%. For a bias of 5%, it is preferred at all but low levels of relative cost. When the bias is 20% of $\bar X$, the MSE model is preferred only at the very high levels of relative cost, i.e. k = 50 and 100.
Again we find that the DSS is preferred for moderate and low levels of cost unless the measurement errors are large, in which case the TVSS is preferred.
4.4.4.
Using the Non-Modified DSS Model
Suppose that the researcher had decided to use the version of the DSS model that allows estimation of the measurement errors. $V(\bar z_t)$ would then be greater than $V(\bar w_t)$. There might still be cases, however, in which $V(\bar z_t)$ is less than both $\mathrm{MSE}(\bar y_t)$ and $V(\bar x)$. That is, the DSS model might remain the preferred model even in the form that allows estimation of the variable measurement errors.
Table (4.4.4.1) uses the same data as the previous section. It shows the preferred sampling scheme among the non-modified DSS model, the MSE model and the TVSS for CV = 0.5 and $S_Y^2/S^2 = 1.0$. Comparing this table to Table (4.4.3.5), we see that the DSS model is still the model of choice in quite a few cases. However, the TVSS replaces the DSS at lower levels of measurement error. This is as one would expect: allocating some of the resources to estimating measurement errors increases the variance of the DSS estimate of the mean. Table (4.4.4.1) shows that the two goals, estimating measurement errors and estimating $\bar X$ with small error, are not always incompatible.

[Table 4.4.4.1 (chart), DSS ESTIMATES VARIABLE ERRORS, CV = 0.5, $S_Y^2/S^2 = 1.0$: preferred sampling scheme (MSE model, non-modified DSS model or TVSS) by relative cost k (2 to 100) and net bias B (1% to 50% of $\bar X$), in panels indexed by $S_y^2/S^2$ and $S_B^2/S_Y^2$.]
4.5.
Summary
When drawing conclusions from the above results about the preferred survey procedure, keep in mind that the data used for constructing the tables were not drawn from actual practice. Firm guidelines on the best survey procedure cannot be given until more is known about the values of $S_y^2$, $S_B^2$ and $B$ associated with particular measurement processes. However, the results of sections 4.3 and 4.4 show fairly consistent patterns across a variety of situations. It therefore seems likely that the more general conclusions drawn from these results will be confirmed in actual practice. These general conclusions are summarized below.
When measuring an attribute, the rarity with which it occurs in the population affects the preferred way of measuring it. Recall that for p = 0.1 the MSE model was the preferred model except in cases of high bias. This agrees with the results for non-attribute variables once we note that p = 0.1 corresponds to CV = 3.0.
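The correspondence between p and CV follows from the variance of a 0/1 attribute, $p(1-p)$. Ignoring the factor $N/(N-1)$, which is negligible for large N, this gives $\mathrm{CV} = \sqrt{p(1-p)}/p = \sqrt{(1-p)/p}$. A minimal Python sketch, with a function name of our own choosing:

```python
import math


def attribute_cv(p: float) -> float:
    """CV of a 0/1 attribute with prevalence p, taking its variance as p(1 - p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.sqrt(p * (1.0 - p)) / p   # equals sqrt((1 - p) / p)


for p in (0.5, 0.1, 0.01):
    print(p, round(attribute_cv(p), 3))
```

For p = 0.1 this gives CV = 3.0, well above the largest CV (1.0) used for the non-attribute tables. Rarer attributes push CV higher still, which favours the MSE model.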
When the bias makes a large contribution to the MSE of the faulty measurement process, one should use either the DSS model or the TVSS. The DSS is preferred over the TVSS when the relative cost of error-free to faulty measurements is moderate to large and measurement errors are at moderate levels. The TVSS is preferred over the DSS when the relative cost is low or when measurement errors are very high.
CHAPTER V
OPTIMUM NUMBER OF INTERVIEWERS
5.1.
Introduction
In Chapter III, section 3.3, $V(\bar z_t)$ was developed
for a survey using a fixed population of interviewers.
Suppose that a researcher does not wish to use the entire
population of interviewers to conduct the survey but wishes
instead to use a sample from the population of interviewers.
In this chapter we determine the optimum size for the sample
of interviewers, i.e. the optimum number of interviewers.
5.2.
$V(\bar z_t)$ when a Sample from the
Interviewer Population is Used
The specific model in this case is the same as that in section 3.3, except that a simple random sample of size b is selected from the population of B interviewers to carry out the survey. Let the interviewer sample be characterized by the indicator random variables $F_j$, where
$$F_j = \begin{cases} 1 & \text{if the } j\text{-th interviewer is in the interviewer sample},\\ 0 & \text{otherwise.} \end{cases} \tag{5.2.1}$$
Again, the initial sample and the subsample from the population of N individuals are characterized by $U_i$ and $V_i$, where (2.2.1) and (2.2.2) hold for $U_i$ and $V_i$ respectively. We assume that each individual in the initial sample and in the subsample is assigned at random to one of the interviewers. Each interviewer chosen for the survey interviews r members of the initial sample and $r_1$ members of the subsample, so that $n = br$ and $n_1 = br_1$. The interviewer structure at each phase of the survey is characterized by the indicator random variables $C_{ij}$ and $D_{ij}$, defined as in (3.3.1.1) and (3.3.1.2) respectively.
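Assuming simple random sampling at all phases of the sampling and assignment procedure, we have the following:
$$\begin{aligned}
\Pr\{C_{ij}=1 \mid U_i=1, F_j=1\} &= \frac{r}{n},\\
\Pr\{C_{ij}=0 \mid U_i=1, F_j=1\} &= \frac{n-r}{n},\\
\Pr\{U_i=1, F_j=1\} &= \frac{n}{N}\cdot\frac{b}{B},\\
\Pr\{U_i=0, F_j=1\} &= \frac{N-n}{N}\cdot\frac{b}{B},\\
\Pr\{U_i=1, F_j=0\} &= \frac{n}{N}\cdot\frac{B-b}{B},\\
\Pr\{F_j=1, F_{j'}=1\} &= \frac{b}{B}\cdot\frac{b-1}{B-1}, \qquad j \neq j'.
\end{aligned} \tag{5.2.2}$$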
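The interviewer probabilities in (5.2.2) can be checked by brute-force enumeration in a small case. The joint probabilities involving both $U_i$ and $F_j$ are simply products of marginals, because the two samples are drawn independently. The sketch below assumes, as in the text, simple random sampling of b interviewers out of B. It also assumes the n = br sampled individuals are split at random into b workloads of r each, so any single interviewer's workload is a uniformly random r-subset. The helper names and the sizes B = 8, b = 3, r = 2 are illustrative choices.

```python
from fractions import Fraction
from itertools import combinations


def pr_interviewer_selected(B: int, b: int, j: int = 0) -> Fraction:
    """Pr{F_j = 1} under SRS of b interviewers out of B, by enumeration."""
    samples = list(combinations(range(B), b))
    return Fraction(sum(1 for s in samples if j in s), len(samples))


def pr_pair_selected(B: int, b: int, j: int = 0, k: int = 1) -> Fraction:
    """Pr{F_j = 1, F_k = 1} for j != k, by enumeration."""
    samples = list(combinations(range(B), b))
    return Fraction(sum(1 for s in samples if j in s and k in s), len(samples))


def pr_assigned(b: int, r: int, i: int = 0) -> Fraction:
    """Pr{C_ij = 1 | U_i = 1, F_j = 1}: interviewer j's workload is a
    uniformly random r-subset of the n = b * r sampled individuals."""
    n = b * r
    workloads = list(combinations(range(n), r))
    return Fraction(sum(1 for w in workloads if i in w), len(workloads))


B, b, r = 8, 3, 2
n = b * r
print(pr_interviewer_selected(B, b), Fraction(b, B))               # 3/8 3/8
print(pr_pair_selected(B, b), Fraction(b * (b - 1), B * (B - 1)))  # 3/28 3/28
print(pr_assigned(b, r), Fraction(r, n))                           # 1/3 1/3
```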
$y_{ij\alpha t}$ is the measurement obtained by the j-th interviewer for the i-th individual at the α-th phase and t-th trial of the survey. The model and assumptions given by (3.3.1.5), (3.3.1.6), (3.3.1.7), (3.3.1.8), (3.3.1.9) and (3.3.1.10) apply. Thus,
$$\bar z_t = \bar y_{1t} - \bar y_{2t} + \bar x_{2t}, \tag{5.2.3}$$
where
$$\bar y_{1t} = \frac{1}{n}\sum_{i=1}^{N}\sum_{j=1}^{B} U_iF_jC_{ij}\,y_{ij1t}, \qquad \bar y_{2t} = \frac{1}{n_1}\sum_{i=1}^{N}\sum_{j=1}^{B} V_iF_jD_{ij}\,y_{ij2t},$$
and
$$\bar x_{2t} = \frac{1}{n_1}\sum_{i=1}^{N} V_iX_i.$$
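Using the same partitions as were used in section 3.3.3, we find that
$$V(\bar z_t) = \frac{2}{b}\{S_\zeta^2\} + \frac{n(N-r_1) + n_1(N-r)}{nn_1(N-1)}\{S_\eta^2\} + \frac{n_1 - n + B(n+n_1) - 2Br_1}{nn_1(B-1)}\{S_{IQ}^2\} + \left(\frac{1}{n_1} - \frac{1}{n}\right)\{S_B^2\} + \left(\frac{1}{n} - \frac{1}{N}\right)\{S^2\}, \tag{5.2.4}$$
where $S_\zeta^2$, $S_\eta^2$, $S_{IQ}^2$, $S_B^2$ and $S^2$ are as defined in (3.3.3). When b = B, (5.2.4) reduces to (3.3.3.19), i.e.
$$V(\bar z_t) = \frac{2}{B}\{S_\zeta^2\} + \frac{n(N-r_1) + n_1(N-r)}{nn_1(N-1)}\{S_\eta^2\} + \frac{n+n_1}{nn_1}\{S_{IQ}^2\} + \left(\frac{1}{n_1} - \frac{1}{n}\right)\{S_B^2\} + \left(\frac{1}{n} - \frac{1}{N}\right)\{S^2\}. \tag{5.2.5}$$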
Thus, when a sample of the interviewer population is used, $V(\bar z_t)$ changes only in the coefficients of the components of the measurement variance, i.e. the coefficients of $S_\zeta^2$, $S_\eta^2$ and $S_{IQ}^2$. The algebra leading to this result is not included here. The following outlines how the result is obtained.
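The measurement variance is partitioned, as in (3.3.3.4), into the sum of three components, i.e.
$$(\mathrm{MV}) = E\{[(\bar y_{1t}-\bar y_1) - (\bar y_{2t}-\bar y_2)]^2\} = E\{(\bar y_{1t}-\bar y_1)^2\} + E\{(\bar y_{2t}-\bar y_2)^2\} - 2E\{(\bar y_{1t}-\bar y_1)(\bar y_{2t}-\bar y_2)\}. \tag{5.2.6}$$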
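Each term of (5.2.6) is expanded, giving
$$E\{(\bar y_{1t}-\bar y_1)^2\} = \frac{r}{n}\{S_\zeta^2\} + \frac{N-r}{n(N-1)}\{S_\eta^2\} + \frac{1}{n}\{S_{IQ}^2\} + \frac{1}{n}\left\{\frac{r(B-b)}{NB(N-1)(B-1)}\sum_{i\neq i'}^{N}\sum_{j=1}^{B} Y_{ij}Y_{i'j} - \frac{r(B-b)}{N(N-1)(B-1)}\sum_{i\neq i'}^{N} Y_iY_{i'}\right\}, \tag{5.2.7}$$
$$E\{(\bar y_{2t}-\bar y_2)^2\} = \frac{r_1}{n_1}\{S_\zeta^2\} + \frac{N-r_1}{n_1(N-1)}\{S_\eta^2\} + \frac{1}{n_1}\{S_{IQ}^2\} + \frac{1}{n_1}\left\{\frac{r_1(B-b)}{NB(N-1)(B-1)}\sum_{i\neq i'}^{N}\sum_{j=1}^{B} Y_{ij}Y_{i'j} - \frac{r_1(B-b)}{N(N-1)(B-1)}\sum_{i\neq i'}^{N} Y_iY_{i'}\right\}, \tag{5.2.8}$$
and
$$2E\{(\bar y_{1t}-\bar y_1)(\bar y_{2t}-\bar y_2)\} = \frac{2}{nn_1}\left[\frac{r_1(B-b)}{B-1}\{S_{IQ}^2\} + \frac{r(B-b)}{B-1}\cdot\frac{1}{NB(N-1)}\sum_{i\neq i'}^{N}\sum_{j=1}^{B} Y_{ij}Y_{i'j} - \frac{r(B-b)}{B-1}\cdot\frac{1}{N(N-1)}\sum_{i\neq i'}^{N} Y_iY_{i'}\right]. \tag{5.2.9}$$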
Collecting the terms gives
$$(\mathrm{MV}) = \frac{2}{b}\{S_\zeta^2\} + \frac{n(N-r_1) + n_1(N-r)}{nn_1(N-1)}\{S_\eta^2\} + \frac{n_1 - n + B(n+n_1) - 2Br_1}{nn_1(B-1)}\{S_{IQ}^2\}. \tag{5.2.10}$$
The change in the coefficient of $S_{IQ}^2$ reflects the fact that taking a sample of the interviewers implies additional sampling of the individual-interviewer interaction effects.
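The effect of sampling interviewers on the coefficient of $S_{IQ}^2$ can be isolated with exact rational arithmetic. Take that coefficient to be $[n_1 - n + B(n+n_1) - 2Br_1]/[nn_1(B-1)]$ with $n = br$ and $n_1 = br_1$. It then equals $1/n + 1/n_1$ plus the term $-2r_1(B-b)/[nn_1(B-1)]$, which vanishes when b = B. The Python check below confirms this identity over a grid of illustrative values. The helper names are ours.

```python
from fractions import Fraction


def iq_coefficient(b: int, r: int, r1: int, B: int) -> Fraction:
    """Coefficient of S_IQ^2 in (5.2.4)/(5.2.10), with n = b*r and n1 = b*r1."""
    n, n1 = b * r, b * r1
    return Fraction(n1 - n + B * (n + n1) - 2 * B * r1, n * n1 * (B - 1))


def decomposition_holds(b: int, r: int, r1: int, B: int) -> bool:
    """Check: coefficient = 1/n + 1/n1 - 2 r1 (B - b) / (n n1 (B - 1))."""
    n, n1 = b * r, b * r1
    fixed_population_part = Fraction(1, n) + Fraction(1, n1)        # value when b = B
    sampling_part = Fraction(-2 * r1 * (B - b), n * n1 * (B - 1))   # zero when b = B
    return iq_coefficient(b, r, r1, B) == fixed_population_part + sampling_part


B = 20
print(all(decomposition_holds(b, r, r1, B)
          for b in range(1, B + 1) for r in (5, 40) for r1 in (1, 3, 5)))   # True
print(iq_coefficient(B, 4, 2, B))   # b = B, n = 80, n1 = 40: 1/80 + 1/40 = 3/80
```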
5.3.
Optimum Values for b, r, and $r_1$
Suppose that there is a fixed amount of money available for conducting the survey. We wish to choose the values of b, r and $r_1$ that minimize the variance of $\bar z_t$. Let
$$\begin{aligned} C &= \text{total amount available for conducting the survey},\\ c_1 &= \text{cost of obtaining } y_{ij\alpha t},\\ c_2 &= \text{cost of obtaining } X_i,\\ c_3 &= c_1 + c_2. \end{aligned} \tag{5.3.1}$$
We then have the following cost function:
$$C = brc_1 + br_1c_3. \tag{5.3.2}$$
We therefore wish to minimize $V(\bar z_t)$ subject to the restriction
$$brc_1 + br_1c_3 = C. \tag{5.3.3}$$
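Now,
$$V(\bar z_t) = \frac{2}{b}S_\zeta^2 + \frac{r(N-r_1) + r_1(N-r)}{brr_1(N-1)}S_\eta^2 + \frac{r_1 - r + B(r_1+r)}{brr_1(B-1)}S_{IQ}^2 - \frac{2B}{b^2r(B-1)}S_{IQ}^2 + \left(\frac{1}{br_1} - \frac{1}{br}\right)S_B^2 + \frac{1}{br}S^2 - \frac{S^2}{N}. \tag{5.3.4}$$
Let
$$\begin{aligned} A &= 2S_\zeta^2 - \frac{2}{N-1}S_\eta^2,\\ D &= \frac{N}{N-1}S_\eta^2 + S_{IQ}^2 + S_B^2,\\ E &= \frac{N}{N-1}S_\eta^2 + \frac{B+1}{B-1}S_{IQ}^2 + S^2 - S_B^2,\\ G &= \frac{2B}{B-1}S_{IQ}^2. \end{aligned} \tag{5.3.5}$$
Then
$$V(\bar z_t) = \frac{A}{b} + \frac{D}{br_1} + \frac{E}{br} - \frac{G}{b^2r} - \frac{S^2}{N}. \tag{5.3.6}$$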
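Proceeding in a straightforward manner,¹ and dropping the constant $S^2/N$, which does not depend on b, r or $r_1$, we have
$$f(b, r_1, r) = \frac{A}{b} + \frac{D}{br_1} + \frac{E}{br} - \frac{G}{b^2r} \tag{5.3.7}$$
and
$$g(b, r_1, r) = brc_1 + br_1c_3. \tag{5.3.8}$$
At a minimum of f subject to g = C, the partial derivatives satisfy
$$\frac{\partial f}{\partial b} = \lambda\frac{\partial g}{\partial b}, \qquad \frac{\partial f}{\partial r_1} = -\frac{D}{br_1^2} = \lambda\frac{\partial g}{\partial r_1}, \qquad \frac{\partial f}{\partial r} = \lambda\frac{\partial g}{\partial r} \tag{5.3.9}$$
for some multiplier λ. The desired values of b, $r_1$ and r are therefore among the simultaneous solutions of the following equations:
$$\begin{aligned} &1.\quad c_1\left(-\frac{A}{b^2} - \frac{D}{b^2r_1} - \frac{E}{b^2r} + \frac{2G}{b^3r}\right) = \frac{C}{b^2}\left(-\frac{E}{br^2} + \frac{G}{b^2r^2}\right),\\ &2.\quad c_1\left(-\frac{D}{br_1^2}\right) = c_3\left(-\frac{E}{br^2} + \frac{G}{b^2r^2}\right),\\ &3.\quad brc_1 + br_1c_3 = C, \end{aligned} \tag{5.3.10}$$
or, equivalently,
$$\begin{aligned} &1.\quad brc_1 + br_1c_3 = C,\\ &2.\quad -Ac_1b^2r^2r_1 - Dc_1b^2r^2 - Ec_1b^2rr_1 + 2Gc_1brr_1 + ECbr_1 - GCr_1 = 0,\\ &3.\quad r_1^2(Gc_3 - Ebc_3) - r^2(-c_1Db) = 0. \end{aligned} \tag{5.3.11}$$
These equations do not lend themselves to easy solution, so we propose an iterative solution.

¹See Buck, Advanced Calculus, p. 360.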
The population of interviewers we choose from is finite, and we are only interested in integer values of b. The problem can therefore be simplified as follows. For any particular values of C, $c_1$ and $c_3$, b has upper limits, i.e.
$$b \le \frac{C}{c_1 + c_3} \quad\text{and}\quad b \le B, \tag{5.3.12}$$
since we want each interviewer chosen to interview at least one individual at each phase of the survey. We can therefore fix $b = b_0$, find the values of $r_1$ and r that minimize $V(\bar z_t)$ for $b_0$, and compute the resulting $V(\bar z_t)$. Repeating this for each possible value of b gives the values that minimize the variance of $\bar z_t$ for a fixed cost.
Proceeding in this way, we obtain optimum values of r and $r_1$ for fixed b as follows. Fix $b = b_0$; then
$$V(\bar z_t) = \frac{A}{b_0} + \frac{D}{b_0r_1} + \frac{1}{r}\left(\frac{E}{b_0} - \frac{G}{b_0^2}\right) - \frac{S^2}{N} \tag{5.3.13}$$
and
$$b_0rc_1 + b_0r_1c_3 = C. \tag{5.3.14}$$
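Using the technique of Lagrange multipliers we find that
$$r_1 = \frac{[D/c_3]^{1/2}\,C}{b_0\left([c_1(E - G/b_0)]^{1/2} + [c_3D]^{1/2}\right)}, \qquad n_1^* = b_0r_1 \text{ is the optimum value for } n_1, \tag{5.3.15}$$
and
$$r = \frac{[(E - G/b_0)/c_1]^{1/2}\,C}{b_0\left([c_1(E - G/b_0)]^{1/2} + [c_3D]^{1/2}\right)}, \qquad n^* = b_0r \text{ is the optimum value for } n. \tag{5.3.16}$$
These are the optimum values when $b = b_0$, provided that $E - G/b_0 > 0$ and $r \ge r_1$. If $E - G/b_0$ is negative, take
$$r = r_1 = \frac{C}{b_0(c_1 + c_3)}. \tag{5.3.17}$$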
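The following is a numerical sketch of (5.3.14) to (5.3.16), using hypothetical values of D, E, G and the costs that are not taken from the tables. The closed-form r and $r_1$ should exhaust the budget. They should also satisfy $c_1D/r_1^2 = c_3(E - G/b_0)/r^2$, which is the first-order condition implied by the Lagrange conditions for fixed $b_0$.

```python
import math


def fixed_b_optimum(b0, D, E, G, C, c1, c3):
    """Closed-form r and r1 of (5.3.15)-(5.3.16) for a fixed number b0 of interviewers.

    Requires E - G/b0 > 0; otherwise the rule (5.3.17) applies instead.
    """
    e_star = E - G / b0                      # coefficient of 1/(b0 r) in (5.3.13)
    if e_star <= 0:
        raise ValueError("E - G/b0 must be positive here; see (5.3.17)")
    k = C / (b0 * (math.sqrt(c1 * e_star) + math.sqrt(c3 * D)))
    r1 = math.sqrt(D / c3) * k
    r = math.sqrt(e_star / c1) * k
    return r, r1


# Hypothetical illustrative values, not taken from Tables 5.4.1-5.4.16.
b0, D, E, G = 4, 0.6, 1.3, 0.5
C, c1, c3 = 1000.0, 1.0, 11.0
r, r1 = fixed_b_optimum(b0, D, E, G, C, c1, c3)

cost = b0 * r * c1 + b0 * r1 * c3        # should equal C, constraint (5.3.14)
lhs = c1 * D / r1 ** 2                   # c1 * b0 * (-df/dr1)
rhs = c3 * (E - G / b0) / r ** 2         # c3 * b0 * (-df/dr)
print(f"r = {r:.2f}, r1 = {r1:.2f}, cost = {cost:.6f}")
print(math.isclose(lhs, rhs))            # first-order condition holds
```

The continuous values would still have to be rounded to integer workloads in practice, and the check $r \ge r_1$ applied.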
5.4.
Conclusion as to the Optimum Values
It appears that in actual practice $V(\bar z_t)$ will be minimized for fixed cost when b takes its maximum possible value, i.e.
$$b = \min\left(\frac{C}{c_1 + c_3},\ B\right). \tag{5.4.1}$$
That is, we either use the entire population of interviewers, or we choose $C/(c_1 + c_3)$ interviewers from the population. In the second case each interviewer interviews one individual in the initial sample and one individual in the subsample, so the sample and subsample are of equal size, i.e. $n_1 = n = br$.
Tables (5.4.1) through (5.4.16) provide the basis for this conclusion.
In constructing the tables, various ratios of $S_\zeta^2$, $S_\eta^2$, $S_{IQ}^2$ and $S_B^2$ to $S^2$ were chosen. The total cost of the survey was fixed at $C = 1000c_1$, where $c_1$ is the cost of a measurement $y_{ij\alpha t}$. $V(\bar z_t)$ was calculated for various values of $b_0$, with r and $r_1$ chosen according to (5.3.15) and (5.3.16) for each value of $b_0$. In each case $V(\bar z_t)$ declines steadily as $b_0$ increases.
The tables also show that the most dramatic decreases occur early, at small values of $b_0$. Constraints other than cost may be operating, such as the need to provide full-time work for the interviewers hired. Such practical considerations may outweigh the reduction in variance at the higher values of $b_0$. Nevertheless, the results indicate that a survey should use as many interviewers as is practically possible.
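The following sketch shows how entries like those of Table 5.4.1 can be recomputed. It evaluates (5.3.5), (5.3.15), (5.3.16) and (5.3.6) for $S_\zeta^2/S^2 = S_\eta^2/S^2 = S_{IQ}^2/S^2 = S_B^2/S^2 = 0.05$, B = 20, $c_1 = 1$, $c_3 = 11$ and C = 1000. Several choices are our assumptions. We set $S^2 = 1$, since only ratios enter. We use N = $10^6$, the value used for programming in Chapter IV, because Chapter V does not state one. The tables' rounding of r and $r_1$ to integers is not reproduced.

```python
import math


def components(s_zeta, s_eta, s_iq, s_b, s2=1.0, B=20, N=10**6):
    """A, D, E and G of (5.3.5); variance components given as multiples of S^2."""
    A = 2 * s_zeta - 2 * s_eta / (N - 1)
    D = N / (N - 1) * s_eta + s_iq + s_b
    E = N / (N - 1) * s_eta + (B + 1) / (B - 1) * s_iq + s2 - s_b
    G = 2 * B / (B - 1) * s_iq
    return A, D, E, G


def row_for(b0, A, D, E, G, C=1000.0, c1=1.0, c3=11.0, s2=1.0, N=10**6):
    """Unrounded r and r1 from (5.3.15)-(5.3.16) and V(z_t) from (5.3.6)."""
    e_star = E - G / b0
    if e_star <= 0:
        raise ValueError("E - G/b0 <= 0: the rule (5.3.17) would apply")
    k = C / (b0 * (math.sqrt(c1 * e_star) + math.sqrt(c3 * D)))
    r1 = math.sqrt(D / c3) * k
    r = math.sqrt(e_star / c1) * k
    v = A / b0 + D / (b0 * r1) + E / (b0 * r) - G / (b0 ** 2 * r) - s2 / N
    return r, r1, v


# Parameter values of Table 5.4.1; N = 10**6 is an assumption.
A, D, E, G = components(0.05, 0.05, 0.05, 0.05)
rows = {b0: row_for(b0, A, D, E, G) for b0 in (1, 2, 3, 4, 5, 10, 20)}
for b0, (r, r1, v) in rows.items():
    print(f"b0 = {b0:2d}   r = {r:6.1f}   r1 = {r1:5.1f}   V = {v:.4f}")
```

With these settings the sketch gives r ≈ 431, $r_1$ ≈ 52 and V ≈ 0.105 for $b_0 = 1$, and V ≈ 0.0103 for $b_0 = 20$. These agree with the first and last rows of Table 5.4.1, and V falls steadily as $b_0$ grows.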
Table 5.4.1
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    431     52   0.105
  2    219     26   0.0552
  3    147     16   0.0386
  4    110     13   0.0303
  5     88     10   0.0253
 10     44      5   0.0153
 20     22      3   0.0103
Table 5.4.2
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.25$, $S_\eta^2/S^2 = 0.25$, $S_{IQ}^2/S^2 = 0.25$, and $S_B^2/S^2 = 0.25$; $B = 20$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    284     65   0.512
  2    155     31   0.263
  3    106     21   0.179
  4     80     15   0.138
  5     65     12   0.113
 10     33      6   0.0633
 20     16      3   0.0384
Table 5.4.3
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.5$, $S_\eta^2/S^2 = 0.5$, $S_{IQ}^2/S^2 = 0.5$, and $S_B^2/S^2 = 0.5$; $B = 20$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    223     71   1.02
  2    132     33   0.521
  3     91     22   0.355
  4     70     16   0.272
  5     57     13   0.222
 10     29      6   0.123
 20     15      3   0.073
Table 5.4.4
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 1.0$, $S_\eta^2/S^2 = 1.0$, $S_{IQ}^2/S^2 = 1.0$, and $S_B^2/S^2 = 1.0$; $B = 20$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    170     75   2.032
  2    115     35   1.04
  3     81     23   0.706
  4     63     17   0.540
  5     51     13   0.441
 10     26      7   0.241
 20     13      3   0.142
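Table 5.4.5
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.75$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    141     78   0.113
  2    125     34   0.066
  3     91     22   0.051
  4     70     16   0.043
  5     57     13   0.034
 10     28      6   0.029
 20     15      3   0.024

Table 5.4.6
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
[Parameter values and the rows for b_0 = 1 to 5 are illegible in the source.]

b_0      r    r_1   V(z_t)
 10     30      6   0.169
 20     15      3   0.0942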
Table 5.4.7
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.75$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    431     52   1.505
  2    219     26   0.755
  3    147     17   0.505
  4    110     13   0.380
  5     88     10   0.305
 10     44      5   0.155
 20     22      3   0.0803
Table 5.4.8
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.02$, $S_\eta^2/S^2 = 0.02$, $S_{IQ}^2/S^2 = 0.5$, and $S_B^2/S^2 = 1.05$; $B = 20$, $c_3 = 3$

b_0      r    r_1   V(z_t)
  1    250    250   0.0442
  2    125    125   0.0244
  3     83     83   0.0177
  4     63     63   0.0145
  5     50     50   0.0125
 10     25     25   0.00854
 20     13     13   0.00656
Table 5.4.9
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.01$, $S_\eta^2/S^2 = 0.01$, $S_{IQ}^2/S^2 = 0.5$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 3$

b_0      r    r_1   V(z_t)
  1    244    219   0.0239
  2    217     94   0.0152
  3    151     61   0.0123
  4    116     45   0.0108
  5     94     35   0.0099
 10     48     17   0.0082
 20     24      9   0.0073
Table 5.4.10
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 21$

b_0      r    r_1   V(z_t)
  1    354     31   0.108
  2    180     15   0.058
  3    121     10   0.041
  4     91      8   0.033
  5     73      6   0.028
 10     37      3   0.018
 20     18      2   0.013
Table 5.4.11
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    431     52   0.105
  2    219     26   0.055
  3    147     17   0.039
  4    110     13   0.030
  5     88     10   0.025
 10     44      5   0.015
 20     22      3   0.010
Table 5.4.12
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 34$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    431     52   0.105
  2    219     26   0.05
  3    147     17   0.039
  4    110     13   0.030
  5     88     10   0.025
 10     44      5   0.015
 20     22      3   0.010
 30     15      2   0.0086
 34     13      1   0.0083
Table 5.4.13
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 50$, $c_3 = 11$

b_0      r    r_1   V(z_t)
  1    431     52   0.105
  2    219     26   0.055
  3    147     17   0.039
  4    110     13   0.030
  5     88     10   0.025
 10     44      5   0.015
 20     22      3   0.010
 30     15      1   0.0086
 40     11      1   0.0078
 50      9      1   0.0073
Table 5.4.14
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 41$

b_0      r    r_1   V(z_t)
  1    282     18   0.112
  2    144      9   0.062
  3     96      6   0.046
  4     73      4   0.037
  5     58      3   0.032
 10     29      2   0.022
 20     15      1   0.017
Table 5.4.15
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 51$

b_0      r    r_1   V(z_t)
  1    261     14   0.114
  2    133      7   0.064
  3     89      5   0.048
  4     67      4   0.039
  5     54      3   0.034
 10     27      1   0.0243
 19     14      1   0.020
Table 5.4.16
$V(\bar z_t)$, $r$, and $r_1$ for Various $b_0$
$S_\zeta^2/S^2 = 0.05$, $S_\eta^2/S^2 = 0.05$, $S_{IQ}^2/S^2 = 0.05$, and $S_B^2/S^2 = 0.05$; $B = 20$, $c_3 = 101$

b_0      r    r_1   V(z_t)
  1    200      8   0.124
  2    102      4   0.074
  3     69      3   0.057
  4     52      2   0.049
  5     41      2   0.044
  9     23      1   0.035
CHAPTER VI
SUMMARY AND INDICATIONS FOR FURTHER RESEARCH
6.1.
Summary
The research presented in this paper has examined a model for treating measurement errors in surveys.
Chapter I reviews previous work in the field and discusses the work on which this dissertation builds.
Chapter II develops a general double sampling scheme model that can be applied to a variety of situations. The model assumes that a faulty measurement process and an error-free measurement process are both available for use in conducting a survey. As set out in Chapter II, the model allows one simultaneously to eliminate the measurement process bias of the faulty measurement process and to estimate the increase in variance that measurement errors introduce into survey estimates.
In Chapter III the model is adapted to two specific survey situations, a self-enumeration survey and a survey that employs interviewers. In each case a specific model for the measurement errors is given, and the relationship of its components to the general model is demonstrated. Sample estimators for the components are presented.
Issues in using the DSS model are treated in Chapter IV. Optimum values for the sample sizes are derived for fixed cost and for fixed variance. The general model is modified to one that eliminates the measurement process bias only. Three survey procedures are compared: using only the faulty measurement process, using only the error-free measurement process, and combining the two in a double sampling scheme. The preferred survey procedure is indicated for each case.
The results point to the DSS model when three conditions hold together: the measurement process bias is moderate to large, the relative cost of error-free to faulty measurements is moderate, and the variable errors due to measurement errors are moderate. The error-free measurement process seems best when the bias is moderate to large, the relative cost is small and the variable errors due to measurement errors are large. The faulty measurement process is best suited to cases in which the bias is small and the relative cost large.
The optimum number of interviewers is examined in Chapter V using an iterative solution. The number of interviewers is fixed, and the optimum values of r and $r_1$ are calculated; these are the interviewer workloads at each phase of the double sampling scheme. $V(\bar z_t)$ is then calculated for each combination of b, r and $r_1$. The combination that gives the smallest $V(\bar z_t)$ for fixed cost is optimum. The results indicate that a survey should use as many interviewers as is practically possible.
Appendix A presents the DSS model for general probability sampling. Appendix B gives the results for an interviewer-conducted survey when randomization across interviewers is not done at the second phase of the survey.
6.2.
Indications for Further Research
The following recommendations are made for further
research.
1.
Application of the model to actual surveys to
determine the size of the components is of paramount
importance.
2.
The effect of relaxing the assumptions, particularly those of phase-to-phase and trial-to-trial independence, needs to be examined.
3.
An adaptation of the model to use a combination
of a faulty measurement process and a less faulty (but not
error-free) measurement process would be useful.
4.
The effect of non-response should be examined.
APPENDIX A
GENERAL DSS MODEL IN THE NON-SIMPLE RANDOM SAMPLING CASE
This appendix presents the DSS model in the general case, i.e., for a probability sampling scheme other than simple random sampling. The presentation is in outline form.
1.
The general conditions of the model are those of Chapter II.
2.
For an initial sample of size n and a subsample of size $n_1$, with possibly unequal probabilities of selection, we have
$$\begin{aligned}
\Pr\{U_i = 1\} &= \phi_i, & \Pr\{U_i = 1, U_j = 1\} &= \theta_{ij}, \quad i \neq j,\\
\Pr\{V_i = 1 \mid U_i = 1\} &= p_i, & \Pr\{V_i = 1 \mid U_i = 0\} &= 0,\\
\Pr\{V_i = 1\} &= \delta_i, & \Pr\{V_i = 1, V_j = 1\} &= \epsilon_{ij}, \quad i \neq j,\\
\Pr\{U_i = 1, V_i = 1\} &= \delta_i, & \Pr\{U_i = 1, V_j = 1\} &= \gamma_{ij}, \quad i \neq j.
\end{aligned}$$
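3.
We will consider the statistic $\mathbf{z}_t$ as an estimate of
$$\mathbf{X} = \sum_{i=1}^{N} w_i\phi_i\mathbf{X}_i,$$
where
$$\mathbf{z}_t = \mathbf{y}_{1t} - \mathbf{y}_{2t} + \mathbf{x}_{2t}$$
and
$$\mathbf{y}_{1t} = \sum_{i=1}^{N} w_iU_i\mathbf{y}_{i1t}, \qquad \mathbf{y}_{2t} = \sum_{i=1}^{N} m_iV_i\mathbf{y}_{i2t}, \qquad \mathbf{x}_{2t} = \sum_{i=1}^{N} m_iV_i\mathbf{X}_i,$$
with the weights $w_i$ and $m_i$ chosen so that $w_i\phi_i = m_i\delta_i$.
4.
The statistic $\mathbf{z}_t$ is an unbiased estimate of $\mathbf{X}$:
$$E(\mathbf{z}_t) = \sum_{i=1}^{N} w_i\phi_i\mathbf{Y}_i - \sum_{i=1}^{N} m_i\delta_i\mathbf{Y}_i + \sum_{i=1}^{N} m_i\delta_i\mathbf{X}_i = \mathbf{X}.$$
5.
In the case of estimating the population mean $\bar{\mathbf{X}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{X}_i$, we may choose $w_i = \frac{1}{N\phi_i}$ and $m_i = \frac{1}{N\delta_i}$. Therefore $w_i\phi_i = \frac{1}{N} = m_i\delta_i$ and
$$\mathbf{X} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{X}_i = \bar{\mathbf{X}}.$$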
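6.
For the variance-covariance matrix of $\mathbf{z}_t$ we have
$$\mathbf{V} = E\{[(\mathbf{y}_{1t}-\mathbf{y}_1)-(\mathbf{y}_{2t}-\mathbf{y}_2)][(\mathbf{y}_{1t}-\mathbf{y}_1)-(\mathbf{y}_{2t}-\mathbf{y}_2)]'\} + E\{[(\mathbf{y}_1-\mathbf{x}_1)-(\mathbf{y}_2-\mathbf{x}_2)][(\mathbf{y}_1-\mathbf{x}_1)-(\mathbf{y}_2-\mathbf{x}_2)]'\} + E\{(\mathbf{x}_1-\mathbf{X})(\mathbf{x}_1-\mathbf{X})'\},$$
where
$$\mathbf{y}_1 = \sum_{i=1}^{N} w_iU_i\mathbf{Y}_i, \qquad \mathbf{y}_2 = \sum_{i=1}^{N} m_iV_i\mathbf{Y}_i, \qquad \mathbf{x}_1 = \sum_{i=1}^{N} w_iU_i\mathbf{X}_i.$$
The subscript t has been dropped from $\mathbf{x}_{2t}$ because the $\mathbf{X}_i$ do not vary from trial to trial. The three components of $\mathbf{V}$ are, respectively, (MV), (BV) and (TV).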
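7.
Let
$$\begin{aligned}
(\mathrm{SMV})_i &= E\{(\mathbf{y}_{i\alpha t}-\mathbf{Y}_i)(\mathbf{y}_{i\alpha t}-\mathbf{Y}_i)'\},\\
(\mathrm{CMV})_{ij} &= E\{(\mathbf{y}_{i\alpha t}-\mathbf{Y}_i)(\mathbf{y}_{j\alpha t}-\mathbf{Y}_j)'\},\\
(\mathrm{SMVC})_i &= E\{(\mathbf{y}_{i1t}-\mathbf{Y}_i)(\mathbf{y}_{i2t}-\mathbf{Y}_i)'\},\\
(\mathrm{CMVC})_{ij} &= E\{(\mathbf{y}_{i1t}-\mathbf{Y}_i)(\mathbf{y}_{j2t}-\mathbf{Y}_j)'\}.
\end{aligned}$$
Then we have the following for (MV):
A. No between-phase correlation:
$$(\mathrm{MV}) = \sum_{i=1}^{N} w_i^2\phi_i(\mathrm{SMV})_i + \sum_{i\neq j}^{N} w_iw_j\theta_{ij}(\mathrm{CMV})_{ij} + \sum_{i=1}^{N} m_i^2\delta_i(\mathrm{SMV})_i + \sum_{i\neq j}^{N} m_im_j\epsilon_{ij}(\mathrm{CMV})_{ij}.$$
B. Between-phase correlation exists:
$$(\mathrm{MV}) = \sum_{i=1}^{N} w_i^2\phi_i(\mathrm{SMV})_i + \sum_{i\neq j}^{N} w_iw_j\theta_{ij}(\mathrm{CMV})_{ij} + \sum_{i=1}^{N} m_i^2\delta_i(\mathrm{SMV})_i + \sum_{i\neq j}^{N} m_im_j\epsilon_{ij}(\mathrm{CMV})_{ij} - 2\left(\sum_{i=1}^{N} w_im_i\delta_i(\mathrm{SMVC})_i + \sum_{i\neq j}^{N} w_im_j\gamma_{ij}(\mathrm{CMVC})_{ij}\right).$$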
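8.
Let
$$(\mathrm{ITV})_i = (\mathbf{X}_i-\mathbf{X})(\mathbf{X}_i-\mathbf{X})' \quad\text{and}\quad (\mathrm{IITV})_{ij} = (\mathbf{X}_i-\mathbf{X})(\mathbf{X}_j-\mathbf{X})'.$$
Then
$$(\mathrm{TV}) = \sum_{i=1}^{N} w_i^2\phi_i(\mathrm{ITV})_i + \sum_{i\neq j}^{N} w_iw_j\theta_{ij}(\mathrm{IITV})_{ij}.$$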
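9.
Let
$$(\mathrm{IBV})_i = (\mathbf{Y}_i-\mathbf{X}_i)(\mathbf{Y}_i-\mathbf{X}_i)' \quad\text{and}\quad (\mathrm{IIBV})_{ij} = (\mathbf{Y}_i-\mathbf{X}_i)(\mathbf{Y}_j-\mathbf{X}_j)'.$$
Then we have the following for (BV):
$$(\mathrm{BV}) = \sum_{i=1}^{N} w_i^2\phi_i(\mathrm{IBV})_i + \sum_{i\neq j}^{N} w_iw_j\theta_{ij}(\mathrm{IIBV})_{ij} + \sum_{i=1}^{N} m_i^2\delta_i(\mathrm{IBV})_i + \sum_{i\neq j}^{N} m_im_j\epsilon_{ij}(\mathrm{IIBV})_{ij} - 2\left[\sum_{i=1}^{N} w_im_i\delta_i(\mathrm{IBV})_i + \sum_{i\neq j}^{N} w_im_j\gamma_{ij}(\mathrm{IIBV})_{ij}\right].$$
10.
If we let
$$\Lambda_i = w_i^2\phi_i + m_i^2\delta_i, \qquad \Lambda_{ij} = w_iw_j\theta_{ij} + m_im_j\epsilon_{ij}, \qquad K_i = -2w_im_i\delta_i, \qquad K_{ij} = -2w_im_j\gamma_{ij},$$
we have the following for $\mathbf{V}$ (in case A, $(\mathrm{SMVC})_i$ and $(\mathrm{CMVC})_{ij}$ are zero):
$$\mathbf{V} = \sum_{i=1}^{N}\Lambda_i[(\mathrm{SMV})_i + (\mathrm{IBV})_i] + \sum_{i\neq j}^{N}\Lambda_{ij}[(\mathrm{CMV})_{ij} + (\mathrm{IIBV})_{ij}] + \sum_{i=1}^{N}K_i[(\mathrm{SMVC})_i + (\mathrm{IBV})_i] + \sum_{i\neq j}^{N}K_{ij}[(\mathrm{CMVC})_{ij} + (\mathrm{IIBV})_{ij}] + (\mathrm{TV}).$$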
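Item 4 can be checked exactly in a small case under assumptions of our own. First, each unit enters the initial sample independently with probability $\phi_i$, and, if selected, enters the subsample with probability $p_i$, so $\delta_i = \phi_i p_i$. This is a Poisson-type design with random sample sizes, chosen only because its outcomes are easy to enumerate. Second, the measurements are replaced by their trial expectations $Y_i$, so that only sampling variation is averaged over. Third, the weights are those of item 5. Enumerating every outcome for N = 4 shows that $E(z_t) = \bar X$ even though the $Y_i$ are biased. The data values are illustrative.

```python
from itertools import product


def exact_expectation_of_z(X, Y, phi, p):
    """Exact E(z) over all sample outcomes of a Poisson-type two-phase design.

    Unit i enters the initial sample with probability phi[i] and, given that,
    the subsample with probability p[i]. Weights follow item 5:
    w_i = 1/(N phi_i) and m_i = 1/(N delta_i), so w_i phi_i = m_i delta_i = 1/N.
    """
    N = len(X)
    delta = [phi[i] * p[i] for i in range(N)]      # Pr{V_i = 1}
    w = [1.0 / (N * phi[i]) for i in range(N)]
    m = [1.0 / (N * delta[i]) for i in range(N)]
    expectation = 0.0
    # Unit state: 0 = not sampled, 1 = initial sample only, 2 = sample and subsample.
    for states in product((0, 1, 2), repeat=N):
        prob, z = 1.0, 0.0
        for i, s in enumerate(states):
            if s == 0:
                prob *= 1.0 - phi[i]
            elif s == 1:
                prob *= phi[i] * (1.0 - p[i])
                z += w[i] * Y[i]                                # enters y_1 only
            else:
                prob *= phi[i] * p[i]
                z += w[i] * Y[i] - m[i] * Y[i] + m[i] * X[i]    # y_1 - y_2 + x_2
        expectation += prob * z
    return expectation


X = [10.0, 14.0, 9.0, 21.0]     # error-free values (illustrative)
Y = [12.5, 13.0, 11.0, 26.0]    # expected faulty measurements (biased)
phi = [0.3, 0.5, 0.6, 0.8]
p = [0.5, 0.4, 0.5, 0.25]
print(exact_expectation_of_z(X, Y, phi, p), sum(X) / len(X))
```

The identity $w_i\phi_i = m_i\delta_i$ makes the faulty-measurement terms cancel in expectation, leaving only the error-free values.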
APPENDIX B
EFFECT ON $V(\bar z_t)$ IN AN INTERVIEWER SURVEY WHEN RANDOMIZATION IS NOT DONE AT THE SECOND PHASE
When randomization across interviewers is not done at the second phase of the survey, the coefficients of the variance components defined in 3.3 change. In addition, a new component of error is introduced:
$$S_Q^2 = \frac{1}{B}\sum_{j=1}^{B} Q_j^2,$$
the interviewer fixed-effect variance. The expression for $V(\bar z_t)$ is
$$V(\bar z_t) = \frac{r[n_1(n-1) + n(n_1-1)] + n(n-n_1)}{nn_1(n-1)}\{S_\zeta^2\} + \left[\frac{nN(n_1-1) - n(n-n_1) - n_1N(n-1)}{nn_1(N-1)(n-1)}\right]\{S_\eta^2\} + \left[\frac{nN(n_1-1) - n(n-n_1) - n_1N(n-1)}{nn_1N(n-1)}\right]\{S_{IQ}^2\} + \cdots$$
BIBLIOGRAPHY
Bailar, B. A. "Recent Research in Reinterview Procedures." JASA, 63 (1968), 41-63.
Borus, M. E. "Response Errors in Survey Reports of Earning Information." JASA, 61 (1966), 729-738.
Buck, R. Creighton. Advanced Calculus. New York: McGraw-Hill, 1965.
Carter, H., P. C. Glick, and S. Levit. "Some Demographic Characteristics of Recently Married Persons: Comparisons of Registration Data and Sampling Survey Data." Am. Soc. Rev., 20 (1955), 165-172.
Chandrasekar, C., and W. E. Deming. "On a Method of Estimating Birth and Death Rates and the Extent of Registration." JASA, 44 (1949), 101-115.
Cochran, W. G. Sampling Techniques. New York: John Wiley and Sons, 1963.
Cochran, W. G. "Errors of Measurement in Statistics." Technometrics, 10 (1968), 637-666.
Dalenius, T. "Recent Advances in Sample Survey Theory and Methods." Annals of Mathematical Statistics, 31 (1962), 325-349.
David, Martin. "The Validity of Income Reported by a Sample of Families who Received Welfare Assistance During 1959." JASA, 57 (1962).
Deming, W. E. "On Errors in Surveys." Am. Soc. Rev., 9 (1944), 359-369.
Deming, W. E. "Uncertainties in Statistical Data and their Relationship to the Design and Management of Statistical Surveys and Experiments." Bull. International Statistical Institute, 38 (1961), Part 1, 365-383.
Durbin, J. "Non-Response and Call-Backs in Surveys." Bull. International Statistical Institute, 34 (1954), Part 2, 82-86.
El-Badry, M. A. IIA Sampling Procedure for Mailed Questionnaires. 1I JASA, 51 (1956), 209-227.
Fellegi, I. P. IIResponse Variance and its Estimation. 1I
JASA, 59 (1964), 1016-1041.
Ferber, Robert. liOn the Reliability of Responses Secured
in Sample Surveys.1I JASA, 50 (1955).
Fleischer, Jack, Daniel G. Horvitz, J. Malcolm Airth, and
A. L. Finkner. IIMeasurement Errors Associated with
Obtaining Acreage Estimates of Cotton Fields. 1I
Biometrics, 13 (1958), 401.
Greenberg, Bernard G., Abdel-Latif A. Abul-Ela, Walt R.
Simmons, and Daniel G. Horvitz. liThe Unrelated
Question Randomized Response Model: Theoretical
Framework. II JASA, 64 (1969), 520.
Greenberg, Bernard G., James R. Abernathy, and Daniel G.
Horvitz. IIA New Survey Technique and its Application
in the Field of Public Health. a Milbank Memorial
Fund Quarterly, October, 1970.
Greenberg, Bernard G., Roy R. Kuebler, Jr., James R.
Abernathy, and Daniel G. Horvitz. IIApplication of
the Randomized Response Technique in Obtaining
Quantitative Data.
JASA, 66 (197l), 243.
1I
Han sen, M. H., and W. N. Hurw i t z . liThe Problem of NonResponse in Sample Surveys.1I JASA, 41 (1946), 517529.
Hansen, M. H., W. N. Hur wit z, E. S. Marks , and P. W.
Mauldin. IIResponse Errors in Surveys.1I JASA, 46
(1951), 147-190.
Hansen, M. H., W. N. Hurwitz, and M. A. Bershad. "Measurement Errors in Censuses and Surveys." Bull. International Statistical Institute, 38 (1961), 359-374.
Hansen, M. H., W. N. Hurwitz, and W. G. Madow. Sample
Survey Methods and Theory. Vol. II. New York: John
Wiley and Sons, 1953.
Hansen, M. H., W. N. Hurwitz, and L. Pritzker. "The Estimation and Interpretation of Gross Differences and Simple Response Variance." Contributions to Statistics. Oxford, England: Pergamon Press, 1964.
Hansen, M. H., W. N. Hurwitz, and L. Pritzker. "Standardization of Procedures for the Evaluation of Data: Measurement Errors and Statistical Standards in the Bureau of the Census." Bull. International Statistical Institute, 42 (1967), 49-66.
Heise, David R., and George W. Bohrnstedt. "Validity, Invalidity, and Reliability." In Borgatta (ed.), Sociological Methodology 1970. San Francisco: Jossey-Bass, Inc., Publishers, 1970.
Horvitz, D. G. "Problems in Designing Interview Surveys to Measure Population Growth." Soc. Stat. Sec., Am. Stat. Assoc., (1966), 245-249.
Kish, L., and J. B. Lansing. "Response Errors in Estimating the Value of Homes." JASA, 49 (1954), 520-538.
Kish, L. "Studies of Interviewer Variance for Attitudinal
Variables." JASA, 57 (1962), 92-115.
Kish, Leslie. Survey Sampling. New York: John Wiley and Sons, 1965.
Koch, Gary G. "A Response Error Model for Sub-Class Means
and Post-Stratified Means." Unpublished Technical
Report #6, Research Triangle Institute, Project
SU-618, September, 1971.
Koch, Gary G. "A Response Error Model for a Simple Interviewer Structure Situation." Unpublished Technical Report #4, Research Triangle Institute, Project SU-618, July, 1971.
Koch, Gary G. "An Alternative Approach to Multivariate
Response Error Models for Sample Survey Data with
Applications to Estimators Involving Subclass Means."
Submitted to JASA.
Koch, Gary G. "Some Survey Designs for Estimating Response Error Model Components." Unpublished Technical Report #5, Research Triangle Institute, Project 216-730, January, 1973.
Larsen, Richard F., and William R. Catton, Jr. "Can the Mail-Back Bias Contribute to a Study's Validity?" Am. Soc. Rev., 24 (1959), 243.
Madow, William G. "On Some Aspects of Response Error Measurement." Proc. Soc. Stat. Sec., Am. Stat. Assoc., (1965), 182-192.
Mahalanobis, P. C. "Recent Experiments in Statistical Sampling in the Indian Statistical Institute." J. Royal Stat. Soc., 109 (1946), 327-378.
Maynes, E. Scott. "Minimizing Response Errors in Financial Data: The Possibilities." JASA, 63 (1968), 214.
Mosteller, Frederick. "Nonsampling Errors." International Encyclopedia of the Social Sciences. Vol. V, pp. 113-131.
Neter, John, E. Scott Maynes, and R. Ramanathan. "The Effect of Mismatching on the Measurement of Response Errors." JASA, 60 (1965), 1005.
Pearson, Karl. "On the Mathematical Theory of Errors of Measurement." Philosophical Transactions, Royal Society of London, Series A, 198 (1902), 235-299.
Rice, S. A. "Contagious Bias in the Interview." Am. J. Soc., 34 (1929), 420-423.
Sukhatme, P. V., and G. R. Seth. "Non-Sampling Errors in Surveys." J. Ind. Soc. Agr. Stat., 4-5 (1952).
Tenenbein, Aaron. "A Double Sampling Scheme for Estimating from Binomial Data with Misclassifications." JASA, 65 (1970).
Wells, H. Bradley. "Dual Record Systems for Measurement of Fertility Change." Working Paper No. 13, East-West Population Institute, East-West Center, April, 1971.
Williams, W. H., and C. L. Mallows. "Systematic Biases in Panel Surveys." JASA, 65 (1970), 1338-1349.
Zarkovich, S. S. Quality of Statistical Data. Rome, Italy: Food and Agriculture Organization of the United Nations, 1966.