Freeman, Daniel H., Jr. (1975). The regression analysis of data from complex sample surveys: an empirical investigation of covariance matrix estimation.

This research was supported in part by the National Institute of Child
Health and Human Development through Grants HD-00371 and by the U.S.
Bureau of the Census and the Department of Biostatistics through Joint
Statistical Agreements JSA 74-2 and JSA 75-2.
THE REGRESSION ANALYSIS OF DATA FROM COMPLEX
SAMPLE SURVEYS:
AN EMPIRICAL INVESTIGATION OF COVARIANCE
MATRIX ESTIMATION
by
Daniel H. Freeman, Jr.
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1020
July, 1975
ABSTRACT
DANIEL H. FREEMAN, JR. The Regression Analysis of Data from Complex
Sample Surveys: An Empirical Investigation of Covariance Matrix
Estimation. (Under the direction of GARY G. KOCH.)
The U. S. Bureau of the Census model for response errors is
specialized to the case of cross-classified data.
The application of
the method of half samples for balanced repeated replication and
weighted least squares for inference is discussed in the framework of
this model.
This discussion suggests an investigation of several procedures
for the estimation of the covariance matrix of a cross-classified
data set.
The investigation is carried out first in the presence of primarily sampling variance and then in the presence of limited response
error.
A Taylor series approximation is compared to the direct replication
of ratio estimates to investigate consistency properties. The
effect of poststratification for nonresponse and the effect of the
assumption of zero interdomain covariance are also examined.
The
results of the comparisons are then employed in the regression analysis
of survey data.
The empirical investigation is followed by an analytical discussion of various postsampling data adjustments.
These adjustments
are directed at the underlying stochastic process which generated the
survey responses.
ACKNOWLEDGEMENTS
I wish to extend my heartfelt thanks to the many people
without whom this investigation would not have been feasible.
This is
especially true with respect to Doctor Gary G. Koch under whose guidance this work was done.
Both his insights into the problem and his
advice concerning my academic career as a student have been invaluable.
I also wish to thank the members of my committee, Dr. James R.
Abernathy, Dr. Karl Bauman, Dr. Daniel G. Horvitz, and Dr. H. Bradley
Wells.
Their patient reading of the manuscript and advice with respect
to both content and form have contributed greatly to the clarity
and correctness of the presentation.
However, any errors which may
have slipped by are solely the responsibility of the author.
Any empirical investigation requires a substantial amount of
computer work and great care in the editing and preparation of the
tables and the text.
This study would have been much more difficult
without the tireless efforts of Jean L. Freeman in the computer work.
Similarly, Rebecca Wesson and Betty Owens have been vital to the typing
and editing of the manuscript.
The National Center for Health Statistics, in the persons of
Dwight B. Brock, Earl Bryant, Gretchen Jones, Bob Fuchsberg, Clint
Burnham, George Schnack, Bob Casady, and Ron Wilson, has been very
generous in providing the data for this study.
Financial support
for this work has been partially provided by the National Institute
of Child Health and Human Development through Grants HD-00371 and by
the U.S. Bureau of the Census and the Department of Biostatistics
through Joint Statistical Agreements JSA 74-2 and JSA 75-2.
Finally, I owe a great deal to those persons of both the
Department of Biostatistics and the University at large whose assistance cannot be quantified.
This is especially true of the Friday
Athletic Club.
TABLE OF CONTENTS
LIST OF TABLES
CHAPTER
I. INTRODUCTION AND REVIEW OF LITERATURE: THE CONTEXT OF THE INVESTIGATION
1.1 Introduction
1.2 Sampling Variance and Estimation
1.3 External Measurement Effects
1.4 Respondent Errors
1.5 Summary and Problem
II. GENERAL ANALYTICAL FRAMEWORK
2.1 Introduction
2.2 Notation and Definitions
2.3 The Response Error Model and Linear Statistics for Cross-Classified Data
2.4 Replication for Estimates of Var{y}
2.5 Application of BRR and CSM to Domain Rates
2.6 An Inference Structure for Complex Survey Estimates
2.7 Summary
III. THE INVESTIGATION OF EFFECTS ON INFERENCE IN THE PRESENCE OF ONLY SAMPLING VARIANCE
3.1 Introduction
3.2 The Data
3.3 Estimates and Variance Covariance Matrix
3.4 Design of the Experiment
3.5 The Taylor Series Approximation (TS) Compared to Replicated Ratios (RR)
3.6 The Poststratification (PS) Adjustment of Survey Data
3.7 Covariance Effects
3.8 Inference Structure and Substantive Conclusions
3.9 Summary
3.10 Appendix of Detailed Tables
IV. THE INVESTIGATION OF EFFECTS ON INFERENCE IN THE PRESENCE OF LIMITED RESPONSE ERROR
4.1 Introduction
4.2 The Comparison of the Taylor Series Approximation (TS) to BRR for Ratio Estimates
4.3 The Poststratification Adjustment of Survey Data
4.4 Covariance Effects
4.5 Inference Structure and Substantive Conclusions
4.6 Summary of Results
4.7 Appendix of Detailed Tables
V. ANALYSIS OF POSTSAMPLING ADJUSTMENT EFFECTS AND MORE COMPLEX FUNCTIONS OF SURVEY DATA
5.1 Introduction
5.2 The IPF Postsampling Adjustment Procedure
5.3 Variance Structure of the IPF Model
5.4 Application to Poststratification
5.5 Complex Postsampling Models
5.6 Summary and Conclusion
VI. CONCLUSIONS AND DIRECTION OF FURTHER RESEARCH
6.1 Conclusions
6.2 Directions
BIBLIOGRAPHY
LIST OF TABLES
TABLE
3.1 Ratio Estimates of Physician Visits/Population Classified by Age, Sex, and Race
3.2 Comparison of Estimated Standard Errors for RR and TS Procedure. Nonpoststratified Age x Sex x Race Classification of PV/P. Covariances Estimated
3.3 Comparison of Tests of Hypotheses, Replicated Ratios (RR) Versus Taylor Series Approximation (TS), Age, Sex, and Race Classification, Physician Visits/Population, Covariances Estimated, Q Statistics, Nonpoststratified
3.4 Effects of Poststratification in Terms of Overall Standard Errors. Age, Sex, and Race Classification of Physician Visits/Population. TS and RR Estimates
3.5.1 Correlation Matrix for Replicated Ratios, Age x Sex x Race, Poststratified
3.5.2 Correlation Matrix for Replicated Ratios, Age x Sex x Race, Nonpoststratified
3.6 Effects of Zero Covariance Assumption on Final Models, Nonpoststratified Replicated Ratios, Physician Visits/Population Classified by Age, Sex, and Race
3.7 Predicted Values (Estimated Standard Errors in Parentheses) for Ratio Estimates of Physician Visits/Population Based on Poststratified Data with Age, Sex, and Race Classification and All Covariances Estimated
3.8 Detailed Tables in the Appendix
4.1 Ratio Estimates of Physician Visits/Population Classified by Income, Residence, and Education
4.2 Comparison of Estimated Standard Errors for Replicated Ratios (RR) and Taylor Series Approximation (TS), Income by Residence by Education Classification of Physician Visits/Population (Covariances Estimated)
4.3 Comparison of Tests of Hypotheses, Replicated Ratios (RR) Versus Taylor Series Approximation (TS), Income (I) by Residence (R) by Education (E) Classification, Physician Visits/Population, Covariances Estimated. Q Statistics
4.4.1 Effects of Poststratification (PS) Versus Nonpoststratification (NPS). Income by Residence by Education Classification. Physician Visits/Population, Covariances Estimated
4.4.2 Effects of Poststratification (PS) Versus Nonpoststratification (NPS). Income (I) by Residence (R) by Education (E) Classification. Physician Visits/Population, Covariances Estimated. Replicated Ratios and Taylor Series Approximation
4.5.1 Correlation Matrix for Income by Residence by Education, Replicated Ratios, Nonpoststratified Data
4.5.2 Correlation Matrix for Income by Residence by Education, Replicated Ratios, Poststratified Data
4.6.1 Effects of Zero Covariance Assumption on Final Models, Nonpoststratified Replicated Ratios, Physician Visits/Population Classified by Income, Residence, and Education
4.6.2 Effects of Zero Covariance Assumption on Final Models, Poststratified Replicated Ratios, Physician Visits/Population Classified by Income, Residence, and Education
4.6.3 Effects of Zero Covariance Assumption on Final Models, Nonpoststratified Taylor Series Approximation, Physician Visits/Population Classified by Income, Residence, and Education
4.6.4 Effects of Zero Covariance Assumption on Final Models, Poststratified Taylor Series Approximation, Physician Visits/Population Classified by Income, Residence, and Education
4.7 Predicted Values (Estimated Standard Errors in Parentheses) for Ratio Estimates of Physician Visits/Population Based on Replicated Ratios with Income, Residence, and Education Classification. All Covariances are Estimated, Nonpoststratified Data
4.8 Detailed Tables in the Appendix
CHAPTER ONE
INTRODUCTION AND REVIEW OF LITERATURE:
THE CONTEXT OF THE INVESTIGATION
1.1 Introduction
This paper is an investigation of the weighted least squares
methodology described in Grizzle, Starmer and Koch (1968) (hereafter
GSK) for inference procedures as applied to complex sample survey data.
The central issue in any statistical inference procedure is the estimation of population parameters and the estimation of the standard errors
of these parameter estimates.
At this time, survey sampling has
reached a high level of sophistication in order to obtain accurate
information about large populations at minimum cost.
In the process of
developing these survey techniques the actual sampling has become much
different from the simple random sampling procedures for which the
estimators and inference procedures of mathematical statistics were
originally constructed.
Furthermore, as survey sampling developed,
many sources of variation in the estimates, other than those induced
by the sampling procedure, were recognized.
These may be generally termed nonsampling sources of variation.
This chapter is devoted to a
qualitative discussion of these sources of variation.
This discussion
sets the context for the investigation of the weighted least squares
methodology for inference on data from complex sample surveys.
The organization of the chapter is first a brief discussion of
sampling variance and the methods of estimating it.
Then several of
the nonsampling sources of variation are discussed.
Specifically,
errors in coverage and nonresponse effects, the class of
errors associated with the measurement process, and lastly, the errors
associated with the respondent are examined in separate sections.
The literature in each of the areas is used to identify variables
usually associated with these errors.
In addition, the work of
other investigators, which was a guide in the formulation of this study,
is discussed.
The review of literature in these areas would be quite
extensive in itself (Sudman and Bradburn (1974)),
and even a much larger
discussion would of necessity be incomplete (Hansen and Waksberg
(1970)).
What is attempted here is to highlight
some of the more important issues and some of the ways in which they
have been dealt with.
Out of this discussion certain procedures for estimation
are selected for the implementation of this investigation.
1.2 Sampling Variance and Estimation
The first source of error that comes to mind when one thinks of
statistical procedures is that associated with the taking of samples.
That is, when subsets of a population are examined, then the derived
estimates depend on the subset which is selected.
When different subsets
are selected, then different estimates are obtained.
This variation
among the estimates induced
by the selection process
or selection design is termed sampling variance (e.g., Cochran (1963)
and Kish (1965)).
The selected subsets are termed samples and if some
identifiable probabilities are associated with the selection design
these samples are referred to as probability or random samples.
This
paper is restricted to the consideration of random samples without
replacement and unless otherwise specified this is what the term "random sample" will refer to.
These selection probabilities may be quite simple, as in the
case for simple random sampling where all r-tuples of individuals of
the population
have an equal probability of being selected.
A great
deal of effort has been made to determine explicit formulae for these
selection probabilities, for it is based on these probabilities that
the choice among various estimators can be made.
This is because the
selection probabilities determine the sampling properties of the estimators.
However, more and more sophisticated selection designs have
been developed to minimize the cost of conducting surveys.
This has
made the formulation of the selection probabilities more and more complex.
This in turn has meant that in many situations it is no longer
desirable or even feasible to obtain direct estimates of the standard
errors of survey statistics, because of the computational difficulties
involved.
This difficulty has led to the development of indirect estimators of sampling variance.
Some methods
rely
on linear approximations
to the sample statistics.
These are often referred to as Taylor series
approximations.
Generally, these yield consistent estimates of variance
(see for example Frankel (1971)).
However, these still require
the knowledge of the selection probabilities.
Other methods simply
examine subsamples within the original sample.
The principle is that
variation among the estimates based on each of the subsamples reflects
the variation of the statistic based on the entire sample.
This has
the obvious advantage of bypassing the selection probabilities.
Two
such methods are balanced repeated replication (BRR) (McCarthy (1966,
1969a,b))
and jackknifing (e.g., Miller (1974)).
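The two indirect approaches just described can be compared on a small scale. The following is a minimal sketch, assuming a design with exactly two primary sampling units per stratum and weighted unit totals already computed; the function names, the Sylvester construction of the balancing matrix, and the centering of the half-sample estimates on the full-sample estimate are illustrative assumptions, not the procedure used in this investigation.

```python
import numpy as np

def sylvester_hadamard(order):
    """Hadamard matrix of the smallest power-of-two size >= order."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.block([[h, h], [h, -h]])
    return h

def brr_variance(y, x, estimator):
    """BRR variance of estimator(y_total, x_total).

    y, x: arrays of shape (strata, 2) holding weighted totals for the
    two sampled units in each stratum (an assumed layout).
    """
    n_strata = y.shape[0]
    # One orthogonal +/-1 column per stratum; the all-ones column is skipped.
    signs = sylvester_hadamard(n_strata + 1)[:, 1:n_strata + 1]
    full = estimator(y.sum(), x.sum())
    rows = np.arange(n_strata)
    half_sample_estimates = []
    for s in signs:
        pick = np.where(s == 1, 0, 1)  # which unit enters this half sample
        half_sample_estimates.append(
            estimator(2 * y[rows, pick].sum(), 2 * x[rows, pick].sum())
        )
    return np.mean((np.array(half_sample_estimates) - full) ** 2)

def taylor_ratio_variance(y, x):
    """Linearized (Taylor series) variance of the ratio y_total / x_total."""
    ratio = y.sum() / x.sum()
    z = (y - ratio * x) / x.sum()  # linearized contribution of each unit
    return np.sum((z[:, 0] - z[:, 1]) ** 2)
```

For a linear statistic such as a total, the balanced half samples reproduce the direct two-unit-per-stratum variance estimate exactly; for a ratio the two functions generally give close but not identical values.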
These indirect methods have been investigated both analytically
and empirically.
McCarthy has shown that, properly employed, BRR procedures yield unbiased
estimates of the variance of linear statistics quite
efficiently.
Jackknifing procedures have similar properties.
Frankel (1971) conducted an extensive empirical investigation
of these procedures and found them generally to be quite acceptable for estimating the variances of contrasts and of
similar statistics.
However, he did not extend these
results to general inference procedures.
Kish and Frankel (1968 and
1970) also used BRR to investigate the assumption that the survey was
in fact the result of simple random sampling.
Partly as a result of
these investigations the U. S. Bureau of the Census and the National
Center for Health Statistics (N.C.H.S. 1974) have adopted BRR as a procedure for the estimation of sampling variance.
As pointed out in
Koch, Freeman, and Freeman (1975), BRR can easily be extended to the
estimation of numerator denominator covariance in ratio statistics.
A formula for the estimation of interdomain covariances of nonlinear
statistics is developed in Chapter Two of this paper.
BRR will then
be used for estimating the covariance matrix.
This then raises the
issue of nonsampling sources of variation.
This is considered in the
next two sections.
First, errors associated with the measurement
device and then those associated with the respondent are discussed
separately.
1.3 External Measurement Effects
Three effects which are not respondent centered can have an
important effect both on variance estimates and the unbiasedness of
the estimates.
These are the effects of questionnaire design, choice
of respondent, and interviewers.
Questionnaire design and
proper choice of respondent are discussed in many places (e.g., Deming
(1960) and Kish (1965))
and will not be considered here, except insofar
as multiple questionnaires might be used.
If multiple questionnaires
are used, then the corresponding effect is conceptually like
that due to the use of multiple interviewers.
Thus, the more general
term of measurement device effect might be used.
The measurement device effect arises through the use of a particular device which introduces a bias in the responses.
This effect
will be discussed in terms of a general model in Chapter Two.
However,
it can also be dealt with or controlled for in the data collection
design proper.
The two data collection designs of greatest influence
for the control of this effect are interpenetrating subsamples (Mahalanobis (1946), Hansen and Marks (1958), Kish (1962), and Cochran (1963))
and replicated sampling (Deming (1960)).
Interpenetration is especially useful in the case of correlated
measurement errors with respect to interviewers.
It is a design which
assigns an independent simple random sample to each interviewer within
each data collection area.
Thus, differences among interviewers and
area assignments are combined.
The analysis is similar to that of
variance components analysis in the linear models framework.
The overall
effect is to treat the subsample assigned to each interviewer as a
single complex sampling unit.
Hence, under a general set of sampling
assumptions, unbiased estimates of the variance of the overall estimate
of the population parameters may be obtained.
Here the term "parameter"
is used to refer to statistics based on the entire target population rather than on the sample of the population.
An alternative but similar approach to the data collection
design which addresses not only measurement device effects but the
broader context of the response error problem is that of replicated
designs.
The principle is that any sampling design may be divided in
half and two samples of the same design drawn to make two replications.
The important assumption is that each of the (up to k) subsamples or
replicates is independent of the other subsamples.
Thus, estimates
within each subsample may be used to estimate the variance of the
parameter estimates.
This approach has the advantage of providing
estimates of the total response variance.
It has the drawback that it
is both expensive and difficult to obtain a sufficient number of independent replicates to be of value.
However, it is this design which
suggested the BRR techniques for indirect estimates of variance
(McCarthy (1966)).
These are not data collection techniques
per se,
but as will be pointed out in the discussion of Chapter Two, they do
impose certain constraints on the design.
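The use of independent replicates for variance estimation can be illustrated briefly. This is a minimal sketch, assuming k independent replicates of the same design that each yield one estimate of the same parameter, with the overall estimate taken as their simple average; the function name is hypothetical.

```python
import statistics

def replicated_estimate(replicate_estimates):
    """Overall estimate and its estimated variance from k independent replicates."""
    est = [float(e) for e in replicate_estimates]
    k = len(est)
    if k < 2:
        raise ValueError("at least two independent replicates are needed")
    overall = sum(est) / k
    # Sample variance among replicates (divisor k - 1), scaled to the mean of k.
    variance = statistics.variance(est) / k
    return overall, variance
```

Because the replicates carry every source of variation that differs between them, the resulting figure reflects total variance, not sampling variance alone; with few replicates it is itself quite unstable.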
This completes the discussion of the sources of measurement error except for nonresponse and coverage errors.
These are grouped with
the measurement device effects since frequently the magnitude of these
effects is directly associated with the execution of the data collection
design.
These effects, which are the contribution to parameter values
by respondents who cannot be sampled either because of refusal or absence, are the most difficult to examine for precisely those reasons.
Efforts to do so have been, again, largely empirical, as with the North
Carolina Marriage Follow-Back Survey (Wells, Coulter, and Wiener (1973)).
There it was noticed that the effects could be quite substantial.
Other studies (National Center for Health Statistics (October, 1974))
have experienced sufficiently little nonresponse that simple ratio adjustments were sufficient to account for this effect.
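A simple ratio adjustment of the kind mentioned above can be sketched as follows. This is a generic weighting-class adjustment written for illustration; the adjustment cells, the function name, and the rule of inflating respondents' weights by the ratio of selected to responding weight within a cell are assumptions, not the documented procedure of any particular survey.

```python
from collections import defaultdict

def ratio_adjust_weights(base_weights, responded, cells):
    """Inflate respondents' weights so each cell keeps its selected weight total."""
    selected_total = defaultdict(float)
    respondent_total = defaultdict(float)
    for w, r, c in zip(base_weights, responded, cells):
        selected_total[c] += w
        if r:
            respondent_total[c] += w
    for c in selected_total:
        if respondent_total.get(c, 0.0) == 0.0:
            raise ValueError(f"cell {c!r} has no respondents to carry its weight")
    adjusted = []
    for w, r, c in zip(base_weights, responded, cells):
        if r:
            adjusted.append(w * selected_total[c] / respondent_total[c])
        else:
            adjusted.append(0.0)  # nonrespondents drop out of the estimate
    return adjusted
```

The adjustment removes nonresponse bias only to the extent that respondents and nonrespondents within a cell are alike on the characteristic being estimated.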
Thus, there are two sources of error which should be taken into
account in adopting a data collection design.
The first, measurement
device effects, is directly involved in the design, while the second,
coverage and nonresponse effects, is to be dealt with either in the
execution of the design or in adjustments to the estimation process.
Again, these effects have been investigated primarily through empirical
or simulated sampling experiments.
The adjustment of estimates to
account for these effects leads to the third major source of survey
estimate variance, the respondent centered variance.
1.4 Respondent Errors
If surveys are of a retrospective nature or carried out over a
period of time, frequently time dependent biases or trends tend to
appear in the responses.
Among other researchers into this problem,
Neter and Waksberg (1964) and Sudman and Bradburn (1973) discuss and
model the effects on responses of a time lag between the survey date
and the event of interest.
These researchers used an exponential decay
model to characterize memory loss over time.
The memory losses included
both the failure to recall events and the tendency to misplace events in time, the so-called telescoping effect.
In examining
these phenomena both bounded and unbounded recall periods have been
examined.
The consensus is that short recall periods are to be preferred.
Memory loss is not the only source of respondent error over
time.
Bailar (1973) and others have observed what might be termed
respondent exhaustion.
This occurs, for example, in panel surveys and
longitudinal studies.
These are surveys where respondents are reinterviewed
several times in the course of the study.
Empirical investigation
has revealed distinct time trends in the response which indicate
another source of time dependent bias.
Models which produce estimates
corrected for these trends are presently being utilized for the Current
Population Survey reports of the United States Bureau of the Census.
The last group of respondent errors may be due either to lack of
motivation or knowledge, or to misrepresentation.
These errors tend to
occur with respect to information which is considered either sensitive
or unimportant by respondents.
They have been investigated by many
researchers and efforts to control such errors have been diverse.
For
example, randomized response (Greenberg, Abernathy, and Horvitz (1970))
has been used to elicit information on illegal abortions.
Often multiple
independent reports of the same phenomenon have been used to examine the effects of such errors.
For example, Borus (1966) and Borus
and Nestel (1973) each utilize multiple regression to compare employees' responses on earnings to employers' records on earnings, and to
compare fathers' and sons' responses on the father's education and
socio-economic status.
They were able to identify demographic factors
influencing differences in the two response reports.
This is only a brief summary of the sources of respondent error.
It is given to indicate sources of error that must be considered in the
construction of analytic models and selection designs.
These error
sources, to the extent that they generate randomly distributed errors,
would tend to inflate estimates of simple response variance.
To the
extent that they introduce bias, these error sources would inflate the
errors of approximation.
As will be seen in Chapter Two, interaction
between these respondent centered sources of variation and interviewer
effects can have disastrous effects on the overall estimates of variance.
Thus, there are three major sources of variance in estimates
based on survey data, those due to sampling and the method of estimation, those due to the measurement device, and those which are respondent centered.
It is in this context that the empirical investigation
takes place.
1.5 Summary and Problem
The preceding four sections have reviewed three broad sources of
variance contribution to the estimates of parameters based on complex
sample surveys.
These are sampling variance, response errors, and
measurement errors.
Each of these has been investigated to varying
degrees using empirical procedures.
However, none of the investigations
has been directed toward the evaluation of a generalized inference procedure.
Further, the sources of variance and error have not
been examined jointly. This is not to criticize the previous investigations, which were focused on the comparison of the approaches to one
or another of the contributions to error.
However, the groundwork for
a joint investigation has been laid.
In other words, the various "vertical
investigations" have prepared the way for a "horizontal investigation."
It is to this issue that this dissertation will address itself.
Specifically, three related issues are examined in a series of empirical tests:
1. The Taylor series approximation is compared to the replicated ratio estimate of variance;
2. The effect of postsampling adjustments for nonresponse is examined;
3. The zero covariance assumption is evaluated.
These comparisons are made in order to evaluate three hypotheses.
First,
since the Taylor series approximation (TS) has been shown to be consistent (Koch, Freeman, and Freeman (1975)), the comparison to the replicated ratio estimate (RR) will examine the consistency (for large samples) of the BRR approach to variance estimation.
The assertion that
postsampling adjustments for representativeness and nonresponse reduce
standard error estimates will be evaluated by the second set of comparisons.
And the amount of statistical power lost by assuming zero
covariances among domains will be examined by the third set of comparisons.
Further, these comparisons will be made in the context of an
inference approach.
The inference methodology used will be that outlined
in Koch, Freeman, and Freeman (1975), which uses weighted least
squares estimates to generate what are termed Wald statistics.
This
methodology was first discussed in Grizzle, Starmer, and Koch (1968),
hereafter denoted GSK.
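The weighted least squares computations behind such Wald statistics can be sketched in a few lines. This is an illustrative outline of the standard matrix formulas, assuming a vector of domain estimates with an estimated covariance matrix and a full-rank design matrix; the function and argument names are hypothetical and the code is not taken from GSK or from the programs used in this study.

```python
import numpy as np

def wls_fit(f, v, x):
    """Weighted least squares fit of the linear model E{f} = x b.

    f: (d,) domain estimates; v: (d, d) estimated covariance matrix of f;
    x: (d, p) design matrix of full rank p.
    Returns b, its estimated covariance, and the residual (lack of fit)
    statistic, referred to chi-square with d - p degrees of freedom.
    """
    v_inv = np.linalg.inv(v)
    cov_b = np.linalg.inv(x.T @ v_inv @ x)
    b = cov_b @ x.T @ v_inv @ f
    resid = f - x @ b
    q_fit = float(resid @ v_inv @ resid)
    return b, cov_b, q_fit

def wald_statistic(b, cov_b, c):
    """Wald statistic for the hypothesis c b = 0, with rank(c) degrees of freedom."""
    cb = c @ b
    return float(cb @ np.linalg.inv(c @ cov_b @ c.T) @ cb)
```

Passing `np.diag(np.diag(v))` in place of `v` mimics the zero interdomain covariance assumption, so the two sets of statistics can be compared directly.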
This investigation is contained in the next three chapters.
Chapter Two will establish the general analytical framework.
This includes
a summary of what is known as the U.S. Bureau of the Census
response error model, hereafter referred to as the census response error model or CSM.
Then a discussion of weighted least squares is given
and Balanced Repeated Replication (BRR) is described.
This is all in
terms of what is generally referred to as cross-classified data.
Chapters
Three and Four are the result of the empirical investigation.
A description of the data used is contained in Chapter Three.
Inference
and comparisons in the presence of primarily sampling variance are the
focus of Chapter Three, while a different cross-classification in Chapter Four introduces the effect of moderate response error.
Various
analytic results relevant to the investigation of the effects of postsampling adjustments and of certain types of response errors are presented in Chapter Five.
CHAPTER TWO
GENERAL ANALYTICAL FRAMEWORK
2.1 Introduction
The preceding chapter discussed some of the ways in which random
errors and biases can affect the estimates of population parameters and
their variances when the estimates are based on sample surveys of human
populations.
This chapter examines various methodological topics which
are relevant to this empirical investigation of the effects of postsampling adjustments for nonresponse, the effects of response errors,
and the effects of various approximations.
Sections two and three
describe and develop the U. S. Bureau of the Census response error
model (CSM) for cross-classified data.
For other applications this
model has been used extensively (Fellegi (1964), Hansen, Hurwitz, and
Bershad (1961), Hansen, Hurwitz, and Pritzker (1964), Hansen
and Marks (1958), Koch (1973)).
Variance matrix estimation by balanced
repeated replication is discussed in Section four.
These results are
then applied to the estimation of domain rates and the corresponding
covariance matrix in Section five.
The last part of the discussion is
devoted to putting these estimation procedures into an inference structure with the goal of obtaining reasonable and informative models of
the survey data.
The structure used is that of Wald (1943) statistics
as discussed in Koch, Freeman, and Freeman (1975).
Taken together
these several sections provide a framework in which the empirical investigations
are pursued in subsequent chapters.
2.2 Notation and Definitions
The population will be taken to be a set of individuals indexed
by the subscript $i = 1, 2, \ldots, N$.
The term sample will refer to a randomly
selected subset of the population.
The size of the subset is
denoted by $n$.
Under a fixed set of general conditions for the survey
there is a conceptual sequence of infinitely many trials denoted by the
subscript $t$, $T$ of which may be observed.
Thus, $t = 1, \ldots, T$.
In
applications often only one trial is observed; however, there are two
important exceptions.
The first is for empirical simulation studies
where a fixed set of individuals is simulated and repeatedly sampled.
The other is in the technique known as balanced repeated replication
where half samples are used to create pseudoreplicates, each of which
may be considered as one of a sequence of trials, in certain contexts.
The sample design or, more accurately, the first component of
the sample design termed the selection design, is characterized by random indicator variables $U_i$ (as described by Cornfield (March, 1944)),
where:
$$U_i = \begin{cases} 1 & \text{if the } i\text{-th individual is selected,} \\ 0 & \text{otherwise.} \end{cases} \tag{2.1}$$
Let $\phi_i = E\{U_i\}$ be the probability of selection of the $i$-th individual
and $\theta_{ii'} = E\{U_i U_{i'}\}$
be the probability of joint selection of the $i$-th
and $i'$-th individuals.
In general the selection process is without
replacement on any given trial, $t$, thus:
(2.2)
The $\theta_{ii'}$ for $i \neq i'$ must be calculated from the random process of
selection specified by the design at hand.
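For the simplest design these probabilities can be obtained by direct enumeration. The sketch below assumes simple random sampling without replacement, in which every subset of the given size is equally likely, and computes E{U_i} and E{U_i U_i'} exactly; the function name is hypothetical and the approach is practical only for very small populations.

```python
from fractions import Fraction
from itertools import combinations

def inclusion_probabilities(population_size, sample_size):
    """Exact single and joint selection probabilities under simple random
    sampling without replacement, found by listing every possible sample."""
    samples = list(combinations(range(population_size), sample_size))
    p = Fraction(1, len(samples))  # each sample is equally likely
    single = [Fraction(0)] * population_size
    joint = {}
    for s in samples:
        for i in s:
            single[i] += p
        for pair in combinations(s, 2):
            joint[pair] = joint.get(pair, Fraction(0)) + p
    return single, joint
```

The enumeration reproduces the familiar closed forms n/N for a single individual and n(n-1)/(N(N-1)) for a pair; for complex designs no such closed form is generally available, which is the difficulty the indirect variance estimators are meant to avoid.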
As indicated in Chapter One the response process is really two
separate but related sources of variation.
This process may be partitioned
into measurement and respondent centered processes.
The measurements
may either be by self-enumeration, as when all selected individuals are mailed identical questionnaires, or by the use
of randomly assigned interviewers or questionnaires.
The case of randomly assigned
measuring devices assumes that there is a fixed population of B devices
to be apportioned among the n selected individuals.
It is assumed that
a random subset of the sample is allocated to the $h$-th device.
The sample subset size is denoted by $n_h$.
Thus,
$$n = \sum_{h=1}^{B} n_h. \tag{2.3}$$
This allocation of devices can be characterized by the random indicator
variables $T_{hi}$:
$$T_{hi} = \begin{cases} 1 & \text{if selected individual } i\ (U_i = 1) \text{ is assigned to device } h, \\ 0 & \text{otherwise;} \end{cases} \tag{2.4}$$
and by the joint distribution of the $T_{hi}$.
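The random allocation of devices can be made concrete with a short sketch. It assumes that the selected individuals are shuffled and dealt to the devices in blocks of the stated subset sizes, which is one simple way to obtain a random partition; the function names and the labelling of devices from 1 are illustrative choices.

```python
import random

def allocate_devices(selected, subset_sizes, seed=None):
    """Randomly partition the selected individuals among the devices.

    selected: labels of the selected individuals.
    subset_sizes: number of individuals assigned to each device, in device
    order; the sizes must add up to the sample size.
    Returns a dict mapping individual -> device number (1-based).
    """
    if sum(subset_sizes) != len(selected):
        raise ValueError("subset sizes must add up to the sample size")
    rng = random.Random(seed)
    shuffled = list(selected)
    rng.shuffle(shuffled)
    assignment = {}
    start = 0
    for device, size in enumerate(subset_sizes, start=1):
        for individual in shuffled[start:start + size]:
            assignment[individual] = device
        start += size
    return assignment

def device_indicator(assignment, device, individual):
    """1 if the individual was selected and assigned to the device, else 0."""
    return 1 if assignment.get(individual) == device else 0
```

Each selected individual has an indicator of 1 for exactly one device, and unselected individuals have an indicator of 0 for every device.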
The basic unit of observation is represented by a random vector
$\mathbf{Y}_{hit}$, a vector of $r$ components, which denotes the classification of the
$i$-th individual in the population as measured by the $h$-th device.
The
subscript $t$ indexes the trial or replication for the specific individual.
In most applications the components of $\mathbf{Y}_{hit}$ are of the form
$$Y_{hijt} = \begin{cases} 1 & \text{if the } i\text{-th selected individual is classified in category } j \text{ by device } h \text{ on trial } t; \\ 0 & \text{otherwise;} \end{cases} \tag{2.5}$$
for $j = 1, \ldots, r$, $i = 1, \ldots, N$, $h = 1, \ldots, B$, $t = 1, \ldots, T$.
Here $j$ may be vector valued.
It should be noted that $Y_{hijt}$ may take on
fractional or scored values when, for example, multiple responses to a
given question are permitted.
This case will not be considered.
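The indicator form of the observation vector, with a vector-valued category index, can be sketched as follows. The factor names and levels in the usage note are made up for illustration, and the ordering of the cells (last factor varying fastest) is an arbitrary assumption.

```python
from itertools import product

def make_classifier(factor_levels):
    """Build the cells of a cross-classification and a function that maps
    one response to its 0/1 indicator vector.

    factor_levels: one list of levels per classification factor.
    """
    cells = list(product(*factor_levels))
    position = {cell: j for j, cell in enumerate(cells)}

    def classify(response):
        vector = [0] * len(cells)
        vector[position[tuple(response)]] = 1  # exactly one cell is occupied
        return vector

    return cells, classify
```

With two hypothetical factors, `make_classifier([["<45", "45+"], ["M", "F"]])` yields four cells, and summing the indicator vectors over individuals gives the cell counts of the cross-classified table.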
Thus, three sources of variation have been identified in this
model:
(i) The selection process, characterized by the $U_i$;
(ii) The measurement process, characterized by the $T_{hi}$;
(iii) The pure response process, characterized by the variation
among conceptually repeatable trials under a fixed set of
general conditions at the time of the survey.
This list defines the random variables which are of interest in terms
of the sources of error in Chapter One.
These can now be modeled to
provide the conceptual framework of the sampling experiment.
2.3
The Response Error Model and Linear Statistics for Cross-Classified
Data
The model follows naturally out of the definitions of the random
variables given in Section 2.2 for $U_i$, $T_{hi}$, and $\mathbf{Y}_{hit}$.
It is in the
spirit of Wilk and Kempthorne (1955) and certain models developed by the
U. S. Bureau of the Census, Hansen et al. (1961), (1964).
This is the
CSM model.
The discussion of Koch, Freeman, and Freeman (1975) will be followed.
It is assumed that the $\mathbf{Y}_{hit}$ can be represented as a linear
function of an overall mean, a fixed main effect due to the i-th individual, a fixed main effect due to the h-th interviewer, a fixed interaction term due to the (h,i)-th interviewer by individual combination at
the t-th trial, and a residual.
Formally this yields,
(2.6)
where $\mathbf{Y}_{hi} = E_t\{\mathbf{Y}_{hit}\}$ = expected response across trials, and
$$\bar{\mathbf{Y}} = \frac{1}{NB} \sum_{h=1}^{B} \sum_{i=1}^{N} \mathbf{Y}_{hi}.$$
Here the $\{\boldsymbol{\varepsilon}_{hit}\}$ correspond to trial to trial variations for fixed
interviewer by individual combinations.
Thus, it is the variation not
controlled by the sampling and selection designs and may be thought of
as pure or intrinsic response error.
Under the CSM it is assumed that the
trials are uncorrelated so that,
(2.7)
Implicitly it is assumed that there is no bias as with the memory lag
models mentioned in Chapter One. In certain superpopulation models,
these assumptions may be relaxed (Fuller (1973)).
The nature of the errors due to the measurement device is now
characterized by the $\{\boldsymbol{\beta}_h\}$ and $\{(\boldsymbol{\beta H})_{hi}\}$.
These terms correspond to the
$\{T_{hi}\}$ which give rise to what may more generally be termed the external
response error.
If the selection design does not specifically account
for these terms, then they are rather difficult to manipulate in the
postsampling analysis.
Thus, it is assumed that $\boldsymbol{\beta}_h = \mathbf{0}$ and
$(\boldsymbol{\beta H})_{hi} = \mathbf{0}$
for all h and i.
For the purposes of this discussion certain other
assumptions about the model are appropriate.
Specifically it is assumed that the measurement process is free
from both interaction with the selection process and bias.
Absence of
interaction with the selection process is taken to mean that
(2.8)
thus expectation conditioned on selection will not be necessary.
Absence
of bias means that the population proportions,
$\bar{\mathbf{Y}}$,
are in fact the parameters of interest.
The statistics or parameter estimates are in the class of generalized Horvitz-Thompson estimators (1952),
(2.9)
which are linear combinations of the observations on selected individuals with the $W_{hi}$ being known coefficients for weighting the observations.
It follows that,
(2.10)
where $\phi_i = E\{U_i\}$ and $\lambda_{hi} = E\{T_{hi} \mid U_i = 1\}$.
A natural choice of weights
is for the $W_{hi}$ to be the reciprocals of the selection probabilities;
i.e., $W_{hi} = 1/(\phi_i \lambda_{hi})$.
This allows for the nonuniform assignment of measurement device effects.
This is further simplified by assuming uniformity among the $\{\lambda_{hi}\}$. That is, if $E\{T_{hi} \mid U_i = 1\} = 1/B$, then (2.9) may
be written:
$$\mathbf{y}_t = \sum_{i=1}^{N} \frac{1}{\phi_i} U_i \mathbf{Y}_{it},$$ (2.11)
where $W_i = 1/\phi_i$ and $\mathbf{Y}_{it} = \sum_{h=1}^{B} T_{hi} \mathbf{Y}_{hit}$.
Then (2.10) may be rewritten
as,
$$E\{\mathbf{y}_t\} = \sum_{i=1}^{N} \mathbf{Y}_i = \mathbf{Y} = N\bar{\mathbf{Y}},$$ (2.12)
since by (2.8), $E_{t,T}\{\mathbf{Y}_{it}\} = \mathbf{Y}_i$, with $E_{t,T}\{\cdot\}$ being interpreted as
expected value with respect to repeat trials and interviewer assignments.
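The weighted total in (2.11) and the expectation in (2.12) can be illustrated with a minimal Python sketch. It assumes simple random sampling without replacement, so that every selection probability equals n/N, and it ignores response error and device assignment, so each individual has one fixed classification vector. The function name `horvitz_thompson_total` and the six-person population are hypothetical and serve only as an illustration.

```python
from itertools import combinations

import numpy as np


def horvitz_thompson_total(Y, U, phi):
    """Weighted total sum_i (1/phi_i) * U_i * Y_i, in the form of (2.11).

    Y   : (N, r) array, row i is the 0/1 classification vector of individual i
    U   : (N,) array of 0/1 selection indicators
    phi : (N,) array of selection probabilities
    """
    Y = np.asarray(Y, dtype=float)
    weights = np.asarray(U, dtype=float) / np.asarray(phi, dtype=float)
    return weights @ Y


# Hypothetical population: N = 6 individuals classified into r = 2 categories.
Y = np.array([[1, 0], [0, 1], [1, 0], [1, 0], [0, 1], [0, 1]])
N, n = Y.shape[0], 3
phi = np.full(N, n / N)          # assumed design: SRS without replacement

# One realized sample: individuals 0, 2 and 4 are selected.
U = np.zeros(N)
U[[0, 2, 4]] = 1
y_t = horvitz_thompson_total(Y, U, phi)

# Averaging the estimate over every possible sample of size n
# recovers the population totals, which is the content of (2.12).
estimates = []
for sample in combinations(range(N), n):
    U_s = np.zeros(N)
    U_s[list(sample)] = 1
    estimates.append(horvitz_thompson_total(Y, U_s, phi))
mean_estimate = np.mean(estimates, axis=0)
```

Enumerating all samples is feasible only for a toy population; it is used here because it shows the unbiasedness property exactly rather than by simulation.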
The variance of $\mathbf{y}_t$
under the CSM can be decomposed into three
components of variance, as indicated by among others Koch (1973):
(2.13)
where,
(SRV) = Simple Response Variance Matrix,
(2.14)
(CRV) = Correlated Response Variance Matrix,
(2.15)
(SV) = Sampling Variance Matrix
$$= \sum_{i=1}^{N} \sum_{i'=1}^{N} \left( \frac{\theta_{ii'}}{\phi_i \phi_{i'}} - 1 \right) \mathbf{Y}_i \mathbf{Y}_{i'}';$$ (2.16)
where $\bar{\phi} = \frac{1}{N} \sum_{i=1}^{N} \phi_i = \frac{n}{N}$ and
$\bar{\theta} = \frac{1}{N(N-1)} \sum_{i \neq i'} \theta_{ii'} = \frac{n(n-1)}{N(N-1)}$.
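The assumptions concerning the intrinsic response process (2.7)
and the interviewer or measurement device effects
($\boldsymbol{\beta}_h = \mathbf{0}$, h = 1,...,B;
$(\boldsymbol{\beta H})_{hi} = \mathbf{0}$, h = 1,...,B, i = 1,...,N) imply that there is no correlated
response variance component, that is, CRV = 0.
Hence, equation (2.13)
may be rewritten:
$$\mathrm{Var}(\mathbf{y}_t) = \frac{1}{n} \{(\mathrm{SRV})\} + (\mathrm{SV}),$$ (2.17)
where $(\mathrm{SRV}) = N^2 \left\{ \frac{1}{N} \sum_{i=1}^{N} \frac{\bar{\phi}}{\phi_i} \boldsymbol{\Sigma}_i \right\}$,
where
$\boldsymbol{\Sigma}_i = \frac{1}{B} \sum_{h=1}^{B} \boldsymbol{\Sigma}_{hi}$
and
$\boldsymbol{\Sigma}_{hi}$
is
defined in (2.7).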
2.4
Replication for Estimates of $\mathrm{Var}\{\mathbf{y}_t\}$
The key to an inference structure for the survey statistic is
the estimate of its covariance matrix, $\mathrm{Var}\{\mathbf{y}_t\}$.
For the purposes of
discussion attention will be limited to the usual Horvitz-Thompson
estimator (2.11), under the assumptions given in Sections 2.2 and 2.3
with respect to the uncorrelated nature of the intrinsic response errors
and to the absence of external response errors.
Further, since fairly
complex surveys are to be discussed, only the indirect method of estimation, balanced repeated replication (BRR), is considered.
This is
because the numerical calculations associated with direct estimators
can require considerable programming effort as well as
substantial computer time.
The BRR method has been used extensively by the National Center
for Health Statistics (Simmons and Baird (1968))
as well as other
institutions and organizations engaged in survey research.
The efficient procedures for application of the method were developed by
McCarthy (1966, 1969a,b).
Empirical investigations of the technique
have been carried out by a number of individuals: Frankel (1971), Kish
and Frankel (1968, 1970), and Koch and Lemeshow (1972).
These investigations have been directed at comparing it to other
approaches to variance estimation, at investigating the selection process
component of variance, or at specialized applications.
It has been found
to be a reasonable approach to estimating $\mathrm{Var}(\mathbf{y}_t)$ in all three situations.
It is worthwhile to examine the BRR method in some detail.
The
discussion follows that of Koch, Freeman, and Freeman (1975).
The
principal concept which governs the use of BRR is that the variability
of a statistic based on a total sample can be estimated in terms of the
variability of that statistic across subsamples (called pseudo-replicates or simply replicates) which reproduce (except for size) the complex design of the entire sample.
This is analogous to the replication
designs for selection, Deming (1960),
the difference being that the
pseudoreplication occurs
on a postsampling basis.
Hence, BRR has considerable appeal in those situations where clustering causes the underlying distribution theory for determining the $\theta_{ii'}$, as well as the computational efforts at estimating $\mathrm{Var}\{\mathbf{y}_t\}$, to become impractical.
One
specific version of BRR is the method of half samples.
The half sample procedure is characterized by a matrix H with
elements $h_{ik}$
defined by,
$$h_{ik} = \begin{cases} 1 & \text{if individual } i \text{ is in the k-th half sample,} \\ 0 & \text{otherwise.} \end{cases}$$ (2.18)
For each half sample, the half sample estimates $\mathbf{y}_{tk}$ are defined by,
$$\mathbf{y}_{tk} = 2 \sum_{i=1}^{N} \frac{1}{\phi_i} U_i h_{ik} \mathbf{Y}_{it},$$ (2.19)
which is directly analogous to the $\mathbf{y}_t$
based on the whole sample for
estimating the population distribution, $\mathbf{Y}$.
The resulting estimator V
for the variance matrix of $\mathbf{y}_t$
is,
$$\mathbf{V} = \frac{1}{L} \sum_{k=1}^{L} (\mathbf{y}_{tk} - \mathbf{y}_t)(\mathbf{y}_{tk} - \mathbf{y}_t)',$$ (2.20)
where L denotes the number of half sample partitions.
In this context,
it should be noted that the choice of H is critical in this method of
determining the estimator V.
McCarthy (1966) describes some efficient methods for this which involve orthogonal matrices.
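The half sample procedure can be sketched in Python under some simplifying assumptions that are not taken from the text: the design has exactly two primary sampling units per stratum, each half sample takes one unit from every stratum, and the balancing pattern comes from a Sylvester-type Hadamard matrix (one convenient orthogonal matrix, not necessarily the construction McCarthy describes). The `(S, 2, p)` input layout and the function names are hypothetical.

```python
import numpy as np


def sylvester_hadamard(order):
    """Return a +1/-1 Hadamard matrix whose order is a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    H = np.array([[1]])
    while H.shape[0] < order:
        H = np.block([[H, H], [H, -H]])
    return H


def brr_covariance(psu_totals):
    """Half sample covariance estimate of a vector of weighted totals.

    psu_totals : (S, 2, p) array; entry [s, j] is the weighted total
                 contributed by unit j of stratum s (assumed layout).
    """
    z = np.asarray(psu_totals, dtype=float)
    S = z.shape[0]
    L = 1
    while L < S + 1:                       # smallest power of two exceeding S
        L *= 2
    signs = sylvester_hadamard(L)[:, 1:S + 1]   # skip the constant column
    pick = (signs < 0).astype(int)         # unit chosen per (replicate, stratum)

    y_full = z.sum(axis=(0, 1))            # whole sample estimate
    # Each half sample estimate doubles the totals of the chosen units.
    y_half = 2.0 * z[np.arange(S), pick, :].sum(axis=1)   # shape (L, p)
    dev = y_half - y_full
    return dev.T @ dev / L                 # average outer product of deviations


rng = np.random.default_rng(0)
psu_totals = rng.uniform(50.0, 150.0, size=(5, 2, 3))   # hypothetical data
V = brr_covariance(psu_totals)
```

With an orthogonally balanced set of half samples and a linear statistic, this estimate coincides with the sum over strata of the outer products of the within-stratum differences, which is a convenient check on an implementation.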
It should be apparent that implicit in the assumptions of this
method is the absence of external response errors (e.g., measurement
device effects), as assumed throughout this discussion.
However, appropriate modifications to the definition of H are within the scope of the
BRR approach for variance matrix estimation, modifications which could
reasonably reflect this additional source of error, as well as intrinsic
response error and sampling error.
An additional advantage to the method is that the covariance
of two linear sample statistics can be estimated.
Let $\mathbf{y}_t^{(1)}$ be one set
of estimates and $\mathbf{y}_t^{(2)}$ be a second set.
Then these can be arrayed as a
single vector with the corresponding set of half sample estimates,
$(\mathbf{y}_{tk}^{(1)\prime}, \mathbf{y}_{tk}^{(2)\prime})'$.
Thus, the joint variance matrix estimate is given by
$$\mathbf{V} = \frac{1}{L} \sum_{k=1}^{L} \begin{bmatrix} \mathbf{y}_{tk}^{(1)} - \mathbf{y}_t^{(1)} \\ \mathbf{y}_{tk}^{(2)} - \mathbf{y}_t^{(2)} \end{bmatrix} \begin{bmatrix} \mathbf{y}_{tk}^{(1)} - \mathbf{y}_t^{(1)} \\ \mathbf{y}_{tk}^{(2)} - \mathbf{y}_t^{(2)} \end{bmatrix}'.$$ (2.21)
Thus, for example, the covariances of two (or for that matter several)
margins can be estimated without ever examining the complete cross-classification.
In summary, BRR provides a computationally straightforward method of estimating $\mathrm{Var}\{\mathbf{y}_t\}$ or associated sets of margins.
That BRR has desirable properties such as unbiasedness has been shown by
McCarthy (1966, 1969a,b).
2.5
Application of BRR and CSM to Domain Rates
Many surveys have domain rates as output in addition to the population proportions and distributions discussed in Sections 2.2 and 2.3.
An example is when, in addition to estimating the number of persons in
population subclasses having various numbers of physician visits, the
rate of physician visits for the subclass is to be estimated.
This is
the case where j in (2.5) is vector valued.
In fact, it is the ratio of
various margins of
$\mathbf{Y}$
which is to be estimated.
This is discussed in
Koch (1971) and Koch, Freeman, and Freeman (1975), for the estimation of
continuous variables.
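Let there be r classes for the numerator variable of the rate
and there be s domains for the denominator variable.
Redefine the components of $\mathbf{Y}_{hit}$ in (2.5) as,
$$Y_{ij_1j_2t} = \begin{cases} 1 & \text{if individual } i \text{ is in domain } j_2\ (=1,\ldots,s) \text{ and in class } j_1\ (=1,\ldots,r), \\ 0 & \text{otherwise,} \end{cases}$$ (2.22)
where, as in Section 2.4, measurement device effects are ignored.
Then
the marginal statistics are of the form,
$$\mathbf{x}_t = \mathbf{A} \mathbf{y}_t = \mathbf{A} \sum_{i=1}^{N} \frac{U_i}{\phi_i} \mathbf{Y}_{it},$$ (2.23)
where $\mathbf{A}' = [\mathbf{A}_r', \mathbf{A}_s']$ and,
$$\mathbf{A}_r = [\, a_1 \mathbf{I}_s \;\; a_2 \mathbf{I}_s \;\; \cdots \;\; a_r \mathbf{I}_s \,] \quad \text{(an s x rs matrix)}$$
$$\mathbf{A}_s = [\, \mathbf{I}_s \;\; \mathbf{I}_s \;\; \cdots \;\; \mathbf{I}_s \,] \quad \text{(an s x rs matrix)}$$
and $a_{j_1}$
corresponds to the value of the $j_1$-th class, $j_1 = 1,\ldots,r$.
Thus, $\mathbf{x}_t' = (\mathbf{x}_{1t}', \mathbf{x}_{2t}')$ where
$$x_{1j_2t} = \sum_{i=1}^{N} \frac{U_i}{\phi_i} \sum_{j_1=1}^{r} a_{j_1} Y_{ij_1j_2t},$$ (2.24)
$$x_{2j_2t} = \sum_{i=1}^{N} \frac{U_i}{\phi_i} \sum_{j_1=1}^{r} Y_{ij_1j_2t}, \quad j_2 = 1,\ldots,s.$$ (2.25)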
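There are two methods of estimating the variances and covariances of
the rates,
$$r_{j_2t} = x_{1j_2t} / x_{2j_2t}, \quad j_2 = 1,\ldots,s.$$ (2.26)
These ratios may be replicated by the method of Section 2.4, the method
of half samples.
Then $\mathbf{r}_{kt}$ is the corresponding half sample estimate
and the variance matrix estimate of $\mathbf{r}_t$ is given by
$$\mathbf{V}_{RR}(\mathbf{r}_t) = \frac{1}{L} \sum_{k=1}^{L} (\mathbf{r}_{kt} - \mathbf{r}_t)(\mathbf{r}_{kt} - \mathbf{r}_t)'.$$ (2.27)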
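The alternative is a Taylor series expansion as suggested by Koch,
Freeman, and Freeman (1975).
Here, write $\mathbf{r}_t$ as:
$$\mathbf{r}_t = \exp\{\mathbf{K} \log(\mathbf{A} \mathbf{y}_t)\},$$ (2.28)
where the
exp
and
log
functions are taken componentwise and A is defined
in (2.23) and
$$\mathbf{K}\ (s \times 2s) = [\, \mathbf{I}_s \;\; -\mathbf{I}_s \,].$$
Then the linear term of the Taylor series
expansion
yields:
$$\mathbf{V}_{TS}(\mathbf{r}_t) = \mathbf{D}_r \mathbf{K} \mathbf{D}_x^{-1} \mathbf{A} \mathbf{V}(\mathbf{y}_t) \mathbf{A}' \mathbf{D}_x^{-1} \mathbf{K}' \mathbf{D}_r,$$ (2.29)
where $\mathbf{V}(\mathbf{y}_t)$ is obtained from (2.20) and $\mathbf{D}_x$ (similarly $\mathbf{D}_r$) is a diagonal matrix with
the elements of $\mathbf{x}_t$ ($\mathbf{r}_t$) on the diagonal. Thus, there are two estimates of
the variance matrix of
$\mathbf{r}_t$.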
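The two variance estimates for a vector of rates can be compared numerically with a short Python sketch. It assumes that the half sample estimates of the numerator and denominator margins are already available as arrays with one row per half sample; the function names, the array layout, and the synthetic one-percent perturbations are all hypothetical and are used only to exercise the formulas.

```python
import numpy as np


def replicated_ratio_cov(num_half, den_half, num_full, den_full):
    """Direct replication: covariance of the half sample ratios."""
    dev = num_half / den_half - num_full / den_full
    return dev.T @ dev / dev.shape[0]


def taylor_series_cov(num_half, den_half, num_full, den_full):
    """Linearized covariance D_r K D_x^{-1} V(x) D_x^{-1} K' D_r."""
    s = num_full.size
    x_full = np.concatenate([num_full, den_full])
    dev = np.hstack([num_half, den_half]) - x_full
    V_x = dev.T @ dev / dev.shape[0]             # covariance of the margins
    K = np.hstack([np.eye(s), -np.eye(s)])       # K = [I, -I]
    G = np.diag(num_full / den_full) @ K @ np.diag(1.0 / x_full)
    return G @ V_x @ G.T


# Synthetic half sample margins: small relative perturbations of the totals.
rng = np.random.default_rng(1)
L, s = 64, 2
num_full = np.array([40.0, 30.0])
den_full = np.array([10.0, 10.0])
num_half = num_full * (1 + 0.01 * rng.standard_normal((L, s)))
den_half = den_full * (1 + 0.01 * rng.standard_normal((L, s)))

V_rr = replicated_ratio_cov(num_half, den_half, num_full, den_full)
V_ts = taylor_series_cov(num_half, den_half, num_full, den_full)
se_ratio = np.sqrt(np.diag(V_rr) / np.diag(V_ts))   # RR / TS standard errors
```

When the half sample margins vary only slightly around the whole sample values, the two sets of standard errors should agree closely, since the approaches differ only in the higher order terms of the expansion.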
The replicated ratio estimate, $\mathbf{V}_{RR}(\mathbf{r}_t)$, has the advantage that
half sample estimates of only the desired ratios need to be computed.
Unfortunately, the properties of BRR for nonlinear estimates are not
known.
On the other hand, while
$\mathbf{V}_{TS}(\mathbf{r}_t)$
is subject to errors of
approximation which depend on the sample size, it involves only linear
statistics for which BRR is known to be
unbiased.
It also requires
only the estimation of the desired margins.
Both estimates are liable
to the effects of response errors, an effect which is to be investigated on a limited basis in Chapter Four.
The
$\mathbf{V}_{TS}$
estimate illustrates an additional point.
Estimated
covariance matrices for more complex sets of compounded functions
involving estimates other than domain rates can be produced in an analogous manner.
For example, it is a straightforward generalization to
estimate the covariance structure of differences between domain means,
poststratified means, certain complex functions, including lifetable
functions, based on vital rates, and rank correlation type measures of
association.
The key problem is the identification and estimation of
the vector
$\mathbf{y}_t$
and the corresponding estimate $\mathbf{V}(\mathbf{y}_t)$.
In summary, estimates, $\mathbf{V}(\mathbf{r}_t)$, may be obtained either by direct
replication or by compound function operations in conjunction with Taylor series expansions.
Both procedures yield identical estimates, $\mathbf{r}_t$;
the difference is
primarily
whether errors of approximation or the
unknown properties of BRR are felt to be more troublesome.
There are
also certain sample size considerations which are discussed in Koch and
Lemeshow (1972).
These are beyond the scope of this investigation.
Nevertheless, the estimator $\mathbf{r}_t$
and its corresponding variance matrix
estimate $\mathbf{V}(\mathbf{r}_t)$ may be obtained for complex surveys in the general
framework described in Sections 2.2 to 2.4.
The ultimate objective is
to provide a general inference procedure for $\mathbf{r}_t$
based on these variance
estimates.
2.6
An Inference Structure for Complex Survey Estimates
A reasonable position to hold is that the objective of a stat-
istical analysis of a body of data is the summary of the data by some
small set of estimates which correspond to some substantively interpretable set of parameters.
This is usually achieved through various
tests of hypotheses and inference procedures culminating in what might
be called model building.
As discussed in this section, the approach
to this objective is that of Koch, Freeman, and Freeman (1975), among
others.
It is often termed "weighted least squares," and uses the GSK
methodology.
Let
$\mathbf{F}$
denote a (g x 1) vector of statistics such as the domain
rates,
$\mathbf{r}_t$,
as described in Section 2.5.
Let
$\mathbf{V}_F$ denote
a consistent
estimate of the corresponding (g x g) covariance matrix obtained by
methods such as discussed in Sections 2.4 and 2.5.
Then, the relationship
between the variation among the g estimates contained in
$\mathbf{F}$
and the substantive aspects of the domain classification system can be investigated by fitting regression models to the vector
$\mathbf{F}$
by the method of
weighted least squares.
If attention is restricted to linear models,
the analysis can be characterized by writing,
$$\mathbf{F} = \begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_g \end{bmatrix} \;\hat{=}\; \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1u} \\ x_{21} & x_{22} & \cdots & x_{2u} \\ \vdots & \vdots & & \vdots \\ x_{g1} & x_{g2} & \cdots & x_{gu} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_u \end{bmatrix} = \mathbf{X} \boldsymbol{\beta},$$ (2.30)
where
$\mathbf{X}$
is a prespecified design (or independent variable) matrix of
known coefficients, $x_{ij}$
(i = 1,...,g;
j = 1,...,u),
of rank u, $u \leq g$;
$\boldsymbol{\beta}$
is a (u x 1) vector of unknown parameters or effects; and
"$\hat{=}$"
means,
"is estimated by."
Note that the definitions of $\mathbf{X}$, $x_{ij}$, i, and j no
longer correspond to the definitions in the earlier sections.
The difference should be clear from the context.
The choice of the matrix X,
i.e., the model of the domain structure, implies the existence of a
matrix L which is orthogonal to X.
Thus,
$$\mathbf{f} = \mathbf{L} \mathbf{F} \;\hat{=}\; \mathbf{L} \mathbf{X} \boldsymbol{\beta} = \mathbf{0},$$ (2.31)
represents a set of constraints among the components of F implied by
the model.
Since f is a linear transformation of F, the variance
structure of f is estimated by,
$$\mathbf{V}_f = \mathbf{L} \mathbf{V}_F \mathbf{L}'.$$ (2.32)
This
suggests
that an appropriate test statistic for the goodness of fit of
the model specified by X is,
$$Q = \mathbf{f}' \mathbf{V}_f^{-1} \mathbf{f} = \mathbf{f}' (\mathbf{L} \mathbf{V}_F \mathbf{L}')^{-1} \mathbf{f},$$ (2.33)
which, under the hypothesis specified in (2.31), is approximately distributed according to a Chi-squared distribution with degrees of freedom
equal to (g - u) (denoted d.f. = g - u).
The approximation depends on
the overall sample size, n, being sufficiently large that the elements
of the vector
$\mathbf{F}$
have an approximate multivariate normal distribution as
a consequence of Central Limit Theory.
Such test statistics are known
as Wald (1943) statistics and have been employed in the analysis of
categorical data by Grizzle, Starmer, and Koch (1969),
Forthofer and
Koch (1973) and Koch and Tolley (1975), among others.
Since the sample
size associated with most complex sample surveys is
quite large, the
assumption that the
$\mathbf{F}$
follows an approximately multivariate normal distribution is reasonable and therefore these Wald statistics provide a
valid and useful framework for the analysis of the resulting survey
estimates.
Moreover, the usefulness is enhanced by the manner in which
these statistics are computed.
These Q statistics are identical to the usual weighted least
squares quadratic form which is minimized in the estimation of $\boldsymbol{\beta}$ in
(2.30).
Specifically,
$$Q = \mathbf{f}' (\mathbf{L} \mathbf{V}_F \mathbf{L}')^{-1} \mathbf{f} \equiv (\mathbf{F} - \mathbf{X}\mathbf{b})' \mathbf{V}_F^{-1} (\mathbf{F} - \mathbf{X}\mathbf{b}),$$ (2.34)
where
$$\mathbf{b} = (\mathbf{X}' \mathbf{V}_F^{-1} \mathbf{X})^{-1} \mathbf{X}' \mathbf{V}_F^{-1} \mathbf{F}.$$ (2.35)
This is easily seen by substituting (2.35) into (2.34), applying Koch's
Lemma (Koch (1969)), and employing
$\mathbf{L}$
as the required left and right
orthogonal matrix.
From the identity established in (2.34) and the
large sample validity of Wald statistics in general, it follows that the
estimates b must also be regarded as having reasonable statistical properties with respect to
$\boldsymbol{\beta}$,
because of the way in which they determine Q.
In the light of this argument, it can be noted that,
$$\mathbf{V}_b = (\mathbf{X}' \mathbf{V}_F^{-1} \mathbf{X})^{-1}$$ (2.36)
represents a reasonable and consistent estimate of Var(b).
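The weighted least squares computations can be sketched in Python as follows. The estimator and its covariance follow the forms used in (2.36) and (2.38); the vector F, the matrix V_F, and the design matrix X below are hypothetical values chosen only for illustration, and the matrix L orthogonal to X is obtained from a singular value decomposition, which is one convenient choice among many.

```python
import numpy as np


def wls_fit(F, V_F, X):
    """Weighted least squares fit of F to X with weight matrix inverse(V_F)."""
    V_inv = np.linalg.inv(V_F)
    V_b = np.linalg.inv(X.T @ V_inv @ X)       # estimated covariance of b
    b = V_b @ X.T @ V_inv @ F                  # weighted least squares estimate
    resid = F - X @ b
    Q = float(resid @ V_inv @ resid)           # residual quadratic form
    return b, V_b, Q, X.shape[0] - X.shape[1]  # last entry: d.f. = g - u


def constraint_form_Q(F, V_F, X):
    """Goodness of fit statistic computed from f = L F with L orthogonal to X."""
    U_svd, _, _ = np.linalg.svd(X, full_matrices=True)
    L_mat = U_svd[:, X.shape[1]:].T            # rows span the complement of col(X)
    f = L_mat @ F
    return float(f @ np.linalg.inv(L_mat @ V_F @ L_mat.T) @ f)


# Hypothetical statistics for g = 4 domains and a u = 2 parameter model.
F = np.array([4.0, 3.0, 4.2, 3.1])
V_F = np.array([[0.04, 0.01, 0.00, 0.00],
                [0.01, 0.09, 0.00, 0.00],
                [0.00, 0.00, 0.04, 0.01],
                [0.00, 0.00, 0.01, 0.09]])
X = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]])

b, V_b, Q, df = wls_fit(F, V_F, X)
Q_alt = constraint_form_Q(F, V_F, X)   # agrees with Q, illustrating (2.34)
```

The sketch assumes that X has full column rank and that V_F is nonsingular; in practice a linear solver would usually be preferred to explicit matrix inversion.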
Returning to the original objective of determining a minimum
set of parameters for describing the target population and its cross-classification, statistical tests of significance involving b may be
performed by standard multiple regression procedures, once a suitable
X has been established under the hypothesis (2.31).
Linear hypotheses,
which correspond to additional constraints on the model, are
formulated
as $H_0\colon \mathbf{C} \boldsymbol{\beta} = \mathbf{0}$,
where C is a known (d x u) coefficient matrix, and
tested using the statistic,
$$Q_C = \mathbf{b}' \mathbf{C}' [\mathbf{C} (\mathbf{X}' \mathbf{V}_F^{-1} \mathbf{X})^{-1} \mathbf{C}']^{-1} \mathbf{C} \mathbf{b},$$ (2.37)
which,
a
fortiori, is approximately distributed according to a Chi-squared distribution, d.f.
= d,
under $H_0$.
Successive uses of the goodness of fit tests (2.34) and the
significance tests (2.37), viewed as additional constraints specified
by the C matrices, represent ways of partitioning the model components
into specific sources of variation.
In this context, the specific $Q_C$
statistics reflect precisely the amount by which the Wald statistic Q
would increase, if the basic model X were modified to reflect the additional constraint implied by $H_0\colon \mathbf{C} \boldsymbol{\beta} = \mathbf{0}$.
A further discussion of the
statistical aspects of the approach to model reduction is given in Koch
and Tolley (1975).
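The partitioning property of the $Q_C$ statistic can be checked numerically with a small Python sketch. The data are hypothetical, and the hypothesis tested is that the second parameter is zero, so that the constrained model is simply the design matrix with its second column removed; the helper names are illustrative only.

```python
import numpy as np


def wls(F, V_F, X):
    """Return b, its estimated covariance, and the residual statistic Q."""
    V_inv = np.linalg.inv(V_F)
    V_b = np.linalg.inv(X.T @ V_inv @ X)
    b = V_b @ X.T @ V_inv @ F
    resid = F - X @ b
    return b, V_b, float(resid @ V_inv @ resid)


def wald_Q_C(b, V_b, C):
    """Statistic of the form (2.37) for H0: C beta = 0, with its d.f."""
    Cb = C @ b
    return float(Cb @ np.linalg.inv(C @ V_b @ C.T) @ Cb), C.shape[0]


# Hypothetical statistics, covariance matrix, and design matrix.
F = np.array([4.0, 3.0, 4.2, 3.1])
V_F = np.array([[0.04, 0.01, 0.00, 0.00],
                [0.01, 0.09, 0.00, 0.00],
                [0.00, 0.00, 0.04, 0.01],
                [0.00, 0.00, 0.01, 0.09]])
X_full = np.array([[1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [1.0, -1.0]])
C = np.array([[0.0, 1.0]])                   # tests that the second effect is zero

b, V_b, Q_full = wls(F, V_F, X_full)
Q_C, d = wald_Q_C(b, V_b, C)

# Refit with the constraint imposed: only the first column of X remains.
_, _, Q_reduced = wls(F, V_F, X_full[:, :1])
increase = Q_reduced - Q_full                # equals Q_C up to rounding
```

Because V_F is held fixed across both fits, the increase in the goodness of fit statistic matches $Q_C$ exactly, apart from floating point error.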
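Finally, predicted values corresponding to any specific model
can be calculated from
$$\hat{\mathbf{F}} = \mathbf{X} \mathbf{b} = \mathbf{X} (\mathbf{X}' \mathbf{V}_F^{-1} \mathbf{X})^{-1} \mathbf{X}' \mathbf{V}_F^{-1} \mathbf{F},$$ (2.38)
and corresponding estimates of variance can be obtained from the diagonal elements of
$$\mathbf{V}_{\hat{F}} = \mathbf{X} (\mathbf{X}' \mathbf{V}_F^{-1} \mathbf{X})^{-1} \mathbf{X}'.$$ (2.39)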
Such predicted values not only have the advantage of characterizing
all of the statistically important features of the substantive variation in the original data, but also represent better estimates than the
original function statistics, F, since they are based on the data from
the entire sample, or at least those domains which are combined in the
model reduction, as opposed to its component parts.
Such predicted
values are said to be based on the final model.
These predicted values
are also descriptively advantageous in the sense that they explicitly
display trends and effects and permit a more transparent
interpretation of the independent variables and design variables comprising
$\mathbf{X}$.
In conclusion, this inference procedure of weighted least squares,
motivated by Wald statistics, fulfills the stated objective of determining a minimum set of statistics to estimate the parameters of the
underlying target population.
2.7
Summary
This chapter has been devoted to the three major components in
the framework of the empirical sampling investigation which is the main
thrust of this thesis. A generalized response error model for cross-classified data from complex sample surveys has been developed.
This
model characterizes the major sources of response error as well as
interactions.
The model implied certain linear survey sample statistics, the variance matrix of which was then decomposed to display the
major sources of variation characterized by the model.
The variance matrix for such statistics was then estimated by
indirect methods.
The method of estimation discussed was that of balanced repeated replication for half samples.
Two distinct approaches
were suggested.
One was in terms of replicated ratios (RR) and the
other in terms of Taylor series approximations (TS) based on the original cross-classification.
These Taylor series approximations were
especially important in the application to the estimation of domain
rates.
It was noted that each approach permitted the direct investigation of marginal tables of interest rather than the fully cross-classified data set.
This variance matrix estimation procedure was then used to permit the examination of an inference structure for complex sample survey
estimates.
The procedure proposed is that of weighted least squares
based on generalized Wald statistics.
Test statistics for both goodness
of fit and linear hypotheses were displayed.
Procedures for model
building and overall inference objectives were discussed.
Finally, the
predicted values based on this procedure and the corresponding variance
estimates were displayed and the advantages of these were discussed.
In conclusion, the framework, which consists of:
(i) a response error model for cross-classified survey data;
(ii) a method of variance matrix estimation;
(iii) an inference procedure;
has been developed and discussed.
It is now possible to perform empirical sampling experiments in the context of this framework.
The results
of two such experiments are the subject of the next two chapters.
CHAPTER THREE
THE INVESTIGATION OF EFFECTS ON INFERENCE
IN THE PRESENCE OF ONLY SAMPLING VARIANCE
3.1
Introduction
Chapter Two provided a general framework within which the sampling experiment of this paper may take place.
The discussion of the
CSM for response error suggests that two layers of the experiment are
appropriate.
This chapter is concerned with the comparisons of various
estimates, as outlined in Chapter One, when there is only sampling variance in the cross-classification variables and in the response variable.
The second layer, which is the subject of Chapter Four, is concerned with the comparisons in the presence of limited response error
in the cross-classification variables.
To review, there are three relevant comparisons:
(i) The Taylor series approximation (TS) versus the replicated ratio estimate (RR), to evaluate
the consistency
of BRR for ratio statistics, for large samples;
(ii) The comparison of postsampling data adjustments for nonresponse versus unadjusted data, to evaluate the asserted
variance reduction;
(iii) The assumption that
interdomain covariances are zero
with respect to the power of test statistics.
The results of these comparisons are then incorporated into a general
inference structure.
The experiment is performed with data generously
provided by the National Center for Health Statistics (hereafter
N.C.H.S.) from the Health Interview Survey (hereafter H.I.S.) for the
1973 United States population (N.C.H.S. (1974)).
3.2
The Data
The response variable of interest for the first cross-tabulation
is the ratio of physician visits (PV) to population (P) for an Age (A)
by Sex (S) by Race (R) cross-classification into 16 domains.
Age is
divided into 4 ranges: 0 to 16, 17 to 44, 45 to 64, and 65 and over.
Sex has two levels, male and female.
Race has two levels, white and
other.
A description of the methods of data collection and estimation
is given in Vital and Health Statistics - Series 10 - No. 95, "Appendix
I" (N.C.H.S. (1974)).
However, for clarity of discussion it is worthwhile to summarize the discussion given in that publication.
The survey information is collected using a continuing nationwide sample of households.
The population covered is the civilian, non-institutionalized population of the United States living at the time of
the interview in 1973.
The sampling plan of the survey follows a multistage probability design with the feature that the weekly samples are
additive over time.
The first stage of the design is the random selection of 376 primary sampling units (PSU's) from approximately 1900 geographically defined PSU's.
These PSU's each consist of a county, several contiguous counties, or a standard metropolitan statistical area.
The final stage of selection within the PSU is a segment consisting of
an expected four households.
Segments based on geographic areas, segments based on 1970 census registers, and segments based on lists of
building permits are used.
This selection process generates approximately 12,000 segments containing 51,000 assigned households.
This
yields 42,000 eligible households for a probability sample of about
120,000 persons in 41,000 interviewed households in a year.
The field
operations for the survey are performed by the U. S. Bureau of the Census under specifications established by the National Center for Health
Statistics.
Since the sample is a complex multistage probability sample, moderately complex procedures are used for the derivation of the estimates.
Four adjustments are used:
1. The sample observations are inflated by their respective
sampling weights.
These weights are the reciprocals of the
products of the probabilities of selection.
2. The estimates are multiplied by a nonresponse weight which
is the ratio of the number of sample households in a segment
to the number of interviewed households in a segment.
3. There is a "first-stage ratio adjustment" by which the sample
estimates are weighted to reflect the 1970 populations within
12 race residence classes.
4. There is a poststratification ratio adjustment by which the
estimates are weighted to reflect the population within each
of 60 age-sex-race cells.
The weighting is obtained from an
independent population estimate prepared by the U. S. Bureau
of the Census.
Both the first stage ratio adjustment and the poststratification ratio
adjustment
take the form of multiplication factors applied to the
weight of each elementary unit (person or physician visit).
N.C.H.S.
feels that these ratio adjustments have the effect of making the sample
more representative of the civilian, non-institutionalized population.
It is the effect of this poststratification after sampling adjustment
that is investigated in the examples of this chapter and Chapter Four.
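The multiplicative form of a poststratification ratio adjustment can be sketched in Python. This is a generic illustration of the idea and not the N.C.H.S. procedure itself: the function name, the two-cell example, and the control totals are hypothetical, and the sketch assumes that every cell contains at least one sampled unit.

```python
import numpy as np


def poststratify(weights, cell, control_totals):
    """Scale unit weights so weighted cell counts match control totals.

    weights        : (n,) current weights of the sampled units
    cell           : (n,) integer cell index of each unit
    control_totals : (c,) independent population totals for the c cells
    """
    weights = np.asarray(weights, dtype=float)
    cell = np.asarray(cell, dtype=int)
    control_totals = np.asarray(control_totals, dtype=float)
    estimated = np.bincount(cell, weights=weights,
                            minlength=control_totals.size)
    factors = control_totals / estimated     # assumes no empty cells
    return weights * factors[cell]           # one multiplier per unit


# Hypothetical sample of five persons falling in two cells.
weights = np.array([100.0, 120.0, 80.0, 150.0, 90.0])
cell = np.array([0, 0, 1, 1, 1])
control_totals = np.array([250.0, 300.0])

adjusted = poststratify(weights, cell, control_totals)
cell_sums = np.bincount(cell, weights=adjusted)   # matches control_totals
```

After the adjustment the estimated population of each cell is fixed at its control total, whichever units were sampled, so a population estimate for a domain that coincides with a cell no longer varies from sample to sample.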
The response variable of interest, physician visits, is based on
the consolidation of a year of the interviewee's experience over the two
calendar weeks prior to the week of the interview.
This tends to minimize the possibility of recall error.
In addition, there is a relatively small range of possible responses with most being 0, 1, or 2.
Further, the response definition is:
"A physician visit is defined as consultation with a
physician, in person or by telephone, for examination, diagnosis, treatment or advice. The visit is considered to be
a physician visit if the service is provided directly by
the physician or by a nurse or other person acting under a
physician's supervision. For the purpose of this definition
'physician' includes doctors of medicine and osteopathic
physicians. The term 'doctor' is used in the interview
rather than physician because of popular usage.
"Physician visits for services provided on a mass basis
are not included. . . .
"Physician visits to hospital in-patients are not
included.
"If a physician is called to a house to see more than
one person, the call is considered a separate physician
visit for each person about whom the physician was consulted.
"The physician visit is associated with the person
about whom the advice was sought."
(N.C.H.S. (1974), pp. 55-56)
The observed estimates are presented in Table 3.1.
Here, the four
levels of age (A) are shown in the first column, the levels of sex (S)
are in columns two and three, and four and five, respectively.
Several
descriptive patterns emerge.
For persons under 17, there is an apparent
race difference which disappears at the older ages.
For the adult
years (17-64) there is an apparent sex difference.
Lastly, there may
well be some evidence of an age trend, but it is confused by some complex interaction between the three variables.
These descriptive comments are investigated statistically in Section 3.8.

TABLE 3.1
RATIO ESTIMATES OF PHYSICIAN VISITS/POPULATION
CLASSIFIED BY AGE, SEX, AND RACE
Observed Estimates

                          Sex
                Male                Female
                Race                Race
Age         White    Other      White    Other
<17         4.6549   3.0635     4.1199   3.0648
17-44       3.7129   3.0170     6.3649   6.3755
45-64       4.8232   4.6611     5.9242   7.0655
65+         5.8624   8.0478     6.9463   6.2122
3.3
Estimates and Variance Covariance Matrix
Having defined the sources of the data, it is next appropriate
to define the estimates that are used and the method of obtaining the
corresponding variance covariance matrix.
The separate estimates of
totals for physician visits and population size within each of the
classification domains are based on the entire sample with all four
adjustments applied.
In addition, half sample estimates for 152 pseudo-replicates are used for both poststratified and nonpoststratified estimates of the variance covariance matrix of both the totals and the
ratios of PV/P.
This provided the data for implementation of the balanced repeated replication procedures described in Chapter Two.
To be
specific the following formulae were used:
1. Taylor Series Estimates:
(3.1)
where "$\sum$" indicates the summation is over all 152 pseudo-replicates;
$\widehat{\mathbf{PV}}$
and
$\hat{\mathbf{P}}$
are the 16 x 1 column vectors of half
sample estimates for physician visit and population totals
for the 16 classifications; PV and P are the corresponding
estimates based on the whole sample with all adjustments
applied; and
$$w = \begin{cases} 1.0 & \text{for poststratified half sample estimates,} \\ 2.0 & \text{for nonpoststratified half sample estimates.} \end{cases}$$
2. Ratio Estimates:
$$\mathbf{V} = \frac{1}{152} \sum (\hat{\mathbf{r}} - \mathbf{r})(\hat{\mathbf{r}} - \mathbf{r})',$$ (3.2)
where $\hat{\mathbf{r}} = \widehat{\mathbf{PV}} \div \hat{\mathbf{P}}$ and $\mathbf{r} = \mathbf{PV} \div \mathbf{P}$, where
"$\div$"
indicates component-wise division.
Two points should be made.
First, the nonpoststratified half sample
must be weighted in (3.1) to reflect the fact that the estimates are
based on an orthogonally selected half of the sample.
This weighting
cancels out in the replicated ratio
estimates
of the matrix V in (3.2).
Secondly, since the poststratification was done on the same classification points as the classifications of the domains, the estimates of P
based on the half samples are identical to the estimates of
P
based on
the entire sample.
Thus P is a constant and r is a linear sample statistic for the poststratified estimates of V.
The experimental design
may now be discussed.
3.4
Design of the Experiment
Three major comparisons are to be made:
1. Taylor series (TS) versus replicated ratio (RR) estimates of
V,
2. Poststratified versus nonpoststratified estimates of V,
3. Covariance effects among domains.
Thus, there are $2^3 = 8$
experimental points to be examined.
Several general comments need to be made.
All poststratified estimates are linear
statistics as pointed out in 3.3.
Thus, the Taylor series and ratio
estimates of V are identical.
This means that the four points of the
experiment corresponding to direct ratios and Taylor series approximation may be combined.
So for the poststratified data only the replicated ratio estimates need to be compared for the case where the covariances are estimated and the case where they are assumed to be zero.
The Taylor series approximations were also computed as a check on the
programs used and the outputs were identical to the replicated ratio
estimates to five decimal places.
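The agreement between the two estimates for poststratified data can be verified with a short derivation. It is a sketch under two assumptions that are not stated in this form in the text: the covariance of the totals is the average outer product of half sample deviations over the 152 pseudo-replicates, and the vector of totals is ordered as all physician visit totals followed by all population totals, so that $\mathbf{K} = [\mathbf{I}, -\mathbf{I}]$. The symbols $\mathbf{V}_{PV}$, $\mathbf{D}_{PV}$, and $\mathbf{D}_{P}$ are introduced here only for the illustration.

```latex
% Assumption: with poststratification, every half sample reproduces P exactly,
% so only the physician visit totals vary across pseudo-replicates.
\begin{align*}
\mathbf{V}(\mathbf{a}) &=
  \begin{bmatrix} \mathbf{V}_{PV} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix},
  \qquad
  \mathbf{V}_{PV} = \frac{1}{152} \sum
  (\widehat{\mathbf{PV}} - \mathbf{PV})(\widehat{\mathbf{PV}} - \mathbf{PV})', \\
\mathbf{K}\mathbf{D}_a^{-1} &=
  \begin{bmatrix} \mathbf{D}_{PV}^{-1} & -\mathbf{D}_{P}^{-1} \end{bmatrix}, \\
\mathbf{V}_{TS} &=
  \mathbf{D}_r \mathbf{K} \mathbf{D}_a^{-1} \mathbf{V}(\mathbf{a})
  \mathbf{D}_a^{-1} \mathbf{K}' \mathbf{D}_r
  = \mathbf{D}_r \mathbf{D}_{PV}^{-1} \mathbf{V}_{PV} \mathbf{D}_{PV}^{-1} \mathbf{D}_r
  = \mathbf{D}_{P}^{-1} \mathbf{V}_{PV} \mathbf{D}_{P}^{-1}, \\
\hat{\mathbf{r}} - \mathbf{r} &=
  \mathbf{D}_{P}^{-1} (\widehat{\mathbf{PV}} - \mathbf{PV})
  \quad\Longrightarrow\quad
  \mathbf{V}_{RR} = \mathbf{D}_{P}^{-1} \mathbf{V}_{PV} \mathbf{D}_{P}^{-1}
  = \mathbf{V}_{TS}.
\end{align*}
```

The third line uses $\mathbf{D}_r \mathbf{D}_{PV}^{-1} = \mathbf{D}_{P}^{-1}$, which holds because each ratio is a physician visit total divided by the corresponding population total. Under these assumptions the two estimates agree exactly, so any difference in computed output would reflect rounding or a programming error.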
A separate issue concerns evaluation of covariance effects.
The
comparison
of interest is that in terms of the interdomain covariances.
These are the covariances among the estimates of population, the covariances among the estimates of PV, and the covariances between PV and P
where PV and P are from different domains.
For purposes of this experiment these are estimated and the resulting inferences compared to
inferences resulting when they are assumed to be zero.
The effects of
assuming the intradomain covariances to be zero were also investigated,
but since this assumption is quite drastic and
is known to yield a
poor approximation in any case, the results are not discussed.
However,
the appropriate tables are contained in the appendices to Chapters Three
and Four.
Having disposed of these issues, it is possible to discuss the
specific results.
The first will be an examination of the effects of
using Taylor series approximations as opposed to the direct replication
of the ratios.
3.5 The Comparison of the Taylor Series Approximation to the BRR Procedure for Ratio Estimates
The Taylor series approximation for the estimation of V is well known (e.g., Koch (1970c)). The specific formulation in the GSK notation is
(3.3)
where $K = I_{16} \otimes [1 \;\; -1]$. Here, $D_a$ denotes a diagonal matrix with the elements of $a$ on the diagonal, and $V(a)$ is obtained from (3.1). The standard errors of the ratios, $r$, are the square roots of the diagonal elements of $V_F$. The standard errors for the replicated ratios (RR) are obtained from the square roots of the diagonal elements of (3.2). For the nonpoststratified estimates, $r$, the two sets of standard error estimates are presented in Table 3.2.
In addition, the ratio of the replicated ratio (RR) estimate to the Taylor series estimate is presented in the 3rd, 6th, 9th, and 12th rows.

TABLE 3.2
COMPARISON OF ESTIMATED STANDARD ERRORS FOR RR AND TS PROCEDURES.
NONPOSTSTRATIFIED AGE x SEX x RACE CLASSIFICATION OF PV/P.
COVARIANCES ESTIMATED.

                          Male                 Female
Age      Estimate     White    Other       White    Other
<17      RR           0.1244   0.2113      0.1121   0.2215
         TS           0.1242   0.2107      0.1120   0.2210
         RR ÷ TS      1.0012   1.0027      1.0009   1.0021
17-44    RR           0.0978   0.2340      0.1098   0.3277
         TS           0.0978   0.2333      0.1099   0.3290
         RR ÷ TS      1.0003   1.0029      0.9996   0.9961
45-64    RR           0.1637   0.4516      0.1530   0.4286
         TS           0.1642   0.4447      0.1530   0.4288
         RR ÷ TS      0.9973   1.0157      1.0004   0.9994
65+      RR           0.2318   0.9602      0.1907   0.6458
         TS           0.2308   0.9538      0.1906   0.6388
         RR ÷ TS      1.0043   1.0067      1.0005   1.0109
The range of standard errors considered is moderately large, from a minimum of 0.0978 to a maximum of 0.9602. The replicated ratio estimate exceeds the Taylor series estimate 12 out of 16 times but by at most 1.6 per cent. The TS estimate exceeds the RR estimate the remaining four times by no more than 0.4 per cent. It is evident, therefore, that for this data set there is very little difference between the two approaches for estimating the observed standard errors.
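The summary statements about Table 3.2 can be checked directly from its tabulated RR ÷ TS entries. The following is a minimal sketch in Python; the list holds the sixteen ratios transcribed from the table, and the variable names are illustrative rather than taken from the original programs.

```python
# RR ÷ TS ratios transcribed from Table 3.2, in row order
# (<17, 17-44, 45-64, 65+), each row as Male White, Male Other,
# Female White, Female Other.
ratios = [
    1.0012, 1.0027, 1.0009, 1.0021,
    1.0003, 1.0029, 0.9996, 0.9961,
    0.9973, 1.0157, 1.0004, 0.9994,
    1.0043, 1.0067, 1.0005, 1.0109,
]

n_rr_larger = sum(r > 1.0 for r in ratios)   # cells where RR exceeds TS
n_ts_larger = sum(r < 1.0 for r in ratios)   # cells where TS exceeds RR
max_rr_excess = max(ratios) - 1.0            # largest relative excess of RR over TS
max_rr_shortfall = 1.0 - min(ratios)         # largest relative shortfall of RR below TS

print(n_rr_larger, n_ts_larger)
print(f"{100 * max_rr_excess:.1f} per cent, {100 * max_rr_shortfall:.1f} per cent")
```

The tabulated ratios are used as given because they were computed from unrounded standard errors; recomputing them from the four-decimal RR and TS entries would turn some ratios (for example 0.0978/0.0978) into exactly 1.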
It should be noted that the comparisons in Table 3.2 are the only relevant ones since, for this classification, the poststratified estimates are linear statistics and the interdomain covariances are not relevant to the estimation of the standard errors of the observed ratios. As noted before, the intradomain covariances are important, but this effect is discussed in other places and therefore felt to be outside the scope of this investigation.
A second issue that stems out of this comparison is the question of investigating domain differences. At this point it is sufficient to examine the effect on hypothesis testing where the covariances are not assumed to be zero, a situation to be examined in Section 3.7. Again, only the nonpoststratified data need be examined. Table 3.3 displays the Chi square statistics for various hypotheses of interest. The test statistics are the GSK statistics discussed in Chapter Two for saturated models. As with the estimates of standard error there is little difference between the two types of estimates. The Chi square statistic for the RR estimate is not more than 0.8 per cent larger nor 3.0 per cent smaller than the Chi square statistic based on the TS estimate. This applies both to the various tests of hypotheses and to the tests for the final model. The construction of final models will be discussed in Section 3.8. In conclusion, it may be said that for this data set there is little difference between the TS and RR estimates of V, both in terms of the estimates of standard error and in terms of hypothesis testing.
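To make the Taylor series (TS) side of this comparison concrete, the sketch below linearizes a vector of ratios r = exp{K log a} to first order. It is an illustration under stated assumptions, not the thesis's equation (3.3) or its original programs: the vector `a` is assumed to interleave the totals as (PV_1, P_1, ..., PV_d, P_d) so that K = I_d ⊗ [1 −1] applies, the function name `taylor_ratio_cov` is hypothetical, and the numerical inputs are made up.

```python
import math

def taylor_ratio_cov(a, V_a):
    """First-order covariance matrix of the ratios r_i = a[2i] / a[2i+1].

    a   : interleaved totals [PV_1, P_1, ..., PV_d, P_d] (assumed ordering)
    V_a : covariance matrix of a, as a list of lists
    """
    n = len(a)
    d = n // 2
    # r = exp(K log a) with K = I_d (x) [1, -1]
    r = [math.exp(math.log(a[2 * i]) - math.log(a[2 * i + 1])) for i in range(d)]
    # H = D_r K D_a^{-1}: row i holds 1/P_i and -PV_i/P_i^2
    H = [[0.0] * n for _ in range(d)]
    for i in range(d):
        H[i][2 * i] = r[i] / a[2 * i]
        H[i][2 * i + 1] = -r[i] / a[2 * i + 1]
    # V_r = H V_a H'
    HV = [[sum(H[i][k] * V_a[k][j] for k in range(n)) for j in range(n)]
          for i in range(d)]
    V_r = [[sum(HV[i][k] * H[j][k] for k in range(n)) for j in range(d)]
           for i in range(d)]
    return r, V_r

# Hypothetical one-domain illustration (numbers are made up, not survey data).
a = [6.0, 2.0]
V_a = [[0.04, 0.0], [0.0, 0.01]]
r, V_r = taylor_ratio_cov(a, V_a)
se = [math.sqrt(V_r[i][i]) for i in range(len(r))]
print(r, se)
```

The replicated ratio (RR) approach instead recomputes each ratio within every half sample and takes the empirical covariance of those replicates, so the two methods agree closely whenever the ratio is nearly linear over the range of the half-sample estimates.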
TABLE 3.3
COMPARISON OF TESTS OF HYPOTHESES, REPLICATED RATIOS (RR)
VERSUS TAYLOR SERIES APPROXIMATION (TS), AGE, SEX, AND RACE
CLASSIFICATION, PHYSICIAN VISIT/POPULATION, COVARIANCES
ESTIMATED, Q STATISTICS, NONPOSTSTRATIFIED DATA

Hypothesis             d.f.        TS         RR     RR ÷ TS
No Age Trends
  Male White             3       92.19      92.00     0.998
  Male Other             3       40.72      39.67     0.974
  Female White           3      358.94     358.83     1.000
  Female Other           3       92.27      92.97     1.008
No Sex Differences
  <17                    2       10.92      10.89     0.997
  17-44                  2      468.12     469.91     1.004
  45-64                  2       41.51      41.12     0.991
  65+                    2       17.43      17.35     0.995
No Race Differences
  <17                    2       56.97      56.88     0.998
  17-44                  2        8.20       8.16     0.995
  45-64                  2        6.65       6.67     1.003
  65+                    2        7.59       7.52     0.991
Final Models
  Model                  2     1100.20    1103.90     0.995
  Error                 13       15.15      15.07     1.004
  Total                 15     1115.34    1118.97     1.003

3.6 The Poststratification (PS) Adjustment of Survey Data
These data provide two comparisons for the examination of the effect of poststratification when the interdomain covariances are not assumed to be zero. Based on the discussion in Section 3.2 an overall reduction in estimates of standard errors should be observed. Further, the Q statistic for total variation corresponds to a measure of this reduction. A last point to be examined is that there does not appear to be any analytical reason why this overall reduction will not be observed when the interdomain covariances are assumed to be zero.
Table 3.4 displays two measures of the overall reduction in standard errors. The first is the Q statistic for total variation and the second is a somewhat cruder measure, the mean standard error. The latter statistic is simply the average of the estimated standard errors over the 16 cells of the cross-classification.

TABLE 3.4
EFFECTS OF POSTSTRATIFICATION IN TERMS OF OVERALL STANDARD ERRORS.
AGE, SEX, AND RACE CLASSIFICATION OF PHYSICIAN VISITS/POPULATION.
TS AND RR ESTIMATES.

                                  Standard Error Estimation Procedure
                                  Poststratified    Nonpoststratified
Statistic                         TS & RR           TS          RR
Covariance Not Equal to Zero
  Total Variation Q (DF = 15)     1075.69           1115.35     1118.97
  QPS/QNPS                                          0.964       0.961
Covariance Equal to Zero
  Total Variation Q (DF = 15)     816.31            821.91      820.55
  QPS/QNPS                                          0.993       0.995
Mean Standard Error               0.2899            0.2902      0.2915
MSE PS/MSE NPS                                      0.999       0.995

As can be seen, poststratification has the predicted effect, but it is a relatively weak effect. When the interdomain covariance is assumed to be zero, less than a one per cent reduction is observed in Q. The cruder statistic, mean standard error, also shows only a small (less than one per cent) reduction. Thus, for this data set where poststratification is based on the classification variables there is an almost negligible overall effect on the estimates of standard error. This implies that the unadjusted data is a representative sample.
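The mean standard error entries of Table 3.4 can be reproduced from the cell standard errors of Table 3.2. The sketch below is illustrative only: the two lists are the nonpoststratified TS and RR standard errors transcribed from Table 3.2, the poststratified mean 0.2899 is taken as quoted in Table 3.4, and the variable names are not from the original.

```python
# Nonpoststratified standard errors transcribed from Table 3.2
# (rows <17, 17-44, 45-64, 65+; columns MW, MO, FW, FO).
ts = [0.1242, 0.2107, 0.1120, 0.2210,
      0.0978, 0.2333, 0.1099, 0.3290,
      0.1642, 0.4447, 0.1530, 0.4288,
      0.2308, 0.9538, 0.1906, 0.6388]
rr = [0.1244, 0.2113, 0.1121, 0.2215,
      0.0978, 0.2340, 0.1098, 0.3277,
      0.1637, 0.4516, 0.1530, 0.4286,
      0.2318, 0.9602, 0.1907, 0.6458]

mse_ts = sum(ts) / len(ts)   # mean standard error, TS
mse_rr = sum(rr) / len(rr)   # mean standard error, RR
mse_ps = 0.2899              # poststratified value as quoted in Table 3.4

print(round(mse_ts, 4), round(mse_rr, 4))
print(round(mse_ps / mse_ts, 3), round(mse_ps / mse_rr, 3))
```

Both ratios fall within one per cent of unity, which is the "almost negligible" effect described in the text.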
3.7 Covariance Effects
Frequently for testing hypotheses involving comparisons among domains the interdomain covariance among statistics is assumed to be zero. Further, it is asserted (N.C.H.S., 1974) that since the covariances are positive, the power of the usual comparison is reduced and any resulting statistical statements are conservative. Tables 3.5.1 and 3.5.2 display the estimated correlation matrices for the replicated ratios with and without poststratification. It can be seen that there are correlations as negative as -0.30 (17-44 MO and FW) and as positive as +0.45 (FW and FO, <17). Altogether, for the poststratified data, 49 of the 120 correlations (15 x 16 ÷ 2) are less than zero and 26 are less than or equal to -0.10. Thus, while more than half of the correlations are between -0.10 and 0.10, the assumption of zero correlation is neither necessarily conservative nor reasonable. The evaluation of the zero assumption requires only the examination of the replicated ratios for the nonpoststratified estimates since the differences from the TS estimates and poststratified estimates are negligible in this
TABLE 3.5.1
CORRELATION MATRIX FOR REPLICATED RATIOS
AGE x SEX x RACE
POSTSTRATIFIED
17-4/.
<17
t:J
<17
17-4~
45-64
65+
Mil
FIi
Fa
~J
45-64
NO
FO
F..I
~rw
65+
HO
Flo'
FO
KW
MQ
FW
I'll
.
KJ
1.(;000
HI)
0.1641
1.IXlOO
F'~
0.0479
V.OOOS
1.0000
1'0
0.V,31
0.43::'6
0.0069
1.01l00
Hoi
0.03~6
O.035t
0.:l661
0.0431>
l.0000
!'S)
0.('297
-0.07a8
-(l.On1
-0.2764
'0.2207
1.0000
N
O.07,J:j
0.02l7.
0.0,05
0.3000
0.1133
-0.3191
1.0000
:fO
O. :;..)57
0.1771
0.0205
-O.lCO':'
. O. 2565
0.3120
-0.2617
1.0000
~L!
-C.1I0U
0.0~13
-0.1728
-0.1091
-0.0789
0.0662
-0.0453
0.2913
1.0000
~.D
0.0207
0.0~26
-0.08~/.
0.0185
0.0406
-0.3070
-0.0902
0.1532
0.0703
1.0000
F'J
0.15M
0.0644
0.2131
0.016G
0.0338
0.0657
-0.0082
0.026~
0.0628
-0. 0050
l.OOllO
;0
0.':'473
0.0429
-0.0100
-0.1767
-0.14111
0.2177
-0.1040
0.1794
0.0667
0.0711
0.0295
1.0001l
/1loi
-0.029"
0.0231
-0.1422
-\>.0955
-0.051b
-0.1971
-O.06~7
0.0%5
0.2257
0.1075
0.0901
-0.14·91
l.oono
t:o
-0.0673
0.2199
-0.On02
0.14':':1
0.0098
-0.Ul2
-0.0129
-0.2517
0.0:151
-0.OO~1
-0.1437
0.2no
-0.1505
1.(;000
m
0.0341
0.0734
0.0942
-O.OJll>
-0.1282
0.0325
-0.0142
0.457.,
0.1089
-0.0521
-0.2073
0.0661
0.1632
-0.0575
1.0000
Fa
-u.0720
-0.0731
-0.1455
0.0192
,
-0.0064
0.0649
0.1994
0.0458
0.1314
0.0679
0.2452
-0.1227
0.0863. -0.1646
0.14"3
1.0000
TABLE 3.5.2
CORRELATION MATRIX FOR REPLICATED RATIOS
AGE x SEX x RACE
NONPOSTSTRATIFIED
17-44
<17
li10
<17
17-44
45-64
o.H
tlO
F\l
<'0
tlW
4:;-64
~10
F\l
FO
...
ES+
HOw
HO
i'Y
FO
v'
HO
MloI
1. OtlOO
W
0.2328
1.OG<Y.l
F';
O.O!»(;4
0.0126
1. 0000
;0
0.1439
0.4511
0.0057
l.0000
Mol
0,029:;
-0.OH5
0.2410
O. 041~
~o
o .04Jl
-0.on7
-0.047d
-0.27~v
-0.2348
1.0000
?,f
0.1025
0.0055
o .O~71
O. :':CSb
0.1294
-0.3023
1.0000
FO
0.3036
0.226:;
-0.0069
-0.1025
-0.2718
0.J658
-0.2779
1.0000
X"~
0.0091
C.0318
-0.1630
-0.100i
-0.0778
0.0567
-0.0584
0.2740
1.0000
~~J
0.0058
0.0567
-0.0748
-O.Or.Zl
0.U227
-0.2',56
-0.0814
0.1.72')
U. 079 7
1.0000
r...
0.124Z
0.078>
O. n09
o .OJl3
0.0383
0.0437
-0.0051
0.i!lZ4
0.0736
-0.0078
1.0000
Fe:
0.12])
0.0227
0.0302
-0.1603
-0.
~D ..
1j.l.264
-G, Obit)
O.l/5il
0.07.~O
oj.
0 75 11
0.0348
1. COCO
i-i"'"
-0.0796
-0.0224
-0.1541
-0.0769
-0.0467
-0.1978
-0.0636
0.0421
0.2110
0.(1(,31
0.0982
-0.1973
1.0GOO
iiI)
0.0151
0.2374
-0.0437
0.16),
0.0171
-0.1492
-0.0044
-0.2332
0.0469
0.0099
-0.1221
0.2743
-0.J894
1.0000
1
}""'"
fv
1.0000
Fol
0.0256
0.04436
0.072:3
-0.1122
-0.1140
0.0179
-0.0089
0. .150
0.c,n7
-0.0614
-0.1824
O.CJu~"
0.1865
-0.1097
1.0\)00
Iro
-0.0489
0.11;0
-0.1514
0.lJ67
-0.u762
-u.12:17
0.017/!
0.0427
0.0826
0.1826
0.0671
0.1332
0.0674
0.3010
-0.12l2
1.0000
classification of the data, as shown in Sections 3.5 and 3.6. Detailed tabulations for other comparisons are shown in the appendix to this chapter.
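Why the sign of an interdomain correlation matters can be seen from the variance of a difference, Var(x − y) = Var(x) + Var(y) − 2 Cov(x, y). The sketch below is a hedged illustration rather than part of the original analysis: it uses the poststratified standard errors for the 17-44 age group from Table 3.A.1 and the two legible correlations 0.3120 (MO with FO) and -0.3191 (MO with FW) from Table 3.5.1; the helper name `se_difference` is hypothetical.

```python
import math

def se_difference(se_x, se_y, rho=0.0):
    """Standard error of x - y given the two standard errors and their correlation."""
    return math.sqrt(se_x ** 2 + se_y ** 2 - 2.0 * rho * se_x * se_y)

# 17-44 age group, poststratified standard errors (Table 3.A.1).
se_mo, se_fo, se_fw = 0.2298, 0.3203, 0.1135

# Positive correlation (MO vs FO): assuming zero overstates the standard error.
pos_est = se_difference(se_mo, se_fo, rho=0.3120)
pos_zero = se_difference(se_mo, se_fo)

# Negative correlation (MO vs FW): assuming zero understates the standard error.
neg_est = se_difference(se_mo, se_fw, rho=-0.3191)
neg_zero = se_difference(se_mo, se_fw)

print(round(pos_est, 4), round(pos_zero, 4))
print(round(neg_est, 4), round(neg_zero, 4))
```

The first pair agrees with the standard errors of the 17-44 "Other" sex contrast reported in Tables 3.A.1 and 3.A.2 (0.3309 and 0.3942), while the second pair shows the opposite direction of error when the correlation is negative.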
Table 3.6 shows the inference statistics resulting from a final model for the two sets of assumptions. The left side of the table shows the various statistics for the final models when the covariances are estimated and the right side shows the statistics when the covariances are assumed to be zero. The bottom block shows the ratio of the left to right sides. When the covariances are assumed to be zero the parameters are overestimated by as much as six per cent, the variances by up to 26 per cent, and the Chi square test statistics are underestimated by as much as 36 per cent. Thus, it appears that there are two related effects when the covariances are assumed to be zero. First, there appears to be an unanticipated bias introduced in the modeling procedure. This is not serious for this data set. More important, there is a significant overestimation of the standard errors which in turn produces conservative test statistics. Thus, the assumption of zero covariances is unduly conservative and if patterns of negative covariances occur, then the assumption could produce misleading inferences. However, the latter possibility did not occur for this cross-classification.
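The percentages quoted above follow from the ratios in the bottom block of Table 3.6. As a rough check, the sketch below recomputes a subset of them from the legible entries of that table (the slope parameters, their variances, and the Q statistics); the dictionary and list names are illustrative.

```python
# Entries transcribed from Table 3.6 (nonpoststratified replicated ratios).
q_est = {"Model": 1103.90, "Error": 15.07, "Total": 1118.97}   # covariances estimated
q_zero = {"Model": 808.31, "Error": 12.24, "Total": 820.55}    # interdomain covariances zero

b_est, b_zero = [-0.8316, 0.4967], [-0.8436, 0.5304]           # b1, b2
var_est, var_zero = [0.6487, 11.371], [0.8806, 11.802]         # Var(b1), Var(b2), x 10^-3

q_ratio = {k: q_est[k] / q_zero[k] for k in q_est}
b_ratio = [e / z for e, z in zip(b_est, b_zero)]
var_ratio = [e / z for e, z in zip(var_est, var_zero)]

print({k: round(v, 3) for k, v in q_ratio.items()})
print([round(v, 3) for v in b_ratio])
print([round(v, 3) for v in var_ratio])
```

A variance ratio of about 0.737 corresponds to the 26 per cent overstatement of Var(b1) under the zero-covariance assumption, and a Q ratio of about 1.366 to the 36 per cent figure for the test statistics.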
3.8 Inference Structure and Substantive Conclusions
Having examined the effects of different estimation procedures,
it is worthwhile to examine the original data set and form substantive
inferences.
This is done in the context of forming a final model based
on the various tests of hypotheses and estimating the corresponding
final models.
The inference procedure of Section 2.6 is implemented.
TABLE 3.6
EFFECTS OF ZERO COVARIANCE ASSUMPTIONS ON FINAL MODELS
NONPOSTSTRATIFIED REPLICATED RATIOS
PHYSICIAN VISITS/POPULATION CLASSIFIED BY AGE, SEX, AND RACE

Parameter Vector b and Estimated Covariance Matrix V(b)

Covariances Estimated:
b = (6.379, -0.8316, 0.4967)'
V(b) = [  4.5085  -1.3265   2.0230 ]
       [ -1.3265   0.6487  -0.8717 ]  x 10^-3
       [  2.0230  -0.8717  11.371  ]

Interdomain Covariances Zero:
b = (6.416, -0.8436, 0.5304)'
V(b) = [  5.0859  -1.6784   1.8516 ]
       [ -1.6784   0.8806  -0.6110 ]  x 10^-3
       [  1.8516  -0.6110  11.802  ]

Tables of Variation

                  Covariances Estimated      Interdomain Covariances Zero
Source    d.f.    Q          % Total         Q          % Total
Model       2     1103.90     98.65          808.31      98.51
Error      13       15.07      1.35           12.24       1.49
Total      15     1118.97    100.00          820.55     100.00

Ratios: (Covariance Estimated) ÷ (Covariance = 0)

Parameters: (0.994, 0.986, 0.936)'

Covariance Matrix:
[ 0.886  0.790  1.093 ]
[ 0.790  0.737  1.427 ]
[ 1.093  1.427  0.963 ]

Tables of Variation:
Model   1.366
Error   1.231
Total   1.364
Since poststratified estimates and replicated ratios have generally been used by the N.C.H.S., these are used here. However, the discussion in Section 3.7 points out the importance of estimating the covariances. This is done for this set of inferences. As noted in Section 3.7, more power is obtained for this data set by estimating the covariances. The observed ratios were given in Table 3.1. Table 3.A.1 in the appendix to this chapter shows the detailed tests of hypotheses.
The tests of hypotheses are obtained from a saturated model where the design matrix, X, and parameter vector, b, are given by

$$X = \begin{bmatrix} X_1 & 0 & 0 & 0 \\ 0 & X_1 & 0 & 0 \\ 0 & 0 & X_1 & 0 \\ 0 & 0 & 0 & X_1 \end{bmatrix}, \qquad b = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix} \tag{3.5}$$

where

$$X_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \qquad \beta_i = \begin{bmatrix} \mu_i \\ A_i \\ B_i \\ C_i \end{bmatrix}, \quad i = 1,2,3,4.$$

Here the $\mu_i$ (i = 1,2,3,4) correspond to the ratios for the <17 age group in each sex by race classification. The $A_i$, $B_i$, $C_i$ correspond to the changes in each consecutive age group. Thus, by selecting appropriate contrasts, it is possible to test the various hypotheses of interest. Based on the results of these tests, a final model was constructed. It is given on the third page of each of the appendix tables and at the bottom of Table 3.7.
The design matrix is:

$$X' = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 2 & 4 & 3 & 4 & 3 & 4 & 0 & 0 & 2 & 2 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 1 & 1 & 0 \end{bmatrix}$$

The estimated parameter vector $b' = (b_0, b_1, b_2) = (6.37, -0.83, 0.49)$ is also given. By selecting appropriate contrasts predicted values and the corresponding standard errors may be estimated. These are shown at the top of Table 3.7.
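The predicted values and their standard errors follow from the final model as x'b and the square root of x'V(b)x for each cell's row x = (1, c1, c2) of the design matrix. The sketch below is an illustration only: it uses the four-decimal parameter estimates and V(b) as read from Table 3.A.1 (the leading digit of the first variance, 4.5088, is inferred from the tabulated fitted standard errors), and the function name `predict` is hypothetical.

```python
import math

# Final-model estimates as read from Table 3.A.1 (poststratified, covariances estimated).
b = [6.3743, -0.8280, 0.4925]
V = [[ 4.5088e-3, -1.3361e-3,  1.8374e-3],
     [-1.3361e-3,  0.6616e-3, -0.7894e-3],
     [ 1.8374e-3, -0.7894e-3, 11.382e-3]]

def predict(c1, c2):
    """Predicted PV/P and its standard error for a cell with coefficients (c1, c2)."""
    x = [1.0, float(c1), float(c2)]
    value = sum(xi * bi for xi, bi in zip(x, b))
    var = sum(x[i] * V[i][j] * x[j] for i in range(3) for j in range(3))
    return value, math.sqrt(var)

# The six distinct coefficient pairs appearing at the bottom of Table 3.7.
for c in [(2, 0), (3, 0), (4, 0), (0, 0), (0, -1), (0, 1)]:
    value, se = predict(*c)
    print(c, round(value, 2), round(se, 2))
```

Rounded to two decimals these reproduce the six distinct entries of Table 3.7, for example 4.72 (0.04) for coefficients (2, 0) and 6.87 (0.14) for (0, 1).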
As in most linear models an interpretation may be attached to each of the components of b. $b_0$ corresponds to a baseline estimate of the number of physician visits per person, specifically for other females over 17 and white females in their child bearing years (17-44). $b_1$ is the major incremental reduction in physician visits for other age, sex, race groups. Thus, there is a 4 step reduction for other females under 17 and for other males under 45. There is a 3 step reduction for white females under 17 and for white males 17 to 44. The last major incremental reduction is of 2 steps for white males under 17 and all males 45 to 64. Thus, the major incremental reduction clusters the 16 sample domains into only four distinct groups. However, a second, and minor, incremental term was necessary to reduce the Chi square due to error to a nonsignificant level. This is $b_2$, which affects only the older age groups. It is half the magnitude of $b_1$, and is parameterized so as to have an opposite sign. It generates two additional groups:
1. At 5.88 PV/P, white females 45-64 and white males 65+
2. At 6.87 PV/P, other males and white females 65+.
This model displays only statistically distinct predicted values (at the $\alpha = .05$ significance level) which are well separated from their nearest neighbor (at least 3 predicted standard errors). Further, the Chi square statistic for error or goodness of fit is $Q_E = 15.93$ with 13 degrees of freedom, which is nonsignificant and amounts to less than two per cent of the total variation in the data.

TABLE 3.7
PREDICTED VALUES (ESTIMATED STANDARD ERRORS IN PARENTHESES)
FOR RATIO ESTIMATES OF PHYSICIAN VISITS/POPULATION
BASED ON POSTSTRATIFIED DATA
WITH AGE, SEX, RACE CLASSIFICATION
AND ALL COVARIANCES ESTIMATED

Inference Table

              Male                          Female
Age       White         Other           White         Other
Predicted Values
<17       4.72 (0.04)   3.06 (0.07)     3.89 (0.05)   3.06 (0.07)
17-44     3.89 (0.05)   3.06 (0.07)     6.37 (0.07)   6.37 (0.07)
45-64     4.72 (0.04)   4.72 (0.04)     5.88 (0.11)   6.37 (0.07)
65+       5.88 (0.11)   6.87 (0.14)     6.87 (0.14)   6.37 (0.07)
Coefficients of (b1, b2)
<17       (2,0)         (4,0)           (3,0)         (4,0)
17-44     (3,0)         (4,0)           (0,0)         (0,0)
45-64     (2,0)         (2,0)           (0,-1)        (0,0)
65+       (0,-1)        (0,1)           (0,1)         (0,0)

Thus, this is a very satisfactory model and based on it, several substantive conclusions are reasonable.
The most obvious is that among persons under 17, white males have the most physician visits, white females occupy an intermediate position and others receive the fewest. The number of physician visits for all females rises sharply to 6.37 in the child bearing years (17-44) and remains at that level at later ages for other females. For white females there is a moderate decline in the ages 45 to 64 and another sharp rise over 65. White males show moderate decline in the early adult years (17-44) but return to the level of the young white males in the late adult years (45-64) and show another moderate rise in old age (65+). Other males continue at a low level of visits in early adulthood, but rise sharply to the level of white males in late adulthood, and rise again to the highest levels in old age. Thus, it appears that there is no clear cut set of differences except that women between 17 and 65 have more visits than men and whites under 17 have more visits than other racial groups. These models are examined under the other 5 forms of estimators and the conclusions are not affected. However, as is shown in Table 3.A.7, if the intradomain covariance between population and physician visit totals is assumed to be zero, then some of the differences in this model will not necessarily be detected. Specifically, those differences due to $b_2$ will tend to be obscured.
3.9 Summary
Four major issues have been examined in this section. This data set is believed to involve primarily only sampling variance and under those conditions which pertain to this sample (as outlined in Section 3.2) it appears that:
1. The Taylor series approximation estimate of V is virtually identical to the replicated ratio estimate.
2. Poststratification on the classification variables has little or no effect on the estimate of V, indicating the representativeness of the original sample.
3. The assumption of zero covariance among domain estimates produces inflated estimates of the standard error of predicted values and correspondingly deflated variation statistics. In addition, moderate biases in the predicted values are observed.
Thus, it appears the assumption of zero covariance is the most questionable.
These conclusions are investigated in the presence of some response error in the classification variables and poststratification on variables other than the classification variables in Chapter Four.
3.10 Appendix of Detailed Tables
Tables for each of the experimental points discussed in the chapter are contained in this section. In addition, Table 3.A.7 contains the estimates when all covariances, including the intradomain covariance between PV and P, are assumed to be zero. Each table consists of three sections. The first section shows the observed standard errors for the ratio of PV to P and the saturated model parameters discussed in Section 3.8. The second section shows tests of hypotheses for Age, Sex, and Race contrasts. It should be noted that in the Hypothesis Testing section the d.f. column refers to the a priori degrees of freedom and Q is the corresponding Wald quadratic form. Where appropriate the estimated contrast and its standard error are given. The third section shows statistics relevant to the final models. Table 3.8 lists the various tables.
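For a single degree of freedom contrast, the Wald quadratic form reduces to the squared ratio of the estimated contrast to its standard error. The sketch below checks this against the three age contrasts for white males in Table 3.A.1; it is a small illustration, the helper name `wald_q_1df` is hypothetical, and agreement is only to about two decimals because the tabulated contrasts and standard errors are rounded.

```python
def wald_q_1df(contrast, se):
    """Wald statistic for a one degree of freedom contrast: (estimate / S.E.)^2."""
    return (contrast / se) ** 2

# (label, contrast, S.E., tabulated Q) for Male White age trends, Table 3.A.1.
rows = [
    ("<17 vs. 17-44",   -0.9419, 0.1564, 36.25),
    ("17-44 vs. 45-64",  1.1103, 0.1966, 31.91),
    ("45-64 vs. 65+",    1.0392, 0.2492, 17.40),
]

for label, contrast, se, q_tab in rows:
    print(label, round(wald_q_1df(contrast, se), 2), q_tab)
```

Tests with more than one degree of freedom use the full quadratic form in the contrast vector and its estimated covariance matrix, so their Q values are generally not the sum of the single degree of freedom statistics.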
TABLE 3.8
DETAILED TABLES IN THE APPENDIX

Table    Stratification       Estimator                   Interdomain Covariance
3.A.1    Poststratified       Taylor Series and Ratios    ≠ 0
3.A.2    Poststratified       Taylor Series and Ratios    = 0
3.A.3    Nonpoststratified    Taylor Series               ≠ 0
3.A.4    Nonpoststratified    Taylor Series               = 0
3.A.5    Nonpoststratified    Ratios                      ≠ 0
3.A.6    Nonpoststratified    Ratios                      = 0
3.A.7    Nonpoststratified    Taylor Series               All Covariances = 0
TABLE 3.A.1
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES APPROXIMATION
OR REPLICATED RATIOS. POSTSTRATIFIED.
PHYSICIAN VISITS (PV)/POPULATION (P). (AGE, SEX, RACE).
COVARIANCES ESTIMATED.

Observed Estimates of Standard Error for the Ratios

              Male                Female
Age       White    Other      White    Other
<17       0.1257   0.2092     0.1108   0.2229
17-44     0.0978   0.2298     0.1135   0.3203
45-64     0.1629   0.4517     0.1554   0.4242
65+       0.2288   0.9455     0.1876   0.6514

Observed Estimates of S.E. for Saturated Model Parameters

Group           μ        A        B        C
Male White      0.1257   0.1564   0.1966   0.2492
Male Other      0.2092   0.3227   0.5662   1.0512
Female White    0.1108   0.1513   0.1932   0.2673
Female Other    0.2229   0.4082   0.4835   0.7291

Tests of Hypotheses:

Source                  d.f.        Q     Contrast     S.E.
Total Variation          15    1075.69
Age Trends
  Male White              3      90.66
    <17 vs. 17-44         1      36.25    -0.9419    0.1564
    17-44 vs. 45-64       1      31.91     1.1103    0.1966
    45-64 vs. 65+         1      17.40     1.0392    0.2492
  Male Other              3      41.13
    <17 vs. 17-44         1       0.02    -0.0465    0.3227
    17-44 vs. 45-64       1       8.43     1.6441    0.5662
    45-64 vs. 65+         1      10.38     3.3867    1.0512
  Female White            3     365.81
    <17 vs. 17-44         1     220.17     2.2450    0.1513
    17-44 vs. 45-64       1       5.20    -0.4407    0.1932
    45-64 vs. 65+         1      14.62     1.0221    0.2673
  Female Other            3      94.89
    <17 vs. 17-44         1      65.79     3.3107    0.4082
    17-44 vs. 45-64       1       2.04     0.6900    0.4835
    45-64 vs. 65+         1       1.37    -0.8532    0.7291
Sex Differences by Age
  <17                     2      10.71
    White                 1      10.70     0.5350    0.1636
    Other                 1       0.00    -0.0013    0.2262
  17-44                   2     454.42
    White                 1     360.74    -2.6520    0.1396
    Other                 1     103.03    -3.3585    0.3309
  45-64                   2      40.84
    White                 1      25.51    -1.1010    0.2180
    Other                 1      16.20    -2.4043    0.5973
  65+                     2      17.19
    White                 1      15.97    -1.0839    0.2712
    Other                 1       3.32     1.8356    1.0081
Race Differences by Age
  <17                     2      55.07
    Male                  1      50.76     1.5913    0.2234
    Female                1      18.06     1.0550    0.2483
  17-44                   2       8.14
    Male                  1       6.70     0.6959    0.2689
    Female                1       0.00    -0.0106    0.3688
  45-64                   2       6.73
    Male                  1       0.12     0.1621    0.4693
    Female                1       6.50    -1.1412    0.4474
  65+                     2       7.18
    Male                  1       4.72    -2.1855    1.0057
    Female                1       1.10     0.7341    0.6997

Final Model:

Design Matrix
X' = [ 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 ]
     [ 2  4  3  4  3  4  0  0  2  2  0  0  0  0  0  0 ]
     [ 0  0  0  0  0  0  0  0  0  0 -1  0 -1  1  1  0 ]

Estimated Model Parameters
b = (6.3743, -0.8280, 0.4925)'
V(b) = [  4.5088  -1.3361   1.8374 ]
       [ -1.3361   0.6616  -0.7894 ]  x 10^-3
       [  1.8374  -0.7894  11.382  ]

Smoothed or Fitted Table

                           Male                    Female
Age      Class         White      Other        White      Other
<17      Fitted        4.7182     3.0622       3.8902     3.0622
         (Obs. S.E.)   (0.1257)   (0.2092)     (0.1108)   (0.2229)
         Fitted S.E.   0.0426     0.0664       0.0495     0.0664
         (Res.)        (-0.0634)  (0.0014)     (0.2297)   (0.0027)
17-44    Fitted        3.8902     3.0622       6.3743     6.3743
         (Obs. S.E.)   (0.0978)   (0.2298)     (0.1135)   (0.3203)
         Fitted S.E.   0.0495     0.0664       0.0671     0.0671
         (Res.)        (-0.1773)  (-0.0451)    (-0.0094)  (0.0012)
45-64    Fitted        4.7182     4.7182       5.8818     6.3743
         (Obs. S.E.)   (0.1629)   (0.4517)     (0.1554)   (0.4242)
         Fitted S.E.   0.0426     0.0426       0.1105     0.0671
         (Res.)        (0.1050)   (-0.0571)    (0.0425)   (0.6912)
65+      Fitted        5.8818     6.8668       6.8668     6.3743
         (Obs. S.E.)   (0.2288)   (0.9455)     (0.1876)   (0.6514)
         Fitted S.E.   0.1105     0.1399       0.1399     0.0671
         (Res.)        (-0.0194)  (1.1810)     (0.0795)   (-0.1621)

Analysis of Variation Table

Source              d.f.        Q     Contrast     S.E.    % Total Q
Model                 2    1059.76                           98.52
  b1                  1    1036.29    -0.8280    0.0257
  b2                  1      21.32     0.4925    0.1067
  b1 + 2b2 = 0        1       0.57     0.1570    0.2074
Error                13      15.93                            1.48
Total                15    1075.69
TABLE 3.A.2
RATIO ESTIMATES, V(r) BASED ON TS OR RR ESTIMATES.
POSTSTRATIFIED.
PHYSICIAN VISITS (PV)/POPULATION (P). (AGE, SEX, RACE).
INTERDOMAIN COVARIANCES ZERO.

Observed Estimates of Standard Error for the Ratios

              Male                Female
Age       White    Other      White    Other
<17       0.1257   0.2092     0.1108   0.2229
17-44     0.0978   0.2298     0.1135   0.3203
45-64     0.1629   0.4517     0.1554   0.4242
65+       0.2286   0.9455     0.1876   0.6514

Observed Estimates of S.E. for Saturated Model Parameters

Group           μ        A        B        C
Male White      0.1257   0.1593   0.1900   0.2809
Male Other      0.2092   0.3108   0.5068   1.0479
Female White    0.1108   0.1586   0.1924   0.2436
Female Other    0.2229   0.3902   0.5316   0.7773

Tests of Hypotheses:

Source                  d.f.        Q     Contrast     S.E.
Total Variation          15     816.31
Age Trends
  Male White              3     100.43
    <17 vs. 17-44         1      34.96    -0.9419    0.1593
    17-44 vs. 45-64       1      34.13     1.1103    0.1900
    45-64 vs. 65+         1      13.69     1.0392    0.2809
  Male Other              3      37.06
    <17 vs. 17-44         1       0.02    -0.0465    0.3108
    17-44 vs. 45-64       1      10.52     1.6441    0.5068
    45-64 vs. 65+         1      10.45     3.3867    1.0479
  Female White            3     279.29
    <17 vs. 17-44         1     200.27     2.2450    0.1586
    17-44 vs. 45-64       1       5.24    -0.4407    0.1924
    45-64 vs. 65+         1      17.60     1.0221    0.2436
  Female Other            3     117.65
    <17 vs. 17-44         1      71.97     3.3107    0.3902
    17-44 vs. 45-64       1       1.68     0.6900    0.5316
    45-64 vs. 65+         1       1.20    -0.8532    0.7774
Sex Differences by Age
  <17                     2      10.19
    White                 1      10.19     0.5350    0.1676
    Other                 1       0.00    -0.0013    0.3057
  17-44                   2     385.78
    White                 1     313.21    -2.6520    0.1498
    Other                 1      72.58    -3.3585    0.3942
  45-64                   2      38.97
    White                 1      23.91    -1.1010    0.2252
    Other                 1      15.05    -2.4043    0.6197
  65+                     2      15.97
    White                 1      13.42    -1.0839    0.2959
    Other                 1       2.56     1.8356    1.1482
Race Differences by Age
  <17                     2      60.47
    Male                  1      42.51     1.5913    0.2441
    Female                1      17.96     1.0550    0.2489
  17-44                   2       7.77
    Male                  1       7.76     0.6959    0.2497
    Female                1       0.00    -0.0106    0.3398
  45-64                   2       6.50
    Male                  1       0.11     0.1621    0.4802
    Female                1       6.38    -1.1412    0.4518
  65+                     2       6.22
    Male                  1       5.05    -2.1855    0.9728
    Female                1       1.17     0.7341    0.6779

Final Model (Based on Table 3.A.1):

Design Matrix
X' = [ 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 ]
     [ 2  4  3  4  3  4  0  0  2  2  0  0  0  0  0  0 ]
     [ 0  0  0  0  0  0  0  0  0  0 -1  0 -1  1  1  0 ]

Estimated Model Parameters
b = (6.4188, -0.8437, 0.5322)'
V(b) = [  5.1761  -1.7052   1.7813 ]
       [ -1.7052   0.8856  -0.5868 ]  x 10^-3
       [  1.7813  -0.5868  11.720  ]

Smoothed or Fitted Table

                           Male                    Female
Age      Class         White      Other        White      Other
<17      Fitted        4.7313     3.0439       3.8876     3.0439
         (Obs. S.E.)   (0.1257)   (0.2092)     (0.1108)   (0.2229)
         Fitted S.E.   0.0436     0.0755       0.0540     0.0755
         (Res.)        (-0.0765)  (0.0197)     (0.2323)   (0.0210)
17-44    Fitted        3.8876     3.0439       6.4188     6.4188
         (Obs. S.E.)   (0.0978)   (0.2298)     (0.1135)   (0.3203)
         Fitted S.E.   0.0540     0.0755       0.0719     0.0719
         (Res.)        (-0.1746)  (-0.0268)    (-0.0539)  (-0.0433)
45-64    Fitted        4.7313     4.7313       5.8866     6.4188
         (Obs. S.E.)   (0.1629)   (0.4517)     (0.1554)   (0.4242)
         Fitted S.E.   0.0436     0.0436       0.1155     0.0719
         (Res.)        (0.0919)   (-0.0702)    (0.0376)   (0.6467)
65+      Fitted        5.8866     6.9510       6.9510     6.4188
         (Obs. S.E.)   (0.2286)   (0.9455)     (0.1876)   (0.6514)
         Fitted S.E.   0.1155     0.1430       0.1430     0.0719
         (Res.)        (-0.0242)  (1.0969)     (-0.0047)  (-0.2066)

Analysis of Variation Table

Source              d.f.        Q     Contrast     S.E.    % Total Q
Model                 2     803.90                           98.48
  b1                  1     803.84    -0.8437    0.0298
  b2                  1      24.16     0.5322    0.1083
  b1 + 2b2 = 0        1       1.07     0.2206    0.2131
Error                13      12.41                            1.52
Total                15     816.31
TABLE 3.A.3
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES EXPANSION.
NONPOSTSTRATIFIED.
PHYSICIAN VISITS (PV)/POPULATION (P). (AGE, SEX, RACE).
COVARIANCES ESTIMATED.

Observed Estimates of Standard Error for the Ratios

              Male                Female
Age       White    Other      White    Other
<17       0.1242   0.2107     0.1120   0.2210
17-44     0.0978   0.2333     0.1099   0.3290
45-64     0.1642   0.4442     0.1530   0.4288
65+       0.2308   0.9538     0.1906   0.6388

Observed Estimates of S.E. for Saturated Model Parameters

Group           μ        A        B        C
Male White      0.1242   0.1560   0.1974   0.2534
Male Other      0.2107   0.3206   0.5498   1.0492
Female White    0.1120   0.1492   0.1890   0.2652
Female Other    0.2210   0.4156   0.4920   0.7240

Tests of Hypotheses:

Source                  d.f.        Q     Contrast     S.E.
Total Variation          15    1115.34
Age Trends
  Male White              3      92.19
    <17 vs. 17-44         1      36.48    -0.9419    0.1560
    17-44 vs. 45-64       1      31.64     1.1103    0.1974
    45-64 vs. 65+         1      16.82     1.0392    0.2534
  Male Other              3      40.72
    <17 vs. 17-44         1       0.02    -0.0465    0.3206
    17-44 vs. 45-64       1       8.94     1.6441    0.5498
    45-64 vs. 65+         1      10.42     3.3867    1.0492
  Female White            3     358.94
    <17 vs. 17-44         1     226.47     2.2450    0.1492
    17-44 vs. 45-64       1       5.43    -0.4407    0.1890
    45-64 vs. 65+         1      14.85     1.0221    0.2652
  Female Other            3      92.27
    <17 vs. 17-44         1      63.44     3.3107    0.4156
    17-44 vs. 45-64       1       1.97     0.6900    0.4920
    45-64 vs. 65+         1       1.38    -0.8532    0.7240
Sex Differences by Age
  <17                     2      10.92
    White                 1      10.89     0.5350    0.1621
    Other                 1       0.00    -0.0013    0.2258
  17-44                   2     468.12
    White                 1     372.76    -2.6520    0.1374
    Other                 1     106.11    -3.3585    0.3260
  45-64                   2      41.51
    White                 1      26.03    -1.1010    0.2158
    Other                 1      16.36    -2.4043    0.5945
  65+                     2      17.43
    White                 1      15.98    -1.0839    0.2712
    Other                 1       3.52     1.8356    0.9789
Race Differences by Age
  <17                     2      56.97
    Male                  1      52.94     1.5913    0.2187
    Female                1      18.15     1.0550    0.2476
  17-44                   2       8.20
    Male                  1       6.47     0.6959    0.2736
    Female                1       0.00    -0.0106    0.3746
  45-64                   2       6.65
    Male                  1       0.12     0.1621    0.4625
    Female                1       6.41    -1.1412    0.4506
  65+                     2       7.59
    Male                  1       4.56    -2.1855    1.0229
    Female                1       1.14     0.7341    0.6867

Final Model (Based on Table 3.A.1):

Design Matrix
X' = [ 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 ]
     [ 2  4  3  4  3  4  0  0  2  2  0  0  0  0  0  0 ]
     [ 0  0  0  0  0  0  0  0  0  0 -1  0 -1  1  1  0 ]

Estimated Model Parameters
b = (6.3787, -0.8316, 0.4990)'
V(b) = [  4.5052  -1.3280   2.0218 ]
       [ -1.3280   0.6504  -0.8706 ]  x 10^-3
       [  2.0218  -0.8706  11.376  ]

Smoothed or Fitted Table

                           Male                    Female
Age      Class         White      Other        White      Other
<17      Fitted        4.7156     3.0524       3.8840     3.0524
         (Obs. S.E.)   (0.1242)   (0.2107)     (0.1120)   (0.2210)
         Fitted S.E.   0.0424     0.0655       0.0489     0.0655
         (Res.)        (-0.0607)  (0.0108)     (0.2358)   (0.0124)
17-44    Fitted        3.8840     3.0524       6.3787     6.3787
         (Obs. S.E.)   (0.0978)   (0.2333)     (0.1099)   (0.3290)
         Fitted S.E.   0.0489     0.0655       0.0671     0.0671
         (Res.)        (-0.1711)  (-0.0354)    (-0.0138)  (-0.0032)
45-64    Fitted        4.7156     4.7156       5.8798     6.3787
         (Obs. S.E.)   (0.1642)   (0.4442)     (0.1530)   (0.4288)
         Fitted S.E.   0.0424     0.0424       0.1088     0.0671
         (Res.)        (0.1076)   (-0.0545)    (0.0445)   (0.6867)
65+      Fitted        5.8798     6.8777       6.8777     6.3787
         (Obs. S.E.)   (0.2308)   (0.9538)     (0.1906)   (0.6388)
         Fitted S.E.   0.1088     0.1412       0.1412     0.0671
         (Res.)        (-0.0174)  (1.1701)     (0.0686)   (-0.1665)

Analysis of Variation Table

Source              d.f.        Q     Contrast     S.E.    % Total Q
Model                 2    1100.20                           98.64
  b1                  1    1063.20    -0.8316    0.0255
  b2                  1      21.89     0.4990    0.1067
  b1 + 2b2 = 0        1       0.65     0.1660    0.2065
Error                13      15.15                            1.36
Total                15    1115.34
TABLE 3.A.4
RATIO ESTll1ATES, Y(r) BASED ON TAYLOR SERIES APPROXIMATION.
NONPOSTSTRATIFIED.
PHYSICIAN VISITS (PV)/POPULATION (p). (AGE, SEX, RACE).
INTERDOMAIN COVARIANCES ZERO.
Observed Estimates of Standard Error
for the Ratios
Age
White
Male
Other
Female
White
Other
Observed Estimates of S.E. for
Saturated Model Parameters
11
B
A
C
<17
0.1242
0.2107
0.1120
0.2210
0.1242
0.1581
0.1911
0.2832
17-44
0.0978
0.2333
0.1099
0.3290
0.2107
0.3144
0.5022
1. 0524
45-64
0.1642
0.4446
0.1530
0.4288
0.1120
0.1569
0.1884
0.2444
65+
0.2308
0.9538
0.1906
0.6388
0.2210
0.3964
0.5405
0.7694
Tests of Hypotheses:
Q
15
821.91
Male White
<17 vs. 17-44
17-44 vs. 45-64
45-64 vs. 65+
3
1
1
1
99.71
35.50
33.77
13.46
-0.9419
1.1103
1. 0392
0.1581
0.1911
0.2832
Male Other
<17 vs. 17-44
17-44 VB. 45-6 l l
45-64 vs. 65+
3
1
1
1
36.83
0.02
10.72
10.36
-0.0465
1. 6441
3.3867
0.3144
0.5022
1.0524
Female White
<17 VB. 11-44
17-44 VB. 45-64
45-64 va. 65+
3
1
1
1
277.29
204.65
5.47
17.49
2.2450
-0.4407
1. 0221
0.1569
0.1884
0.2444
Female Other
<17 VB. 17-44
17-44 VB. 45-64
45-64 VB. 65+
3
1
1
116.64
69.77
1.63
1.23
3.3107
0.6900
-0.8532
0.3964
0.5405
0.7694
Total Variation
Contrast
S.E.
d. f.
Source
Age Trends
1
66
Table 3. A. 4 con It.
Tests of Hypotheses(con't.)
d.f.
Contrast
Q
S.E.
Sex Differences by Age
2
1
1
10.23
10.23
0.00
0.5350
-0.0013
0.1673
0.3054
White
Other
2
1
1
394.43
325.07
69.34
-2.6520
-3.3585
0.1471
0.4033
45-64
White
Other
2
1
1
39.22
24.08
15.15
-1.1010
-2.4043
0.2244
0.6177
65+
2
1
1
15.67
13.11
2.56
-1.0839
1.8356
0.2994
1.1480
Male
Female
2
1
1
60.45
42.33
18.12
1.5913
1. 0550
0.2446
0.2478
17-44
Male
Female
2
1
1
7.57
7.57
0.00
0.6959
-0.0106
0.2530
0.3468
45-64
Male
Female
2
1
1
6.40
0.34
6.28
0.1621
-1.1412
0.4740
0.4553
Male
Female
2
1
1
6.17
4.96
1.21
-2.1855
0.7341
0.9814
0.6667
<17
White
Other
17-44
White
Other
Race Differences by Age
<17
65+
Final Model (Based on Table 3.A.1) :
Design Matrix
..
Xl •
..b
•
[1
2
o
1
4 3
0 0
a·il
41
-0.8436
0.5306
1 1
4 3
0 0
1 1
4 0
0 0
1
0
0
·
a
1
2
0
1
2
o
1
0
-1
1
0
1
0
o -1
1
0
1
I]
o
1
0
0
Estimated Model Parameters
0831
V(b)
.. ..
a
-1.6770
1.8543
-1. 6770
0.8792
-0.6118
1. 8543 1
-0.6118
11.775
x 10- 3
Table 3.A.4 con't.
Smoothed or Fitted Table
Sex
Age
Race
Class
Male
White
<17
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
4.7296
(0.1242)
0.0435
(-0.0747)
17-44
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
3.8860
(0.0978)
0.0542
(-0.1731)
45-64
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
65+
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
Other
3.0425
(0.2107)
0.0757
(0.02105)
White
Female
Other
3.8860
(0.1120)
0.0542
(0.2338)
3.0425
(0.2210)
0.0757
(0.0224)
3.0425
(0.2333)
0.0757
(-0.0254)
6.4167
(0.1099)
0.0713
(-0.0518)
6.4167
(0.3290)
0.0713
(-0.0412)
4.7296
(0.1642)
0.0435
(0.0936)
4.7296
(0.4446)
0.0435
(-0.0685)
5.8861
(0.1530)
0.1147
(0.0381)
6.4167
(0.4288)
0.0713
(0.6488)
5.8861
(0.2308)
0.1147
(-0.0237)
·6.9473
(0.9538)
0.1434
(1.1006)
6.9473
(0.1906)
0.1435
(-0.0010)
6.4167
(0.6388)
0.0713
(-0.2045)
Analysis of Variation Table

Source            d.f.         Q    Contrast     S.E.   % Total Q
Model               2     809.65                            98.51
  b1                1     809.36     -0.8436   0.0297
  b2                1      23.91      0.5306   0.1085
  b1 + 2b2 = 0      1       1.04      0.2176   0.2134
Error              13      12.27                             1.49
Total              15     821.91
TABLE 3.A.5
RATIO ESTIMATES, V(r) BASED ON REPLICATED RATIOS.
NONPOSTSTRATIFIED.
PHYSICIAN VISITS (PV)/POPULATION (P). (AGE, SEX, RACE).
COVARIANCES ESTIMATED.
Observed Estimates of Standard Error
for the Ratios
Observed Estimates of S.E. for
Saturated Model Parameters
Age
Male
Other
White
Female
White
Other
<17
0.1244
0.2114
0.1121
0.2215
0.1244
0.1560
0.1971
0.2540
17-44
0.0978
0.2340
0.1098
0.3277
0.2113
0.3203
0.5573
1. 0570
45-64
0.1637
0.4516
0.1530
0.4286
0.1121
0.1492
0.1888
0.2654
65+
0.2318
0.9602
0.1907
0.6458
0.2215
0.4139
0.4916
0.7259
J1
B
A
C
Tests of Hypotheses:
Source
Total Variation
d. f.
Q
Contrast
S.E.
15
1118.97
Male White
<17 vs. 17-44
17-44 vs. 45-64
45-64 vs. 65+
3
1
1
1
92.00
36.48
31.73
16.74
-0.9419
1.1103
1.0392
0.1560
0.1971
0.2540
Male Other
<17 VB. 17-44
17-44 VB. 45-64
45-64 VS. 65+
3
1
1
1
39.67
0.02
8.70
10.27
-0.0465
1. 6441
3.3867
0.3203
0.5573
1. 05 70
Female White
<17 VS. 17-44
17-44 VB. 45-64
45-64 VS. 65+
3
1
1
1
358.83
226.53
5.45
14.83
2.2450
-0.4407
1. 0221
0.1492
0.1888
0.2654
Female Other
<17 VS. 17-44
17-44 VS. 45-64
45-64 VS. 65+
3
1
1
1
92.97
63.97
1.97
1. 38
3.3107
0.6900
-0.8532
0.4139
0.4916
0.7259
Age Trends
Table 3.A.5 con't.
Tests of Hypotheses (con't.)
d. f.
S.E.
Contrast
- Q
Sex Differences by Age
White
Other
2
1
1
10.89
10.86
0.00
0.5350
-0.0013
0.1624
0.2269
17-44
White
Other
2
1
1
469.91
373.08
106.36
-2.6520
-3.3585
0.1373
0.3256
45-64
White
Other
2
1
1
41..12
26.05
16.13
-1.1010
-2.4043
0.2157
0.5986
White
Other
2
1
1
17.35
15.95
3.49
-1. 0839
1. 8356
0.2736
0.9827
Male
Female
2
1
1
56.88
52.90
18.14
1. 5913
1. 0550
0.2188
0.2477
17-44
Male
Female
2
1
1
8.16
6.45
0.00
0.6959
-0.0106
0.2740
0.3734
45-64
Male
Female
2
1
1
6.67
0.12
6.43
0.1621
-1.1412
65+
2
1
1
7.52
4.51
1.12
-2.1855
0.7341
<17
65+
Race Differences by Age
<17
Male
Female
~
1. 0296
0.6952
Final Model (Based on Table 3.A.1) :
Design Matrix
X' •
D
2
o
4
0
1
3
0
1
4
0
1
3
0
1
4
0
1
0
0
1
0
0
1
2
0
1
2
o
1
0
-1
1
0
o
1
0
-1
1
0
1
OJ
o
1
0
a
Estimated Model Parameters
b·
B
·379~
0.8316
0.4967
,5085
V(b). -1.3265
[ 2.0230
- -
-1.3265
0.6487
-0.8717
0.4680
0.4500
2.02301
-0.8717 x 10- 3
11.371
Table 3.A.5 con't.
Smoothed or Fitted Table
Sex Race
Class
White
Other
White
Age
Male
Female
Other
<17
Fitted
(Obs. S.E.)
Fitted S.E.
(Res. )
4.7158
(0.1244)
0.0424
(-0.0610)
3.0526
(0.2113)
0.0654
(0.0109)
3.8842
(0.1121)
0.0489
(0.2356)
3.0526
(0.2215)
0.0654
(0.0122)
17-44
Fitted
(Obs. S.E.)
Fitted S.E.
(Res. )
3.8842
(0.0978)
0.0489
(-0.1713)
3.0526
(0.2340)
0.0654
(-0.0355)
6.3791
(0.1098)
0.0671
(-0.0142)
6.3791
(0.3277)
0.0671
(-0.0036)
45-64
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
4.7158
(0.1637)
0.0424
(0.1074)
4.7158
(0.4516)
0.0424
(-0.0547)
5.8824
(0.1530)
0.1088
(0.0418)
6.3791
(0.4286)
0.0671
(0.6864)
65+
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
5.8824
(0.2318)
0.1088
(-0.0200)
6.8757
(0.9602)
0.1412
(1.1721)
6.8757
(0.1907)
0.1412
(0.0706)
6.3791
(0.6458)
0.0671
(-0.1668)
Analysis of Variation Table

Source            d.f.         Q    Contrast     S.E.   % Total Q
Model               2    1103.90                            98.65
  b1                1    1066.09     -0.8316   0.0255
  b2                1      21.70      0.4967   0.1066
  b1 + 2b2 = 0      1       0.61      0.1617   0.2065
Error              13      15.07                             1.35
Total              15    1118.97
TABLE 3.A.6
RATIO ESTIMATES, V(r) BASED ON REPLICATED RATIOS.
NONPOSTSTRATIFIED.
PHYSICIAN VISITS (PV)/POPULATION (P). (AGE, RACE, SEX).
INTERDOMAIN COVARIANCES ZERO.
Observed Estimates of Standard Error
for the Ratios
Age
White
Male
Other
Female
White
Other
Observed Estimates of S.E. for
Saturated Model Parameters
11
A
B
C
<17
0.1244
0.2114
0.1121
0.2215
0.1244
0.1582
0.1907
0.2838
17-44
0.0978
0.2340
0.1098
0.3277
0.0978
0.3152
0.5086
1. 0611
45-64
0.1637
0.4516
0.1530
0.4286
0.1637
0.1570
0.1884
0.2445
65+
0.2318
0.9602
0.1907
0.6458
0.2318
0.3956
0.5395
0.7551
Tests of Hypotheses:
Source
Contrast
S.E.
d.£.
Q
15
820.55
Male White
<17 vs. 17-44
17-44 vs. 45-64
45-64 vs. 65+
3
1
1
1
99.34
35.44
33.90
13.41
-0.9419
1.1103
1. 0392
0.1582
0.1907
0.2838
Male Other
<17 vs. 17-44
17-44 vs. 45-64
45-64 vs. 65+
3
1
1
1
36.21
0.02
10.45
10.19
-0.0465
1.6441
3.3867
0.3152
0.5086
1.0611
Female. White
<17 VB. 17-44
17-44 VS. 45-64
45-64 VS. 65+
3
1
1
1
276.99
204.55
5.47
17.47
2.2450
-0.4407
1.0221
0.1570
0.1884
0.2445
Female Other
<17 VS. 17-44
17-44 VS. 45-64
45-64 VB. 65+
3
1
116.54
70.05
1.64
1.21
3.3107
0.6900
-0.8532
0.3956
0.5395
0.7751
Total Variation
Age Trends
1
1
Table 3.A.6 con't.
Tests of Hypotheses (con't.)
d. f.
Q
Contrast
S.E.
Sex Differences by Age
White
Other
2
1
1
10.21
10.21
0.00
·0.5350
-0.0013
0.1675
0.3061
17-44
White
Other
2
1
1
394.70
325.16
69.57
-2.6520
-3.3585
0.1471
0.4027
45-64
2
1
-1.1010
-2.4043
0.2241
0.6226
<17
White
Other
1
39.05
24.14
3.86
White
Other
2
1
1
15.55
13.04
2.52
-1.0839
1. 8356
0.3002
1.1571
Male
Female
2
1
1
60.19
42.13
18.05
1.5913
1.0550
0.2452
0.2483
17-44
Male
Female
2
1
1
7.53
7.53
0.00
0.6959
-0.0106
0.2536
0.3456
45-64
Male
Female
2
1
1
6.40
0.11
6.57
0.1621
-1.1412
0.4804
0.4551
Male
Female
2
1
1
6.08
4.90
1.18
-2.1855
0.7341
0.9878
0.6734
65+
Race Differences by Age
<17
65+
Final Model (Based on Table 3.A.1):
Design Matrix
Xl
-
EO
[1
2
o
4
0
1
1
4
1
3
0
0
0
3
1 1
4 0
0 0
1
0
0
1
2
0
1
2
o
1
0
-1
1
0
o
1
0
-1
1
0
1
1]
o
1
0
0
Estimated Model Parameters
b ..
-
a::~~
[Q.53~
V(b)"
~ -
[l
,0859
-1.6784
1.8516
-1.6784
0.8806
-0.6110
1' 85161
-0.6110
11.802
x 10-3
Table 3.A.6 con't.
Smoothed or Fitted Table
Sex
Age
Race
Class
Male
White
Other
White
Female
Other
<17
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
4.7296
(0.1244)
0.0435
(-0.0747)
3.0425
(0.2114)
0.0758
(0.0210)
3.8860
(0.1121)
0.0542
(0.2338)
3.0425
(0.2215)
0.0758
(0.0223)
17-44
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
3.8860
(0.0978)
0.0542
(-0.1731)
3.0425
(0.2340)
0.0758
(-0.0254)
6.4167
(0.1098)
0.0713
(-0.0518)
6.4167
(0.3277)
0.0713
(-0.0412)
45-64
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
4.7296
(0.1637)
0.0435
(0.0936)'
4.7296
(0.4516)
0.0435
(-0.0685)
5.8863
(0.1530)
0.1148
(0.0379)
6.4167
(0.4286)
0.0713)
(0.6488)
65+
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
5.8863
(0.2318)
0.1148
(-0.0239)
6.9471
(0.9602)
0.1435
(1.1008)
6.9471
(0.1907)
0.1435
(-0.0008)
6.4167
(0.6458)
0.0713
(-0.2045 )
Analysis of Variation Table

Source            d.f.         Q    Contrast     S.E.   % Total Q
Model               2     808.31                            98.51
  b1                1     808.05     -0.8436   0.0297
  b2                1      23.84      0.5304   0.1086
  b1 + 2b2 = 0      1       1.03      0.2172   0.2136
Error              13      12.24                             1.49
Total              15     820.55
TABLE 3.A.7
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES APPROXIMATION.
NONPOSTSTRATIFIED.
PHYSICIAN VISITS (PV)/POPULATION (P). (AGE, RACE, SEX).
COVARIANCES ZERO.
Observed Estimates of Standard Error
for the Ratios
Age
White
Male
Other
Female
White
Other
Observed Estimates of S.E. for
Saturated Model Parameters
lJ
A
B
C
<17
0.1497
0.2609
0.1348
0.2642
0.1497
0.1880
0.2124
0.3096
17-44
0.1137
0.2663
0.1472
0.4110
0.2609
0.3728
0.5949
1.1946
45-64
0.1794
0.5320
0.1893
0.5620
0.1348
0.1996
0.2398
0.2945
65+
0.2524
1.0696
0.2256
0.8368
0.2642
0.4887
0.6963
1.0080
Tests of Hypotheses:
Contrast
S.E.
78.36
25.10
27.33
11.26
-0.9419
1.1103
1.0392
0.1880
0.2124
0.3096
3
1
1
1
28.13
0.02
7.64
8.04
-0.0465
1.6441
3.3867
0.3728
0.5949
1.1946
Female White
<17 vs. 17-44
17-44 vs. 45-64
45-64 va. 65+
3
1
1
1
185.40
126.51
3.38
12.05
2.2450
-0.4407
1.0221
0.1996
0.2398
0.2945
Female Other
<17 va. 17-44
17-44 VB. 45-64
45-64 vs. 65+
3
1
1
1
75.29
45.40
0.98
0.72
3.3107
0.6900
-0.8532
0.4887
0.696'3
1.0080
d. f.
Q
15
548.64
Male White
<17 VB. 17-44
17-44 vs. 45-64
45-64 vs. 65+
3
1
1
1
Male Other
<17 vs. 17-44
17-44 vs. 45-64
45-64 va. 65+
Source
Total Variation
Age Trends
Table 3.A.7 con't.
Tests of Hypotheses (con't.)
d. f.
Q
Contrast
S.E.
Sex Differences by Age
White
Other
2
1
1
7.05
7.05
0.00
0.5350
-0.0013
0.2014
0.3713
17-44
White
Other
2
1
1
250.25
203.25
47.01
-2.6520
-3.3585
0.1860
0.4898
45-64
White
Other
2
1
1
27.47
17.82
9.65
-1. 1010
-2.4043
0.2608
0.7738
White
Other
2
1
1
12.08
10.25
1.83
-1.0839
1.8356
0.3385
1.3580
2
1
1
40.64
28.00
12.65
1.5913
1.0550
0.3008
0.2966
17-44
Male
Female
2
5.77
5.77
0.00
0.6959
-0.0106
0.2896
0.4366
45-64
Male
Female
2
1
1
3.79
0.08
3.70
0.1621
-1.1412
0.5614
0.5930
65+
2
1
1
4.67
3.96
0.72
-2.1855
0.7341
1.0989
0.8667
<17
65+
Race Differences by Age
<17
Male
Female
1
1
Male
Female
Final Model (Based on Table 3.A.1) :
Design Matrix
...X' ..
[1
2
o
1
4 3
0 0
1 1
4 3
0 0
1 1
4 0
0 0
1
0
0
1 .1 1
2 2 0
0 o -1
1
0
o
1
0
-1
1
0
1
1]
o
1
Estimated Model Parameters
b
...
co
S
·41@6
-0.8436
0.5304
U
o08S9
V(b)
... ...
co
-1.6784
1.8516
-1.6784
0.8806
-0.6110
1.8Sa
-0.6110
11.802
0
0
Table 3.A.7 con't.
Smoothed or Fitted Table
Sex Race
Class
White
Other
White
Age
Male
Female
Other
<17
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
4.7296
(0.1497)
0.0435
(-0.0747)
3.0425
(0.2609)
0.0758
(0.0210)
3.8860
(0.1348)
0.0542
(0.2338)
3.0425
(0.2642)
0.0758
(0.0223)
17-44
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
3.8860
(0.1137)
0.0542
(-0.1731)
3.0425
(0.2663)
0.0758
(-0.0254)
6.4167
(0.1472)
0.0713
(-0.0518)
6.4167
(0.4110)
0.0713
(-0.0412)
45-64
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
4.7296
(0.1794)
0.0435
(0.0936)
4.7296
(0.5320)
0.0435
(-0.0685)
5.8863
(0.1893)
0.1148
(0.0379)
6.4167
(0.5620)
0.0713
(0.6488)
65+
Fitted
(Obs. S.E.)
Fitted S.E.
(Res.)
5.8863
(0.2524)
0.1148
(-0.0239)
6.9471
(1.0696)
0.1435
(1.1008)
6.9471
(0.2256)
0.1435
(-0.0008)
6.4167
(0.8368)
0.0713
(0.2045)

Analysis of Variation Table

Source            d.f.         Q    Contrast     S.E.   % Total Q
Model               2     540.12                            98.45
  b1                1     539.84     -0.8436   0.0297
  b2                1      17.29      0.5304   0.1086
  b1 + 2b2 = 0      1       0.80      0.2172   0.2136
Error              13       8.52                             1.55
Total              15     548.64
CHAPTER FOUR
THE INVESTIGATION OF EFFECTS ON INFERENCE
IN THE PRESENCE OF LIMITED RESPONSE ERROR
4.1
Introduction
Chapter Three was devoted to the examination of the three var-
iance estimation techniques for an age, race, and sex classification.
The conclusions based on that classification are limited in two important ways.
First, these are fairly concrete variables with only age pos-
sibly being subject to serious response error and the age groupings
selected would tend to minimize even this possibility.
The other limita-
tion is that poststratification was done on precisely these variables
and it is reasonable to suspect that this would minimize whatever effects
poststratification might have.
This suspicion is in part justified by
the fact that the levels chosen caused the poststratified estimates to
be linear sample statistics.
So, because of these two limitations, it
is worthwhile to repeat the sampling experiment on the same data but with
a different cross-classification.
It was observed in Chapter One that education and income variables are subject to response error in sample surveys.
This error may be either measurement error or pure response error or
some combination of the two.
For the H.I.S. data used in this experiment it is not possible
to distinguish among the various sources of error: pure response error,
sampling error, measurement error, and possibly interaction error.
However, by classifying on these variables, some additional (but unknown)
amount of response error should be introduced into the experiment.
In addition, it was felt to be useful to use at least one variable subject
to only sampling error.
The geographic location of the household serves this purpose.
Further, since the data are poststratified by age, race,
and sex, the effects of poststratification, if they are present, should be
highlighted in the comparison of the Taylor series approximation estimates
to the replicated ratio estimates.
The classification variables are defined in the experiment as
follows.
The income variable (I) refers to the household income and
has three levels, less than $5,000, $5,000 to $14,999, and $15,000 or
more.
The education variable (E) refers to the education of the head of
the household and has three levels, less than twelve years (<HS), twelve
years (HS), and more than twelve years (>HS).
The sampling error only
variable is termed residence (R) and is fixed by the sample selection
design prior to the interview.
Residence has two levels, SMSA and non-SMSA, where SMSA denotes a
standard metropolitan statistical area
according to the 1970 U. S. Bureau of the Census definition.
All other
terms are defined as in Chapter Three.
These definitions are brought into focus by considering Table
4.1 which shows the observed estimates of the physician visit to population ratios, denoted PV/P.
Here the first column indicates the R level,
the second column the E level, and columns 3, 4, and 5 refer to the I
level.
Several descriptive comments are in order.
The non-SMSA resi-
dents have fewer visits than the corresponding SMSA residents.
Persons
in either the lowest income households or the highest educational level
households appear generally to have the highest number of physician
visits.
Lastly, there may be an interaction between income and education.
The term interaction means that the effects of income and education
may not be strictly additive.
These descriptive comments are discussed more thoroughly in the section
on inference.
With these estimates
fixed throughout the experiment it is possible to proceed.

TABLE 4.1
RATIO ESTIMATES OF PHYSICIAN VISITS/POPULATION
CLASSIFIED BY INCOME, RESIDENCE, AND EDUCATION
Observed Estimates

             Education                  Household Income
             of Head of                                 15,000 and
Residence    Household     0-4,999     5,000-14,999        over
SMSA         <HS           6.1475        4.7348           4.8245
             HS            6.1736        4.9812           4.7031
             >HS           6.3065        6.0771           5.6562
Non-SMSA     <HS           5.0770        4.1442           4.4177
             HS            5.3602        4.3202           4.4929
             >HS           4.5846        5.0603           4.4798
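The notion of non-additivity can be illustrated with a rough, unweighted check on the SMSA panel of Table 4.1. The sketch below is only an illustration and is not the weighted least squares analysis used in this investigation: it removes the education (row) and income (column) means from the 3 x 3 table and reports what is left over. The function name `interaction_residuals` is hypothetical.

```python
import numpy as np

# PV/P ratios for SMSA residents, taken from Table 4.1.
# Rows: education <HS, HS, >HS.
# Columns: income 0-4,999; 5,000-14,999; 15,000 and over.
smsa = np.array([
    [6.1475, 4.7348, 4.8245],
    [6.1736, 4.9812, 4.7031],
    [6.3065, 6.0771, 5.6562],
])

def interaction_residuals(table):
    """Residuals from an unweighted additive (row + column) fit.

    Assumption of this sketch: every cell gets equal weight, which
    ignores the estimated covariance matrix used in the text.
    """
    grand = table.mean()
    row_eff = table.mean(axis=1, keepdims=True) - grand
    col_eff = table.mean(axis=0, keepdims=True) - grand
    return table - (grand + row_eff + col_eff)

res = interaction_residuals(smsa)
print(np.round(res, 3))
```

Residuals that are all near zero would point to additive effects; sizeable residuals are what the text calls interaction. Whether they are statistically significant is a separate question that requires the estimated variances.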
The investigation follows the plan established in Chapter Three.
First, estimates based on the Taylor series approximation are compared
to those based on the direct replication of the ratio estimate.
This is
to determine whether the consistency of BRR for nonlinear estimates
observed in Chapter Three is retained in the presence of limited
response error.
Next, the poststratified estimates are examined to
determine whether significant variance reduction is achieved.
Lastly, the effect of assuming zero interdomain covariance is examined.
This
effect is particularly pertinent to the inference question, but effects
related to bias and consistency of the variance estimates are also
examined.
Section 4.5 is concerned with inferences and substantive
conclusions for this cross-classification.
This will conclude the
empirical investigation of the effect of limited response error on different approaches to variance estimation and inference for complex sample surveys.
4.2
The Comparison of the Taylor Series Approximation to the BRR Procedure for Ratio Estimates
The consistency of the Taylor series approximation for variance
estimation is noted in Chapter Two.
This, in conjunction with the
unbiasedness of BRR for estimating the variance of linear sample statistics, permits the investigation of whether BRR consistently estimates
the variance of ratio statistics.
The technical aspects of this investigation are outlined in Section 3.5.
The resulting estimates of standard error are shown in Table 4.2.
The estimate comparisons are arrayed
row-wise with the replicated ratios (RR) appearing first, the Taylor
series approximation (TS) next, and their ratio appearing third for each
of the 18 domains.
The estimates based on the poststratified and the
nonpoststratified data sets are shown in the upper and lower blocks
respectively.
The classification yields a moderately wide range of
standard errors, from a minimum of 0.1285 to a maximum of 0.5819.
The
RR estimates range from 0.998 times the TS estimates to 1.034 times the
TS estimates.
For all but 2 of the 36 comparisons the difference is
less than one percent of the TS estimate.
These results confirm those found in Chapter Three.
Thus, it appears that, for this data set at
least, BRR produces consistent estimates of the standard error of ratio
statistics in the presence of limited response error.
This provides
evidence that the complicated computing procedure for the Taylor series
approximation is unnecessary, since equally consistent estimates of
the standard errors may be obtained through direct replication of the
ratio statistics.

TABLE 4.2
COMPARISON OF ESTIMATED STANDARD ERRORS
FOR REPLICATED RATIOS (RR) AND TAYLOR SERIES APPROXIMATION (TS),
INCOME BY RESIDENCE BY EDUCATION CLASSIFICATION
OF PHYSICIAN VISITS/POPULATION
(COVARIANCES ESTIMATED)

                                  Income and Residence Class
                         0-4,999          5,000-14,999      15,000 and over
Education  Estimate   SMSA   Non-SMSA   SMSA   Non-SMSA    SMSA   Non-SMSA

Poststratified
<HS        RR        0.1796   0.2618   0.1286   0.1541    0.2530   0.3748
           TS        0.1798   0.2607   0.1285   0.1535    0.2528   0.3755
           RR÷TS     0.9989   1.0042   1.0008   1.0039    1.0008   0.9981
HS         RR        0.4112   0.4378   0.1746   0.1935    0.1799   0.3279
           TS        0.4098   0.4380   0.1746   0.1934    0.1802   0.3248
           RR÷TS     1.0034   0.9995   1.0000   1.0005    0.9983   1.0095
>HS        RR        0.4925   0.5809   0.1868   0.2921    0.1628   0.3128
           TS        0.4885   0.5622   0.1861   0.2915    0.1630   0.3110
           RR÷TS     1.0082   1.0333   1.0038   1.0021    0.9988   1.0058

Nonpoststratified
<HS        RR        0.1779   0.2627   0.1292   0.1539    0.2542   0.3734
           TS        0.1779   0.2618   0.1290   0.1533    0.2541   0.3742
           RR÷TS     1.0000   1.0034   1.0016   1.0039    1.0004   0.9979
HS         RR        0.4090   0.4356   0.1738   0.1933    0.1799   0.3281
           TS        0.4078   0.4366   0.1738   0.1932    0.1802   0.3251
           RR÷TS     1.0029   0.9977   1.0000   1.0005    0.9983   1.0092
>HS        RR        0.4912   0.5819   0.1852   0.2921    0.1619   0.3125
           TS        0.4871   0.5630   0.1844   0.2911    0.1620   0.3107
           RR÷TS     1.0084   1.0336   1.0043   1.0034    0.9994   1.0058
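The summary figures quoted for Table 4.2 can be recomputed directly from the tabulated standard errors. The short sketch below is offered only as a check on the arithmetic; the variable names are assumptions of the sketch, and because the inputs are the four-decimal table entries, the recomputed ratios can differ from the printed ones in the last digit.

```python
# Standard errors from Table 4.2, listed row by row: education <HS, HS, >HS,
# with six income-by-residence columns per row (18 domains per list).
rr_ps = [0.1796, 0.2618, 0.1286, 0.1541, 0.2530, 0.3748,
         0.4112, 0.4378, 0.1746, 0.1935, 0.1799, 0.3279,
         0.4925, 0.5809, 0.1868, 0.2921, 0.1628, 0.3128]
ts_ps = [0.1798, 0.2607, 0.1285, 0.1535, 0.2528, 0.3755,
         0.4098, 0.4380, 0.1746, 0.1934, 0.1802, 0.3248,
         0.4885, 0.5622, 0.1861, 0.2915, 0.1630, 0.3110]
rr_nps = [0.1779, 0.2627, 0.1292, 0.1539, 0.2542, 0.3734,
          0.4090, 0.4356, 0.1738, 0.1933, 0.1799, 0.3281,
          0.4912, 0.5819, 0.1852, 0.2921, 0.1619, 0.3125]
ts_nps = [0.1779, 0.2618, 0.1290, 0.1533, 0.2541, 0.3742,
          0.4078, 0.4366, 0.1738, 0.1932, 0.1802, 0.3251,
          0.4871, 0.5630, 0.1844, 0.2911, 0.1620, 0.3107]

# RR / TS for all 36 comparisons (poststratified, then nonpoststratified).
ratios = [rr / ts for rr, ts in zip(rr_ps + rr_nps, ts_ps + ts_nps)]

smallest = round(min(ratios), 3)
largest = round(max(ratios), 3)
# Comparisons where RR differs from TS by one per cent or more.
n_large = sum(abs(r - 1.0) >= 0.01 for r in ratios)

print(smallest, largest, n_large)
```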
Given a consistent method of estimating standard errors, inference procedures are of interest to the survey data analyst.
The TS and RR approaches are compared in Table 4.3.
The hypothesis testing framework was discussed in Chapter Two.
The test statistics or Q statistics are based on a saturated model.
Various hypotheses of interest are
given in column one with the corresponding degrees of freedom shown in
column two.
Columns three and six show the Q's based on the RR estimates.
Columns five and eight show the ratio of the RR to TS Q statistics.
Columns three, four, and five are based on the poststratified
data set and columns six, seven, and eight are based on the
nonpoststratified data set.
The ratios in columns five and eight reveal no more
than a four per cent difference between the test statistics based upon
the TS and RR estimates of variance.
Therefore, one is led to conclude
that for hypothesis testing in this data set there is little difference
between the TS and RR approaches.
Again, these results confirm those of Chapter Three.
This will become a familiar litany in the course of
this investigation, but it should be noted that both poststratified and
nonpoststratified data sets have been examined.
This point leads to the
next comparison of the investigation.
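For a single linear contrast, a Q statistic of the kind tabulated here reduces to the squared contrast divided by its estimated variance. The sketch below assumes the usual Wald form Q = (Cb)'[C V(b) C']^{-1}(Cb) and, as a check, applies it to the b2 and b1 + 2b2 rows of the final model in Table 3.A.4, using only the legible (b1, b2) block of V(b). The function name `wald_q` is an assumption of this sketch.

```python
import numpy as np

def wald_q(C, b, V):
    """Wald statistic (Cb)' [C V C']^{-1} (Cb) for the hypothesis Cb = 0."""
    C = np.atleast_2d(np.asarray(C, dtype=float))
    b = np.asarray(b, dtype=float)
    V = np.asarray(V, dtype=float)
    est = C @ b                # estimated contrast(s)
    cov = C @ V @ C.T          # estimated covariance of the contrast(s)
    return float(est @ np.linalg.solve(cov, est))

# b1, b2 and their covariance block from Table 3.A.4 (V(b) is printed x 10^-3).
b = np.array([-0.8436, 0.5306])
V = np.array([[0.8792, -0.6118],
              [-0.6118, 11.775]]) * 1e-3

q_b2 = wald_q([0, 1], b, V)      # test of b2 = 0
q_trend = wald_q([1, 2], b, V)   # test of b1 + 2*b2 = 0

print(round(q_b2, 2), round(q_trend, 2))
```

Both values agree with the Q column of the Analysis of Variation table in Table 3.A.4 to the two decimals printed there.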
TABLE 4.3
COMPARISON OF TESTS OF HYPOTHESES,
REPLICATED RATIOS (RR) VERSUS TAYLOR SERIES APPROXIMATION (TS),
INCOME (I) BY RESIDENCE (R) BY EDUCATION (E) CLASSIFICATION
PHYSICIAN VISITS/POPULATION,
COVARIANCES ESTIMATED.
Q STATISTICS.

                                  Poststratified            Nonpoststratified
Hypothesis             d.f.     RR       TS     RR÷TS      RR       TS     RR÷TS
Interactions:
  I x R x E = 0          4     1.03     1.04    0.990     1.04     1.04    1.000
  I x E = 0              4    13.22    13.57    0.974    13.27    13.62    0.974
  I x R = 0              2     2.26     2.26    1.000     2.26     2.26    1.000
  E x R = 0              2     3.67     3.80    0.966     3.70     3.83    0.966
Main Effects:
  I = 0                  2    17.62    17.91    0.984    17.96    18.24    0.985
  E = 0                  2     6.89     7.15    0.964     6.88     7.14    0.964
  R = 0                  1    27.24    27.96    0.974    27.07    27.75    0.975
Trends:
  I1 - 2I2 + I3 = 0      1     5.51     5.54    0.995     5.52     5.54    0.996
  E within I = 0         3     2.76     2.78    0.993     2.73     2.75    0.993
Final Models:
  Model                  1   198.64   200.44    0.991   199.01   201.16    0.989
  Error                 16    18.21    18.04    1.009    18.14    17.98    1.009
  Total                 17   216.85   218.49    0.992   217.15   219.14    0.991

4.3
The Poststratification (PS) Adjustment of Survey Data
The question of whether poststratification of survey data for
representativeness has the effect of reducing standard error estimates
may be answered by considering Tables 4.4.1 and 4.4.2.
The first table
examines the issue for the individual domains for both RR and TS
estimates, while the second examines the effect on test statistics.
The zero covariance assumption is not examined.
As mentioned earlier, poststratification is used to account for
nonresponse and thus reduce standard errors.
It is felt that this reduction is large enough to risk the possible
biases that would be introduced if errors are present in the
poststratification variables.
Naturally, such errors would tend to inflate
the standard errors of the estimates and negate the advantages of the
poststratification.

TABLE 4.4.1
EFFECTS OF POSTSTRATIFICATION (PS) VERSUS NONPOSTSTRATIFICATION (NPS).
INCOME (I) BY RESIDENCE (R) BY EDUCATION (E) CLASSIFICATION.
PHYSICIAN VISITS/POPULATION.
COVARIANCES ESTIMATED.
Effects on Standard Errors: S.E. (PS) ÷ S.E. (NPS)

                          Income and Residence Class
                 0-4,999          5,000-14,999      15,000 and over
Education     SMSA   Non-SMSA   SMSA   Non-SMSA    SMSA   Non-SMSA

Replicated Ratios
<HS          1.0096   0.9966   0.9954   1.0013    0.9953   1.0037
HS           1.0054   1.0051   1.0046   1.0010    1.0000   0.9994
>HS          1.0026   0.9983   1.0086   1.0000    0.9994   1.0010

Taylor Series Approximation
<HS          1.0107   0.9958   0.9961   1.0013    0.9949   1.0035
HS           1.0049   1.0032   1.0046   1.0010    1.0000   0.9991
>HS          1.0026   0.9986   1.0092   1.0014    1.0062   1.0010
The effects of poststratification on the individual domain
ratio standard error estimates are negligible, as can be seen in Table
4.4.1.
Here the ratios of the poststratified to nonpoststratified
standard errors are presented.
These ratios are based on the estimates
shown in Table 4.2 and the appendix to this chapter.
These ratios show
that poststratification does not produce as much as a one per cent
reduction in the standard error estimates based on either the RR or TS
ratio estimates.
For the RR estimates in two domains no effect is
observed and in ten domains a less than one per cent standard error
increase is observed.
For the TS estimates in one domain there is no
effect and in twelve domains a one per cent or less standard error
inflation due to poststratification is observed.
Thus, it is reasonable
to conclude that poststratification is not having the desired effect of
variance reduction to a significant degree.
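The domain counts quoted above for the TS estimates can be reproduced from the Taylor series rows of Table 4.2. The sketch below is a check on the arithmetic only; the variable names are assumptions, and the ratios are formed from the four-decimal table entries rather than from the unrounded estimates.

```python
# Taylor series standard errors from Table 4.2, listed row by row
# (education <HS, HS, >HS; six income-by-residence columns per row).
ts_ps = [0.1798, 0.2607, 0.1285, 0.1535, 0.2528, 0.3755,
         0.4098, 0.4380, 0.1746, 0.1934, 0.1802, 0.3248,
         0.4885, 0.5622, 0.1861, 0.2915, 0.1630, 0.3110]
ts_nps = [0.1779, 0.2618, 0.1290, 0.1533, 0.2541, 0.3742,
          0.4078, 0.4366, 0.1738, 0.1932, 0.1802, 0.3251,
          0.4871, 0.5630, 0.1844, 0.2911, 0.1620, 0.3107]

# S.E.(PS) / S.E.(NPS) for each of the 18 domains.
ratios = [ps / nps for ps, nps in zip(ts_ps, ts_nps)]

increased = sum(r > 1 for r in ratios)   # poststratification inflated the S.E.
unchanged = sum(r == 1 for r in ratios)  # identical tabulated S.E.
decreased = sum(r < 1 for r in ratios)   # poststratification reduced the S.E.

print(increased, unchanged, decreased, round(min(ratios), 4))
```

The smallest ratio stays above 0.99, which is the sense in which poststratification "does not produce as much as a one per cent reduction" in any domain.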
The second effect of poststratification should be to increase
the power of tests of hypotheses. This issue is examined in Table 4.4.2.
Here the ratios of the poststratified Q statistics to nonpoststratified
Q statistics are presented, based on the estimates in Table 4.3 and the
appendix to this chapter.
If poststratification does increase the power,
the test statistics should increase relative to the Q's for the nonpoststratified data.
For the RR estimates the predicted increase is never
more than one per cent and occurs in only four of the twelve tests of
hypotheses.
This is also true for the TS estimates.
The most important hypothesis is that for total variation since it
corresponds to an overall measure of the variance inflation due to the
adjustment procedure.
Here the poststratified data set has a slightly smaller total
Q statistic which corresponds to a small, but detectable, overall
variance inflation due to the adjustment procedure.
These results for
hypothesis testing confirm the results for standard errors.
The conclusion is that the adjustment has very little effect and that which
occurs is not in the desired direction.
The next major question is the
effect of assuming interdomain covariances are zero.

TABLE 4.4.2
EFFECTS OF POSTSTRATIFICATION (PS) VERSUS NONPOSTSTRATIFICATION (NPS).
INCOME (I) BY RESIDENCE (R) BY EDUCATION (E) CLASSIFICATION.
PHYSICIAN VISITS/POPULATION.
COVARIANCES ESTIMATED.
REPLICATED RATIOS AND TAYLOR SERIES APPROXIMATION.
Effects on Test Statistics: Q(PS) ÷ Q(NPS)

Interactions
Estimate              I x R x E    I x E    I x R    E x R
Replicated Ratios       0.990      0.996    1.000    0.992
Taylor Series           1.000      0.996    1.000    0.992

Main Effects
Estimate                  I          E        R
Replicated Ratios       0.981      1.001    1.001
Taylor Series           0.982      1.001    1.008

Trends
Estimate              I1 - 2I2 + I3    E within I
Replicated Ratios        0.998           1.011
Taylor Series            1.000           1.011

Final Model
Estimate                Model      Error    Total
Replicated Ratios       0.998      1.004    0.999
Taylor Series           0.996      1.003    0.997
4.4
Covariance Effects
As noted in Chapter Three the presence of interdomain covar-
iance is particularly important for inference procedures in sample survey data.
For testing differences among domain estimates positive
covariances produce conservative test statistics, if the covariances
are assumed to be zero, and negative covariances produce anti-conservative test statistics.
Tables 4.5.1 and 4.5.2 show the correlation
matrices based on RR estimates for the PS and NPS data sets.
As can be
seen the NPS data set variance matrix has 70 negative covariances and
some correlations as negative as -0.278.
Similarly the PS data set
variance matrix contains 67 negative covariances and correlations as
negative as -0.292.
This suggests that the covariance effects may be
significant.
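The direction of these effects follows from the variance of a difference between two domain estimates. As a brief worked statement, with r_1 and r_2 denoting two domain ratio estimates (notation introduced here only for illustration):

```latex
\operatorname{Var}(r_1 - r_2)
  = \operatorname{Var}(r_1) + \operatorname{Var}(r_2)
    - 2\,\operatorname{Cov}(r_1, r_2),
\qquad
Q = \frac{(r_1 - r_2)^2}{\widehat{\operatorname{Var}}(r_1 - r_2)} .
```

If the true covariance is positive and is set to zero, the variance of the difference is overstated and Q is too small, which is the conservative case. If the true covariance is negative, setting it to zero understates the variance and Q is too large, which is the anti-conservative case.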
These effects on inferences are examined in Tables 4.6.1 through
4.6.4.
Final models for this cross-classification were constructed
based on the procedures for inference discussed in Chapter Two.
The
TABLE 4.5.1
CORRELATION MATRIX FOR INCOME BY RESIDENCE BY EDUCATION.
REPLICATED RATIOS. NONPOSTSTRATIFIED DATA.
Inco-.
0-4,999
i.ed'£oce
SKSA
UuC~t10D
SItU
I)-
-.999
"",,-SllSA
I
I
<US
1.000
HS
0.014
b.aoo-
!14.999
<ilS
(;.049
HS
0.055
>us
-0.099
<HS
0.124
<HS
US
>HS
0.062
-O.OSS
1.000
0.070
-0.065
0.121
>KS
<HS
US
<KS
>HS
HS
>KS
<HS
liS
I
>HS
•
1.000
0.068 -0.083
0.185
0.196
0.241
0.080
0.124
-0.103
-0.129 -0.076
0.258
.1.000
-0.108 : 1.000
-0.054 -0.253
0.045
I
0.345
1.000
II
-0.151
0.086
-0.156
-0.025
0.120
0.038
-0.135
-0.118
1.000
<HS
0.167
-0.070
-0.195'
0.077
0.043
-0.042
-0.027
-0.193
0.063
1.000
-0.114 -0.147
':'0.058
0.077
0.017
0.063
0.018
-0.084
0.055
0.311
1.000
>HS
-0.140
0.058
-0.035
0.003
-0.035
0.123
0.025
0.126
-0.019
0.086
-0.032
1.000
<HS
-0.058
0.154
-0.065
0.044
-0.138
0.044
0.204
0.257
-0.199
-0.064
-0.016
0.153
1.000
-0.108 -0.047
O.Oil
0.085
0.014
0.034 rO.219
-0.129
0.170
0.037
0.036
0.102
-0.111
1.000
-0.074
0.012
-0.132
0.211
1.000
>HS
-0.278
0.086
-0.253
0.026
-0.036
-0.037
and
o";er
<KS
-0.112
-0.166
0.006
-0.011
0.087
-0.012
I
KS
,
1.000
IS.DOO
Noo-SUSA
'1\S
Non-S:1SA
SMSA
Non-SHSA
>HS
HS
S~A
>HS
SMSA
1.000
0.119 -0.286
HS
!IoD-SHSA
HS
>HS
HS
5.'ISA
<HS
15,000 and ov"r
5,000-14,999
Non-SHSA
\
-0.060
-0.087
0.032
-0.014
0.046
0.192
0.005
0.035
0.085
0.083
-0.001
-0.069
0.002
1.000
0.074 -0.048
-0:054
0.002
-0.133
-0.050
0.044
-0.086
-0.039
1.000
0.139
0.105
0.185
-0.079
0.027
0.176
-0.065
-0.071
HS
0.056
0.022
0.162
-0.003
0.034
0.164
0.067
>HS
0.041
-0.071
-0.050
0.066
0.042
0.047
0.118
-0.076
-0.024
1.000
TABLE 4.5.2
CORRELATION MATRIX FOR INCOME BY RESIDENCE BY EDUCATION.
REPLICATED RATIOS. POSTSTRATIFIED DATA.
rnc~
lluLlence
<K5
I
liKSA
Non-SMSA
SMSA
UucaUoa
~m
5.000-14,999
0-4,999
<JIS
KS
<KS
>KS
SHSA
>KS
liS
<lIS>
liS
>I\S
<KS
!
15,000 and over
Non-SKSA
liS
Non-SKSA
SMSA
>KS
<KS
>1lS .
. liS
<liS
HS
1l.000
KS
0.030
>HS
I
l.000
/HS
0.097
-0.286
l.000
<KS
0.043
0.072
-0.056
,
I
j
I
lion-~SA
Sl'..5A
So.... ~
SliSA
115 ,000
0.078
-0.057
0.134
>I!S
-0.096
0.074
-0.087
0.185
0.200
l.000
0.114
-0.085
-0.108
l.000
0.2481-0.051
-0.240
0.044
0.348
1.000
0.124
0.043
-0.126
-0.113
1.000
0.055
-0.042
- O. 040 -0.191
0.073
l.000
0.116
0.250
liS
-0.138
-0.076
I
0.071
I
lio.... SllSA
I
II
!
0.094
-0.144
<KS
I 0.173
-0.058
liS
-0.105
-0.140
-~.".
i'·'"
0.082
0.Oi6
0.064
0.018
-0.080
0.064
0.312
l.000
>HS
-0.128
0.063
-0.038
0.002
-0.029
0.120
0.030
0.126
-0.016
0.091
-0.029
l.000
<HS
-0.067
0.155
- O. 063
0.043
-0.138
0.050
0.200
0.250 -0.197
-0.076
-0.02)
0.160
l.000
liS
1-0.128
-0.044
0.010
0.083
0.010
0.032
-0.230 -0.124
0.172
0.023
0.031
0.099
-0.100
>HS
-0.292
0.075
-0.252
0.026 -0.034
-0.033
-0.064
-0.086
0.030
-0.017
-0.086
0.017
-0.132
0.209
l.000
0.034
0.165
0.010
0.024
0.084
0.065
0.002
-0.066
0.004
0.075
<I!S
,
I
i
;-0.125
-0.058
-0,016
"'ad
lo\Oer
l.000
0.090
>H$
5,00014,999
,
I!S
<KS
,I
l.000
liS
>HS
:
,
I
:
l.000
,
0.006 -0.017
0.068 -0.015 ,
0.024
0.168
0.007
0.032
0.165
0.067
-0.042
-0.051
0.003
-0.136
-0.052
0.047
-0.087
-0.039
1.000
0.030 -0.077
-0.050
I 0.060
0.038
0.044
0.120 -0.080 -0.019
0.131
0.108
0.183
-0.081
0.021
0.175
-0.066
-0.078
-0.123 -0.171
0.052
l.000
i
.I
1.0001
TABLE 4.6.1
EFFECTS OF ZERO COVARIANCE ASSUMPTION ON FINAL MODELS.
NONPOSTSTRATIFIED REPLICATED RATIOS.
PHYSICIAN VISITS/POPULATION.
CLASSIFIED BY INCOME, RESIDENCE, AND EDUCATION.

Parameter Vector b and Estimated Covariance Matrix V(b)

              Covariances Estimated             Interdomain Covariances
                                                Assumed to be Zero

b    =   [ (illegible)  0.2771 ]'           [ (illegible)  0.2762 ]'

V(b) =   [  2.5355  -0.2259 ]               [ 2.6564  0.0856 ]
         [ -0.2259   0.3859 ] x 10^-3       [ 0.0856  0.4987 ] x 10^-3

Tables of Variation

              Covariances Estimated             Covariances Assumed Zero
Source   d.f.       Q      % Total                  Q      % Total
Model      1     199.01     91.65                152.94     91.84
Error     16      18.14      8.35                 13.58      8.16
Total     17     217.15    100.00                166.52    100.00

Ratios: (Covariance Estimated) ÷ (Covariance = 0)

Parameters:            [ (illegible)  1.003 ]'
Covariance Matrix:     [  0.954  -2.639 ]
                       [ -2.639   0.774 ]
Tables of Variation:   Model 1.301    Error 1.336    Total 1.304
TABLE 4.6.2
EFFECTS OF ZERO COVARIANCE ASSUMPTION ON FINAL MODELS.
POSTSTRATIFIED REPLICATED RATIOS.
PHYSICIAN VISITS/POPULATION.
CLASSIFIED BY INCOME, RESIDENCE, AND EDUCATION.

Parameter Vector b and Estimated Covariance Matrix V(b)

              Covariances Estimated             Interdomain Covariances
                                                Assumed to be Zero

b    =   [ (illegible)  0.2777 ]'           [ (illegible)  0.2760 ]'

V(b) =   [  2.5259  -0.2070 ]               [ 2.6685  0.0926 ]
         [ -0.2070   0.3882 ] x 10^-3       [ 0.0926  0.5031 ] x 10^-3

Tables of Variation

              Covariances Estimated             Covariances Assumed Zero
Source   d.f.       Q      % Total                  Q      % Total
Model      1     198.64     91.60                151.36     91.82
Error     16      18.21      8.40                 13.49      8.18
Total     17     216.85    100.00                164.85    100.00

Ratios: (Covariance Estimated) ÷ (Covariance = 0)

Parameters:            [ (illegible)  1.006 ]'
Covariance Matrix:     [  0.947  -2.235 ]
                       [ -2.235   0.772 ]
Tables of Variation:   Model 1.312    Error 1.350    Total 1.315
TABLE 4.6.3
EFFECTS OF ZERO COVARIANCE ASSUMPTION ON FINAL MODELS.
NONPOSTSTRATIFIED TAYLOR SERIES APPROXIMATION.
PHYSICIAN VISITS/POPULATION.
CLASSIFIED BY INCOME, RESIDENCE, AND EDUCATION.

Parameter Vector b and Estimated Covariance Matrix V(b)

              Covariances Estimated             Interdomain Covariances
                                                Assumed to be Zero

b    =   [ (illegible)  0.2776 ]'           [ (illegible)  0.2761 ]'

V(b) =   [  2.5505  -0.2283 ]               [ 2.6464  0.0857 ]
         [ -0.2283   0.3830 ] x 10^-3       [ 0.0857  0.4965 ] x 10^-3

Tables of Variation

              Covariances Estimated             Covariances Assumed Zero
Source   d.f.       Q      % Total                  Q      % Total
Model      1     201.16     91.80                153.59     91.78
Error     16      17.98      8.20                 13.75      8.22
Total     17     219.14    100.00                167.34    100.00

Ratios: (Covariance Estimated) ÷ (Covariance = 0)

Parameters:            [ (illegible)  1.005 ]'
Covariance Matrix:     [  0.964  -2.664 ]
                       [ -2.664   0.771 ]
Tables of Variation:   Model 1.310    Error 1.308    Total 1.310
TABLE 4.6.4
EFFECTS OF ZERO COVARIANCE ASSUMPTION ON FINAL MODELS.
POSTSTRATIFIED TAYLOR SERIES APPROXIMATION.
PHYSICIAN VISITS/POPULATION.
CLASSIFIED BY INCOME, RESIDENCE, AND EDUCATION.
                                               Interdomain Covariances
      Covariances Estimated                      Assumed to be Zero

Parameter Vector b and Estimated Covariance Matrix V(b)

b    = [ 5.0912 ]                          b    = [ 5.0746 ]
       [ 0.2781 ]                                 [ 0.2759 ]

V(b) = [  2.5368  -0.2079 ] x 10^-3        V(b) = [ 2.6606  0.0934 ] x 10^-3
       [ -0.2079   0.3858 ]                       [ 0.0934  0.5014 ]

Tables of Variation

Source   d.f.      Q     % Total           Source   d.f.      Q     % Total
Model      1    200.44    91.74            Model      1    151.83    91.75
Error     16     18.04     8.26            Error     16     13.65     8.25
Total     17    218.49   100.00            Total     17    165.49   100.00

Ratios: (Covariance Estimated) / (Covariance = 0)

Parameters:            [ 1.003 ]
                       [ 1.008 ]

Covariance Matrix:     [  0.953  -2.226 ]
                       [ -2.226   0.769 ]

Tables of Variation:   Model   1.320
                       Error   1.322
                       Total   1.320
implications of these models are discussed in Section 4.5, so it is
sufficient to say that they represent a best fit to the observed data in
that all statistically significant differences among the domains are
included in the model. Here, attention is restricted to the examination
of parameter and variance matrix estimates and of test statistics.
The tables are for the NPS-RR, PS-RR, NPS-TS, and PS-TS estimates,
respectively, but by the discussion in Sections 4.2 and 4.3 it is
sufficient to examine only the first of these, the nonpoststratified
replicated ratio estimates in Table 4.6.1. The first two blocks show
the estimates and test statistics, respectively, with the estimates
and statistics based on the zero covariance assumption on the right.
The third block of Table 4.6.1 shows the relevant comparisons. Tables
4.6.2 through 4.6.4 are similarly organized.
The assumption of zero covariance has two important effects,
bias and variance inflation. These two effects cause a corresponding
loss of power in the test statistics. The ratio of the parameter
estimates shows that the bias effect is not particularly important, less
than one per cent. However, there is a five per cent inflation in the
variance estimate for the overall mean parameter estimate and a more
than 22 per cent inflation of the variance estimate for the regression
parameter estimate. Further, the covariance estimate reverses in sign
and is more than 50 per cent too small in magnitude. This last point is
particularly important for estimating the variance of predicted values.
This variance inflation is reflected in the more than 30 per cent
reduction in the test statistics, Q. The remaining methods of generating
variance matrix estimates confirm the results.
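
The comparisons in the third block of Table 4.6.1 can be recomputed directly from the first two blocks. The following is a minimal sketch, not part of the original analysis; the names `est`, `zero`, and `elementwise_ratio` are introduced here for illustration, and the numbers are transcribed from Table 4.6.1 with the covariance entries in units of 10^-3.

```python
# Values transcribed from Table 4.6.1 (nonpoststratified replicated ratios).
# Covariance matrix entries are in units of 10^-3; the scale cancels in ratios.
est = {
    "b": [5.0956, 0.2771],
    "V": [[2.5355, -0.2259], [-0.2259, 0.3859]],
    "Q": {"Model": 199.01, "Error": 18.14, "Total": 217.15},
}
zero = {
    "b": [5.0756, 0.2762],
    "V": [[2.6564, 0.0856], [0.0856, 0.4987]],
    "Q": {"Model": 152.94, "Error": 13.58, "Total": 166.52},
}


def elementwise_ratio(a, b):
    """Divide a by b entry by entry; a and b are numbers or nested lists."""
    if isinstance(a, list):
        return [elementwise_ratio(x, y) for x, y in zip(a, b)]
    return a / b


b_ratio = elementwise_ratio(est["b"], zero["b"])
v_ratio = elementwise_ratio(est["V"], zero["V"])
q_ratio = {src: est["Q"][src] / zero["Q"][src] for src in est["Q"]}

print([round(r, 3) for r in b_ratio])
print([[round(r, 3) for r in row] for row in v_ratio])
print({src: round(r, 3) for src, r in q_ratio.items()})
```

A ratio below one in the covariance matrix block indicates a variance that is larger under the zero covariance assumption, and a negative ratio marks the sign reversal of the covariance term.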
The assumption of zero covariance has had two effects in this
cross-classification of the data. There is an overall inflation of
predicted value variance estimates and a corresponding deflation of test
statistics. Thus, the tests become conservative and power is lost.
Further, the negative covariances, while present, do not appear to be
the dominant factor in this data set. With the effect of the covariances
recognized and accounted for, the data set may now be investigated for
substantive conclusions.
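
To see how the covariance term enters the variance of a predicted value, the sketch below evaluates Var(b0 + c b1) = V11 + 2c V12 + c^2 V22 under both covariance matrices of Table 4.6.1. It is an illustration only; the function name `predicted_se` and the choice of coefficients c = 3, 1, -1, -3 (the values used in the final model) are assumptions of the sketch.

```python
import math


def predicted_se(V, c, scale=1e-3):
    """S.E. of b0 + c*b1 for a 2x2 covariance matrix V given in units of `scale`."""
    variance = V[0][0] + 2.0 * c * V[0][1] + c * c * V[1][1]
    return math.sqrt(variance * scale)


# Covariance matrices from Table 4.6.1, in units of 10^-3.
V_est = [[2.5355, -0.2259], [-0.2259, 0.3859]]   # covariances estimated
V_zero = [[2.6564, 0.0856], [0.0856, 0.4987]]    # interdomain covariances = 0

comparison = {}
for c in (3, 1, -1, -3):
    se_est = predicted_se(V_est, c)
    se_zero = predicted_se(V_zero, c)
    comparison[c] = (se_est, se_zero, (se_zero / se_est) ** 2)
    print(c, round(se_est, 4), round(se_zero, 4), round(comparison[c][2], 2))
```

Because the cross term 2c V12 changes sign with c, the size of the effect on a predicted value depends on the coefficient as well as on the sign of the covariance estimate.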
4.5  Inference Structure and Substantive Conclusions
The investigation of the various estimation procedures is now
complete and it is appropriate to select a procedure to investigate the
data for substantive analysis. It is clear that the TS and RR
approaches are equivalent and poststratification produces few gains.
Further, the interdomain covariances are important. Therefore, the RR
procedure is used with NPS data and all covariances are estimated.
This analysis is presented in detail in Table 4.A.7 in the appendix to
this chapter. The reduction to a final model is the same as in Chapter
Three.
The first step in the model reduction was to select a saturated
model and test various hypotheses.
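The design matrix, X, and parameter vector, $\boldsymbol{\beta}$, are defined by,

\[
X_S = \begin{bmatrix} X_1 & 0 & 0 \\ 0 & X_1 & 0 \\ 0 & 0 & X_1 \end{bmatrix},
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \\ \boldsymbol{\beta}_3 \end{bmatrix},
\qquad
X_1 = \begin{bmatrix}
1 &  1 &  1 &  0 &  1 &  0 \\
1 &  1 &  0 &  1 &  0 &  1 \\
1 &  1 & -1 & -1 & -1 & -1 \\
1 & -1 &  1 &  0 & -1 &  0 \\
1 & -1 &  0 &  1 &  0 & -1 \\
1 & -1 & -1 & -1 &  1 &  1
\end{bmatrix},
\]

where the $\boldsymbol{\beta}_i$, $i = 1, 2, 3$, are the parameters at the i-th level of income.
Then $\boldsymbol{\beta}_i' = (\mu_i, R_i, E_{i1}, E_{i2}, (RE)_{i1}, (RE)_{i2})$ is the parameter vector with the
following interpretation,

    $\mu_i$                     = average at i-th level of income
    $R_i$                       = residence effect at i-th level of income
    $E_{i1}, E_{i2}$            = the education effects at the i-th level of income
    $(RE)_{i1}, (RE)_{i2}$      = the residence by education interaction
                                  effects at the i-th level of income.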
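
The structure of the saturated model can be checked numerically. The sketch below assumes that the saturated design matrix is block diagonal with one copy of the 6 x 6 matrix X_1 for each of the three income levels, which is the reading adopted here of the printed display; the helper names `block_diag` and `rank` are introduced only for illustration.

```python
# Within-income design matrix X_1: columns are mean, residence, two education
# effects, and two residence-by-education interactions (effect coding).
X1 = [
    [1,  1,  1,  0,  1,  0],
    [1,  1,  0,  1,  0,  1],
    [1,  1, -1, -1, -1, -1],
    [1, -1,  1,  0, -1,  0],
    [1, -1,  0,  1,  0, -1],
    [1, -1, -1, -1,  1,  1],
]


def block_diag(block, copies):
    """Place `copies` copies of `block` along the diagonal of a zero matrix."""
    n, m = len(block), len(block[0])
    out = [[0] * (m * copies) for _ in range(n * copies)]
    for k in range(copies):
        for i in range(n):
            for j in range(m):
                out[k * n + i][k * m + j] = block[i][j]
    return out


def rank(matrix, tol=1e-9):
    """Rank by Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in r] for r in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            for j in range(col, n_cols):
                rows[i][j] -= factor * rows[r][j]
        r += 1
    return r


XS = block_diag(X1, 3)   # assumed saturated design: 18 domains x 18 parameters
print(len(XS), len(XS[0]), rank(X1), rank(XS))
```

Under this assumption the saturated model has as many parameters as domains and is of full rank, so it reproduces the 18 observed ratios exactly and leaves no degrees of freedom for error.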
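This parameterization takes account of the possible income by education
interaction noted in Section 4.1. The various tests of hypotheses in
Section 4.2 show that this is in fact the only important interaction.
This yields the following reduced model design matrix and parameter
vector,

\[
X = \begin{bmatrix}
1 &  1 &  1 &  1 & 0 \\
1 &  1 &  1 &  1 & 0 \\
1 &  1 &  1 &  1 & 0 \\
1 &  1 &  1 & -1 & 0 \\
1 &  1 &  1 & -1 & 0 \\
1 &  1 &  1 & -1 & 0 \\
1 & -1 &  0 &  1 & 0 \\
1 & -1 &  0 &  1 & 0 \\
1 & -1 &  0 &  1 & 1 \\
1 & -1 &  0 & -1 & 0 \\
1 & -1 &  0 & -1 & 0 \\
1 & -1 &  0 & -1 & 1 \\
1 & -1 & -1 &  1 & 0 \\
1 & -1 & -1 &  1 & 0 \\
1 & -1 & -1 &  1 & 1 \\
1 & -1 & -1 & -1 & 0 \\
1 & -1 & -1 & -1 & 0 \\
1 & -1 & -1 & -1 & 0
\end{bmatrix},
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \mu \\ I_1 \\ I_2 \\ R \\ (IE) \end{bmatrix}
\begin{matrix} \text{Mean} \\ \text{Income Effects} \\ \\ \text{Residence} \\ \text{Interaction} \end{matrix}
\]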
Here, the second income effect was found to be zero. Further, the
various parameters could be summed to yield the final model which is
displayed in Table 4.7 in the block labeled coefficient of the regression
parameter. The goodness of fit statistic is Q = 13.75, which has an
approximate chi-squared distribution with 16 d.o.f. This is clearly
non-significant. Following the discussion of Chapter Two, the predicted
values and estimated standard errors were computed and are displayed in
Table 4.7.
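
The non-significance of the goodness of fit statistic can be verified without tables. For an even number of degrees of freedom the upper tail of the chi-squared distribution has a closed form, exp(-x/2) times the sum over k = 0, ..., d/2 - 1 of (x/2)^k / k!. The sketch below applies it to Q = 13.75 with 16 d.o.f.; the function name `chi2_sf_even_df` is introduced here for illustration.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-squared_df > x); valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("the closed form used here needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(13.75, 16)
print(round(p_value, 2))
```

The resulting tail probability is well above 0.05, in agreement with the statement that the residual variation is clearly non-significant.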
These predicted values show only those differences which are
statistically significant at the 0.05 level. They imply several
important conclusions. Non-SMSA persons have fewer PV's than SMSA
persons. SMSA persons in the lowest income levels have the highest
number of PV's along with SMSA persons in households where the head of
the household has a greater than high school education. For persons in
households with $5,000 or more income and where the head of the
household has a greater than high school education the two variables
raise the number of physician visits to the highest level of the
residence class. The
TABLE 4.7
PREDICTED VALUES (ESTIMATED STANDARD ERRORS IN PARENTHESES)
FOR RATIO ESTIMATES OF PHYSICIAN VISITS/POPULATION
BASED ON REPLICATED RATIOS
WITH INCOME, RESIDENCE, AND EDUCATION CLASSIFICATION.
ALL COVARIANCES ARE ESTIMATED.
NONPOSTSTRATIFIED DATA.
Inference Table

                                   Household Income
Education          <5,000             5,000-15,000             15,000+
of Head of                            Residence
Household      SMSA   Non-SMSA      SMSA   Non-SMSA       SMSA   Non-SMSA

Predicted Values

<HS            5.93     5.37        4.82     4.26         4.82     4.26
              (0.07)   (0.05)      (0.06)   (0.09)       (0.06)   (0.09)

HS             5.93     5.37        4.82     4.26         4.82     4.26
              (0.07)   (0.05)      (0.06)   (0.09)       (0.06)   (0.09)

>HS            5.93     5.37        5.93     5.37         5.93     4.26
              (0.07)   (0.05)      (0.07)   (0.05)       (0.07)   (0.09)

Coefficient of Regression Parameter

<HS             3.0      1.0        -1.0     -3.0         -1.0     -3.0
HS              3.0      1.0        -1.0     -3.0         -1.0     -3.0
>HS             3.0      1.0         3.0      1.0          3.0     -3.0
exception is in the non-SMSA households with $15,000 or more income.
This exception, along with a parallel effect in the poorest households,
corresponds to the descriptively evident interaction noted in Section
4.1. Thus, the GSK inference procedure has compactly described the 18
sampling domains with only four distinct subpopulations. These
correspond to the two residence classes, two income classes (less than
$5,000 or $5,000 and more), and an interaction term for education (no
more than a high school degree or more than high school for households
with $5,000 income or more). Thus, a small number of substantively
suggestive parameters characterize this cross-classification.
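
The entries of Table 4.7 follow from the two estimated parameters and their covariance matrix. As a sketch, assuming the estimates b = (5.0956, 0.2771) and V(b) reported for the nonpoststratified replicated ratios with all covariances estimated, each predicted value is b0 + c b1 and its standard error is the square root of V11 + 2c V12 + c^2 V22, where c is the coefficient of the regression parameter for the domain. The names `coef`, `predict`, and `table` are introduced here for illustration.

```python
import math

b = (5.0956, 0.2771)
V = ((2.5355e-3, -0.2259e-3), (-0.2259e-3, 0.3859e-3))

# Coefficient of the regression parameter by education of head of household.
# Column order: (<5,000 SMSA, <5,000 Non-SMSA, 5,000-15,000 SMSA,
#                5,000-15,000 Non-SMSA, 15,000+ SMSA, 15,000+ Non-SMSA).
coef = {
    "<HS": [3, 1, -1, -3, -1, -3],
    "HS":  [3, 1, -1, -3, -1, -3],
    ">HS": [3, 1, 3, 1, 3, -3],
}


def predict(c):
    """Return (predicted value, standard error) for coefficient c."""
    value = b[0] + c * b[1]
    variance = V[0][0] + 2 * c * V[0][1] + c * c * V[1][1]
    return value, math.sqrt(variance)


table = {
    edu: [tuple(round(v, 2) for v in predict(c)) for c in row]
    for edu, row in coef.items()
}
for edu, row in table.items():
    print(edu, row)
```

Only four distinct coefficients occur, so the 18 domains collapse to the four predicted values discussed above.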
4.6  Summary of Results
This chapter has been devoted to testing the conclusions of
Chapter Three in the presence of limited response error. It was
observed that the particular source of this error could not be
identified, but that previous work in this area suggested the likelihood
of its presence for the income and education variables. Three hypotheses
were examined: that BRR is consistent for ratio variance estimation,
that poststratification reduces variance, and that the interdomain
covariances may be assumed to be zero. Each of these hypotheses was
evaluated in terms of its effects on GSK weighted least squares
inference procedures.
The results confirmed those of Chapter Three for these data.
The comparison of the Taylor series approximation, which yields
consistent estimates of V(r), to the RR estimate shows the difference to
be virtually negligible. Thus, the RR estimate appears to be consistent
also. The poststratification adjustment fared less well. As in Chapter
Three it had little or no effect. However, what effect was detected was
not usually in the desired direction. This suggests that the
poststratification adjustment in the presence of limited response
error may not be worthwhile and may lead to variance inflation rather
than reduction. The investigation of the covariance assumption indicated
that the interdomain covariances were quite important in the inference
procedure and could not be assumed to be zero.
Based on these results, the data were then compactly described
using the GSK inference procedure. This completes the empirical
investigation of inference procedures for complex sample survey data.
The next chapter is devoted to various related analytical results.
4.7  Appendix of Detailed Tables
The tables for this analysis are presented on the following
pages according to the arrangement in the appendix in Chapter Three.
A detailed list of the tables is found in Table 4.8. The last two
tables, 4.A.9 and 4.A.10, along with 3.A.7, are included for the reader
interested in the assumption of zero numerator-denominator covariance.
These confirm the results of Frankel (1971), who observed a 50 per cent
error introduced by this assumption in Taylor series approximation.
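
When reading the detailed tables, it may help to note that each single-degree-of-freedom line lists Q together with a contrast and its S.E., and that the listed values are consistent with Q = (contrast / S.E.)^2 up to rounding. The sketch below checks this for four lines transcribed from Table 4.A.7; the function name `wald_q` is introduced here for illustration.

```python
# (reported Q, contrast, S.E.) for four 1-d.f. lines of Table 4.A.7.
rows = [
    (6.54, 0.4508, 0.1762),
    (13.75, 0.7220, 0.1947),
    (27.07, 1.2780, 0.2456),
    (5.52, 0.5980, 0.2546),
]


def wald_q(contrast, se):
    """Squared ratio of a contrast to its standard error."""
    return (contrast / se) ** 2


recomputed = [wald_q(c, se) for _, c, se in rows]
for (q, _, _), q_new in zip(rows, recomputed):
    print(q, round(q_new, 2))
```

Small discrepancies in the second decimal place are to be expected, since the contrasts and standard errors are themselves printed to four places.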
TABLE 4.8
DETAILED TABLES IN THE APPENDIX
            Postsampling                          Domain Covariance
Table       Adjustment           Estimator       Between      Within
4.A.1       Poststratified          TS             ≠ 0          ≠ 0
4.A.2       Poststratified          TS             = 0          ≠ 0
4.A.3       Poststratified          RR             ≠ 0          na
4.A.4       Poststratified          RR             = 0          na
4.A.5       Nonpoststratified       TS             ≠ 0          ≠ 0
4.A.6       Nonpoststratified       TS             = 0          ≠ 0
4.A.7       Nonpoststratified       RR             ≠ 0          na
4.A.8       Nonpoststratified       RR             = 0          na
4.A.9       Poststratified          TS             = 0          = 0
4.A.10      Nonpoststratified       TS             = 0          = 0
TABLE 4.A.1
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES APPROXIMATION.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
POSTSTRATIFIED.
COVARIANCES ESTIMATED.
Observed Estimates of Standard Error

Residence-                            Income
Education Class      0-4,999     5,000-14,999     15,000 and over
SMSA       <HS        0.1798        0.1285            0.2528
           HS         0.4098        0.1746            0.1802
           >HS        0.4885        0.1861            0.1630
Non SMSA   <HS        0.2607        0.1535            0.3755
           HS         0.4380        0.1934            0.3248
           >HS        0.5622        0.2915            0.3110

Observed Estimates of S.E. for Parameters

Income                 μ        R1       E1       E2      (RE)1    (RE)2
0-4,999             0.1736   0.1768   0.1754   0.2397   0.1733   0.2442
5,000-14,999        0.0843   0.0830   0.0871   0.0986   0.0841   0.1081
15,000 and over     0.1061   0.1096   0.1856   0.1643   0.1733   0.1511
Table 4.A.1 (con't.)
Tests of Hypotheses:
d.t.
Interactions:
1.04
= 0
1
0.01
0.0170
0.2039
= 0
1
0.31
-0.1467
0.2621
= 0
1
0.01
0.0298
0.2516
=0
1
0.00
0.0004
0.3060
4
13.57
0
1
6.67
0.4508
0.1746
0
1
1.9Z
0.3942
0.2842
0
1
0.33
0.1453
0.2534
0
1
1.15
0.3230
0.3017
2
2.26
1
1.53
0.2229
0.1802
1
Z.23
0.3021
0.20Z2
2
3.80
=0
1
0.93
-0.2440
0.2528
0
1
2.23
-0.4356
0.2916
I E - I E =
Z 1
1 1
I E - 1 E =
1 2
2 Z
I E - 1 E =
l 1
3 1
I E - 1 E =
1 2
3 Z
Income x Residence
I R - 1 R =0
1 l
2 1
l R - 1 R =0
1 1
3 3
Education x Residence
RE
2
S.E.
4
I RE - 1 RE
2 1
1 1
I RE - I RE
2 2
1 2
I RE 1 - 1 3RE
1
3
I RE - 1 RE
3 3
1 2
Income x Education
1
Contrast
12
Income x Education x Residence
RE
Q
c
Main Effects:
5
Income
2
17.91
lJ 1 - lJ 2 = 0
lJ 2 - lJ 3 = 0
Education
1
13.80
0.7220
0.1943
1
1.04
0.1239
0.1218
2
7.15
E = 0
1
4.34
-0.5840
0.2804
E = 0
2
Residence
1
0.77
-0.2413
0.2745
1
27.96
1.2779
0.2417
17
218.49
1
5.54
0.5980
0.2542
3
2.78
1
0.22
-0.1546
0.3273
1
2.36
-0.2112
0.1374
1
0.01
0.0231
0.3093
1
Total
Trends:
lJ 1 - 2lJ 2 + lJ 3
Education with Income
Income
1 :
1
1 :
2
1 :
3
E1 - E2 • 0
E - E c 0
1
2
E - E = 0
1
2
cO
104
Table 4.A.1 (con't.)
Final Model (Based on Table 4.A.7):
Design Matrix
X' •
b
-
~
~3
1
3
1 1 1 1 1 1 1 1 1 1 1
1 1 1 -1 -1 3 -3 -3 1 -1 -1
Estimated Model Parameters
='5.09121
V(b)
- -
~278!J
= r;2.S368
l:0.2079
1 1 l~
3 -3 -3~
-0.20791 x 10- 3
0.38S~
Smoothed or Fitted Table
Residence-Education Class
iFami1y
Income
-.
SMSA
HS
<HS
0-4,99(
>HS
<HS
Non-SMSA
HS
>HS
5.3693
5.3693
5.3693
5.925~
5.9255
Fitted
5.9255
(Obs. S.E.) (0.1798) (0.4098) (0.4885) (0.2607) (0.4380) (0.5622)
0.0501
0.0501
0.0501
0.0690
0.0690
Fitted S.E. 0.0690
(0.2221) (0.2482) (0.3811) (-0.2922)(-0.0091)(-0.7846)
(Res. )
4.2569
5,0005.3693
4.2569
5.9255
4.8131 . 4.8131
Fitted
14,99C; (Obs. S. E.) (0.1285) (0.1746) (0.1861) (0.1535) (0.1934) (0.2915)
0.0852
0.0501
0.0852
0.0690
Fitted S.E. 0.0578 . 0.0578
(0.0634)(-0.3090)
(-0.1127)
(0.1516)
(-0.0783)
(0.1681)
(Res.)
4.2569
15,000
4.2569
4.2569
5.9255
4.8131
4.8131
Fitted
and Uf (Obs. S.E.) (0.2528) (0.1802) (0.1630) (0.3755) (0.3248) (0.3110)
0.0852
0.0852
0.0852
0.0690
0.0578
Fitted S.E. 0.0578
(0.2229)
(0.2360)
(0.1608)
(0.0114)(-0.1100)(-0.2692)
(Res.)
Analysis of Variation Table

Source   d.f.      Q      Contrast     S.E.     % Total Q
Model      1    200.44     0.2781     0.0196      91.74
Error     16     18.04                             8.26
Total     17    218.49
TABLE 4.A.2
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES APPROXIMATION.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
POSTSTRATIFIED.
INTERDOMAIN COVARIANCES ASSUMED ZERO.
Observed Estimates of Standard Error

Residence-                            Income
Education Class      0-4,999     5,000-14,999     15,000 and over
SMSA       <HS        0.1798        0.1285            0.2528
           HS         0.4098        0.1746            0.1802
           >HS        0.4885        0.1861            0.1630
Non SMSA   <HS        0.2607        0.1535            0.3755
           HS         0.4380        0.1934            0.3248
           >HS        0.5622        0.2915            0.3110

Observed Estimates of S.E. for Parameters

Income                 μ        R1       E1       E2      (RE)1    (RE)2
0-4,999             0.1679   0.1679   0.1912   0.2412   0.1912   0.2412
5,000-14,999        0.0795   0.0795   0.0983   0.1094   0.0983   0.1094
15,000 and over     0.1138   0.1138   0.1733   0.1564   0.1733   0.1564

Table 4.A.2 (con't.)
Tests of Hypotheses:
d. f.
Q
Contrast
S.E.
lZ
Interactions:
Income x Education x Residence
4
0.96
=0
=0
=0
1
0.01
0.0170
0.2150
1
0.31
-0.1467
0.Z649
1
0.01
0.0298
-0.2580
0
1
0.00
-0.0004
0.Z874
4
10.09
= 0
1
4.40
0.4508
0.2150
=0
1 E =0
3 1
1 E =0
3 Z
1
2.2Z
0.3942
0.2649
1
0.32
0.1453
0.2580
1
1.26
0.3230
0.2874
2
2.23
1
1.44
0.2Z29
0.1858
1
2.22
0.3021
0.2028
Z
4.08
REI .. 0
1
0.78
-0.2440
0.2761
RE
1
2.01
-0.4356
0.3076
I RE
1
- I RE
Z 1
1 1 RE Z - IZRE Z
I RE - 1 RE
1
1
1
3
3
I RE - 1 RE
1 2
3 3
Income x Education
I E - I E
Z1
1 1
I 1EZ - IZE Z
I E 1 1
I E 1 Z
=:
Income x Residence
I R - I R .. 0
1 1
Z 2
I R - 1 R =0
1 1
3 3
Education x Residence
2
0
I:
Main Effects:
5
Income
III - ll2
llZ - ll3
2
18.73
=0
1
15.10
0.7220
0.1858
0
1
0.80
0.1239
0.1388
2
6.99
1
4.47
-0.5840
0.Z761
1
0.62
-0.2413
0.3076
1
34.41
1.Z779
0.2178
17
165.49
1
5.38
0.5980
0.2577
3
1.87
0
1
0.21
-0.1546
0.3392
0
0
1
1.65
0.1643
1
0.01
-0.211Z
0.OZ31
=:
Education
E .. 0
1
E2 • 0
Residence
Total
Trends:
III - Zll2 + ll3
Income
Education within Income
II:
E - E2
1
1 :
Z
1 :
3
E - E
1
2
E - E
1
2
a
a
0
a
0
0.Z928
107
Table 4.A.2. (con't.)
Final Model (Based on Table 4.A.7):
Design Matrix
X'
= ~
~
1
3
1 1 1 1 1 1 1 1 1 1 1
1 1 1 -1 -1 3 -3 -3 1 -1 -1
Estimated Model Parameters
b = js:07461
~275~
V(b)
- -
=
~606
~0934
1 1 1~
3 -3 -~
0.09341 x 10- 3
0.501~1
Smoothed or Fitted Table
Residence-Education Class
bamily
ncome
SMSA
HS
<HS
0-4,999
>HS
<HS
Non-SMSA
HS
>HS
5.3505
5.3505
5.3505
Fitted
5.9023
5.9023
5.9023
(Obs. S.E.) (0.1798) (0.4098) (0.4885) (0.2607) (0.4380) (0.5622)
0.0579
0.0579
0.0579
"'itted S.E. 0.0879
0.0879
0.0879
(0.2452) (0.2713) (0.4042) (-0.2735) (0.0096)(-0.7659)
(Res.)
4.2469
4.2469
5.3505
5,0004.7987
4.7987
Fitted
5.9023
14,999 (Obs. S.E.) (0.1285) (0.1746) (0.1861) (0.1535) (0.1934) (0.2915)
0.0813
0.0813 '0.0579
0.0545
0.0879
Fitted S.E. 0.0545
(Res.) (-0.0640) (0.1824) (0.1748) (-0.1027) (0.0733)(-0.2902)
4.2469
4.2469
4.7987
5.9023
4.2469
15,000
Fitted
4.7987
(0.3110)
(0.3248)
(0.1630)
(0.3755)
and up (Obs. S. E.) (0.2528) (0.1802)
0.0813
G-.0813
0.0813
0.0545
0.0879
Witted S.E. 0.0545
(0.2329)
(0.0258) (0.0956) (-0.2461) (0.1708) (0.2460)
(Res.)
Analysis of Variation Table

Source   d.f.      Q      Contrast     S.E.     % Total Q
Model      1    151.83     0.2759     0.0224      91.75
Error     16     13.65                             8.25
Total     17    165.49
TABLE 4.A.3
RATIO ESTIMATES, V(r) BASED ON REPLICATED RATIOS.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
POSTSTRATIFIED.
COVARIANCES ESTIMATED.
Observed Estimates of Standard Error

Residence-                            Income
Education Class      0-4,999     5,000-14,999     15,000 and over
SMSA       <HS        0.1796        0.1286            0.2530
           HS         0.4112        0.1746            0.1799
           >HS        0.4925        0.1868            0.1628
Non SMSA   <HS        0.2618        0.1541            0.3748
           HS         0.4378        0.1935            0.3279
           >HS        0.5809        0.2921            0.3128

Observed Estimates of S.E. for Parameters

Income                 μ        R1       E1       E2      (RE)1    (RE)2
0-4,999             0.1767   0.1786   0.1770   0.2419   0.1746   0.2474
5,000-14,999        0.0513   0.0830   0.0875   0.0990   0.0843   0.1082
15,000 and over     0.1062   0.1097   0.1861   0.1654   0.1741   0.1518
Table 4.A.3 (con't.)
Tests of Hypotheses:
d.f.
Interactions:
Q
Contrast
S.E.
12
Income x Education x Residence
4
1.03
I RE - 1 RE = 0
2 1
l 1
I RE ~ 1 RE = 0
2 Z
1 2
I RE - 1 RE = 0
1 1
3 1
I RE - 1 RE = 0
3 Z
1 Z
Income x Education
1
0.01
0.1705
0.2049
1
0.31
-0.1467
0.2655
1
0.01
0.0298
'0.2520
1
0.00
-0.0004
0.3097
4
13.22
0
1
6.56
0.4508
0.1760
0
1
1.90
0.3942
0.2864
0
1
0.33
0.1453
0.2541
0
1
1.13
0.3230
0.3043
2
2.26
1
1.51
0.2229
0.1815
1
2.24
0.3021
0.2019
2
3.67
=0
1
0.91
-0.2440
0.2556
=0
1
2.21
-0.4356
0.Z933
I E - 1 E =
2 1
1 1
1
E ..
I E 1 2
2 2
I E - I E =
1 1
3 1
I E - 1 E =
1 2
3 2
Income x Residence
I R - I R = 0
2 Z
1 1
I R - I R .. 0
1 1
3 3
Education x Residence
RE
RE
1
2
Main Effects:
5
Income
2
17.62
\.1 1 - \.1 2 = 0
\.1 - \.1 .. 0
2
3
Education
1
13.66
0.7220
0.1954
1
1.03
0.1239
0.1223
2
6.89
1
4.27
-0.5840
0.2828
1
0.76
-0.2413
0.2764
1
27.24
1.2779
0.2448
17
216.85
1
5.51
0.5980
0.2548
3
2.76
1
0.22
-0.1546
0.3275
1
2.34
-0.2112
0.1382
1
0.01
0.0231
0.3110
E .. 0
1
E .. 0
2
Residence
Total
Trends:
Income
\.1
Education
11:
1 :
2
1 :
3
- 2\.1 + \.1 a 0
2
1
3
within Income
E - E 1:1 0
1
2
E - E a 0
1
2
E - E • 0
1
Z
110
Table 4.A.3 (con't.)
Final Model (Based on Table 4.A.7):
Design Matrix
x' = ~ 1
L!- 3
1
3
1
1
1
1
1 1 1
1 -1 -1
1 1 1
3 -3 -3
1 1 1
1 -1 -1
1 1 1 J:l
3 -3 -3~
Estimated Model Parameters
J
rs.
0920
b =
~.2777
= 1~·5259 -0.20~
V(b)
- -
L:?2070
0.38~
x 10- 3
Smoothed or Fitted Table
Residence-Education Class
Family
Income
SMSA
. HS
<HS
0-4,999
>HS
<HS
Non-SMSA
HS
>HS
Fitted
5.9250
5.9250
5.9250
5.3697
5.3697
5.3697
(Obs. S.E.) (0.1796) (0.4112) (0.4925) (0.2618) (0.4378) (0.5809
Witted S.E. 0.0691
0.0691
0.0691
0.0500
0.0500
0.0500
(Res.)
(0.2225) (0.2486) (0.3815) V-0.2926) (-0.0095)(-0.7850
5,000Fitted
4.8143
4.8143
5.9250
4.2589
4.2589
5.3697
14,999 (Obs. S.E;) (0.1286) (0.1746) (0.1868) (0.1541) (0.1935) (0.2921)
lJi'itted S.E. 0.0577
0.0577
0.0691
0.0852
0.0852
0.0500
(Res.) (-0.0796) (0.1669) (0.1521) ~-O .1147) (0.0613)(-0.3094~
15,000
Fitted
4.8143
4.8143
5.9250
4.2589
4.2589
4.2589
and up (Obs. S.E.) (0.2530) (0.1799) (0.1628) (0.3748) (0.3279) (0.3128)
~itted S.E. 0.0577
0.0577
0.0691
0.0852
0.0852
0.0852
(Res.)
(0.0102)(-0.1112)(-0.2688) (0.1588) (0.2340) (0.2209)
Analysis of Variation Table

Source   d.f.      Q      Contrast     S.E.     % Total Q
Model      1    198.64     0.2777     0.0197      91.60
Error     16     18.21                             8.40
Total     17    216.85
TABLE 4.A.4
RATIO ESTIMATES, V(r) BASED ON REPLICATED RATIOS.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
POSTSTRATIFIED.
INTERDOMAIN COVARIANCES ASSUMED ZERO.
Observed Estimates of Standard Error

Residence-                            Income
Education Class      0-4,999     5,000-14,999     15,000 and over
SMSA       <HS        0.1796        0.1286            0.2530
           HS         0.4112        0.1746            0.1799
           >HS        0.4925        0.1868            0.1628
Non SMSA   <HS        0.2618        0.1541            0.3748
           HS         0.4378        0.1935            0.3279
           >HS        0.5809        0.2921            0.3128

Observed Estimates of S.E. for Parameters

Income                 μ        R1       E1       E2      (RE)1    (RE)2
0-4,999             0.1701   0.1701   0.1932   0.2429   0.1932   0.2429
5,000-14,999        0.0796   0.0796   0.0985   0.1096   0.0985   0.1096
15,000 and over     0.1141   0.1141   0.1734   0.1571   0.1734   0.1571

Table 4.A.4 (con't.)
Tests of Hypotheses:
Interactions:
S.E.
0.95
1
0.01
0.0170
0.Z169
1
0.30
-0.1467
0.Z664
1
0.01
0.0298
0.2596
1
0.00
-0.0004
0.Z893
4
9.86
1
4.3Z
0.4508
0.Z169
1
2.19
0.3942
0.2664
1
0.31
0.1453
0.Z596
1
1.25
0.3230
0.2893
2
2.19
1
1.41
0.2229
0.1878
1
2.17
0.3021
0.2048
2
3.98
-0
1
0.77
-0.2440
0.2777
=0
1
1.98
-0.4356
0.3093
I R - 1 R ::: 0
1 1
2 2
I R - 1 R ::: 0
1 1
3 3
Education x Residence
RE 2
Contrast
4
I E - I E ::: 0
1 1
Z 1
I 1E - 1 E ::: 0
2 Z
2
1
I 1E1 - 3E1 ::: 0
I E - 1 EZ = 0
3
1 2
Income x Residence
1
Q
12
Income x Education x Residence
IIREI - I ZRE 1 ::: 0
I RE - IZRE ::: 0
1 2
Z
I RE - 1 RE ::: 0
1 1
3 3
I RE - 1 RE ::: 0
1 Z
3 3
Income x Education
RE
d.f.
Main Effects:
5
Income
2
18.31
lJ 1 - lJ 2 • 0
lJ 2 - lJ 3 = 0
Education
1
14.77
0.7220
0.1878
1
0.79
0.1239
0.1392
2
6.82
E1 = 0
E .. 0
2
Residence
1
4.42
-0.5840
0.2777
1
0.61
-0.2413
0.3093
1
33.81
1.2779
0.2198
17
164.85
1
5.31
0.5980
0.2595
3
1.86
1
0.21
-0.1546
0.3397
1
1.65
-0.2112
0.1645
1
0.01
0.0231
0.2934
Total
Trends:
Income
- 2lJ 2 + lJ 3 a 0
Education within Income
~1
II:
1 :
2
1 :
3
E - E u 0
1
2
E1 - E2 - 0
E - E sa 0
1
2
113
Table 4.A.4 (con't.)
Final Model (Based on Table 4.A.7):
Design Matrix
X' = ~ 1
~3
1
3
1 1 1 1 1 1 1 1 1 1 1
1 1 1 -1 -1 3 -3 -3 1 -1 -1
Estimated Model Parameters
V(b) = rz.-6685
- l.Q.:..0926
b = f5.074sl
lQ.: 276Qj
0.09261
0.50:gJ
1 1 1--n
3 -3 -3~
x 10- 3
Smoothed or Fitted Table
Residence-Education Class
Family
ncome
SMSA
HS
<HS
0-4,999
>HS
<HS
Non-SMSA
HS
>HS
Fitted
5.9027
5.9027
5.3508
5.3508
5.9027
5.3508
Obs. S.E.) (0.1796) (0.4112) (0.4925) (0.2618) (0.4378) (0.5809
;'itted S.E. 0.0880
0.0880
0.0579
0.0579
0.0880
0.0579
(Res.)
(0.2448) (0.2709) (0.4038) -0.2738) (0.0094)(-0.7662)
5,000Fitted'
4.2470
4.2470
4.7989
4.7989
5.9027
5.3508
14,999 Obs. S.E.) (0.1286) (0.1746) (0.1868) (0.1541) (0.1935) (0.2921)
~itted S.E.
0.0546
0.0546
0.0880
0.0815
0.0815
0.0579
(Res.) (-0.0641) (0.1823) (0.1744) -0.1028) (0.0732)(-0.2905)
15,000
4.7989
4.2470
Fitted
4.7989
5.9027
4.2470
4.2470
and Uf ~Obs. S.E.) (0.2530) (0.1799) (0.1628) (0.3748) (0.3279) (0.3128)
bitted S.E. 0.0546
0.0815
0.0546
0.0815
0.0815
0.0880
(Res.)
(0.0256)(-0.0958) (-0.2464) (0.1707) (0.2459) (0.2328)
Analysis of Variation Table

Source   d.f.      Q      Contrast     S.E.     % Total Q
Model      1    151.36     0.2760     0.0224      91.82
Error     16     13.49                             8.18
Total     17    164.85
TABLE 4.A.5
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES APPROXIMATION.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
NONPOSTSTRATIFIED.
COVARIANCES ESTIMATED.
Observed Estimates of Standard Error

Residence-                            Income
Education Class      0-4,999     5,000-14,999     15,000 and over
SMSA       <HS        0.1779        0.1290            0.2541
           HS         0.4078        0.1738            0.1802
           >HS        0.4871        0.1844            0.1620
Non SMSA   <HS        0.2618        0.1533            0.3742
           HS         0.4366        0.1932            0.3251
           >HS        0.5630        0.2911            0.3107

Observed Estimates of S.E. for Parameters

Income                 μ        R1       E1       E2      (RE)1    (RE)2
0-4,999             0.1753   0.1775   0.1762   0.2409   0.1723   0.2437
5,000-14,999        0.0850   0.0829   0.0877   0.0984   0.0837   0.1078
15,000 and over     0.1061   0.1095   0.1856   0.1640   0.1734   0.1516
Table 4.A.5 (con't.)
d.f.
Tests of Hypotheses:
Interactions:
Q
Contrast
S.E.
12
Income x Education x Residence
4
1.04
I RE - 1 RE c: 0
1 l
2 l
I RE - I RE = 0
1 2
2 2
I RE - 1 RE = 0
3 3
1 1
I RE - 1 RE = 0
3 3
1 Z
Income x Education
1
0.01
0.0170
0.2031
1
0.31
-0.1467
0.2614
1
0.01
0.0298
0.Z509
1
0.00
-0.0004
0.3054
4
13.62
0
1
6.65
0.4508
0.1749
0
1
1.91
0.3942
0.2850
0
1
0.33
0.1453
0.2533
0
1
1.14
0.3230
0.3025
2
2.26
1
1. 53
0.2229
0.1804
1
2.23
0.3021
0.2023
2
3.83
1
0.94
-0.2440
0.2517
1
2.23
-0.4356
0.2920
I E - 1 E c:
1 1
2 l
I E - 1 E =
1 2
2 2
I E - 1 E =
1 1
3 1
I E - 1 E =
1 2
3 2
Income x Residence
I R - 1 R =0
1 1
2 Z
I1~ - 1 R = 0
3 3
Education x Residence
RE =0
1
RE ... 0
2
Main Effects:
5
Income
2
18.24
1.1 1 - 1.1 2 = 0
1.1 2 - 1.1 3 = 0
Education
E .. 0
1
E .. 0
2
Residence
1
13.89
0.7220
0.1937
1
1.05
0.1239
0.1208
2
7.14
1
4.26
-0.5840
0.2828
1
0.77
-0.2413
0.2748
1
27.75
1. 2779
0.2426
17
219.14
1
5.54
0.5980
0.2540
3
2.75
1
0.22
-0.1546
0.3301
"I
2.35
-0.2112
0.1377
1
0.01
0.0231
0.3093
Total
Trends:
Income
1.1 1 - 21.1 2 + 1.1 3 • 0
Education within Income
1 :
1
1 :
2
1 :
3
E - E • 0
2
1
E - E2 • 0
1
E1 - £2 ... 0
116
Table 4.A.5 (con't.)
Final Model (Based on Table 4.A.7):
Design Matrix
X'
=
f11
b
=
~
1
3
1 1 1 1 1 1 1 1 1 1 1
1 1 1 -1 -1 3 -3 -3 1 -1 -1
Estimated Model Parameters
rs.09451
~.277~
V(b)
- -
=
12.5505
L:0.2283
-0.22831
0.38~
1 1 1~
3 -3 -3~
x 10- 3
Smoothed or Fitted Table
Residence-Education Class
Family
~ncome
SMSA
HS
<HS
0-4,999
>HS
<HS
Non-SMSA
HS
>HS
Fitted
5.9272
5.9272
5.3721
5.3721
5.3721
5.9272
(Obs. S.E.) (0.1779) (0.4078) (0.4871) (0.2618) (0.4366) (0.5630
Fitted S.E~ 0.0680
0.0680
0.0498
0.0498
0.0498
0.0680
(Res.)
(0.2203) (0.2464) (0.3793) -0.2950)(-0.0119)(-0.7874
5,000Fitted
4.8169
4.2617
5.3721
4.8169
5.9272
4.2617
14,999(Obs. S.E.) (0.1290) (0.1738) (0.1844) (0.1533) (0.1932) (0.2911
Fitted S.E. 0.0582
0.0858
0.0498
0.0582
0.0680
0.0858
(Res.)
-0.8213) (0.1643) (0.1498) -0.1175) (0.0585)(-0.3118
15,000
Fitted
4.8169
4.2617
4.2617
4.2617
4.8169
5.9272
and up (Obs. S.E.) (0.2541) (0.1802) (0.1620) (0.3742) (0.3251) (0.3107
Fitted S.E. 0.0582
0.0858
0.0582
0.0858
0.0858
0.0680
(Res.)
(0.0076)(-0.1138)(-0.2710) (0.1560) (0.2312) (0.2181
Analysis of Variation Table

Source   d.f.      Q      Contrast     S.E.     % Total Q
Model      1    201.16     0.2776     0.0196      91.80
Error     16     17.98                             8.20
Total     17    219.14
TABLE 4.A.6
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES APPROXIMATION.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
NONPOSTSTRATIFIED.
INTERDOMAIN COVARIANCES ASSUMED ZERO.
Observed Estimates of Standard Error

Residence-                            Income
Education Class      0-4,999     5,000-14,999     15,000 and over
SMSA       <HS        0.1779        0.1290            0.2541
           HS         0.4078        0.1738            0.1802
           >HS        0.4871        0.1844            0.1620
Non SMSA   <HS        0.2618        0.1533            0.3742
           HS         0.4366        0.1932            0.3251
           >HS        0.5630        0.2911            0.3107

Observed Estimates of S.E. for Parameters

Income                 μ        R1       E1       E2      (RE)1    (RE)2
0-4,999             0.1676   0.1676   0.1909   0.2405   0.1909   0.2405
5,000-14,999        0.0793   0.0793   0.0981   0.1092   0.0981   0.1092
15,000 and over     0.1137   0.1137   0.1732   0.1564   0.1732   0.1564
Table 4.A.6 (con't.)
Tests of Hypotheses:
d.f.
Interactions:
Q
Contrast
S.E.
12
Income x Education x Residence
4
0.96
- 12RE
1
I
RE
I 1RE 2
2 2
I 1RE - I RE
1
3 3
I RE - 1 RE
1 2
3 3
Income x Education
== 0
1
0.01
0.0170
0.2147
0
1
0.31
-0.1467
0.2641
=0
1
0.01
0.0298
0.2577
0
1
0.00
-0.0004
0.2869
4
10.12
I E - 1 E =
1 1
2 1
I E - I E =
2 2
1 2
I E - I E ==
1 l
3 1
I E - 1 E =
1 2
3 2
Income x Residence
0
1
4.41
0.4508
0.2147
0
1
2.23
0.3942
0.2641
0
1
0.32
0.1453
0.2477
0
1
1.27
0.3230
0.2869
2
2.24
1
1.45
0.2229
0.1854
1
2.22
0.3021
0.2026
2
4.10
1
0.78
-0.2440
0.2758
1
2.01
-C.4356
0.3069
I 1 RE
1
==
==
I R - 1 R == 0
1 1
2 2
1 R - I R == 0
1 1
3 3
Education x Residence
REI == 0
RE = 0
2
Main Effects:
5
Income
II 1 - II 2 == 0
II 2 - II 3 ... 0
Education
E =0
1
E2 == 0
Residence
Total
2
18.78
1
15.16
0.7220
0.1854
1
0.80
0.1239
0.1386
2
7.00
1
4.48
-0.5840
0.2758
1
0.62
-0.2413
0.3069
1
34.51
1.2779
0.2175
17
167.34
1
5.40
0.5980
0.2573
3
1.87
1
·1
0.21
-0.1546
0.3381
1.66
-0.2112
0.1641
1
0.01
0.0231
0.2927
Trends:
Income
111 - 2112 + 113 • 0
Education within Income
11:
1 :
2
1 :
3
E - E .. 0
1
2
E - E .. 0
1
2
E - E .. 0
1
2
119
Table 4.A.6 (con't.)
Final Model (Based on Table 4.A.7):
Design Matrix
x' = ~
~
b
=
1
3
1 1 1 1 1 1 1 1 1 1 1
1 1 1 -1 -1 3 -3 -3 1 -1 -1
Estimated Model Parameters
V(b) = f2. 6464
- ~0857
f5:oz531
~2Z~
1 1 1~
3 -3 -3-=JU
0.08lli
0.4965
Smoothed or Fitted Table
Residence-Education Class
Family
IIncome
SMSA
HS
<HS
0-4,999
>HS
<HS
Non-SMSA
HS
>HS
Fitted
5.9038
5.9038
5.9038
5.3515 - 5.3515
5.3515
Obs. S.E.) (0.1779) (0.4078) (0.4871) (0.2618) (0.4366) (0.5630
Fitted S.E. 0.0873
0.0873
0.0873
0.0576
0.0576
0.0576
(Res.)
(0.2438) (0.2698) (0.4027) 1/-0.2744) (0.0087)(-0.7669
5,000Fitted
4.7992
5.9038
4.7992
4.2469
4.2469
5.3515
14,999 (Obs. S. E.) (0.1290) (0.1738) (0.1844) (0.1533) (0.1932) (0.2911
Fitted S.E. 0.0545
0.0545
0.0812
0.0873
0.0812
0.0576
(Res.)
-0.0644) (0.1820) (0.1733) (-0.1027) (0.0733)(-0.2912
15,000
Fitted
4.7992
4.7992
4.2469
5.9038
4.2469
4.2469
and up (Obs. S.E.) (0.2541) (0.1802) (0.1620) (0.3742) (0.3251) (0.3107
Fitted S.E. 0.0545
0.0545
0.0873
0.0812
0.0812
0.0812
(Res.)
(0.0253)(-0.0961)(-0.2475) (0.1708) (0.2460) (0.2329
Analysis of Variation Table

Source   d.f.      Q      Contrast     S.E.     % Total Q
Model      1    153.59     0.2761     0.0223      91.78
Error     16     13.75                             8.22
Total     17    167.34
TABLE 4.A.7
RATIO ESTIMATES, V(r) BASED ON REPLICATED RATIOS.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
NONPOSTSTRATIFIED.
COVARIANCES ESTIMATED.
Observed Estimates of Standard Error

Residence-                            Income
Education Class      0-4,999     5,000-14,999     15,000 and over
SMSA       <HS        0.1779        0.1292            0.2542
           HS         0.4090        0.1738            0.1799
           >HS        0.4912        0.1852            0.1619
Non SMSA   <HS        0.2627        0.1539            0.3734
           HS         0.4356        0.1933            0.3281
           >HS        0.5819        0.2921            0.3125

Observed Estimates of S.E. for Parameters

Income                 μ        R1       E1       E2      (RE)1    (RE)2
0-4,999             0.1750   0.1791   0.1776   0.2431   0.1735   0.2465
5,000-14,999        0.0843   0.0829   0.0881   0.0988   0.0840   0.1081
15,000 and over     0.1063   0.1097   0.1860   0.1650   0.1743   0.1521
Table 4.A.7 (con't.)
Tests of Hypotheses:
d.£.
Interactions:
Q
Contrast
S.E.
12
Income x Education x Residence
4
1.04
I RE - I 2RE = 0
1 1
1
I RE - I RE .. 0
2 2
1 2
I 1RE 1 - I RE = 0
3 3
I RE - 1 RE .. 0
3 3
1 2
Income x Education
1
0.01
0.0170
0.2042
1
0.31
-0.1467
0.2645
1
0.01
0.0298
0.2510
1
0.00
-0.0004
0.3087
4
13.27
I E - 1 E =
1 l
2 1
I E2 - 1 2E2 =
1
I E - 1 E =
1 l
3 l
I 1 E - 1 3E =
2
2
Income x Residence
0
1
6.54
0.4508
0.1762
0
1
1.88
0.3942
0.2872
0
1
0.33
0.1453
0.2540
0
1
1.12
0.3230
0.3050
2
2.26
1
1.51
0.2229
0.1817
1
2.23
0.3021
0.2021
2
3.70
... 0
1
0.92
-0.2440
0.2548
=0
1
2.21
-0.4356
0.2934
I 1R1 - 1 2R2 = 0
I R - I R =0
1 l
3 3
Education x Residence
RE
RE
1
2
Main Effects:
5
Income
~
1 -
2 Education
~
~
2 = 0
~ = 0
3
E .. 0
1
E .. 0
2
Residence
Total
2
17.96
1
13.75
0.7220
0.1947
1
1.04
0.1239
0.1239
2
6.88
1
4.20
-0.5840
0.2849
1
0.76
-0.2413
0.2769
1
27.07
1. 2780
0.2456
17
217.15
1
5.52
0.5980
0.2546
3
2.73
1
0.22
-0.1546
0.3300
1
2.33
-0.2112
0.1385
1
0.01
0.0231
0.3108
Trends:
Income
~1 - 2~2 + ~3 • 0
Education within Income
II: E1 - E2 ... 0
1 : E - E ... 0
2
1
2
0
1 : E - E
3
2
1
CI
122
Table 4.A.7 (con't.)
Final Model (Based on Tests of Hypotheses):

Design Matrix

X' = [ 1  1  1  1  1  1   1   1  1   1   1  1   1   1  1   1   1   1 ]
     [ 3  3  3  1  1  1  -1  -1  3  -3  -3  1  -1  -1  3  -3  -3  -3 ]

Estimated Model Parameters

b = [ 5.0956 ]        V(b) = [  2.5355  -0.2259 ] x 10^-3
    [ 0.2771 ]               [ -0.2259   0.3859 ]

Smoothed or Fitted Table

                                     Residence-Education Class
Family                           SMSA                          Non-SMSA
Income                   <HS      HS       >HS         <HS       HS       >HS

0-4,999   Fitted        5.9269   5.9269   5.9269      5.3727    5.3727    5.3727
          (Obs. S.E.)  (0.1779) (0.4090) (0.4912)    (0.2627)  (0.4356)  (0.5819)
          Fitted S.E.   0.0682   0.0682   0.0682      0.0497    0.0497    0.0497
          (Res.)       (0.2206) (0.2467) (0.3796)   (-0.2956) (-0.0125) (-0.7880)

5,000-    Fitted        4.8184   4.8184   5.9269      4.2642    4.2642    5.3727
14,999    (Obs. S.E.)  (0.1292) (0.1738) (0.1852)    (0.1539)  (0.1933)  (0.2921)
          Fitted S.E.   0.0581   0.0581   0.0682      0.0858    0.0858    0.0497
          (Res.)      (-0.0837) (0.1628) (0.1502)   (-0.1200)  (0.0561) (-0.3124)

15,000    Fitted        4.8184   4.8184   5.9269      4.2642    4.2642    4.2642
and up    (Obs. S.E.)  (0.2542) (0.1799) (0.1619)    (0.3734)  (0.3281)  (0.3125)
          Fitted S.E.   0.0581   0.0581   0.0682      0.0858    0.0858    0.0858
          (Res.)       (0.0061)(-0.1153)(-0.2707)    (0.1535)  (0.2287)  (0.2156)
Analysis of Variation Table

Source   d.f.      Q      Contrast     S.E.     % Total Q
Model      1    199.01     0.2771     0.0196      91.65
Error     16     18.14                             8.35
Total     17    217.15
TABLE 4.A.8
RATIO ESTIMATES, V(r) BASED ON REPLICATED RATIOS.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
NONPOSTSTRATIFIED.
INTERDOMAIN COVARIANCES ASSUMED ZERO.
Observed Estimates of Standard Error

Residence-                            Income
Education Class      0-4,999     5,000-14,999     15,000 and over
SMSA       <HS        0.1779        0.1292            0.2542
           HS         0.4090        0.1738            0.1799
           >HS        0.4912        0.1852            0.1619
Non SMSA   <HS        0.2627        0.1539            0.3734
           HS         0.4356        0.1933            0.3281
           >HS        0.5819        0.2921            0.3125

Observed Estimates of S.E. for Parameters

Income                 μ        R1       E1       E2      (RE)1    (RE)2
0-4,999             0.1698   0.1698   0.1929   0.2420   0.1929   0.2420
5,000-14,999        0.0795   0.0795   0.0984   0.1093   0.0984   0.1093
15,000 and over     0.1140   0.1140   0.1732   0.1571   0.1732   0.1571
Table 4.A.8 (con't.)
Tests of Hypotheses:

Source                               d.f.        Q    Contrast     S.E.
Interactions:                         12
  Income x Education x Residence       4      0.95
    I1(RE)1 - I2(RE)1 = 0              1      0.01     0.0170    0.2166
    I1(RE)2 - I2(RE)2 = 0              1      0.30    -0.1467    0.2656
    I1(RE)1 - I3(RE)1 = 0              1      0.01     0.0298    0.2592
    I1(RE)2 - I3(RE)2 = 0              1      0.00    -0.0004    0.2885
  Income x Education                   4      9.88
    I1E1 - I2E1 = 0                    1      4.33     0.4508    0.2166
    I1E2 - I2E2 = 0                    1      2.20     0.3942    0.2656
    I1E1 - I3E1 = 0                    1      0.31     0.1453    0.2592
    I1E2 - I3E2 = 0                    1      1.25     0.3230    0.2885
  Income x Residence                   2      2.20
    I1R1 - I2R1 = 0                    1      1.41     0.2229    0.1875
    I1R1 - I3R1 = 0                    1      2.18     0.3021    0.2045
  Education x Residence                2      3.99
    (RE)1 = 0                          1      0.77    -0.2440    0.2773
    (RE)2 = 0                          1      1.99    -0.4356    0.3085
Main Effects:                          5
  Income                               2     18.37
    μ1 - μ2 = 0                        1     14.83     0.7220    0.1875
    μ2 - μ3 = 0                        1      0.79     0.1239    0.1390
  Education                            2      6.83
    E1 = 0                             1      4.44    -0.5840    0.2773
    E2 = 0                             1      0.61    -0.2413    0.3085
  Residence                            1     33.92     1.2779    0.2194
Total                                 17    166.52
Trends:
  Income
    μ1 - 2μ2 + μ3 = 0                  1      5.33     0.5980    0.2590
  Education within Income              3      1.87
    I1: E1 - E2 = 0                    1      0.21    -0.1546    0.3382
    I2: E1 - E2 = 0                    1      1.65    -0.2112    0.1643
    I3: E1 - E2 = 0                    1      0.01     0.0231    0.2933
Table 4.A.8 (con't.)
Final Model (Based on Table 4.A.7):

Design Matrix

X' = [ 1  1  1  1  1  1   1   1  1   1   1  1   1   1  1   1   1   1 ]
     [ 3  3  3  1  1  1  -1  -1  3  -3  -3  1  -1  -1  3  -3  -3  -3 ]

Estimated Model Parameters

b = [ 5.0756 ]      V(b) = [ 2.6564    0.0856 ]  x 10^-3
    [ 0.2762 ]             [ 0.0856    0.49?  ]

(The final digit of the last entry of V(b) is illegible.)

Smoothed or Fitted Table

                                      Residence-Education Class
Family                           SMSA                            Non-SMSA
Income                    <HS       HS        >HS        <HS        HS        >HS
0-4,999    Fitted        5.9041    5.9041    5.9041     5.3518    5.3518    5.3518
           (Obs. S.E.)  (0.1779)  (0.4090)  (0.4912)   (0.2627)  (0.4356)  (0.5819)
           Fitted S.E.   0.0875    0.0875    0.0875     0.0577    0.0577    0.0577
           (Res.)       (0.2434)  (0.2695)  (0.4024)  (-0.2747)  (0.0084) (-0.7671)
5,000-     Fitted        4.7994    4.7994    5.9041     4.2471    4.2471    5.3518
14,999     (Obs. S.E.)  (0.1292)  (0.1738)  (0.1852)   (0.1539)  (0.1933)  (0.2921)
           Fitted S.E.   0.0546    0.0546    0.0875     0.0814    0.0814    0.0577
           (Res.)      (-0.0647)  (0.1818)  (0.1730)  (-0.1029)  (0.0732) (-0.2915)
15,000     Fitted        4.7994    4.7994    5.9041     4.2471    4.2471    4.2471
and up     (Obs. S.E.)  (0.2542)  (0.1799)  (0.1619)   (0.3734)  (0.3281)  (0.3125)
           Fitted S.E.   0.0546    0.0546    0.0875     0.0814    0.0814    0.0814
           (Res.)       (0.0251) (-0.0963) (-0.2478)   (0.1706)  (0.2458)  (0.2327)

Analysis of Variation Table

Source    d.f.        Q    Contrast     S.E.    % Total Q
Model       1    152.94     0.2762    0.0223      91.84
Error      16     13.58                            8.16
Total      17    166.52
TABLE 4.A.9
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES APPROXIMATION.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
POSTSTRATIFIED.
COVARIANCES ASSUMED ZERO.

Observed Estimates of Standard Error

Residence    Education                      Income
             Class          0-4,999    5,000-14,999    15,000 and over
SMSA         <HS             0.2868       0.1742           0.3234
             HS              0.4731       0.2078           0.2767
             >HS             0.7138       0.2912           0.2338
Non-SMSA     <HS             0.3582       0.2378           0.5346
             HS              0.5421       0.2448           0.4660
             >HS             0.7870       0.4100           0.4584

Observed Estimates of S.E. for Parameters

Parameter                   0-4,999    5,000-14,999    15,000 and over
μ                            0.2271       0.1109           0.1624
R1                           0.2271       0.1109           0.1624
E1                           0.2629       0.1398           0.2427
E2                           0.3078       0.1446           0.2255
(RE)1                        0.2629       0.1398           0.2427
(RE)2                        0.3078       0.1446           0.2255
Table 4.A.9 (con't.)
Tests of Hypotheses:

Source                               d.f.        Q    Contrast     S.E.
Interactions:                         12
  Income x Education x Residence       4      0.50
    I1(RE)1 - I2(RE)1 = 0              1      0.00     0.0170    0.2978
    I1(RE)2 - I2(RE)2 = 0              1      0.19    -0.1467    0.3400
    I1(RE)1 - I3(RE)1 = 0              1      0.01     0.0298    0.3578
    I1(RE)2 - I3(RE)2 = 0              1      0.00    -0.0004    0.3815
  Income x Education                   4      4.92
    I1E1 - I2E1 = 0                    1      2.29     0.4508    0.2978
    I1E2 - I2E2 = 0                    1      1.34     0.3942    0.3400
    I1E1 - I3E1 = 0                    1      0.16     0.1453    0.3578
    I1E2 - I3E2 = 0                    1      0.72     0.3230    0.3815
  Income x Residence                   2      1.19
    I1R1 - I2R1 = 0                    1      0.78     0.2229    0.2528
    I1R1 - I3R1 = 0                    1      1.17     0.3021    0.2792
  Education x Residence                2      2.08
    (RE)1 = 0                          1      0.40    -0.2440    0.3841
    (RE)2 = 0                          1      1.14    -0.4356    0.4080
Main Effects:                          5
  Income                               2     10.06
    μ1 - μ2 = 0                        1      8.16     0.7220    0.2528
    μ2 - μ3 = 0                        1      0.40     0.1239    0.1966
  Education                            2      3.42
    E1 = 0                             1      2.31    -0.5840    0.3841
    E2 = 0                             1      0.35    -0.2413    0.4080
  Residence                            1     18.10     1.2780    0.3004
Total                                 17     75.95
Trends:
  Income
    μ1 - 2μ2 + μ3 = 0                  1      2.81     0.5980    0.3566
  Education within Income              3      1.07
    I1: E1 - E2 = 0                    1      0.13    -0.1546    0.4267
    I2: E1 - E2 = 0                    1      0.94    -0.2112    0.2179
    I3: E1 - E2 = 0                    1      0.00     0.0231    0.4135
Table 4.A.9 (con't.)
Final Model (Based on Table 4.A.7):

Design Matrix

X' = [ 1  1  1  1  1  1   1   1  1   1   1  1   1   1  1   1   1   1 ]
     [ 3  3  3  1  1  1  -1  -1  3  -3  -3  1  -1  -1  3  -3  -3  -3 ]

Estimated Model Parameters

b = [ 5.0775 ]      V(b) = [ 5.2582    0.3049 ]  x 10^-3
    [ 0.2706 ]             [ 0.3049    1.05?  ]

(The final digit of the last entry of V(b) is illegible.)

Smoothed or Fitted Table

                                      Residence-Education Class
Family                           SMSA                            Non-SMSA
Income                    <HS       HS        >HS        <HS        HS        >HS
0-4,999    Fitted        5.8892    5.8892    5.8892     5.3480    5.3480    5.3480
           (Obs. S.E.)  (0.2868)  (0.4731)  (0.7138)   (0.3582)  (0.5421)  (0.7870)
           Fitted S.E.   0.1289    0.1289    0.1289     0.0832    0.0832    0.0832
           (Res.)       (0.2584)  (0.2844)  (0.4173)  (-0.2710)  (0.0121) (-0.7634)
5,000-     Fitted        4.8069    4.8069    5.8892     4.2657    4.2657    5.3480
14,999     (Obs. S.E.)  (0.1742)  (0.2078)  (0.2912)   (0.2378)  (0.2448)  (0.4100)
           Fitted S.E.   0.0755    0.0755    0.1289     0.1138    0.1138    0.0832
           (Res.)      (-0.0721)  (0.1743)  (0.1879)  (-0.1215)  (0.0545) (-0.2877)
15,000     Fitted        4.8069    4.8069    5.8892     4.2657    4.2657    4.2657
and up     (Obs. S.E.)  (0.3234)  (0.2767)  (0.2338)   (0.5346)  (0.4660)  (0.4584)
           Fitted S.E.   0.0755    0.0755    0.1289     0.1138    0.1138    0.1138
           (Res.)       (0.0176) (-0.1038) (-0.2329)   (0.1520)  (0.2272)  (0.2141)

Analysis of Variation Table

Source    d.f.        Q    Contrast     S.E.    % Total Q
Model       1     69.16     0.2706    0.0325      91.06
Error      16      6.80                            8.95
Total      17     75.95
TABLE 4.A.10
RATIO ESTIMATES, V(r) BASED ON TAYLOR SERIES APPROXIMATION.
DATA: PHYSICIAN VISITS/POPULATION.
(INCOME, EDUCATION, RESIDENCE).
NONPOSTSTRATIFIED.
COVARIANCES ASSUMED ZERO.

Observed Estimates of Standard Error

Residence    Education                      Income
             Class          0-4,999    5,000-14,999    15,000 and over
SMSA         <HS             0.2774       0.1864           0.3254
             HS              0.4788       0.2157           0.2774
             >HS             0.7128       0.2898           0.2500
Non-SMSA     <HS             0.3670       0.2426           0.5345
             HS              0.5444       0.2468           0.4644
             >HS             0.7968       0.4279           0.4714

Observed Estimates of S.E. for Parameters

Parameter                   0-4,999    5,000-14,999    15,000 and over
μ                            0.2285       0.1140           0.1641
R1                           0.2285       0.1140           0.1641
E1                           0.2643       0.1442           0.2440
E2                           0.3099       0.1482           0.2265
(RE)1                        0.2643       0.1442           0.2440
(RE)2                        0.3099       0.1482           0.2265
Table 4.A.10 (con't.)
Tests of Hypotheses:

Source                               d.f.        Q    Contrast     S.E.
Interactions:                         12
  Income x Education x Residence       4      0.48
    I1(RE)1 - I2(RE)1 = 0              1      0.00     0.0170    0.3011
    I1(RE)2 - I2(RE)2 = 0              1      0.18    -0.1467    0.3435
    I1(RE)1 - I3(RE)1 = 0              1      0.01     0.0298    0.3597
    I1(RE)2 - I3(RE)2 = 0              1      0.00    -0.0004    0.3838
  Income x Education                   4      4.77
    I1E1 - I2E1 = 0                    1      2.24     0.4508    0.3011
    I1E2 - I2E2 = 0                    1      1.32     0.3942    0.3435
    I1E1 - I3E1 = 0                    1      0.16     0.1453    0.3597
    I1E2 - I3E2 = 0                    1      0.71     0.3230    0.3838
  Income x Residence                   2      1.17
    I1R1 - I2R1 = 0                    1      0.76     0.2229    0.2554
    I1R1 - I3R1 = 0                    1      1.15     0.3021    0.2813
  Education x Residence                2      2.02
    (RE)1 = 0                          1      0.40    -0.2440    0.3876
    (RE)2 = 0                          1      1.12    -0.4356    0.4114
Main Effects:                          5
  Income                               2      9.91
    μ1 - μ2 = 0                        1      7.99     0.7220    0.2554
    μ2 - μ3 = 0                        1      0.38     0.1239    0.1998
  Education                            2      3.34
    E1 = 0                             1      2.27    -0.5840    0.3876
    E2 = 0                             1      0.34    -0.2413    0.4114
  Residence                            1     17.72     1.2779    0.3036
Total                                 17     75.03
Trends:
  Income
    μ1 - 2μ2 + μ3 = 0                  1      2.73     0.5980    0.3622
  Education within Income              3      1.02
    I1: E1 - E2 = 0                    1      0.13    -0.1546    0.4293
    I2: E1 - E2 = 0                    1      0.89    -0.2112    0.2242
    I3: E1 - E2 = 0                    1      0.00     0.0231    0.4136
Table 4.A.10 (con't.)
Final Model (Based on Table 4.A.7):

Design Matrix

X' = [ 1  1  1  1  1  1   1   1  1   1   1  1   1   1  1   1   1   1 ]
     [ 3  3  3  1  1  1  -1  -1  3  -3  -3  1  -1  -1  3  -3  -3  -3 ]

Estimated Model Parameters

b = [ 5.0840 ]      V(b) = [ 5.5253    0.3050 ]  x 10^-3
    [ 0.2735 ]             [ 0.3050    1.0921 ]

Smoothed or Fitted Table

                                      Residence-Education Class
Family                           SMSA                            Non-SMSA
Income                    <HS       HS        >HS        <HS        HS        >HS
0-4,999    Fitted        5.9046    5.9046    5.9046     5.3575    5.3575    5.3575
           (Obs. S.E.)  (0.2774)  (0.4788)  (0.7128)   (0.3670)  (0.5444)  (0.7968)
           Fitted S.E.   0.1311    0.1311    0.1311     0.0850    0.0850    0.0850
           (Res.)       (0.2429)  (0.2690)  (0.4019)  (-0.2805)  (0.0026) (-0.7729)
5,000-     Fitted        4.8105    4.8105    5.9046     4.2634    4.2634    5.3575
14,999     (Obs. S.E.)  (0.1864)  (0.2157)  (0.2898)   (0.2426)  (0.2468)  (0.4279)
           Fitted S.E.   0.0775    0.0775    0.1311     0.1163    0.1163    0.0850
           (Res.)      (-0.0757)  (0.1707)  (0.1725)  (-0.1192)  (0.0568) (-0.2973)
15,000     Fitted        4.8105    4.8105    5.9046     4.2634    4.2634    4.2634
and up     (Obs. S.E.)  (0.3254)  (0.2774)  (0.2500)   (0.5345)  (0.4644)  (0.4714)
           Fitted S.E.   0.0775    0.0775    0.1311     0.1163    0.1163    0.1163
           (Res.)       (0.0140) (-0.1074) (-0.2484)   (0.1543)  (0.2295)  (0.2164)

Analysis of Variation Table

Source    d.f.        Q    Contrast     S.E.    % Total Q
Model       1     68.51     0.2735    0.0330      91.31
Error      16      6.52                            8.69
Total      17     75.03
CHAPTER FIVE
ANALYSIS OF POSTSAMPLING ADJUSTMENT EFFECTS
AND MORE COMPLEX FUNCTIONS OF SURVEY DATA
5.1 Introduction
The sampling experiments discussed in Chapters Three and Four provide insights into the effects of various errors found in data from sample surveys. The most important part of the investigation was the effect of various approximations and postsampling adjustments on the inference structure for regression type models. It is now of interest to investigate the inference structure appropriate to more complex functions. These complex functions deal both with the postsampling adjustment of the data and with the attempt to explain the underlying stochastic processes which gave rise to the data.
The function for the postsampling adjustment of the data to be investigated is that of iterative proportional fitting (denoted IPF). It should be noted that IPF is closely related to maximum likelihood procedures for fitting log-linear models to categorical data. An approximation to the variance matrix generated by the procedure is given in Section 5.3 along with a statistic for the evaluation of the overall variance reduction achieved by the procedure. These results are then applied to the special case of simple ratio adjustments known as poststratification. These simple postsampling adjustments have been investigated empirically in Chapters Three and Four.
While IPF is a postsampling adjustment aimed at variance reduction, it is also of interest to fit other types of functions which are directed at investigating the underlying process which generated the responses of the surveyed population. Such functions are especially of interest when, in the context of Chapter Two, pure response error is the predominant component of the variance structure of the target population. Two such functions are investigated in Section 5.5. These are the multiple logit function and a type of bivariate Weibull distribution. It is important to recognize that both types of functions can be investigated within the framework discussed in Chapter Two and that the results discussed in Chapters Three and Four are completely applicable. However, a general approach to postsampling data adjustment for variance reduction is of interest in its own right.
5.2 The IPF Postsampling Adjustment Procedure
One of the sources of error in survey sampling not explicitly represented in the CSM framework of Chapter Two is that of nonresponse. Here, nonresponse includes both noncoverage of the target population by the survey frame and the inability to make the required measurements, either by interviewer or questionnaire or other measuring device, on the selected individual. Thus, deflated estimates of the population distribution would result from nonresponse errors. An analogous error would occur when certain groups are over-represented in the sense that the selection probabilities are under-estimated. This source of error is beyond the scope of this paper, but it should be observed that this type of error is not uncommon in censuses, where it is related to the phenomenon referred to as "double counting." The effect is an inflation of the estimates of the population distribution.
One approach to nonresponse errors is to inflate or adjust the estimates of the survey to correspond to those of an independently determined set of population parameters. Typically these parameters correspond to the population distribution on several margins. The IPF adjustment uses these margins directly in the adjustment process. It was first discussed, in the form to be examined here, by Deming and Stephan (1940). In the context of data generated by multinomial or Poisson distributions, the procedure is discussed in many places, for example by Fienberg (1970a,b). In this context, IPF has several important properties. These properties concern the stability and interpretation of the estimates produced by IPF. Fienberg (1970b) showed that the procedure will converge or stabilize to a set of domain or cell estimates under fairly general conditions. The most important of the conditions are that the underlying multinomial model is correct and that there are observations in all of the fitted cells. For large surveys such as the Current Population Survey and the Health Interview Survey, the latter condition is usually satisfied, as long as the cross-classification used is not too complex. The importance and appropriateness of the multinomial assumption is not clear and is beyond the scope of this discussion. In most cases (Deming and Stephan, 1940) the procedure has been seen to converge relatively quickly when it has been applied to survey data.
A related advantage of the procedure is in preserving measures of association. That is to say, for margins which are only estimated and not fixed by the independently determined parameters, measures of multiplicative association in the data are preserved by the IPF procedure. This was shown by Mosteller (1968) in a paper on contingency tables for infinite or independently sampled populations. A third point, which is only heuristically relevant, is that IPF produces maximum likelihood estimates for the fitted model of the cell frequencies (Fienberg, 1970b). This point will be used in the discussion of IPF but not in terms of producing maximum likelihood estimates.
These properties make IPF a useful procedure which can now be examined in detail.
For the sake of simplicity, it is discussed in terms of the response frequencies without reference to measurement error or intrinsic response errors; therefore only one trial, $T = 1$, is required. Implicitly there is a known structure on the expected response vector $\mathbf{Y}_i$. This means that the category of response $j$, possibly vector valued, is the result of the joint response to $s$ variables. Thus, the components of $\mathbf{Y}_i$ may be arrayed as a contingency table in $s$ dimensions. Further, the $k$-th variable may contain up to $J_k$ levels, $k = 1, \ldots, s$.
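For example, consider a set of eight binomial responses. Then $\mathbf{Y}_i$ is a vector with $2^8$ components and
\[
Y_{ij} = \begin{cases} 1 & \text{if individual } i \text{ is classified in category } (j_1, j_2, \ldots, j_8), \\ 0 & \text{otherwise,} \end{cases} \tag{5.1}
\]
where $j = 1, \ldots, 2^8$, $k = 1, \ldots, 8$, and $j_k$ corresponds to the $j_k$-th level of the $k$-th variable. That is,
\[
j_k = \begin{cases} 1 & \text{if level one of variable } k, \\ 2 & \text{if level two.} \end{cases} \tag{5.2}
\]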
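The indexing in (5.1) and (5.2) can be sketched in a few lines. The function names and the lexicographic ordering of cells (last variable varying fastest) are illustrative assumptions, not part of the original text.

```python
import numpy as np

def cell_index(levels):
    """Map (j_1, ..., j_s), each equal to 1 or 2, to a 0-based position
    in the 2**s response vector (last variable varies fastest)."""
    idx = 0
    for j in levels:
        idx = 2 * idx + (j - 1)
    return idx

def indicator(levels):
    """Response vector Y_i for an individual classified in cell `levels`:
    one in the matching component, zero elsewhere, as in (5.1)."""
    y = np.zeros(2 ** len(levels), dtype=int)
    y[cell_index(levels)] = 1
    return y

# Eight binomial responses: level one on the first seven, level two on the last.
y = indicator((1, 1, 1, 1, 1, 1, 1, 2))
print(len(y), y.sum(), int(np.argmax(y)))
```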
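The corresponding estimates for the population distribution are given by
\[
\hat{\mathbf{N}}' = (\hat{N}_{11\cdots1}, \hat{N}_{11\cdots2}, \ldots, \hat{N}_{22\cdots1}, \hat{N}_{22\cdots2}) \tag{5.3}
\]
where
\[
\hat{N}_{j_1 j_2 \cdots j_8} = \sum_{i=1}^{N} \frac{U_i}{\phi_i}\, Y_{i, j_1 j_2 \cdots j_8} \tag{5.4}
\]
and where $U_i$ and $\phi_i$ were defined in equation (2.1). The margin estimates may be written as sums of these cell estimates,
\[
\hat{N}_{+\cdots+ j_8} = \sum_{i=1}^{N} \frac{U_i}{\phi_i}\, Y_{i, +\cdots+ j_8} \tag{5.5}
\]
where the "+" subscript denotes the sum over the corresponding margin. For higher order margins the terms are similarly defined, for example,
\[
\hat{N}_{+\cdots j_{k'} \cdots j_{k''} \cdots +} = \sum_{i=1}^{N} \frac{U_i}{\phi_i}\, Y_{i, +\cdots j_{k'} \cdots j_{k''} \cdots +}, \tag{5.6}
\]
the sum being taken over the levels of every variable $k = 1, \ldots, s$ with $k \neq k'$ and $k \neq k''$. In the eight binomial example, $s = 8$ and $J_k = 2$ for $k = 1, \ldots, 8$. In this manner, all margins through the $s$-th order margin, which corresponds to the original table, may be defined and estimated.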
Now suppose that several second order margins are known, for example those for the $(s-2)$-th through $s$-th variables. These may be denoted $\mathbf{N}_{+\cdots+,s-2,s-1,+}$, $\mathbf{N}_{+\cdots+,s-2,+,s}$, and $\mathbf{N}_{+\cdots+,s-1,s}$, where the vector notation indicates a vector with components corresponding to the $J_{s-2}J_{s-1}$, $J_{s-2}J_{s}$, and $J_{s-1}J_{s}$ levels. Then the IPF adjustment is as follows:
\[
\begin{aligned}
\hat{N}^{(I,1)}_{j_1,j_2,\ldots,j_s} &= \hat{N}^{(I-1,3)}_{j_1,j_2,\ldots,j_s} \times \frac{N_{+\cdots+j_{s-2},j_{s-1},+}}{\hat{N}^{(I-1,3)}_{+\cdots+j_{s-2},j_{s-1},+}}, \\
\hat{N}^{(I,2)}_{j_1,j_2,\ldots,j_s} &= \hat{N}^{(I,1)}_{j_1,j_2,\ldots,j_s} \times \frac{N_{+\cdots+j_{s-2},+,j_s}}{\hat{N}^{(I,1)}_{+\cdots+j_{s-2},+,j_s}}, \\
\hat{N}^{(I,3)}_{j_1,j_2,\ldots,j_s} &= \hat{N}^{(I,2)}_{j_1,j_2,\ldots,j_s} \times \frac{N_{+\cdots+j_{s-1},j_s}}{\hat{N}^{(I,2)}_{+\cdots+j_{s-1},j_s}},
\end{aligned} \tag{5.7}
\]
where $\hat{N}^{(0,3)}_{j_1,j_2,\ldots,j_s} = \hat{N}_{j_1,j_2,\ldots,j_s}$, and $I$ is the superscript indicating the cycle of the adjustment.
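As noted earlier, this procedure will stabilize, generally, to a set of estimates $\hat{\mathbf{N}}^{*}$, where
\[
\begin{aligned}
\hat{\mathbf{N}}^{*}_{+\cdots+,s-2,s-1,+} &= \mathbf{N}_{+\cdots+,s-2,s-1,+}, \\
\hat{\mathbf{N}}^{*}_{+\cdots+,s-2,+,s} &= \mathbf{N}_{+\cdots+,s-2,+,s}, \\
\hat{\mathbf{N}}^{*}_{+\cdots+,s-1,s} &= \mathbf{N}_{+\cdots+,s-1,s}.
\end{aligned} \tag{5.8}
\]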
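The cycle in (5.7) and the stabilized margins in (5.8) can be illustrated for a three-way table ($s = 3$) adjusted to its three two-way margins. This is a minimal sketch: the function name, the stopping rule, and the numerical tables are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def ipf_two_way_margins(n_hat, m12, m13, m23, tol=1e-9, max_cycles=1000):
    """Adjust a 3-way table of estimates to known two-way margins.

    m12[j1, j2], m13[j1, j3] and m23[j2, j3] play the role of the known
    margins; each pass of the loop is one cycle I with its three steps.
    """
    est = np.asarray(n_hat, dtype=float).copy()
    for cycle in range(1, max_cycles + 1):
        est *= (m12 / est.sum(axis=2))[:, :, None]   # step (I,1)
        est *= (m13 / est.sum(axis=1))[:, None, :]   # step (I,2)
        est *= (m23 / est.sum(axis=0))[None, :, :]   # step (I,3)
        gap = max(np.abs(est.sum(axis=2) - m12).max(),
                  np.abs(est.sum(axis=1) - m13).max())
        if gap < tol:
            return est, cycle
    return est, max_cycles

def three_factor_ratio(t):
    """Three-factor cross-product ratio of a 2 x 2 x 2 table."""
    return (t[0, 0, 0] * t[1, 1, 0] * t[0, 1, 1] * t[1, 0, 1]) / \
           (t[0, 1, 0] * t[1, 0, 0] * t[0, 0, 1] * t[1, 1, 1])

# Hypothetical survey estimates and a hypothetical population table that
# supplies mutually consistent "known" margins.
n_hat = np.array([[[40., 30.], [20., 10.]], [[15., 25.], [35., 45.]]])
population = np.array([[[50., 35.], [30., 20.]], [[25., 30.], [40., 50.]]])
m12, m13, m23 = population.sum(axis=2), population.sum(axis=1), population.sum(axis=0)

adjusted, cycles = ipf_two_way_margins(n_hat, m12, m13, m23)
print(cycles, three_factor_ratio(n_hat), three_factor_ratio(adjusted))
```

Because each step multiplies the table by a factor depending on only two of the three indices, the three-factor cross-product ratio of the adjusted table equals that of the starting table, which is one way to see the preservation of association attributed to Mosteller (1968).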
Further, from the work of Fienberg (1970b) these estimates maximize the function
\[
L(\hat{\mathbf{N}} \mid \mathbf{N}_{+\cdots+,s-2,s-1,+},\, \mathbf{N}_{+\cdots+,s-2,+,s},\, \mathbf{N}_{+\cdots+,s-1,s}) = N! \prod_{j_1,\ldots,j_s} \frac{\pi_{j_1,\ldots,j_s}^{\hat{N}_{j_1,\ldots,j_s}}}{\hat{N}_{j_1,\ldots,j_s}!} \tag{5.9}
\]
subject to the constraints in (5.8). This follows since (5.9) has the form of the likelihood function for the multinomial distribution. It is in this latter context that the IPF estimates can be examined. Before deriving the corresponding estimates of the variance matrix of these adjusted estimates, it is worthwhile to consider two justifications for the approach.
The first is that of smoothed means. This was first presented in the context of modeling stochastic phenomena in Tolley and Koch (1974) and Koch and Tolley (1975). They argue that simplified likelihood equations are useful for modeling complex events in the context of modularized experiments. Thus, relatively simple likelihoods may be fitted in each module and then the estimated parameters are combined through weighted least squares. This has the advantage of being computationally simple and moreover generates statistics with desirable large sample properties permitting the application of Central Limit Theory.
A separate justification is in terms of superpopulation theory as discussed by, for example, Fuller (1973). To recall, the sampled finite population is thought of as a sample from a larger infinite population. In this context, the equation given in (5.9) would be the usual likelihood function. The only complication would be in terms of the manner in which the $\hat{\mathbf{N}}$ estimate was constructed prior to the IPF adjustment. Thus, the examination of $V(\hat{\mathbf{N}}^{*})$ will need to incorporate the complexity of the estimate of $\mathbf{N}$. This may be done in the context of implicit function theory.
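This may be seen more easily by first taking the natural logarithm of (5.9) and denoting the resulting equation by $L$,
\[
L = \ln\{\text{a constant with respect to } \boldsymbol{\pi}\} + \sum_{j_1,\ldots,j_s} \hat{N}_{j_1,\ldots,j_s} \ln \pi_{j_1,\ldots,j_s} + \lambda \Big( \sum_{j_1,\ldots,j_s} \pi_{j_1,\ldots,j_s} - 1 \Big) \tag{5.10}
\]
where $\pi_{j_1,j_2,\ldots,j_s} = N'_{j_1,j_2,\ldots,j_s}/N$ is to be estimated and $\lambda$ is a Lagrange multiplier corresponding to the additional constraint that
\[
\sum_{j_1,\ldots,j_s} \pi_{j_1,\ldots,j_s} = 1. \tag{5.11}
\]
Maximizing (5.10) under (5.11) yields $\lambda = -N$. Therefore,
\[
L = \ln\{\text{a constant}\} + \sum_{j_1,\ldots,j_s} \hat{N}_{j_1,\ldots,j_s} \ln \pi_{j_1,\ldots,j_s} - N \Big( \sum_{j_1,\ldots,j_s} \pi_{j_1,\ldots,j_s} - 1 \Big). \tag{5.12}
\]
Now, maximizing $L$ in (5.12) under (5.8) maximizes $L$ in (5.9) under (5.8). The maximization is with respect to $\boldsymbol{\pi}' = \{\pi_{j_1,j_2,\ldots,j_s}\}$ and by the argument of Fienberg (1970b) must yield estimates of $\boldsymbol{\pi}$ of the form
\[
\hat{\boldsymbol{\pi}} = \big[ \hat{N}^{*}_{j_1,j_2,\ldots,j_s} / N \big] \tag{5.13}
\]
where the bracket notation indicates a vector.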
Now the constraints (5.8) imply an underlying set of parameters for the associations in the cross-classification. That is to say, there exists an implicit function $\mathbf{F}(\boldsymbol{\beta})$, where the $\boldsymbol{\beta}$ are an unknown set of $u$ parameters, such that
\[
\boldsymbol{\pi} = \mathbf{F}(\boldsymbol{\beta}), \qquad u < \Big( \prod_{k=1}^{s} J_k \Big) - 1. \tag{5.14}
\]
Thus, $\mathbf{F}(\boldsymbol{\beta})$ may be substituted for $\boldsymbol{\pi}$ in (5.12), and maximization of $L$ with respect to $\boldsymbol{\beta}$, for appropriate functions $\mathbf{F}$, maximizes $L$ in (5.9) under the constraints in (5.8). Formally this means that the following equation must be solved for $\boldsymbol{\beta}$:
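\[
\Big[ \frac{\partial}{\partial \boldsymbol{\beta}} \big( \ln \mathbf{F}(\boldsymbol{\beta}) \big) \Big]' \big[ \mathbf{F}(\boldsymbol{\beta}) - \hat{\mathbf{p}} \big] = \mathbf{0} \tag{5.15}
\]
where $\hat{\mathbf{p}} = \frac{1}{N} \hat{\mathbf{N}}$, with $\hat{\mathbf{N}}$ defined in (5.3) and (5.4). This set of equations implicitly defines the estimate of $\boldsymbol{\beta}$, denoted $\hat{\boldsymbol{\beta}}$, in terms of $\hat{\mathbf{p}}$,
\[
\hat{\boldsymbol{\beta}} = \mathbf{H}(\hat{\mathbf{p}}). \tag{5.16}
\]
For regular functions $\mathbf{F}(\boldsymbol{\beta})$, $\mathbf{H}(\mathbf{p})$ will be differentiable about some value of $\mathbf{p}$, say $\boldsymbol{\pi}_0$, and therefore can be expanded in a Taylor series about $\boldsymbol{\pi}_0$,
\[
\mathbf{H}(\hat{\mathbf{p}}) \approx \mathbf{H}(\boldsymbol{\pi}_0) + \Big[ \frac{\partial \mathbf{H}(\mathbf{p})}{\partial \mathbf{p}} \Big]_{\mathbf{p} = \boldsymbol{\pi}_0} (\hat{\mathbf{p}} - \boldsymbol{\pi}_0). \tag{5.17}
\]
Note that by (5.3) and the discussion in Chapter Two, $\hat{\mathbf{p}}$ is unbiased for $\boldsymbol{\pi}_0$. Thus, $\mathbf{H}(\hat{\mathbf{p}})$ is a consistent estimate of $\boldsymbol{\beta}$ when $\boldsymbol{\pi}_0$ obtains and the IPF model is correct. Therefore, the linearized variance of $\hat{\boldsymbol{\beta}}$ is defined by
\[
\operatorname{L-Var}(\hat{\boldsymbol{\beta}}) = \Big[ \frac{\partial \mathbf{H}(\mathbf{p})}{\partial \mathbf{p}} \Big]_{\mathbf{p} = \boldsymbol{\pi}_0} \operatorname{Var}(\hat{\mathbf{p}}) \Big[ \frac{\partial \mathbf{H}(\mathbf{p})}{\partial \mathbf{p}} \Big]'_{\mathbf{p} = \boldsymbol{\pi}_0} \tag{5.18}
\]
where $\operatorname{Var}(\hat{\mathbf{p}})$ is known from the discussion in Section 2.3. Once $\operatorname{L-Var}(\hat{\boldsymbol{\beta}})$ is in hand, a Taylor series expansion of $\mathbf{F}(\hat{\boldsymbol{\beta}})$ yields the linearized variance of $\tilde{\mathbf{p}}$, since (5.14) implies $\tilde{\mathbf{p}} = \mathbf{F}(\hat{\boldsymbol{\beta}})$. The only complex term in this sequence is $\partial \mathbf{H}(\hat{\mathbf{p}}) / \partial \hat{\mathbf{p}}$, which can be found by differentiating (5.15) by $\hat{\mathbf{p}}$. This is the subject of the next section.
5.3 Variance Structure of the IPF Model
The equation for $\operatorname{L-Var}(\hat{\boldsymbol{\beta}})$, (5.18), shows the importance of the form which $\mathbf{F}(\boldsymbol{\beta})$ takes.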
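Specifically, for the model shown in (5.9) which IPF maximized, $\mathbf{F}(\boldsymbol{\beta})$ takes a log-linear form,
\[
\boldsymbol{\pi} = \exp\{ \mathbf{X} \boldsymbol{\beta} \}, \tag{5.19}
\]
where $\mathbf{X}$ is a $\big( (\prod_{k=1}^{s} J_k) \times u \big)$ design matrix which corresponds to a complete factorial design (Fienberg (1970b)), with additive constraints. For example, if there are three binomial responses, then
\[
\mathbf{X} = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & -1 & 1 & -1 & -1 & -1 \\
1 & 1 & -1 & 1 & -1 & 1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & -1 & 1 & 1 & -1 & -1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & -1 & 1 & 1 & 1 & -1
\end{bmatrix},
\quad \text{and} \quad
\boldsymbol{\beta} = (\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6, \beta_7, \beta_8)'. \tag{5.20}
\]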
If $\beta_1$ through $\beta_8$ are estimated, this is equivalent to not adjusting the data. If the total population size is known then $\beta_1$ is fixed. If the first order margins are known then $\beta_2$, $\beta_3$, and $\beta_4$ are fixed. If the second order margins are known then $\beta_5$, $\beta_6$, and $\beta_7$ are fixed. If the third order margin is known then the sample need not have been drawn. These parameters correspond to various levels of multiplicative association. The usual level of adjustment is done in the reverse of the order of model reduction in the linear models context. That is, the highest order terms are the last to be eliminated.
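A complete factorial design matrix of the kind described for (5.20) can be generated mechanically. The sketch below assumes level one is coded +1 and level two is coded -1, cells are listed with the last variable varying fastest, and columns are ordered as mean, main effects, two-factor terms, and so on; the function name and these ordering choices are illustrative assumptions.

```python
import numpy as np
from itertools import combinations

def factorial_design(s):
    """Return the 2**s x 2**s matrix of +/-1 codes for s two-level variables:
    a column of ones, then main effects, then products of increasing order."""
    levels = np.array([1.0, -1.0])                     # level one -> +1, level two -> -1
    grids = np.meshgrid(*([levels] * s), indexing="ij")
    mains = [g.ravel() for g in grids]                 # one main-effect column per variable
    cols = [np.ones(2 ** s)]
    for order in range(1, s + 1):
        for combo in combinations(range(s), order):
            cols.append(np.prod([mains[k] for k in combo], axis=0))
    return np.column_stack(cols)

X = factorial_design(3)
print(X.astype(int))
```

With this ordering, column 1 corresponds to the total, columns 2 to 4 to the first order margins, columns 5 to 7 to the second order margins, and column 8 to the third order term.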
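Returning to the $\operatorname{L-Var}(\tilde{\mathbf{p}})$ implied by (5.19), the solution for $[\partial \mathbf{H}(\hat{\mathbf{p}}) / \partial \hat{\mathbf{p}}]$ follows the method of proof given in Tolley and Koch (1974). Let
\[
F^{(1)}_{i,j}(\hat{\boldsymbol{\beta}}) = \Big[ \frac{\partial}{\partial \beta_j} F_i(\boldsymbol{\beta}) \Big]_{\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}}, \qquad
F^{(2)}_{i,j}(\hat{\boldsymbol{\beta}}) = \Big[ \frac{\partial^2}{\partial \beta_j^2} F_i(\boldsymbol{\beta}) \Big]_{\boldsymbol{\beta} = \hat{\boldsymbol{\beta}}}, \tag{5.21}
\]
where $i = 1, \ldots, r = \prod_{k=1}^{s} J_k$ and $j = 1, \ldots, u$. Noting that $\sum_{i=1}^{r} F_i(\boldsymbol{\beta}) = 1$ implies
\[
\sum_{i=1}^{r} F^{(1)}_{i,j}(\hat{\boldsymbol{\beta}}) = \sum_{i=1}^{r} F^{(2)}_{i,j}(\hat{\boldsymbol{\beta}}) = 0. \tag{5.22}
\]
Then
(5.23)
for $i = 1, \ldots, r$, $j = 1, \ldots, u$.
Substituting (5.21) and (5.22) into (5.19) yields: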
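\[
F^{(1)}_{i,j}(\hat{\boldsymbol{\beta}}) / F_i(\hat{\boldsymbol{\beta}}) = x_{ij}, \qquad
\big[ F^{(1)}_{i,j}(\hat{\boldsymbol{\beta}}) \big]^2 / F_i(\hat{\boldsymbol{\beta}}) = x_{ij}^2 \, F_i(\hat{\boldsymbol{\beta}}). \tag{5.24}
\]
These in turn yield
\[
\Big[ \frac{\partial \mathbf{H}(\mathbf{p})}{\partial \mathbf{p}} \Big]_{\mathbf{p} = \hat{\mathbf{p}}} =
\begin{bmatrix}
\dfrac{x_{11}}{\sum_{i=1}^{r} x_{i1}^2 \hat{p}_i} & \dfrac{x_{21}}{\sum_{i=1}^{r} x_{i1}^2 \hat{p}_i} & \cdots & \dfrac{x_{r1}}{\sum_{i=1}^{r} x_{i1}^2 \hat{p}_i} \\
\dfrac{x_{12}}{\sum_{i=1}^{r} x_{i2}^2 \hat{p}_i} & \dfrac{x_{22}}{\sum_{i=1}^{r} x_{i2}^2 \hat{p}_i} & \cdots & \dfrac{x_{r2}}{\sum_{i=1}^{r} x_{i2}^2 \hat{p}_i} \\
\vdots & \vdots & & \vdots \\
\dfrac{x_{1u}}{\sum_{i=1}^{r} x_{iu}^2 \hat{p}_i} & \dfrac{x_{2u}}{\sum_{i=1}^{r} x_{iu}^2 \hat{p}_i} & \cdots & \dfrac{x_{ru}}{\sum_{i=1}^{r} x_{iu}^2 \hat{p}_i}
\end{bmatrix}
= \mathbf{G}(\hat{\mathbf{p}}). \tag{5.25}
\]
When $J_k = 2$, $k = 1, \ldots, s$, then $x_{ij}^2 = 1$ for all $i, j$, so $[\mathbf{G}(\hat{\mathbf{p}})] = \mathbf{X}'$.
So, returning to the original problem of approximating $\operatorname{L-Var}(\tilde{\mathbf{p}})$: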
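(5.26)
\[
\cdots \; \operatorname{Var}(\hat{\mathbf{p}}) \, [\mathbf{G}(\hat{\mathbf{p}})]' \; \cdots
\]
(5.27)
(5.28)
and
\[
\operatorname{L-Var}(\tilde{\mathbf{p}}) = \mathbf{D}_{\tilde{\mathbf{p}}} \, \mathbf{G}'\mathbf{G} \, \operatorname{Var}(\hat{\mathbf{p}}) \, \mathbf{G}'\mathbf{G} \, \mathbf{D}_{\tilde{\mathbf{p}}}, \tag{5.29}
\]
where $\mathbf{G} = \mathbf{G}(\hat{\mathbf{p}})$ is defined in (5.25) and $\mathbf{D}_{\tilde{\mathbf{p}}}$ is a diagonal matrix with the IPF estimates on the main diagonal. $\operatorname{Var}(\hat{\mathbf{p}})$ may be estimated by the methods discussed in Chapter Two or any method which yields consistent estimates. This yields an expression for the approximation to the variance matrix of the IPF estimates.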
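The matrix G in (5.25) and the linearized variance in (5.18) can be computed directly for a small case. The sketch below builds a +/-1 design for three binary responses by Kronecker products (a convenient stand-in whose column order is an assumption), forms G as printed in (5.25), and checks the remark that G reduces to X' when every squared entry is one. The proportions and the multinomial covariance used for Var(p) are hypothetical placeholders; the text would use an estimate from Chapter Two or a replication method instead.

```python
import numpy as np

# +/-1 design for three binary responses (column order here is an assumption).
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
X = np.kron(np.kron(H2, H2), H2)                 # 8 cells x 8 parameters

def g_matrix(X, p):
    """G(p) as laid out in (5.25): entry (j, i) is x_ij / sum_i x_ij**2 p_i."""
    denom = (X ** 2 * p[:, None]).sum(axis=0)    # one denominator per parameter j
    return (X / denom).T                         # u x r

# Hypothetical cell proportions and a placeholder covariance matrix for them.
p_hat = np.array([0.20, 0.10, 0.15, 0.05, 0.10, 0.15, 0.05, 0.20])
n = 500
V_p = (np.diag(p_hat) - np.outer(p_hat, p_hat)) / n

G = g_matrix(X, p_hat)
lvar_beta = G @ V_p @ G.T                        # sandwich form of (5.18)
print(np.allclose(G, X.T), lvar_beta.shape)
```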
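Given $\operatorname{L-Var}(\tilde{\mathbf{p}})$, it is desirable to find an overall measure of the variance reduction achieved by an IPF data adjustment. One method is suggested by the generalized Wald statistics discussed in Chapter Two. Let $\mathbf{m}' = (1, 1, \ldots, 1)$, a vector of $r = \prod_{k=1}^{s} J_k$ ones. Then, estimates of the variance matrices of $\hat{\mathbf{p}}$ and $\tilde{\mathbf{p}}$, denoted $\mathbf{V}(\hat{\mathbf{p}})$ and $\mathbf{V}(\tilde{\mathbf{p}})$, may be obtained by the methods of Section 2.4 and from (5.29) respectively. Then use of a weighted least squares algorithm can estimate an overall mean parameter, $b$, of the form
\[
\mathbf{p} \approx \mathbf{m} b. \tag{5.30}
\]
There exists a matrix $\mathbf{L}$, orthogonal to $\mathbf{m}$, such that
\[
\mathbf{L} \mathbf{p} \approx \mathbf{L} \mathbf{m} b = \mathbf{0} \tag{5.31}
\]
with an associated total variation statistic
\[
Q_i = \mathbf{p}' \mathbf{L}' (\mathbf{L} \mathbf{V}_i \mathbf{L}')^{-1} \mathbf{L} \mathbf{p}, \qquad i = 1, 2, \tag{5.32}
\]
where $\mathbf{V}_1 = \mathbf{V}(\hat{\mathbf{p}})$ and $\mathbf{V}_2 = \mathbf{V}(\tilde{\mathbf{p}})$.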
These $Q_i$ correspond to the total variation statistics in the GSK framework. If this statistic is considered in a superpopulation context, as mentioned in Section 5.2, then $Q_1$ has a Chi-squared distribution with $r - 1$ degrees of freedom and $Q_2$ has a Chi-squared distribution with d.f. $= (r - 1) - (r - u) = u - 1$. But more important, the ratio of $Q_2$ to $Q_1$ may be used to evaluate the proportion of variation removed by the adjustment procedure. This section concludes the discussion of general postsampling adjustments for variance reduction. However, it is of interest to examine the effect of ratio adjustments for variance reduction. This is the subject of the next section.
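A total variation statistic of the quadratic form in (5.32) can be computed as follows. This is a sketch under stated assumptions: the particular contrast matrix L (each of the first r-1 cells against the last) is one convenient choice of rows orthogonal to the vector of ones, and the proportions and variance matrix are hypothetical.

```python
import numpy as np

def total_variation_q(p, V):
    """Q = p' L' (L V L')^{-1} L p for a contrast matrix L whose rows are
    orthogonal to the vector of ones (here: cell i minus the last cell)."""
    r = len(p)
    L = np.hstack([np.eye(r - 1), -np.ones((r - 1, 1))])
    Lp = L @ p
    return float(Lp @ np.linalg.solve(L @ V @ L.T, Lp))

# Hypothetical proportions with a simple diagonal variance matrix.
p = np.array([0.5, 0.3, 0.2])
V = 0.01 * np.eye(3)
Q = total_variation_q(p, V)
print(round(Q, 4))
```

Evaluating the function once with the unadjusted variance matrix and once with the adjusted one gives the two statistics whose ratio the text proposes as a summary of the adjustment.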
5.4 Application to Poststratification
Some sample surveys are conducted in such a way that for single margins involving several variables complete information about the target population is available. For example, in a survey to determine health characteristics, the demographic characteristics for each of the survey strata may be completely known. Thus, the ratio of the known variables to their estimated values may be used to inflate the strictly estimated survey variables. This process is known as poststratification. Frequently, this inflation takes the form of a simple multiplicative adjustment (N.C.H.S. (1974), "Appendix I").
The poststratification procedure is in fact a simplified form of the IPF procedure discussed in Sections 5.2 and 5.3. What is done is to array the demographic variables as a single variable with a number of levels equal to the product of the number of levels in each of the known variables. Thus, the poststratification corresponds to a single iteration of the IPF algorithm. It immediately follows from the discussion in Section 5.3 that the Q statistic is increased to the extent of the elimination of sampling variance and measurement error. There is a corresponding reduction in the standard errors. A cautionary note should be observed in that poststratification, as with the IPF procedure, will induce bias in the estimate to the extent of error in the known variables.
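Poststratification as a single ratio adjustment can be sketched as below. The table layout (poststrata in rows, response categories in columns), the function name, and the numbers are hypothetical; the point is only that each row is inflated by the ratio of its known total to its estimated total, which is one IPF step against a single margin.

```python
import numpy as np

def poststratify(est, known_totals):
    """Scale each poststratum (row) so its total matches the known total.

    est[h, c] is the survey estimate for poststratum h and response
    category c; the returned factors are the multiplicative adjustments.
    """
    est = np.asarray(est, dtype=float)
    factors = np.asarray(known_totals, dtype=float) / est.sum(axis=1)
    return est * factors[:, None], factors

est = np.array([[30.0, 10.0],
                [45.0, 15.0]])
adjusted, factors = poststratify(est, known_totals=[50.0, 50.0])
print(adjusted)
print(factors)
```

Within each poststratum the relative distribution over response categories is unchanged; only the stratum total moves to the known value, so any error in the known totals passes directly into the adjusted estimates.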
The important point is that when poststratification, or for that matter IPF, is contemplated the Q statistics and their corresponding ratios may be computed to estimate the relative improvement in precision. Thus, assessments may be made and weighed against the possibility of inducing bias in the estimation process.
5.5 Complex Postsampling Models
Subsequent to the selection and analysis of a sample survey, substantive analysts often become aware of the possibility that the responses were in fact generated by an underlying stochastic process. These processes usually can be formalized into some type of statistical function. Two such functions are the multiple logit and the Weibull distributions. Each of these distributions has found wide acceptance in the biological and social sciences because of its flexibility and interpretability. The fitting of these distributions to survey data may easily be implemented in the framework of Chapter Two, as will be demonstrated. Here the multiple logit is considered in the context of multiple binomials and the Weibull is considered in the context of a bivariate distribution for a cross-sectional representative sample.
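Consider the situation where the survey responses are classified by two levels for each of three factors. This is again a $2^3$ design on the response space. Let $\pi_{ijk}$ denote the probability of a response at joint levels $(i, j, k)$ where $i = 1, 2$, $j = 1, 2$, $k = 1, 2$; and
\[
\sum_{i=1}^{2} \sum_{j=1}^{2} \sum_{k=1}^{2} \pi_{ijk} = 1.
\]
Then the multiple logit model is:
\[
\pi_{ijk} = \frac{1}{D} \exp\Big\{ \sum_{r=1}^{6} x^{(r)}_{ijk} \beta_r \Big\} \tag{5.32}
\]
where
\[
\begin{aligned}
x^{(1)}_{ijk} &= 1 \text{ if } k = 2, \; 0 \text{ otherwise;} \\
x^{(2)}_{ijk} &= 1 \text{ if } j = 2, \; 0 \text{ otherwise;} \\
x^{(3)}_{ijk} &= 1 \text{ if } i = 2, \; 0 \text{ otherwise;} \\
x^{(4)}_{ijk} &= x^{(1)}_{ijk} \, x^{(2)}_{ijk}; \qquad
x^{(5)}_{ijk} = x^{(1)}_{ijk} \, x^{(3)}_{ijk}; \qquad
x^{(6)}_{ijk} = x^{(2)}_{ijk} \, x^{(3)}_{ijk};
\end{aligned}
\]
and
\[
D = \sum_{i} \sum_{j} \sum_{k} \exp\Big\{ \sum_{r=1}^{6} x^{(r)}_{ijk} \beta_r \Big\}.
\]
Implicit in the model is the constraint $\sum_i \sum_j \sum_k \pi_{ijk} = 1$, and the eighth degree of freedom is reserved for a goodness of fit test. This goodness of fit test is equivalent to the test of no second order interaction in the sense of Bhapkar and Koch (1968a,b),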
(5.33)
This may be tested using the Q statistics discussed in Chapter Two. The formulation of Sections 5.2 and 5.3 in terms of models which correspond to maximum likelihood functions may also be used. The essential point is the use of maximum likelihood search procedures to estimate the $\beta_r$. The estimation of standard errors again requires the evaluation of $[\partial \mathbf{H}(\mathbf{p}) / \partial \mathbf{p}]$ where
\[
\ln \pi_{ijk} = \sum_{r=1}^{6} x^{(r)}_{ijk} \beta_r - \ln D. \tag{5.34}
\]
In matrix notation the solutions may be written
(5.35)
where the $\mathbf{p}$ are the observed proportions.
The covariance matrix of the parameters $\boldsymbol{\beta}$ may now be obtained from (5.23) and (5.18). In particular, after some algebra
[Equation (5.36): a matrix expression for $[\partial \mathbf{H}(\mathbf{p}) / \partial \mathbf{p}]_{\mathbf{p} = \boldsymbol{\pi}}$ whose entries are of the form $\pm 1/\pi_{1\cdot\cdot}$, $\pm 1/\pi_{2\cdot\cdot}$, $\pm 1/\pi_{\cdot 1 \cdot}$, $\pm 1/\pi_{\cdot 2 \cdot}$, $\pm 1/\pi_{\cdot\cdot 1}$, $\pm 1/\pi_{\cdot\cdot 2}$ and $\pm 1/(1 - \pi_{22\cdot})$, $\pm 1/(1 - \pi_{2 \cdot 2})$, $\pm 1/(1 - \pi_{\cdot 22})$. The arrangement of the entries is not recoverable.]
(5.36)
Thus, as with the IPF procedure, $\hat{\boldsymbol{\beta}}$ and $\mathbf{V}(\hat{\boldsymbol{\beta}})$ may be obtained by substituting $\hat{\mathbf{p}}$ for $\boldsymbol{\pi}$ in (5.36). This in turn yields the estimates of $\tilde{\mathbf{p}}$ and $\mathbf{V}(\tilde{\mathbf{p}})$ under the model, where $\tilde{\mathbf{p}}$ now refers to the logit model proportions.
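An asymptotically equivalent procedure is that of weighted least squares. The formulation is as follows. Let
\[
\mathbf{K} \ln (\mathbf{A} \boldsymbol{\pi}) = \mathbf{X} \boldsymbol{\beta} \tag{5.37}
\]
where $\mathbf{A} = \mathbf{I}_8$,
\[
\mathbf{K} = \begin{bmatrix}
-1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
-1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix},
\qquad
\mathbf{X} = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 \\
1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix},
\qquad
\boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \\ \beta_6 \end{bmatrix}.
\]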
Then $\boldsymbol{\beta}$ may be estimated directly from the weighted least squares procedures of Chapter Two and corresponding estimates of $\boldsymbol{\pi}$ obtained by inverting (5.37). Again, the only critical aspect is in obtaining estimates of $\mathbf{V}(\hat{\mathbf{p}})$ from some procedure such as BRR. Thus, it is apparent that functions with a simple exponential form, such as multiple logits, may be fitted to complex survey data.
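The weighted least squares fit of the formulation in (5.37) can be sketched as follows. The cell order (111, 112, 121, 122, 211, 212, 221, 222), the function name, and the generalized least squares formulas written out here are assumptions in the usual style of this methodology, not a transcription of Chapter Two. The covariance matrix passed in would come from a procedure such as BRR; the multinomial form in the usage lines is only a placeholder.

```python
import numpy as np

# Log contrasts of cells 112, ..., 222 against cell 111 (k varies fastest).
K = np.hstack([-np.ones((7, 1)), np.eye(7)])
X = np.array([[1, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 1, 0, 1, 0],
              [0, 1, 1, 0, 0, 1],
              [1, 1, 1, 1, 1, 1]], dtype=float)

def wls_logit(p, V_p):
    """Fit K ln(p) = X b by weighted least squares.

    Returns the estimate b, its estimated covariance matrix, and the
    residual (goodness of fit) statistic on 7 - 6 = 1 degree of freedom.
    """
    F = K @ np.log(p)
    J = K / p                          # K @ diag(1/p): linearization of F
    V_F = J @ V_p @ J.T
    W = np.linalg.inv(V_F)
    cov_b = np.linalg.inv(X.T @ W @ X)
    b = cov_b @ X.T @ W @ F
    resid = F - X @ b
    return b, cov_b, float(resid @ W @ resid)

# Usage with proportions generated exactly from the model (hypothetical values).
beta_true = np.array([0.2, -0.3, 0.5, 0.1, -0.2, 0.4])
eta = np.concatenate([[0.0], X @ beta_true])
pi = np.exp(eta) / np.exp(eta).sum()
V_placeholder = (np.diag(pi) - np.outer(pi, pi)) / 1000.0
b, cov_b, q_fit = wls_logit(pi, V_placeholder)
print(np.round(b, 4), round(q_fit, 6))
```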
The bivariate Weibull distribution represents a more complex function.
The Weibull distribution has been of interest to statisticians since its introduction to the literature, partly because of its usefulness in data which are thought to reflect either increasing or decreasing hazard functions. (For example, see Johnson and Kotz (1970) or Gehan and Siddiqui (1973).) If $t$ represents the time to the occurrence of an event of interest (e.g., a death or the detection of a tumor), then the Weibull cumulative distribution function may be written as:
\[
G(t \mid \lambda, \delta, w) = 1 - \exp\{ -\lambda (t - w)^{\delta} \} \quad \text{for } t \geq w \text{ and } \lambda, \delta > 0. \tag{5.38}
\]
As demonstrated in Freeman, Freeman, and Koch (1974) such models may easily be fitted in the usual weighted least squares framework. This may be seen by examining $\beta(t)$, which denotes the log-log of the survival rate. Specifically let
\[
\beta(t) = \ln[ -\ln\{ 1 - G(t \mid \lambda, \delta, w) \} ] = \ln(\lambda) + \delta \ln(t - w), \tag{5.39}
\]
where $\ln(\lambda)$ and $\delta$ are to be estimated and $w$ is to be fixed by the experimental situation. With this reparameterization $\beta(t)$ can easily be fitted, both for univariate and bivariate data.
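The linearization in (5.39) can be checked numerically: the log-log of the survival rate is a straight line in ln(t - w), with slope equal to the shape parameter and intercept equal to the log of the scale parameter. The parameter names `lam`, `delta`, and `w`, the values chosen, and the use of an unweighted line fit are illustrative assumptions; the text would weight the fit by an estimated covariance matrix.

```python
import numpy as np

lam, delta, w = 0.5, 1.5, 0.0                    # hypothetical parameter values
t = np.array([0.5, 1.0, 2.0, 3.0, 4.0])

G = 1.0 - np.exp(-lam * (t - w) ** delta)        # cumulative distribution function
loglog = np.log(-np.log(1.0 - G))                # log-log of the survival rate

# Straight-line fit of the transformed values against ln(t - w).
slope, intercept = np.polyfit(np.log(t - w), loglog, 1)
print(round(slope, 6), round(float(np.exp(intercept)), 6))
```

With exact values of G the fitted slope and exponentiated intercept reproduce the parameters used to generate them; with survey proportions in place of G the same line would be fitted by weighted least squares.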
Typically, the Weibull distribution is fitted to data which is
assumed to arise from a simple random sample.
However, the response
error model of Chapter Two and the variance estimation procedure of BRR
permit the fitting of this distribution to data arising from a complex
sample survey.
The only change would be that, instead of estimating Var(p)
based on the multinomial model assumption, V(p) would be based on the
BRR procedure.
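To make the substitution concrete, the sketch below computes a BRR covariance matrix for a vector of estimated totals from a design with two primary sampling units per stratum. It assumes the common form in which squared deviations of the half-sample estimates from the full-sample estimate are averaged over a balanced set of replicates. The hard-coded order-4 Hadamard matrix, the function names, and the data are illustrative assumptions, and the sketch handles at most three strata.

```python
# Order-4 Hadamard matrix; columns 1-3 give balanced half-sample patterns
# for up to three strata (column 0 is all +1 and is skipped).
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def half_sample_totals(strata, pattern):
    """Doubled totals from one PSU per stratum; +1 picks PSU 0, -1 picks PSU 1."""
    picks = [pair[0] if sign == 1 else pair[1] for pair, sign in zip(strata, pattern)]
    return [2.0 * sum(col) for col in zip(*picks)]


def brr_covariance(strata):
    """BRR covariance matrix of the vector of estimated totals."""
    full = [sum(col) for col in zip(*(psu for pair in strata for psu in pair))]
    reps = [half_sample_totals(strata, row[1:len(strata) + 1]) for row in H4]
    d = len(full)
    return [
        [sum((r[i] - full[i]) * (r[j] - full[j]) for r in reps) / len(reps)
         for j in range(d)]
        for i in range(d)
    ]


# Three strata, two PSUs each; every PSU carries made-up weighted totals (y, x).
strata = [
    ((10.0, 4.0), (12.0, 5.0)),
    ((7.0, 3.0), (9.0, 3.5)),
    ((15.0, 6.0), (11.0, 5.0)),
]
V = brr_covariance(strata)
```

For linear statistics such as these totals, a balanced set of half samples reproduces the usual paired-difference variance estimate. For nonlinear functions such as ratios, the same replicates would be passed through the function before the deviations are taken.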
5.6 Summary and Conclusions
This chapter was concerned with various postsampling adjustments
to complex survey data and the fitting of complex functions to
such data.
The IPF (iterative proportional fitting) procedure was
investigated and an approximation to the variance structure induced by
the procedure was developed.
Further, a statistic, Q, was suggested
for evaluating the overall variance reduction achieved by the procedure.
This statistic is particularly useful since it is a natural by-product
of the inference structure discussed in Chapter Two.
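For readers unfamiliar with IPF, the basic adjustment can be sketched as the classical alternating row and column scaling of a two-way table toward known marginal totals. This is a generic illustration only: it does not reproduce the variance approximation or the statistic Q developed in this chapter, and the function name `ipf`, the tolerance, and the example table are assumptions.

```python
def ipf(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Scale a two-way table of positive entries to match target margins."""
    fitted = [row[:] for row in table]
    for _ in range(max_iter):
        # Row step: rescale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(fitted[i])
            fitted[i] = [v * target / row_sum for v in fitted[i]]
        # Column step: rescale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in fitted)
            for row in fitted:
                row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows also match.
        if all(abs(sum(row) - t) < tol for row, t in zip(fitted, row_targets)):
            break
    return fitted


# Made-up sample table and known margins (both sets of targets sum to 100).
sample = [[20.0, 30.0], [25.0, 25.0]]
fitted = ipf(sample, row_targets=[60.0, 40.0], col_targets=[45.0, 55.0])
```

Each step multiplies whole rows or whole columns by a constant, so the cross-product ratio of the sample table is preserved while the margins are moved to their targets. Applying only one of the two steps corresponds to adjusting on a single margin.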
The discussion of IPF was followed by a discussion of the procedure for fitting more complex distributions to data.
The multiple
logit and bivariate Weibull distributions were examined in this context.
It was pointed out that once the distribution of interest had been reparameterized into the linear model framework, the only change from the
simple random sampling inference structure was in terms of estimating
the variance matrix.
This was shown to be easily accomplished by the
use of the half sample method in the BRR approach discussed in Chapter
Two.
Thus, it follows that the response error model in conjunction
with BRR permits the implementation of the weighted least squares
methodology for inference in the postsampling analysis of complex sample survey data.
CHAPTER SIX
CONCLUSIONS AND DIRECTION OF FURTHER RESEARCH
6.1 Conclusions
This paper has primarily been an empirical investigation of the
effects of various survey sampling estimation procedures on the inference structure induced by the weighted least squares methodology of
Grizzle, Starmer, and Koch.
In addition, related analytical results
were given for more complex procedures.
This was carried out within
the general framework of the U.S. Bureau of the Census response error
model, the CSM.
The results of the investigation may be briefly reviewed.
The empirical investigation contained two layers, embodied in
the two cross-classifications used.
The first layer carried the experiment out in the presence of only
sampling variance.
This was felt to
be the case for the cross-classification variables selected (age, race,
and sex) as well as for the response variable, physician visits (PV) over
the past year.
The recall errors on PV were minimized by accumulating two-week
experiences over 26 intervals.
The second layer repeated the experiment in the presence of limited
response error.
This was done by
using the same response with an income, residence, and education
cross-classification.
The literature review in Chapter One suggested the
presence of response error for the education and income cross-classification variables.
The experiment tested three hypotheses:
1. That BRR for ratios is consistent.
2. That poststratification reduces the variance of estimated
ratios to a significant degree.
3. That interdomain covariances do not significantly affect
inference procedures.
Hypotheses two and three were rejected and one was accepted in both
layers of the experiment.
The BRR hypothesis was tested by comparing
the BRR estimates to Taylor series (TS) estimates.
There were virtually
no differences between the two sets of estimates and in the context of
this experiment the TS estimates were shown to be consistent in Chapter
Two.
Thus, the consistency of the BRR procedure was shown for this set
of data.
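For reference, the two estimators being compared can be written out for a single ratio. The notation here ($\hat{Y}$ and $\hat{X}$ for estimated totals, $\hat{R}$ for their ratio, $\hat{R}_k$ for the estimate from the $k$-th of $K$ half samples) is assumed for illustration and may differ from that of Chapter Two. The first expression is the usual first-order Taylor series approximation for the variance of a ratio, and the second is one common form of the BRR estimate.

$$
\hat{R} = \frac{\hat{Y}}{\hat{X}}, \qquad
\widehat{\mathrm{Var}}_{TS}(\hat{R}) \approx \frac{1}{\hat{X}^{2}}
\left[ \widehat{\mathrm{Var}}(\hat{Y})
     - 2\hat{R}\,\widehat{\mathrm{Cov}}(\hat{Y}, \hat{X})
     + \hat{R}^{2}\,\widehat{\mathrm{Var}}(\hat{X}) \right],
\qquad
\widehat{\mathrm{Var}}_{BRR}(\hat{R}) = \frac{1}{K} \sum_{k=1}^{K} (\hat{R}_{k} - \hat{R})^{2}.
$$

Close agreement between the two quantities across domains is the kind of evidence on which the comparison above rests.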
Mathematical considerations such as those in Sections 5.2, 5.3,
and 5.4 suggest that poststratification reduces the variance of estimates by making the samples more representative of the national population and by accounting for nonresponse.
However, when the poststratified estimates were compared to the
nonpoststratified estimates, virtually no differences were found.
Furthermore, in some cases a slight
variance inflation due to poststratification was found.
Thus, it cannot be said that poststratification generally reduces
variance estimates.
The effect of assuming zero covariances among domain estimates
was the last objective of the investigation.
This assumption was found
to bias final models to a small extent and inflate estimated variances
for inference-based predicted values by as much as 25 per cent.
Analogously, there was a 30 per cent underestimate of the corresponding
test statistics.
Furthermore, large numbers of negative covariances were observed.
The combined result was that the zero assumption produced very
conservative inference procedures.
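The direction of this effect can be illustrated with a small numerical sketch. It computes the variance of an average of three domain estimates, first with a full covariance matrix containing negative covariances and then with the off-diagonal terms set to zero. The matrix, the coefficients, and the function names are made up for illustration and are not taken from the survey data.

```python
def variance_of_combination(coeffs, cov):
    """Return a' V a for coefficient vector a and covariance matrix V."""
    n = len(coeffs)
    return sum(coeffs[i] * cov[i][j] * coeffs[j] for i in range(n) for j in range(n))


def zero_off_diagonal(cov):
    """Copy of the covariance matrix with all interdomain covariances set to zero."""
    n = len(cov)
    return [[cov[i][j] if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical covariance matrix with negative interdomain covariances.
V_full = [
    [4.0, -1.0, -0.5],
    [-1.0, 3.0, -0.8],
    [-0.5, -0.8, 5.0],
]
a = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]  # simple average of the three domains

var_full = variance_of_combination(a, V_full)
var_zero = variance_of_combination(a, zero_off_diagonal(V_full))
inflation = var_zero / var_full
```

With negative covariances, dropping the off-diagonal terms overstates the variance of a sum or average, which is the conservative behaviour described above. For a difference of two domains the same assumption would instead understate the variance.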
The last stage of the experiment was to implement the GSK methodology for fitting final models.
This was easily done under each of the experiment's hypotheses.
As a result a compact description was
obtained for the underlying population from which the sample was
obtained.
Chapter Five was devoted to various analytical results suggested by the empirical investigation.
The IPF procedure for postsampling adjustment was characterized and an
evaluation statistic proposed.
Poststratification was shown to be a special case of the IPF
procedure.
Lastly, it was shown how the BRR procedure can be used for
the fitting of complex stochastic models to sample survey data.
It was
noted that it is important to account for the selection design in the
fitting of such models and that BRR in conjunction with the GSK weighted
least squares methodology does this in contexts where the half sample
procedure may be implemented.
6.2 Directions of Further Research
This paper has provided a number of insights into inference
procedures for sample survey data; however, it has also raised several
questions.
These may be formulated into three general directions for
further related research.
The most obvious direction is for empirical
studies of the effects of response error in the dependent variables.
This can be pursued by investigating response variables which are more
subject to measurement error than the physician visit variable of this
experiment.
One direction would be in terms of recall effects where
the respondent is asked to respond to the question for a six-month or
one-year period.
Another direction would be the investigation of
response variables, such as disability days, which would take on a
wider range of values.
A related line of empirical research would be in terms of different cross-classification variables.
Either different levels of the
variables examined in this study or different sets of variables could
be used.
In addition, it would be worthwhile to repeat this experiment
in different surveys.
However, empirical studies represent only some of the necessary
directions for further research.
Analytical proofs of the consistency
of the BRR methodology are very desirable but may prove to be intractable.
Thus, the comparison of the Taylor series approximation to the
BRR estimates can be used on selected cross-classifications of future
surveys to test the consistency of the BRR estimates in those particular
surveys.
This corresponds to the experimental validation that occurs
in the physical sciences.
A related area for analytical research is
the further development of the distributional properties of the IPF procedure characterized in Chapter Five.
For example, the statistical
properties of $Q_{PS}/Q_{NPS}$ are unknown.
Thus, the empirical investigation
of this paper suggests a number of fertile areas for further research.
BIBLIOGRAPHY
1.
Bailar, Barbara A. (1973), "A Common Problem in the Analysis
of Panel Data," Paper presented at the December 1973 Joint
Meetings of the Allied Social Science Associations, New
York, New York.
2.
Bhapkar, V.P., and Koch, Gary G. (1968a), "Hypotheses of 'No
Interaction' in Multidimensional Contingency Tables,"
Technometrics, 10:1, pp. 107-123.
3.
Bhapkar, V.P., and Koch, Gary G. (1968b), "On the Hypotheses
of 'No Interaction' in Contingency Tables," Biometrics,
24:3, pp. 567-594.
4.
Borus, Michael E. (1966), "Response Error in Survey Reports of
Earnings Information," Journal of the American Statistical
Association, 61:315, pp. 729-738.
5.
Borus, Michael E., and Nestel, Gilbert (1973), "Response Bias
in Report of Father's Education and Socioeconomic Status,"
Journal of the American Statistical Association, 68:344,
pp. 816-820.
6.
Cochran, W.G. (1963), Sampling Techniques, Second Edition.
John Wiley and Sons, Inc., New York, New York.
7.
Cornfield, Jerome (1944), "On Samples from Finite Populations,"
Journal of the American Statistical Association, 39,
pp. 236-239.
8.
Deming, W. Edwards (1960), Sampling Designs in Business Research,
John Wiley and Sons, New York, New York.
9.
Deming, W. Edwards, and Stephan, Frederick F. (1940), "On a
Least Squares Adjustment of a Sampled Frequency Table When
the Expected Marginal Totals are Known," Annals of
Mathematical Statistics, 11, pp. 427-444.
10.
Fienberg, Stephen E. (1970a), "The Analysis of Multidimensional
Contingency Tables," Ecology, 51:2, pp. 419-433.
11.
Fienberg, Stephen E. (1970b), "An Iterative Procedure for
Estimation in Contingency Tables," Annals of Mathematical
Statistics, 41:3, pp. 907-918.
12.
Forthofer, Ronald N. and Koch, Gary G. (1973), "An Analysis for
Compounded Functions of Categorical Data," Biometrics, 29:1,
pp. 143-157.
13.
Frankel, Martin R. (1971), Inference from Survey Samples.
The University of Michigan, Ann Arbor, Michigan.
14.
Freeman, Jr., Daniel H., Freeman, Jean L., and Koch, Gary G.
(1974), "A Modified χ² Approach for Fitting Weibull Models to
Synthetic Life Tables," Institute of Statistics Mimeo
Series No. 958, The Consolidated University of North Carolina,
Chapel Hill, North Carolina.
15.
Fuller, Wayne A. (1973), "Regression Analysis for Sample Surveys,"
unpublished manuscript.
16.
Gehan, Edmund A., and Siddiqui, M. (1973), "Simple Regression
Methods for Survival Time Studies," Journal of the American
Statistical Association, 68:344, pp. 848-856.
17.
Greenberg, Bernard G., Abernathy, James R., and Horvitz, Daniel
G. (1970), "A New Survey Technique and Its Application in
the Field of Public Health," Milbank Memorial Fund Quarterly,
48:2, pp. 39-55.
18.
Grizzle, James E., Starmer, C. Frank, and Koch, Gary G. (1969),
"Analysis of Categorical Data by Linear Models,"
Biometrics, 25:3, pp. 489-503.
19.
Hansen, Morris H., Hurwitz, William N., and Bershad, Max A.
(1961), "Measurement Errors in Censuses and Surveys,"
Bulletin of the International Statistical Institute, 38,
Part II, pp. 359-374.
20.
Hansen, Morris H., Hurwitz, William N., and Pritzker, Leon (1964),
"The Estimation and Interpretation of Gross Differences and the
Simple Response Variance," in C.R. Rao, ed., Contributions to
Statistics, Pergamon Press, Ltd., London, pp. 111-136.
21.
Hansen, Morris H., and Waksberg, Joseph (1970), "Research on
Non-Sampling Errors in Censuses and Surveys," Review of
the International Statistical Institute, 38:3, pp. 317-332.
22.
Hanson, Robert H., and Marks, Eli S. (1958), "Influence of the
Interviewer on the Accuracy of Survey Results," Journal
of the American Statistical Association, 53:283, pp. 635-655.
23.
Horvitz, D.G. and Thompson, D.J. (1952), "A Generalization of
Sampling Without Replacement from a Finite Universe,"
Journal of the American Statistical Association, 47:260,
pp. 663-685.
24.
Johnson, Norman L. and Kotz, Samuel (1970), Distributions
in Statistics: Continuous Univariate Distributions I,
Houghton Mifflin Co., Massachusetts.
25.
Kish, Leslie (1965), Survey Sampling, John Wiley and Sons, Inc.,
New York, New York.
26.
Kish, Leslie, and Frankel, Martin R. (1968), "Balanced Repeated
Replications for Analytical Statistics," Proceedings of
the Social Statistics Section of the American Statistical
Association, pp. 2-10.
27.
Kish, Leslie, and Frankel, Martin R. (1970), "Balanced
Repeated Replications for Standard Errors," Journal of the
American Statistical Association, 65:331, pp. 1071-1094.
28.
Koch, Gary G. (1969), "A Useful Lemma for Proving the Equality
of Two Matrices with Applications to Least Squares Type
Quadratic Forms," Journal of the American Statistical
Association, 64:327, pp. 969-970.
29.
Koch, Gary G. (1971), "A Response Error Model for Sub-Class
Means and Post-Stratified Means," Technical Report 116, SU-618,
Research Triangle Institute, Research Triangle Park, North
Carolina.
30.
Koch, Gary G., Freeman, Jr., Daniel H., and Freeman, Jean L. (1975),
"Strategies in the Multivariate Analysis of Data from
Complex Surveys," International Statistical Review, 43:1,
pp. 55-74.
31.
Koch, Gary G., and Lemeshow, Stanley (1972), "An Application
of Multivariate Analysis to Complex Sample Survey Data,"
Journal of the American Statistical Association, 67:340,
pp. 780-782.
32.
Koch, Gary G., and Tolley, H. Dennis (1975), "A Generalized
Modified-χ² Analysis of Categorical Bacteria Survival Data
from a Complex Dilution Experiment," Biometrics, 31:1,
pp. 59-92.
33.
Mahalanobis, P.C. (1946), "Recent Experiments in Statistical
Sampling in the Indian Statistical Institute," Journal
of the Royal Statistical Society, 109:4, pp. 325-370.
34.
McCarthy, Philip J. (1966), "Replication: An Approach to
the Analysis of Data from Complex Surveys," Vital and
Health Statistics, P.H.S. Pub. No. 1000 - Series 2 - No. 14,
National Center for Health Statistics, Public Health Service.
35.
McCarthy, Philip J. (1969a), "Pseudoreplication: Further
Evaluation and Application of the Balanced Half-Sample
Technique," in Vital and Health Statistics, P.H.S. Pub.
No. 1000 - Series 2 - No. 31, National Center for Health
Statistics, Public Health Service.
36.
McCarthy, Philip J. (1969b), "Pseudoreplication: Half
Samples," Review of the International Statistical Institute,
37:3, pp. 239-264.
37.
Miller, Rupert G. (1974), "The Jackknife - a Review,"
Biometrika, 61:1, pp. 1-16.
38.
Mosteller, Frederick (1968), "Association and Estimation in
Contingency Tables," Journal of the American Statistical
Association, 63:321, pp. 1-28.
39.
National Center for Health Statistics (1974), "Current Estimates
from the Health Interview Survey United States - 1973,"
Vital and Health Statistics - Series 10 - No. 95, DHEW
Pub. No. (HRA) 75-1522, Public Health Service, Rockville, Md.
40.
Neter, John and Waksberg, Joseph (1964), "A Study of Response
Errors in Expenditures Data from Household Interviews,"
Journal of the American Statistical Association, 59:305,
pp. 18-55.
41.
Simmons, Walter R. and Baird, Jr., James T. (1968), "Pseudoreplication in the NCHS Health Examination Survey," Proceedings
of the Social Statistics Section of the American Statistical
Association, pp. 19-30.
42.
Sudman, Seymour and Bradburn, Norman M. (1974), Response Effects
in Surveys. Aldine Publishing Company, Chicago, Illinois.
43.
Tolley, H. Dennis and Koch, Gary G. (1974), "A Two-Stage
Approach to the Analysis of Longitudinal Type Categorical
Data," Institute of Statistics Mimeo Series No. 962,
The Consolidated University of North Carolina, Chapel
Hill, North Carolina.
44.
Wald, Abraham (1943), "Tests of Statistical Hypotheses Concerning
Several Parameters When the Number of Observations is Large,"
Transactions of the American Mathematical Society, 54,
pp. 426-482.
45.
Wells, H. Bradley, Coulter, Elizabeth J., and Wienir, Linda S.
(1973), "Completeness and Quality of Response in the
North Carolina Marriage Follow-Back Survey," Vital and Health
Statistics - Series 2 - No. 56, DHEW Pub. No. (HSM) 73-1330,
Public Health Service, Rockville, Md.
46.
Wilk, M.B. and Kempthorne, O. (1955), "Fixed, Mixed, and
Random Models in the Analysis of Variance," Journal of
the American Statistical Association, 50:272, pp. 1144-1167.