EFFECTS OF MISCLASSIFICATION BIAS ON REGRESSION
ANALYSES OF EPIDEMIOLOGIC DATA

by

Susan J. Reade

Department of Biostatistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series

September 1986
EFFECTS OF MISCLASSIFICATION BIAS ON REGRESSION
ANALYSES OF EPIDEMIOLOGIC DATA
by Susan J. Reade
A Dissertation submitted to the faculty of The
University of North Carolina at Chapel Hill in
partial fulfillment of the requirements for
the degree of Doctor of Philosophy in the
Department of Biostatistics.
Chapel Hill
1986
Approved by:

Adviser
Abstract
SUSAN J. READE. Effects of Misclassification Bias on Regression
Analyses of Epidemiologic Data (Under the direction of Lawrence L.
Kupper.)
In most epidemiologic studies, the primary focus is on quantifying
the association between exposure to some suspected agent and the
development of some disease. Often, however, subjects are
misclassified as to their level of exposure. For instance, in the case
of a dichotomous exposure variable, a subject may be classified as
unexposed when she or he is really exposed, or vice versa. Ignoring
this misclassification error has been shown to introduce a bias into the
estimates of certain parameters.
The research reported in the literature concerning this problem has
dealt almost entirely with analyses involving dichotomous exposure
variables. Little research has been done regarding situations in which
there are more than two exposure categories. In addition, little
research has been done regarding the effect of exposure
misclassification in the presence of covariates; what has been done is
limited to situations involving one dichotomous covariate. No
research, thus far, has dealt with more complex (and realistic)
situations involving the use of regression analyses.
For situations in which there is misclassification of exposure in a
follow-up study with categorical data, we have developed a model which
can be used with logistic and Poisson regression procedures. This
model allows for an exposure variable with any number of categories,
as well as for any number of covariates. The model helps quantify the
biases involved when misclassification is ignored. When ancillary
information is available, this model can be used to correct for the bias
in the estimates produced by these regression procedures.
Acknowledgements
First and foremost, I would like to thank Larry Kupper, who has
been a great teacher, advisor, and friend. It was his influence which
led me to pursue a doctoral degree, and his support which helped me
complete it. I not only learned an enormous amount working with him,
but, more importantly, I enjoyed it as well.
I would also like to thank the other members of my committee for
their comments and signatures: Drs. Gary Koch, Dave Kleinbaum, Ed
Davis, and Harvey Checkoway.
There are many people who have offered me a great deal of support
and encouragement throughout the writing of this dissertation. Among
these are Jean Coble, whose door was always open; Elaine Walker, my
roommate; Lynn Shemanski, my racquetball partner; and Kathie, Jerry,
Melissa, and Christopher Edwards, my second family. Also included
are Nancy Lucas, George Jerdack, Dennis Cosmatos, Jerry Schindler,
and Marie Matteson, who made sure I didn't work too hard.
Finally, I would like to thank my family, and especially my parents,
David and Ruth Reade, for all the encouragement and support they have
given me throughout the years.
Table of Contents

Chapter

1  Literature Review, Notation, and Model Formation
   1.1  Introduction
   1.2  Notation
   1.3  Independence Assumption
   1.4  Proposed Model
   1.5  Case-Control and Cross-Sectional Studies
   1.6  Example Using Dichotomous Disease and Exposure Variables
        1.6.1  Misclassification Bias for Risk Differences
        1.6.2  Misclassification Bias for Risk Ratios
        1.6.3  Misclassification Bias for Odds Ratios
   1.7  Hypothesis Testing
   1.8  Covariates
        1.8.1  Covariates and Misclassification in the Literature
        1.8.2  Expanded Model
   1.9  Estimation of Parameters
   1.10 Matrix Representation of Model (1.16)
   1.11 Appropriateness of Models

2  Fitting and Extensions of Models
   2.1  Introduction
   2.2  WLS and ML Estimators for Poisson and Logistic Regression Procedures
   2.3  CATMAX Computer Procedure
   2.4  Odds Ratios and Incidence Density Ratios
   2.5  Models for Misclassification of Covariates
        2.5.1  Development of Models
        2.5.2  Illustration of Model (2.7)
        2.5.3  Generalization of Misclassification Models
   2.6  Model to Include Interaction
   2.7  Model to Reflect Ordinal Exposure Categories
   2.8  Nonconstant Misclassification Probabilities

3  Properties of Estimators
   3.1  Introduction
   3.2  Least Squares Estimation
        3.2.1  Bias of the β's and γ's when n = ñ
        3.2.2  Bias of the β's and γ's when n ≠ ñ
        3.2.3  Bias of Predicted Values
        3.2.4  Variance-Covariance Matrices for the β's and γ's
        3.2.5  Variance of the Uβ's
        3.2.6  Invariance of R² and the F Statistic
        3.2.7  Summary for LS Estimation
   3.3  WLS Estimation
        3.3.1  Bias of β_wls and γ_wls
        3.3.2  Bias of Predicted Values
        3.3.3  Goodness-of-Fit Statistic for WLS Estimation
        3.3.4  Variance-Covariance Matrices for β_wls and γ_wls
        3.3.5  Summary for WLS Estimation
   3.4  ML Estimation
        3.4.1  Bias of β_ml and γ_ml
        3.4.2  Variance-Covariance Matrices for β_ml and γ_ml
        3.4.3  Goodness-of-Fit Statistics for ML Estimation
        3.4.4  Summary for ML Estimation

4  Extension and Implications of Results
   4.1  Introduction
   4.2  Misclassification of Covariates
   4.3  Nonconstant Misclassification Probabilities
   4.4  Ordinal Exposure Categories
   4.4  Relationship Between β* and β°
   4.5  Bias Towards the Null of β_0k
   4.6  Relationship Between Variances
   4.7  Invariance of Test of H0: β1 = β2 = ... = βk = 0
   4.8  Relationship Between β* and β
   4.9  Removal of Restriction on Data for LS Estimation

5  Numerical Applications
   5.1  Introduction
   5.2  Conditions for Appropriateness of Models
        5.2.1  Logistic Regression
        5.2.2  Poisson Regression
   5.3  Numerical Examples
        5.3.1  Logistic Regression Example
        5.3.2  Poisson Regression Example
        5.3.3  Least Squares Regression Example
   5.4  Real Life Example
        5.4.1  Analysis Assuming Nominal Exposure Categories
        5.4.2  Analysis Assuming Ordinal Exposure Categories

6  Suggestions for Further Research

References
CHAPTER ONE
LITERATURE REVIEW, NOTATION,
AND MODEL FORMATION
1.1 Introduction

In most epidemiologic studies, the primary focus is on quantifying the association between exposure to some suspected agent and the development of some disease. To carry out such studies, data must be collected on subjects regarding their exposure and disease status, as well as on other risk factors for the disease under study. In some instances, obtaining the information on exposure may be a straightforward procedure. For instance, if exposure is "place of residence", acquiring the correct information would be relatively simple.

However, it is often the situation that the exposure level for an individual cannot be directly observed or measured; it may be technically infeasible or financially impractical. An example of the former situation is a cohort study examining the effect of asbestos in an occupational setting. It may be impossible to get perfect information on the level of exposure for each worker. Instead, an estimate could be formed by piecing together relevant information. There are other techniques, for instance the use of surrogate variables, which may be employed to estimate true exposure in certain situations.
Whenever the amount of true exposure is not being directly measured, there is opportunity for individuals to be assigned an incorrect level of exposure. In the case of categorical exposure variables, this error is termed "misclassification": a subject who belongs in one exposure category may be misclassified into another. This error is often ignored in the analysis of data. Not correcting for misclassification, however, can lead to biased estimates and invalid hypothesis tests concerning the true exposure-disease relationship.
Much has been said in the literature about the bias of certain estimators in various types of studies (Bross, 1954; Diamond and Lilienfeld, 1962; Barron, 1977; Gullen et al., 1968; Goldberg, 1975; Shy et al., 1978). Most of this work, however, pertains only to situations involving dichotomous exposure and disease variables. There are few results which generalize to any number of exposure categories.
The research done thus far regarding hypothesis testing in the presence of misclassification error has also been limited. All the results relate to the use of standard chi-square tests of independence. This implies that only overall tests comparing any exposure to no exposure have been examined for studies with more than two exposure levels. There are no guidelines for testing an association between exposure and disease involving two of several exposure categories, for instance medium vs. low or high vs. low, when there is misclassification error in the study. These types of tests would be of particular importance if the researcher was interested in modeling dose-response curves.
This thesis proposes a general model which can be applied to studies with any number of exposure categories. It explains mathematically the relationship between a parameter of interest estimated from the incorrect (i.e., misclassified) data, and that parameter estimated from the (hypothetical) correct data. Using this relationship, regular least squares methods or log-linear models using weighted least squares or maximum likelihood estimation can be employed to analyze data potentially involving exposure misclassification. Contrast tests can be used to test hypotheses concerning any or all of the parameters.
1.2 Notation

The theory presented in this thesis allows for situations involving two levels of disease status and any number of exposure levels or categories. Each subject in the study will have a classified exposure level and a true exposure level. These may or may not be the same. Events of interest are:

D = "a subject has or develops a given disease"
E_i = "a subject is assigned to exposure level or category i"
E_j* = "a subject is truly in exposure level or category j"
i, j = 0, 1, ..., k

Throughout this paper, a "*" indicates the true value of a variable, its value when there is no misclassification.
With each pair (i, j) of classified and true exposure categories, there are two related conditional probabilities. These will be referred to as the misclassification probabilities. One is the probability of being classified in category i given that one truly belongs in category j, pr(E_i | E_j*). For the situation in which there are just two exposure levels [exposed (subscript 1) and unexposed (subscript 0)], the two probabilities, pr(E_1 | E_1*) and pr(E_0 | E_0*), are referred to as the sensitivity and specificity, respectively.
The other conditional probability is the probability of truly belonging in category j given that one is classified in category i, pr(E_j* | E_i). Letting π_ij = pr(E_j* | E_i), for each i we have

Σ_{j=0}^{k} π_ij = 1.

Perfect classification will occur when π_ij = 1 for i = j, or, equivalently, π_ij = 0 for all i ≠ j.
Two parameters of interest in follow-up studies are the two measures of incidence of a given disease, risk and rate. Risk is the probability of developing the disease over the time period of interest, given that one does not die from another cause first. The incidence rate is the number of new occurrences of the disease per unit of time relative to the size of the population at risk. This dissertation will primarily focus on follow-up or cohort studies, but will also discuss case-control and cross-sectional studies.
Let

θ_ij = pr(D | E_i ∩ E_j*) = risk associated with having classified exposure level i and true exposure level j,
θ_i = pr(D | E_i) = risk associated with having classified exposure level i,
θ_j* = pr(D | E_j*) = risk associated with having true exposure level j,
λ_ij = incidence rate associated with having classified exposure level i and true exposure level j,
λ_i = incidence rate associated with having classified exposure level i,
λ_j* = incidence rate associated with having true exposure level j.
1.3 Independence Assumption

An assumption made for the remainder of this thesis is that disease is associated only with the true exposure variable, not the classified one. In other words, given the true exposure level, the risk or rate of disease is independent of the classified exposure level. In terms of the risk, this can be written as:

pr(D ∩ E_i | E_j*) = pr(D | E_j*) pr(E_i | E_j*).

But,

pr(D ∩ E_i | E_j*) = pr(D | E_i ∩ E_j*) pr(E_i | E_j*).

So this assumption is equivalent to

pr(D | E_j*) = pr(D | E_i ∩ E_j*), or θ_j* = θ_ij.   (1.1)

This says that the risk of disease for those with true exposure level j is the same for all classified exposure levels. When dealing with a follow-up study, this is a very reasonable assumption to make. Whether or not an individual develops the disease over the time period of interest should not be related to his or her classified exposure, only to the true exposure. Ignoring other risk factors, all individuals in the same true exposure category should have the same risk of disease, regardless of their classified exposure. Not only is this a reasonable assumption to make, it would be unreasonable not to make it.
Equation (1.1) can be shown to be equivalent to the condition known as "nondifferential misclassification". Nondifferential misclassification occurs, in the case of a dichotomous exposure variable, when the sensitivity and specificity are the same for the diseased and nondiseased populations. For the more general case of (k+1) exposure categories, it occurs when

pr(E_i | E_j* ∩ D) = pr(E_i | E_j* ∩ D̄),

which implies

pr(E_i | E_j*) = pr(E_i | E_j* ∩ D).   (1.2)

Using the above expression, we can see that

pr(E_i | E_j*) = [pr(E_i ∩ E_j* ∩ D) + pr(E_i ∩ E_j* ∩ D̄)] / [pr(E_j* ∩ D) + pr(E_j* ∩ D̄)]

= [pr(E_i ∩ E_j* ∩ D) + pr(E_i ∩ E_j* ∩ D) pr(E_j* ∩ D̄)/pr(E_j* ∩ D)] / [pr(E_j* ∩ D) + pr(E_j* ∩ D̄)]

= pr(E_i ∩ E_j* ∩ D) [pr(E_j* ∩ D) + pr(E_j* ∩ D̄)] / {pr(E_j* ∩ D) [pr(E_j* ∩ D) + pr(E_j* ∩ D̄)]}

= pr(E_i | E_j* ∩ D).
Now, pr(D | E_i ∩ E_j*) can be written as the following:

pr(D | E_i ∩ E_j*) = [pr(D | E_j*) pr(E_j*) pr(E_i | E_j* ∩ D)] / pr(E_i ∩ E_j*)

= pr(D | E_j*) [pr(E_i | E_j* ∩ D) / pr(E_i | E_j*)].

Examination of the above expression indicates that if equation (1.1) holds, then equation (1.2) will also hold, and vice versa. Therefore, the assumption of independence between the classified exposure and the risk of disease given the true exposure is equivalent to the assumption of nondifferential misclassification. Thus, the assumption of nondifferential misclassification seems to be the only reasonable assumption to make.
1.4 Proposed Model

When a researcher is confronted with misclassification, she or he has the ability to estimate a parameter based on the misclassified data and the desire to estimate that parameter based on the true data. For example, θ_i, the risk associated with having classified exposure level i, can generally be estimated directly using follow-up study data, while θ_j*, the risk associated with having true exposure level j, cannot. Some knowledge of the misclassification probabilities involved is required to estimate θ_j* unbiasedly.
There are various methods proposed in the literature for correcting misclassified data to obtain unbiased parameter estimates (Barron, 1977; Elton and Duffy, 1983; Copeland et al., 1977; Shy et al., 1978). However, these techniques are limited to studies involving dichotomous exposure and disease variables. No procedures have been suggested for studies with more than two exposure levels.

The following theory leads to a method which will give estimates of odds ratios and incidence density ratios for studies with any number of exposure categories. A mathematical relationship between the two risks described above, θ_i and θ_j*, is shown below. Use of this relationship assumes that the values of the π_ij's are known, and that there is nondifferential misclassification.

θ_i = pr(D | E_i)

= Σ_{j=0}^{k} pr(D ∩ E_j* | E_i)

= Σ_{j=0}^{k} pr(E_j* | E_i) pr(D | E_i ∩ E_j*)

= Σ_{j=0}^{k} π_ij θ_ij   (1.3)

Using equation (1.1) in equation (1.3), we get

θ_i = Σ_{j=0}^{k} π_ij θ_j*.   (1.4)
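Equation (1.4) says that each observed risk is a π-weighted average of the true risks. As a quick numerical illustration (a minimal sketch with made-up risks and misclassification probabilities, not data from the thesis):

```python
# Equation (1.4): theta_i = sum_j pi_ij * theta*_j.
# All numbers below are hypothetical, chosen only to illustrate the mixing.

def observed_risks(pi, theta_true):
    """Apply equation (1.4) row by row; pi[i][j] = pr(E_j* | E_i)."""
    return [sum(p * t for p, t in zip(row, theta_true)) for row in pi]

# Three exposure categories (k = 2); each row of pi sums to 1.
pi = [[0.90, 0.08, 0.02],
      [0.10, 0.80, 0.10],
      [0.05, 0.15, 0.80]]
theta_true = [0.05, 0.10, 0.20]   # true risks theta*_0, theta*_1, theta*_2

theta_obs = observed_risks(pi, theta_true)
# Mixing pulls the extreme categories toward the middle:
# theta_obs[0] > theta*_0 while theta_obs[2] < theta*_2.
```

Because each observed risk borrows from the other true risks, contrasts between the extreme categories are flattened; this is the attenuation quantified in Section 1.6.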
Summing over the j's in this expression is similar to the summing involved in defining a mixture of distributions. Extending this reasoning, let g be a function of the risk or rate of disease. Also, let g_i be that function for subjects classified into exposure level i, and g_j* be that function for subjects truly in exposure level j. Then π_ij can be considered to be the probability that the true value of the function for subjects in the i-th classified exposure level is g_j*.
As with the situation above, we can express g_i in the form:

g_i = Σ_{j=0}^{k} π_ij g_j*   (1.5)

Summing over the g_j*'s takes into account all the (k+1) possibilities for the true risk or rate function for subjects in the i-th classified exposure level. This leaves us with the weighted average, g_i, associated with the i-th classified exposure level.

There are two special cases of (1.5) which are of interest to us. First, let g = logit θ = ln[θ/(1−θ)], where θ is the risk of disease. Then we find:

logit θ_i = Σ_{j=0}^{k} π_ij logit θ_j*   (1.6)

Also, let g = ln λ, where λ is the incidence rate. This leads to

ln λ_i = Σ_{j=0}^{k} π_ij ln λ_j*   (1.7)

Expressions (1.6) and (1.7) are used later to develop regression models for risks and rates of disease.
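The two special cases can be sketched numerically. The sketch below (hypothetical π's, risks, and rates, not values from the thesis) applies (1.6) and (1.7) for a single classified level; note that the mixing happens on the logit and log scales, so the implied risk is not the simple π-weighted average of the true risks.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def mix(pi_row, g_true):
    """One row of (1.5): g_i = sum_j pi_ij * g*_j."""
    return sum(p * g for p, g in zip(pi_row, g_true))

theta_true = [0.05, 0.20]   # true risks theta*_0, theta*_1 (k = 1, hypothetical)
lam_true = [2.0, 8.0]       # true incidence rates (per 1000 person-years)
pi_row = [0.15, 0.85]       # pr(E_0*|E_1), pr(E_1*|E_1) for classified-exposed

logit_theta1 = mix(pi_row, [logit(t) for t in theta_true])   # equation (1.6)
theta1 = 1.0 / (1.0 + math.exp(-logit_theta1))               # back-transform

log_lam1 = mix(pi_row, [math.log(l) for l in lam_true])      # equation (1.7)
lam1 = math.exp(log_lam1)
# theta1 is about 0.165, below the arithmetic mixture
# 0.15*0.05 + 0.85*0.20 = 0.1775
```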
1.5 Case-Control and Cross-Sectional Studies

Case-control studies use the probability of exposure given disease, rather than the probability of disease given exposure, to measure the association between exposure and disease. In these studies, there is often a question of correct classification of disease status as well as of exposure. An important concern is whether subjects have been properly identified as cases or controls.

Examining the effect of misclassification of disease status on the probability of exposure given disease is analogous to examining the effect of misclassification of exposure on risk. Therefore, identity (1.4) can be used in case-control studies with nondifferential misclassification of disease simply by redefining the variables. The events of interest here are:

E_1* = "a subject is classified as, and truly is, exposed"
D_0 = "a subject is classified as a control (not diseased)"
D_1 = "a subject is classified as a case (diseased)"
D_0* = "a subject is truly a control"
D_1* = "a subject is truly a case"

Then,

pr(E_1* | D_i) = Σ_{j=0}^{1} pr(D_j* | D_i) pr(E_1* | D_j*),   i = 0, 1.
Therefore, results which stem from model (1.4) for risk and nondifferential misclassification of exposure will also hold for this situation. For instance, it is later noted that there is bias towards the null for the risk odds ratio with nondifferential misclassification of exposure. Bias towards the null for the exposure odds ratio with nondifferential misclassification of disease is an immediate consequence of this result.

Also, if there is nondifferential misclassification of exposure in a case-control study, a relationship exists between the parameter of interest based on the misclassified data and that based on the true data. Using (1.1), the relationship is the following:
pr(E_i | D) = Σ_{j=0}^{k} pr(E_i ∩ E_j* | D)

= Σ_{j=0}^{k} [pr(D | E_i ∩ E_j*) pr(E_i ∩ E_j*)] / pr(D)

= Σ_{j=0}^{k} [pr(D | E_j*) pr(E_i ∩ E_j*)] / pr(D)

= Σ_{j=0}^{k} pr(E_j* | D) pr(E_i | E_j*)   (1.8)

Notice that although this equation is similar to (1.4), the misclassification probabilities are different from those in (1.4). In the 2×2 case (k = 1), these are functions of the sensitivity and specificity. Also notice that the pr(E_j* | D)'s sum to 1. So, for pr(E_0 | D) in the 2×2 case, equation (1.8) simplifies to:

pr(E_0 | D) = pr(E_0* | D)(specificity) + [1 − pr(E_0* | D)](1 − sensitivity).
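The simplification can be checked directly. In this minimal sketch (hypothetical sensitivity, specificity, and case exposure prevalence), the general sum in (1.8) and the two-term shortcut below it agree:

```python
def pr_classified_given_disease(pr_true_given_d, pr_class_given_true):
    """Equation (1.8): pr(E_i|D) = sum_j pr(E_j*|D) * pr(E_i|E_j*)."""
    n = len(pr_true_given_d)
    return [sum(pr_true_given_d[j] * pr_class_given_true[j][i] for j in range(n))
            for i in range(n)]

spec, sens = 0.90, 0.80     # pr(E_0|E_0*), pr(E_1|E_1*) -- hypothetical values
p0_star = 0.70              # pr(E_0*|D): truly-unexposed fraction among cases

mixed = pr_classified_given_disease(
    [p0_star, 1.0 - p0_star],
    [[spec, 1.0 - spec],    # row j = true level, column i = classified level
     [1.0 - sens, sens]])

shortcut = p0_star * spec + (1.0 - p0_star) * (1.0 - sens)
# mixed[0] equals the shortcut, and the mixed probabilities still sum to 1.
```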
Many authors use these equations in some form even though they may not acknowledge them (Bross, 1954; Diamond and Lilienfeld, 1962; Keys and Kihlberg, 1963; Goldberg, 1975). The results obtained for model (1.8) are similar to the results obtained for the model (1.4) used for misclassification of exposure and measures of risk. For instance, various estimators have been found to be biased towards the null under both models. It is important to note that it is the combination of the measure of effect used (namely, the probability of disease given exposure or the probability of exposure given disease) and the misclassified variable (exposure or disease) that determines which of the models, (1.4) or (1.8), should be used in the analysis of the data.
In a cross-sectional study, the measure of effect is the prevalence of disease in the population. Since the prevalence of disease in the j-th true exposure group is pr(D | E_j*), all calculations and conclusions found for the risk of disease also apply to the prevalence.
1.6 Example Using Dichotomous Disease and Exposure Variables

Assuming there are two levels of disease status (D and D̄, diseased and not diseased), and there are two levels of exposure status (E_1 and E_0, exposed and unexposed), then k = 1 and

E_0 = "the event that a subject is assigned to the unexposed category",
E_1 = "the event that a subject is assigned to the exposed category",
E_0* = "the event that a subject is truly in the unexposed category",
E_1* = "the event that a subject is truly in the exposed category".

The misclassification probabilities involved are:

π_00 = pr(E_0* | E_0)
π_11 = pr(E_1* | E_1)
π_01 = 1 − π_00
π_10 = 1 − π_11
Nondifferential misclassification implies

specificity = pr(E_0 | E_0* ∩ D) = pr(E_0 | E_0* ∩ D̄), and
sensitivity = pr(E_1 | E_1* ∩ D) = pr(E_1 | E_1* ∩ D̄).

The risks involved are the following:

θ_0 = risk for persons classified as unexposed
θ_1 = risk for persons classified as exposed
θ_0* = risk for persons truly unexposed
θ_1* = risk for persons truly exposed

Using equation (1.4) we get:

θ_0 = π_00 θ_0* + (1 − π_00) θ_1* = π_00(θ_0* − θ_1*) + θ_1*   (1.9)

θ_1 = (1 − π_11) θ_0* + π_11 θ_1* = π_11(θ_1* − θ_0*) + θ_0*   (1.10)
1.6.1 Misclassification Bias for Risk Differences

Let RD = (θ_1 − θ_0) be the risk difference in the presence of misclassification and RD* = (θ_1* − θ_0*) be the true risk difference. From equations (1.9) and (1.10):

RD = π_11(θ_1* − θ_0*) + θ_0* + π_00(θ_1* − θ_0*) − θ_1*

= (π_11 + π_00 − 1)(θ_1* − θ_0*)

= (π_11 + π_00 − 1) RD*.   (1.11)

This is similar to a result shown by Newell (1962). His work was done in the context of case-control studies and he looked at the differences between the proportions of interest [pr(E_1 | D) and pr(E_1 | D̄)]. Newell noted that the true difference varied from the apparent difference by a factor of (sensitivity + specificity − 1) when there is nondifferential misclassification of exposure. This can be derived from equation (1.8) using the same mechanics used to derive (1.11).
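Equation (1.11) is easy to confirm numerically. In this sketch (all values hypothetical), the risk difference computed from the misclassified risks equals the true difference shrunk by (π_11 + π_00 − 1):

```python
def observed_rd(theta0_star, theta1_star, pi00, pi11):
    """Risk difference among classified groups, via equations (1.9)-(1.10)."""
    theta0 = pi00 * theta0_star + (1.0 - pi00) * theta1_star
    theta1 = (1.0 - pi11) * theta0_star + pi11 * theta1_star
    return theta1 - theta0

theta0_star, theta1_star = 0.10, 0.30   # hypothetical true risks; RD* = 0.20
pi00, pi11 = 0.85, 0.90

rd_obs = observed_rd(theta0_star, theta1_star, pi00, pi11)
rd_true = theta1_star - theta0_star
# rd_obs = (0.90 + 0.85 - 1) * 0.20 = 0.15, attenuated towards the null.
```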
Equation (1.11) is the risk-based counterpart of Newell's result. With a reasonably small degree of misclassification error so that ½ < π_00 < 1 and ½ < π_11 < 1, equation (1.11) implies bias towards the null of the estimated risk difference. Also, if (π_00 + π_11) < 1, then RD < 0 when RD* > 0.

Newell's result implies that there is bias towards the null when estimating [pr(E_1 | D) − pr(E_1 | D̄)], assuming the sensitivity and specificity are both greater than ½. Diamond and Lilienfeld (1962) present a proof in their Appendix which supports this result.
The article by Gullen et al. (1968) considers misclassification of both exposure and disease variables and the effect this has on the prevalence difference from a cross-sectional study. As was explained before, the theory developed for the prevalence difference also applies to the risk difference for a follow-up study.

These authors derive a formula relating the two prevalence differences involved: that from the misclassified and that from the correctly classified population. They find that the difference involving misclassification is equal to the true difference times a factor they call R. R is a function of the misclassification probabilities for exposure and disease, given in terms of pr(E_i | E_j*) and pr(D_i | D_j*). After intense algebraic manipulations, it can be shown that

R = (sensitivity + specificity − 1)(π_11 + π_00 − 1),

where sensitivity and specificity relate to misclassification of disease status and π_11 and π_00 relate to misclassification of exposure status. Thus, if there is no misclassification of exposure, this formula is identical to Newell's. If, on the other hand, there is no disease misclassification, this formula is identical to equation (1.11).
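The factorization of R can also be checked numerically. The sketch below (hypothetical probabilities) applies exposure misclassification via (1.9)-(1.10) and then disease misclassification to each group's prevalence; the resulting difference equals R times the true difference.

```python
def observed_prev_diff(theta0_star, theta1_star, pi00, pi11, sens_d, spec_d):
    # Exposure misclassification: equations (1.9)-(1.10).
    theta0 = pi00 * theta0_star + (1.0 - pi00) * theta1_star
    theta1 = (1.0 - pi11) * theta0_star + pi11 * theta1_star
    # Disease misclassification of each group's observed prevalence.
    p0 = sens_d * theta0 + (1.0 - spec_d) * (1.0 - theta0)
    p1 = sens_d * theta1 + (1.0 - spec_d) * (1.0 - theta1)
    return p1 - p0

# True prevalences 0.10 and 0.30, so the true difference is 0.20.
diff = observed_prev_diff(0.10, 0.30, pi00=0.85, pi11=0.90,
                          sens_d=0.95, spec_d=0.90)
R = (0.95 + 0.90 - 1.0) * (0.90 + 0.85 - 1.0)   # Gullen et al.'s factor
# diff equals R * (0.30 - 0.10) = 0.6375 * 0.20 = 0.1275.
```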
1.6.2 Misclassification Bias for Risk Ratios

As with the risk difference, it has been shown that the estimated relative risk is biased towards the null in the presence of nondifferential misclassification. Gladen and Rogan (1979) examine the effect of misclassification of exposure on the relative risk when there are any number of exposure levels. They show that the estimated relative risk comparing the highest and the lowest levels of exposure will always be biased towards the null. Below, a special case of this result is shown: when k = 1, the estimated relative risk will be biased towards the null. Further, they conclude that it is impossible to make any statements regarding the direction of the bias in estimating those relative risks which compare intermediate exposure levels with the lowest.

To prove bias towards the null when k = 1, let RR be the relative risk parameter estimated from the misclassified data, and let RR* be the true relative risk. Using (1.9) and (1.10),

RR = θ_1/θ_0 = [π_11(θ_1* − θ_0*) + θ_0*] / [π_00(θ_0* − θ_1*) + θ_1*],

and RR* = θ_1*/θ_0*.

Assuming θ_1* > θ_0*, RR must be less than RR* for there to be bias towards the null. It is possible for a switch-over to occur, i.e., for RR to be less than the null value of 1. A switch-over will only occur, however, when the degree of misclassification is very high. If π_00 and π_11 are both greater than ½, which we will assume is the case, a switch-over is not possible.

Bias towards the null of the risk ratio when RR* > 1 is demonstrated as follows. Cross-multiplying and factoring out (θ_1* − θ_0*) > 0 reduces the inequality to

RR < RR*  ⟺  π_11 θ_0* + π_00 θ_1* < θ_0* + θ_1*,

which is true since 0 < π_00, π_11 < 1. Therefore, RR is less than RR*, implying bias towards the null of the relative risk when k = 1.
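Numerically, the attenuation looks like this (again with hypothetical values, π_00 and π_11 both above ½ so no switch-over):

```python
def classified_risks(theta0_star, theta1_star, pi00, pi11):
    """Equations (1.9)-(1.10)."""
    theta0 = pi00 * theta0_star + (1.0 - pi00) * theta1_star
    theta1 = (1.0 - pi11) * theta0_star + pi11 * theta1_star
    return theta0, theta1

theta0_star, theta1_star = 0.10, 0.30   # hypothetical true risks; RR* = 3.0
pi00, pi11 = 0.85, 0.90

theta0, theta1 = classified_risks(theta0_star, theta1_star, pi00, pi11)
rr_obs = theta1 / theta0
rr_true = theta1_star / theta0_star
# 1 < rr_obs < rr_true: biased towards the null, but no switch-over below 1.
```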
1.6.3 Misclassification Bias for Odds Ratios

For a case-control study with nondifferential misclassification of the exposure variable, Diamond and Lilienfeld (1962) show that the estimated exposure odds ratio is biased towards the null. Using equations (1.9) and (1.10), it can be shown that this is also the case for a risk odds ratio from a follow-up study under the same circumstances. As with the relative risk, a switch-over is not possible assuming π_00 and π_11 are both greater than ½.
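The same numerical setup shows the odds-ratio attenuation (hypothetical risks and π's as in the risk-ratio sketch):

```python
def odds(p):
    return p / (1.0 - p)

theta0_star, theta1_star = 0.10, 0.30   # hypothetical true risks
pi00, pi11 = 0.85, 0.90                 # both above 1/2, so no switch-over

# Classified risks via equations (1.9)-(1.10).
theta0 = pi00 * theta0_star + (1.0 - pi00) * theta1_star
theta1 = (1.0 - pi11) * theta0_star + pi11 * theta1_star

or_obs = odds(theta1) / odds(theta0)
or_true = odds(theta1_star) / odds(theta0_star)
# 1 < or_obs < or_true: the risk odds ratio is also attenuated towards the null.
```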
1.7 Hypothesis Testing

Several authors have examined the effect of nondifferential misclassification error on tests of independence. Bross (1954) began by considering a test equivalent to a test of the difference between proportions in case-control situations, pr(E_1 | D) − pr(E_1 | D̄), with misclassification of exposure. He found that the significance level is not altered by the presence of misclassification, but that the power is reduced.

The work of Gladen and Rogan (1979) deals with a test of independence for a 2×k table. The two-level factor corresponds to disease status and the k-level factor corresponds to exposure, the misclassified variable. The authors show that the Type I error rate is not affected by nondifferential misclassification but that, again, power is reduced. Mote and Anderson (1965) show this result to hold for any size contingency table.
1.8 Covariates

Invariably, there will be other risk factors for the disease, in addition to the primary exposure variable of interest, which must be considered in the analysis. Two possibilities for adjusting for these covariates are the use of stratification and the use of modeling. The literature reviewed in this section, dealing with the effects of misclassification in the presence of covariates, relies solely on stratification as the means of adjustment. The disadvantage in the use of stratification is that, for a large number of extraneous variables and a large number of categories for each extraneous variable, the numbers in the various strata will be small, thus providing the atmosphere for imprecise results. Modeling is an alternative which is more robust to the instability problem caused by small stratum-specific sample sizes.

In the second part of this section, the theory in Section 1.4 is expanded to include covariates. Ultimately, this expanded model will be used in a regression analysis setting.
1.8.1 Covariates and Misclassification in the Literature

Greenland (1980) illustrates, by use of examples involving 2×2 tables, the effects of nondifferential misclassification of the exposure variable alone, of a confounder C alone, and of the two together. In the first of these cases, he shows that it is possible for the OR's across the strata to differ, even though the OR*'s are homogeneous. This will occur, Greenland explains, only if the covariate is associated with the classified exposure variable. It is also possible to mask heterogeneity of the OR*'s. In other words, effect modification may spuriously appear to be present or absent. This phenomenon occurs with the risk difference as well.

From results for 2×2 tables already stated, it follows that the summary OR will be biased towards the null. Greenland points out that this is also true for the risk difference.
Greenland and Robins (1985) add that if the covariate is
associated only with exposure, and not with disease nor with the
rate of misc1assification, then, if (sensitivity + specificity)
>1,
adjusting for C will increase the bias due to misclassification. A
proof of this is provided in their AppendiX.
For a case in which a covariate is associated only with the
true exposure variable, the standardized and crude OR's will be
equally biased towards the null. If the covariate itself is
nondifferentially misclassified, but is not a confounder, it will
produce no bias in either the standardized or the crude OR.

Another situation examined by Greenland and Robins involves
exposure variable misclassification rates which differ between
two strata, but not between the diseased and nondiseased
populations. This could occur, as is illustrated in an example, if
two different methods were used to obtain information on exposure.
Each method corresponds to a stratum and would presumably have
a unique sensitivity and specificity. In this situation, adjustment
for the covariate may or may not provide a more accurate estimate
of OR* than analyzing the crude data. It is possible that
adjusting will increase overall bias.
The authors also illustrate a result of Korn (1981) concerning
hypothesis testing: if there is misclassification as described
above and no other biases are involved, then under the hypothesis of
no exposure-disease relationship within each stratum, the
adjusted chi-square test for independence will have a
large-sample significance level no greater than the true
large-sample significance level.
Finally, these results can be carried over to a regression
situation. Suppose that the covariate is associated with the
classified exposure variable only, and that the exposure variable
is misclassified; then, since the adjusted OR is more biased than
the crude, if covariates are introduced into the regression model,
the estimate of OR* will be more biased than if the covariates
were excluded from the model. If misclassification rates differ
over the levels of the covariate, the degree of effect modification
may be misrepresented. This implies that the estimated
interaction coefficients may be biased.
The effects of misclassification of the confounder are also
examined by Greenland. If C is a confounder such that ignoring
its presence in the analysis will lead to crude odds ratios which
are greater than the adjusted odds ratios obtained by controlling for
C, then the stratum-specific and summary OR's will be higher
than the corresponding OR*'s. If, on the other hand, ignoring C
produces lower crude odds ratios than the adjusted odds ratios,
then the OR's will be lower than the OR*'s. In other words, the
bias will be towards the null if C has a negative confounding
effect, and away from the null if C has a positive confounding
effect.
It is also possible for the misclassification to result in a
masking of OR* heterogeneity, or to produce a spurious or
exaggerated appearance of heterogeneity. This distortion is not as
easily predicted as the one described above.
The final case involves the misclassification of both the
exposure variable and the confounder. Here, Greenland assumes
that the misclassifications are independent, as well as being
nondifferential. Under this situation, any type of distortion is
possible; there may be a bias of the stratum-specific OR's towards
or away from the null, or no bias at all. As a last note,
Greenland says that all of the results stated above for the
effects of misclassification on the odds ratio apply to the
relative risk as well.
1.8.2 Expanded Model
Suppose that the covariates of interest can be represented by a
vector of p covariate values, v = (v_1, v_2, ..., v_p). Also, for our
purposes, we will need to assume that all covariates are
categorical. Any continuous variable can be made categorical,
in which case there may be some loss of information.

Each subject will fall into one of the strata defined by the
combinations of categories for the grouped covariates. Each
stratum will correspond to a particular set of covariate values.
Suppose, for instance, that there are three categorical variables,
age, sex and race, and that these variables have been categorized
such that there are five age categories, two race categories, and
two sex categories. Then the total number of strata will be 20,
each stratum corresponding to a different age-race-sex category
combination.
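Counting the strata in this example amounts to enumerating the category combinations; a minimal sketch, in which the category labels are hypothetical placeholders (only the counts 5 x 2 x 2 come from the text):

```python
from itertools import product

# Hypothetical category labels; only the counts (5 x 2 x 2) are from the text.
ages = ["age1", "age2", "age3", "age4", "age5"]
races = ["race1", "race2"]
sexes = ["female", "male"]

# Each stratum corresponds to one age-race-sex category combination.
strata = list(product(ages, races, sexes))
print(len(strata))  # 20 strata
```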
Let θ_is be the probability of disease development for a set of
N_is individuals classified into the i-th exposure category and the
s-th stratum, s = 1, 2, ..., S. Let θ_js* be the probability of disease
development for those individuals with true exposure category j,
classified into the s-th stratum. Using expression (1.5) and
defining g_is = logit θ_is(v_s), where θ_is is as defined above and
v_s' = (v_s1, v_s2, ..., v_sp) is the set of covariate values corresponding
to the s-th stratum, we find that

    logit θ_is(v_s) = Σ_{j=0}^{k} π_ij logit θ_js*(v_s) .    (1.12)

This implies that expression (1.6) holds within each of the S strata.
It is important to notice that as we expand our model to hold
within each stratum, as we are doing under model (1.12), we are
inherently expanding the assumption of nondifferential
misclassification to hold within each stratum. The nondifferential
misclassification assumption discussed in Section 1.3 is equivalent
to assuming that θ_j* = θ_ij, which in turn gives

    θ_i = Σ_{j=0}^{k} π_ij θ_j* .

This model form was then generalized to that of model (1.6).
Therefore, if we claim that model (1.6) holds within each stratum,
we are claiming that θ_js* = θ_ijs for all s = 1, 2, ..., S, or that
nondifferential misclassification holds within each stratum.
Using similar notation, covariates can be included in model
(1.7), which involves the rates of disease occurrence. Let λ_is
be the rate of disease development for the i-th classified exposure
category and s-th stratum, and λ_js* be the rate of disease
development for the j-th true exposure category and s-th stratum.
By using expression (1.5) and now setting g_is = ln λ_is(v_s), we find

    ln λ_is(v_s) = Σ_{j=0}^{k} π_ij ln λ_js*(v_s) .    (1.13)

Again, model (1.13) implies that model (1.7) holds within each of
the S strata. This requires that the nondifferential
misclassification assumption holds within each stratum.
In the following section, we will show how logistic and
Poisson regression methods can be used in conjunction with
expressions (1.12) and (1.13), respectively, for modeling and
estimation purposes.
1.9 Estimation of Parameters
Logistic regression is often used in modeling risk. It is based
on fitting the logistic model

    θ_js*(v_s) = { 1 + exp( -β_a* - β_j* - v_s'γ* ) }^{-1}    (1.14)

for j = 0, 1, ..., k, where θ_js*(v_s) is a probability, in this case a
risk, for the j-th true exposure level and the stratum
corresponding to the s-th set of p covariate values
v_s' = (v_s1, v_s2, ..., v_sp). The vector γ* = (γ_1*, γ_2*, ..., γ_p*)'
is the set of regression coefficients for the covariates. β_a* is the
reference value for the group in the lowest exposure category and
first stratum. β_j* is the regression coefficient for the j-th true
exposure level. β_0* = 0, since the effect of the lowest exposure
category is already included in the β_a* term.

From (1.14) it follows that

    logit θ_js*(v_s) = β_a* + β_j* + v_s'γ* ,   j = 0, 1, ..., k.    (1.15)
Substituting expression (1.15) into model (1.12), we get

    logit θ_is(v_s) = β_a* + Σ_{j=1}^{k} π_ij β_j* + v_s'γ*
                    = π_i'β* + v_s'γ* ,    (1.16)

where π_i' = (1, π_i1, ..., π_ik) and β* = (β_a*, β_1*, ..., β_k*)'.
The term j = 0 drops out of the summation since β_0* = 0.

Our objective is to obtain accurate (i.e., small bias and
variance) estimates of the β_j*'s. Weighted Least Squares (WLS)
or Maximum Likelihood (ML) methods can be used to fit model
(1.16).
In the modeling of rates, Poisson regression is frequently
used. It is based on fitting the model

    λ_js*(v_s) = exp( β_a* + β_j* + v_s'γ* ) ,    (1.17)

where λ_js*(v_s) is the rate for the j-th true exposure level and the
s-th stratum, and β_a*, β_j*, v_s, and γ* are as defined earlier.
The use of the exponential function in this model assures that
estimated rates will be non-negative. From (1.17), it follows that

    ln λ_js*(v_s) = β_a* + β_j* + v_s'γ* .

Substituting this expression into model (1.13), we get

    ln λ_is(v_s) = β_a* + Σ_{j=1}^{k} π_ij β_j* + v_s'γ*
                 = π_i'β* + v_s'γ* .    (1.18)

Model (1.18) can be fit by WLS or ML methods to obtain an
estimate of β*.
Ordinary (unweighted) least squares methods can also be
applied to a situation in which the dependent variable (say Y) is
continuous (e.g., blood pressure). In this case, the covariates
need not be categorical variables. The subscript s can refer to
an individual subject rather than to a particular stratum. The
same general model form would be fit, namely

    E(Y_is) = π_i'β* + v_s'γ* .    (1.19)
1.10 Matrix Representation of Model (1.16)
In this section, model (1.16) will be expressed in matrix
notation as an aid to further understanding. Model (1.16) can be
written as

    logit θ = Πβ* + Vγ* ,

where

    logit θ = (logit θ_01, logit θ_11, ..., logit θ_k1, ..., logit θ_0S, logit θ_1S, ..., logit θ_kS)'

is S(k+1)x1, Π is the S(k+1)x(k+1) matrix formed by stacking S
copies of the (k+1)x(k+1) submatrix

          | 1  π_01 ... π_0k |
    Π_1 = | 1  π_11 ... π_1k |
          | .    .        .  |
          | 1  π_k1 ... π_kk | ,

β* = (β_a*, β_1*, ..., β_k*)' is (k+1)x1, V is the S(k+1)xp matrix
whose first (k+1) rows each equal v_1' = (v_11, v_12, ..., v_1p),
whose next (k+1) rows each equal v_2', and so on through
v_S' = (v_S1, v_S2, ..., v_Sp), and γ* = (γ_1*, γ_2*, ..., γ_p*)' is
px1. The first (k+1) rows of logit θ, Π, and V correspond to the
first stratum, and the last (k+1) rows to the last stratum. Notice
that Π is a series of vertically concatenated identical submatrices,

        | Π_1 |
    Π = |  .  |
        | Π_1 | .

This definition of Π_1 will be used in the theory of later chapters.

Let us also define Π⁰ to be that Π matrix which represents
perfect classification of exposure. When Π = Π⁰, then Π_1 is a matrix
with 1's down the first column and main diagonal and 0's
elsewhere.
As an illustration, suppose there are two exposure categories
(i.e., k = 1) and two covariates, race and age. Further, suppose
there are two levels of race and three levels of age. The number of
covariate variables needed explicitly in the model to represent race
is one, and the number needed to represent age is two. Therefore,
a total of three covariate variables is needed, i.e., p = 3. There
is a total of six strata, i.e., S = 6.

Suppose that the strata are defined as follows. The vector of
covariate values for stratum s is v_s' = (v_s1, v_s2, v_s3), where

    v_s1 = 1 for the second level of race, 0 for the first level;
    v_s2 = 1 for the second level of age, 0 otherwise;
    v_s3 = 1 for the third level of age, 0 otherwise.

The parameters are interpreted as follows. β_a* is the
reference value for the group with no exposure in the first age and
first race categories. β_1* is the exposure effect. γ_1* is the effect
for the second race category. γ_2* and γ_3* are the effects for the
second and third age categories, respectively.
Now,

    logit θ = (logit θ_01, logit θ_11, logit θ_02, logit θ_12, ..., logit θ_06, logit θ_16)' ,

Π is the 12x2 matrix consisting of six stacked copies of

    Π_1 = | 1  π_01 |
          | 1  π_11 | ,

β* = (β_a*, β_1*)' ,

        | v_1' |   | 0 0 0 |
        | v_1' |   | 0 0 0 |
        | v_2' |   | 0 1 0 |
        | v_2' |   | 0 1 0 |
        | v_3' |   | 0 0 1 |
    V = | v_3' | = | 0 0 1 | ,
        | v_4' |   | 1 0 0 |
        | v_4' |   | 1 0 0 |
        | v_5' |   | 1 1 0 |
        | v_5' |   | 1 1 0 |
        | v_6' |   | 1 0 1 |
        | v_6' |   | 1 0 1 |

and

    γ* = (γ_1*, γ_2*, γ_3*)' .
The equation logit θ = Πβ* + Vγ* expands to the following twelve equations:

    logit θ_01 = β_a* + π_01 β_1*
    logit θ_11 = β_a* + π_11 β_1*
    logit θ_02 = β_a* + π_01 β_1* + γ_2*
    logit θ_12 = β_a* + π_11 β_1* + γ_2*
    logit θ_03 = β_a* + π_01 β_1* + γ_3*
    logit θ_13 = β_a* + π_11 β_1* + γ_3*
    logit θ_04 = β_a* + π_01 β_1* + γ_1*
    logit θ_14 = β_a* + π_11 β_1* + γ_1*
    logit θ_05 = β_a* + π_01 β_1* + γ_1* + γ_2*
    logit θ_15 = β_a* + π_11 β_1* + γ_1* + γ_2*
    logit θ_06 = β_a* + π_01 β_1* + γ_1* + γ_3*
    logit θ_16 = β_a* + π_11 β_1* + γ_1* + γ_3*

The vector of parameters θ' = (θ_01, θ_11, θ_02, θ_12, ..., θ_06, θ_16) is
estimated directly from the follow-up study data. θ̂_is is the
observed proportion of people in cell (i,s) who develop the disease in
question during the follow-up period. The π_ij's must be specified.
β_a*, β_1*, γ_1*, γ_2*, and γ_3* are to be estimated using logistic
regression methods.
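The design rows (π_i', v_s') for this example can be assembled directly; a minimal sketch, in which the π values are hypothetical placeholders (the π_ij's must be supplied by the analyst):

```python
# Stratum covariate vectors v_s' for the race (2 levels) x age (3 levels) example.
V1 = [
    (0, 0, 0),  # stratum 1: first race, first age
    (0, 1, 0),  # stratum 2: first race, second age
    (0, 0, 1),  # stratum 3: first race, third age
    (1, 0, 0),  # stratum 4: second race, first age
    (1, 1, 0),  # stratum 5: second race, second age
    (1, 0, 1),  # stratum 6: second race, third age
]
pi_01, pi_11 = 0.04, 0.7333  # hypothetical misclassification probabilities

design = []
for vs in V1:                     # strata s = 1, ..., 6
    for pi_i1 in (pi_01, pi_11):  # classified exposure categories i = 0, 1
        design.append((1.0, pi_i1) + vs)  # design row (pi_i', v_s')
print(len(design), len(design[0]))  # 12 rows, 2 + 3 = 5 columns
```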
1.11 Appropriateness of Models
Model (1.4) describes a relationship which will hold under any
situation involving misclassification of exposure. However,
models (1.6) and (1.7) are not models which will hold in
every situation. The relationships described by these models make
certain claims concerning the relationships between the distorted and
true odds ratios and the distorted and true rate ratios, respectively.
When the claimed relationships hold, the models are completely
appropriate. When the claimed relationships hold "approximately",
the models will generally perform well.
According to model (1.4), the following relationship holds
in a situation involving (k+1) exposure categories:

    θ_i - θ_0 = Σ_{j=1}^{k} (π_ij - π_0j)(θ_j* - θ_0*)

or,

    RD_i = Σ_{j=1}^{k} (π_ij - π_0j) RD_j* ,   for i = 1, 2, ..., k,    (1.20)

where RD_i = (θ_i - θ_0) is the risk difference comparing the i-th and
the lowest classified exposure categories and RD_j* = (θ_j* - θ_0*) is
the risk difference comparing the j-th and the lowest true exposure
categories.

When k = 1, expression (1.20) becomes

    RD = (π_11 + π_00 - 1) RD* ,

as we saw in Section 1.6.1.
Model (1.6), which uses the parameter logit θ rather than θ
itself, implies

    logit θ_i - logit θ_0 = Σ_{j=1}^{k} (π_ij - π_0j)(logit θ_j* - logit θ_0*)

or,

    ln OR_i = Σ_{j=1}^{k} (π_ij - π_0j) ln OR_j* ,   for i = 1, 2, ..., k,    (1.21)

where OR_i = [θ_i(1-θ_0)/θ_0(1-θ_i)] is the odds ratio comparing the i-th
and the lowest classified exposure categories, and
OR_j* = [θ_j*(1-θ_0*)/θ_0*(1-θ_j*)] is the odds ratio comparing the
j-th and the lowest true exposure categories. Therefore, when k = 1,
model (1.6) implies that RD = (π_11 + π_00 - 1) RD* and
ln OR = (π_11 + π_00 - 1) ln OR*, so that the same relationship which holds
between the true and distorted risk differences holds between the
natural logs of the true and distorted odds ratios.
Model (1.7) uses the parameter ln λ rather than θ in the model
form of model (1.4). This model implies

    ln λ_i - ln λ_0 = Σ_{j=1}^{k} (π_ij - π_0j)(ln λ_j* - ln λ_0*)

or,

    ln IDR_i = Σ_{j=1}^{k} (π_ij - π_0j) ln IDR_j* ,   for i = 1, 2, ..., k,    (1.22)

where IDR_i = (λ_i/λ_0) is the incidence density ratio comparing
the i-th and the lowest classified exposure categories and
IDR_j* = (λ_j*/λ_0*) is the incidence density ratio comparing the
j-th and the lowest true exposure categories. When k = 1, expression
(1.22) simplifies to

    ln IDR = (π_11 + π_00 - 1) ln IDR* .
Since model (1.4) holds in any situation involving nondifferential
misclassification of exposure, the relationship described in
expression (1.20) will also hold in any such situation. If model
(1.6) is appropriate, then the relationship described in expression
(1.21) also holds. If model (1.7) is appropriate, then the
relationship described in expression (1.22) holds.

The theory developed in Chapters 2, 3, and 4 assumes that models
(1.16) and (1.18) are appropriate to a given situation. The results
from those chapters are valid as long as this is the case. Model
(1.16) will hold whenever model (1.6) holds within each stratum.
Likewise, model (1.18) will hold whenever model (1.7) holds within
each stratum. In Chapter 5, conditions are examined under which
these models will, in fact, hold well. For now, however, let us
assume that the appropriateness of these models is not an issue; let
us assume that the models fit perfectly.
CHAPTER TWO
FITTING AND EXTENSION OF MODELS
2.1 Introduction

In Chapter 1, we developed models for different types of
response variables in situations involving misclassification of
exposure in the presence of covariates. In this chapter, we will
examine several methods for estimating the parameters in those
models. Specifically, we propose the use of weighted least squares
and maximum likelihood estimators based on logistic regression
methods for model (1.16), Poisson regression methods for model
(1.18), and regular least squares estimators for model (1.19). The
formulas for the WLS and ML estimators used with logistic and
Poisson regression methods are given.

Following this, the theory underlying these three models is
extended to other situations involving misclassification error.
These include situations involving misclassification of covariates
with and without misclassification of exposure. In addition, models
are developed which include interaction terms, as well as models
which can be used to fit a functional trend in the parameters.
Finally, a model is given for situations in which misclassification
probabilities differ from stratum to stratum.
2.2 WLS and ML Estimators for Poisson and Logistic Regression Procedures
Poisson and logistic regression methods are statistical
techniques used to analyze data involving counts assumed to follow
Poisson and binomial distributions, respectively. In such cases,
the variances of the response variables will not be constant, as is
assumed in standard (unweighted) least squares regression.
Therefore, weighted least squares or maximum likelihood
estimation must be used. Weighted least squares and
maximum likelihood estimation procedures are asymptotically
equivalent. However, when cell sizes are small (e.g., < 5),
maximum likelihood methods are recommended (Imrey et al.,
1981).
Poisson regression is used to study the relationship
between observed rates and a set of explanatory variables. One
type of Poisson regression is based on fitting log-linear models
such as model (1.18). In this case, each rate is estimated by
a count (or number of incidences of disease development) divided
by the amount of population-time at risk.

Within each classified exposure level and stratum combination
(i.e., within each (i,s) cell), the observed count is assumed to be a
Poisson random variable. Further, each variable is assumed to be
independent of the others, so that the joint distribution of the data
is a product Poisson likelihood. Let X_is be the observed count
(or number of occurrences) in the i-th classified exposure category
and s-th stratum, and let L_is be the total amount of population-time
at risk in the i-th classified exposure category and s-th
stratum, i = 0, 1, ..., k, s = 1, 2, ..., S. Then, X_is is assumed to
follow a Poisson distribution with mean μ_is = L_is λ_is. Since
μ̂_is = X_is, then λ̂_is = X_is / L_is.
The WLS estimates for model (1.18) are derived using the
following formula in Grizzle, Starmer, and Koch (1969):

    (β̂*_wls', γ̂*_wls')' = [ (Π,V)' D_X (Π,V) ]^{-1} (Π,V)' D_X Y ,    (2.1)

where D_X is a diagonal matrix of dimension S(k+1) with the
counts X = (X_01, X_11, ..., X_kS)' on the main diagonal, and
Y = (Y_01, Y_11, ..., Y_kS)' is an S(k+1)x1 column vector with
Y_is = ln(λ̂_is), where λ̂_is = X_is / L_is. D_X^{-1} is the
estimated covariance matrix of Y.
The maximum likelihood equations are based on the product
Poisson likelihood for the data. Since these equations have no
explicit solution, an iterative process is necessary to obtain the
maximum likelihood estimates. One such process is the
Newton-Raphson method. A recommended initial set of parameter
estimates for this iteration procedure is the set of weighted least
squares estimates given above in equation (2.1). The successive
estimates are then obtained as the process changes the l-th step
estimate, (β̂*_l', γ̂*_l')', to the (l+1)-th step estimate,
(β̂*_{l+1}', γ̂*_{l+1}')', using the formula

    (β̂*_{l+1}', γ̂*_{l+1}')' = (β̂*_l', γ̂*_l')' + [ (Π,V)' Σ̂_l^{-1} (Π,V) ]^{-1} (Π,V)' ( X - μ̂_l ) ,

where μ̂_is,l = L_is exp( π_i'β̂*_l + v_s'γ̂*_l ),
μ̂_l = (μ̂_01,l, μ̂_11,l, ..., μ̂_kS,l)' is the vector of predicted counts
from the l-th estimate, and Σ̂_l = D_μ̂l^{-1}, where D_μ̂l, the
inverse of the l-th step estimated covariance matrix of Y, is a
diagonal matrix with the predicted counts μ̂_l down the main
diagonal. The estimate of the asymptotic covariance matrix of the
final estimates is [ (Π,V)' Σ̂^{-1} (Π,V) ]^{-1}, evaluated at those
estimates.

The iteration process continues until a specified "closeness"
criterion is met, e.g., when the difference between two
sequential sets of estimates is less than some preset value, or
until a predetermined number of iterations has been
performed. The final set of estimates calculated by the iteration
procedure are the maximum likelihood estimates, β̂*_ml and γ̂*_ml.
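The iteration above can be sketched in pure Python, specialized to a two-column design matrix (one classified exposure contrast, no covariates); the counts and person-time below are hypothetical, not data from the text:

```python
import math

def fit_poisson_2col(Z, X, L, iters=30):
    """Newton-Raphson (Fisher scoring) for the log-linear model
    mu = L * exp(Z b), specialized to a two-column design matrix Z."""
    b = [0.0, 0.0]
    for _ in range(iters):
        mu = [l * math.exp(b[0] * z[0] + b[1] * z[1]) for z, l in zip(Z, L)]
        # score U = Z'(X - mu); information I = Z' D_mu Z
        u0 = sum(z[0] * (x - m) for z, x, m in zip(Z, X, mu))
        u1 = sum(z[1] * (x - m) for z, x, m in zip(Z, X, mu))
        i00 = sum(z[0] * z[0] * m for z, m in zip(Z, mu))
        i01 = sum(z[0] * z[1] * m for z, m in zip(Z, mu))
        i11 = sum(z[1] * z[1] * m for z, m in zip(Z, mu))
        det = i00 * i11 - i01 * i01
        b = [b[0] + (i11 * u0 - i01 * u1) / det,
             b[1] + (-i01 * u0 + i00 * u1) / det]
    return b

# Hypothetical data: two classified exposure cells, one stratum.
Z = [(1.0, 0.04), (1.0, 0.7333)]  # design rows pi_i' = (1, pi_i1)
X = [20, 30]                      # observed counts
L = [1000.0, 800.0]               # person-time at risk
b = fit_poisson_2col(Z, X, L)
# The model is saturated here (2 cells, 2 parameters), so the fit
# reproduces the observed log rates: b solves ln(X_i/L_i) = b0 + pi_i1*b1.
print(b)
```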
Logistic regression is used to analyze the relationship
between an observed proportion and a set of explanatory
variables. It is based on fitting linear logistic models such as
model (1.16). Let X_is be the number of subjects developing the
disease in the i-th classified exposure category and s-th stratum,
and let N_is be the total number of subjects in the i-th exposure
category and s-th stratum. Since each subject has a binary
response (either she or he develops the disease or does not), X_is is
assumed to follow a binomial distribution with mean μ_is = N_is θ_is.
The distributions for all of the exposure category and stratum
combinations are assumed to be mutually independent. Therefore,
the likelihood of the data is assumed to be a product binomial
distribution.
The weighted least squares estimates are derived using the
following formula in Grizzle, Starmer, and Koch (1969):

    (β̂*_wls', γ̂*_wls')' = [ (Π,V)' D_t^{-1} (Π,V) ]^{-1} (Π,V)' D_t^{-1} Y ,

where D_t is a diagonal matrix of dimension S(k+1) with the values
t = (t_01, t_11, ..., t_kS)' down the main diagonal, where
t_is = X_is^{-1} + (N_is - X_is)^{-1}, and Y = (Y_01, Y_11, ..., Y_kS)',
where Y_is = ln[ X_is / (N_is - X_is) ] = logit θ̂_is. D_t is the
estimated covariance matrix of Y.
As with the Poisson regression case, the maximum likelihood
equations for this situation have no explicit solution. The
Newton-Raphson iteration process used here produces the following
formula for the (l+1)-th step of the iteration process:

    (β̂*_{l+1}', γ̂*_{l+1}')' = (β̂*_l', γ̂*_l')' + [ (Π,V)' Σ̂_l^{-1} (Π,V) ]^{-1} (Π,V)' D_N ( p - p̂_l ) ,

where Σ̂_l is the l-th step estimated covariance matrix of Y, D_N is
a diagonal matrix with the cell sizes N_is down the main diagonal,
p is the vector of observed proportions, and p̂_l is the l-th step
vector of predicted proportions. The iteration process continues
until a specified criterion is met. The final set of estimates
calculated by the iteration process are the maximum likelihood
estimates, β̂*_ml and γ̂*_ml.
2.3 CATMAX Computer Procedure

CATMAX is a set of statements within PROC MATRIX, which
is itself a part of the SAS computer package. It produces
weighted least squares and maximum likelihood estimates for
log-linear models, including models based on both product Poisson
and binomial likelihoods. The formulas used to calculate the
weighted least squares estimates are given in the previous
section, as are the formulas for the maximum likelihood estimates.
The iteration procedure ends when either the difference between
two sequential estimates is less than .0005 or eight iterations
have been performed, whichever comes first.

CATMAX provides not only the WLS and ML estimates of β*
and γ*, but also the vector of predicted counts, μ̂. Goodness-of-fit
is tested for the WLS estimates by Q_W (the Wald chi-square
statistic), and for the ML estimates by Q_P (Pearson's chi-square
statistic) and by Q_log (the log-likelihood ratio statistic). The
number of degrees of freedom for each of these statistics is equal
to the number of (i,s) cells minus the number of parameters
estimated, or ((S-1)(k+1) - p) degrees of freedom. Also, all
three statistics address the hypothesis that the variation in the
counts is compatible with the fitted model.
An option available in the CATMAX procedure is the use of
contrast statements for hypothesis testing. A hypothesis of the
form

    H_0: C (β*', γ*')' = 0

can be tested using the Wald statistic

    [ C (β̂*_ml', γ̂*_ml')' ]' [ C Vâr_a(β̂*_ml, γ̂*_ml) C' ]^{-1} [ C (β̂*_ml', γ̂*_ml')' ] ,

where Vâr_a(β̂*_ml, γ̂*_ml) is the estimated asymptotic covariance
matrix for (β̂*_ml', γ̂*_ml')'.
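For a single contrast (C of dimension 1x2), the Wald statistic reduces to a scalar computation; the estimates and covariance matrix in this sketch are hypothetical:

```python
# Wald statistic (C b)' [C Var C']^{-1} (C b) for a single 1x2 contrast,
# with hypothetical estimates and asymptotic covariance matrix.
b = [-2.19, 0.85]                     # (beta_a*, beta_1*), hypothetical
var = [[0.04, -0.03], [-0.03, 0.09]]  # hypothetical covariance matrix
C = [0.0, 1.0]                        # tests H0: beta_1* = 0

cb = C[0] * b[0] + C[1] * b[1]
cvc = sum(C[i] * var[i][j] * C[j] for i in range(2) for j in range(2))
W = cb * cb / cvc  # compare to a chi-square with 1 degree of freedom
print(round(W, 2))  # 8.03
```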
2.4 Odds Ratios and Incidence Density Ratios

When dealing with logistic regression, the parameters β_1*,
β_2*, ..., β_k* are related to the true odds ratios in the following way.
Using model (1.15), we can see that

    ln OR_js* = logit θ_js*(v_s) - logit θ_0s*(v_s)
              = (β_a* + β_j* + v_s'γ*) - (β_a* + v_s'γ*)
              = β_j* ,   j = 1, 2, ..., k, s = 1, 2, ..., S,

where OR_js* is the odds ratio comparing the j-th true and the
lowest true exposure categories in the s-th stratum. Since no
interaction is assumed, exp(β_j*) is the adjusted odds ratio
comparing the j-th and lowest true exposure categories (i.e.,
exp(β_j*) = OR_j*). Also, the hypothesis H_0: β_j* = 0, for j = 1, 2, ..., k,
is equivalent to the hypothesis H_0: OR_j* = 1.

Similarly, in the case of Poisson regression, the β_j*'s for
j = 1, 2, ..., k are related to the true incidence density ratios. Using
model (1.17), we can see that the following relationships hold:

    ln IDR_js* = ln λ_js*(v_s) - ln λ_0s*(v_s)
               = (β_a* + β_j* + v_s'γ*) - (β_a* + v_s'γ*)
               = β_j* ,   j = 1, 2, ..., k, s = 1, 2, ..., S,

where IDR_js* is the incidence density ratio comparing the j-th true and
the lowest true exposure categories in the s-th stratum. Since no
interaction is assumed, exp(β_j*) = IDR_j* does not depend on s; IDR_j*
is the adjusted incidence density ratio comparing the j-th and lowest
true exposure categories.
As an illustration of the use of logistic regression with model
(1.16), consider the following example.

    Table 2.1. True Population

              D      D̄
    E_1*     12      48
    E_0*     14     126

    Table 2.2. Classified Population

              D      D̄
    E_1      13      62
    E_0      13     112

    Table 2.3. True vs. Classified Population

             E_1*    E_0*
    E_1       55      20
    E_0        5     120
e
Table
2. 1
reflects the population when exposure
is
perfectly classified. Table 2.2 reflects the population when
exposure is imperfectly classified with sensitivity =.9167 and
specificity =.8571 in both the diseased and nondiseased
populations.
•
Table 2.3 can be constructed using the values of the
sensitivity and specificity along with the marginal totals from
Table 2.1.
calculated:
From this table the values of the 7T
ij
's can be
7Too= 120/125 = .96
7Tu= 55/75 = :7333
Note that these are the true values of the 7Tij 's which, in
--
this example, can be calculated from the known true population
and known values of sensitivity and specificity.
Normally,
however, thiS information would not be known; the 7T 's would be
ij
estimated and only the classified population, i.e., the values in
Table 2.2, would be available.
Since there are no covariates in this problem, the design
matrix will simply be

    Π = | 1  .04   |
        | 1  .7333 | .

Using model (1.16) and logistic regression procedures, the
following maximum likelihood estimates are calculated:

    β̂_a* = -2.18767
    β̂_1* = .852971

From this analysis, the estimate of the odds ratio is equal
to exp(β̂_1*) = 2.34, while the true odds ratio calculated from the
values in Table 2.1 is 2.25. We can see in this example that the
estimate of OR_1* obtained assuming model (1.6) is close to, but
not equal to, OR_1*. This is due to the situation discussed in
Section 1.11. We saw there that, while model (1.4) is always
appropriate, the appropriateness of model (1.6) varies from
situation to situation. Had model (1.6) been entirely appropriate
here, the relationship ln OR = (π_11 - π_01) ln OR* would have held
exactly. Instead, OR_1, the odds ratio based on the classified
population, is equal to 1.81, while exp[(π_11 - π_01) ln OR_1*] = 1.75.
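The odds ratios in this example can be checked directly from Tables 2.1 and 2.2:

```python
import math

# Tables 2.1 and 2.2: (diseased, nondiseased) counts.
true_e1, true_e0 = (12, 48), (14, 126)  # E1*, E0*
cls_e1, cls_e0 = (13, 62), (13, 112)    # E1,  E0

or_true = (true_e1[0] * true_e0[1]) / (true_e1[1] * true_e0[0])  # OR1* = 2.25
or_cls = (cls_e1[0] * cls_e0[1]) / (cls_e1[1] * cls_e0[0])       # OR1 approx 1.81
print(round(or_true, 2), round(or_cls, 2))

# Under model (1.6) with k = 1, ln OR = (pi_11 - pi_01) ln OR* would hold exactly.
pi_11, pi_01 = 55 / 75, 5 / 125
print(round(math.exp((pi_11 - pi_01) * math.log(or_true)), 2))  # 1.75
```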
2.5 Models for Misclassification of Covariates

In this research, the models which have been developed thus far
accommodate only situations in which just one variable, the
exposure variable, is misclassified. Situations may arise,
however, in which more than one variable is misclassified.
These include the possibilities of misclassification of
any number of covariates along with misclassification of exposure,
and the possibilities of misclassification of any number of
covariates alone.
2.5.1 Development of Models

First, let us examine the problem of misclassification of
covariates with no misclassification of the exposure variable in
terms of model (1.16). Similar conclusions will hold for models
(1.18) and (1.19). The events of interest here are:

    T_s = "a subject is assigned to stratum s"
    T_r* = "a subject is truly in stratum r"
    E_i* = "a subject is assigned to, and is truly in, exposure category i"
    D = "a subject has or develops a given disease"
Perfect classification of exposure implies that, if a subject is
assigned to category i, that subject is truly in category i.
Therefore, E_i, defined earlier as the event in which a subject is
assigned to exposure category i, is equivalent to the event E_i* in
which a subject is truly in exposure category i. In a situation
involving perfect classification of exposure, both E_i and E_i*
represent the event in which a subject is assigned to, and is truly
in, exposure category i. Likewise, if there is no misclassification
of covariates, T_s and T_s* are equivalent events.

Since the strata are defined by combinations of levels of all the
covariates, if one covariate is misclassified, there is a possibility
for a subject to be assigned to the incorrect stratum. Incorrect
stratum assignment will also occur if more than one covariate is
misclassified. The possibility of a subject's being assigned to the
wrong stratum is the general result when any number of covariates
are misclassified. Therefore, the results reached here will hold
for a situation in which there is at least one covariate
misclassified and no misclassification of exposure.
Let φ_sr = pr( T_r* | T_s ), s, r = 1, 2, ..., S. This is the
misclassification probability which is analogous to π_ij, the
misclassification probability involved when there is
misclassification of exposure. Now, with

    θ_isr = pr( D | T_r* ∩ E_i* ∩ T_s )   and   φ_sri = pr( T_r* | E_i* ∩ T_s ) ,

we find

    θ_is = pr( D | E_i* ∩ T_s ) = Σ_{r=1}^{S} pr( D ∩ T_r* | E_i* ∩ T_s )
         = Σ_{r=1}^{S} pr( D | T_r* ∩ E_i* ∩ T_s ) pr( T_r* | E_i* ∩ T_s )
         = Σ_{r=1}^{S} θ_isr φ_sri
         = Σ_{r=1}^{S} θ_isr φ_sr ,    (2.1)

assuming that φ_sri = pr( T_r* | E_i* ∩ T_s ) = pr( T_r* | T_s ) = φ_sr
for all i = 0, 1, ..., k, that is, that the φ_sr's are constant over all
exposure categories. This assumption is equivalent to an assumption
concerning the independence of exposure and true covariate levels,
conditional on the classified covariate levels. We can see this
through the following.
The condition

    pr( T_r* | E_i* ∩ T_s ) = pr( T_r* | T_s )

can be rewritten as

    pr( T_r* ∩ T_s ∩ E_i* ) / pr( T_s ∩ E_i* ) = pr( T_r* ∩ T_s ) / pr( T_s ) ,

or equivalently,

    pr( T_r* ∩ T_s ∩ E_i* ) / pr( T_r* ∩ T_s ) = pr( T_s ∩ E_i* ) / pr( T_s ) ,

that is, pr( E_i* | T_r* ∩ T_s ) = pr( E_i* | T_s ). It follows that

    pr( E_i* | T_s ) pr( T_r* | T_s ) = pr( E_i* | T_r* ∩ T_s ) pr( T_r* | T_s )
                                      = pr( E_i* ∩ T_r* | T_s ) .

This shows us that the assumption of constant φ_sr's over all
exposure categories is equivalent to the assumption that, given the
classified covariate levels, exposure and true covariate levels are
independent. This may not always be the case, though. We will
consider, later in this chapter, a model for situations in which
the φ_sr's differ over exposure categories. For now, however,
let us consider only situations in which they do not differ.
Under the situation involving misclassification of exposure, we
made the assumption of nondifferential misclassification of
exposure, which stated that pr( E_i | E_j* ) is equal for the diseased
and nondiseased populations, for all i, j = 0, 1, ..., k. In expanding our
model to include covariates, the nondifferential misclassification
assumption was inherently expanded to the assumption of
nondifferential misclassification within each stratum, or
equivalently, to the assumption that pr( E_i | E_j* ∩ T_s* ) is equal
for the diseased and nondiseased groups for all i, j = 0, 1, ..., k
and s = 1, 2, ..., S.

Now we make this assumption in terms of the misclassification
of the covariates; namely, we assume that pr( T_s | T_r* ∩ E_i* ) is
equal for the diseased and nondiseased groups for all s, r = 1, 2, ..., S
and i = 0, 1, ..., k. This assumption can be written as

    pr( T_s | T_r* ∩ D ∩ E_i* ) = pr( T_s | T_r* ∩ D̄ ∩ E_i* ) ,

which implies pr( T_s | T_r* ∩ D ∩ E_i* ) = pr( T_s | T_r* ∩ E_i* ).
(See the derivation in Section 1.3, which used E_i and E_j* in place
of T_s and T_r*.) Since

    θ_isr = pr( D | T_s ∩ T_r* ∩ E_i* )
          = pr( D | T_r* ∩ E_i* ) [ pr( T_s | T_r* ∩ D ∩ E_i* ) / pr( T_s | T_r* ∩ E_i* ) ] ,

nondifferential misclassification of the covariates within each
exposure category is equivalent to

    pr( D | T_s ∩ T_r* ∩ E_i* ) = pr( D | T_r* ∩ E_i* ) ,    (2.2)

or θ_isr = θ_ir*, where θ_ir* = pr( D | T_r* ∩ E_i* ).
Substituting expression (2.2) into model (2.1) leads to

    θ_is = Σ_{r=1}^{S} θ_ir* φ_sr .    (2.3)

Model (2.3) should look familiar. It is very similar to
model (1.4), developed under the situation involving
misclassification of exposure. θ_ir*, the risk for exposure
category i and true stratum r, is analogous to θ_js*; φ_sr is
analogous to π_ij; and summing over the r's is analogous to
summing over the j's. In fact, the final models we will develop
for misclassified covariates modify the design matrix for the
covariates in much the same way that the models from Chapter 1
modify the design matrix for the misclassified exposure variable.

We argued in Chapter 1 that model form (1.4) is valid not only
for risks, but also for functions of risks and rates. In
particular, we looked at the equality in terms of the logit of the
risks and the natural log of the rates. Here, the same argument can
be applied to claim that model form (2.3) is valid not only for the
risks, θ_is and θ_ir*, but for any function of the risks or rates,
in particular for the logit of the risks and the natural log of the
rates.
Looking at this model in terms of the logit of the risks
we find
S
logit a iS = }: cPsr logit air* .
r=1
(2.4)
As with θ_js* in Chapter 1, we model θ_ir* using a logistic model:

θ_ir* = { 1 + exp( -β_a* - β_i* - v_r'γ* ) }⁻¹        (2.5)

where v_r' = ( v_r1, v_r2, ..., v_rp ) is the set of covariate values corresponding to the r-th true stratum and γ* = ( γ_1*, γ_2*, ..., γ_p* )' is the set of regression coefficients for the correctly classified covariates. β_a* is the predicted reference value for the group with the lowest exposure level in the first stratum, and β_i* is the incremental effect of exposure level i over the lowest exposure level. Therefore, β_0* = 0 by definition.
From expression (2.5) it follows that

logit θ_ir*(v_r) = β_a* + β_i* + v_r'γ* .        (2.6)

Substituting expression (2.6) into model (2.4), we find

logit θ_is(v_s) = Σ_{r=1}^S φ_sr ( β_a* + β_i* + v_r'γ* )

               = β_a* + β_i* + [ Σ_{r=1}^S φ_sr v_r' ] γ*

               = β_a* + β_i* + φ_s'V₁γ*        (2.7)

where φ_s' = ( φ_s1, φ_s2, ..., φ_sS ) and

      | v_11  v_12  ⋯  v_1p |
V₁ =  | v_21  v_22  ⋯  v_2p |
      |  ⋮     ⋮         ⋮  |
      | v_S1  v_S2  ⋯  v_Sp | .
Models similar to model (2.7) can be developed to model incidence rates and continuous response variables when there is misclassification of one or more covariates. Such a model for rates is

ln λ_is(v_s) = β_a* + β_i* + φ_s'V₁γ* .        (2.8)

Likewise, the model for a continuous response variable Y is

E(Y_is) = β_a* + β_i* + φ_s'V₁γ* .        (2.9)

In matrix notation, model (2.7) can be written as

logit θ = Πβ* + ΦVγ*

where logit θ, Π, β*, V, and γ* are as defined in Section 1.10, and
      | φ_11  0 ⋯ 0   φ_12  0 ⋯ 0   ⋯   φ_1S  0 ⋯ 0 |
      | φ_11  0 ⋯ 0   φ_12  0 ⋯ 0   ⋯   φ_1S  0 ⋯ 0 |
Φ  =  |  ⋮                                           |
      | φ_S1  0 ⋯ 0   φ_S2  0 ⋯ 0   ⋯   φ_SS  0 ⋯ 0 |
      | φ_S1  0 ⋯ 0   φ_S2  0 ⋯ 0   ⋯   φ_SS  0 ⋯ 0 | .

The first (k+1) rows of Φ correspond to the first stratum. These are represented in the matrix above by the top submatrix of identical rows. The last (k+1) rows correspond to the last stratum and are represented by the lower submatrix of identical rows.
Further, as we defined Π̃ to be the Π matrix which represents perfect classification of exposure, let us define Φ̃ to be that Φ matrix which represents perfect classification of covariates. Since perfect classification of covariates implies that φ_sr = 1 when s = r, s,r = 1,2,...,S, it can easily be seen that Φ̃V = V.
2.5.2 Illustration of Model (2.7)
As an illustration of the use of model (2.7), consider the following situation.
Suppose there are two covariates chosen for
inclusion in a model: age and sex. We could assume that either or
both of the covariates are misclassified. Let us suppose that both
age and sex are misclassified. Normally, of course, one would not
expect a variable like sex to be misclassified.
Suppose there are three age categories and two sex categories. Define ψ_ab to be the probability of being classified into age category a given that one is truly in age category b, a,b = 1,2,3. Also, define ζ_cd to be the probability of being classified into sex category c given that one is truly in sex category d, c,d = 1,2. If we assume independence between the misclassification of age and sex, we find

φ_sr = ψ_ab ζ_cd .

This assumption of independence is reasonable; in general, the misclassification of one covariate should not depend on the levels of the other covariates.
Since there are three age and two sex categories, the total number of strata is 6 (i.e., S = 6). The number of covariate values used to define a stratum is 3 (i.e., p = 3): two covariate values are used to define the age category, and one is used to define the sex category. Since each stratum is defined by a particular combination of age and sex categories, let us define

v_r' = ( v_r1, v_r2, v_r3 )

where ( v_r1, v_r2 ) = ( v_b1, v_b2 ) are the covariate values indicating the b-th age category, b = 1,2,3, and ( v_r3 ) = ( v_d ) is the covariate value indicating the d-th sex category, d = 1,2.

Now, Σ_{r=1}^S φ_sr v_r' becomes

Σ_{r=1}^S ψ_ab ζ_cd ( v_b1, v_b2, v_d ) .

This vector simplifies to

( Σ_{b=1}^3 ψ_ab ( v_b1, v_b2 ), Σ_{d=1}^2 ζ_cd v_d ) .

Substituting the above expression into model (2.7) gives us

logit θ_is(v_s) = β_a* + β_i* + [ Σ_{b=1}^3 ψ_ab ( v_b1, v_b2 ), Σ_{d=1}^2 ζ_cd v_d ] γ* .
b=l
Suppose, now, that the following covariate values are used to
&0
define the various categories of age and sex.
The values indicating the b-th age group, (vb 1,vb2) for b=1 ,2,3, are:
(v 1 1'v 12 ) = (0,0)
(v21 ,v22 ) = (1,0)
(v31 ,v3 2) = (0,1)
The values indicating the d-th sex category, vd for d=1 ,2, are:
v1
=°
v2 = 1 .
3
2
Now,
b=1
\Pab (vb1 ,vb2 )
= \Pa1 (0,0) + \Pa2 (1,0) + \Pa3 (0,1)
= (0,0) + ( \Pa2'
°)+ ( °,\Pa3 )
= (\P a2' \Pa3 )
and
2
11 {cdVd = {cl (0) + {c2(1) = {c2
so that
logit eiS(vS) = Pa* +
Pi *
+ (\P a2' \Pa3' {c2 ) y* .
Suppose that the strata are defined with the following covariate vectors:

v_1' = ( 0, 0, 0 )
v_2' = ( 1, 0, 0 )
v_3' = ( 0, 1, 0 )
v_4' = ( 0, 0, 1 )
v_5' = ( 1, 0, 1 )
v_6' = ( 0, 1, 1 ) .

The design matrix used for the misclassified covariates, assuming k = 1, is shown below. Its structure is similar to the structure of Π, the design matrix used for a misclassified exposure variable.
       | ψ_12  ψ_13  ζ_12 |
       | ψ_12  ψ_13  ζ_12 |
       | ψ_22  ψ_23  ζ_12 |
       | ψ_22  ψ_23  ζ_12 |
       | ψ_32  ψ_33  ζ_12 |
ΦV  =  | ψ_32  ψ_33  ζ_12 |
       | ψ_12  ψ_13  ζ_22 |
       | ψ_12  ψ_13  ζ_22 |
       | ψ_22  ψ_23  ζ_22 |
       | ψ_22  ψ_23  ζ_22 |
       | ψ_32  ψ_33  ζ_22 |
       | ψ_32  ψ_33  ζ_22 |

and

      | γ_1* |
γ* =  | γ_2* |
      | γ_3* | .

The intercept parameter, β_a*, is the predicted reference value for the group with lowest exposure, in the first true age and sex categories. γ_1* is the effect for the second true age category; γ_2* is the effect for the third true age category; γ_3* is the effect for the second true sex category.
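The structure of ΦV can be verified numerically. In the sketch below (all probabilities hypothetical), Φ is built from the independence assumption as a Kronecker product, ζ ⊗ ψ, with strata ordered so that age varies fastest; note that the product-and-collapse step requires the probabilities to run in the direction pr(true category | classified category), so each row of ψ and ζ is taken to sum to 1:

```python
import numpy as np

# Hypothetical reclassification probabilities: row a of psi is the
# distribution over true age categories given classified age a (rows sum to 1),
# and similarly for zeta over sex categories.
psi = np.array([[0.80, 0.15, 0.05],
                [0.10, 0.80, 0.10],
                [0.05, 0.15, 0.80]])
zeta = np.array([[0.95, 0.05],
                 [0.02, 0.98]])

# Independence: phi_sr = psi_ab * zeta_cd, strata ordered (age 1-3, sex 1),
# then (age 1-3, sex 2).
Phi = np.kron(zeta, psi)

V = np.array([[0, 0, 0],      # v_1' ... v_6' as defined in the text
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 0, 1],
              [0, 1, 1]])

PhiV = Phi @ V
# Each row of PhiV should reduce to (psi_a2, psi_a3, zeta_c2).
expected = np.array([[psi[a, 1], psi[a, 2], zeta[c, 1]]
                     for c in range(2) for a in range(3)])
print(np.allclose(PhiV, expected))   # True
```

Here ΦV has one row per stratum; the design matrix displayed in the text simply repeats each such row k+1 times, once per exposure category.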
2.5.3 Generalization of Misclassification Models
Let us reexamine models (1.16) and (2.7). Both can be written in the form

logit θ = Πβ* + ΦVγ*        (2.11)

where the definitions of Π and Φ depend on the misclassification probabilities involved. Model (2.11) is equivalent to model (1.16) when Φ = Φ̃, and equivalent to model (2.7) when Π = Π̃. The point to be made here is that the exposure variable and the covariates are equally represented in the above model; both are necessary to define a particular response variable.
There is nothing in the model to differentiate the roles of the exposure variable and the covariates; it is in the analysis that the two are differentiated. Therefore, it is very reasonable that the model for misclassified covariates should be a direct analogy to the model for misclassified exposure. Further, notice that the model for misclassified covariates treats the covariates as the model for misclassification of exposure treats the exposure variable. Therefore, it follows that misclassification of both exposure and covariates would result in a model such as model (2.11). This model can be seen as a general model for misclassification of any number or type of variables. If there is no misclassification of either exposure or covariates, the corresponding misclassification matrices will reflect that.
2.6 Model to Include Interaction
The models developed in Chapter 1 for misclassification of exposure are strictly main effects models; they were devised ignoring any effect of interaction among the variables. If one or more of the covariates is an effect modifier, the inclusion of interaction terms in the model can be useful.
Here we will develop a model to include interaction in model (1.16). The expanded versions of models (1.18) and (1.19) will follow directly.
Let w_s' = ( w_s1, w_s2, ..., w_sq ) be a set of values for possible effect modifiers for stratum s, where q ≤ p, and let the true exposure variable be defined as an indicator variable which picks out the true exposure category. If the interaction terms are constructed by multiplying the true exposure variable by the effect modifiers, then logit θ_js* can be expressed by

logit θ_js* = β_a* + β_j* + v_s'γ* + Σ_{l=1}^q w_sl δ_{(j-1)q+l}* .

There are k sets of q interaction terms in all, one set for each exposure effect. For j = 0, the interaction terms are defined to be zero since β_0* = 0. Then, the set of regression coefficients for the interaction terms is

δ*' = ( δ_1*, ..., δ_q*, δ_{q+1}*, ..., δ_{2q}*, ..., δ_{(k-1)q+1}*, ..., δ_{kq}* ) .

Summing from l = 1 to q beginning with the ((j-1)q+1)-th term is equivalent to summing over all the terms in the j-th set.
Now, model (1.12) becomes

logit θ_is(v_s) = Σ_{j=0}^k π_ij ( β_a* + β_j* + v_s'γ* + Σ_{l=1}^q w_sl δ_{(j-1)q+l}* )

= β_a* + Σ_{j=1}^k π_ij β_j* + v_s'γ* + Σ_{j=1}^k Σ_{l=1}^q π_ij w_sl δ_{(j-1)q+l}*

= β_a* + Σ_{j=1}^k π_ij β_j* + v_s'γ* + π_i1 Σ_{l=1}^q w_sl δ_l* + ⋯ + π_ik Σ_{l=1}^q w_sl δ_{(k-1)q+l}* .
This is what we would expect, as is illustrated in the following example. Suppose there are two exposure categories and three strata. Also, let v_s' = w_s' for s = 1,2,3; i.e., all covariates are possible effect modifiers. If

      | 1  π_01 |            | v_1' |   | 0  0 |
      | 1  π_11 |            | v_1' |   | 0  0 |
Π  =  | 1  π_01 |  ,   V  =  | v_2' | = | 1  0 |  ,
      | 1  π_11 |            | v_2' |   | 1  0 |
      | 1  π_01 |            | v_3' |   | 0  1 |
      | 1  π_11 |            | v_3' |   | 0  1 |

γ* = ( γ_1*, γ_2* )'  and  β* = ( β_a*, β_1* )' ,

then, since v_s' = w_s',

w_1' = ( w_11, w_12 ) = ( v_11, v_12 ) = v_1' = ( 0, 0 )
w_2' = ( w_21, w_22 ) = ( v_21, v_22 ) = v_2' = ( 1, 0 )
w_3' = ( w_31, w_32 ) = ( v_31, v_32 ) = v_3' = ( 0, 1 )

and we can see that q = 2.
It follows that

logit θ_is(v_s) = β_a* + Σ_{j=1}^k β_j* π_ij + v_s'γ* + Σ_{l=1}^2 π_i1 w_sl δ_l* .

So, for i = 0 and s = 1, we find

Σ_{l=1}^2 π_01 w_1l δ_l* = π_01 w_11 δ_1* + π_01 w_12 δ_2* = 0 ;

for i = 0 and s = 2,

Σ_{l=1}^2 π_01 w_2l δ_l* = π_01 w_21 δ_1* + π_01 w_22 δ_2* = π_01 δ_1* ;

for i = 0 and s = 3,

Σ_{l=1}^2 π_01 w_3l δ_l* = π_01 w_31 δ_1* + π_01 w_32 δ_2* = π_01 δ_2* ;

for i = 1 and s = 1,

Σ_{l=1}^2 π_11 w_1l δ_l* = π_11 w_11 δ_1* + π_11 w_12 δ_2* = 0 ;

for i = 1 and s = 2,

Σ_{l=1}^2 π_11 w_2l δ_l* = π_11 w_21 δ_1* + π_11 w_22 δ_2* = π_11 δ_1* ;

for i = 1 and s = 3,

Σ_{l=1}^2 π_11 w_3l δ_l* = π_11 w_31 δ_1* + π_11 w_32 δ_2* = π_11 δ_2* ;
so that

logit θ_01 = β_a* + π_01 β_1* ,
logit θ_11 = β_a* + π_11 β_1* ,
logit θ_02 = β_a* + π_01 β_1* + γ_1* + π_01 δ_1* ,
logit θ_12 = β_a* + π_11 β_1* + γ_1* + π_11 δ_1* ,
logit θ_03 = β_a* + π_01 β_1* + γ_2* + π_01 δ_2* ,
and
logit θ_13 = β_a* + π_11 β_1* + γ_2* + π_11 δ_2* .

This implies that the model including interaction can be written in matrix notation as

logit θ = Πβ* + Vγ* + Wδ*

where

      |   0     0   |
      |   0     0   |
W  =  | π_01    0   |        and        δ* = | δ_1* |
      | π_11    0   |                        | δ_2* | .
      |   0   π_01  |
      |   0   π_11  |
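As a check on the interaction design matrix W, it can be constructed directly from the π's and the w_s' vectors; a minimal numeric sketch, with hypothetical π values:

```python
import numpy as np

# Hypothetical misclassification probabilities pi_01 and pi_11.
pi = {0: 0.2, 1: 0.9}                     # pi[i] stands for pi_i1
w = {1: (0, 0), 2: (1, 0), 3: (0, 1)}     # w_s' vectors, q = 2

# Rows of W are ordered (i=0,s=1), (i=1,s=1), ..., (i=1,s=3);
# the row for (i,s) is pi_i1 * w_s'.
W = np.array([pi[i] * np.array(w[s]) for s in (1, 2, 3) for i in (0, 1)])

expected = np.array([[0.0, 0.0], [0.0, 0.0],
                     [0.2, 0.0], [0.9, 0.0],
                     [0.0, 0.2], [0.0, 0.9]])
print(np.allclose(W, expected))   # True
```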
2.7 Model to Reflect Ordinal Exposure Categories
Let us return to the situation of misclassification of exposure alone. Often, categories of the exposure variable are ordinal rather than nominal. If so, it may be of interest to fit some functional trend to the exposure parameters. Here, we will develop a model which accommodates such situations.
We will begin by developing a model to fit a linear trend in the true exposure effects. A linear trend implies that the parameters follow the functional form β_j* = β_a* + c_j τ*, where c_j is a score assigned to true exposure category j, for j = 0,1,...,k. A simple example is c_j = j, which assumes equal spacing of the exposure levels. The interpretation of the parameters here is different from the case in which the exposure categories are nominal: β_a* is the intercept term, and β_0* = β_a* + c_0 τ*, rather than zero as it was for nominal exposure categories. Finally, τ* is the linear effect for exposure.
The vector of exposure parameters is now

β* = | β_a* |
     |  τ*  | .

The logistic model becomes

θ_js*(v_s) = { 1 + exp( -β_a* - c_j τ* - v_s'γ* ) }⁻¹ ,

so that

logit θ_js*(v_s) = β_a* + c_j τ* + v_s'γ* .

Using this expression in model (1.12) gives us

logit θ_is(v_s) = Σ_{j=0}^k π_ij ( β_a* + c_j τ* + v_s'γ* )

               = β_a* + τ* Σ_{j=0}^k π_ij c_j + v_s'γ* .        (2.10)
In general, suppose we want to fit the following functional trend in the exposure parameters:

β_j* = β_a* + c_j τ_1* + c_j² τ_2* + ⋯ + c_j^m τ_m*

where m < k. A quadratic trend would be fitted when m = 2, a cubic when m = 3, and so on. The vector of exposure parameters is

     | β_a* |
β* = | τ_1* |
     |  ⋮   |
     | τ_m* | .

In this case,

logit θ_js*(v_s) = β_a* + c_j τ_1* + c_j² τ_2* + ⋯ + c_j^m τ_m* + v_s'γ* ,

and, when substituting this expression into model (1.12), we find

logit θ_is(v_s) = β_a* + τ_1* Σ_{j=0}^k c_j π_ij + τ_2* Σ_{j=0}^k c_j² π_ij + ⋯ + τ_m* Σ_{j=0}^k c_j^m π_ij + v_s'γ* .        (2.11)
As an illustration of this model, consider a situation in which the objective is to fit a linear trend to data for which there are four exposure categories (i.e., k = 3) and c_j = j, j = 0,1,2,3. It follows from model (2.10) that the design matrix would take the form

     | 1  π_01 + 2π_02 + 3π_03 |
Π  = | 1  π_11 + 2π_12 + 3π_13 |        with        β = | β_a* |
     | 1  π_21 + 2π_22 + 3π_23 |                        |  τ*  | .
     | 1  π_31 + 2π_32 + 3π_33 |

Notice that, in this case, the elements of the Π matrix are not probabilities. Instead, the elements of the second column are weighted averages of the scores, c_j = j, j = 0,1,2,3, with the weights being the misclassification probabilities.
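The second column of this design matrix is just the matrix-vector product of the π's with the score vector. A minimal sketch under hypothetical π values (each row summing to 1):

```python
import numpy as np

# Hypothetical misclassification probabilities pi_ij, i,j = 0,...,3;
# row i gives the distribution over true categories for classified category i.
Pi = np.array([[0.80, 0.15, 0.04, 0.01],
               [0.10, 0.75, 0.10, 0.05],
               [0.05, 0.10, 0.75, 0.10],
               [0.01, 0.04, 0.15, 0.80]])
c = np.array([0, 1, 2, 3])                 # equally spaced scores c_j = j

# Design matrix for the linear-trend model (2.10): a column of ones and
# a column of weighted score averages sum_j c_j pi_ij.
design = np.column_stack([np.ones(4), Pi @ c])
print(design)
```

Because each row of Pi sums to 1, every entry of the second column is a convex combination of the scores 0,...,3; with perfect classification (Pi the identity) the column reduces to the scores themselves.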
2.8 Nonconstant Misclassification Probabilities
Throughout this and the previous chapters, we have made certain restrictions on the misclassification probabilities. In the development of models (1.16), (1.18), and (1.19), we assumed that the π_ij's do not differ from stratum to stratum. Similarly, in the development of models (2.7), (2.8), and (2.9), we assumed that the φ_sr's do not differ over exposure categories. These assumptions may not always be true. Suppose, for instance, that the methods for classifying subjects into exposure categories varied from city to city. A covariate which represents "city" included in the model will then lead to different π_ij's over the strata.
Modifying the models to accommodate situations such as the one described above is a very simple matter. For instance, model (1.16) becomes

logit θ_is(v_s) = β_a* + Σ_{j=1}^k β_j* π_ijs + v_s'γ*

where π_ijs = pr( E_j* | E_i ∩ T_s ), so that each stratum has a distinct set of π_ij's associated with it. Models (1.18) and (1.19) are modified similarly.
Likewise, if the φ_sr's differ over exposure categories, model (2.7) becomes

logit θ_is(v_s) = β_a* + β_i* + Σ_{r=1}^S φ_sri v_r'γ* ,

where φ_sri = pr( T_r* | T_s ∩ E_i* ), so that each exposure category has a distinct set of φ_sr's associated with it. Models (2.8) and (2.9) are modified similarly.
CHAPTER THREE
PROPERTIES OF ESTIMATORS
3.1 Introduction
In Chapters 1 and 2, we developed models to accommodate several different types of situations involving misclassification error. In Chapter 2, we also considered different methods for estimating parameters associated with these models. Here, we will examine the properties, i.e., bias and variance, of these different types of estimators. We will be dealing specifically with those models derived in Chapter 1 for misclassification of exposure, namely models (1.16), (1.18), and (1.19). One general assumption made in this and in the subsequent chapter is that models (1.16), (1.18), and (1.19) are appropriate. That is, we are assuming that the conditions required for models (1.16) and (1.18) to be valid (see Chapter 1) hold.
One important consideration in examining the properties of the estimators is the following. Models (1.16), (1.18), and (1.19) were derived assuming that the Π matrix was known. However, this is usually not the case; instead, the values of the elements in Π are estimated (or guessed at) by the investigator. We are interested in the properties of certain estimators when an estimate of Π, rather than Π itself, is used in the fitted model.
Define the matrix Π̂ to be an estimate of the matrix Π. In addition, define Π̂_0 to be that estimate of Π which reflects no misclassification of exposure. In the following sections we will consider a fitted model which uses Π̂ rather than Π, and the special case of Π̂ = Π̂_0, to determine the consequences of ignoring misclassification error.
Let us begin by examining the LS estimators of the parameters in model (1.19). The primary reason for examining the least squares estimators first is that they are the least complex and most commonly used. The WLS and ML estimators will be discussed later in a similar fashion.
3.2 Least Squares Estimation
Let Y = ( Y_01, Y_11, ..., Y_kS )' be a vector of continuous response variables. The covariates in this situation need not be categorical. If they are, instead, continuous, Y_is refers to the s-th subject among those in the i-th classified exposure category. The π_ij's are assumed to be the same for all subjects; the misclassification probabilities cannot vary from subject to subject within a classified exposure category.
Further, it is assumed that each classified exposure category contains the same number of subjects, S. This may not be a realistic requirement to put on the data. For now, however, we will assume that it holds. In Chapter 4 we will investigate the situation in which this requirement is violated.
As was discussed in the previous section, the estimate of Π used in fitting the model may not always be the true Π matrix. Therefore, we are faced with a situation involving two models:

the true model,              E(Y) = Πβ* + Vγ* ,        (3.1)

and the model to be fitted,  E(Y) = Π̂β + Vγ ,         (3.2)

where β = ( β_a, β_1, ..., β_k )' is the set of regression coefficients for the exposure variables, and γ = ( γ_1, γ_2, ..., γ_p )' is the set of regression coefficients for the covariates.
3.2.1 Bias of β̂_ls and γ̂_ls When Π̂ = Π
The following derivation (Seber, 1976) gives formulas for β̂_ls and γ̂_ls, the LS estimators of β and γ, from which we can investigate the properties of the two. Model (3.2) can be written

Y = Π̂β + Vγ + e

where e = ( e_01, e_11, ..., e_kS )' is a vector of error terms. Now we compute the LS estimates as follows:

e'e = ( Y - Π̂β - Vγ )' ( Y - Π̂β - Vγ )
    = Y'Y - 2β'Π̂'Y - 2γ'V'Y + 2β'Π̂'Vγ + β'Π̂'Π̂β + γ'V'Vγ ,

∂(e'e)/∂β = -2Π̂'Y + 2Π̂'Vγ + 2Π̂'Π̂β ,        (3.3)

∂(e'e)/∂γ = -2V'Y + 2V'Π̂β + 2V'Vγ .        (3.4)

Setting expression (3.3) equal to 0 implies

β̂_ls = ( Π̂'Π̂ )⁻¹ Π̂' ( Y - Vγ̂_ls ) .        (3.5)
Setting expression (3.4) equal to 0 implies, along with expression (3.5), that

V'Vγ̂_ls = V'Y - V' [ Π̂ (Π̂'Π̂)⁻¹ Π̂' ( Y - Vγ̂_ls ) ]

⇒  V' [ I_S(k+1) - Π̂ (Π̂'Π̂)⁻¹ Π̂' ] Vγ̂_ls = V' [ I_S(k+1) - Π̂ (Π̂'Π̂)⁻¹ Π̂' ] Y

⇒  V'R̂Vγ̂_ls = V'R̂Y

⇒  γ̂_ls = ( V'R̂V )⁻¹ V'R̂Y        (3.6)

where R̂ = [ I_S(k+1) - Π̂ (Π̂'Π̂)⁻¹ Π̂' ] and I_S(k+1) is an identity matrix of dimension S(k+1).
Now we can examine the bias of β̂_ls and γ̂_ls by conditioning on Π̂, which is equivalent to assuming that Π̂ is a matrix of known constants. The expected value of β̂_ls given Π̂ is

E( β̂_ls | Π̂ ) = (Π̂'Π̂)⁻¹ Π̂' ( Πβ* + Vγ* ) - (Π̂'Π̂)⁻¹ Π̂'V (V'R̂V)⁻¹ V'R̂ ( Πβ* + Vγ* )

             = (Π̂'Π̂)⁻¹ Π̂'Πβ* + (Π̂'Π̂)⁻¹ Π̂'Vγ*
               - (Π̂'Π̂)⁻¹ Π̂'V (V'R̂V)⁻¹ V'R̂Πβ* - (Π̂'Π̂)⁻¹ Π̂'Vγ*

             = (Π̂'Π̂)⁻¹ Π̂' [ I_S(k+1) - V (V'R̂V)⁻¹ V'R̂ ] Πβ* .        (3.7)

Notice that when Π̂ = Π, R̂Π = 0, where 0 is a matrix whose elements are all zeros. It follows that E( β̂_ls | Π̂ = Π ) = β*. Therefore, if the correct Π matrix is used in the fitted model (i.e., Π̂ = Π), β̂_ls is an unbiased estimator of β*.
The expected value of γ̂_ls given Π̂ is
E( γ̂_ls | Π̂ ) = (V'R̂V)⁻¹ V'R̂ ( Πβ* + Vγ* )
             = (V'R̂V)⁻¹ V'R̂Πβ* + (V'R̂V)⁻¹ V'R̂Vγ*
             = (V'R̂V)⁻¹ V'R̂Πβ* + γ* .        (3.8)

Again, if Π̂ = Π, then R̂Π = 0 and E( γ̂_ls | Π̂ = Π ) = γ*, so that γ̂_ls is an unbiased estimator of γ* when Π̂ = Π.
3.2.2 Bias of β̂_ls and γ̂_ls When Π̂ ≠ Π
By investigating the matrix R̂, we can gain some insight regarding the bias of β̂_ls and γ̂_ls when Π̂ is not equal to Π. Expand the quantity Π̂ (Π̂'Π̂)⁻¹ Π̂' as follows. Write

     | Π̂_1 |
Π̂ =  | Π̂_1 |
     |  ⋮  |
     | Π̂_1 |

where Π̂_1 is the estimate of Π_1 and has dimensions (k+1) × (k+1). Then

Π̂'Π̂ = [ Π̂_1', Π̂_1', ..., Π̂_1' ] | Π̂_1 |  =  S Π̂_1'Π̂_1 ,
                                |  ⋮  |
                                | Π̂_1 |

so that

(Π̂'Π̂)⁻¹ = (1/S) (Π̂_1'Π̂_1)⁻¹

and

(Π̂'Π̂)⁻¹ Π̂' = (1/S) [ (Π̂_1'Π̂_1)⁻¹Π̂_1', (Π̂_1'Π̂_1)⁻¹Π̂_1', ..., (Π̂_1'Π̂_1)⁻¹Π̂_1' ] .        (3.9)

Therefore,

Π̂ (Π̂'Π̂)⁻¹ Π̂' = (1/S) | Π̂_1(Π̂_1'Π̂_1)⁻¹Π̂_1'  ⋯  Π̂_1(Π̂_1'Π̂_1)⁻¹Π̂_1' |
                      |         ⋮                      ⋮          |
                      | Π̂_1(Π̂_1'Π̂_1)⁻¹Π̂_1'  ⋯  Π̂_1(Π̂_1'Π̂_1)⁻¹Π̂_1' | .

Recall that Π̂_1 is a square matrix. It has no linear dependencies among the rows or columns and is therefore invertible. This implies that

Π̂_1 (Π̂_1'Π̂_1)⁻¹ Π̂_1' = Π̂_1 Π̂_1⁻¹ (Π̂_1')⁻¹ Π̂_1' = I_k+1 ,

so that

Π̂ (Π̂'Π̂)⁻¹ Π̂' = (1/S) | I_k+1  I_k+1  ⋯  I_k+1 |
                      | I_k+1  I_k+1  ⋯  I_k+1 |        (3.10)
                      |   ⋮                ⋮   |
                      | I_k+1  I_k+1  ⋯  I_k+1 | .

This implies that R̂ = [ I_S(k+1) - Π̂ (Π̂'Π̂)⁻¹ Π̂' ] is not dependent on Π̂. In other words, R̂ does not vary as Π̂ varies; it is constant for all choices of Π̂, including the choice Π̂ = Π. We saw before that if Π̂ = Π, then R̂Π = 0. Since R̂ is not dependent on the choice of Π̂, we can conclude that R̂Π̂ = 0 for any Π̂ used in the fitted model.
Now let us return to the bias of β̂_ls and γ̂_ls. Notice that the formula for γ̂_ls, expression (3.6), is a function of three components, V, R̂, and Y. We have seen that the matrix R̂ is invariant to choices of Π̂, and Y is the vector of observed responses, which will obviously not change with different choices of Π̂. Consequently, for a given matrix V, the vector γ̂_ls will be invariant to choices of Π̂. This result tells us that the estimated regression coefficients for the covariates have the same values for different Π̂ matrices used in the fitted model. An important implication of this result is that the vector of estimates, γ̂_ls, will have the same value whether or not Π̂ = Π. If Π̂ = Π, as we've seen before, γ̂_ls is unbiased. Therefore, since the same estimates are produced for all Π̂ matrices, γ̂_ls is always unbiased.
This result is perhaps more clearly seen through the conditional expectation of γ̂_ls. Since R̂Π̂ = 0, regardless of the value of Π̂, expression (3.8) becomes

E( γ̂_ls | Π̂ ) = γ* .

This shows us that γ̂_ls is unbiased for any Π̂ used in the fitted model. In particular, γ̂_ls is an unbiased estimator of γ* if Π̂ = Π̂_0, that is, if misclassification is ignored.
Looking at the expected value of β̂_ls, we can use the fact that R̂Π = 0 for all Π̂ to rewrite expression (3.7) as

E( β̂_ls | Π̂ ) = (Π̂'Π̂)⁻¹ Π̂'Πβ* .        (3.11)

Substituting expression (3.9) into this expression, we get

E( β̂_ls | Π̂ ) = (1/S) [ (Π̂_1'Π̂_1)⁻¹Π̂_1', ..., (Π̂_1'Π̂_1)⁻¹Π̂_1' ] | Π_1 |  β*
                                                                |  ⋮  |
                                                                | Π_1 |

so that E( β̂_ls | Π̂ ) = (Π̂_1'Π̂_1)⁻¹ Π̂_1'Π_1 β*. Since Π̂_1 is invertible, we can write

(Π̂_1'Π̂_1)⁻¹ Π̂_1' = Π̂_1⁻¹ (Π̂_1')⁻¹ Π̂_1' = Π̂_1⁻¹ .

This leads to the general expression

E( β̂_ls | Π̂ ) = Π̂_1⁻¹ Π_1 β* .        (3.12)
Consider a situation in which there are just two exposure categories (i.e., k = 1). Using expression (3.12) we can determine the bias of β̂_ls in this simple case. Defining β̂_ls = ( β̂_a, β̂_1 )', the conditional expected value of β̂_1 is

E( β̂_1 | Π̂ ) = β_1* [ ( π_11 - π_01 ) / ( π̂_11 - π̂_01 ) ] .

This shows us that the degree of bias involved is determined by the ratio of the true difference ( π_11 - π_01 ) to its estimate. If the misclassification error were ignored, the resulting matrix Π̂_1 would be

Π̂_1 = | 1  0 |
      | 1  1 | ,

so that π̂_11 = 1 and π̂_01 = 0. In this case, the expected value of β̂_1 is

E( β̂_1 | Π̂ ) = β_1* ( π_11 - π_01 ) .

This result is similar to the result from Section 1.6.1 describing the relationship between the risk difference in the presence of misclassification and the true risk difference. Again, by assuming a degree of misclassification such that π_00 and π_11 are both greater than 1/2, we find bias towards the null of β̂_1 when misclassification is ignored.
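The two key facts of this section, that γ̂_ls is the same for every Π̂ while β̂_1 is rescaled by (π_11 - π_01)/(π̂_11 - π̂_01), can be checked numerically. A minimal sketch with hypothetical values (k = 1, S = 3, one covariate; none of the numbers come from the dissertation's data):

```python
import numpy as np

rng = np.random.default_rng(0)
S, k = 3, 1

def stack(Pi1):
    """Stack the (k+1)x(k+1) block Pi1 once per stratum."""
    return np.vstack([Pi1] * S)

# Observations ordered (i=0,s=1), (i=1,s=1), ..., (i=1,s=3).
V = np.array([[0.], [0.], [1.], [1.], [2.], [2.]])   # one covariate
Y = rng.normal(size=6)

def ls_fit(Pi_hat):
    """LS estimators (3.5)-(3.6) for a given Pi_hat."""
    H = Pi_hat @ np.linalg.inv(Pi_hat.T @ Pi_hat) @ Pi_hat.T
    R = np.eye(S * (k + 1)) - H
    gamma = np.linalg.solve(V.T @ R @ V, V.T @ R @ Y)
    beta = np.linalg.inv(Pi_hat.T @ Pi_hat) @ Pi_hat.T @ (Y - V @ gamma)
    return beta, gamma

Pi_true = stack(np.array([[1, 0.2], [1, 0.9]]))   # hypothetical "true" Pi
Pi_zero = stack(np.array([[1, 0.0], [1, 1.0]]))   # misclassification ignored

beta_t, gamma_t = ls_fit(Pi_true)
beta_0, gamma_0 = ls_fit(Pi_zero)

print(np.allclose(gamma_t, gamma_0))                   # True: gamma invariant
print(np.isclose(beta_t[1] * (0.9 - 0.2), beta_0[1]))  # True: difference ratio
```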
3.2.3 Bias of Predicted Values
As we can see below, Π̂β̂_ls is invariant to changes in Π̂. Using expression (3.5), we see that

Π̂β̂_ls = Π̂ (Π̂'Π̂)⁻¹ Π̂' ( Y - Vγ̂_ls ) .

Equation (3.10) tells us that Π̂ (Π̂'Π̂)⁻¹ Π̂' is invariant to changes in Π̂. This implies that Π̂β̂_ls is also invariant to changes in Π̂. An important consequence of this result is that Ŷ_ls = Π̂β̂_ls + Vγ̂_ls, the vector of predicted responses, is invariant to changes in Π̂ for a given set of covariates. As with γ̂_ls, the fact that Ŷ_ls will be the same whether or not Π̂ = Π implies that Ŷ_ls is an unbiased estimator of the response vector when conditioning on Π̂, regardless of which Π̂ matrix is used in the fitted model.
It was shown earlier that if Π̂ = Π, then β̂_ls and γ̂_ls are unbiased estimators of β* and γ*, respectively.
This unbiasedness was conditional on a fixed value of Π̂. Therefore, the unbiasedness of Π̂β̂_ls is also conditional. Π̂β̂_ls, however, is also unconditionally unbiased. When determining the bias of Ŷ_ls, if we treat Π̂ as a random variable, rather than as a fixed constant, we still find Ŷ_ls to be unbiased for E(Y) for all values of Π̂. Using expression (3.12), we find

E( Π̂β̂_ls | Π̂ ) = Π̂ [ E( β̂_ls | Π̂ ) ] = Π̂ Π̂_1⁻¹ Π_1 β*

              = | Π̂_1 |  Π̂_1⁻¹ Π_1 β*  =  | I_k+1 |  Π_1 β*  =  Πβ* .
                |  ⋮  |                   |  ⋮   |
                | Π̂_1 |                   | I_k+1 |

This shows us that the unconditional expected value of Π̂β̂_ls is Πβ*. Further, by recalling that γ̂_ls is unbiased for any Π̂, we can conclude that γ̂_ls is unconditionally unbiased for γ*. Consequently, the vector Ŷ_ls is an unconditionally unbiased estimator of the true vector of responses.
3.2.4 Variance-Covariance Matrices for β̂_ls and γ̂_ls
Now let us examine the conditional variance of the various statistics of interest, beginning with the variance-covariance matrix of ( β̂_ls, γ̂_ls ). The variance of γ̂_ls is derived using expression (3.6) and the fact that R̂ is symmetric and idempotent:

Var( γ̂_ls ) = (V'R̂V)⁻¹ V'R̂ Var(Y) R̂V (V'R̂V)⁻¹
            = σ² (V'R̂V)⁻¹ (V'R̂V) (V'R̂V)⁻¹
            = σ² (V'R̂V)⁻¹ .        (3.13)

This uses the least-squares assumptions of a constant variance, σ², and mutual independence of the Y_is's, i = 0,1,...,k, s = 1,2,...,S.
The derivation of the covariance of β̂_ls and γ̂_ls uses expressions (3.5) for β̂_ls and (3.6) for γ̂_ls:

Cov( β̂_ls, γ̂_ls ) = Cov{ (Π̂'Π̂)⁻¹ Π̂'Y, γ̂_ls } - Cov{ (Π̂'Π̂)⁻¹ Π̂'Vγ̂_ls, γ̂_ls }
                  = σ² (Π̂'Π̂)⁻¹ Π̂'R̂V (V'R̂V)⁻¹ - σ² (Π̂'Π̂)⁻¹ Π̂'V (V'R̂V)⁻¹
                  = -σ² (Π̂'Π̂)⁻¹ Π̂'V (V'R̂V)⁻¹        (3.14)

since Π̂'R̂ = 0 for all Π̂.
Finally,

Var( β̂_ls ) = Var[ (Π̂'Π̂)⁻¹ Π̂' ( Y - Vγ̂_ls ) ]

            = Var[ (Π̂'Π̂)⁻¹ Π̂'Y ] + Var[ (Π̂'Π̂)⁻¹ Π̂'Vγ̂_ls ]
              - 2 Cov[ (Π̂'Π̂)⁻¹ Π̂'Y, (Π̂'Π̂)⁻¹ Π̂'Vγ̂_ls ]

            = σ² (Π̂'Π̂)⁻¹ + (Π̂'Π̂)⁻¹ Π̂'V Var( γ̂_ls ) V'Π̂ (Π̂'Π̂)⁻¹        (3.15)

since we saw in the derivation of Cov( β̂_ls, γ̂_ls ) that Cov[ (Π̂'Π̂)⁻¹ Π̂'Y, γ̂_ls ] = 0.
All three expressions, (3.13), (3.14), and (3.15), involve an unknown parameter, σ². We can see from expression (3.13) that the true variance of γ̂_ls will not vary as Π̂ varies. However, it is also important to see whether the estimated variance of γ̂_ls will vary as Π̂ varies. To examine this, we must use the least squares estimate of σ², which is the mean square error (MSE). MSE is equal to the error sum of squares (SSE) divided by the number of degrees of freedom corresponding to SSE, where

SSE = ( Y - Π̂β̂_ls - Vγ̂_ls )' ( Y - Π̂β̂_ls - Vγ̂_ls ) .

We can see that SSE is invariant to changes in Π̂ since Π̂β̂_ls and γ̂_ls were found to be invariant. Further, if SSE is invariant, then MSE is also invariant, since the degrees of freedom will not change if the dimensions of Π̂ are not changed.
Therefore, we can conclude that both the true and estimated Var( γ̂_ls ) are invariant to choices of Π̂. However, the true and estimated Cov( β̂_ls, γ̂_ls ) and Var( β̂_ls ) are not invariant.
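The invariance of SSE, and hence of MSE and the estimated variance of γ̂_ls, can be confirmed with the same kind of numeric sketch (all values hypothetical, not from the dissertation's data):

```python
import numpy as np

rng = np.random.default_rng(1)
S, k = 3, 1
I = np.eye(S * (k + 1))

V = np.array([[0.], [0.], [1.], [1.], [2.], [2.]])
Y = rng.normal(size=6)

def sse(Pi1):
    """SSE for the fitted model using the stacked Pi_hat built from Pi1."""
    Pi = np.vstack([Pi1] * S)
    H = Pi @ np.linalg.inv(Pi.T @ Pi) @ Pi.T
    R = I - H
    gamma = np.linalg.solve(V.T @ R @ V, V.T @ R @ Y)
    beta = np.linalg.inv(Pi.T @ Pi) @ Pi.T @ (Y - V @ gamma)
    resid = Y - Pi @ beta - V @ gamma
    return resid @ resid

s1 = sse(np.array([[1, 0.3], [1, 0.7]]))
s2 = sse(np.array([[1, 0.0], [1, 1.0]]))   # misclassification ignored
print(np.isclose(s1, s2))                  # True
```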
3.2.5 Variance of Π̂β̂_ls
Conditioning on Π̂, the variance of Π̂β̂_ls, Var( Π̂β̂_ls ), is also invariant to choices of Π̂, as the following shows. Since

(Π̂'Π̂)⁻¹ = (1/S) Π̂_1⁻¹ (Π̂_1')⁻¹ ,

expression (3.15) for the variance of β̂_ls becomes

Var( β̂_ls ) = (σ²/S) Π̂_1⁻¹ { I_k+1 + (1/S) [ I_k+1, ..., I_k+1 ] V (V'R̂V)⁻¹ V' | I_k+1 | } (Π̂_1')⁻¹ ,
                                                                             |  ⋮   |
                                                                             | I_k+1 |

so that

Var( Π̂β̂_ls ) = Π̂ Var( β̂_ls ) Π̂'

             = (σ²/S) | I_k+1 | { I_k+1 + (1/S) [ I_k+1, ..., I_k+1 ] V (V'R̂V)⁻¹ V' | I_k+1 | } [ I_k+1, ..., I_k+1 ] ,
                      |  ⋮   |                                                     |  ⋮   |
                      | I_k+1 |                                                    | I_k+1 |

which is invariant to choices of Π̂. It also follows that the estimated variance of Π̂β̂_ls is invariant as well. Further, we can show that the estimated and true variances of Ŷ_ls are also invariant, since the covariance of Π̂β̂_ls and Vγ̂_ls is not a function of Π̂.
3.2.6 Invariance of R² and the F Statistic
In Section 3.2.4, we showed that SSE and MSE are invariant to choices of Π̂. The invariances of SSR and MSR, the regression sum of squares and the mean square for regression, respectively, are proven below. Through these invariances, we can show the invariance of R², used to measure goodness of fit, and of the F statistic, used to test the hypothesis that all the model coefficients are zero.
The total sum of squares, SSTO, can be partitioned into SSE and SSR (i.e., SSE + SSR = SSTO). SSR is a function of the observed and predicted values only, none of which is dependent on Π̂. SSR, and therefore MSR, are thus also independent of Π̂. It follows that SSTO is invariant to choices of Π̂.
R² is obtained by dividing SSR by SSTO. Since neither sum of squares is dependent on Π̂, R² does not vary with the choice of Π̂. The F statistic is the ratio of MSR to MSE. Consequently, this statistic is also invariant to choices of Π̂.
3.2.7 Summary for LS Estimation
We have shown in the preceding sections that when the correct Π matrix is used in the fitted model, β̂_ls and γ̂_ls are unbiased estimators of β* and γ*, respectively. When an estimate of Π, Π̂, is used in the fitted model, β̂_ls is not necessarily unbiased for β*. γ̂_ls, on the other hand, is unbiased for γ* for any choice of Π̂ matrix, as Π̂β̂_ls and Ŷ_ls are for Πβ* and E(Y), respectively. Also, we saw that, although Var( β̂_ls ) and Cov( β̂_ls, γ̂_ls ) vary as Π̂ varies, the true and estimated Var( γ̂_ls ), Var( Π̂β̂_ls ), and Var( Ŷ_ls ) do not vary with Π̂. In addition, R² and the F statistic are invariant to choices of Π̂. In the next sections, we will show that similar properties hold for the WLS and ML estimators that were defined in Chapter 2.
3.3 WLS Estimation
The method of WLS can be used to estimate parameters from models (1.16) and (1.18) when Π is known. In the case of model (1.16), an underlying binomial likelihood is assumed; in the case of model (1.18), an underlying Poisson likelihood is assumed. For the remainder of this chapter, which includes the following section dealing with ML estimators, we will limit our work to estimators which are used in Poisson regression. To work with both models would be repetitive, and the matrix algebra for the Poisson model is considerably less complex. The results reached will also hold for logistic regression.
Poisson regression methods are used to estimate the parameters in model (1.18). Model (1.18) can be written as

ln λ = Πβ* + Vγ*

where λ = ( λ_01, λ_11, ..., λ_kS )'. The estimator λ̂_is = X_is / L_is is unbiased for λ_is, as is proven by the following:

E( λ̂_is ) = E( X_is / L_is ) = ( λ_is L_is ) / L_is = λ_is .
Asymptotically, the estimator Y_is = ln( λ̂_is ) is unbiased for ln( λ_is ). Using this leads to the model E(Y) = Πβ* + Vγ*, which, assuming model (1.18) holds, will hold in the limit. When Π̂ is used as an estimate of Π, this model becomes E(Y) = Π̂β + Vγ, where β and γ are as defined in the previous section.

3.3.1 Bias of β̂_wls and γ̂_wls
Below, formulas are derived for β̂_wls and γ̂_wls, from which we can determine their properties. This is done as follows.
Let

Q = ( Y - Π̂β - Vγ )' D_X ( Y - Π̂β - Vγ )
  = Y'D_X Y - 2β'Π̂'D_X Y - 2γ'V'D_X Y + 2β'Π̂'D_X Vγ + β'Π̂'D_X Π̂β + γ'V'D_X Vγ

where D_X is as defined in Section 2.2. Then,

∂Q/∂β = -2Π̂'D_X Y + 2Π̂'D_X Π̂β + 2Π̂'D_X Vγ ,        (3.16)

∂Q/∂γ = -2V'D_X Y + 2V'D_X Π̂β + 2V'D_X Vγ .        (3.17)

Setting expression (3.16) equal to 0 implies

Π̂'D_X ( Y - Vγ̂_wls ) = Π̂'D_X Π̂β̂_wls

⇒  β̂_wls = ( Π̂'D_X Π̂ )⁻¹ Π̂'D_X ( Y - Vγ̂_wls ) .        (3.18)

Setting expression (3.17) equal to 0 implies, along with expression (3.18), that

V'D_X Vγ̂_wls = V'D_X Y - V'D_X Π̂ [ ( Π̂'D_X Π̂ )⁻¹ Π̂'D_X ( Y - Vγ̂_wls ) ]

⇒  V'D_X R̂_w Vγ̂_wls = V'D_X R̂_w Y

⇒  γ̂_wls = ( V'D_X R̂_w V )⁻¹ V'D_X R̂_w Y        (3.19)

where R̂_w = [ I_S(k+1) - Π̂ ( Π̂'D_X Π̂ )⁻¹ Π̂'D_X ] .
As we investigated the matrix R̂ in Section 3.2.2, we will investigate R̂_w here to gain some insight regarding the properties of β̂_wls and γ̂_wls. Recall that D_X is a diagonal matrix with the counts X = ( X_01, X_11, ..., X_k1, ..., X_0S, X_1S, ..., X_kS )' down the diagonal. Let us write D_X as

      | D_x1   0   ⋯   0   |
D_X = |  0   D_x2  ⋯   0   |
      |  ⋮             ⋮   |
      |  0     0   ⋯  D_xS |

where D_xs is a diagonal matrix with the counts ( X_0s, X_1s, ..., X_ks )' down the diagonal. Now we can write

Π̂'D_X Π̂ = [ Π̂_1', Π̂_1', ..., Π̂_1' ] D_X | Π̂_1 |
                                       |  ⋮  |
                                       | Π̂_1 |

         = [ Π̂_1'D_x1 Π̂_1 + Π̂_1'D_x2 Π̂_1 + ⋯ + Π̂_1'D_xS Π̂_1 ]

         = Π̂_1' [ D_x1 + D_x2 + ⋯ + D_xS ] Π̂_1

         = Π̂_1' G Π̂_1

where G is a diagonal matrix with the counts ( Σ_{s=1}^S X_0s, Σ_{s=1}^S X_1s, ..., Σ_{s=1}^S X_ks ) down the diagonal. Therefore G⁻¹ exists, and we have

( Π̂'D_X Π̂ )⁻¹ = Π̂_1⁻¹ G⁻¹ (Π̂_1')⁻¹ .

Now, R̂_w can be written as

R̂_w = I_S(k+1) - | Π̂_1 |  Π̂_1⁻¹ G⁻¹ (Π̂_1')⁻¹ [ Π̂_1', Π̂_1', ..., Π̂_1' ] D_X
                 |  ⋮  |
                 | Π̂_1 |

     = I_S(k+1) - | I_k+1 |  G⁻¹ [ I_k+1, I_k+1, ..., I_k+1 ] D_X .
                  |  ⋮   |
                  | I_k+1 |
'"
Both G and D are dependent only on the original counts (the
x
XiS's). We can conclude, then, that R w is independent of the D
'"
'"
matrix. Therefore, Ywl s is invariant to choices of D.
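This invariance is easy to verify numerically. In the sketch below (hypothetical counts and two different invertible Π̂₁ blocks), the γ portion of the weighted least squares fit is identical for both choices of Π̂₁:

```python
import numpy as np

rng = np.random.default_rng(0)
S, k = 3, 1
X = rng.integers(10, 100, size=S * (k + 1)).astype(float)
L = rng.uniform(50, 150, size=S * (k + 1))
Y = np.log(X / L)
Dx = np.diag(X)
# V: stratum dummies for strata 2..S, rows ordered stratum-major
V = np.kron(np.eye(S)[:, 1:], np.ones((k + 1, 1)))

def gamma_wls(Pi1):
    """WLS estimate of gamma for a given classification block Pi1."""
    Pi = np.vstack([Pi1] * S)
    W = np.hstack([Pi, V])
    coef = np.linalg.solve(W.T @ Dx @ W, W.T @ Dx @ Y)
    return coef[Pi1.shape[1]:]          # the gamma part

g_a = gamma_wls(np.array([[1.0, 0.1], [1.0, 0.9]]))
g_b = gamma_wls(np.array([[1.0, 0.3], [1.0, 0.6]]))
assert np.allclose(g_a, g_b)            # gamma_wls invariant to Pi1
```

Only the β portion of the coefficient vector changes with Π̂₁, exactly as the derivation above predicts.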
Now, we will derive the asymptotic expectations of β̂_wls and γ̂_wls by first showing that they are both functions of ML estimators, and then applying asymptotic properties of ML estimators.
We begin with the likelihood of the data, which, as was mentioned in Chapter 2, is a product Poisson likelihood. This likelihood, L(X), can be written

L(X) = exp(−Σ_{s=1}^S Σ_{i=0}^k μ_is) · [Π_{s=1}^S Π_{i=0}^k μ_is^{X_is}] / [Π_{s=1}^S Π_{i=0}^k X_is!].

Then,

ln L(X) = −Σ_{s=1}^S Σ_{i=0}^k μ_is + Σ_{s=1}^S Σ_{i=0}^k X_is ln μ_is − Σ_{s=1}^S Σ_{i=0}^k ln X_is!,

so that

∂ ln L(X)/∂μ_is = −1 + X_is/μ_is.

Setting this equal to 0 implies that the ML estimator of μ_is is X_is. Since this holds for each i = 0, 1, ..., k and s = 1, 2, ..., S, the vector X is the ML estimator of the vector μ = (μ_01, μ_11, ..., μ_kS)'. In addition, D_x is the ML estimator of D_μ, which is a diagonal matrix with the elements of μ down the diagonal.

In order to derive the asymptotic expectation of γ̂_wls, let us first return to its formula as given in expression (3.19): γ̂_wls = [V'D_xR̂_wV]⁻¹ V'D_xR̂_wY. Y can be written as ln(Λ⁻¹X), where ln refers to a function which takes the natural log of each element of a vector, and Λ is a diagonal matrix with the amounts of population-time L = (L_01, L_11, ..., L_kS)' down the diagonal.
By the large-sample properties of ML estimators, we can say that the asymptotic expected value of γ̂_wls is

E_a(γ̂_wls | Π̂) = [V'D_xR̂_wV]⁻¹ V'D_xR̂_w ln(Λ⁻¹μ).

Substituting ln(Λ⁻¹μ) = ln λ = Πβ* + Vγ* into the above expression leads to

E_a(γ̂_wls | Π̂) = [V'D_xR̂_wV]⁻¹ V'D_xR̂_w (Πβ* + Vγ*).

Notice that R̂_wΠ̂ = 0, so that R̂_wΠ = 0 for Π̂ = Π; then, the invariance of R̂_w implies that R̂_wΠ = 0 for any Π̂. Using this, we find

E_a(γ̂_wls | Π̂) = E_a(γ̂_wls) = [V'D_xR̂_wV]⁻¹ V'D_xR̂_wVγ* = γ*.    (3.20)

Expression (3.20) indicates that γ̂_wls is, asymptotically, an unbiased estimator of γ* for any Π̂ matrix.
The conditional asymptotic expectation of β̂_wls can be derived from expression (3.18):

E_a(β̂_wls | Π̂) = E_a{ (Π̂'D_xΠ̂)⁻¹ Π̂'D_x (ln(Λ⁻¹X) − Vγ̂_wls) | Π̂ }
 = (Π̂'D_μΠ̂)⁻¹ Π̂'D_μ (Πβ* + Vγ* − Vγ*)
 = (Π̂'D_μΠ̂)⁻¹ Π̂'D_μ Πβ*,

since we saw above that E_a(γ̂_wls | Π̂) = E_a(γ̂_wls) = γ*. As we proved the identity (Π̂'D_xΠ̂)⁻¹ = Π̂₁⁻¹G⁻¹Π̂₁'⁻¹, we can also show that (Π̂'D_μΠ̂)⁻¹ = Π̂₁⁻¹G̃⁻¹Π̂₁'⁻¹, where G̃ is a diagonal matrix with the counts (Σ_{s=1}^S μ_0s, Σ_{s=1}^S μ_1s, ..., Σ_{s=1}^S μ_ks)' down the diagonal. This leads us to

E_a(β̂_wls | Π̂) = Π̂₁⁻¹G̃⁻¹Π̂₁'⁻¹ [Π̂₁', Π̂₁', ..., Π̂₁'] D_μ Πβ*
 = Π̂₁⁻¹G̃⁻¹ [I_{k+1}, I_{k+1}, ..., I_{k+1}] D_μ Πβ*
 = Π̂₁⁻¹G̃⁻¹ [D_μ1, D_μ2, ..., D_μS] [Π₁; Π₁; ...; Π₁] β*
 = Π̂₁⁻¹G̃⁻¹ G̃ Π₁β*
 = Π̂₁⁻¹Π₁β*,

where D_μs is a diagonal matrix with the elements (μ_0s, μ_1s, ..., μ_ks)' down the diagonal.

The above is identical to expression (3.12) for the conditional expectation of β̂_ls. Clearly, if Π̂ = Π, β̂_wls is a conditionally unbiased estimator of β* in the limit. Also, as we saw with the LS estimator, if misclassification is ignored in a situation involving just two exposure categories, β̂₁ will be biased towards the null.
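The two-category attenuation can be checked directly from E_a(β̂_wls | Π̂) = Π̂₁⁻¹Π₁β*. The probabilities and parameter values below are hypothetical:

```python
import numpy as np

beta_star = np.array([-2.0, 1.5])          # (beta0*, beta1*), k = 1
Pi1 = np.array([[1.0, 0.10],               # true classification block:
                [1.0, 0.85]])              # row i = (1, pi_i1)
Pi0_1 = np.array([[1.0, 0.0],              # block used when
                  [1.0, 1.0]])             # misclassification is ignored

# E_a(beta_wls | Pi_hat) = Pi_hat_1^{-1} Pi_1 beta*
beta0 = np.linalg.solve(Pi0_1, Pi1 @ beta_star)
# beta0[1] = (pi_11 - pi_01) * beta1* = 0.75 * 1.5 = 1.125
assert abs(beta0[1]) < abs(beta_star[1])   # biased towards the null
```

The exposure-effect estimate shrinks by the factor (π_11 − π_01) < 1, the familiar attenuation result for a misclassified dichotomous exposure.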
3.3.2 Bias of Predicted Values

The vector of predicted values, Ŷ_wls = Π̂β̂_wls + Vγ̂_wls, is invariant, as it was in the case of LS estimation. Again, this hinges on the fact that Π̂β̂_wls is invariant to choices of Π̂, which is shown below. From expression (3.18), it follows that

Π̂β̂_wls = Π̂(Π̂'D_xΠ̂)⁻¹ Π̂'D_x(Y − Vγ̂_wls)
 = [Π̂₁; Π̂₁; ...; Π̂₁] Π̂₁⁻¹G⁻¹Π̂₁'⁻¹ [Π̂₁', Π̂₁', ..., Π̂₁'] D_x(Y − Vγ̂_wls)
 = [I_{k+1}; I_{k+1}; ...; I_{k+1}] G⁻¹ [I_{k+1}, I_{k+1}, ..., I_{k+1}] D_x(Y − Vγ̂_wls),

which is not a function of Π̂. Further, this implies that the vector of predicted counts, μ̂_wls, is also invariant.
Now let us examine the bias of Π̂β̂_wls. The conditional asymptotic expected value of Π̂β̂_wls can be written

E_a(Π̂β̂_wls | Π̂) = Π̂ E_a(β̂_wls | Π̂) = Π̂ (Π̂₁⁻¹Π₁β*)
 = [Π̂₁; Π̂₁; ...; Π̂₁] Π̂₁⁻¹ Π₁β*
 = [I_{k+1}; I_{k+1}; ...; I_{k+1}] Π₁β*
 = Πβ*.

This implies that, asymptotically, Π̂β̂_wls is an unconditionally unbiased estimator of Πβ*, and therefore Ŷ_wls is an unconditionally unbiased estimator of E(Y).
3.3.3 Goodness-of-fit Statistic for WLS Estimation

Goodness of fit for the WLS estimates is measured using the Wald goodness-of-fit chi-square statistic,

Q_W = [ln λ̂ − Ŷ_wls]' D_x [ln λ̂ − Ŷ_wls].

The vector λ̂, defined in Section 2.2, and the matrix D_x are dependent only on the original data. Ŷ_wls, we have seen, is invariant to choices of Π̂. Therefore, Q_W will be independent of Π̂: the same goodness-of-fit test will result regardless of the Π̂ matrix used in the fitted model.
3.3.4 Variance-Covariance Matrices for β̂_wls and γ̂_wls

The estimated asymptotic variance-covariance matrix of β̂_wls and γ̂_wls is

{ [Π̂, V]' D_x [Π̂, V] }⁻¹.

By multiplying out and using techniques to invert partitioned matrices, we can arrive at expressions for the estimated asymptotic variances and covariance: Var_a(β̂_wls), Cov_a(β̂_wls, γ̂_wls), and Var_a(γ̂_wls). The estimated asymptotic variance of β̂_wls can be written as

Var_a(β̂_wls) = Π̂₁⁻¹ H Π̂₁'⁻¹,

where H is a matrix whose elements are solely functions of the original counts, X = (X_01, X_11, ..., X_kS)'. Therefore, we can see that Var_a(β̂_wls) does vary with the choice of Π̂. Likewise, Cov_a(β̂_wls, γ̂_wls) is dependent on Π̂. Var_a(γ̂_wls), however, is not a function of Π̂. It is a function only of V and the original counts. Therefore, Var_a(γ̂_wls) will not vary as Π̂ varies.
3.3.5 Summary for WLS Estimation

In this section, we found the following asymptotic properties of the estimators. If Π̂ = Π, then β̂_wls is a conditionally unbiased estimator of β*. Further, γ̂_wls, Π̂β̂_wls, and Ŷ_wls are unconditionally unbiased for γ*, Πβ*, and E(Y), respectively, regardless of the value of Π̂. We also determined that, although Var_a(β̂_wls) and Cov_a(β̂_wls, γ̂_wls) vary with the choice of Π̂, Var_a(γ̂_wls) is independent of Π̂. In addition, the Wald goodness-of-fit statistic is invariant to choices of Π̂. Each of these results also holds in the case of logistic regression.
3.4 ML Estimation

As with the WLS estimators, we will be dealing with ML estimators in the context of Poisson regression. The results found here also hold for logistic regression. The two models of interest are identical to the ones presented in the previous section, namely: the true model, E(ln λ̂) = E(Y) = Πβ* + Vγ*, and the model to be fitted, E(Y) = Π̂β + Vγ. Again, these models make use of the property that the asymptotic expected value of Y is ln λ.
3.4.1 Bias of β̂_ml and γ̂_ml

Recall from Chapter 2 that the maximum likelihood estimators, β̂_ml and γ̂_ml, are arrived at through an iteration process for which the beginning value is the WLS estimate. Therefore, the first stage values of the procedure are

[β̂₁; γ̂₁] = { [Π̂, V]' D_μ̂wls [Π̂, V] }⁻¹ [Π̂, V]' (X − μ̂_wls) + [β̂_wls; γ̂_wls],

where μ̂_wls is the vector of predicted counts from the WLS estimation and D_μ̂wls is a diagonal matrix with these predicted counts down the main diagonal. Multiplying out this matrix expression, we find

[Π̂, V]' D_μ̂wls [Π̂, V] = [ Π̂'D_μ̂wlsΠ̂ , Π̂'D_μ̂wlsV ; V'D_μ̂wlsΠ̂ , V'D_μ̂wlsV ].

This matrix can be inverted using techniques to invert a partitioned matrix. Define the resulting matrix as the partitioned matrix

{ [Π̂, V]' D_μ̂wls [Π̂, V] }⁻¹ = [ A₁₁ , A₁₂ ; A₂₁ , A₂₂ ].

Using the inversion techniques, matrix expressions are arrived at for each of the matrices A₁₁, A₁₂, A₂₁, A₂₂. We find that A₁₁ can be written as A₁₁ = Π̂₁⁻¹H₁Π̂₁'⁻¹, A₁₂ as A₁₂ = −Π̂₁⁻¹H₂VA₂₂, A₂₁ as A₂₁ = −A₂₂V'H₃Π̂₁'⁻¹, and A₂₂ as A₂₂ = (V'H₄V)⁻¹, where H₁, H₂, H₃, and H₄ are matrices whose elements are solely functions of μ̂_wls.
Now the formulas for β̂₁ and γ̂₁ become

β̂₁ = (A₁₁Π̂' + A₁₂V')(X − μ̂_wls) + β̂_wls

and

γ̂₁ = (A₂₁Π̂' + A₂₂V')(X − μ̂_wls) + γ̂_wls.

Let us investigate the expression for γ̂₁ first. Notice that, since we showed in the previous section that μ̂_wls is invariant to choices of Π̂, the matrix A₂₂ is invariant to choices of Π̂. Also, when expanding

A₂₁Π̂' = −A₂₂V'H₃Π̂₁'⁻¹ [Π̂₁', Π̂₁', ..., Π̂₁'] = −A₂₂V'H₃ [I_{k+1}, I_{k+1}, ..., I_{k+1}],

we see that A₂₁Π̂' is invariant to choices of Π̂ as well. This, along with the fact that γ̂_wls is invariant, leads us to conclude that γ̂₁ is also invariant.

The formula for β̂₁ shows us that β̂₁ is not invariant to changes in Π̂. However, if we examine the estimator Π̂β̂₁, we find

Π̂β̂₁ = [I_{k+1}; I_{k+1}; ...; I_{k+1}] ( H₁ [I_{k+1}, I_{k+1}, ..., I_{k+1}] − H₂VA₂₂V' ) (X − μ̂_wls) + Π̂β̂_wls,

which is invariant to choices of Π̂. An important consequence of this result is that the vector of predicted values, Ŷ₁ = Π̂β̂₁ + Vγ̂₁, and hence the vector of predicted counts, μ̂₁, are also invariant.

Therefore, we have seen that for the first stage of the iteration procedure, γ̂₁ and Π̂β̂₁ are invariant to choices of Π̂. Now, let us investigate the properties of these two estimators for successive stages. The formulas for stage two are
β̂₂ = (B₁₁Π̂' + B₁₂V')(X − μ̂₁) + β̂₁

and

γ̂₂ = (B₂₁Π̂' + B₂₂V')(X − μ̂₁) + γ̂₁,

where B₁₁ = Π̂₁⁻¹M₁Π̂₁'⁻¹, B₁₂ = −Π̂₁⁻¹M₂VB₂₂, B₂₁ = −B₂₂V'M₃Π̂₁'⁻¹, B₂₂ = (V'M₄V)⁻¹, and M₁, M₂, M₃, and M₄ are matrices whose elements are functions solely of μ̂₁.

Clearly, γ̂₂ and Π̂β̂₂, and therefore μ̂₂, will be invariant to choices of Π̂. Also, it is obvious that this invariance will hold at each stage of the iteration procedure, including the final stage, which produces β̂_ml and γ̂_ml.

The invariance of γ̂_ml indicates that it is asymptotically unbiased for γ*. This is shown by the following argument.
If
n =n (i.e., the true model is fitted ), then, by the large-sample
--
properties of ML estimators, Yml is asymptotically unblased for
A
y*.
A
Since Yml is invariant, the vector Yml will be identical for
" A
A
each choice of n, including the choice n=n.
Therefore,
Yml is always asymptotically unbiased for y*. The same argLDTlent
A
can be used to show that Pml is asymptotically unbiased for p.
A
Since
Pml
is not invariant, it is only asymptotically unbiased for
. fJ* when n=n.
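A small numerical sketch of this conclusion, using plain Fisher scoring for the Poisson log-linear model (an assumption about the iteration's exact form — any scoring scheme converging to the ML solution gives the same limit), shows γ̂_ml agreeing across two choices of Π̂₁:

```python
import numpy as np

rng = np.random.default_rng(1)
S, k = 3, 1
X = rng.integers(20, 200, size=S * (k + 1)).astype(float)
L = rng.uniform(50, 150, size=S * (k + 1))
V = np.kron(np.eye(S)[:, 1:], np.ones((k + 1, 1)))

def gamma_ml(Pi1, iters=50):
    """Poisson ML (Fisher scoring on the log-linear model), gamma part."""
    Pi = np.vstack([Pi1] * S)
    W = np.hstack([Pi, V])
    theta = np.zeros(W.shape[1])
    theta[0] = np.log(X.sum() / L.sum())      # crude starting value
    for _ in range(iters):
        mu = L * np.exp(W @ theta)            # fitted counts
        # scoring step: theta += (W' D_mu W)^{-1} W'(X - mu)
        theta = theta + np.linalg.solve(W.T @ (mu[:, None] * W),
                                        W.T @ (X - mu))
    return theta[Pi1.shape[1]:]

g_a = gamma_ml(np.array([[1.0, 0.1], [1.0, 0.9]]))
g_b = gamma_ml(np.array([[1.0, 0.3], [1.0, 0.6]]))
assert np.allclose(g_a, g_b, atol=1e-8)       # gamma_ml invariant to Pi1
```

The invariance holds because any two invertible stacked blocks Π̂₁ span the same column space, so the fitted counts coincide and only the β coordinates change.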
3.4.2 Variance-Covariance Matrices for β̂_ml and γ̂_ml

The asymptotic variance-covariance matrix of the ML estimators is estimated at the first stage of the iteration by

{ [Π̂, V]' D_μ̂wls [Π̂, V] }⁻¹.

As we have just seen, the predicted counts will be invariant to choices of Π̂. Therefore, by examining the structure of the above matrix, whose components are presented in the previous subsection, we can see that Var_a(γ̂₁) alone will be invariant to Π̂. Clearly, this is true at all other stages of the procedure as well.
3.4.3 Goodness-of-fit Statistics for ML Estimation

In Section 2.3, two chi-square statistics were said to be used for testing goodness of fit with ML estimation. The first of these is

Q_P = Σ_{s=1}^S Σ_{i=0}^k (X_is − μ̂_is)² / μ̂_is,

and the second, Q_log, is the log-likelihood-ratio statistic. Notice that both these statistics are functions solely of the observed counts and the predicted counts. As we have seen, the predicted counts for ML estimation are not dependent on Π̂. Therefore, we can conclude that both Q_P and Q_log are invariant to choices of Π̂.
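Both statistics are computed directly from the observed and fitted counts; the values below are hypothetical, and Q_log is taken here to be the Poisson log-likelihood-ratio (deviance) statistic, an assumption about its exact form in Section 2.3:

```python
import numpy as np

# Hypothetical observed counts and ML-fitted counts.
X  = np.array([12.0, 30.0, 25.0, 40.0])
mu = np.array([14.2, 27.8, 23.1, 41.9])

Q_P   = np.sum((X - mu) ** 2 / mu)                    # Pearson chi-square
Q_log = 2.0 * np.sum(X * np.log(X / mu) - (X - mu))   # deviance form
```

Since both statistics depend on Π̂ only through the fitted counts μ̂, and μ̂ is invariant, their invariance is immediate.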
3.4.4 Summary for ML Estimation

As with the LS and WLS cases, we showed that γ̂_ml and the predicted counts are invariant to choices of Π̂. From this, it follows that γ̂_ml and μ̂_ml are asymptotically unbiased for γ* and μ, respectively. β̂_ml, on the other hand, is not invariant and will only be asymptotically unbiased for β* when Π̂ = Π. We also saw that the estimated asymptotic variance of γ̂_ml is invariant to Π̂, while the estimated asymptotic variance of β̂_ml and covariance of β̂_ml and γ̂_ml are not. Further, Q_P and Q_log, the goodness-of-fit statistics, were found to be invariant to choices of Π̂.
CHAPTER 4
EXTENSIONS AND IMPLICATIONS OF RESULTS

4.1 Introduction

In the previous chapter, many results were found concerning the properties of several types of estimators in a situation involving misclassification of exposure with equal misclassification probabilities over all strata. In this chapter, we will examine the implications of these results for other types of situations involving misclassification error. For instance, we will look at how the results from Chapter 3 can be applied to situations involving misclassification of covariates. We will also examine the effect of misclassification probabilities which differ over strata.

In addition, a relationship is derived between the estimate of β obtained when misclassification is ignored and the estimate obtained when the correct Π matrix is used in the fitted model. From this relationship, several results are shown concerning the properties of the estimate of β obtained when misclassification is ignored.
4.2 Misclassification of Covariates

In this section, several results are derived for properties of estimators when there is misclassification of covariates, rather than exposure. The results found here are similar to those found in Chapter 3. In fact, the derivations used in Chapter 3 are frequently applied to this altered scenario.

To begin, consider again the example presented in Section 2.5.2, excluding the race factor. There were six strata involved, defined by three age and two sex categories. The matrix representation of the model

E(Y) = Π₀β* + ΦVγ*    (4.1)

has

Π₀ =
 1 0
 1 1
 1 0
 1 1
 1 0
 1 1
 1 0
 1 1
 1 0
 1 1
 1 0
 1 1

and

ΦV =
 φ_12 φ_13 ψ_12
 φ_12 φ_13 ψ_12
 φ_22 φ_23 ψ_12
 φ_22 φ_23 ψ_12
 φ_32 φ_33 ψ_12
 φ_32 φ_33 ψ_12
 φ_12 φ_13 ψ_22
 φ_12 φ_13 ψ_22
 φ_22 φ_23 ψ_22
 φ_22 φ_23 ψ_22
 φ_32 φ_33 ψ_22
 φ_32 φ_33 ψ_22

with γ* = (γ₁*, γ₂*, γ₃*)', where β₀*, the intercept term, is the reference value for the group with no exposure in the first age and sex categories. β₁* is the exposure effect. γ₁* and γ₂* are the effects for the second and third age groups, respectively. γ₃* is the effect for the second sex category.
Now, redefine this model by including the intercept term in the vector of covariate coefficients. This new model, whose matrix components are presented below, is equivalent to model (4.1). The new model is written

E(Y) = Π̃₀β̃* + ΦṼγ̃*    (4.2)

with

β̃* = (β₁*),   γ̃* = (β₀*, γ₁*, γ₂*, γ₃*)',

Π̃₀ = (0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)',

and Ṽ defined as Ṽ = [1, V], where 1 is a column of 1's. From the definition of Φ given in Section 2.5.1 and the fact that each row of Φ sums to 1 (Σ_r φ_sr = 1), it follows that ΦṼ = [1, ΦV].

The motivation behind this new representation of model (4.1) is the following. Model (4.2) is equivalent to model (4.1); however, the intercept parameter in (4.2) is housed with the covariate coefficients.
This is done in order to create a model which is an exact analogy of the model developed for misclassification of exposure, E(Y) = Πβ* + Vγ*. In that model, Π is a matrix whose first column is 1, and whose remaining columns contain elements which are misclassification probabilities; the same can be said of ΦṼ. β* is the vector of parameters, including the reference parameter, which is associated with the columns of Π; γ̃* is the vector of parameters, including the reference parameter, which is associated with the columns of ΦṼ. V is a matrix representing perfectly classified variables; the same can be said of Π̃₀. γ* is the vector of parameters associated with the columns of V; β̃* is the vector of parameters associated with the columns of Π̃₀. The role played by Π in Chapter 3 is now played by ΦṼ; the roles played by β*, V, and γ* are now played by γ̃*, Π̃₀, and β̃*, respectively.
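For the age-by-sex example above, the components of model (4.2) can be built explicitly. The classification probabilities φ and ψ below are hypothetical row-stochastic matrices, chosen only for illustration:

```python
import numpy as np

# Hypothetical classification probabilities: phi[a, r] = P(classified
# age r+1 | true age a+1), psi[s, r] likewise for sex; rows sum to 1.
phi = np.array([[0.80, 0.15, 0.05],
                [0.10, 0.80, 0.10],
                [0.05, 0.15, 0.80]])
psi = np.array([[0.9, 0.1],
                [0.2, 0.8]])

rows = []
for s in range(2):            # true sex category
    for a in range(3):        # true age category
        for _x in range(2):   # two exposure rows per stratum
            rows.append([phi[a, 1], phi[a, 2], psi[s, 1]])
PhiV = np.array(rows)                              # the 12 x 3 block of (4.1)

Pi0_tilde = np.tile([0.0, 1.0], 6).reshape(12, 1)  # exposure column
PhiV_tilde = np.hstack([np.ones((12, 1)), PhiV])   # = [1, PhiV]
```

Because each row of φ and ψ sums to 1, prepending the column of 1's to ΦV reproduces ΦṼ exactly, as claimed above.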
Earlier, we examined properties of estimators when an estimate of Π is used in the fitted model. Here, we will examine the properties of estimators when an estimate of Φ is used in the fitted model. The new model of interest is the model to be fitted: E(Y) = Π̃₀β̃ + Φ̂Ṽγ̃.

In Chapter 3, the fact that Π̂ is a series of vertically concatenated matrices which are identical and invertible led to the invariance of R̂ and R̂_w. This, in turn, led to the invariance of γ̂ for LS and WLS estimation. If Φ̂Ṽ were a series of vertically concatenated identical invertible matrices, β̃ would be invariant to choices of Φ̂ for LS and WLS estimation. However, as the example above illustrates, Φ̂Ṽ will not necessarily have this property. Therefore, the derivations in Chapter 3 cannot be used to prove the invariance of the matrix expressions

Φ̂Ṽ [(Φ̂Ṽ)'(Φ̂Ṽ)]⁻¹ (Φ̂Ṽ)'   and   Φ̂Ṽ [(Φ̂Ṽ)'D_x(Φ̂Ṽ)]⁻¹ (Φ̂Ṽ)'D_x,

which are the analogies to R̂ and R̂_w under misclassification of covariates.
Surprisingly, though, these expressions are invariant to choices of Φ̂. This is due to the fact that for each Φ̂, there exists a matrix T such that Φ̂Ṽ = ṼT and T is invertible. Consider the matrix ΦṼ presented in the above example. An estimate of the corresponding ΦṼ matrix has Ṽ and T matrices as follows:

Ṽ =
 1 0 0 0
 1 0 0 0
 1 1 0 0
 1 1 0 0
 1 0 1 0
 1 0 1 0
 1 0 0 1
 1 0 0 1
 1 1 0 1
 1 1 0 1
 1 0 1 1
 1 0 1 1

and

T =
 1   φ̂_12          φ̂_13          ψ̂_12
 0   φ̂_22 − φ̂_12   φ̂_23 − φ̂_13   0
 0   φ̂_32 − φ̂_12   φ̂_33 − φ̂_13   0
 0   0              0              ψ̂_22 − ψ̂_12
Using this equality, the expression Φ̂Ṽ[(Φ̂Ṽ)'(Φ̂Ṽ)]⁻¹(Φ̂Ṽ)' becomes

ṼT [T'Ṽ'ṼT]⁻¹ T'Ṽ' = Ṽ [Ṽ'Ṽ]⁻¹ Ṽ'.    (4.3)

Similarly, the expression Φ̂Ṽ[(Φ̂Ṽ)'D_x(Φ̂Ṽ)]⁻¹(Φ̂Ṽ)'D_x becomes

ṼT [T'Ṽ'D_xṼT]⁻¹ T'Ṽ'D_x = Ṽ [Ṽ'D_xṼ]⁻¹ Ṽ'D_x.    (4.4)

Expressions (4.3) and (4.4) are analogous to the expressions R̂ and R̂_w defined under misclassification of exposure. Both (4.3) and (4.4) are not dependent on Φ̂. Using arguments analogous to those in Chapter 3, we can conclude that β̃ is invariant to Φ̂ for LS and WLS estimation.
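The cancellation in (4.3) can be confirmed numerically: for any invertible T, the columns of ṼT span the same space as those of Ṽ, so the projection matrix is unchanged. The Ṽ below is the age-by-sex example; T is an arbitrary invertible matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
# V_tilde: perfectly classified design (intercept, age2, age3, sex2)
blocks = [[0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]]
Vt = np.array([[1, a == 1, a == 2, s == 1]
               for a, s in blocks for _ in range(2)], dtype=float)

def projector(T):
    """Hat matrix built from the columns of Vt @ T."""
    M = Vt @ T
    return M @ np.linalg.solve(M.T @ M, M.T)

T1 = np.eye(4)
T2 = np.triu(rng.uniform(0.1, 1.0, (4, 4)))   # invertible: positive diagonal
assert np.allclose(projector(T1), projector(T2))
```

The same identity with the weight matrix D_x inserted gives the invariance of (4.4).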
The predicted counts for WLS estimation are also invariant. This follows from the invariance of Φ̂Ṽγ̃_wls, which is proven as follows. Expression (3.18) for β̂_wls can be used to determine the formula for γ̃_wls for model (4.2). This is accomplished simply by replacing Π̂ by Φ̂Ṽ, V by Π̃₀, and γ̂_wls by β̃_wls. The resulting formula is

γ̃_wls = [(Φ̂Ṽ)'D_x(Φ̂Ṽ)]⁻¹ (Φ̂Ṽ)'D_x(Y − Π̃₀β̃_wls).

Using the equality Φ̂Ṽ = ṼT, where T is invertible, leads to

Φ̂Ṽγ̃_wls = ṼT [T'Ṽ'D_xṼT]⁻¹ T'Ṽ'D_x(Y − Π̃₀β̃_wls)
 = Ṽ [Ṽ'D_xṼ]⁻¹ Ṽ'D_x(Y − Π̃₀β̃_wls).

This expression is not dependent on the choice of Φ̂. Therefore, Ŷ_wls = Π̃₀β̃_wls + Φ̂Ṽγ̃_wls, and in turn μ̂_wls, are invariant to choices of Φ̂.
β̃_ml is also invariant to Φ̂. This is shown using the formula in Section 3.4.1 for the first stage values of the iteration procedure, namely

β̂₁ = (A₁₁Π̂' + A₁₂V')(X − μ̂_wls) + β̂_wls,

where A₁₁ = Π̂₁⁻¹H₁Π̂₁'⁻¹, A₁₂ = −Π̂₁⁻¹H₂V(V'H₄V)⁻¹, and H₁, H₂, and H₄ are matrices whose elements are solely functions of μ̂_wls. This formula applies to situations in which model (1.18) is being fitted. Since model (4.2), rather than (1.18), is used in the context of covariate misclassification, in order to arrive at a formula for β̃₁ the matrix Π̂ in the above expression is replaced by Π̃₀, Π̂₁ by Π̃₀₁, and V by Φ̂Ṽ.

After making these substitutions and replacing A₁₁ and A₁₂ with their definitions, the formula for β̃₁ becomes

β̃₁ = ( Π̃₀₁⁻¹H₁Π̃₀₁'⁻¹Π̃₀' − Π̃₀₁⁻¹H₂ Φ̂Ṽ [(Φ̂Ṽ)'H₄(Φ̂Ṽ)]⁻¹ (Φ̂Ṽ)' ) (X − μ̂_wls) + β̃_wls
 = ( Π̃₀₁⁻¹H₁Π̃₀₁'⁻¹Π̃₀' − Π̃₀₁⁻¹H₂ Ṽ [Ṽ'H₄Ṽ]⁻¹ Ṽ' ) (X − μ̂_wls) + β̃_wls,

which is independent of Φ̂, since the equality Φ̂Ṽ = ṼT with T invertible again allows the T's to cancel. The expression Φ̂Ṽγ̃₁ can also be shown to be invariant. This leads to the invariance of μ̂₁, and of later stage iteration values of β̃, including β̃_ml.
In summary, we have shown that in the context of misclassified covariates, when the estimated degree of misclassification varies, the estimates of β remain the same for LS, WLS, and ML estimation. In particular, if the chosen Φ̂ matrix is, in fact, the true Φ matrix, the value of β̂ will be the same as if another Φ̂ matrix were chosen. This implies that β̂ is an unbiased estimator of β (asymptotically unbiased in the case of WLS and ML estimation). In addition, the predicted counts, and therefore the goodness-of-fit statistics, will be invariant to choices of Φ̂ for all three types of estimation.

Below we will argue that the invariance of the estimated odds ratio for the logistic model is a logical result. In Section 2.5.1, we showed that the assumption of equal misclassification probabilities of covariates over all exposure categories (i.e., φ_sr = φ_sri for all i = 0, 1, ..., k) is equivalent to the assumption that, given classified covariate levels, exposure and true covariate levels are independent. Consider the case of logistic regression. This assumption implies that, if the classified covariates are being controlled for, controlling for the true covariates in addition will not affect the degree of confounding. In other words, adjusting for the classified covariates will result in the same adjusted odds ratio as adjusting for the true covariates.

The model used to control for the classified covariates is

logit θ = β₀ + β_i + Vγ.

The model used to control for the true covariates is

logit θ = β₀ + β_i + ΦVγ.

The above argument implies that setting φ_sr = φ_sri for all i = 0, 1, ..., k will result in equal β_i's for the two models. This agrees with the result presented earlier in this section, namely that β̂ is invariant to choices of Φ̂, assuming that φ_sr = φ_sri for all i.
4.3 Nonconstant Misclassification Probabilities

In developing the theory in Chapter 3, one basic assumption was that misclassification probabilities were constant over all strata, i.e., Π = [Π₁; Π₁; ...; Π₁]. This assumption was necessary to show the invariance of R̂ in the LS case, which led to the invariance of γ̂_ls, Π̂β̂_ls, and Ŷ_ls. In the WLS case, the matrix R̂_w was found to be invariant under this assumption, which led to the invariance of several WLS estimators. This invariance, in turn, contributed to the invariance of several ML estimators. Now, we will investigate the effect of a Π̂ which does not meet this assumption.

For simplicity's sake, consider a situation in which there are just two strata, i.e., S = 2. Let

Π̂ = [Π̂₁; Π̂₂],

where Π̂₁ is the matrix of estimated misclassification probabilities corresponding to the first stratum and Π̂₂ is the matrix of estimated misclassification probabilities corresponding to the second stratum. Using this definition of Π̂, let us expand the matrix expression Π̂(Π̂'Π̂)⁻¹Π̂':
Π̂(Π̂'Π̂)⁻¹Π̂' = [Π̂₁; Π̂₂] (Π̂₁'Π̂₁ + Π̂₂'Π̂₂)⁻¹ [Π̂₁', Π̂₂'].

Unlike the situation in which Π̂₁ = Π̂₂, the matrix Π̂(Π̂'Π̂)⁻¹Π̂' will not be invariant to choices of Π̂₁ and Π̂₂ when Π̂₁ ≠ Π̂₂. From this, it follows that the matrix R̂ will not be invariant to choices of Π̂₁ and Π̂₂. This result is easily expanded to any number of strata. In general, if there are S strata, then Π̂ = [Π̂₁; Π̂₂; ...; Π̂_S], and the matrix expression Π̂(Π̂'Π̂)⁻¹Π̂', and therefore R̂, will not be invariant to choices of Π̂.

Similarly, if Π̂ = [Π̂₁; Π̂₂], the matrix Π̂(Π̂'D_xΠ̂)⁻¹Π̂'D_x is also not invariant to choices of Π̂₁ and Π̂₂. This result clearly holds for any number of strata, which implies that the matrix R̂_w is not invariant for this situation.

Not only are R̂ and R̂_w not invariant, but since the LS, WLS, and ML estimates of γ are dependent on these two matrices, they will also not be invariant to choices of Π̂. However, if Π̂ = Π, it is easily shown that β̂_ls and γ̂_ls are unbiased estimators of β* and γ*, and that the WLS and ML estimators of β* and γ* are asymptotically unbiased.
Similar results are found for model (4.2). The invariance of β̃_ls, β̃_wls, and β̃_ml to changes in Φ̂ does not hold if φ_sr ≠ φ_sri for all i = 0, 1, ..., k. In this case, there still exists a matrix T such that Φ̂Ṽ = ṼT; however, this matrix T is not invertible, since it is not a square matrix. Without the invertibility of T, the invariance of β̃ does not hold.

These results indicate that the equality of exposure misclassification probabilities over strata, or of covariate misclassification probabilities over exposure categories, can have a great effect on the properties of certain estimators. In the case of misclassification of covariates, in fact, if one can ascertain that the misclassification probabilities are equal over all exposure categories, the misclassification error need not be corrected for. Therefore, determining whether the misclassification probabilities are constant may be one of the most important steps in correcting for misclassification error.
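The loss of invariance is easy to exhibit. Stacking two different 2 x 2 blocks (hypothetical probabilities) changes the projection Π̂(Π̂'Π̂)⁻¹Π̂', whereas stacking identical blocks always gives the same projection:

```python
import numpy as np

def hat(Pi_top, Pi_bot):
    """Projection onto the columns of the stacked Pi matrix."""
    Pi = np.vstack([Pi_top, Pi_bot])
    return Pi @ np.linalg.solve(Pi.T @ Pi, Pi.T)

A = np.array([[1.0, 0.1], [1.0, 0.9]])   # stratum-1 block
B = np.array([[1.0, 0.3], [1.0, 0.6]])   # stratum-2 block

assert np.allclose(hat(A, A), hat(B, B))  # invariant when Pi1 = Pi2
assert not np.allclose(hat(A, B), hat(B, A))  # not invariant otherwise
```

With identical blocks, the column space is always the set of vectors that repeat the same (k+1)-vector in every stratum, regardless of the block chosen; with stratum-specific blocks, the column space itself depends on the probabilities.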
4.4 Ordinal Exposure Categories

A short, but important, note is made here concerning situations in which the exposure categories are ordinal, rather than nominal. In these cases, models such as those developed in Section 2.7 can be applied when there is misclassification of exposure. The Π matrix associated with those models, however, lacks certain properties which the Π matrices of models (1.16), (1.18), and (1.19) possess.

As was illustrated in Section 2.7, the columns of the Π matrix for model (2.10) are not composed of the misclassification probabilities themselves, as has been the case with the other Π matrices we have considered. Instead, the elements are weighted averages of the scores d_j, for j = 0, 1, ..., k, with the weights being the misclassification probabilities. Likewise, the elements of the Π matrix for model (2.11) are weighted averages of the scores and of the scores taken to certain powers.

Although these Π matrices can be written as Π = [Π₁; Π₁; ...; Π₁], the individual Π₁ matrices are not invertible. Neither are they the product of two matrices, of which one is invertible and the other invariant, as was the matrix corresponding to the misclassified variable in the previous section.

The consequence of this is that the statistics found earlier to be invariant to choices of misclassification probabilities are not invariant in this case. The values of γ̂, Var(γ̂), and the goodness-of-fit statistics will vary with different sets of π_ij's for these models.
4.5 Relationship Between β̂⁰ and β̂*

In this section, a relationship is derived between estimates of β obtained when Π̂ = Π and those obtained when Π̂ = Π₀. Let us define β̂⁰ = (β̂⁰_0, β̂⁰_1, ..., β̂⁰_k)' as the vector of estimates of β obtained when misclassification is ignored (i.e., when Π̂ = Π₀). β̂* = (β̂*_0, β̂*_1, ..., β̂*_k)' is defined as the vector of estimates obtained when Π̂ = Π. The relationship found between β̂⁰ and β̂* for LS, WLS, and ML estimation methods is used in later sections to derive several results concerning the bias of β̂⁰ and properties of certain hypothesis tests.

Let us begin by examining the two Π̂ matrices involved, Π and Π₀. Recall from Section 1.10 that Π can be written as Π = [Π₁; Π₁; ...; Π₁], where

Π₁ =
 1  π_01  ...  π_0k
 1  π_11  ...  π_1k
 ...
 1  π_k1  ...  π_kk.

Similarly, Π₀ can be written as Π₀ = [Π₀₁; Π₀₁; ...; Π₀₁], where

Π₀₁ =
 1 0 0 0 .. 0
 1 1 0 0 .. 0
 1 0 1 0 .. 0
 ...
 1 0 0 0 .. 1.

The elements in the first column and main diagonal of Π₀₁ are all 1's; the remaining elements are all 0's. Π₁ and Π₀₁ are related through the following expression:

Π₁ = Π₀₁Δ,  where  Δ =
 1  π_01            ...  π_0k
 0  π_11 − π_01     ...  π_1k − π_0k
 ...
 0  π_k1 − π_01     ...  π_kk − π_0k.    (4.5)
In the case of LS estimation, we can easily derive the relationship between β̂*_ls and β̂⁰_ls using expression (4.5). From expression (3.5), the two vectors of estimates can be written

β̂*_ls = (Π'Π)⁻¹ Π'(Y − Vγ̂*_ls)

and

β̂⁰_ls = (Π₀'Π₀)⁻¹ Π₀'(Y − Vγ̂⁰_ls),

where γ̂*_ls and γ̂⁰_ls are the LS estimates of γ obtained when Π and Π₀ are used in the fitted model, respectively. Since Π₁ and Π₀₁ are both invertible, we can see that

(Π₁'Π₁)⁻¹Π₁' = Π₁⁻¹   and   (Π₀₁'Π₀₁)⁻¹Π₀₁' = Π₀₁⁻¹.

Using this fact, along with expression (3.9), we find

β̂*_ls = (1/S) [Π₁⁻¹, Π₁⁻¹, ..., Π₁⁻¹] (Y − Vγ̂*_ls)    (4.6)

and

β̂⁰_ls = (1/S) [Π₀₁⁻¹, Π₀₁⁻¹, ..., Π₀₁⁻¹] (Y − Vγ̂⁰_ls).

Now, since Δ is invertible, expression (4.5) implies

Π₁⁻¹ = Δ⁻¹Π₀₁⁻¹.    (4.7)

Substituting expression (4.7) into expression (4.6) gives us

β̂*_ls = (1/S) Δ⁻¹ [Π₀₁⁻¹, Π₀₁⁻¹, ..., Π₀₁⁻¹] (Y − Vγ̂*_ls).    (4.8)

It was shown in Chapter 3 that the vector γ̂_ls is invariant to choices of Π̂. This implies that γ̂*_ls = γ̂⁰_ls. Thus, expression (4.8) becomes

β̂*_ls = Δ⁻¹ (1/S) [Π₀₁⁻¹, Π₀₁⁻¹, ..., Π₀₁⁻¹] (Y − Vγ̂⁰_ls) = Δ⁻¹ β̂⁰_ls,

which implies

β̂⁰_ls = Δ β̂*_ls.    (4.9)
Consider a situation in which there are just two exposure categories. In this case,

Π₁ = [ 1 π_01 ; 1 π_11 ]   and   Δ = [ 1 π_01 ; 0 π_11 − π_01 ].

Using expression (4.9), we find

[β̂⁰_0; β̂⁰_1] = [ 1 π_01 ; 0 π_11 − π_01 ] [β̂*_0; β̂*_1] = [β̂*_0 + π_01 β̂*_1 ; (π_11 − π_01) β̂*_1],

so that

E(β̂⁰_0 | Π̂) = β*_0 + π_01 β*_1   and   E(β̂⁰_1 | Π̂) = (π_11 − π_01) β*_1.

These are consistent with the results found in Section 3.2.2 for the bias of β̂_0 and β̂_1 when any Π̂ is used in the fitted model. Substituting π̂_11 = 1 and π̂_01 = 0 into the expression given in that section leads to the expression given above.
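The relation β̂⁰ = Δβ̂* is equally easy to check with more than two categories. The probabilities and parameter values below are hypothetical (k = 2):

```python
import numpy as np

k = 2
Pi1 = np.array([[1.0, 0.10, 0.05],        # row i = (1, pi_i1, ..., pi_ik)
                [1.0, 0.80, 0.10],
                [1.0, 0.15, 0.85]])
Pi0_1 = np.hstack([np.ones((k + 1, 1)),   # first column and diagonal 1's
                   np.vstack([np.zeros((1, k)), np.eye(k)])])

Delta = np.linalg.solve(Pi0_1, Pi1)       # Delta = Pi0_1^{-1} Pi_1
beta_star = np.array([-1.0, 0.4, 1.2])
beta0 = Delta @ beta_star                 # beta^0 = Delta beta*
```

Row 0 of Δ is (1, π_01, ..., π_0k), and row i is (0, π_i1 − π_01, ..., π_ik − π_0k), so the entries of β⁰ are exactly the weighted combinations derived in the next section.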
The relationship in expression (4.9) will be shown to hold for both WLS and ML estimates. Beginning with the WLS case, β̂*_wls and β̂⁰_wls can be written using expression (3.18) as

β̂*_wls = (Π'D_xΠ)⁻¹ Π'D_x(Y − Vγ̂_wls)

and

β̂⁰_wls = (Π₀'D_xΠ₀)⁻¹ Π₀'D_x(Y − Vγ̂_wls).

The matrix D_x, a function of the observed counts, is constant for all values of Π̂, including the two choices used above, Π and Π₀. Since γ̂_wls is invariant to choices of Π̂, we have substituted γ̂_wls = γ̂*_wls and γ̂_wls = γ̂⁰_wls into the above expressions.

We showed in Section 3.3.1 that (Π̂'D_xΠ̂)⁻¹ = Π̂₁⁻¹G⁻¹Π̂₁'⁻¹. This expression holds for Π̂ = Π₀ and Π̂ = Π, so that

(Π₀'D_xΠ₀)⁻¹ = Π₀₁⁻¹G⁻¹Π₀₁'⁻¹   and   (Π'D_xΠ)⁻¹ = Π₁⁻¹G⁻¹Π₁'⁻¹.

This implies that

β̂*_wls = Π₁⁻¹G⁻¹ [I_{k+1}, I_{k+1}, ..., I_{k+1}] D_x(Y − Vγ̂_wls).    (4.10)

Similarly,

β̂⁰_wls = Π₀₁⁻¹G⁻¹ [I_{k+1}, I_{k+1}, ..., I_{k+1}] D_x(Y − Vγ̂_wls).

Substituting expression (4.7) into expression (4.10) leads us to

β̂*_wls = Δ⁻¹ Π₀₁⁻¹G⁻¹ [I_{k+1}, I_{k+1}, ..., I_{k+1}] D_x(Y − Vγ̂_wls) = Δ⁻¹ β̂⁰_wls,

which implies

β̂⁰_wls = Δ β̂*_wls.
A
Similarly, this relationship can be shown to hold for fJ* ml and
A
poml
by examining the subsequent-step formulas for the two vectors
of estimators. In this case, however, we will show the relationship
holds using a Simpler proof which could be applied to any of the
three types of estimators. This proof makes use of the invariance
A
of
nPml for any choice of n,
which implies
e'
=
A
"
no 1fJ'Jml
~
" "
A
no 1poml = n1fJ* ml
poml = no 1 n 1fJ* ml
A
-t
A
A
~
From expression (4.5) we can see that
A
(4.11)
-1
A=no 1 n1
Substituting this expression into expresson (4.11) leads to
A
A
poml
= A fJ* ml
which is the desired result.
106
Therefore, we have seen that the above relationship holds for all three types of estimators. The relationship also holds for the corresponding parameters. We can see this through the LS estimators. When Π̂ = Π₀, expression (3.12) for the expected value of β̂_ls becomes

E(β̂⁰_ls | Π̂ = Π₀) = β⁰ = Π₀₁⁻¹Π₁β*,

which implies β⁰ = Δβ*.
4.6 Bias Towards The Null Of β̂⁰_k

In this section, we will prove a result similar to that of Gladen and Rogan (1979) discussed in Section 1.6.2. They found bias towards the null in estimating the relative risk which compares the highest and lowest exposure categories when misclassification is ignored. They also showed this property does not necessarily hold for the relative risks comparing intermediate categories to the lowest.

Here, we will show that β̂⁰_k is biased towards the null value of 0. β̂⁰_k is the estimate which is used to compare the highest and lowest exposure categories when misclassification is ignored. In the case of logistic regression, this result implies that the estimate of OR_k is biased towards the null when misclassification is ignored.
Expanding the relationship found for the parameters, namely $\beta^{0}=\Delta\beta^{*}$, we see

$$\begin{bmatrix}\beta^{0}_0\\ \beta^{0}_1\\ \vdots\\ \beta^{0}_k\end{bmatrix}
=
\begin{bmatrix}
1 & \pi_{01} & \cdots & \pi_{0k}\\
0 & (\pi_{11}-\pi_{01}) & \cdots & (\pi_{1k}-\pi_{0k})\\
\vdots & & & \vdots\\
0 & (\pi_{k1}-\pi_{01}) & \cdots & (\pi_{kk}-\pi_{0k})
\end{bmatrix}
\begin{bmatrix}\beta^{*}_0\\ \beta^{*}_1\\ \vdots\\ \beta^{*}_k\end{bmatrix},$$

which implies

$$\beta^{0}_0 = \beta^{*}_0 + \sum_{j=1}^{k}\pi_{0j}\,\beta^{*}_j,
\qquad
\beta^{0}_1 = \sum_{j=1}^{k}(\pi_{1j}-\pi_{0j})\,\beta^{*}_j,
\qquad\ldots,\qquad
\beta^{0}_k = \sum_{j=1}^{k}(\pi_{kj}-\pi_{0j})\,\beta^{*}_j.$$
To show bias towards the null of $\hat\beta^{0}_k$, we need to show that

$$E(\hat\beta^{0}_k)=\beta^{0}_k < \beta^{*}_k = E(\hat\beta^{*}_k).$$

To do this, we will make an assumption equivalent to one made by Gladen and Rogan, namely that the true response parameters are strictly increasing across the ordered exposure categories. This assumption implies that

$$0 < \beta^{*}_1 < \beta^{*}_2 < \cdots < \beta^{*}_k.$$
As we saw above,

$$\beta^{0}_k = \sum_{j=1}^{k}(\pi_{kj}-\pi_{0j})\,\beta^{*}_j
= \sum_{j\in\omega}(\pi_{kj}-\pi_{0j})\,\beta^{*}_j + \sum_{j\in\Omega}(\pi_{kj}-\pi_{0j})\,\beta^{*}_j,$$

where $\omega=\{\,j>0:(\pi_{kj}-\pi_{0j})\le 0\,\}$ and $\Omega=\{\,j>0:(\pi_{kj}-\pi_{0j})>0\,\}$. Since each term of the sum over $\omega$ is nonpositive and each $\beta^{*}_j$ is positive,

$$\beta^{0}_k \;\le\; \sum_{j\in\Omega}(\pi_{kj}-\pi_{0j})\,\beta^{*}_j
\;<\; \sum_{j\in\Omega}(\pi_{kj}-\pi_{0j})\,\beta^{*}_k
\;=\; \Big[\pi_{kk}-\pi_{0k}+\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{kj}-\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{0j}\Big]\,\beta^{*}_k, \qquad(4.12)$$

since $j=k\in\Omega$.
Now,

$$0 \;\le\; \sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{kj}-\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{0j} \;\le\; (1-\pi_{kk}),$$

since the maximum value of $\sum_{j\in\Omega,\,j\ne k}\pi_{kj}$ is $(1-\pi_{kk})$, the minimum value of $\sum_{j\in\Omega,\,j\ne k}\pi_{0j}$ is 0, and $j\in\Omega$ ensures that $\pi_{kj}>\pi_{0j}$.

Using this inequality, we can see that

$$\Big[\pi_{kk}-\pi_{0k}+\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{kj}-\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{0j}\Big] \;\le\; (\pi_{kk}-\pi_{0k})+(1-\pi_{kk}) \;=\; 1-\pi_{0k} \;\le\; 1$$

and

$$\Big[\pi_{kk}-\pi_{0k}+\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{kj}-\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{0j}\Big] \;\ge\; \pi_{kk}-\pi_{0k} \;>\; 0.$$
This shows us that

$$0 \;<\; \Big[\pi_{kk}-\pi_{0k}+\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{kj}-\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{0j}\Big] \;\le\; 1,$$

which can be used, along with expression (4.12), to show that

$$\beta^{0}_k \;<\; \Big[\pi_{kk}-\pi_{0k}+\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{kj}-\sum_{\substack{j\in\Omega\\ j\ne k}}\pi_{0j}\Big]\,\beta^{*}_k \;\le\; \beta^{*}_k,$$

or simply, $\beta^{0}_k<\beta^{*}_k$. Therefore, we have shown that $\hat\beta^{0}_k$ is biased towards the null.
This implies that $\widehat{OR}{}^{0}_k$, the estimated odds ratio comparing the highest and lowest exposure categories obtained when misclassification is ignored, is biased towards the null also. Nothing can be said, however, concerning the direction of the bias for estimated odds ratios involving intermediate exposure categories.
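The attenuation result can be illustrated numerically. The sketch below is not one of the dissertation's examples; the $\pi$'s and $\beta^{*}$'s are hypothetical, with $k=2$. It computes $\beta^{0}_k=\sum_{j}(\pi_{kj}-\pi_{0j})\beta^{*}_j$ and confirms that it lies strictly between the null value 0 and $\beta^{*}_k$ under the ordering assumption.

```python
# Sketch (hypothetical values, k = 2; not from the text): the induced parameter
# beta0_k = sum_j (pi_kj - pi_0j) * beta*_j is attenuated towards the null value 0
# when 0 < beta*_1 < beta*_2.

p = [[0.85, 0.10, 0.05],   # row i = (pi_i0, pi_i1, pi_i2)
     [0.10, 0.80, 0.10],
     [0.05, 0.15, 0.80]]
beta_star = [0.7, 1.6]     # (beta*_1, beta*_2): positive and increasing

k = 2
beta0_k = sum((p[k][j] - p[0][j]) * beta_star[j - 1] for j in (1, 2))
assert 0 < beta0_k < beta_star[-1]   # bias towards the null: 0 < beta0_2 < beta*_2
```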
4.7 Relationship Between Variances

In this section, we will use the relationship $\hat\beta^{0}=\Delta\hat\beta^{*}$, which was found to hold for LS, WLS, and ML estimators, to derive a relationship between the variances of the $\hat\beta^{0}_j$'s and the $\hat\beta^{*}_j$'s.

We saw in Chapter 3 that $Var(\hat\beta)$ varies with choices of $\hat\Pi$ for LS, WLS, and ML estimation. Therefore, $Var(\hat\beta^{0})$ and $Var(\hat\beta^{*})$ will certainly not be equal to each other regardless of the estimation procedure used. $\hat\beta^{0}=\Delta\hat\beta^{*}$ implies

$$Var(\hat\beta^{0}) = \Delta\,Var(\hat\beta^{*})\,\Delta'. \qquad(4.13)$$

Let $\sigma^{2}_{ij}=Cov(\hat\beta_i,\hat\beta_j)$ for $i,j=1,2,\ldots,k$; $\sigma^{2}_{0j}=Cov(\hat\beta_0,\hat\beta_j)$ for $j=1,2,\ldots,k$; and $\sigma^{2}_{00}=Cov(\hat\beta_0,\hat\beta_0)$, so that

$$Var(\hat\beta^{*}) =
\begin{bmatrix}
\sigma^{2*}_{00} & \sigma^{2*}_{01} & \cdots & \sigma^{2*}_{0k}\\
\sigma^{2*}_{10} & \sigma^{2*}_{11} & \cdots & \sigma^{2*}_{1k}\\
\vdots & & & \vdots\\
\sigma^{2*}_{k0} & \sigma^{2*}_{k1} & \cdots & \sigma^{2*}_{kk}
\end{bmatrix}
\qquad\text{and}\qquad
Var(\hat\beta^{0}) =
\begin{bmatrix}
\sigma^{20}_{00} & \sigma^{20}_{01} & \cdots & \sigma^{20}_{0k}\\
\sigma^{20}_{10} & \sigma^{20}_{11} & \cdots & \sigma^{20}_{1k}\\
\vdots & & & \vdots\\
\sigma^{20}_{k0} & \sigma^{20}_{k1} & \cdots & \sigma^{20}_{kk}
\end{bmatrix}.$$

Looking at just the diagonal elements of $Var(\hat\beta^{0})$, which are the variances of $\hat\beta^{0}_0$ and the $\hat\beta^{0}_j$'s, we can see that expression (4.13) leads to

$$\sigma^{20}_{00} = \sigma^{2*}_{00} + 2\sum_{j=1}^{k}\pi_{0j}\,\sigma^{2*}_{0j} + \sum_{i=1}^{k}\sum_{j=1}^{k}\pi_{0i}\pi_{0j}\,\sigma^{2*}_{ij}$$

and, for the highest exposure category,

$$\sigma^{20}_{kk} = \sum_{i=1}^{k}\sum_{j=1}^{k}(\pi_{ki}-\pi_{0i})(\pi_{kj}-\pi_{0j})\,\sigma^{2*}_{ij}.$$

Since $(\pi_{ij}-\pi_{0j})$ and $\sigma^{2*}_{ij}$ for $i\ne j$ can be either positive or negative for $i=0,1,\ldots,k$ and $j=0,1,\ldots,k$, we can make no conclusions concerning the direction of the relationship between $\sigma^{20}_{jj}$ and $\sigma^{2*}_{jj}$ for $j=0,1,\ldots,k$.
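The diagonal expansion above can be verified numerically. The sketch below is not from the dissertation; the $\pi$'s and the covariance matrix are made up, with $k=2$. It forms $\Delta\,Var(\hat\beta^{*})\,\Delta'$ and checks that its $(k,k)$ element agrees with the double-sum formula for $\sigma^{20}_{kk}$.

```python
# Sketch (hypothetical values, k = 2; not from the text): the diagonal of
# Delta Var(beta*) Delta' matches the double-sum expansion for sigma20_kk.

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

p = [[0.85, 0.10, 0.05],   # row i = (pi_i0, pi_i1, pi_i2)
     [0.10, 0.80, 0.10],
     [0.05, 0.15, 0.80]]
V = [[0.20, 0.05, 0.02],   # made-up symmetric Var(beta*), order (beta*_0, beta*_1, beta*_2)
     [0.05, 0.30, 0.04],
     [0.02, 0.04, 0.40]]
Delta = [[1, p[0][1], p[0][2]],
         [0, p[1][1] - p[0][1], p[1][2] - p[0][2]],
         [0, p[2][1] - p[0][1], p[2][2] - p[0][2]]]

V0 = matmul(matmul(Delta, V), transpose(Delta))   # Var(beta^0) = Delta V Delta'

k = 2
direct = sum((p[k][i] - p[0][i]) * (p[k][j] - p[0][j]) * V[i][j]
             for i in (1, 2) for j in (1, 2))
assert abs(V0[k][k] - direct) < 1e-12
```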
The relationship between the true variances, given in expression (4.13), also holds for the estimated asymptotic variances for the ML estimators of $\beta$. From the formula given in Section 3.4.2 and its expansion in Section 3.4.1, the estimated asymptotic variance of $\hat\beta^{*}_{ml}$ can be written as

$$\widehat{Var}_a(\hat\beta^{*}_{ml}) = \Pi_1^{-1}\,\hat P\,\Pi_1'^{-1}, \qquad(4.14)$$

where $\hat P$ is a matrix whose elements are functions of $\hat\mu_{ml}$. As we saw in Chapter 3, $\hat\mu_{ml}$ is invariant to choices of $\hat\Pi$. Therefore, $\hat P$ is also invariant.
The estimated asymptotic variance of $\hat\beta^{0}_{ml}$ can be written as

$$\widehat{Var}_a(\hat\beta^{0}_{ml}) = \Pi_{01}^{-1}\,\hat P\,\Pi_{01}'^{-1}.$$

The matrix $\hat P$ in this formula is identical to the matrix $\hat P$ in formula (4.14) since it is invariant to the choice of $\hat\Pi$. Now, substituting expression (4.7) into (4.14) leads to

$$\widehat{Var}_a(\hat\beta^{*}_{ml}) = \Delta^{-1}\,\Pi_{01}^{-1}\,\hat P\,\Pi_{01}'^{-1}\,\Delta'^{-1} = \Delta^{-1}\,\widehat{Var}_a(\hat\beta^{0}_{ml})\,\Delta'^{-1},$$

which implies

$$\widehat{Var}_a(\hat\beta^{0}_{ml}) = \Delta\,\widehat{Var}_a(\hat\beta^{*}_{ml})\,\Delta'.$$

Thus, the relationship between the individual true asymptotic variances of the $\hat\beta^{0}_j$'s and $\hat\beta^{*}_j$'s, shown above, will also hold for their estimated asymptotic variances.
4.8 Invariance Of Test Of $H_0\!: \beta_1=\beta_2=\cdots=\beta_k=0$

The null hypothesis $H_0\!:\beta_1=\beta_2=\cdots=\beta_k=0$ is equivalent, in the context of logistic regression, to the hypothesis $H_0\!: OR_1=OR_2=\cdots=OR_k=1$. In general, it is a hypothesis which says that there is no exposure-disease relationship.

Suppose a contingency table is formed whose rows are defined by levels of exposure and whose columns are defined by the presence or absence of disease. Further, suppose we are interested in testing the hypothesis that the exposure and disease variables of the contingency table are independent. We saw in Section 1.6 that several authors have found that the Type I error rate for such a test of independence is not affected by nondifferential misclassification. We will show that this property holds in the context of hypothesis testing using ML estimation. Specifically, we will show that a test of $H_0\!:\beta_1=\beta_2=\cdots=\beta_k=0$ produces a p-value which is invariant to choices of $\hat\Pi$.

The Wald statistic used for significance testing of contrast statements in CATMAX was discussed in Section 2.3. In order to test $H_0\!:\beta_1=\beta_2=\cdots=\beta_k=0$, the statistic becomes

$$Q_{wc} = (\hat\beta_{ml}',\hat\gamma_{ml}')\,C'\,\big[\,C\,\widehat{Var}_a(\hat\beta_{ml},\hat\gamma_{ml})\,C'\,\big]^{-1}\,C\,(\hat\beta_{ml}',\hat\gamma_{ml}')'.$$
The matrix $C$ can be expressed as the horizontal concatenation of two submatrices, $C_\beta$ and $C_\gamma$, i.e., $C=(C_\beta,C_\gamma)$, where $C_\beta$ is that part of $C$ which pertains to $\beta$ and $C_\gamma$ is that part of $C$ which pertains to $\gamma$. Since $H_0\!:\beta_1=\beta_2=\cdots=\beta_k=0$ makes no stipulations concerning the value of $\gamma$, $C_\gamma=0$. Therefore, $C=(C_\beta,0)$, and we can write $Q_{wc}$ as

$$Q_{wc} = \hat\beta_{ml}'\,C_\beta'\,\big[\,C_\beta\,\widehat{Var}_a(\hat\beta_{ml})\,C_\beta'\,\big]^{-1}\,C_\beta\,\hat\beta_{ml},$$

where

$$C_\beta = \begin{bmatrix}
0 & 1 & 0 & \cdots & 0\\
0 & 0 & 1 & \cdots & 0\\
\vdots & & & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}.$$
If $\hat\Pi=\Pi_0$ is the matrix of misclassification probabilities used in the fitted model, $Q_{wc}$ becomes

$$Q^{0}_{wc} = \hat\beta^{0}_{ml}{}'\,C_\beta'\,\big[\,C_\beta\,\widehat{Var}_a(\hat\beta^{0}_{ml})\,C_\beta'\,\big]^{-1}\,C_\beta\,\hat\beta^{0}_{ml}.$$

Since $\hat\beta^{0}_{ml}=\Delta\,\hat\beta^{*}_{ml}$, we can write

$$Q^{0}_{wc} = \hat\beta^{*}_{ml}{}'\,\Delta' C_\beta'\,\big[\,C_\beta\Delta\,\widehat{Var}_a(\hat\beta^{*}_{ml})\,\Delta' C_\beta'\,\big]^{-1}\,C_\beta\Delta\,\hat\beta^{*}_{ml}
= \hat\beta^{*}_{ml}{}'\,(C_\beta\Delta)'\,\big[\,C_\beta\Delta\,\widehat{Var}_a(\hat\beta^{*}_{ml})\,(C_\beta\Delta)'\,\big]^{-1}\,(C_\beta\Delta)\,\hat\beta^{*}_{ml}.$$
Using matrix multiplication, we can see that

$$C_\beta\Delta = \begin{bmatrix}
0 & (\pi_{11}-\pi_{01}) & \cdots & (\pi_{1k}-\pi_{0k})\\
0 & (\pi_{21}-\pi_{01}) & \cdots & (\pi_{2k}-\pi_{0k})\\
\vdots & & & \vdots\\
0 & (\pi_{k1}-\pi_{01}) & \cdots & (\pi_{kk}-\pi_{0k})
\end{bmatrix}$$

and $C_\beta\Delta = S\,C_\beta$, where

$$\underset{k\times k}{S} = \begin{bmatrix}
(\pi_{11}-\pi_{01}) & (\pi_{12}-\pi_{02}) & \cdots & (\pi_{1k}-\pi_{0k})\\
(\pi_{21}-\pi_{01}) & (\pi_{22}-\pi_{02}) & \cdots & (\pi_{2k}-\pi_{0k})\\
\vdots & & & \vdots\\
(\pi_{k1}-\pi_{01}) & (\pi_{k2}-\pi_{02}) & \cdots & (\pi_{kk}-\pi_{0k})
\end{bmatrix}.$$

Note that $S$ is an invertible matrix. Substituting $C_\beta\Delta = SC_\beta$, we get
$$Q^{0}_{wc} = \hat\beta^{*}_{ml}{}'\,(SC_\beta)'\,\big[\,SC_\beta\,\widehat{Var}_a(\hat\beta^{*}_{ml})\,(SC_\beta)'\,\big]^{-1}\,SC_\beta\,\hat\beta^{*}_{ml}$$
$$= \hat\beta^{*}_{ml}{}'\,C_\beta' S'\,\big[\,S\,\big(C_\beta\,\widehat{Var}_a(\hat\beta^{*}_{ml})\,C_\beta'\big)\,S'\,\big]^{-1}\,S\,C_\beta\,\hat\beta^{*}_{ml}$$
$$= \hat\beta^{*}_{ml}{}'\,C_\beta'\,S'S'^{-1}\,\big[\,C_\beta\,\widehat{Var}_a(\hat\beta^{*}_{ml})\,C_\beta'\,\big]^{-1}\,S^{-1}S\,C_\beta\,\hat\beta^{*}_{ml}$$
$$= \hat\beta^{*}_{ml}{}'\,C_\beta'\,\big[\,C_\beta\,\widehat{Var}_a(\hat\beta^{*}_{ml})\,C_\beta'\,\big]^{-1}\,C_\beta\,\hat\beta^{*}_{ml}
\;=\; Q^{*}_{wc},$$

where $Q^{*}_{wc}$ is the statistic which is used when $\hat\Pi=\Pi$ and which tests $H_0\!:\beta^{*}_1=\beta^{*}_2=\cdots=\beta^{*}_k=0$.
We have shown that the statistic $Q_{wc}$ above will have the same value whether one ignores misclassification or uses the correct matrix of misclassification probabilities in the fitted model. Since $\Pi$ was not specified, this proof will hold for any $\Pi$. This shows us that the value of $Q_{wc}$ is invariant to choices of $\hat\Pi$. Further, since the value of $Q_{wc}$ is invariant, the p-value associated with the test will also be invariant.
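The invariance argument can be demonstrated numerically. The sketch below is not from the dissertation; the $\Delta$, covariance matrix, and coefficient vector are hypothetical, with $k=2$. It shows that replacing $\hat\beta^{*}$ by $\Delta\hat\beta^{*}$ and $\widehat{Var}_a$ by $\Delta\,\widehat{Var}_a\,\Delta'$ leaves the Wald quadratic form for the overall hypothesis unchanged, exactly because $C_\beta\Delta=SC_\beta$ with $S$ invertible.

```python
# Sketch (hypothetical values, k = 2; not from the text): the Wald quadratic form
# for H0: beta_1 = beta_2 = 0 is unchanged when beta* is replaced by Delta beta*
# and Var(beta*) by Delta Var(beta*) Delta', because C_beta Delta = S C_beta.

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inv2(M):
    # Closed-form inverse of a 2x2 matrix.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def wald(beta, V, C):
    # beta' C' (C V C')^{-1} C beta, for a 2-row contrast matrix C.
    Cb = matmul(C, beta)
    mid = inv2(matmul(matmul(C, V), transpose(C)))
    return matmul(matmul(transpose(Cb), mid), Cb)[0][0]

C = [[0, 1, 0], [0, 0, 1]]          # C_beta for the overall hypothesis
Delta = [[1, 0.10, 0.05],
         [0, 0.70, 0.05],
         [0, 0.05, 0.75]]
V = [[0.20, 0.05, 0.02],            # made-up positive definite Var(beta*_ml)
     [0.05, 0.30, 0.04],
     [0.02, 0.04, 0.40]]
beta = [[0.2], [0.7], [1.6]]        # made-up beta*_ml (intercept first)

beta0 = matmul(Delta, beta)                       # beta^0 = Delta beta*
V0 = matmul(matmul(Delta, V), transpose(Delta))   # Var(beta^0)
assert abs(wald(beta, V, C) - wald(beta0, V0, C)) < 1e-9
```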
Through this proof, we can also see that testing a hypothesis which involves fewer than $k$ parameters cannot lead to an invariant $Q_{wc}$. As an example, let $H_0\!:\beta_1=0$ be the hypothesis of interest in a situation in which there are more than two exposure categories, i.e., $k>1$. Then $C_\beta$ will have dimensions $1\times(k+1)$, and $\Delta$ will still have dimensions $(k+1)\times(k+1)$, so that $C_\beta\Delta$ will have dimensions $1\times(k+1)$. The matrix $SC_\beta$ will also have dimensions $1\times(k+1)$; it follows that the dimensions of $S$ will be $1\times1$.

On closer examination of these matrices, we find $C_\beta=(0\ 1\ 0\ \cdots\ 0)$, so that $C_\beta\Delta=(0\ (\pi_{11}-\pi_{01})\ \cdots\ (\pi_{1k}-\pi_{0k}))$, which is the second row of $\Delta$. Now, define $S=(s_{11})$, so that $SC_\beta=(0\ s_{11}\ 0\ \cdots\ 0)$. Obviously, $SC_\beta$ cannot equal $C_\beta\Delta$. Therefore, for $H_0\!:\beta_1=0$, there is no matrix $S$ such that $C_\beta\Delta=SC_\beta$.
Suppose, now, that $H_0\!:\beta_1=\beta_2=0$ is the hypothesis of interest for a situation with $k>2$. In this case, the dimensions of $S$ are $2\times2$. The matrix $C_\beta$ would be

$$C_\beta = \begin{bmatrix} 0&1&0&0&\cdots&0\\ 0&0&1&0&\cdots&0\end{bmatrix},$$

so that

$$SC_\beta = \begin{bmatrix} 0&s_{11}&s_{12}&0&\cdots&0\\ 0&s_{21}&s_{22}&0&\cdots&0\end{bmatrix}.$$

Again, $SC_\beta$ cannot equal $C_\beta\Delta$, and there is no matrix $S$ such that $SC_\beta=C_\beta\Delta$.
From the cases for which $H_0\!:\beta_1=0$ and $H_0\!:\beta_1=\beta_2=0$ are the hypotheses of interest, we can see a trend in the structure of the matrices $SC_\beta$ and $C_\beta\Delta$. In general, for a hypothesis $H_0\!:\beta_1=\beta_2=\cdots=\beta_j=0$, $j=1,2,\ldots,k$, $S$ will have dimensions $j\times j$. $C_\beta$ will take the form

$$C_\beta = \big(\,\underset{j\times1}{0}\,,\ \underset{j\times j}{I}\,,\ \underset{j\times(k-j)}{0}\,\big),$$

so that $C_\beta\Delta$ will consist of the second through the $(j+1)$-th rows of $\Delta$. $SC_\beta$ will then equal

$$SC_\beta = \big(\,\underset{j\times1}{0}\,,\ \underset{j\times j}{S}\,,\ \underset{j\times(k-j)}{0}\,\big).$$

$SC_\beta$ will not equal $C_\beta\Delta$ as long as the right-hand matrix of 0's is present in $SC_\beta$. This is true since the right-hand columns of $\Delta$, and therefore of $C_\beta\Delta$, do not consist of 0's. The 0 matrix is present as long as $j<k$. Thus, only when $j=k$, i.e., when $H_0\!:\beta_1=\beta_2=\cdots=\beta_k=0$, will there exist a matrix $S$ such that $C_\beta\Delta=SC_\beta$.

The existence of $S$ is necessary in showing the invariance of $Q_{wc}$. We can conclude, then, that the Type I level of significance is retained in the presence of misclassification only when an overall test is performed, i.e., when $H_0\!:\beta_1=\beta_2=\cdots=\beta_k=0$ is tested. The value of $Q_{wc}$, and therefore the Type I level of significance, varies with $\hat\Pi$ if a test is performed concerning a proper subset of the $k$ betas. It can also be shown that a test of $H_0\!:\beta_1=\beta_2=\cdots=\beta_k$ is not invariant to choices of $\hat\Pi$.
4.9 Relationship Between $\hat\beta^{*}$ and $\hat\beta$

In practice, if a researcher determines that misclassification error of exposure is present in the data, she or he can try to adjust for it through the use of models (1.16), (1.18), or (1.19). However, as we have discussed before, the $\Pi$ matrix will most likely not be known and will have to be estimated. In the earlier sections, we examined the relationship between the estimates of $\beta$ obtained when $\Pi_0$ is used in the fitted model (i.e., when misclassification error is ignored) and those obtained when the correct $\Pi$ matrix is used in the fitted model. Here, we will investigate the relationship between the estimates of $\beta$ obtained when $\hat\Pi$ is used in the fitted model and those obtained when $\Pi$ is used in the fitted model. The following derivations hold for LS, WLS, and ML estimation methods. Therefore, we will use the general symbols $\hat\beta^{0}$, $\hat\beta^{*}$, and $\hat\beta$ to represent the estimators of $\beta$ for any of the three methods.
To begin, define $\hat\Delta$ as

$$\hat\Delta = \begin{bmatrix}
1 & \hat\pi_{01} & \hat\pi_{02} & \cdots & \hat\pi_{0k}\\
0 & (\hat\pi_{11}-\hat\pi_{01}) & (\hat\pi_{12}-\hat\pi_{02}) & \cdots & (\hat\pi_{1k}-\hat\pi_{0k})\\
0 & (\hat\pi_{21}-\hat\pi_{01}) & (\hat\pi_{22}-\hat\pi_{02}) & \cdots & (\hat\pi_{2k}-\hat\pi_{0k})\\
\vdots & & & & \vdots\\
0 & (\hat\pi_{k1}-\hat\pi_{01}) & (\hat\pi_{k2}-\hat\pi_{02}) & \cdots & (\hat\pi_{kk}-\hat\pi_{0k})
\end{bmatrix}.$$

It follows that $\hat\beta^{0}=\hat\Delta\hat\beta$. We saw before that $\hat\beta^{0}=\Delta\hat\beta^{*}$. Therefore, $\hat\Delta\hat\beta=\Delta\hat\beta^{*}$, which implies $\hat\beta=\hat\Delta^{-1}\Delta\hat\beta^{*}$. Let us examine these matrices in the case of two exposure categories ($k=1$):
$$\Delta\hat\beta^{*} =
\begin{bmatrix}
\hat\beta^{*}_0+\pi_{01}\hat\beta^{*}_1\\[0.5ex]
(\pi_{11}-\pi_{01})\,\hat\beta^{*}_1
\end{bmatrix}
\;\Longrightarrow\;
\hat\beta = \hat\Delta^{-1}\Delta\hat\beta^{*} =
\begin{bmatrix}
\hat\beta^{*}_0+\pi_{01}\hat\beta^{*}_1-\hat\pi_{01}\left[\dfrac{\pi_{11}-\pi_{01}}{\hat\pi_{11}-\hat\pi_{01}}\right]\hat\beta^{*}_1\\[1.5ex]
\left[\dfrac{\pi_{11}-\pi_{01}}{\hat\pi_{11}-\hat\pi_{01}}\right]\hat\beta^{*}_1
\end{bmatrix}. \qquad(4.15)$$

This is the same relationship found to hold between the parameters $\mu=E(\hat\beta_{ls})$ and $\beta^{*}$ in Section 3.2.2. We would expect the two relationships to be identical. Expression (4.15) was derived based on the equality $\hat\beta^{0}=\Delta\hat\beta^{*}$. This same equality was found to hold for the parameters. Therefore, having the relationship of expression (4.15) hold for $\hat\beta$ and $\hat\beta^{*}$ as well is consistent with the theory.
Let us further investigate the expression $\hat\Delta^{-1}\Delta$. Recall that $\Pi_1=\Pi_{01}\Delta$, so that $\Delta=\Pi_{01}^{-1}\Pi_1$. We can also show that $\hat\Pi_1=\Pi_{01}\hat\Delta$, so that $\hat\Delta=\Pi_{01}^{-1}\hat\Pi_1$ and $\hat\Delta^{-1}=\hat\Pi_1^{-1}\Pi_{01}$. It follows that $\hat\Delta^{-1}\Delta=\hat\Pi_1^{-1}\Pi_1$. Therefore, for each type of estimation method considered,

$$\hat\beta = \hat\Pi_1^{-1}\Pi_1\,\hat\beta^{*}.$$

Further, conditioning on $\hat\Pi$, the expected value of $\hat\beta$ is $\hat\Pi_1^{-1}\Pi_1\,\beta^{*}$ (asymptotically, in the case of WLS and ML estimation). This was shown in Chapter 3 for LS and WLS estimators. The proof given here applies to all three types of estimation methods.
Returning to the case of $k=1$, by examining the expression

$$\hat\beta_1 = \left[\frac{\pi_{11}-\pi_{01}}{\hat\pi_{11}-\hat\pi_{01}}\right]\hat\beta^{*}_1,$$

we can see that it is the disparity between the difference $(\pi_{11}-\pi_{01})$ and its estimate, not the disparity between the values of $\pi_{11}$ and $\pi_{01}$ and their estimates, per se, which determines the extent of the difference between the two estimates of $\beta$. For example, if

$$\Pi_1 = \begin{bmatrix}1 & 1/4\\ 1 & 3/4\end{bmatrix}
\qquad\text{and}\qquad
\hat\Pi_1 = \begin{bmatrix}1 & 1/6\\ 1 & 2/3\end{bmatrix},$$

then $\hat\beta_1=\hat\beta^{*}_1$ even though $\hat\pi_{11}\ne\pi_{11}$ and $\hat\pi_{01}\ne\pi_{01}$, since $(3/4-1/4)=(2/3-1/6)=1/2$. However, if

$$\hat\Pi_1 = \begin{bmatrix}1 & 1/3\\ 1 & 2/3\end{bmatrix},$$

then $\hat\beta_1=\tfrac{3}{2}\,\hat\beta^{*}_1$.

Since $\pi_{01}=(1-\pi_{00})$, the condition $\pi_{11}-\pi_{01}=\hat\pi_{11}-\hat\pi_{01}$ holds when $\pi_{11}+\pi_{00}=\hat\pi_{11}+\hat\pi_{00}$ holds. $\pi_{11}$ and $\pi_{00}$ can be thought of as the probabilities of correctly classifying subjects. Therefore, with two exposure categories, $\hat\beta_1=\hat\beta^{*}_1$ as long as the averages of the estimated and of the true probabilities of correctly classifying subjects are equal.
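The two-category example above reduces to simple arithmetic, which can be checked directly. The helper name below is ours, not the dissertation's.

```python
# Check of the k = 1 example: beta_1 = [(pi_11 - pi_01) / (pihat_11 - pihat_01)] * beta*_1,
# so only the differences matter, not the individual probabilities.

def slope_ratio(pi01, pi11, pihat01, pihat11):
    return (pi11 - pi01) / (pihat11 - pihat01)

# True values pi_01 = 1/4, pi_11 = 3/4, as in the text:
assert abs(slope_ratio(1/4, 3/4, 1/6, 2/3) - 1.0) < 1e-12   # beta_1 = beta*_1
assert abs(slope_ratio(1/4, 3/4, 1/3, 2/3) - 1.5) < 1e-12   # beta_1 = (3/2) beta*_1
```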
Considering a case of three exposure categories, it can be shown that $\hat\beta_1=\hat\beta^{*}_1$ and $\hat\beta_2=\hat\beta^{*}_2$ when all four of the following conditions hold:

$$\hat\pi_{11}-\hat\pi_{01}=\pi_{11}-\pi_{01}, \qquad \hat\pi_{12}-\hat\pi_{02}=\pi_{12}-\pi_{02},$$
$$\hat\pi_{21}-\hat\pi_{01}=\pi_{21}-\pi_{01}, \qquad \hat\pi_{22}-\hat\pi_{02}=\pi_{22}-\pi_{02}.$$

These conditions are less easily interpreted than the condition for the two exposure category case. They perhaps become clearer upon returning to the structure of the matrix $\Pi_1$, shown below:

$$\Pi_1 = \begin{bmatrix}
1 & \pi_{01} & \pi_{02}\\
1 & \pi_{11} & \pi_{12}\\
1 & \pi_{21} & \pi_{22}
\end{bmatrix}.$$

The four equalities given above are comparisons of the misclassification probabilities between the second and lowest and between the third and lowest categories.
Again, we see that it is not the values of the $\pi_{ij}$'s themselves which affect the degree to which the $\hat\beta$'s differ, but the differences between certain $\pi_{ij}$'s. As an example, consider a classification scheme with three exposure categories. The misclassification probabilities are given in $\Pi_1$; an estimated set is given in $\hat\Pi_1$:

$$\Pi_1 = \begin{bmatrix}
1 & .0420 & .0336\\
1 & .5814 & .1860\\
1 & .0610 & .8293
\end{bmatrix}
\qquad\text{and}\qquad
\hat\Pi_1 = \begin{bmatrix}
1 & .05 & .05\\
1 & .5894 & .2024\\
1 & .069 & .8457
\end{bmatrix}.$$

When these two matrices are used in regression analysis, the resulting estimates for $\beta_1$ and $\beta_2$ are the same. This occurs since the four conditions hold:

$$.5814-.0420 = .5394 = .5894-.05, \qquad .1860-.0336 = .1524 = .2024-.05,$$
$$.0610-.0420 = .0190 = .069-.05, \qquad .8293-.0336 = .7957 = .8457-.05.$$

In the next chapter, we will examine how the estimates of $\beta$ are affected when varying degrees of misclassification are assumed. This is done through the use of examples in an effort to see how sensitive the results of an analysis can be to misclassification of the data.
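The four conditions for this example can be verified mechanically: the $2\times2$ blocks of differences $(\pi_{ij}-\pi_{0j})$, $i,j=1,2$, built from $\Pi_1$ and $\hat\Pi_1$ must agree entrywise. The check below uses the matrices from the text.

```python
# Check of the three-category example: the 2x2 blocks of differences
# (pi_ij - pi_0j), i, j = 1, 2, agree between Pi_1 and Pihat_1, which is
# exactly the set of four conditions stated above.

Pi1  = [[1, 0.0420, 0.0336],
        [1, 0.5814, 0.1860],
        [1, 0.0610, 0.8293]]
Pih1 = [[1, 0.05,   0.05  ],
        [1, 0.5894, 0.2024],
        [1, 0.069,  0.8457]]

D  = [[Pi1[i][j]  - Pi1[0][j]  for j in (1, 2)] for i in (1, 2)]
Dh = [[Pih1[i][j] - Pih1[0][j] for j in (1, 2)] for i in (1, 2)]

# All four conditions hold, so the slope estimates coincide:
assert all(abs(D[i][j] - Dh[i][j]) < 1e-12 for i in range(2) for j in range(2))
```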
4.10 Removal of Restriction on Data for LS Estimation

This final section of Chapter 4 deals exclusively with LS estimation. The theory developed in this and the previous chapters for LS estimators was based on the assumption of equal numbers of subjects in each classified exposure category. This assumption was necessary to express $\Pi$ as $\Pi=\Pi_1\otimes 1_S$, where $1_S$ denotes an $S\times1$ vector of ones. Such an assumption, however, places an unrealistic restriction on the data. Here, we will investigate the situation in which the assumption is violated. Specifically, we will determine whether the results for LS estimation derived in Chapter 3 and earlier sections of this chapter are still valid.

Suppose that the number of subjects in the $i$-th classified exposure category is $S_i$, $i=0,1,\ldots,k$. Let us now define the total number of subjects as $S'$, so that $S'=\sum_{i=0}^{k}S_i$. If the assumption holds, then $S_i=S$ for all $i=0,1,\ldots,k$, and $S'=(k+1)S$. Here we are assuming, however, that the $S_i$'s are not all equal.

If the response variables are ordered such that

$$Y=(Y_{01},Y_{02},\ldots,Y_{0S_0},\;Y_{11},Y_{12},\ldots,Y_{1S_1},\;\ldots,\;Y_{k1},Y_{k2},\ldots,Y_{kS_k})',$$

then the $\Pi$ matrix will have the form
$$\Pi = \begin{bmatrix}
1 & \pi_{01} & \pi_{02} & \cdots & \pi_{0k}\\
\vdots & & & & \vdots\\
1 & \pi_{01} & \pi_{02} & \cdots & \pi_{0k}\\
1 & \pi_{11} & \pi_{12} & \cdots & \pi_{1k}\\
\vdots & & & & \vdots\\
1 & \pi_{11} & \pi_{12} & \cdots & \pi_{1k}\\
\vdots & & & & \vdots\\
1 & \pi_{k1} & \pi_{k2} & \cdots & \pi_{kk}\\
\vdots & & & & \vdots\\
1 & \pi_{k1} & \pi_{k2} & \cdots & \pi_{kk}
\end{bmatrix},$$

where the row $(1,\pi_{01},\ldots,\pi_{0k})$ appears $S_0$ times, the row $(1,\pi_{11},\ldots,\pi_{1k})$ appears $S_1$ times, and so on, with the row $(1,\pi_{k1},\ldots,\pi_{kk})$ appearing $S_k$ times.
Expression (4.5) implies that $\Pi=\Pi_0\Delta$ for the case in which $S_i=S$, $i=0,1,\ldots,k$. This relationship also holds if $S_i\ne S$ for all $i=0,1,\ldots,k$. Further, the relationship $\hat\Pi=\Pi_0\hat\Delta$ holds in this case as well. We will use this second relationship to show the invariance of $\hat R$, and therefore of $\hat\gamma_{ls}$, as follows.

First, consider the matrix expression

$$\hat\Pi(\hat\Pi'\hat\Pi)^{-1}\hat\Pi' = \Pi_0\hat\Delta\,\big[(\Pi_0\hat\Delta)'(\Pi_0\hat\Delta)\big]^{-1}(\Pi_0\hat\Delta)'
= \Pi_0(\Pi_0'\Pi_0)^{-1}\Pi_0',$$

which is invariant to choices of $\hat\Pi$. Since $\hat R=[I-\hat\Pi(\hat\Pi'\hat\Pi)^{-1}\hat\Pi']$, we can see that $\hat R$ is also invariant to choices of $\hat\Pi$. The invariance of $\hat R$ leads directly to the invariance of $\hat\gamma_{ls}$ and of $\widehat{Var}(\hat\gamma_{ls})$.
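The cancellation of $\hat\Delta$ in the hat matrix can be seen numerically. The sketch below is not from the dissertation; it uses hypothetical values with $k=1$ and unequal category sizes $S_0=2$, $S_1=3$, and checks that the projection built from $\hat\Pi=\Pi_0\hat\Delta$ equals the one built from $\Pi_0$.

```python
# Sketch (hypothetical values, k = 1, unequal category sizes S0 = 2, S1 = 3;
# not from the text): the hat matrix built from Pihat = Pi0 * Deltahat equals
# the one built from Pi0, since Deltahat is invertible.

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def hat(X):
    # X (X'X)^{-1} X' for a 2-column design matrix X.
    Xt = transpose(X)
    return matmul(matmul(X, inv2(matmul(Xt, X))), Xt)

Pi0 = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]   # S0 = 2 rows, then S1 = 3 rows
Dhat = [[1, 0.15], [0, 0.70]]                    # an invertible Deltahat (made-up pi's)
Pihat = matmul(Pi0, Dhat)

H0, Hh = hat(Pi0), hat(Pihat)
assert all(abs(H0[i][j] - Hh[i][j]) < 1e-9 for i in range(5) for j in range(5))
```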
To prove the invariance of the predicted values, we need to show that $\hat D\hat\beta_{ls}$ is invariant to changing degrees of misclassification. This is easily done using the derivation presented above:

$$\hat D\hat\beta_{ls} = \hat D(\hat D'\hat D)^{-1}\hat D'(Y-V\hat\gamma_{ls})
= \Pi_0(\Pi_0'\Pi_0)^{-1}\Pi_0'(Y-V\hat\gamma_{ls}).$$

It follows that the vector of predicted values, $\hat Y=\hat D\hat\beta_{ls}+V\hat\gamma_{ls}$, is also invariant. The invariance of $\hat Y$ is used to conclude that $\widehat{Var}(\hat Y_{ls})$, $R^2$, and the $F$ statistic are all invariant to choices of $\hat D$. In fact, all of the results found for LS estimation in Chapter 3 can be shown to hold in this scenario.
The conclusions reached in Sections 4.4, 4.5, 4.6, 4.7, and 4.8 concerning LS estimation hinged on the relationship found in expression (4.9). If we can show that this relationship holds when $S_i\ne S$ for $i=0,1,\ldots,k$, then we can conclude that the results in this chapter for LS estimators hold in this case as well.

As we saw in Section 4.4,

$$\hat\beta^{*}_{ls} = (\hat D'\hat D)^{-1}\hat D'(Y-V\hat\gamma^{*}_{ls}).$$

Using the expression $\hat D=\Pi_0\Delta$, this becomes

$$\hat\beta^{*}_{ls} = \Delta^{-1}(\Pi_0'\Pi_0)^{-1}\Pi_0'(Y-V\hat\gamma^{*}_{ls}).$$

Since $\hat\gamma_{ls}$ is invariant to choices of $\hat D$, $\hat\gamma^{*}_{ls}=\hat\gamma^{0}_{ls}$, so that

$$\hat\beta^{*}_{ls} = \Delta^{-1}(\Pi_0'\Pi_0)^{-1}\Pi_0'(Y-V\hat\gamma^{0}_{ls}) = \Delta^{-1}\hat\beta^{0}_{ls},$$

which is the relationship found in expression (4.9). This shows us that the results found in the earlier sections of this chapter do not require that there be equal numbers of subjects in the classified exposure categories. Since this was also found to be true of the results in Chapter 3, we will dispense with this unnecessary requirement.
CHAPTER 5
NUMERICAL APPLICATIONS
5.1 Introduction

Numerical examples are used in this chapter to investigate two issues regarding the use of the models developed in Chapter 1. First, the question of when models (1.16) and (1.18) are appropriate is addressed. The theoretical implications of this were examined in Section 1.11. The examples examined here point to a number of conditions which can help ensure that the models will perform reasonably well.

In the second part of this chapter, models (1.16), (1.18), and (1.19) are each used to analyze a different data set. Each model is fit assuming several different sets of misclassification probabilities. Through these analyses, the sensitivity of the results to varying assumed degrees of misclassification is probed.
5.2 Conditions for Appropriateness of Models

The theory developed in earlier chapters assumes that models (1.16) and (1.18) are appropriate to a given set of risks or rates and misclassification probabilities. Since this is not always the case, it is valuable to know under what conditions one can expect these models to perform well.

As has been mentioned, model (1.16) assumes that model (1.6) holds within each stratum. This implies that expression (1.21) holds within each stratum. Likewise, model (1.18) assumes that model (1.7) and, therefore, expression (1.22) hold within each stratum. By determining the conditions needed for models (1.6) and (1.7) to hold within a given stratum, we will arrive at the conditions needed for models (1.16) and (1.18) to hold. Therefore, for the time being, let us restrict our enquiry to the fit of models (1.6) and (1.7) in a single stratum.

Consider a situation with no misclassification of exposure and only one stratum. In this case, expressions (1.21) and (1.22) hold perfectly, as do models (1.6) and (1.7). Here, the misclassification probabilities are such that $\pi_{ij}=1$ for all $i=j$ and $\pi_{ij}=0$ otherwise, for $i,j=0,1,\ldots,k$ (i.e., $\Pi=\Pi_0$). Intuitively, it seems that for a given set of true risks, the closer the pattern of values in the matrix $\Pi$ is to the pattern of values in $\Pi_0$, the better the fit of models (1.6) and (1.7).
5.2.1 Logistic Regression

Let us consider, first, the case of logistic regression. The examples which follow indicate that, in general, the above statement is true with regard to the fit of model (1.6). When all of the $\pi_{ij}$'s with $i=j$ are equal (i.e., $\pi_{00}=\pi_{11}=\cdots=\pi_{kk}$), as their common value decreases from 1, the quality of the estimates resulting from the use of model (1.6) decreases. The "quality" of an estimate, in this context, refers to how close that estimate is to the estimate from the (hypothetical) true data. It cannot be said that a lower degree of misclassification will always lead to higher quality estimates; however, in general, this will be the case. Also, for a given set of misclassification probabilities, the quality of estimates diminishes as the $OR^{*}_j$'s increase.

These points are illustrated in the following tables. The values in Table 5.1 were calculated for a given $OR^{*}_1$ with $k=1$ and $S=1$. They illustrate the effect that varying sets of misclassification probabilities have on the quality of the estimate produced assuming model (1.6) and, therefore, model (1.16). The value of $OR^{*}_1$ is set at 2.111, so that $\ln(OR^{*}_1)=\beta^{*}_1=.7472$. The estimator from model (1.16) is $\ln(\widehat{OR}{}^{*}_1)=\hat\beta^{*}_1$, which is the estimator derived when $\hat\Pi=\Pi$. We are considering these $\pi_{ij}$'s as known and the data as population data. Therefore, we would expect $\hat\beta^{*}_1\approx\beta^{*}_1$. The quality of the estimate is determined by the absolute differences $|\hat\beta^{*}_1-\beta^{*}_1|$ and $|\widehat{OR}{}^{*}_1-OR^{*}_1|$. The third column contains $(\pi_{11}+\pi_{00})$, which is used as a measure of the total degree of non-misclassification involved. The higher the value in this column, the lower the degree of misclassification; of course, $(\pi_{11}+\pi_{00})=2$ corresponds to no misclassification.
Table 5.1
Quality of estimates for various $\pi_{ij}$'s given $OR^{*}_1=2.111$ ($\beta^{*}_1=.7472$)

     $\pi_{00}$   $\pi_{11}$   $(\pi_{00}+\pi_{11})$   $|OR^{*}_1-\widehat{OR}{}^{*}_1|$   $|\beta^{*}_1-\hat\beta^{*}_1|$
      .9500        .9500          1.9000                    .0112                               .0053
      .9474        .9048          1.8522                    .0057                               .0027
      .9048        .9474          1.8522                    .0372                               .0178
      .9000        .9000          1.8000                    .0210                               .0100
      .8889        .8182          1.7071                    .0144                               .0032
      .8182        .8889          1.7071                    .0621                               .0299
      .8500        .8500          1.7000                    .0308                               .0147
      .8421        .8095          1.6816                    .0168                               .0080
      .8000        .8000          1.6000                    .0364                               .0174
      .8571        .6923          1.5494                    .0452                               .0212
      .7600        .7600          1.5200                    .0412                               .0197
      .7500        .6667          1.4167                    .0061                               .0029
      .7000        .7000          1.4000                    .0470                               .0225
      .6500        .6500          1.3000                    .0500                               .0239
      .6000        .6000          1.2000                    .0532                               .0255
Evaluation of the results in this table indicates three characteristics of the relationship between the values of the $\pi_{ij}$'s and the quality of the estimates. First, when $\pi_{00}=\pi_{11}$, as their common value decreases, the quality of the estimate decreases. By comparing the table entries corresponding to such sets of $\pi_{ij}$'s, we can see that as $\pi_{00}=\pi_{11}$ takes on values from .95 to .60, the differences $|\beta^{*}_1-\hat\beta^{*}_1|$ and $|OR^{*}_1-\widehat{OR}{}^{*}_1|$ increase steadily.

Second, in general, the higher the values of $\pi_{00}$ and $\pi_{11}$, the closer the estimate of $OR^{*}_1$ is to the true value. Eight of the nine closest estimates correspond to misclassification probabilities with both $\pi_{00}\ge.80$ and $\pi_{11}\ge.80$.
Third, the quality of the estimate will probably be better if $\pi_{00}>\pi_{11}$ than if $\pi_{00}=\pi_{11}$ or if $\pi_{00}<\pi_{11}$. First, compare the first two sets of $\pi_{ij}$'s in the table. Although the first set has a lower total degree of misclassification, the differences in columns four and five are greater in magnitude than the differences corresponding to the second set of $\pi_{ij}$'s. The same is true for the fourth and fifth sets of misclassification probabilities.

Second, compare the second and third sets of misclassification probabilities. Notice that the values of the $\pi_{ij}$'s are switched between the two sets. The value of $\pi_{11}$ in the first is equal to the value of $\pi_{00}$ in the second, and vice versa. Obviously, the total degree of misclassification is equal for the two sets. The qualities of the resulting estimates, however, are quite different. The second set has the smallest differences found in the table. The differences corresponding to the third set are over four times those corresponding to the second set, but are still quite small in size.

A similar phenomenon is found when the fifth and sixth sets of $\pi_{ij}$'s are compared. The values of $\pi_{00}$ and $\pi_{11}$ are also switched between these sets. The differences corresponding to the sixth set are not only much larger than those corresponding to the fifth set, they are the largest in the table.

Unlike the cases in which $\pi_{00}=\pi_{11}$, when the two misclassification probabilities are not equal, there is no definite downward trend in the quality of the estimate as the values of the $\pi_{ij}$'s decrease. This is
poignantly illustrated by the set of probabilities $\pi_{00}=.7500$ and $\pi_{11}=.6667$. These probabilities correspond to the second smallest difference $|\hat\beta^{*}_1-\beta^{*}_1|$ of all entries in the table.

As a final, but important, point, let us note that the differences in columns four and five are all quite small. The biggest difference leads to a value of $\widehat{OR}{}^{*}_1=2.0489$, which is not very far from the true value of 2.111. In general, we can say that the model performs reasonably well for all sets of $\pi_{ij}$'s in Table 5.1, although it does perform better for some than for others.
Table 5.2 is used to illustrate the point that, all else being equal, the larger the true odds ratio, the lower the quality of the estimates. Rather than varying the misclassification probabilities for a given $OR^{*}_1$, as we did for Table 5.1, here we vary the $OR^{*}_1$'s for a given set of misclassification probabilities. Again, $k=1$ and $S=1$. Each of the estimates in Table 5.2 was calculated using $\pi_{00}=\pi_{11}=.80$. As the values of the $OR^{*}_1$'s increase, the absolute differences $|OR^{*}_1-\widehat{OR}{}^{*}_1|$ and $|\beta^{*}_1-\hat\beta^{*}_1|$ also increase, as does the difference $|OR^{*}_1-\widehat{OR}{}^{*}_1|$ relative to the value of $OR^{*}_1$.
Table 5.2
Quality of estimates for various $OR^{*}_1$'s given $\pi_{00}=\pi_{11}=.80$

     $OR^{*}_1$   $|\beta^{*}_1-\hat\beta^{*}_1|$   $|OR^{*}_1-\widehat{OR}{}^{*}_1|$   $\dfrac{|OR^{*}_1-\widehat{OR}{}^{*}_1|}{OR^{*}_1}\times100$
      1.65              .0055                              .0004                                 .253
      2.11              .0174                              .036                                 1.721
      2.59              .0346                              .088                                 3.396
      4.75              .1301                              .580                                12.212
Table 5.3 includes several examples with $k=2$ and $S=1$ which are used to investigate whether the trends found for $k=1$ carry over to this situation. Table 5.3 was constructed in the same fashion as Table 5.1. In this case, $OR^{*}_1$ and $OR^{*}_2$ are given ($OR^{*}_1=2.111$ and $OR^{*}_2=4.75$). The misclassification probabilities are then varied and the resulting estimates of $OR^{*}_1$ and $OR^{*}_2$ are compared with the true parameters. In addition, the estimates produced from model (1.16) ($\hat\beta^{*}_1$ and $\hat\beta^{*}_2$) are compared with the true values ($\beta^{*}_1=.7472$ and $\beta^{*}_2=1.5581$). For each example, the $\pi_{ij}$'s are presented in a table of the form:

            j=0           j=1           j=2
    i=0    $\pi_{00}$    $\pi_{01}$    $\pi_{02}$
    i=1    $\pi_{10}$    $\pi_{11}$    $\pi_{12}$
    i=2    $\pi_{20}$    $\pi_{21}$    $\pi_{22}$

The results shown in this table indicate that the trends found when $k=1$ also hold when $k=2$. In general, as the probabilities on the main diagonal increase, the quality of the estimates increases. Also, by comparing the third and fourth sets of $\pi_{ij}$'s, we can see that the quality of the estimates is probably better when $\pi_{00}\ne\pi_{11}\ne\pi_{22}$. Since each of the diagonal elements in the third set is larger than each of the diagonal elements in the fourth set, one would expect closer estimates from the third set. This, however, is not the case. Therefore, we conclude that the larger difference is due to the equality of the $\pi_{ij}$'s on the main diagonal. Also, the differences in this table are small enough to indicate that the model works well for all sets of $\pi_{ij}$'s considered.
Table 5.3
$|OR^{*}_j-\widehat{OR}{}^{*}_j|$ ($|\beta^{*}_j-\hat\beta^{*}_j|$), $j=1,2$, for various $\pi_{ij}$'s given $OR^{*}_1=2.111$ and $OR^{*}_2=4.750$ ($\beta^{*}_1=.7472$ and $\beta^{*}_2=1.5581$)

                                         Absolute Differences
         $\pi_{ij}$'s                    j=1             j=2

    .9375   .0521   .0104
    .0600   .9000   .0400             .015 (.007)     .008 (.004)
    .0385   .0481   .9000

    .9677   .0215   .0108
    .0741   .8889   .0370             .011 (.002)     .042 (.007)
    .0202   .0202   .9596

    .9000   .0600   .0400
    .0500   .9000   .0500             .128 (.072)     .225 (.072)
    .0300   .0700   .9000

    .8649   .1081   .0270
    .1237   .8247   .0515             .061 (.029)     .035 (.005)
    .0724   .0905   .8371

    .8000   .1000   .1000
    .1000   .8000   .1000             .226 (.113)     .368 (.078)
    .1000   .1000   .8000

    .7500   .1250   .1250
    .1739   .6957   .1304             .250 (.126)     .258 (.053)
    .1905   .0952   .7143

    .6667   .2222   .1111
    .2222   .6667   .1111             .252 (.127)     .523 (.114)
    .1111   .2222   .6667
Finally, Table 5.4 investigates the situation for $k=3$. These examples, being so few, cannot be used to say anything definitive regarding trends in the relationship between the quality of the estimates and the choice of misclassification probabilities. Instead, these examples are used to support the findings of Tables 5.1 and 5.3, and to give an idea of how close the estimates will be to the true values in these particular examples. The true odds ratios are set at $OR^{*}_1=2.111$, $OR^{*}_2=4.75$, and $OR^{*}_3=6.333$ ($\beta^{*}_1=.7472$, $\beta^{*}_2=1.5581$, and $\beta^{*}_3=1.8458$). The $\pi_{ij}$'s are presented in a table of the form

            j=0           j=1           j=2           j=3
    i=0    $\pi_{00}$    $\pi_{01}$    $\pi_{02}$    $\pi_{03}$
    i=1    $\pi_{10}$    $\pi_{11}$    $\pi_{12}$    $\pi_{13}$
    i=2    $\pi_{20}$    $\pi_{21}$    $\pi_{22}$    $\pi_{23}$
    i=3    $\pi_{30}$    $\pi_{31}$    $\pi_{32}$    $\pi_{33}$
The same trends seen in Tables 5.1 and 5.3 can be seen in Table 5.4. Firstly, as the degree of misclassification decreases, the quality of the estimates increases. Secondly, none of the sets of $\pi_{ij}$'s has equal probabilities on the main diagonal and, in turn, none of the differences $|\hat\beta^{*}_j-\beta^{*}_j|$ for the first two sets is greater than .1. This is true even though two of the main diagonal probabilities in the second set are less than .8. The results in this table, therefore, support the hypothesis that unequal, relatively high probabilities of correct classification will help ensure high quality estimates.

Also, we can compare across these three tables to examine the effect that a greater number of exposure categories has on the quality of the estimates. For instance, in Table 5.1 when $\pi_{00}=\pi_{11}=.80$, $|OR^{*}_1-\widehat{OR}{}^{*}_1|=.036$. When $k=2$ and $\pi_{00}=\pi_{11}=.80$, however, $|OR^{*}_1-\widehat{OR}{}^{*}_1|=.226$ even though the value of $OR^{*}_1$ is the same. The differences $|OR^{*}_1-\widehat{OR}{}^{*}_1|$ in Table 5.1 never exceed .06, even with high degrees of misclassification. These differences in Tables 5.3 and 5.4, however, exceed that number when there are moderate to low degrees of misclassification. One would contend that the higher the number of $\beta^{*}$'s being estimated, the lower the quality of the estimates. These tables support this contention. It is
Table 5.4
$|OR^{*}_j-\widehat{OR}{}^{*}_j|$ ($|\beta^{*}_j-\hat\beta^{*}_j|$), $j=1,2,3$, for various $\pi_{ij}$'s given $OR^{*}_1=2.111$, $OR^{*}_2=4.750$, and $OR^{*}_3=6.333$ ($\beta^{*}_1=.7472$, $\beta^{*}_2=1.5581$, and $\beta^{*}_3=1.8458$)

                                                  Absolute Differences
           $\pi_{ij}$'s                     j=1            j=2            j=3

    .9422   .0428   .0107   .0043
    .0607   .9109   .0202   .0081        .033 (.016)    .032 (.007)    .093 (.045)
    .0384   .0384   .9117   .0115
    .0193   .0193   .0193   .9421

    .8511   .0851   .0426   .0213
    .1132   .7547   .0755   .0566        .313 (.068)    .561 (.093)    .174 (.086)
    .0566   .0755   .7547   .1132
    .0213   .0426   .0851   .8511

    .7609   .1304   .0652   .0435
    .1667   .6481   .1111   .0741        .530 (.118)    .819 (.139)    .105 (.017)
    .0741   .1111   .6481   .1667
    .0435   .0652   .1304   .7609
important to note, also, that even with k=3 the differences are relatively small, especially for low degrees of misclassification. This leads us to believe that a good quality estimate can be obtained with a large number of exposure categories as long as the degree of misclassification is not too high and is unequal between exposure categories.
Therefore, we conclude that model (1.6) will hold reasonably well for a given stratum when three conditions hold. The first is that the degrees of misclassification are low, so that the smallest π̂_ij with i=j is greater than or equal to .80. The second condition calls for low magnitudes of the true odds ratios. Exactly how low these magnitudes should be depends on the misclassification probabilities involved. The higher the degree of non-misclassification, the higher the magnitudes can be to obtain quality estimates. In general, an odds ratio below three will be estimated well. Finally, the probabilities of correct classification should not be equal for all exposure categories.
The first condition is necessary in guaranteeing an estimate reasonably close to the true estimate. If the second condition is violated, the estimates produced may still be a good approximation of the true estimate as long as the first and third conditions are not violated. For example, we have seen that for probabilities of correct classification which are unequal and .80 or above, high quality estimates can be obtained of odds ratios with values of six or below. The third condition is an added guarantee of quality estimates. It becomes especially important as the degree of misclassification increases.
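The two structural conditions on the misclassification matrix (a minimum probability of correct classification of .80, and unequal diagonal entries) are easy to check programmatically. A small sketch; the helper name and threshold default are ours, not the dissertation's:

```python
import numpy as np

# Hypothetical helper (ours, not the dissertation's): checks the two
# structural conditions Section 5.2 associates with good estimates:
# (1) every probability of correct classification is at least .80, and
# (2) the diagonal probabilities are not all equal.
def check_conditions(pi, min_diag=0.80):
    diag = np.diag(np.asarray(pi, dtype=float))
    high_enough = bool(diag.min() >= min_diag)
    unequal = bool(np.unique(np.round(diag, 10)).size > 1)
    return high_enough, unequal

# First pi-matrix of Table 5.4: high, unequal diagonal.
pi1 = [[.9422, .0428, .0107, .0043],
       [.0607, .9109, .0202, .0081],
       [.0384, .0384, .9117, .0115],
       [.0193, .0193, .0193, .9421]]

# Equal diagonal of .80: condition (1) holds, condition (2) fails.
pi2 = [[.80, .10, .05, .05],
       [.10, .80, .05, .05],
       [.05, .05, .80, .10],
       [.05, .05, .10, .80]]

print(check_conditions(pi1))  # (True, True)
print(check_conditions(pi2))  # (True, False)
```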
5.2.2 Poisson Regression
We can see that the same conditions hold in the case of Poisson regression. First, Table 5.5 is analogous to Table 5.1. Here, the value of IDR_1* is given for a situation with k=1 and s=1 (IDR_1*=2.25). Various sets of misclassification probabilities are assumed; estimates were calculated for each set using model (1.7). The absolute differences between the true parameters and their estimates are shown in the table.
As we saw in the case of logistic regression, the absolute
differences between the true values of the parameters and their
estimates are all small. Further, these differences tend to decrease
as the degree of misclassification decreases.
Table 5.5
Quality of estimates for various π_ij's given IDR_1*=2.25 (β_1*=.8109)

   π_00     π_11    (π_00+π_11)   |β̂_1*−β_1*|   |ÎDR_1*−IDR_1*|
  .9794    .9515      1.9309         .0035           .0078
  .9515    .9794      1.9309         .0147           .0329
  .9000    .9000      1.8000         .0159           .0356
  .8947    .8571      1.7518         .0067           .0151
  .8571    .8947      1.7518         .0300           .0686
  .8000    .8000      1.6000         .0276           .0613
  .7895    .7619      1.5514         .0212           .0473
Another similarity to the logistic case is seen through a comparison of the first two sets of misclassification probabilities. The values of π_00 and π_11 are switched between these two sets. The differences corresponding to the first set are much smaller than those corresponding to the second set. The same holds for the fourth and fifth sets. Compare, also, the fifth and sixth sets. Notice that, although the total degree of misclassification is higher for the sixth set, the quality of the estimates for the fifth set is poorer than that for the sixth set. These comparisons indicate that the quality of the estimates is greatest when π_00 > π_11. The quality decreases when π_00 = π_11, and decreases further when π_11 > π_00.
Table 5.6 for Poisson regression is analogous to Table 5.2 for logistic regression. Again, the misclassification probabilities are set at π_00 = π_11 = .80; k=1, and s=1. The value of IDR_1* is varied and the quality of the estimates is examined. As in Table 5.2, these results indicate that the quality decreases as the value of IDR_1* increases, both in terms of the absolute differences and in terms of the differences relative to the value of IDR_1*.
Table 5.6
Quality of estimates for various IDR_1*'s given π_00 = π_11 = .80

  IDR_1*   |β̂_1*−β_1*|   |ÎDR_1*−IDR_1*|   (|ÎDR_1*−IDR_1*| / IDR_1*) × 100
   1.50        .004             .005                    .333
   2.25        .028             .061                   2.711
   3.00        .025             .073                   2.433
   4.50        .164             .697                  15.089
   9.00        .454            3.284                  36.489
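This worsening with larger IDR_1* is what the delta method predicts: since IDR = exp(β), an error Δβ on the log scale translates into an IDR error of roughly exp(β)·Δβ = IDR·Δβ, so the same log-scale error is magnified at larger IDR. A quick illustrative check; the numbers below are ours, not entries of Table 5.6:

```python
import math

def exact_error(idr, delta_beta):
    """Exact change in IDR = exp(beta) when beta is off by delta_beta."""
    beta = math.log(idr)
    return math.exp(beta + delta_beta) - math.exp(beta)

def delta_method_error(idr, delta_beta):
    """First-order (delta-method) approximation: IDR * delta_beta."""
    return idr * delta_beta

# The same log-scale error (.01) produces a ~6x larger IDR error
# at IDR = 9.0 than at IDR = 1.5.
for idr in (1.5, 9.0):
    print(idr, exact_error(idr, 0.01), delta_method_error(idr, 0.01))
```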
Tables 5.7 and 5.8 examine the cases of k=2 and k=3, respectively. For both tables, the values of the IDR_j*'s were given and the misclassification probabilities varied. Again, the qualities of the estimates from the two tables are quite good, with the quality generally increasing as the degree of misclassification decreases.
Table 5.7
|ÎDR_j* − IDR_j*| (|β̂_j* − β_j*|), j=1,2, for various sets of π_ij's,
given IDR_1*=2.25 and IDR_2*=3.00 (β_1*=.8109 and β_2*=1.0986)

                                  Absolute Differences
     π_ij's                      j=1              j=2

 1.000  0.000  0.000
 .0667  .8889  .0444        .0469 (.0206)    .0337 (.9889)
 .0256  0.000  .9744

 .9524  .0226  .0251
 .0353  .8941  .0706        .0541 (.0243)    .0825 (.0279)
 .0134  .0241  .9626

 .8780  .0976  .0244
 .0789  .8421  .0789        .0760 (.0343)    .1316 (.0449)
 .0244  .0976  .8780

 .8718  .1026  .0256
 .1000  .8000  .1000        .0671 (.0303)    .1100 (.0373)
 .0488  .0976  .8537

Table 5.7 continued

     π_ij's                      j=1              j=2

 .7895  .1053  .1053
 .1429  .7619  .0952        .1783 (.0826)    .2128 (.0736)
 .1000  .1000  .8000

 .7778  .1111  .1111
 .1667  .6667  .1667        .3662 (.1777)    .3771 (.1343)
 .1111  .1111  .7778
Let us conclude that both models (1.6) and (1.7) perform reasonably well in the examples we have seen here. The quality of the estimates resulting from the use of these models will generally improve when the degree of misclassification is low, when the probabilities of correct classification vary (at least a little) from category to category, and when a small number of β_j*'s are being estimated. Further, since models (1.16) and (1.18) assume that models (1.6) and (1.7) hold within each stratum, respectively, these conditions relate to the appropriateness of models (1.16) and (1.18) when any number of strata are involved.
Table 5.8
|ÎDR_j* − IDR_j*| (|β̂_j* − β_j*|), j=1,2,3, for various π_ij's,
given IDR_1*=2.25, IDR_2*=3.00, and IDR_3*=5.00
(β_1*=.8109, β_2*=1.0986, and β_3*=1.6094)

                                          Absolute Differences
     π_ij's                        j=1              j=2              j=3

 .9375 .0260 .0260 .0104
 .0505 .9091 .0253 .0152    .0804 (.0364)    .0965 (.0327)    .1184 (.0240)
 .0238 .0476 .8571 .0714
 .0250 .0250 .0500 .9000

 .8247 .1031 .0515 .0206
 .0995 .7960 .0746 .0299    .1758 (.0813)    .2213 (.0762)    .2728 (.0541)
 .0476 .0476 .7619 .1429
 .0513 .0513 .0769 .8205

 .7568 .1351 .0541 .0541
 .1395 .6512 .1163 .0930    .2289 (.1073)    .3354 (.1186)    .5425 (.1148)
 .0930 .1163 .6512 .1395
 .0541 .0541 .1351 .7568
5.3 Numerical Examples
There are two different scenarios which may occur when there is a possibility of misclassification of exposure in collected data. In the first, the researcher knows the data are misclassified. In some way, she or he estimates the π_ij's and then uses these estimates to calculate β̂, which will be a biased estimator of the true parameter β* unless the researcher estimates the π_ij's perfectly (an extremely unlikely event). In our numerical examples up to this point, we have assumed that the π_ij's were known.
In the second scenario, the researcher knows or suspects there is misclassification, but does not (or perhaps cannot) estimate Π. Instead, she or he first calculates the estimate β̂ of β* assuming there is no misclassification (i.e., using Π̂ = Π₀). Then the researcher asks the question: how much would this estimate change if Π̂ had some other structure? The researcher addresses this question by assuming several different structures for Π̂ and then comparing the resulting β̂'s. This will give an indication of how sensitive the results are to varying assumed degrees of misclassification.
Under this scenario, the model being fit is E(Y) = Π̂β + Vγ. We saw that the parameter γ will have the same estimate regardless of the choice of Π̂ matrix used in the fitted model. Nevertheless, as long as there is a possibility of misclassification of the covariates (an error which is ignored in the following analyses), we cannot be certain that we are estimating γ* even when Π̂ = Π. If there is no possibility of covariate misclassification, the parameter γ in the above model can be replaced by γ*.
Using the theory developed in Chapter 3, we can conclude that γ̂ will not change for different values of Π̂. In addition, the predicted values and goodness-of-fit statistics will be invariant to changes in Π̂.
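This sensitivity-analysis loop, refitting the same model under several assumed Π matrices and comparing the resulting β̂'s, can be sketched with an ordinary least squares fit standing in for the regression procedure. All data and matrices below are made up for illustration; only the idea of replacing the raw exposure column with the relevant row of the assumed Π comes from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: classified (possibly misclassified) binary exposure plus
# one covariate.
n = 200
exposure = rng.integers(0, 2, n)
v = rng.normal(size=n)
y = 1.0 + 0.8 * exposure + 0.3 * v + rng.normal(scale=0.5, size=n)

def fit_under_pi(pi):
    """OLS fit with the rows of an assumed 2x2 pi matrix replacing the
    raw 0/1 exposure column (a stand-in for the Pi*beta structure)."""
    x_exposure = np.array([pi[i][1] for i in exposure])
    X = np.column_stack([np.ones(n), x_exposure, v])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # (intercept, exposure effect, covariate effect)

pi_none = [[1, 0], [0, 1]]      # assume no misclassification
pi_some = [[.9, .1], [.1, .9]]  # assume 10% misclassification each way

b_none = fit_under_pi(pi_none)
b_some = fit_under_pi(pi_some)
# The exposure effect changes with the assumed pi; the covariate
# effect does not, since the two designs span the same column space.
print(b_none[1], b_some[1])
print(b_none[2], b_some[2])
```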
This second approach to assessing the effect of misclassification will be followed in the remainder of this section. We will illustrate the use of models (1.16), (1.18), and (1.19) with logistic, Poisson, and regular unweighted least squares regression methods, respectively. Three data sets are presented, each of which is suspected to involve misclassification with respect to exposure. Each is analyzed using an appropriate regression procedure. Five different choices for Π̂ are considered, including Π̂ = Π₀. The Π̂₁ matrix corresponding to each of these choices is presented below.
         1   0     0     0              1  .05   .03   .02
 Π̂_h1 =  1   1     0     0      Π̂_h2 =  1  .90   .04   .02
         1   0     1     0              1  .04   .90   .04
         1   0     0     1              1  .025  .05   .90

         1  .098  .001  .001            1  .10   .07   .03
 Π̂_h3 =  1  .90   .04   .03     Π̂_h4 =  1  .80   .08   .04
         1  .04   .90   .03             1  .08   .80   .08
         1  .04   .05   .90             1  .07   .10   .80

         1  .01   .001  .001
 Π̂_h5 =  1  .80   .10   .08
         1  .08   .80   .08
         1  .07   .08   .80

The first matrix above, Π̂_h1, is the matrix which assumes no misclassification (i.e., Π̂_h1 = Π₀₁).
For the matrices Π̂_h2 and Π̂_h3, the values of the π̂_ij's with i=j are each equal to .90. This implies that the probability of being correctly classified for both of these sets of misclassification probabilities is .90. The differences between the two matrices are in their off-diagonal elements. For the matrices Π̂_h4 and Π̂_h5, the values of the π̂_ij's with i=j, for i,j=1,2,3, are each equal to .80. The value of the π̂_00 associated with Π̂_h4 is also .80; the value of the π̂_00 associated with Π̂_h5, however, is .988. Finally, these choices of Π̂₁ all assume a misclassification rate less than or equal to 20% (i.e., π̂_ij ≥ .80 for all i=j). From the discussion in Section 5.2, then, we can assume that models (1.16) and (1.18) will hold reasonably well. Therefore, we should not be concerned with the appropriateness of the two models in the following examples.
5.3.1 Logistic Regression Example
For this example, model (1.16) is assumed to hold. The classified data are given in Table 5.9. There are four exposure categories (i.e., k=3) and one covariate which divides the data into two strata (i.e., s=2). One covariate value is needed to represent the stratum effect (i.e., p=1). Let us define v_s = (v_s), where v_1=0 and v_2=1. The Π̂ matrix for this example will be of the form

        Π̂ = [ Π̂₁ ]
             [ Π̂₁ ]

since there are just two strata.

Table 5.9
Data for Example 5.3.1

          s=1                  s=2
        D    D̄               D    D̄
  E0    2    24         E0    2    32
  E1    5    43         E1    6    51
  E2   15    60         E2    9    42
  E3    4    12         E3    5    10
The model being fit becomes

    logit θ_is = β_a + Σ_{j=1}^{3} π̂_ij β_j + v_s γ_1,

where β_a is the reference value for the group in the lowest exposure category and the first stratum. β_1, β_2, and β_3 are the effects for the second, third, and fourth exposure categories, respectively. γ_1 is the effect for the second stratum.
The five choices of Π̂₁ presented earlier were used in five separate fitted models. The CATMAX procedure was used to produce both WLS and ML estimates of β and γ. In addition, goodness of fit was measured for both types of estimation, and the predicted counts were calculated for ML estimation.
As was previously mentioned, γ̂_wls and γ̂_ml, the WLS and ML estimates of γ_1, along with the WLS and ML goodness-of-fit (GOF) statistics, are invariant for all possibilities of Π̂₁. These values are given below, where p represents the p-value associated with the value of the GOF statistic. The predicted counts are invariant as well.

    γ̂_wls = −.0321        Q_w   = .4361, p=.9327
                           Q_p   = .4383, p=.9322
    γ̂_ml  = −.0319        Q_log = .4383, p=.9322
The estimates of β, however, are not invariant. Tables 5.10 and 5.11 contain these values for each of the Π̂₁ matrices for WLS and ML estimation procedures, respectively. Let us focus on the ML estimates. The values in the Π̂_h1 column are the estimates of β obtained assuming the data are perfectly classified. The values contained in the remaining four columns are the estimates obtained assuming various degrees of misclassification. These can be compared to the left-hand column's values to see how the estimates are affected when different sets of misclassification probabilities are assumed.
Table 5.10
WLS Estimates β̂_j, j=1,2,3, for Example 5.3.1

              Structure of Π̂₁ matrix
         Π̂_h1     Π̂_h2     Π̂_h3     Π̂_h4     Π̂_h5
  β̂_1    .4860    .5563    .4749    .6413    .2539
  β̂_2   1.1810   1.3187   1.2830   1.4824   1.2540
  β̂_3   1.7456   1.9695   1.9024   2.2343   2.0417
The estimate for β̂_3 must be smaller when Π̂₁ = Π̂_h1 than when Π̂₁ has any other value. We can see this by reviewing the theory in Section 4.5. There, we showed that when the model being fit is E(Y) = Π̂β + Vγ, if Π̂ = Π₀, then β̂_k⁰ is biased towards zero. It can also be shown that β̂_k⁰ is less than β̂_k*, where β̂_k* is the estimate obtained when the correct Π matrix is used in the fitted model. In fact, β̂_k⁰ is less than β̂_k, where β̂_k is the estimate obtained when any Π̂ matrix is used in the fitted model. Intuitively, this is reasonable considering that any Π̂ could be the true Π matrix.
Therefore, when the model being fit is E(Y) = Π̂β + Vγ, as it is in this section, the estimate β̂_k obtained when Π̂ = Π₀ will be less than the estimate obtained when any other structure of Π̂ is assumed. Also, recall that nothing could be said of the direction of the bias in estimating β_1 and β_2. It follows, then, that the estimates of these parameters obtained when Π̂ = Π₀ will not necessarily be smaller than those obtained when Π̂ takes on another structure.
Table 5.11
ML Estimates β̂_j, j=1,2,3 (and Estimated Standard Errors), for Example 5.3.1

              Structure of Π̂₁ matrix
         Π̂_h1      Π̂_h2      Π̂_h3      Π̂_h4      Π̂_h5
  β̂_1    .4929     .5643     .4832     .6510     .2622
        (.6078)   (.7092)   (.7149)   (.8517)   (.6736)
  β̂_2   1.1870    1.3257    1.2903    1.4911    1.2613
        (.5674)   (.6462)   (.6496)   (.7543)   (.6241)
  β̂_3   1.7427    1.9662    1.8993    2.2304    2.0367
        (.6520)   (.7441)   (.7303)   (.8487)   (.7496)
The ÔR_j's corresponding to the ML estimates are given in Table 5.12, along with large-sample 95% confidence intervals. These intervals were calculated using the formula

    exp[ β̂_j ± 1.96 (estimated standard error of β̂_j) ].
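The formula can be checked directly against the entries of Table 5.11; e.g., the Π̂_h1 column (β̂_1 = .4929, SE = .6078) reproduces the first cell of Table 5.12. The helper below is ours, not part of the CATMAX output:

```python
import math

def odds_ratio_ci(beta_hat, se, z=1.96):
    """Point estimate and large-sample 95% CI for OR = exp(beta)."""
    return (math.exp(beta_hat),
            math.exp(beta_hat - z * se),
            math.exp(beta_hat + z * se))

or_hat, lo, hi = odds_ratio_ci(0.4929, 0.6078)
print(round(or_hat, 2), round(lo, 2), round(hi, 2))  # 1.64 0.5 5.39
```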
Table 5.12
Estimates ÔR_j, j=1,2,3 (and Corresponding 95% Confidence Intervals), for Example 5.3.1

                Structure of Π̂₁ matrix
          Π̂_h1           Π̂_h2           Π̂_h3           Π̂_h4           Π̂_h5
 ÔR_1     1.64           1.76           1.62           1.92           1.30
        (.50, 5.39)    (.44, 7.06)    (.40, 6.58)    (.36, 10.18)   (.35, 4.87)
 ÔR_2     3.28           3.77           3.63           4.44           3.53
       (1.08, 9.97)   (1.06, 13.36)  (1.02, 12.98)  (1.01, 19.48)  (1.04, 12.00)
 ÔR_3     5.71           7.14           6.68           9.30           7.67
       (1.60, 20.50)  (1.66, 30.71)  (1.60, 27.96)  (1.76, 49.10)  (1.76, 33.31)

The values of ÔR_1 do not vary a lot over the five columns. They range from 1.30 to 1.92. Likewise, the values of ÔR_2 range from 3.28 to 4.44, not a large variation. The values of ÔR_3 range from 5.71 to 9.30. One conclusion which can be drawn is that ÔR_3* > 5.71, since β̂_k⁰ is smaller than the corresponding estimate for any other choice of Π̂₁. For each ÔR_j, the corresponding confidence intervals have lower bounds which vary little between values of Π̂₁. The upper bounds, however, vary quite a lot. Also, notice that the confidence intervals for each ÔR_j either all include the null value of 1, or all do not include the null value of 1.
5.3.2 Poisson Regression Example
In this example, we assume model (1.18) holds. The classified data are given in Table 5.13. These are hypothetical data which could reflect the incidence of disease (D) arising out of a total amount of population-time at risk (L). Again, there are four exposure categories. In addition, there are six strata defined by six levels of one covariate. The matrices Π̂ and V₁ have the following forms:

         [ Π̂₁ ]              [ 0 0 0 0 0 ]
         [ Π̂₁ ]              [ 1 0 0 0 0 ]
    Π̂ =  [ Π̂₁ ]   and  V₁ =  [ 0 1 0 0 0 ]
         [ Π̂₁ ]              [ 0 0 1 0 0 ]
         [ Π̂₁ ]              [ 0 0 0 1 0 ]
         [ Π̂₁ ]              [ 0 0 0 0 1 ]

where Π̂₁ can be one of the choices given earlier.
The model to be fit becomes

    ln λ_is = β_a + Σ_{j=1}^{3} π̂_ij β_j + v_s'γ,

where β_a is the reference value for the group in the lowest exposure category and the first stratum. β_1, β_2, and β_3 are the effects for the second, third, and fourth exposure categories, respectively. γ_1, γ_2, ..., γ_5 are the effects for strata 2, 3, ..., 6, respectively, so that γ' = (γ_1, γ_2, ..., γ_5).
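Fitting this log-linear model amounts to Poisson regression with log person-time as an offset. A bare-bones Newton-Raphson sketch; the fitting routine and toy data below are ours, not the CATMAX procedure or the data of Table 5.13:

```python
import numpy as np

def fit_poisson(X, y, offset, n_iter=50):
    """Newton-Raphson for a Poisson log-linear model
    E(y) = exp(X b + offset), with offset = log person-time."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ b + offset)
        # Newton step: b += (X' W X)^{-1} X' (y - mu), W = diag(mu)
        b = b + np.linalg.solve((X * mu[:, None]).T @ X, X.T @ (y - mu))
    return b

# Intercept-only check: beta should equal log(total deaths / total time).
y = np.array([3.0, 7.0, 12.0])
L = np.array([1000.0, 2000.0, 3000.0])
X = np.ones((3, 1))
b = fit_poisson(X, y, np.log(L))
print(b[0], np.log(y.sum() / L.sum()))  # both ~ log(22/6000)
```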
Table 5.13
Data for Example 5.3.2

          s=1                s=2                s=3
        D      L           D      L           D      L
  E0   186   22013      1648  188649       1643  180510
  E1    23    1974       530   54389        928   89634
  E2     2     133       128   12480        409   33747
  E3  1249   97469      3692  279677       2037  136962

          s=4                s=5                s=6
        D      L           D      L           D      L
  E0   764   67290       356   19556        110    4056
  E1   769   60348       453   23200        150    5187
  E2   475   32437       340   17365        140    4645
  E3   662   35724       308   11531         84    2640
The Poisson option of the CATMAX procedure was used to obtain WLS and ML estimates. Again, the values of γ̂_wls and γ̂_ml are common to all Π̂ matrices used in the fitted model. These values are given below along with the values of the common goodness-of-fit statistics for WLS and ML estimation.

    γ̂_wls = (.0319, .1211, .3337, .7276, 1.0896)'     Q_w   = 15.1497, p=.4414
    γ̂_ml  = (.0323, .1214, .3339, .7270, 1.0857)'     Q_p   = 15.1726, p=.4391
                                                      Q_log = 15.1913, p=.4377

In addition, the standard errors of the γ̂'s are also invariant.
As in the previous example, the β̂'s for WLS and ML methods were calculated assuming various structures for the Π̂₁ matrix. Table 5.14 contains the resulting ML estimates and their standard errors. Table 5.15 contains the corresponding estimates of the IDR_j's and 95% confidence intervals.
Table 5.14
ML Estimates β̂_j, j=1,2,3 (and Estimated Standard Errors), for Example 5.3.2

              Structure of Π̂₁ matrix
         Π̂_h1      Π̂_h2      Π̂_h3      Π̂_h4      Π̂_h5
  β̂_1    .1131     .1303     .1125     .1498     .0649
        (.0241)   (.0283)   (.0292)   (.0341)   (.0291)
  β̂_2    .2086     .2298     .2237     .2509     .2033
        (.0308)   (.0353)   (.0346)   (.0420)   (.0379)
  β̂_3    .4411     .4998     .4858     .5690     .5271
        (.0189)   (.0216)   (.0213)   (.0247)   (.0224)
Notice that although the same Π̂₁ matrices are used here as in the previous example, the ranges in this example are smaller. In addition, the upper limits of the confidence intervals for the IDR_j's are much more stable than those in Example 5.3.1. This indicates that the analyses of certain data sets are more sensitive to changes in misclassification probabilities than are others.
A series of tests was performed to relate the theory developed in Section 4.7 to this example. The null hypothesis H0: β_j = 0 was tested for j=1,2,...,k. Table 5.16 contains the Q_wc values associated with each test under each assumed Π̂₁ matrix. According to the theory presented in Section 4.7, these values should vary, which, in fact, they do. In addition, when the hypothesis H0: β_1 = β_2 = β_3 = 0 is tested, the resulting Q_wc statistic has the value Q_wc = 578.756 for each Π̂₁, as the theory dictates should happen.
Table 5.16
Q_wc's for Testing H0: β_j = 0, j=1,2,3

              Structure of Π̂₁ matrix
         Π̂_h1     Π̂_h2     Π̂_h3     Π̂_h4     Π̂_h5
  β_1    21.93    21.25    14.90    19.26     4.98
  β_2    46.01    42.36    41.79    35.71    28.83
  β_3   542.72   533.94   519.28   529.44   553.26
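For a single coefficient, Q_wc appears numerically close to the familiar Wald statistic (β̂_j / SE)². Checking this against the Π̂_h1 entries of Tables 5.14 and 5.16 (our observation, up to rounding in the reported values):

```python
def wald(beta_hat, se):
    """Wald chi-square statistic for H0: beta = 0."""
    return (beta_hat / se) ** 2

# beta_1 = .1131 (SE .0241) under pi_h1 (Table 5.14) gives ~22.0,
# close to the Q_wc of 21.93 reported in Table 5.16.
print(wald(0.1131, 0.0241))
# beta_3 = .4411 (SE .0189) gives ~545, close to the reported 542.72.
print(wald(0.4411, 0.0189))
```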
Although the values of Q_wc may seem to vary greatly, the conclusions reached using the different Π̂₁ matrices are the same. All five sets of results indicate that H0: β_1 = 0 should be rejected, along with H0: β_2 = 0 and H0: β_3 = 0. This is in accordance with the conclusions reached after examining the confidence intervals in Table 5.15; none of the fifteen intervals contains the null value of 1. In particular, it is comforting to know that the analysis results are fairly robust to different specifications for the Π̂₁ matrix.
5.3.3 Least Squares Regression Example
In this example, we assume the following model holds: E(Y_is) = Σ_{j=0}^{k} π̂_ij E(Y_js*), where Y is a continuous variable, in this case, blood pressure. The classified data are given in Table 5.17. There are four exposure categories (i.e., k=3). The covariates involved are continuous age and weight variables. Therefore, p=2. Each s subscript refers to a specific subject within a classified exposure category. There are three subjects classified as unexposed, two subjects classified into each of the first and second exposed categories, and one subject classified into the third (or highest) exposed category. Since the numbers of subjects classified into each exposure category are not all equal, the discussion in Section 4.9 pertains to this situation.
The model to be fit is

    E(Y_is) = β_a + Σ_{j=1}^{3} π̂_ij β_j + v_s'γ,

where β_a is the intercept term. β_1, β_2, and β_3 are the effects for the first, second, and third exposed categories, respectively. v_s' = (v_s1, v_s2), where v_s1 is the value of the age variable and v_s2 is the value of the weight variable. γ_1 is the age effect and γ_2 is the weight effect.
Table 5.17
Data for Example 5.3.3

               blood pressure    age    weight
  E0   s=1          110           23      155
       s=2          100           22      150
       s=3           98           27      149
  E1   s=1          110           30      138
       s=2          114           31      142
  E2   s=1          112           38      140
       s=2          118           28      129
  E3   s=1          113           36      140

When the Y_is's are given in the order Y = (Y_01, Y_02, Y_03, Y_11, Y_12, Y_21, Y_22, Y_31)', the Π̂ matrix will have the form
         1   π̂_01   π̂_02   π̂_03
         1   π̂_01   π̂_02   π̂_03
         1   π̂_01   π̂_02   π̂_03
    Π̂ =  1   π̂_11   π̂_12   π̂_13
         1   π̂_11   π̂_12   π̂_13
         1   π̂_21   π̂_22   π̂_23
         1   π̂_21   π̂_22   π̂_23
         1   π̂_31   π̂_32   π̂_33
For this example, the same Π̂₁'s used earlier are considered. The estimates β̂_j, j=1,2,3, and their estimated standard errors are given in Table 5.18. As expected, the values of γ̂_1 and γ̂_2 are the same for all five choices of Π̂₁. In addition, the estimated standard errors of the γ̂'s, R², and the F statistic are invariant to choices of Π̂₁. This is consistent with the theory for LS estimation, presented in Section 4.9, which deals with situations in which there are unequal numbers of subjects in the classified exposure categories.
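The invariance claim can be checked numerically: replacing Π̂₁ by another nonsingular choice changes the parameterization of the exposure columns but not their column space, so the covariate coefficients are unchanged. A sketch with made-up responses and covariates; only the 3, 2, 2, 1 category pattern is taken from Example 5.3.3:

```python
import numpy as np

rng = np.random.default_rng(1)

# Eight subjects in classified categories 0..3 (3, 2, 2, 1 subjects);
# the covariates and responses are made up.
cat = np.array([0, 0, 0, 1, 1, 2, 2, 3])
V = rng.normal(size=(8, 2))   # two covariates (e.g., age, weight)
y = rng.normal(size=8)

def design(pi1):
    """[1 | pi-row for each subject's classified category | covariates]."""
    P = np.asarray(pi1, dtype=float)
    return np.column_stack([np.ones(8), P[cat][:, 1:], V])

pi_id = np.eye(4)                          # no misclassification
pi_alt = np.array([[.85, .05, .05, .05],   # some assumed misclassification
                   [.10, .80, .05, .05],
                   [.05, .10, .80, .05],
                   [.05, .05, .10, .80]])

coefs = [np.linalg.lstsq(design(p), y, rcond=None)[0] for p in (pi_id, pi_alt)]
# Covariate coefficients (last two entries) are identical for both choices.
print(coefs[0][-2:])
print(coefs[1][-2:])
```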
Table 5.18
LS Estimates β̂_j, j=1,2,3 (and Estimated Standard Errors), for Example 5.3.3

              Structure of Π̂₁ matrix
         Π̂_h1       Π̂_h2       Π̂_h3       Π̂_h4       Π̂_h5
  β̂_1   29.355     33.992     32.055     40.117     25.144
       (10.100)   (11.706)   (11.381)   (13.956)    (9.732)
  β̂_2   41.149     46.172     45.769     52.331     42.711
       (14.123)   (16.037)   (15.954)   (18.620)   (15.610)
  β̂_3   58.110     65.950     64.212     74.991     66.617
       (13.415)   (15.210)   (14.780)   (17.250)   (14.786)
The estimated values of γ (and their standard errors), of R², and of the F statistic are given below. The F statistic used in this analysis tests that all the coefficients except the intercept term, β_a, are zero.

    γ̂_1 = −1.410 (.642)        R² = .9626
    γ̂_2 =   .958 (.546)        F = 10.301, p=.0908

By comparing the estimates in Tables 5.11, 5.14, and 5.18, we can see that the variation in absolute terms among values in the last table is much greater. In relative terms, however, the degrees of variation are more or less the same among the three tables. In fact, relative to the estimates in the Π̂_h1 column, the range for β̂_1 is less in Table 5.18 than in Tables 5.11 and 5.14.
Certain patterns among the estimates are similar across the three tables. This indicates that estimators from the three types of regression procedures are affected by varying degrees of misclassification in similar ways. For instance, compare the first rows of Tables 5.11, 5.14, and 5.18. The lowest value in each of these rows corresponds to the Π̂_h5 matrix; the highest value in each corresponds to the Π̂_h4 matrix. In fact, the highest value in each row for all three tables corresponds to the Π̂_h4 matrix. The second highest value in the second row of all three tables corresponds to the Π̂_h2 matrix, and in the third row to the Π̂_h5 matrix. From this we can see that the type of regression procedure used does not affect certain relationships among the estimates resulting from the use of various Π̂ matrices.
5.4 Real Life Example
As a final example, we will investigate the effect of misclassification of exposure on a data set given by Frome (1983). The classified data appear in Table 5.19. These data are a subset of the original data, including only the smokers.
The exposure variable is "cigarettes per day". This variable is considered to involve potential misclassification error. There are six levels of the variable, so that k=5. The covariate is "years of smoking", which will not be considered to involve misclassification. There are nine levels of this variable, each of which represents a stratum, so that s=9. Therefore, there are a total of 54 (i,s) cells. The values in the table represent the number of man-years at risk and the observed number of lung cancer deaths (in parentheses) for each (i,s) cell.
The variable d_j, j=0,1,2,3,4,5, represents the dosage level for the j-th true exposure category. The variable t_s, s=1,2,...,9, is equal to the midpoint years of smoking in stratum s divided by 42.5. Functions of these variables, namely ln d_j and ln t_s, are used later as sets of score values for these two variables.
Two different analyses are performed on the data. First, "cigarettes per day" is treated as a nominal variable. The model used for this analysis includes five separate exposure effects, one for each exposure level above the lowest. The second analysis treats
Table 5.19
Man-Years at Risk (and Observed Number of Lung Cancer Deaths)

                                 Cigarettes per day
  Years of      1-9     10-14    15-19    20-24    25-34     35+
  Smoking      (E0)     (E1)     (E2)     (E3)     (E4)     (E5)
    d_j:        5.2     11.2     15.9     20.4     27.4     40.8

   15-19      3,121    3,544    4,317    5,683    3,042      670
                (0)      (0)      (0)      (0)      (0)      (0)
   20-24      2,937    3,286    4,214    6,385    4,050    1,166
                (0)      (1)      (0)      (1)      (1)      (1)
   25-29      2,288    2,546    3,185    5,483    4,290    1,482
                (0)      (1)      (0)      (1)      (4)      (0)
   30-34      2,015    2,219    2,560    4,687    4,268    1,580
                (0)      (2)      (4)      (6)      (9)      (6)
   35-39      1,648    1,826    1,893    3,646    3,529    1,336
                (1)      (0)      (0)      (5)      (9)      (6)
   40-44      1,310    1,386    1,334    2,411    2,424      924
                (2)      (1)      (2)     (12)     (11)     (10)
   45-49        927      988      849    1,567    1,409      556
                (0)      (2)      (2)      (9)     (10)      (7)
   50-54        710      684      470      857      663      255
                (3)      (4)      (2)      (7)      (5)      (4)
   55-59        606      449      280      416      284      104
                (0)      (3)      (5)      (7)      (3)      (1)
exposure as an ordinal variable. Here, a dose-response model is fit to the data. In both analyses, the covariate is represented by ln t_s and is treated as a continuous variable. In the second analysis, "cigarettes per day" is represented by ln d_j, a continuous variable.
Under each analysis, an appropriate misclassification model is used to analyze the data assuming various degrees of misclassification of exposure. Model (1.16) is used for the first analysis; model (2.10) is used for the second. Five different sets of misclassification probabilities are assumed. The same sets are used for both analyses. The values of the π̂_ij's in the five sets are presented below.
The first set corresponds to no misclassification. The third and fourth sets could reflect situations in which it is more likely for subjects to be misclassified into lower exposure categories than into higher exposure categories. The pattern of π̂_ij's in Set #2 is such that, for each classified exposure category, the probability of truly belonging in an adjacent category is the same (.04). Further, the probability of truly belonging in a category which is two categories away from the classified category is the same for all classified categories (.01). The π̂_ij's in Set #5 reflect a situation in which those subjects classified into the lowest exposure category are least likely to be misclassified.
Set #1:
          j=0    j=1    j=2    j=3    j=4    j=5
   i=0     1      0      0      0      0      0
   i=1     0      1      0      0      0      0
   i=2     0      0      1      0      0      0
   i=3     0      0      0      1      0      0
   i=4     0      0      0      0      1      0
   i=5     0      0      0      0      0      1

Set #2:
          j=0    j=1    j=2    j=3    j=4    j=5
   i=0    .95    .04    .01     0      0      0
   i=1    .04    .91    .04    .01     0      0
   i=2    .01    .04    .90    .04    .01     0
   i=3     0     .01    .04    .90    .04    .01
   i=4     0      0     .01    .04    .91    .04
   i=5     0      0      0     .01    .04    .95

Set #3:
          j=0      j=1      j=2     j=3     j=4     j=5
   i=0    .85      .10      .03     .015    .003    .002
   i=1    .01      .87      .07     .03     .014    .006
   i=2    .002     .008     .88     .07     .03     .01
   i=3    .001     .001     .008    .89     .07     .03
   i=4    .0003    .0007    .001    .008    .90     .09
   i=5    .0003    .0007    .001    .003    .008    .987

Set #4:
          j=0    j=1    j=2    j=3    j=4    j=5
   i=0    .80    .14    .04    .01    .005   .005
   i=1    .02    .82    .14    .015   .005    0
   i=2     0     .02    .84    .14     0      0
   i=3     0      0     .02    .86    .12     0
   i=4     0      0      0     .02    .90    .08
   i=5     0      0      0      0     .02    .98

Set #5:
          j=0    j=1    j=2    j=3    j=4    j=5
   i=0    .86    .03    .03    .03    .03    .02
   i=1    .01    .80    .07    .05    .04    .03
   i=2    .01    .04    .82    .06    .04    .03
   i=3    .03    .04    .06    .82    .04    .01
   i=4    .03    .04    .05    .07    .80    .01
   i=5    .02    .03    .04    .06    .10    .75
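The band pattern described for Set #2 can be generated mechanically. A small sketch; the generator is ours, only the .04 / .01 pattern comes from the text:

```python
import numpy as np

# Hypothetical generator for band-structured sets like Set #2:
# probability `near` for each adjacent category, `far` for each
# category two away, zero beyond, diagonal absorbing the remainder.
def banded_misclassification(k, near=0.04, far=0.01):
    P = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if abs(i - j) == 1:
                P[i, j] = near
            elif abs(i - j) == 2:
                P[i, j] = far
        P[i, i] = 1.0 - P[i].sum()
    return P

P = banded_misclassification(6)
print(P[0])           # [.95 .04 .01 0 0 0], matching Set #2's first row
print(P.sum(axis=1))  # every row sums to 1
```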
5.4.1 Analysis Assuming Nominal Exposure Categories
The data in Table 5.19 are analyzed here assuming nominal exposure categories and ordinal covariate levels. The variable ln t_s, s=1,2,...,9, is used to represent the levels of the covariate "years of smoking". Under assumed degrees of misclassification, model (1.16) becomes

    logit θ_is(v) = β_a + Σ_{j=1}^{5} π̂_ij β_j + v_s γ,

where β_a is the effect for the j=0 exposure category, β_j is the effect for the j-th exposure category, j=1,2,3,4,5; v_s = ln t_s, and γ is the linear effect of the covariate.
The Π̂ matrix is a series of nine vertically concatenated Π̂₁ matrices, where Π̂₁ is an estimate of Π₁ defined in Section 1.10. Five different structures of Π̂₁ are assumed, each corresponding to one of the five sets of π̂_ij's given earlier. The resulting ML estimates of the β_j's, j=1,2,3,4,5, and their estimated standard errors are given in Table 5.20. The corresponding estimates of the IOR_j's and their 95% large-sample confidence intervals are given in Table 5.21. As expected, γ̂ and its estimated standard error are invariant to choices of Π̂. These values, along with the GOF statistics and corresponding p-values, are given below:

    γ̂ = 4.5023
    Estimated standard error of γ̂ = .3356
    Q_p   = 39.7807, p=.7633
    Q_log = 47.1459, p=.4666
A
Inspection of Table 5.20 shows that the estimates
Pj ,
j=1,2,3,4,5,
corresponding to Sets #1 through #4, and their estimated standard
.
errors, increase monotonically. Not all of the estimates
P.,
J
j=1,2,3,4,5, corresponding to Set #5, however, are larger than those
A
corresponding to Set #4. T'he estimate
Pt
for Set #5 falls between the
A
estimates of
Pt
for Sets #2 and #3;
P2 for Set #5
falls between the {32's
for Sets #3 and #4. Further, none of the estimated standard errors for
•
Set #5 is larger than for Set #4.
These relationships illustrate the point that the relationships among the
{J's corresponding to different sets of misclassification probabilities are
165
Table 5.20
ML Estimates β̂_j (and Estimated Standard Errors), j=1,2,3,4,5, for Example 5.4.1

              Assumed Set of π̂_ij's
         Set #1     Set #2     Set #3     Set #4     Set #5
  β̂_1    .8780      .9502     1.0005     1.1128      .9535
        (.4880)    (.5444)    (.5960)    (.6533)    (.5927)
  β̂_2   1.0924     1.1268     1.1861     1.2497     1.2244
        (.4838)    (.5235)    (.5703)    (.6052)    (.5818)
  β̂_3   1.6841     1.7511     1.8292     1.9233     2.0807
        (.4338)    (.4604)    (.5110)    (.5442)    (.5265)
  β̂_4   1.9072     1.9518     2.0363     2.1167     2.3630
        (.4321)    (.4581)    (.5078)    (.5398)    (.5254)
  β̂_5   2.3976     2.4750     2.5818     2.6573     2.9555
        (.4456)    (.4702)    (.5156)    (.5476)    (.5433)
influenced by more than the probabilities of correct classification. As
we saw in Section 4.9, the relationships between estimates of β
corresponding to different sets of π̂ij's do not depend on the relationships
between the values of the π̂ij's themselves, but on the relationships
between the differences (π̂ij − π̂0j), i,j=1,2,...,5. In other words, we
ask not whether a certain misclassification probability is larger in Set
#2 than in Set #3; rather, we ask whether that misclassification
probability minus the corresponding misclassification probability
among the lowest classified exposure category is larger in Set #2 than
in Set #3. The relationship of the β̂'s for Set #5 to the β̂'s for Sets
#1 through #4 shows us that not all the estimates from one set will
Table 5.21
Estimates IÔRj (and Corresponding 95% Confidence Intervals),
j=1,2,3,4,5, for Example 5.4.1

                        Assumed Set of π̂ij's
        Set #1        Set #2        Set #3        Set #4        Set #5
IÔR1    2.4           2.6           2.7           3.0           2.6
       (.9, 6.3)     (.9, 7.5)     (.9, 8.8)     (.9, 11.0)    (.8, 8.3)
IÔR2    3.0           3.1           3.3           3.5           3.4
       (1.2, 7.7)    (1.1, 8.6)    (1.1, 10.0)   (1.1, 11.4)   (1.1, 10.6)
IÔR3    5.4           5.8           6.2           6.8           8.0
       (2.3, 12.6)   (2.3, 14.2)   (2.3, 17.0)   (2.4, 19.9)   (2.9, 22.5)
IÔR4    6.7           7.0           7.7           8.3           10.6
       (2.9, 15.7)   (2.9, 17.3)   (2.8, 20.7)   (2.9, 23.9)   (3.8, 29.7)
IÔR5   11.0          11.9          13.2          14.3          19.2
       (4.6, 26.3)   (4.7, 29.9)   (4.8, 36.3)   (4.9, 41.7)   (6.6, 55.7)
not necessarily be larger or smaller than the corresponding estimates
from another set.
From Table 5.21, we can see the monotonic increase of the β̂j's
for Sets #1 through #4 reflected in the monotonic increase of the
IÔRj's for these sets. Notice, also, that the values of these
estimates do not vary greatly among the first four sets of
misclassification probabilities. The estimates resulting from use of
the fifth set, however, are quite large compared to the values in the
column for Set #1 for j=3,4,5.
The upper bounds of the confidence intervals vary greatly,
especially as j increases. This is not unexpected, since exponentiation
is used in calculating the confidence intervals. The lower bounds are
relatively stable, especially for Sets #1 through #4. The intervals
across the sets for each j, however, are consistent with respect to the
inclusion of the null value of 1: either all contain 1 or none
contains 1.
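The exponentiation step can be sketched as follows. This is a minimal illustration (not the dissertation's own code), using the Set #1 values of β̂1 and its estimated standard error from Table 5.20 and recovering the corresponding entry of Table 5.21:

```python
import math

def ior_with_ci(beta_hat, se, z=1.96):
    """Estimated odds ratio and 95% Wald CI from a log-scale
    coefficient: exponentiate the estimate and the interval limits."""
    return (math.exp(beta_hat),
            math.exp(beta_hat - z * se),
            math.exp(beta_hat + z * se))

# beta_1-hat and its estimated standard error for Set #1 (Table 5.20)
est, lo, hi = ior_with_ci(0.8780, 0.4880)
print(round(est, 1), round(lo, 1), round(hi, 1))  # 2.4 0.9 6.3
```

Because the interval is symmetric on the log scale, exponentiation stretches the upper limit far more than the lower one, which is why the upper bounds in Table 5.21 spread out so much as the estimates grow.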
In addition, the hypothesis H0: β1=β2=β3=β4=β5 was tested for each
assumed set of π̂ij's. As was mentioned in Section 4.8, the value of
Qwc used to test this hypothesis varies with the choice of Π̂1. The
values for the Qwc's and their corresponding p-values are given in
Table 5.22.
Table 5.22
Values of Qwc (and P-Values) Based on Testing
H0: β1=β2=β3=β4=β5 for Example 5.4.1

                Assumed Set of π̂ij's
       Set #1    Set #2    Set #3    Set #4    Set #5
Qwc    29.68     31.91     31.32     30.82     32.96
      (0.000)   (0.000)   (0.000)   (0.000)   (0.000)
Although these values do vary, the degree of variation is small.
Also, they all lead to the same conclusion: reject H0.
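As a rough illustration of how such small p-values arise, the sketch below computes a chi-square upper-tail probability for the Set #1 value of Qwc. The choice of 4 degrees of freedom is an assumption made here (H0 imposes four equality restrictions); the dissertation's own computation may differ:

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate for even df,
    via the closed-form Poisson sum (no special functions needed)."""
    m = df // 2
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total

# Qwc for Set #1, with df = 4 assumed for the four restrictions
p = chi2_sf_even_df(29.68, 4)
print(f"{p:.4f}")  # 0.0000
```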
Finally, let us draw some conclusions based on the results from
using the five sets of π̂ij's. Most importantly, we can see that, on the
whole, the results reached when misclassification is ignored still hold
when misclassification is not ignored. Many of the conclusions
reached when using Set #1 are also reached when using Sets #2
through #5. The values of Qp and of Qlog are invariant, indicating
that the fit of the model is good for all of the π̂ij sets. Also, the
values of Qwc are nearly invariant, so that the conclusion of
inequality among the true exposure effects is supported for all five
sets.
In order to determine final estimates for the βj's and their
estimated standard errors, we must review the five sets of
misclassification probabilities. In general, when one is dealing with a
situation in which there is potential misclassification, an important
beginning step is to examine the exposure variable itself for clues
concerning patterns and degrees of misclassification. For this
example, let us make the following assumption. It is reasonable to
suspect that subjects are likely to report that they smoke less than
they really do and unlikely to report that they smoke more than they
really do. In other words, those classified into low exposure
categories are often likely to belong truly in higher categories. The
converse, however, is not true; a subject who admits to smoking a
large number of cigarettes probably does. As a result, the patterns in
Sets #3 and #4 would seem to best reflect the true misclassification
patterns.
Therefore, we suspect that the estimates resulting from the use of
Sets #3 and #4 may be closest to the true estimates. Since the final
estimate of β will be a biased estimate of β*, there is no need to
worry about obtaining a unique value for β̂. Estimates from either
Set #3 or #4 could be used. A final analysis would include these
estimates, as well as the estimates obtained when misclassification is
ignored.
5.4.2 Analysis Assuming Ordinal Exposure Categories
The data from Table 5.19 will now be analyzed assuming ordinal
levels for the exposure variable as well as for the covariate. Again,
the variable ln ts, s=1,2,...,9, is used to represent the levels of the
covariate, "years of smoking". The variable ln dj is used to
represent the levels of the exposure variable, "cigarettes per day".
Model (2.10) can be applied in this situation. The score values,
cj, j=0,1,...,k (defined in Section 2.7), are equal to the values ln dj,
j=0,1,...,k, in this example. Under assumed sets of misclassification
probabilities, the model becomes

    logit âis(vs) = β0 + τ Σ(j=0 to 5) π̂ij (ln dj) + vs γ,

where β0 is the intercept term and τ is the linear effect for number
of cigarettes per day; vs = ln ts, and γ is the linear effect for the
years of smoking.
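A minimal sketch of evaluating this linear predictor follows, under entirely hypothetical coefficient values (β0 is invented; τ and γ are set near the Set #1 estimates of Table 5.24, and the π̂ row assumes no misclassification for the lowest category):

```python
import math

def fitted_proportion(beta0, tau, gamma, pi_row, ln_d, ln_t):
    """Inverse logit of the linear predictor
    beta0 + tau * sum_j pi_ij * (ln d_j) + gamma * (ln t_s)."""
    eta = beta0 + tau * sum(p * x for p, x in zip(pi_row, ln_d)) + gamma * ln_t
    return 1.0 / (1.0 + math.exp(-eta))

ln_d = [1.649, 2.416, 2.766, 3.016, 3.311, 3.709]
pi_row = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # no misclassification, category 0
p = fitted_proportion(-10.0, 1.1824, 4.5043, pi_row, ln_d, 1.0)
print(0.0 < p < 1.0)  # True
```

With γ > 0, the fitted proportion increases in ln ts, as the model intends.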
The Π̂ matrix is a series of nine vertically concatenated Π̂1
matrices, where

          | 1   Σ(j=0 to 5) π̂0j (ln dj) |
          | 1   Σ(j=0 to 5) π̂1j (ln dj) |
    Π̂1 = | 1   Σ(j=0 to 5) π̂2j (ln dj) |
          | 1   Σ(j=0 to 5) π̂3j (ln dj) |
          | 1   Σ(j=0 to 5) π̂4j (ln dj) |
          | 1   Σ(j=0 to 5) π̂5j (ln dj) |
Notice that the elements in the second column are weighted averages
of the ln dj's, with the weights being the π̂ij's. The values of these
elements for each of the five sets of misclassification probabilities
are given in Table 5.23. The values in the Set #1 column are simply
the values of ln dj, j=0,1,...,5. The values in the other columns are
weighted averages of the values in this first column.
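This weighted-average construction can be sketched as follows, using the ln dj values from the Set #1 column of Table 5.23; the non-identity row shown at the end is hypothetical, not one of the five assumed sets:

```python
# ln d_j for the six exposure categories (Set #1 column of Table 5.23)
ln_d = [1.649, 2.416, 2.766, 3.016, 3.311, 3.709]

def second_column(pi):
    """Second-column elements of a Pi_1 matrix: for each classified
    category i, the pi_ij-weighted average of the ln d_j values."""
    return [sum(p * x for p, x in zip(row, ln_d)) for row in pi]

# Under Set #1 (no misclassification) pi is the identity matrix,
# so each element reduces to ln d_i itself.
identity = [[1.0 if i == j else 0.0 for j in range(6)] for i in range(6)]
assert second_column(identity) == ln_d

# A hypothetical first row that spreads 20% of the weight upward:
hypothetical_row = [0.80, 0.10, 0.05, 0.03, 0.01, 0.01]
print(round(second_column([hypothetical_row])[0], 2))  # 1.86
```

Shifting weight toward higher categories pulls the weighted average above ln d0, just as the Set #2 through #5 columns of Table 5.23 sit mostly above the Set #1 column in row 1.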
Table 5.23
Elements in Second Column of Π̂1 Matrices for Example 5.4.2

                Assumed Set of π̂ij's
        Set #1    Set #2    Set #3    Set #4    Set #5
row 1   1.649     1.691     1.789     1.833     1.838
row 2   2.416     2.405     2.471     2.463     2.537
row 3   2.766     2.757     2.805     2.794     2.806
row 4   3.016     3.018     3.053     3.046     2.954
row 5   3.311     3.309     3.342     3.337     3.181
row 6   3.709     3.686     3.701     3.701     3.510
As was mentioned in Section 4.4, the estimate γ̂ and its
estimated standard error are not invariant to choices of Π̂1 under this
"ordinal-type" model. The values of these estimates for each set of
π̂ij's, along with the estimates of τ, are given in Table 5.24. In
addition, the GOF statistics are not invariant. The values of these
statistics are given in Table 5.26.
From Table 5.24 we can see that neither the estimates of γ nor
the corresponding standard errors vary much among the five π̂ij sets.
There is more variation among the estimates of τ. Notice that the
estimates τ̂ increase monotonically for Sets #1 through #4, as the
estimates β̂j, j=1,2,3,4,5, did in Table 5.20. In addition, τ̂ for Set
#5 is larger than for Set #4, resulting in an increase across all five
sets. This is not surprising, since the estimates β̂j, j=3,4,5, were
larger for Set #5 than for Set #4, while the estimates β̂j, j=1,2, were
not much smaller for Set #5 than for Set #4.
Table 5.24
Estimates τ̂ and γ̂ (and Estimated Standard Errors)
for Example 5.4.2

                Assumed Set of π̂ij's
      Set #1     Set #2     Set #3     Set #4     Set #5
τ̂    1.1824     1.2059     1.2527     1.2629     1.5114
     (.1686)    (.1717)    (.1781)    (.1782)    (.2186)
γ̂    4.5043     4.5055     4.5048     4.5031     4.5021
     (.3350)    (.3351)    (.3351)    (.3352)    (.3347)
Table 5.25 contains the standard normal variate (z) based on
testing the hypothesis H0: τ=0 for each set of π̂ij's. In addition, it
contains 95% confidence intervals for τ. The values of the z's indicate
that H0: τ=0 is rejected under each set of misclassification
probabilities. Also notice that all these values are nearly equal.
There is greater variation among the upper and lower limits of the
confidence intervals, which reflects the variation among the τ̂'s and
their estimated standard errors.
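The z statistic and interval are the usual Wald quantities, τ̂/SE and τ̂ ± 1.96·SE. A quick check against the Set #1 entries of Tables 5.24 and 5.25 follows (the small discrepancy from the tabled z presumably reflects rounding of τ̂ and its standard error):

```python
def wald_z_and_ci(tau_hat, se, z_crit=1.96):
    """Wald statistic for H0: tau = 0, and the matching 95% CI."""
    return tau_hat / se, (tau_hat - z_crit * se, tau_hat + z_crit * se)

# tau-hat and its estimated standard error for Set #1 (Table 5.24)
z, (lo, hi) = wald_z_and_ci(1.1824, 0.1686)
print(round(z, 2), round(lo, 4), round(hi, 4))  # 7.01 0.8519 1.5129
```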
Table 5.25
Standard Normal Variates (z) Based on Testing H0: τ=0,
and 95% Confidence Intervals (CI) for τ, for Example 5.4.2

                          Assumed Set of π̂ij's
      Set #1             Set #2             Set #3             Set #4             Set #5
z     7.0140             7.0222             7.0324             7.0886             6.9139
CI    (.8519, 1.5129)    (.8694, 1.5424)    (.9036, 1.6018)    (.9136, 1.6122)    (1.0829, 1.9399)
From Table 5.26, we can see that, although the values of
the goodness-of-fit statistics do vary, they do not vary appreciably.
Each value of either Qp or Qlog indicates the same degree of fit as the
other values of that statistic.
Table 5.26
Goodness-of-Fit Statistics (and P-Values) for Example 5.4.2

                Assumed Set of π̂ij's
        Set #1    Set #2    Set #3    Set #4    Set #5
Qp      39.321    39.308    39.310    39.125    39.876
       (.8832)   (.8835)   (.8835)   (.8877)   (.8700)
Qlog    48.275    48.157    48.218    48.223    48.940
       (.5825)   (.5873)   (.5848)   (.5846)   (.5559)
In conclusion, we can see that many results found when assuming
no misclassification are also found when assuming various degrees of
misclassification. Although γ̂, its estimated standard error, and the
GOF statistics are not perfectly invariant, as they were in the
previous analysis, they are nearly invariant. All five π̂ij sets
basically agree with the choices of 4.50 and .335 for γ̂ and its
estimated standard error. In addition, the GOF statistic Qp indicates
the same degree of goodness-of-fit for each set. The same holds for
Qlog.
As was mentioned in the previous section, the estimates of τ from
Sets #3 and #4 may be closest to the true estimate, since there is
reason to suspect that the patterns in these sets best reflect the
patterns of the true misclassification probabilities. A final analysis
would then include the estimates of τ from these two sets, as well as
the estimate obtained when misclassification is ignored.
CHAPTER SIX
SUGGESTIONS FOR FURTHER RESEARCH
There are many issues contained in this dissertation which could
be expanded upon or extended. For instance, the theory developed in
Chapters 3, 4, and 5 principally focuses on models which deal with
situations involving misclassification of exposure alone, and no
interaction. Questions similar to those answered in these chapters
could be addressed in terms of other models.
Consider a model which includes interaction. It may be of
interest to know whether estimators from such a model will have the
same properties as those from models which don't include interaction.
For instance, the bias of β̂ and γ̂ could be determined. Also, the
possible invariance of certain statistics (e.g., γ̂ and GOF statistics)
could be investigated.
In Chapter 5, we saw that models (1.16) and (1.18) work
reasonably well in a variety of situations. Similar work could be done
to evaluate the performance of other models, such as the model
developed for misclassification of covariates and model (1.19),
developed for use with LS regression.
The basic identity upon which the models in Chapter 1 were
developed, expression (1.4), was derived assuming nondifferential
misclassification of exposure, a quality we believe is inherent in
follow-up studies. Other types of studies, particularly case-control
studies, may not induce this condition. Therefore, the development of
a model which does not rely on the assumption of nondifferential
misclassification could be very useful.
Further, the models developed in this dissertation allow for
misclassification of exposure variables or covariates only. Models
could be developed for situations in which there is misclassification
of the response variable. This misclassification could occur alone,
or along with misclassification of explanatory variables. The
properties of the estimators and the performance of such a model
could then be investigated.
Associated research can also be done to extend the ideas and
results developed here to other types of statistical analyses. For
example, models could be developed for survival analysis or
techniques could be developed for use with analysis of survey data.
Finally, the models and results contained herein are certainly not
confined to epidemiologic settings. Misclassification error occurs in
a variety of research areas. If the models developed in this
dissertation are not directly applicable to other types of problems or
situations, adaptations to the theory can be made.