THE ERROR-IN-VARIABLES PROBLEM
IN THE LOGISTIC REGRESSION MODEL

by

Rhonda Robinson Clark

Department of Biostatistics
University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 1407
July 1982
THE ERROR-IN-VARIABLES PROBLEM
IN THE LOGISTIC REGRESSION MODEL

by

Rhonda Robinson Clark

A Dissertation submitted to the faculty of the University of
North Carolina at Chapel Hill in partial fulfillment of the
requirements for the degree of Doctor of Philosophy in the
Department of Biostatistics.

Chapel Hill
July 1982
ABSTRACT
RHONDA ROBINSON CLARK. The Error-in-Variables Problem in the
Logistic Regression Model. (Under the direction of Clarence
E. Davis.)
It is well known that when the independent variable in simple
linear regression is measured with error, least squares estimates
are not unbiased. This is also true for logistic regression, and
the magnitude of the bias is demonstrated through simulation
studies.

Estimators are presented as possible solutions to the
"error-in-variables" problem; that is, the problem of obtaining
consistent estimators of model parameters when measurement error is
present in the independent variable. Two solutions require an
external estimate of the variance of the measurement error, two
require multiple measures on the independent variable, and two
others are extensions of the method of grouping and the
instrumental variable approach. Simulation studies show that the
use of an external estimate of the error variance or of multiple
measures on the independent variable leads to estimators with
substantially lower mean square error than the least squares
estimate. For the grouping and instrumental variable approaches,
the proposed estimators have lower mean square error under some but
not all conditions.

The methods discussed are applied to data from the Lipid
Research Clinics Program Prevalence Study.
ACKNOWLEDGEMENTS
I wish to thank the chairman of my Doctoral Committee,
Dr. Clarence E. Davis, for his guidance, encouragement, and
never-ending optimism. Without his support this work could
not have been completed. I am also grateful to Drs. G. Koch,
M. Symons, J. Hosking, and G. Heiss for their suggestions and
participation on the committee.

I would also like to take this opportunity to thank my
friends and family for their love and support throughout my
years in this program.

Finally, I would like to thank Ms. Ernestine Bland for
the excellent and speedy job that she performed in typing the
manuscript.
TABLE OF CONTENTS

CHAPTER

I.   INTRODUCTION AND REVIEW OF THE LITERATURE

     1.1  The Error-in-Variables Problem in Linear Regression
          1.1.1  Introduction
          1.1.2  Classical Approaches to the Error-in-Variables Problem
          1.1.3  Alternatives to the Classical Approaches
                 1.1.3.1  The Method of Grouping
                 1.1.3.2  The Use of Instrumental Variables
                 1.1.3.3  Replication of Observations
     1.2  The Use of the Logistic Regression Model in Epidemiologic Research
          1.2.1  Introduction
          1.2.2  Discriminant Function Approach
          1.2.3  Weighted Least Squares/Maximum Likelihood Approach
          1.2.4  Comparison of the Approaches
     1.3  Outline of the Research

II.  DESCRIPTION OF THE BIAS DUE TO MEASUREMENT ERROR

     2.1  A Review of the Simple Linear and Multiple Linear Regression Cases
          2.1.1  Simple Linear Regression Case
          2.1.2  Multiple Linear Regression Case
     2.2  Mathematical Descriptions of the Bias Under the Logistic
          Regression Model
     2.3  Description of the Bias Under the Logistic Regression Model
          Using Simulation
          2.3.1  Description of the Simulation Procedure
          2.3.2  Simple Logistic Regression Case
          2.3.3  Multiple Logistic Regression Case
          2.3.4  Summary of Results

III. ALTERNATIVE ESTIMATES OF β WHEN MEASUREMENT ERROR IS PRESENT
     IN THE SIMPLE LOGISTIC REGRESSION MODEL

     3.1  Modified Discriminant Function Estimator
     3.2  An Estimator Based on the Method of Grouping
          3.2.1  The Method
          3.2.2  Rationale Behind the Method
     3.3  Use of an Instrumental Variable to Estimate β
          3.3.1  Iterative Instrumental Variable Estimator
          3.3.2  Two-Stage Least Squares Estimator
     3.4  Another Two-Stage Estimator of β
     3.5  Comparison of the Alternative Estimators

IV.  USE OF MULTIPLE MEASUREMENTS IN THE ESTIMATION OF β

     4.1  The Mean of m Independent Measurements as the Independent Variable
     4.2  Further Reduction of the Bias When m is Small
     4.3  Use of the James-Stein Estimator of Uᵢ as the Independent Variable
     4.4  Estimation of σ² When m Measurements are Taken on a Subsample
     4.5  Results of Simulation Study

V.   APPLICATION OF METHODS TO THE LIPID RESEARCH CLINICS DATA

     5.1  Description of the Data Set
     5.2  Computation of the Estimates and Their Variances
     5.3  Comparison of the Estimates
     5.4  Conclusions

VI.  SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH

LIST OF TABLES

TABLE

1.1  Some Examples of Between-Individual and Within-Individual Variability
1.2  ANOVA Table for X and Y Data
2.1  Estimates of the Bias Along With 95% Confidence Intervals
3.1  Simulated Mean, Bias, Standard Deviation, and Mean Square Error
     of the Estimators in Chapter 3
4.1  Simulated Mean, Bias, Standard Deviation, and Mean Square Error
     of the Estimators in Chapter 4
5.1  Sample Estimates of X̄, S_X², and S_E² by Model and Data Set
5.2  Correlations Between Triglyceride Level and Instrumental Variables
     by Data Set
5.3  Simple Logistic Model With Total Cholesterol Level at Visit 2
     as the Independent Variable
5.4  Multiple Logistic Model With Total Cholesterol at Visit 2 and Age
     as the Independent Variables
5.5  Simple Logistic Model With Triglyceride Level at Visit 2
     as the Independent Variable
5.6  Multiple Logistic Model With Triglyceride Level at Visit 2 and Age
     as the Independent Variables
CHAPTER I
INTRODUCTION AND REVIEW OF THE LITERATURE
Epidemiologic data collected from prospective studies are often
used to model the incidence of disease as a function of one or more
independent variables. A commonly employed statistical procedure
for such studies is multiple logistic regression. For example, in a
prospective study of coronary heart disease in Framingham (Truett,
Cornfield and Kannel (1967)), the variation in the incidence of
disease as a function of serum cholesterol, systolic blood pressure,
age, relative weight, hemoglobin, cigarettes/day, and ECG measured
on the initial examination was investigated. The probability, P,
of developing the disease during the 12-year time interval was
estimated for each individual using the logistic regression model:

    P = 1/(1 + e^−(α + β₁X₁ + ... + βₖXₖ))

where X₁, ..., Xₖ are the risk factors.
In the standard linear regression problem one assumes that the
independent variable is measured without error and that the values
of the dependent variable deviate from their true values only by a
random error component representing explanatory variables excluded
from the model or sampling error. Under these assumptions the
ordinary least squares (OLS) estimates of the parameters are
unbiased with minimum variance. Violation of these assumptions in
the linear regression model leads to an underestimate of the
expected magnitude of the regression coefficients (Snedecor and
Cochran (1967)).
It is well known from the epidemiologic literature that variables
such as systolic blood pressure and serum cholesterol level are
subject to possible "errors" or variation in measurement. Two
recognized sources of this variation are within-individual
variability and observer (interviewer, laboratory) error. Gardner
and Heady (1973) give the following examples of within-individual
variability σ_w and between-individual variability σ_b:

TABLE 1.1  Some Examples of Between-Individual and
           Within-Individual Variability

    Variable                 Mean     σ_w      σ_b
    Cholesterol (mg %)        215      25       40
    Blood Pressure (mmHg)     141     9.1     13.6
    Calories (per day)       2846     370      406
Often these errors are combined with sampling error and forgotten.
The purpose of this research is to study the effect of measurement
error on the estimated logistic regression model coefficients and
to propose methods of producing consistent estimates of these regression coefficients when error in the independent variable is present.
1.1 The Error-in-Variables Problem in Linear Regression
1.1.1 Introduction

There are several possible situations out of which the finite
set of independent pairs of observations (X₁,Y₁), (X₂,Y₂), ...,
(Xₙ,Yₙ) might arise. In each of the situations that will be
discussed, there is a corresponding set of true values that will be
denoted as (U₁,V₁), (U₂,V₂), ..., (Uₙ,Vₙ). In the regression
situation the following linear relationship between the true
variables U and V holds:

    V = α + βU + e,    E(V|U) = α + βU

where V is a random variable, U may be fixed or random, and e has
mean 0 and variance σ_e². In the particular case where U and V are
bivariate normal, this is the correlation model and E(U|V) = γ + δV.
If V and U are linearly related as follows:

    V = α + βU

and if both V and U are fixed, the relationship between the
variables is called functional. If the above relationship holds,
but V and U are random variables, we have a structural relationship
between V and U. If we fail to observe the true variables U and V
in any of the above situations, and instead observe X = U + ε and
Y = V + η, we have what is known as the error-in-variables problem,
or measurement error. For the remainder of this study we shall
concern ourselves with the error-in-variables problem when
the relationship between the true unobserved variables is structural.
Suppose that only V is subject to error. Then V = α + βU becomes

    Y − η = α + βU
    Y = α + βU + η

where E(η) = 0, η has variance σ_η² and is independent of U. Thus
the relationship between the observed values is a regression type
relationship with

    E(Y|U) = α + βU.

The ordinary least squares (OLS) estimators are, in this case, both
consistent and unbiased.
If V is not subject to error, but U is, then

    V = α + β(X − ε)
    X = −α/β + (1/β)V + ε
      = α* + β*V + ε
    E(X|V) = α* + β*V

where E(ε) = 0, ε has variance σ_ε² and is independent of V. Again,
we have a regression type relation, with the OLS estimators being
consistent estimators of α* and β* (β = 1/β*, α = −α*/β*).
If both U and V are subject to error the problem becomes more
complicated. The relationship between the observed variables is

    Y = α + βX + η − βε.

The error term is now η − βε, and since X = U + ε it is no longer
independent of X. The OLS technique in this situation leads to
biased and inconsistent estimates of the regression parameters.
That is, if b is the OLS estimator of β, then

    plim b = β σ_U²/(σ_U² + σ_ε²)

where σ_U² is the variance of U.
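The attenuation limit above can be checked directly by simulation. The sketch below is illustrative only; all parameter values (β = 2, σ_U² = σ_ε² = 1, so the OLS slope should tend to β/2) are assumptions chosen for this example.

```python
import numpy as np

# Simulation sketch of plim b = beta * sigma_u^2 / (sigma_u^2 + sigma_eps^2).
rng = np.random.default_rng(0)
n = 200_000
alpha, beta = 1.0, 2.0
sigma_u, sigma_eps, sigma_eta = 1.0, 1.0, 0.5

u = rng.normal(0.0, sigma_u, n)                        # true predictor U
x = u + rng.normal(0.0, sigma_eps, n)                  # observed X = U + eps
y = alpha + beta * u + rng.normal(0.0, sigma_eta, n)   # observed Y = V + eta

# OLS slope of Y on X
b = np.cov(x, y, bias=True)[0, 1] / np.var(x)

# With sigma_u^2 = sigma_eps^2 = 1 the theoretical limit is beta/2 = 1.0
print(b)
```

With these values the simulated slope sits near 1.0, well below the true β = 2, illustrating that the bias is toward the null.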
In another model proposed by Berkson (1950), observations are
made on V for a given X. Here again X = U + ε and Y = V + η. We
are no longer trying to observe a given U; rather, for each fixed X
there are a number of U's which could have given rise to that
particular observed X. Also, for each U there is a probability that
the observed fixed X is an observation on that U with error ε. The
U's are now random variables distributed about the fixed X with
error ε. Now ε is independent of X (but not of U), and

    Y = α + βX + η − βε
    E(Y|X) = α + βX

where X is fixed, and we again have a situation in which the OLS
estimator of β is unbiased and consistent.
1.1.2 Classical Approaches to the Error-in-Variables Problem
A number of approaches have been proposed for dealing with the
error-in-variables problem under the structural model. We will
begin with the classical approaches discussed by Moran (1971) and
Madansky (1959).
Let the relationship between the unobserved true variables be

    Vᵢ = α + βUᵢ.

Assume that:

1. ηᵢ and εᵢ are normally and independently distributed with
   means equal to 0 and variances σ_η² and σ_ε² respectively;
2. Uᵢ is normally distributed with mean μ and variance σ_U²;
3. Uᵢ, ηᵢ, and εᵢ are mutually independent.

Then the observed variables X and Y are jointly distributed in a
bivariate normal distribution with parameters μ, σ_U², σ_ε², σ_η²,
α, and β, where

    E(X) = μ
    Var(X) = σ_U² + σ_ε²
    E(Y) = α + βμ                                        (1.1.2.1)
    Var(Y) = β²σ_U² + σ_η²
    Cov(X,Y) = βσ_U².

The following quantities are jointly sufficient for the equations
in (1.1.2.1):

    X̄ = Σᵢ Xᵢ/n
    Ȳ = Σᵢ Yᵢ/n
    S_xx = Σᵢ (Xᵢ − X̄)²/n                                (1.1.2.2)
    S_yy = Σᵢ (Yᵢ − Ȳ)²/n
    S_xy = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ)/n.

The maximum likelihood equations are derived by setting the
quantities in (1.1.2.1) equal to those in (1.1.2.2), if we can
assume that the latter five quantities are functionally independent.
The fact that we have five equations and six unknowns makes the
parameters σ_U², σ_ε², σ_η², α, and β unidentifiable. In order to
avoid this problem we need to change some of our assumptions or
obtain additional information. In particular, if we knew σ_η²,
σ_ε², or σ_η²/σ_ε², and were sure that cov(ε,η) = 0, we could
estimate β and subsequently α. Both Moran (1971) and Madansky
(1959) give the following estimates of β when additional
information is available.

1. σ_η² known:

       β̂₁ = (S_yy − σ_η²)/S_xy

2. σ_ε² known:

       β̂₂ = S_xy/(S_xx − σ_ε²)

3. λ = σ_η²/σ_ε² known:

       β̂₃ = [S_yy − λS_xx + √((S_yy − λS_xx)² + 4λS_xy²)] / (2S_xy)

4. When both σ_ε² and σ_η² are known, we are confronted with an
   over-identification situation. We now have four parameters
   and five equations. As a result we arrive at inconsistencies;
   that is, β̂₁ and β̂₂ are both derived from the same system of
   equations. In large samples these two estimates will become
   asymptotically equivalent. Because of the inconsistencies, a
   maximum likelihood solution cannot be arrived at by equating
   (1.1.2.1) to (1.1.2.2). Kiefer (1964) points out that under
   these circumstances the proper procedure is to write out the
   likelihood and maximize it with respect to the four unknown
   parameters α, β, μ, and σ_U². Barnett (1967) gives the
   resulting equations.

5. The previous estimators are derived upon the assumption that
   cov(ε,η) = 0. If cov(ε,η) ≠ 0, then cov(X,Y) = βσ_U² +
   cov(ε,η). Therefore, if both σ_ε² and σ_η² are known and we
   assume that cov(ε,η) is not equal to 0, we have five unknowns
   and five equations. The estimate of β is

       β̂₅ = sign(S_xy) √[(S_yy − σ_η²)/(S_xx − σ_ε²)].
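The known-variance corrections in items 1 and 2 can be sketched numerically. The simulation below is illustrative; the "known" error variances and all other parameter values are assumptions of the sketch.

```python
import numpy as np

# Moment-corrected estimates when an error variance is known:
#   beta1 = (S_yy - sig2_eta) / S_xy   (sigma_eta^2 known)
#   beta2 = S_xy / (S_xx - sig2_eps)   (sigma_eps^2 known)
rng = np.random.default_rng(1)
n = 100_000
alpha, beta = 1.0, 2.0
sig2_u, sig2_eps, sig2_eta = 1.0, 0.25, 0.25   # assumed values

u = rng.normal(0.0, np.sqrt(sig2_u), n)
x = u + rng.normal(0.0, np.sqrt(sig2_eps), n)
y = alpha + beta * u + rng.normal(0.0, np.sqrt(sig2_eta), n)

s_xx = np.var(x)
s_yy = np.var(y)
s_xy = np.cov(x, y, bias=True)[0, 1]

beta1 = (s_yy - sig2_eta) / s_xy
beta2 = s_xy / (s_xx - sig2_eps)
print(beta1, beta2)
```

Both corrected estimates recover the assumed β = 2, while the uncorrected OLS slope S_xy/S_xx would be attenuated to about 1.6 here.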
1.1.3 Alternatives to the Classical Approaches
1.1.3.1 The Method of Grouping
Several methods of solving the error-in-variables problem when
no additional information about variances is available have been
proposed. These methods do not require any distributional
assumptions.

One such method is known as the method of grouping. The method
consists of ordering the observed pairs (Xᵢ,Yᵢ), selecting
proportions p₁ and p₂ such that p₁ + p₂ ≤ 1, placing the first np₁
pairs in group G₁ and the last np₂ pairs in another group G₃, and
discarding G₂, the middle group of observations (if p₁ + p₂ < 1).
The estimate of β is then

    β̂₆ = (Ȳ₁ − Ȳ₃)/(X̄₁ − X̄₃) = b₂/b₁.

Wald (1940) states that β̂₆ is a consistent estimate of β if:

1. the grouping is independent of the errors εᵢ and ηᵢ;
2. as n → ∞, b₁ does not approach 0, namely lim inf |X̄₁ − X̄₃| > 0.

In order to determine when conditions (1) and (2) are satisfied we
consider several possible procedures for assigning observations to
the groups. It is obvious that if the observations are assigned
to the groups randomly, condition (1) would be satisfied but not
condition (2), since then plim(X̄₁ − X̄₃) = 0.
If the magnitude of the Uᵢ's were known, it would be possible
to rank the observations by their corresponding magnitude of Uᵢ.
This grouping procedure would satisfy both conditions. However, it
is unlikely that information on the magnitude of the Uᵢ's would be
available without the actual values. Another possibility is to
order the observations (Xᵢ,Yᵢ) by the magnitude of the observed
Xᵢ's. This procedure will not guarantee consistency since the
ordering may be dependent on the error. Neyman and Scott (1951)
give the
following necessary and sufficient condition for the consistency of
β̂₆ when the observations are grouped according to the magnitude of
the observed Xᵢ's:

    β̂₆ is a consistent estimate of β if and only
    if the range of U has 'gaps' of 'sufficient
    length' at 'appropriate places' (determined
    by p₁ and p₂) where U has probability zero
    of occurring.

This condition guarantees that, as n → ∞, the misgrouped
observations with respect to U do not contribute to plim β̂₆
tending away from β.
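The Neyman-Scott condition can be seen at work in a small simulation. In the sketch below (all parameter values assumed), grouping is done on the observed X's with p₁ = p₂ = 1/3; because U is normal, its range has no "gaps", and the grouping estimate stays attenuated rather than recovering β.

```python
import numpy as np

# Wald's grouping estimate, grouping on the observed X's.
rng = np.random.default_rng(2)
n = 30_000
alpha, beta = 1.0, 2.0

u = rng.normal(0.0, 1.0, n)
x = u + rng.normal(0.0, 0.5, n)
y = alpha + beta * u + rng.normal(0.0, 0.5, n)

order = np.argsort(x)
g1, g3 = order[: n // 3], order[-(n // 3):]   # lower and upper thirds

beta6 = (y[g3].mean() - y[g1].mean()) / (x[g3].mean() - x[g1].mean())
print(beta6)
```

With these values the estimate lands near β·σ_U²/(σ_U² + σ_ε²) = 1.6 rather than the true β = 2, consistent with the condition failing for a gap-free U.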
1.1.3.2 The Use of Instrumental Variables
A second method of obtaining consistent estimates of β involves
the use of instrumental variables. Suppose that in addition to
having observations on the variables Xᵢ and Yᵢ, we also observe
another variable Zᵢ which is known to be correlated with Uᵢ and Vᵢ,
but is independent of εᵢ and ηᵢ. Then

    β̂₇ = Σᵢ (Zᵢ − Z̄)(Yᵢ − Ȳ) / Σᵢ (Zᵢ − Z̄)(Xᵢ − X̄)

is a consistent estimate of β provided Σᵢ (Zᵢ − Z̄)(Xᵢ − X̄) does not
approach zero as n → ∞ (i.e., cov(Z,X) ≠ 0). It should be noted
that the effectiveness of this method depends on one's ability to
find a variable which is independent of the error and correlated
with U, and on the strength of that correlation. Letting Zᵢ take
the values −1, 0, 1 depending on i, where i is independent of the
error, reduces β̂₇ to β̂₆. Also, Durbin (1954) suggests that Zᵢ = i
is a better instrumental variable if the rank order of the Xᵢ's is
the same as the rank order of the Uᵢ's; that is, it leads to a
more efficient estimate than the method of grouping.
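The instrumental-variable estimate can be sketched as follows. As an assumption of this illustration, the instrument Z is taken to be a second, independently error-prone measurement of U, which makes it correlated with U but independent of ε and η; all parameter values are likewise assumed.

```python
import numpy as np

# Instrumental-variable estimate of beta.
rng = np.random.default_rng(3)
n = 50_000
alpha, beta = 1.0, 2.0

u = rng.normal(0.0, 1.0, n)
z = u + rng.normal(0.0, 1.0, n)   # instrument: correlated with U, independent of the errors
x = u + rng.normal(0.0, 0.7, n)   # error-prone measurement of U
y = alpha + beta * u + rng.normal(0.0, 0.5, n)

beta7 = (np.sum((z - z.mean()) * (y - y.mean()))
         / np.sum((z - z.mean()) * (x - x.mean())))
print(beta7)
```

Unlike the OLS slope of Y on X, which is attenuated here, the IV estimate recovers the assumed β = 2; its variance, however, grows as the instrument's correlation with U weakens.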
1.1.3.3 Replication of Observations

We have seen that in the classical case we can estimate β when
we know σ_η², σ_ε², or σ_η²/σ_ε². This naturally leads to the
consideration of the estimation of β when we can estimate one or
more of these quantities. This is possible if for each (Uᵢ,Vᵢ)
there is more than one corresponding value of (Xᵢ,Yᵢ).

Assume that we have n pairs of values (Uᵢ,Vᵢ) and Nᵢ
observations on each pair. That is,

    Xᵢⱼ = Uᵢ + εᵢⱼ
    Yᵢⱼ = Vᵢ + ηᵢⱼ        j = 1,2,...,Nᵢ;  i = 1,2,...,n.

If the usual assumptions of independence are made, a one-way
analysis of variance can be carried out on the X's and Y's and an
estimate of β computed. Madansky describes the procedure by using
the following ANOVA table.
TABLE 1.2  ANOVA Table for X and Y Data

Source           Mean Square                               Expected Mean Square

Between sets
  I    Σᵢ Nᵢ(x̄ᵢ. − x̄..)²/(n−1)                σ_ε² + [(N² − Σᵢ Nᵢ²)/N(n−1)] σ_U²
  II   Σᵢ Nᵢ(x̄ᵢ. − x̄..)(ȳᵢ. − ȳ..)/(n−1)     cov(ε,η) + [(N² − Σᵢ Nᵢ²)/N(n−1)] βσ_U²
  III  Σᵢ Nᵢ(ȳᵢ. − ȳ..)²/(n−1)                σ_η² + [(N² − Σᵢ Nᵢ²)/N(n−1)] β²σ_U²

Within sets
  IV   Σᵢ Σⱼ (xᵢⱼ − x̄ᵢ.)²/(N−n)               σ_ε²
  V    Σᵢ Σⱼ (xᵢⱼ − x̄ᵢ.)(yᵢⱼ − ȳᵢ.)/(N−n)     cov(ε,η)
  VI   Σᵢ Σⱼ (yᵢⱼ − ȳᵢ.)²/(N−n)               σ_η²

where

    x̄ᵢ. = Σⱼ xᵢⱼ/Nᵢ        x̄.. = Σᵢ Σⱼ xᵢⱼ/N
    ȳᵢ. = Σⱼ yᵢⱼ/Nᵢ        ȳ.. = Σᵢ Σⱼ yᵢⱼ/N
    N = Σᵢ Nᵢ.
Tukey (1951) proposes the following estimates of β:

    β̂₈ = (II − V)/(I − IV)
    β̂₉ = (III − VI)/(II − V)
    β̂₁₀ = √[(III − VI)/(I − IV)].

It is easily seen that these estimates converge to β as n → ∞ and
Nᵢ → ∞ for some i.

Housner and Brennan (1948) also give an estimate of β when there
are multiple measurements on U and V. They consider the expression

    bᵢⱼ,ₖₗ = (yᵢⱼ − yₖₗ)/(xᵢⱼ − xₖₗ).

Since

    yᵢⱼ = α + βxᵢⱼ + ηᵢⱼ − βεᵢⱼ

then

    bᵢⱼ,ₖₗ (xᵢⱼ − xₖₗ) = β(xᵢⱼ − xₖₗ) + (ηᵢⱼ − ηₖₗ) − β(εᵢⱼ − εₖₗ)

so that

    bᵢⱼ,ₖₗ = β + [(ηᵢⱼ − ηₖₗ) − β(εᵢⱼ − εₖₗ)]/(xᵢⱼ − xₖₗ)

for all i, j, k, l and xᵢⱼ not equal to xₖₗ. Summing over all
combinations of points and ignoring the terms involving the error
gives the following estimate:
    β̂₁₁ = [Σᵢ ȳᵢ Nᵢ (Σⱼ₌₁ⁿ Nⱼ − 2 Σⱼ₌₁ⁱ Nⱼ + Nᵢ)] /
          [Σᵢ x̄ᵢ Nᵢ (Σⱼ₌₁ⁿ Nⱼ − 2 Σⱼ₌₁ⁱ Nⱼ + Nᵢ)].

This estimate approaches β in probability as Nᵢ → ∞ for at least
two distinct values of i. Madansky gives the following suggestions
on which estimate (β̂₈, β̂₉, β̂₁₀, or β̂₁₁) to use and when:

1. If the relation is believed to be linear, the optimum
   allocation of observations would be at two points. Hence,
   the Housner-Brennan estimate is preferred.

2. If the underlying structure is not linear, and one is trying
   to approximate some function in a small area of its range by
   a linear function, it may be advisable to increase n, at the
   expense of decreasing Nᵢ, to as little as 2. In this case,
   the Tukey components in regression estimate is better.
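The Tukey components-of-variance estimates can be sketched directly from replicated data. The simulation below assumes equal numbers of replicates Nᵢ = m and illustrative parameter values; it computes the six mean squares of Table 1.2 and the three ratio estimates.

```python
import numpy as np

# Tukey's components-in-regression estimates with equal replicates N_i = m.
rng = np.random.default_rng(4)
n, m = 2_000, 3
alpha, beta = 1.0, 2.0

u = rng.normal(0.0, 1.0, n)
x = u[:, None] + rng.normal(0.0, 0.6, (n, m))                   # X_ij = U_i + eps_ij
y = (alpha + beta * u)[:, None] + rng.normal(0.0, 0.4, (n, m))  # Y_ij = V_i + eta_ij

xb, yb = x.mean(axis=1), y.mean(axis=1)   # set means

# Between-sets (I, II, III) and within-sets (IV, V, VI) mean squares
msI   = m * np.sum((xb - x.mean()) ** 2) / (n - 1)
msII  = m * np.sum((xb - x.mean()) * (yb - y.mean())) / (n - 1)
msIII = m * np.sum((yb - y.mean()) ** 2) / (n - 1)
msIV  = np.sum((x - xb[:, None]) ** 2) / (n * m - n)
msV   = np.sum((x - xb[:, None]) * (y - yb[:, None])) / (n * m - n)
msVI  = np.sum((y - yb[:, None]) ** 2) / (n * m - n)

beta8  = (msII - msV) / (msI - msIV)
beta9  = (msIII - msVI) / (msII - msV)
beta10 = np.sqrt((msIII - msVI) / (msI - msIV))
print(beta8, beta9, beta10)
```

All three ratios recover the assumed β = 2, since the differencing of between- and within-set mean squares cancels the error-variance terms.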
Moran states that none of these methods is optimal and suggests
the use of the sums of squares (if ρ = 0) to estimate σ_ε² and
σ_η², and then the use of these estimates in Barnett's solution for
the case when σ_ε² and σ_η² are known. However, this is still not
the complete maximum likelihood case.

Many of the ideas and results discussed above in reference to
the error-in-variables problem with one independent variable extend
to the multivariate situation. A brief discussion is given by
Moran (1971).
1.2 Use of the Logistic Regression Model in Epidemiologic Research
1.2.1 Introduction

One of the most often used indicators of disease frequency in
epidemiology is the incidence rate. Theoretically this rate
estimates the probability of disease or death for a particular
population over a fixed time period. It is possible to model this
probability as a function of one or more independent variables. The
logistic function is a commonly used model for this purpose.
Although this study mainly involves the univariate logistic
function, we note here that it is in the multivariate case that
this function is most often used. This is because most chronic
diseases are affected by multiple factors simultaneously.

There are two methods used to estimate α and β in the logistic
regression model

    Yᵢ = 1/(1 + e^−(α + βUᵢ)) + ηᵢ.

One, proposed by Truett, Cornfield and Kannel (1967), uses a linear
discriminant analysis approach. The other method, proposed by
Walker and Duncan (1967), uses an iterative procedure to obtain
maximum likelihood estimates of the parameters.
1.2.2 Discriminant Function Approach
Assume that we follow N individuals over a fixed time period.
Also assume that this sample is derived from two populations: D
(those who would develop the disease) and ND (those who would not
develop the disease), the respective sample sizes being n₁ and n₀.
Suppose also that there exists a variable U having a normal
distribution with mean μ₁ in the diseased population and mean μ₀ in
the non-diseased population. Finally, we assume that the variance
of U is the same in each population. Let

    P(D|U)  = probability of disease for an individual
              characterized by U;
    P(ND|U) = probability of not having the disease given U;
    P(D) = p = unconditional probability of disease;
    P(ND) = 1−p = unconditional probability of not having
              the disease;
    f₀(U) = P(U|ND) = probability of U given the individual
              does not have the disease;
    f₁(U) = P(U|D) = probability of U given the individual
              does have the disease.

From Bayes' Theorem:

    P(D|U) = p f₁(U) / [p f₁(U) + (1−p) f₀(U)].

In particular, if the distributions f₀(U) and f₁(U) are univariate
normals with means μ₀ and μ₁ respectively, and with the same
variance σ_U², then

    P(D|U) = 1/(1 + e^−(α + βU))

where

    α = −log[(1−p)/p] − (μ₁ − μ₀)(μ₁ + μ₀)/(2σ_U²)
    β = (μ₁ − μ₀)/σ_U².

The following sample estimates of α and β are derived by inserting
maximum likelihood estimates of μ₁, μ₀, p, and σ_U² into the above
equations:

    α̂ = −log(n₀/n₁) − ½ β̂ (ū₁ + ū₀)
    β̂ = (ū₁ − ū₀)/s_U²

where

    s_U² = [(n₀ − 1)s₀² + (n₁ − 1)s₁²] / (n₁ + n₀ − 2)

and s₀² and s₁² are the sample variances of U given Y = 0 and
Y = 1, respectively. An estimate of risk can be computed for each
individual conditional upon his value of U as

    P̂(D|U) = 1/(1 + e^−(α̂ + β̂U)).
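The discriminant-function estimates can be sketched from two normal samples. The group sizes, means, and common variance below are assumptions of this illustration.

```python
import numpy as np

# Discriminant-function estimates of alpha and beta.
rng = np.random.default_rng(5)
n1, n0 = 400, 1600
mu1, mu0 = 1.0, 0.0

u1 = rng.normal(mu1, 1.0, n1)   # U in the diseased group
u0 = rng.normal(mu0, 1.0, n0)   # U in the non-diseased group

# Pooled sample variance of U
s2 = ((n0 - 1) * u0.var(ddof=1) + (n1 - 1) * u1.var(ddof=1)) / (n1 + n0 - 2)
beta_hat = (u1.mean() - u0.mean()) / s2
alpha_hat = -np.log(n0 / n1) - 0.5 * beta_hat * (u1.mean() + u0.mean())

def risk(u):
    # Estimated P(D|U) for an individual with value u
    return 1.0 / (1.0 + np.exp(-(alpha_hat + beta_hat * u)))

print(beta_hat, risk(1.0))
```

With unit common variance, the population slope is (μ₁ − μ₀)/σ_U² = 1, and the fitted β̂ lands near it; the `risk` function then gives each individual's estimated probability of disease.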
1.2.3 Weighted Least Squares/Maximum Likelihood Approach
Walker and Duncan (1967) assume that the logistic function is
an appropriate model for the probability that an individual will
develop a particular disease conditional upon the risk factor U.
They proceed from that assumption to use weighted least squares
estimation to obtain estimates of the coefficients of the model,
where

    E(Yᵢ|Uᵢ) = 1/(1 + e^−(α + βUᵢ)) = Pᵢ

is the probability that the ith individual in the sample acquires
the disease within the follow-up period given Uᵢ.

They approximate the function f(Uᵢ,α,β) using a first order
Taylor Series Expansion around initial values α₀ and β₀ as

    f(Uᵢ,α,β) ≈ f(Uᵢ,α₀,β₀) + [∂f(Uᵢ,α₀,β₀)/∂α](α − α₀)
                + [∂f(Uᵢ,α₀,β₀)/∂β](β − β₀).

After rescaling, Yᵢ is approximated as

    Y* ≈ U*(θ − θ₀) + η*.

The iterative weighted least squares estimate of θ = (α,β) is then

    θ̂ᵣ₊₁ = θ̂ᵣ + (U′WᵣU)⁻¹ U′Wᵣ Yᵣ*

where U is an n×2 matrix having as its ith row (1,Uᵢ), Wᵣ is a
diagonal weight matrix determined as the inverse of the variance
matrix of η*, and Y* is an n×1 vector of rescaled observations.
The previous results are identical to those obtained by using
maximum likelihood equations. Given a sample of n individuals free
of disease who are followed for a fixed period of time and
identified at the end of the period as having developed the disease
or not, the probability of the observed outcome given the variable
Uᵢ is

    P(Yᵢ|Uᵢ) = Pᵢ^Yᵢ (1 − Pᵢ)^(1−Yᵢ)

where Yᵢ = 1 if the ith individual has the disease and Yᵢ = 0
otherwise. The likelihood of the sample of n individuals is given
by

    L(α,β) = Πᵢ Pᵢ^Yᵢ (1 − Pᵢ)^(1−Yᵢ).

The likelihood equations are then

    Σᵢ Yᵢ − Σᵢ Pᵢ = 0
    Σᵢ YᵢUᵢ − Σᵢ PᵢUᵢ = 0.

Because of the nonlinearity of the above equations, this method
requires the use of an iterative computing technique. The most
often used iterative method is the Newton-Raphson technique, which
is based on the first order Taylor Series Expansion of

    T(θ) = d log L(θ)/dθ,    θ = (α,β):

    T(θ) = T(θ₀) + T′(θ₀)(θ − θ₀) + T″(θ₀)(θ − θ₀)²/2! + ...
Truncating after the linear term,

    T(θ) ≈ T(θ₀) + T′(θ₀)(θ − θ₀).

Setting T(θ) = 0 and solving for θ gives us the MLE of θ. In
practice the initial estimates are the discriminant function
estimates. The right side gives a new trial value for which the
process is repeated until successive θ estimates agree to a
specified extent and T(θ) = 0 at convergence. This method works
well if T(θ) is stable over a range of values near θ (i.e., if the
likelihood function is close to normal in shape). Asymptotic
likelihood theory guarantees this for large samples. The method
fails if the likelihood is multimodal.
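The Newton-Raphson iteration for the logistic likelihood equations can be sketched as follows. For simplicity this sketch starts from zero initial values rather than the discriminant function estimates, and the true parameter values used to simulate the data are assumptions.

```python
import numpy as np

# Newton-Raphson solution of the logistic likelihood equations.
rng = np.random.default_rng(6)
n = 5_000
alpha, beta = -1.0, 1.5   # assumed true values for the simulation

u = rng.normal(0.0, 1.0, n)
p = 1.0 / (1.0 + np.exp(-(alpha + beta * u)))
yobs = (rng.uniform(size=n) < p).astype(float)

U = np.column_stack([np.ones(n), u])   # n x 2 matrix with ith row (1, U_i)
theta = np.zeros(2)                    # starting values (alpha_0, beta_0)

for _ in range(25):
    pi = 1.0 / (1.0 + np.exp(-U @ theta))
    score = U.T @ (yobs - pi)                       # T(theta): likelihood equations
    hess = -(U * (pi * (1.0 - pi))[:, None]).T @ U  # T'(theta)
    step = np.linalg.solve(hess, score)
    theta = theta - step
    if np.max(np.abs(step)) < 1e-10:                # successive estimates agree
        break

print(theta)
```

The two components of `score` are exactly the likelihood equations Σ(Yᵢ − Pᵢ) = 0 and Σ(Yᵢ − Pᵢ)Uᵢ = 0, and the loop stops when successive θ estimates agree to the stated tolerance.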
1.2.4 Comparison of the Approaches
One of the major drawbacks of the discriminant function approach
is its requirement that U be normally distributed. This requirement
is rarely satisfied, even approximately, although in some cases an
appropriate transformation can be made to normalize the variable.
Truett, Cornfield and Kannel's results do, however, show that
despite extreme non-normality of U the agreement between
observation and expectation is quite good.

The Walker-Duncan weighted least squares approach makes no such
assumptions about the distribution of the independent variable.
Also, this approach forces the total number of expected cases to
equal the total number of observed cases. This is a desirable
property for any smoothing procedure. As stated previously, this
procedure is iterative and therefore computationally more difficult
than the discriminant function approach.

Halperin, Blackwelder and Verter (1970) prefer the weighted
least squares approach on theoretical grounds since it does not
assume any particular distribution of U and it gives results which
asymptotically converge to the proper value if the logistic model
holds.
1.3 Outline of the Research
The purpose of this research is to study the effect of
measurement error in the independent variable on the estimated
logistic regression model coefficients and to propose methods of
producing consistent estimates of the coefficients. The main focus
of this study is the univariate logistic regression model:

    Yᵢ = 1/(1 + e^−(α + βUᵢ)) + ηᵢ.

In Chapter II the bias in the estimated coefficients of the
univariate logistic model is described for several different
distributions of the unobserved independent variable, Uᵢ. Also, a
description of the bias is given for the multivariate logistic
regression model with two independent variables, one measured
without error and the other measured with error.

In Chapter III we investigate as possible solutions to the
error-in-variables problem under the logistic regression model: an
adaptation of the method of grouping, the use of instrumental
variables, and solutions when certain variances are known. The
criteria for comparison between these methods and the iterative
weighted least squares method are the simulated bias and MSE of the
estimated coefficient.

In Chapter IV the use of multiple measures of the independent
variable is investigated. A comparison is made between the use of
one measurement, the mean of m measurements, and the James-Stein
estimate of the independent variable as the independent variable in
the model. Again the criteria for comparison are the simulated bias
and MSE.

Finally, in Chapter V the methods discussed are applied to a
data set from the Lipid Research Clinics Program Prevalence Study.
That data set contains observations on 4171 men and women. The
variables of interest are cholesterol level, triglyceride level,
and age, with the outcome variable being mortality status at 5.7
years after the measurements are taken. Other variables brought
into the study and used as instrumental variables are HDL
cholesterol level and Quetelet index (weight/height²).

A summary of the results and recommendations for further
research are given in Chapter VI.
CHAPTER II
DESCRIPTION OF THE BIAS DUE TO MEASUREMENT ERROR
Before investigating possible solutions to the error-in-variables
problem under the logistic regression model, it is important to make
two determinations. We must first determine whether there is a bias
under this model. Secondly, we must determine the direction and
magnitude of the bias given that it exists. If there is no bias, or
if the bias is always small and/or away from the null, there would
be little need for this study. It is also of interest to describe
the bias for different distributions of the independent variable and
to determine whether the distribution of the independent variable
has an effect on the bias. Therefore, in this chapter the bias is
described for the following cases:

1. Simple linear regression;
2. Multiple linear regression (two independent variables);
3. Simple logistic regression where the independent variable is
   a) conditionally normal;
   b) conditionally exponential;
   c) unconditionally normal;
   d) unconditionally exponential.
4. Multiple logistic regression where the independent variables are
   a) conditionally bivariate normal;
   b) unconditionally bivariate normal.
2.1 A Review of the Simple Linear and Multiple Linear Regression Cases
2.1.1 Simple Linear Regression Case
It was stated previously that in the simple linear error-in-variables problem the OLS estimate of β_u is biased. That is, if β̂_x is the OLS estimate based on the observed values, then

   E(β̂_x) ≈ β_u σ²_u/(σ²_u + σ²_ε)

and the bias is

   E(β̂_x) - β_u ≈ -β_u σ²_ε/(σ²_u + σ²_ε).

Proof:

Assume that the true relationship between V and U is

   V = α_u + β_u U

but we observe

   X = U + ε
   Y = V + η

and

   E(ε) = E(η) = E(Uε) = E(ηV) = E(Vε) = E(Uη) = E(εη) = 0,
   var(ε) = σ²_ε,  var(U) = σ²_u;

then

   β_x = cov(X,Y)/var(X) = cov(U,V)/(σ²_u + σ²_ε) = [σ²_u/(σ²_u + σ²_ε)]·[cov(U,V)/σ²_u] = β_u σ²_u/(σ²_u + σ²_ε).

It is known from OLS theory that β̂_x is consistent for β_x. This implies that β̂_x is consistent for β_u σ²_u/(σ²_u + σ²_ε) and not for β_u. The bias is always towards the null, and if σ²_ε is small relative to σ²_u, the bias is small.
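The attenuation result above can be checked numerically. The sketch below is illustrative and not part of the original study: the helper name ols_slope and all parameter values are assumptions, chosen so that the theoretical attenuation factor σ²_u/(σ²_u + σ²_ε) equals 0.5.

```python
import math
import random

def ols_slope(x, y):
    """Ordinary least squares slope: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(1)
n = 20000
beta_u, sigma2_u, sigma2_eps = 1.0, 1.0, 1.0   # illustrative values

u = [random.gauss(0, math.sqrt(sigma2_u)) for _ in range(n)]
v = [beta_u * ui + random.gauss(0, 0.2) for ui in u]           # V = beta_u * U + noise
x = [ui + random.gauss(0, math.sqrt(sigma2_eps)) for ui in u]  # X = U + eps

# theoretical attenuation factor: sigma2_u / (sigma2_u + sigma2_eps) = 0.5
print(ols_slope(u, v))   # close to beta_u = 1
print(ols_slope(x, v))   # close to 0.5, i.e. biased towards the null
```

The regression on the error-free U recovers β_u, while the regression on the observed X is attenuated by the predicted factor.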
2.1.2 Multiple Linear Regression Case

Cochran (1968) examines the bias when there are two independent variables in the model as follows. Suppose the true unobservable relationship is

   V = α + β_u₁U₁ + β_u₂U₂

but the following variables are observed:

   X₁ = U₁ + ε₁
   X₂ = U₂ + ε₂
   Y  = V + η

where U₁, U₂, ε₁ and ε₂ have variances σ²_u₁, σ²_u₂, σ²_ε₁ and σ²_ε₂, respectively, and are uncorrelated. The coefficients β_u₁ and β_u₂ are therefore estimated based on the observable relationship

   Y = α_x + β_x₁X₁ + β_x₂X₂ + η.

Let the variance-covariance matrices for the U's and X's be

   Σ_U = [ σ²_u₁        ρσ_u₁σ_u₂ ]
         [ ρσ_u₁σ_u₂    σ²_u₂     ]

   Σ_X = Σ_U + diag(σ²_ε₁, σ²_ε₂).

Then, if β̂_x₁ and β̂_x₂ are the OLS estimates of β_u₁ and β_u₂,

   E(β̂_x) ≈ Σ_X⁻¹ Σ_U β_u ,

where β_x = (β_x₁, β_x₂)′ and β_u = (β_u₁, β_u₂)′.

If the correlation between U₁ and U₂ is zero (i.e., ρ = 0), then

   β_xᵢ = β_uᵢ σ²_uᵢ/(σ²_uᵢ + σ²_εᵢ),  i = 1, 2,

which is identical to the result in the univariate case.

If only one variable is subject to measurement error, for example U₂, then even though U₁ is measured without error there is a bias in the estimation of β_u₁. This bias may be negative or positive, depending on the signs of β_u₁, β_u₂ and ρ. It is possible to produce situations in which β_u₁ < β_u₂ but E(β̂_x₁) > E(β̂_x₂). In this case

   β_x₂ = β_u₂ σ²_u₂(1-ρ²)/[σ²_u₂(1-ρ²) + σ²_ε₂] ,

and since this attenuation factor is smaller than the univariate factor σ²_u₂/(σ²_u₂ + σ²_ε₂), we observe a larger bias in β_x₂ for the multiple regression case than for the simple linear regression case. Also, as ρ increases to 1, the bias in the multiple linear regression case increases.
2.2 Mathematical Descriptions of the Bias Under the Logistic
Regression Model
In Chapter I we discussed three methods of estimating the coefficient β in the logistic regression model. The discriminant function approach assumes that U is conditionally normal and yields an estimator of β_x that can be used to describe the bias in estimating β_u. The iterative weighted least squares and the maximum likelihood methods make no distributional assumptions about U, but do not produce a closed-form solution for β, thus making a mathematical description of the bias difficult to obtain. In this section the bias is described mathematically with the independent variable conditionally distributed.
Assume that U|Y=1 ~ N(μ₁, σ²_u) and U|Y=0 ~ N(μ₀, σ²_u); then from Chapter I

   β_u = (μ₁ - μ₀)/σ²_u

and

   α_u = -ln[(1-p)/p] - (μ₁ - μ₀)(μ₁ + μ₀)/(2σ²_u).

Under the error-in-variables model

   X = U + ε ,

with the outcome Y observed without error, where ε ~ N(0, σ²_ε) and is independent of U and Y,

   var(X|Y) = var(U|Y) + var(ε) = σ²_u + σ²_ε .

Therefore

   β_x = (μ₁ - μ₀)/(σ²_u + σ²_ε) = β_u σ²_u/(σ²_u + σ²_ε).
These results are identical to those obtained in the simple linear case. We conclude that when the true independent variable comes from a mixture of two normals with equal variance and measurement error exists in the independent variable, the discriminant function estimate of the coefficients using the observed data is not unbiased. The bias is always towards the null and depends upon the magnitude of σ²_ε relative to that of σ²_u.
Next we investigate the bias when U|Y=1 ~ Exponential(λ₁) and U|Y=0 ~ Exponential(λ₀), where λ denotes the mean of the exponential distribution. In this case

   P(Y=1|U) = 1/(1 + exp{-[-ln(λ₁(1-p)/(λ₀p)) + ((λ₁-λ₀)/(λ₀λ₁))U]}) = 1/(1 + e^{-(α_u + β_u U)})

where

   α_u = -ln[λ₁(1-p)/(λ₀p)] ,  β_u = (λ₁ - λ₀)/(λ₀λ₁).

If we observe X = U + ε, where ε ~ N(0, σ²_ε) and ε is independent of U, then

   var(X|Y=0) = var(U|Y=0) + var(ε)
   var(X|Y=1) = var(U|Y=1) + var(ε).

As a result, each conditional variance of the observed variable is inflated by σ²_ε, and the coefficient based on the observed data is attenuated. We conclude in this case that there is a bias in the estimation of β_u when measurement error is present and the independent variable is conditionally exponential. Again the bias is towards the null.
We conclude this section by considering the bias in the multiple logistic model when (U₁, U₂) is conditionally multivariate normal. Assume that

   U|Y=0 ~ MN(μ₀, Σ_U)
   U|Y=1 ~ MN(μ₁, Σ_U)

where U = (U₁, U₂)′, μ₀ = (μ₁₀, μ₂₀)′ and μ₁ = (μ₁₁, μ₂₁)′. If we observe X = U + ε, where ε is independent of U and Y with covariance matrix Σ_ε, then

   X|Y=0 ~ MN(μ₀, Σ_X)
   X|Y=1 ~ MN(μ₁, Σ_X)

where Σ_X = Σ_U + Σ_ε. The expected value of the estimator of β_u that we would obtain by using the discriminant function approach on the observed X's is

   E(β̂_x) ≈ Σ_X⁻¹ Σ_U β_u .

This result is also identical to the multiple linear regression case, and the conclusions made in that case also apply here.
2.3 Description of the Bias Under the Logistic Regression Model
Using Simulation
In this section we study the bias when the iterative weighted
least squares or maximum likelihood methods are used and no distributional assumptions about the independent variable are necessary.
2.3.1 Description of the Simulation Procedure

General Model:

   Y_i|U_i is Bernoulli(P_i), where

   P_i = P(Y_i=1|U_i) = 1/(1 + e^{-(α + βU_i)}) ;

that is, Y_i|U_i = 1 with probability P_i and 0 with probability 1 - P_i.

We begin the simulation by generating values of U_i under the following distributions:

1. U_i|Y_i ~ Normal
2. U_i|Y_i ~ Exponential
3. U_i ~ Normal
4. U_i ~ Exponential

Next, the error values, ε_i, are generated from a N(0,1) distribution. The observed values, X_i, are then computed as

   X_i = U_i + ε_i .

After assigning values to α and β, values of the dependent variable Y_i are derived by computing

   P_i = 1/(1 + e^{-(α + βU_i)})

and letting Y_i = 1 if ξ_i < P_i and Y_i = 0 otherwise, where ξ_i ~ Uniform(0,1). For each sample of values (Y_i, X_i), i = 1, 2, ..., 500, the SAS procedure LOGIST was used to produce the iterative weighted least squares estimate of β_u. The bias was then computed as β̂ - β_u.

This procedure was followed for 150 samples of size 500. This resulted in 150 estimates of β_u. The mean and standard deviation of these estimates were computed along with 95% confidence intervals.
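The simulation procedure above can be sketched in modern code. The sketch below is illustrative, not the original SAS program: the Newton-Raphson routine fit_logistic stands in for the SAS LOGIST procedure (both compute the iterative weighted least squares fit), and the number of replications is reduced from 150 to 50 to keep the run short.

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Newton-Raphson (equivalently, iterative weighted least squares) fit of
    P(Y=1|x) = 1 / (1 + exp(-(a + b*x))); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w; h01 += w * xi; h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

random.seed(2)
alpha, beta_u, n_obs = 1.0, 1.0, 500
biases = []
for _ in range(50):                                   # the study used 150 samples
    u = [random.gauss(0, 1) for _ in range(n_obs)]    # U ~ N(0,1)
    x = [ui + random.gauss(0, 1) for ui in u]         # X = U + eps, eps ~ N(0,1)
    y = [1 if random.random() < 1 / (1 + math.exp(-(alpha + beta_u * ui))) else 0
         for ui in u]
    biases.append(fit_logistic(x, y)[1] - beta_u)     # fit on the error-laden X

print(sum(biases) / len(biases))   # roughly -0.5; Table 2.1 reports -0.5377
```

Fitting the logistic model to the error-laden X rather than the true U reproduces the bias towards the null described in the text.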
2.3.2 Simple Logistic Regression Case

Simulations in which the independent variable is conditionally distributed were done as a check on the simulation procedure, since we know mathematically what the results should be. The results are summarized in Table 2.1. We selected 150 samples of size 500 from a mixture of two normal populations. The conditional normal distributions were

   U|Y=1 ~ N(2,1)
   U|Y=0 ~ N(1,1)

with p = P(Y=1) = 1/2 and β_u = (μ₁ - μ₀)/σ²_u = 1. If the simulation is performing as expected, then

   E(β̂_x) ≈ β_u σ²_u/(σ²_u + σ²_ε) = 0.5.

Our results are good, with E(β̂_x) = 0.5109.

Next we checked the simulation procedure when U was conditionally exponential. We let p = 1/2, λ₀ = 1, λ₁ = 2 and β_u = (λ₁ - λ₀)/(λ₁λ₀) = 0.5. Again the results are good, with E(β̂_x) = 0.3201. Based on these results, we feel confident that the simulations are performing as desired.

The simulations show that when U_i is distributed as a N(0,1) variable, the estimated bias is -0.5377 when β_u = 1 and 0.5338 when β_u = -1. The bias for this case is towards the null and slightly greater in magnitude than the bias that results when U_i comes from a mixture of two normals. When U_i is exponential (λ = 1) and β_u = 1, the bias is -0.6825. This is much larger than the bias in the normal case.
2.3.3 Multiple Logistic Regression Case

We now investigate the bias under a multiple logistic regression model. In particular we assume that (U₁, U₂) come from a multivariate normal distribution with correlation ρ, means 0 and variances 1. U₂ is measured with error and U₁ is measured without error. The simulation is performed with ρ = 0.25 and then with ρ = 0.75. The parameters α, β_u₁ and β_u₂ are all assigned the value 1.

The results show that there is a bias in the estimation of β_u₁ although U₁ is measured without error. (See Table 2.1.) Also, the bias in estimating β_u₂ is greater than the bias that results in the simple logistic case. We know from the linear regression case that

   E(β̂_x)_MULT = f · E(β̂_x)_SIMPLE

where f = (1-ρ²)/[1 - ρ²σ²_u/(σ²_u + σ²_ε)]. We find approximately the same relationship,

   E(β̂_x)_MULT ≈ f · E(β̂_x)_SIMPLE ,

for the logistic case in the following table:

   ρ      f        E(β̂_x)_MULT    f·E(β̂_x)_SIMPLE
   0.25   0.9677   0.4392         0.4474
   0.75   0.6087   0.2755         0.2814

The simulations also show that as ρ increases, the bias increases.
TABLE 2.1 Estimates of the Bias Along With 95% Confidence Intervals

Model                                E(β̂_x)    Bias      95% CI on the Bias
U|Y Normal,       β_u = 1            0.5109    -0.4891   (-0.4993, -0.4789)
U|Y Exponential,  β_u = 0.5          0.3201    -0.1799   (-0.1881, -0.1717)
U Normal,         β_u = 1            0.4623    -0.5377   (-0.5503, -0.5251)
U Exponential,    β_u = 1            0.3175    -0.6825   (-0.6950, -0.6700)
U Normal,         β_u = -1          -0.4662     0.5338   ( 0.5208,  0.5468)
(U₁,U₂) Bivariate Normal, ρ = 0.25:
                  β_u₁ = 1           1.0631     0.0631   ( 0.0407,  0.0855)
                  β_u₂ = 1           0.4392    -0.5608   (-0.5743, -0.5473)
(U₁,U₂) Bivariate Normal, ρ = 0.75:
                  β_u₁ = 1           1.4868     0.4868   ( 0.4576,  0.5160)
                  β_u₂ = 1           0.2755    -0.7245   (-0.7390, -0.7101)
CHAPTER III
ALTERNATIVE ESTIMATES OF β WHEN MEASUREMENT ERROR
IS PRESENT IN THE SIMPLE LOGISTIC REGRESSION MODEL
3.1 Modified Discriminant Function Estimator
Recall that the discriminant function estimator of the coefficients in the logistic model is appropriate when one can assume that the conditional distribution of the independent variable, U given Y, is normal. That is, if

   U|Y=0 ~ N(μ₀, σ²_u)
   U|Y=1 ~ N(μ₁, σ²_u)

then

   P(Y=1|U) = 1/(1 + e^{-(α + βU)})                                  (3.1.1)

where

   β = (μ₁ - μ₀)/σ²_u ,  α = ln[p/(1-p)] - (μ₁ - μ₀)(μ₁ + μ₀)/(2σ²_u).

Suppose measurement error is present; that is, we observe X = U + ε, where U and ε are independent with variances σ²_u and σ²_ε, respectively. Since the variance of X is σ²_x = σ²_u + σ²_ε, by substituting σ²_x - σ²_ε for σ²_u in equation (3.1.1), β can be rewritten as

   β = (μ₁ - μ₀)/(σ²_x - σ²_ε).                                      (3.1.2)

If σ²_ε is known, α and β can be estimated by replacing μ₁, μ₀, σ²_x and P(Y=1) in equation (3.1.2) with their maximum likelihood estimators: X̄₁, X̄₀, S² and p̂ = n₁/n. The resulting modified discriminant function estimator is

   β̂_MD = (X̄₁ - X̄₀)/(S² - σ²_ε)

where S² is the pooled sample variance computed from S₁² and S₀², the sample variances of X_i given Y_i=1 and X_i given Y_i=0, respectively. Since X̄₁, X̄₀, S² and p̂ are maximum likelihood estimators of μ₁, μ₀, σ²_x and P(Y=1), respectively, they are consistent estimators, and therefore α̂_MD and β̂_MD are consistent estimators of α and β. Even if the assumption of conditional normality is not met, the estimator in (3.1.2) may produce better estimates of the parameters than the iterative weighted least squares estimator of Walker and Duncan. These estimators will be compared in section 3.5.
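The modified discriminant function estimator is simple to compute. The sketch below is an illustrative modern implementation, not the original program; the helper name beta_md and the pooled-variance divisor n₁+n₀-2 are assumptions, and the check uses the same mixture as section 2.3.2 with σ²_ε = 1.

```python
import math
import random

def beta_md(x, y, sigma2_eps):
    """Modified discriminant function estimate when var(eps) is known:
    beta_MD = (xbar1 - xbar0) / (pooled S^2 - sigma2_eps)."""
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    m1, m0 = sum(x1) / len(x1), sum(x0) / len(x0)
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss0 = sum((v - m0) ** 2 for v in x0)
    s2 = (ss1 + ss0) / (len(x1) + len(x0) - 2)   # pooled within-group variance of X
    return (m1 - m0) / (s2 - sigma2_eps)         # subtracting sigma2_eps recovers var(U)

# data from the mixture used in section 2.3.2: U|Y=1 ~ N(2,1), U|Y=0 ~ N(1,1)
random.seed(3)
y = [1 if random.random() < 0.5 else 0 for _ in range(20000)]
u = [random.gauss(2 if yi else 1, 1) for yi in y]
x = [ui + random.gauss(0, 1) for ui in u]        # measurement error, sigma2_eps = 1

print(beta_md(x, y, 1.0))   # close to beta_u = (mu1 - mu0)/var(U) = 1
```

Because the known σ²_ε is subtracted from the pooled variance of X, the estimator targets β_u rather than the attenuated β_x.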
3.2 An Estimator Based on the Method of Grouping
One means of obtaining a consistent estimator of β when measurement error occurs in simple linear regression is the method of grouping. This suggests that the method might also produce less biased estimators in the case of simple logistic regression.
3.2.1 The Method

Suppose one has available a grouping criterion, Z, that is correlated with the true unobserved independent variable, U, but is not correlated with the error term, ε. The method consists of ranking the observations based upon the magnitude of their corresponding Z values. Those with the smallest n₁ Z values are classified into group 1. The middle n₂ values are classified into group 2 and discarded. Those observations with the largest n₃ Z values are classified into group 3. The group sizes are n₁ = n₂ = n₃ = n/3. An ideal situation would be to have a grouping criterion that would correctly classify each observed X_i value into the same group to which its corresponding U_i value belongs. The following quantities are computed for groups 1 and 3:

a. X̄_g = Σ_{i∈g} X_i/n_g ,  g = 1, 3
b. P̂_g = Σ_{i∈g} Y_i/n_g ,  g = 1, 3
c. λ̂_g = log[P̂_g/(1 - P̂_g)] ,  g = 1, 3

The parameter β is then estimated as

   β̂_GRP = (λ̂₁ - λ̂₃)/(X̄₁ - X̄₃) ,

with α̂_GRP obtained from the line through the points (X̄₁, λ̂₁) and (X̄₃, λ̂₃).
3.2.2 Rationale Behind the Method

We first transform the model

   P_i = P(Y_i=1|U_i) = 1/(1 + e^{-(α + βU_i)})

into the following linear form:

   λ_i = log[P_i/(1 - P_i)] = α + βU_i .                             (3.2.2.1)

Since Y_i is a dichotomous variable, an observed value of log[P_i/(1 - P_i)] cannot be computed unless it can be assumed that P_i is equal to some value P_g for all U_i belonging to group g; therefore we can pool the corresponding Y_i's and estimate P_g, where

   P(Y_i=1|U_i) = P_g = 1/(1 + e^{-(α + βU_i)})                      (3.2.2.2)

for i = 1, 2, ..., n_g; g = 1, 2, 3. As a result, equation (3.2.2.1) becomes

   λ_g = α + βŪ_g                                                    (3.2.2.3)

where λ_g = log[P_g/(1 - P_g)] and Ū_g = Σ_{i∈g} U_i/n_g. In this form we recognize β as the slope of the line in equation (3.2.2.3). Therefore, β can be written as

   β = (λ₁ - λ₃)/(Ū₁ - Ū₃).                                          (3.2.2.4)
Theorem 3.1

The estimator β̂_GRP is consistent for β in equation (3.2.2.4) if

i) grouping is independent of the error, ε_i;
ii) X̄₁ - X̄₃ does not approach zero as n becomes large.

Proof:

Again

   β̂_GRP = (λ̂₁ - λ̂₃)/(X̄₁ - X̄₃).

a) Using Taylor series expansions of λ̂₁ and λ̂₃ about P₁ and P₃, respectively, we have

   λ̂_g = log[P̂_g/(1 - P̂_g)]
       = log[P_g/(1 - P_g)] + (P̂_g - P_g)/[P_g(1 - P_g)] + (2P_g - 1)(P̂_g - P_g)²/[2P_g²(1 - P_g)²] + ...

for g = 1, 3. Since P̂_g → P_g as n → ∞,

   i) lim_{n→∞} E(λ̂_g) = log[P_g/(1 - P_g)] = λ_g ;
   ii) lim_{n→∞} var(λ̂_g) = lim_{n→∞} var(log[P̂_g/(1 - P̂_g)]) = 0.

Therefore λ̂₁ and λ̂₃ are consistent for λ₁ and λ₃, respectively.

b) If the grouping is independent of the error, ε_i, then

   var[(X̄₁ - X̄₃) - (Ū₁ - Ū₃)] = var[(1/n₁)Σ_{G₁}(X_i - U_i) - (1/n₃)Σ_{G₃}(X_i - U_i)]
                              = var(ε_i|g=1)/n₁ + var(ε_i|g=3)/n₃
                              = σ²_ε/n₁ + σ²_ε/n₃
                              = σ²_ε(p₁⁻¹ + p₃⁻¹)/n

and

   lim_{n→∞} var[(X̄₁ - X̄₃) - (Ū₁ - Ū₃)] = 0.

Therefore, X̄₁ - X̄₃ goes to Ū₁ - Ū₃ in probability as n becomes large.

c) If λ̂₁ - λ̂₃ and X̄₁ - X̄₃ are consistent for λ₁ - λ₃ and Ū₁ - Ū₃, respectively, then β̂_GRP is consistent for β as long as Ū₁ - Ū₃ does not approach zero. If the grouping criterion, Z, is correlated with U, then

   E(U|g=1) ≠ E(U|g=3),

so Ū₁ - Ū₃ will not approach 0.

In conclusion, if we have a grouping criterion that is correlated with the unobserved independent variable, U, but is not correlated with the error term, ε, then β̂_GRP is consistent for β in model (3.2.2.2). If the P_i's are not equal within groups, β in model (3.2.2.2) will be an approximation to the parameter in the true model. Even so, it is possible that β̂_GRP will estimate the true parameter better than the Walker-Duncan estimator, which, when measurement error is present, is also based upon an approximate model.
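The grouping estimator of section 3.2.1 can be sketched directly. The code below is illustrative and not part of the original study: the helper name beta_grp is an assumption, and the check uses the section 3.5 design (α = β = 1, U and ε both N(0,1), corr(U,Z) = 0.9), under which Table 3.1 reports a mean near 0.96.

```python
import math
import random

def beta_grp(x, y, z):
    """Grouping estimator: rank by the criterion z, keep the lowest and highest
    thirds, and take the slope of the empirical logit between the two groups."""
    order = sorted(range(len(x)), key=lambda i: z[i])
    k = len(x) // 3
    g1, g3 = order[:k], order[-k:]          # middle third is discarded
    def group_stats(idx):
        xbar = sum(x[i] for i in idx) / len(idx)
        p = sum(y[i] for i in idx) / len(idx)
        return xbar, math.log(p / (1 - p))  # empirical logit of the group
    x1, lam1 = group_stats(g1)
    x3, lam3 = group_stats(g3)
    return (lam1 - lam3) / (x1 - x3)

random.seed(4)
n = 6000
u = [random.gauss(0, 1) for _ in range(n)]
x = [ui + random.gauss(0, 1) for ui in u]                              # X = U + eps
z = [0.9 * ui + math.sqrt(1 - 0.81) * random.gauss(0, 1) for ui in u]  # corr(U,Z) = 0.9
y = [1 if random.random() < 1 / (1 + math.exp(-(1 + ui))) else 0 for ui in u]

print(beta_grp(x, y, z))   # much closer to beta = 1 than the naive fit on X
```

Because the ranking uses Z rather than the error-laden X, the group means of X converge to the group means of U, which is what drives the consistency argument in Theorem 3.1.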
3.3 Use of An Instrumental Variable to Estimate β

Another method of obtaining consistent estimates of β in linear regression is one based upon the use of an instrumental variable. An instrumental variable is one that is correlated with the true unobserved independent variable, but is not correlated with the error terms. In this section we describe the use of an instrumental variable, Z, in producing estimators of β in simple logistic regression when measurement error is present in the independent variable.
3.3.1 Iterative Instrumental Variable Estimator

Let

   Y_i = 1/(1 + e^{-(α + βU_i)}) + η_i = 1/(1 + e^{-(α + βX_i - βε_i)}) + η_i ;

that is,

   Y_i = f(α, β, ε_i) + η_i .

Using a first-order Taylor series expansion of f(α, β, ε_i) about the point (α₀, β₀, 0), the model can be approximated as follows:

   Y_i ≈ 1/(1 + e^{-(α₀ + β₀X_i)}) + [e^{-(α₀ + β₀X_i)}/(1 + e^{-(α₀ + β₀X_i)})²][(α - α₀) + X_i(β - β₀) - β₀ε_i] + η_i .   (3.3.1.1)

Let

   P₀ᵢ = 1/(1 + e^{-(α₀ + β₀X_i)})  and  Q₀ᵢ = 1 - P₀ᵢ ;

then

   Y_i ≈ P₀ᵢ + P₀ᵢQ₀ᵢ[(α - α₀) + X_i(β - β₀) - β₀ε_i] + η_i .

Next, let

   Y*_i = (Y_i - P₀ᵢ)/(P₀ᵢQ₀ᵢ).

As a result,

   Y*_i = (α - α₀) + X_i(β - β₀) - β₀ε_i + η*_i ,  where η*_i = η_i/(P₀ᵢQ₀ᵢ).   (3.3.1.2)

If one observes an instrumental variable, Z_i, that is correlated with U_i but is not correlated with η_i or ε_i, then

a) cov(η*, Z) = 0, since η is independent of X and Z,
b) cov(βε, Z) = β cov(ε, Z) = 0,

and

c) cov(X, Z) = cov(U, Z) ≠ 0.

Using equation (3.3.1.2), β can be written as

   cov(Y*, Z) + β₀ cov(X, Z) ≈ β cov(X, Z).                           (3.3.1.3)

A natural estimator of β* , the slope parameter of the approximate model (3.3.1.2), is

   β̂_{r+1} = β̂_r + Σᵢ(Zᵢ - Z̄)(Y*ᵢᵣ - Ȳ*ᵣ)/Σᵢ(Zᵢ - Z̄)(Xᵢ - X̄).       (3.3.1.4)

Next we derive an estimator of α. Taking the expected value of the terms in equation (3.3.1.2), we have, since E(εᵢ) = E(ηᵢ) = 0,

   E(Y*) = (α - α₀) + (β - β₀)E(X).

A natural estimator of α* is

   α̂_{r+1} = α̂_r + Ȳ*ᵣ - (β̂_{r+1} - β̂_r)X̄ .                          (3.3.1.5)

Initial values, α₀ and β₀, are assigned and equations (3.3.1.4) and (3.3.1.5) are computed iteratively until the solution converges; for example, until |β̂_{r+1} - β̂_r| is less than some prespecified amount (e.g., 0.0005).
Theorem 3.2

Let α̂_IV and β̂_IV be the convergent solutions to equations (3.3.1.4) and (3.3.1.5). Then α̂_IV and β̂_IV are consistent estimators of α* and β*, respectively.

Proof:

Since Σᵢ(Zᵢ - Z̄)(Y*ᵢ - Ȳ*) is consistent (after division by n-1) for cov(Z, Y*), and Σᵢ(Zᵢ - Z̄)(Xᵢ - X̄) is consistent for cov(Z, X), β̂_IV, which is the ratio of the two sample covariances, is consistent for β*, the ratio of the corresponding population covariances, as long as cov(Z, X) ≠ 0. Since Ȳ*, X̄ and β̂_IV are consistent for E(Y*), E(X) and β*, respectively, α̂_IV is consistent for α*.

The parameters α* and β* only approximate α and β; therefore α̂_IV and β̂_IV are not consistent for α and β. In cases where α* and β* are close to α and β, respectively, these estimators should be less biased than the iterative weighted least squares estimate. The estimators α̂_IV and β̂_IV can also be used to estimate β when only a categorical variable is available. If the observations belong to three distinct groups, Zᵢ can take the values 1, -1, 0, depending upon whether observation i is a member of group g, where g = 1, 2, 3. The grouping must be independent of the measurement and sampling errors, and correlated with the independent variable U. A simplified form of β̂_IV would be

   β̂_{r+1} = β̂_r + [Σ_{G₁} Y*ᵢᵣ - Σ_{G₃} Y*ᵢᵣ]/[Σ_{G₁} Xᵢ - Σ_{G₃} Xᵢ]   (3.3.1.6)

where G₁, G₂ and G₃ are the three groups with n/3 observations in each group. If more than three groups exist, say g, then the values 1, 2, ..., g could be assigned to Zᵢ and β̂ computed using equation (3.3.1.4).
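The iteration defined by equations (3.3.1.4) and (3.3.1.5) is short to implement. The sketch below is illustrative and not the original program: the function name iv_fit, the starting point (0, 0), and the tolerance are assumptions, and the check uses the section 3.5 design with corr(U,Z) = 0.9, where Table 3.1 reports a mean of 0.885 for β̂_IV.

```python
import math
import random

def iv_fit(x, y, z, tol=5e-4, max_iter=50):
    """Iterative instrumental variable estimator: equations (3.3.1.4)-(3.3.1.5),
    started from (alpha0, beta0) = (0, 0); returns (alpha_hat, beta_hat)."""
    n = len(x)
    xbar, zbar = sum(x) / n, sum(z) / n
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    a = b = 0.0
    for _ in range(max_iter):
        p = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        ystar = [(yi - pi) / (pi * (1 - pi)) for yi, pi in zip(y, p)]   # Y*_i
        ysbar = sum(ystar) / n
        szy = sum((zi - zbar) * (ys - ysbar) for zi, ys in zip(z, ystar))
        b_new = b + szy / szx                       # equation (3.3.1.4)
        a = a + ysbar - (b_new - b) * xbar          # equation (3.3.1.5)
        if abs(b_new - b) < tol:
            return a, b_new
        b = b_new
    return a, b

random.seed(5)
n = 4000
u = [random.gauss(0, 1) for _ in range(n)]
x = [ui + random.gauss(0, 1) for ui in u]
z = [0.9 * ui + math.sqrt(1 - 0.81) * random.gauss(0, 1) for ui in u]  # strong instrument
y = [1 if random.random() < 1 / (1 + math.exp(-(1 + ui))) else 0 for ui in u]

print(iv_fit(x, y, z)[1])
```

With a weak instrument the iteration may fail to converge within max_iter, mirroring the convergence problems at ρ_UZ = 0.2 noted in section 3.5.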
3.3.2 Two-Stage Least Squares Estimates

The two-stage least squares (TSLS) estimator was derived independently by Theil (1961) and by Basmann (1957). The estimate is computed in two steps, as follows:

1. First, regress X, the observed independent variable, on Z, the instrumental variable, using the linear regression model

      Xᵢ = π₀ + π₁Zᵢ + Wᵢ ,  E(Wᵢ) = 0, var(Wᵢ) = σ²_w .              (3.3.2.1)

2. Use the estimates of π₀ and π₁ computed in step 1 to estimate the unobserved independent variable Uᵢ. The model of Uᵢ given Zᵢ is

      Uᵢ = π₀ + π₁Zᵢ + Wᵢ - εᵢ ,                                      (3.3.2.2)

   since Xᵢ = Uᵢ + εᵢ. If Zᵢ and εᵢ are independent and E(εᵢ) = 0, then

      E(Uᵢ|Zᵢ) = π₀ + π₁Zᵢ  and  Ûᵢ = π̂₀ + π̂₁Zᵢ .

   We now regress Yᵢ on Ûᵢ.
In simple linear regression the TSLS estimator and the instrumental variable estimator are identical. Let

   Yᵢ = α + βUᵢ + ηᵢ
   Xᵢ = Uᵢ + εᵢ

where

a) U, η, ε are independent and E(η) = E(ε) = 0;
b) cov(Z, X) = cov(Z, U), and cov(Z, η) = cov(Z, ε) = 0.

If X is regressed on Z, the estimates of π₀ and π₁ are

   π̂₁ = Σᵢ(Zᵢ - Z̄)(Xᵢ - X̄)/Σᵢ(Zᵢ - Z̄)²  and  π̂₀ = X̄ - π̂₁Z̄ .

The estimate of β that results from regressing Yᵢ on Ûᵢ using the model

   Yᵢ = α + βÛᵢ + η*ᵢ

is

   β̂_TSLS = Σᵢ(Ûᵢ - Û̄)(Yᵢ - Ȳ)/Σᵢ(Ûᵢ - Û̄)² = Σᵢ(Zᵢ - Z̄)(Yᵢ - Ȳ)/Σᵢ(Zᵢ - Z̄)(Xᵢ - X̄) ,

since Ûᵢ - Û̄ = π̂₁(Zᵢ - Z̄); this is the usual instrumental variable estimator.
In the case of logistic regression, the model in step 2 is

   Yᵢ = 1/(1 + e^{-(α + βÛᵢ)}) + η**ᵢ .

The iterative weighted least squares estimator of β using the above model is

   (α̂_{r+1}, β̂_{r+1})′ = (α̂_r, β̂_r)′ + (Û′W_rÛ)⁻¹Û′W_rY*_r

where Û is an n×2 matrix with (1, Ûᵢ) as the ith row, W_r is the diagonal weight matrix with {P̂ᵤᵢ,ᵣQ̂ᵤᵢ,ᵣ} on the diagonal, Y*_r is an n×1 vector with elements

   (Yᵢ - P̂ᵤᵢ,ᵣ)/(P̂ᵤᵢ,ᵣQ̂ᵤᵢ,ᵣ) ,

and

   P̂ᵤᵢ,ᵣ = 1/(1 + e^{-(α̂_r + β̂_rÛᵢ)}) .

This estimator can be computed in practice by using SAS's PROC LOGIST on the values of (Yᵢ, Ûᵢ), i = 1, 2, ..., n. The resulting estimator will be referred to as β̂_TSLS.
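The two stages can be sketched as follows. This is an illustrative modern implementation, not the original SAS program: the Newton-Raphson routine fit_logistic stands in for PROC LOGIST (both compute the IWLS fit), and the check uses the section 3.5 design with corr(U,Z) = 0.9, where Table 3.1 reports a mean of 0.956 for β̂_TSLS.

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of P(Y=1|x) = 1/(1+exp(-(a+b*x))); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w; h01 += w * xi; h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def beta_tsls(x, y, z):
    """Stage 1: OLS of X on Z gives U_hat = pi0_hat + pi1_hat * Z.
    Stage 2: logistic (IWLS) fit of Y on U_hat."""
    n = len(x)
    zbar, xbar = sum(z) / n, sum(x) / n
    pi1 = (sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
           / sum((zi - zbar) ** 2 for zi in z))
    pi0 = xbar - pi1 * zbar
    u_hat = [pi0 + pi1 * zi for zi in z]
    return fit_logistic(u_hat, y)[1]

random.seed(6)
n = 4000
u = [random.gauss(0, 1) for _ in range(n)]
x = [ui + random.gauss(0, 1) for ui in u]
z = [0.9 * ui + math.sqrt(1 - 0.81) * random.gauss(0, 1) for ui in u]
y = [1 if random.random() < 1 / (1 + math.exp(-(1 + ui))) else 0 for ui in u]

print(beta_tsls(x, y, z))
```

The stage-1 fitted values depend on Z only, so the measurement error in X never enters the stage-2 regressor.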
3.4 Another Two-Stage Estimator of β

In previous sections, Xᵢ and π̂₀ + π̂₁Zᵢ were used as estimates of the unobserved independent variable Uᵢ. In this section the use of the Bayes, or James-Stein, estimate of Uᵢ is discussed.

If Xᵢ = Uᵢ + εᵢ is observed, where Uᵢ ~ N(μ, σ²_u) and εᵢ ~ N(0, σ²_ε), then (Xᵢ, Uᵢ) is distributed as bivariate normal with

   var(Xᵢ) = σ²_u + σ²_ε ,  var(Uᵢ) = σ²_u ,  cov(Xᵢ, Uᵢ) = σ²_u ,

if E(εᵢ) = 0 and Uᵢ, εᵢ are independent. Also, the conditional distribution of Uᵢ given Xᵢ is N(μ + ρ(Xᵢ - μ), ρσ²_ε), where

   ρ = σ²_u/(σ²_u + σ²_ε).

The Bayes estimate of Uᵢ is that function of the observations that minimizes the conditional expectation of the loss function L = (Uᵢ - f(Xᵢ))²; E[(Uᵢ - f(Xᵢ))²|Xᵢ] is minimized when f(Xᵢ) = E(Uᵢ|Xᵢ), if a minimum exists. Therefore

   Ûᵢ = μ + ρ(Xᵢ - μ).

In the linear case, regressing Y on Û recovers β_u.

Proof:

Let Yᵢ = α + β_uUᵢ + ηᵢ. Then

   β_Û = cov(Yᵢ, μ + ρ(Xᵢ - μ))/var(μ + ρ(Xᵢ - μ)) = ρ cov(Yᵢ, Xᵢ)/[ρ² var(Xᵢ)] = (1/ρ)β_x .

Since

   (1/ρ)β_x = [(σ²_u + σ²_ε)/σ²_u]·β_u σ²_u/(σ²_u + σ²_ε) = β_u ,

β_Û is equal to β_u; any consistent estimator of β_Û is a consistent estimator of β_u, and it is well known that the OLS estimator using the values (Yᵢ, Ûᵢ) is consistent for β_Û. Whether a similar relationship exists in the logistic case is investigated by means of simulation in the next section.

In most situations the parameters μ and ρ are unknown. However, μ can always be estimated as the mean of the observed variable Xᵢ. If σ²_ε is known, ρ can be estimated as

   ρ̂ = (S²_x - σ²_ε)/S²_x ,  where  S²_x = Σᵢ(Xᵢ - X̄)²/(n-1).

The estimator of Uᵢ is then

   Ûᵢ = X̄ + ρ̂(Xᵢ - X̄),

the James-Stein (1966) estimator. Using Ûᵢ as the independent variable, β̂_Bayes1 is then computed as the iterative weighted least squares estimator.
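The shrinkage construction above can be sketched directly. The code below is illustrative and not the original program: the helper names are assumptions, the Newton-Raphson routine stands in for the IWLS fit, and the check uses the section 3.5 design (α = β = 1, U and ε both N(0,1), σ²_ε known), where Table 3.1 reports a mean of 0.936 for β̂_Bayes1 against 0.462 for the naive fit.

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of P(Y=1|x) = 1/(1+exp(-(a+b*x))); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w; h01 += w * xi; h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def beta_bayes1(x, y, sigma2_eps):
    """IWLS fit on the shrunken values U_hat_i = xbar + rho_hat*(X_i - xbar),
    where rho_hat = (S_x^2 - sigma2_eps)/S_x^2 and sigma2_eps is assumed known."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    rho = (s2 - sigma2_eps) / s2
    u_hat = [xbar + rho * (xi - xbar) for xi in x]
    return fit_logistic(u_hat, y)[1]

random.seed(7)
n = 4000
u = [random.gauss(0, 1) for _ in range(n)]
x = [ui + random.gauss(0, 1) for ui in u]
y = [1 if random.random() < 1 / (1 + math.exp(-(1 + ui))) else 0 for ui in u]

print(beta_bayes1(x, y, 1.0))
```

Shrinking each Xᵢ towards the grand mean by the factor ρ̂ rescales the regressor so that, in the linear case at least, the slope on Û equals β_u exactly.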
3.5 Comparison of Alternative Estimators

In this section the estimators described previously are compared using simulated values of the bias, standard deviation, and mean square error. The observations for these simulations were generated as described in section 2.3.1. In the true model, α and β are both assigned the value 1. The independent variable, U, is generated as a N(0,1) variable except in the case of the first modified discriminant function estimate, where U is conditionally normal. Both the error term, ε, and the instrumental variable, Z, are generated as N(0,1) variables. In order to investigate the effect of the correlation between U and Z on the grouping and instrumental variable estimators, we generated three sets of Z values for each sample. The sets differ only in the value of corr(U,Z). The correlations are 0.2, 0.5 and 0.9. Two additional sets of Z values were generated for simulation of β̂_TSLS, with corr(U,Z) = 0.3 and 0.4.

The following estimators were computed for 150 samples of size 500:

1. The Modified Discriminant Function Estimator, β̂_MD:
   a) U conditionally normal with means μ₁ = 2 and μ₀ = 1, σ²_ε known;
   b) U ~ N(0,1).

2. The Iterative Weighted Least Squares Estimator with

      Ûᵢ = X̄ + [(S²_x - σ²_ε)/S²_x](Xᵢ - X̄)

   as the independent variable, β̂_Bayes1.

3. The Grouping Estimator, β̂_GRP.

4. The Iterative Instrumental Variable Estimator, β̂_IV:

      β̂_{r+1} = β̂_r + Σᵢ(Zᵢ - Z̄)(Y*ᵢᵣ - Ȳ*ᵣ)/Σᵢ(Zᵢ - Z̄)(Xᵢ - X̄)

   a) Z ~ N(0,1);
   b) Z takes on the values -1, 0, 1.

5. The Two-Stage Least Squares Estimator, β̂_TSLS, which is the Iterative Weighted Least Squares Estimator with Ûᵢ = π̂₀ + π̂₁Zᵢ as the independent variable.

Estimators β̂_GRP and β̂_IV were computed for values of corr(U,Z) = 0.2, 0.5 and 0.9, and β̂_TSLS was computed for values of corr(U,Z) = 0.2, 0.3, 0.4, 0.5 and 0.9. The estimated bias, standard deviation, and mean square error were computed for each estimator. The results are shown in Table 3.1.

The estimators proposed are useful under different assumptions and depending upon the availability of additional information, such as the variance of the measurement error (σ²_ε) or observations on an instrumental variable. This makes comparisons between some of the estimators unreasonable. However, one can make the following comparisons and conclusions.
1. When measurement error is present and the variance of that error is known, the modified discriminant function estimator, β̂_MD, has much smaller bias and MSE than β̂_IWLS, regardless of whether Uᵢ is conditionally normal or unconditionally normal. In both cases the bias is very small, 0.025 and -0.006, as compared to -0.538 for β̂_IWLS. When U is conditionally normal, β̂_MD has a smaller estimated standard deviation than when U is unconditionally normal.

2. When σ²_ε is known and one cannot assume conditional normality of U, the use of the Bayes estimate of Uᵢ as the independent variable produces an estimator of β with smaller MSE than β̂_MD. See lines 2 and 3 of Table 3.1.

3. The method of grouping estimator, β̂_GRP, the iterative instrumental variable estimator, β̂_IV, and the TSLS estimator, β̂_TSLS, vary in quality relative to the value of the correlation between the instrumental variable (or grouping criterion) and the independent variable, U. Both the bias and the standard deviation of the estimators increase as the correlation, ρ_UZ, decreases.

4. The rate of convergence for β̂_IV is also a function of ρ_UZ. For example, when ρ_UZ = 0.2, only 108 out of 150 samples provided solutions within 30 iterations. In contrast, when ρ_UZ = 0.9, all but two samples converged to a solution within 30 iterations. This problem is due to the large variance of the estimator when the correlation is small; it is well known that inefficient, unstable estimators have convergence problems.

5. The bias in β̂_GRP, β̂_IV and β̂_TSLS is smaller than the bias in β̂_IWLS, although the standard deviation of these estimators is always larger than the standard deviation of β̂_IWLS. For values of ρ_UZ > 0.2 the MSEs of these estimators are smaller than the MSE of β̂_IWLS.

6. In terms of bias and MSE, there appears to be little difference in the performances of β̂_GRP, β̂_IV and β̂_TSLS. However, β̂_TSLS is preferred over the other two because of the convergence problem of β̂_IV and the necessary model assumptions of β̂_GRP. Also, β̂_TSLS and its variance are easy to compute using SAS's PROC LOGIST.
TABLE 3.1 Simulated Mean, Bias, Standard Deviation and Mean Square Error of the Estimators

Estimator                              Mean     Bias     STD(β̂)   √MSE
β̂_MD: U|Y Normal                      1.025    0.025    0.166    0.168
β̂_MD: U Normal                        0.994   -0.006    0.203    0.203
β̂_Bayes1                              0.936   -0.064    0.163    0.175
β̂_GRP: ρ_UZ = 0.2                     0.835   -0.165    0.691    0.710
        ρ_UZ = 0.5                     0.914   -0.086    0.260    0.274
        ρ_UZ = 0.9                     0.958   -0.042    0.154    0.160
β̂_IV: Z ~ N(0,1)
        ρ_UZ = 0.2 (n = 108)           0.605   -0.395    0.371    0.542
        ρ_UZ = 0.5 (n = 136)           0.845   -0.155    0.209    0.260
        ρ_UZ = 0.9 (n = 148)           0.885   -0.115    0.158    0.195
β̂_IV: Z = -1, 0, 1
        ρ_UZ = 0.2 (n = 100)           0.524   -0.476    0.380    0.609
        ρ_UZ = 0.5 (n = 135)           0.813   -0.187    0.200    0.274
        ρ_UZ = 0.9 (n = 147)           0.867   -0.133    0.152    0.202
β̂_TSLS: ρ_UZ = 0.2                    0.911   -0.089    0.911    0.915*
         ρ_UZ = 0.3                    0.856   -0.144    0.398    0.423
         ρ_UZ = 0.4                    0.857   -0.143    0.286    0.320
         ρ_UZ = 0.5                    0.866   -0.134    0.230    0.266
         ρ_UZ = 0.9                    0.956   -0.044    0.148    0.154
β̂_IWLS (Walker-Duncan)                0.462   -0.538    0.079    0.544

*After removal of the extreme value, 9.39, the starred line becomes: 0.855, -0.145, 0.588, 0.606.
CHAPTER IV
USE OF MULTIPLE MEASUREMENTS IN THE ESTIMATION OF β

4.1 The Mean of m Independent Measurements as the Independent Variable

In previous sections it has been assumed that the measurement error, εᵢ, has a mean value of zero and is independent of the unobserved variable Uᵢ, for all values of i. If, under these assumptions, m independent measurements are made on each of the n independent sample units, with the observed variable being

   Xᵢⱼ = Uᵢ + εᵢⱼ ,  i = 1, 2, ..., n;  j = 1, 2, ..., m,

then

   E(Xᵢⱼ|Uᵢ) = Uᵢ  and  var(Xᵢⱼ|Uᵢ) = var(εᵢⱼ|Uᵢ) = σ²_ε .

Therefore X̄ᵢ. = Σⱼ Xᵢⱼ/m, which estimates E(Xᵢⱼ|Uᵢ), also estimates Uᵢ. Also, as the number of repeated measurements increases, X̄ᵢ. converges to Uᵢ. This suggests the use of X̄ᵢ. in place of the unknown independent variable Uᵢ when estimating β.
In the case of simple linear regression, if

   X̄ᵢ. = Uᵢ + ε̄ᵢ. ,

with variance σ²_X̄ = σ²_u + σ²_ε/m, is used as the independent variable in the model, then

   β_X̄ = β_u σ²_u/(σ²_u + σ²_ε/m)                                    (4.1.1)

and as m → ∞, β_X̄ converges to β_u. Therefore, if β̂_X̄ is a consistent estimator of β_X̄, then as both n and m become large, β̂_X̄ will converge to β_u in probability.

The relationship between β_X̄ and β_u shown in equation (4.1.1) also exists in the logistic regression case when the independent variable Uᵢ is conditionally normal. If X̄ᵢ. is used to estimate Uᵢ, β_X̄ takes the form

   β_X̄ = (μ₁ - μ₀)/(σ²_u + σ²_ε/m).

Again, as m → ∞, β_X̄ goes to β_u. This proof cannot be extended directly to the logistic regression case in which the independent variable Uᵢ is not conditionally normal. However, if X̄ᵢ. replaces Uᵢ in the logistic model as follows:

   P(Yᵢ=1|X̄ᵢ.) = 1/(1 + e^{-(α + βX̄ᵢ.)}) = Pᵢ ,

then the iterative weighted least squares (IWLS) estimator is

   (α̂_{r+1}, β̂_{r+1})′ = (α̂_r, β̂_r)′ + (X̄′W_rX̄)⁻¹X̄′W_rY*_r          (4.1.2)

where X̄ is an n×2 matrix with ith row (1, X̄ᵢ.), W_r is a diagonal weight matrix with elements {P̂ᵢᵣQ̂ᵢᵣ}, and Y*_r is an n×1 vector with ith element (Yᵢ - P̂ᵢᵣ)/(P̂ᵢᵣQ̂ᵢᵣ).

As m → ∞, X̄ᵢ. → Uᵢ and equation (4.1.2) becomes

   (α̂_{r+1}, β̂_{r+1})′ = (α̂_r, β̂_r)′ + (U′W_rU)⁻¹U′W_rY*_r .

The n×2 matrix U has ith row (1, Uᵢ), W_r is diagonal with elements {PᵢQᵢ}, Y*_r is an n×1 vector with elements (Yᵢ - Pᵢ)/(PᵢQᵢ), and

   Pᵢ = 1/(1 + e^{-(α + βUᵢ)}).

It is well known that β̂_U, the IWLS estimator without measurement error (i.e., Uᵢ observable), converges to β as n → ∞. Therefore, β̂_X̄ converges to β as both m → ∞ and n → ∞.
4.2 Further Reduction of the Bias When m is Small

Another advantage of having multiple measurements on each sample unit is that one can now estimate the variance of the measurement error, σ²_ε. Since var(Xᵢⱼ|Uᵢ) = var(εᵢⱼ|Uᵢ) = σ²_ε, σ²_ε can be estimated by pooling the estimated values of var(Xᵢⱼ|Uᵢ) as follows:

   S²_{X|Uᵢ} = Σⱼ₌₁ᵐ (Xᵢⱼ - X̄ᵢ.)²/(m-1) ,  i = 1, 2, ..., n

and

   S²_ε = Σᵢ₌₁ⁿ Σⱼ₌₁ᵐ (Xᵢⱼ - X̄ᵢ.)²/[n(m-1)].                         (4.2.1)

An estimate of σ²_X̄ can be computed as

   S²_X̄ = Σᵢ₌₁ⁿ (X̄ᵢ. - X̄..)²/(n-1).

Now from equation (4.1.1), the relationship between β_X̄ and β_u for the linear regression case, and for the logistic case in which Uᵢ is conditionally normal, can be written as

   β_X̄ = [(σ²_X̄ - σ²_ε/m)/σ²_X̄] β_u .                                (4.2.2)

Using β̂_X̄, S²_ε and S²_X̄ as estimates of β_X̄, σ²_ε and σ²_X̄, respectively, β_u can be estimated as

   β̂_u = [S²_X̄/(S²_X̄ - S²_ε/m)] β̂_X̄ .                                (4.2.3)

Since S²_X̄, S²_ε and β̂_X̄ are consistent estimates of σ²_X̄, σ²_ε and β_X̄, we conclude that β̂_u is consistent for β_u.

When the independent variable in the logistic regression model is not conditionally normal, the simulation studies of Chapter II have shown the relationship in equation (4.2.2) to be approximate. That is,

   β_X̄ ≈ [(σ²_X̄ - σ²_ε/m)/σ²_X̄] β_u ,  where σ²_X̄ = σ²_u + σ²_ε/m,

in the logistic case. Still, the estimator in equation (4.2.3) is expected to be less biased than β̂_X̄ when m is small.
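The estimator (4.2.3) can be sketched as follows. This is an illustrative modern implementation, not the original program: the helper names are assumptions, the Newton-Raphson routine stands in for the IWLS fit, and the check uses the Chapter II design (α = β = 1, U ~ N(0,1), ε ~ N(0,1)) with m = 2 replicates, for which Table 4.1 reports a mean of 0.935.

```python
import math
import random

def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of P(Y=1|x) = 1/(1+exp(-(a+b*x))); returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w; h01 += w * xi; h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def beta_u_hat(x_reps, y):
    """Estimator (4.2.3): fit on the within-unit means X_i., then inflate by
    S_xbar^2 / (S_xbar^2 - S_eps^2/m)."""
    n, m = len(x_reps), len(x_reps[0])
    xbars = [sum(row) / m for row in x_reps]
    s2_eps = sum((xij - xb) ** 2
                 for row, xb in zip(x_reps, xbars) for xij in row) / (n * (m - 1))
    gbar = sum(xbars) / n
    s2_xbar = sum((xb - gbar) ** 2 for xb in xbars) / (n - 1)
    b = fit_logistic(xbars, y)[1]
    return s2_xbar / (s2_xbar - s2_eps / m) * b

random.seed(8)
n, m = 4000, 2
u = [random.gauss(0, 1) for _ in range(n)]
x_reps = [[ui + random.gauss(0, 1) for _ in range(m)] for ui in u]   # m replicates
y = [1 if random.random() < 1 / (1 + math.exp(-(1 + ui))) else 0 for ui in u]

print(beta_u_hat(x_reps, y))
```

The replicates serve double duty: averaging them shrinks the error variance to σ²_ε/m, and their within-unit spread estimates σ²_ε for the inflation factor.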
4.3 Use of the James-Stein Estimator of Uᵢ as the Independent Variable

If more than one measurement is available for each sample unit, one can compute the James-Stein estimator of Uᵢ (discussed previously in section 3.4) when both ρ and σ²_ε are unknown. In the case of m measurements on each sampling unit, one observes

   X̄ᵢ. = Uᵢ + ε̄ᵢ.

where (X̄ᵢ., Uᵢ) is bivariate normal with E(X̄ᵢ.) = E(Uᵢ) = μ, var(X̄ᵢ.) = σ²_u + σ²_ε/m, var(Uᵢ) = σ²_u, and cov(X̄ᵢ., Uᵢ) = σ²_u. Also,

   E(Uᵢ|X̄ᵢ.) = μ + ρ*(X̄ᵢ. - μ) ,  where ρ* = σ²_u/(σ²_u + σ²_ε/m).

The Bayes estimator of Uᵢ is again that function of the observations that minimizes the conditional expectation of the loss function, L = (Uᵢ - f(X̄ᵢ.))², which is

   Ûᵢ = μ + ρ*(X̄ᵢ. - μ).

Since μ and ρ* are unknown, we estimate them using

   μ̂ = X̄.. = Σᵢ₌₁ⁿ Σⱼ₌₁ᵐ Xᵢⱼ/nm  and  ρ̂* = (S²_X̄ - S²_ε/m)/S²_X̄ ,

where S²_X̄ and S²_ε are defined in equation (4.2.1). The estimator of Uᵢ is then

   Ûᵢ = X̄.. + ρ̂*(X̄ᵢ. - X̄..).

Using Ûᵢ above as the independent variable, β is estimated using the IWLS estimator. The result is β̂_Bayes2.

In the case of linear regression, this estimator, β̂_Bayes2, is identical to the estimator derived in section 4.2, ρ̂β̂_X̄, where ρ̂ = S²_X̄/(S²_X̄ - S²_ε/m).

Proof:

   β̂_Bayes2 = Σᵢ(Ûᵢ - Û̄)(Yᵢ - Ȳ)/Σᵢ(Ûᵢ - Û̄)²
            = ρ̂* Σᵢ(X̄ᵢ. - X̄..)(Yᵢ - Ȳ)/[ρ̂*² Σᵢ(X̄ᵢ. - X̄..)²]   since Ûᵢ - Û̄ = ρ̂*(X̄ᵢ. - X̄..)
            = (1/ρ̂*) β̂_X̄ = ρ̂ β̂_X̄ ,   since ρ̂ = 1/ρ̂*.

Whether this is true in the case of logistic regression is investigated in section 4.5 by means of simulation.
4.4 Estimation of σ²_ε When m Measurements are Taken on a Subsample

It is not always economically feasible to take more than one measurement on each sample unit. However, one can take m measurements on a subsample of k units and estimate σ²_ε. Since Uᵢ is fixed for each unit and independent of εᵢⱼ, an estimate of σ²_ε would be

   S²_ε = Σᵢ₌₁ᵏ Σⱼ₌₁ᵐ (Xᵢⱼ - X̄ᵢ.)²/[k(m-1)]

and β_u can then be estimated as

   β̂_u = [S²_X/(S²_X - S²_ε)] β̂_IWLS ,

where S²_X and β̂_IWLS are based on a single measurement. For example, for a sample unit with m measurements, use only the first in computing S²_X and β̂_IWLS.
4.5 Results of Simulation Study

Simulated values of the following estimators were produced for 150 samples of size 500.

1. β̂_X̄: The IWLS estimator with X̄_i. as the independent variable.

2. ρ̂β̂_X̄: The estimator in (1) adjusted by the factor

       ρ̂ = S_X̄² / (S_X̄² − S_ε²/m).

3. β̂_Bayes 2: The IWLS estimator with

       Û_i = X̄ + [(S_X̄² − S_ε²/m) / S_X̄²] (X̄_i. − X̄)

   as the independent variable.

4. β̂_SS: The IWLS estimator with X_i as the independent variable, adjusted by the factor

       S_X² / (S_X² − S_ε²)

   where S_ε² is based on m measurements made on a subsample of k units.
Different estimates of β̂_X̄, ρ̂β̂_X̄, and β̂_Bayes 2 were computed for values of m = 2, 4, 6, and 8, where m is the number of repeated measurements. The estimated mean, bias, and mean square error of the estimators are shown in Table 4.1.

As expected, the bias in the estimators decreases as m increases. Even when m is as small as 2, the reduction in the bias and MSE is large. The reduction in bias from β̂_IWLS to β̂_X̄ when m = 2 is 29.4%. Adjustment of the estimator β̂_X̄ by the factor ρ̂ decreases the bias even further: when m = 2, there is an 87.9% reduction in the bias from β̂_IWLS to ρ̂β̂_X̄. A similar reduction in bias occurs with the use of β̂_SS and β̂_Bayes 2. As shown for the linear regression case, ρ̂β̂_X̄ and β̂_Bayes 2 are identical estimators. There is an increase in the standard deviation of these estimators over that of β̂_IWLS; however, the √MSE is largest for the IWLS estimator. Another point worth noting is that the use of ρ̂ does not affect the results of the test of H₀: β = 0, since the t-statistic for this test is invariant under scale transformation. That is,

    t = ρ̂β̂_X̄ / [ρ̂ · SE(β̂_X̄)] = β̂_X̄ / SE(β̂_X̄).
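A one-line check of this scale invariance (with hypothetical numbers):

```python
beta_hat, se_hat, rho_hat = 0.620, 0.092, 1.613    # hypothetical values
t_raw = beta_hat / se_hat
t_adj = (rho_hat * beta_hat) / (rho_hat * se_hat)  # numerator and SE rescale together
assert abs(t_raw - t_adj) < 1e-12
```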
TABLE 4.1 Simulated Values of the Mean, Standard Deviation, and
          Mean Square Error of the Estimators

Estimator                 Mean      Bias      STD      √MSE
β̂_X̄        m = 2        0.620    -0.380    0.092    0.391
             m = 4        0.776    -0.224    0.105    0.247
             m = 6        0.829    -0.171    0.122    0.210
             m = 8        0.874    -0.126    0.125    0.178
ρ̂β̂_X̄      m = 2        0.935    -0.065    0.146    0.160
             m = 4        0.969    -0.031    0.131    0.135
             m = 6        0.968    -0.032    0.143    0.147
             m = 8        0.984    -0.016    0.140    0.140
β̂_Bayes 2   m = 2        0.935    -0.065    0.146    0.160
             m = 4        0.969    -0.031    0.131    0.135
             m = 6        0.968    -0.032    0.143    0.147
             m = 8        0.984    -0.016    0.140    0.140
β̂_SS   k = 50, m = 2     0.942    -0.068    0.247    0.256
β̂_IWLS                   0.462    -0.538    0.079    0.544
CHAPTER V
APPLICATION OF METHODS TO THE LIPID RESEARCH CLINICS DATA
5.1 Description of the Data Set

In this chapter the methods discussed previously are applied to data collected as part of the Lipid Research Clinics (LRC) Program Prevalence Study. Detailed descriptions of the LRC Prevalence Study can be found elsewhere (Heiss, et al. (1980)). The data used in this demonstration consist of observations on 4171 white men and women, age 30 and above. After participation in Visit 1 of the Prevalence Study these individuals were randomly selected, as part of a 15% sample of Visit 1 participants, to participate in Visit 2. As a result, we have available two determinations of plasma cholesterol and triglyceride levels, one taken at each visit, along with other sociodemographic data. Other variables used in this study are HDL cholesterol level measured at Visit 2, and Quetelet index (weight/height²). The outcome variable for this demonstration is mortality status at an average of 5.7 years after the measurements were taken. Mortality status is modeled as a function of either total cholesterol or triglyceride level. Because of the influences of sex and age on mortality, separate logistic models were fit for each sex and, when possible, age was included in the model as an additional independent variable. Both males and females using hormones or having a missing value for that variable were eliminated from the demonstration because of the influence of sex hormone usage on lipid and lipoprotein levels (Heiss, et al. (1980)).
5.2 Computation of the Estimators and Their Variances

The following estimators were computed by model and data set, along with their standard deviations and p-values corresponding to the test of H₀: β = 0:

1. β̂_IWLS is the iterative weighted least squares estimate with a single measurement of either cholesterol or triglyceride as the independent variable. Both the estimate and its variance were computed using SAS's PROC LOGIST.

2. β̂_MD is the modified discriminant function estimate using an external estimate of the variance of the measurement error. The estimate is computed as described in Chapter III. A Taylor series approximation is used to derive a large sample estimate of the variance of β̂_MD. When there is more than one independent variable, the variance of the rth coefficient can be estimated as

       V̂(β̂_MD,r) = s^{rr} (1/n₁ + 1/n₀)

   where s^{rr} is the rth diagonal element of the inverse covariance matrix.

3. β̂_Bayes 1 is the iterative weighted least squares estimate which uses

       Û_i = X̄ + [(S_X² − σ̂_ε²) / S_X²] (X_i − X̄)

   as the first stage estimate of U_i. Again, σ̂_ε² is an external estimate of the variance of the measurement error. After computing the Û_i's, both the estimate β̂_Bayes 1 and its variance are computed using SAS's PROC LOGIST on the Û_i's and the dependent variable.

4. β̂_GRP is the grouping estimate in which observations are grouped according to the magnitude of an instrumental variable. The estimate is computed as described in Chapter III. The large sample approximation to its variance, obtained from a Taylor series approximation of the estimator, is

       V̂(β̂_GRP) ≈ V̂(λ̂₁ − λ̂₃) / (X̄₁ − X̄₃)² + [(λ̂₁ − λ̂₃)² / (X̄₁ − X̄₃)⁴] V̂(X̄₁ − X̄₃).

5. β̂_IV is the iterative instrumental variable estimate. Its variance is computed conditionally on the observed values of the independent variable X_i and of the instrumental variable Z_i.

6. β̂_TSLS is the two-stage iterative weighted least squares estimate which uses Û_i = π̂₀ + π̂₁Z_i as the first stage estimate of U_i. Z_i is an instrumental variable, and π̂₀, π̂₁ result from the linear regression of the observed independent variable X on Z. Both the estimate β̂_TSLS and its variance are computed using SAS's PROC LOGIST on the dependent variable and the Û_i's.

7. β̂_X̄ is the iterative weighted least squares estimate which uses the mean of the two measurements of cholesterol (or triglyceride) as an estimate of the true independent variable (PROC LOGIST).

8. β̂_Bayes 2 is the iterative weighted least squares estimate which uses

       Û_i = X̄ + [(S_X̄² − S_ε²/2) / S_X̄²] (X̄_i. − X̄)

   as the independent variable (PROC LOGIST).
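All of the estimators above rely on an IWLS (Newton-Raphson) fit of the simple logistic model, carried out here by SAS's PROC LOGIST. A minimal Python sketch of the same fitting algorithm, for illustration only:

```python
import math

def iwls_logistic(x, y, iters=30):
    """Fit logit P(Y=1) = a + b*x by Newton-Raphson (equivalently IWLS)."""
    a = b = 0.0
    for _ in range(iters):
        eta = [max(-30.0, min(30.0, a + b * xi)) for xi in x]  # guard overflow
        p = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [pi * (1.0 - pi) for pi in p]
        # score vector
        s0 = sum(yi - pi for yi, pi in zip(y, p))
        s1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # 2x2 information matrix
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        a += (i11 * s0 - i01 * s1) / det
        b += (i00 * s1 - i01 * s0) / det
    return a, b
```

Supplying Û_i in place of x gives the second stage of β̂_Bayes 1, β̂_Bayes 2, or β̂_TSLS, depending on which first-stage estimate is used.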
The sample estimates of X̄, S_X̄², and S_ε² by model and by data set are shown in Table 5.1.
TABLE 5.1 Sample Estimates of X̄, S_X̄², and S_ε² by Model and Data Set

Independent Variable
Measured with Error               Males      Females
Total Cholesterol    X̄           206.1       205.4
                     S_X̄²       1258.4      1622.8
                     S_ε²         256.2       258.3
Triglyceride         X̄           140.3       103.3
                     S_X̄²       9086.4      4673.1
                     S_ε²        3092.6       815.6
Not all of the estimators listed above were computed for all models. Estimators requiring an external estimate of the variance of the measurement error (β̂_MD and β̂_Bayes 1) could only be computed for models in which total cholesterol was the independent variable. A relatively good estimate of the within-individual variability of total cholesterol was found in a paper by Ederer (1972). He observed, using data from the National Diet-Heart Study based on a sample of 1000 men, that in the absence of dietary change the between-visit correlation was about 0.85 when measurements of total cholesterol were taken one to two weeks apart. He also reports a within-visit between-individual standard deviation of 38 (σ_X) for cholesterol. Given two measurements of cholesterol for each individual,

    X_i1 = U_i + ε_i1   and   X_i2 = U_i + ε_i2,

where var(X_i1) = var(X_i2) = σ_X², U_i is the true value of cholesterol, and ε_i1 and ε_i2 are error terms, we have

    corr(X_i1, X_i2) = σ_u² / (σ_u² + σ_ε²).

Therefore,

    σ_ε² = [1 − corr(X_i1, X_i2)] σ_X².

Using Ederer's values for corr(X_i1, X_i2) and σ_X², we computed σ_ε² from the above equation to be 216.6. No estimate of the variance of the measurement error in triglyceride level of reasonable quality could be found.
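Ederer's figures translate into the error variance as a one-step calculation:

```python
corr_visits = 0.85   # between-visit correlation (Ederer, 1972)
sd_x = 38.0          # within-visit between-individual SD of cholesterol
var_eps = (1 - corr_visits) * sd_x ** 2   # sigma_eps^2 = [1 - corr] * sigma_X^2
assert round(var_eps, 1) == 216.6
```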
Those estimators requiring an instrumental variable (β̂_GRP, β̂_IV, β̂_TSLS) were demonstrated for models in which triglyceride level was the independent variable of interest. None of the variables observed in this study were adequately correlated with total cholesterol. By 'adequately' we mean having a correlation greater than 0.2. Our simulations in Chapter III showed that estimators requiring an instrumental variable perform poorly when the correlation between the instrumental variable and the independent variable measured with error is less than or equal to 0.2. Therefore, we did not attempt to demonstrate the use of these estimators with total cholesterol as the independent variable. Our data set contained two variables adequately correlated with triglyceride level: HDL cholesterol level at Visit 2, and Quetelet index (weight/height²). It was also observed that the correlation between log(HDL) and triglyceride was even higher than the correlation between HDL and triglyceride. Therefore, using HDL, log(HDL), and Quetelet index, estimators requiring instrumental variables were computed. Table 5.2 shows the correlations between triglyceride level and HDL, log(HDL), and Quetelet index. Correlations adjusted for age are also given.
TABLE 5.2 Correlations Between Triglyceride Level and Instrumental
          Variables by Data Set

Instrumental Variable        Males      Females
HDL                         -0.3134    -0.3265
  (Adj. for Age)            -0.3209    -0.3639
log(HDL)                    -0.3565    -0.3589
  (Adj. for Age)            -0.3628    -0.3917
Quetelet Index               0.2483     0.2557
  (Adj. for Age)             0.2496     0.2344
In using an instrumental variable to estimate model coefficients it is necessary to assume that the error in measuring the instrumental variable is independent of the error in measuring triglyceride level. This appears to be a reasonable assumption given the mechanisms for measuring HDL cholesterol and triglyceride levels. Obviously there is no relationship between errors in the measurement of triglyceride level and errors in measuring height or weight.

The grouping estimator, β̂_GRP, and the instrumental variable estimator, β̂_IV, are demonstrated only for the simple logistic case. Extension of these estimators to the multiple logistic case was not attempted as a part of this research.

Tables 5.3-5.6 show the estimators computed by model and data set along with their standard deviations and p-values corresponding to the test of H₀: β = 0.
5.3 Comparison of the Estimates

Comparisons of the estimates are made within model and within data set.

1. In the simple logistic model with total cholesterol as the independent variable we observe that

   a) The magnitude of the estimate increases from β̂_IWLS to β̂_MD to β̂_X̄. The increase in magnitude from β̂_IWLS to β̂_X̄ is about 58.3%. Also, β̂_Bayes 1 and β̂_Bayes 2 are larger in magnitude than β̂_IWLS and β̂_X̄ respectively. The largest increase in magnitude is from β̂_IWLS to β̂_Bayes 2, a 76.4% increase. Although the model coefficient corresponding to total cholesterol is not significantly different from zero (α = 0.05) regardless of the estimator used, there is a decrease in the p-value from β̂_IWLS to β̂_MD to β̂_X̄.

   b) For females, the largest increase in magnitude is from β̂_IWLS to β̂_MD, a 41.9% increase. Other patterns mentioned for males in (a) also occur for females under this model. The model coefficient corresponding to total cholesterol is significantly different from zero using any of the estimators computed.

2. In the multiple logistic model with total cholesterol and age as the independent variables, we observe that

   a) When age is added to the model the magnitude of the estimates decreases from that of the simple logistic case, except for β̂_MD in the male data set.

   b) In the male data set, β̂_X̄ and β̂_Bayes 2 perform poorly; they show a large decrease in magnitude. β̂_MD performs better than β̂_IWLS in terms of being larger in magnitude; the increase in magnitude is 352.5%. β̂_Bayes 1 performs as expected.

   c) Except for β̂_MD, the patterns described in (1.a) are present between the estimators computed under this model using the female data set. The increase in magnitude from β̂_IWLS to β̂_Bayes 2 is 20.7%.

3. In the simple logistic model with triglyceride as the independent variable we observe that

   a) For males, there is an increase in magnitude from β̂_IWLS to β̂_X̄ to β̂_Bayes 2, along with a corresponding decrease in p-value. The increase in magnitude from β̂_IWLS to β̂_Bayes 2 is 45.5%.

   b) For females, there is a smaller increase in magnitude (11.9%) from β̂_IWLS to β̂_Bayes 2 than for males. Also, the p-value increases from β̂_IWLS to β̂_Bayes 2.

   c) For both males and females the model coefficient is significantly different from zero (α = 0.05) using β̂_IWLS, β̂_X̄, or β̂_Bayes 2.

   d) For the female data set the β̂_TSLS estimates are larger in magnitude than β̂_IWLS, but have increased p-values, although they remain significant. For the male data set, the β̂_TSLS estimates are larger in magnitude than the β̂_IWLS estimates except for the case in which HDL is the instrumental variable. None of the TSLS estimates suggest that the model coefficient for triglyceride is significant using the male data set.

   e) The grouping and instrumental variable estimators perform poorly for both the male and female data sets. For β̂_IV in the male data set, there is no gain in magnitude. There is a gain in magnitude from β̂_IWLS to β̂_GRP and from β̂_IWLS to β̂_IV (in the female data set). These gains are offset by large increases in variance.

4. In the multiple logistic model with triglyceride level and age as the independent variables, we observe that

   a) There is an increase in magnitude from β̂_IWLS to β̂_TSLS for both data sets except when Quetelet index is used as the instrumental variable (male data set). There is also an increase in magnitude from β̂_IWLS to β̂_X̄ to β̂_Bayes 2.

   b) Except for β̂_TSLS (HDL, QI) in the male data set, and β̂_X̄ and β̂_Bayes 2 in the female data set, the coefficient corresponding to triglyceride in the model is significantly different from zero (α = 0.05).
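The percentage changes quoted above can be reproduced directly from the tabled point estimates; a quick consistency check in Python (for illustration only):

```python
# point estimates transcribed from Tables 5.3 and 5.4
b_iwls_m, b_xbar_m, b_bayes2_m = 0.00254, 0.00402, 0.00448   # males, Table 5.3
b_iwls_f, b_md_f = 0.01381, 0.01959                          # females, Table 5.3
b_iwls_f4, b_bayes2_f4 = 0.00783, 0.00945                    # females, Table 5.4

def pct_increase(new, old):
    """Percent increase in magnitude, rounded to one decimal."""
    return round((abs(new) / abs(old) - 1) * 100, 1)

assert pct_increase(b_xbar_m, b_iwls_m) == 58.3      # comparison 1(a)
assert pct_increase(b_bayes2_m, b_iwls_m) == 76.4    # comparison 1(a)
assert pct_increase(b_md_f, b_iwls_f) == 41.9        # comparison 1(b)
assert pct_increase(b_bayes2_f4, b_iwls_f4) == 20.7  # comparison 2(c)
```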
5.4 Conclusions

The simulation studies of the previous chapters showed that on the average β̂_MD, β̂_Bayes 1, β̂_X̄, and β̂_Bayes 2 perform better than β̂_IWLS. They also showed that if the correlation between the independent variable and an instrumental variable was high enough, β̂_GRP, β̂_IV, and β̂_TSLS would perform well. By 'perform well' we mean have reduced bias and MSE. Since the bias was shown in Chapter II to always be towards the null, we demonstrate the reduction in bias offered by the proposed alternative estimators as an increase in their magnitude over that of β̂_IWLS. We also expected, based on simulation results, to demonstrate situations in which the p-value corresponding to H₀: β = 0 decreases when one uses the alternative estimators rather than β̂_IWLS. We were able to demonstrate this in most cases for β̂_MD, β̂_Bayes 1, β̂_X̄, and β̂_Bayes 2. However, when looking at the estimators requiring an instrumental variable, the p-value actually increased. This is probably a result of one or more of the following:

1. The correlation between triglyceride and the instrumental variables was too small, resulting in a large estimated variance.

2. The variance used for β̂_IV in this demonstration is conditional on the observed value of triglyceride and conditional on the instrumental variable. This may not be an appropriate estimate of the variance.
TABLE 5.3 Simple Logistic Model With Total Cholesterol Level at
          Visit 2 as the Independent Variable

Estimator                 Males (n=2508)    Females (n=1663)
β̂_IWLS                      0.00254            0.01381
  Std. Error                 0.00247            0.00277
  P-value                   (0.30)             (<0.0001)
β̂_MD                        0.00308            0.01959
  Std. Error                 0.00275            0.00356
  P-value                   (0.26)             (<0.0001)
β̂_Bayes 1                   0.00301            0.01575
  Std. Error                 0.00292            0.00316
  P-value                   (0.30)             (<0.0001)
β̂_X̄                        0.00402            0.01489
  Std. Error                 0.00258            0.00284
  P-value                   (0.12)             (<0.0001)
β̂_Bayes 2                   0.00448            0.01617
  Std. Error                 0.00287            0.00309
  P-value                   (0.12)             (<0.0001)
TABLE 5.4 Multiple Logistic Model With Total Cholesterol at
          Visit 2 and Age as the Independent Variables

Estimator                 Males (n=2508)    Females (n=1663)
β̂_IWLS                     -0.00143            0.00783
  Std. Error                 0.00272            0.00314
  P-value                   (0.60)             (0.01)
β̂_MD                       -0.00647            0.00630
  Std. Error                 0.00282            0.00401
  P-value                   (0.02)             (0.12)
β̂_Bayes 2                  -0.00014            0.00945
  Std. Error                 0.00316            0.00357
  P-value                   (0.97)             (0.01)
TABLE 5.5 Simple Logistic Model with Triglyceride Level at
          Visit 2 as the Independent Variable

Estimator                 Males (n=2508)    Females (n=1663)
β̂_GRP,HDL                   0.00185            0.00901
  Std. Error                 0.00488            0.00644
  P-value                   (0.70)             (0.16)
β̂_IV,HDL                   -0.00043            0.00601
  Std. Error                 0.00288            0.00681
  P-value                   (0.88)             (0.38)
β̂_IV,LHDL                   0.00095            0.00601
  Std. Error                 0.00254            0.00619
  P-value                   (0.71)             (0.33)
β̂_IV,QI                    -0.00114            0.00642
  Std. Error                 0.00392            0.00959
  P-value                   (0.77)             (0.50)
TABLE 5.5 CONTINUED

Estimator                 Males (n=2508)    Females (n=1663)
β̂_TSLS,HDL                  0.00114            0.01328
  Std. Error                 0.00280            0.00658
  P-value                   (0.68)             (0.04)
β̂_TSLS,LHDL                 0.00169            0.01170
  Std. Error                 0.00249            0.00514
  P-value                   (0.50)             (0.02)
β̂_Bayes 2                   0.00243            0.00394
  Std. Error                 0.00081            0.00128
  P-value                   (<0.0001)          (<0.0001)
TABLE 5.6 Multiple Logistic Model with Triglyceride Level at
          Visit 2 and Age as the Independent Variables

Estimator                 Males (n=2508)    Females (n=1663)
β̂_IWLS                      0.00200            0.00207
  Std. Error                 0.00062            0.00106
  P-value                   (<0.0001)          (0.05)
β̂_TSLS,HDL                  0.00471            0.01622
  Std. Error                 0.00290            0.00642
  P-value                   (0.10)             (0.01)
β̂_TSLS,LHDL                 0.00618            0.01324
  Std. Error                 0.00247            0.00498
  P-value                   (<0.0001)          (0.01)
β̂_TSLS,QI                   0.00094            0.01471
  Std. Error                 0.00400            0.00684
  P-value                   (0.82)             (0.03)
β̂_X̄                        0.00254            0.00205
  Std. Error                 0.00072            0.00112
  P-value                   (<0.0001)          (0.07)
β̂_Bayes 2                   0.00306            0.00225
  Std. Error                 0.00087            0.00123
  P-value                   (<0.0001)          (0.07)
CHAPTER VI
SUMMARY AND SUGGESTIONS FOR FURTHER RESEARCH
In this work it has been determined that the iterative weighted least squares (IWLS) estimator of the model coefficients in logistic regression is not unbiased when measurement error is present in the independent variable. Several estimators have been proposed as alternatives to the IWLS estimator. Of the estimators discussed, the ones requiring an external estimate of the variance of the measurement error and the ones requiring multiple measurements of the independent variable performed best in terms of having smaller bias and MSE than the IWLS estimator. The estimators requiring an instrumental variable perform well only if the correlation between the instrumental variable and the independent variable is high. The applications of the instrumental variable and grouping estimates also demonstrate that they are not likely to be very useful in practice.

The major focus of this work has been the simple logistic regression model, although some mention of bias in the multiple logistic regression model was made. A more complete investigation of the effect of measurement error on the estimation of coefficients in the multiple logistic regression model would be of value, especially when some, but not all, variables are measured without error. Most of the estimators discussed can also be used in the multiple case; however, further work is needed to extend the iterative instrumental variable estimator to the multiple case. It would also be worthwhile to develop an unconditional estimate of the variance of this estimator.
BIBLIOGRAPHY

Barnett, V.D. 1967. A Note on Linear Structural Relationships when Both Residual Variances are Known. Biometrika 54, 670-672.

Basmann, R.L. 1957. A Generalized Classical Method of Linear Estimation of Coefficients in a Structural Equation. Econometrica 25, 77-84.

Berkson, J. 1950. Are There Two Regressions? Journal of the American Statistical Association 45, 164-180.

Cochran, W.G. 1968. Errors of Measurement in Statistics. Technometrics 10, 637-666.

Durbin, J. 1954. Errors in Variables. Review of the International Statistical Institute 22, 23-32.

Ederer, F. 1972. Serum Cholesterol Changes: Effects of Diet and Regression Toward the Mean. Journal of Chronic Diseases 25, 277-289.

Gardner, M.J. and Heady, J.A. 1973. Some Effects of Within-Person Variability in Epidemiological Studies. Journal of Chronic Diseases 26, 781-795.

Halperin, M., Blackwelder, W.C., and Verter, J.I. 1970. Estimation of the Multivariate Logistic Risk Function: A Comparison of the Discriminant Function and Maximum Likelihood Approaches. Journal of Chronic Diseases 24, 125-158.

Heiss, G., Tamir, I., Davis, C.E., Tyroler, H.A., Rifkind, B.M., Schonfeld, G., Jacobs, D., and Frantz, I.D. 1980. Lipoprotein-Cholesterol Distributions in Selected North American Populations: The Lipid Research Clinics Program Prevalence Study. Circulation 61, No. 2, 302-315.

Housner, G.W. and Brennan, J.F. 1948. The Estimation of Linear Trends. Annals of Mathematical Statistics 19, 380-388.

James, W. and Stein, C. 1961. Estimation with Quadratic Loss. Proceedings of the Fourth Berkeley Symposium 1, University of California Press, 361-379.

Kiefer, J. 1964. Review of Kendall and Stuart's Advanced Theory of Statistics, II. Annals of Mathematical Statistics 35, 1371-1380.

Kleijnen, J.P.C. 1974. Statistical Techniques in Simulation: Part 1. Marcel Dekker, Inc., New York.

Lindley, D.V. and Smith, A.F.M. 1972. Bayes Estimates for Linear Models. Journal of the Royal Statistical Society B 34, 1-41.

Madansky, A. 1959. The Fitting of Straight Lines when Both Variables are Subject to Error. Journal of the American Statistical Association 54, 173-205.

Moran, P.A.P. 1971. Estimating Structural and Functional Relationships. Journal of Multivariate Analysis 1, 232-255.

Neyman, J. and Scott, E. 1951. On Certain Methods of Estimating the Linear Structural Relationship. Annals of Mathematical Statistics 22, 352-361. (Correction 23, 135.)

Snedecor, G.W. and Cochran, W.G. 1967. Statistical Methods. 6th Edition, The Iowa State University Press, Ames, Iowa.

Theil, H. 1961. Economic Forecasts and Policy. 2nd Edition, North-Holland Publishing Company, Amsterdam.

Truett, J., Cornfield, J., and Kannel, W. 1967. A Multivariate Analysis of the Risk of Coronary Heart Disease in Framingham. Journal of Chronic Diseases 20, 511-524.

Tukey, J.W. 1951. Components in Regression. Biometrics 7, 33-69.

Wald, A. 1940. The Fitting of Straight Lines if Both Variables are Subject to Error. Annals of Mathematical Statistics 11, 284-300.

Walker, S.H. and Duncan, D.B. 1967. Estimation of the Probability of an Event as a Function of Several Independent Variables. Biometrika 54, 167-179.
PROGRAMS USED TO GENERATE SAMPLES USED IN SIMULATION STUDIES

A. THE FOLLOWING SAS STATEMENTS WERE USED TO GENERATE 150
SAMPLES OF 500 OBSERVATIONS EACH USED TO COMPUTE THE
BIAS FOR VARIOUS MODELS AND DISTRIBUTIONS OF THE TRUE
INDEPENDENT VARIABLE U. THE VARIABLES ARE

U = TRUE INDEPENDENT VARIABLE
LU = MEASUREMENT ERROR TERM
F = A UNIFORM(0,1) VARIABLE USED TO COMPUTE Y
X = OBSERVED INDEPENDENT VARIABLE
Y = OBSERVED DEPENDENT VARIABLE

1. U DISTRIBUTED AS A NORMAL(0,1) VARIABLE, ALPHA=1, BETA=1.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
U=NORMAL(593211);
LU=NORMAL(139213);
F=UNIFORM(221137);
X=U + LU;
P=1/(1 + EXP(-1-U));
IF F<P THEN Y=1; ELSE Y=0; OUTPUT; END; END;

2. U DISTRIBUTED AS A NORMAL(0,1) VARIABLE, ALPHA=1, BETA=-1.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
U=NORMAL(746923);
LU=NORMAL(145269);
X=U + LU;
F=UNIFORM(952633);
P=1/(1 + EXP(-1+U));
IF F<P THEN Y=1; ELSE Y=0; OUTPUT; END; END;
3. U IS CONDITIONALLY DISTRIBUTED AS A NORMAL(1,1) VARIABLE
GIVEN Y=0 AND A NORMAL(2,1) VARIABLE GIVEN Y=1, ALPHA=-1.5,
BETA=1, P(Y=1)=0.5.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
IF 1<=I<=250 THEN U=NORMAL(879211) + 2;
ELSE U=NORMAL(879211) + 1;
LU=NORMAL(567887);
X=U + LU;
F=UNIFORM(567777);
P=1/(1 + EXP(1.5-U));
IF F<P THEN Y=1; ELSE Y=0; OUTPUT; END; END;

4. U IS DISTRIBUTED AS AN EXPONENTIAL(1) VARIABLE,
ALPHA=1, BETA=1.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
R=UNIFORM(102081); U=-LOG(R);
LU=NORMAL(684911);
F=UNIFORM(111111);
X=U + LU;
P=1/(1 + EXP(-1-U));
IF F<P THEN Y=1; ELSE Y=0; OUTPUT; END; END;

5. U IS CONDITIONALLY DISTRIBUTED AS AN EXPONENTIAL(1)
VARIABLE GIVEN Y=0 AND AN EXPONENTIAL(2) VARIABLE GIVEN Y=1,
ALPHA=-0.6932, BETA=0.5, P(Y=1)=0.5.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
R=UNIFORM(786421);
IF 1<=I<=250 THEN U=-2*LOG(R); ELSE U=-LOG(R);
LU=NORMAL(567887);
X=U + LU;
F=UNIFORM(567777);
P=1/(1 + EXP(.6932 - .5*U));
IF F<P THEN Y=1; ELSE Y=0; OUTPUT; END; END;
6. U1,U2 ARE BIVARIATE NORMALS WITH MEANS 0, VARIANCES 1,
AND CORRELATION=0.25. ALPHA=1, BETA1=1, BETA2=1.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
U1=NORMAL(929673);
C=NORMAL(782347);
U2=.25*U1 + .9683*C;
LU2=NORMAL(237819);
X1=U1;
X2=U2 + LU2;
F=UNIFORM(543221);
P=1/(1 + EXP(-1-U1-U2));
IF F<P THEN Y=1; ELSE Y=0; OUTPUT; END; END;

7. U1,U2 ARE BIVARIATE NORMALS WITH MEANS 0, VARIANCES 1,
AND CORRELATION=0.75. ALPHA=1, BETA1=1, BETA2=1.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
U1=NORMAL(929673);
C=NORMAL(182347);
U2=.75*U1 + .6614*C;
LU2=NORMAL(237819);
X1=U1;
X2=U2 + LU2;
F=UNIFORM(543221);
P=1/(1 + EXP(-1-U1-U2));
IF F<P THEN Y=1; ELSE Y=0; OUTPUT; END; END;
B. THE FOLLOWING SAS STATEMENTS WERE USED TO GENERATE
150 SAMPLES OF 500 OBSERVATIONS EACH USED TO COMPUTE
THE VARIOUS ESTIMATORS OF BETA. THE VARIABLES ARE
DESCRIBED IN A, WITH THE FOLLOWING ADDITIONS:

W = A NORMAL(0,1) VARIABLE USED TO COMPUTE Z
Z = THE INSTRUMENTAL VARIABLE
RHO = CORRELATION BETWEEN U AND Z
1. GENERATES SAMPLES USED TO COMPUTE THE INSTRUMENTAL
VARIABLES ESTIMATOR, THE GROUPING ESTIMATOR, THE TWO-STAGE
LEAST SQUARES ESTIMATOR AND THE MODIFIED DISCRIMINANT
FUNCTION ESTIMATOR.

DATA START;
DO RHO=0.2,0.5,0.9;
DO K=1 TO 150;
DO I=1 TO 500;
U=NORMAL(543217);
W=NORMAL(149081);
Z=RHO*U + ((1-RHO**2)**0.5)*W;
LU=NORMAL(134273);
X=U + LU;
F=UNIFORM(221137);
P=1/(1 + EXP(-1-U));
IF F<P THEN Y=1; ELSE Y=0; OUTPUT; END; END; END;
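The construction Z = RHO*U + SQRT(1-RHO**2)*W yields corr(U, Z) = RHO when U and W are independent standard normals. A Python analogue of the SAS step above (hypothetical seed, for illustration only) checks this empirically:

```python
import random

random.seed(19820701)          # hypothetical seed
rho, n = 0.5, 20000
u = [random.gauss(0, 1) for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(n)]
z = [rho * ui + (1 - rho ** 2) ** 0.5 * wi for ui, wi in zip(u, w)]

def corr(a, b):
    """Sample product-moment correlation."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    sa = sum((v - ma) ** 2 for v in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (sa * sb)

# the sample correlation should be close to the target rho
assert abs(corr(u, z) - rho) < 0.03
```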
2. GENERATES SAMPLES USED TO COMPUTE ESTIMATORS REQUIRING
MULTIPLE MEASUREMENTS OF THE INDEPENDENT VARIABLE.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
U=NORMAL(543217);
LU1=NORMAL(134273);
LU2=NORMAL(410069);
LU3=NORMAL(546334);
LU4=NORMAL(755455);
LU5=NORMAL(374211);
LU6=NORMAL(198229);
LU7=NORMAL(765431);
LU8=NORMAL(194581);
X1=U + LU1;
X2=U + LU2;
X3=U + LU3;
X4=U + LU4;
X5=U + LU5;
X6=U + LU6;
X7=U + LU7;
X8=U + LU8;
XBAR=MEAN(OF X1-X8);
V=VAR(OF X1-X8);
OUTPUT; END; END;
3. GENERATES SAMPLES WITH TWO MEASUREMENTS TAKEN
ON ONLY A SUBSAMPLE OF 50 OBSERVATIONS PER SAMPLE.

DATA START;
DO K=1 TO 150;
DO I=1 TO 500;
U=NORMAL(543217);
LU1=NORMAL(134273);
X1=U + LU1;
IF I<51 THEN DO;
LU2=NORMAL(410069);
X2=U + LU2;
V=VAR(OF X1-X2);
END;
F=UNIFORM(221137);
P=1/(1 + EXP(-1-U));
IF F<P THEN Y=1; ELSE Y=0;
OUTPUT; END; END;