California State University, Northridge

A COMPARISON OF MULTIVARIATE RELATIVE RISK FUNCTIONS

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Health Science

by

Felix Olu Adedeji

August, 1976

The thesis of Felix Olu Adedeji is approved:
California State University, Northridge
August, 1976

ACKNOWLEDGEMENTS
I wish to thank Doctor Bernard Hanes for his continuous encouragement and advice in bringing this study to completion.

I also want to thank Doctor Roberta Solomon for her valuable assistance and advice.

A special thanks to Olu~ayo and Olabisi for their cooperation and for bearing with me during my graduate studies.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
INTRODUCTION
    The Models
LITERATURE REVIEW
STUDY METHODS
    Method of Analysis
    Sampling Procedure
    Data Set 1
    Data Set 2
    Data Set 3
STATISTICAL METHODS
    Traditional Relative Risk Method
    Multiple Logistic Risk Function
    Multiple Linear Regression Risk Function
RESULTS AND DISCUSSION
CONCLUSION
BIBLIOGRAPHY
APPENDICES
    A. Relative Risks
    B. The Multiple Logistic Risk Function
    C. Multiple Linear Regression Risk Function
    D. Numerical Example

ABSTRACT

A COMPARISON OF MULTIVARIATE RELATIVE RISK FUNCTIONS

by

Felix Olu Adedeji

Master of Science in Health Science

In case-control epidemiological studies, where the risk of a disease in the general population is assumed to be unknown, multiple relative risks are obtained for two multivariate models, (1) the multiple logistic risk function and (2) the multiple linear regression risk function, and for the traditional univariate model.

Comparisons were made of the two multivariate models to each other and to the traditional relative risk method, under three risk conditions. The three risk conditions were: (1) small, between 1 and 5, (2) moderate, between 5 and 10, and (3) large, above 10.

There was no significant difference between the magnitude and ordering of the relative risks obtained by the two multivariate models and the traditional method when the risks were small. However, when the risks were large, the estimated relative risks obtained by the two multivariate models became erratic.

INTRODUCTION

In case-control epidemiological studies, where two or more risk factors are implicated in the etiology of a disease, multiple relative risk ratios can be obtained where the risk of the disease in the population is assumed to be unknown (1-8).

When two or more risk factors are involved in determining an estimate of a relative risk ratio, there is a choice between two multivariate models, (1) the multiple logistic risk function and (2) the multiple linear regression risk function, and the univariate approximate relative risk model. This study describes the behavior of these models under three risk conditions: (1) small, between 1 and 5, (2) moderate, between 5 and 10, and (3) large, above 10.

This study uses three sets of data obtained from the Kaiser Foundation Hospital, Los Angeles, California, between January 1, 1974 and September 1, 1975. These sets of data contain the three risk conditions stated above.

THE MODELS

The multiple logistic risk function model assumes multivariate normal distribution for the risk factors (X_1, X_2, ..., X_k), equal variance-covariance and different means in the cases and the control group. The multiple logistic risk function advanced by Cornfield (9) is given as:

$$\ln\frac{P}{1-P} = B_0 + \sum_{i=1}^{k} B_i X_i \qquad (i = 1, 2, \ldots, k)$$

where P denotes the proportion of individuals with the risk factor who belong to the cases.

Several researchers have advanced different methods for estimating the parameters B_0 and B_i (12-14). This study uses the linear discriminant function method to estimate these parameters, as proposed by Cornfield. The model uses the pooled covariance matrix in estimating B_0 and B_i. Although this approach has been criticized by Halperin et al. (5), Seigel et al. (8) noted that the discriminant function theory approach tends to give correct estimates of the parameters if the multivariate normality of the risk factors is not violated. In this study, the risk factors are assumed to be multivariate normal.

The second model, the multiple linear regression risk function, assumes multivariate normal distribution for the risk factors, with different variances and different means in the cases and the control group. The variance of the cases in the general population is also assumed to be so small that it is negligible. Thus, only the covariance matrix of the control group is used in estimating the regression coefficients B_i (i = 0, 1, ..., k) by the method of least squares. The model advanced by Feldstein (18) is given by:

$$Y = B_0 + \sum_{i=1}^{k} B_i \left(X_i - E(X_i)\right) \qquad (i = 1, 2, \ldots, k)$$

where Y denotes the risk function.

The models discussed are applied to the three sets of data that contain the three risk conditions under investigation. A comparison of the relative risks obtained by the two multivariate models under the three conditions is made with those obtained by the traditional approximate relative risk. The approximate relative risk (see appendix A) has often been regarded as adequate for research of etiologic relationships (3,4,7,11), when the proportion of individuals with the disease in question is assumed to be unknown, but small, in the general population.

LITERATURE REVIEW

The traditional method available to calculate relative risks is based on reducing each risk factor X to binary form, resulting in analysis of one or more two by two tables. If one were interested in the marginal effect of only one risk factor X, the table would appear as follows:

TABLE I

Risk factor X      Cases    Controls
Present              a         b
Absent               c         d

a, b, c, d are frequencies.

The relative risk = (a/(a + b))/(c/(c + d)). If, as is often true, the proportion of the diseased persons is small, then a is small in relation to b, and c is small in relation to d (see appendix A). The denominators then reduce to b and d, yielding:

$$\text{Relative Risk} = \frac{a/b}{c/d} = \frac{ad}{bc} \qquad (1)$$

as an approximation of the relative risk. The expression is called the odds ratio because the quantities can be considered as the odds in favor of having the disease with the factor present and with the factor absent, respectively.
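
The approximation in equation (1) can be checked numerically. The following is a minimal sketch, not part of the original study: the function names and the four frequencies are hypothetical, chosen only so that a is small relative to b and c is small relative to d.

```python
def relative_risk(a, b, c, d):
    """Relative risk (a/(a+b)) / (c/(c+d)) for a 2 x 2 table."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Approximate relative risk ad/bc of equation (1)."""
    return (a * d) / (b * c)


# Hypothetical frequencies (not study data), with a << b and c << d.
a, b, c, d = 30, 9970, 10, 9990

print(round(relative_risk(a, b, c, d), 3))
print(round(odds_ratio(a, b, c, d), 3))
```

With frequencies of this kind the two quantities differ only in the third decimal place; the gap widens as a and c grow relative to b and d.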

If one were interested in the effect of two or more risk factors simultaneously, the traditional method may also be used. But one can only compare two factors at a time, and where there are several risk factors this approach is cumbersome. As a result, two mathematical models, (1) the multiple logistic risk function and (2) the linear regression risk function, have been developed to handle two or more risk factors simultaneously.

The logistic function has been variously called the "growth function", the "autocatalytic curve" and other terms, according to the application to which it was put. It was rediscovered for the description of population growth by Pearl and Reed (20) who, following Verhulst, called it the "logistic function". The logistic function is given by:

$$E(Y) = \frac{1}{1 + \exp\left(-B_0 - \sum_{i=1}^{k} B_i X_i\right)} \qquad (2)$$

where i = 1, 2, ..., k.

Cornfield, starting with multivariate normal distribution assumptions, and using Bayes theorem with prior probabilities P_0 for Y = 0 (disease absent) and P_1 for Y = 1 (disease present), advanced the following multivariate logistic model for the probability of Y, of an individual who has risk factors X_1, X_2, ..., X_k:

$$E(Y) = \frac{1}{1 + \exp\left(-B_0 - \sum_{i=1}^{k} B_i X_i\right)} \qquad (3)$$

where B_0 and B_i are discriminant function coefficients. The curve of equation (3) is S shaped, starting at E(Y) = 0 and increasing to a level of E(Y) = 1. Simple algebraic reduction of equation (3) (see appendix B) to its linear form gives:

$$\ln\frac{E(Y)}{1 - E(Y)} = B_0 + \sum_{i=1}^{k} B_i X_i \qquad (4)$$

and the parameters B_0 and B_i (i = 1, 2, ..., k) are estimated using the classic linear discriminant function theory approach. That is,

$$B_i = \sum_{c=1}^{k} \sigma^{ic} d_c$$

in which \sigma^{ic} is the element in the ith row and cth column of the inverse of the pooled covariance matrix, and d_c = the difference in the value of the means in the two populations for the cth X risk factor. The intercept B_0 is estimated by

$$B_0 = \frac{1}{2}\sum_{i=1}^{k} B_i d_i - \ln\frac{P_0}{P_1}$$

where P_0 is the probability of not getting the disease and P_1 is the probability of getting the disease. It is clear that some information on the probability of disease is required to estimate B_0, but it appears nowhere else. Since this probability is not known in a case-control study and no estimate of it is available, the relative risk can only be estimated when it is assumed to be very small, in which case the term ln(P_0/P_1) drops out and B_0 is approximately

$$B_0 = \frac{1}{2}\sum_{i=1}^{k} B_i d_i$$
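
For reference, the algebraic reduction of the logistic form to its linear form can be written out in a few lines. This is a sketch of the standard argument rather than a reproduction of appendix B; the shorthand z for the linear predictor is introduced here only for brevity and does not appear in the original.

```latex
\begin{align*}
z &= B_0 + \sum_{i=1}^{k} B_i X_i, \qquad E(Y) = \frac{1}{1 + e^{-z}} \\
1 - E(Y) &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{E(Y)}{1 - E(Y)} &= \frac{1}{e^{-z}} = e^{z} \\
\ln\frac{E(Y)}{1 - E(Y)} &= z = B_0 + \sum_{i=1}^{k} B_i X_i
\end{align*}
```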

The use of the multivariate logistic risk function with discriminant coefficients to predict an individual's risk of disease based on his risk factors has encountered some criticism. The criticism is based on the fact that the estimates of the parameters were achieved only after the independent factors were transformed to achieve approximate normality. Thus, others (10-13,5,7) postulate equation (4) as an appropriate model but with B_0 and B_i estimated by least squares or maximum likelihood iterative procedures.

A brief review of these approaches will now be presented. Halperin et al. (5) pointed out that the maximum likelihood method is not affected by the distribution of the risk factors, whatever they may be. They further went on to compare the requirements imposed on the fitting by the maximum likelihood equation, which is given by:

$$P = 1 - E(Y) = \frac{1}{1 + \exp\left(-B_0 - \sum_{i=1}^{k} B_i X_i\right)} \qquad (5)$$

(where equation (5) is from equation (3), and P or 1 - E(Y) is the risk of disease for individuals with or without the risk factor), with those imposed by the discriminant function approach. They also compared the asymptotic (large numbers exposed to risk) behavior of the maximum likelihood estimates in the case of one, two or three binary independent factors. Halperin et al. concluded that the Cornfield approach (discriminant function theory method) is theoretically correct if the distribution is multivariate normal for the cases and for the controls, with equal variance-covariance matrices in the two populations.

Halperin et al. summarized their findings by saying that, "On theoretical grounds the maximum likelihood method is preferable, since it does not assume any particular distribution for X_1, X_2, ..., X_k and it gives results which asymptotically converge to the proper values if the logistic model holds" (5).

Similarly, Walker and Duncan (12), Bradley and Gart (13) and Seigel et al. (8) suggested a least squares approach to estimate the coefficients B_0 and B_i of the multiple logistic function. The least squares approach gives unbiased estimates of the coefficients, but Halperin et al. pointed out that the least squares method is equivalent to the maximum likelihood method but less efficient. In comparing the least squares method and the discriminant function method, Halperin et al. noted that the Cornfield approach is non-iterative and computationally simpler than the Walker and Duncan least squares iteration approach.

An even simpler model, the multiple linear regression, developed by Feldstein (18), is given by:

$$Y = B_0 + \sum_{i=1}^{k} B_i \left(X_i - E(X_i)\right) \qquad (6)$$

where Y denotes the risk function. Unbiased estimates of the regression coefficients B_i, i = 0, 1, 2, ..., k, are obtained from the least squares fitting procedure. A detailed derivation of the model is presented in appendix C.

The two multivariate models reviewed above are used in this study under three different conditions. The relative risks they yield are compared with those obtained by the traditional approximate relative risk method.

STUDY METHODS
The three sets of data employed in this study are those of the Kaiser Foundation Hospital, Los Angeles, California. The data are used to observe the behavior of the two multivariate models, (1) the multiple logistic risk function and (2) the multiple linear regression risk function, and of the traditional approximate relative risk method. Each set of data provides a unique condition; for example, data set 1 provides a condition where the risks themselves are small. This study addresses the comparison, in terms of magnitude and ordering, of the relative risks obtained by the three methods. It makes this comparison by applying the three methods to each data set under the assumption that the risk of disease in the general population is unknown.

METHOD OF ANALYSIS
SAMPLING PROCEDURE
A computerized data file is initiated for any person treated at, or admitted to, the hospital, based on information obtained and the physician's diagnosis. A computerized record linkage technique continuously updates the files.

The computer program used (24) to select the samples from the data files can conveniently retrieve any information in detailed form or randomly by using the word "sort" or "select". This word is immediately followed by the criterion for selection. If one is interested in selecting a random sample, a further distinction must be specified in the criterion. A seven digit number assigned to each patient as an identification number for updating the files is used to make this distinction.

Snedecor and Cochran (23) outline a simple way of selecting a random sample using random number tables, with each member of the sample having equal probability of being chosen. Seven digit numbers are treated as random numbers, and applied as in the following example: since the sample sizes in this study are 175 for the cases and 175 for the controls in each data set, the computer was asked to select any 175 patients with the disease in question whose last three digits of the seven digit number do not exceed 175. If the last three digits of the seven digit number of a patient are 207, the patient will not be selected, and a patient with, say, 107 will be selected. The same selection criterion was stated for the controls, except that they were selected from among the patients admitted for other disorders, but within the same period.
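
The selection rule described above can be sketched in a few lines. This is an illustration only: the original selection was carried out with the hospital's data analyzer program (24), and the function names, the random seed and the list of identification numbers below are hypothetical.

```python
import random


def last_three_digits(patient_id):
    """Last three digits of a seven digit identification number."""
    return patient_id % 1000


def select_sample(patient_ids, cutoff=175, sample_size=175):
    """Keep IDs whose last three digits do not exceed `cutoff`,
    stopping once `sample_size` patients have been selected."""
    selected = []
    for pid in patient_ids:
        if last_three_digits(pid) <= cutoff:
            selected.append(pid)
            if len(selected) == sample_size:
                break
    return selected


# Hypothetical file of seven digit identification numbers (not study data).
rng = random.Random(0)
patient_ids = rng.sample(range(1_000_000, 10_000_000), 5000)
sample = select_sample(patient_ids)

# The worked example from the text: 207 is rejected, 107 is accepted.
print(select_sample([1234207, 1234107]))
```

The rule treats the trailing digits of the identification number as if they were random, which is the assumption the text borrows from the random number table method.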
DATA SET 1

This data set provides the situation where the risks themselves are small. Sartwell (22) and Seigel (7) found some degree of association between smoking and oral contraceptive use with thromboembolic disorders in women. They noted that the risk of acquiring the disease for users of oral contraceptives is about five times higher than that of a smoker who does not use contraceptives.

Between January 1, 1974 and September 1, 1975 there were 175 women admitted for thromboembolic disorders in Kaiser Foundation Hospital. The control group was selected from women admitted for other diseases during the same period. The numbers of cases and controls who are either smokers or users of oral contraceptives, or both, or neither, are presented in Table II.

TABLE II

X1     X2     No. of cases    No. of controls
no     yes         22                5
yes    no          56               86
yes    yes         45               18
no     no          52               66

X1 = smoking
X2 = oral contraceptive users

DATA SET 2
This data set provides the situation where the risks are moderate. Various observers (25,26) have recognized the risk factors associated with oversized (>4,~0~ grams) infants. Published reports have shown that maternal diabetes mellitus and a prior delivery of a large infant appear to be the most consistent factors associated with oversized infants.

There were 210 cases of oversized infants delivered within the same period at Kaiser Foundation Hospital, Los Angeles. In order to be consistent, 175 cases were selected.

Table III shows the distribution of cases (mothers of oversized infants) and controls (mothers of non-oversized infants) who are either diabetic or with history of large infant, or both, or neither.

TABLE III

X1     X2     No. of cases    No. of controls
no     yes         64               15
yes    no          60               80
yes    yes         13               20
no     no          38               60

X1 = maternal diabetes
X2 = history of delivery of large infant

DATA SET 3
This data set provides the situation where the risks themselves are large. Obesity and a parental history of diabetes are among the independent risk factors strongly associated with diabetes mellitus, as noted by many other observers (27-29).

Obesity and parental history of diabetes were investigated in 175 cases of diagnosed diabetics and 175 hospital patients who had been diagnosed for other diseases, at Kaiser Foundation Hospital, between January 1, 1974 and September 1, 1975. Table IV shows the distribution of the cases and controls who are obese and/or with parental history of diabetes, or neither.

TABLE IV

X1     X2     No. of cases    No. of controls
no     yes         47               32
yes    no          35               40
yes    yes         70               21
no     no          23               82

X1 = obesity
X2 = parental history of diabetes

STATISTICAL METHODS

TRADITIONAL RELATIVE RISK METHOD

This method does not assume any kind of distribution for the risk factors X_1 and X_2 in each data set. The data are placed in a 2 x 2 table as in Table I, and the relative risk is estimated as follows (see appendix A):

$$\text{Relative Risk} = \frac{ad}{bc}$$

where a = the number of cases with the risk factor
      b = the number of controls with the risk factor
      c = the number of cases without the risk factor
      d = the number of controls without the risk factor
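
As an illustration of the ad/bc calculation, the sketch below applies it to the frequencies of Tables II, III and IV, comparing each category of (X_1, X_2) with the no,no group. Treating the no,no group as the "factor absent" row of the 2 x 2 table is an assumption made here, and the function and variable names are hypothetical.

```python
# (cases, controls) keyed by (X1, X2), copied from Tables II, III and IV.
DATA_SETS = {
    "Data Set 1": {("no", "yes"): (22, 5), ("yes", "no"): (56, 86),
                   ("yes", "yes"): (45, 18), ("no", "no"): (52, 66)},
    "Data Set 2": {("no", "yes"): (64, 15), ("yes", "no"): (60, 80),
                   ("yes", "yes"): (13, 20), ("no", "no"): (38, 60)},
    "Data Set 3": {("no", "yes"): (47, 32), ("yes", "no"): (35, 40),
                   ("yes", "yes"): (70, 21), ("no", "no"): (23, 82)},
}


def traditional_relative_risks(table, reference=("no", "no")):
    """Odds ratio ad/bc of every category against the reference category."""
    c, d = table[reference]  # cases and controls in the reference group
    return {category: (a * d) / (b * c) for category, (a, b) in table.items()}


for name, table in DATA_SETS.items():
    risks = traditional_relative_risks(table)
    print(name, {category: round(value, 2) for category, value in risks.items()})
```

Under this reading the values come out close to those reported for the traditional method in the results tables, for example roughly 5.6, 0.8 and 3.2 for Data Set 1.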

MULTIPLE LOGISTIC RISK FUNCTION METHOD

This method uses a discriminant function approach to estimate the values of B_0, B_1 and B_2 for the two risk factors in each data set. It assumes multivariate normal distribution for the risk factors (X_1, X_2) for both the controls and the cases. It also assumes equal variance-covariance but different means in both populations. The risk function is given by:

$$\ln\frac{P}{1-P} = B_0 + B_1 X_1 + B_2 X_2$$

where B_1 and B_2 are estimated from the elements in the inverse matrix of the pooled covariance matrix. This is given by:

$$B_i = \sum_{c=1}^{2} \sigma^{ic} d_c \qquad (i = 1, 2)$$

in which \sigma^{ic} is the element in the ith row and cth column of the inverse of the pooled covariance matrix and d_c is the difference in the value of the means in the two populations for the cth X risk factor. Also, B_0 is estimated as:

$$B_0 = \frac{1}{2}\sum_{i=1}^{2} B_i d_i$$

The relative risk is then estimated as follows:

$$\text{Relative Risk} = \exp(B_1 d_1 + B_2 d_2)$$

$$\ln(\text{Relative Risk}) = B_1 d_1 + B_2 d_2$$

where d_1 = the difference in the value of the means for risk factor X_1 in the cases and the controls, and d_2 = the difference in the value of the means for the risk factor X_2 in both populations.
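
The discriminant function estimate can be illustrated with the frequencies of Data Set 1 (Table II). The sketch below is not the calculation of appendix D. It assumes that the risk factors are coded 1 for yes and 0 for no, that the pooled covariance matrix uses n_1 + n_0 - 2 degrees of freedom, and that the relative risk of a category against the no,no group is exp(B_1 x_1 + B_2 x_2), with x_1 and x_2 the differences in factor values between that category and the reference group. All names are hypothetical.

```python
import math

# Data Set 1 (Table II): counts keyed by (X1, X2), with 1 = yes and 0 = no.
CASES = {(0, 1): 22, (1, 0): 56, (1, 1): 45, (0, 0): 52}
CONTROLS = {(0, 1): 5, (1, 0): 86, (1, 1): 18, (0, 0): 66}


def mean_and_scatter(counts):
    """Group size, mean vector, and sums of squares and products about the mean."""
    n = sum(counts.values())
    m1 = sum(x1 * f for (x1, _), f in counts.items()) / n
    m2 = sum(x2 * f for (_, x2), f in counts.items()) / n
    s11 = sum(f * (x1 - m1) ** 2 for (x1, _), f in counts.items())
    s22 = sum(f * (x2 - m2) ** 2 for (_, x2), f in counts.items())
    s12 = sum(f * (x1 - m1) * (x2 - m2) for (x1, x2), f in counts.items())
    return n, (m1, m2), (s11, s12, s22)


def discriminant_coefficients(cases, controls):
    """B_i = sum over c of sigma^{ic} d_c, using the pooled covariance matrix."""
    n1, mean1, (a11, a12, a22) = mean_and_scatter(cases)
    n0, mean0, (b11, b12, b22) = mean_and_scatter(controls)
    dof = n1 + n0 - 2  # assumed pooling convention
    v11, v12, v22 = (a11 + b11) / dof, (a12 + b12) / dof, (a22 + b22) / dof
    det = v11 * v22 - v12 ** 2
    d1, d2 = mean1[0] - mean0[0], mean1[1] - mean0[1]
    # Inverse of the 2 x 2 pooled matrix applied to (d1, d2), written out.
    b1 = (v22 * d1 - v12 * d2) / det
    b2 = (-v12 * d1 + v11 * d2) / det
    return b1, b2


b1, b2 = discriminant_coefficients(CASES, CONTROLS)
for x1, x2 in [(0, 1), (1, 0), (1, 1), (0, 0)]:
    print((x1, x2), round(math.exp(b1 * x1 + b2 * x2), 2))
```

Under these assumptions the three non-reference categories come out near 4.4, 0.8 and 3.4, which is close to the multiple logistic column reported for Data Set 1.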

THE MULTIPLE LINEAR REGRESSION RISK FUNCTION

This method assumes multivariate normal distribution for the risk factors in both populations. However, it does not assume equal variance-covariance. It uses the inverse of the covariance matrix of the control group only to estimate the regression coefficients, B_1 and B_2. The risk function is given as follows:

$$\text{Risk} = 1 + \sum_{i=1}^{2} B_i d_i \qquad (i = 1, 2)$$

where d_i = the difference in the value of the means in the two populations for the ith risk factor, X.

The major difference between the two multivariate models is that the logistic model uses the pooled covariance matrix to estimate its parameters, while the multiple linear regression model uses least squares to estimate its parameters from the inverse matrix of the control group only. The numerical calculations that show this difference are detailed in appendix D.

RESULTS AND DISCUSSION
The three methods discussed were applied to each data set described in the methodology section, and the relative risks obtained are presented below. Since this study is not interested in the interpretation of the relative risks, none is presented. This study is interested in any discrepancies in the magnitude and ordering of the risks themselves under the three conditions discussed earlier.

TABLE V: DATA SET 1

Ratios of risks by category of smoking and oral contraceptive class, relative to "no,no" group, computed by three methods in the development of thromboembolic disorders

                    Relative risks by various methods
X1     X2     Traditional    Multiple     Multiple
              Method         Logistic     Linear Reg.
no     yes       5.5           4.4          3.6
yes    no        0.8           0.8          0.7
yes    yes       3.2           3.4          3.2
no     no        1.0           1.0          1.0

X1 = smoking
X2 = oral contraceptive users

TABLE VI: DATA SET 2

Ratios of risks by category of maternal diabetes and history of large infant group, relative to "no,no" group, computed by three methods in the delivery of oversized infant

                    Relative risks by various methods
X1     X2     T.M.      M. Logistic    M. Linear
                                       Regression
no     yes    6.80        3.00           2.4*
yes    no     1.20        0.70           0.4*
yes    yes    1.03        2.03           1.9
no     no     1.00        1.00           1.0

X1 = maternal diabetes
X2 = history of large infant

*The multiple linear regression model provides lower estimation when the risk is moderate. This suggests that the control group alone does not contain all the necessary information for predicting relative risks.

TABLE VII: DATA SET 3

Ratios of risks by class of obesity and parental history of diabetes category, relative to "no,no" group, computed by three methods in the development of diabetes mellitus

                    Relative risks by various methods
X1     X2     Traditional    M. Logistic    M. Linear
              Method                        Regression
no     yes       5.3            .82           .28
yes    no        3.1          11.54          7.47
yes    yes      11.9           9.44          6.75
no     no        1.0           1.00          1.00

X1 = obesity
X2 = parental history of diabetes

Previous investigators (5-8) have noted that the two multivariate models tend to hold fairly well with a large number of risk factors, when the risks themselves are small. The findings in this study are consistent with their findings. However, only the multiple logistic risk function model has been used on sets of data where the risks are large. Halperin et al. noted that when the risks are large, the multiple logistic risk function discriminant coefficients do not converge to true population values, and thus the estimates of B_0 and B_i are frequently seriously in error, in turn producing misleading relative risks. This was observed in this study. The erratic behavior of the multiple logistic model and the multiple linear regression model when the risks are large probably suggests why they are rarely used.

CONCLUSION

From the three tables (Tables V, VI, VII) it is evident that both models are fairly consistent with the traditional approximate relative risk method when the risks are small. But the multivariate models become less efficient estimators of relative risks when the risks are large in small samples. Halperin et al. and Walker et al. have noted that the logistic model holds fairly well with large samples.

BIBLIOGRAPHY
22
23
·Hausner, J.S. and Bahn, A.K., Epidemiology An Introductory
.
Text. (1974} pp. 314-320
·c. E. and Elvebacks L.R., Epidemiology: Man
{1970) pp. 296-298
Fox, J.P., Hall,
and Disease
MacMahon, B. and Pugh, T.F., Epidemiology Principles·and Mehtods
(1970) Boston, Little, Brown and Co.
.
Dawber, T.R., Kannel, W.B. and Lydell, L.P., An Approach t~
Longitudinal__studies in a C9mmunity: The FraminJham Study.
Ann. New York Acad. Sciences 107: 539-556 (1963 .
l
l 5.
;
I
J
I'
6.
Halperin, r~., Blackwelder·, W.C., Verter, J.I., Estimation of
~he Mu)tivartate Logistic Risk Function:
A Comparison of the
Discriminant Function and Naximum Like 1i hood Approaches.
J. Chron. Dis. 24: 125-158{1971).
·
Cornfield, J. A Nehtod of Es!j!)lating Comparative Rates from
Clinic~) Data, Applications to Cancer of the Lung, Breast
and Cervix. (1951) J. Nat. Cancer Inst. 11:1269-1275.
7 • Se i ge1 • 0 •G. .;_P=-re~n;..;;a~n.;:_,cv:~--,~~~;.;.,..:..~~:.:.;_:.:.__;:.;..;.;:_.;;..St.;;..e;;;.;r...;;o~i-=-d Contraceptive •
l
1
Milbank Mem. Fund Q L:
I
j 8.
I
Seigel, D.G. and Greenhouse, S.M. Myltiple Risk Functions in
Case-control Studies: (1973) Amer. J. Epidemiology. 5: 324-331.
19.
Cornfield, J., Truett, J., and Kannel, W.B., A Multivariate
Anal sis of the Risk of Coronar Heart Disease in Framin ham,
J. Chron. Dis:20, 511-524 1967 .
i1 0. Cornfield, J. · Joint Dependence of Risk of Coronary Heart
I
Disease on Serum Cholesterol and Systolic Blood Pressure:
Function Approach. Fed. Pro. 21: No. 4
Part II, Supl. No. 11 58-61, (1962).
~Discriminant
Discriminant Functions:
11.
12.
13.
. L_
Rev. lSI 35, 142-153, (1967)
S.H., and Duncan, D.B., Estimation of the Probability
of an Event as a Function of Several Inde endent Variables:
Biometrika 54: 167-179 1967 •
Walker~
Properti~s
I
~
I
'
of ML
u1ations:
___j
24
14.
!
j
I
'
jl5.
j
1
! 16.
l
Geisser, S., Posterior, Odds for Multivariate Normal, J. Royal
Stat. Soc. Series B, 26, 69-1~ {1964 . ,
Armitage, P., B~_!_.R_evelopf!l~ents in_Medical Statistics:
Rev. lSI 34: 27-42.
17. Cramer, H. Problems in Probability Theory. Ann. M. St.
18. Feldstein, M.S. A Binary Multiple Linear Regression Model of Analysing Factors Affecting Perinatal Mortality and Other Outcomes of Pregnancy. J. Roy. Stat. Soc. Series A [volume illegible]: 61-73 (1966).
19. [Author illegible] A Method of Evaluating Perinatal Mortality Risk. Brit. J. Prev. Soc. Med. 19: [illegible]-139 (1965).
20. Pearl, R., and Reed, L.J. On the Rate of Growth of the Population of the United States Since 1790 and Its Mathematical Representation. (1920) Proc. Natl. Acad. Sci. 6: 275.
21. Reed, L.J. and Berkson, J. The Application of the Logistic Function to Experimental Data. (1929) J. of Physical Chemistry 33: 760.
22. Sartwell, P.E. Oral Contraceptives and Thromboembolism: A Further Report. Amer. J. Epid. 94: 192-201.
23. Snedecor, G.W. and Cochran, W.G. Statistical Methods. (1967) pp. 414-417, 504-538.
24. Data Analyzer Computer Program for Honeywell Mini-Computers.
25. Sack, B.A. The Large Infant. Amer. J. Obstet. (1969) pp. 104-196.
26. Nelson, J., Rovner, I.W., and Barter, R.H. The Large Baby. South. Med. J. 51: 23 (1958).
27. Paffenbarger, Jr., R.S. and Wing, A.L. Chronic Disease in Former College Students. (1973) Amer. J. Epid. 5: 314-323.
28. Dunn, J.P., Ipsen, J. and Elsom, K.O. Risk Factors in Coronary Artery Disease, Hypertension and Diabetes. (1970) Am. J. Med. Sci. 259: 309-322.
29. Pell, S. and D'Alonzo, C.A. Some Aspects of Hypertension in Diabetes Mellitus. JAMA 202: 10-16 (1967).
30. Cornfield, J. and Haenszel, W. Some Aspects of Retrospective Studies. J. Natl. Cancer Inst. (1960) pp. 523-534.
31. Mantel, N., and Haenszel, W. Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease.
32. Cornfield, J. Proc. of the Third Berkeley Symposium on Math. Stat. and Probability. (1956) Berkeley, Univ. of Calif. Press. Vol. 4, pp. 135-148.
33. Goodman, L.A., and Kruskal, W.H. Measures of Association for Cross Classifications. J. Am. Stat. A. 49: 732 (1954).
APPENDICES
APPENDIX A
RELATIVE RISKS
BACKGROUND INFORMATION
In studies of disease etiology, the major interest is in causal associations. It is often not possible to study such associations by direct experimentation in populations of interest. When experimentation is not possible, inferences about the causal associations must be drawn from a variety of observational associations, from a study of causal associations in animal populations, or from a combination of both.

Lacking the ability to introduce deliberately a disease characteristic into a subgroup, the next best step involves subclassifying the members of some well-defined population (or a sample of it) according to whether they do or do not possess the characteristic. One may then count the number of cases of disease for those with and without the characteristic in a number of different ways: (1) the number of new cases of the disease developing during some period of time, say a year, usually designated by the term incidence; (2) the number of deaths from the disease occurring during a given period of time, the mortality; (3) the number of live individuals having the disease at some moment of time, the point prevalence; or (4) the number of live individuals having the disease at any time during some interval, the interval prevalence. However, etiological aspects are best analyzed from the point of view of incidence.
A population may be classified by presence or absence of the characteristic and development or non-development of the disease, as shown in Table I.

TABLE I

                    Disease - No. of Individuals
Characteristic      Cases      Controls      Total
Present             a          b             a + b
Absent              c          d             c + d
Total               a + c      b + d         N
Relative risks can be estimated (1,3) from the proportions of persons with or without the disease who possess the characteristic. Let $P$ = the proportion of the population developing the disease during the time period of interest, i.e., the incidence rate; let $P_1$ = the proportion of those developing the disease who possess the characteristic; and let $P_2$ = the proportion of those not developing the disease who possess the characteristic. Therefore, from Table I the following estimates can be obtained:

$$P_1 = a/(a + c)$$

$$P_2 = b/(b + d)$$

However, no estimate of $P$ can be obtained. The incidence of the disease for those possessing the characteristic is given by:

$$\frac{P_1 P}{P_1 P + P_2 (1 - P)}$$

and for those lacking the characteristic is given by:

$$\frac{(1 - P_1) P}{(1 - P_1) P + (1 - P_2)(1 - P)}$$

and the relative risk is the ratio of the two expressions. The assumption that $P$ is sufficiently small reduces the ratio to:

$$\frac{P_1}{1 - P_1} \times \frac{1 - P_2}{P_2}$$

and in terms of the notation in the table the relative risk is approximately given by:

$$\text{Relative Risk} = ad/bc.$$
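The rare-disease approximation above can be checked numerically. The following is a minimal Python sketch, not part of the original thesis: the function names and the counts a, b, c, d are hypothetical and chosen only for illustration. It evaluates the exact ratio of the two incidence expressions for several values of P and compares it with the cross-product ratio ad/bc.

```python
def exact_relative_risk(p, p1, p2):
    """Ratio of the incidence among those with the characteristic
    to the incidence among those without it."""
    with_char = p1 * p / (p1 * p + p2 * (1 - p))
    without_char = (1 - p1) * p / ((1 - p1) * p + (1 - p2) * (1 - p))
    return with_char / without_char


def approximate_relative_risk(a, b, c, d):
    """Cross-product ratio ad/bc in the notation of Table I."""
    return (a * d) / (b * c)


# Hypothetical counts (assumption, for illustration only).
a, b, c, d = 30, 10, 70, 90
p1 = a / (a + c)  # proportion of cases possessing the characteristic
p2 = b / (b + d)  # proportion of controls possessing the characteristic
approx = approximate_relative_risk(a, b, c, d)

# As P shrinks, the exact ratio approaches ad/bc.
for p in (0.2, 0.01, 0.0001):
    print(p, exact_relative_risk(p, p1, p2), approx)
```

With these illustrative counts the exact ratio is noticeably below ad/bc when P = 0.2 and converges to it as P becomes small, which is the sense in which the approximation requires a rare disease.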
APPENDIX B
THE MULTIPLE LOGISTIC RISK FUNCTION
The logistic model has been called various names, as has been pointed out in the literature review of this study.

The logistic relative risk function is given by:

$$P(X_1, \ldots, X_k) = \frac{1}{1 + \exp\left(-B_0 - \sum_{i=1}^{k} B_i X_i\right)} \qquad (1)$$

where the parameters $B_0$ and $B_i$ ($i = 1, 2, \ldots, k$) are estimated using the discriminant function theory approach. Historically, the discriminant function was developed independently by Fisher, Mahalanobis and Hotelling (23). The aim of the discriminant function is to minimize the probability of incorrect assignment.

To minimize $B$ for a specific $B_0$, let

$$L(X) = f_0(X_1, \ldots, X_k) / f_1(X_1, \ldots, X_k)$$

where $f_0(X_1, \ldots, X_k)$ and $f_1(X_1, \ldots, X_k)$ are probability density functions for populations 1 and 2 respectively. $L(X)$ is used to classify an individual into population 1 if it is greater than or equal to some constant, $C$, and into population 2 if it falls below the constant, $C$. The constant, $C$, is defined by the condition that for an individual belonging to population 1,

$$\text{Prob}(L(X) < C) = B_0.$$

Now suppose that $f_0(X_1, \ldots, X_k)$ and $f_1(X_1, \ldots, X_k)$ are multivariate normal densities with different means, but the same variances and covariances, i.e., from equation (1)

$$P(X_1, \ldots, X_k) = \frac{1}{1 + \frac{1 - P}{P} \cdot \frac{f_0(X_1, \ldots, X_k)}{f_1(X_1, \ldots, X_k)}} = \frac{1}{1 + \exp\left(-B_0 - \sum_{i=1}^{k} B_i X_i\right)} = E(Y)$$

for a disease $Y$. In the above equation the curve is S-shaped. When expressed in the linear form, the equation is given by:

$$\ln\left[\frac{P(X_1, \ldots, X_k)}{1 - P(X_1, \ldots, X_k)}\right] = B_0 + \sum_{i=1}^{k} B_i X_i$$

The right hand side of the above equation is usually referred to as the discriminant function, since its numerical value may discriminate between those likely and unlikely to become members of the diseased group.

Recall $L(X)$; it is now easy to verify that $L(X) \ge C$ implies $\sum_{i=1}^{k} B_i X_i \le A$, where $A$ is another constant depending on $C$, the means and the difference in means. This is another way of computing a linear function of the $X$'s and assigning to a specific population any individual for whom the linear combination falls below a certain value. The parameters $B_0$ and $B_i$ are used in calculating relative risk.
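The link between the ratio of two normal densities and the logistic form can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the procedure used in the thesis: it takes k = 2, uses hypothetical means, covariance matrix and incidence, and relies on the standard result that with a common covariance matrix the slope vector is the inverse covariance matrix times the difference in means. All function names are hypothetical.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def quad(u, m, v):
    """Bilinear form u' m v for 2-vectors u, v and a 2x2 matrix m."""
    return sum(u[i] * m[i][j] * v[j] for i in range(2) for j in range(2))


def normal_density(x, mean, cov):
    """Bivariate normal density."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    dev = [x[0] - mean[0], x[1] - mean[1]]
    return math.exp(-0.5 * quad(dev, inv2(cov), dev)) / (2 * math.pi * math.sqrt(det))


def logistic_parameters(p, mean0, mean1, cov):
    """B0 and (B1, B2) implied by two normal densities with a common covariance."""
    s = inv2(cov)
    diff = [mean1[0] - mean0[0], mean1[1] - mean0[1]]
    b = [s[i][0] * diff[0] + s[i][1] * diff[1] for i in range(2)]
    b0 = math.log(p / (1 - p)) - 0.5 * (quad(mean1, s, mean1) - quad(mean0, s, mean0))
    return b0, b


def logistic_risk(x, b0, b):
    return 1.0 / (1.0 + math.exp(-b0 - b[0] * x[0] - b[1] * x[1]))


def bayes_risk(x, p, mean0, mean1, cov):
    ratio = normal_density(x, mean0, cov) / normal_density(x, mean1, cov)
    return 1.0 / (1.0 + ((1 - p) / p) * ratio)


# Hypothetical inputs (assumptions, for illustration only).
p = 0.05
mean0, mean1 = [0.0, 0.0], [1.0, 0.5]
cov = [[1.0, 0.3], [0.3, 2.0]]
b0, b = logistic_parameters(p, mean0, mean1, cov)
x = [0.8, -0.4]
print(logistic_risk(x, b0, b), bayes_risk(x, p, mean0, mean1, cov))
```

The two printed values agree up to floating-point error, which shows that the density-ratio expression and the logistic expression describe the same risk under the equal-covariance normal assumption.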
APPENDIX C
THE MULTIPLE LINEAR REGRESSION RISK FUNCTION
The multiple linear regression model is a simpler approach which uses least squares estimates of the regression coefficients to estimate the relative risks. The argument for more than two variables is best presented by matrix algebra. The model is given by:

$$E(Y) = B_0 + \sum_{i=1}^{k} B_i (X_i - E(X_i)) \qquad (1)$$

Let $x_{ij} = X_{ij} - \bar{X}_i$; $i = 1, 2, \ldots, k$. The normal equations yielding the least squares solution of the $k + 1$ parameters may be given in matrix form as follows:

$$\begin{pmatrix} n & 0 \\ 0 & \sum x x' \end{pmatrix} \begin{pmatrix} B_0 \\ B \end{pmatrix} = \begin{pmatrix} \sum Y \\ \sum x Y \end{pmatrix}$$

where $n$ = the total number of observations. The solution for $B_0$ can then be written as

$$B_0 = \hat{P}_1$$

where $\hat{P}_1$ = the proportion of individuals with the disease in the population ($Y = 1$); which reduces equation (1) to:

$$n S B = \sum x Y \qquad (2)$$

where $S$ is the sample covariance matrix of the $X$'s based on division by $n$. But the sums of squares in $S$ are also given by:

$$\sum (X_i - \bar{X}_i)^2 = \sum (X_{i0} - \bar{X}_{i0})^2 + \sum (X_{i1} - \bar{X}_{i1})^2 + n \hat{P}_0 \hat{P}_1 (\bar{X}_{i1} - \bar{X}_{i0})^2$$

and the sample variance by:

$$S_i^2 = \hat{P}_0 S_{i0}^2 + \hat{P}_1 S_{i1}^2 + \hat{P}_0 \hat{P}_1 (\bar{X}_{i1} - \bar{X}_{i0})^2$$

Then, making the assumption that $\hat{P}_1$ is small and $\hat{P}_0$ is approximately equal to one, the following is obtained ($\hat{P}_0$ and $\hat{P}_1$ = proportion of individuals for whom $Y = 0$ and $Y = 1$ respectively):

$$S_i^2 = \hat{P}_0 S_{i0}^2$$

and

$$\sum x_i Y = n \hat{P}_0 \hat{P}_1 (\bar{X}_{i1} - \bar{X}_{i0})$$

Substituting these values in equation (2) gives the following:

$$\hat{P}_0 S_0 B = \hat{P}_0 \hat{P}_1 (\bar{X}_1 - \bar{X}_0)$$

where $S_0$ is the sample covariance matrix of the $X$'s among those individuals with $Y = 0$; hence

$$B = \hat{P}_1 S_0^{-1} (\bar{X}_1 - \bar{X}_0)$$

and the regression equation is given by:

$$Y_j = \hat{P}_1 + \hat{P}_1 (\bar{X}_1 - \bar{X}_0)' S_0^{-1} (X_j - \bar{X}_0)$$

which reduces to:

$$Y_j = \hat{P}_1 \left(1 + (\bar{X}_1 - \bar{X}_0)' S_0^{-1} (X_j - \bar{X}_0)\right)$$

The relative risk function is then given by:

$$\frac{Y_j}{\hat{P}_1} = 1 + (\bar{X}_1 - \bar{X}_0)' S_0^{-1} (X_j - \bar{X}_0)$$

The relative risks can be computed just as in the multiple logistic model, except that the coefficients $B_i$ are derived from the terms in the inverse covariance matrix of the control group only.
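The two exact identities used in this derivation, the decomposition of the variance into within-group and between-group parts and the expression for the cross-product sum, can be verified on a small example. The sketch below is illustrative only: it uses a single hypothetical risk factor with invented values, and every name in it is an assumption rather than something taken from the thesis.

```python
def mean(values):
    return sum(values) / len(values)


def variance_n(values):
    """Sample variance based on division by n."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Hypothetical values of one risk factor X (assumption, for illustration only).
controls = [1.0, 2.0, 2.5, 3.0, 4.0, 1.5, 2.0, 3.5]  # individuals with Y = 0
cases = [3.0, 4.5]                                    # individuals with Y = 1

n = len(controls) + len(cases)
p0, p1 = len(controls) / n, len(cases) / n
diff = mean(cases) - mean(controls)

# Variance of the combined sample versus its decomposition.
total_variance = variance_n(controls + cases)
decomposed = (p0 * variance_n(controls) + p1 * variance_n(cases)
              + p0 * p1 * diff ** 2)

# Sum of x*Y with x centered at the overall mean; Y is 1 only for cases.
grand_mean = mean(controls + cases)
sum_xy = sum(x - grand_mean for x in cases)

# Exact least squares slope and the small-P1 approximation P1 * diff / S0^2.
exact_slope = sum_xy / (n * total_variance)
approx_slope = p1 * diff / variance_n(controls)


def linear_relative_risk(x_j, x_d):
    """Ratio of the two linear risk functions; P1 cancels in the ratio."""
    b = diff / variance_n(controls)
    x0 = mean(controls)
    return (1 + b * (x_j - x0)) / (1 + b * (x_d - x0))


print(total_variance, decomposed)
print(sum_xy, n * p0 * p1 * diff)
print(exact_slope, approx_slope)
```

The first two printed pairs agree exactly (up to floating-point error), while the last pair differs because the approximation drops the terms involving the proportion of cases; the gap narrows as that proportion becomes smaller.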
APPENDIX D
NUMERICAL EXAMPLE
MULTIVARIATE MODEL
Data set 1 is used here to illustrate the computational procedures required for multiple relative risk functions in case-control studies. These data are useful for illustrative purposes because the covariance matrices are of order 2 and hence the computations are easy to follow. On the other hand, these models are most useful in summarizing data sets involving several risk factors, where informal, intuitive review of tabular data is cumbersome.
Data set 1 can be placed in a 2x2 table as follows (the computational procedures follow from Seigel and Greenhouse (8)):

Cases
               x2 yes      x2 no       Total
x1 yes         45 (a)      56 (b)      101
x1 no          22 (c)      52 (d)       74
Total          67          108         175

Var. $x_1$ = $(a+b)(c+d)/(a+b+c+d)^2$ = .244
Var. $x_2$ = $(a+c)(b+d)/(a+b+c+d)^2$ = .236
Covariance $x_1 x_2$ (cases) = .036

Controls
               x2 yes      x2 no       Total
x1 yes         18 (a)      86 (b)      104
x1 no           5 (c)      66 (d)       71
Total          23          152         175

Var. $x_1$ = $(a+b)(c+d)/(a+b+c+d)^2$ = .241
Var. $x_2$ = $(a+c)(b+d)/(a+b+c+d)^2$ = .114
Covariance $x_1 x_2$ (controls) = .025
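The variances and covariances above can be reproduced directly from the cell counts. The following Python sketch is an illustration, not the program used in the thesis. The variance formulas are the ones stated above; the covariance formula (ad - bc)/n^2 is not written out in the text and is assumed here as the standard covariance of two 0/1 variables. The function name is hypothetical.

```python
def binary_moments(a, b, c, d):
    """Variances and covariance of two 0/1 variables from a 2x2 table.

    a: x1 yes, x2 yes   b: x1 yes, x2 no
    c: x1 no,  x2 yes   d: x1 no,  x2 no
    """
    n = a + b + c + d
    var_x1 = (a + b) * (c + d) / n ** 2
    var_x2 = (a + c) * (b + d) / n ** 2
    cov = (a * d - b * c) / n ** 2  # assumed formula, see lead-in
    return var_x1, var_x2, cov


cases = binary_moments(45, 56, 22, 52)
controls = binary_moments(18, 86, 5, 66)

print([round(v, 3) for v in cases])     # [0.244, 0.236, 0.036]
print([round(v, 3) for v in controls])  # [0.241, 0.114, 0.025]
```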
The following covariance matrices are then obtained:

Cases:

$$\begin{pmatrix} .244 & .036 \\ .036 & .236 \end{pmatrix}$$

Controls:

$$\begin{pmatrix} .241 & .025 \\ .025 & .114 \end{pmatrix}$$

The logistic model uses the pooled covariance matrix, which is obtained by taking the average of the corresponding elements in the two covariance matrices; for example, for element $x_{11}$, pooled = (.244 + .241)/2 = .243. The pooled covariance matrix is given by:

$$\begin{pmatrix} .243 & .030 \\ .030 & .175 \end{pmatrix}$$

and its inverse by:

$$\begin{pmatrix} 4.20 & -.72 \\ -.72 & 5.84 \end{pmatrix}$$
Also required are the differences in the means between the cases and the controls for each risk factor:

For $x_1$: 101/175 - 104/175 = -.0171
For $x_2$: 67/175 - 23/175 = .2514

Then the coefficients are estimated using the elements of the inverse of the pooled variance-covariance matrix as follows:

$B_1$ = -.0171(4.20) + .2514(-.72) = -.25
$B_2$ = -.0171(-.72) + .2514(5.84) = 1.48

To compute the relative risk for a group who does not smoke but uses oral contraceptives relative to the group who does not smoke and does not use oral contraceptives, let $j$ represent the "no, yes" group with ($x_1 = 0$; $x_2 = 1$) and let $d$ represent the "no, no" group with $x_1 = x_2 = 0$. Therefore,

$$\ln\left(\frac{\text{risk}_j}{\text{risk}_d}\right) = -.25(0 - 0) + 1.48(1 - 0) = 1.48$$

Risk Ratio = $\text{risk}_j / \text{risk}_d$ = 4.4

The risk of the "yes, no" group relative to the "no, no" group is computed as follows:

$$\ln\left(\frac{\text{Odds(yes, no)}}{\text{Odds(no, no)}}\right) = (-.25)(1 - 0) + (1.48)(0 - 0) = -.25$$

Risk Ratio = Odds(yes, no) / Odds(no, no) = .8

The risk of the "yes, yes" group relative to the "no, no" group is computed as follows:

$$\ln\left(\frac{\text{Odds(yes, yes)}}{\text{Odds(no, no)}}\right) = (-.25)(1 - 0) + (1.48)(1 - 0) = 1.23$$

Risk Ratio = Odds(yes, yes) / Odds(no, no) = 3.42

The risk of the "no, no" group relative to the "no, no" group is of course equal to 1.
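The logistic computations above can be retraced in a few lines of Python. This is a sketch under stated assumptions, not the author's program: the helper names are hypothetical, and the pooled matrix is averaged without the intermediate rounding used in the text, so the results agree with the reported values only to within rounding (for example roughly 3.40 rather than 3.42 for the "yes, yes" group).

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


cases_cov = [[0.244, 0.036], [0.036, 0.236]]
controls_cov = [[0.241, 0.025], [0.025, 0.114]]

# Pooled matrix: element-wise average of the two covariance matrices.
pooled = [[(cases_cov[i][j] + controls_cov[i][j]) / 2 for j in range(2)]
          for i in range(2)]
pooled_inv = inv2(pooled)

# Differences in means (cases minus controls) for x1 and x2.
diff = [101 / 175 - 104 / 175, 67 / 175 - 23 / 175]

# Coefficients: inverse pooled matrix times the difference in means.
coef = [sum(pooled_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]


def logistic_relative_risk(x_j, x_d=(0, 0)):
    """exp of the difference in discriminant scores between groups j and d."""
    return math.exp(sum(coef[i] * (x_j[i] - x_d[i]) for i in range(2)))


print(coef)
for group in [(0, 1), (1, 0), (1, 1), (0, 0)]:
    print(group, logistic_relative_risk(group))
```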
The elements of the inverse of the covariance matrix of the control group are used in estimating the regression coefficients in the linear regression model; the inverse is given by:

$$\begin{pmatrix} 4.25 & -.93 \\ -.93 & 8.98 \end{pmatrix}$$

The differences in the means between the cases and the controls are the same as those used in the logistic model. From

$$\text{Risk} = 1 + \sum_{i} B_i (X_i - \bar{X}_{i0}),$$

and computing the same points as those computed for the logistic model, the relative risk of the $j$ group (no, yes) to the $d$ group (no, no) is given by:

$$\text{Risk}_j / \text{Risk}_d = \frac{\left(1 + \sum_i B_i (X_i - \bar{X}_{i0})\right)_j}{\left(1 + \sum_i B_i (X_i - \bar{X}_{i0})\right)_d}$$

where

$B_1$ = -.0171(4.25) + .2514(-.93) = -.31
$B_2$ = -.0171(-.93) + .2514(8.98) = 2.27

and the relative risk of the "no, yes" group relative to the "no, no" group is given by:

$$\text{relative risk} = \frac{1 + (-.31)(0 - 104/175) + (2.27)(1 - 23/175)}{1 + (-.31)(0 - 104/175) + (2.27)(0 - 23/175)} = 3.6$$

The risk of the "yes, no" group relative to the "no, no" group is computed as follows:

$$\text{relative risk} = \frac{1 + (-.31)(1 - 104/175) + (2.27)(0 - 23/175)}{1 + (-.31)(0 - 104/175) + (2.27)(0 - 23/175)} = .65$$

The risk of the "yes, yes" group relative to the "no, no" group is computed as follows:

$$\text{relative risk} = \frac{1 + (-.31)(1 - 104/175) + (2.27)(1 - 23/175)}{1 + (-.31)(0 - 104/175) + (2.27)(0 - 23/175)} = 3.2$$

The risk of the "no, no" group relative to the "no, no" group is equal to 1.
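The linear regression computations can be retraced in the same way. The sketch below is illustrative and its helper names are hypothetical; it inverts the control-group covariance matrix, forms the coefficients from the differences in means, and evaluates the ratio of the two linear risk expressions. Because it carries full precision rather than the two-decimal coefficients, its results match the reported figures only to within rounding.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


controls_cov = [[0.241, 0.025], [0.025, 0.114]]
controls_inv = inv2(controls_cov)

# Differences in means (cases minus controls) and the control means.
diff = [101 / 175 - 104 / 175, 67 / 175 - 23 / 175]
control_means = [104 / 175, 23 / 175]

# Coefficients: inverse control covariance matrix times the difference in means.
coef = [sum(controls_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]


def linear_risk(x):
    """1 + sum of B_i * (X_i - control mean of X_i)."""
    return 1 + sum(coef[i] * (x[i] - control_means[i]) for i in range(2))


def linear_relative_risk(x_j, x_d=(0, 0)):
    return linear_risk(x_j) / linear_risk(x_d)


print(coef)
for group in [(0, 1), (1, 0), (1, 1), (0, 0)]:
    print(group, linear_relative_risk(group))
```

Unlike the logistic form, the linear risk expression is not guaranteed to stay positive for every combination of risk factors, so the ratio should be interpreted only where both numerator and denominator are positive.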
Relative risks for the same groups can be computed by using the traditional approximate relative risk method. The risk of the "no, yes" group relative to the "no, no" group is given as follows:

relative risk = (22/5) / (52/66) = 5.5

For the "yes, no" group relative to the "no, no" group,

relative risk = (56/86) / (52/66) = .82

The risk of the "yes, yes" group relative to the "no, no" group is given as follows:

relative risk = (45/18) / (52/66) = 3.2

The risk of the "no, no" group relative to the "no, no" group is equal to 1.
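The traditional approximate relative risks are simple ratios of case-to-control counts and can be checked with a short Python sketch. The dictionary layout and function name below are hypothetical; the counts are those of data set 1, keyed by (x1, x2). The unrounded values differ slightly from the figures in the text, which appear to have been rounded or truncated.

```python
# Cell counts of data set 1, keyed by (x1, x2).
cases = {("yes", "yes"): 45, ("yes", "no"): 56, ("no", "yes"): 22, ("no", "no"): 52}
controls = {("yes", "yes"): 18, ("yes", "no"): 86, ("no", "yes"): 5, ("no", "no"): 66}


def approximate_relative_risk(group, reference=("no", "no")):
    """Case/control ratio in a group divided by the same ratio in the reference group."""
    return (cases[group] / controls[group]) / (cases[reference] / controls[reference])


for group in [("no", "yes"), ("yes", "no"), ("yes", "yes"), ("no", "no")]:
    print(group, approximate_relative_risk(group))
```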
The relative risks for data sets 2 and 3 are computed in the same manner.