Select Issues for Linear Models

Linear Models III
Thursday May 31, 10:15-12:00
Deborah Rosenberg, PhD
Research Associate Professor
Division of Epidemiology and Biostatistics
University of IL School of Public Health
Training Course in MCH Epidemiology
[Slide template graphics, repeated on every slide: density of Student's t with 10 d.f.; chi-square densities with 1, 2, 3, 5, and 8 d.f.; generic 2x2 table of exposure (yes/no) by disease or other health outcome (yes/no), with cells a, b, c, d, row totals n1 and n2, column totals m1 and m2, and grand total N]
Ordinal and Nominal Outcomes
Outcomes with More than 2 Categories
Examples of outcomes that might be suited to ordinal or nominal regression:
• Ordinal or nominal BMI categories
• Nominal cause of death categories
• Ordinal or nominal severity of illness categories
• Ordinal or nominal categories of program participation
The Cumulative Logit Model
The primary motivation for using a logistic model with an ordinal outcome is to accommodate a truly ordinal variable: one that has "ceiling" and "floor" effects and whose intervals between response categories can be somewhat arbitrary. That is, it is not a continuous variable.
Modeling an ordinal outcome as a continuous variable can yield biased results, in part because the model can produce predicted values outside the range of the ordinal variable.
The Cumulative Logit Model
An ordered outcome may reflect an underlying
continuous variable for which we have no data or for
which we don't know the "real" threshold values.
For example, a Likert scale for satisfaction—very
dissatisfied to very satisfied—or for agreement—
strongly disagree to strongly agree—has response
categories reflecting a continuous scale for which there
is no data.
Modeling Ordinal
Outcomes
Some other ordinal variables may reflect an underlying continuous construct that cannot be measured as such. The ordered values are intended to reflect distinct threshold values.
Examples of ordinal variables of this type:
• access to care index
• reports of experience of life stress
• assessment of overall health status
• satisfaction with care
The Cumulative Logit Model
To appropriately model an outcome as ordinal, the
proportional odds assumption must hold.
The proportional odds assumption:
if an independent variable increases (or decreases) the
odds of being in category 1 v. the remaining categories,
then it also similarly increases (or decreases) the odds of
being in category 2 and 1 combined v. the remaining
categories, in categories 3, 2, and 1 combined v. the
remaining categories, etc.
The Cumulative Logit Model
The null hypothesis for the proportional odds assumption is
that the odds ratios for the association between a risk factor
and an ordinal outcome are constant regardless of how the
category boundaries are drawn.
If the proportional odds assumption holds, then the
association between an independent variable and the
outcome can be expressed as a single summary estimate—a
common odds ratio—across all categories.
The Cumulative Logit Model
The proportional odds assumption can be tested with a
chi-square statistic – a score test. A nonsignificant result
means that the null hypothesis will not be rejected and
that the cumulative logit model is appropriate; a
significant result means that the proportional odds
assumption may not hold.
The Cumulative Logit Model:
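For an ordered outcome with k categories, where a subscript such as 1+2 denotes categories 1 and 2 combined:
$$\ln(\text{Odds}_{1}) = \ln\left(\frac{p_{1}}{1-p_{1}}\right)$$
$$\ln(\text{Odds}_{1+2}) = \ln\left(\frac{p_{1+2}}{1-p_{1+2}}\right)$$
$$\vdots$$
$$\ln(\text{Odds}_{1+2+\dots+(k-1)}) = \ln\left(\frac{p_{1+2+\dots+(k-1)}}{1-p_{1+2+\dots+(k-1)}}\right)$$
Both the numerator and denominator change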
http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html
              Ordinal Outcome Variable
Risk Factor |  1  |  2  |  3  |  4  | Total
Yes         |  a  |  b  |  c  |  d  |
No          |  e  |  f  |  g  |  h  |

Odds among the exposed, one for each way of drawing the category boundary:
Category 1 v. categories 2, 3, 4: Odds among the exposed = a / (b+c+d)
Categories 1, 2 v. categories 3, 4: Odds among the exposed = (a+b) / (c+d)
Categories 1, 2, 3 v. category 4: Odds among the exposed = (a+b+c) / d
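The cumulative odds above can be sketched in a few lines of Python. This is only an illustration: the counts standing in for cells a through h are hypothetical, and the function names are not from the course materials.

```python
from itertools import accumulate


def cumulative_odds(counts):
    """Odds of categories 1..j v. categories j+1..k, for j = 1..k-1."""
    total = sum(counts)
    # Running totals a, a+b, a+b+c; the last running total (the row total) is dropped
    lower_totals = list(accumulate(counts))[:-1]
    return [lower / (total - lower) for lower in lower_totals]


def cumulative_odds_ratios(exposed, unexposed):
    """One odds ratio per cutpoint: exposed odds divided by unexposed odds."""
    return [
        odds_exp / odds_unexp
        for odds_exp, odds_unexp in zip(cumulative_odds(exposed), cumulative_odds(unexposed))
    ]


# Hypothetical counts for cells a, b, c, d (exposed) and e, f, g, h (unexposed)
exposed = [10, 20, 30, 40]
unexposed = [5, 15, 30, 50]

for cut, odds_ratio in enumerate(cumulative_odds_ratios(exposed, unexposed), start=1):
    print(f"categories 1..{cut} v. the rest: OR = {odds_ratio:.2f}")
```

If the proportional odds assumption held exactly, the odds ratios printed for the different cutpoints would all be equal.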
The Cumulative Logit Model
Given k categories of an ordered outcome variable, a
cumulative logit model yields k-1 intercept terms. Each
intercept corresponds to a category combined with all
adjacent lower-ordered categories.
Since proportional odds are assumed, and therefore a
common odds ratio, the effect of each covariate is
reflected in a single beta coefficient.
The Cumulative Logit Model
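Suppose an outcome variable has 4 categories and we are modeling one independent variable x, coded 1 or 0. The cumulative logit model has three intercepts and a single slope:
$$\ln(\text{Odds}_{1}) = b_{0,1} + b_{1}x$$
$$\ln(\text{Odds}_{1+2}) = b_{0,12} + b_{1}x$$
$$\ln(\text{Odds}_{1+2+3}) = b_{0,123} + b_{1}x$$
The odds ratio is the same regardless of category:
$$\frac{e^{b_{0,1}+b_{1}(1)}}{e^{b_{0,1}+b_{1}(0)}} = e^{b_{1}} \qquad \frac{e^{b_{0,12}+b_{1}(1)}}{e^{b_{0,12}+b_{1}(0)}} = e^{b_{1}} \qquad \frac{e^{b_{0,123}+b_{1}(1)}}{e^{b_{0,123}+b_{1}(0)}} = e^{b_{1}}$$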
A stratified approach to mimic a cumulative logit model for a 4-category variable would mean creating new dichotomous variables, something like the following:
if ordvar = 1 then ordvar1 = 1;
else if ordvar ^= . then ordvar1 = 0;
if 1<=ordvar<=2 then ordvar2 = 1;
else if ordvar ^= . then ordvar2 = 0;
if 1<=ordvar<=3 then ordvar3 = 1;
else if ordvar ^= . then ordvar3 = 0;
Mimicking Cumulative Logit with Binary Logistic Models
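/* event='1' models the probability that each indicator equals 1; */
/* without it, PROC LOGISTIC models the lower value, 0, by default */
proc logistic;
  model ordvar1(event='1') = factors;  /* category 1 v. 2, 3, 4 */
run;
proc logistic;
  model ordvar2(event='1') = factors;  /* categories 1, 2 v. 3, 4 */
run;
proc logistic;
  model ordvar3(event='1') = factors;  /* categories 1, 2, 3 v. 4 */
run;
The OR from each model will be approximately the same if the proportional odds assumption holds.
Note that all observations are used in each model.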
The Cumulative Logit Model
If the proportional odds assumption does not hold, it
might be because the outcome variable is nominal rather
than ordinal, or it might be that we have mis-specified
the categories, failing to pinpoint important thresholds
on the underlying continuum.
The score test is quite sensitive—it is up to the analyst to
examine the pattern of ORs for different dichotomous
cutpoints and decide whether it is reasonable to use a
cumulative logit model.
The Generalized Logit Model
In contrast to the cumulative logit model, in a
generalized logit model, the outcome categories
are like dummy variables—mutually exclusive
categories compared to a common reference
group.
The Generalized Logit Model:
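For a nominal outcome with k categories, with category k as the reference:
$$\ln(\text{Odds}_{1}) = \ln\left(\frac{p_{1}}{1-p_{1+2+\dots+(k-1)}}\right)$$
$$\ln(\text{Odds}_{2}) = \ln\left(\frac{p_{2}}{1-p_{1+2+\dots+(k-1)}}\right)$$
$$\vdots$$
$$\ln(\text{Odds}_{k-1}) = \ln\left(\frac{p_{k-1}}{1-p_{1+2+\dots+(k-1)}}\right)$$
Fixed denominator (reference category)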
http://www.indiana.edu/%7Estatmath/stat/all/cat/2b1.html
              Nominal Outcome Variable
Risk Factor |  4  |  3  |  2  |  1  | Total
Yes         |  a  |  b  |  c  |  d  |
No          |  e  |  f  |  g  |  h  |

Odds among the exposed, each relative to the same reference category (category 1):
Category 4 v. category 1: Odds among the exposed = a / d
Category 3 v. category 1: Odds among the exposed = b / d
Category 2 v. category 1: Odds among the exposed = c / d
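A minimal Python sketch of the same idea, with the denominator fixed at the reference category. The counts are hypothetical stand-ins for cells a through h, and `baseline_odds_ratios` is an illustrative name, not part of any library.

```python
def baseline_odds_ratios(exposed, unexposed, ref=-1):
    """Odds ratio for each non-reference category v. the reference category."""
    ref = ref % len(exposed)  # allow a negative index for the reference column
    ratios = {}
    for i, (n_exp, n_unexp) in enumerate(zip(exposed, unexposed)):
        if i == ref:
            continue
        odds_exp = n_exp / exposed[ref]        # e.g. a / d
        odds_unexp = n_unexp / unexposed[ref]  # e.g. e / h
        ratios[i] = odds_exp / odds_unexp
    return ratios


# Hypothetical counts in the column order 4, 3, 2, 1; the last column is the reference
exposed = [10, 20, 30, 40]
unexposed = [5, 15, 30, 50]

for column, odds_ratio in baseline_odds_ratios(exposed, unexposed).items():
    print(f"column {column} v. reference: OR = {odds_ratio:.2f}")
```

Unlike the cumulative version, each odds ratio here uses only two columns of the table, so the estimates are free to differ from one category to the next.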
The Generalized Logit Model
Given k categories of an outcome variable, a generalized
logit model yields k-1 intercept terms. Each intercept
corresponds to a single category.
Since proportional odds are not assumed, odds ratios can
vary across categories, and therefore the effect of each
covariate is reflected in k-1 slope parameters.
The Generalized Logit Model
Suppose an outcome variable has 4 categories and we are modeling one independent variable x, coded 1 or 0. The generalized logit model has three intercepts and three slopes, one pair for each non-reference category:
$$\ln(\text{Odds}_{j}) = b_{0,j} + b_{1,j}x, \quad j = 1, 2, 3$$
The odds ratios are distinct for each category:
1. $$\frac{e^{b_{0,1}+b_{1,1}(1)+b_{1,2}(0)+b_{1,3}(0)}}{e^{b_{0,1}+b_{1,1}(0)+b_{1,2}(0)+b_{1,3}(0)}} = e^{b_{1,1}}$$
2. $$\frac{e^{b_{0,2}+b_{1,1}(0)+b_{1,2}(1)+b_{1,3}(0)}}{e^{b_{0,2}+b_{1,1}(0)+b_{1,2}(0)+b_{1,3}(0)}} = e^{b_{1,2}}$$
3. $$\frac{e^{b_{0,3}+b_{1,1}(0)+b_{1,2}(0)+b_{1,3}(1)}}{e^{b_{0,3}+b_{1,1}(0)+b_{1,2}(0)+b_{1,3}(0)}} = e^{b_{1,3}}$$
The Generalized Logit Model
Each slope parameter tests the odds of being in one outcome category rather than the reference category:
• Compared to those without Factor A, individuals with Factor A have ___ times the odds of being in outcome category 1 rather than the reference category;
• Compared to those without Factor A, individuals with Factor A have ___ times the odds of being in outcome category 2 rather than the reference category;
• Compared to those without Factor A, individuals with Factor A have ___ times the odds of being in outcome category 3 rather than the reference category.
A stratified approach to mimic a generalized logit model for a 4-category variable would not require creating new variables, but would mean running models like the following:
Mimicking Generalized Logit with Binary Logistic Models
proc logistic;
  where ordvar in (1,4);  /* category 1 v. reference category 4 */
  model ordvar = factors;
run;
proc logistic;
  where ordvar in (2,4);  /* category 2 v. reference category 4 */
  model ordvar = factors;
run;
proc logistic;
  where ordvar in (3,4);  /* category 3 v. reference category 4 */
  model ordvar = factors;
run;
The ORs from the models will differ.
Note that different subsets of observations are used in each model.
Example 1.
The Association of Smoking and Fetal/Infant Death
in Preterm Deliveries
Smoking and Mortality, Dichotomous Outcome
Each cell lists Frequency, Percent, Row Pct, Col Pct.

Smoking  | fetal or neonatal death | survivor >=28 days |   Total
---------+-------------------------+--------------------+--------
yes      |      79                 |    1135            |    1214
         |    0.87                 |   12.50            |   13.37
         |    6.51                 |   93.49            |
         |   14.08                 |   13.32            |
---------+-------------------------+--------------------+--------
no       |     482                 |    7385            |    7867
         |    5.31                 |   81.32            |   86.63
         |    6.13                 |   93.87            |
         |   85.92                 |   86.68            |
---------+-------------------------+--------------------+--------
Total    |     561                 |    8520            |    9081
         |    6.18                 |   93.82            |  100.00

Crude OR=1.07
The Association of Smoking and Fetal/Infant Death
in Preterm Deliveries
Crude Logistic Model with Dichotomous Outcome

Parameter       DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept        1    -2.7293           0.0470         3370.3800       <.0001
smoking yes      1     0.0643           0.1255            0.2627       0.6083

Odds Ratio Estimates
Effect               Point Estimate   95% Wald Confidence Limits
smoking yes vs no             1.066   0.834   1.364
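The link between the estimate, its standard error, and the odds ratio line can be checked by hand. The sketch below assumes the usual Wald formulas (chi-square = (estimate / SE)^2, limits = exp(estimate ± z × SE)); `wald_summary` is an illustrative helper, and because it starts from the rounded values printed above, it agrees with the SAS output only to about three decimals.

```python
import math
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Wald chi-square, p-value (1 d.f.), odds ratio, and Wald confidence limits."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    wald = (estimate / std_error) ** 2
    # Upper tail of a chi-square with 1 d.f., written with the complementary error function
    p_value = math.erfc(math.sqrt(wald / 2))
    return {
        "wald_chi_square": wald,
        "p_value": p_value,
        "odds_ratio": math.exp(estimate),
        "lower": math.exp(estimate - z * std_error),
        "upper": math.exp(estimate + z * std_error),
    }


# Estimate and standard error for smoking, copied from the output above
smoking = wald_summary(0.0643, 0.1255)
for name, value in smoking.items():
    print(f"{name}: {value:.4f}")
```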
Cumulative Logit: Odds of type of death among smokers and the OR for smoker v. nonsmoker

Smoking and Mortality, 3 Categories
Each cell lists Frequency, Percent, Row Pct, Col Pct.

Smoking  | fetal death >=20 wks | neonatal death 0-28 days | survivor >=28 days |   Total
---------+----------------------+--------------------------+--------------------+--------
yes      |      46              |      33                  |    1135            |    1214
         |    0.51              |    0.36                  |   12.50            |   13.37
         |    3.79              |    2.72                  |   93.49            |
         |   13.86              |   14.41                  |   13.32            |
---------+----------------------+--------------------------+--------------------+--------
no       |     286              |     196                  |    7385            |    7867
         |    3.15              |    2.16                  |   81.32            |   86.63
         |    3.64              |    2.49                  |   93.87            |
         |   86.14              |   85.59                  |   86.68            |
---------+----------------------+--------------------------+--------------------+--------
Total    |     332              |     229                  |    8520            |    9081
         |    3.66              |    2.52                  |   93.82            |  100.00

Fetal death v. neonatal death or survivor: Odds=46 / (33+1135)=0.04, OR = 1.04
Fetal or neonatal death v. survivor: Odds=(46+33) / 1135=0.07, OR = 1.07
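The two odds and odds ratios can be reproduced directly from the cell counts. The following is a small sketch; the variable and function names are illustrative only.

```python
# Cell counts from the 3-category table: fetal death, neonatal death, survivor
smokers = [46, 33, 1135]
nonsmokers = [286, 196, 7385]


def odds_at_cutpoint(counts, cut):
    """Odds of being in the first `cut` categories v. the remaining categories."""
    return sum(counts[:cut]) / sum(counts[cut:])


results = {}
for cut in (1, 2):
    odds_smokers = odds_at_cutpoint(smokers, cut)
    odds_nonsmokers = odds_at_cutpoint(nonsmokers, cut)
    results[cut] = (odds_smokers, odds_smokers / odds_nonsmokers)
    print(f"cutpoint {cut}: odds among smokers = {odds_smokers:.2f}, "
          f"OR = {odds_smokers / odds_nonsmokers:.2f}")
```

The second cutpoint (any death v. survivor) is the same comparison as the dichotomous outcome, which is why its odds ratio matches the crude OR of 1.07.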
Cumulative Logit Model with 3 Categories

Ordered Value   outcome5                    Frequency
1               fetal death >=20 wks              332
2               neonatal death 0-28 days          229
3               survivor >=28 days               8520

Probabilities modeled are cumulated over the lower Ordered Values.

Score Test for the Proportional Odds Assumption
Chi-Square   DF   Pr > ChiSq
0.0400        1   0.8414

The proportional odds assumption holds
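As a rough check on the score test above, the p-value can be recovered from the chi-square statistic and its degrees of freedom. The sketch below uses closed-form tail formulas that exist only for 1 and 2 d.f.; `chi_square_p_value` is an illustrative helper, not a library function.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail probability of a chi-square statistic, for 1 or 2 d.f. only."""
    if df == 1:
        # P(chi-square > x) = P(|Z| > sqrt(x)) for a standard normal Z
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # A chi-square with 2 d.f. is exponential with mean 2
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only covers 1 or 2 degrees of freedom")


p_value = chi_square_p_value(0.0400, 1)
print(f"score test p-value: {p_value:.4f}")
```

A p-value this large means the null hypothesis of proportional odds is not rejected.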
Cumulative Logit: Each intercept corresponds to a category plus all categories with lower ordered values v. the remaining categories.

Analysis of Maximum Likelihood Estimates
Parameter                              DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept  fetal death >=20 wks         1    -3.2803           0.0586         3130.7559       <.0001
Intercept  neonatal death 0-28 days     1    -2.7292           0.0470         3370.8916       <.0001
smoking    yes                          1     0.0635           0.1255            0.2561       0.6128

Odds Ratio Estimates
Effect               Point Estimate   95% Wald Confidence Limits
smoking yes vs no             1.066   0.833   1.363

The odds ratio is an ‘average’ of the cumulative logits

Observed and model-based odds among smokers (both round to the same value):
46 / (33+1135) = e^(-3.2803+0.0635) = 0.04
(46+33) / 1135 = e^(-2.7292+0.0635) = 0.07
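The model-based odds can be recomputed from the printed coefficients. This sketch simply exponentiates intercept plus slope for smokers (x = 1) and intercept alone for nonsmokers (x = 0); the dictionary keys are descriptive labels chosen for illustration.

```python
import math

# Intercepts and slope copied from the cumulative logit output above
intercepts = {
    "fetal death v. the rest": -3.2803,
    "fetal or neonatal death v. survivor": -2.7292,
}
slope_smoking = 0.0635

predicted_odds = {}
for label, intercept in intercepts.items():
    odds_smokers = math.exp(intercept + slope_smoking * 1)
    odds_nonsmokers = math.exp(intercept + slope_smoking * 0)
    predicted_odds[label] = odds_smokers
    # The ratio is exp(slope) at every cutpoint: the single common odds ratio
    print(f"{label}: odds among smokers = {odds_smokers:.2f}, "
          f"OR = {odds_smokers / odds_nonsmokers:.3f}")

common_odds_ratio = math.exp(slope_smoking)
```

Because the intercept cancels in each ratio, the odds ratio is identical at both cutpoints, which is what a single slope under proportional odds means.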
Generalized Logit Model with 3 Categories
In a generalized logit model, each intercept and slope correspond to a single category.

Parameter     outcome5                    DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept     fetal death >=20 wks         1    -3.2512           0.0603         2910.4207       <.0001
Intercept     neonatal death 0-28 days     1    -3.6291           0.0724         2514.6406       <.0001
smoking yes   fetal death >=20 wks         1     0.0455           0.1620            0.0787       0.7790
smoking yes   neonatal death 0-28 days     1     0.0912           0.1908            0.2284       0.6327

Odds Ratio Estimates
Effect              outcome5                    Point Estimate   95% Wald Confidence Limits
smoking yes vs no   fetal death >=20 wks                 1.047   0.762   1.438
smoking yes vs no   neonatal death 0-28 days             1.096   0.754   1.592

Is 1.07 a reasonable summary of 1.047 and 1.096?
Example 2.
The Association of Maternal Risk and Fetal/Infant
Death in Preterm Deliveries
Maternal Risk and Mortality, Dichotomous Outcome
Each cell lists Frequency, Percent, Row Pct, Col Pct.

Maternal risk | fetal or neonatal death | survivor >=28 days |   Total
--------------+-------------------------+--------------------+--------
yes           |     282                 |    3836            |    4118
              |    2.76                 |   37.50            |   40.26
              |    6.85                 |   93.15            |
              |   41.53                 |   40.17            |
--------------+-------------------------+--------------------+--------
no            |     397                 |    5713            |    6110
              |    3.88                 |   55.86            |   59.74
              |    6.50                 |   93.50            |
              |   58.47                 |   59.83            |
--------------+-------------------------+--------------------+--------
Total         |     679                 |    9549            |   10228
              |    6.64                 |   93.36            |  100.00

Maternal Risk and Mortality, 3 Categories
Each cell lists Frequency, Percent, Row Pct, Col Pct.

Maternal risk | fetal death >=20 wks | neonatal death 0-28 days | survivor >=28 days |   Total
--------------+----------------------+--------------------------+--------------------+--------
yes           |     153              |     129                  |    3836            |    4118
              |    1.50              |    1.26                  |   37.50            |   40.26
              |    3.72              |    3.13                  |   93.15            |
              |   36.60              |   49.43                  |   40.17            |
--------------+----------------------+--------------------------+--------------------+--------
no            |     265              |     132                  |    5713            |    6110
              |    2.59              |    1.29                  |   55.86            |   59.74
              |    4.34              |    2.16                  |   93.50            |
              |   63.40              |   50.57                  |   59.83            |
--------------+----------------------+--------------------------+--------------------+--------
Total         |     418              |     261                  |    9549            |   10228
              |    4.09              |    2.55                  |   93.36            |  100.00
The Association of Maternal Risk and Fetal/Infant
Death in Preterm Deliveries
Crude Logistic Model with Dichotomous Outcome

Parameter       DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept        1    -2.6666           0.0519         2639.4735       <.0001
matrisk yes      1     0.0563           0.0806            0.4873       0.4851

Odds Ratio Estimates
Effect               Point Estimate   95% Wald Confidence Limits
matrisk yes vs no             1.058   0.903   1.239
Cumulative Logit Model with 3 Categories

Ordered Value   outcome5                    Frequency
1               fetal death >=20 wks              418
2               neonatal death 0-28 days          261
3               survivor >=28 days               9549

Probabilities modeled are cumulated over the lower Ordered Values.

Score Test for the Proportional Odds Assumption
Chi-Square   DF   Pr > ChiSq
10.7077       1   0.0011

The proportional odds assumption does not hold.
Cumulative Logit Model with 3 Categories

Parameter                              DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept  fetal death >=20 wks         1    -3.1750           0.0600         2798.1261       <.0001
Intercept  neonatal death 0-28 days     1    -2.6629           0.0518         2641.7916       <.0001
matrisk    yes                          1     0.0473           0.0806            0.3435       0.5578

Odds Ratio Estimates
Effect               Point Estimate   95% Wald Confidence Limits
matrisk yes vs no             1.048   0.895   1.228

The odds ratio is an ‘average’ of the cumulative logits

e^(-3.1750+0.0473) = 0.04
e^(-2.6629+0.0473) = 0.07
Generalized Logit Model with 3 Categories

Parameter     outcome5                    DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept     fetal death >=20 wks         1    -3.0708           0.0628         2388.0754       <.0001
Intercept     neonatal death 0-28 days     1    -3.7676           0.0880         1831.5579       <.0001
matrisk yes   fetal death >=20 wks         1    -0.1510           0.1037            2.1212       0.1453
matrisk yes   neonatal death 0-28 days     1     0.3755           0.1255            8.9450       0.0028

Odds Ratio Estimates
Effect              outcome5                    Point Estimate   95% Wald Confidence Limits
matrisk yes vs no   fetal death >=20 wks                 0.860   0.702   1.054
matrisk yes vs no   neonatal death 0-28 days             1.456   1.138   1.862

Is 1.048 a reasonable summary of 0.86 and 1.5?
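One way to approach this question is to compute the unadjusted odds ratios both ways from the 3-category maternal risk table. The sketch below does that with the cell counts reported earlier in Example 2; the names are illustrative, and the last digit can differ slightly from the SAS output because of rounding.

```python
# Cell counts from the 3-category table: fetal death, neonatal death, survivor
maternal_risk = [153, 129, 3836]
no_maternal_risk = [265, 132, 5713]


def odds_at_cutpoint(counts, cut):
    """Cumulative odds: first `cut` categories v. the remaining categories."""
    return sum(counts[:cut]) / sum(counts[cut:])


# Cumulative logit view: both cutpoints use every observation
cumulative = [
    odds_at_cutpoint(maternal_risk, cut) / odds_at_cutpoint(no_maternal_risk, cut)
    for cut in (1, 2)
]

# Generalized logit view: each death category v. survivors (the reference)
generalized = [
    (maternal_risk[i] / maternal_risk[-1]) / (no_maternal_risk[i] / no_maternal_risk[-1])
    for i in (0, 1)
]

print("cumulative ORs: ", [round(value, 3) for value in cumulative])
print("generalized ORs:", [round(value, 3) for value in generalized])
```

The cumulative odds ratios fall on opposite sides of 1, and the generalized odds ratios differ even more, which is consistent with the significant score test for this example.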
Example 3. LBW
Modeling a 3 category birthweight variable:
/*cumulative logit */
proc logistic order=formatted;
model bwcat = smoking late_no_pnc;
run;
Ordered Value   bwcat       Total Frequency
1               vlbw                    897
2               mlbw                   4087
3               normal bw             75824

Probabilities modeled are cumulated over the lower Ordered Values.
Score Test for the Proportional Odds Assumption
Chi-Square   DF   Pr > ChiSq
17.0152       2   0.0002

Parameter          DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept  vlbw     1    -4.6326           0.0351        17461.8326       <.0001
Intercept  mlbw     1    -2.8614           0.0176        26396.0103       <.0001
smoking             1     0.6012           0.0383          246.8141       <.0001
late_no_pnc         1     0.2720           0.0362           56.5520       <.0001

Effect        Point Estimate   95% Wald Confidence Limits
smoking                1.824   1.692   1.966
late_no_pnc            1.313   1.223   1.409
/*mimicking cumulative logit with binary models*/
proc logistic order=formatted;
model vlbw = smoking late_no_pnc;
run;

vlbw v. mlbw and normal
Effect        Point Estimate   95% Wald Confidence Limits
smoking                1.346   1.118   1.621
late_no_pnc            1.138   0.961   1.347

proc logistic order=formatted;
model lbw = smoking late_no_pnc;
run;

vlbw and mlbw v. normal
Effect        Point Estimate   95% Wald Confidence Limits
smoking                1.834   1.701   1.977
late_no_pnc            1.315   1.225   1.412

Both models include all observations in the sample
/* generalized logit */
proc logistic order=formatted;
model bwcat(ref='normal bw') = smoking late_no_pnc
/ link=glogit;
run;
Ordered Value   bwcat       Total Frequency
1               vlbw                    897
2               mlbw                   4087
3               normal bw             75824

Logits modeled use bwcat='normal bw' as the reference category.
vlbw v. normal and mlbw v. normal

Parameter     bwcat   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept     vlbw     1    -4.5070           0.0394        13075.7241       <.0001
Intercept     lbw      1    -3.0764           0.0195        24943.4219       <.0001
smoking       vlbw     1     0.3409           0.0947           12.9546       0.0003
smoking       lbw      1     0.6587           0.0412          255.2248       <.0001
late_no_pnc   vlbw     1     0.1470           0.0861            2.9166       0.0877
late_no_pnc   lbw      1     0.3002           0.0393           58.2169       <.0001

Effect        bwcat   Point Estimate   95% Wald Confidence Limits
smoking       vlbw             1.406   1.168   1.693
smoking       mlbw             1.932   1.782   2.095
late_no_pnc   vlbw             1.158   0.979   1.371
late_no_pnc   mlbw             1.350   1.250   1.458
/* mimicking generalized logit with binary models*/
proc logistic order=formatted;
where bwcat = 2 or bwcat = 0;
model bwcat(ref='normal bw') = smoking late_no_pnc
/ link=glogit;
run;
proc logistic order=formatted;
where bwcat = 1 or bwcat = 0;
model bwcat(ref='normal bw') = smoking late_no_pnc
/ link=glogit;
run;
Generalized logit approach using binary models with only a subset of observations in each model

vlbw v. normal
Effect        bwcat   Point Estimate   95% Wald Confidence Limits
smoking       vlbw             1.406   1.168   1.693
late_no_pnc   vlbw             1.159   0.979   1.371

mlbw v. normal
Effect        bwcat   Point Estimate   95% Wald Confidence Limits
smoking       mlbw             1.933   1.783   2.095
late_no_pnc   mlbw             1.351   1.251   1.459
Generalized logit models can get complicated,
but custom estimates can still be obtained in the usual way.
proc logistic order=formatted;
   where 2<=momage<=3;
   class parityrisk(ref='no hx preterm') / param=ref;
   model bwcat = smoking late_no_pnc matrisk momage
         parityrisk smoking*parityrisk / link=glogit;
   /* Joint effect of smoking and medical risk within each parity group; */
   /* estimate=exp reports each contrast on the odds ratio scale.        */
   contrast 'sm-risk, hxpreterm' smoking 1 matrisk 1
            smoking*parityrisk 1 0 / estimate=exp;
   contrast 'sm-risk, primips' smoking 1 matrisk 1
            smoking*parityrisk 0 1 / estimate=exp;
   contrast 'sm-risk, lorisk multips' smoking 1 matrisk 1
            smoking*parityrisk 0 0 / estimate=exp;
run;
Example 3. LBW
The tests for the constructs in the model are all
statistically significant:
Type 3 Analysis of Effects
Effect               DF   Wald Chi-Square   Pr > ChiSq
smoking              2    199.1393          <.0001
late_no_pnc          2    46.9823           <.0001
matrisk              2    615.7383          <.0001
momage               2    7.7596            0.0207
parityrisk           4    382.2127          <.0001
smoking*parityrisk   4    22.1081           0.0002
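As a hedged check on the Type 3 table, the p-values can be reproduced from the Wald chi-square statistics and their degrees of freedom. With a three-category outcome, each model column contributes two parameters (one per non-reference outcome category), which is why single-column effects have 2 DF and the two-column parityrisk terms have 4. The sketch below uses the closed-form upper tail of the chi-square distribution for even degrees of freedom; the function name `chi2_sf_even_df` is illustrative and not part of the original analysis.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only covers positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

# Wald chi-square statistics and DF copied from the Type 3 table.
type3 = {
    "momage": (7.7596, 2),
    "smoking*parityrisk": (22.1081, 4),
}

p_values = {name: chi2_sf_even_df(x, df) for name, (x, df) in type3.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```

The two effects chosen here are the ones whose p-values are printed in full in the table; the others are reported only as <.0001.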
Example 3. LBW
Parameter                         bwcat   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept                         vlbw    1    -5.3253    0.0733           5284.8432         <.0001
Intercept                         mlbw    1    -3.6337    0.0332           11977.8462        <.0001
smoking                           vlbw    1     0.3873    0.1372           7.9668            0.0048
smoking                           mlbw    1     0.7851    0.0564           193.5812          <.0001
late_no_pnc                       vlbw    1     0.1095    0.0933           1.3762            0.2407
late_no_pnc                       mlbw    1     0.2866    0.0422           46.1522           <.0001
matrisk                           vlbw    1     1.0549    0.0712           219.3322          <.0001
matrisk                           mlbw    1     0.6885    0.0338           414.5412          <.0001
momage               >=35         vlbw    1     0.1607    0.1002           2.5727            0.1087
momage               >=35         mlbw    1     0.1150    0.0493           5.4443            0.0196
parityrisk           hx preterm   vlbw    1     1.6210    0.1965           68.0158           <.0001
parityrisk           hx preterm   mlbw    1     1.4185    0.1089           169.6569          <.0001
parityrisk           primip       vlbw    1     0.6110    0.0789           60.0412           <.0001
parityrisk           primip       mlbw    1     0.5060    0.0383           174.2524          <.0001
smoking*parityrisk   hx preterm   vlbw    1     0.4809    0.3477           1.9131            0.1666
smoking*parityrisk   hx preterm   mlbw    1     0.0921    0.1950           0.2231            0.6367
smoking*parityrisk   primip       vlbw    1    -0.3266    0.2141           2.3270            0.1271
smoking*parityrisk   primip       mlbw    1    -0.3663    0.0914           16.0623           <.0001
Not all beta coefficients are statistically significant.
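To make the significance statement concrete, each Wald chi-square in the parameter table is the squared ratio of an estimate to its standard error, tested on 1 d.f. The following is a minimal sketch of that arithmetic, not the SAS computation itself; the helper name `wald_test` is an assumption, and small differences from the printed values reflect rounding of the published estimates.

```python
import math

def wald_test(estimate, std_error):
    """Return (chi-square, p-value) for a 1-d.f. Wald test of beta = 0."""
    z = estimate / std_error
    chi_square = z * z
    # For 1 d.f., P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Two rows copied from the parameter table: (estimate, standard error).
rows = {
    "late_no_pnc, vlbw": (0.1095, 0.0933),
    "late_no_pnc, mlbw": (0.2866, 0.0422),
}

results = {label: wald_test(b, se) for label, (b, se) in rows.items()}
for label, (chi_square, p_value) in results.items():
    print(f"{label}: chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

The same predictor is non-significant for vlbw and clearly significant for mlbw, which is the pattern the slide is pointing to.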
Example 3. LBW
Parity-specific contrasts of the joint effect of smoking
and having some antepartum medical risk, adjusting for
entry into prenatal care and maternal age.
Contrast                  Type   Row   Estimate   Standard Error   Confidence Limits
sm-risk, hxpreterm        EXP    1     6.8423     2.2409           3.6011    13.0010
sm-risk, hxpreterm        EXP    2     4.7860     0.9081           3.2997     6.9419
sm-risk, primips          EXP    1     3.0515     0.5430           2.1530     4.3248
sm-risk, primips          EXP    2     3.0260     0.2388           2.5924     3.5322
sm-risk, lorisk multips   EXP    1     4.2299     0.6439           3.1387     5.7005
sm-risk, lorisk multips   EXP    2     4.3649     0.2819           3.8459     4.9539
Should we leave the smoking*parityrisk term in the model?
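The contrast estimates can be traced back to the coefficients reported for this model: each one is the exponentiated sum of the smoking coefficient, the matrisk coefficient, and the relevant smoking*parityrisk coefficient (zero for the reference parity group). The sketch below is an illustration of that arithmetic only. Matching the results against the contrast table suggests that Row 1 corresponds to vlbw and Row 2 to mlbw, which is an inference from the numbers rather than something the output states; the dictionary keys and function name are hypothetical.

```python
import math

# Coefficients copied from the parameter table for this model,
# as (vlbw, mlbw) pairs on the log-odds scale.
beta = {
    "smoking": (0.3873, 0.7851),
    "matrisk": (1.0549, 0.6885),
    "smoking*hx preterm": (0.4809, 0.0921),
    "smoking*primip": (-0.3266, -0.3663),
}

# Interaction term entering each parity-specific contrast
# (None for the reference group, whose interaction coefficient is 0).
contrasts = {
    "sm-risk, hxpreterm": "smoking*hx preterm",
    "sm-risk, primips": "smoking*primip",
    "sm-risk, lorisk multips": None,
}

def joint_odds_ratios(interaction):
    """Return exp(smoking + matrisk + interaction) for vlbw and mlbw."""
    ors = []
    for k in range(2):  # 0 = vlbw, 1 = mlbw
        log_or = beta["smoking"][k] + beta["matrisk"][k]
        if interaction is not None:
            log_or += beta[interaction][k]
        ors.append(math.exp(log_or))
    return tuple(ors)

estimates = {name: joint_odds_ratios(term) for name, term in contrasts.items()}
for name, (vlbw, mlbw) in estimates.items():
    print(f"{name}: vlbw OR = {vlbw:.3f}, mlbw OR = {mlbw:.3f}")
```

Laid out this way, the only thing that differs between the three parity groups is the interaction coefficient, so the question about keeping smoking*parityrisk is a question about whether those differences are worth reporting.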
Example 4. Prenatal Care
Should we consider the categories ordinal or nominal?
Table of prevlbw by indexsum (two factor summary index)
Frequency (Row Pct)
prevlbw               No Pnc          Inadeq           Inter            Adeq             Adeq+           Total
prev lbw              736.34 (3.71)   3097.6 (15.62)   2363.3 (11.91)   5274.7 (26.59)   8364 (42.17)    19836
no hx lbw or primip   3315.8 (1.18)   19576 (6.98)     33170 (11.83)    138719 (49.46)   85667 (30.55)   280448
Example 4. Prenatal Care
The overlapping dichotomous contrasts
Each contrast collapses the indexsum categories of the same table at a different cutpoint:
No Pnc v. Any PNC, OR = 3.2
Inad/No v. Adeq+/Adeq/Inter, OR = 2.7
Inter/Inad/No v. Adeq+/Adeq, OR = 1.8
All others v. Adeq+, OR = 0.60

Table of prevlbw by indexsum (two factor summary index)
Frequency (Row Pct)
prevlbw               No Pnc          Inadeq           Inter            Adeq             Adeq+
prev lbw              736.34 (3.71)   3097.6 (15.62)   2363.3 (11.91)   5274.7 (26.59)   8364 (42.17)
no hx lbw or primip   3315.8 (1.18)   19576 (6.98)     33170 (11.83)    138719 (49.46)   85667 (30.55)
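The four overlapping odds ratios can be recomputed directly from the weighted frequencies in the table. The sketch below is a minimal illustration, assuming the category order No Pnc, Inadeq, Inter, Adeq, Adeq+; the function name `cumulative_odds_ratios` is hypothetical.

```python
# Weighted frequencies copied from the prevlbw by indexsum table,
# ordered No Pnc, Inadeq, Inter, Adeq, Adeq+.
labels = ["No Pnc", "Inadeq", "Inter", "Adeq", "Adeq+"]
prev_lbw = [736.34, 3097.6, 2363.3, 5274.7, 8364]
no_hx_lbw = [3315.8, 19576, 33170, 138719, 85667]

def cumulative_odds_ratios(exposed, unexposed):
    """Odds ratio at each cutpoint: categories up to the cut v. those above it."""
    ratios = []
    for cut in range(1, len(exposed)):
        odds_exposed = sum(exposed[:cut]) / sum(exposed[cut:])
        odds_unexposed = sum(unexposed[:cut]) / sum(unexposed[cut:])
        ratios.append(odds_exposed / odds_unexposed)
    return ratios

cum_ors = cumulative_odds_ratios(prev_lbw, no_hx_lbw)
for cut, ratio in enumerate(cum_ors, start=1):
    low = "/".join(labels[:cut])
    high = "/".join(labels[cut:])
    print(f"{low} v. {high}: OR = {ratio:.2f}")
```

The odds ratios fall from above 3 to below 1 as the cutpoint moves up the scale, so a single common odds ratio cannot describe them. This is the pattern a proportional odds assumption would have to smooth over.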
Example 4. Prenatal Care
Non-overlapping dichotomous contrasts:
Table of prevlbw by indexsum (two factor summary index), each category v. Adeq
Frequency (Row Pct)

Contrast         prevlbw               Category         Adeq
No Pnc v. Adeq   prev lbw              736.34 (3.71)    5274.7 (26.59)
                 no hx lbw or primip   3315.8 (1.18)    138719 (49.46)
Inadeq v. Adeq   prev lbw              3097.6 (15.62)   5274.7 (26.59)
                 no hx lbw or primip   19576 (6.98)     138719 (49.46)
Inter v. Adeq    prev lbw              2363.3 (11.91)   5274.7 (26.59)
                 no hx lbw or primip   33170 (11.83)    138719 (49.46)
Adeq+ v. Adeq    prev lbw              8364 (42.17)     5274.7 (26.59)
                 no hx lbw or primip   85667 (30.55)    138719 (49.46)
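Each non-overlapping contrast is an ordinary 2x2 odds ratio with Adeq as the comparison category. A minimal sketch of the calculation follows, using the weighted frequencies from the table; the function name `reference_odds_ratios` is hypothetical. Because prevlbw is the only predictor, these crude odds ratios should agree with the generalized logit odds ratio estimates reported for this example (5.840, 4.161, 1.874, 2.568).

```python
# Weighted frequencies copied from the prevlbw by indexsum table.
prev_lbw = {"No Pnc": 736.34, "Inadeq": 3097.6, "Inter": 2363.3,
            "Adeq": 5274.7, "Adeq+": 8364}
no_hx_lbw = {"No Pnc": 3315.8, "Inadeq": 19576, "Inter": 33170,
             "Adeq": 138719, "Adeq+": 85667}

def reference_odds_ratios(exposed, unexposed, reference):
    """Odds ratio for each category v. the reference category."""
    ratios = {}
    for category in exposed:
        if category == reference:
            continue
        odds_exposed = exposed[category] / exposed[reference]
        odds_unexposed = unexposed[category] / unexposed[reference]
        ratios[category] = odds_exposed / odds_unexposed
    return ratios

ref_ors = reference_odds_ratios(prev_lbw, no_hx_lbw, reference="Adeq")
for category, ratio in ref_ors.items():
    print(f"{category} v. Adeq: OR = {ratio:.3f}")
```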
Example 4. Prenatal Care
Cumulative Logit:

Score Test for the Proportional Odds Assumption
Chi-Square   DF   Pr > ChiSq
7014.0733    3    <.0001

The null hypothesis of proportional odds is rejected.

Parameter                DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept No PNC         1    -4.2917    0.1749           601.9645          <.0001
Intercept Inadequate     1    -2.3257    0.0701           1101.3880         <.0001
Intercept Intermediate   1    -1.3409    0.0495           732.9840          <.0001
Intercept Adequate       1     0.7857    0.0423           345.3622          <.0001
prevlbw                  1    -0.00326   0.1698           0.0004            0.9847

Odds Ratio Estimates
Effect    Point Estimate   95% Wald Confidence Limits
prevlbw   0.997            0.715    1.390

Any association is obscured by averaging across levels of APNCU.
Example 4. Prenatal Care
Generalized Logit

Parameter   indexsum       DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept   No PNC         1    -3.7338    0.2019           342.1621          <.0001
Intercept   Inadequate     1    -1.9581    0.0842           541.3514          <.0001
Intercept   Intermediate   1    -1.4308    0.0670           455.9302          <.0001
Intercept   adequate+      1    -0.4820    0.0459           110.4236          <.0001
prevlbw     No PNC         1     1.7648    0.4114           18.4034           <.0001
prevlbw     Inadequate     1     1.4258    0.2606           29.9399           <.0001
prevlbw     Intermediate   1     0.6280    0.2691           5.4441            0.0196
prevlbw     adequate+      1     0.9430    0.1861           25.6809           <.0001

Odds Ratio Estimates
Effect    indexsum       Point Estimate   95% Wald Confidence Limits
prevlbw   No PNC         5.840            2.608    13.080
prevlbw   Inadequate     4.161            2.497     6.935
prevlbw   Intermediate   1.874            1.106     3.175
prevlbw   adequate+      2.568            1.783     3.698
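The odds ratio estimates follow from the prevlbw coefficients by exponentiation, with 95% Wald limits formed on the log-odds scale before exponentiating. The sketch below illustrates that step under the usual normal approximation; the function name and the constant `Z_975` are assumptions introduced here, and agreement with the printed table is limited by the rounding of the published estimates.

```python
import math

Z_975 = 1.959964  # standard normal quantile for a two-sided 95% interval

def odds_ratio_with_wald_ci(estimate, std_error, z=Z_975):
    """Return (odds ratio, lower limit, upper limit) from a log-odds coefficient."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )

# prevlbw coefficients copied from the generalized logit parameter table:
# (estimate, standard error) for each category v. adequate care.
prevlbw = {
    "No PNC": (1.7648, 0.4114),
    "Inadequate": (1.4258, 0.2606),
    "Intermediate": (0.6280, 0.2691),
    "adequate+": (0.9430, 0.1861),
}

or_table = {level: odds_ratio_with_wald_ci(b, se)
            for level, (b, se) in prevlbw.items()}
for level, (point, lower, upper) in or_table.items():
    print(f"{level}: OR = {point:.3f} ({lower:.3f}, {upper:.3f})")
```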
Example 4. Prenatal Care
Women with a prior lbw delivery had more than 4 times
the odds of receiving no or inadequate prenatal care
rather than adequate care compared to women with no
history of lbw delivery.
Compared to women without a history of lbw delivery,
however, these high-risk women also had more than
twice the odds of appropriately receiving care beyond
what is considered adequate for most women.
Example 5.
Cumulative Logit Model for the Associations Between
Key Features Across Domains and Higher Levels of
MCH Epidemiology Functioning

Outcome is a 3-level rating of MCH epidemiology functioning:
• above average
• average
• below average

Key Feature                                                                 Odds Ratio   95% CI     P
Organizational Position*                                                    2.0          0.8-4.8    0.14
Agenda-Setting by Consensus                                                 6.1          1.1-34.3   0.04
Agenda-Setting by Consensus Including External Partners                     6.6          1.3-33.2   0.02
Total Key Staff with Doctoral Training                                      2.5          1.3-5.0    0.01
Additional Staff: Assignees, Fellows, or Interns                            6.4          1.3-32.1   0.03
Routine Data Sharing (internal and external) & Data Integration Occurring   4.0          0.9-18.3   0.07

* Organizational position is the three-level ordinal variable: named MCH epidemiology unit;
no named unit, but recognized presence; and no or diffuse effort
Summary: Ordinal and Nominal Outcomes
Cumulative--Ordinal
• Proportional odds assumption: assess the series of binary
  comparisons from collapsing categories
• k-1 intercepts
• 1 slope / 1 odds ratio
Generalized--Nominal
• No assumption of the shape of the association
• Categories compared to a reference group
• k-1 intercepts
• k-1 slopes / k-1 odds ratios
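The parameter counts in this summary can be written as two small formulas. The sketch below is only a bookkeeping aid, assuming k outcome categories and p model columns (one per continuous or binary predictor, one per dummy variable or interaction column); both function names are hypothetical.

```python
def cumulative_logit_params(k, p):
    """k-1 intercepts plus one shared slope per model column."""
    return (k - 1) + p

def generalized_logit_params(k, p):
    """k-1 intercepts plus k-1 slopes per model column."""
    return (k - 1) * (1 + p)

# Example 4: five prenatal care categories, one predictor (prevlbw).
example4_cumulative = cumulative_logit_params(5, 1)    # 4 intercepts + 1 slope
example4_generalized = generalized_logit_params(5, 1)  # 4 intercepts + 4 slopes

# Example 3: three birthweight categories and eight model columns
# (smoking, late_no_pnc, matrisk, momage, two parityrisk dummies,
# two smoking*parityrisk columns).
example3_generalized = generalized_logit_params(3, 8)

print(example4_cumulative, example4_generalized, example3_generalized)
```

These counts correspond to the number of rows in the parameter tables shown for Examples 3 and 4, which makes the extra degrees of freedom spent by the generalized logit model explicit.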
Summary: Ordinal and Nominal Outcomes
Issues for categorizing an outcome variable are similar
to those for defining categories for independent
variables:
• Conceptual meaning of the categories
• Statistical tests v. judgment about differences
  between categories
• Sample size and power
Summary: Ordinal and Nominal Outcomes
Model Building
Just as one examines dummy variables for an independent
variable before deciding whether to use it in ordinal form,
it is sometimes useful to run a generalized logit model first.
It requires no assumption about the ordering of the categories,
so it allows an empirical assessment of whether the variation in
the category-specific odds ratios is important or negligible.
Summary: Ordinal and Nominal Outcomes
Even if the proportional odds assumption holds,
reporting separate odds ratios for each category, using
generalized logit, may be important in order to
emphasize the similarity of the strength of the
association across categories.
In addition, the cumulative logit model not only forces
the strength of association to be uniform; it also forces
the predicted values to be linear. Using generalized
logit, the predicted odds and odds ratios will
both more closely reflect the observed values.
Summary: Ordinal and Nominal Outcomes
Why Not Just Always Run Stratified Models for
Generalized Logit?
For nominal outcomes, using a single model may be
more efficient than using separate binary models.
With separate binary models, one needs to decide whether
each model should include the same independent
variables, or whether different final, category-specific
models make sense, each including only those variables
that are risk or protective factors for a particular
binary comparison.
Summary: Ordinal and Nominal Outcomes
Using a single multinomial model permits a
unified profile of risk and protective factors
across the categories, both significant and
non-significant.
Summary: Ordinal and Nominal Outcomes
For a variable that is actually continuous, are there
reasons to use a cumulative logit model instead of a
continuous outcome model?
For example, when would modeling ordinal categories of
birthweight be preferable either to modeling birthweight
continuously in grams or categorized into nominal groups?
• Using a variable as ordinal (with fewer categories) as opposed
  to continuous will yield odds ratios instead of mean differences
• No assumption of normality required
Summary: Ordinal and Nominal Outcomes
For a variable that meets the proportional odds
assumption, is it still appropriate to choose to use a
generalized logit approach?
Using ordinal as opposed to nominal categories will
be more efficient if there is truly an ordinal effect.
Why "waste" degrees of freedom on multiple odds
ratios if the effect is constant across categories?
Which Modeling Approach?
Choosing the form of the outcome variable:
Stressful Life Events
• Any stressful life event (y/n) = independent vars
  (dichotomous)
• Fin. Emot. Traum. Partner = independent vars
  (Nominal - No stressful life events as the reference)
• Sum of stressful life events = independent vars
  (continuous)
• Scale of stressful life events = independent vars
  (ordinal)
Which Modeling Approach?
Choosing the form of the outcome variable:
Maternal Depression
• Any depression (y/n) = independent vars
• Pre&Post Pre_Only PP_Only = independent vars
  (Nominal - No depression as the reference)
• Severe Moderate Mild = independent vars
  (Ordinal or Nominal)
• Depression Severity Scale = independent vars
  (ordinal)
Which Modeling Approach?
Choosing the form of the outcome variable:
Breastfeeding
• Ever Breastfed (yes v. no) = independent vars
• Exclusive BF>=2 mos. (yes v. no) = independent vars
• Exclusive >=2 mo. Exclusive BF<=2 mo. = independent vars
  (Never Breastfed as reference)
• BF<2 mo. BF 2-6 mo. BF > 6 mo. = independent vars
  (Never Breastfed as reference)
• Breastfeeding duration in weeks = independent vars