American Journal of Epidemiology
Vol. 144, No. 4
Copyright © 1996 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Printed in U.S.A.

A BRIEF ORIGINAL CONTRIBUTION

Collapsing Ordered Outcome Categories: A Note of Concern

Ulf Strömberg
When analyzing and interpreting data from an epidemiologic study where ordinal (ordered categorical)
outcomes have been measured in different exposure groups, an effect parameter of interest is the common
odds ratio implied by the proportional odds model. This model can sometimes be applied to a collapsed
outcome variable, instead of the measured variable, without reducing efficiency considerably. However, in a
given data set, changing the outcome categories can affect the effect estimate as well as the inference being
drawn from the data, even if the true effect itself has not changed. In particular, one should be careful in
dichotomizing the measured outcome variable. Am J Epidemiol 1996;144:421-4.

data interpretation, statistical; epidemiologic methods; statistics

Received for publication August 8, 1995, and in final form April 16, 1996.
From the Department of Occupational and Environmental Medicine, University Hospital, Lund University, S-221 85 Lund, Sweden. (Reprint requests to Dr. Ulf Strömberg at this address.)
There are several situations in which ordinal (ordered categorical) outcome data arise naturally. Even
if the underlying outcome variable is continuous, it
may not be observable. For example, "degree of pain"
is naturally defined as an ordinal variable. Another,
more specific, example is the profusion of small opacities on a chest radiograph, which is an outcome of
interest in epidemiologic studies of radiographic abnormalities; this outcome is conventionally assessed
on an ordinal scale according to the International Labour Office's 1980 classification (1).
Different statistical models have been proposed for
analyzing data sets involving ordinal outcomes (2-4).
The present paper focuses on the proportional odds
model (sometimes referred to as the "cumulative odds
model"), which seems to be reasonable for many applications (2, 5). This model is equivalent to assuming
that the true effect, which can be expressed by an odds
ratio, does not change if one changes the ordinal
outcome variable by moving or deleting the category
boundaries. When there are only two outcome categories, the proportional odds model is identical to the
familiar logistic regression model (2).
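For concreteness, the model can be written in regression form (cf. McCullagh (2)): with x = 1 for exposed and x = 0 for unexposed persons, and with the sign convention chosen so that ψ > 1 means that exposed persons tend toward higher outcome categories,

$$\operatorname{logit}\,\Pr(Y \le j \mid x) = \alpha_j - \beta x, \qquad j = 1, 2, \ldots, k - 1,$$

where the α_j are category-specific intercepts. The cumulative odds ratio exp(β) is then the same for every cutpoint j; this common value is the ψ of the Methods section below.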
The proportional odds model can sometimes be applied to a categorized ordinal outcome variable, instead of the measured variable, without reducing efficiency considerably (5). Nevertheless, an estimate of
the effect parameter is likely to depend on the outcome
categories used in the analysis. In particular, in a given
data set, changing the categories can affect the effect
estimate as well as the inference being drawn from the
data, even if the true effect itself has not changed; this
is demonstrated below by means of a simulation study.
METHODS
Assume that study participants are classified into
two groups, referred to as the unexposed and exposed groups. Let Y_0 and Y_1 denote the ordinal outcome variables that are observed for each person in the unexposed and exposed groups, respectively. Let the ordered categories of Y_0 as well as Y_1 be represented by 1, 2, ..., k; the outcomes fall into these categories with multinomial probabilities π_01, π_02, ..., π_0k in the unexposed group and π_11, π_12, ..., π_1k in the exposed group.
If we define the cumulative probabilities

$$p_{ij} = \pi_{i1} + \pi_{i2} + \cdots + \pi_{ij}$$

for i = 0, 1 and j = 1, 2, ..., k − 1, then

$$\psi_j = \frac{(1 - p_{1j})\,p_{0j}}{(1 - p_{0j})\,p_{1j}} \qquad (1 \le j \le k - 1)$$

is the odds ratio corresponding to a specific binary split of Y_0 as well as Y_1. The proportional odds model specifies that ψ_1 = ψ_2 = ... = ψ_(k−1) (2); let us denote this common value by ψ. The corresponding log odds ratio, ln(ψ), can be estimated using the efficient score and Fisher's information (6). Similar calculations can be used for hypothesis testing. These techniques were applied in the simulation study described below (when k > 2, an approximate form for Fisher's information (7) was used).
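To illustrate the idea in the simplest case, the sketch below (Python; the function name and the example counts are illustrative) computes the one-step estimate ln(ψ̂) ≈ Z/V and the score test statistic Z²/V for a single binary split, where Z is the efficient score (observed minus expected number of subjects above the cutpoint in the exposed group) evaluated under H_0 and V is the corresponding hypergeometric approximation to Fisher's information; the closed-form approximation of Whitehead and Jones (7) for k > 2 ordered categories is not reproduced here.

```python
from math import sqrt
from scipy.stats import chi2

def score_log_or(a, n_exp, b, n_unexp):
    """One-step (score/information) estimate of a log odds ratio for one
    binary split of the outcomes: `a` of the `n_exp` exposed subjects and
    `b` of the `n_unexp` unexposed subjects fall above the chosen cutpoint.

    Z is the efficient score at ln(psi) = 0 (observed minus expected number
    of exposed subjects above the cutpoint); V is the hypergeometric
    approximation to Fisher's information under H0. This is a sketch of the
    general score/information idea, not the exact formulas of the paper.
    """
    n = n_exp + n_unexp                  # total sample size
    m1 = a + b                           # subjects above the cutpoint
    m0 = n - m1                          # subjects at or below the cutpoint
    z = a - n_exp * m1 / n               # efficient score: observed - expected
    v = n_exp * n_unexp * m1 * m0 / (n ** 2 * (n - 1))  # information under H0
    log_or = z / v                       # one-step estimate of ln(psi)
    se = 1.0 / sqrt(v)                   # approximate standard error
    p_value = chi2.sf(z ** 2 / v, df=1)  # score test of H0: psi = 1
    return log_or, se, p_value

# Illustrative 100 + 100 sample dichotomized at the most extreme cutpoint
# (categories 1-4 versus 5); the counts are hypothetical.
print(score_log_or(a=18, n_exp=100, b=10, n_unexp=100))
```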
Ordinal outcome data can be generated under the
proportional odds model. By specifying the cumulative distribution of the ordinal outcome variable for the unexposed group, {p_01, p_02, ..., p_0(k−1)}, and the common odds ratio, ψ, one can easily determine the corresponding distribution for the exposed group, {p_11, p_12, ..., p_1(k−1)}. In the present simulation study, k = 5 and {p_01, p_02, p_03, p_04} = {0.333, 0.500, 0.667, 0.900}. Furthermore, two common odds ratios were considered: ψ = 1, i.e., no exposure effect, and ψ = 2, i.e., a moderate exposure effect, which implied that {p_11, p_12, p_13, p_14} = {0.200, 0.333, 0.500, 0.818}.
The simulation samples were obtained by generating
random numbers from the standard uniform distribution and grouping the numbers into five categories
according to the exposure-specific multinomial probabilities at issue. The sample size was set to 100 +
100.
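A minimal sketch of this data-generating step is given below (Python; the function names and random seed are illustrative choices, not part of the original description): the cumulative odds of the unexposed group are divided by ψ to obtain the cumulative probabilities of the exposed group, and each subject's category is then read off from a standard uniform random number.

```python
import numpy as np

def exposed_cumulative(p0, psi):
    """Cumulative probabilities for the exposed group under the proportional
    odds model: odds(Y1 <= j) = odds(Y0 <= j) / psi at every cutpoint j."""
    p0 = np.asarray(p0, dtype=float)
    odds = (p0 / (1.0 - p0)) / psi
    return odds / (1.0 + odds)

def simulate_group(p_cum, n, rng):
    """Draw n ordinal outcomes (categories 1..k) by grouping standard uniform
    random numbers according to the cumulative category probabilities."""
    cuts = np.append(p_cum, 1.0)         # P(Y <= k) = 1
    u = rng.uniform(size=n)              # standard uniform random numbers
    return 1 + np.searchsorted(cuts, u)  # ordered category labels 1, ..., k

rng = np.random.default_rng(1996)                  # arbitrary seed
p0 = np.array([0.333, 0.500, 0.667, 0.900])        # unexposed group, k = 5
p1 = exposed_cumulative(p0, psi=2.0)               # approx. {0.200, 0.333, 0.500, 0.818}
y_unexposed = simulate_group(p0, 100, rng)
y_exposed = simulate_group(p1, 100, rng)
```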
Of course, an outcome variable with five ordered
categories can be collapsed in several different ways.
In the present study, for each possible ordinal categorization of a generated data set (cf. table 1), the true
effect ψ was estimated and the null hypothesis H_0: ψ = 1 was tested against the alternative H_1: ψ ≠ 1, with the type I error approximately equal to 0.05. Each result
from the simulation study is based on 10,000 replicated data sets.
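As an illustration, collapsing a generated sample reduces to summing the counts of adjacent categories; the short sketch below (Python; the function name, grouping argument, and example counts are illustrative, not taken from the paper) shows the operation for one of the three-category schemes considered in table 1.

```python
import numpy as np

def collapse_counts(counts, groups):
    """Collapse counts over ordered categories 1..k by summing adjacent
    categories; `groups` lists the original categories forming each new one,
    e.g. [[1], [2, 3], [4, 5]] turns five categories into three."""
    counts = np.asarray(counts)
    return np.array([counts[np.asarray(g) - 1].sum() for g in groups])

# Hypothetical five-category counts for one simulated group of 100 subjects,
# collapsed to the three-category variable 1 | 2-3 | 4-5 of table 1.
counts5 = np.array([33, 17, 17, 23, 10])
print(collapse_counts(counts5, [[1], [2, 3], [4, 5]]))   # -> [33 34 33]
```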
RESULTS
Table 1 summarizes the results obtained for each
ordinal categorization. The respective mean log odds
ratio estimates were very close to the true values. By
collapsing the outcomes, the standard deviations of the
corresponding estimates were increased, although
marginally, except when the outcomes were dichotomized by the upper cutpoint; note that this cutpoint is
the most extreme one with respect to the underlying
outcome distributions. For the original outcomes (in
five categories), a type I error of 5.4 percent and a
power (given that ψ = 2) of 78 percent were observed.
The collapsed outcome variables implied similar type
I errors and powers between 56 percent and 76 percent, except for the binary outcome defined by the
upper cutpoint, which gave a very low power of 38 percent.

TABLE 1. Log odds ratio (ln(ψ)) estimates, type I errors, and powers obtained from a simulation study of the effects of collapsing ordered outcome categories

                                         ln(ψ) = 0.00                        ln(ψ) = 0.69
Outcome variable*              Mean†   SD‡   Type I error (%)§     Mean†   SD‡   Power (%)§

Outcomes in five ordered categories
1 | 2 | 3 | 4 | 5               0.00   0.26         5.4             0.69   0.25       78

Outcomes in four ordered categories
1 | 2 | 3 | 4-5                 0.00   0.26         5.3             0.69   0.26       76
1 | 2 | 3-4 | 5                 0.00   0.27         5.2             0.68   0.25       75
1 | 2-3 | 4 | 5                 0.00   0.26         5.3             0.68   0.26       76
1-2 | 3 | 4 | 5                 0.00   0.27         5.5             0.69   0.26       75

Outcomes in three ordered categories
1 | 2 | 3-5                     0.00   0.28         5.1             0.69   0.28       70
1 | 2-3 | 4-5                   0.00   0.27         5.3             0.69   0.27       75
1 | 2-4 | 5                     0.00   0.28         5.2             0.68   0.27       68
1-2 | 3 | 4-5                   0.00   0.28         5.5             0.69   0.27       73
1-2 | 3-4 | 5                   0.00   0.28         5.2             0.69   0.28       72
1-3 | 4 | 5                     0.00   0.30         5.6             0.69   0.26       69

Binary outcomes
1 | 2-5                         0.00   0.30         5.0             0.68   0.31       56
1-2 | 3-5                       0.00   0.29         5.9             0.69   0.29       68
1-3 | 4-5                       0.00   0.30         5.1             0.69   0.29       68
1-4 | 5                        -0.01   0.48         5.1             0.67   0.40       38

* A cutpoint is indicated by a vertical bar (outcome data were generated from exposure-specific random variables with five ordered categories, and were collapsed in different ways).
† Mean of 10,000 estimates.
‡ SD, standard deviation.
§ Obtained from 10,000 replicated tests of H_0: ψ = 1 versus H_1: ψ ≠ 1.
For each data set, four different log odds ratio estimates were obtained by dichotomizing the outcomes (one for each cutpoint); a relatively large maximum absolute difference between such estimates, i.e.,

$$\max_{i,j} \left| \ln(\hat\psi_i) - \ln(\hat\psi_j) \right| \qquad (i, j = 1, 2, 3, 4),$$

was usually obtained (the 20th, 50th, and 80th centiles of 10,000 such differences equalled 0.29, 0.48, and 0.75 for ln(ψ) = 0 and 0.28, 0.44, and 0.66 for ln(ψ) = 0.69; see table 2). When three ordered
outcome categories were used, six different log odds
ratio estimates were obtained from each data set; the
maximum absolute difference between such estimates
was usually substantially smaller than in the situation
with binary outcomes (table 2). The influence of the
choice of cutpoints on the estimate was weakest when
outcomes with four categories were considered. In
accordance with these results, hypothesis testing across the possible categorizations with a fixed number of outcome categories more frequently led to incoherent conclusions for a given data set as the number of categories was decreased. For example, as is seen in table 2, when ψ was equal to 1, the null hypothesis was not rejected for any ordinal categorization of the outcomes into four categories with a frequency of 91.1 percent, whereas the corresponding percentage decreased to 84.4 for the binary outcomes; when ψ was equal to 2, the null hypothesis was rejected for all ordinal categorizations of the outcomes into four categories with a frequency of 67.3 percent, whereas the corresponding percentage markedly decreased to 21.7 for the binary outcomes.
Clearly, it is desirable that, with high frequency, the null hypothesis is not rejected when ψ = 1 and is rejected when ψ = 2, regardless of how the outcomes are categorized.
DISCUSSION
An investigator's classification of outcome data
may be somewhat arbitrary. One point of concern is
the number of outcome categories used in the analysis.
Even if efficiency is not substantially reduced by collapsing the observed outcomes, such an approach may
still raise concern. The results of the above simulation
study demonstrate that, when analyzing a data set of
limited size, changing the outcome categories can frequently affect the effect estimate as well as the inference being drawn from the data—particularly if a
single cutpoint is considered. Collapsing of an ordinal
outcome variable may therefore raise concern. The
results are not surprising; one should bear in mind that
the common odds ratio estimate is obtained by weighting information provided by the odds ratio estimates
for the corresponding binary splits (2). Thus, it seems
more robust to involve at least three outcome categories, rather than an arbitrary binary outcome, in the
analysis.
Admittedly, the simulation results were obtained under
certain limitations. The case focused on here was one
in which the proportional odds model was correct.
Moreover, only one set of {p_0j} for the unexposed
group, one non-null odds ratio, and one sample size
were examined. I have also performed complementary
simulations which essentially led to similar conclusions. However, some comments should be made.
TABLE 2. Effects of collapsing the originally generated outcomes (which fell into five ordered categories) with respect to two given log odds ratios, ln(ψ) = 0 and ln(ψ) = 0.69

           Difference between estimates*          % of data sets which resulted in n rejections of H_0†
ln(ψ)      20th       50th       80th             n=0    n=1    n=2    n=3    n=4    n=5    n=6
           centile    centile    centile

Outcomes in four ordered categories
0.00       0.08       0.12       0.18             91.1    3.1    1.8    1.4    2.6    —‡     —
0.69       0.08       0.12       0.18             17.0    4.6    4.6    6.5   67.3    —      —

Outcomes in three ordered categories
0.00       0.16       0.26       0.39             87.1    5.3    2.4    2.3    0.9    1.0    1.0
0.69       0.16       0.26       0.37             13.5    5.5    4.3    7.6    6.4   10.5   52.2

Binary outcomes
0.00       0.29       0.48       0.75             84.4   11.4    3.1    0.9    0.2    —      —
0.69       0.28       0.44       0.66             14.5   14.4   19.6   29.8   21.7    —      —

* Maximum absolute difference between the log odds ratio estimates obtained from a given data set, using a fixed number of ordered outcome categories (i.e., four estimates for outcomes in four categories, six estimates for outcomes in three categories, and four estimates for binary outcomes). The 20th, 50th, and 80th centiles of 10,000 such differences (one for each generated data set) are presented.
† Results from hypothesis testing of H_0: ψ = 1 versus H_1: ψ ≠ 1 (with the type I error approximately equal to 0.05; cf. table 1) with respect to the possible ordinal categorizations in question. The results are based on 10,000 replicated data sets.
‡ Comparison not possible.
When the sample size, and thereby the power (provided that ψ ≠ 1), was increased, the influence of collapsing outcome categories naturally weakened. In situations
where most of the potential cutpoints were rather
extreme with respect to the underlying outcome distributions, inference from the effect parameter was
particularly sensitive to the actual choice of outcome
categories. Thus, the issue of where to put the cutpoints may require attention even if more than two
outcome categories are used; one should avoid using
many extreme cutpoints. Furthermore, when the proportional odds condition was slightly violated, the
effect of collapsing categories tended to be more pronounced.
Estimation under the proportional odds model,
which can be applied in a multivariable setting, is
preferably carried out by the maximum likelihood method (2, 5). In the present simulation study,
however, a log odds ratio estimate was based on the
efficient score and on Fisher's information, which
provide an approximate maximum likelihood estimate
by simple calculations (6, 7). It should be stressed that
the maximum likelihood estimates tend to be larger
than the estimates based on the efficient score and
Fisher's information; the closer the log odds ratio
estimates are to zero, the better the agreement (7).
Thus, inference based on the efficient score and
Fisher's information might not be appropriate when
the effect is large. I therefore considered a moderate
exposure effect in the simulation study.
The proportional odds model seems to be reasonable
for many applications (2, 5). However, if data are
obtained under outcome-stratified ("case-control")
sampling, a proportional odds analysis may not be
appropriate, because it yields an effect estimate that
depends on the sampling fractions of the different
outcome categories (4). In such situations, it might be
better to apply the so-called stereotype model (3, 4).
Of course, in each application, one should check to see
whether the model employed seems appropriate (2-5).
Ordinal outcome data may also be obtained by
grouping of continuous outcome data. For example,
ordinal blood pressure values may be obtained by
grouping measured blood pressure values. If continuous outcome data are available, one should first attempt to analyze the data using an appropriate model
with a continuous outcome variable, because a substantial loss of information on the effect can sometimes be incurred by grouping the data. However,
epidemiologists commonly prefer to dichotomize a
continuous outcome variable, since it casts that variable into a traditional epidemiologic mold (i.e., disease
vs. no disease). The present note of concern indicates
that it would be more appropriate to categorize a
continuous outcome variable into at least three categories instead of two. Specifically, the effect estimate
is then likely to be less dependent on the choice of
cutpoints (cf. Ragland (8), who addresses this problem
by considering a single cutpoint).
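For instance, a measured blood pressure variable can be cut at two or more points to yield at least three ordered categories rather than a single dichotomy; a brief sketch follows (Python; the cutpoints and values are arbitrary illustrations).

```python
import numpy as np

def to_ordinal(values, cutpoints):
    """Group a continuous outcome into ordered categories 1..(len(cutpoints)+1)
    using increasing cutpoints."""
    return 1 + np.searchsorted(np.asarray(cutpoints), np.asarray(values))

# Hypothetical systolic blood pressure values (mm Hg) and arbitrary cutpoints.
sbp = np.array([118.0, 132.0, 141.0, 127.0, 156.0])
print(to_ordinal(sbp, cutpoints=[130.0, 150.0]))   # three categories -> [1 2 2 1 3]
print(to_ordinal(sbp, cutpoints=[140.0]))          # dichotomy        -> [1 1 2 1 2]
```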
In summary, epidemiologists should be prudent in
dichotomizing outcome data. It is preferable to first
attempt to analyze the original outcome data using an
appropriate statistical model.
REFERENCES
1. International Labour Office, United Nations. Guidelines for
the use of ILO international classification of radiographs of
pneumoconioses. Geneva, Switzerland: International Labour
Office, 1980. (Occupational Safety and Health, series 22, rev.
80).
2. McCullagh P. Regression models for ordinal data (with discussion). J R Stat Soc [B] 1980;42:109-42.
3. Anderson JA. Regression and ordered categorical variables (with discussion). J R Stat Soc [B] 1984;46:1-30.
4. Greenland S. Alternative models for ordinal logistic regression. Stat Med 1994;13:1665-77.
5. Armstrong BG, Sloan M. Ordinal regression models for epidemiologic data. Am J Epidemiol 1989;129:191-204.
6. Whitehead A, Whitehead J. A general parametric approach to the meta-analysis of randomized clinical trials. Stat Med 1991;10:1665-77.
7. Whitehead A, Jones NM. A meta-analysis of clinical trials
involving different classifications of response into ordered
categories. Stat Med 1994;13:2503-15.
8. Ragland DR. Dichotomizing continuous outcome variables:
dependence of the magnitude of association and statistical
power on the cutpoint. Epidemiology 1992;3:434-40.