Estimating Prevalence of Type 1 and Type 2 Diabetes

American Journal of Epidemiology
Copyright © 1999 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 149, No. 1
Printed in U.S.A.
Estimating Prevalence of Type 1 and Type 2 Diabetes in a Population of
African Americans with Diabetes Mellitus
James P. Boyle,1 Michael M. Engelgau,1 Theodore J. Thompson,1 Merilyn G. Goldschmid,1 Gloria L. Beckles,1
David S. Timberlake,1 William H. Herman,2 David C. Ziemer,3 and Daniel L. Gallina3
The pathogenesis, treatment, and outcomes of type 1 and type 2 diabetes differ. Current surveys derive
population-based estimates of diabetes prevalence by type using limited clinical information and applying
classification rules developed in white populations. How well these rules perform when deriving similar
estimates in African American populations is unknown. For this study, data were collected on a group of African
Americans with diabetes who enrolled at the Diabetes Unit of Grady Memorial Hospital in Atlanta, Georgia, from
April 16, 1991, to November 1, 1996. The data were used to develop some simple classification rules for African
Americans based on a classification tree and a logistic regression model. Sensitivities and specificities, in which
fasting C-peptide was used as the gold standard, were determined for these rules and for two current rules
developed in mostly white, non-Hispanic populations. Rules that yielded precise (minimum variance unbiased)
estimates of the prevalence of type 1 diabetes were preferred. The authors found that a rule based on the
logistic regression model was best for estimating type 1 prevalences ranging from 1% to 17%. They concluded
that simple classification rules can be used to estimate prevalence of diabetes by type in African American
populations and that the optimal rule differs somewhat from the current rules. Am J Epidemiol 1999; 149:55-63.
classification; diabetes mellitus; logistic models
Diabetes mellitus is recognized as an important public health problem in the United States and throughout
the world. The most prevalent forms of diabetes are
type 1 and type 2. Type 1 typically occurs in youth, is
characterized by severe insulin deficiency secondary to
autoimmune destruction of insulin-producing cells, and
often results in diabetic ketoacidosis. Type 2 typically
occurs in adulthood, is characterized by insulin resistance and high insulin levels, and generally does not
result in diabetic ketoacidosis (1, 2). Because the two forms of diabetes differ in etiology, associated risk factors, and prevention strategies, public
health surveillance that distinguishes between them
is important. Unfortunately,
data available from population-based surveys that are
useful in classifying diabetes are limited, and to our
knowledge the validity of diabetes classification based
on such data has not been examined in minority populations in the United States.
Rules used for accurate classification of diabetes were
developed from extensive clinical research. From this
research, C-peptide emerged as a biochemical marker
central to distinguishing type 1 from type 2 diabetes. C-peptide is a polypeptide that is cleaved from proinsulin
and reflects endogenous pancreatic capacity to secrete
insulin (3-8). Exogenous insulin (i.e., insulin administered for treatment) does not alter the C-peptide level, so
C-peptide is a reliable measure of endogenous insulin production
even in patients who require insulin. A C-peptide level
of <0.9 ng/ml (0.3 pmol/ml), which is very low, has been
used to accurately classify persons with type 1 diabetes.
Based on findings from clinical research, simple classification rules were developed that can function with
the limited information usually available from public
health survey data (3, 9, 10). This development has
made possible the surveillance of trends in diabetes by
type. Unfortunately, the data from which the original
rules were derived came largely from studies of white,
non-Hispanic populations. The incidence, prevalence,
and clinical presentation of type 1 and type 2 diabetes
in African Americans, Hispanics, Asian/Pacific
Islanders, and American Indians are different from
those in white, non-Hispanic groups (11-19). To our
knowledge, how well the existing classification
schemes perform in minority groups has not been
examined previously.
Received for publication October 14, 1997, and accepted for publication May 11, 1998.
Abbreviations: BMI, body mass index; CR I, current rule I; CR II,
current rule II.
1 Division of Diabetes Translation, National Center for Chronic
Disease Prevention and Health Promotion, Centers for Disease
Control and Prevention, Atlanta, GA.
2 Division of Endocrinology and Metabolism, Department of
Internal Medicine, University of Michigan Medical Center, Ann Arbor,
MI.
3 Diabetes Unit, Grady Memorial Hospital, Atlanta, GA.
Reprint requests to James P. Boyle, Division of Diabetes
Translation, Mailstop K-68, Centers for Disease Control and
Prevention, 4770 Buford Highway NE, Atlanta, GA 30341-3724.
For this study, development and evaluation of methods to estimate the prevalence of type 1 and type 2 diabetes included the following steps: 1) develop classification rules, 2) calculate the unadjusted estimates by
using the classification rules, 3) obtain unbiased estimates by adjusting the estimates for their sensitivities
and specificities, and 4) select the unbiased estimator
with minimum variance. The unadjusted estimates are
simply the proportion of people classified as having
type 1 or type 2 diabetes. These estimates are often used
directly (9, 10) but are known to be biased when sensitivity and specificity do not both equal one (20). The unadjusted estimates are inappropriate because they estimate the proportion of people in the entire population who are classified as type 1 or type 2, not the prevalence of each type.
Thus, we considered only unbiased estimates of prevalence. Small variances are preferred, since a smaller variance implies a higher probability that the estimate lies within any given distance of the true population prevalence.
The purpose of this study was twofold. Using
detailed clinical and laboratory information available
from a clinic-based population of African Americans
with diabetes, we wanted to evaluate the performance
of classification rules currently used with survey data to
estimate diabetes prevalence by type. In addition, we
used this detailed information to develop classification
rules specific to African Americans so that potential differences by race could be assessed.
MATERIALS AND METHODS
Study population and data
Between April 16, 1991, and November 1, 1996,
4,553 persons with diabetes mellitus (excluding those
with gestational diabetes and secondary forms of diabetes) enrolled at the Diabetes Unit of Grady Memorial
Hospital in Atlanta, Georgia, which serves a predominantly African American clinic population. During initial assessment at the clinic, extensive historical, medical, and biochemical information is collected
systematically. Details of this assessment have been
published elsewhere (21).
Of the 4,553 new patients who enrolled during this
time period, 4,030 (88.5 percent) were African
American, 369 (8.1 percent) were white, 70 (1.5 percent) were Hispanic, and 84 (approximately 1.9 percent) were of other or unknown ethnic origin. A total of
2,794 (61 percent) were women and 1,759 (39 percent)
were men.
Subjects eligible for this study included all persons
who enrolled at the Diabetes Unit between April 16,
1991, and November 1, 1996; were African American;
met the standard American Diabetes Association criteria
for the diagnosis of diabetes (22); and had serum creatinine levels of <2.0 mg/dl. Creatinine levels of ≥2.0 mg/dl indicate significant
renal insufficiency, which can result in spurious C-peptide
readings. For this analysis, we required information on
sex (male, female), the dichotomous variable current
insulin use (1 = yes, 0 = no), age, age at diabetes diagnosis, body mass index (BMI), and fasting C-peptide
level. Because the creatinine levels of 259 subjects were
either missing or ≥2.0 mg/dl, these subjects were
excluded from analysis. An additional 158 subjects were
excluded because of missing data on required variables
(77 for missing C-peptide levels, eight for missing age
at diagnosis, and 73 for missing data used to calculate
BMI), reducing the final number of subjects to 3,613.
Figure 1 is a histogram of the distribution of the 3,694
available C-peptide levels (log base 10). An analysis
(details not shown) demonstrated no evidence of
bimodality, so the data did not suggest a natural cutpoint of their own. Therefore, we used the established fasting C-peptide cutpoints of ≤0.9 ng/ml and >0.9 ng/ml to classify diabetes
as type 1 or type 2, respectively (3-8).
Of the 245 subjects who were classified as having
type 1 diabetes, 49 percent were men and 91 percent
were current insulin users (table 1). Among the 3,368
persons classified as having type 2 diabetes, 36 percent
were men and 39 percent were current insulin users.
Distributions of age, age at diagnosis, and BMI by diabetes type are also shown in this table.
Statistical analyses
When available data are limited, two simple rules, or
very similar rules, are often used in national surveys to
classify diabetes by type. The first classifies persons as
having type 1 diabetes if they are currently using insulin
and the age at diagnosis is <30 years; otherwise, they
have type 2 diabetes. The second classifies persons as
having type 1 diabetes if they are currently using
insulin, the age at diagnosis is <30 years, and the BMI
is <26; otherwise, they have type 2 diabetes (9). For
clarity, we will call these rules current rule I (CR I) and
current rule II (CR II), respectively. The sensitivities
and specificities for these rules were determined so we
could compare the performance of these rules with the
performance of the new rules.
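The two current rules are simple enough to state as predicate functions. The following is a minimal Python sketch; the function names and the coding of diabetes type as 1 or 2 are illustrative choices, not part of the original survey definitions.

```python
def classify_cr1(insulin_user: bool, age_at_diagnosis: float) -> int:
    """CR I: type 1 if currently using insulin and diagnosed before age 30."""
    return 1 if insulin_user and age_at_diagnosis < 30 else 2


def classify_cr2(insulin_user: bool, age_at_diagnosis: float, bmi: float) -> int:
    """CR II: the CR I conditions plus a body mass index below 26."""
    return 1 if insulin_user and age_at_diagnosis < 30 and bmi < 26 else 2


# An insulin user diagnosed at 25 with a BMI of 30 is type 1 under CR I
# but type 2 under CR II, because the BMI condition fails.
print(classify_cr1(True, 25), classify_cr2(True, 25, 30))  # 1 2
```

Because CR II only adds a condition to CR I, every person CR II calls type 1 is also type 1 under CR I, which is why CR II trades sensitivity for specificity.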
We used two approaches to construct new rules for
classifying diabetes by type. The first was classification
trees. Classification trees provide an exploratory method
of identifying factors and interactions among factors that
may explain variation in a binary outcome (23). They
are constructed by using recursive partitioning. At each
node of the tree (beginning with the root node, which
consists of all observations to be used in constructing the
tree), the following dichotomous splits of the data based
on predictor variables are allowed.

FIGURE 1. Histogram of the log base 10 of C-peptide values for 3,694 African Americans with diabetes mellitus who enrolled at the Diabetes Unit of Grady Memorial Hospital in Atlanta, Georgia, between April 16, 1991, and November 1, 1996. [Image not reproduced; horizontal axis, log10(C-peptide).]

TABLE 1. Distribution of the characteristics of African Americans with diabetes enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

                               Type 1 diabetes (n = 245)   Type 2 diabetes (n = 3,368)
Characteristic                 No.       %*                No.       %*
Sex: male                      119       49                1,223     36
Current insulin user           223       91                1,307     39
Age (years)
  <20                          4         2                 21        1
  20-44                        109       44                954       28
  ≥45                          132       54                2,393     71
Age at diagnosis (years)
  <20                          50        20                88        3
  20-44                        110       45                1,367     41
  ≥45                          85        35                1,913     56
Body mass index (kg/m²)
  <20                          25        10                51        2
  20-24                        75        31                411       12
  25-29                        75        31                1,001     30
  ≥30                          70        28                1,905     56

* Some percentages have been rounded.

For a continuous or
ordered variable $X_j$, the allowed splits are of the form $X_j < x$
versus $X_j \ge x$; for an unordered categorical variable,
they are the partitions of its categories into two classes. The predictor variable
and split combination is chosen to maximize the reduction in deviance for the tree, thereby transforming the
node in question into two nodes. Nodes that contain
fewer than 10 observations or contain observations of
only one type are not split further; they are terminal nodes.
The procedure is then repeated until no more splits are
allowed. Since constructing a tree usually entails some
overfitting, an algorithm can be applied that creates a
nested sequence of subtrees by eliminating the least
important splits (23). On the basis of this algorithm and
by using the two dichotomous variables of sex and current insulin use and the three continuous variables of
age, age at diagnosis, and BMI, we chose a simple tree
that classifies diabetes using data that often differ by
type. All trees were constructed by using S-Plus
(Statistical Sciences, Inc., Seattle, Washington).
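To make the splitting step concrete, here is a minimal Python sketch of one greedy split search on a single continuous predictor. It assumes the binomial deviance commonly used for classification trees and takes candidate cutpoints as midpoints between adjacent observed values. It is not the S-Plus tree implementation, the function names are hypothetical, and the toy data are invented.

```python
import math


def node_deviance(n1: int, n2: int) -> float:
    """Binomial deviance of a node holding n1 type 1 and n2 type 2 cases."""
    n = n1 + n2
    return -2.0 * sum(k * math.log(k / n) for k in (n1, n2) if k > 0)


def best_split(x, y, min_node=10):
    """Best split x < c versus x >= c for one continuous predictor.

    x: predictor values; y: 1 for type 1, 0 for type 2.
    Returns (cutpoint, deviance reduction), or (None, 0.0) when the node is
    terminal because it has fewer than min_node cases or only one type.
    """
    n, n1 = len(y), sum(y)
    if n < min_node or n1 in (0, n):
        return None, 0.0
    pairs = sorted(zip(x, y))
    parent = node_deviance(n1, n - n1)
    best_cut, best_gain = None, 0.0
    left1 = 0  # type 1 cases among the first i sorted observations
    for i in range(1, n):
        left1 += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cutpoint can separate tied values
        right1 = n1 - left1
        children = node_deviance(left1, i - left1) + node_deviance(right1, (n - i) - right1)
        gain = parent - children
        if gain > best_gain:
            best_cut, best_gain = (pairs[i - 1][0] + pairs[i][0]) / 2.0, gain
    return best_cut, best_gain


# Invented toy data: six early-onset type 1 cases, six later-onset type 2 cases.
ages = [10, 12, 15, 18, 20, 22, 40, 45, 50, 55, 60, 65]
type1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
print(best_split(ages, type1))  # cutpoint 31.0, gain 24 ln 2 (about 16.64)
```

A full tree would run this search over every predictor at every node, keep the split with the largest deviance reduction, recurse on the two children, and then prune.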
Classification rules were defined from the tree in the
following manner. A threshold probability P was chosen. Each person with diabetes was assigned to exactly one
of the terminal nodes, which has an associated probability, p,
of being type 1 equal to the proportion of persons with
type 1 diabetes in the node. Everyone in the node was
classified as type 1 if p > P and as type 2 if p ≤ P. For
each threshold P, the sensitivity and specificity were
defined as the proportions of persons with type 1 and type 2 diabetes, respectively, who were correctly classified.
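The threshold rule and the two accuracy measures can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names. It assumes diabetes type is coded 1 (type 1) and 0 (type 2), with the C-peptide classification taken as the truth.

```python
def classify(probabilities, threshold):
    """Type 1 (coded 1) when p > P, otherwise type 2 (coded 0)."""
    return [1 if p > threshold else 0 for p in probabilities]


def sensitivity_specificity(truth, predicted):
    """Sensitivity: share of true type 1 cases classified as type 1.
    Specificity: share of true type 2 cases classified as type 2.
    """
    pairs = list(zip(truth, predicted))
    n_type1 = sum(1 for t, _ in pairs if t == 1)
    n_type2 = len(pairs) - n_type1
    true_pos = sum(1 for t, c in pairs if t == 1 and c == 1)
    true_neg = sum(1 for t, c in pairs if t == 0 and c == 0)
    return true_pos / n_type1, true_neg / n_type2


# Hypothetical example: five people with node probabilities and C-peptide-defined type.
truth = [1, 1, 0, 0, 0]
probs = [0.52, 0.13, 0.52, 0.03, 0.01]
sens, spec = sensitivity_specificity(truth, classify(probs, 0.5))
print(sens, round(spec, 3))  # 0.5 0.667
```

Sweeping the threshold from 0 to 1 traces the sequence of rules: at P = 0 everyone is type 1 (sensitivity 1, specificity 0), and as P rises, sensitivity falls while specificity rises.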
The second approach was to construct classification
rules by using logistic regression. We modeled the log
odds that a person had type 1 diabetes as a linear function of covariates. Several candidate models were estimated (24), including all 32 main effects models (every subset of the five predictors); a
model with all five predictors and the three quadratic
terms in age, age at diagnosis, and BMI; and the 10 models consisting of the five predictors plus one pairwise interaction.
The Schwarz information criterion (25) was used to
select the final model. The criterion for any model
equals $D + k \ln n$, where $D$ is the deviance, $k$ is the number of parameters in the model, and $n$ is the number of
observations. The best model minimized the Schwarz
information criterion. All logistic regression models
were estimated by using programs written in APL
(Manugistics, Inc., Rockville, Maryland).
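The candidate set and the selection criterion can be enumerated directly. The following Python sketch uses hypothetical short names for the five predictors. It assumes the "10 models with pairwise interactions" each add a single interaction, which is one reading of the text since there are exactly 10 pairs of five predictors. Fitting the models themselves is omitted.

```python
import itertools
import math

# Hypothetical short names for the five candidate predictors.
PREDICTORS = ["sex", "insulin", "age", "age_dx", "bmi"]


def candidate_models():
    """List the candidate model specifications as lists of term names."""
    models = []
    # Every subset of the five predictors: 2**5 = 32 main effects models.
    for r in range(len(PREDICTORS) + 1):
        models.extend(list(c) for c in itertools.combinations(PREDICTORS, r))
    # All five predictors plus quadratic terms in the three continuous variables.
    models.append(PREDICTORS + ["age^2", "age_dx^2", "bmi^2"])
    # All five predictors plus one of the C(5, 2) = 10 pairwise interactions (assumed reading).
    for a, b in itertools.combinations(PREDICTORS, 2):
        models.append(PREDICTORS + [f"{a}*{b}"])
    return models


def schwarz_criterion(deviance: float, k: int, n: int) -> float:
    """Schwarz information criterion D + k ln n; smaller is better."""
    return deviance + k * math.log(n)


print(len(candidate_models()))  # 43
```

The penalty term explains why the criterion favors small models. If, for example, n were the 1,806 observations used for rule construction, each extra parameter would add ln 1,806 ≈ 7.5, so a term is worth including only if it lowers the deviance by more than that.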
For this approach, classification rules were defined as
they were for classification trees, by fixing a threshold
probability P. For a person with diabetes, the fitted
logistic regression model determined a predicted probability, p, of having type 1 diabetes. The person was classified as type 1 when p > P and as type 2 when p ≤ P.
Varying the threshold probability generated a family of classification rules. As before, sensitivity and specificity were
defined as the proportions of persons with type 1 and type 2 diabetes, respectively, who were correctly classified.
To avoid estimating sensitivities and specificities
from the same data used to construct classification rules,
half of the 3,613 observations (n = 1,806) were randomly selected to enable us to construct rules and the
remaining observations (n = 1,807) were withheld so
that we could determine sensitivities and specificities.
There were 138 persons with type 1 diabetes and 1,668
with type 2 diabetes among the 1,806 observations used
to construct classification rules. Among the remaining
1,807 observations, 107 persons had type 1 diabetes and
1,700 had type 2 diabetes.
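The subject counts reported for the study population and the split described here can be checked with simple bookkeeping. The following Python sketch recomputes the analytic sample. It also shows that 3,694, the number of C-peptide values in Figure 1, equals the 3,771 subjects left after the creatinine exclusion minus the 77 with missing C-peptide. That is our reading of the counts, not a statement in the text. The split function is a hypothetical illustration of a random half split, not the authors' procedure.

```python
import random

# Subject counts as reported in the text.
african_american = 4030
excluded_creatinine = 259          # creatinine missing or too high
missing = {"C-peptide": 77, "age at diagnosis": 8, "BMI inputs": 73}

after_creatinine = african_american - excluded_creatinine
analytic = after_creatinine - sum(missing.values())
with_c_peptide = after_creatinine - missing["C-peptide"]
print(analytic, with_c_peptide)    # 3613 3694


def split_half(ids, seed=0):
    """Randomly split subject IDs into a construction half and a withheld half.

    The seed and the use of Python's random module are illustrative assumptions.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


construction, withheld = split_half(range(analytic))
print(len(construction), len(withheld))   # 1806 1807
```

With an odd total of 3,613, integer halving gives the 1,806 / 1,807 split reported above. The type-specific counts also add up: 138 + 107 = 245 persons with type 1 diabetes and 1,668 + 1,700 = 3,368 with type 2.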
In general, the choice of an appropriate rule depends
on the objective when applying the rule to a specific
population (26). For example, if classification rules are
used to diagnose a patient and the costs of misdiagnosis
are known, then a rule that minimizes the average cost
of misdiagnosis in the population might be preferred. If
these costs are either equal or unknown, a rule that minimizes the misclassification rate might be chosen. In this
study we took a different perspective and emphasized
the importance of rules that yielded estimates of prevalence (type 1 or type 2 diabetes in a population of persons with diabetes) with good statistical properties. For
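a rule with known sensitivity $\alpha$ and specificity $\beta$, the
natural unbiased estimate of the prevalence $\pi$ is given by

$$\hat{\pi} = \frac{\hat{\tau} + \beta - 1}{\alpha + \beta - 1},$$

where $\hat{\tau}$ is the proportion in the sample classified as
positive for the disease (type 1 in this case) (20).
Therefore, $\hat{\tau}$ estimates the proportion $\tau$ of the population
classified as type 1. This equation follows from the
observation that $\tau = \alpha\pi + (1 - \beta)(1 - \pi)$. For a
simple random sample of size $N$, $\hat{\tau}$ has variance $\tau(1 - \tau)/N$, and because $\hat{\pi}$ is a linear function of $\hat{\tau}$ with slope $1/(\alpha + \beta - 1)$,

$$\operatorname{Var}(\hat{\pi}) = \frac{\tau(1 - \tau)}{N(\alpha + \beta - 1)^{2}}. \qquad (1)$$

The variance of $\hat{\pi}$ is estimated by substituting $\hat{\tau}$ for $\tau$ in
equation 1. We sought rules that minimized this variance.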
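As a check on these formulas, the following Python sketch computes the expected classified proportion, the corrected prevalence estimate, and the variance of equation 1. The variance is evaluated at the expected $\tau$ with N = 1, which is an assumption. It reproduces the table 5 entries for the tree rule at a prevalence of 1 percent. Function names are illustrative.

```python
def expected_tau(prevalence: float, sens: float, spec: float) -> float:
    """Expected proportion classified as type 1: alpha*pi + (1 - beta)*(1 - pi)."""
    return sens * prevalence + (1.0 - spec) * (1.0 - prevalence)


def corrected_prevalence(tau_hat: float, sens: float, spec: float) -> float:
    """Unbiased prevalence estimate (tau_hat + beta - 1) / (alpha + beta - 1)."""
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("the estimator requires sensitivity + specificity > 1")
    return (tau_hat + spec - 1.0) / denom


def prevalence_variance(prevalence: float, sens: float, spec: float, n: int = 1) -> float:
    """Equation 1, tau(1 - tau) / (N (alpha + beta - 1)^2), at the expected tau."""
    tau = expected_tau(prevalence, sens, spec)
    return tau * (1.0 - tau) / (n * (sens + spec - 1.0) ** 2)


# Tree rule with alpha = 0.243 and beta = 0.971 at a true prevalence of 1 percent.
print(round(expected_tau(0.01, 0.243, 0.971), 3))         # 0.031
print(round(prevalence_variance(0.01, 0.243, 0.971), 3))  # 0.659
```

The squared factor $(\alpha + \beta - 1)^2$ in the denominator is what drives rule choice. A rule whose sensitivity plus specificity barely exceeds 1 inflates the variance sharply, even if it is very specific.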
RESULTS
Recall that CR I classifies persons as having type 1
diabetes if they are currently using insulin and the age
at diagnosis is <30 years; otherwise, they are classified
as type 2. We used this rule to classify those in the
group withheld to determine sensitivities and specificities. Of the 107 persons in this group with type 1 diabetes, 33 were classified as type 1, for a sensitivity of
33/107 = 0.308. Of the 1,700 persons remaining—
those who had type 2 diabetes—1,596 were classified
as type 2, for a specificity of 1,596/1,700 = 0.939. CR
II classifies persons as type 1 if they are current insulin
users, the age at diagnosis is <30 years, and the BMI is
<26; otherwise, they are classified as type 2. When we
used this rule, similar calculations led to a lower sensitivity of 16/107 = 0.150 and a higher specificity of
1,673/1,700 = 0.984.
Figure 2 shows the classification tree that resulted
from using the procedure described. The top number in
each terminal node (box) or intermediate node (circle)
indicates the number of persons with type 1 diabetes,
and the bottom number indicates the number of persons with type 2 diabetes. The first split occurred at
insulin use. For insulin users, a split occurred at age at diagnosis (<28.9 or ≥28.9 years).

FIGURE 2. Classification tree built by using data on 1,806 African Americans with diabetes who were randomly selected from the study population of 3,613 African Americans with diabetes who enrolled at the Diabetes Unit of Grady Memorial Hospital in Atlanta, Georgia, between April 16, 1991, and November 1, 1996. Each intermediate or terminal node is shown as number of persons with type 1 diabetes / number of persons with type 2 diabetes; p, prediction probability of being type 1 for each terminal node.

All persons: 138 / 1,668
  Insulin = yes: 124 / 647
    Age at diagnosis <28.9: 55 / 86
      Body mass index <31.7: 49 / 45 (terminal; p = 0.52)
      Body mass index ≥31.7: 6 / 41 (terminal; p = 0.13)
    Age at diagnosis ≥28.9: 69 / 561
      Body mass index <31.5: 60 / 305 (terminal; p = 0.16)
      Body mass index ≥31.5: 9 / 256 (terminal; p = 0.03)
  Insulin = no: 14 / 1,021 (terminal; p = 0.01)

For insulin users
with an age at diagnosis of <28.9 years, a final split on
BMI (<31.7 or ≥31.7) produced predicted probabilities
of being type 1 equal to 49/(49 + 45) = 0.52 and 6/(6 +
41) = 0.13, respectively. For insulin users whose age at
diagnosis was ≥28.9 years, a final split on BMI (<31.5
or ≥31.5) assigned predicted probabilities of being
type 1 equal to 0.16 and 0.03, respectively. Nonusers
of insulin are represented by a terminal node with a
predicted probability of being type 1 equal to 0.01. Sex
and age played no role in determining these prediction
probabilities. The classification tree assigned larger
probabilities of having type 1 diabetes to insulin users
and to persons with a younger age at diagnosis and a
lower BMI.
To illustrate how rules were constructed from the
tree, consider the threshold probability P = 0.5.
Persons with diabetes assigned to the leftmost terminal
node were classified as type 1, while those assigned to
any of the four remaining terminal nodes were classified as type 2. Insulin users whose age at diagnosis was
<28.9 years and whose BMI was <31.7 were classified
as type 1; all others were classified as type 2. Of the
107 persons with type 1 diabetes in the group withheld, 26 were classified as type 1, for a sensitivity of
26/107 = 0.243. The sensitivities and specificities, calculated from the 1,807 observations withheld, for all
possible threshold values are listed in table 2. Because
the classification tree had only five terminal
nodes, only six distinct rules were possible.
At one extreme, when 0 < P < 0.01, the rule classified
all persons with diabetes as type 1. At the other
extreme, when 0.52 < P < 1, the rule classified all persons with diabetes as type 2.
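The tree is small enough to write out directly. The following Python sketch encodes its splits and the terminal-node counts (type 1 / type 2) read from Figure 2 and applies a threshold P with the rule "type 1 if p > P". The function names are hypothetical, and the node probabilities come from the construction half only.

```python
def tree_probability(insulin_user: bool, age_at_diagnosis: float, bmi: float) -> float:
    """Type 1 probability of the terminal node a person falls into (Figure 2 counts)."""
    if not insulin_user:
        return 14 / (14 + 1021)                                   # about 0.01
    if age_at_diagnosis < 28.9:
        return 49 / (49 + 45) if bmi < 31.7 else 6 / (6 + 41)     # 0.52 or 0.13
    return 60 / (60 + 305) if bmi < 31.5 else 9 / (9 + 256)       # 0.16 or 0.03


def tree_rule(insulin_user: bool, age_at_diagnosis: float, bmi: float, threshold: float) -> int:
    """Classify as type 1 (1) when the node probability exceeds P, else type 2 (2)."""
    return 1 if tree_probability(insulin_user, age_at_diagnosis, bmi) > threshold else 2


# At P = 0.5 only the leftmost node (insulin, diagnosed before 28.9, BMI below 31.7) is type 1.
print(tree_rule(True, 20, 25, 0.5), tree_rule(True, 20, 35, 0.5))     # 1 2
# For 0.01 < P < 0.03 the rule reduces to "type 1 if using insulin".
print(tree_rule(True, 50, 40, 0.02), tree_rule(False, 15, 20, 0.02))  # 1 2
```

Because there are only five distinct node probabilities, every threshold falls into one of six intervals, which is why table 2 lists six rules.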
The logistic regression model with the unique minimum Schwarz information criterion (INS, current
insulin user; ADX, age at diabetes diagnosis) is represented by the following equation:
$$\ln\left(\frac{\text{probability of type 1}}{1 - \text{probability of type 1}}\right) = 1.09 + 2.19(\mathrm{INS}) - 0.031(\mathrm{ADX}) - 0.127(\mathrm{BMI}) \qquad (2)$$
As was true when the classification tree was used, sex
and age did not play a role in determining the prediction probabilities. The parameter estimates, their standard errors, and the associated z values for the logistic
regression model are shown in table 3. Again, as found
with the classification tree, persons with type 1 diabetes tended to be insulin users, were diagnosed at
younger ages, and had a lower BMI.
We used the final logistic regression model (equation 2) to determine the probability, p, of being type 1
given values for current insulin use, age at diagnosis,
and BMI. To illustrate, consider an insulin user with an
age at diagnosis of 20 years, a BMI of 20, and a threshold probability of P = 0.5. Then ln[p/(1 - p)] = 1.09 +
2.19 - 0.031(20) - 0.127(20) = 0.12, or p = 0.53, and the
person is classified as having type 1 diabetes.

TABLE 2. Sensitivities and specificities* for rules derived from the classification tree used to classify diabetes by type among the African Americans enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

Probability threshold (P)    Sensitivity    Specificity
0.00 < P < 0.01              1.000          0.000
0.01 < P < 0.03              0.925          0.612
0.03 < P < 0.13              0.673          0.764
0.13 < P < 0.16              0.636          0.786
0.16 < P < 0.52              0.243          0.971
0.52 < P < 1.00              0.000          1.000

* Calculated from the 107 persons with type 1 diabetes and the 1,700 persons with type 2 diabetes withheld from rule construction.

TABLE 4. Sensitivities and specificities* for rules derived from the logistic regression model used to classify diabetes by type among the African Americans enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

Probability threshold (P)    Sensitivity    Specificity
0.00                         1.000          0.000
0.01                         0.991          0.277
0.03                         0.972          0.586
0.10                         0.710          0.776
0.13                         0.636          0.828
0.16                         0.579          0.865
0.20                         0.514          0.910
0.26                         0.411          0.952
0.30                         0.318          0.965
0.40                         0.168          0.986
0.50                         0.103          0.994
0.52                         0.056          0.995
0.60                         0.009          0.998
0.66                         0.009          0.999
1.00                         0.000          1.000

* Calculated from the 107 persons with type 1 diabetes and the 1,700 persons with type 2 diabetes withheld from rule construction.

For a
person with diabetes who is not using insulin, whose
age at diagnosis is 30 years, and whose BMI is 30,
ln[p/(1 - p)] = 1.09 - 0.031(30) - 0.127(30) = -3.65, or
p = 0.03, and the person is classified as having type 2
diabetes. Sensitivities and specificities, again determined from the withheld group, for rules derived from
the logistic regression model corresponding to various
threshold probabilities, including the thresholds from
table 2, are shown in table 4. A threshold probability of
P = 0.03 yields correct classification of 97 percent of
type 1 and 59 percent of type 2 diabetes. At P = 0.03,
the classification tree rule yields a lower sensitivity of
93 percent but a higher specificity of 61 percent.
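The two worked examples can be reproduced in a few lines. The following Python sketch hard-codes the rounded coefficients of equation 2 (table 3); the constant and function names are illustrative. The second example's probability is about 0.025, which the text reports as 0.03 after rounding.

```python
import math

# Rounded coefficients of equation 2 (table 3).
INTERCEPT = 1.09
B_INSULIN = 2.19
B_AGE_DX = -0.031
B_BMI = -0.127


def logistic_probability(insulin_user: bool, age_at_diagnosis: float, bmi: float) -> float:
    """Predicted probability of type 1 diabetes from equation 2."""
    log_odds = (INTERCEPT + B_INSULIN * int(insulin_user)
                + B_AGE_DX * age_at_diagnosis + B_BMI * bmi)
    return 1.0 / (1.0 + math.exp(-log_odds))


def logistic_rule(insulin_user: bool, age_at_diagnosis: float, bmi: float, threshold: float) -> int:
    """Type 1 (1) when p > P, otherwise type 2 (2)."""
    return 1 if logistic_probability(insulin_user, age_at_diagnosis, bmi) > threshold else 2


# The two worked examples from the text.
print(round(logistic_probability(True, 20, 20), 2))   # 0.53
print(round(logistic_probability(False, 30, 30), 3))  # 0.025
```

Unlike the tree, the logistic model gives each combination of covariates its own probability, so the threshold P can be varied almost continuously, as in table 4.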
Using equation 1, the variance for N = 1 was calculated
for the four classification tree rules with sensitivity plus specificity greater than unity (when the sum equals unity, the prevalence estimator is undefined) for prevalence
levels ranging from 1 to 17 percent. The preferred rule
was the one giving the minimum variance. Because N enters equation 1 only as a common factor 1/N, its value does not affect which rule has the minimum variance, provided that it is the same for all
comparisons. Similar calculations were done for all
logistic regression classification rules in which the
threshold probability, P, ranged from 0.01 to 0.66. The
minimum variance rule from the set of classification
tree rules, the minimum variance rule from the set of
logistic regression rules, and CR I and CR II are presented in table 5, along with sensitivities, specificities,
and variances. The fifth column lists the expected value of $\hat{\tau}$; the sixth column lists the expected value of $\hat{\pi}$, which always equals the true prevalence because $\hat{\pi}$ is unbiased. Thus, the bias can be considerable when $\hat{\tau}$ rather than $\hat{\pi}$ is used as the estimate of prevalence.

TABLE 3. Logistic regression model used to classify diabetes by type among the African Americans enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

Parameter              Estimate    Standard error    z value
Constant                  1.090             0.655       1.67
Current insulin use       2.190             0.296       7.41*
Age at diagnosis         -0.031             0.007      -4.57*
Body mass index          -0.127             0.018      -7.10*

* p < 0.001.

Note that the logistic regression classification
rule with P = 0.26 (table 4) has the smallest variance of all the rules
at each prevalence listed. This rule can be
applied easily by calculating the quantity d = 2.19(INS)
- 0.031(ADX) - 0.127(BMI) and classifying diabetes
as type 1 if d > -2.14 or as type 2 if d ≤ -2.14. Because ln[p/(1 - p)] = 1.09 + d, the condition p > 0.26 is equivalent to d > ln(0.26/0.74) - 1.09 ≈ -2.14.
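The comparison behind this conclusion, and the cutoff on d, can be reproduced from the tabulated sensitivities and specificities. The following Python sketch recomputes the variances from the rounded values in tables 2 and 4, so agreement with table 5 holds only to rounding. It checks that the P = 0.26 logistic rule wins at every prevalence listed and that p > 0.26 corresponds to d > -2.14. Names are illustrative.

```python
import math

# (sensitivity, specificity) pairs from tables 2 and 4.
TREE_RULES = [(0.925, 0.612), (0.673, 0.764), (0.636, 0.786), (0.243, 0.971)]
OTHER_RULES = {"logistic, P = 0.26": (0.411, 0.952), "CR I": (0.308, 0.939), "CR II": (0.150, 0.984)}


def variance(prevalence, sens, spec):
    """Equation 1 with N = 1, evaluated at the expected proportion classified as type 1."""
    tau = sens * prevalence + (1 - spec) * (1 - prevalence)
    return tau * (1 - tau) / (sens + spec - 1) ** 2


def best_rule(prevalence):
    """Name of the minimum-variance rule among the best tree rule and the others."""
    rules = dict(OTHER_RULES)
    rules["tree"] = min(TREE_RULES, key=lambda r: variance(prevalence, *r))
    return min(rules, key=lambda name: variance(prevalence, *rules[name]))


for prev in (0.01, 0.03, 0.05, 0.07, 0.10, 0.15, 0.17):
    print(prev, best_rule(prev))  # the logistic rule wins at every prevalence

# Cutoff on d = 2.19(INS) - 0.031(ADX) - 0.127(BMI) equivalent to p > 0.26.
d_cut = math.log(0.26 / 0.74) - 1.09
print(round(d_cut, 2))  # -2.14
```

The best tree rule switches from (0.243, 0.971) at low prevalences to (0.925, 0.612) above 5 percent. That switch is the change in the optimal tree rule described in the Discussion.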
The improvement in precision gained by choosing the
optimal rule can be seen by considering the relative
sizes of confidence intervals (data not shown). For a fixed sample size, the
length of a confidence interval is proportional to the
square root of the variance shown in table 5. For example, at a prevalence of 7 percent, the ratio of the lengths
of the confidence intervals associated with the logistic
regression rule and CR I is $\sqrt{0.516}/\sqrt{1.183} = 0.66$.
Thus, the optimal rule yields confidence intervals about one-third shorter than those of CR I.
DISCUSSION
Data from this large African American population
with diabetes provided an excellent opportunity to
examine accepted classification rules for type 1 and type
2 diabetes and to develop new ones. In this study, we
evaluated the performance of rules used in public health
surveillance of diabetes in a population of African
Americans and compared it with the performance of
new rules developed from data collected from this population. By using the limited number of elements commonly gathered during population-based surveys and
included in this evaluation, we found the predictor variables (current insulin use, age at diagnosis, BMI) used in
the existing rules to be good predictors of diabetes type
and important components of the new rules as well.

TABLE 5. Classification rules yielding minimum variances of unbiased prevalence estimates of type 1 and type 2 diabetes among the African Americans enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

Simple estimate, expected value of $\hat{\tau}$; prevalence estimate, expected value of $\hat{\pi}$; minimum variance, variance of $\hat{\pi}$ for N = 1.

Prevalence  Rule       Sensitivity  Specificity  Simple estimate  Prevalence estimate  Minimum variance
0.01        Tree       0.243        0.971        0.031            0.01                 0.659
            Logistic   0.411        0.952        0.052            0.01                 0.372
            CR I*      0.308        0.939        0.063            0.01                 0.974
            CR II†     0.150        0.984        0.017            0.01                 0.949
0.03        Tree       0.243        0.971        0.035            0.03                 0.746
            Logistic   0.411        0.952        0.059            0.03                 0.421
            CR I       0.308        0.939        0.068            0.03                 1.045
            CR II      0.150        0.984        0.020            0.03                 1.093
0.05        Tree       0.243        0.971        0.040            0.05                 0.832
            Logistic   0.411        0.952        0.066            0.05                 0.469
            CR I       0.308        0.939        0.073            0.05                 1.114
            CR II      0.150        0.984        0.023            0.05                 1.236
0.07        Tree       0.925        0.612        0.426            0.07                 0.848
            Logistic   0.411        0.952        0.073            0.07                 0.516
            CR I       0.308        0.939        0.078            0.07                 1.183
            CR II      0.150        0.984        0.025            0.07                 1.378
0.10        Tree       0.925        0.612        0.442            0.10                 0.855
            Logistic   0.411        0.952        0.084            0.10                 0.586
            CR I       0.308        0.939        0.086            0.10                 1.284
            CR II      0.150        0.984        0.029            0.10                 1.589
0.15        Tree       0.925        0.612        0.469            0.15                 0.864
            Logistic   0.411        0.952        0.102            0.15                 0.698
            CR I       0.308        0.939        0.098            0.15                 1.450
            CR II      0.150        0.984        0.036            0.15                 1.938
0.17        Tree       0.925        0.612        0.479            0.17                 0.865
            Logistic   0.411        0.952        0.110            0.17                 0.741
            CR I       0.308        0.939        0.103            0.17                 1.514
            CR II      0.150        0.984        0.039            0.17                 2.076

* CR I, current rule I: Persons have type 1 diabetes if they are currently using insulin and the age at diagnosis is <30 years; otherwise, they have type 2 diabetes.
† CR II, current rule II: Persons have type 1 diabetes if they are currently using insulin, the age at diagnosis is <30 years, and the body mass index is <26; otherwise, they have type 2 diabetes.
Current insulin use was a powerful discriminator of
diabetes type. For some classification rules, a history of
continuous or nearly continuous use of insulin from
diagnosis of diabetes is used to characterize type 1 (9).
In our analysis, additional information about the history
of insulin use provided no more discriminating power
over current insulin use (data not shown). Thus, we used
current insulin use for both the current rules and in
developing new rules for classification.
We found that the age-at-diagnosis cutpoint giving the best discrimination in the classification tree, 28.9 years, is similar to the CR I and CR II cutpoint of 30 years. However, the BMI cutpoints of 31.7
and 31.5 are substantially higher than the BMI cutpoint
of 26 used with CR II. Obesity in this population is more
severe (mean BMI, 27.2 for type 1 and 32.5 for type 2
diabetes) than in white populations with diabetes, and it
is well established that obesity is more prevalent in nondiabetic African Americans as well. It is plausible that
the higher BMI cutpoint to distinguish type 1 and type 2
diabetes reflects the occurrence of the disease in a population with a greater prevalence of obesity.
According to the National Diabetes Data Group profile, persons with type 1 diabetes are characterized by
insulin deficiency (i.e., low C-peptide levels) and a lean
body habitus (1). In this population, the group that was
classified as having type 1 diabetes had low C-peptide
levels but included obese as well as lean subjects, contrasting with this profile. C-peptide levels decline with
an increasing duration of diabetes (27, 28), and, if the
obese subjects classified as having type 1 diabetes actually had type 2 diabetes, beta-cell fatigue could
explain in part their low C-peptide levels. This explanation seems unlikely, however, because such profound
hypoinsulinemia is unusual in type 2 diabetes, regardless of duration (29).
Variants of diabetes have been described for the
African American population that have not been reported for whites. Umpierrez et al. (30) described diabetic
ketoacidosis in obese African Americans whose subsequent course was more consistent with type 2 diabetes;
metabolic studies after resolution of diabetic ketoacidosis demonstrated good endogenous insulin reserves,
most subjects were able to discontinue insulin therapy
within weeks, and none had evidence of the autoimmune destruction of insulin-producing cells that occurs
in type 1 diabetes. Banerji and Lebovitz (31) reported
an insulin-sensitive variant of type 2 diabetes in African
Americans that was characterized by normal peripheral
insulin sensitivity, decreased insulin secretion, absence
of autoimmunity to insulin-producing cells, and excellent response to nonpharmacologic therapy. These or
other as-yet unrecognized differences could explain the
misclassification of type 2 diabetes as type 1. However,
over 90 percent of the subjects classified as type 1 in
this population required insulin therapy and had much
lower mean C-peptide levels than the subjects in the
other studies cited.
Twenty-two subjects who were not currently using
insulin were classified as having type 1 diabetes
because they had low C-peptide levels. Their mean
duration of diabetes was 7 years and their mean BMI
was 26. Thirteen of the 22 subjects had C-peptide readings that were equal to the cutpoint separating type 1
from type 2 diabetes. Possible explanations of the low
C-peptide levels in non-insulin-using subjects include
the gradual decline in beta-cell function that occurs in
type 2 diabetes. As already noted, however, levels this
low are unusual in this form of the disease. Another
possibility is that these persons have the insulin-sensitive variant of type 2 diabetes described in African
Americans, although reported C-peptide levels are usually higher in this group as well. Finally, in a prospective study of diabetes classification, discordance
between classification by C-peptide and clinical criteria
was reported in a small number of cases despite 8 years
of follow-up (32). With our current level of understanding, some discordance may be inevitable.
The performance of a classification rule in estimating
the prevalence of diabetes by type depends on 1) the
cutpoint selected for the rule (i.e., the threshold used to distinguish type 1 from type 2 diabetes) and 2) the true
prevalence of each diabetes type. Our goal was to use
the rule and cutpoint that yielded the most precise
(minimum variance) estimate of prevalence. We therefore
determined the variance of the prevalence estimates
1) over a range of true prevalence values for type 1 and
type 2 diabetes, selecting at each value the cutpoint for the newly developed rules, and 2) at the single,
fixed cutpoints for CR I and CR II.
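The dependence on cutpoint and true prevalence can be made concrete with a short calculation. The sketch below is illustrative only: it assumes the observed proportion classified as type 1 is corrected for misclassification with the Rogan-Gladen estimator (20), treats each rule's sensitivity and specificity as known constants, and uses binomial sampling variance. The two rules, their (sensitivity, specificity) pairs, and the function names are hypothetical, not values or code from this study.

```python
"""Variance of a misclassification-corrected prevalence estimate.

Sketch assuming the Rogan-Gladen correction with known Se and Sp;
the rules and numbers below are hypothetical.
"""


def apparent_prevalence(p: float, se: float, sp: float) -> float:
    """Expected proportion classified as type 1 when the true prevalence is p."""
    return se * p + (1.0 - sp) * (1.0 - p)


def rogan_gladen(t: float, se: float, sp: float) -> float:
    """Correct an observed proportion t for misclassification."""
    youden = se + sp - 1.0
    if youden <= 0.0:
        raise ValueError("the rule must satisfy se + sp > 1")
    return (t + sp - 1.0) / youden


def rg_variance(p: float, se: float, sp: float, n: int) -> float:
    """Binomial variance of the corrected estimate at true prevalence p.

    Se and Sp are treated as fixed, so their own estimation error is ignored.
    """
    t = apparent_prevalence(p, se, sp)
    return t * (1.0 - t) / (n * (se + sp - 1.0) ** 2)


# Hypothetical (sensitivity, specificity) pairs for two competing rules.
RULES = {"rule A": (0.90, 0.95), "rule B": (0.70, 0.99)}


def best_rule(p: float, n: int = 1000) -> str:
    """Name of the rule whose corrected estimate has the smallest variance at p."""
    return min(RULES, key=lambda name: rg_variance(p, *RULES[name], n))


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.10, 0.17):
        print(f"true prevalence {p:.2f}: {best_rule(p)}")
```

With these made-up values, rule B (higher specificity) gives the smaller variance at 1, 5, and 10 percent and rule A at 17 percent, which illustrates why the preferred rule can change with the true prevalence. Because n divides every variance equally, the ranking does not depend on n.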
Compared with the classification tree rules, CR I and
CR II performed poorly for prevalences ranging from 1
to 17 percent. The optimal tree rule (for prevalences of
1-5 percent) classifies a person as type 1 if he or she is
currently using insulin and the age at diagnosis is <28.9
years. Thus, a simple modification to CR I (changing
the age at diagnosis to <28.9 years) applied to the
African American population provides an improved rule
for these prevalences. For prevalences of >5 percent, the
optimal tree rule classifies a person as type 1 if he or she
is using insulin. Even for prevalences of <5 percent, this
simple rule compares favorably with the optimal tree
rule (data not shown). Finally, the logistic regression
rule provided the minimum variance of all the estimates
evaluated, regardless of the true prevalence (within the
range of 1-17 percent for type 1 diabetes).
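The two tree rules described above reduce to simple predicates. The sketch below encodes them as stated in the text: insulin use combined with the 28.9-year age-at-diagnosis split for type 1 prevalences of 1-5 percent, and insulin use alone above 5 percent. The function names are hypothetical, and because the true prevalence is unknown in practice, the anticipated prevalence passed to `select_tree_rule` is an assumption supplied by the analyst. The logistic regression rule is not sketched because its coefficients are not given in this section.

```python
from typing import Callable

# Age-at-diagnosis split used by the optimal tree rule for 1-5 percent type 1 prevalence.
AGE_CUTPOINT_YEARS = 28.9

Rule = Callable[[bool, float], bool]


def tree_rule_low_prevalence(using_insulin: bool, age_at_diagnosis: float) -> bool:
    """Type 1 if currently using insulin and diagnosed before age 28.9 years."""
    return using_insulin and age_at_diagnosis < AGE_CUTPOINT_YEARS


def tree_rule_high_prevalence(using_insulin: bool, age_at_diagnosis: float) -> bool:
    """Type 1 if currently using insulin; age at diagnosis is not used."""
    return using_insulin


def select_tree_rule(anticipated_type1_prevalence: float) -> Rule:
    """Return the tree rule reported as optimal for an anticipated prevalence (0-1 scale)."""
    if not 0.01 <= anticipated_type1_prevalence <= 0.17:
        raise ValueError("rules were evaluated only for type 1 prevalences of 1-17 percent")
    if anticipated_type1_prevalence <= 0.05:
        return tree_rule_low_prevalence
    return tree_rule_high_prevalence


if __name__ == "__main__":
    rule = select_tree_rule(0.03)
    print(rule(True, 35.0))  # False: uses insulin but was diagnosed after 28.9 years
    print(rule(True, 20.0))  # True
    print(select_tree_rule(0.10)(True, 35.0))  # True: insulin use alone suffices
```

At exactly 5 percent the sketch uses the age-restricted rule, following the "1-5 percent" range in the text; since the insulin-only rule was reported to compare favorably below 5 percent as well, either choice is defensible near that boundary.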
A limitation of this study was the use of fasting C-peptide levels to determine diabetes type. Bimodality in
the distribution of C-peptide levels was not detected in
this population (figure 1). Therefore, we selected a cutpoint value based on consensus from data in the literature. In addition, the C-peptide value available in this
database was from a fasting, not stimulated, specimen.
Some investigators have used and recommended stimulated C-peptide values (33) for studies requiring an
extremely high degree of clinical certainty. However,
other investigators have found that the fasting and stimulated C-peptide values are highly correlated and in
most cases can discriminate between insulin-requiring
patients and non-insulin-requiring patients (5).
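Sensitivity and specificity for a candidate rule are obtained by cross-tabulating its output against the C-peptide gold standard. The sketch below assumes that a fasting C-peptide reading at or below the cutpoint defines type 1, consistent with the subjects described earlier whose readings equaled the cutpoint and who were classified as type 1. The cutpoint value, the records, and all function names are hypothetical placeholders, not the study's data or cutpoint.

```python
from typing import Callable, Iterable, Tuple

# (currently using insulin, age at diagnosis in years, fasting C-peptide)
Record = Tuple[bool, float, float]
Rule = Callable[[bool, float], bool]


def gold_standard_type1(c_peptide: float, cutpoint: float) -> bool:
    """Type 1 by the gold standard: a reading at or below the cutpoint."""
    return c_peptide <= cutpoint


def sensitivity_specificity(records: Iterable[Record], rule: Rule,
                            cutpoint: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of `rule` against the C-peptide standard."""
    tp = fn = tn = fp = 0
    for using_insulin, age_dx, c_peptide in records:
        truly_type1 = gold_standard_type1(c_peptide, cutpoint)
        called_type1 = rule(using_insulin, age_dx)
        if truly_type1 and called_type1:
            tp += 1
        elif truly_type1:
            fn += 1
        elif called_type1:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one gold-standard case of each type")
    return tp / (tp + fn), tn / (tn + fp)


def uses_insulin(using_insulin: bool, age_dx: float) -> bool:
    """Simplest candidate rule: type 1 if currently using insulin."""
    return using_insulin


if __name__ == "__main__":
    HYPOTHETICAL_CUTPOINT = 0.2  # placeholder only, not the study's cutpoint
    records = [
        (True, 20.0, 0.1), (True, 45.0, 0.1), (True, 50.0, 1.2),
        (False, 55.0, 1.5), (False, 60.0, 0.1), (False, 40.0, 2.0),
    ]
    print(sensitivity_specificity(records, uses_insulin, HYPOTHETICAL_CUTPOINT))
```

In these made-up records, one non-insulin user has a low C-peptide reading (a false negative for the insulin rule) and one insulin user has a high reading (a false positive), so both sensitivity and specificity come out at 2/3.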
Another limitation of this study was the use of a clinic population that may not be representative of the
overall African American population with diabetes.
The current classification rules, however, were likewise developed in clinic populations or in populations recruited for research
studies. The sample included in this study came from a
large urban clinic and represents primarily persons of
lower socioeconomic status, and the extent to which
this may have biased our findings is unknown.
Our aim in this study was to identify the classification rule that provides the most precise (minimum variance) estimate of prevalence. Other criteria could also be considered,
such as selecting the rule that minimizes the number of persons misclassified or the
cost of misclassification. These criteria may need to be addressed in the future,
as the cost of misclassification becomes better understood.
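The alternative criteria mentioned above can be expressed as expected counts. The sketch below computes, for a population of size n with type 1 prevalence p, the expected number of persons misclassified, n[p(1 - Se) + (1 - p)(1 - Sp)], and a cost-weighted version. The per-error costs, the rules' sensitivity and specificity values, and the function names are hypothetical, since the cost of misclassification is not yet well understood.

```python
def expected_misclassified(n: int, p: float, se: float, sp: float) -> float:
    """Expected false negatives plus false positives in a population of n."""
    return n * (p * (1.0 - se) + (1.0 - p) * (1.0 - sp))


def expected_cost(n: int, p: float, se: float, sp: float,
                  cost_fn: float, cost_fp: float) -> float:
    """Expected misclassification cost with separate (hypothetical) unit costs."""
    return n * (p * (1.0 - se) * cost_fn + (1.0 - p) * (1.0 - sp) * cost_fp)


if __name__ == "__main__":
    rules = {"rule A": (0.90, 0.95), "rule B": (0.70, 0.99)}  # hypothetical (Se, Sp)
    for name, (se, sp) in rules.items():
        count = expected_misclassified(1000, 0.05, se, sp)
        # Hypothetical costs: missing a type 1 case weighted 5x a false positive.
        cost = expected_cost(1000, 0.05, se, sp, cost_fn=5.0, cost_fp=1.0)
        print(f"{name}: {count:.1f} misclassified, cost {cost:.1f}")
```

With these made-up numbers, rule B misclassifies fewer people (24.5 vs. 52.5 per 1,000) while rule A has the lower cost (72.5 vs. 84.5), so the two criteria need not select the same rule, nor necessarily the minimum variance rule.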
In summary, tracking the prevalence of type 1 and
type 2 diabetes in the population is an important public
health function. To provide an appropriate and effective
response to these major forms of diabetes, which
have markedly different etiologies, treatments, and preventive strategies, we need valid public
health surveillance and precise prevalence
estimates. We have provided important information on
this issue in the African American population
and have proposed new classification rules that
improve on the rules currently in use by reducing the
variability of the resulting prevalence estimates.
REFERENCES
1. National Diabetes Data Group. Classification and diagnosis of
diabetes mellitus and other categories of glucose intolerance.
Diabetes 1979;28:1039-57.
2. World Health Organization Expert Committee on Diabetes
Mellitus. Second report on diabetes mellitus. Geneva,
Switzerland: World Health Organization, 1989:8-14. (WHO
technical report series 646).
3. Prior MJ, Prout T, Miller D, et al. C-peptide and the classification of diabetes mellitus in patients in the Early Treatment
Diabetic Retinopathy Study. Report number 6. Ann Epidemiol
1993;3:9-17.
4. Welborn TA, Garcia-Webb P, Bonser AM. Basal C-peptide in
the discrimination of type I and type II diabetes. Diabetes Care
1981;4:616-19.
5. Hother-Nielsen O, Faber O, Schwartz N, et al. Classification of
newly diagnosed diabetic patients as insulin-requiring or non-insulin-requiring based on clinical and biochemical variables.
Diabetes Care 1988;11:531-7.
6. Klein R, Klein BE, Moss SE. The Wisconsin Epidemiologic
Study of Diabetic Retinopathy. The relationship of C-peptide
to the incidence and progression of diabetic retinopathy.
Diabetes 1995;44:796-801.
7. Landin-Olsson M, Nilsson KO, Lernmark A, et al. Islet cell
antibodies and fasting C-peptide predict insulin requirement at
diagnosis of diabetes mellitus. Diabetologia 1990;33:561-8.
8. Katzeff HL, Savage PJ, Barclay-White B, et al. C-peptide measurement in the differentiation of type 1 (insulin-dependent)
and type 2 (non-insulin-dependent) diabetes mellitus.
Diabetologia 1985;28:264-8.
9. Harris MI, Cowie CC, Howie LJ. Self-monitoring of blood
glucose by adults with diabetes in the United States population. Diabetes Care 1993;16:1116-23.
10. Harris MI, Robbins DC. Prevalence of adult-onset IDDM in
the U.S. population. Diabetes Care 1994;17:1337-40.
11. Geographic patterns of childhood insulin-dependent diabetes
mellitus. Diabetes Epidemiology Research International
Group. Diabetes 1988;37:1113-19.
12. Lipman TH. The epidemiology of type 1 diabetes in children
0-14 years of age in Philadelphia. Diabetes Care 1993;16:922-5.
13. Tull ES, Roseman JM, Christian CLE. Epidemiology of childhood IDDM in the U.S. Virgin Islands from 1979 to 1988.
Diabetes Care 1991;14:558-64.
14. LaPorte RE, Matsushima M, Chang YF. Prevalence and incidence of insulin-dependent diabetes. In: Harris MI, Cowie CC,
Stern MP, et al, eds. Diabetes in America. Bethesda, MD:
National Institutes of Health, National Institute of Diabetes
and Digestive and Kidney Diseases, 1995:37-46. (NIH publication no. 95-1468).
15. Kenny SJ, Aubert RE, Geiss LS. Prevalence and incidence of
non-insulin-dependent diabetes. In: Harris MI, Cowie CC,
Stern MP, et al, eds. Diabetes in America. Bethesda, MD:
National Institutes of Health, National Institute of Diabetes
and Digestive and Kidney Diseases, 1995:47-67. (NIH publication no. 95-1468).
16. Umpierrez GE, Casals MMC, Gebhart SSP, et al. Diabetic
ketoacidosis in obese African-Americans. Diabetes 1995;44:790-5.
17. Banerji MA, Chaiken RL, Huey H, et al. GAD antibody negative NIDDM in adult black subjects with diabetic ketoacidosis
and increased frequency of human leukocyte antigen DR3 and
DR4. Flatbush diabetes. Diabetes 1994;43:741-5.
18. Tull ES, Roseman JM. Diabetes in African Americans. In:
Harris MI, Cowie CC, Stern MP, et al, eds. Diabetes in
America. Bethesda, MD: National Institutes of Health,
National Institute of Diabetes and Digestive and Kidney
Diseases, 1995:613-30. (NIH publication no. 95-1468).
19. Stern MP, Mitchell BD. Diabetes in Hispanic Americans. In:
Harris MI, Cowie CC, Stern MP, et al, eds. Diabetes in
America. Bethesda, MD: National Institutes of Health,
National Institute of Diabetes and Digestive and Kidney
Diseases, 1995:631-60. (NIH publication no. 95-1468).
20. Rogan WJ, Gladen B. Estimating prevalence from the results
of a screening test. Am J Epidemiol 1978;107:71-6.
21. Ziemer DC, Goldschmid MG, Musey VC, et al. Diabetes in
urban African Americans. III. Management of type II diabetes
in a municipal hospital setting. Am J Med 1996;101:25-33.
22. American Diabetes Association. Guide to diagnosis and classification of diabetes mellitus and other categories of glucose
intolerance. Diabetes Care 1997;20(suppl):21.
23. Chambers JM, Hastie TJ. Statistical models in S. Pacific
Grove, CA: Wadsworth & Brooks, 1992.
24. Collett D. Modelling binary data. London, England: Chapman
& Hall, 1991.
25. Schwarz G. Estimating the dimension of a model. Ann Stat
1978;6:461-4.
26. McNeil BJ, Keeler E, Adelstein SJ. Primer on certain elements
of medical decision making. N Engl J Med 1975;293:211-15.
27. Joffe BI, Panz VR, Wing JR, et al. Pathogenesis of non-insulin-dependent diabetes mellitus in the black population of
southern Africa. Lancet 1992;340:460-2.
28. DCCT Research Group. Effects of age, duration and treatment
of insulin-dependent diabetes mellitus on residual beta-cell
function: observations during eligibility testing for the
Diabetes Control and Complications Trial (DCCT). J Clin
Endocrinol Metab 1987;65:30-6.
29. Rewers M, Hamman RF. Risk factors for non-insulin-dependent diabetes. In: Harris MI, Cowie CC, Stern MP, et al, eds. Diabetes in America. Bethesda, MD:
National Institutes of Health, National Institute of Diabetes
and Digestive and Kidney Diseases, 1995:179-220. (NIH publication no. 95-1468).
30. Umpierrez GE, Casals MM, Gebhart SS, et al. Diabetic ketoacidosis in obese African Americans. Diabetes 1995;44:790-5.
31. Banerji MA, Lebovitz HE. Insulin-sensitive and insulin-resistant variants in NIDDM. Diabetes 1989;38:784-92.
32. Service FJ, Rizza RA, Zimmerman BR, et al. The classification of diabetes by clinical and C-peptide criteria. Diabetes
Care 1997;20:198-201.
33. Diabetes Control and Complications Trial (DCCT): results of
feasibility study. The DCCT Research Group. Diabetes Care
1987;10:1-19.