American Journal of Epidemiology
Copyright © 1999 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 149, No. 1
Printed in U.S.A.

Estimating Prevalence of Type 1 and Type 2 Diabetes in a Population of African Americans with Diabetes Mellitus

James P. Boyle,1 Michael M. Engelgau,1 Theodore J. Thompson,1 Merilyn G. Goldschmid,1 Gloria L. Beckles,1 David S. Timberlake,1 William H. Herman,2 David C. Ziemer,3 and Daniel L. Gallina3

The pathogenesis, treatment, and outcomes of type 1 and type 2 diabetes differ. Current surveys derive population-based estimates of diabetes prevalence by type from limited clinical information, using classification rules developed in white populations. How well these rules perform when deriving similar estimates in African American populations is unknown. For this study, data were collected on a group of African Americans with diabetes who enrolled at the Diabetes Unit of Grady Memorial Hospital in Atlanta, Georgia, from April 16, 1991, to November 1, 1996. The data were used to develop simple classification rules for African Americans based on a classification tree and a logistic regression model. With fasting C-peptide as the gold standard, sensitivities and specificities were determined for these rules and for two current rules developed in mostly white, non-Hispanic populations. Rules that yielded precise (minimum variance unbiased) estimates of the prevalence of type 1 diabetes were preferred. The authors found that a rule based on the logistic regression model was best for estimating type 1 prevalences ranging from 1% to 17%. They concluded that simple classification rules can be used to estimate the prevalence of diabetes by type in African American populations and that the optimal rule differs somewhat from the current rules. Am J Epidemiol 1999; 149:55-63.
Key words: classification; diabetes mellitus; logistic models

Diabetes mellitus is recognized as an important public health problem in the United States and throughout the world. The most prevalent forms of diabetes are type 1 and type 2. Type 1 typically occurs in youth, is characterized by severe insulin deficiency secondary to autoimmune destruction of insulin-producing cells, and often results in diabetic ketoacidosis. Type 2 typically occurs in adulthood, is characterized by insulin resistance and high insulin levels, and generally does not result in diabetic ketoacidosis (1, 2). Because these two forms of diabetes differ in etiology, associated risk factors, and prevention strategies, public health surveillance that distinguishes between them is very important. Unfortunately, population-based surveys provide few data useful for classifying diabetes, and to our knowledge the validity of diabetes classification based on such data has not been examined in minority populations in the United States.

Rules used for accurate classification of diabetes were developed from extensive clinical research. From this research, C-peptide emerged as a biochemical marker central to distinguishing type 1 from type 2 diabetes. C-peptide is a polypeptide that is cleaved from proinsulin and reflects endogenous pancreatic capacity to secrete insulin (3-8). Exogenous insulin (i.e., insulin administered for treatment) does not alter the C-peptide level, so C-peptide is a reliable measure of endogenous insulin production even in patients who require insulin. A very low C-peptide level of <0.9 ng/ml (0.3 pmol/ml) has been used to accurately classify persons with type 1 diabetes. Based on findings from clinical research, simple classification rules were developed that can function with the limited information usually available from public health survey data (3, 9, 10).
This development has made possible the surveillance of trends in diabetes by type. Unfortunately, the data from which the original rules were derived came largely from studies of white, non-Hispanic populations. The incidence, prevalence, and clinical presentation of type 1 and type 2 diabetes in African Americans, Hispanics, Asian/Pacific Islanders, and American Indians differ from those in white, non-Hispanic groups (11-19). To our knowledge, how well the existing classification schemes perform in minority groups has not been examined previously.

Received for publication October 14, 1997, and accepted for publication May 11, 1998.
Abbreviations: BMI, body mass index; CR I, current rule I; CR II, current rule II.
1 Division of Diabetes Translation, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, Atlanta, GA.
2 Division of Endocrinology and Metabolism, Department of Internal Medicine, University of Michigan Medical Center, Ann Arbor, MI.
3 Diabetes Unit, Grady Memorial Hospital, Atlanta, GA.
Reprint requests to James P. Boyle, Division of Diabetes Translation, Mailstop K-68, Centers for Disease Control and Prevention, 4770 Buford Highway NE, Atlanta, GA 30341-3724.

For this study, development and evaluation of methods to estimate the prevalence of type 1 and type 2 diabetes included the following steps: 1) develop classification rules, 2) calculate unadjusted estimates by using the classification rules, 3) obtain unbiased estimates by adjusting the unadjusted estimates for the rules' sensitivities and specificities, and 4) select the unbiased estimator with minimum variance. The unadjusted estimates are simply the proportions of people classified as having type 1 or type 2 diabetes. These estimates are often used directly (9, 10) but are known to be biased when sensitivity and specificity are not both equal to one (20).
The unadjusted estimates are inappropriate because they estimate the proportion of people in the entire population who would be classified as type 1 or type 2, not the prevalence of each type. Thus, we considered only unbiased estimates of prevalence. Small variances are preferred because, for any fixed margin, a smaller variance implies a higher probability that the estimate lies within that margin of the true population prevalence.

The purpose of this study was twofold. First, using detailed clinical and laboratory information available from a clinic-based population of African Americans with diabetes, we evaluated the performance of classification rules currently used with survey data to estimate diabetes prevalence by type. Second, we used this detailed information to develop classification rules specific to African Americans so that potential differences by race could be assessed.

MATERIALS AND METHODS

Study population and data

Between April 16, 1991, and November 1, 1996, 4,553 persons with diabetes mellitus (excluding those with gestational diabetes and secondary forms of diabetes) enrolled at the Diabetes Unit of Grady Memorial Hospital in Atlanta, Georgia, which serves a predominantly African American clinic population. During initial assessment at the clinic, extensive historical, medical, and biochemical information is collected systematically. Details of this assessment have been published elsewhere (21). Of the 4,553 new patients who enrolled during this time period, 4,030 (88.5 percent) were African American, 369 (8.1 percent) were white, 70 (1.5 percent) were Hispanic, and 84 (approximately 1.9 percent) were of other or unknown ethnic origin. A total of 2,794 (61 percent) were women and 1,759 (39 percent) were men.

Subjects eligible for this study included all persons who enrolled at the Diabetes Unit between April 16, 1991, and November 1, 1996; were African American; met the standard American Diabetes Association criteria for the diagnosis of diabetes (22); and had serum creatinine levels of <2.0 mg/dl. Creatinine levels of >2.0 mg/dl are well known to indicate significant renal insufficiency and can result in spurious C-peptide readings. For this analysis, we required information on sex (male, female), the dichotomous variable current insulin use (1 = yes, 0 = no), age, age at diabetes diagnosis, body mass index (BMI), and fasting C-peptide level. Because the creatinine levels of 259 subjects were either missing or >2.0 mg/dl, these subjects were excluded from analysis. An additional 158 subjects were excluded because of missing data on required variables (77 for missing C-peptide levels, eight for missing age at diagnosis, and 73 for missing data used to calculate BMI), reducing the final number of subjects to 3,613.

Figure 1 is a histogram of the distribution of the 3,694 available C-peptide levels (log base 10). An analysis (details not shown) demonstrated no evidence of bimodality. Therefore, we used fasting C-peptide cutpoints of ≤0.9 ng/ml and >0.9 ng/ml to classify diabetes as type 1 or type 2, respectively (3-8). Of the 245 subjects who were classified as having type 1 diabetes, 49 percent were men and 91 percent were current insulin users (table 1). Among the 3,368 persons classified as having type 2 diabetes, 36 percent were men and 39 percent were current insulin users. Distributions of age, age at diagnosis, and BMI by diabetes type are also shown in table 1.

Statistical analyses

When available data are limited, two simple rules, or very similar rules, are often used in national surveys to classify diabetes by type. The first classifies persons as having type 1 diabetes if they are currently using insulin and the age at diagnosis is <30 years; otherwise, they have type 2 diabetes.
The second classifies persons as having type 1 diabetes if they are currently using insulin, the age at diagnosis is <30 years, and the BMI is <26; otherwise, they have type 2 diabetes (9). For clarity, we will call these rules current rule I (CR I) and current rule II (CR II), respectively. We determined the sensitivities and specificities of these rules so that we could compare their performance with that of the new rules.

[Figure 1: histogram not reproduced; horizontal axis, log10(C-peptide).]
FIGURE 1. Histogram of the log base 10 of C-peptide values for 3,694 African Americans with diabetes mellitus who enrolled at the Diabetes Unit of Grady Memorial Hospital in Atlanta, Georgia, between April 16, 1991, and November 1, 1996.

TABLE 1. Distribution of the characteristics of African Americans with diabetes enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

                              Type 1 diabetes     Type 2 diabetes
                              (n = 245)           (n = 3,368)
Characteristic                No.      %*         No.      %*
Sex: male                     119      49         1,223    36
Current insulin user          223      91         1,307    39
Age (years)
  <20                         4        2          21       1
  20-44                       109      44         954      28
  ≥45                         132      54         2,393    71
Age at diagnosis (years)
  <20                         50       20         88       3
  20-44                       110      45         1,367    41
  ≥45                         85       35         1,913    56
Body mass index (kg/m²)
  <20                         25       10         51       2
  20-24                       75       31         411      12
  25-29                       75       31         1,001    30
  ≥30                         70       28         1,905    56

* Some percentages have been rounded.

We used two approaches to construct new rules for classifying diabetes by type. The first was classification trees. Classification trees provide an exploratory method of identifying factors, and interactions among factors, that may explain variation in a binary outcome (23). They are constructed by recursive partitioning. At each node of the tree (beginning with the root node, which consists of all observations to be used in constructing the tree), the following dichotomous splits of the data based on predictor variables are allowed.
For a continuous or ordered variable $X_j$, the allowed splits are of the form $X_j < x$ versus $X_j \geq x$; for an unordered categorical variable, they are all partitions of its categories into two groups. The combination of predictor variable and split is chosen to maximize the reduction in deviance for the tree, thereby transforming the node in question into two nodes. Nodes that contain fewer than 10 observations or contain observations of only one type are not split further and become terminal nodes. The procedure is then repeated until no more splits are allowed. Because constructing a tree usually entails some overfitting, an algorithm can be applied that creates a nested sequence of subtrees by eliminating the least important splits (23). On the basis of this algorithm and by using the two dichotomous variables of sex and current insulin use and the three continuous variables of age, age at diagnosis, and BMI, we chose a simple tree that classifies diabetes using data that often differ by type. All trees were constructed by using S-Plus (Statistical Sciences, Inc., Seattle, Washington).

Classification rules were defined from the tree in the following manner. A probability threshold P was chosen. Each person with diabetes falls into exactly one terminal node, whose associated probability p of being type 1 equals the proportion of persons with type 1 diabetes in that node. All persons in the node were classified as type 1 if p > P and as type 2 if p < P. For each threshold P, sensitivity and specificity were defined as the proportions of persons with type 1 and type 2 diabetes, respectively, who were correctly classified.

The second approach was to construct classification rules by using logistic regression. We modeled the log odds that a person had type 1 diabetes as a linear function of covariates.
Several candidate models were estimated (24), including all 32 main-effects models; a model with all five predictors and the three quadratic terms in age, age at diagnosis, and BMI; and all 10 models with the five predictors plus one pairwise interaction. The Schwarz information criterion (25) was used to select the final model. For any model, the criterion equals $D + k \ln n$, where D is the deviance, k is the number of parameters in the model, and n is the number of observations. The best model minimized the Schwarz information criterion. All logistic regression models were estimated by using programs written in APL (Manugistics, Inc., Rockville, Maryland). For this approach, classification rules were defined as they were for classification trees, by fixing a threshold probability P. For a person with diabetes, the fitted logistic regression model determined a predicted probability p of having type 1 diabetes. The person was classified as type 1 when p > P or as type 2 when p < P. Varying the threshold probability generated a family of classification rules. As before, sensitivity and specificity were defined as the proportions of persons with type 1 and type 2 diabetes, respectively, who were correctly classified.

To avoid estimating sensitivities and specificities from the same data used to construct the classification rules, half of the 3,613 observations (n = 1,806) were randomly selected for constructing rules, and the remaining observations (n = 1,807) were withheld for determining sensitivities and specificities. There were 138 persons with type 1 diabetes and 1,668 with type 2 diabetes among the 1,806 observations used to construct classification rules. Among the 1,807 withheld observations, 107 persons had type 1 diabetes and 1,700 had type 2 diabetes.

In general, the choice of an appropriate rule depends on the objective when applying the rule to a specific population (26).
For example, if classification rules are used to diagnose a patient and the costs of misdiagnosis are known, then a rule that minimizes the average cost of misdiagnosis in the population might be preferred. If these costs are either equal or unknown, a rule that minimizes the misclassification rate might be chosen. In this study we took a different perspective and emphasized rules that yielded estimates of prevalence (of type 1 or type 2 diabetes in a population of persons with diabetes) with good statistical properties. For a rule with known sensitivity $\alpha$ and specificity $\beta$, the natural unbiased estimate of the prevalence $\pi$ is given by

$$\hat{\pi} = \frac{\hat{t} + \beta - 1}{\alpha + \beta - 1},$$

where $\hat{t}$ is the proportion in the sample classified as positive for the disease (type 1 in this case) (20). Thus, $\hat{t}$ estimates $t$, the proportion of the population that would be classified as type 1. The estimator follows from the observation that $t = \alpha\pi + (1 - \beta)(1 - \pi)$. For a simple random sample of size N,

$$\operatorname{Var}(\hat{\pi}) = \frac{t(1 - t)}{N(\alpha + \beta - 1)^2}. \qquad (1)$$

The variance of $\hat{\pi}$ is estimated by substituting $\hat{t}$ for $t$ in equation 1. We sought rules that minimized this variance.

RESULTS

Recall that CR I classifies persons as having type 1 diabetes if they are currently using insulin and the age at diagnosis is <30 years; otherwise, they are classified as type 2. We used this rule to classify those in the group withheld for determining sensitivities and specificities. Of the 107 persons in this group with type 1 diabetes, 33 were classified as type 1, for a sensitivity of 33/107 = 0.308. Of the remaining 1,700 persons, those with type 2 diabetes, 1,596 were classified as type 2, for a specificity of 1,596/1,700 = 0.939. CR II classifies persons as type 1 if they are current insulin users, the age at diagnosis is <30 years, and the BMI is <26; otherwise, they are classified as type 2. With this rule, similar calculations gave a lower sensitivity of 16/107 = 0.150 and a higher specificity of 1,673/1,700 = 0.984.
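The estimator and equation 1 can be checked numerically. The Python sketch below is our own illustration, not the authors' code, and its function names are hypothetical. It computes the expected classified proportion t, the corrected estimate of prevalence, and the variance in equation 1. It then simulates repeated samples under an assumed true prevalence of 7 percent, using the CR I sensitivity and specificity reported above, to show that the average of the unadjusted proportion is biased while the average of the corrected estimate is not. The sample size (1,000) and the number of replications (400) are arbitrary choices made for the illustration.

```python
import random


def expected_classified(prev: float, sens: float, spec: float) -> float:
    """t = alpha * pi + (1 - beta) * (1 - pi): expected share classified type 1."""
    return sens * prev + (1.0 - spec) * (1.0 - prev)


def corrected_prevalence(t_hat: float, sens: float, spec: float) -> float:
    """pi_hat = (t_hat + beta - 1) / (alpha + beta - 1); requires alpha + beta > 1."""
    return (t_hat + spec - 1.0) / (sens + spec - 1.0)


def prevalence_variance(t: float, sens: float, spec: float, n: int = 1) -> float:
    """Equation 1: Var(pi_hat) = t (1 - t) / [N (alpha + beta - 1)^2]."""
    return t * (1.0 - t) / (n * (sens + spec - 1.0) ** 2)


def simulate_means(prev: float, sens: float, spec: float,
                   n: int, reps: int, seed: int = 1) -> tuple[float, float]:
    """Average t_hat and pi_hat over `reps` simulated simple random samples of size n.

    Each simulated person is type 1 with probability `prev`; a type 1 person is
    classified type 1 with probability `sens`, a type 2 person with 1 - `spec`.
    """
    rng = random.Random(seed)
    sum_t = sum_pi = 0.0
    for _ in range(reps):
        positives = 0
        for _ in range(n):
            is_type1 = rng.random() < prev
            p_classified_type1 = sens if is_type1 else 1.0 - spec
            if rng.random() < p_classified_type1:
                positives += 1
        t_hat = positives / n
        sum_t += t_hat
        sum_pi += corrected_prevalence(t_hat, sens, spec)
    return sum_t / reps, sum_pi / reps


# CR I on the withheld group: sensitivity 33/107, specificity 1,596/1,700.
SENS_CR1, SPEC_CR1 = 33 / 107, 1596 / 1700
TRUE_PREV = 0.07  # assumed true prevalence of type 1 diabetes

t_cr1 = expected_classified(TRUE_PREV, SENS_CR1, SPEC_CR1)
var_cr1 = prevalence_variance(t_cr1, SENS_CR1, SPEC_CR1)  # N = 1
mean_t_hat, mean_pi_hat = simulate_means(TRUE_PREV, SENS_CR1, SPEC_CR1, n=1000, reps=400)

print(f"expected t = {t_cr1:.3f}, Var(pi_hat) with N = 1: {var_cr1:.3f}")
print(f"simulated mean t_hat = {mean_t_hat:.3f}, mean pi_hat = {mean_pi_hat:.3f}")
```

Because the corrected estimate is a linear function of the sample proportion, its expectation equals the true prevalence exactly. The simulation only makes that visible next to the upward bias of the unadjusted proportion.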
Figure 2 shows the classification tree that resulted from the procedure described. The top number in each terminal node (box) or intermediate node (circle) is the number of persons with type 1 diabetes, and the bottom number is the number of persons with type 2 diabetes. The first split occurred at insulin use. For insulin users, a split occurred at age at diagnosis (<28.9 or ≥28.9 years). For insulin users with an age at diagnosis of <28.9 years, a final split on BMI (<31.7 or ≥31.7) produced predicted probabilities of being type 1 equal to 49/(49 + 45) = 0.52 and 6/(6 + 41) = 0.13, respectively. For insulin users whose age at diagnosis was ≥28.9 years, a final split on BMI (<31.5 or ≥31.5) assigned predicted probabilities of being type 1 equal to 0.16 and 0.03, respectively. Nonusers of insulin are represented by a terminal node with a predicted probability of being type 1 equal to 0.01. Sex and age played no role in determining these prediction probabilities.

[Figure 2, structure recovered from the graphic; counts are type 1/type 2, with the prediction probability of type 1 in parentheses for terminal nodes:
  All persons: 138/1,668
    Insulin = yes: 124/647
      Age at diagnosis <28.9: 55/86
        BMI <31.7: 49/45 (0.52)
        BMI ≥31.7: 6/41 (0.13)
      Age at diagnosis ≥28.9: 69/561
        BMI <31.5: 60/305 (0.16)
        BMI ≥31.5: 9/256 (0.03)
    Insulin = no: 14/1,021 (0.01)]
FIGURE 2. Classification tree built by using data on 1,806 African Americans with diabetes who were randomly selected from the study population of 3,613 African Americans with diabetes who enrolled at the Diabetes Unit of Grady Memorial Hospital in Atlanta, Georgia, between April 16, 1991, and November 1, 1996. Top number in each terminal node (box) or intermediate node (circle), number of persons with type 1 diabetes; bottom number, number of persons with type 2 diabetes; proportions below each box, prediction probabilities of being type 1.
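As an illustration, the fitted tree in Figure 2 can be written out directly. The Python sketch below is ours, not the S-Plus code used in the study, and its node labels and function names are hypothetical. It routes a person to a terminal node (assuming each split sends X < x one way and X ≥ x the other), applies a threshold P to the node's proportion of type 1 cases, and computes the binomial deviance reduction achieved by the root split on insulin use. That reduction is the quantity the splitting procedure maximizes. Because only the counts shown in the figure are available, the sketch cannot reproduce the search over alternative splits.

```python
import math

# Terminal nodes of the Figure 2 tree (construction set, n = 1,806):
# label -> (number with type 1, number with type 2).
TERMINAL_NODES = {
    "INS, ADX<28.9, BMI<31.7": (49, 45),
    "INS, ADX<28.9, BMI>=31.7": (6, 41),
    "INS, ADX>=28.9, BMI<31.5": (60, 305),
    "INS, ADX>=28.9, BMI>=31.5": (9, 256),
    "no INS": (14, 1021),
}


def terminal_node(insulin: bool, adx: float, bmi: float) -> str:
    """Route a person down the tree; each split sends X < x one way, X >= x the other."""
    if not insulin:
        return "no INS"
    if adx < 28.9:
        return "INS, ADX<28.9, BMI<31.7" if bmi < 31.7 else "INS, ADX<28.9, BMI>=31.7"
    return "INS, ADX>=28.9, BMI<31.5" if bmi < 31.5 else "INS, ADX>=28.9, BMI>=31.5"


def node_probability(label: str) -> float:
    """p = proportion of persons with type 1 diabetes in the terminal node."""
    n1, n2 = TERMINAL_NODES[label]
    return n1 / (n1 + n2)


def classify(insulin: bool, adx: float, bmi: float, threshold: float) -> int:
    """Return 1 (type 1) if the node's p exceeds the threshold P, else 2 (type 2)."""
    return 1 if node_probability(terminal_node(insulin, adx, bmi)) > threshold else 2


def node_deviance(n1: int, n2: int) -> float:
    """Binomial deviance -2 * sum_k n_k ln(n_k / n) of a node (0 for a pure node)."""
    total = n1 + n2
    return -2.0 * sum(n * math.log(n / total) for n in (n1, n2) if n > 0)


# Deviance reduction achieved by the root split on insulin use:
# root 138/1,668 -> insulin users 124/647, nonusers 14/1,021.
root_reduction = node_deviance(138, 1668) - node_deviance(124, 647) - node_deviance(14, 1021)

for label in TERMINAL_NODES:
    print(f"{label}: p = {node_probability(label):.2f}")
print(f"root split deviance reduction: {root_reduction:.1f}")
```

With five terminal nodes, only six distinct rules can be produced by varying P. Any two thresholds that fall between the same pair of adjacent node probabilities give the same classification.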
The classification tree assigned larger probabilities of having type 1 diabetes to insulin users and to persons with a younger age at diagnosis and a lower BMI. To illustrate how rules were constructed from the tree, consider the threshold probability P = 0.5. Persons with diabetes assigned to the leftmost terminal node were classified as type 1, while those assigned to any of the four remaining terminal nodes were classified as type 2. That is, insulin users whose age at diagnosis was <28.9 years and whose BMI was <31.7 were classified as type 1, and all others were classified as type 2. Of the 107 persons with type 1 diabetes in the withheld group, 26 were classified as type 1, for a sensitivity of 26/107 = 0.243. The sensitivities and specificities for all possible threshold values, calculated from the 1,807 withheld observations, are listed in table 2. Because the classification tree has only five terminal nodes, only a limited number of rules are possible. At one extreme, when 0 < P < 0.01, the rule classified all persons with diabetes as type 1. At the other extreme, when 0.52 < P < 1, the rule classified all persons with diabetes as type 2.

The logistic regression model with the unique minimum Schwarz information criterion is

$$\ln\left(\frac{\text{probability of type 1}}{1 - \text{probability of type 1}}\right) = 1.09 + 2.19(\mathrm{INS}) - 0.031(\mathrm{ADX}) - 0.127(\mathrm{BMI}), \qquad (2)$$

where INS indicates current insulin use (1 = yes, 0 = no) and ADX is the age at diabetes diagnosis. As with the classification tree, sex and age did not play a role in determining the prediction probabilities. The parameter estimates, their standard errors, and the associated z values for the logistic regression model are shown in table 3. Again, as with the classification tree, persons with type 1 diabetes tended to be insulin users, were diagnosed at younger ages, and had a lower BMI.
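The model search that produced equation 2 can be outlined in code. This Python sketch is an illustration under stated assumptions, not the authors' APL programs. It enumerates the 43 candidate specifications listed in Materials and Methods (32 main-effects models, one quadratic model, and 10 single-interaction models) and selects, among fitted models, the one with the smallest Schwarz criterion D + k ln n. Model fitting is omitted. We assume that k counts the intercept, and the deviances in the usage lines are hypothetical numbers chosen only to show that the k ln n penalty can favor a smaller model.

```python
import math
from itertools import combinations

PREDICTORS = ("sex", "INS", "age", "ADX", "BMI")


def schwarz_criterion(deviance: float, k: int, n: int) -> float:
    """SIC = D + k ln n; here k is assumed to count all parameters, intercept included."""
    return deviance + k * math.log(n)


def candidate_models() -> list[tuple[str, ...]]:
    """Term lists for the candidate models named in the text (intercept implied)."""
    models: list[tuple[str, ...]] = []
    # All 2^5 = 32 main-effects models, from intercept-only to all five predictors.
    for size in range(len(PREDICTORS) + 1):
        models.extend(combinations(PREDICTORS, size))
    # All five predictors plus quadratic terms in age, ADX, and BMI.
    models.append(PREDICTORS + ("age^2", "ADX^2", "BMI^2"))
    # All five predictors plus one of the 10 pairwise interactions.
    for a, b in combinations(PREDICTORS, 2):
        models.append(PREDICTORS + (f"{a}:{b}",))
    return models


def select_by_sic(deviances: dict[tuple[str, ...], float], n: int) -> tuple[str, ...]:
    """Return the term list whose fitted deviance gives the smallest SIC."""
    return min(deviances,
               key=lambda terms: schwarz_criterion(deviances[terms], len(terms) + 1, n))


# Hypothetical deviances, for illustration only (not values from the study):
# a three-covariate model versus the full five-predictor main-effects model.
example = {("INS", "ADX", "BMI"): 700.0, PREDICTORS: 695.0}
print(len(candidate_models()), select_by_sic(example, 1806))
```

In this made-up example, the full model lowers the deviance by 5 but adds two parameters, each costing ln 1,806 (about 7.5). The smaller model therefore has the lower criterion.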
TABLE 2. Sensitivities and specificities* for rules derived from the classification tree used to classify diabetes by type among the African Americans enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

Probability threshold (P)   Sensitivity   Specificity
0.00 < P < 0.01             1.000         0.000
0.01 < P < 0.03             0.925         0.612
0.03 < P < 0.13             0.673         0.764
0.13 < P < 0.16             0.636         0.786
0.16 < P < 0.52             0.243         0.971
0.52 < P < 1.00             0.000         1.000

* Calculated from the 107 persons with type 1 diabetes and the 1,700 persons with type 2 diabetes withheld from rule construction.

TABLE 4. Sensitivities and specificities* for rules derived from the logistic regression model used to classify diabetes by type among the African Americans enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

Probability threshold (P)   Sensitivity   Specificity
0.00                        1.000         0.000
0.01                        0.991         0.277
0.03                        0.972         0.586
0.10                        0.710         0.776
0.13                        0.636         0.828
0.16                        0.579         0.865
0.20                        0.514         0.910
0.26                        0.411         0.952
0.30                        0.318         0.965
0.40                        0.168         0.986
0.50                        0.103         0.994
0.52                        0.056         0.995
0.60                        0.009         0.998
0.66                        0.009         0.999
1.00                        0.000         1.000

* Calculated from the 107 persons with type 1 diabetes and the 1,700 persons with type 2 diabetes withheld from rule construction.

We used the final logistic regression model (equation 2) to determine the probability p of being type 1 given values for current insulin use, age at diagnosis, and BMI. To illustrate, consider an insulin user with an age at diagnosis of 20 years, a BMI of 20, and a threshold probability of P = 0.5. Then $\ln[p/(1 - p)] = 1.09 + 2.19 - 0.031(20) - 0.127(20) = 0.12$, or p = 0.53, and the person is classified as having type 1 diabetes. For a person with diabetes who is not using insulin, whose age at diagnosis is 30 years, and whose BMI is 30, $\ln[p/(1 - p)] = 1.09 - 0.031(30) - 0.127(30) = -3.65$, or p = 0.03, and the person is classified as having type 2 diabetes.
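Equation 2 and the two worked examples translate directly into code. In this Python sketch the function names are our own, hypothetical choices. A threshold P is applied to the predicted probability p. Because the logistic function is increasing, p > P holds exactly when the log odds exceed ln[P/(1 - P)], so any threshold rule can equally be written as a cutoff on the linear predictor.

```python
import math

# Coefficients of equation 2 (estimates in table 3).
INTERCEPT, B_INS, B_ADX, B_BMI = 1.09, 2.19, -0.031, -0.127


def log_odds(ins: int, adx: float, bmi: float) -> float:
    """Linear predictor ln[p / (1 - p)] of equation 2; ins is 1 (insulin user) or 0."""
    return INTERCEPT + B_INS * ins + B_ADX * adx + B_BMI * bmi


def prob_type1(ins: int, adx: float, bmi: float) -> float:
    """Predicted probability p of type 1 diabetes (inverse logit of the log odds)."""
    return 1.0 / (1.0 + math.exp(-log_odds(ins, adx, bmi)))


def classify(ins: int, adx: float, bmi: float, threshold: float) -> int:
    """Return 1 (type 1) if p > P, else 2 (type 2)."""
    return 1 if prob_type1(ins, adx, bmi) > threshold else 2


def logit(p: float) -> float:
    """ln[p / (1 - p)]; p > P holds exactly when the log odds exceed logit(P)."""
    return math.log(p / (1.0 - p))


# The two worked examples from the text, with threshold P = 0.5.
for ins, adx, bmi in [(1, 20, 20), (0, 30, 30)]:
    print(ins, adx, bmi, round(log_odds(ins, adx, bmi), 2),
          round(prob_type1(ins, adx, bmi), 2), classify(ins, adx, bmi, 0.5))
```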
TABLE 3. Logistic regression model used to classify diabetes by type among the African Americans enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

Parameter              Estimate   Standard error   z value
Constant                  1.090        0.655          1.67
Current insulin use       2.190        0.296          7.41*
Age at diagnosis         -0.031        0.007         -4.57*
Body mass index          -0.127        0.018         -7.10*

* p < 0.001.

Sensitivities and specificities, again determined from the withheld group, for rules derived from the logistic regression model at various threshold probabilities, including the thresholds from table 2, are shown in table 4. A threshold probability of P = 0.03 correctly classifies 97 percent of persons with type 1 and 59 percent of persons with type 2 diabetes. At P = 0.03, the classification tree rule yields a lower sensitivity (93 percent) but a higher specificity (61 percent).

Using equation 1 with N = 1, we calculated the variance for the four classification tree rules with sensitivity plus specificity greater than 1, for prevalences ranging from 1 to 17 percent. The preferred rule was the one giving the minimum variance. Because N enters equation 1 only through the common factor 1/N, its value does not affect which rule is preferred, provided it is the same for all comparisons. Similar calculations were done for all logistic regression classification rules with threshold probability P ranging from 0.01 to 0.66. The minimum-variance rule from the set of classification tree rules, the minimum-variance rule from the set of logistic regression rules, and CR I and CR II are presented in table 5, along with sensitivities, specificities, and variances. The fifth column lists the expected value of $\hat{t}$; the sixth column lists the expected value of $\hat{\pi}$, which always equals the true prevalence because $\hat{\pi}$ is unbiased. Thus, the bias can be considerable when $\hat{t}$ rather than $\hat{\pi}$ is used.
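The minimum-variance comparison can be reproduced from the published sensitivities and specificities. This Python sketch is our illustration, and its names are hypothetical. It evaluates equation 1 with N = 1 at the expected value of t for each true prevalence and picks the rule with the smallest variance, once among the four usable tree rules in table 2 and once among the logistic rules in table 4. Because it starts from the rounded three-decimal sensitivities and specificities, its variances may differ from the published ones in the last digit.

```python
# Sensitivity/specificity pairs from table 2 (tree rules with sensitivity +
# specificity > 1) and table 4 (logistic rules, P from 0.01 to 0.66).
TREE_RULES = {
    "0.01 < P < 0.03": (0.925, 0.612),
    "0.03 < P < 0.13": (0.673, 0.764),
    "0.13 < P < 0.16": (0.636, 0.786),
    "0.16 < P < 0.52": (0.243, 0.971),
}
LOGISTIC_RULES = {
    0.01: (0.991, 0.277), 0.03: (0.972, 0.586), 0.10: (0.710, 0.776),
    0.13: (0.636, 0.828), 0.16: (0.579, 0.865), 0.20: (0.514, 0.910),
    0.26: (0.411, 0.952), 0.30: (0.318, 0.965), 0.40: (0.168, 0.986),
    0.50: (0.103, 0.994), 0.52: (0.056, 0.995), 0.60: (0.009, 0.998),
    0.66: (0.009, 0.999),
}


def variance_n1(prev: float, sens: float, spec: float) -> float:
    """Equation 1 with N = 1, evaluated at the expected t for true prevalence prev."""
    t = sens * prev + (1.0 - spec) * (1.0 - prev)
    return t * (1.0 - t) / (sens + spec - 1.0) ** 2


def best_rule(rules: dict, prev: float):
    """Key of the rule with the smallest variance among rules with sens + spec > 1."""
    usable = {key: ss for key, ss in rules.items() if ss[0] + ss[1] > 1.0}
    return min(usable, key=lambda key: variance_n1(prev, *usable[key]))


for prev in (0.01, 0.03, 0.05, 0.07, 0.10, 0.15, 0.17):
    tree_key, logit_key = best_rule(TREE_RULES, prev), best_rule(LOGISTIC_RULES, prev)
    print(f"prevalence {prev:.2f}: tree {tree_key} "
          f"(var {variance_n1(prev, *TREE_RULES[tree_key]):.3f}), "
          f"logistic P = {logit_key} "
          f"(var {variance_n1(prev, *LOGISTIC_RULES[logit_key]):.3f})")
```

Calling `variance_n1` with the CR I or CR II pairs (0.308, 0.939) and (0.150, 0.984) gives the corresponding current-rule variances for the same comparison.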
Note that the logistic regression classification rule with P = 0.26 (table 4) is the best of all the rules considered at each prevalence listed. This rule can be applied easily by calculating the quantity $d = 2.19(\mathrm{INS}) - 0.031(\mathrm{ADX}) - 0.127(\mathrm{BMI})$ and classifying diabetes as type 1 if d > -2.14 or as type 2 if d < -2.14. The cutoff follows from equation 2: p > 0.26 exactly when $1.09 + d > \ln(0.26/0.74)$, that is, when d > -2.14. Improvements in accuracy that result from choosing the optimal rule can be seen by considering the relative lengths of confidence intervals (data not shown). The length of a confidence interval is proportional to the square root of the variance shown in table 5. For example, at a prevalence of 7 percent, the ratio of the lengths of the confidence intervals associated with the logistic regression rule and CR I is $\sqrt{0.516}/\sqrt{1.183} = 0.66$. Thus, the optimal rule entails a substantial improvement.

TABLE 5. Classification rules yielding minimum variances of unbiased prevalence estimates of type 1 and type 2 diabetes among the African Americans enrolled at the Diabetes Unit of Grady Memorial Hospital, Atlanta, Georgia, 1991-1996

Prevalence  Rule       Sensitivity  Specificity  Simple estimate (t̂)  Prevalence estimate (π̂)  Minimum variance
0.01        Tree       0.243        0.971        0.031                 0.01                      0.659
            Logistic   0.411        0.952        0.052                 0.01                      0.372
            CR I*      0.308        0.939        0.063                 0.01                      0.974
            CR II†     0.150        0.984        0.017                 0.01                      0.949
0.03        Tree       0.243        0.971        0.035                 0.03                      0.746
            Logistic   0.411        0.952        0.059                 0.03                      0.421
            CR I       0.308        0.939        0.068                 0.03                      1.045
            CR II      0.150        0.984        0.020                 0.03                      1.093
0.05        Tree       0.243        0.971        0.040                 0.05                      0.832
            Logistic   0.411        0.952        0.066                 0.05                      0.469
            CR I       0.308        0.939        0.073                 0.05                      1.114
            CR II      0.150        0.984        0.023                 0.05                      1.236
0.07        Tree       0.925        0.612        0.426                 0.07                      0.848
            Logistic   0.411        0.952        0.073                 0.07                      0.516
            CR I       0.308        0.939        0.078                 0.07                      1.183
            CR II      0.150        0.984        0.025                 0.07                      1.378
0.10        Tree       0.925        0.612        0.442                 0.10                      0.855
            Logistic   0.411        0.952        0.084                 0.10                      0.586
            CR I       0.308        0.939        0.086                 0.10                      1.284
            CR II      0.150        0.984        0.029                 0.10                      1.589
0.15        Tree       0.925        0.612        0.469                 0.15                      0.864
            Logistic   0.411        0.952        0.102                 0.15                      0.698
            CR I       0.308        0.939        0.098                 0.15                      1.450
            CR II      0.150        0.984        0.036                 0.15                      1.938
0.17        Tree       0.925        0.612        0.479                 0.17                      0.865
            Logistic   0.411        0.952        0.110                 0.17                      0.741
            CR I       0.308        0.939        0.103                 0.17                      1.514
            CR II      0.150        0.984        0.039                 0.17                      2.076

* CR I, current rule I: Persons have type 1 diabetes if they are currently using insulin and the age at diagnosis is <30 years; otherwise, they have type 2 diabetes.
† CR II, current rule II: Persons have type 1 diabetes if they are currently using insulin, the age at diagnosis is <30 years, and the body mass index is <26; otherwise, they have type 2 diabetes.

DISCUSSION

Data from this large African American population with diabetes provided an excellent opportunity to examine accepted classification rules for type 1 and type 2 diabetes and to develop new ones. In this study, we evaluated the performance of rules used in public health surveillance of diabetes in a population of African Americans and compared it with the performance of new rules developed from data collected from this population. Using the limited number of elements commonly gathered during population-based surveys and included in this evaluation, we found that the predictor variables used in the existing rules (current insulin use, age at diagnosis, BMI) were good predictors of diabetes type and important components of the new rules as well. Current insulin use was a powerful discriminator of diabetes type.
For some classification rules, a history of continuous or nearly continuous use of insulin since diagnosis is used to characterize type 1 diabetes (9). In our analysis, information about the history of insulin use added no discriminating power beyond current insulin use (data not shown). Thus, we used current insulin use both for the current rules and in developing new rules for classification. In the classification tree, the age-at-diagnosis cutpoint that provided the best discrimination was 28.9 years, similar to the CR I and CR II cutpoint of 30 years. However, the BMI cutpoints of 31.7 and 31.5 are substantially higher than the BMI cutpoint of 26 used with CR II. Obesity in this population is more severe (mean BMI, 27.2 for type 1 and 32.5 for type 2 diabetes) than in white populations with diabetes, and it is well established that obesity is also more prevalent in nondiabetic African Americans. It is plausible that the higher BMI cutpoint for distinguishing type 1 from type 2 diabetes reflects the occurrence of the disease in a population with a greater prevalence of obesity. According to the National Diabetes Data Group profile, persons with type 1 diabetes are characterized by insulin deficiency (i.e., low C-peptide levels) and a lean body habitus (1). In this population, the group classified as having type 1 diabetes had low C-peptide levels but included obese as well as lean subjects, in contrast to this profile. C-peptide levels decline with increasing duration of diabetes (27, 28), and if the obese subjects classified as having type 1 diabetes were actually type 2 patients who had been misclassified, beta-cell fatigue could explain in part their low C-peptide levels. This explanation seems unlikely, however, because such profound hypoinsulinemia is unusual in type 2 diabetes, regardless of duration (29).
Variants of diabetes that have not been reported in whites have been described in the African American population. Umpierrez et al. (30) described diabetic ketoacidosis in obese African Americans whose subsequent course was more consistent with type 2 diabetes: metabolic studies after resolution of diabetic ketoacidosis demonstrated good endogenous insulin reserves, most subjects were able to discontinue insulin therapy within weeks, and none had evidence of the autoimmune destruction of insulin-producing cells that occurs in type 1 diabetes. Banerji and Lebovitz (31) reported an insulin-sensitive variant of type 2 diabetes in African Americans that was characterized by normal peripheral insulin sensitivity, decreased insulin secretion, absence of autoimmunity to insulin-producing cells, and excellent response to nonpharmacologic therapy. These or other as-yet unrecognized differences could explain the misclassification of type 2 diabetes as type 1. However, over 90 percent of the subjects classified as type 1 in this population required insulin therapy and had much lower mean C-peptide levels than the subjects in the other studies cited. Twenty-two subjects who were not currently using insulin were classified as having type 1 diabetes because they had low C-peptide levels. Their mean duration of diabetes was 7 years, and their mean BMI was 26. Thirteen of the 22 subjects had C-peptide readings equal to the cutpoint separating type 1 from type 2 diabetes. Possible explanations of the low C-peptide levels in non-insulin-using subjects include the gradual decline in beta-cell function that occurs in type 2 diabetes. As already noted, however, levels this low are unusual in this form of the disease. Another possibility is that these persons have the insulin-sensitive variant of type 2 diabetes described in African Americans, although reported C-peptide levels are usually higher in this group as well.
Finally, in a prospective study of diabetes classification, discordance between classification by C-peptide and by clinical criteria was reported in a small number of cases despite 8 years of follow-up (32). With our current level of understanding, some discordance may be inevitable.
The performance of classification rules in estimating the prevalence of diabetes depends on 1) the cutpoint selected for each rule (i.e., the threshold chosen to distinguish type 1 from type 2 diabetes) and 2) the true prevalence of each diabetes type. Our goal was to use the rule and cutpoint that yielded the minimum-variance, and therefore most precise, estimate of prevalence. Thus, we determined the variance of the estimates of true prevalence over a range of true prevalence values for type 1 and type 2 diabetes, 1) selecting the cutpoint for the newly developed rules at each prevalence value and 2) using the single, fixed cutpoints for CR I and CR II. Compared with the classification tree rules, CR I and CR II performed poorly for prevalences ranging from 1 to 17 percent. The optimal tree rule (for prevalences of 1-5 percent) classifies a person as type 1 if he or she is currently using insulin and the age at diagnosis is <28.9 years. Thus, a simple modification of CR I (lowering the age-at-diagnosis cutpoint from <30 to <28.9 years) provides an improved rule for these prevalences in the African American population. For prevalences of >5 percent, the optimal tree rule classifies a person as type 1 if he or she is currently using insulin. Even for prevalences of <5 percent, this simple rule compares favorably with the optimal tree rule (data not shown). Finally, the logistic regression rule provided the minimum variance of all the estimates evaluated, regardless of the true prevalence (within the range of 1-17 percent examined for type 1 diabetes). A limitation of this study was the use of fasting C-peptide levels to determine diabetes type.
Bimodality in the distribution of C-peptide levels was not detected in this population (figure 1); therefore, we selected a cutpoint value based on consensus from data in the literature. In addition, the C-peptide value available in this database came from a fasting, not a stimulated, specimen. Some investigators have used and recommended stimulated C-peptide values (33) for studies requiring an extremely high degree of clinical certainty. However, other investigators have found that fasting and stimulated C-peptide values are highly correlated and that, in most cases, either can discriminate between insulin-requiring and non-insulin-requiring patients (5).
Another limitation of this study was the use of a clinic population that may not be representative of the overall African American population with diabetes, although the current classification rules were likewise developed in clinic populations or in populations recruited for research studies. The sample included in this study came from a large urban clinic and represents primarily persons of lower socioeconomic status; the extent to which this may have biased our findings is unknown.
Our aim in this study was to identify the classification rule that provides the most precise (minimum variance) estimate of prevalence. Other criteria for selecting a classification rule include minimizing the number of persons misclassified or minimizing the cost of misclassification; these may need to be addressed in the future as the cost of misclassification becomes better understood.
In summary, tracking the prevalence of type 1 and type 2 diabetes in the population is an important public health function. To respond appropriately and effectively to these major forms of diabetes, which have markedly different etiologies, treatments, and preventive strategies, we need valid public health surveillance and precise prevalence estimates. We have provided important information to address this issue in the African American population and have proposed new classification rules that improve on the rules in current use and reduce the variability of the resulting prevalence estimates.
REFERENCES
1. National Diabetes Data Group. Classification and diagnosis of diabetes mellitus and other categories of glucose intolerance. Diabetes 1979;28:1039-57.
2. World Health Organization Expert Committee on Diabetes Mellitus. Second report on diabetes mellitus. Geneva, Switzerland: World Health Organization, 1989:8-14. (WHO technical report series 646).
3. Prior MJ, Prout T, Miller D, et al. C-peptide and the classification of diabetes mellitus in patients in the Early Treatment Diabetic Retinopathy Study. Report number 6. Ann Epidemiol 1993;3:9-17.
4. Welborn TA, Garcia-Webb P, Bonser AM. Basal C-peptide in the discrimination of type I and type II diabetes. Diabetes Care 1981;4:616-19.
5. Hother-Nielsen O, Faber O, Schwartz N, et al. Classification of newly diagnosed diabetic patients as insulin-requiring or non-insulin-requiring based on clinical and biochemical variables. Diabetes Care 1988;11:531-7.
6. Klein R, Klein BE, Moss SE. The Wisconsin Epidemiologic Study of Diabetic Retinopathy. The relationship of C-peptide to the incidence and progression of diabetic retinopathy. Diabetes 1995;44:796-801.
7. Landin-Olsson M, Nilsson KO, Lernmark A, et al. Islet cell antibodies and fasting C-peptide predict insulin requirement at diagnosis of diabetes mellitus. Diabetologia 1990;33:561-8.
8. Katzeff HL, Savage PJ, Barclay-White B, et al. C-peptide measurement in the differentiation of type 1 (insulin-dependent) and type 2 (non-insulin-dependent) diabetes mellitus. Diabetologia 1985;28:264-8.
9. Harris MI, Cowie CC, Howie LJ. Self-monitoring of blood glucose by adults with diabetes in the United States population. Diabetes Care 1993;16:1116-23.
10. Harris MI, Robbins DC. Prevalence of adult-onset IDDM in the U.S. population. Diabetes Care 1994;17:1337-40.
11. Geographic patterns of childhood insulin-dependent diabetes mellitus. Diabetes Epidemiology Research International Group. Diabetes 1988;37:1113-19.
12. Lipman TH. The epidemiology of type 1 diabetes in children 0-14 years of age in Philadelphia. Diabetes Care 1993;16:922-5.
13. Tull ES, Roseman JM, Christian CLE. Epidemiology of childhood IDDM in the U.S. Virgin Islands from 1979 to 1988. Diabetes Care 1991;14:558-64.
14. LaPorte RE, Matsushima M, Chang YF. Prevalence and incidence of insulin-dependent diabetes. In: Harris MI, Cowie CC, Stern MP, et al, eds. Diabetes in America. Bethesda, MD: National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, 1995:37-46. (NIH publication no. 95-1468).
15. Kenny SJ, Aubert RE, Geiss LS. Prevalence and incidence of non-insulin-dependent diabetes. In: Harris MI, Cowie CC, Stern MP, et al, eds. Diabetes in America. Bethesda, MD: National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, 1995:47-67. (NIH publication no. 95-1468).
16. Umpierrez GE, Casals MMC, Gebhart SSP, et al. Diabetic ketoacidosis in obese African-Americans. Diabetes 1995;44:790-5.
17. Banerji MA, Chaiken RL, Huey H, et al. GAD antibody negative NIDDM in adult black subjects with diabetic ketoacidosis and increased frequency of human leukocyte antigen DR3 and DR4. Flatbush diabetes. Diabetes 1994;43:741-5.
18. Tull ES, Roseman JM. Diabetes in African Americans. In: Harris MI, Cowie CC, Stern MP, et al, eds. Diabetes in America. Bethesda, MD: National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, 1995:613-30. (NIH publication no. 95-1468).
19. Stern MP, Mitchell BD. Diabetes in Hispanic Americans. In: Harris MI, Cowie CC, Stern MP, et al, eds. Diabetes in America. Bethesda, MD: National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, 1995:631-60. (NIH publication no. 95-1468).
20. Rogan WJ, Gladen B. Estimating prevalence from the results of a screening test. Am J Epidemiol 1978;107:71-6.
21. Ziemer DC, Goldschmid MG, Musey VC, et al. Diabetes in urban African Americans. III. Management of type II diabetes in a municipal hospital setting. Am J Med 1996;101:25-33.
22. American Diabetes Association. Guide to diagnosis and classification of diabetes mellitus and other categories of glucose tolerance. Diabetes Care 1997;20(suppl):21.
23. Chambers JM, Hastie TJ. Statistical models in S. Pacific Grove, CA: Wadsworth & Brooks, 1992.
24. Collett D. Modelling binary data. London, England: Chapman & Hall, 1991.
25. Schwarz G. Estimating the dimension of a model. Ann Stat 1978;6:461-4.
26. McNeil BJ, Keeler E, Adelstein SJ. Primer on certain elements of medical decision making. N Engl J Med 1975;293:211-15.
27. Joffe BI, Panz VR, Wing JR, et al. Pathogenesis of non-insulin-dependent diabetes mellitus in the black population of southern Africa. Lancet 1992;340:460-2.
28. DCCT Research Group. Effects of age, duration and treatment of insulin-dependent diabetes mellitus on residual beta-cell function: observations during eligibility testing for the Diabetes Control and Complications Trial (DCCT). J Clin Endocrinol Metab 1987;65:30-6.
29. Rewers M, Hamman RF. Risk factors for non-insulin-dependent diabetes. Diabetes in America. Bethesda, MD: National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases, 1995:179-220. (NIH publication no. 95-1468).
30. Umpierrez GE, Casals MM, Gebhart SS, et al. Diabetic ketoacidosis in obese African Americans. Diabetes 1995;44:790-5.
31. Banerji MA, Lebovitz HE. Insulin-sensitive and insulin-resistant variants in NIDDM. Diabetes 1989;38:784-92.
32. Service FJ, Rizza RA, Zimmerman BR, et al. The classification of diabetes by clinical and C-peptide criteria. Diabetes Care 1997;20:198-201.
33. Diabetes Control and Complications Trial (DCCT): results of feasibility study. The DCCT Research Group. Diabetes Care 1987;10:1-19.