American Journal of Epidemiology © The Author 2009. Published by the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: [email protected]. Vol. 170, No. 3. DOI: 10.1093/aje/kwp122. Advance Access publication June 8, 2009.

Practice of Epidemiology

Designing 2-Phase Prevalence Studies in the Absence of a "Gold Standard" Test

Agus Salim and Alan H. Welsh

Initially submitted December 18, 2007; accepted for publication April 24, 2009.

A population survey for estimating prevalence is challenging when a disease or condition is difficult to diagnose. If clinical diagnosis is expensive, a 2-phase study, in which less expensive but less accurate tests are administered to all study subjects in the first phase (screening phase) and a more accurate but expensive or time-consuming test is administered to only a subset of the subjects in the second phase, is an attractive approach. Published research has discussed ways of maximizing precision of the prevalence estimate from a 2-phase study with a "gold standard" second-phase test. For many psychiatric disorders, even the best diagnostic tests are not of gold standard quality. In this paper, the authors propose a quasi-optimal design for 2-phase prevalence studies without a gold standard test; random-effects latent class analysis facilitates the estimation of prevalence and appropriately addresses the issue of dependent errors among the diagnostic tests. The authors show that the quasi-optimal design is efficient compared with the balanced and random designs when there is strong inter-test dependence caused by additional factors, apart from disease status, and highlight the importance of collecting data on those subjects testing negative in the first phase.

efficiency; mass screening; patient selection; psychiatry; sampling studies

Abbreviation: LCA, latent class analysis.

Advances in medical technology have enabled many conditions to be diagnosed with no or negligible error.
Diagnostic tests with this property, referred to as "gold standard," directly determine the true disease status of an individual. However, there are still many conditions for which a gold standard test is not available or is simply too expensive to apply. The lack of a gold standard test is a common feature in psychiatry. Because there are no established biomarkers for psychiatric disorders, these conditions are diagnosed against consensus-determined sets of symptoms and signs (1).

When the population of interest is large, a screening phase can be performed. Tests conducted during this phase are usually easy to administer and relatively inexpensive, although they may not have high specificity. A fraction of those subjects who "screened" positive are examined further by using more thorough tests (refer, for example, to Kumar et al. (2) and Hayden et al. (3)). Prevalence is computed by assuming that the phase II test is a gold standard and that false negatives in the screening phase are negligible. When the gold standard assumption is reasonable and a fixed fraction of subjects are to be assessed at phase II, precision of the prevalence estimate can be maximized by selecting an optimal number of subjects from each category of screening outcomes (4–6). However, falsely assuming gold standard quality will result in a biased prevalence estimate (7, 8) and subvert any claims to optimality. Despite the need to study the optimal framework for settings in which there is no definitive diagnostic method, to our knowledge, there has been no effort to address this problem.

Latent class analysis (LCA) has been used to estimate prevalence in the absence of a gold standard test (9–11), but the conditional independence assumption that underlies LCA rarely holds. Torrance-Rynard and Walter (8) showed that when it is violated, the prevalence estimate is biased. Several extensions to classical LCA have been developed.
A latent class model with conditional dependence between 1 or 2 pairs of binary responses was introduced by Harper (12). This approach has been generalized for handling extra dependence among multiple tests by introducing random subject effects (13). Using random-effects LCA models given in Qu et al. (13), we study a framework for designing 2-phase prevalence studies in the absence of a gold standard test, with the objective of maximizing precision of the prevalence estimate. We present a scenario in which the phase I sample size n is fixed and a predetermined number of subjects will be examined at phase II.

Correspondence to Dr. Agus Salim, Department of Epidemiology and Public Health, National University of Singapore, MD3, 16 Medical Drive, Singapore, Singapore, 117597 (e-mail: [email protected]).

Am J Epidemiol 2009;170:369–378

METHOD

Classical LCA

Suppose there are M binary, non-gold-standard, diagnostic tests. The unobserved true disease status for individual i (D_i) is a latent binary variable. Let Y_i = (Y_{i1}, Y_{i2}, ..., Y_{iM}) be the vector of outcomes for individual i. The tests are characterized by their sensitivity and specificity, which, for the jth test, are respectively given by

    w_j = \Pr(Y_{ij} = 1 \mid D_i = 1), \qquad x_j = \Pr(Y_{ij} = 0 \mid D_i = 0).

Classical LCA assumes conditional independence: outcomes of the different tests within an individual are assumed to be independent given the true disease status, which can be written as

    \Pr[Y_i \mid D_i] = \prod_{j=1}^{M} \Pr[Y_{ij} = y_{ij} \mid D_i].

The likelihood function based on these M test outcomes is given by

    L_i(\theta) = \Pr[D_i = 1]\Pr[Y_i \mid D_i = 1] + \Pr[D_i = 0]\Pr[Y_i \mid D_i = 0]
                = q \prod_{j=1}^{M} w_j^{y_{ij}} (1 - w_j)^{1 - y_{ij}} + (1 - q) \prod_{j=1}^{M} x_j^{1 - y_{ij}} (1 - x_j)^{y_{ij}},    (1)

where q = Pr[D_i = 1] is the disease prevalence. The parameters of interest, θ = {q, w_1, w_2, ..., w_M, x_1, x_2, ..., x_M}^T, can be estimated by maximizing the logarithm of equation 1 summed over all individuals.

LCA with random effects

The conditional independence assumption in LCA fails when dependence due to factors other than disease status exists. In psychiatry, it is likely that psychological test batteries share similar test items. These tests will tend to give similar outcomes for the same subject, regardless of the subject's disease status. Dependency can also arise when the sensitivity of the individual screening procedure in practice increases with the severity of the underlying condition. Dependence between different tests can be handled by using random-effects LCA, in which, in addition to the underlying disease status, the outcome depends on random subject-specific effects (13). We model the probability of a positive outcome for the jth test on the ith individual as a function of the underlying disease status D_i and a subject-specific random effect T_i through a probit regression model,

    \Pr(Y_{ij} = 1 \mid D_i = d, T_i = t) = \Phi(a_{jd} + b_{jd} t), \qquad d = 0, 1; \quad T_i \sim N(0, 1),    (2)

where Φ(.) is the standard normal cumulative distribution function, and the amount of extra dependence caused by interaction between the jth test and the subject-specific characteristic T_i is controlled by the parameters b_{j1} and b_{j0}. Within individual i, test outcomes are conditionally independent given D_i and T_i. To estimate the fixed-effects parameters, θ = {q, a_{10}, a_{11}, b_{10}, b_{11}, ..., a_{M0}, a_{M1}, b_{M0}, b_{M1}}^T, we maximize the marginal log-likelihood (Appendix 1). The sensitivity and specificity parameters are estimated as functions of the maximum likelihood estimates of the fixed-effects parameters. Alternatives to probit regression are available. In particular, if logistic regression is used instead, the model will be equivalent to a subclass of the generalized partial-credit models used in psychometrics (14, 15). The main advantage of using the probit link is that the sensitivity and specificity estimates can be computed as relatively simple functions of the parameter estimates.

A quasi-optimal hybrid design for 2-phase prevalence studies

LCA can be adapted to handle data from 2-phase prevalence studies (refer to Appendix 2). In 2-phase prevalence studies, we screen a random sample of n subjects using M binary tests, and 100q_v% of them are assessed by using a different test at phase II. We assume that the phase I sample size (n) and the overall phase II sampling fraction (q_v) have been predetermined. We label the outcome classes defined by a unique combination of screening test outcomes as Z_k, k = 1, 2, ..., 2^M. Let n*_k, k = 1, ..., 2^M, denote the number of observations in each outcome class, so

    n = \sum_{k=1}^{2^M} n^*_k.

In phase II, we select n_k = p_k n*_k subjects from the kth outcome class. Each of the n_k subjects selected is given the additional binary test W, with n_{ks} denoting the number of subjects in the kth class with phase II outcome s = 0, 1, so that n_k = n_{k0} + n_{k1}, k = 1, ..., 2^M. Our objective is to minimize the variance of the prevalence estimate, q̂, by choosing the optimal phase II sampling fractions, p = {p_1, p_2, ..., p_{2^M}}, subject to the overall sampling fraction being q_v = \sum_k p_k n^*_k / n. For a given set of phase II sampling fractions, p, the prevalence estimator q̂ is distributed as

    \hat{q} \sim N\big(q, I^{-1}_{11}(\theta; p)\big),

where I^{-1}_{11}(θ; p) is the first main diagonal element of the inverse of the expected Fisher information matrix

    I(\theta; p) = -E\left[\frac{\partial^2 \log\ell(\theta; p)}{\partial\theta\,\partial\theta^T}\right].    (3)

The analytical form of the log-likelihood, logℓ(θ; p), is given in Appendix 2.
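As an illustration, equations 1 and 2 can be evaluated numerically. The sketch below is our own plain-Python illustration, not the authors' code; the function names are ours, and the marginal sensitivity uses the standard probit-normal identity E[Φ(a + bT)] = Φ(a/√(1 + b²)) for T ~ N(0, 1).

```python
import math

def classical_lca_likelihood(y, q, sens, spec):
    """Likelihood contribution (equation 1) for one subject's binary
    test vector y under conditional independence."""
    p1, p0 = q, 1.0 - q
    for yj, wj, xj in zip(y, sens, spec):
        p1 *= wj if yj else (1.0 - wj)          # Pr(Y_j = y_j | D = 1)
        p0 *= (1.0 - xj) if yj else xj          # Pr(Y_j = y_j | D = 0)
    return p1 + p0

def probit_prob(a, b, t):
    """Pr(Y_ij = 1 | D_i = d, T_i = t) = Phi(a_{jd} + b_{jd} t) (equation 2)."""
    return 0.5 * (1.0 + math.erf((a + b * t) / math.sqrt(2.0)))

def marginal_sensitivity(a1, b1):
    """Marginal Pr(Y = 1 | D = 1) after integrating out T ~ N(0, 1);
    the probit link gives the closed form Phi(a / sqrt(1 + b^2))."""
    return 0.5 * (1.0 + math.erf(a1 / math.sqrt(2.0 * (1.0 + b1 ** 2))))
```

The closed form in `marginal_sensitivity` is the "relatively simple function of parameter estimates" advantage of the probit link mentioned above; under a logit link the integral has no such closed form.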
Because we are interested in finding the "optimal" p, the parameters of the underlying random-effects LCA are assumed to be known. If the parameters need to be estimated, one can do so from a pilot study conducted prior to the main study. The log-likelihood in Appendix 2 can be used to estimate θ based on data from a pilot study. The pilot study typically includes a small number of individuals assessed by using the M phase I tests; a subset of them are also assessed by using the phase II test.

To locate the optimal sampling fractions, we need to evaluate the expected information matrix in equation 3 for various values of p. The matrix can only be approximated numerically by using finite-difference methods. We initially used the Nelder-Mead simplex algorithm (16) to locate the optimal sampling fractions. However, doing so proved quite challenging because of the difficulty in obtaining highly accurate approximations of the second-order derivatives. The problem is compounded because we rely on achieving stability in estimating a small variance parameter for declaring convergence. To overcome this problem, we search for "optimal" phase II sampling fractions among designs that are hybrids between a random (RAN) design, in which the sampling fractions are equal (p^RAN_k = q_v), and a balanced (BAL) design, in which the sampling fraction for the kth class is set to be

    p^{BAL}_k = \frac{n q_v}{2^M n^*_k}.

The balanced design seeks to make the phase II sample sizes for the different phase I outcome classes as equal as possible. An exception is made when there are no subjects in the kth phase I stratum, in which case we cannot select anybody from this stratum at phase II. On this rare occasion, we set the balanced sampling fraction for this stratum to zero and recalculate the other sampling fractions using simple renormalization (refer to the information below).
Similarly, when the number of subjects in the kth stratum is smaller than the number required under a balanced design, the sampling fraction is set to 1 and the other sampling fractions are recalculated (again, see below). In our hybrid design, the vector of sampling fractions is taken as a linear combination of the sampling fractions under the random and balanced designs,

    p = \lambda p^{BAL} + (1 - \lambda) p^{RAN}.    (4)

We search for the optimal hybrid parameter, λ, that minimizes I^{-1}_{11}(θ; p) using the line search method. To help our search, the curve of variances at various λ values is smoothed prior to performing the search. Because there is no guarantee that the hybrid sampling fractions actually achieve the global minimum of the standard error of the prevalence estimate, we refer to this hybrid design as "quasi-optimal." When λ = 0, the quasi-optimal design samples randomly; when λ = 1, the quasi-optimal design is balanced. When λ > 1, the quasi-optimal design overshoots the balanced design and samples more intensively from rare phase I outcome classes. Similarly, when λ < 0, the quasi-optimal design samples more intensively from common phase I outcome classes compared with the random design. For some λ, elements of p may be negative; we replace negative sampling fractions by zero and adjust the other sampling fractions using the simple renormalization

    p^* = p \, n q_v \Big/ \sum_{k: p_k > 0} p_k n^*_k.

Similarly, if any sampling fractions are greater than 1, we replace these sampling fractions by 1 and adjust the other sampling fractions using the following renormalization:

    p^* = p \left( n q_v - \sum_{k: p_k \geq 1} n^*_k \right) \Big/ \sum_{k: p_k < 1} p_k n^*_k.

RESULTS

We compare the relative efficiency of the proposed quasi-optimal design with that of the random and balanced designs. The comparisons are aimed at assessing the efficiency that could be achieved under realistic sets of parameters, chosen to represent "typical" psychiatric diagnostic tests.
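The balanced, random, and hybrid sampling fractions of equation 4, together with the clipping and renormalization steps described in the design section, can be sketched as follows. This is an illustrative translation under our own assumptions (function name, treatment of empty strata), not the authors' implementation.

```python
def hybrid_fractions(n_k, qv, lam):
    """Phase II sampling fractions for the hybrid design (equation 4):
    p = lam * p_BAL + (1 - lam) * p_RAN, with negative fractions set to
    zero, fractions above 1 capped at 1, and the rest renormalized so
    the overall phase II fraction stays qv.  n_k holds the phase I
    outcome-class counts n*_k."""
    n = sum(n_k)
    K = len(n_k)                                   # K = 2^M outcome classes
    p_ran = [qv] * K
    p_bal = [n * qv / (K * nk) if nk > 0 else 0.0 for nk in n_k]
    p = [lam * b + (1 - lam) * r for b, r in zip(p_bal, p_ran)]
    p = [max(pk, 0.0) for pk in p]                 # negative fractions -> 0
    # strata whose fraction is capped at 1 consume n*_k subjects outright;
    # the remaining budget is spread over the uncapped strata
    fixed = sum(nk for pk, nk in zip(p, n_k) if pk >= 1.0)
    free = sum(pk * nk for pk, nk in zip(p, n_k) if 0.0 < pk < 1.0)
    target = n * qv - fixed
    return [1.0 if pk >= 1.0 else (pk * target / free if free > 0 else 0.0)
            for pk in p]
```

With λ = 1 and a rare stratum, the balanced fraction can exceed 1, triggering the cap-and-renormalize branch; with equal strata the balanced and random designs coincide, as the text implies.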
To compare the different designs, we assume that n = 10,000 subjects are assessed at phase I using 4 different tests and that only a fixed fraction of these subjects receive the phase II test. The choice of n is somewhat arbitrary because the relative efficiency of any 2 designs given a fixed p does not depend on n, as evidenced by the form of the log-likelihood function in Appendix 2. The sensitivity parameters of the screening tests are set to be 0.68, 0.70, 0.73, and 0.76, and the specificity parameters are set to be 0.60, 0.62, 0.65, and 0.68. Four factors are varied in the study:

1. Phase II test parameters: sensitivity = 0.95 and specificity = 0.95 (labeled Good), and sensitivity = 0.80 and specificity = 0.75 (labeled Moderate).
2. Conditional dependence parameters: b_1 and b_0 are set equal across the 5 tests and are assigned the values b_1 = b_0 = 2 (labeled Strong) and b_1 = b_0 = 0.5 (labeled Weak).
3. Disease prevalence: q = 0.01 corresponds to a rare disease such as schizophrenia, and q = 0.10 corresponds to a relatively common disease such as depression.
4. Phase II sampling fraction: ranges from q_v = 0.05 to q_v = 0.50, in increments of 0.05.

In the current study, the underlying parameters of the LCA model are known; hence, no simulated pilot data are needed. Instead, for each combination of phase II sampling fraction and λ, we can directly use these parameters to calculate the information matrix (equation 3). The numeric approximations of the expected information matrix are carried out by using the fdHess function of the R package nlme; the code is available, upon request, from the authors. The variance estimate is calculated by taking the first main diagonal element of the inverse of this matrix.
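The authors approximate the information matrix with R's `nlme::fdHess`. A language-neutral sketch of the same central finite-difference idea, written by us for illustration (not the authors' code), is:

```python
def hessian_fd(f, theta, h=1e-4):
    """Central finite-difference approximation to the Hessian of f at
    theta -- the same idea as nlme's fdHess.  Applied to the negative
    log-likelihood, this approximates the observed information matrix,
    whose expectation is equation 3."""
    k = len(theta)
    H = [[0.0] * k for _ in range(k)]

    def shifted(i, j, si, sj):
        t = list(theta)
        t[i] += si * h
        t[j] += sj * h
        return f(t)

    for i in range(k):
        for j in range(i, k):
            # four-point central-difference stencil for d2f / dtheta_i dtheta_j
            d = (shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
                 - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)) / (4 * h * h)
            H[i][j] = H[j][i] = d
    return H
```

The diagonal entries reduce to the usual second-difference quotient with step 2h; the limited accuracy of such stencils is precisely the numerical difficulty the authors report for Nelder-Mead optimization over p.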
Figure 1. Relative efficiency defined as the variance of the prevalence estimate under the quasi-optimal design relative to the variance estimate under random (solid lines) and balanced (dashed lines) designs for the same sampling fractions (q = 0.01). A) Good phase II test with strong inter-test dependence; B) good phase II test with weak inter-test dependence; C) moderate phase II test with strong inter-test dependence; D) moderate phase II test with weak inter-test dependence. [Each panel plots relative efficiency (0–1) against the phase II sampling fraction (0.1–0.5).]

To estimate the relative efficiency of the quasi-optimal design, for each combination of factors 1 and 2 above, we compute the variance of the prevalence estimate for hybrid designs with λ ranging from −1 to 6. To remove the stochastic noise from the numeric approximations, the variance estimates are then smoothed as a function of disease prevalence, phase II sampling fraction, and λ. The smoothing is performed by using the R function gam. The relative efficiency for each combination of disease prevalence and phase II sampling fraction is then computed by taking the ratio of the lowest variance estimate (the quasi-optimal variance) to the variance estimate when λ = 0 (random design) and when λ = 1 (balanced design). The relative efficiency curves are constructed by smoothing the values of the relative efficiency estimates as functions of the phase II sampling fraction.

Reduction of variance for the same overall phase II sampling fraction

For rare disease, when conditional dependence is weak (Figures 1B and 1D), the balanced and random designs perform almost as well as the quasi-optimal design.
However, when conditional dependence is strong, the advantage of the quasi-optimal design over the random design is obvious. The balanced design, however, still performs relatively well when conditional dependence is strong. A very similar pattern holds for common disease (Figure 2). Although the phase I sample size (n) does not affect the relative efficiency of the quasi-optimal design relative to the random or balanced design if the phase II sampling fraction is fixed, it still affects the relative efficiency if the phase II sample size is fixed. For example, suppose that n = 10,000 and we can afford to sample only 500 individuals (q_v = 0.05). Then, according to Figure 1A, the balanced design is about 90% as efficient as the quasi-optimal design. However, if n = 1,000 and we can afford to sample only 500 individuals (q_v = 0.50), then using the balanced design would produce a prevalence estimate nearly as precise as that obtained with the quasi-optimal design.

Reduction of the phase II sampling fraction for the same standard error of the prevalence estimate

We also look at how the relative efficiency translates into a reduction of the required phase II sampling fraction. For each value of the phase II sampling fraction under the quasi-optimal design (factor 4 above), we use the smoothed variance estimates to interpolate the phase II sampling fraction, with λ = 0 and λ = 1, that will give the same variance of the prevalence estimate as the quasi-optimal design does. The relative efficiency is then calculated as the ratio of the quasi-optimal phase II sampling fraction to that of the balanced and random designs.
Figure 2. Relative efficiency defined as the variance of the prevalence estimate under the quasi-optimal design relative to the variance estimate under random (solid lines) and balanced (dashed lines) designs for the same sampling fractions (q = 0.10). A) Good phase II test with strong inter-test dependence; B) good phase II test with weak inter-test dependence; C) moderate phase II test with strong inter-test dependence; D) moderate phase II test with weak inter-test dependence. [Each panel plots relative efficiency (0–1) against the phase II sampling fraction (0.1–0.5).]

Figures 3A and 4A show that when conditional dependence is strong and the phase II test is of Good quality, the quasi-optimal design can achieve more than a 40% reduction in phase II sample size when compared with the random design. However, a significant gain over the balanced design is observed only when the sampling fraction is small. When the phase II test is of Moderate quality (Figures 3C and 4C), the quasi-optimal design achieves a more meaningful reduction in the phase II sampling fraction when compared with the balanced design. However, the most interesting findings occur when conditional dependence is weak. Figures 3B and 3D reveal that, generally, the benefit of the quasi-optimal design is much more pronounced when reduction of the phase II sampling fraction, rather than variance reduction, is used to measure the benefit (Figures 1B and 1D). A similar pattern is observed when Figures 2 and 4 are compared for common disease, although the discrepancy is less pronounced.
This discrepancy can be explained by the fact that the variance of the prevalence estimate is not an inverse function of the phase II sampling fraction; rather, it is an inverse function of the phase I sample size. Hence, for the same sampling fraction, a variance under the quasi-optimal design that is half that under the random design does not mean that the variance achieved by the quasi-optimal design can be attained by simply doubling the sampling fraction under the random design. As a consequence, the scale used to measure the benefit of the quasi-optimal design matters. In terms of practical applications, however, the definition of benefit in terms of the phase II sampling fraction is probably more meaningful.

APPLICATION

We apply our methodology to a study of mild cognitive impairment in a cohort of subjects aged 60–64 years in Canberra, Australia (2). Mild cognitive impairment has been defined as a transitional state between normal aging and Alzheimer's disease (17). The application of different diagnostic procedures and the age range of study subjects have led to diverse and inconsistent estimates of the prevalence of this condition (refer, for example, to De Carli (18)). Prevalence estimation has been further complicated by the fact that none of the currently used tests can be regarded as a gold standard. The study was a 2-phase investigation, with 2,551 individuals assessed by using the following 3 tests at phase I:

1. A Mini-Mental State Examination score of ≤25 (19)
2. A score below the 5th percentile for immediate or delayed recall on the California Verbal Learning Test (20)
3. A score below the 5th percentile on the Symbol Digit Modalities Test (21).

At phase II, individuals underwent thorough physical, laboratory, and neuropsychological examinations conducted by a clinician and were assessed according to the diagnostic criteria for amnestic mild cognitive impairment (17).
Figure 3. Relative efficiency defined as the ratio of the required phase II sampling fraction under the quasi-optimal design relative to the random (solid lines) and balanced (dashed lines) designs to achieve the same standard errors of the prevalence estimate (q = 0.01). A) Good phase II test with strong inter-test dependence; B) good phase II test with weak inter-test dependence; C) moderate phase II test with strong inter-test dependence; D) moderate phase II test with weak inter-test dependence. [Each panel plots relative efficiency (0–1) against the quasi-optimal sampling fraction (0.1–0.5).]

The study team decided to conduct the phase II investigations on only a subset of individuals who had at least one positive phase I test. Seventeen of the 2,551 subjects had one or more missing screening test outcomes and were excluded from the subsequent analysis. Therefore, 2,534 subjects remained, of whom 112 of the 224 who "screened" positive consented to undergo and were administered the phase II test (corresponding to q_v ≈ 0.044). The estimates of prevalence, as well as the sensitivity and specificity parameters of the 4 tests under 1) classical LCA and 2) random-effects LCA, are given in Table 1. Because we have 4 tests, only models with at most 2^4 − 1 = 15 independent parameters can be fitted. With this restriction, we assume that the variance component for the random effects is the same in both the diseased and nondiseased subpopulations for all tests. That is, b_11 = b_21 = b_31 = b_41 = b_1 and b_10 = b_20 = b_30 = b_40 = b_0.
The likelihood ratio statistic comparing the classical and random-effects LCA models is 6.70 on 2 degrees of freedom (P = 0.035), so we reject the conditional independence assumption. The random-effects LCA model fits the data quite well, as indicated by a comparison of the observed and predicted frequencies for the different phase I outcome classes (Table 2). Violation of the conditional independence assumption results in underestimation of disease prevalence by 2.6% in absolute value and by more than 30% relative to the prevalence estimate itself. It also results in underestimation of the sensitivity of the amnestic mild cognitive impairment test and overestimation of the sensitivity of the phase I tests. The low sensitivities of the Mini-Mental State Examination and the Symbol Digit Modalities Test indicate that the current threshold values used to determine positive screening for these tests are too high and that, consequently, higher sensitivity could be achieved by lowering the threshold (e.g., lower than 25 for the Mini-Mental State Examination), although the specificity would decrease to compensate for the lower threshold. The phase II test (amnestic mild cognitive impairment) has very high specificity. In fact, the 95% confidence interval for the specificity of amnestic mild cognitive impairment covers 1 (perfect specificity). The estimated prevalence is in accordance with previously published reports in which the prevalence of mild cognitive impairment for age groups 60 years or older is reported to be between 5% and 10% (18, 22).

Treating the data from this study as our "pilot" data, we investigate how we should have designed the phase II sampling. We search for a quasi-optimal design by using the estimates in Table 1 (the Conditional Dependence column) to generate numeric estimates of the expected information matrix. The hybrid parameter for the quasi-optimal design is found to be 1.8 (Figure 5).
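The reported P value for the likelihood ratio test can be checked by hand: for a chi-square distribution with 2 degrees of freedom, the survival function has the closed form P(X > x) = exp(−x/2). A quick sketch (our own sanity check, not part of the original analysis):

```python
import math

# Likelihood ratio test of conditional independence reported in the text:
# statistic 6.70 on 2 df.  For chi-square with 2 df, P(X > x) = exp(-x/2).
lr_stat = 6.70
p_value = math.exp(-lr_stat / 2.0)   # approximately 0.035, matching the text
```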
Figure 5 also demonstrates that the quasi-optimal design, compared with the random design, greatly reduces the variance of the prevalence estimate. The quasi-optimal design does not perform much better than the balanced design. This finding is not surprising given that our calculations show that, when the phase II test is of Moderate quality and conditional dependence is Strong, the balanced design performs nearly as well as the quasi-optimal design (Figures 1C and 2C). The variance of the prevalence estimate under the quasi-optimal design is 85% of the variance under the balanced design and 40% of the variance under the random design (Figure 5). When we compared the quasi-optimal design with the design used in the actual study, we found that the actual design is only about 33% efficient. Thus, had we been able to use the quasi-optimal design in planning the phase II sampling, the standard error of the prevalence estimate under the LCA model would have been reduced from approximately 0.0128 (the width of the 95% confidence interval for q in Table 1 (the Conditional Dependence column) divided by 3.94) to √0.33 × 0.0128 ≈ 0.0074. Alternatively, if we had used the balanced design, the standard error of the prevalence estimate would have been √(0.33/0.85) × 0.0128 ≈ 0.008. The recommended phase II sampling fractions are given in Table 3.

Figure 4. Relative efficiency defined as the ratio of the required phase II sampling fraction under the quasi-optimal design relative to the random (solid lines) and balanced (dashed lines) designs to achieve the same standard errors of the prevalence estimate (q = 0.10). A) Good phase II test with strong inter-test dependence; B) good phase II test with weak inter-test dependence; C) moderate phase II test with strong inter-test dependence; D) moderate phase II test with weak inter-test dependence. [Each panel plots relative efficiency (0–1) against the quasi-optimal sampling fraction (0.1–0.5).]

Table 1. Parameter Estimates and Their 95% Confidence Intervals for the Mild Cognitive Impairment Study Data Set

                        Conditional Independence      Conditional Dependence
                        Estimate   95% CI             Estimate   95% CI
q                       0.051      0.026, 0.077       0.077      0.052, 0.102
Sensitivity
  MMSE                  0.470      0.289, 0.664       0.084      0.033, 0.467
  CAL                   0.439      0.287, 0.582       0.255      0.024, 0.745
  SDMT                  0.484      0.135, 0.657       0.051      0.011, 0.665
  aMCI                  0.319      0.135, 0.503       0.764      0.272, 0.969
Specificity
  MMSE                  0.990      0.981, 0.999       0.971      0.926, 0.992
  CAL                   0.967      0.956, 0.976       0.962      0.904, 0.991
  SDMT                  0.972      0.962, 0.982       0.951      0.874, 0.989
  aMCI                  0.689      0.568, 0.823       0.986      0.656, 1.000
Variance component
  b1                    0                             0.720      0.545, 1.551
  b0                    0                             1.154      1.131, 1.225

Abbreviations: aMCI, amnestic mild cognitive impairment; CAL, California Verbal Learning Test; CI, confidence interval; MMSE, Mini-Mental State Examination; SDMT, Symbol Digit Modalities Test.
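The standard-error arithmetic in the text can be reproduced directly; the sketch below takes the 3.94 divisor and the 0.33 and 0.85 relative-efficiency factors from the text as given.

```python
import math

# Standard error of the prevalence estimate implied by the actual design:
# the 95% CI width for q (Conditional Dependence column of Table 1)
# divided by 3.94, as the authors state.
ci_width = 0.102 - 0.052
se_actual = ci_width / 3.94                        # about 0.0127

# Scaling by the square root of the relative efficiency of each design:
se_quasi = math.sqrt(0.33) * se_actual             # quasi-optimal, about 0.0073
se_balanced = math.sqrt(0.33 / 0.85) * se_actual   # balanced, about 0.0079
```

Variances scale linearly with relative efficiency, so standard errors scale with its square root, which is why the factors enter under the radical.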
When we compare the prescribed phase II sampling fractions with those used in the actual study, we learn that the study design could be improved by sampling more subjects who had 2 or more positive tests in phase I and by sampling fewer subjects who had only one positive test in phase I. A number of subjects who had screened negative would also need to be assessed at phase II. The exact number is 0.887 × 0.008 × 2,551 ≈ 19 individuals.

Table 2. Observed Number of Subjects in the Different Phase I Outcome Classes and the Expected Number Predicted by the Random-Effects Latent Class Analysis Model

MMSE   CAL   SDMT   Observed No.   Expected No.
1      1     1      12             12
1      1     0      17             16
1      0     1      20             21
0      1     1      15             13
1      0     0      31             43
0      1     0      91             91
0      0     1      78             78
0      0     0      2,270          2,260

Abbreviations: CAL, California Verbal Learning Test; MMSE, Mini-Mental State Examination; SDMT, Symbol Digit Modalities Test.

Table 3. Population Prevalence of Different Phase I Outcome Classes and the Phase II Sampling Fractions Under the Quasi-optimal Design (Observed Counterpart Used in the Actual Study)

MMSE   CAL   SDMT   Prevalence   Sampling Fraction
1      1     1      0.005        1.000 (0.286)
1      1     0      0.008        1.000 (0.476)
1      0     1      0.009        1.000 (0.292)
0      1     1      0.007        0.822 (0.412)
1      0     0      0.014        0.402 (0.432)
0      1     0      0.037        0.190 (0.537)
0      0     1      0.032        0.220 (0.110)
0      0     0      0.887        0.008 (0.000)

Abbreviations: CAL, California Verbal Learning Test; MMSE, Mini-Mental State Examination; SDMT, Symbol Digit Modalities Test.

DISCUSSION

In this paper, we have presented a quasi-optimal design for a 2-phase prevalence study without a gold standard test. We demonstrated that the quasi-optimal design is more efficient than the random and balanced designs under a variety of realistic parameter settings, especially when strong conditional dependence exists. The quasi-optimal design is very useful for psychiatric studies, where even the best diagnostic test is often of Moderate quality, with sensitivity typically less than 80%, and conditional dependence between tests is strong.

Figure 5.
Relative efficiency defined as the variance of the prevalence estimate under the quasi-optimal design (λ = 1.8) relative to hybrid designs with different hybrid parameters (λ).

Because the quasi-optimal design is based on random-effects LCA models, researchers generally need to have at least 3 screening tests to take advantage of the methodology for planning their study. With 3 screening tests and one phase II test, the simplest random-effects LCA model with the same conditional dependence parameters for the 4 tests would need to estimate 11 parameters (the prevalence estimate, 4 sensitivity estimates, 4 specificity estimates, and 2 estimates of the conditional dependence parameters). With 4 tests in total, this is possible because we have 2^4 − 1 = 15 degrees of freedom to use from a 2 × 2 × 2 × 2 contingency table. However, if we have 2 subpopulations, such as male and female, with different prevalences, the minimum number of tests required to conduct LCA can be reduced to 3 (9). With 3 tests, each subpopulation has 2^3 − 1 = 7 degrees of freedom, so in total we have 14 degrees of freedom with only 12 parameters to be estimated. When needed, the extra degrees of freedom can be used to model the effect of covariates such as gender on the specificity and sensitivity parameters. For population-based studies of a psychiatric condition such as mild cognitive impairment or dementia, we expect that the requirement of having 2 screening tests would be easily fulfilled.

Our findings also demonstrate the importance of conducting a further test at phase II for a small fraction of those who screened negative at phase I. This procedure is contrary to popular practice in 2-phase psychiatric studies, where no further assessment is conducted for this subgroup of participants.
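The phase II workload implied by a design can be read off Table 3 directly: each stratum contributes (class prevalence) × (sampling fraction) × n subjects. A short sketch using the Table 3 values (rounding up to whole subjects is our assumption):

```python
import math

# Implied phase II sample sizes per phase I outcome class under the
# quasi-optimal design, from the Table 3 class prevalences and sampling
# fractions; n = 2,551 screened subjects, as in the application.
n = 2551
table3 = {  # "MMSE CAL SDMT": (population prevalence, sampling fraction)
    "111": (0.005, 1.000), "110": (0.008, 1.000), "101": (0.009, 1.000),
    "011": (0.007, 0.822), "100": (0.014, 0.402), "010": (0.037, 0.190),
    "001": (0.032, 0.220), "000": (0.887, 0.008),
}
sizes = {k: prev * frac * n for k, (prev, frac) in table3.items()}

# Screen-negative subjects needing phase II assessment (about 19, per the text)
screen_negatives = math.ceil(sizes["000"])

# Overall phase II sampling fraction implied by these choices
overall_fraction = sum(p * f for p, f in table3.values())
```

Even though the screen-negative stratum has the smallest sampling fraction, its size makes it a nontrivial part of the phase II sample, which is the practical point of the preceding paragraph.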
Assessing a few of those who screened negative can have a profound impact on reducing the variance of the prevalence estimate. Random-effects LCA can be extended to include covariates. Factors such as poor education and motor or sensory impairment may have differential effects on tests that use particular approaches to assess cognitive status. Because the efficiency of our quasi-optimal design relies on the validity of the random-effects LCA models, we suggest that researchers use the pilot data to examine model fit by comparing the observed and predicted frequencies of subjects in the different strata defined by phase I test outcomes.

Am J Epidemiol 2009;170:369–378

A significant lack of fit would suggest that the current model is inappropriate. Comparing the observed and theoretical odds ratios for each pair of tests will also help improve the model by identifying pairs of tests that are more correlated than the others and for which separate, shared random-effects parameters are needed. We refer readers interested in models with covariates to Qu et al. (13) and Reboussin et al. (23). The latter authors also discussed the strengths and weaknesses of random-effects latent class models. Several extensions are worth further research. Study withdrawals are a ubiquitous problem in psychiatric epidemiology, and taking this problem into account will increase the applicability of the methodology. We considered a scenario in which the phase I sample size is predetermined. A more complex scenario would be to determine the optimal phase I sample size as well as the phase II sampling fractions. In this scenario, we need to consider the cost of the phase I tests per subject (c1) and the phase II test per subject (c2).
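The model check suggested above, comparing observed stratum frequencies with those predicted by the fitted random-effects LCA model, can be sketched with the counts transcribed in Table 2. This is our illustration (a simple Pearson chi-square summary), not a procedure prescribed by the paper:

```python
# Compare observed phase I stratum counts (keys are MMSE/CAL/SDMT outcome
# patterns) with the frequencies predicted by the fitted model; counts are
# transcribed from Table 2.

observed = {"111": 12, "110": 17, "101": 20, "011": 15,
            "100": 31, "010": 91, "001": 78, "000": 2270}
expected = {"111": 12, "110": 16, "101": 21, "011": 13,
            "100": 43, "010": 91, "001": 78, "000": 2260}

# Pearson chi-square statistic; a value that is large relative to the
# residual degrees of freedom signals lack of fit.
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
print(round(chi2, 2))  # 3.81
```

Most of the statistic here comes from the single-positive MMSE stratum, which is exactly the kind of pairwise-dependence signal the odds-ratio comparison in the text is meant to surface.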
For a gold standard phase II test and with only one phase I test, it has been shown that a 2-phase study is more efficient than a single-phase study (in which everybody is assessed by using all available tests) if c2/c1 > 5 and the sum of the sensitivity and specificity parameters is at least 1.7 (4). Examining this issue in the case of a non-gold-standard phase II test will have practical value and is worth researching further.

ACKNOWLEDGMENTS

Author affiliations: Department of Epidemiology and Public Health, National University of Singapore, Singapore (Agus Salim); Centre for Mental Health Research, Australian National University, Australian Capital Territory, Australia (Agus Salim); and Centre for Mathematics and Its Applications, Australian National University, Australian Capital Territory, Australia (Alan H. Welsh).

The statistical work was funded by Australian National Health and Medical Research Council (NHMRC) Capacity Building Grant 418020 and Australian Research Council (ARC) Grant DP0559135. The mild cognitive impairment study was funded by NHMRC Program Grant 179805 and NHMRC Project Grant 157125. The mild cognitive impairment study involved human subjects, and ethical approval was obtained from the Australian National University Ethics Committee.

Conflict of interest: none declared.

REFERENCES

1. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision. Washington, DC: American Psychiatric Association; 2000.
2. Kumar R, Dear KB, Christensen H. Prevalence of mild cognitive impairment in 60- to 64-year-old community-dwelling individuals: the Personality and Total Health Through Life 60+ Study. Dement Geriatr Cogn Disord. 2005;19(2–3):67–74.
3. Hayden KM, Khachaturian AS, Tschanz JT. Characteristics of a two-stage screen for incident dementia. J Clin Epidemiol. 2003;56(11):1038–1045.
4. Shrout PE, Newman SC.
Design of two-phase prevalence surveys of rare disorders. Biometrics. 1989;45(2):549–555.
5. Newman SC, Shrout PE, Bland RC. The efficiency of two-phase designs in prevalence surveys of mental disorders. Psychol Med. 1990;20(1):183–193.
6. McNamee R. Efficiency of two-phase designs for prevalence estimation. Int J Epidemiol. 2003;32(6):1072–1078.
7. Ihorst G, Forster J, Petersen G. The use of imperfect diagnostic tests had an impact on prevalence estimation. J Clin Epidemiol. 2007;60(9):902–910.
8. Torrance-Rynard VL, Walter SD. Effects of dependent errors in the assessment of diagnostic test performance. Stat Med. 1997;16(19):2157–2175.
9. Walter SD, Irwig LM. Estimation of test error rates, disease prevalence, and relative risk from misclassified data: a review. J Clin Epidemiol. 1988;41(9):923–937.
10. Faraone SV, Tsuang MT. Measuring diagnostic accuracy in the absence of a "gold standard." Am J Psychiatry. 1994;151(5):650–657.
11. Johnson WO, Gastwirth JL, Pearson LM. Screening without a "gold standard": the Hui-Walter paradigm revisited. Am J Epidemiol. 2001;153(9):921–924.
12. Harper D. Local dependence latent structure models. Psychometrika. 1972;37(1):53–59.
13. Qu Y, Tan M, Kutner MH. Random effects models in latent class analysis for evaluating accuracy of diagnostic tests. Biometrics. 1996;52(3):797–810.
14. von Davier M, Yamamoto K. Partially observed mixtures of IRT models: an extension of the generalized partial-credit model. Appl Psychol Meas. 2004;28(6):389–406.
15. Muraki E. A generalized partial credit model: application of an EM algorithm. Appl Psychol Meas. 1992;16(2):159–176.
16. Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965;7(4):308–313.
17. Petersen RC, Doody R, Kurz A. Current concepts in mild cognitive impairment. Arch Neurol. 2001;58(12):1985–1992.
18. DeCarli C. Mild cognitive impairment: prevalence, prognosis, aetiology, and treatment. Lancet Neurol. 2003;2(1):15–21.
19.
Folstein MF, Folstein SE, McHugh PR. "Mini-Mental State": a practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12(3):189–198.
20. Delis D, Kramer J, Kaplan E. California Verbal Learning Test. San Antonio, TX: Psychological Corporation; 1987.
21. Smith A. Symbol Digit Modalities Test (SDMT) Manual. Los Angeles, CA: Western Psychological Services; 1982.
22. Hänninen T, Hallikainen M, Tuomainen S. Prevalence of mild cognitive impairment: a population-based study in elderly subjects. Acta Neurol Scand. 2002;106(3):148–154.
23. Reboussin BA, Ip EH, Wolfson M. Locally dependent latent class models with covariates: an application to under-age drinking in the USA. J R Stat Soc (A). 2008;171(4):877–897.

APPENDIX 1

Marginal Likelihood of Random-Effects LCA

To estimate the fixed-effects parameters, $\theta$, we maximize the sum of the marginal log-likelihoods of the fixed-effects parameters over all subjects. For subject $i$, the marginal likelihood is given by

$$
L_i(\theta) = q \int_{-\infty}^{\infty} \prod_{j=1}^{M} \Phi(a_{j1}+b_{j1}t)^{y_{ij}} \{1-\Phi(a_{j1}+b_{j1}t)\}^{1-y_{ij}} \, d\Phi(t)
+ (1-q) \int_{-\infty}^{\infty} \prod_{j=1}^{M} \Phi(a_{j0}+b_{j0}t)^{y_{ij}} \{1-\Phi(a_{j0}+b_{j0}t)\}^{1-y_{ij}} \, d\Phi(t), \tag{A1}
$$

where $q$ denotes the prevalence and $\Phi$ the standard normal distribution function. For computation, the likelihood in equation A1 is often approximated by using Gaussian quadrature. Under model 2, Qu et al. (13) showed that the sensitivity of the $j$th test, $w_j$, is given by the average probability of positive test outcomes in the diseased population,

$$
w_j = \int_{-\infty}^{\infty} \Phi(a_{j1}+b_{j1}t) \, d\Phi(t) = \Phi\left(\frac{a_{j1}}{\sqrt{1+b_{j1}^2}}\right). \tag{A2}
$$

Similarly, the specificity is given by the average probability of negative test outcomes in the nondiseased population,

$$
x_j = 1 - \int_{-\infty}^{\infty} \Phi(a_{j0}+b_{j0}t) \, d\Phi(t) = \Phi\left(\frac{-a_{j0}}{\sqrt{1+b_{j0}^2}}\right). \tag{A3}
$$

The maximum likelihood estimates of $w_j$ and $x_j$ can be computed by substituting $a_{j1}$ and $b_{j1}$ with their maximum likelihood estimates.
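The closed form in equation A2 can be checked numerically against the integral it summarizes. The sketch below is our illustration (the loadings $a$ and $b$ are made-up values, not estimates from the paper); it approximates the integral with a composite Simpson rule rather than the Gaussian quadrature mentioned in the appendix, since either delivers the same answer for this smooth integrand:

```python
# Numerical check of equation A2: the average probability of a positive
# test in the diseased class equals Phi(a / sqrt(1 + b^2)).
import math

def norm_cdf(x: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(t: float) -> float:
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def sensitivity_numeric(a: float, b: float, lo: float = -8.0,
                        hi: float = 8.0, n: int = 2000) -> float:
    """Composite Simpson approximation of integral Phi(a + b t) dPhi(t)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        t = lo + i * h
        coef = 1 if i in (0, n) else (4 if i % 2 == 1 else 2)
        total += coef * norm_cdf(a + b * t) * norm_pdf(t)
    return total * h / 3.0

a, b = 1.2, 0.8  # hypothetical loadings for illustration only
closed_form = norm_cdf(a / math.sqrt(1.0 + b * b))
print(abs(sensitivity_numeric(a, b) - closed_form) < 1e-6)  # True
```

The same check applies to the specificity in equation A3 with $(a_{j0}, b_{j0})$ and a sign flip on the closed-form argument.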
APPENDIX 2

Likelihood Function for 2-Phase Prevalence Studies

The phase I sample is a simple random sample of subjects from the population. Hence, the probability of being selected at phase I is a constant and is equal for all eligible subjects. All subjects in phase I undergo $M$ tests with binary outcomes, so there are $2^M$ different possible outcome classes. Let $n^*_k$, $k = 1, \ldots, 2^M$, denote the number of observations in each outcome class, so $n = \sum_{k=1}^{2^M} n^*_k$. We model $(n^*_1, \ldots, n^*_{2^M})$ as having a multinomial$(n; (\Pr[Z_1;\theta], \ldots, \Pr[Z_{2^M};\theta]))$ distribution, where the $Z_k$'s are the outcome classes defined by a unique combination of phase I test outcomes and $\Pr[Z_k;\theta]$ is calculated under random-effects LCA (model 2). In phase II, given the outcome of phase I, we select $n_k$ subjects independently with probability $\pi_k$ from the $k$th outcome class, $k = 1, \ldots, 2^M$. Thus,

$$n_k \mid \text{phase I} \sim \text{binomial}(n^*_k, \pi_k).$$

Each of the $n_k$ subjects selected at phase II is then given the additional test, $W$. Let $n_{ks}$ denote the number of subjects with outcome $s = 0, 1$ on the test $W$, $k = 1, \ldots, 2^M$. Then,

$$n_{ks} \mid \text{phase I and selection in phase II} \sim \text{binomial}(n_k, \Pr[S = s \mid Z_k; \theta]),$$

where $\Pr[S \mid Z_k;\theta] = \Pr[S, Z_k;\theta] / \Pr[Z_k;\theta]$. It can be shown that the log-likelihood based on data from this 2-phase sampling is given by

$$
\log \ell(\theta, \pi) = \sum_{k=1}^{2^M} (n^*_k - n_k) \log \Pr[Z_k;\theta] + \sum_{k=1}^{2^M} \sum_{s=0}^{1} n_{ks} \log \Pr[S = s, Z_k;\theta]. \tag{A4}
$$

Note that we can use this log-likelihood to estimate $\theta = \hat\theta$ from the pilot data if we know the values of the other quantities ($n_k$, $n^*_k$, $n_{ks}$, $\pi_k$). However, we can also use this log-likelihood to find the "optimal" $\pi$ that will give us the smallest variance of the prevalence estimate by evaluating the log-likelihood at fixed $\theta = \hat\theta$ while varying the vector of sampling fractions, $\pi$. Now, if we have conducted the pilot study and estimated $\theta = \hat\theta$, and phase I of the main study has been carried out, we know the number of subjects $n^*_k$ in each of the phase I outcome classes, but we do not know $n_k$ or $n_{ks}$.
In this case, we replace the unknown $n_k$ by its conditional expectation given phase I, which is $n^*_k \pi_k$, and $n_{ks}$ by its conditional expectation given phase I, which is $n^*_k \pi_k \Pr[S = s, Z_k; \hat\theta] / \Pr[Z_k; \hat\theta]$. If we design the main study before phase I has been carried out, we do not even know the number of subjects $n^*_k$ in each of the phase I outcome classes, so we also replace the unknown $n^*_k$ by its unconditional expectation $n \Pr[Z_k; \hat\theta]$.
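This substitution of expectations is the arithmetic behind the count of screened-negative subjects quoted earlier in the discussion: the expected number assessed at phase II from an outcome class is the phase I sample size times the class probability times the sampling fraction. A minimal sketch, using the numbers transcribed from the worked example and Table 3:

```python
# Expected phase II count for the all-negative stratum, using the
# unconditional expectation n * Pr[Z_k; theta-hat] * pi_k from Appendix 2.
# Inputs are transcribed from the worked example and Table 3.

n_phase1 = 2551            # phase I sample size
pr_all_negative = 0.887    # Pr[Z_k; theta-hat] for the all-negative pattern
sampling_fraction = 0.008  # quasi-optimal pi_k for that stratum

expected_subjects = n_phase1 * pr_all_negative * sampling_fraction
print(round(expected_subjects, 1))  # 18.1
```

The product comes to about 18, which the text reports as roughly 19 screened-negative individuals to be assessed at phase II.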