Sample size determination for estimation of the accuracy of two conditionally independent diagnostic tests

Marios Georgiadis, Faculty of Veterinary Medicine, Aristotle University of Thessaloniki, Greece

- This work was done by:
  - Wes Johnson, University of California, Davis
  - Ian Gardner, University of California, Davis
  - Marios Georgiadis, Aristotle University of Thessaloniki

Hui-Walter model (Biometrics, 1980)

Population 1
                Test 2 +   Test 2 -   Total
  Test 1 +         a          b        a+b
  Test 1 -         c          d        c+d
  Total           a+c        b+d       n1

Population 2
                Test 2 +   Test 2 -   Total
  Test 1 +         e          f        e+f
  Test 1 -         g          h        g+h
  Total           e+g        f+h       n2

Assumptions

- Validity of the assumptions is critical and should be given careful consideration.
- The two populations have different prevalences: π1 ≠ π2.
- Each test has the same sensitivity (Se) and specificity (Sp) in the two populations.
- The tests are conditionally independent (Vacek, 1985).

Sample size estimation using HW

- Data come from 2 tests applied to 2 populations.
- The goal is to estimate Se1, Se2, Sp1, Sp2, π1, π2.
- We want the minimum sample size that achieves a desired level of precision.
- The method provides sample sizes to obtain CI's of a specified maximum width for one or more of the 6 parameters.
- Alternatively, we can specify CI widths for the difference in sensitivities (Se1 − Se2) and specificities (Sp1 − Sp2).
- Spreadsheet 1

HW estimates and CI's

- HW provided closed-form formulas for the ML estimates of the two Se's, the two Sp's and the two prevalences (6 parameters).
- Using these formulas with the data from our two 2x2 tables, we get ML point estimates for the six parameters of interest.
- These point estimates are the point of the 6-dimensional parameter space at which the likelihood function is maximized.
- Spreadsheet 3

HW formulas for the Fisher Information Matrix (FIM)

- Once we get the FIM we can invert it to obtain the estimated variance-covariance matrix.
- The diagonal elements of this matrix are the standard large-sample estimates of the variances of the respective parameter estimates.
- The square roots of the
  diagonals are the usual standard errors (s.e.'s).
- Off-diagonal elements are the corresponding estimated covariances.
- Excel spreadsheet 2
- Once we have the standard errors we can calculate CI's, e.g. for Se1:
  $\widehat{Se}_1 \pm z_{\alpha/2} \cdot \mathrm{s.e.}(\widehat{Se}_1)$
- We need the assumption of asymptotic normality of the ML estimates, which requires large sample sizes.
- Rule of thumb: ML estimate ± 3·s.e. should not cover 0 or 1.
- If the assumption does not hold, we cannot calculate CI's in the usual way.

Estimation of the differences: Se1 − Se2 and Sp1 − Sp2

- An objective of the study might be to compare the sensitivities or the specificities of the tests.
- The point estimate of each difference is the difference of the point estimates.
- The estimated variances of the difference estimates are:
  $\widehat{\mathrm{var}}(\widehat{Se}_1 - \widehat{Se}_2) = \widehat{\mathrm{var}}(\widehat{Se}_1) + \widehat{\mathrm{var}}(\widehat{Se}_2) - 2\,\widehat{\mathrm{cov}}(\widehat{Se}_1, \widehat{Se}_2)$
  $\widehat{\mathrm{var}}(\widehat{Sp}_1 - \widehat{Sp}_2) = \widehat{\mathrm{var}}(\widehat{Sp}_1) + \widehat{\mathrm{var}}(\widehat{Sp}_2) - 2\,\widehat{\mathrm{cov}}(\widehat{Sp}_1, \widehat{Sp}_2)$
- All the necessary estimated variances and covariances can be obtained from the estimated variance-covariance matrix.
- The standard error of the difference is the square root of its estimated variance.
- If the asymptotic normality assumption holds, we can construct CI's as before.

Calculation of sample size

- If the sampling distribution of an estimator $\hat{\mu}$ is approximately normal, then the (1 − α)·100% CI is
  $\hat{\mu} \pm z_{\alpha/2} \cdot \mathrm{s.e.}(\hat{\mu})$, where $\mathrm{s.e.}(\hat{\mu}) = s / \sqrt{N}$
- The width (w) of this CI is
  $w = 2\, z_{\alpha/2} \cdot \mathrm{s.e.}(\hat{\mu}) = 2\, z_{\alpha/2} \cdot s / \sqrt{N}$
- Solving for N, we get:
  $N = (2\, z_{\alpha/2} \cdot s / w)^2$
- To calculate the sample size, N, we need an estimate of s.
- Spreadsheet 1
- A sample size is obtained for each parameter; if the largest one is picked, all the CI widths will be as specified or smaller.
- Estimation of only a subset of parameters might be of interest:
  - prevalence estimates are not usually of interest
  - some performance estimates might be known
- Information on these is used in the spreadsheet, but their CI widths are set arbitrarily large.
- For some combinations of parameter values the diagonals of C and $\widehat{\mathrm{Cov}}$ can be negative.
- This is because these parameter values
  result in a singular information matrix.
- We have to make sure that we do not have negative diagonals or very large pairwise correlation values (close to or beyond 1 or −1).
- Another indication is that the sample sizes become very large.
- In these situations, the usual ML method cannot be used to obtain s.e.'s, and therefore our sample size calculations are not applicable.
- It is a good idea to try several combinations of parameter guesses to make sure you are not near a problematic area of the parameter space.
- The same potential problems and warnings can be found in spreadsheet 2.

Initial parameter guesses

- Guesses of the 6 parameters of interest are necessary.
- Since the sample size calculation depends strongly on these guesses, they have to be realistic.
- Expert opinion (be careful):
  - sensitivity can vary with severity of infection and stage of the disease process
  - sensitivity of a test with experimental samples might be higher than with real field samples
  - specificity can vary according to the geographic distribution of cross-reacting microorganisms
- Best to do a pilot study.
- Calculate sample sizes for a range of possible parameter values.

If you wanted to conduct an evaluation study

- If you want to use the HW model, first make sure that the assumptions hold:
  - tests conditionally independent
  - populations have different prevalences
  - test performance the same in both populations
- Sample size calculations: precision and cost considerations
  - specify up front how much precision we need
- Formulate educated guesses for the parameters of interest (expert opinion and/or pilot study).
- Use spreadsheet 1 to get sample sizes.
- Check whether the large-sample approximation is reasonable by calculating the initial estimate/guess ± 3·s.e.
  to determine whether the interval obtained includes 0 or 1.
- If it does, the sample is likely not large enough to justify large-sample normality.
- During the calculation process, monitor the diagonals of matrix C and the pairwise correlations, and be careful about the "singular information matrix" problem.
- Conduct the study.
- Insert the raw data into spreadsheet 3 to get parameter estimates.
- Use the parameter estimates in spreadsheet 2 to get standard errors.
- If large-sample theory holds, we can calculate CI's for the parameters of interest.
- Again, monitor the information matrix diagonals and pairwise correlations.

Dependent tests

- If the tests are conditionally dependent, we can still use the HW setup, but we will need different methods to analyze the results.
- Since there are no sample-size calculation methods for such tests, we can still use our method, knowing that we will probably need larger sample sizes to obtain comparable precision.
- The calculated sizes can be used as an absolute minimum.

HW data example
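
To illustrate the sample size calculation described in these slides, here is a minimal Python sketch. It is not the authors' spreadsheets and uses no study data. It rests on several assumptions that are not in the slides: the same number of subjects n is sampled from each population, the information matrix is computed numerically from the multinomial likelihood of the two 2x2 tables rather than from the closed-form HW formulas, and the function names (`n_for_width`, `cell_probs`, `unit_information`, `sample_sizes`) and the parameter guesses are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def n_for_width(s, w, alpha=0.05):
    """N = (2 * z_{alpha/2} * s / w)^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z * s / w) ** 2)


def cell_probs(se1, se2, sp1, sp2, prev):
    """Cell probabilities (++, +-, -+, --) of one 2x2 table,
    assuming the two tests are conditionally independent."""
    return np.array([
        prev * se1 * se2 + (1 - prev) * (1 - sp1) * (1 - sp2),
        prev * se1 * (1 - se2) + (1 - prev) * (1 - sp1) * sp2,
        prev * (1 - se1) * se2 + (1 - prev) * sp1 * (1 - sp2),
        prev * (1 - se1) * (1 - se2) + (1 - prev) * sp1 * sp2,
    ])


def unit_information(theta, h=1e-6):
    """Fisher information for one subject drawn from EACH population.
    theta = (Se1, Se2, Sp1, Sp2, pi1, pi2). Equal allocation is an
    assumption of this sketch, not something stated in the slides."""
    theta = np.asarray(theta, dtype=float)
    info = np.zeros((6, 6))
    for prev_index in (4, 5):  # population 1 uses pi1, population 2 uses pi2
        def probs(t):
            return cell_probs(t[0], t[1], t[2], t[3], t[prev_index])

        # Central differences; the cell probabilities are linear in each
        # parameter, so this is accurate up to rounding error.
        jac = np.zeros((4, 6))
        for j in range(6):
            step = np.zeros(6)
            step[j] = h
            jac[:, j] = (probs(theta + step) - probs(theta - step)) / (2 * h)
        p = probs(theta)
        # Multinomial information: sum_k (dp_k/di)(dp_k/dj) / p_k
        info += jac.T @ (jac / p[:, None])
    return info


def sample_sizes(theta, widths, alpha=0.05):
    """Per-population n needed so each parameter's CI is at most `widths[i]`."""
    c = np.linalg.inv(unit_information(theta))
    diag = np.diag(c)
    if np.any(diag <= 0):
        raise ValueError("non-positive variance: information matrix near singular")
    return [n_for_width(math.sqrt(v), w, alpha) for v, w in zip(diag, widths)]


# Hypothetical guesses for Se1, Se2, Sp1, Sp2, pi1, pi2 (illustration only).
guesses = (0.90, 0.80, 0.95, 0.90, 0.10, 0.50)
# Requested CI widths; prevalences are set arbitrarily wide (not of interest).
widths = (0.10, 0.10, 0.10, 0.10, 1e6, 1e6)
n_each = sample_sizes(guesses, widths)
print("n per parameter:", n_each)
print("n per population to satisfy all widths:", max(n_each))
```

Taking the maximum over the per-parameter sample sizes mirrors the slide's remark that the largest sample size makes every CI as narrow as requested or narrower. Setting π1 equal to π2 in `guesses` makes the information matrix singular, which is the problem the slides warn about; checking `np.linalg.cond(unit_information(...))` for a few nearby guesses is one way to see whether you are close to that region.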