Appendix (Supplemental file)

Nonparametric estimation of percentiles

The observations are ordered (ranked) according to size, and the pth percentile is derived as the value of the observation with rank number N(p/100) + 0.5 (22). If this is a non-integer value, linear interpolation is carried out. In clinical chemistry, one usually sees the formula (p/100)(N + 1) applied, e.g. in the IFCC recommendation on nonparametric reference interval estimation (23). However, a theoretical treatment of nonparametric percentile estimation shows that the detailed, optimal percentile computation procedure depends on the type of distribution being considered (24). For a Gaussian distribution, the expression (p/100)(N + 0.2) + 0.4 has been derived theoretically (24). For the 95-percentile, this expression yields 0.95N + 0.59, which is close to 0.95N + 0.5. In simulations, the formula N(p/100) + 0.5 has been shown to perform slightly better than the theoretically derived formula over a relevant range of N and has thus been selected here (10). Further, the root mean squared error (RMSE) of this estimate is lower than that of the estimate based on the formula (p/100)(N + 1):

RMSEs of percentile estimation procedures for the standard Gaussian distribution with negative values given as zeros.

Sample size   RMSE, 0.95N + 0.5   RMSE, 0.95(N + 1)
10            0.5984              -
25            0.3821              0.4789
50            0.2960              0.3120
100           0.2044              0.2198
200           0.1466              0.1518

Standard errors of percentiles

For nonparametric percentile estimation, it is possible to derive confidence limits for the estimated percentile on the basis of the ranked observations, as displayed in Table 1 (9). For specified, theoretical distributions, a general, approximate expression for the standard error of a nonparametrically estimated percentile corresponding to the percentage p is:

SE_npar = [(p/100)(1 - p/100)/(d^2 x N)]^0.5

where d is the density of the distribution at the percentile and N is the sample size (25). For Gaussian distributions, we have for the 5- and 95-percentiles:

SE_npar = [0.95(1 - 0.95)/((0.10311^2/σ^2) x N)]^0.5 = 2.114 x σ/N^0.5

This relation also holds true if the Gaussian distribution is truncated at zero (negative values assigned zero), as may be the case for the distribution of blank measurements (σ here refers to the original distribution before truncation). The standard error of a parametrically estimated percentile given a Gaussian distribution is:

SE_par = σ x [1/N + z^2/(2N)]^0.5

where z is the standard Gaussian deviate for the given percentile. For the 5- and 95-percentiles the relation is:

SE_par = σ x [1/N + 1.645^2/(2N)]^0.5 = 1.534 x σ/N^0.5

The asymptotic efficiency of the nonparametric percentile estimation procedure is therefore (1.534/2.114)^2 = 0.527. Thus, if a parametrically estimated 95-percentile is based on 100 observations, 190 observations are required by the nonparametric procedure to provide the same precision of the estimate.

Estimation of SD_S

The estimated standard deviation, SD, is distributed according to the χ-distribution (chi-distribution) with f degrees of freedom. The mean value of SD, E[SD], is σ x sqrt(2/f) x Γ(f/2 + 1/2)/Γ(f/2), where Γ(x) is the gamma function. For f even, Γ(f/2) = (f/2 - 1) x (f/2 - 2) x ... x 1; for f odd, Γ(f/2) = (f/2 - 1) x (f/2 - 2) x ... x (3/2) x (1/2) x sqrt(π). The expansion of the mean value of SD_S, σ x (1 - 1/(4f) + 1/(32f^2) + ...), can be used to obtain an unbiased estimate of σ if the number of degrees of freedom f is not too small (26).
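As an illustration only (not part of the original appendix), the following minimal Python sketch implements the percentile rule N(p/100) + 0.5 with linear interpolation and the gamma-function bias factor E[SD]/σ described above; the function names are ours.

```python
# Minimal sketch (standard library only) of two quantities described above:
# the nonparametric percentile estimate with rank N*(p/100) + 0.5 and linear
# interpolation, and the chi-distribution bias factor E[SD]/sigma.
from math import lgamma, exp, sqrt

def percentile_np(values, p):
    """Nonparametric pth percentile: rank N*(p/100) + 0.5, linear interpolation."""
    x = sorted(values)
    n = len(x)
    rank = n * (p / 100.0) + 0.5          # 1-based rank
    lo = int(rank)                         # integer part of the rank
    frac = rank - lo
    if lo < 1:
        return x[0]
    if lo >= n:
        return x[-1]
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])

def sd_bias_factor(f):
    """E[SD]/sigma = sqrt(2/f) * Gamma(f/2 + 1/2) / Gamma(f/2), f degrees of freedom."""
    return sqrt(2.0 / f) * exp(lgamma(f / 2.0 + 0.5) - lgamma(f / 2.0))

if __name__ == "__main__":
    f = 49
    exact = sd_bias_factor(f)
    series = 1.0 - 1.0 / (4 * f) + 1.0 / (32 * f ** 2)   # expansion quoted above
    print(exact, series)                                  # both close to 0.9949 for f = 49
```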
The confidence interval for SD can easily be obtained from the confidence interval for the estimate of the variance, SD^2, which is distributed according to the χ^2-distribution and tabulated in most statistics textbooks: f x SD^2/σ^2 ~ χ^2(f), where f is the degrees of freedom. So, the two-sided 95%-confidence interval for the standard deviation SD is derived from the relation:

Pr(χ^2_2.5%(f) < f x SD^2/σ^2 < χ^2_97.5%(f)) = 95%;  SD x sqrt(f/χ^2_97.5%(f)) < σ < SD x sqrt(f/χ^2_2.5%(f))

The upper limit of the one-sided 95%-confidence interval is SD x sqrt(f/χ^2_5%(f)). For example, for f = 49 (N = 50), we have χ^2_2.5%(49) = 31.6 and χ^2_97.5%(49) = 70.2. Then the two-sided 95%-confidence interval is (0.84 x SD; 1.25 x SD) because sqrt(49/70.2) = 0.84 and sqrt(49/31.6) = 1.25. The upper limit of the one-sided 95%-confidence interval is 1.20 (= sqrt(49/33.93)) times the observed SD value. The normal approximation to the χ^2(f)-distribution gives that SD is approximately normally distributed with variance 0.5 x σ^2/f (25).

When SD_S is estimated from repeated measurements of several (K) samples, a pooled estimate is formed from the weighted average of the variances (squared SDs) as:

SD_S^2 = (n_1 x SD_1^2 + n_2 x SD_2^2 + n_3 x SD_3^2 + ... + n_K x SD_K^2)/(n_1 + n_2 + n_3 + ... + n_K)

where n_j (n_j = N_j - 1) refers to the degrees of freedom for sample number j (sample number j is assayed N_j times).

Optimal relation between N_B and N_S

The more formal reasoning for using equal numbers of blank and sample measurements is as follows. The estimate of LoD is:

LoD_EST = LoB_EST + 1.645/(1 - 1/(4 x f)) x SD_S.

An estimate of LoB, i.e. an estimate of the 95-percentile of the distribution of blank measurements, has an approximately normal distribution with variance = 0.218^2/(d^2 x N_B), where d is the density function at the 95th percentile of the blank measurements, and N_B is the number of blank measurements. An estimate of the standard deviation of sample measurements, SD_S, has an approximately normal distribution with variance = 0.5 x σ_S^2/f, where σ_S^2 is the variance of the sample measurements and f is the degrees of freedom (if N_S is the total number of sample measurements with K samples, then f = N_S - K). Then LoD_EST is approximately normally distributed with the variance

0.218^2/(d^2 x N_B) + 1.645^2 x (1 - 1/(4 x f))^-2 x 0.5 x σ_S^2/(N_S - K)     (1)

Now let us investigate the behaviour of this variance for a different allocation of the number of measurements. Let N_B be some portion δ of N_Total (N_B = δ x N_Total), where N_Total = N_B + (N_S - K). Then expression (1) can be written as

(σ_B^2/N_Total) x ( k_1^2/δ + k_2^2/[(1 - δ)(1 - 1/(4(1 - δ) x N_Total))^2] )

where k_1 = 0.218/(d x σ_B) and k_2 = 1.645 x 0.707 x σ_S/σ_B (for the Gaussian blank distribution truncated at zero, k_1 = 2.114; see below). The function

k_1^2/δ + k_2^2/[(1 - δ)(1 - 1/(4(1 - δ) x N_Total))^2]

has its minimum at the value of δ satisfying the cubic equation

k_1^2/δ^2 = k_2^2 x (1 - δ + 1/(4 x N_Total))/(1 - δ - 1/(4 x N_Total))^3.

If N_Total is not too small, the optimal value of δ is close to k_1/(k_1 + k_2), with a minimum value of the variance of (k_1 + k_2)^2 x σ_B^2/N_Total. If the numbers of blank and sample measurements are about equal (N_B = N_S - K = 0.5 x N_Total), then the variance of the LoD estimate is 2(k_1^2 + k_2^2) x σ_B^2/N_Total. The use of equal numbers of blank and sample measurements instead of the optimal allocation thus increases the variance of the LoD estimate by the factor

2(k_1^2 + k_2^2)/(k_1 + k_2)^2 = 2 - 4 x k_1 x k_2/(k_1 + k_2)^2 < 2.

So, for any distribution of the blank measurements, the increase of the standard error is not more than 41% (sqrt(2) ≈ 1.41).
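As a numerical check (not part of the original appendix), the optimal split δ ≈ k_1/(k_1 + k_2) and the SE penalty of an equal split can be evaluated with a short sketch like the one below, assuming k_1 = 0.218/(d x σ_B) and k_2 = 1.645 x 0.707 x σ_S/σ_B as above; it reproduces, up to rounding, the percentages given in the tables that follow.

```python
# Illustrative sketch: optimal fraction of blank measurements and the SE penalty
# of an equal split, using k1 = 0.218/(d*sigma_B) and k2 = 1.645*0.707*sigma_S/sigma_B.
from math import sqrt

def allocation(k1, k2):
    delta_opt = k1 / (k1 + k2)                       # optimal N_B / N_Total (large-N limit)
    var_opt = (k1 + k2) ** 2                         # minimum variance x N_Total / sigma_B^2
    var_equal = 2 * (k1 ** 2 + k2 ** 2)              # variance for N_B = N_S - K
    se_increase = sqrt(var_equal / var_opt) - 1.0    # relative increase in SE
    return delta_opt, se_increase

for ratio in (1.0, 1.5, 2.0, 2.5, 3.0, 5.0):         # ratio = sigma_S / sigma_B
    k1 = 2.114                                       # Gaussian blanks truncated at zero
    k2 = 1.645 * 0.707 * ratio
    d_opt, inc = allocation(k1, k2)
    print(f"{ratio}: blanks {d_opt:.1%}, SE increase for equal split {inc:.1%}")
```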
For many practical cases, an approximately equal number of blank and sample measurements would be optimal. For example, if the distribution of the blank measurements is the Gaussian distribution truncated at zero, then d = 0.103/σ_B, and the optimal allocation for N_B and N_S - K is

N_B/(N_S - K) = [0.218/(σ_B x 0.103/σ_B)] / [1.645 x 0.707 x σ_S/σ_B] = 2.114/(1.163 x σ_S/σ_B)

For example, for σ_S/σ_B = 1.5, the optimal allocation is 55% of N_Total for blank measurements and 45% for sample measurements. For σ_S/σ_B = 2.0, the optimal allocation is 47% and 53%, respectively.

σ_S/σ_B   Optimal allocation of N_B and N_S - K   % Increase in SE of LoD estimate for equal allocation of N_B and N_S - K
1.0       65%, 35%                                4.1%
1.5       55%, 45%                                0.5%
2.0       47%, 53%                                0.1%
2.5       42%, 58%                                1.2%
3.0       38%, 62%                                3.0%
5.0       27%, 73%                                10.4%

If the distribution of the blank measurements is the double exponential distribution with density (1/(σ_B x sqrt(2))) x exp(-sqrt(2) x |x|/σ_B) truncated at zero, then d = 0.071/σ_B, and the optimal allocation for N_B and N_S - K is

N_B/(N_S - K) = [0.218/(σ_B x 0.0707/σ_B)] / [1.645 x 0.707 x σ_S/σ_B] = 3.083/(1.163 x σ_S/σ_B)

σ_S/σ_B   Optimal allocation of N_B and N_S - K   % Increase in SE of LoD estimate for equal allocation of N_B and N_S - K
1.0       73%, 27%                                9.7%
1.5       64%, 36%                                3.8%
2.0       57%, 43%                                1.0%
3.0       47%, 53%                                0.2%
5.0       35%, 65%                                4.6%

So, an approximately equal number of blank and sample measurements would probably be optimal in many practical cases.

Confidence interval for LoD

Below, it is demonstrated how an approximate upper 95%-CI limit of LoD can be derived from the sum of the upper CI limits of LoB_EST and 1.645/(1 - 1/(4 x N)) x SD_S. It is assumed here that N_B = N_S = N. From the section above, using the normal approximations of LoB_EST and SD_S and the Taylor expansion (1 - 1/(4 x N))^-2 = 1 + 2/(4 x N) + 3/(4 x N)^2 + ..., the variance of LoD (expression (1)) is approximately

(2.114)^2 x σ_B^2/N + (1.645)^2 x 0.5 x σ_S^2/N.

Then the upper limit of the 95%-CI for LoD_EST is

1.96 x sqrt((2.114)^2 x σ_B^2/N + (1.645)^2/(1 - 1/(4 x N))^2 x 0.5 x σ_S^2/N) = 2.114 x σ_B/sqrt(N) x 1.96 x sqrt(1 + 0.303 x σ_S^2/σ_B^2)     (a)

Let us construct the X%-CI for LoB_EST: Z_X x 2.114 x σ_B/sqrt(N), and the X%-CI for SD_S: Z_X x 0.707 x σ_S/sqrt(N). Consider the sum of these upper limits:

Upper_LoB + 1.645 x Upper_SDS = Z_X x 2.114 x σ_B/sqrt(N) x (1 + 0.550 x σ_S/σ_B)     (b)

From relations (a) and (b) we can see that Z_X should be selected so that 1.96 x sqrt(1 + 0.303 x σ_S^2/σ_B^2) = Z_X x (1 + 0.550 x σ_S/σ_B), from which we get

Z_X = 1.96 x sqrt(1 + 0.303 x σ_S^2/σ_B^2)/(1 + 0.550 x σ_S/σ_B).

From this expression, we can see that Z_X depends on the ratio σ_S/σ_B. We investigated the function sqrt(1 + t^2)/(1 + t). This function has a minimum at t = 1. For t > 1, it increases with a limiting value of 1. This means that for small values of σ_S/σ_B (such as 1 or 1.5), Z_X differs more from 1.96 than for large values of σ_S/σ_B. For the range of σ_S/σ_B from 1 to 5 (higher ratios are unlikely to be encountered), X should be taken as 80%-85%. A simulation study for the sample size range 50-300 confirmed that X = 80% provided an acceptable approximation (see Results section).
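As a numerical illustration (not part of the original appendix), the Z_X expression above can be tabulated over the relevant range of σ_S/σ_B; the values stay well below 1.96, consistent with choosing a confidence level in the 80%-85% range rather than 95%.

```python
# Illustrative sketch: Z_X = 1.96*sqrt(1 + 0.303*t^2)/(1 + 0.550*t), t = sigma_S/sigma_B.
from math import sqrt

def z_x(t):
    return 1.96 * sqrt(1.0 + 0.303 * t * t) / (1.0 + 0.550 * t)

for t in (1.0, 1.5, 2.0, 3.0, 5.0):
    print(f"sigma_S/sigma_B = {t}: Z_X = {z_x(t):.2f}")
# Z_X stays roughly between 1.39 and 1.53 over this range, i.e. well below 1.96.
```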
95%-confidence intervals for full and partial verification

In the partial verification procedure, LoB is a constant (e.g. supplied by the manufacturer), and accordingly the confidence interval for the proportion of sample results exceeding LoB can be derived from the binomial distribution. We are here concerned with the question of whether at least (100% - β) of sample results exceed LoB. Therefore, we apply a one-sided 95%-CI indicating the smallest observed proportion that is in agreement with a true proportion of 100% - β. For β = 5%, these proportions are displayed in Table 3.

Concerning a full verification, we are estimating the probability Pr(Y > LoB) using the events Y_i > LoB_EST, where the latter is a random variable contributing additional variability. For N = 20-70, the probability was evaluated by simulation (100 000 replications) for the situations σ_S/σ_B = 1, 1.5, 2, and 5 according to the model used above. For example, for N_B = N_S = N = 34 and σ_S/σ_B = 1, we found that agreement with the hypothesis of β = 5% was assured if at least 28 of the 34 sample results exceeded LoB_EST. For σ_S/σ_B = 1.5 and 2, the corresponding minimum number was 29, increasing to 30 for σ_S/σ_B = 5. For σ_S/σ_B approaching infinity, the limit was 30, which is identical to the number required by the partial verification procedure. In Table 3, we have displayed the minimum numbers converted to proportions for an extended sample size range based on 10 000 replications. As a compromise between the various possibilities of σ_S/σ_B, we selected the value 1.5. Nevertheless, the variation with the σ_S/σ_B value is limited.
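For the partial verification case, where LoB is fixed, the minimum count can be obtained directly from the binomial distribution. The sketch below (illustrative only, standard library, function names are ours) finds the smallest count k such that observing k out of N sample results above LoB is still consistent, at the one-sided 5% level, with a true proportion of 95%; for N = 34 it returns 30, matching the limiting value quoted above.

```python
# Illustrative sketch: smallest number k of sample results exceeding LoB (out of N)
# that is still consistent, at the one-sided 5% level, with a true proportion of 95%.
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def min_consistent_count(n, p=0.95, alpha=0.05):
    """Smallest observed count k with P(X <= k) > alpha under Binomial(n, p)."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) > alpha:
            return k
    return n

print(min_consistent_count(34))   # -> 30
```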