Appendix - Clinical Chemistry

Appendix (Supplemental file)
Nonparametric estimation of percentiles
The observations are ordered (ranked) according to size, and the pth percentile is
derived as the value of the observation with rank number N(p/100) + 0.5 (22). If this is
a non-integer value, linear interpolation is carried out. In clinical chemistry, one
usually sees the formula (p/100)(N + 1) applied, e.g. in the IFCC recommendation on
nonparametric reference interval estimation (23). However, a theoretical treatment of
nonparametric percentile estimation shows that the detailed, optimal percentile
computation procedure depends on the type of distribution being considered (24).
For a Gaussian distribution, the expression (p/100)(N + 0.2) + 0.4 has been derived
theoretically (24). For the 95-percentile, this expression yields 0.95N + 0.59, which is close
to 0.95N + 0.5. In simulations, the formula N(p/100) + 0.5 has been shown to perform
slightly better than the theoretically derived formula over a relevant range of N and has
thus been selected here (10). Further, the root mean squared error (RMSE) of this
estimate is lower than that of the estimate based on the formula (p/100)(N + 1):
RMSEs of percentile estimation procedures for the standard Gaussian distribution
with negative values given as zeros.
Sample size    RMSE, 0.95N + 0.5    RMSE, 0.95(N + 1)
10             0.5984               -
25             0.3821               0.4789
50             0.2960               0.3120
100            0.2044               0.2198
200            0.1466               0.1518
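As an illustration, the rank-based rule with linear interpolation can be written out as follows (a minimal sketch in Python; the function name and the use of NumPy are choices made here, not part of the recommendation):

```python
import numpy as np

def percentile_rank_half(x, p):
    """Estimate the p-th percentile using rank N*(p/100) + 0.5
    with linear interpolation between adjacent order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    rank = n * (p / 100.0) + 0.5        # 1-based rank
    lo = int(np.floor(rank))
    frac = rank - lo
    lo = min(max(lo, 1), n)             # clamp to the available ranks
    hi = min(lo + 1, n)
    return (1 - frac) * x[lo - 1] + frac * x[hi - 1]

# Example: 95-percentile of 50 blank results (negative values set to zero)
rng = np.random.default_rng(1)
blanks = np.clip(rng.normal(0.0, 1.0, 50), 0.0, None)
print(percentile_rank_half(blanks, 95))  # rank 48.0, i.e. the 48th order statistic
```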
Standard errors of percentiles
For nonparametric percentile estimation, it is possible to derive confidence limits for
the estimated percentile on the basis of the ranked observations as displayed in
Table 1 (9). For specified, theoretical distributions, a general, approximate
expression for the standard error of a nonparametrically estimated percentile
corresponding to the percentage p can be given as:
SE_npar = [(p/100)(1 − p/100)/(d²N)]^0.5
where d is the density of the distribution at the percentile and N is the sample size
(25). For Gaussian distributions, we have for the 5- and 95-percentiles:
SE_npar = [0.95(1 − 0.95)/((0.10311²/σ²)·N)]^0.5 = 2.114·σ/N^0.5
This relation also holds true if the Gaussian distribution is truncated at zero (negative
values assigned zero) as may be the case for the distribution of blank measurements
(σ would here refer to the original distribution before truncation).
The standard error of a parametrically estimated percentile given a Gaussian
distribution is:
SE_par = σ·[1/N + z²/(2N)]^0.5
where z is the standard Gaussian deviate for the given percentile. For the 5- and 95-percentiles the relation is:
SE_par = σ·[1/N + 1.645²/(2N)]^0.5 = 1.534·σ/N^0.5
The asymptotic efficiency of the nonparametric percentile estimation procedure is:
(1.534/2.114)² = 0.527.
Thus, if a parametrically estimated 95-percentile is based on 100 observations, 190
observations are required by the nonparametric procedure to provide the same
precision of the estimate.
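The two expressions and the resulting efficiency can be checked numerically (a small sketch using the constants derived above):

```python
import math

N, sigma, z95 = 100, 1.0, 1.645
d = math.exp(-z95**2 / 2) / math.sqrt(2 * math.pi)     # standard Gaussian density at the 95-percentile (~0.1031)

se_npar = sigma * math.sqrt(0.95 * 0.05 / (d**2 * N))  # ~2.114*sigma/sqrt(N)
se_par = sigma * math.sqrt(1 / N + z95**2 / (2 * N))   # ~1.534*sigma/sqrt(N)
print(se_npar, se_par, (se_par / se_npar) ** 2)        # efficiency ~0.53
```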
Estimation of SD_S
The estimated standard deviation, SD, follows a (scaled) χ-distribution (chi-distribution)
with f degrees of freedom: SD·sqrt(f)/σ ~ χ(f). The mean value of SD, E[SD], is σ·sqrt(2/f)·Γ(f/2 + 1/2)/Γ(f/2),
where Γ(x) is the gamma function. For f even, Γ(f/2) = (f/2 − 1)·(f/2 − 2)·...·1; for f odd,
Γ(f/2) = (f/2 − 1)·(f/2 − 2)·...·(3/2)·(1/2)·sqrt(π). The
expansion for the mean value of SD_S, σ·(1 − 1/(4f) + 1/(32f²) + ...), can be used
to obtain an unbiased estimate of σ if the number of degrees of freedom f is not too small (26).
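The exact factor E[SD]/σ and the series expansion can be compared numerically (a sketch; SciPy's log-gamma is used for numerical stability):

```python
import numpy as np
from scipy.special import gammaln

def e_sd_factor(f):
    """Exact E[SD]/sigma for f degrees of freedom (chi-distribution)."""
    return np.sqrt(2.0 / f) * np.exp(gammaln(f / 2 + 0.5) - gammaln(f / 2))

for f in (4, 9, 24, 49):
    exact = e_sd_factor(f)
    series = 1 - 1 / (4 * f) + 1 / (32 * f**2)
    print(f, round(float(exact), 5), round(series, 5))
# An (approximately) unbiased estimate of sigma is SD divided by this factor.
```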
The confidence interval for the SD can easily be obtained from the confidence
interval for the estimate of the variance, SD², which is distributed according to the χ²-distribution
and tabulated in most statistics textbooks: f·SD²/σ² ~ χ²(f), where f is
the degrees of freedom. So, the two-sided 95%-confidence interval for the standard
deviation SD is derived from the relation:
Pr(χ²_2.5%(f) < f·SD²/σ² < χ²_97.5%(f)) = 95%;
SD·sqrt(f/χ²_97.5%(f)) < σ < SD·sqrt(f/χ²_2.5%(f))
The upper limit of the one-sided 95%-confidence interval is SD·sqrt(f/χ²_5%(f)).
For example, for f = 49 (N = 50), we have χ²_2.5%(49) = 31.6 and χ²_97.5%(49) = 70.2.
Then the two-sided 95%-confidence interval is (0.84·SD; 1.25·SD) because
sqrt(49/70.2) = 0.84 and sqrt(49/31.6) = 1.25. The upper limit of the one-sided 95%-confidence
interval is 1.20 (= sqrt(49/33.93)) times the observed SD value.
A normal approximation to the χ²(f) distribution shows that SD is approximately
normally distributed with variance 0.5·σ²/f (25).
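The worked example for f = 49 can be reproduced with chi-square quantiles from SciPy (a sketch):

```python
from scipy.stats import chi2

f = 49                                   # degrees of freedom (N = 50)
lo, hi = chi2.ppf([0.025, 0.975], f)     # 31.55 and 70.22
print(round((f / hi) ** 0.5, 2), round((f / lo) ** 0.5, 2))  # 0.84 and 1.25 (factors on SD)
print(round((f / chi2.ppf(0.05, f)) ** 0.5, 2))              # 1.20, one-sided upper factor
```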
When SD_S is estimated from repeated measurements of several (K) samples, a
pooled estimate is formed as a weighted average of the variances (squared SDs):
SD_S² = (n1·SD1² + n2·SD2² + n3·SD3² + ... + nK·SDK²)/(n1 + n2 + n3 + ... + nK)
where n_j (n_j = N_j − 1) is the number of degrees of freedom for sample number j
(sample number j is assayed N_j times).
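A direct translation of the pooling formula (a sketch; the replicate values shown are hypothetical):

```python
import numpy as np

# Hypothetical replicate measurements of K = 3 low-level samples
samples = [np.array([0.8, 1.1, 0.9, 1.2]),
           np.array([1.5, 1.3, 1.7]),
           np.array([2.1, 1.9, 2.3, 2.0, 2.2])]

dfs = np.array([len(s) - 1 for s in samples])            # n_j = N_j - 1
variances = np.array([np.var(s, ddof=1) for s in samples])
sd_s = np.sqrt(np.sum(dfs * variances) / np.sum(dfs))    # pooled SD_S
f = int(np.sum(dfs))                                     # total degrees of freedom (N_S - K)
print(sd_s, f)
```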
Optimal relation between N_B and N_S
The more formal reasoning behind using equal numbers of blank and sample measurements
is as follows. The estimate of LoD is: LoD_EST = LoB_EST + 1.645/(1 − 1/(4·f))·SD_S. An
estimate of LoB, i.e. an estimate of the 95-percentile of the distribution of blank
measurements, has an approximately normal distribution with variance = (0.218)²/(d²·N_B),
where d is the density function at the 95th percentile of the blank measurements,
and N_B is the number of blank measurements. An estimate of the standard deviation
of sample measurements, SD_S, has an approximately normal distribution with variance =
0.5·σ_S²/f, where σ_S² is the variance of the sample measurements, and f is the
degrees of freedom (if N_S is the total number of sample measurements with K
samples, then f = N_S − K).
Then LoD_EST is approximately normally distributed with the variance
(0.218)²/(d²·N_B) + (1.645)²·(1 − 1/(4·f))^(-2)·0.5·σ_S²/(N_S − K)    (1)
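Combining the nonparametric LoB estimate with the bias-corrected SD_S gives the LoD point estimate; a sketch under these definitions (the helper name is illustrative):

```python
import numpy as np

def lod_estimate(blanks, sd_s, f):
    """LoD_EST = LoB_EST + 1.645/(1 - 1/(4f)) * SD_S, with LoB_EST the
    nonparametric 95-percentile of the blanks (rank N*0.95 + 0.5 rule)."""
    x = np.sort(np.asarray(blanks, dtype=float))
    n = len(x)
    rank = n * 0.95 + 0.5
    lo = int(np.floor(rank))
    frac = rank - lo
    hi = min(lo + 1, n)
    lob_est = (1 - frac) * x[lo - 1] + frac * x[hi - 1]
    return lob_est + 1.645 / (1 - 1 / (4 * f)) * sd_s
```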
Now let us investigate the behaviour of the variance in expression (1) for a different allocation of the
numbers of measurements. Let N_B be some portion δ of N_Total (N_B = δ·N_Total),
where N_Total = N_B + (N_S − K). Then expression (1) becomes

(σ_B²/N_Total)·[ (2.114)²/δ + (1.645)²·0.5·(σ_S²/σ_B²) / ((1 − δ)·(1 − 1/(4·(1 − δ)·N_Total))²) ].

Writing k1 = 0.218/(d·σ_B) (equal to 2.114 for blank measurements with a Gaussian distribution truncated at zero) and k2 = 1.645·0.707·σ_S/σ_B, the function

k1²/δ + k2²/((1 − δ)·(1 − 1/(4·(1 − δ)·N_Total))²)

has a minimum value at the δ satisfying the cubic equation

k1²/δ² = k2²·(1 − δ + 1/(4·N_Total))/(1 − δ − 1/(4·N_Total))³.

If N_Total is not too small, the optimal value of δ is close to k1/(k1 + k2), with the minimum value of the variance
being (k1 + k2)²·σ_B²/N_Total. If the numbers of blank and sample measurements
are about equal (N_B = N_S − K = 0.5·N_Total), then the variance of the LoD estimate
is 2·(k1² + k2²)·σ_B²/N_Total. The use of equal numbers of blank and sample
measurements instead of the optimal allocation thus increases the standard
error of the LoD estimate by the factor

sqrt(2·(k1² + k2²))/(k1 + k2) = sqrt(2 − 4·k1·k2/(k1 + k2)²) < sqrt(2).

So, for any distribution of the blank measurements, the increase of the standard error is not more
than 41%.
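The bound on the penalty for using equal numbers can be checked directly (a sketch; k1 and k2 as defined above):

```python
import numpy as np

def se_inflation(k1, k2):
    """SE of LoD_EST with equal allocation divided by the SE at the optimum."""
    return np.sqrt(2 * (k1**2 + k2**2)) / (k1 + k2)

# The factor approaches sqrt(2) ~ 1.414 only when k1 and k2 are very different
for r in (1, 2, 5, 20, 100):                  # r = k1/k2
    print(r, round(float(se_inflation(r, 1.0)), 3))
```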
For many practical cases, an about equal number of blank and sample
measurements would be optimal. For example, if the distribution of the blank
measurements is the Gaussian distribution truncated at zero, then d = 0.103/σ_B, and
the optimal allocation for N_B and N_S − K will be

N_B/(N_S − K) = [0.218/(σ_B·0.103/σ_B)] / [1.645·0.707·σ_S/σ_B] = 2.114/(1.163·σ_S/σ_B)

For example, for σ_S/σ_B = 1.5, the optimal allocation will be 55% of N_Total for blank
measurements and 45% for sample measurements. For σ_S/σ_B = 2.0, the optimal
allocation will be 47% and 53%, respectively.
σ_S/σ_B   Optimal allocation of N_B and N_S − K   % increase in SE of LoD estimate for equal allocation of N_B and N_S − K
1.0       65%, 35%                                4.1%
1.5       55%, 45%                                0.5%
2.0       47%, 53%                                0.1%
2.5       42%, 58%                                1.2%
3.0       38%, 62%                                3.0%
5.0       27%, 73%                                10.4%
If the distribution of the blank measurements is the double exponential distribution

(1/(σ_B·sqrt(2)))·exp(−sqrt(2)·|x|/σ_B)

truncated at zero, then d = 0.0707/σ_B and the optimal allocation for N_B and N_S − K will be

N_B/(N_S − K) = [0.218/(σ_B·0.0707/σ_B)] / [1.645·0.707·σ_S/σ_B] = 3.083/(1.163·σ_S/σ_B)
σ_S/σ_B   Optimal allocation of N_B and N_S − K   % increase in SE of LoD estimate for equal allocation of N_B and N_S − K
1.0       73%, 27%                                9.7%
1.5       64%, 36%                                3.8%
2.0       57%, 43%                                1.0%
3.0       47%, 53%                                0.2%
5.0       35%, 65%                                4.6%
So, approximately equal numbers of blank and sample measurements would probably be
optimal in many practical cases.
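Both allocation tables can be reproduced approximately from the asymptotic optimum δ ≈ k1/(k1 + k2), ignoring the small-sample correction in the cubic equation above (a sketch):

```python
import numpy as np

def allocation(d_sigma_b, ratio):
    """Optimal blank fraction and the SE penalty for equal allocation.
    d_sigma_b: density at the 95-percentile times sigma_B (0.103 for the
    truncated Gaussian, 0.0707 for the truncated double exponential);
    ratio: sigma_S / sigma_B."""
    k1 = 0.218 / d_sigma_b
    k2 = 1.645 * 0.707 * ratio
    delta = k1 / (k1 + k2)                              # optimal fraction of blanks
    penalty = np.sqrt(2 * (k1**2 + k2**2)) / (k1 + k2) - 1
    return delta, penalty

for d_sigma_b, label in ((0.103, "Gaussian"), (0.0707, "double exponential")):
    for r in (1.0, 1.5, 2.0, 3.0, 5.0):
        delta, pen = allocation(d_sigma_b, r)
        print(f"{label:20s} {r:3.1f}  {delta:4.0%}  {pen:5.1%}")
```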
Confidence interval for LoD
Below, it is demonstrated how an approximate upper 95%-CI limit of LoD can be
derived from the sum of the upper CI limits of LoB_EST and 1.645/(1 − 1/(4·N))·SD_S. It is
assumed here that N_B = N_S = N. From the section above, using the normal
approximations for LoB_EST and SD_S and the Taylor expansion
(1 − 1/(4·N))^(-2) = 1 + 2/(4·N) + 3/(4·N)² + ..., the variance of LoD_EST is approximately

(2.114)²·σ_B²/N + (1.645)²·0.5·σ_S²/N.
Then the upper limit of the 95%-CI for LoD_EST lies above the estimate by

1.96·sqrt((2.114)²·σ_B²/N + (1.645)²/(1 − 1/(4·N))²·0.5·σ_S²/N)
≈ 2.114·σ_B/sqrt(N)·1.96·sqrt(1 + 0.303·σ_S²/σ_B²)    (a)

Let us construct the X%-CI for LoB_EST, with upper margin Z_X·2.114·σ_B/sqrt(N),
and the X%-CI for SD_S, with upper margin Z_X·0.707·σ_S/sqrt(N).
Consider the sum of these margins:

Upper_LoB + 1.645·Upper_SDS = Z_X·2.114·σ_B/sqrt(N)·(1 + 0.550·σ_S/σ_B)    (b)

From relations (a) and (b) we can see that Z_X should be selected so that

1.96·sqrt(1 + 0.303·σ_S²/σ_B²) = Z_X·(1 + 0.550·σ_S/σ_B),

from which we get

Z_X = 1.96·sqrt(1 + 0.303·σ_S²/σ_B²)/(1 + 0.550·σ_S/σ_B).
From this expression, we can see that Z_X depends on the ratio σ_S/σ_B. We
investigated the function sqrt(1 + t²)/(1 + t). This function has a minimum at t = 1; for
t > 1 it increases towards a limiting value of 1. This means that for small
values of σ_S/σ_B (such as 1 or 1.5) Z_X differs more from 1.96 than for large values
of σ_S/σ_B. For the range of σ_S/σ_B from 1 to 5 (higher ratios are unlikely to be
encountered), X should be taken as 80%-85%. A simulation study for the sample size
range 50-300 confirmed that X = 80% provided an acceptable approximation (see
Results section).
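The dependence of Z_X on the ratio σ_S/σ_B can be inspected directly (a sketch):

```python
import numpy as np

def z_x(t):
    """Z_X as a function of t = sigma_S / sigma_B."""
    return 1.96 * np.sqrt(1 + 0.303 * t**2) / (1 + 0.550 * t)

for t in (1.0, 1.5, 2.0, 3.0, 5.0):
    print(t, round(float(z_x(t)), 3))
# Z_X stays well below 1.96 over this range, which is why a CI level X of
# about 80-85% (rather than 95%) is applied to the two components.
```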
95%-confidence intervals for full and partial verification
In the partial verification procedure, LoB is a constant (e.g. supplied by the
manufacturer), and accordingly the confidence interval for the proportion of sample
results exceeding LoB can be derived from the binomial distribution. We are here
concerned with the question of whether at least (100 − β)% of sample results exceed
LoB. Therefore, we apply a one-sided 95%-CI indicating the smallest observed
proportion that is in agreement with a true proportion of (100 − β)%. For β = 5%,
these proportions are displayed in Table 3.
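For a given N, the smallest acceptable number of sample results above LoB follows from the binomial distribution (a sketch for β = 5%; this is the kind of computation behind Table 3, though the exact convention used there is not restated here):

```python
from scipy.stats import binom

def min_exceeding(n, beta=0.05, alpha=0.05):
    """Smallest count k out of n above LoB that is still consistent, at
    one-sided level alpha, with a true exceedance proportion of 1 - beta."""
    for k in range(n + 1):
        if binom.cdf(k, n, 1 - beta) >= alpha:
            return k

for n in (20, 34, 50, 60):
    k = min_exceeding(n)
    print(n, k, f"{k / n:.3f}")
# For n = 34 this gives 30, the same limiting value as quoted for the full
# verification below when sigma_S/sigma_B approaches infinity.
```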
Concerning a full verification, we are estimating the probability Pr(Y > LoB) using the
events Y_i > LoB_EST, where the latter is a random variable contributing additional
variability. For N = 20-70, the probability was evaluated by simulation (100 000
replications) for the situations σ_S/σ_B = 1, 1.5, 2, and 5 according to the model used
above. For example, for N_B = N_S = N = 34 and σ_S/σ_B = 1, we found that agreement
with the hypothesis of β = 5% was assured when at least 28 sample results exceeded LoB_EST. For σ_S/σ_B
= 1.5 and 2, the corresponding number was 29, increasing to 30 for σ_S/σ_B
= 5. For σ_S/σ_B approaching infinity, the limit was 30, which is identical to the
number for the partial verification procedure. In Table 3, we have displayed the
minimum numbers, converted to proportions, for an extended sample size range
based on 10 000 replications. As a compromise among the various possibilities for σ_S/σ_B,
we selected the value 1.5. Nevertheless, the variation with the σ_S/σ_B value is limited.
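A minimal version of such a simulation (a sketch; it assumes Gaussian blanks truncated at zero and Gaussian sample results placed so that 95% exceed the true LoB, and it uses NumPy's default quantile rule rather than the N(p/100) + 0.5 rule, so results may differ slightly from those quoted above):

```python
import numpy as np

rng = np.random.default_rng(0)

def exceedance_distribution(n, ratio, reps=20_000):
    """Distribution of the number of sample results above LoB_EST under the
    assumed model (truncated-Gaussian blanks; samples with 95% above true LoB)."""
    sigma_b, sigma_s = 1.0, ratio
    lob_true = 1.645 * sigma_b                   # 95-percentile of the truncated blank distribution
    counts = np.zeros(n + 1)
    for _ in range(reps):
        blanks = np.clip(rng.normal(0.0, sigma_b, n), 0.0, None)
        lob_est = np.quantile(blanks, 0.95)      # nonparametric LoB estimate
        y = rng.normal(lob_true + 1.645 * sigma_s, sigma_s, n)
        counts[int(np.sum(y > lob_est))] += 1
    return counts / reps

probs = exceedance_distribution(34, 1.0)
tail = np.cumsum(probs[::-1])[::-1]              # tail[k] = P(count >= k)
threshold = int(np.nonzero(tail >= 0.95)[0].max())
print(threshold)                                 # expected to be close to the value 28 reported above
```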