Atypicality Indices as Reference Values for Laboratory Data

ADELIN ALBERT, PH.D.
Albert, Adelin: Atypicality indices as reference values for laboratory data. Am J Clin Pathol 76: 421-425, 1981. Several methods have been suggested for relating laboratory results to relevant reference values; none has been completely satisfactory. The method described in this paper uses atypicality indices for uniform reporting and feasible interpretation of laboratory data. The index associated with a measurement is defined as the probability of finding a result closer to the mean of the reference population than the one actually observed. Thus the larger the index, the less likely the measured value would be in the reference population and the more atypical it becomes. The proposed method, unlike any other, keeps the same simple and desirable features when extended to multivariate situations, that is, when several tests are to be interpreted simultaneously. (Key words: Atypicality indices; Reference values; Laboratory data; Multivariate interpretation.)
LABORATORY MEASUREMENTS provide meaningful information only when related to relevant
reference values. The classical approach of presenting
the original observed value together with a reference
interval, within which measurements are compatible
with the physiologic equilibrium of the organism and
outside of which they are likely to be pathologic, is not
satisfactory for many reasons. The method can be considered as a rule of thumb, depending upon whether or
not the observed value falls outside the fixed interval.
The convention that the interval should contain a 0.95
fraction of the reference population is purely arbitrary
and probably originates from the 0.95 confidence level
commonly adopted in statistical practice. The extension of the notion of univariate reference intervals to
that of a multivariate reference region, 10,21 when several
tests are to be interpreted simultaneously, leads to the
same defects and poses the difficult problem of reporting. But one of the most fundamental criticisms of the
method is theoretical in the sense that one seems to
erect a strict barrier between physiologic values and
out-of-range results.
Various methods 11 have been proposed for transforming an observed value in such a way that the result is more easily interpretable for the clinician. These transformations are to be regarded only as tools to express the relationship between the observed value and the reference values. For instance, the observed value can be related to the location 16 of the reference distribution, or to its dispersion 4 or to both. 5,8,12,17 Several authors 6,8,20 also suggested it may be related to the shape of the reference distribution by reporting the corresponding percentile. Obviously, none of the proposed methods is ideal or includes all desired information about a measured value, but they all embrace part of it.
Laboratory of Applied Studies, Division of Computer Research and Technology, National Institutes of Health, Bethesda, Maryland
Received December 2, 1980; accepted for publication March 23, 1981.
Supported in part by NIH grant No. 1 F05 TW02964-01.
Address reprint requests to Dr. Albert: Division of Computer Research and Technology, National Institutes of Health, Building 12A, Room 2041, Bethesda, Maryland 20205.
This paper describes the use of atypicality indices for
uniform reporting and feasible interpretation of laboratory data. The notion of an atypicality index was recently introduced in discriminant analysis 1 partly
for explaining classification errors, but the proposal
that it might be useful clinically is new. The method
assumes that the reference distribution is normal or
gaussian. Though the validity of this assumption is frequently disputed, 7 in our experience most laboratory
test distributions can reasonably be normalized using
an appropriate mathematical transformation. 13,14
Our approach is close to the percentile method in the
sense that it affords the clinician an easily interpretable
measure of how unusual the observed value would be in
the reference population. However, it offers the additional advantage of being applicable to both single and multiple test results. Hence it is appropriate for assessing the degree to which a combination of tests resembles those from the reference population. 2 The comparison of univariate and multivariate atypicality indices is particularly helpful in detecting highly abnormal patterns of test results that would otherwise have been considered as "normal."
Materials and Methods
Univariate Index of Atypicality
Let P denote the reference population and x a normally distributed biologic constituent with mean μ and standard deviation σ. The index of atypicality, I(x0), associated with an observed measurement x = x0, is defined as the probability of observing a value of the constituent closer to the mean μ of the reference population than the value x0 actually observed: in other words, if
$$d^2(x) = \left(\frac{x - \mu}{\sigma}\right)^2 \qquad (1)$$
denotes the squared standardized distance between x and μ, the required index is given by the equation
$$I(x_0) = \Pr\{d^2(x) < d^2(x_0)\}, \qquad (2)$$
and is represented by the shaded area on Fig. 1.
FIG. 1. Graphical representation (shaded area) of the atypicality index I(x0) associated with an observed value x = x0, assuming that variable x is normal N(μ, σ²). Points marked on the axis: the observed value of x and the symmetrical value of x0.
0002-9173/81/0010/0421 $00.75 © American Society of Clinical Pathologists
Defined as a probability, the index ranges from 0 to 1; a value close to zero corresponds to an observation close to the mean of the reference population, whereas a value close to one applies to an observation far away from the mean. Thus the larger the index, the less chance the measurement would have to belong to the reference population and the more atypical it becomes.
From the definition above, symmetrical observations around the mean have the same index. Thus to indicate the direction of displacement of x0 from μ, a sign can be added to the calculated index. Note that, if p(x0) denotes the percentile associated with the observed value x0, then I(x0) = 2p(x0) - 1 if x0 > μ, and I(x0) = 1 - 2p(x0) if x0 < μ, indicating a close relationship between the two methods, at least in the univariate case.
Multivariate Extension
The chief merit of the proposed method is that it extends straightforwardly from univariate to multivariate
situations.
Let X = (x1, x2, . . . , xp) be a vector or "profile" of p biologic constituents whose joint distribution is assumed to be multinormal 3 with mean vector X̄ = (x̄1, x̄2, . . . , x̄p) and dispersion matrix S.
The atypicality index, I(X0), associated with an observed profile X = X0, is defined as the probability of observing a profile X closer to the mean of the reference population than the profile X0 actually observed, taking into consideration the correlations existing between the p constituents. This probability is given by the equation
$$I(X_0) = \Pr\{D^2(X) < D^2(X_0)\}, \qquad (3)$$
where
$$D^2(X) = (X - \bar{X})' S^{-1} (X - \bar{X}) \qquad (4)$$
represents the Mahalanobis 18 generalized distance between X and X̄. In the multivariate case, it is obviously impossible to give the index a sign. To calculate equation 3, we note that from standard normal theory, 3 the criterion
$$F = \frac{n(n - p)}{p(n^2 - 1)} D^2(X_0), \qquad (5)$$
where n is the number of individuals from the reference sample, is distributed as a Snedecor-F variable with p and n - p degrees of freedom. Moreover, the F-distribution is related to the incomplete beta function B(x; α, β), for which an optimized computer algorithm 19 has been recently published. Hence, we can write
$$I(X_0) = B\left(H^2(X_0); \frac{p}{2}, \frac{n - p}{2}\right), \qquad (6)$$
where
$$H^2(X_0) = \frac{D^2(X_0)}{D^2(X_0) + \dfrac{n^2 - 1}{n}}. \qquad (7)$$
Grams and coworkers 10 also envisaged equation 3 and they suggested a χ² approximation on p degrees of freedom. However, it can be shown that this approximation is not very satisfactory when the ratio n/p is lower than ten. Therefore in practice equation 6 is usually preferable, as it also takes into account the reference sample variation.
Example
Table 1. Mean Values, SD, Covariances and Correlations of Blood Urea, Uric Acid and Creatinine in the Reference Sample of 284 Individuals
                                    Covariances and Correlations
Test                   Mean ± SD    Urea      Uric Acid   Creatinine
Urea (mmol/L)          5.1 ± 1.1    1.14      0.2291      0.3930
Uric acid (μmol/L)     303 ± 61     14.93     3724.3      0.5410
Creatinine (μmol/L)    85 ± 14      6.02      473.3       205.5
Variances are on the diagonal, covariances below it, and correlations above it.
Table 2. Use of Atypicality Indices to Relate Blood Urea, Uric Acid and Creatinine Concentrations* to the Reference Sample of 284 Individuals
Test                   Result   Reference Interval   Univariate Atypicality Index
Urea (mmol/L)          4.8      2.9-7.3              0.23-
Uric acid (μmol/L)     411      181-425              0.92+
Creatinine (μmol/L)    62       57-113               0.90-
Multivariate atypicality index = 0.99.
* Measured in a 50-year-old woman.
Blood urea (mmol/L), uric acid (μmol/L), and creatinine (μmol/L) were recorded in a reference sample of
284 healthy individuals. Means, standard deviations,
covariances and correlations are given in Table 1. All
estimated correlations are significant (P < 0.01), confirming the fact that the three tests do not vary completely independently.
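As a check on Table 1, the correlations can be recomputed from the tabulated variances and covariances. The following is a minimal sketch in Python; the function names are illustrative, and since the paper does not state which significance test was used, the usual t statistic for a correlation coefficient is assumed here, compared with an approximate two-sided 1% critical value of 2.6 for n - 2 = 282 degrees of freedom.

```python
import math

# Variances (diagonal) and covariances from Table 1, in the order
# urea, uric acid, creatinine; n is the size of the reference sample.
n = 284
names = ["urea", "uric acid", "creatinine"]
cov = [[1.14, 14.93, 6.02],
       [14.93, 3724.3, 473.3],
       [6.02, 473.3, 205.5]]


def correlation(cov, i, j):
    """Correlation coefficient derived from a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


def t_statistic(r, n):
    """t statistic for testing a zero correlation (assumed test, n - 2 df)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


T_CRITICAL = 2.6  # approximate two-sided 1% point for 282 df (assumption)

for i in range(3):
    for j in range(i + 1, 3):
        r = correlation(cov, i, j)
        t = t_statistic(r, n)
        print(f"{names[i]} vs {names[j]}: r = {r:.4f}, t = {t:.2f}, "
              f"significant at 1%: {t > T_CRITICAL}")
```

Small differences in the last decimal from the tabulated correlations are to be expected, because the covariances in Table 1 are themselves rounded.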
The data in Table 1 make it possible to calculate univariate and
multivariate atypicality indices for any observed combination of urea, uric acid, and creatinine values.
The results displayed in Table 2 were observed in a
woman, aged fifty, undergoing a detailed laboratory
check-up. They not only show the use of the method
but also provide an actual example of the paradoxical
situations15 that can arise when comparing univariate
and multivariate atypicality indices. It can be seen that,
although all univariate indices are lower than 0.95,
meaning that each single test's concentration lies within
its 0.95 reference interval, the multivariate index indicates that the pattern of results is highly abnormal. A
closer examination reveals that the observed uric acid
concentration is much too high as compared with those
of urea and creatinine, at least as expected from the correlations above. Such atypical patterns of results are
not unlikely. They can be observed every day and certainly do deserve more careful attention.
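The indices of Table 2 can be approximated from the rounded summary statistics of Table 1. The following is a minimal sketch in Python, not the FORTRAN subroutine referred to in this paper. The function names (reg_inc_beta, solve, univariate_index, multivariate_index) are illustrative assumptions; the univariate index is computed from the normal distribution with μ and σ taken as known, and the incomplete beta function of equation 6 is evaluated by a standard continued fraction rather than by the published algorithm. 19 Because Table 1 is rounded, the results may differ from Table 2 in the second decimal.

```python
import math


def reg_inc_beta(x, a, b, eps=1e-12, max_iter=300):
    """Regularized incomplete beta function I_x(a, b), continued-fraction form."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        # Symmetry relation: the continued fraction converges faster there.
        return 1.0 - reg_inc_beta(1.0 - x, b, a, eps, max_iter)
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction (modified Lentz).
        for num in (m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x
                    / ((a + 2 * m) * (a + 2 * m + 1.0))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_front) * h / a


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[r][k] -= factor * aug[col][k]
    y = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][k] * y[k] for k in range(r + 1, size))
        y[r] = (aug[r][size] - tail) / aug[r][r]
    return y


def univariate_index(x0, mean, sd):
    """Equation 2 for a normal variable, with the sign of the displacement."""
    z = (x0 - mean) / sd
    return math.erf(abs(z) / math.sqrt(2.0)), ("+" if z > 0 else "-")


def multivariate_index(profile, mean, cov, n):
    """Equations 4, 7 and 6: returns the index and the distance D2."""
    p = len(profile)
    diff = [x - m for x, m in zip(profile, mean)]
    d2 = sum(d * y for d, y in zip(diff, solve(cov, diff)))
    h2 = d2 / (d2 + (n * n - 1.0) / n)
    return reg_inc_beta(h2, p / 2.0, (n - p) / 2.0), d2


# Rounded values from Table 1 (reference sample) and Table 2 (patient).
tests = ["urea", "uric acid", "creatinine"]
mean = [5.1, 303.0, 85.0]
cov = [[1.14, 14.93, 6.02],
       [14.93, 3724.3, 473.3],
       [6.02, 473.3, 205.5]]
n = 284
profile = [4.8, 411.0, 62.0]

univariate = [univariate_index(x, m, math.sqrt(cov[i][i]))
              for i, (x, m) in enumerate(zip(profile, mean))]
for name, (index, sign) in zip(tests, univariate):
    print(f"{name}: {index:.2f}{sign}")
multi, d2 = multivariate_index(profile, mean, cov, n)
print(f"D2 = {d2:.2f}, multivariate index = {multi:.2f}")
```

In this sketch each univariate index stays below 0.95 while the multivariate index does not, which is the paradoxical pattern described above.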
The multivariate atypicality indices were calculated
for all 284 subjects of the reference sample: their
distribution is shown in Fig. 2 (solid lines) and looks
quite uniform as expected from theory. To further test
the effectiveness of the method in daily laboratory practice, a new random sample of 290 patients was drawn
from the general hospital population, for which urea,
uric acid, and creatinine were requested together. The
distribution of the multivariate indices for that sample
(see Fig. 2, dotted lines) was obviously quite different
from that of the reference set. Index values greater
than 0.995 were found in 99 cases (34%); thus 66% of
the patients' indices ranged between 0.00 and 0.995.
Actually about 35% of the indices were found in the
grey zone, i.e., between 0.80 and 0.99, precisely where
the clinician frequently needs interpretative help.
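The uniform distribution expected for the reference sample can be illustrated by simulation. The sketch below is a simplified Python illustration, not a reanalysis of the data of this paper. It assumes p = 2 standardized constituents with known means and a known correlation (0.54, chosen arbitrarily for the illustration), in which case D² follows a χ² distribution with 2 degrees of freedom and the index reduces to 1 - exp(-D²/2). The function names and the number of draws are likewise assumptions.

```python
import math
import random


def bivariate_index(x, y, rho):
    """Atypicality index of a standardized pair (x, y) with known correlation."""
    d2 = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)
    return 1.0 - math.exp(-d2 / 2.0)  # chi-square cdf with 2 degrees of freedom


def simulate_frequencies(n_draws, rho, seed=0):
    """Relative frequency of the index in ten classes of width 0.1."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n_draws):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        # Build a correlated standard normal pair from two independent draws.
        x = u
        y = rho * u + math.sqrt(1.0 - rho * rho) * v
        index = bivariate_index(x, y, rho)
        counts[min(int(index * 10), 9)] += 1
    return [count / n_draws for count in counts]


freqs = simulate_frequencies(20000, rho=0.54)
for k, freq in enumerate(freqs):
    print(f"{k / 10:.1f}-{(k + 1) / 10:.1f}: {freq:.3f}")
```

Each class should hold close to one tenth of the simulated reference profiles; a sample drawn from a different population, such as the hospital sample, would instead pile up near 1.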
FIG. 2. Frequency distribution of multivariate atypicality indices associated with blood urea, uric acid, and creatinine values in the reference sample (n = 284) and in the hospital population sample (n = 290). Horizontal axis: multivariate index of atypicality; legend: training sample, hospital sample.
Discussion
Considering the bulk of data that the clinician has to glance through and interpret every day, it is reasonable to think that his work will be simplified if the laboratory provides clear output and standardized reporting. Most frequently interpretation is based on reference intervals supplied by the laboratory. Unfortunately the intervals may vary from one laboratory to another or may be altered by a change of method, instrument, or measurement units. Instead of being transparent to the clinician, all these changes may considerably affect his habits and decisions.
It is to overcome these difficulties that several methods for a more uniform output of laboratory data have
been proposed. Among these the SD units system 12
has received considerable recognition among clinicians. Nevertheless SD units are not always satisfactory. For instance, when dealing with enzymatic tests,
where reference values are relatively low, but extremely high activities can be observed, SD units are
nearly as large as the observed results and may therefore be misleading.
The percentile 8,20 provides a handy, compact expression that is easy to calculate for single test results, but in practice the method can hardly be applied to profiles, that
is, to multivariate problems. 2
Atypicality indices are proposed to report laboratory
data, for they offer many advantages.
The index is a number between 0 and 1, dimensionless. It is defined as a probability and affords the
clinician an easily interpretable measure of how un-
usual results would be in the reference population.
Clinicians can give different interpretations to identical indices depending on the clinical context. The clinician can set himself the critical index that would prompt
him to an action.
When well established profiles or groups of correlated tests are envisaged, many authors 2,10,21 have
claimed that a multivariate interpretation of the results
should be preferred to the classical univariate approach. In this context, the method of atypicality
index is particularly suitable for it is applicable irrespective of the number of results to be interpreted
simultaneously. Moreover the use of a single index to
summarize a group of p correlated tests is particularly
helpful in detecting abnormalities that would otherwise
have escaped notice. These abnormalities could even
be used by the laboratory chief as a further consistency
check before reporting the results.
Final Practical Considerations
In practice, the calculation of atypicality indices requires the use of a computer, at least as soon as two
or more constituents are considered simultaneously.
It is suggested that the index be reported with two
decimals only, in order to keep the method as simple
as possible. This implies that all index scores greater
than 0.995 will be printed 1.00, thus pointing out highly
abnormal results at once.
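The reporting convention suggested above can be sketched as follows. This is a minimal Python illustration; the function name format_index and the way the sign is passed are assumptions, not part of the method as published.

```python
def format_index(index, sign=""):
    """Format an atypicality index with two decimals and an optional sign."""
    if not 0.0 <= index <= 1.0:
        raise ValueError("an atypicality index must lie between 0 and 1")
    if sign not in ("", "+", "-"):
        raise ValueError("sign must be '', '+' or '-'")
    # Two-decimal rounding prints every index greater than 0.995 as 1.00.
    return f"{index:.2f}{sign}"


print(format_index(0.9233, "+"))  # univariate index, value above the mean
print(format_index(0.9962))       # multivariate index, reported without sign
```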
The index is based on normality assumptions that
have often been criticized as much too restrictive.
However, the distribution of most biologic constituents
can reasonably be normalized 14 using a simple mathematical transformation, 13 the advantage being that only
two parameters (μ and σ) are needed for single variables and (p² + 3p)/2 parameters (X̄ and S) when p
variables are envisaged simultaneously. Nonparametric methods require full storage of the reference
sample and in some circumstances this may be difficult
to manage.
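The parameter count quoted above follows from adding the p means to the p(p + 1)/2 distinct elements of the symmetric dispersion matrix. A short worked check, using p = 3 as in the urea, uric acid, and creatinine example:

```latex
\[
p + \frac{p(p+1)}{2} = \frac{2p + p^2 + p}{2} = \frac{p^2 + 3p}{2},
\qquad
p = 3:\quad 3 + 6 = \frac{9 + 9}{2} = 9 .
\]
```

For p = 1 the same expression gives 2, that is, μ and σ.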
A computer subroutine incorporating the calculation
of both univariate and multivariate indices of atypicality has been written in FORTRAN and may be
obtained on request.
References
1. Aitchison J, Habbema JDF, Kay JW: A critical comparison of two methods of statistical discrimination. Appl Statist 26:15-25, 1977
2. Albert A, Heusghem C: Relating measured values to reference values. The multivariate approach. Reference values in laboratory medicine. Edited by R Grasbeck, T Alstrom. John Wiley and Sons, 1981, pp 289-296
3. Anderson TW: An introduction to multivariate statistical analysis. New York, John Wiley and Sons, Inc., 1958, pp 11-43
4. Casey AE, Downey E: Further use of statens in the recording, analysis, and retrieval of automated computerized laboratory and clinical data. Am J Clin Pathol 53:748-754, 1970
5. Delwaide PA, Buret J, Albert A: Le concept de valeur normale en chimie clinique. Rev Med Liege 27:694-709, 1972
6. Elveback LR, Taylor WF: Statistical methods of estimating percentiles. Ann NY Acad Sci 161:538-548, 1969
7. Elveback LR, Guillier CR, Keating FR: Health, normality, and the ghost of Gauss. J Am Med Ass 211:69-75, 1970
8. Feinstein AR: Clinical biostatistics. XXVII. The derangements of the normal range. Clin Pharmacol Ther 15:528-540, 1974
9. Gabrieli ER: Enhancing the meaning of clinical laboratory data. CRC Crit Rev Clin Lab Sci 1:65-85, 1970
10. Grams RR, Johnson EA, Benson ES: Laboratory data analysis system. III. Multivariate normality. Am J Clin Pathol 58:188-200, 1972
11. Grasbeck R, Dybkaer R, Winkel P: Relating values to reference values. Ann Biol Clin 36:193-194, 1978
12. Gullick HD, Schauble MK: SD unit system for standardizing reporting of laboratory data. Am J Clin Pathol 57:517-525, 1972
13. Harris EK, DeMets DL: Estimation of normal ranges and cumulative proportions by transforming observed distributions to Gaussian form. Clin Chem 18:605-612, 1972
14. Healy MJR: Normal values from a statistical viewpoint. Bull Acad Roy Med Belg 9:703-718, 1969
15. Healy MJR: Rao's paradox concerning multivariate tests of significance. Biometrics 25:411-413, 1969
16. Lederer WH, Gerstbrein HL: Expressing results of enzyme assays. Clin Chem 20:916-917, 1974
17. Lo JS, Kellen JA: A proposal for a more uniform output in laboratory data. Clin Chim Acta 41:239-245, 1972
18. Mahalanobis PC: On the generalized distance in statistics. Proc Nat Inst Sci (India) 12:49, 1936
19. Majumder KL, Bhattacharjee GP: The incomplete beta integral. Appl Statist 22:409-411, 1973
20. Rossing RG, Hatcher WE: Percentiles as reference values for laboratory data. Am J Clin Pathol 72:94-97, 1979
21. Winkel P, Lyngbye J, Jorgensen K: The normal region. A multivariate problem. Scand J Clin Lab Invest 30:339-344, 1972