University of Groningen
Computer aided identification of toxicologically relevant substances by means of
multiple analytical methods
Hartstra, Jan
Document Version
Publisher's PDF, also known as Version of record
Publication date:
1997
Citation for published version (APA):
Hartstra, J. (1997). Computer aided identification of toxicologically relevant substances by means of multiple analytical methods. Groningen: s.n.
Chapter 2
On the theory of substance identification and screening in toxicological analysis
2.1 Introduction
Although this thesis is restricted to the identification of chemical substances, an attempt is made here to give a general definition of the concept of identification.
2.1.1 A general definition of identification
A way in which science tries to deal with the enormous number of objects and phenomena in the universe is to put them in order. This is accomplished by classifying the objects and phenomena, that is, objects and phenomena with similar properties are brought together in well-defined classes. Consequently, a classification system is formed, which can be used to determine the class to which an unknown object or phenomenon belongs. This process is called determination [38].
If a classification system becomes so sophisticated that each class contains a single object or phenomenon, we speak of identification. Thus, identification of objects is based on comparing the properties of an unknown object to the properties of reference objects (i.e. objects whose identity is known). If the properties of an unknown object adequately match those of a single reference object, we have identified that unknown object. If the properties of an unknown object do not match the properties of any known reference object, one must be dealing with a "new" object (i.e. an object that was never encountered previously). Usually, such an object is labeled (i.e. it is given a name) and given its proper place in the classification system.
2.1.2 Identification of chemical substances
Each chemical substance has a unique molecular, or ionic, structure. Thus, for chemical substances, the identity is directly related to the molecular structure of that substance. Substances can also be identified by one or more names, which can be regarded as references to the molecular structure. If the molecular structure could be measured directly, identification would be simple. However, the molecular structure of a substance is a latent concept that cannot be measured directly, but only indirectly by measuring its physico-chemical properties [39]. These properties depend on the molecular structure. So, a physico-chemical feature of a chemical substance can be regarded as a function of the molecular structure and therefore as directly related to the identity of the substance:

q = f(x)    (2.1)
In other words, the feature (q) depends on the identity of a substance (x). The physico-chemical properties can be measured using appropriate techniques and accompanying methods. Analytical chemistry furnishes a multitude of techniques and methods to measure specific properties of chemical substances. According to Mandel [40], the purpose behind most physical and chemical measurements is to characterize a particular material or physical system with respect to a given feature.
2.2 Mathematical concepts and philosophy of the identification process
A sample may contain one or more substances whose identities are unknown. A
need to reveal the identities of these unknown substances makes identication, and
thus the analysis of the sample, necessary.
2.2.1 The Identification Process
The identification process is based on a simple principle: compare the properties measured for the unknown substances to the properties measured for all possible substances.
Let SU = {u1, u2, ..., uk, ..., ul} be the set of l relevant substances present in the sample, whose identities are unknown and need to be identified. Often, this set will simply be denoted as the unknown compounds.

Given the sample, there is a set of substances that may be present in the sample. In STA this set is obviously quite large. Let SX = {x1, x2, ..., xi, ..., xN} be this set of possible substances, which we will denote as the a priori set of candidates. Prior to the identification process (i.e. before analysis), this set of a priori candidates is the same for each unknown substance.

The aim of the identification process is to reduce the a priori set of candidates to a single candidate, or at least only a few candidates, for each of the l unknowns. So after analysis, there are l separate a posteriori sets of candidates Ck (k = 1, ..., l), where each Ck is a subset of SX (i.e. Ck ⊆ SX). These sets will be used throughout the thesis as the basis for the description of the identification process.
2.2.2 Identification Methods
Measurement of the properties of both the unknown substances and the substances in the set of a priori candidates plays a central role in the identification process, as do the analytical methods used to measure these properties. An analytical method used for identification purposes will be denoted as an identification method.
Measurement of a feature involves an interaction of the substance with either matter or energy. For instance, spectroscopic measurements involve interaction
of chemical substances with electromagnetic radiation.
Measurement of the feature of a substance yields a signal¹. This signal, yi, can be regarded as a function of the feature qi and therefore, due to Equation (2.1), as a function of the identity of the chemical substance xi, that is

yi = g(qi) = g(f(xi))    (2.2)
So, if a particular measurement method yields a discrete signal for each substance, a set SY = {y1, y2, ..., yj, ..., yM} comprising the M possible discrete signals can be defined. Then, the signals can be regarded as a discrete function with domain SX and range SY. Ignoring the fact that the measurements are subject to errors (in other words, the signal is purely deterministic), we distinguish two possible situations:
1. for each substance there is a distinct signal, and
2. for some substances the signal is equal.
Figure 2.1 represents situation (1). In this situation the number of signals equals the number of possible candidates.

[Figure 2.1: Input/output graph of an ideal method — each substance x1, ..., x5 maps to its own distinct signal y1, ..., y5.]

A measurement method yielding such a situation is an ideal identification method and will allow unambiguous identification of all substances in SX. The method is specific for each substance xi.
Figure 2.2 represents situation (2). Some substances yield an identical signal. For measurement methods yielding such situations, unambiguous identification of all substances is impossible. The method is selective. Note, however, that for the situation depicted in Figure 2.2, unambiguous identification of substance x2 is still possible. In other words, for individual substances such a measurement method may still enable unambiguous identification. The method is specific for substance x2.
¹An analytical signal, as it emerges from the analytical process, actually has a position y, carrying information on the kind of analyte, and an intensity z, carrying information on its amount [41]. Here, only the so-called position y is used.
[Figure 2.2: Input/output graph of a selective method — the substances x1, ..., x5 map onto only three signals y1, y2, y3, so that some substances share a signal.]
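The ideal/selective distinction above can be sketched as a check on the substance-to-signal mapping. This is a minimal illustration under the deterministic (error-free) assumption made in the text; the substance and signal labels are hypothetical, not taken from the thesis.

```python
# Sketch: a method is "specific" for a substance when no other substance
# yields the same signal; it is "ideal" when this holds for every substance.

def specific_substances(signal_of: dict) -> set:
    """Return the substances whose signal is shared by no other substance."""
    counts = {}
    for signal in signal_of.values():
        counts[signal] = counts.get(signal, 0) + 1
    return {x for x, y in signal_of.items() if counts[y] == 1}

# Situation (1): every substance gives a distinct signal -> ideal method.
ideal = {"x1": "y1", "x2": "y2", "x3": "y3"}

# Situation (2): x1 and x3 share a signal -> merely selective method,
# but still specific for x2.
selective = {"x1": "y1", "x2": "y2", "x3": "y1"}

print(specific_substances(ideal))       # every substance identifiable
print(specific_substances(selective))   # only x2 identifiable
```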
Analytical instruments and methods, and thus identification methods, can be classified according to the type of signal they yield [42]. An instrument generating a single data point (i.e. a scalar signal) per sample is a zero-order instrument. A scalar is a zero-order tensor. An instrument generating multiple data points at one time or for one sample is a first-order instrument. Multiple data points can be represented as a vector. A vector is a first-order tensor. An instrument generating a matrix of data points is a second-order instrument. A matrix is a second-order tensor.
A melting point is a scalar signal, so the melting point determination apparatus is a zero-order instrument. A spectrum consists of multiple intensities measured at different frequencies, and can thus be represented by a vector. So, a scanning spectrometer is a first-order instrument. A chromatogram consists of multiple intensities at different points in time, so a chromatograph is also a first-order instrument. A spectro-chromatogram consists of multiple intensities measured at different frequencies, all measured at different points in time. Spectro-chromatograms are produced by hyphenated instruments such as GC-MS and HPLC-DAD. These hyphenated instruments are therefore second-order instruments.
2.2.3 Identification Parameters
The signal produced by the analytical instrument often needs to be translated into relevant information. For instance, the analytical signal obtained from a sample may contain information on several substances. Often, only a part of the signal is due to the substances in question. The art is to extract that particular part of the signal which comprises the features for a particular substance. This process is called feature extraction. For example, in chromatography a single substance corresponds to a band in the chromatogram known as a peak (the chromatogram is the signal, the peak is the feature).
Features can be described (quantified) by parameters. Ideally, a chromatographic peak has a Gaussian shape. This type of peak can be described by its center point, its width and its height. For qualitative purposes the center point is the most important parameter. In the translation of the signal produced by a definite analytical method for a particular chemical substance, calibration is essential to produce reproducible parameters (see Chapter 6).
From now on, the term identification parameter will be used to denote the value of a feature extracted from an analytical signal that is subsequently used in the identification process. For example, chromatographic mobility (i.e. retention behaviour) can be used as an identification parameter for a chromatographic peak extracted from a chromatogram.
2.2.4 Measurement errors
A chemical analytical instrument and accompanying method can be regarded as a
sensor producing a signal. We can divide these signals into two parts:
1. a deterministic part, and
2. a stochastic part.
Ideally, the deterministic part is characteristic for the feature analyzed. In practice,
we discriminate between noise, or random error, and bias, or the systematic error.
Noise is the stochastic part of the signal. Precision refers to the magnitude of the noise. Through replicate measurements the magnitude of the noise can be estimated. Through replicate measurements we can also obtain a better estimate of the true value (provided that there is no bias), because the central value (e.g. the mean) of the individual measurements (i.e. trials) is generally a better estimate of the true value than the individual measurements themselves. Precision can be defined as the reproducibility of measurement within a set, that is, the scatter or dispersion of a set about its central value [43], where set refers to the number (n) of independent replicate measurements of some feature. The standard deviation is a recommended measure for the precision of analytical methods [44].
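The estimation of the central value and the precision from replicates can be sketched as follows. The replicate values are hypothetical; the point is only that the mean estimates the true value and the standard deviation estimates the magnitude of the noise.

```python
# Sketch: precision from n independent replicate measurements of one feature.
import statistics

replicates = [100.2, 99.8, 100.5, 99.9, 100.1]   # n = 5 hypothetical trials

estimate = statistics.mean(replicates)    # central value, estimates true value
precision = statistics.stdev(replicates)  # sample standard deviation, the noise

print(estimate)
print(precision)
```

Note that `statistics.stdev` uses the sample (n − 1) denominator, which is the usual choice when the replicates are a sample rather than the whole population.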
Bias is the deviation of the central value of a large number of measurements (n → ∞) from the "actual or true value". The magnitude of the bias is a measure for the accuracy of the method. Because the true value is usually unknown, the accuracy is difficult to estimate.
Additional concepts related to the measurement error are repeatability and
reproducibility. Repeatability is the precision of the method when performed under
identical conditions (in the same laboratory, by the same operator, using the same
equipment, etc.). Reproducibility is the inter-laboratory precision of the method.
The impact of the measurement error on the identification process can be seen as follows. All measurement methods are subject to measurement errors. Due to this measurement error, a single measurement can only be regarded as an estimate of a "true value" of the identification parameter. If ηi represents the true value for the feature of a substance, the identification parameter measured for that substance (i.e. y(xi), or shorter yi) can be represented by

y(xi) = ηi + εi    (2.3)

where εi is the measurement error for the signal of substance xi. Because the parameter y(xi) is an estimate of the true value ηi, this ηi is the expectation value of y(xi),

ηi = E[y(xi)]    (2.4)

Thus, repeated measurements, preferably by collaborative trials, provide a better estimate of both ηi and the measurement error εi.
Because of the measurement error, parameters measured for identical substances may differ. This means that for a specific substance more than one parameter (or signal) is possible. Mathematically speaking, the signal is no longer a function of the substance but is related to the substance. This conclusion paves the way for the introduction of a probabilistic approach towards the identification process.
2.3 General probabilistic approach
A probabilistic approach towards identification has been reported in several publications [45, 46] (although in some more explicitly than in others [47]). The presence of an unknown substance in a sample can be regarded as a random event. Probability is a mathematical model for random events. Using the sets SX and SY, respectively denoting the set of N possible candidates prior to analysis and the set comprising all M possible signals, the central question in the identification process can be formulated as: what is the probability of the sample containing substance xi when a signal yj was measured?
Before analysis, for each possible candidate xi there is a corresponding probability p(xi) of being the unknown:

P(X) = ( x1      x2      ...  xi      ...  xN
         p(x1)   p(x2)   ...  p(xi)   ...  p(xN) )    (2.5)
Assuming a limited number of discrete signals (or identification parameters), for each signal yj there is also a corresponding probability p(yj):

P(Y) = ( y1      y2      ...  yj      ...  yM
         p(y1)   p(y2)   ...  p(yj)   ...  p(yM) )    (2.6)

The probability of the sample containing substance xi when a signal yj was measured is the conditional probability p(xi|yj). Conditional probability is the probability of an event assuming that another event has occurred. Using Bayes' theorem, which is actually a formula for reversing the order in conditional probabilities, we can find this conditional probability from:

p(xi|yj) = p(xi) p(yj|xi) / p(yj)    (2.7)
Thus, to calculate p(xi|yj) we need to know the probabilities p(xi) and p(yj) and the conditional probability p(yj|xi). This conditional probability p(yj|xi) is the probability of measuring a signal yj when the sample contains substance xi. This involves replicate measurements of the signals for all substances in SX. In doing so, the probability of measuring a signal yj for each substance xi is determined, resulting in a calibration matrix
P(Y|X) = ( p(y1|x1)  p(y1|x2)  ...  p(y1|xi)  ...  p(y1|xN)
           p(y2|x1)  p(y2|x2)  ...  p(y2|xi)  ...  p(y2|xN)
              ...       ...           ...           ...
           p(yj|x1)  p(yj|x2)  ...  p(yj|xi)  ...  p(yj|xN)
              ...       ...           ...           ...
           p(yM|x1)  p(yM|x2)  ...  p(yM|xi)  ...  p(yM|xN) )    (2.8)
This calibration matrix is constructed from the signals measured for the substances xi and their reproducibilities. Later, we will discuss ways to obtain the conditional probabilities; for the moment we suffice with the remark that the conditional probabilities should satisfy

∑_{j=1}^{M} p(yj|xi) = 1    (2.9)
When the conditional probabilities are known, the probabilities p(yj) can be calculated from

p(yj) = ∑_{i=1}^{N} p(xi) p(yj|xi)    (2.10)
When, before analysis, no a priori information is available, we can assume the probabilities p(xi) to be the reciprocal of the number of possible candidates, giving

p(xi) = 1/N    (2.11)
Substitution of Equations 2.10 and 2.11 in Equation 2.7 gives

p(xi|yj) = p(yj|xi) / ∑_{i=1}^{N} p(yj|xi)    (2.12)
Note that in this way, measurement of the signals of the reference substances
is a form of calibration, comparable to the calibration in quantitative analysis. In
quantitative analysis the signal is measured for a number of samples with known
concentrations. From these observations, a model for the relation between the concentration and the signal can be estimated. Using this model (the calibration curve),
the concentration of an unknown sample may be calculated.
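The Bayesian calculation of Equations 2.7 to 2.11 can be sketched numerically. The calibration matrix below is a small hypothetical example (three candidates, two possible signals), not data from the thesis; each column sums to 1 as the normalization condition requires.

```python
# Sketch: posterior probabilities p(x_i | y_j) from a calibration matrix
# P(Y|X), assuming uniform priors p(x_i) = 1/N when nothing is known
# beforehand. Rows are signals y_j, columns are candidates x_i.

def posterior(cal_matrix, priors, j):
    """p(x_i | y_j) for all candidates, by Bayes' theorem."""
    likelihoods = cal_matrix[j]                            # p(y_j | x_i)
    p_yj = sum(p * l for p, l in zip(priors, likelihoods)) # total probability
    return [p * l / p_yj for p, l in zip(priors, likelihoods)]

P_Y_given_X = [
    [0.9, 0.2, 0.5],   # p(y_1 | x_1), p(y_1 | x_2), p(y_1 | x_3)
    [0.1, 0.8, 0.5],   # p(y_2 | x_1), p(y_2 | x_2), p(y_2 | x_3)
]
priors = [1/3, 1/3, 1/3]          # no a priori information

post = posterior(P_Y_given_X, priors, 0)   # signal y_1 was measured
print(post)   # candidate x_1 becomes the most probable
```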
2.3.1 Hypothesis testing as a model for the identification process
Now that the main mathematical concepts involved in the identification process of chemical substances have been introduced, a model for this process will be presented. The elementary problem in the identification process is the question whether a parameter measured for the unknown substance can be equal to the parameter obtained for a given candidate. A scientific approach towards this question is the use of hypothesis testing. Hypothesis testing is one of the major tools for making statistical inferences [48, p. 194].
2.3.2 Fundamentals of hypothesis testing
In the case of the identification problem as defined here, we can state the null hypothesis (H0) as unknown ul is substance xi, and the alternative hypothesis (H1) as unknown ul is not substance xi, or:

H0: ul = xi
H1: ul ≠ xi    (2.13)
Based on the evidence provided by the analyses, we decide on either accepting or
rejecting H0 . This decision and the actual situation give rise to four possible results:
1. we decide correctly that ul is xi ,
2. we decide correctly that ul is not xi ,
3. we decide erroneously that ul is xi , and
4. we decide erroneously that ul is not xi .
Note that there are two types of incorrect decisions. Table 2.1 shows the possible results in an arrangement frequently encountered in statistical textbooks.

Table 2.1: Possible outcomes of a hypothesis test

                            Unknown truth (state of the universe)
Decision                    H0 true: ul = xi        H0 false: ul ≠ xi
Accept H0 (ul = xi)         Correct decision        Type II error
                            True positive           False positive
                            p = 1 − α               p = β
Reject H0 (ul ≠ xi)         Type I error            Correct decision
                            False negative          True negative
                            p = α                   p = 1 − β

In statistical hypothesis testing, the tests are designed so that the probability of rejecting H0 when in fact it is true is equal to α, the so-called significance level of the test. Connected to α is β, the probability of accepting the null hypothesis when in fact it is false; the complement of this probability, 1 − β, is called the power of the test.
There are basically two approaches towards hypothesis testing:
1. by defining acceptance and rejection regions under the assumption of H0, and by subsequently rejecting H0 if the test statistic falls into the rejection region;
2. by calculating the probability of H0 being true given the value of the test statistic, and by subsequently rejecting H0 if this probability drops below the significance level α.
A more detailed treatment of this subject can be found in statistical textbooks (e.g. Wonnacott and Wonnacott [49, p. 287-323]).
In the identification process, the null hypotheses uk = xi for k = 1, 2, ..., l and for i = 1, 2, ..., N have to be tested. In other words: for each of the l unknowns, N separate hypotheses have to be tested; so, the identification process involves a total of l × N hypothesis tests. Actually, the test is based on the question whether the parameter measured for the unknown can originate from the given candidate, i.e. how similar is the parameter of the unknown compared with the true value of the parameter of the candidate. Thus, the hypothesis can be restated as:
H0: y(ul) = ηi
H1: y(ul) ≠ ηi    (2.14)
This clearly indicates the need for a measure of the similarity between the identification parameter measured for the unknown, y(uk), and the true value of the candidate, ηi. Furthermore, there is a need for a limit beyond which these parameters are considered dissimilar. Such a limit could be called a discriminating function or match function. If the parameters are similar, H0 cannot be rejected; so substance xi remains a candidate for unknown substance uk.
2.3.3 Similarity measures
As we have seen, the determination of the similarity (s) or dissimilarity (d) between the identification parameters measured for the unknown substance and the candidate under consideration is essential for the hypothesis testing model. Dissimilarity is the direct opposite of similarity [50, p. 24]. The similarity between two signals usually lies in the range [0, 1]. In this case, the dissimilarity can be obtained by using the monotonically decreasing transformation d = 1 − s.
In hypothesis testing as a model for the identification process, the measure of similarity or dissimilarity between the true value of the identification parameter for candidate xi (i.e. ηi) and the signal measured for the unknown uk (i.e. y(uk)) is the point of interest. To determine the measure of dissimilarity, a so-called distance function is employed. For scalar (i.e. zero-order) identification parameters, such as the melting point and chromatographic retention measures, the absolute difference

d = |y(uk) − ηi|    (2.15)

is a useful distance function.
For more complex signals such as spectra (e.g. mass, NMR, IR and UV spectra) the absolute difference is unsuitable. For these first-order signals a multitude of distance functions is available, such as the Euclidean distance

d = √( ∑_{v=1}^{w} (yv(uk) − ηv,i)² )    (2.16)

where the signals are vectors consisting of w discrete measurements. Other useful distance functions are the Minkowski metric, the Canberra metric, and the Czekanowski coefficient [50, p. 25-26]. Various types of correlation coefficients, such as Pearson's product-moment correlation coefficient, can also be used [50, p. 25-26]. Note that the correlation coefficient is a similarity measure rather than a dissimilarity measure. A transformation such as d = 1 − r is appropriate if a correlation coefficient of −1 represents the maximum disagreement. A transformation such as d = 1 − r² is appropriate if correlation coefficients of −1 and +1 are treated equivalently as showing maximum agreement.
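The distance measures above can be sketched as follows: the absolute difference for a scalar parameter (Equation 2.15), the Euclidean distance for first-order signals (Equation 2.16), and the correlation-based dissimilarity d = 1 − r. The spectra and retention values are hypothetical examples, not data from the thesis.

```python
# Sketch of the distance and dissimilarity functions discussed above.
import math

def euclidean(y_u, eta):
    """Euclidean distance between two first-order signals (Equation 2.16)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_u, eta)))

def pearson_r(a, b):
    """Pearson's product-moment correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# Scalar parameter (e.g. a retention measure): absolute difference.
d_scalar = abs(1802.0 - 1800.0)

# First-order signals (e.g. two spectra sampled at w = 4 wavelengths).
unknown   = [0.10, 0.80, 0.40, 0.05]
candidate = [0.12, 0.78, 0.41, 0.06]
d_euclid = euclidean(unknown, candidate)
d_corr = 1 - pearson_r(unknown, candidate)  # near 0 for well-matched spectra
```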
2.3.4 Discriminating Functions/Matching criteria
Based on the similarity between the identification parameters of the unknown and the candidates, certain substances can be discarded from the list of possible candidates. For unambiguous identification, all but one of the original candidates need to be discarded. To accomplish this, a discriminating function has to be applied. The simplest discriminating function is

retain xi if d ≤ limit
discard xi if d > limit    (2.17)

meaning that candidate xi is retained if the distance is smaller than a certain limit, and discarded if the distance is larger than that limit. Obviously, this limit should be related to the precision of the identification parameters. If the distance between a reference substance and the unknown lies within the limit, the reference substance is said to match the unknown substance. When this distance function is used in a database retrieval process, we speak of window retrieval.
Window retrieval yields dichotomous results: either the reference substance matches the unknown or it does not. A more useful approach would be to quantify the match, in other words to answer the question: how good is the match? If the distribution function of a given distance function is known, the probability of finding a certain distance under the assumption that the signals originate from the same substance can be calculated. If this probability is low, the assumption may be discarded.
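Window retrieval with the discriminating function of Equation 2.17 can be sketched as a simple filter over a library of reference values. The library entries and retention values below are hypothetical illustrations, not data from the thesis.

```python
# Sketch: window retrieval — retain candidate x_i when the distance between
# the measured parameter and the library value stays within a limit that
# should be tied to the precision of the method.

def window_retrieval(y_unknown, library, limit):
    """Return the a posteriori candidate set C_k for one unknown."""
    return [name for name, eta in library.items()
            if abs(y_unknown - eta) <= limit]

# Hypothetical library of scalar identification parameters.
library = {"diazepam": 1804.0, "nordazepam": 1809.0, "oxazepam": 1980.0}

# Limit chosen, say, as a few standard deviations of the measurement.
candidates = window_retrieval(1806.0, library, limit=6.0)
print(candidates)   # two candidates survive; the third is discarded
```

Note the dichotomous character discussed above: a candidate at distance 6.0 is retained while one at 6.1 is discarded, with no gradation in between.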
Based on statistical considerations, a number of other discriminating functions can be conceived. One possible approach is the use of confidence intervals.
2.3.5 Confidence intervals
Confidence interval estimation is related to hypothesis testing. Under the assumption of H0 (that uk is xi), ηi is the expectation of y(uk). Hence, we assert that y(uk) must lie in the interval [ηi − c·σi, ηi + c·σi], that is:

ηi − c·σi < y(uk) < ηi + c·σi    (2.18)

Assuming a normal distribution, with z = (y(uk) − ηi)/σi, the limit c is z_{α/2}, resulting in the interval [ηi − z_{α/2}·σi, ηi + z_{α/2}·σi]. If y(uk) lies in this interval, the identification parameters are indiscernible and H0 cannot be rejected. If y(uk) lies outside this interval, H0 is rejected at significance level α.
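The confidence-interval form of the test can be sketched as a single comparison. The values of ηi, σi and the measurements are hypothetical; z = 1.96 corresponds to a two-sided significance level α = 0.05 for a normal distribution.

```python
# Sketch: retain H0 when y(u_k) lies inside [eta - z*sigma, eta + z*sigma].

def inside_interval(y_unknown, eta, sigma, z=1.96):
    """True when H0 cannot be rejected at the significance level behind z."""
    return abs(y_unknown - eta) <= z * sigma

print(inside_interval(1015.0, 1000.0, 10.0))   # 1.5 sigma away: H0 retained
print(inside_interval(1025.0, 1000.0, 10.0))   # 2.5 sigma away: H0 rejected
```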
2.3.6 Connection to the probabilistic approach
Under the assumption of H0 that uk is xi, ηi is the expectation value of y(uk). Hence, under the assumption of H0 (i.e. unknown substance uk is candidate xi), the probability of measuring a value that is as extreme as or even more extreme than y(uk) can be calculated, provided that the distribution of the identification parameter is known.
Assuming a normal distribution for a scalar identification parameter y(uk) (i.e. N[ηi, σi²]), this parameter has the probability density function (p.d.f.)

p(y(uk)|H0) = p(y(uk)|uk = xi) = 1/(σi √(2π)) · exp( −½ ((y(uk) − ηi)/σi)² )    (2.19)
Hence, the probability of measuring a value that is as extreme as or even more extreme than y(uk) is given by the shaded area in Figure 2.3.

[Figure 2.3: Probability p(y(uk)|H0) for an unknown uk, for which an identification parameter y(uk) was measured, of being reference substance xi with identification parameter ηi and corresponding standard deviation σi; the shaded area is the two-sided tail of the normal p.d.f. beyond y(uk).]

This probability can be calculated from:
p(y(uk)|H0) = 2 Pr(Y ≥ y(uk)) = 2/(σi √(2π)) · ∫_{y=|y(uk)−ηi|}^{∞} exp( −½ (y/σi)² ) dy    (2.20)
As can be seen from Figure 2.3, the shaded area approaches 1 when y(uk) approaches ηi. When y(uk) deviates increasingly from ηi, the shaded area becomes smaller and approaches zero. When the probability drops below a given level, H0 (i.e. uk = xi) is rejected.
In this thesis, the so-called Similarity Index (SI) for a one-dimensional (i.e. univariate) identification parameter is defined as the probability given by Equation 2.20.
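For a normal distribution, the two-sided tail probability of Equation 2.20 can be written with the complementary error function as SI = erfc(z/√2), with z = |y(uk) − ηi|/σi. The following is a minimal numerical sketch with hypothetical values; the closed form in terms of erfc is a standard identity for the normal tail, not notation taken from the thesis.

```python
# Sketch: Similarity Index for a univariate, normally distributed
# identification parameter, via the complementary error function.
import math

def similarity_index(y_unknown, eta, sigma):
    """Two-sided tail probability of Equation 2.20 for a normal p.d.f."""
    z = abs(y_unknown - eta) / sigma
    return math.erfc(z / math.sqrt(2))

# A perfect match gives SI = 1; larger deviations drive SI towards 0.
print(similarity_index(1000.0, 1000.0, 10.0))   # 0 sigma deviation
print(similarity_index(1020.0, 1000.0, 10.0))   # 2 sigma deviation, SI ~ 0.046
```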
2.4 Discussion
The model introduced above constitutes the basis for the identification process delineated in this thesis. Below, some implications of and difficulties with the model are discussed.
2.4.1 Types of identification
So far, we have considered identification as a uniform process. In analytical chemistry, we often distinguish between different types of identification:
1. If there is strong evidence that an unknown substance has a specific identity, simple comparison of the features of the unknown and the reference substance (provided that the reference substance is available) can be sufficient. This type of identification is often called confirmation;
2. The identification of previously known substances in samples of unknown composition. In this case the features of the unknown are to be compared to those of a large number of reference substances. This type of identification is often referred to as recognition; and
3. The identification of hitherto unknown substances. Comparison of the unknown with reference compounds is impossible in this case. This type of identification is also called structure elucidation.
The latter type seems to lack one of the factors we noted as important in the identification process: it is impossible to compare characteristics of a new substance (there are no reference data available). However, structure elucidation depends on a large number of measurements of the features of different chemical structures. Based on these features, complex systems of rules ("axioms of chemistry") on the interpretation of the features of unknown structures have been conceived. Using these rules, the structure of the unknown can often be deduced. In general, structure elucidation is done for relatively clean samples. In complex biological samples where the analyte is present in a low concentration relative to the matrix substances, this type of identification is usually not feasible.
In this thesis, the second type of identification (recognition) is emphasized. Obviously, it is unrealistic to measure the features of all reference substances at the same time the properties of the unknown substances are measured. It is far more effective to collect the features of all reference substances and store them in tables, libraries or databases. The features measured for the unknown can then be compared to the data in the library. This type of identification is often referred to as retrieval. According to Zurcher et al. [51], the retrieval process for mass spectra is based on the so-called general library-search hypothesis:

"if the spectra [properties] are similar, then the chemical structures are similar".

To arrive at a useful model for the process of identification by retrieval, we will explore some mathematical elements that can be used to describe the process.
2.4.2 The analytical method
So far, little has been said about the analytical method. However, the latter plays an eminent role in the identification process. After all, it is the analytical method that yields the identification parameters. We can subdivide the qualitative analytical methods roughly into two categories:
1. classification methods, and
2. identification methods.
Classification methods are methods that yield a specific signal for a whole class of substances, whereas identification methods yield a signal that is specific for a single substance.
Although classification methods do not furnish the identification of specific substances, they can be used to narrow the number of possible candidates and may therefore play an important role in the identification process. An example of analytical methods that provide a classification are color reactions, for example the Mandelin and Marquis spot tests. Such a spot test classifies into positive or negative (dichotomous) results. Moreover, the observed color may reduce the number of candidates even further.
In modern analytical chemistry, immunoassays and receptor assays are powerful classification methods. In immunoassays, a signal above the cut-off value of the
assay can be considered a positive result. Usually, an immunoassay is developed using a particular representative of the class (e.g. diazepam as a representative of the
class of benzodiazepines). It must be noted that other benzodiazepines may have
a different sensitivity in the immunoassay, depending on their cross-reactivity. Since
immunoassays can be sensitive and in general need only limited pre-treatment of
biological samples, these assays are popular classification methods. Hence, a large
variety of immunoassays is commercially available. Ferrara et al. [52] compared six
of these immunochemical techniques and several chromatographic techniques for
drugs-of-abuse testing in urine. Fitzgerald et al. [53] also compared six immunoassay
methods for the detection of benzodiazepine use, and for general drug screening in
urine.
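The cut-off decision and the influence of cross-reactivity can be illustrated with a small sketch. The cut-off value and the cross-reactivity factors below are invented for illustration; they are not specifications of any real assay:

```python
# Assumed assay cut-off, in arbitrary signal units.
CUTOFF = 0.25

# Hypothetical cross-reactivity factors relative to the calibrating
# compound (diazepam = 1.0): equal concentrations of different class
# members produce different signals.
CROSS_REACTIVITY = {"diazepam": 1.00, "oxazepam": 0.40, "temazepam": 0.65}

def screen(concentration, compound):
    """Return True (positive) if the predicted signal reaches the cut-off."""
    signal = concentration * CROSS_REACTIVITY[compound]
    return signal >= CUTOFF

print(screen(0.30, "diazepam"))  # True:  signal 0.30 >= 0.25
print(screen(0.30, "oxazepam"))  # False: signal 0.12 <  0.25
```

The second call shows the practical consequence of cross-reactivity: the same concentration of a weakly cross-reacting class member can fall below the cut-off and go undetected by the classification method.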
In receptor assays, receptors from animals are used instead of antibodies.
The use of radioreceptor assays for systematic toxicological analysis has been described by Ensing et al. [54,55].
Some methods provide more useful information than others. An important
issue is how to select analytical methods for identification purposes. The selection
and performance evaluation of the analytical methods for identification purposes is
the main subject of Chapter 3.
In general, a single analytical method cannot identify all possible unknown
analytes unequivocally. In other words, analytical methods are not ideal identification methods. Therefore, two or more analytical methods have to be used simultaneously. The use of multiple analytical methods requires an extension of the model,
which is given in Chapter 5.
2.4.3 Prevalence
From the probabilistic (i.e. Bayesian) approach it is evident that the prevalence of
chemical substances in cases of poisoning can be a very important piece of information. Simple reasoning also yields this insight: one is unlikely to encounter very
exotic substances. In the identification process the prevalence of the substances is
usually ignored (although it is expressed implicitly by the choice of the reference
substances). In the final conclusion, however, the analytical toxicologist should always take the prevalence of the substances into account. Yet, even though it may
not occur often, encountering exotic substances in intoxications cannot be ruled out.
Information about the prevalence of potentially toxic substances is usually
available, although it depends on many variables, such as geographical and
community factors. Furthermore, the prevalence may change with time. For instance,
the use of arsenic in cases of intentional poisoning (murder) diminished once
it could be readily detected. Moreover, chemists develop new drugs and poisonous
chemicals every day, constantly adding substances to the list of possible candidates.
Information on the prevalence of chemical substances in cases of poisoning
can be obtained from epidemiologic investigations or retrospective surveys of the
laboratory results. An example of a recent retrospective survey of the results of
drugs of abuse screening for a particular region of the U.K. was given by George and
Braithwaite [56]. For The Netherlands, epidemiologic data have been summarized
by Van Heijst and Pikaar [57].
Given sufficient data, the prevalence of a chemical substance x_i can be
expressed as

    p(x_i) = (number of cases in which substance x_i was detected) / (total number of cases)    (2.21)
Thus, the prevalence is linked to probability theory and can be seen as a form of a
priori information.
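Equation (2.21) amounts to a simple relative frequency. A minimal sketch, using invented case counts (the substance names and numbers are illustrative only), shows how such prevalences could be computed and then serve as a priori probabilities:

```python
from collections import Counter

# Invented laboratory history: one detected substance per case.
cases = (["paracetamol"] * 40) + (["diazepam"] * 25) + (["arsenic"] * 1)

counts = Counter(cases)   # detections per substance
total = len(cases)        # total number of cases

def prevalence(substance):
    """Eq. (2.21): fraction of cases in which the substance was detected."""
    return counts[substance] / total

print(round(prevalence("diazepam"), 3))  # 25 of 66 cases
print(round(prevalence("arsenic"), 3))   # 1 of 66 cases
```

With one substance per case, the prevalences sum to one and can be used directly as a prior distribution over the candidate substances in a Bayesian treatment of the identification process.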
2.4.4 Clinical observations
In addition to the prevalence, another important piece of information that is often available is that provided by the investigations performed by the
physician or the coroner. Unlike prevalence, it is difficult to translate clinical
observations into probabilities. There are two approaches to take this information
into account:
1. The diagnosis directs the analytical investigation, and
2. The diagnosis has to be supported by the analytical results.
If the diagnosis is used to direct the analytical investigation, the analytical procedure(s) used may miss substances. For instance, when the diagnosis suggests poisoning with an alkaloid and the analytical procedure used is geared to alkaloids,
non-alkaloids will not be found. This is particularly important in cases of poisoning
with multiple substances.
If one takes the stand that the analytical results have to be in keeping with,
or explain, the findings of the physician or coroner, the analytical investigation can
be considered independent from the clinical investigation, and the probability of a false
negative will then be smaller. However, when time is critical, which is almost always
the case in clinical investigations, using the diagnosis to assist the analytical
investigation can save considerable time when applied properly.
2.5 Concluding remarks
A model for the identification process of substances was given. According to this
model, the identification process can be regarded as a form of hypothesis testing.
Thus, the identification of substances is given some "philosophical background".
Furthermore, the probabilistic approach towards the identification process
was treated. This approach not only provides a fine way to describe the identification
process, but also forms the basis for some of the concepts that will be treated in
the next chapters.