Comparison of principal components regression

Chemometrics and Intelligent Laboratory Systems 65 (2003) 257 – 279
www.elsevier.com/locate/chemometrics
Comparison of principal components regression and partial
least squares regression through generic
simulations of complex mixtures
Peter D. Wentzell *, Lorenzo Vega Montoto
Department of Chemistry, Trace Analysis Research Centre, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J3
Received 17 June 2002; received in revised form 29 October 2002; accepted 31 October 2002
Abstract
Two of the most widely employed multivariate calibration methods, principal components regression (PCR) and partial least
squares regression (PLS), are compared using simulation studies of complex chemical mixtures which contain a large number of
components. Details of the complex mixture model, including concentration distributions and spectral characteristics, are
presented. Results from the application of PCR and PLS are presented, showing how the prediction errors and number of latent
variables (NLV) used vary with the relative abundance of mixture components. Simulation parameters varied include the
distribution of mean concentrations, spectral correlation, noise level, number of mixture components, number of calibration
samples, and the maximum number of latent variables available. In all cases, except when artificial constraints were placed on
the number of latent variables retained, no significant differences were reported in the prediction errors reported by PCR and
PLS. PLS almost always required fewer latent variables than PCR, but this did not appear to influence predictive ability.
© 2002 Elsevier Science B.V. All rights reserved.
Keywords: Principal components regression; Partial least squares regression; Multivariate calibration; Comparison; Simulation; Complex
mixtures
1. Introduction
Multivariate calibration, especially first-order calibration, has become an indispensable part of modern
analytical chemistry. For as long as multivariate
calibration has been employed, researchers have
sought to develop better techniques for building
calibration models. Countless strategies have evolved
* Corresponding author. Tel.: +1-902-494-3708; fax: +1-902-494-1310.
E-mail address: [email protected] (P.D. Wentzell).
over the years in an attempt to improve on existing
methods. Although some of these methods are clearly
better than others under a given set of circumstances,
there is no single ‘‘best’’ approach to calibration.
Nevertheless, a number of techniques have withstood
the test of time and have become regarded as standards of multivariate calibration for whatever reasons—tradition, simplicity, reliability, versatility or a
host of other factors. Among these techniques are
principal components regression (PCR) and partial
least squares regression (PLS). These two techniques
are similar in many ways and the theoretical relationship between them has been treated extensively in the
0169-7439/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0169-7439(02)00138-7
P.D. Wentzell, L. Vega Montoto / Chemometrics and Intelligent Laboratory Systems 65 (2003) 257–279
Table 1
Summary of the literature review
Reference
Topic
Conclusions and quotations
[7]
Theoretical-Simulations
[3]
Theoretical
[4]
Theoretical-Simulations
[8]
Simulations
[9]
Theoretical
[10]
UV – Vis.; ternary mixtures;
first and second derivative
FT-IR; Fat, Protein, Lactose
PCR and PLS gave similar optimal prediction results. PLS uses fewer
LVs than PCR.
‘‘PLS is one of a continuum of variations similar to PCR making specific
implicit assumptions about the importance of each principal component
for describing the dependant variable. It is inherently no better than those
methods formed by selecting individual eigenvectors.’’
PLS often reached its minimal mean squared error of prediction using
fewer factors than PCR.
‘‘PCR and PLS are very similar’’. . .‘‘PLS seems to predict better than PCR
in the cases when there are random linear baselines or independently
varying major spectral components which overlap with the spectral features
of the analyte.’’
‘‘It has been proven that the intuitive inequality $R^2_{\mathrm{PLS}} \geq R^2_{\mathrm{PCR}}$ holds for any
given dimensionality. . .it should be emphasized that a better goodness of fit,
as expressed by $R^2$, does not necessarily transcend into superior prediction
performance.’’
PLS was similar to PCR in all cases, even when derivatives were used.
[11]
[12]
IR; Blood serum constituents;
PCR, PLS and PLS-ANN
[13]
ATR-FTIR; styrene-butadiene in a
polymer; PLS, PCR and MLR
[14]
NIRS; Forage production;
PCR, PLS and restricted PCR
[15]
UV – Vis.; mixtures of up
to five cations
NIRS; pharmaceutical tablets;
PLS and variants of PCR with
selection factors
[16]
[17]
[18]
[19]
[20]
[21]
[22]
UV – Vis; phenols; MLR, PCR
and PLS
Fluorescence; amino acid
derivatives; PCR and PLS
NIR; wheat and blood samples;
PCR, PLS
UV – Vis and HPLC; pesticides
UV – Vis and HPLC; pesticides
[23]
Fluorescence; PAHs;
10 components
UV – Vis; PAHs
[24]
UV – Vis; colorants
PLS and PCR performed in a similar way. Both models used the same
number of LVs to construct the model for the prediction step.
PLS, PCR and PLS-ANN gave similar results. In presence of small
nonlinearity effects, ANNs are not necessary. ‘‘PLS and PCR yielded very
similar results for the prediction of every constituent studied.’’
PLS worked better than PCR and MLR when the original spectra were used.
‘‘The best result is obtained with PLS’’. . .‘‘Once again this experimental
example supports the theoretical view that PLS is more often superior to
PCR and MLR for this FTIR application.’’
PLS, PCR and RPCR were similar when the correct number of PCs were
chosen. ‘‘The advantages of PLSR is that the method reaches the minimum
prediction error with fewer components included than are included for PCR
and RPCR.’’
PLS and PCR gave similar performance for all systems. PLS uses fewer LVs
than PCR.
PLS and PCR top-down procedures were almost the same in terms of
predictive ability. Some pre-processing tools were used in trying to improve
the results. ‘‘Whether PCR with selection of PCs is better than PLS or vice versa cannot be decided really from these data. In this case, the quality of the
model is similar, but the PCR model with the selection of PCs is simpler.’’
PCR and PLS gave the same results. Using the pre-processing tools, MLR
gave better results than PCR and PLS.
Stated that PLS works better than PCR but gave no numerical results. Only
presented plots where it was difficult to observe any differences.
PLS and PCR have similar optimal prediction ability.
PCR, PLS1 and PLS2 produce good results. All exhibit similar performance.
‘‘PLS1 seems to predict better than PLS-2 and PCR in cases when there are
multiple components with overlapping peaks.’’
PLS and PCR gave almost the same results. PLS was stated to perform
slightly better than PCR but this conclusion was not statistically validated.
‘‘The particular data set is predicted well by PLS and PCR. . .although. . .in
all cases PLS outperforms PCR.’’
‘‘. . .no significant difference was observed in the precision of prediction
between PCR and PLS methods.’’
Table 1 (continued)
Reference | Topic | Conclusions and quotations
[25] | UV–Vis; ternary mixture of pesticides | PLS and PCR provide similar results in terms of RMSEP using the same number of LVs.
[26] | Voltammetry; Zn, Cu, Cd and Pb mixture; MLR, PCR and PLS | Reported PLS was better than PCR for predictions, but actually, PLS was better for one element (Pb). For the others, PLS and PCR performed equally well. ‘‘The predictions using PLS were considered to be superior to both MLR and PCR as the PLS modeling structure finds the latent variables that describe both the variation in the x- and y-block data. . .’’
[27] | Cyclic voltammetry; tryptophan | PLS and NL-PLS worked better than PCR but the differences were not significant. It seemed that PLS works better when peak shifts appear.
[28] | Theoretical–Practical; NIR; wheat samples and gasoline samples | ‘‘. . .PCR is often too restrictive (less variance but more bias) and PLS is too flexible (more variance but less bias). Said another way, PCR tends to stop too soon in including additional eigenvectors, while PLS goes too far and uses too many eigenvectors. It should be noted that it is not necessarily PCR and PLS that stop too soon or go to far respectively, but it can be the operator, the one building the model.’’
[29] | NIR; acrylic fibres | PLS and PCR had the same prediction ability in this system.
[30] | UV–Vis, FIA; amines in foods; PCR and PLS | PLS1, PLS2 and PCR gave similar prediction errors.
[31] | NIR; variety of data sets | ‘‘It has been shown that a correct application of many calibration methods (MLR to the selected variables, PCR, PLS and NN) to a linear problem yields results of similar quality.’’
literature [1–4]. Naturally, discussions often arise as
to the relative merits of these two approaches when
applied to chemical data, and no clear answer has yet
surfaced. The purpose of this paper is to attempt to
illuminate some of the similarities and differences in
the performance of these two techniques from a
different perspective, namely through the simulation
of complex mixtures.
Historically, PCR predates PLS, with the latter
appearing in the chemical literature around 1983 (for
a recent historical perspective, see Refs. [5,6]). Since
its introduction, however, PLS appears by most
accounts to have become the method of choice
among chemists. The reasons for this are not entirely
clear, but a number of perceived advantages have
been cited in the literature. These include: (1) PLS
should predict better because correlations with the y
variable are sought in determining the scores, (2)
PLS requires fewer latent variables than PCR and
should therefore be more parsimonious, (3) PLS
loadings are more readily interpreted, and (4) PLS
should handle nonlinearities better than PCR. In this
paper, we focus on the first item in this list and
attempt to address the question ‘‘Does PLS predict
better than PCR?’’. In trying to answer this question,
we will specifically focus on the prediction of
concentrations from multichannel data intended to
emulate spectroscopic responses and created assuming a linear causal model. It is clear that this is a
restrictive case and conclusions will be by no means
general, but even this limited study requires a model
of some complexity. Likewise, we recognize that
there are a large number of calibration methods
available, which may be equally valid for comparison, but only PCR and PLS have been chosen to
keep the scope of the study manageable.
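To make the two methods concrete, the following NumPy sketch implements PCR and single-response PLS (PLS1) in their textbook forms. It is an illustration only and is not taken from this study: the function names `pcr_fit` and `pls1_fit`, the mean-centering step, and the small noise-free test mixture are all assumptions introduced here.

```python
import numpy as np


def pcr_fit(X, y, nlv):
    """Principal components regression with nlv latent variables.

    Scores are chosen to describe variance in X only; y enters afterwards.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:nlv].T                       # loadings, n x nlv
    T = Xc @ V                           # scores, m x nlv (mutually orthogonal)
    q = (T.T @ yc) / s[:nlv] ** 2        # T'T = diag(s^2), so this is least squares
    b = V @ q                            # regression vector in channel space
    return b, y_mean - x_mean @ b


def pls1_fit(X, y, nlv):
    """PLS1 by sequential deflation (NIPALS-style) with nlv latent variables.

    Each weight vector is the covariance of the current X residual with y,
    so correlations with y are used when the scores are determined.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xd, yd = X - x_mean, y - y_mean      # residuals, deflated in the loop
    n = X.shape[1]
    W, P, q = np.zeros((n, nlv)), np.zeros((n, nlv)), np.zeros(nlv)
    for a in range(nlv):
        w = Xd.T @ yd
        w /= np.linalg.norm(w)
        t = Xd @ w                       # score vector for this latent variable
        tt = t @ t
        p = Xd.T @ t / tt                # X loading
        q[a] = yd @ t / tt               # y loading
        Xd = Xd - np.outer(t, p)
        yd = yd - q[a] * t
        W[:, a], P[:, a] = w, p
    b = W @ np.linalg.solve(P.T @ W, q)
    return b, y_mean - x_mean @ b


# Hypothetical three-component, noise-free mixture used only to exercise the code.
rng = np.random.default_rng(0)
C = rng.uniform(0.5, 1.5, size=(30, 3))  # concentrations, 30 samples
S = rng.normal(size=(20, 3))             # pure "spectra", 20 channels
X, y = C @ S.T, C[:, 0]
for fit in (pcr_fit, pls1_fit):
    b, b0 = fit(X, y, nlv=3)
    print(fit.__name__, "max abs error:", np.abs(X @ b + b0 - y).max())
```

With noisy data the number of latent variables would normally be chosen by some form of validation; that step is omitted from this sketch.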
Initially, a review of the literature was undertaken
to obtain a clear picture of what kinds of studies had
already been done to compare these two methods and
what conclusions had been drawn. Table 1 is a
summary of the results from this literature survey.
Given the abundance of papers on multivariate calibration and the limitations of search parameters, some
work will undoubtedly have been missed in this
survey, but the results should reflect a reasonable
cross-section of literature conclusions. In each case,
the nature of the study is indicated and the major
conclusions of the work are summarized, with a direct
quotation from the paper where it is useful.
The results of the study were somewhat surprising.
Given the clear popularity of PLS over PCR among
chemists, we had expected to see many instances
where the superior predictive performance of PLS was
evident. While there were a few cases which indicated
that PLS gave better results than PCR, a greater
number of studies indicated no real difference in
performance. Additionally, there were no theoretical
studies which suggested that one method should
predict better than the other. The conclusions of this
‘‘meta-analysis’’ would appear to run counter to the
popular view of the inherent predictive advantages of
PLS. However, a difficulty with these studies is that
most are isolated cases, so it is difficult to form a clear
picture.
Comparative studies of PCR and PLS carried out
so far can be classified into four categories: (1)
theoretical studies, (2) experimental studies on simple
mixtures, (3) simulation studies on simple mixtures,
and (4) experimental studies on complex mixtures.
Theoretical studies based on the underlying mathematics of PCR and PLS have proven very useful for
elucidating the relationship between PCR and PLS,
but have not provided a clear indication of where one
method may be advantageous over the other. Experimental studies on well-defined mixtures consisting of
a few components have also been carried out. Such
mixtures, which are often synthetically prepared from
a previously determined design matrix, will be
referred to as ‘‘simple mixtures’’ in this work. These
studies are often inconclusive, since they represent
only one particular piece of a bigger puzzle and are
limited in the range of variables that can be explored
and the statistical inferences that can be made. Simulations of simple systems (for example, using simple
Gaussians to represent spectra) can alleviate both of
these problems by expanding the ways that variables
can be manipulated and allowing Monte-Carlo methods to be employed. One disadvantage, however, is
that experimental variables which could be important
in distinguishing method performance in real systems
(e.g. noise heteroscedasticity and correlation, nonlinearities, temperature changes) may not be accurately
represented in simulations. In our experience, the
application of PCR and PLS to simple systems with
well-defined chemical rank gives no real difference in
performance. More challenging and potentially more
useful applications involve real chemical systems of
much greater complexity. Such systems, which will be
referred to as ‘‘complex mixtures’’ in this work, are
more often at the heart of practical applications of
multivariate calibration (e.g. octane number in gasoline, protein in wheat flour). The complexity of these
systems arises from the fact that they can easily
contain thousands of chemical constituents and therefore often have no well-defined chemical rank. The
concentrations of chemical components can range
from major constituents to trace levels and will vary
between samples in a natural design with confounding
inter-component correlations. Under these conditions,
differences in the predictive performance of PCR and
PLS may become apparent. Indeed, such complex
systems have been studied experimentally and comparisons of performance have been made, but again it
is difficult to draw broad conclusions from a limited
number of isolated experimental results.
The objective of the work presented here was to
introduce a fifth domain of comparative study, specifically the simulation of complex mixtures. The central
hypothesis is that differences between PCR and PLS
become more pronounced for complex mixtures and
that these differences will be more apparent if we are
able to carry out a large number of simulations while
modifying important experimental variables. Of
course, it is impossible to provide a completely
accurate simulation of complex chemical systems
such as a petroleum or food product because of the
number of unknown parameters involved. The strategy developed here for complex mixture simulation is
quite generic and will be limited to some degree in the
extent to which it can model real mixtures. Therefore,
a secondary objective of this work is to provide the
rationale behind the simulation methods employed
and to illustrate the performance of the calibration
methods as it pertains to certain kinds of changes in
the simulation parameters, such as mixture complexity
and spectral characteristics.
2. Simulation of complex mixtures
The simulations carried out in this study were
intended to provide an approximation to real systems
which contain a large number of chemical components. Of course, such simulations must always fall
short of reality since it is impossible to know the
complex relationships among chemical species in a
mixture. Nevertheless, it was hoped that conducting
such a study would provide insights into the calibration
process and perhaps into the model of complex
chemical mixtures itself.
Some basic assumptions were first made about the
behavior of complex mixtures. The most fundamental of these was that the matrix of measurements
follows a linearly additive model:
$$\mathbf{D} = \mathbf{C}\mathbf{S}^{T} + \mathbf{E} \qquad (1)$$
Here, D represents the m × n matrix of instrument
responses for m samples collected over n channels,
C is the m × p matrix of concentrations for the p
components in the m mixtures, S represents the n × p
matrix of sensitivity factors for the components
present, and E is the m × n matrix of residual errors
(noise). For simplicity and without loss of generality,
it will be assumed throughout this work that the
responses are mixture spectra of some kind, so that S
holds the pure component spectra. Note that this
model is no different than the one usually assumed
in multivariate calibration, but for complex mixtures,
the number of components (p) can be quite large,
often exceeding m and n. Although this technically
leads to an underdetermined system with no unique solution,
we postulate that the spectra for most real systems
are dominated by relatively few species, which
include the quantifiable components of interest.
Thus, estimation of concentration by least squares
methods such as PCR and PLS will involve the
classic variance/bias tradeoff. It is also true that there
will likely be some strong cases of correlation (or
anticorrelation) among the concentrations of many
components in C when a natural calibration design is
employed. Since little is known about the distribution of concentrations in real mixtures, let alone their
covariance structure, we have made no attempt to
model this covariance here.
Even with the simple model given in Eq. (1), there
are clearly a large number of mixture variables that
can be manipulated. These include: (1) the number of
chemical components, (2) the concentration distribution of the components, (3) the concentration distribution for a single component, (4) the degree of
correlation of the pure component spectra, (5) the
number of wavelength channels, (6) the level and
structure of noise in the spectral measurements, and
(7) the level of uncertainty in the reference concentrations. The following sections describe how these
simulation parameters were synthesized to obtain a
reasonable approximation to real systems. Some of the
approaches draw from earlier work that simulated
complex mixtures for chromatographic studies [32],
but new principles need to be introduced here because
of the differences in the two scenarios.
2.1. Distribution of concentrations
One can imagine that a complex mixture such as
a petroleum product or food sample could contain
thousands of individual components that elicit an
instrument response. The concentrations of these
components are expected to vary from sample to
sample and two questions need to be addressed
concerning their distribution: (1) for a given
component, how does the concentration vary from
sample to sample?, and (2) for a given sample, what
is the distribution of concentrations of individual
components? The answers to these questions are
critical in the design of a simulation model for
complex mixtures since they affect the relative contributions of analyte and interferences to the overall
signal and the range of concentrations available for
calibration.
Very little information is available in the literature
concerning the distribution of the concentrations of
different components in a complex mixture. In order
to generalize the discussion beyond a single sample,
we will consider the question of the distribution of the
mean concentrations of different components, where
the mean is calculated over all samples. One can
imagine two extremes of a mixture consisting of
Ncomp components. In one case, the mean concentrations of all components may be nearly the same,
resulting in a very narrow distribution. This might
be observed, for example, for a set of synthetic
calibration mixtures in a designed experiment. At
the other extreme, a mixture might contain a large
number of components at low concentration and a few
dominant components at higher concentrations. This
has some intuitive appeal for natural mixtures and
some experimental evidence in the literature has
indicated exponential-like distributions, which reflect
these characteristics [33,34]. The use of an exponential distribution of mean concentrations was explored
in this work, but in the end, it was considered to be
too restrictive. A true exponential distribution has an
equal mean and standard deviation, removing the ability to
change the shape of the distribution independently
of the mean. Instead, the log-normal distribution was
chosen since it allowed for a range of situations to be
explored. Fig. 1 compares the exponential distribution
to the log-normal distribution for a variety of variance
values for the latter. Since the log-normal distribution
is not symmetric, the median is a better indicator of
central tendency than the mean, so the cases presented
all have a median of unity. Note that when the
variance is small, the limiting case of a narrow
distribution of mean concentrations is represented.
As the variance increases, the distribution becomes
more exponential-like. At a variance of unity, the
distribution is quite similar to the exponential distribution, but it lacks the contributions of compounds at
very low concentrations, which are likely to be below
the detection limit in any case. Because of these
characteristics, the log-normal distribution was used
to model the mean concentrations of mixture components in this work.
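The contrast between the two candidate distributions can be written out explicitly. The expressions below are standard results and are offered only as a sketch; they rest on one assumption not stated in the text, namely that the log-normal distribution is parameterized by its median, cmed, and by the standard deviation, σc, of ln c.

```latex
% Exponential distribution with rate \lambda:
% a single parameter fixes both the location and the spread.
f(c) = \lambda e^{-\lambda c}, \qquad
\mathrm{E}[c] = \frac{1}{\lambda}, \qquad
\mathrm{sd}[c] = \frac{1}{\lambda}

% Log-normal distribution with \ln c \sim N(\ln c_{\mathrm{med}}, \sigma_c^2):
% the median and the shape are set separately.
f(c) = \frac{1}{c\,\sigma_c\sqrt{2\pi}}
       \exp\!\left[-\frac{(\ln c - \ln c_{\mathrm{med}})^2}{2\sigma_c^2}\right], \qquad
\mathrm{median}[c] = c_{\mathrm{med}}, \qquad
\mathrm{E}[c] = c_{\mathrm{med}}\, e^{\sigma_c^2/2}
```

With the median held at unity, changing σc alters only the shape of the distribution, which is the flexibility that the exponential form lacks.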
Once the distribution of mean component concentrations in a mixture has been determined, it is
necessary to define how the concentrations of the
individual components distribute around the means
from sample to sample. Experimental results reporting
the distribution of a particular component in a series
of mixtures are relatively common in the literature,
particularly in the fields of environmental and clinical
chemistry (see, for example, Refs. [35–37]). Both
normal and log-normal distributions are observed,
with the latter being reported more frequently in an
informal literature survey. In the simulations reported
here, a normal distribution about the mean concentration was employed primarily because of its greater
simplicity; that is, there was no need to describe the
skewness of the distribution. For each component, the
standard deviation of the distribution about the mean
was taken to be 30% of the mean concentration, i.e. all
components had the same relative spread. Although
this value is somewhat arbitrary, it was considered
reasonable for a calibration experiment involving a
natural design.
Based on these assumptions, the generation of the
component concentration matrix, C, in Eq. (1) was
carried out as follows. First, Eq. (2) was used to
generate an (Ncomp × 1) vector of mean concentrations,
cmean, drawn from a log-normal distribution with a
median concentration of cmed and a standard deviation
of σc:
$$\mathbf{c}_{\mathrm{mean}} = \exp\left[(\ln c_{\mathrm{med}} + \mathbf{z})\,\sigma_c\right] \qquad (2)$$
In this equation, z is an (Ncomp × 1) vector of random
values drawn from a normal distribution with a mean
of zero and a variance of unity (N(0,1)). Once the
vector of mean concentrations for all of the components is obtained in this way, the actual concentrations
used to generate the simulated data via Eq. (1) are
obtained by assuming a normal distribution with a
relative standard deviation (RSD) of 30%. Mathematically, this is represented in Eq. (3):
$$\mathbf{C} = \mathbf{1}_m \mathbf{c}_{\mathrm{mean}}^{T} + (0.3)\,\mathbf{1}_m \mathbf{c}_{\mathrm{mean}}^{T} * \mathbf{Z} \qquad (3)$$
Here, 1m designates an (m × 1) vector of ones where
m is the number of mixtures, Z is an (m × Ncomp)
matrix of N(0,1) random numbers, and ‘‘*’’ indicates
the Hadamard or element-by-element product of two
matrices. Any negative values produced in this calculation are set to zero.
Fig. 1. Comparison of exponential and log-normal concentration distributions.
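Eqs. (2) and (3) translate directly into array operations. The sketch below is one possible NumPy rendering; the function name `simulate_concentrations`, its argument names, the default values, and the random seed are assumptions made for illustration and do not come from the original work.

```python
import numpy as np


def simulate_concentrations(m, n_comp, c_med=1.0, sigma_c=1.0, rsd=0.3, rng=None):
    """Return (C, c_mean) following Eqs. (2) and (3)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n_comp)            # N(0,1), one value per component
    # Eq. (2) as written; with c_med = 1 this reduces to exp(sigma_c * z).
    c_mean = np.exp((np.log(c_med) + z) * sigma_c)
    Z = rng.standard_normal((m, n_comp))       # N(0,1), one value per sample and component
    # Eq. (3): broadcasting c_mean over the rows plays the role of 1_m c_mean^T,
    # and "*" is the element-by-element (Hadamard) product.
    C = c_mean + rsd * c_mean * Z
    return np.clip(C, 0.0, None), c_mean       # negative values are set to zero


C, c_mean = simulate_concentrations(m=5000, n_comp=10, rng=np.random.default_rng(1))
print(C.shape, bool((C >= 0.0).all()))
```

With a relative standard deviation of 30%, a negative draw requires a deviation of more than three standard deviations below the mean, so the truncation at zero affects very few entries.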
2.2. Pure component spectra
The spectroscopic (or other response) profiles of
the pure components in a complex mixture will obviously have an important impact on the ability to
quantitate a particular analyte. The sensitivity of the
calibration for the analyte will depend on both the
magnitude and shape of its spectrum relative to the
interferents in the mixture. In this work, the magnitude of the spectral response is assumed to be imbedded in the distribution of the concentrations; that is,
the concentration distribution is assumed to reflect a
composite of the absolute concentration and the
magnitude of the pure spectral response. For this
reason, only the shape of the pure component spectra
is considered as a factor in the simulations. The key
element here is the correlation, or angle, between the
spectral vector of interest (the analyte) and the spectra
of other components (interferents) in the mixture. In
this regard, one can imagine a broad scope of situations, ranging from cases where spectra are nearly
orthogonal to one another (as in the case of mass
spectrometry) to cases where spectra are highly overlapped and share a small spectral angle (as in UV–Visible
absorption spectroscopy in solution). The
intent of this study was to simulate these different
situations. Unfortunately, however, no a priori information is available regarding the distribution of spectral angles in complex mixtures. In the absence of
such knowledge, two approaches were devised for this
work.
The first approach was adopted from an earlier
simulation study conducted in conjunction with window target-testing factor analysis [32]. In this
approach, pure component spectra are constructed in
such a way that they exhibit the same mutual correlation, or spectral angle, which can be designated in
advance. (Note that spectral similarity can be indicated by either the correlation coefficient or angle
between the spectral vectors, but in this work the latter
is used more often since it permits a better visualization of the relationship between vectors in the calibration space.) The intention here is to represent changes
in spectral similarity, which can be expressed in terms
of a mean spectral angle, by varying the magnitude of
the fixed angle between all spectra. While this approach results in a system that is well-defined, it has a
number of important drawbacks. First, the pure component spectra generated do not resemble spectra
that are typically obtained in a chemical system.
This is illustrated in Fig. 2, which shows spectra for
three components of a 20-component mixture. While
this does not represent a mathematical impediment,
the resulting mixture spectra are not as aesthetically
pleasing as they might be. A second drawback is that
the number of spectral channels employed must be
greater than or equal to the number of components in
the mixture, placing a restriction on the simulation of
complex mixtures. Perhaps the biggest disadvantage
of this approach, however, is that it does not reflect
the distribution of spectral similarity that one might
expect to exist naturally in real mixtures; that is, one
would expect to observe some pure component spectra which are quite similar and others which are very
different.
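The construction used in Ref. [32] is not reproduced in this paper, so the following is only one way, assumed here for illustration, to obtain unit-length spectra that all share the same mutual angle: factor the target matrix of inner products and rotate the result into a random subspace. The function name `fixed_angle_spectra` and its arguments are hypothetical.

```python
import numpy as np


def fixed_angle_spectra(n_channels, n_comp, angle_deg, rng=None):
    """Unit-length spectra (columns) that all share one mutual angle.

    Assumed construction, not necessarily that of Ref. [32].
    Intended for 0 < angle_deg <= 90; the Cholesky step fails otherwise
    whenever the target matrix is not positive definite.
    """
    if n_channels < n_comp:
        raise ValueError("need at least as many channels as components")
    rng = np.random.default_rng() if rng is None else rng
    c = np.cos(np.radians(angle_deg))
    # Target S^T S: ones on the diagonal, cos(angle) everywhere else.
    G = (1.0 - c) * np.eye(n_comp) + c * np.ones((n_comp, n_comp))
    L = np.linalg.cholesky(G)                  # G = L L^T
    # Random orthonormal basis (n_channels x n_comp) from a reduced QR.
    Q, _ = np.linalg.qr(rng.standard_normal((n_channels, n_comp)))
    return Q @ L.T                             # S^T S = L Q^T Q L^T = G


S = fixed_angle_spectra(100, 20, 30.0, rng=np.random.default_rng(2))
cosines = S.T @ S
print(np.degrees(np.arccos(cosines[0, 1])))
```

The check on `n_channels` reflects the second drawback noted above: a set of Ncomp vectors with a common mutual angle of this kind needs at least Ncomp channels.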
To address the problems of the fixed angle approach, an alternative strategy was employed. Spectra
were generated by first creating an (n × Ncomp) matrix
of random numbers drawn from a standard normal distribution
(N(0,1)), where n is the number of spectral
channels. The columns of this matrix were then
smoothed repeatedly until the mean angle among the
columns was close to a prespecified desired value. (A
41-point moving average filter with wrapping at the
edges was used for this, with a reduction in the filter
width on the final pass to obtain the desired angle.)
Following normalization of each column to unit
length, this matrix was used as the pure component
spectra, S, in Eq. (1). When no filtering is used, the
average angle between spectra should be close to 90°
(correlation coefficient goes to 0). As the amount of
smoothing is increased, the mean angle decreases and
the distribution of angles becomes more diffuse. This
is illustrated in Fig. 3, which compares representative
spectra and distributions of angles and correlation
coefficients for three different cases involving 100
components and 100 channels. It should be noted that
the distributions and mean angles are based on all
possible unique pairs of spectral vectors. The three
cases illustrated might be considered representative of
spectroscopic methods with inherently different information content (e.g. MS, IR and UV–Vis). Unlike
the spectra generated from the fixed angle method,
these spectra exhibit varying degrees of correlation
away from the mean value and are therefore considered to be more consistent with real mixtures,
although the extent to which this is true cannot really
be known.
Fig. 2. Representative simulated spectra from a 20-component
mixture using a fixed mutual angle criterion with an angle of 30°.
Fig. 3. Representative spectra (right-hand panels) from continuous
angle simulation method. Three different mean angles (85°, 45°,
and 25°) are represented. The left-hand panels show the distribution
of spectral angles and correlation coefficients in each case.
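The smoothing procedure described in this section can be sketched as follows. Several details are assumptions made for this illustration and are not specified in the text: the angle between two spectra is taken as the acute angle (computed from the absolute value of the cosine), the final pass chooses among odd filter widths from 3 up to the full width, and the function names and the random seed are hypothetical.

```python
import numpy as np


def mean_angle_deg(S):
    """Mean acute angle (degrees) over all unique pairs of columns of S."""
    Sn = S / np.linalg.norm(S, axis=0)
    cosines = (Sn.T @ Sn)[np.triu_indices(S.shape[1], k=1)]
    return np.degrees(np.arccos(np.clip(np.abs(cosines), 0.0, 1.0))).mean()


def moving_average_wrap(S, width):
    """Moving average of odd width along the channel axis, wrapping at the edges."""
    half = width // 2
    return sum(np.roll(S, k, axis=0) for k in range(-half, half + 1)) / width


def smoothed_spectra(n_channels, n_comp, target_deg, width=41, max_pass=100, rng=None):
    """Smooth N(0,1) columns repeatedly until the mean angle is near target_deg."""
    rng = np.random.default_rng() if rng is None else rng
    S = rng.standard_normal((n_channels, n_comp))
    for _ in range(max_pass):
        if mean_angle_deg(S) <= target_deg:
            break
        trial = moving_average_wrap(S, width)
        if mean_angle_deg(trial) >= target_deg:
            S = trial                          # a full-width pass does not overshoot
            continue
        # Final pass: use the reduced width that lands closest to the target.
        trials = [moving_average_wrap(S, w) for w in range(3, width + 1, 2)]
        S = min(trials, key=lambda T: abs(mean_angle_deg(T) - target_deg))
        break
    return S / np.linalg.norm(S, axis=0)       # normalize each column to unit length


S = smoothed_spectra(100, 20, 45.0, rng=np.random.default_rng(3))
print(mean_angle_deg(S))
```

The mean is taken over all unique column pairs, matching the convention stated for Fig. 3; how closely the result approaches the target depends on the granularity of the final-pass widths.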
2.3. Noise characteristics
Having defined C and S in Eq. (1), it remains to
define the spectral measurement noise matrix, E, in
order to generate the simulated data. The noise characteristics of measurements are critical to the success
of multivariate calibration, not only in terms of
magnitude of the noise, but also its correlation structure, as has been demonstrated in the literature [38–41].
Since there have been no differences observed
between PCR and PLS in the presence of correlated
noise [41] and both methods are designed to perform
optimally in the presence of uniform, uncorrelated,
normally distributed noise, this was the error structure
assumed here. This simple assumption also removed
the need to specify heteroscedasticity and correlation
structure in the noise as additional parameters in the
simulation. However, the magnitude of the noise in
relation to the signal, normally expressed as the
signal-to-noise ratio (S/N), is a very important factor.
The question of how to express S/N for the simulations carried out is a difficult one, since the objective
was to compare different conditions (number of components, spectral shape, concentration distributions)
under ‘‘equivalent’’ noise conditions. For univariate
data, the S/N for a group of samples can be defined
fairly easily, for example, as the mean of calibration
responses divided by the standard deviation of the
measurements. For multivariate calibration, S/N is
usually related to the multivariate sensitivity for a
component, SENi, divided by the measurement standard deviation [42]. Unfortunately, this definition
would be inappropriate here since the sensitivity,
which is the length of the net analyte signal vector
(NASi) at unit concentration, is different for each
component. Furthermore, the idea of the simulations
is to observe changes in sensitivity (through the prediction errors) under different conditions, so fixing the
conditions so that they all exhibited the same SEN/N
would not be useful. Likewise, employing a fixed
noise level for all simulations would not be realistic
either, since the magnitude of the total signal changes
with conditions (e.g. a mixture with 80 components
will give a larger total signal than a mixture with 10
components).
What is needed is a way to reflect the univariate S/
N for multichannel data. Often for multichannel data
(e.g. chromatography), the S/N is expressed using the
peak maximum, but that will not give consistent
results here since the spectral shapes can change
significantly. Instead, the noise level for these simulations was determined by first obtaining an “average” spectrum, calculated by summing the products of
the mean concentration of each component and its
corresponding spectrum, i.e. Smean = S cmean. This
was then used to calculate an rms averaged signal
over the number of channels used, a fraction of which
was taken to be the noise as shown in Eq. (4).
\[ \sigma_{noise} = \frac{1}{(S/N)} \sqrt{\frac{\lVert \mathbf{S}\mathbf{c}_{mean} \rVert^{2}}{n}} \qquad (4) \]
The S/N is specified prior to a simulation and values of
10, 100, and 1000 were employed in this work. This
approach was found to give consistent results under
changing simulation conditions and could be intuitively related to realistic measurement conditions.
Once σnoise was defined, it was simply multiplied by
an (m × n) matrix of N(0,1) random numbers to give E.
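The noise-scaling step of Eq. (4) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' Matlab code; the function names, the toy matrix S (channels in rows, components in columns) and the toy concentrations are assumptions made for the example.

```python
import math
import random

def noise_sigma(S, c_mean, snr):
    """Noise standard deviation from Eq. (4): rms of the mean spectrum divided by S/N.

    S      : list of n rows (channels), each a list of p pure-component responses
    c_mean : list of p mean concentrations
    snr    : the specified S/N (e.g. 10, 100 or 1000)
    """
    # Mean spectrum: one value per channel, the concentration-weighted sum of spectra
    s_mean = [sum(s_ik * c_k for s_ik, c_k in zip(row, c_mean)) for row in S]
    n = len(s_mean)
    rms_signal = math.sqrt(sum(v * v for v in s_mean) / n)
    return rms_signal / snr

def noise_matrix(m, n, sigma, rng):
    """(m x n) matrix of N(0, 1) deviates scaled by sigma, i.e. the matrix E."""
    return [[sigma * rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]

# Toy example (hypothetical numbers): 4 channels, 2 components, 5 samples
S = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0],
     [1.0, 1.0]]
c_mean = [2.0, 1.0]
sigma = noise_sigma(S, c_mean, snr=100)
E = noise_matrix(m=5, n=len(S), sigma=sigma, rng=random.Random(0))
```

Because the noise level is tied to the rms of the mean spectrum rather than to a peak maximum, the same S/N setting gives comparable conditions when the number of components or the spectral shapes change.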
2.4. Other variables
In addition to the variables already discussed, a
number of other parameters could be modified in
the simulations. A summary of these is given in
Table 2 along with the conditions used in this study
and the values employed under “benchmark” conditions (see Section 4.1). The number of wavelength
channels used was fixed at 100 for all simulations.
Although many real instruments have more channels
than this, this was seen as a compromise, which was
sufficient to represent the spectral variability and
reduce computation time.
Table 2
Summary of the different system variables and values used in the simulations
Variables                               Values
Variance of concentration distribution  0.01, 0.1, 1 and 10
Spectrum type                           Fixed and Continuous
Spectral angles                         10°, 30° and 80° (Fixed); 25°, 45° and 85° (Continuous)
Number of wavelength channels           100
S/N                                     1000, 100 and 10
Number of components                    10, 30 and 80
Number of calibration samples           15, 25 and 50
Number of validation samples            100
Number of prediction samples            100
Number of LVs used                      80% and 100% of maximum
Benchmark conditions are shown in bold.
In terms of the number of components present in the mixtures, the complexity
was varied over three levels (10, 30 and 80). The
number of calibration samples was also considered
at three levels, but the number of samples used for
validation and prediction was fixed at a relatively
high value of 100. The validation set was used to
determine the number of latent variables to be used
and the prediction set was used to evaluate the
prediction errors. A large number of samples was
used in each case to minimize the variability associated with these estimates. Several levels were also
employed for the spectral angle, S/N, and variance
of the log-normal concentration distribution.
In all of the studies reported here, except those
noted in Section 4.8, there were no errors in the
reference concentrations used for the calibration,
validation and prediction.
3. Experimental
3.1. General procedure
The objective of these studies was to compare the
quality of prediction of PCR and PLS for complex
mixtures under a variety of conditions. Prediction
errors were used as the principal figure of merit. A
complicating factor is that, under a given set of
conditions, the predictive ability changes greatly
depending on the component under consideration,
with major constituents being predicted better than
minor ones. Furthermore, in order to obtain a statistically valid sampling, it was necessary to carry out
replicate runs under each set of conditions, with the
same parameters (e.g. concentration distribution), but
different realizations of those parameters (e.g. cmean).
Thus, each replicate run had a different profile of
component concentrations. For these reasons, it was
necessary to compile prediction error information for
each component in each replicate run and then find
some way to summarize this information in a useful
way. In addition, because the number of latent variables used is sometimes cited as an important distinction between PCR and PLS, this variable also needed
to be compiled.
The algorithm to carry out the simulations and
produce this information under a given set of conditions is shown in Fig. 4. First, the system parameters
(see Table 2) are selected and used to generate the
concentration, spectral and error matrices as previously described. These are then combined as given in
Eq. (1) to give the calibration, validation and prediction data.
Fig. 4. Flowchart of procedure used for simulations.
For each component in the mixture, a calibration model is built using PCR and PLS-1, the
number of latent variables is selected using the validation set, and the prediction errors are assessed using
the prediction set. In this case, predictive ability was
evaluated using the relative root-mean-squared error
of prediction (RRMSEP), defined by Eq. (5).
\[ \mathrm{RRMSEP}_j = \frac{\sqrt{\sum_{i=1}^{N_{pred}} (c_{ij} - \hat{c}_{ij})^{2} / N_{pred}}}{\mathrm{range}_j} \qquad (5) \]
In this equation, cij and ĉij are the true and predicted
concentrations of component j in mixture i of the
prediction set, respectively, Npred is the number of
prediction samples, and rangej is the range of concentrations for component j. To avoid the statistical
variations caused by using a range based on actual
simulated concentration values, a theoretical range
equivalent to ±2σc was employed in Eq. (5). Given
that the RSD of concentration values was set to 30%
in the simulation, this was equivalent to a denominator of (1.2cmean) in Eq. (5). By using the relative
prediction error in this way, predictive abilities for
components with different concentrations could be
easily compared.
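As an illustration of Eq. (5) with the theoretical range described above, the following sketch computes the RRMSEP for a single component. It is a minimal example rather than the authors' implementation; the function name and the toy concentrations are hypothetical, and the 30% RSD is the value stated in the text.

```python
import math

def rrmsep(c_true, c_pred, c_mean, rsd=0.30):
    """Relative RMSEP for one component, following Eq. (5).

    The denominator is the theoretical range of +/- 2 standard deviations,
    i.e. 4 * rsd * c_mean, which equals 1.2 * c_mean for rsd = 0.30.
    """
    if len(c_true) != len(c_pred) or not c_true:
        raise ValueError("c_true and c_pred must be non-empty and equal in length")
    n_pred = len(c_true)
    rmsep = math.sqrt(sum((c - ch) ** 2 for c, ch in zip(c_true, c_pred)) / n_pred)
    theoretical_range = 4.0 * rsd * c_mean
    return rmsep / theoretical_range

# Toy numbers (hypothetical): every prediction is off by 0.06 around c_mean = 1
c_true = [0.8, 1.0, 1.2, 1.1]
c_pred = [0.86, 0.94, 1.26, 1.04]
value = rrmsep(c_true, c_pred, c_mean=1.0)   # 0.06 / 1.2, about 5% RRMSEP
```

Dividing by a range proportional to cmean is what allows a trace component and a major component to be placed on the same relative error scale.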
Returning to the flowchart in Fig. 4, once the
RRMSEP and optimum number of latent variables
(NLV) were calculated for each component by each
method (PCR and PLS), these results were stored
together with the mean concentration. This process
was then repeated Nrep times. For each repetition, a
new realization of the concentration distribution and
pure component spectra was used to ensure statistically meaningful results. The number of repetitions
depended on the number of components in the mixture since the objective was to obtain a similar total
number of components, typically 5000, for each set of
conditions (i.e. Nrep = 5000/Ncomp).
In order to examine the predictive ability of PCR
and PLS as a function of the relative abundance of
components in a mixture (i.e. major vs. minor components) and to do this across different concentration
distributions, it was necessary to first sort the resulting
data from lowest to highest mean concentration. Once
sorted, the data were then divided into 20 bins containing equal numbers of prediction results (250 in the
case of 5000 data points). Within each bin, the means
of the RRMSEP and the NLV were calculated along
with their standard deviations. These were then displayed in the form of comparator and difference plots,
which are discussed in greater detail in Section 4.
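The pooling and binning procedure described above can be sketched as follows. This is an illustrative outline only; the tuple layout, the function names and the demonstration data are assumptions, and any results left over when the total is not an exact multiple of the number of bins are simply dropped in this sketch.

```python
import statistics

def bin_results(results, n_bins=20):
    """Sort pooled results by mean concentration and summarize them in equal-count bins.

    results : list of (c_mean, rrmsep, nlv) tuples, one per component per replicate
    Returns one dict per bin with the mean and standard deviation of RRMSEP and NLV.
    """
    ordered = sorted(results, key=lambda r: r[0])   # lowest to highest mean concentration
    size = len(ordered) // n_bins                   # e.g. 5000 // 20 = 250
    if size < 2:
        raise ValueError("need at least two results per bin")
    summary = []
    for b in range(n_bins):
        chunk = ordered[b * size:(b + 1) * size]
        errs = [r[1] for r in chunk]
        nlvs = [r[2] for r in chunk]
        summary.append({
            "percentile_upper": 100.0 * (b + 1) / n_bins,
            "rrmsep_mean": statistics.mean(errs),
            "rrmsep_sd": statistics.stdev(errs),
            "nlv_mean": statistics.mean(nlvs),
            "nlv_sd": statistics.stdev(nlvs),
        })
    return summary

def n_replicates(n_comp, target=5000):
    """Number of replicate runs giving roughly `target` component results."""
    return round(target / n_comp)

# Hypothetical pooled results: 40 components, error falling as concentration rises
demo = [(float(k + 1), 1.0 / (k + 1), 3 + k % 2) for k in range(40)]
bins = bin_results(demo, n_bins=20)
```

With 30 components, n_replicates(30) gives 167 runs, which matches the number of repeat simulations quoted for the benchmark system in Section 4.1.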
3.2. Computational aspects
All the calculations performed in this work were
carried out on a Sun Ultra 60 workstation with
2 × 300 MHz processors and 512 MB of RAM and
a 700-MHz Pentium-III PC with 128 MB of RAM.
All programs were written in-house using Matlab 6.0
(The MathWorks, Natick, MA).
4. Results and discussion
4.1. Benchmark system
In order to compare the relative effects of changes
to the simulation parameters, one set of parameters
(indicated in Table 2) was chosen as a benchmark.
Changes in one variable at a time from these conditions were then examined and these results are
presented in the sections that follow to illustrate the
effect of the changes on calibration performance. In
this section, the results for the benchmark system
itself are described to gain a general appreciation of
the characteristics of comparator and difference plots
as they relate to the simulation of multivariate calibration in complex mixtures.
The results for the analysis of the benchmark system
are summarized in the four panels of Fig. 5. As noted in
Section 3.1, the x-axis on all of these plots is divided
into 20 bins, which relate to the mean concentration of
the components represented in that bin. It is important
to note that this is a percentile representation. For
example, the first bin represents the bottom 5% of
mean concentrations obtained by pooling the results of
167 repeat simulations of mixtures containing 30
components each. As such, it represents the most minor
components of the mixtures, whereas the last bin
represents the dominant components present. Note also
that the scale is not linear in concentration; that is, the
50th percentile does not necessarily correspond to one-half of the concentration at the 100th percentile, since
this depends on the concentration distribution used.
Fig. 5. Simulation results for benchmark system showing: (a) comparator plot for relative prediction errors, (b) comparator plot for number of
latent variables, (c) difference plot for relative prediction errors, and (d) difference plot for number of latent variables.
The scale is monotonic, however, and can be used to
infer relative magnitudes. Fig. 5a and b are called
comparator plots in this work and show the mean
values of the relative prediction errors and number of
latent variables within each concentration bin. The
error bars shown reflect the standard deviation of the
mean values in each case.
Fig. 5a is fairly typical for comparator plots of the
RRMSEP. It will be noted first that, in terms of
prediction errors, there is essentially no difference in
the results obtained by PCR and PLS. This was the
case for all of the comparisons made in this work,
except for the special case described in Section 4.7.
For minor components in the mixture, the left side of
Fig. 5a exhibits a small plateau that levels off between
25% and 30% RRMSEP. This feature, which is
present to a greater or lesser degree on many such
comparator plots, shows the region where the model
has essentially no predictive ability. It is easily shown
[43] that this error level is consistent with a prediction
based simply on the mean concentration, so it is clear
that the calibration is useless in this region. In the
intermediate region after the plateau, the RRMSEP
decreases monotonically as the real predictive ability
of the models improves. The point at which the
predictive ability of the model becomes practically
useful is a subjective matter, but the relative prediction
errors reach their smallest values for the major components at the right hand side of the figure. Fig. 5a
tells us what we might anticipate intuitively—that in a
complex mixture with a skewed concentration distribution (relatively few major components and more
minor components), it is possible to quantitate the
dominant species, but difficult or impossible to determine lesser ones. In this system, which is technically
underdetermined (number of calibration samples <
number of components), the minor components essentially act as noise in the determination of the major
ones. While these observations are not unexpected,
they do provide some confidence that the simulation
model is behaving as it should.
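The plateau level can be checked with a short worked calculation. If a model has no predictive ability and simply returns the mean concentration for every sample, its RMSEP approaches the standard deviation of the concentrations, σc. Taking the 30% RSD and the ±2σc theoretical range from Section 3.1 as given, Eq. (5) then reduces to:

```latex
\[
\mathrm{RMSEP}_j \approx \sigma_c = 0.30\,c_{mean},
\qquad
\mathrm{RRMSEP}_j \approx \frac{\sigma_c}{4\sigma_c}
  = \frac{0.30\,c_{mean}}{1.2\,c_{mean}} = 0.25
\]
```

The plateau between 25% and 30% seen in Fig. 5a therefore sits at or slightly above the limit expected for a prediction based on the mean alone.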
Fig. 5b shows the behavior of the optimum number
of latent variables as a function of concentration
percentile and again is fairly typical of the plots
observed throughout this study. In the low concentration range, where the calibration models are essentially useless, both PCR and PLS use only a few latent
variables. This is not surprising since very little
information is needed for predictions based on a mean
value. As the concentration percentile increases and
the predictive ability of the models improves, however, the number of latent variables used increases
quickly, reaching a plateau consistent with a multiple
linear regression (MLR) model when most or all of
the latent variables are used. Generally, throughout
these studies, PLS required fewer latent variables than
PCR, a result anticipated from literature reports, but
this factor also seemed to have no bearing on the
relative predictive abilities of the two methods. This
observation is discussed in greater detail in Section
4.7.
Fig. 5c and d, which are referred to as difference
plots in this work, show the difference between PCR
and PLS in the comparator plots to amplify any
distinction between the two methods. It is clear from
Fig. 5c that differences in the predictive abilities of
these two methods are not significant. This was
generally the case throughout this work, so in the
interest of saving space, difference plots will only be
presented if a difference in the prediction errors was
observed. Differences in the number of latent variables are almost always observed, but these are usually
apparent from the comparator plot.
Two additional figures of merit have also been
employed in this work to more succinctly summarize
the information in the RRMSEP comparator plots.
The first is the region above which the calibration
models become quantitatively useful, which will be
designated with the symbol Q. This point is somewhat
arbitrary, but in this work, we have chosen the
concentration percentile where the RRMSEP drops
to 0.1 or 10%, designated as Q10. A cutoff of 10% is
arguably somewhat high for practical quantitation, but
the purpose is to make comparisons under different
conditions and lower values will lead to more cases
where Q is undefined (insufficient predictive ability
over the entire range) and therefore uninformative. To
give an indication of the best predictive ability, the
%RRMSEP at the highest concentration percentile is
reported as a second figure of merit and designated
with the symbol E100. For the benchmark system
studied here, Q10 = 46 (i.e. the species with concentrations above the 46th percentile can be predicted
with some quantitative reliability) and E100 = 0.8%
(i.e. the dominant species can be predicted with
0.8% RRMSEP).
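One way to extract these two figures of merit from a binned RRMSEP curve is sketched below. The source does not state how the crossing percentile is located between bins, so the linear interpolation used here is an assumption, as are the function name and the example curve (which is not data from this study).

```python
def q10_and_e100(percentiles, rrmsep_pct, cutoff=10.0):
    """Summarize a binned RRMSEP curve with two figures of merit.

    percentiles : increasing concentration percentiles of the bins
    rrmsep_pct  : mean RRMSEP (in %) for each bin
    Returns (Q, E100), where Q is None if the curve never reaches the cutoff.
    Linear interpolation between bins is an assumption of this sketch.
    """
    e100 = rrmsep_pct[-1]                     # error at the highest percentile
    if rrmsep_pct[0] <= cutoff:
        return percentiles[0], e100           # already useful in the first bin
    for k in range(1, len(percentiles)):
        lo, hi = rrmsep_pct[k - 1], rrmsep_pct[k]
        if lo > cutoff >= hi:                 # first downward crossing of the cutoff
            frac = (lo - cutoff) / (lo - hi)
            q = percentiles[k - 1] + frac * (percentiles[k] - percentiles[k - 1])
            return q, e100
    return None, e100                         # Q undefined over the whole range

# Hypothetical curve, not data from the paper
pct = [20.0, 40.0, 60.0, 80.0, 100.0]
err = [27.0, 14.0, 6.0, 2.0, 0.8]
q10, e100 = q10_and_e100(pct, err)
```

Returning None when the cutoff is never reached mirrors the cases where Q is undefined, shown as a dash in the performance tables.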
4.2. Effects of spectral correlation and spectrum type
Since the performance of multivariate calibration is
highly dependent on the selectivity of the responses,
the effects of the parameters associated with the
simulated spectra will be described first. The benchmark system employs pure component spectra with a
continuous distribution of angles and a median angle
of 85°. This means that relatively little smoothing has
been performed on the original randomized spectra.
Fig. 6 compares the predictive performance of PCR
and PLS on the benchmark system for continuous
angle spectra as the median spectral angle changes. As
anticipated, the quality of the predictions decreases
with a decrease in the median spectral angle. In fact,
the models for median spectral angles of 25° and 45°
are essentially useless for all components in the
mixture, as reflected by the high RRMSEP values
and the fact that the optimum number of latent
variables remains small throughout the range of components. What has happened here is that the small
plateau corresponding to poor predictions on the left-hand side of the plot for the benchmark system (top
panel) has extended further to the right due to the
reduced selectivity for individual components. A
small downturn is noticed at the right-hand side of
both cases involving smaller angles, indicating some
improvement in predictive ability, but not to the extent
that it is very useful. This is not surprising since, in
the continuous angle systems, there will be a significant number of spectra with high correlations and this
will increase as the median angle decreases. That is
not to say that useful quantitative results could not be
obtained from systems with small median angles, but
other system parameters would have to change. For
example, the number of components would have to
decrease, the S/N would have to increase, or the
concentration distribution would have to become
more skewed to see an overall improvement in predictive ability.
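For readers who wish to relate the spectral angles quoted here to actual spectra, the sketch below computes the angle between two spectra treated as vectors, together with the median of all pairwise angles in a set. It assumes the usual vector definition of the angle (the arccosine of the normalized dot product); the function names and the three toy spectra are hypothetical.

```python
import math
import statistics
from itertools import combinations

def spectral_angle_deg(a, b):
    """Angle in degrees between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against rounding
    return math.degrees(math.acos(cos_theta))

def median_pairwise_angle(spectra):
    """Median of the angles between all distinct pairs of spectra."""
    angles = [spectral_angle_deg(a, b) for a, b in combinations(spectra, 2)]
    return statistics.median(angles)

# Toy spectra (hypothetical): two orthogonal responses and one that overlaps both
spectra = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [1.0, 1.0, 0.0]]
theta_med = median_pairwise_angle(spectra)
```

An angle near 90° corresponds to nearly uncorrelated spectra, while small angles indicate strong overlap and therefore low selectivity.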
The situation is a little different for the fixed angle
spectra, where all of the pure component spectra have
the same mutual correlation. Comparator plots for
these cases are shown in Fig. 7. In these cases as
well, the calibration models are not as good overall
when the spectral angle is decreased, but the effect is
more subtle. There is a monotonic shift of the
RRMSEP curves to the right as the angle decreases,
extending the region of poor prediction and diminishing the region of good quantitation.
Fig. 6. Comparator plots showing the effects of mean spectral angle for continuous angle spectra: θ = 85° (a and b, benchmark system), 45°
(c and d), and 25° (e and f).
Fig. 7. Comparator plots showing the effects of mean spectral angle for fixed angle spectra: θ = 80° (a and b), 30° (c and d), and 10° (e and f).
The effects, however, are much less dramatic than for the case of
the continuous angle distribution. These observations
may be more clearly indicated in the performance
table presented as Table 3, which provides Q10 and
E100 values for these systems. In the case of fixed
angles, the table indicates that, at 80°, 60% of the
most abundant components can be reasonably predicted (i.e. 100 − 40), while this number drops to 44%
at an angle of 10°.
From these studies, two conclusions can be reached:
(1) the correlation of pure component spectra is an
important factor in the success of multivariate calibration, and (2) the distribution of spectral angles is more
important than the median value in determining calibration success. The former point is fairly obvious, so
confirmation through simulation is satisfying. The
latter point is less apparent, but not surprising since,
regardless of the number of components present, it
takes only one highly correlated spectrum to severely
degrade the sensitivity for an analyte. With respect to
other parameters examined in the simulations that are
described in the following sections, results for the fixed
angle and continuous angle systems were qualitatively
similar. Therefore, only the results for the continuous
angle systems (which, in our opinion, should bear a
stronger resemblance to reality) will be reported.
4.3. Effects of mean concentration distribution
Table 3
Performance table for effect of the mean spectral angle and the spectrum type (continuous and fixed angle distributions, 30 components, σc = 1, S/N = 1000)
Case                 Q10 (Percentile)  E100 (%RRMSEP)
Continuous, θ = 25°  –                 24
Continuous, θ = 45°  –                 16
Continuous, θ = 85°  46                0.8
Fixed, θ = 10°       56                0.6
Fixed, θ = 30°       44                0.6
Fixed, θ = 80°       40                0.2
In a synthetic multivariate calibration experiment
involving only a few components with carefully
designed mixtures, we would expect a fairly narrow
distribution of mean concentrations. In other words,
the experiment would likely be designed such that the
mean response from each analyte was approximately
the same. In real applications, where the samples are
derived from complex natural systems, there are likely
to be a much larger number of components to which
the instrument responds. Even imagining that the pure
spectral response could be obtained for each of these,
which is unlikely, the concept of finding the contravariant vector for calibration of a particular analyte
becomes irrelevant because: (a) there may be an
insufficient number of channels to mathematically
determine this vector, and (b) the sensitivity for the
analyte of interest would become vanishingly small
due to the requirement of orthogonality to all of the
other spectra. However, it is still possible to determine
some analytes in such mixtures because: (a) a large
number of components may be present at such low
levels relative to the analyte that they can be ignored
with limited consequences, and (b) the concentrations
of many components may be highly correlated, permitting quantitation of minor components via their
correlation with major ones. Both of these characteristics result in an effective reduction in the apparent
number of components in the mixtures and lead to
the classical variance/bias tradeoff. In this work, we
examined the effects of the distribution of mean
concentrations, but not the effects of concentration
correlations, which are more difficult to simulate in
the absence of a priori information.
All of the mean concentration distributions employed in this work had a median concentration of
unity, but with different levels of skewness as expressed by the variance of the log-normal distribution.
The higher the variance, the greater the skewness and
the larger the proportion of minor or trace components
relative to the major ones. Values ranging from
σc = 0.01, where the distribution is very sharp and
all components have essentially the same mean concentration, to σc = 10, where the distribution is highly
skewed, were examined. However, only the range
between 0.1 and 3.2 is reported here since that was
sufficient to examine the limiting effects. The benchmark system, with σc = 1, has a distribution which is
substantially skewed, but not extreme (see Fig. 1).
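The effect of the spread parameter on the mean concentrations can be illustrated with a small sampling sketch. This section does not settle whether σc enters as a variance or a standard deviation, or which logarithm base is used, so the sketch makes its own assumptions: natural logarithms, a zero log-mean (giving a median of unity), and an argument named log_variance interpreted as the variance of ln(cmean). The function name and seed are likewise hypothetical.

```python
import math
import random
import statistics

def lognormal_means(n_comp, log_variance, rng):
    """Draw mean concentrations from a log-normal distribution with median 1.

    Assumptions of this sketch: natural logarithms, zero log-mean (so the
    median is exp(0) = 1), and `log_variance` as the variance of ln(c_mean).
    """
    sigma = math.sqrt(log_variance)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n_comp)]

rng = random.Random(1)
narrow = lognormal_means(10001, 0.01, rng)   # nearly uniform mean concentrations
skewed = lognormal_means(10001, 1.0, rng)    # wide spread between minor and major components

median_skewed = statistics.median(skewed)    # stays close to 1
mean_skewed = statistics.mean(skewed)        # pulled upward by the major components
```

The median remains near unity for any spread, while the mean rises with increasing skewness because a small number of major components dominate the total.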
Fig. 8 shows comparator plots illustrating the
effects of changes in the mean concentration distribution on prediction when all other parameters are
maintained at benchmark conditions. When σc = 0.1,
the mean concentrations in the mixture are fairly
uniform, so no components are present at insignificant
levels and every component can potentially interfere
in the determination of every other one. As a result,
the RRMSEP values are similar for all components
and somewhat better than predictions based on mean
values alone, but still reflect poor predictive ability
overall. In this instance, the transition region between
regions of very poor predictions and reasonably good
predictions has effectively expanded to occupy the
full range of concentrations and the plateaus associated with the two extremes do not appear.
Fig. 8. Comparator plots showing the effects of log-normal mean concentration distribution: σc = 0.1 (a and b), 0.32 (c and d), 1.0 (e and f,
benchmark conditions), and 3.2 (g and h).
This is as
we would expect, since all of the components are
present at approximately the same concentration levels and therefore should exhibit similar prediction
errors. As other conditions in the simulation are
changed for this case, we expect the quality of
predictions to change overall, but remain similar for
all components. Significant improvements will be
seen at the point where the number of components
becomes equal to the number of calibration samples,
since the present system is underdetermined (see
Sections 4.5 and 4.6).
As the concentration distribution becomes more
skewed, Fig. 8 shows that there is increased differentiation between regions of good and bad predictive
ability. This means that the plateaus in the two limiting
regions become more extended and the transition
between them becomes sharper. This is expected, since
increasing differences between the major and minor
components improves the ability to determine the
former at the expense of the latter. However, the
proportion of components that can be reliably determined decreases as the distribution becomes more
skewed. This is because, for very skewed distributions,
the proportion of major components is quite small,
although they can be predicted quite well. This is also
clear from the figures of merit presented in Table 4.
Note that in going from σc = 0.32 to σc = 1, Q10
decreases from 73 to 46 (indicating an increase in the
number of components that can be accurately predicted), but then increases again to 62 for σc = 3.2.
Table 4
Performance table for effect of the concentration distribution (continuous (θ = 85°) and fixed angle (θ = 80°) spectral distributions, 30 components, 25 calibration samples, S/N = 1000)
Case                   Q10 (Percentile)  E100 (%RRMSEP)
Continuous, σc = 0.1   100               10
Continuous, σc = 0.32  73                5.6
Continuous, σc = 1     46                0.8
Continuous, σc = 3.2   65                0.0
Fixed, σc = 0.1        53                7.2
Fixed, σc = 0.32       50                3.2
Fixed, σc = 1          40                0.2
Fixed, σc = 3.2        62                0.0
On the other hand, E100 decreases monotonically,
indicating a continuous improvement in predictive
ability for the most dominant components as the
distribution becomes more skewed.
4.4. Effects of signal-to-noise ratio
For the benchmark system used in this study, the
S/N (as defined in Section 2.3) was set to 1000. This
represents relatively small (ca. 0.1% RSD) but not
unreasonable errors in the measured responses. It was
important to have a relatively large S/N for the
benchmark system so that the effects of measurement
noise on prediction errors would not obscure the
effects of other variables being examined. For simplicity, the studies here assumed a homoscedastic,
Gaussian noise structure with no errors in the reference concentrations.
As the S/N of the responses decreases, one would
expect the performance of the multivariate calibration
models to become progressively worse, and indeed
this is the case, as indicated in the comparator plots of
Fig. 9 and the performance table presented as Table 5.
With decreasing S/N, the proportion of components
that can be reliably predicted decreases, with the Q10
values becoming progressively larger. Likewise, the
most dominant components are not predicted as well,
as indicated by a progressive increase in E100 values.
However, the changes are not as dramatic as one might
expect. It is still possible to quantitate some components in the mixture with a S/N of only 10 and the
changes in performance between S/N = 100 and
S/N = 1000 are fairly subtle. This suggests that measurement noise is secondary to some of the other
characteristics of the mixture in determining predictive
ability at this level.
4.5. Effects of the number of mixture components
In complex mixtures, it is anticipated that the
predictive ability of the multivariate calibration model
will diminish overall as the number of components in
the mixture increases, since this increases the likelihood of spectral interferences. This is the case in the
studies carried out in this work, as demonstrated by
the comparator plots in Fig. 10. When there are only
10 components in the mixtures, excellent predictions
are obtained for all components, although there is
more than an order of magnitude difference in the
relative prediction errors for the least and most abundant components.
Fig. 9. Comparator plots showing the effects of S/N: S/N = 1000 (a and b, benchmark conditions), 100 (c and d), and 10 (e and f).
The magnitudes of these errors will,
of course, be dependent on other system parameters,
such as the S/N and the degree of spectral correlation.
With 30 components (the benchmark system), 54% of
the components can be determined with an RRMSEP
of 10% or better, while this number drops to 8% when
there are 80 components in the mixture. Note that this
represents a reduction not only in the percentage of
components that can be quantitated, but also the
absolute number.
Table 5
Performance table for effect of the signal-to-noise ratio (continuous (θ = 85°) and fixed angle (θ = 80°) spectral distributions, 30 components, 25 calibration samples, σc = 1)
Case                     Q10 (Percentile)  E100 (%RRMSEP)
Continuous, S/N = 1000   46                0.8
Continuous, S/N = 100    53                1.1
Continuous, S/N = 10     85                3.4
Fixed, S/N = 1000        40                0.2
Fixed, S/N = 100         50                0.6
Fixed, S/N = 10          92                4.7
One of the reasons for the sharp contrast in performance between 10 and 30 components is, of
course, the fact that 25 calibration samples are used
in the benchmark system and, therefore, the system
with 10 components is overdetermined while that with
30 components is underdetermined. This is also
reflected by the fact that the number of latent variables
used stabilizes at around 10 for both PCR and PLS in
the 10-component system. Because of these conditions, it is possible to develop good models for all of
the components in the 10-component system, while
only the major components can be reliably modeled in
the 30-component system. Therefore, the number of
calibration samples employed can play an important
role and this factor is examined in the next section.
4.6. Effects of the number of calibration samples
An increase in the number of calibration samples
should result in a general improvement in the performance of multivariate calibration models for two main
reasons: (1) the availability of a larger number of
observations allows better modeling of the correlations
between responses and analyte concentrations and
therefore more reliable models, and (2) for complex
mixtures, it becomes less likely that we will have a
mathematically underdetermined system.
Fig. 10. Comparator plots showing the effects of the number of mixture components: Ncomp = 10 (a and b), 30 (c and d, benchmark conditions),
and 80 (e and f).
These effects are illustrated in Fig. 11, which shows comparator plots using 15, 25 and 50 calibration samples
for the benchmark conditions. With only 15 calibration
samples, predictive performance is degraded relative
to the benchmark system (25 samples), allowing only
the upper 20% of components to be determined rather
than 54%.
Fig. 11. Comparator plots showing the effects of the number of calibration samples: Ncal = 15 (a and b), 25 (c and d, benchmark conditions), and
50 (e and f).
There is a dramatic change when 50
calibration samples are used such that, under these
circumstances, all 30 components can be reasonably
well quantitated (i.e. Q10 = 0%), with prediction errors
ranging from 2.2% to 0.04%. The reason for this is that
the system becomes overdetermined when the number
of calibration samples is greater than the number of
components, so it becomes possible to fully model the
spectral subspace. This is confirmed by examining the
plots of the number of latent variables which, for the
case of 50 calibration samples, is consistently around
30 for both PCR and PLS, despite the ability to go
higher. Also note that the number of latent variables
used by PCR is only about one more than is used by
PLS in this case.
The fact that a system has more calibration samples
than components does not guarantee reliable predictions for all components, since this ultimately relates to
sensitivity and noise considerations. In this case, the
system has a relatively large S/N and a relatively low spectral correlation. As the S/N decreases,
the spectral overlap becomes greater, or the number of
components increases, the prediction errors obtained
in the simulations (not shown here) were observed to
increase, even when the system was overdetermined.
4.7. Effects of restricting the number of latent
variables
In the preceding discussions, little has been said
about the differences between the relative predictive
performance of PCR and PLS, even though examining
these differences was the central motivation for this
study. This is because none of the cases presented so
far, nor numerous others not presented but carried out
away from benchmark conditions, showed any statistically significant differences in the prediction errors
for the two methods. Because of this negative conclusion, the emphasis in the discussions so far has
been on demonstrating the characteristics of the complex mixture model as it pertains to multivariate
calibration in general and thereby attempting to instill
some confidence in its ability to function as a reasonable simulation model.
In contrast, for virtually all of the cases studied,
there are distinct differences in the number of latent
variables used by PCR and PLS, with PLS requiring
significantly fewer latent variables than PCR, especially in the useful predictive regions of the comparator plots. This is consistent with the results reported
in the literature for real samples. However, it should
also be noted that in those systems that are not
underdetermined (Figs. 10b and 11f), there is very
little difference in the number of latent variables
employed by the two methods, that number being
equal to the number of mixture components. This is
also consistent with literature results for multivariate
calibration mixtures with a limited number of components, where most of the time there is little difference
in the number of latent variables employed by PCR
and PLS. These observations lend some support for
the complex mixture model developed here and suggest that the apparently greater parsimony of PLS may
arise from the fact that real mixtures with a large
number of components are mathematically underdetermined when it comes to multivariate calibration.
Although PLS may appear more parsimonious in
many cases, this does not seem to influence its
predictive ability, at least not in the cases studied
here. However, significant differences in the
performance of PCR and PLS were observed under one
set of conditions, described below.
It is sometimes the practice in multivariate calibration for the analyst to restrict the number of latent
variables retained for the validation step. Rationalization for this restriction may relate to the
simplicity (parsimony) of the model in the X space,
a desire to remove the noise which is often associated with later loading vectors, or simply a desire to
improve the computational efficiency of the validation step by reducing the number of models. It was
decided to investigate what effect such restrictions
on the number of latent variables would have on the
prediction results, so simulations were carried out in
which the number of latent variables retained for the
calibration procedure was constrained to be no more
than 80% of the maximum possible. This is referred
to as the "constrained" system. Constrained systems
for all of the cases discussed so far were examined,
but only the results for the constrained benchmark
system are reported here, since this was typical of
results in all other cases. Comparator and difference
plots for the constrained benchmark system are
presented in Fig. 12.
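The constraint described above can be sketched in a few lines. This is an illustrative sketch only: the helper names, the assumed maximum of 25 latent variables, and the validation error curve are hypothetical, and the actual model selection procedure used in the simulations may differ.

```python
def constrained_limit(max_nlv, percent=80):
    """Largest number of latent variables allowed under the constraint.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return max(1, (percent * max_nlv) // 100)


def select_nlv(validation_errors, limit=None):
    """Pick the number of latent variables with the lowest validation error.

    validation_errors[k - 1] is the error of the model with k latent
    variables; if limit is given, only models with k <= limit are considered.
    """
    candidates = validation_errors if limit is None else validation_errors[:limit]
    best_index = min(range(len(candidates)), key=candidates.__getitem__)
    return best_index + 1


# Hypothetical validation curve with its minimum at 23 latent variables.
max_nlv = 25  # assumed maximum possible for this illustration
errors = [abs(23 - k) * 0.4 + 1.0 for k in range(1, max_nlv + 1)]

limit = constrained_limit(max_nlv)          # 80% of 25
unconstrained_choice = select_nlv(errors)
constrained_choice = select_nlv(errors, limit)
```

In this made-up example the unconstrained search reaches the true minimum, whereas the constrained search is forced to stop at the cap and accepts a larger validation error.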
Fig. 12. Comparator (a and b) and difference (c and d) plots for the benchmark system under constrained conditions where the number of latent
variables available was restricted to 80% of the maximum possible.
The figure shows that PCR and PLS gave very
similar relative prediction errors in the limiting cases
of the least and most abundant components (left and
right plateau regions), but in the intermediate region,
PLS clearly outperforms PCR. The difference plot
shows that the RRMSEP for PLS is as much as 7%
lower than for PCR. In other words, under these
conditions of an underdetermined system with a
skewed distribution of mean concentrations, PLS is
better at predicting intermediate components than
PCR when the number of latent variables is constrained. This is reflected in the figures of merit as
well. For PCR and PLS, the Q10 values are 65% and
46%, respectively, indicating that PLS can reliably
determine a larger number of components. In contrast,
the E100 values are 1.3% and 0.8%, indicating a
relatively small difference in the ability to predict
the most dominant components.
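To make the two figures of merit concrete, the sketch below computes them from a comparator curve. The definitions are assumptions inferred from how the values are used in the text: Q10 is taken as the lowest concentration percentile from which the RRMSEP stays at or below 10%, and E100 as the RRMSEP at the 100th percentile. The function names and the example curve are hypothetical.

```python
def q10(percentiles, rrmsep, threshold=10.0):
    """Lowest percentile from which all errors at or above it are <= threshold.

    Returns None if even the most abundant component exceeds the threshold.
    """
    result = None
    # Walk from the most abundant component downwards.
    for p, e in sorted(zip(percentiles, rrmsep), reverse=True):
        if e > threshold:
            break
        result = p
    return result


def e100(percentiles, rrmsep):
    """RRMSEP of the component at the highest concentration percentile."""
    return max(zip(percentiles, rrmsep))[1]


# Hypothetical comparator curve (percent RRMSEP versus concentration percentile).
pct = [0, 20, 40, 60, 80, 100]
err = [95.0, 60.0, 25.0, 9.0, 3.0, 1.0]

q10_value = q10(pct, err)
e100_value = e100(pct, err)
```

Under these assumed definitions a smaller Q10 means that a larger fraction of the components is determined reliably, which is how the 65% and 46% values are read above.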
Similar plots were observed for most of the other
cases examined, except where systems were overdetermined, in which case PCR and PLS gave very
similar results. The improved performance of PLS
under constrained conditions and its apparently more
parsimonious models under all conditions are misleading, however, and both of these properties can be
traced to the same source. Since the PLS loadings can
be shown to be linear combinations of the PCA
eigenvectors [1,3,9], the PLS models are, in fact, not
more parsimonious, but only appear to be so. Both
PLS and PCA use the same eigenvectors, except that
PLS combines the information more effectively by
employing concentration information so that fewer
latent variables are needed to generate the same
predictions. The improved performance of PLS under
constrained conditions is illusory. The comparison is
not fair, since PLS is not constrained to the same
extent as PCR. The formation of the PLS scores and
loadings utilizes information from all of the eigenvectors and is therefore capable of incorporating them
into the model. Constraints are only applied after this
step has been carried out, unlike PCR which is
restricted from the beginning. This makes PLS less
susceptible to restrictions placed on the number of
latent variables. The tendency to place an emphasis on
the earlier eigenvectors in PCR is perhaps derived
from simple mixtures where these factors dominate,
but it is clear from these simulations that the later
eigenvectors can play a critical role in calibration with
complex mixtures, since their exclusion leads to a
degradation in model performance. The importance of
the later eigenvectors has also been recognized in the
literature in the development of factor selection methods [44]. It is perhaps the practice of applying
restrictions to the number of latent variables used in
PCR that has led to the perception that PCR has
inferior predictive ability compared to PLS. This
perception was not supported by any of the simulations carried out in this work under unconstrained
conditions.
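The statement that PLS vectors are linear combinations of the PCA eigenvectors can be checked numerically. The sketch below assumes a NIPALS-style PLS1 algorithm applied to mean-centred random data; the data sizes, the helper name pls1_weights, and the use of NumPy are illustrative choices and do not reproduce the simulation code of this study.

```python
import numpy as np


def pls1_weights(X, y, n_lv):
    """Return the first n_lv PLS1 weight vectors using NIPALS-style deflation."""
    X, y = X.copy(), y.copy()
    weights = []
    for _ in range(n_lv):
        w = X.T @ y                 # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = X @ w                   # scores
        tt = t @ t
        p = X.T @ t / tt            # X loadings
        q = (y @ t) / tt            # y loading
        X -= np.outer(t, p)         # deflate X
        y -= q * t                  # deflate y
        weights.append(w)
    return np.column_stack(weights)


rng = np.random.default_rng(0)
n_cal, n_chan = 25, 100             # assumed sizes for illustration
X = rng.normal(size=(n_cal, n_chan))
y = rng.normal(size=n_cal)
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# PCA eigenvectors are the right singular vectors of the centred data.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
rank = int(np.sum(s > 1e-10 * s[0]))    # centring removes one dimension
V = Vt[:rank].T

W = pls1_weights(Xc, yc, n_lv=3)
coeffs = V.T @ W                        # expansion of each weight vector in V
max_residual = np.max(np.linalg.norm(W - V @ coeffs, axis=0))

# Share of each weight vector carried by eigenvectors beyond an 80% cut-off.
n_keep = (80 * rank) // 100
tail_fraction = np.sum(coeffs[n_keep:] ** 2, axis=0)
```

The residual is at the level of rounding error, so each weight vector lies in the span of the PCA eigenvectors. The non-zero tail fractions show that even the first few PLS vectors draw on the later eigenvectors that a truncated PCR model would never see.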
4.8. Other factors examined
In addition to the mixture and instrumental parameters discussed in the preceding sections, a number of
studies were carried out to examine the effects of
instrumental non-idealities, in particular baseline offsets and nonlinearities, on the performance of PCR
and PLS. Baseline offsets were introduced by adding
a random baseline shift to each of the spectra used for
calibration and prediction. Nonlinearities in the instrumental response were introduced by adding negative
curvature, modeled on a second-order polynomial, to
responses, which were above a certain threshold level.
Since all of these studies were exploratory in nature
and not intended to be comprehensive, the results are
summarized only briefly here. In all cases (except
where the baseline offsets were used with overdetermined systems), the introduction of these effects led to
a general degradation of performance. However, as in
all of the other studies carried out under unconstrained
conditions, no significant differences in the performance of PCR and PLS were detected. There is some
opinion that differences should be observed when
nonlinearities are present [45], but to our knowledge,
no comprehensive study has confirmed this. In fact,
recent work using extensive nonlinear experimental
data [46] supports the conclusions of the results
presented here; that is, there are no significant differences between PCR and PLS. Until more extensive
studies are done, the issue of nonlinearities remains an
open question.
In another exploratory study, the effect of errors in
the reference (concentration) values was examined.
After generation of the spectra, but before calibration,
normally distributed errors were added to the calibration and prediction concentrations. Several experiments were done with errors ranging from 1% to
10% of the mean concentration for each component.
Prediction errors were calculated using both the error-free reference values and those contaminated with
error. As with other studies, no cases were observed
where there was a significant difference between PCR
and PLS. These results, although based on limited
simulations, are consistent with those of Thomas and
Haaland [8], who found only very marginal improvements in performance of PCR over PLS when concentration noise was a factor.
4.9. Summary statistics
Although the results presented here have focussed
on univariate perturbations from the benchmark conditions, the study as a whole employed a full factorial
design. To more thoroughly analyze the differences
between PCR and PLS, this broader population base
was used for statistical testing. In terms of the prediction ability, the populations consisted of mean
RRMSEP values for PCR and PLS for each concentration bin in each trial (about 8000 cases for each
method). Both standard and paired t-tests were used to
examine the differences between the two methods.
Since the paired t-tests gave greater sensitivity to the
differences, as would be expected in a study such as
this, only these results are reported.
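For reference, the paired t-statistic can be computed as follows. This is a minimal sketch using only the Python standard library; the function name and the four pairs of RRMSEP values are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t-statistic for the paired differences a[i] - b[i]."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    # Mean difference divided by its standard error (sample standard deviation).
    return mean(d) / (stdev(d) / math.sqrt(n))


# Hypothetical mean RRMSEP values (%) for the same bins under each method.
rrmsep_pcr = [2.0, 3.0, 5.0, 4.0]
rrmsep_pls = [1.0, 3.5, 4.0, 3.5]

t_stat = paired_t_statistic(rrmsep_pcr, rrmsep_pls)
```

Pairing removes the large bin-to-bin variation in RRMSEP from the comparison, which is why it is more sensitive than the standard test here. With only four pairs the critical value is far larger than the large-sample value that applies to a population of about 8000 cases.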
For the continuous angle spectra, the t-statistic
from the paired t-test was 1.16, well below the critical
value of 1.96 at the 95% confidence level, and therefore not statistically
significant. In fact, closer examination of the distribution of (RRMSEP_PCR − RRMSEP_PLS) reveals that,
of the 92% of results that fall within ±0.25%
RRMSEP, 55% is on the negative side and 37% is
on the positive side, therefore slightly favouring PCR.
Again, however, this is not statistically significant.
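The breakdown of the difference distribution quoted above can be tabulated as in the following sketch. The function name and the ten difference values are hypothetical; percentages are expressed relative to the whole population, as in the text.

```python
def split_within_window(differences, half_width):
    """Percentages of all differences inside +/- half_width, split by sign."""
    n = len(differences)
    inside = [d for d in differences if abs(d) <= half_width]
    negative = sum(1 for d in inside if d < 0)
    positive = sum(1 for d in inside if d > 0)
    return {
        "within": 100 * len(inside) / n,
        "negative": 100 * negative / n,   # PCR error lower than PLS error
        "positive": 100 * positive / n,   # PLS error lower than PCR error
    }


# Hypothetical differences (RRMSEP for PCR minus RRMSEP for PLS, in %).
diffs = [-0.2, -0.1, -0.05, 0.1, 0.2, 0.4, -0.5, 0.0, -0.15, 0.05]
summary = split_within_window(diffs, half_width=0.25)
```

A larger negative share means that PCR gave the lower error more often within the window, which is the sense in which the result above slightly favours PCR.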
In the case of fixed angle spectra, a t-statistic of
19.41 was obtained from the paired t-test, indicating a
significant difference between the two methods in
favour of PLS. Although significant, this difference is
not very meaningful. Closer analysis shows that most
of the cases with differences greater than 0.2%
RRMSEP fell in regions of very low analyte concentration, i.e. in the region of the plateau where the
predictions are meaningless. In the histogram,
more than 90% of values fall within ±0.2% RRMSEP
and essentially everything falls within ±1%. The
population size of this study is such that very small
differences in the two methods can be detected, but
such differences are not likely to be relevant in real
applications. Furthermore, in our opinion, the fixed
angle scenario is not likely to accurately reflect physical reality.
In terms of the number of latent variables used in
the calibration models, all statistical tests showed that
PLS used fewer latent variables than PCR, as
expected from the other results reported here. As
noted earlier, however, this is somewhat misleading
and does not reflect better predictive ability.
5. Conclusions
Results reported in the literature to date have not
shown a clear advantage of PLS over PCR in terms of
predictive ability in multivariate calibration. The
premise of the present study was that differences
between the two methods might become more apparent if they were applied to complex mixtures under a
variety of conditions. To this end, simulation models
for complex mixtures were developed so that the
performance of PCR and PLS could be evaluated
under changing conditions of concentration distribution, spectral correlation, noise and other factors.
Results from the simulations showed that the predictions from multivariate calibrations fell into three
regions as shown in Fig. 13: a plateau corresponding
to minor components that could not be reliably
predicted, a plateau corresponding to major components that could be predicted reasonably well, and a
transition region between the two characterized by
components present in intermediate amounts that
could be determined with varying reliability.
Fig. 13. General behavior of multivariate calibration applied to complex mixtures: relative prediction error plotted against concentration percentile, showing a region of no predictive ability, a transition region, and a region of good predictive ability.
The extent of the plateau regions and the sharpness of
the transition region were determined by the characteristics
of the mixtures, instrument, and calibration
embodied in the simulation parameters. In general, the
curves behaved qualitatively as they were expected to
when the conditions changed, lending some credence
to the simulation models employed. Although the
simulation of complex chemical mixtures is a difficult
problem due to a lack of prior knowledge of real
systems, the model presented in this work is believed
to be a good first approximation and an important
aspect of this study which may be useful for the study
of other calibration methods.
In all of the simulations carried out in this work,
none of the conditions examined showed significant
differences between the predictive abilities of PCR
and PLS when the number of latent variables available
to the two methods was unconstrained. In a "meta-analysis" of all of the data accumulated, a significant
but very small difference was detected for the case of
fixed angle spectra, but given the magnitude of this
difference and the fact that the fixed angle spectra are
not as likely to reflect reality as the continuous angle
case, this finding should not be over-interpreted.
Substantial differences between the two methods were
found when the maximum number of latent variables
available was constrained, and these may be the
conditions that lead to reports of superior PLS performance in the literature. Comparison under these
conditions is unfair, however, since in reality, PLS has
access to all of the eigenvectors. Since PLS uses a
linear combination of the eigenvectors to construct its
loading vectors, it almost always required fewer latent
variables in the simulations carried out here. However,
in reality, it cannot be regarded as more parsimonious
than PCR.
The negative findings of this study with regard to
the initial hypothesis are supported by those reports in
the literature that did not find substantial differences
between PLS and PCR. However, these conclusions
do not mean that there cannot be particular cases
where one or the other method will perform better.
Nor do they mean that there are no circumstances, not
examined in this study or reflected in the simulations,
where PLS will give better results. Nevertheless, the
belief held by many practitioners that PLS has inherently better predictive abilities would appear to be
unsubstantiated. Other advantages of PLS, such as the
greater interpretability of the loadings, cannot be
neglected, however.
Acknowledgements
The authors gratefully acknowledge the financial
support of the Natural Sciences and Engineering
Research Council (NSERC) of Canada.
References
[1] J.H. Kalivas, Anal. Chim. Acta 428 (2001) 31 – 40.
[2] A. Phatak, S. de Jong, J. Chemom. 11 (1997) 311 – 338.
[3] A. Lorber, L.E. Wangen, B.R. Kowalski, J. Chemom. 1 (1987)
19 – 31.
[4] I.S. Helland, Commun. Stat., Simul. Comput. 17 (1988)
581 – 607.
[5] S. Wold, J. Trygg, A. Berglund, H. Antti, Chemom. Intell.
Lab. Syst. 58 (2001) 131 – 150.
[6] H. Martens, Chemom. Intell. Lab. Syst. 58 (2001) 85 – 95.
[7] T. Naes, H. Martens, Commun. Stat., Simul. Comput. 14
(1985) 545 – 576.
[8] E.V. Thomas, D.M. Haaland, Anal. Chem. 62 (1990)
1091 – 1099.
[9] S. de Jong, J. Chemom. 7 (1993) 551 – 557.
[10] I. Durán-Merás, A. Muñoz de la Peña, A. Espinosa-Mansilla,
F. Salinas, Analyst 118 (1993) 807 – 813.
[11] H.J. Luinge, E. Hop, E.T.G. Lutz, H.A. van Hemert, E.A.M.
de Jong, Anal. Chim. Acta 284 (1993) 419 – 433.
[12] P. Bhandare, Y. Mendelson, E. Stohr, R.A. Peura, Appl. Spectrosc. 48 (1994) 271 – 273.
[13] N. Dupuy, L. Duponchel, B. Amram, J.P. Huvenne, P. Legrand, J. Chemom. 8 (1994) 333 – 347.
[14] T. Almoey, E. Haugland, Appl. Spectrosc. 48 (1994) 327 – 332.
[15] K.N. Andrew, P.J. Worsfold, Analyst 119 (1994) 1541 – 1546.
[16] D. Jouan-Rimbaud, B. Walczak, D.L. Massart, I.R. Last, K.A.
Prebble, Anal. Chim. Acta 304 (1995) 285 – 295.
[17] F. Navarro-Villoslada, L.V. Pérez-Arribas, M.E. León-González, L.M. Polo-Díez, Anal. Chim. Acta 313 (1995) 93 – 101.
[18] J. Saurina, S. Hernández-Cassou, Analyst 120 (1995) 305 – 312.
[19] J.G. Sun, J. Chemom. 9 (1995) 21 – 29.
[20] A. Garrido Frenich, J.L. Martínez Vidal, P. Parrilla, M. Martínez Galera, J. Chromatogr. 778 (1997) 183 – 192.
[21] M. Martínez Galera, J.L. Martínez Vidal, A. Garrido Frenich,
M.D. Gil García, J. Chromatogr. 778 (1997) 139 – 149.
[22] J. Guiteras, J.L. Beltrán, R. Ferrer, Anal. Chim. Acta 361
(1998) 233 – 240.
[23] F.R. Burden, R.G. Brereton, P.T. Walsh, Analyst 122 (1997)
1015 – 1022.
[24] Y. Ni, X. Gong, Anal. Chim. Acta 354 (1997) 163 – 171.
[25] T. Galeano Díaz, A. Guiberteau, J.M. Ortíz Burguillos, F.
Salinas, Analyst 122 (1997) 513 – 517.
[26] A. Donachie, A.D. Walmsley, S.J. Haswell, Anal. Chim. Acta
378 (1999) 235 – 243.
[27] J. Saurina, S. Hernández-Cassou, E. Fabregas, S. Alegret,
Analyst 124 (1999) 733 – 737.
[28] J.H. Kalivas, J. Chemom. 13 (1999) 111 – 132.
[29] M. Blanco, J. Coello, H. Iturriaga, S. Maspoch, J. Pages, Anal.
Chim. Acta 384 (1999) 207 – 214.
[30] J. Saurina, S. Hernández-Cassou, Analyst 124 (1999) 745 – 749.
[31] V. Centner, J. Verdú-Andrés, B. Walczak, D. Jouan-Rimbaud,
F. Despagne, L. Pasti, R. Poppi, D. Massart, O.E. de Noord,
Appl. Spectrosc. 54 (2000) 608 – 623.
[32] M.T. Lohnes, R.D. Guy, P. Wentzell, Anal. Chim. Acta 389
(1999) 95 – 113.
[33] L.J. Nagels, W.L. Creten, P.M. Vanpeperstraete, Anal. Chem.
55 (1983) 216 – 220.
[34] F. Dondi, Y.D. Kahie, G. Lodi, M. Remelli, P. Reschiglian, C.
Bighi, Anal. Chim. Acta 191 (1986) 261 – 273.
[35] R.E. Thompson, E.O. Voit, G.I. Scott, Environmetrics 11
(2000) 99 – 119.
[36] A. Kubala-Kukus, D. Banas, J. Braziewicz, U. Majewska, S.
Mrowczynski, M. Pajek, Spectrochim. Acta, Part B: Atom.
Spectrosc. 56B (2001) 2037 – 2044.
[37] Z. Michailidis, F. Vosniakos, N. Koutinas, J. Environ. Prot.
Ecol. 2 (2001) 61 – 67.
[38] P.D. Wentzell, M.T. Lohnes, Chemom. Intell. Lab. Syst. 45
(1999) 65 – 85.
[39] C.D. Brown, P.D. Wentzell, J. Chemom. 13 (1999) 133 – 152.
[40] C.D. Brown, L. Vega-Montoto, P.D. Wentzell, Appl. Spectrosc. 54 (2000) 1055 – 1068.
[41] S. Schreyer, M. Bidinosti, P.D. Wentzell, Appl. Spectrosc. 56
(2002) 789 – 796.
[42] K.S. Booksh, B.R. Kowalski, Anal. Chem. 66 (1994) 782 – 791.
[43] L. Vega-Montoto, Study of the Performance of Principal Component Regression and Partial Least Squares Regression Using
Simulation of Complex Mixtures, MSc thesis, Dalhousie University, Halifax, Canada, 2001.
[44] S.Z. Fairchild, J.H. Kalivas, J. Chemom. 15 (2001) 615 – 625.
[45] R. Kramer, Chemometric Techniques for Quantitative Analysis, Marcel Dekker, New York, 1998.
[46] T.K. Karakach, Comparison of Linear and Nonlinear Multivariate Calibration Methods, MSc thesis, Dalhousie University, Halifax, Canada, 2002.