
5. Multiway calibration
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP
1
Multiway regression problems e.g. batch reaction
monitoring
[Figure: process measurements X (batch × time × process variable) are used to predict product quality Y (batch × product quality).]
2
Multiway regression problems e.g. tandem mass
spectroscopy
[Figure: MS-MS spectra X1, ..., X5, one per sample (parent ion m/z × daughter ion m/z), are used to predict compound concentrations (sample × compound).]
3
Some terminology
• zero-order: univariate calibration (OLS – ordinary least squares)
– cannot handle interferents
• first-order: multivariate calibration (ridge regression, PCR, PLS etc.)
– can handle interferents if they are present in the training set
• N-PLS(?)
• second-order: second-order advantage (PARAFAC, restricted Tucker, GRAM, RBL etc.)
– can handle unknown interferents (although see work of K. Faber)
4
Multiway calibration methods
• PARAFAC (already discussed on first day)
• (Unfold-PLS)
• Multiway PCR
• N-PLS
• MCovR (multiway covariates regression) (see work of
Smilde & Gurden)
• GRAM, NBRA, RBL (see work of Kowalski et al.)
5
Unfold-PLS
• Matricize (or ‘unfold’) the data and use standard two-way PLS:
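[Figure: the three-way array X (I × J × K) is unfolded slice by slice into a two-way matrix X (I × JK), which is regressed against Y (I × M).]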
• But if a multiway structure exists in the data, multiway
methods have some important advantages!!
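The unfolding step can be sketched in a few lines of NumPy. This is only an illustration: the array sizes and the names `X3` and `X_unf` are assumptions, not part of the slides. The resulting I × JK matrix can then be passed to any standard two-way PLS routine.

```python
import numpy as np

# Hypothetical sizes: I samples, J variables in mode 2, K variables in mode 3.
I, J, K = 4, 3, 2
X3 = np.arange(I * J * K, dtype=float).reshape(I, J, K)

# Place the K frontal slices X3[:, :, k] (each I x J) side by side.
X_unf = np.concatenate([X3[:, :, k] for k in range(K)], axis=1)  # I x JK

# Equivalent one-liner: move mode 3 in front of mode 2, then flatten.
X_unf2 = X3.transpose(0, 2, 1).reshape(I, K * J)

# Column k*J + j of the unfolded matrix holds element (j, k) of each sample.
assert np.array_equal(X_unf, X_unf2)
```

Unfolding discards the information that columns j and k belong to two separate modes, which is the structure the multiway methods exploit.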
6
Two-way PCR
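• Standard PCR for X (I × J) and y (I × 1).
1. Calculate PCA model of X:
$X = TP^T + E$
2. Use PCA scores for ordinary regression:
$y = Tb + E$
$b = (T^T T)^{-1} T^T y$
3. Make predictions for new samples:
$T_{new} = X_{new} P$
$y_{new} = T_{new} b$
[Figure: X is decomposed into scores T and loadings P^T plus residuals E; y is regressed on T to give b.]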
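The three PCR steps can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the function names `pcr_fit` and `pcr_predict` and the simulated noise-free data are hypothetical, and mean-centering (usual in practice) is left out so that the code mirrors the equations as written.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """PCA of X via SVD, then least-squares regression of y on the scores."""
    # Step 1: X = U S Vt, so the loadings P are the leading right singular vectors.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:n_components].T                  # J x R loadings
    T = X @ P                                # I x R scores
    # Step 2: b = (T'T)^-1 T'y
    b = np.linalg.solve(T.T @ T, T.T @ y)
    return P, b

def pcr_predict(X_new, P, b):
    """Step 3: project new samples onto the loadings and apply b."""
    T_new = X_new @ P
    return T_new @ b

# Simulated, noise-free rank-2 data (illustrative only).
rng = np.random.default_rng(0)
T_true = rng.normal(size=(20, 2))
P_true = rng.normal(size=(5, 2))
X = T_true @ P_true.T
y = T_true @ np.array([1.0, -2.0])

P, b = pcr_fit(X, y, n_components=2)
y_hat = pcr_predict(X, P, b)
```

Because the simulated y lies exactly in the space spanned by the two score vectors, `y_hat` reproduces `y` up to rounding error; with real, noisy data it would not.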
7
Multiway PCR
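• Multiway PCR for X (I × J × K) and y (I × 1).
1. Calculate multiway model (X unfolded to I × JK):
$X = A(C \odot B)^T + E$
where $\odot$ is the Khatri-Rao (column-wise Kronecker) product
2. Use scores for regression:
$y = A b_{PCR} + E$
$b_{PCR} = (A^T A)^{-1} A^T y$
3. Make predictions for new samples:
$A_{new} = X_{new} P (P^T P)^{-1}$ where $P = (C \odot B)$
$y_{new} = A_{new} b_{PCR}$
[Figure: X is decomposed into scores A and loadings B^T, C^T plus residuals E; y is regressed on A to give b_PCR.]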
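Steps 2 and 3 of multiway PCR can be sketched as below. The sketch assumes the loadings B and C are already available from a PARAFAC fit; here the simulated true loadings stand in for them, and fitting PARAFAC itself is not shown. The helper `khatri_rao` and all variable names are hypothetical.

```python
import numpy as np

def khatri_rao(C, B):
    """Column-wise Kronecker product: column r is kron(C[:, r], B[:, r])."""
    K, R = C.shape
    J = B.shape[0]
    return (C[:, None, :] * B[None, :, :]).reshape(K * J, R)

# Simulated noise-free trilinear data (illustrative only).
rng = np.random.default_rng(1)
I, J, K, R = 12, 4, 3, 2
A_true = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))      # assumed known from a PARAFAC fit
C = rng.normal(size=(K, R))      # assumed known from a PARAFAC fit
coef = np.array([0.5, 2.0])

P = khatri_rao(C, B)             # JK x R
X = A_true @ P.T                 # unfolded calibration data, I x JK
y = A_true @ coef

# Scores of the calibration samples: A = X P (P'P)^-1
A = X @ P @ np.linalg.inv(P.T @ P)

# Step 2: b_PCR = (A'A)^-1 A'y
b_pcr = np.linalg.solve(A.T @ A, A.T @ y)

# Step 3: scores and predictions for new samples
A_test = rng.normal(size=(3, R))
X_new = A_test @ P.T
A_new = X_new @ P @ np.linalg.inv(P.T @ P)
y_new = A_new @ b_pcr
```

The only difference from two-way PCR is the projection step: the multiway loadings P are not orthonormal, so the new scores need the extra factor (P'P)^-1.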
8
N-PLS
• N-PLS is a direct extension of standard two-way PLS
for N-way arrays.
• The advantages of N-PLS are the same as for any
multiway analysis:
– a more parsimonious model
– loadings which are easier to plot and interpret
9
N-PLS
• The standard two-way PLS algorithm (see ‘Multivariate Calibration’ by Martens and Næs):
1. $\max_{w_r} \operatorname{cov}(X^{(r-1)} w_r, y^{(r-1)})$ with $\|w_r\| = 1$
2. $t_r = X^{(r-1)} w_r$
3. $X^{(r)} = X^{(r-1)} - t_r w_r^T$
4. $y^{(r)} = y^{(0)} - U q^{(r)}$
• The N-PLS algorithm (R. Bro) uses PARAFAC-type loadings, but is otherwise very similar:
1. $\max_{w_r, v_r} \operatorname{cov}(X^{(r-1)} (v_r \otimes w_r), y^{(r-1)})$ with $\|w_r\| = \|v_r\| = 1$
2. $t_r = X^{(r-1)} (v_r \otimes w_r)$
3. $X^{(r)} = X^{(r-1)} - t_r (v_r \otimes w_r)^T$
4. $y^{(r)} = y^{(0)} - U q^{(r)}$
10
N-PLS graphic
(taken from R.Bro)
11
Other methods
• Multiway covariates regression (MCovR)
– different to PLS-type models
– choice of structure on X (PARAFAC, Tucker, unfold etc.)
– sometimes loadings are easier to interpret
– $\min_W \left[ \alpha \|X - XWP_X^T\|^2 + (1 - \alpha) \|Y - XWP_Y^T\|^2 \right]$
• Restricted Tucker, GRAM, RBL, NBRA etc.
– for more specialized use
– second-order advantage, i.e. able to handle unknown
interferents
[Figure: restricted loadings A for a standard with N components and a mixture with N + M components; loading elements are fixed to 1 or 0.]
12
Conclusions
• There are a number of different calibration methods
for multiway data.
• N-PLS is an extension of two-way PLS for multiway
data.
• All the normal guidelines for multivariate regression
still apply!!
– watch out for outliers
– don’t apply the model outside of the calibration range
13
Outliers (1)
• Outliers are objects which are very different from the rest of the data. These can have a large effect on the regression model and should be removed.
[Figure: T (°C) vs pH, before and after removing the outlier caused by a bad experiment.]
14
Outliers (2)
• Outliers can also be found in the model space or in the residuals.
[Figure: two panels, scores PC 2 vs scores PC 1, and sum-of-squared residuals vs time (min).]
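Both kinds of outlier can be screened for with a PCA model, as sketched below. The data are simulated and the two planted outliers (samples 3 and 7) are hypothetical; the leverage used for the model space is one common choice and is an assumption here, not something prescribed by the slides.

```python
import numpy as np

# Simulated two-component data with small noise (illustrative only).
rng = np.random.default_rng(0)
T_true = rng.normal(size=(30, 2))
P_true = rng.normal(size=(6, 2))
X = T_true @ P_true.T + 0.01 * rng.normal(size=(30, 6))

# Hypothetical outlier in the model space: extreme along one latent direction.
X[3] = 10.0 * P_true[:, 0]
# Hypothetical outlier in the residuals: a disturbance the model cannot describe.
X[7] += 0.5 * rng.normal(size=6)

# Two-component PCA model of the centred data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:2].T
T = Xc @ P                       # scores (model space)
E = Xc - T @ P.T                 # residuals

# Leverage in the score space and sum-of-squared residuals per sample.
leverage = np.einsum('ir,rs,is->i', T, np.linalg.inv(T.T @ T), T)
ssq_res = np.sum(E**2, axis=1)

print("largest leverage:", int(np.argmax(leverage)))
print("largest residual:", int(np.argmax(ssq_res)))
```

A sample can be extreme in one diagnostic and unremarkable in the other, which is why both plots are inspected.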
15
Model extrapolation...
• Univariate example: mean height vs age of a group of young children
• A strong linear relationship between height and age is seen.
• For young children, height and age are correlated.
[Figure: height (cm) vs age (months), 18 to 30 months.]
Moore, D.S. and McCabe, G.P., Introduction to the Practice of Statistics (1989).
16
... can be dangerous!
[Figure: height (cm) vs age (years), 0 to 30 years. The linear model was valid for the calibrated age range, but is not valid for 30 year olds!]
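The danger can be reproduced numerically with a straight-line fit. The height values below are illustrative numbers chosen to resemble the plotted range (an assumption; they are not read from the slide), and the variable names are hypothetical.

```python
import numpy as np

# Illustrative mean heights (cm) for ages 18 to 29 months (assumed values).
age_months = np.arange(18, 30, dtype=float)
height_cm = np.array([76.1, 77.0, 78.1, 78.2, 78.8, 79.7,
                      79.9, 81.1, 81.2, 81.8, 82.8, 83.5])

# Ordinary least-squares straight line: height = intercept + slope * age.
slope, intercept = np.polyfit(age_months, height_cm, deg=1)

# Inside the calibration range the prediction is sensible.
height_24m = intercept + slope * 24.0

# Extrapolating to 30 years (360 months) is not.
height_30y = intercept + slope * 360.0

print(f"slope: {slope:.3f} cm/month")
print(f"predicted height at 24 months: {height_24m:.1f} cm")
print(f"predicted height at 30 years:  {height_30y:.1f} cm")
```

The fit is good within the 18 to 29 month range, yet the same line predicts an adult well over 2.5 m tall, because growth does not stay linear outside the calibrated range.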
17