C - LQTA - Unicamp

4. Model constraints
Quimiometria Teórica e Aplicada
Instituto de Química - UNICAMP
Principal component analysis (PCA)
• In Hotelling’s (1933) approach, components have
maximum variance.
– $X = TP^T + E$
– Components are calculated successively.
– Components are orthogonal: $T^T T$ = diagonal; $P^T P = I$
• In Pearson’s (1901) and Eckart & Young’s (1936)
approach, components explain the maximum amount of
variance in the variables.
– $X = AB^T + E$
– Components are calculated simultaneously.
– Components have no orthogonality or unit-length
constraints.
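A minimal NumPy sketch of the two formulations above, on synthetic data. The data, the number of components `R` and the mixing matrix `M` are illustrative assumptions, not taken from the slides. The SVD gives scores and loadings that satisfy the orthogonality conditions; multiplying by any invertible `M` gives an unconstrained pair `A`, `B` with exactly the same fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, column-centred data matrix (illustrative only).
X = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 8))
X -= X.mean(axis=0)

R = 2  # number of components (assumed)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Orthogonality-constrained solution: X = T P^T + E.
T = U[:, :R] * s[:R]   # scores, T^T T is diagonal
P = Vt[:R].T           # loadings, P^T P = I
E = X - T @ P.T

# Unconstrained bilinear solution: X = A B^T + E with the same residuals E.
M = rng.standard_normal((R, R))   # any invertible R x R matrix
A = T @ M
B = P @ np.linalg.inv(M).T

print(np.round(T.T @ T, 6))            # diagonal matrix
print(np.round(P.T @ P, 6))            # identity matrix
print(np.allclose(A @ B.T, T @ P.T))   # same fitted part
```

Because `A @ B.T` equals `T @ P.T`, the orthogonality conditions do not cost any fit here; they only pick one solution out of many equivalent ones.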
Constrained least squares
• Solve $\min_{A} \lVert X - AB^T \rVert^2$ under the constraint that A is
non-negative, unimodal, smooth, etc.
• Some constraints are inactive, e.g. PCA under
orthogonality.
• If constraints are active, A is no longer the unconstrained least-squares solution.
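As a rough illustration of an active constraint, the sketch below estimates A for a fixed B with and without non-negativity. The data are synthetic, and using `scipy.optimize.nnls` row by row is one possible choice, not necessarily the algorithm used for the slides.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(1)

# Synthetic bilinear data X = A B^T + noise with a non-negative A (illustrative).
B = np.abs(rng.standard_normal((30, 3)))
A_true = np.abs(rng.standard_normal((10, 3)))
A_true[:5, 0] = 0.0   # true zeros, so noise can push unconstrained estimates negative
X = A_true @ B.T + 0.05 * rng.standard_normal((10, 30))

# Unconstrained least squares: min ||X - A B^T||^2 over A.
A_ls = np.linalg.lstsq(B, X.T, rcond=None)[0].T

# Non-negative least squares, solved for one row of A at a time.
A_nn = np.vstack([nnls(B, x_row)[0] for x_row in X])

def sse(A):
    """Sum of squared residuals of X - A B^T."""
    return float(np.sum((X - A @ B.T) ** 2))

print("negative entries in A_ls:", int((A_ls < 0).sum()))
print("SSE unconstrained:", sse(A_ls))
print("SSE non-negative :", sse(A_nn))
```

The constrained residual sum of squares can never be smaller than the unconstrained one; the two coincide only when the constraint is inactive.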
Why use constraints?
• Obtain solutions that correspond to known chemistry,
making the model more interpretable.
– Concentrations cannot be negative.
• Obtain models that are uniquely identified.
– Remove rotational ambiguity.
• Avoid numerical problems such as local minima and
swamps.
– Constraints can help alternating least squares (ALS) find the correct solution.
Example: curve resolution of HPLC data (1)
• HPLC analysis of three
coeluting
organophosphorus
pesticides.
• Diode-array detector
gives a spectrum at each
time point: X (time ×
wavelength).
• Beer-Lambert law says
$X = CS^T + E$.
• Initial analysis shows
that three analytes are
present.
Data is from Roma Tauler’s web-site http://www.ub.es/gesq/eq1_eng.htm
Download it and try for yourself!
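For readers without the data set at hand, the sketch below builds a synthetic stand-in for the bilinear model $X = CS^T + E$ and counts the singular values that stand clear of the noise. Peak positions, widths, noise level and the threshold rule are all hypothetical choices made for illustration; they are not properties of the pesticide data.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian(x, centre, width):
    """Gaussian peak with unit height."""
    return np.exp(-0.5 * ((x - centre) / width) ** 2)

time = np.linspace(31.0, 31.7, 50)            # elution time (min)
wavelength = np.linspace(180.0, 380.0, 100)   # wavelength (nm)

# Hypothetical elution profiles C (time x 3) and spectra S (wavelength x 3).
C = np.column_stack([gaussian(time, c, 0.06) for c in (31.25, 31.35, 31.45)])
S = np.column_stack([gaussian(wavelength, c, 25.0) for c in (220.0, 260.0, 300.0)])

noise_sd = 1e-3
X = C @ S.T + noise_sd * rng.standard_normal((time.size, wavelength.size))

# Singular values above a rough noise threshold indicate the number of analytes.
sv = np.linalg.svd(X, compute_uv=False)
noise_floor = noise_sd * (np.sqrt(time.size) + np.sqrt(wavelength.size))
n_components = int(np.sum(sv > 3 * noise_floor))

print(np.round(sv[:5], 4))
print("estimated number of components:", n_components)
```

With real data the noise level is rarely known this well, so the threshold would normally be replaced by inspecting the singular values or by cross-validation.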
Example: curve resolution of HPLC data (2)
Unconstrained solution
[Figure: resolved profiles. Panel C: concentration (unit) vs. elution time (min), 31 to 31.7. Panel S: absorption (unit) vs. wavelength (nm), 180 to 380. Both vertical axes extend below zero, to -0.3.]
99.990094% of X explained
Calculation time: 0.43 seconds
Example: curve resolution of HPLC data (2)
Non-negativity constraints
[Figure: resolved profiles. Panel C: concentration (unit) vs. elution time (min), 31 to 31.7. Panel S: absorption (unit) vs. wavelength (nm), 180 to 380. Both vertical axes start at 0.]
99.990079% of X explained
Calculation time: 16 seconds
Example: curve resolution of HPLC data (3)
Unimodality & non-negativity constraints
[Figure: resolved profiles. Panel C: concentration (unit) vs. elution time (min), 31 to 31.7. Panel S: absorption (unit) vs. wavelength (nm), 180 to 380. Both vertical axes start at 0.]
99.989364% of X explained
Calculation time: 16 minutes
Comments
• Active constraints always reduce % fit, but can give a
more interpretable model.
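• It is possible to ‘stack’ two-way data from different
experiments, e.g.
$\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = \begin{bmatrix} C_1 \\ C_2 \\ C_3 \end{bmatrix} S^T + \begin{bmatrix} E_1 \\ E_2 \\ E_3 \end{bmatrix}$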
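The stacking above amounts to a row-wise concatenation of experiments that share the same spectra. A small sketch under assumed, synthetic dimensions (three experiments with 25, 30 and 20 time points, noise-free for clarity):

```python
import numpy as np

rng = np.random.default_rng(3)

n_wavelengths, n_analytes = 40, 3
S = np.abs(rng.standard_normal((n_wavelengths, n_analytes)))   # common spectra

# Three experiments with different numbers of time points (hypothetical sizes).
C_blocks = [np.abs(rng.standard_normal((n_t, n_analytes))) for n_t in (25, 30, 20)]
X_blocks = [C_k @ S.T for C_k in C_blocks]

# Stack the experiments on top of each other; S is shared by all blocks.
X_aug = np.vstack(X_blocks)
C_aug = np.vstack(C_blocks)

# With C_aug given, S follows from a single least-squares step on the stacked data.
S_hat = np.linalg.lstsq(C_aug, X_aug, rcond=None)[0].T

print(X_aug.shape)
print(np.allclose(S_hat, S))
```

In an ALS loop the same stacked matrices would simply replace X and C, so each experiment keeps its own concentration profiles while all of them contribute to one estimate of S.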
What sort of constraints might be useful?
• Hard target: known spectrum, $a_r = s$
• Non-negativity: concentrations, absorbances
• Monotonicity: kinetic profiles
• Unimodality: elution profiles, fluorescence excitations
• Other curve shapes: Gaussian peaks, symmetry
• Selectivity: pure variables
• Functional constraints: first-principle models
• Closure: $[A]_t + [B]_t + [C]_t = y$
• Orthogonality: useful for separation of variances
• ...plus many more...
Conclusions (1)
• Chemical knowledge can be included in your model
by using constraints.
• Constraints can improve the model making it
– closer to reality
– easier to understand
– more robust to extrapolation
• It is possible to mix constraints within the same
mode, e.g. loadings 1 and 3 are non-negative and
loading 2 is unimodal.
Conclusions (2)
• Mixed constraints can be applied using column-wise
estimation:
1. Subtract contribution from other components: $X_{-r} = X - A_{-r}B_{-r}^T$
2. Estimate component under desired constraint: $\min_{a_r} \lVert X_{-r} - a_r b_r^T \rVert^2$
– Bro & Sidiropoulos (1998) have shown that this is equivalent
to solving $\min_{a_r} \lVert a_r - \hat{a}_r \rVert^2$,
where $\hat{a}_r$ is the unconstrained solution, $\hat{a}_r = X_{-r} b_r (b_r^T b_r)^{-1}$.
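A small sketch of the column-wise update, using non-negativity as the constraint. The helper name `update_column_nonneg`, the symbol `a_hat` for the unconstrained solution and the synthetic data are assumptions made for illustration. The last lines check numerically that the two loss functions differ only by a constant and a positive scale factor, which is why they share the same constrained minimiser.

```python
import numpy as np

rng = np.random.default_rng(4)

def update_column_nonneg(X, A, B, r):
    """Update column r of A in X ~ A B^T under non-negativity."""
    others = [k for k in range(A.shape[1]) if k != r]
    X_r = X - A[:, others] @ B[:, others].T   # step 1: remove the other components
    b_r = B[:, r]
    a_hat = X_r @ b_r / (b_r @ b_r)           # unconstrained solution
    a_new = np.maximum(a_hat, 0.0)            # step 2: closest non-negative vector
    return a_new, a_hat, X_r

# Synthetic non-negative bilinear data (illustrative).
A_true = np.abs(rng.standard_normal((15, 3)))
B = np.abs(rng.standard_normal((25, 3)))
X = A_true @ B.T + 0.1 * rng.standard_normal((15, 25))
A = np.abs(rng.standard_normal((15, 3)))      # current estimate of A

a_new, a_hat, X_r = update_column_nonneg(X, A, B, r=0)
b_r = B[:, 0]

def loss(a):
    """Residual sum of squares of X_r - a b_r^T."""
    return float(np.sum((X_r - np.outer(a, b_r)) ** 2))

# For any trial vector a: loss(a) = loss(a_hat) + (b_r^T b_r) * ||a - a_hat||^2.
a_trial = np.abs(rng.standard_normal(15))
lhs = loss(a_trial)
rhs = loss(a_hat) + (b_r @ b_r) * float(np.sum((a_trial - a_hat) ** 2))
print(np.isclose(lhs, rhs))
print(loss(a_new) <= loss(a_trial))
```

For non-negativity the projection is a simple clipping at zero; other constraints such as unimodality need their own projection step, but the structure of the update stays the same.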
ALS for Tucker3
• Step 0: Initialise B, C & G
• Step 1: Estimate A: $\min_{A} \lVert X_{(I \times JK)} - AZ^T \rVert^2$, where $Z^T = G_{(R_1 \times R_2R_3)}(C^T \otimes B^T)$
• Step 2: Estimate B in same way: $\min_{B} \lVert X_{(J \times KI)} - BZ^T \rVert^2$
• Step 3: Estimate C in same way: $\min_{C} \lVert X_{(K \times IJ)} - CZ^T \rVert^2$
• Step 4: Estimate G: $\min_{G} \lVert \mathrm{vec}(X) - Z\,\mathrm{vec}(G) \rVert^2$, where $Z = C \otimes B \otimes A$
• Step 5: Check for convergence. If not converged, go to Step 1.
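The steps above can be sketched in NumPy as follows. This is an unconstrained illustration only: the function names, the random initialisation and the stopping rule are assumptions, and the unfoldings use NumPy's row-major column order, which differs from the Kronecker ordering on the slide but describes the same model. Step 4 uses the fact that the pseudo-inverse of a Kronecker product is the Kronecker product of the pseudo-inverses.

```python
import numpy as np

def unfold(T, mode):
    """Matricise a 3-way array with `mode` as rows (row-major column order)."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def reconstruct(factors, G):
    """Build the model array from loadings A, B, C and core G."""
    A, B, C = factors
    return np.einsum('ip,jq,kr,pqr->ijk', A, B, C, G)

def tucker3_als(X, ranks, n_iter=100, tol=1e-12, seed=0):
    rng = np.random.default_rng(seed)
    # Step 0: random start (the initial A is overwritten in the first pass).
    factors = [rng.standard_normal((n, r)) for n, r in zip(X.shape, ranks)]
    G = rng.standard_normal(tuple(ranks))
    previous = np.inf
    for _ in range(n_iter):
        # Steps 1-3: least-squares update of each loading matrix in turn.
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            Zt = unfold(G, mode) @ np.kron(first, second).T
            factors[mode] = unfold(X, mode) @ np.linalg.pinv(Zt)
        # Step 4: least-squares update of the core array.
        A_p, B_p, C_p = (np.linalg.pinv(F) for F in factors)
        G = np.einsum('pi,qj,rk,ijk->pqr', A_p, B_p, C_p, X)
        # Step 5: stop when the loss no longer decreases noticeably.
        loss = float(np.sum((X - reconstruct(factors, G)) ** 2))
        if previous - loss < tol * max(loss, 1.0):
            break
        previous = loss
    return factors, G, loss

# Usage on synthetic, noise-free data with known Tucker3 structure.
rng = np.random.default_rng(5)
true_factors = [rng.standard_normal((n, r)) for n, r in ((6, 2), (5, 3), (4, 2))]
X = reconstruct(true_factors, rng.standard_normal((2, 3, 2)))
factors, G, loss = tucker3_als(X, ranks=(2, 3, 2))
print("relative residual:", loss / np.sum(X ** 2))
```

Constraints would enter by replacing the plain least-squares updates in steps 1 to 4 with constrained ones, for example fixing a loading to a known spectrum or forcing selected core elements to zero.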
Example: UV-Vis monitoring of a chemical reaction (1)
• Two-step conversion reaction under pseudo-first-order kinetics: A + B → C → D + E
• UV-Vis spectrum (300-500 nm) measured every 10 seconds for 45 minutes
• 30 normal batches measured: X (30 × 201 × 271)
• 9 disturbed batches: pH changes made during the reaction
Example: UV-Vis monitoring of a chemical reaction (2)
3-component PARAFAC model has problems!
[Figure: loadings 1 to 3 of the PARAFAC model in the batch mode (vs. batch number), the wavelength mode (300 to 500) and the time mode (0 to 45). Annotations: loadings are highly correlated; spectra are difficult to interpret.]
Example: UV-Vis monitoring of a chemical reaction (3)
External process information
• Pure spectra of reactant and product known.
[Figure: absorbance (units) vs. wavelength (nm), 300 to 500, for A and for D.]
• No compound interactions allowed: Lambert-Beer law.
• First-order reaction kinetics are known:
$A_t = A_0 e^{-k_1 t}$
$C_t = \frac{k_1 A_0}{k_2 - k_1}\left(e^{-k_1 t} - e^{-k_2 t}\right)$
$D_t = A_0 - A_t - C_t$
Example: UV-Vis monitoring of a chemical reaction (4)
Constrained Tucker3 (1,3,3) model
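$X = AG(C \otimes B)^T + E$
[Diagram: the array X (batch × time × wavelength) is decomposed into loadings A, B and C, core array G and residuals E. Labels: reaction kinetics (C), known spectra, Lambert-Beer law.]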
Example: UV-Vis monitoring of a chemical reaction (5)
Constrained Tucker3 (1,3,3) model
• Core array: $G = [\,g_{111}\ 0\ 0 \mid 0\ g_{122}\ 0 \mid 0\ 0\ g_{133}\,]$
[Figure: loading 1 in the batch mode (vs. batch number) and loadings 1 to 3 in the wavelength mode (300 to 500) and the time mode (0 to 45). Annotations: fixed to known spectrum; fixed to 1st-order kinetics.]
• Spectrum of intermediate is found!
• Rate constants are found: k1 = 0.27, k2 = 0.029
Conclusions (3)
• If you already have some information about your
chemical process, then include it in your model.
• Using constraints can really help to uncover new
information about your data (e.g. find spectra,
estimate rate constants, test models).
• It is possible to build ‘hybrid’ or ‘grey’ models where
some loadings are constrained and others are left
free – see the extra material which follows!
Extra material: Black vs white models
• ‘Black-box’ or ‘soft’ models are empirical models which aim to fit
the data as well as possible, e.g. PCA, neural networks.
– Good fit
– Difficult to interpret
• ‘White’ or ‘hard’ models use known external knowledge of the
process, e.g. physicochemical model, mass-energy balances.
– Good fit
– Easy to interpret
– Not always available
• ‘Grey’ or ‘hybrid’ models combine the two.
Extra material: Grey models mix black and white models
[Diagram: total variation = systematic variation due to known causes + systematic variation due to unknown causes + unsystematic variation (residuals). Labels: reaction kinetics, known concentrations, mechanistic model, model.]
Extra material: Grey model
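[Diagram: the array X (batch × time × wavelength) is decomposed into two Tucker3 terms, each with its own A, B, C and G, plus residuals E. Labels: reaction kinetics, Lambert-Beer law, known spectra.]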
Extra material: Grey model parameter estimation
• A
– White part: -
– Black part: ordinary least squares for [a1 a2 a3]
• B
– White part: fixed (target) loadings, b1 = reactant, b3 = product
– Black part: ordinary least squares for [b2 b4 b5]
• C
– White part: first-order kinetic model, Levenberg-Marquardt optimisation for [c1 c2 c3] = f(k1,k2)
– Black part: ordinary least squares for [c4 c5]
• G
– White part: restricted core array; non-interacting triads have $g_{pqr} = 0$ according to Lambert-Beer
– Black part: ordinary least squares (vectorised) for the elements with $g_{pqr} \neq 0$
Extra material: Grey model parameters
[Figure: loadings of the grey model in the batch mode (vs. batch number), the wavelength mode (300 to 500) and the time mode (0 to 45).]
• White components describe known effects
• Black components can be interpreted
• 99.8% fit (corresponds well with estimated level of spectral noise of 0.13%)
Extra material: Grey model residuals
[Figure: squared residuals plotted against batch number, wavelength (300 to 500) and time (0 to 45).]
Extra material: Off-line monitoring
[Figure: off-line monitoring. D-statistic (within-model variation) vs. batch number, with 95% and 99% confidence limits.]
[Figure: off-line monitoring. Q-statistic (residual variation) vs. batch number, with 95% and 99% confidence limits.]
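The slides do not spell out how the two statistics are computed, so the sketch below uses common textbook definitions as an assumption: D is the squared distance of a sample's scores from the centre of the model, scaled by the score covariance, and Q is the sum of squared residuals. A two-component PCA model on synthetic data stands in for the grey model, and no confidence limits are computed.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical training set: 30 normal batches, 50 variables, 2 latent factors.
n_batches, n_vars, R = 30, 50, 2
loadings_true = rng.standard_normal((n_vars, R))
X_train = rng.standard_normal((n_batches, R)) @ loadings_true.T
X_train += 0.01 * rng.standard_normal(X_train.shape)

# PCA model of the normal batches (stand-in for the grey model).
mean = X_train.mean(axis=0)
U, s, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
P = Vt[:R].T
T = (X_train - mean) @ P
score_cov_inv = np.linalg.inv(T.T @ T / (n_batches - 1))

def d_and_q(x):
    """D-statistic (within-model distance) and Q-statistic (squared residual) of one sample."""
    t = (x - mean) @ P
    residual = (x - mean) - t @ P.T
    return float(t @ score_cov_inv @ t), float(residual @ residual)

D_train, Q_train = np.array([d_and_q(x) for x in X_train]).T

# A disturbed batch: a normal batch plus a large off-model perturbation.
x_disturbed = X_train[0] + 5.0 * rng.standard_normal(n_vars) / np.sqrt(n_vars)
D_new, Q_new = d_and_q(x_disturbed)

print("largest Q among normal batches:", Q_train.max())
print("Q of disturbed batch:", Q_new)
```

A disturbance that the model cannot describe shows up mainly in Q, while an unusual but model-consistent batch shows up in D, which is why the two charts are read together.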
Extra material: On-line monitoring of disturbed batch
[Figure: on-line monitoring. D-statistic vs. time (0 to 45), with 95% and 99% confidence limits.]
[Figure: on-line monitoring. ln(SPE) vs. time (0 to 45), with 95% and 99% confidence limits.]