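5. Multiway calibration

Theoretical and Applied Chemometrics (Quimiometria Teórica e Aplicada)
Instituto de Química, UNICAMP

Multiway regression problems: e.g. batch reaction monitoring
[Figure: process measurements X, a three-way array (batch × time × process variable), are related to product quality Y (batch × product quality).]

Multiway regression problems: e.g. tandem mass spectrometry
[Figure: each sample gives an MS-MS spectrum (parent ion m/z × daughter ion m/z). The spectra X1 ... X5 of the samples are stacked into a three-way array X and related to the compound concentrations Y (sample × compound).]

Some terminology
• Zero-order: univariate calibration (OLS, ordinary least squares). Cannot handle interferents.
• First-order: multivariate calibration (ridge regression, PCR, PLS, etc.). Can handle interferents, provided they are present in the training set.
• Second-order: N-PLS (?). Methods with the second-order advantage (PARAFAC, restricted Tucker, GRAM, RBL, etc.) can handle unknown interferents (although see the work of K. Faber).

Multiway calibration methods
• PARAFAC (already discussed on the first day)
• (Unfold-PLS)
• Multiway PCR
• N-PLS
• MCovR (multiway covariates regression) (see the work of Smilde and Gurden)
• GRAM, NBRA, RBL (see the work of Kowalski et al.)

Unfold-PLS
• Matricize (or 'unfold') the data and use standard two-way PLS. The I × J × K array X is unfolded into an I × JK matrix and regressed on Y (I × M).
• But if a multiway structure exists in the data, multiway methods have some important advantages!!

Two-way PCR
• Standard PCR for X (I × J) and y (I × 1):
  1. Calculate a PCA model of X: $\mathbf{X} = \mathbf{T}\mathbf{P}^T + \mathbf{E}$
  2. Use the PCA scores for ordinary regression: $\mathbf{y} = \mathbf{T}\mathbf{b} + \mathbf{e}$, with $\mathbf{b} = (\mathbf{T}^T\mathbf{T})^{-1}\mathbf{T}^T\mathbf{y}$
  3. Make predictions for new samples: $\mathbf{T}_{new} = \mathbf{X}_{new}\mathbf{P}$ and $\hat{\mathbf{y}}_{new} = \mathbf{T}_{new}\mathbf{b}$

Multiway PCR
• Multiway PCR for X (I × J × K) and y (I × 1):
  1. Calculate a multiway (PARAFAC) model of X, unfolded to I × JK: $\mathbf{X} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T + \mathbf{E}$, where $\odot$ is the Khatri-Rao (column-wise Kronecker) product
  2. Use the scores for regression: $\mathbf{y} = \mathbf{A}\mathbf{b}_{PCR} + \mathbf{e}$, with $\mathbf{b}_{PCR} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
  3. Make predictions for new samples: $\mathbf{A}_{new} = \mathbf{X}_{new}\mathbf{P}(\mathbf{P}^T\mathbf{P})^{-1}$, where $\mathbf{P} = \mathbf{C} \odot \mathbf{B}$, and $\hat{\mathbf{y}}_{new} = \mathbf{A}_{new}\mathbf{b}_{PCR}$. Unlike the PCA loadings, $\mathbf{P}$ is not orthonormal, hence the factor $(\mathbf{P}^T\mathbf{P})^{-1}$.

N-PLS
• N-PLS is a direct extension of standard two-way PLS to N-way arrays.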
• The advantages of N-PLS are the same as for any multiway analysis:
  – a more parsimonious model
  – loadings that are easier to plot and interpret

N-PLS
• The standard two-way PLS algorithm (see 'Multivariate Calibration' by Martens and Næs):
  1. $\max_{\mathbf{w}_r} \operatorname{cov}(\mathbf{X}_{r-1}\mathbf{w}_r, \mathbf{y}_{r-1})$ with $\|\mathbf{w}_r\| = 1$
  2. $\mathbf{t}_r = \mathbf{X}_{r-1}\mathbf{w}_r$
  3. $\mathbf{X}_r = \mathbf{X}_{r-1} - \mathbf{t}_r\mathbf{w}_r^T$
  4. $\mathbf{y}_r = \mathbf{y}_0 - \mathbf{U}\mathbf{q}_r$
• The N-PLS algorithm (R. Bro) uses PARAFAC-type loadings but is otherwise very similar (X unfolded to I × JK, $\otimes$ the Kronecker product):
  1. $\max_{\mathbf{w}_r, \mathbf{v}_r} \operatorname{cov}(\mathbf{X}_{r-1}(\mathbf{v}_r \otimes \mathbf{w}_r), \mathbf{y}_{r-1})$ with $\|\mathbf{w}_r\| = \|\mathbf{v}_r\| = 1$
  2. $\mathbf{t}_r = \mathbf{X}_{r-1}(\mathbf{v}_r \otimes \mathbf{w}_r)$
  3. $\mathbf{X}_r = \mathbf{X}_{r-1} - \mathbf{t}_r(\mathbf{v}_r \otimes \mathbf{w}_r)^T$
  4. $\mathbf{y}_r = \mathbf{y}_0 - \mathbf{U}\mathbf{q}_r$

N-PLS graphic (taken from R. Bro)
[Figure: N-PLS graphic, taken from R. Bro.]

Other methods
• Multiway covariates regression (MCovR)
  – different from PLS-type models
  – choice of structure on X (PARAFAC, Tucker, unfolded, etc.)
  – sometimes the loadings are easier to interpret
  – minimizes a weighted sum of the X and Y fitting errors, with weight $0 \le \alpha \le 1$:
$$\min_{\mathbf{W}} \; \alpha\,\|\mathbf{X} - \mathbf{X}\mathbf{W}\mathbf{P}_X^T\|^2 + (1-\alpha)\,\|\mathbf{Y} - \mathbf{X}\mathbf{W}\mathbf{P}_Y^T\|^2$$
• Restricted Tucker, GRAM, RBL, NBRA, etc.
  – for more specialized use
  – second-order advantage, i.e. able to handle unknown interferents
[Figure: a standard containing N compounds and a mixture containing N + M compounds, with restricted loadings A (fixed entries of 1 and 0).]

Conclusions
• There are a number of different calibration methods for multiway data.
• N-PLS is an extension of two-way PLS to multiway data.
• All the normal guidelines for multivariate regression still apply!!
  – watch out for outliers
  – don't apply the model outside the calibration range

Outliers (1)
• Outliers are objects that are very different from the rest of the data. They can have a large effect on the regression model and should be removed, for example when they result from a bad experiment.
[Figure: T (°C) vs pH, before and after removing an outlier caused by a bad experiment.]

Outliers (2)
• Outliers can also be found in the model space or in the residuals.
[Figure: scores on PC 1 vs PC 2, and sum-of-squared residuals vs time (min).]

Model extrapolation...
• Univariate example: mean height vs age for a group of young children.
• A strong linear relationship between height and age is seen.
• For young children, height and age are correlated.
[Figure: mean height (cm) vs age (months), 18 to 30 months.]
Moore, D.S. and McCabe, G.P., Introduction to the Practice of Statistics (1989).

... can be dangerous!
• The linear model was valid for this age range...
• ...but it is not valid for 30-year-olds!
[Figure: height (cm) vs age (years), 0 to 30 years.]
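The warning on the last two slides can be turned into a simple safeguard. Record the range of the predictor seen during calibration, then flag any new sample that falls outside it. The Python/NumPy sketch below is illustrative only. The function names `fit_line` and `predict_with_range_check` are hypothetical, and the calibration data are synthetic (they are not the Moore and McCabe data). A per-variable min/max check is also only the crudest form of range check.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones_like(x), x])  # intercept column + x
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1]

def predict_with_range_check(b0, b1, x_cal, x_new):
    """Predict y for x_new and flag which samples lie inside the calibration range."""
    x_new = np.asarray(x_new, dtype=float)
    lo, hi = np.min(x_cal), np.max(x_cal)
    inside = (x_new >= lo) & (x_new <= hi)  # False means extrapolation
    return b0 + b1 * x_new, inside

# Hypothetical calibration data: ages 18-29 months, exactly linear heights (illustrative only)
x_cal = np.arange(18, 30, dtype=float)
y_cal = 65.0 + 0.6 * x_cal

b0, b1 = fit_line(x_cal, y_cal)
# 24 months lies inside the calibration range; 360 months (30 years) does not
y_pred, inside = predict_with_range_check(b0, b1, x_cal, [24.0, 360.0])
for age, h, ok in zip([24.0, 360.0], y_pred, inside):
    status = "ok" if ok else "EXTRAPOLATION - do not trust"
    print(f"age {age:5.0f} months: predicted height {h:6.1f} cm ({status})")
```

Both predictions come from the same fitted line, and only the range flag tells them apart. With this synthetic line, the model predicts about 281 cm for a 30-year-old, which is exactly the kind of extrapolation the slides warn about.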