CHEMOMETRICS
Introduction: what is this and why we need it

Juan Antonio Fernández Pierna, Vincent Baeten, Pierre Dardenne
Walloon Agricultural Research Centre (CRA-W)
Valorisation of Agricultural Products Department
Gembloux, Belgium

BASICS OF CHEMOMETRICS
- Introduction
- Some definitions
- Overview of methods
- Examples

X-METRICS
The use of multivariate analysis in the discipline X. A multivariate method is a statistical, mathematical or graphical technique that considers multiple variables simultaneously.
- Biometrics (used in biology)
- Technometrics (used in engineering)
- Psychometrics (used in psychology)
- Chemometrics (used in chemistry)

CHEMOMETRICS – INTRODUCTION
“Chemometrics is the chemical discipline that uses mathematics and statistics to design or select optimal experimental procedures, to provide maximum relevant chemical information by analyzing chemical data, and to obtain knowledge about chemical systems” (D. L. Massart)

Chemometrics is a meeting point of various disciplines: mathematics, statistics, computing, engineering, biology, analytical chemistry, theoretical and physical chemistry, organic chemistry, and industrial, food and feed applications.

The scientific world today
• The data flood generated by modern analytical instrumentation produces large quantities of numbers used to understand and quantify the phenomena around us.
• The evolution of personal computers allows faster acquisition, processing and interpretation of chemical data.
• Every scientist uses software related to mathematical methods or to the processing of knowledge.
• A deeper understanding of those methods, and tools for viewing all data simultaneously, are needed.
CHEMOMETRICS – INTRODUCTION
Chemometrics connects the analytical request, the analytical method and the analytical answer:
- Use of mathematical and statistical methods for selecting optimal experiments: statistical experimental design, Design of Experiments (DoE)…
- Extracting the maximum amount of information when analysing multivariate (chemical) data: classification, process monitoring, multivariate calibration…
Useful at any point in an analysis, from the first conception of an experiment until the data is discarded.

CHEMOMETRICS – APPLICATIONS
Huge growth area in the past 15 years
• Process control and analysis
• Food and feed analysis
• Biology: metabolomics etc.
• Environmental monitoring
• Analytical chemistry

CHEMOMETRICS – DEFINITIONS
Linear algebra is the language of chemometrics. One cannot expect to truly understand most chemometric techniques without a basic understanding of linear algebra, i.e. matrix and vector operations (Wise and Gallagher, 1998).

- Samples are referred to as OBJECTS
- Measurement results (e.g. concentrations, absorbances, …) are referred to as VARIABLES
- A data table of K variables and M objects is referred to as a DATA MATRIX OF SIZE M x K

\[
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\]

The n rows are the M objects (observations, samples) and the p columns are the K variables. If X has 3 rows and 5 columns, it is a 3 x 5 data matrix.

[Figure: spectra plotted as absorbance versus wavelength (1200 to 2400); each spectrum is one sample (row) of X and each wavelength is one variable (column)]

CHEMOMETRICS: extract meaningful information about the objects and the variables from data matrices.

CHEMOMETRICS – IMPORTANT DISCIPLINES
• Sampling, selection of objects and variables
• Clustering
• Multivariate regressions, calibrations and predictions
• Neural networks
• Validation
• Graphical display and outlier detection
Source: Chemometrics – Introduction - Jens C. Frisvad

[Diagram: chemometric workflow. Boxes: calibration data set (X); data matrix pre-treatment; data exploration / pattern recognition by unsupervised analysis, e.g. Principal Component Analysis (PCA), for visualization, clustering tendency, outlier detection and sample selection; model construction / calibration by supervised analysis (regression models), with model building, selection of optimal complexity and outlier detection; validation; prediction on new data, with outliers and uncertainty estimation]

SAMPLE SELECTION
INCLUDE IN THE CALIBRATION SET ALL THE FACTORS OF VARIATION:
Measurement factors:
- room temperature
- operator
- instrument setup
Population sources of variation:
- origins of samples
- processes
- varieties
- storage conditions
- sample preparations (t°, particle size)
- residual moisture

PATTERN RECOGNITION
Grouping of objects, e.g. how similar is the behaviour of compounds, how similar are products…
Looking at relationships:
- Between samples: food samples / patients / people / spectra / …
- Between variables: elemental compositions / compound concentrations / spectral peaks / …

PRINCIPAL COMPONENT ANALYSIS
PCA is a chemometric technique for visualizing high-dimensional data. PCA reduces the dimensionality (the number of variables) of a data set while maintaining as much variance as possible.

[Figure: data cloud with the directions PC1 and PC2, and the score plot of PC 1 (96.09%) versus PC 2 (3.42%)]
[Figure: Samples/Scores Plot of Xoffset, scores on PC 1 (96.09%) versus scores on PC 3 (0.34%), in the object space, with clusters and outliers marked]

CALIBRATION
Quantitative or qualitative estimation, especially for mixtures.
- Univariate calibration: one measurement, e.g. a peak height
- Multivariate calibration: several measurements, e.g. spectra
A multivariate method is a statistical, mathematical or graphical technique that considers multiple variables simultaneously.

MULTIVARIATE CALIBRATION
• Relate instrumental response to chemical concentration.
• Improper calibration yields improper results.
• Linear, non-linear and multivariate calibration algorithms are available, depending on the instrument and the analysis involved.
Linking two sets of data together: peak height to concentration / spectra to concentrations / biological activity to structure / …

MULTIVARIATE CALIBRATION
Calibration: X data (spectra) + Y data (reference values) → calibration model
X data:
- Infrared spectroscopy (IR-Raman)
- Fourier-Transform Infrared spectroscopy (FTIR)
- Nuclear Magnetic Resonance (NMR)
- X-Ray Fluorescence (XRF)
- X-Ray Diffraction (XRD) spectroscopy
- …
Y data:
- Moisture (regression)
- Concentration of protein (regression)
- Country of origin (discrimination)
- PDO (discrimination)
- …

MULTIVARIATE CALIBRATION - VALIDATION
Calibration: X data (spectra) + Y data (reference values) → calibration model
Validation: X data (spectra) + calibration model → predicted Y data, compared with the reference values

MULTIVARIATE CALIBRATION
Calibration uses empirical data and prior knowledge to determine how to predict unknown quantitative information y from available measurements x, via some mathematical transfer function.

Y = f(X) = XB + E

- Y is an n x m matrix containing the reference values
- X is an n x p matrix containing the spectra (NIR, MIR, Raman, …)
- B is a p x m matrix containing the regression coefficients
- E is an n x m matrix accounting for the model error
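
As a rough illustration of the model Y = XB + E and of the calibration/validation split, the Python sketch below estimates B by ordinary least squares on synthetic data and checks it on held-out validation objects. The data, the sizes n, p and m, the noise level and the function names are all assumptions made for this example, not part of the original slides. With real spectra, where p is large and the variables are strongly collinear, methods such as PCR or PLS would usually replace plain least squares.

```python
import numpy as np

def fit_calibration(X, Y):
    """Estimate B in Y = XB + E by ordinary least squares."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B

def rmse(Y_true, Y_pred):
    """Root mean square error, one value per predicted property (column)."""
    return np.sqrt(np.mean((Y_true - Y_pred) ** 2, axis=0))

# Synthetic data (assumption): n objects, p variables, m properties
rng = np.random.default_rng(0)
n, p, m = 60, 5, 2
X = rng.normal(size=(n, p))
B_true = rng.normal(size=(p, m))
Y = X @ B_true + 0.01 * rng.normal(size=(n, m))  # E is small random noise

# Calibration set builds the model, validation set checks it
X_cal, Y_cal = X[:40], Y[:40]
X_val, Y_val = X[40:], Y[40:]

B_hat = fit_calibration(X_cal, Y_cal)
rmse_val = rmse(Y_val, X_val @ B_hat)

print("B shape:", B_hat.shape)
print("validation RMSE per property:", rmse_val)
```

The validation RMSE is computed only on objects that were not used to estimate B, which is the point of keeping the two sets separate.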
OVERVIEW OF METHODS
Multivariate analysis
- Unsupervised
  - Principal Component Analysis (PCA)
  - Cluster Analysis (CA)
- Supervised
  - Regression
    - Multiple Linear Regression (MLR)
    - Principal Component Regression (PCR)
    - Partial Least Squares (PLS)
    - Artificial Neural Networks (ANN)
    - LS–Support Vector Machines (LS-SVM)
    - Local techniques
  - Discrimination
    - PLS-Discriminant Analysis (PLS-DA)
    - SIMCA
    - K-Nearest Neighbours (k-NN)
    - Support Vector Machines (SVM)

Unsupervised - Pattern recognition
→ Seeks similarities and regularities present in the data.

Supervised - Discrimination
→ Regression-based statistical technique used in determining which particular classification or group an item of data or an object belongs to, on the basis of its characteristics or essential features.
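
To make the idea of discrimination concrete, here is a minimal sketch of one of the listed methods, k-nearest neighbours, written with NumPy only. The function name `knn_predict`, the two-variable toy data and the class labels are hypothetical and chosen only for illustration; real applications would work on pre-processed spectra or on PCA scores.

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    """Assign each row of X_new to the majority class among its k nearest training objects."""
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance between every new object and every training object
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k closest training objects for each new object
    nearest = np.argsort(d, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        labels, counts = np.unique(y_train[idx], return_counts=True)
        # Majority vote; ties go to the label that sorts first
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)

# Hypothetical training objects: two well separated groups
X_train = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
                    [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]])
y_train = np.array(["fish", "fish", "fish", "rest", "rest", "rest"])

# Two new objects to classify
X_new = np.array([[0.1, 0.1], [2.1, 2.0]])
pred = knn_predict(X_train, y_train, X_new, k=3)
print(pred)
```

The choice of k and of the distance measure are tuning decisions; an odd k avoids ties when there are only two classes.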
Supervised - Regression
→ It includes any technique for modeling and analyzing several variables, when the focus is on the relationship between a dependent variable (property of interest) and one or more independent variables (spectra).

VARIABLE SELECTION
Find a set of predictor variables which gives a good fit, predicts the dependent value well and is as small as possible.
• Backward elimination: start with all the predictors and potentially drop predictors in subsequent steps.
• Forward selection: start with no predictors and potentially add predictors in subsequent steps.
• Stepwise regression: combination of backward elimination and forward selection.

VARIABLE SELECTION
Reducing the number of variables
Possibility of constructing fast spectrometers based on a reduced number of variables, aiming at a reduction of training and utilization times, to be used, for instance, on a conveyor belt. Moreover, grouping the selected variables in regions allows an easy chemical interpretation of the spectra.

References
- ‘Backward variable selection in PLS (BVSPLS)’. J. A. Fernández Pierna, V. Baeten, P. Dardenne. Analytica Chimica Acta 642 (2009) pp. 89-93.
- ‘Prediction error improvements using variable selection on small calibration sets - a comparison of some recent methods’. S. I. Overgaard, J. A. Fernández Pierna, V. Baeten, P. Dardenne & T. Isaksson. J. Near Infrared Spectrosc. 20, 329-337 (2012).

CHEMOMETRICS – SOFTWARE

EXAMPLE: FEED
[Diagram: FARIMAL classes: Other, Vegetal, Poultry, Bovine + Pig, Terrestrial animal, Fish, Other]

                 CAL, % classified as      LOOCV, % classified as
Belonging to     Fish       Rest           Fish       Rest
Fish             98.6       1.4            96.8       3.2
Rest             9.5        90.5           10         90

93.4 % correct classification

EXAMPLE: CEREALS - IMPURITIES

EXAMPLE: NUTS AND DRIED FRUITS
- Walnut
- Hazelnut
- Brazil nut
- Almond
- Cashew
- Raisin
SVM discrimination models
‘NIR hyperspectral imaging spectroscopy and chemometrics for the detection of undesirable substances in food and feed’. J. A. Fernández Pierna, Ph. Vermeulen, O. Amand, A. Tossens, P. Dardenne and V. Baeten. Special issue, Chemometrics and Intelligent Laboratory Systems 117 (2012) 233-239.

EXAMPLE: OLIVES - IMPURITIES
KNN - PCA models
‘Using a visible vision system for on-line determination of quality parameters of olive fruits’. E. Guzmán, V. Baeten, J. A. Fernández Pierna, J. A. García Mesa. Food and Nutrition Sciences, 4, 90-98 (2013).

EXAMPLE: OLIVE OIL – QUALITY PARAMETERS
‘Application of low-resolution Raman spectroscopy for the analysis of oxidized olive oil’. E. Guzmán, V. Baeten, J. A. Fernández Pierna, J. A. García Mesa. Food Control 22, 2036-2040 (2011).

EXAMPLE: OLIVE OIL – ADULTERATION
‘Detection of the Presence of Hazelnut Oil in Olive Oil by FT-Raman and FT-MIR Spectroscopy’. V. Baeten, J. A. Fernández Pierna, P. Dardenne, M. Meurens, D. L. García González, R. Aparicio. Journal of Agricultural and Food Chemistry, 53 (16) (2005), 6201-6206.

EXAMPLE: SUGAR BEET PLANTS – CYST
Optical microscopy and NIR hyperspectral imaging
SVM discrimination models
‘NIR hyperspectral imaging spectroscopy and chemometrics for the detection of undesirable substances in food and feed’. J. A. Fernández Pierna, Ph. Vermeulen, O. Amand, A. Tossens, P. Dardenne and V. Baeten. Special issue, Chemometrics and Intelligent Laboratory Systems 117 (2012) 233-239.

EXAMPLE: SUGAR BEET PLANTS – CERCOSPORA

EXAMPLE: FEED PRODUCTS
Multivariate regression method comparison: PLS, ANN and LS-SVM

Data set            Size             Properties
Feed                28676 x 700      Ash, Fat, Fibre, Starch, Protein
Feed Ingredients    26652 x 700      Ash, Fat, Fibre, Protein
Fresh Silages       1035 x 700       Dry Matter, Fibre, Protein
Soils               1625 x 700       CEC, COT_SK, N_Kj

[Figure: example of data distribution (Soils N_Kj)]
Pre-processing: SNV + detrend + first derivative

[Figures: bar charts of RMS for pls, ann and svm, each with the series train, val, ks and rd. Panels: Feed (ash, fat, fibre, protein, starch); Feed ingredients (ash, fat, fibre, protein); Fresh silages (dry matter, fibre, protein)]

EXAMPLE: TRANSFER
CALIBRATION TRANSFER FROM DISPERSIVE INSTRUMENTS TO HANDHELD SPECTROMETERS (MEMS)
‘Calibration Transfer from Dispersive Instruments to Handheld Spectrometers’. J. A. Fernández Pierna, P. Vermeulen, B. Lecler, V. Baeten, P. Dardenne. Applied Spectroscopy 64 (6) (2010).

QUESTIONS ABOUT CHEMOMETRICS?
j.fernandez@cra.wallonie.be
http://www.cra.wallonie.be/