Determining lamb’s lettuce postharvest age based on visible / near infrared reflectance spectroscopy

Bert A.J.G. Jacobs1, Bert E. Verlinden1a, Els Bobelyn1, An Decombel2, Peter Bleyaert2, Joris Van Lommel3, Isabel Vandevelde3, Wouter Saeys4 and Bart M. Nicolai1,4

1Flanders Centre of Postharvest Technology, Leuven, Belgium; 2Inagro, Rumbeke-Beitem, Belgium; 3Proefstation voor de Groenteteelt, Sint-Katelijne-Waver, Belgium; 4KU Leuven, Department of Biosystems, MeBioS, Leuven, Belgium.

Abstract

Lamb’s lettuce (Valerianella locusta L.) which is presented to the market is not always freshly harvested. The product can be stored up to 28 days and is indistinguishable from fresh material by the human eye. However, due to the prior storage period the shelf life potential is limited and this leads to losses in distribution and a lower quality for the consumer. This work aims to develop a rapid and nondestructive methodology using visible/near infrared (Vis/NIR) reflectance spectroscopy to detect and quantify the postharvest age. Vis/NIR reflectance spectra were linked to the time in storage by partial least squares regression (PLS). Two variable selection techniques, Genetic Algorithms PLS and Monte Carlo Uninformative Variable Elimination PLS, were combined to improve the accuracy and robustness of the prediction model while decreasing the number of wavelengths used. The final model used only 10% of the original wavelength variables while the root mean squared error of cross validation decreased from 6.0 to 3.6 days. The final model was tested using 2 external test sets and had a maximum root mean squared error of prediction of 3.7 days. Therefore, it was concluded that Vis/NIR reflectance spectroscopy can be a valid rapid and nondestructive method for identifying and quantifying the postharvest age of lamb’s lettuce.
Keywords: storage, quality, corn salad, lambs lettuce, near infrared, spectroscopy, multivariate statistics

INTRODUCTION

Lamb’s lettuce is a popular greenhouse vegetable thanks to its ease of use and ready-to-eat character. It is used both as a leafy salad and as an ingredient in ready-to-eat salad mixtures (Enninghorst and Lippert, 2003). However, lamb’s lettuce presented to the market is not always freshly harvested. Depending on the season, it can be stored up to three weeks. Stored samples are visually indistinguishable from fresh produce, but they have impaired shelf life potential (Rico et al., 2007). Detecting prior storage requires a fast and nondestructive measurement that estimates how long a batch of lamb’s lettuce has been stored before it is commercialized. Visible / near infrared (Vis/NIR) spectroscopy may provide such a method.

The potential of NIR to characterize and analyze fruit and vegetables has been shown before (Nicolai et al., 2008). Because NIR spectra have broad overlapping peaks and adjacent wavelengths are highly correlated, multivariate statistical techniques are needed to extract useful information from the NIR spectrum (Nicolai et al., 2007). This is usually done by means of partial least squares regression (PLS). This statistical method uses the independent variables, here the Vis/NIR spectrum, to predict a dependent variable of interest. PLS defines new variables called Latent Variables (LV’s) based on the covariance between the dependent and independent variables. In this way, a PLS model captures the variation in the Vis/NIR spectrum which explains the dependent variable (Naes et al., 2002; Wold et al., 2001).

An important factor for the successful application of multivariate calibration models is their robustness (Zeaiter et al., 2004).
A calibration model which is sufficiently robust for the specific application is based on a calibration dataset which is sufficiently rich in variation. To test the robustness of a calibration model, an appropriate external validation is of prime importance (Bobelyn et al., 2010). Robustness and model accuracy can be improved by preprocessing the spectra and applying wavelength selection techniques to remove irrelevant information which cannot be handled properly by PLS (Xiaobo et al., 2010).

Therefore, the aim of this study was to develop a fast and nondestructive methodology to estimate how long a batch of lamb’s lettuce has been stored before it was presented to the market. The potential of Vis/NIR reflectance spectroscopy in combination with multivariate statistics was tested for this purpose. To achieve robustness, the calibration models were trained on a diverse calibration matrix after preprocessing the spectra and selecting the most useful wavelengths.

MATERIALS AND METHODS

Plant material and storage conditions

Samples of nine cultivars (Agathe, Audace, Baron, Calarasi, Cirilla, Gala, Pulsar, Trophy, Palace) of lamb’s lettuce (Valerianella locusta L.) were harvested between September 2012 and November 2014. Batches harvested before 2014 were grown at the experimental garden Inagro (Rumbeke-Beitem, Belgium). Batches harvested in 2014 contained more diverse plant material from commercial growers. Different treatments were applied during the postharvest period to induce extra variation.

Vis/NIR reflectance spectroscopy

From each selected lamb’s lettuce rosette, only the largest leaf was used for Vis/NIR reflectance spectroscopy measurements.
Adaxial and abaxial reflectance spectra (380 - 1690 nm, wavelength increment 2 nm) of the leaves were acquired using a Zeiss Corona 1.7 (Carl Zeiss, AG, Germany) diode array (Si - InGaAs) with a 0/45° reflectance set-up using a fiber optics probe (Nicolai et al., 2007). For each measurement, the leaf sample was placed between a polished white PTFE block and the measuring head. Spectra were acquired at least once a week for a period of 3 to 4 weeks. At each measurement point during storage, new samples from the same batch were used to minimize the effect of sample handling on the quality of the samples.

Prediction model

1. Data matrix

Based on preliminary results, the data matrix of the independent variables consisted of the concatenated adaxial and abaxial spectra. The dependent variable was storage time after harvest (days). Regions in the combined spectra where the noise was too high (380 - 418 nm) and where the detectors change (962 - 992 nm) were ignored. For practical implementations, a cheaper measurement set-up would be more attractive. Preliminary analysis showed a rise in R² from 0.81 to 0.86 when the wavelengths above 1100 nm were discarded. Excluding wavelengths above 1100 nm also decreased the root mean squared error of calibration (RMSEC) and of cross validation (RMSECV) by 0.5 and 0.7 days, respectively. Therefore, the analyses in this study were limited to the theoretical range of a Si detector (380 - 1100 nm).

The 706 spectra acquired in January 2013 (193), July 2013 (79), November 2013 (138), February 2014 (164) and November 2014 (132) were used as calibration data. A cross validation (CV) was applied to evaluate the PLS model performance during construction. Each harvest period was used as a separate CV group. The 190 spectra acquired in September 2012 were used as a validation set for variable selection to prevent overfitting. An external test set for validating the final prediction models consisted of 165 spectra acquired in March 2013 (81) and May 2014 (84).

2.
Variable Selection

Two wavelength selection techniques were applied to the spectra to improve the prediction potential and robustness of the PLS model by removing wavelength variables which were not informative for predicting the dependent variable. The applied wavelength selection techniques were Genetic Algorithms PLS (GA-PLS) (Lucasius et al., 1994) and Monte Carlo Uninformative Variable Elimination PLS (MC-UVE-PLS) (Cai et al., 2008).

GA-PLS is a combination of Genetic Algorithms with PLS. Genetic Algorithms are an optimization technique inspired by Darwin’s theory of natural selection. Different combinations of equally sized subintervals of the spectrum, called individuals, are made by coding their chromosomes as a sequence of zeros and ones (genes) indicating for each variable or interval of variables whether it is used or not. The PLS model performance for these individuals is then evaluated in CV. The best performing half of the individuals is then selected to ‘breed’ new individuals by creating pairs of parents and exchanging sections of their chromosomes in a single or double cross over. This new generation, consisting of the best half of the individuals from the previous generation and the newly bred individuals, is then again evaluated based on its PLS model performance in CV. This process is repeated until a pre-defined fraction of the individuals share the same genes or until a certain number of generations has been reached (Mehmood et al., 2012).

In MC-UVE-PLS, many PLS models are built, each on a different selection of samples (spectra) drawn by Monte Carlo sampling. We used 1000 Monte Carlo runs with 75% of the samples selected in each run. Each model yields a different regression (beta) coefficient for each wavelength. A reliability index (RI) for each wavelength is then calculated as the mean of its beta coefficients divided by their standard deviation (Cai et al., 2008; Mehmood et al., 2012).
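The reliability index described above can be illustrated with a minimal sketch. It is not the software used in this study: the function names `pls1_coefficients` and `mc_uve_reliability` are hypothetical, the PLS step is a plain single-response NIPALS implementation in NumPy chosen for self-containedness, and the demonstration data are synthetic. Only the defaults of 1000 runs and 75% of the samples per run are taken from the text.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression vector of a PLS1 model (NIPALS) fitted on mean-centred data."""
    Xr = X - X.mean(axis=0)                    # residual X, deflated per LV
    yr = y - y.mean()                          # residual y
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # weight vectors
    P = np.zeros((n_features, n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xr.T @ yr                          # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = Xr @ w                             # scores of this latent variable
        tt = t @ t
        P[:, a] = Xr.T @ t / tt
        q[a] = yr @ t / tt
        W[:, a] = w
        Xr = Xr - np.outer(t, P[:, a])         # remove what this LV explained
        yr = yr - q[a] * t
    # Coefficients expressed in terms of the original (centred) variables.
    return W @ np.linalg.solve(P.T @ W, q)


def mc_uve_reliability(X, y, n_components, n_runs=1000, fraction=0.75, seed=0):
    """Reliability index per variable: mean(beta) / std(beta) over Monte Carlo runs."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    n_keep = int(round(fraction * n_samples))
    betas = np.empty((n_runs, X.shape[1]))
    for r in range(n_runs):
        # Each run fits a PLS model on a random subset of the samples.
        idx = rng.choice(n_samples, size=n_keep, replace=False)
        betas[r] = pls1_coefficients(X[idx], y[idx], n_components)
    return betas.mean(axis=0) / betas.std(axis=0, ddof=1)


# Synthetic demonstration (assumed data): only variables 0, 1 and 2 carry signal.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 20))
y_demo = (3 * X_demo[:, 0] - 3 * X_demo[:, 1] + 3 * X_demo[:, 2]
          + 0.1 * rng.normal(size=200))
ri_demo = mc_uve_reliability(X_demo, y_demo, n_components=2, n_runs=200)
ranking = np.argsort(-np.abs(ri_demo))         # most reliable variables first
print(ranking[:3])
```

Variables whose coefficients keep the same sign and size across the resampled models obtain a large absolute RI, whereas coefficients that fluctuate around zero obtain a small one, which is the property used to rank the wavelengths.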
The absolute value of the RI was the basis on which wavelengths were retained. Wavelength variables with a low absolute value of RI were considered less important for good predictions. The number of variables to retain was selected based on the RMSECV and the root mean squared error of the validation set (RMSEV) of PLS models built with different numbers of the best wavelengths. The maximum number of LV’s in MC-UVE-PLS was set to 13, which is the number of LV’s used by the initial full spectrum model.

3. Preprocessing

Initially, the combined spectra were preprocessed using a multiplicative scatter correction (MSC). MSC is a preprocessing step that attempts to account for offset and scaling effects (Geladi et al., 1985). After variable selection, a Generalized Least Squares weighting (GLSW) was applied following a full spectrum MSC to further reduce the number of LV’s. GLSW identifies interfering signals in the spectra and down-weights them (Martens et al., 2003).

RESULTS AND DISCUSSION

Evaluation of initial model

The initial full spectrum PLS model was based on the spectra of both the adaxial and abaxial leaf sides. The RMSEC and RMSECV were inconclusive for indicating an optimal number of LV’s. There was only a local minimum at 8 LV’s, and the RMSECV kept decreasing with increasing numbers of LV’s without reaching a real minimum. Therefore, the optimal number of LV’s was selected based on the estimated signal to noise ratio (S/N). An estimated S/N greater than or equal to 3 is considered good. As the S/N for the 13th LV was still good, the prediction model with 13 LV’s was selected. This model had an R² of 0.75 and an RMSEC, RMSECV and RMSEV of 3.6, 6.0 and 5.4 days, respectively (Figure 1).

Figure 1. (A) RMSEC (open) and RMSECV (solid) of the basic PLS model with limited preprocessing and no variable selection. (B) Estimated signal to noise ratios for different LV’s. The horizontal line is the threshold (S/N = 3).
Solid and open symbols represent good (S/N ≥ 3) and bad (S/N < 3) signal to noise ratios, respectively.

Variable Selection

4. GA-PLS

The importance of each wavelength in the combined spectrum was determined by how often it was present in the 6226 models that resulted from 25 runs. The optimal number of wavelengths was selected at 60%, which means that 390 wavelength variables were used for the construction of the PLS model (Figure 2A). This model used 13 LV’s and had an R² of 0.78. The RMSEC, RMSECV and RMSEV were 3.3, 4.5 and 4.8 days, respectively.

The RMSECV of the 6226 models constructed using GA-PLS varied between 3.5 and 4.6 days, so poorly performing wavelengths may have been overrepresented. To cope with this problem, a second evaluation was conducted using solely the 10% best performing models of each replicate run based on the RMSECV. The optimal number of wavelengths was selected at 15%, which means 98 variables were used for the construction of the PLS model. This prediction model used 10 LV’s and had an R² of 0.82. The RMSEC, RMSECV and RMSEV were 3.1, 4.3 and 4.1 days, respectively.

5. MC-UVE-PLS

Different prediction models were constructed using different numbers of wavelengths based on the absolute value of RI. The optimal number of wavelengths was selected at 35%, which means 228 wavelengths were used for the construction of the PLS model (Figure 2B). This model used 14 LV’s and had an R² of 0.84. The RMSEC, RMSECV and RMSEV were 3.0, 3.7 and 3.9 days, respectively.

Figure 2. Output and selection of wavelengths of (A) GA-PLS, and (B) MC-UVE-PLS. In the left and middle panes, the dashed line is the mean spectrum, the thick solid regions are wavelengths selected for constructing the PLS model and the horizontal dashed line is the threshold which discriminates between useful and useless wavelengths. The thin solid line in A is the presence of each variable in the models constructed by the GA-PLS.
The thin solid line in B is the rescaled absolute value of RI. In the rightmost pane, the RMSEC (solid circles), RMSECV (open circles) and RMSEV (diamonds) of PLS models constructed using different numbers of wavelengths are shown. The vertical line denotes the selected model for (A) GA-PLS, and (B) MC-UVE-PLS.

Evaluation of the added value of variable selection

When the prediction models with a reduced number of wavelength variables were compared with the initial model, it became clear that both wavelength selection methods improved the RMSECV of the PLS models. The prediction model based on wavelengths selected by GA-PLS had an RMSECV of 4.3 days and an RMSEV of 4.1 days. These similar values imply that the model was quite robust. Of both techniques, MC-UVE-PLS was the most successful, resulting in a PLS model with the highest R² (0.84), the lowest RMSECV (3.7 days) and the lowest RMSEV (3.9 days). Although only 35% of the initial wavelengths were retained for model construction, 14 LV’s were used for the predictions, which was higher than the 10 LV’s of the GA-PLS model.

Combining variable selections

The GA-PLS and the MC-UVE-PLS models used 98 and 228 wavelengths, respectively. The two methods were highly consistent in the wavelengths that neither of them selected: 32% (206) of the initial 650 wavelengths were discarded by both techniques. A combination of the wavelengths selected by both techniques might give even better results than a selection made by a single technique. Therefore, the selected wavelengths were combined either by taking all wavelengths selected by at least one of the techniques or by keeping only the wavelengths on which the two techniques agreed. Table 1 presents the performance of the models based on these combinations together with the number of selected wavelengths. C1 was the only combination which performed as well as MC-UVE-PLS, while including fewer wavelengths.
While the selection based on MC-UVE-PLS used 35% of the initial 650 variables, combination C1 used only 10%. To reduce the number of LV’s, GLSW was applied with an optimal threshold of 1 and 0.4 for MC-UVE-PLS and C1, respectively. These final models were tested using an extra external test set. The root mean squared error of prediction for this extra external test set (RMSEP) was an extra indication of model robustness. Both selected variable sets gave similar results in RMSECV, RMSEV and RMSEP, which indicates that these selections gave robust PLS prediction models (Table 2). As both models performed similarly, the model based on the lowest number of wavelength variables (C1) was selected (Figures 3 and 4).

Table 1. Performance of PLS models constructed by using combinations of wavelengths selected by GA-PLS and MC-UVE-PLS.

Wavelengths selected                      Number of wavelengths   LV’s   R²     RMSEC (days)   RMSECV (days)   RMSEV (days)
Included by GA-PLS                        98 (16%)                10     0.82   3.1            4.3             4.1
Included by MC-UVE-PLS                    228 (35%)               14     0.84   3.0            3.7             3.9
C1: Included by both                      65 (10%)                13     0.82   3.3            3.7             3.9
C2: Included by GA-PLS or MC-UVE-PLS      261 (40%)               12     0.81   3.3            4.4             4.2

Table 2. The performance of 2 models using different numbers of wavelengths tested on data from September 2012 (RMSEV) and data from March 2013 and May 2014 (RMSEP).

Name of combination   Number of wavelengths   LV’s   R²     RMSEC (days)   RMSECV (days)   RMSEV (days)   RMSEP (days)
MC-UVE-PLS            228 (35%)               5      0.85   3.0            3.5             3.8            3.7
C1                    65 (10%)                7      0.83   3.2            3.6             3.7            3.3

Figure 3. Time in storage after harvest plotted against the predicted time in storage. The solid and open symbols are samples from the calibration and external test set, respectively. The dashed line is the optimal regression line and the solid black line is the regression line for C1.

Figure 4. The wavelengths which were included in selection C1. The dashed line is the mean spectrum and the thick solid regions are wavelength variables selected for constructing the PLS model.
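The two combinations in Table 1 amount to the intersection (C1) and the union (C2) of the two wavelength selections. The sketch below illustrates this with boolean masks. The function name `combine_selections` and the ten-element toy masks are hypothetical and are not the selections obtained in this study; only the counts in the final check are taken from Table 1.

```python
import numpy as np


def combine_selections(ga_selected, mc_selected):
    """Combine two boolean wavelength masks defined on the same wavelength axis."""
    ga = np.asarray(ga_selected, dtype=bool)
    mc = np.asarray(mc_selected, dtype=bool)
    if ga.shape != mc.shape:
        raise ValueError("both masks must cover the same wavelength axis")
    return {
        "C1": ga & mc,                    # kept by both techniques
        "C2": ga | mc,                    # kept by at least one technique
        "discarded_by_both": ~(ga | mc),  # kept by neither technique
    }


# Toy masks over ten wavelength variables (assumed values, illustration only).
ga_toy = np.array([1, 1, 0, 0, 1, 0, 0, 1, 0, 0], dtype=bool)
mc_toy = np.array([1, 0, 0, 1, 1, 0, 1, 1, 0, 0], dtype=bool)
combo = combine_selections(ga_toy, mc_toy)
counts = {name: int(mask.sum()) for name, mask in combo.items()}
print(counts)

# Inclusion-exclusion ties the counts together: |A or B| = |A| + |B| - |A and B|.
assert counts["C2"] == int(ga_toy.sum()) + int(mc_toy.sum()) - counts["C1"]

# The same identity holds for the counts reported in Table 1.
assert 98 + 228 - 65 == 261
```

Keeping the selections as masks over the full wavelength axis means the same masks can be used directly to index the columns of the spectral data matrix when the reduced PLS models are built.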
CONCLUSIONS

Vis/NIR reflectance spectroscopy in combination with PLS regression was evaluated as a fast and nondestructive method for the determination and quantification of a prior storage period for lamb’s lettuce. The accuracy and robustness of the predictions improved considerably after wavelength selection, with an RMSECV, RMSEV and RMSEP of 3.6, 3.7 and 3.3 days, respectively. The number of LV’s dropped from 13 to 7, the RMSECV and RMSEV decreased by 2.4 and 1.7 days, respectively, and the R² increased from 0.75 to 0.83. The number of variables used was minimized by combining the output of GA-PLS and MC-UVE-PLS, resulting in 65 essential variables which made up 10% of the initial 650 variables. These wavelengths in the Vis/NIR spectrum contained the essential information related to the time in storage after harvest and had a good signal to noise ratio. It is still possible that certain wavelength variables were influenced by external factors, but this could then be corrected by other wavelength variables to prevent an incorrect prediction.

ACKNOWLEDGEMENTS

This research was carried out as part of IWT project 100885 supported by the Agency for Innovation by Science and Technology in Flanders (IWT) and LAVA, Belgium.

Literature Cited

Bobelyn, E., Serban, A.-S., Nicu, M., Lammertyn, J., Nicolai, B.M., Saeys, W., 2010. Postharvest quality of apple predicted by NIR-spectroscopy: Study of the effect of biological variability on spectra and model performance. Postharvest Biol. Technol. 55, 133–143. doi:10.1016/j.postharvbio.2009.09.006
Cai, W., Li, Y., Shao, X., 2008. A variable selection method based on uninformative variable elimination for multivariate calibration of near-infrared spectra. Chemom. Intell. Lab. Syst. 90, 188–194. doi:10.1016/j.chemolab.2007.10.001
Enninghorst, A., Lippert, F., 2003. Postharvest Changes in Carbohydrate Content of Lamb’s Lettuce, in: International Conference on Quality in Chains. Acta Horticulturae, pp. 553–558.
Geladi, P., MacDougall, D., Martens, H., 1985. Linearization and scatter-correction for NIR reflectance spectra of meat. Appl. Spectrosc. 39, 491–500.
Lucasius, C.B., Beckers, M.L.M., Kateman, G., 1994. Genetic algorithms in wavelength selection: a comparative study. Anal. Chim. Acta 286, 135–153. doi:10.1016/0003-2670(94)80155-X
Martens, H., Høy, M., Wise, B.M., Bro, R., Brockhoff, P.B., 2003. Pre-whitening of data by covariance-weighted pre-processing. J. Chemom. 17, 153–165. doi:10.1002/cem.780
Mehmood, T., Liland, K.H., Snipen, L., Sæbø, S., 2012. A review of variable selection methods in Partial Least Squares Regression. Chemom. Intell. Lab. Syst. 118, 62–69. doi:10.1016/j.chemolab.2012.07.010
Naes, T., Isaksson, T., Fearn, T., Davies, T., 2002. A User-Friendly Guide to Multivariate Calibration and Classification 352.
Nicolai, B.M., Beullens, K., Bobelyn, E., Peirs, A., Saeys, W., Theron, K.I., Lammertyn, J., 2007. Nondestructive measurement of fruit and vegetable quality by means of NIR spectroscopy: A review. Postharvest Biol. Technol. 46, 99–118. doi:10.1016/j.postharvbio.2007.06.024
Nicolai, B.M., Verlinden, B.E., Desmet, M., Saevels, S., Saeys, W., Theron, K., Cubeddu, R., Pifferi, A., Torricelli, A., 2008. Time-resolved and continuous wave NIR reflectance spectroscopy to predict soluble solids content and firmness of pear. Postharvest Biol. Technol. 47, 68–74. doi:10.1016/j.postharvbio.2007.06.001
Rico, D., Martín-Diana, A.B., Barat, J.M., Barry-Ryan, C., 2007. Extending and measuring the quality of fresh-cut fruit and vegetables: a review. Trends Food Sci. Technol. 18, 373–386. doi:10.1016/j.tifs.2007.03.011
Wold, S., Sjöström, M., Eriksson, L., 2001. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 58, 109–130. doi:10.1016/S0169-7439(01)00155-1
Xiaobo, Z., Jiewen, Z., Povey, M.J.W., Holmes, M., Hanpin, M., 2010. Variables selection methods in near-infrared spectroscopy. Anal. Chim. Acta 667, 14–32.
doi:10.1016/j.aca.2010.03.048
Zeaiter, M., Roger, J.-M., Bellon-Maurel, V., Rutledge, D.N., 2004. Robustness of models developed by multivariate calibration. Part I: The assessment of robustness. Trends Anal. Chem. 23, 157–170. doi:10.1016/S0165-9936(04)00307-3