Postharvest Biology and Technology 34 (2004) 117–129

Measuring surface distribution of carotenes and chlorophyll in ripening tomatoes using imaging spectrometry

G. Polder a,b,∗, G.W.A.M. van der Heijden a, H. van der Voet a, I.T. Young b

a Biometris, PO Box 100, 6700 AC, Wageningen, The Netherlands
b Pattern Recognition Group, Department of Imaging Science and Technology, Delft University of Technology, Lorentzweg 1, 2628 CJ, Delft, The Netherlands

∗ Corresponding author. Tel.: +31-317476842; fax: +31-317483554. URL: http://www.ph.tn.tudelft.nl/∼polder.

Received 19 September 2003; accepted 3 May 2004

0925-5214/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.postharvbio.2004.05.002

Abstract

Tomatoes (Lycopersicum esculentum, Mill. cv. Capita F1) were harvested at different ripening stages. Spectral images from 400 to 700 nm with a resolution of 1 nm were recorded. After recording, samples were taken from the fruit wall and the lycopene, lutein, β-carotene, chlorophyll-a and chlorophyll-b concentrations were measured using HPLC. The relation between the compound concentrations measured with HPLC and the spectral images was analyzed using partial least squares (PLS) regression. The Q² value (predicted percentage variation) of the lycopene concentration, determined from the PLS procedure, was 0.95 on a pixel basis and 0.96 on a tomato basis. The Q² values of the other compounds varied from 0.73 to 0.84. Pixel-based regression made it possible to construct concentration images of the tomatoes, which showed non-uniform ripening. The method can be applied in a conveyor belt system using sorting criteria such as the concentration of the compounds and the uniformity of their distribution.

Keywords: Imaging spectrometry; Tomato; Lycopene; Chlorophyll; PLS; HPLC; Compounds

1. Introduction

Tomatoes (Lycopersicum esculentum, Mill.) are widely consumed either raw or after processing. Tomatoes are known as health-promoting fruit because of the antioxidant properties of their main compounds (Velioglu et al., 1998). Antioxidants are important in disease prevention in plants as well as in animals and humans. Their activity is based on inhibiting or delaying the oxidation of biomolecules by preventing the initiation or propagation of oxidizing chain reactions (Velioglu et al., 1998). The most important antioxidants in tomato are carotenes (Clinton, 1998) and phenolic compounds (Hertog et al., 1992). Amongst the carotenes, lycopene dominates. The lycopene content varies significantly with ripening and with the variety of the tomato, and lycopene is mainly responsible for the red color of the fruit and its derived products (Tonucci et al., 1995). Lycopene appears to be relatively stable during food processing and cooking (Khachik et al., 1995; Nguyen and Schwartz, 1999). Epidemiological studies have suggested a possible role for lycopene in the protection against some types of cancer (Clinton, 1998) and in the prevention of cardiovascular disease (Rao and Agarwal, 2000). The second most important carotenoid is β-carotene, which accounts for about 7% of the total carotenoid content (Gould, 1974). The amount of carotenes as well as their antioxidant activity is significantly influenced by the tomato variety (Martinez-Valverde et al., 2002) and maturity (Arias et al., 2000).

Ripening of tomatoes is a combination of processes including the breakdown of chlorophyll and the build-up of carotenes. Chlorophyll and carotenes have specific, well-known reflection spectra. Using these known spectral properties of the main constituent compounds, their concentrations can be calculated from spectral measurements. Arias et al. (2000) found a good correlation between color measurements using a chromameter and the lycopene content measured by HPLC.
In order to sort tomatoes according to the distribution of their lycopene and chlorophyll content, a fast in-line imaging system is needed that can be placed on a conveyor belt sorting machine. Since the compounds can be determined by their reflection spectra, a spectral imaging system with many wavelength bands is preferred over a standard color camera. The aim of our research is to develop methods for measuring the spatial distribution of compounds using such a system. The spectral data were correlated with compound concentrations measured by HPLC.

2. Materials and methods

2.1. Tomato samples

Tomatoes (Capita F1 from De Ruiter Seeds, Bergschenhoek, The Netherlands) were grown in a greenhouse at Plant Research International, Wageningen, The Netherlands. The tomatoes were harvested at different ripening stages, varying from mature green to intense red, and scored by visual evaluation performed by a five-member sensory panel. The ripeness stage was determined using a tomato color chart standard (The Greenery, Breda, The Netherlands), which is commonly used by breeders. In total, 37 tomatoes were used in this experiment. After the tomatoes had been washed and dried thoroughly, spectral images were recorded. Immediately after the recording of each tomato, four circular samples of 16 mm diameter and 2 mm thickness were excised from the outer pericarp; after determination of the sample fresh weight, they were frozen in liquid nitrogen and stored under a nitrogen atmosphere at −80 °C.

2.2. HPLC analysis

The tomato samples (approximately 500 mg) were ground in a mortar in liquid nitrogen, followed by grinding in 4 ml acetone with 50 mg CaCO3. After centrifugation, pellets were re-extracted successively with 4 ml acetone, 2 ml hexane and 5 ml acetone:hexane (4:1). All solvents contained 0.1% (w/v) butylated hydroxytoluene (BHT). The supernatants were combined, measured and filtered through 0.2 µm nylon syringe filters into HPLC vials.
All processes were performed as much as possible under subdued or safe light and a nitrogen atmosphere. For HPLC analysis, a Spectra Physics SP8800 pump system equipped with a Spectra Physics Spectra 100 UV-Vis detector was used (Spectra Physics, Mountain View, CA, USA). For system processing and data acquisition, a computer system running WINner on Windows software was used (Thermo Separation Products, Wirral, UK). Samples were injected by a Waters 717 autosampler (Milford, MA, USA) equipped with a cooler. The pigments were separated according to Gilmore and Yamamoto (1991) using a non-endcapped Allsphere ODS-1 HPLC column (4.6 mm × 250 mm, 5 µm particle size) preceded by an ODS-1 guard column (Alltech Associates). The stainless steel column frit/insert was replaced by a PEEK Alloyed with Teflon (PAT) column frit/insert. The column temperature was 30 °C. The mobile phase consisted of acetonitrile:methanol:0.1 M tris buffer, pH 8.0 (86:10:4, solvent A) and methanol:hexane (4:1, solvent B). All solvents contained 0.1% (w/v) butylated hydroxytoluene. The flow rate was 1 ml min−1, the sample injection volume was 20 µl, and spectrophotometric detection was performed at 445 nm. The concentrations of the pigment standard stock solutions were determined spectrophotometrically using published absorbance coefficients (Konings and Roomans, 1997). Calibration curves were made with the standard stock solutions to quantify the pigments. Peaks were identified by their retention times relative to trans-β-apo-8′-carotenal, which was used as internal standard. The newly developed HPLC method for tomatoes was not designed specifically to determine whether compounds are present or absent.
No limit of detection (LOD) was determined; the LOD is defined as the lowest level that can be reliably detected (Currie, 1995; ISO, 1997) and presupposes the existence of decision criteria for detection in the analytical method. When a compound is absent, or present only at a very low concentration, either no HPLC peak is found or the signal is too distorted by noise or interference to allow a sensible estimate of the concentration. In practice, the analytical chemist decides on a level, called the limit of reporting (LOR), below which no numerical values are reported. Unlike the LOD, the LOR is not a performance characteristic of the method, but rather a censoring limit. The limit of reporting for lycopene, β-carotene and lutein was 0.5 µg/g fresh weight (FW). For chlorophyll the limit of reporting was 2.5 µg/g FW. Lycopene, β-carotene, lutein, chlorophyll-a and chlorophyll-b were purchased from Sigma-Aldrich Company (St. Louis, Missouri, USA). β-Apo-8′-carotenal was acquired from Fluka Chemie AG (Buchs, Switzerland). HPLC grade solvents were obtained from Acros Organics (Fisher Scientific Company, ’s-Hertogenbosch, The Netherlands).

2.3. Imaging spectrometry

The spectral images in this experiment were obtained using an ImSpector spectrograph (Spectral Imaging Ltd., Oulu, Finland; Herrala and Mauri, 1993; Hyvärinen et al., 1998) in combination with a stepper table. At each step the spectrum is recorded for one line of the object. By translating the object with respect to the camera, a full spectral image can be obtained. The ImSpector is available for several wavelength ranges. The type used, V7, has an observed spectral range of 396–736 nm and a slit size of 13 µm, resulting in a spectral resolution of 0.9 nm. Details of the calibration of this system can be found in Polder et al. (2003). The spectral images were recorded using two Dolan-Jenner PL900 illuminators (Andover St.
Lawrence, Mass., USA) with 150 W quartz halogen lamps. These lamps have a relatively smooth emission between 380 and 2000 nm. Glass fiber optic line arrays with a 0.02 in. × 6 in. aperture and rod lenses for the line arrays (Vision Light Tech, Uden, The Netherlands) were used to illuminate the scene. The camera was a Qimaging PMI-1400 EC Peltier-cooled camera with a Nikon 55 mm lens, with the ImSpector V7 mounted between the lens and the camera. The frame grabber was a Datacell Limited (Berkshire, UK) Snapper board. The digitized pixels had a resolution of 12 bits, giving 4096 distinguishable grey-value steps. The object was moved with respect to the camera on a Lineartechniek Lt1-Sp5-C8-600 translation table (Andelst, The Netherlands), driven by an SDHWA 120 programmable microstepping motor driver (Ever Elettronica, Italy). The resolution of this translation table was ±30 µm and its maximum speed 250 mm/s.

When the full camera resolution is used, the spectral image is over-sampled because of limitations of the lens and ImSpector optics. Binning can therefore be used to reduce the size of the image without losing information (Polder et al., 2003). The software allows the image to be binned separately along the spatial and the spectral axes during capture. The binning factor was 4 along both the spatial x-axis and the spectral axis. The step size of the stepper table was chosen to match the binned spatial resolution in the x-direction. The number of steps was chosen to capture one tomato and the grey reference in one image. The resulting images have a spatial dimension of 318 × 256 square pixels and a spectral dimension of 257 bands (about 42 MB). The software to control the stepper table and frame grabber, and to construct, save and display the hyperspectral images, was developed in-house as a single Java program. A detailed description of the system can be found in Polder et al. (2003).

2.4.
Data preprocessing

To remove spectral variation that is caused not by differences in compound concentration but by external effects such as aging of the light source, non-uniform lighting and shading, data preprocessing techniques must be applied. Swierenga (2000) gives an overview of preprocessing techniques with their corresponding spectral effects. The preprocessing techniques used in this research are as follows.

The measured spectra depend on the spectrum of the light source, which varies in time owing to aging. Polder et al. (2002) have shown that, using the Shafer reflection model (Shafer, 1985), the spectral image can be made independent of the spectral power distribution of the light source with the following equation:

$$R_\lambda = \frac{I_\lambda - B_\lambda}{W_\lambda - B_\lambda} \tag{1}$$

where $R_\lambda$ is the color-constant reflection value at wavelength $\lambda$, $I_\lambda$ the original measured spectral reflection value, $W_\lambda$ the white or grey reference intensity, and $B_\lambda$ the black reference, which is generated by the dark current of the camera. The grey reference signal in this experiment was obtained by recording patch 21 of the GretagMacbeth standard color checker (New Windsor, NY) in each image. According to the manufacturer, the spectral reflectance of this reference is 0.36 from 300 to 900 nm.

Assuming matte surfaces, Stokman and Gevers (1999) have shown that the images can also be made invariant to object geometry. Normalizing the spectral values $R_\lambda$ according to Eq. (2) results in a spectral value $X_\lambda$ that is independent of the spectral power distribution of the illumination, the illumination direction and the object geometry:

$$X_\lambda = \frac{R_\lambda}{\sum_{\lambda'=1}^{n} R_{\lambda'}} \tag{2}$$

where the sum runs over all $n$ wavelength bands.

Savitzky–Golay smoothing (Savitzky and Golay, 1964) was used to smooth the spectra. The procedure was combined with a first-order derivative to remove the baseline of the spectra.

2.5. Partial least squares regression

Partial least squares (PLS) regression models (Geladi and Kowalski, 1986; Helland, 1990) are widely used to extract useful information from spectroscopic data. A PLS regression model relates the spectral information to quantitative information on the measured samples (in our case the concentrations of different compounds in the tomatoes). PLS is an iterative process in which, at each iteration, scores, weights, loadings and inner coefficients are calculated. Then the X- and Y-block residuals are calculated and the entire procedure is repeated for the next factor (commonly called a latent variable, LV, in PLS). The optimal number of LVs in the final PLS model is determined by cross-validation using contiguous block data subsets. The PLS algorithm used is SIMPLS, developed by De Jong (1993).

From each tomato a bottom-view spectral image was captured. The center part of this image was ignored because of specular reflection. In order to compare the variation in spectrally predicted concentrations with the variation in measured HPLC concentrations, eight circular patches were defined on the tomato, each about the same size as the samples used in the HPLC analysis. Fig. 1 shows the layout.

Fig. 1. Image showing the masks which are used to extract tomato spectra.

From each of the 8 patches, 25 spectra were extracted for the PLS regression, giving 200 spectra per tomato. These spectra form the X-block in the PLS regression and cross-validation. The size of the contiguous blocks was also chosen as 200, so that the cross-validation acts as leave-one-out cross-validation on whole tomatoes.

The performance of the PLS models can be measured by calculating the root-mean-square error of prediction (RMSEP) and the predicted percentage variation $Q^2$. These values can be computed at pixel level (Eqs. (3) and (4)) and at tomato level (Eqs. (5) and (6)):

$$\mathrm{RMSEP}_{\mathrm{pixel}} = \sqrt{\frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(\hat{y}_{ij(-i)} - y_i\right)^2}{nm}} \tag{3}$$

$$Q^2_{\mathrm{pixel}} = 1 - \frac{\sum_{i=1}^{n}\sum_{j=1}^{m}\left(\hat{y}_{ij(-i)} - y_i\right)^2}{m\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{4}$$

$$\mathrm{RMSEP}_{\mathrm{tomato}} = \sqrt{\frac{\sum_{i=1}^{n}\left(\frac{1}{m}\sum_{j=1}^{m}\hat{y}_{ij(-i)} - y_i\right)^2}{n}} \tag{5}$$

$$Q^2_{\mathrm{tomato}} = 1 - \frac{\sum_{i=1}^{n}\left(\frac{1}{m}\sum_{j=1}^{m}\hat{y}_{ij(-i)} - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{6}$$

where $\hat{y}_{ij(-i)}$ is the predicted concentration of pixel $j$ of tomato $i$, based on the calibration set of all tomatoes except tomato $i$; $y_i$ is the compound concentration of tomato $i$ measured with HPLC (these serve as reference values in the pixel case too, since no HPLC values per pixel are available); $\bar{y}$ is the mean concentration of all tomatoes; $n$ is the number of tomatoes; and $m$ is the number of selected pixels per tomato.

The Y-block in PLS regression is not necessarily a one-dimensional array with one compound concentration per X spectrum. It can contain the concentrations of several compounds; this is called PLS2, since the Y-block is then two-dimensional. To distinguish PLS with a one-dimensional Y-block from PLS2, we refer to it as PLS1. All analyses were done using Matlab (The MathWorks Inc., Natick, Mass., USA), the Matlab PRTools toolbox (Faculty of Applied Physics, Delft University of Technology, The Netherlands) and the PLS Toolbox (Eigenvector Research Inc., Manson, WA, USA).

3. Results and discussion

3.1. HPLC analysis

In total, 37 tomatoes were used in this experiment, and four samples were taken from each tomato for further analysis. One sample was used to measure the fresh-weight/dry-weight ratio. Since HPLC analysis is quite expensive, not all of the three remaining samples of every tomato were analyzed. Unripe and very ripe tomatoes are not commonly used in sorting applications, so fewer samples of these tomatoes were analyzed. Table 1 shows the number of samples analyzed per tomato for each maturity stage. For example, six tomatoes of ripeness stage 7–8 were used: from two of them one sample was analyzed by HPLC, and from the other four three samples were analyzed.
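Because each tomato contributes one, two or three analyzed samples, the sample counts in Table 1 can be checked with simple arithmetic: for ripeness stage 7–8, 2 × 1 + 4 × 3 = 14 samples. The short Python sketch below repeats this check for every ripeness class, using counts transcribed from Table 1; the dictionary layout and names are illustrative only.

```python
# Number of tomatoes per ripeness class from which 1, 2 or 3 pericarp samples
# were analyzed by HPLC (counts transcribed from Table 1).
tomatoes_by_class = {
    "1-2": (8, 0, 1),
    "3-4": (3, 0, 2),
    "5-6": (2, 0, 3),
    "7-8": (2, 0, 4),
    "9-10": (2, 0, 1),
    "11-12": (1, 3, 2),
    ">12": (2, 1, 0),
}


def samples_analyzed(counts):
    """Samples = 1 x (one-sample tomatoes) + 2 x (two-sample) + 3 x (three-sample)."""
    return sum(k * n for k, n in zip((1, 2, 3), counts))


samples = {cls: samples_analyzed(c) for cls, c in tomatoes_by_class.items()}
n_tomatoes = sum(sum(c) for c in tomatoes_by_class.values())
n_samples = sum(samples.values())
print(samples["7-8"], n_tomatoes, n_samples)   # 14 37 67
```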
The total number of samples was 67. Table 2 gives some descriptive statistics of the HPLC measurements. The coefficient-of-variation components for tomatoes and for samples within tomatoes were estimated by fitting the multiplicative model $y_{ij} = \mu \cdot \mathrm{tomato}_i \cdot \mathrm{sample}_{ij}$, assuming independent lognormal distributions for $\mathrm{tomato}_i$ and $\mathrm{sample}_{ij}$, to the logarithms of the positive measurements $y_{ij}$. On the logarithmic scale this is a simple variance-components model, which was fitted by residual maximum likelihood (REML) in Genstat Release 6.1 (Searle et al., 1992; Genstat, 2002). The variance components $v$ obtained on the logarithmic scale were translated into coefficient-of-variation components on the original scale by $\mathrm{CV} = \sqrt{e^{v} - 1}$ (which is exact for lognormal distributions).

From Table 2 we learn that the mean level of lutein is only slightly above the limit of reporting. In this light it is remarkable that lutein could be detected in all but two tomatoes. Many more non-detects were found for chlorophyll-a and -b, which show much more variation than lutein, both between and within tomatoes. The variation between samples within tomatoes is much smaller than the variation between tomatoes. However, it is still considerable (CVs between 15 and 37%), and it may be caused by non-uniform ripening, which can also be seen in the color distribution over the fruit surface (Choi et al., 1995).

Table 1
Number of samples per tomato used in HPLC analysis, for each maturity stage of the tomatoes analyzed

Ripeness                1 sample   2 samples   3 samples   No. of tomatoes   No. of samples
1–2                         8          0           1              9                11
3–4                         3          0           2              5                 9
5–6                         2          0           3              5                11
7–8                         2          0           4              6                14
9–10                        2          0           1              3                 5
11–12                       1          3           2              6                13
>12                         2          1           0              3                 4
Total no. of tomatoes      20          4          13             37
Total no. of samples       20          8          39                               67

The columns "1 sample", "2 samples" and "3 samples" give the number of tomatoes from which that many samples were analyzed. The maturity stage was determined visually using a tomato color chart standard. Tomatoes of maturity >12 were clearly over-ripe.

Table 2
Summary of HPLC measurements

                                 No. of tomatoes    Statistics for tomatoes with results ≥LOR
Compound        LOR (µg/g FW)    N<LOR    N≥LOR     Mean (µg/g FW)   CVtomato   CVsample
Lycopene             0.5            6        31          70.33          5.42       0.32
β-Carotene           0.5            0        37           5.67          0.81       0.15
Lutein               0.5            2        35           0.77          0.51       0.21
Chlorophyll-a        2.5           21        16          13.98          0.73       0.27
Chlorophyll-b        2.5           21        16           4.85          0.86       0.37

LOR, limit of reporting; FW, fresh weight; N, number of tomatoes (multiple measurements per tomato were all either below or above the LOR); Mean, mean of tomato means; CV, coefficient of variation (σ/µ). CVs have been estimated from a multiplicative model with lognormal factors for tomato and sample (see text).

3.2. Spectroscopic data and preprocessing

Each tomato was recorded separately together with a grey reference. At ∼500 nm the reflection values of the reference are high compared with the reflection of the tomato surface, so the wavelength plane at this wavelength was used to segment the grey reference by thresholding. The threshold value was empirically set at 300 (out of 4096). The binary image obtained was used as a mask to extract the reference pixels from the spectral image, and the grey reference $W_\lambda$ was calculated by averaging all these pixels. The tomato was separated from the background by thresholding the sum image of all wavelength bands. The reference was removed from the resulting binary image with an exclusive-or operation using the previously obtained mask image. The center of the tomato, where the reflection was disturbed by specularity of the fruit skin, was also excluded. The raw spectra obtained were made color-constant using Eq. (1) and normalized using Eq. (2). Savitzky–Golay smoothing combined with a first derivative was used to remove the baseline and noise, with a filter width of 15 and a polynomial order of 2. Fig.
4 shows raw, color-constant, normalized color-constant and Savitzky–Golay smoothed reflection spectra from all tomatoes. Table 3 tabulates the effect of the various preprocessing methods on the error and the simplicity (number of LVs) of the regression results. This test was done using PLS cross-validation on 200 randomly selected pixel spectra per tomato, with the lycopene concentrations obtained by HPLC in the Y-block.

Table 3
The effect of several preprocessing methods on the error and the simplicity of the regression results for the PLS regression of tomato spectra on lycopene concentration

Preprocessing              RMSEP/µ   Q²     LV
None (raw spectra)          0.26     0.91    7
CC                          0.28     0.90    9
CC + NORM                   0.20     0.95    8
CC + NORM + SAVGOL          0.20     0.95    8
CC + NORM + SAVGOL 1st      0.20     0.95    5

Combinations of color-constant (CC), normalized (NORM), Savitzky–Golay smoothing (SAVGOL), and Savitzky–Golay smoothing in combination with a first derivative (SAVGOL 1st) were used.

From this table we learn that making the spectra color-constant slightly degrades the regression result. The reason for this is not quite clear, although it may be that the noisy part of the spectrum is enhanced. The images were captured within a short period of a couple of days, during which the illuminant aged very little; for this reason the regression on the raw spectra was reasonably good. Normalizing the spectra reduces the error significantly, indicating that the object geometry of the fruit has a significant influence on the spectra. This is due to the overall reflectance intensity, which falls off near the border of the fruit. Noise is reduced by Savitzky–Golay smoothing, but this has no effect on the regression result, indicating that the PLS regression method is not very sensitive to this type of noise. When Savitzky–Golay smoothing in combination with a first-derivative filter is used (Fig.
4d), the error remains the same but the model becomes simpler, since 5 instead of 8 LVs are sufficient to describe the data. For the analysis in the remainder of this paper, color-constant normalized spectra with first-derivative Savitzky–Golay smoothing were therefore used (bottom row, Table 3).

3.3. Predicting concentrations with PLS regression

The chemical processes in ripening tomatoes involve the breakdown of chlorophyll and the build-up of carotenes, which are correlated processes. It is therefore reasonable to consider PLS2 regression with the concentrations of all compounds in the Y-block. We expected some problems, however, because in this case the Y-block contains many missing values. As can be seen in Figs. 2 and 3, the missing values for chlorophyll-a and -b occur at maturity >6, and those for lycopene at maturity <2. Missing values are not allowed, and when they are removed from the Y-block only tomatoes with maturity between 2 and 6 remain. To check whether PLS2 would improve the regression result, cross-validation was done using:

• a one-dimensional Y-block containing all positive values of each compound;
• a one-dimensional Y-block containing all values of each compound for all tomatoes, with concentrations below the limit of reporting (missing values) counted as 0;
• a two-dimensional Y-block containing all values of all compounds for all tomatoes, with concentrations below the limit of reporting (missing values) counted as 0.

The columns of the Y-block were scaled so that the concentrations of each compound lie between 0 and 1.

Fig. 2. The amount of lycopene determined by HPLC as a function of the manually scored ripeness stage. The numbers are the labels assigned to the tomatoes. The maturity class is the average class assigned by the five experts. For tomatoes where more than one sample was analyzed, a confidence interval is given.

Table 4 tabulates the cross-validation results. The number
of latent variables needed for the smallest error varied from 3 to 14, depending on the compound and on the X and Y data. Fig. 5 shows a typical relation between the cross-validation error and the number of LVs. From this figure we see that beyond 5 LVs the error does not change very much; only when the number of LVs is much larger (10 in this case) does the error begin to increase. Other compounds and X and Y data showed similar results. For this reason the number of LVs used in Table 4 was set to 5.

Fig. 3. The amount of chlorophyll-a and chlorophyll-b determined by HPLC as a function of the manually scored ripeness stage. The numbers are the labels assigned to the tomatoes. The maturity class is the average class assigned by the five experts. For tomatoes where more than one sample was analyzed, a confidence interval is given.

Fig. 4. Raw tomato reflection spectra (a), color-constant spectra (b), normalized color-constant spectra (c) and Savitzky–Golay smoothed normalized color-constant spectra (d) (filter width: 15, polynomial order 2, first derivative). One randomly selected spectrum for each of the 37 tomatoes is plotted.

From Table 4 we see that including concentrations below the limit of reporting of the HPLC as zero significantly degrades the regression results. Using a PLS2 model on all concentrations does not improve this result.
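The leave-one-tomato-out cross-validation behind Tables 4 and 5 (contiguous blocks of 200 spectra, i.e. one tomato per block, Section 2.5) and the pixel- and tomato-level statistics of Eqs. (3)–(6) can be sketched in NumPy as below. This is an illustrative sketch, not the authors' code: the paper used SIMPLS in the Matlab PLS Toolbox, whereas the sketch swaps in the textbook NIPALS PLS1 algorithm (for a single response the two are generally reported to give the same model). The function names are hypothetical and the data are synthetic.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 for one response; returns (x_mean, y_mean, b) such that
    predictions are (X - x_mean) @ b + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        qk = (yk @ t) / tt              # inner (y) coefficient
        Xk = Xk - np.outer(t, p)        # X-block residuals
        yk = yk - qk * t                # y-block residuals
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector in original X space
    return x_mean, y_mean, b


def leave_one_tomato_out(X, y_tomato, tomato_id, n_lv):
    """Each pixel inherits its tomato's HPLC value; each tomato is predicted by a
    model calibrated on all other tomatoes (the y_hat_ij(-i) of Eqs. (3)-(6))."""
    y_pix = y_tomato[tomato_id]
    y_hat = np.empty(len(X))
    for i in np.unique(tomato_id):
        held_out = tomato_id == i
        x_mean, y_mean, b = pls1_fit(X[~held_out], y_pix[~held_out], n_lv)
        y_hat[held_out] = (X[held_out] - x_mean) @ b + y_mean
    return y_hat


def pixel_and_tomato_scores(y_hat, y):
    """Eqs. (3)-(6). y_hat: (n tomatoes, m pixels) CV predictions; y: n HPLC values."""
    n, m = y_hat.shape
    ss_tot = np.sum((y - y.mean()) ** 2)
    res_pixel = y_hat - y[:, None]           # y_hat_ij(-i) - y_i
    res_tomato = y_hat.mean(axis=1) - y      # (1/m) sum_j y_hat_ij(-i) - y_i
    rmsep_pixel = np.sqrt(np.sum(res_pixel ** 2) / (n * m))
    q2_pixel = 1 - np.sum(res_pixel ** 2) / (m * ss_tot)
    rmsep_tomato = np.sqrt(np.sum(res_tomato ** 2) / n)
    q2_tomato = 1 - np.sum(res_tomato ** 2) / ss_tot
    return rmsep_pixel, q2_pixel, rmsep_tomato, q2_tomato


# Synthetic stand-in data (NOT the paper's measurements): 12 tomatoes,
# 50 pixel spectra each, 30 bands, one compound with a fixed spectral signature.
rng = np.random.default_rng(0)
n_tom, m_pix, n_bands = 12, 50, 30
conc = rng.uniform(5.0, 120.0, n_tom)
signature = rng.normal(size=n_bands)
tomato_id = np.repeat(np.arange(n_tom), m_pix)   # pixels stored tomato by tomato
X = np.outer(conc[tomato_id], signature) + rng.normal(scale=5.0, size=(n_tom * m_pix, n_bands))

y_hat = leave_one_tomato_out(X, conc, tomato_id, n_lv=2).reshape(n_tom, m_pix)
rmsep_pixel, q2_pixel, rmsep_tomato, q2_tomato = pixel_and_tomato_scores(y_hat, conc)
print(f"RMSEP/mu: pixel {rmsep_pixel / conc.mean():.3f}, tomato {rmsep_tomato / conc.mean():.3f}")
print(f"Q2: pixel {q2_pixel:.3f}, tomato {q2_tomato:.3f}")
```

Because each tomato's pixel predictions are averaged before being compared with $y_i$, the tomato-level RMSEP can never exceed the pixel-level RMSEP for the same set of predictions, which is consistent with the pattern in Table 5.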
Excluding the zeros from the two-dimensional Y-block is not an option, because very few samples would remain, as high lycopene concentration correlates with low chlorophyll concentration. Excluding lycopene or chlorophyll from the two-dimensional Y-block while excluding the missing values is possible, since in that case many more Y values remain. However, the results of PLS2 in this case are not better than those of PLS1 with the missing values removed. The final PLS model was therefore built using PLS1, excluding tomatoes with Y values below the LOR.

Table 4
Cross-validation results (RMSEPpixel/µ) for all the compounds, using PLS1, PLS1 with zeros, PLS2 with all compounds in the Y-block and PLS2 with selected compounds in the Y-block

Compound            PLS1 (–)   PLS1 (0)   PLS2 (0)
1 Lycopene            0.19       0.22       0.22
2 β-Carotene          0.25       0.25       0.25
3 Lutein              0.25       0.30       0.28
4 Chlorophyll-a       0.33       0.45       0.48
5 Chlorophyll-b       0.34       0.46       0.45

PLS2 (–) with compound sets 1–3, 4–5, 2–5 and 2–3 in the Y-block: 0.22, 0.24, 0.28, 0.25, 0.25, 0.34, 0.34, 0.23, 0.24, 0.32, 0.32.

The number of LVs used was 5. Missing values are excluded (–) or counted as 0 (0).

Fig. 5. Cross-validation result (RMSEP/µ) for lycopene spectral prediction using PLS2 with all compounds in the Y-block, as a function of the number of latent variables (LV).

Fig. 7. Spectrally predicted against real (HPLC) β-carotene concentration of the tomato pixels. The mean of the pixels, denoting the average concentration per tomato, is indicated with a star.

In Figs. 6–8 the spectrally predicted lycopene, β-carotene and chlorophyll-a concentrations are plotted against the concentrations observed by HPLC. Table 5 shows the pixel-based and tomato-based PLS regression results for all compounds. Comparing the RMSEP/µ values in this table shows that the variation is somewhat larger at pixel level than at tomato level. Part of the extra variation at pixel level is due to variation among pixels that is not captured in the HPLC analysis.
This is mainly the case for tomatoes from which only one sample was analyzed.

Fig. 6. Spectrally predicted against real (HPLC) lycopene concentration of the tomato pixels. The mean of the pixels, denoting the average concentration per tomato, is indicated with a star.

Fig. 8. Spectrally predicted against real (HPLC) chlorophyll-a concentration of the tomato pixels. The mean of the pixels, denoting the average concentration per tomato, is indicated with a star.

Table 5
Mean concentration, and variation determined by HPLC, pixel-based classification error and tomato-based classification error

                 Pixel-based calibration                        Tomato-based calibration
                      Pixel-based val.   Tomato-based val.           Tomato-based val.
Compound         LV   RMSEP/µ   Q²       RMSEP/µ   Q²          LV    RMSEP/µ   Q²
Lycopene          5    0.20     0.95      0.17     0.96         4     0.17     0.96
β-Carotene        5    0.25     0.82      0.24     0.84         4     0.25     0.82
Lutein            8    0.26     0.74      0.25     0.77         3     0.24     0.79
Chlorophyll-a     6    0.30     0.73      0.27     0.79         2     0.31     0.72
Chlorophyll-b     2    0.32     0.73      0.29     0.77         2     0.29     0.77

The calibration set contained 200 randomly selected pixels per tomato.

We also built the PLS model using only one spectrum per tomato in the calibration set, obtained by averaging the 200 pixel spectra. This makes the model much simpler, but spatial information is lost. The results are tabulated in the last two columns of Table 5. They are more or less the same as those at tomato level with pixels in the calibration set.

Although the models were calibrated using tomatoes with concentrations ≥LOR, we might ask how they predict for tomatoes with concentrations <LOR. The limit of reporting of the HPLC was given as 0.5 µg/g FW for the carotenes and 2.5 µg/g FW for chlorophyll. In practice, the actual limit is lower, since some of the reported concentrations were below the limit of reporting. For this reason we used half the limit of reporting in the analysis.
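The handling of non-detects described here (half the limit of reporting substituted for values below the LOR, and, in the validation on non-detects, predictions below ½ LOR, including negative ones, raised to ½ LOR) can be sketched as follows. The function names and example numbers are hypothetical; the LOR values are those of Table 2, and computing RMSE/µ with µ as the mean reference value of the evaluated tomatoes is an assumption.

```python
import numpy as np

# Limits of reporting in µg/g fresh weight (Table 2).
LOR = {"lycopene": 0.5, "beta-carotene": 0.5, "lutein": 0.5,
       "chlorophyll-a": 2.5, "chlorophyll-b": 2.5}


def substitute_non_detects(values, lor):
    """Replace non-detects (None or NaN) by LOR/2."""
    v = np.array([np.nan if x is None else x for x in values], dtype=float)
    return np.where(np.isnan(v), lor / 2, v)


def floor_predictions(pred, lor):
    """Raise predictions below LOR/2, including impossible negative ones, to LOR/2."""
    return np.maximum(np.asarray(pred, dtype=float), lor / 2)


def relative_rmse(pred, ref):
    """RMSE divided by the mean reference value (assumed definition of RMSE/µ)."""
    pred, ref = np.asarray(pred, dtype=float), np.asarray(ref, dtype=float)
    return np.sqrt(np.mean((pred - ref) ** 2)) / ref.mean()


# Hypothetical chlorophyll-a results for four tomatoes, two of them non-detects.
lor = LOR["chlorophyll-a"]
ref = substitute_non_detects([None, None, 3.1, 14.2], lor)
pred = floor_predictions([-0.8, 1.9, 3.4, 13.0], lor)
print(ref, pred, round(relative_rmse(pred, ref), 3))
```

When a non-detect is predicted below ½ LOR, the floored prediction equals its substituted reference value exactly and contributes no error, which is why the validation errors on non-detects can be so small.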
The effect of this limit of reporting is that, for tomatoes with maturity below 2, we can expect unpredictable results when estimating the lycopene concentration, and for tomatoes above maturity 6, unpredictable results for chlorophyll. To check whether this disturbing factor degrades the prediction of the concentrations, a PLS1 model was trained on the tomatoes with concentrations above the limit of reporting and validated on the tomatoes below the limit of reporting. We did this for the three compounds with more than five non-detects. A number of the predicted concentrations were below zero. Since negative concentrations are physically impossible, all predicted concentrations below half the limit of reporting were set to ½ LOR. Table 6 tabulates the error on the training data; the error on the validation data, which consist only of the spectra of tomatoes with a non-determinable concentration of the compound; and the total error, which indicates the effect of the non-determinable concentration data on the total model. From this table we learn that the error in the validation set is smaller than the error in the calibration data. For chlorophyll-a and chlorophyll-b the total error is clearly smaller than the calibration error. This is because there were many tomatoes with non-detectable chlorophyll concentrations: many of their spectra predicted negative concentrations, which were set to half the limit of reporting, resulting in a relatively small error.

Table 6
The effect of tomatoes with non-determinable concentrations on the regression result

                 Calibration (reported values)   Validation (non-detects)   All
Compound         N      RMSE/µ                   N      RMSEP/µ             N      RMSE/µ
Lycopene         31     0.18                     6      0.01                37     0.16
Chlorophyll-a    16     0.21                     21     0.07                37     0.15
Chlorophyll-b    16     0.21                     21     0.02                37     0.14

The calibration errors for reported concentrations, the validation errors on the tomatoes with concentrations below the limit of reporting (non-detects), and the total error are shown.
N is the number of tomatoes used in each analysis.

3.4. Distribution of compounds

Imaging spectrometry allows us to measure compound concentrations while preserving spatial information. The PLS model is trained on a random selection of pixels; once trained, it can be applied to the spectra of all pixels. The result is an image in which each grey value represents a concentration, so the variation in grey values indicates the spatial distribution of the compounds. Fig. 9 shows the spatial distribution of the compounds on tomatoes 7–9, with manually scored maturity classes of 2, 8 and 6, respectively.

The variation in predicted concentrations was compared with the variation in measured HPLC concentrations. The mean concentration was calculated in each of eight patches of about the same size as the HPLC samples (Fig. 1), and the standard deviation between these eight means was compared with the standard deviation of the HPLC measurements. Only tomatoes for which three samples were analyzed by HPLC were used. Table 7 lists the results. We found no relation between the variation in the HPLC measurements and the variation in the spectral prediction of lycopene; the other compounds showed similar results. A Cochran C-test showed that the difference in variance between tomatoes was significant for both the HPLC analysis and the spectral prediction, for all compounds.

Table 7
Spatial variation in HPLC and spectral predicted lycopene concentration

          HPLC (n = 3 samples)    Spectral (n = 8 patches)
Tomato    µ         σ             µ         σ
2         54.11     6.59          48.44     7.62
3         24.27     7.10          28.43     1.57
4         110.57    6.54          122.54    12.83
8         97.87     8.11          86.60     13.16
9         33.39     15.53         24.97     4.93
10        80.13     8.16          76.83     8.08
11        124.44    12.16         139.07    9.09
15        10.13     4.11          7.52      2.21
19        5.40      3.76          19.76     2.81
20        74.17     2.49          80.10     8.49
31        2.55      1.10          12.37     2.48
41        35.14     2.50          34.71     4.70

Fig. 9. Concentration images of the spatial distribution of compounds in tomato 7 (left), 9 (middle) and 8 (right). The corresponding maturity classes are 2, 6 and 8. Tomatoes 9 and 8 show non-uniform ripening at the edges of the images.

4. Conclusion

Online measurement of the spatial distribution of the concentrations of lycopene, β-carotene, lutein, chlorophyll-a and chlorophyll-b is possible using spectral imaging. This makes it possible to sort tomato fruit on a conveyor belt. Sorting criteria might be the concentration of the compounds and the uniformity of their distribution.

In this study the reference method for calibrating the spectral model is HPLC. For lycopene, which has a high mean concentration compared to the other compounds (Table 2), the prediction was very good, with a Q2 of 0.95 on the individual pixels and a Q2 of 0.96 on whole tomatoes. The Q2 of the other compounds ranged from 0.73 to 0.82 for pixel-based prediction and from 0.77 to 0.84 for tomato-based prediction (Table 5).

HPLC analysis was done on small samples taken from the fruit wall. To compare the spectral results with the HPLC analysis, circular regions of about the same size as the HPLC samples were defined. Several experimental circumstances made it impossible to match the HPLC samples spatially with regions on the fruit surface.

Regression on whole tomatoes, either by averaging the predictions of all pixels or by training the model on the mean spectrum of all pixels, gives a smaller error than regression on individual pixels (Table 5). In that case, however, the method reduces to plain spectroscopy and the spatial distribution of the compounds is lost.

Preprocessing of the raw pixel spectra is needed to eliminate effects other than those caused by ripening.
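The preprocessing step referred to here can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the color-constant normalization can be approximated by scaling each spectrum to unit sum (removing an overall intensity factor), and it uses a quadratic Savitzky–Golay first-derivative filter with a hypothetical default half-window of 3 samples (7 points). For a symmetric window k = −m…m, the slope of the local least-squares quadratic at the centre is Σ k·y_k / Σ k², so the filter coefficients reduce to k / Σ k². The function names are hypothetical.

```python
def normalize(spectrum):
    """Scale a spectrum to unit sum, removing an overall intensity factor.

    Assumed stand-in for the paper's color-constant normalization.
    """
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum has zero total intensity")
    return [v / total for v in spectrum]


def savgol_first_derivative(spectrum, half_window=3, delta=1.0):
    """First-derivative Savitzky-Golay filter based on a local quadratic fit.

    For a symmetric window k = -m..m the least-squares slope at the centre
    is sum(k * y_k) / sum(k**2): the odd k-terms are orthogonal to the
    constant and quadratic terms, so a quadratic fit gives the same slope
    as a linear one. Only positions with a full window are returned, so the
    output is 2*m samples shorter than the input. `delta` is the sample
    spacing (in nm for a spectrum sampled along wavelength).
    """
    m = half_window
    if len(spectrum) < 2 * m + 1:
        raise ValueError("spectrum is shorter than the filter window")
    denom = sum(k * k for k in range(-m, m + 1))
    coeffs = [k / denom for k in range(-m, m + 1)]
    return [
        sum(c * y for c, y in zip(coeffs, spectrum[i - m:i + m + 1])) / delta
        for i in range(m, len(spectrum) - m)
    ]


def preprocess(spectrum, half_window=3, delta=1.0):
    """Normalize, then take the smoothed first derivative."""
    return savgol_first_derivative(normalize(spectrum), half_window, delta)
```

In this sketch the normalization removes an overall intensity factor, and the derivative is insensitive to a constant baseline; the filtered spectra would then serve as the X-block for the PLS models.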
Color-constant, normalized spectra filtered with first-derivative Savitzky–Golay smoothing gave the best result.

The build-up and breakdown of the compounds are correlated processes. A single PLS model with the concentrations of all compounds in one Y-block therefore seems logical, but because of the many missing values it does not work very well. A separate PLS model for each compound gave the best results.

Since we have a spectrum for every pixel, we can calculate and visualize the distribution of the compounds over the fruit surface. The variation shown in these concentration images does not match the variation in the HPLC analysis. Since we had at most three samples from the fruit surface of each tomato, and do not know exactly where they were taken, this result is not surprising.

Acknowledgements

This work was partially funded by the Dutch Ministry of Economic Affairs under the IOP (Innovation Oriented Research) program and the Ministry of Agriculture, Nature Management and Fisheries. The authors gratefully acknowledge R. de Roode for his contribution to the experimental work, E. Davelaar and G.M. Stoopen for performing the HPLC analysis, and J.H. den Dunnen for providing the tomatoes.

References

Arias, R., Lee, T.C., Logendra, L., Janes, H., 2000. Correlation of lycopene measured by HPLC with the L*, a*, b* color readings of a hydroponic tomato and the relationship of maturity with color and lycopene content. J. Agric. Food Chem. 48, 1697–1702.
Choi, K.H., Lee, G.H., Han, Y.J., Bunn, J.M., 1995. Tomato maturity evaluation using color image analysis. Trans. ASAE 38, 171–176.
Clinton, S.K., 1998. Lycopene: chemistry, biology, and implications for human health and disease. Nutrition Rev. 56, 35–51.
Currie, L., 1995. Nomenclature in evaluation of analytical methods including detection and quantification capabilities (IUPAC Recommendations 1995). Pure Appl. Chem. 67, 1699–1723.
De Jong, S., 1993. SIMPLS: an alternative approach to partial least-squares regression. Chemom. Intell. Lab. Syst. 18, 251–263.
Geladi, P., Kowalski, B.R., 1986. Partial least squares regression: a tutorial. Anal. Chim. Acta 185, 1–17.
Genstat, 2000. Genstat for Windows, Release 6.1. VSN International Ltd., Oxford.
Gilmore, A.M., Yamamoto, H.Y., 1991. Resolution of lutein and zeaxanthin using a non-endcapped, lightly carbon-loaded C-18 high-performance liquid-chromatographic column. J. Chromatogr. 543, 137–145.
Gould, W., 1974. Color and color measurement. In: Tomato Production Processing and Quality Evaluation. Avi Publishing, Westport, CT, pp. 228–244.
Helland, I.S., 1990. Partial least-squares regression and statistical models. Scand. J. Stat. 17, 97–114.
Herrala, E., Mauri, A., 1993. Direct vision spectrograph construction for imaging spectroscopy. In: Proceedings of the XXVII Annual Conference of the Finnish Physical Society. Turku, Finland, pp. 18–20.
Hertog, M.G.L., Hollman, P.C.H., Katan, M.B., 1992. Content of potentially anticarcinogenic flavonoids of 28 vegetables and 9 fruits commonly consumed in the Netherlands. J. Agric. Food Chem. 40, 2379–2383.
Hyvärinen, T., Herrala, E., Dall'Ava, A., 1998. Direct sight imaging spectrograph: a unique add-on component brings spectral imaging to industrial applications. In: SPIE Symposium on Electronic Imaging, vol. 3302, pp. 165–175.
ISO, 1997. International Standard ISO 11843-1. Capability of detection. Part 1. Terms and definitions.
Khachik, F., Beecher, G.R., Smith, J.C., 1995. Lutein, lycopene, and their oxidative metabolites in chemoprevention of cancer. J. Cellular Biochem., 236–246.
Konings, E., Roomans, H., 1997. Evaluation and validation of an LC method for the analysis of carotenoids in vegetables and fruit. Food Chem. 59, 599–603.
Martinez-Valverde, I., Periago, M.J., Provan, G., Chesson, A., 2002. Phenolic compounds, lycopene and antioxidant activity in commercial varieties of tomato (Lycopersicum esculentum). J. Sci. Food Agric. 82, 323–330.
Nguyen, M.L., Schwartz, S.J., 1999. Lycopene: chemical and biological properties. Food Technol. 53, 38–45.
Polder, G., van der Heijden, G.W.A.M., Keizer, L.C.P., Young, I.T., 2003. Calibration and characterization of imaging spectrographs. J. Infrared Spectr. 11, 193–210.
Polder, G., van der Heijden, G.W.A.M., Young, I.T., 2002. Spectral image analysis for measuring ripeness of tomatoes. Trans. ASAE 45, 1155–1161.
Rao, A.V.R., Agarwal, S., 2000. Role of antioxidant lycopene in cancer and heart disease. J. Am. College Nutr. 19, 563–569.
Savitzky, A., Golay, M.J.E., 1964. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627.
Searle, S.R., Casella, G., McCulloch, C.E., 1992. Variance Components. Wiley Inc., New York.
Shafer, S.A., 1985. Using color to separate reflection components. Color Res. Applic. 10, 210–218.
Stokman, H., Gevers, T., 1999. Hyperspectral edge detection and classification. In: Proceedings of the 10th British Machine Vision Conference, vol. 2. Nottingham, pp. 643–651.
Swierenga, H., 2000. Robust multivariate calibration models in vibrational spectroscopic applications. Katholieke Universiteit Nijmegen.
Tonucci, L.H., Holden, J.M., Beecher, G.R., Khachik, F., Davis, C.S., Mulokozi, G., 1995. Carotenoid content of thermally processed tomato-based food-products. J. Agric. Food Chem. 43, 579–586.
Velioglu, Y.S., Mazza, G., Gao, L., Oomah, B.D., 1998. Antioxidant activity and total phenolics in selected fruits, vegetables, and grain products. J. Agric. Food Chem. 46, 4113–4117.