Postharvest Biology and Technology 34 (2004) 117–129
Measuring surface distribution of carotenes and chlorophyll in
ripening tomatoes using imaging spectrometry
G. Polder a,b,∗, G.W.A.M. van der Heijden a, H. van der Voet a, I.T. Young b
a Biometris, PO Box 100, 6700 AC, Wageningen, The Netherlands
b Pattern Recognition Group, Department of Imaging Science and Technology, Delft University of Technology,
Lorentzweg 1, 2628 CJ, Delft, The Netherlands
Received 19 September 2003; accepted 3 May 2004
Abstract
Tomatoes (Lycopersicum esculentum, Mill. cv. Capita F1) were harvested at different ripening stages. Spectral images from
400 to 700 nm with a resolution of 1 nm were recorded. After recording, samples were taken from the fruit wall and the
lycopene, lutein, β-carotene, chlorophyll-a and chlorophyll-b concentrations were measured using HPLC. The relation between
the compound concentrations measured with HPLC and the spectral images was analyzed using partial least squares (PLS)
regression. The Q2 error of the predicted lycopene concentration, determined from the PLS procedure, was 0.95 on a pixel
basis, and 0.96 on a tomato basis. The Q2 error of the other compounds varied from 0.73 to 0.84. Pixel-based regression made
it possible to construct concentration images of the tomatoes, which showed non-uniform ripening. The method can be applied
in a conveyor belt system using sorting criteria such as concentration of the compounds and the uniformity of the distribution of
the concentrations.
© 2004 Elsevier B.V. All rights reserved.
Keywords: Imaging spectrometry; Tomato; Lycopene; Chlorophyll; PLS; HPLC; Compounds
1. Introduction
Tomatoes (Lycopersicum esculentum, Mill.) are
widely consumed either raw or after processing.
Tomatoes are known as health stimulating fruit because of the antioxidant properties of their main
compounds (Velioglu et al., 1998). Antioxidants are
important in disease prevention in plants as well as
in animals and humans. Their activity is based on inhibiting or delaying the oxidation of biomolecules by
preventing the initiation or propagation of oxidizing
chain reactions (Velioglu et al., 1998). The most important antioxidants in tomato are carotenes (Clinton,
1998) and phenolic compounds (Hertog et al., 1992).
Amongst the carotenes, lycopene dominates. The lycopene content varies significantly with ripening and
the variety of the tomato and is mainly responsible
for the red color of the fruit and its derived products (Tonucci et al., 1995). Lycopene appears to be
relatively stable during food processing and cooking
(Khachik et al., 1995; Nguyen and Schwartz, 1999).
∗ Corresponding author. Tel.: +31-317476842; fax: +31-317483554.
E-mail address: [email protected] (G. Polder).
URL: http://www.ph.tn.tudelft.nl/~polder.
0925-5214/$ – see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.postharvbio.2004.05.002
Epidemiological studies have suggested a possible
role for lycopene in the protection against some types
of cancer (Clinton, 1998) and in the prevention of
cardiovascular disease (Rao and Agarwal, 2000). The
second important carotenoid is β-carotene, which
is about 7% of the total carotenoid content (Gould,
1974). The amount of carotenes as well as their antioxidant activity is significantly influenced by the
tomato variety (Martinez-Valverde et al., 2002) and
maturity (Arias et al., 2000).
Ripening of tomatoes is a combination of processes
including the breakdown of chlorophyll and build-up
of carotenes. Chlorophyll and carotenes have specific,
well-known reflection spectra. Using knowledge of
the known spectral properties of the main constitutive
compounds, it is possible to calculate their concentrations using spectral measurements. Arias et al. (2000)
found a good correlation between color measurements
using a chromameter and the lycopene content measured by HPLC. In order to be able to sort tomatoes
according to the distribution of their lycopene and
chlorophyll content, a fast in-line imaging system is
needed. This system can be placed on a conveyor belt
sorting machine. Since the compounds can be determined by their reflection spectra, a spectral imaging
system with many wavelength bands is preferred over
using a standard color camera. The aim of our research is to develop methods for measuring the spatial
distribution of compounds using such a system. The
spectral data were correlated with compound concentration measured by HPLC.
2. Materials and methods
2.1. Tomato samples
Tomatoes (Capita F1 from De Ruiter Seeds,
Bergschenhoek, The Netherlands) were grown in a
greenhouse located at Plant Research International,
Wageningen, The Netherlands. The tomatoes were
harvested at different ripening stages, varying from
mature green to intense red color, and scored by visual evaluation performed by a five-member sensory
panel. The ripeness stage was determined using a
tomato color chart standard (The Greenery, Breda,
The Netherlands), which is commonly used by breeders. The number of tomatoes used in this experiment
was 37. After washing and drying the tomatoes thoroughly, spectral images were recorded. Immediately
after the recording of each tomato, four circular samples of 16 mm diameter and 2 mm thickness were excised from the outer pericarp. After determination
of the sample fresh weight, the samples were frozen in liquid nitrogen
and stored under nitrogen atmosphere at −80 °C.
2.2. HPLC analysis
The tomato samples (approximately 500 mg) were
ground in a mortar in liquid nitrogen, followed by
grinding in 4 ml acetone with 50 mg CaCO3. After
centrifugation, pellets were re-extracted with 4 ml
acetone, 2 ml hexane and 5 ml acetone:hexane (4:1),
successively. All solvents contained 0.1% (w/v) butylated hydroxytoluene (BHT). The supernatants were
combined, measured and filtered through 0.2 µm nylon syringe filters into HPLC vials. All processes
were performed as much as possible under subdued
or safe light and a nitrogen atmosphere.
For HPLC analysis a Spectra Physics SP8800 pump
system was used equipped with a Spectra Physics
Spectra 100 UV-Vis detector (Spectra Physics, Mountain View, CA, USA). For system processing and data
acquisition a computer system with WINner on Windows software was used (Thermo Separation Products,
Wirral, UK). Sample injections were carried out by
means of a Waters 717 (Milford, MA, USA) autosampler equipped with a cooler.
Separation of the pigments was carried out according to Gilmore and Yamamoto (1991) using
a non-endcapped Allsphere ODS-1 HPLC column
(4.6 mm × 250 mm, 5 µm particle size) preceded by an
ODS-1 guard column (Alltech Associates). Stainless
steel column frit/insert material was replaced by a
Peek Alloyed with Teflon (PAT) column frit/insert.
The column temperature was 30 °C. The mobile phase
consisted of acetonitrile:methanol:tris buffer 0.1 M,
pH 8.0 (86:10:4, solvent A) and methanol:hexane (4:1,
solvent B). All solvents contained 0.1% (w/v) butylated hydroxytoluene. The flow rate was 1 ml min−1,
sample injection volume was 20 µl and spectrophotometric detection was performed at 445 nm. The concentrations of pigment standard stock solutions were
determined spectrophotometrically using published
absorbance coefficients (Konings and Roomans,
1997). Calibration curves were made with the standard stock solutions to quantify the pigments. Peak
identification was based on the relative retention times
of the peaks with trans-beta-apo-8’-carotenal as internal standard. The newly developed HPLC analysis
method for tomatoes was not designed specifically to
determine whether compounds are present or absent.
No determination was made of the limit of detection
(LOD), which is defined as the lowest level that can
be reliably detected (Currie, 1995; ISO, 1997), and
which presupposes the existence of decision criteria for detection in the analytical method. When a
compound is absent, or present only in a very low
concentration, no HPLC peak is found, or the signal
is too much distorted by noise or interference to allow a sensible calculation of estimated concentration.
In practice, the analytical chemist decides on a level,
called the limit of reporting (LOR), below which no
numerical values will be reported. Unlike the LOD,
the LOR is not a performance characteristic of the
method, but rather a censoring limit. The limit of
reporting for lycopene, β-carotene and lutein was
0.5 µg/g fresh weight (FW). For chlorophyll the limit
of reporting was 2.5 µg/g FW.
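The censoring role of the limit of reporting can be illustrated with a minimal Python sketch. The function name `apply_lor`, the dictionary `LOR` and the example measurements are hypothetical and serve only as illustration; the two limits (0.5 and 2.5 µg/g FW) are the ones stated above.

```python
# Limits of reporting in µg/g fresh weight, as stated in the text.
LOR = {
    "lycopene": 0.5,
    "beta-carotene": 0.5,
    "lutein": 0.5,
    "chlorophyll-a": 2.5,
    "chlorophyll-b": 2.5,
}


def apply_lor(compound, concentration):
    """Return the concentration, or None (non-detect) if it is below the LOR."""
    if concentration < LOR[compound]:
        return None
    return concentration


# Hypothetical measurements, not taken from the paper.
measurements = [("lycopene", 0.2), ("lycopene", 54.1), ("chlorophyll-a", 1.9)]
reported = [apply_lor(name, value) for name, value in measurements]
```

In this sketch a non-detect carries no numerical value at all, which mirrors the statement that no numerical values are reported below the LOR.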
Lycopene, β-carotene, lutein, chlorophyll-a and
chlorophyll-b were purchased from Sigma-Aldrich
Company (St. Louis, Missouri, USA). Beta-apo-8’-carotenal was acquired from Fluka Chemie AG
(Buchs, Switzerland). HPLC grade solvents were obtained from Acros Organics (Fisher Scientific Company, ’s-Hertogenbosch, The Netherlands).
2.3. Imaging spectrometry
The spectral images in this experiment were obtained using an ImSpector spectrograph (Spectral Imaging Ltd.,
Oulu, Finland) (Herrala and Mauri,
1993; Hyvärinen et al., 1998) in combination with a
stepper table. At each step the spectrum is recorded
for one line of the object. By translating the object
with respect to the camera, a full spectral image can
be obtained. The ImSpector is available in several different wavelength ranges. Type V7, which was used,
has an observed spectral range of 396 to 736 nm and
a slit size of 13 µm resulting in a spectral resolution
of 0.9 nm. Details of the calibration of this system
can be found in Polder et al. (2003).
The spectral images were recorded using two
Dolan-Jenner PL900 illuminators (Andover St.
Lawrence, Mass., USA), with 150 W quartz halogen
lamps. These lamps have a relatively smooth emission between 380 and 2000 nm. Glass fiber optic
line arrays of 0.02 in. × 6 in. aperture and rod lenses
for the line arrays (Vision Light Tech, Uden, The
Netherlands), were used for illuminating the scene.
The camera used was a Qimaging PMI-1400 EC
Peltier cooled camera with a NIKON 55 mm lens and
the ImSpector V7 between the lens and the camera.
The frame grabber used was a Datacell Limited (Berkshire, UK) Snapper board. The digitized pixels had
a resolution of 12 bits, resulting in 4096 distinguishable grey value steps. The translation table used to
move the object with respect to the camera was a
Lineartechniek Lt1-Sp5-C8-600 translation table (Andelst, The Netherlands), driven by an SDHWA
120 programmable microstepping motor driver (Ever
Elettronica, Italy). The resolution of this translation
table was ±30 µm and the maximum speed 250 mm/s.
When using the full camera resolution, the spectral
image is over-sampled due to limitations in lens and
ImSpector optics. Binning can be used to reduce the
size of the image without losing information (Polder
et al., 2003). The software allows binning of the image
separately in both the spatial and the spectral axes
during capture of the image. The binning factor in
both the spatial x-axis and the spectral axis was 4.
The stepsize of the stepper table was chosen to match
the binned spatial resolution in the x-direction. The
number of steps was chosen to capture one tomato and
the grey reference in one image. The resulting images
have a spatial dimension of 318 × 256 square pixels
and a spectral dimension of 257 bands (about 42 MB).
The software to control the stepper table and frame
grabber, to construct the hyperspectral images and to
save and display them was locally developed in a single computer program written in Java. A detailed description of the system can be found in Polder et al.
(2003).
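The quoted image size can be checked with a short calculation. The sketch below assumes that each 12-bit sample is stored in a 16-bit (2-byte) word; this storage format is an assumption, not something stated in the text, and the names are illustrative.

```python
# Image dimensions after binning, as reported in the text.
SPATIAL_A = 318        # first spatial dimension (pixels)
SPATIAL_B = 256        # second spatial dimension (pixels)
BANDS = 257            # spectral bands
BYTES_PER_SAMPLE = 2   # assumption: 12-bit values stored in 16-bit words


def cube_size_bytes(dim_a, dim_b, bands, bytes_per_sample):
    """Uncompressed size of a hyperspectral cube in bytes."""
    return dim_a * dim_b * bands * bytes_per_sample


size = cube_size_bytes(SPATIAL_A, SPATIAL_B, BANDS, BYTES_PER_SAMPLE)
size_mb = size / 1e6   # decimal megabytes
```

Under this assumption the cube occupies 41,843,712 bytes, which is consistent with the "about 42 MB" given above.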
2.4. Data preprocessing
To remove spectral variation which is not caused
by the differences in compound concentration, but
by external effects such as aging of the light source,
non-uniform lighting, and shading effects, data preprocessing techniques must be applied. Swierenga (2000)
gives an overview of preprocessing techniques with
their corresponding spectral effect. The data preprocessing techniques that are used in this research are as
follows.
The measured spectra are dependent on the spectrum of the light source. Due to aging this spectrum
varies in time. Polder et al. (2002) have shown that
using the Shafer reflection model (Shafer, 1985), the
spectral image can be made independent of the spectral power distribution of the light source using the
following equation:
$$R_\lambda = \frac{I_\lambda - B_\lambda}{W_\lambda - B_\lambda} \qquad (1)$$
where Rλ is the color-constant reflection value at
wavelength λ, Iλ the original measured spectral reflection value, Wλ the white or grey reference intensity,
and Bλ the black reference, which is generated by the
dark current of the camera. The grey reference signal
in this experiment is obtained by recording patch 21
from the GretagMacBeth standard color checker (New
Windsor, NY) in each image. The spectral reflectance
of this reference as given by the manufacturer is 0.36
from 300 to 900 nm.
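A minimal sketch of Eq. (1), applied band by band to one pixel spectrum, is given below. The function name and the four-band example values are hypothetical; only the formula comes from the text.

```python
def color_constant(measured, white, black):
    """Apply Eq. (1) per band: R = (I - B) / (W - B)."""
    if not (len(measured) == len(white) == len(black)):
        raise ValueError("all spectra must have the same number of bands")
    return [(i - b) / (w - b) for i, w, b in zip(measured, white, black)]


# Hypothetical 4-band example (grey values out of 4096).
pixel = [500.0, 900.0, 1300.0, 2100.0]
grey_reference = [2100.0, 2100.0, 2100.0, 2100.0]
dark_current = [100.0, 100.0, 100.0, 100.0]
reflectance = color_constant(pixel, grey_reference, dark_current)
```

Because the reference here is a grey patch rather than a perfect white, the values produced by this sketch are relative to that patch; a pixel as bright as the reference yields 1.0.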
Assuming matte surfaces, Stokman and Gevers
(1999) have shown that the images can also be made
invariant to object geometry. Normalizing the spectral
values Rλ according to Eq. (2) results in a spectral
value Xλ which is independent of the illumination
spectral power distribution, illumination direction and
object geometry.
$$X_\lambda = \frac{R_\lambda}{\sum_{\lambda=1}^{n} R_\lambda} \qquad (2)$$
Savitsky–Golay smoothing (Savitsky and Golay,
1964) was used to smooth the spectra. The procedure
was combined with first-order derivatives to remove
the baseline of the spectra.
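The normalization of Eq. (2) and a first-derivative Savitsky–Golay filter can be sketched in plain Python as follows. The window of 15 bands and the polynomial order of 2 are the values reported in Section 3.2; the function names, the unit band spacing and the edge handling (the outermost 7 bands on each side are simply dropped) are assumptions of this sketch and may differ from the toolbox the authors used.

```python
def normalize(reflectance):
    """Eq. (2): divide each band by the sum over all bands."""
    total = sum(reflectance)
    return [r / total for r in reflectance]


def savgol_first_derivative(spectrum, half_width=7):
    """First derivative from a local quadratic least-squares fit.

    For a symmetric window of 2 * half_width + 1 equally spaced bands,
    the fitted slope at the window centre is
    sum(k * y[c + k]) / sum(k * k) for k = -half_width..half_width.
    Bands closer than half_width to either end are skipped.
    """
    offsets = range(-half_width, half_width + 1)
    denom = sum(k * k for k in offsets)
    return [
        sum(k * spectrum[c + k] for k in offsets) / denom
        for c in range(half_width, len(spectrum) - half_width)
    ]


# Check on a quadratic test signal: the derivative of x^2 is 2x.
test_signal = [float(x * x) for x in range(20)]
derivative = savgol_first_derivative(test_signal)
normalized = normalize([1.0, 1.0, 2.0])
```

The quadratic test signal is reproduced exactly by a second-order fit, so the filter returns 2x at every interior band, which is a convenient sanity check.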
2.5. Partial least squares regression
Partial least squares (PLS) (Geladi and Kowalski, 1986; Helland, 1990) regression models are widely
used to extract useful information from spectroscopic data. A PLS regression model relates the spectral
information to quantitative information of the measured samples (in our case the concentration of different
compounds in the tomatoes). PLS is an iterative process where at each iteration, scores, weights, loadings
and inner-coefficients are calculated. Then the X- and Y-block residuals are calculated and the entire
procedure is repeated for the next factor (commonly called a latent variable [LV] in PLS). The optimal
number of LVs used to create the final PLS model is determined by cross-validation using contiguous block
data subsets. The PLS algorithm used is SIMPLS, developed by De Jong (1993).
From each tomato a bottom view spectral image was captured. In this image the center part is ignored
because of specular reflection. In order to compare the variation in spectra-predicted concentrations with
the variation in measured HPLC concentration, eight circular patches were defined on the tomato. The size
of these patches was about the same as that of the samples used in the HPLC analysis. Fig. 1 shows the
layout. From each of the 8 patches, 25 spectra were extracted for the PLS regression. The total number of
extracted spectra per tomato this way was 200. These spectra form the X-block in the PLS regression and
cross-validation. The size of the contiguous blocks was also chosen as 200. This way the cross-validation
acts as leave-one-out cross-validation on the whole tomatoes.
Fig. 1. Image showing the masks which are used to extract tomato spectra.
The performance of the PLS models can be measured by calculating the root-mean-square error of
prediction (RMSEP) and the predicted percentage variation Q2. These values can be measured at pixel
level (Eqs. (3) and (4)) and at tomato level (Eqs. (5) and (6)):
$$\mathrm{RMSEP}_{\mathrm{pixel}} = \sqrt{\frac{\sum_{i=1}^{n} \sum_{j=1}^{m} \left(\hat{y}_{ij(-i)} - y_i\right)^2}{nm}} \qquad (3)$$
$$Q^2_{\mathrm{pixel}} = 1 - \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} \left(\hat{y}_{ij(-i)} - y_i\right)^2}{m \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \qquad (4)$$
$$\mathrm{RMSEP}_{\mathrm{tomato}} = \sqrt{\frac{\sum_{i=1}^{n} \left(\left(\sum_{j=1}^{m} \hat{y}_{ij(-i)} / m\right) - y_i\right)^2}{n}} \qquad (5)$$
$$Q^2_{\mathrm{tomato}} = 1 - \frac{\sum_{i=1}^{n} \left(\left(\sum_{j=1}^{m} \hat{y}_{ij(-i)} / m\right) - y_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2} \qquad (6)$$
where $\hat{y}_{ij(-i)}$ is the predicted concentration of pixel
j from tomato i, based on the calibration set of all
tomatoes except tomato i; $y_i$ the compound concentration of tomato i measured with HPLC (these were
the reference values also in the pixel case, because
no HPLC values per pixel were available); $\bar{y}$ the
mean concentration of all tomatoes; n the number of
tomatoes and m the number of selected pixels per
tomato.
The Y-block in PLS regression is not necessarily
a one-dimensional array with one compound concentration per X spectrum. It can contain a number of
concentrations of different compounds. In this case we
speak of PLS2 since the Y-block is two-dimensional.
To distinguish PLS with one-dimensional Y-block
from PLS2, we refer to it as PLS1.
All analyses were done using Matlab (The Mathworks Inc., Natick, Mass, USA), the Matlab PRTools
toolbox (Faculty of Applied Physics, Delft University
of Technology, The Netherlands) and the PLS Toolbox
(Eigenvector Research Inc., Manson, WA, USA).
3. Results and discussion
3.1. HPLC analysis
The number of tomatoes used in this experiment
was 37. From each tomato four samples were extracted for further analysis. One sample was used
to measure the fresh-weight/dry-weight ratio. Since
HPLC analysis is quite expensive not all of the three
remaining samples of all tomatoes were analyzed.
The raw and very ripe tomatoes are not commonly
used in sorting applications, therefore fewer samples of these tomatoes were used. Table 1 shows the
number of samples analyzed per tomato per maturity
stage. For example, six tomatoes of ripeness stages
7–8 were used: from two tomatoes one sample was
analyzed by HPLC, and from four tomatoes three samples were analyzed. The total number of samples was
67. In Table 2 some descriptive statistics of the HPLC
measurements are given. The coefficients of variation components for tomatoes and for samples within
tomatoes have been estimated by fitting the multiplicative model $y_{ij} = \mu \cdot \mathrm{tomato}_i \cdot \mathrm{sample}_{ij}$ under the
assumption of independent lognormal distributions
for $\mathrm{tomato}_i$ and $\mathrm{sample}_{ij}$ to the logarithms of the positive measurements $y_{ij}$. At the logarithmic scale this
is a simple variance components model, which has
been fitted using the residual maximum likelihood (REML) method in Genstat Release 6.1 (Searle
et al., 1992; Genstat, 2002). The variance components v obtained at the logarithmic scale have been
translated to coefficients of variation components at
the original scale by $CV = \sqrt{e^{v} - 1}$ (which is exact
for lognormal distributions).

Table 1
Number of samples per tomato used in HPLC analysis, for maturity stage of the tomatoes analyzed

                         No. of tomatoes                              No. of samples
Ripeness                 1 sample   2 samples   3 samples   Total
1–2                      8          0           1           9         11
3–4                      3          0           2           5         9
5–6                      2          0           3           5         11
7–8                      2          0           4           6         14
9–10                     2          0           1           3         5
11–12                    1          3           2           6         13
>12                      2          1           0           3         4
Total no. of tomatoes    20         4           13          37
Total no. of samples     20         8           39                    67

The maturity stage was determined visually using a tomato color chart standard. Tomatoes of maturity >12 were clearly over-ripe.

Table 2
Summary of HPLC measurements

                                  No. of tomatoes    Statistics for tomatoes with results ≥LOR
Compound        LOR (µg/g FW)     N<LOR    N≥LOR     Mean (µg/g FW)   CVtomato   CVsample
Lycopene        0.5               6        31        70.33            5.42       0.32
β-Carotene      0.5               0        37        5.67             0.81       0.15
Lutein          0.5               2        35        0.77             0.51       0.21
Chlorophyll-a   2.5               21       16        13.98            0.73       0.27
Chlorophyll-b   2.5               21       16        4.85             0.86       0.37

LOR, limit of reporting; FW, fresh weight; N, number of tomatoes (multiple measurements per tomato were all either below or above
LOR); mean, mean of tomato means; CV, coefficient of variation (σ/µ). CVs have been estimated from multiplicative model with lognormal
factors for tomato and sample (see text).

From Table 2 we learn that the mean level of lutein is only slightly above the
limit of reporting. In this light it is remarkable that
lutein could be detected in all but two tomatoes. Many
more non-detects were found for chlorophyll-a and -b,
which are components that show much more variation
than lutein, both between and within tomatoes. The
variation between samples within tomatoes is much
less than the variation between tomatoes. However, it
is still considerable (CVs between 15 and 37%), and
this effect may be caused by non-uniform ripening
which also can be seen in the color distribution over
the fruit surface (Choi et al., 1995).
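The conversion between a variance component v on the logarithmic scale and a coefficient of variation on the original scale, CV = sqrt(e^v − 1), can be written as a small Python sketch. The function names are illustrative; the inverse relation v = ln(CV² + 1) follows directly from the same formula.

```python
import math


def cv_from_log_variance(v):
    """CV of a lognormal variable whose logarithm has variance v."""
    return math.sqrt(math.exp(v) - 1.0)


def log_variance_from_cv(cv):
    """Inverse relation: variance on the log scale for a given CV."""
    return math.log(cv * cv + 1.0)


# Log-scale variances corresponding to the smallest and largest
# within-tomato CVs quoted in the text (15% and 37%).
v_low = log_variance_from_cv(0.15)
v_high = log_variance_from_cv(0.37)
```

For small v the CV is close to sqrt(v), so on the logarithmic scale a within-tomato CV of 15% corresponds to a standard deviation of roughly 0.15.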
3.2. Spectroscopic data and preprocessing
Each tomato was recorded separately with a grey
value reference. At ∼500 nm the reflection values of
the reference are high compared to the reflection of the
tomato surface. The wavelength plane at this value is
used to segment the grey value reference by thresholding. The threshold value was empirically determined
at 300 (out of 4096). The binary image obtained was
used as a mask to extract the reference pixels from
the spectral image. The grey value reference Wλ was
calculated by averaging all these pixels.
The tomato was separated from the background
by thresholding the sum image of all wavelength
bands. From the obtained binary image the reference
was removed using an exclusive-or operation with
the previously obtained mask image. The reflection
at the center of the tomato which was disturbed by
specularity of the fruit skin, was also excluded.
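The segmentation steps described above can be sketched on a toy image as follows. The threshold of 300 for the reference is the value given in the text; the foreground threshold for the sum image is not given, so `FOREGROUND_LEVEL` is a hypothetical value, as are the function names and the 3 × 4 example scene.

```python
def threshold(image, level):
    """Binary mask: True where the grey value exceeds the level."""
    return [[value > level for value in row] for row in image]


def xor_masks(a, b):
    """Pixel-wise exclusive-or of two binary masks."""
    return [[pa != pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def mean_over_mask(image, mask):
    """Average of the image values where the mask is True."""
    values = [v for row, mrow in zip(image, mask)
              for v, m in zip(row, mrow) if m]
    return sum(values) / len(values)


# Hypothetical scene: rows 0-1 hold the tomato, row 2 the grey reference.
band_500nm = [
    [10, 40, 45, 10],
    [10, 50, 55, 10],
    [10, 900, 950, 10],
]
sum_image = [
    [300, 9000, 9500, 300],
    [300, 9800, 9900, 300],
    [300, 60000, 61000, 300],
]
REFERENCE_LEVEL = 300      # from the text (out of 4096)
FOREGROUND_LEVEL = 5000    # hypothetical; not given in the text

reference_mask = threshold(band_500nm, REFERENCE_LEVEL)
foreground_mask = threshold(sum_image, FOREGROUND_LEVEL)
tomato_mask = xor_masks(foreground_mask, reference_mask)
reference_value = mean_over_mask(band_500nm, reference_mask)
```

Since the reference pixels are part of the foreground, the exclusive-or removes exactly those pixels and leaves the tomato. Exclusion of the specular center is not shown in this sketch.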
The obtained raw spectra were made color-constant
using Eq. (1) and normalized using Eq. (2).
Savitsky–Golay combined with taking the first derivative was used to remove the baseline and noise.
Smoothing parameters were filter width: 15, the order
of the polynomial: 2. Fig. 4 shows raw, color-constant,
normalized color-constant and Savitsky–Golay
smoothed reflection spectra from all tomatoes. Table 3
tabulates the effect of the several preprocessing methods on the error and the simplicity (number of LVs)
of the regression results. This test was done using
PLS cross-validation on 200 randomly selected pixel
spectra per tomato, with the lycopene concentrations
obtained by HPLC in the Y-block.
From this table we learn that when making the spectra color-constant, the regression result slightly
degrades. It is not quite clear what the reason is for this, although it may be due to the fact that the
noisy part of the spectrum is enhanced. The images were captured in a small time period of a couple of
days. During this time there was not much aging of the illuminant. For this reason the regression on the
raw spectra was reasonably good. Normalizing the spectra reduces the error significantly, indicating that
the object geometry of the fruit has a significant influence on the spectra. This is due to the overall
reflectance intensity which falls off near the border of the fruit. Noise is reduced by Savitsky–Golay
smoothing, but this does not have an effect on the regression result, indicating that the PLS regression
method is not very sensitive to this type of noise. When Savitsky–Golay smoothing in combination with a
first derivative filter is used (Fig. 4d) the error remains the same, but the model gets simpler, since 5
instead of 8 LVs are sufficient to describe the data. For the analysis in the remaining part of this paper,
therefore, color-constant normalized spectra with first derivative Savitsky–Golay smoothing were used
(bottom row, Table 3).

Table 3
The effect of several preprocessing methods on the error and the simplicity of the regression results for
the PLS regression of tomato spectra on lycopene concentration

CC    NORM    SAVGOL    SAVGOL 1st    RMSEP/µ    Q2      LV
-     -       -         -             0.26       0.91    7
x     -       -         -             0.28       0.90    9
x     x       -         -             0.20       0.95    8
x     x       x         -             0.20       0.95    8
x     x       -         x             0.20       0.95    5

Combinations of color-constant (CC), normalized (NORM), Savitsky–Golay smoothing (SAVGOL), and
Savitsky–Golay smoothing in combination with a first derivative (SAVGOL 1st) were used
(x: step applied; -: step not applied).
3.3. Predicting concentrations with PLS regression
Chemical processes in ripening tomatoes involve
breakdown of chlorophyll and build-up of carotenes,
which are correlated processes. Therefore, it is reasonable to consider PLS2 regression with the concentrations of all compounds in the Y-block. We expected
some problems, because in this case the Y-block contains a lot of missing values. As can be seen in Figs. 2
and 3 the missing values for chlorophyll-a and -b are
for maturity >6, for lycopene the missing values are
for tomatoes with maturity <2. Missing values are not
allowed and when they are removed from the Y-block
only tomatoes with maturity between 2 and 6 remain.
To check if PLS2 would contribute to the regression
result, cross-validation is done using:
• a one-dimensional Y-block containing all positive
values of each compound.
• a one-dimensional Y-block containing all values
of each compound of all tomatoes. Concentrations
below the limit of reporting (missing values) are
counted as 0.
• a two-dimensional Y-block containing all values
of all compounds of all tomatoes. Concentrations
below the limit of reporting (missing values) are
counted as 0.
The columns of the Y-block were scaled to put the
concentrations for each compound between 0 and 1.
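The three Y-block variants listed above can be sketched as follows. The HPLC values are hypothetical, `None` marks a value below the limit of reporting, and scaling by division by the column maximum is an assumption of this sketch, since the text only states that the columns were scaled to lie between 0 and 1.

```python
# Hypothetical HPLC results for three tomatoes; None marks a non-detect.
hplc = {
    "lycopene": [None, 12.0, 80.0],
    "chlorophyll-a": [20.0, 5.0, None],
}


def y_block_excluding_missing(values):
    """PLS1, missing excluded: (tomato index, value) pairs with a value."""
    return [(i, v) for i, v in enumerate(values) if v is not None]


def y_block_with_zeros(values):
    """Missing values (below the limit of reporting) counted as 0."""
    return [0.0 if v is None else v for v in values]


def scale_column(column):
    """Scale to 0..1 by dividing by the maximum (assumed scaling)."""
    top = max(column)
    return [v / top for v in column]


pls1_excluding = y_block_excluding_missing(hplc["lycopene"])
pls1_zeros = scale_column(y_block_with_zeros(hplc["lycopene"]))
# PLS2: one scaled column per compound, one row per tomato.
columns = [scale_column(y_block_with_zeros(hplc[name])) for name in hplc]
pls2_y = [list(row) for row in zip(*columns)]
```

In the first variant the tomatoes without a value drop out of the calibration set for that compound, whereas the two zero-filled variants keep every tomato in every column.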
Fig. 2. The amount of lycopene determined by HPLC as a function of the manually scored ripeness stage.
The numbers are the labels assigned to the tomatoes. The maturity class is the average class assigned by
the five experts. For tomatoes where more than one sample was analyzed, a confidence interval is given.
Fig. 3. The amount of chlorophyll-a and chlorophyll-b determined by HPLC as a function of the manually
scored ripeness stage. The numbers are the labels assigned to the tomatoes. The maturity class is the
average class assigned by the five experts. For tomatoes where more than one sample was analyzed, a
confidence interval is given.
Fig. 4. Raw tomato reflection spectra (a), color-constant spectra (b), normalized color-constant spectra (c)
and Savitsky–Golay smoothed normalized color-constant spectra (d) (filter width: 15, polynomial order 2,
first derivative). One randomly selected spectrum for each of the 37 tomatoes is plotted.
Table 4 tabulates cross-validation results. The number of latent variables needed for the smallest error
varied from 3 to 14, depending on the compound, the X and the Y variables. Fig. 5 shows a typical relation
between the cross-validation error and the number of LVs. From this figure we see that for more than 5 LVs
the error does not change very much; only when the number of LVs is much larger (10 in this case) does
the error begin to increase. Other compounds, and X and Y data showed similar results. For this reason the
number of LVs used in Table 4 was chosen as 5. From
this table we see that including concentrations below
the limit of reporting of the HPLC as zero significantly
degrades the regression results. Using a PLS2 model
on all concentrations does not improve this result. Excluding the zeros from the two-dimensional Y-block
is not an option, because very few samples will remain, as high lycopene concentration correlates with
low chlorophyll concentration. Excluding lycopene or
chlorophyll from the two-dimensional Y-block while
excluding the missing values is possible, since in that
case many more Y variables will remain. But results
from PLS2 in this case are not better than those from PLS1
with removed missing values.
Table 4
Cross-validation results (RMSEPpixel/µ) for all the compounds, using PLS1, PLS1 with zeros, PLS2 with all compounds in the Y-block
and PLS2 with selected compounds in the Y-block
PLS1
PLS1
PLS2
PLS2
PLS2
PLS2
PLS2
Missing values
Compound
–
0
0
–
1–3
–
4–5
–
2–5
–
2–3
1
2
3
4
5
0.19
0.25
0.25
0.33
0.34
0.22
0.25
0.30
0.45
0.46
0.22
0.25
0.28
0.48
0.45
0.22
0.24
0.28
0.25
0.25
0.34
0.34
0.23
0.24
0.32
0.32
Lycopene
β-Carotene
Lutein
Chlorophyll-a
Chlorophyll-b
The number of LVs used was 5. Missing values are excluded (–) or counted as 0 (0).
Fig. 5. Cross-validation result (RMSEP/µ) for lycopene spectral prediction using PLS2 with all compounds
in the Y-block, as a function of the number of latent variables (LV).
Fig. 7. Spectral predicted against real (HPLC) β-carotene concentration of the tomato pixels. The mean of
the pixels denoting the average concentration per tomato is indicated with a star.
The final PLS model was built using PLS1 where
tomatoes with Y values below LOR were excluded. In
Figs. 6–8 the spectrally predicted lycopene, β-carotene
and chlorophyll-a concentrations are plotted against
the concentrations observed by HPLC. Table 5 shows
the pixel-based and tomato-based PLS regression results for all compounds. Comparing the RMSEP/µ
values in this table shows that at pixel level the variation
is somewhat larger than at tomato level. Part of the
extra variation at pixel level is due to the variation
among pixels not captured in the HPLC analysis. This
is mainly the case for tomatoes where only one sample was analyzed.
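The pixel-level and tomato-level measures of Eqs. (3)–(6) can be computed from cross-validated predictions as in the minimal sketch below. It assumes the same number of pixels m for every tomato; the function names and the example numbers are hypothetical.

```python
import math


def pixel_metrics(pred, ref):
    """RMSEP and Q2 at pixel level, Eqs. (3) and (4).

    pred[i][j] is the held-out prediction for pixel j of tomato i,
    ref[i] is the HPLC reference value of tomato i.
    """
    n, m = len(pred), len(pred[0])
    mean_ref = sum(ref) / n
    ss_err = sum((p - ref[i]) ** 2 for i in range(n) for p in pred[i])
    ss_tot = sum((y - mean_ref) ** 2 for y in ref)
    return math.sqrt(ss_err / (n * m)), 1.0 - ss_err / (m * ss_tot)


def tomato_metrics(pred, ref):
    """RMSEP and Q2 at tomato level, Eqs. (5) and (6)."""
    n, m = len(pred), len(pred[0])
    mean_ref = sum(ref) / n
    means = [sum(row) / m for row in pred]
    ss_err = sum((p - y) ** 2 for p, y in zip(means, ref))
    ss_tot = sum((y - mean_ref) ** 2 for y in ref)
    return math.sqrt(ss_err / n), 1.0 - ss_err / ss_tot


# Hypothetical predictions: 3 tomatoes with 2 pixels each.
predictions = [[9.0, 11.0], [19.0, 23.0], [31.0, 29.0]]
reference = [10.0, 20.0, 30.0]
rmsep_pixel, q2_pixel = pixel_metrics(predictions, reference)
rmsep_tomato, q2_tomato = tomato_metrics(predictions, reference)
```

In this toy example the pixel-level error is larger than the tomato-level error because averaging over the pixels of one tomato cancels part of the pixel-to-pixel scatter. Dividing either RMSEP by the mean reference concentration gives the relative form RMSEP/µ used in the tables.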
Fig. 6. Spectral predicted against real (HPLC) lycopene concentration of the tomato pixels. The mean of the pixels denoting the
average concentration per tomato is indicated with a star.
Fig. 8. Spectral predicted against real (HPLC) chlorophyll-a concentration of the tomato pixels. The mean of the pixels denoting
the average concentration per tomato is indicated with a star.
Table 5
Mean concentration, and variation determined by HPLC, pixel-based classification error and tomato-based classification error

                 Calibration: pixel-based                                       Calibration: tomato-based
                       Validation: pixel-based    Validation: tomato-based            Validation: tomato-based
Compound         LV    RMSEP/µ    Q2              RMSEP/µ    Q2                 LV    RMSEP/µ    Q2
Lycopene         5     0.20       0.95            0.17       0.96               4     0.17       0.96
β-Carotene       5     0.25       0.82            0.24       0.84               4     0.25       0.82
Lutein           8     0.26       0.74            0.25       0.77               3     0.24       0.79
Chlorophyll-a    6     0.30       0.73            0.27       0.79               2     0.31       0.72
Chlorophyll-b    2     0.32       0.73            0.29       0.77               2     0.29       0.77
The calibration set contained 200 randomly selected
pixels per tomato. We also built the PLS model using
only one spectrum per tomato in the calibration set.
This spectrum was obtained by averaging the 200 pixel
spectra. This way the model gets much simpler, but
we lose spatial information. The results are tabulated
in the last 2 columns of Table 5. We see that the results
are more or less the same as at tomato level with pixels
in the calibration set.
Although the models were calibrated using tomatoes with concentrations ≥ LOR, we might ask how
they will predict for tomatoes with concentrations
<LOR. The limit of reporting of the HPLC was given
as 0.5 µg/g FW for the carotenes and 2.5 µg/g FW
for chlorophyll. In practice, this value is lower since
some of the reported concentrations were below the
limit of reporting. For this reason we used half the
limit of reporting in the analysis. The effect of this
limit of reporting is that for tomatoes with maturity
below 2 we will get unpredictable results when trying to estimate lycopene concentration. For tomatoes
above maturity 6 we will get unpredictable results for
chlorophyll. In order to check if this disturbing factor
will degrade the prediction of the concentrations, a
PLS1 model was trained on the tomatoes with concentrations above the limit of reporting, and validated
with the tomatoes below the limit of reporting. We
did this for the three compounds with >5 non-detects.
The results showed that a number of the predicted
concentrations was below zero. Since this value is in
practice not possible, all concentrations below half
the limit of reporting were set to (1/2) LOR. Table 6
tabulates the error on the training data; the error on
the validation data, which are only the spectra of
tomatoes with non-determinable concentration of the
compound; and the total error, which gives an idea
about the effect of the non-determinable concentration data on the total model. From this table we learn
that the error in the validation set is smaller than the
error in the calibration data. For chlorophyll-a and
chlorophyll-b the error significantly decreases. This is
due to the fact that there were a lot of tomatoes with
non-detectable concentrations. A lot of the spectra
predicted negative concentrations, which were set to
half the limit of reporting, resulting in a relatively
small error.
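The handling of non-detects described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy predictions, and the choice to normalise the RMSE by the mean of the reference values passed in are assumptions. Only the LOR of 0.5 µg/g FW for carotenes and the (1/2) LOR floor come from the text.

```python
import math


def clamp_to_half_lor(predictions, lor):
    """Raise every prediction below half the limit of reporting to LOR/2."""
    floor = 0.5 * lor
    return [max(p, floor) for p in predictions]


def relative_rmse(predictions, references):
    """Root mean square error divided by the mean reference value.

    Assumption: the normaliser is the mean of `references`.
    """
    if len(predictions) != len(references) or not references:
        raise ValueError("inputs must be non-empty and equally long")
    mse = sum((p - r) ** 2 for p, r in zip(predictions, references)) / len(references)
    return math.sqrt(mse) / (sum(references) / len(references))


LOR_LYCOPENE = 0.5  # ug/g FW, limit of reporting for carotenes (from the text)

# Hypothetical raw model predictions for four non-detect tomatoes.
raw_predictions = [-1.2, 0.1, 0.4, 0.3]
clamped = clamp_to_half_lor(raw_predictions, LOR_LYCOPENE)

# Non-detects enter the analysis as half the limit of reporting.
references = [0.5 * LOR_LYCOPENE] * len(raw_predictions)

print(clamped)
print(relative_rmse(clamped, references))
```

Because negative predictions are mapped onto the same value that is used as the reference for non-detects, those samples contribute zero error, which is consistent with the small validation errors discussed above.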
Table 6
The effect of tomatoes with non-determinable concentrations on the regression result
                 Calibration (reported values)   Validation (non-detects)   All
Compound         N      RMSE/µ                   N      RMSEP/µ             N      RMSE/µ
Lycopene         31     0.18                     6      0.01                37     0.16
Chlorophyll-a    16     0.21                     21     0.07                37     0.15
Chlorophyll-b    16     0.21                     21     0.02                37     0.14
The calibration errors for reported concentrations, the validation errors on the tomatoes with concentrations below the limit of reporting
(non-detects), and the total error are shown. N is the number of tomatoes used in each analysis.
3.4. Distribution of compounds
Imaging spectrometry allows us to measure compound concentrations while preserving spatial information. The
PLS model is trained on a random selection of pixels. After training, the model can be applied to the
spectra of all pixels. The result is an image in which the grey
value of each pixel represents a concentration. The
variation in grey values gives an idea of the spatial
distribution of the compounds. Fig. 9 shows the spatial distribution of the compounds on tomatoes 7–9,
with manually scored maturity classes of 2, 8 and 6, respectively.
The variation in predicted concentrations was compared with the variation in measured HPLC concentrations. The mean concentration was
calculated for each of eight patches of about the same size as the
HPLC samples (Fig. 1). The standard deviation between the eight
mean values was compared with the standard deviation of the HPLC measurements. Only tomatoes for which
three samples were analyzed by HPLC were used.
Table 7 tabulates the result. We found no relation between the variation seen in the
HPLC measurements and the variation in the spectral
prediction of lycopene. The other compounds showed
similar results. A Cochran C-test showed that the difference in variance between tomatoes was significant
for both the HPLC analysis and the spectral prediction
for all compounds.
Table 7
Spatial variation in HPLC and spectral predicted lycopene concentration
          HPLC (n = 3 samples)    Spectral (n = 8 patches)
Tomato    µ         σ             µ         σ
2         54.11     6.59          48.44     7.62
3         24.27     7.10          28.43     1.57
4         110.57    6.54          122.54    12.83
8         97.87     8.11          86.60     13.16
9         33.39     15.53         24.97     4.93
10        80.13     8.16          76.83     8.08
11        124.44    12.16         139.07    9.09
15        10.13     4.11          7.52      2.21
19        5.40      3.76          19.76     2.81
20        74.17     2.49          80.10     8.49
31        2.55      1.10          12.37     2.48
41        35.14     2.50          34.71     4.70
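To make the comparison of variances more concrete, the sketch below computes the Cochran C statistic (largest variance divided by the sum of all variances) for the two σ columns of Table 7, together with the Pearson correlation between them. It is only an illustration under stated assumptions: the function names are hypothetical, the significance decision requires a critical value that is not computed here, and it is not known from this section exactly how the authors set up their test.

```python
import math

# Standard deviations per tomato, copied from Table 7 (lycopene).
HPLC_SD = [6.59, 7.10, 6.54, 8.11, 15.53, 8.16,
           12.16, 4.11, 3.76, 2.49, 1.10, 2.50]
SPECTRAL_SD = [7.62, 1.57, 12.83, 13.16, 4.93, 8.08,
               9.09, 2.21, 2.81, 8.49, 2.48, 4.70]


def cochran_c(std_devs):
    """Cochran C statistic: largest group variance over the sum of variances."""
    variances = [s * s for s in std_devs]
    return max(variances) / sum(variances)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print("Cochran C, HPLC:    ", cochran_c(HPLC_SD))
print("Cochran C, spectral:", cochran_c(SPECTRAL_SD))
print("Correlation of the two sigma columns:", pearson_r(HPLC_SD, SPECTRAL_SD))
```

The statistic has to be compared with a tabulated critical value for the given number of groups and samples per group before any conclusion about significance can be drawn.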
Fig. 9. Concentration images of the spatial distribution of compounds in tomato 7 (left), 9 (middle) and 8 (right). The corresponding maturity classes are 2, 6 and 8. Tomatoes 9 and 8 show
non-uniform ripening on the edge of the images.
4. Conclusion
Online measurement of spatial distribution of
the concentration of lycopene, β-carotene, lutein,
chlorophyll-a and chlorophyll-b is possible using
spectral imaging. This makes it possible to sort
tomato fruit using a conveyor belt. Sorting criteria
might be the concentration of the compounds and the
uniformity of the distribution of the concentrations.
In this study the reference method for calibrating
the spectral model was HPLC. For lycopene, which has
a high mean concentration compared to the other
compounds (Table 2), the prediction was very good,
with a Q2 of 0.95 on individual pixels and a
Q2 of 0.96 on whole tomatoes. For the other
compounds, Q2 ranged from 0.73 to 0.82 on a pixel
basis and from 0.77 to 0.84 on a tomato basis
(Table 5). HPLC analysis
was done on small samples taken out of the fruit
wall. To compare the spectral results with
the HPLC analysis, circular regions of about the same
size as the HPLC samples were defined. Several
experimental circumstances made it impossible to
match the HPLC samples spatially with regions on
the fruit surface. Regression on complete tomatoes,
either by averaging the predictions of all the pixels
or by training the model on the mean of the spectra
of all pixels, gives a smaller error than regression
on individual pixels (Table 5). However, this reduces
the method to plain spectroscopy, and the estimate
of the spatial distribution of the compounds is
lost.
Preprocessing of the raw pixel spectra is needed
to eliminate factors other than those resulting
from ripening. Color-constant, normalized spectra
filtered with first-derivative Savitzky–Golay smoothing gave the best result.
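As a hedged illustration of the derivative step, the sketch below computes a first-derivative Savitzky–Golay filter for a symmetric window and a linear or quadratic local fit, for which the derivative at the window centre is the sum of k·y(k) divided by the sum of k². The window length, the function name and the toy spectra are assumptions for this example. The filter settings used in the study are not stated in this section, and the color-constancy and normalization steps are not shown.

```python
def savgol_first_derivative(spectrum, half_window=2, step=1.0):
    """First-derivative Savitzky-Golay filter (linear or quadratic local fit).

    Returns the derivative only for interior points where the full window
    fits, so the output is 2 * half_window samples shorter than the input.
    `step` is the spacing between samples, e.g. 1.0 for 1 nm bands.
    """
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    offsets = range(-half_window, half_window + 1)
    norm = sum(k * k for k in offsets) * step
    derivative = []
    for i in range(half_window, len(spectrum) - half_window):
        weighted_sum = sum(k * spectrum[i + k] for k in offsets)
        derivative.append(weighted_sum / norm)
    return derivative


# Toy spectra (hypothetical): a straight line and a parabola.
line = [3.0 * x + 1.0 for x in range(10)]
parabola = [float(x * x) for x in range(8)]

print(savgol_first_derivative(line))
print(savgol_first_derivative(parabola))
```

A derivative filter of this kind removes constant offsets between spectra, which is one reason it is commonly used before regression on reflectance data.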
The build-up and breakdown of the compounds are
correlated processes. A spectral PLS model with the
concentrations of all the compounds in one Y-block
therefore seems logical, but because of the many missing
values it did not work very well. A separate PLS model
for each compound gave the best results.
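The effect of missing values on the two modelling strategies can be illustrated with a small sketch. It assumes, purely for illustration, that a joint Y-block is built from complete rows only, whereas a separate model per compound can use every sample for which that compound was reported. How missing values were actually treated in the joint model is not stated here, and all names and values below are hypothetical.

```python
def per_compound_sets(spectra, concentrations):
    """One calibration set per compound, skipping samples with a missing value."""
    sets = {}
    for compound, values in concentrations.items():
        sets[compound] = [
            (spectrum, value)
            for spectrum, value in zip(spectra, values)
            if value is not None
        ]
    return sets


def complete_cases(spectra, concentrations):
    """Samples for which every compound has a reported value."""
    rows = []
    for i, spectrum in enumerate(spectra):
        values = [column[i] for column in concentrations.values()]
        if all(v is not None for v in values):
            rows.append((spectrum, values))
    return rows


# Hypothetical data: spectrum identifiers and concentrations (None = missing).
spectra = ["tomato_a", "tomato_b", "tomato_c", "tomato_d"]
concentrations = {
    "lycopene": [None, 12.0, 55.0, 98.0],
    "chlorophyll-a": [9.0, 6.0, None, None],
}

separate = per_compound_sets(spectra, concentrations)
joint = complete_cases(spectra, concentrations)
print({name: len(pairs) for name, pairs in separate.items()})
print(len(joint))
```

In this toy case each compound keeps more calibration samples on its own than the joint block retains, which is the kind of loss that can make a single Y-block less attractive.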
Since we have a spectrum for every pixel, we can
calculate and visualize the distribution of the compounds over the fruit surface. The variation shown in
these concentration images is not the same as the variation in the HPLC analysis. Since we had at most three samples from the fruit surface, and we
do not know exactly where they were taken, this
result is not really surprising.
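The construction of a concentration image from per-pixel spectra can be sketched as follows, assuming that the trained regression model reduces to a linear form, that is, one coefficient per spectral band plus an intercept. The coefficients, the tiny 2 × 2 image and the function names are hypothetical and serve only to show the mechanics.

```python
def predict_pixel(spectrum, coefficients, intercept):
    """Linear prediction for one pixel: intercept + sum(coefficient * band)."""
    return intercept + sum(c * s for c, s in zip(coefficients, spectrum))


def concentration_image(cube, coefficients, intercept):
    """Apply the linear model to every pixel of a rows x cols x bands cube."""
    return [
        [predict_pixel(pixel, coefficients, intercept) for pixel in row]
        for row in cube
    ]


def patch_mean(image, pixel_coordinates):
    """Mean predicted concentration over a patch given as (row, col) pairs."""
    values = [image[r][c] for r, c in pixel_coordinates]
    return sum(values) / len(values)


# Hypothetical 2 x 2 image with three spectral bands per pixel.
cube = [
    [[2.0, 4.0, 1.0], [0.0, 0.0, 0.0]],
    [[1.0, 1.0, 1.0], [4.0, 2.0, 0.5]],
]
coefficients = [1.0, 0.5, 2.0]  # hypothetical regression coefficients
intercept = 1.0                 # hypothetical intercept

image = concentration_image(cube, coefficients, intercept)
print(image)
print(patch_mean(image, [(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Patch means computed in this way are the kind of quantity whose spread can be compared with the spread of repeated reference measurements on the same fruit.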
Acknowledgements
This work was partially funded by the Dutch Ministry of Economic Affairs under the IOP (Innovation
Oriented Research) program and the Ministry of
Agriculture, Nature Management and Fisheries. The
authors gratefully acknowledge R. de Roode for his
contribution to the experimental work, E. Davelaar
and G.M. Stoopen for performing the HPLC analysis,
and J.H. den Dunnen for providing the tomatoes.
References
Arias, R., Lee, T.C., Logendra, L., Janes, H., 2000. Correlation of
lycopene measured by HPLC with the L∗ , a∗ , b∗ color readings
of a hydroponic tomato and the relationship of maturity with
color and lycopene content. J. Agric. Food Chem. 48, 1697–
1702.
Choi, K.H., Lee, G.H., Han, Y.J., Bunn, J.M., 1995. Tomato
maturity evaluation using color image analysis. Trans. ASAE
38, 171–176.
Clinton, S.K., 1998. Lycopene: chemistry, biology, and
implications for human health and disease. Nutrition Rev. 56,
35–51.
Currie, L., 1995. Nomenclature in evaluation of analytical methods
including detection and quantification capabilities (IUPAC
Recommendations 1995). Pure Appl. Chem. 67, 1699–1723.
De Jong, S., 1993. Simpls: an alternative approach to
partial least-squares regression. Chemometrics and Intelligent
Laboratory Systems 18 (3), 251–263.
Geladi, P., Kowalski, B.R., 1986. Partial least squares regression:
a tutorial. Anal. Chim. Acta 185, 1–17.
Genstat, 2000. Genstat for Windows, Release 6.1. VSN
International Ltd., Oxford.
Gilmore, A.M., Yamamoto, H.Y., 1991. Resolution of lutein
and zeaxanthin using a non-endcapped, lightly carbon-loaded
c-18 high-performance liquid-chromatographic column. J.
Chromatogr. 543, 137–145.
Gould, W., 1974. Color and color measurement. In: Tomato
Production Processing and Quality Evaluation. Avi Publishing,
Westport, CT, pp. 228–244.
Helland, I.S., 1990. Partial least-squares regression and
statistical-models. Scand. J. Stat. 17, 97–114.
Herrala, E., Mauri, A., 1993. Direct vision spectrograph
construction for imaging spectroscopy. In: Proceedings of the
XXVII Annual Conference of the Finnish Physical Society.
Turku, Finland, pp. 18–20.
Hertog, M.G.L., Hollman, P.C.H., Katan, M.B., 1992. Content of
potentially anticarcinogenic flavonoids of 28 vegetables and 9
fruits commonly consumed in the Netherlands. J. Agric. Food
Chemistry 40, 2379–2383.
Hyvärinen, T., Herrala, E., Dall’Ava, A., 1998. Direct sight imaging
spectrograph: a unique add-on component brings spectral
imaging to industrial applications. In: SPIE Symposium on
Electronic Imaging, vol. 3302, pp. 165–175.
ISO, 1997. International Standard ISO 11843-1. Capability of
detection. Part 1. Terms and definitions.
Khachik, F., Beecher, G.R., Smith, J.C., 1995. Lutein, lycopene,
and their oxidative metabolites in chemoprevention of
cancer. J. Cellular Biochem., 236–246.
Konings, E., Roomans, H., 1997. Evaluation and validation of an
LC method for the analysis of carotenoids in vegetables and
fruit. Food Chem. 59, 599–603.
Martinez-Valverde, I., Periago, M.J., Provan, G., Chesson, A.,
2002. Phenolic compounds, lycopene and antioxidant activity
in commercial varieties of tomato (Lycopersicum esculentum).
J. Sci. Food Agric. 82, 323–330.
Nguyen, M.L., Schwartz, S.J., 1999. Lycopene: chemical and
biological properties. Food Technol. 53, 38–45.
Polder, G., van der Heijden, G.W.A.M., Keizer, L.C.P., Young,
I.T., 2003. Calibration and characterization of imaging
spectrographs. J. Infrared Spectr. 11, 193–210.
Polder, G., van der Heijden, G.W.A.M., Young, I.T., 2002. Spectral
image analysis for measuring ripeness of tomatoes. Trans.
ASAE 45, 1155–1161.
Rao, A.V.R., Agarwal, S., 2000. Role of antioxidant lycopene
in cancer and heart disease. J. Am. College Nutr. 19, 563–
569.
Savitzky, A., Golay, M.J.E., 1964. Smoothing and differentiation
of data by simplified least squares procedures. Anal. Chem.
36, 1627.
Searle, S.R., Casella, G., McCulloch, C.E., 1992. Variance
Components. Wiley Inc., New York.
Shafer, S.A., 1985. Using color to separate reflection components.
Color Res. Applic. 10, 210–218.
Stokman, H., Gevers, T., 1999. Hyperspectral edge detection and
classification. In: Proceedings of the 10th British Machine
Vision Conference, vol. 2. Nottingham, pp. 643–651.
Swierenga, H., 2000. Robust multivariate calibration models in
vibrational spectroscopic applications, Katholieke Universiteit
Nijmegen.
Tonucci, L.H., Holden, J.M., Beecher, G.R., Khachik, F., Davis,
C.S., Mulokozi, G., 1995. Carotenoid content of thermally
processed tomato-based food-products. J. Agric. Food Chem.
43, 579–586.
Velioglu, Y.S., Mazza, G., Gao, L., Oomah, B.D., 1998.
Antioxidant activity and total phenolics in selected fruits,
vegetables, and grain products. J. Agric. Food Chem. 46, 4113–
4117.