Computers and Electronics in Agriculture 77 (2011) 127–134
Visible-near infrared spectroscopy for detection of Huanglongbing in citrus orchards
Sindhuja Sankaran, Ashish Mishra, Joe Mari Maja, Reza Ehsani ⇑
Citrus Research and Education Center, IFAS, University of Florida, 700 Experiment Station Road, Lake Alfred, FL 33850, USA
Article info
Article history:
Received 30 July 2010
Received in revised form 28 February 2011
Accepted 19 March 2011
Keywords:
Huanglongbing
Citrus
Visible-near infrared spectroscopy
Pattern recognition algorithms
Abstract
This paper evaluates the feasibility of applying visible-near infrared spectroscopy for in-field detection of
Huanglongbing (HLB) in citrus orchards. Spectral reflectance data from the wavelength range of 350–
2500 nm with 989 spectral features were collected from 100 healthy and 93 HLB-infected citrus trees using
a visible-near infrared spectroradiometer. During data preprocessing, the spectral data were normalized
and averaged every 25 nm to reduce the spectral features from 989 to 86. Three datasets were generated
from the preprocessed raw data: first derivatives, second derivatives, and a combined dataset (generated
by integrating preprocessed raw data, first derivatives and second derivatives). The preprocessed datasets
were analyzed using principal component analysis (PCA) to further reduce the number of features used as
inputs in the classification algorithm. The dataset consisting of principal components was randomized
and separated into training and testing datasets such that 75% of the dataset was used for training, while
25% of the dataset was used for testing the classification algorithms. The number of samples in the training
and testing datasets was 145 and 48, respectively. The classification algorithms tested were: linear discriminant analysis, quadratic discriminant analysis (QDA), k-nearest neighbor, and soft independent modeling of class analogy (SIMCA). The reported classification accuracies of the algorithms are an
average of three runs. When the second derivatives dataset was analyzed, the QDA-based classification
algorithm yielded the highest overall average classification accuracies of about 95%, with HLB-class classification accuracies of about 98%. In the combined dataset, SIMCA-based algorithms resulted in high overall
classification accuracies of about 92% with low false negatives (less than 3%).
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
Huanglongbing (HLB), or citrus greening, is an exotic citrus disease caused by the phloem-limiting bacterium Candidatus Liberibacter asiaticus; it has greatly affected citrus production and resulted
in great economic losses. This vector-based disease affects citrus
varieties and results in tree decline (Chung and Brlansky, 2009).
The disease spreads through Asian psyllids (vector). It takes about
6–24 months before visible symptoms appear in the infected citrus
trees. Typical symptoms of HLB are yellowing of leaves and shoots,
with mottled or blotchy leaves (Chung and Brlansky, 2009). Fruit
from infected trees are small and malformed, or asymmetric.
Symptoms spread through the infected branch to the entire tree,
finally resulting in a twig die-back. HLB has spread to many parts
of Florida and is a serious threat to the Florida citrus industry.
The U.S. citrus industry is taking drastic measures to control and
contain this disease. HLB-infected citrus trees act as a source of
inoculum, causing further spread of the disease through infestation
by vectors. Therefore, disease detection and removal of infected
trees are two critical steps in minimizing further spread of this disease. Rapid identification and removal of HLB-infected trees will
reduce the spread of HLB in citrus orchards. Currently, scouting
is the most widely used technique for HLB detection. A recent
study on survey methods (Futch et al., 2009) indicated that scouting efficiency for HLB detection ranges from an average of 47% to
61% depending on the survey techniques used (single survey).
Additionally, HLB management increases production costs by about
$736 per acre (Muraro and Morris,
2009). Thus, there is a need for new field-based, real-time, accurate
sensing technologies for improved scouting efficiency to detect
HLB in citrus orchards. Although laboratory techniques such as
polymerase chain reaction (PCR) provide accurate detection of
HLB (Teixeira et al., 2005; Li et al., 2006, 2009; Lacava et al.,
2006), faster and lower cost field-based methods are needed for
the rapid detection of this disease.
⇑ Corresponding author. Tel.: +1 863 956 1151x1228; fax: +1 863 956 4631.
E-mail address: ehsani@ufl.edu (R. Ehsani).
0168-1699/$ - see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.compag.2011.03.004
The spectral reflectance from the tree canopy in the visible and
infrared regions of the electromagnetic spectra can be used as an
indication of plant stress (Sankaran et al., 2010). Differences in
the spectral reflectance of healthy and diseased plants can be seen
in the visible-infrared region. Various researchers (Spinelli et al.,
2006; Yang et al., 2007; Delalieux et al., 2007; Chen et al., 2008;
Purcell et al., 2009) have used spectral reflectance-based techniques for disease detection in plants and postharvest food products. Optical sensors operating on the principle of spectral
reflectance have also been applied to detect various diseases under
field conditions. For example, Naidu et al. (2009) applied visible-infrared spectroscopy (350–2500 nm) for detecting grapevine leafroll disease. The spectral reflectance data were collected under
field conditions and data were analyzed in the laboratory. It was
reported that the classification accuracy based on stepwise discriminant analysis ranged from 0.73 to 0.81 depending on the features (vegetative indices) used for detecting infected (symptomatic
and non-symptomatic) and healthy leaves. Similarly, Chen et al.
(2008) used visible-infrared spectroscopy for detecting and rating
verticillium wilt in cotton. The study indicated that a first derivative-based model in the wavelength range 731–1317 nm resulted
in a maximum determination coefficient of 0.741. Thus, these studies demonstrated the potential of using visible-infrared spectroscopy for disease detection in plants. Although optical sensors
show great potential in plant disease detection, there is a need to
further assess the technique for plant disease detection for specific
conditions.
The present study evaluates the application of visible-near
infrared spectroscopy for HLB detection in citrus under field conditions. This work also evaluates different statistical models for the
classification of HLB-infected citrus leaves from healthy leaves.
2. Materials and methods
2.1. Sensor
A high resolution field-portable SVC HR-1024 spectroradiometer (Spectra Vista Corporation, NY) was used for collecting
visible-near infrared spectral reflectance data in the range of
350–2500 nm (989 data points). The spectral resolution was ≤3.5,
≤9.5 and ≤6.5 nm for the 350–1000, 1000–1850 and 1850–2500 nm
wavelength ranges, respectively. The spectral reflectance data from
healthy and HLB-infected leaves were collected using a 4° field-of-view (FOV) optic at a minimum integration time of 4 ms. The
sensor system was mounted on the front support of an agricultural
vehicle (Fig. 1). It comprised the SVC HR-1024 spectroradiometer with a laser pointer and a platform to support two light sources
(500 W portable halogen lamps). The laser pointer was needed to
identify the region of interest. The light sources were mounted
on a platform such that they provided illumination to the region
of interest (citrus leaves) in addition to sunlight during data collection. Before data collection, a dark background correction (dark
current compensation) was performed with the spectroradiometer lens cap (black) on. In addition, a white background correction
(reference) was also performed using a 25.4 × 25.4 cm (10 in. × 10 in.) white panel (Spectralon Reflectance Target, CSTM-SRT99-100, Spectra Vista Corporation, NY). The white background
correction is performed to account for variations in the incident
light intensity.
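The dark and white corrections described above are commonly combined into a relative reflectance. The paper does not state how the instrument applies them, so the following is only a minimal sketch of the usual convention (dark-subtracted target signal divided by dark-subtracted white-panel signal). The function name and the detector counts are hypothetical.

```python
import numpy as np


def relative_reflectance(target, white, dark):
    """Reflectance relative to the white panel, after dark subtraction.

    Assumed convention (not taken from the paper):
    (target - dark) / (white - dark), evaluated per wavelength.
    """
    target, white, dark = (np.asarray(a, dtype=float) for a in (target, white, dark))
    return (target - dark) / (white - dark)


# Hypothetical raw detector counts at two wavelengths (not measured data).
dark_counts = np.array([10.0, 10.0])
white_counts = np.array([210.0, 510.0])
leaf_counts = np.array([110.0, 135.0])

reflectance = relative_reflectance(leaf_counts, white_counts, dark_counts)
print(reflectance)
```

Because the white panel is measured under the same illumination as the leaf, changes in incident light intensity scale numerator and denominator together and largely cancel.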
2.2. Data collection
Spectral field data were collected from the citrus leaves attached to the tree canopy in a 2185 hectare (5400 acres) citrus
orchard (Devil’s Garden, Southern Gardens, Clewiston, FL) from
February 23 to 26, 2010. Data collection was performed between
8:00 a.m. and 6:00 p.m. from a block of citrus trees covering an area
of about 0.0539 sq. km. Spectral reflectance data from healthy and
HLB-infected orange leaves (Fig. 2) were recorded at a distance of
about 0.5–0.7 m using the sensor, which was interfaced with a laptop computer. Data were acquired using a customized computer
program, which collected five replicate spectra from each leaf sample. In addition to the spectral data collection, pictures of the samples were also taken.
The HLB-infected samples were pre-marked and confirmed by
the Southern Gardens head scout prior to data collection. Following the spectral data collection, all of the leaf samples were sent
to Southern Gardens laboratory to confirm the presence of HLB
bacteria through PCR analysis. The threshold cycle (Ct) from the
PCR analysis indicates the presence or absence of HLB bacteria in
leaves. A Ct value higher than 32 is considered healthy (PCR
negative), while a Ct value lower than 30 is considered HLB-infected
(PCR positive). The number of healthy and HLB-infected leaf samples (symptomatic leaves) analyzed during data collection was
100 and 93, respectively. The healthy samples were from healthy
trees from a region with no HLB-infection; while HLB-infected
samples were from HLB-infected orchard area. Class 1 refers to
the healthy class or group of data; while class 2 refers to the HLB
class of data, hereafter.
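The Ct-based labeling rule above can be written as a small function. This is an illustrative sketch only: the function name and the example Ct values are hypothetical, and treating values of exactly 30 or 32 as inconclusive is an assumption, since the text only defines the cases above 32 and below 30.

```python
from typing import Optional

HEALTHY_CT_MIN = 32.0  # Ct above this: PCR negative (healthy, class 1)
HLB_CT_MAX = 30.0      # Ct below this: PCR positive (HLB, class 2)


def label_from_ct(ct: float) -> Optional[int]:
    """Return 1 (healthy), 2 (HLB), or None for an inconclusive Ct value."""
    if ct > HEALTHY_CT_MIN:
        return 1
    if ct < HLB_CT_MAX:
        return 2
    # Assumption: 30 <= Ct <= 32 is treated as inconclusive.
    return None


# Hypothetical Ct values, for illustration only.
example_ct = [24.5, 29.9, 30.0, 31.2, 32.0, 36.8]
labels = [label_from_ct(ct) for ct in example_ct]
print(labels)
```

Returning `None` for the intermediate band keeps ambiguous samples out of both classes rather than forcing a label.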
In addition, spectral reflectance data were collected from a set
of 81 leaf samples with no HLB symptoms from the HLB-infected
trees. All these samples were asymptomatic or non-symptomatic
and were collected from an HLB-infected leaf branch. The asymptomatic leaves appeared healthy, unlike symptomatic leaves. They
were labeled healthy or HLB based on PCR results. Out of 81 samples, 24 samples (29%) were PCR positive (Ct < 32). Samples with Ct
values between 30 and 32 (nine samples) were eliminated from
the dataset before statistical analysis. Thus, a set of 72 sample
spectra with 24 PCR positive samples (labeled as HLB/class 2)
and 48 PCR negative samples (labeled as healthy/class 1) was
used for further analysis.
Fig. 1. Optical sensor system placed on a mobile platform.
Fig. 2. (a) Healthy and (b) HLB-infected citrus leaves.
2.3. Data preprocessing
The five spectra of each leaf sample were averaged before preprocessing. Prior to the data analysis, each averaged spectral feature of each sample was normalized (Euclidean norm or 2-norm)
based on Eq. (1) using the MATLAB® 7.6 program (The MathWorks
Inc., Natick, MA).
$$R_{\mathrm{norm}(i)} = \frac{R_i}{\sqrt{\sum_{j=1}^{n} R_j^{2}}} \tag{1}$$
where R_norm(i) is the normalized reflectance for a particular wavelength in a sample, R_i is the measured reflectance of a particular
wavelength in a sample, and i varies from 1 to 989 (n), referring
to each wavelength within a sample.
A few representative healthy and HLB-infected sample spectra
are presented in Fig. 3. After normalization, data binning was performed by averaging normalized spectral features for each 25 nm
wavelength interval to reduce the spectral features from 989 to
86. In addition, binning was also needed to make the interval between the spectral bands uniform, in order to compute the first
and second derivatives. The bandwidth was selected based upon
preliminary analysis and literature review (Ray et al., 2010; Kong
et al., 2010).
[Plot: reflectance (a.u.) against wavelength for healthy and HLB samples.]
Fig. 3. Spectral signature of representative healthy and HLB leaf samples.
2.4. Data analysis
The preprocessed (normalized, 25 nm-binned) spectral reflectance data were used to calculate the first and second derivatives
based on Savitzky-Golay filtering. In Savitzky-Golay filtering, an
unweighted linear least-square fit using a polynomial equation is
used to calculate the filter coefficients. The Savitzky-Golay filtering
performs data smoothing, in addition to calculating the derivatives.
A window length/frame size of 7 and polynomial order of 2 (quadratic) were used for deriving the first and second derivatives from
the preprocessed raw data. From the 86 spectral reflectance features (for each sample), a total of 80 first and 80 second derivatives
were derived. Three different datasets were generated to determine the most suitable method for detecting HLB-infected samples. The datasets used are summarized as follows: a first
derivatives dataset derived from preprocessed raw data; a second
derivatives dataset derived from preprocessed raw data; and a
combined dataset (in which the preprocessed raw data, first and
second derivatives were merged). The total number of features
representing the first derivatives, second derivatives, and the combined datasets were 80, 80, and 246, respectively.
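The preprocessing chain described so far (2-norm normalization, 25 nm binning, Savitzky-Golay derivatives, and the combined feature set) can be sketched as follows. The study used MATLAB; this Python version with NumPy and SciPy is a substitution, run on a synthetic spectrum. Dropping the three points at each end of the derivative, where a full 7-point window is not available, is an assumption made here because it reproduces the reported 86 to 80 feature counts. The function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter

BIN_WIDTH_NM = 25.0


def l2_normalize(spectrum):
    """Scale a spectrum to unit Euclidean (2-) norm."""
    spectrum = np.asarray(spectrum, dtype=float)
    return spectrum / np.linalg.norm(spectrum)


def bin_spectrum(wavelengths, reflectance, start=350.0, stop=2500.0):
    """Average reflectance within consecutive 25 nm bins (86 bins here)."""
    edges = np.arange(start, stop + BIN_WIDTH_NM, BIN_WIDTH_NM)
    n_bins = len(edges) - 1
    # Bin index of each wavelength; the last edge is folded into the last bin.
    idx = np.clip(np.digitize(wavelengths, edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=reflectance, minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return sums / np.maximum(counts, 1)


def sg_derivative(binned, order, window=7, polyorder=2):
    """Savitzky-Golay derivative of the given order.

    Assumption: points without a full window (3 at each end) are dropped.
    """
    deriv = savgol_filter(binned, window_length=window, polyorder=polyorder,
                          deriv=order, delta=BIN_WIDTH_NM)
    half = window // 2
    return deriv[half:-half]


# Synthetic spectrum standing in for one averaged leaf measurement.
rng = np.random.default_rng(0)
wavelengths = np.linspace(350.0, 2500.0, 989)
raw = 0.3 + 0.1 * np.sin(wavelengths / 200.0) + 0.01 * rng.random(989)

binned = bin_spectrum(wavelengths, l2_normalize(raw))
first = sg_derivative(binned, order=1)
second = sg_derivative(binned, order=2)
combined = np.concatenate([binned, first, second])
print(binned.shape, first.shape, second.shape, combined.shape)
```

With a quadratic fit, the filter returns exact first and second derivatives for any locally quadratic signal, which is a convenient sanity check for the window and spacing settings.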
The above spectral datasets were analyzed using principal component analysis (PCA). PCA is a multivariate statistical technique
that reduces the dimensionality of the original dataset by projecting it orthogonally into non-correlated variables or principal components (PCs). The PC scores of PCs accounting for 99.9% variance
of the original feature datasets were used as an input to the classifiers. Fig. 4a–c illustrates representative PC plots derived from the
first derivatives, the second derivatives, and combined datasets
with three PCs.
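Selecting the principal components that account for 99.9% of the variance can be sketched with scikit-learn, which is used here in place of the MATLAB tools of the study. The data below are synthetic, and `pc_scores` is a hypothetical helper name. Passing a fraction as `n_components` together with `svd_solver="full"` asks scikit-learn to keep the smallest number of components whose cumulative explained variance exceeds that fraction.

```python
import numpy as np
from sklearn.decomposition import PCA


def pc_scores(features, variance=0.999):
    """Project features onto the fewest PCs explaining `variance` of the total."""
    pca = PCA(n_components=variance, svd_solver="full")
    scores = pca.fit_transform(features)
    return scores, pca


# Synthetic stand-in for a 193-sample, 246-feature combined dataset.
rng = np.random.default_rng(1)
latent = rng.normal(size=(193, 10))
loadings = rng.normal(size=(10, 246))
features = latent @ loadings + 0.01 * rng.normal(size=(193, 246))

scores, pca = pc_scores(features)
retained = float(pca.explained_variance_ratio_.sum())
print(scores.shape, retained)
```

As in the text, the PCA here is fitted on the whole dataset before any split into training and testing sets; fitting it on the training portion only would be a stricter alternative.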
2.5. Pattern recognition algorithms
The PC scores derived from the three datasets were analyzed
using several different pattern recognition algorithms to determine
the classification accuracies during classification of healthy and
HLB-infected leaf spectra. The PC scores (input features) were randomized and separated into balanced training and testing datasets.
The training dataset was used for developing a model, while the testing dataset was used to test the developed model.
Before evaluating the classification accuracies of the different
classifiers, the effect of training to testing dataset ratio on the classification accuracies was studied using the combined dataset. The
training to testing dataset ratio was varied as 70:30, 75:25,
80:20, and 90:10. The data size corresponding to each ratio was:
136:57, 145:48, 155:38, and 174:19, respectively. For evaluating
the training-to-testing dataset ratio, the linear and quadratic discriminant analysis (LDA, QDA) and k-nearest neighbor (kNN with
k = 5) models were each run three times to estimate their
average classification accuracies. These models have been used
for different classification applications (Khot et al., 2008; Panigrahi
et al., 2008; Balabin et al., 2010). After acquiring the classification
accuracies for each model using each ratio, a simple analysis of variance (ANOVA) was performed using SAS® 9.2 (SAS Institute Inc.,
Cary, NC, USA) to determine whether the training-to-testing dataset
ratio had an effect on the average classification accuracies. Based
on the ANOVA results, the datasets (first derivatives, second derivatives, combined) were
randomized and separated into training and testing datasets. The
process of randomizing and separating each feature dataset
into training and testing datasets was repeated thrice (resulting in
three different randomized datasets for each type).
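The split sizes and the repeated random splitting can be sketched as below. Rounding the training size up is an assumption that happens to reproduce the reported sizes (136:57, 145:48, 155:38 and 174:19 for 193 samples). Stratified sampling stands in for the "balanced" splits, and scikit-learn replaces the MATLAB implementation of the study. The data are synthetic and the helper names are hypothetical.

```python
import math

import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def split_sizes(n_samples, train_fraction):
    """Training size rounded up; the remainder is used for testing."""
    n_train = math.ceil(n_samples * train_fraction)
    return n_train, n_samples - n_train


def mean_accuracy(make_model, X, y, train_fraction=0.75, runs=3, seed=0):
    """Average test accuracy over `runs` random stratified splits."""
    n_train, n_test = split_sizes(len(y), train_fraction)
    accuracies = []
    for run in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=n_train, test_size=n_test,
            stratify=y, random_state=seed + run)
        accuracies.append(make_model().fit(X_tr, y_tr).score(X_te, y_te))
    return float(np.mean(accuracies))


# Synthetic PC scores: 100 "healthy" (1) and 93 "HLB" (2) samples.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)), rng.normal(3.0, 1.0, (93, 5))])
y = np.array([1] * 100 + [2] * 93)

sizes = {f: split_sizes(len(y), f) for f in (0.70, 0.75, 0.80, 0.90)}
models = {
    "LDA": LinearDiscriminantAnalysis,
    "QDA": QuadraticDiscriminantAnalysis,
    "kNN": lambda: KNeighborsClassifier(n_neighbors=5),
}
results = {name: mean_accuracy(make, X, y) for name, make in models.items()}
print(sizes)
print(results)
```

Using a different `random_state` for each run gives three distinct randomized splits, mirroring the three randomized datasets per feature type.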
Fig. 4. Principal components plot of (a) first derivatives dataset, (b) second derivatives dataset, and (c) combined dataset, with three principal components accounting for
79.2%, 85.0% and 87.2% variability, respectively.
One linear model and three non-linear classification models
were used for analysis. The different classifier models, discriminant
analysis (LDA, QDA), kNN, and soft independent modeling of class
analogy (SIMCA) were evaluated to classify the spectral data. In
discriminant analysis, discriminant functions are derived from
the input independent variables of the training dataset. These discriminant functions are used to determine classification cut-off PC
score values between the classes. LDA is a linear model, which
generates a linear boundary; while QDA develops a quadratic model
for classification of unknown samples (test dataset). For example,
Fig. 5 represents the difference in boundary generation based on
LDA and QDA using two PC scores derived from the combined dataset. In addition to the discriminant analysis, a kNN model was used
for classification. With kNN, a model consisting of two groups
(classes) is developed using the training dataset. The group
(healthy/HLB) of unknown samples is predicted by calculating
the distance between the unknown sample and k nearest neighbors/samples of the training dataset (PLS Toolbox 4.0 Manual,
Eigenvector Research Inc.). The k-value (constant in the algorithm)
was varied from 1 to 15 during the kNN model development. The
optimum k was selected based on the highest classification accuracies with low type I (false positive/healthy classified as HLB-infected) and type II errors (false negative/HLB-infected classified
as healthy). In other words, the k yielding the
maximum overall and individual class classification accuracies
was considered the best k for the kNN model. In kNN, the majority vote among the k nearest neighbors was used for determining the unknown
sample class or group. In case of equal number of votes, the closest
distance was used for predicting the unknown sample class.
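A search over k from 1 to 15 might look like the sketch below. The exact selection criterion is not fully specified in the text, so ranking by overall accuracy and breaking ties by the lower of the two class accuracies is an assumption. The choice of evaluation set, the synthetic data, and the helper names are likewise hypothetical. scikit-learn's default vote tie-breaking may also differ from the closest-distance rule described above, although ties cannot occur for odd k with two classes.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def class_accuracy(y_true, y_pred, label):
    """Fraction of samples of class `label` that were predicted as `label`."""
    mask = y_true == label
    return float(np.mean(y_pred[mask] == label))


def choose_k(X_train, y_train, X_eval, y_eval, k_values=range(1, 16)):
    """Pick k by overall accuracy, then by the weaker class accuracy."""
    best_k, best_key = None, None
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        pred = model.predict(X_eval)
        overall = float(np.mean(pred == y_eval))
        worst_class = min(class_accuracy(y_eval, pred, c) for c in (1, 2))
        key = (overall, worst_class)
        if best_key is None or key > best_key:  # ties keep the smaller k
            best_k, best_key = k, key
    return best_k, best_key


# Synthetic two-class data: 100 "healthy" (1) and 93 "HLB" (2) samples.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(1.5, 1.0, (93, 4))])
y = np.array([1] * 100 + [2] * 93)
eval_mask = np.arange(len(y)) % 4 == 0  # every fourth sample held out

best_k, (overall_acc, worst_class_acc) = choose_k(
    X[~eval_mask], y[~eval_mask], X[eval_mask], y[eval_mask])
print(best_k, overall_acc, worst_class_acc)
```

Tracking the weaker class accuracy alongside the overall accuracy discourages a k that looks good overall while misclassifying one class heavily.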
Finally, SIMCA, another multivariate classification model based
on a collection of PCA models, was used. In SIMCA-based classification, as a part of model development, the PC scores are generated
for each class based on the variation in each class, rather than utilizing the overall variation in the data. The model residuals are
then used for classification of unknown samples. Thus, unlike the other
models (LDA, QDA, kNN), where the PC scores were used as input
features, the spectral features were used directly as input features
with SIMCA, because the SIMCA model itself estimates PC scores
for each class. The number of PCs was selected such that they captured about 99% of the variability within each class. The Hotelling T²
and Q residual in the SIMCA model refer to the model space and residual space of the model, respectively. Hotelling T² is used to estimate the probability that the predicted score of a given observation is similar or
different to that of the training set (Bylesjö et al., 2006).
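The idea behind SIMCA (one PCA model per class, with unknown samples judged by their Q residual and Hotelling T² under each model) can be illustrated with a simplified sketch. This is not the PLS Toolbox implementation used in the study. Scaling each statistic by the 95th percentile of its training values and assigning the class with the smallest combined distance are assumptions made for brevity. The data are synthetic and all names are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA


class ClassModel:
    """PCA model of one class with empirical limits for Q and T^2."""

    def __init__(self, X, variance=0.99):
        self.pca = PCA(n_components=variance, svd_solver="full").fit(X)
        q, t2 = self._stats(X)
        # Assumption: empirical 95th percentiles replace theoretical limits.
        self.q_lim = max(float(np.percentile(q, 95)), 1e-12)
        self.t2_lim = max(float(np.percentile(t2, 95)), 1e-12)

    def _stats(self, X):
        scores = self.pca.transform(X)
        residual = X - self.pca.inverse_transform(scores)
        q = np.sum(residual ** 2, axis=1)                                 # Q residual
        t2 = np.sum(scores ** 2 / self.pca.explained_variance_, axis=1)   # Hotelling T^2
        return q, t2

    def distance(self, X):
        q, t2 = self._stats(X)
        return np.sqrt((q / self.q_lim) ** 2 + (t2 / self.t2_lim) ** 2)


def simca_fit(X, y):
    return {int(c): ClassModel(X[y == c]) for c in np.unique(y)}


def simca_predict(models, X):
    classes = sorted(models)
    distances = np.column_stack([models[c].distance(X) for c in classes])
    return np.asarray(classes)[np.argmin(distances, axis=1)]


def make_class(n, offset, seed):
    """Synthetic class: three latent factors plus small noise in 20 features."""
    r = np.random.default_rng(seed)
    loadings = r.normal(size=(3, 20))
    return offset + 5.0 * r.normal(size=(n, 3)) @ loadings + 0.1 * r.normal(size=(n, 20))


X1, X2 = make_class(120, 0.0, seed=10), make_class(120, 2.0, seed=11)
X_train = np.vstack([X1[:100], X2[:100]])
y_train = np.array([1] * 100 + [2] * 100)
X_test = np.vstack([X1[100:], X2[100:]])
y_test = np.array([1] * 20 + [2] * 20)

models = simca_fit(X_train, y_train)
accuracy = float(np.mean(simca_predict(models, X_test) == y_test))
print(accuracy)
```

Because each class has its own PCA model, a sample that fits neither class well has large distances to both, which a fuller implementation could use to flag outliers instead of forcing a class.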
Each of the classification algorithms was tested three times,
and the reported classification accuracies are
the average of the three runs. The overall and individual class
(healthy, class 1; HLB, class 2) classification accuracies were
determined. All the pattern recognition algorithms were developed
and tested using the MATLAB® 7.6 program. The training dataset
consisted of 145 data samples; while the testing dataset consisted
of 48 data samples.
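The overall and per-class accuracies, and the corresponding false positive and false negative rates, follow directly from the true and predicted labels. The sketch below uses hypothetical labels and a hypothetical helper name. Class 1 is healthy and class 2 is HLB, as in the text.

```python
import numpy as np


def class_accuracies(y_true, y_pred, healthy=1, hlb=2):
    """Overall and per-class accuracy plus the two error rates."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    class1 = float(np.mean(y_pred[y_true == healthy] == healthy))
    class2 = float(np.mean(y_pred[y_true == hlb] == hlb))
    return {
        "overall": float(np.mean(y_true == y_pred)),
        "class1": class1,                    # healthy classified as healthy
        "class2": class2,                    # HLB classified as HLB
        "false_positive_rate": 1.0 - class1,  # healthy classified as HLB
        "false_negative_rate": 1.0 - class2,  # HLB classified as healthy
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [1, 1, 1, 2, 2, 2, 2, 2]
summary = class_accuracies(y_true, y_pred)
print(summary)
```

Averaging these quantities over the three runs gives values of the kind reported for each model.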
The set of asymptomatic samples (48 healthy and 24 HLB sample spectra) was preprocessed in the same way as the other
healthy and symptomatic leaf samples. PCA was performed
on the combined data (healthy, symptomatic and asymptomatic),
and the PC scores (24) were used as input features to classify the data
using the QDA-based algorithm. Ninety percent of the symptomatic and healthy leaf spectra were used for training and
developing the classification algorithm, while 10% of the
spectra were used for testing the model or validation (model
testing). The classification algorithm developed from the training
dataset was then used to test the entire asymptomatic dataset
(72 samples, unknown data testing).
2.6. Statistical analysis
ANOVA (level of significance, α = 0.05) was performed using
SAS® 9.2 to determine whether the classifier models yielded different classification accuracies (overall, class 1 and class 2) for each
type of dataset. In addition, ANOVA was also performed to study
the effect of different datasets (first derivatives, second derivatives,
combined) on the classification accuracies. The ANOVA was performed for each model (LDA, QDA, kNN, SIMCA) and individual
classification accuracies (overall, class 1, class 2).
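A one-way ANOVA of this kind can be sketched with SciPy in place of SAS. The per-run accuracies below are hypothetical placeholders, not values from the study, which reports only three-run averages.

```python
from scipy.stats import f_oneway

ALPHA = 0.05

# Hypothetical per-run overall accuracies (three runs per model), illustrative only.
runs = {
    "LDA": [0.88, 0.90, 0.89],
    "QDA": [0.95, 0.94, 0.96],
    "kNN": [0.83, 0.85, 0.84],
    "SIMCA": [0.78, 0.80, 0.79],
}

f_stat, p_value = f_oneway(*runs.values())
significant = bool(p_value < ALPHA)
print(f_stat, p_value, significant)
```

The same call applies when the groups are the three dataset types for a single model and accuracy type, which corresponds to the second comparison described above.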
3. Results and discussion
3.1. Data collection and preliminary analysis
During the data collection, HLB detection by the
scouting crew (through observation of visual symptoms) was compared to PCR analysis. Comparing HLB identification by the scouting crew against the PCR results, it was found that the scouting
efficiency was about 90% (after two to three rounds of scouting). The
remaining 10% error could also be a result of false negatives during
the PCR analysis. In such cases, the HLB infection was confirmed by
observing the appearance of the symptoms from the pictures.
Based on this method, a total of 193 samples with 100 healthy
and 93 HLB were analyzed.
In addition, ANOVA on the classification accuracies (as explained in the methodology section) was performed to study the
effect of varying training-to-testing sample ratios on the classification
accuracies. It was found that the classification accuracies did not
change significantly (α = 0.05) when the training-to-testing
dataset ratio was varied. Therefore, a commonly used training-to-testing
dataset ratio of 75:25 was applied for evaluating different classifier
algorithms for the classification of the three feature datasets.
3.2. Analysis of the first derivatives dataset
The average classification accuracies (overall, class 1 and class
2) of the different classifier models using the multiple datasets
are summarized in Table 1. The number of PCs used as input features, accounting for 99.9% of variability (99% of variability
within each class for SIMCA), is also given in Table 1. The cross
validation results of SIMCA using PCs generated from class 1 and
class 2 are presented in Fig. 6(a) and (b), respectively. Based on ANOVA (α = 0.05), it was found that the average classification accuracies obtained from the classifier models using the first derivatives
dataset were different from each other.
The average overall classification accuracy for all the models
was about 80%. The SIMCA and kNN-based models yielded a lower
average overall classification accuracy than LDA and QDA. However, on comparing the individual class classification accuracies,
it was found that the QDA and SIMCA models yielded higher average class 2 (HLB class) classification accuracies, or lower false negatives (HLB classified as healthy), than those of LDA and kNN. It is
more important to have low false negatives than false positives
(healthy classified as HLB) to ensure the identification of all HLB-infected trees. It was found that SIMCA resulted in higher
class 2 classification accuracies (low false negatives) than those
of the other models. Thus, it could be stated that SIMCA was better suited for detecting HLB-infected leaves. However, the QDA-based algorithm resulted in higher overall classification accuracies, with the
first derivatives dataset resulting in the highest overall and individual
(HLB and healthy) class classification accuracies (Table 1).
Fig. 5. (a) Linear discriminant analysis, and (b) quadratic discriminant analysis of the combined dataset.
Table 1
Classification of healthy and HLB-infected sample spectra using different pattern
recognition algorithms.
                               Average classification accuracies (%)
Model    PCs    k          Overall    Class 1 (Healthy)    Class 2 (HLB)
First derivatives
LDA      23     -          88.9       90.0                 87.6
QDA      23     -          88.9       87.3                 90.4
kNN      23     5 (5–7)    83.3       91.5                 75.2
SIMCA    17*    -          77.8       61.0                 96.5
Second derivatives
LDA      26     -          93.7       94.4                 93.0
QDA      26     -          95.1       92.5                 98.7
kNN      26     3          86.8       90.5                 83.2
SIMCA    22*    -          79.9       66.3                 93.1
Combined (raw data, first and second derivatives)
LDA      24     -          89.8       97.6                 82.7
QDA      24     -          92.3       91.7                 93.7
kNN      24     5 (5–7)    86.8       90.3                 83.6
SIMCA    22*    -          91.4       85.5                 97.4
* No. of PCs representing 99% variability within each class.
3.3. Analysis of the second derivatives dataset
ANOVA indicated that the classification accuracies (overall,
class 1 and class 2) using different classifier models were different
from each other (α = 0.05) when the second derivatives dataset
was used. Among all the different models, the QDA-based algorithm yielded the highest average overall and class 2 (HLB) classification accuracies. With the exception of the kNN-based algorithm, all
the classifier models yielded class 2 classification accuracies higher
than 90%. Similar to the first derivatives, the SIMCA-based algorithm resulted in the lowest class 1 classification accuracies. The
classification accuracies (overall, class 1 and class 2) with the second derivatives dataset were generally higher than those with the first derivatives dataset.
3.4. Analysis of the combined dataset
In the combined dataset, the models resulted in different
classification accuracies as analyzed using ANOVA (α = 0.05). The
lowest average overall classification accuracy was found with the
kNN-based model, while LDA resulted in the lowest average class 2
classification accuracies. Among the different models, the QDA as
well as SIMCA-based algorithms resulted in high average overall
and class 2 classification accuracies. The class 2 classification accuracies of the SIMCA-based algorithm were found to be higher than
those of the other models. In comparison to the first and second derivatives
datasets, the SIMCA-based model resulted in high class 1 classification accuracies using the combined dataset.
[Plots: samples/scores plots of the training data showing Q residuals against Hotelling T² on logarithmic axes, with class 1 (healthy), class 2 (diseased) and the 95% confidence level marked; (a) Q residuals 0.81%, Hotelling T² 99.19%; (b) Q residuals 0.64%, Hotelling T² 99.36%.]
Fig. 6. Residual plots during training and cross-validation using SIMCA.
Fig. 7. Average classification accuracies (all datasets) based on individual classification algorithms.
3.5. Comparison of the datasets
Comparing the datasets (first derivatives, second derivatives
and combined), the QDA yielded the highest average overall classification accuracy of 95.1% with average class 2 accuracy of 98.7%
compared to other models. The statistical analysis indicated that
the accuracies (dependent variable) did not change significantly
with variation in the dataset type (independent variable)
(α = 0.05) when one model (LDA, QDA, kNN, SIMCA) and one classification accuracy type (overall, class 1, class 2) were compared at a time. The
overall and individual class classification accuracies (averaging
all the dataset types) are presented in Fig. 7. Although the classification accuracies (overall and individual class) of the combined
dataset were similar to the other two datasets (first and second
derivatives), in most cases the combined dataset showed a lower
coefficient of variation than those of the other two datasets (within
each model).
Comparing the overall classification accuracies, the QDA model
yielded a maximum average classification accuracy of 92%.
Although the LDA model resulted in high overall classification
accuracies comparable to classification accuracies yielded by the
QDA model, the LDA model resulted in low class 2 classification
accuracies (higher false negative). Comparing the class 2 classification accuracies, it was found that the SIMCA-based algorithm gave
maximum classification accuracies of about 96% with a low variation (standard deviation of 3.7%). As the class 1 classification accuracies of SIMCA-based algorithm were lower, the overall
classification accuracies were lower than those of other models.
3.6. Analysis of asymptomatic samples
A plant disease detection tool should be rapid, specific to a particular disease, and sensitive for detection at the early onset of the
symptoms (López et al., 2003). For these reasons, in addition to the
spectra collected from healthy and symptomatic leaves in the field,
another set of spectra was collected from the asymptomatic leaves
from the same branch as the symptomatic leaves as described in
the methodology section.
Fig. 8. Classification results of spectral features of asymptomatic leaf samples.
The classification accuracies acquired during validation (model
testing) and testing (unknown data testing) are summarized in
Fig. 8. Considering the symptomatic and healthy leaf samples
(from HLB-infected trees), the classification accuracies (for overall,
class 1 and class 2) were found to be higher than 90%. In contrast,
when the asymptomatic spectral samples were analyzed, the
overall classification accuracy was about 38%.
Although the overall classification accuracy was
low, the class 2 (HLB class) classification accuracy was
about 88%. This high class 2 classification accuracy, or
low number of false negatives, indicates that the spectroradiometer
has good potential for detecting asymptomatic HLB leaves under
field conditions.
The low overall classification accuracy (38%) of the asymptomatic leaf samples resulted from the low class 1 classification accuracies (more false positives). The possible reason for low class 1
(healthy) classification accuracies could be that the asymptomatic
leaves labeled as healthy did not have sufficient population of HLB
bacteria to be PCR positive. Thus, the asymptomatic leaves could be
infected with bacteria, yet exhibit no symptoms nor appear PCR
positive. It usually takes 3–6 months or more, depending on the
age and physiological status of the trees, for the leaves of the
HLB-infected trees to turn from PCR negative to PCR positive upon
infection. In this study, as the asymptomatic leaves were PCR negative, they were considered healthy. However, we could not confirm whether the healthy appearing asymptomatic leaves turned
from PCR negative to positive, as these trees were required to be
removed after data collection to prevent the further spread of
HLB infection through vectors to other parts of the orchard.
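The relation between the overall and class accuracies for the asymptomatic set can be checked with simple arithmetic. The sketch assumes the reported values (about 38% overall, about 88% for class 2) apply to all 72 samples (48 labeled healthy, 24 labeled HLB). The class 1 figure it yields is implied by those rounded numbers and is not a value reported in the study.

```python
n_healthy, n_hlb = 48, 24            # asymptomatic samples by PCR label
overall_acc, class2_acc = 0.38, 0.88  # approximate reported accuracies

correct_total = overall_acc * (n_healthy + n_hlb)
correct_hlb = class2_acc * n_hlb
# Implied accuracy for the PCR-negative (class 1) asymptomatic leaves.
class1_acc = (correct_total - correct_hlb) / n_healthy
print(round(class1_acc, 2))
```

Under these assumptions, most PCR-negative asymptomatic leaves were classified as HLB, which is consistent with the explanation above that low class 1 accuracy drives the low overall accuracy.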
4. Conclusions
The potential of a portable, field-based, visible-near infrared
spectroradiometer for HLB detection in a citrus orchard was assessed. The results indicated that statistical classifier models such
as QDA can distinguish between healthy and symptomatic HLB-infected leaves with high classification accuracies of greater than
90%. Among the classifier models compared, QDA consistently resulted in high average overall classification accuracies with low
false positives and false negatives. Similarly, the SIMCA-based
algorithm yielded good HLB class classification accuracies (or low
false negatives). When the asymptomatic leaves were analyzed, it
was found that HLB class samples could be classified with accuracies of about 88%.
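For readers unfamiliar with QDA, the following is a minimal sketch of a two-class quadratic discriminant classifier, assuming only two input features (for example, two principal component scores). It is not the implementation used in this study; the function names and the sample values are hypothetical and chosen purely for illustration. Each class keeps its own mean and covariance matrix, and a sample is assigned to the class with the largest discriminant score.

```python
import math


def fit_qda(samples, labels):
    """Estimate per-class mean, 2x2 covariance, and prior from 2-feature samples."""
    model = {}
    n_total = len(samples)
    for cls in sorted(set(labels)):
        pts = [s for s, lab in zip(samples, labels) if lab == cls]
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        # Unbiased sample covariance entries; each class has its own matrix.
        sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - 1)
        syy = sum((p[1] - my) ** 2 for p in pts) / (n - 1)
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - 1)
        model[cls] = {"mean": (mx, my), "cov": (sxx, sxy, syy), "prior": n / n_total}
    return model


def discriminant(x, params):
    """Quadratic discriminant: -0.5*ln|S| - 0.5*(x-m)^T S^-1 (x-m) + ln(prior)."""
    mx, my = params["mean"]
    sxx, sxy, syy = params["cov"]
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mx, x[1] - my
    # Mahalanobis distance via the closed-form inverse of a symmetric 2x2 matrix.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * maha + math.log(params["prior"])


def predict(x, model):
    """Assign x to the class with the largest discriminant score."""
    return max(model, key=lambda cls: discriminant(x, model[cls]))


# Hypothetical feature values (NOT study data): class 1 = healthy, class 2 = HLB.
samples = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (0.1, 0.2), (-0.2, -0.2),
           (3.0, 3.5), (2.0, 2.0), (4.0, 3.0), (3.5, 4.5), (2.5, 2.0)]
labels = [1] * 5 + [2] * 5

model = fit_qda(samples, labels)
print(predict((0.05, 0.0), model), predict((3.0, 3.2), model))
```

Because the covariance is estimated separately for each class, the decision boundary is quadratic rather than linear, which is the main difference from linear discriminant analysis.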
This study demonstrates the potential of optical sensors in disease detection. Notably, there were a number of challenges during data collection under different field
conditions. These included the effect of wind, which resulted
in some spectral variation, and the presence of HLB-infected leaves
within the canopy, among others. Although this study demonstrates the potential of optical sensors for disease detection, a
larger dataset needs to be collected to further validate these
results. In addition, there is a need to evaluate the applicability of
optical sensors in predicting asymptomatic HLB-positive leaves.
Our future studies will involve working on some of these aspects.
Acknowledgments
The authors would like to thank the United States Department
of Agriculture (USDA)-National Institute of Food and Agriculture
(NIFA) and the Citrus Research and Development Foundation
(CRDF) for their funding for this research. We would like to express
our gratitude to Dr. Joao Camargo Neto, Ms. Sherrie Buchanon and
Dr. Jose Gonzalez-Mora for their assistance during data collection.
References
Balabin, R.M., Safieva, R.Z., Lomakina, E.I., 2010. Gasoline classification using near
infrared (NIR) spectroscopy data: comparison of multivariate techniques.
Analytica Chimica Acta 671, 27–35.
Bylesjö, M., Rantalainen, M., Cloarec, O., Nicholson, J.K., Holmes, E., Trygg, J., 2006.
OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA
classification. Journal of Chemometrics 20, 341–351.
Chen, B., Wang, K., Li, S., Wang, J., Bai, J., Xiao, C., Lai, J., 2008. Spectrum
characteristics of cotton canopy infected with verticillium wilt and inversion
of severity level. In: IFIP International Federation for Information Processing,
vol. 259. In: Daoliang, Li (Ed.), Computer and Computing Technologies in
Agriculture, vol. II, Springer, Boston, pp. 1169–1180.
Chung, K.R., Brlansky, R.H., 2009. Citrus diseases exotic to Florida: Huanglongbing
(citrus greening). Fact sheet PP-210. Florida Cooperative Extension Service,
Institute of Food and Agricultural Sciences, University of Florida.
Delalieux, S., Van Aardt, J., Keulemans, W., Schrevens, E., Coppin, P., 2007. Detection
of biotic stress (Venturia inaequalis) in apple trees using hyperspectral data:
non-parametric statistical approaches and physiological implications. European
Journal of Agronomy 27 (1), 130–143.
Futch, S., Weingarten, S., Irey, M., 2009. Determining HLB infection levels using
multiple survey methods in Florida citrus. Proceedings of the Florida State
Horticultural Society 122, 152–158.
Khot, L.R., Panigrahi, S., Woznica, S., 2008. Neural-network-based classification of
meat: evaluation of techniques to overcome small dataset problems. Biological
Engineering 1, 127–143.
Kong, L., Yi, D., Sprigle, S., Wang, F., Wang, C., Liu, F., Adibi, A., Tummala, R., 2010.
Single sensor that outputs narrowband multispectral images. Journal of
Biomedical Optics 15 (1), 1–3, 010502.
Lacava, P.T., Li, W.B., Araújo, W.L., Azevedo, J.L., Hartung, J.S., 2006. Rapid, specific
and quantitative assays for the detection of the endophytic bacterium
Methylobacterium mesophilicum in plants. Journal of Microbiological Methods
65, 535–541.
Li, W., Abad, J.A., French-Monar, R.D., Rascoe, J., Wen, A., Gudmestad, N.C., Secor,
G.A., Lee, I.M., Duan, Y., Levy, L., 2009. Multiplex real-time PCR for detection,
identification and quantification of ‘Candidatus Liberibacter solanacearum’ in
potato plants with zebra chip. Journal of Microbiological Methods 78 (1),
59–65.
Li, W., Hartung, J.S., Levy, L., 2006. Quantitative real-time PCR for detection and
identification of Candidatus Liberibacter species associated with citrus
Huanglongbing. Journal of Microbiological Methods 66 (1), 104–115.
López, M.M., Bertolini, E., Olmos, A., Caruso, P., Gorris, M.T., Llop, P., Penyalver, R.,
Cambra, M., 2003. Innovative tools for detection of plant pathogenic viruses and
bacteria. International Microbiology 6, 233–243.
Muraro, R.P., Morris, R.A., 2009. The dynamics and implications of recent increases
in citrus production costs. Fact sheet FE-793. Florida Cooperative Extension
Service, Institute of Food and Agricultural Sciences, University of Florida.
Naidu, R.A., Perry, E.M., Pierce, F.J., Mekuria, T., 2009. The potential of spectral
reflectance technique for the detection of Grapevine leafroll-associated virus-3
in two red-berried wine grape cultivars. Computers and Electronics in
Agriculture 66, 38–45.
Panigrahi, S., Chang, Y., Khot, L.R., Glower, J., Logue, C.M., 2008. Integrated electronic
nose system for detection of Salmonella contamination in meat. IEEE Sensors
Applications Symposium, Atlanta, GA, pp. 85–88.
Purcell, D.E., O’Shea, M.G., Johnson, R.A., Kokot, S., 2009. Near-infrared spectroscopy
for the prediction of disease rating for Fiji leaf gall in sugarcane clones. Applied
Spectroscopy 63 (4), 450–457.
Ray, S.S., Jain, N., Miglani, A., Singh, J.P., Singh, A.K., Panigrahy, S., Parihar, J.S., 2010.
Defining optimum spectral narrow bands and bandwidths for agricultural
applications. Current Science 98 (10), 1365–1369.
Sankaran, S., Mishra, A., Ehsani, R., Davis, C., 2010. A review of advanced techniques
for detecting plant diseases. Computers and Electronics in Agriculture 72 (1), 1–
13.
Spinelli, F., Noferini, M., Costa, G., 2006. Near infrared spectroscopy (NIRs):
Perspective of fire blight detection in asymptomatic plant material.
Proceeding of 10th International Workshop on Fire Blight. Acta Hort 704, 87–90.
Teixeira, D.C., Danet, J.L., Eveillard, S., Martins, E.C., Junior, W.C.J., Yamamoto, P.T.,
Lopes, S.A., Bassanezi, R.B., Ayres, A.J., Saillard, C., Bové, J.M., 2005. Citrus
huanglongbing in São Paulo State, Brazil: PCR detection of the ‘Candidatus’
Liberibacter species associated with the disease. Molecular and Cellular Probes
19, 173–179.
Yang, C.M., Cheng, C.H., Chen, R.K., 2007. Changes in spectral characteristics of rice
canopy infested with brown planthopper and leaffolder. Crop Science 47, 329–
335.