Computers and Electronics in Agriculture 77 (2011) 127–134
Journal homepage: www.elsevier.com/locate/compag

Visible-near infrared spectroscopy for detection of Huanglongbing in citrus orchards

Sindhuja Sankaran, Ashish Mishra, Joe Mari Maja, Reza Ehsani ⇑
Citrus Research and Education Center, IFAS, University of Florida, 700 Experiment Station Road, Lake Alfred, FL 33850, USA

Article info
Article history: Received 30 July 2010; received in revised form 28 February 2011; accepted 19 March 2011
Keywords: Huanglongbing; Citrus; Visible-near infrared spectroscopy; Pattern recognition algorithms

Abstract
This paper evaluates the feasibility of applying visible-near infrared spectroscopy for in-field detection of Huanglongbing (HLB) in citrus orchards. Spectral reflectance data in the wavelength range of 350–2500 nm with 989 spectral features were collected from 100 healthy and 93 HLB-infected citrus trees using a visible-near infrared spectroradiometer. During data preprocessing, the spectral data were normalized and averaged every 25 nm to reduce the spectral features from 989 to 86. Three datasets were generated from the preprocessed raw data: first derivatives, second derivatives, and a combined dataset (generated by integrating preprocessed raw data, first derivatives and second derivatives). The preprocessed datasets were analyzed using principal component analysis (PCA) to further reduce the number of features used as inputs in the classification algorithm. The dataset consisting of principal components was randomized and separated into training and testing datasets such that 75% of the dataset was used for training, while 25% was used for testing the classification algorithms. The number of samples in the training and testing datasets was 145 and 48, respectively.
The classification algorithms tested were linear discriminant analysis, quadratic discriminant analysis (QDA), k-nearest neighbor, and soft independent modeling of class analogy (SIMCA). The reported classification accuracies of the algorithms are an average of three runs. When the second derivatives dataset was analyzed, the QDA-based classification algorithm yielded the highest overall average classification accuracies of about 95%, with HLB-class classification accuracies of about 98%. In the combined dataset, SIMCA-based algorithms resulted in high overall classification accuracies of about 92% with low false negatives (less than 3%).
© 2011 Elsevier B.V. All rights reserved.

⇑ Corresponding author. Tel.: +1 863 956 1151x1228; fax: +1 863 956 4631. E-mail address: ehsani@ufl.edu (R. Ehsani).
doi:10.1016/j.compag.2011.03.004

1. Introduction

Huanglongbing (HLB), or citrus greening, is an exotic citrus disease caused by the phloem-limiting bacterium Candidatus Liberibacter asiaticus. It has greatly affected citrus production and resulted in great economic losses. This vector-based disease affects citrus varieties and results in tree decline (Chung and Brlansky, 2009). The disease spreads through Asian psyllids (the vector). It takes about 6–24 months before visible symptoms appear in the infected citrus trees. Typical symptoms of HLB are yellowing of leaves and shoots, with mottled or blotchy leaves (Chung and Brlansky, 2009). Fruit from infected trees are small and malformed, or asymmetric. Symptoms spread through the infected branch to the entire tree, finally resulting in twig die-back. HLB has spread to many parts of Florida and is a serious threat to the Florida citrus industry. The U.S. citrus industry is taking drastic measures to control and contain this disease. HLB-infected citrus trees act as a source of inoculum, causing further spread of the disease through infestation by vectors. Therefore, disease detection and removal of infected trees are two critical steps in minimizing further spread of this disease. Rapid identification and removal of HLB-infected trees will reduce the spread of HLB in citrus orchards.

Currently, scouting is the most widely used technique for HLB detection. A recent study on survey methods (Futch et al., 2009) indicated that scouting efficiency for HLB detection ranges from an average of 47% to 61% depending on the survey techniques used (single survey). Additionally, HLB management increases production costs by about $736 per acre (Muraro and Morris, 2009). Thus, there is a need for new field-based, real-time, accurate sensing technologies that improve scouting efficiency for detecting HLB in citrus orchards. Although laboratory techniques such as polymerase chain reaction (PCR) provide accurate detection of HLB (Teixeira et al., 2005; Li et al., 2006, 2009; Lacava et al., 2006), faster and lower cost field-based methods are needed for the rapid detection of this disease.

The spectral reflectance from the tree canopy in the visible and infrared regions of the electromagnetic spectrum can be used as an indication of plant stress (Sankaran et al., 2010). Differences in the spectral reflectance of healthy and diseased plants can be seen in the visible-infrared region. Various researchers (Spinelli et al., 2006; Yang et al., 2007; Delalieux et al., 2007; Chen et al., 2008; Purcell et al., 2009) have used spectral reflectance based techniques for disease detection in plants and postharvest food products. Optical sensors operating on the principle of spectral reflectance have also been applied to detect various diseases under field conditions. For example, Naidu et al. (2009) applied visible-infrared spectroscopy (350–2500 nm) for detecting grapevine leafroll disease.
The spectral reflectance data were collected under field conditions and analyzed in the laboratory. It was reported that the classification accuracy based on stepwise discriminant analysis ranged from 0.73 to 0.81 depending on the features (vegetative indices) used for detecting infected (symptomatic and non-symptomatic) and healthy leaves. Similarly, Chen et al. (2008) used visible-infrared spectroscopy for detecting and rating verticillium wilt in cotton. The study indicated that a first derivative-based model in the wavelength range 731–1317 nm resulted in a maximum determination coefficient of 0.741. Thus, these studies demonstrated the potential of using visible-infrared spectroscopy for disease detection in plants.

Although optical sensors show great potential in plant disease detection, the technique still needs to be assessed for specific diseases and conditions. The present study evaluates the application of visible-near infrared spectroscopy for HLB detection in citrus under field conditions. This work also evaluates different statistical models for the classification of HLB-infected citrus leaves from healthy leaves.

2. Materials and methods

2.1. Sensor

A high resolution field-portable SVC HR-1024 spectroradiometer (Spectra Vista Corporation, NY) was used for collecting visible-near infrared spectral reflectance data in the range of 350–2500 nm (989 data points). The spectral resolution was ≤3.5, ≤9.5 and ≤6.5 nm for the 350–1000, 1000–1850 and 1850–2500 nm wavelength ranges, respectively. The spectral reflectance data from healthy and HLB-infected leaves were collected using a 4° field-of-view (FOV) optic at a minimum integration time of 4 ms. The sensor system was mounted on the front support of an agricultural vehicle (Fig. 1). It comprised the SVC HR-1024 spectroradiometer with a laser pointer and a platform supporting two light sources (500 W portable halogen lamps).
The laser pointer was needed to identify the region of interest. The light sources were mounted on a platform such that they provided illumination to the region of interest (citrus leaves) in addition to sunlight during data collection. Before data collection, a dark background correction (dark current compensation) was performed with the black spectroradiometer lens cap left on. In addition, a white background correction (reference) was performed using a 25.4 × 25.4 cm (10 in. × 10 in.) white panel (Spectralon Reflectance Target, CSTM-SRT99-100, Spectra Vista Corporation, NY). The white background correction accounts for variations in the incident light intensity.

2.2. Data collection

Spectral field data were collected from citrus leaves attached to the tree canopy in a 2185 hectare (5400 acres) citrus orchard (Devil’s Garden, Southern Gardens, Clewiston, FL) from February 23rd–26th, 2010. Data collection was performed between 8:00 a.m. and 6:00 p.m. from a block of citrus trees covering an area of about 0.0539 sq. km. Spectral reflectance data from healthy and HLB-infected orange leaves (Fig. 2) were recorded at a distance of about 0.5–0.7 m using the sensor, which was interfaced with a laptop computer. Data were acquired using a customized computer program, which collected five replicate spectra from each leaf sample. In addition to the spectral data collection, pictures of the samples were also taken. The HLB-infected samples were pre-marked and confirmed by the Southern Gardens head scout prior to data collection. Following the spectral data collection, all of the leaf samples were sent to the Southern Gardens laboratory to confirm the presence of HLB bacteria through PCR analysis. The threshold cycle (Ct) from the PCR analysis indicates the presence or absence of HLB bacteria in leaves. A Ct value higher than 32 is considered healthy (PCR negative), while a Ct value less than 30 is considered HLB-infected (PCR positive).
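The Ct decision rule stated above can be written as a small function. This is only an illustrative sketch: the function name `label_from_ct` and the string labels are hypothetical, and returning `None` for values from 30 to 32 inclusive is an assumption, since the text does not say how values exactly at a threshold are handled.

```python
def label_from_ct(ct, positive_below=30.0, negative_above=32.0):
    """Map a PCR threshold cycle (Ct) to a class label.

    Ct < 30  -> "HLB" (PCR positive)
    Ct > 32  -> "healthy" (PCR negative)
    otherwise -> None (indeterminate; boundary handling is an assumption)
    """
    if ct < positive_below:
        return "HLB"
    if ct > negative_above:
        return "healthy"
    return None


# Hypothetical Ct values, for illustration only (not measured data).
example_ct = [24.8, 31.2, 36.5]
labels = [label_from_ct(ct) for ct in example_ct]
```

Samples for which the function returns `None` would be set aside rather than assigned to either class.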
The number of healthy and HLB-infected leaf samples (symptomatic leaves) analyzed during data collection was 100 and 93, respectively. The healthy samples were from healthy trees in a region with no HLB infection, while the HLB-infected samples were from an HLB-infected orchard area. Hereafter, class 1 refers to the healthy class or group of data, while class 2 refers to the HLB class of data.

In addition, spectral reflectance data were collected from a set of 81 leaf samples with no HLB symptoms from the HLB-infected trees. All these samples were asymptomatic (non-symptomatic) and were collected from an HLB-infected leaf branch. The asymptomatic leaves appeared healthy, unlike symptomatic leaves. They were labeled healthy or HLB based on PCR results. Out of 81 samples, 24 samples (29%) were PCR positive (Ct < 32). Samples with Ct values between 30 and 32 (nine samples) were eliminated from the dataset before statistical analysis. Thus, a set of 72 sample spectra with 24 PCR positive samples (labeled as HLB/class 2) and 48 PCR negative samples (labeled as healthy/class 1) was used for further analysis.

Fig. 1. Optical sensor system placed on a mobile platform.

Fig. 2. (a) Healthy and (b) HLB-infected citrus leaves.

2.3. Data preprocessing

The five spectra of each leaf sample were averaged before preprocessing. Prior to the data analysis, each averaged spectral feature of each sample was normalized (Euclidean norm or 2-norm) based on Eq. (1) using the MATLAB® 7.6 program (The MathWorks Inc., Natick, MA).

$$R_{norm}(i) = \frac{R_i}{\sqrt{R_1^2 + R_2^2 + \cdots + R_n^2}} \qquad (1)$$

where R_norm(i) is the normalized reflectance for a particular wavelength in a sample, R_i is the measured reflectance at that wavelength in the sample, and i varies from 1 to n = 989, indexing each wavelength within a sample. A few representative healthy and HLB-infected sample spectra are presented in Fig. 3. After normalization, data binning was performed by averaging the normalized spectral features over each 25 nm wavelength interval to reduce the spectral features from 989 to 86. Binning was also needed to make the interval between the spectral bands uniform, in order to compute the first and second derivatives. The bandwidth was selected based upon preliminary analysis and literature review (Ray et al., 2010; Kong et al., 2010).

Fig. 3. Spectral signature of representative healthy and HLB leaf samples. [Plot of reflectance (a.u.) versus wavelength for healthy and HLB samples.]

2.4. Data analysis

The preprocessed (normalized, 25 nm-binned) spectral reflectance data were used to calculate the first and second derivatives based on Savitzky-Golay filtering. In Savitzky-Golay filtering, an unweighted linear least-squares fit using a polynomial equation is used to calculate the filter coefficients. Savitzky-Golay filtering smooths the data in addition to calculating the derivatives. A window length (frame size) of 7 and a polynomial order of 2 (quadratic) were used for deriving the first and second derivatives from the preprocessed raw data. From the 86 spectral reflectance features (for each sample), a total of 80 first and 80 second derivatives were derived. Three different datasets were generated to determine the most suitable method for detecting HLB-infected samples.
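The preprocessing chain described above (2-norm normalization, 25 nm binning, and 7-point quadratic Savitzky-Golay derivatives) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' MATLAB code: the function names are hypothetical, the bin edges are assumed to start at 350 nm, the derivative is evaluated only where the full 7-point window fits (which is one way to obtain 80 values from 86), and the input spectrum is synthetic. The filter coefficients are the standard tabulated values for a 7-point quadratic fit.

```python
import math

# Standard 7-point quadratic Savitzky-Golay coefficients and normalizers.
SG_FIRST = ([-3, -2, -1, 0, 1, 2, 3], 28.0)
SG_SECOND = ([5, 0, -3, -4, -3, 0, 5], 42.0)


def l2_normalize(spectrum):
    """Divide each reflectance by the Euclidean norm of the whole spectrum."""
    norm = math.sqrt(sum(r * r for r in spectrum))
    return [r / norm for r in spectrum]


def bin_spectrum(wavelengths, values, start=350.0, stop=2500.0, width=25.0):
    """Average values within consecutive `width`-nm intervals (assumed edges)."""
    n_bins = int(round((stop - start) / width))
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for wl, v in zip(wavelengths, values):
        if start <= wl <= stop:
            idx = min(int((wl - start) // width), n_bins - 1)
            sums[idx] += v
            counts[idx] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


def savgol_derivative(values, order, spacing=25.0):
    """First or second derivative over a 7-point window, interior points only."""
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    coeffs, denom = SG_FIRST if order == 1 else SG_SECOND
    scale = denom * spacing ** order
    half = len(coeffs) // 2
    return [
        sum(c * values[i + j - half] for j, c in enumerate(coeffs)) / scale
        for i in range(half, len(values) - half)
    ]


# Synthetic spectrum with 989 points between 350 and 2500 nm (illustration only).
wavelengths = [350.0 + 2150.0 * i / 988 for i in range(989)]
raw = [0.3 + 0.1 * math.sin(wl / 200.0) for wl in wavelengths]

binned = bin_spectrum(wavelengths, l2_normalize(raw))
d1 = savgol_derivative(binned, order=1)
d2 = savgol_derivative(binned, order=2)
combined = binned + d1 + d2
```

With these assumptions the feature counts come out as 86 binned values, 80 first derivatives, 80 second derivatives, and 246 combined features, matching the counts reported in the text.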
The datasets used are summarized as follows: a first derivatives dataset derived from preprocessed raw data; a second derivatives dataset derived from preprocessed raw data; and a combined dataset (in which the preprocessed raw data, first and second derivatives were merged). The total number of features representing the first derivatives, second derivatives, and combined datasets was 80, 80, and 246, respectively. The above spectral datasets were analyzed using principal component analysis (PCA). PCA is a multivariate statistical technique that reduces the dimensionality of the original dataset by projecting it orthogonally onto non-correlated variables or principal components (PCs). The scores of the PCs accounting for 99.9% of the variance of the original feature datasets were used as inputs to the classifiers. Fig. 4a–c illustrates representative PC plots derived from the first derivatives, the second derivatives, and combined datasets with three PCs.

2.5. Pattern recognition algorithms

The PC scores derived from the three datasets were analyzed using several different pattern recognition algorithms to determine the classification accuracies during classification of healthy and HLB-infected leaf spectra. The PC scores (input features) were randomized and separated into balanced training and testing datasets. The training dataset was used for developing a model, while the testing dataset was used to test the developed model. Before evaluating the classification accuracies of the different classifiers, the effect of the training-to-testing dataset ratio on the classification accuracies was studied using the combined dataset. The training-to-testing dataset ratio was varied as 70:30, 75:25, 80:20, and 90:10. The data size corresponding to each ratio was 136:57, 145:48, 155:38, and 174:19, respectively.
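The split sizes listed above follow from the 193 samples if the training share is rounded up to the next whole sample. The sketch below shows that arithmetic together with a simple seeded shuffle; the rounding rule is an assumption that happens to reproduce the reported sizes, the function names are hypothetical, and the shuffle shown is not stratified by class, so it does not by itself guarantee the balanced split the text mentions.

```python
import random


def split_sizes(n_samples, train_percent):
    """Return (n_train, n_test), rounding the training share up (assumed rule)."""
    n_train = -(-n_samples * train_percent // 100)  # integer ceiling division
    return n_train, n_samples - n_train


def random_split(samples, train_percent, seed=0):
    """Shuffle a copy of `samples` and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train, _ = split_sizes(len(shuffled), train_percent)
    return shuffled[:n_train], shuffled[n_train:]


# Sizes for the four ratios studied, using the 193 samples reported.
sizes = {pct: split_sizes(193, pct) for pct in (70, 75, 80, 90)}

# Example split of sample indices at the 75:25 ratio.
train_idx, test_idx = random_split(range(193), 75, seed=1)
```

Repeating `random_split` with three different seeds would give three randomized datasets of the kind the section describes.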
To evaluate the training-to-testing dataset ratio, models based on linear and quadratic discriminant analysis (LDA, QDA) and k-nearest neighbor (kNN with k = 5) were each run three times to estimate their average classification accuracies. These models have been used for different classification applications (Khot et al., 2008; Panigrahi et al., 2008; Balabin et al., 2010). After acquiring the classification accuracies for each model at each ratio, a simple analysis of variance (ANOVA) was performed using SAS® 9.2 (SAS Institute Inc., Cary, NC, USA) to determine whether the training-to-testing dataset ratio had an effect on the average classification accuracies. Based on the ANOVA results, the datasets (first, second, combined) were randomized and separated into training and testing datasets. The process of randomizing and separating each featured dataset into training and testing datasets was repeated thrice (resulting in three different randomized datasets for each type).

One linear model and three non-linear classification models were used for analysis. The different classifier models, discriminant analysis (LDA, QDA), kNN, and soft independent modeling of class analogy (SIMCA), were evaluated to classify the spectral data. In discriminant analysis, discriminant functions are derived from the input independent variables of the training dataset. These discriminant functions are used to determine classification cut-off PC score values between the classes. LDA is a linear model, which generates a linear boundary, while QDA develops a quadratic model for classification of unknown samples (test dataset). For example, Fig. 5 represents the difference in boundary generation based on LDA and QDA using two PC scores derived from the combined dataset.

Fig. 4. Principal components plot of (a) first derivatives dataset, (b) second derivatives dataset, and (c) combined dataset, with three principal components accounting for 79.2%, 85.0% and 87.2% variability, respectively. [Three-dimensional score plots of healthy and HLB samples. Axes: (a) PC1 (46.4%), PC2 (21.7%), PC3 (10.8%); (b) PC1 (56.1%), PC2 (20.6%), PC3 (8.4%); (c) PC1 (58.7%), PC2 (23.8%), PC3 (4.7%).]

In addition to the discriminant analysis, a kNN model was used for classification. With kNN, a model consisting of two groups (classes) is developed using the training dataset. The group (healthy/HLB) of unknown samples is predicted by calculating the distance between the unknown sample and the k nearest neighbors (samples) of the training dataset (PLS Toolbox 4.0 Manual, Eigenvector Research Inc.). The k-value (a constant in the algorithm) was varied from 1 to 15 during the kNN model development. The optimum k was selected based on the highest classification accuracies with low type I (false positive: healthy classified as HLB-infected) and type II errors (false negative: HLB-infected classified as healthy). In other words, the k yielding the maximum overall and individual class classification accuracies was considered the best k for the kNN model. In kNN, the class receiving the maximum number of votes among the k neighbors was used for determining the unknown sample class or group. In case of an equal number of votes, the closest distance was used for predicting the unknown sample class. Finally, SIMCA, another multivariate classification model based on a collection of PCA models, was used.
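The kNN voting rule described above, including the tie-break by closest distance, can be illustrated with a short standard-library sketch. It is not the PLS Toolbox implementation: the function name `knn_predict` is hypothetical, Euclidean distance is assumed, and the tie-break is interpreted here as taking the class of the nearest neighbor among the tied classes. The training points are made-up two-dimensional scores.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Predict the class of `query` by majority vote among its k nearest neighbors.

    Ties in the vote are resolved by the class of the closest neighbor
    among the tied classes (an assumed reading of the tie-break rule).
    """
    ranked = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    for _, label in nearest:  # nearest is ordered by increasing distance
        if label in tied:
            return label


# Made-up PC scores: class 1 (healthy) near the origin, class 2 (HLB) near (5, 5).
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [1, 1, 1, 2, 2, 2]

pred_a = knn_predict(train_x, train_y, (0.5, 0.5), k=3)
pred_b = knn_predict(train_x, train_y, (5.2, 5.2), k=3)
```

Scanning k from 1 to 15 with such a function and comparing the resulting accuracies on held-out data is one way to carry out the k selection the text describes.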
In SIMCA-based classification, as a part of model development, the PC scores are generated for each class based on the variation within that class, rather than utilizing the overall variation in the data. The model residuals are then used for classification of unknown samples. Thus, unlike the other models (LDA, QDA, kNN), where the PC scores were used as input features, the spectral features were used directly as input features for SIMCA, because the SIMCA model itself estimates PC scores for each class. The number of PCs was selected such that they accounted for about 99% of the variability within each class. The Hotelling T² and Q residual in the SIMCA model refer to the model space and residual space of the model. Hotelling T² is used to estimate the probability that the predicted score of a given observation is similar or different to that of the training set (Bylesjö et al., 2006).

Each of the classification algorithms was tested three times, and the reported classification accuracies of the algorithms are an average of three runs. The overall and individual class (healthy-class 1 and HLB-class 2) classification accuracies were determined. All the pattern recognition algorithms were developed and tested using the MATLAB® 7.6 program. The training dataset consisted of 145 data samples, while the testing dataset consisted of 48 data samples.

The set of asymptomatic samples, 48 healthy and 24 HLB sample spectra, was preprocessed in the same way as the other healthy and symptomatic leaf samples. PCA was performed on the combined data (healthy, symptomatic and asymptomatic), and the PC scores (24) were used as input features to classify the data using the QDA-based algorithm. Ninety percent of the symptomatic and healthy leaf spectra were used for training and developing the classification algorithm, while 10% of the dataset spectra were used for testing the model or validation (model testing).
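The overall and per-class accuracies referred to above, together with the false positive and false negative rates used elsewhere in the paper, can be computed from true and predicted labels as follows. This is a sketch with a hypothetical function name and made-up labels; class 1 is healthy and class 2 is HLB, as defined in the text.

```python
def class_accuracies(y_true, y_pred, healthy=1, hlb=2):
    """Return overall, class 1 and class 2 accuracies (%) plus error rates (%)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    def pct(hits, total):
        return 100.0 * hits / total if total else float("nan")

    pairs = list(zip(y_true, y_pred))
    overall = pct(sum(t == p for t, p in pairs), len(pairs))
    class1 = pct(
        sum(t == healthy and p == healthy for t, p in pairs),
        sum(t == healthy for t, _ in pairs),
    )
    class2 = pct(
        sum(t == hlb and p == hlb for t, p in pairs),
        sum(t == hlb for t, _ in pairs),
    )
    return {
        "overall": overall,
        "class1": class1,  # healthy correctly classified
        "class2": class2,  # HLB correctly classified
        "false_positive": 100.0 - class1,  # healthy classified as HLB (type I)
        "false_negative": 100.0 - class2,  # HLB classified as healthy (type II)
    }


# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [1, 1, 1, 2, 2, 2, 2, 2]
result = class_accuracies(y_true, y_pred)
```

Averaging these values over three randomized train/test splits would give figures of the kind reported as average classification accuracies.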
The classification algorithm developed from the training dataset was then used to test the entire asymptomatic dataset (72 samples, unknown data testing).

2.6. Statistical analysis

ANOVA (level of significance, α = 0.05) was performed using SAS® 9.2 to determine whether the classifier models yielded different classification accuracies (overall, class 1 and class 2) for each type of dataset. In addition, ANOVA was performed to study the effect of the different datasets (first derivatives, second derivatives, combined) on the classification accuracies. The ANOVA was performed for each model (LDA, QDA, kNN, SIMCA) and each classification accuracy type (overall, class 1, class 2).

3. Results and discussion

3.1. Data collection and preliminary analysis

During the data collection, HLB detection by the scouting crew (through observation of visual symptoms) was compared to PCR analysis. Comparing HLB identification by the scouting crew against the PCR results, it was found that the scouting efficiency was about 90% (after two to three rounds of scouting). The remaining 10% error could also be a result of false negatives during the PCR analysis. In such cases, the HLB infection was confirmed by observing the appearance of the symptoms in the pictures. Based on this method, a total of 193 samples, 100 healthy and 93 HLB, were analyzed.

In addition, ANOVA on the classification accuracies (as explained in the methodology section) was performed to study the effect of varying training-to-testing sample ratios on the classification accuracies. It was found that the classification accuracies did not change significantly (α = 0.05) when the training-to-testing dataset ratio was varied. Therefore, a commonly used training-to-testing dataset ratio of 75:25 was applied for evaluating the different classifier algorithms on the three featured datasets.

3.2. Analysis of the first derivatives dataset

The average classification accuracies (overall, class 1 and class 2) of the different classifier models using the multiple datasets are summarized in Table 1. The number of PCs used as input features, accounting for 99.9% of the variability (99% in the case of SIMCA), is also given in Table 1. The cross validation results of SIMCA using PCs generated from class 1 and class 2 are presented in Fig. 6(a) and (b), respectively. Based on ANOVA (α = 0.05), it was found that the average classification accuracies obtained from the classifier models using the first derivatives dataset were different from each other.

Fig. 5. (a) Linear discriminant analysis, and (b) quadratic discriminant analysis of the combined dataset.

Table 1
Classification of healthy and HLB-infected sample spectra using different pattern recognition algorithms.

                                      Average classification accuracies (%)
Dataset              Model   PCs   k         Overall   Class 1 (Healthy)   Class 2 (HLB)
First derivatives    LDA     23    -         88.9      90.0                87.6
                     QDA     23    -         88.9      87.3                90.4
                     kNN     23    5 (5–7)   83.3      91.5                75.2
                     SIMCA   17*   -         77.8      61.0                96.5
Second derivatives   LDA     26    -         93.7      94.4                93.0
                     QDA     26    -         95.1      92.5                98.7
                     kNN     26    3         86.8      90.5                83.2
                     SIMCA   22*   -         79.9      66.3                93.1
Combined (raw data,  LDA     24    -         90.3      97.6                82.7
first and second     QDA     24    -         92.3      91.4                93.7
derivatives)         kNN     24    5 (5–7)   86.8      89.8                83.6
                     SIMCA   22*   -         91.7      85.5                97.4

* No. of PCs representing 99% variability within each class.

The average overall classification accuracy for all the models was about 80%. The SIMCA and kNN-based models yielded a lower average overall classification accuracy than LDA and QDA. However, on comparing the individual class classification accuracies, it was found that the QDA and SIMCA models yielded higher average class 2 (HLB class) classification accuracies, that is, lower false negatives (HLB classified as healthy), than LDA and kNN. It is more important to have low false negatives than low false positives (healthy classified as HLB) to ensure the identification of all HLB-infected trees. It was found that SIMCA resulted in higher class 2 classification accuracies (low false negatives) than the other models. Thus, it could be stated that SIMCA was better suited for detecting HLB-infected leaves. However, the QDA-based algorithm resulted in higher overall classification accuracies, with first derivatives dataset resulting in highest overall and individual (HLB and healthy) class classification accuracies (Table 1).

3.3. Analysis of the second derivatives dataset

ANOVA indicated that the classification accuracies (overall, class 1 and class 2) using different classifier models were different from each other (α = 0.05) when the second derivatives dataset was used. Among all the different models, the QDA-based algorithm yielded the highest average overall and class 2 (HLB) classification accuracies. With the exception of the kNN-based algorithm, all the classifier models yielded class 2 classification accuracies higher than 90%. As with the first derivatives, the SIMCA-based algorithm resulted in the lowest class 1 classification accuracies. The classification accuracies (overall, class 1 and class 2) with the second derivatives dataset were higher than those with the first derivatives dataset.

3.4. Analysis of the combined dataset

In the combined dataset, the models resulted in different classification accuracies as analyzed using ANOVA (α = 0.05).
The lowest average overall classification accuracy was found with the kNN-based model, while LDA resulted in the lowest average class 2 classification accuracies. Among the different models, the QDA as well as SIMCA-based algorithms resulted in high average overall and class 2 classification accuracies. The class 2 classification accuracies of the SIMCA-based algorithm were found to be higher than those of the other models. In comparison to the first and second derivatives datasets, the SIMCA-based model resulted in high class 1 classification accuracies using the combined dataset.

Fig. 6. Residual plots during training and cross-validation using SIMCA. [Panels (a) and (b): samples/scores plots of the training data showing Q residuals (0.81% and 0.64%) versus Hotelling T^2 (99.19% and 99.36%) on logarithmic axes, with class 1 (healthy), class 2 (diseased), and the 95% confidence level marked.]

Fig. 7. Average classification accuracies (all datasets) based on individual classification algorithms.

3.5. Comparison of the datasets

Comparing the datasets (first derivatives, second derivatives and combined), the QDA yielded the highest average overall classification accuracy of 95.1% with an average class 2 accuracy of 98.7% compared to the other models. The statistical analysis indicated that the accuracies (dependent variable) did not change significantly with variation in the dataset type (independent variable) (α = 0.05) when one model (LDA, QDA, kNN, SIMCA) and one classification accuracy type (overall, class 1, class 2) were compared at a time. The overall and individual class classification accuracies (averaging all the dataset types) are presented in Fig. 7.
Although the classification accuracies (overall and individual class) of the combined dataset were similar to those of the other two datasets (first and second derivatives), in most cases the combined dataset showed a lower coefficient of variation than the other two datasets (within each model). Comparing the overall classification accuracies, the QDA model yielded a maximum average classification accuracy of 92%. Although the LDA model resulted in high overall classification accuracies comparable to those yielded by the QDA model, the LDA model resulted in low class 2 classification accuracies (more false negatives). Comparing the class 2 classification accuracies, it was found that the SIMCA-based algorithm gave maximum classification accuracies of about 96% with a low variation (standard deviation of 3.7%). As the class 1 classification accuracies of the SIMCA-based algorithm were lower, its overall classification accuracies were lower than those of the other models.

3.6. Analysis of asymptomatic samples

A plant disease detection tool should be rapid, specific to a particular disease, and sensitive for detection at the early onset of the symptoms (López et al., 2003). For these reasons, in addition to the spectra collected from healthy and symptomatic leaves in the field, another set of spectra was collected from asymptomatic leaves on the same branch as the symptomatic leaves, as described in the methodology section.

Fig. 8. Classification results of spectral features of asymptomatic leaf samples.

The classification accuracies acquired during validation (model testing) and testing (unknown data testing) are summarized in Fig. 8. Considering the symptomatic and healthy leaf samples (from HLB-infected trees), the classification accuracies (for overall, class 1 and class 2) were found to be higher than 90%.
Similarly, when the asymptomatic spectral samples were analyzed, it was found that the overall classification accuracies were about 38%. Although the overall classification accuracies were found to be low, the class 2 (HLB class) classification accuracies were found to be about 88%. These high class 2 classification accuracies, or low number of false negatives, indicate that the spectroradiometer has good potential for detecting asymptomatic HLB leaves under field conditions. The low overall classification accuracy (38%) of the asymptomatic leaf samples resulted from the low class 1 classification accuracies (more false positives). A possible reason for the low class 1 (healthy) classification accuracies is that the asymptomatic leaves labeled as healthy did not have a sufficient population of HLB bacteria to be PCR positive. Thus, the asymptomatic leaves could be infected with bacteria, yet neither exhibit symptoms nor appear PCR positive. It usually takes 3–6 months or more, depending on the age and physiological status of the trees, for the leaves of HLB-infected trees to turn from PCR negative to PCR positive upon infection. In this study, as the asymptomatic leaves were PCR negative, they were considered healthy. However, we could not confirm whether the healthy appearing asymptomatic leaves turned from PCR negative to positive, as these trees had to be removed after data collection to prevent the further spread of HLB infection through vectors to other parts of the orchard.

4. Conclusions

The potential of a portable, field-based, visible-near infrared spectroradiometer for HLB detection in a citrus orchard was assessed. The results indicated that statistical classifier models such as QDA can distinguish between healthy and symptomatic HLB-infected leaves with high classification accuracies of greater than 90%.
Comparing different classifier models, QDA consistently resulted in high average overall classification accuracies with low false positives and false negatives. Similarly, the SIMCA-based algorithm yielded good HLB class classification accuracies (or low false negatives). When the asymptomatic leaves were analyzed, it was found that HLB class samples could be classified with accuracies of about 88%. This study demonstrates the potential of optical sensors in disease detection. It is worth noting that there were a number of challenges during data collection under different field conditions. These included, among others, wind, which caused some spectral variation, and the presence of HLB-infected leaves within the canopy. Although this study demonstrates the potential of optical sensors for disease detection, a larger amount of data needs to be collected to further validate these results. In addition, there is a need to evaluate the applicability of optical sensors in predicting asymptomatic HLB-positive leaves. Our future studies will involve working on some of these aspects.

Acknowledgments

The authors would like to thank the United States Department of Agriculture (USDA)-National Institute of Food and Agriculture (NIFA) and the Citrus Research and Development Foundation (CRDF) for funding this research. We would like to express our gratitude to Dr. Joao Camargo Neto, Ms. Sherrie Buchanon and Dr. Jose Gonzalez-Mora for their assistance during data collection.

References

Balabin, R.M., Safieva, R.Z., Lomakina, E.I., 2010. Gasoline classification using near infrared (NIR) spectroscopy data: comparison of multivariate techniques. Analytica Chimica Acta 671, 27–35.
Bylesjö, M., Rantalainen, M., Cloarec, O., Nicholson, J.K., Holmes, E., Trygg, J., 2006. OPLS discriminant analysis: combining the strengths of PLS-DA and SIMCA classification. Journal of Chemometrics 20, 341–351.
Chen, B., Wang, K., Li, S., Wang, J., Bai, J., Xiao, C., Lai, J., 2008. Spectrum characteristics of cotton canopy infected with verticillium wilt and inversion of severity level. In: IFIP International Federation for Information Processing, vol. 259. In: Daoliang, Li (Ed.), Computer and Computing Technologies in Agriculture, vol. II, Springer, Boston, pp. 1169–1180.
Chung, K.R., Brlansky, R.H., 2009. Citrus diseases exotic to Florida: Huanglongbing (citrus greening). Fact sheet PP-210. Florida Cooperative Extension Service, Institute of Food and Agricultural Sciences, University of Florida.
Delalieux, S., Van Aardt, J., Keulemans, W., Schrevens, E., Coppin, P., 2007. Detection of biotic stress (Venturia inaequalis) in apple trees using hyperspectral data: non-parametric statistical approaches and physiological implications. European Journal of Agronomy 27 (1), 130–143.
Futch, S., Weingarten, S., Irey, M., 2009. Determining HLB infection levels using multiple survey methods in Florida citrus. Proceedings of the Florida State Horticultural Society 122, 152–158.
Khot, L.R., Panigrahi, S., Woznica, S., 2008. Neural-network-based classification of meat: evaluation of techniques to overcome small dataset problems. Biological Engineering 1, 127–143.
Kong, L., Yi, D., Sprigle, S., Wang, F., Wang, C., Liu, F., Adibi, A., Tummala, R., 2010. Single sensor that outputs narrowband multispectral images. Journal of Biomedical Optics 15 (1), 1–3, 010502.
Lacava, P.T., Li, W.B., Araújo, W.L., Azevedo, J.L., Hartung, J.S., 2006. Rapid, specific and quantitative assays for the detection of the endophytic bacterium Methylobacterium mesophilicum in plants. Journal of Microbiological Methods 65, 535–541.
Li, W., Abad, J.A., French-Monar, R.D., Rascoe, J., Wen, A., Gudmestad, N.C., Secor, G.A., Lee, I.M., Duan, Y., Levy, L., 2009. Multiplex real-time PCR for detection, identification and quantification of ‘Candidatus Liberibacter solanacearum’ in potato plants with zebra chip. Journal of Microbiological Methods 78 (1), 59–65.
Li, W., Hartung, J.S., Levy, L., 2006. Quantitative real-time PCR for detection and identification of Candidatus Liberibacter species associated with citrus Huanglongbing. Journal of Microbiological Methods 66 (1), 104–115.
López, M.M., Bertolini, E., Olmos, A., Caruso, P., Gorris, M.T., Llop, P., Penyalver, R., Cambra, M., 2003. Innovative tools for detection of plant pathogenic viruses and bacteria. International Microbiology 6, 233–243.
Muraro, R.P., Morris, R.A., 2009. The dynamics and implications of recent increases in citrus production costs. Fact sheet FE-793. Florida Cooperative Extension Service, Institute of Food and Agricultural Sciences, University of Florida.
Naidu, R.A., Perry, E.M., Pierce, F.J., Mekuria, T., 2009. The potential of spectral reflectance technique for the detection of Grapevine leafroll-associated virus-3 in two red-berried wine grape cultivars. Computers and Electronics in Agriculture 66, 38–45.
Panigrahi, S., Chang, Y., Khot, L.R., Glower, J., Logue, C.M., 2008. Integrated electronic nose system for detection of Salmonella contamination in meat. IEEE Sensors Applications Symposium, Atlanta, GA, pp. 85–88.
Purcell, D.E., O’Shea, M.G., Johnson, R.A., Kokot, S., 2009. Near-infrared spectroscopy for the prediction of disease rating for Fiji leaf gall in sugarcane clones. Applied Spectroscopy 63 (4), 450–457.
Ray, S.S., Jain, N., Miglani, A., Singh, J.P., Singh, A.K., Panigrahy, S., Parihar, J.S., 2010. Defining optimum spectral narrow bands and bandwidths for agricultural applications. Current Science 98 (10), 1365–1369.
Sankaran, S., Mishra, A., Ehsani, R., Davis, C., 2010. A review of advanced techniques for detecting plant diseases. Computers and Electronics in Agriculture 72 (1), 1–13.
Spinelli, F., Noferini, M., Costa, G., 2006. Near infrared spectroscopy (NIRs): Perspective of fire blight detection in asymptomatic plant material. Proceeding of 10th International Workshop on Fire Blight. Acta Hort 704, 87–90.
Teixeira, D.C., Danet, J.L., Eveillard, S., Martins, E.C., Junior, W.C.J., Yamamoto, P.T., Lopes, S.A., Bassanezi, R.B., Ayres, A.J., Saillard, C., Bové, J.M., 2005. Citrus huanglongbing in São Paulo State, Brazil: PCR detection of the ‘Candidatus’ Liberibacter species associated with the disease. Molecular and Cellular Probes 19, 173–179.
Yang, C.M., Cheng, C.H., Chen, R.K., 2007. Changes in spectral characteristics of rice canopy infested with brown planthopper and leaffolder. Crop Science 47, 329–335.