DEPARTMENT for ENVIRONMENT, FOOD and RURAL AFFAIRS
Research and Development
CSG 15 Final Project Report (Not to be used for LINK projects)

Project title: Chlorophyll fluorescence spectral discrimination by artificial neural network methods
DEFRA project code: HH1530SPC
Contractor organisation and location: Horticulture Research International, Wellesbourne, CV35 9EF
Total DEFRA project costs: £9,851
Project start date: 01/12/01
Project end date: 01/04/02

Executive summary (maximum 2 sides A4)

Background

Prediction of the future home-life quality of ornamental plants at the point of sale is of major interest to the horticulture industry. Plants that appear healthy during shelf-life may decline rapidly during home-life because of marketing stress or damage. Techniques for early detection of latent plant damage are therefore relevant to ROAME HH16 on Uniformity in Crop Produce and to the grower and retailer. LINK project HL0134LPC on Robust product design and prediction for post-harvest pot-plant quality and longevity used the rapid induction kinetics of chlorophyll fluorescence (CF) as a predictor of the post-harvest quality of ornamental pot-plants. However, that project used only a simple linear regression model based on a simple summary measure of the CF data. The aim of this project, HH1530SPC, was to test whether artificial neural networks (ANNs) could produce better predictions of plant quality than those obtained in HL0134LPC.
Chlorophyll fluorescence quality prediction models

The chlorophyll fluorescence curves used in HL0134LPC involved nine parameters: five values F1, F2, F3, F4 and F5 at fixed time points, minimum and maximum values F0 and Fm, the time to maximum fluorescence TFm and the area above the curve. The HL0134LPC project consultant (Professor Strasser, Geneva) supplied a summary CF performance index (PI) intended to capture all the useful CF information in the formula:

F F F F F F F PI 2 1 2. m 1 .1 1 . m 1 .1 4 1 . F F 3 1 Fm F1 Fm F1

PI was then used to predict future plant quality by using a simple linear regression model based on PI. In project HH1530SPC, the linear regression used in HL0134LPC was generalised in two ways. First, the assumption of a single regression parameter PI was generalised by replacing the single PI predictor variable by a multivariate set of CF parameters. Second, the assumption of linearity was generalised by replacing the linear regression model by a non-linear ANN model. These generalisations led to three alternative CF models:

i) Model I: Simple linear regression fitted to PI
ii) Model II: Multivariate linear regression fitted to CF parameters
iii) Model III: ANN model fitted to CF parameters

The three models were fitted to the HL0134LPC data and were then compared using statistical methodology.

Predictive CF models for begonia and Poinsettia quality variates (Objectives 1 and 2)

Predictive models were fitted for begonia flower count, flower drop and damaged leaf count and for Poinsettia leaf drop and bract drop using the three alternative CF models outlined above. For each model, a predicted index was generated for each observation and the observed plant quality variables were then plotted against the predicted index.
The plots were used to provide a graphical comparison of the power of the three alternative models for predicting plant quality variates using CF data.

i) CF predictions for individual begonia plants recorded at de-sleeving, showing the relationship between the observed square root of flower count two weeks after de-sleeving and the predicted model index for each plant for the three possible models.

[Figure: three scatter plots, [a] Model I, [b] Model II and [c] Model III. Vertical axis: Square Root of Flower Count at Week 2. Horizontal axes: IndexMI, IndexMII and IndexMIII at desleeving.]

The vertical scatter about the fitted line shows the goodness-of-fit of each individual model and it is apparent that Model III, the ANN model, gave a small improvement in fit over both Models I and II. However, the residual scatter for all three models remained substantial and the predictions for the individual plants remained unreliable.

ii) CF predictions for individual Poinsettia plants recorded at de-sleeving, showing the relationship between the observed square root of leaf drop two weeks after de-sleeving and the predicted model index for each plant for the three possible models.

[Figure: three scatter plots, [a] Model I, [b] Model II and [c] Model III. Vertical axis: Square Root of Leaf Drop at Week 2. Horizontal axes: IndexMI, IndexMII and IndexMIII at desleeving.]

The vertical scatter about the fitted line shows the goodness-of-fit of each individual model and it is apparent that Model III, the ANN model, gave a very substantial improvement in fit over both Models I and II. The scatter for Model III was substantially reduced but prediction for the individual plants still remained problematical, even for Model III.
The predictive power of ANN models relative to linear regression (Objective 3)

The power of ANNs for predicting plant home-life quality using CF data was assessed by calculating the percentage variance explained by the three models for the measured begonia and Poinsettia quality variates after two weeks in home-life conditions. The power of the ANN model (Model III) was compared with the power of the simple linear regression on PI (Model I) and the power of the multiple linear regression on CF parameters (Model II) in the following tabulation.

Percentage variance explained by each model for each variate.

Variate                       Model I   Model II   Model III
Begonia flower count           14.6%     22.3%      30.4%
Begonia flower drop            31.1%     31.9%      39.9%
Begonia damaged leaf count     16.3%     25.9%      33.6%
Poinsettia leaf drop           27.3%     40.4%      50.2%
Poinsettia bract drop           2.5%      3.8%      11.1%

The ANN model gave better predictions and increased power for all the recorded variates for both begonia and Poinsettia. However, for the begonia variates, the increase in power relative to the increased complexity of the models was relatively modest and did not represent any major improvement in the fitted models. The Poinsettia bract drop predictions were too weak to be interesting but the ANN model achieved a very substantial improvement in leaf drop prediction. This showed that for Poinsettia, there was a very real improvement in predictive power for the ANN model relative to the linear regression methods.

Delivery against objectives

This project has delivered a new ANN methodology for improved prediction of plant quality variables from observed chlorophyll fluorescence data. Overall, the power of chlorophyll fluorescence for predicting the home-life performance of individual pot-plants was modest and even the most effective model, the ANN model for poinsettia leaf drop, explained only about 50% of the observed variability.
However, there appeared to be real potential for batch screening using batch sampling and the models developed in this report will be used in HL0134LPC to test the effectiveness of CF for batch screening. The goodness-of-fit information from the various models will be used to estimate the power of CF screening to discriminate between batches of plants subjected to different levels of stress. The models will also provide additional insight into the relationship between a CF response curve and the subsequent quality of a pot-plant and could have important applications in future research on plant quality.

Scientific report (maximum 20 sides A4)

1) Introduction

The quality of ornamental plants can be determined by visual inspection at the point of sale but damage caused by poor handling or lack of temperature control during transport may not be apparent at that time. However, as damaged plants can deteriorate rapidly after purchase, some method of detecting plant damage before the visual symptoms of damage become apparent is highly desirable. Chlorophyll fluorescence (CF) measures the photosynthetic activity of plants and has the potential to detect latent damage (DeEll, van Kooten et al., 1999). Since CF monitoring is rapid and non-destructive, the technique has the potential to provide a useful method for routine quality control of pot-plants at the point of sale.

The Horticulture LINK project HL0134LPC on Robust product design and prediction for post-harvest pot-plant quality and longevity has used the rapid induction kinetics of chlorophyll fluorescence as a predictor of post-harvest quality of ornamental pot-plants.
Professor Strasser, the project consultant, has developed a theory-based measure of photosynthetic activity called the performance index (PI) using the principles of the JIP test (Parsons, Edmondson et al., 2001). PI is a non-linear measure of plant photosynthesis based on a combination of the characteristics of the CF spectral response curve and is intended to reduce the high-dimensional information of a CF curve to a single variable. In HL0134LPC, it was assumed that PI captured all the useful CF information relating to plant damage and that PI was the most appropriate CF measure for the prediction of plant quality. Hence the future home-life performance of a plant was predicted using PI as the sole CF predictor in an ordinary linear regression model.

There are two main difficulties with this approach. First, although PI may be a useful theoretical measure of photosynthetic activity, it does not necessarily follow that PI is the most useful statistical predictor of plant damage. Second, although linear statistical regression is powerful for prediction when there is a simple linear relationship between the predictor variables and the outcome variable, it is unclear whether this assumption is valid for the prediction of plant quality from chlorophyll fluorescence measurements.

The methodology used in HL0134LPC for relating home-life quality to the initial CF response can be generalised in two ways. First, the assumption that PI captures all the useful information in a CF spectrum can be relaxed by fitting a multiple linear regression model using a range of parameters from a CF response curve. Second, the assumption of linearity can be relaxed by fitting a non-linear model using artificial neural network (ANN) methods. ANN methods have been shown to be useful for classification of plant species using a range of CF parameters as inputs (Tyystjärvi, Koski et al., 1999) and should have similar utility for plant quality prediction.
In this report, the linear regression model used in HL0134LPC will be generalised by replacing the single PI predictor variable by a multivariate set of chlorophyll fluorescence parameters with coefficients chosen to maximise the power of the linear regression model. The model will then be further generalised by fitting a non-linear ANN model to relax the assumption of linearity. The utility of the three models, the linear PI model, the linear multivariate model and the fully general ANN model, will be compared and the power of the three methods will be assessed. Finally, the appropriateness of the single PI measure as a predictor of home-life quality will be assessed by comparison with the generalised predictors constructed using multiple linear regression and ANN methods.

2) Objectives of the project

i) To use ANN methods to construct a predictive model for begonia flower drop, bud drop and flower count after week 1 of home-life using CF parameter data collected at the start of home-life in 1999, 2000 and 2001

ii) To use ANN methods to construct a predictive model for poinsettia bract drop and leaf drop after week 1 of home-life using CF parameter data collected at the start of home-life in 1999/00, 2000/01 and 2001/02

iii) To quantify the predictive power of ANN models for begonia and poinsettia quality using cross-validation methods to compare the power of ANN models with the power of simple linear regression models

3) Chlorophyll fluorescence

i) Chlorophyll fluorescence measurement

The chlorophyll fluorescence measurements for project HL0134LPC were made using a Plant Efficiency Analyser (PEA) supplied by Hansatech Instruments Ltd., King's Lynn. The PEA is a portable instrument designed to measure chlorophyll fluorescence induction by high time resolution continuous excitation.
The instrument measures the time-dependent changes in fluorescence emission that occur when a dark-adapted leaf is exposed to light. Typically, illumination of a healthy leaf after 10 - 30 minutes of dark adaptation will result in an immediate rise to an initial level (F0) followed by a rapid polyphasic rise to a maximum fluorescence level (Fm), as exemplified in Figure 1.

Figure 1: A typical chlorophyll fluorescence response curve showing the position of the measured parameters used to characterise the shape of the curve

[Figure 1: plot of Normalized Fluorescence (0.0 to 1.0) against Time (milliseconds, axis marked from 0.01 to 1000), marking the fluorescence parameters F0, F1 to F5 (at 0.05, 0.1, 0.3, 2 and 30 ms), Fm and TFm, with Area = area above the curve from 0 to TFm.]

The PEA was set to record fluorescence values at five time points, 0.05 ms, 0.1 ms, 0.3 ms, 2 ms and 30 ms, within the polyphasic kinetic part of the fluorescence induction curve (also called the Kautsky curve). The corresponding fluorescence values F1, F2, F3, F4 and F5, together with F0, Fm, the time to maximum fluorescence (TFm) and the area above the fluorescence curve at TFm, were recorded. The relationship between these nine measures and the chlorophyll fluorescence curve is summarised in Fig 1. (F0 is estimated by backward extrapolation from the main curve and is indicated by the lozenge symbol on the vertical axis.)

ii) Performance Index (PI)

The most commonly used measure of plant stress is the ratio Fv/Fm, where Fv = (Fm - F0); this ratio is a measure of the quantum yield of the dark-adapted photosystem II (PSII). Project HL0134LPC investigated the use of the rapid induction kinetics of chlorophyll fluorescence for prediction of post-harvest quality of pot-plants using a performance index (PI) based on the principles of the JIP test (Strasser, Eggenberg and Strasser, 1996).
The performance index supplied by the project consultant Professor Strasser at the University of Geneva was defined by the equation:

F F F F F F F PI 2 1 2. m 1 .1 1 . m 1 .1 4 1 F3 F1 Fm F1 Fm F1 (1)

In HL0134LPC, the chlorophyll fluorescence spectral response curve for each individual plant was summarised by PI, which was then used as a predictor for future home-life performance using ordinary linear regression methods (Parsons, Edmondson et al., 2001).

4) Plant quality attributes

i) Measured attributes of plant quality for HL0134LPC

Project HL0134LPC assessed the quality of begonia and poinsettia plants in shelf-life by scoring the quality of the plants using expert and consumer scores and by measuring a range of physiological plant attributes of quality. In addition to the quality assessment score on each plant, the following physiological characteristics were simultaneously recorded on each plant during each week of shelf-life:

Begonia
a) Flower drop
b) Bud drop
c) Flower count
d) Damaged flower count
e) Damaged leaf count

Poinsettia
a) Bract drop
b) Leaf drop

ii) Prediction of individual attributes of plant quality

The overall quality of a plant is dependent on the observable attributes of quality and in this report the utility of chlorophyll fluorescence for predicting quality during home-life has been examined by modelling each attribute separately. One advantage of this approach is that if marketing stress affects only certain individual attributes of quality, fitting individual models will predict the effects of marketing stress with maximum sensitivity. A further advantage is that if different characteristics of the chlorophyll fluorescence curve are predictive for different aspects of stress damage, modelling plant attributes individually may give additional information about the relationship between chlorophyll fluorescence and the effects of stress damage.
The ultimate aim of the work will be to develop predictors for overall quality by integrating the individual plant attributes of quality into a single measure of overall plant quality. That work will be developed in LINK project HL0134LPC and in DEFRA project HH1529SPC and will not be discussed here.

5) Chlorophyll fluorescence prediction of quality

i) Simple linear prediction method based on PI

HL0134LPC tested the effects of a range of different marketing regimes on pot-plant quality by subjecting batches of plants to simulated marketing and then monitoring the subsequent performance of the plants during a period in simulated home-life environments. The chlorophyll fluorescence spectrum of the individual plants was recorded at de-sleeving immediately following simulated marketing and the quality attributes of the individual plants were then recorded weekly during the simulated home-life period.

Let PI represent the performance index at de-sleeving and let yt represent a plant quality attribute measured on the tth recording occasion after de-sleeving. Let Ht represent a set of home-life environmental factors assumed not to interact with PI and let et represent a random error term. Then a predictive linear regression model for yt based on PI is:

$$y_t = \beta \cdot PI + H_t h + e_t \qquad (2)$$

Equation (2) is the simple linear regression equation for yt that was used to model the attributes of plant quality in HL0134LPC. Unfortunately, the correlation between PI and the subsequent plant quality in home-life for individual plants was found to be weak and there appears little prospect that a simple regression equation based on PI will be useful for predicting individual plant quality (see Year 3 Annual Report for HL0134LPC).
ii) Generalisation of the simple linear prediction method

Potentially there are at least two ways of achieving more powerful predictions of future plant quality using chlorophyll fluorescence data.

a) Multivariate linear prediction based on the individual chlorophyll fluorescence parameters, where each parameter is given a proper empirical weighting for predicting plant quality attributes.

b) Multivariate non-linear prediction based on non-linear models of the chlorophyll fluorescence parameters using trained artificial neural networks (ANNs).

Multivariate linear regression analysis is a statistical technique that can be applied using standard statistical methodology and is a natural generalisation of simple linear regression. ANN models are potentially more powerful than linear regression models but the methodology is less rigorous and there is a risk that models may be over-fitted, with consequent spurious estimates of the power of the model. The inputs for ANN models are the individual chlorophyll fluorescence parameters of a CF curve, so the natural reference models for assessing the power of an ANN model are the corresponding multivariate regressions on the individual chlorophyll fluorescence parameters. It is essential, therefore, to compare ANN models both with the simple linear regressions on PI and with the multivariate linear regressions on the individual chlorophyll fluorescence parameters.

In the remainder of this report, the three basic models, the linear regression model based on PI, the multivariate regression model based on the individual parameters of the chlorophyll fluorescence curve and the full ANN model, will be designated Model I, Model II and Model III, respectively. These three models will form a structured reference set and will be discussed and developed in a structured way to emphasise the thematic nature of the project.
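As an illustration of how a single-index regression and a multivariate regression can be fitted and compared, the following minimal Python sketch fits both by ordinary least squares and compares their residual sums of squares. It is not the software used in the project. The data are synthetic and noise-free, the inputs x1 and x2 merely stand in for CF parameters, the product x1 × x2 is a hypothetical stand-in for a single summary index such as PI, and all function names are illustrative.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the constant term
    p = len(x[0])
    xtx = [[sum(xi[j] * xi[k] for xi in x) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(x, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


def residual_ss(coefs, rows, y):
    """Sum of squared differences between observed and fitted values."""
    total = 0.0
    for r, yi in zip(rows, y):
        fitted = coefs[0] + sum(c * v for c, v in zip(coefs[1:], r))
        total += (yi - fitted) ** 2
    return total


# Synthetic data (an assumption for illustration only, not project data).
rng = random.Random(1)
cf_params = [(rng.random(), rng.random()) for _ in range(30)]
quality = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in cf_params]

# Model I analogue: regression on one fixed summary index of the inputs.
summary_index = [(x1 * x2,) for x1, x2 in cf_params]
coefs_1 = fit_least_squares(summary_index, quality)
rss_1 = residual_ss(coefs_1, summary_index, quality)

# Model II analogue: each input receives its own empirical coefficient.
coefs_2 = fit_least_squares(cf_params, quality)
rss_2 = residual_ss(coefs_2, cf_params, quality)

print("Model I analogue residual s.s.: ", rss_1)
print("Model II analogue residual s.s.:", rss_2)
```

In this constructed example the response is exactly linear in the individual inputs, so the multivariate fit recovers it while the fixed summary index cannot. Real CF data need not behave this way.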
6) Models for quality prediction

Model I. The simple linear regression model $y_t = \beta \cdot PI + H_t h$ predicts the quality attribute output (yt) using the performance index (PI) at desleeving and a set of home-life factor conditions Ht. Project HL0134LPC used two home-life treatments, temperature and lighting. Replacing Ht by explicit home-life factors therefore gives the model

$$y_t = \mu + Temp + Light + \beta \cdot PI \qquad \text{(Model I)}$$

The dependencies shown in Model I can be represented graphically by a series of connected nodes where the various nodes represent inputs or outputs and the lines connecting pairs of nodes represent the relationships between those nodes. The graphical representation of Model I is shown in Fig 2, where the relationship between PI and the output node is represented by the regression coefficient β.

Fig 2: Model I represented as a network from inputs, PI, home-life factors and constant, to output (yt).

[Fig 2: network diagram with input nodes PI (CF Performance Index) and the home-life factors Light, Temp. and Constant, all connected to the Output node.]

Model II. The linear generalisation of Model I allows each individual chlorophyll fluorescence parameter to have a linear empirical regression coefficient. The single regression on PI is replaced by a sum of regression terms $\beta_i x_i$ of the individual chlorophyll fluorescence parameters to give

$$y_t = \mu + Temp + Light + \sum_{i=1}^{n} \beta_i x_i \qquad \text{(Model II)}$$

The dependencies in Model II can be shown graphically, as in Fig 3:

Fig 3: Model II represented as a network from inputs, CF parameters, home-life factors and constant, to output (yt).

[Fig 3: network diagram with CF parameter input nodes x1, x2, ..., xn and the home-life factors Light, Temp. and Constant, all connected to the Output node.]

Model III. The full generalisation of Model II is an artificial neural network (ANN) model.
In an ANN model, the CF parameters at desleeving are the inputs to an artificial neural network with j hidden layer nodes, a logistic activation function ($\phi_h$) and a linear output function ($\phi_o$). The hidden layer provides the non-linear generalisation of Model II by acting as a series of non-linear switches and weights that allow the input variables to be combined in various complex ways. This allows a general non-linear function of the CF parameters at desleeving to be used to predict subsequent home-life quality. By increasing the size of the hidden layer, neural networks can approximate any continuous function; Model III is therefore a true generalisation of Model II. The non-linear model can be expressed algebraically by the formula:

$$y_t = \mu + Temp + Light + \phi_o\Big(\sum_j w_{jo}\,\phi_h\Big(\sum_i w_{ij} x_i\Big)\Big) \qquad \text{(Model III)}$$

where $\phi_h(z) = \exp(z) / (1 + \exp(z))$. A representation of the nodes and interconnections of the various layers can be shown graphically, as in Fig 4:

Fig 4: Model III represented as a network from inputs, CF parameters, home-life factors and constant, to output (yt).

[Fig 4: network diagram with CF parameter input nodes x1, x2, ..., xn connected by weights wij to the hidden layer nodes (j), which are connected by weights wjo to the Output node; the home-life factors Light, Temp. and Constant are connected directly to the Output node.]

The number of parameters in Model III can be changed by increasing or decreasing the number of hidden layer nodes in the network. An important aspect of fitting or training a neural network model for a particular task is therefore the specification of the number of nodes in the hidden layer and of the interconnections between the various nodes in the model.

7) Cross-validation and model fitting

i) Cross-validation

In the original work plan, it was intended that cross-validation methods would be used to fit and validate the different models. Cross-validation divides data into two subsets and uses one subset to fit the model and the other subset to test the performance of the fitted model.
The procedure is repeated many times using different divisions of the data to generate a sampling distribution for the assumed model. Cross-validation has been reported extensively in the literature for modelling large data sets and it was anticipated that the methodology would be appropriate for this project. However, the relatively modest amount of data available for this project meant that cross-validation was less useful than expected and the method has not been used in this project.

ii) Fitting and validation for Models I and II

For Models I and II, model fit has been assessed by using conventional R2 methods based on the percentage of the total variance explained by a fitted model. The significance of the individual terms in a model has been assessed by fitting a maximal model including all potential explanatory variables and then sequentially testing the amount of variability explained by each term in the model. Terms that explained a non-significant amount of variation were omitted from the model and the process was repeated until no further terms could be omitted. The importance of each remaining term in the final fitted model was then assessed by omitting that term alone from the final fitted model and assessing the variability it explained.

iii) Parameter normalisation for Model III

For an artificial neural network model, the inputs (fluorescence parameters) must be scaled to lie in the interval [0,1]. In this project the ANN input parameters F0, F1, F2, F3, F4 and F5 have been scaled by division by Fm, TFm has been scaled by division by 1000 and the area above the fluorescence curve (Area) at TFm has been scaled by division by Fm × TFm.
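The scaling rules above, and the way the scaled inputs then enter the Model III formula of section 6, can be sketched in Python as follows. This is a minimal illustration, not the project's fitting software. The raw parameter values, the network weights and the function names are all hypothetical, the linear output function is taken to be the identity, and the units of TFm and Area are assumed to be such that the stated divisors apply.

```python
import math


def normalise_cf(raw):
    """Scale raw CF parameters: F0..F5 by Fm, TFm by 1000, Area by Fm * TFm."""
    fm, tfm = raw["Fm"], raw["TFm"]
    scaled = {name: raw[name] / fm for name in ("F0", "F1", "F2", "F3", "F4", "F5")}
    scaled["TFm"] = tfm / 1000.0
    scaled["Area"] = raw["Area"] / (fm * tfm)
    return scaled


def logistic(z):
    """Logistic activation; algebraically equal to exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def model3_predict(x, hidden_weights, output_weights, constant, temp, light):
    """constant + Temp + Light + sum_j w_jo * logistic(sum_i w_ij * x_i).

    hidden_weights holds one dict per hidden node, mapping an input name to
    its weight w_ij; inputs absent from a dict are not connected to that node.
    """
    hidden = [
        logistic(sum(w * x[name] for name, w in node.items()))
        for node in hidden_weights
    ]
    return constant + temp + light + sum(
        w_out * h for w_out, h in zip(output_weights, hidden)
    )


# Hypothetical raw readings for one plant (illustrative values only).
raw = {
    "F0": 400.0, "F1": 450.0, "F2": 500.0, "F3": 700.0, "F4": 1100.0,
    "F5": 1700.0, "Fm": 2000.0, "TFm": 300.0, "Area": 240000.0,
}
scaled = normalise_cf(raw)

# Hypothetical pruned network with two hidden nodes (not fitted weights).
hidden_weights = [{"F4": -2.0, "TFm": 1.5}, {"F5": 0.5, "Area": -1.0}]
output_weights = [3.0, -1.0]
y_hat = model3_predict(
    scaled, hidden_weights, output_weights, constant=5.0, temp=-0.2, light=-0.1
)

print(scaled)
print(y_hat)
```

Representing each hidden node as a dictionary of its connected inputs makes it straightforward to describe a pruned network, in which only some input-to-hidden connections are retained.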
iv) Fitting and validation for Model III The fully connected ANN model shown in Fig 4 has every chlorophyll fluorescence input node connected to every hidden layer node, with the hidden layer nodes and the home-life input factors connected directly to the output node. The fitting and validation procedure for this model requires the pruning of unnecessary CF input nodes and connections until a minimum network with explanatory power similar to the fully connected model is obtained. Unnecessary connections have very little explanatory power and are unlikely to explain real characteristics of the data. The pruning procedure used here was to test the effect of omitting a connection from the network shown in Fig 4 by examining the amount of variation explained by each connection using an approximate R2 procedure. Connections and nodes that had little explanatory value were omitted until a minimum network was achieved. 8) Predictive models for Begonia attributes of plant quality (Objective 1) i) Preliminary analysis of quality characteristics Preliminary examination indicated that none of the correlations between the begonia plant quality attributes during home life and the chlorophyll fluorescence at marketing were strong. However, the three characteristics with the best correlations were the week 2 flower drop, week 2 flower count and week 2 damaged leaf count data, not the week 1 bud drop, week 1 flower count and week 1 flower drop characteristics as suggested in the original work plan. Therefore it was decided that the week 2 data set would provide the most useful data for this project. Due to changes in the initial settings of the PEA by Hansatech at the end of Year 1 of HL0134L, only data from Years 2 and 3 of HL0134L has been used in this project. ii) Models I and II for flower counts Table 1 shows an analysis of variance for Models I and II relating flower counts in week 2 to the mean leaf CF parameter data at desleeving. 
For Model I, the regression coefficient for PI at desleeving was positive, showing that a higher PI value at desleeving gave a higher flower count in week 2 of home-life. For Model II, the important CF parameters for predicting flower counts were, in order of importance, TFm, F4, Area, F3 and F5. The estimated regression parameters show that a plant with a high value of TFm, Area, F4 or F5 at desleeving had a lower flower count in week 2 of home-life whereas a plant with a high value of F3 had a higher flower count at week 2 of home-life.

Table 1: Analysis of variance and chlorophyll fluorescence parameters for begonia flower count data

                       Model I                                   Model II
Term                   df    estimate   s.s.     m.s.            df    estimate   s.s.     m.s.
Total                  270              163.22   0.60            270              163.22   0.60
+ Temp                 1     -0.081     0.27     0.27 ns         1     -0.228     0.27     0.27 ns
+ Light                1     -0.197     2.01     2.01 *          1     -0.224     2.01     2.01 *
+ PI/CF parameters     1     0.036      23.17    23.17 ***       5                37.47    7.50 ***
Residual               267              137.76   0.52            263              123.46   0.47
- TFm                                                            -1    -3.54      -10.91   10.91 ***
- Area                                                           -1    -43.6      -5.99    5.99 ***
- F3                                                             -1    10.96      -5.43    5.43 ***
- F4                                                             -1    -17.50     -8.16    8.16 ***
- F5                                                             -1    -10.94     -3.20    3.20 **

iii) Model III for flower counts

Fig 5 shows the minimum ANN model consistent with the observed flower count and chlorophyll fluorescence data and Table 2 shows the corresponding estimated weights for the individual network connections.

Fig 5: Fitted ANN model for the flower count in week 2 of home-life.

[Fig 5: network diagram. CF parameter inputs (xi; i = 1…6): F0, TFm, Area, F2, F4 and F5, connected by weights wij to the hidden layer nodes (j = 1…3), which are connected by weights wjo to the output (y), the flower count at week 2. The home-life factors Light, Temp. and Constant are connected directly to the output.]

Table 2: Estimated weights in ANN model for flower count

No.   Weight   Connection      Value
1     w11      F0 → h1         10.45
2     w13      F0 → h3         20.26
3     w21      TFm → h1        0.77
4     w22      TFm → h2        3.72
5     w23      TFm → h3        -5.77
6     w33      Area → h3       -13.35
7     w43      F2 → h3         5.49
8     w52      F4 → h2         -2.09
9     w62      F5 → h2         -0.54
10    w63      F5 → h3         -1.55
11    w1o      h1 → o          -93.5
12    w2o      h2 → o          30.20
13    w3o      h3 → o          18.15
14             Constant → o    63.32
15    Temp     Temp → o        -0.23
16    Light    Light → o       -0.21

The estimated network weights in Table 2 characterise the relationship between the CF data and the flower count data. The importance of individual input nodes in the final fitted ANN model was calculated from the change in the sum of squared residuals when each input node in turn was removed from the model. This gave the following ranking of the input nodes, in order of importance: F5, F2, Area, TFm, F4 and F0.

iv) Models I and II for flower drop

Table 3 shows an analysis of variance for the regression models relating flower drop in week 2 to the chlorophyll fluorescence data at desleeving. The regression coefficient for PI in Model I was negative, showing that a higher PI value at desleeving gave a lower flower drop in week 2 of home-life. For Model II, the only important CF parameter for predicting flower drop was F5, with the estimated regression parameter for this term showing that a plant with a high value of F5 at desleeving had a higher flower drop in week 2 of home-life.

Table 3: Analysis of variance and chlorophyll fluorescence parameters for begonia flower drop data

                       Model I                                   Model II
Term                   df    estimate   s.s.     m.s.            df    estimate   s.s.     m.s.
Total                  270              206.95   0.77            270              206.95   0.77
+ Temp                 1     -0.720     36.92    36.92 ***       1     -0.812     36.92    36.92 ***
+ Light                1     -0.181     2.91     2.91 *          1     -0.170     2.91     2.91 *
+ PI/CF parameters     1     -0.038     26.05    26.05 ***       1                27.82    27.82 ***
Residual               267              141.06   0.53            267              139.30   0.52
- F5                                                             -1    12.27      -27.82   27.82 ***

v) Model III for flower drop

Fig 6 shows the minimum ANN model consistent with the observed flower drop and chlorophyll fluorescence data and Table 4 shows the corresponding estimated weights for the individual network connections.
Fig 6: Fitted ANN model for the flower drop at week 2 of home-life.
[Network diagram. CF parameter inputs (xi; i = 1…4): TFm, F1, F4, F5. Home-life factors: Light, Temp. Constant term. Weights wij lead to the hidden layers (j = 1…4) and weights wjo lead to the output (y), flower drop at week 2.]

Table 4: Estimated weights in ANN model for flower drop

No.  Weight  Connection      Value
  1  w13     TFm → h3        -9.77
  2  w14     TFm → h4        46.38
  3  w22     F1 → h2        103.93
  4  w23     F1 → h3        -49.26
  5  w31     F4 → h1         -0.93
  6  w32     F4 → h2        -25.91
  7  w33     F4 → h3         16.88
  8  w34     F4 → h4        -60.59
  9  w41     F5 → h1         -0.79
 10  w44     F5 → h4         16.81
 11  w1o     h1 → o         -47.57
 12  w2o     h2 → o          25.47
 13  w3o     h3 → o           26.2
 14  w4o     h4 → o           1.53
 15          Constant → o   -14.41
 16  Temp    Temp → o        -0.84
 17  Light   Light → o       -0.17

The estimated network weights in Table 4 characterise the relationship between the CF data and the flower drop data. The importance of each input node in the final fitted ANN model was calculated as the change in the sum of squared residuals when that node was removed from the model, taking each node in turn. This gave the following ranking of the input nodes, in order of importance: F1, F4, TFm and F5.

vi) Models I and II for damaged leaf counts

Table 5 shows an analysis of variance for Models I and II relating damaged leaf counts at week 2 to the mean leaf CF parameter data at desleeving. For Model I, the regression coefficient for PI at desleeving was negative, showing that a higher PI value at desleeving gave a lower damaged leaf count in week 2 of home-life. For Model II, the important CF parameters for predicting damaged leaf count were, in order of importance, TFm, F5, F2 and Area, and the estimated regression parameters show that a plant with higher values of TFm, F5, F2 or Area at desleeving had a higher damaged leaf count in week 2 of home-life.
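The analysis of variance tables for Models I and II add terms in a fixed order (Temp, then Light, then the PI or CF terms) and report the sum of squares gained at each step. A minimal sketch of that sequential calculation is given here, assuming ordinary least squares and using synthetic data only: the coding of Temp and Light as 0/1 columns, the four stand-in CF columns and all function names are illustrative assumptions, not the project's data or software.

```python
import numpy as np

def residual_ss(X, y):
    """Residual sum of squares after least-squares regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def sequential_anova(terms, y):
    """Add terms in order and return (name, df, s.s.) rows.

    terms: list of (name, 2-D array of columns), added after the constant.
    """
    n = len(y)
    X = np.ones((n, 1))                       # null model: constant only
    rss_prev = residual_ss(X, y)
    rows = [("Total", n - 1, rss_prev)]
    for name, cols in terms:
        X = np.hstack([X, cols])
        rss = residual_ss(X, y)
        rows.append(("+ " + name, cols.shape[1], rss_prev - rss))
        rss_prev = rss
    rows.append(("Residual", n - X.shape[1], rss_prev))
    return rows

# Synthetic illustration only (hypothetical values, not the begonia data).
rng = np.random.default_rng(0)
n = 271                                       # gives 270 total df
temp = rng.integers(0, 2, n).astype(float).reshape(-1, 1)
light = rng.integers(0, 2, n).astype(float).reshape(-1, 1)
cf = rng.normal(size=(n, 4))                  # four stand-in CF parameters
y = 0.3 * cf[:, 0] - 0.2 * cf[:, 3] + rng.normal(scale=0.6, size=n)

rows = sequential_anova([("Temp", temp), ("Light", light), ("CF", cf)], y)
for name, df, ss in rows:
    print(f"{name:10s} df={df:4d} s.s.={ss:8.2f}")
```

The sequential sums of squares and the residual add up to the total, as they do in the report's tables. The rows marked "-" in those tables give the change when a single CF term is dropped from the full model, which could be obtained in the same way by calling `residual_ss` with that column left out.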
Table 5: Analysis of variance and chlorophyll fluorescence parameters for begonia damaged leaf count data

                      Model I                                 Model II
Term        df  parameter     s.s.    m.s.          df  parameter     s.s.    m.s.
Total      270              124.39    0.46         270              124.39    0.46
+ Temp       1     -0.106     1.02    1.02 ns        1     -0.148     1.02    1.02 ns
+ Light      1      0.016     0.00    0.00 ns        1      0.038     0.00    0.00 ns
+ PI/CF      1     -0.034    20.40   20.40 ***       4               42.64   10.66 ***
Residual   267              102.96    0.39         264               80.73    0.31
- TFm                                               -1       3.19    -8.61    8.61 ***
- Area                                              -1      27.91    -2.50    2.50 **
- F2                                                -1       6.08    -4.67    4.67 ***
- F5                                                -1      14.63    -7.71    7.71 ***

vii) Model III for damaged leaf counts

Fig 7 shows the minimum ANN model consistent with the observed damaged leaf count and chlorophyll fluorescence data and Table 6 shows the corresponding estimated weights for the individual network connections.

Fig 7: Fitted ANN model for the damaged leaf count in week 2 of home-life.
[Network diagram. CF parameter inputs (xi; i = 1…5): TFm, Area, F1, F3, F5. Home-life factors: Light, Temp. Constant term. Weights wij lead to the hidden layers (j = 1…3) and weights wjo lead to the output (y), damaged leaf count at week 2.]

Table 6: Estimated weights in ANN model for damaged leaf count

No.  Weight  Connection      Value
  1  w11     TFm → h1        23.87
  2  w12     TFm → h2         4.43
  3  w13     TFm → h3         6.73
  4  w22     Area → h2       32.85
  5  w23     Area → h3        51.4
  6  w32     F1 → h2          2.78
  7  w33     F1 → h3         -0.35
  8  w42     F3 → h2         10.66
  9  w43     F3 → h3         18.71
 10  w51     F5 → h1         -1.86
 11  w52     F5 → h2         -5.77
 12  w53     F5 → h3        -11.94
 13  w1o     h1 → o         -34.53
 14  w2o     h2 → o         104.17
 15  w3o     h3 → o         -25.49
 16          Constant → o   -41.86
 17  Temp    Temp → o        -0.13
 18  Light   Light → o        0.06

The estimated weights in Table 6 characterise the relationship between the CF data and the damaged leaf count. The importance of individual input nodes in the final fitted ANN model was calculated by the change in the sum of squared residuals for the removal of each input node in turn from the model.
This gave the following ranking, in order of importance, of the input nodes: F5, TFm, F3, F1 and Area.

9) Predictive models for Poinsettia attributes of plant quality (Objective 2)

i) Preliminary analysis of quality characteristics

Preliminary examination of the relationship between the poinsettia quality characteristics and the chlorophyll fluorescence data indicated that, although the correlation with chlorophyll fluorescence was not strong for any characteristic, there was some evidence that chlorophyll fluorescence had some predictive power for leaf drop in week 2. The correlations with bract drop were very weak but, in view of the importance of bract drop for poinsettia quality, it was thought worthwhile to examine both leaf drop and bract drop in week 2.

ii) Models I and II for leaf drop

Table 7 shows an analysis of variance for the regression model relating leaf drop in week 2 to the mean leaf CF parameter data at desleeving. For Model I, the regression coefficient for PI at desleeving was positive, showing that a higher PI value gave higher leaf drop in week 2. For Model II, the important CF parameters for predicting leaf drop were, in order of importance, F3, TFm and Area, and the estimated regression parameters for these terms showed that a plant with a higher value of F3, TFm or Area at desleeving had a lower leaf drop at week 2 of home-life.

Table 7: Analysis of variance and chlorophyll fluorescence parameters for poinsettia leaf drop data

                      Model I                                 Model II
Term        df  parameter     s.s.    m.s.          df  parameter     s.s.    m.s.
Total      268              836.76    3.12         268              836.76    3.12
+ Temp       1      0.096     4.57    4.57 ns        1      0.100     4.57    4.57 ns
+ Light      1      0.029     0.00    0.00 ns        1      0.016     0.00    0.00 ns
+ PI/CF      1      0.106   219.05  219.05 ***       3              235.30   78.42 ***
Residual   265              613.13    2.31         263              596.91    2.27
- TFm                                                       -4.02   -23.80   23.80 **
- Area                                                     -43.60   -16.92   16.92 **
- F3                                                       -40.91  -197.65  197.65 ***

ii) Model III for leaf drop

Fig 8 shows the minimum ANN model consistent with the observed leaf drop and chlorophyll fluorescence data and Table 8 shows the corresponding estimated weights for the individual network connections.

Fig 8: Fitted ANN model for leaf drop in week 2 of home-life.
[Network diagram. CF parameter inputs (xi; i = 1…5): F0, Area, F3, F4, F5. Home-life factors: Light, Temp. Constant term. Weights wij lead to the hidden layers (j = 1…4) and weights wjo lead to the output (y), leaf drop at week 2.]

Table 8: Estimated weights in ANN model for leaf drop

No.  Weight  Connection      Value
  1  w11     F0 → h1        -86.80
  2  w12     F0 → h2       -114.48
  3  w13     F0 → h3         64.97
  4  w14     F0 → h4        -60.93
  5  w22     Area → h2       68.70
  6  w23     Area → h3      -25.85
  7  w24     Area → h4       30.87
  8  w31     F3 → h1        -86.61
  9  w32     F3 → h2         49.03
 10  w33     F3 → h3        -39.62
 11  w34     F3 → h4        -59.30
 12  w41     F4 → h1       -142.28
 13  w42     F4 → h2        118.61
 14  w43     F4 → h3        -75.62
 15  w44     F4 → h4        -10.98
 16  w51     F5 → h1        135.45
 17  w52     F5 → h2        -72.20
 18  w53     F5 → h3         47.73
 19  w54     F5 → h4         34.98
 20  w1o     h1 → o           5.48
 21  w2o     h2 → o          30.08
 22  w3o     h3 → o          53.97
 23  w4o     h4 → o          -8.57
 24          Constant → o   -27.55
 25  Temp    Temp → o        -0.24
 26  Light   Light → o        0.17

The estimated weights in Table 8 characterise the relationship between the CF data and the leaf drop data. The importance of individual input nodes in the final fitted ANN model was calculated by the change in the sum of squared residuals for the removal of each input node in turn from the model. This gave the following ranking, in order of importance, of the input nodes: F5, F4, F0, F3 and Area.
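The node ranking described above, based on the change in the sum of squared residuals when each input node is removed, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the project: the report does not give the hidden-node transfer function (tanh is assumed here), "removal" is taken to mean setting all weights leaving an input to zero without refitting, the direct Temp and Light connections are omitted, and the weights, input names and data are hypothetical.

```python
import numpy as np

def ann_predict(X, W, v, const):
    """One hidden layer: y = const + sum_j v[j] * h(sum_i W[i, j] * x_i)."""
    return const + np.tanh(X @ W) @ v        # tanh is an assumed choice of h

def removal_importance(X, y, W, v, const, names):
    """Increase in residual sum of squares when each input node is removed."""
    def ssr(weights):
        r = y - ann_predict(X, weights, v, const)
        return float(r @ r)

    base = ssr(W)
    scores = {}
    for i, name in enumerate(names):
        W_removed = W.copy()
        W_removed[i, :] = 0.0                 # cut every connection from input i
        scores[name] = ssr(W_removed) - base
    # Largest increase first: the most important node heads the ranking.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical weights and synthetic inputs, for illustration only.
rng = np.random.default_rng(1)
names = ["x1", "x2", "x3"]
X = rng.normal(size=(200, 3))
W = np.array([[2.0, -1.5],
              [0.3,  0.0],
              [0.0,  0.0]])                   # x3 has no connections at all
v = np.array([1.0, -2.0])
y = ann_predict(X, W, v, const=0.5)           # response generated by the network

ranking = removal_importance(X, y, W, v, 0.5, names)
for name, increase in ranking:
    print(f"{name}: increase in s.s. = {increase:.2f}")
```

If the network were refitted after each removal, the increases would generally be smaller, so the two conventions need not give the same ranking.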
iii) Models I and II for bract drop

Table 9 shows an analysis of variance for the regression model relating bract drop in week 2 to the mean leaf CF parameter data at desleeving. For Model I, the regression coefficient for PI at desleeving was negative, showing that a higher PI value gave lower bract drop in week 2. For Model II, the only significant CF parameter for predicting bract drop was F0, with an estimated regression parameter that showed that a plant with a higher value of F0 at desleeving had a higher bract drop in week 2 of home-life.

Table 9: Analysis of variance and chlorophyll fluorescence parameters for poinsettia bract drop data

                      Model I                                 Model II
Term        df  parameter     s.s.    m.s.          df  parameter     s.s.    m.s.
Total      270              358.47    1.34         270              358.47    1.34
+ Temp       1      0.132     0.62    0.62 ns        1      0.094     0.62    0.62 ns
+ Light      1     -0.124     0.96    0.96 ns        1     -0.103     0.96    0.96 ns
+ PI/CF      1     -0.024    11.28   11.28 **        1               15.76   15.76 ***
Residual   265              345.62    1.30         265              341.13    1.29
- F0                                                -1      23.64   -15.76   15.76 ***

iv) Model III for bract drop

Fig 9 shows the minimum ANN model consistent with the observed bract drop and chlorophyll fluorescence data and Table 10 shows the corresponding estimated weights for the individual network connections.

Fig 9: Fitted ANN model for bract drop in week 2 of home-life.
[Network diagram. CF parameter inputs (xi; i = 1…6): F0, Area, F1, F2, F3, F5. Home-life factors: Light, Temp. Constant term. Weights wij lead to the hidden layers (j = 1…4) and weights wjo lead to the output (y), bract drop at week 2.]

Table 10: Estimated weights in ANN model for bract drop

No.  Weight  Connection      Value
  1  w11     F0 → h1        -63.23
  2  w12     F0 → h2        122.61
  3  w13     F0 → h3         37.24
  4  w14     F0 → h4         35.35
  5  w21     Area → h1       40.27
  6  w22     Area → h2      -40.30
  7  w23     Area → h3      -28.78
  8  w33     F1 → h3         28.98
  9  w34     F1 → h4         98.42
 10  w41     F2 → h1        -34.75
 11  w42     F2 → h2        -63.64
 12  w43     F2 → h3          3.74
 13  w52     F3 → h2         33.13
 14  w54     F3 → h4        -55.48
 15  w61     F5 → h1         15.49
 16  w62     F5 → h2        -16.96
 17  w63     F5 → h3        -11.49
 18  w1o     h1 → o         -54.88
 19  w2o     h2 → o          50.93
 20  w3o     h3 → o        -114.53
 21  w4o     h4 → o          57.50
 22          Constant → o     5.10
 23  Temp    Temp → o         0.18
 24  Light   Light → o       -0.10

The estimated weights in Table 10 characterise the relationship between the CF data and the bract drop data. The importance of individual input nodes in the final fitted ANN model was calculated by the change in the sum of squared residuals for the removal of each input node in turn from the model. This gave the following ranking, in order of importance, of the input nodes: F5, F0, F3, F2, F1 and Area.

10) Quantify the power of ANNs compared with linear models (Objective 3)

The power of the models fitted in Objectives 1 and 2 can be compared by examining the proportion of the total variance explained by each model. Tables 11 and 12 for begonia and poinsettia, respectively, summarise the residual sums of squares and degrees of freedom for each fitted model and also show the percentage variance explained by each model. Table 11 for begonia shows that the PI predictor in Model I for flower count and damaged leaf count had very little power and that the multivariate linear predictor in Model II gave a definite increase in power. The full ANN model gave a further increase in power for both variates, although the increase relative to Model II was modest compared with the increased complexity of the model.
For the flower drop data, there was little difference between the power of Model I and Model II and again the increased power of the full ANN model was modest compared with the increased complexity of the model. None of the fitted models explained more than about 40% of the total variance.

Table 11: Comparison of the power of Models I, II and III for CF prediction for begonia using the percentage variance explained by each model relative to the null model to compare models
(s.s. and df are residual values; explained = percentage variance explained)

                      Null model      Model I                     Model II                    Model III
Source                  s.s.   df      s.s.   df  explained      s.s.   df  explained      s.s.   df  explained
Flower Count          163.22  270    137.76  267      14.6%    123.46  263      22.3%    107.32  255      30.4%
Flower Drop           206.95  270    141.06  267      31.1%    139.30  267      31.9%    116.93  254      39.9%
Damaged Leaf Count    124.39  270    102.96  267      16.3%     80.73  264      33.6%     69.41  253      40.4%

Table 12 for poinsettia shows that, for leaf drop, the PI in Model I and the linear multivariate predictor in Model II had similar power and explained 25.9% and 27.3% of the variance, respectively. However, the ANN predictor in Model III gave a very substantial increase in power and explained about 50% of the total variation. The predictive power of chlorophyll fluorescence for the bract drop data was very low for all three models, reaching only 11.1% for Model III.

Table 12: Comparison of the power of Models I, II and III for CF prediction for poinsettia using the percentage variance explained by each model relative to the null model to compare models
(s.s. and df are residual values; explained = percentage variance explained)

                      Null model      Model I                     Model II                    Model III
Source                  s.s.   df      s.s.   df  explained      s.s.   df  explained      s.s.   df  explained
Leaf Drop             836.76  268    613.13  265      25.9%    596.91  263      27.3%    377.99  243      50.2%
Bract Drop            358.47  268    345.62  265       2.5%    341.13  265       3.8%    291.52  245      11.1%

The improvement in the fit of the models shown in Tables 11 and 12 can be illustrated by calculating a generalised performance index for each of the three models, $\mathrm{Index}_{MI} = PI$, $\mathrm{Index}_{MII} = \sum_i \beta_i x_i$ and $\mathrm{Index}_{MIII} = \sum_j w_{jo}\, f\left(\sum_i w_{ij} x_i\right)$, based on the chlorophyll fluorescence predictive terms in Models I, II and III respectively. Here the $x_i$ are the CF parameters, the $\beta_i$ are the fitted Model II regression parameters, the $w_{ij}$ and $w_{jo}$ are the fitted network weights and $f$ is the hidden-node function of the ANN. Figures 10[a-c] show plots of the begonia flower count data in week 2 against the three indexes for begonia while Figures 11[a-c] show plots of the poinsettia leaf drop data in week 2 against the three indexes for poinsettia. For begonia flower count, the plots show a progressive reduction in scatter about the predicted values from Model I to Model III whereas, for poinsettia leaf drop, the scatter about the predicted values for Model I and Model II appears very similar. However, for Model III fitted by ANN, the scatter about the predicted values for poinsettia leaf drop is very substantially reduced and shows a very substantial improvement over the linear model predictions.

Fig 10[a-c]: Relationship between flower count at week 2 of home-life and CF index at desleeving from Model I [a], Model II [b] and Model III [c] for begonia.
[Three scatter plots. Vertical axis: square root of flower count at week 2. Horizontal axes: IndexMI, IndexMII and IndexMIII at desleeving.]

Fig 11[a-c]: Relationship between leaf drop at week 2 of home-life and CF index at desleeving from Model I [a], Model II [b] and Model III [c] for Poinsettia.
[Three scatter plots. Vertical axis: square root of leaf drop at week 2. Horizontal axes: IndexMI, IndexMII and IndexMIII at desleeving.]

11) Conclusions

Tables 1 and 5 show that the TFm parameter was the most important term in the Model II regression equations for begonia flower counts and damaged leaf counts. However, TFm does not occur in the PI calculation (equation 1) and, as PI is scale invariant, it is likely that PI will also be time independent. Therefore PI will be uninformative about TFm and this is the probable reason why Model I was less informative than Model II about the begonia flower count and damaged leaf count data.

For Table 3, the only important regression coefficient in Model II is F5 and, although this parameter does not occur in PI (equation 1), it seems probable that F5 will be highly correlated with other chlorophyll fluorescence parameters that do occur in PI. For that reason, Model I and Model II have similar power for the begonia flower drop data.

The interpretation of the Model III terms is difficult but it is worthwhile to note that F5 was the most important input node for both flower count and damaged leaf count, although not for flower drop. The TFm node was included for all three measured variates, which emphasises that the time course of the response curve needs to be included if CF is used as a predictor for begonia plant quality.

Table 11 shows that for flower count and damaged leaf count the simple linear regression on PI (Model I) had very little predictive power, the multivariate linear model (Model II) had a significant improvement in power and the ANN model (Model III) had the best power overall. For flower drop, there was very little to choose between the predictive power of any of the three models.
For all three models, the predictive power of chlorophyll fluorescence for the quality attributes of individual begonia plants was relatively modest.

Tables 7 and 9 show the regression coefficients for the poinsettia leaf and bract drop data respectively but, since the CF predictive power for bract drop was very low, only Table 7 contains useful information. The dominant term in Model II for leaf drop was F3 but TFm and Area were also significant. The regression coefficient for F3 in Table 7 is negative, showing that as F3 increased, leaf drop decreased. However, the regression coefficient for PI in Table 7 is positive, showing that as PI increased leaf drop also increased. The reason for this apparent anomaly is that PI (equation 1) contains F3 only through the difference (F3-F1) in the denominator of the equation. This means that when F3 increases, PI decreases and this gives rise to the apparently anomalous situation where a decrease in PI causes a decrease in leaf drop. Hence, PI appears to be negatively related to quality. This difficulty in the interpretation of the PI variate illustrates one of the advantages of using the individual CF parameters as explanatory variables rather than a single combined index.

Although the multivariate linear regression model for poinsettia leaf drop gave little increase in power over the simple regression model, the ANN model had almost double the explanatory power. The fitted ANN model for poinsettia leaf drop described in Fig 8 and Table 8 is complex and difficult to understand but the ranking of the input nodes does indicate that F5, F4, F0 and F3 were the most important nodes, taken in that order. The complexity of the ANN model for poinsettia leaf drop suggests that there is some risk that the model may be over-fitted but the increase in explanatory power compared with the linear models is so large that it seems clear that the ANN model has produced a real improvement over the linear models.
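The comparison of explanatory power can be checked directly from the residual sums of squares and degrees of freedom in Table 12. The sketch below assumes that the percentage variance explained is computed from residual mean squares (residual s.s. divided by residual df) relative to the null model. The report does not state this formula; it is inferred from the fact that it reproduces the tabulated leaf drop percentages. The function name is illustrative.

```python
def variance_explained(rss_model, df_model, rss_null, df_null):
    """Percentage variance explained, comparing residual mean squares."""
    ms_model = rss_model / df_model
    ms_null = rss_null / df_null
    return 100.0 * (1.0 - ms_model / ms_null)

# Residual s.s. and df for poinsettia leaf drop, copied from Table 12.
null = (836.76, 268)
models = {
    "Model I": (613.13, 265),
    "Model II": (596.91, 263),
    "Model III": (377.99, 243),
}

results = {
    name: variance_explained(rss, df, *null)
    for name, (rss, df) in models.items()
}
for name, pct in results.items():
    print(f"{name}: {pct:.1f}%")
# Model I: 25.9%
# Model II: 27.3%
# Model III: 50.2%
```

Because each residual sum of squares is divided by its own degrees of freedom, Model III is penalised for the 20 extra degrees of freedom it uses relative to Model II (243 against 263 residual df), yet its percentage is still about 1.8 times that of Model II.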
12) Exploitation

Overall, the power of chlorophyll fluorescence for predicting the home-life performance of individual potplants appears modest. Even the most effective model, Model III for poinsettia leaf drop, explained only about 50% of the observed variability, which is probably inadequate for screening individual plants. However, there does appear to be the potential for detecting damage to whole batches of plants using batch-screening techniques. As part of project HL0134LPC, three transport treatments were tested, including minimum stress, normal marketing stress and cold stress, and there is evidence from HL0134LPC (see Parsons, Edmondson et al., 2001) that there were detectable batch effects on whole batches of plants. The models developed in this project will be used to test the effectiveness of CF screening for discriminating between batches of plants subjected to different levels of stress. Chlorophyll fluorescence indexes will be calculated for each plant using the indexes discussed in Section 10 and the power of chlorophyll fluorescence to discriminate between complete batches of plants subjected to different levels of stress will be assessed. The work will be reported in the final project report for HL0134LPC.

References:

DeEll, J., van Kooten, O., Prange, R. and Murr, D. (1999). Applications of Chlorophyll Fluorescence Techniques in Postharvest Physiology. Horticultural Reviews 23, 69-107.

Parsons, N. R., Edmondson, R. N., Clark, I. and Langton, F. A. (2001). Robust product design and prediction for post-harvest pot plant quality and longevity. Third year confidential technical

Strasser, R. J., Eggenberg, P. and Strasser, B. J. (1996). How to work without stress but with fluorescence. Bulletin de la Société Royale des Sciences de Liège 65 (4-5), 330-349.

Tyystjärvi, E., Koski, A., Keränen, M. and Nevalainen, O. (1999). The Kautsky Curve is a Built-in Barcode. Biophysical Journal 77, 1159-1167.