
Blowing in the Wind:
The Quest for Accurate Crop Variety Identification in Field
Research, with an Application to Maize in Uganda
TALIP KILIC
Senior Economist & Survey Methods Team Leader
Living Standards Measurement Study
Development Data Group – Survey Unit – World Bank
[email protected]
Co-Authors: JOHN ILUKOR, JAMES STEVENSON, SYDNEY GOURLAY, FREDERIC KOSMOWSKI,
ANDRZEJ KILIAN, JULIUS PYTON SSERUMAGA, AND GODFREY ASEA
International Consortium on Applied Bioeconomy Research (ICABR) Conference
Berkeley, CA – May 31, 2017
Motivation
• Accurate identification of crop varieties grown by farmers
key to estimating
– levels of improved variety cultivation
– ensuing impacts on production, productivity, and a range of
welfare and nutrition outcomes
• Empirical evidence central to justifying investments in
crop R&D, support to seed systems
• Among farmers, correct information essential to their
adoption & management decisions
Prevailing Approaches to Variety Identification
• Extent of underinvestment in methodological innovation for
accurate variety identification is puzzling
• Literature on adoption & impacts has usually relied on expert
estimates and/or farmer-reported survey data on
– Variety names
– Improved vs. traditional status of a cultivated variety
– Hybrid vs. OPV status of a cultivated variety
• Why worry?
– Weaknesses in extension & formal seed systems
– Reliance on informal channels of seed acquisition
– Variety naming systems that exhibit variation across time & space
• Limited empirical evidence on the accuracy of prevailing
approaches to variety identification (& implications of
measurement error in impact evaluation)
Our Contribution
• Implemented a survey experiment in Eastern Uganda to
test the relative accuracy of subjective approaches to maize
variety identification compared to DNA fingerprinting
• Compiled a reference library of improved varieties in
Uganda that serves as a key input into DNA fingerprinting as
well as the assessment of commercial seed quality
MAPS: Methodological Experiment on Measuring
Maize Productivity, Soil Fertility, and Variety
Support
• LSMS Minding the (Agricultural) Data Gap Research Program, funded by UK Aid
• Global Strategy to Improve Agricultural and Rural Statistics, housed at FAO
• World Bank Innovations in Big Data and Analytics Program
• World Bank Trust Fund for Statistical Capacity Building
Primary Objectives
• Test subjective approaches to measurement vis-à-vis objective methods for maize yield
measurement, soil fertility assessment & maize variety identification
Partnerships
• Uganda Bureau of Statistics (Implementing Agency), World Agroforestry Centre (Soil
Fertility), CGIAR Standing Panel on Impact Assessment (Variety Identification), Stanford
University & Terra Bella (Remote Sensing)
Round I (First Agricultural Season of 2015)
• Post-Planting Fieldwork: April-June 2015
• Crop Cutting Fieldwork: June-August 2015
• Post-Harvest Fieldwork: September-November 2015
Round II (First Agricultural Season of 2016)
• Identical timeline & visit structure
• Follow-up to a subset of Round I households (540 out of 900)
MAPS Sample
Round I
Enumeration Area (EA) Selection
• 45 EAs from a 400 km² remote sensing tasking area (Iganga & Mayuge)
• 15 EAs in each of Serere & Sironko districts
Household Selection
• Original Plan: 6 pure stand & 6 intercropping households selected at random in
each EA following listing – 450 in each universe
• Result: 385 vs. 515 split – inadequate # of pure stand HHs in select EAs
– 249 vs. 291 split in Iganga & Mayuge
Plot Selection
• Survey Solutions CAPI application to randomly select one plot per household
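The household selection rule above (6 pure stand and 6 intercropping households per EA, i.e. 75 EAs x 6 = 450 per universe) can be sketched as follows. This is a minimal illustration, not the fieldwork procedure: the listing format, the function name `select_households`, and the top-up rule for EAs with too few households in one universe are assumptions. The top-up rule is only inferred from the fact that the realized 385 vs. 515 split still sums to 900.

```python
import random

def select_households(listing, per_stratum=6, seed=0):
    """Randomly pick households per cropping stratum in one EA.

    `listing` is a list of (household_id, stratum) pairs (hypothetical format).
    If one stratum is short, the gap is filled from the other (assumed rule).
    """
    rng = random.Random(seed)
    pools = {"pure_stand": [], "intercropped": []}
    for hh_id, stratum in listing:
        pools[stratum].append(hh_id)

    # First pass: up to `per_stratum` households from each stratum
    picked = {s: rng.sample(pool, min(per_stratum, len(pool)))
              for s, pool in pools.items()}

    # Second pass: top up to the EA target from whatever is left
    target = 2 * per_stratum
    for s, pool in pools.items():
        shortfall = target - sum(len(v) for v in picked.values())
        if shortfall <= 0:
            break
        remaining = [hh for hh in pool if hh not in picked[s]]
        picked[s] += rng.sample(remaining, min(shortfall, len(remaining)))
    return picked

# Hypothetical EA listing: only 4 pure stand households, 10 intercropping
listing = [(f"HH{i:02d}", "pure_stand") for i in range(4)]
listing += [(f"HH{i:02d}", "intercropped") for i in range(4, 14)]
sample = select_households(listing)
```

In this hypothetical EA, the 2 missing pure stand households are replaced by intercropping households, which is how a shortfall in one universe would tilt the overall split.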
Round II
• Follow-up to 540 households in Iganga & Mayuge
• Analysis sample: 440 households with crop cuts in Round I & II
• Attrition does not have a bearing on the analysis
MAPS Remote Sensing Tasking Area
MAPS Methods
Methods Tested:
Maize Production
• Crop-cutting
– 4m x 4m & a 2m x 2m subplot in Round I
– 8m x 8m sub-plot in Round II
– Full-plot crop cut in Round II (1/2 of sample)
• Remote sensing based on high-res imagery
– First in testing the method in a smallholder production
system against an objective measure
• Self-reported harvest
– Conversion of quantities in non-standard unit-condition
combos into KG-, dried grain terms (“official” methods)
Land Area
• GPS measurement (Garmin eTrex 30 handheld units)
• Self-reported area
Soil Fertility (Round I)
• Conventional Soil Analysis (subsample)
• Spectral Soil Analysis
• Self-reported soil quality & attributes
Variety Identification
• DNA fingerprinting of grain sampled from the crop-cutting
subplot harvest (4x4m in Round I, 8x8m in Round II)
• Self-reported variety name, type & morphological attributes
DNA Fingerprinting
• Diversity Arrays Technology (DArTseq) method that facilitates
genome-wide characterization of large accession sets compared to
existing genotyping-by-sequencing methods using SNP markers
• Compiled a reference library of 38 maize varieties in circulation during
the pre-planting period of the first rainy season of 2015, from NARO &
4 major seed companies, with revealed genotyping intention
• Genotyped the reference library and field samples to derive two variables
– Heterogeneity: # of DNA marker variants in the genomic representation, i.e. a
collection of fragments from the genome selected for sequencing
– Purity: Computed only for the field samples; represents the extent to which
a field sample's heterogeneity overlaps with that of the matched reference library
variety, identified initially as the one with the closest genetic distance to the
field sample in question (below a distance threshold of 3)
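The matching and purity steps described above can be sketched as below. This is an illustration under stated assumptions, not the DArTseq pipeline: the slides do not give the distance metric or the exact purity formula, so the distances are taken as precomputed inputs and purity is assumed to be the share of a field sample's marker variants also present in the matched reference variety. The function names, marker IDs, and distance values are hypothetical.

```python
def match_reference(distances, threshold=3.0):
    """Return the genetically closest reference variety.

    `distances` maps reference variety name -> genetic distance to the
    field sample (precomputed elsewhere; metric not specified in the slides).
    Returns None if even the closest variety is not below the threshold.
    """
    name, dist = min(distances.items(), key=lambda kv: kv[1])
    return name if dist < threshold else None

def purity(field_variants, reference_variants):
    """Assumed purity measure: % of the field sample's marker variants
    that also appear in the matched reference variety."""
    if not field_variants:
        return 0.0
    shared = field_variants & reference_variants
    return 100.0 * len(shared) / len(field_variants)

# Hypothetical distances from one field sample to three reference varieties
distances = {"LONGE10H": 1.2, "YARA42": 2.7, "WE2114": 4.1}
matched = match_reference(distances)

# Hypothetical marker variant sets
field = {"m1", "m2", "m3", "m4"}
reference = {"m1", "m2", "m3", "m9"}
sample_purity = purity(field, reference)
```

Because the match is the closest variety rather than an identical one, a field sample can be assigned a variety name and still have purity well below 100.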
Recursive Partitioning & Classification Tree Analysis of
Morphological Attributes of 38 Reference Library Samples
• Morphological attributes for
the reference library: Obtained
by planting out the 38 varieties
in NaCCRI fields.
• Results: Varieties are uniquely
identified using 11 attributes.
• Identification of the varieties
in the field: Using these
attributes, the varieties that
farmers plant were identified
based on farmer responses on
morphological attributes
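The logic of identifying a variety from farmer-reported attributes can be sketched as below. Note that this is a simpler exact-match lookup, not the recursive partitioning and classification tree analysis named on the slide, and the variety labels, attribute names, and attribute values are all hypothetical. It shows why incomplete farmer responses can leave several candidate varieties even when the full attribute set identifies each variety uniquely.

```python
def candidates(reference, response):
    """Varieties whose reference attributes agree with every attribute
    the farmer reported (unreported attributes are ignored)."""
    return sorted(
        variety for variety, attrs in reference.items()
        if all(attrs.get(key) == value for key, value in response.items())
    )

def identify(reference, response):
    """Return the variety only if exactly one candidate remains."""
    matches = candidates(reference, response)
    return matches[0] if len(matches) == 1 else None

# Hypothetical reference profiles (placeholders, not real variety traits)
reference = {
    "VAR_A": {"grain_color": "white", "grain_texture": "flint", "cob_size": "large"},
    "VAR_B": {"grain_color": "white", "grain_texture": "dent", "cob_size": "large"},
    "VAR_C": {"grain_color": "yellow", "grain_texture": "dent", "cob_size": "small"},
}

full_answer = identify(reference, {"grain_color": "white", "grain_texture": "flint"})
partial_answer = identify(reference, {"grain_color": "white"})
```

Here the two-attribute response pins down one variety, while the one-attribute response matches two varieties and therefore yields no unique identification.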
Context
Descriptive Statistics
                                                                            Mean  Std. Error     Min     Max
Crop Cut Sub-Plot/Seed Attributes
Farmer Reporting Multiple Varieties in Crop Cut Sub-Plot = Yes †            0.17        0.02       0       1
Farmer Reporting Multiple Varieties in Crop Cut Sub-Plot = No †             0.58        0.02       0       1
Farmer Reporting Multiple Varieties in Crop Cut Sub-Plot = Don't Know †     0.25        0.02       0       1
Primary Variety's Recyclability Status Correctly Identified †               0.31        0.02       0       1
Farmer Says He Knows the Variety †                                          0.45        0.03       0       1
Planted Seed Source = Stockist/Market †                                     0.37        0.02       0       1
Plot Attributes
Distance from Dwelling Based on GPS Location (KMs)                          0.16        0.02    0.00    0.92
GPS-Based Plot Area in Hectares                                             0.14        0.01    0.00    1.37
Plot is Purestand †                                                         0.46        0.01       0       1
Share of Seed Would Have Been Planted Under Purestand ‡                    82.96        0.79    8.33     100
Household Labor Days                                                       49.12        1.77       0     504
Any Hired Labor on Plot †                                                   0.43        0.03       0       1
Hired Labor Days                                                            5.19        0.88       0     294
Any Organic Fertilizer Was Applied †                                        0.01        0.00       0       1
Any Inorganic Fertilizer Was Applied †                                      0.09        0.02       0       1
Any Pesticide Was Applied †                                                 0.04        0.01       0       1
Weighted Additive Soil Quality Index (Mukherjee & Lal)                      0.30        0.00       0     0.5
Household Attributes
Household Size                                                              6.36        0.14       1      22
Dependency Ratio                                                            1.44        0.06       0       7
PCA-Based Agricultural Implement & Machinery Index                         -0.27        0.04   -1.52    5.39
Any Member Received Extension Advice on Ag Production †                     0.31        0.03       0       1
Manager Attributes
Female †                                                                    0.42        0.02       0       1
Age (Years)                                                                41.07        0.71       6      92
Education (Years)                                                           6.32        0.30       0      20
Manager = Respondent †                                                      0.82        0.01       0       1
Observations                                                                 510
How Do Different Methods Perform in Unique
Identification of Maize Varieties in Round I?
• 53 percent of farmers could not state the variety they have planted
• Farmer-reported morph. attributes do not uniquely identify varieties
• DNA fingerprinting performs the best for unique varietal identification
Unique Variety Identification, Irrespective of Correct Variety Identification, by Method
                                        Farmer Elicitation:       Farmer Elicitation:
                                        Variety Name Provision    Morphological Protocol    DNA Fingerprinting
                                        Observations   Percent    Observations   Percent    Observations   Percent
Uniquely Identified                              227      44.5              62      12.2             510     100.0
Not Uniquely Identified                          283      55.5             448      87.8               0       0.0
TOTAL                                            510       100             510       100             510       100
Total Number of Varieties Identified              13                        15                        12
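As a quick check on the table above, the percentages follow directly from the observation counts. The sketch below recomputes the share of uniquely identified samples for each method from the counts reported on the slide; the dictionary layout and the function name `unique_share` are just illustrative choices.

```python
# (uniquely identified, not uniquely identified) counts from the slide
counts = {
    "Variety name provision": (227, 283),
    "Morphological protocol": (62, 448),
    "DNA fingerprinting": (510, 0),
}

def unique_share(unique, not_unique):
    """Percent of samples uniquely identified, rounded to one decimal."""
    total = unique + not_unique
    return round(100.0 * unique / total, 1)

shares = {method: unique_share(*c) for method, c in counts.items()}
# shares -> 44.5, 12.2 and 100.0 percent respectively
```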
Only 2 Percent of the Farmers Correctly Identified the
Variety Based on DNA Analysis in Round I
• With the exception of
LONGE 10H, the varieties
stated by the farmers (i.e.
right panel) are NOT among
the varieties identified by
DNA fingerprinting (i.e. left
panel).
• Either farmers do not know
or the stated names are the
ones they were told
Source: Ilukor et al. (Forthcoming).
And Our Experts Were No Better!
(Selected) Incidences of Variety Cultivation, by Method (%)
                Farmer Elicitation:       Farmer Elicitation:       Expert         DNA
Variety Name    Variety Name Provision    Morphological Protocol    Elicitation    Fingerprinting
Unidentified                      55.5                      87.8            0.0               0.0
YARA42                             0.0                       0.0            0.0              34.3
LONGE10H                          11.8                       0.0           35.0              33.3
WE2114                             0.0                       0.4            0.0              21.2
LONGE5                            15.3                       0.2           40.0               0.0
LONGE4                             5.3                       2.4           20.0               0.0
How Do Different Methods Perform in Identifying
Local/Improved & OPV/Hybrid Varieties in Round I?
• Cultivation of improved & hybrid varieties is under-estimated by farmers
• Cultivation of open pollinated varieties is over-estimated by farmers
Incidence of Local/Improved & Hybrid/OPV Variety Cultivation, by Method
Panel A: Incidence of Local/Improved Variety Cultivation (%)
               Farmer Elicitation:    Farmer Elicitation:       DNA
               Farmer Reporting       Morphological Protocol    Fingerprinting
Improved                   45.1                          -               100.0
Traditional                42.8                          -                 0.0
Don't Know                 12.2                          -                 0.0
Panel B: Incidence of Hybrid/OPV Variety Cultivation (%)
               Farmer Elicitation:    Farmer Elicitation:       DNA
               Farmer Reporting       Morphological Protocol    Fingerprinting
Hybrid                     30.6                       12.2                99.6
OPV                        31.2                        0.0                 0.4
Don't Know                 38.2                       87.8                 0.0
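The size of the under- and over-estimation can be read off as the gap between farmer-reported incidence and the DNA fingerprinting benchmark in Panels A and B. The sketch below computes that gap in percentage points from the figures on the slide. Treating DNA fingerprinting as the benchmark follows the slide; the variable names are illustrative.

```python
# Incidence of cultivation (%), as reported on the slide
farmer_reported = {"Improved": 45.1, "Hybrid": 30.6, "OPV": 31.2}
dna_fingerprinting = {"Improved": 100.0, "Hybrid": 99.6, "OPV": 0.4}

# Gap in percentage points: negative = under-estimated by farmers,
# positive = over-estimated by farmers
gaps = {
    category: round(farmer_reported[category] - dna_fingerprinting[category], 1)
    for category in farmer_reported
}
# gaps -> Improved: -54.9, Hybrid: -69.0, OPV: 30.8
```

Part of each gap reflects "Don't Know" responses (12.2% in Panel A, 38.2% in Panel B) rather than outright misreporting.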
Purity of the Field Samples in Round I
• Farmer-planted variety according to DNA fingerprinting is the reference
library variety that is genetically closest (not necessarily identical)
• Purity = Overlap between the field sample genetic heterogeneity & the
genetic heterogeneity of the identified variety in the reference library
Purity of Field Samples (figure)
Mean 63.2%, Median 62.1%, Min 46.9%, Max 98.4%
Headline Findings from Multivariate Analyses
of Variety Identification Outcomes
Purity
• Negatively correlated with farmer’s correct identification of
recyclability of the seed
• NO relationship with commercial acquisition of the seed!
Farmer’s correct identification of an improved variety
• Positively correlated with farmer’s knowledge of the variety &
commercial acquisition of the seed
(Unacceptable Levels of) Heterogeneity in
Reference Library Samples in Round I
• The acceptable level of sample heterogeneity is 20%, but most of the
reference library samples are above this threshold.
Heterogeneity (%) by Seed Variety (figure)
Mean 32.9%, Median 24.6%, Min 9.8%, Max 75.2%
Seed varieties shown: H628, H614D, H625, H624, H520, H6213, H513, WE2115, WE2114,
WE2106, WE2104, WE2103, WE2101, WH505, KH500-43A, SC627, WH403, PAN67, FH6150,
YARA42, DK8031, YARA41, LONGE11H, LONGE9, LONGE10H, LONGE8, LONGE7, UH6303, LONGE6,
UH5054, UH5053, UH5052, UH5051, VP-MAX, SCDUMA, LONGE5, MM3, LONGE4
Key Take-Away Messages
• Variety identification findings reveal:
– High levels of improved variety cultivation, despite popular belief
– But… cultivated varieties are of inferior quality
– Limited farmer knowledge about the varieties that they plant
– Weaknesses in & potential implications for extension & seed system
• Evidence prompts us to think more critically about existing agricultural
statistics & survey methods
• Support for DNA fingerprinting to be the new standard for accurate
variety identification in field research
– Further experimentation & synthesis of evidence from completed survey
experiments on other countries & crops – key to formulating guidelines for scale-up
– Additional costs require more thinking around sub-sampling approaches in existing
household & farm surveys