Network models and sample selection

Outline
• First generation model: Connectivity of the
American agricultural landscape
• Second generation model: Network models
for soybean rust in the US
• Sampling strategies to inform predictions
The connectivity of the American agricultural landscape
Applying graph theory to assess the national risk of
crop pest and disease spread
Peg Margosian, Karen Garrett, Shawn Hutchinson, and Kim With
BioScience 2009
Thanks to USDA APHIS, NSF
Objective
Summarize and quantify the connectivity of the
U.S. agricultural landscape for four major crop
species to inform a national risk assessment of
their pathogens and pests
The potential for movement through landscapes can be
modeled by evaluating nodes and the edges that connect
them
Node and edge characteristics may influence the potential
for movement
Information about US agricultural crop densities is available
by county from the National Agricultural Statistics Service
We have adapted graph-theoretic approaches to this context
Dropped-edge analyses characterize connectivity when movement
is supported only across edges whose ‘costs’ fall below a
threshold
In our context, we defined the landscape resistance to
transmission (LRT) between two county centroids based on
the crop species availability in each county
Specifically, LRT(between two counties) = 1/(weighted mean percentage
of acreage in the crop species across the two counties)
Then we applied a dropped-edge analysis to evaluate which
counties were connected at different thresholds
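The LRT definition and the dropped-edge analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the weights in the weighted mean are hypothetical (equal by default), the candidate county pairs are hypothetical adjacencies, and the helper names (`lrt`, `connected_components`, `dropped_edge_analysis`) are made up for this example. Connectivity at each threshold is summarized as connected components found with union-find.

```python
def lrt(pct_i, pct_j, w_i=0.5, w_j=0.5):
    """Landscape resistance to transmission between two counties.

    pct_i, pct_j: percentage of county acreage in the crop species.
    w_i, w_j: weights for the weighted mean (hypothetical; equal by default).
    """
    mean_pct = (w_i * pct_i + w_j * pct_j) / (w_i + w_j)
    return float("inf") if mean_pct == 0 else 1.0 / mean_pct


def connected_components(nodes, edges):
    """Union-find over the retained edges; returns a list of node sets."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def dropped_edge_analysis(pct, candidate_pairs, thresholds):
    """For each threshold, keep only edges with LRT below it and report components."""
    results = {}
    for t in thresholds:
        kept = [(i, j) for i, j in candidate_pairs if lrt(pct[i], pct[j]) < t]
        results[t] = connected_components(pct.keys(), kept)
    return results


pct = {"A": 40.0, "B": 30.0, "C": 2.0, "D": 35.0}  # hypothetical counties
pairs = [("A", "B"), ("B", "C"), ("C", "D")]       # hypothetical adjacencies
for t, comps in dropped_edge_analysis(pct, pairs, [0.05, 0.1]).items():
    print(t, sorted(sorted(c) for c in comps))
# 0.05 [['A', 'B'], ['C'], ['D']]
# 0.1 [['A', 'B', 'C', 'D']]
```

The low-acreage county C acts as a barrier at the stricter threshold (its edges have LRT 1/16 and 1/18.5, both above 0.05) and is absorbed into a single connected region once the threshold is relaxed.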
An example dropped-edge analysis
What threshold for the LRT is
relevant for any particular
pathogen or pest species?
The relevant threshold is a function
of the characteristics of
1. Pathogen/pest/vector
2. Host
3. Environment
4. Time scale being considered
And it is clearly a complicated
function of these factors…
Figures: soybean, maize, and wheat (Margosian et al. 2009)
Decision tree for evaluating responses to an invasive
pathogen (Margosian et al. 2009)
Next steps
Evaluate changes in connectivity over time as a
function of policy and economic drivers
Develop general agroecological models and risk
assessments that incorporate landscape dynamics
of host, pathogen, and environment
Develop and validate models for specific
pathogens and insect pests for which data are
available
Dynamic network models for soybean rust epidemics in the US
with Karen Garrett, Caterina Scoglio, Philip Schumm, Scott Isard
Sweta Sutrave
Thanks to NSF, USDA-APHIS,
and all the people who contributed to this great data set
New features of this model
1. Non-adjacent counties can be connected
2. Distance is weighted by the projection of wind
speed and direction
3. We estimate the probability of new infection based
on previous infection maps (for example, at a
monthly time step)
4. We include another host (kudzu)
Sutrave et al.
Dynamic network models
Network
Node
Edge
Edge weight: Level of interaction between the pair of
nodes
Dynamic nature: Edge weights change over time.
Soybean rust in the USA
Soybean rust status for counties, USA, 2007
Objectives
• Develop a framework for estimating edge weights using
observed epidemic time series in dynamic network models
• Apply the model to the spread of soybean rust in the US.
• Evaluate the potential of the estimation framework for epidemic
modeling.
Data Sets
• Rust status data: 2005 to 2008, from the sentinel plot network
• Host density data: 2005 to 2008, from the US National
Agricultural Statistics Service
• Wind data: wind speed and direction, from the National Climatic Data
Center
Model
• SI model, which classifies nodes as susceptible or
infected.
• We treat the centroid of each county as a node.
• This assumes that each sentinel plot and the area around
it behave similarly.
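One SI time step on a weighted network can be sketched as follows. The slides do not give the exact rule for turning edge weights into infection probabilities, so this sketch assumes, hypothetically, that each edge weight u_ji (clipped to [0, 1]) acts as an independent per-step transmission probability from infected node j to susceptible node i. The function names are made up for illustration.

```python
import random


def infection_probabilities(infected, weights, nodes):
    """Probability that each node is infected after the next step.

    infected: set of currently infected nodes
    weights: dict mapping (j, i) -> edge weight u_ji, read here as an
             independent per-step transmission probability (an assumption)
    """
    probs = {}
    for i in nodes:
        if i in infected:
            probs[i] = 1.0  # SI model: infected nodes stay infected
            continue
        escape = 1.0  # probability that no infected neighbor transmits to i
        for j in infected:
            u = min(max(weights.get((j, i), 0.0), 0.0), 1.0)
            escape *= 1.0 - u
        probs[i] = 1.0 - escape
    return probs


def si_step(infected, weights, nodes, rng):
    """Draw the next infection map from the probabilities above."""
    probs = infection_probabilities(infected, weights, nodes)
    return {i for i in nodes if i in infected or rng.random() < probs[i]}


nodes = ["A", "B", "C"]
print(infection_probabilities({"A"}, {("A", "B"): 0.5}, nodes))
# {'A': 1.0, 'B': 0.5, 'C': 0.0}
```

Applied to an observed infection map (for example, August), `infection_probabilities` gives a predicted risk map for the following month; `si_step` draws one stochastic realization of it.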
Edge weight function
• u_ij: edge weight between nodes i and j
• A function of the following:
- Distance between the sentinel plots.
- Crop density and kudzu density.
- Speed and direction of the wind with respect to the edge
Several parameterizations, with different structures for the
edge weights, are being evaluated
• Multiplicative model (gravity law) performs best so far:
$$u_{i,j} = a_1 \left( \frac{d_i \, d_j}{2} \right) \left( \frac{l_{ij} \cdot w_t}{l_{ij}^{2}} \right) e^{-a_2 l_{ij}}$$
where
d_i = crop density in node (county) i
d_j = crop density in node (county) j
l_ij = distance between nodes i and j
w_t = wind vector at time t
Sutrave et al.
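A direct Python transcription of the gravity-law edge weight is sketched below. Because the formula takes a dot product l_ij · w_t, this sketch assumes (as an interpretation, not stated on the slide) that l_ij is the displacement vector from node i to node j in the dot product and its length |l_ij| everywhere else. Coordinates, units, and the coefficient values in the usage lines are hypothetical. No clamping is applied, so a wind blowing against the edge yields a negative weight; how the model handles that case is not specified here.

```python
import math


def edge_weight(d_i, d_j, pos_i, pos_j, wind, a1, a2):
    """Gravity-law edge weight u_ij (sketch).

    d_i, d_j: crop densities of nodes i and j
    pos_i, pos_j: (x, y) node coordinates in any consistent distance unit
    wind: (wx, wy) wind vector w_t at time t
    a1, a2: scaling and decay coefficients
    """
    lx, ly = pos_j[0] - pos_i[0], pos_j[1] - pos_i[1]  # assumed displacement i -> j
    dist = math.hypot(lx, ly)
    if dist == 0:
        raise ValueError("nodes must be at distinct locations")
    projection = (lx * wind[0] + ly * wind[1]) / dist**2  # (l_ij . w_t) / l_ij^2
    return a1 * (d_i * d_j / 2) * projection * math.exp(-a2 * dist)


print(edge_weight(2, 3, (0, 0), (1, 0), (4, 0), a1=1.0, a2=0.0))  # 12.0
print(edge_weight(2, 3, (0, 0), (1, 0), (0, 4), a1=1.0, a2=0.0))  # 0.0
```

A tailwind along the edge maximizes the weight, a crosswind contributes nothing, and the exponential term makes weights decay with distance.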
Example epidemic simulation: observed soybean rust for August
and prediction for September (Sutrave et al.)
Measuring model performance
• Percentage of nodes estimated correctly, where
the result for observed infected nodes is
weighted 90% and the result for observed
uninfected nodes is weighted 10%
• Other possibilities…
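The weighted percentage-correct measure can be sketched as below. This is one reading of the slide: the fraction of observed-infected nodes predicted correctly is weighted 0.9, the fraction of observed-uninfected nodes predicted correctly is weighted 0.1, and the result is expressed as a percentage. Treating an empty class as fully correct is an assumption added so the score is defined when, for example, no infected nodes are observed.

```python
def weighted_accuracy(observed, predicted, w_infected=0.9, w_uninfected=0.1):
    """Weighted percentage of nodes estimated correctly.

    observed, predicted: dict node -> bool (True = infected)
    """
    inf = [n for n in observed if observed[n]]
    uninf = [n for n in observed if not observed[n]]

    def frac_correct(group):
        if not group:
            return 1.0  # assumption: an empty class counts as fully correct
        return sum(predicted[n] == observed[n] for n in group) / len(group)

    return 100 * (w_infected * frac_correct(inf) + w_uninfected * frac_correct(uninf))


obs = {"A": True, "B": True, "C": False, "D": False}
pred = {"A": True, "B": False, "C": False, "D": True}
print(weighted_accuracy(obs, pred))
```

The heavy weight on infected nodes means that missing an infected county costs far more than a false alarm in an uninfected one, which suits a surveillance setting where missed detections are the main concern.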
Goodness-of-fit of models
• Goodness of fit of the multiplicative model with
gravity law for summer months (Sutrave et al.)
Transition period   % Error   Estimated exponential coefficient   Estimated scaling coefficient
Jun05 – Jul05       0         10                                  0.01
Jul05 – Aug05       0         10                                  0.01
Aug05 – Sep05       0         10                                  0.01
May06 – Jun06       0.978     10                                  0.01
Jun06 – Jul06       1.495     10                                  0.01
Jul06 – Aug06       0.655     10                                  0.01
Aug06 – Sep06       0.023     10                                  0.01
Goodness-of-fit of models
Goodness of fit of the multiplicative model for
summer months, continued (Sutrave et al.)
Transition period   % Error   Estimated exponential coefficient   Estimated scaling coefficient
May07 – Jun07       1.061     10                                  0.01
Jun07 – Jul07       1.48      10                                  0.01
Jul07 – Aug07       2.68      10                                  0.01
Aug07 – Sep07       4.38      10                                  0.01
May08 – Jun08       3.41      10                                  0.01
Jun08 – Jul08       2.59      10                                  0.01
Jul08 – Aug08       3.13      10                                  0.01
Aug08 – Sep08       0         10                                  0.01
Importance of host density and wind
speed/direction as predictors
• Multiplicative models with these predictors removed
perform less well
• In randomization tests, model performance is
worse when these predictors are randomized
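A randomization test of this kind can be sketched as a generic permutation test: shuffle one predictor (for example, host density) among the nodes, refit or rescore the model, and count how often the shuffled predictor does at least as well as the real one. The `score_fn` interface is hypothetical; it stands in for whatever fitting and scoring procedure the model uses. The p-value uses the common (count + 1) / (n_perm + 1) form.

```python
import random


def permutation_test(score_fn, predictor, n_perm=999, seed=0):
    """Compare the model score using the real predictor with permuted copies.

    score_fn: callable taking a dict node -> predictor value and returning a
              performance score (higher is better); hypothetical interface
    predictor: dict node -> value (e.g. host density)
    Returns (observed score, one-sided permutation p-value).
    """
    rng = random.Random(seed)
    nodes = list(predictor)
    values = [predictor[n] for n in nodes]
    observed = score_fn(predictor)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)  # break the link between nodes and predictor values
        if score_fn(dict(zip(nodes, shuffled))) >= observed:
            at_least_as_good += 1
    p_value = (at_least_as_good + 1) / (n_perm + 1)
    return observed, p_value


# Toy check: a score that rewards matching a hypothetical "true" density map.
target = {f"n{i}": float(i) for i in range(10)}
score = lambda pred: -sum(abs(pred[n] - target[n]) for n in target)
print(permutation_test(score, target, n_perm=199, seed=1))
```

A small p-value indicates that the real spatial arrangement of the predictor carries information the model uses; a predictor the model ignores gives a p-value near 1.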
Potential modifications
• Incorporation of
– Environmental conditions such as temperature,
cloud cover
– Infection in previous years
Optimal placement of Sentinel plots
• Objective: Select a subset of the current sentinel plots such
that the costs of establishment, maintenance, and monitoring are
minimized with minimal loss of prediction accuracy for the
model.
• Components of the economic effort of sampling
– Sentinel plots
• Establishing a sentinel plot
• Sampling frequency for the sentinel plot
– Evaluation of other fields
• Identification of fields
• Sampling frequency of fields
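The objective above can be sketched with a simple additive cost model and a greedy backward elimination. Both pieces are hypothetical illustrations, not the method used in the study: the unit-cost keys are made up, and `accuracy_fn` stands in for rerunning the prediction model with only the retained plots. The greedy loop repeatedly drops the plot whose removal hurts accuracy least, and stops before the loss relative to the full network would exceed a tolerance.

```python
def sampling_cost(plots, fields, c):
    """Hypothetical additive cost of the components listed above.

    plots, fields: dict id -> number of visits per season
    c: dict of unit costs (keys are illustrative assumptions)
    """
    plot_cost = sum(c["establish_plot"] + v * c["visit_plot"] for v in plots.values())
    field_cost = sum(c["identify_field"] + v * c["visit_field"] for v in fields.values())
    return plot_cost + field_cost


def greedy_reduce(plots, accuracy_fn, max_loss):
    """Backward elimination of sentinel plots.

    accuracy_fn: callable taking a set of plot ids and returning model accuracy
    max_loss: largest tolerated drop in accuracy relative to all plots
    """
    kept = set(plots)
    baseline = accuracy_fn(kept)
    while len(kept) > 1:
        # sorted() makes tie-breaking deterministic
        best = max(sorted(kept), key=lambda p: accuracy_fn(kept - {p}))
        if baseline - accuracy_fn(kept - {best}) > max_loss:
            break
        kept.remove(best)
    return kept


costs = {"establish_plot": 100, "visit_plot": 10, "identify_field": 5, "visit_field": 20}
print(sampling_cost({"p1": 4, "p2": 2}, {"f1": 1}, costs))  # 285
```

Greedy elimination is only one heuristic; it is cheap but can miss combinations of plots that are jointly redundant.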
Levels of sophistication for spatial selection
• Random reduction in number of locations by x%
Results of random sampling: graph obtained by sampling counties in May 2007 (Sutrave et al.)
• Sampling based on region
– Higher density of plots in regions of greater interest
– Latitude/longitude based
– State based
• Sampling based on properties of the node
– ‘Betweenness’
– Clustering coefficient
– Connectedness to currently infected nodes
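Several of these spatial selection rules can be sketched without any graph library: random reduction by x%, the local clustering coefficient of an undirected graph, and connectedness to currently infected nodes (here assumed to be the sum of edge weights from infected nodes). Betweenness is omitted for brevity; NetworkX provides it as `networkx.betweenness_centrality`. Whether high- or low-scoring nodes should be kept is a design choice the slides leave open, so `top_k` simply keeps the highest scores. All function names are hypothetical.

```python
import random


def random_reduction(nodes, x_pct, seed=0):
    """Keep a random (100 - x_pct)% of locations."""
    nodes = sorted(nodes)
    k = round(len(nodes) * (100 - x_pct) / 100)
    return set(random.Random(seed).sample(nodes, k))


def clustering_coefficient(adj, n):
    """Local clustering coefficient of node n; adj maps node -> set of neighbors."""
    nbrs = adj[n]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
    return 2.0 * links / (k * (k - 1))


def connectedness_to_infected(weights, n, infected):
    """Sum of edge weights u_jn from currently infected nodes j into node n."""
    return sum(weights.get((j, n), 0.0) for j in infected)


def top_k(nodes, score, k):
    """Keep the k nodes with the highest score (ties broken by node id)."""
    return set(sorted(nodes, key=lambda n: (-score(n), n))[:k])


adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print({n: round(clustering_coefficient(adj, n), 3) for n in sorted(adj)})
# {'A': 1.0, 'B': 1.0, 'C': 0.333, 'D': 0.0}
```

The same `top_k` helper works with any node score, so region-based weights or betweenness values can be swapped in without changing the selection step.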
Levels of sophistication for temporal
selection
• Reduce sampling frequency for all nodes
equally by x%
• Reduce sampling frequency for less
‘informative’ nodes
• Adapt sampling frequency to current disease
locations, identifying well-connected nodes
and moderately well-connected nodes
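The temporal rules can be sketched as functions that return a sampling interval. Uniform reduction by x% scales the visit frequency by (1 - x/100), so the interval between visits grows by the factor 1 / (1 - x/100). The adaptive rule below is a hypothetical three-tier scheme: the connectedness thresholds, the weekly base interval, and the 2× and 4× multipliers are all illustrative assumptions, with connectedness computed, for example, as the summed edge weight from currently infected nodes.

```python
def uniform_reduction(interval_days, x_pct):
    """Reduce sampling frequency for all nodes equally by x%."""
    if not 0 <= x_pct < 100:
        raise ValueError("x_pct must be in [0, 100)")
    return interval_days / (1 - x_pct / 100)


def sampling_interval(connectedness, high, moderate, base_days=7):
    """Hypothetical adaptive rule: sample well-connected nodes at the base
    interval, moderately connected nodes half as often, others a quarter as often."""
    if connectedness >= high:
        return base_days
    if connectedness >= moderate:
        return 2 * base_days
    return 4 * base_days


print(uniform_reduction(7, 50))  # 14.0
print([sampling_interval(c, high=0.5, moderate=0.1) for c in (0.6, 0.3, 0.05)])
# [7, 14, 28]
```

Recomputing connectedness each month as the infection map changes is what makes the second rule adaptive: effort follows the current disease locations instead of staying fixed.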
Figure: model error versus sampling effort for random selection of sites and
times and for strategic selection, with the goal for strategies marked