Implementation of a NGS-based phytobenthos tool for - DNAqua-Net

Implementation of a NGS-based
phytobenthos tool for ecological
assessment in the UK
Martyn KellyA, Kerry WalshB, Rachel GloverC, David MannD, Steve JugginsE, Shinya SatoD,
Neil BoonhamC and Tim JonesF
A Bowburn Consultancy, 11 Monteigne Drive, Bowburn, Durham DH6 5QB, UK;
B Environment Agency, Horizon House, Deanery Road, Bristol, BS1 5AH;
C FERA, Sand Hutton, York, YO41 1LZ, UK;
D Royal Botanic Gardens Edinburgh, 21a Inverleith Row, Edinburgh EH3 5LR, UK;
E Newcastle University, Newcastle NE1 7RU, UK;
F Environment Agency, Higher Shaftesbury Road, Blandford Forum, Dorset DT11 8ST, UK
[email protected]
@basil0saurus; www.microscopesandmonsters.wordpress.com
Contents
• Background: phytobenthos, diatoms and the
Water Framework Directive
• From light microscopy to NGS
• Uncertainty - a guide for the perplexed
– From data to status assessment to decision
• Better science or McEcology?
Water Framework Directive
(73 pages condensed to one slide)
Good status is the objective: “monitoring” supports the management of water bodies to achieve this objective.
High, Good: Biota shows no or only a slight change from ‘expected’
Moderate, Poor, Bad: Biota shows significant alterations from ‘expected’ – a Programme of Measures is required to bring the biota back to at least ‘good status’
Figure label: allochthonous organic matter
Macrophytes and phytobenthos:
one “biological quality element” required for
assessment of freshwaters
Scale, from survey to sampling:
“Macrophytes”: 10 m, 1 m, 10 cm
“Phytobenthos”: 1 cm, 1 mm, 100 µm, 10 µm, 1 µm
“Phytobenthos”
• Benthic algae
• Complex mix of Bacillariophyta,
Chlorophyta, Cyanobacteria and other
algal groups
• Diatoms often abundant and diverse
• Strong correlation with water chemistry
• Therefore widely used as proxies
What are diatoms?
• Microscopic algae / protista
(“Heterokontophyta”)
• ~2800 species recorded from UK
freshwaters
• Characteristic silica cell wall
(“frustule”)
• Pigments include chlorophylls a
and c plus fucoxanthin
Limitations of diatom identification
using optical microscopy
• Cost of microscopes
• Slow
• Staff training costs
• Constant changes in diatom taxonomy
• Variability in outcomes
Molecular biology is shaping our understanding of
diatom taxonomy ...
Diatom Research (2013)
28: 37-59
Recap (1)
• Diatoms are a useful but imperfect proxy for
“phytobenthos”
– Typically strong correlations with water chemistry
– A partial view of the benthic algal assemblage
– Taxonomy is not stable
– Analyses are expensive and variable
• Is there a better way?
Development of a molecular diatom
tool for WFD classification of rivers
and lakes
– Can we develop a method based on molecular barcodes
that improves upon the current method based on light
microscopy?
– Cheaper?
– Less variable?
– Faster?
– Funding:
– Research:
Preliminaries
• Next Generation Sequencing
– Target gene: rbcL
– Phase 1: by Roche 454 (850 bp)
– Phase 2: by Illumina MiSeq (331 bp)
• Barcode reference library
– ~1000 strains: 100 species barcoded
– Supplemented by barcodes from GenBank and
R-Syst::diatom: ~400 species
• Bioinformatics
– Filters for Xanthophyta
Development of NGS-based method
• Design of main study:
– 250 sites in England, each sampled twice
– Stratified site selection to include all river types
and levels of pressure encountered routinely
– LM analysis: by Environment Agency staff
– NGS analysis: by FERA
– In addition, 113 “reference sites” sampled twice
(=232 samples) (mostly Scotland, Wales and
Northern Ireland)
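The stratified site selection described above can be illustrated with a minimal sketch. It assumes each candidate site carries a river type and a pressure level, and it draws a fixed number of sites at random from every combination. The field names, category labels and the `stratified_sample` helper are hypothetical, not the procedure used in the study.

```python
import random
from collections import defaultdict


def stratified_sample(sites, per_stratum, seed=0):
    """Pick up to `per_stratum` sites at random from each (river type, pressure) stratum."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    strata = defaultdict(list)
    for site in sites:
        strata[(site["river_type"], site["pressure"])].append(site)
    chosen = []
    for key in sorted(strata):
        pool = strata[key]
        # Take the whole stratum if it holds fewer sites than requested.
        chosen.extend(rng.sample(pool, min(per_stratum, len(pool))))
    return chosen


# Hypothetical candidate list: 2 river types x 2 pressure levels x 5 sites each.
candidates = [
    {"id": f"{rt}-{pr}-{i}", "river_type": rt, "pressure": pr}
    for rt in ("lowland", "upland")
    for pr in ("low", "high")
    for i in range(5)
]
selected = stratified_sample(candidates, per_stratum=2)
print(len(selected))  # 4 strata x 2 sites
```

Sampling within strata, rather than from the pooled list, is what guarantees that rare river types or pressure levels are not left out by chance.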
Comparisons: LM vs NGS
Reasons for mismatches
Diagram labels: “known knowns”, “unknown unknowns”, “known unknowns”; axis: genetic variation
Barcode patterns: a. +; b. - + - + + - +; c. - - - - + - -; d. -
Conceptual diagram of relationship between LM and NGS outputs for four
different scenarios. “+” or “-” indicate that a barcode either does or doesn’t
exist for a particular genotype within a species complex. a. clearly-defined
taxon aligns with barcode; b. species complex with several different barcodes
represented in the barcode database; c. species complex poorly represented in
the barcode database; d. species (or complex) not represented in the barcode
database.
Taxa differences: LM v NGS
Differences between representation of common taxa in LM and NGS: a) Achnanthidium
minutissimum-type (small, one chloroplast); b) Amphora pediculus (small, one chloroplast); c)
Navicula lanceolata (medium sized, two chloroplasts); d) Melosira varians (large, many
chloroplasts); e) Fistulifera saprophila (very small, four chloroplasts, weakly silicified); f)
Mayamaea atomus (including var. permitis) (very small, possibly two chloroplasts, weakly
silicified).
Variability in results: LM v NGS
“Trophic Diatom Index” =
Figure panels: using existing index on NGS data; recalibrated NGS-specific index
78% of samples within 10 TDI units of current
(LM) TDI using recalibrated index
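As a rough illustration of the comparison on this slide, the sketch below assumes the index is an abundance-weighted mean of taxon sensitivity scores (1 to 5) rescaled to 0 to 100, which is the commonly published form of the TDI. The slide itself does not show the formula. The sensitivity scores, sample counts and paired values are hypothetical.

```python
def weighted_mean_sensitivity(abundance, sensitivity):
    """Abundance-weighted mean sensitivity over taxa that have a score."""
    scored = {t: n for t, n in abundance.items() if t in sensitivity and n > 0}
    total = sum(scored.values())
    if total == 0:
        raise ValueError("no scored taxa in sample")
    return sum(n * sensitivity[t] for t, n in scored.items()) / total


def tdi(abundance, sensitivity):
    """Rescale a 1-5 weighted mean sensitivity to a 0-100 index (assumed form)."""
    return weighted_mean_sensitivity(abundance, sensitivity) * 25 - 25


def share_within(lm_values, ngs_values, tolerance=10.0):
    """Fraction of paired samples whose two index values differ by <= tolerance."""
    pairs = list(zip(lm_values, ngs_values))
    return sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)


# Hypothetical sensitivity scores and counts, for illustration only.
SENSITIVITY = {
    "Achnanthidium minutissimum": 2,
    "Amphora pediculus": 5,
    "Navicula lanceolata": 4,
}
sample = {
    "Achnanthidium minutissimum": 60,
    "Amphora pediculus": 20,
    "Navicula lanceolata": 20,
}
print(tdi(sample, SENSITIVITY))

# Hypothetical paired LM and NGS index values for four samples.
print(share_within([50, 60, 70, 30], [55, 75, 68, 41]))
```

The same `tdi` function works for LM counts and NGS read counts. A recalibrated NGS-specific index would, under this assumption, differ only in the sensitivity scores supplied.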
How important are “gaps” in the
barcode library?
Comparison of existing index calculated from LM data using all taxa (X) and
just those taxa represented in barcode library (Y)
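The X versus Y comparison can be sketched as follows, again assuming a weighted-average index rescaled to 0 to 100. The taxon names, scores and library contents are hypothetical placeholders. The only point is that Y is computed from the subset of the LM count whose taxa have a barcode.

```python
def weighted_index(abundance, sensitivity):
    """Abundance-weighted mean sensitivity (1-5) rescaled to 0-100 (assumed form)."""
    total = sum(abundance.values())
    if total == 0:
        raise ValueError("empty sample")
    wms = sum(n * sensitivity[t] for t, n in abundance.items()) / total
    return wms * 25 - 25


def restrict_to_library(abundance, library):
    """Keep only taxa that are represented in the barcode library."""
    return {t: n for t, n in abundance.items() if t in library}


def library_coverage(abundance, library):
    """Share of the counted cells belonging to taxa with a barcode."""
    total = sum(abundance.values())
    return sum(n for t, n in abundance.items() if t in library) / total


# Hypothetical LM count, scores and barcode library.
counts = {"Taxon A": 50, "Taxon B": 30, "Taxon C": 20}
scores = {"Taxon A": 2, "Taxon B": 4, "Taxon C": 5}
library = {"Taxon A", "Taxon B"}

x = weighted_index(counts, scores)                                # all taxa
y = weighted_index(restrict_to_library(counts, library), scores)  # barcoded taxa only
print(x, y, library_coverage(counts, library))
```

Plotting X against Y across many samples shows how far the index moves purely because of library gaps, before any sequencing effects are considered.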
Recap (2)
• Produced a NGS “mirror” of current LM
technique
• This gives a foundation for understanding how
NGS outputs relate to the “real world”
– rbcL reads per cell vary (within and) between species
– method “catches” species missed by LM (i.e. small
weakly-silicified cells)
• A different sort of “imperfect proxy”!
Uncertainty: a Guide for the Perplexed
Five discrete classes
Repeatability and reproducibility?
How does sample mean relate to
“true” status class?
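One way to think about the question above is to treat the sample mean as an estimate with a standard error and ask how much of its sampling distribution falls in each of the five classes. The sketch below assumes a normal distribution and a score on a 0 to 1 scale (an Ecological Quality Ratio). The class boundaries are hypothetical round numbers, not the UK values.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Hypothetical lower boundaries of each class, highest class first.
BOUNDARIES = [
    ("High", 0.80),
    ("Good", 0.60),
    ("Moderate", 0.40),
    ("Poor", 0.20),
    ("Bad", -math.inf),
]


def class_probabilities(mean_score, standard_error, boundaries=BOUNDARIES):
    """Probability that the 'true' score lies in each class, given mean and SE."""
    probs = {}
    upper_cdf = 1.0
    for name, lower in boundaries:
        if lower == -math.inf:
            lower_cdf = 0.0
        else:
            lower_cdf = normal_cdf(lower, mean_score, standard_error)
        probs[name] = upper_cdf - lower_cdf
        upper_cdf = lower_cdf
    return probs


# A mean sitting exactly on the Good/Moderate boundary.
for name, p in class_probabilities(0.60, 0.05).items():
    print(f"{name}: {p:.3f}")
```

A mean that sits on a boundary splits almost evenly between the two adjacent classes, which is why the size of the standard error matters as much as the mean itself.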
Ecological status in space and time
Surface water body
“… a discrete and significant element of surface
water such as a lake, a reservoir, a stream, river or
canal, part of a stream, river or canal ……”
(Article 2)
A sample comes from a point in
space in time, but the
assessment refers to a broader
spatio-temporal “window”
Thames (freshwater):
250 km: 12 water bodies:
Mean length: 30 km
(max: 65 km; min: 2 km)
River Basin Management Plan
Every six years (Article 13)
Interim reports describing progress
after three years (Article 15)
Sources of uncertainty
Source: Addressed by …
Waterbody: 3 locations per water body (~10 km)
Season: 4 samples collected 3 months apart
Site: 3 samples ~ 10 m apart at one site
Analytical (repeatability):
– LM: three separate slides prepared from one sample
– NGS: three separate amplifications and analysis from one sample
Analytical (reproducibility):
– LM: one sample per water body used for UK/Ireland diatom ring test; results for “expert panel” (experienced analysts) used as indication of between-analyst variation.
– NGS: one sample per waterbody prepared separately by two individuals and analysed on two separate NGS machines
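Each source of uncertainty listed above is addressed by a set of replicates, so a simple summary is the standard deviation of the index within each replicate group. The sketch below assumes records of the form (group label, index value). The labels and values are hypothetical and are not results from the study.

```python
from statistics import stdev


def sd_by_group(records):
    """Standard deviation of replicate index values within each group."""
    groups = {}
    for label, value in records:
        groups.setdefault(label, []).append(value)
    # A standard deviation needs at least two replicates.
    return {label: stdev(values) for label, values in groups.items() if len(values) > 1}


# Hypothetical repeatability replicates: three slides (LM) or three
# amplifications (NGS) prepared from the same sample.
records = [
    ("LM repeatability", 50.0),
    ("LM repeatability", 54.0),
    ("LM repeatability", 52.0),
    ("NGS repeatability", 51.0),
    ("NGS repeatability", 52.0),
    ("NGS repeatability", 53.0),
]
for label, sd in sd_by_group(records).items():
    print(label, round(sd, 2))
```

The same function can be applied at each level (waterbody, season, site, analytical) by changing what the group label represents, which allows the levels to be compared on a common scale.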
Four water bodies of contrasting ecological status
“waterbody” versus “site” variation
Between-operator variation
UK / Ireland diatom ring test “expert panel”
Figure: standard deviation of TDI (axis 0 to 5); categories: Ehen, Wear, Derwent, Team, NGS
River Wear
Figure: standard deviation of TDI (axis 0 to 10) for LM and NGS; categories: within waterbody, one day; within site, temporal; within site, spatial x 1 day; within sample, 1 analyst; within sample, between analyst; diatom v non-diatom
Annotation: Estimate. Diatoms only give a partial indication of phytobenthos
• All four water bodies gave different results
• Waterbody variation: LM ~ NGS
• Analytical variation:
– Lower than site and waterbody variation
– Lower for NGS than LM
Recap (3)
• Understanding uncertainty gives us the
confidence to translate data into decisions
• Analytical variation of NGS is typically lower
than that for LM
• “Water body” variation is of a similar
magnitude for LM and NGS
– But values for NGS will decrease as gaps in library
are filled
• In the short term, therefore, there will not be
a gain in overall confidence
Better science or McEcology?
Two challenges to the UK approach to
NGS implementation
(Cheaper? Faster? Less variable?)
1. Species discovery
Nupela cf neglecta Ponader, Lowe and Potapova in Potapova et al. 2003
Spotted by an Environment Agency analyst from a stream in Hampshire in
January 2017
This will be classified as “no blast hit” in January 2018
Primed for the unexpected?
• Currently ~40% of reads are not classified
– How much is infraspecific variation?
– How much is taxa for which we lack barcodes?
– How much is undescribed taxa?
• From 2017 there will be no LM sampling
– No means of verifying new records
• Possible ...
– but at expense of “cheaper, faster ...” mantra
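The proportion of unclassified reads can be tracked with a simple tally. The sketch below assumes each read comes with its best library match and a percent identity, and that a read below a chosen identity threshold is reported as “no blast hit”. The 95% threshold, the input format and the example reads are hypothetical, not the pipeline's actual settings.

```python
from collections import Counter

NO_HIT = "no blast hit"


def assign_reads(best_hits, min_identity=95.0):
    """Tally reads per taxon; reads with no match or a weak match are unclassified."""
    counts = Counter()
    for taxon, identity in best_hits:
        if taxon is None or identity < min_identity:
            counts[NO_HIT] += 1
        else:
            counts[taxon] += 1
    return counts


def unclassified_fraction(counts):
    """Share of all reads that ended up without a taxon."""
    total = sum(counts.values())
    return counts[NO_HIT] / total if total else 0.0


# Hypothetical best hits: (taxon or None, percent identity).
hits = [
    ("Navicula lanceolata", 99.1),
    (None, 0.0),
    ("Amphora pediculus", 93.0),
    ("Amphora pediculus", 98.0),
    (None, 0.0),
]
counts = assign_reads(hits)
print(counts)
print(unclassified_fraction(counts))
```

A tally like this does not distinguish the three possible causes listed above (infraspecific variation, missing barcodes, undescribed taxa). It only shows how large the unclassified pool is for a given threshold.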
2. Data interpretation
Excel-type interface
for calculating indices
Experience informs
interpretation ...
A poverty of meaning?
Rhoicosphenia abbreviata
Direct link to autecological data via models
Indirect link to
wider ecological
context via
experience
A poverty of meaning?
• DARLEQ calculates indices
• Biologist interprets sample and adds value to
indices
• But:
– Since 2012 Environment Agency biologists do not
collect the invertebrate or diatom samples they
analyse
– From 2017 their first encounter with a diatom
sample will be as a spreadsheet of names ...
– How will a biologist add value to an analysis result?
Recap (4)
• Implementation of NGS for ecological
assessment is not just a matter of science ...
• ... the structure and values of the
organisation, too, will influence the success
(or otherwise)
• (But this is not unique to NGS ...)
Ecologists?
or
People who collect
ecological data?
Concluding comments
• Diatom assessment by NGS now at “beta-testing”
stage
– Proof of concept complete
– Refining procedures
• Fulfils objective of achieving WFD classification at
lower cost
• Beginning to understand how NGS outputs relate to
the real world
– Current model is “entry level”
– ... but provides a good foundation for exploiting the full
potential of molecular genetic data in the future.
We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.
T.S. Eliot, Four Quartets
[email protected]
www.martynkelly.co.uk
@basil0saurus
www.microscopesandmonsters.wordpress.com