Implementation of an NGS-based phytobenthos tool for ecological assessment in the UK

Martyn Kelly (A), Kerry Walsh (B), Rachel Glover (C), David Mann (D), Steve Juggins (E), Shinya Sato (D), Neil Boonham (C) and Tim Jones (F)

(A) Bowburn Consultancy, 11 Monteigne Drive, Bowburn, Durham DH6 5QB, UK
(B) Environment Agency, Horizon House, Deanery Road, Bristol BS1 5AH, UK
(C) FERA, Sand Hutton, York YO41 1LZ, UK
(D) Royal Botanic Gardens Edinburgh, 21a Inverleith Row, Edinburgh EH3 5LR, UK
(E) Newcastle University, Newcastle NE1 7RU, UK
(F) Environment Agency, Higher Shaftesbury Road, Blandford Forum, Dorset DT11 8ST, UK

@basil0saurus; www.microscopesandmonsters.wordpress.com

Contents
• Background: phytobenthos, diatoms and the Water Framework Directive
• From light microscopy to NGS
• Uncertainty: a guide for the perplexed
  – From data to status assessment to decision
• Better science or McEcology?

Water Framework Directive (73 pages condensed to one slide)
Good status is the objective.
• High, Good: biota show no or only a slight change from "expected".
• Moderate, Poor, Bad: biota show significant alterations from "expected"; a Programme of Measures is required to bring the biota back to at least good status.
"Monitoring" supports the management of water bodies to achieve this objective.

Macrophytes and phytobenthos: one "biological quality element" required for the assessment of fresh waters
[Figure: survey and sampling scales, from 10 m to 10 cm for "macrophytes" and from 1 cm to 1 µm for "phytobenthos".]

"Phytobenthos"
• Benthic algae
• Complex mix of Bacillariophyta, Chlorophyta, Cyanobacteria and other algal groups
• Diatoms often abundant and diverse
• Strong correlation with water chemistry
• Therefore widely used as proxies

What are diatoms?
• Microscopic algae / protists ("Heterokontophyta")
• ~2800 species recorded from UK fresh waters
• Characteristic silica cell wall ("frustule")
• Pigments include chlorophylls a and c plus fucoxanthin

Limitations of diatom identification using optical microscopy
• Cost of microscopes
• Slow
• Staff training costs
• Constant changes in diatom taxonomy
• Variability in outcomes

Molecular biology is shaping our understanding of diatom taxonomy ... Diatom Research (2013) 28: 37-59

Recap (1)
• Diatoms are a useful but imperfect proxy for "phytobenthos"
  – Typically strong correlations with water chemistry
  – A partial view of the benthic algal assemblage
  – Taxonomy is not stable
  – Analyses are expensive and variable
• Is there a better way?

Development of a molecular diatom tool for WFD classification of rivers and lakes
– Can we develop a method based on molecular barcodes that improves upon the current method based on light microscopy?
  – Cheaper?
  – Less variable?
  – Faster?
– Funding:
– Research:

Preliminaries
• Next Generation Sequencing
  – Target gene: rbcL
  – Phase 1: Roche 454 (850 bp)
  – Phase 2: Illumina MiSeq (331 bp)
• Barcode reference library
  – ~1000 strains; 100 species barcoded
  – Supplemented by barcodes from GenBank and R-Syst::diatom (~400 species)
• Bioinformatics
  – Filters for Xanthophyta

Development of NGS-based method
• Design of main study:
  – 250 sites in England, each sampled twice
  – Stratified site selection to include all river types and levels of pressure encountered routinely
  – LM analysis: by Environment Agency staff
  – NGS analysis: by FERA
  – In addition, 113 "reference sites" sampled twice (= 232 samples), mostly in Scotland, Wales and Northern Ireland

Comparisons: LM vs NGS

Reasons for mismatches
[Figure: four scenarios (a to d), labelled "known knowns", "unknown unknowns" and "known unknowns", showing barcode presence (+) or absence (-) across genetic variation.]
Conceptual diagram of the relationship between LM and NGS outputs for four different scenarios.
"+" or "-" indicates that a barcode either does or does not exist for a particular genotype within a species complex. a. clearly defined taxon aligns with barcode; b. species complex with several different barcodes represented in the barcode database; c. species complex poorly represented in the barcode database; d. species (or complex) not represented in the barcode database.

Taxa differences: LM v NGS
Differences between the representation of common taxa in LM and NGS:
a) Achnanthidium minutissimum-type (small, one chloroplast);
b) Amphora pediculus (small, one chloroplast);
c) Navicula lanceolata (medium-sized, two chloroplasts);
d) Melosira varians (large, many chloroplasts);
e) Fistulifera saprophila (very small, four chloroplasts, weakly silicified);
f) Mayamaea atomus, including var. permitis (very small, possibly two chloroplasts, weakly silicified).

Variability in results: LM v NGS
[Figure: "Trophic Diatom Index" (TDI), two panels: "Using existing index on NGS data" and "Recalibrated NGS-specific index".]
Using the recalibrated index, 78% of samples were within 10 TDI units of the current (LM) TDI.

How important are "gaps" in the barcode library?
[Figure: existing index calculated from LM data using all taxa (X) against the same index using only those taxa represented in the barcode library (Y).]

Recap (2)
• Produced an NGS "mirror" of the current LM technique
• This gives a foundation for understanding how NGS outputs relate to the "real world"
  – rbcL reads per cell vary within and between species
  – The method "catches" species missed by LM (i.e. small, weakly silicified cells)
• A different sort of "imperfect proxy"!

Uncertainty: a Guide for the Perplexed
• Five discrete classes
• Repeatability and reproducibility?
• How does the sample mean relate to the "true" status class?
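The TDI comparisons above (existing versus recalibrated index, all taxa versus barcoded taxa only) all rest on a weighted-average index calculated from a taxon list. The sketch below assumes a Zelinka & Marvan-style weighted mean of per-taxon sensitivity (1 to 5) and indicator value, rescaled linearly to 0 to 100. The taxon names and coefficients are hypothetical placeholders, not the published TDI or DARLEQ values, and the helper names are invented for illustration. It also shows the two checks reported on the slides: recomputing the index from barcoded taxa only, and the share of paired samples within 10 TDI units of each other.

```python
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL coefficients: taxon -> (sensitivity s, 1..5; indicator value v, 1..3).
# Real TDI/DARLEQ coefficients are published separately and are NOT reproduced here.
COEFFS: Dict[str, Tuple[float, float]] = {
    "taxon_A": (1.0, 2.0),
    "taxon_B": (3.0, 1.0),
    "taxon_C": (5.0, 1.0),
}


def weighted_index(abundance: Dict[str, float],
                   coeffs: Dict[str, Tuple[float, float]] = COEFFS) -> float:
    """Weighted mean sensitivity (1..5), rescaled linearly to 0..100.

    Taxa without coefficients are skipped, as an index calculator
    would ignore names it cannot score.
    """
    num = den = 0.0
    for taxon, a in abundance.items():
        if taxon not in coeffs:
            continue
        s, v = coeffs[taxon]
        num += a * s * v
        den += a * v
    if den == 0.0:
        raise ValueError("sample contains no scoring taxa")
    wms = num / den              # weighted mean sensitivity, 1..5
    return 25.0 * wms - 25.0     # maps 1..5 onto 0..100


def barcoded_only(abundance: Dict[str, float], library: Iterable[str]) -> Dict[str, float]:
    """Drop taxa absent from a (hypothetical) barcode library.

    No re-normalisation is needed: the weighted mean does not change
    when every abundance is scaled by the same factor.
    """
    lib = set(library)
    return {t: a for t, a in abundance.items() if t in lib}


def share_within(xs: List[float], ys: List[float], tol: float = 10.0) -> float:
    """Fraction of paired index values that differ by at most `tol` units."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two equal-length, non-empty lists")
    return sum(abs(x - y) <= tol for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    sample = {"taxon_A": 50.0, "taxon_B": 30.0, "taxon_C": 20.0}  # % abundance
    print(round(weighted_index(sample), 2))  # 23.33
    print(round(weighted_index(barcoded_only(sample, ["taxon_A", "taxon_B"])), 2))  # 11.54
    print(share_within([23.3, 50.0, 70.0], [30.0, 65.0, 71.0]))
```

Dropping the unbarcoded (and, here, high-sensitivity-score) taxon shifts the index by more than 10 units, which is the kind of effect the "gaps in the barcode library" comparison is designed to expose. A recalibrated NGS-specific index would, in this scheme, amount to replacing the coefficient table rather than the formula.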
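The question of how a sample mean relates to the "true" status class can be made concrete with a common textbook approach: treat the mean of several samples as normally distributed around the true value, then integrate that distribution between class boundaries. The sketch below assumes an index already expressed as an ecological quality ratio (EQR) on 0 to 1, evenly spaced HYPOTHETICAL boundaries at 0.2, 0.4, 0.6 and 0.8, and a known between-sample standard deviation. It illustrates the idea only; it is not the UK classification procedure.

```python
import math
from typing import Dict, Sequence, Tuple

CLASSES = ("Bad", "Poor", "Moderate", "Good", "High")
# HYPOTHETICAL EQR boundaries: Bad/Poor, Poor/Moderate, Moderate/Good, Good/High.
BOUNDARIES: Tuple[float, ...] = (0.2, 0.4, 0.6, 0.8)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ Normal(mu, sigma); math.erf maps +/-inf to +/-1."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def class_probabilities(samples: Sequence[float], sd: float,
                        boundaries: Tuple[float, ...] = BOUNDARIES) -> Dict[str, float]:
    """Probability that the true mean EQR lies in each of the five classes.

    Assumes sample EQRs are independent with standard deviation `sd`,
    so the mean has standard error sd / sqrt(n).
    """
    if not samples or sd <= 0:
        raise ValueError("need at least one sample and a positive sd")
    n = len(samples)
    mean = sum(samples) / n
    se = sd / math.sqrt(n)
    cuts = (-math.inf, *boundaries, math.inf)
    return {
        name: normal_cdf(hi, mean, se) - normal_cdf(lo, mean, se)
        for name, lo, hi in zip(CLASSES, cuts[:-1], cuts[1:])
    }


if __name__ == "__main__":
    # Four samples whose mean sits close to the (hypothetical) Moderate/Good boundary
    probs = class_probabilities([0.58, 0.66, 0.55, 0.65], sd=0.08)
    for name, p in probs.items():
        print(f"{name:9s} {p:.2f}")
```

With these values the mean (0.61) lies just above the hypothetical Moderate/Good boundary, so the sketch gives only about a 60% chance that the true class is Good. Adding samples shrinks the standard error, which is why the number of samples matters as much as analytical precision when turning data into decisions.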
Ecological status in space and time
Surface water body: "… a discrete and significant element of surface water such as a lake, a reservoir, a stream, river or canal, part of a stream, river or canal …" (Article 2)
A sample comes from a point in space and time, but the assessment refers to a broader spatio-temporal "window".
• Thames (freshwater): 250 km; 12 water bodies; mean length 30 km (max 65 km; min 2 km)
• River Basin Management Plan every six years (Article 13), with interim reports describing progress after three years (Article 15)

Sources of uncertainty (source: addressed by …)
• Waterbody: 3 locations per water body (~10 km)
• Season: 4 samples collected 3 months apart
• Site: 3 samples ~10 m apart at one site
• Analytical (repeatability), "site": LM, three separate slides prepared from one sample; NGS, three separate amplifications and analyses from one sample
• Analytical (reproducibility), "waterbody": LM, one sample per water body used for the UK/Ireland diatom ring test, with results for the "expert panel" (experienced analysts) used as an indication of between-analyst variation; NGS, one sample per water body prepared separately by two individuals and analysed on two separate NGS machines

Four water bodies of contrasting ecological status
"Waterbody" versus "site" variation
Between-operator variation
[Figure: standard deviation of TDI (axis 0 to 5) for the Ehen, Wear, Derwent and Team, comparing the UK/Ireland diatom ring test "expert panel" with NGS.]
[Figure: River Wear, standard deviation of TDI (axis 0 to 10) for LM and NGS by source of variation: within waterbody, one day; within site, temporal; within site, spatial × 1 day; within sample, one analyst; within sample, between analysts; diatom v non-diatom (an estimate: diatoms only give a partial indication of phytobenthos).]
• All four water bodies gave different results
• Waterbody variation: LM ~ NGS
• Analytical variation:
  – Lower than site and waterbody variation
  – Lower for NGS than LM

Recap (3)
• Understanding uncertainty gives us the confidence to translate data into decisions
• Analytical variation is typically lower for NGS than for LM
• "Water body" variation is of a similar magnitude for LM and NGS
  – But values for NGS will decrease as gaps in the library are filled
• In the short term, therefore, there will not be a gain in overall confidence

Better science or McEcology?
Two challenges to the UK approach to NGS implementation (Cheaper? Faster? Less variable?)

1. Species discovery
Nupela cf. neglecta Ponader, Lowe and Potapova in Potapova et al. 2003
Spotted by an Environment Agency analyst in a stream in Hampshire in January 2017.
In January 2018, NGS will classify this as "no BLAST hit".

Primed for the unexpected?
• Currently ~40% of reads are not classified
  – How much is infraspecific variation?
  – How much is taxa for which we lack barcodes?
  – How much is undescribed taxa?
• From 2017 there will be no LM sampling
  – No means of verifying new records
• Possible ...
  – but at the expense of the "cheaper, faster ..." mantra

2. Data interpretation
Excel-type interface for calculating indices
Experience informs interpretation ...

A poverty of meaning?
Rhoicosphenia abbreviata
• Direct link to autecological data via models
• Indirect link to wider ecological context via experience

A poverty of meaning?
• DARLEQ calculates indices
• Biologist interprets sample and adds value to indices
• But:
  – Since 2012 Environment Agency biologists do not collect the invertebrate or diatom samples they analyse
  – From 2017 their first encounter with a diatom sample will be as a spreadsheet of names ...
  – How will a biologist add value to an analysis result?

Recap (4)
• Implementation of NGS for ecological assessment is not just a matter of science ...
• ... the structure and values of the organisation, too, will influence the success (or otherwise)
• (But this is not unique to NGS ...)

Ecologists? Or people who collect ecological data?

Concluding comments
• Diatom assessment by NGS now at "beta-testing" stage
  – Proof of concept complete
  – Refining procedures
• Fulfils the objective of achieving WFD classification at lower cost
• Beginning to understand how NGS outputs relate to the real world
  – Current model is "entry level"
  – ... but provides a good foundation for exploiting the full potential of molecular genetic data in the future

We shall not cease from exploration
And the end of all our exploring
Will be to arrive where we started
And know the place for the first time.
T.S. Eliot, Four Quartets

www.martynkelly.co.uk; @basil0saurus; www.microscopesandmonsters.wordpress.com