Macklin SeqDB

Standards-derived management of
specimen-derived DNA sequences
Satpal Bilkhu, Nazir El-Kayssi, James A. Macklin,
Christopher T. Lewis
Agriculture and Agri-food Canada, Ottawa, Canada
Core Biological Resources
Biodiversity Collections:
Insects (CNC) – 16 million
Plants (DAO) – 1.5 million
Fungi (DAOM) – 350,000
Nematodes – 40,000
Fungi – 18,000
Bacteria – 2,000
Viruses – 450
maintained alive
Taxonomy library reference collection
non-living specimens
Multi-departmental “shared” priorities initiative
• AAFC lead department
• Bioinformatics experience crucial
• CRTI projects helped to build up expertise
and capacity on high-risk organisms
Environmental Monitoring
• Metagenomics approach to Environmental Monitoring
– Canadian samples (2007-11)
– Foreign samples (2010-12)
– Samples collected weekly
• Develop baseline profiles of microbial biodiversity
• Identify “Bioindicators” of climate change
• Made possible by a nationwide Collection Network
Sanger Sequence Management
Mixed Specimen
(e.g. Tree Bark)
Database Information
• 358,000 Sequences
(mycology group, last 10 years)
• Database Size = 1 GB
Specimen
(e.g. Pure Culture)
Sequence Reactions (Sanger)
Sample (DNA Extraction)
PCR Reactions
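The Sanger workflow on this slide runs from a specimen, through a DNA extraction sample and its PCR reactions, to individual sequence reactions. A minimal sketch of how that chain of records might be modelled is shown here in Python. All class and field names are illustrative assumptions, not SeqDB's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceReaction:
    """One Sanger read produced from a PCR product (hypothetical fields)."""
    primer: str
    bases: str


@dataclass
class PcrReaction:
    """One PCR amplification run on a DNA extraction."""
    target_region: str
    sequence_reactions: List[SequenceReaction] = field(default_factory=list)


@dataclass
class Sample:
    """A DNA extraction taken from a specimen."""
    extraction_id: str
    pcr_reactions: List[PcrReaction] = field(default_factory=list)


@dataclass
class Specimen:
    """A pure culture or a mixed specimen held in a collection."""
    specimen_number: str
    mixed: bool = False
    samples: List[Sample] = field(default_factory=list)

    def sequences(self) -> List[str]:
        """Collect every read that traces back to this specimen."""
        return [
            reaction.bases
            for sample in self.samples
            for pcr in sample.pcr_reactions
            for reaction in pcr.sequence_reactions
        ]
```

Keeping the links explicit means any sequence can be traced back to the specimen it came from.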
Metagenome Sequence Management
Mixed Specimen
(e.g. Air/Rain Samples)
Database Information
Sample (DNA Extraction)
Identification Pipeline
PCR Reactions (Pooled)
Sequence Reactions (454)
• 50 Million Sequences
• Database Size = 100 GB
Integrate Specimen-based sequences into identification process
Genome / Transcriptome Management
Specimen
(e.g. Pure Culture)
Sample
(e.g. DNA / RNA Extraction)
Sequencing Library
Database Information
•???
Assembly / Annotation Pipelines
Specimen as source material
NGS Sequencing Infrastructure
Specimen -> Sequences
Downstream Management
File / Metadata Management
Network Attached Storage
Sequence Analysis Workflow Design and Execution
Cluster Computing Resources
SeqDB Technologies
SeqDB Components / Frameworks
Legend: External, Commercial, Deprecated, Minimal Dev, Prototype
Clients:
• Client using Web Service (Programmatic Access): JSON Result
• Client using Browser (Chrome / Firefox / Opera / IE): HTML Page
• Client using Command Line: Scheduled / Manually Triggered
• Client using Barcode Printer: Barcode Label
• Android Client (Missing)
• Client Barcode Scanner: Barcode Scanner
Components: middleware, reader, printing, webservices, web, loader, util, processor, dbi
Frameworks: Appfuse, Struts 2, Spring 3, BioJava, Hibernate
Runs on: Java, RDBMS (MySQL), OS (Windows, Linux, OSX)
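A client using the web service (programmatic access) receives a JSON result rather than an HTML page. The sketch below shows how such a result might be consumed in Python. The payload layout (`sequences`, `name`, `bases`) is invented for illustration; the slides do not describe the real response format.

```python
import json
from typing import Dict, List


def parse_sequence_result(payload: str) -> List[Dict[str, str]]:
    """Extract name/bases pairs from a JSON result.

    The 'sequences', 'name' and 'bases' keys are a hypothetical layout,
    not the documented SeqDB response.
    """
    document = json.loads(payload)
    return [
        {"name": record["name"], "bases": record["bases"]}
        for record in document.get("sequences", [])
    ]


# Example with a hand-written payload instead of a live service call.
example_payload = '{"sequences": [{"name": "read-1", "bases": "ACGT"}]}'
records = parse_sequence_result(example_payload)
```

Working from a decoded string keeps the parsing logic independent of whichever HTTP client fetches the result.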
Home Page
Specimen Collections
Mixed Specimen Collection
Public
Private
Specimen Collection
Query Builder
• Search on any database field
• Range Searches for Specimen Number
• Partial Date Searches
• Range Searches for Dates
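One way to support partial date searches is to expand the partial date into an inclusive range and then reuse the range comparison used for dates and specimen numbers. The following Python sketch makes that assumption. The function names and the `YYYY`, `YYYY-MM`, `YYYY-MM-DD` input formats are hypothetical, not taken from SeqDB.

```python
import calendar
from datetime import date
from typing import Tuple


def partial_date_to_range(text: str) -> Tuple[date, date]:
    """Expand 'YYYY', 'YYYY-MM' or 'YYYY-MM-DD' into an inclusive date range."""
    parts = [int(piece) for piece in text.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        year, month = parts
        # monthrange returns (weekday of first day, number of days in month).
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        exact = date(*parts)
        return exact, exact
    raise ValueError(f"unsupported date format: {text!r}")


def in_range(value, low, high) -> bool:
    """Inclusive range test, usable for dates and specimen numbers alike."""
    return low <= value <= high
```

With this approach a search for `2011-02` matches any record dated within February 2011, and a specimen number range is simply `in_range(number, first, last)`.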
Search
Filters
Records
Storage System
Sequence Quality Colouring and Trimming
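Quality colouring and trimming both work from per-base quality scores. The Python sketch below illustrates one simple approach: Phred scores are mapped to display bands, and low-quality bases are removed from both ends of a read. The thresholds (20 and 30) and the end-trimming rule are assumptions chosen for illustration, not SeqDB's actual algorithm.

```python
from typing import List, Tuple


def quality_band(score: int) -> str:
    """Map a Phred score to a display band (thresholds are assumptions)."""
    if score >= 30:
        return "high"
    if score >= 20:
        return "medium"
    return "low"


def trim_low_quality_ends(
    bases: str, quals: List[int], min_q: int = 20
) -> Tuple[str, List[int]]:
    """Drop leading and trailing bases whose Phred score is below min_q."""
    if len(bases) != len(quals):
        raise ValueError("bases and quality scores must have equal length")
    start = 0
    end = len(bases)
    # Advance past low-quality bases at the start of the read.
    while start < end and quals[start] < min_q:
        start += 1
    # Step back over low-quality bases at the end of the read.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]
```

A user interface could colour each base by its `quality_band` and show the trimmed region as the default selection.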
Barcoding and Label Printing
Extracted Fields for Label
Customizable Label Output
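Label printing can be thought of as two steps: extracting the chosen fields from a record, then filling a user-defined layout. The Python sketch below shows that idea. The field names and the `$field` template syntax are assumptions for illustration and do not describe SeqDB's label configuration.

```python
from string import Template
from typing import Dict, Iterable


def extract_label_fields(record: Dict[str, str], wanted: Iterable[str]) -> Dict[str, str]:
    """Keep only the fields chosen for the label; missing ones become blank."""
    return {name: record.get(name, "") for name in wanted}


def render_label(template: str, fields: Dict[str, str]) -> str:
    """Fill a user-supplied layout such as '$collection $number'.

    safe_substitute leaves unknown placeholders untouched instead of raising.
    """
    return Template(template).safe_substitute(fields)


# Hypothetical record and layout.
record = {"collection": "DAOM", "number": "12345", "collector": "unknown"}
fields = extract_label_fields(record, ["collection", "number"])
label_text = render_label("$collection $number", fields)
```

Separating extraction from rendering lets the same record feed several label layouts.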
Many Export Formats
Galaxy Integration
AAFC-DINA Partnership
• AAFC has a complex set of databases for
resource management which we would like to
reconcile and integrate using the DINA
platform.
• AAFC will contribute a configurable DNA
module based on the SeqDB platform via a web
services API.
• Flexibility and sustainability through
community support!