Identifying Behavioral Strategies by Large Scale Phenotyping

Identifying Behavioral Strategies through Large Scale Phenotyping and Statistical Analysis
Stephen Helms, Ph.D.
March 12, 2014 – SURFsara Data & Computing Infrastructure Event
FOM Institute AMOLF, Amsterdam, Netherlands
Leon Avery (VCU), Greg Stephens (VU Amsterdam/OIST),
Tom Shimizu (AMOLF)
How Do We Understand Complex Systems With Many Parts?
(Also a general “big data” question!)
• A model complex system
• Traditional approaches for understanding complex biological systems
• Statistical approach for understanding biological systems
• Data and computation problems
• Proposed computational platform
• Outlook for the future
A Simple Model Nervous System: C. elegans
Stimuli
• Smell (volatile odors)
• Taste (soluble chemicals)
• Feel (touch, heat)
The Worm
• ~1000 total cells
• 302 neurons
• 95 muscles
• ~20000 genes
Response
• Movement
• Neural activity
• Biochemical reactions
A Biologist’s Toolbox
Genetics
• Break individual parts, see what happens
Biochemistry
• Look at how parts chemically interact
Cell Biology
• See where the parts are
End result:
• A list of lots of details about what individual genes and proteins are doing
• But no clear view on what the system as a whole does
Idea: Finding Simple Models Through Quantitative, Comparative Studies
• Build quantitative models that are just complicated enough to explain the phenotypes we can observe and care about
• Compare models across multiple strains and species to see what phenotypes biology cares about
• The molecular and cellular details can be filled in later using traditional approaches
• Model system: Motile behavior
– Behavior is the output of all the complicated systems of an organism
C. elegans Behavior
• Undulatory motion
• Occasional reversals
• Occasional sharp “omega” turns
• Continuous turning
Gray and Lissmann (1964) J. Exp. Biol. 41:135-54. Croll (1975) J. Zool. 176:159-176. Croll (1975) Adv. Parasitol. 13:71-122. Pierce-Shimomura et al. (1999) J. Neurosci. 19:9557-69. Iino and Yoshida (2009) J. Neurosci. 29:5370-80. Helms (2013) Figshare. http://dx.doi.org/10.6084/m9.figshare.705155
Experimental Overview
Record video of freely moving worms for up to 30 minutes
Extract behavioral data
Develop models
Sampling Behavioral Variability: Individual, Intra- and Inter-Species
Up to 20 individuals per strain
Holovachov, O. et al. (2009) Nematology 11(6):927-950. Chiang, J.-T.A. et al. (2006) J. Exp. Biol. 209(10):1859-73. Andersen, E.C. et al. (2012) Nat. Genet. 44(3):285-90.
Building Quantitative Models
Deterministic dynamics
• Correlation functions
• Phase spaces
• Fitting linear models
Stochastic components
• Distributions
Simulations
• Monte Carlo simulations
• Comparison with statistics of data
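The three steps on this slide can be sketched end to end. This is only an illustration under stated assumptions: the time series is synthetic, and the first-order autoregressive (AR(1)) model is a stand-in chosen for brevity, not the model used in this work. The function names are hypothetical.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation of a 1-D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[:n - k], x[k:]) / (n * var)
                     for k in range(max_lag + 1)])

def fit_ar1(x):
    """Least-squares fit of x[t+1] = a * x[t] + noise; returns (a, noise_std)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    a = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])
    residuals = x[1:] - a * x[:-1]
    return a, residuals.std()

def simulate_ar1(a, noise_std, n_steps, rng):
    """Monte Carlo realization of the fitted model with Gaussian noise."""
    x = np.zeros(n_steps)
    noise = rng.normal(0.0, noise_std, size=n_steps)
    for t in range(1, n_steps):
        x[t] = a * x[t - 1] + noise[t]
    return x

rng = np.random.default_rng(0)

# Assumption: synthetic stand-in for a measured behavioral signal.
data = simulate_ar1(0.9, 1.0, 20000, rng)

# Deterministic part (linear fit) and stochastic part (residual noise level).
a_hat, sigma_hat = fit_ar1(data)

# Simulate from the fitted model and compare a statistic with the data.
simulated = simulate_ar1(a_hat, sigma_hat, 20000, rng)
acf_data = autocorrelation(data, 20)
acf_sim = autocorrelation(simulated, 20)
max_acf_diff = np.max(np.abs(acf_data - acf_sim))
print(f"fitted a = {a_hat:.3f}, max ACF difference = {max_acf_diff:.3f}")
```

A small difference between the two correlation functions suggests the model captures that statistic; a large one points to missing structure in the model.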
Comparing Quantitative Models
Parameter Correlation Matrix
Patterns (Modes)
Simulations
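One common way to go from a parameter correlation matrix to patterns (modes) is an eigendecomposition of that matrix. The sketch below assumes this reading of the slide; the parameter table is synthetic, with a hidden factor deliberately coupling the first two parameters, and none of the numbers come from the actual study.

```python
import numpy as np

rng = np.random.default_rng(1)
n_individuals, n_params = 200, 4

# Assumption: synthetic table of fitted model parameters, one row per
# individual. A hidden factor drives parameters 0 and 1 together.
latent = rng.normal(size=n_individuals)
params = rng.normal(size=(n_individuals, n_params))
params[:, 0] += 2.0 * latent
params[:, 1] += 2.0 * latent

# Correlation matrix between parameters (columns are the variables).
corr = np.corrcoef(params, rowvar=False)

# Eigenvectors of the symmetric correlation matrix are the modes;
# eigh returns eigenvalues in ascending order, so sort them descending.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, modes = eigvals[order], eigvecs[:, order]

# Fraction of the total correlation structure carried by each mode.
explained = eigvals / eigvals.sum()
print("explained fraction per mode:", np.round(explained, 2))
print("leading mode loadings:", np.round(modes[:, 0], 2))
```

In this synthetic case the leading mode loads mainly on the two coupled parameters, which is the kind of pattern such a comparison across strains would look for.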
Data Challenges
Need to record data on many individuals for a long time at high frequency
Storage
• Videos are large
– 240 GB/h raw
– 12 GB/h compressed
• Using ~1 TB of storage for a proof of concept project
• Want to scale up:
– # individuals by 10-fold
– Sampling rate by 3-fold
Processing
• >3-fold slower than data collection on a desktop computer
• Results in:
– A backlog of data to analyze
– A long delay before experiments can be interpreted
Sharing
• Videos are too big to regularly transfer around
• Extracted data is also big
– 2 GB for the proof of concept project
• Limited ability for others to explore the data themselves
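The storage figures above imply a few derived numbers. The back-of-envelope sketch below uses only the values on this slide, plus three labeled assumptions: 1 TB is taken as 1000 GB, the ~1 TB proof of concept is treated as entirely compressed video, and storage is assumed to grow linearly with both the number of individuals and the sampling rate.

```python
# Figures quoted on the slide.
RAW_GB_PER_HOUR = 240.0
COMPRESSED_GB_PER_HOUR = 12.0
PROOF_OF_CONCEPT_TB = 1.0
INDIVIDUALS_SCALE = 10     # planned increase in number of individuals
SAMPLING_RATE_SCALE = 3    # planned increase in sampling rate

# Assumption: 1 TB = 1000 GB.
GB_PER_TB = 1000.0

# Compression ratio implied by the raw and compressed rates.
compression_ratio = RAW_GB_PER_HOUR / COMPRESSED_GB_PER_HOUR

# Assumption: the whole 1 TB holds compressed video only.
hours_of_video = PROOF_OF_CONCEPT_TB * GB_PER_TB / COMPRESSED_GB_PER_HOUR

# Assumption: storage scales linearly with both factors.
scaled_storage_tb = PROOF_OF_CONCEPT_TB * INDIVIDUALS_SCALE * SAMPLING_RATE_SCALE

print(f"compression ratio: {compression_ratio:.0f}:1")
print(f"video in proof of concept: ~{hours_of_video:.0f} h")
print(f"scaled-up storage estimate: ~{scaled_storage_tb:.0f} TB")
```

The scaled-up figure is a rough estimate only; better compression or shorter recordings would lower it.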
Proposal:
Centrally located data processing and analysis services at SARA
Experimental Users (AMOLF, VCU, etc.)
• Generate videos
• Visualize data
• Develop analyses
• Upload videos, download datasets (hundreds of GBs, daily at peak)
SARA
• Video storage
• Video processing
• Standard analyses
Exchange datasets and analysis results (few GBs, weekly)
Theory Users (VU, OIST, etc.)
• Visualize data
• Develop analyses
• Download datasets (tens of GBs, weekly)
• Loading large (>10 GB) videos
• Processing 10^4-10^6 frames / video
How SURFnet/SURFsara/eScience Center Are Helping
Storage
• SURFsara will provide up to 20 TB of storage for the video data
Processing
• SURFsara will provide computing resources
– Cloud or grid
• eScience Center is helping with migrating analysis code to run on HPC infrastructure
Sharing
• SURFnet is connecting the involved institutes with SURFsara using high-speed lightpath connections
– FOM Institute AMOLF
– VU
– Okinawa Institute of Science and Tech.
– Virginia Commonwealth University
Growth Prospects
• Open source aspects of C. elegans community
– WormBook - textbook
– WormBase - genetics
– WormAtlas - anatomy
– etc.
• As an analysis service available to other researchers
– Motility is widely used as a simple phenotype by C. elegans researchers
• Collaborative development of new analysis methods
– Other researchers developing statistical analysis approaches for worm behavior
• Integration of neuronal imaging data
– Ongoing experiments in the systems biology group at AMOLF
R. Doornekamp, FOM Institute AMOLF
These Are General Challenges
Advances in imaging sensors
• Increasing temporal and spatial resolution → more data
Advances in experimental techniques
• Increasing experimental throughput → more data, access to statistical approaches
Lack of compression options
• Distortion of data due to compression artifacts is a major concern among experimentalists
Acknowledgements
• Enlighten Your Research 4 and Global Teams
– Nicole Gregoire (SURFnet)
– Sylvia Kuijpers (SURFnet)
– Jan Bot (SURFsara)
– Frank Seinstra (eScience Center)
• eScience Center
– Rob van Nieuwpoort
– Elena Ranguelova
• Everyone else involved @ SURFnet, SURFsara
• Local ICT members
– Carl Schulz (AMOLF)