Identifying Behavioral Strategies through Large Scale Phenotyping and Statistical Analysis

Stephen Helms, Ph.D.
March 12, 2014 – SURFsara Data & Computing Infrastructure Event
FOM Institute AMOLF, Amsterdam, Netherlands
Leon Avery (VCU), Greg Stephens (VU Amsterdam/OIST), Tom Shimizu (AMOLF)

How Do We Understand Complex Systems With Many Parts?
(Also a general “big data” question!)
• A Model Complex System
• Traditional approaches for understanding complex biological systems
• Statistical approach for understanding biological systems
• Data and computation problems
• Proposed computational platform
• Outlook for the future

A Simple Model Nervous System: C. elegans
Stimuli
• Smell (volatile odors)
• Taste (soluble chemicals)
• Feel (touch, heat)
The Worm
• ~1000 total cells
• 302 neurons
• 95 muscles
• ~20000 genes
Response
• Movement
• Neural activity
• Biochemical reactions

A Biologist’s Toolbox
Genetics
• Break individual parts, see what happens
Biochemistry
• Look at how parts chemically interact
Cell Biology
• See where the parts are
End result:
• A list of lots of details about what individual genes and proteins are doing
• But no clear view on what the system as a whole does

Idea: Finding Simple Models Through Quantitative, Comparative Studies
• Build quantitative models that are just complicated enough to explain the phenotypes we can observe and care about
• Compare models across multiple strains and species to see what phenotypes biology cares about
• The molecular and cellular details can be filled in later using traditional approaches
• Model system: Motile behavior
  – Behavior is the output of all the complicated systems of an organism

C. elegans Behavior
• Undulatory motion
• Occasional reversals
• Occasional sharp “omega” turns
• Continuous turning
Gray and Lissmann (1964) J. Exp. Biol. 41:135-54.
Croll (1975) J. Zool. 176:159-176.
Croll (1975) Adv. Parasitol. 13:71-122.
Pierce-Shimomura et al. (1999) J. Neurosci. 19:9557-69.
Iino, Y. & Yoshida, K. (2009) J. Neurosci. 29:5370-80.
Helms (2013) Figshare. http://dx.doi.org/10.6084/m9.figshare.705155

Experimental Overview
• Record video of freely moving worms for up to 30 minutes
• Extract behavioral data
• Develop models

Sampling Behavioral Variability: Individual, Intra- and Inter-Species
• Up to 20 individuals per strain
Holovachov, O. et al. (2009) Nematology 11(6):927-950.
Chiang, J.-T.A. et al. (2006) J. Exp. Biol. 209(10):1859-73.
Andersen, E.C. et al. (2012) Nat. Genet. 44(3):285-90.

Building Quantitative Models
Deterministic dynamics
• Correlation functions
• Phase spaces
• Fitting linear models
Stochastic components
• Distributions
Simulations
• Monte Carlo simulations
• Comparison with statistics of data

Comparing Quantitative Models
• Parameter Correlation Matrix
• Patterns (Modes)
• Simulations

Data Challenges
Need to record data on many individuals for a long time at high frequency
Storage
• Videos are large
  – 240 GB/h raw
  – 12 GB/h compressed
• Using ~1 TB of storage for a proof of concept project
• Want to scale up:
  – # individuals by 10-fold
  – Sampling rate by 3-fold
Processing
• >3-fold slower than data collection on a desktop computer
• Results in:
  – A backlog of data to analyze
  – A long delay before experiments can be interpreted
Sharing
• Videos are too big to regularly transfer around
• Extracted data is also big
  – 2 GB for the proof of concept project
• Limited ability for others to explore the data themselves

Proposal: Centrally located data processing and analysis services at SARA
Experimental Users (AMOLF, VCU, etc.)
• Generate videos
• Visualize data
• Develop analyses
• Upload videos
• Download datasets (hundreds of GBs, daily at peak)
SARA
• Video storage
• Video processing
• Standard analyses
Exchange datasets and analysis results (few GBs, weekly)
Theory Users (VU, OIST, etc.)
• Visualize data
• Develop analyses
• Download datasets (tens of GBs, weekly)
• Loading large (>10 GB) videos
• Processing 10^4-10^6 frames / video

How SURFnet/SURFsara/eScience Center Are Helping
Storage
• SURFsara will provide up to 20 TB of storage for the video data
Processing
• SURFsara will provide computing resources
  – Cloud or grid
• eScience Center is helping with migrating analysis code to run on HPC infrastructure
Sharing
• SURFnet is connecting the involved institutes with SURFsara using high-speed lightpath connections
  – FOM Institute AMOLF
  – VU
  – Okinawa Institute of Science and Tech.
  – Virginia Commonwealth University

Growth Prospects
• Open source aspects of C. elegans community
  – WormBook - textbook
  – WormBase - genetics
  – WormAtlas - anatomy
  – etc.
• As an analysis service available to other researchers
  – Motility is widely used as a simple phenotype by C. elegans researchers
• Collaborative development of new analysis methods
  – Other researchers developing statistical analysis approaches for worm behavior
• Integration of neuronal imaging data
  – Ongoing experiments in the systems biology group at AMOLF
R. Doornekamp, FOM Institute AMOLF

These Are General Challenges
Advances in imaging sensors
• Increasing temporal and spatial resolution → more data
Advances in experimental techniques
• Increasing experimental throughput → more data, access to statistical approaches
Lack of compression options
• Distortion of data due to compression artifacts is a major concern among experimentalists

Acknowledgements
• Enlighten Your Research 4 and Global Teams
  – Nicole Gregoire (SURFnet)
  – Sylvia Kuijpers (SURFnet)
  – Jan Bot (SURFsara)
  – Frank Seinstra (eScience Center)
• eScience Center
  – Rob van Nieuwpoort
  – Elena Ranguelova
• Everyone else involved @ SURFnet, SURFsara
• Local ICT members
  – Carl Schulz (AMOLF)
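
Supplementary note: storage arithmetic for the Data Challenges slide
The figures quoted under Data Challenges (240 GB/h raw, 12 GB/h compressed, ~1 TB for the proof of concept, 10-fold more individuals, 3-fold higher sampling rate) can be combined into a rough estimate. The sketch below is only a back-of-the-envelope check, not part of the talk. It assumes that storage grows linearly with both the number of individuals and the sampling rate, which the slides do not state, and the slides also do not say whether the ~1 TB figure refers to raw or compressed video. All function and variable names are illustrative.

```python
# Figures quoted on the "Data Challenges" slide.
RAW_GB_PER_HOUR = 240.0         # raw video data rate
COMPRESSED_GB_PER_HOUR = 12.0   # compressed video data rate
PROOF_OF_CONCEPT_TB = 1.0       # approximate storage used so far
INDIVIDUALS_FACTOR = 10         # planned increase in number of individuals
SAMPLING_FACTOR = 3             # planned increase in sampling rate


def compression_ratio(raw_gb_per_hour, compressed_gb_per_hour):
    """Ratio of raw to compressed data rate."""
    return raw_gb_per_hour / compressed_gb_per_hour


def video_size_gb(minutes, gb_per_hour):
    """Size of a single recording of the given length."""
    return gb_per_hour * minutes / 60.0


def scaled_storage_tb(current_tb, individuals_factor, sampling_factor):
    """Projected storage after scaling up.

    Assumption (not from the slides): storage scales linearly with both
    the number of individuals and the sampling rate.
    """
    return current_tb * individuals_factor * sampling_factor


if __name__ == "__main__":
    ratio = compression_ratio(RAW_GB_PER_HOUR, COMPRESSED_GB_PER_HOUR)
    raw_30 = video_size_gb(30, RAW_GB_PER_HOUR)
    comp_30 = video_size_gb(30, COMPRESSED_GB_PER_HOUR)
    projected = scaled_storage_tb(
        PROOF_OF_CONCEPT_TB, INDIVIDUALS_FACTOR, SAMPLING_FACTOR
    )
    print(f"Compression ratio: {ratio:.0f}x")
    print(f"30-minute video: {raw_30:.0f} GB raw, {comp_30:.0f} GB compressed")
    print(f"Projected storage after scale-up: {projected:.0f} TB")
```

Under these assumptions a 30-minute recording is about 120 GB raw or 6 GB compressed, and the scaled-up project would need roughly 30 TB, the same order of magnitude as the up to 20 TB of storage mentioned in the talk.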