Identifying Behavioral Strategies through Large Scale Phenotyping and Statistical Analysis
Stephen Helms, Ph.D.
April 8, 2014 – EYR Global
FOM Institute AMOLF, Amsterdam, Netherlands
Leon Avery (VCU), Greg Stephens (VU Amsterdam/OIST), Tom Shimizu (AMOLF)

How Do We Understand Complex Systems With Many Parts?
(Also a general “big data” question!)
A Model Complex System
Traditional approaches for understanding complex biological systems
Statistical approach for understanding biological systems
Data and computation problems
Proposed computational platform
Outlook for the future

A Simple Model Nervous System: C. elegans
Stimuli
• Smell (volatile odors)
• Taste (soluble chemicals)
• Feel (touch, heat)
The Worm
• ~1000 total cells
• 302 neurons
• 95 muscles
• ~20000 genes
Response
• Movement
• Neural activity
• Biochemical reactions

A Biologist’s Toolbox
Genetics
• Break individual parts, see what happens
Biochemistry
• Look at how parts chemically interact
Cell Biology
• See where the parts are
End result:
• A list of lots of details about what individual genes and proteins are doing
• But no clear view on what the system as a whole does

Idea: Finding Simple Models Through Quantitative, Comparative Studies
• Build quantitative models that are just complicated enough to explain the phenotypes we can observe and care about
• Compare models across multiple strains and species to see what phenotypes biology cares about
• The molecular and cellular details can be filled in later using traditional approaches
• Model system: Motile behavior
  – Behavior is the output of all the complicated systems of an organism

C. elegans Behavior
• Undulatory motion
• Occasional reversals
• Occasional sharp “omega” turns
• Continuous turning
Gray and Lissmann (1964) J. Exp. Biol. 41:135-54.
Croll (1975) J. Zool. 176:159-176.
Croll (1975) Adv. Parasitol. 13:71-122.
Pierce-Shimomura et al. (1999) J. Neurosci. 19:9557-69.
Iino, Y. & Yoshida, K. (2009) J. Neurosci. 29:5370-80.
Helms (2013) Figshare. http://dx.doi.org/10.6084/m9.figshare.705155

Experimental Overview
Record video of freely moving worms up to 30 minutes
Extract behavioral data
Develop models

Sampling Behavioral Variability: Individual, Intra- and Inter-Species
Up to 20 individuals per strain
Holovachov, O. et al. (2009) Nematology 11(6):927-950.
Chiang, J.-T.A. et al. (2006) J. Exp. Biol. 209(10):1859-73.
Andersen, E.C. et al. (2012) Nat. Genet. 44(3):285-90.

Building Quantitative Models
Deterministic dynamics
• Correlation functions
• Phase spaces
• Fitting linear models
Stochastic components
• Distributions
Simulations
• Monte Carlo simulations
• Comparison with statistics of data

Comparing Quantitative Models
Parameter Correlation Matrix
Patterns (Modes)
Simulations

Data Challenges
Need to record data on many individuals for a long time at high frequency
Storage
• Videos are large
  – 240 GB/h raw
  – 12 GB/h compressed
• Using ~1 TB of storage for a proof of concept project
• Want to scale up:
  – # individuals by 10-fold
  – Sampling rate by 3-fold
Processing
• >3-fold slower than data collection on a desktop computer
• Results in:
  – A backlog of data to analyze
  – A long delay before experiments can be interpreted
Sharing
• Videos are too big to regularly transfer around
• Extracted data is also big
  – 2 GB for the proof of concept project
• Limited ability for others to explore the data themselves

Proposal: Centrally located data processing and analysis services at SURFsara
Experimental Users (AMOLF, VCU, etc.)
• Generate videos
• Visualize data
• Develop analyses
Upload videos
Download datasets (hundreds of GBs, daily at peak)
SURFsara
• Video storage
• Video processing
• Standard analyses
Exchange datasets and analysis results (few GBs, weekly)
Theory Users (VU, OIST, etc.)
• Visualize data
• Develop analyses
Download datasets (tens of GBs, weekly)
• Loading large (>10 GB) videos
• Processing 10^4 to 10^6 frames / video

How EYR Is Helping
Storage
• SURFsara will provide up to 20 TB of storage for the video data
Processing
• SURFsara will provide computing resources
  – Cloud or grid
• eScience Center is helping with migrating analysis code to run on HPC infrastructure
Sharing
• Internet2 and SURFnet are connecting the involved institutes with SURFsara using high-speed lightpath connections
  – FOM Institute AMOLF
  – VU
  – Okinawa Institute of Science and Tech
  – Virginia Commonwealth University

Growth Prospects
• Open source aspects of C. elegans community
  – WormBook - textbook
  – WormBase - genetics
  – WormAtlas - anatomy
  – etc.
• As an analysis service available to other researchers
• Collaborative development of new analysis methods
  – Motility is widely used as a simple phenotype by C. elegans researchers
  – Other researchers developing statistical analysis approaches for worm behavior
• Integration of neuronal imaging data
  – Ongoing experiments in the systems biology group at AMOLF
R. Doornekamp, FOM Institute AMOLF

These Are General Challenges
Advances in imaging sensors
• Increasing temporal and spatial resolution → more data
Advances in experimental techniques
• Increasing experimental throughput → more data, access to statistical approaches
Lack of compression options
• Distortion of data due to compression artifacts is a major concern among experimentalists

Acknowledgements
• Enlighten Your Research 4 and Global Teams
  – Nicole Gregoire (SURFnet)
  – Sylvia Kuijpers (SURFnet)
  – Jan Bot (SURFsara)
  – Frank Seinstra (eScience Center)
• eScience Center
  – Rob van Nieuwpoort
  – Elena Ranguelova
• Everyone else involved @ SURFnet, SURFsara, Internet2
• Local ICT members
  – Carl Schulz (AMOLF)
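
Rough storage arithmetic
The figures quoted under Data Challenges (240 GB/h raw, 12 GB/h compressed, videos of up to 30 minutes, ~1 TB for the proof of concept, and a planned 10-fold increase in individuals with a 3-fold increase in sampling rate) can be combined into a back-of-the-envelope estimate. The sketch below is only illustrative. It assumes that storage grows linearly with both the number of individuals and the sampling rate, which the talk does not state, and all function and variable names are hypothetical.

```python
# Figures taken from the Data Challenges slide.
RAW_GB_PER_HOUR = 240.0
COMPRESSED_GB_PER_HOUR = 12.0
MAX_VIDEO_MINUTES = 30.0
PROOF_OF_CONCEPT_TB = 1.0      # "~1 TB", treated here as exactly 1 TB
INDIVIDUALS_SCALE = 10         # planned 10-fold more individuals
SAMPLING_RATE_SCALE = 3        # planned 3-fold higher sampling rate


def video_size_gb(minutes: float, gb_per_hour: float) -> float:
    """Size of a single video of the given length at a given data rate."""
    return gb_per_hour * minutes / 60.0


def scaled_storage_tb(current_tb: float, individuals_scale: int,
                      sampling_scale: int) -> float:
    """Projected storage after scaling up.

    Assumption (not from the talk): storage scales linearly with both
    the number of individuals and the sampling rate.
    """
    return current_tb * individuals_scale * sampling_scale


compression_ratio = RAW_GB_PER_HOUR / COMPRESSED_GB_PER_HOUR
raw_video_gb = video_size_gb(MAX_VIDEO_MINUTES, RAW_GB_PER_HOUR)
compressed_video_gb = video_size_gb(MAX_VIDEO_MINUTES, COMPRESSED_GB_PER_HOUR)
projected_tb = scaled_storage_tb(PROOF_OF_CONCEPT_TB, INDIVIDUALS_SCALE,
                                 SAMPLING_RATE_SCALE)

print(f"Compression ratio: {compression_ratio:.0f}x")
print(f"One 30-minute video: {raw_video_gb:.0f} GB raw, "
      f"{compressed_video_gb:.0f} GB compressed")
print(f"Projected storage after scale-up: {projected_tb:.0f} TB")
```

Under these assumptions a full-length video is 120 GB raw or 6 GB compressed, and the scaled-up project would need on the order of 30 TB. This is a rough projection that depends on the linear-scaling assumption, not a number from the talk.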