The Cloud for Biologists using bioinformatics tools
Mattias de Hollander
Netherlands Institute of Ecology (NIOO-KNAW)

Outline: Galaxy | Cloudman | NIOO | Thanks! | Questions

Why choose the Cloud?
- It’s flexible
- You have full control
- Perfect for small labs
- It’s fancy (Google and Amazon are using it)
- It’s environmentally friendly (Gmail: "It’s cooler in the cloud")

How do we use the Cloud?

Galaxy: a web-based genome analysis platform[1]
- A free (for everyone) web service integrating a wealth of tools, compute resources, terabytes of reference data and permanent storage
- Open source software that makes it simple to integrate your own tools and data and to customize Galaxy for your own site

[1] Slide by Anton Nekrutenko, Galaxy Developer Conference 2011, Lunteren (NL)
Most biologists don’t write code
- Analyze: interactively manipulate genomic data with a comprehensive and expanding ’best-practices’ toolset
- Publish and share:
  - Results and a step-by-step analysis record (Data Libraries and Histories)
  - Customizable pipelines (Workflows)
  - Share workflows with other users

Cloudman

What is Cloudman?
- Cloudman is written by Enis Afgan et al. (Emory University) and provides a ready-to-run, dynamically scalable version of Galaxy on Amazon AWS
- It is now also possible to run it on the SARA HPC Cloud / OpenNebula (with some limitations)
How does it work?
- A master node contains all the data and tools
- Worker nodes are started based on need/load
- Data is available on all nodes via a shared filesystem (NFS)
- RabbitMQ is used for communication between cluster nodes
- Jobs are queued using SGE (Sun Grid Engine)
- Galaxy is served by the nginx web server

Worker instances are being configured

Galaxy is accessible
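Starting worker nodes based on need/load can be illustrated with a simple sizing rule. The sketch below is hypothetical and is not Cloudman's actual scaling policy: it assumes every worker offers a fixed number of SGE job slots (`slots_per_worker`), that the cluster is capped at `max_workers`, and that the queued and running job counts are already known. All names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical snapshot of the cluster, e.g. derived from the SGE queue."""
    queued_jobs: int   # jobs waiting in the queue
    running_jobs: int  # jobs currently executing
    workers: int       # worker nodes currently online


def workers_needed(state: ClusterState, slots_per_worker: int, max_workers: int) -> int:
    """Workers required to give every queued or running job a slot, capped at max_workers."""
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")
    total_jobs = state.queued_jobs + state.running_jobs
    needed = math.ceil(total_jobs / slots_per_worker)
    return min(needed, max_workers)


def scaling_decision(state: ClusterState, slots_per_worker: int = 4, max_workers: int = 8) -> int:
    """Positive result: workers to start; negative result: idle workers that could be stopped."""
    return workers_needed(state, slots_per_worker, max_workers) - state.workers


if __name__ == "__main__":
    # 10 queued + 2 running jobs at 4 slots per worker need 3 workers; 1 is online.
    print(scaling_decision(ClusterState(queued_jobs=10, running_jobs=2, workers=1)))
```

With the defaults, 10 queued and 2 running jobs need three workers, so with one worker online the function returns 2. A real policy would likely also wait for a cool-down period before removing nodes, so that workers are not stopped and restarted while short jobs come and go.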
How is Galaxy used at the NIOO?
- Analyzing high-throughput community sequencing data with QIIME:
  - Denoising (CPU-intensive)
  - OTU and representative set picking using uclust, cdhit, mothur, BLAST, or other tools
  - Taxonomy assignment with BLAST or the RDP classifier (CPU-intensive)
  - Sequence alignment with PyNAST, muscle, infernal, or other tools (CPU-intensive)
  - and more!

Thanks!

Thanks to the Galaxy Cloud Team

Questions?

Extra slides

Limitations of OpenNebula
- No way to create instances with user data (available in the production cloud?)
- No support for growing a qcow filesystem
- It would be great to access the cloud through the ON (OpenNebula) API from outside
- Cloned instances do not have a working network

More info at
- My notes: https://www.cloud.sara.nl/projects/galaxy/wiki
- Galaxy Cloud on Amazon: http://usegalaxy.org/cloud
- Cloudman scripts: https://bitbucket.org/galaxy/cloudman/
- Install tools: https://bitbucket.org/afgane/mi-deployment
- Bio-Linux repository: http://nebc.nerc.ac.uk/tools/bio-linux/bio-linux-6.0
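The QIIME steps marked CPU-intensive (denoising, taxonomy assignment, sequence alignment) are the ones that benefit most from extra worker nodes. A common way to use several nodes is to split the input sequences into roughly equal chunks, run one job per chunk, and merge the results afterwards. The sketch below shows only the splitting step on an in-memory FASTA string; the function names are hypothetical and this is not QIIME's own implementation.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    header: Optional[str] = None
    seq_lines: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header = line[1:]
            seq_lines = []
        elif header is not None:  # ignore sequence lines before the first header
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_records(records: List[Record], n_jobs: int) -> List[List[Record]]:
    """Distribute records over n_jobs chunks whose sizes differ by at most one."""
    if n_jobs <= 0:
        raise ValueError("n_jobs must be positive")
    base, extra = divmod(len(records), n_jobs)
    chunks: List[List[Record]] = []
    start = 0
    for i in range(n_jobs):
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    # Drop empty chunks when there are fewer records than jobs.
    return [chunk for chunk in chunks if chunk]


def to_fasta(records: List[Record]) -> str:
    """Serialize records back to FASTA text, one sequence line per record."""
    return "".join(f">{h}\n{s}\n" for h, s in records)


if __name__ == "__main__":
    demo = ">s1\nACGT\n>s2\nGGCC\nAA\n>s3\nTTTT\n>s4\nCAGT\n>s5\nAACC\n"
    for i, chunk in enumerate(split_records(parse_fasta(demo), 2)):
        print(f"job {i}: {[h for h, _ in chunk]}")
```

With two jobs, the five demo records are split 3/2, so the script prints `job 0: ['s1', 's2', 's3']` followed by `job 1: ['s4', 's5']`. Each chunk could then be written with `to_fasta` and submitted as a separate job to the queue.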
Launch Cloudman Console

Master node is online

Add extra worker nodes

New instances are pending

New instances are pending (2)

New instances are running

New instances are online

Galaxy is accessible
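The screenshot sequence above shows new worker nodes moving from pending to running to online. As a hypothetical sketch, that progression can be modeled as a small state machine. The state names are taken from the slide titles; the strictly linear transition table is an assumption, since real instances can also fail or be terminated.

```python
from enum import Enum
from typing import Iterable


class WorkerState(Enum):
    """States a new worker passes through, named after the screenshot slides."""
    PENDING = "pending"  # instance requested, not yet booted
    RUNNING = "running"  # instance booted, still being configured
    ONLINE = "online"    # configured and part of the cluster


# Assumed linear forward transitions; failures and shutdown are not modeled.
NEXT_STATE = {
    WorkerState.PENDING: WorkerState.RUNNING,
    WorkerState.RUNNING: WorkerState.ONLINE,
}


def advance(state: WorkerState) -> WorkerState:
    """Move a worker one step forward; an online worker stays online."""
    return NEXT_STATE.get(state, state)


def cluster_ready(states: Iterable[WorkerState]) -> bool:
    """True once every worker has reached ONLINE."""
    return all(s is WorkerState.ONLINE for s in states)


if __name__ == "__main__":
    workers = [WorkerState.PENDING, WorkerState.PENDING]
    while not cluster_ready(workers):
        workers = [advance(s) for s in workers]
        print([s.value for s in workers])
```

Starting from two pending workers, the loop prints the running state and then the online state for both, after which `cluster_ready` returns True.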