LARGE SCALE DEPLOYMENT OF DAP AND DTS
Rob Kooper, Jay Alameda, Volodymyr Kindratenko

The Need for Scaling
• How can we scale?
• How can the DAP architecture scale?
• How can the DTS architecture scale?
• What options do we have to scale?
• Amazon solution for scaling
• XSEDE solution for scaling
• Cloud solution for scaling

Finite Resources
• CPU
• Memory
• Disk
• Network

Scalability
• A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system.

Scaling Up and Out
• Scale UP (vertically)
  – Adding resources to a single system
  – "Speed" / performance
  – Moore's Law
• Scale OUT (horizontally)
  – Cloud
  – Adding nodes to the system
  – Nodes can be commodity hardware (vs. HPC)
  – Increases software complexity
  – Increases management complexity

Elasticity
• Need the ability to grow/shrink on demand
• Add or remove resources based on workload
• Keep requirements small
• If many people use one service, bring up more instances of that service
• Don't bring up services that people don't use

Software Server Architecture
[Diagram: data in an unknown format flows through Polyglot into a pool of Software Servers and comes out as usable data. Each Software Server wraps a conversion tool such as ImageMagick, OpenOffice, ffmpeg, or 3D Studio, and is accessed over HTTP (HTML/JSON).]

Medici 2.0 Architecture
[Diagram: a load balancer in front of multiple frontend webapps, connected through an event bus (RabbitMQ) to extractors written in Java and Python. External services: Elasticsearch and MongoDB clusters, plus a shared filesystem.]

How to Grow?
• More servers at ISDA
  – Funding is in Brown Dog
  – Not sustainable
• Commercial clouds
  – Amazon, …
• XSEDE
  – NSF-funded HPC computation
• NCSA
  – Cloud infrastructure

AWS Web Application Reference Architecture
[Diagram]

AWS Batch Processing Reference Architecture
[Diagram]

Pricing
• Small machine (1 CPU, 2 GB)
  – Linux: $0.026 per hour
  – Windows: $0.036 per hour
• A server costs approx. $10,000 and can hold 20 VMs
• Average lifespan 5 years (~$500 per VM)
• Equals around 2 years of Amazon time
• But Amazon is cheaper if we only need it 8 hours per day (7 hours/day in the case of Windows)!

XSEDE Resources
Jay Alameda
National Center for Supercomputing Applications
23 July 2014

What is XSEDE?
• Integrating service for a wide variety of High Performance Computing (HPC) and Visualization and Data Analysis (RDAV) resources
  – Front-line support
  – Uniform documentation
  – Extended collaborative support
  – Training, education, and outreach services
  – Allocations
• www.xsede.org

Variety of HPC and RDAV Resources
• Dynamic list at https://www.xsede.org/web/guest/resources/overview
  – Overview and expiration dates for each resource
  – Traditional clusters
  – Visualization and data analysis resources
  – Storage resources
  – High-throughput resources
  – Testbeds
  – Services

Potentially Interesting Resources for Brown Dog
• Testbed resource "FutureGrid"
  – Production through 9/30/2014
  – Partitioned into:
    • HPC
    • Infrastructure as a Service (IaaS): Nimbus, OpenStack, Eucalyptus
    • Dedicated
  – Layer Platform as a Service (PaaS) (e.g., MapReduce, Hadoop) on top of these partitions

Potentially Interesting Resources for Brown Dog - 2
• Service resource "Quarry"
  – Web service hosting environment
  – Resource end date not specified
  – Available for XRAC allocations with a web-service component
  – Storage: either NFS home directories or Lustre-based storage
  – OpenVZ provides virtual hosting of RPM-based Linux distributions
  – Persistent virtual machines

New XSEDE Resource: Comet
• Long-tail science system hosted at the San Diego Supercomputer Center
• Builds on experience with SDSC Gordon (flash memory, persistent storage nodes) and SDSC Trestles (long-tail science)
  – 99% of jobs in 2012 used < 2048 cores
  – These jobs consumed half of the total core hours across NSF resources.
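Stepping back to the AWS pricing slide for a moment, its break-even claim can be checked with a short calculation. This is a sketch using only the 2014 figures from that slide (not current prices); the function names are illustrative, not part of any AWS API.

```python
# Break-even between a local server and AWS on-demand VMs,
# using the 2014 figures from the pricing slide.

SERVER_COST = 10_000.0   # one physical server, USD
VMS_PER_SERVER = 20      # VMs it can host
LIFESPAN_YEARS = 5       # average server lifespan

def cost_per_local_vm():
    """Amortized cost of one VM slot over the server's lifetime (~$500)."""
    return SERVER_COST / VMS_PER_SERVER

def breakeven_hours_per_day(aws_rate_per_hour):
    """Daily usage at which AWS costs the same as a local VM slot
    over the same 5-year span."""
    cost_of_one_daily_hour = aws_rate_per_hour * 365 * LIFESPAN_YEARS
    return cost_per_local_vm() / cost_of_one_daily_hour

print(breakeven_hours_per_day(0.026))  # Linux:   ~10.5 h/day
print(breakeven_hours_per_day(0.036))  # Windows: ~7.6 h/day
```

Below roughly 10.5 hours/day of use (7.6 for Windows), AWS comes out cheaper than the amortized local VM, which is consistent with the slide's "8 hours per day" (Linux) and "7 hours/day" (Windows) claims.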
Comet
• Partially designed to pick up FutureGrid use (virtual clusters)
• Gateway hosting nodes and a virtual machine repository
• Optimized for jobs within a rack
• Continues access to flash memory (Gordon)
• Capacity computing: computing for the 99% of XSEDE jobs

Comet Virtualization
• Leverages experience and expertise from FutureGrid
• Virtual machine jobs are scheduled like batch jobs
• Flexible software environments for new communities and applications
• Virtual machine repository
• Virtual HPC clusters (multi-(whole)-node) with minimum latency and overhead penalty

XSEDE and Brown Dog
• Premise: Brown Dog will become an integral part of a researcher's workflow
• Question: Should Brown Dog evolve into an XSEDE resource provider, to provide data services for XSEDE?

ISL Resources
Volodymyr Kindratenko
Innovative Systems Laboratory
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

Hadoop
[Diagram: a management server with user portal plus a secondary management server, six HDFS/MapReduce nodes, a 1 Gb management switch, and a QDR InfiniBand switch.]

OpenStack Cloud
[Diagram: a controller running keystone, glance, cinder, nova, horizon, and heat; a neutron network node; a cinder-volume storage node; 32 compute nodes running nova-compute and Open vSwitch; a 1 Gb management switch and a QDR InfiniBand switch.]

Virtual Lab for Advanced Design
[Diagram: storage nodes, base nodes, a high-memory node, and a management node, connected by a 1 Gb management switch, a 10 Gb core switch, and a 10 Gb SDN switch.]
http://www.ncsa.illinois.edu/about/org/isl

High-Memory Node
• Dell PowerEdge R920
• CPU: 4x Intel Xeon E7-4860v2 @ 2.6 GHz, sockets connected via QPI
• RAM: 3 TB
• Storage:
  – 2x 300 GB 10,000 RPM SAS 6 Gbps HDD
  – 4x 800 GB SAS read-intensive MLC 12 Gbps SSD
  – 6x 1 TB 7,200 RPM near-line SAS 6 Gbps HDD
• Interconnect:
  – 6x 1 Gbps Ethernet
  – 2x 10 Gbps Ethernet

Other Systems
• GPU server
  – 8 NVIDIA C2050 GPUs
• Intel Xeon Phi server
  – 2 Xeon Phi 7120 (Knights Corner) application accelerators
• HPC cluster
  – 8 nodes
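The elasticity policy described in the first part of the deck (grow or shrink extractor instances based on workload, keep one warm instance, don't run services nobody uses) can be sketched as a simple sizing rule. Everything here is a hypothetical illustration: the function name, the per-worker throughput, and the bounds are made up for the example, not part of Medici or RabbitMQ.

```python
import math

def workers_needed(queue_depth, per_worker_rate, min_workers=1, max_workers=32):
    """Hypothetical autoscaling rule: run just enough extractor workers to
    drain the current event-bus queue within one scaling interval.

    queue_depth     -- messages currently waiting in the queue
    per_worker_rate -- messages one worker processes per interval (assumed)
    The result is clamped to [min_workers, max_workers] so an idle system
    keeps one warm worker and a burst cannot exhaust the cluster.
    """
    needed = math.ceil(queue_depth / per_worker_rate)
    return max(min_workers, min(max_workers, needed))

print(workers_needed(0, 10))     # idle: keep 1 warm worker
print(workers_needed(250, 10))   # backlog of 250: scale out to 25
print(workers_needed(5000, 10))  # large burst: capped at 32
```

A controller would poll the queue depth periodically and start or stop extractor processes to match the returned count; the same rule applies per queue, so rarely used converters stay at the minimum.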