Application Scalability and High Productivity Computing

Nicholas J. Wright, John Shalf, Harvey Wasserman
Advanced Technologies Group, NERSC/LBNL

NERSC - National Energy Research Scientific Computing Center
• Mission: accelerate the pace of scientific discovery by providing high performance computing, information, data, and communications services for all DOE Office of Science (SC) research.
• The production computing facility for DOE SC.
• Part of the Berkeley Lab Computing Sciences Directorate
  – Computational Research Division (CRD), ESnet
  – NERSC

NERSC is the Primary Computing Center for DOE Office of Science
• NERSC serves a large population: over 3,000 users, 400 projects, 500 codes
• NERSC serves the DOE SC mission
  – Allocated by DOE program managers
  – Not limited to the largest-scale jobs
  – Not open to non-DOE applications
• Strategy: science first
  – Requirements workshops by office
  – Procurements based on science codes
  – Partnerships with vendors to meet science requirements
[Pie chart: allocations by science area - Physics, Chemistry, Fusion, Materials, Math + CS, Climate, Lattice Gauge, Astrophysics, Combustion, Life Sciences, Other]

NERSC Systems for Science
Large-scale computing systems
• Franklin (NERSC-5): Cray XT4
  – 9,532 compute nodes; 38,128 cores
  – ~25 Tflop/s on applications; 356 Tflop/s peak
• Hopper (NERSC-6): Cray XE6
  – Phase 1: Cray XT5, 668 nodes, 5,344 cores
  – Phase 2: 1.25 Pflop/s peak (late-2010 delivery)
Clusters (140 Tflop/s total)
• Carver: IBM iDataPlex cluster
• PDSF (HEP/NP): ~1K-core throughput cluster
• Magellan: cloud testbed, IBM iDataPlex cluster
• GenePool (JGI): ~5K-core throughput cluster
NERSC Global Filesystem (NGF)
• Uses IBM's GPFS
• 1.5 PB capacity; 5.5 GB/s of bandwidth
HPSS archival storage
• 40 PB capacity; 4 tape libraries; 150 TB disk cache
Analytics
• Euclid (512 GB shared memory)
• Dirac GPU testbed (48 nodes)

NERSC Roadmap
[Chart: peak teraflop/s vs. year, 2006-2020 - Franklin (N5), 19 TF sustained / 101 TF peak; Franklin (N5) + QC, 36 TF sustained / 352 TF peak; Hopper (N6), >1 PF peak; NERSC-7, 10 PF peak; NERSC-8, 100 PF peak; NERSC-9, 1 EF peak]
• How do we ensure that users' performance follows this trend and their productivity is unaffected?
• Users expect a 10x improvement in capability every 3-4 years

Hardware Trends: The Multicore Era
• Moore's Law continues unabated
• Power constraints mean that core counts, not clock speeds, will double every 18 months
• Memory capacity is not doubling at the same rate – GB/core will decrease

Power is the Leading Design Constraint
[Figure courtesy of Kunle Olukotun, Lance Hammond, Herb Sutter, and Burton Smith]

… and the power costs will still be staggering
• From Peter Kogge, DARPA Exascale Study
• Roughly $1M per megawatt per year, and that is with cheap power!
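To put that rule of thumb in perspective, here is a quick back-of-envelope calculation. The 20 MW machine size is an illustrative assumption (a commonly cited exascale power target), not a number from this talk:

\[
20\ \mathrm{MW} \;\times\; \frac{\$1\mathrm{M}}{\mathrm{MW}\cdot\mathrm{yr}} \;\approx\; \$20\mathrm{M\ per\ year}
\]

At that scale the electricity bill alone becomes a first-order item in the facility budget, which is why power sits at the center of the design trade-offs that follow.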
Changing Notion of "System Balance"
• If you pay 5% more to double the FPUs and get a 10% improvement, it's a win despite lowering your % of peak performance (performance per unit cost rises by 1.10/1.05, about 5%)
• If you pay 2x more for memory bandwidth (in power or cost) and get 35% more performance, it's a net loss even though % of peak looks better (performance per unit cost falls to 1.35/2.0, about 0.68x)
• Real example: we could give up ALL of the flops to improve memory bandwidth by 20% on the 2018 system
• We have a fixed budget
  – Sustained-to-peak flop rate is the wrong metric if flops are cheap
  – Balance means balancing your checkbook and balancing your power budget
  – Making the right trade-offs requires application co-design

Summary: Technology Trends
• Number of cores – flops will be "free"
• Memory capacity per core
• Memory bandwidth per core
• Network bandwidth per core
• I/O bandwidth
  – the per-core resources are all falling behind the growth in cores and flops

Navigating Technology Phase Transitions
[Chart: the NERSC roadmap (peak teraflop/s vs. year, 2006-2020) annotated with programming-model eras - COTS/MPP + MPI through Franklin (N5); COTS/MPP + MPI (+ OpenMP) for Hopper (N6, >1 PF peak) and NERSC-7 (10 PF peak); GPU CUDA/OpenCL or manycore (BG/Q, R) for NERSC-8 (100 PF peak); exascale + ??? for NERSC-9 (1 EF peak)]

Application Scalability
• How can a user continue to be productive in the face of these disruptive technology trends?

Source of Workload Information
• Documents
  – 2005 DOE Greenbook
  – 2006-2010 NERSC Plan
  – LCF studies and reports
  – Workshop reports
  – 2008 NERSC assessment
• Allocations analysis
• User discussion

New Model for Collecting Requirements
• Joint DOE Program Office / NERSC workshops
• Modeled after the ESnet method
  – Two workshops per year
  – Describe science-based needs over 3-5 years
• Case-study narratives
• First workshop: BER, May 7-8

Numerical Methods at NERSC
(Caveat: survey data from ERCAP requests)
[Bar chart: "Methods at NERSC" - percentage of the 400 total projects using each method, y-axis 0-35%]

Application Trends
• Weak scaling
  – Time to solution is often a non-linear function of problem size
• Strong scaling
  – Latency or the serial fraction will get you in the end (Amdahl's law: a serial fraction s caps the speedup at 1/s)
• Adding features to models – "new" weak scaling
[Schematic plots: performance vs. processors for each scaling regime]

Develop Best Practices in Multicore Programming
NERSC/Cray programming-models "Center of Excellence" combines:
• LBNL strength in languages, tuning, performance analysis
• Cray strength in languages, compilers, benchmarking
Goals:
• Immediate: training material for Hopper users on hybrid OpenMP/MPI
• Long term: input into the exascale programming model
[Chart: fvCAM on 240 cores of Jaguar - run time (sec) and memory per node (GB) vs. cores per MPI process = OpenMP thread parallelism (1, 2, 3, 6, 12)]

Develop Best Practices in Multicore Programming
Conclusions so far:
• Mixed OpenMP/MPI saves significant memory
• The running-time impact varies with the application
• One MPI process per socket is often a good configuration
Run on Hopper next:
• 12 vs. 6 cores per socket
• Gemini vs. SeaStar interconnect
[Chart: PARATEC on 768 cores of Jaguar - run time (sec) and memory per node (GB) vs. cores per MPI process = OpenMP thread parallelism (1, 2, 3, 6, 12)]
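The measurements above point to a specific process layout: one MPI process per socket, with OpenMP threads filling that socket's cores. The sketch below is a minimal illustration of that pattern, not code from the NERSC/Cray Center of Excellence; the problem size, build line, and launch flags are assumptions for illustration only.

```c
/*
 * Minimal hybrid MPI + OpenMP sketch (illustrative only, not code from the
 * NERSC/Cray study): one MPI process per socket, OpenMP threads on the
 * cores within that socket.
 *
 * Example build/run on a Cray XT-class system (flags are assumptions):
 *   cc -fopenmp hybrid.c -o hybrid
 *   aprun -n <MPI ranks> -d <threads per rank> ./hybrid
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* FUNNELED: only the master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* One large array per MPI rank (i.e., per socket), shared by all
     * OpenMP threads on that socket. */
    const long n = 1L << 24;              /* illustrative problem size */
    double *x = malloc(n * sizeof(double));
    double local_sum = 0.0, global_sum = 0.0;

    /* Threads within the socket split the loop; no per-core copies of x. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < n; i++) {
        x[i] = (double)rank + 1.0;
        local_sum += x[i];
    }

    /* Message passing happens only between sockets/nodes. */
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0,
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("MPI ranks: %d, OpenMP threads per rank: %d, sum = %e\n",
               nranks, omp_get_max_threads(), global_sum);

    free(x);
    MPI_Finalize();
    return 0;
}
```

The intended benefit is that data which would otherwise be replicated in every per-core MPI process (large arrays, MPI buffers) is held once per socket and shared by the threads, which is the kind of saving reflected in the memory curves above.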
Co-Design
• Eating our own dog food

Inserting Scientific Apps into the Hardware Development Process
• Research Accelerator for Multi-Processors (RAMP)
  – Simulate hardware before it is built!
  – Break the slow feedback loop for system designs
  – Enables tightly coupled hardware/software/science co-design (not possible with the conventional approach)

Summary
• Disruptive technology changes are coming
• By exploring
  – new programming models (and revisiting old ones)
  – hardware/software co-design
• we hope to ensure that scientists' productivity remains high!