Slide 1: IFS performance on the new IBM Power6 systems at ECMWF
Deborah Salmond
13th Workshop on the use of HPC in Meteorology, 3-7 Nov 2008

Slide 2: Plan for Talk
• HPC systems at ECMWF
  – 2 x IBM p5 575+ clusters (Power5+)
  – New IBM p6 575 cluster (Power6)
• Preliminary performance measurements for IFS Cycle 35r1 – ECMWF's operational weather forecasting model – on Power6 compared with Power5+
• The Power6 system is available to all ECMWF internal users for research experiments

Slide 3: Power5+ and Power6 clusters

                        Power5+ (hpce & hpcf)      Power6 (c1a & c1b)
  System                IBM p5 575+                IBM p6 575
  Processor             Power5+ 1.9 GHz + SMT      Power6 4.7 GHz + SMT
  Peak per core         7.6 Gflops                 18.8 Gflops
  Cores per cluster     2480                       7936
  Cores per node        16                         32
  Interconnect          Federation switch          QLogic Silverstorm InfiniBand switch

Slide 4: Power5+ compared with Power6

                              Power5+          Power6           Increase
  Clock                       1.9 GHz          4.7 GHz          2.5x
  Cores per cluster           2480             7936             3.2x
  Compute nodes per cluster   155              248              1.6x
  Cores per node              16 (32 SMT)      32 (64 SMT)      2x
  Memory per node             32 Gbytes        64 Gbytes        2x (1x per core)
  L2 cache per core           0.9 Mbytes       4 Mbytes         4x

Slide 5: ECMWF's next operational resolution: T1279 L91
• Horizontal grid-spacing = ~16 km
• Number of horizontal gridpoints = 2,140,704
• Timestep = 450 secs
• Flops for 10-day forecast = 8 x 10^15
[Map figure]

Slide 6: ECMWF's next operational resolution for EPS: T399
• Horizontal grid-spacing = ~50 km
• Number of horizontal gridpoints = 213,988
• Timestep = 1800 secs
• Flops for 51-member EPS = 5 x 10^15
[Map figure]

Slide 7: IFS T1279 L91 forecast (35r1) on P6 compared with P5+ – same number of cores

          Wall time (s)   Number of cores     Tflops   % of peak   Speed-up
  P5+     10332           640* (40 nodes)     0.77     15.9        1
  P6      6610            640  (20 nodes)     1.21      9.9        1.56

  * 160 MPI tasks and 8 OpenMP threads – using SMT

Slide 8: IFS T1279 L91 forecast (35r1) on P6 compared with P5+ – 3.2 x number of cores

          Wall time (s)   Number of cores     Tflops   % of peak   Speed-up
  P5+     10332           640  (40 nodes)     0.77     15.9        1
  P6      2541*           2048 (64 nodes)     3.15      8.2        4.06

  * Some performance problems, e.g. load imbalance from 'jitter'
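The sustained Tflops, percentage-of-peak and speed-up figures in the two tables above follow directly from the flop count on Slide 5 (8 x 10^15 flops for the 10-day T1279 L91 forecast) and the per-core peaks on Slide 3. The short Python sketch below is only a back-of-the-envelope check (the run labels and variable names are illustrative, and the rounding differs slightly from the slides, e.g. it gives ~10.1% rather than 9.9% of peak for Power6 on 640 cores):

```python
# Back-of-the-envelope check of the Tflops / %-of-peak / speed-up numbers
# above, assuming the 8e15 flops quoted on Slide 5 for a 10-day T1279 L91
# forecast and the per-core peaks of 7.6 / 18.8 Gflops from Slide 3.
FORECAST_FLOPS = 8.0e15

runs = [
    # (label, wall-clock time in seconds, physical cores, peak Gflops per core)
    ("Power5+  640 cores", 10332.0,  640,  7.6),
    ("Power6   640 cores",  6610.0,  640, 18.8),
    ("Power6  2048 cores",  2541.0, 2048, 18.8),
]

baseline_wall = runs[0][1]          # Power5+ on 640 cores is the reference
for label, wall, cores, peak_per_core in runs:
    sustained_tflops = FORECAST_FLOPS / wall / 1.0e12
    peak_tflops = cores * peak_per_core / 1000.0
    print(f"{label}: {sustained_tflops:.2f} Tflops, "
          f"{100.0 * sustained_tflops / peak_tflops:.1f}% of peak, "
          f"speed-up {baseline_wall / wall:.2f}x")
```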
Slide 9: Variable time-step times – now mostly fixed
[Figure: time per timestep (seconds, 0-3.5) against timestep number (~350-700) for Power5 on 1280 cores and Power6 on 4096 cores]

Slide 10: IFS T1279 10-day forecast on Power6
[Figure: sustained Tflops against % of cluster for P5 and P6; points are labelled with the number of physical cores – 320, 640, 960, 1280 (P5) and 1024, 2048, 3072, 4096 (P6) – and the operational configuration is marked]
• Run with SMT and 8 OpenMP threads

Slide 11: IFS T1279 10-day forecast on Power6 – scalability to whole cluster
[Figure: as Slide 10, with the P6 curve extended to 7040 physical cores]

Slide 12: IFS T1279 10-day forecast on Power6 – speed-up curve
[Figure: speed-up against % of cluster for P6 compared with the ideal curve, from 1024 up to 7040 physical cores; 7040 cores = 14080 user threads]

Slide 13: DrHook on Power5+ and Power6 for T1279 forecast

Power5+ (times in seconds, MFLOPS per logical core):

  Routine              All PEs (s)   % Tot   Time: Ave /  Min /  Max   MFLOPS:   Ave /    Min /    Max
  CUADJTQ                27398.0     6.64        85.6 /  73.7 /  97.7           645.4 /  518.0 /  733.0
  CLOUDSC                27721.8     6.72        86.6 /  80.5 /  94.6           362.3 /  352.0 /  374.0
  MXMAOP                 26762.5     6.49        83.6 /  76.8 /  87.6          2606.6 / 2342.0 / 2941.0
  CLOUDVAR                8233.4     2.00        25.7 /   7.4 /  50.8           197.5 /   71.0 /  232.0
  >MPL-TRLTOG_COMMS       7791.7     1.89        24.3 /   2.5 /  33.3             0.0 /    0.0 /    0.0
  VDFEXCU                 9391.3     2.28        29.3 /  26.5 /  32.9           290.6 /  266.0 /  311.0
  VERINT                  9606.8     2.33        30.0 /  28.1 /  32.0          2318.5 / 2268.0 / 2365.0
  LAITQM                  8075.2     1.96        25.2 /  21.6 /  32.0           989.4 /  852.0 / 1113.0
  VDFMAIN                 9565.0     2.32        29.9 /  27.8 /  31.2           473.1 /  457.0 /  506.0
  SRTM_SPCVRT_MCICA       9265.2     2.25        29.0 /  27.9 /  30.0           278.6 /  267.0 /  292.0
  >MPL-TRMTOL_COMMS       9134.5     2.22        28.5 /  27.2 /  29.4             0.0 /    0.0 /    0.0

Power6 (times in seconds, MFLOPS per logical core):

  Routine              % Tot   Time: Ave /  Min /  Max   MFLOPS:   Ave /    Min /    Max
  CLOUDSC               7.23       54.7 /  51.4 /  59.4           573.5 /  551.2 /  595.6
  MXMAOP                5.42       41.0 /  35.1 /  44.6          5314.9 / 5124.3 / 5776.4
  CUADJTQ               3.95       29.9 /  26.4 /  33.5          1847.7 / 1446.0 / 2137.7
  CLOUDVAR              1.91       14.5 /   3.6 /  29.3           350.0 /  145.9 /  402.2
  >MPL-TRMTOL_COMMS     2.64       19.9 /  18.1 /  21.2             0.0 /    0.0 /    0.0
  SRTM_SPCVRT_MCICA     2.34       17.7 /  17.2 /  18.6           456.4 /  433.0 /  470.9
  VDFMAIN               2.31       17.5 /  16.9 /  18.5           808.3 /  751.7 /  853.3
  >MPL-TRLTOG_COMMS     1.79       13.5 /   3.8 /  17.8             0.0 /    0.0 /    0.0
  LAITQM                2.25       17.0 /  16.7 /  17.5          1466.6 / 1101.9 / 2035.2

Slide 14: Speed-up per core – Computation (Power5+ to Power6)
• Compute speed-up – clock-frequency ratio = 2.47x

  Routine    Description           Gflops/core on Power6   Speed-up P5 -> P6
  CUADJTQ    Math functions        3.6                     1.9
  CLOUDSC    IF tests              1.1                     1.4
  MXMAOP     DGEMM call            10.6                    2.2
  SRTM       IF tests              0.9                     1.5
  LAITQM     Indirect addressing   2.9                     1.5

Slide 15: Speed-up per core – Communications (Power5+ to Power6)
• Communications speed-up
  – 8 InfiniBand links per node on Power6
  – 2 Federation links per node on Power5+
  – Increase in aggregate bandwidth per core = 2x

  Routine    Description                             Speed-up P5 -> P6
  TRLTOG     Transposition, Fourier to grid-point    1.89
  TRMTOL     Transposition, spectral to Fourier      1.52
  SLCOMM1    Semi-Lagrangian halo                    1.44

Slide 16: Power5 compared with Power6
• Very similar for users ☺
• Re-compile with -qarch=pwr6
• No 'out-of-order execution' on Power6 – so SMT is more advantageous
• Some constructs are relatively slower on Power6
  – Floating-point compare & branch
  – Store followed by load on the same address
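The per-routine profiles on Slide 13 come from DrHook, the instrumentation layer built into IFS, which records wall-clock time (and the MFLOPS figures shown above) for every instrumented routine. The fragment below is only a minimal Python sketch of that idea and is not the DrHook interface: the `hook` decorator, `report` function and `cloudsc_like_kernel` stand-in are hypothetical names, and the min/max here are taken over calls rather than over MPI tasks as in the tables above.

```python
import time
from collections import defaultdict
from functools import wraps

# Minimal sketch of per-routine wall-clock instrumentation in the spirit of
# DrHook (not the real DrHook API): record the duration of every call and
# report total / average / min / max per routine.
_timings = defaultdict(list)

def hook(func):
    """Wrap a routine so that each call's elapsed time is accumulated."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            _timings[func.__name__].append(time.perf_counter() - t0)
    return wrapper

def report():
    """Print routines sorted by total time, most expensive first."""
    for name, times in sorted(_timings.items(), key=lambda kv: -sum(kv[1])):
        total = sum(times)
        print(f"{name:20s} total {total:8.3f}s  ave {total / len(times):7.3f}s  "
              f"min {min(times):7.3f}s  max {max(times):7.3f}s")

@hook
def cloudsc_like_kernel(n=200_000):
    # Stand-in for an expensive physics routine such as CLOUDSC.
    return sum(i * 1.000001 for i in range(n))

if __name__ == "__main__":
    for _ in range(10):
        cloudsc_like_kernel()
    report()
```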
Slide 17: HPCF at ECMWF 1978-2011
[Chart: sustained performance of ECMWF's high-performance computing facilities over time, from 40 Mflops on 1 CPU (CRAY1, 1978) to the multi-Tflops range in 2008; the right-hand end is annotated 4 Tflops and ~20 Tflops]
• 1978: CRAY1 – vector
• 1984-1992: CRAY XMP-2, XMP-4, YMP-8, C90-16 – vector, shared-memory parallel
• 1996-2000: Fujitsu VPP700, VPP5000 – vector, MPI
• 2002-2008: IBM Power4, Power4+, Power5+ (2 x 2480 cores), Power6 (2 x 7936 cores) – scalar, MPI + OpenMP -> SMT

Slide 18: History of IFS scalability
[Chart: sustained Gflop/s (log scale, 1 to 10000) against number of cores (log scale, 100 to 10000)]
• CRAY T3E-1200 (1998): RAPS-4, T213 L31
• IBM p690+ (2004): RAPS-8, T799 L91
• IBM p575+ (2006): RAPS-9, T799 L91
• IBM Power6 p575 (2008): RAPS-10+, T1279 L91