R. Alexander/D. Ulmer, User Assembly, May 22nd 2006

- Offer CSCS users true capability computing resources that are able to scale to large computational experiments
- The resources form a portfolio of systems with complementary characteristics to support different kinds of workloads and prerequisites
- At the same time, the portfolio must be as small as possible to minimize operational costs and to maximize science output
- The different compute resources must be integrated: network, storage, security

- Storage Farm
- Internal Network Upgrade
- Horizon Status and Plans
- Terrane Plans and Configuration
- Zenith
- Summary

- Fibre Channel as primary technology
- Ease of expanding and adjusting individual systems' storage requirements
- Leveraging Terrane hardware
- 3 x 32-port 4 Gbit FC switches
- Switches will be trunked together

- 2 Cisco 6509 blade-based switches
- Leverage Terrane equipment
- Create internal 10 Gbit trunks
  - Between the 6509s
  - To firewall & external net when needed
- 10 Gbit among all primary systems (supercomputers, frontends, HSM)

- Installed in summer 2005, in production since January 2006
- Consisting of three Cray XT3 massively parallel systems:
  - Palu: production system with 1'100 compute nodes with 2.6 GHz single-core AMD Opterons, 2 GB memory per CPU
  - Gele: porting and testing system with 84 compute nodes with the same characteristics as on Palu
  - Fred: internal CSCS HW and SW test system with 32 compute nodes as above

• 4 login nodes with DNS rotary name - palu.cscs.ch
• 2 yod/mom nodes
• Scratch for general users - approximately 9 TB
  • 4 Lustre servers (1 MDS / 15 OSTs)
• Scratch for PSI users - approximately 6.6 TB
  • 3 Lustre servers (1 MDS / 11 OSTs)
• Each OST = approximately 600 GB

• Current availability > 90%
• Current usage:
  • > 90% node utilization
  • Jobs using 64-512 nodes are "typical"
  • 768 nodes max job size now
  • Oversubscribed
• Will be upgraded with 600 more processors (6 racks) in summer
• Two groups have told us they have done science they could not do before

• Cray is working on each of these bugs; they are extremely difficult to diagnose
• Job start failures - one of our highest-priority bugs
  • Currently nodes stay down until the next machine reboot
  • Single-node reboot available with release 1.4
• DataDirect Networks (DDN) disc controllers have been unreliable
  • Cray & DDN are replacing all controllers with newly manufactured ones
  • CSCS will receive 1 controller pair from Engenio with the Cray extension during summer

Firm:
• Single compute node reboot
• Dual-core service nodes with the upgrade of the Palu system
Potentially:
• Dual-core CPUs on Gele for performance testing
• Test Linux on compute nodes - Fred

- Targeted at capacity/capability problems mostly run within the node (see the sketch after this list)
- To be installed in summer 2006
- Multiprocessor nodes with SMP capability, 4.5 Tflops aggregate
- Full-scale operating system on all node types
- Support for commercial HPC codes
- Solution after public call for tender: IBM Power-5 (!) Infiniband cluster
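To illustrate the "problems mostly run within the node" workload that Terrane targets, the following is a minimal hybrid MPI + OpenMP sketch: one MPI rank per SMP node, with OpenMP threads filling the node. It is an illustration only; the thread counts, the toy workload and the launch details are assumptions, not part of the tender or of these slides.

/* Minimal hybrid MPI + OpenMP sketch (illustrative only): one MPI rank
 * per SMP node, OpenMP threads inside the node. Thread counts and the
 * toy workload are assumptions, not part of the Terrane configuration. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Request thread support suitable for OpenMP regions between MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local_sum = 0.0;

    /* Intra-node parallelism: OpenMP threads share the node's memory. */
    #pragma omp parallel reduction(+:local_sum)
    {
        int tid = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        /* Each thread works on its share of a node-local problem slice. */
        for (long i = tid; i < 1000000; i += nthreads)
            local_sum += 1.0 / (1.0 + (double)i);
    }

    /* Inter-node parallelism: combine the per-node results over MPI. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%f\n",
               nranks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}

Keeping one rank per node and spreading the work over OpenMP threads (with SMT providing additional hardware threads where it helps) keeps most of the parallelism inside the node's shared memory, which is the job profile described above.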
- 48 p575 compute nodes, each with 16 CPUs
- p550+ nodes: 8 I/O, 2 login and 3 auxiliary nodes, each with 4 CPUs
- 4X Infiniband (96-port Cisco Topspin switch)
- Gigabit Ethernet interconnect (2x Cisco 6509 blade-based switches)
- 44 TB external FC disc storage and 4 Gbit FC switches
- Water-cooled compute racks

• 48 P5 575 compute nodes
  • 16-way SMP with SMT = effectively 32-way
  • 47 with 32 GB main memory
  • 1 with 64 GB main memory
  • 1.5 GHz clock, 96 Gflops per node
  • 4X dual-port Infiniband adapter = 10 Gbit/s per port
(SMT = Simultaneous Multi-Threading)

• 6 P5+ 550 I/O nodes
  • 4-way SMP with SMT = effectively 8-way
  • 16 GB main memory
  • 1 remote I/O drawer each
  • 1.9 GHz clock
  • 4 have 6 x 2 Gbit FC cards
  • 2 have 2 x 4 Gbit dual-channel FC cards
  • 1 GX bus dual-port 4X Infiniband
  • 1 10 Gbit Ethernet card

• 2 P5+ 550 login nodes
  • 4-way SMP with SMT = effectively 8-way
  • 16 GB main memory
  • 1.9 GHz clock
  • 1 GX bus dual-port 4X Infiniband
  • 1 10 Gbit Ethernet card

• 3 P5+ 550 auxiliary nodes
  • 4-way SMP with SMT = effectively 8-way
  • 8 GB main memory
  • 1.9 GHz clock
  • 1 GX bus dual-port 4X Infiniband

• Linux on Power (SLES9) or AIX 5.3
• GPFS
• Full IBM HPC SW stack (compilers, libraries, LoadLeveler batch system, cluster management SW)
• 3 years of HW & SW maintenance, with on-site same-business-day response if called before noon

• Cisco Topspin 270 switch
  • 96 4X ports
• Great IP network
  • GPFS transport within Terrane
  • General IP traffic
• Test for MPI (a small bandwidth probe sketch follows the application overview below)
  • Linux and AIX as candidates

- Installation from July to September (very early users, CSCS migration task force)
- Early access from September onwards
- Available for LUP call 1/07
Planned upgrades:
- 12X Infiniband by 4Q06 (dual-striped ?, scaling to 128 CPUs)
- Extension with Power-4 trade-in by 3Q06 (8 16-(32)-way p575+ nodes with 32 GB RAM, 1 64-(128)-way p595+ node with 256 GB RAM)

Zenith was characterised by (announcement on the CSCS website):
- Shared-memory capacity
- Strong single-processor and node performance
- Ease of use
With Terrane we got:
- Shared memory from 32 to 64 GB (base system), resp. 256 GB (trade-in of SP4)
- Single-processor peak performance of up to 7.5 Gflops and node peak performance of 96 to 121 Gflops (base), resp. 484 Gflops (trade-in)
- The well-known IBM user environment and reliability
Base characteristics well covered

LM (weather forecast): shows very good scaling on the Cray XT3 and is used operationally by the code author, the German weather service, on IBM Power-5
ECHAM-5 (climate): is being ported by the code author to the Cray XT3 with extremely promising results (see HPCWire, December 05, and the later presentation by FB)
CP (Quantum ESPRESSO suite, molecular dynamics): is being used very successfully in production on the Cray XT3 and IBM SP4
Transit (CFD): CSCS will port the code to a scalar architecture
Gaussian (chemistry): requires big shared memory, which is available on IBM Power-5, on which the code is supported
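Returning to the "Test for MPI" item on the Infiniband slide above: a first sanity check of MPI over the 4X fabric could be a simple two-rank ping-pong bandwidth probe such as the sketch below. The message size, repetition count and output format are arbitrary choices for illustration, not a prescribed benchmark.

/* Minimal MPI ping-pong bandwidth probe (illustrative only): rank 0 and
 * rank 1 exchange a buffer repeatedly and report the achieved bandwidth.
 * Message size and repetition count are arbitrary assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_BYTES (4 * 1024 * 1024)   /* 4 MiB per message */
#define REPS      100

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "run with at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    char *buf = calloc(1, MSG_BYTES);
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, MSG_BYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }

    double elapsed = MPI_Wtime() - t0;
    if (rank == 0) {
        /* Two messages of MSG_BYTES per iteration travel over the link. */
        double gbytes = 2.0 * REPS * MSG_BYTES / 1e9;
        printf("%.2f GB in %.3f s -> %.2f GB/s\n", gbytes, elapsed, gbytes / elapsed);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Comparing the measured figure with the nominal 10 Gbit/s per 4X port (and later with the planned 12X upgrade) would show how much of the raw link rate MPI actually delivers under Linux and AIX.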
CSCS can drop the Zenith project because of the (unexpected) solution for Terrane:
- All basic requirements covered
- All CSCS applications supported
- With Terrane we got a scalable, big 5 Tflops SMP system at the price of a loosely-coupled cluster
Extremely economic solution for all needs below 64 processors per job
Similar installations can be found at DWD, MPI Garching, ECMWF, EPCC/Daresbury, Los Alamos, NERSC/Berkeley, Lawrence Livermore, ...

As of May 2006