white paper
Intel, Netweb Technologies & Tyrone
Netweb Technologies Delivers
India’s Fastest Hybrid Supercomputer
with Breakthrough Performance
The Intel® Xeon® processor and the new Intel® Xeon Phi™ coprocessor enabled Netweb
Technologies to develop and deploy India’s fastest and largest hybrid supercomputer. With
over 30,000 processing cores and a total of 14TB of memory, the next-generation PARAM
Yuva II System at the Centre for Development of Advanced Computing (C-DAC) is enabling
entirely new possibilities for research in India.
The next-generation PARAM Yuva II System at the
Centre for Development of Advanced Computing
(C-DAC) offers peak performance of 524 Teraflops,
requires only half the footprint of the previous system
and consumes 35% less energy.
The discovery of key insights and answers to many of the world’s most
challenging and complex problems is in large part possible today
thanks to the exceptional performance and groundbreaking capabilities of
modern supercomputers. In India, significant strides are now possible
with deployment of the next-generation PARAM Yuva II System at C-DAC.
With the combined capabilities of the Intel® Xeon® processor E5-2670 and the
Intel® Xeon Phi™ coprocessor 5110P, highly parallel computing can be carried
out faster than ever before. The result is faster analysis of complex simulations
and models, and the advancement of research and discovery in diverse areas
including biotechnology, computational fluid dynamics, seismic and atmospheric
studies, computational science, disaster mitigation, engineering and more.
Netweb Delivers Fastest Supercomputer in India
Confronted with the task of deploying the most capable supercomputer in
India, the Centre for Development of Advanced Computing (C-DAC) chose
Netweb Technologies to deliver a hybrid solution featuring cutting-edge
Intel technologies. The next-generation PARAM Yuva II System deployed by
Netweb exceeded the stated project goals, balancing high performance with
low power consumption and strong performance per watt. Specific solution
goals included a minimum computing performance of 300 Teraflops,
implementation within a 6-week timeframe, and a setup compatible with the
space, power, and cooling constraints of C-DAC’s existing data center. Intel
components played a vital role in meeting the performance and efficiency
objectives of the new PARAM System. Beyond the Intel® Xeon® processor
E5-2670 and the Intel® Xeon Phi™ coprocessor 5110P, other Intel components
included the Intel® Solid-State Drive 330 Series and Intel® Cluster Studio
XE 2013 for Linux. With this cohesive foundation, Netweb completed
delivery, installation and benchmarking quickly, ultimately deploying the
supercomputing system in record time.
Netweb Utilizes Advanced Hybrid Technology
With its specialization in creating flexible solutions based on the
latest technologies, and its reputation as an experienced, trusted supplier,
Netweb was chosen as the best fit to carry out C-DAC’s vision. Netweb
implemented a system featuring both Intel Xeon processors and the new
Intel Xeon Phi coprocessors. The combination of Intel components and
hybrid technology produced a technical computing solution with
industry-leading performance for highly parallel applications, flexible
execution models to accommodate a wide range of project types and
workloads, high energy efficiency and better performance per watt, all
within a unified hardware and software environment.
“The software was loaded on the servers and configured as per C-DAC's
requirements. Since it was an HPC installation, there was a significant
amount of work to be done beyond installation of basic server systems.
We installed the operating system, configured it as an HPC, installed
compilers and libraries and ran required benchmarks. We also had to ensure
compatibility with other infrastructure such as the PFS storage and backup
system. Running benchmarks for best results requires a considerable
amount of fine-tuning, which needs domain expertise.”
Sanjay Lodha
Chief Executive Officer
Netweb Technologies
Netweb’s focus on a concentrated product range enables it to create
solutions closely tuned to specific project goals. With over 10 years of
hands-on experience in high-performance computing (HPC) and more than
150 successful installations in prestigious research and education labs
across India, Netweb was identified as well-equipped to handle the complex
task of developing and deploying India’s fastest supercomputer.
Intel® Xeon Phi™ Coprocessor Impacts Computing Performance
The addition of the Intel® Xeon Phi™ coprocessor to the Intel® Xeon® processor increased peak
performance for each computing node from 332.8 Gigaflops to 2,354.5 Gigaflops.
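The per-node peak figures can be reproduced from the devices’ published specifications, which this paper does not list. The sketch below assumes Intel’s published values for these parts: the Xeon E5-2670 has 8 cores at 2.6 GHz and, with AVX, performs 8 double-precision floating-point operations per cycle per core; the Xeon Phi 5110P has 60 cores at 1.053 GHz and performs 16 per cycle per core (8-wide double-precision vectors with fused multiply-add). Each node is taken to hold two processors (2-way server) and two coprocessors.

```python
# Assumed device specifications (Intel's published figures, not stated in this paper).
CPU_CORES, CPU_GHZ, CPU_FLOPS_PER_CYCLE = 8, 2.6, 8      # Xeon E5-2670, AVX
PHI_CORES, PHI_GHZ, PHI_FLOPS_PER_CYCLE = 60, 1.053, 16  # Xeon Phi 5110P, 512-bit FMA


def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical double-precision peak: cores x clock (GHz) x FLOPs per cycle."""
    return cores * clock_ghz * flops_per_cycle


CPU_PEAK = peak_gflops(CPU_CORES, CPU_GHZ, CPU_FLOPS_PER_CYCLE)  # per socket
PHI_PEAK = peak_gflops(PHI_CORES, PHI_GHZ, PHI_FLOPS_PER_CYCLE)  # per coprocessor

node_cpu_only = 2 * CPU_PEAK                # two sockets per node
node_hybrid = node_cpu_only + 2 * PHI_PEAK  # plus two coprocessors per node

print(f"{node_cpu_only:.1f} GFLOPS -> {node_hybrid:.1f} GFLOPS per node")
```

Under these assumptions the hybrid node works out to 2,354.56 Gigaflops, which the paper reports as 2,354.5.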
C-DAC Dedicated to Future-Forward HPC Solutions
C-DAC has emerged as a premier research and development (R&D)
organization in Information Technology and Electronics (IT&E) in India
and is committed to strengthening India’s computing capabilities as
global technology continues to evolve. As an institution dedicated to
high-end R&D, C-DAC has been at the forefront of the Information
Technology (IT) revolution, expanding its computing capacity through
the latest technologies. C-DAC also applies its expertise in developing
and deploying IT solutions to support a wide range of public and private
organizations.
“The upgraded PARAM Yuva installed at C-DAC, Pune by Netweb
Technologies is based on Intel Xeon Phi Many Integrated Core architecture
coprocessors, and has become the most powerful supercomputer for
India’s scientific community with theoretical peak performance that exceeds
one half of a Petaflop. This largest system was supported by the India
Department of IT, the Ministry of Communications & IT, and the government
of India and will provide unprecedented computing power for performing
research in the fields of biotechnology, computational fluid dynamics, seismic,
atmospheric, computational science, disaster mitigation, engineering and
other disciplines. This will pave the way for a wide range of achievements
in science and technology for India. I would also like to express our full
satisfaction and admiration for Netweb Technologies that completed the
project in record time.”
Dr. Pradeep K Sinha
Senior Director, HPC
C-DAC, India

As a nodal agency, C-DAC supports over 200 users running different types
of code for various HPC needs. C-DAC’s alignment with key research
institutions was also critical to the successful uptake of the new PARAM
System, allowing for maximum impact and full utilization of the system’s
advanced capabilities.

Utilization of Intel Technologies: Bill of Materials
• 222 x Tyrone* 2U Server Compute Nodes
• 2 x Tyrone 2U Server Compile Nodes
• 448 x Intel® Xeon® E5-2670 Processors, Bundled with Node(s)
• 1808 x 8GB DDR3 Memory Modules, Bundled with Node(s)
• 448 x 180GB Intel® Solid-State Drive 330 Series, Bundled with Node(s) (146GB SAS Hard Disk)
• 236 x MCX353A-FCBT ConnectX-3* Cards
• 224 x Rack/Rail Mounting Kits, Bundled with Node(s)
• 448 x Intel® Xeon Phi™ 5110P Coprocessors
• 1 x Mellanox* MSX6536-NR 648-Port FDR-Capable Modular Chassis (648-Port FDR InfiniBand* Switch)
• 1 x Intel® Cluster Studio XE 2013 for Linux*, 25-user Floating License, 3 Years
• 224 x UFM (Unified Fabric Manager) for InfiniBand Switch
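The headline figures of over 30,000 cores and 14TB of memory can be cross-checked against the bill of materials. This sketch takes the processor, coprocessor and memory-module counts from that list; the per-device core counts (8 for the Xeon E5-2670, 60 for the Xeon Phi 5110P) are assumptions based on Intel’s published specifications, and only the host DDR3 modules are counted as memory.

```python
# Quantities from the bill of materials.
CPU_COUNT = 448    # Intel Xeon E5-2670 processors
PHI_COUNT = 448    # Intel Xeon Phi 5110P coprocessors
DIMM_COUNT = 1808  # 8GB DDR3 memory modules
DIMM_GB = 8

# Assumed per-device core counts (Intel published specifications).
CORES_PER_CPU = 8
CORES_PER_PHI = 60

total_cores = CPU_COUNT * CORES_PER_CPU + PHI_COUNT * CORES_PER_PHI
total_memory_gb = DIMM_COUNT * DIMM_GB  # host DDR3 only

print(f"{total_cores:,} cores, {total_memory_gb:,} GB of host memory")
```

The result, 30,464 cores and 14,464 GB (about 14TB), agrees with the totals quoted at the start of the paper.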
“The combination of Intel® Xeon® processors and Intel® Xeon Phi™
coprocessors provide the unique capability to cover a very wide range of
applications, taking full advantage of various levels of code parallelism. Using
the same programming model for Xeon processors and Xeon Phi coprocessors
allows single code optimization to deliver better performance on both
processor and coprocessor.”
Raj Hazra
Vice President and General Manager
of Technical Computing
Intel
Advantages of Integrated Intel® Xeon® Processor
E5-2670 and Intel® Xeon Phi™ Coprocessor 5110P
in the Next-Gen PARAM Yuva II System
As Intel’s latest-generation processor able to deliver extremely high
performance and exceptional energy efficiency for CPU-bound
applications, the Intel® Xeon® processor E5-2670 was chosen as the best
fit to meet C-DAC’s goals for their HPC deployment. Coupled with the
Intel® Xeon Phi™ coprocessor 5110P, this hybrid solution enabled reduced
latency, dramatic performance gains for demanding applications and
highly parallel technical computing workloads, better energy efficiency,
high throughput, improved performance per watt and more.
The Intel Xeon Phi 5110P was selected as the best coprocessor
accelerator available, bringing tremendous computing capability to each
node with a modest increase in power relative to the performance gained.
With the addition of two Intel Xeon Phi coprocessors, the peak performance
of each node increased from 332.8 Gigaflops to 2,354.5 Gigaflops, roughly a
sevenfold gain, while power consumption rose from 400 Watts to 980 Watts,
about 2.5 times. Flexible execution models are also available with the
combined capabilities of the Intel Xeon processor E5-2670 and Intel Xeon
Phi coprocessor 5110P, allowing tailored models to accommodate each of
C-DAC’s more than 200 users running different types of code for various
HPC needs.
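The per-node numbers above allow a direct comparison of peak performance per watt. This is a back-of-the-envelope sketch: it divides theoretical peak (not sustained) performance by the quoted node power, so real applications will see different ratios.

```python
def gflops_per_watt(peak_gflops: float, watts: float) -> float:
    """Theoretical peak GFLOPS delivered per watt of node power."""
    return peak_gflops / watts


# Per-node figures quoted in this paper.
cpu_only = gflops_per_watt(332.8, 400)  # two Xeon E5-2670, no coprocessors
hybrid = gflops_per_watt(2354.5, 980)   # plus two Xeon Phi 5110P

print(f"{cpu_only:.2f} -> {hybrid:.2f} GFLOPS/W ({hybrid / cpu_only:.1f}x)")
```

Peak efficiency rises from about 0.83 to about 2.40 GFLOPS per watt, close to a threefold improvement per node.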
Additionally, the Intel Xeon Phi coprocessor 5110P is built on Intel’s 22nm
process technology with 3-D Tri-Gate transistors, helping the Intel Xeon
processor E5-2670 and Intel Xeon Phi coprocessor 5110P-based solution
maintain a high level of energy efficiency when running highly parallel
applications. While the Intel Xeon E5-2670 processor is a versatile
solution for most workloads, the Intel Xeon Phi 5110P coprocessor
provides superior scalability and optimized performance when running
highly parallel computing workloads.
PARAM System     Server Setup                     Peak Performance   Processing Cores
PARAM Yuva       4-way, 4U                        54 Teraflops       4,608
PARAM Yuva II    2-way, 2U (plus coprocessors)    524 Teraflops      Over 30,000
The Intel Xeon Phi coprocessor is Intel’s first processor to feature Intel®
Many Integrated Core (Intel® MIC) architecture, which relies on a high degree
of parallelism across many smaller, lower-power Intel® processor cores. These
many smaller cores, more threads, and wider vector units compensate for the
reduced speed of each individual core. The result is higher aggregate
performance for workloads that can be subdivided into a sufficiently large
number of simultaneous tasks. With the ability to carry out trillions of
calculations per second, the Intel Xeon processor E5-2670 and Intel Xeon
Phi coprocessor 5110P-based solution provides a solid foundation for
optimizing performance across a wide range of workloads.
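Why a workload must be subdivided into many simultaneous tasks can be illustrated with Amdahl’s law, a standard scaling model that this paper does not itself cite. The sketch assumes an idealized program in which a fraction p of the work parallelizes perfectly across n cores while the rest runs serially. The 60-core count matches the Xeon Phi 5110P, but the parallel fractions are illustrative values, not measurements from PARAM Yuva II.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative parallel fractions (assumed, not measured).
for p in (0.50, 0.90, 0.99):
    print(f"p = {p:.2f}: speedup on 60 cores = {amdahl_speedup(p, 60):.1f}x")
```

A program that is only half parallel gains less than 2x from 60 cores, while one that is 99% parallel gains roughly 38x, which is why many-core coprocessors pay off mainly for highly parallel applications.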
Advantages of the New PARAM Yuva II Versus
Previous-Generation PARAM Yuva System
The original homogeneous PARAM Yuva System was based on 4-way, 4U
Intel® Xeon® processor-based systems with 4,608 processing cores and a
peak performance of 54 Teraflops, while the new hybrid PARAM Yuva II
System is based on 2-way, 2U Intel® Xeon® processor-based systems with
2 coprocessors per system, a peak performance of 524 Teraflops and over
30,000 processing cores. The new system requires only half the footprint
of the previous PARAM Yuva System, leaving room for future upgrades and
expansion of C-DAC’s compute capacity. The new PARAM Yuva II System also
consumes 35% less energy than the original PARAM Yuva, and delivers
sustained performance of 360.8 Teraflops on the LINPACK benchmark.
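The figures above yield two simple ratios: LINPACK efficiency (sustained result divided by theoretical peak, usually called Rmax and Rpeak in TOP500 listings) and the peak-performance gain over the original system. The sketch only restates the paper’s numbers.

```python
RPEAK_TF = 524.0    # PARAM Yuva II theoretical peak (Teraflops)
RMAX_TF = 360.8     # PARAM Yuva II sustained LINPACK result (Teraflops)
OLD_PEAK_TF = 54.0  # original PARAM Yuva peak (Teraflops)

efficiency = RMAX_TF / RPEAK_TF    # fraction of peak sustained on LINPACK
peak_ratio = RPEAK_TF / OLD_PEAK_TF

print(f"LINPACK efficiency: {efficiency:.1%}")
print(f"Peak vs. original PARAM Yuva: {peak_ratio:.1f}x")
```

This gives roughly 68.9% LINPACK efficiency and a peak about 9.7 times that of the original PARAM Yuva, in line with the roughly tenfold capacity increase cited later in this paper.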
Advantages of Intel® Solid-State Drives
The Intel® Solid-State Drive 330 Series was selected for local node
storage due to its high performance and low power consumption when
running data-intensive HPC workloads. Based on 25nm Multi-Level Cell
(MLC) Intel® NAND Flash Memory, the Intel Solid-State Drive 330 Series
combines storage reliability and responsiveness for a wide range of
applications. Combined with other Intel components, the Intel Solid-State
Drive 330 Series played a significant role in supporting the demanding
data requirements of C-DAC’s technical computing solution.
Intel® Cluster Studio XE 2013
for Linux Boosts Performance
Intel® Cluster Studio XE is the first comprehensive HPC hybrid parallelism
development suite, created specifically to improve the performance of
highly parallel applications. It provides standards-driven C/C++ and Fortran
development tools and parallel programming models, enabling more efficient
optimization of HPC applications for both the Intel Xeon processor and the
Intel Xeon Phi coprocessor. This uniformity can greatly reduce the
complexity of developing, optimizing, and maintaining software code: code
can be optimized once for both Intel Xeon processors and Intel Xeon Phi
coprocessors. The same techniques deliver optimal performance on both
processor and coprocessor: scaling applications to many cores and threads,
blocking data for hierarchical memory and caches, and making effective use
of single instruction, multiple data (SIMD) instructions. The investment
made in parallelizing code therefore delivers benefits across the full
range of computing environments.
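Cache blocking (loop tiling), one of the techniques listed above, can be sketched in a few lines. This is an illustrative Python example of the loop structure only, not code from C-DAC, Netweb or Intel: production HPC codes would apply the same tiling in C/C++ or Fortran, letting the compiler vectorize the innermost loop and distributing the outer loops across threads (for example with OpenMP). The default block size of 32 is an arbitrary assumption; in practice it would be tuned to the cache sizes of the target processor or coprocessor.

```python
def blocked_matmul(a, b, block=32):
    """Multiply matrices a (n x m) and b (m x p), given as lists of rows,
    by working on block x block tiles so each tile's data is reused while
    it is still in cache."""
    n, m = len(a), len(b)
    p = len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    c = [[0.0] * p for _ in range(n)]
    # Outer three loops walk over tiles; inner three loops work within a tile.
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        a_ik, row_b = row_a[k], b[k]
                        # Innermost loop runs over contiguous elements: the
                        # loop a compiler would vectorize with SIMD in C/Fortran.
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += a_ik * row_b[j]
    return c


print(blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], block=1))
```

The tile loops change only the order in which partial products are accumulated, so the result matches an ordinary triple-loop multiplication.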
“The system is the first
installation in India with
Intel based Tyrone Servers
and Intel Xeon Phi, a
debut product from Intel.”
Sanjay Lodha
Chief Executive Officer
Netweb Technologies
2U Tyrone* servers met the space and power
constraints of C-DAC's solution while providing
support for both the Intel® Xeon® processor and
Intel® Xeon Phi™ coprocessor.
Netweb uses Tyrone* Servers
to Deliver a Seamless Solution
For over a decade, Tyrone* servers have helped thousands of companies
around the world run more efficiently, securely, and reliably. As a leading
provider of servers, storage, back-up, and HPC solutions, Tyrone delivers
results that combine speed, reliability, scalability and energy efficiency.
Tyrone DI200T3R-28R4 servers based on Intel products were chosen
because they met C-DAC’s preset limitations, including existing space and
power constraints, and supported the Intel Xeon Phi coprocessor. At the
time the project was initiated, these boards had yet to be launched, and the
Tyrone DI200T3R-28R4 was one of the few servers in a position to ensure
compatibility for seamless solution integration. The Tyrone server
delivered maximum memory capacity and I/O flexibility, and was a
well-matched addition for integration with other Intel components.
C-DAC required a solution that included 400 compute nodes in 28 racks,
with a peak power consumption of 16kW per rack or lower. The Netweb and
Tyrone design uses 2U nodes, with 14 nodes per rack and a power
consumption of 14.4kW per rack. In the future, up to 16 nodes per rack
(up to 448 nodes in the 28 racks) can be accommodated. Through the use of
Intel processors and Intel Xeon Phi coprocessors featuring the latest
microarchitecture enhancements, C-DAC was able to significantly reduce its
cooling infrastructure requirements while boosting computing capacity
tenfold. Given the remaining data center space available, this computing
capacity could be doubled, reaching twenty times the original computing
capacity within the same power and cooling envelope.
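A rough check of the rack power budget can use the 980 W per-node figure given earlier in this paper. This sketch counts node power only and ignores switches and other rack equipment, which is a simplifying assumption, so its result is a lower bound rather than a reproduction of the 14.4kW per rack quoted above.

```python
NODE_WATTS = 980      # per-node power with two Xeon Phi coprocessors (from this paper)
RACK_LIMIT_KW = 16.0  # C-DAC's per-rack power ceiling (from this paper)
RACKS = 28


def node_power_kw(nodes_per_rack: int, node_watts: float = NODE_WATTS) -> float:
    """Node-only power per rack in kW (other rack equipment is ignored)."""
    return nodes_per_rack * node_watts / 1000.0


for nodes_per_rack in (14, 16):
    kw = node_power_kw(nodes_per_rack)
    status = "within" if kw <= RACK_LIMIT_KW else "over"
    print(f"{nodes_per_rack} nodes/rack: {kw:.2f} kW ({status} the {RACK_LIMIT_KW:.0f} kW limit)")

max_nodes = 16 * RACKS  # 448, the expansion figure quoted above
```

Even at the planned maximum of 16 nodes per rack, node power stays at about 15.7 kW, under the 16kW ceiling.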
“The innovative technology used to build this system allows for stunning
performance and leading energy efficiency. It was possible only due to active
collaboration and the single mission work that the teams did together. For
that, I want to thank the teams from C-DAC and Netweb for their vision,
ability to quickly deal with the challenges, and delivering to the promise.”
Raj Hazra
Vice President and General Manager
of Technical Computing
Intel

Netweb Deploys Supercomputer in Record Time
Time was the greatest challenge Netweb faced in developing and deploying
the next-generation PARAM Yuva II System. The project was scheduled for
completion within 6 weeks: 4 weeks for delivery and 2 weeks for
installation and benchmarking. The usual timeline for an order of this
magnitude is 12 weeks, which made timely delivery a notable challenge,
especially since the solution was based on newly launched Intel Xeon Phi
coprocessors. Netweb’s considerable experience in deploying complex
solutions with Intel technology helped the project succeed within the
aggressive timeframe. The compatibility of Intel components and the
uniform development environment of Intel® Cluster Studio XE 2013 for
Linux* were key to meeting these goals in record time.
Closing
The final Netweb Technologies solution featured tremendous processing
capabilities enabled by the Intel Xeon processor E5-2670 and
Intel Xeon Phi coprocessor 5110P. C-DAC now manages India’s
largest supercomputer built with hybrid technology, and the first Indian
supercomputer to achieve more than 500 Teraflops of peak performance.
The hybrid system is expected to have a major impact on highly parallel
applications in biotechnology, computational fluid dynamics, seismic and
atmospheric studies, computational science, disaster mitigation,
engineering and other disciplines. With the launch of the new PARAM
Yuva II System, supercomputing efforts in India have received a major
boost, with many more installations planned for deployment in the next
5 years. Those involved in supercomputing efforts in India look forward
to establishing an Exascale-level computing facility in the future, and to
continually pushing the boundaries of innovation and scientific discovery in
the years to come.
Learn more about Intel and HPC solutions at:
www.intel.com/HPC
Learn more about Intel® Cluster Studio XE 2013 at:
intel.ly/cluster-studio-xe
Learn more about Intel® Server products at:
www.intelserveredge.com
Learn more about Netweb Technologies at:
www.netwebindia.com
Learn more about C-DAC at:
www.cdac.in
Learn more about Tyrone* Servers at:
www.tyronesystems.com
Intel, the Intel logo, Xeon, Xeon Inside and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2013 Intel Corporation. All rights reserved.
0513/GIP/CAF/PDF
Please Recycle
328953-001US