Unique Modular Computing Design: SGI® NUMAflex

A High-Performance Scalable Graphics Architecture
Daniel R. McLachlan
Director, Advanced Graphics Engineering
SGI
Growth in Model Sizes
[Chart: Worldwide Production of Information, in exabytes (axis 0 to 200), by year from 1997 to 2006. Source: Gartner]
Images courtesy of Parametric Technology Corporation; Photodisc; and Magic Earth, LLC
Problems Are Getting Increasingly Complex Over Time
Bumper
Bumper, hood, engine, wheels
Crash dummy
E-crash dummy
Entire car
Organ damage
Images courtesy of EAI; SCI Institute, NLM, Theoretical Biophysics Group of
the Beckman Institute at UIUC; Livermore Software Technology Corporation
The Complexity of the Simple
Potato Chips
Diapers
Images courtesy of Procter & Gamble
Graphics Cards Are Outpacing PC Architecture and Bandwidth
[Chart: Bandwidth Specification (normalized), 2000 to 2004. Series: Polygons, Fill Rate, Internal Bus, Network I/C. Annotation: Performance Gap. Graph based on relative scale.]
Addressing Real Needs
Visualization
• Extreme resolution
• Absolute visual quality
• Visual Area Networking (VAN)
Performance
• Solving complex problems
• Dense data sets
Graphics
• Low cost
• Fast simple polygons
• Single screen image quality
[Diagram labels: Clusters; 1992; 2003]
Visualization Breaks The Cognitive Barrier For Better Decisions
Images courtesy of Advantage CFD; SCI Institute; NLM; Theoretical Biophysics Group of the Beckman Institute at UIUC; Laboratory for Atmospheres, NASA Goddard Space Flight Center;
Donghoon Shin, Art Center College of Design, Nvidia Corporation; ATI Technologies, Inc; and Nintendo Co., Ltd.
Cluster Comparison
Pros
• Cheap
• Industry standard
• High display list performance
• Good for “embarrassingly parallel” problems
• Can potentially scale to 1000s of processors
Cons
• Cumbersome to program
• High administration costs
• Few applications for visualization
• Difficult to scale for large problems
• Difficult to dynamically load balance
• Lack of software productivity tools
• Often requires data replication
• Reliability
• Limited to 2GB memory space
The Benefits of Shared Memory
Traditional Clusters
• Commodity interconnect
• Separate memory (mem) for each node
• Each node runs its own OS (node + OS)
• 1-2 CPUs per node
SGI® NUMAflex™
• Fast NUMAflex™ interconnect
• Global shared memory
• Each node runs its own OS (node + OS)
• < 64 CPUs per node
What is shared memory?
• All nodes operate on one large shared memory space, instead of each node having its own small memory space
Shared memory is high-performance
• All nodes can access one large memory space efficiently, so complex communication and data passing between nodes aren’t needed
• Big data sets fit entirely in memory; less disk I/O is needed
Shared memory is cost-effective and easy to deploy
• It requires less memory per node, because large problems can be solved in big shared memory
• Simpler programming means lower tuning and maintenance costs
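The contrast between the two models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: threads inside one process stand in for nodes that share one address space, and copied slices stand in for the messages a cluster would send. The function names are hypothetical and are not part of any SGI software.

```python
from concurrent.futures import ThreadPoolExecutor


def slice_bounds(n, workers):
    """Split the index range 0..n into one contiguous (lo, hi) range per worker."""
    return [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]


def shared_memory_sum(data, workers):
    """Shared-memory stand-in: every worker reads its range of the one
    shared list in place, so no data is copied or sent."""
    def partial(bounds):
        lo, hi = bounds
        return sum(data[i] for i in range(lo, hi))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial, slice_bounds(len(data), workers)))


def message_passing_sum(data, workers):
    """Cluster stand-in: each worker first receives a private copy of its
    slice (the "message"), then works on that copy.
    Returns (total, number of elements copied)."""
    messages = [list(data[lo:hi]) for lo, hi in slice_bounds(len(data), workers)]
    copied = sum(len(m) for m in messages)
    return sum(sum(m) for m in messages), copied


data = list(range(1000))
print(shared_memory_sum(data, 4))    # total computed on the shared list
print(message_passing_sum(data, 4))  # same total, plus the copy count
```

Both functions return the same total. The difference is that the message-passing version has to copy every element once before any work starts, which is the data replication and communication that the bullets above say shared memory avoids.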
How SGI® Onyx® Enables the Role
System at a Glance
[Diagram: SGI Onyx. Labels: Scalable Interaction; Scalable Graphics I/O; Scalable Data; Appropriate Delivery; Large Data Sets; Scalable Compute and Large Memory; Scalable Disk I/O; Scalable Graphics; Scalable Rendering; Compositor; Network; Scalable Resolution]
Silicon Graphics® Onyx4™ UltimateVision™
Changing the Application Paradigm
Moving from a fixed rendering path…
Geometry
…to a scalable and programmable rendering path.
Application
accelerators
Images courtesy of Pratt and Whitney Canada and Magic Earth, LLC
Scaling
A Shift in Pipe Paradigm
1. Screen-based decomposition
2. Eye-based decomposition
3. Time-based decomposition
4. Data-based decomposition
Even more powerful in combination
All modes can be used separately or combined in any number of ways
Data courtesy of DaimlerChrysler, Images courtesy of MAK
Visible Human public data set
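The slide names the four decomposition modes without defining them. The sketch below shows one common reading of each name as a rule for dividing work among graphics pipes. The function names, the vertical-strip layout, and the round-robin assignment are illustrative assumptions, not SGI's implementation.

```python
def screen_decomposition(width, height, pipes):
    """Screen-based: split the screen into vertical strips, one per pipe.
    Returns one (x, y, w, h) rectangle per pipe. (Strip layout is an assumption.)"""
    edges = [p * width // pipes for p in range(pipes + 1)]
    return [(edges[p], 0, edges[p + 1] - edges[p], height) for p in range(pipes)]


def eye_decomposition(pipes):
    """Eye-based: alternate the left and right stereo views across pipes."""
    return ["left" if p % 2 == 0 else "right" for p in range(pipes)]


def time_decomposition(frame, pipes):
    """Time-based: hand out whole frames round-robin; returns the pipe index
    that renders the given frame number."""
    return frame % pipes


def data_decomposition(objects, pipes):
    """Data-based: each pipe renders a subset of the data set; the partial
    images would then be combined by a compositor."""
    return [objects[p::pipes] for p in range(pipes)]


print(screen_decomposition(1920, 1080, 4))
print(eye_decomposition(2))
print(time_decomposition(5, 4))
print(data_decomposition(["hood", "bumper", "engine", "wheels"], 2))
```

Because each function only decides who renders what, the modes compose naturally: for example, two pipes per eye could each take half of the screen, which matches the slide's point that the modes can be combined.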
Compositor Flexibility
Multi-Tier Composition
Composite the output of multiple compositors; e.g., the first layer does 2D composition and the second layer does anti-aliasing
Visual Serving
Composited output sent to workstations for viewing and/or editing
SGI® NUMA scalability
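The multi-tier composition example (a first layer doing 2D composition, a second layer doing anti-aliasing) can be sketched as two small functions applied in sequence. This is a toy model under stated assumptions: images are lists of rows of grayscale values, 2D composition means joining side-by-side tiles, and anti-aliasing means averaging several renderings of the same scene. None of the names below come from SGI software.

```python
def compose_2d(left_tile, right_tile):
    """Tier 1 (assumed behavior): join two side-by-side tiles into one image."""
    return [l_row + r_row for l_row, r_row in zip(left_tile, right_tile)]


def compose_antialias(images):
    """Tier 2 (assumed behavior): average several same-sized images pixel by
    pixel, as if each had been rendered with a slightly different sample offset."""
    count = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[y][x] for img in images) / count for x in range(cols)]
        for y in range(rows)
    ]


# Two first-tier compositors each assemble a full image from two tiles.
sample_a = compose_2d([[0, 0], [0, 0]], [[4, 4], [4, 4]])
sample_b = compose_2d([[2, 2], [2, 2]], [[8, 8], [8, 8]])

# One second-tier compositor averages the two assembled images.
final = compose_antialias([sample_a, sample_b])
print(final)
```

The point of the tiers is that the second compositor does not need to know how its inputs were produced; it simply treats the outputs of the first tier as ordinary images.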
Silicon Graphics® Onyx4™ UltimateVision™
System Architecture
[Diagram labels: 8GB RAM; CPU (x4); Memory Controller; 2 Graphics Pipes; Optional: Standard I/O or 2 Graphics Pipes]
Conclusion
Silicon Graphics® Onyx4™ UltimateVision™
Solving bigger and more complex problems
• World’s most scalable visualization system
  • Up to 32 GPUs in a single system image (SSI) architecture
• World-leading computational capability
  • Up to 64 CPUs per node, scalable to 1024 processors
• Solves system bandwidth limitations of PCs and clusters
  • Up to 8 NUMAlink 3 connections to a single shared memory pool
• New-generation programmable graphics architecture
  • OpenGL Shading Language