A High-Performance Scalable Graphics Architecture
Daniel R. McLachlan
Director, Advanced Graphics Engineering
SGI

Growth in Model Sizes
[Chart: Worldwide Production of Information, in exabytes (0 to 200), 1997 to 2006. Source: Gartner]
Images courtesy of Parametric Technology Corporation; Photodisc, and Magic Earth, LLC

Problems Are Getting Increasingly Complex Over Time
• Bumper
• Bumper, hood, engine, wheels
• Crash dummy
• E-crash dummy
• Entire car
• Organ damage
Images courtesy of EAI; SCI Institute, NLM, Theoretical Biophysics Group of the Beckman Institute at UIUC; Livermore Software Technology Corporation

The Complexity of the Simple
• Potato Chips
• Diapers
Images courtesy of Procter & Gamble

Graphic Cards Are Outpacing PC Architecture and Bandwidth
[Chart: Performance Gap (normalized), 2000 to 2004. Labels: Bandwidth Specification, Polygons, Fill Rate, Internal Bus, Network I/C. Graph based on relative scale.]

Addressing Real Needs
Visualization
• Extreme resolution
• Absolute visual quality
• VAN Performance
• Solving complex problems
• Dense data sets
Clusters
Graphics
• Low cost
• Fast simple polygons
• Single screen image quality
[Timeline labels: 1992, 2003]

Visualization Breaks The Cognitive Barrier For Better Decisions
Images courtesy of Advantage CFD; SCI Institute; NLM; Theoretical Biophysics Group of the Beckman Institute at UIUC; Laboratory for Atmospheres, NASA Goddard Space Flight Center; Donghoon Shin, Art Center College of Design, Nvidia Corporation; ATI Technologies, Inc; and Nintendo Co., Ltd.

Cluster Comparison
Pros
• Cheap
• Industry standard
• High display list performance
• Good for “embarrassingly parallel” problems
• Can potentially scale to 1000s of processors
Cons
• Cumbersome to program
• High administration costs
• Few applications for visualization
• Difficult to scale for large problems
• Difficult to dynamically load balance
• Lack of software productivity tools
• Often requires data replication
• Reliability
• Limited to 2GB memory space

The Benefits of Shared Memory
[Diagram, left: Traditional Clusters. Commodity interconnect; each "node + OS" has its own mem; 1-2 CPUs per node.]
[Diagram, right: SGI® NUMAflex™. Fast NUMAflex™ interconnect; global shared memory; node + OS; < 64 CPUs per node.]
What is shared memory?
• All nodes operate on one large shared memory space, instead of each node having its own small memory space
Shared memory is high-performance
• All nodes can access one large memory space efficiently, so complex communication and data passing between nodes aren’t needed
• Big data sets fit entirely in memory; less disk I/O is needed
Shared memory is cost-effective and easy to deploy
• It requires less memory per node, because large problems can be solved in big shared memory
• Simpler programming means lower tuning and maintenance costs

How SGI® Onyx® Enables the Role
System at a Glance
[Diagram labels: Scalable Interaction; Scalable Graphics I/O; Scalable Data; Appropriate Delivery; SGI Onyx; Large Data Sets; Scalable Compute and Large Memory; Scalable Disk I/O; Scalable Graphics; Scalable Rendering; Compositor Network; Scalable Resolution]

Silicon Graphics® Onyx4™ UltimateVision™
Changing the Application Paradigm
Moving from a fixed rendering path…
[Diagram label: Geometry]
…to a scalable and programmable rendering path.
[Diagram label: Application accelerators]
Images courtesy of Pratt and Whitney Canada and Magic Earth, LLC

Scaling
A Shift in Pipe Paradigm
1. Screen-based decomposition
2. Eye-based decomposition
3. Time-based decomposition
4. Data-based decomposition
Even more powerful in combination
• All modes can be used separately or combined in any number of ways
Data courtesy of DaimlerChrysler, Images courtesy of MAK
Visible Human public data set

Compositor Flexibility
Multi-Tier Composition
• Composite output of multiple compositors
• e.g., first layer does 2D composition, second layer does anti-aliasing
Visual Serving
• Composited output sent to workstations for viewing and/or editing
SGI® NUMA scalability

Silicon Graphics® Onyx4™ UltimateVision™
System Architecture
[Diagram labels: 8GB RAM; CPU, CPU; Optional Standard I/O or 2 Graphics Pipes; Memory Controller; CPU, CPU; 2 Graphics Pipes]

Conclusion
Silicon Graphics® Onyx4™ UltimateVision™
Solving bigger and more complex problems
• World’s most scalable visualization system
  • Up to 32 GPUs in an SSI architecture
• World-leading computational capability
  • Up to 64 CPUs per node, scalable to 1024 processors
• Solves system b/w limitations of PCs and clusters
  • Up to 8 NUMAlink 3 connections to a single shared memory pool
• New-generation programmable graphics architecture
  • OpenGL Shading Language
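
Illustration: screen-based decomposition
The first scaling mode listed under "A Shift in Pipe Paradigm", screen-based decomposition, can be sketched roughly as follows. This is a minimal illustration in plain Python, not SGI's compositor implementation: the frame is assumed to be split into horizontal bands, each band is rendered by one (simulated) graphics pipe, and a compositor step stacks the bands back into one image. All function names (split_rows, render_band, composite, render_frame, gradient) are hypothetical and chosen only for this example.

```python
from typing import Callable, List, Tuple

Pixel = Tuple[int, int, int]
Tile = List[List[Pixel]]
Shader = Callable[[int, int], Pixel]


def split_rows(height: int, pipes: int) -> List[Tuple[int, int]]:
    """Divide `height` scanlines into `pipes` contiguous (start, stop) bands."""
    base, extra = divmod(height, pipes)
    bands = []
    start = 0
    for i in range(pipes):
        # The first `extra` bands take one additional scanline each.
        stop = start + base + (1 if i < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands


def render_band(shade: Shader, width: int, band: Tuple[int, int]) -> Tile:
    """Stand-in for one graphics pipe: shade every pixel of its band."""
    start, stop = band
    return [[shade(x, y) for x in range(width)] for y in range(start, stop)]


def composite(tiles: List[Tile]) -> Tile:
    """Stand-in for the compositor: stack the bands top to bottom."""
    frame: Tile = []
    for tile in tiles:
        frame.extend(tile)
    return frame


def render_frame(shade: Shader, width: int, height: int, pipes: int) -> Tile:
    """Render one frame by screen-based decomposition across `pipes` pipes."""
    bands = split_rows(height, pipes)
    # On real hardware the bands would be rendered concurrently;
    # here they are rendered one after another for simplicity.
    tiles = [render_band(shade, width, band) for band in bands]
    return composite(tiles)


def gradient(x: int, y: int) -> Pixel:
    """Hypothetical per-pixel shader used only for the example."""
    return (x % 256, y % 256, 0)


if __name__ == "__main__":
    frame = render_frame(gradient, width=8, height=10, pipes=4)
    print(len(frame), "rows of", len(frame[0]), "pixels")
```

The composited frame is identical to what a single pipe would produce for the whole screen; only the work per pipe shrinks. The other modes in the list divide the work along a different axis (eye, time, or data) instead of screen area.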