COMPUTER GRAPHICS
CS 482 – FALL 2016

CHAPTER 38: GRAPHICS HARDWARE
• Graphics Processing Units
• Parallelism

GRAPHICS PROCESSING UNITS: HISTORICAL CONTEXT

Early 1990s: VGA Controllers
Also known as "graphics accelerators," a Video Graphics Array (VGA) controller combined a memory controller and a display generator with attached DRAM. These controllers contained fixed-function capabilities for triangulation, rasterization, and texture mapping.

Early 2000s: First GPUs
The fixed-function capabilities were extended to include transformations, lighting, and shading.

With increased processing power being demanded, especially by the game industry, chip developers began adding enough processing capability to GPUs to rival that of CPUs. Fixed-function dedicated logic on these chips was replaced by programmable processors.

Early 2010s: Modern GPUs
Integer arithmetic was replaced with floating-point arithmetic, and parallelism on the chips was vastly increased. Instructions and memory began to be added so that GPUs could be used for general-purpose programming, not just graphics.

GRAPHICS PROCESSING UNITS: DISTINGUISHING FEATURES
As instruction sets and memory expand on GPUs, they become increasingly capable of general-purpose processing, but there are still important differences between GPUs and CPUs.

GPU instruction sets are still rather narrowly defined, focusing on graphics acceleration.

GPU programming interfaces are high-level APIs like OpenGL and DirectX, together with high-level shading languages like Cg (C for Graphics) and HLSL (High-Level Shader Language). These are supported by compilers that generate intermediate languages, which are then optimized by the GPU-specific driver software that generates the GPU's machine instructions.
Graphics processing involves many stages of operations, such as vertex shading, geometry shading, rasterization, and fragment shading, which are performed on a massively parallel scale in a pipelined fashion. Vertices can be processed independently and fragments can be rendered independently, allowing computation to proceed along many parallel threads of control. GPUs rely on those threads to hide memory latency, rather than relying on caches to avoid latency.

PARALLELISM: PARALLEL RENDERING
When rendering was handled by serial processing managed by the CPU, graphical primitives were periodically fed to a GPU, but modern programmers must address the parallelism of multiple CPUs and GPUs. Objects may be submitted for rendering in any order, but they must be sorted as a last step before rasterization for two reasons:
• Transparent objects need to be drawn back to front so that anything behind a transparent object shows through.
• GPU state changes (uploading textures, activating lighting, etc.) can be expensive, so grouping everything that shares a state (everything with the same texture, everything that is lit, etc.) minimizes state changes and lets the GPU render the scene in less time.
In general, a multiprocessor-based parallel pipeline distributes geometry among several processors, whose results must ultimately be gathered into the frame buffer.
[Figure: database → geometry processors → raster processors → frame buffer]

PARALLELISM: SORT-FIRST MULTIPROCESSOR-BASED ARCHITECTURE
[Figure: database → sort before submission to geometry processors → geometry processors → raster processors → frame buffer, with the screen divided into four tiles assigned to processors P1–P4]
The frame buffer is subdivided into tiles that are mapped to the available processors, and primitives are distributed to processors according to the tiles they fall in. Each geometry processor is coupled with a rasterizer to form a complete rendering unit.
1. Subdivide the screen.
2. "Pre-transform" the primitives into screen coordinates via bounding boxes.
3. Distribute the primitives.
4. Each processor renders its own primitives.
5. No communication is needed afterwards.
Advantage: This architecture can exploit frame-to-frame coherence, redistributing primitives to processors only when they move between screen regions.
Disadvantage: It is susceptible to load imbalance, since some portions of the screen may have many more things to render than others.

PARALLELISM: SORT-MIDDLE MULTIPROCESSOR-BASED ARCHITECTURE
[Figure: database → geometry processors P1–P4 → sort before submission to raster processors → raster processors P1–P4 (rasterization) → frame buffer]
Primitives are transformed into screen coordinates, sorted by region, and routed from the geometry processors to the rasterizers, each of which renders its own region. The fragments are then collected and assembled into the frame buffer.
1. Arbitrary assignment
2. Geometry processing
3. Sorting
Advantage: Geometry can be distributed among processors without regard to the subdivision of the screen.
Disadvantage: Poor load distribution, since some areas of the screen may be relatively empty.
Disadvantage: Latency, since all processors must finish before the final image is composed.
Disadvantage: Order-dependent primitives (such as transparent objects) are difficult to accommodate, since fragments arrive for processing in nondeterministic order.

PARALLELISM: SORT-LAST MULTIPROCESSOR-BASED ARCHITECTURE
[Figure: database → geometry processors P1–P4 → raster processors P1–P4 → sort during composition (compositing/Z-buffer) → frame buffer]
Renderers are responsible for rendering a full-screen image using their share of the primitives. Each processor is assigned a share of the primitives and renders them as a complete scene. During composition, each processor is assigned one sector's portion of the subscenes and sorts the images using a Z-buffer.
The partial images are composited together, taking into account the distance of each pixel in each layer from the camera, which guarantees that the results of the individual renderers are layered correctly.
Advantage: There is no requirement to sort or redistribute primitives; each renderer computes its image as if it were the only renderer in the system.
Disadvantage: It requires a high-bandwidth image compositor.
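The two sorting goals listed under Parallel Rendering can be illustrated with a small Python sketch. It assumes a hypothetical DrawItem record whose texture_id and lit fields stand in for GPU state, and it follows a common convention that the slides do not state: opaque objects are drawn first, grouped by state, and transparent objects are drawn afterward, back to front.

```python
from dataclasses import dataclass


@dataclass
class DrawItem:
    name: str
    texture_id: int    # hypothetical state key: which texture is bound
    lit: bool          # hypothetical state key: whether lighting is enabled
    transparent: bool
    view_depth: float  # distance from the camera; larger means farther away


def state_key(item):
    # Consecutive items with the same key need no GPU state change between them.
    return (item.texture_id, item.lit)


def order_draws(items):
    """Opaque items grouped by state, then transparent items back to front."""
    opaque = sorted((d for d in items if not d.transparent), key=state_key)
    transparent = sorted((d for d in items if d.transparent),
                         key=lambda d: d.view_depth, reverse=True)
    return opaque + transparent


def count_state_changes(items):
    # Counts the initial state setup as one change.
    changes, previous = 0, None
    for d in items:
        if state_key(d) != previous:
            changes += 1
            previous = state_key(d)
    return changes


scene = [
    DrawItem("glass", 2, True, True, 5.0),
    DrawItem("wall", 1, True, False, 10.0),
    DrawItem("floor", 2, True, False, 8.0),
    DrawItem("crate", 1, True, False, 3.0),
    DrawItem("window", 2, False, True, 12.0),
]
ordered = order_draws(scene)
print([d.name for d in ordered])  # ['wall', 'crate', 'floor', 'window', 'glass']
print(count_state_changes(scene), "->", count_state_changes(ordered))  # 5 -> 4
```

In this scene the back-to-front order puts "window" before "glass," which costs a state change (their lighting differs) that pure state grouping might avoid, so the two sorting goals can pull against each other.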
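The sort-first steps (subdivide the screen, pre-transform primitives to screen-space bounding boxes, distribute them) can be sketched in Python as follows. This is a minimal sketch assuming triangles already in screen coordinates, a 2 x 2 tile grid, and a hypothetical row-major mapping in which tile (tx, ty) belongs to processor ty * tiles_x + tx; the function names are illustrative only.

```python
from math import floor


def bounding_box(tri):
    """Screen-space bounding box (x_min, y_min, x_max, y_max) of a triangle."""
    xs = [x for x, _ in tri]
    ys = [y for _, y in tri]
    return min(xs), min(ys), max(xs), max(ys)


def _clamp(v, lo, hi):
    return max(lo, min(hi, v))


def tiles_overlapped(bbox, width, height, tiles_x, tiles_y):
    """All (tx, ty) tiles the bounding box touches; empty if it is off screen."""
    x0, y0, x1, y1 = bbox
    if x1 < 0 or y1 < 0 or x0 >= width or y0 >= height:
        return []
    tile_w, tile_h = width / tiles_x, height / tiles_y
    tx0 = _clamp(floor(x0 / tile_w), 0, tiles_x - 1)
    tx1 = _clamp(floor(x1 / tile_w), 0, tiles_x - 1)
    ty0 = _clamp(floor(y0 / tile_h), 0, tiles_y - 1)
    ty1 = _clamp(floor(y1 / tile_h), 0, tiles_y - 1)
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def distribute_sort_first(prims, width, height, tiles_x, tiles_y):
    """Send each primitive to every processor whose tile its bounding box overlaps."""
    buckets = {p: [] for p in range(tiles_x * tiles_y)}
    for pid, tri in enumerate(prims):
        for tx, ty in tiles_overlapped(bounding_box(tri), width, height, tiles_x, tiles_y):
            buckets[ty * tiles_x + tx].append(pid)  # hypothetical row-major tile owner
    return buckets


# Triangles already "pre-transformed" into screen coordinates on a 100 x 100 screen.
prims = [
    [(10, 10), (40, 10), (20, 30)],       # inside tile (0, 0)
    [(60, 60), (90, 70), (70, 90)],       # inside tile (1, 1)
    [(40, 20), (70, 20), (55, 45)],       # straddles tiles (0, 0) and (1, 0)
    [(-30, -30), (-10, -20), (-20, -5)],  # entirely off screen
]
print(distribute_sort_first(prims, 100, 100, 2, 2))
# {0: [0, 2], 1: [2], 2: [], 3: [1]}
```

Primitive 2 is sent to two processors, and processor 2 receives nothing at all, which is a small-scale view of the load-imbalance disadvantage noted for sort-first.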
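The sort-middle pipeline (arbitrary assignment, geometry processing, sorting by region) can be sketched in Python as below. The sketch assumes round-robin assignment as the "arbitrary" policy and uses a simple viewport mapping from [-1, 1] device coordinates to pixels as a stand-in for full geometry processing; all names are hypothetical. Because it runs the geometry processors one after another, arrival order at each rasterizer is deterministic here, whereas truly concurrent processors would deliver fragments in nondeterministic order.

```python
from math import floor


def assign_round_robin(n_prims, n_geometry):
    """Arbitrary assignment: primitive i goes to geometry processor i mod n_geometry."""
    return {g: list(range(g, n_prims, n_geometry)) for g in range(n_geometry)}


def to_screen(tri, width, height):
    """Stand-in for geometry processing: map [-1, 1] device coordinates to pixels."""
    return [((x + 1) * 0.5 * width, (y + 1) * 0.5 * height) for x, y in tri]


def regions_overlapped(tri, width, height, tiles_x, tiles_y):
    """Screen regions (tiles) touched by a screen-space triangle's bounding box."""
    xs = [x for x, _ in tri]
    ys = [y for _, y in tri]
    if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
        return []  # off screen
    tile_w, tile_h = width / tiles_x, height / tiles_y
    tx0, tx1 = max(0, floor(min(xs) / tile_w)), min(tiles_x - 1, floor(max(xs) / tile_w))
    ty0, ty1 = max(0, floor(min(ys) / tile_h)), min(tiles_y - 1, floor(max(ys) / tile_h))
    return [(tx, ty) for ty in range(ty0, ty1 + 1) for tx in range(tx0, tx1 + 1)]


def sort_middle(prims_ndc, n_geometry, width, height, tiles_x, tiles_y):
    """For each rasterizer, the (geometry processor, primitive) pairs routed to it."""
    rasterizers = {r: [] for r in range(tiles_x * tiles_y)}
    for g, assigned in assign_round_robin(len(prims_ndc), n_geometry).items():
        for pid in assigned:
            screen_tri = to_screen(prims_ndc[pid], width, height)  # work done on processor g
            for tx, ty in regions_overlapped(screen_tri, width, height, tiles_x, tiles_y):
                rasterizers[ty * tiles_x + tx].append((g, pid))  # sort: route by region
    return rasterizers


prims_ndc = [
    [(-0.8, -0.8), (-0.2, -0.8), (-0.6, -0.4)],
    [(0.2, 0.2), (0.8, 0.4), (0.4, 0.8)],
    [(-0.2, -0.6), (0.4, -0.6), (0.1, -0.1)],
    [(-0.6, 0.2), (-0.2, 0.2), (-0.4, 0.6)],
]
print(sort_middle(prims_ndc, 2, 100, 100, 2, 2))
# {0: [(0, 0), (0, 2)], 1: [(0, 2)], 2: [(1, 3)], 3: [(1, 1)]}
```

Which geometry processor handles a primitive has nothing to do with where it lands on screen; the screen subdivision only matters at the routing step, which is the advantage the slide attributes to sort-middle.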
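The sort-last compositing step, which keeps for each pixel whichever partial image is nearest the camera, can be sketched in Python. This is a minimal illustration assuming each renderer hands over a full-screen color buffer and depth buffer as nested lists, with math.inf marking pixels a renderer left empty; the function and variable names are hypothetical.

```python
import math


def z_composite(layers):
    """Merge full-screen partial images, keeping per pixel the fragment nearest the camera.

    Each layer is a (color, depth) pair of equal-sized 2D lists. Depth is the distance
    from the camera; math.inf marks pixels the renderer left empty (color None).
    """
    height, width = len(layers[0][1]), len(layers[0][1][0])
    out_color = [[None] * width for _ in range(height)]
    out_depth = [[math.inf] * width for _ in range(height)]
    for color, depth in layers:
        for y in range(height):
            for x in range(width):
                if depth[y][x] < out_depth[y][x]:  # nearer than anything composited so far
                    out_depth[y][x] = depth[y][x]
                    out_color[y][x] = color[y][x]
    return out_color, out_depth


INF = math.inf
renderer_1 = ([["R", "R", None], ["R", None, None]],
              [[1.0, 5.0, INF], [2.0, INF, INF]])
renderer_2 = ([["G", "G", "G"], [None, "G", "G"]],
              [[3.0, 2.0, 4.0], [INF, 1.0, 6.0]])
color, depth = z_composite([renderer_1, renderer_2])
print(color)  # [['R', 'G', 'G'], ['R', 'G', 'G']]
```

Because the depth test decides every pixel, the order in which partial images reach the compositor does not change the result (apart from exact depth ties), so no primitive sorting is needed. The cost is that every renderer must ship a complete color and depth buffer, which is why sort-last needs a high-bandwidth compositor.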