Graphics Hardware - SIUE Computer Science

COMPUTER GRAPHICS
CS 482 – FALL 2016
CHAPTER 38
GRAPHICS HARDWARE
• GRAPHICS PROCESSING UNITS
• PARALLELISM
GRAPHICS PROCESSING UNITS
HISTORICAL CONTEXT
EARLY 1990’S
VGA CONTROLLERS
ALSO KNOWN AS A “GRAPHICS ACCELERATOR”, A VIDEO GRAPHICS ARRAY CONTROLLER COMBINED A MEMORY CONTROLLER AND A DISPLAY GENERATOR WITH ATTACHED DRAM.
THESE CONTROLLERS CONTAINED FIXED FUNCTION CAPABILITIES FOR TRIANGULATION, RASTERIZATION, AND TEXTURE MAPPING.
EARLY 2000’S
FIRST GPUS
THE FIXED FUNCTION CAPABILITIES WERE EXTENDED TO INCLUDE TRANSFORMATIONS, LIGHTING, AND SHADING.
WITH INCREASED PROCESSING POWER BEING DEMANDED, ESPECIALLY BY THE GAME INDUSTRY, CHIP DEVELOPERS BEGAN ADDING ENOUGH PROCESSING POWER TO GPUS TO RIVAL THAT OF CPUS.
FIXED FUNCTION DEDICATED LOGIC ON THESE CHIPS WAS REPLACED BY PROGRAMMABLE PROCESSORS.
EARLY 2010’S
MODERN GPUS
INTEGER ARITHMETIC WAS REPLACED WITH FLOATING-POINT ARITHMETIC.
PARALLELISM WAS VASTLY INCREASED ON THE CHIPS.
INSTRUCTIONS AND MEMORY BEGAN TO BE ADDED TO ALLOW GPUS TO BE USED FOR GENERAL PURPOSE PROGRAMMING, NOT JUST GRAPHICS.
GRAPHICS PROCESSING UNITS
DISTINGUISHING FEATURES
AS INSTRUCTION SETS AND MEMORY EXPAND ON GPUS,
THEY BECOME INCREASINGLY CAPABLE OF GENERAL
PURPOSE PROCESSING, BUT THERE ARE STILL IMPORTANT
DIFFERENCES BETWEEN GPUS AND CPUS.
GPU INSTRUCTION SETS ARE STILL RATHER NARROWLY DEFINED, FOCUSING ON GRAPHICS ACCELERATION.
GPU PROGRAMMING INTERFACES ARE HIGH-LEVEL APIS LIKE OPENGL AND DIRECTX, TOGETHER WITH HIGH-LEVEL SHADING LANGUAGES LIKE CG (C FOR GRAPHICS) AND HLSL (HIGH-LEVEL SHADER LANGUAGE).
THESE ARE SUPPORTED BY COMPILERS THAT GENERATE INTERMEDIATE LANGUAGES, WHICH ARE OPTIMIZED BY THE SPECIFIC GPU DRIVER SOFTWARE THAT GENERATES THE GPU’S MACHINE INSTRUCTIONS.
GRAPHICS PROCESSING INVOLVES MANY STAGES OF OPERATIONS, SUCH AS VERTEX SHADING, GEOMETRY SHADING, RASTERIZATION, AND FRAGMENT SHADING, WHICH ARE PERFORMED ON A MASSIVELY PARALLEL SCALE IN A PIPELINED FASHION.
VERTICES CAN BE DRAWN INDEPENDENTLY AND FRAGMENTS CAN BE RENDERED INDEPENDENTLY, ALLOWING COMPUTATION TO TAKE PLACE ALONG MANY PARALLEL THREADS OF CONTROL, RELYING ON THOSE THREADS TO HIDE LATENCY RATHER THAN RELYING ON CACHES TO AVOID LATENCY.
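THE INDEPENDENCE OF VERTICES AND FRAGMENTS CAN BE ILLUSTRATED WITH A MINIMAL PYTHON SKETCH. IT IS NOT GPU CODE: THE THREAD POOL ONLY STANDS IN FOR PARALLEL GPU THREADS AND DOES NOT MODEL LATENCY HIDING. THE NAMES vertex_shader, fragment_shader, AND run_stage, AND THE TOY FORMULAS INSIDE THEM, ARE HYPOTHETICAL.

```python
from concurrent.futures import ThreadPoolExecutor


def vertex_shader(vertex):
    """Hypothetical vertex stage: scale and translate one 2D vertex."""
    x, y = vertex
    return (2.0 * x + 1.0, 2.0 * y + 1.0)


def fragment_shader(fragment):
    """Hypothetical fragment stage: map a pixel coordinate to a gray level."""
    px, py = fragment
    return (px + py) % 256


def run_stage(shader, items, workers=4):
    """Apply one pipeline stage to every item.

    No item reads another item's result, so the work can be split
    across any number of workers. pool.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(shader, items))


# Stage 1: every vertex is processed independently.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shaded_vertices = run_stage(vertex_shader, vertices)

# Stage 2: every fragment (pixel sample) is processed independently.
fragments = [(x, y) for y in range(2) for x in range(3)]
colors = run_stage(fragment_shader, fragments)
```

BECAUSE NO ELEMENT DEPENDS ON ANOTHER, CHANGING THE NUMBER OF WORKERS CHANGES ONLY HOW THE WORK IS SCHEDULED, NOT THE OUTPUT.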
PARALLELISM
PARALLEL RENDERING
WHEN RENDERING WAS HANDLED BY SERIAL PROCESSING
MANAGED BY THE CPU, GRAPHICAL PRIMITIVES WERE
PERIODICALLY FED TO A GPU, BUT MODERN PROGRAMMERS
MUST ADDRESS THE PARALLELISM OF MULTIPLE CPUS AND GPUS.
OBJECTS MAY BE SUBMITTED TO THE FRAME BUFFER IN ANY
ORDER, BUT THEY MUST BE SORTED AS A LAST STEP BEFORE
RASTERIZATION FOR TWO REASONS:
• TRANSPARENT OBJECTS NEED TO BE DRAWN BACK-TO-FRONT
SO ANYTHING BEHIND A TRANSPARENT OBJECT SHOWS
THROUGH.
• GPU STATE CHANGES (UPLOADING TEXTURES, ACTIVATING
LIGHTING, ETC.) CAN BE EXPENSIVE, SO SORTING ALL SIMILAR
STATES TOGETHER (EVERYTHING WITH THE SAME TEXTURE,
THINGS THAT ARE LIT, ETC.) MINIMIZES STATE CHANGES AND
REDUCES THE TIME THE GPU TAKES TO RENDER THE SCENE.
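THE TWO SORTING REASONS ABOVE CAN BE SKETCHED IN PYTHON. THIS IS AN ILLUSTRATION UNDER STATED ASSUMPTIONS, NOT A RENDERER: THE DrawCall TYPE AND ITS FIELDS ARE HYPOTHETICAL, A TEXTURE NAME STANDS IN FOR ALL GPU STATE, AND DRAWING OPAQUE OBJECTS BEFORE TRANSPARENT ONES IS AN ASSUMED POLICY THAT THE SLIDES DO NOT STATE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawCall:
    """Hypothetical record for one object to be drawn."""
    name: str
    texture: str            # stand-in for all GPU state
    depth: float            # distance from the camera (larger = farther)
    transparent: bool = False


def order_draw_calls(calls):
    """Group opaque calls by state, then draw transparent calls back-to-front."""
    # Assumption: opaque objects are drawn first, grouped by texture.
    opaque = sorted((c for c in calls if not c.transparent),
                    key=lambda c: c.texture)
    # Transparent objects: farthest first, so nearer ones blend over them.
    transparent = sorted((c for c in calls if c.transparent),
                         key=lambda c: c.depth, reverse=True)
    return opaque + transparent


def count_state_changes(calls):
    """Count how many times the active texture has to be switched."""
    changes, current = 0, None
    for call in calls:
        if call.texture != current:
            changes += 1
            current = call.texture
    return changes


scene = [
    DrawCall("crate", "wood", 5.0),
    DrawCall("wall", "brick", 9.0),
    DrawCall("window", "glass", 4.0, transparent=True),
    DrawCall("table", "wood", 3.0),
    DrawCall("bottle", "glass", 7.0, transparent=True),
    DrawCall("floor", "brick", 1.0),
]
ordered = order_draw_calls(scene)
print([c.name for c in ordered])
print(count_state_changes(scene), "->", count_state_changes(ordered))
```

IN THIS TOY SCENE THE SUBMISSION ORDER SWITCHES TEXTURES ON EVERY CALL, WHILE THE SORTED ORDER SWITCHES ONLY ONCE PER TEXTURE.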
IN GENERAL, A MULTIPROCESSOR-BASED PARALLEL
PIPELINE DISTRIBUTES GEOMETRY AMONG SEVERAL
PROCESSORS, WHOSE RESULTS MUST ULTIMATELY
BE GATHERED TOGETHER INTO THE FRAME BUFFER.
[FIGURE: DATABASE → GEOMETRY PROCESSORS → RASTER PROCESSORS → FRAME BUFFER]
PARALLELISM
SORT-FIRST MULTIPROCESSOR-BASED ARCHITECTURE
[FIGURE: DATABASE → SORT BEFORE SUBMISSION TO GEOMETRY PROCESSORS → GEOMETRY PROCESSORS → RASTER PROCESSORS]
SUBDIVIDE THE FRAME BUFFER INTO TILES THAT ARE
MAPPED TO THE AVAILABLE PROCESSORS, DISTRIBUTING
PRIMITIVES TO PROCESSORS BASED UPON THEIR TILES.
COUPLE EACH GEOMETRY PROCESSOR WITH A
RASTERIZER TO FORM A COMPLETE RENDERING UNIT.
1. SUBDIVIDE THE SCREEN
2. “PRE-TRANSFORM” THE PRIMITIVES INTO SCREEN COORDINATES VIA BOUNDING BOXES
3. DISTRIBUTE THE PRIMITIVES
4. EACH PROCESSOR RENDERS ITS OWN PRIMITIVES
5. NO COMMUNICATION NEEDED AFTERWARDS
[FIGURE: PROCESSORS P1 THROUGH P4 RENDERING INTO THE FRAME BUFFER]
ADVANTAGE: THIS ARCHITECTURE CAN
EXPLOIT FRAME-TO-FRAME COHERENCE,
REDISTRIBUTING PRIMITIVES TO
PROCESSORS ONLY WHEN THEY MOVE
BETWEEN SCREEN REGIONS.
DISADVANTAGE: IT IS SUSCEPTIBLE TO LOAD IMBALANCES SINCE SOME PORTIONS OF
THE SCREEN MAY HAVE MANY MORE THINGS TO RENDER THAN OTHER PORTIONS.
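THE SORT-FIRST DISTRIBUTION STEP CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS IS A MINIMAL ILLUSTRATION, ASSUMING THE TRIANGLES ARE ALREADY IN SCREEN COORDINATES (THE “PRE-TRANSFORM” IS TAKEN AS DONE) AND THAT THE SCREEN IS CUT INTO A UNIFORM GRID WITH ONE TILE PER PROCESSOR. THE FUNCTION NAMES, THE 800 BY 600 SCREEN, AND THE 2 BY 2 GRID ARE HYPOTHETICAL.

```python
def bounding_box(triangle):
    """Axis-aligned bounding box of a triangle given as three (x, y) points."""
    xs = [x for x, _ in triangle]
    ys = [y for _, y in triangle]
    return min(xs), min(ys), max(xs), max(ys)


def tiles_overlapped(box, screen_w, screen_h, cols, rows):
    """Indices of every tile that the bounding box touches."""
    tile_w, tile_h = screen_w / cols, screen_h / rows
    min_x, min_y, max_x, max_y = box
    # Clamp to the grid so partly off-screen boxes map to valid tiles.
    first_col = max(0, int(min_x // tile_w))
    last_col = min(cols - 1, int(max_x // tile_w))
    first_row = max(0, int(min_y // tile_h))
    last_row = min(rows - 1, int(max_y // tile_h))
    return [row * cols + col
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


def sort_first(triangles, screen_w=800, screen_h=600, cols=2, rows=2):
    """Give each processor (one per tile) the triangles that touch its tile."""
    buckets = {tile: [] for tile in range(cols * rows)}
    for triangle in triangles:
        box = bounding_box(triangle)
        for tile in tiles_overlapped(box, screen_w, screen_h, cols, rows):
            buckets[tile].append(triangle)
    return buckets


tri_a = [(10, 10), (100, 10), (10, 100)]        # inside the top-left tile
tri_b = [(350, 50), (450, 50), (400, 120)]      # straddles two tiles
tri_c = [(500, 400), (700, 400), (600, 550)]    # inside the bottom-right tile
buckets = sort_first([tri_a, tri_b, tri_c])
load = [len(buckets[tile]) for tile in sorted(buckets)]
```

A TRIANGLE THAT STRADDLES A TILE BOUNDARY IS SENT TO EVERY TILE IT TOUCHES, AND THE load LIST MAKES THE LOAD IMBALANCE VISIBLE: ONE TILE RECEIVES TWO TRIANGLES WHILE ANOTHER RECEIVES NONE.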
PARALLELISM
SORT-MIDDLE MULTIPROCESSOR-BASED ARCHITECTURE
PRIMITIVES ARE TRANSFORMED INTO SCREEN COORDINATES, SORTED BY REGION, AND ROUTED FROM GEOMETRY PROCESSORS TO RASTERIZERS, WHICH RENDER THEIR REGION.
FRAGMENTS ARE THEN COLLECTED AND ASSEMBLED INTO THE FRAME BUFFER.
1. ARBITRARY ASSIGNMENT
2. GEOMETRY PROCESSING
3. SORTING
[FIGURE: DATABASE → GEOMETRY PROCESSORS → SORT BEFORE SUBMISSION TO RASTER PROCESSORS → RASTER PROCESSORS (RASTERIZATION) → FRAME BUFFER, WITH PROCESSORS LABELED P1 THROUGH P4]
ADVANTAGE: IN THIS ARCHITECTURE, GEOMETRY CAN BE DISTRIBUTED AMONG PROCESSORS WITHOUT REGARD TO THE SUBDIVISION OF THE SCREEN.
DISADVANTAGE: POOR LOAD DISTRIBUTION - SOME AREAS OF THE SCREEN MAY BE RELATIVELY EMPTY.
DISADVANTAGE: LATENCY - ALL PROCESSORS MUST FINISH BEFORE THE FINAL IMAGE IS COMPOSED.
DISADVANTAGE: ORDER-DEPENDENT PRIMITIVES (SUCH AS TRANSPARENT OBJECTS) ARE DIFFICULT TO ACCOMMODATE SINCE FRAGMENTS ARRIVE FOR PROCESSING IN NONDETERMINISTIC ORDER.
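THE SORT-MIDDLE FLOW (ARBITRARY ASSIGNMENT, GEOMETRY PROCESSING, SORTING BY REGION) CAN BE SKETCHED IN PYTHON. THIS IS A MINIMAL ILLUSTRATION UNDER ASSUMPTIONS THE SLIDES DO NOT MAKE: ASSIGNMENT IS ROUND-ROBIN, INPUT COORDINATES ARE NORMALIZED TO [-1, 1], AND EACH RASTERIZER OWNS ONE VERTICAL STRIP OF THE SCREEN. THE NAMES to_screen AND sort_middle ARE HYPOTHETICAL.

```python
def to_screen(point, screen_w, screen_h):
    """Hypothetical geometry stage: map normalized [-1, 1] coords to pixels."""
    x, y = point
    return ((x + 1.0) * 0.5 * screen_w, (y + 1.0) * 0.5 * screen_h)


def sort_middle(primitives, n_geometry=2, n_raster=2,
                screen_w=800, screen_h=600):
    # 1. Arbitrary assignment: round-robin, ignoring screen position.
    geometry_queues = [primitives[i::n_geometry] for i in range(n_geometry)]

    # 2. Geometry processing: each processor transforms its own primitives.
    transformed = [
        [[to_screen(p, screen_w, screen_h) for p in prim] for prim in queue]
        for queue in geometry_queues
    ]

    # 3. Sorting: route each primitive to every rasterizer whose strip it touches.
    strip_w = screen_w / n_raster
    raster_queues = [[] for _ in range(n_raster)]
    for queue in transformed:
        for prim in queue:
            xs = [x for x, _ in prim]
            first = max(0, int(min(xs) // strip_w))
            last = min(n_raster - 1, int(max(xs) // strip_w))
            for strip in range(first, last + 1):
                raster_queues[strip].append(prim)
    return geometry_queues, raster_queues


left = [(-0.9, -0.5), (-0.5, -0.5), (-0.7, 0.5)]   # left half of the screen
right = [(0.5, 0.0), (0.9, 0.0), (0.7, 0.5)]       # right half of the screen
wide = [(-0.2, 0.0), (0.2, 0.0), (0.0, 0.3)]       # crosses the middle
geometry_queues, raster_queues = sort_middle([left, right, wide])
```

THE GEOMETRY QUEUES ARE FILLED WITHOUT LOOKING AT SCREEN POSITION, WHICH IS THE ADVANTAGE NOTED ABOVE. THE PRICE IS THE ROUTING STEP BETWEEN THE TWO STAGES, WHERE A PRIMITIVE THAT CROSSES A STRIP BOUNDARY IS SENT TO MORE THAN ONE RASTERIZER.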
PARALLELISM
SORT-LAST MULTIPROCESSOR-BASED ARCHITECTURE
RENDERERS ARE RESPONSIBLE FOR RENDERING A FULL-SCREEN IMAGE USING THEIR SHARE OF THE PRIMITIVES.
EACH PROCESSOR IS ASSIGNED A SHARE OF THE PRIMITIVES AND RENDERS THEM AS A COMPLETE SCENE.
EACH PROCESSOR IS ASSIGNED A SECTOR’S PORTION OF THE SUB-SCENES AND SORTS THE IMAGES WITH Z-COMPOSITING/Z-BUFFER.
[FIGURE: DATABASE → GEOMETRY PROCESSORS → RASTER PROCESSORS → SORT DURING COMPOSITION INTO FRAME BUFFER → FRAME BUFFER, WITH PROCESSORS LABELED P1 THROUGH P4]
THE PARTIAL IMAGES ARE COMPOSITED TOGETHER, TAKING INTO
ACCOUNT THE DISTANCE OF EACH PIXEL IN EACH LAYER FROM THE
CAMERA, WHICH GUARANTEES THAT THE RESULTS OF THE INDIVIDUAL
RENDERERS ARE LAYERED CORRECTLY.
ADVANTAGE: NO REQUIREMENT TO
SORT OR REDISTRIBUTE PRIMITIVES;
EACH RENDERER COMPUTES ITS
IMAGE AS IF IT WERE THE ONLY
RENDERER IN THE SYSTEM.
DISADVANTAGE: IT REQUIRES A HIGH-BANDWIDTH IMAGE COMPOSITOR.
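THE DEPTH-BASED COMPOSITING STEP OF SORT-LAST CAN BE SKETCHED IN PYTHON. THIS IS A MINIMAL ILLUSTRATION, ASSUMING EACH RENDERER RETURNS A FULL-SCREEN GRID OF (COLOR, DEPTH) PAIRS, THAT EMPTY PIXELS CARRY INFINITE DEPTH, AND THAT ALL SURFACES ARE OPAQUE. THE NAME z_composite AND THE DATA LAYOUT ARE HYPOTHETICAL.

```python
import math

BACKGROUND = (0, 0, 0)
EMPTY = (BACKGROUND, math.inf)   # pixel that a renderer did not cover


def z_composite(layers):
    """Merge full-screen partial images, keeping the nearest sample per pixel.

    Each layer is a list of rows; each pixel is a (color, depth) pair,
    where a smaller depth means closer to the camera.
    """
    height = len(layers[0])
    width = len(layers[0][0])
    frame = [[BACKGROUND for _ in range(width)] for _ in range(height)]
    depth = [[math.inf for _ in range(width)] for _ in range(height)]
    for layer in layers:
        for y in range(height):
            for x in range(width):
                color, z = layer[y][x]
                if z < depth[y][x]:      # nearer than anything seen so far
                    depth[y][x] = z
                    frame[y][x] = color
    return frame


# Two renderers, each producing a 1 x 2 full-screen partial image.
layer_a = [[((255, 0, 0), 2.0), EMPTY]]
layer_b = [[((0, 0, 255), 1.0), ((0, 255, 0), 5.0)]]
frame = z_composite([layer_a, layer_b])
```

THE RESULT DOES NOT DEPEND ON THE ORDER OF THE LAYERS, WHICH IS WHY NO SORTING OF PRIMITIVES IS NEEDED. THE COMPOSITOR MUST HOWEVER READ EVERY PIXEL OF EVERY LAYER, WHICH IS THE SOURCE OF THE BANDWIDTH REQUIREMENT.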