Paper #28 - University of Pittsburgh

Disclaimer — This paper partially fulfills a writing requirement for first year (freshman) engineering students at the University
of Pittsburgh Swanson School of Engineering. This paper is a student, not a professional, paper. This paper is based on
publicly available information and may not provide complete analyses of all relevant data. If this paper is used for any
purpose other than these authors’ partial fulfillment of a writing requirement for first year (freshman) engineering students at
the University of Pittsburgh Swanson School of Engineering, the user does so at his or her own risk.
GRAPHICAL PROCESSING UNIT ADVANCEMENTS FOR VIRTUAL SPACE
RENDERING
Max Zeeman, [email protected], Lora 3:00; Adam Granieri, [email protected], Lora 1:00
Abstract — Imagine having to solve the problem 1+1, and
similar elementary math problems. Now imagine having to
solve that same problem a million times over. It would take
the quickest writer or mathematician days to solve even the
simplest problem a million times over. However, if you had
thousands of quick mathematicians working with you, it
would be possible to solve a million 1+1 problems in a few
short minutes. This is essentially how a Graphical Processing
Unit (GPU) functions, except in place of fast mathematicians,
you have thousands of little cores that do similar
processes. A GPU's classical primary function is to render
three-dimensional environments, which essentially can be
summed up as an enormous series of basic problems. Our
research focuses on the hardware advancements in GPUs that
have allowed them to become the computational powerhouses
they are today. Specifically, we examine the development of
memory interfaces, including GDDR5X (double data rate
type five synchronous graphics random access memory) and
HBM (high bandwidth memory), and their subsequent
influence on energy efficiency and computing power. Each
memory type functions in an entirely different way, each with
its respective pros and cons. This paper will focus on how
these advancements have benefited sustainability in various
industries, including cloud services, animation, and
academia, such as mathematical and physics-related
theoretical calculations.
Key Words — HBM, GDDR5X, Graphical Processing Unit,
Memory Bus Interface, Parallel Processing
AN OVERVIEW OF GRAPHICAL
PROCESSING UNITS
Graphical Processing Units (GPUs) do exactly as their
namesake suggests; they are electronics dedicated to the
computing of graphics-related processes. This task usually
comes in the form of a myriad of algebraic equations, which
GPUs are specifically designed to solve at boggling speeds.
A GPU's ability to solve algebraic equations and algorithms
proves very useful in a computer system, as it allows
computation-heavy tasks to be moved away from the CPU,
consequently making the system more efficient [1].
University of Pittsburgh Swanson School of Engineering 1
3.31.2017
GPUs are built around a massively parallel core
architecture [1]. By arranging thousands of small, efficient
cores in this parallel structure, GPUs are capable of immense
number crunching and multi-processing. This allows GPUs
to be processing powerhouses that can do millions upon
millions of calculations, and consequently, allow consumers
to have amazing animated movies, games and other
applications [1][2]. However, our analysis focuses on a
deeper level of GPU functions, specifically that of the
memory protocols of a GPU.
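The parallel-worker analogy from the abstract can be sketched in ordinary Python. This is purely an illustration of splitting identical small tasks across workers, not a model of real GPU hardware; the worker count and problem count are arbitrary assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def solve_chunk(count):
    # One "mathematician" solves its share of trivial 1+1 problems.
    return sum(1 + 1 for _ in range(count))

def solve_parallel(total_problems=1_000_000, workers=8):
    # Split the problems across several workers, the way a GPU
    # spreads identical small tasks across thousands of cores.
    chunk = total_problems // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(solve_chunk, [chunk] * workers))
```

The point of the sketch is the structure: each worker's chunk is independent, so adding workers (or cores) divides the work without any coordination between tasks.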
Although GPUs can do millions of calculations a
second, they can be limited by how quickly data is
presented to and moved from the processing unit. This is
where graphical memory takes a role. In an unconventional
sense, you can imagine the calculating bottleneck like this:
Louis Armstrong is wailing away on his trumpet, going
through jazz scales, and blasting an amazing solo. At his
level, the only limit holding Louis back is his breathing
and airflow, not the speed or dexterity of his fingers as he
plays. This can be translated back to graphical memory; in
the context of this paper, it is not the speed at which data is
processed that matters as much as how fast the data can be
moved. HBM and GDDR5X are proving to be the two
competing standards in memory bus technology. The purpose
of this paper is to explore graphical memory bus interfaces,
including HBM and GDDR5X, and to analyze how memory
can store and relay data to the GPU, and overall benefit
society.
THE GPU’S ROLE IN CURRENT
TECHNOLOGICAL SYSTEMS
GPUs Compared to CPUs
A common misconception is that a GPU is no different
from a CPU, the Central Processing Unit. While GPUs are
still a type of processing unit, they are dedicated to the task
of controlling what you can see on your monitor screen.
CPUs, on the other hand, handle all the other data related to
making a computer work [3]. For example, CPUs have to
execute the code to run your Mac or Windows operating
system. GPUs in this scenario would execute the code that
deals with the User Interface facets of your operating system,
i.e. your login or menu screen.
The other notable difference is the type of processing
cores each contains. CPUs normally contain one, two, or four
processing cores, whereas GPUs can contain thousands.
Similarly, both systems have an amazing amount of
programmability, allowing users to specify what each core
(or in the GPU's case, group of cores) should be tasked to do.
In fact, GPU cores work most effectively when working in
'parallel,' and can thus handle multiple tasks simultaneously
[1]. Although CPUs used to handle all calculations and
related tasks in the computer, "GPU-accelerated
computing [now] offloads compute-intensive portions of the
application to the GPU" [1]. This allows GPUs to handle the
compute-heavy tasks that could slow down the CPU, and thus
leads to a more effective system overall. Also, in
both processing units’ cases, CPUs and GPUs need a place to
store and access data; this comes in the form of Random
Access Memory and Graphical Random Access Memory [4].
The modules send the commands to the cache and related
processing clusters. The protocol of the memory bus utilizes
the Peripheral Component Interconnect Express (PCIe)
interface to transfer and allocate the data. In the case of
rendering applications, the memory bus distributes models
and rigging to select memory modules, as in figure 1 [5].
MEMORY BUS INTERFACES
Memory Utilization
A memory bus is a protocol the computer uses to
communicate digital and analog data and commands to
system memory. The memory it communicates to is either
the system Random Access Memory (RAM) or what is
generally called the Graphical Random Access Memory (GRAM) [4]. Data signals are sent and allocated from the CPU
to the GPU for specific computations, almost exclusively for
displaying images to the monitor. Hence the memory is
allocated for calculating and combining different aspects of
an image like color and placement [2]. When it comes to
rendering a 3-dimensional space, the card must also account
for shading, draw distance and different algorithms for
enhancing the image [2]. The memory bus receives these
commands and relevant data, distributing the data to the
relevant memory modules and the computations to the
processor. There, the final image is assembled and sent out
through the display ports.
The memory bus can differ considerably in how it
interfaces with the graphics card and the rest of the system,
but all protocols use the same basic structure of memory
lanes, each with different bit-rates [4]. Most memory bus
lanes are either 64 or 128 bits wide. In the following
section, we will discuss what the memory and processors on
the GPU are used for.
FIGURE 1 [5]
GPU Memory Workflow Diagram
Other memory modules contain texture images and mappings
for the desired model. The structure of the memory bus has a
large impact on the speed of rendering and can in some cases,
lead to a redesign of a GPU’s organization and pcb to
optimize for efficiency and speed. Memory busses have two
major performance metrics: bandwidth, or transfer speed, and
clock speed, or transfer frequency. Bandwidth is usually
measured in gigabytes per second (GB/s) and clock speed in
megahertz (MHz).
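These two metrics combine into a simple peak-bandwidth estimate: multiply the bus width in bits by the per-pin data rate, then divide by eight to convert to bytes. The bus widths and per-pin rates below are illustrative assumptions, not the specification of any particular card:

```python
def peak_bandwidth_gbs(bus_width_bits, gbps_per_pin):
    # Each pin moves gbps_per_pin gigabits per second; the bus is
    # bus_width_bits pins wide; divide by 8 to get gigabytes per second.
    return bus_width_bits * gbps_per_pin / 8

# Illustrative, assumed configurations:
gddr5x_like = peak_bandwidth_gbs(256, 10)  # 256-bit bus, 10 Gb/s/pin -> 320.0 GB/s
hbm_stack = peak_bandwidth_gbs(1024, 1)    # 1024-bit stack, 1 Gb/s/pin -> 128.0 GB/s
```

The contrast in inputs captures the two design philosophies discussed later: GDDR5X drives a narrower bus very fast per pin, while HBM drives a very wide bus at a modest per-pin rate.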
GDDR5X
GDDR5X's design is a small iterative change from the
previous generation, GDDR5, which itself was a very large
improvement in per-pin data rate over GDDR4, as seen in
figure 2. GDDR5 offered an improvement of 5 Gb/s per pin
over the previous generation.
GPU Memory Functions
A memory bus is fundamentally a series of parallel
paths for signals and subsequent data to travel from one
component in the system to another or multiple others. Most
often they are used to convert and transfer data from the
central processing unit to the system memory [4]. In the case
of GPUs, the memory bus takes tasks and data allocated to
the GPU from the CPU, usually for a series of
computationally heavy tasks like image assembly and
rendering. The memory bus allocates the tasks and resources
necessary for the image and computation from the CPU and
system memory to the memory controllers and modules.
FIGURE 2 [4]
Memory Speed Over Successive Generations
In comparison, GDDR5X improved by only about
4 Gb/s per pin over GDDR5 [4]. The improvement from the
previous iteration to the most recent can be attributed
primarily to the PCIe 3.0 port becoming commonplace on
consumer-grade motherboards. The 3.0 port allows for a
maximum transfer rate of 8 gigatransfers per second per lane
across 16 lanes, up from PCIe 2.0's 8 lanes at 4 gigatransfers
per second [5].
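Raw transfer rates translate into usable throughput once line-coding overhead is subtracted. The sketch below uses the PCIe 3.0 figures from the text plus its standard 128b/130b encoding fraction; the lane counts are illustrative:

```python
def pcie_throughput_gbs(gigatransfers_per_s, lanes, payload_fraction):
    # One bit per transfer per lane; payload_fraction removes the
    # line-coding overhead (128/130 for PCIe 3.0); /8 converts to bytes.
    return gigatransfers_per_s * lanes * payload_fraction / 8

gen3_x16 = pcie_throughput_gbs(8, 16, 128 / 130)  # ~15.75 GB/s usable
gen3_x1 = pcie_throughput_gbs(8, 1, 128 / 130)    # ~0.98 GB/s per lane
```

The low overhead of 128b/130b encoding (under 2%) is itself one of PCIe 3.0's improvements over earlier generations, which used the costlier 8b/10b scheme.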
To make graphics memory more energy efficient, AMD
engineers designed HBM with vertically stacked memory
chips, compared to GDDR5X's more spread-out memory. As
can be seen in figure 4, HBM is composed of a stack of
DRAM dies that are "interconnected by wires called
'through-silicon vias,' or TSVs" [6], which connect to an
interposer and a package substrate's electrical interfaces.
FIGURE 4 [7]
HBM Memory Design
FIGURE 3 [4]
Memory Configuration of GDDR5 versus HBM
The interposer is essentially a spread-out electrical
circuit that connects the HBM die stacks to one another,
as well as to the main GPU die, for quick information transfer.
The underlying package substrate acts as the circuit board [6].
In short, these electronics allow for ultra-wide
communication lanes with high energy efficiency.
Because of HBM's vertically stacked memory
structure, it is far denser, with a nearly 66%
smaller footprint compared to GDDR5X, according to figure
3. It is also important to note that the memory stacks can be
expanded across a larger PCB, like GDDR5X's configuration.
This means greatly increased memory capacity is possible if
the layout is set to equal the physical footprint of
GDDR5X [7].
With GPU products starting to come standard with
HBM2, the bandwidth gap between GDDR5X and HBM is
likely to decrease. With this latest iteration of HBM, each
memory stack holds 8 GB of memory and has around
256 GB/s of bandwidth, compared to first-generation HBM's
2 GB per stack and 128 GB/s of bandwidth [6]. However,
with these great steps in speed and efficiency comes great
monetary cost to the user. Overall, computer users stand to
gain a more compact memory system that will eventually
rival GDDR5X in terms of bandwidth and latency.
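Because each stack brings its own wide interface to the interposer, total capacity and aggregate bandwidth both scale linearly with stack count. Using the per-stack figures quoted above, a hypothetical four-stack card works out as follows:

```python
def hbm_totals(stacks, gb_per_stack, gbs_per_stack):
    # Capacity and bandwidth both scale with the stack count,
    # since each stack has its own wide interface to the interposer.
    return stacks * gb_per_stack, stacks * gbs_per_stack

# HBM2 per-stack figures from the text; the 4-stack card is hypothetical.
capacity_gb, bandwidth_gbs = hbm_totals(4, 8, 256)  # 32 GB, 1024 GB/s
```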
The GDDR5X protocol utilizes a standard structure for
distributing memory resources to the main processing
module. The memory modules form a circle around the
processor, as seen in figure 3. With this configuration,
innovation has been more straightforward and less novel: the
primary increases in performance come from improvements
in processor and memory density, and the smaller design
allowed for more modules with each iteration [5]. The
protocol itself distributes commands and assets to the
memory modules simultaneously in a two-dimensional array,
as the arrangement implies, and it is limited in this respect:
the configuration cannot implement stacked memory
modules to allow parallel delivery [6]. From there, as seen in
figure 1, memory goes to the common cache, then to the
processing clusters. The consequence is that GDDR5X
performs well on tasks that are not heavily resource
dependent and that favor frequency over many simultaneous
calculations. An example is gaming, where there is a finite
set of models and textures constrained to one environment. A
finite model is one whose shape and joints of movement are
pre-defined or pre-programmed, as opposed to a dynamic
model that can drastically change shape and functionality
during playtime. It is similar with textures: the actual picture
is already designed and mapped in a certain orientation on an
object, and only needs to be rendered. Often there are many
simulated physics interactions and dynamic changes to the
immediate environment, so the higher clock speed is
generally favored to refresh the image more often.
Contrasting HBM to GDDR5X
HBM
Developed by the semiconductor company Advanced
Micro Devices (AMD), High Bandwidth Memory (HBM) is
quickly becoming the primary rival of GDDR5X.
The main advantage of HBM is its power efficiency to
bandwidth ratio. As a reminder, bandwidth is the speed at
which information is transferred from one memory die stack
to the GPU, and vice versa. Bandwidth transfer is normally
measured in units of gigabytes per second (GB/s). Although
GDDR5X has faster per-pin transfer rates and lower memory
latency compared to HBM, it is held back by its lack of
power efficiency, and as a consequence, it could impede the
development of GPU technology [6]. HBM, conversely, is a
smaller, more compact memory chip with a bandwidth-to-power
ratio of up to 35 GB/s for every watt of power used, compared
to GDDR5X's 11 GB/s per watt [8]. However, the first
iteration of HBM tends to run hot compared to GDDR5X. In
a product review of the AMD R9 Nano 4GB, tester Rikki
Wright found that the graphics card had a heat fluctuation of
around 15 degrees Celsius, which is considered dangerous for
a GPU [9]. The card performed particularly poorly when run
in CrossFire, AMD's technology for linking two graphics
cards together for better performance. In figure 5, Wright
compares the R9 Nano to other GPUs in terms of temperature
under load.
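The efficiency figures above lend themselves to a quick back-of-the-envelope comparison: at a fixed target bandwidth, power draw is simply bandwidth divided by efficiency. The 385 GB/s target below is an arbitrary illustrative number, not a figure from the text:

```python
def watts_for_bandwidth(target_gbs, gbs_per_watt):
    # Power the memory system draws to sustain a target bandwidth,
    # given its efficiency in GB/s per watt.
    return target_gbs / gbs_per_watt

# Efficiency figures from the text [8]; 385 GB/s is an arbitrary target.
hbm_watts = watts_for_bandwidth(385, 35)     # 11.0 W
gddr5x_watts = watts_for_bandwidth(385, 11)  # 35.0 W
```

At the same delivered bandwidth, the roughly 3x efficiency gap means roughly 3x the power draw for the GDDR5X configuration.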
An HBM stack takes only about 55 by 55 millimeters of
space, versus GDDR5X, which is more spread out across the
GPU at 110 by 90 millimeters [4]. Overall, the choice
becomes a matter of how much money a business or an
individual can spend, the immediate cheaper solution being
GDDR5X and the more expensive but more efficient option
being HBM.
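As a sanity check, the quoted dimensions can be compared directly. With the figures above, the area reduction works out to roughly 69%, in the same ballpark as the "nearly 66%" footprint claim cited earlier:

```python
def footprint_reduction_percent(hbm_mm, gddr_mm):
    # Percent of board area saved by the stacked layout;
    # inputs are (width, depth) in millimeters.
    hbm_area = hbm_mm[0] * hbm_mm[1]
    gddr_area = gddr_mm[0] * gddr_mm[1]
    return round(100 * (1 - hbm_area / gddr_area), 1)

saving = footprint_reduction_percent((55, 55), (110, 90))  # 69.4
```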
SUSTAINABILITY FOR
INDUSTRY AND ACADEMIA
What is Sustainability for GPUs?
In the case of GPUs, the question of sustainability is not
one of ethics, but one of quality of life. What will this new
technology do to improve existing human systems, or how
will it create new, better systems? That is the question
electrical and computer engineers must ask when designing a
new piece of hardware, such as the discussed HBM or
GDDR5X GPU memory. Improved GPU memory has a
myriad of applications; this paper can only begin to cover
how memory improvements affect quality of life. Thus, we
decided to delve into a few examples of GPU memory’s
usage in industry and academia, examining the technology’s
influence in topics like entertainment, cloud-computing, and
chemistry.
Implications for Industry
In terms of sustainability, GPUs help improve the
quality of life, particularly in the business of entertainment.
The most well-known application of GPUs is rendering 3D
spaces and images for entertainment. As discussed in a
previous section, current GDDR5X graphics cards are
generally favored in gaming applications and real-time
rendering due to their higher clock-speeds and faster single
lane data-transfer rates. For this reason, Nvidia, a company
that only produces GDDR5X-based cards, held about 70% of
the market back in Q2 of 2016, while Advanced Micro
Devices (AMD), the only company to commercially produce
HBM-based cards, held the remaining 30% [8]. In contrast,
HBM is becoming increasingly popular in commercial
rendering for movies and similar animated films. For one of
its recent films, Big Hero 6, Disney developed a new
rendering engine called Hyperion. It was needed because the
film has many scenes flying through a cityscape full of glass
walls, meaning nearly the whole city had to be rendered, with
reflections and billions of light-ray calculations, to get just a
few frames. The new engine benefits immensely from
parallel computation: it algorithmically groups rays of similar
direction and length into packets and sends them
simultaneously to the GPU for computation [2]. Because this
algorithm creates larger packets deliverable to the GPU, it
directly benefits from the more parallel structure HBM
provides. Although the lower clock speed reduces the
frequency of calculations, the larger bandwidth allows for
more calculations in a shorter amount of time. The multi-GPU
array scaling benefits as well.
FIGURE 5 [9]
Temperature Comparison of the HBM R9 Graphics
Cards versus GDDR5X Graphics Cards
Nonetheless, in today's market, GDDR5X still has its
cost advantages. Although its DRAM is not stacked like
HBM's, and is instead spread out evenly on the board, it
is by far the quicker memory type as of now, and has not
reached its energy limit yet. Where GDDR5X has speeds of
12 Gb/s of bandwidth per pin, HBM has 128 GB/s per
cluster of memory. This makes GDDR5X still a viable
memory for businesses and individual consumers alike,
as it is by far the cheaper memory [10]. However, spatially
speaking, HBM has the higher ground with its three-dimensional
stacks of DRAM dies.
Up until this point, “due to model
complexity and nonlinear device characteristics, the
application of physics-based model has hitherto been
limited” [12]. Thus, the advent of massively multi-core
processing has proven invaluable. Yan asserts that for
numerical matrix solvers, such as Gaussian elimination
and Jacobian blocks, GPUs bring admirable
computational capability [12]. He hopes that this will lead to
increased usage of massively parallel simulation that can
take full advantage of GPU capabilities, and thus boost the
research of mathematical and physics-related simulations.
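Gaussian elimination, one of the solvers Yan names, parallelizes naturally: within each pivot step, every row update below the pivot is independent of the others. A minimal serial sketch, with comments marking the loop a GPU would spread across cores:

```python
def gaussian_solve(a, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    a = [row[:] for row in a]  # work on copies
    b = b[:]
    for k in range(n):
        # Partial pivoting for numerical stability.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):     # these row updates are independent:
            f = a[i][k] / a[k][k]     # a GPU runs them in parallel
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
            b[i] -= f * b[k]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x
```

The serial version does the row updates one at a time; on a GPU, each pivot step launches all remaining row updates at once, which is where the speedup in large simulations comes from.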
Another field of academia positively affected by GPUs
is chemistry. In the Journal of Chemical Theory and
Computation, the article “Quantum Chemistry on Graphical
Processing Units,” by Ivan Ufimtsev and Todd Martinez,
describes how recent advances in GPU technology and
accessibility have made it possible to “increase the number
of publications in different fields, such as classical molecular
dynamics and quantum chemistry” [13]. Ufimtsev and
Martinez go so far as to claim that a ‘video gaming machine’
can outpace even the highest-end quad-core workstation
computer when it comes to processing energy and gradient
calculations of different organic molecules.
After testing a gaming computer containing two Nvidia
graphics cards running in parallel against an Intel quad-core
workstation computer, they found that the machine
containing the two Nvidia GPUs performed significantly
faster. For small molecules, both machines completed the
energy and gradient calculations at about the same pace [13].
However, for medium-size molecules the speedup range[d]
between 20× and 25×, while for large molecules it exceed[ed]
100× [13]. To give context, for the molecule olestra, the time
it took the CPU system to compute the energy and gradient
was about six hours, versus a little over two minutes for the
GPU system.
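The olestra comparison corresponds to a speedup of roughly 180× using rounded figures (six hours for the CPU run; "a little over two minutes" is approximated as 120 seconds here, which is an assumption):

```python
def speedup(cpu_seconds, gpu_seconds):
    # How many times faster the GPU run finished.
    return cpu_seconds / gpu_seconds

# ~6 CPU hours versus an assumed 120 s on the dual-GPU machine:
olestra_speedup = speedup(6 * 3600, 120)  # 180.0
```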
To reiterate, this study should be taken with a grain of
salt, as it took place in 2009, before the discussed graphical
memory interfaces HBM and GDDR5X were even created.
However, considering the increased computational speeds
using just two GPUs meant for gaming, we can extrapolate
that our two GPU memories of interest could only help
increase the computational speeds and efficiency of this
quantum chemistry research.
The larger amount of GRAM available also means that
studios can reduce the number of GPUs needed to complete
a render in the same amount of time as an array of
GDDR5X-based cards. This allows studios to devote more
resources to creating new techniques for simulating and
rendering virtual spaces, with less regard for computing
times.
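The packet idea behind Hyperion can be illustrated with a toy binning scheme. This is an assumption-laden sketch, not Disney's actual algorithm: rays are ordered by the dominant axis of their direction vector, then cut into fixed-size packets so that rays heading roughly the same way are dispatched to the GPU together.

```python
def group_rays_into_packets(rays, packet_size=4):
    # Each ray is an (origin, direction) pair of 3-tuples. Sort by the
    # dominant axis of the direction vector, then slice the sorted list
    # into fixed-size packets so similar rays travel together.
    def dominant_axis(ray):
        direction = ray[1]
        return max(range(3), key=lambda i: abs(direction[i]))
    ordered = sorted(rays, key=dominant_axis)
    return [ordered[i:i + packet_size]
            for i in range(0, len(ordered), packet_size)]
```

Grouping rays this way improves memory coherence: rays in one packet tend to traverse the same parts of the scene, so the data a packet needs fits in fewer, larger transfers across the memory bus.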
Sustainability also applies to companies that wish to
become more efficient, and thus more “sustainable.”
Relational database management systems (RDBMS) are
another structure that benefits from GPUs and their
capability. Compared to CPUs, GPUs and their high
bandwidth memory prove useful for tasks such as database
processing for businesses. In his paper “Efficiency of GPUs
for Relational Database Engine Processing,” author Samuel
Cremer notes that, with energy needs growing, “GPUs are
more efficient than CPUs for high and intensive computing”
[11]. The parallel functionality of GPU cores proves useful
and can handle large vectorized data processes. Also,
compared to CPU Random Access Memory, GPU GRAM
has nearly 10 times the bandwidth [11]. These facets allow
GPUs to outperform CPUs in RDBMS in almost every area,
except in a few very specific ways.
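The vectorized style Cremer describes can be contrasted with row-at-a-time processing in a small sketch. The column-store layout and the `price` predicate are invented for illustration:

```python
def filter_rows_scalar(rows, min_price):
    # Row-at-a-time scan: the CPU-style inner loop over records.
    return [row for row in rows if row["price"] >= min_price]

def filter_rows_columnar(columns, min_price):
    # Column-at-a-time scan over a columnar table: one predicate
    # applied across a whole column, the access pattern that lets a
    # GPU test thousands of values in parallel. Returns row indices.
    return [i for i, price in enumerate(columns["price"])
            if price >= min_price]
```

The columnar version touches only the column the predicate needs, in one contiguous sweep, which is exactly the kind of wide, regular memory access that high-bandwidth graphics memory serves well.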
Cloud computing has become a popular buzzword and
equally ubiquitous. In fact, Amazon's cloud services division
is by far one of its largest, generating nearly $13 billion
last year and holding over 30% of the cloud services market [10].
Amazon Web Services comprises the warehouses of server
arrays that Amazon uses to host its website and rents out to
third parties. Currently, a customer may rent out any number
of clusters, which are simply collections of CPUs and GPUs,
and send them instructions to perform some computation for
a certain amount of time. Some businesses run market
simulations for future investments, or render a video when
they do not have the resources locally to do so [11]. Since the
range of tasks is so broad, HBM is not yet widely used, since
it only shines in asset-heavy workloads. With future
iterations, though, it could free up server space by a considerable
factor thanks to its smaller PCB footprint and vertical stacking
of memory modules, meaning denser warehouses and
more widely available, cheaper cloud computing services.
Benefit for Fields of Academia
Besides benefiting industries like Amazon and Disney,
new high bandwidth memory in GPUs is also proving
invaluable by improving sustainability in various fields of
academia, including mathematics, physics, and chemistry.
Advancements in graphics memory have allowed these fields
to simulate previously arduous models in topics like
non-linear physics and quantum chemistry. However, of the two
studies, the chemistry study should be taken with
reservations, as it took place in 2009, which is out of the
period of the discussed memory types.
For his Master's thesis, electrical engineer Shenhao Yan
writes about how GPUs can be utilized to complete accurate
and efficient simulations of non-linear power electronic
circuits [12].
WHAT’S AHEAD FOR GPU MEMORY AND
SUSTAINABILITY
Cost of HBM
The major predicament that’s keeping HBM from
dethroning GDDR5X is the sheer price gap between the two
competing memories. Most recently, NVIDIA has revealed
their own take on HBM, in the form of HBM2 on their Pascal
architecture, for their upcoming line of Quadro
graphics cards [15]. However, graphics cards like the Quadro
P6000 cost upwards of $5,000, which is most certainly not a
feasible option for the average consumer. NVIDIA is pushing
its line of HBM cards for pure workstation and CAD
systems, not particularly for the casual gamer. AMD
similarly just announced its own line of HBM graphics cards,
albeit more geared towards average consumers in terms of
price [14]. However, as newer iterations of HBM are
released by these competing companies in the form of
HBM2, the price and power of these cards will continue to
rise while staying power efficient. Until this happens,
GDDR5X will remain the go-to memory type for GPUs.
reach this goal. Nonetheless, HBM still must jump through
the monetary hoop to be relevant to the average consumer, as
it is unlikely that the average consumer will drop $5,000
on a graphics card, no matter how efficient it may be.
However, if GDDR5X does not see a drastic iteration, it will
likely be phased out by HBM within the next decade, leading
to the most energy-efficient graphics memory to date for
consumers and businesses alike.
SOURCES CITED

[1] “What is GPU Accelerated Computing?” Nvidia. 2017.
http://www.nvidia.com/object/what-is-gpu-computing.html
[2] “Disney's Hyperion Renderer.” Walt Disney Animation
Studios. 2016.
https://www.disneyanimation.com/technology/innovations/hyperion
[3] J. Ghorpade. “An Introduction to Graphical Processing
Unit.” IJMER. 2011. Accessed 2.20.2017.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.416.5517&rep=rep1&type=pdf
[3] “What is Cloud Computing?” Amazon Web Services.
2016.
https://aws.amazon.com/what-is-cloud-computing/
[4] C. Kim. “Memory Interfaces: Past, Present, and Future.”
IEEE Solid-State Circuits Magazine. 2012.
http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=7495083&tag=1
[5] J. Ajanovic. “PCI Express (PCIe) 3.0 Accelerator
Features.” Intel Corp. 2008. Accessed 2.25.2017.
http://www.thailand.intel.com/content/dam/doc/white-paper/pci-express3-accelerator-white-paper.pdf
[6] K. Jungrae. “All-Inclusive ECC: Thorough End-to-End
Protection for Reliable Computer Memory.” 43rd Annual
International Symposium on Computer Architecture (ISCA).
2016.
http://ieeexplore.ieee.org/abstract/document/7551427/
[7] “High Bandwidth Memory, Reinventing Memory
Technology.” AMD. 2015. Accessed 2.21.2017.
http://www.amd.com/en-us/innovations/software-technologies/hbm
[8] A. Shilov. “Discrete Desktop GPU Market Trends Q2
2016: AMD Grabs Market Share, but NVIDIA Remains on
Top.” AnandTech. 9.13.2016. Accessed 2.25.2017.
http://www.anandtech.com/show/10613/discrete-desktop-gpu-market-trends-q2-2016-amd-grabs-market-share-but-nvidia-remains-on-top
[9] R. Wright. “AMD R9 Nano 4GB (HBMv1) CrossFire
Review.” eTeknix. 2016. Accessed 2.28.2017.
http://www.eteknix.com/amd-r9-nano-4gb-hbmv1-crossfire-review/
[10] “Amazon Web Services: 2016 In Review.” Forbes.
12.29.2016. Accessed 2.25.2017.
https://www.forbes.com/sites/greatspeculations/2016/12/29/amazon-web-services-2016-in-review/

Accessibility and Sustainability
Price sustainability is a very important factor that
determines whether a product will allow a company to turn a
profit. If no consumers, corporations, or universities invested
in HBM, companies like Nvidia and AMD would likely
discontinue the technology. However, this is simply not the
case. HBM may be expensive, but it has sold well enough for
these companies to continue investing in and improving
the technology. As mentioned previously, Nvidia recently
released its own HBM2 with its Pascal series of
graphics cards, which perform just as well as their GDDR5X
counterparts [15]. AMD is successfully selling consumer-grade
versions of its cards for less than $1,000 [14].
HBM is not a technological fad; it is set to be a true
advancement for graphics cards that is sustainable for
everyone involved, whether it be the company designing the
card, or the industry that buys it.
Single Channel HBM
The unfortunate drawback of HBM so early in its
development is its lack of single-channel data transfer speed
relative to GDDR5X [5]. Because a goal of HBM's
development at AMD was vastly superior simultaneous
data transfer, a compromise was made to allow many
memory modules to be accessed at the same time. This
compromise left it handicapped in gaming, which is what
most consumer GPUs are sold for. Hence, AMD's R9 Fury X
did not perform as well as hoped compared to competitors,
and it ran so hot that AMD sold it with a built-on
water-cooling apparatus instead of a traditional air cooler. An
improvement in single-channel performance would reduce
the strain on the processing cluster and allow lower
clock speeds to achieve the same performance, which in turn
means lower temperatures in most consumer workloads.
Going forward,
that will be the biggest improvement consumers and
businesses will be looking for to make HBM more
ubiquitous.
The Final Word
As of right now, GDDR5X will remain the preferable
memory type, for its single-lane data transfer speeds and cost
alone. It performs well enough for consumer applications
like gaming and content creation in the meantime. HBM,
given time, will eventually overtake GDDR5X in terms of
power and efficiency, but currently it has yet to
[11] S. Cremer. “Efficiency of GPUs for Relational Database
Engine Processing.” International Conference on Algorithms
and Architectures for Parallel Processing. 2016. Accessed
1.15.2017.
http://link.springer.com/chapter/10.1007/978-3-319-49956-7_18
[12] S. Yan. “Large-Scale Power Electronic Circuit
Simulation on a Massively Parallel Architecture.” Education
and Research Archive of the University of Alberta. 2016.
Accessed 1.15.2017.
https://era.library.ualberta.ca/files/c348
[13] I. Ufimtsev. “Quantum Chemistry on Graphical
Processing Units. 3. Analytical Energy Gradients, Geometry
Optimization, and First Principles Molecular Dynamics.”
Journal of Chemical Theory and Computation. 2009.
Accessed 2.20.2017.
http://pubs.acs.org/doi/pdf/10.1021/ct9003004
[14] “AMD Radeon R9 Series Gaming Graphics Cards with
High-Bandwidth Memory.” AMD. 2017. Accessed
2.22.2017.
http://www.amd.com/en-us/products/graphics/desktop/r9
[15] R. Smith. “NVIDIA Announces Quadro GP100 – Big
Pascal Comes to Workstations.” AnandTech. 2.5.2017.
Accessed 2.20.2017.
http://www.anandtech.com/show/11102/nvidia-announces-quadro-gp100
ACKNOWLEDGMENTS
A man once said to me, “Knowledge is the ultimate
form of power.” That man was Tim Cibos, and he is a fellow
resident on floor 5 of Forbes Hall here at the University of
Pittsburgh. Tim is from a small town in Central Pennsylvania,
and has been an inspiration to us through the whole process.
His academic integrity, grit and devotion have been
incredibly inspiring over the course of this paper. He
supported both of us by providing intellectual support and
being a good person to bounce ideas and writing quality off
of. Similarly, Glenn Louis Mursky has also been a source of
emotional support through this difficult writing process. His
kind words have gotten us through thick and thin. We would
like to thank our writing instructor, Tim Maddocks and our
Co-Chair, Nick Haver for helping us through learning the
process of writing a research paper. We would also like to
thank Floor 5 of Forbes Hall in general for giving feedback
on our paper topic and for genuinely being the best people we
have had the pleasure of working with. We wish them the best
on all of their papers as well.