Slide Set 1 - for ENCM 501 in Winter Term, 2017

Slide Set 1
for ENCM 501 in Winter Term, 2017
Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary
Winter Term, 2017
ENCM 501 W17 Lectures: Slide Set 1
Contents
About these slides
Review of Computer Organization Basics
What does “Computer Architecture” mean?
ENCM 501 Course Topics
Trends in computer system performance
Classes of computer
Trends in Technology
Preliminaries for energy and power use
Energy and power
Looking Ahead
About these slides
This is the first of around ten to fifteen large sets of slides that
will be used for lectures in ENCM 501 in Winter 2017.
It will usually take 2–3 lectures to get through a single set of
slides.
Reading these slides online is not a good substitute for
attending lectures—in most lectures I will do some important
hand-written work using the document camera. Please come
to lectures prepared to take some notes.
Typographical conventions
Either bold text or bright red text will be used for emphasis.
The typewriter font will usually be used for code in
assembly language, C, or C++. (I might not use the typewriter
font for code if it makes the code too wide to fit in a slide.)
Text in a box is a general description of what could appear
within a piece of code.
Example: A C do statement has this syntax . . .
do
statement
while ( expression );
(Usually statement is a compound statement that starts
with { and ends with } .)
Typographical conventions: Italics
Italics will be used two different ways.
One word or a few words in italics will be used to formally or
informally define a term.
Example: A bit is the basic unit of information in a digital
system; the value of a bit is either 0 or 1.
An entire sentence in italics indicates a pause to elaborate a
concept or solve a problem under the document camera.
Example: Let’s translate the C statement into a sequence of
assembly language instructions.
Review of Computer Organization Basics
The next several slides will review concepts covered early in
ENCM 369, and make comments about how those concepts
are related to current computer design.
Organization of a Simple Computer
[Figure: block diagram of a simple computer, with a processor,
main memory, and several I/O devices connected by a bus.]
- What is a bus?
- What is the role of the processor?
- What is the role of the main memory?
- What does I/O stand for?
- What are important categories of I/O devices?
Operation of our Simple Computer
Within the processor, there is a special-purpose register called
the program counter (or PC). The PC holds the memory
address of the next instruction to be executed.
When our computer is powered up, some kind of initialization
circuit puts a specific address into the PC. After that, the
processor repeats two steps, Step 1 and Step 2, over and over,
until the computer is powered down.
Operation of our Simple Computer, continued
Step 1: Fetch an instruction and update the PC. Copy
one or more bytes, starting at the address in the PC, into the
processor, then make the PC point at the next byte in memory
that wasn’t part of the instruction.
Remark: Step 1 is simple if all instructions are the same size,
but can be quite messy if some instructions occupy more bytes
of memory than other instructions.
Step 2: Execute the instruction. Perform whatever tiny,
simple task is specified by the instruction.
Let’s make an informal list of the kinds of instructions we’ve
seen in various computer instruction sets, with one or more
specific examples of each kind of instruction.
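The two-step cycle described above can be sketched as a small Python simulation. This is only an illustration under stated assumptions, not a real ISA: the instruction set (LOADI, ADDI, JNZ, HALT), the single accumulator register, and the one-instruction-per-memory-slot encoding are all made up for this sketch. HALT is also an assumption so that the simulation terminates; the simple computer in the slides keeps cycling until it is powered down.

```python
def run(memory, pc=0, max_steps=1000):
    """Simulate a toy processor and return the final accumulator value."""
    acc = 0  # hypothetical accumulator register
    for _ in range(max_steps):
        # Step 1: fetch the instruction at the PC, then update the PC.
        opcode, operand = memory[pc]
        pc += 1  # all instructions are the same size, so this is trivial

        # Step 2: execute the instruction.
        if opcode == "LOADI":      # load an immediate value
            acc = operand
        elif opcode == "ADDI":     # add an immediate value
            acc += operand
        elif opcode == "JNZ":      # jump if the accumulator is nonzero
            if acc != 0:
                pc = operand
        elif opcode == "HALT":     # stop the simulation
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    raise RuntimeError("step limit reached")


# Count down from 3 to 0, then halt.
program = [
    ("LOADI", 3),   # address 0
    ("ADDI", -1),   # address 1
    ("JNZ", 1),     # address 2: loop back to address 1 while acc != 0
    ("HALT", 0),    # address 3
]
result = run(program)
```

Note that a branch such as JNZ is just an instruction whose execution step overwrites the PC that the fetch step had already updated.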
Step 1, then Step 2, then Step 1, then Step 2 . . .
This model for computer operation is simultaneously seriously
misleading regarding current computer systems (except
perhaps the very “lowest-end” embedded computers) and
supremely important.
Let’s make a list of reasons why the model is seriously
misleading.
But why then is the model still supremely important?
What does “Computer Architecture” mean?
It is surprisingly hard to come up with a simple, short
definition of computer architecture. It’s kind of an “umbrella”
term that includes a bunch of related ideas and activities.
Let’s start at the level of instructions . . .
- What instructions are available to applications
  programmers? This is often called instruction set
  architecture, or ISA.
- What additional instructions are provided to operating
  system kernel programmers? (Examples: Instructions to
  query system state when an interrupt occurs, to manage
  virtual memory hardware, to control I/O devices, and so
  on.)
Now let’s move down one or two levels of abstraction . . .
- Given the ISA, how exactly are instructions handled by
  processors: how deep are pipelines; can instructions be
  executed out-of-order?
  How is the memory system organized to minimize loss of
  clock cycles in fetching instructions and reading and
  writing data?
  This category of concern is sometimes called
  microarchitecture or organization.
- Given a microarchitecture, what are good ways to
  implement it at the integrated circuit and printed circuit
  board levels? These are hardware design problems.
It’s good to have a broad perspective on architecture
Obviously, ISA choice dictates much about microarchitecture,
and microarchitecture dictates much about hardware.
But the influences also work in the opposite direction, from
lower to higher levels of abstraction.
Costs of hardware design, hardware verification, and fabrication
(chip production) make some microarchitectures attractive
and others less attractive.
Aspects of microarchitecture matter when a new ISA is
designed or an existing ISA is extended. Preference for
relatively simple, clean microarchitecture might rule out some
useful instructions.
ENCM 501 Course Topics
- introduction to computer system design goals and
  performance measurement (textbook, Chapter 1)
- brief overview of ISA principles (parts of Appendix A)
- memory system design and performance assessment
  (parts of Appendix B and Chapter 2)
- aspects of instruction-level parallelism (parts of
  Appendix C and Chapter 3)
- aspects of thread-level parallelism (TLP) (parts of
  Chapter 5)
- introduction to programming with TLP (not covered in
  textbook)
Trends in computer system performance
The next slide shows a plot of “benchmark” performance
scores for various computers, showing the years various
systems were introduced.
“Performance” here means roughly 1 over the time taken to
complete a collection of processor-intensive tasks. (We’ll look
much more carefully at performance measurement in future
lectures.)
(The text on the plot will be pretty much illegible in the
classroom, but we can still make a few important points by
looking at it.)
[Figure: plot of processor performance relative to the VAX-11/780
(logarithmic vertical axis, 1 to 100,000) against year of
introduction (1978 to 2012). Data points run from the VAX-11/780
(5 MHz, score 1) up to a 6-core Intel Xeon (3.3 GHz, score 24,129).
Three trend segments are labelled 25%/year, 52%/year, and 22%/year.]
Image is Figure 1.1 from Hennessy J. L. and Patterson D. A., Computer
Architecture: A Quantitative Approach, 5th ed., © 2012, Elsevier, Inc.
Performance ratio, 2010 compared to 1978: about 24,000 to 1. In
other words, what took about 7 hours in 1978 took about 1 second
in 2010.
From 1986 to 2003, the average annual performance improvement
was 52% per year.
From 2003 to 2010, the average annual performance improvement
was 22% per year—the pace of improvement has slowed in recent
years.
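As a rough check on what these annual rates imply, the compound growth over each period can be computed directly. This is a minimal sketch; the function names are made up for illustration, and the rates and year ranges are the ones quoted above.

```python
import math


def growth_factor(rate_per_year, years):
    """Total improvement after compounding an annual rate over some years."""
    return (1.0 + rate_per_year) ** years


def doubling_time(rate_per_year):
    """Years needed for performance to double at a given annual rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


factor_1986_2003 = growth_factor(0.52, 2003 - 1986)  # roughly 1,200-fold
factor_2003_2010 = growth_factor(0.22, 2010 - 2003)  # roughly 4-fold

doubling_at_52 = doubling_time(0.52)  # a bit under 1.7 years
doubling_at_22 = doubling_time(0.22)  # about 3.5 years
```

Seventeen years at 52% per year gives a factor of over a thousand, while seven years at 22% per year gives only about a factor of four, which is what "the pace of improvement has slowed" means in numbers.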
There have been comparable improvements in telecommunication
bandwidth and data storage capacity.
The result is a pattern that has been seen over and over: Computer
applications go from impossible to practically unaffordable to
cheap and commonplace over periods of several years.
Classes of computer
In Section 1.2, Hennessy and Patterson divide computer
systems into five classes. Knowing what they mean will help in
following the textbook!
- personal mobile device (PMD): things like smartphones
  and tablets.
- desktop: what most of us would call “desktops”, and
  also laptops. This is a somewhat unusual definition, but
  makes sense as use cases and requirements are broadly
  similar.
- servers
- clusters and warehouse-scale computers: systems large
  enough to support operations like Google, Amazon, etc.
- embedded computers: computers built into machines
  such as appliances, cars, telecom infrastructure.
Also in Section 1.2, Hennessy and Patterson make some
distinctions between various kinds of parallelism in hardware
design.
You can read that material to get a general idea about the
diverse forms of parallel computation, but we won’t worry
about the details until much later in the course.
A good “takeaway”: If somebody tells you in a vague way that
an algorithm uses parallel processing, you should ask, What
kind of parallel processing?
Trends in Technology
Textbook reference: Section 1.4.
Moore’s law is attributed to Gordon E. Moore, a co-founder of
Intel. The idea dates back at least to 1965, and probably
earlier. (There is a lengthy historical discussion on the
Wikipedia page for Moore’s Law.)
It isn’t really a physical law; it was more of an observation and
prediction about integrated circuit (IC) technology. The
general projection was that the number of transistors in a
typical state-of-the-art IC chip would double every two years or
so.
Moore’s law example
Transistor counts for some famous Intel processors . . .

processor    year   # transistors      clock frequency
80386        1985   275 thousand       16–25 MHz
80486        1989   1.2 million        25–100 MHz
Pentium      1993   3.2–4.5 million    60–300 MHz
Pentium II   1997   7.5 million        233–450 MHz

(Moore’s law didn’t make an explicit forecast about clock
speed, but decreasing transistor size tended to correspond to
decreasing transistor switching time.)
Using 1985 as a starting point, let’s estimate the transistor
count for an Intel Core i7 chip in 2010.
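One way to set up this estimate is sketched below. It assumes a strict doubling every two years starting from the 80386 figure in the table; the function name and the exact two-year period are assumptions for illustration, since Moore's law only says "every two years or so".

```python
def moore_estimate(count0, year0, year, doubling_period_years=2.0):
    """Project a transistor count forward, doubling once per period."""
    doublings = (year - year0) / doubling_period_years
    return count0 * 2.0 ** doublings


# 80386 in 1985: 275 thousand transistors (from the table above).
# 25 years is 12.5 doublings, so the projection is about 1.6 billion.
estimate_2010 = moore_estimate(275_000, 1985, 2010)

# The same rule applied to 1997 gives 17.6 million, higher than the
# 7.5 million listed for the Pentium II, so the fit is only approximate.
estimate_1997 = moore_estimate(275_000, 1985, 1997)
```

The result is quite sensitive to the assumed doubling period, which is worth keeping in mind when comparing the estimate with a real chip.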
Data source: Table 7.7 in Harris D. M. and Harris S. L., Digital Design
and Computer Architecture, 2nd ed., © 2013, Elsevier, Inc.
Limits to Moore’s law
Moore’s law has been reasonably accurate for much longer
than might have been predicted decades ago.
Your instructor is neither an IC design expert nor an IC process
expert, so will refrain from giving a detailed opinion about
when Moore’s law will finally fail.
“Node size” in the latest available Intel processors is
10 nanometers, and Intel is projecting a 5-nanometer “node”
several years from now, so it seems that Moore’s law hasn’t
yet run into a wall.
However, even at 10 nanometers, linear transistor dimensions
are now in the tens of atoms in a silicon crystal lattice, so
it’s clear that the number of “shrinks” left must be limited.
Trends in Technology, continued
DRAM (dynamic RAM) is the IC technology used to build
main memories for the last several decades. Moore’s law has
applied to DRAM just as it has to processor chips—see the
“Mbits/DRAM chip” row in Figure 1.10 in the textbook.
Unfortunately, improvement of DRAM latency hasn’t nearly
matched improvement of DRAM capacity, necessitating the
design of complex caching systems within memory
hierarchies, and limiting the performance of programs that
need to access large amounts of memory.
Density of nonvolatile storage—magnetic disks and Flash
chips—has also seen decades of exponential growth.
Bandwidth and latency
These terms are very important.
Bandwidth, generally, is the peak rate at which simple tasks
can be performed, e.g., arithmetic operations per second
within a processor, or bytes transferred per second between a
DRAM module and a processor chip. (This definition is related
to but not exactly the same as the definition of bandwidth in
other areas of engineering, such as signal processing.)
Latency is the time delay between the moment a task is
started and the moment that task is completed.
Bandwidth and latency, continued
Relatively poor latency is a major design issue.
Current DRAM latency is approximately 100 times a processor
clock cycle, meaning that to get decent processor performance, the
vast majority of instruction fetches, data loads and data stores
must not require actual DRAM access—instead most memory
accesses need to be handled by caches on the processor chip.
Magnetic disk drive latency is horrible compared to speed of just
about anything implemented entirely with integrated circuits. 2010
numbers . . .

kind of time interval    duration (ns)
processor clock cycle    0.3
DRAM latency             37
disk latency             3,600,000
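To put the 2010 numbers above on a common scale, the latencies can be expressed in processor clock cycles. This is a small sketch using only the three durations from the table; the variable names are made up for illustration.

```python
# 2010 durations in nanoseconds, taken from the table above.
CLOCK_CYCLE_NS = 0.3
DRAM_LATENCY_NS = 37
DISK_LATENCY_NS = 3_600_000

# Number of processor clock cycles that pass during one access.
dram_cycles = DRAM_LATENCY_NS / CLOCK_CYCLE_NS  # a little over 120 cycles
disk_cycles = DISK_LATENCY_NS / CLOCK_CYCLE_NS  # about 12 million cycles
```

A DRAM access costs on the order of a hundred clock cycles, consistent with the "approximately 100 times" figure, and a disk access costs on the order of ten million.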
Preliminaries for energy and power use
Here is a model for a generic CMOS logic gate . . .
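[Figure: generic CMOS logic gate. A pull-up network sits between
VDD and the gate output, a pull-down network sits between the gate
output and ground, the gate inputs drive both networks, and the
gate output drives a load capacitance C.]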
C is the sum of all wire capacitances and gate input
capacitances driven by our generic gate.
Models for our CMOS gate trying to output 0 or 1 . . .
[Figure: two resistor-and-switch models of the gate, side by side,
each with labels VDD, RPU, gate output, C, and RPD.]
Which circuit (left or right) is trying to generate a logic 0
output, and which is trying to generate a logic 1?
[Figure: the two resistor-and-switch models of the gate, each with
labels VDD, RPU, gate output, C, and RPD.]
What are the energy flows when the gate output goes from
logic 0 to logic 1? What are they when the gate output goes
from logic 1 to logic 0?
The next two slides try to show why energy losses (heat
generation) in 1 → 0 and 0 → 1 transitions of CMOS gate
outputs are both $\frac{1}{2} C V_{DD}^2$, regardless of how well or
poorly the pull-down and pull-up networks conduct.
My math depends on a crude resistor-and-switch model for
pull-down and pull-up networks, but I’m pretty sure the same
results can be derived without making such rough assumptions
about NMOS and PMOS transistors.
(This course is Computer Architecture, not Digital CMOS
VLSI, so I’m not going to put any more time into this issue!)
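1 → 0 transition: Let t = 0 be the instant when the input
switches to cause a 1 → 0 transition on the output. (In reality,
input changes are not instant.)

$$V_{out}(t) = V_{DD} \exp\left(\frac{-t}{R_{PD} C}\right)$$

[Figure: resistor-and-switch model of the gate, with labels VDD,
RPU, Vout, C, and RPD.]

So energy lost in RPD is

$$\begin{aligned}
\int_{t=0}^{\infty} \frac{V_{out}(t)^2}{R_{PD}}\,dt
&= \frac{V_{DD}^2}{R_{PD}} \int_{t=0}^{\infty} \exp\left(\frac{-2t}{R_{PD} C}\right) dt \\
&= \frac{V_{DD}^2}{R_{PD}} \left[ \frac{-R_{PD} C}{2} \exp\left(\frac{-2t}{R_{PD} C}\right) \right]_{t=0}^{\infty} \\
&= \frac{1}{2} C V_{DD}^2
\end{aligned}$$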
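0 → 1 transition: Let t = 0 be the instant when the input
switches to cause a 0 → 1 transition on the output.

$$V_{out}(t) = V_{DD} \left[ 1 - \exp\left(\frac{-t}{R_{PU} C}\right) \right]$$

$$V_{PU}(t) = V_{DD} \exp\left(\frac{-t}{R_{PU} C}\right)$$

[Figure: resistor-and-switch model of the gate, with labels VDD,
RPU, VPU (the voltage across RPU), Vout, C, and RPD.]

Energy lost in RPU is

$$\begin{aligned}
\int_{t=0}^{\infty} \frac{V_{PU}(t)^2}{R_{PU}}\,dt
&= \frac{V_{DD}^2}{R_{PU}} \left[ \frac{-R_{PU} C}{2} \exp\left(\frac{-2t}{R_{PU} C}\right) \right]_{t=0}^{\infty} \\
&= \frac{1}{2} C V_{DD}^2
\end{aligned}$$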
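The claim that the loss does not depend on the resistance can be checked numerically. The sketch below integrates the resistor power for the same crude resistor-and-switch model, using a midpoint rule over 20 time constants. The component values (1 fF, 1 V, and the three resistances) are arbitrary assumptions chosen only for illustration. Because the voltage across the conducting resistor has the same exponential form in both transitions, one function covers both cases.

```python
import math


def resistor_loss(c, vdd, r, steps=100_000, time_constants=20.0):
    """Numerically integrate V(t)^2 / R, with V(t) = VDD * exp(-t / (R*C))."""
    tau = r * c
    dt = time_constants * tau / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                # midpoint of the k-th interval
        v = vdd * math.exp(-t / tau)      # voltage across the resistor
        total += (v * v / r) * dt         # power times time step
    return total


C = 1e-15    # assumed load capacitance: 1 fF
VDD = 1.0    # assumed supply voltage: 1 V
expected = 0.5 * C * VDD ** 2

# Try a well-conducting, a middling, and a poorly conducting network.
losses = {r: resistor_loss(C, VDD, r) for r in (1e3, 1e4, 1e5)}
```

A larger resistance makes the transition slower but dissipates less power at each instant, and the two effects cancel exactly in the integral.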
Energy and power
Power is the time rate of energy use. (That should not be a
new idea for 4th-year engineering students!)
$$\text{instantaneous power} = \frac{d}{dt}\,(\text{energy use})$$

$$\text{average power} = \frac{\text{energy use over time interval}}{\text{duration of time interval}}$$
Energy and power use of a single logic gate
The energy spent per clock cycle of a gate with an output that
makes a 0 → 1 or 1 → 0 transition every single clock cycle is
$\frac{1}{2} C V_{DD}^2$.
If the clock period is T, the frequency is f = 1/T, so the
power use by the gate is

$$\frac{1}{2} C V_{DD}^2 / T = \frac{1}{2} C V_{DD}^2 f .$$
The equations are correct but an assumption here is incorrect.
Why is this not a good model for power use by a logic gate in
a processor circuit?
Energy and power use of a processor chip (1)
A useful concept, unfortunately not mentioned in Section 1.5
of your textbook, is a, the activity factor.
Let Ctotal be the sum of all of the capacitive loads for all of the
logic gates in an IC. Then a Ctotal is the average capacitive load
that actually does a 0 → 1 or 1 → 0 transition in a clock cycle.
Why is a much less than 1 for a modern processor chip?
How could a be greater than 1 for certain small regions
within a modern processor chip?
Which is a better way to think?
- a is not hard for engineers to estimate, and is pretty much
  determined by the design of a processor chip.
- a is scarily unpredictable.
Energy and power use of a processor chip (2)
Two formulas, assuming that a varies over time . . .
Energy used and heat that must be dissipated in a single clock
cycle, due to switching:

$$E_{dynamic} = \frac{1}{2}\, a(t)\, C_{total} V_{DD}^2 .$$

Power consumption:

$$P_{dynamic} = \frac{1}{2}\, a(t)\, C_{total} V_{DD}^2 f .$$
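The two formulas translate directly into code. In the sketch below the function names are made up, and the numerical values (activity factor 0.1, total capacitance 100 nF, 1.0 V supply, 3 GHz clock) are invented round numbers for illustration only, not measurements of any real chip.

```python
def dynamic_energy_per_cycle(a, c_total, vdd):
    """Switching energy in one clock cycle: (1/2) * a * C_total * VDD^2."""
    return 0.5 * a * c_total * vdd ** 2


def dynamic_power(a, c_total, vdd, f):
    """Switching power: energy per cycle times clock frequency."""
    return dynamic_energy_per_cycle(a, c_total, vdd) * f


# Illustrative values only (assumptions, not data for a real processor).
A = 0.1            # activity factor
C_TOTAL = 100e-9   # total switched capacitance, in farads
VDD = 1.0          # supply voltage, in volts
F = 3e9            # clock frequency, in hertz

energy_j = dynamic_energy_per_cycle(A, C_TOTAL, VDD)  # 5 nJ per cycle
power_w = dynamic_power(A, C_TOTAL, VDD, F)           # 15 W
```

Since a(t) varies with the workload, a real calculation would evaluate these for each interval of interest rather than for a single fixed a.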
Energy and power use of a processor chip (3)
An ideal CMOS logic gate does not consume any power when
its output is not switching, because either its pull-up network
or its pull-down network is completely turned off.
In real CMOS ICs, however, there are various paths for
current to leak from VDD to ground:

$$P_{static} = V_{DD} I_{leakage}$$

This is a major concern at both ends of the computing
spectrum:
- It gradually drains batteries in battery-powered embedded
  systems.
- It wastes energy in servers that spend significant time
  idle, waiting for tasks to arrive.
Both energy and power matter in processor design
Because most processors are idle much of the time, energy
spent on a typical task is a good measure of the efficiency of a
processor.
However, power at maximum load is critical as well . . .
- The power supply must be able to supply the needed
  current without dropping VDD.
- The cooling system must be capable of removing heat at
  a rate equal to average power during sustained heavy
  load.
Energy and power management in processor chips
A simple processor chip is either on or off. When it’s on, the
whole chip is on, and VDD and f are fixed.
More complex processor chips . . .
- turn off idle regions within the chip;
- use DVFS (dynamic voltage-frequency scaling), in which VDD
  and f go up and down with the processor load.
DVFS relies on the fact that a CMOS circuit can operate
correctly over a wide range of VDD values. Lower VDD is more
energy-efficient but results in slower switching times, so when
VDD is reduced, f must be reduced as well.
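The effect of scaling VDD and f together can be sketched with the dynamic power formula from earlier in this slide set, in which power is proportional to VDD squared times f. The sketch makes several simplifying assumptions: static (leakage) power is ignored, the activity factor and total capacitance stay constant, and a task needs a fixed number of clock cycles. The function name and the 15% reduction are made up for illustration.

```python
def dvfs_scaling(v_scale, f_scale):
    """Relative dynamic power, run time, and energy for one fixed task.

    v_scale and f_scale are new-to-old ratios for VDD and f.
    """
    power = v_scale ** 2 * f_scale   # P is proportional to VDD^2 * f
    time = 1.0 / f_scale             # fixed cycle count, slower clock
    energy = power * time            # works out to v_scale ** 2
    return power, time, energy


# Reduce both VDD and f by 15% (an arbitrary example).
rel_power, rel_time, rel_energy = dvfs_scaling(0.85, 0.85)
```

Under these assumptions the task takes about 18% longer, but dynamic power drops to about 61% of its original value and the energy for the task drops to about 72%. Lowering f alone would cut power but leave the energy per task unchanged, which is why the voltage reduction matters.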
Looking Ahead
In ENCM 501, we won’t study the material in textbook
Section 1.6 (Trends in Cost) and Section 1.7 (Dependability).
We will look in detail at some of the topics in Sections 1.8 and
1.9, then move on to Appendix A, which is about instruction
set design.
