Computer Architecture and Digital Technology, 10 c
18-347 – Fall 2003
Lec.03 - 1
Intro
Instructor: Stefanos Kaxiras
Few things about me:
Just moved to Uppsala
Senior Lecturer – Computer Architecture
Prof., Univ. of Patras, Greece, 2003–
Bell Labs (Unix & C group), 1998–2003
Ph.D., Univ. of Wisconsin, 1998
Don’t speak Swedish …
Course taught in English … with your participation!
What the course is about:
“Computer organization” + Digital Design
Interface to the bare H/W
Transistors → digital circuits → arithmetic logic units (adders) → processors & memory → computers
Basic operations → microprogramming / microarchitecture → instruction set architecture (ISA) → Assembly
After finishing this course, you should be able to:
Understand the functionality and operation of the basic elements of a
computer system, including processor, memory, input/output
Reason about first-order performance of a computer system
Understand the hardware/software interface
Understand and write programs in assembly language
Grading, Assignments, Etc.
Final exam
Will test the knowledge presented in the lectures and the assembly language experience gained in the labs
ASSIGNMENTS
Assembly (MIPS assembly)
Logic synthesis of a processor in LogicSIM
Teaching assistants: David Elköv, Vasileios Spiliopoulos
Course grade
Pass assignments: 3
Pass exams: > 3
Books
Computer Organization and Design: The Hardware/Software Interface, by Patterson & Hennessy, 4th edition, Morgan Kaufmann, 2007.
Note that the 3rd edition is not significantly different and can substitute for the 4th edition.
Assignments
Assembly
Assignments (cont.)
LogicSIM
Publicly available for download
Easy digital design
Goal: to build a processor to execute instructions
Overview of the Course
The hardware/software interface
- Introduction + history
- Computer Arithmetic
- Performance
- Instruction Set Architecture
- Assembly programming
Processors (after basic Digital Design)
- Processor structure & operation
- ALUs, control, datapath
- Single-cycle implementation
- Multicycle implementation
- Pipelining
- Advanced architecture
Memory
- Memory system and caches
- Virtual memory
I/O
Schedule (The hardware/software interface)
Week 12
Tue 22 Mar, 13:15-15:00 – Introduction
Thu 24 Mar, 10:15-12:00 – Arithmetic
Week 13, 2011
Mon 28 Mar, 10:15-12:00 – More Arithmetic
Wed 30 Mar, 10:15-12:00 – Performance
Fri 1 Apr, 13:15-15:00 – ISA 1
Week 14, 2011
Mon 4 Apr, 13:15-15:00 – ISA 2
Tue 5 Apr, 10:15-12:00 – Assembly Tutorial
Wed 6 Apr, 13:15-15:00 – Assembly Tutorial
Fri 8 Apr, 08:15-12:00 – Lab session
Credits
Slides and material ADAPTED from:
Karl Marklund’s slides, Uppsala University
Justin Pearson’s slides, Uppsala University
Slides originally developed by Profs. Hill, Falsafi, Marculescu,
Patterson, Rutenbar and Vijaykumar of CMU, Purdue, UCB,
UW, Copyright 2003
Stefanos Kaxiras’ slides, University of Patras & Uppsala University
Tanenbaum, Structured Computer Organization, Fifth Edition,
(c) 2006 Pearson Education, Inc.
Introduction
Some historical background and
anecdotes
Contemporary Multilevel Machines
A six-level computer (Tanenbaum). The support method for each level is indicated below it.
Milestones in Computer Architecture
PRE-HISTORY
Milestones in Computer Architecture
EARLY-HISTORY
The birth of the Modern-era Computer
A Historical Perspective
In the beginning... ENIAC
5,000 additions in one second
ENIAC
Built at the University of Pennsylvania
Lt. Gillon, Eckert and Mauchly
Initial contract for $61,700, June 1943
eventually cost $486,804.22, in 1946
Accumulator deployed in Jun 1944
Accumulator, multiplier, divide and square root and 3 portable function tables
completed in Fall 1945
200 µsecond cycle time for 1 add
Internals
19K vacuum tubes, 1.5K relays, 100K’s of resistors, and inductors;
30 separate units; forced-air cooling; multiplies in base 10, just like a human
Originally, no internal memory --> programmed w/cables and switches
Designed to compute firing tables
Solved the differential equations of motion to compute a trajectory in 15 seconds (the same amount of computation took a human 20 hours)
Power = 200 kW
ENIAC: programming in hardware
Von Neumann Machine: The concept of the Stored-Program Computer
The original Von Neumann machine:
DATA and PROGRAM are BOTH in the SAME MEMORY
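To make the stored-program idea concrete, here is a minimal sketch of a toy machine in C in which one array, `mem`, holds both the program and its data. The 16-bit encoding (opcode in the high byte, address in the low byte) and the four opcodes are hypothetical, invented only for this illustration; they do not describe the original von Neumann design or MIPS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy ISA: a 16-bit word is either data or an instruction
   whose high byte is the opcode and whose low byte is a memory address. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

#define MEM_WORDS 256

/* Runs the program starting at address 0 and returns the accumulator.
   Instructions and data live in the SAME memory array. */
uint16_t run(uint16_t mem[MEM_WORDS])
{
    uint16_t pc = 0;   /* program counter */
    uint16_t acc = 0;  /* single accumulator register */

    for (;;) {
        assert(pc < MEM_WORDS);
        uint16_t instr = mem[pc++];   /* fetch from the shared memory */
        uint8_t opcode = instr >> 8;  /* decode */
        uint8_t addr = instr & 0xFF;
        switch (opcode) {             /* execute */
        case OP_LOAD:  acc = mem[addr]; break;
        case OP_ADD:   acc = (uint16_t)(acc + mem[addr]); break;
        case OP_STORE: mem[addr] = acc; break;
        case OP_HALT:
        default:       return acc;    /* halt (or unknown opcode) stops */
        }
    }
}
```

For example, placing `LOAD 10`, `ADD 11`, `STORE 12`, `HALT` at addresses 0 to 3 and the values 5 and 7 at addresses 10 and 11 makes `run` return 12 and leaves 12 in `mem[12]`. Because the program sits in `mem`, a `STORE` to addresses 0 to 3 would overwrite the program itself, which is exactly what a shared memory permits.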
Computer Generations
Zeroth Generation
Mechanical Computers (1642 – 1945)
First Generation
Vacuum Tubes (1945 – 1955)
Second Generation
Transistors (1955 – 1965)
Third Generation
Integrated Circuits (1965 – 1980)
Fourth Generation
Very Large Scale Integration (1980 – ?)
Milestones in Computer Architecture
The beginning of the Computer Age
Milestones in Computer Architecture (2)
Evolution of the Computer:
Mainframes, Minis, Supercomputers, Workstations, and PCs
(the Killer Micros)
Some Important Computers
DEC PDP-8
PDP-11
VAX-11 the king of the minis
IBM 360 the Mainframe
The supercomputers: CDC 6600, CRAY-1
The microprocessors: Intel 8086 …
The RISCs: MIPS, SPARC, ALPHA, …
Pentium 4, Core-2, …
EXCELLENT resources on the Web for the history of these computers; Wikipedia in particular has great articles on all of them
PDP-8 Innovation – Single Bus
The PDP-8 Omnibus
Primitive machine: 8 basic instructions! 4K to 32K word memory (12-bit words)
VAX-11
VAX-11: Virtual Address eXtension to PDP-11
Extremely popular university computer
One of the most complex machine instruction sets ever!
Modern computer science developed on it
An instruction could be a whole loop!
Studies showed that compilers could not use all these instructions …
The nominal 1-MIPS machine
BSD Unix, TCP/IP, …, took off on this machine
IBM 360
Before 360: architecture == implementation
360 (Gene Amdahl): architecture independent of implementation!
- One ISA, multiple instantiations
- SAME software runs on ALL! (Radical development)
Backwards compatibility: crucial for an architecture
- Intel
- Apple!!!! Motorola 68000 → PowerPC → Intel, and ALWAYS maintained software compatibility
IBM 360 also ran (emulated) IBM 1401 and 7094
Supercomputers
1st Supercomputer: CDC 6600 (Seymour Cray)
Could perform more than one instruction at the same time!
(Superscalar -- Scoreboard)
Pipelined
Very fast cycle time
10 Peripheral processors for I/O
No ECC or parity in memory
Correct answer?
CRAY-1
First VECTOR supercomputer
Instructions could operate directly on vectors:
A[i] = B[i] + C[i], i=0,128
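On a scalar machine, the vector statement A[i] = B[i] + C[i] needs a loop that issues one add per element; the CRAY-1 instead applies one instruction to a whole vector register. The C sketch below shows only the scalar equivalent. It assumes the vector register length of the CRAY-1 (64 elements), so a longer vector such as the 128-element one on this slide is processed in chunks ("strip-mining"); the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 64  /* elements per CRAY-1 vector register */

/* Scalar equivalent of the vector statement A[i] = B[i] + C[i].
   The outer loop splits n elements into chunks of at most VLEN;
   each inner loop stands for the work of one vector add instruction. */
void vector_add(double *a, const double *b, const double *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t start = 0; start < n; start += VLEN) {
        size_t len = (n - start < VLEN) ? n - start : VLEN;
        for (size_t i = start; i < start + len; i++)
            a[i] = b[i] + c[i];
    }
}
```

With n = 128 this performs two chunks of 64, i.e. what two vector adds would do on such a machine.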
The dawn of the micros (mid 70’s)
Intel: 8051, 8080
Motorola 6800
Zilog Z80
MOS 6502
8-bit micros, 8-bit words, 64K memory
Invariably microprogrammed architectures running at
about 1 – 2 MHz
80’s: 16-bit micros: Intel 8086/8088, Motorola 68000
Up to 1 MB memory in 8086 via Segmentation
16 MB in MC 68000
Precursors of the CISCs: 80386, 486, Pentium, Pro, II, III, 4, Core-2
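The 1 MB and 16 MB limits follow from address widths: the 8086 combines two 16-bit values into a 20-bit physical address as segment × 16 + offset, and the MC68000 uses 24-bit addresses. A small sketch of that arithmetic (the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real-mode translation: the 16-bit segment is shifted left by
   4 bits (x16) and added to the 16-bit offset. With only 20 address
   lines, the result wraps around at 1 MB. */
uint32_t i8086_physical(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Number of bytes reachable with a given number of address bits. */
uint64_t addressable_bytes(unsigned address_bits)
{
    assert(address_bits < 64);
    return (uint64_t)1 << address_bits;
}
```

For instance, segment 0x1234 with offset 0x0010 maps to 0x12350, `addressable_bytes(20)` is 1 MB and `addressable_bytes(24)` is 16 MB.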
RISCs
Introduced the concept that smaller (ISA) is better
RISC: Reduced Instruction Set Computers
No microcode, all hardware, pipelined, many registers, very simple instructions
Pioneered by:
- Berkeley (Patterson) → SPARC (SUN)
- Stanford (Hennessy) → MIPS (MIPS)
First RISC: IBM 801
First commercial RISC: ARM
Others: HP PA-RISC, DEC ALPHA, IBM/Motorola PowerPC, IBM Power, …
Intel Computer Family (1)
The Intel CPU family. Clock speeds are measured in MHz (megahertz), where 1 MHz is 1 million cycles/sec.
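Since 1 MHz is 10^6 cycles per second, the cycle time is the reciprocal of the clock rate. A minimal sketch of the conversion (the helper name is hypothetical):

```c
#include <assert.h>

/* Cycle time in nanoseconds for a clock rate given in MHz:
   one cycle lasts 1 / (f * 1e6) seconds = 1000 / f nanoseconds. */
double cycle_time_ns(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz;
}
```

A 2 MHz 8-bit micro thus has a 500 ns cycle, while a 4 GHz (4000 MHz) processor has a 0.25 ns cycle.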
Intel Computer Family (2)
The Pentium 4 chip. The photograph is copyrighted by the
Intel Corporation, 2003 and is used by permission.
Personal Computer
1. Pentium 4 socket
2. 875P Support chip
3. Memory sockets
4. AGP connector
5. Disk interface
6. Gigabit Ethernet
7. Five PCI slots
8. USB 2.0 ports
9. Cooling technology
10. BIOS
A printed circuit board is at the heart of every personal
computer. This figure is a photograph of the Intel D875PBZ board.
The photograph is copyrighted by the Intel Corporation, 2003 and
is used by permission.
Four Decades of Microprocessors
The Decade of the 1970’s “Microprocessors”
- Programmable Controller / Single-Chip Microprocessors
- Personal Computers (PC)
The Decade of the 1980’s “Quantitative Architecture”
- Instruction Pipelining
- Fast Cache Memories
- Workstations
The Decade of the 1990’s “Instruction-Level Parallelism”
- Superscalar Processors
- Speculative Microarchitectures
- Aggressive Code Scheduling
- Low-Cost Desktop Supercomputing
The Decade of the 2000’s “Power & memory wall → Multicore”
- Multiprocessor on a Chip (CMP), Multicore
- Massively parallel GP-GPUs
- Manycores ?
Moore’s LAW
Moore said: the number of transistors doubles every chip generation (3 years)
Law: the transistor count doubles every 18 months!
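The difference between the two doubling periods grows quickly: over the same span, an 18-month period gives twice as many doublings as a 3-year one. The sketch below counts only complete doubling periods so it can use integer shifts; the function name and the starting count of 1,000 transistors in the example are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Transistor count after `months`, assuming the count doubles every
   `doubling_months`; only complete doubling periods are counted. */
uint64_t transistors_after(uint64_t start, unsigned months,
                           unsigned doubling_months)
{
    assert(doubling_months > 0);
    unsigned doublings = months / doubling_months;
    assert(doublings < 64);  /* keep the shift well defined */
    return start << doublings;
}
```

Over 10 years (120 months), starting from 1,000 transistors, an 18-month period gives 6 doublings (64,000 transistors) while a 3-year period gives 3 (8,000).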
More on Moore’s Law
Corollary 1: speed of circuits (clock frequency, MHz or
GHz) doubles every 18 months. Smaller transistors are
also faster!
Corollary 2: Performance doubles every 18 months!
BUT: ARCHITECTURE translates the increase in transistors to
an increase in performance
Examples: add vs. memory access
Instructions per cycle: the MAJOR architectural contribution to performance
Exponential increase in performance with every
generation: What does it mean?
“Computing power” as area: what part of a 2005 processor corresponds to a 1995 processor?
[Figure: nested areas labeled 6/1995, 12/1996, 6/1998, 12/1999, 6/2001, 12/2002, 6/2004, 12/2005]
Technology => dramatic change
Processor
logic capacity: about 30% per year
clock rate: about 20% per year
Memory
DRAM capacity: about 60% per year (4x every 3 years)
Memory speed: about 10% per year
Cost per bit: improves about 25% per year
Disk
capacity: about 60% per year
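These annual rates compound: an improvement of r per year gives a factor of (1 + r)^n after n years, which is how "about 60% per year" becomes "4x every 3 years" (1.6^3 ≈ 4.1). A minimal sketch, using a plain loop instead of `pow` so no math library is needed; the helper name is hypothetical:

```c
#include <assert.h>

/* Overall improvement factor after `years` of compound growth at
   `annual_rate` per year (e.g. 0.60 for "about 60% per year"). */
double compound_growth(double annual_rate, unsigned years)
{
    assert(annual_rate > -1.0);
    double factor = 1.0;
    for (unsigned y = 0; y < years; y++)
        factor *= 1.0 + annual_rate;
    return factor;
}
```

Over a decade, the same arithmetic turns 20% per year for clock rate into about 6.2x, but 10% per year for memory speed into only about 2.6x.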
Processor Performance
Microprocessor performance growth in perspective: “Unmatched by any
other industry” [John Crawford, Intel Fellow, 1993]
Doubling every 18 months (1982-1996): total of 800X
- Cars travel at 44,000 MPH; get 16,000 miles/gal.
- Air travel: L.A. to N.Y. in 22 seconds (MACH 800)
- Wheat yield: 80,000 bushels per acre
Doubling every 24 months (1970-1996): total of 9,000X
- Cars travel at 600,000 MPH; get 150,000 miles/gal.
- Air travel: L.A. to N.Y. in 2 seconds (MACH 9,000)
- Wheat yield: 900,000 bushels per acre
Exponential effect
Is Moore’s Law Still Alive?
YES! Transistors will double for the next few generations: 65 nm, 45, 32, 22, 16~12 nm
Beyond that (10 nm): transistors as we know them now don’t work; we need new devices!!
But FREQUENCY (speed) has stalled at ~4 GHz
Power consumption: power density in chips would reach that of the surface of the sun if we continued
Single-core (processor) performance also stalled
Architectural implication: shift to MULTICORES and use all these transistors for parallel performance
Fundamental Equation for Processor Design
The “Iron Law” of processor performance:
$$\text{Processor Performance} = \frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}$$
Architecture (Compiler Designer) → Implementation (Processor Designer) → Realization (Chip Designer)
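In the Iron Law, execution time is the product of instruction count, cycles per instruction (CPI) and cycle time. The sketch below just multiplies the three factors; the numbers in the example (10^9 instructions, CPI of 1.5, a 0.5 ns cycle, i.e. 2 GHz) are made up for illustration.

```c
#include <assert.h>

/* Iron Law: Time/Program = Instructions/Program x Cycles/Instruction
   x Time/Cycle. Returns the execution time in seconds. */
double execution_time_s(double instructions, double cpi, double cycle_time_s)
{
    assert(instructions >= 0.0 && cpi > 0.0 && cycle_time_s > 0.0);
    return instructions * cpi * cycle_time_s;
}
```

With the example numbers the program takes about 0.75 s; halving any single factor (fewer instructions from the compiler, lower CPI from the processor designer, or a shorter cycle from the chip designer) halves the time to about 0.375 s.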
Computer Architecture: Design Abstraction
Application
Operating System
Compiler
Instruction Set Architecture (ISA): interface between software & hardware
Microarchitecture | I/O System
Digital Design
Circuit Design
Instruction Set Architecture (ISA)
The HW/SW interface
“Software-visible” hardware
Specifies
Program control flow
Program data flow
loop:
ldr R5, R3
add R2, R5
inc R3
dec R6
brz done
bra loop
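Read as a program, the loop above appears to sum an array: R3 points at the current element, R6 counts the elements left, and R2 accumulates the total (assumed to be cleared before the loop). The C sketch below is one interpretation under those assumptions; the slide does not define the ISA's semantics (for instance, that `ldr R5, R3` loads from the address in R3, or that `brz` tests the result of `dec`), so the register roles are inferred.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed meaning of the assembly loop: r3 = pointer to the next element,
   r6 = elements left, r2 = running sum. As in the assembly, the body runs
   before the counter is tested, so r6 must start at 1 or more. */
long sum_array(const long *r3, long r6)
{
    assert(r3 != NULL && r6 >= 1);
    long r2 = 0;          /* assumed initialized before the loop */
    do {
        long r5 = *r3;    /* ldr R5, R3 : load the element R3 points to */
        r2 += r5;         /* add R2, R5 : accumulate                     */
        r3++;             /* inc R3     : advance to the next element    */
        r6--;             /* dec R6     : one fewer element left         */
    } while (r6 != 0);    /* brz done   : exit when R6 reaches zero,
                             bra loop   : otherwise repeat               */
    return r2;
}
```

The do-while mirrors the assembly's structure: the test comes after the decrement, at the bottom of the loop.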
Microarchitecture
Hardware organization or
implementation
The “guts” of the machine
Implements the ISA
“Software-invisible”
Block-level grouping of
transistors
You are looking at a 100
million transistor chip
High-level View of a Computer
[Figure: Processor (Control, Datapath) connected to Memory and to Peripherals: SCSI (Disk), Ethernet, KBD/Mouse, Video]
What’s Inside a Computer
Processor or CPU
Memory subsystem
Processor/device interface (I/O)
How the processor talks to devices such
as the network, disk drives, keyboard,
etc.
[Figure: CPU and Memory (DRAM and caches) on a Shared Bus, together with I/O devices such as Video and Network (Ethernet)]
What’s Inside the Processor?
CPU is partitioned into:
- Control Logic (control path)
- Datapath
Datapath includes:
- Program Counter (PC), to fetch instructions
- Register File
- ALU
- Memory Address Register, to fetch data from memory
Control path uses instructions to manage the datapath
[Figure: Control Logic, driven by the current instruction, managing the datapath (PC, Register File, ALU, memory registers)]
What is the ALU?
ALU
What makes digital computers possible:
BINARY NUMBERS + BINARY LOGIC need only a SWITCH to be implemented
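Functionally, an ALU takes two binary operands plus a control input that selects which logic or arithmetic result appears at its output. The sketch below models only that selection for 32-bit words; the operation set and the `AluOp` names are hypothetical, and real hardware computes all of this with gates (switches), not a `switch` statement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ALU control codes. */
typedef enum { ALU_AND, ALU_OR, ALU_XOR, ALU_ADD, ALU_SUB } AluOp;

/* Combinational model: `op` selects the result. Arithmetic wraps
   modulo 2^32, like a fixed-width binary adder. */
uint32_t alu(AluOp op, uint32_t a, uint32_t b)
{
    switch (op) {
    case ALU_AND: return a & b;
    case ALU_OR:  return a | b;
    case ALU_XOR: return a ^ b;
    case ALU_ADD: return a + b;
    case ALU_SUB: return a - b;
    }
    assert(0 && "unknown ALU operation");
    return 0;
}
```

The wrap-around behaviour (for example 0xFFFFFFFF + 1 giving 0) is what a 32-bit adder with the final carry discarded produces.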
Switches → LOGIC
BASIC LOGIC Functions (GATES): AND & OR
XOR LOGIC Function
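The AND, OR and XOR truth tables can be reproduced by modelling each gate as a function on single bits; XOR can also be built from the basic gates as (a OR b) AND NOT (a AND b). A minimal sketch (all function names are hypothetical):

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if x is a valid one-bit signal (0 or 1). */
static int is_bit(int x) { return x == 0 || x == 1; }

/* One-bit logic gates. */
int gate_not(int a)        { assert(is_bit(a)); return 1 - a; }
int gate_and(int a, int b) { assert(is_bit(a) && is_bit(b)); return a & b; }
int gate_or(int a, int b)  { assert(is_bit(a) && is_bit(b)); return a | b; }
int gate_xor(int a, int b) { assert(is_bit(a) && is_bit(b)); return a ^ b; }

/* XOR built only from the basic gates: (a OR b) AND NOT (a AND b). */
int xor_from_basic(int a, int b)
{
    return gate_and(gate_or(a, b), gate_not(gate_and(a, b)));
}

/* Prints the truth table of AND, OR and XOR for all four input pairs. */
void print_truth_tables(void)
{
    printf("a b | AND OR XOR\n");
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            printf("%d %d |  %d   %d   %d\n",
                   a, b, gate_and(a, b), gate_or(a, b), gate_xor(a, b));
}
```

XOR outputs 1 exactly when the inputs differ, which is why it reappears as the "sum" bit in binary addition.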
ARITHMETIC …
BINARY ARITHMETIC
BINARY ARITHMETIC (cont.)
BINARY ARITHMETIC (cont.)
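Binary addition maps directly onto logic gates: a full adder computes sum = a XOR b XOR carry-in and carry-out = (a AND b) OR (carry-in AND (a XOR b)), and chaining full adders bit by bit gives a ripple-carry adder. The C sketch below simulates an 8-bit ripple-carry adder one bit at a time; the 8-bit width and the function names are illustrative choices, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-bit full adder built from XOR, AND and OR. */
static void full_adder(unsigned a, unsigned b, unsigned cin,
                       unsigned *sum, unsigned *cout)
{
    unsigned p = a ^ b;           /* exactly one of a, b is 1 */
    *sum = p ^ cin;
    *cout = (a & b) | (cin & p);  /* both inputs 1, or pass the carry on */
}

/* 8-bit ripple-carry adder: the carry out of bit i feeds bit i + 1.
   Returns the 8-bit sum; *carry_out receives the final carry. */
uint8_t ripple_add8(uint8_t x, uint8_t y, unsigned *carry_out)
{
    assert(carry_out != NULL);
    unsigned carry = 0;
    uint8_t result = 0;
    for (int i = 0; i < 8; i++) {
        unsigned s;
        full_adder((x >> i) & 1u, (y >> i) & 1u, carry, &s, &carry);
        result |= (uint8_t)(s << i);
    }
    *carry_out = carry;
    return result;
}
```

For example, 200 + 100 = 300 does not fit in 8 bits, so the adder returns 44 (300 − 256) with a final carry of 1.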
In Practice …