Computer Architecture and Digital Technology, 10 c
18-347 – Fall 2003

Intro
Instructor: Stefanos Kaxiras
A few things about me:
- Just moved to Uppsala
- Senior Lecturer, Computer Architecture
- 2003– Prof., Univ. of Patras, Greece
- 1998–2003 Bell Labs (Unix & C group)
- Ph.D., Univ. of Wisconsin, 1998
- Don’t speak Swedish … Course taught in English … with your participation!

What the course is about: “Computer organization” + Digital Design
- Interface to the bare H/W
- Transistors --> digital circuits --> arithmetic logic units (adders) --> processors & memory --> computers
- Basic operations --> microprogramming / microarchitecture --> instruction set architecture (ISA) --> Assembly
Finishing this course you should be able to:
- Understand the functionality and operation of the basic elements of a computer system, including processor, memory, input/output
- Reason about first-order performance of a computer system
- Understand the hardware/software interface
- Understand and write programs in assembly language

Grading, Assignments, Etc.
Final exam
- Will test the knowledge presented in the lectures and the assembly language experience gained in the labs
ASSIGNMENTS
- Assembly (MIPS assembly)
- Logic synthesis of a processor in LogicSIM
Teaching assistants: David Elköv, Vasileios Spiliopoulos
Course grade
- Pass assignments: 3
- Pass exams: > 3

Books
Computer Organization & Design: The Hardware/Software Interface, by Patterson & Hennessy, 4th edition, Morgan Kaufmann Publ., 2007.
Note that the 3rd edition is not significantly different and can substitute for the 4th edition.

Assignments
Assembly

Assignments (cont.)
LogicSIM
- Publicly available for download
- Easy digital design
- Goal: to build a processor to execute instructions

Overview of the Course
The hardware/software interface
- Introduction + history
- Computer Arithmetic
- Performance
- Instruction Set Architecture
- Assembly programming
Processors (after basic Digital Design)
- Processor structure & operation
- ALUs, control, datapath
- Single-cycle implementation
- Multicycle implementation
- Pipelining
- Advanced architecture
Memory
- Memory system and caches
- Virtual memory
I/O

Schedule (The hardware/software interface)
Week 12
  Tue 22 Mar 13:15-15:00   Introduction
  Thu 24 Mar 10:15-12:00   Arithmetic
Week 13, 2011
  Mon 28 Mar 10:15-12:00   More Arithmetic
  Wed 30 Mar 10:15-12:00   Performance
  Fri 1 Apr 13:15-15:00    ISA 1
Week 14, 2011
  Mon 4 Apr 13:15-15:00    ISA 2
  Tue 5 Apr 10:15-12:00    Assembly Tutorial
  Wed 6 Apr 13:15-15:00    Assembly Tutorial
  Fri 8 Apr 08:15-12:00    Lab

Credits
Slides and material ADAPTED from:
- Karl Marklund’s slides, Uppsala University
- Justin Pearson’s slides, Uppsala University
- Slides originally developed by Profs. Hill, Falsafi, Marculescu, Patterson, Rutenbar and Vijaykumar of CMU, Purdue, UCB, UW, Copyright 2003
- Stefanos Kaxiras’ slides, University of Patras & Uppsala University
- Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc.

Introduction
Some historical background and anecdotes

Contemporary Multilevel Machines
A six-level computer (Tanenbaum). The support method for each level is indicated below it.

Milestones in Computer Architecture
PRE-HISTORY

Milestones in Computer Architecture
EARLY-HISTORY: The birth of the Modern-era Computer

A Historical Perspective
In the beginning... ENIAC: 5,000 additions in one second

ENIAC
- Built at the University of Pennsylvania: Lt. Gillon, Eckert and Mauchly
- Initial contract for $61,700, June 1943; eventually cost $486,804.22, in 1946
- Accumulator deployed in Jun 1944
- Accumulator, multiplier, divide and square root unit, and 3 portable function tables completed in Fall 1945
- 200 µsecond cycle time for 1 add
Internals
- 19K vacuum tubes, 1.5K relays, 100K’s of resistors and inductors; 30 separate units; forced air cooling
- Multiplies in base 10, just like a human
- Originally, no internal memory --> programmed w/ cables and switches
Designed to compute firing tables
- Differential equations of motion to compute a trajectory in 15 seconds (the same amount of computation took a human 20 hours)
Power = 200K Watts

ENIAC: programming in hardware

Von Neumann Machine: The concept of the Stored-Program Computer
The original Von Neumann machine: DATA and PROGRAM are BOTH in the SAME MEMORY

Computer Generations
- Zeroth Generation: Mechanical Computers (1642 – 1945)
- First Generation: Vacuum Tubes (1945 – 1955)
- Second Generation: Transistors (1955 – 1965)
- Third Generation: Integrated Circuits (1965 – 1980)
- Fourth Generation: Very Large Scale Integration (1980 – ?)

Milestones in Computer Architecture
The beginning of the Computer Age

Milestones in Computer Architecture (2)
Evolution of the Computer: Mainframes, Minis, Supercomputers, Workstations, and PCs (the Killer Micros)

Some Important Computers
- DEC PDP-8, PDP-11, VAX-11: the king of the minis
- IBM 360: the Mainframe
- The supercomputers: CDC 6600, CRAY-1
- The microprocessors: Intel 8086 …
- The RISCs: MIPS, SPARC, ALPHA, …
- Pentium 4, Core-2, …
EXCELLENT resources on the Web for the history of these computers; Wikipedia in particular has great articles on all of them.

PDP-8
- Innovation: Single Bus (the PDP-8 omnibus)
- Primitive machine: 8 basic instructions!
- 4K to 32K word memory (12-bit words)

VAX-11
- VAX-11: Virtual Address eXtension to PDP-11
- Extremely popular university computer
- One of the most complex machine instruction sets ever!
- Modern computer science developed on it
- An instruction could be a whole loop!
- Studies showed that compilers could not use all these instructions …
- The nominal 1-MIPS machine
- BSD Unix, TCP/IP, …, took off on this machine

IBM 360
- Before 360: architecture == implementation
- 360 (Gene Amdahl): architecture independent of implementation!
- One ISA, multiple instantiations
- SAME software runs on ALL! (Radical development)
- Backwards compatibility: crucial for an architecture
  - Intel
  - Apple!!!! Motorola 68000 --> PowerPC --> Intel, and ALWAYS maintained software compatibility.
- IBM 360 also ran (emulated) IBM 1401 and 7094

Supercomputers
1st Supercomputer: CDC 6600 (Seymour Cray)
- Could perform more than one instruction at the same time! (Superscalar -- Scoreboard)
- Pipelined
- Very fast cycle time
- 10 peripheral processors for I/O
- No ECC or parity in memory. Correct answer?

CRAY-1
- First VECTOR supercomputer
- Instructions could operate directly on vectors: A[i] = B[i] + C[i], i=0,128

The dawn of the micros (mid 70’s)
- Intel: 8051, 8080
- Motorola 6800
- Zilog Z80
- MOS 6502
- 8-bit micros, 8-bit words, 64K memory
- Invariably microprogrammed architectures running at about 1 – 2 MHz
80’s: 16-bit micros: Intel 8086/8088, Motorola 68000
- Up to 1 MB memory in 8086 via segmentation
- 16 MB in MC 68000
- Precursors of the CISCs: 80386, 486, Pentium, Pro, II, III, 4, Core-2

RISCs
- Introduced the concept that a smaller ISA is better
- RISC: Reduced Instruction Set Computers
- No microcode, all hardware, pipelined, many registers, very simple instructions
Pioneered by:
- Berkeley (Patterson) --> SPARC (SUN)
- Stanford (Hennessy) --> MIPS (MIPS)
First RISC: IBM 801. First commercial RISC: ARM
Others: HP PA-RISC, DEC ALPHA, IBM/Motorola PowerPC, IBM Power, …

Intel Computer Family (1)
The Intel CPU family. Clock speeds are measured in MHz (megahertz), where 1 MHz is 1 million cycles/sec.

Intel Computer Family (2)
The Pentium 4 chip. The photograph is copyrighted by the Intel Corporation, 2003 and is used by permission.

Personal Computer
1. Pentium 4 socket
2. 875P Support chip
3. Memory sockets
4. AGP connector
5. Disk interface
6. Gigabit Ethernet
7. Five PCI slots
8. USB 2.0 ports
9. Cooling technology
10. BIOS
A printed circuit board is at the heart of every personal computer. This figure is a photograph of the Intel D875PBZ board. The photograph is copyrighted by the Intel Corporation, 2003 and is used by permission.

Four Decades of Microprocessors
The Decade of the 1970’s: “Microprocessors”
- Programmable Controller / Single-Chip Microprocessors
- Personal Computers (PC)
The Decade of the 1980’s: “Quantitative Architecture”
- Instruction Pipelining
- Fast Cache Memories
- Workstations
The Decade of the 1990’s: “Instruction-Level Parallelism”
- Superscalar Processors
- Speculative Microarchitectures
- Aggressive Code Scheduling
- Low-Cost Desktop Supercomputing
The Decade of the 2000’s: “Power & memory wall, Multicore”
- Multiprocessor on a Chip (CMP), Multicore
- Massively parallel GP-GPUs
- Manycores?

Moore’s LAW
- Moore said: every chip generation (3 years) the # of transistors doubles
- Law: transistor # doubles every 18 months!

More on Moore’s Law
- Corollary 1: speed of circuits (clock frequency, MHz or GHz) doubles every 18 months. Smaller transistors are also faster!
- Corollary 2: Performance doubles every 18 months!
- BUT: ARCHITECTURE translates the increase in transistors to an increase in performance
  - Examples: add vs. memory access
  - Instructions per cycle: MAJOR ARCH. contribution to perf.
- Exponential increase in performance with every generation: What does it mean?

“Computing power” as area: what part of a 2005 processor corresponds to a 1995 processor?
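
The question can be worked out with a little arithmetic. The following is a minimal sketch, not part of the original slides: it assumes the 18-month doubling stated in Corollary 2 and takes June 1995 and December 2005 as the two endpoints; the function name `growth_factor` is made up for this illustration.

```python
def growth_factor(months, doubling_period_months=18.0):
    """Growth after `months` for a quantity that doubles every period."""
    return 2.0 ** (months / doubling_period_months)


# Assumed endpoints: June 1995 to December 2005 is 10.5 years = 126 months.
months = (2005 - 1995) * 12 + (12 - 6)
factor = growth_factor(months)

print(f"doublings: {months / 18:.0f}")             # 7
print(f"growth: {factor:.0f}x")                    # 128x
print(f"1995 share of 2005: {100 / factor:.2f}%")  # 0.78%
```

Under these assumptions, seven doublings mean that the "computing power" of a 1995 processor corresponds to roughly 1/128 of a 2005 processor, which is well under 1% of the area.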
[Figure: “computing power” as area; labels 6/1995, 12/1996, 6/1998, 12/1999, 6/2001, 12/2002, 6/2004, 12/2005]

Technology => dramatic change
Processor
- logic capacity: about 30% per year
- clock rate: about 20% per year
Memory
- DRAM capacity: about 60% per year (4x every 3 years)
- Memory speed: about 10% per year
- Cost per bit: improves about 25% per year
Disk
- capacity: about 60% per year

Processor Performance
Microprocessor performance growth in perspective: “Unmatched by any other industry” [John Crawford, Intel Fellow, 1993]
Doubling every 18 months (1982-1996): total of 800X
- Cars travel at 44,000 MPH; get 16,000 miles/gal.
- Air travel: L.A. to N.Y. in 22 seconds (MACH 800)
- Wheat yield: 80,000 bushels per acre
Doubling every 24 months (1970-1996): total of 9,000X
- Cars travel at 600,000 MPH; get 150,000 miles/gal.
- Air travel: L.A. to N.Y. in 2 seconds (MACH 9,000)
- Wheat yield: 900,000 bushels per acre
Exponential effect

Is Moore’s Law Still Alive?
YES! Transistors will double for the next few generations: 65 nm, 45, 32, 22, 16~12 nm
Beyond that (10 nm): transistors as we know them now don’t work; need new devices!!
But FREQUENCY (speed) has stalled at ~4 GHz
Power consumption:
- Power density in chips would reach that of the surface of the sun if we continued
- Single-core (processor) performance also stalled
Architectural implication: shift to MULTICORES; use all these transistors for parallel performance

Fundamental Equation for Processor Design
The “Iron Law” of processor performance:

\[
\text{Processor Performance} = \frac{\text{Time}}{\text{Program}}
= \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}}
\times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}}
\times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}
\]

Architecture (Compiler Designer) --> Implementation (Processor Designer) --> Realization (Chip Designer)

Computer Architecture: Design Abstraction
- Application
- Operating System
- Compiler
- Instruction Set Architecture (ISA): interface between software & hardware
- Microarchitecture, I/O System
- Digital Design
- Circuit Design

Instruction Set Architecture (ISA)
- The HW/SW interface
- “Software-visible” hardware
- Specifies program control flow and program data flow

    loop: ldr R5, R3
          add R2, R5
          inc R3
          dec R6
          brz done
          bra loop

Microarchitecture
- Hardware organization or implementation
- The “guts” of the machine
- Implements the ISA
- “Software-invisible”
- Block-level grouping of transistors
You are looking at a 100 million transistor chip

High-level View of a Computer
- Processor: Control, Datapath
- Memory
- Peripherals: SCSI (Disk), Ethernet, KBD/Mouse, Video

What’s Inside a Computer
- Processor or CPU
- Memory subsystem
- Processor/device interface (I/O): how the processor talks to devices such as the network, disk drives, keyboard, etc.
[Figure: CPU, Memory (DRAM and caches), Video and Network (Ethernet) connected by a Shared Bus]

What’s Inside the Processor?
CPU is partitioned into:
- Control Logic (control path)
- Datapath
Datapath includes:
- Program Counter (to fetch instructions)
- Register File
- ALU
- Memory Address Register (to fetch data)
Control path uses instructions to manage the datapath
[Figure: PC, Register File, ALU, Memory Register, Control Logic; current instruction from memory]

What is the ALU?

ALU
What makes digital computers possible: BINARY NUMBERS + BINARY LOGIC need only a SWITCH to be implemented

Switches --> LOGIC

BASIC LOGIC Functions (GATES): AND & OR

XOR LOGIC Function

ARITHMETIC …

BINARY ARITHMETIC

BINARY ARITHMETIC (cont.)

BINARY ARITHMETIC (cont.)

In Practice …
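
The step from gates to binary arithmetic can be illustrated in software. The following is a minimal sketch, not taken from the slides: it assumes the standard full-adder construction from XOR, AND and OR, chained into a ripple-carry adder; the names `full_adder` and `ripple_carry_add` and the default width of 8 bits are choices made for this illustration only.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder built only from XOR (^), AND (&) and OR (|)."""
    partial = a ^ b                      # sum of a and b without carry
    sum_bit = partial ^ carry_in         # add the incoming carry
    carry_out = (a & b) | (partial & carry_in)
    return sum_bit, carry_out


def ripple_carry_add(x, y, width=8):
    """Add two unsigned `width`-bit numbers, least significant bit first.

    Returns (result, carry_out); the carry of each bit position feeds
    the next one, as in a chain of hardware full adders.
    """
    result, carry = 0, 0
    for i in range(width):
        bit_x = (x >> i) & 1
        bit_y = (y >> i) & 1
        sum_bit, carry = full_adder(bit_x, bit_y, carry)
        result |= sum_bit << i
    return result, carry


print(ripple_carry_add(5, 3))    # (8, 0)
print(ripple_carry_add(255, 1))  # (0, 1): 8-bit overflow, carry out is set
```

The second call shows what happens when the sum does not fit in the chosen width: the result wraps around to 0 and the final carry is 1.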