CE 454 Computer Architecture Lectures 2 + 3 Ahmed Ezzat Computer Systems Organization, Ch-2 1 Outline 2 The IAS (Von Neumann) Machine Overview Computer Components Overview CPU Module Memory Module Input/Output Module CE 454 Ahmed Ezzat The IAS (Von Neumann) Machine Stored Program concept Main memory storing programs and data ALU operating on binary data Control unit interpreting instructions from memory and executing Input and output equipment operated by control unit Input Output Equipment Arithmetic and Logic Unit Main Memory Program Control Unit The Structure of IAS Computer 3 CE 454 1946 ~ 1952 John von Neumann Princeton Institute for Advanced Studies - IAS Almost all of today’s computers have the same general structure as the IAS - referred to as Von Neumann machines. Ahmed Ezzat The IAS Machine: Memory IAS Memory - 1000 storage locations (words), each 40 bits - Both data and instruction are stored in the memory Memory Location 0 Sign bit 39-bit value Number word 0 8 20 39 28 Location 999 Opcode Bit 1 4 ……. Bit 40 address Opcode address Instruction word CE 454 Ahmed Ezzat The IAS Machine: Control Unit The control unit operates the machine by fetching instructions from memory and executing them ONE at a time. Central Processing Unit Arithmetic and Logic Unit Accumulator MQ Arithmetic & Logic Circuits Main Memory MBR Input Output Equipment Instructions & Data PC IBR MAR IR Control Circuits Program Control Unit 5 CE 454 Address Ahmed Ezzat The IAS Machine: Instruction Cycle The IAS operates by repetitively performing an instruction cycle. Two sub-cycles: - During the fetch cycle, the opcode of the NEXT instruction is loaded in to the IR and the address portion is loaded into the MAR - Once the opcode is in the IR, the execute cycle is performed. Control circuitry interprets the opcode and executes the instruction by sending out appropriate control signals to cause data to be moved or an operation to be performed by the ALU. 6 CE 454 Ahmed Ezzat The IAS Machine: Summary 7 Data and instructions are stored in a single read-write memory The content of this memory are addressable by location, without regard to the type of data contained in it. Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next. CE 454 Ahmed Ezzat Computer Components Overview 8 A computer consists of a set of modules of three basic types They communicate with each other Needs connection paths for connecting the modules CE 454 Ahmed Ezzat Computer Components Overview: Bus Interconnection 9 A bus is a communication path that connects two or more devices A bus is a shared transmission medium. A signal transmitted by one device is available for reception by all other devices attached to the bus If two devices transmit during the same period, their signal will overlap and become garbled. Only one device can successfully transmit at any one time. CE 454 CPU Memory I/O Buses Ahmed Ezzat Computer Components Overview: Bus Interconnection 10 Data Lines (Data Bus - DB) Address Lines (Address Bus - AB) Control Lines (Control Bus - CB): Control the access to and the use of DB and AB. (remember AB and DB shared by all devices) CB send out both command and timing signals – Command: specify the type of operation (R/W) – Timing: Indicate the validity of data on DB and AB Typical control lines include: Memory R/W, I/O R/W, Bus request, Bus grant, Interrupt request, Interrupt grant, Transfer ACK CE 454 Ahmed Ezzat CPU Module 11 Read instructions and data Write out data after processing Use control signal to control the overall operation of the computer Receive interrupt signal CE 454 Ahmed Ezzat CPU Module: CPU Registers Registers – – Accumulator – Permanent storage locations within the CPU Each used for a particular, defined purpose Instructions General purpose register Address Registers in the Control Unit – – PC - program counter register IR - Instruction register Data Control Interrupt – – – – – 12 MAR - memory address register MBR - memory buffer register I/O AR - I/O address register I/O BR - I/O buffer register Status register CE 454 Data Ahmed Ezzat CPU Module: CPU Data Path 13 CE 454 Ahmed Ezzat CPU Module: CPU Packaging 14 CE 454 Ahmed Ezzat CPU Module: RISC vs CISC RISC, initially John Cocke/IBM, typically implies: – – – – CISC typically implies: – – – – 15 Small number of instructions around load and store Instructions are of fixed size format One instruction per CPU cycle Large number of registers or compiler use to optimize registers usage Large number of instructions Multiple instruction formats Multiple CPU cycles per instruction Smaller number of registers, and typically simpler compiler development Today, most modern computers exploit best-of-both. For example latest Intel’s Itanium CPU uses CISC instruction set, while internally (microprogramming) translates each complex instruction into sequence of RISC micro-programmed instructions CE 454 Ahmed Ezzat CPU Module: RISC vs CISC - What is wrong with this Analysis 16 Speed Passengers Throughput (pmph) 6.5 hours 610 mph 470 286,700 3 hours 1350 mph 132 178,200 Plane DC to Paris Boeing 747 Concorde (Time to Run task) CE 454 Ahmed Ezzat CPU Module: Design Principals for Modern Computers 17 All instructions are directly executed by hardware Maximize the rate at which instructions are issued Instructions should be easy to decode Only Loads and Stores should reference memory Provide plenty of registers, since memory access is slow CE 454 Ahmed Ezzat CPU Module: Instruction-Level Parallelism Why? To improve performance – Instruction-level parallelism exploits parallelism within individual instruction: – – 18 How about increasing clock speed! Instruction prefetch: fetching instructions from memory is a bottleneck – prefetch in registers. Pipelining: takes that concept further and divides the instruction execution into multiple parts/stages, each is dedicated piece of hardware and all can run in parallel CE 454 Ahmed Ezzat CPU Module: Instruction-Level Parallelism Pipelining 19 CE 454 Ahmed Ezzat CPU Module: Instruction-Level Parallelism Multiple Pipelines 20 Dual five-stage pipelines with a common instruction fetch unit Pentium used this strategy with main pipeline called u pipeline (can execute arbitrary instruction), and the second pipeline called V pipeline (can execute simple integer/floating point instructions). It is the compiler responsibility to ensure that instructions don’t conflict over resources/registers, and neither must depend on the result of the other. CE 454 Ahmed Ezzat CPU Module: Instruction-Level Parallelism Superscalar 21 Single pipeline with multiple functional units – CDC 6600, Pentium-II. Implicit assumption is S3 is much faster than S4! CE 454 Ahmed Ezzat CPU Module: Processor-Level Parallelism 22 SIMD - Array or Vector Computers: ILLIAC IV CE 454 Ahmed Ezzat CPU Module: Processor-Level Parallelism MIMD: – – 23 Multiprocessors Multicomputers CE 454 Ahmed Ezzat Memory Module 24 N words of equal length Each word with a unique address (0, 1, …, N-1) A word of data can be read from or write into the memory The nature of the the operation (R/W) is indicated by read and write control signals The location for operation is specified by an address CE 454 Ahmed Ezzat Memory Module: Architecture 25 Primary memory is a collection of independent storage units. Each unit stores a single multi-bit value. The number of bits in a storage unit is a constant for all storage units in the memory system, and this constant is called the memory width. Addresses are used to access the storage units in the memory system. Each storage unit has a unique address. CE 454 Ahmed Ezzat Memory Module: Architecture A graphical representation of a memory with 128 storage units and width 8. The memory is called 128 x 8 memory. 26 CE 454 Ahmed Ezzat Memory Module: Architecture 27 Manufacturers produce a number of different types of memory devices having a variety of technologies. The technology affects not only the operating characteristics, such as power consumption, size, and speed, but also the manufacturing cost. Thus in the selection of memory chips for a particular application, designers must weigh the trade-offs between cost and performance. CE 454 Ahmed Ezzat Memory Module: Architecture Read-only Memory 28 Read-only memories (ROMs) are memory devices that the CPU can read but cannot write. Many ROMs are factory programmed and there is no way to alter their contents (the term programming here means writing values into a ROM). These devices are denser and cheaper to manufacture than other type of ROM Programmable ROMs (PROMs): This type of ROM can be programmed by using special high current device to destroy (burn) the fuse that were manufactured into the device. The result of burning a PROM is that certain bits are always 0 and the rest are always 1. These values cannot be altered once written Erasable PROMs (EPROMs): This type of ROM is alterable, although not during ordinary use. A technician can program an EPROM off line, later completely erase its contents by using ultraviolet light, and then reprogram it CE 454 Ahmed Ezzat Memory Module: Architecture Read/Write Memory 29 Read/Write memories refer to memory devices can be read from and write into with equal ease. Two main types of read/write memory devices are static random access memories (SRAMs) and dynamic random access memories (DRAMs). SRAMs: In SRAMs, the individual memory contents, once written, do not need to be further addressed or manipulated to hold their values. These devices are composed of flip-flops that use a small current to maintain their contents. SRAMs are used mostly in CPU registers and other high speed storage devices. Some computers use them for cache and main memory. SRAMs are currently the fastest and most expensive of semiconductor memory circuit DRAMs: These are semiconductor memory devices in which the stored data will not remain permanently stored, even with power applied, unless the data are periodically rewritten into the memory. The latter operation is called the refresh operation. Although much cheaper than SRAMs, DRAMs are also slower and used mostly for main memory CE 454 Ahmed Ezzat Memory Module: Architecture General Memory Operation Although each type of memory is different in its internal operation, there are certain basic operating principles that are the same for all memory systems. Every memory system requires several different types of input and output lines to perform the following function: Select the address in memory that is being accessed for READ or WRITE operation Select either READ or WRITE operation to be performed Supply the input data to be stored in memory during write operation Hold the output data coming from memory during a read operation Enable (or disable) the memory so that it will (or will not) respond to the address inputs and read/write command 30 CE 454 Ahmed Ezzat Memory Module: Architecture General Memory Operation Data Input D0 D1 D2 D3 Address Input Read/Write Control A0 A1 A2 A3 A4 R/W 32 x 4 Memory Chip Selection CS D0 D1 D2 D3 Data Output 31 CE 454 Ahmed Ezzat Memory Module: Architecture Address Decoder 32 CE 454 Ahmed Ezzat Memory Module: Architecture Memory Cell Address Decoder 33 The relationship between the MDR, the MAR, and memory CE 454 Ahmed Ezzat Memory Module: Architecture MAR-MDR example 34 CE 454 Ahmed Ezzat Memory Module: Architecture Memory Address Space and Memory Map 35 The total amount of memory contained in any system is limited by the size of the address bus Example: An “XYZ” processor has 16-bit address bus, what is the maximum amount of memory which a system can utilised? CE 454 Ahmed Ezzat Memory Module: Architecture Memory Map 36 The microcomputer designer has to allocate this address space among the RAM, ROM, and I/O devices that are to be part of the system The manner in which the total address space is apportioned among these devices depends, to certain extent, on characteristics of the processor A memory map is a simple diagram which identifies the size and location of any memory block in the total address space CE 454 Ahmed Ezzat Memory Module: Architecture Memory Map FFFF FC00 ROM 1K Not Used B0FF I/O 256 B000 Not Used 07FF 0000 37 RAM 2K A typical memory map for a microprocessor system with 16-bit address bus CE 454 Ahmed Ezzat Memory Module: Architecture IBM PC Main Memory Map 38 Main memory, also called conventional memory, refers to the storage locations that the CPU can reference during an ordinary memory-read or memory-write bus cycle without special hardware The amount of main memory a PC could directly address to is 1MB (220 bytes) The PC architects divided the address space of conventional memory into a number of blocks, which they allocated for various software components They allocated the largest block, with addresses ranging from 0K to 640K, to program memory, which they implemented with DRAM chips They reserved the remaining block, with address ranging from 640K and 1024K, for ROM BIOS and other system components. The following table summarises the allocation of addresses within conventional memory of a PC CE 454 Ahmed Ezzat Memory Module: Architecture 39 Address PC Usage Address PC Usage 960K - 1024K 880K - 960K 848K - 880K 816K - 848K 800K - 816K 784K - 800K 768K - 784K 752K - 768K 736K - 752K 720K - 736K 704K - 720K 640K - 704K ROM BIOS Unused LIM data area LIM data area Hard disk ROM Unused EGA ROM Unused CGA Unused MDA EGA or VGA 1536 - 640K 1152 - 1535 User RAM BASIC, Special system RAM Interrupt-vector table CE 454 0 - 1023 Ahmed Ezzat Memory Module: The Fetch-Execute Instruction Cycle LOAD address 40 PC MBR MAR IR IR [address] MAR MBR ACC PC+1 PC CE 454 Ahmed Ezzat Memory Module: The Fetch-Execute Instruction Cycle STORE address 41 PC MAR MBR IR IR [address] MAR ACC MBR PC+1 PC CE 454 Ahmed Ezzat Memory Module: The Fetch-Execute Instruction Cycle ADD address 42 CE 454 PC MAR MBR IR IR [address] MAR ACC +MBR ACC PC+1 PC Ahmed Ezzat Memory Module: Byte Ordering 43 Integers matters: value of 2 is 0x00000002 in LE, and 0x02000000 in BE Characters don’t matter CE 454 Ahmed Ezzat Memory Module: Error Detection & Correction Error Detection and Correction Code 44 For m data bits, r redundant/check bits, which result in n bits = m + r These n bits are called codeword Given two codewords, Hamming distance “d” is the number of bits in the codewords that differ With m-bit memory, all 2m bit patterns are legal, i.e., only 2m out of 2n are valid codewords To detect “d” single-bit errors, you need distance of (d +1) To correct “d” single-bit errors, you need distance of (2d +1) Example: with distance of “5”, you can detect “4” simultaneous single bit error, and you can correct “2” single bit errors CE 454 Ahmed Ezzat Memory Module: Error Detection & Correction Error Detection and Correction Code 45 What is r value to correct single-bit error for m data bits? (n + 1) 2m 2n (m + r + 1) 2r // Given m, we can find lower limit to r CE 454 Ahmed Ezzat Memory Module: Error Detection & Correction Hamming Code Algorithm to construct Error Correction Code 46 For m data bits, r parity bits, codeword n = m + r Bits are numbered starting at 1 (not 0), with bit 1 is the leftmost bit Bits that are power of 2 are parity bits, the rest are data bits Example: For 16-bit word, 5 parity bits are added (Bits 1, 2, 4, 8, 16) Assume even parity is used: each parity bit checks specific bit positions; set so that total number of 1s in the checked positions is even Bit 1 checks bits: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 Bit 2 checks bits: 2, 3, 6, 7, 10, 11, 14, 15, 18, 19 Bit 4 checks bits: 4, 5, 6, 7, 12, 13, 14, 15, 20, 21 Bit 8 checks bits: 8, 9, 10, 11, 12, 13, 14, 15 Bit 16 checks bits: 16, 17, 18, 19, 20, 21 CE 454 Ahmed Ezzat Memory Module: Error Detection & Correction Hamming Code Algorithm to construct Error Correction Code 47 In general, bit b is checked by those bits b1, b2, … bj such that b1 + b2 +… bj = b Simple Approach: Compute all parity bits, if all are correct, there is no error Add up all incorrect parity bits, the resulting sum is the position of the incorrect data bit reverse it Done CE 454 Ahmed Ezzat Memory Module: Cache Cache Memory 48 CPU is much faster than memory Small amount of fast memory on the CPU chip (cache) with a large amount of slow memory (off CPU chip) Most heavily used memory words are kept in the cache When CPU needs a word, it checks first the cache Memory references of the total memory in any short time interval tend to use only small fraction of the total memory is called the locality principal – basis of caching systems When a word is referenced, it and some of its neighbours are brought into the cache for future references CE 454 Ahmed Ezzat Memory Module: Cache Cache Memory 49 Assume, “c” is the cache access time, “m” is the main memory access time, and “h” is the hit ratio If one word is R/W k times in a short interval, the CPU will reference main memory 1 time and the cache (k – 1) times: Cache hit ratio = h = (k – 1)/k Cache miss ratio = (1 – h) Mean access time = c + (1 – h) m As h 1, all references can be satisfied from the cache As h 0, all references approaches (c + m) Cache is divided into blocks, called cache lines, when cache miss happens an entire cache line is brought into the cache, I.e., not only the missed word CE 454 Ahmed Ezzat Memory Module: Memory Packaging Memory Packaging and Types 50 Early days, memory was installed as single chips installed into sockets on the mother board At present, a group of chips (typically 8 – 16), is mounted on a tiny printed circuit board and sold as a unit. This unit is called SIMM (Single Inline Memory Module) or a DIMM (Dual Inline Memory Module) Typical SIMM configuration is 32 MB Typical DIMM configuration is 64 MB, and is capable of 64 data bit at once. A physically smaller version of DIMM called SO-DIMM (Small Outline DIMM) is used with notebook computers SIMM and DIMM can have parity bit for error correction, but since average error rate for a module is one error per 10 years, it is typically omitted CE 454 Ahmed Ezzat Memory Module: Memory Hierarchy 51 CE 454 Ahmed Ezzat Memory Module: Secondary Memory Magnetic Disk Surface 7 Track N Surface 6 Surface 5 Surface 4 Surface 3 Surface 2 Track 0 Direction of Arm Surface 1 Surface 0 52 CE 454 Ahmed Ezzat Memory Module: Secondary Memory 53 Magnetic Disks: access time = seek time (5-15 msec) + rotational time (4 – 8 msec) + transfer time (25 – 100 sec) Floppy Disks: 2 sizes (5.25 and 3.5). Each has low-density and highdensity versions. The 3.5 comes in a rigid jacket for protection. 3.5 diskettes are more density and better protection so they effectively replaced the 5.25 diskettes IDE (Integrated Drive Electronics) Disks: integrates the controller into the Disk drive. Evolved from the IBM PC XT (10 MB Seagate) around 1980. Goes up to 528 MB. Mainly for Intel-based systems EIDE (Extended IDE) Disks: uses LBA (Logical Block Address head, sector, cylinder). Mainly for Intel-based systems. Few other systems use them for their low price. CE 454 Ahmed Ezzat Memory Module: Secondary Memory 54 SCSI (Small Computer System Interface) Disks: disk layout is same as IDE, but interface and transfer rate is different. Some versions (Wide Ultra2 SCSI) provide 80 MB transfer rate. Heavily used in UNIX workstations. Each SCSI has unique ID (0 7 (15 for wide SCSI)). SCSI controller accepts commands (16 bytes blocks) RAID (Redundant Array of Inexpensive/Independent Disks): appears to the OS as a Single Large Expensive Disk (SLED). Different schemes are used level 0 5, to reflect how striping, replication, checksum layout are done CD-ROM: Optical as opposed to magnetic disk. Much higher density and last much longer. Standard is published in IS/ISO 10149 – Red Book. Philips/Sony published the Yellow Book in 1984 for CD-ROM specs. Philips published yet the Green Book in 1986 for multi-media capabilities. Burned areas are called pits, and unburned areas are called lands. To have uniform rate of transfer, rotation speed has to be reduced when the reading head moves from the inside (530 RPM) to the outside (200 RPM). High Sierra for CD-ROM file system specs. CE 454 Ahmed Ezzat Memory Module: Secondary Memory 55 CD-Recordables (CD-R): CD recorder allows individuals to produce CDROMs. They are still once written, could not be erased. Transfer rate is 150 KB/sec CD-Rewritables (CD-RW): uses different alloy for recording. It uses laser with 3 different powers. At high-power, it melts the alloy to generate a pit. At medium power, it melts the alloy to become a land again. At low power, it senses (read) the alloy without changes. CR-RW is much more expensive than CD-R DVD (Digital Video Disk): similar to CD using pits, lands, and laser diode. However, it uses smaller pits, closer tracks, and a different laser. Capacity is 7 folds as CDs – 4.7 GB. A 1x DVD drive operates at 1.4 MB/sec. Using singlesided and dual layers (8.5 GB), double sided and single layer (9.4 GB), and double-sided and dual layer (17 GB) CE 454 Ahmed Ezzat I/O Module 56 Logical structure of a simple PC Each I/O device consists of 2 parts: controller and I/O device itself DMA controller R/W to memory w/o CPU help. Will generate an interrupt to CPU when done Bus arbiter decides if CPU or a DMA controller (cycle stealing) will use the bus CE 454 Ahmed Ezzat I/O Module 57 From an internal (to the computer system) point of view, I/O is functionally similar to memory There are two operations: read and write I/O module may control more than one devices. The interfaces to each external devices is referred to as port, and each port is given a unique address External data path for input output data with external devices I/O module may be able to send interrupt signals to the CPU CE 454 Ahmed Ezzat I/O Module: Data Exchange 58 Memory to processor Processor to memory I/O to processor Processor to I/O I/O to and from Memory (DMA) CE 454 Ahmed Ezzat I/O Module: Bus Types Dedicated – – Physical: Connected to a subset of modules Functional: Data bus, Address Bus Multiplexed: Time multiplexing Address Data Time 59 CE 454 Ahmed Ezzat I/O Module: Bus Configuration Example-1 60 CE 454 Ahmed Ezzat I/O Module: Bus Configuration Example-2 61 CE 454 Ahmed Ezzat I/O Module: Timing / Synchronous Timing – the way in which events are co-ordinated on the bus Timing: Clock cycle/bus cycle Issued by master 62 Synchronous Timing Diagram of a Read operation CE 454 The slave places data and ACK signal Ahmed Ezzat I/O Module: Timing / Asynchronous Issued by the CPU First Read and Address signal, wait for it to stabilize, then master Sync indicating the presence of valid address and control signal Issued by the slave Once the master have read the data, it withdraw MSYN, cause the slave to drop SSYN and data Once SSYN is dropped, master removes the Read and Address 63 Asynchronous Timing Diagram of a Read operation CE 454 Ahmed Ezzat I/O Module: Bus Arbitration 64 More than one module (e.g. CPU and DMA controller ) may need control of the bus Only one module may control bus at one time Needs some form of arbitration Centralised Arbitration Single hardware device controlling bus access (Bus Controller or Arbiter) May be a separate module or part of CPU or separate Distributed Arbitration Each module may claim the bus Control logic on all modules One device is designated as master, which may initiate a data transfer with some other device (slave) CE 454 Ahmed Ezzat I/O Module: Bus Arbitration - Centralized Bus grant Bus Request Arbiter 1 2 4 I/O Devices Bus grant may or may not propagate along the chain 65 3 CE 454 Ahmed Ezzat I/O Module: Bus Arbitration - Decentralized Bus request Busy +5V IN Arbitration Line 66 OUT 1 IN OUT 2 CE 454 IN OUT 3 IN OUT 4 Ahmed Ezzat I/O Module: PCI Bus Peripheral Component Interconnect (PCI) Start development 1990 Became standard 1995 Used in: 67 – Sun Workstations – Apple Macintosh – Wintel PCs – Compaq Alpha Server The same peripheral I/O cards may be plugged into many different computers CE 454 Ahmed Ezzat I/O Module: PCI Bus PCI bus connections Source: Copyright © PCI Pin List/PCI Special Interest Group, 1999. 68 CE 454 Ahmed Ezzat I/O Module: PCI Bus - Operation Example - Read 69 CE 454 Ahmed Ezzat I/O Module: PCI Bus - Arbitration Centralized synchronous arbitration scheme 70 CE 454 Ahmed Ezzat 71 CE 454 Ahmed Ezzat
© Copyright 2026 Paperzz