Chapter 3
A Top-Level View of Computer Function and Interconnection

Computer Components
• Contemporary computer designs are based on concepts developed by John von Neumann at the Institute for Advanced Study, Princeton
• This design is referred to as the von Neumann architecture and is based on three key concepts:
  - Data and instructions are stored in a single read-write memory
  - The contents of this memory are addressable by location, without regard to the type of data contained there
  - Execution occurs in a sequential fashion (unless explicitly modified) from one instruction to the next
• Hardwired program
  - The result of the process of connecting the various components in the desired configuration

Hardware and Software Approaches
Figure 3.1 Hardware and Software Approaches
  (a) Programming in hardware: data passes through a fixed sequence of arithmetic and logic functions, which produces the results
  (b) Programming in software: an instruction interpreter turns instruction codes into control signals for general-purpose arithmetic and logic functions, which turn data into results

Software and I/O Components
• Software
  - A sequence of codes or instructions
  - Part of the hardware interprets each instruction and generates control signals
  - Provide a new sequence of codes for each new program instead of rewiring the hardware
• Major components:
  - CPU
    - Instruction interpreter
    - Module of general-purpose arithmetic and logic functions
  - I/O components
    - Input module: contains basic components for accepting data and instructions and converting them into an internal form of signals usable by the system
    - Output module: means of reporting results

• Memory address register (MAR): specifies the address in memory for the next read or write
• Memory buffer register (MBR): contains the data to be written into memory or receives the data read from memory
• I/O address register (I/OAR): specifies a particular I/O device
• I/O buffer register (I/OBR): used for the exchange of data between an I/O module and the CPU
Computer Components: Top-Level View
Figure 3.2 Computer Components: Top-Level View
  The CPU (PC, IR, MAR, MBR, I/O AR, I/O BR, execution unit), main memory (locations 0 to n-1, holding instructions and data), and an I/O module (buffers) are connected by the system bus.
  PC = Program counter
  IR = Instruction register
  MAR = Memory address register
  MBR = Memory buffer register
  I/O AR = Input/output address register
  I/O BR = Input/output buffer register

Basic Instruction Cycle
Figure 3.3 Basic Instruction Cycle
  START, then the fetch cycle (fetch next instruction), then the execute cycle (execute instruction), repeated until HALT

Fetch Cycle
• At the beginning of each instruction cycle the processor fetches an instruction from memory
• The program counter (PC) holds the address of the instruction to be fetched next
• The processor increments the PC after each instruction fetch so that it will fetch the next instruction in sequence
• The fetched instruction is loaded into the instruction register (IR)
• The processor interprets the instruction and performs the required action

Action Categories
• Processor-memory: data transferred from processor to memory or from memory to processor
• Processor-I/O: data transferred to or from a peripheral device by transferring between the processor and an I/O module
• Data processing: the processor may perform some arithmetic or logic operation on data
• Control: an instruction may specify that the sequence of execution be altered

Figure 3.4 Characteristics of a Hypothetical Machine
  (a) Instruction format: bits 0 to 3 hold the opcode, bits 4 to 15 hold the address
  (b) Integer format: bit 0 holds the sign (S), bits 1 to 15 hold the magnitude
  (c) Internal CPU registers:
      Program Counter (PC) = Address of instruction
      Instruction Register (IR) = Instruction being executed
      Accumulator (AC) = Temporary storage
  (d) Partial list of opcodes:
      0001 = Load AC from Memory
      0010 = Store AC to Memory
      0101 = Add to AC from Memory

Example of Program Execution
• The first 4 bits of the IR hold the opcode:
  - 0001: Load AC from memory
  - 0010: Store AC to memory
  - 0101: Add to AC from memory
• The remaining 12 bits of the IR hold the memory address of the operand

Instruction Cycle State Diagram
• Operations between the processor and memory or I/O (instruction fetch, operand fetch, operand store), for example fetching the operand from memory
• Internal operations of the processor (opcode decoding, address calculation), for example determining the address of the next instruction to be executed

Classes of Interrupts
Virtually all computers provide a mechanism by which other modules (I/O, memory) may interrupt the normal processing of the processor.
• Program: Generated by some condition that occurs as a result of an instruction execution, such as arithmetic overflow, division by zero, attempt to execute an illegal machine instruction, or reference outside a user's allowed memory space.
• Timer: Generated by a timer within the processor. This allows the operating system to perform certain functions on a regular basis.
• I/O: Generated by an I/O controller, to signal normal completion of an operation, request service from the processor, or to signal a variety of error conditions.
• Hardware failure: Generated by a failure such as power failure or memory parity error.

Program Flow Control
Figure 3.7 Program Flow of Control Without and With Interrupts
  (a) No interrupts
  (b) Interrupts; short I/O wait
  (c) Interrupts; long I/O wait
  The user program consists of code segments 1, 2, and 3 separated by WRITE calls. The I/O program consists of segment 4, the I/O command, and segment 5; with interrupts, segment 5 runs as part of the interrupt handler. In (b), segments 2 and 3 are split into 2a/2b and 3a/3b at the points where an interrupt occurs during execution of the user program.

Transfer of Control via Interrupts
Figure 3.8 Transfer of Control via Interrupts
  The interrupt occurs after instruction i of the user program; the interrupt handler runs, and the user program continues at instruction i+1.
When the interrupt processing is completed, execution resumes (Figure 3.8). Thus, the user program does not have to contain any special code to accommodate interrupts; the processor and the operating system are responsible for suspending the user program and then resuming it at the same point.
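The transfer of control described above can be sketched on the hypothetical machine of Figure 3.4. This is a minimal simulation, not a definitive model: the memory contents, the `interrupt_after` parameter, and the `count_interrupt` handler are illustrative assumptions, words are treated as unsigned 16-bit values (the sign-magnitude integer format is ignored), and the handler is a Python callable rather than machine code.

```python
LOAD, STORE, ADD = 0x1, 0x2, 0x5   # opcodes listed in Figure 3.4(d)


def run(memory, pc, steps, interrupt_after=None, handler=None):
    """Run `steps` instruction cycles starting at address `pc`.

    `interrupt_after` (assumed parameter) is the index of the instruction
    after which one interrupt is pending; `handler` stands in for the
    interrupt handler routine.
    """
    ac = 0
    trace = []
    for n in range(steps):
        # Fetch cycle: load IR from the address in PC, then increment PC.
        ir = memory[pc]
        pc += 1
        # Execute cycle: the top 4 bits are the opcode, the low 12 the address.
        opcode, address = ir >> 12, ir & 0x0FFF
        if opcode == LOAD:
            ac = memory[address]
        elif opcode == STORE:
            memory[address] = ac
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
        trace.append(f"user {pc - 1:03X}")
        # Interrupt cycle: save the context, run the handler, restore it.
        if n == interrupt_after and handler is not None:
            saved_pc, saved_ac = pc, ac
            handler(memory)
            trace.append("handler")
            pc, ac = saved_pc, saved_ac
    return ac, trace


def count_interrupt(memory):
    """Hypothetical handler: counts interrupts in location 950 (hex)."""
    memory[0x950] = memory.get(0x950, 0) + 1


def fresh_memory():
    # Assumed example program: load 940, add 941, store the sum in 941.
    return {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
            0x940: 0x0003, 0x941: 0x0002}


plain = fresh_memory()
ac_plain, trace_plain = run(plain, 0x300, 3)

interrupted = fresh_memory()
ac_int, trace_int = run(interrupted, 0x300, 3,
                        interrupt_after=0, handler=count_interrupt)
print(trace_int, hex(interrupted[0x941]))
```

The user program is the same in both runs. Only the simulated processor knows about the interrupt, which is the point made by Figure 3.8.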
Instruction Cycle With Interrupts
Figure 3.9 Instruction Cycle with Interrupts
  START, then the fetch cycle (fetch next instruction), then the execute cycle (execute instruction), then the interrupt cycle (check for interrupt; process interrupt). With interrupts disabled, the interrupt cycle is skipped and the next instruction is fetched; with interrupts enabled, a pending interrupt is processed before the next fetch. The cycle ends at HALT.

Program Timing: Short I/O Wait
Figure 3.10 Program Timing: Short I/O Wait
  (a) Without interrupts: the processor waits during each I/O operation (between segments 4 and 5)
  (b) With interrupts: each I/O operation runs concurrently with processor execution of the user program (segments 2a and 3a)
Figures 3.7b and 3.10b assume that the time required for the I/O operation is relatively short: less than the time to complete the execution of instructions between write operations in the user program. In this case, the segment of code labeled code segment 2 is interrupted. A portion of the code (2a) executes (while the I/O operation is performed) and then the interrupt occurs (upon the completion of the I/O operation). After the interrupt is serviced, execution resumes with the remainder of code segment 2 (2b).

Program Timing: Long I/O Wait
The more typical case, especially for a slow device such as a printer, is that the I/O operation will take much more time than executing a sequence of user instructions. Figure 3.7c indicates this state of affairs. In this case, the user program reaches the second WRITE call before the I/O operation spawned by the first call is complete. The result is that the user program is hung up at that point.
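The difference between the short-wait and long-wait cases can be checked with a little arithmetic. The sketch below is a simplified timing model under stated assumptions: `segments` holds the durations of user code segments 1, 2, 3, `prep` is I/O program segment 4, `finish` is segment 5, and `io` is the duration of the I/O operation itself. All names and numbers are illustrative and do not come from the figures.

```python
def time_without_interrupts(segments, prep, io, finish):
    """Processor idles for the whole I/O operation on every WRITE."""
    writes = len(segments) - 1
    return sum(segments) + writes * (prep + io + finish)


def time_with_interrupts(segments, prep, io, finish):
    """I/O overlaps the following user segment; the handler runs on completion."""
    t = 0
    io_done = None                      # completion time of the outstanding I/O
    for i, seg in enumerate(segments):
        if io_done is None:
            t += seg                    # nothing outstanding: just execute
        elif io_done <= t + seg:
            t += seg + finish           # short wait: interrupted mid-segment
        else:
            t = io_done + finish        # long wait: segment ends, CPU idles
        io_done = None
        if i < len(segments) - 1:       # a WRITE call follows this segment
            t += prep
            io_done = t + io
    return t


segments = (10, 10, 10)                 # assumed durations of segments 1, 2, 3
short = (time_without_interrupts(segments, 2, 5, 2),
         time_with_interrupts(segments, 2, 5, 2))
long_ = (time_without_interrupts(segments, 2, 25, 2),
         time_with_interrupts(segments, 2, 25, 2))
print("short I/O wait:", short)
print("long I/O wait: ", long_)
```

In the short-wait case the full I/O time is hidden behind user code. In the long-wait case only the time of the overlapping user segment is saved, because the program still stalls at the next WRITE.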
Figure 3.11 Program Timing: Long I/O Wait
  (a) Without interrupts: the processor waits during each I/O operation
  (b) With interrupts: each I/O operation runs concurrently with processor execution of the next user segment; then the processor waits for the operation to finish

Instruction Cycle State Diagram With Interrupts
Figure 3.12 Instruction Cycle State Diagram, With Interrupts
  States: instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch (repeated for multiple operands), data operation, operand address calculation, operand store (repeated for multiple results), interrupt check, interrupt. After the interrupt check, with no interrupt pending, the instruction is complete and the next instruction is fetched; for string or vector data the cycle returns to fetch further operands.

Multiple Interrupts: Transfer of Control
Two approaches can be taken to dealing with multiple interrupts. The first is to disable interrupts while an interrupt is being processed.
A second approach is to define priorities for interrupts and to allow an interrupt of higher priority to cause a lower-priority interrupt handler to be itself interrupted (Figure 3.13b).
Figure 3.13 Transfer of Control with Multiple Interrupts
  (a) Sequential interrupt processing: user program, interrupt handler X, interrupt handler Y
  (b) Nested interrupt processing: user program, interrupt handler X, interrupt handler Y
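The two approaches can be compared with a small unit-time simulation. This is a sketch under assumptions: the device names follow the multiple-interrupt example used in this chapter, but the priorities, arrival times, and 10-unit service times are illustrative values chosen here, and `simulate` and `runs` are hypothetical helper names.

```python
from itertools import groupby


def simulate(interrupts, end_time, nested=True):
    """Return which routine runs in each time unit.

    `interrupts` is a list of (arrival, name, priority, service_time).
    With nested=False a new interrupt waits until the current handler
    has finished (interrupts disabled while one is being processed).
    """
    stack = []      # active handlers, innermost last: [name, priority, remaining]
    pending = []
    timeline = []
    for t in range(end_time):
        pending += [list(i[1:]) for i in interrupts if i[0] == t]
        pending.sort(key=lambda h: h[1])            # highest priority last
        current = stack[-1][1] if stack else -1
        may_start = nested or not stack
        if pending and may_start and pending[-1][1] > current:
            stack.append(pending.pop())             # enter (or nest into) a handler
        if stack:
            stack[-1][2] -= 1
            timeline.append(stack[-1][0])
            if stack[-1][2] == 0:
                stack.pop()                         # handler done, resume the outer one
        else:
            timeline.append("user")
    return timeline


def runs(timeline):
    """Compress a per-unit timeline into (start, end, name) runs."""
    out, t = [], 0
    for name, group in groupby(timeline):
        n = len(list(group))
        out.append((t, t + n, name))
        t += n
    return out


# Assumed example values: (arrival time, device, priority, service time).
events = [(10, "printer", 2, 10), (15, "communication", 5, 10), (20, "disk", 4, 10)]
nested_runs = runs(simulate(events, 45, nested=True))
sequential_runs = runs(simulate(events, 45, nested=False))
for row in nested_runs:
    print(row)
```

With nesting, the communication handler preempts the printer handler and the printer handler finishes last. With interrupts disabled during handling, each handler runs to completion in turn.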
Time Sequence of Multiple Interrupts
Figure 3.14 Example Time Sequence of Multiple Interrupts
  Routines shown: user program, printer interrupt service routine, communication interrupt service routine, disk interrupt service routine. Time marks: t = 0, 10, 15, 25, 35, 40.

I/O Function
• I/O module can exchange data directly with the processor
• Processor can read data from or write data to an I/O module
  - Processor identifies a specific device that is controlled by a particular I/O module
  - I/O instructions rather than memory referencing instructions
• In some cases it is desirable to allow I/O exchanges to occur directly with memory
  - The processor grants to an I/O module the authority to read from or write to memory so that the I/O memory transfer can occur without tying up the processor
  - The I/O module issues read or write commands to memory, relieving the processor of responsibility for the exchange
  - This operation is known as direct memory access (DMA)

Computer Modules
Figure 3.15 Computer Modules
  Memory (N words, addresses 0 to N-1): inputs Read, Write, Address, Data; output Data
  I/O module (M ports): inputs Read, Write, Address, Internal Data, External Data; outputs Internal Data, External Data, Interrupt Signals
  CPU: inputs Instructions, Data, Interrupt Signals; outputs Address, Control Signals, Data

Type of Exchange of Computer Modules
• Memory
  - A memory module will consist of N words of equal length
  - Each word is assigned a unique numerical address (0, 1, ..., N-1)
  - A word of data can be read from or written into the memory
  - The nature of the operation is indicated by read and write control signals
  - The location for the operation is specified by an address
• I/O module
  - I/O is functionally similar to memory. There are two operations, read and write, for the data
  - It can control more than one external device.
  - Each interface to an external device can be referred to as a port and given a unique address (e.g., 0, 1, ..., M-1)
  - Also, there are external data paths for the input and output of data with an external device
  - It is able to send interrupt signals to the processor
• Processor
  - The processor reads in instructions and data, writes out data after processing, and uses control signals to control the overall operation of the system
  - It also receives interrupt signals

The interconnection structure must support the following types of transfers:
• Memory to processor: processor reads an instruction or a unit of data from memory
• Processor to memory: processor writes a unit of data to memory
• I/O to processor: processor reads data from an I/O device via an I/O module
• Processor to I/O: processor sends data to the I/O device
• I/O to or from memory: an I/O module is allowed to exchange data directly with memory, without going through the processor, using direct memory access

Bus Interconnection
• A communication pathway connecting two or more devices
  - Key characteristic is that it is a shared transmission medium
• Signals transmitted by any one device are available for reception by all other devices attached to the bus
  - If two devices transmit during the same time period their signals will overlap and become garbled
• Typically consists of multiple communication lines
  - Each line is capable of transmitting signals representing binary 1 and binary 0
• Computer systems contain a number of different buses that provide pathways between components at various levels of the computer system hierarchy
• System bus
  - A bus that connects major computer components (processor, memory, I/O)
• The most common computer interconnection structures are based on the use of one or more system buses

Data Bus
• Data lines that provide a path for moving data among system modules
• May consist of 32, 64, 128, or more separate lines
• The number of lines is referred to as
the width of the data bus
• The number of lines determines how many bits can be transferred at a time
• The width of the data bus is a key factor in determining overall system performance

Address Bus
• Used to designate the source or destination of the data on the data bus
  - If the processor wishes to read a word of data from memory it puts the address of the desired word on the address lines
• Width determines the maximum possible memory capacity of the system
• Also used to address I/O ports
  - The higher order bits are used to select a particular module on the bus and the lower order bits select a memory location or I/O port within the module

Control Bus
• Used to control the access and the use of the data and address lines
• Because the data and address lines are shared by all components there must be a means of controlling their use
• Control signals transmit both command and timing information among system modules
• Timing signals indicate the validity of data and address information
• Command signals specify operations to be performed

Bus Interconnection Scheme
Figure 3.16 Bus Interconnection Scheme
  CPU, memory modules, and I/O modules attached to a bus made up of control lines, address lines, and data lines

Bus Configurations
Figure 3.17 Example Bus Configurations
  (a) Traditional Bus Architecture: processor, cache, local bus, local I/O controller, main memory, system bus, expansion bus interface, expansion bus (network, SCSI, serial, modem)
  (b) High-Performance Architecture: processor, local bus, cache/bridge, main memory, system bus, high-speed bus (SCSI, FireWire, graphic, video, LAN), expansion bus interface, expansion bus (FAX, serial, modem)

Elements of Bus Design
• Type: Dedicated, Multiplexed
• Method of Arbitration: Centralized, Distributed
• Timing: Synchronous, Asynchronous
• Bus Width: Address, Data
• Data Transfer Type: Read, Write, Read-modify-write, Read-after-write, Block

Timing of Synchronous Bus Operations
Figure 3.18 Timing of Synchronous Bus Operations
  Signals over clock periods T1, T2, T3: clock, status lines (status signals), address lines (stable address), address enable; read cycle: data lines (valid data in), Read; write cycle: data lines (valid data out), Write

Timing of Asynchronous Bus Operations
Figure 3.19 Timing of Asynchronous Bus Operations
  (a) System bus read cycle: status lines (status signals), address lines (stable address), Read, data lines (valid data), Acknowledge
  (b) System bus write cycle: status lines (status signals), address lines (stable address), data lines (valid data), Write, Acknowledge

Point-to-Point Interconnect
The shared bus architecture was the standard approach to interconnection between the processor and other components (memory, I/O, and so on) for decades. But contemporary systems increasingly rely on point-to-point interconnection rather than shared buses.
• Principal reason for the change was the electrical constraints encountered with increasing the frequency of wide synchronous buses
• At higher and higher data rates it becomes increasingly difficult to perform the synchronization and arbitration functions in a timely fashion
• A conventional shared bus on the same chip magnified the difficulties of increasing bus data rate and reducing bus latency to keep up with the processors
• Point-to-point interconnect has lower latency, higher data rate, and better scalability

Quick Path Interconnect (QPI)
• Introduced in 2008
• Multiple direct connections
  - Direct pairwise connections to other components, eliminating the need for arbitration found in shared transmission systems
• Layered protocol architecture
  - These processor-level interconnects use a layered protocol architecture rather than the simple use of control signals found in shared bus arrangements
• Packetized data transfer
  - Data are sent as a sequence of packets, each of which includes control headers and error control codes

Multicore Configuration Using QPI
Figure 3.20 Multicore Configuration Using QPI
  Cores A, B, C, and D, each with attached DRAM over a memory bus; two I/O hubs; I/O devices; components joined by QPI and PCI Express links

QPI Layers
Figure 3.21 QPI Layers
  Each of two connected devices has Protocol, Routing, Link, and Physical layers; the units exchanged are packets (Protocol), flits (Link), and phits (Physical)
QPI is defined as a four-layer protocol architecture, encompassing the following layers (Figure 3.21):
• Physical: Consists of the actual wires carrying the signals, as well as circuitry and logic to support ancillary features required in the transmission and receipt of the 1s and 0s. The unit of transfer at the Physical layer is 20 bits, which is called a Phit (physical unit).
• Link: Responsible for reliable transmission and flow control. The Link layer's unit of transfer is an 80-bit Flit (flow control unit).
• Routing: Provides the framework for directing packets through the fabric.
• Protocol: The high-level set of rules for exchanging packets of data between devices. A packet consists of an integral number of Flits.

Physical Interface of the Intel QPI Interconnect
Figure 3.22 Physical Interface of the Intel QPI Interconnect
  Component A and Component B each have an Intel QuickPath Interconnect port with transmission lanes, reception lanes, a forwarded clock (Fwd Clk), and a received clock (Rcv Clk)

QPI Multilane Distribution
Figure 3.23 QPI Multilane Distribution
  The bit stream of flits is distributed round-robin over the lanes: bits #1, #n+1, #2n+1 go to QPI lane 0; bits #2, #n+2, #2n+2 go to QPI lane 1; and so on up to bits #n, #2n, #3n on QPI lane 19

QPI Link Layer
• Performs two key functions: flow control and error control
  - Operate on the level of the flit (flow control unit)
  - Each flit consists of a 72-bit message payload and an 8-bit error control code called a cyclic redundancy check (CRC)
• Flow control function
  - Needed to ensure that a sending QPI entity does not overwhelm a receiving QPI entity by sending data faster than the receiver can process the data and clear buffers for more incoming data
• Error control function
  - Detects and recovers from bit errors, and so isolates higher layers from experiencing bit errors

QPI Routing and Protocol Layers
• Routing Layer
  - Used to determine the course that a packet will traverse across the available system interconnects
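The flit and phit sizes given above for the Link and Physical layers can be made concrete with a short sketch. It assumes a generic CRC-8 with polynomial 0x07 purely as a stand-in (the actual QPI CRC is not specified in these notes), and the bit ordering of phits and lanes is likewise an illustrative choice. The function names are hypothetical.

```python
PAYLOAD_BITS, CRC_BITS, PHIT_BITS, LANES = 72, 8, 20, 20
FLIT_BITS = PAYLOAD_BITS + CRC_BITS          # 80 bits per flit


def crc8(data: bytes, poly: int = 0x07) -> int:
    """Generic bitwise CRC-8; a stand-in, NOT the real QPI polynomial."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc


def make_flit(payload: int) -> int:
    """Append the 8-bit CRC to a 72-bit payload, giving an 80-bit flit."""
    if not 0 <= payload < 1 << PAYLOAD_BITS:
        raise ValueError("payload must fit in 72 bits")
    return (payload << CRC_BITS) | crc8(payload.to_bytes(9, "big"))


def check_flit(flit: int) -> bool:
    """Receiver side: recompute the CRC and compare (error detection)."""
    payload, crc = flit >> CRC_BITS, flit & 0xFF
    return crc8(payload.to_bytes(9, "big")) == crc


def to_phits(flit: int) -> list:
    """Split one 80-bit flit into four 20-bit phits (first phit = top bits)."""
    mask = (1 << PHIT_BITS) - 1
    shifts = range(FLIT_BITS - PHIT_BITS, -1, -PHIT_BITS)
    return [(flit >> s) & mask for s in shifts]


def to_lanes(flit: int) -> list:
    """Round-robin the 80 bits over 20 lanes: bit k goes to lane k mod 20."""
    bits = [(flit >> (FLIT_BITS - 1 - k)) & 1 for k in range(FLIT_BITS)]
    return [bits[lane::LANES] for lane in range(LANES)]


flit = make_flit(0x123456789ABCDEF012)
phits = to_phits(flit)
lanes = to_lanes(flit)
print(len(phits), "phits per flit;", len(lanes[0]), "bits per lane per flit")
```

Each phit corresponds to one bit time across all 20 lanes, so a flit takes four bit times per lane in this model.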
  - Routing tables are defined by firmware and describe the possible paths that a packet can follow
• Protocol Layer
  - Packet is defined as the unit of transfer
  - One key function performed at this level is a cache coherency protocol, which deals with making sure that main memory values held in multiple caches are consistent
  - A typical data packet payload is a block of data being sent to or from a cache

Peripheral Component Interconnect (PCI)
• A popular high bandwidth, processor independent bus that can function as a mezzanine or peripheral bus
• Delivers better system performance for high speed I/O subsystems
• PCI Special Interest Group (SIG)
  - Created to develop further and maintain the compatibility of the PCI specifications
• PCI Express (PCIe)
  - Point-to-point interconnect scheme intended to replace bus-based schemes such as PCI
  - Key requirement is high capacity to support the needs of higher data rate I/O devices, such as Gigabit Ethernet
  - Another requirement deals with the need to support time dependent data streams

PCIe Configuration
Figure 3.24 Typical Configuration Using PCIe
  Cores and memory attach to a chipset; the chipset connects over PCIe links to Gigabit Ethernet, a PCIe-PCI bridge, and a PCIe switch; the switch connects to PCIe endpoints and a legacy endpoint

PCIe Protocol Layers
Figure 3.25 PCIe Protocol Layers
  Each device has Transaction, Data Link, and Physical layers; the layers exchange transaction layer packets (TLP) and data link layer packets (DLLP)

PCIe Multilane Distribution
Figure 3.26 PCIe Multilane Distribution
  The byte stream B0 to B7 is distributed over four lanes, each with 128b/130b encoding: B0 and B4 on PCIe lane 0, B1 and B5 on lane 1, B2 and B6 on lane 2, B3 and B7 on lane 3

PCIe Transmit and Receive Block Diagrams
Figure 3.27 PCIe Transmit and Receive Block Diagrams
  (a) Transmitter: 8b in, scrambler, 128b/130b encoding (130b), parallel to serial (1b), differential driver (D+, D-)
  (b) Receiver: differential receiver (D+, D-), clock recovery circuit and data recovery circuit (1b), serial to parallel (130b), 128b/130b decoding (128b), descrambler, 8b out
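The multilane distribution of Figure 3.26 and the cost of 128b/130b encoding can be sketched as follows. This is an illustration only: scrambling and the actual sync header contents are omitted, and the function names are hypothetical.

```python
import math


def distribute(data: bytes, lanes: int = 4) -> list:
    """Round-robin bytes over lanes: byte k goes to lane k mod `lanes`."""
    return [data[i::lanes] for i in range(lanes)]


def merge(per_lane: list) -> bytes:
    """Receiver side: interleave the lane streams back into one byte stream."""
    out = bytearray()
    for i in range(max(len(lane) for lane in per_lane)):
        for lane in per_lane:
            if i < len(lane):
                out.append(lane[i])
    return bytes(out)


def encoded_bits(payload_bytes: int) -> int:
    """Bits on one lane after 128b/130b encoding.

    Every block of up to 128 payload bits is sent as a 130-bit block
    (the 2 extra bits form the block header).
    """
    blocks = math.ceil(payload_bytes * 8 / 128)
    return blocks * 130


stream = bytes(range(8))                 # stands for B0..B7 in Figure 3.26
lanes = distribute(stream)
efficiency = 128 / 130
print([list(lane) for lane in lanes])
print(f"128b/130b efficiency: {efficiency:.4f}")
```

Round-robin distribution keeps the lanes evenly loaded, and the 2-bit header per 128-bit block costs roughly 1.5 percent of the raw line rate.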
PCIe Transaction Layer (TL)
• Receives read and write requests from the software above the TL and creates request packets for transmission to a destination via the link layer
• Most transactions use a split transaction technique
  - A request packet is sent out by a source PCIe device, which then waits for a response called a completion packet
• TL messages and some write transactions are posted transactions (meaning that no response is expected)
• TL packet format supports 32-bit memory addressing and extended 64-bit memory addressing

The TL supports four address spaces:
• Memory
  - The memory space includes system main memory and PCIe I/O devices
  - Certain ranges of memory addresses map into I/O devices
• Configuration
  - This address space enables the TL to read/write configuration registers associated with I/O devices
• I/O
  - This address space is used for legacy PCI devices, with reserved address ranges used to address legacy I/O devices
• Message
  - This address space is for control signals related to interrupts, error handling, and power management

PCIe TLP Transaction Types
• Address space: Memory
  - TLP types: Memory Read Request, Memory Read Lock Request, Memory Write Request
  - Purpose: Transfer data to or from a location in the system memory map.
• Address space: I/O
  - TLP types: I/O Read Request, I/O Write Request
  - Purpose: Transfer data to or from a location in the system memory map for legacy devices.
• Address space: Configuration
  - TLP types: Config Type 0 Read Request, Config Type 0 Write Request, Config Type 1 Read Request, Config Type 1 Write Request
  - Purpose: Transfer data to or from a location in the configuration space of a PCIe device.
• Address space: Message
  - TLP types: Message Request, Message Request with Data
  - Purpose: Provides in-band messaging and event reporting.
• Address space: Memory, I/O, Configuration
  - TLP types: Completion, Completion with Data, Completion Locked, Completion Locked with Data
  - Purpose: Returned for certain requests.
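The pairing of requests with completions in the table above can be sketched as a lookup. The slide only says that messages and some writes are posted; the specific classification below (memory writes and messages posted, everything else non-posted) reflects a general understanding of PCIe and should be read as an assumption, as should the names `REQUESTS` and `expected_completion`.

```python
# TLP request type -> (address space, posted?)
# Posted means no completion packet is expected (assumed classification).
REQUESTS = {
    "Memory Read Request": ("Memory", False),
    "Memory Read Lock Request": ("Memory", False),
    "Memory Write Request": ("Memory", True),
    "I/O Read Request": ("I/O", False),
    "I/O Write Request": ("I/O", False),
    "Config Type 0 Read Request": ("Configuration", False),
    "Config Type 0 Write Request": ("Configuration", False),
    "Config Type 1 Read Request": ("Configuration", False),
    "Config Type 1 Write Request": ("Configuration", False),
    "Message Request": ("Message", True),
    "Message Request with Data": ("Message", True),
}


def expected_completion(tlp_type: str):
    """Return the completion a requester waits for, or None if posted."""
    space, posted = REQUESTS[tlp_type]
    if posted:
        return None                       # fire and forget
    if "Lock" in tlp_type:
        return "Completion Locked with Data"
    if "Read" in tlp_type:
        return "Completion with Data"     # the read data comes back
    return "Completion"                   # non-posted write: status only


for name in ("Memory Read Request", "Memory Write Request", "I/O Write Request"):
    print(f"{name}: {expected_completion(name)}")
```

A split transaction is then simply a non-posted request whose requester keeps the transaction open until the matching completion arrives.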
PCIe Protocol Data Unit Format
Figure 3.28 PCIe Protocol Data Unit Format
  (a) Transaction Layer Packet (field: number of octets, origin)
    - STP framing: 1, appended by Physical Layer
    - Sequence number: 2, appended by Data Link Layer
    - Header: 12 or 16, created by Transaction Layer
    - Data: 0 to 4096, created by Transaction Layer
    - ECRC: 0 or 4, created by Transaction Layer
    - LCRC: 4, appended by Data Link Layer
    - STP framing: 1, appended by Physical Layer
  (b) Data Link Layer Packet (field: number of octets, origin)
    - Start: 1, appended by PL
    - DLLP: 4, created by DLL
    - CRC: 2, created by DLL
    - End: 1, appended by PL

TLP Memory Request Format
Figure 3.29 TLP Memory Request Format
  A 16-octet header drawn as rows of 32 bits. Fields: Fmt, Type, Traffic Class, Attr, Length, Requestor ID, Tag, Last DW BE, First DW BE, Address [63:32], Address [31:2]; R marks reserved bits.

Summary
Chapter 3: A Top-Level View of Computer Function and Interconnection
• Computer components
• Computer function
  - Instruction fetch and execute
  - Interrupts
  - I/O function
• Interconnection structures
• Bus interconnection
  - Bus structure
  - Multiple bus hierarchies
  - Elements of bus design
• Point-to-point interconnect
  - QPI physical layer
  - QPI link layer
  - QPI routing layer
  - QPI protocol layer
• PCI express
  - PCI physical and logical architecture
  - PCIe physical layer
  - PCIe transaction layer
  - PCIe data link layer