03 Top Level View of Computer Function and Interconnection

Chapter 3
A Top-Level View of Computer
Function and Interconnection
+
+
Computer Components

Contemporary computer designs are based on concepts
developed by John von Neumann at the Institute for
Advanced Studies, Princeton

Referred to as the von Neumann architecture and is based on
three key concepts:


Data and instructions are stored in a single read-write memory

The contents of this memory are addressable by location, without
regard to the type of data contained there

Execution occurs in a sequential fashion (unless explicitly
modified) from one instruction to the next
Hardwired program

The result of the process of connecting the various components in
the desired configuration
+
Hardware
and Software
Approaches
Data
Sequence of
arithmetic
and logic
functions
Results
(a) Programming in hardware
Instruction
codes
Instruction
interpreter
Control
signals
Data
General-purpose
arithmetic
and logic
functions
Results
(b) Programming in software
Figure 3.1 Hardware and Software Approaches
Software
• A sequence of codes or instructions
• Part of the hardware interprets each instruction and
generates control signals
• Provide a new sequence of codes for each new
program instead of rewiring the hardware
Major components:
• CPU
• Instruction interpreter
• Module of general-purpose arithmetic and logic
functions
• I/O Components
+ • Input module
• Contains basic components for accepting data
and instructions and converting them into an
internal form of signals usable by the system
• Output module
• Means of reporting results
Software
I/O
Components
Memory
address
register (MAR)
• Specifies the
address in memory
for the next read or
write
Memory buffer
register (MBR)
MEMORY
• Contains the data
to be written into
memory or
receives the data
read from memory
MAR
+
I/O address
register (I/OAR)
I/O buffer
register (I/OBR)
• Specifies a
particular I/O
device
• Used for the
exchange of data
between an I/O
module and the
CPU
MBR
CPU
PC
Main Memory
MAR
IR
0
1
2
System
Bus
Instruction
Instruction
Instruction
MBR
Computer
Components:
Top Level
View
I/O AR
Execution
unit
Data
Data
Data
Data
I/O BR
I/O Module
Buffers
n–2
n–1
PC
IR
MAR
MBR
I/O AR
I/O BR
=
=
=
=
=
=
Program counter
Instruction register
Memory address register
Memory buffer register
Input/output address register
Input/output buffer register
Figure 3.2 Computer Components: Top-Level View
+
Basic Instruction Cycle
START
Fetch Cycle
Execute Cycle
Fetch Next
Instruction
Execute
Instruction
Figure 3.3 Basic Instruction Cycle
HALT
+
Fetch Cycle

At the beginning of each instruction cycle the processor
fetches an instruction from memory

The program counter (PC) holds the address of the
instruction to be fetched next

The processor increments the PC after each instruction
fetch so that it will fetch the next instruction in sequence

The fetched instruction is loaded into the instruction
register (IR)

The processor interprets the instruction and performs the
required action
Action Categories
• Data transferred
from processor to
memory or from
memory to
processor
• An instruction
may specify that
the sequence of
execution be
altered
• Data transferred to
or from a peripheral
device by
transferring
between the
processor and an
I/O module
Processormemory
ProcessorI/O
Control
Data
processing
• The processor
may perform
some arithmetic
or logic operation
on data
+
0
3 4
15
Opcode
Address
(a) Instruction format
0
1
15
S
Magnitude
(b) Integer format
Program Counter (PC) = Address of instruction
Instruction Register (IR) = Instruction being executed
Accumulator (AC) = Temporary storage
(c) Internal CPU registers
0001 = Load AC from Memory
0010 = Store AC to Memory
0101 = Add to AC from Memory
(d) Partial list of opcodes
Figure 3.4 Characteristics of a Hypothetical Machine
Example
of
Program
Execution
+
The first 4 bits of IR:
0001: Load AC from memory
0010: Store AC to memory
0101: Add to AC from memory
The remaining 12 bits of IR:
Stands for the address of the memory
+
Instruction Cycle State Diagram
Operations b/w
processor & mem.
or I/O
Instruction
Operand
Opcode
Operand를 기억장치로부터 가져옴
Internal
operations of
processor
다음에 실행될 명령어의 주소 결정
Classes of Interrupts
+
Virtually all computers provide a mechanism by which other modules (I/O,
memory) may interrupt the normal processing of the processor.
Program
Generated by some condition that occurs as a result of an instruction
execution, such as arithmetic overflow, division by zero, attempt to
execute an illegal machine instruction, or reference outside a user's
allowed memory space.
Timer
Generated by a timer within the processor. This allows the operating
system to perform certain functions on a regular basis.
I/O
Generated by an I/O controller, to signal normal completion of an
operation, request service from t he processor, or to signal a variety of
error conditions.
Hardware failure
Generated by a failure such as power failure or memory parity error.
Program Flow Control
User
Program
I/O
Program
4
1
I/O
Command
WRITE
User
Program
I/O
Program
4
1
WRITE
I/O
Command
User
Program
I/O
Program
4
1
WRITE
I/O
Command
5
2a
END
2
2
Interrupt
Handler
2b
WRITE
5
WRITE
Interrupt
Handler
5
WRITE
END
END
3a
3
3
3b
WRITE
WRITE
(a) No interrupts
(b) Interrupts; short I/O wait
WRITE
(c) Interrupts; long I/O wait
= interrupt occurs during course of execution of user program
Figure 3.7 Program Flow of Control Without and With Interrupts
+
Transfer of Control via Interrupts
User Program
Interrupt Handler
1
2
Interrupt
occurs here
i
i+1
M
Figure 3.8 Transfer of Control via Interrupts
When the interrupt
processing is completed,
execution resumes (Figure
3.8). Thus, the user
program does not have to
contain any
special code to
accommodate interrupts;
the processor and the
operating system are
responsible for
suspending the user
program and then
resuming it at the same
point.
+
Instruction Cycle With Interrupts
Fetch Cycle
Execute Cycle
Interrupt Cycle
Interrupts
Disabled
START
Fetch Next
Instruction
Execute
Instruction
Check for
Interrupt;
Interrupts Process Interrupt
Enabled
HALT
Figure 3.9 Instruction Cycle with Interrupts
Program Timing: Short I/O Wait
Time
+
1
1
4
4
I/O operation;
processor waits
5
2a
I/O operation
concurrent with
processor executing
5
2b
2
4
4
3a
I/O operation;
processor waits
5
3
I/O operation
concurrent with
processor executing
5
3b
(b) With interrupts
(a) Without interrupts
Figure 3.10 Program Timing: Short I/O Wait
Figures 3.7b and 3.10b assume that
the time required for the I/O
operation is
relatively short: less than the time
to complete the execution of
instructions between write
operations in the user program. In
this case, the segment of code
labeled code segment
2 is interrupted. A portion of the
code (2a) executes (while the I/O
operation is performed)
and then the interrupt occurs (upon
the completion of the I/O
operation). After
the interrupt is serviced, execution
resumes with the remainder of code
segment 2 (2b).
Program Timing: Long I/O Wait
Time
+
1
1
4
4
I/O operation;
processor waits
2
5
5
2
The more typical case, especially for a slow
device such as a printer, is that the
I/O operation will take much more time
than executing a sequence of user
I/O operation
instructions.
concurrent with
processor executing; Figure 3.7c indicates this state of affairs. In
then processor
this case, the user program reaches
waits
the second WRITE call before the I/O
operation spawned by the first call is
complete.
The result is that the user program is hung
up at that point.
4
4
3
I/O operation;
processor waits
I/O operation
concurrent with
processor executing;
then processor
waits
5
5
3
(b) With interrupts
(a) Without interrupts
Figure 3.11 Program Timing: Long I/O Wait
Instruction Cycle State Diagram
With Interrupts
Operand
fetch
Instruction
fetch
Operand
store
Multiple
operands
Instruction
address
calculation
Instruction
operation
decoding
Operand
address
calculation
Instruction complete,
fetch next instruction
Multiple
results
Data
Operation
Return for string
or vector data
Operand
address
calculation
Interrupt
check
No
interrupt
Figure 3.12 Instruction Cycle State Diagram, With Interrupts
Interrupt
User program
Interrupt
handler X
Two approaches can be taken to dealing with multiple
interrupts. The first is to
disable interrupts while an interrupt is being processed.
Interrupt
handler Y
Transfer of
Control
(a) Sequential interrupt processing
User program
Multiple
Interrupts
Interrupt
handler X
Interrupt
handler Y
+
(b) Nested interrupt processing
Figure 3.13 Transfer of Control with Multiple Interrupts
A second approach is to define priorities for interrupts
and to allow an
interrupt of higher priority to cause a lower-priority
interrupt handler to be itself
interrupted (Figure 3.13b).
+ Time Sequence of
Multiple Interrupts
Printer
interrupt service routine
User program
Communication
interrupt service routine
t=0
Example
t
0
=1
t=
15
t = 25
t=
40
t = 25
t=
Disk
interrupt service routine
35
Figure 3.14 Example Time Sequence of Multiple Interrupts
+
I/O Function

I/O module can exchange data directly with the processor

Processor can read data from or write data to an I/O module


Processor identifies a specific device that is controlled by a
particular I/O module

I/O instructions rather than memory referencing instructions
In some cases it is desirable to allow I/O exchanges to occur
directly with memory

The processor grants to an I/O module the authority to read from
or write to memory so that the I/O memory transfer can occur
without tying up the processor

The I/O module issues read or write commands to memory
relieving the processor of responsibility for the exchange

This operation is known as direct memory access (DMA)
+
Read
Memory
Write
N Words
Address
Data
Read
0
N–1
I/O Module
Write
Address
M Ports
Internal
Data
External
Data
Address
Instructions
Interrupt
Signals
Internal
Data
Interrupt
Signals
External
Data
Data
Data
CPU
Control
Signals
Data
Figure 3.15 Computer Modules
Computer
Modules
Type of Exchange of Computer
Modules
• Memory
— A memory module will consist of N words of equal length
— Each word is assigned a unique numerical address (0,1,...,N1)
— A word of data can be read from or written into the memory
— The nature of the operation is indicated by read and write
control signals
— The location for the operation is specified by an address
24
Type of Exchange of Computer
Modules
• I/O module
— I/O is functionally similar to memory. There are two
operations, read and write for the data
— It controls more than one external device. We can refer to
each of the interfaces to an external device as a port and give
each a unique address (e.g., 0,1,...,M-1)
— Also, there are external data paths for the input and output
of data with an external device
— It is able to send interrupt signals to the processor
25
Type of Exchange of Computer
Modules
• Processor
— The processor reads in instructions and data, writes out data
after processing, and uses control signals to control the overall
operation of the system
— It also receives interrupt signals
26
The interconnection structure must support the
following types of transfers:
Memory
to
processor
Processor
reads an
instruction
or a unit of
data from
memory
Processor
to
memory
Processor
writes a
unit of data
to memory
I/O to
processor
Processor
reads data
from an I/O
device via
an I/O
module
Processor
to I/O
I/O to or
from
memory
Processor
sends data
to the I/O
device
An I/O
module is
allowed to
exchange
data
directly
with
memory
without
going
through the
processor
using direct
memory
access
Bu s
A communication pathway
connecting two or more
devices
• Key characteristic is that it is a
shared transmission medium
Typically consists of multiple
communication lines
• Each line is capable of
transmitting signals representing
binary 1 and binary 0
In te rc o nn ecti on
Signals transmitted by any
one device are available for
reception by all other
devices attached to the bus
• If two devices transmit during the
same time period their signals will
overlap and become garbled
Computer systems contain a
number of different buses
that provide pathways
between components at
various levels of the
computer system hierarchy
System bus
• A bus that connects major
computer components (processor,
memory, I/O)
The most common computer
interconnection structures
are based on the use of one
or more system buses
Data Bus

Data lines that provide a path for moving data among system
modules

May consist of 32, 64, 128, or more separate lines

The number of lines is referred to as the width of the data bus

The number of lines determines how many bits can be
transferred at a time

The width of the data bus
is a key factor in
determining overall
system performance
+ Address Bus



Used to designate the source or
destination of the data on the
data bus
 If the processor wishes to
read a word of data from
memory it puts the address of
the desired word on the
address lines
Control Bus

Used to control the access and the
use of the data and address lines

Because the data and address lines
are shared by all components there
must be a means of controlling their
use

Control signals transmit both
command and timing information
among system modules

Timing signals indicate the validity
of data and address information

Command signals specify operations
to be performed
Width determines the maximum
possible memory capacity of the
system
Also used to address I/O ports
 The higher order bits are
used to select a particular
module on the bus and the
lower order bits select a
memory location or I/O port
within the module
Bus Interconnection Scheme
CPU
Memory
Memory
I/O
I/O
Control lines
Address lines
Data lines
Figure 3.16 Bus Interconnection Scheme
Bus
Local Bus
Processor
Bus
Cache
Configurations
Local I/O
controller
Main
Memory
System Bus
Network
Expansion
bus interface
SCSI
Serial
Modem
Expansion Bus
(a) Traditional Bus Architecture
Main
Memory
Processor
SCSI
Local Bus
Cache
/Bridge
FireWire
System Bus
Graphic
Video
LAN
High-Speed Bus
FAX
Expansion
bus interface
Serial
Modem
Expansion Bus
(b) High-Performance Architecture
Figure 3.17 Example Bus Configurations
+
Elements of Bus Design
Type
Dedicated
Multiplexed
Method of Arbitration
Centralized
Distributed
Timing
Synchronous
Asynchronous
Bus Width
Address
Data
Data Transfer Type
Read
Write
Read-modify-write
Read-after-write
Block
T1
T2
T3
Clock
Status
lines
Address
lines
Status signals
Stable address
Address
enable
Read
cycle
Data
lines
Timing of
Synchronous
Bus
Operations
Valid data in
Read
Write
cycle
Data
lines
Valid data out
Write
Figure 3.18 Timing of Synchronous Bus Operations
Status
lines
Status signals
Address
lines
Stable address
Read
Data
lines
Valid data
Acknowledge
(a) System bus read cycle
Status
lines
Status signals
Address
lines
Stable address
Data
lines
Valid data
Write
Acknowledge
(b) System bus write cycle
Figure 3.19 Timing of Asynchronous Bus Operations
Timing of
Asynchronous
Bus
Operations
+ Point-to-Point Interconnect
The shared bus architecture was the standard approach to interconnection between
the processor and other components (memory, I/O, and so on) for decades. But
contemporary systems increasingly rely on point-to-point interconnection rather
than shared buses.
Principal reason for change
was the electrical
constraints encountered
with increasing the
frequency of wide
synchronous buses
At higher and higher data
rates it becomes
increasingly difficult to
perform the synchronization
and arbitration functions in a
timely fashion
A conventional shared bus
on the same chip magnified
the difficulties of increasing
bus data rate and reducing
bus latency to keep up with
the processors
Has lower latency, higher
data rate, and better
scalability  Point to Point
Interconnect
+Quick Path Interconnect

Introduced in 2008

Multiple direct connections


Layered protocol architecture


Direct pairwise connections to other components
eliminating the need for arbitration found in shared
transmission systems
These processor level interconnects use a layered
protocol architecture rather than the simple use of
control signals found in shared bus arrangements
Packetized data transfer

Data are sent as a sequence of packets each of which
includes control headers and error control codes
QPI
QPI
I/O Hub
PCI Express
I/O device
DRAM
Core
D
DRAM
Core
C
Multicore
Configuration
Using
QPI
I/O device
I/O device
DRAM
Core
B
I/O device
Core
A
DRAM
I/O Hub
Memory bus
Figure 3.20 Multicore Configuration Using QPI
QPI Layers
Packets
Protocol
Protocol
Routing
Routing
Link
Physical
Flits
Phits
Link
Physical
QPI is defined as a four-layer protocol architecture, encompassing the
Figure 3.21 QPI Layers
following layers (Figure 3.21):
• Physical: Consists of the actual wires carrying the signals, as well as circuitry and logic to
support ancillary features required in the transmission and receipt of the 1s and 0s. The unit
of transfer at the Physical layer is 20 bits, which is called a Phit (physical unit).
• Link: Responsible for reliable transmission and flow control. The Link layer’s unit of
transfer is an 80-bit Flit (flow control unit).
• Routing: Provides the framework for directing packets through the fabric.
• Protocol: The high-level set of rules for exchanging packets of data between devices. A
packet is comprised of an integral number of Flits.
Physical Interface of the Intel QPI
Interconnect
COMPONENT A
Fwd Clk
Transmission Lanes
Reception Lanes
Rcv Clk
Reception Lanes
Transmission Lanes
Fwd Clk
Intel QuickPath Interconnect Port
Rcv Clk
+
Intel QuickPath Interconnect Port
COMPONENT B
Figure 3.22 Physical Interface of the Intel QPI Interconnect
+
QPI Multilane Distribution
bit stream of flits
#2n+1
#2n
#n+2
#n+1
#n
#2
#2n+1
#n+1
#1
QPI
lane 0
#2n+2
#n+2
#2
QPI
lane 1
#3n
#2n
#n
QPI
lane 19
#1
Figure 3.23 QPI Multilane Distribution
+
QPI Link Layer


Performs two key
functions: flow control and
error control

Operate on the level of
the flit (flow control
unit)

Each flit consists of a 72bit message payload
and an 8-bit error
control code called a
cyclic redundancy check
(CRC)
Flow control function
 Needed to ensure that a
sending QPI entity does not
overwhelm a receiving QPI
entity by sending data faster
than the receiver can process
the data and clear buffers for
more incoming data

Error control function

Detects and recovers from
bit errors, and so isolates
higher layers from
experiencing bit errors
+
QPI Routing and Protocol Layers
Routing Layer


Used to determine the course
that a packet will traverse
across the available system
interconnects
Protocol Layer

Packet is defined as the unit of
transfer

One key function performed at
this level is a cache coherency
protocol which deals with
making sure that main
memory values held in
multiple caches are consistent

A typical data packet payload
is a block of data being sent to
or from a cache
Defined by firmware and
describe the possible paths
that a packet can follow
+
Peripheral Component
Interconnect (PCI)

A popular high bandwidth, processor independent bus that can
function as a mezzanine or peripheral bus

Delivers better system performance for high speed I/O
subsystems

PCI Special Interest Group (SIG)


Created to develop further and maintain the compatibility of the PCI
specifications
PCI Express (PCIe)



Point-to-point interconnect scheme intended to replace bus-based
schemes such as PCI
Key requirement is high capacity to support the needs of higher data rate
I/O devices, such as Gigabit Ethernet
Another requirement deals with the need to support time dependent data
streams
+
PCIe
Configuration
Core
Gigabit
Ethernet
Core
PCIe
Memory
Chipset
PCIe–PCI
Bridge
PCIe
Memory
PCIe
PCIe
PCIe
Legacy
endpoint
PCIe
Switch
PCIe
endpoint
PCIe
PCIe
endpoint
PCIe
endpoint
Figure 3.24 Typical Configuration Using PCIe
+
PCIe Protocol Layers
Transaction
Data Link
Physical
Transaction layer
packets (TLP)
Data link layer
packets (DLLP)
Transaction
Data Link
Physical
Figure 3.25 PCIe Protocol Layers
+
PCIe Multilane Distribution
B4
B0
128b/
130b
PCIe
lane 0
B5
B1
128b/
130b
PCIe
lane 1
B6
B2
128b/
130b
PCIe
lane 2
B7
B3
128b/
130b
PCIe
lane 3
byte stream
B7
B6
B5
B4
B3
B2
B1
B0
Figure 3.26 PCIe Multilane Distribution
D+ D–
8b
Scrambler
8b
128b/130b Encoding
130b
Parallel to serial
1b
Transmitter Differential
Driver
Differential
Receiver
Clock recovery
circuit
1b
Data recovery
circuit
1b
Serial to parallel
130b
128b/130b Decoding
128b
D+ D–
Descrambler
(a) Transmitter
8b
(b) Receiver
Figure 3.27 PCIe Transmit and Receive Block Diagrams
PCIe
Transmit
and
Receive
Block
Diagrams
+

Receives read and write requests from
the software above the TL and creates
request packets for transmission to a
destination via the link layer

Most transactions use a split transaction
technique
PCIe
Transaction
Layer (TL)

A request packet is sent out by a
source PCIe device which then waits
for a response called a completion
packet

TL messages and some write
transactions are posted transactions
(meaning that no response is
expected)

TL packet format supports 32-bit
memory addressing and extended
64-bit memory addressing
+
The TL supports four address
spaces:


Memory
 The memory space includes
system main memory and
PCIe I/O devices
 Certain ranges of memory
addresses map into I/O
devices
Configuration

This address space enables
the TL to read/write
configuration registers
associated with I/O devices

I/O


This address space is used
for legacy PCI devices, with
reserved address ranges
used to address legacy I/O
devices
Message

This address space is for
control signals related to
interrupts, error handling,
and power management
PCIe TLP Transaction Types
Address Space
Memory
I/O
Configuration
Message
Memory, I/O,
Configuration
TLP Type
Memory Read Request
Memory Read Lock Request
Memory Write Request
I/O Read Request
I/O Write Request
Config Type 0 Read Request
Config Type 0 Write Request
Config Type 1 Read Request
Config Type 1 Write Request
Message Request
Message Request with Data
Completion
Completion with Data
Completion Locked
Completion Locked with Data
Purpose
Transfer data to or from a location in the
system memory map.
Transfer data to or from a location in the
system memory map for legacy devices.
Transfer data to or from a location in the
configuration space of a PCIe device.
Provides in-band messaging and event
reporting.
Returned for certain requests.
Sequence number
Start
DLLP
4
Data
0 or 4
ECRC
4
LCRC
1
STP framing
(a) Transaction Layer Packet
CRC
1
End
PCIe
Protocol
Data
Unit
Format
Appended by Physical Layer
0 to 4096
Header
Appended by Data Link Layer
12 or 16
2
Appended by PL
1
Created
by DLL
2
STP framing
Created by Transaction Layer
+
Number
of octets
1
(b) Data Link Layer Packet
Figure 3.28 PCIe Protocol Data Unit Format
TLP Memory Request Format
32 bits
R Fmt
16 octets
+
Type
R
Traffic
Class
R
T E
Attr R
E P
Requestor ID
Tag
Length
Last
DW BE
First
DW BE
Address [63:32]
Address [31:2]
Figure 3.29 TLP Memory Request Format
R
Summary
+
Chapter 3

Computer components

Computer function

A Top-Level View of
Computer Function
and Interconnection

Instruction fetch and
execute

Interrupts

I/O function

Interconnection structures

Bus interconnection

Point-to-point interconnect

QPI physical layer

QPI link layer

QPI routing layer

QPI protocol layer
PCI express

PCI physical and logical
architecture

Bus structure

PCIe physical layer

Multiple bus hierarchies

PCIe transaction layer

Elements of bus design

PCIe data link layer