Computer System Organization

CE 454
Computer Architecture
Lectures 2 + 3
Ahmed Ezzat
Computer Systems Organization,
Ch-2
1
Outline





2
The IAS (Von Neumann) Machine Overview
Computer Components Overview
CPU Module
Memory Module
Input/Output Module
CE 454
Ahmed Ezzat
The IAS (Von Neumann) Machine




Stored Program concept
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory and executing
Input and output equipment operated by control unit
Input
Output
Equipment
Arithmetic
and
Logic Unit
Main
Memory

Program
Control Unit
The Structure of IAS Computer
3
CE 454
1946 ~ 1952
John von Neumann
Princeton
Institute for Advanced Studies - IAS
Almost all of today’s computers
have the same general structure as
the IAS - referred to as
Von Neumann machines.
Ahmed Ezzat
The IAS Machine: Memory
IAS Memory
- 1000 storage locations (words), each 40 bits
- Both data and instruction are stored in the memory
Memory
Location 0
Sign bit
39-bit value
Number word
0
8
20
39
28
Location 999
Opcode
Bit 1
4
…….
Bit 40
address
Opcode
address
Instruction word
CE 454
Ahmed Ezzat
The IAS Machine: Control Unit
The control unit operates the machine by fetching instructions
from memory and executing them ONE at a time.
Central Processing Unit
Arithmetic and Logic Unit
Accumulator
MQ
Arithmetic & Logic Circuits
Main
Memory
MBR
Input
Output
Equipment
Instructions
& Data
PC
IBR
MAR
IR
Control
Circuits
Program Control Unit
5
CE 454
Address
Ahmed Ezzat
The IAS Machine: Instruction Cycle
The IAS operates by repetitively performing an instruction cycle.
Two sub-cycles:
- During the fetch cycle, the opcode of the NEXT instruction is
loaded in to the IR and the address portion is loaded into the MAR
- Once the opcode is in the IR, the execute cycle is performed.
Control circuitry interprets the opcode and executes the
instruction by sending out appropriate control signals to cause
data to be moved or an operation to be performed by the ALU.
6
CE 454
Ahmed Ezzat
The IAS Machine: Summary
7

Data and instructions are stored in a single read-write
memory

The content of this memory are addressable by location,
without regard to the type of data contained in it.

Execution occurs in a sequential fashion (unless explicitly
modified) from one instruction to the next.
CE 454
Ahmed Ezzat
Computer Components Overview
8

A computer
consists of a set
of modules of
three basic types

They
communicate with
each other

Needs connection
paths for
connecting the
modules
CE 454
Ahmed Ezzat
Computer Components Overview: Bus
Interconnection
9

A bus is a communication path that
connects two or more devices

A bus is a shared transmission
medium.

A signal transmitted by one device
is available for reception by all
other devices attached to the bus

If two devices transmit during the
same period, their signal will
overlap and become garbled.

Only one device can successfully
transmit at any one time.
CE 454
CPU
Memory
I/O
Buses
Ahmed Ezzat
Computer Components Overview: Bus
Interconnection





10
Data Lines (Data Bus - DB)
Address Lines (Address Bus - AB)
Control Lines (Control Bus - CB): Control the access to and the use of DB and AB. (remember
AB and DB shared by all devices)
CB send out both command and timing signals
–
Command: specify the type of operation (R/W)
–
Timing: Indicate the validity of data on DB and AB
Typical control lines include: Memory R/W, I/O R/W, Bus request, Bus grant, Interrupt request,
Interrupt grant, Transfer ACK
CE 454
Ahmed Ezzat
CPU Module




11
Read instructions
and data
Write out data
after processing
Use control signal
to control the
overall operation
of the computer
Receive
interrupt signal
CE 454
Ahmed Ezzat
CPU Module: CPU Registers

Registers
–
–

Accumulator
–

Permanent storage locations within the CPU
Each used for a particular, defined purpose
Instructions
General purpose register
Address
Registers in the Control Unit
–
–
PC - program counter
register
IR - Instruction register
Data
Control
Interrupt
–
–
–
–
–
12
MAR - memory address register
MBR - memory buffer register
I/O AR - I/O address register
I/O BR - I/O buffer register
Status register
CE 454
Data
Ahmed Ezzat
CPU Module: CPU Data Path
13
CE 454
Ahmed Ezzat
CPU Module: CPU Packaging
14
CE 454
Ahmed Ezzat
CPU Module: RISC vs CISC

RISC, initially John Cocke/IBM, typically implies:
–
–
–
–

CISC typically implies:
–
–
–
–

15
Small number of instructions around load and store
Instructions are of fixed size format
One instruction per CPU cycle
Large number of registers or compiler use to optimize registers usage
Large number of instructions
Multiple instruction formats
Multiple CPU cycles per instruction
Smaller number of registers, and typically simpler compiler development
Today, most modern computers exploit best-of-both. For example
latest Intel’s Itanium CPU uses CISC instruction set, while
internally (microprogramming) translates each complex instruction
into sequence of RISC micro-programmed instructions
CE 454
Ahmed Ezzat
CPU Module: RISC vs CISC - What is wrong
with this Analysis
16
Speed
Passengers
Throughput
(pmph)
6.5 hours
610 mph
470
286,700
3 hours
1350 mph
132
178,200
Plane
DC to Paris
Boeing 747
Concorde
(Time to Run task)
CE 454
Ahmed Ezzat
CPU Module: Design Principals for Modern
Computers
17

All instructions are directly executed by hardware

Maximize the rate at which instructions are issued

Instructions should be easy to decode

Only Loads and Stores should reference memory

Provide plenty of registers, since memory access is slow
CE 454
Ahmed Ezzat
CPU Module: Instruction-Level Parallelism

Why? To improve performance
–

Instruction-level parallelism exploits parallelism
within individual instruction:
–
–
18
How about increasing clock speed!
Instruction prefetch: fetching instructions from memory is
a bottleneck – prefetch in registers.
Pipelining: takes that concept further and divides the
instruction execution into multiple parts/stages, each is
dedicated piece of hardware and all can run in parallel
CE 454
Ahmed Ezzat
CPU Module: Instruction-Level Parallelism Pipelining
19
CE 454
Ahmed Ezzat
CPU Module: Instruction-Level Parallelism Multiple Pipelines



20
Dual five-stage pipelines with a common instruction fetch unit
Pentium used this strategy with main pipeline called u pipeline (can
execute arbitrary instruction), and the second pipeline called V pipeline
(can execute simple integer/floating point instructions).
It is the compiler responsibility to ensure that instructions don’t conflict
over resources/registers, and neither must depend on the result of the
other.
CE 454
Ahmed Ezzat
CPU Module: Instruction-Level Parallelism Superscalar


21
Single pipeline with multiple functional units – CDC 6600, Pentium-II.
Implicit assumption is S3 is much faster than S4!
CE 454
Ahmed Ezzat
CPU Module: Processor-Level Parallelism

22
SIMD - Array or Vector Computers: ILLIAC IV
CE 454
Ahmed Ezzat
CPU Module: Processor-Level Parallelism

MIMD:
–
–
23
Multiprocessors
Multicomputers
CE 454
Ahmed Ezzat
Memory Module





24
N words of equal length
Each word with a unique address (0, 1, …, N-1)
A word of data can be read from or write into the memory
The nature of the the operation (R/W) is indicated by read and write control signals
The location for operation is specified by an address
CE 454
Ahmed Ezzat
Memory Module: Architecture
25

Primary memory is a collection of independent storage
units. Each unit stores a single multi-bit value.

The number of bits in a storage unit is a constant for all
storage units in the memory system, and this constant is
called the memory width.

Addresses are used to access the storage units in the
memory system. Each storage unit has a unique address.
CE 454
Ahmed Ezzat
Memory Module: Architecture
A graphical
representation of a
memory with 128
storage units and
width 8.
The memory is
called 128 x 8
memory.
26
CE 454
Ahmed Ezzat
Memory Module: Architecture
27

Manufacturers produce a number of different types of memory
devices having a variety of technologies.

The technology affects not only the operating characteristics,
such as power consumption, size, and speed, but also the
manufacturing cost.

Thus in the selection of memory chips for a particular
application, designers must weigh the trade-offs between cost
and performance.
CE 454
Ahmed Ezzat
Memory Module: Architecture
Read-only Memory




28
Read-only memories (ROMs) are memory devices that the CPU can read but cannot write.
Many ROMs are factory programmed and there is no way to alter their contents (the term
programming here means writing values into a ROM). These devices are denser and cheaper
to manufacture than other type of ROM
Programmable ROMs (PROMs): This type of ROM can be programmed by using special
high current device to destroy (burn) the fuse that were manufactured into the device. The
result of burning a PROM is that certain bits are always 0 and the rest are always 1. These
values cannot be altered once written
Erasable PROMs (EPROMs): This type of ROM is alterable, although not during ordinary
use. A technician can program an EPROM off line, later completely erase its contents by
using ultraviolet light, and then reprogram it
CE 454
Ahmed Ezzat
Memory Module: Architecture
Read/Write Memory



29
Read/Write memories refer to memory devices can be read from and write into with equal
ease. Two main types of read/write memory devices are static random access memories
(SRAMs) and dynamic random access memories (DRAMs).
SRAMs: In SRAMs, the individual memory contents, once written, do not need to be further
addressed or manipulated to hold their values. These devices are composed of flip-flops that
use a small current to maintain their contents. SRAMs are used mostly in CPU registers and
other high speed storage devices. Some computers use them for cache and main memory.
SRAMs are currently the fastest and most expensive of semiconductor memory circuit
DRAMs: These are semiconductor memory devices in which the stored data will not remain
permanently stored, even with power applied, unless the data are periodically rewritten into
the memory. The latter operation is called the refresh operation. Although much cheaper
than SRAMs, DRAMs are also slower and used mostly for main memory
CE 454
Ahmed Ezzat
Memory Module: Architecture
General Memory Operation
 Although each type of memory is different in its internal operation, there are
certain basic operating principles that are the same for all memory systems. Every
memory system requires several different types of input and output lines to
perform the following function:
 Select the address in memory that is being accessed for READ or WRITE
operation
 Select either READ or WRITE operation to be performed
 Supply the input data to be stored in memory during write operation
 Hold the output data coming from memory during a read operation
 Enable (or disable) the memory so that it will (or will not) respond to the
address inputs and read/write command
30
CE 454
Ahmed Ezzat
Memory Module: Architecture
General Memory Operation
Data Input
D0
D1 D2 D3
Address Input
Read/Write Control
A0
A1
A2
A3
A4
R/W
32 x 4 Memory
Chip Selection
CS
D0
D1 D2 D3
Data Output
31
CE 454
Ahmed Ezzat
Memory Module: Architecture
Address Decoder
32
CE 454
Ahmed Ezzat
Memory Module: Architecture
Memory Cell
Address Decoder
33
The relationship between the MDR, the MAR, and memory
CE 454
Ahmed Ezzat
Memory Module: Architecture
MAR-MDR example
34
CE 454
Ahmed Ezzat
Memory Module: Architecture
Memory Address Space and Memory Map
35

The total amount of memory contained in any system is limited by
the size of the address bus

Example: An “XYZ” processor has 16-bit address bus, what is the
maximum amount of memory which a system can utilised?
CE 454
Ahmed Ezzat
Memory Module: Architecture
Memory Map



36
The microcomputer designer has to allocate this address space
among the RAM, ROM, and I/O devices that are to be part of the
system
The manner in which the total address space is apportioned
among these devices depends, to certain extent, on characteristics
of the processor
A memory map is a simple diagram which identifies the size and
location of any memory block in the total address space
CE 454
Ahmed Ezzat
Memory Module: Architecture
Memory Map
FFFF
FC00
ROM
1K
Not Used
B0FF
I/O 256
B000
Not Used
07FF
0000

37
RAM
2K
A typical memory map for a microprocessor system with 16-bit address bus
CE 454
Ahmed Ezzat
Memory Module: Architecture
IBM PC Main Memory Map





38
Main memory, also called conventional memory, refers to the storage locations
that the CPU can reference during an ordinary memory-read or memory-write
bus cycle without special hardware
The amount of main memory a PC could directly address to is 1MB (220 bytes)
The PC architects divided the address space of conventional memory into a
number of blocks, which they allocated for various software components
They allocated the largest block, with addresses ranging from 0K to 640K, to
program memory, which they implemented with DRAM chips
They reserved the remaining block, with address ranging from 640K and
1024K, for ROM BIOS and other system components. The following table
summarises the allocation of addresses within conventional memory of a PC
CE 454
Ahmed Ezzat
Memory Module: Architecture
39
Address
PC Usage
Address
PC Usage
960K - 1024K
880K - 960K
848K - 880K
816K - 848K
800K - 816K
784K - 800K
768K - 784K
752K - 768K
736K - 752K
720K - 736K
704K - 720K
640K - 704K
ROM BIOS
Unused
LIM data area
LIM data area
Hard disk ROM
Unused
EGA ROM
Unused
CGA
Unused
MDA
EGA or VGA
1536 - 640K
1152 - 1535
User RAM
BASIC,
Special system RAM
Interrupt-vector table
CE 454
0 - 1023
Ahmed Ezzat
Memory Module: The Fetch-Execute
Instruction Cycle
LOAD address
40
PC
MBR
MAR
IR
IR [address]
MAR
MBR
ACC
PC+1
PC
CE 454
Ahmed Ezzat
Memory Module: The Fetch-Execute
Instruction Cycle
STORE address
41
PC
MAR
MBR
IR
IR [address]
MAR
ACC
MBR
PC+1
PC
CE 454
Ahmed Ezzat
Memory Module: The Fetch-Execute
Instruction Cycle
ADD address
42
CE 454
PC
MAR
MBR
IR
IR [address]
MAR
ACC +MBR
ACC
PC+1
PC
Ahmed Ezzat
Memory Module: Byte Ordering


43
Integers matters: value of 2 is 0x00000002 in LE, and 0x02000000 in BE
Characters don’t matter
CE 454
Ahmed Ezzat
Memory Module: Error Detection & Correction
Error Detection and Correction Code







44
For m data bits, r redundant/check bits, which result in n bits = m + r
These n bits are called codeword
Given two codewords, Hamming distance “d” is the number of bits in the
codewords that differ
With m-bit memory, all 2m bit patterns are legal, i.e., only 2m out of 2n are valid
codewords
To detect “d” single-bit errors, you need distance of (d +1)
To correct “d” single-bit errors, you need distance of (2d +1)
Example: with distance of “5”, you can detect “4” simultaneous single bit error,
and you can correct “2” single bit errors
CE 454
Ahmed Ezzat
Memory Module: Error Detection & Correction
Error Detection and Correction Code

45
What is r value to correct single-bit error for m data bits?
(n + 1) 2m  2n
(m + r + 1)  2r
// Given m, we can find lower limit to r
CE 454
Ahmed Ezzat
Memory Module: Error Detection & Correction
Hamming Code Algorithm to construct Error Correction Code





46
For m data bits, r parity bits, codeword n = m + r
Bits are numbered starting at 1 (not 0), with bit 1 is the leftmost bit
Bits that are power of 2 are parity bits, the rest are data bits
Example: For 16-bit word, 5 parity bits are added (Bits 1, 2, 4, 8, 16)
Assume even parity is used: each parity bit checks specific bit positions; set so
that total number of 1s in the checked positions is even
 Bit 1 checks bits: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21
 Bit 2 checks bits: 2, 3, 6, 7, 10, 11, 14, 15, 18, 19
 Bit 4 checks bits: 4, 5, 6, 7, 12, 13, 14, 15, 20, 21
 Bit 8 checks bits: 8, 9, 10, 11, 12, 13, 14, 15
 Bit 16 checks bits: 16, 17, 18, 19, 20, 21
CE 454
Ahmed Ezzat
Memory Module: Error Detection & Correction
Hamming Code Algorithm to construct Error Correction Code


47
In general, bit b is checked by those bits b1, b2, … bj such that
b1 + b2 +… bj = b
Simple Approach:
 Compute all parity bits, if all are correct, there is no error
 Add up all incorrect parity bits, the resulting sum is the position of the
incorrect data bit  reverse it
 Done
CE 454
Ahmed Ezzat
Memory Module: Cache
Cache Memory






48
CPU is much faster than memory
Small amount of fast memory on the CPU chip (cache) with a large amount of
slow memory (off CPU chip)
Most heavily used memory words are kept in the cache
When CPU needs a word, it checks first the cache
Memory references of the total memory in any short time interval tend to use
only small fraction of the total memory is called the locality principal – basis
of caching systems
When a word is referenced, it and some of its neighbours are brought into the
cache for future references
CE 454
Ahmed Ezzat
Memory Module: Cache
Cache Memory



49
Assume, “c” is the cache access time, “m” is the main memory access time,
and “h” is the hit ratio
If one word is R/W k times in a short interval, the CPU will reference main
memory 1 time and the cache (k – 1) times:
 Cache hit ratio = h = (k – 1)/k
 Cache miss ratio = (1 – h)
 Mean access time = c + (1 – h)  m
 As h  1, all references can be satisfied from the cache
 As h  0, all references approaches (c + m)
Cache is divided into blocks, called cache lines, when cache miss happens an
entire cache line is brought into the cache, I.e., not only the missed word
CE 454
Ahmed Ezzat
Memory Module: Memory Packaging
Memory Packaging and Types






50
Early days, memory was installed as single chips installed into sockets on the
mother board
At present, a group of chips (typically 8 – 16), is mounted on a tiny printed
circuit board and sold as a unit. This unit is called SIMM (Single Inline Memory
Module) or a DIMM (Dual Inline Memory Module)
Typical SIMM configuration is 32 MB
Typical DIMM configuration is 64 MB, and is capable of 64 data bit at once.
A physically smaller version of DIMM called SO-DIMM (Small Outline DIMM) is
used with notebook computers
SIMM and DIMM can have parity bit for error correction, but since average
error rate for a module is one error per 10 years, it is typically omitted
CE 454
Ahmed Ezzat
Memory Module: Memory Hierarchy
51
CE 454
Ahmed Ezzat
Memory Module: Secondary Memory
Magnetic Disk
Surface 7
Track N
Surface 6
Surface 5
Surface 4
Surface 3
Surface 2
Track 0
Direction
of Arm
Surface 1
Surface 0
52
CE 454
Ahmed Ezzat
Memory Module: Secondary Memory




53
Magnetic Disks: access time = seek time (5-15 msec) + rotational time (4 – 8
msec) + transfer time (25 – 100 sec)
Floppy Disks: 2 sizes (5.25 and 3.5). Each has low-density and highdensity versions. The 3.5 comes in a rigid jacket for protection. 3.5 diskettes
are more density and better protection so they effectively replaced the 5.25
diskettes
IDE (Integrated Drive Electronics) Disks: integrates the controller into the
Disk drive. Evolved from the IBM PC XT (10 MB Seagate) around 1980. Goes
up to 528 MB. Mainly for Intel-based systems
EIDE (Extended IDE) Disks: uses LBA (Logical Block Address  head,
sector, cylinder). Mainly for Intel-based systems. Few other systems use them
for their low price.
CE 454
Ahmed Ezzat
Memory Module: Secondary Memory



54
SCSI (Small Computer System Interface) Disks: disk layout is same as IDE, but
interface and transfer rate is different. Some versions (Wide Ultra2 SCSI) provide 80
MB transfer rate. Heavily used in UNIX workstations. Each SCSI has unique ID (0 
7 (15 for wide SCSI)). SCSI controller accepts commands (16 bytes blocks)
RAID (Redundant Array of Inexpensive/Independent Disks): appears to the OS as
a Single Large Expensive Disk (SLED). Different schemes are used level 0  5, to
reflect how striping, replication, checksum layout are done
CD-ROM: Optical as opposed to magnetic disk. Much higher density and last much
longer. Standard is published in IS/ISO 10149 – Red Book. Philips/Sony published
the Yellow Book in 1984 for CD-ROM specs. Philips published yet the Green Book in
1986 for multi-media capabilities. Burned areas are called pits, and unburned areas
are called lands. To have uniform rate of transfer, rotation speed has to be reduced
when the reading head moves from the inside (530 RPM) to the outside (200 RPM).
High Sierra for CD-ROM file system specs.
CE 454
Ahmed Ezzat
Memory Module: Secondary Memory



55
CD-Recordables (CD-R): CD recorder allows individuals to produce CDROMs. They are still once written, could not be erased. Transfer rate is 150
KB/sec
CD-Rewritables (CD-RW): uses different alloy for recording. It uses laser with
3 different powers. At high-power, it melts the alloy to generate a pit. At
medium power, it melts the alloy to become a land again. At low power, it
senses (read) the alloy without changes. CR-RW is much more expensive
than CD-R
DVD (Digital Video Disk): similar to CD using pits, lands, and laser diode.
However, it uses smaller pits, closer tracks, and a different laser. Capacity is 7
folds as CDs – 4.7 GB. A 1x DVD drive operates at 1.4 MB/sec. Using singlesided and dual layers (8.5 GB), double sided and single layer (9.4 GB), and
double-sided and dual layer (17 GB)
CE 454
Ahmed Ezzat
I/O Module




56
Logical structure of a simple PC
Each I/O device consists of 2 parts: controller and I/O device itself
DMA controller R/W to memory w/o CPU help. Will generate an interrupt to CPU
when done
Bus arbiter decides if CPU or a DMA controller (cycle stealing) will use the bus
CE 454
Ahmed Ezzat
I/O Module






57
From an internal (to the computer system) point of view, I/O is functionally similar
to memory
There are two operations: read and write
I/O module may control more than one devices.
The interfaces to each external devices is referred to as port, and each port is
given a unique address
External data path for input output data with external devices
I/O module may be able to send interrupt signals to the CPU
CE 454
Ahmed Ezzat
I/O Module: Data Exchange





58
Memory to processor
Processor to memory
I/O to processor
Processor to I/O
I/O to and from Memory (DMA)
CE 454
Ahmed Ezzat
I/O Module: Bus Types

Dedicated
–
–

Physical:
 Connected to a subset of
modules
Functional:
 Data bus, Address Bus
Multiplexed:

Time multiplexing
Address
Data
Time
59
CE 454
Ahmed Ezzat
I/O Module: Bus Configuration Example-1
60
CE 454
Ahmed Ezzat
I/O Module: Bus Configuration Example-2
61
CE 454
Ahmed Ezzat
I/O Module: Timing / Synchronous
Timing – the way in which events are co-ordinated on the bus
Timing:
Clock cycle/bus cycle
Issued by
master
62
Synchronous Timing Diagram of a Read operation
CE 454
The slave
places data
and ACK
signal
Ahmed Ezzat
I/O Module: Timing / Asynchronous
Issued by the CPU
First Read and Address
signal, wait for it to
stabilize, then master Sync
indicating the presence of
valid address and control
signal
Issued by the slave
Once the master have
read the data, it withdraw
MSYN, cause the slave
to drop SSYN and data
Once SSYN is dropped,
master removes the Read
and Address
63
Asynchronous Timing Diagram of a Read operation
CE 454
Ahmed Ezzat
I/O Module: Bus Arbitration





64
More than one module (e.g. CPU and DMA controller ) may need control of the
bus
Only one module may control bus at one time
Needs some form of arbitration
Centralised Arbitration
 Single hardware device controlling bus access (Bus Controller or Arbiter)
 May be a separate module or part of CPU or separate
Distributed Arbitration
 Each module may claim the bus
 Control logic on all modules
One device is designated as master, which may initiate a data transfer with
some other device (slave)
CE 454
Ahmed Ezzat
I/O Module: Bus Arbitration - Centralized
Bus grant
Bus Request
Arbiter
1
2
4
I/O Devices
Bus grant may or
may not propagate
along the chain
65
3
CE 454
Ahmed Ezzat
I/O Module: Bus Arbitration - Decentralized
Bus request
Busy
+5V
IN
Arbitration
Line
66
OUT
1
IN
OUT
2
CE 454
IN
OUT
3
IN
OUT
4
Ahmed Ezzat
I/O Module: PCI Bus

Peripheral Component Interconnect (PCI)

Start development 1990

Became standard 1995

Used in:

67
–
Sun Workstations
–
Apple Macintosh
–
Wintel PCs
–
Compaq Alpha Server
The same peripheral I/O cards may be plugged into many
different computers
CE 454
Ahmed Ezzat
I/O Module: PCI Bus
PCI bus connections
Source: Copyright ©
PCI Pin List/PCI
Special Interest
Group, 1999.
68
CE 454
Ahmed Ezzat
I/O Module: PCI Bus - Operation Example - Read
69
CE 454
Ahmed Ezzat
I/O Module: PCI Bus - Arbitration
Centralized synchronous arbitration scheme
70
CE 454
Ahmed Ezzat
71
CE 454
Ahmed Ezzat