
CS:APP Chapter 4
Computer Architecture
Sequential Implementation
Randal E. Bryant
Carnegie Mellon University
http://csapp.cs.cmu.edu
CS:APP

Y86 Instruction Set

Instruction encodings (byte offsets 0 to 5)

Instruction          Byte 0    Byte 1    Remaining bytes
nop                  0 0
halt                 1 0
rrmovl rA, rB        2 0       rA rB
irmovl V, rB         3 0       8 rB      V (bytes 2-5)
rmmovl rA, D(rB)     4 0       rA rB     D (bytes 2-5)
mrmovl D(rB), rA     5 0       rA rB     D (bytes 2-5)
OPl rA, rB           6 fn      rA rB
jXX Dest             7 fn      Dest (bytes 1-4)
call Dest            8 0       Dest (bytes 1-4)
ret                  9 0
pushl rA             A 0       rA 8
popl rA              B 0       rA 8

Function codes (fn)

OPl:  addl 6 0    subl 6 1    andl 6 2    xorl 6 3
jXX:  jmp 7 0     jle 7 1     jl 7 2      je 7 3
      jne 7 4     jge 7 5     jg 7 6

Building Blocks

Combinational Logic
- Compute Boolean functions of inputs
- Continuously respond to input changes
- Operate on data and implement control

Storage Elements
- Store bits
- Addressable memories
- Non-addressable registers
- Loaded only as clock rises

[Figure: ALU (inputs A, B, fun), comparator (=), and MUX (inputs 0, 1); register file with read ports A and B (srcA/valA, srcB/valB), write port W (dstW/valW), and Clock input]

Hardware Control Language

- Very simple hardware description language
- Can only express limited aspects of hardware operation
  - Parts we want to explore and modify

Data Types
- bool: Boolean
  - a, b, c, …
- int: words
  - A, B, C, …
  - Does not specify word size: bytes, 32-bit words, …

Statements
- bool a = bool-expr ;
- int A = int-expr ;

HCL Operations

- Classify by type of value returned

Boolean Expressions
- Logic Operations
  - a && b, a || b, !a
- Word Comparisons
  - A == B, A != B, A < B, A <= B, A >= B, A > B
- Set Membership
  - A in { B, C, D }
  - Same as A == B || A == C || A == D

Word Expressions
- Case expressions
  - [ a : A; b : B; c : C ]
  - Evaluate test expressions a, b, c, … in sequence
  - Return word expression A, B, C, … for first successful test

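The two HCL constructs above can be modeled in a few lines of Python. This is only a sketch for intuition, not part of HCL itself: the helper names hcl_case and hcl_in are made up for this example, and returning None when no case matches is an assumption (HCL code usually ends with a default case `1 : X`).

```python
def hcl_case(*cases):
    """Model of [ a : A; b : B; ... ]: return the value paired with the first true test."""
    for test, value in cases:
        if test:
            return value
    return None  # assumption: nothing selected when no test succeeds


def hcl_in(a, candidates):
    """Model of A in { B, C, D }: same as A == B || A == C || A == D."""
    return any(a == c for c in candidates)


# Example: a three-way selection whose last case acts as the default (test is always 1).
s = 2
picked = hcl_case((s == 0, 10), (s == 1, 20), (True, 30))
```

Unlike hardware, Python evaluates every test before the call; only the selection order is modeled here.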
SEQ Hardware Structure

State
- Program counter register (PC)
- Condition code register (CC)
- Register File
- Memories
  - Access same memory space
  - Data: for reading/writing program data
  - Instruction: for reading instructions

Instruction Flow
- Read instruction at address specified by PC
- Process through stages
- Update program counter

[Figure: SEQ hardware structure, bottom to top: Fetch (PC, instruction memory, PC increment; icode, ifun, rA, rB, valC, valP), Decode (register file ports A, B; srcA, srcB; valA, valB), Execute (ALU with aluA, aluB; CC; valE, Bch), Memory (data memory; Addr, Data; valM), Write back (register file ports E, M; valE, valM), PC (newPC)]

SEQ Stages

Fetch
- Read instruction from instruction memory
Decode
- Read program registers
Execute
- Compute value or address
Memory
- Read or write data
Write Back
- Write program registers
PC
- Update program counter

[Figure: SEQ hardware structure, as on the previous slide]

Instruction Decoding

[Figure: instruction bytes "5 0 rA rB D" split into icode, ifun, rA, rB, and valC; the register byte and the constant word are optional]

Instruction Format
- Instruction byte: icode:ifun
- Optional register byte: rA:rB
- Optional constant word: valC

Executing Arith./Logical Operation

OPl rA, rB          encoding: 6 fn rA rB

Fetch
- Read 2 bytes
Decode
- Read operand registers
Execute
- Perform operation
- Set condition codes
Memory
- Do nothing
Write back
- Update register
PC Update
- Increment PC by 2

Stage Computation: Arith/Log. Ops

Stage        OPl rA, rB               Comment
Fetch        icode:ifun ← M1[PC]      Read instruction byte
             rA:rB ← M1[PC+1]         Read register byte
             valP ← PC+2              Compute next PC
Decode       valA ← R[rA]             Read operand A
             valB ← R[rB]             Read operand B
Execute      valE ← valB OP valA      Perform ALU operation
             Set CC                   Set condition code register
Memory
Write back   R[rB] ← valE             Write back result
PC update    PC ← valP                Update PC

- Formulate instruction execution as sequence of simple steps
- Use same general form for all instructions

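The OPl stage computation can be traced with a small Python sketch. The model is hypothetical: memory is a dict of byte values, registers are a list indexed by register ID (assuming the usual Y86 numbering, %edx = 2 and %ebx = 3), and the overflow flag is left out to keep the example short.

```python
MASK = 0xFFFFFFFF
ALU = {
    0: lambda b, a: b + a,  # addl
    1: lambda b, a: b - a,  # subl
    2: lambda b, a: b & a,  # andl
    3: lambda b, a: b ^ a,  # xorl
}


def step_opl(pc, mem, reg):
    # Fetch: split the instruction byte and the register byte
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    assert icode == 6, "this sketch only handles OPl"
    rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
    valP = pc + 2
    # Decode: read both operand registers
    valA, valB = reg[rA], reg[rB]
    # Execute: valE <- valB OP valA, then set condition codes (OF not modeled)
    valE = ALU[ifun](valB, valA) & MASK
    cc = {"ZF": valE == 0, "SF": bool(valE >> 31)}
    # Memory: nothing to do
    # Write back: result goes to rB
    reg[rB] = valE
    # PC update: the caller installs valP as the new PC
    return valP, cc


reg = [0] * 8
reg[2], reg[3] = 0x200, 0x100        # %edx, %ebx
mem = {0x00C: 0x60, 0x00D: 0x23}     # addl %edx,%ebx
new_pc, cc = step_opl(0x00C, mem, reg)
```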
Executing rmmovl

rmmovl rA, D(rB)    encoding: 4 0 rA rB D

Fetch
- Read 6 bytes
Decode
- Read operand registers
Execute
- Compute effective address
Memory
- Write to memory
Write back
- Do nothing
PC Update
- Increment PC by 6

Stage Computation: rmmovl

Stage        rmmovl rA, D(rB)         Comment
Fetch        icode:ifun ← M1[PC]      Read instruction byte
             rA:rB ← M1[PC+1]         Read register byte
             valC ← M4[PC+2]          Read displacement D
             valP ← PC+6              Compute next PC
Decode       valA ← R[rA]             Read operand A
             valB ← R[rB]             Read operand B
Execute      valE ← valB + valC       Compute effective address
Memory       M4[valE] ← valA          Write value to memory
Write back
PC update    PC ← valP                Update PC

- Use ALU for address computation

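A minimal Python sketch of the rmmovl steps, under the same hypothetical model as before (memory as a dict of bytes, registers as a list). It assumes little-endian byte order for the 4-byte words M4[...], as on IA32, and the example encodes rmmovl %eax, 8(%ebx) with %eax = 0 and %ebx = 3.

```python
def step_rmmovl(pc, mem, reg):
    # Fetch: register byte, 4-byte displacement, next PC
    rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
    valC = int.from_bytes(bytes(mem[pc + 2 + i] for i in range(4)), "little")
    valP = pc + 6
    # Decode: read both operand registers
    valA, valB = reg[rA], reg[rB]
    # Execute: the ALU computes the effective address
    valE = (valB + valC) & 0xFFFFFFFF
    # Memory: M4[valE] <- valA, stored byte by byte
    for i, byte in enumerate(valA.to_bytes(4, "little")):
        mem[valE + i] = byte
    # Write back: nothing to do; PC update: the caller installs valP
    return valP


reg = [0] * 8
reg[0], reg[3] = 0x12345678, 0x100                  # %eax, %ebx
mem = {0: 0x40, 1: 0x03, 2: 0x08, 3: 0, 4: 0, 5: 0}  # rmmovl %eax, 8(%ebx)
next_pc = step_rmmovl(0, mem, reg)
```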
Executing popl

popl rA             encoding: B 0 rA 8

Fetch
- Read 2 bytes
Decode
- Read stack pointer
Execute
- Increment stack pointer by 4
Memory
- Read from old stack pointer
Write back
- Update stack pointer
- Write result to register
PC Update
- Increment PC by 2

Stage Computation: popl

Stage        popl rA                  Comment
Fetch        icode:ifun ← M1[PC]      Read instruction byte
             rA:rB ← M1[PC+1]         Read register byte
             valP ← PC+2              Compute next PC
Decode       valA ← R[%esp]           Read stack pointer
             valB ← R[%esp]           Read stack pointer
Execute      valE ← valB + 4          Increment stack pointer
Memory       valM ← M4[valA]          Read from stack
Write back   R[%esp] ← valE           Update stack pointer
             R[rA] ← valM             Write back result
PC update    PC ← valP                Update PC

- Use ALU to increment stack pointer
- Must update two registers
  - Popped value
  - New stack pointer

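The popl table translates almost line for line into Python. In this sketch (an illustration only) the instruction is assumed to be already fetched, so rA is passed in directly, and the stack is a hypothetical dict mapping word addresses to word values.

```python
RESP = 4  # register ID of %esp, assuming the usual Y86 numbering


def step_popl(pc, rA, reg, stack):
    valP = pc + 2
    # Decode: the stack pointer is read on both ports
    valA = reg[RESP]
    valB = reg[RESP]
    # Execute: increment the stack pointer
    valE = valB + 4
    # Memory: read from the OLD stack pointer (valA), not valE
    valM = stack[valA]
    # Write back: two register updates in one instruction
    reg[RESP] = valE
    reg[rA] = valM
    return valP


reg = [0] * 8
reg[RESP] = 0x1F8
stack = {0x1F8: 0xABCD}
next_pc = step_popl(0x020, 0, reg, stack)   # popl %eax
```

Because the valM write comes second in this sketch, popl %esp would leave the popped value in %esp.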
Executing Jumps

jXX Dest            encoding: 7 fn Dest
  fall thru:  XX XX     Not taken
  target:     XX XX     Taken

Fetch
- Read 5 bytes
- Increment PC by 5
Decode
- Do nothing
Execute
- Determine whether to take branch based on jump condition and condition codes
Memory
- Do nothing
Write back
- Do nothing
PC Update
- Set PC to Dest if branch taken, or to incremented PC if branch not taken

Stage Computation: Jumps

Stage        jXX Dest                 Comment
Fetch        icode:ifun ← M1[PC]      Read instruction byte
             valC ← M4[PC+1]          Read destination address
             valP ← PC+5              Fall through address
Decode
Execute      Bch ← Cond(CC,ifun)      Take branch?
Memory
Write back
PC update    PC ← Bch ? valC : valP   Update PC

- Compute both addresses
- Choose based on setting of condition codes and branch condition

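The slides do not spell out Cond(CC, ifun). A sketch of one way to write it in Python follows, assuming CC holds the three flags in the order (ZF, SF, OF) and that the jump conditions follow the usual IA32 signed-comparison definitions; both are assumptions, not taken from the slides.

```python
def cond(cc, ifun):
    """Return the branch flag Bch for jump function code ifun (0 = jmp ... 6 = jg)."""
    zf, sf, of = (bool(bit) for bit in cc)
    lt = sf != of                   # signed "less than" after a subtract/compare
    table = {
        0: True,                    # jmp
        1: lt or zf,                # jle
        2: lt,                      # jl
        3: zf,                      # je
        4: not zf,                  # jne
        5: not lt,                  # jge
        6: not lt and not zf,       # jg
    }
    return table[ifun]


def jump_new_pc(cc, ifun, valC, valP):
    # PC update: PC <- Bch ? valC : valP
    return valC if cond(cc, ifun) else valP
```

For example, with CC = 000 a je (ifun 3) falls through to valP, while jne (ifun 4) goes to valC.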
Executing call

call Dest           encoding: 8 0 Dest
  return:  XX XX
  target:  XX XX

Fetch
- Read 5 bytes
- Increment PC by 5
Decode
- Read stack pointer
Execute
- Decrement stack pointer by 4
Memory
- Write incremented PC to new value of stack pointer
Write back
- Update stack pointer
PC Update
- Set PC to Dest

Stage Computation: call

Stage        call Dest                Comment
Fetch        icode:ifun ← M1[PC]      Read instruction byte
             valC ← M4[PC+1]          Read destination address
             valP ← PC+5              Compute return point
Decode       valB ← R[%esp]           Read stack pointer
Execute      valE ← valB + (-4)       Decrement stack pointer
Memory       M4[valE] ← valP          Write return address on stack
Write back   R[%esp] ← valE           Update stack pointer
PC update    PC ← valC                Set PC to destination

- Use ALU to decrement stack pointer
- Store incremented PC

Executing ret

ret                 encoding: 9 0
  return:  XX XX

Fetch
- Read 1 byte
Decode
- Read stack pointer
Execute
- Increment stack pointer by 4
Memory
- Read return address from old stack pointer
Write back
- Update stack pointer
PC Update
- Set PC to return address

Stage Computation: ret

Stage        ret                      Comment
Fetch        icode:ifun ← M1[PC]      Read instruction byte
Decode       valA ← R[%esp]           Read operand stack pointer
             valB ← R[%esp]           Read operand stack pointer
Execute      valE ← valB + 4          Increment stack pointer
Memory       valM ← M4[valA]          Read return address
Write back   R[%esp] ← valE           Update stack pointer
PC update    PC ← valM                Set PC to return address

- Use ALU to increment stack pointer
- Read return address from memory

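The call and ret stage computations mirror each other, which a short round trip makes visible. This is a sketch under assumed simplifications: the instruction fields are passed in rather than fetched from memory, and the stack is a hypothetical dict of word addresses to word values.

```python
RESP = 4  # register ID of %esp, assuming the usual Y86 numbering


def step_call(pc, valC, reg, stack):
    valP = pc + 5                 # Fetch: return point
    valB = reg[RESP]              # Decode: read stack pointer
    valE = valB + -4              # Execute: decrement stack pointer
    stack[valE] = valP            # Memory: store return address at the new top of stack
    reg[RESP] = valE              # Write back: update stack pointer
    return valC                   # PC update: jump to destination


def step_ret(pc, reg, stack):
    valA = valB = reg[RESP]       # Decode: read stack pointer on both ports
    valE = valB + 4               # Execute: increment stack pointer
    valM = stack[valA]            # Memory: read return address from the old stack pointer
    reg[RESP] = valE              # Write back: update stack pointer
    return valM                   # PC update: jump to return address


reg = [0] * 8
reg[RESP] = 0x200
stack = {}
pc_after_call = step_call(0x010, 0x040, reg, stack)
sp_in_callee = reg[RESP]
pc_after_ret = step_ret(pc_after_call, reg, stack)
```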
Computation Steps

Steps in [brackets] are not used by this instruction.

Stage        Values        OPl rA, rB               Step
Fetch        icode,ifun    icode:ifun ← M1[PC]      Read instruction byte
             rA,rB         rA:rB ← M1[PC+1]         Read register byte
             valC                                   [Read constant word]
             valP          valP ← PC+2              Compute next PC
Decode       valA, srcA    valA ← R[rA]             Read operand A
             valB, srcB    valB ← R[rB]             Read operand B
Execute      valE          valE ← valB OP valA      Perform ALU operation
             Cond code     Set CC                   Set condition code register
Memory       valM                                   [Memory read/write]
Write back   dstE          R[rB] ← valE             Write back ALU result
             dstM                                   [Write back memory result]
PC update    PC            PC ← valP                Update PC

- All instructions follow same general pattern
- Differ in what gets computed on each step

Computation Steps (call)

Steps in [brackets] are not used by this instruction.

Stage        Values        call Dest                Step
Fetch        icode,ifun    icode:ifun ← M1[PC]      Read instruction byte
             rA,rB                                  [Read register byte]
             valC          valC ← M4[PC+1]          Read constant word
             valP          valP ← PC+5              Compute next PC
Decode       valA, srcA                             [Read operand A]
             valB, srcB    valB ← R[%esp]           Read operand B
Execute      valE          valE ← valB + (-4)       Perform ALU operation
             Cond code                              [Set condition code reg.]
Memory       valM          M4[valE] ← valP          Memory read/write
Write back   dstE          R[%esp] ← valE           Write back ALU result
             dstM                                   [Write back memory result]
PC update    PC            PC ← valC                Update PC

- All instructions follow same general pattern
- Differ in what gets computed on each step

Computed Values

Fetch
  icode   Instruction code
  ifun    Instruction function
  rA      Instr. Register A
  rB      Instr. Register B
  valC    Instruction constant
  valP    Incremented PC

Decode
  srcA    Register ID A
  srcB    Register ID B
  dstE    Destination Register E
  dstM    Destination Register M
  valA    Register value A
  valB    Register value B

Execute
  valE    ALU result
  Bch     Branch flag

Memory
  valM    Value from memory

SEQ Hardware

Key
- Blue boxes: predesigned hardware blocks
  - E.g., memories, ALU
- Gray boxes: control logic
  - Describe in HCL
- White ovals: labels for signals
- Thick lines: 32-bit word values
- Thin lines: 4-8 bit values
- Dotted lines: 1-bit values

[Figure: detailed SEQ hardware. Fetch: PC, instruction memory, PC increment (icode, ifun, rA, rB, valC, valP). Decode and Write back: register file with ports A, B, E, M (srcA, srcB, dstE, dstM; valA, valB). Execute: ALU with ALU A, ALU B, ALU fun. inputs, CC (valE, Bch). Memory: data memory with Mem. control (read, write), Addr, Data (valM on data out). New PC logic (newPC).]

Fetch Logic

Predefined Blocks
- PC: Register containing PC
- Instruction memory: Read 6 bytes (PC to PC+5)
- Split: Divide instruction byte into icode and ifun
- Align: Get fields for rA, rB, and valC

[Figure: fetch logic. PC feeds instruction memory; Byte 0 goes to Split (icode, ifun), Bytes 1-5 go to Align (rA, rB, valC); Instr valid, Need regids, and Need valC control blocks; PC increment produces valP]

Fetch Logic (continued)

Control Logic
- Instr. Valid: Is this instruction valid?
- Need regids: Does this instruction have a register byte?
- Need valC: Does this instruction have a constant word?

Fetch Control Logic

[Instruction encoding table repeated from the Y86 Instruction Set slide]

bool need_regids =
    icode in { IRRMOVL, IOPL, IPUSHL, IPOPL,
               IIRMOVL, IRMMOVL, IMRMOVL };

bool instr_valid = icode in
    { INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
      IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL };

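The fetch control signals can be checked against the instruction lengths used in the stage tables. In this Python sketch, need_regids and instr_valid copy the HCL above; need_valC is not given on the slide and is an assumption read off the encoding table (the instructions with a V, D, or Dest field). The icode constants take their values from the first nibble of each encoding, and val_p is a hypothetical helper name.

```python
INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL = 0x0, 0x1, 0x2, 0x3, 0x4, 0x5
IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL = 0x6, 0x7, 0x8, 0x9, 0xA, 0xB


def need_regids(icode):
    return icode in {IRRMOVL, IOPL, IPUSHL, IPOPL, IIRMOVL, IRMMOVL, IMRMOVL}


def need_valC(icode):
    # Assumption: not shown on the slide; derived from the encoding table.
    return icode in {IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL}


def instr_valid(icode):
    return icode in {INOP, IHALT, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL,
                     IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL}


def val_p(pc, icode):
    # PC increment: 1 instruction byte, plus optional register byte and constant word
    return pc + 1 + (1 if need_regids(icode) else 0) + (4 if need_valC(icode) else 0)
```

The resulting lengths agree with the stage tables: 2 bytes for OPl and popl, 6 for rmmovl, 5 for jXX and call, 1 for ret.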
Decode Logic

Register File
- Read ports A, B
- Write ports E, M
- Addresses are register IDs or 8 (no access)

Control Logic
- srcA, srcB: read port addresses
- dstE, dstM: write port addresses

[Figure: register file with read ports A, B (valA, valB) and write ports E, M (valE, valM); control blocks for dstE, dstM, srcA, srcB driven by icode, rA, rB]

A Source

Decode stage, by instruction:

Instruction          Computation          Comment
OPl rA, rB           valA ← R[rA]         Read operand A
rmmovl rA, D(rB)     valA ← R[rA]         Read operand A
popl rA              valA ← R[%esp]       Read stack pointer
jXX Dest                                  No operand
call Dest                                 No operand
ret                  valA ← R[%esp]       Read stack pointer

int srcA = [
    icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : rA;
    icode in { IPOPL, IRET } : RESP;
    1 : RNONE; # Don't need register
];

E Destination

Write-back stage, by instruction:

Instruction          Computation          Comment
OPl rA, rB           R[rB] ← valE         Write back result
rmmovl rA, D(rB)                          None
popl rA              R[%esp] ← valE       Update stack pointer
jXX Dest                                  None
call Dest            R[%esp] ← valE       Update stack pointer
ret                  R[%esp] ← valE       Update stack pointer

int dstE = [
    icode in { IRRMOVL, IIRMOVL, IOPL} : rB;
    icode in { IPUSHL, IPOPL, ICALL, IRET } : RESP;
    1 : RNONE; # Don't need register
];

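The srcA and dstE case expressions from the A Source and E Destination slides read naturally as if-chains. A Python sketch follows, assuming the icode values from the encoding table, RESP = 4 (the usual Y86 ID for %esp), and RNONE = 8 as stated on the Decode Logic slide.

```python
IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL = 0x2, 0x3, 0x4, 0x5
IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL = 0x6, 0x7, 0x8, 0x9, 0xA, 0xB
RESP, RNONE = 4, 8


def src_a(icode, rA):
    """Register ID for read port A."""
    if icode in {IRRMOVL, IRMMOVL, IOPL, IPUSHL}:
        return rA
    if icode in {IPOPL, IRET}:
        return RESP
    return RNONE  # don't need register


def dst_e(icode, rB):
    """Register ID for write port E."""
    if icode in {IRRMOVL, IIRMOVL, IOPL}:
        return rB
    if icode in {IPUSHL, IPOPL, ICALL, IRET}:
        return RESP
    return RNONE  # don't need register
```

Each if-chain tests its cases in order and falls through to the default, which is how the HCL case expression is defined.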
Execute Logic

Units
- ALU
  - Implements 4 required functions
  - Generates condition code values
- CC
  - Register with 3 condition code bits
- bcond
  - Computes branch flag

Control Logic
- Set CC: Should condition code register be loaded?
- ALU A: Input A to ALU
- ALU B: Input B to ALU
- ALU fun: What function should ALU compute?

[Figure: execute logic. ALU with inputs from ALU A (valC, valA) and ALU B (valB), ALU fun. and Set CC driven by icode, ifun; CC feeds bcond, which produces Bch; ALU output is valE]

ALU A Input

Execute stage, by instruction:

Instruction          Computation            Comment
OPl rA, rB           valE ← valB OP valA    Perform ALU operation
rmmovl rA, D(rB)     valE ← valB + valC     Compute effective address
popl rA              valE ← valB + 4        Increment stack pointer
jXX Dest                                    No operation
call Dest            valE ← valB + (-4)     Decrement stack pointer
ret                  valE ← valB + 4        Increment stack pointer

int aluA = [
    icode in { IRRMOVL, IOPL } : valA;
    icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;
    icode in { ICALL, IPUSHL } : -4;
    icode in { IRET, IPOPL } : 4;
    # Other instructions don't need ALU
];

ALU Operation

Execute stage, by instruction:

Instruction          Computation            Comment
OPl rA, rB           valE ← valB OP valA    Perform ALU operation
rmmovl rA, D(rB)     valE ← valB + valC     Compute effective address
popl rA              valE ← valB + 4        Increment stack pointer
jXX Dest                                    No operation
call Dest            valE ← valB + (-4)     Decrement stack pointer
ret                  valE ← valB + 4        Increment stack pointer

int alufun = [
    icode == IOPL : ifun;
    1 : ALUADD;
];

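Together, aluA and alufun explain how one ALU serves every instruction: only OPl uses ifun, and everything else adds. A Python sketch, assuming icode values from the encoding table and ALUADD = 0 (matching addl's function code); returning None for instructions that do not use the ALU is a choice made for this example.

```python
IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL = 0x2, 0x3, 0x4, 0x5
IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL = 0x6, 0x7, 0x8, 0x9, 0xA, 0xB
ALUADD = 0


def alu_a(icode, valA, valC):
    """Select ALU input A."""
    if icode in {IRRMOVL, IOPL}:
        return valA
    if icode in {IIRMOVL, IRMMOVL, IMRMOVL}:
        return valC
    if icode in {ICALL, IPUSHL}:
        return -4
    if icode in {IRET, IPOPL}:
        return 4
    return None  # other instructions don't need the ALU


def alu_fun(icode, ifun):
    """Select the ALU function: ifun for OPl, addition for everything else."""
    return ifun if icode == IOPL else ALUADD
```

For call, for instance, alu_a yields -4 and alu_fun yields ALUADD, so the ALU computes valB + (-4), the decremented stack pointer.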
Memory Logic

Memory
- Reads or writes memory word

Control Logic
- Mem. read: should word be read?
- Mem. write: should word be written?
- Mem. addr.: Select address
- Mem. data.: Select data

[Figure: data memory with read and write controls (from icode), Mem addr (from valE, valA), Mem data (from valA, valP) on data in, and valM on data out]

Memory Address

Memory stage, by instruction:

Instruction          Computation          Comment
OPl rA, rB                                No operation
rmmovl rA, D(rB)     M4[valE] ← valA      Write value to memory
popl rA              valM ← M4[valA]      Read from stack
jXX Dest                                  No operation
call Dest            M4[valE] ← valP      Write return address on stack
ret                  valM ← M4[valA]      Read return address

int mem_addr = [
    icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE;
    icode in { IPOPL, IRET } : valA;
    # Other instructions don't need address
];

Memory Read

Memory stage, by instruction:

Instruction          Computation          Comment
OPl rA, rB                                No operation
rmmovl rA, D(rB)     M4[valE] ← valA      Write value to memory
popl rA              valM ← M4[valA]      Read from stack
jXX Dest                                  No operation
call Dest            M4[valE] ← valP      Write return address on stack
ret                  valM ← M4[valA]      Read return address

bool mem_read = icode in { IMRMOVL, IPOPL, IRET };

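The memory control signals can be sketched the same way. Here mem_addr and mem_read follow the HCL on the slides; mem_write is not shown there and is an assumption read off the stage tables (the instructions whose memory step is a store). The icode values are again taken from the encoding table.

```python
IRMMOVL, IMRMOVL = 0x4, 0x5
IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL = 0x6, 0x7, 0x8, 0x9, 0xA, 0xB


def mem_addr(icode, valE, valA):
    """Select the memory address: computed address, or the old stack pointer."""
    if icode in {IRMMOVL, IPUSHL, ICALL, IMRMOVL}:
        return valE
    if icode in {IPOPL, IRET}:
        return valA
    return None  # other instructions don't need an address


def mem_read(icode):
    return icode in {IMRMOVL, IPOPL, IRET}


def mem_write(icode):
    # Assumption: not shown on the slides; derived from the stage tables.
    return icode in {IRMMOVL, IPUSHL, ICALL}
```

popl and ret use valA because they read from the stack pointer's value before the increment, while push-like instructions address the already-decremented valE.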
PC Update Logic

New PC
- Select next value of PC

[Figure: New PC block with inputs icode, Bch, valC, valM, valP; output goes to the PC register]

PC Update

PC update stage, by instruction:

Instruction          Computation               Comment
OPl rA, rB           PC ← valP                 Update PC
rmmovl rA, D(rB)     PC ← valP                 Update PC
popl rA              PC ← valP                 Update PC
jXX Dest             PC ← Bch ? valC : valP    Update PC
call Dest            PC ← valC                 Set PC to destination
ret                  PC ← valM                 Set PC to return address

int new_pc = [
    icode == ICALL : valC;
    icode == IJXX && Bch : valC;
    icode == IRET : valM;
    1 : valP;
];

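The new_pc case expression maps directly onto an if-chain. A Python sketch, with the icode values assumed from the encoding table and the function name chosen to match the HCL signal:

```python
IOPL, IJXX, ICALL, IRET = 0x6, 0x7, 0x8, 0x9


def new_pc(icode, Bch, valC, valM, valP):
    """Select the next PC; the order of tests follows the HCL case expression."""
    if icode == ICALL:
        return valC            # call: go to destination
    if icode == IJXX and Bch:
        return valC            # taken branch: go to destination
    if icode == IRET:
        return valM            # ret: go to return address read from memory
    return valP                # default: next sequential instruction
```

An untaken jump needs no case of its own: it fails the second test and falls through to the default valP.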
SEQ Operation

State
- PC register
- Cond. Code register
- Data memory
- Register file
- All updated as clock rises

Combinational Logic
- ALU
- Control logic
- Memory reads
  - Instruction memory
  - Register file
  - Data memory

[Figure: combinational logic block connected to the state elements: CC, data memory (read, write), register file (read ports, write ports), and PC (0x00c)]

SEQ Operation #2

Cycle 1:  0x000:  irmovl $0x100,%ebx   # %ebx <-- 0x100
Cycle 2:  0x006:  irmovl $0x200,%edx   # %edx <-- 0x200
Cycle 3:  0x00c:  addl %edx,%ebx       # %ebx <-- 0x300 CC <-- 000
Cycle 4:  0x00e:  je dest              # Not taken

- state set according to second irmovl instruction
- combinational logic starting to react to state changes

[Figure: clock timeline for Cycles 1-4; state shown: CC = 100, %ebx = 0x100, PC = 0x00c]

SEQ Operation #3

- state set according to second irmovl instruction
- combinational logic generates results for addl instruction

[Figure: clock timeline for Cycles 1-4; state shown: CC = 100, %ebx = 0x100, PC = 0x00c; values generated by the logic: CC 000, PC 0x00e]

SEQ Operation #4

- state set according to addl instruction
- combinational logic starting to react to state changes

[Figure: clock timeline for Cycles 1-4; state shown: CC = 000, %ebx = 0x300, PC = 0x00e]

SEQ Operation #5

- state set according to addl instruction
- combinational logic generates results for je instruction

[Figure: clock timeline for Cycles 1-4; state shown: CC = 000, %ebx = 0x300, PC = 0x00e; value generated by the logic: PC 0x013]

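The four-cycle trace on the SEQ Operation slides can be reproduced with a tiny simulator. This is a sketch covering only the three instruction types in the listing (irmovl, addl, je). Several details are assumptions: the byte encodings are assembled by hand from the encoding table, the address of dest is a made-up 0x020 (the slides do not give it), CC is kept as (ZF, SF, OF) starting at 100, and overflow is not modeled.

```python
PROGRAM = bytes.fromhex(
    "308300010000"   # 0x000: irmovl $0x100,%ebx
    "308200020000"   # 0x006: irmovl $0x200,%edx
    "6023"           # 0x00c: addl %edx,%ebx
    "7320000000"     # 0x00e: je dest   (dest = 0x020 is hypothetical)
)
EBX = 3


def run(program, cycles):
    pc, reg, cc = 0, [0] * 8, (1, 0, 0)      # state: PC, register file, CC = (ZF, SF, OF)
    trace = []
    for _ in range(cycles):
        icode = program[pc] >> 4
        if icode == 0x3:                      # irmovl V, rB
            rB = program[pc + 1] & 0xF
            valC = int.from_bytes(program[pc + 2:pc + 6], "little")
            reg[rB] = valC
            next_pc = pc + 6
        elif icode == 0x6:                    # OPl (only addl is handled here)
            rA, rB = program[pc + 1] >> 4, program[pc + 1] & 0xF
            valE = (reg[rB] + reg[rA]) & 0xFFFFFFFF
            cc = (int(valE == 0), valE >> 31, 0)
            reg[rB] = valE
            next_pc = pc + 2
        elif icode == 0x7:                    # jXX (only je is handled here)
            valC = int.from_bytes(program[pc + 1:pc + 5], "little")
            next_pc = valC if cc[0] else pc + 5
        else:
            raise ValueError(f"unsupported icode {icode:#x}")
        pc = next_pc                          # state as seen after the clock rises
        trace.append((pc, reg[EBX], cc))
    return trace


trace = run(PROGRAM, 4)
```

Each trace entry is the state after one clock edge: the PC steps through 0x006, 0x00c, 0x00e, and 0x013, %ebx becomes 0x300 in cycle 3, and je is not taken because ZF is 0.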
SEQ Summary

Implementation
- Express every instruction as series of simple steps
- Follow same general flow for each instruction type
- Assemble registers, memories, predesigned combinational blocks
- Connect with control logic

Limitations
- Too slow to be practical
- In one cycle, must propagate through instruction memory, register file, ALU, and data memory
- Would need to run clock very slowly
- Hardware units only active for fraction of clock cycle