CpE242 Computer Architecture and Engineering Lecture 9

CpE242
Computer Architecture and Engineering
Designing a Single Cycle Datapath
CPE 442 single-cycle datapath.1
Intro. To Computer Architecture
Outline of Today’s Lecture
° Recap and Introduction
• Where are we with respect to the BIG picture? (10 minutes)
° The Steps of Designing a Processor (10 minutes)
° Datapath and timing for Reg-Reg Operations (15 minutes)
° Datapath for Logical Operations with Immediate (5 minutes)
° Datapath for Load and Store Operations (10 minutes)
° Datapath for Branch and Jump Operations (10 minutes)
Course Overview
Computer Design
° Instruction Set Design (“Building Architect”)
• Machine Language
• Compiler View
• “Computer Architecture”
• “Instruction Set Processor”
° Computer Hardware Design (“Construction Engineer”)
• Machine Implementation
• Logic Designer's View
• “Processor Architecture”
• “Computer Organization”
The Big Picture: Where are We Now?
° The Five Classic Components of a Computer
• Processor: Control and Datapath
• Memory
• Input
• Output
° Today’s Topic: Datapath Design
The Big Picture: The Performance Perspective
° Performance of a machine was determined by:
• Instruction count
• Clock cycle time
• Clock cycles per instruction
° Processor design (datapath and control) will determine:
• Clock cycle time
• Clock cycles per instruction
° In the next two lectures:
• Single cycle processor:
- Advantage: One clock cycle per instruction
- Disadvantage: Long cycle time
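The three performance factors above multiply together into execution time. The sketch below illustrates that relationship; the function name `cpu_time` and all of the numbers are hypothetical and chosen only to show how a CPI of 1 can still lose to a design with a shorter cycle.

```python
def cpu_time(instruction_count, cpi, cycle_time_ns):
    """Execution time in ns: instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical numbers, for illustration only (not from the lecture).
single_cycle = cpu_time(1_000_000, 1, 10.0)  # CPI = 1, but a long cycle
alternative = cpu_time(1_000_000, 4, 3.0)    # higher CPI, shorter cycle

print(single_cycle, alternative)
```

With these made-up figures the two designs land within about 20% of each other, which is why cycle time and CPI have to be considered together.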
Recall the MIPS Instruction Set Architecture
The MIPS Instruction Formats
° All MIPS instructions are 32 bits long. The three instruction formats:
• R-type: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | rd <15:11> (5 bits) | shamt <10:6> (5 bits) | funct <5:0> (6 bits)
• I-type: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
• J-type: op <31:26> (6 bits) | target address <25:0> (26 bits)
° The different fields are:
• op: operation of the instruction
• rs, rt, rd: the source and destination register specifiers
• shamt: shift amount
• funct: selects the variant of the operation in the “op” field
• address / immediate: address offset or immediate value
• target address: target address of the jump instruction
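The field boundaries listed above can be checked with shifts and masks. This is a minimal sketch; the function name `decode` and the dictionary layout are illustrative choices, and the example word uses the standard MIPS encoding of add (op = 0, funct = 0x20).

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31:26
        "rs":     (word >> 21) & 0x1F,   # bits 25:21
        "rt":     (word >> 16) & 0x1F,   # bits 20:16
        "rd":     (word >> 11) & 0x1F,   # bits 15:11
        "shamt":  (word >> 6) & 0x1F,    # bits 10:6
        "funct":  word & 0x3F,           # bits 5:0
        "imm16":  word & 0xFFFF,         # bits 15:0 (I-type)
        "target": word & 0x3FFFFFF,      # bits 25:0 (J-type)
    }

# add rd=3, rs=1, rt=2: op = 0, shamt = 0, funct = 0x20
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x20
fields = decode(word)
print(hex(word), fields)
```

Every field is extracted for every word; which ones are meaningful depends on the format selected by `op`.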
The MIPS Subset
° ADD and subtract (R-type: op | rs | rt | rd | shamt | funct)
• add rd, rs, rt
• sub rd, rs, rt
° OR Immediate (I-type: op | rs | rt | immediate)
• ori rt, rs, imm16
° LOAD and STORE
• lw rt, rs, imm16
• sw rt, rs, imm16
° BRANCH:
• beq rs, rt, imm16
° JUMP (J-type: op | target address)
• j target
An Abstract View of the Implementation
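[Figure: abstract view of the implementation. Instruction Fetch: the PC supplies the Instruction Address to an ideal instruction memory. Operand Fetch: the instruction's Rd, Rs, and Rt fields (5 bits each) and Imm (16 bits) feed a register file of 32 32-bit registers (ports Rw, Ra, Rb). Execute: a 32-bit ALU connects to an ideal data memory through Data Address, Data In, and DataOut. The PC, the register file, and the data memory are clocked by Clk.]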
Clocking Methodology
[Figure: clock waveform (Clk) showing the Setup and Hold windows around each clock edge; signal values are “Don’t Care” outside those windows.]
° All storage elements are clocked by the same clock edge
° Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
An Abstract View of the Critical Path
° Register file and ideal memory:
• The CLK input is a factor ONLY during write operation
• During read operation, behave as combinational logic:
- Address valid => Output valid after “access time.”
Critical Path (Load Operation) =
PC’s Clk-to-Q +
Instruction Memory’s Access Time +
Register File’s Access Time +
ALU to Perform a 32-bit Add +
Data Memory Access Time +
Setup Time for Register File Write +
Clock Skew
[Figure: the abstract view of the implementation: PC, ideal instruction memory, 32 32-bit registers, ALU, and ideal data memory, all clocked by Clk.]
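The load critical path is just a sum of component delays, and that sum sets the minimum cycle time. The sketch below adds up the terms listed above; every delay value is a hypothetical placeholder (the lecture gives no numbers), and the names `delays_ns` and `load_critical_path` are illustrative.

```python
# Hypothetical delays in nanoseconds; none of these values come from the lecture.
delays_ns = {
    "pc_clk_to_q": 1.0,
    "instruction_memory_access": 10.0,
    "register_file_access": 5.0,
    "alu_32bit_add": 8.0,
    "data_memory_access": 10.0,
    "register_file_write_setup": 1.0,
    "clock_skew": 0.5,
}

def load_critical_path(delays):
    """Sum every term on the load instruction's critical path."""
    return sum(delays.values())

cycle_time_ns = load_critical_path(delays_ns)
max_clock_mhz = 1000.0 / cycle_time_ns  # 1000 / ns gives MHz

print(cycle_time_ns, round(max_clock_mhz, 2))
```

Because every instruction takes one cycle, the slowest instruction (here, the load) fixes the clock period for all of them.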
Outline of Today’s Lecture
° Recap and Introduction (5 minutes)
• Where are we with respect to the BIG picture? (10 minutes)
° The Steps of Designing a Processor (10 minutes)
° Datapath and timing for Reg-Reg Operations (15 minutes)
° Datapath for Logical Operations with Immediate (5 minutes)
° Datapath for Load and Store Operations (10 minutes)
° Datapath for Branch and Jump Operations (10 minutes)
The Steps of Designing a Processor
0) Instruction Set Architecture => Register Transfer Language Program
• Convert each instruction to RTL statements
1) Register Transfer Language Program => Datapath Design
• Determine Datapath components
• Determine Datapath interconnects
2) Datapath components => Control signals
3) Control signals => Control logic => Control Unit Design
Step 0) Instruction Set Architecture => Register Transfer Language Program
Register Transfer Language (RTL): The ADD Instruction
° add rd, rs, rt
• mem[PC]                  Fetch the instruction from memory
• R[rd] <- R[rs] + R[rt]   The ADD operation
• PC <- PC + 4             Calculate the next instruction’s address
RTL: The Load Instruction
° lw rt, rs, imm16
• mem[PC]                           Fetch the instruction from memory
• Addr <- R[rs] + SignExt(imm16)    Calculate the memory address
• R[rt] <- Mem[Addr]                Load the data into the register
• PC <- PC + 4                      Calculate the next instruction’s address
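The RTL statements for add and lw translate almost line for line into executable form. The sketch below is one way to write them, assuming a Python list `R` for the register file and a dictionary `mem` standing in for the ideal memory; the function names are hypothetical and the instruction fetch step is omitted.

```python
def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed integer (SignExt)."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def exec_add(R, rd, rs, rt, pc):
    R[rd] = (R[rs] + R[rt]) & 0xFFFFFFFF              # R[rd] <- R[rs] + R[rt]
    return pc + 4                                     # PC <- PC + 4

def exec_lw(R, mem, rt, rs, imm16, pc):
    addr = (R[rs] + sign_ext16(imm16)) & 0xFFFFFFFF   # Addr <- R[rs] + SignExt(imm16)
    R[rt] = mem[addr]                                 # R[rt] <- Mem[Addr]
    return pc + 4                                     # PC <- PC + 4

R = [0] * 32
R[1], R[2], R[4] = 5, 7, 100
mem = {96: 1234}                         # one data word at byte address 96

pc = exec_add(R, 3, 1, 2, 0)             # add rd=3, rs=1, rt=2
pc = exec_lw(R, mem, 5, 4, 0xFFFC, pc)   # lw rt=5, rs=4, imm16=-4
print(R[3], R[5], pc)
```

The negative immediate shows why lw needs sign extension: 100 + (-4) addresses the word at 96.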
Step 1) : RTL To Data Path Design – Data Path Components,
Combinational Logic Elements
° Adder
• Inputs: A (32 bits), B (32 bits), CarryIn
• Outputs: Sum (32 bits), Carry
° MUX
• Inputs: A (32 bits), B (32 bits), Select
• Output: Y (32 bits)
° ALU
• Inputs: A (32 bits), B (32 bits), OP
• Outputs: Result (32 bits), Zero
Step 1) : RTL To Data Path Design – Data Path Components,
Storage Elements: Register
° Register
• Similar to the D Flip Flop except
- N-bit input and output
- Write Enable input
• Write Enable:
- 0: Data Out will not change
- 1: Data Out will become Data In
[Figure: N-bit register with Data In (N bits), Data Out (N bits), Write Enable, and Clk.]
Step 1) : RTL To Data Path Design – Data Path Components,
Storage Elements: Register File
° Register File consists of 32 registers:
• Two 32-bit output busses: busA and busB
• One 32-bit input bus: busW
° Register is selected by:
• RA selects the register to put on busA
• RB selects the register to put on busB
• RW selects the register to be written via busW when Write Enable is 1
° Clock input (CLK)
• The CLK input is a factor ONLY during write operation
• During read operation, behaves as a combinational logic block: RA or RB valid => busA or busB valid after “access time.”
[Figure: register file of 32 32-bit registers with 5-bit selects RW, RA, and RB, Write Enable, Clk, 32-bit input busW, and 32-bit outputs busA and busB.]
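The split between combinational reads and clocked writes can be modeled in a few lines. This is a behavioral sketch only: the class name `RegisterFile` and the method names are hypothetical, and access time is not modeled.

```python
class RegisterFile:
    """Behavioral model of a file of 32 32-bit registers."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: RA and RB select what appears on busA and busB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """State changes only on a clock edge, and only when Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=42, write_enable=0)   # Write Enable = 0: no change
before = rf.read(3, 0)
rf.clock_edge(rw=3, bus_w=42, write_enable=1)   # Write Enable = 1: R[3] <- busW
after = rf.read(3, 0)
print(before, after)
```

Keeping `read` free of any clock argument mirrors the slide's point that CLK matters only for writes.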
Step 1) : RTL To Data Path Design – Data Path Components,
Storage Elements: Idealized Memory
° Memory (idealized)
• One input bus: Data In
• One output bus: Data Out
° Memory word is selected by:
• Address selects the word to put on Data Out
• Write Enable = 1: Address selects the memory word to be written via the Data In bus
° Clock input (CLK)
• The CLK input is a factor ONLY during write operation
• During read operation, behaves as a combinational logic block:
- Address valid => Data Out valid after “access time.”
[Figure: idealized memory with Write Enable, Address, Clk, 32-bit Data In, and 32-bit DataOut.]
Step 1) : RTL To Data Path Design – Data Path Components,
Overview of the Instruction Fetch Unit
° The common RTL operations
• Fetch the Instruction: mem[PC]
• Update the program counter:
- Sequential Code: PC <- PC + 4
- Branch and Jump PC <- “something else”
[Figure: instruction fetch unit. The PC, clocked by Clk and updated by the Next Address Logic, supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word.]
Outline of Today’s Lecture
° Recap and Introduction
• Where are we with respect to the BIG picture? (10 minutes)
° The Steps of Designing a Processor (10 minutes)
° Datapath and timing for Reg-Reg Operations (15 minutes)
° Datapath for Logical Operations with Immediate (5 minutes)
° Datapath for Load and Store Operations (10 minutes)
° Datapath for Branch and Jump Operations (10 minutes)
Step 1): RTL To Data-Path Design
Determine Data-path interconnects
RTL: The ADD Instruction
R-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | rd <15:11> (5 bits) | shamt <10:6> (5 bits) | funct <5:0> (6 bits)
° add rd, rs, rt
• mem[PC]                  Fetch the instruction from memory
• R[rd] <- R[rs] + R[rt]   The actual operation
• PC <- PC + 4             Calculate the next instruction’s address
RTL: The Subtract Instruction
R-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | rd <15:11> (5 bits) | shamt <10:6> (5 bits) | funct <5:0> (6 bits)
° sub rd, rs, rt
• mem[PC]                  Fetch the instruction from memory
• R[rd] <- R[rs] - R[rt]   The actual operation
• PC <- PC + 4             Calculate the next instruction’s address
RTL To Data Path Design – Data-path for Register-Register Operations
° R[rd] <- R[rs] op R[rt]     Example: add rd, rs, rt
• Ra, Rb, and Rw come from the instruction’s rs, rt, and rd fields
• ALUctr and RegWr: control logic after decoding the instruction
[Figure: register-register datapath. The R-type fields Rs, Rt, and Rd (5 bits each) drive Ra, Rb, and Rw of the 32 32-bit registers. busA and busB (32 bits each) feed the ALU, which is controlled by ALUctr; the 32-bit Result returns on busW and is written when RegWr is asserted at the Clk edge.]
Register-Register Timing
[Figure: timing diagram for a register-register instruction on the datapath above. After the clock edge, each signal moves from its old value to its new value in this order:]
• PC: new value after Clk-to-Q
• Rs, Rt, Rd, Op, Func: new values after the Instruction Memory Access Time
• ALUctr, RegWr: new values after the Delay through Control Logic
• busA, busB: new values after the Register File Access Time
• busW: new value after the ALU Delay
• The register write occurs at the following clock edge
Outline of Today’s Lecture
° Recap and Introduction
• Where are we with respect to the BIG picture? (10 minutes)
° The Steps of Designing a Processor (10 minutes)
° Datapath and timing for Reg-Reg Operations (15 minutes)
° Datapath for Logical Operations with Immediate (5 minutes)
° Datapath for Load and Store Operations (10 minutes)
° Datapath for Branch and Jump Operations (10 minutes)
RTL: The OR Immediate Instruction
I-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
° ori rt, rs, imm16
• mem[PC]                              Fetch the instruction from memory
• R[rt] <- R[rs] or ZeroExt(imm16)     The OR operation
• PC <- PC + 4                         Calculate the next instruction’s address
ZeroExt(imm16): bits <31:16> are 0000000000000000 (16 bits); bits <15:0> are the immediate (16 bits)
Datapath for Logical Operations with Immediate
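° R[rt] <- R[rs] op ZeroExt[imm16]     Example: ori rt, rs, imm16
I-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
[Figure: the register-register datapath extended for immediates. A mux controlled by RegDst selects Rd or Rt as the write register Rw. Ra is Rs; Rb (Rt) is a don’t care. ZeroExt widens imm16 from 16 to 32 bits, and a mux controlled by ALUSrc selects busB or the extended immediate as the second ALU input. The ALU, controlled by ALUctr, drives the 32-bit Result onto busW, which is written when RegWr is asserted at the Clk edge.]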
Outline of Today’s Lecture
° Recap and Introduction
• Where are we with respect to the BIG picture? (10 minutes)
° The Steps of Designing a Processor (10 minutes)
° Datapath and timing for Reg-Reg Operations (15 minutes)
° Datapath for Logical Operations with Immediate (5 minutes)
° Datapath for Load and Store Operations (10 minutes)
° Datapath for Branch and Jump Operations (10 minutes)
° Assignment #3
RTL: The Load Instruction
I-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
° lw rt, rs, imm16
• mem[PC]                           Fetch the instruction from memory
• Addr <- R[rs] + SignExt(imm16)    Calculate the memory address
• R[rt] <- Mem[Addr]                Load the data into the register
• PC <- PC + 4                      Calculate the next instruction’s address
SignExt operation:
• Immediate bit 15 = 0: bits <31:16> are 0000000000000000 (16 bits); bits <15:0> are the immediate (16 bits)
• Immediate bit 15 = 1: bits <31:16> are 1111111111111111 (16 bits); bits <15:0> are the immediate (16 bits)
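Zero extension (used by ori) and sign extension (used by lw) differ only in how bits <31:16> are filled. The sketch below shows both on 32-bit patterns; the function names are illustrative, and the `extender` wrapper assumes ExtOp = 1 selects sign extension, an encoding the slides do not state.

```python
def zero_ext(imm16):
    """ZeroExt: bits <31:16> are all zeros."""
    return imm16 & 0xFFFF

def sign_ext(imm16):
    """SignExt: bits <31:16> are copies of immediate bit 15."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if (imm16 & 0x8000) else imm16

def extender(imm16, ext_op):
    """Assumed encoding: ExtOp = 1 means sign-extend, ExtOp = 0 means zero-extend."""
    return sign_ext(imm16) if ext_op else zero_ext(imm16)

print(hex(zero_ext(0x8001)))   # upper half stays zero
print(hex(sign_ext(0x8001)))   # upper half filled with ones
print(hex(sign_ext(0x7FFF)))   # bit 15 is 0, so upper half stays zero
```

The two functions agree whenever bit 15 of the immediate is 0.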
Datapath for Load Operations
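° R[rt] <- Mem[R[rs] + SignExt[imm16]]     Example: lw rt, rs, imm16
I-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
[Figure: datapath for load operations. A mux controlled by RegDst selects Rd or Rt as Rw; Ra is Rs; Rb (Rt) is a don’t care. The Extender, controlled by ExtOp, widens imm16 to 32 bits, and the ALUSrc mux selects busB or the extended immediate as the second ALU input. The ALU output, under ALUctr, drives the Adr input of the Data Memory (WrEn driven by MemWr, 32-bit Data In, Clk). A mux controlled by MemtoReg selects the ALU output or the memory output for busW, which is written when RegWr is asserted at the Clk edge.]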
RTL: The Store Instruction
I-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
° sw rt, rs, imm16
• mem[PC]                           Fetch the instruction from memory
• Addr <- R[rs] + SignExt(imm16)    Calculate the memory address
• Mem[Addr] <- R[rt]                Store the register into memory
• PC <- PC + 4                      Calculate the next instruction’s address
Datapath for Store Operations
° Mem[R[rs] + SignExt[imm16]] <- R[rt]     Example: sw rt, rs, imm16
I-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
[Figure: datapath for store operations. It is the load datapath with busB (the contents of Rt) also wired to the 32-bit Data In of the Data Memory. Ra is Rs and Rb is Rt. The Extender (ExtOp) and the ALUSrc mux supply the extended imm16 to the ALU, whose output under ALUctr drives Adr; MemWr drives WrEn. The RegDst, RegWr, and MemtoReg signals and their muxes are unchanged.]
Outline of Today’s Lecture
° Recap and Introduction
• Where are we with respect to the BIG picture? (10 minutes)
° The Steps of Designing a Processor (10 minutes)
° Datapath and timing for Reg-Reg Operations (15 minutes)
° Datapath for Logical Operations with Immediate (5 minutes)
° Datapath for Load and Store Operations (10 minutes)
° Datapath for Branch and Jump Operations (10 minutes)
RTL: The Branch Instruction
I-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
° beq rs, rt, imm16
• mem[PC]                  Fetch the instruction from memory
• Cond <- R[rs] - R[rt]    Calculate the branch condition
• if (Cond eq 0)           Calculate the next instruction’s address
    PC <- PC + 4 + ( SignExt(imm16) x 4 )
  else
    PC <- PC + 4
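The branch RTL above can be exercised directly. This sketch follows the slide's statements one for one; the function names `sign_ext16` and `beq_next_pc` are hypothetical, and register contents are passed in as plain integers.

```python
def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed integer (SignExt)."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def beq_next_pc(pc, rs_val, rt_val, imm16):
    cond = (rs_val - rt_val) & 0xFFFFFFFF               # Cond <- R[rs] - R[rt]
    if cond == 0:                                       # if (Cond eq 0)
        next_pc = pc + 4 + sign_ext16(imm16) * 4        # taken branch
    else:
        next_pc = pc + 4                                # fall through
    return next_pc & 0xFFFFFFFF

taken = beq_next_pc(0x100, 5, 5, 3)            # forward by 3 instructions
not_taken = beq_next_pc(0x100, 5, 6, 3)
backward = beq_next_pc(0x100, 1, 1, 0xFFFF)    # imm16 = -1: branch to itself
print(hex(taken), hex(not_taken), hex(backward))
```

An immediate of -1 lands back on the branch itself because the offset is applied to PC + 4, not PC.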
Step 1) : RTL To Data Path Design – Data Path Components,
Overview of the Instruction Fetch Unit
° The common RTL operations
• Fetch the Instruction: mem[PC]
• Update the program counter:
- Sequential Code: PC <- PC + 4
- Branch and Jump PC <- “something else”
[Figure: instruction fetch unit. The PC, clocked by Clk and updated by the Next Address Logic, supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word.]
Datapath for Branch Operations
° beq rs, rt, imm16     We need to compare Rs and Rt!
I-type format: op <31:26> (6 bits) | rs <25:21> (5 bits) | rt <20:16> (5 bits) | immediate <15:0> (16 bits)
[Figure: datapath for branch operations. Ra is Rs and Rb is Rt; busA and busB feed the ALU, which under ALUctr produces the Zero output. Zero, the Branch control signal, and imm16 (16 bits) feed the Next Address Logic, which updates the PC (clocked by Clk) that addresses the Instruction Memory. The RegDst mux, RegWr, the Extender (ExtOp), and the ALUSrc mux are present as in the earlier datapaths.]
Binary Arithmetics for the Next Address
° In theory, the PC is a 32-bit byte address into the instruction memory:
• Sequential operation: PC<31:0> = PC<31:0> + 4
• Branch operation: PC<31:0> = PC<31:0> + 4 + SignExt[Imm16] * 4
° The magic number “4” always comes up because:
• The 32-bit PC is a byte address
• And all our instructions are 4 bytes (32 bits) long
° In other words:
• The 2 LSBs of the 32-bit PC are always zeros
• There is no reason to have hardware to keep the 2 LSBs
° In practice, we can simplify the hardware by using a 30-bit PC<31:2>:
• Sequential operation: PC<31:2> = PC<31:2> + 1
• Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
• In either case: Instruction Memory Address = PC<31:2> concat “00”
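The claim that a 30-bit PC gives the same instruction addresses as a 32-bit PC can be checked numerically. The sketch below computes the next address both ways; the function names are hypothetical, and `taken` stands for a branch whose condition holds.

```python
def sign_ext16(imm16):
    """Interpret a 16-bit field as a signed integer (SignExt)."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def next_pc32(pc, taken, imm16):
    """32-bit byte address: PC + 4 (+ SignExt(imm16) * 4 if the branch is taken)."""
    offset = sign_ext16(imm16) * 4 if taken else 0
    return (pc + 4 + offset) & 0xFFFFFFFF

def next_pc30(pc30, taken, imm16):
    """30-bit word address: PC<31:2> + 1 (+ SignExt(imm16) if the branch is taken)."""
    offset = sign_ext16(imm16) if taken else 0
    return (pc30 + 1 + offset) & 0x3FFFFFFF

def memory_address(pc30):
    """Instruction Memory Address = PC<31:2> concat "00"."""
    return pc30 << 2

for pc in (0x00000000, 0x00400000, 0xFFFFFFF8):
    for taken in (False, True):
        for imm16 in (0x0000, 0x0003, 0xFFFE):
            wide = next_pc32(pc, taken, imm16)
            narrow = memory_address(next_pc30(pc >> 2, taken, imm16))
            assert wide == narrow
print("30-bit and 32-bit next-address calculations agree")
```

Dropping the two low bits also removes the multiply by 4 from the branch offset, which is the hardware saving the slide refers to.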
Next Address Logic: Expensive and Fast Solution
° Using a 30-bit PC:
• Sequential operation: PC<31:2> = PC<31:2> + 1
• Branch operation: PC<31:2> = PC<31:2> + 1 + SignExt[Imm16]
• In either case: Instruction Memory Address = PC<31:2> concat “00”
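[Figure: next address logic with two adders. The 30-bit PC (clocked by Clk) feeds one adder that adds “1”. A second adder adds SignExt(imm16), taken from Instruction<15:0>, to that sum. A mux controlled by Branch and Zero selects one of the two 30-bit results as the next PC. The Instruction Memory address is Addr<31:2> from the PC with “00” on Addr<1:0>; the memory outputs the 32-bit Instruction<31:0>.]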
Next Address Logic: Cheap and Slow Solution
° Why is this slow?
• Cannot start the address add until Zero (output of ALU) is valid
° Does it matter that this is slow in the overall scheme of things?
• Probably not here. Critical path is the load operation.
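[Figure: next address logic with one adder. The 30-bit PC (clocked by Clk) feeds a single adder whose Carry In is “1”. A mux controlled by Branch and Zero selects either “0” or SignExt(imm16), taken from Instruction<15:0>, as the adder's second input. The Instruction Memory address is Addr<31:2> from the PC with “00” on Addr<1:0>; the memory outputs the 32-bit Instruction<31:0>.]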
RTL: The Jump Instruction
J-type format: op <31:26> (6 bits) | target address <25:0> (26 bits)
° j target
• mem[PC]                                        Fetch the instruction from memory
• PC<31:2> <- PC<31:28> concat target<25:0>      Calculate the next instruction’s address
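The jump RTL keeps the top four bits of the PC and replaces the rest with the 26-bit target. A minimal sketch of that concatenation on 32-bit byte addresses follows; `jump_pc` is a hypothetical name, and the sketch follows the slide in taking the upper bits from the current PC.

```python
def jump_pc(pc, target26):
    """PC<31:2> <- PC<31:28> concat target<25:0>; PC<1:0> stays "00"."""
    upper = pc & 0xF0000000                   # PC<31:28>
    lower = (target26 & 0x3FFFFFF) << 2       # target<25:0> placed in bits 27:2
    return upper | lower

new_pc = jump_pc(0x10000008, 0x100)
print(hex(new_pc))
```

Because only 26 bits come from the instruction, a jump cannot leave the 256 MB region selected by PC<31:28>.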
Instruction Fetch Unit
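° j target
• PC<31:2> <- PC<31:28> concat target<25:0>
[Figure: instruction fetch unit with a 30-bit PC (clocked by Clk). One adder computes PC<31:2> + “1”; a second adder adds SignExt(imm16), taken from Instruction<15:0>; a mux controlled by Branch and Zero (equal) selects between the two sums. A second mux, controlled by Jump, selects that result or the jump target formed from PC<31:28> (4 bits) concat Instruction<25:0> (26 bits). The Instruction Memory address is Addr<31:2> from the PC with “00” on Addr<1:0>; the memory outputs the 32-bit Instruction<31:0>.]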
Putting it All Together: A Single Cycle Datapath
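° We have everything except the control signals (underlined in the figure)
• Instruction<31:0> fields: Rs <25:21>, Rt <20:16>, Rd <15:11>, Imm16 <15:0>
• Control signals: Branch, Jump, RegDst, RegWr, ALUctr, ExtOp, ALUSrc, MemWr, MemtoReg
[Figure: the complete single cycle datapath. The Instruction Fetch Unit (inputs Branch, Jump, Zero, and Clk) outputs Instruction<31:0>. A mux controlled by RegDst selects Rd or Rt as Rw; Ra is Rs and Rb is Rt on the 32 32-bit registers. The Extender (ExtOp) widens imm16 to 32 bits, and the ALUSrc mux selects busB or the extended immediate for the ALU, which under ALUctr produces the result and Zero. The ALU output drives Adr of the Data Memory (WrEn driven by MemWr, Data In from busB, Clk). A mux controlled by MemtoReg selects the ALU output or the memory output for busW, which is written when RegWr is asserted.]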
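To see the whole subset working together, the sketch below executes one instruction per call, the software analogue of one clock cycle. It is an illustrative behavioral model, not the lecture's design: the helper names are hypothetical, memories are Python dictionaries keyed by byte address, and the opcode and funct values are the standard MIPS encodings for add, sub, ori, lw, sw, beq, and j.

```python
def sign_ext16(x):
    return x - 0x10000 if x & 0x8000 else x

def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def i_type(op, rs, rt, imm16):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)

def step(pc, R, imem, dmem):
    """Execute the instruction at pc and return the next PC."""
    inst = imem[pc]
    op, rs, rt = inst >> 26, (inst >> 21) & 31, (inst >> 16) & 31
    rd, funct = (inst >> 11) & 31, inst & 0x3F
    imm16, target = inst & 0xFFFF, inst & 0x3FFFFFF
    next_pc = pc + 4
    if op == 0x00 and funct == 0x20:                    # add
        R[rd] = (R[rs] + R[rt]) & 0xFFFFFFFF
    elif op == 0x00 and funct == 0x22:                  # sub
        R[rd] = (R[rs] - R[rt]) & 0xFFFFFFFF
    elif op == 0x0D:                                    # ori (zero-extended)
        R[rt] = R[rs] | imm16
    elif op == 0x23:                                    # lw
        R[rt] = dmem.get((R[rs] + sign_ext16(imm16)) & 0xFFFFFFFF, 0)
    elif op == 0x2B:                                    # sw
        dmem[(R[rs] + sign_ext16(imm16)) & 0xFFFFFFFF] = R[rt]
    elif op == 0x04:                                    # beq
        if R[rs] == R[rt]:
            next_pc = pc + 4 + sign_ext16(imm16) * 4
    elif op == 0x02:                                    # j
        next_pc = (pc & 0xF0000000) | (target << 2)
    else:
        raise ValueError(f"unsupported instruction {inst:#010x}")
    return next_pc & 0xFFFFFFFF

program = [
    i_type(0x0D, 0, 1, 5),     # 0:  ori r1, r0, 5
    i_type(0x0D, 0, 2, 7),     # 4:  ori r2, r0, 7
    r_type(1, 2, 3, 0x20),     # 8:  add r3, r1, r2
    i_type(0x2B, 0, 3, 16),    # 12: sw  r3, 16(r0)
    i_type(0x23, 0, 4, 16),    # 16: lw  r4, 16(r0)
    i_type(0x04, 3, 4, 1),     # 20: beq r3, r4, +1 (skips the next instruction)
    i_type(0x0D, 0, 5, 1),     # 24: ori r5, r0, 1  (skipped)
    r_type(3, 1, 6, 0x22),     # 28: sub r6, r3, r1
]
imem = {4 * i: word for i, word in enumerate(program)}
R, dmem, pc = [0] * 32, {}, 0
while pc in imem:              # run until the PC leaves the program
    pc = step(pc, R, imem, dmem)
print(R[3], R[4], R[5], R[6], dmem[16], pc)
```

Each branch of the `if` chain corresponds to one setting of the control signals named on the slide; the model hides timing, so it says nothing about the cycle length.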