
CS 161 Computer Architecture
Chapter 5
Lecture 11
Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan
Adapted from notes by Dave Patterson
(http.cs.berkeley.edu/~patterson)
1999 ©UCB
Implementing Main Control
Main Control has one 6-bit input (op), 9 outputs (7 are 1-bit, ALUOp is 2 bits)
To build Main Control as sum-of-products:
(1) Construct a minterm for each different instruction (or R-type); each minterm corresponds to a single instruction (or all of the R-type instructions), e.g., M_R-format, M_lw
(2) Determine each main control output by forming the logical OR of relevant minterms (instructions), e.g., RegWrite: M_R-format OR M_lw
[Figure: Main Control block with input op and outputs RegDst, Branch, MemRead, MemtoReg, ALUOp (2 bits), MemWrite, ALUSrc, RegWrite]
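The two construction steps can be sketched in Python as a rough illustration. The opcode values are the standard MIPS encodings and are an assumption here, since the slide does not list them; the function names are hypothetical. Output assignments follow the Fig. 5.18 settings, with don't-care entries simply driven to 0.

```python
# Sketch of Main Control as sum-of-products (names are illustrative).
# Opcodes are the standard MIPS values, assumed here; the slide does not list them.
OPCODES = {"R-format": 0b000000, "lw": 0b100011, "sw": 0b101011, "beq": 0b000100}


def minterms(op):
    """Step (1): one minterm per instruction class, an AND over all 6 opcode bits."""
    bits = [(op >> i) & 1 for i in range(6)]  # bits[0] = Op0 ... bits[5] = Op5
    result = {}
    for name, code in OPCODES.items():
        term = 1
        for i in range(6):
            want = (code >> i) & 1
            # Use the bit itself where the opcode has a 1, its complement where it has a 0.
            term &= bits[i] if want else 1 - bits[i]
        result[name] = term
    return result


def main_control(op):
    """Step (2): each output is the OR of the minterms that assert it."""
    m = minterms(op)
    return {
        "RegDst": m["R-format"],
        "ALUSrc": m["lw"] | m["sw"],
        "MemtoReg": m["lw"],
        "RegWrite": m["R-format"] | m["lw"],
        "MemRead": m["lw"],
        "MemWrite": m["sw"],
        "Branch": m["beq"],
        "ALUOp1": m["R-format"],
        "ALUOp0": m["beq"],
    }


if __name__ == "__main__":
    for name, code in OPCODES.items():
        print(name, main_control(code))
```

An opcode that matches none of the four minterms leaves every output at 0 in this sketch.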
Single-Cycle MIPS-lite CPU
[Figure: single-cycle MIPS-lite datapath: PC, Imem, two adders (PC + 4 and branch target), Regs, Sign Extend, shift-left-2, ALU with Zero output, Dmem, multiplexors; Main Control driven by op=[31:26] and ALU Control driven by bits 5:0 and ALUOp (2); control signals RegDst, Branch, MemRead, MemToReg, MemWrite, ALUsrc, RegWrite, PCSrc, ALUcon]
Fig. 5.17 Datapath with Control Signals
Fig. 5.18 Setting Control Lines Depend on Opcode
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
Control Design
° Simple combinational logic (truth tables)
[Figure: ALU control block with inputs ALUOp (ALUOp1, ALUOp0) and F (5-0) (F3, F2, F1, F0 shown), outputs Operation (Operation2, Operation1, Operation0)]
[Figure: main control logic with inputs Op5, Op4, Op3, Op2, Op1, Op0; decoded terms R-format, lw, sw, beq; outputs RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp1, ALUOp0]
Fig. 5.19 R-type operation, add $t1, $t2, $t3 Active parts are highlighted
Fig. 5.20 Active parts for a Load instruction
Fig. 5.21 Active parts for a beq instruction
Fig. 5.24 Extension for Jump instruction
Single-Cycle Machine: Appraisal
° All instructions complete in one clock cycle
(CPI = 1)
° Some instructions take more steps than
others
• lw is most expensive (5 steps, vs. 4 for R-type
and sw, 3 for beq)
° Clock cycle must cover longest instruction
⇒ inefficient
• suppose mul is added?
• 32 shift/add steps ⇒ would delay every other instruction
Example
° Assume 2ns for instruction/data memory,
1ns for decode/register read, 2ns for ALU
and 1 ns for register write.
° Single-cycle datapath clock period = 8 ns.
° Assume an instruction mix of 24% loads, 12% stores, 44% R-format, 18% branches, and 2% jumps.
° Assuming a variable-cycle datapath,
average clock period = 6.3 ns.
° Possible Speed-up = 1.27
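The arithmetic behind these figures can be reproduced with a short Python sketch. It assumes each instruction class uses only the units it needs, and in particular that a jump needs only the 2 ns instruction fetch; the slide does not state the per-class latencies, so treat them as an assumption. The unrounded average is 6.34 ns, which gives a ratio of about 1.26; the slide's 1.27 follows from dividing 8 ns by the rounded 6.3 ns.

```python
# Unit latencies from the example, in ns.
MEM, REG_READ, ALU, REG_WRITE = 2, 1, 2, 1

# Per-class latency = sum of the units each class uses.
# Assumption (not stated on the slide): a jump needs only the instruction fetch.
LATENCY_NS = {
    "load": MEM + REG_READ + ALU + MEM + REG_WRITE,  # 8
    "store": MEM + REG_READ + ALU + MEM,             # 7
    "R-format": MEM + REG_READ + ALU + REG_WRITE,    # 6
    "branch": MEM + REG_READ + ALU,                  # 5
    "jump": MEM,                                     # 2
}
MIX = {"load": 0.24, "store": 0.12, "R-format": 0.44, "branch": 0.18, "jump": 0.02}

# Single-cycle clock must cover the slowest class.
single_cycle_ns = max(LATENCY_NS.values())
# Variable-cycle average weights each class latency by its frequency.
variable_avg_ns = sum(MIX[k] * LATENCY_NS[k] for k in MIX)
speedup = single_cycle_ns / variable_avg_ns

if __name__ == "__main__":
    print(f"single-cycle period:    {single_cycle_ns} ns")
    print(f"variable-cycle average: {variable_avg_ns:.2f} ns")
    print(f"speed-up:               {speedup:.2f}")
```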
Multicycle Implementation (MIPS-lite v.2)
° Want more efficient implementation
° Each step will take one clock cycle (not each instruction) [CPI > 1]
⇒ shorter clock cycle: cycle time constrained by longest step, not longest instruction
° simpler instructions take fewer cycles
⇒ higher overall performance
° complex control: finite state machine
° Versatile (can extend for new instructions:
add3, swap, etc.)
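As a rough illustration of "cycle time constrained by longest step, not longest instruction", the sketch below reuses the unit latencies from the earlier single-cycle example. That reuse is an assumption, as is the idea that each multicycle step exercises exactly one of those units; the step names are labels chosen for this sketch.

```python
# Unit latencies in ns, borrowed from the earlier single-cycle example
# (an assumption for this sketch; this slide gives no numbers).
STEP_NS = {
    "instruction fetch": 2,
    "decode/register read": 1,
    "ALU": 2,
    "data memory": 2,
    "register write": 1,
}

# Single-cycle: the clock must cover the longest instruction (lw uses every step).
single_cycle_period_ns = sum(STEP_NS.values())

# Multicycle: the clock only has to cover the longest individual step.
multicycle_period_ns = max(STEP_NS.values())

if __name__ == "__main__":
    print(f"single-cycle period: {single_cycle_period_ns} ns")
    print(f"multicycle period:   {multicycle_period_ns} ns")
```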
Recap: Clocking: single-cycle vs. multicycle
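[Figure: clock timing diagrams. Single-cycle Implementation: add $t0,$t1,$t2 and beq $t0,$t1,L each occupy one full clock cycle, with part of each cycle marked "waste". Multicycle Implementation: the same two instructions run over several shorter clock cycles.]
• Multicycle Implementation: less waste = higher performance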
Recap: How fast can we run the clock?
° Depends on how much we want done per clock cycle
• Can do: several “inexpensive” datapath
operations per clock
- simple gates (AND, OR, …)
- single datapath registers (PC)
- sign extender, left shifter, multiplexor
• PLUS: exactly one “expensive” datapath
operation per clock
- ALU operation
- Register File access (2 reads, or 1 write)
- Memory access (read or write)
MIPS-lite
Multicycle Version
Multicycle Datapath (overview)
[Figure: multicycle datapath overview: PC, Memory (Instruction or Data), Instruction Register, Memory Data Register, Registers, temporary registers A and B, ALU, ALUOut]
• One ALU (no extra adders)
• One Memory (no separate Imem, Dmem)
• New Temporary Registers (“clocked”/require clock input)
Multicycle Implementation
°Datapath changes
• one memory: both instructions and data
(because can access on separate steps)
• one ALU (eliminate extra adders)
• extra “invisible” registers to capture
intermediate (per-step) datapath results
°Controller changes
• controller must fire control lines in the correct sequence and at the correct time
⇒ controller must remember current execution step, advance to next step
Multicycle Datapath: Add Multiplexors
[Figure: multicycle datapath (PC, Mem, IR, MDR, Regs, A, B, Sgn Extend, shift-left-2, ALU with zero output, ALUOut) with multiplexors added, including a four-input multiplexor (inputs 0-3) on one ALU input]
Note inputs to multiplexors
Datapath + Control Points
[Figure: multicycle datapath annotated with control points, including ALU Control driven by ALUOp and (funct) 5:0]
Control signals: IRWrite, RegWrite, MemRead, MemWrite, RegDst, IorD, PCSrc, PCWrite, PCWriteCond, ALUSrcA, ALUSrcB, MemtoReg, ALUOp
Multicycle Instruction Execution
° All instructions execute in 3-5 cycles
• 3 cycles: beq
• 4 cycles: R-type, sw
• 5 cycles: lw
° 1: fetch instruction, PC=PC+4
° 2: decode, fetch registers, branch target
° 3: execute/compute address/branch
° 4: access memory/complete R-type
° 5: (lw) write loaded data to register
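The cycle counts follow directly from which steps each instruction class passes through. A minimal Python sketch, with step labels paraphrased for illustration and hypothetical helper names:

```python
# Steps per instruction class, following the five-step list above.
# The step labels are paraphrased for this sketch.
STEPS = {
    "beq": ["fetch", "decode/register fetch", "branch completion"],
    "R-type": ["fetch", "decode/register fetch", "execute", "R-type completion"],
    "sw": ["fetch", "decode/register fetch", "compute address", "memory write"],
    "lw": ["fetch", "decode/register fetch", "compute address", "memory read",
           "register write"],
}


def cycles(kind):
    """Number of clock cycles one instruction of this class takes."""
    return len(STEPS[kind])


def total_cycles(program):
    """Total cycles for a sequence of instruction classes, executed one after another."""
    return sum(cycles(kind) for kind in program)


if __name__ == "__main__":
    for kind in STEPS:
        print(kind, cycles(kind))
    print("lw, R-type, sw, beq:", total_cycles(["lw", "R-type", "sw", "beq"]))
```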
Cycle 1 Datapath: IR=Mem[PC]; PC=PC+4
[Figure: multicycle datapath for cycle 1]
IR=Mem[PC];
PC=PC+4
Cycle 2: A=Reg[IR25:21]; ALUOut= PC + sgn-ext(IR15:0) << 2
[Figure: multicycle datapath for cycle 2]
A=Reg[IR25:21];
B=Reg[IR20:16];
ALUOut = PC + sgn-ext(IR15:0) << 2
Cycle 3: R-format: ALUOut = A op B
[Figure: multicycle datapath for cycle 3, R-format]
ALUOut = A op B
Cycle 4 R-format: Reg[IR15:11] = ALUOut
[Figure: multicycle datapath for cycle 4, R-format]
Reg[IR15:11] = ALUOut
• How many times is the ALU used?
Cycle 3 beq: if (A==B) PC = ALUOut
[Figure: multicycle datapath for cycle 3, beq]
Cycle 3 lw: ALUOut = A + sgn-ext(IR15:0)
[Figure: multicycle datapath with control points for cycle 3, lw]
Control settings: ALUSrcA=1, ALUSrcB=2, ALUOp=0; RegDst=x, IorD=x, PCSrc=x, MemtoReg=x
(IRWrite, RegWrite, MemRead, MemWrite, PCWrite, PCWriteCond appear without a value)
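The settings ALUSrcA=1, ALUSrcB=2, ALUOp=0 can be read as operand selection followed by an add. The Python sketch below assumes the usual multiplexor encodings for this datapath (ALUSrcA: 0 selects PC, 1 selects A; ALUSrcB: 0 selects B, 1 selects the constant 4, 2 selects the sign-extended immediate, 3 selects it shifted left by 2) and that ALUOp=0 means add. Only the lw values come from the slide; the rest, including the function names, is an assumption.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_operands(alu_src_a, alu_src_b, pc, a, b, imm16):
    """Select the two ALU inputs (multiplexor encodings are assumed, see lead-in)."""
    ext = sign_extend16(imm16)
    op_a = a if alu_src_a == 1 else pc
    op_b = (b, 4, ext, ext << 2)[alu_src_b]
    return op_a, op_b


def lw_cycle3(a, imm16, pc=0, b=0):
    """ALUOut = A + sgn-ext(IR15:0), using ALUSrcA=1, ALUSrcB=2, ALUOp=0 (add)."""
    op_a, op_b = alu_operands(1, 2, pc, a, b, imm16)
    return (op_a + op_b) & MASK32


if __name__ == "__main__":
    print(hex(lw_cycle3(0x1000, 8)))       # base 0x1000, offset +8
    print(hex(lw_cycle3(0x1000, 0xFFFC)))  # base 0x1000, offset -4
```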
Cycle 4 lw: MDR = Mem[ALUOut]
[Figure: multicycle datapath with control points for cycle 4, lw]
Control settings: IorD=1; RegDst=x, PCSrc=x, ALUSrcA=x, ALUSrcB=x, MemtoReg=x, ALUOp=x
(IRWrite, RegWrite, MemRead, MemWrite, PCWrite, PCWriteCond appear without a value)
Cycle 5 lw: Reg[IR20:16] = MDR
[Figure: multicycle datapath with control points for cycle 5, lw]
Control settings: RegDst=0 (write register taken from IR20:16), MemtoReg=1; IorD=x, PCSrc=x, ALUSrcA=x, ALUSrcB=x, ALUOp=x
(IRWrite, RegWrite, MemRead, MemWrite, PCWrite, PCWriteCond appear without a value)
Cycle 4 (sw): Mem[ALUOut] = B
[Figure: multicycle datapath with control points for cycle 4, sw]
Control settings: IorD=1, MemWrite
(IRWrite, RegWrite, MemRead, RegDst, PCSrc, ALUSrcA, ALUSrcB, PCWrite, PCWriteCond, MemtoReg, ALUOp appear without a value)
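The per-cycle register transfers can be strung together into a small Python model that runs one instruction at a time and reports how many cycles it took. This is a sketch under stated assumptions, not the controller itself: the opcode and funct values are the standard MIPS encodings (not given on the slides), memory is a dictionary from byte address to word, and the names `run_instruction`, `cpu`, and `mem` are hypothetical. For lw, the destination is the register selected by IR20:16, matching RegDst=0.

```python
MASK32 = 0xFFFFFFFF
OP_RTYPE, OP_LW, OP_SW, OP_BEQ = 0, 35, 43, 4  # standard MIPS opcodes (assumed)
FUNCT = {  # standard MIPS funct codes (assumed): add, sub, and, or
    0x20: lambda a, b: a + b,
    0x22: lambda a, b: a - b,
    0x24: lambda a, b: a & b,
    0x25: lambda a, b: a | b,
}


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def run_instruction(cpu, mem):
    """Run one instruction step by step; return the number of cycles used.

    cpu is a dict with "PC" and "Reg" (list of 32 ints); mem maps byte address -> word.
    """
    reg = cpu["Reg"]
    # Cycle 1: IR = Mem[PC]; PC = PC + 4
    ir = mem[cpu["PC"]]
    cpu["PC"] = (cpu["PC"] + 4) & MASK32
    # Cycle 2: A = Reg[IR25:21]; B = Reg[IR20:16]; ALUOut = PC + sgn-ext(IR15:0) << 2
    op, rs, rt, rd = ir >> 26, (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    a, b = reg[rs], reg[rt]
    imm = sign_extend16(ir)
    alu_out = (cpu["PC"] + (imm << 2)) & MASK32
    if op == OP_BEQ:
        # Cycle 3: if (A == B) PC = ALUOut
        if a == b:
            cpu["PC"] = alu_out
        return 3
    if op == OP_RTYPE:
        # Cycle 3: ALUOut = A op B
        alu_out = FUNCT[ir & 0x3F](a, b) & MASK32
        # Cycle 4: Reg[IR15:11] = ALUOut
        reg[rd] = alu_out
        return 4
    if op in (OP_LW, OP_SW):
        # Cycle 3: ALUOut = A + sgn-ext(IR15:0)
        alu_out = (a + imm) & MASK32
        if op == OP_SW:
            # Cycle 4: Mem[ALUOut] = B
            mem[alu_out] = b
            return 4
        # Cycle 4: MDR = Mem[ALUOut]
        mdr = mem[alu_out]
        # Cycle 5: Reg[IR20:16] = MDR
        reg[rt] = mdr
        return 5
    raise ValueError(f"unsupported opcode {op}")


if __name__ == "__main__":
    mem = {
        0: (OP_LW << 26) | (0 << 21) | (8 << 16) | 100,   # lw  $8, 100($0)
        4: (8 << 21) | (8 << 16) | (9 << 11) | 0x20,      # add $9, $8, $8
        8: (OP_SW << 26) | (0 << 21) | (9 << 16) | 104,   # sw  $9, 104($0)
        12: (OP_BEQ << 26) | 1,                           # beq $0, $0, +1
        100: 21,
    }
    cpu = {"PC": 0, "Reg": [0] * 32}
    print([run_instruction(cpu, mem) for _ in range(4)], cpu["PC"], mem[104])
```

The sketch does not special-case register 0 or model the control signals; it only mirrors the order of the register transfers.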