Single-Cycle MIPS - UNC Computer Science

COMP541
Datapaths II &
Single-Cycle MIPS
Montek Singh
Apr 2, 2012
Topics
• Complete the datapath
• Add control to it
• Create a full single-cycle MIPS!
• Reading
  – Chapter 7
  – Review MIPS assembly language
    – Chapter 6 of course textbook
    – Or, Patterson & Hennessy (inside flap)
Top-Level CPU (MIPS)
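[Figure: top-level block diagram. The MIPS processor (inputs clk, reset) sends pc[31:2] to the instruction memory and receives instr; it drives memwrite, dataadr, and writedata to the data memory (clocked by clk) and receives readdata.]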
Top-Level CPU: Verilog
module top(input clk, reset,
           output … ); // add signals here for debugging
  wire [31:0] pc, instr, readdata, writedata, dataadr;
  wire        memwrite;

  mips mips(clk, reset, pc, instr, memwrite, dataadr,
            writedata, readdata);          // processor
  imem imem(pc[31:2], instr);              // instr memory
  dmem dmem(clk, memwrite, dataadr, writedata,
            readdata);                     // data memory
endmodule
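The top-level module instantiates imem and dmem without showing them. A minimal sketch of what they might look like is given here, assuming 64-word memories, a combinational read, and a program image in a file called memfile.dat; the depth, the file name, and the internal names rom and ram are assumptions for illustration, not part of the slides. Only the port order is taken from the instantiations above.

```verilog
// Hypothetical instruction memory: 64 words, combinational read.
// Port order matches the instantiation imem(pc[31:2], instr).
module imem(input  [29:0] a,     // word address, i.e. pc[31:2]
            output [31:0] rd);
  reg [31:0] rom [0:63];         // depth of 64 words is an assumption

  // "memfile.dat" is a placeholder name for a hex program image
  initial $readmemh("memfile.dat", rom);

  assign rd = rom[a[5:0]];       // only the low 6 address bits are used
endmodule

// Hypothetical data memory: 64 words, combinational read, clocked write.
// Port order matches dmem(clk, memwrite, dataadr, writedata, readdata).
module dmem(input         clk, we,
            input  [31:0] a, wd,
            output [31:0] rd);
  reg [31:0] ram [0:63];         // depth of 64 words is an assumption

  assign rd = ram[a[7:2]];       // byte address -> word index

  always @(posedge clk)
    if (we) ram[a[7:2]] <= wd;   // write on the rising clock edge
endmodule
```

Note that imem receives a word address (pc[31:2]) while dmem receives a byte address and drops the two low bits itself.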
Top Level Schematic (ISE)
[Figure: ISE schematic of the top level, showing the imem, MIPS, and dmem blocks.]
One level down: Inside MIPS
module mips(input         clk, reset,
            output [31:0] pc,
            input  [31:0] instr,
            output        memwrite,
            output [31:0] aluout, writedata,
            input  [31:0] readdata);

  wire       memtoreg, branch, pcsrc,
             alusrc, regdst, regwrite, jump;
  wire [4:0] alucontrol;   // depends on your ALU
  wire [3:0] flags;        // flags = {Z, V, C, N}

  controller c(instr[31:26], instr[5:0], flags,
               memtoreg, memwrite, pcsrc,
               alusrc, regdst, regwrite, jump,
               alucontrol);
  datapath dp(clk, reset, memtoreg, pcsrc,
              alusrc, regdst, regwrite, jump,
              alucontrol,
              flags, pc, instr,
              aluout, writedata, readdata);
endmodule
A Note on Flags
• Book’s design only uses Z (zero)
  – simple version of MIPS
  – allows beq, bne, slt type of tests
• Our design uses { Z, V, C, N } flags
  – Z = zero
  – V = overflow
  – C = carry out
  – N = negative
• Allows richer variety of instructions
  – see next slide
  – wherever you see “zero” in these slides, it should probably read “flags”
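A Note on Flags
• 4 flags produced by ALU:
  – Z (zero): result is = 0
    – big NOR gate
  – N (negative): result is < 0
    – $S_{N-1}$, the most significant bit of the sum
  – C (carry): indicates that the most significant position produced a carry, e.g., “1 + (-1)”
    – carry from last FA
  – V (overflow): indicates answer doesn’t fit
    – precisely: $V = A_{i-1} B_{i-1} \overline{N} + \overline{A_{i-1}}\,\overline{B_{i-1}}\,N$ (here $i-1$ indexes the most significant bit and $N$ is the negative flag)
    – or: $V = C_{out,\,i-1} \oplus C_{in,\,i-1}$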
To compare A and B, perform A - B and use condition codes:

Signed comparison:
  LT    N⊕V
  LE    Z+(N⊕V)
  EQ    Z
  NE    ~Z
  GE    ~(N⊕V)
  GT    ~(Z+(N⊕V))

Unsigned comparison:
  LTU   C
  LEU   C+Z
  GEU   ~C
  GTU   ~(C+Z)
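The comparison conditions above translate almost directly into continuous assignments. The sketch below assumes the flags were produced by computing A - B, that they are packed as {Z, V, C, N}, and that the signed less-than condition is N XOR V. The module name cmpflags and its output names are hypothetical. It follows the table literally, so C is treated as "set when A < B unsigned"; if your ALU's C is the raw adder carry-out of A + ~B + 1, the four unsigned conditions would need to be inverted.

```verilog
// Hypothetical helper: decodes comparison outcomes from the flags of A - B.
module cmpflags(input  [3:0] flags,              // {Z, V, C, N}
                output       lt, le, eq, ne, ge, gt,
                output       ltu, leu, geu, gtu);
  wire z, v, c, n;
  assign {z, v, c, n} = flags;

  // Signed comparisons
  assign lt  = n ^ v;
  assign le  = z | (n ^ v);
  assign eq  = z;
  assign ne  = ~z;
  assign ge  = ~(n ^ v);
  assign gt  = ~(z | (n ^ v));

  // Unsigned comparisons (C taken exactly as in the table)
  assign ltu = c;
  assign leu = c | z;
  assign geu = ~c;
  assign gtu = ~(c | z);
endmodule
```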
Datapath
[Figure: datapath block with a flags(3:0) output.]
MIPS State Elements
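[Figure: the MIPS state elements. The 32-bit PC register (input PC', output PC, clocked by CLK); the instruction memory (32-bit address A, 32-bit read data RD); the register file (5-bit addresses A1, A2, A3; 32-bit read ports RD1, RD2; 32-bit write data WD3; write enable WE3; clocked by CLK); and the data memory (32-bit address A, read data RD, write data WD; write enable WE; clocked by CLK).]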
We’ll fill out the datapath and control logic for basic single cycle MIPS
• first the datapath
• then the control logic
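As a concrete reference for one of the state elements, here is a minimal sketch of a three-ported register file using the port names from the figure (A1, A2, A3, WD3, WE3, RD1, RD2). The module name regfile and the choice to force register 0 to read as zero in the read logic are assumptions; your lab design may differ.

```verilog
// Hypothetical register file: two combinational read ports,
// one write port that updates on the rising clock edge.
module regfile(input         clk,
               input         we3,          // write enable
               input  [4:0]  a1, a2, a3,   // read, read, write addresses
               input  [31:0] wd3,          // write data
               output [31:0] rd1, rd2);
  reg [31:0] rf [0:31];

  always @(posedge clk)
    if (we3) rf[a3] <= wd3;

  // MIPS register 0 always reads as zero
  assign rd1 = (a1 != 5'd0) ? rf[a1] : 32'b0;
  assign rd2 = (a2 != 5'd0) ? rf[a2] : 32'b0;
endmodule
```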
Single-Cycle Datapath: lw
• Let’s start by implementing the lw instruction
Single-Cycle Datapath: lw
• First consider executing lw
• How does lw work?
• STEP 1: Fetch instruction
[Figure: lw datapath, step 1. PC drives the instruction memory address A; the instruction memory output RD is Instr. The register file and data memory are not yet connected.]
Single-Cycle Datapath: lw
• STEP 2: Read source operands from register file
[Figure: lw datapath, step 2. Instr[25:21] drives register file address A1.]
Single-Cycle Datapath: lw
• STEP 3: Sign-extend the immediate
[Figure: lw datapath, step 3. Instr[15:0] passes through the Sign Extend unit to produce SignImm.]
Single-Cycle Datapath: lw
• STEP 4: Compute the memory address
[Figure: lw datapath, step 4. The ALU takes SrcA (RD1) and SrcB (SignImm) with ALUControl2:0 = 010 (note the control signal) and produces ALUResult, which drives the data memory address A. The ALU also outputs Zero.]
Single-Cycle Datapath: lw
• STEP 5: Read data from memory and write it back to register file
[Figure: lw datapath, step 5. ReadData from the data memory is routed to WD3; Instr[20:16] drives A3; RegWrite = 1 drives WE3; ALUControl2:0 = 010.]
Single-Cycle Datapath: lw
• STEP 6: Determine the address of the next instruction
[Figure: lw datapath, step 6. An adder computes PCPlus4 = PC + 4, which feeds PC'. RegWrite = 1, ALUControl2:0 = 010.]
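The six lw steps can be summarized as plain wiring. The sketch below covers only the combinational connections for lw, before any control muxes are introduced; the module name lw_path and its port list are hypothetical, while the signal names follow the slide labels. The memories and register file are assumed to live outside this module.

```verilog
// Hypothetical module: the combinational wiring of lw only.
module lw_path(input  [31:0] pc,         // current PC
               input  [31:0] instr,      // RD of instruction memory
               input  [31:0] rd1,        // register file read port 1
               input  [31:0] readdata,   // RD of data memory
               output [4:0]  a1, a3,     // register file addresses
               output [31:0] signimm,
               output [31:0] aluresult,  // drives data memory address A
               output [31:0] wd3,        // register file write data
               output [31:0] pcplus4);   // next PC
  assign a1        = instr[25:21];                    // STEP 2: base register
  assign signimm   = {{16{instr[15]}}, instr[15:0]};  // STEP 3: sign-extend
  assign aluresult = rd1 + signimm;                   // STEP 4: ALUControl = 010 (add)
  assign a3        = instr[20:16];                    // STEP 5: destination register
  assign wd3       = readdata;                        // STEP 5: write back loaded word
  assign pcplus4   = pc + 32'd4;                      // STEP 6: next instruction
endmodule
```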
Let’s be Clear: CPU is Single-Cycle!
• Although the slides said “STEP” …
  – … all that stuff is executed in one cycle!!!
• Let’s look at sw next …
  – … and then R-type instructions
Single-Cycle Datapath: sw
• Write data in rt to memory
  – nothing is written back into the register file
[Figure: datapath extended for sw. Instr[20:16] drives A2; RD2 becomes WriteData on the data memory WD port; RegWrite = 0, ALUControl2:0 = 010, MemWrite = 1.]
Single-Cycle Datapath: R-type instr
• R-Type instructions:
  – Read from rs and rt
  – Write ALUResult to register file
  – Write to rd (instead of rt)
[Figure: datapath extended for R-type instructions with three muxes. ALUSrc selects SrcB between RD2 (0) and SignImm (1); RegDst selects WriteReg4:0 between Instr[20:16] (0) and Instr[15:11] (1); MemtoReg selects Result between ALUResult (0) and ReadData (1). For R-type: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 varies, MemWrite = 0, MemtoReg = 0.]
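The three muxes that R-type support adds to the datapath can be sketched as follows. The module name datapath_muxes and its port list are hypothetical; the select polarities (0 or 1 inputs) follow the mux labels in the figure.

```verilog
// Hypothetical module: the ALUSrc, RegDst, and MemtoReg muxes.
module datapath_muxes(input         alusrc, regdst, memtoreg,
                      input  [31:0] instr,
                      input  [31:0] rd2, signimm,
                      input  [31:0] aluresult, readdata,
                      output [31:0] srcb,
                      output [4:0]  writereg,
                      output [31:0] result);
  // ALU second operand: register (R-type, beq) or immediate (lw, sw)
  assign srcb     = alusrc   ? signimm      : rd2;
  // Destination register: rd for R-type, rt for lw
  assign writereg = regdst   ? instr[15:11] : instr[20:16];
  // Value written back: loaded word for lw, ALU result otherwise
  assign result   = memtoreg ? readdata     : aluresult;
endmodule
```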
Single-Cycle Datapath: beq
• Determine whether values in rs and rt are equal
• Calculate branch target address:
  BTA = (sign-extended immediate << 2) + (PC+4)
[Figure: datapath extended for beq. SignImm is shifted left by 2 and added to PCPlus4 to form PCBranch; PCSrc selects PC' between PCPlus4 (0) and PCBranch (1). For beq: RegWrite = 0, RegDst = x, ALUSrc = 0, ALUControl2:0 = 110, Branch = 1, MemWrite = 0, MemtoReg = x, PCSrc = 1 when the branch is taken.]
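The branch target formula, BTA = (sign-extended immediate << 2) + (PC+4), can be written as a small combinational block. The module name branchtarget is hypothetical; pcplus4 and pcbranch follow the slide labels.

```verilog
// Hypothetical module: branch target address for beq.
module branchtarget(input  [31:0] pcplus4,   // PC + 4
                    input  [15:0] imm,       // instr[15:0]
                    output [31:0] pcbranch);
  wire [31:0] signimm = {{16{imm[15]}}, imm};     // sign-extend to 32 bits

  // Shift left by 2 (word offset -> byte offset), then add to PC + 4
  assign pcbranch = pcplus4 + {signimm[29:0], 2'b00};
endmodule
```

Because the immediate is sign-extended before the shift, a negative offset produces a backward branch.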
Complete Single-Cycle Processor (w/control)
[Figure: complete single-cycle processor. The control unit takes Op (Instr[31:26]) and Funct (Instr[5:0]) and produces MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, and RegWrite; PCSrc is formed from Branch and the ALU's Zero output and selects PC' between PCPlus4 (0) and PCBranch (1).]
Note: Difference due to Flags
• Our Control Unit will be slightly different
  – … because of the extra flags
• All flags (Z, V, C, N) are inputs to the control unit
• Signals such as PCSrc are produced inside the control unit
Control Unit
• Generally as shown below
  – but some differences because our ALU is more sophisticated
[Figure: control unit. The main decoder takes Opcode5:0 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0; the ALU decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0; flags[3:0] enter the control unit, which also produces PCSrc.]
• Note: This will be different for our full-feature ALU!
• Note: ALUControl2:0 will be 5 bits for our full-feature ALU!
Review: Lightweight ALU from book
[Figure: ALU symbol with N-bit inputs A and B, a 3-bit function select F, and an N-bit output Y.]

F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT
Review: Lightweight ALU from book
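[Figure: internal structure of the lightweight ALU. F2 controls a 2:1 mux that selects B or ~B; an N-bit adder produces the sum S and Cout; bit [N-1] of S is zero-extended for SLT; a 4:1 mux controlled by F1:0 selects the N-bit output Y. The function table is the same as on the previous slide.]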
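A behavioral sketch of the book's lightweight ALU, following the function table, is shown here. The module name alu_light, the parameter N, and the internal names bb and s are assumptions. SLT is taken from the sign bit of A - B and therefore ignores overflow, and the unused code 011 simply falls through to the OR case in this sketch.

```verilog
// Hypothetical behavioral model of the lightweight ALU.
module alu_light #(parameter N = 32)
                 (input      [N-1:0] a, b,
                  input      [2:0]   f,
                  output reg [N-1:0] y,
                  output             zero);
  wire [N-1:0] bb = f[2] ? ~b : b;     // f[2] selects B or ~B
  wire [N-1:0] s  = a + bb + f[2];     // f[2] is also the carry-in, so 110 gives A - B

  always @(*)
    case (f[1:0])
      2'b00: y = a & bb;                       // 000: A & B,  100: A & ~B
      2'b01: y = a | bb;                       // 001: A | B,  101: A | ~B
      2'b10: y = s;                            // 010: A + B,  110: A - B
      2'b11: y = {{(N-1){1'b0}}, s[N-1]};      // 111: SLT (sign bit of A - B)
    endcase

  assign zero = (y == {N{1'b0}});      // Z flag: result is all zeros
endmodule
```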
Review: Our “full feature” ALU
• Full-feature ALU from COMP411:
[Figure: full-feature ALU. Inputs A and B and a 5-bit ALUFN (fields Sub, Bool, Shft, Math) feed a bidirectional barrel shifter, an add/sub unit, and a Boolean unit; muxes controlled by Shft and Math select the result R. The N, V, C, and Z flags are produced alongside R.]

Sub  Bool  Shft  Math  OP
0    XX    0     1     A+B
1    XX    0     1     A-B
X    X0    1     1     0
X    X1    1     1     1
X    00    1     0     B<<A
X    10    1     0     B>>A
X    11    1     0     B>>>A
X    00    0     0     A & B
X    01    0     0     A | B
X    10    0     0     A ^ B
X    11    0     0     ~(A | B)
Review: R-Type instructions
• Register-type
• 3 register operands:
  – rs, rt: source registers
  – rd: destination register
• Other fields:
  – op: the operation code or opcode (0 for R-type instructions)
  – funct: the function
    – together, op and funct tell the computer which operation to perform
  – shamt: the shift amount for shift instructions, otherwise it is 0

R-Type
op      rs      rt      rd      shamt   funct
6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
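Since the six R-type fields add up to exactly 32 bits, they can be pulled out of an instruction word with a single concatenation on the left-hand side. The module name rtype_fields is hypothetical.

```verilog
// Hypothetical helper: splits a 32-bit instruction into R-type fields.
module rtype_fields(input  [31:0] instr,
                    output [5:0]  op,      // instr[31:26]
                    output [4:0]  rs,      // instr[25:21]
                    output [4:0]  rt,      // instr[20:16]
                    output [4:0]  rd,      // instr[15:11]
                    output [4:0]  shamt,   // instr[10:6]
                    output [5:0]  funct);  // instr[5:0]
  // 6 + 5 + 5 + 5 + 5 + 6 = 32 bits
  assign {op, rs, rt, rd, shamt, funct} = instr;
endmodule
```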
Controller (2 modules)
module controller(input  [5:0] op, funct,
                  input  [3:0] flags,
                  output       memtoreg, memwrite,
                  output       pcsrc, alusrc,
                  output       regdst, regwrite,
                  output       jump,
                  output [2:0] alucontrol); // 5 bits for our ALU!!
  wire [1:0] aluop;
  wire       branch;

  // This will be different for our ALU
  maindec md(op, memtoreg, memwrite, branch,
             alusrc, regdst, regwrite, jump, aluop);
  aludec  ad(funct, aluop, alucontrol);

  assign pcsrc = branch & flags[3];   // flags = {Z, V, C, N}
endmodule
Main Decoder
module maindec(input  [5:0] op,
               output       memtoreg, memwrite, branch, alusrc,
               output       regdst, regwrite, jump,
               output [1:0] aluop);   // different for our ALU
  reg [8:0] controls;

  assign {regwrite, regdst, alusrc,
          branch, memwrite,
          memtoreg, jump, aluop} = controls;

  always @(*)
    case(op)
      6'b000000: controls <= 9'b110000010; //Rtype
      6'b100011: controls <= 9'b101001000; //LW
      6'b101011: controls <= 9'b001010000; //SW
      6'b000100: controls <= 9'b000100001; //BEQ
      6'b001000: controls <= 9'b101000000; //ADDI
      6'b000010: controls <= 9'b000000100; //J
      default:   controls <= 9'bxxxxxxxxx; //???
    endcase
endmodule

Why do this?
This entire coding may be different in our design
ALU Decoder
module aludec(input      [5:0] funct,
              input      [1:0] aluop,
              output reg [2:0] alucontrol); // 5 bits for our ALU!!
  always @(*)
    case(aluop)
      2'b00: alucontrol <= 3'b010;  // add
      2'b01: alucontrol <= 3'b110;  // sub
      default: case(funct)          // RTYPE
          6'b100000: alucontrol <= 3'b010; // ADD
          6'b100010: alucontrol <= 3'b110; // SUB
          6'b100100: alucontrol <= 3'b000; // AND
          6'b100101: alucontrol <= 3'b001; // OR
          6'b101010: alucontrol <= 3'b111; // SLT
          default:   alucontrol <= 3'bxxx; // ???
        endcase
    endcase
endmodule

This entire coding will be different in our design
Control Unit: ALU Decoder
ALUOp1:0  Meaning
00        Add
01        Subtract
10        Look at Funct
11        Not Used

This entire coding will be different in our design

ALUOp1:0  Funct          ALUControl2:0
00        X              010 (Add)
X1        X              110 (Subtract)
1X        100000 (add)   010 (Add)
1X        100010 (sub)   110 (Subtract)
1X        100100 (and)   000 (And)
1X        100101 (or)    001 (Or)
1X        101010 (slt)   111 (SLT)
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         …
lw           100011  1         0       1       0       0         1         …
sw           101011  0         X       1       0       1         X         …
beq          000100  0         X       0       1       0         X         …

[Figure: complete single-cycle processor with control unit, as on the earlier slide.]
Note on controller
• The actual number and names of control signals may be somewhat different in our/your design
  – compared to the one given in the book
  – because we are implementing more features/instructions
• SO BE VERY CAREFUL WHEN YOU DESIGN YOUR CPU!
Single-Cycle Datapath Example: or
[Figure: complete single-cycle processor executing or. Control values: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 = 001, Branch = 0, MemWrite = 0, MemtoReg = 0, PCSrc = 0.]
Extended Functionality: addi
• No change to datapath
[Figure: complete single-cycle processor with control unit, unchanged from the earlier slide.]
Control Unit: addi
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0
R-type       000000  1         1       0       0       0         0         …
lw           100011  1         0       1       0       0         1         …
sw           101011  0         X       1       0       1         X         …
beq          000100  0         X       0       1       0         X         …
addi         001000  1         0       1       0       0         0         …

[Figure: complete single-cycle processor with control unit, unchanged from the earlier slide.]
Adding Jumps: j
[Figure: datapath extended for j. Instr[25:0] is shifted left by 2 to form bits 27:0 of PCJump, and PCPlus4[31:28] supplies bits 31:28. A second mux, controlled by the new Jump output of the control unit, selects PC' between the output of the PCSrc mux (0) and PCJump (1).]
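The jump address formation and the extra PC mux can be sketched as follows. The module name jumpmux and the input pcnextbr (standing for the output of the existing PCSrc mux) are hypothetical names; pcplus4, pcjump, and jump follow the slide labels.

```verilog
// Hypothetical module: forms PCJump and selects the next PC.
module jumpmux(input  [31:0] pcplus4,    // PC + 4
               input  [31:0] pcnextbr,   // output of the PCSrc (branch) mux
               input  [25:0] target,     // instr[25:0]
               input         jump,
               output [31:0] pcnext);    // PC'
  // Upper 4 bits come from PC + 4; the 26-bit target is shifted left by 2
  wire [31:0] pcjump = {pcplus4[31:28], target, 2'b00};

  assign pcnext = jump ? pcjump : pcnextbr;
endmodule
```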
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  AluSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         …         0
lw           100011  1         0       1       0       0         1         …         0
sw           101011  0         X       1       0       1         X         …         0
beq          000100  0         X       0       1       0         X         …         0
j            000010  0         X       X       X       0         X         XX        1

[Figure: complete single-cycle processor with the jump mux, executing j with Jump = 1.]
Review: Processor Performance
Program Execution Time
= (# instructions)(cycles/instruction)(seconds/cycle)
= # instructions × CPI × $T_c$
Single-Cycle Performance
• $T_c$ is limited by the critical path (lw)
[Figure: complete single-cycle processor with the lw critical path highlighted, running from the PC through the instruction memory, register file, ALU, and data memory, and back to the register file write port.]
Single-Cycle Performance
• Single-cycle critical path:
  – $T_c = t_{pcq\_PC} + t_{mem} + \max(t_{RFread},\ t_{sext} + t_{mux}) + t_{ALU} + t_{mem} + t_{mux} + t_{RFsetup}$
• In most implementations, limiting paths are:
  – memory, ALU, register file.
  – $T_c = t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{ALU} + t_{RFsetup} + t_{mux}$
[Figure: the same processor diagram, again with the lw critical path highlighted.]
Single-Cycle Performance Example
$T_c = t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{mux} + t_{ALU} + t_{RFsetup}$
$= [30 + 2(250) + 150 + 25 + 200 + 20]\ \text{ps}$
$= 925\ \text{ps}$
What’s the max clock frequency?
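One way to answer the question, assuming the clock period can be set exactly equal to the 925 ps critical path with no extra margin, is to take the reciprocal of the cycle time:

```latex
\[
f_{\max} = \frac{1}{T_c}
         = \frac{1}{925 \times 10^{-12}\ \text{s}}
         \approx 1.08 \times 10^{9}\ \text{Hz}
         = 1.08\ \text{GHz}
\]
```

In practice the usable frequency would be somewhat lower once clock skew and design margin are accounted for.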
Single-Cycle Performance Example
• For a program with 100 billion instructions executing on a single-cycle MIPS processor,
• Execution Time
  = # instructions × CPI × $T_c$
  = $(100 \times 10^{9})(1)(925 \times 10^{-12}\ \text{s})$
  = 92.5 seconds
Next Time
• Next class:
  – We’ll look at multi-cycle MIPS
  – Adding functionality to our design
• Next lab:
  – Implement single-cycle CPU!