COMP541
Datapaths II &
Single-Cycle MIPS
Montek Singh
Apr 2, 2012
Topics
Complete the datapath
Add control to it
Create a full single-cycle MIPS!
Reading
Chapter 7
Review MIPS assembly language
Chapter 6 of course textbook
Or, Patterson Hennessy (inside flap)
Top-Level CPU (MIPS)
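[Figure: top-level block diagram. The MIPS processor takes clk and reset, sends pc[31:2] to the instruction memory and receives instr, and connects to the data memory through memwrite, dataadr, writedata, and readdata.]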
Top-Level CPU: Verilog
module top(input  clk, reset,
           output … ); // add signals here for debugging

  wire [31:0] pc, instr, readdata, writedata, dataadr;
  wire        memwrite;

  mips mips(clk, reset, pc, instr, memwrite, dataadr,
            writedata, readdata);              // processor
  imem imem(pc[31:2], instr);                  // instr memory
  dmem dmem(clk, memwrite, dataadr, writedata,
            readdata);                         // data memory
endmodule
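The two memories instantiated by the top-level module could look roughly like the sketch below. This is only an illustration: the 64-word depth, the `RAM` array name, and the port names `a`, `rd`, `we`, and `wd` are assumptions, not taken from the slides, and loading a program into the instruction memory is left out.

```verilog
// Hypothetical instruction memory: combinational read, word addressed.
module imem(input  [29:0] a,       // word address, i.e. pc[31:2]
            output [31:0] rd);
  reg [31:0] RAM[63:0];            // 64 words (assumed size)
  assign rd = RAM[a[5:0]];         // only the low 6 address bits are used
endmodule

// Hypothetical data memory: combinational read, write on the rising clock edge.
module dmem(input         clk, we,
            input  [31:0] a, wd,   // byte address and write data
            output [31:0] rd);
  reg [31:0] RAM[63:0];            // 64 words (assumed size)
  assign rd = RAM[a[7:2]];         // drop the two byte-offset bits
  always @(posedge clk)
    if (we) RAM[a[7:2]] <= wd;
endmodule
```

The port order matches the positional connections used in `top`, so the sketch drops in without changing the instantiations.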
Top Level Schematic (ISE)
[Figure: ISE top-level schematic showing the imem, MIPS, and dmem blocks.]
One level down: Inside MIPS
module mips(input         clk, reset,
            output [31:0] pc,
            input  [31:0] instr,
            output        memwrite,
            output [31:0] aluout, writedata,
            input  [31:0] readdata);

  wire       memtoreg, branch, pcsrc,
             alusrc, regdst, regwrite, jump;
  wire [4:0] alucontrol;   // depends on your ALU
  wire [3:0] flags;        // flags = {Z, V, C, N}

  controller c(instr[31:26], instr[5:0], flags,
               memtoreg, memwrite, pcsrc,
               alusrc, regdst, regwrite, jump,
               alucontrol);
  datapath dp(clk, reset, memtoreg, pcsrc,
              alusrc, regdst, regwrite, jump,
              alucontrol,
              flags, pc, instr,
              aluout, writedata, readdata);
endmodule
A Note on Flags
Book’s design only uses Z (zero)
simple version of MIPS
allows beq, bne, slt type of tests
Our design uses { Z, V, C, N } flags
Z = zero
V = overflow
C = carry out
N = negative
Allows richer variety of instructions
see next slide
wherever you see “zero” in these slides, it should probably read
“flags”
A Note on Flags
4 flags produced by ALU:
Z (zero): result is = 0
   big NOR gate
N (negative): result is < 0
   S[N-1], the sign bit of the result
C (carry): indicates that the most significant position produced a carry, e.g., “1 + (-1)”
   Carry from last FA
V (overflow): indicates answer doesn’t fit
   precisely: V = A[i-1]·B[i-1]·~N + ~A[i-1]·~B[i-1]·N   (i-1 is the most significant bit position)
   -or- V = (carry into the most significant bit) ⊕ (carry out of the most significant bit)

To compare A and B, perform A–B and use condition codes:

Signed comparison:
   LT    N⊕V
   LE    Z+(N⊕V)
   EQ    Z
   NE    ~Z
   GE    ~(N⊕V)
   GT    ~(Z+(N⊕V))

Unsigned comparison:
   LTU   C
   LEU   C+Z
   GEU   ~C
   GTU   ~(C+Z)
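As a rough illustration, the condition-code table can be written directly as combinational logic. The module name `compare` and its output names are made up for this sketch. The unsigned rows follow the table as written, which treats C as set when A < B (a borrow); an adder that computes A–B as A + ~B + 1 produces the opposite carry-out polarity, so check which convention your ALU uses.

```verilog
// Sketch: decode comparison results from the flags produced by A - B.
module compare(input  [3:0] flags,             // {Z, V, C, N}
               output lt, le, eq, ne, ge, gt,  // signed comparisons
               output ltu, leu, geu, gtu);     // unsigned comparisons
  wire z = flags[3], v = flags[2], c = flags[1], n = flags[0];
  wire nxv = n ^ v;            // N xor V: true sign of A - B

  assign lt  = nxv;
  assign le  = z | nxv;
  assign eq  = z;
  assign ne  = ~z;
  assign ge  = ~nxv;
  assign gt  = ~(z | nxv);

  // Assumption: C = 1 means a borrow occurred (A < B), as in the table.
  assign ltu = c;
  assign leu = c | z;
  assign geu = ~c;
  assign gtu = ~(c | z);
endmodule
```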
Datapath
MIPS State Elements
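[Figure: the MIPS state elements. The 32-bit program counter (input PC', output PC); the instruction memory (address A, read data RD); the register file (5-bit addresses A1, A2, A3; 32-bit read ports RD1, RD2; write data WD3 with write enable WE3); and the data memory (address A, read data RD, write data WD, write enable WE). The PC, register file, and data memory are clocked by CLK.]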
We’ll fill out the datapath and control logic for basic single cycle MIPS
• first the datapath
• then the control logic
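For reference, a register file with the ports shown in the state-elements figure might be sketched as follows. The port names `ra1`, `ra2`, `wa3`, `wd3`, `rd1`, `rd2` and the array name `rf` are assumptions for this sketch; it also assumes register 0 always reads as zero, as MIPS requires.

```verilog
// Sketch of a three-ported register file: two combinational read ports,
// one write port that updates on the rising clock edge.
module regfile(input         clk,
               input         we3,            // write enable (WE3)
               input  [4:0]  ra1, ra2, wa3,  // addresses A1, A2, A3
               input  [31:0] wd3,            // write data WD3
               output [31:0] rd1, rd2);      // read data RD1, RD2
  reg [31:0] rf[31:0];

  always @(posedge clk)
    if (we3) rf[wa3] <= wd3;

  // Register 0 is hardwired to zero.
  assign rd1 = (ra1 != 5'b0) ? rf[ra1] : 32'b0;
  assign rd2 = (ra2 != 5'b0) ? rf[ra2] : 32'b0;
endmodule
```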
Single-Cycle Datapath: lw
Let’s start by implementing the lw instruction
Single-Cycle Datapath: lw
First consider executing lw
How does lw work?
STEP 1: Fetch instruction
[Figure: lw datapath, step 1. The PC drives the instruction memory address A; the memory's read data RD is the instruction Instr.]
Single-Cycle Datapath: lw
STEP 2: Read source operands from register file
[Figure: lw datapath, step 2. Instr[25:21] (rs) drives register file address A1; the base register value appears on RD1.]
Single-Cycle Datapath: lw
STEP 3: Sign-extend the immediate
[Figure: lw datapath, step 3. A Sign Extend unit converts Instr[15:0] into the 32-bit value SignImm.]
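Sign extension is just wiring. A minimal sketch follows; the module name `signext` and the port names `a` and `y` are chosen for illustration.

```verilog
// Sketch: sign-extend a 16-bit immediate to 32 bits by
// replicating the sign bit (bit 15) into the upper half.
module signext(input  [15:0] a,
               output [31:0] y);
  assign y = {{16{a[15]}}, a};
endmodule
```

For example, an immediate of 16'hFFFC (-4) becomes 32'hFFFFFFFC, and 16'h0004 becomes 32'h00000004.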
Single-Cycle Datapath: lw
STEP 4: Compute the memory address
Note the control signal: ALUControl2:0 = 010 (add)
[Figure: lw datapath, step 4. The ALU adds SrcA (RD1) and SrcB (SignImm); ALUResult drives the data memory address A.]
Single-Cycle Datapath: lw
STEP 5: Read data from memory and write it back to the register file
[Figure: lw datapath, step 5. ReadData from the data memory goes to WD3; Instr[20:16] (rt) drives A3; RegWrite = 1 enables the write; ALUControl2:0 = 010.]
Single-Cycle Datapath: lw
STEP 6: Determine the address of the next instruction
[Figure: lw datapath, step 6. An adder computes PCPlus4 = PC + 4, which feeds PC'.]
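The next-PC logic for straight-line code could be sketched as a register plus an adder. The module name `pcunit`, the asynchronous reset, and the reset value of 0 are assumptions for this sketch, not something the slides specify.

```verilog
// Sketch: program counter that advances by 4 every cycle.
module pcunit(input             clk, reset,
              output reg [31:0] pc);
  wire [31:0] pcplus4 = pc + 32'd4;   // PCPlus4 in the figure

  always @(posedge clk or posedge reset)
    if (reset) pc <= 32'b0;           // assumed reset address
    else       pc <= pcplus4;         // PC' = PC + 4
endmodule
```

Branches and jumps later replace the direct `pcplus4` connection with multiplexers in front of the register.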
Let’s be Clear: CPU is Single-Cycle!
Although the slides said “STEP” …
… all that stuff is executed in one cycle!!!
Let’s look at sw next …
… and then R-type instructions
Single-Cycle Datapath: sw
Write data in rt to memory
nothing is written back into the register file
[Figure: sw datapath. Instr[20:16] (rt) drives A2; RD2 becomes WriteData and goes to the data memory input WD; ALUResult supplies the address. Control values: RegWrite = 0, ALUControl2:0 = 010, MemWrite = 1.]
Single-Cycle Datapath: R-type instr
R-Type instructions:
Read from rs and rt
Write ALUResult to register file
Write to rd (instead of rt)
[Figure: R-type datapath. A mux controlled by ALUSrc selects SrcB from RD2 or SignImm; a mux controlled by RegDst selects WriteReg4:0 from Instr[20:16] or Instr[15:11]; a mux controlled by MemtoReg selects Result from ALUResult or ReadData. Control values: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 = varies, MemWrite = 0, MemtoReg = 0.]
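The multiplexers added for R-type instructions can be sketched with a small parameterized mux. The module names `mux2` and `rtype_muxes` and the instance names are hypothetical, chosen only for this illustration.

```verilog
// Sketch: generic 2-input multiplexer.
module mux2 #(parameter WIDTH = 8)
             (input  [WIDTH-1:0] d0, d1,
              input              s,
              output [WIDTH-1:0] y);
  assign y = s ? d1 : d0;
endmodule

// Sketch: the two selections described above for R-type instructions.
module rtype_muxes(input  [31:0] instr, writedata, signimm,
                   input         regdst, alusrc,
                   output [4:0]  writereg,
                   output [31:0] srcb);
  // RegDst = 1 picks rd (Instr[15:11]); RegDst = 0 picks rt (Instr[20:16]).
  mux2 #(5)  wrmux(instr[20:16], instr[15:11], regdst, writereg);
  // ALUSrc = 0 picks the register value; ALUSrc = 1 picks the immediate.
  mux2 #(32) srcbmux(writedata, signimm, alusrc, srcb);
endmodule
```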
Single-Cycle Datapath: beq
Determine whether values in rs and rt are equal
Calculate branch target address:
BTA = (sign-extended immediate << 2) + (PC+4)
[Figure: beq datapath. SignImm is shifted left by 2 and added to PCPlus4 to form PCBranch; a mux controlled by PCSrc selects PC' from PCPlus4 or PCBranch. Control values: RegWrite = 0, RegDst = x, ALUSrc = 0, ALUControl2:0 = 110, Branch = 1, MemWrite = 0, MemtoReg = x.]
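The branch target calculation above maps to a shift, an adder, and a mux. The sketch below uses the hypothetical module name `branchunit` and output name `pcnext`; the other names follow the figure labels.

```verilog
// Sketch: BTA = (sign-extended immediate << 2) + (PC + 4).
module branchunit(input  [31:0] pcplus4, signimm,
                  input         pcsrc,
                  output [31:0] pcbranch, pcnext);
  // Shifting left by 2 is just wiring: drop the top two bits, append 00.
  assign pcbranch = pcplus4 + {signimm[29:0], 2'b00};
  // PCSrc = 1 takes the branch; otherwise fall through to PC + 4.
  assign pcnext   = pcsrc ? pcbranch : pcplus4;
endmodule
```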
Complete Single-Cycle Processor (w/control)
[Figure: complete single-cycle datapath with the control unit. Op (Instr[31:26]) and Funct (Instr[5:0]) feed the control unit, which drives MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst, and RegWrite; PCSrc selects between PCPlus4 and PCBranch.]
Note: Difference due to Flags
Our Control Unit will be slightly different
… because of the extra flags
All flags (Z, V, C, N) are inputs to the control unit
Signals such as PCSrc are produced inside the control unit
Control Unit
Generally as shown below
but some differences because our ALU is more sophisticated
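[Figure: control unit. The main decoder takes Opcode5:0 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and ALUOp1:0. The ALU decoder takes ALUOp1:0 and Funct5:0 and produces ALUControl2:0. PCSrc is derived from Branch and flags[3:0].]
Note: The ALU decoding will be different for our full-feature ALU!
Note: ALUControl will be 5 bits for our full-feature ALU!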
Review: Lightweight ALU from book
[Figure: ALU symbol. N-bit inputs A and B, 3-bit function select F, N-bit output Y.]

F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT
Review: Lightweight ALU from book
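[Figure: internal structure of the ALU. F2 selects B or ~B through a 2-input mux and also supplies the adder's carry-in; the adder produces sum S and Cout; F1:0 selects output Y among AND, OR, the sum S, and the zero-extended sign bit S[N-1] (for SLT).]

F2:0   Function
000    A & B
001    A | B
010    A + B
011    not used
100    A & ~B
101    A | ~B
110    A - B
111    SLT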
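One way to express this ALU in Verilog is sketched below. The module name `alu_light` and the internal names `bb` and `s` are assumptions; the 32-bit width is fixed for simplicity. As in the table, SLT is taken from the sign bit of A - B, so it does not account for overflow.

```verilog
// Sketch of the lightweight 3-bit-function ALU.
module alu_light(input      [31:0] a, b,
                 input      [2:0]  f,
                 output reg [31:0] y,
                 output            zero);
  wire [31:0] bb = f[2] ? ~b : b;     // F2 inverts B
  wire [31:0] s  = a + bb + f[2];     // F2 is also the carry-in, so F2 = 1 gives A - B

  always @(*)
    case (f[1:0])
      2'b00: y = a & bb;              // 000: A & B,  100: A & ~B
      2'b01: y = a | bb;              // 001: A | B,  101: A | ~B
      2'b10: y = s;                   // 010: A + B,  110: A - B
      2'b11: y = {31'b0, s[31]};      // 111: SLT (011 is not used)
    endcase

  assign zero = (y == 32'b0);
endmodule
```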
Review: Our “full feature” ALU
Full-feature ALU from COMP411:
[Figure: full-feature ALU. Inputs A and B feed an Add/Sub unit, a bidirectional barrel shifter, and a Boolean unit; the 5-bit ALUFN (Sub, Bool, Shft, Math) selects the result R. The N, V, C, and Z flags are produced alongside R.]

Sub   Bool   Shft   Math   OP
0     XX     0      1      A+B
1     XX     0      1      A-B
X     X0     1      1      0
X     X1     1      1      1
X     00     1      0      B<<A
X     10     1      0      B>>A
X     11     1      0      B>>>A
X     00     0      0      A & B
X     01     0      0      A | B
X     10     0      0      A ^ B
X     11     0      0      ~(A | B)
Review: R-Type instructions
Register-type
3 register operands:
rs, rt: source registers
rd: destination register
Other fields:
op: the operation code or opcode (0 for R-type instructions)
funct: the function
– together, op and funct tell the computer which operation to perform
shamt: the shift amount for shift instructions, otherwise it is 0
R-Type

op       rs       rt       rd       shamt    funct
6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
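In Verilog the R-type fields are simply slices of the 32-bit instruction. A small sketch follows; the module name `rfields` is hypothetical.

```verilog
// Sketch: split a 32-bit R-type instruction into its six fields.
// The field widths (6 + 5 + 5 + 5 + 5 + 6) add up to 32 bits.
module rfields(input  [31:0] instr,
               output [5:0]  op,
               output [4:0]  rs, rt, rd, shamt,
               output [5:0]  funct);
  assign {op, rs, rt, rd, shamt, funct} = instr;
endmodule
```

This is equivalent to writing op = instr[31:26], rs = instr[25:21], rt = instr[20:16], rd = instr[15:11], shamt = instr[10:6], and funct = instr[5:0], which are the bit ranges that appear on the datapath figures.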
Controller (2 modules)
module controller(input  [5:0] op, funct,
                  input  [3:0] flags,
                  output       memtoreg, memwrite,
                  output       pcsrc, alusrc,
                  output       regdst, regwrite,
                  output       jump,
                  output [2:0] alucontrol); // 5 bits for our ALU!!

  wire [1:0] aluop;
  wire       branch;

  // This will be different for our ALU
  maindec md(op, memtoreg, memwrite, branch,
             alusrc, regdst, regwrite, jump, aluop);
  aludec  ad(funct, aluop, alucontrol);

  assign pcsrc = branch & flags[3];   // flags = {Z, V, C, N}
endmodule
Main Decoder
module maindec(input  [5:0] op,
               output       memtoreg, memwrite, branch, alusrc,
               output       regdst, regwrite, jump,
               output [1:0] aluop);   // different for our ALU

  reg [8:0] controls;

  assign {regwrite, regdst, alusrc,
          branch, memwrite,
          memtoreg, jump, aluop} = controls;

  always @(*)
    case(op)
      6'b000000: controls <= 9'b110000010; //Rtype
      6'b100011: controls <= 9'b101001000; //LW
      6'b101011: controls <= 9'b001010000; //SW
      6'b000100: controls <= 9'b000100001; //BEQ
      6'b001000: controls <= 9'b101000000; //ADDI
      6'b000010: controls <= 9'b000000100; //J
      default:   controls <= 9'bxxxxxxxxx; //???
    endcase
endmodule

Why do this?
Note: This entire coding may be different in our design
ALU Decoder
module aludec(input      [5:0] funct,
              input      [1:0] aluop,
              output reg [2:0] alucontrol); // 5 bits for our ALU!!

  always @(*)
    case(aluop)
      2'b00: alucontrol <= 3'b010;  // add
      2'b01: alucontrol <= 3'b110;  // sub
      default: case(funct)          // RTYPE
          6'b100000: alucontrol <= 3'b010; // ADD
          6'b100010: alucontrol <= 3'b110; // SUB
          6'b100100: alucontrol <= 3'b000; // AND
          6'b100101: alucontrol <= 3'b001; // OR
          6'b101010: alucontrol <= 3'b111; // SLT
          default:   alucontrol <= 3'bxxx; // ???
        endcase
    endcase
endmodule

Note: This entire coding will be different in our design
Control Unit: ALU Decoder
ALUOp1:0   Meaning
00         Add
01         Subtract
10         Look at Funct
11         Not Used

Note: This entire coding will be different in our design

ALUOp1:0   Funct          ALUControl2:0
00         X              010 (Add)
X1         X              110 (Subtract)
1X         100000 (add)   010 (Add)
1X         100010 (sub)   110 (Subtract)
1X         100100 (and)   000 (And)
1X         100101 (or)    001 (Or)
1X         101010 (slt)   111 (SLT)
Control Unit: Main Decoder
Instruction   Op5:0    RegWrite   RegDst   AluSrc   Branch   MemWrite   MemtoReg   ALUOp1:0
R-type        000000   1          1        0        0        0          0          …
lw            100011   1          0        1        0        0          1          …
sw            101011   0          X        1        0        1          X          …
beq           000100   0          X        0        1        0          X          …

[Figure: complete single-cycle datapath with the control unit, repeated for reference.]
Note on controller
The actual number and names of control signals may be somewhat different in our/your design compared to the one given in the book
because we are implementing more features/instructions
SO BE VERY CAREFUL WHEN YOU DESIGN YOUR CPU!
Single-Cycle Datapath Example: or
[Figure: the complete single-cycle datapath executing or. Control values: RegWrite = 1, RegDst = 1, ALUSrc = 0, ALUControl2:0 = 001, Branch = 0, MemWrite = 0, MemtoReg = 0, PCSrc = 0.]
Extended Functionality: addi
No change to datapath
[Figure: the complete single-cycle datapath, unchanged; addi uses the existing paths.]
Control Unit: addi
Instruction   Op5:0    RegWrite   RegDst   AluSrc   Branch   MemWrite   MemtoReg   ALUOp1:0
R-type        000000   1          1        0        0        0          0          …
lw            100011   1          0        1        0        0          1          …
sw            101011   0          X        1        0        1          X          …
beq           000100   0          X        0        1        0          X          …
addi          001000   1          0        1        0        0          0          …

[Figure: complete single-cycle datapath with the control unit, repeated for reference.]
Adding Jumps: j
[Figure: datapath extended for j. Instr[25:0] is shifted left by 2 to form bits 27:0 and combined with PCPlus4[31:28] to form PCJump; a new mux controlled by Jump selects PC' from PCJump or the output of the PCSrc mux. The control unit gains a Jump output.]
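The jump address logic from the figure might be sketched as below. The module name `jumpunit` and the names `target`, `pcnextbr`, and `pcnext` are assumptions for this sketch; `pcnextbr` stands for the output of the existing PCSrc mux.

```verilog
// Sketch: form the jump target and select the next PC.
module jumpunit(input  [31:0] pcplus4,
                input  [31:0] pcnextbr,   // PCPlus4 or PCBranch, after the PCSrc mux
                input  [25:0] target,     // Instr[25:0]
                input         jump,
                output [31:0] pcjump, pcnext);
  // Upper 4 bits come from PC + 4; the 26-bit target is shifted left by 2.
  assign pcjump = {pcplus4[31:28], target, 2'b00};
  // Jump = 1 overrides the branch/sequential choice.
  assign pcnext = jump ? pcjump : pcnextbr;
endmodule
```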
Control Unit: Main Decoder
Instruction   Op5:0    RegWrite   RegDst   AluSrc   Branch   MemWrite   MemtoReg   ALUOp1:0   Jump
R-type        000000   1          1        0        0        0          0          …          0
lw            100011   1          0        1        0        0          1          …          0
sw            101011   0          X        1        0        1          X          …          0
beq           000100   0          X        0        1        0          X          …          0
j             000010   0          X        X        X        0          X          XX         1

[Figure: datapath with the jump mux, repeated for reference; Jump = 1 for j.]
Review: Processor Performance
Program Execution Time =
(# instructions)(cycles/instruction)(seconds/cycle)
= # instructions x CPI x TC
Single-Cycle Performance
TC is limited by the critical path (lw)
[Figure: the complete single-cycle datapath with the lw critical path highlighted: PC, instruction memory, register file read, ALU, data memory, MemtoReg mux, register file write.]
Single-Cycle Performance
• Single-cycle critical path:
   Tc = tpcq_PC + tmem + max(tRFread, tsext + tmux) + tALU + tmem + tmux + tRFsetup
• In most implementations, limiting paths are:
   – memory, ALU, register file.
   – Tc = tpcq_PC + 2tmem + tRFread + tALU + tRFsetup + tmux
[Figure: the lw critical path through the single-cycle datapath, as on the previous slide.]
Single-Cycle Performance Example
Tc = tpcq_PC + 2tmem + tRFread + tmux + tALU + tRFsetup
= [30 + 2(250) + 150 + 25 + 200 + 20] ps
= 925 ps
What’s the max clock frequency?
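The maximum clock frequency is the reciprocal of the cycle time. A worked version, using only the 925 ps figure computed above:

```latex
\[
f_{\max} = \frac{1}{T_c}
         = \frac{1}{925 \times 10^{-12}\,\text{s}}
         \approx 1.08 \times 10^{9}\,\text{Hz}
         = 1.08\ \text{GHz}
\]
```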
Single-Cycle Performance Example
For a program with 100 billion instructions executing
on a single-cycle MIPS processor,
Execution Time
= # instructions x CPI x TC
= (100 × 10^9)(1)(925 × 10^-12 s)
= 92.5 seconds
Next Time
Next class:
We’ll look at multi-cycle MIPS
Adding functionality to our design
Next lab:
Implement single-cycle CPU!