CS2504, Spring'2007 ©Dimitris Nikolopoulos

CS2504: Computer Organization
Lecture 5: Processor Datapath and Control
Dimitris Nikolopoulos
Implement MIPS



We will look into the implementation of:

Memory reference instructions (lw, sw)

Arithmetic, logical instructions (add, sub, and, or, slt)

Branch instructions (beq, j)
We will build on two principles:

Making the common case fast

Simplicity favors regularity
We will build structures common in many
microprocessors for all markets
Things common in instructions



Fetch code from memory using the PC
Read one or two registers, using fields in the instruction
Following these steps, execution becomes instruction-specific

Many instructions use the ALU (except j)

Many instructions write a register

Memory instructions read or write memory
Abstract view of MIPS
Missing multiplexers
and control logic for
registers and memories
Basic implementation of MIPS
Single-cycle datapath


Instruction begins execution on one clock edge and
ends on the next clock edge
Easy, but impractical implementation



Simple and complex instructions
Variable number of clock cycles per instruction
Single-cycle datapath needs separate instruction
and data memories

Different format of data and instructions

Less expensive design

Cannot use a single-ported memory for instruction and
data reads/writes in one cycle!
Datapath Elements - Instructions
We can use the ALU we designed during logic design lectures, hardwired to
perform only additions.
Datapath Elements - Instructions
Logic to fetch the next instruction in program order. Common case.
Datapath Elements – Register file
R instructions need two input registers and one output register. The register
file always outputs the data of the two read registers. Control is needed to
write a register. Writes are edge-triggered, therefore the register file can
read and write registers in the same cycle. Writes need the control signal,
register number and data ready before the falling clock edge.
Datapath Elements – Loads/stores
Loads need to read into register file from address calculated with base plus
offset (need ALU). Similarly, store needs to store in memory location
calculated with base plus offset (need ALU).
Datapath Elements – Loads/stores
Sign extensions needed for handling positive and negative offsets in
load/store instructions. Data memory needed to read from and write to.
Datapath Elements – Branches
Branch needs to calculate the target address by adding a sign-extended offset
to PC+4 (architecture convention). Branch also needs to compare two
registers to decide whether it is taken (condition is true, go to target)
or not taken (condition is false, execute PC+4). The comparison can be done
in the standard ALU, by asserting the subtract control signal.
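The branch datapath described above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware itself: the function names are made up for this example, and the shift of the offset by two bits (MIPS branch offsets count words, not bytes) is a MIPS convention that the slide text does not spell out.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit instruction field as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Target = (PC + 4) + sign-extended offset, with the offset counted in words."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc_beq(pc, imm16, reg_a, reg_b):
    """Compare by subtraction: beq is taken when the ALU result is zero."""
    zero = ((reg_a - reg_b) & 0xFFFFFFFF) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF
```

A negative offset such as 0xFFFF (that is, -1 word) sends a taken branch back to the branch instruction itself, which shows why the sign extension is needed.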
Simple Single-Cycle Datapath
No sharing of the datapath between instructions (one at a time).
R-instructions and memory instructions can use the same ALU.
The value of the destination register comes from the ALU or from memory (via a
multiplexer). The ALU input may come from the register file (R-instruction) or
directly from the instruction field (immediate operand, offset), so a second
multiplexer is needed there.
Simple Single-Cycle Datapath
Combines instruction fetch logic, R- and memory instructions datapath and
branch datapath. Branch uses main ALU for comparisons, so we need one
more adder for the branch target.
MIPS ALU Control
ALU control lines    Function
0000                 AND
0001                 OR
0010                 add
0110                 subtract
0111                 set less than
1100                 nor
Memory access instructions need addition. R-instructions use any of
the five functions other than nor, depending on the 6-bit funct field in the
instruction. BEQ needs a subtraction.
Implementing control: Use the 6 bits of the instruction's funct field and
2 bits (ALUOp) indicating add (00) for loads/stores, subtract (01) for beq, or
whatever the function field indicates (10).
MIPS ALU Control
Instruction  ALUOp  Instruction       Function  Desired           ALU control
opcode              operation         field     ALU function      input
LW           00     load word         XXXXXX    add               0010
SW           00     store word        XXXXXX    add               0010
BEQ          01     branch equal      XXXXXX    subtract          0110
R-type       10     add               100000    add               0010
R-type       10     subtract          100010    subtract          0110
R-type       10     AND               100100    and               0000
R-type       10     OR                100101    or                0001
R-type       10     set on less than  101010    set on less than  0111
Note that only a small fraction of the possible function values is used here.
MIPS ALU Control
ALUOp            Funct field               Operation
ALUOp1  ALUOp0   F5  F4  F3  F2  F1  F0
0       0        X   X   X   X   X   X     0010
X       1        X   X   X   X   X   X     0110
1       X        X   X   0   0   0   0     0010
1       X        X   X   0   0   1   0     0110
1       X        X   X   0   1   0   0     0000
1       X        X   X   0   1   0   1     0001
1       X        X   X   1   0   1   0     0111
Simplification: Only a few of the 64 possible combinations of the function
field are used. Lots of don't cares. We can simplify the logic.
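The truth table above can be read as a small decoding function. The sketch below is one possible software rendering of it, with names chosen for this example only; like the table, it ignores F5 and F4 and looks only at the low four funct bits when ALUOp is 10.

```python
# Low four funct bits (F3..F0) -> 4-bit ALU control, as in the truth table.
LOW_FUNCT_TO_ALU = {
    0b0000: 0b0010,  # add
    0b0010: 0b0110,  # subtract
    0b0100: 0b0000,  # AND
    0b0101: 0b0001,  # OR
    0b1010: 0b0111,  # set on less than
}


def alu_control(alu_op, funct=0):
    """Return the 4-bit ALU control value for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:          # lw / sw: always add
        return 0b0010
    if alu_op == 0b01:          # beq: always subtract
        return 0b0110
    if alu_op == 0b10:          # R-type: decided by the funct field
        return LOW_FUNCT_TO_ALU[funct & 0b1111]
    raise ValueError("ALUOp value not used in this design")
```

For example, `alu_control(0b10, 0b101010)` selects set on less than, while `alu_control(0b00)` selects add no matter what the funct bits are.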
MIPS ALU Control

Remember instruction formats:

Bits 31:26 always contain opcode

Registers to read always in 25:21 and 20:16

Base register for load/store in 25:21

16-bit offset for branch, load, store in 15:0

Destination register in one of two places


20:16 for a load (rt register)
15:11 for R-type instructions (rd register)
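As a rough illustration of the field positions listed above, the following sketch slices a 32-bit instruction word into its fields. The function name and the dictionary keys are chosen for this example; the two instruction words in the usage note are the standard encodings of `add $t0,$t1,$t2` and `lw $t0,4($t1)`.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31:26
        "rs":     (word >> 21) & 0x1F,  # bits 25:21 (base register for lw/sw)
        "rt":     (word >> 16) & 0x1F,  # bits 20:16 (destination of a load)
        "rd":     (word >> 11) & 0x1F,  # bits 15:11 (destination of R-type)
        "funct":  word & 0x3F,          # bits 5:0
        "imm16":  word & 0xFFFF,        # bits 15:0 (offset for branch/lw/sw)
    }


r_type = decode_fields(0x012A4020)   # add $t0, $t1, $t2
load = decode_fields(0x8D280004)     # lw  $t0, 4($t1)
```

Here `r_type["rd"]` and `load["rt"]` both name register 8 ($t0), which is why the datapath needs the RegDst multiplexer to pick the destination field.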
MIPS ALU Control
Added lines from the instruction for read/write registers and the ALU control
block. RegDst selects the destination register. ALUSrc selects the ALU input
from the immediate field or from the register file. PCSrc decides whether to
advance the PC or use the branch target. MemtoReg decides whether data written
to the register file comes from the ALU operation or from data memory.
MIPS ALU Control - R-instruction
Flow of R-instruction: Instruction fetched, PC incremented. Two registers read
from register file, while control unit decides what operation to perform in
parallel. ALU operates on the data based on bits 5:0 of the instruction. Result
of ALU written to register file based on bits 15:11 of instruction.
MIPS ALU Control - Load/Store
Flow of load instruction: Instruction fetched, PC incremented. One register
is read. ALU computes base plus offset from register file and instruction.
Sum is used to access data memory. Data from memory is written to the register
named in bits 20:16.
MIPS ALU Control – Designing the unit
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
Example: R-format: Destination register taken from the rd field (RegDst = 1).
ALUSrc from register file (0). Data written to the register file comes from
the ALU (MemtoReg = 0). Writes register (1). Does not read memory (0). Does
not write memory (0). Does not branch (0). Performs an operation defined by
the funct field of the instruction (ALUOp = 10).
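One way to make the control table concrete is to store it as a lookup keyed by opcode. The sketch below is only an illustration: the names are invented for this example, don't-care entries (X) are represented as `None`, and the opcode values are the Op5..Op0 patterns of the four instruction classes.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

X = None  # don't care

# opcode -> one row of the control table
CONTROL_ROWS = {
    0b000000: (1, 0, 0, 1, 0, 0, 0, 1, 0),  # R-format
    0b100011: (0, 1, 1, 1, 1, 0, 0, 0, 0),  # lw
    0b101011: (X, 1, X, 0, 0, 1, 0, 0, 0),  # sw
    0b000100: (X, 0, X, 0, 0, 0, 1, 0, 1),  # beq
}


def control_signals(opcode):
    """Return the control signals for one opcode as a name -> value dict."""
    return dict(zip(SIGNALS, CONTROL_ROWS[opcode]))
```

For instance, `control_signals(0b100011)` reports MemRead = 1 and MemtoReg = 1 for a load, matching the lw row of the table.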
MIPS ALU Control – Designing the unit
Input or Output  Signal Name  R-format  lw  sw  beq
Inputs           Op5          0         1   1   0
                 Op4          0         0   0   0
                 Op3          0         0   1   0
                 Op2          0         0   0   1
                 Op1          0         1   1   0
                 Op0          0         1   1   0
Outputs          RegDst       1         0   X   X
                 ALUSrc       0         1   1   0
                 MemtoReg     0         1   X   X
                 RegWrite     1         1   0   0
                 MemRead      0         1   0   0
                 MemWrite     0         0   1   0
                 Branch       0         0   0   1
                 ALUOp1       1         0   0   0
                 ALUOp0       0         0   0   1
Adding a Jump Instruction
Why not single-cycle?

Single-cycle design means CPI = 1

Load instruction uses all five units

Instruction memory to fetch instruction
Register file to read the base register
ALU to calculate base plus offset
Data memory
Register file again to write the target register
Performance bound by the slowest instruction!
Why not single-cycle?

Example:

Memory unit latency, 200ps.

ALU and adders: 100ps

Register file (read/write): 50ps

All others: (unrealistically) no delay

25% loads, 10% stores, 45% ALU instructions, 15%
branches, 5% jumps.
Why not single-cycle?

CPU execution time = icount * CPI * clock cycle

CPI = 1

R-instructions
fetch, access registers, ALU, access registers, 400ps.
Load
fetch, access registers, ALU, access memory, access register, 600ps.
Store
fetch, access registers, ALU, access memory, 550ps.
Why not single-cycle?

CPU execution time = icount * CPI * clock cycle

CPI = 1

Branch
Fetch, access registers, ALU, 350ps.
Jump
Fetch, 200ps.
Slowest instruction defines clock cycle: 600ps.
This means that all instructions take 600ps, even
though some need as little as 200ps!
Average time per instruction in the example, if the clock could vary with
the instruction: 447.5ps!
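The numbers in this example can be reproduced with a short calculation. The sketch below uses the unit latencies and instruction mix given in the example; the variable and function names are chosen only for illustration.

```python
# Unit latencies from the example, in picoseconds.
LATENCY_PS = {"mem": 200, "alu": 100, "reg": 50}

# Functional units on the critical path of each instruction class.
PATHS = {
    "alu":    ["mem", "reg", "alu", "reg"],
    "load":   ["mem", "reg", "alu", "mem", "reg"],
    "store":  ["mem", "reg", "alu", "mem"],
    "branch": ["mem", "reg", "alu"],
    "jump":   ["mem"],
}

# Instruction mix from the example.
MIX = {"load": 0.25, "store": 0.10, "alu": 0.45, "branch": 0.15, "jump": 0.05}


def instruction_time(kind):
    """Sum the latencies of the units an instruction class passes through."""
    return sum(LATENCY_PS[unit] for unit in PATHS[kind])


# Single-cycle design: the clock must fit the slowest instruction.
single_cycle_clock = max(instruction_time(kind) for kind in PATHS)

# Hypothetical variable clock: weight each class by its frequency.
average_time = sum(MIX[kind] * instruction_time(kind) for kind in MIX)
```

The ratio `single_cycle_clock / average_time` (600 over roughly 447.5) indicates how much performance the single-cycle clock leaves on the table for this mix.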
Why not single-cycle?



Single-cycle design violates the principle "make
the common case fast". Fast instructions run as
slowly as the slowest instructions.
Each functional unit of the microprocessor can be
used at most once per clock cycle. If the
instruction mix needs more functional units, they
need to be duplicated.
Solution: multi-cycle instructions, intro to
pipelining.
The laundry example
Pipelining


Multiple instructions overlapped in execution
The pipeline paradox:

Each task takes the same amount of time as in a nonpipelined implementation

But, a set of tasks executes faster in a pipelined
implementation, because many tasks proceed in parallel!
Pipelining in MIPS



Fetch instruction from memory
Read registers, while decoding instruction
Execute operation (ALU) or calculate address (ALU)
Access an operand in data memory
Write the result into a register

Classic, 5-stage pipeline (memorize!)
Pipelining improves performance
Understanding pipeline performance

In example, first instruction graduates in 900ps.
Following that, an instruction graduates every
200ps.

What is the maximum speedup from pipelining?

Time between instructions (pipelined) =
Time between instructions (nonpipelined) / Number of pipe stages
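The formula and the completion times quoted above can be checked with a small sketch. The 900ps and 200ps defaults come from the example on this slide; the function names, and the 1000ps, 5-stage figures in the usage note, are assumptions made only for illustration. The formula itself holds only in the ideal case of perfectly balanced stages.

```python
def ideal_time_between_instructions(nonpipelined_ps, stages):
    """Ideal case: stages are perfectly balanced, so time divides evenly."""
    return nonpipelined_ps / stages


def pipelined_completion_time(n, first_done_ps=900, interval_ps=200):
    """Time at which the n-th instruction graduates from the pipeline.

    The first instruction pays the full pipeline latency; each later
    instruction graduates one interval after the previous one.
    """
    return first_done_ps + (n - 1) * interval_ps
```

With a hypothetical 1000ps nonpipelined instruction and 5 balanced stages, `ideal_time_between_instructions(1000, 5)` gives 200ps, so the maximum speedup equals the number of stages. For a long run of instructions the 900ps start-up cost in `pipelined_completion_time` becomes negligible next to the 200ps per instruction.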
ISA implications for pipelining



Fixed-width instructions simplify the fetch stage.
Symmetry between instructions enables register
access in parallel with instruction decoding.
Asymmetric instructions need more pipeline
stages.
Memory operands only in loads/stores. We can
still do a load in five stages. If memory operands
were added to arithmetic instructions, we would
need a longer pipeline.
Hazards


Broadly defined: A hazard is a conflict between
two instructions that attempt to access the same
resource, in the same cycle, at different stages of
their pipelined execution.
Structural hazard: Hardware does not have
enough resources. For example, suppose we had a
single memory for instructions and data.
Structural Hazard
Assume a single memory for instructions and data and assume that
a fourth load instruction is executed. First instruction's data access
would conflict with fourth instruction's instruction fetch!
Data hazard

A data hazard occurs because an instruction may
need input from an earlier instruction in the
pipeline and the input may not be available:
add $s0,$t0,$t1
sub $t2,$s0,$t3


Add instruction does not write $s0 until 5th stage
But, sub instruction needs new value $s0 in its
second stage
Alternative pipeline representation
Symbols corresponding to pipeline stages: fetch, decode, execute,
memory access and register write back. Shading indicates how each
element is used by the instruction. Half-shaded resources indicate reads
(right shade) or writes (left shade).
Forwarding to resolve data hazards
Idea: We forward the needed input for the execute stage as soon as it
is produced from the execute stage of the previous instruction. Caution:
we only forward ahead in time, not back. In this case, the add produces
the needed result at time 600, therefore the result can be recycled back
to the ALU and used for the next instruction.
Is forwarding always possible?
Result from load is not in the register file until t=800. If sub follows the
load in the next cycle, it will need the result in the ALU by time t=600. We
can't go backwards in time, therefore forwarding will not work in this case,
unless we insert a delay (bubble) in the pipeline. First example of
non-perfect pipelining!
Resolving data hazards for free
Consider the following segment in C:
A = B + E;
C = B + F;
Here is the translation to MIPS:
lw  $t1, 0($t0)
lw  $t2, 4($t0)
add $t3, $t1, $t2
sw  $t3, 12($t0)
lw  $t4, 8($t0)
add $t5, $t1, $t4
sw  $t5, 16($t0)
Can you find the data
hazards in this example?
Can you remove the hazards
by just reordering instructions?
Key idea: there is no dependence
between the two assignments,
A and C can be assigned in
parallel!
Resolving data hazards for free
Consider the following segment in C:
A = B + E;
C = B + F;
Here is the reordered translation to MIPS:
lw  $t1, 0($t0)
lw  $t2, 4($t0)
lw  $t4, 8($t0)
add $t3, $t1, $t2
sw  $t3, 12($t0)
add $t5, $t1, $t4
sw  $t5, 16($t0)
Moving the third lw up removes both hazards (with forwarding): neither add
immediately follows the load that produces one of its operands.
Key idea: there is no dependence
between the two assignments,
A and C can be assigned in
parallel!
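The effect of reordering can be checked mechanically. The sketch below is a toy checker with hypothetical helper names. It assumes forwarding is available, so the only remaining stalls are load-use pairs, where an instruction reads a register loaded by the lw immediately before it. It understands only the lw, sw and three-register forms used in this example.

```python
def parse(line):
    """Return (opcode, destination register or None, source registers)."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op in ("lw", "sw"):
        # Memory operand looks like "offset(base)"; pull out the base register.
        base = args[1][args[1].index("(") + 1:-1]
        if op == "lw":
            return op, args[0], [base]
        return op, None, [args[0], base]
    return op, args[0], args[1:]


def load_use_hazards(program):
    """Indices (i, i+1) where instruction i+1 reads what lw at i loads."""
    instrs = [parse(line) for line in program]
    return [(i, i + 1) for i in range(len(instrs) - 1)
            if instrs[i][0] == "lw" and instrs[i][1] in instrs[i + 1][2]]


original = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2",
            "sw $t3, 12($t0)", "lw $t4, 8($t0)", "add $t5, $t1, $t4",
            "sw $t5, 16($t0)"]
reordered = ["lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)",
             "add $t3, $t1, $t2", "sw $t3, 12($t0)", "add $t5, $t1, $t4",
             "sw $t5, 16($t0)"]
```

Calling `load_use_hazards(original)` flags the two lw/add pairs, while `load_use_hazards(reordered)` returns an empty list.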
Control hazards



Instruction following a branch needs to be fetched
in next clock cycle.
Unfortunately, we do not know the outcome of
the branch and whether we should execute the
next instruction or not when we fetch the branch.
The outcome of the branch is known after the
ALU stage (3rd in the pipeline).
Resolving control hazards earlier
Assume that we throw in some extra hardware (more specifically, ALUs), so
that we can calculate the branch condition and the branch target by the
end of the second stage of the instruction. Even then, we still need to
insert a bubble in the pipeline so that we can safely fetch the next
instruction. Assume that all other instructions have CPI=1 and branches need
two cycles. If YY% of instructions are branches, the CPI of the
machine increases to 1.YY.
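The CPI claim above is a weighted average, which a one-line function makes explicit. This is a minimal sketch with an invented function name; the 17% branch frequency in the usage note is an arbitrary illustration, not a figure from the lecture.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles=1, base_cpi=1.0):
    """Every instruction costs base_cpi; each branch adds stall_cycles bubbles."""
    return base_cpi + branch_fraction * stall_cycles
```

For example, `cpi_with_branch_stalls(0.17)` is approximately 1.17, the "1.YY" of the text with YY = 17.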
Neo and the Oracle
Oracle: I'd ask you to sit down, but, you're not going to anyway. And
don't worry about the vase.
Neo: What vase?
[Neo turns to look for a vase, and as he does, he knocks over a vase
of flowers, which shatters on the floor.]
Oracle: That vase.
Neo: I'm sorry-
Oracle: I said don't worry about it. I'll get one of my kids to fix it.
Neo: How did you know?
Oracle: Ohh, what's really going to bake your noodle later on is,
would you still have broken it if I hadn't said anything?
Prediction



Modern processors make use of prediction to
handle branches, as well as several other events
that may cause long pipeline stalls.
If we predict that the branch is not taken, and the
branch is indeed not taken, then the pipeline
proceeds at full speed.
If the branch is taken, then we will have to
insert a bubble.
Prediction (always not taken)
Designing predictors


Always predicting taken, or always predicting not taken, is like flipping a
coin.
But branches are not really random:


A branch at the edge of a loop, for example, jumps back every
time except in the last iteration.
Motivation for dynamic predictors:

Hardware structures that keep information on a per branch
basis. For example, they keep the history of outcomes of
the same branch during the execution of the program.

Predict based on history of branches.
Designing predictors

Hit rate: How many times we predict the outcome of
branches right, over the total number of branches
executed in the program


Caution: each branch may be executed numerous times.
Actual branch prediction performance depends on:

Prediction accuracy

Early calculation of the branch target address, if needed.
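To see why history helps, the sketch below compares the hit rate of an always-not-taken prediction with a very simple history-based predictor that repeats the last outcome of the same branch. This is an illustrative toy with invented names, not a description of any particular processor's predictor; the 10-iteration loop branch is an assumed example.

```python
def hit_rate(predictions, outcomes):
    """Fraction of executed branches whose outcome was predicted correctly."""
    hits = sum(p == o for p, o in zip(predictions, outcomes))
    return hits / len(outcomes)


def predict_not_taken(outcomes):
    """Static prediction: always guess 'not taken' (False)."""
    return [False] * len(outcomes)


def predict_last_outcome(outcomes, initial=False):
    """Dynamic prediction: guess whatever this branch did last time."""
    predictions, last = [], initial
    for taken in outcomes:
        predictions.append(last)
        last = taken
    return predictions


# A loop-closing branch: taken nine times, not taken on the last iteration.
loop_branch = [True] * 9 + [False]
static_rate = hit_rate(predict_not_taken(loop_branch), loop_branch)
dynamic_rate = hit_rate(predict_last_outcome(loop_branch), loop_branch)
```

On this loop branch the history-based predictor misses only the first and the last execution, whereas always-not-taken is right only on the last one.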
Pipeline performance



Other than the memory system, stalls in the
pipeline are the major performance bottlenecks in
modern processors.
Structural hazards are frequent in codes with a large
supply of independent arithmetic instructions, if
the processor has only a few execution units.
Control hazards are frequent in programs with
frequent and hard-to-predict branches (e.g. trees)


E.g. branches depending on the input
Data hazards are frequent across the board.
Pipelined datapath
Pipelined datapath – Staging data
Life of a load in the MIPS pipeline
Note: both the instruction and the incremented PC value need to be forwarded
to the next stage (in case the instruction is a beq).
Note: works in the same way for stores.
Life of a load in the MIPS pipeline
Note: ALU uses the sign-extended 16-bit immediate operand and reg1.
Latches keep forwarding all register values and PC+4.
Life of a load in the MIPS pipeline
Note: Latches keep forwarding all register values and PC+4.
Life of a store in the MIPS pipeline
Note: Third stage differs from load in that the value of the second register
needs to be forwarded to the MEM stage.
Life of a store in the MIPS pipeline
Note: MEM stage differs because memory is written. Write back stage differs
because nothing is written to the register file.
Designing the pipeline


All needed information needs to be forwarded
through the pipeline registers, e.g. register 2 in
the store instruction
Each resource can be used in a single pipeline
stage, or else we will have structural hazards.
Fixing the pipeline for load
Alternative pipeline representation
Pipeline control
No write control needed in IF stage since PC is always written. Multiplexer
needed to decide between branch target and PC+4.
Pipeline control
No control in ID/reg. read stage. Register write is asserted in the writeback
stage.
Pipeline control
EX stage needs ALUSrc to decide whether the operand comes from the register
file or is an immediate. The RegDst signal decides which is the destination
register (different field in R-instructions and instructions with < 3 reg.
operands). Finally, ALUOp controls the ALU operation.
Pipeline control
MEM stage controls branch, loads and stores. WB stage controls MemToReg
to send memory or register value to register file. WB also sets RegWrite
signal
Data Hazards – A Bad Sequence
sub $2,$1,$3
and $12,$2,$5
or  $13,$6,$2
add $14,$2,$2
sw  $15,100($2)
Data Hazards – A Bad Sequence
Only add and sw get the correct result. Backward lines are trouble!
Data hazard can be resolved by forwarding from the EX stage of earlier
instruction to the EX stage of later instruction.
Formally Detecting Hazards
EX/MEM.RegisterRd = ID/EX.RegisterRs
EX/MEM.RegisterRd = ID/EX.RegisterRt
MEM/WB.RegisterRd = ID/EX.RegisterRs
MEM/WB.RegisterRd = ID/EX.RegisterRt
sub $2,$1,$3
and $12,$2,$5
When the and is in its EX stage, the sub is in its MEM stage, so the hazard is:
EX/MEM.RegisterRd = ID/EX.RegisterRs = $2
Accurate Detection of Hazards


Some instructions do not write back to registers.
Therefore, we need to forward only if the RegWrite signal
(EX/MEM.RegWrite or MEM/WB.RegWrite) is asserted.
$0 is pinned to zero, therefore we do not forward
values destined to $0, i.e. we also require
EX/MEM.RegisterRd ≠ 0 and MEM/WB.RegisterRd ≠ 0
Implementing Forwarding
The trick is to use the pipeline registers (EX/MEM, MEM/WB) to hold
the calculated values from the ALU and forward them to later instructions.
To do that we need to take inputs in the ALU from EX/MEM and MEM/WB.
Implementing Forwarding
More potential inputs to the ALU, therefore we need a multiplexer.
Implementing Forwarding
Logic: Forwarding is possible for R-instructions. We need to pass register
numbers through the IF/ID and ID/EX pipeline registers and detect whether the
source register of a later instruction has the same number as the Rd register
of an earlier instruction.
Forwarding Logic
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10
Forwarding Logic
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
Forwarding Logic


No hazard in WB stage, because instruction can
write to and read from register file in the same
stage.
One more complication:
add $1, $1, $2
add $1, $1, $3
add $1, $1, $4
Forwarding simultaneously from the WB and the
MEM stage. Need to obtain latest value of $1!
Corrected forwarding Logic
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
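The forwarding conditions for one ALU operand can be collected into a single function. The sketch below is a software model with an invented function name and argument order. It follows the conditions as written on these slides: the EX/MEM check comes first so that the most recent value wins, and the MEM/WB check includes the extra EX/MEM.RegisterRd ≠ ID/EX.RegisterRs term.

```python
def forward_select(id_ex_rs, ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd):
    """Return the 2-bit ForwardA select for the ALU's first operand.

    0b10: take the value from the EX/MEM pipeline register
    0b01: take the value from the MEM/WB pipeline register
    0b00: no forwarding, use the value read from the register file
    """
    # EX hazard: the instruction one ahead is writing the register we read.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return 0b10
    # MEM hazard: the instruction two ahead is writing it, and the
    # instruction one ahead is not targeting the same register.
    if (mem_wb_regwrite and mem_wb_rd != 0
            and ex_mem_rd != id_ex_rs
            and mem_wb_rd == id_ex_rs):
        return 0b01
    return 0b00
```

For the third add in the `add $1, $1, ...` sequence, both earlier instructions write $1, and `forward_select(1, True, 1, True, 1)` picks the EX/MEM value, which is the latest one. ForwardB would use the same function with ID/EX.RegisterRt in place of ID/EX.RegisterRs.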