2014-11-10-logic_optimization

Circuit Optimization
CS 3220
Fall 2014
Hadi Esmaeilzadeh
[email protected]
Georgia Institute of Technology
Some slides adapted from Prof. Milos Prvulovic
Alternative Computing Technologies
What is on the FPGA chip?
3 Apr 2014
Logic Optimization
2
An FPGA Logic Element (LE)
FPGA Logic Elements
Inside a Logic Element
The ALU
▪ With enough optimization,
the ALU by itself may become the critical path
– Don’t try any of this until the ALU is on the critical path!
▪ It needs to add, subtract, AND, OR, NOR, etc.
– Depending on which instruction is currently in the A stage
▪ What we used before simplifies the R stage
– No need to “decode” the ALU operation,
just feed IR[3:0] to the ALU as the control signal
▪ But decoding still happens!
– Must decide which operation produces the ALU output!
– All operations get done, then a MUX chooses among them
• And it’s a large and slow MUX
▪ We can do (much) better than that
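As a rough behavioral sketch of the "compute everything, then select" design described above: every result is produced for every instruction, and one wide selection picks the output. The operation codes and the name naive_alu are hypothetical; the slides do not give the actual IR[3:0] encoding.

```python
MASK = 0xFFFFFFFF  # model 32-bit values

# Hypothetical operation codes; the real IR[3:0] encoding is not in the slides.
OP_ADD, OP_SUB, OP_AND, OP_OR, OP_NOR, OP_XOR = range(6)

def naive_alu(op, a, b):
    """Compute every result, then pick one with a single wide selection."""
    results = {
        OP_ADD: (a + b) & MASK,
        OP_SUB: (a - b) & MASK,
        OP_AND: a & b,
        OP_OR: a | b,
        OP_NOR: ~(a | b) & MASK,
        OP_XOR: a ^ b,
    }
    # The dictionary lookup stands in for the large output MUX.
    return results[op]
```

In hardware the cost is in that final selection: its width grows with the number of operations, and it sits after the slowest result.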
A (Much) Better ALU
▪ Overall ALU latency is
– Time for the longest operation, plus
– Time to select between that and the other results
▪ Which is the longest operation?
– What does + do?
– What does – do?
– What does < do?
– What does <= do?
– What does OR do?
–…
Aha!
▪ Obviously AND, OR, XOR and their cousins are
way faster than + and its cousins
– Let’s just do them all, then select between them
(don’t care what is selected if this isn’t a logic op)
– The whole thing is done well before the adder value is
ready
▪ But +, -, <, <= need to use an add/sub unit
– One add/sub unit for all of them, or one for each?
One +,- unit?
▪ Good:
– Less hardware (32 one-bit adders with carry bits)
– No need to select between ADD and SUB afterwards
▪ Bad:
– Must account for the time to flip the second operand (for SUB)
before the adder/subtractor can begin its work!
– Must use a full 1-bit adder for the LSB
• If ADD were separate, it could use a half-adder (no carry-in) for the
LSB
• But SUB needs Cin to be 1, so it uses full adders for all 32 bits
▪ Turns out neither of the bad things matters!
– Why? Hint: What would be the longest path
if we had an adder and a separate subtractor,
then chose between their results?
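A minimal bit-level sketch of the shared unit discussed above, assuming a plain ripple-carry structure (the slides do not say which adder architecture is used). Each bit goes through a full adder, the second operand is flipped with an XOR when subtracting, and the carry-in of the LSB is 1 for SUB. The names full_adder and ripple_addsub are illustrative.

```python
WIDTH = 32

def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_addsub(x, y, sub):
    """Add (sub=0) or subtract (sub=1) two WIDTH-bit values bit by bit."""
    carry = sub  # Cin is 1 for subtraction, 0 for addition
    result = 0
    for i in range(WIDTH):
        a = (x >> i) & 1
        b = ((y >> i) & 1) ^ sub  # flip the second operand's bit when subtracting
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result
```

For example, ripple_addsub(7, 5, 1) computes 7 + ~5 + 1, which is 7 - 5 in two's complement.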
What about LT, LTE, GT, EQ, NE, etc.?
▪ How do we do LT?
▪ How do we do EQ?
▪ How do we do LTE?
▪ What about NE, GT, GTE?
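One common way to answer these questions is to derive every comparison from a single subtraction. The sketch below assumes signed 32-bit comparisons. It also corrects the sign bit with an overflow check, which is an addition not taken from the slides; without it, the sign of the difference is only right when the subtraction does not overflow. The name compare_flags is hypothetical.

```python
MASK = 0xFFFFFFFF
SIGN = 0x80000000

def compare_flags(a, b):
    """Derive signed comparisons of 32-bit patterns a and b from a - b."""
    diff = (a + (~b & MASK) + 1) & MASK  # a - b computed on the adder
    # Signed overflow on subtraction: operand signs differ and the
    # result's sign differs from a's sign.
    overflow = ((a ^ b) & (a ^ diff) & SIGN) != 0
    negative = (diff & SIGN) != 0
    lt = negative != overflow  # MSB of the difference, corrected for overflow
    eq = diff == 0             # equivalently, (a ^ b) == 0
    return {
        "LT": lt,
        "EQ": eq,
        "LTE": lt or eq,
        "NE": not eq,
        "GT": not (lt or eq),
        "GTE": not lt,
    }
```

Only LT and EQ need real work; the other four are inversions or an OR of those two.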
One ADD/SUB unit
▪ So we have a single adder with
– 32-bit data inputs (aluin1, aluin2)
– A one-bit carry-in input
▪ Controlled by a new addsub control signal
– If adding, don’t flip aluin2 bits, Cin is 0
– If subtracting (for – or for <), flip aluin2, Cin is 1
▪ After this, we need to route the results
– For ADD, SUB: ADD/SUB output goes to aluout
– For logic operations: logic unit output goes to aluout
– For LT: MSB of ADD/SUB output goes to LSB of aluout
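At the word level, the unit above can be modeled roughly as follows. The signal names aluin1, aluin2 and addsub come from the slide; the function names, and the choice to leave the upper bits of the LT result at 0, are assumptions made for this sketch.

```python
MASK = 0xFFFFFFFF

def addsub_unit(aluin1, aluin2, addsub):
    """addsub=0 adds; addsub=1 subtracts by flipping aluin2 and setting Cin to 1."""
    operand = aluin2 ^ (MASK if addsub else 0)
    cin = 1 if addsub else 0
    return (aluin1 + operand + cin) & MASK

def lt_result(aluin1, aluin2):
    """Move the MSB of aluin1 - aluin2 into bit 0; upper bits are 0 in this sketch."""
    return (addsub_unit(aluin1, aluin2, 1) >> 31) & 1
```

As on the slide, lt_result uses only the MSB of the difference, so it matches a signed comparison as long as the subtraction does not overflow.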
Producing ALU output
▪ So we have
– Logic result
– Add/sub result
– Comparison result
▪ We have three control signals; each enables
one of these results to reach the ALU output
– Can use a tri-state bus for the ALU output, with enable signals to
output the results of the different sub-units
– Can AND each result with its enable signal,
then OR those to get the final ALU result
• Almost identical to what a tri-state bus ends up doing
– Can use MUXes to select among these three
• One 3-input MUX controlled by a 2-bit signal
• Better: a 2-input MUX for the two faster sub-units, controlled by a 1-bit
signal,
then a 2-input MUX between that and the slowest one, controlled by another 1-bit signal
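The AND-OR option and the two-level MUX option above might be modeled like this. All function and parameter names are illustrative, and the sketch deliberately does not assume which sub-unit is the slowest.

```python
MASK = 0xFFFFFFFF

def and_or_select(logic, addsub, cmp, en_logic, en_addsub, en_cmp):
    """AND each result with its enable (replicated to 32 bits), then OR them."""
    def gate(value, enable):
        return value & (MASK if enable else 0)
    return gate(logic, en_logic) | gate(addsub, en_addsub) | gate(cmp, en_cmp)

def mux2(sel, in0, in1):
    """Two-input MUX: sel=0 passes in0, sel=1 passes in1."""
    return in1 if sel else in0

def two_level_mux(fast_a, fast_b, slow, sel_fast, sel_slow):
    """Choose between the two faster results first, then against the slowest."""
    early = mux2(sel_fast, fast_a, fast_b)
    return mux2(sel_slow, early, slow)
```

The point of the two-level arrangement is that the first MUX can settle while the slowest sub-unit is still computing, so the slowest result passes through only one 2-input MUX.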