Win with HDL

- FPGA Roadmap
- Design Requirements and Flow
  – Design Re-Use and IP Integration
  – Coding for Performance
- Synthesis Requirements
- Verification Methodology
- Summary

FPGA Roadmap
[Figure: density/performance versus year, 1995 to 1999, for the device families below]
- XC4000E: largest device XC4025E, 0.5 µm, 5 V power supply
- XC4000EX: largest device XC4036EX, 0.5 µm, 5 V power supply, 30% faster than E
- XC4000XL: largest device XC4085XL, 0.35 µm, 3.3 V power supply, 30% faster than EX
- XC4000XV: largest device XC40250XV, 0.25 µm, 2.5 V power supply, 25% faster than XL
- Virtex: 1M system gates, 0.25/0.18 µm, 2.5 V power supply

Design Requirements and Flow
- Increase speed and reduce cost
  – Design re-use, IPs
  – Coding for performance
- Architecture independence
[Figure: design flow from (V)HDL through Synthesis to Implementation Tools, with simulation at the behavioral, post-synthesis and post-layout stages]

Intellectual Property
Think IP at Specification
- IP is the building block
  – Xilinx suite includes ATM, DSP, HDLC, PCI, RISC and USB blocks
- IP solution goal
  – Reduced cost, increased system performance and increased predictability
- Xilinx delivery vehicles
  – IP optimized for Xilinx architecture, guaranteed functionality
  – Parameterized cores with Core Generator, web mechanism to download new cores and links to system-level tools

Intellectual Property
Delivery Vehicles
- AllianceCORE Program
  – Partner core development and service program
  – Sold and supported by partner
- CORE Generator Software
  – Flexible and easy-to-use vehicle
  – Cores, application notes and data sheets

Coding for Performance
- FPGAs require better coding styles and more effective design methodologies
  – Pipelining techniques allow FPGAs to reach gate array system speeds
- Gate arrays can tolerate poor coding styles and design practices
  – 66 MHz is easy for a gate array
- Designs coded for a gate array tend to run 3x slower when converted to an FPGA
  – It is not uncommon to see up to 30 levels of logic and 10-20 MHz FPGA designs
  – 6-8 FPGA logic levels = 50 MHz

Effective Coding Style
Case vs If-Then-Else

    // A case statement describes a parallel 4-to-1 multiplexer.
    module mux (in0, in1, in2, in3, sel, mux_out);
      input        in0, in1, in2, in3;
      input  [1:0] sel;
      output       mux_out;
      reg          mux_out;

      always @(in0 or in1 or in2 or in3 or sel) begin
        case (sel)
          2'b00:   mux_out = in0;
          2'b01:   mux_out = in1;
          2'b10:   mux_out = in2;
          default: mux_out = in3;
        endcase
      end
    endmodule

[Figure: 4-to-1 multiplexer with inputs in0 to in3, select sel and output mux_out]

    // An if-else chain describes priority logic: each branch is tested in order.
    module p_encoder (in0, in1, in2, in3, sel, p_encoder_out);
      input        in0, in1, in2, in3;
      input  [1:0] sel;
      output       p_encoder_out;
      reg          p_encoder_out;

      always @(in0 or in1 or in2 or in3 or sel) begin
        if (sel == 2'b00)
          p_encoder_out = in0;
        else if (sel == 2'b01)
          p_encoder_out = in1;
        else if (sel == 2'b10)
          p_encoder_out = in2;
        else
          p_encoder_out = in3;
      end
    endmodule

[Figure: cascaded selection of in3, in2 (sel=10), in1 (sel=01) and in0 (sel=00) driving p_encoder_out]

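The two descriptions above compute the same function; only the structure implied to the synthesis tool differs. As a minimal sketch of how that could be confirmed in simulation, the self-checking testbench below drives both modules with all 64 input combinations and compares the outputs. The testbench name `tb_mux_vs_p_encoder`, the instance names and the signal names `data`, `y_case` and `y_if` are assumptions for illustration and do not come from the slides.

```verilog
// Multiplexer described with a case statement (as on the slide).
module mux (in0, in1, in2, in3, sel, mux_out);
  input        in0, in1, in2, in3;
  input  [1:0] sel;
  output       mux_out;
  reg          mux_out;

  always @(in0 or in1 or in2 or in3 or sel) begin
    case (sel)
      2'b00:   mux_out = in0;
      2'b01:   mux_out = in1;
      2'b10:   mux_out = in2;
      default: mux_out = in3;
    endcase
  end
endmodule

// Same function described with an if-else chain (as on the slide).
module p_encoder (in0, in1, in2, in3, sel, p_encoder_out);
  input        in0, in1, in2, in3;
  input  [1:0] sel;
  output       p_encoder_out;
  reg          p_encoder_out;

  always @(in0 or in1 or in2 or in3 or sel) begin
    if (sel == 2'b00)      p_encoder_out = in0;
    else if (sel == 2'b01) p_encoder_out = in1;
    else if (sel == 2'b10) p_encoder_out = in2;
    else                   p_encoder_out = in3;
  end
endmodule

// Hypothetical testbench: exhaustive comparison of the two modules.
module tb_mux_vs_p_encoder;
  reg  [3:0] data;          // data[0] drives in0, ..., data[3] drives in3
  reg  [1:0] sel;
  wire       y_case, y_if;
  integer    i, errors;

  mux       u_case (data[0], data[1], data[2], data[3], sel, y_case);
  p_encoder u_if   (data[0], data[1], data[2], data[3], sel, y_if);

  initial begin
    errors = 0;
    for (i = 0; i < 64; i = i + 1) begin
      {sel, data} = i;      // low 6 bits of i: 2 select bits, 4 data bits
      #1;                   // let the combinational logic settle
      if (y_case !== y_if) begin
        errors = errors + 1;
        $display("MISMATCH sel=%b data=%b case=%b if=%b",
                 sel, data, y_case, y_if);
      end
    end
    if (errors == 0) $display("PASS: outputs agree for all 64 input combinations");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

Because the functions are identical, any difference between the two styles shows up in the synthesized logic depth and area, not in simulation results.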
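Effective Coding Style
Reduce Logical Levels of Critical Path

    // The late-arriving signal "critical" enters early in the logic cone.
    module critical_bad (in0, in1, in2, in3, critical, out);
      input  in0, in1, in2, in3, critical;
      output out;

      assign out = (((in0 & in1) & ~critical) | ~in2) & ~in3;
    endmodule

[Figure: critical is combined with in0 and in1 before the in2 and in3 terms on the way to out]

    // "critical" is moved to the last logic level before out.
    // Note: this expression differs from critical_bad when critical = 1,
    // in2 = 0 and in3 = 0, so confirm the restructured logic is still correct.
    module critical_good (in0, in1, in2, in3, critical, out);
      input  in0, in1, in2, in3, critical;
      output out;

      assign out = ((in0 & in1) | ~in2) & ~in3 & ~critical;
    endmodule

[Figure: in0 to in3 are combined first; critical enters at the final gate driving out]
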
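The restructured expression on this slide is not a strict logical equivalent of the first one: the two differ when `critical` = 1 and `in2` = `in3` = 0. Where exact equivalence matters, one possible approach is to split the function on the late signal (Shannon expansion), so that `critical` only selects between two precomputed results. The sketch below is an illustration under that assumption; the module name `critical_equiv`, the wire names `out_if_0` and `out_if_1`, and the testbench `tb_critical` are hypothetical and not part of the original slides.

```verilog
// Original form from the slide: "critical" enters early in the logic cone.
module critical_bad (in0, in1, in2, in3, critical, out);
  input  in0, in1, in2, in3, critical;
  output out;

  assign out = (((in0 & in1) & ~critical) | ~in2) & ~in3;
endmodule

// Hypothetical alternative: both cofactors are computed without "critical",
// which is then used only as the select of a final 2-to-1 multiplexer.
module critical_equiv (in0, in1, in2, in3, critical, out);
  input  in0, in1, in2, in3, critical;
  output out;

  wire out_if_0 = ((in0 & in1) | ~in2) & ~in3;  // value of out when critical = 0
  wire out_if_1 = ~in2 & ~in3;                  // value of out when critical = 1

  assign out = critical ? out_if_1 : out_if_0;
endmodule

// Hypothetical testbench: exhaustive equivalence check over all 32 inputs.
module tb_critical;
  reg  [4:0] v;             // v[0..3] drive in0..in3, v[4] drives critical
  wire       y_bad, y_equiv;
  integer    i, errors;

  critical_bad   u_bad   (v[0], v[1], v[2], v[3], v[4], y_bad);
  critical_equiv u_equiv (v[0], v[1], v[2], v[3], v[4], y_equiv);

  initial begin
    errors = 0;
    for (i = 0; i < 32; i = i + 1) begin
      v = i;
      #1;                   // let the combinational logic settle
      if (y_bad !== y_equiv) begin
        errors = errors + 1;
        $display("MISMATCH v=%b bad=%b equiv=%b", v, y_bad, y_equiv);
      end
    end
    if (errors == 0) $display("PASS: critical_equiv matches critical_bad");
    else             $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The trade-off is a little extra logic for the second cofactor in exchange for keeping the late signal one level from the output without changing the function.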
Effective Coding Style
Resource Sharing

    // Each branch has its own adder; sel picks one of the two sums.
    module poor_resource_sharing (a0, a1, b0, b1, sel, sum);
      input  a0, a1, b0, b1, sel;
      output sum;
      reg    sum;

      always @(a0 or a1 or b0 or b1 or sel) begin
        if (sel)
          sum = a1 + b1;
        else
          sum = a0 + b0;
      end
    endmodule

[Figure: two adders (a0 + b0 and a1 + b1) feeding a multiplexer controlled by sel, which drives sum]

    // The operands are selected first, so a single adder is shared.
    module good_resource_sharing (a0, a1, b0, b1, sel, sum);
      input  a0, a1, b0, b1, sel;
      output sum;
      reg    sum;
      reg    a_temp, b_temp;

      always @(a0 or a1 or b0 or b1 or sel) begin
        if (sel) begin
          a_temp = a1;
          b_temp = b1;
        end
        else begin
          a_temp = a0;
          b_temp = b0;
        end
        sum = a_temp + b_temp;
      end
    endmodule

[Figure: two multiplexers controlled by sel select the operands, which feed one adder driving sum]

Effective Coding Style
Register Duplication to Reduce Fan-Out

    // A single registered enable drives all 24 tri-state buffers.
    module high_fanout (in, en, clk, out);
      input  [23:0] in;
      input         en, clk;
      output [23:0] out;
      reg    [23:0] out;
      reg           tri_en;

      always @(posedge clk) tri_en <= en;

      always @(tri_en or in) begin
        if (tri_en) out = in;
        else        out = 24'bZ;
      end
    endmodule

[Figure: en is registered on clk as tri_en, which drives 24 loads between in[23:0] and out[23:0]]

    // The enable register is duplicated; each copy drives 12 buffers.
    module low_fanout (in, en, clk, out);
      input  [23:0] in;
      input         en, clk;
      output [23:0] out;
      reg    [23:0] out;
      reg           tri_en1, tri_en2;

      always @(posedge clk) begin
        tri_en1 <= en;
        tri_en2 <= en;
      end

      always @(tri_en1 or in) begin
        if (tri_en1) out[23:12] = in[23:12];
        else         out[23:12] = 12'bZ;
      end

      always @(tri_en2 or in) begin
        if (tri_en2) out[11:0] = in[11:0];
        else         out[11:0] = 12'bZ;
      end
    endmodule

[Figure: en is registered twice on clk as tri_en1 and tri_en2, each driving 12 loads between in[23:0] and out[23:0]]

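Effective Coding Style
Design Partition - Reg at Boundary

    // Registers sit on the inputs; the adder output leaves the module unregistered.
    module reg_in_module (a0, a1, clk, sum);
      input  a0, a1, clk;
      output sum;
      reg    sum;
      reg    a0_temp, a1_temp;

      always @(posedge clk) begin
        a0_temp <= a0;
        a1_temp <= a1;
      end

      always @(a0_temp or a1_temp) begin
        sum = a0_temp + a1_temp;
      end
    endmodule

[Figure: a0 and a1 are registered on clk, then added to produce sum]

    // The register sits at the module boundary, on the adder output.
    module reg_at_boundary (a0, a1, clk, sum);
      input  a0, a1, clk;
      output sum;
      reg    sum;

      always @(posedge clk) begin
        sum <= a0 + a1;
      end
    endmodule

[Figure: a0 and a1 are added and the result is registered on clk to produce sum]
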
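Managing the FPGA Speed Booster: Pipelining

    // Multiply and add happen between the same pair of register stages.
    module no_pipeline (a, b, c, clk, out);
      input  a, b, c, clk;
      output out;
      reg    out;
      reg    a_temp, b_temp, c_temp;

      always @(posedge clk) begin
        a_temp <= a;
        b_temp <= b;
        c_temp <= c;
        out    <= (a_temp * b_temp) + c_temp;
      end
    endmodule

[Figure: a and b are multiplied and c is added within 1 cycle to produce out]

    // The product is registered in mult_temp, so the multiply and the add
    // each get their own clock cycle. Non-blocking assignments avoid a
    // simulation race on mult_temp between the two always blocks.
    // Note: c passes through one register while a and b pass through two,
    // so c must be presented one cycle later than a and b.
    module pipeline (a, b, c, clk, out);
      input  a, b, c, clk;
      output out;
      reg    out;
      reg    a_temp, b_temp, c_temp, mult_temp;

      always @(posedge clk) begin
        a_temp    <= a;
        b_temp    <= b;
        mult_temp <= a_temp * b_temp;
      end

      always @(posedge clk) begin
        c_temp <= c;
        out    <= mult_temp + c_temp;
      end
    endmodule

[Figure: the multiply and the add are separated by a register, taking 2 cycles to produce out]
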
Synthesis Technology Requirement
- RTL level portability
  – Counter inferencing
  – RAM inferencing
  – Operator inferencing
  – IO insertion
  – IO register mapping
- Architecture dependence
  – IP
  – FIFO
  – Pipeline multiplier
  – Dual port RAM
- Design management
  – Hierarchy manipulation
  – Error cross navigation
  – Design constraints
  – Incremental design
  – Timing analysis

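To make the inference items in the list above concrete, here is a minimal sketch of RTL written in a style that synthesis tools commonly recognize as a counter and as a small RAM. The module names `counter8` and `ram16x8`, the port names and the sizes are assumptions chosen for illustration. Whether a particular tool and version maps the memory array to device RAM rather than to flip-flops depends on that tool, so treat this as a pattern to try, not as guaranteed behavior.

```verilog
// Hypothetical 8-bit loadable up-counter with synchronous reset.
// A register that loads or increments itself under a clock is the usual
// pattern targeted by counter inference.
module counter8 (clk, rst, load, d, q);
  input        clk, rst, load;
  input  [7:0] d;
  output [7:0] q;
  reg    [7:0] q;

  always @(posedge clk) begin
    if (rst)       q <= 8'd0;       // synchronous reset
    else if (load) q <= d;          // parallel load
    else           q <= q + 8'd1;   // count up, wraps at 255
  end
endmodule

// Hypothetical 16 x 8 single-port memory: synchronous write, asynchronous read.
// A reg array written under a clock enable and read by address is the usual
// pattern targeted by RAM inference.
module ram16x8 (clk, we, addr, din, dout);
  input        clk, we;
  input  [3:0] addr;
  input  [7:0] din;
  output [7:0] dout;

  reg    [7:0] mem [0:15];          // storage array, 16 words of 8 bits

  always @(posedge clk) begin
    if (we) mem[addr] <= din;       // write on the clock edge when enabled
  end

  assign dout = mem[addr];          // read follows the address combinationally
endmodule
```

Keeping such blocks as plain RTL preserves portability across technologies; where a tool does not infer the desired resource, instantiating a vendor RAM, FIFO or IP block remains the architecture-dependent alternative named in the list.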
Verification Methodology
- Why verification?
- Xilinx Solution
[Figure: design flow from (V)HDL through Synthesis to Implementation Tools, with simulation at the behavioral, post-synthesis and post-layout stages]

Why Verify?
- Average 25% reduction in design cycle
  – Steve Winkelmans, DisplayTech.
[Figure: timelines in months comparing a design cycle with minimal verification against a design cycle with robust verification, each divided into Learn, Design, Implement, Verify and Debug phases]

Xilinx Solution
- HDL Simulation (MTI, Verilog-XL, VSS)
  – Behavioral (RTL)
  – Post-Synthesis
  – Post-Layout
- Static Timing Analysis (Xilinx)
  – Critical path timing check
- Future Verification Strategy
  – Formal Verification: rapid checking of design integrity
  – Cycle-Based Simulation: more test coverage in less time through synchronous design
  – Static Timing: Quad Motive, PrimeTime
[Figure: flow from HDL & Cores through Synthesis to Implementation]

Summary
- When coding, “think hardware”
- Synthesis support for all technologies
- Technology independence via inference
  – Operators, RAM (limited), counters
- Instantiation limited to RAM, FIFO and IP
  – Exemplar RAM inference today
  – Synplicity RAM inference in 5.0
- Robust verification methodology
- Identified areas for enhancement
  – Incremental design, modular design, improved delay estimation