Win with HDL

- FPGA Roadmap
- Design Requirements and Flow
- Design Re-Use and IP Integration
- Coding for Performance
- Synthesis Requirements
- Verification Methodology
- Summary


FPGA Roadmap

[Chart: Density/Performance plotted against Year, 1995 to 1999, with one step per device family]

- XC4000E: 0.5 µm, 5 Volt power supply, largest device XC4025E
- XC4000EX: 0.5 µm, 5 Volt power supply, largest device XC4036EX, 30% faster than E
- XC4000XL: 0.35 µm, 3.3 Volt power supply, largest device XC4085XL, 30% faster than EX
- XC4000XV: 0.25 µm, 2.5 Volt power supply, largest device XC40250XV, 25% faster than XL
- Virtex: 0.25/0.18 µm, 2.5 Volt power supply, 1M system gates


Design Requirements and Flow

Requirements:
- Increase speed and reduce cost
- Design re-use, IPs
- Coding for performance
- Architecture independence

Flow:
- (V)HDL, simulated at the behavioral level
- Synthesis, simulated post-synthesis
- Implementation tools, simulated post-layout


Intellectual Property

- Think IP at specification
- IP is the building block
  - The Xilinx suite includes ATM, DSP, HDLC, PCI, RISC and USB blocks
- IP solution goal
  - Reduced cost, increased system performance and increased predictability
- Xilinx delivery vehicles
  - IP optimized for the Xilinx architecture, with guaranteed functionality
  - Parameterized cores with CORE Generator, a web mechanism to download new cores, and links to system-level tools


Intellectual Property Delivery Vehicles

- AllianceCORE Program
  - Partner core development and service program
  - Sold and supported by the partner
- CORE Generator Software
  - Flexible and easy-to-use vehicle
  - Cores, application notes and data sheets


Coding for Performance

- FPGAs require better coding styles and more effective design methodologies
  - Pipelining techniques allow FPGAs to reach gate array system speeds
- Gate arrays can tolerate poor coding styles and design practices
  - 66 MHz is easy for a gate array
- Designs coded for a gate array tend to perform 3x slower when converted to an FPGA
  - It is not uncommon to see up to 30 levels of logic and 10-20 MHz FPGA designs
  - 6-8 FPGA logic levels = 50 MHz


Effective Coding Style: Case vs If-Then-Else

[Figure: 4-to-1 multiplexer with inputs in0 to in3, select sel and output mux_out]

    module mux (in0, in1, in2, in3, sel, mux_out);
      input in0, in1, in2, in3;
      input [1:0] sel;
      output mux_out;
      reg mux_out;

      // A case statement describes a single parallel 4-to-1 multiplexer.
      always @(in0 or in1 or in2 or in3 or sel) begin
        case (sel)
          2'b00:   mux_out = in0;
          2'b01:   mux_out = in1;
          2'b10:   mux_out = in2;
          default: mux_out = in3;
        endcase
      end
    endmodule

[Figure: priority chain of 2-to-1 multiplexers testing sel=00, sel=01 and sel=10 in turn, with in3 entering at the far end of the chain and output p_encoder_out]

    module p_encoder (in0, in1, in2, in3, sel, p_encoder_out);
      input in0, in1, in2, in3;
      input [1:0] sel;
      output p_encoder_out;
      reg p_encoder_out;

      // An if/else-if chain describes priority logic: each test is
      // evaluated only if all earlier tests failed.
      always @(in0 or in1 or in2 or in3 or sel) begin
        if (sel == 2'b00)
          p_encoder_out = in0;
        else if (sel == 2'b01)
          p_encoder_out = in1;
        else if (sel == 2'b10)
          p_encoder_out = in2;
        else
          p_encoder_out = in3;
      end
    endmodule


Effective Coding Style: Reduce Logical Levels of Critical Path

[Figure: in critical_bad the late-arriving signal critical passes through several logic levels before reaching out]

    module critical_bad (in0, in1, in2, in3, critical, out);
      input in0, in1, in2, in3, critical;
      output out;

      // critical is buried in the innermost term of the expression.
      assign out = (((in0 & in1) & ~critical) | ~in2) & ~in3;
    endmodule

[Figure: in critical_good the signal critical enters at the last gate before out]

    module critical_good (in0, in1, in2, in3, critical, out);
      input in0, in1, in2, in3, critical;
      output out;

      // critical is applied at the final gate, one level from out.
      // Note: as written, this is not logically identical to critical_bad
      // (the two differ when critical = 1 and in2 = in3 = 0); it illustrates
      // placing the late signal next to the output.
      assign out = ((in0 & in1) | ~in2) & ~in3 & ~critical;
    endmodule


Effective Coding Style: Resource Sharing

[Figure: two adders (a0 + b0 and a1 + b1) feeding a multiplexer controlled by sel, output sum]

    module poor_resource_sharing (a0, a1, b0, b1, sel, sum);
      input a0, a1, b0, b1, sel;
      output sum;
      reg sum;

      // Each branch has its own adder; sel picks between the two results.
      always @(a0 or a1 or b0 or b1 or sel) begin
        if (sel)
          sum = a1 + b1;
        else
          sum = a0 + b0;
      end
    endmodule

[Figure: two multiplexers controlled by sel select the operands, which feed a single adder, output sum]

    module good_resource_sharing (a0, a1, b0, b1, sel, sum);
      input a0, a1, b0, b1, sel;
      output sum;
      reg sum;
      reg a_temp, b_temp;

      // sel picks the operands first, so only one adder is needed.
      always @(a0 or a1 or b0 or b1 or sel) begin
        if (sel) begin
          a_temp = a1;
          b_temp = b1;
        end
        else begin
          a_temp = a0;
          b_temp = b0;
        end
        sum = a_temp + b_temp;
      end
    endmodule


Effective Coding Style: Register Duplication to Reduce Fan-Out

[Figure: one register tri_en, clocked by clk from en, drives the enables of all 24 tri-state buffers between [23:0]in and [23:0]out (24 loads)]

    module high_fanout (in, en, clk, out);
      input [23:0] in;
      input en, clk;
      output [23:0] out;
      reg [23:0] out;
      reg tri_en;

      // Nonblocking assignment is used for the clocked register.
      always @(posedge clk)
        tri_en <= en;

      always @(tri_en or in) begin
        if (tri_en)
          out = in;
        else
          out = 24'bZ;
      end
    endmodule

[Figure: two registers tri_en1 and tri_en2, both clocked by clk from en, each drive the enables of 12 tri-state buffers between [23:0]in and [23:0]out (12 loads each)]

    module low_fanout (in, en, clk, out);
      input [23:0] in;
      input en, clk;
      output [23:0] out;
      reg [23:0] out;
      reg tri_en1, tri_en2;

      // The enable register is duplicated so each copy drives half the loads.
      always @(posedge clk) begin
        tri_en1 <= en;
        tri_en2 <= en;
      end

      always @(tri_en1 or in) begin
        if (tri_en1)
          out[23:12] = in[23:12];
        else
          out[23:12] = 12'bZ;
      end

      always @(tri_en2 or in) begin
        if (tri_en2)
          out[11:0] = in[11:0];
        else
          out[11:0] = 12'bZ;
      end
    endmodule


Effective Coding Style: Design Partition - Reg at Boundary

[Figure: a0 and a1 are registered on clk, then added; sum leaves the module as a combinational output]

    module reg_in_module (a0, a1, clk, sum);
      input a0, a1, clk;
      output sum;
      reg sum;
      reg a0_temp, a1_temp;

      // Inputs are registered; the adder output is combinational.
      always @(posedge clk) begin
        a0_temp <= a0;
        a1_temp <= a1;
      end

      always @(a0_temp or a1_temp) begin
        sum = a0_temp + a1_temp;
      end
    endmodule

[Figure: a0 and a1 are added, and the result is registered on clk before leaving the module as sum]

    module reg_at_boundary (a0, a1, clk, sum);
      input a0, a1, clk;
      output sum;
      reg sum;

      // The module output is registered at the boundary.
      always @(posedge clk) begin
        sum <= a0 + a1;
      end
    endmodule


Managing FPGA Speed Booster: Pipeline

[Figure: 1 cycle. Registered a, b and c feed a multiplier followed by an adder, and the result is registered as out]

    module no_pipeline (a, b, c, clk, out);
      input a, b, c, clk;
      output out;
      reg out;
      reg a_temp, b_temp, c_temp;

      // Multiply and add must both complete within one clock period.
      // Nonblocking assignments make the result independent of statement order.
      always @(posedge clk) begin
        out    <= (a_temp * b_temp) + c_temp;
        a_temp <= a;
        b_temp <= b;
        c_temp <= c;
      end
    endmodule

[Figure: 2 cycles. The multiplier result is registered before the adder, and the adder result is registered as out]

    module pipeline (a, b, c, clk, out);
      input a, b, c, clk;
      output out;
      reg out;
      reg a_temp, b_temp, c_temp, mult_temp;

      // Stage 1: register the product.
      always @(posedge clk) begin
        mult_temp <= a_temp * b_temp;
        a_temp    <= a;
        b_temp    <= b;
      end

      // Stage 2: add. Nonblocking assignments avoid a simulation race
      // on mult_temp between the two clocked blocks.
      always @(posedge clk) begin
        out    <= mult_temp + c_temp;
        c_temp <= c;
      end
    endmodule


Synthesis Technology Requirement

- RTL level portability
  - Counter inferencing
  - RAM inferencing
  - Operator inferencing
  - IO insertion
  - IO register mapping
- Architecture dependence
  - IP
  - FIFO
  - Pipeline multiplier
  - Dual port RAM
- Design management
  - Hierarchy manipulation
  - Error cross navigation
  - Design constraints
  - Incremental design
  - Timing analysis


Verification Methodology

- Why verification?
- Xilinx solution

Flow:
- (V)HDL, simulated at the behavioral level
- Synthesis, simulated post-synthesis
- Implementation tools, simulated post-layout


Why Verify?

Average 25% reduction in design cycle (Steve Winkelmans, DisplayTech).

[Chart: two design-cycle timelines measured in months, one with minimal verification and one with robust verification, each divided into Learn, Design, Implement, Verify and Debug phases]


Xilinx Solution

- Flow: HDL & cores, then synthesis
- HDL simulation (MTI, Verilog-XL, VSS)
  - Behavioral (RTL)
  - Post-synthesis
  - Post-layout
- Static timing analysis (Xilinx)
  - Critical path timing check


Future Verification Strategy

- Formal verification
  - Rapid checking of design integrity
- Cycle-based simulation
  - More test coverage in less time through synchronous design
- Implementation
- Static timing
  - Quad Motive, PrimeTime


Summary

- When coding, "think hardware"
- Synthesis support for all technologies
  - Technology independence via inference: operators, RAM (limited), counters
  - Instantiation limited to RAM, FIFO and IP
  - Exemplar: RAM inference today
  - Synplicity: RAM inference in 5.0
- Robust verification methodology
- Identified areas for enhancement
  - Incremental design, modular design, improved delay estimation
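
The summary names inference of operators, RAM and counters as the route to technology independence. As a minimal sketch of what such inferable RTL can look like, the two modules below describe a loadable counter and a small RAM without instantiating any vendor primitive. The module names, widths and depth are hypothetical and do not come from the slides, and whether a given synthesis tool maps them to dedicated counter or RAM resources is an assumption that depends on the tool and the target device.

```verilog
// Hypothetical example: 8-bit loadable up-counter. The "+" operator
// leaves the choice of counter implementation to the synthesis tool.
module infer_counter (clk, rst, load, d, q);
  input        clk, rst, load;
  input  [7:0] d;
  output [7:0] q;
  reg    [7:0] q;

  always @(posedge clk) begin
    if (rst)
      q <= 8'b0;          // synchronous reset
    else if (load)
      q <= d;             // parallel load
    else
      q <= q + 8'd1;      // count up, wrapping from 255 to 0
  end
endmodule

// Hypothetical example: 16x8 RAM with synchronous write and
// asynchronous read. Whether this becomes on-chip RAM or plain
// registers depends on the tool and target (an assumption here).
module infer_ram (clk, we, addr, din, dout);
  input        clk, we;
  input  [3:0] addr;
  input  [7:0] din;
  output [7:0] dout;

  reg [7:0] mem [0:15];

  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;   // write on the rising clock edge
  end

  assign dout = mem[addr];  // read is combinational
endmodule
```

Because neither module names a device-specific cell, the same source can in principle be retargeted; where a tool cannot infer the structure, instantiation (for example of a RAM or FIFO core) remains the fallback noted in the summary.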