Redefining the FPGA
The first fully programmable system solution designed specifically for intellectual property.

Agenda
Technology Roadmap
Redefining the FPGA
Architecture Overview
– The CLB Tile, Vector Based Interconnect, Internal Bus Support, SelectRAM+, Clocking & DLLs, SelectI/O, Thermal Management & The SelectMAP Interface
Software & Cores Support
Summary - A System Level Solution

Technology Roadmap
Virtex: 1 Million+ System Gates with High Performance System Solution, 5LM - 0.25µm (7LM - 0.18µm)
XC4000XV: 3LM - 0.25µm (XC40250XV)
XC4000XL: 3LM - 0.35µm (XC4085XL)
XC4000EX: 2LM - 0.5µm (XC4036EX)
XC4000E: 2LM - 0.5µm (XC4025E)
[Chart: Density/Performance versus year, 1995 to 1999]

Redefining the FPGA
"Virtex moves FPGAs from glue to system component"
[Diagram labels: Chip 1, Chip 2, Low Voltage CPU, SRAM Cache (Mbytes), 133MHz SDRAM, 1x CLK, 2x CLK, High Speed System Backplane, LVCMOS, SSTL3, LVTTL, GTL+; callouts 1 to 4]

Redefining the FPGA
1 System Integration
2 System Memory
3 System Timing
4 System Interfaces
Value Extends Beyond the Socket

Redefining the FPGA
1 System Integration
Advanced Process Technology Allows for Almost 10x the Density of Today’s FPGAs
Extremely Dense
1,728 to 27,648 Logic Cells
Predictable Routing Delays Produce a Core Friendly Architecture With Fast Place & Route Times
[Diagram: 2ns routing delays]

Redefining the FPGA
2 System Memory
200 MHz Distributed SelectRAM
200 MHz Block SelectRAM
200 MHz Access to External Memory
[Diagram: RAMB4_S4_S16 with port A (WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0]) and port B (WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0])]

Redefining the FPGA
3 System Timing
[Diagram: CLKDLL with CLKIN, CLKFB, RST, CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV, LOCKED; a 90 MHz CLK feeds the Virtex DLLs, which produce 45 MHz (Divide by 2) and 180 MHz (Multiply by 2) and Route to Other Devices]

Redefining the FPGA
4 System Interfaces
SelectI/O Allows Connection Directly to External Signals of Varied Voltages & Thresholds
Future Standards Can be Supported Without Having to Make Silicon Changes
[Diagram labels: 5.0V, 3.3V, 2.5V, 1.8V, PCI, SSTL, HSTL, GTL, GTL+, AGP]

Redefining the FPGA
1 System
Integration
Intellectual Property is Critical for High Density Design & Must Drop in Easily Without Penalty Across an Entire Family
2 System Memory
Memory Bandwidth is Always Key
Size & Depth Requirements Vary Depending on the Application
3 System Timing
Chip to Chip Performance Typically Limits System Speeds
Clock Skew is an Important Factor in High Performance Systems
4 System Interfaces
Process Technology Leads to Mixed Voltage Systems
High Performance, Lower Power Signal Standards Have Emerged

Redefining the FPGA
[Diagram labels: Design Reuse; Designer #1, Designer #2; VHDL Design Environment, Verilog Design Environment; New Modules, IP Modules; AllianceCore 133MHz SDRAM; LogiCore 66MHz PCI; CoreGen DSP, FIFO; Giga-bit Ethernet; CPU; Virtex: 160 MHz I/O, 133 MHz Memory, 1 Million+ System Gates]

Redefining the FPGA
Extremely Dense
50,000 to 1,000,000 System Gates
1,728 to 27,648 Logic Cells
System Performance & Features
160 MHz+ System Performance
Multiple DLLs & Block SelectRAM
Supports Multiple I/O Standards
[Diagram layers: IP, Software, Internal Performance & Features, System Building Blocks]
100 MHz+ at 3 to 4 Logic Levels
TBUFs & Distributed SelectRAM
Fast, Flexible I/Os
Superior Intellectual Property Infrastructure - CoreGen & Web
Segmented Routing
4-Input LUT Architecture
Leading Edge Process Technology
Proven Software Flows for High Density & Performance - M1.5
The World’s First Fully Programmable System-Level Architecture

Architecture Overview
1 The CLB Tile
2 Block SelectRAM, Distributed SelectRAM
3 DLL
4 SelectI/O
Thermal Management
SelectMAP Configuration
[Diagram labels: 2ns routing delays; RAMB4_S4_S16 with port A (WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0]) and port B (WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0]); CLKDLL with CLKIN, CLKFB, RST, CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV, LOCKED; 5.0V, 3.3V, 2.5V, 1.8V, PCI, SSTL, HSTL, GTL, GTL+, AGP]

The CLB Tile
1 System Integration
Advanced Process Technology Allows for Almost 10x the Density of Today’s FPGAs
Extremely Dense
1,728 to 27,648 Logic Cells
[Diagram: 2ns routing delays]
Predictable Routing Delays Produce a Core Friendly
Architecture With Much Faster Place & Route Times

The CLB Tile
All CLB Inputs Have Access to Interconnect on All 4 Sides
CLB Tile is Composed of a Switch Matrix, Configurable Logic Block, and Associated General Routing Resources
Slices Have a Bit Pitch of 2
Fast Local Feedback Within the CLB & Direct Connects to Adjacent Horizontal Neighbors
Wide Single CLB Functions
CLB is Divided into Two Identical Slices
[Diagram labels: SWITCH MATRIX, CLB, SLICE, CARRY, SINGLE, HEX, LONG, DIRECT CONNECT, Local Feedback, INTERNAL BUSSES, TRISTATE BUSSES]

Simplified CLB Structure
2 Slices in Each CLB
Virtex Slice is Similar in Contents to the Current XC4000 CLB
2 BUFTs Associated with Each CLB, Accessible by All 8 CLB Outputs
[Diagram: CLB with two slices; each slice shows two LUTs, carry logic, and two flip-flops (D, Q, CE, PRE, CLR)]

Detailed Slice Structure
[Schematic labels: two LUT/RAM/ROM/SHIFT blocks (inputs G1 to G4 and F1 to F4, mapped to A1 to A4, with WS, DI, O); Write Strobe Logic; Data In Multiplex Logic; carry chain CIN to COUT; BX and BY inputs; F5 from other slice; Position of F5 tap on other slice; outputs X, XB, XQ, Y, YB, YQ; flip-flops with D, S, R, Q, CE; controls CE, CLK, SR, GSR]
* Controlled by the same pair of memory cells
** Implemented as extra inputs on the BX input mux
*** CLK and SR inputs are common to both slices

Wide Single CLB Functions
Implement 13-Input Functions in a Single CLB
Builds on XC4000 Architecture 9-Input Function
2 Logic Levels and 1 Local Interconnect Yield a 2.5ns Max Delay
[Diagram: two slices in a CLB; 1.1ns LUT, 0.3ns local interconnect, 1.1ns LUT, 2.5ns total]

Slice Features
Two 4-Input LUTs in Each Slice
Includes 2 Highly Flexible Sequential Elements
Dedicated Logic for 4x1 & 8x1 Muxes
Fast Look Ahead Carry Logic
Dedicated Multiplier Fabric
New SelectShift Feature
– Create Shift Registers up to 16 Cycles Deep in a Single 4-Input LUT
4-Input LUTs can be used as Distributed SelectRAM
– Same as XC4000 Synchronous Modes - Single & Dual Port

Flexible Sequential Elements
Sequential Elements Can be Flip-flops or Latches
2 in Each Slice, 4 in Each CLB
Can be Sourced from LUTs or an Independent CLB Input
Separate Set & Reset Controls
Controls Can be Synchronous or Asynchronous
GSR Can be Used for Power On Set/Reset
All Controls Can be Inverted
Controls are Shared Within Each Slice
[Diagram: FDRSE (D, S, CE, R, Q), FDCPE (D, PRE, CE, CLR, Q), LDCPE (D, PRE, CE, G, CLR, Q)]

Fast Efficient Muxes
Primary Use of XC4000 HMAP was to Implement a 2x1 Mux
Dedicated Muxes are Faster & More Space Efficient
Space Freed Up is Used for Muxes & Other Special Logic
MUXF5 Can be Used to Combine the Two LUTs in a Slice to Create a 4x1 Mux or Any Function of 5 Inputs
MUXF6 Can be Used to Combine the Two Slices in a CLB to Create an 8x1 Mux or Any Function of 6 Inputs
[Diagram: CLB with two slices; each slice has two LUTs and a MUXF5, joined by a MUXF6]

Fast Look Ahead Carry Logic
Simple, Fast & Complete Arithmetic Logic
Vertical, Up Only Carry Direction
Look Ahead Carry Implementation Yields 32-Bit Counters & Arithmetic Functions that Perform at 100MHz+
Discrete XOR Component for Single Level Sum Completion
2 Separate Carry Chains in CLB Allow for 3 Operand Functions
[Diagram: four LUTs, each driving a 0/1 carry mux]

Dedicated Multiplier Fabric
Highly Efficient ‘Shift & Add’ Implementation
Logic Added for Implementation of Binary Tree Style Multipliers
30% Reduction in Area for a 16x16 Multiply & 1 Less Logic Level
[Diagram labels: LUT A, LUT B, LUT, MULT_AND, AxB, CY_MUX (S, DI, CI, CO), CY_XOR]

SelectShift
Dynamically Addressable Shift Registers - DASRs
Ultra-Efficient Programmable Clock Cycle Delay
Serial In, Serial Out, Clock, Clock Enable, and Shift Depth Address
Single LUT Maximum Cycle Delay of 16
Cascade DASRs for Cycle Delays Greater than 16
CLB Flip-Flops Can be Used for Other Functions or to Add to DASR Depth
[Diagram: a LUT drawn as a chain of clock-enabled flip-flops with IN, CE, CLK, DEPTH[3:0], OUT; CLB with two slices of two LUTs each]

SelectShift
[Diagram: 64-bit data paths; Operation A and Operation B (4 Cycles and 8 Cycles, 12 Cycles total) alongside Operation C (3 Cycles); 9-Cycle Imbalance]
Register Rich FPGAs Allow for the Addition of
Pipeline Stages to Increase Throughput
Data Paths Must be Balanced to Maintain Desired Functionality

SelectShift
Paths Statically Balanced
SelectShift Feature of the 4-Input LUT Can be Used to Create NOPs
Above Example Uses 64 LUTs to Replace 576 Flip-flops (64*9)
[Diagram: 64-bit data paths; Operation A and Operation B (4 Cycles and 8 Cycles, 12 Cycles total) alongside Operation C (3 Cycles) followed by Operation D - NOP (9 Cycles), 12 Cycles total]

SelectShift (continued)
Paths Dynamically Balanced
SelectShift Depth Can be Dynamically Changed
Above uses 64 LUTs to Replace 704 Flip-flops & 64 2x1 Muxes
[Diagram: 64-bit data paths; Operation A and Operation B (4 Cycles and 8 Cycles, 12 Cycles total) alongside Operation C (3 Cycles) followed by Operation D - NOP (1/10 Cycles), with a "# NOP Cycles" control; second label: Paths Statically Balanced]

Internal Bus Support
One Pair of BUFTs Associated with Each CLB
Same ‘Pitch’ as Slice Carry Logic - 2 Bits/Slice
Each BUFT has an Independent Control Input
All CLB Outputs can Source Either BUFT Data Input
Combine BUFTs to Create Wide Muxes
Replace LUT Based Mux Logic to Increase Density
Much Faster than Previous Architectures
Approximately 10ns to Span Entire XCV1000 - 96 Columns
Ties Groups of 4 BUFTs with Bi-directional Look Ahead Scheme Similar to Slice Carry Logic

Internal Bus Support
And-Or Implementation Replaces Three-State Drivers
Simultaneously Driving BUFTs will not Cause Contention
Capacitance of Entire Load Reduced Dramatically
Slow, Power Hungry Pullups & Weak Keepers Unnecessary
Output Flexibility
Removal of Pullups Allows for Outputs to Span Rows
Segments of 4 Columns Allow for Many Outputs Per Row

High Performance Routing
General Purpose Routing
Routing Delay Depends on Radial Distance
Routing Structure Designed to Handle High Fanout Nets
1000+ Loads - Sub 10ns
Much More Predictable
Predictability is Critical for Core Integration & Reuse
Optimized for 5 Layer Metal
[Diagram: CLB Array with 2ns routing delays]

High Performance Routing
Significant Compile Time Reduction Without Performance Penalty
[Diagram labels: CARRY, SINGLE, HEX, SWITCH MATRIX, SLICE, DIRECT CONNECT]
[Diagram labels: Local Feedback, CLB, CARRY, LONG, SINGLE, HEX, TRISTATE BUSSES, INTERNAL BUSSES]
Algorithmically Friendly Structure
Allows For Optimal Connection Delay, Power, Capacitance & Resource Utilization
Combined With Timing Driven Place & Route Yields Superior Path Delays
Increasing Device Utilization Does Not Decrease Design Performance
Resource Mix Optimized for Large Devices - Optimized for 5 LM
Segmented Routing Architecture

High Performance Routing
Advanced Local CLB Routing
Massive Hierarchical General Routing Resources
Designed For Speed
24 Singles, 72 Hexes, 12 Longs per Tile (4KXL: 8 Singles, 4 Doubles, 12 Quads, 12 Longs per Tile)
Selective Connectivity Between Resource Types to Limit Loading
Longs and Hexes Can be Used as Secondary Global Resources for Clocks and Controls With Sub 10ns Delays
Special Backbone Routing in Top and Bottom I/O Edges to Connect Vertical Longs to Create Low Skew Resources
Increased Switch Matrix Connectivity
Higher Connectivity Eliminates Congestion

Advanced Local CLB Routing
Each LUT Output Can Connect to the Three Other LUTs
100ps to 300ps Maximum Delay
Create 13-Input Functions Within the Same CLB - 2.5ns Total Delay
Synthesis Tools Use FastConnects on Critical Paths
IMUX Receives 96 Connections from General Routing Matrix (GRM)
Highly Exhaustive Connection Matrix
OMUX Equivalent to 8-bit 13x1 Mux
All 8 Outputs Connect to the GRM
2 Outputs Can be Used to Connect Directly to the Horizontal Neighbors
All Outputs Can Feed the 2 BUFTs
[Diagram: CLB with two slices of two LUTs each]

Massive Hierarchical Resources
Routing Needs Based On XCV1000
Loading of Resources Minimized While Connectivity Increased
Both Long Lines & Hexes are Buffered To Reduce RC Delays
Longs Have Access Every 6 Tiles
Hexes Have Access at Ends & Middle
Special Hexes Added to Top and Bottom to Create High Fanout Resources with Vertical Long Lines
Horizontal Singles Connect Directly to Vertical Long Lines for Fast Control Signal Distribution

Increased Matrix Connectivity
Previous Families Use Planar Pipulation
Allows for Routing Along Same Channel
Restricts Connectivity of Dissimilar Resources
Virtex Devices Use Non-Planar Pipulation
Allows for Routing Across Resource Types
Longs Drive Hexes, Hexes Drive Hexes and Singles, Singles Drive Singles and CLB IMUXs - Vertical Hexes Drive CLB Control Inputs As Well
CLB OMUXs Drive All Types
Switch Matrix Connectivity Determines Design Routability
Increased Switch Matrix Connectivity Alleviates Congestion
[Diagrams: Planar pipulation; Non-Planar pipulation]

SelectRAM+
2 System Memory
200 MHz Distributed SelectRAM
200 MHz Block SelectRAM
200 MHz Access to External Memory
[Diagram: RAMB4_S4_S16 with port A (WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0], DOA[3:0]) and port B (WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0], DOB[15:0])]

SelectRAM+ Hierarchy
Distributed SelectRAM
– Proven Synchronous RAM of the XC4000 Families
– 16x1 Implemented in a LUT - 4 in Each CLB
– 32x1 Implemented in a Slice - 2 in Each CLB
– Ideal for DSP Applications
Block SelectRAM
– True Dual Port, Fully Synchronous RAM
– 4096-Bit Block Configurable in Widths From 1 to 16
– Ideal for Data Buffers & FIFOs
Fast Access to External RAM
– 133MHz Direct Interface to SSTL3, 3.3V Synchronous DRAM

Distributed SelectRAM
Builds on XC4000 Tradition
Synchronous Write
Asynchronous Read
No Asynchronous Write
Use a Single LUT to Create a RAM16X1S
Use a Pair of LUTs to Create a RAM32X1S or RAM16X1D
RAM16X1D Comes With One R/W Address & One Read Only Address
Accompanying Flip-Flops Can Be Used to Register Read
[Diagram: slice with two LUTs; RAM16X1S (D, WE, WCLK, A0 to A3, O); RAM32X1S (D, WE, WCLK, A0 to A4, O); RAM16X1D (D, WE, WCLK, A0 to A3, DPRA0 to DPRA3, SPO, DPO)]

Block SelectRAM
True Dual Port Synchronous RAM
2 R/W Ports with Independent Controls
Synchronous Read & Write
Flexible 4096-Bit Block
Variable Aspect Ratio
Each Port can be a Different Width
Synchronous Reset & INIT Values
[Diagram: RAMB4_S#_S# with port A (WEA, ENA, RSTA, CLKA, ADDRA[#:0], DIA[#:0]) and port B (WEB, ENB, RSTB, CLKB, ADDRB[#:0], DIB[#:0])]
Block Count Increases With FPGA Size
8 Blocks in the
XCV50 - 32Kb
32 Blocks in the XCV1000 - 128Kb
Located on Left & Right Sides with 1 Block Every 4 Rows
State Machines, Decodes, Etc
Sub-10ns Cycle Time For All Widths
[Diagram outputs: DOA[#:0], DOB[#:0]]

Allowed Widths
#/Width   ADDR     DATA     Depth
1         (11:0)   (0:0)    4096
2         (10:0)   (1:0)    2048
4         (9:0)    (3:0)    1024
8         (8:0)    (7:0)    512
16        (7:0)    (15:0)   256

Block SelectRAM
Library Name Specifies Port Configuration
RAMB4_S4_S16
Port A In, 1K Depth: WEA, ENA, RSTA, CLKA, ADDRA[9:0], DIA[3:0]
Port A Out, 4-Bit Width: DOA[3:0]
Port B In, 256 Depth: WEB, ENB, RSTB, CLKB, ADDRB[7:0], DIB[15:0]
Port B Out, 16-Bit Width: DOB[15:0]
Each Dual Port can be configured with a different width

Block SelectRAM
The Dual Ports Access the Same 4096 Bits
Combine Blocks For Additional Depth & Width
The Depth/Width Ratio Determines How the Bits are Accessed
For Example: A RAMB4_S4_S16 Has a 1kx4 Port & a 256x16 Port
Provides Easy Data Width Conversion Without Any Additional Logic

4096-Bit Storage When Viewed by a Port Configured as 1kx4 (columns are Nibble 1 to 16)
DOA[0]: Bit0 Bit4 Bit8 Bit12 Bit16 Bit20 Bit24 Bit28 Bit32 Bit36 Bit40 Bit44 Bit48 Bit52 Bit56 Bit60
DOA[1]: Bit1 Bit5 Bit9 Bit13 Bit17 Bit21 Bit25 Bit29 Bit33 Bit37 Bit41 Bit45 Bit49 Bit53 Bit57 Bit61
DOA[2]: Bit2 Bit6 Bit10 Bit14 Bit18 Bit22 Bit26 Bit30 Bit34 Bit38 Bit42 Bit46 Bit50 Bit54 Bit58 Bit62
DOA[3]: Bit3 Bit7 Bit11 Bit15 Bit19 Bit23 Bit27 Bit31 Bit35 Bit39 Bit43 Bit47 Bit51 Bit55 Bit59 Bit63

4096-Bit Storage When Viewed by a Port Configured as 256x16 (columns are Word 1 to 4)
DOB[0]: Bit0 Bit16 Bit32 Bit48
DOB[1]: Bit1 Bit17 Bit33 Bit49
DOB[2]: Bit2 Bit18 Bit34 Bit50
DOB[3]: Bit3 Bit19 Bit35 Bit51
DOB[4]: Bit4 Bit20 Bit36 Bit52
DOB[5]: Bit5 Bit21 Bit37 Bit53
DOB[6]: Bit6 Bit22 Bit38 Bit54
DOB[7]: Bit7 Bit23 Bit39 Bit55
DOB[8]: Bit8 Bit24 Bit40 Bit56
DOB[9]: Bit9 Bit25 Bit41 Bit57
DOB[10]: Bit10 Bit26 Bit42 Bit58
DOB[11]: Bit11 Bit27 Bit43 Bit59
DOB[12]: Bit12 Bit28 Bit44 Bit60
DOB[13]: Bit13 Bit29 Bit45 Bit61
DOB[14]: Bit14 Bit30 Bit46 Bit62
DOB[15]: Bit15 Bit31 Bit47 Bit63

Block SelectRAM
[Diagram labels: RAMB4_S1; 0 WE, 1 EN, 0 RST; Clock, CLK; A[31:20]; N/C; DO; table entries 4095 FFFXXXXX, 4094 FFEXXXXX, 4093 FFDXXXXX]
Subdivide 32-Bit Address
Space into 4096 1MB Blocks
Using a DLL, the Enable is Available Only 5.1ns After the Rising Edge of the External System Clock
Build State Machines & PROM Based Address Decodes
[Diagram labels: Enable, ADDR[11:0], DI[7:0]; table entries 0002 002XXXXX, 0001 001XXXXX, 0000 000XXXXX]

Clocking & DLLs
3 System Timing
[Diagram: CLKDLL with CLKIN, CLKFB, RST, CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV, LOCKED; a 90 MHz CLK feeds the Virtex DLLs, which produce 45 MHz (Divide by 2) and 180 MHz (Multiply by 2) and Route to Other Devices]

General Clock Support
4 Dedicated Global Low Skew Buffers
Dedicated Input Pin - Intended to Distribute Clocks Only
66 MHz PCI Performance With 500ps Maximum Skew
– 3ns TSetup / 0ns THold - Input IOB Flip-flop with No Data Delay
– 6ns TClock2Out - Output IOB Flip-flop
24 Additional Shared Resources
Intended to Distribute Low Skew/High Fanout Signals
Distribute Control Signals Across the Device under 10ns
– additional clocks, clock enables, three-state controls & resets
4 Delay Lock Loops on Each Device
100% Digital Implementation
2 Global Buffers Associated with Each DLL Pair

DLLs Versus PLLs
Both types are used to remove clock delay & provide additional clocking functionality
Frequency synthesis, Phase adjustment & clock conditioning
Both can be implemented using either analog or digital logic
DLLs use Programmable Delay Line in Conjunction with Control Logic that Selects the Delay to Match the Distribution
PLLs use Programmable Oscillators in Conjunction with Phase Detectors & Filters to Phase Adjust the Clock
[Diagrams: DLL (CLKIN, Programmable Delay Line, Control Logic, CLKOUT, Clock Distribution, CLKFB); PLL (CLKIN, Programmable Oscillator, Control Logic, CLKOUT, Clock Distribution, CLKFB)]

DLLs Versus PLLs
The Oscillator Used in a PLL Inherently Introduces Instability & Phase Error
The DLL Architecture is Unconditionally Stable and Does Not Accumulate Phase Error
It is Generally Accepted that DLLs are Better for Delay Compensation and Clock Conditioning
PLLs Typically Have an Advantage When Performing Frequency Synthesis and Can Operate Over a Larger Input Clock
Frequency

DLL Functions
Speedup Tc2o: Zero-Delay Internal Clock Buffer
Clock Phase Synthesis: For Use Internally Or Externally
Clock Multiplication & Division: For Use Internally Or Externally
Clock Mirror: Zero-Delay Board Clock Buffer
[Diagram: Virtex devices illustrating each function]

DLL Functions
Speedup Tc2o by Eliminating Clock Distribution Delay
Generate Phase Shifted Clocks
Perform Clock Multiplication & Division
Cleanup Clocks with 50/50 Duty Cycle Correction
Generate Clock Lock for Internal & External Use
Can Require Configuration to Synchronize with DLL Lock
DLL Feedback can be Connected Internally or Externally
Can be Used to Create Clock Mirrors & Perform System Synchronization

DLL Tc2o Speedup
Nullify Clock Delay - Fast Tc2o on XCV1000
External CLKext pin and Internal CLKint pin are Aligned
2.5ns Setup/0.0ns Hold & 3.5ns Tc2o on All Devices
Optional Duty Cycle Correction
50/50 Duty Cycle Correction Applied when Specified
Not sensitive to clock input noise - use standard cans
[Diagram: CLKext through the DLL to CLKint with Tclock = 0ns; output flip-flop (D, Q) drives OUT; Tc2q + Tout = Tc2o]

DLL Phase Shift
Coarse Phase Shifts Available
0°, 90°, 180°, and 270°
Available for Internal & External Use
50/50 Duty Cycle Correction Available
[Example: 100MHz - 180° Phase Shift; the DLL outputs 100 MHz (0° Phase) and 100 MHz (180° Shift)]

DLL Multiplication
Generate 2x & 4x Clocks
Reduce Board EMI and Trace Concerns by Routing Low Frequency Clocks Externally and Multiplying Internally
Cross Clock Domains Without Worry
Multiplied & Divided Clocks Have Synchronized Edges
No External Clock Drift & Minimal External Clock Skew
Eliminates Metastable Events
[Diagram labels: Data Buffer, IO, Internal Logic, bus widths 16, 16, 32; DLL, CLK, 2x]

DLL Multiplication
2 DLLs on Top & Bottom
Use 1 DLL on an Edge for 2x Multiplication or Both for 4x Multiplication
180 MHz Maximum Output Frequency
[Example: 66MHz - 2x Clock Multiplication; DLL input 66 MHz, output 132 MHz (Multiply by 2)]

DLL Division
Selectable Division Values
1.5, 2, 2.5, 3, 4, 5, 8, or 16
50/50 Duty Cycle Correction Available
Use DLL Pair to Combine Functions
[Diagram labels: Input, 180, 2X]
30 MHz - 180° Phase
Shift
[Diagram labels: DV2; DLL with 30 MHz, 30 MHz (180° Shift), 30 MHz Used for FB; DLL with 30 MHz (180° Shift), 15 MHz (Divide by 2), 60 MHz (Multiply by 2)]
30 MHz 180° Phase Shift - Clock Multiply & Clock Divide

Clock Mirrors
Generate Clock Mirrors for Cascaded & Other Devices
Extremely Low Output Skew
Rising Edge Skew -20ps*
Falling Edge Skew +40ps*
*Actual Device Measurements
[Example: 100MHz - 100MHz Clock Mirror; DLL with 100 MHz LVTTL Input, 100 MHz LVTTL Output, Feedback from External Trace]

System Synchronization
Synchronize All Devices
Eliminate Clock Skew
Nullify Clock Input & Board Delay in Addition to Internal Distribution Delay
Chip to Chip Race Conditions Removed
Increase Chip to Chip Interface Speed - 160MHz
[Diagram: CLK distributed through DLLs to FPGA 1, FPGA 2, FPGA 3, through FPGA N, each with its own DLL]

DLL Modes
Low Frequency
Input Frequency Range - 25 MHz to 100 MHz
Minimum High/Low Time - 2.2 ns
All 6 Outputs Available for use Internally & Externally
– CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV
High Frequency
Input Frequency Range - 60 MHz to 200 MHz
Minimum High/Low Time - 2.2 ns
3 Outputs Available for use Internally & Externally
– CLK0, CLK180 & CLKDV
Both Modes Supported with Simple Design Primitives
VHDL & Verilog Simulation Support Available

DLL Software Support
Use BUFGDLL Macro for Common Clock Usage
Build Complex Structures Using CLKDLL Primitive
[Diagram: BUFGDLL (0ns) and its Equivalent Structure: PAD, IBUFG, DLL with FB, BUFG, To distributed clock network; CLKDLL primitive with CLKIN, CLKFB, RST, CLK0, CLK90, CLK180, CLK270, CLK2X, CLKDV, LOCKED]

SelectI/O
4 System Interfaces
SelectI/O Allows Connection Directly to External Signals of Varied Voltages & Thresholds
Future Standards Can be Supported Without Having to Make Silicon Changes
[Diagram labels: 5.0V, 3.3V, 2.5V, 1.8V, PCI, SSTL, HSTL, GTL, GTL+, AGP]

Supply Voltage Migration
Process Technology Migration Leads to Mixed Voltage Systems
Lower cost
Faster speed
Higher density
Lower power
[Chart: Feature Size (µm), 0 to 1.2, and Voltage (5.0, 3.3, 2.5, 1.8, 1.3) versus year, 1990 to 2002; marker: Virtex FPGAs Ship]

Supply Voltage Migration
Diagram labels: 5V, 3.3 V, 2.5 V, I/O
Supply, Logic Supply, Accepts 5 V levels, Meets TTL Levels, Any 5V device (XC4000E), Virtex & XC4000XV (2.5 V logic, 3.3 V I/O), Any 3.3 V device (XC4000XL), 5V, 3.3 V
Supply Voltage Sequencing Independent
Virtex Supports Additional I/O Standards

SelectI/O
Allows Connection & Use of a Wide Variety of Devices
Processors, Memory, Bus Specific Standards, Mixed Signal...
Provides Industry Standard IEEE/JEDEC I/O Standards
Maximizes Speed/Noise Tradeoff - Use Only What is Needed
Can Connect to or Create High Performance Backplanes
– PCI, GTL+, HSTL
– DIY - Virtex Based Backplane Design in Progress
Define I/O by Simply Placing Desired Input And/Or Output Buffers Into the Design
Special IBUF and OBUF Components Provided in Schematic Based and HDL Based Design Flows
For Example: SSTL3, Class I Output Buffer - OBUF_SSTL3_I

Simplified IOB Structure
Fast I/O Drivers
Separate Registers for Input, Output & Three-State Control
Asynchronous Set or Reset Available on Each Flip-flop
Common Clock, Separate Clock Enables
Programmable Slew Rate, Pullup, Input Delay, Etc
Selectable I/O Standard Support
Supported Standards List can be Updated After Testing
[Diagram: three DFF/LATCH elements (D, Q, CE, S/R) and the PAD]

How It Works
[Diagram: Configuration Bits select the SelectI/O Output (OBUF_SSTL3_I, SSTL3 Class1 Output Driver) and the SelectI/O Input (IBUF_SSTL3_I, SSTL3 Class1 Input Receiver)]

How It Works
Separate I/O & Core Supply Rails
Programmable Driver Strength
P & N Drivers Individually Controlled
16 Different Settings for Each
Variable I/O & Vref Voltages
8 Banks on Each Device
Specific I/Os are Used as Reference Inputs
Differential Inputs Supported
nMOS for High Vref
pMOS for Low Vref
[Diagram label: VCCO]

Currently Supported Standards
Standard: LVTTL, LVCMOS2, PCI 33MHz 3.3V, PCI 33MHz 5.0V, PCI 66MHz 3.3V, GTL, GTL+, HSTL-I, HSTL-III, SSTL3-I, SSTL3-II, SSTL2-I, CTT, AGP
VCCO: 3.3, 2.5, 3.3, 3.3, 3.3, na, na, 1.5, 1.5, 3.3, 3.3, 2.5, 3.3, 3.3
Vref: na, na, na, na, na, 0.80, 1.00, 0.75, 0.90, 0.90, 1.50, 1.10, 1.50, 1.32
Application: General Purpose, PCI, Back-Plane, Hitachi SRAM, SDRAM, Memory, Graphics

I/O Performance
Virtex Chip-Chip I/O Performance
Maximum Chip to Chip I/O Frequency = 1/(Tsetup + Tc2o)*
*DLLs Used to Eliminate Clock Distribution Delay
[Bar chart: I/O Standard versus frequency, axis 0 to 200; standards shown: SSTL3, AGP, HSTL IV, PCI-3.3V, LVCMOS2.5V, TTL-Fast 24mA, TTL-Fast 12mA, TTL-Slow 12mA, TTL-Slow 2mA]

SelectI/O Banks
[Diagram: device outline with BANK 0 to BANK 7 around its edges]

SelectI/O Banks
Each Device is Broken into 8 Banks Regardless of Size
2 Banks on Each Side of the Device
Each Bank has Voltage Sources Shared Among Associated I/Os in that Bank
All I/O Requiring a Voltage Source Must be of the Same Type
Input Banking - Vref
I/O Standards Which use a Differential Amplifier Require a Voltage Reference Input
All Fixed Location/Dual Purpose Vref Inputs in a Bank Must be Used When Supplying a Voltage Reference
Output Banking - Vcco
Dedicated Pins provide drive source voltage for output pins

SelectI/O Input Banks
1 Voltage Reference can be Supplied in a Bank
Any input not requiring a Vref can be placed in Bank
Flexible Use of Voltage Reference Inputs
Pins Can be Used as General Purpose I/O If a Voltage Reference is Not Needed - All Must be Used to Supply a Voltage Reference
Locations are Fixed for Each Device/Package Combination
Any Single Output Buffer Type Can be Placed in the Bank
Multiple Output Buffer Types Must Adhere to Output Bank Rules
OBUFTs with Keeper Circuits Requiring a Voltage Reference are Treated as IOBUFs

SelectI/O Output Banks
Only One Vcc Output is Supplied to Each Bank
Any Output Not Requiring Use of the Vcc Output can be Placed in the Bank
Any Single Input Buffer Type Can be Placed in the Bank
Multiple Input Buffer Types Must Adhere to Input Bank Rules
Special Consideration Must be Given to Configuration I/O
Configuration I/O is Located on the Right Side of the Device
Serial PROM Downloads Require Vcco Set to 3.3V In Banks 2 & 3
Non-PROM Serial Downloads will generate warning (Even though Vcco Connection dependent on data source)

Thermal
Management

Thermal Challenge
Today’s FPGA Density is Absorbing Large Percentages of Board Designs
Because of its Highly Dynamic Nature, Power Can Only be Estimated Before Design Completion
Even as Voltages Decrease, Power Consumption is a Major Concern
How do I Know My Die Temp is Within Spec?
[Diagram labels: Ambient Temp, Data Demands, Heat Sinking, Vcc Tolerance]
Virtex XCV1000: 75M Transistors*, 100+ MHz, Advanced Signal Processing Apps, 20W+ Power Dissipation
* Pentium II = 7.5 Million Transistors

Thermal Solution
Remote Die Sensor Specially Designed to be Used With the Maxim MAX1617
Simple 2-Pin Interface with no Calibration Required
Provides Two Channels
– FPGA Die Temp Reported from -40°C to +125°C at +/- 3°C
– Maxim Die Temp also at +/- 3°C
Programmable Over-Temp & Under-Temp Alarms
Same Technology as Pentium II
System Management is Now Possible
[Diagram: Virtex DXP and DXN pins connect to the Maxim MAX1617 DXP and DXN pins; the MAX1617 provides a 2-Pin SMBus Serial Interface (SMBCLK, SMBDATA) and an ALERT* Interrupt]

SelectMAP
Advanced Configuration
[Diagram: Virtex configuration modes: Master/Slave Serial (Simple Serial Interface), JTAG (System Integrated Serial), SelectMAP (High Performance Parallel)]
Simplified Configuration Mode Set
50 Megabyte/Second Download Rate Using SelectMAP
Dedicated JTAG Port - No Contention Issues
No Master Parallel Support
Direct, JTAG & SelectMAP Device Readback

Software & Cores Support

HDL Design Entry Focus
Synthesis Support is Critical for Large Designs
Architecture Decisions Made Based on Synthesis Tool Tendencies
Xilinx Relationships With Synthesis Vendors Initiated Direct 4-Input LUT & Carry Chain Synthesis - The Building Blocks of XL & Virtex
Xilinx Will Continue to Drive Synthesis Vendors to Support Virtex Specific Features - Block SelectRAM, SelectShift & CLKDLLs
Virtex Architecture Adds Additional Resources That Synthesis Vendors Easily Synthesize To Today
Implementation Software Written With Synthesis Tool Flow Focus
All Three Major Synthesis Vendors Supported Virtex for Beta
Large Designs Also Require Team Based Design
Must be able to Support Multiple
Designers on the Same Device as Well as Core Integration

Implementation Software
Virtex Software is built on proven M1 technology
Builds on Robust Integration with Third Party Design Entry Tools
Emphasizes Constraint Driven Design Philosophy
Vector Based Interconnect Yields More Predictable Routing Results
Predictable Results Allow the Placement Algorithms to Make Better Routing Estimations in Much Less Time
Architecture fully software tested before 1st silicon
Virtex Implementation Software Was Available 18 Months Before Actual Silicon was Produced
Used Proven Place & Route Software as a Gauge of the Architecture’s Ability to Meet Density & Performance Needs
Early Software Allowed for Changes to be Made in the Finalization of the Architecture - Necessary Routing Mix, Special Features, etc

A System Level Solution
1 System Integration
2 System Memory
3 System Timing
4 System Interfaces
Virtex is a True System Level Solution

A System Level Solution
Virtex Opens New System Level Applications to FPGAs
1 Extremely Dense - 50,000 to 1,000,000 System Gates
Flexible Architecture
– Efficient for Random Logic, Memory, DSP & Data Path Circuits
– Automatically Implemented by Today’s Leading Synthesis Vendors
Vector Based Interconnect
– Much More Predictable Before Place & Route
– Enhances Synthesis Based Flows
Excellent Platform for Core Integration
– Software Based on Proven M1 Timing Driven Place & Route
2 Hierarchical Memory Support
SelectRAM+ Can be Used to Create Bytes or KBytes of Internal Storage and Access MBytes of Fast External Memory

A System Level Solution
3 System Speedup & Synchronization
Nullify Clock Distribution Delays - 160 MHz System Performance
Synthesize Clocks for Internal and External Use
Synchronize Systems - Create Clock Mirrors & Nullify Board Delay
4 Flexible System Interface
Controllable Current, Input Vref and Vcco Characteristics
Connect Directly to Existing & Emerging I/O Standards
SelectMAP Protocol Allows for Easy Interfacing to µControllers
and µProcessors
– 400+ Mb/sec Configuration, Verify & Debug Using a Simple 8-Bit Interface
– SelectMAP Port Can Remain on After Configuration
– JTAG Can Also be Used to Configure
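
Several of the headline figures in this summary can be cross-checked against numbers quoted earlier in the deck. The Python sketch below is only a back-of-the-envelope check, not anything taken from Xilinx tools: the function names are hypothetical, the timing inputs (2.5 ns setup, 3.5 ns Tc2o) come from the DLL Tc2o Speedup slide, the 50 Megabyte/Second rate comes from the SelectMAP slide, and the 4096-bit block size and allowed widths come from the Block SelectRAM slides.

```python
# Back-of-the-envelope checks of figures quoted in the slides.
# All function names here are illustrative assumptions, not a vendor API.

BLOCK_RAM_BITS = 4096                  # size of one Block SelectRAM, per the slides
ALLOWED_WIDTHS = (1, 2, 4, 8, 16)      # port widths listed under "Allowed Widths"


def chip_to_chip_fmax_mhz(t_setup_ns: float, t_c2o_ns: float) -> float:
    """Maximum chip-to-chip frequency = 1 / (Tsetup + Tc2o), returned in MHz."""
    return 1000.0 / (t_setup_ns + t_c2o_ns)


def selectmap_mbit_per_s(mbytes_per_s: float) -> float:
    """Convert a SelectMAP download rate from megabytes/s to megabits/s."""
    return mbytes_per_s * 8


def block_ram_shape(width: int) -> tuple[int, int]:
    """Return (depth, address_bits) for one port of a 4096-bit block."""
    if width not in ALLOWED_WIDTHS:
        raise ValueError(f"unsupported port width: {width}")
    depth = BLOCK_RAM_BITS // width
    address_bits = depth.bit_length() - 1   # depth is always a power of two
    return depth, address_bits


if __name__ == "__main__":
    print(f"Chip-to-chip fmax: {chip_to_chip_fmax_mhz(2.5, 3.5):.1f} MHz")
    print(f"SelectMAP rate:    {selectmap_mbit_per_s(50):.0f} Mb/s")
    for width in ALLOWED_WIDTHS:
        depth, bits = block_ram_shape(width)
        print(f"width {width:2d}: depth {depth:4d}, ADDR[{bits - 1}:0]")
```

With 2.5 ns setup and 3.5 ns Tc2o the formula gives roughly 166.7 MHz, which is consistent with the 160 MHz system performance claim, and 50 megabytes per second over the 8-bit SelectMAP port corresponds to the 400 Mb/sec configuration figure. The width-to-depth results reproduce the Allowed Widths table, for example ADDRA[9:0] for the 4-bit port and ADDRB[7:0] for the 16-bit port of a RAMB4_S4_S16.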