SEU Mitigation of a Soft Embedded Processor in the Virtex-II FPGAs

Sana Rezgui1, Jeffrey George2, Gary Swift3, Kevin Somervill4, Carl Carmichael1 and Gregory Allen3
For the North American Xilinx Test Consortium

1 Xilinx, Inc., San Jose, CA
2 The Aerospace Corporation, El Segundo, CA
3 Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA
4 NASA Langley, Hampton, VA

Objective
• Use of embedded system applications built on SRAM-based FPGAs (S-FPGAs) in a radiation environment => SEU mitigation and design implementation
• Mitigated Design Performances
― Simplicity, flexibility and automation
― Area and timing performances
• Upset Sensitivity in Radiation Environment
― Characterization of the FPGA sensitivity in beam
― Evaluation of the proposed mitigation solution for the embedded design
Measure the in-beam performance of an upset mitigation technique applied to a complex design (a processor) implemented on an FPGA and running a computationally intensive benchmark program.

Rezgui 2 MAPLD 2005/E238

Studied Case
Mitigation to SEUs of the Xilinx soft IP processor MicroBlaze by means of the Triple Modular Redundancy (TMR) technique
[Figure: Virtex-II device layout hosting the MicroBlaze, with labels: Block RAM, Configurable Logic Block (CLB), 18 bit Multipliers, Digital Clock Manager, Programmable I/Os]

Internal Architecture
MicroBlaze is a 32-bit RISC processor with a Harvard bus architecture

MicroBlaze Mitigation
1. Use the TMR technique to mitigate the design against SEUs
• MicroBlaze designs consist of I/Os, Look-Up Tables (LUTs), Flip-Flops (FFs) and user memory elements.
• For the TMR Tool (developed by Xilinx), MicroBlaze is no different from any other design.
2. Run active readback and continuous scrubbing of all the used static resources for error detection and correction
• This is transparent to, and independent of, the running design.
• User memory elements cannot be scrubbed from the configuration port.
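
The voting principle behind TMR can be sketched in a few lines of software. This is only an illustrative model, not the TMR Tool's implementation: the function names `majority_vote` and `flip_bit` and the sample word `0x1234` are assumptions made for the example. It shows that a bitwise 2-of-3 vote masks an upset in one redundant domain, and that two domains upset in the same bit defeat the vote.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant words (hypothetical helper)."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


golden = 0x1234  # arbitrary example value, not taken from the slides

# One upset in one domain: the vote still returns the correct word.
single_upset = majority_vote(flip_bit(golden, 3), golden, golden)

# The same bit upset in two domains: the vote returns the wrong word.
double_upset = majority_vote(flip_bit(golden, 3), flip_bit(golden, 3), golden)

print(hex(single_upset), hex(double_upset))
```

The second case is why upsets must be repaired by scrubbing before a second one accumulates in another domain.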

Internal Architecture
[Figure: MicroBlaze internal architecture with the user memory elements marked: SRL16s, BRAMs, LUT-RAMs]
User memory elements: SRL16s, Distributed Memory (LUT-RAM), BRAMs
• Active readback causes problems with user memory elements (dynamic content)
• Static partial reconfiguration of the BRAMs is not possible if they store program data in addition to the code

User Memory Mitigation
• Error Detection and Correction (EDAC)
― Additional decoding logic would be required
― Depends on the speed of detection and correction of upsets
• Replacement of the user memory elements by FFs and LUTs
― SRL16s are automatically replaced by FFs and LUTs by the TMR Tool
― Distributed RAMs (LUT-RAMs) are not automatically replaced: a custom macro is required to replace them with FFs and LUTs
• Triple Modular Redundancy and self-correction of the BRAMs
― Done automatically by the TMR Tool, which replaces each BRAM with a custom macro that scrubs the BRAM itself
• EDAC and TMR can be defeated by error accumulation

BRAM Mitigation Methodology
1. Apply TMR to the used BRAMs
2. Insert an internal scrub controller that scrubs the 3 BRAMs with their voted output value
• Mitigation requirement: only one BRAM port can be used by the MicroBlaze design
• Each Block RAM is replaced with the TMRed BRAMs and the internal BRAM scrub controller

EDK / TMR Tool Design Flow
[Figure: design flow diagram. Stages: Design Entry (EDK/ISE), XTMR Conversion (TMR Tool), Implementation (ISE). Steps shown: LUT-RAM & BRAM macro replacement (manual edit), NGDBuild, MAP, PAR, BitGen / BitInit. Files shown: .edf, .ngc, .ngo, .bmm, .ucf, .elf]

Implementation and Performance (1)
[Figure: bar chart of the percentage of Virtex-II 6000 internal resources used (FFs, LUTs, GCLKs, IOs, MULTs, BRAMs) for four design types: Single String MicroBlaze; Mitigated Mblaze design without LUT-RAMs; Mitigated Mblaze design with LUT-RAM; Full Mitigated Design]

Implementation and Performance (2)
Timing Performances and Core Voltage Current Consumption

Tested Design                                                      Maximum Frequency (MHz)   Current Consumption (A)
Single-string Mblaze (Phase 1)                                     77                        0.37
Mitigated Mblaze design before Replacement of LUT-RAM (Phase 2)    66                        0.78
Mitigated Mblaze design after Replacement of LUT-RAM (Phase 3)     66                        0.83
Full Mitigated Design (Phase 4)                                    66                        0.99

Experimental Test Designs
Service FPGA: XC2V3000
1. Configuration Monitor
• DUT configuration
• Continuous alternate scrubbing and readback at a rate of 4 per second
• SEFI detection
2. Functional Monitor
• Sends input vectors to the DUT
• Detects errors based on the DUT outputs
• Records errors and exception occurrences
• Runs continuous handshaking with the DUT to ensure its full synchronization with external peripherals

DUT FPGA: XQR2V6000
MicroBlaze design running
• Integer-based FFT software
• 33 MHz MicroBlaze clock speed
• 0.25 MHz GPIO bus
Two mitigated design versions:
1. Without BRAM scrubber
2. With BRAM scrubber

DUT/Service FPGAs Communication
[Figure: block diagram of the interface between the DUT (XQR2V6000, TMRed MicroBlaze) and the Service FPGA (XC2V3000). Configuration Monitor functions: configuration, readback (SEU counting), scrubbing, SEFI detection, connected through the SelectMap port. Functional Monitor connected through the GPIO bus and functional interface bus. Triplicated signals: clocks (Clk-TR0..TR2), resets (Rst-TR0..TR2), handshaking (DVld-In-TR0..TR2, DVld-Out-TR0..TR2), exception detection (DVld-Exc-In-TR0..TR2, DVld-Exc-Out-TR0..TR2) and 16-bit data transfer (Data_In_TR0..TR2, Data_Out_TR0..TR2), with majority voters on DVld-In and DVld-Exc-In]

Experimental Setup
[Figure: photograph of the test setup showing the Service FPGA and the DUT]
Tested at Crocker Nuclear Laboratory at UC Davis using a 63.3 MeV proton beam

Proton Beam Results (1)
• Error Classification
― Type 1: FFT program calculates an incorrect result
― Type 2: MicroBlaze communication sequence is wrong or stops (timeout)
― Type 3: An exception or interrupt is invoked
• Error Recovery Types
― The MicroBlaze recovers on the next iteration of the program
― The MicroBlaze recovers when the processor is reset
― The MicroBlaze recovers after scrubbing the FPGA logic
• Non-Recovery Types (Type -R)
― Runaway Resets: upsets in the MicroBlaze code (stored in the BRAM) in at least two domains
― Runaway Exceptions: illegal operation on the MicroBlaze detected by the exception handler (DUT/Service)
― Runaway Errors: illegal code in the FFT computation code

Proton Beam Results (2)
Error cross-sections in the tables below are in cm^2.

Proton-Induced Cross-Sections of Design 1 at Various Fluxes

Run  Flux [p/cm^2/s]  CLB Upsets / Scrub Cycle  Fluence [p/cm^2]  Type 1         Type 1R          Type 2         Type 2R          Type 3
(1)  1.94 x 10^7      2 to 7                    9.79 x 10^10      7.56 x 10^-10  2.04 x 10^-11    6.34 x 10^-10  1.43 x 10^-10    8.17 x 10^-11
(2)  3.87 x 10^7      4 to 15                   2.49 x 10^10      8.44 x 10^-10  < 4.02 x 10^-11  6.03 x 10^-10  2.01 x 10^-10    1.61 x 10^-10

Proton-Induced Cross-Sections of Design 2 at Various Fluxes

Run  Flux [p/cm^2/s]  CLB Upsets / Scrub Cycle  Fluence [p/cm^2]  Type 1         Type 1R          Type 2         Type 2R          Type 3
(1)  1.70 x 10^7      2 to 7                    1.00 x 10^11      7.00 x 10^-11  < 1.00 x 10^-11  5.00 x 10^-11  < 1.00 x 10^-11  < 1.00 x 10^-11
(2)  1.70 x 10^8      15 to 30                  1.03 x 10^11      2.92 x 10^-10  9.74 x 10^-12    2.05 x 10^-10  6.82 x 10^-11    < 9.70 x 10^-12
(3)  1.70 x 10^9      150 to 190                4.86 x 10^10      1.07 x 10^-9   < 2.05 x 10^-11  7.82 x 10^-10  1.65 x 10^-10    3.60 x 10^-11

Conclusion
• A complete solution to mitigate an embedded processor implemented on a Xilinx Virtex-II FPGA, based on:
― Continuous external configuration scrubbing,
― Functional-block design triplication,
― Independent internal BRAM scrubbing (also triplicated).
• High area and power dissipation penalties after replacement of the distributed RAMs
• At low flux: very low error cross-section (1.2 x 10^-10 cm^2)
• The error cross-section increases rapidly with increasing flux
• For the space environment, it is predicted that the error rate of a MicroBlaze design should be lower than the SEFI rate, which proves the high efficacy of this solution

Learned Lessons
• Check whether your design includes SRL16s or distributed RAMs, so that active scrubbing is possible
• Do the SMOKE test: break one domain and ensure that the design is still running
• Reduce the flux to respect the first rule of the TMR mitigation technique (1 upset / scrub cycle)

References
1. Lima, F., Carmichael, C., Fabula, J., Padovani, R. and Reis, R., "A Fault Injection Analysis of Virtex® FPGA TMR Design Methodology", RADECS'01, September 2001.
2. Lima (de), F., Rezgui, S., Cota, E.F., Lubaszewski, M. and Velazco, R., "Designing and testing a radiation hardened 8051-like micro-controller", MAPLD'00, Laurel, Maryland, September 2000.
3. Swift, G., Rezgui, S., George, J., Carmichael, C., Napier, M., Maksymowicz, J., Moore, J., Lesea, A., Koga, R. and Wrobel, T., "Dynamic Testing of Xilinx Virtex-II Field Programmable Gate Array's (FPGA's) Input Output Blocks (IOBs)", NSREC'04, July 2004.
4. Carmichael, C., Bridgford, B. and Moore, J., "Triple Module Redundancy Scheme for Static Latch-Based FPGAs", MAPLD 2004, Laurel, Maryland, September 2004.
5. Carmichael, C., "Triple Module Redundancy Design Techniques for Virtex FPGAs", Xilinx Application Note XAPP197, http://www.xilinx.com/bvdocs/appnotes/xapp197.pdf, November 2001.
6. MicroBlaze Processor Reference User Guide, Embedded Development Kit (EDK 6.3), UG081, Version 4.0, Xilinx Inc., August 2004.
7. Roberts, T. and Slaney, M., FFT C Code, available at http://www.jjj.de/fft/int_fft.c, December 1994.
8. TMR Tool User Guide, UG156, Version 6.2.3, http://support.xilinx.com/products/milaero/ug156.pdf, Xilinx Inc., September 2004.
9. Xilinx Application Note 197, "Triple Module Redundancy Design Techniques for Virtex FPGAs", November 2001.
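
Supplementary worked example for the tables in Proton Beam Results (2). The sketch below assumes the usual definition of an error cross-section, the number of observed error events divided by the fluence, and assumes that entries prefixed with "<" are upper limits of one event per fluence for runs in which no event was seen. The event counts used here (7 and 5) are inferred from the tabulated values for the first Design 2 run; they are not stated in the slides, and the function names are hypothetical.

```python
def cross_section(events: int, fluence: float) -> float:
    """Error cross-section in cm^2: observed events divided by fluence in p/cm^2."""
    return events / fluence


def upper_limit(fluence: float) -> float:
    """Assumed convention for a run with no observed event: one event per fluence."""
    return 1.0 / fluence


# Design 2, run (1): fluence of 1.00 x 10^11 p/cm^2 (from the table).
fluence = 1.00e11
type1 = cross_section(7, fluence)  # 7 events is an inferred count
type2 = cross_section(5, fluence)  # 5 events is an inferred count
total = type1 + type2              # compare with the low-flux figure in the Conclusion

print(f"Type 1: {type1:.2e} cm^2")
print(f"Type 2: {type2:.2e} cm^2")
print(f"Total:  {total:.2e} cm^2")
print(f"Limit:  {upper_limit(fluence):.2e} cm^2")
```

Under these assumptions the two recoverable error types add up to about 1.2 x 10^-10 cm^2, in line with the low-flux figure quoted in the Conclusion.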