MicroBlaze Lobby Pitch

SEU Mitigation of a Soft
Embedded Processor in the
Virtex-II FPGAs
Sana Rezgui (1), Jeffrey George (2), Gary Swift (3), Kevin Somervill (4), Carl Carmichael (1) and Gregory Allen (3)
For the North American Xilinx Test Consortium
(1) Xilinx, Inc., San Jose, CA
(2) The Aerospace Corporation, El Segundo, CA
(3) Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA
(4) NASA Langley, Hampton, VA
Objective
• Use of embedded system applications built on S-FPGAs in a radiation environment => SEU mitigation and design implementation
• Mitigated Design Performances
― Simplicity, flexibility and automation
― Area and timing performances
• Upset Sensitivity in Radiation Environment
― Characterization of the FPGA sensitivity in beam
― Evaluation of the proposed mitigation solution for the embedded design
Measure the in-beam performance of an upset mitigation technique applied to a complex design (a processor) implemented on an FPGA and running a computationally intensive benchmark program
Rezgui
2
MAPLD 2005/E238
Studied Case
SEU mitigation of the Xilinx MicroBlaze soft IP processor by means of the Triple Modular Redundancy (TMR) technique
[Figure: Virtex-II device with the MicroBlaze design. Labeled resources: Block RAM, Configuration, 18-bit Multipliers, Logic Block (CLB), Digital Clock Manager, Programmable I/Os]
Internal Architecture
MicroBlaze is a 32-bit RISC processor with a Harvard bus architecture
MicroBlaze Mitigation
1. Use the TMR technique to mitigate the design against SEUs
• MicroBlaze designs consist of I/Os, Look-Up Tables (LUTs), Flip-Flops (FFs) and user memory elements.
• For the TMR Tool (developed by Xilinx), MicroBlaze is no different from any other design.
2. Run active readback and continuous scrubbing of all the static used resources for error detection and correction
• This is transparent to, and independent of, the running design.
• User memory elements cannot be scrubbed from the configuration port.
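The voting principle behind TMR can be sketched in a few lines of Python. This is only an illustrative model, not the logic the TMR Tool generates; the name `majority_vote` and the 32-bit example word are assumptions made for the example.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three redundant copies of a word."""
    return (a & b) | (b & c) | (a & c)


golden = 0xDEADBEEF
upset = golden ^ (1 << 7)  # one redundant domain suffers a single-bit upset

# A single upset domain is outvoted by the two good copies.
assert majority_vote(golden, upset, golden) == golden

# The same bit upset in two domains outvotes the remaining good copy.
assert majority_vote(upset, upset, golden) == upset
```

The second assertion shows why the redundant copies have to be repaired (scrubbed) before a second upset lands in another domain.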
Internal Architecture
[Figure: MicroBlaze internal architecture with the user memory elements marked: SRL16s, BRAM, LUT-RAMs]
User memory elements: SRL16s, Distributed Memory (LUT-RAM), BRAMs
• Active readback causes problems with user memory elements (dynamic content)
• Static partial reconfiguration of the BRAM is not possible if it stores program data in addition to the code
User Memory Mitigation
• Error Detection and Correction (EDAC)
― Additional decoding logic would be required
― Depends on the speed of detection and correction of upsets
• Replacement of the user memory elements by FFs and LUTs
― SRL16s are automatically replaced by FFs and LUTs by the TMR Tool
― Distributed RAMs (LUT-RAMs) are not replaced automatically: a custom macro is required to replace them with FFs and LUTs
• Triple Modular Redundancy and Self-Correction of the BRAMs
― Done automatically through the TMR Tool by replacing each BRAM by a
custom macro that scrubs the BRAM itself
• EDAC and TMR can be defeated by error accumulation
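How an EDAC code corrects one upset per word, and how accumulation defeats it, can be illustrated with a Hamming(7,4) single-error-correcting code. The slides do not say which EDAC code was considered; Hamming(7,4) and the names `encode` and `decode` are assumptions chosen to keep the sketch short.

```python
def encode(nibble: int) -> list[int]:
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    c = [0] * 8  # index 0 unused so list indices match code positions
    c[3], c[5], c[6], c[7] = ((nibble >> i) & 1 for i in range(4))
    c[1] = c[3] ^ c[5] ^ c[7]  # parity over positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]  # parity over positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]  # parity over positions 4, 5, 6, 7
    return c[1:]


def decode(bits: list[int]) -> int:
    """Correct at most one flipped bit in a copy and return the 4 data bits."""
    c = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        c[syndrome] ^= 1     # a single error is located by the syndrome
    return c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)


stored = encode(0b1011)
stored[4] ^= 1                    # first upset: corrected on read
assert decode(stored) == 0b1011
stored[0] ^= 1                    # second upset accumulates in the same word
assert decode(stored) != 0b1011   # the decoder now miscorrects
```

Because `decode` works on a copy, the stored word keeps its first error until something writes the corrected value back, which is the role a scrubber plays.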
BRAM Mitigation Methodology
1. Apply TMR to the BRAMs in use
2. Insert an internal scrub controller that rewrites the 3 BRAMs with their voted output value
• Mitigation requirement: only one BRAM port can be used by the MicroBlaze design
• Each Block RAM is replaced with the TMRed BRAMs and the internal BRAM scrub controller
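The two steps above can be modeled in software as three memory copies plus a scrubber that walks the address space and writes the voted value back. This is a behavioral sketch only, not the Xilinx macro; the class `TmrBram` and its methods are hypothetical names, and the model assumes the scrubber accesses the memory independently of the processor (the slide only states that one port is left to MicroBlaze).

```python
def vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority."""
    return (a & b) | (b & c) | (a & c)


class TmrBram:
    """Three redundant memory copies with a voted read and a scrubber."""

    def __init__(self, contents):
        self.copies = [list(contents) for _ in range(3)]
        self.scrub_addr = 0

    def read(self, addr: int) -> int:
        return vote(*(copy[addr] for copy in self.copies))

    def scrub_step(self) -> None:
        """Rewrite one address in all three copies with its voted value."""
        voted = self.read(self.scrub_addr)
        for copy in self.copies:
            copy[self.scrub_addr] = voted
        self.scrub_addr = (self.scrub_addr + 1) % len(self.copies[0])

    def scrub_all(self) -> None:
        for _ in range(len(self.copies[0])):
            self.scrub_step()


# With scrubbing between upsets, the second upset is still outvoted.
ram = TmrBram([0x11, 0x22, 0x33, 0x44])
ram.copies[0][2] ^= 0x01
ram.scrub_all()
ram.copies[1][2] ^= 0x01
assert ram.read(2) == 0x33

# Without scrubbing, the two upsets accumulate and the voter is defeated.
unscrubbed = TmrBram([0x11, 0x22, 0x33, 0x44])
unscrubbed.copies[0][2] ^= 0x01
unscrubbed.copies[1][2] ^= 0x01
assert unscrubbed.read(2) == 0x32
```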
EDK / TMR Tool Design Flow
[Figure: design flow diagram. Stages shown: Design Entry (EDK/ISE); System Design Implementation; LUTRAM & BRAM Macro Replacement; XTMR Conversion (TMR Tool); Implementation (ISE) with NGDBuild, MAP, PAR and BitGen / BitInit. One step is marked "(Manual edit)". Files shown: .ngc, .ngo, .bmm, .ucf, .elf, .edf]
Implementation and Performance (1)
[Figure: bar chart "Virtex II-6000 Used Internal Resources". Vertical axis: % of Virtex-II 6000 resources used (0 to 100). Horizontal axis: design type (Single String MicroBlaze; Mitigated Mblaze design without LUT-RAMs; Mitigated Mblaze design with LUT-RAM; Full Mitigated Design). Series: FFs, LUTs, GCLK, IOs, MULTs, BRAMs]
Implementation and Performance (2)
Timing Performance and Core-Voltage Current Consumption
Tested Design                                                     Maximum Frequency (MHz)   Current Consumption (A)
Single-string Mblaze (Phase 1)                                    77                        0.37
Mitigated Mblaze design before replacement of LUT-RAM (Phase 2)   66                        0.78
Mitigated Mblaze design after replacement of LUT-RAM (Phase 3)    66                        0.83
Full Mitigated Design (Phase 4)                                   66                        0.99
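The penalties relative to the single-string design follow directly from the measured values. The short calculation below uses only the numbers reported on this slide; the helper name `overhead` is an assumption for illustration.

```python
# (maximum frequency in MHz, core current in A) as reported on the slide
PHASES = {
    "Phase 1": (77, 0.37),
    "Phase 2": (66, 0.78),
    "Phase 3": (66, 0.83),
    "Phase 4": (66, 0.99),
}


def overhead(phase: str, baseline: str = "Phase 1") -> tuple[float, float]:
    """Return (fractional frequency loss, current ratio) versus the baseline."""
    mhz, amps = PHASES[phase]
    base_mhz, base_amps = PHASES[baseline]
    return 1 - mhz / base_mhz, amps / base_amps


for name in PHASES:
    loss, ratio = overhead(name)
    print(f"{name}: fmax loss {loss:.1%}, current x{ratio:.2f}")
```

For the fully mitigated design this gives a frequency loss of about 14% and a core current of about 2.7 times the single-string value.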
Experimental Test Designs
Service FPGA: XC2V3000
1. Configuration Monitor
• DUT configuration
• Continuous alternate scrubbing and readback at a rate of 4 per second
• SEFI detection
2. Functional Monitor
• Sends input vectors to the DUT
• Detects errors based on the DUT outputs
• Records errors and exception occurrences
• Runs continuous handshaking with the DUT to assure its full synchronization with external peripherals
DUT FPGA: XQR2V6000
MicroBlaze design running
• Integer-based FFT software
• 33 MHz MicroBlaze clock speed
• 0.25 MHz GPIO bus
Two mitigated design versions:
1. Without BRAM scrubber
2. With BRAM scrubber
DUT/Service FPGAs Communication
[Figure: block diagram of the Service FPGA (XC2V3000) and the DUT (XQR2V6000). The DUT Configuration Monitor (configuration, readback with SEU counting, scrubbing, SEFI detection) is shown with the SelectMap port. The Functional Monitor is shown with the Functional Interface bus and the GPIO bus of the TMRed MicroBlaze. Triplicated signals (suffixes TR0, TR1, TR2): Clk, Rst, DVld-In, DVld-Out, DVld-Exc-In, DVld-Exc-Out, Data_In (16 bits each) and Data_Out (16 bits each). Majority voters are shown on DVld-In and DVld-Exc-In. Signal groups: handshaking, exception detection, data transfer]
Experimental Setup
[Figure: test setup showing the Service FPGA and the DUT]
Tested at Crocker Nuclear Laboratory at UC Davis using a 63.3 MeV proton beam
Proton Beam Results (1)
• Error Classification
― Type 1: FFT program calculates an incorrect result
― Type 2: MicroBlaze communication sequence is wrong or stops (timeout)
― Type 3: An exception or interrupt is invoked
• Error Recovery Types
― The MicroBlaze recovers at the next iteration of the program
― The MicroBlaze recovers when the processor is reset
― The MicroBlaze recovers after scrubbing the FPGA logic
• Non-Recovery Types (Type -R)
― Runaway Resets: Upsets in the MicroBlaze code (stored in the BRAM) in at
least two domains
― Runaway Exceptions: Illegal operation on the MicroBlaze detected by the
exception Handler (DUT/Service)
― Runaway Errors: Illegal code in the FFT computation code
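The classification above could be applied by a functional monitor along the following lines. This is a hypothetical sketch: the slides do not describe the monitor's implementation, and the function names, the inputs, and the priority order (exception first, then handshake, then data) are all assumptions.

```python
from enum import Enum


class ErrorType(Enum):
    NONE = 0
    TYPE_1 = 1  # FFT program calculates an incorrect result
    TYPE_2 = 2  # communication sequence is wrong or stops (timeout)
    TYPE_3 = 3  # an exception or interrupt is invoked


def classify(expected, received, handshake_ok: bool, exception: bool) -> ErrorType:
    """Classify one benchmark iteration (assumed priority order)."""
    if exception:
        return ErrorType.TYPE_3
    if not handshake_ok:
        return ErrorType.TYPE_2
    if received != expected:
        return ErrorType.TYPE_1
    return ErrorType.NONE


def label(error: ErrorType, recovered: bool) -> str:
    """Append the 'R' suffix used for non-recovery (runaway) events."""
    if error is ErrorType.NONE:
        return "no error"
    return f"Type {error.value}" + ("" if recovered else "R")


assert classify([1, 2], [1, 2], True, False) is ErrorType.NONE
assert label(classify([1, 2], [1, 3], True, False), recovered=False) == "Type 1R"
```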
Proton Beam Results (2)
Proton-Induced Cross Sections of Design 1 at Various Fluxes (Type columns are error cross-sections)
Run  Flux [p/cm2/s]  CLB Upsets / Scrub Cycle  Fluence [p/cm2]  Type 1 [cm2]    Type 1R [cm2]     Type 2 [cm2]    Type 2R [cm2]     Type 3 [cm2]
(1)  1.94 x 10^7     2 to 7                    9.79 x 10^10     7.56 x 10^-10   2.04 x 10^-11     6.34 x 10^-10   1.43 x 10^-10     8.17 x 10^-11
(2)  3.87 x 10^7     4 to 15                   2.49 x 10^10     8.44 x 10^-10   < 4.02 x 10^-11   6.03 x 10^-10   2.01 x 10^-10     1.61 x 10^-10

Proton-Induced Cross Sections of Design 2 at Various Fluxes (Type columns are error cross-sections)
Run  Flux [p/cm2/s]  CLB Upsets / Scrub Cycle  Fluence [p/cm2]  Type 1 [cm2]    Type 1R [cm2]     Type 2 [cm2]    Type 2R [cm2]     Type 3 [cm2]
(1)  1.70 x 10^7     2 to 7                    1.00 x 10^11     7.00 x 10^-11   < 1.00 x 10^-11   5.00 x 10^-11   < 1.00 x 10^-11   < 1.00 x 10^-11
(2)  1.70 x 10^8     15 to 30                  1.03 x 10^11     2.92 x 10^-10   9.74 x 10^-12     2.05 x 10^-10   6.82 x 10^-11     < 9.70 x 10^-12
(3)  1.70 x 10^9     150 to 190                4.86 x 10^10     1.07 x 10^-9    < 2.05 x 10^-11   7.82 x 10^-10   1.65 x 10^-10     3.60 x 10^-11
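An error cross-section is the number of observed events divided by the fluence. The sketch below shows that arithmetic; the event counts are not given in the slides and are inferred here from the reported values (for example, 7 events for 7.00 x 10^-11 cm2 at a fluence of 1.00 x 10^11 p/cm2). Treating a "<" entry as a zero-event run bounded by 1 / fluence is likewise an assumption, though it matches the tabulated limits.

```python
def cross_section(events: int, fluence: float) -> tuple[float, bool]:
    """Return (cross-section in cm^2, is_upper_bound).

    With zero observed events the value is an upper bound of 1 / fluence.
    """
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    if events == 0:
        return 1.0 / fluence, True
    return events / fluence, False


# Design 2, run (1): assumed 7 Type 1 events at 1.00e11 p/cm2
sigma, bound = cross_section(7, 1.00e11)
print(f"{sigma:.2e} cm2, upper bound: {bound}")

# Design 1, run (2): no Type 1R events at 2.49e10 p/cm2
sigma, bound = cross_section(0, 2.49e10)
print(f"< {sigma:.2e} cm2, upper bound: {bound}")
```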
Conclusion
• A complete solution to mitigate an embedded processor
implemented on a Xilinx Virtex II FPGA based on:
― Continuous external configuration scrubbing,
― Functional-block design triplication,
― Independent internal BRAM scrubbing (also triplicated).
• High area and power dissipation penalties after replacement of the distributed RAMs
• At low flux: very low error cross-section (1.2 x 10^-10 cm2)
• The error cross-section increases rapidly with increasing flux
• For the space environment, the error rate of a MicroBlaze design is predicted to be lower than the SEFI rate, which demonstrates the high efficacy of this solution
Learned Lessons
• Check whether your design includes SRL16s or distributed RAMs, which must be replaced to allow active scrubbing
• Do the SMOKE test: break one domain and ensure that the design is still running
• Reduce the flux to respect the first rule of the TMR mitigation technique (1 upset / scrub cycle)
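A rough way to pick a beam flux that respects the last rule is to scale from a measured operating point. The sketch below assumes that the number of upsets per scrub cycle is proportional to flux at a fixed scrub rate, which the slides do not state; the function name `flux_for_target` is hypothetical. The measured point used is Design 2 at 1.70 x 10^7 p/cm2/s with 2 to 7 CLB upsets per scrub cycle.

```python
def flux_for_target(measured_flux: float, measured_upsets: float,
                    target_upsets: float = 1.0) -> float:
    """Flux giving target_upsets per scrub cycle, assuming linear scaling."""
    if measured_upsets <= 0:
        raise ValueError("measured_upsets must be positive")
    return measured_flux * target_upsets / measured_upsets


# Bracket the estimate with the worst (7) and best (2) observed upset counts.
conservative = flux_for_target(1.70e7, 7)
optimistic = flux_for_target(1.70e7, 2)
print(f"target flux between {conservative:.2e} and {optimistic:.2e} p/cm2/s")
```

Under that assumption the flux would need to drop to somewhere between roughly 2.4 x 10^6 and 8.5 x 10^6 p/cm2/s to average one upset per scrub cycle.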