DSN 2016
Leveraging ECC to Mitigate Read Disturbance, False Reads and Write Faults in STT-RAM
Mohammad Seyedzadeh, Rakan Maddah, Alex Jones, Rami Melhem
University of Pittsburgh

Executive Summary
• Traditional ECC mitigates write faults and false read errors.
• Observation: read disturbance errors are correlated with repeated read operations.
• Our approach: an on-demand refresh policy.
  • Write After Error detection (WAE): read disturbance error rate is close to the other bit error rates.
  • Write After Persistent error (WAP): false read error rate is higher than the read disturbance error rate.
  • Write After error Threshold (WAT): false read errors are the dominant error.
• Key result: two orders of magnitude improvement in the energy-UBER product (reliability times energy) across different ranges of bit error rates.

Background: STT-RAM
[Figure: STT-RAM cell circuit. An MTJ (free layer, barrier layer, fixed layer) in series with an NMOS access transistor, connected to the bit line (BL), word line (WL) and source line (SL). Parallel (P) state: low resistance, logic 0. Anti-parallel (AP) state: high resistance, logic 1.]

Background: STT-RAM Errors
• Error types: write errors, read disturbance and false reads.
• These errors are typically mitigated using ECC.
[Figure: error probability over repeated reads; recoverable labels: 10^-11, 10^-7, 10000 reads.]
Key observation: because the effect of read disturbance is cumulative, even relatively low per-read fault probabilities can result in a relatively high probability of failure.

Solution to Mitigate Read Disturbance
• Write back data after every read operation (WAR).
[Figure: a data word is read, checked and corrected by ECC, and then written back.]
• Advantage: highest reliability, as long as the write error rate is low.
• Disadvantage: high energy cost.
Can we do better?

New Solution
• Do not write back after every read.
• First detect an error, then write back.
• Mitigate read disturbance, false read errors and write errors using ECC:
  • Write After Error detection (WAE): read disturbance error rate is close to the other bit error rates.
  • Write After Persistent error (WAP): false read error rate is higher than the read disturbance error rate.
  • Write After error Threshold (WAT): false read errors are the dominant error.

Proposed Techniques
[Figure: (a) WAE: write back after error detection. (b) WAP: check for a persistent error before writing back. (c) WAT: leave false read errors behind.]

Why a Markov Model?
• Monte Carlo simulation is feasible only for high raw bit error rates (RBER). It is inadequate for systems with persistent errors, because capturing the effect of low RBER requires prohibitive simulation times.
• The different Markov states capture the cumulative effect of read disturbance.
Markov chain metrics:
• Reliability: the expected number of transitions before absorption.
• Energy: the number of system writes and system reads.

Modeling Write-back After Every Read (WAR) with a Markov Chain
• S1: no error
• S2: at most one transient error
• S3: one persistent error
• S4: at least two persistent errors
• S5: at least two errors
[Figure: Markov chain over states S1 to S5, with write-back transitions.]

Markov Models for the Proposed Techniques
[Figure: Markov models for WAE (write-back), WAP (second read, then write-back) and WAT (write-back after a threshold).]

Evaluation
Three regimes, where pw, pd and pf denote the write error, read disturbance and false read probabilities:
• False read error rate is the highest bit error rate (single MTJ): pf > pd.
• Read disturbance error rate is the highest bit error rate (double MTJ): pd > pf.
• Write error rate is the highest bit error rate: pw > pd, pf.

ECC-1 for Single MTJ: Uncorrectable Bit Error Rate (UBER)
[Figure: UBER vs. log(pf/pd), from 6.000 down to 0.699, for ECC-1, WAR, WAE and WAP, with the region pd ~ pf marked. (a) a = 50%, b = 50%: equal probability of read and write. (b) a = 99.9%, b = 0.1%: one write every 1000 reads. Here a and b are the fractions of user reads and user writes.]
• Conclusion 1: WAR, WAE and WAP achieve acceptable UBER levels.
• Conclusion 2: as the user read-to-write ratio increases, system reliability varies significantly when pd is comparable to pf.
We report experimental results for a "worst-case" ratio of 1000 user reads to each user write (a = 99.9% vs. b = 0.1%).

Single MTJ: Energy Overhead
[Figure: log(pf/pd), from 6.000 down to 0.699, vs. energy overhead (log scale). (a) ECC-1 (WAR, WAE, WAP): two orders of magnitude improvement by WAE and WAP. (b) ECC-2 (WAR, WAE, WAP, WAT): three orders of magnitude improvement by WAT; annotated: persistent errors are left behind as pd increases.]
Conclusion: WAR incurs a large energy overhead; the other approaches dramatically reduce this overhead while achieving a similar or acceptable UBER level.

Single MTJ: Energy-UBER Product (EUP)
[Figure: EUP vs. log(pf/pd), from 6.000 down to 0.699, with regions pf > pd and pd ~ pf marked. (a) ECC-1 (WAR, WAE, WAP): two orders of magnitude improvement by WAE and WAP. (b) ECC-2 (WAR, WAE, WAP, WAT): four orders of magnitude improvement by WAT, and still two orders of magnitude improvement by WAE and WAP.]

Double MTJ: Energy-UBER Product (EUP)
[Figure: EUP vs. log(pf/pd), from 3.000 down to -2.699, with regions pd < pf and pd > pf marked. (a) ECC-1: WAR, WAE, WAP. (b) ECC-2: WAR, WAE, WAP, WAT.]

Conclusion
• Traditional ECC mitigates write faults and false read errors.
• Observation: read disturbance errors are correlated with repeated read operations.
• Our approach: an on-demand refresh policy.
  • Write After Error detection (WAE): read disturbance error rate is close to the other bit error rates.
  • Write After Persistent error (WAP): false read error rate is higher than the read disturbance error rate.
  • Write After error Threshold (WAT): false read errors are the dominant error.
• Key result: two orders of magnitude improvement in the energy-UBER product (reliability times energy) across different ranges of bit error rates.
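The key observation from the background, that read disturbance accumulates over repeated reads, can be illustrated with a short calculation. This is a simplified sketch, not the paper's model: it assumes each read independently flips each bit with a hypothetical probability p_d, ignores false reads and write errors, performs no write-back, and counts a word protected by a t-error-correcting code (ECC-t) as failed once more than t bits have been disturbed. The 512-bit word size and p_d = 1e-7 are illustrative choices, and the function names are hypothetical.

```python
import math


def prob_bit_disturbed(p_d: float, reads: int) -> float:
    """Probability that a given bit is flipped at least once over `reads` reads.

    Computes 1 - (1 - p_d)**reads in a numerically stable way.
    """
    return -math.expm1(reads * math.log1p(-p_d))


def prob_uncorrectable(p_d: float, reads: int, bits: int, t: int) -> float:
    """Probability that more than t of `bits` bits have been disturbed.

    Assumes bits fail independently, so the number of disturbed bits is
    Binomial(bits, q); an ECC-t code can correct at most t of them.
    """
    q = prob_bit_disturbed(p_d, reads)
    p_correctable = sum(
        math.comb(bits, k) * q**k * (1 - q) ** (bits - k) for k in range(t + 1)
    )
    return 1.0 - p_correctable


if __name__ == "__main__":
    # Hypothetical parameters: 512-bit word, ECC-1, p_d = 1e-7 per bit per read.
    for reads in (1, 100, 10_000):
        print(reads, prob_uncorrectable(1e-7, reads, bits=512, t=1))
```

Under these assumptions the failure probability after a single read is on the order of 10^-9, while after 10,000 reads without a refresh it grows to roughly 0.09, which is why some form of write-back is needed.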
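The four write-back policies (WAR, WAE, WAP, WAT) can be contrasted as decision rules applied after each user read. The following is a hypothetical sketch based on a reading of the slides, not the authors' controller: `decide_write_back`, the `reread` callback and the `threshold` parameter are illustrative names. WAP's "second read" is modeled as re-reading the word and writing back only if an error is still present, on the assumption that a false read is transient while a disturbed or badly written bit remains wrong in the cell.

```python
from enum import Enum
from typing import Callable, Tuple


class Policy(Enum):
    WAR = "write after every read"
    WAE = "write after error detection"
    WAP = "write after persistent error"
    WAT = "write after error threshold"


def decide_write_back(
    policy: Policy,
    errors_detected: int,
    reread: Callable[[], int],
    threshold: int = 1,
) -> Tuple[bool, int]:
    """Return (write_back, extra_reads) for one user read.

    errors_detected: number of bit errors ECC reported on this read.
    reread: hypothetical callback that reads the word again and returns the
        number of errors ECC reports on the second read (used by WAP only).
    threshold: hypothetical WAT parameter; write back once this many
        errors are detected.
    """
    if policy is Policy.WAR:
        # Refresh unconditionally after every read.
        return True, 0
    if policy is Policy.WAE:
        # Refresh only when ECC detects an error.
        return errors_detected > 0, 0
    if policy is Policy.WAP:
        if errors_detected == 0:
            return False, 0
        # Second read: write back only if the error persists in the cell.
        return reread() > 0, 1
    if policy is Policy.WAT:
        # Tolerate errors (including false reads) below the threshold.
        return errors_detected >= threshold, 0
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    def clean_second_read() -> int:
        return 0  # the first error was a transient false read

    print(decide_write_back(Policy.WAE, 1, clean_second_read))  # (True, 0)
    print(decide_write_back(Policy.WAP, 1, clean_second_read))  # (False, 1)
```

In this sketch WAP trades one extra read for skipping an unnecessary write when the detected error was a false read, which matches the regime the slides assign to it (pf higher than pd).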
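The reliability metric used in the Markov analysis, the expected number of transitions before absorption, follows from standard absorbing-chain theory: if Q holds the transition probabilities among the transient states, the vector t of expected step counts solves (I - Q) t = 1. The sketch below solves this in pure Python and applies it to a deliberately tiny, hypothetical two-state chain, not the paper's S1 to S5 model: each read introduces a new bit error with probability e, and a second error is uncorrectable by ECC-1. With write-back after a detected error, the expected lifetime from the error-free state is (1 + e)/e^2 reads; without write-back the error persists and the lifetime drops to 2/e.

```python
from typing import List

Matrix = List[List[float]]


def expected_steps_to_absorption(Q: Matrix) -> List[float]:
    """Solve (I - Q) t = 1 by Gaussian elimination with partial pivoting.

    Q[i][j] is the probability of moving from transient state i to transient
    state j; the missing row mass is the probability of absorption (failure).
    t[i] is the expected number of transitions before absorption from state i.
    """
    n = len(Q)
    # Augmented matrix [I - Q | 1].
    a = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [1.0]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    t = [0.0] * n
    for i in range(n - 1, -1, -1):
        rhs = a[i][n] - sum(a[i][j] * t[j] for j in range(i + 1, n))
        t[i] = rhs / a[i][i]
    return t


def toy_ecc1_chain(e: float, write_back: bool) -> Matrix:
    """Hypothetical chain: state 0 = no error, state 1 = one correctable error."""
    if write_back:
        # The detected error is corrected and written back (state 1 -> 0)
        # unless another error arrives first (absorption with probability e).
        return [[1 - e, e], [1 - e, 0.0]]
    # Without write-back the error stays in the cell (state 1 -> 1).
    return [[1 - e, e], [0.0, 1 - e]]


if __name__ == "__main__":
    e = 0.01
    print(expected_steps_to_absorption(toy_ecc1_chain(e, write_back=True))[0])
    print(expected_steps_to_absorption(toy_ecc1_chain(e, write_back=False))[0])
```

For e = 0.01 this toy chain gives about 10,100 reads before failure with write-back versus 200 without, illustrating how the Markov states capture the benefit of refreshing a word once an error is seen.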