PAY-AS-YOU-GO STORAGE-EFFICIENT HARD ERROR CORRECTION
Moinuddin K. Qureshi
ECE, Georgia Tech
Research done while at: IBM T. J. Watson Research Center, New York
MICRO 2011, Dec 6, 2011

Introduction
• PCM is a scalable technology. Device state is changed by heating.
• Over time, write operations break the heater, and the cell gets stuck.
• Reported write endurance: 10-100 million writes/cell
• With good wear leveling, it is still possible to have 8+ years of lifetime.

PAY-AS-YOU-GO, MICRO-2011

Not All Cells Are Created Equal
• Variability in lifetime due to process variation: weak vs. strong cells
• Weak cells fail much earlier and greatly reduce system lifetime.
• Lifetime is usually modeled as Gaussian with an SDEV of 10-30% of the mean. We use SDEV = 20% of the mean.
• P(5 SDEV from mean) ≈ 10^-6
• For a 1GB memory bank, 8K bits fail at time 0, and more as we write!
• PCM needs a significant amount of error correction to handle variability.

Write Efficient Code
• Traditional ECC codes are write intensive, which causes more wear.
• Endurance-related (hard) faults are identified with a checker read.
• Write-efficient code: Error Correcting Pointers (ECP) [ISCA’10]
[Figure: a 512-bit cache line (bit positions 0 to 511) with a faulty bit marked X, and one ECP entry made of a 1-bit field D and a 9-bit Pointer.]
• ECP needs 10 bits per entry.
• ECP handles multiple faults (needs 1 Full bit).
• For correcting N errors, ECP needs (10N+1) bits.

Expensive to Correct Many Errors
[Figure: baseline system lifetime (years, axis 0 to 7) for NoECP and ECP-1 through ECP-6.]
• To get 6+ years of lifetime, we need to correct six errors per line.
• Storage: 61 bits/line (about 12%, 1GB for 8GB). Expensive.
• Unlike ECC in current DRAM chips, this overhead is not optional.
• Goal: reduce storage significantly (3X-6X) while retaining lifetime.

Motivation
• Key insight: very few lines have a large number of errors.

Utilization of error correction entries per line:

Num Writes (Normalized)   No ECP used   Only ECP-1 used   ECP-2 to ECP-6 used   Average ECP Used
50%                       99.02%        0.97%             0.01%                 0.01
95%                       79.63%        18.14%            2.23%                 0.23
100%                      73.24%        22.82%            3.95%                 0.31

• Uniformly allocating error correction entries is inefficient (by ~20X).
• We do not need to pay for error correction of each line upfront.
• Pay-As-You-Go: give error correction entries in proportion to errors.

Outline
• Introduction & Motivation
• PAYG Design
• Results
• Even More Storage Efficiency
• Related Work
• Summary

Naïve Design for PAYG
• Given that 73% of lines have no error, why not give ECP-6 only on error?
[Figure: each 64B memory line carries an OFB. The Global Error Correction (GEC) Pool is organized as sets by ways (number of GEC entries per set); each GEC entry holds V, TAG, and ECP-N.]
• GEC Pool structure: set associative vs. fully associative (impractical)

Three Key Problems
1. The set associative structure is inefficient (by ~8X for 8-way).
2. If we allocate six ECP entries to each GEC entry, most error correction entries still remain unused.
3. Given that >25% of lines are likely to have at least one error, the latency impact of GEC is significant.

Inefficiency of Set Associative GEC
• There are tens to hundreds of thousands of sets.
• Any set could overflow.
• How many entries are used before one set overflows?
• Buckets-and-balls analysis: an 8-way GEC is only 12% full when one set overflows, so it needs 8X the entries.

Scalable Structure for GEC Pool
[Figure: a Set Associative Table (SAT) of GEC entries, where each set also has an OFB and a PTR, next to a Global Collision Table (GCT). Labels: GCT-HEAD, "TAKEN BY SOME OTHER SET". Note: *PTR is two-way replicated.]
• "Hash-Table With Chaining" structure for flexibility & low latency

Scalable Structure for GEC Pool (continued)
• A Global Collision Table (GCT) with half as many sets as the SAT is sufficient.
• Let's say we want to store N entries:

Structure                Total Entries   Latency
Fully Associative        N               Very High
8-way Set Associative    8*N             1
8-way (SAT+GCT)          1.5*N           1+epsilon

• The proposed GEC structure has latency similar to a Set Associative Table while needing 5X fewer entries.

Solving Other Two Problems
2. Fine-Grained Allocation for effectively utilizing ECP entries
   • Each GEC entry has only ECP-1.
   • Each line can have multiple GEC entries.
   • We guarantee that all entries of a line are in the same set (of SAT/GCT).
   • A faulty line can get more than ECP-6 as well.
3. Local Error Correction (LEC) for low latency in the common case
   • Each line has a dedicated ECP-1 (handles 95% of lines).
   • Ensures extra accesses (GEC) are needed for only a few lines.

PAYG: Tying it All Together
• PAYG performs on-demand allocation of error correction entries.
• PAYG has 3 levels. LEC is the first line of defense (lowers latency); SAT is the second and GCT is the third (flexible).

Evaluation Settings
Assumptions:
1. Mean writes 32 million, SDEV=20%, no correlation
2. Perfect wear leveling: all lines get the same number of writes
3. Each write is converted into a write followed by a read to detect faults

Configuration:
• PCM bank of 1GB with 64B lines, so 16 million lines per bank
• Write latency of 1 microsecond
• At 100% write traffic, lifetime is 18 years (if zero variance)

Figure of Merit:
• Uniform ECP-6 gets 35% of ideal lifetime, so 6.5 years.
• We report lifetime with respect to Uniform ECP-6.

Importance of Scalable GEC Pool
[Figure: lifetime with respect to ECP-6 (%), axis 0 to 110, for two configurations. NoFGA-NoGCT: Num SAT Sets from 1024K down to 32K. NoFGA-wGCT: Num GCT Sets from 64K down to 2K, with SAT Sets = 128K. Annotation: Total Sets 128K+64K=192K. Labels ECP-6 through ECP-1.]
• The proposed structure reduces the storage overhead of GEC by more than 5X.

Importance of Fine-Grained Alloc.

Num ECP Entries in Each GEC Entry    5    4    3    2    1
Num GEC Entry per Set (64B line)     8    9    12   16   24
Total ECP Entries per Set            40   36   36   32   24

[Figure: lifetime normalized to ECP-6 (%), axis 100 to 114, versus Num ECP Entries in Each GEC Entry (5 down to 1).]
• Fine-Grained Allocation improves the effectiveness of PAYG.

Importance of LEC
• We can get higher lifetime by increasing GEC size, but we still need LEC.
[Figure not recoverable; one surviving annotation reads "5 years".]
• Without LEC, the latency impact is significant.
• With LEC, the latency impact is small.
• For the first 5 years, PAYG incurs on average 1 extra access for < 0.4% of accesses.

Storage Overhead
• LEC storage: 13 bits/line (10-bit ECP + 1 valid + 2 OFB)
• GEC storage: 6.5 bits/line on average
• Total: 19.5 bits/line

Scheme                    Storage Overhead (bits/line)   Lifetime
Uniform ECP-6             61                             1X
Uniform ECP-8             81                             1.13X
PAYG with ECP-1 in LEC    19.5                           1.13X

(Total storage overhead to protect 1GB reduces from 122MB to 39MB, down 83MB)
• PAYG provides lifetime similar to ECP-8 at 3.1X less storage than ECP-6.

Efficient Single Bit Correction
• LEC is responsible for most of the storage overhead (13 bits out of 19.5 bits).
• We need efficient schemes for single-bit hard faults: Alternate Data Retry (ADR).
• ADR: mask a hard fault by storing data in either normal or inverted form.
[Figure: example of a line with a stuck-at-0 (SA-0) cell, shown with the INV bit and the data stored in normal and in inverted form.]
• ADR needs only 1 bit to mask a single stuck-at fault (caveat: double write).
• Reduce the storage overhead of PAYG by using ADR instead of ECP-1 in LEC.

Comparisons
• It is hard to scale ADR to multiple faults. SAFER [MICRO’10] partitions lines with multiple faults into single-bit faults.
• SAFER needs 55 bits/line and has lifetime ~ECP-6.

Scheme                    Storage Overhead (bits/line)   Lifetime
Uniform ECP-6             61                             1X
Uniform ECP-8             81                             1.13X
PAYG with ECP-1 in LEC    19.5                           1.13X
PAYG with ADR in LEC      9.5                            1.02X

• PAYG with heterogeneous error correction reduces storage by 6X.

Non Uniform Error Correction
Variable Strength ECC (VS-ECC) by Alameldeen+ ISCA’11
• Proposed for cache reliability at low voltages
• ECC-4 is provided for one quarter of the ways, allocated based on testing.
• Difference: in caches, line disabling works; VS-ECC uses only a set associative structure.
Layered ECP by Schechter+ ISCA’10
• ECP-1 for each line, and some ECP entries for each page
• In essence, this is a set-associative GEC with ECP-1 in LEC.
• Difference: a set associative GEC requires 5X more entries (inefficient).

Line Sparing with FREE-p by Yoon+ HPCA’11
• A faulty line is remapped to a spare area using an embedded pointer.
• Sparing needs 1 good line for 1 uncorrectable fault.
• Difference: PAYG is much more storage efficient than sparing.

FREE-p: Sparing vs. Correction
• For 1 extra error bit, PAYG needs a 20-bit GEC entry; FREE-p needs 512 bits.
• PAYG is more effective than line sparing with FREE-p.

Summary
• PCM: limited endurance; variability across cells reduces lifetime.
• Need to correct many (six) errors per line.
• Uniform allocation is expensive and inefficient (only 0.3 out of 6 used).
• Pay-As-You-Go (PAYG): allocate error correction entries on demand.
• PAYG has LEC + GEC Pool (Set Associative Table + Global Collision Table).
• Provides 1.13X lifetime compared to ECP-6 at 3.1X lower overhead.
• Heterogeneous scheme (ADR for LEC) reduces storage by 6X.
• PAYG is useful for efficient hard-error correction in other technologies too.
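
The storage figures quoted in the summary can be re-derived with a few lines of arithmetic. The Python sketch below is only an illustration under stated assumptions: the names `ecp_bits` and `bank_overhead_mib` are hypothetical and not from the talk, "MB" in the slides is read as MiB (an assumption that happens to reproduce the quoted 122MB and 39MB), and the 9.5 bits/line for the ADR variant is taken directly from the comparison table rather than derived.

```python
# Sketch of the storage arithmetic quoted in the slides.
# Function and variable names are illustrative, not from the talk.

LINE_BITS = 512                    # one 64B line
LINES_PER_BANK = (1 << 30) // 64   # 1GB bank / 64B lines = 16M lines


def ecp_bits(n_errors: int) -> int:
    """Bits per line for uniform ECP-N: 10 bits per entry plus 1 Full bit."""
    return 10 * n_errors + 1


def bank_overhead_mib(bits_per_line: float) -> float:
    """Total correction storage for one 1GB bank, in MiB (assumed unit)."""
    return bits_per_line * LINES_PER_BANK / 8 / (1 << 20)


ecp6 = ecp_bits(6)      # uniform ECP-6
payg_ecp1 = 13 + 6.5    # LEC (13 bits) + average GEC share (6.5 bits)
payg_adr = 9.5          # PAYG with ADR in LEC, as reported in the slides

schemes = [
    ("Uniform ECP-6", ecp6),
    ("Uniform ECP-8", ecp_bits(8)),
    ("PAYG, ECP-1 in LEC", payg_ecp1),
    ("PAYG, ADR in LEC", payg_adr),
]
for name, bits in schemes:
    print(f"{name:20s} {bits:5.1f} bits/line  "
          f"{100 * bits / LINE_BITS:4.1f}% of line  "
          f"{bank_overhead_mib(bits):6.1f} MiB/bank  "
          f"{ecp6 / bits:4.2f}x vs ECP-6")

# Utilization of uniform ECP-6 at end of life: 0.31 of 6 entries used on average.
print(f"Uniform ECP-6 provisions {6 / 0.31:.1f}x more entries than are used")
```

Under these assumptions, uniform ECP-6 costs 61 bits/line (about 12% of a 512-bit line, 122 MiB per bank), PAYG with ECP-1 in LEC costs 19.5 bits/line (39 MiB per bank, about 3.1X less), and the ADR variant is about 6.4X smaller than ECP-6, consistent with the "6X" in the summary.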