PAY-AS-YOU-GO
STORAGE-EFFICIENT HARD ERROR CORRECTION
Moinuddin K. Qureshi
ECE, Georgia Tech
Research done while at:
IBM T. J. Watson Research Center New York
MICRO 2011 Dec 6, 2011
Introduction
PCM is a scalable technology. Device state is changed by heating.
Over time, write operations break the heater → the cell gets stuck
Reported write endurance: 10-100 million writes/cell
With good wear leveling, it is still possible to get 8+ years of lifetime
PAY-AS-YOU-GO, MICRO-2011
Not All Cells Are Created Equal
Variability in lifetime due to process variation: weak vs. strong cells
Weak cells fail much earlier → greatly reduce system lifetime
Lifetime is usually modeled as Gaussian with SDEV of 10-30% of mean
We use SDEV = 20% of mean
P(5 SDEV from mean) ≈ 10^-6
For a 1GB memory bank, 8K bits fail at time 0, and more as we write!
PCM needs a significant amount of error correction to handle variability
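The 8K figure can be checked with a few lines of arithmetic. This is a minimal sketch that takes the slide's probability of 10^-6 at face value; the function name is made up for illustration. With SDEV at 20% of the mean, a cell 5 SDEV below the mean has an endurance of zero writes, which is why these cells count as failed at time 0.

```python
def bits_failing_at_time_zero(capacity_bytes, p_weak):
    """Expected number of cells whose endurance is already exhausted.

    p_weak is the probability that a cell lies 5 SDEV below the mean
    (taken from the slide, not recomputed from the Gaussian tail).
    """
    return capacity_bytes * 8 * p_weak


weak_bits = bits_failing_at_time_zero(2**30, 1e-6)  # 1GB bank
print(f"{weak_bits:.0f} bits, roughly {weak_bits / 1024:.1f}K")
```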
Write Efficient Code
Traditional ECC codes are write intensive → more wear
Endurance related (hard) faults identified with checker read
Write-efficient code: Error Correcting Pointers [ISCA’10]
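[Figure: an ECP entry is a 9-bit pointer plus a 1-bit replacement value (D); the pointer selects one of the 512 bit positions (0 to 511) of the cache line, here the faulty cell marked X]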
ECP needs 10 bits per entry. Handles multiple faults (needs 1 Full bit)
For correcting N errors, ECP needs (10N+1) bits
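The storage formula is easy to tabulate. The sketch below only restates the (10N+1) rule from the slide for a 512-bit line; the constant and function names are illustrative.

```python
LINE_BITS = 512    # 64B cache line
ENTRY_BITS = 10    # 9-bit pointer + 1 replacement bit


def ecp_bits(n):
    """Per-line storage for ECP-n: n entries plus one full bit."""
    return ENTRY_BITS * n + 1


for n in (1, 6, 8):
    share = ecp_bits(n) / LINE_BITS
    print(f"ECP-{n}: {ecp_bits(n)} bits/line ({share:.1%} of the line)")
```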
Expensive to Correct Many Errors
[Figure: baseline system lifetime in years (axis 0 to 7) for No ECP and ECP-1 through ECP-6]
To get 6+ years lifetime, we need to correct six errors per line
Storage: 61 bits/line (about 12%, 1GB for 8GB) → expensive
Unlike ECC in current DRAM chips, this overhead is not optional
Goal: Reduce storage significantly (3X-6X) while retaining lifetime
Motivation
Key insight: Very few lines have large number of errors
Utilization of error correction entries per line

Num Writes     No ECP    Only ECP-1   ECP-2 to      Average
(Normalized)   used      used         ECP-6 used    ECP Used
50%            99.02%    0.97%        0.01%         0.01
95%            79.63%    18.14%       2.23%         0.23
100%           73.24%    22.82%       3.95%         0.31
Uniformly allocating error correction entries is inefficient (by ~20X)
We do not need to pay for error correction of each line upfront
Pay-As-You-Go: Give error correction entries in proportion to errors
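The ~20X figure follows from comparing what uniform ECP-6 allocates with what is used on average at end of life. A short check, using only the numbers quoted on this slide (variable names are illustrative):

```python
ALLOCATED_PER_LINE = 6    # uniform ECP-6 gives every line six entries
AVG_USED_AT_END = 0.31    # "Average ECP Used" at 100% of writes

waste_factor = ALLOCATED_PER_LINE / AVG_USED_AT_END
print(f"allocated / used = {waste_factor:.1f}X")
```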
Outline
• Introduction & Motivation
• PAYG Design
• Results
• Even More Storage Efficiency
• Related Work
• Summary
Naïve Design for PAYG
Given 73% of lines have no error, why not give ECP-6 only on error?
[Figure: each 64B memory line carries an OFB field; the Global Error Correction (GEC) Pool is organized as sets by ways (number of GEC entries per set), and each GEC entry holds V, TAG, and ECP-N]
GEC Pool structure: Set associative vs. Fully associative (impractical)
Three Key Problems
1. Set associative structure is inefficient (by ~8X for 8-way)
2. If we allocate six ECP entries to each GEC entry, most error
correction entries still remain unused
3. Given that >25% of lines are likely to have at least one error,
the latency impact of GEC is significant
Inefficiency of Set Associative GEC
There are tens to hundreds of thousands of sets → any set could overflow
How many entries are used before one set overflows? Buckets-and-Balls
An 8-way GEC is only 12% full when one set overflows → need 8X entries
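The buckets-and-balls effect can be reproduced with a small Monte Carlo run. This is a sketch under the assumption that faulty lines map to sets uniformly at random; the function name and the seed are arbitrary, and a single run is only one sample of the occupancy at first overflow.

```python
import random


def fill_at_first_overflow(num_sets, ways, seed=0):
    """Place entries in random sets until one set needs more than `ways`.

    Returns the fraction of the pool that is occupied at that moment.
    """
    rng = random.Random(seed)
    counts = [0] * num_sets
    placed = 0
    while True:
        s = rng.randrange(num_sets)
        if counts[s] == ways:      # set already full: this entry overflows
            return placed / (num_sets * ways)
        counts[s] += 1
        placed += 1


occupancy = fill_at_first_overflow(num_sets=128 * 1024, ways=8)
print(f"8-way pool is {occupancy:.0%} full at first overflow")
```

With set counts in this range, the first overflow arrives while most of the pool is still empty, which is the inefficiency the slide points to.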
Scalable Structure for GEC Pool
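[Figure: GEC entries in a Set Associative Table (SAT) set, with OFB = 1 and a PTR leading to a set in the Global Collision Table (GCT); a GCT set likewise has OFB = 1 and a PTR; labels: GCT-HEAD, "taken by some other set". Note: PTR is two-way replicated.]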
“Hash-Table With Chaining” structure for flexibility & low latency
Scalable Structure for GEC Pool
Global Collision Table (GCT) with half as many sets as SAT is sufficient
Let's say we want to store N entries

Structure                Total Entries   Latency
Fully Associative        N               Very High
8-way Set Associative    8*N             1
8-way (SAT+GCT)          1.5*N           1+epsilon
Proposed GEC structure has latency similar to Set Associative Table
while needing 5X fewer entries
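A toy model of the hash-table-with-chaining idea may help. This is a generic sketch, not the design from the talk: the class, its fields, and the use of `gct_head` as "next unclaimed GCT set" are assumptions, pointer replication is omitted, and it does not enforce the guarantee that all entries of a line live in one set.

```python
class GecPool:
    """Toy SAT + GCT pool: fixed-size sets, overflow chained through the GCT."""

    def __init__(self, sat_sets, ways):
        self.sat_sets = sat_sets
        self.ways = ways
        gct_sets = sat_sets // 2               # GCT has half as many sets
        total = sat_sets + gct_sets
        # Sets 0..sat_sets-1 form the SAT; the remaining sets form the GCT.
        self.entries = [[] for _ in range(total)]
        self.ptr = [None] * total              # None models a clear overflow bit
        self.gct_head = sat_sets               # next unclaimed GCT set (assumed)

    def _chain(self, line_addr):
        idx = line_addr % self.sat_sets
        while idx is not None:
            yield idx
            idx = self.ptr[idx]

    def lookup(self, line_addr):
        """Return (entries stored for this line, number of sets accessed)."""
        found, accesses = [], 0
        for idx in self._chain(line_addr):
            accesses += 1
            found += [e for tag, e in self.entries[idx] if tag == line_addr]
        return found, accesses

    def insert(self, line_addr, ecp_entry):
        """Store one entry; return False when no GCT set is left to chain."""
        for idx in self._chain(line_addr):
            last = idx
            if len(self.entries[idx]) < self.ways:
                self.entries[idx].append((line_addr, ecp_entry))
                return True
        if self.gct_head == len(self.entries):
            return False
        self.ptr[last] = self.gct_head         # set overflow pointer
        self.gct_head += 1
        self.entries[self.ptr[last]].append((line_addr, ecp_entry))
        return True


pool = GecPool(sat_sets=4, ways=2)
for addr in (0, 4, 8):                         # all three map to SAT set 0
    pool.insert(addr, f"ecp-for-line-{addr}")
print(pool.lookup(8))   # found after following one pointer into the GCT
print(pool.lookup(1))   # a set that never overflowed costs a single access
```

Only lines whose SAT set has overflowed pay for the extra access, which is the intuition behind the 1+epsilon latency in the table.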
Solving Other Two Problems
2. Fine-Grained Allocation for effectively utilizing ECP entries
• Each GEC entry has only ECP-1
• Each line can have multiple GEC entries
• We guarantee that all entries of a line are in the same set (of the SAT or GCT)
• A faulty line can get more than ECP-6 as well
3. Local Error Correction (LEC) for low latency in the common case
• Each line has a dedicated ECP-1 (handles 95% of lines)
• Ensures extra accesses (GEC) are needed for only a few lines
PAYG: Tying it All Together
PAYG performs on-demand allocation of error correction entries
PAYG has 3 levels. LEC is first line of defense (lowers latency)
SAT is second and GCT is third (flexible)
Evaluation Settings
Assumptions:
1. Mean writes 32 million, SDEV = 20%, no correlation
2. Perfect wear leveling → all lines get the same number of writes
3. Each write is converted into a write followed by a read to detect faults
Configuration:
PCM bank of 1GB with 64B lines, so 16 million lines per bank
Write latency of 1 microsecond
At 100% write traffic, lifetime is 18 years (if zero variance)
Figure of Merit:
Uniform ECP-6 gets 35% of ideal lifetime, so 6.5 years
We report lifetime with respect to Uniform ECP-6
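The 18-year ideal lifetime can be reproduced from the configuration. The sketch assumes that "million" means 2**20 (so 16M lines and 32M writes are powers of two) and that writes are issued back to back, one at a time; both are assumptions made for this check.

```python
LINES_PER_BANK = 16 * 2**20          # 1GB bank / 64B lines
MEAN_WRITES_PER_LINE = 32 * 2**20    # assumption: "million" read as 2**20
WRITE_LATENCY_S = 1e-6               # 1 microsecond per write
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# With perfect wear leveling and zero variance, every line absorbs the
# mean number of writes before the bank wears out.
ideal_years = (LINES_PER_BANK * MEAN_WRITES_PER_LINE * WRITE_LATENCY_S
               / SECONDS_PER_YEAR)
print(f"ideal lifetime: {ideal_years:.1f} years")
```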
Importance of Scalable GEC Pool
[Figure: lifetime with respect to ECP-6 (%, axis 0 to 110). NoFGA-NoGCT is plotted against the number of SAT sets (1024K down to 32K); NoFGA-wGCT is plotted against the number of GCT sets (64K down to 2K, with SAT sets = 128K); uniform ECP-6 down to ECP-1 are shown for reference. Annotation: total sets 128K + 64K = 192K.]
Proposed structure reduces storage overhead of GEC by more than 5X
Importance of Fine-Grained Alloc.
Num ECP Entries in Each GEC Entry     5    4    3    2    1
Num GEC Entries per Set (64B line)    8    9    12   16   24
Total ECP Entries per Set             40   36   36   32   24

[Figure: lifetime normalized to ECP-6 (%, axis 100 to 114) versus number of ECP entries in each GEC entry (5 down to 1)]
Fine-Grained Allocation improves the effectiveness of PAYG
Importance of LEC
We can get higher lifetime by increasing GEC size, but we still need LEC
Without LEC, the latency impact is significant. With LEC, not so much
For the first 5 years, PAYG incurs on average 1 extra access for < 0.4% of accesses
Storage Overhead
LEC storage:   13 bits/line (10-bit ECP + 1 valid + 2 OFB)
GEC storage:   6.5 bits/line on average
Total:         19.5 bits/line

Scheme                    Storage Overhead (bits/line)   Lifetime
Uniform ECP-6             61                             1X
Uniform ECP-8             81                             1.13X
PAYG with ECP-1 in LEC    19.5                           1.13X
(Total storage overhead to protect 1GB reduces from 122MB to 39MB, down 83MB)
PAYG provides lifetime similar to ECP-8 at 3.1X less storage than ECP-6
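The megabyte totals and the 3.1X ratio follow directly from the bits-per-line figures. A quick check, assuming a 1GB bank of 64B lines as in the evaluation settings (the helper name is illustrative):

```python
LINES = 2**30 // 64    # 16M lines in a 1GB bank of 64B lines


def overhead_mb(bits_per_line):
    """Total error-correction storage for the bank, in MB (2**20 bytes)."""
    return LINES * bits_per_line / 8 / 2**20


ecp6_mb = overhead_mb(61)      # uniform ECP-6
payg_mb = overhead_mb(19.5)    # PAYG with ECP-1 in LEC
print(f"ECP-6: {ecp6_mb:.0f}MB, PAYG: {payg_mb:.0f}MB, "
      f"saved: {ecp6_mb - payg_mb:.0f}MB, ratio: {ecp6_mb / payg_mb:.1f}X")
```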
Efficient Single Bit Correction
LEC responsible for most of storage overhead (13 bits out of 19.5 bits)
Need an efficient scheme for single-bit hard faults → Alternate Data Retry (ADR)
ADR: Mask hard fault by storing data in either normal or inverted form
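[Figure: two example lines, each with an INV bit and one cell stuck at 0 (SA-0); the data is stored in normal or inverted form so that the stuck cell already holds the required value]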
ADR needs only 1 bit to mask a single stuck-at-fault (caveat: double write)
Reduce storage overhead of PAYG by using ADR instead of ECP-1 in LEC
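The retry logic behind ADR can be sketched in a few lines. This is an illustrative model, not the hardware mechanism: stuck cells are given as a dictionary from bit position to stuck value, the INV bit itself is assumed to be fault free, and all names are made up. The second write on a mismatch is the "double write" caveat.

```python
def write_cells(stuck, bits):
    """Model a PCM write: a stuck cell keeps its stuck value."""
    return [stuck.get(i, b) for i, b in enumerate(bits)]


def adr_write(stuck, data):
    """Return (stored_bits, inv) such that the line reads back as data."""
    stored = write_cells(stuck, data)
    if stored == data:                    # checker read matches: done
        return stored, 0
    inverted = [1 - b for b in data]      # retry with the inverted data
    stored = write_cells(stuck, inverted)
    if stored == inverted:
        return stored, 1
    raise ValueError("faults cannot be masked by inversion alone")


def adr_read(stored, inv):
    """Undo the inversion on read."""
    return [b ^ inv for b in stored]


stuck = {2: 0}                            # cell 2 is stuck at 0 (SA-0)
data = [0, 1, 1, 1]
stored, inv = adr_write(stuck, data)
print(stored, inv, adr_read(stored, inv))
```

Two faults stuck at values that disagree under both forms raise the error, which is why ADR alone does not scale to multiple faults.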
Comparisons
Hard to scale ADR to multiple faults. SAFER [MICRO’10] partitions a line with
multiple faults into groups with at most one fault each. SAFER needs 55 bits/line and gives lifetime ~ECP-6
Scheme                    Storage Overhead (bits/line)   Lifetime
Uniform ECP-6             61                             1X
Uniform ECP-8             81                             1.13X
PAYG with ECP-1 in LEC    19.5                           1.13X
PAYG with ADR in LEC      9.5                            1.02X
PAYG with heterogeneous error correction reduces storage by 6X
Non-Uniform Error Correction
• Variable Strength ECC (VS-ECC) by Alameldeen+ ISCA’11
  Proposed for cache reliability at low voltages
  ECC-4 for one quarter of the ways, allocated based on testing
  Difference: Cache line disabling works. Only set associative structure.
• Layered ECP by Schechter+ ISCA’10
  ECP-1 for each line, and some ECP entries for each page
  In essence, this is a set-associative GEC with ECP-1 in LEC
  Difference: Set associative GEC requires 5X more entries (inefficient)
• Line Sparing with FREE-p by Yoon+ HPCA’11
  A faulty line is remapped to a spare area using an embedded pointer
  Sparing needs 1 good line for 1 uncorrectable fault
  Difference: PAYG is much more storage efficient than sparing
FREE-p: Sparing vs. Correction
For 1 extra error bit, PAYG needs a 20-bit GEC entry; FREE-p needs 512 bits
PAYG is more effective than line sparing with FREE-p
Summary
PCM: limited endurance, variability across cells reduces lifetime
Need to correct many (six) errors per line
Uniform allocation is expensive and inefficient (only 0.3 out of 6 used)
Pay-As-You-Go (PAYG): Allocate error correction entries on demand
PAYG has LEC + GEC Pool (Set Associative Table + Global Collision Table)
Provides 1.13X lifetime compared to ECP-6 at 3.1X lower overhead
Heterogeneous scheme (ADR for LEC) reduces storage by 6X
PAYG useful for efficient hard-error correction in other technologies too