Adaptive-Latency DRAM

Optimizing DRAM Timing for the Common-Case
Adaptive-Latency DRAM
Donghyuk Lee
Yoongu Kim, Gennady Pekhimenko, Samira Khan,
Vivek Seshadri, Kevin Chang, Onur Mutlu
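[Figure: an x86 CPU runs SPEC, Apache, GUPS, Memcached, and Parsec; its memory controller (MemCtrl) applies the timing parameters to a DDR3 1600MT/s DRAM module. Running mcf with the standard timing parameters (11 – 11 – 28) takes 527min; with reduced parameters (8 – 8 – 19) it takes 477min, -10.5%, with no errors.]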
Reducing DRAM Timing
Why can we reduce DRAM timing parameters without any errors?
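As a rough sketch of what the two timing sets mean in absolute terms, the snippet below converts the cycle counts into nanoseconds. It assumes the triples are tRCD, tRP, and tRAS in clock cycles (the slide does not label them) and uses the 800MHz DDR3-1600 clock quoted in the evaluation setup, i.e. 1.25ns per cycle. The function and variable names are illustrative only.

```python
# Sketch: convert DDR3-1600 timing triples from clock cycles to nanoseconds.
# Assumption: the triples are (tRCD, tRP, tRAS); the slide does not label them.

CLOCK_MHZ = 800            # DDR3-1600 clock, per the evaluation setup
TCK_NS = 1000 / CLOCK_MHZ  # one clock cycle = 1.25 ns

NAMES = ("tRCD", "tRP", "tRAS")  # assumed order
STANDARD = (11, 11, 28)          # cycles, from the slide
REDUCED = (8, 8, 19)             # cycles, from the slide


def cycles_to_ns(cycles):
    """Return the duration of `cycles` clock cycles in nanoseconds."""
    return cycles * TCK_NS


def reduction_pct(before, after):
    """Return the relative reduction from `before` to `after`, in percent."""
    return 100 * (before - after) / before


if __name__ == "__main__":
    for name, std, red in zip(NAMES, STANDARD, REDUCED):
        print(f"{name}: {cycles_to_ns(std):.2f} ns -> {cycles_to_ns(red):.2f} ns "
              f"({reduction_pct(std, red):.1f}% lower)")
    # mcf runtimes reported on the slide, in minutes.
    print(f"speedup: {100 * (527 / 477 - 1):.1f}%")
```

Under these assumptions the standard set corresponds to 13.75ns, 13.75ns, and 35ns, and the reduced set to 10ns, 10ns, and 23.75ns. The reported -10.5% matches the runtime ratio 527/477 ≈ 1.105.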
Executive Summary
•  Observations
–  DRAM timing parameters are dictated by the worst-case cell (smallest cell across all products at highest temperature)
–  DRAM operates at lower temperature than the worst case
•  Idea: Adaptive-Latency DRAM
–  Optimizes DRAM timing parameters for the common case (typical DIMM operating at low temperatures)
•  Analysis: Characterization of 115 DIMMs
–  Great potential to lower DRAM timing parameters (17 – 54%) without any errors
•  Real System Performance Evaluation
–  Significant performance improvement (14% for memory-intensive workloads) without errors (33 days)
1. DRAM Operation Basics
2. Reasons for Timing Margin in DRAM
3. Key Observations
4. Adaptive-Latency DRAM
5. DRAM Characterization
6. Real System Performance Evaluation
DRAM Stores Data as Charge
DRAM Cell
Three steps of charge movement
1. Sensing
2. Restore
3. Precharge
Sense-Amplifier
DRAM Charge over Time
[Figure: cell and sense-amplifier charge over time for Data 1 and Data 0 during sensing and restore. The timing parameters are shown in theory and in practice; in practice they include an extra margin.]
Why does DRAM need the extra timing margin?
Two Reasons for Timing Margin
1. Process Variation
–  DRAM cells are not equal
–  Leads to extra timing margin for a cell that can store a large amount of charge
2. Temperature Dependence
–  DRAM leaks more charge at higher temperature
–  Leads to extra timing margin when operating at low temperature
DRAM Cells are Not Equal
Ideal: Same Size → Same Charge → Same Latency
Real (Smallest Cell vs. Largest Cell): Different Size → Different Charge → Different Latency
Large variation in cell size → Large variation in charge → Large variation in access latency
Process Varia#on
[Figure: a DRAM cell (capacitor, contact, access transistor) connected to the bitline, with three sources of variation marked along the access path]
❶ Cell Capacitance
❷ Contact Resistance
❸ Transistor Performance
Small cell can store small charge
•  Small cell capacitance
•  High contact resistance
•  Slow access transistor
→ High access latency
Two Reasons for Timing Margin
1. Process Variation
–  DRAM cells are not equal
–  Leads to extra timing margin for a cell that can store a large amount of charge
2. Temperature Dependence
–  DRAM leaks more charge at higher temperature
–  Leads to extra timing margin for cells that operate at low temperature
Charge Leakage ∝ Temperature
Room Temp.: Small Leakage
Hot Temp. (85°C): Large Leakage
Cells store small charge at high temperature and large charge at low temperature
→ Large variation in access latency
DRAM Timing Parameters
•  DRAM timing parameters are dictated by the worst case
–  The smallest cell with the smallest charge in all DRAM products
–  Operating at the highest temperature
•  Large timing margin for the common case
Our Approach
•  We optimize DRAM timing parameters for the common case
–  The smallest cell with the smallest charge in a DRAM module
–  Operating at the current temperature
•  Common-case cell has more charge than the worst-case cell
→  Can lower latency for the common case
Key Observations
1. Sensing
Sense cells with extra charge faster → Lower sensing latency
2. Restore
No need to fully restore cells with extra charge
→ Lower restore latency
3. Precharge
No need to fully precharge bitlines for cells with extra charge
→ Lower precharge latency
Observation 1. Faster Sensing
Typical DIMM at Low Temperature: More Charge → Strong Charge Flow → Faster Sensing
115 DIMM Characterization: Timing (tRCD) 17% ↓, No Errors
Typical DIMM at Low Temperature → More charge → Faster sensing
Observation 2. Reducing Restore Time
Typical DIMM at Low Temperature: Larger Cell & Less Leakage → Extra Charge → No Need to Fully Restore Charge
115 DIMM Characterization: Read (tRAS) 37% ↓, Write (tWR) 54% ↓, No Errors
Typical DIMM at lower temperature → More charge → Restore time reduction
Observation 3. Reducing Precharge Time
[Figure: bitline charge level between Empty (0V) and Full (Vdd), driven by the sense-amplifier during sensing and returned to Half during precharge, for a typical DIMM at lower temperature]
Precharge: setting the bitline to half-full charge
Observation 3. Reducing Precharge Time
[Figure: accessing an empty cell (0V) and a full cell (Vdd) from a bitline that is not fully precharged to Half]
Not Fully Precharged: More Charge → Strong Sensing
115 DIMM Characterization: Timing (tRP) 35% ↓, No Errors
Typical DIMM at Lower Temperature → More charge → Precharge time reduction
Adaptive-Latency DRAM
•  Key idea
–  Optimize DRAM timing parameters online
•  Two components
–  DRAM manufacturer profiles multiple sets of reliable DRAM timing parameters at different temperatures for each DIMM
–  System monitors DRAM temperature & uses appropriate DRAM timing parameters
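The two components could fit together roughly as follows. This is a minimal sketch, not the authors' implementation: the `Timings` type, the `PROFILE` table, its temperature ceilings, and all reduced values in it are hypothetical placeholders for what a manufacturer would profile per DIMM. Only the `STANDARD` entry uses the usual DDR3-1600 values.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    trcd_ns: float  # sensing
    tras_ns: float  # restore (read)
    twr_ns: float   # restore (write)
    trp_ns: float   # precharge


# Standard DDR3-1600 timings, used whenever no profiled entry applies.
STANDARD = Timings(trcd_ns=13.75, tras_ns=35.0, twr_ns=15.0, trp_ns=13.75)

# Hypothetical per-DIMM profile: (temperature ceiling in °C, timings that
# were found reliable up to that temperature). Sorted by ceiling.
PROFILE = [
    (55, Timings(10.0, 23.75, 10.0, 10.0)),
    (70, Timings(11.25, 27.5, 12.5, 11.25)),
    (85, STANDARD),
]


def select_timings(temp_c, profile=PROFILE, fallback=STANDARD):
    """Pick the timings of the lowest profiled ceiling that is >= temp_c.

    Falls back to the standard timings above the highest profiled ceiling.
    """
    ceilings = [ceiling for ceiling, _ in profile]
    i = bisect_left(ceilings, temp_c)
    return profile[i][1] if i < len(profile) else fallback


if __name__ == "__main__":
    for temp in (34, 50, 60, 90):
        print(temp, select_timings(temp))
```

Choosing the entry by temperature ceiling rather than by nearest temperature keeps the selection conservative: a reading between two profiled points always gets the timings validated for the hotter one.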
DRAM Temperature
•  DRAM temperature measurement
–  Server cluster: Operates at under 34°C
–  Desktop: Operates at under 50°C
–  DRAM standard optimized for 85°C
•  Previous works – DRAM temperature is low
–  El-Sayed+ SIGMETRICS 2012
–  Liu+ ISCA 2007
•  Previous works – Maintain DRAM temperature low
–  David+ ICAC 2011
–  Liu+ ISCA 2007
–  Zhu+ ITHERM 2008
DRAM operates at low temperatures in the common-case
DRAM Testing Infrastructure
[Photo: testing setup with temperature controller, heater, FPGAs, and PC]
Test Pattern
•  Single cache line test (Read/Write)
[Timeline: Write, Access, Verify; Refresh Interval: 64–512ms]
•  Overlapping multiple single cache line tests to simulate power noise and coupling
[Timeline: overlapping Write, Access, and Verify steps from multiple tests; Refresh Interval: 64–512ms]
Control Factors
•  Timing parameters
–  Sensing: tRCD
–  Restore: tRAS (read), tWR (write)
–  Precharge: tRP
•  Temperature: 55 – 85°C
•  Refresh interval: 64 – 512ms
–  Longer refresh interval leads to smaller charge
–  Standard refresh interval: 64ms
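One way to picture how these control factors combine is a sweep over every combination, followed by picking the lowest timing value that stayed error-free. The sketch below is illustrative only: the temperature, refresh-interval, and tRCD values are the ones that appear on the characterization plots, but the full-grid sweep, the function names, and the error counts in the demo are assumptions, not the authors' test procedure or measured data.

```python
from itertools import product

TEMPERATURES_C = (55, 65, 75, 85)
REFRESH_INTERVALS_MS = (64, 128, 256, 512)
TRCD_NS = (7.5, 10.0, 12.5, 15.0)  # sensing values on the plot axis


def sweep_points(timings_ns=TRCD_NS):
    """Enumerate every (temperature, refresh interval, timing) combination."""
    return list(product(TEMPERATURES_C, REFRESH_INTERVALS_MS, timings_ns))


def min_error_free(errors_by_timing):
    """Return the smallest timing such that it and every larger tested
    timing produced zero errors, or None if the largest value failed."""
    best = None
    for timing in sorted(errors_by_timing, reverse=True):
        if errors_by_timing[timing] > 0:
            break
        best = timing
    return best


if __name__ == "__main__":
    print(len(sweep_points()), "test points")
    # Made-up error counts for one (temperature, refresh interval) condition.
    print(min_error_free({7.5: 120, 10.0: 0, 12.5: 0, 15.0: 0}))
```

Walking from the largest timing downward and stopping at the first failure avoids accepting a short timing that happened to pass while a longer one failed.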
1. Timings ↔ Charge
Temperature: 85°C/Refresh Interval: 64, 128, 256, 512ms
[Figure: errors (0 to 10^5) vs. timing value in four panels: Sensing (7.5–15.0ns), Restore (Read) (20.0–35.0ns), Restore (Write) (5.0–15.0ns), Precharge (7.5–15.0ns)]
More charge enables more timing parameter reduction
2. Timings ↔ Temperature
Temperature: 55, 65, 75, 85°C/Refresh Interval: 512ms
[Figure: errors (0 to 10^5) vs. timing value in four panels: Sensing (7.5–15.0ns), Restore (Read) (20.0–35.0ns), Restore (Write) (5.0–15.0ns), Precharge (7.5–15.0ns)]
Lower temperature enables more timing parameter reduction
3. Summary of 115 DIMMs
•  Latency reduction for read & write (55°C)
–  Read Latency: 32.7%
–  Write Latency: 55.1%
•  Latency reduction for each timing parameter (55°C)
–  Sensing: 17.3%
–  Restore: 37.3% (read), 54.8% (write)
–  Precharge: 35.2%
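To give the per-parameter percentages a concrete scale, the sketch below applies them to standard DDR3-1600 timings. The baseline values (tRCD 13.75ns, tRAS 35ns, tWR 15ns, tRP 13.75ns) are an assumption about what the reductions are measured against, and because the percentages are averages over 115 DIMMs, the results are illustrative rather than timings any particular DIMM was tested at.

```python
# Sketch: apply the average reductions at 55°C to standard DDR3-1600 timings.
# Assumption: the reductions are relative to these standard values.

STANDARD_NS = {"tRCD": 13.75, "tRAS": 35.0, "tWR": 15.0, "tRP": 13.75}

# Average reductions from the 115-DIMM summary, in percent.
REDUCTION_PCT = {"tRCD": 17.3, "tRAS": 37.3, "tWR": 54.8, "tRP": 35.2}


def reduced_timings(standard_ns=STANDARD_NS, reduction_pct=REDUCTION_PCT):
    """Return each timing scaled down by its average reduction."""
    return {
        name: ns * (1 - reduction_pct[name] / 100)
        for name, ns in standard_ns.items()
    }


if __name__ == "__main__":
    for name, ns in reduced_timings().items():
        print(f"{name}: {STANDARD_NS[name]:.2f} ns -> {ns:.2f} ns")
```

With this baseline the averages correspond to roughly 11.4ns (tRCD), 21.9ns (tRAS), 6.8ns (tWR), and 8.9ns (tRP).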
Real System Evaluation Method
•  System
–  CPU: AMD 4386 (8 Cores, 3.1GHz, 8MB LLC)
–  DRAM: 4GByte DDR3-1600 (800MHz Clock)
–  OS: Linux
–  Storage: 128GByte SSD
•  Workload
–  35 applications from SPEC, STREAM, Parsec, Memcached, Apache, GUPS
Single-Core Evaluation
[Figure: performance improvement (0–25%) for soplex, mcf, milc, libq, lbm, gems, copy, s.cluster, and gups, plus averages over the intensive, non-intensive, and all 35 workloads; single-core and multi-core bars]
Average improvement (single-core): 6.7% (intensive), 1.4% (non-intensive), 5.0% (all 35 workloads)
AL-DRAM improves performance on a real system
Multi-Core Evaluation
[Figure: performance improvement (0–25%) for soplex, mcf, milc, libq, lbm, gems, copy, s.cluster, and gups, plus averages over the intensive, non-intensive, and all 35 workloads; single-core and multi-core bars]
Average improvement (multi-core): 14.0% (intensive), 2.9% (non-intensive), 10.4% (all 35 workloads)
AL-DRAM provides higher performance for multi-programmed & multi-threaded workloads
Conclusion
•  Observations
–  DRAM timing parameters are dictated by the worst-case cell (smallest cell across all products at highest temperature)
–  DRAM operates at lower temperature than the worst case
•  Idea: Adaptive-Latency DRAM
–  Optimizes DRAM timing parameters for the common case (typical DIMM operating at low temperatures)
•  Analysis: Characterization of 115 DIMMs
–  Great potential to lower DRAM timing parameters (17 – 54%) without any errors
•  Real System Performance Evaluation
–  Significant performance improvement (14% for memory-intensive workloads) without errors (33 days)
Backup Slides
Overhead
•  DRAM Manufacturer
–  Additional tests: can be integrated into the existing test process (e.g., TCSR test)
•  DRAM (DIMM)
–  Already has an in-DRAM temperature sensor (e.g., Low Power DDR)
–  Multiple sets of timing parameters can be stored in SPD (Serial Presence Detect)
•  System Support for AL-DRAM
–  Already has the ability to change DRAM timing online
Multiple Timing Parameters
[Figure: errors (0–10) vs. tRAS (35.0ns down to 20.0ns in 2.5ns steps) for three configurations]
A: tRCD 10.0ns, tRP 12.5ns, Ref. Interval 200ms
B: tRCD 12.5ns, tRP 10.0ns, Ref. Interval 200ms
C: tRCD 10.0ns, tRP 10.0ns, Ref. Interval 200ms
Reducing a timing parameter → Reduces potential reduction of other parameters
Temperature ↔ Refresh Interval
[Figure: maximum error-free refresh interval (ms, 0–700) vs. temperature (55°C, 65°C, 75°C, 85°C), with the 64ms SPEC refresh interval marked]
More charge than required
→ Need for reliable operation from other fail mechanisms (e.g., VRT)
→ Safety margin → Safe refresh interval
Extra charge that can be used for latency reduction
DRAM Cell Organization
[Figure: bitline, access transistor, cell capacitor, bitline capacitor, sense-amplifier]
DRAM Cell Operation
[Figure: cell capacitor, access transistor, bitline, bitline capacitor, and sense-amplifier, annotated with leakage and the phases charge-sharing, sense, amplify, and precharge]
1 Turn-on access transistor
2 Ready to access data
3 Fully charged
4 Precharged to Vdd/2
DRAM Cell Charge Variations
[Figure: charge of a typical cell and the worst cell at typical and worst temperature; labels: fast restore, slow restore, fast leak, slow leak, largest charge, smallest charge]