Op#mizing DRAM Timing for the Common-‐Case Adap#ve-‐Latency DRAM Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu x86 CPU SPEC Apache GUPS Memcached Parsec mcf MemCtrl Run#me: 527min Run#me: 477min -‐10.5% (no error) Timing Parameters (11 – 11 – 28) DRAM Module (8 – 8 – 19) DDR3 1600MT/s (11-‐11-‐28) 2 Reducing DRAM Timing Why can we reduce DRAM 1ming parameters without any errors? 3 Execu#ve Summary • Observa1ons – DRAM 1ming parameters are dictated by the worst-‐case cell (smallest cell across all products at highest temperature) – DRAM operates at lower temperature than the worst case • Idea: Adap1ve-‐Latency DRAM – Op1mizes DRAM 1ming parameters for the common case (typical DIMM opera1ng at low temperatures) • Analysis: Characteriza1on of 115 DIMMs – Great poten1al to lower DRAM 1ming parameters (17 – 54%) without any errors • Real System Performance Evalua1on – Significant performance improvement (14% for memory-‐ intensive workloads) without errors (33 days) 4 1. DRAM Opera#on Basics 2. Reasons for Timing Margin in DRAM 3. Key Observa9ons 4. Adap#ve-‐Latency DRAM 5. DRAM Characteriza#on 6. Real System Performance Evalua#on 5 DRAM Stores Data as Charge DRAM Cell Three steps of charge movement 1. Sensing 2. Restore 3. Precharge Sense-‐Amplifier 6 DRAM Charge over Time Cell Cell charge Data 1 Sense-‐Amplifier Sense-‐Amplifier Timing Parameters Sensing In theory In prac1ce Data 0 Restore 1me margin Why does DRAM need the extra 1ming margin? 7 1. DRAM Opera#on Basics 2. Reasons for Timing Margin in DRAM 3. Key Observa9ons 4. Adap#ve-‐Latency DRAM 5. DRAM Characteriza#on 6. Real System Performance Evalua#on 8 Two Reasons for Timing Margin 1. Process Varia1on – DRAM cells are not equal Leads to extra #ming margin for cell that can – Leads to extra #ming margin for a cell that can store small amount of charge store a large amount of charge ` 2. Temperature Dependence – DRAM leaks more charge at higher temperature – Leads to extra #ming margin when opera#ng at low temperature 9 DRAM Cells are Not Equal Ideal Real Smallest Cell Largest Cell Same Size è Different Size è Large varia1on in cell size è Same Charge è Different Charge è Large varia1on in charge è Same Latency Different Latency Large varia1on in access latency 10 Process Varia#on DRAM Cell Capacitor Bitline Contact Access Transistor ACCESS ❶ Cell Capacitance ❷ Contact Resistance ❸ Transistor Performance Small cell can store small charge • Small cell capacitance • High contact resistance • Slow access transistor è High access latency 11 Two Reasons for Timing Margin 1. Process Varia1on – DRAM cells are not equal – Leads to extra #ming margin for a cell that can store a large amount of charge ` 2. Temperature Dependence – DRAM leaks more charge at higher temperature – Leads to extra #ming margin for cells that operate at the low temperature operate at the high temperature 12 Charge Leakage ∝ Temperature Room Temp. Hot Temp. (85°C) Cells store small charge at high temperature Small Leakage Large Leakage and large charge at low temperature à Large varia1on in access latency 13 DRAM Timing Parameters • DRAM 1ming parameters are dictated by the worst-‐case – The smallest cell with the smallest charge in all DRAM products – Opera#ng at the highest temperature • Large 1ming margin for the common-‐case 14 Our Approach • We op1mize DRAM 1ming parameters for the common-‐case – The smallest cell with the smallest charge in a DRAM module – Opera#ng at the current temperature • Common-‐case cell has extra charge than the worst-‐case cell à Can lower latency for the common-‐case 15 1. DRAM Opera#on Basics 2. Reasons for Timing Margin in DRAM 3. Key Observa9ons 4. Adap#ve-‐Latency DRAM 5. DRAM Characteriza#on 6. Real System Performance Evalua#on 16 Key Observa#ons 1. Sensing Sense cells with extra charge faster à Lower sensing latency 2. Restore No need to fully restore cells with extra charge à Lower restore latency 3. Precharge No need to fully precharge bitlines for cells with extra charge à Lower precharge latency 17 Observa#on 1. Faster Sensing 115 DIMM Characteriza1on Typical DIMM at Low Temperature More Charge Timing (tRCD) Strong Charge Flow 17% ↓ Faster Sensing No Errors Typical DIMM at Low Temperature è More charge è Faster sensing 18 Observa#on 2. Reducing Restore Time Typical DIMM at Low Temperature Larger Cell & Less Leakage è Extra Charge No Need to Fully Restore Charge 115 DIMM Characteriza1on Read (tRAS) 37% ↓ Write (tWR) 54% ↓ No Errors Typical DIMM at lower temperature è More charge è Restore 1me reduc1on 19 Observa#on 3. Reducing Precharge Time Sensing Half Precharge Empty (0V) Full (Vdd) Bitline Typical DIMM at Lower Temperature Sense-‐Amplifier Precharge ? – Se`ng bitline to half-‐full charge 20 Observa#on 3. Reducing Precharge Time Access Empty Cell Not Fully Precharged Half Empty (0V) Access Full Cell More Charge è Strong Sensing Full (Vdd) bitline 115 DIMM Characteriza1on Timing (tRP) 35% ↓ No Errors Typical DIMM at Lower Temperature è More charge è Precharge 1me reduc1on 21 Key Observa#ons 1. Sensing Sense cells with extra charge faster à Lower sensing latency 2. Restore No need to fully restore cells with extra charge à Lower restore latency 3. Precharge No need to fully precharge bitlines for cells with extra charge à Lower precharge latency 22 1. DRAM Opera#on Basics 2. Reasons for Timing Margin in DRAM 3. Key Observa9ons 4. Adap#ve-‐Latency DRAM 5. DRAM Characteriza#on 6. Real System Performance Evalua#on 23 Adap#ve-‐Latency DRAM • Key idea – Op#mize DRAM #ming parameters online • Two components – DRAM manufacturer profiles mul#ple sets of reliable DRAM #ming parameters at different reliable DRAM #ming parameters temperatures for each DIMM – System monitors DRAM temperature & uses DRAM temperature appropriate DRAM #ming parameters 24 1. DRAM Opera#on Basics 2. Reasons for Timing Margin in DRAM 3. Key Observa9ons 4. Adap#ve-‐Latency DRAM 5. DRAM Characteriza#on 6. Real System Performance Evalua#on 25 DRAM Temperature • DRAM temperature measurement • Server cluster: Operates at under 34°C • Desktop: Operates at under 50°C • DRAM standard op1mized for 85°C • DRAM operates at low temperatures Previous works – DRAM temperature is low • El-‐Sayed+ SIGMETRICS 2012 in the common-‐case • Liu+ ISCA 2007 • Previous works – Maintain DRAM temperature low • David+ ICAC 2011 • Liu+ ISCA 2007 • Zhu+ ITHERM 2008 26 DRAM Tes#ng Infrastructure Temperature Controller FPGAs Heater FPGAs PC 27 Test Pamern • Single cache line test (Read/Write) Write Access Verify 1me Refresh Interval: 64–512ms • Overlapping mul1ple single cache line tests to simulate power noise and coupling . . . Write Access Access Access . . . Verify Verify Refresh Interval: 64–512ms . . . 1me 28 Control Factors • Timing parameters – Sensing: tRCD – Restore: tRAS (read), tWR(write) – Precharge: tRP • Temperature: 55 – 85°C • Refresh interval: 64 – 512ms – Longer refresh interval leads to smaller charge – Standard refresh interval: 64ms 29 1. Timings ↔ Charge Temperature: 85°C/Refresh Interval: 64, 128, 256, 512ms 105 Sensing Restore (Read) Precharge 103 102 More charge enables more 1ming parameter reduc1on 7.5ns 10.0ns 12.5ns 15.0ns 15.0ns 12.5ns 10.0ns 7.5ns 5.0ns 20.0ns 22.5ns 25.0ns 27.5ns 30.0ns 32.5ns 35.0ns 7.5ns 10.0ns 0 12.5ns 10 15.0ns Errors 104 Restore (Write) 30 2. Timings ↔ Temperature Temperature: 55, 65, 75, 85°C/Refresh Interval: 512ms 105 Sensing Restore (Read) Precharge 103 102 Lower temperature enables more 1ming parameter reduc1on 7.5ns 10.0ns 12.5ns 15.0ns 15.0ns 12.5ns 10.0ns 7.5ns 5.0ns 20.0ns 22.5ns 25.0ns 27.5ns 30.0ns 32.5ns 35.0ns 7.5ns 10.0ns 0 12.5ns 10 15.0ns Errors 104 Restore (Write) 31 3. Summary of 115 DIMMs • Latency reduc1on for read & write (55°C) – Read Latency: 32.7% – Write Latency: 55.1% • Latency reduc1on for each 1ming parameter (55°C) – Sensing: 17.3% – Restore: 37.3% (read), 54.8% (write) – Precharge: 35.2% 32 1. DRAM Opera#on Basics 2. Reasons for Timing Margin in DRAM 3. Key Observa9ons 4. Adap#ve-‐Latency DRAM 5. DRAM Characteriza#on 6. Real System Performance Evalua#on 33 Real System Evalua#on Method • System – CPU: AMD 4386 ( 8 Cores, 3.1GHz, 8MB LLC) – DRAM: 4GByte DDR3-‐1600 (800Mhz Clock) – OS: Linux – Storage: 128GByte SSD • Workload – 35 applica1ons from SPEC, STREAM, Parsec, Memcached, Apache, GUPS 34 Average Improvement non-‐intensive gups s.cluster copy gems lbm libq milc 1.4% 6.7% 5.0% all-‐workloads all-‐35-‐workload Mul# Core intensive Single Core mcf 25% 20% 15% 10% 5% 0% soplex Performance Improvement Single-‐Core Evalua#on AL-‐DRAM improves performance on a real system 35 Single Core Average Improvement Mul# Core 14.0% 10.4% all-‐workloads all-‐35-‐workload intensive non-‐intensive gups s.cluster copy gems lbm libq milc 2.9% mcf 25% 20% 15% 10% 5% 0% soplex Performance Improvement Mul#-‐Core Evalua#on AL-‐DRAM provides higher performance for mul1-‐programmed & mul1-‐threaded workloads 36 • Observa1ons Conclusion – DRAM 1ming parameters are dictated by the worst-‐case cell (smallest cell across all products at highest temperature) – DRAM operates at lower temperature than the worst case • Idea: Adap1ve-‐Latency DRAM – Op1mizes DRAM 1ming parameters for the common case (typical DIMM opera1ng at low temperatures) • Analysis: Characteriza1on of 115 DIMMs – Great poten1al to lower DRAM 1ming parameters (17 – 54%) without any errors • Real System Performance Evalua1on – Significant performance improvement (14% for memory-‐ intensive workloads) without errors (33 days) 37 Op#mizing DRAM Timing for the Common-‐Case Adap#ve-‐Latency DRAM Donghyuk Lee Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu Backup Slides 39 Overhead • DRAM Manufacturer – Addi1onal tests: can be integrated into exis1ng test process (i.e., TCSR test) • DRAM (DIMM) – Already have in-‐DRAM temperature sensor (i.e., Low Power DDR) – Mul1ple sets of 1ming parameters can be stored in SPD (Serial Presence Detect) • System Support for AL-‐DRAM – Already have ability to change DRAM 1ming online 40 Mul#ple Timing Parameters 10 tRAS: 35.0ns 32.5ns 30.0ns 27.5ns 25.0ns 22.5ns 20.0ns Errors 8 6 4 2 0 A tRCD: 10.0ns tRP: 12.5ns Ref. Interval: 200ms B 12.5ns 10.0ns 200ms C 10.0ns 10.0ns 200ms Reducing a #ming parameter è Reduces poten#al reduc#on of other parameters 41 Maximum error-‐free refresh interval (ms) Temperature ↔ Refresh Interval 700 More charge than required è Need for reliable opera#on from other fail mechanisms (i.e., VRT) è Safety-‐margin è Safe refresh interval 600 500 400 300 200 100 0 55°C 75°C 65°C Temperature (°C) 85°C 64ms SPEC Extra charge that can be used for latency reduc#on 42 DRAM Cell Organiza#on Bitline Access transistor Cell capacitor Bitline capacitor Sense-‐ amplifier 43 DRAM Cell Opera#on 1 Turn-‐on access transistor Leakage Bitline Access transistor 4 Precharged to Vdd/2 Sense Cell Bitline capacitor Charge-‐sharing capacitor 3 Fully charged Amplify 2 Ready to access data Precharge Sense-‐amplifier 44 DRAM Cell Charge Varia#ons Typical temp. Worst temp. Typical cell Fast Fast restore leak Worst cell Slow Smallest restore charge Slowly Largest leak charge 45
© Copyright 2026 Paperzz