3D-DRAM Circuit Design, Modeling and
Exploration for Computer Memory Hierarchy
Rakesh Anigu, Hongbin Sun, James J.-Q. Lu, Ken Rose, and
Tong Zhang
Electrical, Computer and Systems Engineering Department
Rensselaer Polytechnic Institute
Motivation
TSV size/pitch
but…
Thermal
Yield loss
EDA tools
Equipment
Cost
…
Motivation
Naturally embraces the immaturity of 3D integration
TSV size/pitch
✓ Coarse-grained die-to-die interconnect only
Thermal
✓ Inherently low power and less heat
Yield loss
✓ Easy to achieve very high defect tolerance
EDA tools
✓ Minimal departure from 2D design
Equipment
✓ Big $$$ market
Cost
✓ Higher-end, definitely not commodity
Why 3D Processor-DRAM Integration
[Figure: overall performance vs. time, showing the memory wall & bandwidth wall (Dr. Phil Emma @ IBM)]
Move more memory closer to processor cores at minimal extra cost!
3D Processor-DRAM Integration
Why 3D Processor-DRAM Integration
Almost no yield loss
2D design know-how
Coarse-grained TSVs
Thermal friendly
Justifiable cost
[Figure labels: DRAM dies, processor die]
To break the memory & bandwidth wall!
Quantitatively evaluate the potential
Outline
Motivation
3D DRAM Architecture Design
3D Processor-DRAM Integration
Conclusions
3D DRAM Architecture Design
Stacked commodity DRAM dies
Processor die
L2 cache ⇔ main memory
Bandwidth
Latency
Area
CACTI 5 → 1Gb 2D DRAM @ 65nm
Latency
Energy
3D DRAM Architecture Design
Stacked Commodity DRAM → Customized 3D DRAM
At which granularity should we carry out 3D mapping
Intra-sub-array 3D mapping
Fine-grained TSVs
Inter-sub-array 3D mapping
Coarse-grained TSVs
Inter-Sub-Array 3D Mapping
[Figure: top view showing the TSV I/Os]
3D Sub-Array Set
Distributed across dies
[Figure labels: 2D sub-array on each die, data bus, address bus, TSV bundle]
Multi-layer data access (MLDA)
All 2D sub-arrays are activated
Each handles a portion of data
Single-layer data access (SLDA)
Only one 2D sub-array is activated
One 2D sub-array handles all data
Both compared on: TSVs, energy
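The two access modes can be sketched in a few lines of Python. This is only an illustration, assuming one 2D sub-array per die and an even split of the data bits under MLDA; the names `AccessPlan` and `plan_access` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPlan:
    """How one access is served by a 3D sub-array set (illustrative only)."""
    mode: str
    active_subarrays: int   # 2D sub-arrays activated for the access
    bits_per_subarray: int  # data bits each active sub-array handles


def plan_access(mode: str, num_dies: int, data_bits: int) -> AccessPlan:
    """Split one access across a 3D sub-array set with one 2D sub-array per die."""
    if mode == "MLDA":
        # All 2D sub-arrays are activated; each handles a portion of the data.
        # Assumption: the data bits divide evenly across the dies.
        if data_bits % num_dies != 0:
            raise ValueError("data_bits must divide evenly across dies")
        return AccessPlan(mode, num_dies, data_bits // num_dies)
    if mode == "SLDA":
        # Only one 2D sub-array is activated and it handles all the data.
        return AccessPlan(mode, 1, data_bits)
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("MLDA", "SLDA"):
        print(plan_access(mode, num_dies=8, data_bits=256))
```

With 8 dies and a 256-bit access, MLDA activates 8 sub-arrays that handle 32 bits each, while SLDA activates one sub-array that handles all 256 bits.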
3D DRAM Architecture Design
Inter-sub-array 3D mapping
Small number of TSVs (1K~10K)
Intact individual DRAM sub-array design
Distributed global routing → performance gain
Modified CACTI 5 to support inter-sub-array 3D mapping
Case study: 1Gb with 8 banks and 256-bit I/O @ 65nm
Compared designs: 2D vs. 3D die packaging (i.e., no TSVs) vs. 3D DRAM (SLDA and MLDA)
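The case-study parameters lend themselves to some back-of-the-envelope arithmetic. The sketch below assumes binary prefixes for 1Gb; the transfer rate passed to `peak_bandwidth_bytes_per_s` is a purely hypothetical value chosen for illustration, since the slides do not state an I/O clock.

```python
GBIT = 2**30  # bits in 1Gb, assuming binary prefixes
MBIT = 2**20


def bank_capacity_bits(total_bits: int, num_banks: int) -> int:
    """Capacity of one bank when the array is split evenly across banks."""
    return total_bits // num_banks


def peak_bandwidth_bytes_per_s(io_bits: int, transfers_per_s: float) -> float:
    """Peak bandwidth of an I/O interface: width in bytes times transfer rate."""
    return io_bits / 8 * transfers_per_s


if __name__ == "__main__":
    per_bank = bank_capacity_bits(1 * GBIT, num_banks=8)
    print(per_bank // MBIT, "Mb per bank")

    # Hypothetical transfer rate (not from the slides): 1e9 transfers/s.
    print(peak_bandwidth_bytes_per_s(256, 1e9) / 1e9, "GB/s peak")
```

Under these assumptions each of the 8 banks holds 128Mb, and a 256-bit interface moves 32 bytes per transfer.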
Defect Tolerance
One more dimension for redundancy repair
[Figure labels: a sub-array and its redundancy on each of three dies, with one element marked x]
Inter-die inter-sub-array redundancy repair
Inter-Die Inter-Sub-Array Redundancy Repair
1024x256 sub-array, defect density: 0.05%, repair-most algorithm
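A minimal sketch of a greedy repair-most pass over one sub-array follows, for illustration only. It uses the common textbook form of repair-most (repeatedly replace the row or column that covers the most remaining faults); the authors' exact variant, the spare counts, and the way spares are shared across dies are not given in the slides, so those parts are assumptions. Borrowing a spare from another die is modeled here simply as a larger spare count.

```python
import random
from collections import Counter


def repair_most(faults, spare_rows, spare_cols):
    """Greedy repair-most: True if every faulty cell gets covered by a spare."""
    remaining = set(faults)
    while remaining:
        row_hits = Counter(r for r, _ in remaining)
        col_hits = Counter(c for _, c in remaining)
        candidates = []
        if spare_rows > 0:
            row, count = row_hits.most_common(1)[0]
            candidates.append((count, "row", row))
        if spare_cols > 0:
            col, count = col_hits.most_common(1)[0]
            candidates.append((count, "col", col))
        if not candidates:
            return False  # faults remain but no spares are left
        # Replace the line (row or column) that covers the most faults.
        _, kind, index = max(candidates)
        if kind == "row":
            remaining = {(r, c) for r, c in remaining if r != index}
            spare_rows -= 1
        else:
            remaining = {(r, c) for r, c in remaining if c != index}
            spare_cols -= 1
    return True


def random_faults(rows, cols, density, rng):
    """Mark each cell faulty independently with probability `density`."""
    return {(r, c) for r in range(rows) for c in range(cols)
            if rng.random() < density}


if __name__ == "__main__":
    # Two rows with three faults each, but only one local spare row.
    faults = {(3, 0), (3, 5), (3, 9), (7, 1), (7, 2), (7, 8)}
    print(repair_most(faults, spare_rows=1, spare_cols=0))  # local spares only
    print(repair_most(faults, spare_rows=2, spare_cols=0))  # one spare borrowed

    # 1024x256 sub-array at 0.05% defect density; spare counts are hypothetical.
    cells = random_faults(1024, 256, 0.0005, random.Random(0))
    print(len(cells), repair_most(cells, spare_rows=16, spare_cols=16))
```

The first call fails because one spare row cannot cover two faulty rows; the second succeeds once a spare from another die is counted, which is the extra repair dimension the slide refers to.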
Outline
Motivation
3D DRAM Architecture Design
3D Processor-DRAM Integration
Conclusions
Current Design Practice
[Figure labels: cores w/ L1, shared L2 cache (SRAM), DDRx channel, commodity DRAM]
L2 capacity & L1↔L2 bandwidth
L2 ↔ main memory bandwidth
3D integration: high-density DRAM, high-speed DRAM
Heterogeneous 3D DRAM
Stacked Commodity DRAM → Customized 3D DRAM
Heterogeneous 3D-DRAM L2 cache + main memory structure
Each core has its private 2D-SRAM L1 cache & 3D-DRAM L2 cache
DRAM density vs. speed trade-off
[Figure: sub-array density vs. speed trade-off]
Integrate both high-threshold & low-threshold MOSFETs
Evaluation
M5 full system simulator with Linux (U. of Mich.)
Four 4.0GHz cores with 8-layer 3D-DRAM at 45nm node
- 3D-DRAM L2 cache per core: 2MB
- 3D-DRAM main memory: 1GB
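The evaluated configuration can be captured as a small record for bookkeeping. This is a sketch only: `EvalConfig` is a hypothetical name, and this is not M5 configuration syntax; the field values are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalConfig:
    """Evaluation parameters as listed on the slide (illustrative record)."""
    num_cores: int = 4
    core_clock_ghz: float = 4.0
    dram_layers: int = 8
    tech_node_nm: int = 45
    l2_per_core_mb: int = 2
    main_memory_gb: int = 1

    @property
    def total_l2_mb(self) -> int:
        # Private L2 per core, summed over all cores.
        return self.num_cores * self.l2_per_core_mb

    @property
    def cycle_time_ns(self) -> float:
        # One core clock period in nanoseconds.
        return 1.0 / self.core_clock_ghz


if __name__ == "__main__":
    cfg = EvalConfig()
    print(cfg.total_l2_mb, "MB of 3D-DRAM L2 in total")
    print(cfg.cycle_time_ns, "ns per core cycle")
```

Four cores with 2MB each give 8MB of 3D-DRAM L2 in total, and a 4.0GHz clock gives a 0.25ns cycle.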
[Figure: processor die with four cores (each w/ L1); configurations compared: baseline, without multi-Vt, with multi-Vt]
Instructions Per Cycle (IPC) Gain over Baseline
One Step Further
Decentralized distributed main memory structure
Fastlane between L2 cache and its closest main memory block
Reduced L2 cache miss penalty
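Why a lower L2 miss penalty matters can be seen with the standard average memory access time (AMAT) formula for a two-level cache. The sketch below uses that textbook formula; every latency and miss rate in it is a hypothetical placeholder, not a measurement from this work.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, l2_miss_penalty):
    """Average memory access time for an L1 + L2 hierarchy, in cycles."""
    l1_miss_penalty = l2_hit + l2_miss_rate * l2_miss_penalty
    return l1_hit + l1_miss_rate * l1_miss_penalty


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the trend.
    baseline = amat(l1_hit=1, l1_miss_rate=0.1, l2_hit=10,
                    l2_miss_rate=0.5, l2_miss_penalty=100)
    fastlane = amat(l1_hit=1, l1_miss_rate=0.1, l2_hit=10,
                    l2_miss_rate=0.5, l2_miss_penalty=60)
    print(f"baseline AMAT: {baseline:.2f} cycles")
    print(f"reduced-penalty AMAT: {fastlane:.2f} cycles")
```

Only the `l2_miss_penalty` argument changes between the two calls, so the drop in AMAT isolates the effect of a shorter path from the L2 cache to main memory.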
Conclusions
3D multi-core processor-DRAM integration
3D DRAM Design
Simple but effective inter-sub-array 3D mapping strategy
Simple but effective 3D redundancy repair
Good memory performance gain
Integration of processor and 3D DRAM
Heterogeneous 3D DRAM architecture
Great computing system performance gain