On-chip MRAM as a High-Bandwidth,
Low-Latency Replacement for DRAM
Physical Memories
Rajagopalan Desikan, Charles R. Lefurgy,
Stephen W. Keckler, and Doug Burger
Computer Architecture and Technology Lab
University of Texas at Austin
02/21/2003 CART
Motivation
• Latency to off-chip memory hundreds of
cycles
• Off-chip memory bandwidth becoming a
performance limiting factor
• MRAM – Emerging memory technology with
high bandwidth and low latency
• Goal of our work - To determine if the
performance advantage of MRAM in high
performance computing is worth more
investment and research
Outline
• MRAM Memory Description
• MRAM Memory Hierarchy
• Results
• Conclusions
MRAM Cell
• Magnetoresistive random access
memory (MRAM) uses the magnetic
tunnel junction (MTJ) to store
information
• MRAM cell composed of a diode
and an MTJ stack
• MTJ stack consists of two
ferromagnetic layers separated by a
thin dielectric barrier
• Polarization of one layer fixed, other
used for information storage
[Figure: MRAM cell. Labels: Bit Line, Read/Write Current, Diode, MTJ Stack, Word Line.
Layer labels: Pt, Co/Fe, Ni/Fe, Al2O3, Co/Fe, Ni/Fe, Mn/Fe, Pt, W]
MRAM Bank Design
• MRAM cells located at the intersection
of each word and bit line
• Read – Current sources are connected to the
bit lines and the selected word line is pulled low
• Write – Polarity of the current in the bit
lines decides the value stored
• MRAM banks accessed using vias
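A toy behavioral sketch of the cross-point organization described above, with one cell per word-line/bit-line crossing. The class name, the row-wide write, the zero initial state, and the polarity encoding (positive current stores 1) are all illustrative assumptions, not details from the slides; no electrical behavior is modeled.

```python
class CrossPointBank:
    """Toy model: one MRAM cell at each (word line, bit line) intersection."""

    def __init__(self, num_word_lines, num_bit_lines):
        # Assumption: every cell starts out storing 0.
        self.cells = [[0] * num_bit_lines for _ in range(num_word_lines)]

    def write(self, word_line, bit_line_polarities):
        """Store one bit per bit line; the current polarity decides the value.

        Assumed encoding: positive polarity -> 1, otherwise -> 0.
        """
        if len(bit_line_polarities) != len(self.cells[word_line]):
            raise ValueError("need one polarity per bit line")
        self.cells[word_line] = [1 if p > 0 else 0 for p in bit_line_polarities]

    def read(self, word_line):
        """Select one word line (pulled low) and sense every bit line."""
        return list(self.cells[word_line])


bank = CrossPointBank(num_word_lines=4, num_bit_lines=3)
bank.write(2, [+1, -1, +1])
print(bank.read(2))
```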
MRAM Bank Modeling
• Modified CACTI-3.0 to develop an area and
timing tool to model MRAM banks
• Independently accessible banks composed of subbanks
• Important features
– Active area consumed
– Delay due to vertical wires
– MRAM capacity for a given die size and cell size
– Support for multiple layers with sharing
• SIA 2001 roadmap at 90 nm technology
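The capacity feature listed above can be illustrated with a back-of-the-envelope estimate. This is a minimal sketch, not the modified CACTI tool: the function name, the cell area expressed in units of F² (feature size squared), and the array-efficiency factor are assumptions introduced here. Only the 90 nm node comes from the slides.

```python
def mram_capacity_bits(die_area_mm2, feature_nm, cell_area_f2,
                       layers=1, array_efficiency=1.0):
    """Rough MRAM capacity in bits for a given die size and cell size.

    cell_area_f2     -- cell area in units of F^2 (hypothetical parameter)
    layers           -- number of stacked MRAM layers
    array_efficiency -- fraction of die area covered by cells (assumption)
    """
    feature_mm = feature_nm * 1e-6                 # nm -> mm
    cell_area_mm2 = cell_area_f2 * feature_mm ** 2
    cells_per_layer = round(die_area_mm2 * array_efficiency / cell_area_mm2)
    return cells_per_layer * layers


# Hypothetical inputs: 100 mm^2 die, 90 nm node, 10 F^2 cell, 2 layers.
bits = mram_capacity_bits(100.0, 90, 10.0, layers=2)
print(f"{bits / 8 / 2**20:.0f} MiB")
```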
Chip-Level Architecture
MRAM Design Issues
• Number of Banks
– More banks: lower bank latency and higher concurrency,
but higher network traversal time and higher miss rates
• Cache Line Size
– Larger line size: more spatial locality, but higher
latency
• Page Placement Policy
– Random
– Round-robin
– Least loaded
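The three placement policies can be sketched as bank selectors. This is an illustrative sketch only: the function names are hypothetical, and the least-loaded selector takes a generic per-bank cost list rather than any particular cost model.

```python
import itertools
import random


def make_random_policy(num_banks, seed=None):
    """Random: each new page goes to a uniformly chosen bank."""
    rng = random.Random(seed)
    return lambda: rng.randrange(num_banks)


def make_round_robin_policy(num_banks):
    """Round-robin: banks are used in cyclic order 0, 1, ..., num_banks - 1."""
    counter = itertools.cycle(range(num_banks))
    return lambda: next(counter)


def least_loaded_bank(costs):
    """Least loaded: pick the bank with the smallest cost (ties -> lowest index)."""
    return min(range(len(costs)), key=costs.__getitem__)


next_bank = make_round_robin_policy(3)
print([next_bank() for _ in range(5)])
print(least_loaded_bank([5.0, 2.0, 2.0, 9.0]))
```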
Methodology
• Simulated Processor
– Alpha 21264 pipeline modified for 8-wide issue
– 3.8 GHz (10 FO4 inverters per stage)
• Base SDRAM System
– Distributed L2 cache
• Base MRAM system
– Distributed MRAM banks and reduced capacity
distributed L2 cache
• Benchmarks
– Memory-intensive SPEC CPU2000, scientific, and
speech benchmarks
Page Placement Policy
[Figure: IPC for 100 banks with different page placement policies]
Cost_{Least-Loaded} = (L2 Hit Rate × L2 Hit Latency)
+ (L2 Miss Rate × MRAM Bank Latency)
+ Current Network Latency to Bank
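The least-loaded cost above can be evaluated per bank and the cheapest bank chosen. A minimal sketch under stated assumptions: the function and parameter names are hypothetical, latencies are taken to be in cycles, and the L2 miss rate is assumed to equal 1 minus the L2 hit rate. The example numbers are made up for illustration.

```python
def least_loaded_cost(l2_hit_rate, l2_hit_latency,
                      mram_bank_latency, network_latency):
    """Cost of placing a page in one bank, following the slide's formula."""
    l2_miss_rate = 1.0 - l2_hit_rate  # assumption: rates sum to 1
    return (l2_hit_rate * l2_hit_latency
            + l2_miss_rate * mram_bank_latency
            + network_latency)


def pick_least_loaded(banks):
    """Return the index of the bank with the lowest cost (ties -> lowest index)."""
    costs = [least_loaded_cost(**bank) for bank in banks]
    return min(range(len(banks)), key=costs.__getitem__)


# Hypothetical per-bank statistics.
banks = [
    dict(l2_hit_rate=0.9, l2_hit_latency=10, mram_bank_latency=40, network_latency=20),
    dict(l2_hit_rate=0.5, l2_hit_latency=10, mram_bank_latency=40, network_latency=5),
]
print(pick_least_loaded(banks))
```

In this made-up case the second bank wins despite its lower hit rate, because its network latency is much smaller.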
MRAM Sensitivity
[Figure: MRAM Latency Sensitivity; plotted values 20, 30, 40, 60]
SDRAM Latency : 30 ns
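To relate a latency in nanoseconds to processor cycles, multiply by the clock frequency in GHz. The sketch below pairs the 30 ns SDRAM latency with the 3.8 GHz clock from the Methodology slide; combining the two figures this way is an assumption, and the function name is hypothetical.

```python
def latency_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to clock cycles (1 GHz = 1 cycle per ns)."""
    return latency_ns * clock_ghz


print(f"{latency_cycles(30, 3.8):.0f} cycles")
```

At these values the SDRAM latency works out to roughly 114 cycles.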
Conclusions
• Developed an architectural model for
exploiting an emerging memory
technology, MRAM
• Analyzed the contribution to
performance of the different
components in our MRAM system
• MRAM system performs 15% better than a
conventional SDRAM system