Chapter 7
Optimizing Power @ Design Time – Memory
Benton H. Calhoun
Jan M. Rabaey
Slide 7.1
Role of Memory in ICs
Memory is very important
Focus in this chapter is embedded memory
Percentage of area going to memory is increasing
[Ref: V. De, Intel 2006]
Slide 7.2
Processor Area Becoming Memory Dominated
On-chip SRAM contains 50–90% of total transistor count
– Xeon: 48M/110M
– Itanium 2: 144M/220M
SRAM is a major source of chip static power dissipation
– Dominant in ultra low-power applications
– Substantial fraction in others
[Figure: Intel Penryn™ die photo, SRAM labeled (picture courtesy of Intel)]
Slide 7.3
Chapter Outline
Introduction to Memory Architectures
Power in the Cell Array
Power for Read Access
Power for Write Access
New Memory Technologies
Slide 7.4
Basic Memory Structures
[Figure: Memory organized as blocks 0 … i … P – 1, with row, column, and block address inputs, a block selector, control circuitry, a global data bus, a global amplifier/driver, and I/O]
[Ref: J. Rabaey, Prentice’03]
Slide 7.5
SRAM Metrics
Functionality
– Data retention
– Readability
– Writability
– Soft Errors
Area
Power
Why is functionality a “metric”?
– Process variations increase with scaling
– Large number of cells requires analysis of tails (out to 6σ or 7σ)
– Within-die VTH variation due to Random Dopant Fluctuations (RDFs)
Slide 7.6
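The need to analyze tails out to 6σ or 7σ can be illustrated with a rough calculation. The sketch below is not from the slides: it assumes the limiting cell metric is Gaussian, that cells fail independently, and that a cell fails once it lies more than k standard deviations on the bad side of the mean. The function names and the 8 Mb array size are hypothetical.

```python
import math

def tail_probability(k_sigma: float) -> float:
    """One-sided Gaussian tail: P(X > mean + k_sigma * sigma)."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

def expected_failing_cells(num_cells: int, k_sigma: float) -> float:
    """Expected number of cells beyond the k-sigma point (identical cells assumed)."""
    return num_cells * tail_probability(k_sigma)

def array_yield(num_cells: int, k_sigma: float) -> float:
    """Probability that no cell fails, assuming independent cells."""
    return math.exp(num_cells * math.log1p(-tail_probability(k_sigma)))

if __name__ == "__main__":
    cells = 8 * 2**20  # hypothetical 8 Mb array
    for k in (4.0, 5.0, 6.0, 7.0):
        print(f"{k:.0f} sigma: expected failing cells = "
              f"{expected_failing_cells(cells, k):.3g}, "
              f"yield = {array_yield(cells, k):.4f}")
```

Under these assumptions, a multi-megabit array whose cells are only good to 4σ would contain failing cells on essentially every die, so the margin has to hold much further out in the distribution.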
Where Does SRAM Power Go?
Numerous analytical SRAM power models
Great variety in power breakdowns
Different applications cause different
components of power to dominate
Hence: the dominant component depends on the application, e.g., high
speed versus low power, portable
Slide 7.7
SRAM cell
Three tasks of a cell
Hold data
– WL = 0; BLs = X
Write
– WL = 1; BLs driven with new data
Read
– WL = 1; BLs precharged and left floating
[Figure: Traditional 6-Transistor (6T) SRAM cell with transistors M1–M6, storage nodes Q and QB, wordline WL, and the bitline pair]
Slide 7.8
Key SRAM cell metrics
Key functionality metrics
Hold
– Static Noise Margin (SNM)
– Data retention voltage (DRV)
Read
– Static Noise Margin (SNM)
Write
– Write Margin
Metrics:
Area is primary constraint
Next, Power, Delay
[Figure: Traditional 6-Transistor (6T) SRAM cell with transistors M1–M6, storage nodes Q and QB, wordline WL, and the bitline pair]
Slide 7.9
Static Noise Margin (SNM)
SNM gives a measure of the cell’s stability by quantifying the DC noise required to flip the cell
SNM is length of side of the largest embedded square on the butterfly curve
[Figure: 6T cell (M1–M6, BL, BLB, WL) redrawn as cross-coupled inverters Inv 1 and Inv 2 with noise sources VN at the storage nodes Q and QB]
[Figure: Butterfly curve, QB (V) versus Q (V), both 0 to 0.3 V: the VTC of one inverter and the inverse VTC of the other, drawn without noise and with VN = SNM; the SNM square is marked]
[Ref: E. Seevinck, JSSC’87]
Slide 7.10
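The largest-embedded-square definition can be turned into a small numerical procedure. The following is a minimal sketch under stated assumptions: the inverter VTC is an idealized tanh curve standing in for simulated or measured data, and `make_vtc`, `static_noise_margin`, and the gain value are hypothetical names and parameters, not taken from the slides. Each 45° line through a lobe is treated as a candidate square diagonal, and the largest resulting side is kept.

```python
import math

def make_vtc(vdd: float, vm: float, gain: float):
    """Idealized inverter VTC (tanh shape); a stand-in for simulated data."""
    def vtc(v_in: float) -> float:
        return 0.5 * vdd * (1.0 - math.tanh(gain * (v_in - vm) / vdd))
    return vtc

def _solve_increasing(func, lo: float, hi: float) -> float:
    """Bisection root of a monotonically increasing function on [lo, hi]."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if func(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def _lobe_square_side(f_outer, f_inner, vdd: float, steps: int = 400) -> float:
    """Largest square in the lobe bounded by y = f_outer(x) and x = f_inner(y).

    Each line y = x + c is a candidate diagonal; the horizontal distance
    between its two curve crossings is the side of the square on it.
    """
    best = 0.0
    for i in range(1, steps):
        c = vdd * i / steps
        # Crossing with y = f_outer(x): x + c - f_outer(x) increases with x.
        x_a = _solve_increasing(lambda x: x + c - f_outer(x), -vdd, 2.0 * vdd)
        # Crossing with x = f_inner(y): y - f_inner(y) - c increases with y.
        y_b = _solve_increasing(lambda y: y - f_inner(y) - c, -vdd, 2.0 * vdd)
        best = max(best, x_a - f_inner(y_b))
    return best

def static_noise_margin(vtc1, vtc2, vdd: float) -> float:
    """SNM is the side of the smaller of the two lobe squares."""
    return min(_lobe_square_side(vtc1, vtc2, vdd),
               _lobe_square_side(vtc2, vtc1, vdd))

if __name__ == "__main__":
    inv = make_vtc(vdd=1.0, vm=0.5, gain=20.0)
    print(f"matched inverters: SNM = {static_noise_margin(inv, inv, 1.0):.3f} V")
    inv_a = make_vtc(vdd=1.0, vm=0.4, gain=20.0)
    inv_b = make_vtc(vdd=1.0, vm=0.6, gain=20.0)
    print(f"mismatched inverters: SNM = {static_noise_margin(inv_a, inv_b, 1.0):.3f} V")
```

Shifting the two switching thresholds in opposite directions shrinks one lobe and grows the other; because the smaller lobe sets the SNM, mismatch lowers it.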
Static Noise Margin with Scaling
Typical cell SNM deteriorates with scaling
Variations lead to failure from insufficient SNM
Tech and VDD scaling lower SNM
Variations worsen tail of SNM distribution
(Results obtained from simulations with Predictive Technology Models [Ref: PTM; Y. Cao ’00])
Slide 7.11
Variability: Write Margin
Dominant fight (ratioed)
Cell stability prior to write
Write failure: Positive SNM
Successful write: Negative “SNM”
[Figure: 6T cell under write, with BL, BLB, and WL labeled]
[Figure: Three butterfly plots of normalized QB versus normalized Q, both axes 0 to 1: cell stability prior to write, write failure, and successful write]
Slide 7.12
Variability: Cell Writability
[Figure: Write SNM (V), –0.25 to 0.05, versus temperature (°C), –40 to 120, for process corners TT, WW, SS, WS, and SW; 65 nm process, VDD = 0.6 V; the region marked “Write Fails” lies at positive SNM]
Write margin limits VDD scaling for 6T cells to 600 mV, best case.
Variability and large number of cells make this worse
Slide 7.13
Cell Array Power
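Leakage power dominates while the memory holds data
Sub-threshold leakage
Importance of gate tunneling and GIDL depends on technology and voltages applied
[Figure: 6T cell holding data (internal nodes ‘0’ and ‘1’, with BLs and WL labeled), annotated with sub-threshold leakage]
Slide 7.14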
Using Threshold Voltage to Reduce Leakage
High-VTH cells necessary if all else is kept the same
To keep leakage in 1 MB memory within bounds, VTH must be kept in 0.4–0.6 V range
[Figure: 1 MB array retention current (A), 10⁻⁸ to 10⁰, versus average extrapolated VTH (V) at 25°C, –0.2 to 1.0, for Tj = 25, 50, 75, 100, and 125 °C; Lg = 0.1 μm, W(QT) = 0.20 μm, W(QD) = 0.28 μm, W(QL) = 0.18 μm; points labeled high speed (0.49) and low power (0.71); reference levels at 10 µA and 0.1 µA]
Extrapolated VTH = VTH (nA/μm) + 0.3 V
[Ref: K. Itoh, ISCAS’06]
Slide 7.15
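The steep dependence of retention current on VTH follows from the exponential sub-threshold characteristic. The sketch below uses a textbook first-order model rather than the data behind the figure: leakage falls by one decade for every sub-threshold swing S of added VTH, with DIBL, gate tunneling, and GIDL ignored. The values of `i0_a_per_um` and the swing are hypothetical, and the two VTH inputs are the figure's 0.49 and 0.71 labels read as volts, which is itself an assumption.

```python
import math

def subthreshold_leakage(i0_a_per_um: float, width_um: float,
                         vth_v: float, swing_v_per_decade: float) -> float:
    """First-order off-current: I = I0 * W * 10^(-VTH / S), at VGS = 0."""
    return i0_a_per_um * width_um * 10.0 ** (-vth_v / swing_v_per_decade)

def vth_shift_for_reduction(reduction_factor: float,
                            swing_v_per_decade: float) -> float:
    """VTH increase needed to cut leakage by the given factor."""
    return swing_v_per_decade * math.log10(reduction_factor)

if __name__ == "__main__":
    swing = 0.100  # V/decade, hypothetical hot-temperature value
    i0 = 1e-7      # A/um at VGS = VTH, hypothetical
    fast = subthreshold_leakage(i0, 0.28, 0.49, swing)
    slow = subthreshold_leakage(i0, 0.28, 0.71, swing)
    print(f"leakage ratio between the two VTH points: {fast / slow:.0f}x")
    print(f"VTH shift for 100x less leakage: "
          f"{vth_shift_for_reduction(100.0, swing):.2f} V")
```

Because the swing S grows with temperature, a given VTH buys fewer decades of leakage reduction when the die is hot, which is consistent with the spread between the temperature curves.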
Multiple Threshold Voltages
Dual VTH cells with low-VTH access transistors provide good tradeoffs in power and delay
[Ref: Hamzaoglu, et al., TVLSI’02]
Use high-VTH devices to lower leakage for stored ‘0’, which is much more common than a stored ‘1’
[Ref: N. Azizi, TVLSI’03]
[Figure: Two 6T cells with High VTH and Low VTH devices marked; the second cell is shown storing ‘0’]
Slide 7.16
Multiple Voltages
Selective usage of multiple voltages in cell array
– e.g., 16 fA/cell at 25°C in 0.13 μm technology
High VTH to lower subVTH leakage
Raised source, raised VDD, and lower BL reduce gate stress while maintaining SNM
[Figure: 6T cell annotated with applied voltages 1.5 V, 1.0 V, and 0.5 V; WL = 0 V]
[Ref: K. Osada, JSSC’03]
Slide 7.17
Power Breakdown During Read
Accessing correct cell
– Decoders, WL drivers
– For Lower Power:
Hierarchical WLs
Pulsed decoders
Performing read
– Charge and discharge large BL capacitance
– For Lower Power:
SAs and low BL swing
Lower VDD
Hierarchical BLs
Lower BL precharge
– May require read assist
[Figure: Read path with VDD_Prech, address, WL, Mem cell, sense amp, and data]
Slide 7.18
Hierarchical Wordline Architecture
[Figure: A global word line and a subglobal word line, gated by block group select and block select signals, drive the local word lines of the memory cells in Block 0, Block 1, Block 2, …]
Reduces amount of switched capacitance
Saves power and lowers delay
[Ref’s: Rabaey, Prentice’03; T. Hirose, JSSC’90]
Slide 7.19
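The reduction in switched capacitance can be estimated with a simple lumped model. This is a sketch under assumptions that do not come from the slides: the flat wordline loads every cell in the row, while the hierarchical scheme switches a global wire with one local-driver input per block plus a single local wordline. All capacitance values and function names are hypothetical.

```python
def flat_wordline_cap(num_cols: int, c_cell: float, c_wire_per_cell: float) -> float:
    """Capacitance switched when one long wordline drives every cell in the row."""
    return num_cols * (c_cell + c_wire_per_cell)

def hierarchical_wordline_cap(num_cols: int, num_blocks: int, c_cell: float,
                              c_wire_per_cell: float, c_local_driver: float) -> float:
    """Global wire plus one driver input per block, plus one active local wordline."""
    cols_per_block = num_cols // num_blocks
    global_cap = num_cols * c_wire_per_cell + num_blocks * c_local_driver
    local_cap = cols_per_block * (c_cell + c_wire_per_cell)
    return global_cap + local_cap

def switching_energy(cap: float, vdd: float) -> float:
    """Energy drawn from the supply for one full-swing charge of cap."""
    return cap * vdd ** 2

if __name__ == "__main__":
    # Hypothetical values: 1 fF of access-gate load and 0.2 fF of wire per cell.
    flat = flat_wordline_cap(1024, 1e-15, 0.2e-15)
    hier = hierarchical_wordline_cap(1024, 8, 1e-15, 0.2e-15, 2e-15)
    print(f"flat: {flat * 1e15:.1f} fF, hierarchical: {hier * 1e15:.1f} fF")
    print(f"energy ratio at 1 V: "
          f"{switching_energy(hier, 1.0) / switching_energy(flat, 1.0):.2f}")
```

In this model the saving comes from no longer switching the access-transistor gates of unselected blocks; the global wire capacitance remains, so the benefit saturates as the number of blocks grows.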
Hierarchical Bitlines
Divide up bitlines hierarchically
– Many variants possible
Reduces RC delay, also decreases CV² power
Lower BL leakage seen by accessed cell
[Figure: Local BLs connected to global BLs]
Slide 7.20
BL Leakage During Read Access
Leakage into nonaccessed cells
– Raises power and delay
– Affects BL differential
[Figure: Bit-line shared by cells storing “1”, “0”, and “0”]
Slide 7.21
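The effect of leakage on the bitline differential can be sketched with a worst-case current balance. The model below is an assumption for illustration, not the authors' analysis: the accessed cell discharges its bitline with a constant read current, while every other cell on the column leaks from the complementary bitline, so the leakage subtracts directly from the signal. All names and numbers are hypothetical.

```python
def bitline_differential(i_read: float, i_leak: float, cells_per_bl: int,
                         c_bl: float, t_sense: float) -> float:
    """Worst-case differential (V) developed after t_sense.

    Assumes the accessed cell sinks i_read from one bitline while the other
    (cells_per_bl - 1) cells each leak i_leak from the opposite bitline.
    """
    i_net = i_read - (cells_per_bl - 1) * i_leak
    return i_net * t_sense / c_bl

def max_cells_per_bitline(i_read: float, i_leak: float, min_ratio: float) -> int:
    """Largest column height whose total leakage stays below i_read / min_ratio."""
    if i_leak <= 0.0:
        raise ValueError("i_leak must be positive")
    leaking_cells = int(i_read / (min_ratio * i_leak))
    return leaking_cells + 1  # leaking cells plus the accessed cell

if __name__ == "__main__":
    # Hypothetical values: 20 uA read current, 10 nA leakage per cell, 100 fF bitline.
    dv = bitline_differential(20e-6, 10e-9, 256, 100e-15, 1e-9)
    print(f"differential after 1 ns with 256 cells: {dv * 1e3:.1f} mV")
    print(f"column height for a 10:1 current ratio: "
          f"{max_cells_per_bitline(20e-6, 10e-9, 10.0)}")
```

A bound of this kind is one reason to shorten the bitline seen by the accessed cell, since fewer cells share the leakage budget.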
Bitline Leakage Solutions
Hierarchical BLs
Raise VSS in cell
Negative WL voltage
Longer access FETs
Alternative bit-cells
Active compensation
Lower BL precharge voltage
[Figure: Negative Wordline (NWL): cell storing “1” and “0” with its wordline held at VSSWL]
[Figure: Raise VSS in cell (VGND): cell storing “1” and “0” with its source raised to VGND]
[Ref: A. Agarwal, JSSC’03]
Slide 7.22
Lower Precharge Voltage
Lower BL precharge
voltage decreases power
and improves Read SNM
Internal bit-cell node rises
less
Sharp limit due to
accidental cell writing if
access FET pulls internal ‘1’
low
Slide 7.23
VDD Scaling
Lower VDD (and other voltages) via classic voltage scaling
– Saves power
– Increases delay
– Limited by lost margin (read and write)
Recover Read SNM with read assist
– Lower BL precharge
– Boosted cell VDD [Ref: Bhavnagarwala’04, Zhang’06]
– Pulsed WL and/or Write-after-Read [Ref: Khellah’06]
– Lower WL [Ref: Ohbayashi’06]
Slide 7.24
Power Breakdown During Write
Accessing cell
– Similar to Read
– For Lower Power:
Hierarchical WLs
Performing write
– Traditionally drive BLs full swing
– For Lower Power:
Charge sharing
Data dependencies
Low swing BLs with amplification
[Figure: Write path with VDD_Prech, address, WL, Mem Cell, and data]
Slide 7.25
Charge recycling to reduce write power
Share charge between BLs or pairs of BLs
Saves energy on consecutive write operations
Need to assess overhead
Basic charge recycling – saves 50% power in theory
[Figure: Three steps. (1) Old values: BL = 0 V, BLB = VDD. (2) Connect floating BLs: BL = VDD/2, BLB = VDD/2. (3) Disconnect and drive new values: BL = VDD, BLB = 0 V]
[Ref’s: K. Mai, JSSC’98; G. Ming, ASICON’05]
Slide 7.26
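The 50% figure follows from charge conservation and the energy a supply delivers while charging a capacitor. The sketch below assumes two equal, ideal bitline capacitances, lossless sharing switches, and no overhead for the control circuitry, all of which the slide flags as needing assessment; the function names and the 200 fF value are hypothetical.

```python
def shared_voltage(c_a: float, v_a: float, c_b: float, v_b: float) -> float:
    """Voltage after two floating capacitors are connected (charge conservation)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)

def supply_energy_to_charge(cap: float, vdd: float, v_start: float) -> float:
    """Energy drawn from the VDD supply to raise cap from v_start to vdd."""
    return cap * vdd * (vdd - v_start)

def write_energy_savings(c_bl: float, vdd: float) -> float:
    """Fraction of supply energy saved when BL = 0 V and BLB = VDD swap values."""
    conventional = supply_energy_to_charge(c_bl, vdd, 0.0)
    v_mid = shared_voltage(c_bl, 0.0, c_bl, vdd)  # both BLs settle at VDD/2
    recycled = supply_energy_to_charge(c_bl, vdd, v_mid)
    return 1.0 - recycled / conventional

if __name__ == "__main__":
    print(f"theoretical saving: {write_energy_savings(200e-15, 1.0):.0%}")
```

The saving applies only when the new data differs from the old on that column; if the bitlines already hold the required values, connecting them would waste charge, which is part of the overhead to assess.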
Memory Statistics
0’s more common
– SPEC2000: 90% 0s in data
– SPEC2000: 85% 0s in instructions
Assumed write value using inverted data as necessary [Ref: Y. Chang, ISLPED’99]
New Bitcell:
– 1R, 1W port
– W0: WZ = 0, WWL = 1, WS = 1
– W1: WZ = 1, WWL = 1, WS = 0
[Figure: Bitcell with BL, WL, WS, WZ, and WWL]
[Ref: Y. Chang, TVLSI’04]
Slide 7.27
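One plausible reading of writing an assumed value with inverted data as necessary is a per-word inversion flag: the memory treats 0 as the cheap value, and any word with more 1s than 0s is stored inverted. The sketch below is an interpretation for illustration only, not the scheme of the cited papers; `encode_word`, `decode_word`, and the 32-bit width are hypothetical.

```python
def encode_word(word: int, width: int = 32) -> tuple[int, int]:
    """Return (stored_bits, inverted_flag) so that stored_bits is mostly 0s."""
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    if ones > width - ones:
        # More 1s than 0s: store the complement and set the flag bit.
        return (~word) & mask, 1
    return word, 0

def decode_word(stored: int, inverted: int, width: int = 32) -> int:
    """Recover the original word from the stored bits and the flag."""
    mask = (1 << width) - 1
    return (~stored) & mask if inverted else stored & mask

if __name__ == "__main__":
    for value in (0x00000000, 0x0000000F, 0xFFFFFFF0, 0xFFFFFFFF):
        stored, flag = encode_word(value)
        assert decode_word(stored, flag) == value
        print(f"{value:#010x} -> stored {stored:#010x}, inverted = {flag}")
```

With this encoding, no stored word has more than half of its bits at the expensive value, at the cost of one extra flag bit per word.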
Low-Swing Write
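Drive the BLs with low swing
Use amplification in cell to restore values
VWR = VDD – VTH – ΔVBL
[Figure: Sense-amplifying cell and write circuitry with VDD_Prech, EQ, SLC, WE, WL, BL, BLB, Q, QB, VWR, Din, and the column decoder; waveforms for WE, EQ, SLC, WL, BL/BLB, and Q/QB, with BL/BLB swinging between VDD – VTH and VDD – VTH – ΔVBL]
[Ref: K. Kanda, JSSC’04]
Slide 7.28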
Write Margin
Fundamental limit to most power-reducing techniques
Recover write margin with write assist, e.g.,
– Boosted WL
– Collapsed cell VDD [Itoh’96, Bhavnagarwala’04]
– Raised cell VSS [Yamaoka’04, Kanda’04]
– Cell with amplification [Kanda’04]
Slide 7.29
Non-traditional cells
Key tradeoff is with functional robustness
Use alternative cell to improve robustness, then trade
off for power savings
e.g. Remove read SNM
• Register file cell
• 1R, 1W port
• Read SNM eliminated
• Allows lower VDD
• 30% area overhead
• Robust layout
[Figure: 8T SRAM cell with RWL, WWL, RBL, and the WBL pair]
[Ref: L. Chang, VLSI’05]
Slide 7.30
Cells with Pseudo-Static SNM Removal
Isolate stored data during read
Dynamic storage for duration of read
[Figure: Two bit-cells, one with differential read and one with single-ended read; signals BL, WL, WLW, WWL, and WLB]
[Ref: S. Kosonocky, ISCICT’06]
[Ref: K. Takeda, JSSC’06]
Slide 7.31
Emerging Devices: Double-gate MOSFET
Emerging devices allow new SRAM structures
Back-gate biasing of thin-body MOSFET provides improved control of short-channel effects, and re-instates effective dynamic control of VTH.
[Figure: Double-gated (DG) MOSFET: fin width = TSi, gate length = Lg, fin height HFIN = W/2; source, gate, and drain]
[Figure: Back-gated (BG) MOSFET: gate length = LG, fin height HFIN = W; Gate1 is the switching gate and Gate2 provides VTH control; source and drain]
Back-gated (BG) MOSFET
• Independent front and back gates
• One switching gate and VTH control gate
[Ref: Z. Guo, ISLPED’05]
Slide 7.32
6T SRAM Cell with Feedback
Double-Gated (DG) NMOS pull-down and PMOS load devices
Back-Gated (BG) NMOS access devices dynamically increase β-ratio
– SNM during read ~300 mV
– Area penalty ~ 19%
[Figure: 6T cell with load devices PL and PR, pull-down devices NL and NR, and access devices AL and AR; storage nodes hold “1” and “0”; β ratio increased]
[Figure: Butterfly plots, Vsn2 (V) versus Vsn1 (V), both 0 to 1 V, with READ and STANDBY curves: 6T DG-MOS annotated 210 mV and 6T BG-MOS annotated 300 mV]
[Ref: Z. Guo, ISLPED’05]
Slide 7.33
Summary and Perspectives
Functionality is main constraint in SRAM
– Variation makes the outlying cells limiters
– Look at hold, read, write modes
Use various methods to improve robustness, then trade off for power savings
– Cell voltages, thresholds
– Novel bit-cells
– Emerging devices
Embedded memory major threat to continued technology scaling – innovative solutions necessary
Slide 7.34
References
Books and Book Chapters
K. Itoh et al., Ultra-Low Voltage Nano-scale Memories, Springer 2007.
A. Macii, “Memory Organization for Low-Energy Embedded Systems,” in Low-Power Electronics
Design, C. Piguet Ed., Chapter 26, CRC Press, 2005.
V. Moshnyaga and K. Inoue, “Low Power Cache Design,” in Low-Power Electronics Design,
C., Piguet Ed., Chapter 25, CRC Press, 2005.
J. Rabaey, A. Chandrakasan and B. Nikolic, Digital Integrated Circuits, Prentice Hall, 2003.
T. Takahawara and K. Itoh, “Memory Leakage Reduction,” in Leakage in Nanometer CMOS
Technologies, S. Narendra, Ed., Chapter 7, Springer 2006.
Articles
A. Agarwal, H. Li and K. Roy, “A Single-Vt low-leakage gated-ground cache for deep
submicron,” IEEE Journal of Solid-State Circuits, 38(2), pp. 319–328, Feb. 2003.
N. Azizi, F. Najm and A. Moshovos, “Low-leakage asymmetric-cell SRAM,” IEEE Transactions
on VLSI, 11(4), pp. 701–715, Aug. 2003.
A. Bhavnagarwala, S. Kosonocky, S. Kowalczyk, R. Joshi, Y. Chan, U. Srinivasan and
J. Wadhwa, “A transregional CMOS SRAM with single, logic VDD and dynamic power rails,” in
Symposium on VLSI Circuits, pp. 292–293, 2004.
Y. Cao, T. Sato, D. Sylvester, M. Orshansky and C. Hu, “New paradigm of predictive MOSFET
and interconnect modeling for early circuit design, ” in Custom Integrated Circuits Conference
(CICC), Oct. 2000, pp. 201–204.
L. Chang, D. Fried, J. Hergenrother et al., “Stable SRAM cell design for the 32 nm node and
beyond,” Symposium on VLSI Technology, pp. 128–129, June 2005.
Y. Chang, B. Park and C. Kyung, “Conforming inverted data store for low power memory,” IEEE
International Symposium on Low Power Electronics and Design, 1999.
Slide 7.35
References (cont.)
Y. Chang, F. Lai and C. Yang, “Zero-aware asymmetric SRAM cell for reducing cache power in
writing zero,” IEEE Transactions on VLSI Systems, 12(8), pp. 827–836, Aug. 2004.
Z. Guo, S. Balasubramanian, R. Zlatanovici, T.-J. King, and B. Nikolic, “FinFET-based SRAM
design,” International Symposium on Low Power Electronics and Design, pp. 2–7, Aug. 2005.
F. Hamzaoglu, Y. Ye, A. Keshavarzi, K. Zhang, S. Narendra, S. Borkar, M. Stan, and V. De,
“Analysis of Dual-VT SRAM cells with full-swing single-ended bit line sensing for on-chip
cache,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 10(2),
pp. 91–95, Apr. 2002.
T. Hirose, H. Kuriyama, S. Murakam, et al., “A 20-ns 4-Mb CMOS SRAM with hierarchical word decoding
architecture,” IEEE Journal of Solid-State Circuits, 25(5), pp. 1068–1074, Oct. 1990.
K. Itoh, A. Fridi, A. Bellaouar and M. Elmasry, “A Deep sub-V, single power-supply SRAM cell
with multi-VT, boosted storage node and dynamic load,” Symposium on VLSI Circuits,
133, June 1996.
K. Itoh, M. Horiguchi and T. Kawahara, “Ultra-low voltage nano-scale embedded RAMs,” IEEE
Symposium on Circuits and Systems, May 2006.
K. Kanda, H. Sadaaki and T. Sakurai, “90% write power-saving SRAM using sense-amplifying
memory cell,” IEEE Journal of Solid-State Circuits, 39(6), pp. 927–933, June 2004.
S. Kosonocky, A. Bhavnagarwala and L. Chang, International conference on solid-state and
integrated circuit technology, pp. 689–692, Oct. 2006.
K. Mai, T. Mori, B. Amrutur et al., “Low-power SRAM design using half-swing pulse-mode techniques,”
IEEE Journal of Solid-State Circuits, 33(11), pp. 1659–1671, Nov. 1998.
G. Ming, Y. Jun and X. Jun, “Low Power SRAM Design Using Charge Sharing Technique,”
pp. 102–105, ASICON, 2005.
K. Osada, Y. Saitoh, E. Ibe and K. Ishibashi, “16.7-fA/cell tunnel-leakage- suppressed 16-Mb
SRAM for handling cosmic-ray-induced multierrors,” IEEE Journal of Solid-State Circuits,
38(11), pp. 1952–1957, Nov. 2003.
PTM – Predictive Models. Available: http://www.eas.asu.edu/~ptm
Slide 7.36
References (cont.)
E. Seevinck, F. List and J. Lohstroh, “Static noise margin analysis of MOS SRAM Cells,” IEEE Journal
of Solid-State Circuits, SC-22(5), pp. 748–754, Oct. 1987.
K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii and H. Kobatake, “A read-static-noise-margin-free SRAM cell for low-vdd and high-speed applications,” IEEE
International Solid-State Circuits Conference, pp. 478–479, Feb. 2005.
M. Yamaoka, Y. Shinozaki, N. Maeda, Y. Shimazaki, K. Kato, S. Shimada, K. Yanagisawa and
K. Osadal, “A 300 MHz 25 µA/Mb leakage on-chip SRAM module featuring process-variation
immunity and low-leakage-active mode for mobile-phone application processor,” IEEE
International Solid-State Circuits Conference, 2004, pp. 494–495.
Slide 7.37