
Chapter 9
Optimizing Power @ Standby – Memory
Benton H. Calhoun
Jan M. Rabaey
Slide 9.1
Chapter Outline
Memory in Standby
Voltage Scaling
Body Biasing
Periphery
Slide 9.2
Memory Dominates Processor Area
SRAM is a major source of static power in ICs,
especially for low-power applications
Special memory requirement: need to retain state
in standby
Metrics for standby:
– 1. Leakage power
– 2. Energy overhead for entering/leaving standby
– 3. Timing/area overhead
[Figure: 6T SRAM bit-cell with transistors M1 to M6, wordline WL, bitlines BL and BLB, and storage nodes Q and QB]
Slide 9.3
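Metric 2 above can be made concrete with a break-even argument: entering standby only pays off if the idle interval is long enough for the leakage saved to exceed the cost of going in and out. The sketch below is a minimal model. It assumes constant leakage power in each mode and a fixed energy for one entry/exit round trip. The function names and example numbers are hypothetical, and timing/area overhead (metric 3) is not modeled.

```python
def break_even_time(e_transition_j: float, p_active_leak_w: float, p_standby_leak_w: float) -> float:
    """Minimum idle time (s) for which a round trip into standby saves energy.

    Hypothetical model: leakage power is constant in each mode, and entering
    plus leaving standby costs a fixed energy e_transition_j.
    """
    saving_w = p_active_leak_w - p_standby_leak_w
    if saving_w <= 0:
        raise ValueError("standby must leak less than active mode")
    return e_transition_j / saving_w


def should_enter_standby(idle_s: float, e_transition_j: float,
                         p_active_leak_w: float, p_standby_leak_w: float) -> bool:
    """Enter standby only if the expected idle time exceeds the break-even time."""
    return idle_s > break_even_time(e_transition_j, p_active_leak_w, p_standby_leak_w)


# Hypothetical numbers: 10 nJ per round trip, 100 uW active vs 10 uW standby leakage.
t_be = break_even_time(10e-9, 100e-6, 10e-6)
print(f"break-even idle time: {t_be * 1e6:.1f} us")
```

A deeper standby state (lower voltage) usually lowers standby leakage but raises the transition energy, so the break-even time is one way to compare such options.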
Reminder of “Design-Time” Leakage Reduction
Design-time techniques (Chapter 7) also impact
leakage
– High-VTH transistors
– Different precharge voltages
– Floating BLs
This chapter: adaptive methods that
uniquely address memory standby power
Slide 9.4
The Voltage Knobs
Changing internal voltages affects the leakage of the various transistors in the cell differently
Voltage changes are accomplished by playing tricks with the peripheral circuits
Body effect: $\Delta V_{TH} \cong K\left(\sqrt{\delta + 2\psi} - \sqrt{2\psi}\right)$
DIBL: $\Delta V_{TH} \cong -\lambda\delta$
[Figure: Leakage reduction (ratio, log scale) versus offset voltage δ (0 to 1.0 V) for offset configurations A1, A2, B1, B2 and C, in which cell node voltages are shifted by +δ or lowered to VDD - δ; L = 90 nm, T = 2 nm, VDD = 1 V, S = 100 mV/decade, K = 0.2 V^1/2, 2ψ = 0.6 V, λ = 0.05]
[Ref: Y. Nakagome, IBM’03]
Slide 9.5
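The two threshold-shift expressions can be turned into leakage-reduction ratios using the slide's parameters. This is a minimal sketch. It assumes that sub-threshold current scales as 10^(-ΔV/S) for an effective threshold increase (or extra negative VGS) of ΔV. It works with magnitudes, so it assumes each knob is applied in the direction that raises the effective VTH. The function names are hypothetical, and the negative-VGS column is included only for comparison.

```python
import math

# Parameters listed on the Voltage Knobs slide (L = 90 nm, VDD = 1 V).
S = 0.100       # sub-threshold swing, V/decade
K = 0.2         # body-effect coefficient, V^0.5
TWO_PSI = 0.6   # 2*psi, V
LAM = 0.05      # DIBL coefficient


def dvth_body(delta: float) -> float:
    """VTH increase from the body effect when the source-body reverse bias is delta."""
    return K * (math.sqrt(delta + TWO_PSI) - math.sqrt(TWO_PSI))


def dvth_dibl(delta: float) -> float:
    """Magnitude of the DIBL VTH shift when VDS is reduced by delta."""
    return LAM * delta


def leakage_ratio(dv: float) -> float:
    """Sub-threshold leakage ratio for an effective VTH increase (or extra negative VGS) of dv volts."""
    return 10 ** (-dv / S)


for delta in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"delta = {delta:.1f} V: body effect {leakage_ratio(dvth_body(delta)):.3f}, "
          f"DIBL {leakage_ratio(dvth_dibl(delta)):.3f}, "
          f"VGS = -delta {leakage_ratio(delta):.1e}")
```

At δ = 1 V the body-effect shift is about 98 mV (roughly one decade) and the DIBL shift is 50 mV (half a decade). Making VGS negative by the same δ buys δ/S = 10 decades in this idealized model.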
Lower VDD in Standby
[Figure: Example SRAM whose supply VDD_SRAM is switched between VDDH (VDD) in active mode and VDDL (VDDlow, “drowsy”) in standby mode]
Basic Idea: Lower VDD lowers leakage
– sub-threshold leakage
– GIDL
– gate tunneling
Question: What sets the lower limit?
[Ref: K. Flautner, ISCA’02]
Slide 9.6
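Lowering VDD helps more than linearly. Leakage power is VDD times leakage current, and the sub-threshold current itself falls as VDD drops because DIBL raises VTH. The minimal sketch below covers only the sub-threshold term; GIDL and gate tunneling, which also fall with VDD, are ignored. S and λ are assumed values taken from the 90 nm example on the Voltage Knobs slide, and the function name is hypothetical.

```python
S = 0.100    # V/decade (assumed, from the 90 nm Voltage Knobs example)
LAM = 0.05   # DIBL coefficient (assumed, same example)


def standby_leakage_power_ratio(vdd_active: float, vdd_standby: float,
                                s: float = S, lam: float = LAM) -> float:
    """P_leak(vdd_standby) / P_leak(vdd_active), sub-threshold component only.

    Power scales with VDD directly; in addition, the sub-threshold current
    drops because DIBL raises VTH by lam * (vdd_active - vdd_standby).
    """
    current_ratio = 10 ** (-lam * (vdd_active - vdd_standby) / s)
    return (vdd_standby / vdd_active) * current_ratio


print(f"1.0 V -> 0.3 V: leakage power x {standby_leakage_power_ratio(1.0, 0.3):.3f}")
```

The model also ignores the extra current roll-off when VDS falls to a few thermal voltages, so it says nothing about the lower limit. That limit is set by data retention.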
Limits to VDD Scaling: DRV
Data Retention Voltage (DRV):
Voltage below which a bit-cell loses its data
That is, the supply voltage at which the Static Noise Margin (SNM) of the SRAM cell in standby mode reduces to zero.
[Figure: Butterfly curves (voltage transfer curves VTC1 and VTC2, V2 versus V1, in V) of a bit-cell in 130 nm CMOS at VDD = 0.4 V and VDD = 0.18 V]
[Ref: H. Qin, ISQED’04]
Slide 9.7
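The DRV is defined as the VDD at which the standby SNM falls to zero. It can therefore be found numerically by bisection on any SNM(VDD) function that increases with VDD. This is a minimal sketch under that monotonicity assumption. `toy_snm` is a hypothetical stand-in, not a device model. It is chosen to cross zero at 250 mV so that, with a 100 mV guard band, the result echoes the 350 mV standby VDD reported on the next slide.

```python
from typing import Callable


def find_drv(snm: Callable[[float], float], v_lo: float = 0.0, v_hi: float = 1.0,
             tol: float = 1e-4) -> float:
    """Smallest VDD (within tol) at which snm(VDD) > 0, found by bisection.

    Assumes snm increases monotonically with VDD and snm(v_lo) <= 0 < snm(v_hi).
    """
    if not (snm(v_lo) <= 0 < snm(v_hi)):
        raise ValueError("SNM must change sign between v_lo and v_hi")
    while v_hi - v_lo > tol:
        mid = 0.5 * (v_lo + v_hi)
        if snm(mid) > 0:
            v_hi = mid  # cell still holds data: DRV is at or below mid
        else:
            v_lo = mid  # cell has lost data: DRV is above mid
    return v_hi


def standby_vdd(drv: float, guard_band: float = 0.100) -> float:
    """Standby supply = DRV plus a safety guard band (volts)."""
    return drv + guard_band


def toy_snm(vdd: float) -> float:
    """Hypothetical SNM curve (V) crossing zero at 0.25 V; illustration only."""
    return 0.4 * (vdd - 0.25)


drv = find_drv(toy_snm)
print(f"DRV ~ {drv * 1e3:.0f} mV, standby VDD ~ {standby_vdd(drv) * 1e3:.0f} mV")
```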
Power savings of DRV
[Figure: 1.4 mm × 1.4 mm test chip containing a 4 kB SRAM IP module; measured leakage current (µA, 0 to 60) versus supply voltage (V, 0 to 1), with the measured DRV range marked]
Test chip in 130 nm CMOS technology with built-in voltage regulator
More than 90% reduction in leakage power with 350 mV standby VDD (100 mV guard band)
[Ref: H. Qin, ISQED’04]
Slide 9.8
DRV and Transistor Sizes
[Figure: DRV (mV, 140 to 190) versus width scaling factor (0 to 3) for Ma, Mp and Mn, compared with a model]
where Ma, Mp and Mn are the access transistor, PMOS pull-up and NMOS pull-down, respectively
[Ref: H. Qin, Jolpe’06]
Slide 9.9
Impact of Process “Balance”
Stronger PMOS or NMOS (SP, SN) in subthreshold lowers SNM, even for a typical cell
[Ref: J. Ryan, GLSVLSI’07]
Slide 9.10
Impact of Process Variations on DRV
DRV varies widely from cell to
cell
Most variations random with
some systematic effects (e.g.,
module boundaries)
DRV histogram has long tail
[Figure: DRV histogram for a 32 Kb SRAM in 130 nm CMOS (cell count, 0 to 6000, versus DRV, 100 to 400 mV) and DRV spatial distribution across the array]
[Ref: H. Qin, ISQED’04]
Slide 9.11
Impact of Process Variations on DRV
DRV distribution for 90 nm and 45 nm CMOS
[Figure: Frequency versus DRV (mV, 0 to 350) for 90 nm and 45 nm CMOS, with the tail of each distribution marked; ©IEEE 2007]
Other sources of variation:
Global variations, data values, temperature (weak), bitline voltage (weak)
[Ref: J. Wang, CICC’07]
Slide 9.12
DRV Statistics for an Entire Memory
DRV distribution is neither normal nor log-normal
CDF model of DRV distribution: $F_{DRV}(x) = 1 - P(\mathrm{SNM} < 0,\ V_{DD} = x)$
[Figure: Worst-case DRV (mV, 100 to 350) versus memory size (expressed in σ, 3 to 8): model compared with normal and log-normal fits and Monte-Carlo simulation; © IEEE 2007]
[Ref: J. Wang, ESSCIRC’07]
Slide 9.13
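The chip holds its data only if every cell does. The memory-level DRV therefore follows from the per-cell CDF: with N independent cells, P(all cells retain data at VDD = x) = F_DRV(x)^N. This is a minimal sketch assuming independent cells. `placeholder_p_fail` is a Gaussian tail (mean 150 mV, σ = 30 mV) chosen only to exercise the code. It is not the paper's model; the slide stresses that the real distribution is neither normal nor log-normal. Any fitted tail model can be passed in instead.

```python
import math
from typing import Callable


def array_retention_prob(p_cell_fail: Callable[[float], float], vdd: float, n_cells: int) -> float:
    """P(all n_cells independent cells retain data at vdd) = F_DRV(vdd) ** n_cells."""
    p = p_cell_fail(vdd)
    if p >= 1.0:
        return 0.0
    return math.exp(n_cells * math.log1p(-p))  # log1p keeps precision when p is tiny


def array_drv(p_cell_fail: Callable[[float], float], n_cells: int, target_yield: float = 0.99,
              v_lo: float = 0.0, v_hi: float = 1.0, tol: float = 1e-4) -> float:
    """Smallest VDD at which the whole array retains data with probability >= target_yield."""
    if array_retention_prob(p_cell_fail, v_hi, n_cells) < target_yield:
        raise ValueError("target yield not reached even at v_hi")
    while v_hi - v_lo > tol:
        mid = 0.5 * (v_lo + v_hi)
        if array_retention_prob(p_cell_fail, mid, n_cells) >= target_yield:
            v_hi = mid
        else:
            v_lo = mid
    return v_hi


def placeholder_p_fail(vdd: float, mu: float = 0.150, sigma: float = 0.030) -> float:
    """Hypothetical per-cell P(DRV > vdd): Gaussian tail, for illustration only."""
    return 0.5 * math.erfc((vdd - mu) / (sigma * math.sqrt(2)))


for n in (1_000, 32_000, 1_000_000):
    print(f"{n:>9} cells: array DRV ~ {array_drv(placeholder_p_fail, n) * 1e3:.0f} mV")
```

The worst-case DRV keeps rising with memory size. For large arrays it is set far out in the tail of the per-cell distribution, which is why the shape of that tail matters more than its mean.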
Reducing the DRV
[Figure: Cell DRV histogram (DRV, 100 to 400 mV) with the chip DRV marked at the upper tail]
1. Cell optimization
2. ECC (Error-Correcting Codes)
3. Cell optimization + ECC
Slide 9.14
Lowering the DRV Using ECC
[Figure: ECC-protected SRAM: Data In → ECC encoder → write of data D and parity P; read → ECC decoder (data correction) → Data Out]
Error Correction Challenges
– Maximize correction rate
– Minimize timing overhead
– Minimize area overhead
Hamming [31, 26, 3] achieves 33% power saving
Reed–Muller [256, 219, 8] achieves 35% power saving
[Ref: A. Kumar, ISCAS’07]
Slide 9.15
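An [n, k, d] block code stores k data bits in n cells and corrects t = ⌊(d - 1)/2⌋ errors per word. Hamming [31, 26, 3] therefore corrects 1 bit and Reed–Muller [256, 219, 8] corrects 3. The sketch below computes the word-failure probability from a per-cell retention-failure probability p at some standby VDD, assuming independent bit failures. It is a minimal illustration, not the analysis behind the quoted 33% and 35% savings. `p_bit` is a hypothetical value.

```python
from math import comb


def correctable_errors(d: int) -> int:
    """A code with minimum distance d corrects t = (d - 1) // 2 errors per word."""
    return (d - 1) // 2


def word_failure_prob(n: int, d: int, p_bit: float) -> float:
    """P(more than t of the n stored bits fail), assuming independent bit failures."""
    t = correctable_errors(d)
    # Sum the tail directly instead of computing 1 - P(ok), which loses precision.
    return sum(comb(n, j) * p_bit**j * (1 - p_bit)**(n - j) for j in range(t + 1, n + 1))


def parity_overhead(n: int, k: int) -> float:
    """Extra storage per data bit, (n - k) / k."""
    return (n - k) / k


CODES = {"Hamming [31, 26, 3]": (31, 26, 3), "Reed-Muller [256, 219, 8]": (256, 219, 8)}
p_bit = 1e-4  # hypothetical per-cell retention-failure probability at some standby VDD

for name, (n, k, d) in CODES.items():
    uncoded = 1 - (1 - p_bit) ** k  # k unprotected bits, any failure is fatal
    print(f"{name}: t = {correctable_errors(d)}, overhead = {parity_overhead(n, k):.1%}, "
          f"P(word fail) = {word_failure_prob(n, d, p_bit):.2e} vs uncoded {uncoded:.2e}")
```

The code with the larger t tolerates a much higher per-cell failure rate. That tolerance is what allows the standby VDD to approach the bulk of the DRV distribution rather than its extreme tail. The price is paid in area (parity overhead) and in timing.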
Combining Cell Optimization and ECC
[Figure: 1K-word DRV histograms for the original SRAM, the optimized SRAM and the optimized SRAM with ECC (original DRV, optimized DRV and optimized DRV with error correction, in mV), and normalized SRAM leakage current versus VDD (V) with standby operating points A to D; annotations include 650 mV, 320 mV, 255 mV and 50X]
Standby VDD:
A: Standard, 1 V
B: Standard, DRVMAX + 100 mV
C: Optimized, DRVMAX + 100 mV
D: Optimized with ECC, DRVECC_MAX + 100 mV
[Ref: A. Kumar, ISCAS’07]
Slide 9.16
How to Approach the DRV Safely?
[Figure: Adjustable power supply, set through VCTRL by a sub-VTH controller (with reset), supplies VDD to the core cells and to canary cells whose failure detectors check stored “1”/“0” values]
Using “canary cells” to set the standby voltage
in closed loop
[Ref: J. Wang, CICC’07]
Slide 9.17
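The closed-loop idea can be sketched as a simple search: step the array supply down while the canary cells still hold their data, and stop one step above the first canary failure. The canaries are designed to fail at a higher voltage than the core cells, so the core cells keep some margin. This is a minimal illustration, not the controller of the CICC’07 chip. `canary_fails`, the step size and the limits are hypothetical. Voltages are kept in integer millivolts to avoid floating-point drift in the step sequence.

```python
from typing import Callable


def settle_standby_vdd_mv(canary_fails: Callable[[int], bool], v_start_mv: int = 1000,
                          v_min_mv: int = 100, step_mv: int = 25) -> int:
    """Step the array supply down until a canary cell fails, then keep the last safe level.

    canary_fails(vdd_mv) is a hypothetical stand-in for the failure detectors:
    it returns True once the canary cells, built to fail at a higher voltage
    than the core cells, lose their data at supply vdd_mv.
    """
    vdd = v_start_mv
    while vdd - step_mv >= v_min_mv:
        trial = vdd - step_mv
        if canary_fails(trial):
            return vdd  # last level at which every canary still held its data
        vdd = trial
    return vdd  # reached the lower limit without a canary failure


# Hypothetical canaries that start failing below 300 mV.
print(settle_standby_vdd_mv(lambda vdd_mv: vdd_mv < 300), "mV")
```

Conditions such as temperature drift, so such a loop would have to be re-run or run continuously. The gap between canary failure and core-cell failure sets the power/reliability trade-off.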
How to Approach the DRV Safely?
[Figure: Cell DRV histogram with the canary failure threshold (labels: less power, more reliable); canary replica and test circuit with multiple sets of canary cells beside a 128 Kb SRAM array; mean DRV of canary cells (V, 0 to 0.8) versus VCTRL (V, 0 to 0.8); ©IEEE 2007]
0.6% area overhead in 90 nm test chip
[Ref: J. Wang, CICC’07]
Slide 9.18
Raising VSS
Raise bit-cell VSS in standby (e.g., 0 to 0.5 V)
Lower BL voltage in standby (e.g., 1.5 to 1 V)
[Figure: Bit-cell in standby with WL = 0 V, storage nodes ‘1’ and ‘0’ (the ‘0’ node sits at the raised VSS of 0.5 V), and bitlines lowered from 1.5 V to 1.0 V]
Lower voltage → less gate leakage and GIDL
Lower VDS → less sub-VTH leakage (DIBL)
Negative VBS reduces sub-VTH leakage
[Ref: K. Osada, JSSC’03]
Slide 9.19
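The three sub-threshold effects listed above can be seen on the off access transistor on the ‘0’ side, whose gate is WL, drain is BL and source is the ‘0’ node. With VSS raised to 0.5 V and BL lowered to 1.0 V, its VGS goes from 0 to -0.5 V, its VDS from 1.5 V to 0.5 V, and its VBS from 0 to -0.5 V. The latter assumes the p-well stays at 0 V. This is a minimal illustrative sketch. The device parameters are assumed values borrowed from the 90 nm Voltage Knobs example, not Osada's process, and gate leakage, GIDL and the low-VDS current roll-off are not modeled. The function names are hypothetical.

```python
import math

# Illustrative device parameters (assumed, from the 90 nm Voltage Knobs example).
S, K, TWO_PSI, LAM = 0.100, 0.2, 0.6, 0.05


def log10_subvth_current(vgs: float, vds: float, vsb: float) -> float:
    """log10 of sub-threshold current relative to a device at VGS = VDS = VSB = 0.

    Negative VGS lowers current directly, DIBL lowers VTH by LAM * VDS, and a
    source-body reverse bias VSB raises VTH through the body effect.
    """
    dvth_body = K * (math.sqrt(vsb + TWO_PSI) - math.sqrt(TWO_PSI))
    return (vgs + LAM * vds - dvth_body) / S


def access_leakage_log10(wl: float, bl: float, v_node0: float, v_body: float = 0.0) -> float:
    """Off access transistor on the '0' side: gate = WL, drain = BL, source = '0' node."""
    return log10_subvth_current(wl - v_node0, bl - v_node0, v_node0 - v_body)


before = access_leakage_log10(wl=0.0, bl=1.5, v_node0=0.0)
after = access_leakage_log10(wl=0.0, bl=1.0, v_node0=0.5)
print(f"access-transistor sub-VTH leakage change: {after - before:+.2f} decades")
```

In this idealized model most of the reduction comes from the negative VGS. The DIBL and body-effect terms add roughly half a decade each.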
Body Biasing
Reverse Body Bias (RBB) for leakage reduction
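– Move FET source (as in raised VSS)
– Move FET body
Example: Whenever WL is low, apply RBB
[Figure: Bit-cell (WL, BL, BLB) with PMOS body bias VPB and NMOS body bias VNB, and waveforms for active and standby modes:
Active: VDD, VSS at VDD and 0 V; VPB = VDD, VNB = 0 V
Standby: VDD, VSS at VDD and 0 V; VPB = 2VDD, VNB = -VDD]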
[Ref: H. Kawaguchi, VLSI Symp. 98]
Slide 9.20
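The example rule (“whenever WL is low, apply RBB”) can be sketched as a bias selector, together with the resulting NMOS threshold shift. This is a minimal illustration. The bias levels follow the waveforms above. The body-effect coefficients are assumed values borrowed from the 90 nm Voltage Knobs example, not from the cited design. `WellBias` and the function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WellBias:
    vpb: float  # PMOS body (n-well) voltage, V
    vnb: float  # NMOS body (p-well) voltage, V


def row_well_bias(wl_high: bool, vdd: float) -> WellBias:
    """No body bias while WL is high; RBB (VPB = 2VDD, VNB = -VDD) while WL is low."""
    if wl_high:
        return WellBias(vpb=vdd, vnb=0.0)
    return WellBias(vpb=2 * vdd, vnb=-vdd)


def nmos_vth_shift(vsb: float, k: float = 0.2, two_psi: float = 0.6) -> float:
    """Body-effect VTH increase for a source-body reverse bias vsb (assumed coefficients)."""
    return k * (math.sqrt(vsb + two_psi) - math.sqrt(two_psi))


vdd = 1.0
bias = row_well_bias(wl_high=False, vdd=vdd)
dv = nmos_vth_shift(vsb=0.0 - bias.vnb)  # NMOS source at VSS = 0 V
print(bias, f"NMOS VTH shift ~ {dv * 1e3:.0f} mV")
```

With these assumed coefficients a full VDD of reverse bias raises VTH by only about 100 mV, roughly one decade at 100 mV/decade. This is one reason RBB is often combined with source or supply scaling, as on the next slides. Swinging the wells also costs energy, which counts against short standby intervals.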
Combining Body Biasing and Voltage Scaling
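[Figure: Bit-cell (WL, BL, BLB) with body biases VPB and VNB; active and standby waveforms for WL, the supply rails (VDD, VSS) and the body biases (VPB, VNB), with levels VDD, 2VDD, 0 V and -VDD marked]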
[Ref: A. Bhavnagarwala, SOC’00]
Slide 9.21
Combining Raised VSS and RBB
[Figure: Bit-cell (WL, BL, BLB) with supply VDD, raised VSS, and body biases VPB and VNB]
28X savings in standby power reported
[Ref: L. Clark, TVLSI’04]
Slide 9.22
Voltage Scaling in and Around the Bitcell
Large number of reported techniques
[1] K. Osada et al. JSSC 2001
[2] N. Kim et al. TVLSI 2004
[3] H. Qin et al. ISQED 2004
[4] K. Kanda et al. ASIC/SOC 2002
[5] A. Bhavnagarwala et al. SymVLSIC 2004
[6] T. Enomoto et al. JSSC 2003
[7] M. Yamaoka et al. SymVLSIC 2002
[8] M. Yamaoka et al. ISSCC 2004
[9] A. Bhavnagarwala et al. ASIC/SOC 2000
[10] K. Itoh et al. SymVLSIC 1996
[11] H. Yamauchi et al. SymVLSIC 1996
[12] K. Osada et al. JSSC 2003
[13] K. Zhang et al. SymVLSIC 2004
[14] K. Nii et al. ISSCC 2004
[15] A. Agarwal et al. JSSC 2003
[16] K. Kanda et al. JSSC 2004
Slide 9.23
Periphery Breakdown
Periphery leakage is often not negligible
– Wide transistors to drive large load capacitors
– Low-VTH transistors to meet performance specs
Chapter 8 techniques for logic leakage reduction
equally applicable, but …
Task made easier than for generic logic because
of well-defined structure and signal patterns of
periphery
– e.g., decoders output 0 in standby
Lower peripheral VDD can be used, but needs fast
level-conversion to interface with array
Slide 9.24
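The “well-defined signal patterns” point can be illustrated. If a stage's standby output is known, the off network, and hence the leaking device, is known too. For a static CMOS inverter with output 0 the PMOS pull-up is off with VDD across it; with output 1 the NMOS pull-down is. This is a minimal sketch assuming a chain of plain inverters (e.g., a wordline driver) whose final output is 0 in standby. The names are hypothetical, and the result only flags which device in each stage is a candidate for high VTH or gating.

```python
from typing import List


def leaking_network(standby_output: int) -> str:
    """Off (leaking) network of a static CMOS stage with a known standby output."""
    if standby_output == 0:
        return "pull-up (PMOS)"    # output low: PMOS is off with VDD across it
    if standby_output == 1:
        return "pull-down (NMOS)"  # output high: NMOS is off with VDD across it
    raise ValueError("standby output must be 0 or 1")


def driver_chain_leakers(n_stages: int, final_output: int = 0) -> List[str]:
    """Leaking network of each inverter in a chain, listed from input to output."""
    leakers = []
    out = final_output
    for _ in range(n_stages):
        leakers.append(leaking_network(out))
        out ^= 1  # walking backwards: the previous stage's output is the complement
    return list(reversed(leakers))


print(driver_chain_leakers(4))
```

Because the leaking device alternates between PMOS and NMOS along the chain, high-VTH devices or sleep switches can be placed selectively rather than on every transistor. Active-mode speed on the switching edge is largely unaffected.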
Summary and Perspectives
SRAM standby power is leakage-dominated
Voltage knobs are effective to lower power
Adaptive schemes must account for variation to
allow outlying cells to function
Combined schemes are most promising
– e.g., Voltage scaling and ECC
Important to assess overhead!
– Need for exploration and optimization framework, in
the style we have defined for logic
Slide 9.25
References
Books and Book Chapters:
K. Itoh, M. Horiguchi and H. Tanaka, Ultra-Low Voltage Nano-Scale Memories, Springer 2007.
T. Takahawara and K. Itoh, “Memory Leakage Reduction,” in Leakage in Nanometer CMOS
Technologies, S. Narendra, Ed., Chapter 7, Springer 2006.
Articles:
A. Agarwal, L. Hai and K. Roy, “A single-Vt low-leakage gated-ground cache for deep
submicron,” IEEE Journal of Solid-State Circuits, pp. 319–328, Feb. 2003.
A. Bhavnagarwala, A. Kapoor, J. Meindl, “Dynamic-threshold CMOS SRAM cells for fast,
portable applications,” Proceedings of IEEE ASIC/SOC Conference, pp. 359–363, Sep. 2000.
A. Bhavnagarwala et al., “A transregional CMOS SRAM with single, logic VDD and dynamic
power rails,” Proceedings of IEEE VLSI Circuits Symposium, pp. 292–293, June 2004.
L. Clark, M. Morrow and W. Brown, “Reverse-body bias and supply collapse for low effective
standby power,” IEEE Transactions on VLSI, pp. 947–956, Sep. 2004.
T. Enomoto, Y. Ota and H. Shikano, “A self-controllable voltage level (SVL) circuit and its low-power high-speed CMOS circuit applications,” IEEE Journal of Solid-State Circuits, 38(7),
pp. 1220–1226, July 2003.
K. Flautner et al., “Drowsy caches: Simple techniques for reducing leakage power,”
Proceedings of ISCA 2002, pp. 148–157, Anchorage, May 2002.
K. Itoh et al., “A deep sub-V, single power-supply SRAM cell with multi-VT, boosted storage node
and dynamic load,” Proceedings of VLSI Circuits Symposium, pp. 132–133, June 1996.
K. Kanda, T. Miyazaki, S. Min, H. Kawaguchi and T. Sakurai, “Two orders of magnitude leakage
power reduction of low voltage SRAMs by row-by-row dynamic Vdd control (RRDV) scheme,”
Proceedings of IEEE ASIC/SOC Conference, pp. 381–385, Sep. 2002.
Slide 9.26
References (cont.)
K. Kanda, et al., “90% write power-saving SRAM using sense-amplifying memory cell,” IEEE
Journal of Solid-State Circuits, pp. 927–933, June 2004.
H. Kawaguchi, Y. Itaka and T. Sakurai, “Dynamic leakage cut-off scheme for low-voltage
SRAMs,” Proceedings of VLSI Symposium, pp. 140–141, June 1998.
A. Kumar et al., “Fundamental bounds on power reduction during data-retention in standby
SRAM,” Proceedings ISCAS 2007, pp. 1867–1870, May 2007.
N. Kim, K. Flautner, D. Blaauw and T. Mudge, “Circuit and microarchitectural techniques for
reducing cache leakage power,” IEEE Transactions on VLSI, pp. 167–184, Feb. 2004.
Y. Nakagome et al., “Review and prospects of low-voltage RAM circuits,” IBM J. R&D, 47(5/6),
pp. 525–552, Sep./Nov. 2003.
K. Osada, “Universal-Vdd 0.65–2.0-V 32-kB cache using a voltage-adapted timing-generation
scheme and a lithographically symmetrical cell,” IEEE Journal of Solid-State Circuits,
pp. 1738–1744, Nov. 2001.
K. Osada et al., “16.7-fA/cell tunnel-leakage-suppressed 16-Mb SRAM for handling cosmic-ray-induced multierrors,” IEEE Journal of Solid-State Circuits, pp. 1952–1957, Nov. 2003.
H. Qin, et al., “SRAM leakage suppression by minimizing standby supply voltage,” Proceedings of
ISQED, pp. 55–60, 2004.
H. Qin, R. Vattikonda, T. Trinh, Y. Cao and J. Rabaey, “SRAM cell optimization for ultra-low
power standby,” Journal of Low Power Electronics, 2(3), pp. 401–411, Dec. 2006.
J. Ryan, J. Wang and B. Calhoun, “Analyzing and modeling process balance for sub-threshold
circuit design,” Proceedings GLSVLSI, pp. 275–280, Mar. 2007.
J. Wang and B. Calhoun, “Canary replica feedback for Near-DRV standby VDD scaling in a
90 nm SRAM,” Proceedings of Custom Integrated Circuits Conference (CICC), pp. 29–32,
Sep. 2007.
Slide 9.27
References (cont.)
J. Wang, A. Singhee, R. Rutenbar and B. Calhoun, “Statistical modeling for the minimum standby
supply voltage of a full SRAM array,” Proceedings of European Solid-State Circuits Conference
(ESSCIRC), pp. 400–403, Sep. 2007.
M. Yamaoka et al., “0.4-V logic library friendly SRAM array using rectangular-diffusion cell and
delta-boosted-array-voltage scheme,” Proceedings of VLSI Circuits Symposium, pp. 13–15,
June 2002.
M. Yamaoka, et al., “A 300 MHz 25 μA/Mb leakage on-chip SRAM module featuring process-variation immunity and low-leakage-active mode for mobile-phone application processor,”
Proceedings of IEEE Solid-State Circuits Conference, pp. 15–19, Feb. 2004.
K. Zhang et al., “SRAM design on 65 nm CMOS technology with integrated leakage reduction
scheme,” Proceedings of VLSI Circuits Symposium, pp. 294–295, June 2004.
Slide 9.28