Chapter 7
Optimizing Power @ Design Time – Memory
Benton H. Calhoun, Jan M. Rabaey
Slide 7.1

Role of Memory in ICs
Memory is very important
Focus in this chapter is embedded memory
Percentage of area going to memory is increasing
[Ref: V. De, Intel 2006]
Slide 7.2

Processor Area Becoming Memory Dominated
On-chip SRAM contains 50–90% of total transistor count
– Xeon: 48M/110M
– Itanium 2: 144M/220M
SRAM is a major source of chip static power dissipation
– Dominant in ultra low-power applications
– Substantial fraction in others
[Figure: Intel Penryn™ die photo, SRAM labeled (picture courtesy of Intel)]
Slide 7.3

Chapter Outline
Introduction to Memory Architectures
Power in the Cell Array
Power for Read Access
Power for Write Access
New Memory Technologies
Slide 7.4

Basic Memory Structures
[Figure: memory organized as blocks (Block 0, Block i, Block P – 1) on a global data bus; labels: row address, column address, block address, control circuitry, block selector, global amplifier/driver, I/O]
[Ref: J. Rabaey, Prentice’03]
Slide 7.5

SRAM Metrics
Functionality
– Data retention
– Readability
– Writability
– Soft errors
Area
Power
Why is functionality a “metric”?
– Process variations increase with scaling
– Large number of cells requires analysis of tails (out to 6σ or 7σ)
– Within-die VTH variation due to random dopant fluctuations (RDFs)
Slide 7.6

Where Does SRAM Power Go?
Numerous analytical SRAM power models
Great variety in power breakdowns
Different applications cause different components of power to dominate
Hence: depends on application, e.g., high speed versus low power, portable
Slide 7.7

SRAM Cell
Three tasks of a cell:
Hold data
– WL = 0; BLs = X
Write
– WL = 1; BLs driven with new data
Read
– WL = 1; BLs precharged and left floating
[Figure: traditional 6-transistor (6T) SRAM cell; transistors M1–M6, storage nodes Q and QB, wordline WL, bitline pair]
Slide 7.8

Key SRAM Cell Metrics
Key functionality metrics:
Hold
– Static noise margin (SNM)
– Data retention voltage (DRV)
Read
– Static noise margin (SNM)
Write
– Write margin
Metrics: area is the primary constraint; next, power and delay
[Figure: traditional 6-transistor (6T) SRAM cell]
Slide 7.9

Static Noise Margin (SNM)
SNM gives a measure of the cell’s stability by quantifying the DC noise required to flip the cell
SNM is the length of the side of the largest square embedded in the butterfly curve
[Figure: 6T cell with DC noise sources VN inserted between the cross-coupled inverters Inv 1 and Inv 2; butterfly plot of QB (V) versus Q (V), 0 to 0.3 V, showing the VTC for Inv 2, the inverse VTC for Inv 1, and both curves with VN = SNM]
[Ref: E. Seevinck, JSSC’87]
Slide 7.10

Static Noise Margin with Scaling
Typical cell SNM deteriorates with scaling
Variations lead to failure from insufficient SNM
– Technology and VDD scaling lower SNM
– Variations worsen the tail of the SNM distribution
(Results obtained from simulations with Predictive Technology Models [Ref: PTM; Y.
Cao ‘00])
Slide 7.11

Variability: Write Margin
Dominant fight (ratioed)
[Figure: 6T cell during a write (BL, BLB, WL); butterfly plots of normalized QB versus normalized Q for three cases: cell stability prior to write; write failure (positive SNM); successful write (negative “SNM”)]
Slide 7.12

Variability: Cell Writability
Write margin limits VDD scaling for 6T cells to 600 mV, best case
Variability and the large number of cells make this worse
[Figure: write SNM (V), 0.05 to –0.25, versus temperature (–40 to 120 °C) at process corners TT, WW, SS, WS, SW; 65 nm process, VDD = 0.6 V; the positive-SNM region is marked “Write Fails”]
Slide 7.13

Cell Array Power
Leakage power dominates while the memory holds data
– Sub-threshold leakage
– Importance of gate tunneling and GIDL depends on technology and voltages applied
[Figure: 6T cell holding ‘0’ and ‘1’ with WL low, showing leakage paths]
Slide 7.14

Using Threshold Voltage to Reduce Leakage
High-VTH cells are necessary if all else is kept the same
To keep leakage in a 1 MB memory within bounds, VTH must be kept in the 0.4–0.6 V range
[Figure: 1 MB array retention current (A, log scale, 10^-8 to 10^0) versus average extrapolated VTH (V) at 25 °C, –0.2 to 1.0 V; curves for Tj = 25, 50, 75, 100, and 125 °C; Lg = 0.1 μm, W(QT) = 0.20 μm, W(QD) = 0.28 μm, W(QL) = 0.18 μm; marked points: high speed (0.49), low power (0.71); reference levels at 10 µA and 0.1 µA; extrapolated VTH = VTH (nA/μm) + 0.3 V]
[Ref: K. Itoh, ISCAS’06]
Slide 7.15

Multiple Threshold Voltages
Dual-VTH cells with low-VTH access transistors provide good tradeoffs in power and delay [Ref: Hamzaoglu, et al., TVLSI’02]
[Figure: two 6T cell variants with high-VTH and low-VTH devices marked]
Use high-VTH devices to lower leakage for a stored ‘0’, which is much more common than a stored ‘1’ [Ref: N.
Azizi, TVLSI’03]
Slide 7.16

Multiple Voltages
Selective usage of multiple voltages in cell array
– e.g., 16 fA/cell at 25 °C in 0.13 μm technology
High VTH to lower sub-VTH leakage
Raised source, raised VDD, and lower BL reduce gate stress while maintaining SNM
[Figure: 6T cell with WL = 0 V and annotated node voltages 1.0 V, 1.5 V, 1.0 V, and 0.5 V]
[Ref: K. Osada, JSSC’03]
Slide 7.17

Power Breakdown During Read
Accessing correct cell
– Decoders, WL drivers
– For lower power: hierarchical WLs, pulsed decoders
Performing read
– Charge and discharge large BL capacitance
– For lower power: SAs and low BL swing; lower VDD; hierarchical BLs; lower BL precharge
– May require read assist
[Figure: read path with VDD_Prech, address, WL, memory cell, sense amp, data]
Slide 7.18

Hierarchical Wordline Architecture
Reduces amount of switched capacitance
Saves power and lowers delay
[Figure: global, subglobal, and local word lines; block group select and block select signals enable the local word lines of Block 0, Block 1, Block 2, …]
[Ref’s: Rabaey, Prentice’03; T. Hirose, JSSC’90]
Slide 7.19

Hierarchical Bitlines
Divide up bitlines hierarchically
– Many variants possible
Reduces RC delay, also decreases CV² power
Lower BL leakage seen by accessed cell
[Figure: local BLs connected to global BLs]
Slide 7.20

BL Leakage During Read Access
Leakage into non-accessed cells
– Raises power and delay
– Affects BL differential
[Figure: cells storing “1”, “0”, “0” sharing one bit-line]
Slide 7.21

Bitline Leakage Solutions
Hierarchical BLs
Raise VSS in cell
Negative WL voltage
Longer access FETs
Alternative bit-cells
Active compensation
Lower BL precharge voltage
[Figure: negative wordline (NWL) scheme; labels VSSWL, VG]
[Ref: A.
Agarwal, JSSC’03]
[Figure: raise VSS in cell (VGND); labels VGND, VSSWL]
Slide 7.22

Lower Precharge Voltage
Lower BL precharge voltage decreases power and improves read SNM
– Internal bit-cell node rises less
Sharp limit due to accidental cell writing if the access FET pulls the internal ‘1’ low
Slide 7.23

VDD Scaling
Lower VDD (and other voltages) via classic voltage scaling
– Saves power
– Increases delay
– Limited by lost margin (read and write)
Recover read SNM with read assist
– Lower BL precharge
– Boosted cell VDD [Ref: Bhavnagarwala’04, Zhang’06]
– Pulsed WL and/or write-after-read [Ref: Khellah’06]
– Lower WL [Ref: Ohbayashi’06]
Slide 7.24

Power Breakdown During Write
Accessing cell
– Similar to read
– For lower power: hierarchical WLs
Performing write
– Traditionally drive BLs full swing
– For lower power: charge sharing; data dependencies; low-swing BLs with amplification
[Figure: write path with VDD_Prech, address, WL, memory cell, data]
Slide 7.25

Charge Recycling to Reduce Write Power
Share charge between BLs or pairs of BLs
Saves for consecutive write operations
Need to assess overhead
Basic charge recycling – saves 50% power in theory:
– Old values: BL = 0 V, BLB = VDD
– Connect floating BLs: BL = VDD/2, BLB = VDD/2
– Disconnect and drive new values: BL = VDD, BLB = 0 V
[Ref’s: K. Mai, JSSC’98; G. Ming, ASICON’05]
Slide 7.26

Memory Statistics
0’s more common
– SPEC2000: 90% 0s in data
– SPEC2000: 85% 0s in instructions
Assumed write value using inverted data as necessary [Ref: Y. Chang, ISLPED’99]
New bitcell: 1R, 1W port [Ref: Y. Chang, TVLSI’04]
– W0: WZ = 0, WWL = 1, WS = 1
– W1: WZ = 1, WWL = 1, WS = 0
[Figure: new bitcell with signals BL, WL, WS, WZ, WWL]
Slide 7.27

Low-Swing Write
Drive the BLs with low swing
Use amplification in cell to restore values
[Figure: cell and write circuit with signals VDD_Prech, EQ, SLC, WE, WL, Din, VWR; waveforms of BL/BLB and Q/QB, with the BLs swinging between VDD – VTH and VDD – VTH – ΔVBL; VWR = VDD – VTH – ΔVBL]
[Ref: K.
Kanda, JSSC’04]
Slide 7.28

Write Margin
Fundamental limit to most power-reducing techniques
Recover write margin with write assist, e.g.,
– Boosted WL
– Collapsed cell VDD [Itoh’96, Bhavnagarwala’04]
– Raised cell VSS [Yamaoka’04, Kanda’04]
– Cell with amplification [Kanda’04]
Slide 7.29

Non-traditional Cells
Key tradeoff is with functional robustness
Use alternative cell to improve robustness, then trade off for power savings
– e.g., remove read SNM
8T SRAM cell
• Register file cell
• 1R, 1W port
• Read SNM eliminated
• Allows lower VDD
• 30% area overhead
• Robust layout
[Figure: 8T SRAM cell with signals RWL, WWL, WBL pair, RBL]
[Ref: L. Chang, VLSI’05]
Slide 7.30

Cells with Pseudo-Static SNM Removal
Isolate stored data during read
Dynamic storage for duration of read
[Figure: two cell variants, one with differential read and one with single-ended read; signals BL, WL, WLW, WWL, WLB]
[Ref: S. Kosonocky, ISCICT’06]
[Ref: K. Takeda, JSSC’06]
Slide 7.31

Emerging Devices: Double-Gate MOSFET
Emerging devices allow new SRAM structures
Back-gate biasing of a thin-body MOSFET provides improved control of short-channel effects and reinstates effective dynamic control of VTH
Back-gated (BG) MOSFET
• Independent front and back gates
• One switching gate and VTH control gate
[Figure: double-gated (DG) and back-gated (BG) MOSFET structures; labels: gate, source, drain, fin width = TSi, gate length = Lg, fin height HFIN = W/2 and HFIN = W; Gate1 is the switching gate and Gate2 the VTH control]
[Ref: Z. Guo, ISLPED’05]
Slide 7.32

6T SRAM Cell with Feedback
Double-gated (DG) NMOS pull-down and PMOS load devices
Back-gated (BG) NMOS access devices dynamically increase β-ratio
– SNM during read ~300 mV
– Area penalty ~19%
[Figure: 6T cell storing “1” and “0” with β ratio increased; butterfly plots of Vsn2 (V) versus Vsn1 (V), 0 to 1 V, in read and standby, marked 210 mV for 6T DG-MOS and 300 mV for 6T BG-MOS]
[Ref: Z.
Guo, ISLPED’05]
Slide 7.33

Summary and Perspectives
Functionality is the main constraint in SRAM
– Variation makes the outlying cells limiters
– Look at hold, read, write modes
Use various methods to improve robustness, then trade off for power savings
– Cell voltages, thresholds
– Novel bit-cells
– Emerging devices
Embedded memory is a major threat to continued technology scaling – innovative solutions necessary
Slide 7.34

References
Books and Book Chapters
K. Itoh et al., Ultra-Low Voltage Nano-scale Memories, Springer 2007.
A. Macii, “Memory Organization for Low-Energy Embedded Systems,” in Low-Power Electronics Design, C. Piguet Ed., Chapter 26, CRC Press, 2005.
V. Moshnyaga and K. Inoue, “Low Power Cache Design,” in Low-Power Electronics Design, C. Piguet Ed., Chapter 25, CRC Press, 2005.
J. Rabaey, A. Chandrakasan and B. Nikolic, Digital Integrated Circuits, Prentice Hall, 2003.
T. Takahawara and K. Itoh, “Memory Leakage Reduction,” in Leakage in Nanometer CMOS Technologies, S. Narendra, Ed., Chapter 7, Springer 2006.
Articles
A. Agarwal, H. Li and K. Roy, “A Single-Vt low-leakage gated-ground cache for deep submicron,” IEEE Journal of Solid-State Circuits, 38(2), pp. 319–328, Feb. 2003.
N. Azizi, F. Najm and A. Moshovos, “Low-leakage asymmetric-cell SRAM,” IEEE Transactions on VLSI, 11(4), pp. 701–715, Aug. 2003.
A. Bhavnagarwala, S. Kosonocky, S. Kowalczyk, R. Joshi, Y. Chan, U. Srinivasan and J. Wadhwa, “A transregional CMOS SRAM with single, logic VDD and dynamic power rails,” in Symposium on VLSI Circuits, pp. 292–293, 2004.
Y. Cao, T. Sato, D. Sylvester, M. Orshansky and C. Hu, “New paradigm of predictive MOSFET and interconnect modeling for early circuit design,” in Custom Integrated Circuits Conference (CICC), Oct. 2000, pp. 201–204.
L. Chang, D. Fried, J. Hergenrother et al., “Stable SRAM cell design for the 32 nm node and beyond,” Symposium on VLSI Technology, pp.
128–129, June 2005.
Y. Chang, B. Park and C. Kyung, “Conforming inverted data store for low power memory,” IEEE International Symposium on Low Power Electronics and Design, 1999.
Slide 7.35

References (cont.)
Y. Chang, F. Lai and C. Yang, “Zero-aware asymmetric SRAM cell for reducing cache power in writing zero,” IEEE Transactions on VLSI Systems, 12(8), pp. 827–836, Aug. 2004.
Z. Guo, S. Balasubramanian, R. Zlatanovici, T.-J. King, and B. Nikolic, “FinFET-based SRAM design,” International Symposium on Low Power Electronics and Design, pp. 2–7, Aug. 2005.
F. Hamzaoglu, Y. Ye, A. Keshavarzi, K. Zhang, S. Narendra, S. Borkar, M. Stan, and V. De, “Analysis of Dual-VT SRAM cells with full-swing single-ended bit line sensing for on-chip cache,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 10(2), pp. 91–95, Apr. 2002.
T. Hirose, H. Kuriyama, S. Murakam, et al., “A 20-ns 4-Mb CMOS SRAM with hierarchical word decoding architecture,” IEEE Journal of Solid-State Circuits, 25(5), pp. 1068–1074, Oct. 1990.
K. Itoh, A. Fridi, A. Bellaouar and M. Elmasry, “A Deep sub-V, single power-supply SRAM cell with multi-VT, boosted storage node and dynamic load,” Symposium on VLSI Circuits, 133, June 1996.
K. Itoh, M. Horiguchi and T. Kawahara, “Ultra-low voltage nano-scale embedded RAMs,” IEEE Symposium on Circuits and Systems, May 2006.
K. Kanda, H. Sadaaki and T. Sakurai, “90% write power-saving SRAM using sense-amplifying memory cell,” IEEE Journal of Solid-State Circuits, 39(6), pp. 927–933, June 2004.
S. Kosonocky, A. Bhavnagarwala and L. Chang, International Conference on Solid-State and Integrated Circuit Technology, pp. 689–692, Oct. 2006.
K. Mai, T. Mori, B. Amrutur et al., “Low-power SRAM design using half-swing pulse-mode techniques,” IEEE Journal of Solid-State Circuits, 33(11), pp. 1659–1671, Nov. 1998.
G. Ming, Y. Jun and X. Jun, “Low Power SRAM Design Using Charge Sharing Technique,” pp. 102–105, ASICON, 2005.
K. Osada, Y. Saitoh, E.
Ibe and K. Ishibashi, “16.7-fA/cell tunnel-leakage-suppressed 16-Mb SRAM for handling cosmic-ray-induced multierrors,” IEEE Journal of Solid-State Circuits, 38(11), pp. 1952–1957, Nov. 2003.
PTM – Predictive Models. Available: http://www.eas.asu.edu/~ptm
Slide 7.36

References (cont.)
E. Seevinck, F. List and J. Lohstroh, “Static noise margin analysis of MOS SRAM Cells,” IEEE Journal of Solid-State Circuits, SC-22(5), pp. 748–754, Oct. 1987.
K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii and H. Kobatake, “A read-static-noise-margin-free SRAM cell for low-vdd and high-speed applications,” IEEE International Solid-State Circuits Conference, pp. 478–479, Feb. 2005.
M. Yamaoka, Y. Shinozaki, N. Maeda, Y. Shimazaki, K. Kato, S. Shimada, K. Yanagisawa and K. Osadal, “A 300 MHz 25 µA/Mb leakage on-chip SRAM module featuring process-variation immunity and low-leakage-active mode for mobile-phone application processor,” IEEE International Solid-State Circuits Conference, 2004, pp. 494–495.
Slide 7.37
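
Worked example for Slide 7.26 (charge recycling)
The "saves 50% power in theory" figure for basic charge recycling can be checked with a short calculation. The sketch below is only an illustration under stated assumptions: both bitlines have the same capacitance, equalization is lossless, and the only supply energy counted is the charge delivered to the bitline that ends at VDD. The function names and the 200 fF / 1.0 V values are hypothetical and do not come from the slides.

```python
def full_swing_write_energy(c_bl, vdd):
    """Supply energy (J) to charge one bitline from 0 V to VDD.

    The supply delivers charge Q = C * VDD at voltage VDD.
    """
    return c_bl * vdd * vdd


def recycled_write_energy(c_bl, vdd):
    """Supply energy (J) when BL and BLB are shorted before the write.

    Assumption: equal bitline capacitances, so sharing the old values
    (0 V and VDD) leaves both bitlines floating at VDD / 2. The supply
    then only lifts one bitline from VDD / 2 to VDD; the other bitline
    is discharged to ground and draws no supply energy.
    """
    v_shared = (0.0 + vdd) / 2.0
    return c_bl * vdd * (vdd - v_shared)


def recycling_saving(c_bl, vdd):
    """Fraction of supply energy saved relative to a full-swing write."""
    baseline = full_swing_write_energy(c_bl, vdd)
    return 1.0 - recycled_write_energy(c_bl, vdd) / baseline


if __name__ == "__main__":
    C_BL = 200e-15  # illustrative bitline capacitance (F), an assumption
    VDD = 1.0       # illustrative supply voltage (V), an assumption
    print("full swing :", full_swing_write_energy(C_BL, VDD), "J")
    print("recycled   :", recycled_write_energy(C_BL, VDD), "J")
    print("saving     :", recycling_saving(C_BL, VDD))
```

The saving is independent of the capacitance and supply values chosen; it follows only from the bitlines starting the drive phase at VDD/2 instead of 0 V. The switch and control overhead that the slide says must be assessed is not modeled here.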
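
Worked example for Slide 7.6 (analysis of tails)
Slide 7.6 notes that the large number of cells requires analysis out to 6σ or 7σ. A rough way to see why is to estimate how many cells in an array fall beyond a given number of standard deviations. The sketch below assumes a Gaussian-distributed margin, statistically independent cells, and an array of 8 × 2^20 cells (1 MB); all three are simplifying assumptions for illustration, and the function names are hypothetical.

```python
import math


def tail_probability(k_sigma):
    """One-sided Gaussian tail: P(value lies more than k sigma below the mean)."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))


def expected_failing_cells(n_cells, k_sigma):
    """Expected number of cells beyond k sigma, assuming independent cells."""
    return n_cells * tail_probability(k_sigma)


def array_yield(n_cells, k_sigma):
    """Probability that no cell in the array lies beyond k sigma."""
    p = tail_probability(k_sigma)
    # log1p keeps precision when p is tiny compared with 1.
    return math.exp(n_cells * math.log1p(-p))


N_CELLS = 8 * 2**20  # 1 MB of single-bit cells (illustrative assumption)

if __name__ == "__main__":
    for k in (4, 5, 6, 7):
        print(k, expected_failing_cells(N_CELLS, k), array_yield(N_CELLS, k))
```

Under these assumptions a cell fails at 6σ with a probability of roughly one in a billion, so a design that only guarantees 4σ or 5σ margins would see failing cells in an array of this size, while 6σ to 7σ brings the expected number of failures well below one.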