Computer Architecture: Peripherals
Dan Tsafrir, 6/6/2011
Presentation based on slides by Lihu Rappoport

MEMORY: REMINDER

Not so long ago…
[Figure: processor vs. DRAM performance, 1980–2000. CPU performance improved ~60% per year (2x in 1.5 years), DRAM ~9% per year (2x in 10 years); the gap grew ~50% per year.]

Not so long ago…
In 1994, in their paper "Hitting the Memory Wall: Implications of the Obvious", William Wulf & Sally McKee said:
"We all know that the rate of improvement in microprocessor speed exceeds the rate of improvement in DRAM memory speed – each is improving exponentially, but the exponent for microprocessors is substantially larger than that for DRAMs. The difference between diverging exponentials also grows exponentially; so, although the disparity between processor and memory speed is already an issue, downstream someplace it will be a much bigger one."

More recently (2008)…
[Figure: "The memory wall in the multicore era" – running time of a conventional architecture as the number of processor cores grows.]

Memory Trade-Offs
– Large (dense) memories are slow
– Fast memories are small, expensive, and consume high power
– Goal: give the processor the feeling that it has a memory that is large (dense), fast, low-power, and cheap
– Solution: a hierarchy of memories
[Figure: CPU → L1 cache → L2 cache → L3 cache → memory (DRAM). Moving away from the CPU, each level is slower, bigger, cheaper, and lower-power than the one before it.]

Typical levels in mem hierarchy
  Memory level          Size           Response time
  CPU registers         ≈ 100 bytes    ≈ 0.5 ns
  L1 cache              ≈ 64 KB        ≈ 1 ns
  L2 cache              ≈ 1–4 MB       ≈ 15 ns
  Main memory (DRAM)    ≈ 1–4 GB       ≈ 150 ns
  Hard disk (SATA)      ≈ 1–2 TB       ≈ 15 ms

DRAM & SRAM

DRAM basics
– DRAM = dynamic random-access memory
– Random access = the access cost is the same for every location (well, not really)
– The CPU thinks of DRAM as 1-dimensional (simpler)
– But DRAM is actually arranged as a 2-D grid
  • Row & column addresses are needed for an access
  • Given the "1-D address", the DRAM interface splits it into row & column
  • Some time must elapse between the row access and the column access (tens of ns)

DRAM basics
– Why 2-D? Why delayed row & column accesses?
  • Every address bit requires a physical pin
  • DRAMs are large (GBs nowadays) => many pins would be needed => more expensive
– A DRAM array has
  • A row decoder – extracts the row number from the memory address
  • A column decoder – extracts the column number from the memory address
  • Sense amplifiers – hold a row while it is (1) written to, (2) read from, or (3) refreshed (see next slide)

DRAM basics
– Uses one transistor-capacitor pair per bit
– Capacitors leak => each row must be refreshed every few ms
– DRAM spends ~1% of its time refreshing (see the sketch below)
– "Opening" a row = fetching it into the sense amplifiers = refreshing it
– Is it worth making the DRAM array a rectangle (rather than a square)?
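To make the "~1% of time refreshing" figure concrete, here is a minimal back-of-the-envelope sketch. The parameter values (8,192 rows per bank, ~50 ns to refresh one row, a 64 ms refresh window) are illustrative assumptions, not numbers taken from the slides.

/* Back-of-the-envelope sketch: fraction of time a DRAM bank spends refreshing.
 * All parameter values below are assumptions chosen only for illustration. */
#include <stdio.h>

int main(void)
{
    double rows_per_bank     = 8192.0;   /* assumed number of rows per bank      */
    double row_refresh_ns    = 50.0;     /* assumed time to refresh a single row */
    double refresh_window_ms = 64.0;     /* assumed: every row refreshed once
                                            per 64 ms                            */

    double busy_ns  = rows_per_bank * row_refresh_ns;   /* time spent refreshing */
    double total_ns = refresh_window_ms * 1e6;           /* length of the window  */

    printf("refresh overhead: %.2f%%\n", 100.0 * busy_ns / total_ns);
    return 0;
}

Under these assumptions the bank is busy refreshing for about 0.4 ms out of every 64 ms, i.e., well under 1% of the time.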
[Figure: a x1 DRAM – a single memory array of rows × columns, with a row decoder, a column decoder, sense amplifiers, and data in/out buffers; it reads or writes one bit at a time.]

DRAM banks
– Each DRAM memory array outputs one bit
– DRAMs use multiple arrays to output multiple bits at a time
  • "xN" indicates a DRAM with N memory arrays
  • Typical today: x16, x32
– Each collection of xN arrays forms a DRAM bank
  • Each bank can be read from / written to independently

[Figure: a x4 DRAM – four memory arrays, each with its own row decoder, column decoder, sense amplifiers, and data in/out buffers; each array contributes one bit.]

Ranks & DIMMs
– DIMM = dual in-line memory module (the unit we connect to the motherboard)
– Increase bandwidth by delivering data from multiple banks
  • The bandwidth of any one bank is limited => put multiple banks on a DIMM
  • The bus has a higher clock frequency than any one DRAM
  • The bus controller switches between banks to achieve a high data rate
– Increase capacity by utilizing multiple ranks
  • Each rank is an independent set of banks that can be accessed for the full data bit-width
    (64 bits for non-ECC; 72 bits with ECC – error-correcting code)
  • Ranks cannot be accessed simultaneously, as they share the same data path

Ranks & DIMMs
[Figure: a 1 GB 2Rx8 DIMM (= 2 ranks x 8 banks).]

Modern DRAM organization
– A system has multiple DIMMs
– Each DIMM has multiple DRAM banks, arranged in one or more ranks
– Each bank has multiple DRAM arrays
– Concurrency among banks increases memory bandwidth

Memory controller
[Figure: the memory controller connects to the DIMM ranks through an address/command bus and a data bus; chip-select signals (chip select 1, chip select 2) determine which rank responds.]

Memory controller
– Functionality: executes the processor's memory requests
– In earlier systems: a separate, off-processor chip
– In modern systems: integrated on-chip with the processor
– Interconnect with the processor: a bus, but can be point-to-point or through a crossbar

Lifetime of a memory access
1. The processor orders & queues memory requests
2. Request(s) are sent to the memory controller
3. The controller queues & orders the requests
4. For each request in the queue, when the time is right:
   1. The controller waits until the requested DRAM is ready
   2. The controller breaks the address bits into rank, bank, row, and column fields (see the sketch after this list)
   3. The controller sends a chip-select signal to select the rank
   4. The selected bank is pre-charged so that the selected row can be activated
   5. The row within the selected DRAM bank is activated, using the "RAS" (row-address strobe) signal
   6. The (entire) row is sent to the sense amplifiers
   7. The desired column is selected, using the "CAS" (column-address strobe) signal
   8. The data is sent back
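Step 2 above – splitting the flat physical address into rank, bank, row, and column fields – can be sketched in a few lines of C. The field widths and their ordering below are assumptions chosen for illustration, not the mapping of any particular controller.

/* A minimal sketch of splitting a flat physical address into DRAM
 * coordinates.  Field widths and ordering are assumed for illustration. */
#include <stdint.h>
#include <stdio.h>

#define COL_BITS   10   /* assumed: 1024 columns per row */
#define ROW_BITS   14   /* assumed: 16384 rows per bank  */
#define BANK_BITS   3   /* assumed: 8 banks per rank     */
#define RANK_BITS   1   /* assumed: 2 ranks              */

struct dram_addr {
    unsigned rank, bank, row, col;
};

static struct dram_addr decode(uint64_t phys)
{
    struct dram_addr a;
    a.col  =  phys                                       & ((1u << COL_BITS)  - 1);
    a.row  = (phys >>  COL_BITS)                         & ((1u << ROW_BITS)  - 1);
    a.bank = (phys >> (COL_BITS + ROW_BITS))             & ((1u << BANK_BITS) - 1);
    a.rank = (phys >> (COL_BITS + ROW_BITS + BANK_BITS)) & ((1u << RANK_BITS) - 1);
    return a;
}

int main(void)
{
    struct dram_addr a = decode(0x12345678ull);   /* arbitrary example address */
    printf("rank=%u bank=%u row=%u col=%u\n", a.rank, a.bank, a.row, a.col);
    return 0;
}

Putting the column bits in the low-order positions keeps consecutive addresses within the same open row, which is exactly what the paged-mode and burst schemes described later exploit.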
Basic DRAM array
[Figure: a basic DRAM array – the memory address bus feeds a row latch and a column latch; RAS# strobes the row address into the row address decoder, and CAS# strobes the column address into the column address decoder, which together select the data in the memory array.]
– Timing (2 phases)
  • Decode the row address + assert RAS#
  • Wait for the "RAS to CAS delay"
  • Decode the column address + assert CAS#
  • Transfer DATA

DRAM timing
– CAS latency
  • Number of clock cycles to access a specific column of data
  • From the moment the memory controller issues the column address in the currently open row until the data is read out of the memory
– RAS-to-CAS delay
  • Number of cycles between the row access and the column access
– Row pre-charge time
  • Number of cycles to close the opened row and open the next row
– (A small latency sketch based on these parameters appears at the end of this section)

Addressing sequence
[Figure: timing diagram – RAS# and CAS# strobes over A[0:7] for row i, column n, then row j; the RAS/CAS delay, CAS latency, access time, and pre-charge delay are marked, with data n appearing on the data lines.]
– Access sequence
  • Put the row address on the address bus and assert RAS#
  • Wait for the RAS#-to-CAS# delay (tRCD)
  • Put the column address on the address bus and assert CAS#
  • DATA transfer
  • Pre-charge

Improved DRAM Schemes
– Paged Mode DRAM
  • Multiple accesses to different columns of the same row (spatial locality)
  • Saves the time it takes to bring in a new row (but might be unfair)
  [Figure: timing diagram – one RAS# for the row, then successive CAS# strobes for columns n, n+1, n+2, each returning its data word.]
– Extended Data Output RAM (EDO RAM)
  • A data output latch allows the next column address to be driven in parallel with the current column's data
  [Figure: timing diagram – as above, but the data for column n overlaps the CAS# strobe for column n+1.]

Improved DRAM Schemes (cont)
– Burst DRAM
  • Generates the consecutive column addresses by itself
  [Figure: timing diagram – a single RAS#/CAS# pair for the row and column n, followed by data n, n+1, n+2 on successive cycles.]

Synchronous DRAM (SDRAM)
– Asynchrony in DRAM
  • Due to RAS & CAS arriving at arbitrary times
– Synchronous DRAM
  • Uses a clock to deliver requests at regular intervals
  • More predictable DRAM timing => less skew => faster turnaround
– SDRAMs support burst-mode access
  • Initial performance similar to BEDO (= Burst + EDO)
  • Clock scaling later enabled higher transfer rates
    => DDR SDRAM => DDR2 => DDR3

DRAM vs. SRAM
(Random access = the access time is the same for all locations)
                   DRAM – Dynamic RAM           SRAM – Static RAM
  Refresh          Yes (~1% of the time)        No
  Address          Multiplexed: row + column    Not multiplexed
  Random access    Not really…                  Yes
  Density          High (1 transistor/bit)      Low (6 transistors/bit)
  Power            Low                          High
  Speed            Slow                         Fast
  Price/bit        Low                          High
  Typical usage    Main memory                  Cache
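The cost of a row miss versus a row hit follows directly from the three timing parameters defined under "DRAM timing". The sketch below assumes 9 bus cycles each for the row pre-charge time (tRP), the RAS-to-CAS delay (tRCD), and the CAS latency (tCL) – made-up example values, not figures from the slides – and shows why paged-mode hits to an already-open row are so much cheaper.

/* A minimal latency sketch: row hit vs. row miss.
 * Cycle counts are assumed example values, chosen only for illustration. */
#include <stdio.h>

int main(void)
{
    int tRP  = 9;   /* assumed: row pre-charge time, in bus cycles */
    int tRCD = 9;   /* assumed: RAS-to-CAS delay                   */
    int tCL  = 9;   /* assumed: CAS latency                        */

    int row_hit  = tCL;               /* row already open in the sense amps */
    int row_miss = tRP + tRCD + tCL;  /* close old row, open new row, read  */

    printf("row hit : %d cycles\n", row_hit);
    printf("row miss: %d cycles\n", row_miss);
    return 0;
}

Under these assumptions a row hit costs 9 cycles while a row miss costs 27, which is why keeping rows open for page-mode hits pays off when accesses exhibit spatial locality.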