The Memory Hierarchy
CPSC 321
Andreas Klappenecker

Some Results from the Survey
• Issues with the CS curriculum
  • CPSC 111 Computer Science Concepts & Programming
  • CPSC 310 Databases
  • CPSC 431 Software Engineering
• Something from the wish list:
  • More C++
  • More Software Engineering
  • More focus on industry needs
  • Less focus on industry needs

Some Results from the Survey
• Why (MIPS) assembly language?
• More detailed explanations of programming language xyz
• Implement slightly reduced versions of the Pentium 4 or Athlon processors
• Have another computer architecture class
• Lack of information on the CS website about specialization...

Follow Up
• CPSC 462 Microcomputer Systems
• CPSC 410 Operating Systems
• Go to seminars/lectures by Bjarne Stroustrup, Jaakko Jarvi, or Gabriel Dos Reis

Today's Menu
Caches

Memory
Current memory is largely implemented in CMOS technology. Two alternatives:
• SRAM
  • fast, but not area efficient
  • value stored in a pair of inverting gates
• DRAM
  • slower, but more area efficient
  • value stored as charge on a capacitor (must be refreshed)

Memory
• Users want large and fast memories
• SRAM is too expensive for main memory
• DRAM is too slow for many purposes
• Compromise: build a memory hierarchy
[Figure: the memory hierarchy — the CPU sits above level 1 through level n; the size of the memory at each level grows, and access time increases, with distance from the CPU]

Locality
• If an item is referenced, then
  • it will be referenced again soon (temporal locality)
  • nearby data will be referenced soon (spatial locality)
• Why does code have locality?
Memory Hierarchy
• The memory is organized as a hierarchy
  • each level closer to the processor holds a subset of any level further away
  • the memory can consist of multiple levels, but data is typically copied between two adjacent levels at a time
  • initially, we focus on two levels

Two-Level Hierarchy
• Upper level (smaller and faster)
• Lower level (slower)
• A unit of information that is present or not within a level is called a block
• If data requested by the processor is in the upper level, this is called a hit; otherwise it is called a miss
• If a miss occurs, the data is retrieved from the lower level. Typically, an entire block is transferred

Cache
A cache is some level of memory between the CPU and main memory.
[More general definitions are often used]

A Toy Example
• Assumptions
  • each processor request is one word
  • each block consists of one word
• Example
  • Before the request: C = [X1, X2, ..., Xn-1]
  • The processor requests Xn, which is not contained in C
  • The item Xn is brought from memory into the cache
  • After the request: C = [X1, X2, ..., Xn-1, Xn]
• Issue: what happens if the cache is full?

Issues
• How do we know whether a data item is in the cache?
• If it is, how do we find it?
• Simple strategy: direct-mapped cache
  • exactly one location where the data might be in the cache

Direct Mapped Cache
• Mapping: address modulo the number of blocks in the cache, x -> x mod B
[Figure: an 8-entry cache (indices 000-111) and the memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 that map to it]

Direct Mapped Cache
• Cache with 1024 = 2^10 words
• The tag stored in the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits and the valid bit is set, we have a cache hit; otherwise it is a cache miss
[Figure: a 32-bit address split into a 20-bit tag (bits 31-12), a 10-bit index (bits 11-2), and a 2-bit byte offset; the index selects one of 1024 entries, each holding a valid bit, a 20-bit tag, and a 32-bit data word]

Direct Mapped Cache
• Taking advantage of spatial locality:
[Figure: a cache with 4K entries and four-word (128-bit) blocks; the address splits into a 16-bit tag, a 12-bit index, a 2-bit block offset, and a 2-bit byte offset; a multiplexor selects one of the four 32-bit words in the block]

Hits vs. Misses
• Read hits
  • this is what we want!
• Read misses
  • stall the CPU, fetch the block from memory, deliver it to the cache, restart
• Write hits
  • write the data into both the cache and memory (write-through)
  • write the data only into the cache, and write it back to memory later (write-back)
• Write misses
  • read the entire block into the cache, then write the word

What Block Size?
• A large block size reduces cache misses
• But the cache miss penalty increases
• We need to balance these two constraints
• How can we measure cache performance?
• How can we improve cache performance?
The performance of a cache depends on many parameters:
• Memory stall clock cycles
• Read stall clock cycles
• Write stall clock cycles

Cache Block Mapping
• Direct mapped cache
  • a block goes in exactly one place in the cache
• Fully associative
  • a block can go anywhere in the cache
  • it is difficult to find a block
  • parallel comparison is used to speed up the search

Cache Block Mapping
• Set associative
  • each block maps to a unique set, and the block can be placed into any element of that set
  • the set is given by (block number) modulo (number of sets in the cache)
  • if each set contains n elements, the cache is called n-way set associative

Cache Types