Lecture 19 - faculty.cs.tamu.edu

The Memory Hierarchy
CPSC 321
Andreas Klappenecker
Some Results from the Survey
• Issues with the CS curriculum
• CPSC 111 Computer Science Concepts & Programming
• CPSC 310 Databases
• CPSC 431 Software Engineering
• Something from the wish list:
  • More C++
  • More Software Engineering
  • More focus on industry needs
  • Less focus on industry needs
Some Results from the Survey
• Why (MIPS) assembly language?
• More detailed explanations of programming language xyz
• Implement a slightly reduced version of the Pentium 4 or Athlon processors
• Have another computer architecture class
• Lack of information on the CS website about specialization...
Follow Up
• CPSC 462 Microcomputer Systems
• CPSC 410 Operating Systems
• Go to seminars/lectures by Bjarne Stroustrup, Jaakko Jarvi, or Gabriel Dos Reis
Today’s Menu
Caches
Memory
Current memory is largely implemented in CMOS technology. Two alternatives:
• SRAM
  • fast, but not area efficient
  • value stored in a pair of inverting gates
• DRAM
  • slower, but more area efficient
  • value stored as charge on a capacitor (must be refreshed)
Static RAM
Dynamic RAM
Memory
• Users want large and fast memories
• SRAM is too expensive for main memory
• DRAM is too slow for many purposes
• Compromise: build a memory hierarchy
[Figure: levels in the memory hierarchy — Level 1 closest to the CPU down to Level n; the size of the memory grows, and the access time increases, with distance from the CPU]
Locality
• If an item is referenced, then
  • it will be referenced again soon (temporal locality)
  • nearby data will be referenced soon (spatial locality)
• Why does code have locality?
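One answer to the question above: most code spends its time in loops. A loop re-executes the same instructions (temporal locality) and typically walks through consecutive data (spatial locality). A minimal illustration in C:

```c
/* Sum an array: the loop body is fetched over and over (temporal
   locality in the instruction stream), and a[0], a[1], ... occupy
   consecutive addresses (spatial locality in the data stream). */
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) {
        s += a[i];   /* consecutive addresses: spatial locality */
    }
    return s;
}
```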
Memory Hierarchy
• The memory is organized as a hierarchy
• each level closer to the processor holds a subset of the data in any level further away
• the memory can consist of multiple levels, but data is typically copied between two adjacent levels at a time
• initially, we focus on two levels
Two Level Hierarchy
• Upper level (smaller and faster)
• Lower level (slower)
• A unit of information that is present or not within a level is called a block
• If the data requested by the processor is in the upper level, this is called a hit; otherwise it is called a miss
• If a miss occurs, the data is retrieved from the lower level; typically, an entire block is transferred
Cache
A cache represents some level of memory between the CPU and main memory.
[More general definitions are often used]
A Toy Example
• Assumptions
  • each processor request is one word
  • each block consists of one word
• Example
  • Before the request: C = [X1, X2, ..., Xn-1]
  • The processor requests Xn, which is not contained in C
  • Item Xn is brought from memory into the cache
  • After the request: C = [X1, X2, ..., Xn-1, Xn]
• Issues
  • What happens if the cache is full?
Issues
• How do we know whether the data item is in the cache?
• If it is, how do we find it?
• Simple strategy: direct-mapped cache
  • exactly one location where the data might be in the cache
Direct Mapped Cache
• Mapping: address modulo the number of blocks in the cache, x -> x mod B
[Figure: a direct-mapped cache with 8 blocks (indices 000–111); the memory addresses 00001, 00101, 01001, 01101, 10001, 10101, 11001, 11101 all map to cache index 001]
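The mapping can be sketched in a few lines of C. When the number of blocks is a power of two, as in the figure, the modulo amounts to keeping the low-order bits of the address:

```c
#include <stdint.h>

/* Direct-mapped placement: block address modulo the number of cache
   blocks.  With a power-of-two block count B, x % B just keeps the
   low-order log2(B) bits of x. */
uint32_t cache_index(uint32_t block_addr, uint32_t num_blocks) {
    return block_addr % num_blocks;
}
```

For example, with 8 blocks the addresses 00001 (1), 01001 (9), 10001 (17), and 11001 (25) all map to index 001.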
Direct Mapped Cache
• Cache with 1024 = 2^10 words
• the tag from the cache is compared against the upper portion of the address
• If the tag equals the upper 20 bits and the valid bit is set, we have a cache hit; otherwise it is a cache miss
[Figure: direct-mapped cache with 1024 entries — the 32-bit address is split into a 20-bit tag (bits 31–12), a 10-bit index (bits 11–2), and a 2-bit byte offset (bits 1–0); each entry holds a valid bit, a 20-bit tag, and 32 bits of data; Hit is asserted when the stored tag matches the upper address bits and the valid bit is set]
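The address split described above can be written out explicitly. A small sketch of the field extraction for this 1024-word cache (field widths taken from the figure):

```c
#include <stdint.h>

/* Address split for the 1024-word direct-mapped cache:
   bits  1..0  byte offset (words are 4 bytes)
   bits 11..2  index (10 bits select one of 2^10 = 1024 entries)
   bits 31..12 tag   (remaining 20 bits, stored and compared on lookup) */
typedef struct {
    uint32_t tag;          /* upper 20 bits */
    uint32_t index;        /* middle 10 bits */
    uint32_t byte_offset;  /* lower 2 bits */
} addr_fields;

addr_fields split_address(uint32_t addr) {
    addr_fields f;
    f.byte_offset = addr & 0x3;     /* bits 1..0  */
    f.index = (addr >> 2) & 0x3FF;  /* bits 11..2 */
    f.tag = addr >> 12;             /* bits 31..12 */
    return f;
}
```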
Direct Mapped Cache Example
Direct Mapped Cache
• Taking advantage of spatial locality:
[Figure: direct-mapped cache with 4K entries and four-word (16-byte) blocks — the 32-bit address is split into a 16-bit tag (bits 31–16), a 12-bit index (bits 15–4), a 2-bit block offset (bits 3–2), and a 2-bit byte offset (bits 1–0); each entry holds a valid bit, a 16-bit tag, and 128 bits of data; a multiplexor uses the block offset to select one of the four 32-bit words]
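With multi-word blocks, the address gains a block-offset field between the byte offset and the index. A sketch of the field extraction for the 4K-entry, four-words-per-block cache in the figure:

```c
#include <stdint.h>

/* Address split for a 4K-entry cache with four-word (16-byte) blocks:
   bits  1..0  byte offset within a word
   bits  3..2  block offset (selects one of 4 words; drives the mux)
   bits 15..4  index (12 bits, 4K entries)
   bits 31..16 tag   (16 bits) */
uint32_t word_in_block(uint32_t addr) { return (addr >> 2) & 0x3; }
uint32_t block_index(uint32_t addr)   { return (addr >> 4) & 0xFFF; }
uint32_t block_tag(uint32_t addr)     { return addr >> 16; }
```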
Hits vs. Misses
• Read hits
  • this is what we want!
• Read misses
  • stall the CPU, fetch the block from memory, deliver it to the cache, restart
• Write hits
  • write the data into both the cache and memory (write-through)
  • write the data only into the cache, and write it back to memory later (write-back)
• Write misses
  • read the entire block into the cache, then write the word
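The hit/miss behavior above can be sketched as a tiny direct-mapped cache simulator. This is an illustrative model only (one-word blocks as in the toy example, read-allocate on a miss; the structure and names are assumptions, not part of the lecture):

```c
#include <stdint.h>

#define NBLOCKS 8   /* assumed small cache size, for illustration */

typedef struct {
    int valid[NBLOCKS];
    uint32_t tag[NBLOCKS];
} cache_t;

/* Look up one word address; on a miss, fetch the block from memory
   and install it, overwriting whatever occupied that slot.
   Returns 1 on a hit, 0 on a miss. */
int cache_access(cache_t *c, uint32_t word_addr) {
    uint32_t idx = word_addr % NBLOCKS;  /* direct-mapped placement */
    uint32_t tag = word_addr / NBLOCKS;  /* remaining address bits */
    if (c->valid[idx] && c->tag[idx] == tag)
        return 1;                        /* hit */
    c->valid[idx] = 1;                   /* miss: install the block */
    c->tag[idx] = tag;
    return 0;
}
```

Accessing addresses 1, 9, 1 in sequence gives miss, miss, miss: 1 and 9 conflict on index 1, so each access evicts the other.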
Hits vs. Misses Example
What Block Size?
• A large block size reduces the miss rate (more spatial locality)
• but the miss penalty increases with block size
• We need to balance these two constraints
• How can we measure cache performance?
• How can we improve cache performance?
The performance of a cache depends on many parameters:
• Memory stall clock cycles
• Read stall clock cycles
• Write stall clock cycles
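A common simplified model ties these parameters together: memory stall cycles = memory accesses × miss rate × miss penalty. A minimal sketch (the numeric values in the usage example are assumed for illustration):

```c
/* Simplified cache performance model:
   stall cycles = accesses * miss rate * miss penalty.
   Read and write stalls can each be computed this way and summed. */
double stall_cycles(double accesses, double miss_rate, double miss_penalty) {
    return accesses * miss_rate * miss_penalty;
}
```

For instance, 1,000,000 accesses at a 2% miss rate with a 50-cycle miss penalty cost about 1,000,000 stall cycles, as many cycles as the accesses themselves.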
Cache Block Mapping
• Direct-mapped cache
  • a block goes in exactly one place in the cache
• Fully associative
  • a block can go anywhere in the cache
  • difficult to find a block
  • parallel comparison of all tags to speed up the search
Cache Block Mapping
• Set associative
  • Each block maps to a unique set, and the block can be placed into any element of that set
  • Position is given by (Block number) modulo (# of sets in the cache)
  • If the sets contain n elements, the cache is called n-way set associative
Cache Types