CS2100 Computer Organisation

CS2100 Computer Organisation
Cache II
Road Map: Part II
Performance
Assembly
Language

Cache Part II
Processor:
Datapath

Type of Cache Misses

Direct-Mapped Cache
Processor:
Control

Set Associative Cache

Block Replacement Policy

Cache Framework
Pipelining
Cache
Cache II
2
Cache Misses: Classifications
Compulsory / Cold Miss
• First time a memory block is accessed
• Cold fact of life: Not much can be done
• Solution: Increase cache block size
Conflict Miss
• Two or more distinct memory blocks map to the same cache block
• Big problem in direct-mapped caches
• Solution 1: Increase cache size
• Inherent restriction on cache size due to SRAM technology
• Solution 2: Set-Associative caches (coming next ..)
Capacity Miss
• Due to limited cache size
CS2100
Cache II
3
Block Size Tradeoff (1/2)
Average Access Time
= Hit rate x Hit Time + (1-Hit rate) x Miss penalty

Larger block size:
+ Takes advantage of spatial locality
-- Larger miss penalty: Takes longer time to fill up the block
-- If block size is too big relative to cache size
 Too few cache block  miss rate will go up
Miss
Rate
Miss
Penalty
Exploits Spatial Locality
Average
Access
Time
Fewer blocks:
compromises
temporal locality
Block Size
CS2100
Block Size
Cache II
Block Size
4
Block Size Tradeoff (2/2)
Miss rate (%)
10
8 KB
16 KB
64 KB
256 KB
5
0
8
16
32
64
128
256
Block size (bytes)
CS2100
Cache II
5
Another way to organize the cache blocks
SET ASSOCIATIVE CACHE
CS2100
Cache II
6
Set-Associative (SA) Cache Analogy
Many book titles start with "T"
 Too many conflicts!
Hmm… how about we give more slots
per alphabet, 2 books start with "A", 2
books start with "B", …. etc?
CS2100
Cache II
7
Set Associative (SA) Cache

N-way Set Associative Cache


A memory block can be placed in a fixed number
of locations ( N > 1) in the cache
Key Idea:

Cache consists of a number of sets:



CS2100
Each set contains N cache blocks
Each memory block maps to a unique cache set
Within the set, a memory block can be placed in
any element of the set
Cache II
8
SA Cache Structure
Valid Tag
Data
Valid Tag
Data
Set Index
00
01
10
11
Set
2-way Set Associative Cache

An example of 2-way set associative cache


Each set has two cache blocks
A memory block maps to an unique set
In the set, the memory block can be placed in
either of the cache blocks
 Need to search both to look for the memory block

CS2100
Cache II
9
Set-Associative Cache: Mapping
Memory Address
31
N N-1
Block Number
0
Offset
Cache Block size = 2N bytes
Cache Set Index
= (BlockNumber) modulo (NumberOfCacheSets)
N+M-1
31
Tag
N N-1
Set Index
0
Offset
Cache Block size = 2N bytes
Number of cache sets = 2M
Offset = N bits
Set Index = M bits
Tag = 32 – (N + M) bits
CS2100
Cache II
Observation:
It is essentially
unchanged from the
direct-mapping formula
10
SA Cache Mapping: Example
Memory
4GB
1 Block
= 4 bytes
Memory Address
N N-1
31
Block Number
Offset
Offset, N = 2 bits
Block Number = 32 – 2 = 30 bits
Check: Number of Blocks = 230
N+M-1
31
Tag
Cache
4 KB
0
N N-1
Set Index
0
Offset
Number of Cache Blocks
= 4KB / 4bytes = 1024 = 210
4-way associative, number of sets
= 1024 / 4 = 256 = 28
Set Index, M = 8 bits
Cache Tag = 32 – 8 – 2 = 22 bits
CS2100
Cache II
11
Set Associative Cache Circuitry
31 30 . . .
Tag
Tag
22
Idx
8
V Tag Data
0
1
2
.
.
.
254
255
22
=
22
=
22
=
B
C
A
Hit
CS2100
V Tag Data
0
1
2
.
.
.
254
255
0
1
2
.
.
.
254
255
A
4-way 4-KB cache:
1-word (4-byte) blocks
Os
V Tag Data
22
Note the
simultaneous
"search" on all
elements of a set
0
Index
V Tag Data
0
1
2
.
.
.
254
255
11 10 9 . . . 2 1
Cache II
BC
D
=
D
4 x 1 Select
32
Data
12
Advantage of Associativity (1/3)
Block Number (Not Address)
0
Cache Index
00
1
01
2
10
3
11
Memory
4
Cache
5
6
7
8
9
10
11
12
13
14
15
CS2100
Example:
Given this memory access sequence: 0 4 0 4 0 4 0 4
Result:
Cold Miss = 2 (First two accesses)
Conflict Miss = 6 (The rest of the accesses)
Cache II
13
Advantage of Associativity (2/3)
Block Number (Not Address)
0
Set Index
0
1
1
2
Cache
3
Memory
4
5
6
7
8
9
10
11
12
13
14
15
CS2100
Example:
Given this memory access sequence: 0 4 0 4 0 4 0 4
Result:
Cold Miss = 2 (First two accesses)
Conflict Miss = None
Cache II
14
Advantage of Associativity (3/3)
Rule of Thumb:
A direct-mapped cache of size N has about the same miss rate
as a 2-way set associative cache of size N / 2
CS2100
Cache II
15
SA Cache Example: Setup

Given:



Memory access sequence: 4, 0, 8, 36, 0
2-way set-associative cache with a total of four 8byte blocks  total of 2 sets
Indicate hit/miss for each access
31
3 2
4
Tag
Set Index
0
Offset
Offset, N = 3 bits
Block Number = 32 – 3 = 29 bits
2-way associative, number of sets= 2 = 21
Set Index, M = 1 bits
Cache Tag = 32 – 3 – 1 = 28 bits
CS2100
Cache II
16
Example: LOAD #1 - 1
Tag

Index Offset
Load from 4  000000000000000000000000000
0 100
Check: Both blocks in Set 0 are invalid [ Cold Miss ]
Block 0
Block 1
Set
Idx
Valid
0
0
0
1
0
0
CS2100
Tag
W0
W1
Cache II
Valid
Tag
W0
W1
17
Example: LOAD #1 - 2
Tag

Index Offset
Load from 4  000000000000000000000000000
0 100
Result: Load from memory and place in Set 0-Block 0
Block 0
Block 1
Set
Idx
Valid
Tag
W0
W1
Valid
0
1
0
M[0]
M[4]
0
1
0
CS2100
Tag
W0
W1
0
Cache II
18
Example: LOAD #2
Tag

Index Offset
Load from 0  000000000000000000000000000
0 000
Result:
[Valid and Tag match] in Set 0-Block 0 [ Spatial Locality ]
Block 0
Block 1
Set
Idx
Valid
Tag
W0
W1
Valid
0
1
0
M[0]
M[4]
0
1
0
CS2100
Tag
W0
W1
0
Cache II
19
Example: LOAD #3 - 1
Tag

Index Offset
Load from 8  000000000000000000000000000
1 000
Check: Both blocks in Set 1 are invalid [ Cold Miss ]
Block 0
Block 1
Set
Idx
Valid
Tag
W0
W1
Valid
0
1
0
M[0]
M[4]
0
1
0
CS2100
Tag
W0
W1
0
Cache II
20
Example: LOAD #3 - 2
Tag

Index Offset
Load from 8  000000000000000000000000000
1 000
Result: Load from memory and place in Set 1-Block 0
Block 0
Block 1
Set
Idx
Valid
Tag
W0
W1
Valid
0
1
0
M[0]
M[4]
0
1
1
0
M[8] M[12]
CS2100
Cache II
Tag
W0
W1
0
21
Example: LOAD #4 - 1
Tag

Index Offset
Load from 36  000000000000000000000000010
0 100
Check: [Valid but tag mismatch] Set 0-Block 0
[Invalid] Set 0-Block1
[ Cold Miss ]
Block 0
Block 1
Set
Idx
Valid
Tag
W0
W1
Valid
0
1
0
M[0]
M[4]
0
1
1
0
M[8] M[12]
CS2100
Cache II
Tag
W0
W1
0
22
Example: LOAD #4 - 2
Tag

Index Offset
Load from 36  000000000000000000000000010
0 100
Result: Load from memory and place in Set 0-Block 1
Block 0
Set
Valid Tag
Idx
Block 1
W0
W1
Valid
Tag
M[4]
1
2
0
1
0
M[0]
1
1
0
M[8] M[12]
CS2100
Cache II
W0
W1
M[32] M[36]
0
23
Example: LOAD #5
Tag

Load from 0 
Index Offset
000000000000000000000000000
0 000
Check: [Valid and tag match] Set 0-Block 0
[Valid but tag mismatch] Set 0-Block1
[ Temporal Locality ]
Block 0
Set
Valid Tag
Idx
Block 1
W0
W1
Valid
Tag
M[4]
1
2
0
1
0
M[0]
1
1
0
M[8] M[12]
CS2100
Cache II
W0
W1
M[32] M[36]
0
24
Who should I kick out next…..?
BLOCK REPLACEMENT
POLICY
CS2100
Cache II
25
Block Replacement Policy (1/3)

Set Associative or Fully Associative Cache:



Can choose where to place a memory block
Potentially replacing another cache block if full
Need block replacement policy
 Least

How: For cache hit, record the cache block that
was accessed


CS2100
Recently Used (LRU)
When replacing a block, choose one which has not been
accessed for the longest time
Why: Temporal locality
Cache II
26
Block Replacement Policy (2/3)

Least Recently Used policy in action:


4- way Cache
Memory accesses: 0 4 8 12 4 16 12 0 4
LRU
Access 4
CS2100
0
4
8
12
Access 16
x0
8
12
4
Access 12
8
12
4
16
Hit
Access 0
x8
4
16
12
Miss (Evict Block 8)
Access 4
4
16
12
0
16
12
0
4
Cache II
Hit
Miss (Evict Block 0)
Hit
27
Block Replacement Policy (3/3)

Drawback for LRU


Hard to keep track if there are many choices
Other replacement policies:

First in first out (FIFO)

Second chance variant

Random replacement (RR)

Least frequently used (LFU)
CS2100
Cache II
28
Cache Organizations: Summary
CS2100
Cache II
29
Cache Framework (1/2)
Block Placement: Where can a block be placed in cache?
Direct Mapped:
• Only one block
defined by index
N-way SetAssociative:
• Any one of the N
blocks within the
set defined by
index
Fully Associative:
• Any cache block
Block Identification: How is a block found if it is in the
cache?
Direct Mapped:
• Tag match with
only one block
CS2100
N-way Set
Associative:
Fully
Associative:
• Tag match for all
the blocks within
the set
• Tag match for all
the blocks within
the cache
Cache II
30
Cache Framework (2/2)
Block Replacement: Which block should be replaced on a
cache miss?
Direct Mapped:
• No Choice
n-way SetAssociative:
• Based on
replacement
policy
Fully Associative:
• Based on
replacement
policy
Write Strategy: What happens on a write?
Write Policy: Write-through vs. write-back
Write Miss Policy: Write allocate vs. write no allocate
CS2100
Cache II
31
Reading Assignment

Large and Fast: Exploiting Memory Hierarchy


CS2100
Chapter 7 sections 7.3 (3rd edition)
Chapter 5 sections 5.3 (4th edition)
Cache II
32
END
CS2100
Cache II
33