EECS470 Homework 4 Answers
1.
a) A is evicted only if B or C maps to the same set as A, so the probability that A is still in the cache is
P(A≠B)*P(A≠C) = 56.25%
or
1 – P(A=B)*P(A≠C) – P(A≠B)*P(A=C) – P(A=B)*P(A=C) = 56.25%
b) 100% because the cache will never kick A out since it never fills up.
c) Same as part (a), except the probability is now P(A=X) = 1/8: 76.5625%
d) A is evicted only if both B and C go to the same set as A.
1 – P(A=B)*P(A=C) = 75%
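The probabilities above can be checked numerically. A minimal sketch, assuming the per-part values of P(A=X) (1/4, 1/8, 1/2) inferred from the answers, since the cache configurations are not restated here:

```python
# Survival probability of A given the chance another block maps to A's set.

def p_survive_direct_mapped(p_same):
    # Direct mapped: A stays resident only if neither B nor C maps to A's set.
    return (1 - p_same) ** 2

def p_survive_two_way(p_same):
    # Two ways per set: A is evicted only if BOTH B and C map to A's set.
    return 1 - p_same ** 2

print(p_survive_direct_mapped(1/4))  # part a: 0.5625
print(p_survive_direct_mapped(1/8))  # part c: 0.765625
print(p_survive_two_way(1/2))        # part d: 0.75
```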
2.
a) (2) + 0.05*(10) + 0.05*0.5*(200) = 7.5
or
0.95*(2) + 0.05*0.5*(2 + 10) + 0.05*0.5*(2 + 10 + 200) = 7.5
b) 1 + (1-x)*(2) + 0.05*(10) + 0.05*0.5*(200) = 7.5
or
x*(1) + (1-x)*[(1-x-0.05)/(1-x)]*(1 + 2) + 0.05*0.5*(1 + 2 + 10) +
0.05*0.5*(1 + 2 + 10 + 200) = 7.5
x = 50%
The product of the L0 and L1 miss rates is still 5% because the L1 is inclusive of the L0: every access that missed in the L1 before the L0 was added will still miss in the new setup.
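The arithmetic for parts (a) and (b) can be verified directly. A quick sketch, assuming the parameters used above (L1 hit time 2, 5% L1 miss rate, L2 hit time 10, 50% L2 local miss rate, 200-cycle memory, 1-cycle L0):

```python
# Part (a): AMAT of the original two-level hierarchy.
amat = 2 + 0.05 * 10 + 0.05 * 0.5 * 200
print(amat)  # 7.5

# Part (b): 1 + (1 - x)*2 + 0.05*10 + 0.05*0.5*200 = 7.5, solved for x.
x = 1 - (amat - 1 - 0.05 * 10 - 0.05 * 0.5 * 200) / 2
print(x)  # 0.5
```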
c) The bandwidth to the L1 cache is doubled because every access is sent to the L1. In the previous setup, only accesses that missed in the L0 cache were sent to the L1 cache.
3.
a) Addresses have 16 tag bits, 8 set-index bits, and 8 block-offset bits. There are 256 sets, so the tag store is 256 sets * (16 tag bits + 1 valid bit) = 4352 bits. 4096 bits is also an acceptable answer (it assumes there is no valid bit and invalid blocks are marked with a special tag).
b) (16 tag bits + 8 valid bits) * 256 sets = 6144 bits. 9 valid bits is acceptable, too.
c) i) Design A: 1MB / 256B per miss = 4096 misses
Design B: 1MB / 32B per miss = 32768 misses
ii) Both designs transfer 1MB
d) i) Both designs incur 4096 misses (every access is a miss in both designs)
ii) Design A: 4096 misses * 256B = 1MB
Design B: 4096 misses * 32B = 128KB
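The tag-store sizes and miss/traffic counts above can be checked in a few lines, assuming the parameters stated in the answers (16-bit tags, 256 sets, 256 B vs. 32 B blocks, 1 MB of data touched):

```python
MB = 1 << 20

tag_store_a = 256 * (16 + 1)   # part a: tag + valid bit per set -> 4352 bits
tag_store_b = (16 + 8) * 256   # part b: tag + 8 sub-block valid bits -> 6144 bits

misses_seq_a = MB // 256       # part c(i), Design A: 4096 misses
misses_seq_b = MB // 32        # part c(i), Design B: 32768 misses

traffic_d_a = 4096 * 256       # part d(ii), Design A: 1 MB
traffic_d_b = 4096 * 32        # part d(ii), Design B: 128 KB

print(tag_store_a, tag_store_b, misses_seq_a, misses_seq_b,
      traffic_d_a, traffic_d_b)
```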
4.
a) 2 + 12 * .05 + 250 * .05 * .4 = 7.6
b) Target access time = 0.85 * 7.6 = 6.46
2 + 14 * .05 + .05 * x * 250 = 6.46, so x = 30.08%
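The solve for x can be confirmed numerically. A sketch, assuming the hit times, miss rates, and memory latency given in the two equations above:

```python
# Part (a): baseline AMAT.
amat = 2 + 12 * 0.05 + 250 * 0.05 * 0.4
print(amat)            # ~7.6

# Part (b): 2 + 14*.05 + .05*x*250 = 0.85 * 7.6, solved for x.
target = 0.85 * amat
x = (target - 2 - 14 * 0.05) / (0.05 * 250)
print(round(x, 4))     # 0.3008
```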
5.
i) Traversing a linked list: correlation
ii) A loop accessing a large two-dimensional array: stride/spatial
iii) A loop that probes a large hash table, where loop iterations are independent: run-ahead / pre-execution
iv) A piece of code that examines several fields in network packet headers to make routing decisions: stride/spatial
6.
Page Size = 4 KB = 2^12, so Index Bits + Block Offset Bits = 12
Block Offset = 64 B = 2^6, so Block Offset Bits = 6
Index Bits = 12 – 6 = 6
Number of Sets = 2^6 = 64
Size of the Cache = # sets * block size * associativity = 64 * 64 * 4 = 16 KB
Changing to Direct Mapped does not change the number of sets in the cache, but it does change the size.
Size of the Cache = # sets * block size * associativity = 64 * 64 * 1 = 4 KB
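The geometry above follows mechanically from the page and block sizes. A minimal check, assuming 4 KB pages, 64 B blocks, and the constraint that index + block-offset bits fit within the 12-bit page offset:

```python
page_offset_bits = 12            # 4 KB page = 2**12
block_offset_bits = 6            # 64 B block = 2**6
index_bits = page_offset_bits - block_offset_bits   # 6
sets = 2 ** index_bits           # 64

print(sets * 64 * 4)   # 16384 bytes = 16 KB (4-way)
print(sets * 64 * 1)   # 4096 bytes  = 4 KB  (direct mapped)
```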
7.
Let’s call the bits that are used in the index, but are above the page
offset (i.e., they come
from VPN) “alias bits”. These are the “a” bits referred to in lecture.
Software solutions:
• Prohibit multiple virtual mappings to the same physical page
(prohibits sharing)
• Require OS to map physical pages such that alias bits match between VPN and PPN
Hardware solutions:
• Physically index the cache
• Set cache size/associativity so that there are no alias bits (i.e., virtually indexed, physically tagged)
• Alpha 21264: Search all locations in L1 where a synonym might be found in parallel (i.e., check all possibilities for the alias bits)
• MIPS: Enforce inclusion and add the alias bits to the L2 tag. The L2 can now detect when a synonym goes into L1, and remove the old block from L1 before the new one goes in.
For multiple page sizes: (1) use multiple TLBs, one for each page size, and look them up in parallel, or (2) for one TLB, do multiple lookups in parallel for each page size, and check the page size as part of the tag comparison.
8.
[Figure: state-transition diagram for a snooping MSI coherence protocol, with per-block states M, S, and I. Edge labels pair processor requests with the bus transactions they generate (R⇒BR, W⇒BW) and observed bus transactions with the required writebacks (BR⇒WB, BW⇒WB); e.g., I moves to S on R/BR and to M on W/BW.]