Caches
Problem: Processor-Memory Bottleneck
Cache Structure
Terminology
Directly mapped cache:
- Cache with
E=1
- Every block from memory has a unique location in cache
Fully associative cache:
- Cache with
S=1
- Every block from memory can be mapped to any location in cache
- Expensive to build in practice
- The register file can be viewed as a fully associative cache
LRU Replacement: when selecting which block should be replaced (happens only for E>1), the least recently used one is chosen.
Types of cache misses
Compulsory | occurs on first access to a block |
Capacity | occurs when the working set is larger than the cache |
Conflict | occurs when the cache is large enough, but multiple data objects map to the same slot |
Write policies
Is Coherence Everything?
- Coherence is concerned with behavior of individual locations
- Consider the program (initial
X,Y,Z = 0
)
What value will Z
on P2
have?
Y=10
does not need to have completed beforeX=2
is visible toP2
!- This allows
P2
to exit the loop and readY=0
- This may not be the intent of the programmer!
- This may be due to congestion (imagine X is pushed to a remote cache while Y misses to main memory) and or due to write buffering, …
What happens when
Y
andX
are on the same cache line (assume simple MESI and no write buffer)?
For the exercises
Number of sets: S = cache_size/(cacheline_size*associativity)
e.g. 2MiB cache, 64B cacheline, 8-way associative:
S = 2*1024*1024/(64*8) = 4096
e.g. 256B cache, 32B cacheline, 2-way associative:
S = 256/(32*2) = 4
Number of bits for the offset = log2(cacheline_size)
e.g. 64B cacheline:
log2(64) = 6
e.g. 32B cacheline:
log2(32) = 5
Number of bits for the set = log2(number_of_sets)
Number of bits for the tag = address_size - offset_size - set_size
e.g., 32-bit address, 32B cacheline, 2-way associative:
32 - 5 - 2 = 25
e.g., with a 4KiB memory, 32B cacheline, 2-way associative:
12 - 5 - 2 = 5