
Cache Modeling: The Basics

2025-09-01 · Arsh Sharma
  1. The Basics
  2. AMAT
  3. Cache Mapping Techniques
  4. Cache Specific Parameters

The Basics

When I first started diving into computer architecture, I was fascinated by how modern processors manage memory.

At the heart of this is the cache: a small, fast memory that sits between the CPU and main memory and holds recently used data. When the CPU requests a memory address:

  1. The cache is checked first. If the data is present, that's a cache hit, and the data is returned quickly.
  2. If it isn't, that's a cache miss, and the data has to be fetched from the much slower main memory (and is usually brought into the cache along the way).

Another thing to note is that caches are much smaller than main memory, so they can't hold all the data. They therefore use various strategies to decide which data to keep and which to evict when new data needs to be loaded. The overall goal of any memory system or software optimization is to reduce the number of cache misses and feed data to the CPU as quickly as possible, thereby improving performance.
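As a quick illustration of the "software optimization" side, consider summing a 2D array stored in row-major order. The sketch below (the matrix layout and use of a flat `std::vector` are just assumptions for illustration) does the same arithmetic in two loop orders: the row-by-row version walks memory contiguously and reuses each cache line it fetches, while the column-by-column version strides across memory and tends to miss far more often.

```cpp
#include <vector>
#include <cstddef>

// Sum a row-major N x N matrix two ways. Both loops do the same arithmetic;
// they differ only in the order in which memory is touched.
double sum_row_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)       // walk each row...
        for (std::size_t j = 0; j < n; ++j)   // ...contiguously: cache-friendly
            s += m[i * n + j];
    return s;
}

double sum_column_major(const std::vector<double>& m, std::size_t n) {
    double s = 0.0;
    for (std::size_t j = 0; j < n; ++j)       // walk each column...
        for (std::size_t i = 0; i < n; ++i)   // ...striding by n elements: cache-hostile
            s += m[i * n + j];
    return s;
}
```

On large matrices the second version can be several times slower, purely because of the extra cache misses.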

AMAT

The number of cache misses (or the miss rate) is a coarse metric for measuring performance impact. A much better metric is the Average Memory Access Time (AMAT), because it takes into account the total time required to get the data, or in other words, how long the CPU has to wait on memory.

We can define:

$$\text{AMAT} = \text{Hit latency} + \text{Miss ratio} \times \text{Miss penalty}$$

Let's say we have a cache with a hit latency of 2 cycles and a hit ratio of 0.7, and main memory with an access latency of 30 cycles.

Now, going by the equation above, we can calculate the AMAT from the cache's perspective:

$$\text{AMAT} = 2 + (1 - 0.7) \times 30 = 11 \text{ cycles}$$
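To make the formula easy to play with, here is a minimal C++ sketch of the same calculation. The helper name is my own, and the numbers are just the example values from above.

```cpp
#include <iostream>

// AMAT = hit latency + miss ratio * miss penalty
double amat(double hit_latency, double hit_ratio, double miss_penalty) {
    return hit_latency + (1.0 - hit_ratio) * miss_penalty;
}

int main() {
    // Single cache level backed by main memory (values from the example above).
    double result = amat(/*hit_latency=*/2.0, /*hit_ratio=*/0.7, /*miss_penalty=*/30.0);
    std::cout << "AMAT = " << result << " cycles\n";  // prints 11
}
```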

Modern CPUs have multiple levels of cache (L1, L2, L3), with L1 being the smallest and fastest and L3 being larger and slower. In terms of access time:

$$\text{Disk} > \text{RAM} > \text{L3} > \text{L2} > \text{L1} > \text{CPU registers}$$

and in terms of size:

$$\text{CPU registers} < \text{L1} < \text{L2} < \text{L3} < \text{RAM} < \text{Disk}$$

On Windows, you can actually check the cache sizes for yourself using the Task Manager.[1]

Let's try the AMAT equation again, assuming we have an L1 cache with a hit latency of 2 cycles and a hit ratio of 0.7, an L2 cache with a hit latency of 5 cycles and a hit ratio of 0.9, and the same main memory with an access latency of 30 cycles. Starting with L2:

$$\text{AMAT}_{L2} = 5 + (1 - 0.9) \times 30 = 8 \text{ cycles}$$

Now let's substitute $\text{AMAT}_{L2}$ as the miss penalty for L1:

$$\text{AMAT}_{L1} = 2 + (1 - 0.7) \times 8 = 4.4 \text{ cycles}$$

We see that the AMAT has dropped from 11 cycles to 4.4 cycles just by adding one more cache level, which is a significant improvement.
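Chaining the levels is just a matter of feeding one level's AMAT in as the next level's miss penalty. A minimal sketch, reusing the same kind of helper and the example numbers from above:

```cpp
#include <iostream>

// AMAT = hit latency + miss ratio * miss penalty
double amat(double hit_latency, double hit_ratio, double miss_penalty) {
    return hit_latency + (1.0 - hit_ratio) * miss_penalty;
}

int main() {
    // L2 misses go to main memory; L1 misses go to L2, so L1's miss penalty is L2's AMAT.
    double amat_l2 = amat(5.0, 0.9, 30.0);     // 8 cycles
    double amat_l1 = amat(2.0, 0.7, amat_l2);  // 4.4 cycles
    std::cout << "AMAT_L2 = " << amat_l2 << " cycles\n"
              << "AMAT_L1 = " << amat_l1 << " cycles\n";
}
```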

Cache Mapping Techniques

How a cache maps data from main memory to the cache is determined by its mapping technique. There are three common techniques:

  1. Direct-Mapped Cache: Each block of main memory maps to exactly one cache line. This is simple but can lead to many conflicts and cache misses.

  2. Fully Associative Cache: Any block of memory can be placed in any cache line. This offers the most flexibility but is more complex and expensive to implement.

  3. Set-Associative Cache: The cache is divided into sets, and each block of memory can map to any line within a specific set. This reduces conflicts compared to direct-mapped caches. The number of lines in each set is called the number of ways (a 4-way cache has 4 lines per set); see the address-breakdown sketch after this list.
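To make the mapping concrete, here is a small sketch of how a byte address is typically split into a block offset, a set index, and a tag. The cache geometry below (32 KiB, 64-byte blocks, 4-way) is just an assumption for illustration; a direct-mapped cache is the 1-way special case, and a fully associative cache has a single set.

```cpp
#include <cstdint>
#include <iostream>

// Split a byte address into offset / set index / tag for a set-associative cache.
// Example geometry (assumed): 32 KiB cache, 64-byte blocks, 4-way set-associative.
constexpr std::uint64_t kCacheSize = 32 * 1024;
constexpr std::uint64_t kBlockSize = 64;
constexpr std::uint64_t kWays      = 4;
constexpr std::uint64_t kNumSets   = kCacheSize / (kBlockSize * kWays);  // 128 sets

int main() {
    std::uint64_t addr = 0x1234ABCD;  // arbitrary example address

    std::uint64_t offset = addr % kBlockSize;               // which byte within the block
    std::uint64_t set    = (addr / kBlockSize) % kNumSets;  // which set the block maps to
    std::uint64_t tag    = addr / (kBlockSize * kNumSets);  // identifies the block within that set

    std::cout << "offset=" << offset << " set=" << set << " tag=" << tag << '\n';
}
```

In a direct-mapped cache the set index alone picks the line; in a set-associative cache the tag is compared against every way in the chosen set to detect a hit.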

Cache Specific Parameters

Apart from having different cache levels, there are cache-specific parameters that shape a cache's behaviour, such as the total cache size, the block (line) size, the associativity (number of ways), the replacement policy (e.g. LRU), and the write policy (write-back vs. write-through).
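As a preview of the C++ implementation in the next post, here is a small sketch of how these parameters could be bundled together. The struct and field names are my own assumptions, not code from that post.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical parameter bundle for a simple cache model.
struct CacheParams {
    std::size_t cache_size_bytes = 32 * 1024;  // total capacity
    std::size_t block_size_bytes = 64;         // bytes brought in per miss (one cache line)
    std::size_t associativity    = 4;          // lines per set (1 = direct-mapped)
    unsigned    hit_latency      = 2;          // cycles to serve a hit
    // A fuller model would also carry a replacement policy (e.g. LRU)
    // and a write policy (write-back vs. write-through).
};

int main() {
    CacheParams p;  // defaults: 32 KiB, 64-byte blocks, 4-way
    std::size_t num_sets = p.cache_size_bytes / (p.block_size_bytes * p.associativity);
    std::cout << "sets: " << num_sets << ", ways: " << p.associativity << '\n';  // sets: 128, ways: 4
}
```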

I think it's a good time to stop here, as we have covered the basics of cache memory, its hierarchy, and some important parameters that affect its performance.

The next blog talks about implementing all of the above using C++.

[1] In a multi-core CPU, the L3 cache is usually shared among all cores, while L1 and L2 caches are typically private to each core.