
Lecture 17: Multi-Level Caching

Lectures on Computer Architecture


By Dr. Isuru Nawinne

17.1 Introduction

This lecture explores cache hierarchies in modern computer systems, examining how multiple levels of cache work together to optimize memory access performance through a careful balance of hit latency and hit rate. We analyze real-world implementations, including Intel's Skylake architecture, to understand the design decisions behind multi-level cache organizations, where L1 caches prioritize speed, L2 caches balance capacity and latency, and L3 caches provide large shared storage across processor cores. Examining the associativity trade-offs, from direct-mapped through set-associative to fully associative designs, reveals how hardware complexity, power consumption, and performance interact in practical cache systems.

17.2 Recap: Associativity Comparison Results

From the previous lecture's example using a 4-block cache with three different organizations (a small simulator sketch for reproducing this kind of comparison appears at the end of this recap):

17.2.1 Direct Mapped Cache

17.2.2 2-Way Set Associative Cache

17.2.3 Fully Associative Cache (4-way)

17.2.4 Key Observations
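
The detailed hit/miss tables for the three organizations are on the lecture slides. As a minimal sketch of how such a comparison can be reproduced in software, the Python snippet below simulates a 4-block cache at three associativities; the block-address trace is illustrative, not the one used in the lecture.

def simulate(trace, num_blocks=4, associativity=1):
    """Count hits for a cache of num_blocks blocks with LRU replacement.

    associativity=1 is direct mapped; associativity=num_blocks is fully
    associative. The trace contains block addresses directly."""
    num_sets = num_blocks // associativity
    sets = [[] for _ in range(num_sets)]   # each set kept in LRU order
    hits = 0
    for block in trace:
        s = sets[block % num_sets]         # index by block address
        if block in s:
            hits += 1
            s.remove(block)                # re-append below to mark as MRU
        elif len(s) == associativity:
            s.pop(0)                       # set full: evict the LRU block
        s.append(block)
    return hits

trace = [0, 8, 0, 6, 8, 0, 6, 8]           # illustrative block-address trace
for ways in (1, 2, 4):
    print(f"{ways}-way: {simulate(trace, associativity=ways)} hits")

On this trace the fully associative cache scores the most hits, because blocks 0, 6, and 8 never have to compete for a set; with fewer ways, blocks 0 and 8 map to the same set and repeatedly evict each other.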

17.3 Cache Configuration Parameters

17.3.1 Primary Parameters

1. Block Size

2. Set Size

3. Associativity

17.3.2 Cache Size Calculation

Total Cache Size = Block Size × Set Size (number of sets) × Associativity
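
A quick worked example of this formula; the numbers below are illustrative, chosen to match a common L1 data cache configuration (32 KiB, 8-way, 64-byte blocks, similar to the Skylake L1 data cache discussed later):

block_size    = 64   # bytes per block
num_sets      = 64   # number of sets in the cache
associativity = 8    # blocks per set (ways)

total_size = block_size * num_sets * associativity
print(total_size, "bytes =", total_size // 1024, "KiB")   # 32768 bytes = 32 KiB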

17.3.3 Secondary Parameters

4. Replacement Policy

5. Write Policy

6. Other Optimization Techniques

17.3.4 Configuration Definition

17.4 Improving Cache Performance

17.4.1 Average Access Time Equation

T_avg = Hit Latency + Miss Rate × Miss Penalty

The three factors in this equation (hit latency, miss rate, and miss penalty) can each be optimized, as the following sections discuss.
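
As a sanity check, the equation in code; the parameter values are illustrative, not from the lecture:

def avg_access_time(hit_latency, miss_rate, miss_penalty):
    # T_avg = hit latency + miss rate * miss penalty (all in cycles)
    return hit_latency + miss_rate * miss_penalty

# Illustrative single-level cache: 1-cycle hits, 5% miss rate, 100-cycle penalty.
print(avg_access_time(1, 0.05, 100))   # 6.0 cycles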

17.5 Hit Rate Improvement

17.5.1 Method 1: Increase Cache Size

Approach:

Limitations:

17.5.2 Method 2: Increase Associativity

Benefits:

Trade-offs:

17.5.3 Method 3: Cache Prefetching

Concept:

Types of Prefetching:

Benefits:

Limitations:
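
The concrete schemes were shown on the slides, but the idea can be illustrated with next-block (sequential) prefetching, one common hardware scheme: on every access to block b, block b + 1 is also brought into the cache. A minimal sketch, assuming a fully associative LRU cache and a hypothetical trace:

from collections import OrderedDict

def run(trace, capacity=4, prefetch=False):
    # Fully associative cache of `capacity` blocks with LRU replacement.
    # With prefetch=True, every access to block b also brings in block b+1.
    cache = OrderedDict()                # keys: block addresses, in LRU order
    hits = 0

    def fill(block):
        cache[block] = True
        cache.move_to_end(block)         # mark as most recently used
        if len(cache) > capacity:
            cache.popitem(last=False)    # evict the least recently used block

    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            fill(block)                  # demand fetch on a miss
        if prefetch:
            fill(block + 1)              # speculative next-block prefetch
    return hits

trace = list(range(8))                   # purely sequential access pattern
print("without prefetch:", run(trace))                  # 0 hits
print("with prefetch:   ", run(trace, prefetch=True))   # 7 hits

On a sequential pattern, prefetching converts almost every miss into a hit; on an irregular pattern, the prefetched blocks may go unused and can evict useful data, which is one of the limitations noted above.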

17.6 Hit Latency Optimization

17.6.1 Relationship with Hit Rate

Fundamental Trade-off:

Examples:

Design Challenge:

17.7 Miss Penalty Improvement

17.7.1 Miss Penalty Definition

17.7.2 Method 1: Optimize Communication

17.7.3 Method 2: Cache Hierarchy (Main Focus)

17.8 Cache Hierarchy (Multi-Level Caches)

17.8.1 Concept

Instead of a single cache between the CPU and main memory, use multiple cache levels: L1, L2, L3, etc., with each level serving as a backup for the level above.

17.8.2 Terminology

17.8.3 Operation

  1. CPU requests data from L1
  2. L1 miss → request goes to L2 (not directly to memory)
  3. L2 miss → request goes to L3 (if present)
  4. Last-level miss → request goes to main memory
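
A minimal sketch of this lookup flow; the level contents and latencies are illustrative, and for brevity the sketch only accumulates latency rather than filling the caches on a miss:

def access(addr, levels, memory_latency=100):
    # `levels` lists (name, cached_blocks, hit_latency) for L1, L2, ...
    total = 0
    for name, blocks, hit_latency in levels:
        total += hit_latency                 # pay this level's lookup cost
        if addr in blocks:
            return total                     # hit at this level: stop here
        # miss: fall through to the next level, not straight to memory
    return total + memory_latency            # missed every cache level

levels = [("L1", {1, 2}, 1), ("L2", {1, 2, 3, 4}, 3)]
print(access(2, levels))    # L1 hit: 1 cycle
print(access(4, levels))    # L1 miss, L2 hit: 1 + 3 = 4 cycles
print(access(9, levels))    # miss everywhere: 1 + 3 + 100 = 104 cycles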

17.8.4 Benefits

17.8.5 Effective Miss Penalty

For L1 cache:

Effective Miss Penalty = L2 Hit Latency + L2 Miss Rate × L2 Miss Penalty

If L2 has a good hit rate, the effective miss penalty seen by L1 stays close to the L2 hit latency alone.

17.8.6 Example Calculation

Given:

  • L2 hit latency = 3 cycles
  • L2 miss rate = 0.1% (0.001)
  • L2 miss penalty (main memory access) = 100 cycles

L1 effective penalty = 3 + 0.001 × 100 = 3.1 cycles
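
The same arithmetic in code, extended with an illustrative L1 (1-cycle hit latency, 5% miss rate; both values assumed here, not from the lecture) to show the overall effect:

l2_hit_latency  = 3      # cycles
l2_miss_rate    = 0.001  # 0.1%
l2_miss_penalty = 100    # cycles to reach main memory

effective_penalty = l2_hit_latency + l2_miss_rate * l2_miss_penalty
print(effective_penalty)              # 3.1 cycles, versus ~100 without an L2

# Folding into the L1 average (assumed L1: 1-cycle hit, 5% miss rate):
print(1 + 0.05 * effective_penalty)   # 1.155 cycles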

17.9 Optimization Strategies for Multi-Level Caches

17.9.1 Why Not One Big Cache?

17.10 L1 Cache Optimization - Optimize for Hit Latency

17.10.1 Goal

Minimize hit latency

17.10.2 Rationale

17.10.3 Characteristics

17.10.4 Trade-off

17.11 L2 Cache Optimization - Optimize for Hit Rate

17.11.1 Goal

Maximize hit rate

17.11.2 Rationale

17.11.3 Characteristics

17.11.4 Trade-off

17.12 Associativity Comparison

Question: Which level has higher associativity?

Answer: L2 (and L3, if present) have higher associativity.

17.12.1 Reasoning

17.12.2 Combined Effect

Overall result: Much better average performance

17.13 Physical Implementation of Cache Hierarchy

17.13.1 L1 Cache

17.13.2 L2 Cache

17.13.3 L3 Cache

17.13.4 Design Variations

Different implementations based on:

17.14 Real World Example: Intel Skylake Architecture

Source: wikichip.org

17.14.1 Architecture Overview

17.14.2 Dual-Core Layout Analysis

Execution Units

Pipeline Support Hardware

17.14.3 Cache Implementation

L1 Data Cache

L1 Instruction Cache

L2 Cache

17.14.4 Memory Hierarchy

17.14.5 Design Observations

17.14.6 Why Higher L1 Associativity Here?

17.14.7 Multi-Core Configuration

17.14.8 Additional Features

17.15 Recommendations for Further Study

17.15.1 Resource: wikichip.org

Content Available:

Benefits:

Key Takeaways

  1. Cache hierarchies reduce effective miss penalty
  2. Different levels optimized for different goals:
    • L1: Hit latency (speed)
    • L2/L3: Hit rate (coverage)
  3. Multi-level caches balance competing requirements
  4. Real implementations show concepts in practice
  5. Design decisions depend on:
    • Performance targets
    • Power budget
    • Cost constraints
    • Application requirements
  6. Modern CPUs use sophisticated cache hierarchies
  7. Caches take up a significant portion of the CPU die area
  8. Pipeline optimizations also require substantial hardware

Summary

Cache hierarchies represent one of the most effective techniques for improving memory system performance. By using multiple levels of cache, each optimized for different objectives, modern processors achieve both low latency and high hit rates. The L1 cache prioritizes speed to minimize clock cycle time, while L2 and L3 caches prioritize capacity and hit rate to reduce memory access frequency. Real-world implementations, such as Intel's Skylake architecture, demonstrate these principles in practice, showing how careful cache design enables high-performance computing while managing the constraints of power, cost, and chip area.