
Lecture 16: Associative Cache Control

Lectures on Computer Architecture


By Dr. Isuru Nawinne

16.1 Introduction

This lecture explores advanced cache design techniques that significantly impact memory system performance. We examine write policies—specifically write-through and write-back strategies—understanding how each handles the critical challenge of maintaining consistency between cache and main memory while balancing performance and complexity. The lecture then progresses to associative cache organizations, from direct-mapped through set-associative to fully-associative designs, revealing how different levels of associativity affect hit rates, access latency, and hardware complexity. Through detailed examples and performance analysis, we discover how modern cache systems make strategic trade-offs between speed, capacity utilization, and implementation cost to achieve optimal memory hierarchy performance.

16.2 Recap: Write Access in Direct Mapped Cache

16.2.1 Write-Through Policy

On every write, the cache and main memory are updated together, so memory always holds the current value of every block.

16.2.2 Advantages of Write-Through

Control logic stays simple, and main memory is always consistent with the cache.

16.2.3 Disadvantages of Write-Through

Every write pays the latency of a main-memory access, generating heavy memory traffic and stalling the processor on write-intensive code.

16.2.4 Write Buffer Solution

A small FIFO buffer between the cache and memory accepts each write and drains it to memory in the background, decoupling cache writes from memory writes so the processor can continue as soon as the cache and buffer are updated.

16.3 Write-Back Policy

16.3.1 Basic Concept

Writes update only the cache; a modified block is written to main memory just once, when it is eventually evicted.

16.3.2 Dirty Bit

Each cache entry carries one extra status bit, set when the block is modified, marking it as inconsistent with main memory.

16.3.3 Write-Back Operations

On Write Hit:

Write the data into the cache only and set the dirty bit; main memory is not updated.

On Read Miss:

If the block being evicted is dirty, write it back to memory first, then fetch the requested block and clear the dirty bit.

On Write Miss:

Typically the missing block is first fetched into the cache (write-allocate); the write then proceeds as a write hit, setting the dirty bit.

16.3.4 Advantages of Write-Back

Repeated writes to the same block cost only one eventual memory write, so memory traffic is much lower and writes complete at cache speed.

16.3.5 Disadvantages of Write-Back

Control is more complex, every entry needs a dirty bit, and evicting a dirty block is slower because the old contents must be written back before the new block is fetched.

16.3.6 Write-Back Cache Structure

Each cache entry holds a valid bit, a dirty bit, the tag, and the data block.
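
A minimal sketch of this structure and the operations above, in Python, assuming write-allocate on write misses; the class, function names, and memory stub are illustrative, not from the lecture:

class WriteBackLine:
    def __init__(self):
        self.valid = False   # entry holds a real block
        self.dirty = False   # block was modified after being fetched
        self.tag = None
        self.data = None

def write_back_to_memory(tag, data):
    pass  # stub standing in for the main-memory write of an evicted block

def access(line, tag, fetched_block, is_write, write_data=None):
    if line.valid and line.tag == tag:            # hit
        if is_write:
            line.data = write_data
            line.dirty = True                     # memory is now stale; no memory write
        return "hit"
    if line.valid and line.dirty:                 # miss: save the evicted block first
        write_back_to_memory(line.tag, line.data)
    line.valid, line.tag, line.data = True, tag, fetched_block
    line.dirty = is_write                         # a write miss installs a dirty block
    if is_write:
        line.data = write_data
    return "miss"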

16.4 Cache Performance

16.4.1 Average Access Time Formula

T_avg = Hit Latency + Miss Rate × Miss Penalty

Where:

  • Hit Latency = time to access the cache when the block is present
  • Miss Rate = fraction of accesses not found in the cache (1 - hit rate)
  • Miss Penalty = additional time to fetch the missing block from the next level of memory

16.4.2 Example Calculation

Given:

  • Hit latency = 1 cycle
  • Hit rate = 95%
  • Miss penalty = 20 cycles
  • Clock period = 1 ns, so cycles convert directly to nanoseconds

T_avg = 1 + (1 - 0.95) × 20 
      = 1 + 0.05 × 20 
      = 2 cycles 
      = 2 nanoseconds

If hit rate improves to 99.9%:

T_avg = 1 + (1 - 0.999) × 20 
      = 1 + 0.001 × 20 
      = 1.02 cycles

Raising the hit rate from 95% to 99.9% cuts the average access time nearly in half, showing how sensitive performance is to even small hit-rate improvements.
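
The same arithmetic as a small Python helper (a sketch; the parameter names simply follow the formula above):

def avg_access_time(hit_latency, hit_rate, miss_penalty):
    # T_avg = hit latency + miss rate x miss penalty
    return hit_latency + (1 - hit_rate) * miss_penalty

print(avg_access_time(1, 0.95, 20))    # 2.0 cycles
print(avg_access_time(1, 0.999, 20))   # 1.02 cycles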

16.4.3 Performance Example Problem

Given:

  • Base CPI = 2 (with ideal caches that never miss)
  • Instruction cache miss rate = 2%
  • Data cache miss rate = 4%
  • Miss penalty = 100 cycles
  • Loads and stores = 36% of instructions

Calculating Actual CPI:

Instruction-miss stalls per instruction = 0.02 × 100 = 2.00
Data-miss stalls per instruction = 0.36 × 0.04 × 100 = 1.44
Actual CPI = 2 + 2.00 + 1.44 = 5.44

Speedup with ideal caches: 5.44 / 2 = 2.72×

CPI with no caches:

With no caches, every instruction fetch and every data access pays the full 100-cycle memory latency: CPI = 2 + (1 + 0.36) × 100 = 138, far worse than even the imperfect caches above.
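
The same calculation as a sketch, using the parameters listed above:

def cpi_with_stalls(base_cpi, i_miss_rate, d_miss_rate, penalty, mem_frac):
    # every fetch can miss in the instruction cache; only loads and
    # stores (mem_frac of instructions) also access the data cache
    stalls = i_miss_rate * penalty + mem_frac * d_miss_rate * penalty
    return base_cpi + stalls

actual = cpi_with_stalls(2, 0.02, 0.04, 100, 0.36)
print(actual)        # 5.44
print(actual / 2)    # 2.72, the slowdown relative to ideal caches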

16.5 Improving Cache Performance

16.5.1 Three Factors to Improve

  1. Hit Rate - increase the percentage of hits
  2. Hit Latency - reduce time to determine hits
  3. Miss Penalty - reduce time to fetch missing blocks

16.5.2 Improving Hit Rate

Method 1: Larger Cache Size

A larger cache holds more blocks, so fewer accesses miss; the cost is a larger, slower, and more power-hungry structure.

16.5.3 Direct Mapped Cache Limitation

Even when a direct-mapped cache has plenty of unused entries, two frequently used blocks that map to the same entry keep evicting each other, so extra capacity alone cannot remove these conflict misses.

16.6 Fully Associative Cache

16.6.1 Concept

Any memory block may be placed in any cache entry; the address has no index field, only a tag and an offset.

16.6.2 Finding Blocks

Since a block can be anywhere, the incoming tag must be compared against the stored tags of all entries simultaneously.

16.6.3 Implementation

Requires one tag comparator per entry, all operating in parallel, which makes the structure behave like a content-addressable memory.

16.6.4 Block Placement

A new block can be placed in any invalid entry; only when every entry is valid must an existing block be evicted.

16.6.5 Block Replacement

When all entries are valid, a replacement policy is needed to choose which block to evict.

16.6.6 Replacement Policies

Three common replacement policies:

  1. LRU (Least Recently Used) - the ideal policy:
    • Evict the block that was used longest ago
    • Best exploits temporal locality
    • Complex to implement: every access must update recency information, effectively timestamping each access
    • Expensive in hardware
  2. Pseudo-LRU (PLRU):
    • Approximation of LRU with a much simpler mechanism
    • Picks the least recently used block roughly 90-99% of the time
    • Better balance of performance and complexity
  3. FIFO (First In First Out):
    • Evict the block that entered the cache first
    • Very simple: state is updated only when a new block is fetched, not on every access
    • Less likely to pick the true least-recently-used block, so hit rates suffer somewhat
    • Used in embedded systems for simplicity and low power (LRU and FIFO are contrasted in the sketch after this list)
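
A minimal sketch contrasting LRU and FIFO on a tiny two-block fully associative cache. The trace is illustrative; real hardware tracks recency with counters or bit matrices rather than an ordered software structure:

from collections import OrderedDict, deque

def misses_lru(trace, capacity):
    cache = OrderedDict()                  # insertion order doubles as recency order
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # every access refreshes recency
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return misses

def misses_fifo(trace, capacity):
    cache, order, misses = set(), deque(), 0
    for block in trace:
        if block not in cache:             # FIFO state changes only on a miss
            misses += 1
            if len(cache) == capacity:
                cache.remove(order.popleft())  # evict the oldest resident
            cache.add(block)
            order.append(block)
    return misses

trace = [0, 8, 0, 6, 0, 8]
print(misses_lru(trace, 2))    # 4 misses: LRU keeps the frequently reused block 0
print(misses_fifo(trace, 2))   # 5 misses: FIFO evicts block 0 despite its reuse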

16.6.7 Fully Associative - Advantages

No conflict misses: a block is evicted only when the cache is completely full, so the hit rate is the best achievable for a given capacity.

16.6.8 Fully Associative - Disadvantages

One comparator per entry makes large fully associative caches expensive, power-hungry, and slower to hit, so they are practical only at small sizes.

16.7 Set Associative Cache

16.7.1 Concept

A compromise between the two extremes: the cache is divided into sets of N blocks (ways). A block maps to exactly one set, selected by the index, but may occupy any way within that set.

16.7.2 Two-Way Set Associative

Each set holds two blocks, so every lookup compares the two stored tags of the selected set in parallel.

16.7.3 Read Access Process

  1. Use index to select correct set (via demultiplexer)
  2. Extract both stored tags from the set
  3. Parallel comparison of both tags with incoming tag
  4. Each way has hit status (hit0, hit1)
  5. Use encoder to generate select signal for multiplexer
  6. Select correct data block based on which way hit
  7. Use offset to select correct word within block (these steps are traced in the sketch below)
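
A sketch of these steps in Python; the dictionary structures are illustrative, and real hardware performs the comparisons and selection combinationally rather than one after another:

def read_2way(sets, addr, offset_bits, index_bits):
    offset = addr & ((1 << offset_bits) - 1)                 # word within the block
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # step 1: select the set
    tag = addr >> (offset_bits + index_bits)
    ways = sets[index]
    hits = [w["valid"] and w["tag"] == tag for w in ways]    # steps 2-4: compare tags
    if not any(hits):
        return None                                          # miss in both ways
    way = hits.index(True)                                   # step 5: encode the hit
    return ways[way]["data"][offset]                         # steps 6-7: block, then word

# 4 sets x 2 ways, 4-word blocks: install one block, then read word 2 of it
sets = [[{"valid": False, "tag": 0, "data": None} for _ in range(2)]
        for _ in range(4)]
sets[1][1] = {"valid": True, "tag": 3, "data": [10, 11, 12, 13]}
print(read_2way(sets, (3 << 4) | (1 << 2) | 2, 2, 2))        # prints 12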

16.7.4 Important Notes

16.8 Associativity Spectrum

16.8.1 For an 8-Block Cache, Different Organizations

  1. 1-way set associative (Direct Mapped):
    • 8 entries, 1 way each
    • 3-bit index
    • Each block has a fixed location
  2. 2-way set associative:
    • 4 entries, 2 ways each
    • 2-bit index
    • Each set can hold 2 different blocks
  3. 4-way set associative:
    • 2 entries, 4 ways each
    • 1-bit index
    • Each set can hold 4 different blocks
  4. 8-way set associative (Fully Associative):
    • 1 entry, 8 ways
    • No index field (0 bits)
    • Any block can go anywhere (the index-width arithmetic is checked in the sketch after this list)
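
The pattern above follows directly from sets = blocks / ways and index bits = log2(sets); a quick check in Python:

from math import log2

blocks = 8
for ways in (1, 2, 4, 8):
    sets = blocks // ways
    print(f"{ways}-way: {sets} sets, {int(log2(sets))} index bits")
# 1-way: 8 sets, 3 index bits ... 8-way: 1 set, 0 index bits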

16.8.2 Design Considerations

Increasing associativity trades longer, more power-hungry lookups for fewer conflict misses; these trade-offs are summarized in Section 16.10.

16.9 Associativity Comparison Example

16.9.1 Setup

Three caches of the same capacity, four one-word blocks each, are compared on the same sequence of block addresses: direct mapped, 2-way set associative, and fully associative, the latter two using LRU replacement.

Initial State

All entries in all three caches start invalid (empty).

16.9.2 Tag and Index Sizes

With four blocks, the direct-mapped cache uses a 2-bit index, the 2-way cache a 1-bit index, and the fully associative cache no index at all; in each case the tag is the remaining upper bits of the block address.

16.9.3 Memory Access Sequence

Access 1: Block Address 0

Score: Direct Mapped: 0 hits, 1 miss | 2-way: 0 hits, 1 miss | Fully: 0 hits, 1 miss

Access 2: Block Address 8

Score: Direct Mapped: 0 hits, 2 misses | 2-way: 0 hits, 2 misses | Fully: 0 hits, 2 misses

Access 3: Block Address 0 (repeated)

Score: Direct Mapped: 0 hits, 3 misses | 2-way: 1 hit, 2 misses | Fully: 1 hit, 2 misses

Access 4: Block Address 6

Score: Direct Mapped: 0 hits, 4 misses | 2-way: 1 hit, 3 misses | Fully: 1 hit, 3 misses

Access 5: Block Address 8 (repeated)

Score: Direct Mapped: 0 hits, 5 misses | 2-way: 1 hit, 4 misses | Fully: 2 hits, 3 misses

Final Score

Direct Mapped: 0 hits, 5 misses | 2-way set associative: 1 hit, 4 misses | Fully associative: 2 hits, 3 misses
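
A sketch that replays the trace against all three organizations (four one-word blocks each, LRU replacement where there is a choice) and reproduces the scores above:

from collections import OrderedDict

def simulate(trace, num_sets, ways):
    sets = [OrderedDict() for _ in range(num_sets)]  # each set kept in recency order
    hits = 0
    for block in trace:
        s = sets[block % num_sets]       # index = block address mod number of sets
        if block in s:
            hits += 1
            s.move_to_end(block)         # refresh LRU order on a hit
        else:
            if len(s) == ways:
                s.popitem(last=False)    # set full: evict the least recently used
            s[block] = True
    return hits, len(trace) - hits

trace = [0, 8, 0, 6, 8]
print("direct mapped:", simulate(trace, 4, 1))      # (0, 5)
print("2-way:", simulate(trace, 2, 2))              # (1, 4)
print("fully associative:", simulate(trace, 1, 4))  # (2, 3)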

16.9.4 Types of Misses

  1. Cold Misses: the first access to a block can never hit; unavoidable in any organization
  2. Conflict Misses: a block is evicted because another block maps to the same location and is accessed again later; higher associativity reduces these

16.9.5 Key Observations

The direct-mapped cache never hits, because blocks 0 and 8 conflict for the same entry and evict each other repeatedly. Adding associativity lets 0 and 8 coexist in one set, converting one conflict miss into a hit, while the fully associative cache eliminates conflict misses entirely, leaving only the three unavoidable cold misses.

16.10 Trade-Offs Summary

16.10.1 Hit Rate

Hit rate improves with associativity because conflict misses shrink, with diminishing returns as conflicts become rare.

16.10.2 Hit Latency

Hit latency grows with associativity: more tags must be compared, and the comparison result must steer a wider multiplexer before the data is available.

16.10.3 Power and Cost

Power and cost also grow with associativity, since every way of the selected set is read and compared in parallel on each access.

16.10.4 Design Decision Factors

The chosen design point depends on the application's access patterns, the target clock frequency, the power budget, and the acceptable silicon cost.

Key Takeaways

  1. Write policies manage cache-memory consistency:
    • Write-through: Simple but generates heavy memory traffic
    • Write-back: More efficient but requires dirty bit tracking
  2. Write buffers improve write-through performance by decoupling cache and memory writes
  3. Cache performance depends on three factors: hit rate, hit latency, and miss penalty
  4. Associativity spectrum ranges from direct-mapped (1-way) to fully associative (N-way)
  5. Higher associativity reduces conflict misses and improves hit rate but increases complexity
  6. Set-associative caches balance the trade-offs between direct-mapped and fully associative designs
  7. Replacement policies (LRU, PLRU, FIFO) determine which block to evict in associative caches
  8. Design decisions must balance performance, power consumption, cost, and complexity
  9. Real-world caches use different associativity levels based on application requirements
  10. Performance analysis shows that even small improvements in hit rate significantly reduce average access time

Summary

This lecture examines two critical aspects of cache design: write policies and associativity. Write-through and write-back policies each offer distinct trade-offs between simplicity and efficiency, with write buffers providing a middle ground that improves performance without excessive complexity. The exploration of associative cache organizations reveals how different levels of associativity—from direct-mapped through set-associative to fully-associative—affect hit rates, access latency, and hardware complexity. Through detailed performance analysis and practical examples, we discover that while higher associativity generally improves hit rates by reducing conflict misses, it comes at the cost of increased hit latency, power consumption, and implementation complexity. Modern cache systems carefully balance these competing factors, with set-associative designs emerging as an effective compromise that captures most of the benefits of full associativity while maintaining reasonable complexity. Understanding these design trade-offs is essential for optimizing memory hierarchy performance in real-world computer systems.