Lecture 9: Microarchitecture and Datapath

Lectures on Computer Architecture

By Dr. Isuru Nawinne

9.1 Introduction

This lecture transitions from instruction set architecture (ISA) to microarchitecture—the hardware implementation of the ISA. We explore how to build a processor that executes MIPS instructions, covering instruction formats, digital logic fundamentals, datapath construction, and single-cycle processor design. Understanding microarchitecture reveals how software instructions translate to hardware operations and provides the foundation for studying advanced processor designs including pipelining and superscalar execution.

9.2 MIPS ISA

9.2.1 Transition to Hardware Implementation

Previous Focus: ARM ISA

Current Focus: MIPS Microarchitecture

Why MIPS for Hardware Study?

9.2.2 MIPS Instruction Categories

Three Instruction Types (based on encoding)

I-Type (Immediate)

R-Type (Register)

J-Type (Jump)

Contrast with ARM

9.2.3 MIPS Instruction Encoding

Fixed 32-Bit Length

R-Type Format


[Opcode][RS][RT][RD][SHAMT][Funct]
6 bits  5   5   5    5      6 bits

Fields:

I-Type Format


[Opcode][RS][RT][Immediate]
6 bits  5   5   16 bits

Fields:

J-Type Format


[Opcode][Address]
6 bits  26 bits

Fields:
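As a rough illustration of the three layouts above, the sketch below packs fields into a 32-bit word with shifts and ORs. The helper names (`encode_r`, `encode_i`, `encode_j`) are hypothetical and not part of any toolchain; the field widths come from the format diagrams, and the register numbers ($t0 = 8, $t1 = 9, $t2 = 10) follow the usual MIPS register convention.

```python
def encode_r(rs, rt, rd, shamt, funct, opcode=0):
    """Pack an R-type word: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


def encode_i(opcode, rs, rt, imm):
    """Pack an I-type word: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(opcode, address):
    """Pack a J-type word: opcode(6) address(26)."""
    return (opcode << 26) | (address & 0x3FFFFFF)


# add $t0, $t1, $t2  ->  rd = 8, rs = 9, rt = 10, funct = 0x20
add_word = encode_r(rs=9, rt=10, rd=8, shamt=0, funct=0x20)
# lw $t1, 8($t0)     ->  opcode = 0x23, rs = 8, rt = 9, imm = 8
lw_word = encode_i(opcode=0x23, rs=8, rt=9, imm=8)
print(f"{add_word:#010x} {lw_word:#010x}")
```

Every field sits at a fixed bit position, which is what lets the decode hardware slice the word with plain wiring.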

9.3 Digital Logic Review

9.3.1 Information Encoding

Binary Representation

Multi-Bit Signals

9.3.2 Combinational Elements

Definition

Examples

Characteristics

9.3.3 Sequential Elements (State Elements)

Definition

Examples

Characteristics

9.3.4 Clocking and Timing

Clock Signal

Edge-Triggered

Clock Period and Frequency

Example:


T = 250 ps = 0.25 ns
f = 1/(250 × 10^-12) = 4 GHz

9.3.5 Register Operations

Basic Register

Timing Example

Register Timing Diagram

Register with Write Control

Timing Example

Register with Write Enable Timing Diagram
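A minimal behavioral sketch of a register with write control, assuming each call to `clock_edge` stands for one active clock edge. The class and method names are hypothetical; this models the behavior only, not gate-level timing.

```python
class Register:
    """Edge-triggered register with a write-enable input (behavioral model)."""

    def __init__(self, value=0):
        self.q = value  # currently stored output

    def clock_edge(self, d, write_enable=True):
        """Call once per active clock edge; capture d only when enabled."""
        if write_enable:
            self.q = d
        return self.q


reg = Register()
reg.clock_edge(d=7, write_enable=False)  # output keeps its old value
reg.clock_edge(d=7, write_enable=True)   # output takes the new value
reg.clock_edge(d=9, write_enable=False)  # input changes, output is held
print(reg.q)
```

Between edges the input may change freely; the stored value only updates on an edge where the write signal is asserted.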

9.3.6 Critical Path and Clock Period

Combinational Logic Delay

Clock Period Constraint

Must allow time for:

  1. Register output stabilization
  2. Combinational logic computation
  3. Result reaching next register input
  4. Setup time before next clock edge
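The four items above can be read as a sum. The sketch below lumps items 2 and 3 into a single logic delay; the function names and the delay values are made up for illustration and do not come from the lecture.

```python
def min_clock_period_ps(clk_to_q_ps, logic_delay_ps, setup_ps):
    """Smallest period that lets data leave one register, pass through the
    combinational logic, and settle before the next register's setup window."""
    return clk_to_q_ps + logic_delay_ps + setup_ps


def max_frequency_ghz(period_ps):
    """Convert a period in picoseconds to a frequency in GHz."""
    return 1e3 / period_ps


# Illustrative numbers only (assumed, not measured):
period = min_clock_period_ps(clk_to_q_ps=30, logic_delay_ps=200, setup_ps=20)
print(period, max_frequency_ghz(period))
```

If several logic paths exist between registers, the largest `logic_delay_ps` among them is the one that sets the period.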

Critical Path

Single-Cycle Constraint

9.4 CPU Execution Stages

CPU Execution Stages Overview

9.4.1 Instruction Fetch (IF)

Purpose: Retrieve next instruction from memory

Steps:

  1. Use Program Counter (PC) for instruction address
  2. Access Instruction Memory with PC
  3. Retrieve 32-bit instruction word
  4. Instruction now in CPU for processing
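The fetch steps above can be sketched as a lookup keyed by the PC. The instruction memory is modeled here as a dictionary from byte address to 32-bit word, and the starting address `0x00400000` is an arbitrary example value; both are assumptions for illustration.

```python
def fetch(instruction_memory, pc):
    """Return the instruction word at pc together with pc + 4."""
    if pc % 4 != 0:
        raise ValueError("PC must be word-aligned")
    instruction = instruction_memory[pc]  # dict keyed by byte address
    return instruction, pc + 4


imem = {
    0x00400000: 0x012A4020,  # add $t0, $t1, $t2
    0x00400004: 0x8D090008,  # lw  $t1, 8($t0)
}
instr, pc_plus_4 = fetch(imem, 0x00400000)
print(f"{instr:#010x} next={pc_plus_4:#010x}")
```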

Hardware:

9.4.2 Instruction Decode (ID)

Purpose: Interpret instruction and extract fields

Decode Operations:

  1. Examine Opcode (bits 26-31):
    If opcode = 0: R-type
    If opcode = 2 or 3: J-type
    Otherwise: I-type
    
  2. Extract Register Numbers:
    R-type: RS, RT, RD (three 5-bit fields)
    I-type: RS, RT (two 5-bit fields)
    J-type: No registers
    
  3. Extract Immediate/Address:
    I-type: 16-bit immediate
    J-type: 26-bit address
    

  4. Extract Function/Shift (R-type only):
    Funct: bits 0-5 (ALU operation)
    SHAMT: bits 6-10 (shift amount)
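The decode operations above amount to bit slicing. The following sketch applies the same opcode rules (0 for R-type, 2 or 3 for J-type, otherwise I-type); the function name `decode` and the dictionary layout are hypothetical.

```python
def decode(word):
    """Classify a 32-bit MIPS word and extract its fields."""
    opcode = (word >> 26) & 0x3F          # bits 26-31
    rs = (word >> 21) & 0x1F              # bits 21-25
    rt = (word >> 16) & 0x1F              # bits 16-20
    if opcode == 0:
        return {
            "type": "R", "rs": rs, "rt": rt,
            "rd": (word >> 11) & 0x1F,    # bits 11-15
            "shamt": (word >> 6) & 0x1F,  # bits 6-10
            "funct": word & 0x3F,         # bits 0-5
        }
    if opcode in (2, 3):
        return {"type": "J", "opcode": opcode, "address": word & 0x3FFFFFF}
    return {"type": "I", "opcode": opcode, "rs": rs, "rt": rt,
            "imm": word & 0xFFFF}


print(decode(0x012A4020))  # add $t0, $t1, $t2
print(decode(0x8D090008))  # lw  $t1, 8($t0)
```

In hardware all fields are extracted simultaneously; the instruction type only decides which of them are actually used.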

Control Unit Role:

9.4.3 Execute (EX)

Purpose: Perform operation or calculate address

Operations by Type:

Arithmetic/Logic (R-type, I-type arithmetic):

Memory Access (Load/Store):

Branch:

9.4.4 Memory Access (MEM)

Purpose: Read or write data memory

Applies To:

Load Operation:

  1. Use address from ALU
  2. Read data from memory
  3. Data will be written to register

Store Operation:

  1. Use address from ALU
  2. Get data from RT register
  3. Write data to memory

9.4.5 Register Write-Back (WB)

Purpose: Write result to destination register

Applies To:

Source Selection:

9.4.6 PC Update

Purpose: Determine next instruction address

Default: PC = PC + 4 (sequential)

Branch/Jump: PC = calculated target address

Control Flow:

9.5 R-Type Instruction Datapath

9.5.1 Register File

Structure:

Read Ports:

Write Port:

9.5.2 R-Type Execution Flow

Instruction: ADD $t0, $t1, $t2 ($t0 = $t1 + $t2)

Step 1: Register Read

Step 2: ALU Operation

Step 3: Write-Back
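The three steps can be traced with a small register-file model. This is a behavioral sketch under assumptions: the class name and method signatures are hypothetical, the initial register values are arbitrary, and the register numbers ($t0 = 8, $t1 = 9, $t2 = 10) follow the usual MIPS convention.

```python
class RegisterFile:
    """32 registers, two read ports, one write port (behavioral model)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs, rt):
        """Both read ports return their values together."""
        return self.regs[rs], self.regs[rt]

    def write(self, rd, value, reg_write):
        """Write only when RegWrite is asserted; $zero stays 0 in MIPS."""
        if reg_write and rd != 0:
            self.regs[rd] = value & 0xFFFFFFFF


rf = RegisterFile()
rf.regs[9], rf.regs[10] = 5, 7        # $t1 = 5, $t2 = 7 (arbitrary values)
a, b = rf.read(9, 10)                 # Step 1: register read
result = (a + b) & 0xFFFFFFFF         # Step 2: ALU operation (ADD)
rf.write(8, result, reg_write=True)   # Step 3: write-back to $t0
print(rf.regs[8])
```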

9.5.3 ALU Control

Function Field Encoding:

Funct   Operation   ALU Control
0x20    ADD         0010
0x22    SUB         0110
0x24    AND         0000
0x25    OR          0001
0x2A    SLT         0111
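The table maps directly to a lookup followed by an ALU that switches on the 4-bit control code. The sketch below is one possible behavioral model; the names `ALU_CONTROL`, `alu`, and `to_signed` are hypothetical, and SLT is assumed to compare its operands as signed two's-complement values.

```python
MASK = 0xFFFFFFFF

# funct field -> 4-bit ALU control code, copied from the table above
ALU_CONTROL = {0x20: 0b0010, 0x22: 0b0110, 0x24: 0b0000,
               0x25: 0b0001, 0x2A: 0b0111}


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Behavioral 32-bit ALU driven by the 4-bit control code."""
    if control == 0b0000:
        return a & b
    if control == 0b0001:
        return a | b
    if control == 0b0010:
        return (a + b) & MASK
    if control == 0b0110:
        return (a - b) & MASK
    if control == 0b0111:
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError(f"unsupported ALU control code: {control:04b}")


print(alu(ALU_CONTROL[0x20], 5, 7))   # ADD
print(alu(ALU_CONTROL[0x2A], 5, 7))   # SLT
```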

ALU Control Logic:

9.6 I-Type Instruction Datapath

9.6.1 Differences from R-Type

Operand Sources:

Register Usage:

9.6.2 Sign Extension

Problem: 16-bit immediate, 32-bit ALU

Process:

  1. Take 16-bit immediate
  2. Examine bit 15 (sign bit)
  3. Replicate sign bit to bits 16-31
  4. Result: 32-bit signed value

Examples:


16-bit: 0x0005 → 32-bit: 0x00000005 (+5)
16-bit: 0xFFFB → 32-bit: 0xFFFFFFFB (-5)

Hardware: Simple wire replication (fast)
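In software the same replication can be sketched with a mask and an OR. The function name `sign_extend16` is hypothetical; the result is returned as a 32-bit pattern, matching the two examples above.

```python
def sign_extend16(imm16):
    """Extend a 16-bit value to 32 bits by copying bit 15 into bits 16-31."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:               # sign bit set: fill upper half with 1s
        return imm16 | 0xFFFF0000
    return imm16                     # sign bit clear: upper half stays 0


print(f"{sign_extend16(0x0005):#010x}")
print(f"{sign_extend16(0xFFFB):#010x}")
```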

9.6.3 Multiplexer for ALU Input

ALU Input B Selection:

ALUSrc Signal:


ALUSrc = 0: Use register (R-type, branch)
ALUSrc = 1: Use sign-extended immediate (I-type arithmetic, load, store)
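A 2-to-1 multiplexer is a one-line selection. In this sketch `mux2` is a hypothetical helper, and the operand values are arbitrary placeholders for the register read data and the sign-extended immediate.

```python
def mux2(select, in0, in1):
    """2-to-1 multiplexer: output in0 when select is 0, in1 when it is 1."""
    return in1 if select else in0


read_data_2 = 0x00000014     # value read from the register file (arbitrary)
immediate_32 = 0x00000008    # sign-extended immediate (arbitrary)

alu_b_rtype = mux2(0, read_data_2, immediate_32)  # ALUSrc = 0
alu_b_itype = mux2(1, read_data_2, immediate_32)  # ALUSrc = 1
print(alu_b_rtype, alu_b_itype)
```

Both inputs are always present at the multiplexer; ALUSrc only decides which one reaches the ALU.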

9.7 Load/Store Instruction Datapath

9.7.1 Address Calculation

Formula: Address = Base + Offset

Components:

Examples:

LW $t1, 8($t0) # Load from $t0 + 8

SW $t2, -4($sp) # Store to $sp - 4

9.7.2 Load Word (LW)

Instruction Format:

Execution:

  1. Read RS (base address)
  2. Sign-extend immediate (offset)
  3. ALU adds: Address = RS + offset
  4. Read data from memory at address
  5. Write data to RT register

Critical Path: Longest in single-cycle design

9.7.3 Store Word (SW)

Instruction Format:

Execution:

  1. Read RS (base) and RT (data)
  2. ALU calculates address
  3. Write RT data to memory at address
  4. NO register write-back
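The load and store sequences can be compared side by side in a small behavioral sketch. Data memory is modeled as a dictionary keyed by byte address holding whole words, which is a simplification; the function names and the initial register and memory contents are assumptions for illustration.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm16):
    """Copy bit 15 of a 16-bit value into bits 16-31."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def load_word(regs, mem, rs, rt, imm16):
    """LW: address = regs[rs] + offset; the memory word is written to regs[rt]."""
    address = (regs[rs] + sign_extend16(imm16)) & MASK
    regs[rt] = mem.get(address, 0)
    return address


def store_word(regs, mem, rs, rt, imm16):
    """SW: address = regs[rs] + offset; regs[rt] is written to memory.
    No register is modified."""
    address = (regs[rs] + sign_extend16(imm16)) & MASK
    mem[address] = regs[rt]
    return address


regs = [0] * 32
mem = {0x1008: 42}
regs[8] = 0x1000                                   # $t0 (base)
load_word(regs, mem, rs=8, rt=9, imm16=8)          # LW $t1, 8($t0)

regs[29], regs[10] = 0x2000, 99                    # $sp, $t2
store_word(regs, mem, rs=29, rt=10, imm16=0xFFFC)  # SW $t2, -4($sp)
print(regs[9], mem[0x1FFC])
```

Both instructions share the address calculation; they differ in the direction of the data transfer and in whether the register file is written.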

Key Difference:

9.7.4 Data Memory

Interface:

Control Signals:

Multiplexer for Write-Back:

9.8 Branch Instruction Datapath

9.8.1 Branch Types

BEQ (Branch if Equal):

BNE (Branch if Not Equal):

9.8.2 Branch Target Calculation

Components

  1. PC + 4: Next sequential instruction address
  2. Offset: Sign-extended 16-bit immediate (count of instructions)
  3. Target: Computed branch destination

Target Address Formula


Target = (PC + 4) + (Offset × 4)

Explanation

Hardware Path Summary


Immediate (16 bits) → Sign Extend (32 bits) → Shift Left 2 → Add to (PC + 4) → Branch Target
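The hardware path can be followed step by step in a short sketch. The function names are hypothetical and the PC values in the example are arbitrary; arithmetic is wrapped to 32 bits to mimic the adder width.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm16):
    """Copy bit 15 of a 16-bit value into bits 16-31."""
    imm16 &= 0xFFFF
    return imm16 | 0xFFFF0000 if imm16 & 0x8000 else imm16


def branch_target(pc, imm16):
    """Target = (PC + 4) + (Offset x 4), truncated to 32 bits."""
    offset = sign_extend16(imm16)        # 16 -> 32 bits
    byte_offset = (offset << 2) & MASK   # shift left 2: instructions -> bytes
    return (pc + 4 + byte_offset) & MASK


print(f"{branch_target(0x00400000, 3):#010x}")       # forward by 3 instructions
print(f"{branch_target(0x00400010, 0xFFFF):#010x}")  # offset -1
```

An offset of -1 lands on the branch instruction itself, because the offset is counted from PC + 4 rather than from PC.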

Design Notes

9.8.3 Branch Execution

Step 1: Register Comparison

Step 2: Zero Flag Evaluation

Step 3: Target Calculation (parallel)

Step 4: PC Update Decision


BEQ: PCSrc = Branch AND Zero
BNE: PCSrc = Branch AND NOT(Zero)
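The two PCSrc equations and the PC multiplexer can be sketched as follows. The function names and the `is_bne` flag are hypothetical; how a particular control unit distinguishes BEQ from BNE is not specified here.

```python
def pc_src(branch, zero, is_bne=False):
    """PCSrc = Branch AND Zero for BEQ, Branch AND NOT(Zero) for BNE."""
    condition = (not zero) if is_bne else zero
    return bool(branch and condition)


def next_pc(pc_plus_4, target, select):
    """PC multiplexer: branch target when PCSrc is 1, else PC + 4."""
    return target if select else pc_plus_4


# BEQ with equal operands: the ALU subtraction gives 0, so Zero = 1
taken = pc_src(branch=True, zero=True)
print(hex(next_pc(0x00400004, 0x00400010, taken)))
```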

Multiplexer:

9.8.4 Sign Extension and Shifting

Sign Extension: Preserves signed offset

Shift Left 2: Wire routing trick

9.9 Complete Single-Cycle Datapath

Complete Single-Cycle CPU Control and Datapath

9.9.1 Integrated Components

Instruction Fetch:

Register File:

ALU:

Data Memory:

Sign Extender:

Branch Logic:

Multiplexers:

9.9.2 Control Signals

Generated by Control Unit:


1. RegDst: Register destination select
2. Branch: Branch instruction indicator
3. MemRead: Memory read enable
4. MemtoReg: Memory to register select
5. MemWrite: Memory write enable
6. ALUSrc: ALU source select
7. RegWrite: Register write enable
8. ALUOp: ALU operation type
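One way to picture the control unit is as a lookup from opcode to the eight signals. The values below follow the commonly used textbook table for a single-cycle MIPS datapath and should be treated as an assumption, since the lecture's own table is not reproduced here; `None` marks a don't-care, and the 2-bit ALUOp encoding (00 add, 01 subtract, 10 use funct) is likewise the textbook convention.

```python
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    "lw":     dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    "sw":     dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    "beq":    dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

# MIPS opcodes for the four instruction classes modeled here
OPCODES = {0x00: "R-type", 0x23: "lw", 0x2B: "sw", 0x04: "beq"}


def control_unit(opcode):
    """Return the control-signal settings for a 6-bit opcode."""
    return CONTROL[OPCODES[opcode]]


print(control_unit(0x23))
```

Only load word asserts both MemRead and RegWrite, and store word and branch never write a register.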

9.9.3 Parallel Operations

Key Insight: Hardware operates in PARALLEL

Example: R-type instruction

9.9.4 Critical Path Analysis

Path for Load Word (longest):


1. Instruction fetch:     200 ps
2. Register read:         150 ps
3. Sign extend:           50 ps
4. Multiplexer:           25 ps
5. ALU address calc:      200 ps
6. Data memory access:    200 ps
7. Multiplexer:           25 ps
8. Register write setup:  100 ps
Total:                    950 ps

Clock Period: Must be ≥ 950 ps

Max Frequency: 1/950 ps ≈ 1.05 GHz
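The arithmetic can be checked with a few lines. This sketch simply sums the delays exactly as listed above, treating them as one serial chain; the dictionary keys are descriptive labels chosen for this example.

```python
# Component delays for load word, in picoseconds, as listed above
LW_PATH_PS = {
    "instruction fetch": 200,
    "register read": 150,
    "sign extend": 50,
    "ALU input mux": 25,
    "ALU address calc": 200,
    "data memory access": 200,
    "write-back mux": 25,
    "register write setup": 100,
}

total_ps = sum(LW_PATH_PS.values())
max_freq_ghz = 1e3 / total_ps  # period in ps -> frequency in GHz

print(f"clock period >= {total_ps} ps, max frequency ~ {max_freq_ghz:.2f} GHz")
```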

Inefficiency:

9.9.5 Single-Cycle Disadvantages

Inefficiency:

Hardware Duplication:

No Parallelism:

Advantages:

Key Takeaways

  1. Microarchitecture is the hardware implementation of an ISA: it translates instruction semantics into hardware operations.
  2. MIPS uses three instruction types: R-type (registers), I-type (immediate), J-type (jump).
  3. Fixed 32-bit instructions simplify fetch/decode and enable efficient pipelining.
  4. Combinational elements produce outputs that depend only on their current inputs; sequential elements hold state.
  5. The clock period must be at least as long as the longest combinational path between sequential elements.
  6. Six execution stages: Fetch, Decode, Execute, Memory, Write-back, PC Update.
  7. Register file has three ports: two read (combinational), one write (clocked).
  8. Sign extension converts 16-bit immediate to 32-bit preserving signed value.
  9. Multiplexers select between data sources based on control signals.
  10. The ALU operation depends on the instruction: addition for load/store address calculation, subtraction for branch comparison, and the operation selected by the funct field for R-type.
  11. The critical path determines the clock period; load word is the longest path in the single-cycle design.
  12. A single-cycle processor completes one instruction per cycle, but inefficiently: every instruction takes as long as the slowest one.
  13. Separate instruction and data memories required for single-cycle (both accessed same cycle).
  14. Control signals orchestrate datapath - generated by control unit from opcode.
  15. All hardware operates in parallel - control signals select valid results, ignore others.

Summary

Microarchitecture bridges the gap between software instructions and hardware implementation, revealing how processors execute programs. Building a single-cycle MIPS processor requires understanding digital logic fundamentals, datapath component design, and control signal generation. While conceptually simple (one instruction per cycle), the single-cycle design is inefficient because all instructions must complete within the time required by the slowest instruction. The critical path—typically the load word instruction—determines the maximum clock frequency. Understanding this foundation prepares us for more sophisticated designs including multi-cycle processors (which break execution into multiple stages) and pipelined processors (which overlap instruction execution for higher throughput). These microarchitecture concepts apply broadly across processor design, from embedded systems to high-performance superscalar processors.