Lecture 2: Technology Trends

Lectures on Computer Architecture

By Dr. Isuru Nawinne

2.1 Introduction

The evolution of computer technology over the past 50 years has been nothing short of revolutionary. From room-sized scientific calculators to powerful smartphones in our pockets, this transformation has been guided by a prediction made by Intel co-founder Gordon Moore. This lecture examines the technological trends that enabled this revolution, the physical limitations that eventually constrained traditional scaling approaches, and the architectural innovations that emerged in response.

We will trace the exponential growth in transistor density, explore how smaller feature sizes enabled both more complex circuits and faster operation, understand why clock frequencies stopped increasing around 2004, and see how the industry pivoted to multi-core architectures. Finally, we'll examine how computer systems are organized into three layers (hardware, system software, and application software) and follow the complete translation process from high-level code to binary execution.

2.2 Moore's Law - Foundation of Computer Technology Evolution

2.2.1 Who Was Gordon Moore?

Background and Influence:

Intel's Dominance:

2.2.2 Moore's Law Definition

The Prediction:

Moore's Law is NOT a physical law like the law of gravity. It is an observation and prediction about technology trends:

"The number of transistors that can be placed on a standard computer chip will double every two years."

Practical Interpretation:

Historical Context:

2.2.3 Impact of Moore's Law

Computer Evolution Enabled:

Computers transformed from room-sized scientific calculators to:

Revolutionary Applications:

Moore's Law made computationally intensive applications possible:

  1. Human Genome Decoding:
    • Massive computational requirements
    • Processing billions of genetic sequences
    • Pattern recognition across enormous datasets
  2. World Wide Web and Internet Search:
    • Millisecond response times for complex queries
    • Indexing billions of web pages
    • Real-time information retrieval
  3. Artificial Intelligence and Machine Learning:
    • Neural networks with billions of parameters
    • Real-time image and speech recognition
    • Autonomous systems and decision-making
  4. Complex Simulations and Scientific Computing:
    • Weather prediction and climate modeling
    • Molecular dynamics simulations
    • Astrophysical calculations

Societal Impact:

2.3 Technology Scaling - Historical Data

2.3.1 Transistor Count Growth (1970-2010)

Chart Analysis:

The historical data shows remarkable consistency with Moore's prediction:

Real Intel Processor Models:

Tracking actual transistor counts across processor generations:

Significance:

2.3.2 The x86 Architecture

Origin and Naming:

x86 Architecture Family:

The architecture evolved through multiple generations while maintaining compatibility:

AMD's Adoption:

Evolution Strategy:

2.3.3 Historical Context

Early Computing Era (1985-1990):

Transformation Period (Mid-Late 1990s):

User Experience Revolution:

2.4 Feature Size Scaling - Lithography Improvements

2.4.1 What Made Transistor Count Increase Possible?

The Answer: Smaller Transistors

The exponential growth in transistor count was enabled primarily by reducing transistor size through improved manufacturing processes.

Lithography Process:

Feature Size Timeline:

The relentless march toward smaller dimensions:

2.4.2 What is "Feature Size"?

Original Definition:

Modern Reality:

Alternative Names:

Different terms referring to approximately the same concept:

Why Ambiguity Developed:

2.4.3 How Tiny Are Transistors?

Mind-Boggling Scale:

Putting modern transistor sizes in perspective:

Manufacturing Precision:

2.4.4 Transistor Structure

Basic Components:

Feature Size Definition:

Electrical Properties:

2.5 Technology Roadmaps - ITRS Predictions

2.5.1 ITRS Organization

International Technology Roadmap for Semiconductors:

Prediction Basis:

The roadmaps considered multiple factors:

Regular Updates:

2.5.2 Original Roadmap (2001)

Optimistic Projections:

The initial roadmap predicted steady exponential decrease in feature size:

Assumptions:

2.5.3 Revised Roadmap (2013)

Adjusted Expectations:

By 2013, reality required revised predictions:

Key Observations:

2.5.4 Final Roadmap (2015)

Dramatic Shift in Direction:

The 2015 roadmap marked a fundamental change:

Significance:

2.5.5 Why the Change? - 3D Technology

Major Paradigm Shift (2013-2015):

The industry pivoted to a fundamentally different approach:

Traditional Approach (Before):

New Approach (After):

Impact on Moore's Law:

Technical Innovations:

2.5.6 Dissolution of ITRS (2015)

Reasons for Dissolution:

Multiple Methods for Transistor Density:

Modern approaches include:

Moore's Law Status:

2.6 Why Smaller Transistors Improve Performance

2.6.1 Reason 1: More Complex Circuits

Increased Transistor Budget:

Having more transistors available on a chip enables more sophisticated functionality.

Comparison Example:

Limited Transistor Count (100 transistors):

Abundant Transistors (1 billion):

Architectural Implications:

2.6.2 Reason 2: Faster Switching

Electrical Advantages of Smaller Size:

Smaller transistors possess superior electrical characteristics:

Lower Operating Voltage:

Reduced Impedance:

Faster State Changes:

Overall Impact:

Physical Explanation:

The relationship between size and speed involves:

2.7 Clock Rate Trends - The Power Wall

2.7.1 Clock Rate Increases (1982-2004)

Exponential Growth Era:

Processor clock frequencies increased dramatically for over two decades:

Historical Progression:

Growth Rate:

2.7.2 The Turning Point (2004-2007)

Sudden Deceleration:

Around 2004, the decades-long trend dramatically changed:

The Paradox:

Industry Recognition:

2.7.3 The Power Wall Problem

Power Consumption Growth Crisis:

As clock rates increased, power consumption grew unsustainably:

Pentium 4 Prescott Example:

The Thermal Crisis:

Physical reality of heat generation:

1. Heat Generation Mechanism:

2. Heat Dissipation Challenge:

The 100-Watt Rule of Thumb:

Industry consensus emerged:

Attempted Solutions (All Insufficient):

Various cooling methods were tried:

None Sufficient:

2.7.4 Dynamic Power Equation

The Physics of Power Consumption:

Dynamic power consumption follows this relationship:

$$ \text{Power} = \text{Capacitance Load} \times \text{Voltage}^2 \times \text{Frequency} $$

Factor Analysis (1982-2004):

Capacitance Load:

Voltage Reduction:

Frequency Increase:

Net Effect Calculation:

$$ \begin{align*} \text{Power Scaling} &= (\text{Capacitance}) \times (\text{Voltage}^2) \times (\text{Frequency}) \\ &= (1\times) \times (\frac{1}{5})^2 \times (300\times) \\ &= (1\times) \times (\frac{1}{25}) \times (300\times) \\ &= 12\times \text{ power increase} \end{align*} $$

Key Insight:

Why Voltage Couldn't Scale Further:

2.7.5 Overclocking Phenomenon

Marketing and User Community Response:

Overclocking emerged prominently in the early 2000s during the MHz wars:

Manufacturer Approach:

User Overclocking:

Users could manually increase clock speed beyond rated specification:

Process:

Risks:

Target Audience:

Industry Impact:

2.8 Shift to Multi-Core Processors

2.8.1 The Challenge

The Industry Dilemma:

By the mid-2000s, the semiconductor industry faced a paradox:

Available Resources:

Constraints:

Critical Questions:

2.8.2 Solution: Multiple Processor Cores

Paradigm Shift (2004-2008):

Industry pivoted from single-core to multi-core architectures:

Core Concept:

Instead of one ever-faster processor, place multiple complete processor cores on the same chip:

Early Multi-Core Processors:

AMD Barcelona (2007):

Intel Core Series:

IBM Processors:

Extreme Designs:

Power Management:

2.8.3 The Plan

Initial Industry Vision:

Following Moore's Law principle for core counts:

Projected Growth:

Timeline Projection:

Theoretical Benefits:

Reality Check:

2.8.4 Why Multi-Core Growth Slowed

The Fundamental Problem: Parallel Programming Difficulty

Software Challenge:

Multi-core processors require a fundamentally different programming approach:

Sequential Programming (Traditional):

Parallel Programming (Required for Multi-Core):

Available Parallel Programming Techniques:

Multi-Threading:

Multiple Processes:

Communication Mechanisms:

Language Support:

Inherent Difficulties:

1. Parallel Programming is HARD:

2. Requires Deep Understanding:

Key Technical Challenges:

Load Balancing:

Communication Optimization:

Synchronization:

Performance Consequences:

If parallel programming is not done well:

2.8.5 Instruction-Level Parallelism vs Multi-Core Parallelism

Instruction-Level Parallelism (ILP):

Characteristics:

Techniques:

Benefits:

Multi-Core Parallelism:

Characteristics:

Programmer Must:

Contrast:

Aspect           ILP              Multi-Core
Who does work    Hardware         Programmer
Transparency     Invisible        Explicit
Difficulty       Automatic        Hard
Applicability    All code         Limited patterns
Overhead         Hidden           Must manage

2.8.6 Impact on Software Development

For Regular Programmers:

For Computer Engineers:

Application Domains:

High-Performance Applications Requiring Parallelism:

Applications That Remain Sequential:

Education Impact:

2.9 Computer System Organization - Three Layers

2.9.1 Hardware Layer (Bottom)

Physical Components:

Processor (CPU):

Microarchitecture:

Memory Hierarchy:

Input/Output Controllers:

Secondary Storage Interfaces:

Purpose:

2.9.2 System Software Layer (Middle)

Tool Chain Components:

Compiler:

Assembler:

Linker:

Purpose of Tool Chain:

Operating System:

Core Responsibilities:

Resource Management:

Memory Management:

Storage Management:

Input/Output Handling:

Task Scheduling:

Resource Sharing:

Why Operating System Needed:

Trust and Security:

Coordination and Protection:

Programmer Benefits:

Abstractions Provided:

Programmers don't need to worry about:

OS Guarantees:

Example Services:

2.9.3 Application Software Layer (Top)

User-Level Programs:

High-Level Programming Languages:

Popular Languages:

Language Characteristics:

Hundreds/Thousands Available:

Domain Optimization:

Level of Abstraction:

2.10 From High-Level Code to Machine Code - The Translation Process

2.10.1 Example: Swap Function in C

Source Code:

void swap(int v[], int k) {
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}

Function Purpose:

Algorithm:

  1. Store v[k] in temporary variable
  2. Copy v[k+1] to v[k]
  3. Copy temporary to v[k+1]

2.10.2 After Compilation - MIPS Assembly Code

Assembly Translation:

The compiler generates 7 MIPS instructions to implement the swap function:

MUL  $2, $5, 4      # Multiply k by 4 (convert array index to byte offset)
ADD  $2, $4, $2     # Add base address to offset (address of v[k])
LW   $15, 0($2)     # Load v[k] into register $15 (temp = v[k])
LW   $16, 4($2)     # Load v[k+1] into register $16
SW   $16, 0($2)     # Store v[k+1] to v[k]
SW   $15, 4($2)     # Store temp to v[k+1]
JR   $31            # Return to caller

Translation Analysis:

Key Operations Explained:

1. Address Calculation:

2. Memory Addressing:

3. Register Usage:

Instruction Set Details:

2.10.3 After Assembly - Machine Code

Binary Representation:

Each assembly instruction translates to a 32-bit binary instruction:

00000000101000100001000000011000  # MUL $2, $5, 4
00000000100000100001000000100001  # ADD $2, $4, $2
10001100010011110000000000000000  # LW  $15, 0($2)
10001100010100000000000000000100  # LW  $16, 4($2)
10101100010100000000000000000000  # SW  $16, 0($2)
10101100010011110000000000000100  # SW  $15, 4($2)
00000011111000000000000000001000  # JR  $31

One-to-One Mapping:

Instruction Format:

Different instruction types have different bit field layouts:

R-Type (Register) Format:

[Opcode 6 bits][Rs 5 bits][Rt 5 bits][Rd 5 bits][Shamt 5 bits][Funct 6 bits]

I-Type (Immediate) Format:

[Opcode 6 bits][Rs 5 bits][Rt 5 bits][Immediate 16 bits]

Instruction Components Specify:

Example Analysis:

In the immediate value 4:

Binary Image:

2.11 Program Execution - Inside the CPU

2.11.1 Block Diagram of Computer

System Components:

Compiler/Tool Chain:

Memory:

CPU (Central Processing Unit):

Input/Output:

Program Execution Flow:

1. Compile Stage:

2. Store Stage:

3. Load Stage:

4. Execute Stage:

5. Results Stage:

2.11.2 Inside the CPU - Two Main Components

Datapath:

Structure:

Components:

Function:

Examples of Functional Units:

Control:

Structure:

Function:

Responsibilities:

Interaction:

2.11.3 Execution Process (Conveyor Belt Analogy)

Instruction Execution Cycle:

1. Fetch:

2. Decode:

3. Execute:

4. Memory:

5. Writeback:

6. Repeat:

Conveyor Belt Metaphor:

2.11.4 Cache Memory

Purpose and Motivation:

The Performance Gap:

Cache Solution:

Cache Hierarchy:

Level 1 Cache (L1):

Level 2 Cache (L2):

Level 3 Cache (L3):

Performance Impact:

Will Learn in Lecture:

2.12 Real CPU Layout - AMD Barcelona Example

2.12.1 Overview

AMD Barcelona Processor:

Die Photo Analysis:

2.12.2 Four Processor Cores

Core Distribution:

Physical layout shows clear quadrant organization:

Layout Strategy:

2.12.3 Inside Each Core

Floating-Point Unit (FPU):

Characteristics:

Operations:

Why So Large:

Load-Store Unit:

Function:

Operations:

Integer Execution Unit:

Characteristics:

Operations:

Why Smaller:

Fetch and Decode Unit:

Responsibilities:

Instruction Fetch:

Instruction Decode:

Pipeline Frontend:

Level 1 Data Cache (L1 D-Cache):

Characteristics:

Typical Specifications:

Level 1 Instruction Cache (L1 I-Cache):

Characteristics:

Benefits of Separation:

Level 2 Unified Cache (L2 Cache):

Characteristics:

Architecture:

2.12.4 Shared Components

North Bridge (Central Hub):

Location:

Functions:

Critical Role:

DDR PHY (Physical Controller):

DDR Memory:

PHY (Physical Layer):

Responsibilities:

HyperTransport Controllers:

HyperTransport Technology:

Connections:

Benefits:

2.12.5 Additional Information

WikiChip Database: https://en.wikichip.org

Comprehensive Processor Information:

Major Manufacturers Covered:

Available Information:

Visual Content:

Technical Specifications:

Advanced Topics:

Current Technology Landscape (2021):

Mainstream Manufacturing:

Future Direction:

Important Clarification:

Key Takeaways

  1. Moore's Law predicted transistor doubling every 2 years - remarkably accurate for over 40 years, guiding semiconductor industry planning and investment
  2. Smaller transistors enabled by improved lithography - progression from 90nm → 45nm → 22nm → 7nm → 5nm through advancing manufacturing processes
  3. Feature size now marketing term rather than physical measurement - modern processes use 3D structures making simple linear dimensions misleading
  4. Smaller transistors provide dual benefits - enable more complex circuits (more transistors available) and faster switching (lower voltage, reduced impedance)
  5. Clock rate increased exponentially until ~2004 - grew from 12.5 MHz (1982) to 3.6 GHz (2004), then hit fundamental thermal limitations
  6. Power wall halted frequency scaling - heat generation (P = CV²f) exceeded cooling capability, establishing ~100W practical limit for consumer processors
  7. Dynamic power equation explains the crisis - despite 25× power reduction from voltage scaling, 300× frequency increase overwhelmed the benefit
  8. Overclocking emerged as risky performance technique - users could exceed rated speeds at risk of destroying processors, popular among gaming enthusiasts
  9. Industry pivoted to multi-core processors - solution to utilize Moore's Law transistors without exceeding power limits, starting ~2004-2008
  10. Multi-core growth slowed due to programming difficulty - initial projection of hundreds of cores didn't materialize; parallel programming remains challenging
  11. Parallel programming requires explicit management - unlike automatic instruction-level parallelism, multi-core requires programmers to handle threads, synchronization, communication
  12. Three major parallel programming challenges - load balancing across cores, minimizing communication overhead, optimizing synchronization
  13. 3D chip technology changed scaling paradigm (2013-2015) - industry shifted from pure 2D shrinking to vertical stacking of transistor layers
  14. ITRS dissolved in 2015 - technology roadmap organization ended as multiple paths to density replaced simple feature size scaling
  15. Computer systems organized in three layers - hardware (physical components), system software (OS, compilers, tools), application software (user programs)
  16. System software provides abstraction and protection - OS prevents malicious programs from damaging hardware, hides complexity from application programmers
  17. Program translation is multi-stage process - high-level language → assembly language → machine code through compiler, assembler, linker
  18. CPU contains datapath and control - datapath performs computation by routing data through functional units; control coordinates execution and generates signals
  19. Cache memory critical for performance - fast on-chip memory (L1, L2, L3) stores frequently accessed data/instructions, hiding main memory latency
  20. Real CPUs have complex layouts - die photos reveal intricate organization with multiple cores, cache hierarchies, shared interconnects, memory controllers

Summary

This lecture provides a comprehensive examination of computer technology evolution from the 1970s to present day. Moore's Law, predicting transistor count doubling every two years, serves as the guiding principle for the semiconductor industry and enables the transformation of computers from room-sized machines to powerful pocket devices.

The progression of manufacturing technology steadily reduced feature sizes from 90 nanometers to current 7nm and 5nm processes. Smaller transistors provided two key advantages: more transistors per chip enabling complex functionality, and faster switching speeds enabling higher clock frequencies. Clock rates grew exponentially from 12.5 MHz in 1982 to 3.6 GHz in 2004.

However, around 2004, the industry encountered the power wall - a fundamental thermal limitation. The dynamic power equation (P = CV²f) revealed that despite aggressive voltage scaling, the massive frequency increases caused power consumption and heat generation to exceed cooling capabilities. The ~100-watt limit for consumer processors could not be overcome by improved cooling solutions.

The solution was multi-core processors: placing multiple complete CPU cores on a single chip. This allowed continued performance improvement within power constraints by exploiting thread-level parallelism. However, the initial vision of exponentially growing core counts didn't materialize due to the difficulty of parallel programming. Unlike automatic instruction-level parallelism, multi-core requires programmers to explicitly manage threads, balance loads, minimize communication, and handle synchronization - a significantly more challenging paradigm.

Around 2013-2015, the industry made another major shift to 3D chip technology. Instead of only shrinking transistors in two dimensions, manufacturers began stacking transistor layers vertically using FinFET and similar technologies. This represented such a fundamental change that the International Technology Roadmap for Semiconductors (ITRS) dissolved in 2015, as simple feature-size predictions no longer captured the diverse approaches to increasing transistor density.

The lecture concluded by examining computer system organization across three layers: hardware (processor, memory, I/O), system software (compilers, assemblers, operating system), and application software (programs written in high-level languages). We traced the complete journey from high-level code through compilation and assembly to binary machine code, and explored how programs execute through the interaction of control and datapath components within the CPU. Cache memory's critical role in hiding main memory latency was emphasized, and real-world processor layouts illustrated the complex organization of modern multi-core chips.

Understanding these technology trends and architectural responses provides essential context for studying computer architecture and explains why processors are organized as they are today.