
Lecture 4: Introduction to ARM Assembly

Lectures on Computer Architecture


By Dr. Kisaru Liyanage

4.1 Introduction

This lecture introduces ARM assembly language programming, providing the foundation for understanding how high-level programs translate to machine code. We explore the ARM instruction set architecture (ISA), focusing on its RISC design philosophy, register organization, basic instruction formats, and the toolchain used for development. Understanding assembly language is essential for comprehending how processors execute programs and for optimizing performance-critical code.

4.2 ARM Architecture Overview

4.2.1 RISC Philosophy

Reduced Instruction Set Computer (RISC)

Contrasted with CISC (Complex Instruction Set Computer)

Feature | RISC | CISC
Instruction Format | Simple, uniform format | Variable-length instructions
Instruction Complexity | Simple instructions, more instructions per program | Complex operations
Memory Access | Load/store architecture (only LOAD/STORE access memory) | Memory operands in arithmetic operations
Registers | Large number of general-purpose registers | Fewer registers
Hardware Design | Hardware simplicity for higher clock rates | More complex hardware
Pipelining | Regular structure enables efficient pipelining | More difficult to pipeline

ARM Design Principles

4.2.2 ARM Registers

General-Purpose Registers

Register Usage Conventions

ARM Register Usage Conventions


R0-R3:   Argument/result registers
- Pass parameters to functions
- Return values from functions
- Scratch registers (not preserved)

R4-R11:  Local variable registers
- Must be preserved across function calls
- Callee saves/restores if used

R12:     Intra-procedure-call scratch register
- Can be corrupted by function calls
- Not preserved

R13 (SP): Stack Pointer
- Points to top of stack
- Must always be valid

R14 (LR): Link Register
- Stores return address on function call
- Contains address to return to

R15 (PC): Program Counter
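- Tracks the instruction stream; in ARM state, reading PC yields the address of the current instruction plus 8 (two instructions ahead)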
- Modifying PC changes execution flow

Why So Many Registers?

4.2.3 Memory Organization

Little-Endian Byte Ordering

Value:    0x12345678 (32-bit word)
Address:  [base+0] [base+1] [base+2] [base+3]
Content:     78       56       34       12
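
The byte layout above can be reproduced in C. This is a minimal sketch, assuming the stored word is 0x12345678 (which matches the byte sequence shown); the function name store_word_le is hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Store a 32-bit word in little-endian order:
   the least significant byte goes to the lowest address. */
void store_word_le(uint8_t *base, uint32_t value)
{
    base[0] = (uint8_t)(value & 0xFF);         /* bits 7..0   -> base+0 */
    base[1] = (uint8_t)((value >> 8) & 0xFF);  /* bits 15..8  -> base+1 */
    base[2] = (uint8_t)((value >> 16) & 0xFF); /* bits 23..16 -> base+2 */
    base[3] = (uint8_t)((value >> 24) & 0xFF); /* bits 31..24 -> base+3 */
}
```

Calling store_word_le(mem, 0x12345678) on a 4-byte buffer should leave the bytes 78 56 34 12 at increasing addresses.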

Word Alignment

Address Space

4.3 ARM Instruction Format

4.3.1 Instruction Structure

Fixed 32-Bit Length

Typical Instruction Fields

[Condition][Opcode][Operands]
4 bits    varies   varies

Example: ADD Instruction

ADD R1, R2, R3    ; R1 = R2 + R3

Encoding includes:
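
As a rough illustration, the sketch below extracts the main fields from a 32-bit instruction word. It assumes the classic A32 data-processing layout with a register operand (condition in bits 31..28, opcode in bits 24..21, Rn in bits 19..16, Rd in bits 15..12, Rm in bits 3..0). The struct dp_fields and the function decode_dp are hypothetical names, not part of any toolchain.

```c
#include <stdint.h>

/* Selected fields of an A32 data-processing instruction (register operand form). */
struct dp_fields {
    unsigned cond;   /* bits 31..28: condition code (0xE = always) */
    unsigned opcode; /* bits 24..21: operation (0x4 = ADD)         */
    unsigned rn;     /* bits 19..16: first source register         */
    unsigned rd;     /* bits 15..12: destination register          */
    unsigned rm;     /* bits 3..0:   second source register        */
};

/* Pull each field out with a shift and a 4-bit mask. */
struct dp_fields decode_dp(uint32_t insn)
{
    struct dp_fields f;
    f.cond   = (insn >> 28) & 0xF;
    f.opcode = (insn >> 21) & 0xF;
    f.rn     = (insn >> 16) & 0xF;
    f.rd     = (insn >> 12) & 0xF;
    f.rm     = insn & 0xF;
    return f;
}
```

Under this layout, the word 0xE0821003 should decode to condition 0xE (always), opcode 0x4 (ADD), Rn = 2, Rd = 1, Rm = 3, that is ADD R1, R2, R3.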

4.3.2 Instruction Types

Data Processing Instructions

Data Transfer Instructions

Control Flow Instructions

4.3.3 Operand Types

Register Operands

ADD R0, R1, R2    ; R0 = R1 + R2 (all registers)

Immediate Operands

ADD R0, R1, #5    ; R0 = R1 + 5 (# indicates immediate)
MOV R2, #100      ; R2 = 100

Immediate Value Constraints
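
Because the whole instruction must fit in 32 bits, an immediate operand cannot be an arbitrary 32-bit value. The sketch below assumes the classic A32 data-processing rule: an immediate is encodable if it equals an 8-bit constant rotated right by an even number of bits (0 to 30). The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 <= n < 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* True if value == (8-bit constant) rotated right by an even amount. */
bool arm_imm_encodable(uint32_t value)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by rot; the result must fit in 8 bits. */
        if (rotl32(value, rot) <= 0xFFu)
            return true;
    }
    return false;
}
```

Under this rule, values such as 0xFF, 0x100, and 0xFF000000 are encodable, while 0x101 and 0x12345678 are not, which is why the LDR Rd, =value pseudo-instruction exists for large constants.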

Shifted Register Operands

ADD R0, R1, R2, LSL #2    ; R0 = R1 + (R2 << 2)
SUB R3, R4, R5, LSR #1    ; R3 = R4 - (R5 >> 1)

4.4 Basic ARM Instructions

4.4.1 Arithmetic Instructions

Addition

ADD Rd, Rn, Rm       ; Rd = Rn + Rm
ADD Rd, Rn, #imm     ; Rd = Rn + immediate

Examples:

ADD R0, R1, R2       ; R0 = R1 + R2
ADD R3, R3, #1       ; R3 = R3 + 1 (increment)

Subtraction

SUB Rd, Rn, Rm       ; Rd = Rn - Rm
SUB Rd, Rn, #imm     ; Rd = Rn - immediate
RSB Rd, Rn, #imm     ; Rd = immediate - Rn (reverse subtract)

Examples:

SUB R0, R1, R2       ; R0 = R1 - R2
SUB R4, R4, #10      ; R4 = R4 - 10 (decrement)
RSB R5, R6, #0       ; R5 = 0 - R6 (negate)

Multiplication (covered in later tutorials)

MUL Rd, Rn, Rm       ; Rd = Rn × Rm (lower 32 bits)

4.4.2 Logical Instructions

AND Operation

AND Rd, Rn, Rm       ; Rd = Rn AND Rm
AND Rd, Rn, #imm     ; Rd = Rn AND immediate

Usage: Bit masking, clearing specific bits

Example:

AND R0, R0, #0xFF    ; Keep only lower 8 bits

OR Operation

ORR Rd, Rn, Rm       ; Rd = Rn OR Rm (ORR in ARM)
ORR Rd, Rn, #imm     ; Rd = Rn OR immediate

Usage: Setting specific bits

Example:

ORR R1, R1, #0x80    ; Set bit 7

Exclusive OR

EOR Rd, Rn, Rm       ; Rd = Rn XOR Rm
EOR Rd, Rn, #imm     ; Rd = Rn XOR immediate

Usage: Toggling bits, fast comparison

Example:

EOR R2, R2, R2       ; R2 = 0 (XOR with itself)
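
The masking, setting, and toggling uses above map directly onto C's bitwise operators. The following is a small sketch; the function names are hypothetical, and each comment names the ARM instruction being modelled.

```c
#include <stdint.h>

/* AND R0, R0, #0xFF : keep only the lower 8 bits */
uint32_t keep_low_byte(uint32_t r0) { return r0 & 0xFFu; }

/* ORR R1, R1, #0x80 : set bit 7, leave other bits unchanged */
uint32_t set_bit7(uint32_t r1) { return r1 | 0x80u; }

/* EOR R2, R2, #0x80 : toggle bit 7 */
uint32_t toggle_bit7(uint32_t r2) { return r2 ^ 0x80u; }

/* EOR R2, R2, R2 : any value XORed with itself is 0 */
uint32_t clear_by_xor(uint32_t r2) { return r2 ^ r2; }
```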

Move and Move Not

MOV Rd, Rm           ; Rd = Rm
MOV Rd, #imm         ; Rd = immediate
MVN Rd, Rm           ; Rd = NOT Rm (bitwise complement)

Examples:

MOV R0, R1           ; Copy R1 to R0
MOV R2, #0           ; Clear R2
MVN R3, R4           ; R3 = ~R4 (invert all bits)

4.4.3 Shift Operations

Logical Shift Left (LSL)

LSL Rd, Rn, #shift   ; Rd = Rn << shift
MOV Rd, Rn, LSL #shift

Effect: Multiplies by 2^shift

Example:

LSL R0, R1, #2       ; R0 = R1 × 4

Logical Shift Right (LSR)

LSR Rd, Rn, #shift   ; Rd = Rn >> shift (unsigned)
MOV Rd, Rn, LSR #shift

Effect: Divides by 2^shift (unsigned)

Example:

LSR R0, R1, #3       ; R0 = R1 / 8

Arithmetic Shift Right (ASR)

ASR Rd, Rn, #shift   ; Rd = Rn >> shift (signed)

Effect: Divides by 2^shift and preserves the sign; negative results round toward negative infinity (C's / operator rounds toward zero)

Example:

ASR R0, R1, #2       ; R0 = R1 / 4 (signed)

Rotate Right (ROR)

ROR Rd, Rn, #shift   ; Rotate Rn right by shift

Effect: Bits rotated off right end reappear at left

Example:

ROR R0, R1, #8       ; Rotate R1 right by 8 bits
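
The four shift operations can be modelled in C on 32-bit unsigned values. This is a sketch under simplifying assumptions: shift amounts are restricted to 0..31, and ASR is built by hand because right-shifting a negative signed integer is implementation-defined in C. The function names mirror the mnemonics but are otherwise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LSL: zeros enter from the right. */
uint32_t lsl(uint32_t x, unsigned n) { assert(n < 32); return x << n; }

/* LSR: zeros enter from the left. */
uint32_t lsr(uint32_t x, unsigned n) { assert(n < 32); return x >> n; }

/* ASR: copies of the sign bit (bit 31) enter from the left. */
uint32_t asr(uint32_t x, unsigned n)
{
    assert(n < 32);
    uint32_t shifted = x >> n;
    if ((x & 0x80000000u) && n > 0)
        shifted |= ~(0xFFFFFFFFu >> n);  /* set the top n bits */
    return shifted;
}

/* ROR: bits leaving the right end re-enter at the left. */
uint32_t ror(uint32_t x, unsigned n)
{
    assert(n < 32);
    return n == 0 ? x : (x >> n) | (x << (32 - n));
}
```

For example, ror(0x12345678, 8) gives 0x78123456, and asr applied to the bit pattern of -7 with a shift of 2 gives the bit pattern of -2, whereas -7 / 4 in C is -1.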

4.5 Memory Access Instructions

4.5.1 Load Instructions

Load Word (LDR)

LDR Rd, [Rn]            ; Rd = Memory[Rn]
LDR Rd, [Rn, #offset]   ; Rd = Memory[Rn + offset]

Examples:

LDR R0, [R1]         ; Load word from address in R1
LDR R2, [R3, #4]     ; Load from address R3+4
LDR R4, [R5, #-8]    ; Load from address R5-8

Load Byte (LDRB)

LDRB Rd, [Rn, #offset]  ; Load one byte, zero-extend to 32 bits

Example:

LDRB R0, [R1]        ; R0 = (byte at R1), upper 24 bits = 0

Load Halfword (LDRH)

LDRH Rd, [Rn, #offset]  ; Load 16 bits, zero-extend to 32 bits

Example:

LDRH R0, [R1, #2]    ; R0 = (halfword at R1+2), upper 16 bits = 0
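
The difference between word, halfword, and byte loads can be sketched in C over a byte array. The sketch assumes little-endian memory, as described earlier in the lecture; the functions ldr, ldrh, and ldrb are hypothetical models, not real library calls.

```c
#include <stdint.h>

/* LDRB: load one byte; the upper 24 bits of the result are 0. */
uint32_t ldrb(const uint8_t *mem, uint32_t addr)
{
    return mem[addr];
}

/* LDRH: load two bytes (little-endian); the upper 16 bits are 0. */
uint32_t ldrh(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* LDR: load four bytes (little-endian) into a full 32-bit word. */
uint32_t ldr(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr]
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}
```

With memory holding the bytes 78 56 34 12, ldr at offset 0 yields 0x12345678, ldrh at offset 2 yields 0x1234, and ldrb at offset 0 yields 0x78.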

Pseudo-Instruction for Loading Addresses

LDR Rd, =label       ; Load address of label into Rd
LDR Rd, =value       ; Load 32-bit constant into Rd

Examples:

LDR R0, =array       ; R0 = address of array
LDR R1, =0x12345678  ; R1 = 0x12345678 (large immediate)

4.5.2 Store Instructions

Store Word (STR)

STR Rd, [Rn]            ; Memory[Rn] = Rd
STR Rd, [Rn, #offset]   ; Memory[Rn + offset] = Rd

Examples:

STR R0, [R1]         ; Store R0 to address in R1
STR R2, [R3, #8]     ; Store R2 to address R3+8

Store Byte (STRB)

STRB Rd, [Rn, #offset]  ; Store lower 8 bits of Rd

Example:

STRB R0, [R1]        ; Store lower byte of R0 to address R1

Store Halfword (STRH)

STRH Rd, [Rn, #offset]  ; Store lower 16 bits of Rd

Example:

STRH R0, [R1, #4]    ; Store lower halfword of R0 to R1+4
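
Narrow stores write only part of the register and leave neighbouring bytes untouched. The sketch below models this over a byte array, again assuming little-endian memory; strb and strh are hypothetical function names.

```c
#include <stdint.h>

/* STRB: write only the lowest 8 bits of the register. */
void strb(uint8_t *mem, uint32_t addr, uint32_t rd)
{
    mem[addr] = (uint8_t)(rd & 0xFF);
}

/* STRH: write the lowest 16 bits, low byte first (little-endian). */
void strh(uint8_t *mem, uint32_t addr, uint32_t rd)
{
    mem[addr]     = (uint8_t)(rd & 0xFF);
    mem[addr + 1] = (uint8_t)((rd >> 8) & 0xFF);
}
```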

4.5.3 Addressing Modes

Offset Addressing

LDR R0, [R1, #4]     ; R0 = Memory[R1 + 4], R1 unchanged

Pre-indexed Addressing

LDR R0, [R1, #4]!    ; R1 = R1 + 4, then R0 = Memory[R1]
; ! indicates update base register

Post-indexed Addressing

LDR R0, [R1], #4     ; R0 = Memory[R1], then R1 = R1 + 4

Register Offset

LDR R0, [R1, R2]     ; R0 = Memory[R1 + R2]
LDR R0, [R1, R2, LSL #2] ; R0 = Memory[R1 + (R2 << 2)]
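
The addressing modes differ in which address is used and in whether the base register is updated. The sketch below models them in C, assuming word-aligned byte addresses into a small array of words; all function names are hypothetical. The base register is passed by pointer where the mode writes it back.

```c
#include <stdint.h>

/* Read the word at a (word-aligned) byte address. */
static uint32_t load_word(const uint32_t *mem, uint32_t addr)
{
    return mem[addr / 4];
}

/* LDR Rd, [Rn, #off] : use Rn + off, Rn unchanged */
uint32_t ldr_offset(const uint32_t *mem, uint32_t rn, int32_t off)
{
    return load_word(mem, rn + (uint32_t)off);
}

/* LDR Rd, [Rn, #off]! : update Rn first, then load from the new Rn */
uint32_t ldr_pre(const uint32_t *mem, uint32_t *rn, int32_t off)
{
    *rn += (uint32_t)off;
    return load_word(mem, *rn);
}

/* LDR Rd, [Rn], #off : load from the old Rn, then update Rn */
uint32_t ldr_post(const uint32_t *mem, uint32_t *rn, int32_t off)
{
    uint32_t value = load_word(mem, *rn);
    *rn += (uint32_t)off;
    return value;
}

/* LDR Rd, [Rn, Rm, LSL #2] : Rm is an element index scaled by 4 */
uint32_t ldr_scaled(const uint32_t *mem, uint32_t rn, uint32_t rm)
{
    return load_word(mem, rn + (rm << 2));
}
```

Post-indexed addressing is a natural fit for walking an array: each load returns the current element and advances the base to the next one.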

4.6 Assembly Program Structure

4.6.1 Directives

Section Directives

@ GNU as for ARM starts comments with '@' (';' separates statements)
.text                @ Code section (instructions)
.data                @ Data section (initialized variables)
.bss                 @ Uninitialized data section

Global and External

.global main         @ Make symbol visible to linker
.extern printf       @ Declare external symbol

Data Definition

.word value          @ Define 32-bit word
.byte value          @ Define byte
.asciz "string"      @ Define null-terminated string
.space n             @ Reserve n bytes of space

4.6.2 Labels

Purpose

Syntax

label:               @ Label for instruction
    MOV R0, #1
    ADD R1, R0, R2

array:               @ Label for data
    .word 1, 2, 3, 4

4.6.3 Simple Program Example

@ Comments use '@' because GNU as for ARM treats ';' as a statement separator
.text
.global main

main:
    MOV R0, #5       @ R0 = 5
    MOV R1, #10      @ R1 = 10
    ADD R2, R0, R1   @ R2 = R0 + R1 = 15
    MOV R0, R2       @ R0 = R2 (return value)
    MOV PC, LR       @ Return from main

.data
message:
    .asciz "Hello, ARM!"

4.7 ARM Development Tools

4.7.1 Toolchain Components

Cross-Compiler

Assembler

Linker

Emulator

4.7.2 Compilation Process

From C to Executable

C Source (.c)
    ↓ [gcc -S]
Assembly (.s)
    ↓ [as]
Object Code (.o)
    ↓ [ld]
Executable (a.out)
    ↓ [qemu-arm]
Execution

Command Examples

# Compile C to assembly
arm-linux-gnueabi-gcc -S program.c -o program.s

# Assemble to object code
arm-linux-gnueabi-as program.s -o program.o

# Link to executable
arm-linux-gnueabi-gcc program.o -o program

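# Run with emulator (a dynamically linked binary also needs the ARM
# libraries, e.g. qemu-arm -L <sysroot>; linking with -static avoids this)
qemu-arm program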

One-Step Compilation

# Compile, assemble, and link in one command
arm-linux-gnueabi-gcc program.c -o program

4.7.3 Debugging and Inspection

GDB (GNU Debugger)

# Debug with QEMU and GDB
qemu-arm -g 1234 program &     # Start QEMU, wait for debugger
arm-linux-gnueabi-gdb program  # Start GDB
(gdb) target remote :1234      # Connect to QEMU
(gdb) break main               # Set breakpoint
(gdb) continue                 # Run to breakpoint
(gdb) stepi                    # Execute one machine instruction
(gdb) info registers           # Show register values

Objdump

# Disassemble binary to assembly
arm-linux-gnueabi-objdump -d program

nm

# List symbols in object file
arm-linux-gnueabi-nm program.o

4.8 Programming in ARM Assembly

4.8.1 Translating C to ARM

C Code:

int a = 5;
int b = 10;
int c = a + b;

ARM Assembly:

MOV R0, #5       ; a = 5
MOV R1, #10      ; b = 10
ADD R2, R0, R1   ; c = a + b

C Code with Array:

int arr[3] = {1, 2, 3};
int x = arr[1];

ARM Assembly:

.data
arr:
    .word 1, 2, 3

.text
    LDR R0, =arr     @ R0 = address of arr
    LDR R1, [R0, #4] @ R1 = arr[1] (offset 4 bytes)

4.8.2 Common Patterns

Clearing a Register

MOV R0, #0           ; Method 1
EOR R0, R0, R0       ; Method 2 (XOR with itself)

Negating a Value

RSB R0, R0, #0       ; Method 1: R0 = 0 - R0

MVN R0, R0           ; Method 2: R0 = ~R0 (bitwise, not arithmetic)
ADD R0, R0, #1       ;           then add 1 (two's complement)
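
Both negation sequences compute the same two's complement result, which can be checked in C on unsigned 32-bit values (where arithmetic wraps modulo 2^32). The function names below are hypothetical.

```c
#include <stdint.h>

/* RSB R0, R0, #0 : subtract the value from zero. */
uint32_t negate_rsb(uint32_t r0)
{
    return 0u - r0;
}

/* MVN R0, R0 followed by ADD R0, R0, #1 : invert all bits, then add 1. */
uint32_t negate_mvn_add(uint32_t r0)
{
    return ~r0 + 1u;
}
```

The single RSB is the shorter choice; the two-instruction form is mainly useful for seeing how two's complement negation works.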

Multiplying by Powers of 2

LSL R0, R1, #3       ; R0 = R1 × 8 (a single shift; typically at least as cheap as MUL)

Dividing by Powers of 2

LSR R0, R1, #2       ; R0 = R1 / 4 (unsigned)
ASR R0, R1, #2       ; R0 = R1 / 4 (signed, rounds toward negative infinity)

Swapping Two Registers

EOR R0, R0, R1       ; R0 = a XOR b (XOR-based swap, no temporary)
EOR R1, R0, R1       ; R1 = (a XOR b) XOR b = a
EOR R0, R0, R1       ; R0 = (a XOR b) XOR a = b
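
The same three-step XOR swap can be written in C. This is a sketch with a hypothetical function name; the guard for identical pointers is an addition for the C version, since XORing a location with itself would zero it. Two distinct registers cannot alias, so the assembly form needs no such check.

```c
#include <stdint.h>

/* Swap two 32-bit values without a temporary, mirroring the EOR sequence. */
void xor_swap(uint32_t *a, uint32_t *b)
{
    if (a == b)
        return;   /* same location: XOR would clear the value */
    *a ^= *b;     /* a = a0 ^ b0            */
    *b ^= *a;     /* b = b0 ^ a0 ^ b0 = a0  */
    *a ^= *b;     /* a = a0 ^ b0 ^ a0 = b0  */
}
```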

Key Takeaways

  1. ARM follows RISC principles - simple instructions, load/store architecture, large register file, fixed instruction length.
  2. 16 registers (R0-R15) with special purposes: R13 (SP), R14 (LR), R15 (PC), and calling conventions for R0-R12.
  3. Three main instruction categories - data processing (arithmetic/logic), data transfer (load/store), control flow (branches).
  4. Fixed 32-bit instruction format simplifies hardware and enables efficient pipelining.
  5. Little-endian byte ordering - least significant byte stored at lowest address.
  6. Immediate values indicated by # symbol, with encoding constraints due to fixed instruction size.
  7. Memory access only through LOAD/STORE - arithmetic operations work on registers only (load/store architecture).
  8. Rich addressing modes - offset, pre-indexed, post-indexed, register offset with optional shifts.
  9. Cross-compilation toolchain - arm-linux-gnueabi-gcc, as, ld, and qemu-arm for development on x86.
  10. Assembly programming requires understanding of register allocation, instruction selection, and calling conventions.

Summary

ARM assembly language provides the low-level interface between software and hardware, revealing how high-level constructs translate to machine operations. The ARM architecture's RISC design emphasizes simplicity and regularity, with a uniform 32-bit instruction format, a generous 16-register set, and a clean separation between computation (using registers) and memory access (through explicit load/store instructions). Understanding ARM assembly is crucial for optimizing performance-critical code, implementing system-level software, and comprehending how processors execute programs. The development toolchain—including cross-compilers, assemblers, linkers, and emulators—enables efficient development and testing of ARM software. Mastering these fundamentals prepares us for more advanced topics including function calling conventions, stack management, and processor microarchitecture implementation.