e20-co523-C-Lite-Interpreter

C-Lite Interpreter

Version Python License Tests Status Course

A tree-walking interpreter for C-Lite, a pedagogical subset of the C programming language.
Built from scratch in Python - lexer, parser, AST, symbol table, and evaluator - with no external runtime dependencies.

Quick Start Language Reference Architecture Testing Contributing License

Table of Contents


Overview

C-Lite is a simplified, statically-typed subset of C designed to demonstrate the full program translation pipeline from raw source text to evaluated output without the complexity of a production C compiler.

The interpreter implements three classical compilation phases:

Phase Component Responsibility
1 Lexer Tokenizes source text into a stream of classified tokens
2 Parser Builds an Abstract Syntax Tree (AST) via recursive-descent parsing
3 Interpreter Walks the AST using the Visitor pattern to evaluate the program

Key Features


Quick Start

Prerequisites

Requirement Version
Python >= 3.8
pip any

Installation

# Clone the repository
git clone https://github.com/ChethiyaB/C-Lite-Interpreter.git
cd C-Lite-Interpreter

# Install development dependencies (pytest)
pip install -r requirements.txt

Running a Program

Create a file hello.clt:

int x;
x = 42;
printf(x);

Execute it:

python cli.py hello.clt
42

Interactive REPL

python repl.py
>>> int x;
>>> x = 10;
>>> printf(x);
10
>>> if (x > 5) { printf(1); } else { printf(0); }
1

Language Reference

Data Types

Type Description Literal Examples
int Signed integer 0, 42, 1000
float IEEE 754 double-precision floating-point 3.14, .5, 7., 0.0

Variables

Variables must be declared before use. Declaration and assignment are separate statements:

int x;          // declaration
x = 42;         // assignment

float pi;
pi = 3.14159;

Note: Using a variable before initialization raises a SemanticError.

Operators

Arithmetic Operators

Operator Description Example Result Type
+ Addition 3 + 4 int or float
- Subtraction 10 - 3 int or float
* Multiplication 5 * 2 int or float
/ Division 7 / 2 int (truncated) or float

Division semantics:

Relational Operators

Operator Description Returns
> Greater than 1 (true) or 0 (false)
< Less than 1 (true) or 0 (false)
== Equal to 1 (true) or 0 (false)

Relational operators return integer values (1 or 0), consistent with C semantics.

Unary Operators

Operator Description Example
+ Unary positive +5
- Unary negation -x

Assignment Operator

Operator Description Example
= Assignment x = 5;

Operator Precedence (highest to lowest)

Precedence    Operators        Associativity    Description
─────────────────────────────────────────────────────────────
   1          ( )              N/A              Parentheses (grouping)
   2          + -  (unary)     right            Unary positive / negation
   3          * /              left             Multiplicative
   4          + -  (binary)    left             Additive
   5          > < ==           left             Relational / equality
   6          =                N/A              Assignment (statement-level)

Control Flow

if-else Statement

if (condition) {
    // then-branch
} else {
    // else-branch (optional)
}

I/O

printf(expression);

Evaluates the expression and prints the result to standard output:

printf(42);           // prints: 42
printf(3.14);         // prints: 3.14
printf(x + y);        // prints the computed sum

Comments

Single-line comments are supported with //:

int x;   // this is a comment
x = 42;  // assign value
// this entire line is a comment

Scoping Rules

Blocks ({ }) introduce new lexical scopes. Variables declared inside a block are destroyed when the block exits:

int x;
x = 10;

{
    int y;           // y exists only inside this block
    y = 20;
    int x;           // shadows the outer x
    x = 99;
    printf(x);       // prints: 99
}

printf(x);           // prints: 10  (outer x restored)
// printf(y);        // ERROR: y is not accessible here

Variable shadowing is supported: an inner scope may declare a variable with the same name as an outer scope. The inner variable takes precedence within its scope.

Type Coercion

C-Lite performs implicit type coercion following C semantics:

Context Rule Example
int var <- float expr Truncation (toward zero) int x; x = 3.14; -> x == 3
float var <- int expr Promotion float y; y = 10; -> y == 10.0
Mixed arithmetic Promote to float 3 + 2.5 -> 5.5
Mixed comparison Promote to float 3 == 3.0 -> 1 (true)

Formal Grammar (EBNF)

The complete grammar specification is available at docs/grammar.ebnf. A summary:

program               = { declaration | statement } ;
declaration           = type_specifier identifier ";" ;
type_specifier        = "int" | "float" ;

statement             = assignment | if_statement | printf_call | block | ";" ;
assignment            = identifier "=" expression ";" ;
if_statement          = "if" "(" expression ")" statement [ "else" statement ] ;
block                 = "{" { declaration | statement } "}" ;
printf_call           = "printf" "(" expression ")" ";" ;

expression            = relational_expression ;
relational_expression = additive_expression [ ( ">" | "<" | "==" ) additive_expression ] ;
additive_expression   = multiplicative_expression { ( "+" | "-" ) multiplicative_expression } ;
multiplicative_expression = primary_expression { ( "*" | "/" ) primary_expression } ;
primary_expression    = number | identifier | "(" expression ")" | unary_expression ;
unary_expression      = ( "+" | "-" ) primary_expression ;

Grammar properties:


Architecture

Pipeline Overview

The interpreter processes source code through three sequential phases, each transforming the input into a progressively higher-level representation:

Pipeline

Phase 1 – Lexical Analysis

Module: src/lexer.py

The lexer (scanner) converts raw source text into a stream of Token objects. Each token carries its type, value, and source location (line and column).

Source: "int x = 42;"

Tokens: [
    Token(int, "int",   1:1),
    Token(identifier, "x", 1:5),
    Token(=, "=",       1:7),
    Token(integer, 42,  1:9),
    Token(;, ";",       1:11),
    Token(eof, None,    1:12)
]

Implementation details:

Phase 2 – Syntax Analysis

Module: src/parser.py

The parser consumes the token stream and constructs an Abstract Syntax Tree (AST). It uses LL(1) recursive descent - each grammar production is implemented as a method.

Source: "x = 3 + 4 * 5;"

AST:
    Assignment
    ├── name: "x"
    └── value: BinaryOp(+)
              ├── left: NumberLiteral(3)
              └── right: BinaryOp(*)
                        ├── left: NumberLiteral(4)
                        └── right: NumberLiteral(5)

Implementation details:

Phase 3 – Semantic Evaluation

Module: src/interpreter.py

The interpreter traverses the AST using the Visitor pattern (ASTVisitor base class). Each AST node type has a corresponding visit_* method.

AST Traversal:
    visit_program
    ├── visit_declaration  →  SymbolTable.declare("x", int)
    ├── visit_assignment   →  SymbolTable.update("x", 42)
    │   └── visit_number_literal  →  returns 42
    └── visit_printf_call  →  output.append(42)
        └── visit_identifier  →  SymbolTable.get_value("x") → 42

Implementation details:

Error Hierarchy

The interpreter uses a structured exception hierarchy with precise source locations:

CLiteError (base)
├── LexerError      — invalid characters, malformed literals, unclosed strings
├── ParserError     — syntax errors, unexpected tokens, unmatched delimiters
└── SemanticError   — undefined variables, uninitialized access, division by zero,
                      duplicate declarations in same scope

All errors include line, column, and a human-readable message:

SemanticError at line 5, column 1: Undefined variable 'y'
ParserError at line 3, column 12: Expected ')' after condition
LexerError at line 1, column 6: Unexpected character '@'

AST Node Reference

The full AST hierarchy is defined in src/ast.py:

ASTNode (abstract base)
│
├── Program                — root node: declarations[] + statements[]
│
├── Declaration            — variable declaration: type_specifier identifier ;
├── Assignment             — variable assignment: identifier = expression ;
├── IfStatement            — conditional: if (expr) stmt [else stmt]
├── Block                  — scoped block: { declaration | statement }
├── PrintfCall             — output: printf(expression) ;
├── EmptyStatement         — no-op: ;
│
├── BinaryOp              — binary expression: left op right
├── UnaryOp               — unary expression: op operand
├── NumberLiteral         — numeric constant: int or float
└── Identifier            — variable reference: name

All nodes are immutable (@dataclass(frozen=True)) and carry source location (line, column).

Token Type Reference

Defined in src/token.py:

Category Tokens
Keywords int, float, if, else, printf
Literals INT_LITERAL, FLOAT_LITERAL, STRING_LITERAL
Identifiers IDENTIFIER
Arithmetic +, -, *, /
Relational >, <, ==
Assignment =
Delimiters (, ), {, }, ;
Meta COMMENT, EOF

Directory Layout

C-Lite-Interpreter/
├── cli.py                  # Command-line interface entry point
├── repl.py                 # Interactive REPL entry point
├── requirements.txt        # Development dependencies (pytest)
├── LICENSE                 # MIT License
│
├── src/                    # Interpreter source code
│   ├── __init__.py         #   Public API exports
│   ├── token.py            #   Token and TokenType definitions
│   ├── lexer.py            #   Lexical analyzer (scanner)
│   ├── ast.py              #   AST node definitions + Visitor base class
│   ├── parser.py           #   Recursive descent parser
│   ├── symbol_table.py     #   Stack-based symbol table with scoping
│   ├── interpreter.py      #   Tree-walking evaluator (Visitor pattern)
│   └── errors.py           #   Exception hierarchy (CLiteError family)
│
├── tests/                  # Test suite (293 tests)
│   ├── __init__.py
│   ├── test_lexer.py       #   Lexer tests (33 tests)
│   ├── test_parser.py      #   Parser tests (64 tests)
│   ├── test_interpreter.py #   Interpreter tests (46 tests)
│   ├── test_ast.py         #   AST structure tests (57 tests)
│   ├── test_symbol_table.py#   Symbol table tests (38 tests)
│   ├── test_cli.py         #   CLI integration tests (24 tests)
│   └── test_repl.py        #   REPL integration tests (31 tests)
│
├── examples/               # Example C-Lite programs
│   ├── hello.clt           #   Basic variable and printf
│   ├── arithmetic.clt      #   Arithmetic operations
│   ├── control_flow.clt    #   if-else branching
│   ├── float_operations.clt#   Floating-point arithmetic
│   ├── nested_scopes.clt   #   Nested block scoping
│   └── type_coercion.clt   #   Implicit type conversion
│
└── docs/                   # Design documentation
    ├── grammar.ebnf        #   Formal EBNF grammar specification
    ├── first_follow_sets.md#   FIRST/FOLLOW set analysis
    ├── grammar_validation.md#  Grammar validation examples & parse trees
    └── design_decisions.md #   Architectural rationale (18 decisions)

Usage

CLI Reference

usage: cli.py [-h] [-e CODE] [-v] [--version] [source_file]

C-Lite Interpreter

positional arguments:
  source_file           C-Lite source file (.clt extension)

optional arguments:
  -h, --help            show this help message and exit
  -e CODE, --execute CODE
                        Execute C-Lite code directly from command line
  -v, --verbose         Show detailed phase information
  --version             show program's version number and exit

Execute a file:

python cli.py examples/hello.clt
# Output: 42

Execute inline code:

python cli.py -e "int x; x = 10; printf(x);"
# Output: 10

Verbose mode (shows lexer tokens, AST, and execution details):

python cli.py -v examples/control_flow.clt

Supported file extensions: .clt (recommended), .c, .txt

REPL Commands

Launch the REPL with python repl.py:

Command Description
:help Show help message with language reference
:vars Display all variables in the current symbol table
:tokens Show tokens from last parsed input
:ast Show AST from last parsed input
:verbose Enable verbose mode (show all phases)
:quiet Disable verbose mode
:clear / :reset Clear interpreter state (reset all variables)
:version Show version information
:exit / :quit Exit REPL (also Ctrl+D)

REPL features:

Example session:

>>> int x;
>>> int y;
>>> x = 10;
>>> y = x * 2;
>>> printf(y);
20
>>> if (y > 15) { printf(1); } else { printf(0); }
1
>>> :vars
Variables in scope:
  Scope 0:
    x (int) = 10
    y (int) = 20
>>> :clear
Interpreter state cleared.

Programmatic API

The interpreter components can be used as a Python library:

from src.lexer import Lexer
from src.parser import Parser
from src.interpreter import Interpreter

# Tokenize
source = "int x; x = 42; printf(x);"
tokens = list(Lexer(source).tokenize())

# Parse
ast = Parser(tokens).parse()

# Execute
interpreter = Interpreter()
output = interpreter.execute(ast)
print(output)  # [42]

Accessing the symbol table after execution:

interpreter = Interpreter()
interpreter.execute(ast)

# Inspect variables
symbol = interpreter.symbol_table.lookup("x")
print(symbol.name, symbol.var_type, symbol.value)  # x int 42

Examples

Hello World

// examples/hello.clt
int x;
x = 42;
printf(x);
$ python cli.py examples/hello.clt
42

Arithmetic with Type Coercion

// examples/type_coercion.clt
int a;
float b;
float result;

a = 10;
b = 3.5;

result = a + b;    // int + float → float
printf(result);    // Output: 13.5

a = b;             // float → int truncation
printf(a);         // Output: 3
$ python cli.py examples/type_coercion.clt
13.5
3

Control Flow

// examples/control_flow.clt
int x;
x = 5;

if (x > 3) {
    printf(x);         // Output: 5
} else {
    x = 1;
    printf(x);
}
$ python cli.py examples/control_flow.clt
5

Nested Scopes

// examples/nested_scopes.clt
int x;
x = 10;

{
    int y;
    y = 20;

    {
        int z;
        z = x + y;    // accesses outer variables
        printf(z);     // Output: 30
    }
}
$ python cli.py examples/nested_scopes.clt
30

Float Operations

// examples/float_operations.clt
float x;
float y;
float result;

x = 3.14;
y = 2.86;

result = x + y;
printf(result);    // Output: 6

result = x * y;
printf(result);    // Output: 8.9804
$ python cli.py examples/float_operations.clt
6
8.9804

Testing

Running the Test Suite

# Run all tests
python -m pytest

# Run with verbose output
python -m pytest -v

# Run a specific test module
python -m pytest tests/test_lexer.py

# Run a specific test class
python -m pytest tests/test_interpreter.py::TestBooleanSemantics

# Run a single test
python -m pytest tests/test_parser.py::TestASTStructuralValidation::test_full_tree_validation_precedence

# Run with short traceback
python -m pytest --tb=short -q

Test Matrix

Module Tests Description
test_lexer.py 33 Token recognition, source location tracking, numeric/string literals, comments, error handling, boundary/stress tests
test_parser.py 64 Syntax error detection, deep nesting, float edge cases, AST structural validation, recovery behavior, operator precedence, fuzzing, EOF edge cases, performance
test_interpreter.py 46 Boolean semantics, relational evaluation, type coercion, uninitialized variables, nested blocks, expression stress, control flow, printf ordering, division edge cases, variable lifetime, isolation
test_ast.py 57 Expression tree composition, block scope nesting, if-statement traversal, visitor contract, equality comparison, immutability, program node edge cases, location integrity
test_symbol_table.py 38 Type enforcement, variable shadowing, deep nesting (50-200 levels), scope re-entrancy, memory integrity, error messages, performance (10,000 declarations), float precision
test_cli.py 24 File execution, inline code, verbose mode, error reporting, file encoding, version flag, edge cases
test_repl.py 31 Statement execution, multi-line input, state persistence, REPL commands, error recovery, output isolation
Total 293 All passing

Test Categories

The test suite systematically covers the following categories:

Lexer Tests:

Parser Tests:

Interpreter Tests:


Design Decisions

The project documents 18 design decisions in docs/design_decisions.md.

Key highlights:

# Decision Choice Rationale
1 Lexer implementation Hand-written DFA Educational transparency, explicit error control
2 EOF handling Explicit EOF token Simplifies parser termination, aligns with FOLLOW sets
4 Parsing strategy LL(1) recursive descent Educational value, easy debugging, no backtracking
5 Precedence encoding 4-level expression hierarchy Matches C semantics, correct AST construction
7 Interpretation pattern Visitor pattern Separates traversal from execution, extensible
8 Symbol table structure Stack of scope dictionaries Efficient enter/exit, matches block scoping
9 Truthiness C-style (0 = false) Matches C semantics
10 Type coercion Implicit (int<->float) Matches C semantics for assignment and arithmetic
11 Uninitialized variables Raise SemanticError Prevents undefined behavior (stricter than C)
12 Integer division Floor division for int/int Matches C semantics
13 Error strategy Fail-fast (first error only) Simpler implementation, standard for educational compilers
15 Testing approach Test-driven development (TDD) Robustness, 293 tests across all components

Known Limitations

Limitation Description Mitigation
Recursion depth Python’s default recursion limit (~1000) can be exceeded by deeply nested expressions (>500 levels) Increase limit with sys.setrecursionlimit() or restructure deeply nested code
No loop constructs while and for loops are not supported (per language specification) Future extension possible
No functions User-defined functions are not supported Only built-in printf() is available
No arrays/pointers Aggregate types are not supported Future extension possible
Single-error reporting Parser stops on first syntax error (fail-fast) Re-run after fixing each error
No string variables String literals are only supported as printf() arguments Strings cannot be stored in variables

Performance

Component Time Complexity Space Complexity
Lexer O(n) single-pass O(n) token stream
Parser O(n) single-pass O(n) AST
Interpreter O(n) tree traversal O(s) symbol table (s = scope depth)
Symbol table lookup O(d) per lookup (d = nesting depth) O(v) (v = total variables)

Benchmarks (tested on the project test suite):


Contributing

Contributions are welcome. Please follow these guidelines:

  1. Fork the repository and create a feature branch.
  2. Write tests for any new functionality (maintain 100% test pass rate).
  3. Follow existing code style – type hints, docstrings, consistent naming.
  4. Run the full test suite before submitting:
    python -m pytest -v
    
  5. Submit a pull request with a clear description of the changes.

Development Setup

git clone https://github.com/ChethiyaB/C-Lite-Interpreter.git
cd C-Lite-Interpreter
pip install -r requirements.txt
python -m pytest  # Verify all 293 tests pass

License

This project is licensed under the MIT License. See LICENSE for details.

MIT License
Copyright (c) 2026 Chethiya Bandara