A tree-walking interpreter for C-Lite, a pedagogical subset of the C programming language.
Built from scratch in Python - lexer, parser, AST, symbol table, and evaluator - with no external runtime dependencies.
| Quick Start | Language Reference | Architecture | Testing | Contributing | License |
C-Lite is a simplified, statically-typed subset of C designed to demonstrate the full program translation pipeline from raw source text to evaluated output without the complexity of a production C compiler.
The interpreter implements three classical compilation phases:
| Phase | Component | Responsibility |
|---|---|---|
| 1 | Lexer | Tokenizes source text into a stream of classified tokens |
| 2 | Parser | Builds an Abstract Syntax Tree (AST) via recursive-descent parsing |
| 3 | Interpreter | Walks the AST using the Visitor pattern to evaluate the program |
int <-> float)| Requirement | Version |
|---|---|
| Python | >= 3.8 |
| pip | any |
# Clone the repository
git clone https://github.com/ChethiyaB/C-Lite-Interpreter.git
cd C-Lite-Interpreter
# Install development dependencies (pytest)
pip install -r requirements.txt
Create a file hello.clt:
int x;
x = 42;
printf(x);
Execute it:
python cli.py hello.clt
42
python repl.py
>>> int x;
>>> x = 10;
>>> printf(x);
10
>>> if (x > 5) { printf(1); } else { printf(0); }
1
| Type | Description | Literal Examples |
|---|---|---|
int |
Signed integer | 0, 42, 1000 |
float |
IEEE 754 double-precision floating-point | 3.14, .5, 7., 0.0 |
Variables must be declared before use. Declaration and assignment are separate statements:
int x; // declaration
x = 42; // assignment
float pi;
pi = 3.14159;
Note: Using a variable before initialization raises a
SemanticError.
| Operator | Description | Example | Result Type |
|---|---|---|---|
+ |
Addition | 3 + 4 |
int or float |
- |
Subtraction | 10 - 3 |
int or float |
* |
Multiplication | 5 * 2 |
int or float |
/ |
Division | 7 / 2 |
int (truncated) or float |
Division semantics:
int / int -> integer (floor) division: 7 / 2 = 3float / int or int / float -> float division: 7.0 / 2 = 3.5SemanticError| Operator | Description | Returns |
|---|---|---|
> |
Greater than | 1 (true) or 0 (false) |
< |
Less than | 1 (true) or 0 (false) |
== |
Equal to | 1 (true) or 0 (false) |
Relational operators return integer values (1 or 0), consistent with C semantics.
| Operator | Description | Example |
|---|---|---|
+ |
Unary positive | +5 |
- |
Unary negation | -x |
| Operator | Description | Example |
|---|---|---|
= |
Assignment | x = 5; |
Precedence Operators Associativity Description
─────────────────────────────────────────────────────────────
1 ( ) N/A Parentheses (grouping)
2 + - (unary) right Unary positive / negation
3 * / left Multiplicative
4 + - (binary) left Additive
5 > < == left Relational / equality
6 = N/A Assignment (statement-level)
if-else Statementif (condition) {
// then-branch
} else {
// else-branch (optional)
}
0 and 0.0 are false; all other values are true.else clause is optional.else binds to the nearest if (standard C behavior).printf(expression);Evaluates the expression and prints the result to standard output:
printf(42); // prints: 42
printf(3.14); // prints: 3.14
printf(x + y); // prints the computed sum
6.0 prints as 6).Single-line comments are supported with //:
int x; // this is a comment
x = 42; // assign value
// this entire line is a comment
Blocks ({ }) introduce new lexical scopes. Variables declared inside a block are destroyed when the block exits:
int x;
x = 10;
{
int y; // y exists only inside this block
y = 20;
int x; // shadows the outer x
x = 99;
printf(x); // prints: 99
}
printf(x); // prints: 10 (outer x restored)
// printf(y); // ERROR: y is not accessible here
Variable shadowing is supported: an inner scope may declare a variable with the same name as an outer scope. The inner variable takes precedence within its scope.
C-Lite performs implicit type coercion following C semantics:
| Context | Rule | Example |
|---|---|---|
int var <- float expr |
Truncation (toward zero) | int x; x = 3.14; -> x == 3 |
float var <- int expr |
Promotion | float y; y = 10; -> y == 10.0 |
| Mixed arithmetic | Promote to float |
3 + 2.5 -> 5.5 |
| Mixed comparison | Promote to float |
3 == 3.0 -> 1 (true) |
The complete grammar specification is available at docs/grammar.ebnf. A summary:
program = { declaration | statement } ;
declaration = type_specifier identifier ";" ;
type_specifier = "int" | "float" ;
statement = assignment | if_statement | printf_call | block | ";" ;
assignment = identifier "=" expression ";" ;
if_statement = "if" "(" expression ")" statement [ "else" statement ] ;
block = "{" { declaration | statement } "}" ;
printf_call = "printf" "(" expression ")" ";" ;
expression = relational_expression ;
relational_expression = additive_expression [ ( ">" | "<" | "==" ) additive_expression ] ;
additive_expression = multiplicative_expression { ( "+" | "-" ) multiplicative_expression } ;
multiplicative_expression = primary_expression { ( "*" | "/" ) primary_expression } ;
primary_expression = number | identifier | "(" expression ")" | unary_expression ;
unary_expression = ( "+" | "-" ) primary_expression ;
Grammar properties:
docs/first_follow_sets.mdThe interpreter processes source code through three sequential phases, each transforming the input into a progressively higher-level representation:

Module: src/lexer.py
The lexer (scanner) converts raw source text into a stream of Token objects. Each token carries its type, value, and source location (line and column).
Source: "int x = 42;"
Tokens: [
Token(int, "int", 1:1),
Token(identifier, "x", 1:5),
Token(=, "=", 1:7),
Token(integer, 42, 1:9),
Token(;, ";", 1:11),
Token(eof, None, 1:12)
]
Implementation details:
O(n) tokenization via character-by-character advancementyield) for memory efficiency\n, \t, \", \\)Module: src/parser.py
The parser consumes the token stream and constructs an Abstract Syntax Tree (AST). It uses LL(1) recursive descent - each grammar production is implemented as a method.
Source: "x = 3 + 4 * 5;"
AST:
Assignment
├── name: "x"
└── value: BinaryOp(+)
├── left: NumberLiteral(3)
└── right: BinaryOp(*)
├── left: NumberLiteral(4)
└── right: NumberLiteral(5)
Implementation details:
@dataclass(frozen=True) AST nodesModule: src/interpreter.py
The interpreter traverses the AST using the Visitor pattern (ASTVisitor base class). Each AST node type has a corresponding visit_* method.
AST Traversal:
visit_program
├── visit_declaration → SymbolTable.declare("x", int)
├── visit_assignment → SymbolTable.update("x", 42)
│ └── visit_number_literal → returns 42
└── visit_printf_call → output.append(42)
└── visit_identifier → SymbolTable.get_value("x") → 42
Implementation details:
SymbolTable with nested scope support (enter/exit scope on block boundaries)0/0.0 = false, non-zero = true)1e-10)printf() output collected in a list for testabilityThe interpreter uses a structured exception hierarchy with precise source locations:
CLiteError (base)
├── LexerError — invalid characters, malformed literals, unclosed strings
├── ParserError — syntax errors, unexpected tokens, unmatched delimiters
└── SemanticError — undefined variables, uninitialized access, division by zero,
duplicate declarations in same scope
All errors include line, column, and a human-readable message:
SemanticError at line 5, column 1: Undefined variable 'y'
ParserError at line 3, column 12: Expected ')' after condition
LexerError at line 1, column 6: Unexpected character '@'
The full AST hierarchy is defined in src/ast.py:
ASTNode (abstract base)
│
├── Program — root node: declarations[] + statements[]
│
├── Declaration — variable declaration: type_specifier identifier ;
├── Assignment — variable assignment: identifier = expression ;
├── IfStatement — conditional: if (expr) stmt [else stmt]
├── Block — scoped block: { declaration | statement }
├── PrintfCall — output: printf(expression) ;
├── EmptyStatement — no-op: ;
│
├── BinaryOp — binary expression: left op right
├── UnaryOp — unary expression: op operand
├── NumberLiteral — numeric constant: int or float
└── Identifier — variable reference: name
All nodes are immutable (@dataclass(frozen=True)) and carry source location (line, column).
Defined in src/token.py:
| Category | Tokens |
|---|---|
| Keywords | int, float, if, else, printf |
| Literals | INT_LITERAL, FLOAT_LITERAL, STRING_LITERAL |
| Identifiers | IDENTIFIER |
| Arithmetic | +, -, *, / |
| Relational | >, <, == |
| Assignment | = |
| Delimiters | (, ), {, }, ; |
| Meta | COMMENT, EOF |
C-Lite-Interpreter/
├── cli.py # Command-line interface entry point
├── repl.py # Interactive REPL entry point
├── requirements.txt # Development dependencies (pytest)
├── LICENSE # MIT License
│
├── src/ # Interpreter source code
│ ├── __init__.py # Public API exports
│ ├── token.py # Token and TokenType definitions
│ ├── lexer.py # Lexical analyzer (scanner)
│ ├── ast.py # AST node definitions + Visitor base class
│ ├── parser.py # Recursive descent parser
│ ├── symbol_table.py # Stack-based symbol table with scoping
│ ├── interpreter.py # Tree-walking evaluator (Visitor pattern)
│ └── errors.py # Exception hierarchy (CLiteError family)
│
├── tests/ # Test suite (293 tests)
│ ├── __init__.py
│ ├── test_lexer.py # Lexer tests (33 tests)
│ ├── test_parser.py # Parser tests (64 tests)
│ ├── test_interpreter.py # Interpreter tests (46 tests)
│ ├── test_ast.py # AST structure tests (57 tests)
│ ├── test_symbol_table.py# Symbol table tests (38 tests)
│ ├── test_cli.py # CLI integration tests (24 tests)
│ └── test_repl.py # REPL integration tests (31 tests)
│
├── examples/ # Example C-Lite programs
│ ├── hello.clt # Basic variable and printf
│ ├── arithmetic.clt # Arithmetic operations
│ ├── control_flow.clt # if-else branching
│ ├── float_operations.clt# Floating-point arithmetic
│ ├── nested_scopes.clt # Nested block scoping
│ └── type_coercion.clt # Implicit type conversion
│
└── docs/ # Design documentation
├── grammar.ebnf # Formal EBNF grammar specification
├── first_follow_sets.md# FIRST/FOLLOW set analysis
├── grammar_validation.md# Grammar validation examples & parse trees
└── design_decisions.md # Architectural rationale (18 decisions)
usage: cli.py [-h] [-e CODE] [-v] [--version] [source_file]
C-Lite Interpreter
positional arguments:
source_file C-Lite source file (.clt extension)
optional arguments:
-h, --help show this help message and exit
-e CODE, --execute CODE
Execute C-Lite code directly from command line
-v, --verbose Show detailed phase information
--version show program's version number and exit
Execute a file:
python cli.py examples/hello.clt
# Output: 42
Execute inline code:
python cli.py -e "int x; x = 10; printf(x);"
# Output: 10
Verbose mode (shows lexer tokens, AST, and execution details):
python cli.py -v examples/control_flow.clt
Supported file extensions: .clt (recommended), .c, .txt
Launch the REPL with python repl.py:
| Command | Description |
|---|---|
:help |
Show help message with language reference |
:vars |
Display all variables in the current symbol table |
:tokens |
Show tokens from last parsed input |
:ast |
Show AST from last parsed input |
:verbose |
Enable verbose mode (show all phases) |
:quiet |
Disable verbose mode |
:clear / :reset |
Clear interpreter state (reset all variables) |
:version |
Show version information |
:exit / :quit |
Exit REPL (also Ctrl+D) |
REPL features:
printf outputExample session:
>>> int x;
>>> int y;
>>> x = 10;
>>> y = x * 2;
>>> printf(y);
20
>>> if (y > 15) { printf(1); } else { printf(0); }
1
>>> :vars
Variables in scope:
Scope 0:
x (int) = 10
y (int) = 20
>>> :clear
Interpreter state cleared.
The interpreter components can be used as a Python library:
from src.lexer import Lexer
from src.parser import Parser
from src.interpreter import Interpreter
# Tokenize
source = "int x; x = 42; printf(x);"
tokens = list(Lexer(source).tokenize())
# Parse
ast = Parser(tokens).parse()
# Execute
interpreter = Interpreter()
output = interpreter.execute(ast)
print(output) # [42]
Accessing the symbol table after execution:
interpreter = Interpreter()
interpreter.execute(ast)
# Inspect variables
symbol = interpreter.symbol_table.lookup("x")
print(symbol.name, symbol.var_type, symbol.value) # x int 42
// examples/hello.clt
int x;
x = 42;
printf(x);
$ python cli.py examples/hello.clt
42
// examples/type_coercion.clt
int a;
float b;
float result;
a = 10;
b = 3.5;
result = a + b; // int + float → float
printf(result); // Output: 13.5
a = b; // float → int truncation
printf(a); // Output: 3
$ python cli.py examples/type_coercion.clt
13.5
3
// examples/control_flow.clt
int x;
x = 5;
if (x > 3) {
printf(x); // Output: 5
} else {
x = 1;
printf(x);
}
$ python cli.py examples/control_flow.clt
5
// examples/nested_scopes.clt
int x;
x = 10;
{
int y;
y = 20;
{
int z;
z = x + y; // accesses outer variables
printf(z); // Output: 30
}
}
$ python cli.py examples/nested_scopes.clt
30
// examples/float_operations.clt
float x;
float y;
float result;
x = 3.14;
y = 2.86;
result = x + y;
printf(result); // Output: 6
result = x * y;
printf(result); // Output: 8.9804
$ python cli.py examples/float_operations.clt
6
8.9804
# Run all tests
python -m pytest
# Run with verbose output
python -m pytest -v
# Run a specific test module
python -m pytest tests/test_lexer.py
# Run a specific test class
python -m pytest tests/test_interpreter.py::TestBooleanSemantics
# Run a single test
python -m pytest tests/test_parser.py::TestASTStructuralValidation::test_full_tree_validation_precedence
# Run with short traceback
python -m pytest --tb=short -q
| Module | Tests | Description |
|---|---|---|
test_lexer.py |
33 | Token recognition, source location tracking, numeric/string literals, comments, error handling, boundary/stress tests |
test_parser.py |
64 | Syntax error detection, deep nesting, float edge cases, AST structural validation, recovery behavior, operator precedence, fuzzing, EOF edge cases, performance |
test_interpreter.py |
46 | Boolean semantics, relational evaluation, type coercion, uninitialized variables, nested blocks, expression stress, control flow, printf ordering, division edge cases, variable lifetime, isolation |
test_ast.py |
57 | Expression tree composition, block scope nesting, if-statement traversal, visitor contract, equality comparison, immutability, program node edge cases, location integrity |
test_symbol_table.py |
38 | Type enforcement, variable shadowing, deep nesting (50-200 levels), scope re-entrancy, memory integrity, error messages, performance (10,000 declarations), float precision |
test_cli.py |
24 | File execution, inline code, verbose mode, error reporting, file encoding, version flag, edge cases |
test_repl.py |
31 | Statement execution, multi-line input, state persistence, REPL commands, error recovery, output isolation |
| Total | 293 | All passing |
The test suite systematically covers the following categories:
Lexer Tests:
.5, 3., errors like 1.2.3)//, empty comments, comments at EOF)Parser Tests:
Interpreter Tests:
0, 0.0 = false)1/0 as int)The project documents 18 design decisions in docs/design_decisions.md.
Key highlights:
| # | Decision | Choice | Rationale |
|---|---|---|---|
| 1 | Lexer implementation | Hand-written DFA | Educational transparency, explicit error control |
| 2 | EOF handling | Explicit EOF token | Simplifies parser termination, aligns with FOLLOW sets |
| 4 | Parsing strategy | LL(1) recursive descent | Educational value, easy debugging, no backtracking |
| 5 | Precedence encoding | 4-level expression hierarchy | Matches C semantics, correct AST construction |
| 7 | Interpretation pattern | Visitor pattern | Separates traversal from execution, extensible |
| 8 | Symbol table structure | Stack of scope dictionaries | Efficient enter/exit, matches block scoping |
| 9 | Truthiness | C-style (0 = false) | Matches C semantics |
| 10 | Type coercion | Implicit (int<->float) | Matches C semantics for assignment and arithmetic |
| 11 | Uninitialized variables | Raise SemanticError | Prevents undefined behavior (stricter than C) |
| 12 | Integer division | Floor division for int/int | Matches C semantics |
| 13 | Error strategy | Fail-fast (first error only) | Simpler implementation, standard for educational compilers |
| 15 | Testing approach | Test-driven development (TDD) | Robustness, 293 tests across all components |
| Limitation | Description | Mitigation |
|---|---|---|
| Recursion depth | Python’s default recursion limit (~1000) can be exceeded by deeply nested expressions (>500 levels) | Increase limit with sys.setrecursionlimit() or restructure deeply nested code |
| No loop constructs | while and for loops are not supported (per language specification) |
Future extension possible |
| No functions | User-defined functions are not supported | Only built-in printf() is available |
| No arrays/pointers | Aggregate types are not supported | Future extension possible |
| Single-error reporting | Parser stops on first syntax error (fail-fast) | Re-run after fixing each error |
| No string variables | String literals are only supported as printf() arguments |
Strings cannot be stored in variables |
| Component | Time Complexity | Space Complexity |
|---|---|---|
| Lexer | O(n) single-pass |
O(n) token stream |
| Parser | O(n) single-pass |
O(n) AST |
| Interpreter | O(n) tree traversal |
O(s) symbol table (s = scope depth) |
| Symbol table lookup | O(d) per lookup (d = nesting depth) |
O(v) (v = total variables) |
Benchmarks (tested on the project test suite):
Contributions are welcome. Please follow these guidelines:
python -m pytest -v
git clone https://github.com/ChethiyaB/C-Lite-Interpreter.git
cd C-Lite-Interpreter
pip install -r requirements.txt
python -m pytest # Verify all 293 tests pass
This project is licensed under the MIT License. See LICENSE for details.
MIT License
Copyright (c) 2026 Chethiya Bandara