E20 Four Year Project · 2024/2025

Voice-Enabled LLM
Agentic Construction
Scheduler

Speak your project requirements — our multi-agent AI system understands them, builds a Work Breakdown Structure, and generates an optimized construction schedule automatically.

🤖 LangGraph Multi-Agent 🧠 GPT-5-nano 🗣️ Web Speech API 🗃️ Neo4j Knowledge Graph ⚙️ OR-Tools Scheduler 🌐 FastAPI + React

Abstract

Construction project scheduling is a complex, time-intensive process typically requiring domain expertise and specialized software. Our research presents a novel approach: an autonomous multi-agent AI system that converts a user's natural voice description into a fully optimized construction schedule — with zero manual scheduling input.

The system leverages a LangGraph-based multi-agent architecture where specialized AI agents collaborate through a shared state graph. Users speak their construction project requirements, which are converted to text via the Web Speech API. An intent agent extracts structured project data, a phase agent retrieves relevant WBS templates from a Neo4j knowledge graph, a details agent gathers task-specific parameters through an interactive dialogue, and a scheduling agent uses Google OR-Tools to generate a constraint-satisfying, dependency-respecting schedule.

The result is a Gantt chart rendered in real-time on a React dashboard, with Server-Sent Events (SSE) for live streaming of agent responses. This research aims to democratize construction project management by making AI-powered scheduling accessible through natural conversation.

🎤
Voice Input
Natural language project description
🤖
4 AI Agents
Collaborative multi-agent workflow
🗃️
Knowledge Graph
Neo4j WBS template library
📅
Auto Schedule
OR-Tools optimized Gantt chart

Project Summary & Conclusions

The Scheduling Bottleneck

In the construction industry, generating an accurate and optimized project schedule is a notoriously tedious task. Project managers historically spend countless hours manually defining Work Breakdown Structures (WBS), estimating task durations based on labor productivity rates, and mapping out complex logical dependencies (such as Finish-to-Start or Start-to-Start relationships). This manual dependency mapping is prone to human error, often resulting in delayed timelines, misallocated resources, and cascading budget overruns. The primary motivation for this research project was to eliminate this bottleneck by introducing an intuitive, conversational interface capable of executing complex scheduling logic behind the scenes.

Methodology & Implementation

Our solution is built upon a multi-agent architecture orchestrated by LangGraph. Instead of relying on a single monolithic LLM prompt, which often hallucinates or loses context in complex constraint-satisfaction problems, we distributed the cognitive load across four specialized agents. The Intent Agent first acts as the conversational entry point, utilizing a local Faster Whisper model to transcribe voice inputs in real time with sub-second latency and extracting high-level project parameters. The Phase Agent then acts as a retrieval engine, querying a Neo4j Knowledge Graph to instantiate a historically accurate WBS template tailored to the specific building type.

Crucially, the system does not hallucinate task durations. The Details Agent dynamically identifies the mathematical variables required for each task (e.g., floor area, concrete volume) and prompts the user for specific dimensions. Finally, the Scheduling Agent bridges the gap between natural language understanding and strict mathematical optimization. It translates the gathered dependencies, formulas, and user inputs into a constraint programming model solved by Google OR-Tools. This ensures the final schedule is not just a statistical guess, but a mathematically proven, optimal timeline respecting all physical construction constraints.
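
To illustrate how a task duration can be computed from user-supplied dimensions rather than guessed, here is a minimal Python sketch. The formula shape (quantity divided by daily output, rounded up to whole days), the function name, and the numbers are all illustrative assumptions; the actual formulas live in the project's knowledge graph and are not shown on this page.

```python
import math


def task_duration_days(quantity: float, rate_per_crew_day: float, crews: int = 1) -> int:
    """Whole working days needed to complete `quantity` units of work.

    Assumed (hypothetical) formula: ceil(quantity / (rate_per_crew_day * crews)).
    """
    if quantity < 0:
        raise ValueError("quantity must be non-negative")
    if rate_per_crew_day <= 0 or crews <= 0:
        raise ValueError("rate and crews must be positive")
    return math.ceil(quantity / (rate_per_crew_day * crews))


# Hypothetical input: 120 m3 of concrete, 25 m3 per crew-day, 2 crews.
slab_days = task_duration_days(quantity=120.0, rate_per_crew_day=25.0, crews=2)
print(slab_days)  # 120 / 50 = 2.4, rounded up to 3 days
```

Rounding up keeps every duration an integer, which suits a constraint programming model that works on integer variables.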

Conclusions & Future Scope

The resulting system successfully demonstrates that Large Language Models can be effectively bounded and directed to perform rigorous engineering planning when paired with symbolic AI (Knowledge Graphs) and strict mathematical solvers (OR-Tools). User testing indicates a near 90% reduction in the time required to generate a preliminary project baseline schedule, moving from a multi-day manual process to a 5-minute conversational interaction. Furthermore, the integration of a dual-model real-time voice interface (utilizing Whisper tiny.en for live previews and medium.en for high-accuracy finalization) drastically improves system accessibility, allowing site managers to initiate planning without needing specialized software training. Future iterations of this architecture could be expanded to include resource leveling (optimizing for fixed crew sizes), real-time cost estimation dynamically linked to the schedule, and continuous schedule updating based on daily voice-reported site progress.

Solution Architecture

A full-stack AI pipeline from voice input to an optimized construction schedule, powered by LangGraph state machines and semantic graph retrieval.

🎤
1
Voice Input
Web Speech API converts speech to text
🧠
2
Intent Agent
Extracts structured project requirements
🗃️
3
Phase Agent
Neo4j graph retrieval for WBS template
📋
4
Details Agent
Gathers task parameters interactively
⚙️
5
Scheduler
OR-Tools generates optimal schedule
📊
6
Gantt Chart
Interactive React dashboard output
LangGraph State Machine
Agents are nodes in a directed graph with typed shared state (AgentState). The router node dynamically directs flow between agents based on the current workflow stage.
Neo4j Knowledge Graph
Work breakdown structure templates are stored as a hierarchical graph (Project → Phase → Package → Task). Cypher queries retrieve the best-matching template for the user's project type.
OR-Tools Optimization
Google's constraint programming solver handles task dependencies (FS/SS/FF/SF), resource allocation, and duration formulas to produce a globally optimal construction schedule.
SSE Real-Time Streaming
FastAPI endpoints stream agent responses character-by-character using Server-Sent Events, creating a real-time typewriter effect and live stage updates in the React UI.
Human-in-the-Loop
LangGraph's interrupt() mechanism pauses the workflow at critical points (e.g., intent confirmation, phase selection) to await user validation before proceeding.
Voice Interface
Browser-native Web Speech API (no external service or API key needed) provides continuous voice recognition with real-time interim results injected directly into the chat input.
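
As a rough illustration of the SSE streaming described above, the following Python sketch formats Server-Sent Events frames using only the standard library. The event names (`stage`, `token`, `done`) and the JSON payload shape are assumptions made for this example, not the project's actual wire format.

```python
import json
from typing import Iterable, Iterator, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Format one SSE frame: `field: value` lines ended by a blank line.

    A multi-line payload needs one `data:` line per line of text.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


def stream_reply(chunks: Iterable[str], stage: str) -> Iterator[str]:
    """Yield a stage update, one frame per text chunk, then a done marker."""
    yield sse_event(json.dumps({"stage": stage}), event="stage")
    for chunk in chunks:
        yield sse_event(json.dumps({"text": chunk}), event="token")
    yield sse_event("{}", event="done")


# Iterating over a string yields single characters, giving the typewriter effect.
frames = list(stream_reply("Hi!", stage="intent"))
print("".join(frames))
```

In a FastAPI backend, a generator like `stream_reply` could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`. Encoding each chunk as JSON keeps newlines inside the text from breaking the frame boundaries.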

AI Agent Roles

Four specialized agents collaborate through a shared LangGraph state to guide the user from voice input to a complete construction schedule.

STAGE 1
🎯 Intent Agent
Conducts a natural conversation to collect project requirements. Extracts structured intent (project type, size, floors, location, timeline) and confirms with the user before proceeding.
submit_intent GPT-5-nano interrupt()
STAGE 2
📐 Phase Agent
Queries the Neo4j knowledge graph to find the most suitable WBS template. Uses a Cypher QA chain to retrieve the full hierarchy of phases, work packages, and sub-tasks.
GraphCypherQA Neo4j structured_output
STAGE 3
📋 Details Agent
Fetches task-specific parameters (productivity rates, formulas) from Neo4j and interactively queries the user for dimension variables needed to compute accurate task durations.
duration_formulas variable_extraction interrupt()
STAGE 4
⚙️ Scheduling Agent
Applies OR-Tools constraint programming to generate an optimal schedule respecting all task dependencies (FS/SS/FF/SF relationships) and resource constraints from the knowledge graph.
ortools dependency_graph CP-SAT
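
The four-stage flow above can be sketched in plain Python as a router over a shared state dictionary. This is a simplified stand-in and does not use the LangGraph API: only the `AgentState` name comes from the project description, while the state fields, stage names, and stub agent bodies are hypothetical.

```python
from typing import Callable, Dict, List, Tuple, TypedDict


class AgentState(TypedDict, total=False):
    stage: str       # current workflow stage, read by the router
    intent: dict     # filled by the intent agent
    wbs: list        # filled by the phase agent
    durations: dict  # filled by the details agent
    schedule: list   # filled by the scheduling agent


# Stub agents: each returns an updated copy of the state and names the next stage.
def intent_agent(state: AgentState) -> AgentState:
    return {**state, "intent": {"project_type": "residential"}, "stage": "phase"}


def phase_agent(state: AgentState) -> AgentState:
    return {**state, "wbs": ["Excavation", "Footings"], "stage": "details"}


def details_agent(state: AgentState) -> AgentState:
    return {**state, "durations": {"Excavation": 3, "Footings": 4}, "stage": "scheduling"}


def scheduling_agent(state: AgentState) -> AgentState:
    return {**state, "schedule": [("Excavation", 0), ("Footings", 3)], "stage": "done"}


AGENTS: Dict[str, Callable[[AgentState], AgentState]] = {
    "intent": intent_agent,
    "phase": phase_agent,
    "details": details_agent,
    "scheduling": scheduling_agent,
}


def router(state: AgentState) -> str:
    """Pick the next node from the current stage; a new session starts at intent."""
    return state.get("stage", "intent")


def run_workflow(state: AgentState) -> Tuple[AgentState, List[str]]:
    visited: List[str] = []
    while (node := router(state)) != "done":
        visited.append(node)
        state = AGENTS[node](state)
    return state, visited


final_state, visited = run_workflow({})
print(visited)
```

In LangGraph itself, the same idea would typically be expressed with a `StateGraph`, nodes added per agent, and conditional edges driven by a routing function, with `interrupt()` pausing between stages for user confirmation.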

Methodology

🔬 Knowledge Graph Construction
  • Curated industry-standard WBS templates for residential, commercial, industrial, and infrastructure projects
  • Task nodes contain productivity rates, duration formulas, and dependency relationship types
  • Graph traversal retrieves full hierarchical project trees via recursive Cypher queries
  • Dependency resolution maps FS/SS/FF/SF relationships with lag days between tasks
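
The hierarchical retrieval can be pictured with a small in-memory tree. The sketch below is plain Python rather than Cypher, and the node shape (`name`, `children`) and the sample template are invented for illustration; only the four level names follow the hierarchy described on this page.

```python
from typing import Iterator, Tuple

# Level names follow the Project -> Phase -> Package -> Task hierarchy.
LEVELS = ("Project", "Phase", "Package", "Task")


def walk(node: dict, depth: int = 0, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Depth-first traversal yielding (level, path) for every node in the tree."""
    here = path + (node["name"],)
    yield LEVELS[depth], here
    for child in node.get("children", []):
        yield from walk(child, depth + 1, here)


# Hypothetical template for a small residential project.
template = {
    "name": "Two-storey house",
    "children": [
        {"name": "Substructure", "children": [
            {"name": "Foundations", "children": [
                {"name": "Excavation"},
                {"name": "Footing concrete"},
            ]},
        ]},
        {"name": "Superstructure", "children": [
            {"name": "Frame", "children": [
                {"name": "Columns"},
            ]},
        ]},
    ],
}

nodes = list(walk(template))
tasks = [" / ".join(path) for level, path in nodes if level == "Task"]
for line in tasks:
    print(line)
```

Keeping the full path with each task preserves its phase and package assignment, which the schedule output later needs for grouping.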
🤖 LLM Integration Strategy
  • Structured output parsing (Pydantic models) ensures reliable data extraction from LLM responses
  • Each agent operates with a carefully designed system prompt and tool suite
  • LangGraph's MemorySaver enables persistent conversation state across multi-turn interactions
  • Human-in-the-loop interrupts at stage boundaries prevent erroneous autonomous progression
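
A minimal sketch of the structured-output idea follows. It uses a standard-library dataclass as a stand-in for a Pydantic model so that it runs without dependencies; the field names, units, and validation rules are assumptions for this example.

```python
import json
from dataclasses import dataclass

# Project categories named in the knowledge graph section of this page.
ALLOWED_TYPES = {"residential", "commercial", "industrial", "infrastructure"}


@dataclass(frozen=True)
class ProjectIntent:
    project_type: str
    floor_area_m2: float
    floors: int
    location: str

    def __post_init__(self) -> None:
        if self.project_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown project type: {self.project_type!r}")
        if self.floor_area_m2 <= 0:
            raise ValueError("floor_area_m2 must be positive")
        if self.floors < 1:
            raise ValueError("floors must be at least 1")


def parse_intent(raw: str) -> ProjectIntent:
    """Turn a JSON string from the model into a validated ProjectIntent.

    Any malformed, missing, or out-of-range field raises ValueError, so the
    caller can ask the user again instead of passing bad data downstream.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    try:
        return ProjectIntent(
            project_type=str(data["project_type"]).lower(),
            floor_area_m2=float(data["floor_area_m2"]),
            floors=int(data["floors"]),
            location=str(data["location"]),
        )
    except KeyError as missing:
        raise ValueError(f"missing field: {missing}") from None


reply = '{"project_type": "Residential", "floor_area_m2": 180, "floors": 2, "location": "Kandy"}'
intent = parse_intent(reply)
print(intent)
```

Rejecting invalid replies at this boundary is what lets later stages treat the extracted intent as trustworthy input.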
📅 Scheduling Algorithm
  • Duration formulas use project-specific dimensions (area, volume) gathered dynamically from users
  • OR-Tools CP-SAT solver handles complex dependency networks with lag constraints
  • Schedule output includes start/end dates, phase assignments, and critical path identification
  • Results rendered as an interactive Gantt chart with zoom, pan, and phase color-coding
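
To make the dependency handling concrete, the sketch below turns each FS/SS/FF/SF link with a lag into a lower bound on the successor's start day and computes earliest starts in topological order. It is a standard-library stand-in covering precedence constraints only, not the project's CP-SAT model, and the task names, durations, and links are invented.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

durations = {"excavate": 3, "footings": 4, "cure": 2, "backfill": 2}

# (predecessor, successor, link type, lag in days)
links = [
    ("excavate", "footings", "FS", 0),  # footings start after excavation finishes
    ("footings", "cure", "SS", 1),      # curing starts 1 day after footings start
    ("cure", "backfill", "FF", 1),      # backfill finishes 1 day after curing finishes
]


def earliest_starts(durations: Dict[str, int],
                    links: List[Tuple[str, str, str, int]]) -> Dict[str, int]:
    """Earliest start day per task, given precedence links with lags."""
    preds: Dict[str, list] = {task: [] for task in durations}
    for pred, succ, kind, lag in links:
        preds[succ].append((pred, kind, lag))

    # Map each task to its predecessors so they are scheduled first.
    graph = {task: [p for p, _, _ in preds[task]] for task in durations}
    start: Dict[str, int] = {}
    for task in TopologicalSorter(graph).static_order():
        own = durations[task]
        bounds = [0]  # no task starts before day 0
        for pred, kind, lag in preds[task]:
            p_start = start[pred]
            p_end = p_start + durations[pred]
            bounds.append({
                "FS": p_end + lag,          # start >= pred finish + lag
                "SS": p_start + lag,        # start >= pred start + lag
                "FF": p_end + lag - own,    # finish >= pred finish + lag
                "SF": p_start + lag - own,  # finish >= pred start + lag
            }[kind])
        start[task] = max(bounds)
    return start


start = earliest_starts(durations, links)
makespan = max(start[t] + durations[t] for t in durations)
print(start, makespan)
```

With precedence links alone, this forward pass already gives the shortest possible makespan. A constraint solver becomes necessary once resource limits or other side constraints couple tasks that have no direct dependency.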
🎤 Voice Interface Design
  • Browser-native Web Speech API — free, no external service, powered by Google's speech engine
  • Continuous mode with interim results for real-time feedback during dictation
  • Graceful fallback to text input in non-supporting browsers (Firefox, Safari)
  • Visual recording state indicator with pulsing animation to signal active microphone

Technology Stack

🧠 AI & Backend
🦜
LangGraph / LangChain
Multi-agent orchestration & state management
🧠
OpenAI GPT-5-nano
Language model for all four agents
FastAPI
REST API + SSE streaming backend
⚙️
Google OR-Tools
Constraint programming scheduler
🗃️ Data & Knowledge
🌿
Neo4j Graph Database
WBS knowledge graph & Cypher QA
🔗
LangChain Neo4j
GraphCypherQAChain integration
🛡️
Pydantic v2
Structured output validation & parsing
📦
python-dotenv
Environment configuration management
🌐 Frontend
⚛️
React 19 + Vite
Chat UI, Gantt chart, task board
🎤
Web Speech API
Browser-native voice recognition
📡
Server-Sent Events
Real-time agent response streaming
🎨
Vanilla CSS
Custom dark-mode design system

Our Team

Department of Computer Engineering · Faculty of Engineering · University of Peradeniya

Students
Ahemad I.I
E/20/009
Ketharagan P
E/20/199
Rismy A.M
E/20/339
Supervisors
Dr. Isuru Nawinne
Department of Computer Engineering
University of Peradeniya
Kandy, Sri Lanka
Dr. Upul Jayasinghe
Department of Computer Engineering
University of Peradeniya
Kandy, Sri Lanka
Dr. Damayanthi Herath
Department of Computer Engineering
University of Peradeniya
Kandy, Sri Lanka
Dr. Ruwini Edirisinghe
School of Property, Construction and Project Management
RMIT University
Melbourne, Australia
Dr. Frank Boukamp
School of Property, Construction and Project Management
RMIT University
Melbourne, Australia