Voice-Enabled LLM Agentic Construction Scheduler

Research Overview

Abstract

Construction project scheduling is a complex, time-intensive process that typically requires domain expertise and specialized software. Our research presents a novel approach: an autonomous multi-agent AI system that converts a user's spoken project description into a fully optimized construction schedule, with zero manual scheduling input.

The system leverages a LangGraph-based multi-agent architecture where specialized AI agents collaborate through a shared state graph. Users speak their construction project requirements, which are converted to text via the Web Speech API. An intent agent extracts structured project data, a phase agent retrieves relevant WBS templates from a Neo4j knowledge graph, a details agent gathers task-specific parameters through an interactive dialogue, and a scheduling agent uses Google OR-Tools to generate a constraint-satisfying, dependency-respecting schedule.

The result is a Gantt chart rendered in real time on a React dashboard, with Server-Sent Events (SSE) streaming agent responses live. This research aims to democratize construction project management by making AI-powered scheduling accessible through natural conversation.

🎤

Voice Input

Natural language project description

🤖

4 AI Agents

Collaborative multi-agent workflow

🗃️

Knowledge Graph

Neo4j WBS template library

📅

Auto Schedule

OR-Tools optimized Gantt chart

Detailed Breakdown

Project Summary & Conclusions

The Scheduling Bottleneck

In the construction industry, generating an accurate and optimized project schedule is a notoriously tedious task. Project managers historically spend countless hours manually defining Work Breakdown Structures (WBS), estimating task durations based on labor productivity rates, and mapping out complex logical dependencies (such as Finish-to-Start or Start-to-Start relationships). This manual dependency mapping is prone to human error, often resulting in delayed timelines, misallocated resources, and cascading budget overruns. The primary motivation for this research project was to eliminate this bottleneck by introducing an intuitive, conversational interface capable of executing complex scheduling logic behind the scenes.

Methodology & Implementation

Our solution is built on a multi-agent architecture orchestrated by LangGraph. Instead of relying on a single monolithic LLM prompt, which often hallucinates or loses context in complex constraint-satisfaction problems, we distributed the cognitive load across four specialized agents. The Intent Agent acts as the conversational entry point: it uses a local Faster Whisper model to transcribe voice input in real time with sub-second latency, then extracts the high-level project parameters. The Phase Agent then acts as a retrieval engine, querying a Neo4j Knowledge Graph to instantiate a historically accurate WBS template tailored to the specific building type.

Crucially, the system does not hallucinate task durations. The Details Agent dynamically identifies the mathematical variables required for each task (e.g., floor area, concrete volume) and prompts the user for specific dimensions. Finally, the Scheduling Agent bridges the gap between natural language understanding and strict mathematical optimization. It translates the gathered dependencies, formulas, and user inputs into a constraint programming model solved by Google OR-Tools. This ensures the final schedule is not just a statistical guess, but a mathematically proven, optimal timeline respecting all physical construction constraints.
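As a minimal sketch of how a task duration can be computed from user-supplied dimensions rather than generated by the LLM, assume each task stores a productivity rate (quantity completed per crew per working day) and a crew count. The names `TaskRate` and `task_duration_days`, and the sample figures, are illustrative assumptions and are not taken from the project's knowledge graph.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskRate:
    """Hypothetical per-task parameters a knowledge-graph node might hold."""
    name: str
    variable: str             # dimension the user must supply
    rate_per_crew_day: float  # quantity one crew completes per working day
    crews: int = 1


def task_duration_days(task: TaskRate, inputs: dict[str, float]) -> int:
    """Duration = quantity / (rate * crews), rounded up to whole working days."""
    if task.variable not in inputs:
        raise KeyError(f"missing dimension: {task.variable}")
    quantity = inputs[task.variable]
    if quantity < 0 or task.rate_per_crew_day <= 0 or task.crews <= 0:
        raise ValueError("quantity must be non-negative; rate and crews positive")
    # Every task occupies at least one working day in this sketch.
    return max(1, math.ceil(quantity / (task.rate_per_crew_day * task.crews)))


# Assumed example: 250 m3 of concrete, two crews at 40 m3 per crew-day.
slab = TaskRate("Pour ground slab", "concrete_volume_m3", rate_per_crew_day=40.0, crews=2)
print(task_duration_days(slab, {"concrete_volume_m3": 250.0}))  # 250 / 80 = 3.125 -> 4
```

The missing-dimension error is the point at which an agent like the Details Agent would prompt the user instead of guessing a value.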

Conclusions & Future Scope

The resulting system demonstrates that Large Language Models can be effectively bounded and directed to perform rigorous engineering planning when paired with symbolic AI (Knowledge Graphs) and strict mathematical solvers (OR-Tools). User testing indicates a nearly 90% reduction in the time required to generate a preliminary project baseline schedule, moving from a multi-day manual process to a 5-minute conversational interaction. Furthermore, the dual-model real-time voice interface (Whisper tiny.en for live previews and medium.en for high-accuracy finalization) improves system accessibility, allowing site managers to initiate planning without specialized software training. Future iterations of this architecture could add resource leveling (optimizing for fixed crew sizes), real-time cost estimation dynamically linked to the schedule, and continuous schedule updating based on daily voice-reported site progress.

System Design

Solution Architecture

A full-stack AI pipeline from voice input to an optimized construction schedule, powered by LangGraph state machines and semantic graph retrieval.

🎤

1

Voice Input

Web Speech API converts speech to text

🧠

2

Intent Agent

Extracts structured project requirements

🗃️

3

Phase Agent

Neo4j graph retrieval for WBS template

📋

4

Details Agent

Gathers task parameters interactively

⚙️

5

Scheduler

OR-Tools generates optimal schedule

📊

6

Gantt Chart

Interactive React dashboard output

LangGraph State Machine

Agents are nodes in a directed graph with typed shared state (AgentState). The router node dynamically directs flow between agents based on the current workflow stage.
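The routing idea can be illustrated without the library. The following is a plain-Python stand-in, not LangGraph's API: the fields of `AgentState`, the stage names, and the stub agents are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypedDict


class AgentState(TypedDict, total=False):
    """Typed shared state; the field names here are assumed, not the project's."""
    stage: str       # current workflow stage, read by the router
    intent: dict
    phases: list
    durations: dict
    schedule: list


# Stub agents: each returns a new state and advances the stage.
def intent_agent(state: AgentState) -> AgentState:
    return {**state, "intent": {"project_type": "residential"}, "stage": "phase"}


def phase_agent(state: AgentState) -> AgentState:
    return {**state, "phases": ["Substructure", "Superstructure"], "stage": "details"}


def details_agent(state: AgentState) -> AgentState:
    return {**state, "durations": {"Excavation": 5}, "stage": "schedule"}


def scheduling_agent(state: AgentState) -> AgentState:
    return {**state, "schedule": [("Excavation", 0, 5)], "stage": "done"}


NODES: Dict[str, Callable[[AgentState], AgentState]] = {
    "intent": intent_agent,
    "phase": phase_agent,
    "details": details_agent,
    "schedule": scheduling_agent,
}


def route(state: AgentState) -> Optional[str]:
    """Router: choose the next node from the current stage, or None to stop."""
    stage = state.get("stage", "intent")
    if stage == "done":
        return None
    if stage not in NODES:
        raise ValueError(f"unknown stage: {stage}")
    return stage


def run(state: AgentState) -> Tuple[AgentState, List[str]]:
    """Drive the graph until the router reports completion."""
    visited: List[str] = []
    while (node := route(state)) is not None:
        visited.append(node)
        state = NODES[node](state)
    return state, visited


final_state, visited = run({"stage": "intent"})
print(visited)
```

Because routing depends only on the shared state, a workflow resumed from a saved state with `stage` set to `"details"` would skip the first two agents.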

Neo4j Knowledge Graph

Work breakdown structure templates are stored as a hierarchical graph (Project → Phase → Package → Task). Cypher queries retrieve the best-matching template for the user's project type.
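To make the four-level hierarchy concrete, the sketch below walks an in-memory template shaped like the Project → Phase → Package → Task tree. It is a stand-in for the result of a graph query, not the Cypher query itself, and the template contents and the `iter_tasks` helper are hypothetical.

```python
from typing import Iterator, Tuple

# Hypothetical template, shaped like the hierarchy a graph query might return.
template = {
    "project": "Residential building",
    "phases": [
        {"name": "Substructure", "packages": [
            {"name": "Foundations", "tasks": ["Excavation", "Footing concrete"]},
        ]},
        {"name": "Superstructure", "packages": [
            {"name": "Frame", "tasks": ["Columns", "Slabs"]},
        ]},
    ],
}


def iter_tasks(tpl: dict) -> Iterator[Tuple[str, str, str, str]]:
    """Yield one (project, phase, package, task) row per leaf task."""
    for phase in tpl["phases"]:
        for package in phase["packages"]:
            for task in package["tasks"]:
                yield (tpl["project"], phase["name"], package["name"], task)


for row in iter_tasks(template):
    print(" > ".join(row))
```

Flattening the tree this way gives each task its phase and package context, which later stages can use for phase assignments in the schedule.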

OR-Tools Optimization

Google's constraint programming solver handles task dependencies (FS/SS/FF/SF), resource allocation, and duration formulas to produce a globally optimal construction schedule.
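Each of the four dependency types reduces to one inequality on start and finish times. The sketch below applies those inequalities in a simple forward pass over a topologically sorted task list. This is a deliberately simpler technique than the CP-SAT model, which would add the same inequalities as linear constraints; the task names, durations, and lags are made up for illustration.

```python
from graphlib import TopologicalSorter

durations = {"excavate": 5, "formwork": 4, "rebar": 3, "pour": 2}
links = [  # (predecessor, successor, type, lag_days)
    ("excavate", "formwork", "FS", 0),
    ("formwork", "rebar", "SS", 2),
    ("rebar", "pour", "FS", 1),
    ("formwork", "pour", "FF", 3),
]


def earliest_starts(durations: dict, links: list) -> dict:
    """Earliest start day per task under FS/SS/FF/SF links with lag."""
    preds = {task: set() for task in durations}
    for a, b, _, _ in links:
        preds[b].add(a)
    start = {}
    # static_order() yields every predecessor before its successors.
    for task in TopologicalSorter(preds).static_order():
        bounds = [0]
        for a, b, kind, lag in links:
            if b != task:
                continue
            a_start, a_finish = start[a], start[a] + durations[a]
            if kind == "FS":    # start_b  >= finish_a + lag
                bounds.append(a_finish + lag)
            elif kind == "SS":  # start_b  >= start_a + lag
                bounds.append(a_start + lag)
            elif kind == "FF":  # finish_b >= finish_a + lag
                bounds.append(a_finish + lag - durations[b])
            elif kind == "SF":  # finish_b >= start_a + lag
                bounds.append(a_start + lag - durations[b])
            else:
                raise ValueError(f"unknown link type: {kind}")
        start[task] = max(bounds)
    return start


print(earliest_starts(durations, links))
```

A forward pass only handles precedence. Resource limits, which the solver also handles, cannot be expressed this way, and that is one reason to prefer a constraint programming model.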

SSE Real-Time Streaming

FastAPI endpoints stream agent responses character-by-character using Server-Sent Events, creating a real-time typewriter effect and live stage updates in the React UI.
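On the wire, each Server-Sent Event is a short text frame ending in a blank line. The sketch below builds such frames for a per-character stream using only the standard library. The event names (`stage`, `token`, `done`) and the JSON payload shape are assumptions for illustration, not the project's actual protocol.

```python
import json
from typing import Iterator


def sse_event(data: str, event: str = "message") -> str:
    """Format one SSE frame; multi-line data gets one 'data:' line per line."""
    lines = [f"event: {event}"]
    lines += [f"data: {line}" for line in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def stream_reply(text: str, stage: str) -> Iterator[str]:
    """Yield a stage update, then one frame per character, then a final marker."""
    yield sse_event(json.dumps({"stage": stage}), event="stage")
    for ch in text:
        # JSON-encoding keeps newlines in the text from breaking the framing.
        yield sse_event(json.dumps({"token": ch}), event="token")
    yield sse_event("[DONE]", event="done")


for frame in stream_reply("Hi", "intent"):
    print(repr(frame))
```

In a FastAPI endpoint, a generator like this could be returned through `StreamingResponse` with `media_type="text/event-stream"`, and the React client would append each token as it arrives.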

Human-in-the-Loop

LangGraph's interrupt() mechanism pauses the workflow at critical points (e.g., intent confirmation, phase selection) to await user validation before proceeding.

Voice Interface

Browser-native Web Speech API (no external service or API key needed) provides continuous voice recognition with real-time interim results injected directly into the chat input.

Multi-Agent System

AI Agent Roles

Four specialized agents collaborate through a shared LangGraph state to guide the user from voice input to a complete construction schedule.

STAGE 1

🎯 Intent Agent

Conducts a natural conversation to collect project requirements. Extracts structured intent (project type, size, floors, location, timeline) and confirms with the user before proceeding.

submit_intent GPT-5-nano interrupt()

STAGE 2

📐 Phase Agent

Queries the Neo4j knowledge graph to find the most suitable WBS template. Uses a Cypher QA chain to retrieve the full hierarchy of phases, work packages, and sub-tasks.

GraphCypherQA Neo4j structured_output

STAGE 3

📋 Details Agent

Fetches task-specific parameters (productivity rates, formulas) from Neo4j and interactively queries the user for dimension variables needed to compute accurate task durations.

duration_formulas variable_extraction interrupt()

STAGE 4

⚙️ Scheduling Agent

Applies OR-Tools constraint programming to generate an optimal schedule respecting all task dependencies (FS/SS/FF/SF relationships) and resource constraints from the knowledge graph.

ortools dependency_graph CP-SAT

Research Approach

Methodology

🔬 Knowledge Graph Construction

Curated industry-standard WBS templates for residential, commercial, industrial, and infrastructure projects
Task nodes contain productivity rates, duration formulas, and dependency relationship types
Graph traversal retrieves full hierarchical project trees via recursive Cypher queries
Dependency resolution maps FS/SS/FF/SF relationships with lag days between tasks

🤖 LLM Integration Strategy

Structured output parsing (Pydantic models) ensures reliable data extraction from LLM responses
Each agent operates with a carefully designed system prompt and tool suite
LangGraph's MemorySaver enables persistent conversation state across multi-turn interactions
Human-in-the-loop interrupts at stage boundaries prevent erroneous autonomous progression
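The value of structured output parsing is that malformed LLM responses are rejected before they reach the scheduler. The sketch below shows the idea with a standard-library dataclass as a stand-in for a Pydantic model. The `ProjectIntent` fields and the validation rules are assumptions; the allowed project types are the four template categories named above.

```python
import json
from dataclasses import dataclass

ALLOWED_TYPES = {"residential", "commercial", "industrial", "infrastructure"}


@dataclass(frozen=True)
class ProjectIntent:
    """Hypothetical schema for the intent extracted from the conversation."""
    project_type: str
    floors: int
    area_m2: float
    location: str

    def __post_init__(self) -> None:
        if self.project_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported project type: {self.project_type}")
        if self.floors < 1:
            raise ValueError("floors must be at least 1")
        if self.area_m2 <= 0:
            raise ValueError("area_m2 must be positive")


def parse_intent(raw: str) -> ProjectIntent:
    """Parse an LLM's JSON reply and coerce it into a validated intent."""
    data = json.loads(raw)  # invalid JSON raises a ValueError subclass
    try:
        return ProjectIntent(
            project_type=str(data["project_type"]).lower(),
            floors=int(data["floors"]),
            area_m2=float(data["area_m2"]),
            location=str(data["location"]),
        )
    except KeyError as exc:
        raise ValueError(f"missing field: {exc.args[0]}") from exc


reply = '{"project_type": "Residential", "floors": "3", "area_m2": 450, "location": "Pune"}'
print(parse_intent(reply))
```

A validation failure is a natural trigger for re-prompting the model or asking the user, rather than proceeding with a partial intent.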

📅 Scheduling Algorithm

Duration formulas use project-specific dimensions (area, volume) gathered dynamically from users
OR-Tools CP-SAT solver handles complex dependency networks with lag constraints
Schedule output includes start/end dates, phase assignments, and critical path identification
Results rendered as an interactive Gantt chart with zoom, pan, and phase color-coding
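Critical path identification can be illustrated with the classic forward and backward pass: a task is critical when its earliest and latest start coincide. The sketch below is restricted to Finish-to-Start links with zero lag for brevity and is not the project's solver-based method; the four-task network is invented for illustration.

```python
from graphlib import TopologicalSorter

durations = {"A": 3, "B": 2, "C": 4, "D": 1}
# Finish-to-Start predecessors of each task (every task needs a key).
preds = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}


def critical_path(durations: dict, preds: dict) -> tuple:
    """Return (project duration, tasks with zero slack in topological order)."""
    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: earliest start is the latest finish among predecessors.
    es = {}
    for t in order:
        es[t] = max((es[p] + durations[p] for p in preds[t]), default=0)
    makespan = max(es[t] + durations[t] for t in order)

    # Backward pass: latest finish is the earliest latest-start among successors.
    succs = {t: [s for s in order if t in preds[s]] for t in order}
    lf = {}
    for t in reversed(order):
        lf[t] = min((lf[s] - durations[s] for s in succs[t]), default=makespan)

    # Zero slack means latest start equals earliest start.
    critical = [t for t in order if lf[t] - durations[t] == es[t]]
    return makespan, critical


print(critical_path(durations, preds))  # (8, ['A', 'C', 'D'])
```

Here task B has two days of slack, so delaying it by up to two days leaves the eight-day project duration unchanged.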

🎤 Voice Interface Design

Browser-native Web Speech API: free, no external service, powered by Google's speech engine
Continuous mode with interim results for real-time feedback during dictation
Graceful fallback to text input in non-supporting browsers (Firefox, Safari)
Visual recording state indicator with pulsing animation to signal active microphone

Implementation