The Challenge: Securing Critical Infrastructure
Modern airport surveillance networks are critical infrastructure under severe bandwidth pressure as they scale to hundreds of high-definition cameras and IoT sensors. Traditional manual troubleshooting by network administrators is fundamentally reactive: static CLI interventions are often too slow, risking the loss of critical video feeds during congestion events.
Our Solution: We built a distributed, generative Digital Twin—a high-fidelity virtual replica of the airport's network. By combining Graph Neural Networks (GNNs) for rapid performance prediction with a Large Language Model (LLM) utilizing Retrieval-Augmented Generation (RAG), administrators can now manage complex camera networks using simple, natural language commands. Before any AI-generated JSON configuration is deployed, the GNN acts as a strict mathematical guardrail to pre-verify that the changes are physically safe.
Core Contributions
🛰 High-Fidelity Data Plane
A fully automated ns-3 C++ simulation environment mirrors a real airport's hierarchical edge-to-cloud topology, generating 1,000 labeled fault scenarios covering peak traffic, rain fade, and node failures.
🧠 ZoneAwareGNN
A custom GATv2 graph attention network formulates the live network as a directed graph to predict latency, jitter, throughput, and packet loss in under 30 ms—replacing hours of simulation at runtime.
🗣 RAG-Augmented Intent Engine
Natural language administrator intents are grounded in 7,004 embedded simulation states via ChromaDB, enabling the Gemini 2.5 Flash-Lite LLM to generate valid, topology-aware JSON configurations while sharply curbing hallucination.
🔒 Closed-Loop Safety Guardrail
Before any config is deployed, the GNN performs a forward pass on the hypothetical topology. Configurations that violate SLA thresholds are immediately flagged, and the LLM iteratively refines its proposal until the predicted QoS is safe.
System Architecture & Methodology
Our system operates in a continuous, closed-loop cycle consisting of three core technological pillars designed to bridge the gap between natural language intents and strict network physics:
1. High-Fidelity Data Plane
We utilized the ns-3 discrete-event simulator to create a realistic physical baseline, generating 1,000 unique scenarios. This data captures continuous UDP datagrams, bursty TCP background traffic, hardware failures, and weather-induced edge degradations.
2. Predictive AI (ZoneAwareGNN)
Our custom Graph Attention Network (GATv2) formulates the network state as a directed graph. Utilizing 4 independent attention heads and a 128-dimensional latent space, it predicts continuous Quality of Service (QoS) metrics like latency, jitter, throughput, and packet loss.
3. Generative AIOps Pipeline
A ChromaDB vector database stores 7,004 embedded historical network states. When an administrator issues a command, the Gemini 2.5 Flash-Lite LLM retrieves this context to autonomously synthesize and iteratively evaluate safe JSON configurations.
Network Topology Overview
The simulated airport network follows a strict three-tier hierarchy, forcing congestion at precisely the right chokepoints to generate a diverse, realistic training corpus:
- cloud-server-01: central Cloud/NOC server
- gw-core-01: core gateway aggregating all zone traffic over the 1 Gbps WAN uplink
- 5 operational zones × 10 HD cameras each, connected to the core via 10 Gbps edge links
Scenario Generation & Fault Injection
To ensure the GNN generalizes across a vast state-space, a Python orchestrator programmatically injected three categories of stress conditions before each of the 1,000 ns-3 runs:
| Context Type | Parameters | Purpose |
|---|---|---|
| Temporal (Peak/Off-Peak) | 10–30 background flows (off-peak) · up to 100 flows (peak) | Simulate crowded terminals & emergency loads that saturate the 1 Gbps WAN link |
| Weather / Rain Fade | Clear · Rain · Storm contexts via `RateErrorModel` | Increase bit-error rates on perimeter wireless cameras, forcing packet loss before gateway aggregation |
| Hardware Failures | Random node shutdowns via `ipv4->SetDown()` | Force mid-simulation routing reconvergence and generate highly dynamic transient states |
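Below is a minimal sketch of such an orchestrator. The exact peak-flow range above 30, the directory layout, and the `ns3-airport-sim` binary name are illustrative assumptions, not the project's actual interface.

```python
import json
import os
import random
import subprocess

WEATHER_CONTEXTS = ["clear", "rain", "storm"]  # mapped to RateErrorModel bit-error rates

def make_scenario(idx: int) -> dict:
    """Draw one stress context using the parameter ranges from the table above."""
    peak = random.random() < 0.5
    return {
        "scenario_id": idx,
        # Temporal context: 10-30 background flows off-peak, up to 100 at peak
        "background_flows": random.randint(31, 100) if peak else random.randint(10, 30),
        "weather": random.choice(WEATHER_CONTEXTS),
        # Hardware failure: optionally schedule a random node shutdown (ipv4->SetDown())
        "inject_node_failure": random.random() < 0.3,
    }

os.makedirs("scenarios", exist_ok=True)
for i in range(1000):
    path = f"scenarios/scenario_{i:04d}.json"
    with open(path, "w") as f:
        json.dump(make_scenario(i), f)
    # Each C++ run reads its JSON context and emits one labeled QoS sample
    subprocess.run(["./ns3-airport-sim", path], check=True)
```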
ZoneAwareGNN: Predictive Network Intelligence
To capture the non-Euclidean spatial dependencies of the airport network, we designed a custom Graph Attention Network v2 (GATv2) architecture. Unlike time-series models (ARIMA, RNN), GNNs leverage message-passing to aggregate congestion signals across the entire topology simultaneously.
Graph Formulation
The live network state is encoded as a directed graph G = (V, E). Before entering the GATv2 core, each input type is projected into a shared 128-dimensional latent space via independent MLP encoders:
| Input Type | Raw Dimension | Encoded Dimension | Contents |
|---|---|---|---|
| Node features x_v | ℝ³² | 128-d | Queue occupancy, processing load — 2-layer MLP + ReLU |
| Edge features e_uv | ℝ¹⁶ | 128-d | Physical link capacity, dynamic bandwidth utilisation — 1-layer MLP |
| Flow features f_i | ℝ²⁴ | 128-d | Video stream bitrate, sensor burst demand — 1-layer MLP |
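For concreteness, a minimal PyTorch sketch of the three encoders; the layer widths follow the table, while activation placement is an assumption.

```python
import torch
from torch import nn

class FeatureEncoders(nn.Module):
    """Project raw node/edge/flow features into the shared 128-d latent space."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        # Node features (R^32): 2-layer MLP + ReLU
        self.node_enc = nn.Sequential(
            nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.edge_enc = nn.Linear(16, hidden)   # edge features (R^16): 1-layer MLP
        self.flow_enc = nn.Linear(24, hidden)   # flow features (R^24): 1-layer MLP

    def forward(self, x_v, e_uv, f_i):
        return self.node_enc(x_v), self.edge_enc(e_uv), self.flow_enc(f_i)
```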
Two-Layer GATv2 Message-Passing Core
GATv2 improves upon standard GAT by computing dynamic attention coefficients whose ranking can vary per query node, allowing the model to focus on congested bottleneck links rather than applying the same neighbour ranking everywhere.
Layer 1 — Multi-Head Attention
K = 4 independent heads, outputs concatenated. Each head learns a distinct "view" of the neighbourhood, stabilising training against local optima and capturing diverse congestion patterns simultaneously.
Layer 2 — Spatial Smoother
A single attention head aggregates the concatenated embeddings into the final node representation h_v^out ∈ ℝ¹²⁸. Acts as a low-pass filter, smoothing neighbourhood noise to expose macro-topological structure.
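A hedged PyTorch Geometric sketch of this two-layer core follows. The 128-d width, K = 4 heads, and single-head smoother come from the text; the dropout rate and activation choice are assumptions.

```python
import torch
from torch import nn
from torch_geometric.nn import GATv2Conv

class GATv2Core(nn.Module):
    """Two-layer GATv2 message-passing core over the encoded graph."""
    def __init__(self, hidden: int = 128, heads: int = 4):
        super().__init__()
        # Layer 1: K = 4 heads, concatenated output (4 x 128 dims)
        self.gat1 = GATv2Conv(hidden, hidden, heads=heads, edge_dim=hidden)
        # Layer 2: single-head spatial smoother back to h_v^out in R^128
        self.gat2 = GATv2Conv(hidden * heads, hidden, heads=1, edge_dim=hidden)
        self.drop = nn.Dropout(0.2)  # assumed rate; Dropout is noted under Training Configuration

    def forward(self, h, edge_index, e):
        h = torch.relu(self.gat1(h, edge_index, edge_attr=e))
        h = self.drop(h)
        return self.gat2(h, edge_index, edge_attr=e)
```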
Contextual Readout & Four-Head Regression
Network KPIs are end-to-end flow metrics, not isolated node properties. For each traffic flow i from source s to destination d, a context vector C_i ∈ ℝ³⁸⁴ is assembled by concatenating the encoded flow features f_i with the learned embeddings of both endpoints. Four independent MLP heads then predict simultaneously:
| Prediction Head | Output | Special Treatment | Result |
|---|---|---|---|
| Latency | End-to-end delay (ms) | log(1+x) transform on targets to reduce outlier impact | ~80% accuracy |
| Throughput | Effective bandwidth (Mbps) | MSE loss with reduced weight ε = 0.1 | ~74% accuracy |
| Jitter | Delay variation (ms) | Targets scaled ×100 to match gradient magnitude | 0.056 ms abs. error |
| Packet Loss | Drop rate in [0, 1] | Sigmoid output + amplified weight ε = 5.0 | 100% recall on loss > 15% |
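The readout and regression heads might look as follows in PyTorch; head depth and any width beyond the stated 384-d context vector are assumptions.

```python
import torch
from torch import nn

class FourHeadReadout(nn.Module):
    """Concatenate the encoded flow features with the source and destination
    node embeddings (3 x 128 = 384-d) and regress four KPIs in parallel."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        def head(final=None):
            layers = [nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)]
            if final is not None:
                layers.append(final)
            return nn.Sequential(*layers)
        self.latency = head()                  # trained on log(1+x) targets
        self.throughput = head()               # MSE weight eps = 0.1
        self.jitter = head()                   # targets scaled x100
        self.packet_loss = head(nn.Sigmoid())  # drop rate in [0, 1], weight eps = 5.0

    def forward(self, f_enc, h_out, src, dst):
        # f_enc: pre-encoded flow features; h_out: node embeddings from the GATv2 core
        c_i = torch.cat([f_enc, h_out[src], h_out[dst]], dim=-1)  # C_i in R^384
        return {"latency": self.latency(c_i), "throughput": self.throughput(c_i),
                "jitter": self.jitter(c_i), "packet_loss": self.packet_loss(c_i)}
```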
Training Configuration
Implemented in PyTorch Geometric on an NVIDIA T4 GPU. Dataset: 1,000 ns-3 scenarios split 80/20 train/test. Optimiser: Adam (lr = 0.001). Epochs: 80. Batch size: 32. The test loss converges below the training loss, an expected effect of Dropout being active only during training, and indicates that the model does not overfit the synthetic dataset.
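A minimal sketch of the weighted multi-task MSE implied above; only the throughput (ε = 0.1) and packet-loss (ε = 5.0) weights are given in the text, so the latency and jitter weights here are assumptions.

```python
import torch

# Per-metric weights from the head table; latency/jitter weights are assumed to be 1.0.
LOSS_WEIGHTS = {"latency": 1.0, "throughput": 0.1, "jitter": 1.0, "packet_loss": 5.0}

def weighted_mse(pred: dict, target: dict) -> torch.Tensor:
    """Weighted multi-task MSE, balancing the very different KPI scales."""
    return sum(w * torch.mean((pred[k] - target[k]) ** 2)
               for k, w in LOSS_WEIGHTS.items())
```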
Intent-Driven AIOps: Closed-Loop Pipeline
The AIOps orchestrator bridges natural language administrator intents and low-level network configuration parameters through a multi-agent pipeline. The core vulnerability in modern AIOps—deploying AI-generated configs without validation—is addressed by making the GNN a mandatory pre-deployment safety gate.
RAG Knowledge Base Construction
The 7,004 ChromaDB documents were generated offline from the 1,000 ns-3 scenario JSON files. Each document encodes per-zone performance metrics, global scenario summaries, and emergent routing patterns. At query time, the administrator's intent is embedded with all-MiniLM-L6-v2 and the top-10 most relevant historical states are retrieved via cosine-similarity search in an average of 42.5 ms.
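A minimal retrieval sketch using the ChromaDB Python client; the collection name, persistence path, and index metadata are assumptions.

```python
import chromadb
from chromadb.utils import embedding_functions

# Embed intents and documents with the same all-MiniLM-L6-v2 model described above
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./network_kb")
collection = client.get_or_create_collection(
    "network_states", embedding_function=ef,
    metadata={"hnsw:space": "cosine"})  # cosine similarity, as described above

def retrieve_context(intent: str, k: int = 10) -> list[str]:
    """Embed the administrator intent and return the top-k historical states."""
    hits = collection.query(query_texts=[intent], n_results=k)
    return hits["documents"][0]
```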
Supported Intent Classes
🚨 Safety Critical
"Immediately isolate compromised camera feeds in the Critical Security Zone." — Triggers priority rerouting and zone quarantine configurations.
📶 Network QoS
"Prioritize 4K video streams over background sensor data during peak hours." — Adjusts TrafficControlHelper queue weights and flow priorities.
🔁 Fault Tolerance
"Reroute core traffic away from Gateway A due to detected link degradation." — Forces routing reconvergence by manipulating interface states programmatically.
Iterative Optimization Loop
The LLM operates in a dual capacity, serving as both Configuration Generator and Cognitive Evaluator, guided by the GNN's predicted QoS penalties; a condensed code sketch follows the four steps below:
Initial Generation
The LLM processes the administrator's natural language intent alongside the top-10 retrieved RAG documents and the real-time graph state G(t), synthesising a baseline JSON network configuration with specific node IDs, zone tags, and bandwidth parameters.
GNN Pre-Deployment Verification
The proposed config is treated as a hypothetical graph mutation. The ZoneAwareGNN performs a forward pass in ~28 ms to predict future KPIs. These are aggregated into a composite QoS score. Any SLA violations immediately flag the configuration as unsafe.
Cognitive Evaluation
In subsequent iterations (i ≥ 2), the LLM receives the full trajectory of all prior configurations and their predicted QoS scores, plus rigid capacity constraints (e.g. the 80 Mbps cloud uplink ceiling). It analyses the delta between iterations to guide refinement.
Termination & Deployment
The LLM outputs a structured JSON with a should_continue boolean, a reasoning string, and the next_config payload. When further routing shifts yield no statistically significant QoS improvement, the loop terminates—on average in 2.4 iterations—and the best-scoring configuration is deployed to ns-3.
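A condensed sketch of the loop logic described in these four steps; the injected helper callables stand in for HTTP calls to the real services and are not the project's actual function names.

```python
from typing import Any, Callable

def optimize_intent(
    intent: str,
    graph_state: Any,
    llm_generate: Callable,     # initial config from intent + RAG context + G(t)
    gnn_predict: Callable,      # ~28 ms GNN forward pass -> predicted QoS dict
    llm_evaluate: Callable,     # cognitive evaluator -> should_continue / next_config
    retrieve_context: Callable, # top-10 ChromaDB retrieval
    max_iters: int = 5,
) -> dict:
    """Generate -> verify -> evaluate until QoS gains plateau (avg. 2.4 iterations)."""
    history = []
    config = llm_generate(intent, retrieve_context(intent), graph_state)
    for _ in range(max_iters):
        qos = gnn_predict(graph_state, config)   # hypothetical graph mutation, pre-deployment
        history.append({"config": config, "qos": qos})
        verdict = llm_evaluate(intent, history)  # sees the full trajectory of prior attempts
        if not verdict["should_continue"]:
            break
        config = verdict["next_config"]
    # Deploy the best-scoring safe configuration found during the loop
    return max(history, key=lambda h: h["qos"]["composite_score"])
```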
Distributed Microservices Architecture
| Service | File | Responsibility | Avg. Latency |
|---|---|---|---|
| Simulation Engine | app.py | Hosts ns-3 on Ubuntu 22.04 LTS; executes compiled C++ binaries via subprocesses with threading.Lock() for race-condition prevention | ~65 s / scenario |
| GNN Inference Engine | gnn_server.py | Serves pre-trained PyTorch model on T4 GPU; dynamically constructs PyG graphs, denormalises outputs before returning raw QoS predictions | 28.1 ms |
| AIOps Orchestrator | rag_server.py | Manages ChromaDB vector store, stateful iterative loop, Gemini API calls, and HTTP POST coordination between all services | 1.35 s / LLM call |
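A minimal FastAPI sketch of the GNN inference service; the route, request schema, and checkpoint filename are illustrative assumptions.

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Load the pre-trained model onto the T4 GPU once at startup
model = torch.load("zone_aware_gnn.pt", map_location="cuda", weights_only=False)
model.eval()

class GraphState(BaseModel):
    node_feats: list[list[float]]   # x_v, one row per node
    edge_index: list[list[int]]     # [src, dst] pairs
    edge_feats: list[list[float]]   # e_uv, one row per edge

@app.post("/predict")
def predict(state: GraphState) -> dict:
    """Build a graph from the request and return denormalised QoS predictions."""
    with torch.no_grad():
        x = torch.tensor(state.node_feats, device="cuda")
        ei = torch.tensor(state.edge_index, device="cuda").t().contiguous()
        ea = torch.tensor(state.edge_feats, device="cuda")
        qos = model(x, ei, ea)
    return {"qos": qos.tolist()}
```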
Experiment Setup & Topology
The physical network was modeled as a hierarchical topology representing an airport's edge-to-cloud infrastructure. It comprises 52 specialized nodes: 50 high-definition endpoint cameras uniformly distributed across five operational zones, a core gateway that aggregates and multiplexes their traffic, and a central Cloud/NOC server.
To accurately simulate congestion and force network bottlenecks, intra-airport Edge-to-Core links were over-provisioned at 10 Gbps, while the critical Core-to-Cloud WAN link was bottlenecked at 1 Gbps with CoDel Active Queue Management (AQM). The entire architecture was deployed as a distributed microservices environment using FastAPI, separating the heavy C++ ns-3 simulation engine from the PyTorch inference servers so that each tier can scale independently.
Results and Analysis
Our digital twin successfully distills days of complex packet-level physics into a neural network capable of performing predictive evaluations in mere milliseconds. The multi-task regression model achieved stable convergence across 80 epochs, utilizing a custom weighted Mean Squared Error (MSE) loss function to balance highly varied metric scales.
Key Performance Metrics
- Critical Congestion Recall: The GNN successfully flagged 100.00% of critical congestion states (>15.0% packet loss), ensuring the AI orchestrator is always invoked before video feeds drop.
- Ultra-Low Latency Inference: A single forward-pass inference request to the PyTorch GNN averages just 28.1 ms, while ChromaDB context retrieval averages 42.5 ms.
- Rapid Incident Response: The total operational latency—from a user typing a natural language command to the AI generating, verifying, and deploying a safe network configuration—averages just 4.2 seconds.
- Optimization Efficiency: The LLM's cognitive evaluator loop reaches a mathematically stable termination decision in an average of 2.4 iterations.
- Jitter Prediction Accuracy: Despite the chaotic micro-bursts of simulated TCP traffic, absolute error for network jitter remained highly bounded at just 0.056 ms.
Limitations & Future Work
While our architecture delivers a robust foundation for automated AIOps, we identify four key research avenues that define the path toward production-grade deployment:
Sim-to-Real Transferability
The GNN is trained exclusively on synthetic ns-3 data. Future work will integrate Software-Defined Networking (SDN) hardware with OpenFlow telemetry to fine-tune the model on real physical measurements and evaluate the sim-to-real performance gap.
Dynamic Topology Generalization
The ZoneAwareGNN is trained on a fixed 52-node topology. Expanding to inductive learning paradigms (e.g. GraphSAGE-style sampling) will allow the model to adapt to structural network expansions, node additions, and physical hardware reconfigurations in real time.
Cognitive Bottleneck (LLM Latency)
External API calls to Gemini add ~1.35 s per iteration. Future iterations will explore locally hosted, quantized open-source LLMs (e.g. Llama 3 / Mistral) to reduce operational latency below 200 ms and eliminate dependency on cloud API availability.
Massive-Scale Deployment
The current orchestrator optimises a 52-node network. Scaling to thousands of HD camera endpoints across multi-gateway mega-airports will require horizontal sharding of the orchestrator and distributed GNN inference across multiple GPU nodes.
Project Team & Supervisors
Meet the engineers and academics behind the Generative Digital Twin Framework.
Project Links & Resources
Explore our code repository and departmental links below.