The Challenge: Securing Critical Infrastructure
Modern airport surveillance networks are critical infrastructure under severe bandwidth pressure as they scale to hundreds of high-definition cameras and IoT sensors. Traditional manual troubleshooting by network administrators is fundamentally reactive: static CLI interventions are often too slow, risking the loss of critical video feeds during congestion events.
Our Solution: We built a distributed, generative Digital Twin—a high-fidelity virtual replica of the airport's network. By combining Graph Neural Networks (GNNs) for rapid performance prediction with a Large Language Model (LLM) utilizing Retrieval-Augmented Generation (RAG), administrators can now manage complex camera networks using simple, natural language commands. Before any AI-generated JSON configuration is deployed, the GNN acts as a strict mathematical guardrail to pre-verify that the changes are physically safe.
Core Contributions
🛰 High-Fidelity Data Plane
A fully automated ns-3 C++ simulation environment mirrors a real airport's hierarchical edge-to-cloud topology, generating 1,000 labeled fault scenarios covering peak traffic, rain fade, and node failures.
🧠 ZoneAwareGNN
A custom GATv2 graph attention network formulates the live network as a directed graph to predict latency, jitter, throughput, and packet loss in under 30 ms—replacing hours of simulation at runtime.
🗣 RAG-Augmented Intent Engine
Natural language administrator intents are grounded in 7,004 embedded simulation states via ChromaDB, enabling the Gemini 2.5 Flash-Lite LLM to generate valid, topology-aware JSON configurations while sharply curbing hallucination.
🔒 Closed-Loop Safety Guardrail
Before any config is deployed, the GNN performs a forward pass on the hypothetical topology. Configurations that violate SLA thresholds are immediately flagged, and the LLM iteratively refines its proposal until the predicted QoS is safe.
System Architecture & Methodology
Our system operates in a continuous, closed-loop cycle consisting of three core technological pillars designed to bridge the gap between natural language intents and strict network physics:
1. High-Fidelity Data Plane
We utilized the ns-3 discrete-event simulator to create a realistic physical baseline, generating 1,000 unique scenarios. This data captures continuous UDP datagrams, bursty TCP background traffic, hardware failures, and weather-induced edge degradations.
2. Predictive AI (ZoneAwareGNN)
Our custom Graph Attention Network (GATv2) formulates the network state as a directed graph. Utilizing 4 independent attention heads and a 128-dimensional latent space, it predicts continuous Quality of Service (QoS) metrics like latency, jitter, throughput, and packet loss.
3. Generative AIOps Pipeline
A ChromaDB vector database stores 7,004 embedded historical network states. When an administrator issues a command, the Gemini 2.5 Flash-Lite LLM retrieves this context to autonomously synthesize and iteratively evaluate safe JSON configurations.
Network Topology Overview
The simulated airport network follows a strict three-tier hierarchy, forcing congestion at precisely the right chokepoints to generate a diverse, realistic training corpus:
- cloud-server-01: central Cloud/NOC server
- gw-core-01: core gateway aggregating all zone traffic over the 1 Gbps WAN uplink
- 5 operational zones × 10 HD cameras each, connected to the core via 10 Gbps edge links
Scenario Generation & Fault Injection
To ensure the GNN generalizes across a vast state-space, a Python orchestrator programmatically injected three categories of stress conditions before each of the 1,000 ns-3 runs:
| Context Type | Parameters | Purpose |
|---|---|---|
| Temporal (Peak/Off-Peak) | 10–30 background flows (off-peak) · up to 100 flows (peak) | Simulate crowded terminals & emergency loads that saturate the 1 Gbps WAN link |
| Weather / Rain Fade | Clear · Rain · Storm contexts via `RateErrorModel` | Increase bit-error rates on perimeter wireless cameras, forcing packet loss before gateway aggregation |
| Hardware Failures | Random node shutdowns via `ipv4->SetDown()` | Force mid-simulation routing reconvergence and generate highly dynamic transient states |
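Below is a minimal sketch of such an orchestrator. The exact peak-flow range above 30, the directory layout, and the `ns3-airport-sim` binary name are illustrative assumptions, not the project's actual interface.

```python
import json
import os
import random
import subprocess

WEATHER_CONTEXTS = ["clear", "rain", "storm"]  # mapped to RateErrorModel bit-error rates

def make_scenario(idx: int) -> dict:
    """Draw one stress context using the parameter ranges from the table above."""
    peak = random.random() < 0.5
    return {
        "scenario_id": idx,
        # Temporal context: 10-30 background flows off-peak, up to 100 at peak
        "background_flows": random.randint(31, 100) if peak else random.randint(10, 30),
        "weather": random.choice(WEATHER_CONTEXTS),
        # Hardware failure: optionally schedule a random node shutdown (ipv4->SetDown())
        "inject_node_failure": random.random() < 0.3,
    }

os.makedirs("scenarios", exist_ok=True)
for i in range(1000):
    path = f"scenarios/scenario_{i:04d}.json"
    with open(path, "w") as f:
        json.dump(make_scenario(i), f)
    # Each C++ run reads its JSON context and emits one labeled QoS sample
    subprocess.run(["./ns3-airport-sim", path], check=True)
```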
ZoneAwareGNN: Predictive Network Intelligence
To capture the non-Euclidean spatial dependencies of the airport network, we designed a custom Graph Attention Network v2 (GATv2) architecture. Unlike time-series models (ARIMA, RNN), GNNs leverage message-passing to aggregate congestion signals across the entire topology simultaneously.
Graph Formulation
The live network state is encoded as a directed graph G = (V, E). Before entering the GATv2 core, each input type is projected into a shared 128-dimensional latent space via independent MLP encoders:
| Input Type | Raw Dimension | Encoded Dimension | Contents |
|---|---|---|---|
| Node features x_v | ℝ³² | 128-d | Queue occupancy, processing load — 2-layer MLP + ReLU |
| Edge features e_uv | ℝ¹⁶ | 128-d | Physical link capacity, dynamic bandwidth utilisation — 1-layer MLP |
| Flow features f_i | ℝ²⁴ | 128-d | Video stream bitrate, sensor burst demand — 1-layer MLP |
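For concreteness, a minimal PyTorch sketch of the three encoders; the layer widths follow the table, while activation placement is an assumption.

```python
import torch
from torch import nn

class FeatureEncoders(nn.Module):
    """Project raw node/edge/flow features into the shared 128-d latent space."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        # Node features (R^32): 2-layer MLP + ReLU
        self.node_enc = nn.Sequential(
            nn.Linear(32, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.edge_enc = nn.Linear(16, hidden)   # edge features (R^16): 1-layer MLP
        self.flow_enc = nn.Linear(24, hidden)   # flow features (R^24): 1-layer MLP

    def forward(self, x_v, e_uv, f_i):
        return self.node_enc(x_v), self.edge_enc(e_uv), self.flow_enc(f_i)
```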
Two-Layer GATv2 Message-Passing Core
GATv2 improves upon standard GAT by computing dynamic attention coefficients whose ranking can vary per query node, allowing the model to focus on congested bottleneck links rather than applying the same neighbour ranking everywhere.
Layer 1 — Multi-Head Attention
K = 4 independent heads, outputs concatenated. Each head learns a distinct "view" of the neighbourhood, stabilising training against local optima and capturing diverse congestion patterns simultaneously.
Layer 2 — Spatial Smoother
A single attention head aggregates the concatenated embeddings into the final node representation h_v^out ∈ ℝ¹²⁸. Acts as a low-pass filter, smoothing neighbourhood noise to expose macro-topological structure.
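A hedged PyTorch Geometric sketch of this two-layer core follows. The 128-d width, K = 4 heads, and single-head smoother come from the text; the dropout rate and activation choice are assumptions.

```python
import torch
from torch import nn
from torch_geometric.nn import GATv2Conv

class GATv2Core(nn.Module):
    """Two-layer GATv2 message-passing core over the encoded graph."""
    def __init__(self, hidden: int = 128, heads: int = 4):
        super().__init__()
        # Layer 1: K = 4 heads, concatenated output (4 x 128 dims)
        self.gat1 = GATv2Conv(hidden, hidden, heads=heads, edge_dim=hidden)
        # Layer 2: single-head spatial smoother back to h_v^out in R^128
        self.gat2 = GATv2Conv(hidden * heads, hidden, heads=1, edge_dim=hidden)
        self.drop = nn.Dropout(0.2)  # assumed rate; Dropout is noted under Training Configuration

    def forward(self, h, edge_index, e):
        h = torch.relu(self.gat1(h, edge_index, edge_attr=e))
        h = self.drop(h)
        return self.gat2(h, edge_index, edge_attr=e)
```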
Contextual Readout & Four-Head Regression
Network KPIs are end-to-end flow metrics, not isolated node properties. For each traffic flow i from source s to destination d, a context vector C_i ∈ ℝ³⁸⁴ is assembled by concatenating the encoded flow features f_i with the learned embeddings of both endpoints. Four independent MLP heads then predict simultaneously:
| Prediction Head | Output | Special Treatment | Result |
|---|---|---|---|
| Latency | End-to-end delay (ms) | log(1+x) transform on targets to reduce outlier impact | ~80% accuracy |
| Throughput | Effective bandwidth (Mbps) | MSE loss with reduced weight ε = 0.1 | ~74% accuracy |
| Jitter | Delay variation (ms) | Targets scaled ×100 to match gradient magnitude | 0.056 ms abs. error |
| Packet Loss | Drop rate in [0, 1] | Sigmoid output + amplified weight ε = 5.0 | 100% recall on loss > 15% |
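The readout and regression heads might look as follows in PyTorch; head depth and any width beyond the stated 384-d context vector are assumptions.

```python
import torch
from torch import nn

class FourHeadReadout(nn.Module):
    """Concatenate the encoded flow features with the source and destination
    node embeddings (3 x 128 = 384-d) and regress four KPIs in parallel."""
    def __init__(self, hidden: int = 128):
        super().__init__()
        def head(final=None):
            layers = [nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)]
            if final is not None:
                layers.append(final)
            return nn.Sequential(*layers)
        self.latency = head()                  # trained on log(1+x) targets
        self.throughput = head()               # MSE weight eps = 0.1
        self.jitter = head()                   # targets scaled x100
        self.packet_loss = head(nn.Sigmoid())  # drop rate in [0, 1], weight eps = 5.0

    def forward(self, f_enc, h_out, src, dst):
        # f_enc: pre-encoded flow features; h_out: node embeddings from the GATv2 core
        c_i = torch.cat([f_enc, h_out[src], h_out[dst]], dim=-1)  # C_i in R^384
        return {"latency": self.latency(c_i), "throughput": self.throughput(c_i),
                "jitter": self.jitter(c_i), "packet_loss": self.packet_loss(c_i)}
```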
Training Configuration
Implemented in PyTorch Geometric on an NVIDIA T4 GPU. Dataset: 1,000 ns-3 scenarios split 80/20 train/test. Optimiser: Adam (lr = 0.001). Epochs: 80. Batch size: 32. The test loss converges below the training loss, an expected effect of Dropout being active only during training, and indicates that the model does not overfit the synthetic dataset.
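A minimal sketch of the weighted multi-task MSE implied above; only the throughput (ε = 0.1) and packet-loss (ε = 5.0) weights are given in the text, so the latency and jitter weights here are assumptions.

```python
import torch

# Per-metric weights from the head table; latency/jitter weights are assumed to be 1.0.
LOSS_WEIGHTS = {"latency": 1.0, "throughput": 0.1, "jitter": 1.0, "packet_loss": 5.0}

def weighted_mse(pred: dict, target: dict) -> torch.Tensor:
    """Weighted multi-task MSE, balancing the very different KPI scales."""
    return sum(w * torch.mean((pred[k] - target[k]) ** 2)
               for k, w in LOSS_WEIGHTS.items())
```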
Intent-Driven AIOps: Closed-Loop Pipeline
The AIOps orchestrator bridges natural language administrator intents and low-level network configuration parameters through a multi-agent pipeline. The core vulnerability in modern AIOps—deploying AI-generated configs without validation—is addressed by making the GNN a mandatory pre-deployment safety gate.
RAG Knowledge Base Construction
The 7,004 ChromaDB documents were generated offline from the 1,000 ns-3 scenario JSON files. Each document encodes per-zone performance metrics, global scenario summaries, and emergent routing patterns. At query time, the administrator's intent is embedded with all-MiniLM-L6-v2 and the top-10 most relevant historical states are retrieved via cosine-similarity search in an average of 42.5 ms.
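A minimal retrieval sketch using the ChromaDB Python client; the collection name, persistence path, and index metadata are assumptions.

```python
import chromadb
from chromadb.utils import embedding_functions

# Embed intents and documents with the same all-MiniLM-L6-v2 model described above
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./network_kb")
collection = client.get_or_create_collection(
    "network_states", embedding_function=ef,
    metadata={"hnsw:space": "cosine"})  # cosine similarity, as described above

def retrieve_context(intent: str, k: int = 10) -> list[str]:
    """Embed the administrator intent and return the top-k historical states."""
    hits = collection.query(query_texts=[intent], n_results=k)
    return hits["documents"][0]
```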
Supported Intent Classes
🚨 Safety Critical
"Immediately isolate compromised camera feeds in the Critical Security Zone." — Triggers priority rerouting and zone quarantine configurations.
📶 Network QoS
"Prioritize 4K video streams over background sensor data during peak hours." — Adjusts TrafficControlHelper queue weights and flow priorities.
🔁 Fault Tolerance
"Reroute core traffic away from Gateway A due to detected link degradation." — Forces routing reconvergence by manipulating interface states programmatically.
Iterative Optimization Loop
The LLM operates in a dual capacity, serving as both Configuration Generator and Cognitive Evaluator, guided by the GNN's predicted QoS penalties; a condensed code sketch follows the four steps below:
Initial Generation
The LLM processes the administrator's natural language intent alongside the top-10 retrieved RAG documents and the real-time graph state G(t), synthesising a baseline JSON network configuration with specific node IDs, zone tags, and bandwidth parameters.
GNN Pre-Deployment Verification
The proposed config is treated as a hypothetical graph mutation. The ZoneAwareGNN performs a forward pass in ~28 ms to predict future KPIs. These are aggregated into a composite QoS score. Any SLA violations immediately flag the configuration as unsafe.
Cognitive Evaluation
In subsequent iterations (i ≥ 2), the LLM receives the full trajectory of all prior configurations and their predicted QoS scores, plus rigid capacity constraints (e.g. the 80 Mbps cloud uplink ceiling). It analyses the delta between iterations to guide refinement.
Termination & Deployment
The LLM outputs a structured JSON with a should_continue boolean, a reasoning string, and the next_config payload. When further routing shifts yield no statistically significant QoS improvement, the loop terminates—on average in 2.4 iterations—and the best-scoring configuration is deployed to ns-3.
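A condensed sketch of the loop logic described in these four steps; the injected helper callables stand in for HTTP calls to the real services and are not the project's actual function names.

```python
from typing import Any, Callable

def optimize_intent(
    intent: str,
    graph_state: Any,
    llm_generate: Callable,     # initial config from intent + RAG context + G(t)
    gnn_predict: Callable,      # ~28 ms GNN forward pass -> predicted QoS dict
    llm_evaluate: Callable,     # cognitive evaluator -> should_continue / next_config
    retrieve_context: Callable, # top-10 ChromaDB retrieval
    max_iters: int = 5,
) -> dict:
    """Generate -> verify -> evaluate until QoS gains plateau (avg. 2.4 iterations)."""
    history = []
    config = llm_generate(intent, retrieve_context(intent), graph_state)
    for _ in range(max_iters):
        qos = gnn_predict(graph_state, config)   # hypothetical graph mutation, pre-deployment
        history.append({"config": config, "qos": qos})
        verdict = llm_evaluate(intent, history)  # sees the full trajectory of prior attempts
        if not verdict["should_continue"]:
            break
        config = verdict["next_config"]
    # Deploy the best-scoring safe configuration found during the loop
    return max(history, key=lambda h: h["qos"]["composite_score"])
```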
Distributed Microservices Architecture
| Service | File | Responsibility | Avg. Latency |
|---|---|---|---|
| Simulation Engine | app.py | Hosts ns-3 on Ubuntu 22.04 LTS; executes compiled C++ binaries via subprocesses with threading.Lock() for race-condition prevention | ~65 s / scenario |
| GNN Inference Engine | gnn_server.py | Serves pre-trained PyTorch model on T4 GPU; dynamically constructs PyG graphs, denormalises outputs before returning raw QoS predictions | 28.1 ms |
| AIOps Orchestrator | rag_server.py | Manages ChromaDB vector store, stateful iterative loop, Gemini API calls, and HTTP POST coordination between all services | 1.35 s / LLM call |
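A minimal FastAPI sketch of the GNN inference service; the route, request schema, and checkpoint filename are illustrative assumptions.

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# Load the pre-trained model onto the T4 GPU once at startup
model = torch.load("zone_aware_gnn.pt", map_location="cuda", weights_only=False)
model.eval()

class GraphState(BaseModel):
    node_feats: list[list[float]]   # x_v, one row per node
    edge_index: list[list[int]]     # [src, dst] pairs
    edge_feats: list[list[float]]   # e_uv, one row per edge

@app.post("/predict")
def predict(state: GraphState) -> dict:
    """Build a graph from the request and return denormalised QoS predictions."""
    with torch.no_grad():
        x = torch.tensor(state.node_feats, device="cuda")
        ei = torch.tensor(state.edge_index, device="cuda").t().contiguous()
        ea = torch.tensor(state.edge_feats, device="cuda")
        qos = model(x, ei, ea)
    return {"qos": qos.tolist()}
```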
Experiment Setup & Topology
The physical network was modeled as a hierarchical topology representing an airport's edge-to-cloud infrastructure. It comprises 52 specialized nodes: 50 high-definition endpoint cameras uniformly distributed across five operational zones, a core gateway that aggregates and multiplexes their traffic, and a central Cloud/NOC server.
To accurately simulate congestion and force network bottlenecks, intra-airport Edge-to-Core links were over-provisioned at 10 Gbps, while the critical Core-to-Cloud WAN link was bottlenecked at 1 Gbps with CoDel Active Queue Management (AQM). The entire architecture was deployed as a distributed microservices environment using FastAPI, separating the heavy C++ ns-3 simulation engine from the PyTorch inference servers so that each tier can scale independently.
Results and Analysis
Our digital twin successfully distills days of complex packet-level physics into a neural network capable of performing predictive evaluations in mere milliseconds. The multi-task regression model achieved stable convergence across 80 epochs, utilizing a custom weighted Mean Squared Error (MSE) loss function to balance highly varied metric scales.
Key Performance Metrics
- Critical Congestion Recall: The GNN successfully flagged 100.00% of critical congestion states (>15.0% packet loss), ensuring the AI orchestrator is always invoked before video feeds drop.
- Ultra-Low Latency Inference: A single forward-pass inference request to the PyTorch GNN averages just 28.1 ms, while ChromaDB context retrieval averages 42.5 ms.
- Rapid Incident Response: The total operational latency—from a user typing a natural language command to the AI generating, verifying, and deploying a safe network configuration—averages just 4.2 seconds.
- Optimization Efficiency: The LLM's cognitive evaluator loop reaches a mathematically stable termination decision in an average of 2.4 iterations.
- Jitter Prediction Accuracy: Despite the chaotic micro-bursts of simulated TCP traffic, absolute error for network jitter remained highly bounded at just 0.056 ms.
Limitations & Future Work
While our architecture delivers a robust foundation for automated AIOps, we identify four key research avenues that define the path toward production-grade deployment:
Sim-to-Real Transferability
The GNN is trained exclusively on synthetic ns-3 data. Future work will integrate Software-Defined Networking (SDN) hardware with OpenFlow telemetry to fine-tune the model on real physical measurements and evaluate the sim-to-real performance gap.
Dynamic Topology Generalization
The ZoneAwareGNN is trained on a fixed 52-node topology. Expanding to inductive learning paradigms (e.g. GraphSAGE-style sampling) will allow the model to adapt to structural network expansions, node additions, and physical hardware reconfigurations in real time.
Cognitive Bottleneck (LLM Latency)
External API calls to Gemini add ~1.35 s per iteration. Future iterations will explore locally hosted, quantized open-source LLMs (e.g. Llama 3 / Mistral) to reduce operational latency below 200 ms and eliminate dependency on cloud API availability.
Massive-Scale Deployment
The current orchestrator optimises a 52-node network. Scaling to thousands of HD camera endpoints across multi-gateway mega-airports will require horizontal sharding of the orchestrator and distributed GNN inference across multiple GPU nodes.
Project Team & Supervisors
Meet the engineers and academics behind the Generative Digital Twin Framework.
Project Links & Resources
Explore our code repository and departmental links below.