A fully containerised Network Digital Twin pipeline combining NS-3 Wi-Fi 7 simulation, real-time Kafka telemetry streaming, TimescaleDB persistence, and GCN v4.0.0 - a deeper 3-layer architecture trained on static and dynamic phase-transitioning attack scenarios, validated across 50,817 test segments with zero false positives.
Wi-Fi 7 (IEEE 802.11be), ratified in 2024, introduces Multi-Link Operation (MLO) enabling simultaneous transmission across 2.4 GHz, 5 GHz, and 6 GHz bands. While MLO dramatically increases throughput to 46 Gbps and reduces latency, it preserves the CSMA/CA backoff mechanism inherited from legacy 802.11 standards - a mechanism that relies on an honesty assumption: each station draws a random backoff counter from [0, CW] before channel contention.
An adversary can silently manipulate this counter by adding a positive bias (starvation: +5000 slots deprives victims of channel access) or a negative bias (greedy: −5000 slots monopolises the channel). These attacks produce no malformed frames and are therefore invisible to signature-based intrusion detection systems.
This project builds a complete Network Digital Twin (NDT) for Wi-Fi 7 MLO environments using NS-3 simulation, containerised microservices, Apache Kafka/Redpanda streaming, and TimescaleDB time-series storage. GCN v4.0.0 - a deeper 3-layer, 128-unit architecture trained on 2,878 static and synthetically stitched dynamic scenario files - achieves F1 = 0.9988 and 99.83% accuracy across 50,817 test segments, extending coverage to phase-transitioning dynamic attacks spanning 1–4 APs, window sizes 32–256, and bias magnitudes 1000–10000.
The resulting system provides sub-5 ms inference latency with zero false positives across all evaluated scenarios, demonstrating that GCN-based behavioural analysis of NDT telemetry is a viable and highly accurate approach to detecting both static and dynamically evolving covert backoff manipulation attacks in next-generation Wi-Fi networks.
make run-exp command in Docker Compose.
Three interlocking technologies define the problem space: the Wi-Fi 7 standard, Multi-Link Operation, and the CSMA/CA backoff mechanism that introduces the exploitable vulnerability.
Ratified May 2024, Wi-Fi 7 is the first standard to surpass 40 Gbps peak throughput. It introduces 4096-QAM modulation, 320 MHz channels, 16×16 MU-MIMO, and Multi-Link Operation as its headline architectural feature.
MLO allows a Multi-Link Device to simultaneously transmit and receive across multiple frequency bands, providing link aggregation, seamless failover, and dramatically reduced latency - a fundamental shift from prior Wi-Fi generations.
Before transmitting, each station draws a random integer from [0, CW_min] and counts down. The first to reach zero wins channel access. This fairness assumption is exploitable at the NIC driver or firmware layer with no frame artefacts.
Despite Wi-Fi 7 deployment accelerating globally, no prior work addresses covert backoff manipulation in MLO environments using digital twin telemetry and graph neural networks.
| Domain | Prior Work | The Gap |
|---|---|---|
| Wi-Fi 7 Security | Studies focus on MLD authentication, MIC failures, and link-switch attacks inherited from 802.11ax. Backoff fairness vulnerabilities in multi-AP MLO topologies remain unexplored. | No work models or detects covert backoff bias attacks in multi-AP Wi-Fi 7 MLO environments with a live telemetry pipeline. |
| Network Digital Twin | NDT frameworks exist for 5G core (NS-3, OMNeT++) and data-centre networks. Wi-Fi 7-specific NDTs with real-time streaming telemetry pipelines are absent from the literature. | No open NDT pipeline for Wi-Fi 7 that streams per-window 13-feature telemetry to a live database and GCN detection engine. |
| GNN-based IDS | GCN/GAT applied to wired network anomaly detection (UNSW-NB15, KDD99). Wi-Fi intrusion detection limited to signature-based or statistical methods (CUSUM, z-score on RSSI/retry). | No GCN detector conditioned on AP count and window size, validated across topology scaling and seed diversity for Wi-Fi 7. |
"Can a Graph Convolutional Network trained on Network Digital Twin telemetry reliably detect covert backoff manipulation attacks in Wi-Fi 7 Multi-Link Operation networks across diverse topologies, window sizes, bias magnitudes, and random seed conditions?"
Can NS-3 simulation faithfully reproduce the telemetry signatures of backoff bias attacks at the per-window granularity needed for GCN input?
Does detection performance degrade when scaling from single-AP to multi-AP topologies (up to 4 APs, 8 STAs per AP)?
Does the model generalise to unseen random seed groups, preventing overfitting to specific NS-3 simulation artefacts or traffic patterns?
A fully containerised NDT pipeline: NS-3 generates telemetry, Kafka streams it in real time, TimescaleDB stores it, and GCN Detector v3.0.0 infers attack probability per sliding window.
| Service | Role | Technology |
|---|---|---|
| ns3-sim | Generates Wi-Fi 7 MLO traffic with optional backoff bias injection per STA | NS-3 3.40, Docker |
| exporter | Reads telemetry.jsonl line-by-line, publishes records to Kafka topic in real time | Python, kafka-python |
| redpanda | High-performance Kafka-compatible broker for telemetry streaming and prediction feedback | Redpanda v23 |
| harmonizer | Consumes from Kafka, validates schema, writes rows to TimescaleDB hypertable | Python, asyncpg |
| timescaledb | Hypertable storage for all telemetry windows, indexed by timestamp for Grafana queries | TimescaleDB 2.x / PG15 |
| gcn-detector | Windowizer + GCN v3.0.0 inference, publishes attack predictions back to Kafka | PyTorch Geometric |
| grafana | 38-panel live dashboard provisioned entirely as code, zero manual configuration required | Grafana 10, JSON |
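The exporter row above describes reading telemetry.jsonl line by line and publishing each record at a controlled rate. A minimal sketch of that loop follows, assuming nothing about the real service beyond that description: the function names, the `window` field, and the sample values are hypothetical, and the broker client is replaced by an injected `publish` callable so the example runs without Kafka.

```python
import json
import time
from typing import Callable, Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def export(lines: Iterable[str], publish: Callable[[bytes], None],
           interval_s: float = 0.0) -> int:
    """Publish each record via `publish`, pausing `interval_s` between sends."""
    sent = 0
    for record in iter_records(lines):
        publish(json.dumps(record).encode("utf-8"))
        sent += 1
        if interval_s > 0:
            time.sleep(interval_s)
    return sent


# Stand-in for a broker client: collect the payloads in a list.
# The records below are invented for illustration only.
outbox: list[bytes] = []
sample = [
    '{"window": 1, "avg_backoff_slots": 7.4}',
    "",
    '{"window": 2, "avg_backoff_slots": 5012.0}',
]
count = export(sample, outbox.append)
```

With kafka-python, `publish` could for example be a small wrapper around `KafkaProducer.send` on the telemetry topic; passing an open file handle as `lines` gives the line-by-line behaviour described in the table.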
★ Primary discriminating feature. Combined with 4 one-hot AP_ID bits = 17 input features per graph node (N×17 input matrix).
Two adversarial strategies exploit CSMA/CA backoff: starvation forces victims to wait longer; greedy causes the attacker to monopolise channel access. Both are covert by design.
A compromised NIC driver silently adds a fixed bias to the randomly drawn backoff counter before the countdown begins. No frame artefacts are produced.
All stations draw uniformly from [0, CW_min=15]. Channel access is fair; throughput is shared equitably among all STAs per CSMA/CA specification.
Attacker forces victim stations to wait an extra 5000 slots before transmitting. Victims cannot compete for the channel; attacker monopolises bandwidth silently.
Attacker reduces its own backoff to near-zero, winning almost every contention round and starving all other legitimate stations of bandwidth.
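The three cases above (fair contention, starvation, greedy) can be illustrated with a toy contention model. This is a sketch under simplifying assumptions, not the NS-3 implementation: each station draws once per round from [0, CW_min], the lowest counter wins, ties are broken at random instead of colliding, and counters are not frozen or resumed between rounds. The function name and the four-station setup are hypothetical.

```python
import random


def contention_wins(biases, rounds=10_000, cw=15, seed=42):
    """Count how often each station wins channel access.

    Each round, station i draws uniformly from [0, cw] and adds biases[i]
    (clamped at zero). The smallest counter wins; ties are split at random.
    """
    rng = random.Random(seed)
    wins = [0] * len(biases)
    for _ in range(rounds):
        counters = [max(0, rng.randint(0, cw) + b) for b in biases]
        lowest = min(counters)
        winners = [i for i, c in enumerate(counters) if c == lowest]
        wins[rng.choice(winners)] += 1
    return wins


fair = contention_wins([0, 0, 0, 0])                # honest stations only
greedy = contention_wins([-5000, 0, 0, 0])          # station 0 biases itself down
starved = contention_wins([0, 5000, 5000, 5000])    # stations 1-3 are pushed back
```

In this model the fair case splits wins roughly evenly, the greedy attacker wins the large majority of rounds, and the starved victims never win, even though every simulated transmission would still be an ordinary frame.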
GCN v4.0.0 extends v3 with a deeper 3-layer 128-unit architecture and dynamic phase-transitioning scenario training, achieving F1 = 0.9988 across 50,817 test segments with zero false positives.
| Dimension | v3.0.0 | v4.0.0 ★ |
|---|---|---|
| GCN Layers | 2 layers | 3 layers |
| Hidden Units | 64 | 128 (2× wider) |
| Dynamic Scenarios | No | Yes (phase-transitioning) |
| Training Files | 48 static | 2,878 (static + dynamic) |
| Test Segments | 368 | 50,817 |
| F1 Score | 0.9978 | 0.9988 |
| Accuracy | 99.73% | 99.83% |
| False Positives | 0 | 0 (50,817 segments) |
Each window of N events is converted to a fully-connected temporal graph. Each node represents one time-step with 17 features. All N×(N-1) directed edges capture temporal relationships across the window. AP_ID is one-hot encoded as features 14–17.
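The graph construction described above can be sketched in plain Python. The sketch assumes 1-based AP_IDs and uses placeholder zeros for the 13 metrics; the function names are hypothetical and the real Windowizer may differ.

```python
from itertools import permutations

N_METRICS = 13  # telemetry metrics per time-step
MAX_APS = 4     # one-hot AP_ID width (features 14-17)


def one_hot_ap(ap_id):
    """Encode an AP_ID in 1..4 as four one-hot bits (1-based IDs are an assumption)."""
    if not 1 <= ap_id <= MAX_APS:
        raise ValueError(f"ap_id must be in 1..{MAX_APS}, got {ap_id}")
    return [1.0 if i == ap_id - 1 else 0.0 for i in range(MAX_APS)]


def window_to_graph(window, ap_id):
    """Turn N rows of 13 metrics into (N x 17 node features, 2 x N(N-1) edge index)."""
    for row in window:
        if len(row) != N_METRICS:
            raise ValueError(f"expected {N_METRICS} metrics per row, got {len(row)}")
    ap_bits = one_hot_ap(ap_id)
    nodes = [list(row) + ap_bits for row in window]
    # Every ordered pair (i, j) with i != j becomes one directed edge.
    pairs = list(permutations(range(len(window)), 2))
    sources = [i for i, _ in pairs]
    targets = [j for _, j in pairs]
    return nodes, [sources, targets]


# Placeholder telemetry: 32 time-steps of 13 zeros from AP 2.
nodes, edge_index = window_to_graph([[0.0] * N_METRICS for _ in range(32)], ap_id=2)
```

The two-row source/target layout mirrors the COO `edge_index` convention used by PyTorch Geometric, so the lists could be wrapped in tensors before inference. For a 32-event window this yields 32 × 31 = 992 directed edges.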
Five iterations from a broken prototype to a production-grade dynamic attack detector. Each version addresses a specific failure mode of its predecessor.
| Dimension | v1.0.0 | v2.0.0 | v2.1.0 | v3.0.0 | v4.0.0 ★ |
|---|---|---|---|---|---|
| Status | Abandoned | Baseline | Intermediate | Production | Latest ★ |
| GCN Layers | 2 | 2 | 2 | 2 | 3 |
| Hidden Units | 64 | 64 | 64 | 64 | 128 (2× wider) |
| Dropout | 0.5 | 0.3 | 0.3 | 0.3 | 0.4 |
| AP Support | 1 AP | 1 AP only | 1 AP only | 1–4 APs | 1–4 APs |
| Window Sizes | 256 only | 256 only | 256 only | 32 / 64 / 128 / 256 | 32 / 64 / 128 / 256 |
| Dynamic Scenarios | No | No | No | No | Yes ✓ |
| Training Files | ~20 (imbalanced) | 48 | 48 | 48 | 2,878 |
| Test Segments | | ~300 | ~300 | 368 | 50,817 |
| F1 Score | N/A (failed) | 0.9924 | 0.9943 | 0.9978 | 0.9988 |
| Accuracy | ~0% effective | 99.14% | 99.29% | 99.73% | 99.83% |
| AUC-ROC | ~0.5 | 0.9996 | 0.9998 | 1.0000 | 0.99999 |
| False Positives | High FP & FN | 0 | 0 | 0 | 0 (50,817 segs) |
| FN Rate | Unreliable | ~0.87% | ~0.57% | 0.43% | 0.25% (boundaries) |
| Key Innovation | | Balanced training | Batch norm + tuning | AP conditioning + multi-window | 3-layer depth + dynamic training |
| Parameters | ~16K | ~16K | ~16K | ~16K | 44,482 (~2.8×) |
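The table gives the v4.0.0 parameter count but not the layer-by-layer layout. As a sanity check, the sketch below counts parameters for one layout that happens to reproduce 44,482. Everything beyond the table is an assumption: 17 input features, three 128-unit GCN layers each followed by batch normalisation, and a 128→64→2 classification head. Other layouts could give the same total, so this should not be read as the confirmed architecture.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a dense (or GCN) layer."""
    return n_in * n_out + n_out


def batchnorm_params(n):
    """Learnable scale and shift of a batch-norm layer."""
    return 2 * n


# Assumed layout, not confirmed by the source.
gcn = linear_params(17, 128) + 2 * linear_params(128, 128)
bn = 3 * batchnorm_params(128)
head = linear_params(128, 64) + linear_params(64, 2)
total = gcn + bn + head
```

Under these assumptions the three convolution layers contribute 35,328 parameters, batch normalisation 768, and the head 8,386.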
14 work packages executed iteratively from local environment setup through GCN v4.0.0 deployment with dynamic scenario support. WP13 (closed-loop actuation) is the sole remaining work item.
Repository initialised, SSH keys configured, Docker environment validated, base branching strategy established.
Docker Compose services for TimescaleDB, Redpanda, and Grafana standing up with health checks and persistent volumes.
NS-3 3.40 containerised, 802.11be MLO topology configured, backoff bias injection parameter added, telemetry.jsonl output validated for all 13 metrics.
Python service reads telemetry.jsonl line-by-line with configurable rate, publishes JSON records to Redpanda telemetry topic with schema validation.
Async Python consumer reads from Kafka, validates record schema, upserts rows into TimescaleDB hypertable with correct timestamp indexing.
All 38 Grafana panels defined in JSON provisioning files. Zero manual dashboard configuration required after container startup.
The make run-exp Makefile target orchestrates the full experiment lifecycle: NS-3 sim → export → stream → detect → visualise, with parameterised bias, seed, and topology.
First end-to-end GCN detector: Windowizer service consumes telemetry, builds graphs, runs GCN inference, publishes predictions to gcn_predictions Kafka topic.
Retrained on 284 balanced scenarios across seed groups A and B. Single-AP, 256-window model. F1 improved to 0.9924 from v1 baseline.
Consolidated all 38 panels into a unified view with GCN prediction overlay, attack confidence time-series, and per-STA backoff visualisation.
Standalone React SPA with 6 dashboard sections: live predictions, topology map, confidence histogram, metric sparklines, alert feed, and model metadata.
Resolved Harmonizer schema mismatch, TimescaleDB hypertable chunk interval, Kafka consumer group offset reset, and NS-3 telemetry timestamp drift issues.
Major upgrade: AP count conditioning via scalar injection after global pool, multi-window training (32/64/128/256), 32,000 balanced samples. 54/54 PASS across all 5 evaluation tiers.
SDN/ZSM actuation layer: GCN predictions trigger automated countermeasures (CW adjustment, rate limiting, STA quarantine) via OpenFlow or Wi-Fi management APIs.
54 independent test scenarios across 5 evaluation tiers. Pass criteria: normal scenarios must produce <100% attack rate; attack scenarios must produce >90% attack rate. Result: 54/54 PASS with zero FP/FN.
| Tier | Description | Config | Count | Result |
|---|---|---|---|---|
| T1 | Core accuracy - baseline validation for 1-AP, 256-window topology | 1AP, w=256, seeds A+B, bias ±5000 | 12 | 12/12 PASS |
| T2 | Multi-AP scaling - tests topology generalisation with 2 and 4 APs | 2AP + 4AP, w=256, seeds A, v3 only | 6 | 6/6 PASS |
| T3 | Segment length - tests window size sensitivity (64 and 128 events) | 1AP, w=64+128, seeds A, bias ±5000 | 6 | 6/6 PASS |
| T4 | Bias sensitivity - tests detection at magnitudes below and above the ±5000 training bias | 1AP, w=256, seeds A, bias 1k/2k/10k | 12 | 12/12 PASS |
| T5 | Seed generalisation - 3 unseen seed groups, no overlap with training seeds | 1AP, w=256, bias ±5000, seeds C+D+E | 18 | 18/18 PASS |
| TOTAL | All tiers combined | | 54 | 54/54 PASS ✓ |
| Group | Seeds Used | Role |
|---|---|---|
| A | 10, 42, 99 | Training seed group |
| B | 20, 52, 109 | Training seed group |
| C | 30, 62, 119 | Held-out generalisation |
| D | 40, 72, 129 | Held-out generalisation |
| E | 50, 82, 139 | Held-out generalisation |
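The two tables above can be cross-checked mechanically. The sketch below encodes the tier counts and seed groups exactly as listed and verifies that the tiers sum to 54 and that no held-out seed appears in a training group. The variable and function names are illustrative only.

```python
TIERS = {"T1": 12, "T2": 6, "T3": 6, "T4": 12, "T5": 18}

SEED_GROUPS = {
    "A": {10, 42, 99},
    "B": {20, 52, 109},
    "C": {30, 62, 119},
    "D": {40, 72, 129},
    "E": {50, 82, 139},
}
TRAIN_GROUPS = ("A", "B")
HELD_OUT_GROUPS = ("C", "D", "E")


def seeds(groups):
    """Union of all seeds in the named groups."""
    return set().union(*(SEED_GROUPS[g] for g in groups))


total_scenarios = sum(TIERS.values())
overlap = seeds(TRAIN_GROUPS) & seeds(HELD_OUT_GROUPS)
```

An empty `overlap` is what makes T5 a genuine generalisation test: none of the nine held-out seeds was seen during training.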
GCN v4.0.0 achieves near-perfect classification across all evaluated configurations including dynamic phase-transitioning attacks, with zero false positives across 50,817 test segments.
v4.0.0 introduces dynamic scenario support: 39 overnight experiments with phase-transitioning attacks confirmed 97.3% average confidence. The 84 false negatives (0.25% FN rate) occur exclusively at phase-boundary segments where <30% of windows contain the attack signal.
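The headline metrics can be checked against the reported counts. The source gives 50,817 segments, 0 false positives, 84 false negatives, and a 0.25% FN rate, but not the number of attack segments. The sketch below infers that number from the rounded FN rate, which is an assumption, so F1 is reproduced only to within rounding.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1, and accuracy from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy


TOTAL = 50_817
FP = 0
FN = 84
FN_RATE = 0.0025  # rounded value from the text

positives = round(FN / FN_RATE)   # inferred, not reported: about 33,600
tp = positives - FN
tn = TOTAL - positives - FP

precision, recall, f1, accuracy = classification_metrics(tp, FP, FN, tn)
```

Accuracy depends only on the reported totals and comes out at 99.83%. With zero false positives precision is exactly 1, so F1 is driven entirely by the boundary-segment false negatives.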
Class imbalance & single-seed training produced high false positives and false negatives.
Solid baseline. 1 AP, 256-window only. Seed groups A+B.
Dropout tuning & batch norm. Still 1AP / 256w only.
Multi-AP, multi-window, AP conditioning. 54/54 PASS.
3-layer 128-unit. Dynamic attacks. 50,817 test segments. Zero FP. Latest model.
Six primary findings emerge from the 54-scenario evaluation, each with direct implications for the deployment of GCN-based IDS in Wi-Fi 7 networks.
GCN v4.0.0 achieves F1 = 0.9988, accuracy 99.83%, and AUC = 0.99999. Zero false positives are recorded across all 50,817 test segments spanning static and dynamic phase-transitioning scenarios. Only 84 false negatives (0.25% FN rate) occur at phase-boundary edges.
Detection performance shows zero measurable degradation when scaling from 1 AP to 4 APs. The AP count conditioning mechanism successfully enables the model to generalise across topologies without per-topology retraining. T2 tier: 6/6 PASS.
End-to-end inference takes <5 ms per window on CPU. With a 32-event window and 100 ms per event, detection latency is ~3.2 seconds from attack onset - fast enough for real-time network management systems.
The model successfully detects attacks at bias=1000 (T4 tier: 12/12 PASS), a magnitude 5× below the minimum training bias of 5000. Confidence is lower (74% vs 95%+) but the binary decision is correct in all cases, demonstrating robust extrapolation.
18/18 PASS across 3 held-out seed groups (C, D, E) that have zero overlap with training seeds (A, B). This confirms the model learns genuine behavioural patterns rather than seed-specific NS-3 simulation artefacts.
The v1→v2→v2.1→v3→v4 evolution demonstrates targeted capability expansion: class balance, AP conditioning, multi-window, and finally dynamic scenario support. v4.0.0 adds a 3rd GCN layer and 128-unit width specifically to handle the temporal complexity of phase-transitioning attacks.
Four original contributions advance the state of the art in Wi-Fi 7 security, network digital twins, and GNN-based intrusion detection.
The first open, fully containerised Network Digital Twin pipeline for Wi-Fi 7 MLO environments. NS-3 simulation, Kafka streaming, TimescaleDB persistence, and live Grafana observability in a single make run-exp command.
A novel Graph Convolutional Network architecture with AP-count conditioning and multi-window support. Achieves F1 = 0.9978 with zero FP/FN across 54 independent scenarios spanning 4 topology configurations.
A rigorous 5-tier evaluation methodology covering AP scaling, window size sensitivity, bias magnitude extrapolation, and 5-group seed diversity. Provides a reproducible benchmark for future Wi-Fi 7 IDS research.
32,000 balanced labelled telemetry windows across 284+ simulation scenarios, covering normal, starvation, and greedy attack classes with varying topologies, window sizes, and random seeds. Available for community use.
Six limitations are acknowledged. All are well-understood, bounded in impact, and provide clear directions for follow-on research.
All results are from NS-3 3.40 simulation. No physical Wi-Fi 7 hardware (e.g., TP-Link BE9300, Intel BE200) has been used for validation. NS-3 channel models may not perfectly capture real-world PHY impairments such as multipath fading, carrier frequency offset, or hardware-specific retry policies.
GCN v3.0.0 was trained and tested with 1–4 APs. Topologies with 5 or 6 APs are entirely untested. The AP-count conditioning mechanism may not extrapolate reliably beyond the training range, and the 4-bit one-hot encoding is insufficient for nAP > 4.
At bias=1000 (5× below training minimum), prediction confidence drops to ~74% compared to 95%+ for higher biases. While the binary decision remains correct (100% detection rate in T4), the lower confidence score reduces margin for threshold-based alerting systems.
Detection is currently a one-way read operation. No automated countermeasure is triggered upon attack detection. WP13 (SDN/ZSM actuation) remains the sole unimplemented work package. Without closed-loop response, human operator intervention is still required.
All scenarios model a single compromised STA performing backoff manipulation. Coordinated multi-node attacks (e.g., 3 compromised STAs simultaneously biasing in different directions) have not been simulated or evaluated. The model may behave differently under distributed attack scenarios.
The smallest window size (32 events) provides only 3.2 seconds of telemetry context. In high-jitter or bursty environments, 32-event windows may not capture sufficient temporal signal to distinguish low-bias attacks from transient natural throughput variation. 64–256-window configurations are more robust.
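The trade-off between window size and detection latency follows from simple arithmetic. The sketch below assumes the 100 ms event interval and the <5 ms inference bound quoted in the findings, and works in integer milliseconds to avoid floating-point noise. The names are illustrative.

```python
EVENT_INTERVAL_MS = 100   # one telemetry event every 100 ms (from the text)
INFERENCE_BOUND_MS = 5    # upper bound on per-window inference time

WINDOW_SIZES = (32, 64, 128, 256)


def fill_time_ms(window_size, interval_ms=EVENT_INTERVAL_MS):
    """Time needed to collect one full window of events."""
    return window_size * interval_ms


def worst_case_latency_ms(window_size):
    """Window fill time plus the inference upper bound."""
    return fill_time_ms(window_size) + INFERENCE_BOUND_MS


fill_times = {w: fill_time_ms(w) for w in WINDOW_SIZES}
```

A 32-event window fills in 3.2 seconds, while a 256-event window needs 25.6 seconds, so the more robust window sizes pay for their extra context with slower detection. Inference time is negligible in every case.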
Five high-priority research directions build directly on the current results, addressing identified limitations and extending the system towards production deployment.
Implement automated countermeasures triggered by GCN predictions: CW adjustment via OpenFlow, STA rate-limiting, and quarantine via 802.11r fast BSS transition denial. Target: <100 ms response from detection to mitigation. Evaluate impact on normal traffic false-positive rate.
Extend the training corpus to include 5- and 6-AP topologies. Requires expanding the AP_ID one-hot encoding from 4 to 6 bits and generating 100+ new simulation scenarios. Expected to increase deployment coverage for enterprise Wi-Fi 7 campus deployments.
Deploy the NDT telemetry exporter on a physical testbed using Wi-Fi 7 capable hardware (e.g., TP-Link BE9300 APs, Intel BE200 adapters). Validate that NS-3-trained GCN generalises to hardware telemetry and measure any performance gap requiring fine-tuning.
Extend the attack model to include two additional Wi-Fi 7 MLO-specific threats: (1) MLD MAC address spoofing to hijack multi-link sessions, and (2) link-switch flooding to exhaust AP link-switch negotiation resources. Develop multi-class GCN capable of distinguishing between attack types.
Explore federated GCN training across multiple NDT instances in a multi-tenant environment. Each NDT trains locally on its own traffic; a central server aggregates gradients without sharing raw telemetry. Addresses privacy concerns in shared infrastructure deployments and enables cross-building collaborative detection without data leakage.
Anticipated questions from the examination panel, with technically detailed answers grounded in the project results.
Multi-Link Operation (MLO) is the headline feature of IEEE 802.11be (Wi-Fi 7), ratified in May 2024. It allows a Multi-Link Device (MLD) — which may be a client or access point — to simultaneously establish and use multiple radio links across different frequency bands (2.4 GHz, 5 GHz, and 6 GHz). Unlike previous Wi-Fi generations where a device operated on only one band at a time, an MLD can aggregate bandwidth across all three bands simultaneously, achieving peak PHY rates up to 46 Gbps.
MLO provides three key benefits: (1) link aggregation — combine bandwidth from multiple bands; (2) seamless failover — if one band is congested or blocked, traffic automatically shifts to another; (3) reduced latency — parallel frame transmission across links. The security model is inherited from 802.11ax with extensions, but the CSMA/CA backoff mechanism used at each link remains fundamentally unchanged.
In IEEE 802.11, before a station can transmit, it must first perform a CSMA/CA backoff procedure: it draws a random integer from [0, CW_min] (where CW_min = 15 by default), waits that many slot times (9 µs each), and then transmits if the channel is still idle. This mechanism ensures statistical fairness: all stations wait roughly the same average time before each transmission attempt.
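A backoff manipulation attack exploits a compromised NIC driver or firmware to bias this random draw. Two variants exist: starvation, where a positive bias (for example +5000 slots) keeps victim stations from winning channel access, and greedy, where a negative bias (for example −5000 slots) drives the attacker's backoff to near zero so that it monopolises the channel.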
Critically, both attacks involve only valid 802.11 frames. There are no packet malformations, no authentication failures, and no unusual frame types — making them invisible to signature-based IDS.
Traditional Intrusion Detection Systems (IDS) such as Snort and Suricata operate on two paradigms: (1) signature matching — comparing packet headers and payloads against known attack patterns; (2) anomaly detection — identifying deviations in packet rates, flow volumes, or protocol state machines.
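Backoff manipulation attacks evade both approaches: every frame is a valid 802.11 frame, so there is no signature to match, and there are no malformed packets, authentication failures, or unusual frame types for packet-level anomaly detection to flag.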
The only observable evidence is in aggregate MAC-layer statistics: the distribution of channel access times across stations. A GCN trained on 13 telemetry metrics (including avg_backoff_slots) can learn this distributional shift.
A Network Digital Twin (NDT) is a software-defined, continuously updated virtual replica of a physical network. It ingests real-time telemetry from the physical network (or, in our research context, from a high-fidelity simulation), maintains a current state model, and enables analysis, anomaly detection, and policy testing without touching the live network.
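In this project, the NDT consists of the NS-3 Wi-Fi 7 MLO simulation, the exporter and Redpanda (Kafka-compatible) telemetry stream, the harmonizer and TimescaleDB hypertable storage, the GCN detection engine, and the Grafana dashboard.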
The NDT paradigm is critical for this research because backoff manipulation cannot be observed at the application layer — it requires MAC-layer telemetry that is only accessible via the NDT's simulation-level instrumentation.
A Graph Convolutional Network (GCN) is chosen over alternative models (LSTM, CNN, Random Forest) for three specific reasons related to the nature of Wi-Fi network telemetry:
Empirically, the GCN approach achieves F1 = 0.9978, outperforming statistical baselines (a z-score on avg_backoff_slots alone achieves F1 ≈ 0.91) and matching the performance of more complex transformers at a fraction of the computational cost.
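For comparison, a z-score baseline of the kind mentioned above might look like the sketch below. It is not the baseline evaluated in the project: the threshold of 3, the function name, and all sample values are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def zscore_alerts(baseline, observed, threshold=3.0):
    """Flag observations whose z-score against the baseline exceeds the threshold."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [abs(x - mu) / sigma > threshold for x in observed]


# Invented avg_backoff_slots values: a normal baseline, then one normal
# window, one starvation-like window, and one greedy-like window.
baseline = [7.1, 7.6, 7.4, 7.9, 7.3, 7.5]
observed = [7.4, 5007.5, 0.2]
alerts = zscore_alerts(baseline, observed)
```

A single-metric test like this is easy to deploy, but it looks at one statistic in isolation, which is one plausible reason it trails a model that sees all 13 metrics across a whole window.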
The 13 telemetry metrics collected per window from NS-3 simulation are:
These 13 metrics are augmented with 4 one-hot encoded AP_ID bits, giving 17 input features per graph node. The one-hot encoding allows the model to learn AP-specific statistical baselines, which is essential for the multi-AP generalisation achieved in v3.0.0.
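Three specific architectural and training changes distinguish v3.0.0 from v2.0.0: AP count conditioning via scalar injection after the global pooling layer, multi-window training across window sizes 32, 64, 128, and 256, and a larger balanced training set of 32,000 samples.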
Net result: F1 improved from 0.9924 (v2.0.0) to 0.9978 (v3.0.0), and test coverage expanded from single-AP/256-window to 54 scenarios across all topology and window configurations. AUC improved from 0.9996 to 1.000.
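The phrase "scalar injection after global pool" can be made concrete with a small sketch. It assumes mean pooling over nodes and a conditioning scalar scaled to nAP / 4. The pooling type, the scaling, and the function names are all assumptions, since the source only states that the AP count is injected as a scalar after pooling.

```python
def global_mean_pool(node_embeddings):
    """Average node embeddings column-wise into one graph-level vector."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(row[d] for row in node_embeddings) / n for d in range(dim)]


def inject_ap_count(pooled, n_aps, max_aps=4):
    """Append the (scaled) AP count to the pooled vector before the classifier head."""
    if not 1 <= n_aps <= max_aps:
        raise ValueError(f"n_aps must be in 1..{max_aps}, got {n_aps}")
    return pooled + [n_aps / max_aps]


# Two nodes with 2-dimensional embeddings, from a 2-AP topology.
pooled = global_mean_pool([[1.0, 2.0], [3.0, 4.0]])
conditioned = inject_ap_count(pooled, n_aps=2)
```

The classifier head then sees one extra input describing the topology, which lets a single set of weights adapt its decision boundary to 1, 2, or 4 APs instead of requiring a model per topology.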
The 54/54 PASS result means that GCN v3.0.0 correctly classifies every single scenario across a 5-tier evaluation matrix of 54 independent test runs. Each scenario is an independent NS-3 simulation run with a specific combination of AP count, window size, bias magnitude and direction, and random seed group.
Pass criteria per scenario: (a) normal traffic windows must produce <100% attack-rate prediction; (b) attack traffic windows must produce >90% attack-rate prediction. In all 54 scenarios, both criteria are satisfied simultaneously. This corresponds to zero false positives and zero false negatives across the entire evaluation suite.
The 54 scenarios are not cherry-picked — they are systematically defined across the 5 evaluation tiers (T1: 12, T2: 6, T3: 6, T4: 12, T5: 18) to cover the full parameter space of the research scope.
Based on T4 (Bias Sensitivity) evaluation results, the minimum reliably detectable bias is ±1000 slots. This is 5× below the minimum bias used in training (±5000 slots), demonstrating significant extrapolation capability.
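Key observations at bias=1000: prediction confidence drops to ~74% (versus 95%+ at higher biases), yet the binary decision is correct in every T4 scenario (12/12 PASS).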
The practical implication: even a relatively mild backoff bias that produces a 17× increase in wait time is detectable by the system, providing early warning well before the 285× amplification seen at bias=5000.
WP13 implements the closed-loop policy actuation layer — the final missing piece that transforms the system from a detection-only tool into a full automated security system. The planned design involves three components:
Target performance for WP13: <100 ms from GCN prediction to countermeasure application, <1% false-positive actuation rate on normal traffic, full reversal within 5 seconds if the attacker is removed.
Three undergraduate researchers and two supervisors from the Department of Computer Engineering, University of Peradeniya.

Lead architect of the NS-3 simulation and telemetry pipeline. Responsible for Wi-Fi 7 MLO topology design, backoff bias injection mechanism, and Docker Compose orchestration (WP1–WP7).

Lead developer of the GCN detection engine. Responsible for GCN model design and training (v1.0.0 through v3.0.0), Windowizer service, evaluation framework, and 54-scenario test automation (WP8–WP12).

Lead developer of the observability and dashboard stack. Responsible for Grafana provisioning-as-code (38 panels), TimescaleDB schema design, React dashboard (port 8888), and pipeline integration (WP5, WP6, WP9.5, WP10, WP11).
Senior Lecturer, Department of Computer Engineering. Research interests: network security, intrusion detection, and machine learning for network management.
Senior Lecturer, Department of Computer Engineering. Research interests: software-defined networking, network function virtualisation, and future wireless systems.