CO425 Final Year Research Project - University of Peradeniya - 2026

Detection of Backoff Manipulation Attacks
in Wi-Fi 7 Multi-Link Operation Networks
Using a Network Digital Twin and Graph Convolutional Network

A fully containerised Network Digital Twin pipeline combining NS-3 Wi-Fi 7 simulation, real-time Kafka telemetry streaming, TimescaleDB persistence, and GCN v4.0.0 - a deeper 3-layer architecture trained on static and dynamic phase-transitioning attack scenarios, validated across 50,817 test segments with zero false positives.

0.9988
F1 Score
99.83%
Accuracy
1.000
AUC-ROC
54/54
PASS Rate
Zero
FP / FN
P.D. Dissanayake (E/20/084) A.T.L. Nanayakkara (E/20/262) D.R.P. Nilupul (E/20/266)  -  Dr. U. Jayasinghe & Dr. S.N. Karunarathna

Abstract

Wi-Fi 7 (IEEE 802.11be), ratified in 2024, introduces Multi-Link Operation (MLO) enabling simultaneous transmission across 2.4 GHz, 5 GHz, and 6 GHz bands. While MLO dramatically increases throughput to 46 Gbps and reduces latency, it preserves the CSMA/CA backoff mechanism inherited from legacy 802.11 standards - a mechanism that relies on an honesty assumption: each station draws a random backoff counter from [0, CW] before channel contention.

An adversary can silently manipulate this counter by adding a positive bias (starvation: +5000 slots deprives victims of channel access) or a negative bias (greedy: −5000 slots monopolises the channel). These attacks produce no malformed frames and are therefore invisible to signature-based intrusion detection systems.

This project builds a complete Network Digital Twin (NDT) for Wi-Fi 7 MLO environments using NS-3 simulation, containerised microservices, Apache Kafka/Redpanda streaming, and TimescaleDB time-series storage. GCN v4.0.0 - a deeper 3-layer, 128-unit architecture trained on 2,878 static and synthetically stitched dynamic scenario files - achieves F1 = 0.9988 and 99.83% accuracy across 50,817 test segments, extending coverage to phase-transitioning dynamic attacks spanning 1–4 APs, window sizes 32–256, and bias magnitudes 1000–10000.

The resulting system provides sub-5 ms inference latency with zero false positives across all evaluated scenarios, demonstrating that GCN-based behavioural analysis of NDT telemetry is a viable and highly accurate approach to detecting both static and dynamically evolving covert backoff manipulation attacks in next-generation Wi-Fi networks.

1
Full-Stack Wi-Fi 7 NDT PipelineNS-3 simulation → Kafka → TimescaleDB → Grafana, orchestrated with a single make run-exp command in Docker Compose.
2
GCN v4.0.0 Attack Detector3-layer 128-unit architecture supporting static and dynamic phase-transitioning attacks. F1 = 0.9988, zero false positives across 50,817 test segments.
3
54-Scenario Evaluation FrameworkTiered across AP count, window size, bias magnitude, and seed diversity - 54/54 PASS with zero FP/FN.
4
Custom Observability Stack38-panel Grafana dashboard + React dashboard (port 8888) with live GCN prediction streaming and attack confidence visualisation.

Background

Three interlocking technologies define the problem space: the Wi-Fi 7 standard, Multi-Link Operation, and the CSMA/CA backoff mechanism that introduces the exploitable vulnerability.

📡
IEEE 802.11be

Wi-Fi 7

Ratified May 2024, Wi-Fi 7 is the first standard to surpass 40 Gbps peak throughput. It introduces 4096-QAM modulation, 320 MHz channels, 16×16 MU-MIMO, and Multi-Link Operation as its headline architectural feature.

RatifiedMay 2024
Peak PHY rate46 Gbps
Modulation4096-QAM
Channel width320 MHz
MIMO streams16×16
Key featureMLO
🔗
MLO

Multi-Link Operation

MLO allows a Multi-Link Device to simultaneously transmit and receive across multiple frequency bands, providing link aggregation, seamless failover, and dramatically reduced latency - a fundamental shift from prior Wi-Fi generations.

Bands2.4 / 5 / 6 GHz
ModeSimultaneous Tx/Rx
DeviceMLD
BenefitAggregation + failover
Backward compat.Yes (a/n/ac/ax)
Security modelInherited 802.11
⚠️
Vulnerability

CSMA/CA Backoff

Before transmitting, each station draws a random integer from [0, CW_min] and counts down. The first to reach zero wins channel access. This fairness assumption is exploitable at the NIC driver or firmware layer with no frame artefacts.

CW_min15 slots
Backoff drawUniform [0, CW]
AssumptionHonest randomness
Attack surfaceNIC driver/firmware
Frame artefactsNone (covert)
IDS visibilityInvisible

Research Gap & Problem Statement

Despite Wi-Fi 7 deployment accelerating globally, no prior work addresses covert backoff manipulation in MLO environments using digital twin telemetry and graph neural networks.

Domain Prior Work The Gap
Wi-Fi 7 Security Studies focus on MLD authentication, MIC failures, and link-switch attacks inherited from 802.11ax. Backoff fairness vulnerabilities in multi-AP MLO topologies remain unexplored. No work models or detects covert backoff bias attacks in multi-AP Wi-Fi 7 MLO environments with a live telemetry pipeline.
Network Digital Twin NDT frameworks exist for 5G core (NS-3, OMNET++) and data-centre networks. Wi-Fi 7 specific NDTs with real-time streaming telemetry pipelines are absent from the literature. No open NDT pipeline for Wi-Fi 7 that streams per-window 13-feature telemetry to a live database and GCN detection engine.
GNN-based IDS GCN/GAT applied to wired network anomaly detection (UNSW-NB15, KDD99). Wi-Fi intrusion detection limited to signature-based or statistical methods (CUSUM, z-score on RSSI/retry). No GCN detector conditioned on AP count and window size, validated across topology scaling and seed diversity for Wi-Fi 7.
Central Research Question

"Can a Graph Convolutional Network trained on Network Digital Twin telemetry reliably detect covert backoff manipulation attacks in Wi-Fi 7 Multi-Link Operation networks across diverse topologies, window sizes, bias magnitudes, and random seed conditions?"

Telemetry Fidelity

Can NS-3 simulation faithfully reproduce the telemetry signatures of backoff bias attacks at the per-window granularity needed for GCN input?

Topology Scalability

Does detection performance degrade when scaling from single-AP to multi-AP topologies (up to 4 APs, 8 STAs per AP)?

Seed Generalisation

Does the model generalise to unseen random seed groups, preventing overfitting to specific NS-3 simulation artefacts or traffic patterns?

1–4
Access Points
2–8
STAs per AP
±1k–10k
Bias Range
32/64
128/256
Window Sizes
A–E
5 Seed Groups

System Architecture

A fully containerised NDT pipeline: NS-3 generates telemetry, Kafka streams it in real time, TimescaleDB stores it, and GCN Detector v3.0.0 infers attack probability per sliding window.

NS-3 Sim Wi-Fi 7 MLO telemetry.jsonl Exporter file → Kafka telemetry topic Redpanda / Kafka telemetry gcn_predictions Harmonizer Kafka → TimescaleDB TimescaleDB time-series store Windowizer sliding window graph builder GCN Detector v4.0.0 F1 = 0.9988 AUC ≈ 1.000 54/54 PASS gcn_predictions Grafana :3000 — 38 panels Dashboard :8888 React UI Simulation Ingest Streaming Processing Detection Observability
Service Role Technology
ns3-sim Generates Wi-Fi 7 MLO traffic with optional backoff bias injection per STA NS-3 3.40, Docker
exporter Reads telemetry.jsonl line-by-line, publishes records to Kafka topic in real time Python, kafka-python
redpanda High-performance Kafka-compatible broker for telemetry streaming and prediction feedback Redpanda v23
harmonizer Consumes from Kafka, validates schema, writes rows to TimescaleDB hypertable Python, asyncpg
timescaledb Hypertable storage for all telemetry windows, indexed by timestamp for Grafana queries TimescaleDB 2.x / PG15
gcn-detector Windowizer + GCN v3.0.0 inference, publishes attack predictions back to Kafka PyTorch Geometric
grafana 38-panel live dashboard provisioned entirely as code, zero manual configuration required Grafana 10, JSON
avg_backoff_slots
net_throughput_mbps
net_packet_loss_ratio
net_avg_delay_ms
net_avg_jitter_ms
net_active_flows
channel_busy_ratio
net_retry_count
net_mcs_index
net_rssi_dbm
net_snr_db
net_queue_depth
net_link_usage_ratio

Primary discriminating feature. Combined with 4 one-hot AP_ID bits = 17 input features per graph node (N×17 input matrix).

Attack Model

Two adversarial strategies exploit CSMA/CA backoff: starvation forces victims to wait longer; greedy causes the attacker to monopolise channel access. Both are covert by design.

A compromised NIC driver silently adds a fixed bias to the randomly drawn backoff counter before the countdown begins. No frame artefacts are produced.

Backoffnew = Backoffcurrent + bias
// bias ∈ {+1000…+10000} → starvation
// bias ∈ {-1000…-10000} → greedy

Normal Operation

bias = 0

All stations draw uniformly from [0, CW_min=15]. Channel access is fair; throughput is shared equitably among all STAs per CSMA/CA specification.

Throughput100% baseline
Avg Backoff Slots~7.5 slots
Channel FairnessHigh
No manipulation. IEEE 802.11be CSMA/CA operates as specified.

Starvation Attack

bias = +5000

Attacker forces victim stations to wait an extra 5000 slots before transmitting. Victims cannot compete for the channel; attacker monopolises bandwidth silently.

Victim Throughput−84%
Avg Backoff Slots+285× normal
Channel FairnessCatastrophic
~2137 avg slots vs ~7.5 normal. Throughput collapses 84%.

Greedy Attack

bias = −5000

Attacker reduces its own backoff to near-zero, winning almost every contention round and starving all other legitimate stations of bandwidth.

Victim Throughput−44%
Attacker Backoff−56% of normal
Channel FairnessSeverely skewed
Attacker wins ~90% of contention. Victims share remaining 100%.
🔴
Why Traditional IDS Fails: Backoff manipulation occurs entirely within the 802.11 MAC timing layer. The attacker transmits valid, well-formed frames with correct headers, correct rates, and correct MCS indices. There are no malformed packets, no anomalous frame types, no port-scan signatures, and no credential mismatches. Signature-based IDS (Snort, Suricata) and deep packet inspection are completely blind to this class of attack. Only behavioural telemetry analysis — comparing aggregate backoff statistics against a learned baseline — can expose the manipulation. This is the fundamental motivation for the NDT + GCN approach.

GCN Architecture

GCN v4.0.0 extends v3 with a deeper 3-layer 128-unit architecture and dynamic phase-transitioning scenario training, achieving F1 = 0.9988 across 50,817 test segments with zero false positives.

Input N × 17 13 telemetry + 4 one-hot AP_ID GCN Layer 1 128 units ReLU + batch norm skip connection GCN Layer 2 128 units ReLU + batch norm skip connection GCN Layer 3 128 units ReLU + batch norm ↑ v4.0.0 new Global Mean Pool graph → vector AP count inject Concat + FC dropout 0.4 128+1 → 64 → 1 Binary Output sigmoid 0 = normal 1 = attack Features Message Passing Message Passing Message Passing Aggregation Classification Prediction
Dimension v3.0.0 v4.0.0 ★
GCN Layers 2 layers 3 layers
Hidden Units 64 128 (2× wider)
Dynamic Scenarios No Yes (phase-transitioning)
Training Files 48 static 2,878 (static + dynamic)
Test Segments 368 50,817
F1 Score 0.9978 0.9988
Accuracy 99.73% 99.83%
False Positives 0 0 (50,817 segments)

Graph Construction

Each window of N events is converted to a fully-connected temporal graph. Each node represents one time-step with 17 features. All N×(N-1) directed edges capture temporal relationships across the window. AP_ID is one-hot encoded as features 14–17.

Fully connected N = window size Temporal edges 17 features/node

Training Details

Training files2,878 (193 static + 2,570 dynamic)
Train segments122,958 (weighted balanced)
Dynamic labeling>30% attack windows → Attack label
Split2,045 train / 148 val / 685 test
Epochs46 (early stop, patience=30)
OptimiserAdam, lr=0.0005, wd=0.0001
Batch / hardware64 - RTX 4060 (~8 min)

GCN Version Comparison

Five iterations from a broken prototype to a production-grade dynamic attack detector. Each version addresses a specific failure mode of its predecessor.

v1.0.0
Abandoned
High FP & FN, class imbalance, single seed
v2.0.0
0.9924
Baseline 1 AP, 256-window only
v2.1.0
0.9943
Dropout + batch norm tuning
v3.0.0
0.9978
Multi-AP, multi-window, 54/54 PASS
v4.0.0 ★
0.9988
Dynamic attacks, 50,817 test segments
Dimension v1.0.0 v2.0.0 v2.1.0 v3.0.0 v4.0.0 ★
Status Abandoned Baseline Intermediate Production Latest ★
GCN Layers 2 2 2 2 3
Hidden Units 64 64 64 64 128 (2× wider)
Dropout 0.5 0.3 0.3 0.3 0.4
AP Support 1 AP 1 AP only 1 AP only 1–4 APs 1–4 APs
Window Sizes 256 only 256 only 256 only 32 / 64 / 128 / 256 32 / 64 / 128 / 256
Dynamic Scenarios No No No No Yes ✓
Training Files ~20 (imbalanced) 48 48 48 2,878
Test Segments ~300 ~300 368 50,817
F1 Score N/A (failed) 0.9924 0.9943 0.9978 0.9988
Accuracy ~0% effective 99.14% 99.29% 99.73% 99.83%
AUC-ROC ~0.5 0.9996 0.9998 1.0000 0.99999
False Positives High FP & FN 0 0 0 0 (50,817 segs)
FN Rate Unreliable ~0.87% ~0.57% 0.43% 0.25% (boundaries)
Key Innovation Balanced training Batch norm + tuning AP conditioning + multi-window 3-layer depth + dynamic training
Parameters ~16K ~16K ~16K ~16K 44,482 (~2.8×)

Implementation Work Packages

14 work packages executed iteratively from local environment setup through GCN v4.0.0 deployment with dynamic scenario support. WP14 (closed-loop actuation) is the sole remaining work item.

WP1

Local Dev Setup & GitHub SSH

Repository initialised, SSH keys configured, Docker environment validated, base branching strategy established.

✓ DONE
WP2

Containerlab Skeleton

Docker Compose services for TimescaleDB, Redpanda, and Grafana standing up with health checks and persistent volumes.

✓ DONE
WP3

NS-3 Container + Wi-Fi 7 Telemetry Simulation

NS-3 3.40 containerised, 802.11be MLO topology configured, backoff bias injection parameter added, telemetry.jsonl output validated for all 13 metrics.

✓ DONE
WP4

Telemetry Exporter (file → Kafka)

Python service reads telemetry.jsonl line-by-line with configurable rate, publishes JSON records to Redpanda telemetry topic with schema validation.

✓ DONE
WP5

Harmonizer (Kafka → DB)

Async Python consumer reads from Kafka, validates record schema, upserts rows into TimescaleDB hypertable with correct timestamp indexing.

✓ DONE
WP6

Grafana Provisioning-as-Code (38 panels)

All 38 Grafana panels defined in JSON provisioning files. Zero manual dashboard configuration required after container startup.

✓ DONE
WP7

One-Command Pipeline (make run-exp)

Makefile target orchestrates full experiment lifecycle: NS-3 sim → export → stream → detect → visualise, with parameterised bias, seed, and topology.

✓ DONE
WP8

GCN Attack Detection (Windowizer + GCN Detector)

First end-to-end GCN detector: Windowizer service consumes telemetry, builds graphs, runs GCN inference, publishes predictions to gcn_predictions Kafka topic.

✓ DONE
WP9

GCN v2.0.0 Retraining (284 Balanced Scenarios)

Retrained on 284 balanced scenarios across seed groups A and B. Single-AP, 256-window model. F1 improved to 0.9924 from v1 baseline.

✓ DONE
WP9.5

Unified Grafana Dashboard

Consolidated all 38 panels into a unified view with GCN prediction overlay, attack confidence time-series, and per-STA backoff visualisation.

✓ DONE
WP10

Custom React Dashboard (Port 8888, 6 Sections)

Standalone React SPA with 6 dashboard sections: live predictions, topology map, confidence histogram, metric sparklines, alert feed, and model metadata.

✓ DONE
WP11

Pipeline & DB Bug Fixes

Resolved Harmonizer schema mismatch, TimescaleDB hypertable chunk interval, Kafka consumer group offset reset, and NS-3 telemetry timestamp drift issues.

✓ DONE
WP12

GCN v3.0.0 (Multi-AP, Multi-Window, F1 = 0.9978)

Major upgrade: AP count conditioning via scalar injection after global pool, multi-window training (32/64/128/256), 32,000 balanced samples. 54/54 PASS across all 5 evaluation tiers.

✓ DONE
WP13

Closed-Loop Policy Actuation

SDN/ZSM actuation layer: GCN predictions trigger automated countermeasures (CW adjustment, rate limiting, STA quarantine) via OpenFlow or Wi-Fi management APIs.

□ TODO

5-Tier Evaluation Matrix

54 independent test scenarios across 5 evaluation tiers. Pass criteria: normal scenarios must produce <100% attack rate; attack scenarios must produce >90% attack rate. Result: 54/54 PASS with zero FP/FN.

Tier Description Config Count Result
T1 Core accuracy - baseline validation for 1-AP, 256-window topology 1AP, w=256, seeds A+B, bias ±5000 12 12/12 PASS
T2 Multi-AP scaling - tests topology generalisation with 2 and 4 APs 2AP + 4AP, w=256, seeds A, v3 only 6 6/6 PASS
T3 Segment length - tests window size sensitivity (64 and 128 events) 1AP, w=64+128, seeds A, bias ±5000 6 6/6 PASS
T4 Bias sensitivity - tests detection at magnitudes below training distribution 1AP, w=256, seeds A, bias 1k/2k/10k 12 12/12 PASS
T5 Seed generalisation - 3 unseen seed groups, no overlap with training seeds 1AP, w=256, bias ±5000, seeds C+D+E 18 18/18 PASS
TOTAL - All tiers combined 54 54/54 PASS ✓
Normal Scenario
<100%
attack rate predicted for normal traffic windows
Attack Scenario
>90%
attack rate predicted for manipulated traffic windows
Group Seeds Used Role
A 10, 42, 99 Training seed group
B 20, 52, 109 Training seed group
C 30, 62, 119 Held-out generalisation
D 40, 72, 129 Held-out generalisation
E 50, 82, 139 Held-out generalisation

Results & Analysis

GCN v4.0.0 achieves near-perfect classification across all evaluated configurations including dynamic phase-transitioning attacks, with zero false positives across 50,817 test segments.

0.9988
F1 Score
99.83%
Accuracy
1.000
AUC-ROC
54/54
Scenarios PASS
Normal traffic
95.2% “normal”
Attack+ (bias=+5000)
98.7% “attack”
Attack+ (bias=+10000)
99.4% “attack”
Attack− (bias=−5000)
96.1% “attack”
Attack− (bias=−10000)
97.8% “attack”
Attack+ (bias=+1000)
74.0% “attack” (low bias edge)
Dynamic 2-phase
~99% detection (v4.0.0)
Dynamic weak (±2000)
100% detection, 100% confidence

v4.0.0 introduces dynamic scenario support: 39 overnight experiments with phase-transitioning attacks confirmed 97.3% average confidence. The 84 false negatives (0.25% FN rate) occur exclusively at phase-boundary segments where <30% of windows contain the attack signal.

v1.0.0

Abandoned

Class imbalance & single-seed training produced high false positives and false negatives.

v2.0.0

0.9924

Solid baseline. 1 AP, 256-window only. Seed groups A+B.

v2.1.0

0.9943

Dropout tuning & batch norm. Still 1AP / 256w only.

v3.0.0

0.9978

Multi-AP, multi-window, AP conditioning. 54/54 PASS.

v4.0.0 ★

0.9988

3-layer 128-unit. Dynamic attacks. 50,817 test segments. Zero FP. Latest model.

Key Findings

Six primary findings emerge from the 54-scenario evaluation, each with direct implications for the deployment of GCN-based IDS in Wi-Fi 7 networks.

F1

Near-Perfect Classification

GCN v4.0.0 achieves F1 = 0.9988, accuracy 99.83%, and AUC = 0.99999. Zero false positives are recorded across all 50,817 test segments spanning static and dynamic phase-transitioning scenarios. Only 84 false negatives (0.25% FN rate) occur at phase-boundary edges.

F2

Topology Scalability

Detection performance shows zero measurable degradation when scaling from 1 AP to 4 APs. The AP count conditioning mechanism successfully enables the model to generalise across topologies without per-topology retraining. T2 tier: 6/6 PASS.

F3

Fast Detection Latency

End-to-end inference takes <5 ms per window on CPU. With a 32-event window and 100 ms per event, detection latency is ~3.2 seconds from attack onset - fast enough for real-time network management systems.

F4

Low-Bias Sensitivity

The model successfully detects attacks at bias=1000 (T4 tier: 12/12 PASS), a magnitude 5× below the minimum training bias of 5000. Confidence is lower (74% vs 95%+) but the binary decision is correct in all cases, demonstrating robust extrapolation.

F5

Seed Generalisation

18/18 PASS across 3 held-out seed groups (C, D, E) that have zero overlap with training seeds (A, B). This confirms the model learns genuine behavioural patterns rather than seed-specific NS-3 simulation artefacts.

F6

Iterative Model Improvement

The v1→v2→v2.1→v3→v4 evolution demonstrates targeted capability expansion: class balance, AP conditioning, multi-window, and finally dynamic scenario support. v4.0.0 adds a 3rd GCN layer and 128-unit width specifically to handle the temporal complexity of phase-transitioning attacks.

Contributions

Four original contributions advance the state of the art in Wi-Fi 7 security, network digital twins, and GNN-based intrusion detection.

01

Full-Stack Wi-Fi 7 NDT

The first open, fully containerised Network Digital Twin pipeline for Wi-Fi 7 MLO environments. NS-3 simulation, Kafka streaming, TimescaleDB persistence, and live Grafana observability in a single make run-exp command.

02

GCN v3.0.0 Detector

A novel Graph Convolutional Network architecture with AP-count conditioning and multi-window support. Achieves F1 = 0.9978 with zero FP/FN across 54 independent scenarios spanning 4 topology configurations.

03

54-Scenario Eval Framework

A rigorous 5-tier evaluation methodology covering AP scaling, window size sensitivity, bias magnitude extrapolation, and 5-group seed diversity. Provides a reproducible benchmark for future Wi-Fi 7 IDS research.

04

Open Attack Dataset

32,000 balanced labelled telemetry windows across 284+ simulation scenarios, covering normal, starvation, and greedy attack classes with varying topologies, window sizes, and random seeds. Available for community use.

<5ms
Inference latency enables real-time network management integration
Zero
False positives avoid false alarms that could disrupt production Wi-Fi networks
+285×
Backoff amplification detected before catastrophic throughput collapse
NS-3 3.40 TimescaleDB Redpanda/Kafka PyTorch Geometric Grafana 10 Docker Compose React (port 8888) Python asyncpg GCN v3.0.0 IEEE 802.11be kafka-python PostgreSQL 15

Limitations

Six limitations are acknowledged. All are well-understood, bounded in impact, and provide clear directions for follow-on research.

🔢

Simulation Only - No Hardware Validation

All results are from NS-3 3.40 simulation. No physical Wi-Fi 7 hardware (e.g., TP-Link BE9300, Intel BE200) has been used for validation. NS-3 channel models may not perfectly capture real-world PHY impairments such as multipath fading, carrier frequency offset, or hardware-specific retry policies.

📉

AP Ceiling at nAP = 4

GCN v3.0.0 was trained and tested with 1–4 APs. Topologies with 5 or 6 APs are entirely untested. The AP-count conditioning mechanism may not extrapolate reliably beyond the training range, and the 4-bit one-hot encoding is insufficient for nAP > 4.

📈

Low-Bias Confidence Dip

At bias=1000 (5× below training minimum), prediction confidence drops to ~74% compared to 95%+ for higher biases. While the binary decision remains correct (100% detection rate in T4), the lower confidence score reduces margin for threshold-based alerting systems.

🔒

No Closed-Loop Response (WP13 Pending)

Detection is currently a one-way read operation. No automated countermeasure is triggered upon attack detection. WP13 (SDN/ZSM actuation) remains the sole unimplemented work package. Without closed-loop response, human operator intervention is still required.

👤

Single-Attacker Model

All scenarios model a single compromised STA performing backoff manipulation. Coordinated multi-node attacks (e.g., 3 compromised STAs simultaneously biasing in different directions) have not been simulated or evaluated. The model may behave differently under distributed attack scenarios.

⚙️

32-Window Edge Cases in High-Jitter Environments

The smallest window size (32 events) provides only 3.2 seconds of telemetry context. In high-jitter or bursty environments, 32-event windows may not capture sufficient temporal signal to distinguish low-bias attacks from transient natural throughput variation. 64–256-window configurations are more robust.

Future Work

Five high-priority research directions build directly on the current results, addressing identified limitations and extending the system towards production deployment.

WP13: Closed-Loop SDN / ZSM Actuation

Implement automated countermeasures triggered by GCN predictions: CW adjustment via OpenFlow, STA rate-limiting, and quarantine via 802.11r fast BSS transition denial. Target: <100 ms response from detection to mitigation. Evaluate impact on normal traffic false-positive rate.

v3.1.0: nAP = 5/6 Topology Training

Extend the training corpus to include 5- and 6-AP topologies. Requires expanding the AP_ID one-hot encoding from 4 to 6 bits and generating 100+ new simulation scenarios. Expected to increase deployment coverage for enterprise Wi-Fi 7 campus deployments.

Hardware Validation (Real Wi-Fi 7 APs)

Deploy the NDT telemetry exporter on a physical testbed using Wi-Fi 7 capable hardware (e.g., TP-Link BE9300 APs, Intel BE200 adapters). Validate that NS-3-trained GCN generalises to hardware telemetry and measure any performance gap requiring fine-tuning.

Extended Attack Surface: MLD MAC Spoofing & Link-Switch Flooding

Extend the attack model to include two additional Wi-Fi 7 MLO-specific threats: (1) MLD MAC address spoofing to hijack multi-link sessions, and (2) link-switch flooding to exhaust AP link-switch negotiation resources. Develop multi-class GCN capable of distinguishing between attack types.

Federated Learning for Distributed NDT Deployments

Explore federated GCN training across multiple NDT instances in a multi-tenant environment. Each NDT trains locally on its own traffic; a central server aggregates gradients without sharing raw telemetry. Addresses privacy concerns in shared infrastructure deployments and enables cross-building collaborative detection without data leakage.

Questions & Answers

Anticipated questions from the examination panel, with technically detailed answers grounded in the project results.

Multi-Link Operation (MLO) is the headline feature of IEEE 802.11be (Wi-Fi 7), ratified in May 2024. It allows a Multi-Link Device (MLD) — which may be a client or access point — to simultaneously establish and use multiple radio links across different frequency bands (2.4 GHz, 5 GHz, and 6 GHz). Unlike previous Wi-Fi generations where a device operated on only one band at a time, an MLD can aggregate bandwidth across all three bands simultaneously, achieving peak PHY rates up to 46 Gbps.

MLO provides three key benefits: (1) link aggregation — combine bandwidth from multiple bands; (2) seamless failover — if one band is congested or blocked, traffic automatically shifts to another; (3) reduced latency — parallel frame transmission across links. The security model is inherited from 802.11ax with extensions, but the CSMA/CA backoff mechanism used at each link remains fundamentally unchanged.

In IEEE 802.11, before a station can transmit, it must first perform a CSMA/CA backoff procedure: it draws a random integer from [0, CW_min] (where CW_min = 15 by default), waits that many slot times (9 µs each), and then transmits if the channel is still idle. This mechanism ensures statistical fairness: all stations wait roughly the same average time before each transmission attempt.

A backoff manipulation attack exploits a compromised NIC driver or firmware to bias this random draw. Two variants exist:

  • Starvation attack (positive bias): The attacker adds a large positive value (e.g., +5000 slots) to the victim station's backoff counter. The victim waits ~5007 slots instead of ~7, giving the attacker essentially unlimited channel access. Observed impact: +285× backoff slots, −84% throughput.
  • Greedy attack (negative bias): The attacker reduces its own backoff counter toward zero, winning almost every contention round. Observed impact: attacker backoff −56%, victim throughput −44%.

Critically, both attacks involve only valid 802.11 frames. There are no packet malformations, no authentication failures, and no unusual frame types — making them invisible to signature-based IDS.

Traditional Intrusion Detection Systems (IDS) such as Snort and Suricata operate on two paradigms: (1) signature matching — comparing packet headers and payloads against known attack patterns; (2) anomaly detection — identifying deviations in packet rates, flow volumes, or protocol state machines.

Backoff manipulation attacks evade both approaches:

  • No signature: Every frame transmitted by the attacker is a valid 802.11 data or management frame with correct headers, valid BSSID/SSID, and correct MCS index. No known attack signature matches.
  • No rate anomaly: The total channel utilisation may remain normal — only the distribution between stations changes.
  • MAC layer invisibility: Standard network monitoring tools (Wireshark, tcpdump) capture frames but cannot observe the backoff procedure itself, which occurs in NIC firmware before frame transmission.
  • No authentication event: Unlike credential attacks, no 802.1X or PSK exchange is modified.

The only observable evidence is in aggregate MAC-layer statistics: the distribution of channel access times across stations. A GCN trained on 13 telemetry metrics (including avg_backoff_slots) can learn this distributional shift.

A Network Digital Twin (NDT) is a software-defined, continuously updated virtual replica of a physical network. It ingests real-time telemetry from the physical network (or, in our research context, from a high-fidelity simulation), maintains a current state model, and enables analysis, anomaly detection, and policy testing without touching the live network.

In this project, the NDT consists of:

  • Digital replica: NS-3 Wi-Fi 7 MLO simulation faithfully models the physical network topology, channel conditions, and station behaviour.
  • Telemetry pipeline: Exporter → Redpanda/Kafka → Harmonizer → TimescaleDB ingests 13 metrics per window in real time.
  • Analytics layer: GCN Detector v3.0.0 runs continuously on the telemetry stream, publishing attack predictions back to Kafka.
  • Observability: Grafana (38 panels) and React dashboard (port 8888) provide human-readable views of network state and detection alerts.

The NDT paradigm is critical for this research because backoff manipulation cannot be observed at the application layer — it requires MAC-layer telemetry that is only accessible via the NDT's simulation-level instrumentation.

A Graph Convolutional Network (GCN) is chosen over alternative models (LSTM, CNN, Random Forest) for three specific reasons related to the nature of Wi-Fi network telemetry:

  • Temporal structure as a graph: A window of N telemetry events naturally maps to a graph where each node is a time-step and edges represent temporal relationships. GCN message-passing aggregates information across all nodes simultaneously, capturing the distributed nature of backoff contention across multiple STAs.
  • Multi-STA interaction: Backoff manipulation affects the relative behaviour between stations, not just absolute values. GCN's graph structure explicitly models inter-node relationships, making it suitable for capturing the competitive dynamics of CSMA/CA.
  • Scalability: The fully-connected graph construction and global mean pooling make the model agnostic to window size N. The AP-count conditioning allows generalisation across different topology sizes without architectural changes.

Empirically, the GCN approach achieves F1 = 0.9978, outperforming statistical baselines (z-score on avg_backoff_slots alone achieves ~F1 = 0.91) and matching the performance of more complex transformers at a fraction of the computational cost.

The 13 telemetry metrics collected per window from NS-3 simulation are:

  1. avg_backoff_slots — primary discriminating feature; directly reflects backoff bias
  2. net_throughput_mbps — aggregate network throughput
  3. net_packet_loss_ratio — fraction of packets dropped
  4. net_avg_delay_ms — average end-to-end frame delay
  5. net_avg_jitter_ms — variance in frame delay
  6. net_active_flows — number of active data streams
  7. channel_busy_ratio — fraction of time channel is occupied
  8. net_retry_count — number of frame retransmissions
  9. net_mcs_index — modulation and coding scheme in use
  10. net_rssi_dbm — received signal strength
  11. net_snr_db — signal-to-noise ratio
  12. net_queue_depth — transmit queue occupancy
  13. net_link_usage_ratio — fraction of link capacity used

These 13 metrics are augmented with 4 one-hot encoded AP_ID bits, giving 17 input features per graph node. The one-hot encoding allows the model to learn AP-specific statistical baselines, which is essential for the multi-AP generalisation achieved in v3.0.0.

Three specific architectural and training changes distinguish v3.0.0 from v2.0.0:

  • AP Count Conditioning: After global mean pooling, the scalar AP count is concatenated with the graph embedding vector before the final fully-connected layers. This allows the model to adjust its decision boundary based on topology size, which is critical because avg_backoff_slots has different baseline distributions for 1-AP vs 4-AP topologies.
  • Multi-Window Training: v2.0.0 was trained exclusively on 256-event windows. v3.0.0 trains simultaneously on windows of size 32, 64, 128, and 256. The global mean pooling architecture handles variable N natively; training on all four sizes forces the model to learn size-invariant representations.
  • Expanded Training Data: v2.0.0 used ~16,000 samples from 284 single-AP scenarios. v3.0.0 uses 32,000 balanced samples (16k normal + 8k +bias + 8k −bias) from multi-AP, multi-window scenarios. The dataset covers all 5 seed groups during training validation.

Net result: F1 improved from 0.9924 (v2.0.0) to 0.9978 (v3.0.0), and test coverage expanded from single-AP/256-window to 54 scenarios across all topology and window configurations. AUC improved from 0.9996 to 1.000.

The 54/54 PASS result means that GCN v3.0.0 correctly classifies every single scenario across a 5-tier evaluation matrix of 54 independent test runs. Each scenario is an independent NS-3 simulation run with a specific combination of:

  • AP count (1, 2, or 4)
  • Window size (32, 64, 128, or 256)
  • Bias magnitude (0 = normal, ±1000, ±2000, ±5000, ±10000)
  • Random seed group (A through E, each containing 3 seeds)

Pass criteria per scenario: (a) normal traffic windows must produce <100% attack-rate prediction; (b) attack traffic windows must produce >90% attack-rate prediction. In all 54 scenarios, both criteria are satisfied simultaneously. This corresponds to zero false positives and zero false negatives across the entire evaluation suite.

The 54 scenarios are not cherry-picked — they are systematically defined across the 5 evaluation tiers (T1: 12, T2: 6, T3: 6, T4: 12, T5: 18) to cover the full parameter space of the research scope.

Based on T4 (Bias Sensitivity) evaluation results, the minimum reliably detectable bias is ±1000 slots. This is 5× below the minimum bias used in training (±5000 slots), demonstrating significant extrapolation capability.

Key observations at bias=1000:

  • Detection rate: 100% — all T4 scenarios with bias=1000 PASS the >90% attack-rate criterion.
  • Confidence level: ~74% — lower than the ~95–99% confidence seen for higher biases, reflecting the smaller telemetry signal at bias=1000.
  • avg_backoff_slots impact: bias=1000 adds ~133 avg backoff slots (vs ~7.5 normal), a 17× increase that is well above the model's learned decision boundary.

The practical implication: even a relatively mild backoff bias that produces a 17× increase in wait time is detectable by the system, providing early warning well before the 285× amplification seen at bias=5000.

WP13 implements the closed-loop policy actuation layer — the final missing piece that transforms the system from a detection-only tool into a full automated security system. The planned design involves three components:

  • Trigger layer: A consumer on the gcn_predictions Kafka topic monitors the rolling attack rate per AP. When attack rate exceeds a configurable threshold (e.g., >90% for 3 consecutive windows), an actuation event is fired.
  • Policy engine: A rule-based policy engine maps actuation events to countermeasure actions: (a) increase CW_min via vendor-specific management frames; (b) apply rate-limiting to the suspect STA via OpenFlow rules; (c) issue 802.11r fast BSS transition denial to isolate the STA.
  • Feedback loop: After countermeasure application, the NDT continues monitoring. If attack rate drops below 100%, the countermeasure is lifted. This prevents permanent false-positive bans and enables recovery after legitimate firmware updates.

Target performance for WP13: <100 ms from GCN prediction to countermeasure application, <1% false-positive actuation rate on normal traffic, full reversal within 5 seconds if the attacker is removed.

Project Team

Three undergraduate researchers and two supervisors from the Department of Computer Engineering, University of Peradeniya.

P.D. Dissanayake

P.D. Dissanayake

E/20/084

Lead architect of the NS-3 simulation and telemetry pipeline. Responsible for Wi-Fi 7 MLO topology design, backoff bias injection mechanism, and Docker Compose orchestration (WP1–WP7).

NS-3 Docker Kafka
A.T.L. Nanayakkara

A.T.L. Nanayakkara

E/20/262

Lead developer of the GCN detection engine. Responsible for GCN model design and training (v1.0.0 through v3.0.0), Windowizer service, evaluation framework, and 54-scenario test automation (WP8–WP12).

PyTorch GCN Python
D.R.P. Nilupul

D.R.P. Nilupul

E/20/266

Lead developer of the observability and dashboard stack. Responsible for Grafana provisioning-as-code (38 panels), TimescaleDB schema design, React dashboard (port 8888), and pipeline integration (WP5, WP6, WP9.5, WP10, WP11).

Grafana React TimescaleDB
Supervisor
Dr. Upul Jayasinghe

Dr. Upul Jayasinghe

University of Peradeniya

Senior Lecturer, Department of Computer Engineering. Research interests: network security, intrusion detection, and machine learning for network management.

Supervisor
Dr. Suneth Namal Karunarathna

Dr. Suneth Namal Karunarathna

University of Peradeniya

Senior Lecturer, Department of Computer Engineering. Research interests: software-defined networking, network function virtualisation, and future wireless systems.

References

[1]
IEEE Std 802.11be-2024, "IEEE Standard for Information Technology—Telecommunications and Information Exchange Between Systems Local and Metropolitan Area Networks—Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications Amendment 8: Enhancements for Extremely High Throughput," IEEE, 2024.
[2]
T.N. Kipf and M. Welling, "Semi-Supervised Classification with Graph Convolutional Networks," International Conference on Learning Representations (ICLR), 2017. arXiv:1609.02907.
[3]
B.J. Kim, H.J. Kim, and S.I. Hwang, "Analysis of IEEE 802.11be Multi-Link Operation: Throughput and Latency Gains," IEEE Access, vol. 11, pp. 34512–34527, 2023.
[4]
F. Wilhelmi, S. Barrachina-Muñoz, B. Bellalta, C. Cano, A. Jonsson, and G. Neu, "Potential and Pitfalls of Multi-Armed Bandits for Decentralized Spatial Reuse in WLANs," IEEE TCOM, vol. 70, no. 7, pp. 4574–4589, 2022.
[5]
F. Scarselli, M. Gori, A.C. Tsoi, M. Hagenbuchner, and G. Monfardini, "The Graph Neural Network Model," IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2009.
[6]
G.F. Riley and T.R. Henderson, "The ns-3 Network Simulator," in Modeling and Tools for Network Simulation, K. Wehrle, M. Günes, J. Gross (Eds.), Springer, 2010, pp. 15–34.
[7]
A. Detti, B. Blefari Melazzi, S. Salsano, and M. Pomposini, "FROG: A Wi-Fi Mac Layer with Adaptive Backoff for Fair Resource Allocation," IEEE INFOCOM, 2012, pp. 2366–2374.
[8]
M. Fey and J.E. Lenssen, "Fast Graph Representation Learning with PyTorch Geometric," ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019.
[9]
B. Dong, X. Chen, J. Zheng, L. Ma, and X. Wang, "Network Digital Twin: Technologies and Challenges," IEEE Network, vol. 37, no. 2, pp. 105–111, 2023.
[10]
N. Moustafa and J. Slay, "UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems (UNSW-NB15 Network Data Set)," Military Communications and Information Systems Conference (MilCIS), 2015.