CO425 Final Year Research Project - University of Peradeniya - 2026

Detection of Backoff Manipulation Attacks
in Wi-Fi 7 Multi-Link Operation Networks
Using a Network Digital Twin and Graph Convolutional Network

A fully containerised Network Digital Twin pipeline combining NS-3 Wi-Fi 7 simulation, real-time Kafka telemetry streaming, TimescaleDB persistence, and GCN v4.0.0 - a deeper 3-layer architecture trained on static and dynamic phase-transitioning attack scenarios, validated across 50,817 test segments with zero false positives.

F1 Score: 0.9988

Accuracy: 99.83%

AUC-ROC: 1.000

PASS Rate: 54/54

FP / FN: Zero

P.D. Dissanayake (E/20/084), A.T.L. Nanayakkara (E/20/262), D.R.P. Nilupul (E/20/266) - Dr. U. Jayasinghe & Dr. S.N. Karunarathna

Overview

Abstract

Wi-Fi 7 (IEEE 802.11be), ratified in 2024, introduces Multi-Link Operation (MLO), which enables simultaneous transmission across the 2.4 GHz, 5 GHz, and 6 GHz bands. While Wi-Fi 7 raises the peak PHY rate to 46 Gbps and MLO reduces latency, the standard preserves the CSMA/CA backoff mechanism inherited from legacy 802.11 standards. That mechanism relies on an honesty assumption: each station draws a random backoff counter from [0, CW] before channel contention.

An adversary can silently manipulate this counter by adding a positive bias (starvation: +5000 slots deprives victims of channel access) or a negative bias (greedy: −5000 slots monopolises the channel). These attacks produce no malformed frames and are therefore invisible to signature-based intrusion detection systems.

This project builds a complete Network Digital Twin (NDT) for Wi-Fi 7 MLO environments using NS-3 simulation, containerised microservices, Apache Kafka/Redpanda streaming, and TimescaleDB time-series storage. GCN v4.0.0 - a deeper 3-layer, 128-unit architecture trained on 2,878 static and synthetically stitched dynamic scenario files - achieves F1 = 0.9988 and 99.83% accuracy across 50,817 test segments, extending coverage to phase-transitioning dynamic attacks spanning 1–4 APs, window sizes 32–256, and bias magnitudes 1000–10000.

The resulting system provides sub-5 ms inference latency with zero false positives across all evaluated scenarios, demonstrating that GCN-based behavioural analysis of NDT telemetry is a viable and highly accurate approach to detecting both static and dynamically evolving covert backoff manipulation attacks in next-generation Wi-Fi networks.

Key Contributions

Full-Stack Wi-Fi 7 NDT Pipeline: NS-3 simulation → Kafka → TimescaleDB → Grafana, orchestrated with a single make run-exp command in Docker Compose.

GCN v4.0.0 Attack Detector: 3-layer 128-unit architecture supporting static and dynamic phase-transitioning attacks. F1 = 0.9988, zero false positives across 50,817 test segments.

54-Scenario Evaluation Framework: Tiered across AP count, window size, bias magnitude, and seed diversity - 54/54 PASS with zero FP/FN.

Custom Observability Stack: 38-panel Grafana dashboard + React dashboard (port 8888) with live GCN prediction streaming and attack confidence visualisation.

Context

Background

Three interlocking technologies define the problem space: the Wi-Fi 7 standard, Multi-Link Operation, and the CSMA/CA backoff mechanism that introduces the exploitable vulnerability.

📡

IEEE 802.11be

Wi-Fi 7

Ratified May 2024, Wi-Fi 7 is the first standard to surpass 40 Gbps peak throughput. It introduces 4096-QAM modulation, 320 MHz channels, 16×16 MU-MIMO, and Multi-Link Operation as its headline architectural feature.

Ratified: May 2024

Peak PHY rate: 46 Gbps

Modulation: 4096-QAM

Channel width: 320 MHz

MIMO streams: 16×16

Key feature: MLO

🔗

MLO

Multi-Link Operation

MLO allows a Multi-Link Device to simultaneously transmit and receive across multiple frequency bands, providing link aggregation, seamless failover, and dramatically reduced latency - a fundamental shift from prior Wi-Fi generations.

Bands: 2.4 / 5 / 6 GHz

Mode: Simultaneous Tx/Rx

Device: MLD

Benefit: Aggregation + failover

Backward compat.: Yes (a/n/ac/ax)

Security model: Inherited 802.11

⚠️

Vulnerability

CSMA/CA Backoff

Before transmitting, each station draws a random integer from [0, CW_min] and counts down. The first to reach zero wins channel access. This fairness assumption is exploitable at the NIC driver or firmware layer with no frame artefacts.

CW_min: 15 slots

Backoff draw: Uniform [0, CW]

Assumption: Honest randomness

Attack surface: NIC driver/firmware

Frame artefacts: None (covert)

IDS visibility: Invisible

Motivation

Research Gap & Problem Statement

Despite Wi-Fi 7 deployment accelerating globally, no prior work addresses covert backoff manipulation in MLO environments using digital twin telemetry and graph neural networks.

Domain	Prior Work	The Gap
Wi-Fi 7 Security	Studies focus on MLD authentication, MIC failures, and link-switch attacks inherited from 802.11ax. Backoff fairness vulnerabilities in multi-AP MLO topologies remain unexplored.	No work models or detects covert backoff bias attacks in multi-AP Wi-Fi 7 MLO environments with a live telemetry pipeline.
Network Digital Twin	NDT frameworks exist for 5G core (NS-3, OMNeT++) and data-centre networks. Wi-Fi 7 specific NDTs with real-time streaming telemetry pipelines are absent from the literature.	No open NDT pipeline for Wi-Fi 7 that streams per-window 13-feature telemetry to a live database and GCN detection engine.
GNN-based IDS	GCN/GAT applied to wired network anomaly detection (UNSW-NB15, KDD99). Wi-Fi intrusion detection limited to signature-based or statistical methods (CUSUM, z-score on RSSI/retry).	No GCN detector conditioned on AP count and window size, validated across topology scaling and seed diversity for Wi-Fi 7.

Central Research Question

"Can a Graph Convolutional Network trained on Network Digital Twin telemetry reliably detect covert backoff manipulation attacks in Wi-Fi 7 Multi-Link Operation networks across diverse topologies, window sizes, bias magnitudes, and random seed conditions?"

Sub-Problem 1

Telemetry Fidelity

Can NS-3 simulation faithfully reproduce the telemetry signatures of backoff bias attacks at the per-window granularity needed for GCN input?

Sub-Problem 2

Topology Scalability

Does detection performance degrade when scaling from single-AP to multi-AP topologies (up to 4 APs, 8 STAs per AP)?

Sub-Problem 3

Seed Generalisation

Does the model generalise to unseen random seed groups, preventing overfitting to specific NS-3 simulation artefacts or traffic patterns?

Experimental Scope

Access Points: 1–4

STAs per AP: 2–8

Bias Range: ±1k–10k

Window Sizes: 32 / 64 / 128 / 256

Seed Groups: A–E (5 groups)

System Design

System Architecture

A fully containerised NDT pipeline: NS-3 generates telemetry, Kafka streams it in real time, TimescaleDB stores it, and GCN Detector v3.0.0 infers attack probability per sliding window.

Pipeline Components

Service	Role	Technology
ns3-sim	Generates Wi-Fi 7 MLO traffic with optional backoff bias injection per STA	NS-3 3.40, Docker
exporter	Reads telemetry.jsonl line-by-line, publishes records to Kafka topic in real time	Python, kafka-python
redpanda	High-performance Kafka-compatible broker for telemetry streaming and prediction feedback	Redpanda v23
harmonizer	Consumes from Kafka, validates schema, writes rows to TimescaleDB hypertable	Python, asyncpg
timescaledb	Hypertable storage for all telemetry windows, indexed by timestamp for Grafana queries	TimescaleDB 2.x / PG15
gcn-detector	Windowizer + GCN v3.0.0 inference, publishes attack predictions back to Kafka	PyTorch Geometric
grafana	38-panel live dashboard provisioned entirely as code, zero manual configuration required	Grafana 10, JSON
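
The exporter row above can be illustrated with a minimal sketch. It assumes one JSON object per line in telemetry.jsonl; the broker address `redpanda:9092`, the topic name `telemetry`, and both function names are placeholders chosen for illustration, not the project's actual code. The parsing step is kept separate from the Kafka client so it can be exercised without a running broker.

```python
import json

def export_lines(lines, send):
    """Parse each JSONL line and hand the record to `send`; return the count sent."""
    sent = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        send(json.loads(line))
        sent += 1
    return sent

def make_kafka_sender(bootstrap="redpanda:9092", topic="telemetry"):
    """Build a sender backed by kafka-python (host and topic are assumed placeholders)."""
    from kafka import KafkaProducer  # imported lazily so parsing works without a broker
    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    return lambda record: producer.send(topic, value=record)
```

A typical call would open telemetry.jsonl and pass the file object together with `make_kafka_sender()` to `export_lines`; any other callable (for example `list.append`) can stand in for the sender during testing.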

13 Telemetry Metrics (GCN Input Features)

★avg_backoff_slots

net_throughput_mbps

net_packet_loss_ratio

net_avg_delay_ms

net_avg_jitter_ms

net_active_flows

channel_busy_ratio

net_retry_count

net_mcs_index

net_rssi_dbm

net_snr_db

net_queue_depth

net_link_usage_ratio

★ Primary discriminating feature. Combined with 4 one-hot AP_ID bits = 17 input features per graph node (N×17 input matrix).
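
As a minimal sketch of how the 13 metrics and the 4 one-hot AP_ID bits could be assembled into the N×17 input matrix: the metric names come from the list above, while the function names, the dictionary record layout, and the 0-based `ap_index` are assumptions made for illustration.

```python
# Metric names are taken from the list above; everything else is illustrative.
METRICS = [
    "avg_backoff_slots", "net_throughput_mbps", "net_packet_loss_ratio",
    "net_avg_delay_ms", "net_avg_jitter_ms", "net_active_flows",
    "channel_busy_ratio", "net_retry_count", "net_mcs_index",
    "net_rssi_dbm", "net_snr_db", "net_queue_depth", "net_link_usage_ratio",
]
MAX_APS = 4  # width of the one-hot AP_ID encoding

def node_features(record, ap_index):
    """Return 13 metric values followed by a 4-bit one-hot AP_ID (17 floats)."""
    if not 0 <= ap_index < MAX_APS:
        raise ValueError("ap_index must be in 0..3")
    metrics = [float(record[name]) for name in METRICS]
    one_hot = [1.0 if i == ap_index else 0.0 for i in range(MAX_APS)]
    return metrics + one_hot

def window_matrix(records, ap_index):
    """Stack N telemetry records into an N x 17 matrix (a list of rows)."""
    return [node_features(r, ap_index) for r in records]
```

The `ValueError` for an index outside 0..3 mirrors the limitation that a 4-bit one-hot encoding cannot represent more than four APs.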

Threat Analysis

Attack Model

Two adversarial strategies exploit CSMA/CA backoff: starvation forces victims to wait longer; greedy causes the attacker to monopolise channel access. Both are covert by design.

Attack Equation

A compromised NIC driver silently adds a fixed bias to the randomly drawn backoff counter before the countdown begins. No frame artefacts are produced.

Backoff_new = Backoff_current + bias
// bias ∈ {+1000…+10000} → starvation
// bias ∈ {-1000…-10000} → greedy
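
The attack equation can be sketched in a few lines of Python. This is an illustrative model only, not the NS-3 implementation: the function names are hypothetical, and clamping the result at zero is an assumption (a countdown cannot start below zero slots, so a large negative bias collapses the backoff to 0).

```python
import random

CW_MIN = 15  # default contention window from the text

def draw_backoff(bias=0, cw=CW_MIN, rng=random):
    """Uniform draw from [0, cw] plus a fixed bias, clamped at zero slots.

    The clamp is an assumption of this sketch, not a documented detail.
    """
    return max(0, rng.randint(0, cw) + bias)

def mean_backoff(bias, draws=10_000, seed=42):
    """Average backoff over many draws, using a seeded generator for repeatability."""
    rng = random.Random(seed)
    return sum(draw_backoff(bias, rng=rng) for _ in range(draws)) / draws
```

With `bias=0` the mean settles near the 7.5-slot baseline of a uniform draw over [0, 15]; with `bias=-5000` every draw is clamped to zero, which is the greedy case.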

Normal Operation

bias = 0

All stations draw uniformly from [0, CW_min=15]. Channel access is fair; throughput is shared equitably among all STAs per CSMA/CA specification.

Throughput: 100% baseline

Avg Backoff Slots: ~7.5 slots

Channel Fairness: High

No manipulation. IEEE 802.11be CSMA/CA operates as specified.

Starvation Attack

bias = +5000

Attacker forces victim stations to wait an extra 5000 slots before transmitting. Victims cannot compete for the channel; attacker monopolises bandwidth silently.

Victim Throughput: −84%

Avg Backoff Slots: +285× normal

Channel Fairness: Catastrophic

~2137 avg slots vs ~7.5 normal. Throughput collapses 84%.

Greedy Attack

bias = −5000

Attacker reduces its own backoff to near-zero, winning almost every contention round and starving all other legitimate stations of bandwidth.

Victim Throughput: −44%

Attacker Backoff: −56% of normal

Channel Fairness: Severely skewed

Attacker wins ~90% of contention. Victims share the remaining ~10%.

🔴

Why Traditional IDS Fails: Backoff manipulation occurs entirely within the 802.11 MAC timing layer. The attacker transmits valid, well-formed frames with correct headers, correct rates, and correct MCS indices. There are no malformed packets, no anomalous frame types, no port-scan signatures, and no credential mismatches. Signature-based IDS (Snort, Suricata) and deep packet inspection are completely blind to this class of attack. Only behavioural telemetry analysis — comparing aggregate backoff statistics against a learned baseline — can expose the manipulation. This is the fundamental motivation for the NDT + GCN approach.

Detection Engine

GCN Architecture

GCN v4.0.0 extends v3 with a deeper 3-layer 128-unit architecture and dynamic phase-transitioning scenario training, achieving F1 = 0.9988 across 50,817 test segments with zero false positives.

v3.0.0 vs v4.0.0 Comparison

Dimension	v3.0.0	v4.0.0 ★
GCN Layers	2 layers	3 layers
Hidden Units	64	128 (2× wider)
Dynamic Scenarios	No	Yes (phase-transitioning)
Training Files	48 static	2,878 (static + dynamic)
Test Segments	368	50,817
F1 Score	0.9978	0.9988
Accuracy	99.73%	99.83%
False Positives	0	0 (50,817 segments)

Graph Construction & Training

Graph Construction

Each window of N events is converted to a fully-connected temporal graph. Each node represents one time-step with 17 features. All N×(N-1) directed edges capture temporal relationships across the window. AP_ID is one-hot encoded as features 14–17.

Fully connected, N = window size, temporal edges, 17 features/node
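
The edge set described above can be sketched without any graph library. The function name is hypothetical; in PyTorch Geometric the two returned lists would typically be stacked into the 2×E `edge_index` tensor.

```python
def fully_connected_edges(n):
    """Return (sources, targets) for all n*(n-1) directed edges, with no self-loops."""
    sources, targets = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                sources.append(i)
                targets.append(j)
    return sources, targets
```

The edge count grows quadratically with window size: a 32-event window yields 32×31 = 992 directed edges, while a 256-event window yields 256×255 = 65,280.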

Training Details

Training files: 2,878 (193 static + 2,570 dynamic)

Train segments: 122,958 (weighted balanced)

Dynamic labeling: >30% attack windows → Attack label

Split: 2,045 train / 148 val / 685 test

Epochs: 46 (early stop, patience=30)

Optimiser: Adam, lr=0.0005, wd=0.0001

Batch / hardware: 64 - RTX 4060 (~8 min)
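
The dynamic labelling rule (more than 30% attack windows gives an Attack label) can be expressed as a short function. This is a sketch under the assumption that each window already carries a boolean attack flag; the function name and the input format are illustrative.

```python
def label_segment(window_is_attack, threshold=0.30):
    """Label a segment 'attack' if more than `threshold` of its windows are attack windows."""
    if not window_is_attack:
        raise ValueError("segment has no windows")
    fraction = sum(1 for w in window_is_attack if w) / len(window_is_attack)
    return "attack" if fraction > threshold else "normal"
```

Because the comparison is strict, a segment with exactly 30% attack windows is labelled normal, which is where phase-boundary segments can end up on the wrong side of the rule.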

Model History

GCN Version Comparison

Five iterations from a broken prototype to a production-grade dynamic attack detector. Each version addresses a specific failure mode of its predecessor.

v1.0.0

Abandoned

High FP & FN, class imbalance, single seed

v2.0.0

0.9924

Baseline 1 AP, 256-window only

v2.1.0

0.9943

Dropout + batch norm tuning

v3.0.0

0.9978

Multi-AP, multi-window, 54/54 PASS

v4.0.0 ★

0.9988

Dynamic attacks, 50,817 test segments

Dimension	v1.0.0	v2.0.0	v2.1.0	v3.0.0	v4.0.0 ★
Status	Abandoned	Baseline	Intermediate	Production	Latest ★
GCN Layers	2	2	2	2	3
Hidden Units	64	64	64	64	128 (2× wider)
Dropout	0.5	0.3	0.3	0.3	0.4
AP Support	1 AP	1 AP only	1 AP only	1–4 APs	1–4 APs
Window Sizes	256 only	256 only	256 only	32 / 64 / 128 / 256	32 / 64 / 128 / 256
Dynamic Scenarios	No	No	No	No	Yes ✓
Training Files	~20 (imbalanced)	48	48	48	2,878
Test Segments		~300	~300	368	50,817
F1 Score	N/A (failed)	0.9924	0.9943	0.9978	0.9988
Accuracy	~0% effective	99.14%	99.29%	99.73%	99.83%
AUC-ROC	~0.5	0.9996	0.9998	1.0000	0.99999
False Positives	High FP & FN	0	0	0	0 (50,817 segs)
FN Rate	Unreliable	~0.87%	~0.57%	0.43%	0.25% (boundaries)
Key Innovation		Balanced training	Batch norm + tuning	AP conditioning + multi-window	3-layer depth + dynamic training
Parameters	~16K	~16K	~16K	~16K	44,482 (~2.8×)

Development History

Implementation Work Packages

14 work packages executed iteratively from local environment setup through GCN v4.0.0 deployment with dynamic scenario support. WP13 (closed-loop actuation) is the sole remaining work item.

WP1

Local Dev Setup & GitHub SSH

Repository initialised, SSH keys configured, Docker environment validated, base branching strategy established.

✓ DONE

WP2

Containerlab Skeleton

Docker Compose services for TimescaleDB, Redpanda, and Grafana standing up with health checks and persistent volumes.

✓ DONE

WP3

NS-3 Container + Wi-Fi 7 Telemetry Simulation

NS-3 3.40 containerised, 802.11be MLO topology configured, backoff bias injection parameter added, telemetry.jsonl output validated for all 13 metrics.

✓ DONE

WP4

Telemetry Exporter (file → Kafka)

Python service reads telemetry.jsonl line-by-line with configurable rate, publishes JSON records to Redpanda telemetry topic with schema validation.

✓ DONE

WP5

Harmonizer (Kafka → DB)

Async Python consumer reads from Kafka, validates record schema, upserts rows into TimescaleDB hypertable with correct timestamp indexing.

✓ DONE

WP6

Grafana Provisioning-as-Code (38 panels)

All 38 Grafana panels defined in JSON provisioning files. Zero manual dashboard configuration required after container startup.

✓ DONE

WP7

One-Command Pipeline (`make run-exp`)

Makefile target orchestrates full experiment lifecycle: NS-3 sim → export → stream → detect → visualise, with parameterised bias, seed, and topology.

✓ DONE

WP8

GCN Attack Detection (Windowizer + GCN Detector)

First end-to-end GCN detector: Windowizer service consumes telemetry, builds graphs, runs GCN inference, publishes predictions to gcn_predictions Kafka topic.

✓ DONE
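
A windowing step of the kind WP8 describes can be sketched with a bounded deque. The function name and the `stride` parameter are assumptions for illustration; the source does not state how far the window advances between inferences.

```python
from collections import deque

def sliding_windows(events, size, stride=1):
    """Yield lists of the last `size` events, advancing `stride` events per window."""
    buf = deque(maxlen=size)  # old events fall off the left automatically
    since_last = 0
    for event in events:
        buf.append(event)
        since_last += 1
        if len(buf) == size and since_last >= stride:
            yield list(buf)
            since_last = 0
```

Setting `stride` equal to `size` gives non-overlapping windows, while `stride=1` re-evaluates on every new telemetry event at the cost of more inference calls.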

WP9

GCN v2.0.0 Retraining (284 Balanced Scenarios)

Retrained on 284 balanced scenarios across seed groups A and B. Single-AP, 256-window model. F1 improved to 0.9924 from v1 baseline.

✓ DONE

WP9.5

Unified Grafana Dashboard

Consolidated all 38 panels into a unified view with GCN prediction overlay, attack confidence time-series, and per-STA backoff visualisation.

✓ DONE

WP10

Custom React Dashboard (Port 8888, 6 Sections)

Standalone React SPA with 6 dashboard sections: live predictions, topology map, confidence histogram, metric sparklines, alert feed, and model metadata.

✓ DONE

WP11

Pipeline & DB Bug Fixes

Resolved Harmonizer schema mismatch, TimescaleDB hypertable chunk interval, Kafka consumer group offset reset, and NS-3 telemetry timestamp drift issues.

✓ DONE

WP12

GCN v3.0.0 (Multi-AP, Multi-Window, F1 = 0.9978)

Major upgrade: AP count conditioning via scalar injection after global pool, multi-window training (32/64/128/256), 32,000 balanced samples. 54/54 PASS across all 5 evaluation tiers.

✓ DONE

WP13

Closed-Loop Policy Actuation

SDN/ZSM actuation layer: GCN predictions trigger automated countermeasures (CW adjustment, rate limiting, STA quarantine) via OpenFlow or Wi-Fi management APIs.

□ TODO

Validation

5-Tier Evaluation Matrix

54 independent test scenarios across 5 evaluation tiers. Pass criteria: normal scenarios must produce an attack rate below 10%; attack scenarios must produce an attack rate above 90%. Result: 54/54 PASS with zero scenario-level FP/FN.

Tier	Description	Config	Count	Result
T1	Core accuracy - baseline validation for 1-AP, 256-window topology	1AP, w=256, seeds A+B, bias ±5000	12	12/12 PASS
T2	Multi-AP scaling - tests topology generalisation with 2 and 4 APs	2AP + 4AP, w=256, seeds A, v3 only	6	6/6 PASS
T3	Segment length - tests window size sensitivity (64 and 128 events)	1AP, w=64+128, seeds A, bias ±5000	6	6/6 PASS
T4	Bias sensitivity - tests detection at bias magnitudes below and above the ±5000 used in T1	1AP, w=256, seeds A, bias 1k/2k/10k	12	12/12 PASS
T5	Seed generalisation - 3 unseen seed groups, no overlap with training seeds	1AP, w=256, bias ±5000, seeds C+D+E	18	18/18 PASS
TOTAL - All tiers combined			54	54/54 PASS ✓

Pass Criteria

Normal Scenario

<10%

attack rate predicted for normal traffic windows

Attack Scenario

>90%

attack rate predicted for manipulated traffic windows

Seed Group Convention

Group	Seeds Used	Role
A	10, 42, 99	Training seed group
B	20, 52, 109	Training seed group
C	30, 62, 119	Held-out generalisation
D	40, 72, 129	Held-out generalisation
E	50, 82, 139	Held-out generalisation
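
The held-out claim in the table can be checked mechanically. The seed values below are copied from the table; the helper names are hypothetical.

```python
SEED_GROUPS = {
    "A": {10, 42, 99}, "B": {20, 52, 109},
    "C": {30, 62, 119}, "D": {40, 72, 129}, "E": {50, 82, 139},
}
TRAIN_GROUPS = ("A", "B")
HELD_OUT_GROUPS = ("C", "D", "E")

def seeds_of(groups):
    """Union of all seeds in the named groups."""
    return set().union(*(SEED_GROUPS[g] for g in groups))

def leaked_seeds():
    """Seeds that appear in both the training and the held-out groups."""
    return seeds_of(TRAIN_GROUPS) & seeds_of(HELD_OUT_GROUPS)
```

An empty result from `leaked_seeds()` confirms that no held-out seed was also used for training.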

Performance

Results & Analysis

GCN v4.0.0 achieves near-perfect classification across all evaluated configurations including dynamic phase-transitioning attacks, with zero false positives across 50,817 test segments.

F1 Score: 0.9988

Accuracy: 99.83%

AUC-ROC: 1.000

Scenarios PASS: 54/54

Prediction Confidence by Scenario Type

Normal traffic: 95.2% “normal”

Attack+ (bias=+5000): 98.7% “attack”

Attack+ (bias=+10000): 99.4% “attack”

Attack− (bias=−5000): 96.1% “attack”

Attack− (bias=−10000): 97.8% “attack”

Attack+ (bias=+1000): 74.0% “attack” (low bias edge)

Dynamic 2-phase: ~99% detection (v4.0.0)

Dynamic weak (±2000): 100% detection, 100% confidence

v4.0.0 introduces dynamic scenario support: 39 overnight experiments with phase-transitioning attacks confirmed 97.3% average confidence. The 84 false negatives (0.25% FN rate) occur exclusively at phase-boundary segments where <30% of windows contain the attack signal.
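
The reported accuracy follows directly from the error counts: with zero false positives and 84 false negatives, the only misclassified segments among the 50,817 are those 84. A short check, using only figures stated in this section (the function name is illustrative):

```python
def accuracy(total, false_positives, false_negatives):
    """Fraction of segments classified correctly."""
    return (total - false_positives - false_negatives) / total

acc = accuracy(total=50_817, false_positives=0, false_negatives=84)
print(f"{acc:.2%}")  # 99.83%
```

That is 50,733 correct segments out of 50,817, which matches the 99.83% headline figure.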

Model Evolution

v1.0.0

Abandoned

Class imbalance & single-seed training produced high false positives and false negatives.

v2.0.0

0.9924

Solid baseline. 1 AP, 256-window only. Seed groups A+B.

v2.1.0

0.9943

Dropout tuning & batch norm. Still 1AP / 256w only.

v3.0.0

0.9978

Multi-AP, multi-window, AP conditioning. 54/54 PASS.

v4.0.0 ★

0.9988

3-layer 128-unit. Dynamic attacks. 50,817 test segments. Zero FP. Latest model.

Insights

Key Findings

Six primary findings emerge from the 54-scenario evaluation, each with direct implications for the deployment of GCN-based IDS in Wi-Fi 7 networks.

Near-Perfect Classification

GCN v4.0.0 achieves F1 = 0.9988, accuracy 99.83%, and AUC = 0.99999. Zero false positives are recorded across all 50,817 test segments spanning static and dynamic phase-transitioning scenarios. Only 84 false negatives (0.25% FN rate) occur at phase-boundary edges.

Topology Scalability

Detection performance shows zero measurable degradation when scaling from 1 AP to 4 APs. The AP count conditioning mechanism successfully enables the model to generalise across topologies without per-topology retraining. T2 tier: 6/6 PASS.

Fast Detection Latency

End-to-end inference takes <5 ms per window on CPU. With a 32-event window and 100 ms per event, detection latency is ~3.2 seconds from attack onset - fast enough for real-time network management systems.
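
The latency figure is dominated by the time needed to fill a window, not by inference. A small sketch of that arithmetic, assuming the 100 ms per event stated above (the function name is illustrative):

```python
def window_fill_seconds(window_size, event_interval_ms=100):
    """Time needed to collect one full window of telemetry events."""
    return window_size * event_interval_ms / 1000

for size in (32, 64, 128, 256):
    print(size, window_fill_seconds(size))
```

A 32-event window fills in 3.2 seconds and a 256-event window in 25.6 seconds, so the sub-5 ms inference time adds little to either.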

Low-Bias Sensitivity

The model successfully detects attacks at bias=1000 (T4 tier: 12/12 PASS), a magnitude 5× below the minimum training bias of 5000. Confidence is lower (74% vs 95%+) but the binary decision is correct in all cases, demonstrating robust extrapolation.

Seed Generalisation

18/18 PASS across 3 held-out seed groups (C, D, E) that have zero overlap with training seeds (A, B). This confirms the model learns genuine behavioural patterns rather than seed-specific NS-3 simulation artefacts.

Iterative Model Improvement

The v1→v2→v2.1→v3→v4 evolution demonstrates targeted capability expansion: class balance, AP conditioning, multi-window, and finally dynamic scenario support. v4.0.0 adds a 3rd GCN layer and 128-unit width specifically to handle the temporal complexity of phase-transitioning attacks.

Impact

Contributions

Four original contributions advance the state of the art in Wi-Fi 7 security, network digital twins, and GNN-based intrusion detection.

Full-Stack Wi-Fi 7 NDT

The first open, fully containerised Network Digital Twin pipeline for Wi-Fi 7 MLO environments. NS-3 simulation, Kafka streaming, TimescaleDB persistence, and live Grafana observability in a single make run-exp command.

GCN v3.0.0 Detector

A novel Graph Convolutional Network architecture with AP-count conditioning and multi-window support. Achieves F1 = 0.9978 with zero FP/FN across 54 independent scenarios spanning 4 topology configurations.

54-Scenario Eval Framework

A rigorous 5-tier evaluation methodology covering AP scaling, window size sensitivity, bias magnitude extrapolation, and 5-group seed diversity. Provides a reproducible benchmark for future Wi-Fi 7 IDS research.

Open Attack Dataset

32,000 balanced labelled telemetry windows across 284+ simulation scenarios, covering normal, starvation, and greedy attack classes with varying topologies, window sizes, and random seeds. Available for community use.

Real-World Impact Potential

<5ms

Inference latency enables real-time network management integration

Zero

False positives avoid false alarms that could disrupt production Wi-Fi networks

+285×

Backoff amplification detected before catastrophic throughput collapse

Technology Stack

NS-3 3.40, TimescaleDB, Redpanda/Kafka, PyTorch Geometric, Grafana 10, Docker Compose, React (port 8888), Python, asyncpg, GCN v3.0.0, IEEE 802.11be, kafka-python, PostgreSQL 15

Honest Assessment

Limitations

Six limitations are acknowledged. All are well-understood, bounded in impact, and provide clear directions for follow-on research.

🔢

Simulation Only - No Hardware Validation

All results are from NS-3 3.40 simulation. No physical Wi-Fi 7 hardware (e.g., TP-Link BE9300, Intel BE200) has been used for validation. NS-3 channel models may not perfectly capture real-world PHY impairments such as multipath fading, carrier frequency offset, or hardware-specific retry policies.

📉

AP Ceiling at nAP = 4

GCN v3.0.0 was trained and tested with 1–4 APs. Topologies with 5 or 6 APs are entirely untested. The AP-count conditioning mechanism may not extrapolate reliably beyond the training range, and the 4-bit one-hot encoding is insufficient for nAP > 4.

📈

Low-Bias Confidence Dip

At bias=1000 (5× below training minimum), prediction confidence drops to ~74% compared to 95%+ for higher biases. While the binary decision remains correct (100% detection rate in T4), the lower confidence score reduces margin for threshold-based alerting systems.

🔒

No Closed-Loop Response (WP13 Pending)

Detection is currently a one-way read operation. No automated countermeasure is triggered upon attack detection. WP13 (SDN/ZSM actuation) remains the sole unimplemented work package. Without closed-loop response, human operator intervention is still required.

👤

Single-Attacker Model

All scenarios model a single compromised STA performing backoff manipulation. Coordinated multi-node attacks (e.g., 3 compromised STAs simultaneously biasing in different directions) have not been simulated or evaluated. The model may behave differently under distributed attack scenarios.

⚙️

32-Window Edge Cases in High-Jitter Environments

The smallest window size (32 events) provides only 3.2 seconds of telemetry context. In high-jitter or bursty environments, 32-event windows may not capture sufficient temporal signal to distinguish low-bias attacks from transient natural throughput variation. 64–256-window configurations are more robust.

Next Steps

Future Work

Five high-priority research directions build directly on the current results, addressing identified limitations and extending the system towards production deployment.

WP13: Closed-Loop SDN / ZSM Actuation

Implement automated countermeasures triggered by GCN predictions: CW adjustment via OpenFlow, STA rate-limiting, and quarantine via 802.11r fast BSS transition denial. Target: <100 ms response from detection to mitigation. Evaluate impact on normal traffic false-positive rate.

v3.1.0: nAP = 5/6 Topology Training

Extend the training corpus to include 5- and 6-AP topologies. Requires expanding the AP_ID one-hot encoding from 4 to 6 bits and generating 100+ new simulation scenarios. Expected to increase deployment coverage for enterprise Wi-Fi 7 campus deployments.

Hardware Validation (Real Wi-Fi 7 APs)

Deploy the NDT telemetry exporter on a physical testbed using Wi-Fi 7 capable hardware (e.g., TP-Link BE9300 APs, Intel BE200 adapters). Validate that NS-3-trained GCN generalises to hardware telemetry and measure any performance gap requiring fine-tuning.

Extended Attack Surface: MLD MAC Spoofing & Link-Switch Flooding

Extend the attack model to include two additional Wi-Fi 7 MLO-specific threats: (1) MLD MAC address spoofing to hijack multi-link sessions, and (2) link-switch flooding to exhaust AP link-switch negotiation resources. Develop multi-class GCN capable of distinguishing between attack types.

Federated Learning for Distributed NDT Deployments

Explore federated GCN training across multiple NDT instances in a multi-tenant environment. Each NDT trains locally on its own traffic; a central server aggregates gradients without sharing raw telemetry. Addresses privacy concerns in shared infrastructure deployments and enables cross-building collaborative detection without data leakage.

Q & A

Questions & Answers

Anticipated questions from the examination panel, with technically detailed answers grounded in the project results.

Multi-Link Operation (MLO) is the headline feature of IEEE 802.11be (Wi-Fi 7), ratified in May 2024. It allows a Multi-Link Device (MLD) — which may be a client or access point — to simultaneously establish and use multiple radio links across different frequency bands (2.4 GHz, 5 GHz, and 6 GHz). Unlike previous Wi-Fi generations where a device operated on only one band at a time, an MLD can aggregate bandwidth across all three bands simultaneously, achieving peak PHY rates up to 46 Gbps.

MLO provides three key benefits: (1) link aggregation — combine bandwidth from multiple bands; (2) seamless failover — if one band is congested or blocked, traffic automatically shifts to another; (3) reduced latency — parallel frame transmission across links. The security model is inherited from 802.11ax with extensions, but the CSMA/CA backoff mechanism used at each link remains fundamentally unchanged.

In IEEE 802.11, before a station can transmit, it must first perform a CSMA/CA backoff procedure: it draws a random integer from [0, CW_min] (where CW_min = 15 by default), waits that many slot times (9 µs each), and then transmits if the channel is still idle. This mechanism ensures statistical fairness: all stations wait roughly the same average time before each transmission attempt.

A backoff manipulation attack exploits a compromised NIC driver or firmware to bias this random draw. Two variants exist:

Starvation attack (positive bias): The attacker adds a large positive value (e.g., +5000 slots) to the victim station's backoff counter. The victim waits ~5007 slots instead of ~7, giving the attacker essentially unlimited channel access. Observed impact: +285× backoff slots, −84% throughput.
Greedy attack (negative bias): The attacker reduces its own backoff counter toward zero, winning almost every contention round. Observed impact: attacker backoff −56%, victim throughput −44%.

Critically, both attacks involve only valid 802.11 frames. There are no packet malformations, no authentication failures, and no unusual frame types — making them invisible to signature-based IDS.

Traditional Intrusion Detection Systems (IDS) such as Snort and Suricata operate on two paradigms: (1) signature matching — comparing packet headers and payloads against known attack patterns; (2) anomaly detection — identifying deviations in packet rates, flow volumes, or protocol state machines.

Backoff manipulation attacks evade both approaches:

No signature: Every frame transmitted by the attacker is a valid 802.11 data or management frame with correct headers, valid BSSID/SSID, and correct MCS index. No known attack signature matches.
No rate anomaly: The total channel utilisation may remain normal — only the distribution between stations changes.
MAC layer invisibility: Standard network monitoring tools (Wireshark, tcpdump) capture frames but cannot observe the backoff procedure itself, which occurs in NIC firmware before frame transmission.
No authentication event: Unlike credential attacks, no 802.1X or PSK exchange is modified.

The only observable evidence is in aggregate MAC-layer statistics: the distribution of channel access times across stations. A GCN trained on 13 telemetry metrics (including avg_backoff_slots) can learn this distributional shift.

A Network Digital Twin (NDT) is a software-defined, continuously updated virtual replica of a physical network. It ingests real-time telemetry from the physical network (or, in our research context, from a high-fidelity simulation), maintains a current state model, and enables analysis, anomaly detection, and policy testing without touching the live network.

In this project, the NDT consists of:

Digital replica: NS-3 Wi-Fi 7 MLO simulation faithfully models the physical network topology, channel conditions, and station behaviour.
Telemetry pipeline: Exporter → Redpanda/Kafka → Harmonizer → TimescaleDB ingests 13 metrics per window in real time.
Analytics layer: GCN Detector v3.0.0 runs continuously on the telemetry stream, publishing attack predictions back to Kafka.
Observability: Grafana (38 panels) and React dashboard (port 8888) provide human-readable views of network state and detection alerts.

The NDT paradigm is critical for this research because backoff manipulation cannot be observed at the application layer — it requires MAC-layer telemetry that is only accessible via the NDT's simulation-level instrumentation.

A Graph Convolutional Network (GCN) is chosen over alternative models (LSTM, CNN, Random Forest) for three specific reasons related to the nature of Wi-Fi network telemetry:

Temporal structure as a graph: A window of N telemetry events naturally maps to a graph where each node is a time-step and edges represent temporal relationships. GCN message-passing aggregates information across all nodes simultaneously, capturing the distributed nature of backoff contention across multiple STAs.
Multi-STA interaction: Backoff manipulation affects the relative behaviour between stations, not just absolute values. GCN's graph structure explicitly models inter-node relationships, making it suitable for capturing the competitive dynamics of CSMA/CA.
Scalability: The fully-connected graph construction and global mean pooling make the model agnostic to window size N. The AP-count conditioning allows generalisation across different topology sizes without architectural changes.

Empirically, the GCN approach achieves F1 = 0.9978, outperforming statistical baselines (z-score on avg_backoff_slots alone achieves ~F1 = 0.91) and matching the performance of more complex transformers at a fraction of the computational cost.
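
For comparison, a z-score baseline on avg_backoff_slots alone might look like the sketch below. The threshold of 3.0 and the function names are assumptions; the source does not describe how its statistical baseline was configured.

```python
from statistics import mean, stdev

def fit_baseline(normal_backoffs):
    """Mean and sample standard deviation of avg_backoff_slots under normal traffic."""
    return mean(normal_backoffs), stdev(normal_backoffs)

def is_anomalous(value, baseline, z_threshold=3.0):
    """Flag a window whose avg_backoff_slots lies more than z_threshold sigmas from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # degenerate baseline: any deviation is anomalous
    return abs(value - mu) / sigma > z_threshold
```

A detector like this sees one feature of one window at a time, which is a plausible reason it trails a model that combines all 17 features across the whole window.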

The 13 telemetry metrics collected per window from NS-3 simulation are:

avg_backoff_slots: primary discriminating feature; directly reflects backoff bias
net_throughput_mbps: aggregate network throughput
net_packet_loss_ratio: fraction of packets dropped
net_avg_delay_ms: average end-to-end frame delay
net_avg_jitter_ms: variation in frame delay
net_active_flows: number of active data streams
channel_busy_ratio: fraction of time the channel is occupied
net_retry_count: number of frame retransmissions
net_mcs_index: modulation and coding scheme in use
net_rssi_dbm: received signal strength
net_snr_db: signal-to-noise ratio
net_queue_depth: transmit queue occupancy
net_link_usage_ratio: fraction of link capacity used

These 13 metrics are augmented with 4 one-hot encoded AP_ID bits, giving 17 input features per graph node. The one-hot encoding allows the model to learn AP-specific statistical baselines, which is essential for the multi-AP generalisation achieved in v3.0.0.
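The 13 + 4 = 17 feature layout can be sketched as follows. This is an illustration only: the metric names come from the list above, but the function name, the zero-based AP index, and the ordering of features within the vector are assumptions rather than the project's actual implementation.

```python
METRICS = [
    "avg_backoff_slots", "net_throughput_mbps", "net_packet_loss_ratio",
    "net_avg_delay_ms", "net_avg_jitter_ms", "net_active_flows",
    "channel_busy_ratio", "net_retry_count", "net_mcs_index",
    "net_rssi_dbm", "net_snr_db", "net_queue_depth", "net_link_usage_ratio",
]
NUM_AP_SLOTS = 4  # width of the one-hot AP_ID encoding

def node_features(event, ap_index):
    """Build the 17-value feature vector for one graph node.

    event: dict mapping each metric name to a number.
    ap_index: zero-based AP position (0..3); this indexing is an assumption.
    """
    if not 0 <= ap_index < NUM_AP_SLOTS:
        raise ValueError("ap_index must be in 0..3")
    values = [float(event[name]) for name in METRICS]
    one_hot = [1.0 if i == ap_index else 0.0 for i in range(NUM_AP_SLOTS)]
    return values + one_hot

# Toy event: every metric zero except the backoff average.
example = {name: 0.0 for name in METRICS}
example["avg_backoff_slots"] = 7.5
vec = node_features(example, ap_index=2)
```

Any feature scaling applied before training is not described in this section and is therefore left out of the sketch.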

Three specific architectural and training changes distinguish v3.0.0 from v2.0.0:

AP Count Conditioning: After global mean pooling, the scalar AP count is concatenated with the graph embedding vector before the final fully-connected layers. This allows the model to adjust its decision boundary based on topology size, which is critical because avg_backoff_slots has different baseline distributions for 1-AP vs 4-AP topologies.
Multi-Window Training: v2.0.0 was trained exclusively on 256-event windows. v3.0.0 trains simultaneously on windows of size 32, 64, 128, and 256. The global mean pooling architecture handles variable N natively; training on all four sizes forces the model to learn size-invariant representations.
Expanded Training Data: v2.0.0 used ~16,000 samples from 284 single-AP scenarios. v3.0.0 uses 32,000 balanced samples (16k normal, 8k positive-bias, 8k negative-bias) from multi-AP, multi-window scenarios. The dataset covers all 5 seed groups across training and validation.

Net result: F1 improved from 0.9924 (v2.0.0) to 0.9978 (v3.0.0), and test coverage expanded from single-AP/256-window to 54 scenarios across all topology and window configurations. AUC improved from 0.9996 to 1.000.
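The pooling and conditioning steps described above can be illustrated with a small sketch. It assumes plain Python lists in place of tensors and a toy 2-dimensional embedding; the function names are hypothetical, and the real model operates on much wider embeddings inside a deep learning framework.

```python
def global_mean_pool(node_embeddings):
    """Average per-node embeddings into one graph-level vector."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(node[d] for node in node_embeddings) / n for d in range(dim)]

def condition_on_ap_count(graph_embedding, ap_count):
    """Append the scalar AP count to the pooled embedding."""
    return graph_embedding + [float(ap_count)]

# Toy 2-dimensional embeddings for windows of 32 and 256 nodes.
small_window = [[1.0, 3.0]] * 32
large_window = [[1.0, 3.0]] * 256
head_input_small = condition_on_ap_count(global_mean_pool(small_window), ap_count=1)
head_input_large = condition_on_ap_count(global_mean_pool(large_window), ap_count=4)
```

Both windows yield a vector of the same length, so the same fully-connected head can consume either one. Only the trailing AP-count entry differs between the two inputs.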

The 54/54 PASS result means that GCN v3.0.0 correctly classifies every single scenario across a 5-tier evaluation matrix of 54 independent test runs. Each scenario is an independent NS-3 simulation run with a specific combination of:

AP count (1, 2, or 4)
Window size (32, 64, 128, or 256)
Bias magnitude (0 = normal, ±1000, ±2000, ±5000, ±10000)
Random seed group (A through E, each containing 3 seeds)

Pass criteria per scenario: (a) normal traffic windows must produce an attack-rate prediction below 10%; (b) attack traffic windows must produce an attack-rate prediction above 90%. In all 54 scenarios, both criteria are satisfied simultaneously. This corresponds to zero false positives and zero false negatives at the scenario level across the entire evaluation suite.

The 54 scenarios are not cherry-picked — they are systematically defined across the 5 evaluation tiers (T1: 12, T2: 6, T3: 6, T4: 12, T5: 18) to cover the full parameter space of the research scope.
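A scenario-level check of this kind could be sketched as below. The tier sizes are those listed above; the function names are hypothetical, predictions are assumed to be 0/1 labels per window, and the 0.10 passed as normal_max in the usage lines is an illustrative value supplied by the caller rather than a setting taken from the project code.

```python
TIER_SIZES = {"T1": 12, "T2": 6, "T3": 6, "T4": 12, "T5": 18}

def attack_rate(predictions):
    """Fraction of windows predicted as attack (1 = attack, 0 = normal)."""
    return sum(predictions) / len(predictions)

def scenario_passes(is_attack, predictions, normal_max, attack_min=0.90):
    """Apply the per-scenario pass criterion.

    normal_max: upper bound on the attack rate for a normal scenario.
    attack_min: lower bound on the attack rate for an attack scenario.
    """
    rate = attack_rate(predictions)
    return rate > attack_min if is_attack else rate < normal_max

total = sum(TIER_SIZES.values())
attack_ok = scenario_passes(True, [1] * 19 + [0], normal_max=0.10)
normal_ok = scenario_passes(False, [0] * 20, normal_max=0.10)
```

Here total evaluates to 54, matching the size of the evaluation matrix, and both toy scenarios pass.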

Based on the T4 (Bias Sensitivity) evaluation results, the minimum reliably detectable bias is ±1000 slots. This is one fifth of the minimum bias used in training (±5000 slots), demonstrating that the model extrapolates well beyond its training range.

Key observations at bias=1000:

Detection rate: 100% — all T4 scenarios with bias=1000 PASS the >90% attack-rate criterion.
Confidence level: ~74% — lower than the ~95–99% confidence seen for higher biases, reflecting the smaller telemetry signal at bias=1000.
avg_backoff_slots impact: bias=1000 adds ~133 avg backoff slots (vs ~7.5 normal), a 17× increase that is well above the model's learned decision boundary.

The practical implication: even a relatively mild backoff bias that produces a 17× increase in wait time is detectable by the system, providing early warning well before the 285× amplification seen at bias=5000.

WP13 will implement the closed-loop policy actuation layer, the final missing piece that transforms the system from a detection-only tool into a fully automated security system. The planned design involves three components:

Trigger layer: A consumer on the gcn_predictions Kafka topic monitors the rolling attack rate per AP. When attack rate exceeds a configurable threshold (e.g., >90% for 3 consecutive windows), an actuation event is fired.
Policy engine: A rule-based policy engine maps actuation events to countermeasure actions: (a) increase CW_min via vendor-specific management frames; (b) apply rate-limiting to the suspect STA via OpenFlow rules; (c) issue 802.11r fast BSS transition denial to isolate the STA.
Feedback loop: After a countermeasure is applied, the NDT continues monitoring. If the attack rate drops below 10%, the countermeasure is lifted. This prevents permanent false-positive bans and enables recovery after legitimate firmware updates.
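The trigger and feedback logic of this planned design can be sketched as a small state machine. Since WP13 is not yet implemented, everything here is hypothetical: the class name, the return values, and the default thresholds are illustrative assumptions, and one instance would be kept per AP.

```python
class ActuationTrigger:
    """Fire after several consecutive high-attack-rate windows; release when the rate falls.

    All thresholds are constructor parameters; the defaults are illustrative assumptions.
    """

    def __init__(self, fire_above=0.90, consecutive=3, release_below=0.10):
        self.fire_above = fire_above
        self.consecutive = consecutive
        self.release_below = release_below
        self.streak = 0
        self.active = False

    def update(self, attack_rate):
        """Feed one rolling attack-rate sample; return 'apply', 'lift', or None."""
        if not self.active:
            # Count consecutive windows above the firing threshold.
            self.streak = self.streak + 1 if attack_rate > self.fire_above else 0
            if self.streak >= self.consecutive:
                self.active = True
                self.streak = 0
                return "apply"
        elif attack_rate < self.release_below:
            # The countermeasure is in place and the attack rate has subsided.
            self.active = False
            return "lift"
        return None

trigger = ActuationTrigger()
events = [trigger.update(r) for r in [0.95, 0.50, 0.95, 0.97, 0.99, 0.95, 0.05]]
```

Keeping the release threshold well below the firing threshold gives the loop hysteresis, so a rate hovering near the firing level does not toggle the countermeasure on and off.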

Target performance for WP13: <100 ms from GCN prediction to countermeasure application, <1% false-positive actuation rate on normal traffic, full reversal within 5 seconds if the attacker is removed.

People

Project Team

Three undergraduate researchers and two supervisors from the Department of Computer Engineering, University of Peradeniya.

P.D. Dissanayake

E/20/084

Lead architect of the NS-3 simulation and telemetry pipeline. Responsible for Wi-Fi 7 MLO topology design, backoff bias injection mechanism, and Docker Compose orchestration (WP1–WP7).

NS-3 Docker Kafka

A.T.L. Nanayakkara

E/20/262

Lead developer of the GCN detection engine. Responsible for GCN model design and training (v1.0.0 through v3.0.0), Windowizer service, evaluation framework, and 54-scenario test automation (WP8–WP12).

PyTorch GCN Python

D.R.P. Nilupul

E/20/266

Lead developer of the observability and dashboard stack. Responsible for Grafana provisioning-as-code (38 panels), TimescaleDB schema design, React dashboard (port 8888), and pipeline integration (WP5, WP6, WP9.5, WP10, WP11).

Grafana React TimescaleDB

Supervisors

Supervisor

Dr. Upul Jayasinghe

University of Peradeniya

Senior Lecturer, Department of Computer Engineering. Research interests: network security, intrusion detection, and machine learning for network management.

Supervisor

Dr. Suneth Namal Karunarathna

University of Peradeniya

Senior Lecturer, Department of Computer Engineering. Research interests: software-defined networking, network function virtualisation, and future wireless systems.
