Backdoor-Resilient Federated Learning for NIDS

01

Abstract

Network Intrusion Detection Systems (NIDS) are a critical line of defence against cyberattacks. Federated Learning (FL) enables collaborative model training across distributed organisations without sharing raw network traffic — making it an attractive paradigm for privacy-preserving NIDS.

However, FL is inherently vulnerable to backdoor attacks, in which malicious clients poison the global model so that it misclassifies specific traffic patterns as benign. Our experiments show that a model achieving 93% main-task accuracy can simultaneously suffer a devastating 99.95% Attack Success Rate (ASR), which makes accuracy alone a dangerously insufficient security metric.

This research investigates how sophisticated backdoor attacks evolve to evade state-of-the-art defences (FLAME, Multi-Krum) and proposes SENTINEL — a novel backdoor-resilient aggregation algorithm combining multi-signal anomaly filtering (L2 norm + Sybil similarity) with coordinate-wise trimmed median aggregation and optional Differential Privacy.

95.2%

Attack Success Rate (FedAvg, IID)

without any defense

≈ 0%

SENTINEL ASR (IID scenario)

under stealthy backdoor attack

10.5%

SENTINEL ASR (Non-IID, α=0.5)

vs 99.5% for FedAvg

UNSW-NB15

Dataset

3 Dirichlet distributions (α = 100, 0.5, 0.1)

02

Related Works

🌐

Federated Learning for NIDS

Federated Learning enables collaborative model training across distributed organisations without sharing raw network traffic data, preserving privacy while building powerful intrusion detection models. Training runs continuously over data that remains distributed across heterogeneous network environments.

☠️

Backdoor Attacks in FL

In a backdoor attack, malicious clients inject poisoned model updates into the federated aggregation process. A carefully crafted "trigger" in network traffic causes the global model to misclassify targeted malicious traffic as benign — while maintaining normal accuracy on clean samples. "Trust becomes the vulnerability."

🧠

Byzantine-Robust Aggregation

Existing defences include Median, Trimmed Mean, Krum, Multi-Krum, and FLAME, evaluated alongside undefended FedAvg as the baseline. Our evaluation shows that all of them fail against the sophisticated PFedBA attacker under Non-IID data, with FLAME reaching 90.3% ASR at α = 0.1.

📡

Non-IID Network Environments

Real-world federated NIDS deployments face heterogeneous (Non-IID) data distributions across clients. We model this using Dirichlet distributions (α = 100 for IID, α = 0.5 and α = 0.1 for increasing heterogeneity), reflecting realistic scenarios where different organisations see different traffic patterns.

03

Defence Methodology

🛡️

Our Proposed Defense

SENTINEL

A backdoor-resilient federated aggregation algorithm that fuses multiple anomaly detection signals to identify and filter malicious clients before performing robust coordinate-wise trimmed median aggregation.

1

Extract Client Deltas

Compute per-client model weight deltas (update − global model).

2

Compute Anomaly Signals

Calculate L2 norm anomaly score and Sybil similarity score (cosine similarity between clients). Normalize using Median Absolute Deviation (MAD).

3

Fuse & Filter

Combine signals into a unified anomaly score. Sort clients, reject outliers using IQR threshold. Ensure minimum benign clients remain.

4

Trimmed Median Aggregation

Stack deltas per coordinate, sort and trim top/bottom f values, compute coordinate-wise trimmed median, update global model.

5

Differential Privacy (Optional)

Add calibrated Gaussian noise to the aggregated update to provide formal privacy guarantees.

SENTINEL — Defence Methodology Overview
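The five steps above can be sketched end to end in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the fusion rule (sum of the positive parts of the two MAD-normalised scores), the use of the maximum pairwise cosine similarity as the Sybil score, the Tukey fence multiplier `k=1.5`, the majority floor `min_keep`, and all function names are hypothetical choices made for this example. The noise scale `dp_sigma` is left uncalibrated; a formal privacy guarantee would also require bounding each update's sensitivity.

```python
import numpy as np

EPS = 1e-12

def mad_zscore(x):
    """Robust z-score: deviation from the median in units of scaled MAD."""
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return (x - med) / (1.4826 * mad + EPS)

def anomaly_signals(deltas):
    """Step 2: L2-norm score and Sybil score (max cosine to any other client)."""
    norms = np.linalg.norm(deltas, axis=1)
    unit = deltas / (norms[:, None] + EPS)
    cos = unit @ unit.T
    np.fill_diagonal(cos, -np.inf)            # ignore self-similarity
    sybil = cos.max(axis=1)
    return mad_zscore(norms), mad_zscore(sybil)

def fuse_and_filter(norm_score, sybil_score, k=1.5, min_keep=None):
    """Step 3: fuse both scores, reject clients above the upper IQR fence."""
    # Assumption: only positive deviations count as suspicious.
    fused = np.maximum(norm_score, 0.0) + np.maximum(sybil_score, 0.0)
    q1, q3 = np.percentile(fused, [25, 75])
    keep = np.flatnonzero(fused <= q3 + k * (q3 - q1))
    if min_keep is None:
        min_keep = len(fused) // 2 + 1        # assumed floor: a simple majority
    if len(keep) < min_keep:                  # fall back to the least anomalous
        keep = np.sort(np.argsort(fused)[:min_keep])
    return keep, fused

def trimmed_median(deltas, f):
    """Step 4: per coordinate, drop the f smallest and f largest, then take the median."""
    n = deltas.shape[0]
    if 2 * f >= n:
        raise ValueError("f must satisfy 2 * f < number of clients")
    kept = np.sort(deltas, axis=0)[f:n - f]
    return np.median(kept, axis=0)

def aggregate(global_w, client_ws, f=1, dp_sigma=0.0, rng=None):
    """Run steps 1 to 5 on flattened weights; returns (new global weights, kept client indices)."""
    deltas = client_ws - global_w                      # Step 1: client deltas
    norm_s, sybil_s = anomaly_signals(deltas)          # Step 2
    keep, _ = fuse_and_filter(norm_s, sybil_s)         # Step 3
    update = trimmed_median(deltas[keep], f)           # Step 4
    if dp_sigma > 0.0:                                 # Step 5 (optional)
        if rng is None:
            rng = np.random.default_rng()
        update = update + rng.normal(0.0, dp_sigma, size=update.shape)
    return global_w + update, keep
```

Two properties of this sketch are worth keeping in mind. With symmetric trimming, the median of the remaining values equals the plain coordinate-wise median, so `f` acts mainly as a bound on how many outliers are tolerated. And with ten clients, the 75th percentile is interpolated between the 7th and 8th smallest fused scores, so the fence widens once three or more clients score high; the sketch does not address that case.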

Attack Evolution

We study three generations of backdoor attacks — each designed to evade the defences that defeated the previous generation.

Brute-Force 💥

Model Poisoning & Replacement

Train a backdoored model locally, compute the model update, and scale it with a large factor (λ) to replace the global model in a single round. Easily detected by large update norms.
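The effect of the scaling factor can be illustrated with a toy FedAvg round. This is a minimal sketch, assuming unweighted averaging of client deltas and benign deltas that are close to zero; `scaled_update` and `fedavg` are hypothetical helper names, and setting λ to the number of clients is the illustrative choice that cancels the 1/n averaging.

```python
import numpy as np

def scaled_update(w_backdoor, w_global, lam):
    """Malicious delta: the backdoored update multiplied by the scaling factor lambda."""
    return lam * (w_backdoor - w_global)

def fedavg(w_global, deltas):
    """Unweighted FedAvg: add the mean client delta to the global model."""
    return w_global + np.mean(deltas, axis=0)

n_clients = 10
w_global = np.zeros(4)
w_backdoor = np.array([0.5, -1.0, 2.0, 0.25])

# Assumption: benign deltas are approximately zero (model near convergence).
deltas = np.zeros((n_clients, 4))
deltas[0] = scaled_update(w_backdoor, w_global, lam=n_clients)

w_new = fedavg(w_global, deltas)          # the global model becomes w_backdoor
norms = np.linalg.norm(deltas, axis=1)    # the attacker's norm is the clear outlier
```

The same scaling that makes the replacement work also inflates the update norm by λ, which is why a norm-based filter catches this attack.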

Phase 2 — Stealth 🥷

Geometric Camouflage

"Ninja logic" — the attacker measures the magnitude of honest updates and caps the malicious update norm at the median of honest clients using L2 projection. Bypasses norm-based filters while maintaining high ASR.
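The norm cap amounts to an L2 projection onto a ball whose radius is the median honest norm. The snippet below is a small sketch of that projection only; how the attacker obtains the honest norms is not specified here, so `honest_deltas` is simply assumed to be available, and `cap_to_median_norm` is a hypothetical name.

```python
import numpy as np

def cap_to_median_norm(malicious_delta, honest_deltas):
    """Project the malicious delta onto the L2 ball of radius median(honest norms)."""
    target = np.median(np.linalg.norm(honest_deltas, axis=1))
    norm = np.linalg.norm(malicious_delta)
    if norm <= target:
        return malicious_delta            # already inside the ball
    return malicious_delta * (target / norm)

honest = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])   # norms 1, 2, 3
capped = cap_to_median_norm(np.array([30.0, 40.0]), honest)
# capped keeps the direction of [30, 40] but has norm 2, the honest median
```

Because the projection preserves direction, a defence that only inspects magnitudes sees nothing unusual, while direction-based signals are unaffected.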

Phase 3 — PFedBA 👾

Proxy Federated Backdoor Attack

The most sophisticated threat model. Uses Shadow Training to simulate clean update directions, then applies a Gradient Alignment Penalty in the loss function — forcing malicious updates to mimic benign ones. Bypasses FLAME (SOTA).

Threat Model — Phase 03 (PFedBA)

Phase 03 introduces PFedBA (Proxy Federated Backdoor Attack) — the most sophisticated attacker in our evaluation, combining Shadow Training and Gradient Alignment to bypass FLAME and other SOTA defences.

🧩

Shadow Training

The attacker maintains a shadow model trained on a proxy dataset that approximates the global data distribution. This lets them predict clean update directions — disguising the malicious update to mimic benign gradient behaviour.

📐

Gradient Alignment Penalty

A custom loss term minimises cosine distance between the malicious and simulated clean gradient — forcing the malicious update to align directionally with honest clients, evading cosine-similarity filters like FLAME.
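The penalty described above can be written as one minus the cosine similarity between the two gradients. The following is a minimal sketch of the loss value only, assuming a weighting coefficient `mu` that the source does not specify; the function names are hypothetical, and a real attack would compute this inside an automatic-differentiation framework so the penalty can be back-propagated.

```python
import numpy as np

def alignment_penalty(g_malicious, g_clean, eps=1e-12):
    """Cosine distance: 0 when aligned, 1 when orthogonal, 2 when opposite."""
    denom = np.linalg.norm(g_malicious) * np.linalg.norm(g_clean) + eps
    return 1.0 - float(g_malicious @ g_clean) / denom

def attacker_loss(task_loss, g_malicious, g_clean, mu=1.0):
    """Backdoor task loss plus the weighted alignment penalty (mu is an assumed weight)."""
    return task_loss + mu * alignment_penalty(g_malicious, g_clean)
```

Driving the penalty towards zero pulls the malicious update into the same angular region as honest updates, which is what a cosine-based clustering defence relies on to separate them.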

🥷

Stealth Norm Scaling

The attacker caps their update's L2 norm to match the median of honest clients. Combined with gradient alignment, PFedBA updates closely resemble benign ones in both magnitude and direction, which defeats filters that rely on either signal alone.

⚡

Why It Breaks FLAME

FLAME uses HDBSCAN on cosine similarities to detect malicious clients. PFedBA's gradient alignment forces the attacker into the same cluster as honest clients, completely neutralising FLAME. Under Non-IID (α=0.1), FLAME reaches 90.3% ASR.

Threat Model Phase 03 — PFedBA Attack Flowchart

PFedBA attack pipeline: Shadow Training → Gradient Alignment → Norm Scaling → Aggregation Evasion

👾

Why PFedBA Is Our Final Threat Model

PFedBA represents the convergence of norm stealth (Phase 02) and directional stealth (Phase 03). No existing SOTA defense — including FLAME — counters both simultaneously. SENTINEL was built against this attacker, achieving ≈0% ASR (IID) and 10.5% ASR (Non-IID α=0.5), vs FLAME's 7.7% and 60.4%.

04

Experiment Setup & Implementation

📊

Dataset: UNSW-NB15

◆ Network intrusion benchmark dataset
◆ Binary classification: Normal vs. Attack
◆ Feature set: 71 input features
◆ Partitioned across 10 federated clients

🌐

Federated Setup

◆ 10 total clients per experiment
◆ 3 malicious clients (30%): baseline
◆ 4 malicious clients (40%): stress test
◆ Round-based FedAvg coordination

🧮

Model Architecture

◆ 4-layer Fully Connected Neural Network (FCNN)
◆ 71 → 128 → 64 → 32 → 2
◆ Binary cross-entropy loss
◆ Adam optimizer
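The layer sizes listed above can be turned into a small NumPy forward pass. This is a sketch for sizing purposes only: the ReLU activations and the He-style initialisation are assumptions, since neither is stated here, and the helper names are hypothetical.

```python
import numpy as np

LAYER_SIZES = [71, 128, 64, 32, 2]

def init_params(rng, sizes=LAYER_SIZES):
    """One (weights, bias) pair per layer; He-style scaling is an assumption."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Return logits of shape (batch, 2); ReLU on hidden layers is assumed."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)
    return h

def n_parameters(params):
    """Total number of trainable scalars across all layers."""
    return sum(w.size + b.size for w, b in params)

rng = np.random.default_rng(0)
params = init_params(rng)
logits = forward(params, rng.normal(size=(5, 71)))
```

With these sizes the network has 19,618 trainable parameters, so each flattened client update is a vector of that length.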

📐

Data Distributions

◆ IID: Dirichlet α = 100 (homogeneous)
◆ Non-IID moderate: α = 0.5
◆ Non-IID extreme: α = 0.1 (high imbalance)
◆ Formula: (p₁ₖ,…,pₙₖ) ~ Dir(α)
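A common way to realise the formula above is label-based partitioning: for each class k, draw client proportions from a symmetric Dirichlet and split that class's samples accordingly. The sketch below assumes this reading of the formula; `dirichlet_partition` is a hypothetical helper, and the exact partitioning code used in the experiments may differ.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, rng):
    """Return one list of sample indices per client, split class by class."""
    client_idx = [[] for _ in range(n_clients)]
    for k in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == k))
        # p[i] is the share of class k assigned to client i
        p = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(p)[:-1] * len(idx)).astype(int)
        for c, part in enumerate(np.split(idx, cuts)):
            client_idx[c].extend(part.tolist())
    return client_idx
```

Large α (such as 100) yields near-uniform shares per class, while small α (such as 0.1) concentrates most of a class on a few clients, so some clients may see almost no samples of one label.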

Defences Evaluated

FedAvg (Baseline)

Median (Robust Agg.)

Trimmed Mean (Robust Agg.)

Krum (Byzantine)

Multi-Krum (Byzantine)

FLAME (SOTA)

SENTINEL (Ours)

05

Results & Analysis

🎯

Accuracy ≠ Security

A model at 93% Main Task Accuracy still suffered 99.95% Attack Success Rate under PFedBA — shattering the assumption that accuracy implies safety.

🛡️

SENTINEL Dominates

SENTINEL reduces ASR to ≈0% in IID settings and 10.5% under Non-IID (α=0.5), roughly 5.8× lower than FLAME (SOTA) at 60.4%.

⚠️

Extreme Heterogeneity Challenge

At α=0.1, even SENTINEL struggles against PFedBA (ASR = 31.7%), exposing a critical open problem in highly heterogeneous FL-NIDS environments.

Attack Success Rate (ASR %) — PFedBA Attack

Lower is better. All defences evaluated against the strongest threat model (PFedBA).

Defense	Type	IID (α=100)	Non-IID (α=0.5)	Non-IID (α=0.1)
FedAvg	Baseline	95.2%	99.5%	99.8%
Krum	Byzantine	90.4%	99.9%	99.9%
Median	Robust Agg.	High	High	High
Trimmed Mean	Robust Agg.	High	High	High
Multi-Krum	Byzantine	High	High	High
FLAME	SOTA	7.7%	60.4%	90.3%
SENTINEL	Ours	≈ 0%	10.5%	31.7%

* All results on UNSW-NB15 dataset with 10 clients (3 malicious). Lower ASR = stronger defense.
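For reference, the two metrics contrasted throughout can be computed as follows. This is a minimal sketch, assuming label 0 encodes Normal and label 1 encodes Attack, and assuming ASR is measured on attack samples that carry the trigger; the function names and the toy arrays are illustrative, not experimental data.

```python
import numpy as np

BENIGN, ATTACK = 0, 1   # assumed label encoding

def main_task_accuracy(y_true, y_pred):
    """Share of clean test samples classified correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def attack_success_rate(y_pred_triggered, target=BENIGN):
    """Share of trigger-carrying attack samples that the model labels as benign."""
    return float(np.mean(np.asarray(y_pred_triggered) == target))

# Toy predictions, for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 1])
triggered_pred = np.array([0, 0, 0, 1])

acc = main_task_accuracy(y_true, y_pred)        # 0.9
asr = attack_success_rate(triggered_pred)       # 0.75
```

The two numbers are computed on disjoint inputs (clean samples versus triggered samples), which is why a high accuracy says nothing about the ASR.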

06

Conclusion

This research demonstrates a critical gap in federated NIDS security: existing state-of-the-art defences, including FLAME, break down against sophisticated adaptive attackers such as PFedBA. Our proposed SENTINEL algorithm significantly outperforms all baselines in IID and moderate Non-IID settings, reducing ASR from 99.5% (FedAvg) to 10.5% at Non-IID α=0.5 while preserving high main-task accuracy. Extreme heterogeneity (α=0.1), where SENTINEL's ASR rises to 31.7%, remains an open problem.

Key Contributions

🛡️

SENTINEL Defense Algorithm

A novel backdoor-resilient federated aggregation algorithm combining L2 norm anomaly scoring, Sybil similarity detection, IQR-based filtering, and coordinate-wise trimmed median aggregation — with optional Differential Privacy.

💡

Accuracy ≠ Security

We empirically demonstrated that a model with 93% main-task accuracy can simultaneously exhibit a 99.95% Attack Success Rate, showing that accuracy is a dangerously insufficient metric for evaluating FL-NIDS security.

📊

Comprehensive FL-NIDS Benchmark

A systematic evaluation of 7 aggregation strategies (FedAvg, Median, Trimmed Mean, Krum, Multi-Krum, FLAME, SENTINEL) against 3 attack generations across IID and Non-IID data distributions.