Final Year Research Project · Group 08 · University of Peradeniya

Backdoor-Resilient Federated Learning
for Network Intrusion Detection

Building a federated NIDS that withstands sophisticated backdoor attacks in Non-IID, privacy-constrained network environments. We introduce SENTINEL, a novel defence combining multi-signal anomaly filtering with coordinate-wise trimmed median aggregation.

🔬 Federated Learning 🛡️ SENTINEL Defense 📡 NIDS 📊 UNSW-NB15 ⚡ Non-IID

Abstract

Network Intrusion Detection Systems (NIDS) are a critical line of defence against cyberattacks. Federated Learning (FL) enables collaborative model training across distributed organisations without sharing raw network traffic — making it an attractive paradigm for privacy-preserving NIDS.

However, FL is inherently vulnerable to backdoor attacks, where malicious clients poison the global model to misclassify specific traffic patterns as benign. Our experiments demonstrate that a model achieving 93% main task accuracy can simultaneously suffer a devastating 99.95% Attack Success Rate (ASR), proving that accuracy alone is a dangerously insufficient security metric.

This research investigates how sophisticated backdoor attacks evolve to evade state-of-the-art defences (FLAME, Multi-Krum) and proposes SENTINEL — a novel backdoor-resilient aggregation algorithm combining multi-signal anomaly filtering (L2 norm + Sybil similarity) with coordinate-wise trimmed median aggregation and optional Differential Privacy.

95.2%: Attack Success Rate (FedAvg, IID), without any defense
≈ 0%: SENTINEL ASR (IID scenario), under stealthy backdoor attack
10.5%: SENTINEL ASR (Non-IID, α=0.5), vs 99.5% for FedAvg
Dataset: UNSW-NB15, 3 Dirichlet distributions (α = 100, 0.5, 0.1)

Defence Methodology

🛡️
Our Proposed Defense

SENTINEL

A backdoor-resilient federated aggregation algorithm that fuses multiple anomaly detection signals to identify and filter malicious clients before performing robust coordinate-wise trimmed median aggregation.

1. Extract Client Deltas

Compute per-client model weight deltas (update − global model).
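
As a minimal sketch of this step, assume each model is flattened into a plain list of floats. The helper name `client_deltas` and the toy numbers are illustrative, not taken from the project code.

```python
def client_deltas(global_weights, client_weights):
    """Return one delta vector (client weights minus global weights) per client."""
    return [
        [w_c - w_g for w_c, w_g in zip(weights, global_weights)]
        for weights in client_weights
    ]

# Toy example: a 3-parameter model and two client updates.
global_model = [0.5, -1.0, 2.0]
updates = [[0.75, -1.0, 2.5], [0.5, -0.5, 1.0]]
deltas = client_deltas(global_model, updates)
print(deltas)  # [[0.25, 0.0, 0.5], [0.0, 0.5, -1.0]]
```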

2. Compute Anomaly Signals

Calculate an L2-norm anomaly score and a Sybil similarity score (pairwise cosine similarity between client deltas), then normalize both using the Median Absolute Deviation (MAD).
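
The two signals could be computed roughly as follows. This is a sketch under stated assumptions: the exact scoring formulas are not given here, so the one-sided Sybil score (highest cosine similarity to any other client) and the helper names `robust_z` and `anomaly_signals` are hypothetical choices.

```python
import math
from statistics import median

def l2_norm(v):
    """Euclidean norm of a flat vector."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    denom = l2_norm(u) * l2_norm(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom > 0 else 0.0

def robust_z(values, eps=1e-12):
    """Signed deviation from the median, normalized by the MAD."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    return [(x - med) / (mad + eps) for x in values]

def anomaly_signals(deltas):
    # Norm signal: unusually small or unusually large updates both score high.
    norm_signal = [abs(z) for z in robust_z([l2_norm(d) for d in deltas])]
    # Sybil signal (assumed form): highest cosine similarity to any other
    # client; only unusually HIGH similarity is penalized.
    max_sim = [
        max(cosine(d, other) for j, other in enumerate(deltas) if j != i)
        for i, d in enumerate(deltas)
    ]
    sybil_signal = [max(0.0, z) for z in robust_z(max_sim)]
    return norm_signal, sybil_signal

# Four small updates and one with a much larger norm (index 4).
deltas = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1], [10.0, 10.0]]
norm_signal, sybil_signal = anomaly_signals(deltas)
```

In this toy input the client at index 4 receives by far the largest norm signal, because its update norm sits far from the median of the others.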

3. Fuse & Filter

Combine the signals into a unified anomaly score. Sort clients by score and reject outliers using an IQR threshold, while ensuring that a minimum number of presumed-benign clients remains for aggregation.
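
A possible shape for the fuse-and-filter step is sketched below. The equal fusion weights, the Tukey-style fence `Q3 + 1.5 × IQR`, and the `min_keep` parameter are assumptions made for illustration; the actual fusion rule and threshold are not specified here.

```python
from statistics import quantiles

def fuse_and_filter(norm_signal, sybil_signal, min_keep=3, k=1.5):
    """Return (indices of accepted clients, rejection threshold)."""
    # Assumed fusion rule: equal-weight average of the two signals.
    fused = [0.5 * a + 0.5 * b for a, b in zip(norm_signal, sybil_signal)]
    q1, _, q3 = quantiles(fused, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)  # upper fence on the fused score
    order = sorted(range(len(fused)), key=fused.__getitem__)
    kept = [i for i in order if fused[i] <= threshold]
    # Safety net: never aggregate fewer than min_keep lowest-scoring clients.
    if len(kept) < min_keep:
        kept = order[:min_keep]
    return sorted(kept), threshold

norm_scores = [0.1, 0.3, 0.2, 0.4, 9.0]
sybil_scores = [0.2, 0.1, 0.3, 0.2, 7.0]
accepted, fence = fuse_and_filter(norm_scores, sybil_scores)
print(accepted)  # [0, 1, 2, 3]
```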

4. Trimmed Median Aggregation

Stack the remaining deltas per coordinate, sort each coordinate and trim the top and bottom f values, compute the coordinate-wise median of what is left, and apply the result to the global model.
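
The aggregation step can be sketched as below, again on flat lists of floats. The function name `trimmed_median_aggregate` and the choice to add the aggregated delta directly to the global weights (server learning rate of 1) are assumptions.

```python
from statistics import median

def trimmed_median_aggregate(global_weights, deltas, f=1):
    """Coordinate-wise: drop the f smallest and f largest values, take the median."""
    if len(deltas) <= 2 * f:
        raise ValueError("need more than 2*f client deltas")
    aggregated = []
    for coord in zip(*deltas):                    # all clients' values for one coordinate
        kept = sorted(coord)[f:len(coord) - f]    # trim both tails
        aggregated.append(median(kept))
    return [w + d for w, d in zip(global_weights, aggregated)]

global_model = [0.0, 0.0]
accepted_deltas = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, -5.0], [0.2, 0.2]]
new_global = trimmed_median_aggregate(global_model, accepted_deltas, f=1)
print(new_global)  # [0.2, 0.2]
```

With a symmetric trim, the median of the kept values equals the plain coordinate-wise median, so the trim changes the result only if the kept values are combined differently (for example, averaged). Which variant SENTINEL uses is not specified here.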

5. Differential Privacy (Optional)

Add calibrated Gaussian noise to the aggregated update to provide formal privacy guarantees.
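
One common way to add Gaussian noise is shown below as a sketch only. Clipping the aggregated update to `clip_norm` before adding noise, and the parameter names `clip_norm` and `noise_multiplier`, are assumptions; calibrating the noise to a concrete (ε, δ) budget is not shown.

```python
import math
import random

def add_gaussian_noise(update, clip_norm=1.0, noise_multiplier=0.5, seed=None):
    """Clip the update to clip_norm, then add N(0, (noise_multiplier * clip_norm)^2) noise."""
    rng = random.Random(seed)
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [x * scale + rng.gauss(0.0, sigma) for x in update]

aggregated_update = [3.0, 4.0]  # L2 norm 5.0, so it is scaled down to norm 1.0
noisy_update = add_gaussian_noise(aggregated_update, seed=7)
clipped_only = add_gaussian_noise(aggregated_update, noise_multiplier=0.0)
```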

[Figure: SENTINEL Defence Methodology Overview]

Attack Evolution

We study three generations of backdoor attacks — each designed to evade the defences that defeated the previous generation.

Phase 1: Brute-Force 💥

Model Poisoning & Replacement

Train a backdoored model locally, compute the model update, and scale it with a large factor (λ) to replace the global model in a single round. Easily detected by large update norms.
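
The scaling idea can be illustrated with a few lines of arithmetic. The function name and the toy weights are hypothetical; in the usual model-replacement formulation λ is often chosen near the number of clients so that the scaled update survives averaging.

```python
import math

def scaled_replacement_update(global_weights, backdoored_weights, lam):
    """Submit global + lam * (backdoored - global) instead of the backdoored model."""
    return [g + lam * (b - g) for g, b in zip(global_weights, backdoored_weights)]

global_model = [1.0, 2.0]
backdoored = [1.5, 1.0]
submitted = scaled_replacement_update(global_model, backdoored, lam=10)
print(submitted)  # [6.0, -8.0]

# The delta norm grows by the same factor, which is what norm-based filters catch.
plain_norm = math.hypot(*(b - g for g, b in zip(global_model, backdoored)))
scaled_norm = math.hypot(*(s - g for g, s in zip(global_model, submitted)))
```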

Phase 2 — Stealth 🥷

Geometric Camouflage

"Ninja logic" — the attacker measures the magnitude of honest updates and caps the malicious update norm at the median of honest clients using L2 projection. Bypasses norm-based filters while maintaining high ASR.

Phase 3 — PFedBA 👾

Proxy Federated Backdoor Attack

The most sophisticated threat model. Uses Shadow Training to simulate clean update directions, then applies a Gradient Alignment Penalty in the loss function — forcing malicious updates to mimic benign ones. Bypasses FLAME (SOTA).

Threat Model — Phase 03 (PFedBA)

Phase 03 introduces PFedBA (Proxy Federated Backdoor Attack) — the most sophisticated attacker in our evaluation, combining Shadow Training and Gradient Alignment to bypass FLAME and other SOTA defences.

🧩

Shadow Training

The attacker maintains a shadow model trained on a proxy dataset that approximates the global data distribution. This lets them predict clean update directions — disguising the malicious update to mimic benign gradient behaviour.

📐

Gradient Alignment Penalty

A custom loss term minimises cosine distance between the malicious and simulated clean gradient — forcing the malicious update to align directionally with honest clients, evading cosine-similarity filters like FLAME.
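
Written out, the penalty described above has the form `total = task_loss + μ · (1 − cos(g_mal, g_clean))`. The sketch below evaluates that expression on plain lists; the weight `mu` and the function names are assumptions, and a real attack would compute this inside an autodiff framework rather than on fixed vectors.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; treated as 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)

def aligned_loss(task_loss, malicious_grad, clean_grad, mu=1.0):
    """Backdoor task loss plus a penalty for pointing away from the clean direction."""
    return task_loss + mu * cosine_distance(malicious_grad, clean_grad)

orthogonal = aligned_loss(0.3, [1.0, 0.0], [0.0, 1.0], mu=2.0)  # penalty = 2 * 1.0
aligned = aligned_loss(0.3, [2.0, 0.0], [1.0, 0.0], mu=2.0)     # penalty = 0.0
```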

🥷

Stealth Norm Scaling

The attacker caps their update's L2 norm to match the median of honest clients. Combined with gradient alignment, PFedBA updates are statistically indistinguishable from benign ones in both magnitude and direction.
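
The norm cap amounts to an L2 projection onto a ball whose radius is the median honest norm. A minimal sketch, with hypothetical names and toy vectors:

```python
import math
from statistics import median

def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))

def project_to_median_norm(malicious_delta, honest_deltas):
    """Rescale the delta so its L2 norm does not exceed the median honest norm."""
    cap = median(l2_norm(d) for d in honest_deltas)
    norm = l2_norm(malicious_delta)
    if norm <= cap:
        return list(malicious_delta)  # already inside the ball
    return [x * cap / norm for x in malicious_delta]

honest = [[3.0, 4.0], [0.6, 0.8], [0.0, 2.0]]  # norms 5, 1, 2 -> median 2
capped = project_to_median_norm([30.0, 40.0], honest)
```

The direction of the update is unchanged; only its magnitude shrinks, which is why a purely norm-based filter no longer separates it from honest clients.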

Why It Breaks FLAME

FLAME uses HDBSCAN on cosine similarities to detect malicious clients. PFedBA's gradient alignment pushes the attacker into the same cluster as honest clients, so the clustering step no longer separates them. The effect grows with heterogeneity: FLAME's ASR rises from 7.7% (IID) to 60.4% (α=0.5) and 90.3% (α=0.1).

[Figure: Threat Model Phase 03, PFedBA attack flowchart]

PFedBA attack pipeline: Shadow Training → Gradient Alignment → Norm Scaling → Aggregation Evasion

👾

Why PFedBA Is Our Final Threat Model

PFedBA represents the convergence of norm stealth (Phase 02) and directional stealth (Phase 03). No existing SOTA defense — including FLAME — counters both simultaneously. SENTINEL was built against this attacker, achieving ≈0% ASR (IID) and 10.5% ASR (Non-IID α=0.5), vs FLAME's 7.7% and 60.4%.

Experiment Setup & Implementation

📊

Dataset: UNSW-NB15

  • Network intrusion benchmark dataset
  • Binary classification: Normal vs. Attack
  • Feature set: 71 input features
  • Partitioned across 10 federated clients
🌐

Federated Setup

  • 10 total clients per experiment
  • 3 malicious clients (30%) — baseline
  • 4 malicious clients (40%) — stress test
  • Round-based FedAvg coordination
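
For reference, the FedAvg baseline aggregates client models as a sample-weighted average. A minimal sketch on flat weight lists, with illustrative names and numbers:

```python
def fedavg(client_weights, num_samples):
    """Average client weight vectors, weighted by each client's sample count."""
    total = sum(num_samples)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, num_samples)) / total
        for i in range(dim)
    ]

# Two clients holding 1 and 3 samples respectively.
averaged = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(averaged)  # [2.5, 3.5]
```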
🧮

Model Architecture

  • 4-layer Fully Connected Neural Network (FCNN)
  • 71 → 128 → 64 → 32 → 2
  • Binary cross-entropy loss
  • Adam optimizer
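
The layer sizes above determine how many coordinates each client delta has. The short calculation below assumes every layer carries a bias vector, which is not stated here.

```python
LAYER_SIZES = [71, 128, 64, 32, 2]

def parameter_count(sizes, bias=True):
    """Number of trainable parameters in a fully connected network."""
    return sum(
        n_in * n_out + (n_out if bias else 0)
        for n_in, n_out in zip(sizes, sizes[1:])
    )

total_params = parameter_count(LAYER_SIZES)
print(total_params)  # 19618
```

Under that assumption, coordinate-wise aggregation operates on vectors of 19,618 values per client.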
📐

Data Distributions

  • IID: Dirichlet α = 100 (homogeneous)
  • Non-IID moderate: α = 0.5
  • Non-IID extreme: α = 0.1 (high imbalance)
  • Formula: (p₁ₖ,…,pₙₖ) ~ Dir(α)
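
A Dirichlet label partition of this kind can be sketched with the standard library alone. The function name, the seed handling, and the rounding of cut points are illustrative choices; a smaller α concentrates each class on fewer clients.

```python
import random

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients; class k is divided by p_k ~ Dir(alpha)."""
    rng = random.Random(seed)
    clients = [[] for _ in range(num_clients)]
    for k in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == k]
        rng.shuffle(idx)
        # Normalized Gamma(alpha, 1) draws are one sample from Dir(alpha).
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas) or 1.0
        start, cum = 0, 0.0
        for c, g in enumerate(gammas):
            cum += g / total
            # The last client takes whatever remains so no index is lost.
            end = len(idx) if c == num_clients - 1 else min(len(idx), round(cum * len(idx)))
            clients[c].extend(idx[start:end])
            start = end
    return clients

labels = [0] * 60 + [1] * 40  # toy binary labels (Normal vs. Attack)
parts = dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=42)
```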

Defences Evaluated

FedAvg (Baseline)
Median (Robust Agg.)
Trimmed Mean (Robust Agg.)
Krum (Byzantine)
Multi-Krum (Byzantine)
FLAME (SOTA)
SENTINEL (Ours)

Results & Analysis

🎯

Accuracy ≠ Security

A model at 93% Main Task Accuracy still suffered 99.95% Attack Success Rate under PFedBA — shattering the assumption that accuracy implies safety.
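
The two metrics are computed on different inputs, which is why one can be high while the other is catastrophic. The sketch below assumes labels are encoded as 0 = Normal and 1 = Attack and that ASR is measured on attack samples carrying the trigger; the toy predictions are invented for illustration.

```python
def main_task_accuracy(y_true, y_pred):
    """Fraction of clean test samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def attack_success_rate(triggered_preds, benign_label=0):
    """Fraction of triggered attack samples that the model labels as benign."""
    return sum(p == benign_label for p in triggered_preds) / len(triggered_preds)

# Clean test set: one mistake out of ten.
y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0, 1, 0, 1, 0]
# Predictions on attack traffic with the trigger applied: four of five slip through.
triggered_preds = [0, 0, 0, 0, 1]

accuracy = main_task_accuracy(y_true, y_pred)
asr = attack_success_rate(triggered_preds)
print(accuracy, asr)  # 0.9 0.8
```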

🛡️

SENTINEL Dominates

SENTINEL reduces ASR to ≈0% in IID settings and 10.5% under Non-IID (α=0.5), roughly 5.8× lower than FLAME (SOTA) at 60.4%.

⚠️

Extreme Heterogeneity Challenge

At α=0.1, even SENTINEL struggles against PFedBA (ASR 31.7%), exposing a critical open problem in highly heterogeneous FL-NIDS environments.

Attack Success Rate (ASR %) — PFedBA Attack

Lower is better. All defences evaluated against the strongest threat model (PFedBA).

Defense         Type          IID (α=100)   Non-IID (α=0.5)   Non-IID (α=0.1)
FedAvg          Baseline      95.2%         99.5%             99.8%
Krum            Byzantine     90.4%         99.9%             99.9%
Median          Robust Agg.   High          High              High
Trimmed Mean    Robust Agg.   High          High              High
Multi-Krum      Byzantine     High          High              High
FLAME           SOTA          7.7%          60.4%             90.3%
SENTINEL        Ours          ≈ 0%          10.5%             31.7%

* All results on UNSW-NB15 dataset with 10 clients (3 malicious). Lower ASR = stronger defense.

Conclusion

This research demonstrates a critical gap in federated NIDS security: existing state-of-the-art defences, including FLAME, break down against sophisticated adaptive attackers like PFedBA once client data becomes Non-IID (FLAME reaches 60.4% ASR at α=0.5 and 90.3% at α=0.1). Our proposed SENTINEL algorithm significantly outperforms all baselines in IID and moderate Non-IID settings, reducing ASR from 99.5% (FedAvg) to 10.5% (Non-IID α=0.5) while preserving high main-task accuracy.

Key Contributions

🛡️

SENTINEL Defense Algorithm

A novel backdoor-resilient federated aggregation algorithm combining L2 norm anomaly scoring, Sybil similarity detection, IQR-based filtering, and coordinate-wise trimmed median aggregation — with optional Differential Privacy.

💡

Accuracy ≠ Security

We demonstrated empirically that a model with 93% main-task accuracy can simultaneously exhibit a 99.95% Attack Success Rate, showing that accuracy alone is a dangerously insufficient metric for evaluating FL-NIDS security.

📊

Comprehensive FL-NIDS Benchmark

A systematic evaluation of 7 aggregation strategies (FedAvg, Median, Trimmed Mean, Krum, Multi-Krum, FLAME, SENTINEL) against 3 attack generations across IID and Non-IID data distributions.

⚠️ Limitations

Single-Feature Triggers

Current experiments use a single-feature trigger; real-world attacks may use multi-feature or adaptive triggers.

Small Federation Scale

Tested with 10 clients. Behaviour in larger federations (100+ nodes) may differ.

Extreme Heterogeneity

At α=0.1, SENTINEL still struggles against PFedBA — a critical open challenge for highly Non-IID environments.

Static Trigger Design

The current threat model uses a static trigger pattern; adaptive trigger-based attacks remain a future challenge.

Team

Group 08 — Students

De Silva H.D.S.
E/20/055
e20055@eng.pdn.ac.lk
Ekanayaka E.M.I.P.
E/20/094
e20094@eng.pdn.ac.lk
Peiris T.M.S.U.
E/20/284
e20284@eng.pdn.ac.lk

Supervisors

Dr. Upul Jayasinghe Mendis
Supervisor
upuljm@eng.pdn.ac.lk
Dr. Suneth Namal Karunarathna
Supervisor
namal@eng.pdn.ac.lk