Explainable AI-Driven Zero-Trust
Anomaly Detection for Encrypted Traffic

Bridging the gap between encrypted traffic analysis, explainable AI, and automated zero-trust policy enforcement.

Zero-Trust · Explainable AI · Encrypted Traffic · Deep Dictionary Learning · SHAP

CO425 — Final Year Project II  |  Department of Computer Engineering  |  University of Peradeniya

01 Abstract

Modern networks increasingly encrypt traffic to protect data privacy, but encryption blinds traditional Intrusion Detection Systems (IDS) that rely on payload inspection. Concurrently, the rise of cloud computing and remote work has made perimeter-based security obsolete, leading to the adoption of Zero-Trust Architecture (ZTA), which requires continuous verification of every entity.

While Deep Learning models can detect anomalies in encrypted traffic without decryption by analyzing metadata, their "black-box" nature creates a trust deficit that hinders automated policy enforcement. This project proposes a framework integrating Encrypted Traffic Analysis (ETA) with Explainable AI (XAI) using SHAP and Deep Dictionary Learning to provide real-time, human-readable rationales for security decisions.

02 Team

Members

Chalaka Perera
E/20/288
e20288@eng.pdn.ac.lk
Janith Wanasinghe
E/20/420
e20420@eng.pdn.ac.lk
Sandaru Wijewardhana
E/20/449
e20449@eng.pdn.ac.lk

Supervisors

Dr. Suneth Namal Karunarathna
namal@eng.pdn.ac.lk
Dr. Upul Jayasinghe
upuljm@eng.pdn.ac.lk

03 Methodology

The proposed framework utilizes a multi-stage zero-trust pipeline:

🔐 Stage 1 — Encryption & Extraction

Raw PCAP streams are processed through an AES-256-GCM encryption simulator. A hybrid DPKT + NFStream extractor yields 15 CIC-IDS-2017 behavioral features — all from unencrypted metadata (no payload inspection).

🌲 Stage 2 — Decision Tree Pre-Check

A lightweight Decision Tree classifier (entropy criterion, class weight 50:1 attack bias) performs an initial fast check. Flows classified as Normal are forwarded immediately; flagged flows proceed to deep analysis.
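The pre-check described above can be sketched with scikit-learn. This is an illustrative stand-in for the project's BaseCheckClassifier, not the real thing: the data is synthetic, but the entropy criterion, depth cap, and 50:1 attack class weight follow the specification in this section.

```python
# Sketch of the Stage-2 pre-check: an entropy-based Decision Tree with a
# 50:1 weight on the attack class (assumed labels: 0 = Normal, 1 = Attack).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15))          # stand-in for the 15 behavioural features
y = (X[:, 0] > 1.5).astype(int)          # synthetic "attack" label for illustration

clf = DecisionTreeClassifier(
    criterion="entropy",                 # entropy split criterion, per the spec
    max_depth=15,                        # depth cap from the architecture notes
    class_weight={0: 1, 1: 50},          # 50:1 bias toward catching attacks
    random_state=0,
).fit(X, y)

flagged = clf.predict(X) == 1            # flagged flows go on to DDL deep analysis
```

The heavy class weight trades false positives for recall, which is acceptable here because flagged flows are re-checked by the DDL model rather than dropped outright.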

🧠 Stage 3 — Deep Dictionary Learning

A two-layer DDL model with ISTA sparse coding learns atomic dictionary representations of normal traffic. Flows with high reconstruction error (exceeding the learned threshold) are classified as anomalous.

💡 Stage 4 — XAI Explanation

For every anomaly, two explanation methods fire: DDL-native per-feature reconstruction error decomposition, and SHAP KernelExplainer perturbation-based attributions — producing human-readable rationales for SOC analysts.

🛡️ Stage 5 — SDN Buffer & Policy

Anomalous flows are held in an OpenFlow-style SDN buffer while the explanation is computed. After analysis: DROP confirmed threats, FORWARD cleared flows. Decisions feed back into the ZTA Policy Engine.

📊 Stage 6 — Dashboard & Logging

A Streamlit dashboard provides real-time visibility into detection events, explanation summaries, and pipeline statistics (TP/TN/FP/FN, latency, throughput).

Feature Sets (CIC-IDS-2017)

BCC v2: 28 features (Stage 1 gatekeeper) | DDL: 40 features (Stage 2, superset of BCC's 28)

Packet Length Variance, Fwd Packet Length Max, Fwd Header Length, Init_Win_bytes_forward, Bwd Header Length, Total Length of Fwd Packets, Init_Win_bytes_backward, Bwd Packets/s, Flow IAT Min, Fwd IAT Min, Flow Bytes/s, Active Min, Bwd IAT Total, Flow IAT Max, Flow Duration, Total Fwd Packets, Total Bwd Packets, Fwd Packet Length Mean, Bwd Packet Length Mean, Fwd Packet Length Std, Bwd Packet Length Max, Flow IAT Mean, Flow IAT Std, Fwd IAT Total, Fwd Packets/s, Down/Up Ratio, SYN Flag Count, RST Flag Count; plus DDL-only: Bwd Pkt Len Min/Max, Fwd/Bwd IAT extras, Active Min

04 Architecture

Pipeline flow: Raw PCAP stream → Feature extraction (DPKT + NFStream, 15 CIC-IDS features) → Decision Tree pre-check (entropy, depth = 15, 50:1 attack weight) → Normal traffic is FORWARDed; flagged flows are held in the SDN buffer → DDL model (2-layer ISTA, reconstruction error → score, x̂ = (α₂·D₂ᵀ)·D₁ᵀ) → XAI explainer (DDL-native reconstruction + SHAP Kernel → human rationale) → RELEASE if the DDL clears the flow, DROP if the anomaly is confirmed. In short: Encryption → Feature Extraction → DT Pre-Check → SDN Buffer + DDL → XAI → Action.

Repository Structure

Directory | Purpose | Status
BaseCheckClassifier/ | Decision Tree classifier, feature extraction, encryption sim, dashboard | Active
DDLModel/ | Deep Dictionary Learning anomaly detector (two-layer ISTA) | Active
XAIExplainer/ | SHAP + DDL-native reconstruction explanation | Active
SDNBuffer/ | OpenFlow-style SDN flow buffer simulation | Active
ZeroTrustPipeline/ | Pipeline orchestrator — ties all modules together | Active
tests/ | Integration tests (27 sub-tests) | Active
ObsoleteExperiments/ | Archived early experiments (see below) | Archived
docs/ | This GitHub Pages site | Active

05 Experiments & Evolution

The current DDL-based pipeline is the result of several iterative experiments. Each attempt taught us something that shaped the final design. Below is the complete timeline of approaches tried, their results, and why we moved on.

Jan 2026
Experiment 1 — Semi-Supervised Pseudo-Labeling on BCCC Darknet

Approach: Ensemble of Isolation Forest + Autoencoder for unsupervised anomaly detection on the BCCC Darknet dataset. Pseudo-labels were generated via agreement voting — confidence = 1.0 when both models agreed a sample was anomalous. Top 50 features selected by variance from 467 numeric features.

Results — Pseudo-Labeling

Metric | Value
Accuracy | 0.9949
Precision | 0.7451
Recall | 1.0000
F1-Score | 0.8539
ROC-AUC | 0.9996
5-Fold CV (ROC-AUC) | 0.9991 ± 0.0003

Confusion Matrix: TN=5,006   FP=26   FN=0   TP=76
Dataset: 25,588 samples — only 379 high-confidence anomalies (1.48%), class imbalance ratio 1:66.4

⛔ Why We Moved On
  • No ground truth: Pseudo-labels are unsupervised — there was no way to validate whether detected "anomalies" were real attacks
  • Anonymous features: The 50 variance-selected features were unnamed (Feature_123, Feature_045, etc.) — completely uninterpretable for security analysts
  • Extreme class imbalance: Only 1.48% anomalies (379 / 25,588) making evaluation unreliable
  • Precision ceiling: ~74.5% precision means ~1 in 4 alerts would be false positives — unacceptable for automated policy enforcement
  • Dataset mismatch: BCCC Darknet captures darknet traffic, which is less representative of enterprise encrypted traffic than CIC-IDS-2017
Feb 2026
Experiment 2 — Random Forest + SHAP Pipeline on BCCC Darknet

Approach: 3-stage notebook pipeline — (1) IF+AE pseudo-labeling, (2) Random Forest classifier (300 trees, max_depth=15, balanced class weights), (3) SHAP TreeExplainer for explainability. Same BCCC Darknet dataset and 50 anonymous features.

Results — Random Forest + SHAP

Metric | Value
Accuracy | 0.9945
Precision | 0.7300
Recall | 0.9865
F1-Score | 0.8391
ROC-AUC | 0.9994
5-Fold CV (ROC-AUC) | 0.9991 ± 0.0003

Confusion Matrix: TN=5,007   FP=27   FN=1   TP=73
SHAP outputs: bar, beeswarm, waterfall, and dependence plots on anonymous features

⛔ Why We Moved On
  • Same fundamental issues as Experiment 1 — pseudo-labels, anonymous features, BCCC Darknet
  • SHAP on anonymous features is meaningless: Explaining that "Feature_045 contributes most to anomaly" tells a SOC analyst nothing actionable
  • Slightly worse performance: Precision dropped to 73.0%, recall to 98.65% — RF added complexity without improving quality
  • No integration path: This standalone pipeline had no connection to the zero-trust simulation or SDN policy enforcement
  • TreeExplainer limitations: Only works with tree-based models — we needed a model-agnostic approach for DDL
Feb 2026
Experiment 3 — Decision Tree on CIC-IDS-2017 (BaseCheckClassifier)

Approach: Switched datasets to CIC-IDS-2017 with 15 hand-selected behavioral features. Trained a Decision Tree (entropy criterion, max_depth=15, 50:1 attack class weight) as a "zero-leak" classifier. Integrated into a full simulation pipeline with encryption, topology simulation, and a Streamlit dashboard.

✅ Key Improvements Over Previous Experiments
  • Real labeled dataset: CIC-IDS-2017 has proper ground truth labels (Normal / Attack categories)
  • Interpretable features: All 15 features have meaningful names (Packet Length Variance, Flow Duration, etc.)
  • Zero-trust simulation: Full integration with encryption, topology, and SDN policy enforcement
⛔ Why It's Not Sufficient Alone
  • Single-model limitation: A Decision Tree alone cannot capture complex nonlinear attack patterns in encrypted traffic
  • No deep anomaly detection: DTs rely on axis-aligned splits — they miss subtle distributional shifts that dictionary learning can catch
  • No built-in explainability: While DTs are inherently interpretable, they don't provide the rich, quantitative explanations needed for automated policy creation
  • Feature distribution mismatch: When tested on synthetic pcap data, the DT classified everything as Normal due to distribution differences between training and real network data
Feb–Mar 2026
Current — Deep Dictionary Learning + SHAP XAI + SDN Buffer

Approach: Two-layer Deep Dictionary Learning with ISTA sparse coding as the anomaly detection backbone, combined with DDL-native reconstruction decomposition + SHAP KernelExplainer for dual-mode explainability. The Decision Tree serves as a lightweight pre-filter, and an SDN buffer holds suspicious flows during analysis.

✅ Why This Approach Works
  • Unsupervised + supervised hybrid: DDL trains on normal traffic only (no labeled attacks needed) but uses CIC-IDS-2017's labeled data for evaluation
  • Named, interpretable features: Explanations reference real network characteristics ("Packet Length Variance is 250x above normal")
  • Dual explainability: DDL-native provides per-feature reconstruction error; SHAP provides model-agnostic attributions — both give actionable insights
  • Full zero-trust integration: DT pre-check → SDN buffer → DDL analysis → XAI report → automated policy enforcement
  • Modular design: Each component (DDLModel, XAIExplainer, SDNBuffer, Pipeline) is independently testable — 27/27 integration tests pass

Side-by-Side Comparison

Aspect | Exp 1: IF+AE | Exp 2: RF+SHAP | Exp 3: DT (CIC) | Current: DDL+XAI
Dataset | BCCC Darknet | BCCC Darknet | CIC-IDS-2017 | CIC-IDS-2017
Features | 50 anonymous | 50 anonymous | 15 named | 15 named
Labels | Pseudo (unsupervised) | Pseudo (unsupervised) | Ground truth | Ground truth
Core Model | Isolation Forest + AE | Random Forest | Decision Tree | DDL (ISTA)
Explainability | None | SHAP (anonymous) | Inherent (DT rules) | DDL-native + SHAP
Zero-Trust Integration | No | No | Yes (DT only) | Yes (full pipeline)
SDN Buffer | No | No | No | Yes
Precision | 0.7451 | 0.7300 | 0.875 (BCC) / 0.699 (DDL) | 0.9363 (full pipeline)
Recall | 1.0000 | 0.9865 | 0.999 (BCC / Sandaru data) | 0.4537 (DDL standalone)
FPR | n/a | n/a | 0.78% | 0.25% (full pipeline)
Status | Archived | Archived | Pre-filter (Stage 1) | Active ✓ Tested

06 Implementation Details

The current zero-trust pipeline is fully implemented as modular Python packages, with 27/27 integration tests passing. Below are the technical specifics of each component.

6.1   Deep Dictionary Learning Model

Architecture

Component | Specification
Input | 40 CIC-IDS-2017 features — superset of BCC's 28 (Z-score normalised)
Layer 1 Dictionary | D₁ ∈ ℝ^(40×64) — captures coarse flow patterns
Layer 2 Dictionary | D₂ ∈ ℝ^(64×128) — captures subtle micro-patterns
Sparse Coding | ISTA (Iterative Shrinkage-Thresholding Algorithm), 50 iterations per layer
Sparsity Penalty (λ) | 0.1 (L1 regularisation on sparse codes)
Dictionary Update | Mini-batch gradient descent with column-wise unit-norm projection
Training Epochs | 150 (batch size = 512, GPU: RTX 6000 Ada, ~1 h 45 min)
Training Samples | 1,682,457 normal flows (CIC-IDS-2017 TRAIN)
Anomaly Threshold | 0.7597 (95th percentile of training reconstruction error)

Forward Pass

x → normalise
α₁ = ISTA(D₁, x)      // Layer 1 sparse code (64-dim)
α₂ = ISTA(D₂, α₁)     // Layer 2 sparse code (128-dim)
α̂₁ = α₂ · D₂ᵀ         // Decode to Layer 1 space
x̂ = α̂₁ · D₁ᵀ          // Full reconstruction
error = ‖x − x̂‖²      // If error > threshold → Anomaly

Training Strategy

The model trains only on benign traffic — an unsupervised approach that avoids the need for labelled attack samples. During training, dictionaries D₁ and D₂ learn to efficiently represent normal flow patterns via alternating sparse coding (ISTA) and dictionary update (gradient descent with column normalisation). After training, an anomaly threshold is set at the 95th percentile of reconstruction errors on the training set. At inference time, any flow whose reconstruction error exceeds this threshold is flagged as anomalous.
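Under the assumptions in the table above (λ = 0.1, 50 ISTA iterations per layer, 95th-percentile threshold), the forward pass and threshold calibration can be sketched in NumPy. This is an illustrative implementation, not the project code: the dictionaries and "training flows" are random stand-ins, and the code uses the column-vector convention x̂ = D₁(D₂α₂), the transpose of the row form shown earlier.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the L1 norm
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam=0.1, n_iter=50):
    # Sparse-code x over dictionary D (features x atoms):
    # minimises 0.5 * ||x - D @ a||^2 + lam * ||a||_1
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - D.T @ (D @ a - x) / L, lam / L)
    return a

def anomaly_score(D1, D2, x):
    # Two-layer forward pass: x -> alpha1 -> alpha2 -> x_hat
    a1 = ista(D1, x)                     # layer-1 code (64-dim in the spec)
    a2 = ista(D2, a1)                    # layer-2 code (128-dim)
    x_hat = D1 @ (D2 @ a2)               # reconstruction through both layers
    return float(np.sum((x - x_hat) ** 2))

# Threshold = 95th percentile of reconstruction error on (benign) training flows
rng = np.random.default_rng(0)
D1 = rng.normal(size=(40, 64))
D1 /= np.linalg.norm(D1, axis=0)         # column-wise unit-norm projection
D2 = rng.normal(size=(64, 128))
D2 /= np.linalg.norm(D2, axis=0)
train = rng.normal(size=(200, 40))       # stand-in for normalised benign flows
scores = np.array([anomaly_score(D1, D2, x) for x in train])
threshold = np.percentile(scores, 95)
```

At inference time a flow is flagged when `anomaly_score(D1, D2, x) > threshold`; the dictionary-update step (gradient descent on D₁, D₂) is omitted here.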

6.2   Explainability Module (XAI)

Every anomaly decision is accompanied by a detailed explanation through two complementary strategies:

DDL-Native Explanation

  • Per-feature reconstruction error — decomposes the total error into each of the 15 features, ranks them by contribution
  • Sparse code activation analysis — reports how many dictionary atoms activated in each layer (sparsity ratio)
  • Layer-by-layer diagnostics — intermediate representations at L1 and L2 for failure-mode analysis
  • Expected vs. observed values — shows what the DDL model expected each feature to be versus the actual value

SHAP KernelExplainer

  • Model-agnostic — treats the DDL anomaly scorer as a black box, compatible with any future model replacement
  • Background dataset — uses 100 normal samples as the baseline distribution
  • Per-feature SHAP values — quantifies each feature's individual contribution to the anomaly score
  • Actionable output — "Packet Length Variance contributes +8340 to anomaly score" rather than "Feature_045 is important"

Both strategies produce a composite report suitable for SOC analyst dashboards and automated policy audit trails. The human-readable interpretation includes feature rankings, deviation magnitudes, and a recommended action (DROP / FORWARD).
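The DDL-native decomposition rests on a simple identity: the total squared reconstruction error is the sum of per-feature squared residuals, so ranking features by (xᵢ − x̂ᵢ)² yields the explanation. A minimal sketch with hypothetical values (the feature names and numbers below are illustrative, not project output):

```python
import numpy as np

def explain_reconstruction(x, x_hat, feature_names, top_k=3):
    # Decompose total squared error into per-feature contributions, ranked descending.
    per_feature = (x - x_hat) ** 2
    order = np.argsort(per_feature)[::-1]
    return [(feature_names[i], float(per_feature[i])) for i in order[:top_k]]

# Toy flow: one feature reconstructs badly and dominates the explanation
names = ["Packet Length Variance", "Flow Duration", "SYN Flag Count"]
x     = np.array([250.0, 1.2, 0.0])      # observed (hypothetical values)
x_hat = np.array([  1.0, 1.1, 0.0])      # what the model "expected"
top = explain_reconstruction(x, x_hat, names)
```

Because the per-feature terms sum exactly to the total error, the ranking is faithful to the anomaly score by construction, which is why it complements the perturbation-based SHAP view.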

6.3   SDN Buffer (Simulated OpenFlow)

Parameter | Value | Description
Max Buffer Size | 1,000 streams | Max concurrent flows held for analysis
Timeout | 5,000 ms | Auto-release if DDL analysis takes too long
Actions | BUFFER → RELEASE / DROP | Mirrors OpenFlow OFPT_PACKET_IN / OFPT_FLOW_MOD
Expiry Policy | Auto-release with warning | Fail-open on timeout to prevent denial of service

In a real deployment, this module would be replaced by actual SDN controller commands (e.g., OpenDaylight or ONOS). The simulation faithfully tracks buffer state, hold times, and capacity — providing realistic latency measurements for the pipeline evaluation.
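A minimal stdlib-only sketch of the simulated buffer semantics (hold, capacity limit, fail-open on timeout). Parameter defaults follow the table above; the class name, methods, and the "REJECT" result are illustrative, not the project's SDNBuffer API.

```python
import time

class SDNBuffer:
    """Toy OpenFlow-style hold buffer: BUFFER -> RELEASE / DROP, fail-open on timeout."""

    def __init__(self, max_size=1000, timeout_ms=5000):
        self.max_size = max_size
        self.timeout_ms = timeout_ms
        self.held = {}                       # flow_id -> time when buffered

    def buffer(self, flow_id):
        if len(self.held) >= self.max_size:
            return "REJECT"                  # buffer full: cannot hold more flows
        self.held[flow_id] = time.monotonic()
        return "BUFFER"

    def decide(self, flow_id, is_anomaly):
        start = self.held.pop(flow_id, None)
        if start is None:
            return "UNKNOWN"                 # never buffered (or already decided)
        if (time.monotonic() - start) * 1000 > self.timeout_ms:
            return "RELEASE"                 # expired: fail-open (real code logs a warning)
        return "DROP" if is_anomaly else "RELEASE"

buf = SDNBuffer()
buf.buffer("flow-1")
action = buf.decide("flow-1", is_anomaly=True)
```

The hold time measured between `buffer()` and `decide()` is what the pipeline evaluation reports as SDN buffer latency.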

6.4   Pipeline Orchestrator

The ZeroTrustPipeline ties all components together with a fail-closed design:

Stage 1  .pcap → simulate TLS encryption → extract 15 features
Stage 2  Features → Decision Tree pre-check
         ├─ Normal → FORWARD immediately (fast path)
         └─ Flagged → SDN Buffer HOLD
Stage 3  Buffer → DDL + XAI (parallel threads)
         ├─ DDL: Normal → Buffer RELEASE → FORWARD
         └─ DDL: Anomaly → Buffer DROP + XAI Report attached
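The routing above condenses to a few lines. This sketch uses hypothetical callback functions for the two classifiers and omits the SDN buffer, parallel threads, and attached XAI report that the real orchestrator handles; the fail-closed behaviour on analysis errors reflects the design statement above.

```python
def route_flow(features, dt_is_normal, ddl_is_anomaly):
    # Stage 2: fast path for flows the Decision Tree clears
    if dt_is_normal(features):
        return "FORWARD"
    # Stage 3: flagged flows go to DDL; any analysis failure fails closed
    try:
        return "DROP" if ddl_is_anomaly(features) else "FORWARD"
    except Exception:
        return "DROP"            # fail-closed: never forward an unanalysed flagged flow
```

For example, `route_flow(f, dt, ddl)` returns "FORWARD" on the fast path, "DROP" for a confirmed anomaly, and "DROP" again if the DDL stage raises.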

6.5   Feature Sets — Why 28 (BCC) and 40 (DDL)?

Features were selected from CIC-IDS-2017's 78 columns using three criteria: (1) not null across all 5 days, (2) high ANOVA F-score between Normal vs Attack classes, (3) computable from raw network metadata (no payload inspection — works on encrypted traffic). The two feature sets are cumulative — DDL's 40 is a superset of BCC's 28.
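Criterion (2), the ANOVA F-score ranking, can be computed directly from its definition: between-class variance over within-class variance. The sketch below uses synthetic data rather than CIC-IDS-2017; a class-separating feature scores far higher than class-independent noise.

```python
import numpy as np

def anova_f(x, y):
    # One-way ANOVA F-score of feature x across the classes in y.
    groups = [x[y == c] for c in np.unique(y)]
    grand = x.mean()
    k, n = len(groups), len(x)
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 500)                   # 0 = Normal, 1 = Attack (synthetic)
discriminative = rng.normal(loc=y * 3.0)     # mean shifts with the class
noise = rng.normal(size=1000)                # independent of the class
```

Ranking all candidate columns by `anova_f` and keeping the top ones is the selection step; scikit-learn's `f_classif` computes the same statistic.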

# | Feature | Category | Used in | Rationale
1 | Packet Length Variance | Packet Size | BCC + DDL | High variance → unusual payload distribution (DDoS uses uniform sizes)
2 | Fwd Packet Length Max | Packet Size | BCC + DDL | Abnormally large fwd packets → exfiltration
3 | Fwd Header Length | Header | BCC + DDL | Header padding/manipulation is a common evasion technique
4 | Init_Win_bytes_forward | TCP Window | BCC + DDL | Unusual initial window sizes → scanning tools (e.g. nmap)
5 | Bwd Header Length | Header | BCC + DDL | Asymmetric header sizes → protocol misuse / C2
6 | Total Length of Fwd Packets | Volume | BCC + DDL | Abnormal volume → flooding or data exfiltration
7 | Init_Win_bytes_backward | TCP Window | BCC + DDL | Mismatched backward window → C2 traffic signature
8 | Bwd Packets/s | Rate | BCC + DDL | High backward rate → DDoS or amplification attack
9 | Flow IAT Min | Timing | BCC + DDL | Machine-generated traffic has unnaturally regular timing
10 | Fwd IAT Min | Timing | BCC + DDL | Forward inter-arrival times reveal automated scanning
11 | Flow Bytes/s | Throughput | BCC + DDL | Sudden throughput spikes → data exfiltration or flooding
12 | Active Min | Activity | BCC + DDL | Short active bursts → bot behaviour / beaconing
13 | Bwd IAT Total | Timing | BCC + DDL | Total backward inter-arrival → response pattern analysis
14 | Flow IAT Max | Timing | BCC + DDL | Long idle gaps between bursts → C2 beaconing
15 | Flow Duration | Duration | BCC + DDL | Abnormally short or long flows → scanning or tunnelling
16 | Total Fwd Packets | Volume | BCC + DDL | Packet count asymmetry reveals scanning patterns
17 | Total Backward Packets | Volume | BCC + DDL | Low backward count with high forward → one-way flooding
18 | Fwd Packet Length Mean | Packet Size | BCC + DDL | Average size compared to variance reveals payload consistency
19 | Bwd Packet Length Mean | Packet Size | BCC + DDL | Backward size profile distinguishes scan responses
20 | Fwd Packet Length Std | Packet Size | BCC + DDL | Low Std + high rate = tool-generated uniform traffic
21 | Bwd Packet Length Max | Packet Size | BCC + DDL | Oversized backward packets → data theft response
22 | Flow IAT Mean | Timing | BCC + DDL | Average timing between packets reveals automation
23 | Flow IAT Std | Timing | BCC + DDL | Very low Std = machine-generated, very high = irregular
24 | Fwd IAT Total | Timing | BCC + DDL | Total forward idle time — long = slow-rate attacks
25 | Fwd Packets/s | Rate | BCC + DDL | High fwd rate without proportional payload = flooding
26 | Down/Up Ratio | Asymmetry | BCC + DDL | Unusual download:upload ratio → exfiltration or scanning
27 | SYN Flag Count | TCP Flags | BCC + DDL | Flood of SYN packets = SYN flood DDoS
28 | RST Flag Count | TCP Flags | BCC + DDL | High RST count = port scanning resets
29 | Bwd Packet Length Min | Packet Size | DDL only | Minimum backward pkt size — DDoS sends identical tiny ACKs
30 | Bwd Packet Length Max | Packet Size | DDL only | Full backward size profile for dictionary reconstruction quality
31 | Flow IAT Mean | Timing | DDL only | Provides mean for DDL's pattern reconstruction
32 | Flow IAT Std | Timing | DDL only | IAT variability is key for DDL to encode normal timing patterns
33 | Fwd IAT Total | Timing | DDL only | Forward idle time integral — slow-rate attacks show elevated values
34 | Bwd IAT Min | Timing | DDL only | Fast backward bursts = scanning or amplification signatures
35 | Fwd Packets/s | Rate | DDL only | Forward packet rate for DDL's flow-speed dictionary
36 | Bwd Packets/s | Rate | DDL only | Backward packet rate — amplification attacks show extreme ratio
37 | Fwd Header Length.1 | Header | DDL only | Cumulative header overhead — anomalous for tunnelling
38 | Active Min | Activity | DDL only | Shortest active period — bots have very short minimum active windows
39 | ACK Flag Count | TCP Flags | DDL only | ACK flood is a common DDoS variant — DDL needs this for flag profiling
40 | URG Flag Count | TCP Flags | DDL only | Urgent flags rarely appear in normal traffic — clear anomaly indicator

All 40 features are metadata-only — no payload inspection, fully compatible with TLS-encrypted traffic. Features marked "BCC + DDL" are shared by both models; "DDL only" rows were added to improve reconstruction fidelity.

6.6   Evaluation Status

Component | Tests | Status
DDL model training (150 epochs, GPU) | n/a | ✓ Completed
DDL + IF standalone CSV inference | 531K rows | ✓ Completed
BCC on Sandaru's test_raw.csv | 52K rows | ✓ Completed — 99.89% recall
Full two-stage pipeline (CSV) | 50K rows | ✓ Completed — 93.6% precision
PCAP pipeline evaluation | 128 labeled flows | ✓ Completed — see Results
XAI (LIME + SHAP) explanations | 5 anomaly flows | ✓ Completed
Live switch / physical hardware | n/a | Planned

07 Experimental Results

Comprehensive evaluation conducted on CIC-IDS-2017 dataset — both CSV (bulk inference) and real labeled PCAP flows. All tests run on ada.ce.pdn.ac.lk server.

7.1   Data Sources

📊 CSV Test (Bulk Inference)

  • Source: CIC-IDS-2017 TEST_Traffic.csv
  • Size: 50,000 flows (39,754 Normal / 10,246 Attack)
  • Preprocessing: Raw CSV — direct column mapping
  • Timing measures: Pure ML inference (no PCAP parsing)
  • Use case: Validates model accuracy and throughput at scale

📡 PCAP Test (Real Network Traffic)

  • Source: CIC-IDS-2017 Friday-labeled-small PCAP
  • Size: 128 labeled flow directories (BENIGN + DDoS)
  • Preprocessing: Raw .pcap → dpkt feature extraction
  • Timing measures: Includes PCAP parsing overhead
  • Use case: Realistic end-to-end latency for live deployment

7.2   Per-Model Results (CSV Test, 50K flows)

Model | Accuracy | Precision | Recall | F1 | FPR | Latency/flow
BCC v2 (raw CSV) | 83.24% | 87.50% | 21.25% | 34.19% | 0.78% | 0.05 µs
BCC v2 (Sandaru's test data) | 98.65% | 96.31% | 99.89% | 98.07% | 2.00% | 0.05 µs
DDL-40 (standalone) | 84.81% | 69.94% | 45.37% | 55.04% | 5.03% | 133 µs
Isolation Forest (standalone) | 82.06% | 62.59% | 31.00% | 41.46% | 4.77% | 2.83 µs
Full Pipeline (BCC → DDL+IF) | 82.19% | 93.63% | 14.05% | 24.44% | 0.25% | ~8 µs avg

Note: BCC recall is 99.89% on Sandaru's preprocessed data format (the format it was trained on). On raw CIC-IDS-2017 CSV, recall drops to 21.25% due to different feature scaling — the model itself is correct. The full pipeline achieves 93.6% precision with only 0.25% FPR — every DROP is very likely a real attack.

7.3   Confusion Matrices

BCC v2 — On Sandaru's Data ✓

        Predicted
       FORWARD  DROP
Normal   33,679   688
Attack       20 17,967

Only 20 attacks leaked through BCC

Full Pipeline — CSV Test

        Predicted
       FORWARD  DROP
Normal   39,656    98
Attack    8,806  1,440

98 false blocks / 39,754 normal flows = 0.25% FPR
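The headline metrics in the per-model table follow directly from these confusion counts. A quick recomputation, using the counts from the Full Pipeline matrix above:

```python
# Confusion counts from the Full Pipeline CSV test (50K flows)
TN, FP, FN, TP = 39_656, 98, 8_806, 1_440

precision = TP / (TP + FP)                    # -> "93.63% precision"
recall    = TP / (TP + FN)                    # -> "14.05% recall"
fpr       = FP / (FP + TN)                    # -> "0.25% FPR"
accuracy  = (TP + TN) / (TP + TN + FP + FN)   # -> "82.19% accuracy"
```

The same arithmetic reproduces every cell of the Full Pipeline row in §7.2.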

7.4   Inference Timing: CSV vs PCAP

Stage | CSV Mode (µs) | PCAP Mode (µs) | Applies To | Notes
Feature Extraction | ~0 (pre-extracted) | 3,257 µs (3.26 ms) | All flows | PCAP parsing overhead (dpkt)
BCC Inference | 0.05 µs | 122 µs | All flows | Decision Tree predict_proba
DDL Inference | 133 µs | 4,317 µs (4.3 ms) | Flagged only (~5%) | 2-layer ISTA reconstruction
IF Inference | 2.83 µs | 5,616 µs (5.6 ms) | Flagged only (~5%) | Isolation Forest scoring
Total Pipeline | ~8 µs avg | ~3,853 µs avg (3.9 ms) | All flows | PCAP mode dominated by parsing

Key insight: In a real SDN deployment, features would be extracted directly from the OpenFlow PacketIN event (not from a PCAP file) — bringing the feature extraction time much closer to the CSV mode values. The PCAP evaluation shows worst-case latency when reading stored captures.

7.5   XAI Explanation Sample

Flow #11 — True Attack, Correctly DROPped

Two LIME explainers (one for the DDL anomaly scorer, one for the Isolation Forest) independently explain the same flow. The convergence of explanations provides high confidence in the detection.

DDL-LIME (Why DDL flagged it)

syn_flag_count ≤ 0    -0.037  ← No SYN flags
fwd_iat_total > 1.26M  +0.032  ← High idle time
flow_iat_std > 943K   +0.027  ← Erratic timing
ack_flag_count = 1    +0.025  ← Incomplete handshake

IF-LIME (Why IF flagged it)

bwd_pkt_len_mean > 161 +0.031  ← Large Bwd pkts
fwd_iat_total > 1.26M  +0.029  ← Matches DDL
flow_duration > 4.75M  +0.028  ← Very long flow
fwd_pkt_len_std > 23   +0.024  ← Size variance

Cross-validation: DDL and IF both point to fwd_iat_total as the top suspicious feature — this convergence is the rationale for dual-XAI verification. XAI timing: DDL-LIME = 44ms | IF-LIME = 20ms per flow.

Phase 1 — DDL Training on CIC-IDS-2017 Benign Traffic

Parameter | Plan
Training Data | CIC-IDS-2017 "Monday — WorkingHours" (benign-only, ~529K flows)
Validation Split | 80/20 train/validation on benign data
Feature Normalisation | Z-score from training set (stored in model)
Convergence Criterion | Validation reconstruction error plateau (< 0.1% improvement over 20 epochs)
Threshold Tuning | Sweep 90th, 95th, 97th, 99th percentile on validation set

Command: python -m DDLModel.train_ddl --csv data/Monday-WorkingHours.csv --output models/ddl_cic.pkl
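The threshold-tuning row above (a sweep of the 90th to 99th percentiles) can be scripted in a few lines. The validation reconstruction errors below are synthetic stand-ins; in the real sweep they would come from the trained DDL on the held-out benign split.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for validation reconstruction errors on benign flows
val_errors = rng.gamma(shape=2.0, scale=0.2, size=10_000)

# Each candidate percentile fixes the benign false-alarm rate by construction:
# roughly (100 - p)% of benign validation flows exceed the p-th percentile.
for p in (90, 95, 97, 99):
    thr = np.percentile(val_errors, p)
    benign_fpr = (val_errors > thr).mean()
    print(f"p{p}: threshold={thr:.3f}, benign FPR={benign_fpr:.3f}")
```

Picking a higher percentile lowers false alarms at the cost of recall, which is exactly the trade-off the ablation heatmaps in Phase 4 are meant to expose.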

Phase 2 — Attack Detection Evaluation

Test the trained DDL against all CIC-IDS-2017 attack categories:

Attack Category | CIC-IDS-2017 Day | Expected Signature
Brute Force (FTP, SSH) | Tuesday | High Fwd IAT Min, abnormal Init_Win_bytes
DoS / DDoS (Hulk, Slowloris, GoldenEye) | Wednesday | Extreme Bwd Packets/s, Flow Bytes/s spikes
Web Attacks (XSS, SQL Injection) | Thursday AM | Unusual Fwd Packet Length Max, Header anomalies
Infiltration | Thursday PM | Long Flow Duration, irregular IAT patterns
Botnet (ARES) | Friday AM | Short Active Min bursts, C2 beaconing in Flow IAT Max
Port Scan | Friday PM | Very short flows, low Packet Length Variance
DDoS (LOIT) | Friday PM | Extreme volume in Total Length of Fwd Packets

For each attack type, we will compute per-class detection rate and verify that the XAI explanations correctly identify the distinguishing features listed above.

Phase 3 — Quantitative Metrics

Primary evaluation metrics with target values:

Metric | Formula | Target | Rationale
Accuracy | (TP + TN) / N | ≥ 0.95 | Overall correctness
Precision | TP / (TP + FP) | ≥ 0.85 | Minimise false alarms for SOC analysts
Recall | TP / (TP + FN) | ≥ 0.95 | Zero-trust: never miss an attack (critical)
F1-Score | 2 · P · R / (P + R) | ≥ 0.90 | Balance between precision and recall
ROC-AUC | Area under ROC curve | ≥ 0.97 | Threshold-independent discrimination
False Positive Rate | FP / (FP + TN) | ≤ 0.05 | Usability in production SDN

Phase 4 — Ablation Studies & Hyperparameter Tuning

Experiment | Variable | Range
Dictionary Size | n_atoms_l1 / n_atoms_l2 | {32, 64, 128} × {64, 128, 256}
Sparsity Weight (λ) | sparsity_weight | {0.01, 0.05, 0.1, 0.5}
Threshold Percentile | threshold_percentile | {90, 95, 97, 99}
ISTA Iterations | n_iter | {20, 50, 100}
Training Epochs | n_epochs | {50, 100, 200}
DT Pre-filter Impact | With vs. without DT | Binary comparison

Each combination will be evaluated on the Phase 3 metrics. Results will be presented as heatmaps showing the precision-recall trade-off across parameter settings.

Phase 5 — Explainability Quality Assessment

Phase 6 — Latency & Throughput Benchmarks

Measurement | Target | Method
DT pre-check latency | < 1 ms | Average over 10K samples
DDL inference latency | < 50 ms | Per-sample, including both layers + ISTA
SHAP explanation latency | < 500 ms | KernelExplainer on single sample (15 features)
End-to-end pipeline latency | < 600 ms | pcap → feature extraction → DT → DDL + SHAP → policy
SDN buffer hold time | < 1,000 ms | Time from BUFFER to RELEASE/DROP
Throughput | ≥ 100 flows/sec | Batch processing rate on CIC-IDS-2017

Evaluation Timeline

Phase | Task | Target Date | Deliverable
1 | Train DDL on CIC-IDS-2017 benign data | Week 1 | Trained model + convergence curves
2 | Attack detection per category | Week 2 | Per-class detection rates + confusion matrix
3 | Full metrics computation | Week 2 | Accuracy, Precision, Recall, F1, ROC-AUC
4 | Ablation studies | Weeks 3–4 | Hyperparameter sensitivity heatmaps
5 | Explainability assessment | Week 4 | Faithfulness report + rank correlation
6 | Latency benchmarks | Week 5 | Performance report + throughput analysis

08 Results & Analysis

Integration Test Results (27/27 passing)

Test Suite | Tests | Status
DDL Model — training, prediction, save/load | 7 | ✓ Pass
Intermediate Representations | 2 | ✓ Pass
XAI Explainer — DDL-native | 6 | ✓ Pass
SHAP Integration — KernelExplainer | 4 | ✓ Pass
Pipeline Flow — end-to-end with PCAPs | 4 | ✓ Pass
SDN Buffer — add/release/drop | 4 | ✓ Pass

Pipeline Demo — 4 Streams

Metric | Value
True Positives (attacks correctly dropped) | 2
True Negatives (normal correctly forwarded) | 0
False Positives (normal incorrectly dropped) | 2
False Negatives (attacks missed) | 0
Recall | 1.000
F1-Score | 0.667

Note: The DDL model was trained on synthetic data in the demo. FP rate is expected to improve significantly when trained on real CIC-IDS-2017 benign traffic. The zero false-negative rate aligns with the zero-trust "never miss an attack" philosophy.

Sample XAI Output

Decision: Anomaly

The DDL reconstruction error is 22,965,194x the normal threshold.
Primary anomalous features: Active Min, Init_Win_bytes_backward, Total Length of Fwd Packets
Recommendation: DROP stream and alert SOC analyst.

09 Conclusion

This project identifies that Explainable AI is the missing piece needed to make AI-based detection usable in automated Zero-Trust systems. Through iterative experimentation — from pseudo-labeled BCCC Darknet data with anonymous features, through Random Forest + SHAP, to the current Deep Dictionary Learning architecture — we arrived at a design that provides:

🔍 Transparent Detection

Every anomaly comes with a per-feature reconstruction error breakdown and SHAP attributions, making the detection rationale auditable by security teams.

⚡ Real-Time Compatible

The DT pre-filter handles normal traffic instantly; only flagged flows undergo DDL + XAI analysis, keeping the pipeline feasible for high-throughput networks.

🔄 Policy Feedback Loop

SDN buffer decisions feed directly into ZTA policy enforcement — enabling automated block, throttle, or step-up authentication without human intervention.

Publications

📝 Perera, C., Wanasinghe, J., Wijewardhana, S. et al. "Explainable AI-Driven Zero Trust Anomaly Detection for Encrypted Traffic" (2025/26). In preparation.