Diffusion-Based Counterfactual ECG Generation for Atrial Fibrillation Data Augmentation

Team

E/20/069, Tharaka Dilshan, e20069@eng.pdn.ac.lk
E/20/189, Nethmini Karunarathne, e20189@eng.pdn.ac.lk

Supervisors

Dr. Vajira Thambawita, SimulaMet, Oslo, Norway
Prof. Mary M. Maleckar, Tulane University / Simula Research Laboratory
Dr. Roshan Ragel, University of Peradeniya
Dr. Isuru Nawinne, University of Peradeniya

Collaborators

Isuri Devindi, University of Maryland, College Park, USA
Prof. Jørgen K. Kanters, University of Copenhagen, Denmark

Table of Contents

Introduction
Related Works
Methodology
Results
Conclusion
Publications
Links

Introduction

Automated detection of atrial fibrillation (AFib) using deep learning requires large, balanced ECG datasets, yet clinical data remains scarce, imbalanced, and constrained by privacy regulations. We present a diffusion-based data augmentation pipeline that generates synthetic ECG segments by transforming existing recordings into counterfactual waveforms of the opposing class. The pipeline uses a partial-noise conditional denoising process with classifier-free guidance, operating on single-lead ECG signals. A content-style disentangled UNet architecture separates class-invariant morphology from class-discriminative rhythm features. A multi-stage plausibility post-validator enforces morphological and physiological constraints, retaining only waveforms that satisfy quality thresholds.

The augmented mixture achieves 95.05% accuracy and 98.60% AUROC, statistically equivalent to original-only training (95.63% accuracy, TOST p = 0.007, Δ = ±2%). Furthermore, none of the accepted counterfactuals are near-copies of training data (maximum correlation: 0.30), indicating the generated signals are novel and privacy-preserving.
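As an illustration of the near-copy check, the sketch below computes, for each generated segment, its highest absolute Pearson correlation against any training segment. It assumes zero-lag correlation on equal-length segments; the source does not say whether lag alignment or another similarity measure was used, and the function name `max_train_correlation` is hypothetical.

```python
import numpy as np

def max_train_correlation(generated, training):
    """Highest |Pearson r| of each generated segment against any training segment.

    generated: array of shape (n_gen, n_samples)
    training:  array of shape (n_train, n_samples)
    """
    def standardize(x):
        x = np.asarray(x, dtype=float)
        x = x - x.mean(axis=1, keepdims=True)            # zero mean per segment
        norm = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.where(norm == 0, 1.0, norm)         # unit length, guard flat lines

    g = standardize(generated)
    t = standardize(training)
    # For zero-mean, unit-length rows the dot product equals the Pearson correlation.
    corr = g @ t.T
    return np.abs(corr).max(axis=1)
```

A generated segment that is an exact copy of a training segment scores 1.0 under this measure, so a low maximum across all accepted counterfactuals suggests none of them reproduces a training recording.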

This research is conducted as part of the EU-funded SEARCH Initiative through an international collaboration between the University of Peradeniya (Sri Lanka), SimulaMet (Norway), Tulane University (USA), and the University of Copenhagen (Denmark).

Related Works

Generative models for ECG synthesis have been explored using GANs, VAEs, and, more recently, diffusion models. The table below compares representative methods with ours.

Method	Model	Counterfactual	Key Limitations
PGAN-ECG (Golany et al.)	GAN	No	Limited rhythm diversity
WaveGAN-ECG (Donahue et al.)	GAN	No	Mode collapse on long signals
ECG-VAE (Biswal et al.)	VAE	No	Blurred morphology
CoFE-GAN (Jang et al., 2025)	GAN	Yes	Requires iterative latent inversion
GCX-ECG (Alcaraz-Segura et al.)	GAN	Yes	Weak morphology preservation
Ours	Diffusion	Yes	Single-lead; automated validation only

Methodology

Pipeline Overview

Our pipeline consists of five stages: data preparation, classifier training, two-stage diffusion model training, counterfactual generation with filtering, and three-regime augmentation evaluation.

[Figure: Pipeline overview]

Two-Stage Training

Stage 1 (Epochs 1-50): Reconstruction - The content encoder extracts class-invariant morphological features (256-d VAE latent), and the style encoder captures class-discriminative rhythm features (128-d embedding). The conditional 1D UNet learns to denoise ECG signals with classifier-free guidance (10% label dropout).

Stage 2 (Epochs 51-100): Counterfactual Fine-tuning - The UNet is conditioned on the target class label (y’ = 1 - y), while a frozen AFib classifier provides supervision through a flip loss, and a morphology preservation loss constrains deviation from the source signal.
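The label handling in the two stages can be sketched as follows. This is a minimal illustration, not the authors' training code: the null-token index `NULL_LABEL` and the function names are assumptions, and the UNet, encoders, and loss terms are omitted.

```python
import numpy as np

NULL_LABEL = 2  # hypothetical index for the "unconditional" token (0 = Normal, 1 = AFib)

def stage1_labels(labels, drop_prob=0.1, rng=None):
    """Stage 1: replace each class label with the null token with probability drop_prob.

    Training on a mix of labelled and unlabelled inputs is what later allows
    classifier-free guidance at inference time.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    drop = rng.random(labels.shape) < drop_prob
    return np.where(drop, NULL_LABEL, labels)

def stage2_targets(labels):
    """Stage 2: condition on the opposing class, y' = 1 - y."""
    return 1 - np.asarray(labels)
```

For example, `stage2_targets([0, 1, 1])` yields the flipped labels `[1, 0, 0]`, which the frozen classifier is then expected to predict for the generated signal.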

Inference Pipeline

Partial-noise initialization - Corrupt source ECG to 60% noise level
DDIM denoising - 50 steps with classifier-free guidance (scale w = 3)
Post-processing - Savitzky-Golay smoothing
Gate 1: Flip verification - Frozen classifier must predict target class
Gate 2: Plausibility check - Score P = 0.3·M + 0.3·Φ + 0.4·C ≥ 0.7
Accept - Output filtered counterfactual with verified label
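The steps above can be sketched in a few small functions. Several details are assumptions rather than facts from the source: the "60% noise level" is read as starting the reverse process at timestep 0.6·T, the guidance rule is the common interpolation form, the linear beta schedule is hypothetical, and the three plausibility sub-scores are taken to lie in [0, 1]. The denoising network, the DDIM loop, and the smoothing step are not shown.

```python
import numpy as np

def partial_noise(x0, alpha_bar, noise_frac=0.6, rng=None):
    """Forward-diffuse x0 to timestep t = noise_frac * T (assumed meaning of '60% noise level')."""
    rng = np.random.default_rng() if rng is None else rng
    t = int(round(noise_frac * (len(alpha_bar) - 1)))
    eps = rng.standard_normal(np.shape(x0))
    x_t = np.sqrt(alpha_bar[t]) * np.asarray(x0) + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, t

def guided_eps(eps_cond, eps_uncond, w=3.0):
    """Classifier-free guidance: push the noise estimate towards the conditional one."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def plausibility(m, phi, c):
    """Weighted plausibility score with the weights given in the text (0.3, 0.3, 0.4)."""
    return 0.3 * m + 0.3 * phi + 0.4 * c

def accept(pred_label, target_label, m, phi, c, threshold=0.7):
    """Gate 1 (label flip verified) and Gate 2 (plausibility score at or above threshold)."""
    return bool(pred_label == target_label and plausibility(m, phi, c) >= threshold)

# Hypothetical linear beta schedule with T = 1000 steps, for illustration only.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)
source = np.zeros(2500)  # stand-in for a 10-second ECG segment
x_t, t_start = partial_noise(source, alpha_bar, rng=np.random.default_rng(0))
```

Starting from a partially noised source rather than pure noise keeps part of the original waveform in the signal, which is the usual motivation for this kind of initialization in counterfactual editing.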

Dataset

We use the MIMIC-IV-ECG: Diagnostic Electrocardiogram Matched Subset (PhysioNet). Lead II recordings are processed at 250 Hz as 10-second segments (2,500 samples each) and bandpass filtered at 0.5-40 Hz.
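A minimal preprocessing sketch under stated assumptions: the source gives only the passband, sampling rate, and segment length, so the Butterworth design, the filter order, the zero-phase filtering, and the non-overlapping windowing below are illustrative choices, and the helper names are hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 250              # sampling rate in Hz (from the text)
SEGMENT_SECONDS = 10  # segment length in seconds (from the text)

def bandpass(signal, fs=FS, low=0.5, high=40.0, order=4):
    """Zero-phase Butterworth bandpass; order and filter family are assumptions."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

def segment(signal, fs=FS, seconds=SEGMENT_SECONDS):
    """Cut a 1-D signal into non-overlapping windows of fs * seconds samples.

    Trailing samples that do not fill a whole window are discarded.
    """
    n = fs * seconds
    n_segments = len(signal) // n
    return np.asarray(signal[: n_segments * n]).reshape(n_segments, n)
```

With these settings each window holds 250 × 10 = 2,500 samples, matching the segment size stated above.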

Partition	Segments	Normal / AFib
Training	104,855	52,447 / 52,408
Validation	22,469	11,239 / 11,230
Test	22,469	11,239 / 11,230
Total	149,793	74,925 / 74,868

Results

Of 22,469 raw counterfactuals generated, 7,784 passed all quality gates (34.6% acceptance rate).

[Figure: ECG comparison]

Augmentation Evaluation (5-Fold Cross-Validation)

Training Regime	Accuracy	F1 Score	AUROC
A - Original only	95.63 ± 0.33%	95.65 ± 0.35%	98.90 ± 0.16%
B - CF only	85.94 ± 1.32%	86.70 ± 1.24%	93.24 ± 1.47%
C - Augmented (67% + 33%)	95.05 ± 0.50%	95.09 ± 0.46%	98.60 ± 0.17%

[Figure: Performance comparison]

Statistical Equivalence Testing

Test	Result	p-value
McNemar’s test	Δ = 0.54%	< 0.001
TOST equivalence (±2%)	Equivalent	< 0.001
Non-inferiority (2%)	Non-inferior	< 0.001
Dunnett’s (A vs C)	No sig. diff.	0.340
Dunnett’s (A vs B)	Sig. diff.	< 0.001
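To illustrate how a TOST equivalence test works, the sketch below runs two one-sided Welch t-tests on summary statistics. It treats the reported ± values as standard deviations over the five folds, which is an assumption; the source does not state whether the tests were computed on fold-level scores or per-sample predictions, so the p-value from this sketch is not expected to reproduce the reported ones.

```python
import math
from scipy import stats

def tost_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b, margin):
    """TOST p-value for the hypothesis |mean_a - mean_b| < margin (Welch t-tests)."""
    diff = mean_a - mean_b
    var_a, var_b = sd_a**2 / n_a, sd_b**2 / n_b
    se = math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))
    p_lower = stats.t.sf((diff + margin) / se, df)   # H0: diff <= -margin
    p_upper = stats.t.cdf((diff - margin) / se, df)  # H0: diff >= +margin
    # Equivalence is claimed only if both one-sided tests reject.
    return float(max(p_lower, p_upper))

# Regime A vs regime C accuracy, with a margin of 2 percentage points.
p = tost_from_summary(95.63, 0.33, 5, 95.05, 0.50, 5, margin=2.0)
```

Unlike a conventional difference test, a small p-value here supports equivalence: it means the observed difference is reliably inside the ±2% margin.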

Conclusion

We presented a diffusion-based counterfactual ECG generation pipeline that transforms single-lead ECG recordings into counterfactuals of the opposing class. The goal is to establish that synthetic counterfactual data can safely substitute for or supplement real patient recordings without degrading diagnostic performance - enabling privacy-preserving data sharing and class balancing for AFib detection.

Publications

Tharaka Dilshan, Nethmini Karunarathne, Isuri Devindi, Mary M. Maleckar, Jørgen K. Kanters, Roshan Ragel, Isuru Nawinne, Vajira Thambawita. “Diffusion-Based Counterfactual ECG Generation for Atrial Fibrillation Data Augmentation” (2025). Under Review.

Links

This work is part of the European project SEARCH, supported by the Innovative Health Initiative Joint Undertaking (IHI JU) under grant agreement No. 101172997.
