Final Year Project • University of Peradeniya

Quality Over Quantity:
The Impact of Diagnostic Certainty in Deep Learning for ECG Analysis

Across two architectures, we show that high-confidence, cardiologist-validated ECG labels yield deep learning models that are more accurate, better calibrated, and more clinically interpretable than those trained on larger but noisier datasets.


About the Project

Why Label Quality Matters

Most medical AI research focuses on collecting more data. We ask a different question: does the certainty of diagnostic labels matter more than dataset size?

12 ECG Leads • 7,593 Filtered ECGs • 2 Architectures • 3 Dataset Variants

Binary MI Detection

Classifying 12-lead ECGs as Myocardial Infarction (MI) vs. Normal using the PTB-XL dataset with cardiologist-annotated SCP codes.

Ablation Study Design

Same normal cases across all three datasets. Only the MI label certainty varies — isolating the effect of ground truth quality on model performance.

Clinical Interpretability

Explainable AI (Grad-CAM and Integrated Gradients) confirms that models trained on high-certainty labels focus on physiologically correct ECG regions.


Dataset Design

Three Variants, One Question

All training sets share the same 4,451 pure normal ECGs. They differ only in which MI cases are included — enabling a controlled comparison of label certainty.

Dataset     MI Type       MI Cases   Normal Cases   Total Train   Label Certainty
Dataset A   Certain MI    1,194      4,451          5,645         100% confidence, human-validated
Dataset C   All MI        2,387      4,451          6,838         Mixed (certain + uncertain)
Dataset D   Uncertain MI  1,193      4,451          5,644         <100% confidence only
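In PTB-XL, each record's `scp_codes` field maps an SCP diagnostic code to the annotator's likelihood (0–100), so the three variants can be built by filtering on that likelihood. The sketch below assumes the metadata has been loaded into a DataFrame; the MI code list here is an illustrative subset, not the exact set used in the study.

```python
import ast
import pandas as pd

# Illustrative subset of PTB-XL SCP codes for MI; the study may use a larger set.
MI_CODES = {"IMI", "AMI", "ASMI", "ILMI", "ALMI"}

def variant(df: pd.DataFrame, which: str) -> pd.DataFrame:
    """Select MI records by label certainty.

    which: "A" = certain MI only (likelihood == 100),
           "C" = all MI, "D" = uncertain MI only (< 100).
    """
    # scp_codes is stored as a string like "{'IMI': 100.0}"; parse to a dict.
    codes = df["scp_codes"].apply(ast.literal_eval)
    # Highest likelihood among this record's MI codes (NaN if no MI code).
    lik = codes.apply(
        lambda d: max((v for k, v in d.items() if k in MI_CODES),
                      default=float("nan"))
    )
    is_mi = lik.notna()
    if which == "A":
        return df[is_mi & (lik == 100)]
    if which == "D":
        return df[is_mi & (lik < 100)]
    return df[is_mi]
```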

Patient-Wise Splitting

Train, validation, and test sets have zero patient overlap — preventing data leakage and ensuring realistic performance estimates.
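A patient-wise split can be expressed with scikit-learn's `GroupShuffleSplit`, grouping records by patient ID so all of a patient's ECGs land on the same side of the split. This is a minimal sketch of the idea, not the exact split procedure used in the study.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def patient_wise_split(record_ids, patient_ids, test_size=0.15, seed=42):
    """Split record indices so that no patient appears in both subsets."""
    gss = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(gss.split(record_ids, groups=patient_ids))
    return train_idx, test_idx
```

Applying the same procedure twice (train vs. held-out, then held-out into validation vs. test) yields three subsets with zero patient overlap.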

Shared Evaluation Sets

Validation (1,234 records) and test (1,253 records) sets are identical across all variants for fair, controlled comparison.


Model Architectures

Two Architectures, Same Conclusion

We compare a conventional hybrid architecture with a modern state space model to show our findings generalize across model families.

CNN-LSTM Hybrid

Type: Convolutional + Recurrent

Parameters: ~367K trainable

Framework: PyTorch

XAI Method: Grad-CAM

Input: (batch, 1000, 12)
3× Conv1D → BiLSTM → FC → Sigmoid
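The pipeline above can be sketched in PyTorch as follows; channel widths, kernel sizes, and the hidden size are illustrative assumptions, not the exact hyperparameters of the ~367K-parameter model.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Minimal sketch of the 3x Conv1D -> BiLSTM -> FC -> Sigmoid pipeline."""
    def __init__(self, n_leads=12, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, 1)

    def forward(self, x):                  # x: (batch, 1000, 12)
        x = x.transpose(1, 2)              # -> (batch, 12, 1000) for Conv1d
        x = self.conv(x).transpose(1, 2)   # -> (batch, 125, 64)
        _, (h, _) = self.lstm(x)           # final hidden states, both directions
        h = torch.cat([h[-2], h[-1]], dim=1)
        return torch.sigmoid(self.fc(h)).squeeze(-1)  # MI probability per record
```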

Bidirectional Mamba-2

Type: State Space Model (SSM)

Advantage: Linear complexity O(n)

Framework: PyTorch

XAI Method: Integrated Gradients (Captum)

Input: (batch, 1000, 12)
BiMamba-2 blocks → Classifier → Sigmoid
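Mamba-2 blocks are causal, so a bidirectional variant typically runs the sequence through one block forward and another block on the time-reversed sequence, then merges the two outputs. The wrapper below shows that pattern; a plain linear layer stands in as a placeholder block so the sketch runs without the CUDA-only `mamba_ssm` package (with it, one would pass something like `lambda: Mamba2(d_model=...)`).

```python
import torch
import torch.nn as nn

class BiDirectional(nn.Module):
    """Run any causal sequence block in both time directions and sum the outputs.

    make_block: factory returning a module mapping (batch, seq, dim) -> (batch, seq, dim).
    """
    def __init__(self, make_block):
        super().__init__()
        self.fwd = make_block()
        self.bwd = make_block()

    def forward(self, x):                        # x: (batch, seq, dim)
        y_f = self.fwd(x)
        y_b = self.bwd(torch.flip(x, dims=[1]))  # reverse the time axis
        return y_f + torch.flip(y_b, dims=[1])   # re-align and merge
```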

Key Results

Label Quality Wins — Consistently

Across both architectures, Dataset A (highest label certainty) achieves the best discrimination, calibration, and interpretability.

Consistent Ranking Across Both Architectures

Dataset A > Dataset C > Dataset D

AUROC • Accuracy • Calibration (ECE) • Brier Score • Clinical Interpretability

Metric        CNN-LSTM (Dataset A)   Description
ROC-AUC       99.06%                 Near-perfect discrimination
Accuracy      95.87%                 Overall correctness
Recall        93.31%                 MI detection sensitivity
Precision     88.76%                 False alarm rate control
F1-Score      90.98%                 Balanced performance
Specificity   96.60%                 Normal case identification
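Apart from ROC-AUC, every metric in the table derives from the four confusion-matrix counts; a small helper makes the definitions explicit (the counts in the usage note are made-up illustrative numbers, not the study's).

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive the reported metrics from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    recall      = tp / (tp + fn)        # sensitivity: MI cases caught
    precision   = tp / (tp + fp)        # fraction of MI alarms that are real
    specificity = tn / (tn + fp)        # normals correctly cleared
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "specificity": specificity, "f1": f1}
```

For example, `binary_metrics(tp=9, fp=1, tn=89, fn=1)` gives recall 0.9, precision 0.9, and specificity 89/90.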

Calibration Matters

Dataset D showed significantly degraded ECE (Expected Calibration Error) on Mamba-2 — uncertain labels harm not just accuracy but model trustworthiness.
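ECE bins the predicted probabilities and averages the gap between each bin's mean confidence and its observed positive rate, weighted by bin size. A minimal implementation for binary outputs, assuming `probs` is the predicted probability of MI:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: size-weighted mean |mean confidence - observed accuracy| per bin."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi], with the first bin closed at 0.
        mask = (probs > lo) & (probs <= hi) if lo > 0 else (probs >= lo) & (probs <= hi)
        if mask.any():
            conf = probs[mask].mean()    # mean predicted probability in the bin
            acc = labels[mask].mean()    # observed fraction of positives in the bin
            ece += mask.mean() * abs(conf - acc)
    return ece
```

A perfectly calibrated model (e.g. predictions of 0.9 that are correct 90% of the time) scores 0; the more confidence and outcome diverge, the higher the ECE.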

Model Mirrors Human Uncertainty

When cardiologists expressed lower diagnostic confidence, model prediction confidence dropped proportionally — clinically valuable for flagging ambiguous cases.

XAI Validates Clinical Relevance

Dataset A models focus on V1–V4 for anterior MI and II/III/aVF for inferior MI. Dataset D models show diffuse, non-specific activations — losing interpretability.
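The study uses Captum's Integrated Gradients; the from-scratch sketch below shows the underlying computation: average the model's gradients along the straight path from a baseline (an all-zeros ECG here, an assumed choice) to the input, then scale by the input difference.

```python
import torch

def integrated_gradients(model, x, baseline=None, steps=50):
    """Riemann-sum approximation of integrated gradients for a scalar-output model."""
    if baseline is None:
        baseline = torch.zeros_like(x)      # assumed baseline: all-zeros signal
    total = torch.zeros_like(x)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        # Point on the straight path from baseline to input.
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        model(point).sum().backward()
        total += point.grad
    return (x - baseline) * total / steps   # per-sample, per-lead attributions
```

For a linear model the result is exact: with weights (2, 3) and zero baseline, the attribution of input (1, 1) is (2, 3).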


Team

Who We Are

Computer Engineering undergraduates at the University of Peradeniya, Sri Lanka.


Chamath Rupasinghe

E/20/342
CNN-LSTM • Calibration Analysis • XAI

M.L. De Croos Rubin

E/20/054
Bidirectional Mamba-2

S.M.N.N. Padeniya

E/20/276
Evaluation • Dataset Design
Dr. Isuru Nawinne — Supervisor
Dr. Vajira Thambavita — Co-supervisor
Dr. Isuri Devindi — Co-supervisor
Dr. Jørgen Kanters — Clinical Advisor