A comparative study benchmarking DDPM, Latent Diffusion Models, and Flow Matching for synthetic cardiac MRI generation, quantifying the trade-offs between image quality, segmentation utility, and patient privacy.
Deep learning in cardiac MRI (CMR) is fundamentally constrained by both data scarcity and privacy regulations. This study systematically benchmarks three generative architectures for synthetic CMR generation: Denoising Diffusion Probabilistic Models (DDPM), Latent Diffusion Models (LDM), and Flow Matching (FM). Using a two-stage pipeline in which anatomical masks condition image synthesis, we evaluate the generated data along three axes: fidelity, utility, and privacy.
Our results show that diffusion-based models, particularly DDPM, provide the most effective balance between downstream segmentation utility, image fidelity, and privacy preservation under limited-data conditions, while FM demonstrates promising privacy characteristics with slightly lower task-level performance. These findings quantify the trade-offs between cross-domain generalization and patient confidentiality, establishing a framework for safe and effective synthetic data augmentation in medical imaging.
A two-stage pipeline: generative models synthesize CMR images from noise, conditioned on anatomical masks, and the outputs are then evaluated for fidelity, utility, and privacy.
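A common way to condition a denoiser on an anatomical mask is to one-hot encode the label map and concatenate it with the noisy image along the channel axis. The sketch below assumes that mechanism and a four-class label map (background plus three cardiac structures). The study's actual conditioning scheme and label set are not specified here, and `denoiser_input` is a hypothetical helper.

```python
import numpy as np

def one_hot(mask, num_classes):
    """Convert an (H, W) integer label map into a (C, H, W) one-hot float array."""
    classes = np.arange(num_classes)[:, None, None]  # shape (C, 1, 1)
    return (classes == mask[None, :, :]).astype(np.float32)

def denoiser_input(x_t, mask, num_classes=4):
    """Stack the noisy image (1, H, W) with the one-hot mask (C, H, W).

    Hypothetical conditioning scheme (assumption): the denoiser would receive a
    (1 + C, H, W) array at every step, so the anatomy is visible throughout sampling.
    """
    return np.concatenate([x_t, one_hot(mask, num_classes)], axis=0)

# Usage: a 2x2 label map with one pixel per class.
mask = np.array([[0, 1], [2, 3]])
x_t = np.random.default_rng(0).standard_normal((1, 2, 2))
print(denoiser_input(x_t, mask).shape)  # (5, 2, 2)
```

Concatenation keeps the conditioning explicit at every denoising step. Other mechanisms (e.g. normalization-layer modulation) would change only this input-construction step.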
Statistical divergence between real and synthetic image distributions at the feature level
Full-reference & no-reference metrics for pixel, structural, and perceptual quality
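For the feature-level divergence, FID fits a Gaussian to the feature embeddings of each image set and reports the Fréchet distance ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). A minimal NumPy sketch of that distance follows. It assumes the (n, d) feature matrices have already been extracted. Standard FID uses Inception-v3 pool features, which this sketch does not compute.

```python
import numpy as np

def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two (n, d) feature matrices.

    Feature extraction (e.g. Inception-v3 for FID) is assumed to happen elsewhere.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals Tr((S cov_g S)^(1/2)) with S = cov_r^(1/2);
    # the latter matrix is symmetric PSD, so its eigenvalues are real and >= 0.
    s = _sqrtm_psd(cov_r)
    cross = np.linalg.eigvalsh(s @ cov_g @ s)
    tr_sqrt = np.sum(np.sqrt(np.clip(cross, 0.0, None)))
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

Identical feature sets give a distance of about 0. Shifting every feature by a constant changes only the mean term.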
DDPM: Iterative denoising from Gaussian noise, guided by learned score functions. Produces anatomically coherent, high-fidelity CMR images with strong segmentation utility under limited-data conditions.
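The forward and reverse steps of a DDPM can be sketched in NumPy as follows. The sketch assumes the linear noise schedule (T = 1000, β from 1e-4 to 0.02) and ε-prediction parameterization of the original DDPM formulation. The study's schedule, network, and mask-conditioning mechanism are not given here, so the noise prediction `eps_pred` is passed in rather than computed.

```python
import numpy as np

# Linear variance schedule as in the original DDPM formulation (assumed here).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """Forward process: draw x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def p_sample(x_t, t, eps_pred, rng):
    """One ancestral reverse step x_t -> x_{t-1}.

    eps_pred stands in for the output of a mask-conditioned noise predictor
    eps_theta(x_t, t, mask) (hypothetical here); the step uses sigma_t^2 = beta_t.
    """
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_pred) / np.sqrt(alphas[t])
    if t == 0:
        return mean  # the final step adds no noise
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
```

Sampling starts from pure Gaussian noise and applies `p_sample` for t = T−1, …, 0. At each step, the network's prediction for the current x_t and mask is used.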
LDM: Runs the diffusion process in a compressed latent space defined by a pre-trained encoder-decoder, enabling efficient synthesis with competitive image quality at reduced resource cost.
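The latent-space idea can be illustrated with toy stand-ins for the encoder and decoder. This is an assumption for illustration only: a real LDM uses a learned autoencoder, whereas `encode`/`decode` below are hypothetical f×f average pooling and pixel repetition. The sketch shows that the forward process is unchanged and only its input is smaller.

```python
import numpy as np

# Toy stand-ins for the pre-trained encoder/decoder (hypothetical): a real LDM uses
# a learned autoencoder, not average pooling and pixel repetition.
def encode(x, f=4):
    """Compress an (H, W) image to (H/f, W/f) by f x f average pooling (H, W divisible by f)."""
    h, w = x.shape
    return x.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def decode(z, f=4):
    """Map a latent back to image resolution by nearest-neighbour upsampling."""
    return np.repeat(np.repeat(z, f, axis=0), f, axis=1)

# Same linear noise schedule as pixel-space DDPM; only the space changes.
alpha_bars = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))

def latent_q_sample(x0, t, rng, f=4):
    """Encode the image, then apply the closed-form forward process to the latent."""
    z0 = encode(x0, f)
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return z0, z_t, eps

rng = np.random.default_rng(0)
image = rng.standard_normal((128, 128))
z0, z_t, eps = latent_q_sample(image, t=10, rng=rng)
print(image.size // z0.size)  # 16: the denoiser sees 16x fewer values per step
```

The compute saving comes from running every denoising step on the smaller latent. The decoder is applied only once, after sampling finishes.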
FM: A continuous normalizing flow trained with a simulation-free objective. Shows strong privacy characteristics and promising generalization, with task utility slightly below DDPM.
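Flow Matching regresses a velocity field onto the known velocity of a chosen probability path between noise and data, so training needs no ODE simulation. The sketch below assumes the common linear path x_t = (1 − t)·x₀ + t·x₁ (noise x₀, image x₁), whose target velocity is x₁ − x₀, together with a fixed-step Euler sampler. The study's actual path, solver, and step count are not given here, and the network v_theta is hypothetical.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Linear probability path (assumed): x_t = (1 - t) x0 + t x1."""
    return (1.0 - t) * x0 + t * x1

def fm_training_pair(x1, rng):
    """Build one simulation-free regression example (x_t, t, target velocity).

    x0 is Gaussian noise and x1 a training image; a hypothetical network
    v_theta(x_t, t, mask) would be fit to u = x1 - x0 with an MSE loss.
    """
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    return interpolate(x0, x1, t), t, x1 - x0

def euler_sample(velocity, x0, n_steps=50):
    """Integrate dx/dt = velocity(x, t) from t = 0 (noise) towards t = 1 (image)."""
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)
    return x

# Usage: with the exact velocity toward a single target, Euler lands on it.
rng = np.random.default_rng(0)
target = np.full((4, 4), 2.0)
out = euler_sample(lambda x, t: (target - x) / (1.0 - t), rng.standard_normal((4, 4)))
print(np.allclose(out, target))  # True
```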
Fidelity Evaluation
| Evaluation Metric | Diffusion-DDPM | Diffusion-LDM | Flow Match |
|---|---|---|---|
| SSIM ↑ | **0.22** | 0.18 | **0.22** |
| MS-SSIM ↑ | 0.36 | 0.33 | **0.40** |
| PSNR (dB) ↑ | 10.67 | 9.95 | **11.44** |
| FID ↓ | **72.52** | 95.17 | 108.32 |
| KID ↓ | **0.04** | 0.08 | 0.098 |
| LPIPS ↓ | 0.49 | 0.51 | **0.48** |
↓ lower is better · ↑ higher is better · **Bold** = best per metric
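KID replaces FID's Gaussian assumption with an unbiased squared maximum mean discrepancy (MMD²) under the cubic polynomial kernel k(a, b) = (aᵀb/d + 1)³. The sketch below computes a single unbiased estimate over whole feature sets. The standard metric averages this estimate over random subsets, and feature extraction is again assumed to be done beforehand.

```python
import numpy as np

def polynomial_kernel(a, b):
    """Cubic kernel k(a, b) = (a.b / d + 1)^3 on rows of (n, d) feature matrices."""
    d = a.shape[1]
    return (a @ b.T / d + 1.0) ** 3

def kid(real_feats, fake_feats):
    """Unbiased MMD^2 estimate between two (n, d) feature sets (single block).

    Standard KID averages this over many random subsets; the whole sets are
    used at once here for brevity.
    """
    m, n = len(real_feats), len(fake_feats)
    k_rr = polynomial_kernel(real_feats, real_feats)
    k_ff = polynomial_kernel(fake_feats, fake_feats)
    k_rf = polynomial_kernel(real_feats, fake_feats)
    # Drop the diagonal terms k(x, x) so the within-set averages are unbiased.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return float(term_rr + term_ff - 2.0 * k_rf.mean())
```

Because the estimator is unbiased, it can be slightly negative when the two sets come from the same distribution. This is one reason KID is often preferred to FID for small sample sizes.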
Segmentation Utility Evaluation
| Training Setup | M&M Dice ↑ | M&M IoU ↑ | M&M HD95 ↓ | M&M ASD ↓ | ACDC Dice ↑ | ACDC IoU ↑ | ACDC HD95 ↓ | ACDC ASD ↓ |
|---|---|---|---|---|---|---|---|---|
| M&M (Real) | 0.90 | 0.84 | 2.99 | 1.04 | 0.91 | 0.83 | 2.89 | 0.94 |
| ACDC (Real) | 0.91 | 0.83 | 3.88 | 1.23 | 0.95 | 0.87 | 2.65 | 0.75 |
| DDPM Full-Syn | 0.87 | 0.80 | 5.77 | 1.79 | 0.87 | 0.80 | 6.28 | 1.78 |
| DDPM ACDC-Syn | 0.86 | 0.79 | 5.98 | 1.88 | 0.88 | 0.81 | 4.72 | 1.45 |
| DDPM M&M-Syn | **0.89** | **0.83** | 4.34 | 1.41 | **0.90** | **0.84** | 4.61 | **1.34** |
| LDM Full-Syn | 0.87 | 0.79 | 5.28 | 1.74 | 0.87 | 0.80 | 4.90 | 1.52 |
| LDM ACDC-Syn | 0.87 | 0.80 | 7.94 | 2.48 | 0.88 | 0.79 | 9.38 | 2.43 |
| LDM M&M-Syn | **0.89** | 0.82 | **4.17** | **1.40** | 0.88 | 0.82 | **2.26** | 1.53 |
| FM Full-Syn | 0.87 | 0.80 | 7.80 | 2.08 | 0.88 | 0.81 | 6.29 | 1.81 |
| FM ACDC-Syn | 0.82 | 0.73 | 8.75 | 2.90 | 0.85 | 0.76 | 7.74 | 2.19 |
| FM M&M-Syn | 0.85 | 0.82 | 5.04 | 1.64 | 0.89 | 0.82 | 5.73 | 1.67 |
Real: trained on real images · Full-Syn: conditioned on synthetic masks only · ACDC-Syn: conditioned on ACDC training masks · M&M-Syn: conditioned on M&M training masks · ↓ lower is better · ↑ higher is better · **Bold** = best among synthetic setups
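The four segmentation metrics can be sketched for a single structure on boolean 2D masks as follows. Conventions vary across toolkits (for example, one-directional vs. symmetric surface distance, or taking the maximum of two directed 95th percentiles for HD95). This sketch assumes the symmetric versions, uses brute-force distances suitable only for small masks, and reports distances in pixels unless a physical `spacing` (assumed isotropic) is supplied.

```python
import numpy as np

# Masks are boolean (H, W) arrays for one structure and are assumed non-empty.

def dice(pred, gt):
    """Dice = 2|P ∩ G| / (|P| + |G|)."""
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum())

def iou(pred, gt):
    """IoU (Jaccard) = |P ∩ G| / |P ∪ G|."""
    return np.logical_and(pred, gt).sum() / np.logical_or(pred, gt).sum()

def boundary(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    p = np.pad(mask, 1, constant_values=False)
    interior = p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior

def surface_distances(pred, gt, spacing=1.0):
    """Nearest boundary-to-boundary distances in both directions (brute force)."""
    a = np.argwhere(boundary(pred)) * spacing
    b = np.argwhere(boundary(gt)) * spacing
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.concatenate([d.min(axis=1), d.min(axis=0)])

def hd95(pred, gt, spacing=1.0):
    """95th percentile of the symmetric surface distances."""
    return float(np.percentile(surface_distances(pred, gt, spacing), 95))

def asd(pred, gt, spacing=1.0):
    """Average symmetric surface distance."""
    return float(surface_distances(pred, gt, spacing).mean())
```

Dice and IoU reward volumetric overlap. HD95 and ASD penalize boundary errors, which helps explain why synthetic setups with similar Dice can differ widely in HD95.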
Privacy Evaluation
| Evaluation Metric | Diffusion-DDPM | Diffusion-LDM | Flow Match |
|---|---|---|---|
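| Nearest Neighbor L2 ↑ | 12.0 | 10.5 | **19.0** |
| LPIPS ↑ | 0.36 | 0.37 | **0.41** |
| NNDR ↑ | 0.83 | 0.85 | **0.87** |
| MIA ROC-AUC ↓ | 0.6029 | **0.580** | 0.6038 |
↑ higher is better for L2, LPIPS, and NNDR (nearest-neighbor distance ratio) · ↓ for the membership inference attack (MIA), a ROC-AUC closer to 0.5 (chance) indicates stronger privacy · **Bold** = best per metric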
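The distance-based privacy metrics compare each synthetic sample with its closest training samples. NNDR divides the distance to the nearest neighbor by the distance to the second-nearest, so values near 1 mean that no single training sample is singled out. The sketch below assumes samples are flattened to vectors, since the feature space used for L2 is not stated. It also takes the attacker's membership scores as given, because the MIA itself is not described here. ROC-AUC is computed through its rank (Mann-Whitney) interpretation.

```python
import numpy as np

def nn_distances(synthetic, train):
    """Sorted L2 distances from each synthetic vector (rows) to every training vector."""
    d = np.linalg.norm(synthetic[:, None, :] - train[None, :, :], axis=-1)
    return np.sort(d, axis=1)

def privacy_scores(synthetic, train):
    """Mean nearest-neighbour L2 distance and mean NNDR (d1 / d2).

    Assumes at least two distinct training vectors so that d2 > 0.
    """
    d = nn_distances(synthetic, train)
    return d[:, 0].mean(), (d[:, 0] / d[:, 1]).mean()

def roc_auc(member_scores, nonmember_scores):
    """P(random member scores higher than random non-member); ties count 1/2.

    The scores would come from a membership inference attack (not specified here).
    """
    m = np.asarray(member_scores)[:, None]
    n = np.asarray(nonmember_scores)[None, :]
    return float((m > n).mean() + 0.5 * (m == n).mean())
```

A synthetic image that copies a training image gives L2 = 0 and NNDR = 0, the worst case on both metrics. An attacker that cannot separate members from non-members gives ROC-AUC = 0.5.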
DDPM achieves the best distribution-level fidelity, with the lowest FID (72.52) and KID (0.04). Flow Matching leads on PSNR (11.44), MS-SSIM (0.40), and LPIPS (0.48) and ties DDPM on SSIM (0.22).
For every model, M&M-conditioned synthesis gives the strongest synthetic results on most metrics. DDPM M&M-Syn achieves the best ACDC Dice (0.90) and the best IoU on both test sets (0.83 and 0.84). LDM M&M-Syn matches its M&M Dice (0.89) and leads on M&M HD95 (4.17).
Flow Matching scores best on all three distance-based privacy metrics, with the highest nearest-neighbor L2 (19.0), LPIPS (0.41), and NNDR (0.87). LDM achieves the lowest MIA ROC-AUC (0.580), closest to the ideal 0.5, while FM's is marginally the highest (0.6038).
All three models yield MIA ROC-AUC scores between 0.58 and 0.60, only modestly above chance (0.5). This suggests limited vulnerability to membership inference and supports the pipeline's use for synthetic data augmentation in medical imaging.
Team
Supervisors