Systematic benchmarking of dimensionality reduction techniques and GNN-based clustering specifically optimised for high-dimensional, sparse, compositional metagenomic data.
Bridging the gap in metagenomics analysis through data-driven methods
Metagenomic studies generate high-dimensional, sparse, and compositional datasets that challenge traditional analytical methods. This project systematically benchmarks 8 dimensionality reduction (DR) methods across 3 diverse metagenomic datasets (Human Gut, Ocean, Potato Soil) and validates Graph Neural Network (GNN) architectures for unsupervised microbial community clustering. Our goal is to provide evidence-based recommendations for the metagenomics community and develop a unified preprocessing-to-evaluation pipeline.
A two-phase approach to comparative metagenomics analysis
End-to-end workflow from raw data to actionable insights
nzCLR transformation & Jaccard/Bray-Curtis distance computation to handle compositionality
8 methods: PCA, PCoA, MDS, t-SNE, UMAP, PaCMAP, PHATE, SONG
KNN cosine graph construction & GNN architectures (DMoN, MinCutPool) + K-Means
Trustworthiness, Continuity, NMI, ARI, Silhouette & cross-dataset correlation analysis
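The preprocessing and graph-construction steps above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual implementation. It assumes nzCLR means a centred log-ratio computed over the non-zero entries of each sample, with zeros left at zero, and that the KNN graph is built from cosine similarity on the transformed features and then symmetrised. All function names and the toy count matrix are hypothetical.

```python
import math


def nzclr(sample):
    """Centred log-ratio over non-zero entries only; zeros stay at zero (assumed definition)."""
    logs = [math.log(x) for x in sample if x > 0]
    if not logs:
        return [0.0] * len(sample)
    mean_log = sum(logs) / len(logs)
    return [math.log(x) - mean_log if x > 0 else 0.0 for x in sample]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity on non-negative abundances."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def jaccard(u, v):
    """Jaccard distance on presence/absence of each taxon."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    union = present_u | present_v
    if not union:
        return 0.0
    return 1.0 - len(present_u & present_v) / len(union)


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_cosine_graph(features, k):
    """Undirected edge set linking each sample to its k most cosine-similar samples."""
    edges = set()
    n = len(features)
    for i in range(n):
        sims = [(cosine_similarity(features[i], features[j]), j)
                for j in range(n) if j != i]
        # Highest similarity first; ties broken by sample index.
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy count matrix (hypothetical): rows are samples, columns are taxa.
counts = [
    [10, 0, 5, 1],
    [8, 0, 6, 2],
    [0, 12, 1, 0],
    [0, 9, 2, 0],
]

features = [nzclr(row) for row in counts]
graph = knn_cosine_graph(features, k=1)
```

In this sketch the distances are computed on the raw counts, while the graph is built on the nzCLR features. At realistic dataset sizes, vectorised library routines would be the more practical choice than these pure-Python loops.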
Highlights from Phase 1 DR benchmarking across 3 datasets
Conferences, expositions, and collaborative work
International Conference on Image Processing & Robotics: Research paper presentation on metagenomics clustering using GNN-based methods
Collaborative research initiative integrating biological data science and computational biology methods for comprehensive metagenomics analysis
Peradeniya University International Research Sessions & Exposition: Poster presentation on data-driven metagenomics
Tools and frameworks powering our research
Department of Computer Engineering, University of Peradeniya
Team with Dr. Damayanthi Herath
Academic deliverables and research outputs
Full project presentation covering research motivation, GNN pipeline, evaluation results and conclusions
Download Presentation (PDF)
Poster presented at the International Conference on Image Processing & Robotics
View Poster
International Conference on Image Processing & Robotics: Paper on GNN-based metagenomics clustering