Final Year Project · University of Peradeniya · 2025

Discovering hidden subtypes of Knee Osteoarthritis using AI

We fine-tuned a Vision-Language Model on 8,945 knee images cropped from 4,502 X-rays and found that patients with the same diagnosis can have fundamentally different disease, with different progression rates and different treatment needs.

BiomedCLIP · HDBSCAN Clustering · OAI Dataset · 10-Year Follow-up · Explainable AI
4,502 DICOM X-rays · 8,945 knee images · 0.841 best silhouette · 2× faster progression

01 — Background

Why standard grading isn't enough

Doctors currently grade knee osteoarthritis on a scale of 0 to 4 — the Kellgren-Lawrence (KL) system — based on what bone damage looks like on an X-ray. But this creates a deep problem: two patients with the same grade can feel completely different.


One patient with KL Grade 2 might have severe pain and rapid deterioration. Another with the same grade feels fine and stays stable for years. Current AI models are trained to reproduce this grading, meaning they're stuck in the same blind spot.

⚠ The Discordance Paradox

A patient can have a "Mild" X-ray but severe pain — or a "Severe" X-ray and no pain at all. This isn't noise; it's a signal that OA has multiple biological subtypes that one number cannot capture.

Grade | What it means | X-ray signs
0 | Healthy | None
1 | Doubtful | Possible bone spurs
2 | Mild | Definite bone spurs
3 | Moderate | Joint space narrowing
4 | Severe | Bone-on-bone contact

Our approach doesn't replace this system — it looks deeper inside each grade to find biologically distinct patient subgroups (phenotypes) using a Vision-Language Model trained on medical images.

02 — How It Works

A 7-stage AI pipeline

From raw hospital X-rays to clinically validated patient phenotypes with 10-year longitudinal confirmation.

01 🗂️ DICOM Preprocessing: 4,502 bilateral X-rays downloaded from the OAI. A YOLO detector crops the left and right knees separately.
02 🔬 Zero-Shot Embedding: BiomedCLIP extracts a 512-dimensional vector from each knee. Used as a baseline, these embeddings produced 55 fragmented clusters.
03 🎯 Fine-Tuning: KL-grade regression trains the upper 6 transformer blocks. MAE drops to 0.865 grade units.
04 🔗 Multimodal Fusion: Visual embeddings are fused with pain scores, JSN, osteophytes, age and BMI. UMAP reduces the result to 2D.
05 🧬 HDBSCAN Clustering: Clusters are found independently within each KL grade, so the phenotypes are not just severity differences.
06 👁️ XAI Validation: CLS-token attention maps confirm the model looks at the joint space, not the image borders.
07 📈 Longitudinal Check: OAI follow-ups (V00–V10) confirm the clusters predict real disease progression over 10 years.

Fig. 1 — Full seven-stage architecture · BiomedCLIP fine-tuning · HDBSCAN within-grade clustering
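The fusion step in stage 04 can be illustrated with a small sketch. The page does not say how the visual and clinical features are combined, so the code below assumes plain concatenation after L2-normalizing each embedding and z-scoring each clinical column. The function names and the toy values are hypothetical and are not taken from the project code.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance (population std)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

def fuse(embeddings, clinical):
    """Concatenate each normalized embedding with its standardized clinical row.

    Assumption: concatenation is used here for illustration only; the actual
    fusion operator is not described in the source.
    """
    scaled = zscore_columns(clinical)
    return [l2_normalize(e) + c for e, c in zip(embeddings, scaled)]

# Toy data: 3 knees, 4-D embeddings standing in for the 512-D vectors, and
# clinical columns [pain, jsn, osteophytes, age, bmi] with invented values.
embeddings = [[3.0, 4.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
clinical = [[4.7, 0, 0, 61, 27.5], [0.2, 0, 0, 55, 24.0], [2.1, 1, 2, 68, 31.0]]
fused = fuse(embeddings, clinical)
```

Each fused row would then be passed to UMAP for the 2D reduction. Scaling both parts first keeps large-valued columns such as age from dominating the distance computation.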

03 — What We Found

Results that matter clinically

Fine-Tuning Performance
86.4%
Within-1 Grade Accuracy
The model predicts KL severity within one grade step 86.4% of the time. MAE = 0.865 grade units. Exact match = 31.0% — comparable to inter-rater radiologist agreement.
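The three figures above can be computed from continuous regression outputs roughly as follows. This is a minimal sketch: it assumes predictions are rounded to the nearest grade before the accuracy checks, which the page does not state, and the toy values are invented.

```python
def grading_metrics(y_true, y_pred):
    """Return MAE, within-1 accuracy and exact-match accuracy for KL predictions.

    y_pred holds continuous regression outputs. MAE uses the raw values;
    the two accuracies use the prediction rounded and clipped to 0..4
    (an assumption made for this sketch).
    """
    n = len(y_true)
    mae = sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n
    rounded = [min(4, max(0, round(p))) for p in y_pred]
    within_1 = sum(abs(r - t) <= 1 for t, r in zip(y_true, rounded)) / n
    exact = sum(r == t for t, r in zip(y_true, rounded)) / n
    return {"mae": mae, "within_1": within_1, "exact": exact}

# Toy example with six knees (values invented for illustration).
y_true = [0, 1, 2, 3, 4, 2]
y_pred = [0.4, 1.9, 2.2, 1.2, 3.6, 2.6]
metrics = grading_metrics(y_true, y_pred)
```

On this toy input, five of the six rounded predictions fall within one grade of the label and three match it exactly.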
Clustering Quality — Silhouette Scores
KL 1 | 0.789
KL 2 | 0.783
KL 3 | 0.781
KL 4 | 0.841
All four scores exceed 0.78, well above the 0.5 level usually read as "strong" cluster separation. Scores above 0.3 are considered adequate for biological discovery.
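For readers unfamiliar with the metric, the silhouette score can be sketched in a few lines. This is a plain reference implementation using Euclidean distance, not the project's code. It assumes HDBSCAN noise points have already been removed, since the page does not say how noise was treated when scoring.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance.

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    s = (b - a) / max(a, b). Requires at least two clusters.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # The point's distance to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Two tight, well-separated groups versus a labeling that mixes them.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
```

A labeling that matches the geometry scores close to 1, while one that cuts across the groups scores below 0.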
Key Clinical Findings
  • F.01 Lateral JSN progresses 2× faster than Medial JSN over 10 years (p < 0.0001, Bonferroni-corrected). This is the headline result — same KL grade at baseline, completely different long-term outcome.
  • F.02 Pain-susceptibility phenotype discovered in KL Grade 0: a cluster of 1,219 patients with elevated pain (WOMAC = 4.7) and zero structural damage. These patients are currently invisible to radiographic grading.
  • F.03 Lateral JSN cluster stability = 97.2% and Medial JSN stability = 95.0% over 8 years — confirming these are genuine biological phenotypes, not statistical noise.
  • F.04 50% of Lateral JSN patients progressed ≥1 KL grade by 8 years, vs 25.8% Medial, 22.1% No JSN, 19.6% Pain-Dominant, 16.7% Healthy — enabling early risk stratification.
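The progression shares in F.04 come down to a per-phenotype count. A minimal sketch, assuming each record holds a phenotype label plus the KL grade at baseline and at follow-up (the record layout and the toy rows are hypothetical):

```python
from collections import defaultdict

def progression_share(records, min_increase=1):
    """Fraction of knees per phenotype whose KL grade rose by >= min_increase."""
    totals = defaultdict(int)
    progressed = defaultdict(int)
    for phenotype, kl_baseline, kl_followup in records:
        totals[phenotype] += 1
        if kl_followup - kl_baseline >= min_increase:
            progressed[phenotype] += 1
    return {p: progressed[p] / totals[p] for p in totals}

# Invented toy records: (phenotype, KL at baseline, KL at follow-up).
records = [
    ("Lateral JSN", 2, 3), ("Lateral JSN", 2, 2),
    ("Medial JSN", 2, 2), ("Medial JSN", 1, 2),
    ("Medial JSN", 3, 3), ("Medial JSN", 2, 2),
]
shares = progression_share(records)
ratio = shares["Lateral JSN"] / shares["Medial JSN"]
```

Applied to the reported shares, the same ratio is 50.0 / 25.8 ≈ 1.94, which is the basis for the "2×" headline.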
XAI — Where the model looks
CLS-Token Attention Validation
After fine-tuning, CLS-token attention maps show the model consistently focuses on the joint space line, femoral condyles, and tibial plateau — the exact anatomical regions relevant to JSN and osteophyte grading. Zero-shot BiomedCLIP instead focused on image borders and ruler strips.

KL Grade 0 — CLS attention · Cluster 0 (pain=4.7) vs Cluster 1 (pain=0.2)
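A CLS-token attention map of this kind can be derived from a single attention layer as sketched below. For a ViT-B/16 at 224×224 input there are 14×14 = 196 patch tokens plus one CLS token. Which layer was visualized and how heads were combined is not stated on the page, so averaging over heads is an assumption here, and the function name is hypothetical.

```python
def cls_attention_map(attn, grid=14):
    """Turn one layer's attention into a grid x grid CLS-to-patch heat map.

    attn[h][i][j] is the attention weight from token i to token j in head h,
    with token 0 being CLS and tokens 1..grid*grid the image patches in
    row-major order.
    """
    n_heads = len(attn)
    n_patches = grid * grid
    # Average the CLS row (query = token 0) over heads, skipping the CLS column.
    cls_row = [
        sum(attn[h][0][j] for h in range(n_heads)) / n_heads
        for j in range(1, n_patches + 1)
    ]
    total = sum(cls_row)
    cls_row = [w / total for w in cls_row]  # renormalize over patches only
    return [cls_row[r * grid:(r + 1) * grid] for r in range(grid)]

# Toy example: 2 heads and a 2x2 patch grid (5 tokens). The CLS token attends
# most strongly to token 3, which is the bottom-left patch.
uniform = [0.2] * 5
focused = [0.1, 0.1, 0.1, 0.6, 0.1]
attn = [[focused] + [uniform] * 4, [focused] + [uniform] * 4]
heat = cls_attention_map(attn, grid=2)
```

The resulting grid is typically upsampled to the image size and overlaid on the X-ray to check whether the high-weight cells fall on the joint space.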

10-Year Longitudinal Progression
KL Grade Change from Baseline
The Lateral JSN phenotype accumulates a mean ΔKL of 0.75 by V120 — nearly double the Medial JSN group (0.41) and nearly triple the Healthy group (0.28).

Fig. — KL grade trajectories across 5 phenotypes · V00 to V120 (10 years)

Five distinct patient subtypes

These are the clinically meaningful groups our framework discovered — each requiring a different treatment strategy.

Lateral JSN
KL Grades 1–4 · n = 671
Dominant narrowing of the lateral compartment. Fastest-progressing phenotype — 50% advance ≥1 KL grade within 8 years. Likely benefits from lateral unloading bracing.
jsn_lat ≈ 1.0 · 50.0% ≥1 grade/8yr
Medial JSN
KL Grades 1–4 · n = 3,082
Dominant narrowing of the medial compartment. Most common phenotype, with a moderate progression rate. At KL 4, it shows slightly higher pain than the Lateral JSN group (2.07 vs 1.69).
jsn_med ≈ 1.0 · 25.8% ≥1 grade/8yr
No JSN
KL Grades 1–2 · n = 1,780
Osteophyte-dominant involvement with minimal joint space narrowing. Structural change is present, but without a compartment-specific damage pattern.
jsn ≈ 0 · 22.1% ≥1 grade/8yr
Pain-Dominant
KL Grade 0 · n = 1,219
High pain (WOMAC = 4.7) with zero structural damage on X-ray. Classic discordance paradox. Represents neurobiological pain susceptibility — completely invisible to standard KL grading.
pain = 4.7 · jsn = 0
Healthy
KL Grade 0 · n = 2,193
No structural damage and low pain (WOMAC = 0.2). Slowest progression. Serves as the true baseline comparator across all longitudinal analyses.
pain = 0.2 · 16.7% ≥1 grade/8yr
🔬
Clinical Implication
Instead of treating all KL 2 patients identically, physicians can now stratify by compartment phenotype and prescribe targeted interventions — valgus/varus bracing, anti-inflammatories, or pain management protocols.

05 — Technical Details

Model architecture & training

1

BiomedCLIP as the backbone

Pre-trained on 15 million PubMed figure-caption pairs using InfoNCE contrastive loss. Its ViT vision encoder maps images to 512-dimensional vectors in a shared image-text embedding space.

microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
2

Selective layer freezing

Transformer blocks 0–5, patch embed, positional embed, and CLS token are frozen (152.9M params). Only blocks 6–11 and the regression head are trained (42.9M params, 21.9% of model).

Prevents catastrophic forgetting
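The freezing rule can be expressed as a predicate on parameter names. The sketch below assumes timm-style names for the vision encoder (`patch_embed`, `pos_embed`, `cls_token`, `blocks.N`) and a hypothetical `head.` prefix for the regression head. The project's actual names may differ.

```python
import re

# Assumed timm-style parameter names; adjust to the real model's naming.
FROZEN_PREFIXES = ("patch_embed.", "pos_embed", "cls_token")
FIRST_TRAINABLE_BLOCK = 6  # blocks 0-5 frozen, blocks 6-11 trained

def is_trainable(param_name):
    """Decide whether a vision-encoder parameter should receive gradients."""
    if param_name.startswith(FROZEN_PREFIXES):
        return False
    match = re.match(r"blocks\.(\d+)\.", param_name)
    if match:
        return int(match.group(1)) >= FIRST_TRAINABLE_BLOCK
    # The regression head is trained; everything else stays frozen.
    return param_name.startswith("head.")

# With a PyTorch model, the rule would be applied as:
#   for name, param in model.named_parameters():
#       param.requires_grad = is_trainable(name)
examples = {
    name: is_trainable(name)
    for name in ["pos_embed", "blocks.5.attn.qkv.weight",
                 "blocks.6.attn.qkv.weight", "head.weight"]
}
```

Keeping the rule in one function makes it easy to count trainable parameters afterwards and compare the total against the reported 42.9M.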
3

Regression, not classification

MSE loss with sigmoid activation scaled to [0,4]. Regression imposes ordinal structure — KL grades are treated as continuous, not discrete classes. This avoids hard decision boundaries that reproduce existing bias.

AdamW · lr=5e-4 · cosine annealing
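The scaled-sigmoid output and the MSE term can be written out as follows. This sketch covers only the MSE part described above. The configuration table also lists a ranking term, which is not specified on the page and is left out here. Function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_grade(logit, max_grade=4.0):
    """Squash an unbounded head output into the KL range [0, max_grade]."""
    return max_grade * sigmoid(logit)

def mse_loss(logits, targets):
    """Mean squared error between scaled-sigmoid predictions and KL grades."""
    preds = [predict_grade(z) for z in logits]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)

# Toy batch: a logit of 0 maps to the middle of the scale (grade 2.0).
logits = [-2.0, 0.0, 2.0]
targets = [0, 2, 4]
loss = mse_loss(logits, targets)
```

Because the loss grows with the squared grade distance, predicting 3 for a true grade of 0 costs more than predicting 1, which is the ordinal behavior a plain classification loss would not provide.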
4

Within-grade HDBSCAN clustering

Clustering independently inside each KL grade ensures discovered clusters represent phenotypic variation — not severity differences that are already known. Noise points are rejected rather than forced into clusters.

min_cluster_size tuned per grade
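The within-grade scheme amounts to partitioning the data by KL grade and running the clusterer once per partition. The sketch below takes the clusterer as a function argument so that it runs without extra dependencies. The stand-in `toy_cluster_fn` is not HDBSCAN and exists only to make the example executable.

```python
from collections import defaultdict

def cluster_within_grades(points, grades, cluster_fn):
    """Run cluster_fn separately on the points of each KL grade.

    Returns one (grade, label) pair per input point. A label of -1 is kept
    as "noise", matching the HDBSCAN convention.
    """
    by_grade = defaultdict(list)
    for idx, grade in enumerate(grades):
        by_grade[grade].append(idx)
    result = [None] * len(points)
    for grade, indices in by_grade.items():
        labels = cluster_fn([points[i] for i in indices])
        for i, label in zip(indices, labels):
            result[i] = (grade, label)
    return result

def toy_cluster_fn(subset):
    """Stand-in clusterer (not HDBSCAN): splits on the sign of the first coordinate."""
    return [0 if p[0] < 0 else 1 for p in subset]

# Invented 2-D points, as if taken from the UMAP projection.
points = [(-1.0, 0.2), (1.5, 0.1), (-0.7, 0.9), (2.0, 0.3)]
grades = [2, 2, 3, 3]
assignments = cluster_within_grades(points, grades, toy_cluster_fn)
```

With scikit-learn 1.3 or later, `cluster_fn` could wrap `sklearn.cluster.HDBSCAN(min_cluster_size=k).fit_predict`, with `k` chosen per grade. The values used in the project are not given on the page. Cluster labels are only meaningful together with their grade, since label 0 in KL 2 and label 0 in KL 3 are unrelated groups.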
Model Configuration
Base model | BiomedCLIP ViT-B/16
Total parameters | 195.9M
Trainable params | 42.9M (21.9%)
Embedding dimension | 512-D
Training loss | MSE + Ranking
Best epoch | 21 / 30
Val MAE | 0.865 KL units
Val RMSE | 1.055
Within-1 accuracy | 86.4%
Dataset | OAI V00 (8,945 knees)
UMAP n_neighbors | 15
UMAP min_dist | 0.1
Clustering algorithm | HDBSCAN (per grade)
XAI method | CLS-token attention

06 — Research Team

The people behind this work

I.A.U. Siriwardane · E/20/378 · Computer Engineering · University of Peradeniya
K.G.H. Nirmani · E/20/271 · Computer Engineering · University of Peradeniya
N.R.P. Gunathilake · E/20/122 · Computer Engineering · University of Peradeniya
Ms. Yasodha Vimukthi · Dept. of Computer Engineering · Faculty of Engineering
Dr. Damayanthi Herath · Dept. of Computer Engineering · Faculty of Engineering
Mr. A.M. Mohamed Rikas · Dept. of Physiotherapy · Faculty of Allied Health Sciences