Multimodal Phenotyping in Knee Osteoarthritis Using Vision-Language Models

Discovering biological subtypes through fine-tuned BiomedCLIP and longitudinal data analysis.

Vision-Language Models | BiomedCLIP | Phenotyping

Abstract

Standard knee osteoarthritis (OA) grading often fails to capture the mismatch between structural damage and patient pain. Our research introduces a pipeline that uses BiomedCLIP to extract 512-dimensional embeddings from 4,502 X-rays. By fusing these embeddings with clinical data (including pain, BMI, and age) and clustering the result with HDBSCAN, we identified distinct phenotypes. Our findings reveal that certain phenotypes, such as Lateral JSN (joint space narrowing), progress twice as fast as others over a 10-year period.

Project Walkthrough

Research Methodology

YOLO Preprocessing

Automated cropping of bilateral X-rays into individual knee regions to isolate joint space.
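
The cropping step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the detector has already returned one pixel-coordinate box per knee, and the box format, the crop_knees helper, and the toy image are all hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]            # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]    # (x1, y1, x2, y2); assumed detector output format


def crop_knees(image: Image, boxes: List[Box]) -> List[Image]:
    """Crop one region per detected knee, ordered left to right in the image."""
    height, width = len(image), len(image[0])
    crops = []
    for x1, y1, x2, y2 in sorted(boxes, key=lambda b: b[0]):
        # Clamp the box to the image bounds before slicing.
        x1, x2 = max(0, x1), min(width, x2)
        y1, y2 = max(0, y1), min(height, y2)
        crops.append([row[x1:x2] for row in image[y1:y2]])
    return crops


# Toy 6x10 "radiograph" whose pixel value encodes its column index.
toy_image = [[col for col in range(10)] for _ in range(6)]
# Two made-up boxes, deliberately listed in right-to-left order.
toy_boxes = [(6, 1, 12, 5), (0, 1, 4, 5)]
knee_crops = crop_knees(toy_image, toy_boxes)
```

Sorting by the left edge gives a stable ordering of the two knees, and clamping keeps a box that overshoots the image edge from producing an empty or wrapped slice.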

BiomedCLIP Encoder

Fine-tuned on Kellgren-Lawrence (KL) grade regression so that visual attention focuses on osteophytes and joint space narrowing (JSN).

Late Fusion

Fusing 512-D visual vectors with 5 clinical features for HDBSCAN clustering.
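
A minimal sketch of the fusion step, under stated assumptions: the source does not say how the two modalities are scaled, so the L2 normalization of the embedding and the z-scoring of the clinical columns are illustrative choices. The helper names and the toy values are hypothetical, and the five clinical columns are left unnamed.

```python
import math
from statistics import mean, pstdev
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length (assumed preprocessing, not from the source)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Standardize each clinical column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # a constant column keeps a divisor of 1
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def late_fuse(embeddings: List[List[float]], clinical: List[List[float]]) -> List[List[float]]:
    """Concatenate each normalized embedding with its scaled clinical features."""
    scaled = zscore_columns(clinical)
    return [l2_normalize(e) + c for e, c in zip(embeddings, scaled)]


# Made-up inputs: 3 knees, 512-D embeddings, 5 unnamed clinical features.
toy_embeddings = [[float((i + j) % 7 + 1) for j in range(512)] for i in range(3)]
toy_clinical = [
    [3.0, 27.5, 61.0, 0.0, 1.0],
    [7.0, 31.0, 68.0, 1.0, 1.0],
    [5.0, 24.0, 55.0, 0.0, 1.0],
]
fused = late_fuse(toy_embeddings, toy_clinical)  # each row has 512 + 5 = 517 values
```

The fused rows could then be passed to an HDBSCAN implementation, for example the HDBSCAN class in scikit-learn's cluster module. Because the visual block has far more dimensions than the clinical block, the relative weighting of the two is a design decision that this sketch does not settle.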

Methodology Diagram

Full architecture of the multimodal phenotyping pipeline.

Key Results

86.4%

Model Accuracy

The fine-tuned BiomedCLIP achieved a mean absolute error (MAE) of 0.865 KL grade units, significantly outperforming zero-shot baselines.
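
For reference, mean absolute error on KL grades is the average of |predicted - true| across knees. The snippet below is a small sketch with made-up predictions and grades; it does not reproduce the reported figure.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Average absolute gap between regressed and true KL grades."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up values for illustration only.
toy_predictions = [0.5, 2.0, 3.5, 1.0]   # continuous regression outputs
toy_grades = [0, 2, 3, 2]                # true KL grades (0 to 4)
toy_mae = mean_absolute_error(toy_predictions, toy_grades)
```

Here the absolute errors are 0.5, 0, 0.5 and 1.0, so the toy MAE is 0.5 KL grade units.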

Progression Analysis

  • The Lateral JSN phenotype progresses twice as fast as the Medial phenotype.
  • Unique "Pain-Dominant" clusters were identified within KL Grade 2.
  • Explainable AI (XAI) validation confirmed that the model focuses on the joint space and condyles.
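
One way to compare progression speed between phenotypes is sketched below. The source does not define its progression metric, so this assumes a simple rate: change in KL grade divided by follow-up years, averaged per phenotype. The record layout, the phenotype labels "A" and "B", and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (phenotype label, baseline KL grade, follow-up KL grade, years between visits)
Record = Tuple[str, int, int, float]


def mean_progression_rate(records: List[Record]) -> Dict[str, float]:
    """Mean KL-grade change per year for each phenotype (assumed metric)."""
    rates = defaultdict(list)
    for phenotype, kl_start, kl_end, years in records:
        rates[phenotype].append((kl_end - kl_start) / years)
    return {name: mean(values) for name, values in rates.items()}


# Made-up records for two placeholder phenotypes.
toy_records = [
    ("A", 1, 3, 10.0),
    ("A", 2, 4, 10.0),
    ("B", 1, 2, 10.0),
    ("B", 2, 3, 10.0),
]
toy_rates = mean_progression_rate(toy_records)
toy_ratio = toy_rates["A"] / toy_rates["B"]  # how many times faster A progresses than B
```

A ratio of per-phenotype mean rates is only one possible reading of "twice as fast"; time-to-event analysis would be an alternative when follow-up lengths differ between patients.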

Meet the Researchers

Amanda Siriwardane

E/20/378