Automated Medical Image Annotation for Dataset Building
Table of Contents
Abstract
Medical image annotation for dataset building underpins many clinical applications such as diagnosis and treatment planning. Automated annotation offers an efficient alternative to manual annotation when building datasets. In this work, we focus on automated, user-interactive oral image annotation that produces annotations with the assistance of user prompts such as text, points, and bounding boxes. Meta AI's Segment Anything Model (SAM), a vision foundation model trained on the largest segmentation dataset to date for interactive, promptable segmentation, shows impressive zero-shot performance and has raised the potential of foundation models for medical image segmentation. However, SAM performs poorly on images that differ from its training distribution or that present challenging conditions such as irregular regions and boundaries, and its text-to-mask capability remains exploratory.
In this work, we present a comprehensive study of automating oral image annotation and related work using foundation models such as SAM, DINO, Grounding DINO, and Grounded-SAM, addressing the above limitations. Finally, we discuss the remaining research gaps in automating medical image annotation and propose a methodology to address them.
Related Work
Object Detection Foundation Model
Grounding DINO
- State-of-the-art zero-shot, open-set object detection model.
- Supports paired (image, text) input.
- Trained on natural images.
Object Segmentation Foundation Models
1) SAM
- A promptable segmentation foundation model.
- Supports point, box, and text prompts.
- Trained on natural images.
- Leverages zero-shot generalization to unseen image distributions and tasks.
2) MedSAM
- A SAM-based model fine-tuned for medical images.
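A common MedSAM-style fine-tuning setup derives a bounding-box prompt from the ground-truth mask, optionally loosened to mimic an imprecise user box. A minimal sketch of that prompt construction (the function name and `jitter` parameter are illustrative, not from the original work):

```python
import numpy as np

def mask_to_box_prompt(mask, jitter=0):
    """Derive a box prompt [x_min, y_min, x_max, y_max] from a binary mask.

    `jitter` (in pixels) loosens the box to mimic imprecise user input,
    a common augmentation when fine-tuning box-prompted models like MedSAM.
    """
    ys, xs = np.where(mask > 0)
    if len(xs) == 0:
        raise ValueError("mask is empty; no box prompt can be derived")
    h, w = mask.shape
    x_min = max(0, xs.min() - jitter)
    y_min = max(0, ys.min() - jitter)
    x_max = min(w - 1, xs.max() + jitter)
    y_max = min(h - 1, ys.max() + jitter)
    return np.array([x_min, y_min, x_max, y_max])

# Example: a small foreground blob inside a 10x10 mask
m = np.zeros((10, 10), dtype=np.uint8)
m[4:6, 3:6] = 1
print(mask_to_box_prompt(m))            # -> [3 4 5 5]
print(mask_to_box_prompt(m, jitter=2))  # -> [1 2 7 7]
```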
Models used for Detection and Segmentation
1) Grounded-SAM
- Combines Grounding DINO and SAM : detect and segment anything with text.
- Integrates object detection and segmentation for open-vocabulary tasks for natural images.
2) TongueSAM
- Combines an object-detection-based prompt generator with SAM for tongue segmentation.
- Trained only for a single specific task (tongue segmentation).
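The detect-then-segment composition used by Grounded-SAM can be sketched as a two-stage pipeline. The stub functions below stand in for Grounding DINO and SAM (real usage would load the actual checkpoints); only the composition pattern is the point:

```python
import numpy as np

def detect_boxes(image, text_prompt):
    """Hypothetical stand-in for Grounding DINO: text -> scored boxes.

    A real call would run open-set detection conditioned on `text_prompt`;
    here we return fixed (box, score) pairs for illustration.
    """
    return [((2, 2, 6, 6), 0.9), ((0, 0, 3, 3), 0.2)]

def segment_from_box(image, box):
    """Hypothetical stand-in for SAM: box prompt -> binary mask."""
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    x0, y0, x1, y1 = box
    mask[y0:y1, x0:x1] = 1
    return mask

def grounded_segment(image, text_prompt, score_thresh=0.3):
    """Grounded-SAM-style composition: detect with text, segment each kept box."""
    boxes = [b for b, s in detect_boxes(image, text_prompt) if s >= score_thresh]
    return [segment_from_box(image, b) for b in boxes]

img = np.zeros((8, 8, 3), dtype=np.uint8)
masks = grounded_segment(img, "tooth")
print(len(masks), masks[0].sum())  # -> 1 16 (one box passes the threshold)
```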
Few-shot paradigm
- Aims to learn from limited labeled data.
- Leverages existing knowledge to learn new tasks efficiently.
Few-shot keypoint detection
- Predicts keypoints, with uncertainty estimates, in a query image given the support keypoints.
- N-way-K-shot detection (N: support keypoint types, K: support images per type).
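The N-way-K-shot setup above amounts to repeatedly sampling episodes from a labeled pool. A minimal episode-construction sketch (the `sample_episode` helper and its argument names are illustrative):

```python
import random

def sample_episode(annotations, n_way, k_shot, n_query=1, seed=0):
    """Build one N-way-K-shot episode.

    `annotations` maps keypoint type -> list of (image_id, (x, y)) labels.
    N keypoint types are sampled; each gets K support labels, and held-out
    labels form the query set.
    """
    rng = random.Random(seed)
    kinds = rng.sample(sorted(annotations), n_way)
    support, query = {}, {}
    for kind in kinds:
        labels = annotations[kind][:]
        rng.shuffle(labels)
        support[kind] = labels[:k_shot]
        query[kind] = labels[k_shot:k_shot + n_query]
    return support, query

# Toy annotations: 3 keypoint types, 5 labeled images each
ann = {f"kp{i}": [(f"img{j}", (j, j)) for j in range(5)] for i in range(3)}
sup, qry = sample_episode(ann, n_way=2, k_shot=3)
print(len(sup), [len(v) for v in sup.values()])  # -> 2 [3, 3]
```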
Few-shot Segmentation
1) UniverSeg : Universal Medical Image Segmentation
- Solves new segmentation tasks without retraining.
- A novel, flexible CrossBlock mechanism transfers information from the support (example) set to the new image.
- Tasks are specified dynamically at inference time.
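The CrossBlock idea can be illustrated with a stripped-down NumPy sketch: each support feature map is paired with the query, passed through a shared map (a stand-in for the block's convolutions), and averaged over the support set, making the block invariant to support ordering and size. This is a simplified illustration, not the actual UniverSeg implementation:

```python
import numpy as np

def cross_block(query_feat, support_feats, w_pair, w_query):
    """Simplified CrossBlock sketch. Shapes: query (C, H, W), supports (C, H, W)."""
    interactions = []
    for s in support_feats:
        pair = np.concatenate([query_feat, s], axis=0)   # (2C, H, W) pairing
        z = np.einsum("oc,chw->ohw", w_pair, pair)       # shared 1x1 "conv"
        interactions.append(np.maximum(z, 0))            # ReLU
    pooled = np.mean(interactions, axis=0)               # mean over support set
    return np.maximum(np.einsum("oc,chw->ohw", w_query, pooled), 0)

C, H, W = 4, 8, 8
rng = np.random.default_rng(0)
q = rng.standard_normal((C, H, W))
supports = [rng.standard_normal((C, H, W)) for _ in range(3)]
wp = rng.standard_normal((C, 2 * C))
wq = rng.standard_normal((C, C))
out = cross_block(q, supports, wp, wq)
print(out.shape)  # -> (4, 8, 8); output is unchanged if supports are reordered
```

Averaging (rather than concatenating) over the support axis is what lets the same network handle any support set size at inference time.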
Pipeline
Results and Analysis
MedSAM Results
MedSAM Fine-tuning (FLARE22 CT dataset)
MedSAM Fine-tuning (Tufts Teeth dataset, training set)
MedSAM Fine-tuning (Tufts Teeth dataset, validation set)
Few-shot keypoint detection Results
(Figure: episodic attention maps — support images, query, and query prediction.)
UniverSeg Results
Increasing the support set size improves the Dice score of the predictions.
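The Dice score used in this evaluation measures overlap between predicted and ground-truth masks. A minimal implementation for binary masks:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|).

    `eps` guards against division by zero when both masks are empty.
    """
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

a = np.zeros((4, 4), dtype=np.uint8); a[:2, :2] = 1  # 4 foreground pixels
b = np.zeros((4, 4), dtype=np.uint8); b[:2, :] = 1   # 8 pixels, 4 overlapping
print(round(dice_score(a, b), 4))  # -> 0.6667 (= 2*4 / (4 + 8))
```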
Visualizations
1) WBC dataset Support set samples
Test Predictions for varying Support Set Size N
2) OASIS dataset Support set samples
3) Tufts dataset Support set samples
Test Predictions for varying Support Set Size N
4) ISIC2018 dataset Support set samples
Test Predictions for varying Support Set Size N