Tutorial

Zero-Shot Medical Image Segmentation with MedSAM

Difficulty: Beginner | Time: 15 min read

Introduction

In this cookbook, we deploy MedSAM, a foundation model for medical image segmentation, to perform zero-shot segmentation on radiology images without any fine-tuning. We cover 2D slices first, then show how to adapt the model to 3D volumes.

Architecture Overview


graph LR
    Img(Medical Image) --> Encoder(ViT Image Encoder)
    Prompt(Bounding Box / Points) --> PEncoder(Prompt Encoder)
    Encoder --> Decoder(Mask Decoder)
    PEncoder --> Decoder
    Decoder --> Output(Segmentation Mask)
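
To make the data flow above concrete: when the model is driven through SamPredictor, as in the steps that follow, the image is resized so that its longest side matches the encoder input size, and prompt coordinates are scaled by the same factors before they reach the prompt encoder. The sketch below re-implements that arithmetic in plain Python for illustration only. The function names are hypothetical, and the 1024-pixel target is SAM's default encoder size, assumed here to be unchanged for the MedSAM ViT-B checkpoint.

```python
def preprocess_shape(old_h, old_w, target=1024):
    """Return (height, width) after scaling the longest side to `target`."""
    scale = target / max(old_h, old_w)
    return int(old_h * scale + 0.5), int(old_w * scale + 0.5)


def rescale_box(box, old_h, old_w, target=1024):
    """Map an [x_min, y_min, x_max, y_max] box into the resized image frame."""
    new_h, new_w = preprocess_shape(old_h, old_w, target)
    sx, sy = new_w / old_w, new_h / old_h  # per-axis scale factors
    x0, y0, x1, y1 = box
    return [x0 * sx, y0 * sy, x1 * sx, y1 * sy]


# A slice of height 512 and width 384, with a box prompt in pixel coordinates.
print(preprocess_shape(512, 384))                    # (1024, 768)
print(rescale_box([100, 150, 300, 400], 512, 384))   # [200.0, 300.0, 600.0, 800.0]
```

In practice you pass prompts in original pixel coordinates and let the predictor do this conversion; the sketch only shows why the box and the image stay aligned.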

Prerequisites

Python 3.10+
PyTorch and Segment Anything (SAM) installed
NiBabel (for NIfTI files)
A sample MRI or CT scan (DICOM, PNG, or NIfTI)
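
A quick way to check these prerequisites before starting is sketched below. It uses only the standard library; the helper names are hypothetical, and the module list is an assumption based on the import names used in this tutorial (for example, opencv-python is imported as cv2).

```python
import sys
from importlib.util import find_spec

# Import names (not pip package names) that the tutorial code relies on.
REQUIRED_MODULES = ["torch", "torchvision", "segment_anything",
                    "cv2", "matplotlib", "nibabel"]


def missing_modules(names):
    """Return the names that cannot be found on the current import path."""
    return [name for name in names if find_spec(name) is None]


def python_is_supported(minimum=(3, 10)):
    """True when the running interpreter meets the minimum (major, minor)."""
    return sys.version_info[:2] >= minimum


print("Python version OK:", python_is_supported())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty "Missing modules" list means every import in the later steps should resolve; anything listed can be installed in Step 1.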

Step 1: Install Dependencies

pip install git+https://github.com/facebookresearch/segment-anything.git
pip install torch torchvision opencv-python matplotlib nibabel

Step 2: Load the MedSAM Model

First, download the MedSAM checkpoint (medsam_vit_b.pth) and load it into SAM's ViT-B architecture.

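import torch
from segment_anything import sam_model_registry, SamPredictor

# Path to the downloaded MedSAM checkpoint (ViT-B backbone)
medsam_checkpoint = "medsam_vit_b.pth"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the ViT-B SAM architecture and load the MedSAM weights into it
sam = sam_model_registry["vit_b"](checkpoint=medsam_checkpoint)
sam.to(device=device)

# SamPredictor handles image preprocessing and caches the image embedding
predictor = SamPredictor(sam)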

Step 3: 2D Segmentation with Bounding Boxes and Points

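We can give MedSAM a bounding box prompt, optionally combined with point prompts (label 1 = foreground, label 0 = background), to extract a specific organ or tumor. The box is the main prompt here; points go through the same SAM API, so check on your own data whether they improve the mask.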

import cv2
import matplotlib.pyplot as plt
import numpy as np

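# Load your medical image (cv2.imread returns None if the file cannot be read)
image = cv2.imread('sample_mri.png')
if image is None:
    raise FileNotFoundError("Could not read 'sample_mri.png'")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Compute the image embedding once; it is reused for every prompt on this image
predictor.set_image(image)

# Define a bounding box prompt [x_min, y_min, x_max, y_max] in pixel coordinates
input_box = np.array([100, 150, 300, 400])

# Optional: add a positive point prompt (x, y) inside the target structure
input_point = np.array([[200, 250]])
input_label = np.array([1])  # 1 = foreground, 0 = background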

masks, scores, _ = predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    box=input_box[None, :],
    multimask_output=False,
)

# Visualize the mask
plt.imshow(image)
plt.imshow(masks[0], alpha=0.5, cmap='jet')
plt.title(f"MedSAM Mask (Score: {scores[0]:.3f})")
plt.show()
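
The box in this step is typed in by hand. When a rough annotation of the target already exists (for example, a coarse mask from another tool), the box can be derived from it instead. The helper below is a minimal sketch of that idea using only NumPy; box_from_mask and its padding parameter are hypothetical names, not part of the SAM or MedSAM API.

```python
import numpy as np


def box_from_mask(mask, padding=5):
    """Return [x_min, y_min, x_max, y_max] around the nonzero pixels of a 2D mask.

    The box is grown by `padding` pixels on each side and clipped to the
    image bounds. Returns None when the mask has no foreground.
    """
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    height, width = mask.shape
    x_min = max(int(cols.min()) - padding, 0)
    y_min = max(int(rows.min()) - padding, 0)
    x_max = min(int(cols.max()) + padding, width - 1)
    y_max = min(int(rows.max()) + padding, height - 1)
    return np.array([x_min, y_min, x_max, y_max])


# A rough rectangular annotation covering rows 150..400 and columns 100..300.
rough = np.zeros((512, 512), dtype=np.uint8)
rough[150:401, 100:301] = 1
print(box_from_mask(rough, padding=0))   # [100 150 300 400]
```

The returned array has the same layout as input_box, so it can be passed to predictor.predict in the same way. A few pixels of padding make it less likely that the box clips the structure's boundary.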

Step 4: Adapting MedSAM for 3D NIfTI Volumes

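Because MedSAM is natively a 2D model, segmenting a 3D volume (such as a CT scan) means iterating over its slices. The loop below takes the simplest approach and reuses one bounding box that encloses the organ on every slice. A refinement is to start from the center slice and propagate the box to adjacent slices.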
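
Each slice has to be converted to 8-bit RGB before it reaches the predictor. For CT, a common alternative to per-slice min-max scaling is a fixed intensity window, which keeps the gray levels comparable from slice to slice. The sketch below assumes the volume is in Hounsfield units and uses a center of 40 and width of 400, a widely used soft-tissue setting chosen here as an assumption; pick a window that suits your anatomy. Both function names are hypothetical.

```python
import numpy as np


def window_to_uint8(slice_hu, center=40.0, width=400.0):
    """Clip a CT slice to an intensity window and rescale it to 0-255 uint8."""
    low, high = center - width / 2.0, center + width / 2.0
    clipped = np.clip(slice_hu.astype(np.float32), low, high)
    scaled = (clipped - low) / (high - low) * 255.0
    return np.round(scaled).astype(np.uint8)


def to_rgb(slice_u8):
    """Stack a grayscale slice into the H x W x 3 layout the predictor expects."""
    return np.stack([slice_u8] * 3, axis=-1)


# Air (-1000 HU) and everything below the window map to 0; the top maps to 255.
demo = np.array([[-1000.0, -160.0], [40.0, 240.0]])
windowed = window_to_uint8(demo)
print(windowed)
print(to_rgb(windowed).shape)   # (2, 2, 3)
```

MRI intensities have no fixed physical scale, so a fixed window like this does not transfer directly; percentile-based clipping is one option there.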

import nibabel as nib

# Load a 3D NIfTI volume, keeping the image object so its affine can be reused
nii_img = nib.load('patient_ct.nii.gz')
nii_data = nii_img.get_fdata()
volume_mask = np.zeros(nii_data.shape, dtype=np.uint8)

# Assumes input_box (from Step 3) encloses the organ on every slice
for z in range(nii_data.shape[2]):
    slice_img = np.ascontiguousarray(nii_data[:, :, z])

    # Normalize to 0-255 uint8 for SAM
    slice_img = cv2.normalize(slice_img, None, 0, 255, cv2.NORM_MINMAX, dtype=cv2.CV_8U)
    slice_rgb = cv2.cvtColor(slice_img, cv2.COLOR_GRAY2RGB)

    predictor.set_image(slice_rgb)

    # Predict slice-by-slice
    mask, _, _ = predictor.predict(
        box=input_box[None, :],
        multimask_output=False
    )
    volume_mask[:, :, z] = mask[0]

# Save the predicted 3D mask in the same voxel space as the input volume
nii_out = nib.Nifti1Image(volume_mask, nii_img.affine)
nib.save(nii_out, 'predicted_mask.nii.gz')
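
The loop above reuses one fixed box for every slice. The propagation idea mentioned at the start of this step can be sketched as follows: segment the center slice, derive a padded box from that mask, use it as the prompt for the neighboring slice, and stop in each direction once the mask comes back empty. This is a minimal sketch under stated assumptions, not MedSAM's own 3D procedure. The names propagate, mask_to_box, and segment_fn are hypothetical, and segment_fn stands in for the model call so that the logic can run without the checkpoint.

```python
import numpy as np


def mask_to_box(mask, padding, shape):
    """Padded [x_min, y_min, x_max, y_max] around a mask, or None if it is empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    h, w = shape
    return np.array([
        max(int(cols.min()) - padding, 0),
        max(int(rows.min()) - padding, 0),
        min(int(cols.max()) + padding, w - 1),
        min(int(rows.max()) + padding, h - 1),
    ])


def propagate(volume, center_z, center_box, segment_fn, padding=5):
    """Segment outward from `center_z`, deriving each box from the previous mask.

    `segment_fn(slice_2d, box)` is a placeholder for the model call and must
    return a 2D boolean mask with the same shape as the slice.
    """
    depth = volume.shape[2]
    out = np.zeros(volume.shape, dtype=np.uint8)

    center_mask = segment_fn(volume[:, :, center_z], center_box)
    out[:, :, center_z] = center_mask

    for step in (1, -1):  # walk toward higher z, then toward lower z
        prev_mask = center_mask
        z = center_z + step
        while 0 <= z < depth:
            box = mask_to_box(prev_mask, padding, volume.shape[:2])
            if box is None:  # previous slice had no foreground: stop this direction
                break
            prev_mask = segment_fn(volume[:, :, z], box)
            out[:, :, z] = prev_mask
            z += step
    return out


def threshold_in_box(slice_2d, box):
    """Stand-in for the model: foreground = bright pixels inside the box."""
    x0, y0, x1, y1 = box
    mask = np.zeros(slice_2d.shape, dtype=bool)
    mask[y0:y1 + 1, x0:x1 + 1] = slice_2d[y0:y1 + 1, x0:x1 + 1] > 0.5
    return mask


# Synthetic volume: a bright 10 x 10 block present only in slices 1 to 3.
volume = np.zeros((32, 32, 5))
volume[10:20, 12:22, 1:4] = 1.0
result = propagate(volume, center_z=2, center_box=np.array([10, 8, 24, 22]),
                   segment_fn=threshold_in_box, padding=2)
print(result.sum(axis=(0, 1)))  # foreground pixel count per slice
```

With the real model, segment_fn would wrap predictor.set_image on the preprocessed slice followed by predictor.predict with box=box[None, :], returning the first mask. Errors can accumulate as the box drifts from slice to slice, so spot-checking a few slices far from the center is worthwhile.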

Conclusion

MedSAM offers strong zero-shot segmentation with very little setup. Given a bounding box, and optionally a few points, you can extract structures from 2D slices, or from 3D volumes by running the model slice by slice.