Getting Started

Concrete recipes for combining models, datasets, and tools into powerful pipelines.

Build the Next Cookbook

We're looking for ML Engineers and Technical Writers to help us expand this directory of open-source medical AI recipes.

View Volunteer Roles →

Cookbook 1: The Clinical Reasoning Agent

Deploy a clinical NLP bot that parses structured FHIR patient records and reasons about possible diagnoses, running entirely locally.

Dataset

Synthea

Synthetic patient populations exported as FHIR bundles in JSON.

Tool

Medplum

An open-source FHIR server to ingest and serve the Synthea data via a standard REST API.

Model

BioMistral

An open-weights LLM adapted to biomedical literature, quantized so it can run locally.

Expected Outcome:

You will have a local React application that queries your Medplum FHIR server, accepts patient symptoms, and passes them to BioMistral running in `llama.cpp` to produce diagnostic differentials.

$ openphr deploy the-clinical-reasoning-agent
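
As a rough illustration of the step between the FHIR server and the model, the sketch below pulls condition names out of a FHIR Bundle and folds them into a prompt. It assumes the bundle has already been fetched as JSON. The helpers `extract_conditions` and `build_prompt`, the sample bundle, and the prompt wording are hypothetical and are not part of OpenPHR, Medplum, or `llama.cpp`.

```python
import json

def extract_conditions(bundle: dict) -> list[str]:
    """Collect human-readable condition names from a FHIR Bundle."""
    names = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Condition":
            continue
        code = resource.get("code", {})
        # Prefer the free-text label, fall back to the first coding's display.
        label = code.get("text")
        if not label and code.get("coding"):
            label = code["coding"][0].get("display")
        if label:
            names.append(label)
    return names

def build_prompt(conditions: list[str], symptoms: str) -> str:
    """Assemble a plain-text prompt (hypothetical wording)."""
    history = "; ".join(conditions) if conditions else "none recorded"
    return (
        f"Known conditions: {history}\n"
        f"Presenting symptoms: {symptoms}\n"
        "List possible differential diagnoses."
    )

# Tiny hand-written bundle standing in for Synthea output.
sample_bundle = json.loads("""
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Condition", "code": {"text": "Hypertension"}}},
    {"resource": {"resourceType": "Condition",
                  "code": {"coding": [{"display": "Type 2 diabetes mellitus"}]}}}
  ]
}
""")

prompt = build_prompt(extract_conditions(sample_bundle), "chest pain, shortness of breath")
print(prompt)
```

The resulting string would then be handed to whichever local inference interface the deployment exposes; that call is left out here because its exact form depends on the setup.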

Cookbook 2: Local Medical Imaging Pipeline

Train a customized Convolutional Neural Network (CNN) to detect anomalies in X-rays or MRI scans natively on your GPU infrastructure.

Dataset

Stanford CheXpert

A large public dataset of chest X-rays with structured anomaly labels.

Tool

MONAI

A PyTorch-based framework optimized specifically for reading DICOM files and deep learning in healthcare imaging.

Expected Outcome:

A deployable Docker container that uses MONAI to ingest raw patient DICOM scans and output a semantic segmentation map highlighting regions of potential pneumonia.

$ openphr deploy local-medical-imaging-pipeline

Cookbook 3: Real-Time Wearable Anomaly Detection

Process live ECG and heart rate data from wearables to detect arrhythmias using lightweight edge AI models.

Dataset

MIT-BIH Arrhythmia Database

Standard test material for evaluating arrhythmia detectors, available via PhysioNet.

Model

TensorFlow Lite Micro

A highly compressed 1D-CNN executed with TensorFlow Lite Micro, a runtime designed for microcontrollers with milliwatt-level power budgets.

Expected Outcome:

A deployable edge model on a Raspberry Pi or smartwatch identifying premature ventricular contractions in real time.

$ openphr deploy real-time-wearable-anomaly-detection
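
Before a 1D-CNN can classify beats, the ECG stream has to be cut into fixed-length inputs. The sketch below shows one plausible way to do that, assuming R-peak positions are already known (MIT-BIH ships beat annotations, and its recordings are sampled at 360 Hz). The function names, the window width, and the synthetic trace are illustrative assumptions, not the cookbook's actual preprocessing.

```python
def beat_windows(signal, r_peaks, half_width):
    """Cut one fixed-length window around each R-peak index.

    Peaks too close to either edge are skipped so every window has
    exactly 2 * half_width samples.
    """
    windows = []
    for peak in r_peaks:
        start, end = peak - half_width, peak + half_width
        if start < 0 or end > len(signal):
            continue
        windows.append(signal[start:end])
    return windows

def normalize(window):
    """Shift a window to zero mean and scale it to unit peak amplitude."""
    mean = sum(window) / len(window)
    centered = [x - mean for x in window]
    peak = max(abs(x) for x in centered) or 1.0
    return [x / peak for x in centered]

# Synthetic trace: flat baseline with spikes at known positions.
signal = [0.0] * 1000
r_peaks = [50, 300, 620, 990]
for p in r_peaks:
    signal[p] = 1.0

# half_width=90 corresponds to 0.25 s on each side at 360 Hz.
windows = [normalize(w) for w in beat_windows(signal, r_peaks, half_width=90)]
print(len(windows), len(windows[0]))
```

Each normalized window has the same length, so the list can be stacked directly into a model input batch.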

Cookbook 4: Genomic Variant Classification

Build a pipeline to predict the clinical pathogenicity of newly discovered genetic mutations.

Dataset

ClinVar

Public archive of reports of the relationships among human variations and phenotypes.

Tool

Hail

Open-source framework for exploring and analyzing genomic data at scale.

Model

AlphaMissense

DeepMind's model for predicting the effect of missense variants.

Expected Outcome:

A scalable Spark cluster that uses Hail to filter patient VCFs and scores the remaining variants with AlphaMissense to flag rare disease candidates.

$ openphr deploy genomic-variant-classification
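
The filter-then-score logic can be sketched in plain Python, independent of Hail or Spark. The `Variant` record, the function name, and both thresholds below are hypothetical placeholders chosen for illustration; they are not recommended clinical cutoffs and do not reflect Hail's API.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    variant_id: str
    allele_frequency: float  # population frequency, 0..1
    pathogenicity: float     # model score, 0..1 (higher = more likely pathogenic)

def rare_disease_candidates(variants, max_af=0.001, min_score=0.5):
    """Keep rare variants with a high predicted pathogenicity, best first."""
    kept = [
        v for v in variants
        if v.allele_frequency <= max_af and v.pathogenicity >= min_score
    ]
    return sorted(kept, key=lambda v: v.pathogenicity, reverse=True)

# Hand-written example records (not real variants).
variants = [
    Variant("var_1", 0.0001, 0.92),
    Variant("var_2", 0.2, 0.95),     # high score but too common
    Variant("var_3", 0.0005, 0.10),  # rare but low score
    Variant("var_4", 0.0, 0.70),
]

candidates = rare_disease_candidates(variants)
print([v.variant_id for v in candidates])
```

In a distributed setting the same two predicates would be applied as table filters rather than a list comprehension.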

Cookbook 5: Oncology Pathology Slide Segmentation

Automate the counting and classification of cancer cells in gigapixel whole slide images (WSIs).

Dataset

Camelyon16

Whole-slide images of sentinel lymph node sections, annotated for breast cancer metastases.

Tool

OpenSlide

A C library that provides a simple interface to read whole-slide images.

Expected Outcome:

A tile-based pipeline extracting high-res patches from WSIs, processing them via a vision transformer (ViT), and generating a tumor-probability heatmap overlay.

$ openphr deploy oncology-pathology-slide-segmentation
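
The tile-based approach boils down to enumerating patch coordinates and writing one score per patch into a grid. The sketch below shows that bookkeeping only. It assumes non-overlapping tiles, and `fake_score` is a stand-in for the vision transformer; reading pixels through OpenSlide is deliberately left out.

```python
def tile_origins(width, height, tile, stride=None):
    """Yield (x, y) top-left corners of tiles that fit fully inside the slide."""
    stride = stride or tile
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            yield x, y

def build_heatmap(width, height, tile, score_fn):
    """Return a 2D list with one tumor-probability score per tile."""
    cols = (width - tile) // tile + 1
    rows = (height - tile) // tile + 1
    heatmap = [[0.0] * cols for _ in range(rows)]
    for x, y in tile_origins(width, height, tile):
        heatmap[y // tile][x // tile] = score_fn(x, y)
    return heatmap

def fake_score(x, y):
    # Placeholder for a model call: pretends the tumor sits in the top-left tile.
    return 0.9 if (x, y) == (0, 0) else 0.1

heatmap = build_heatmap(width=1024, height=768, tile=256, score_fn=fake_score)
for row in heatmap:
    print(row)
```

Real slides are far larger than this toy grid, so a production pipeline would typically skip background tiles before scoring.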

Cookbook 6: Medical Audio Dictation Engine

Build a private, HIPAA-compliant voice-to-text service that automatically transcribes doctor-patient encounters.

Model

Whisper (Medical Fine-Tune)

OpenAI's robust ASR model fine-tuned on complex medical jargon and pharmaceutical names.

Tool

CTranslate2

A fast inference engine for Transformer models that runs Whisper locally in real time.

Expected Outcome:

A local microphone streaming service transcribing clinical encounters directly into the EMR without sending audio to the cloud.

$ openphr deploy medical-audio-dictation-engine

Cookbook 7: Predicting Patient Readmission

Train tabular machine learning models on longitudinal EMR data to flag high-risk discharges.

Dataset

MIMIC-IV

A de-identified clinical database from Beth Israel Deaconess Medical Center, available to credentialed researchers via PhysioNet.

Tool

XGBoost

An optimized distributed gradient boosting library highly effective for sparse clinical tabular data.

Expected Outcome:

A predictive scoring system identifying which ICU patients have a >30% probability of 30-day readmission.

$ openphr deploy predicting-patient-readmission

Cookbook 8: Automated Drug Repurposing Discovery

Use knowledge graphs and link prediction to discover new therapeutic uses for existing FDA-approved drugs.

Dataset

DrugBank / KEGG

Comprehensive databases containing information on drugs, targets, and pathways.

Tool

Neo4j

A powerful graph database engine to map complex biological interactions.

Model

Graph Neural Networks (GNNs)

Models that perform link prediction (e.g., predicting a missing edge between Drug A and Disease B).

Expected Outcome:

A queryable interface suggesting statistically probable off-label uses for existing hypertension drugs against novel viral variants.

$ openphr deploy automated-drug-repurposing-discovery
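
To make the idea of link prediction concrete, the sketch below ranks candidate drug-disease edges with a simple Jaccard overlap of shared targets. This is a classical similarity heuristic swapped in for illustration, not a GNN, and it does not touch Neo4j. The graph contents and all names are hypothetical.

```python
from itertools import product

# Hypothetical toy graph: targets each drug binds, targets each disease involves.
drug_targets = {
    "drug_a": {"t1", "t2"},
    "drug_b": {"t3"},
}
disease_targets = {
    "disease_x": {"t2", "t3", "t4"},
    "disease_y": {"t1", "t2"},
}

def jaccard(a, b):
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_links(drug_targets, disease_targets):
    """Score every drug-disease pair and return them best first."""
    scores = [
        (drug, disease, jaccard(dt, st))
        for (drug, dt), (disease, st) in product(
            drug_targets.items(), disease_targets.items()
        )
    ]
    return sorted(scores, key=lambda s: s[2], reverse=True)

ranked = rank_links(drug_targets, disease_targets)
for drug, disease, score in ranked:
    print(f"{drug} -> {disease}: {score:.2f}")
```

A learned model replaces the fixed overlap score with one computed from node embeddings, but the output has the same shape: a ranked list of candidate edges.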

Cookbook 9: Robotic Surgery Video Analysis

Track surgical instruments in real-time endoscopic video to evaluate surgeon performance and prevent errors.

Dataset

EndoVis (Endoscopic Vision Challenge)

A collection of annotated robotic surgery video frames identifying tools and tissue.

Model

YOLOv8

A state-of-the-art, ultra-fast object detection model capable of 60+ FPS inference.

Expected Outcome:

A real-time dashboard overlaying bounding boxes on a live laparoscopy feed, tracking the scalpel and forceps with millimeter precision.

$ openphr deploy robotic-surgery-video-analysis

Cookbook 10: 3D Protein Structure Folding

Predict the 3D atomic structure of a protein from its 1D amino acid sequence to design synthetic antibodies.

Dataset

PDB (Protein Data Bank)

The global archive of experimentally determined 3D structures of biological macromolecules.

Model

OpenFold

A trainable, open-source reproduction of AlphaFold2 for protein folding inference and fine-tuning.

Expected Outcome:

An automated pipeline outputting .pdb files of folded proteins, viewable in PyMOL, to analyze binding pockets for drug discovery.

$ openphr deploy 3d-protein-structure-folding

Cookbook 11: Longitudinal Cohort Extraction

Transform unstructured clinical notes into the structured OMOP Common Data Model for population health research.

Dataset

i2b2 NLP Datasets

De-identified clinical notes annotated with named entities, temporal relations, and coreferences.

Tool

Spark NLP for Healthcare

John Snow Labs' commercial-grade NLP library optimized for clinical entity recognition.

Tool

OMOP CDM

The Observational Medical Outcomes Partnership Common Data Model schema.

Expected Outcome:

A distributed Spark job that reads thousands of free-text discharge summaries, extracts ICD-10 and SNOMED codes, and inserts them into a standardized SQL data warehouse.

$ openphr deploy longitudinal-cohort-extraction

Cookbook 12: Real-World Evidence Generation from Social Media

Mine public Reddit and Twitter streams to detect unreported adverse drug reactions in near real-time.

Dataset

SMM4H

Tweets annotated for adverse drug events, from the Social Media Mining for Health Applications (SMM4H) shared tasks.

Tool

HuggingFace Transformers

The industry-standard library for downloading and training language models.

Model

ClinicalBERT

A BERT model specifically pre-trained on clinical text.

Expected Outcome:

A pipeline querying the Reddit API, parsing colloquial patient descriptions with ClinicalBERT, and alerting pharmacovigilance teams to novel side effects.

$ openphr deploy real-world-evidence-generation-from-social-media

Cookbook 13: Federated Learning for Multi-Hospital Privacy

Train a global pneumonia detection model across multiple hospitals without ever moving patient data across hospital firewalls.

Tool

NVIDIA FLARE

A robust, open-source SDK for building federated learning workflows.

Model

ResNet-50

A deep convolutional network whose weight updates, rather than raw X-rays, are shared between sites.

Expected Outcome:

A central aggregation server that coordinates model weights from decentralized edge-nodes, achieving state-of-the-art accuracy while maintaining total HIPAA compliance.

$ openphr deploy federated-learning-for-multi-hospital-privacy
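
The aggregation step on the central server is commonly a sample-weighted average of the sites' weights (federated averaging). The sketch below shows that arithmetic on flat Python lists. It is an assumption about the aggregation rule, written without NVIDIA FLARE, and `federated_average` is a hypothetical helper rather than an SDK function.

```python
def federated_average(updates):
    """Weighted average of per-site weight vectors.

    `updates` is a list of (num_samples, weights) pairs, where `weights`
    is a flat list of floats with the same length at every site.
    """
    total = sum(n for n, _ in updates)
    if total == 0:
        raise ValueError("no training samples reported")
    size = len(updates[0][1])
    averaged = [0.0] * size
    for n, weights in updates:
        if len(weights) != size:
            raise ValueError("weight vectors differ in length")
        for i, w in enumerate(weights):
            # Sites with more samples pull the average harder.
            averaged[i] += w * n / total
    return averaged

# Two hypothetical hospitals reporting a two-parameter model.
site_updates = [
    (100, [0.2, 0.4]),
    (300, [0.6, 0.0]),
]
global_weights = federated_average(site_updates)
print(global_weights)
```

Only the sample counts and weight vectors cross the network in this scheme; the images used to compute them stay at each site.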

Cookbook 14: Reinforcement Learning for Personalized Dosing

Use offline reinforcement learning to determine the optimal dosing strategy for sepsis patients in the ICU.

Dataset

eICU Collaborative Research Database

A multi-center database comprising over 200,000 ICU admissions.

Model

Deep Q-Network (DQN)

An RL algorithm that learns the value of vasopressor dosages given a patient's vital states.

Expected Outcome:

An AI-driven clinician assist tool that recommends real-time fluid and vasopressor adjustments to stabilize blood pressure in septic shock.

$ openphr deploy reinforcement-learning-for-personalized-dosing
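
A DQN approximates the same value function that tabular Q-learning stores in a lookup table, so the table version is a convenient way to see the update rule applied to logged data. The sketch below is that simplified stand-in, not a deep network. The states, actions, rewards, and hyperparameters are invented for illustration and carry no clinical meaning.

```python
from collections import defaultdict

def fit_q_table(transitions, actions, alpha=0.5, gamma=0.9, epochs=50):
    """Tabular Q-learning over a fixed batch of logged transitions.

    Each transition is (state, action, reward, next_state), with
    next_state=None marking the end of an episode.
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for state, action, reward, next_state in transitions:
            if next_state is None:
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in actions)
            # Move the stored value a fraction alpha toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q

def greedy_action(q, state, actions):
    """Pick the action with the highest learned value in this state."""
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical log: discretized blood-pressure states and dose levels.
actions = ["no_dose", "low_dose", "high_dose"]
logged = [
    ("bp_low", "low_dose", 0.0, "bp_normal"),
    ("bp_low", "no_dose", -1.0, None),
    ("bp_normal", "no_dose", 1.0, None),
]
q = fit_q_table(logged, actions)
print(greedy_action(q, "bp_low", actions))
```

Note that actions never seen in the log keep a default value of zero here. Offline RL methods usually treat such unobserved actions more conservatively, which is one reason a lookup table like this is only a teaching aid.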

Cookbook 15: Automating ICD-10 Coding with LLMs

Eliminate manual medical billing by using Large Language Models to automatically extract correct ICD-10 and CPT codes from discharge summaries.

Model

Clinical LLaMA 3

A quantized LLM tuned specifically for structured information extraction from clinical notes.

Tool

LangChain

A framework for developing applications powered by language models with precise JSON-output parsers.

Expected Outcome:

A deployable API that ingests a raw physician note and reliably outputs a strictly formatted JSON array of valid, billable ICD-10 codes.

$ openphr deploy automating-icd-10-coding-with-llms
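
Whatever parser sits in front of the model, it helps to validate the output before returning it. The sketch below parses a raw string as a JSON array and keeps only entries shaped like ICD-10-CM codes. It is a standalone illustration that does not use LangChain, `parse_codes` is a hypothetical helper, and the pattern checks shape only: confirming that a code exists and is billable requires a lookup against the official code set.

```python
import json
import re

# Shape of an ICD-10-CM code: letter, digit, alphanumeric, then an optional
# dot followed by one to four alphanumerics (for example "E11.9").
ICD10_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")

def parse_codes(raw_output: str) -> list[str]:
    """Parse model output as a JSON array and keep well-formed codes only."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [c for c in data if isinstance(c, str) and ICD10_SHAPE.fullmatch(c)]

print(parse_codes('["E11.9", "I10", "not-a-code", 42]'))
```

Returning an empty list on malformed output lets the caller decide whether to retry the model or route the note to a human coder.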

Cookbook 16: EEG Seizure Prediction

Train a temporal neural network to predict the onset of epileptic seizures minutes before they occur.

Dataset

CHB-MIT Scalp EEG

Continuous EEG recordings from pediatric subjects with intractable seizures.

Model

LSTM Network

Long Short-Term Memory models excel at finding patterns in sequential time-series data like brainwaves.

Expected Outcome:

A predictive model that processes multi-channel EEG signals and fires an alert 5 minutes prior to clinical seizure onset, allowing for preventative intervention.

$ openphr deploy eeg-seizure-prediction
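
Training a model to alert 5 minutes ahead requires labeling which stretches of EEG count as "about to seize". One possible labeling scheme is sketched below, assuming seizure onset times are given in seconds. The window length, the function name, and the choice to label post-onset windows as 0 are assumptions for illustration.

```python
def label_windows(recording_seconds, seizure_onsets, window=30, horizon=300):
    """Label consecutive windows as pre-ictal (1) or not (0).

    A window counts as pre-ictal when it ends within `horizon` seconds
    before a seizure onset. Windows that overlap or follow an onset are
    labeled 0 here; a real pipeline would usually drop ictal data instead.
    """
    labels = []
    for start in range(0, recording_seconds - window + 1, window):
        end = start + window
        pre_ictal = any(0 <= onset - end <= horizon for onset in seizure_onsets)
        labels.append((start, end, int(pre_ictal)))
    return labels

# A 15-minute recording with one seizure starting at the 10-minute mark.
labels = label_windows(recording_seconds=900, seizure_onsets=[600])
positives = [(s, e) for s, e, y in labels if y == 1]
print(positives[0], positives[-1], len(positives))
```

The `horizon` of 300 seconds matches the 5-minute lead time in the expected outcome; changing it shifts how early the model is asked to fire.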

Cookbook 17: Synthetic Medical Data Generation

Bypass privacy restrictions by using Generative Adversarial Networks (GANs) to create realistic but completely fake patient records.

Tool

SDV (Synthetic Data Vault)

An ecosystem of tools to model and generate synthetic tabular datasets.

Model

CTGAN

A GAN architecture specifically designed to handle mixed continuous and discrete tabular variables.

Expected Outcome:

A script that ingests a highly secure proprietary database and outputs millions of rows of safe, shareable synthetic patient data with matching statistical properties.

$ openphr deploy synthetic-medical-data-generation

Cookbook 18: Digital Pathology Survival Prediction

Predict patient overall survival directly from H&E stained pathology slides using multiple-instance learning.

Dataset

TCGA (The Cancer Genome Atlas)

Vast repository of genomic and histological data matched with patient survival timelines.

Model

CLAM

Clustering-constrained Attention Multiple Instance Learning for gigapixel image classification without localized annotations.

Expected Outcome:

A biomarker discovery tool that identifies highly prognostic morphological regions within tumors and estimates patient risk stratification.

$ openphr deploy digital-pathology-survival-prediction

Cookbook 19: Real-time Bed Capacity Prediction

Forecast hospital census and ICU bed shortages 48 hours in advance to optimize staff scheduling.

Tool

Apache Airflow

An open-source platform to programmatically author, schedule, and monitor data pipelines.

Model

Prophet

Meta's robust time-series forecasting algorithm that handles missing data and large outliers well.

Expected Outcome:

An automated morning dashboard for hospital administration highlighting predicted capacity crunches and recommending proactive discharge planning.

$ openphr deploy real-time-bed-capacity-prediction

Cookbook 20: Antimicrobial Resistance (AMR) Prediction

Predict whether a bacterial infection will resist specific antibiotics from the sequenced pathogen genome.

Dataset

PATRIC

The Pathosystems Resource Integration Center, housing thousands of bacterial genomes and AMR metadata.

Model

Random Forest

An ensemble learning method highly effective at interpreting raw k-mer frequency counts from genomic sequences.

Expected Outcome:

A bioinformatics tool that analyzes rapid genome sequencing from a blood culture and recommends the most effective antibiotic class within hours.

$ openphr deploy antimicrobial-resistance-amr-prediction
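
The k-mer frequency features mentioned above can be computed with the standard library alone. The sketch below turns a DNA string into a fixed-length vector suitable as input to a tabular classifier. The function names and the choice to skip k-mers containing ambiguous bases are assumptions; the classifier itself is not shown.

```python
from collections import Counter
from itertools import product

def kmer_counts(sequence, k):
    """Count overlapping k-mers, skipping any that contain non-ACGT symbols."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def feature_vector(sequence, k):
    """Fixed-length vector of k-mer frequencies in lexicographic k-mer order."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values()) or 1
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts[kmer] / total for kmer in vocabulary]

# Short made-up sequence with one ambiguous base (N).
vector = feature_vector("ACGTACGTNA", k=2)
print(len(vector), sum(vector))
```

The vector length is 4 to the power k, so k is usually kept small enough that the feature table still fits in memory across thousands of genomes.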

Cookbook 21: Drug-Drug Interaction Graph Mining

Identify dangerous polypharmacy side effects by traversing a massive pharmacological knowledge graph.

Dataset

TWOSIDES

A database of polypharmacy side effects extracted from FDA adverse event reports.

Tool

PyTorch Geometric

A geometric deep learning extension library for PyTorch.

Expected Outcome:

An API endpoint for EHR systems that flags severe combination contraindications when a physician attempts to prescribe a new medication.

$ openphr deploy drug-drug-interaction-graph-mining

Cookbook 22: Brain Tumor Segmentation in 3D MRI

Automatically delineate core tumor regions in multi-modal 3D MRI scans to assist neurosurgical planning.

Dataset

BraTS (Brain Tumor Segmentation)

A premier dataset of MRI scans with expert-annotated tumor sub-regions.

Model

3D U-Net

The canonical architecture for volumetric biomedical image segmentation.

Expected Outcome:

A precise 3D voxel mask that isolates edema, enhancing tumor core, and necrotic regions, exportable as an STL file for 3D printing or AR visualization.

$ openphr deploy brain-tumor-segmentation-in-3d-mri

Cookbook 23: Contactless Heart Rate Monitoring

Deploy a privacy-first web application that extracts real-time physiological signals (pulse and heart rate) directly from a user's webcam feed using remote photoplethysmography (rPPG).

Tool

PulseVision

Our open-source computer vision pipeline that isolates the forehead region and amplifies micro-color changes in human skin.

Model

MediaPipe Face Mesh

A lightweight, on-device ML model that tracks 468 3D facial landmarks in real time, even on mobile devices.

Expected Outcome:

A client-side web application running entirely in the browser that measures resting heart rate within seconds without sending video data to any server.

$ openphr deploy contactless-heart-rate-monitoring
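
The signal-processing half of rPPG can be sketched without a camera: given one brightness value per video frame, find the frequency in the plausible heart-rate band that carries the most power. The example below assumes the per-frame values come from averaging skin pixels in the tracked forehead region, and it uses a synthetic trace in place of real video. `estimate_bpm` and its band limits are hypothetical and are not PulseVision's implementation.

```python
import math

def estimate_bpm(samples, fps, low_bpm=42, high_bpm=240):
    """Return the heart rate (bpm) whose frequency carries the most power."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    best_bpm, best_power = low_bpm, -1.0
    for bpm in range(low_bpm, high_bpm + 1):
        omega = 2 * math.pi * (bpm / 60.0) / fps  # radians per frame
        # Correlate the trace with a cosine and a sine at this frequency.
        real = sum(c * math.cos(omega * n) for n, c in enumerate(centered))
        imag = sum(c * math.sin(omega * n) for n, c in enumerate(centered))
        power = real * real + imag * imag
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm

# Synthetic 10-second trace at 30 fps: a 1.2 Hz (72 bpm) pulse on a constant offset.
fps = 30
trace = [0.5 + 0.01 * math.sin(2 * math.pi * 1.2 * n / fps) for n in range(10 * fps)]
print(estimate_bpm(trace, fps))
```

Real webcam traces are much noisier than this sine wave, so a practical pipeline would typically detrend and band-pass filter the signal before the frequency search.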

Cookbook 24: CareHub Clinical Trial Screening Bot

Deploy an intelligent patient-facing chatbot that ingests complex clinical trial eligibility matrices and automatically prescreens patients based on their medical history.

Tool

LlamaIndex

A data framework for connecting custom data sources to large language models, perfect for querying complex trial protocols.

Dataset

CareHub Trial Database

Structured OpenPHR clinical trial matrices for diseases like Parkinson's and Atopic Dermatitis.

Expected Outcome:

A deployable, HIPAA-compliant conversational agent that matches patients to active clinical trials using CareHub data matrices.

$ openphr deploy carehub-clinical-trial-screening-bot