Getting Started
Concrete recipes for combining models, datasets, and tools into powerful pipelines.
Build the Next Cookbook
We're looking for ML Engineers and Technical Writers to help us expand this directory of open-source medical AI recipes.
Cookbook 1: The Clinical Reasoning Agent
Deploy an intelligent clinical NLP bot that parses structured FHIR patient records and reasons about possible diagnoses, running entirely locally.
Synthea
Simulated patient populations exported as FHIR bundles in JSON format.
Medplum
An open-source FHIR server to ingest and serve the Synthea data via a standard REST API.
BioMistral
An open-weights LLM further pre-trained on medical literature (PubMed Central), run here as a quantized model for local inference.
Expected Outcome:
A local React application that queries your Medplum FHIR server, accepts the patient's presenting symptoms, and passes them to BioMistral running in `llama.cpp` to generate a differential diagnosis.
$ openphr deploy the-clinical-reasoning-agent
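Before the LLM sees anything, the FHIR record has to be condensed into a prompt. The sketch below shows one way to do that. It assumes a standard FHIR `Bundle` whose resources sit under `entry[].resource`, which is the shape of Synthea exports. The function names, the prompt wording, and the sample bundle are hypothetical. Fetching from Medplum and calling `llama.cpp` are deliberately left out.

```python
import json


def condition_label(resource: dict):
    """Return a readable label for a FHIR Condition: code.text, else the first coding display."""
    code = resource.get("code", {})
    if code.get("text"):
        return code["text"]
    for coding in code.get("coding", []):
        if coding.get("display"):
            return coding["display"]
    return None


def build_prompt(bundle: dict, symptoms: str) -> str:
    """Condense a FHIR Bundle plus presenting symptoms into a plain-text LLM prompt."""
    patient, conditions = None, set()
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Patient":
            patient = resource
        elif resource.get("resourceType") == "Condition":
            label = condition_label(resource)
            if label:
                conditions.add(label)
    lines = ["You are a clinical reasoning assistant."]
    if patient:
        lines.append(f"Patient: {patient.get('gender', 'unknown')}, born {patient.get('birthDate', 'unknown')}")
    lines.append("Known conditions: " + (", ".join(sorted(conditions)) or "none recorded"))
    lines.append(f"Presenting symptoms: {symptoms}")
    lines.append("Suggest a ranked differential diagnosis with brief reasoning for each item.")
    return "\n".join(lines)


# Hypothetical miniature bundle in the shape Synthea exports
example = json.loads("""{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"resourceType": "Patient", "gender": "female", "birthDate": "1961-04-02"}},
    {"resource": {"resourceType": "Condition", "code": {"coding": [{"display": "Hypertension"}]}}}
  ]
}""")
print(build_prompt(example, "headache, blurred vision"))
```

The resulting string is what you would send to the model. The exact request format depends on how you run `llama.cpp` locally.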
Cookbook 2: Local Medical Imaging Pipeline
Train a custom Convolutional Neural Network (CNN) to detect anomalies in X-rays or MRI scans locally on your own GPU infrastructure.
Stanford CheXpert
A large public dataset of chest X-rays labeled for 14 observations (positive, negative, or uncertain), with labels extracted automatically from radiology reports.
MONAI
A PyTorch-based framework for deep learning in healthcare imaging, with data loaders and transforms for DICOM, NIfTI, and other medical image formats.
Expected Outcome:
A deployable Docker container that uses MONAI to ingest raw patient DICOM scans and output a heatmap highlighting regions suggestive of pneumonia.
$ openphr deploy local-medical-imaging-pipeline
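CheXpert marks some labels as uncertain (-1.0), so you need a policy for them before training. The CheXpert authors compared several policies, including U-Ones (treat uncertain as positive) and U-Zeros (treat uncertain as negative).

The minimal sketch below applies either policy. It assumes the column names of CheXpert's `train.csv` for the five competition observations. The function name and the abbreviated sample row are hypothetical. The MONAI model itself is not shown.

```python
import csv
import io

# The five CheXpert competition observations; column names assumed to match train.csv
OBSERVATIONS = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Pleural Effusion"]


def load_labels(csv_text, policy="ones"):
    """Return (image path, label vector) pairs.

    CheXpert encodes 1.0 = positive, 0.0 = negative, -1.0 = uncertain, blank = not mentioned.
    policy="ones" maps uncertain to 1.0 (U-Ones); policy="zeros" maps it to 0.0 (U-Zeros).
    """
    if policy not in ("ones", "zeros"):
        raise ValueError("policy must be 'ones' or 'zeros'")
    uncertain = 1.0 if policy == "ones" else 0.0
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        labels = []
        for name in OBSERVATIONS:
            raw = (record.get(name) or "").strip()
            if raw == "":
                labels.append(0.0)            # not mentioned: treated as negative
            elif float(raw) == -1.0:
                labels.append(uncertain)
            else:
                labels.append(float(raw))
        rows.append((record["Path"], labels))
    return rows


sample = (
    "Path,Atelectasis,Cardiomegaly,Consolidation,Edema,Pleural Effusion\n"
    "train/patient00001/study1/view1_frontal.jpg,-1.0,1.0,,0.0,\n"
)
print(load_labels(sample))
```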
Cookbook 3: Real-Time Wearable Anomaly Detection
Process live ECG and heart rate data from wearables to detect arrhythmias using lightweight edge AI models.
MIT-BIH Arrhythmia Database
Standard test material for evaluating arrhythmia detectors, available via PhysioNet.
TensorFlow Lite Micro
A lightweight runtime for executing TensorFlow Lite models on microcontrollers with only kilobytes of memory; here it runs a compact 1D-CNN at milliwatt power levels.
Expected Outcome:
A deployable edge model on a Raspberry Pi or smartwatch identifying premature ventricular contractions in real time.
$ openphr deploy real-time-wearable-anomaly-detection
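A rule-based baseline is handy for sanity-checking the 1D-CNN. A premature ventricular contraction typically arrives early and is followed by a compensatory pause.

The sketch below flags that pattern from R-peak sample indices. It assumes the peaks are already detected (for example, taken from the MIT-BIH beat annotations) at the database's 360 Hz sampling rate. The 0.8 and 1.1 ratios are hypothetical, illustrative thresholds, not clinically validated values.

```python
def flag_premature_beats(r_peaks, fs=360, window=8, early=0.8, pause=1.1):
    """Return indices into r_peaks of beats that come early and are followed by a long pause.

    Beat k is flagged when the RR interval before it is shorter than `early` x the recent
    mean RR and the RR interval after it is longer than `pause` x that mean.
    """
    rr = [(b - a) / fs for a, b in zip(r_peaks, r_peaks[1:])]   # RR intervals in seconds
    flagged = []
    for k in range(2, len(rr)):
        baseline = rr[max(0, k - 1 - window):k - 1]              # intervals before beat k-1
        mean_rr = sum(baseline) / len(baseline)
        if rr[k - 1] < early * mean_rr and rr[k] > pause * mean_rr:
            flagged.append(k)
    return flagged


# Synthetic rhythm at 75 bpm (288 samples per beat) with one early beat at sample 1332
peaks = [0, 288, 576, 864, 1152, 1332, 1728, 2016]
print(flag_premature_beats(peaks))
```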
Cookbook 4: Genomic Variant Classification
Build a pipeline to predict the clinical pathogenicity of newly discovered genetic mutations.
ClinVar
Public archive of reports of the relationships among human variations and phenotypes.
Hail
An open-source framework, built on Apache Spark, for exploring and analyzing genomic data at scale.
AlphaMissense
DeepMind's model for predicting the effect of missense variants.
Expected Outcome:
A Spark cluster running Hail that filters patient VCFs and scores missense variants with AlphaMissense to flag rare-disease candidates.
$ openphr deploy genomic-variant-classification
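In the recipe this filtering runs in Hail across the whole cohort. The plain-Python sketch below only illustrates the decision logic.

It assumes each variant already carries a population allele frequency and an AlphaMissense score. The class cut-offs (0.34 and 0.564) are those published with AlphaMissense. The `max_af=0.001` rarity threshold, the `Variant` fields, and the gene names are hypothetical.

```python
from dataclasses import dataclass

# Cut-offs published with AlphaMissense for its three classes
LIKELY_BENIGN_MAX = 0.34
LIKELY_PATHOGENIC_MIN = 0.564


@dataclass
class Variant:
    gene: str
    protein_change: str
    allele_freq: float   # population allele frequency, e.g. from gnomAD
    am_score: float      # AlphaMissense pathogenicity score in [0, 1]


def am_class(score: float) -> str:
    if score > LIKELY_PATHOGENIC_MIN:
        return "likely_pathogenic"
    if score < LIKELY_BENIGN_MAX:
        return "likely_benign"
    return "ambiguous"


def rare_disease_candidates(variants, max_af=0.001):
    """Keep rare variants classed as likely pathogenic, highest score first."""
    hits = [v for v in variants
            if v.allele_freq <= max_af and am_class(v.am_score) == "likely_pathogenic"]
    return sorted(hits, key=lambda v: v.am_score, reverse=True)


variants = [
    Variant("GENE_A", "p.R100W", 0.00002, 0.91),
    Variant("GENE_B", "p.A20T", 0.12, 0.95),     # too common in the population
    Variant("GENE_C", "p.G55D", 0.0001, 0.45),   # ambiguous score
    Variant("GENE_D", "p.L7P", 0.0, 0.70),
]
for v in rare_disease_candidates(variants):
    print(v.gene, v.protein_change, v.am_score, am_class(v.am_score))
```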
Cookbook 5: Oncology Pathology Slide Segmentation
Automate the detection and segmentation of metastatic tumor regions in gigapixel whole-slide images (WSIs).
Camelyon16
Whole-slide images of sentinel lymph node sections, annotated for detecting breast cancer metastases.
OpenSlide
A C library that provides a simple interface to read whole-slide images.
Expected Outcome:
A tile-based pipeline extracting high-res patches from WSIs, processing them via a vision transformer (ViT), and generating a tumor-probability heatmap overlay.
$ openphr deploy oncology-pathology-slide-segmentation
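Tile-based processing needs two pieces of bookkeeping. First, compute the patch origins to read. Second, place each patch's tumor probability back into a grid for the heatmap.

The sketch below covers both, assuming level-0 coordinates and a square tile size. The function names and the 256-pixel default are hypothetical. Reading each patch through OpenSlide is shown only as a comment, and the ViT step is omitted.

```python
def tile_origins(width, height, tile=256, overlap=0):
    """Top-left (x, y) corners of every full tile covering a width x height image."""
    step = tile - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    return [(x, y)
            for y in range(0, height - tile + 1, step)
            for x in range(0, width - tile + 1, step)]


def assemble_heatmap(probs, width, height, tile=256, overlap=0):
    """Place per-tile probabilities {(x, y): p} into a rows x cols grid."""
    step = tile - overlap
    cols = (width - tile) // step + 1 if width >= tile else 0
    rows = (height - tile) // step + 1 if height >= tile else 0
    grid = [[0.0] * cols for _ in range(rows)]
    for (x, y), p in probs.items():
        grid[y // step][x // step] = p
    return grid


# With openslide-python, each patch could be read as:
#   slide = openslide.OpenSlide("slide.tif"); width, height = slide.dimensions
#   patch = slide.read_region((x, y), 0, (256, 256)).convert("RGB")
origins = tile_origins(600, 300)
print(origins, assemble_heatmap({(0, 0): 0.1, (256, 0): 0.9}, 600, 300))
```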
Cookbook 6: Medical Audio Dictation Engine
Build a private voice-to-text service that automatically transcribes doctor-patient encounters on local hardware, designed to support HIPAA compliance.
Whisper (Medical Fine-Tune)
OpenAI's open-weights automatic speech recognition (ASR) model, fine-tuned on medical terminology and pharmaceutical names.
CTranslate2
A fast inference engine for Transformer models, used here to run Whisper locally in real time.
Expected Outcome:
A local microphone streaming service transcribing clinical encounters directly into the EMR without sending audio to the cloud.
$ openphr deploy medical-audio-dictation-engine
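To check whether the medical fine-tune actually helps, compare transcripts of held-out encounters using word error rate (WER), the standard ASR metric. The sketch below computes WER with a word-level edit distance. Lowercasing and whitespace tokenization are simplifying assumptions; a real evaluation would also normalize punctuation and numbers.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution (or match)
            )
        prev = cur
    return prev[-1] / max(len(ref), 1)


print(word_error_rate("start metoprolol 25 mg twice daily",
                      "start meta prolol 25 mg twice daily"))
```

Splitting a drug name into two words, as above, costs two edits. Errors like this are exactly what a vocabulary-focused fine-tune should reduce.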
Cookbook 7: Predicting Patient Readmission
Train tabular machine learning models on longitudinal EMR data to flag high-risk discharges.
MIMIC-IV
A de-identified database of hospital and ICU admissions at Beth Israel Deaconess Medical Center, available to credentialed researchers through PhysioNet.
XGBoost
An optimized distributed gradient boosting library highly effective for sparse clinical tabular data.
Expected Outcome:
A predictive scoring system identifying which ICU patients have a >30% probability of 30-day readmission.
$ openphr deploy predicting-patient-readmission
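The training target has to be derived from the admission history before XGBoost can learn anything. The sketch below labels each admission 1 if the same patient is admitted again within 30 days of discharge. It assumes rows with MIMIC-IV `admissions` columns (`subject_id`, `hadm_id`, `admittime`, `dischtime`). The sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def readmission_labels(admissions, days=30):
    """Map hadm_id -> 1 if the same subject is admitted again within `days` of this discharge."""
    by_subject = defaultdict(list)
    for adm in admissions:
        by_subject[adm["subject_id"]].append(adm)
    labels = {}
    for stays in by_subject.values():
        stays.sort(key=lambda adm: adm["admittime"])
        for current, following in zip(stays, stays[1:] + [None]):
            readmitted = (following is not None
                          and following["admittime"] - current["dischtime"] <= timedelta(days=days))
            labels[current["hadm_id"]] = int(readmitted)
    return labels


def row(subject_id, hadm_id, admit, disch):
    # MIMIC-IV timestamps look like "2180-05-06 22:23:00" (years are shifted for de-identification)
    return {"subject_id": subject_id, "hadm_id": hadm_id,
            "admittime": datetime.fromisoformat(admit), "dischtime": datetime.fromisoformat(disch)}


admissions = [
    row(1, 10, "2180-01-01 08:00:00", "2180-01-05 12:00:00"),
    row(1, 11, "2180-01-20 09:00:00", "2180-01-22 10:00:00"),
    row(2, 20, "2180-03-01 00:00:00", "2180-03-03 00:00:00"),
    row(2, 21, "2180-06-01 00:00:00", "2180-06-02 00:00:00"),
]
print(readmission_labels(admissions))
```

Real cohort definitions usually also exclude in-hospital deaths and planned readmissions. Those filters are omitted here.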
Cookbook 8: Automated Drug Repurposing Discovery
Use knowledge graphs and link prediction to discover new therapeutic uses for existing FDA-approved drugs.
DrugBank / KEGG
Comprehensive databases containing information on drugs, targets, and pathways.
Neo4j
A powerful graph database engine to map complex biological interactions.
Graph Neural Networks (GNNs)
Models that perform link prediction (e.g., Drug A connects to Disease B).
Expected Outcome:
A queryable interface suggesting statistically probable off-label uses for existing hypertension drugs against novel viral variants.
$ openphr deploy automated-drug-repurposing-discovery
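Before training a GNN, it helps to benchmark against a classic link-prediction heuristic. The sketch below swaps in the Adamic-Adar index in place of a GNN. It scores a drug-disease pair by their shared neighbors (for example, common protein targets), weighting rare neighbors more heavily.

The node names are hypothetical. In the recipe, the edges would come from the Neo4j graph built from DrugBank / KEGG.

```python
import math
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (a, b) edge pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def adamic_adar(graph, u, v):
    """Sum of 1 / log(degree) over the common neighbours of u and v."""
    return sum(1.0 / math.log(len(graph[w])) for w in graph[u] & graph[v])


def rank_candidates(graph, drug, diseases):
    """Score diseases not yet linked to `drug`, best first."""
    scores = {d: adamic_adar(graph, drug, d) for d in diseases if d not in graph[drug]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


edges = [("DrugA", "Target1"), ("DrugA", "Target2"), ("Target1", "DiseaseX"),
         ("Target2", "DiseaseX"), ("Target2", "DiseaseY"), ("DrugB", "Target2")]
g = build_graph(edges)
print(rank_candidates(g, "DrugA", ["DiseaseX", "DiseaseY"]))
```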
Cookbook 9: Robotic Surgery Video Analysis
Track surgical instruments in real-time endoscopic video to evaluate surgeon performance and prevent errors.
EndoVis (Endoscopic Vision Challenge)
A collection of annotated robotic surgery video frames identifying tools and tissue.
YOLOv8
A state-of-the-art real-time object detection model capable of 60+ FPS inference on a modern GPU.
Expected Outcome:
A real-time dashboard overlaying bounding boxes on a live laparoscopy feed, tracking instruments such as scalpels and forceps from frame to frame.
$ openphr deploy robotic-surgery-video-analysis
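YOLOv8 detects instruments independently in each frame. To turn those detections into tracks, each new box must be associated with a box from the previous frame. The sketch below does greedy intersection-over-union (IoU) matching, a simple stand-in for dedicated multi-object trackers.

Boxes are assumed to be `(x1, y1, x2, y2)` pixel coordinates. The 0.3 threshold and instrument names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_tracks(prev_boxes, new_boxes, threshold=0.3):
    """Give each new detection the id of its best-overlapping unused previous box (or None)."""
    assignments, used = {}, set()
    for j, new_box in enumerate(new_boxes):
        best_id, best_iou = None, threshold
        for track_id, prev_box in prev_boxes.items():
            if track_id in used:
                continue
            score = iou(prev_box, new_box)
            if score >= best_iou:
                best_id, best_iou = track_id, score
        if best_id is not None:
            used.add(best_id)
        assignments[j] = best_id
    return assignments


previous = {"forceps": (0, 0, 10, 10), "scissors": (50, 50, 60, 60)}
print(match_tracks(previous, [(51, 50, 61, 60), (1, 0, 11, 10)]))
```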
Cookbook 10: 3D Protein Structure Folding
Predict the 3D atomic structure of a protein from its 1D amino acid sequence to design synthetic antibodies.
PDB (Protein Data Bank)
The global archive of experimentally determined 3D structures of biological macromolecules.
OpenFold
A trainable, open-source reproduction of AlphaFold2 for protein folding inference and fine-tuning.
Expected Outcome:
An automated pipeline outputting .pdb files of folded proteins, viewable in PyMOL, to analyze binding pockets for drug discovery.
$ openphr deploy 3d-protein-structure-folding
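Downstream analysis of the predicted `.pdb` files starts with reading coordinates. The PDB format is fixed-width: atom name in columns 13-16 and x, y, z in columns 31-54.

The sketch below extracts alpha-carbon coordinates and lists residue pairs within 8 Å. The 8 Å cutoff is a common but hypothetical choice here. The three-residue demo structure is made up.

```python
import math


def read_ca_atoms(pdb_text):
    """Return (chain, residue number, residue name, (x, y, z)) for each alpha-carbon ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        # Fixed-width columns: atom name 13-16, residue name 18-20, chain 22, residue number 23-26
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), line[17:20].strip(), xyz))
    return atoms


def ca_contacts(atoms, cutoff=8.0):
    """Residue pairs whose alpha carbons lie within `cutoff` angstroms (sequence neighbours skipped)."""
    pairs = []
    for i in range(len(atoms)):
        for j in range(i + 2, len(atoms)):
            if math.dist(atoms[i][3], atoms[j][3]) <= cutoff:
                pairs.append((atoms[i][1], atoms[j][1]))
    return pairs


# Three hypothetical residues written in PDB fixed-column layout
residues = [("MET", 1, 0.0, 0.0, 0.0), ("GLY", 2, 3.8, 0.0, 0.0), ("LYS", 3, 6.0, 3.0, 0.0)]
demo_pdb = "\n".join(
    f"ATOM  {serial:5d}  CA  {name} A{seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00           C"
    for serial, (name, seq, x, y, z) in enumerate(residues, start=1)
)
atoms = read_ca_atoms(demo_pdb)
print(ca_contacts(atoms))
```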
Cookbook 11: Longitudinal Cohort Extraction
Transform unstructured clinical notes into structured records in the OMOP Common Data Model (CDM) for population health research.
i2b2 NLP Datasets
De-identified clinical notes annotated with named entities, temporal relations, and coreferences.
Spark NLP for Healthcare
John Snow Labs' commercial-grade NLP library optimized for clinical entity recognition.
OMOP CDM
The Observational Medical Outcomes Partnership Common Data Model schema.
Expected Outcome:
A distributed Spark job that reads thousands of free-text discharge summaries, extracts ICD-10 and SNOMED codes, and inserts them into a standardized SQL data warehouse.
$ openphr deploy longitudinal-cohort-extraction
Cookbook 12: Real-World Evidence Generation from Social Media
Mine public Reddit and Twitter streams to detect unreported adverse drug reactions in near real-time.
SMM4H
The Social Media Mining for Health Applications (SMM4H) shared-task datasets of tweets annotated for adverse drug events.
Hugging Face Transformers
A widely used library for downloading, fine-tuning, and running pretrained language models.
ClinicalBERT
A BERT model specifically pre-trained on clinical text.
Expected Outcome:
A pipeline querying the Reddit API, parsing colloquial patient descriptions with ClinicalBERT, and alerting pharmacovigilance teams to novel side effects.
$ openphr deploy real-world-evidence-generation-from-social-media
Cookbook 13: Federated Learning for Multi-Hospital Privacy
Train a global pneumonia detection model across multiple hospitals without ever moving patient data across hospital firewalls.
NVIDIA FLARE
A robust, open-source SDK for building federated learning workflows.
ResNet-50
A 50-layer residual convolutional network trained locally at each hospital; sites share model weight updates instead of raw X-rays.
Expected Outcome:
A central aggregation server that combines model weight updates from each hospital's node into a shared global model, so raw patient images never leave the hospital network.
$ openphr deploy federated-learning-for-multi-hospital-privacy
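The aggregation step is usually Federated Averaging (FedAvg). The new global weights are the average of each site's weights, weighted by that site's number of training examples: w = Σ_k (n_k / n) w_k. NVIDIA FLARE provides its own aggregation components. This sketch only shows the arithmetic.

It represents parameters as flat lists of floats, a hypothetical simplification of real tensors. The layer names and site counts are made up.

```python
def federated_average(client_updates):
    """FedAvg: average each parameter across clients, weighted by local training-set size.

    client_updates: list of (params, n_examples), where params maps a layer name to a
    flat list of floats (a simplified stand-in for real tensors).
    """
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("clients reported no training examples")
    first_params = client_updates[0][0]
    return {
        name: [
            sum(params[name][i] * n for params, n in client_updates) / total
            for i in range(len(values))
        ]
        for name, values in first_params.items()
    }


round_updates = [
    ({"fc.weight": [1.0, 2.0], "fc.bias": [0.0]}, 100),   # hospital A
    ({"fc.weight": [3.0, 4.0], "fc.bias": [1.0]}, 300),   # hospital B
]
print(federated_average(round_updates))
```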
Cookbook 14: Reinforcement Learning for Personalized Dosing
Use offline reinforcement learning to determine the optimal dosing strategy for sepsis patients in the ICU.
eICU Collaborative Research Database
A multi-center database comprising over 200,000 ICU admissions.
Deep Q-Network (DQN)
An RL algorithm that learns the expected long-term value of each vasopressor dose given the patient's physiological state, such as vital signs.
Expected Outcome:
An AI-driven clinical decision-support tool that recommends real-time fluid and vasopressor adjustments to help stabilize blood pressure in septic shock.
$ openphr deploy reinforcement-learning-for-personalized-dosing
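A DQN replaces a Q-table with a neural network, but the underlying update is the same Bellman backup. The sketch below runs tabular Q-learning offline, learning only from logged transitions with no interaction with patients.

The states, dose buckets, and rewards are hypothetical toy values, not a clinical policy. Real sepsis work uses continuous state features, which is why the recipe uses a DQN.

```python
from collections import defaultdict


def offline_q_learning(transitions, actions, gamma=0.99, alpha=0.1, epochs=50):
    """Repeatedly sweep logged (state, action, reward, next_state, done) tuples."""
    q = defaultdict(float)
    for _ in range(epochs):
        for state, action, reward, next_state, done in transitions:
            target = reward if done else reward + gamma * max(q[(next_state, b)] for b in actions)
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy_action(q, state, actions):
    return max(actions, key=lambda a: q[(state, a)])


ACTIONS = ["none", "low", "high"]   # hypothetical vasopressor dose buckets
logged = [
    ("hypotensive", "none", -1.0, "hypotensive", False),
    ("hypotensive", "low", 0.0, "stable", False),
    ("stable", "none", 1.0, "stable", True),
    ("hypotensive", "high", -0.5, "stable", False),
]
q = offline_q_learning(logged, ACTIONS)
print(greedy_action(q, "hypotensive", ACTIONS))
```

Actions that never appear in the logs keep their default value of 0. This is why offline RL on observational ICU data needs extra safeguards before any recommendation is trusted.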
Cookbook 15: Automating ICD-10 Coding with LLMs
Reduce manual medical coding by using large language models to automatically extract ICD-10 and CPT codes from discharge summaries.
Clinical LLaMA 3
A quantized LLM tuned specifically for structured information extraction from clinical notes.
LangChain
A framework for building applications powered by language models, including output parsers that enforce structured JSON responses.
Expected Outcome:
A deployable API that ingests a raw physician note and reliably outputs a strictly formatted JSON array of valid, billable ICD-10 codes.
$ openphr deploy automating-icd-10-coding-with-llms
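Whatever parser the LLM chain uses, a final server-side check before returning codes is cheap insurance. The sketch below enforces a JSON array of strings shaped like ICD-10-CM codes: letter, digit, alphanumeric, then optionally a dot and up to four alphanumerics.

The regex is an approximation of the format, and it is an assumption. It does not confirm that a code exists or is billable; that requires a lookup against the official code tables.

```python
import json
import re

# Approximate ICD-10-CM shape: letter, digit, alphanumeric, then optionally "." and 1-4 alphanumerics
ICD10_SHAPE = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")


def parse_icd10_output(raw):
    """Parse an LLM response that must be a JSON array of ICD-10 code strings."""
    data = json.loads(raw)                      # raises ValueError on malformed JSON
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    codes = []
    for item in data:
        code = item.strip().upper() if isinstance(item, str) else None
        if code is None or not ICD10_SHAPE.fullmatch(code):
            raise ValueError(f"not an ICD-10-shaped code: {item!r}")
        codes.append(code)
    return codes


print(parse_icd10_output('["E11.9", "i10"]'))
```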
Cookbook 16: EEG Seizure Prediction
Train a temporal neural network to predict the onset of epileptic seizures minutes before they occur.
CHB-MIT Scalp EEG
Continuous EEG recordings from pediatric subjects with intractable seizures.
LSTM Network
Long Short-Term Memory models excel at finding patterns in sequential time-series data like brainwaves.
Expected Outcome:
A predictive model that processes multi-channel EEG signals and fires an alert 5 minutes prior to clinical seizure onset, allowing for preventative intervention.
$ openphr deploy eeg-seizure-prediction
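Training a seizure predictor starts with labeling windows of the recording as preictal (shortly before a seizure) or not. The sketch below cuts a recording into non-overlapping windows and marks a window preictal when it ends within 5 minutes before a seizure onset.

It assumes onset times in seconds, as listed in the CHB-MIT summary files, and the dataset's 256 Hz sampling rate. The 4-second window length and the example onset are hypothetical.

```python
def label_windows(n_samples, fs, seizure_onsets_s, window_s=4.0, horizon_s=300.0):
    """Split a recording into non-overlapping windows and label each one.

    Returns (start_sample, label) pairs; label is 1 when the window ends within
    `horizon_s` seconds before a seizure onset (preictal), else 0.
    """
    win = int(window_s * fs)
    labelled = []
    for start in range(0, n_samples - win + 1, win):
        end_s = (start + win) / fs
        preictal = any(0.0 <= onset - end_s <= horizon_s for onset in seizure_onsets_s)
        labelled.append((start, int(preictal)))
    return labelled


# A one-hour recording at 256 Hz with one seizure starting at 1800 s (hypothetical values)
windows = label_windows(n_samples=256 * 3600, fs=256, seizure_onsets_s=[1800.0])
print(sum(label for _, label in windows), "preictal windows of", len(windows))
```

Windows that overlap a seizure are labeled 0 here. In practice, ictal and immediately postictal windows are usually dropped from training rather than treated as negatives.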
Cookbook 17: Synthetic Medical Data Generation
Reduce privacy risk by using Generative Adversarial Networks (GANs) to create realistic but entirely synthetic patient records that can be shared more freely.
SDV (Synthetic Data Vault)
An ecosystem of tools to model and generate synthetic tabular datasets.
CTGAN
A GAN architecture specifically designed to handle mixed continuous and discrete tabular variables.
Expected Outcome:
A script that learns from a sensitive proprietary database and generates millions of rows of shareable synthetic patient data with matching statistical properties.
$ openphr deploy synthetic-medical-data-generation
Cookbook 18: Digital Pathology Survival Prediction
Predict patient overall survival directly from H&E-stained pathology slides using multiple-instance learning.
TCGA (The Cancer Genome Atlas)
Vast repository of genomic and histological data matched with patient survival timelines.
CLAM
Clustering-constrained Attention Multiple Instance Learning for gigapixel image classification without localized annotations.
Expected Outcome:
A biomarker discovery tool that identifies highly prognostic morphological regions within tumors and stratifies patients by risk.
$ openphr deploy digital-pathology-survival-prediction
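CLAM builds on attention-based multiple-instance learning. Each slide is a "bag" of patch embeddings, and a learned attention score decides how much each patch contributes to the slide-level representation. Those attention weights are what highlight prognostic regions.

The sketch below shows basic (non-gated) attention pooling, a_k = softmax(w · tanh(V h_k)). In a trained model, V and w are learned parameters; here they, along with the toy embeddings, are hypothetical fixed values.

```python
import math


def attention_pool(embeddings, V, w):
    """Attention-based MIL pooling: a_k = softmax_k(w . tanh(V h_k)), z = sum_k a_k h_k."""
    def matvec(matrix, vec):
        return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

    scores = [sum(wi * math.tanh(u) for wi, u in zip(w, matvec(V, h))) for h in embeddings]
    top = max(scores)                               # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(embeddings[0])
    slide_vector = [sum(a * h[d] for a, h in zip(weights, embeddings)) for d in range(dim)]
    return slide_vector, weights


patches = [[1.0, 0.0], [0.0, 1.0]]                  # two toy patch embeddings
slide_vec, attn = attention_pool(patches, V=[[1.0, 0.0], [0.0, 1.0]], w=[1.0, 0.0])
print(slide_vec, attn)
```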
Cookbook 19: Real-time Bed Capacity Prediction
Forecast hospital census and ICU bed shortages 48 hours in advance to optimize staff scheduling.
Apache Airflow
An open-source platform to programmatically author, schedule, and monitor data pipelines.
Prophet
Meta's robust time-series forecasting algorithm that handles missing data and large outliers well.
Expected Outcome:
An automated morning dashboard for hospital administration highlighting predicted capacity crunches and recommending proactive discharge planning.
$ openphr deploy real-time-bed-capacity-prediction
Cookbook 20: Antimicrobial Resistance (AMR) Prediction
Predict whether a bacterial infection will resist specific antibiotics by sequencing the pathogen genome.
PATRIC
The Pathosystems Resource Integration Center (now part of BV-BRC), housing thousands of bacterial genomes with AMR metadata.
Random Forest
An ensemble learning method highly effective at interpreting raw k-mer frequency counts from genomic sequences.
Expected Outcome:
A bioinformatics tool that analyzes rapid genome sequencing from a blood culture and recommends the most effective antibiotic class within hours.
$ openphr deploy antimicrobial-resistance-amr-prediction
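The Random Forest does not read sequences directly; each genome is first turned into a vector of k-mer counts. The sketch below counts k-mers, skipping any that contain ambiguous bases such as N. It then projects the counts onto a fixed vocabulary, which would be built from the training genomes.

The function names and `k=8` default are hypothetical choices. A classifier such as scikit-learn's `RandomForestClassifier` would then be fit on the resulting vectors.

```python
from collections import Counter


def kmer_counts(seq, k=8):
    """Count overlapping k-mers, ignoring any that contain non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def feature_vector(counts, vocabulary):
    """Fixed-order count vector over a vocabulary of k-mers."""
    return [counts.get(kmer, 0) for kmer in vocabulary]


counts = kmer_counts("ACGTACGT", k=4)
print(counts, feature_vector(counts, ["ACGT", "TTTT"]))
```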
Cookbook 21: Drug-Drug Interaction Graph Mining
Identify dangerous polypharmacy side effects by traversing a massive pharmacological knowledge graph.
TWOSIDES
A database of polypharmacy side effects extracted from FDA adverse event reports.
PyTorch Geometric
A geometric deep learning extension library for PyTorch.
Expected Outcome:
An API endpoint for EHR systems that flags severe combination contraindications when a physician attempts to prescribe a new medication.
$ openphr deploy drug-drug-interaction-graph-mining
Cookbook 22: Brain Tumor Segmentation in 3D MRI
Automatically delineate core tumor regions in multi-modal 3D MRI scans to assist neurosurgical planning.
BraTS (Brain Tumor Segmentation)
A premier dataset of MRI scans with expert-annotated tumor sub-regions.
3D U-Net
The canonical architecture for volumetric biomedical image segmentation.
Expected Outcome:
A precise 3D voxel mask that isolates peritumoral edema, enhancing tumor, and necrotic tumor core, exportable as an STL file for 3D printing or AR visualization.
$ openphr deploy brain-tumor-segmentation-in-3d-mri
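Segmentation quality on BraTS is usually reported as a per-region Dice score, 2|P ∩ T| / (|P| + |T|). The sketch below computes Dice per label on flattened voxel lists.

The label IDs in the example are hypothetical; use the convention of the BraTS release you train on. For real volumes, you would vectorize this with an array library.

```python
def dice_per_label(pred, truth, labels):
    """Dice coefficient for each label over flattened voxel label lists."""
    scores = {}
    for label in labels:
        p = {i for i, v in enumerate(pred) if v == label}
        t = {i for i, v in enumerate(truth) if v == label}
        denom = len(p) + len(t)
        # Both empty counts as perfect agreement for that label
        scores[label] = 1.0 if denom == 0 else 2 * len(p & t) / denom
    return scores


predicted = [0, 1, 1, 2, 2, 0]
reference = [0, 1, 2, 2, 2, 0]
print(dice_per_label(predicted, reference, labels=[1, 2]))
```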
Build the Future of Open Healthcare
Enjoying these cookbooks? The OpenPHR ecosystem is built entirely by open-source volunteers. We are actively recruiting ML engineers, web developers, and clinical reviewers to help expand these guides.
Cookbook 23: Contactless Heart Rate Monitoring
Deploy a privacy-first web application that extracts the pulse signal and heart rate in real time directly from a user's webcam feed using remote photoplethysmography (rPPG).
PulseVision
Our open-source computer vision pipeline that isolates the forehead region and amplifies micro-color changes in human skin.
MediaPipe Face Mesh
A lightweight, on-device ML model that tracks 468 3D facial landmarks in real-time, even on mobile devices.
Expected Outcome:
A client-side web application running entirely in the browser that measures resting heart rate within seconds without sending video data to any server.
$ openphr deploy contactless-heart-rate-monitoring
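At its simplest, rPPG turns the average skin color of the tracked region into a time series and finds its dominant frequency in the heart-rate band. The sketch below uses a plain DFT on the per-frame mean green intensity over 0.7-4 Hz (42-240 bpm). This is a deliberately simple stand-in; PulseVision's actual amplification method is not shown.

The input signal, band limits, and function name are assumptions for illustration. Frequency resolution is fps / n_frames, so a 10-second window resolves heart rate only to about 6 bpm.

```python
import math


def estimate_bpm(green_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Estimate heart rate from per-frame mean green intensity of a skin region.

    Removes the mean, computes the power spectrum with a plain DFT, and returns the
    strongest frequency inside the plausible heart-rate band, in beats per minute.
    """
    n = len(green_means)
    mean = sum(green_means) / n
    x = [g - mean for g in green_means]
    best_hz, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        hz = k * fps / n
        if not lo_hz <= hz <= hi_hz:
            continue
        real = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        imag = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        power = real * real + imag * imag
        if power > best_power:
            best_hz, best_power = hz, power
    if best_hz is None:
        raise ValueError("recording too short to resolve the heart-rate band")
    return best_hz * 60.0


# Ten seconds of synthetic 30 fps signal with a 1.2 Hz (72 bpm) pulse component
fps = 30
signal = [100 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(10 * fps)]
print(round(estimate_bpm(signal, fps)))
```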
Cookbook 24: CareHub Clinical Trial Screening Bot
Deploy an intelligent patient-facing chatbot that ingests complex clinical trial eligibility matrices and automatically prescreens patients based on their medical history.
LlamaIndex
A data framework for connecting custom data sources to large language models, perfect for querying complex trial protocols.
CareHub Trial Database
Structured OpenPHR clinical trial matrices for diseases like Parkinson's and Atopic Dermatitis.
Expected Outcome:
A deployable, HIPAA-compliant conversational agent that matches patients to active clinical trials using CareHub data matrices.
$ openphr deploy carehub-clinical-trial-screening-bot