Model

EHR-Safe

Synthetic Data Generation | General | Apache 2.0 | Privacy Preserving

Technical Summary

EHR-Safe is a generative modeling framework developed by Google Research that synthesizes highly realistic, privacy-preserving electronic health record (EHR) data.

Key Capabilities

  • Dual-Modality Generation: Capable of generating both static patient demographic information and dynamic, time-series medical events (e.g., lab results over time, sequential diagnoses, vitals).
  • High Fidelity: Retains the statistical properties, cross-feature correlations, and longitudinal patterns of the original real-world dataset, so models trained on EHR-Safe output are reported to perform comparably to models trained on the real data.
  • Privacy Protection: Empirically resists membership inference and attribute inference attacks, which lowers the risk that an individual patient from the original training set can be re-identified from the synthetic output. These are empirical results, not a formal guarantee such as differential privacy.
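
To make the membership inference claim concrete, here is a minimal sketch of one common empirical check: a distance-based attack that guesses a record was in the training set when it lies unusually close to some synthetic record. This is an illustrative assumption about how such a test can be run, not the EHR-Safe evaluation code. The names `nearest_distance`, `membership_attack_auc`, and `toy_records` are hypothetical, and records are assumed to be already encoded as normalized numeric vectors.

```python
import math
import random


def nearest_distance(record, synthetic):
    """Euclidean distance from one record to its closest synthetic record."""
    return min(math.dist(record, s) for s in synthetic)


def membership_attack_auc(train, holdout, synthetic):
    """AUC of a distance-based membership inference attack.

    The attacker scores each candidate by its distance to the nearest
    synthetic record and guesses "member" when that distance is small.
    0.5 means the attacker does no better than chance; 1.0 means training
    records are perfectly distinguishable from held-out ones.
    """
    member_d = [nearest_distance(r, synthetic) for r in train]
    nonmember_d = [nearest_distance(r, synthetic) for r in holdout]
    wins = 0.0
    for m in member_d:
        for n in nonmember_d:
            if m < n:
                wins += 1.0   # member looks closer: attacker ranks it correctly
            elif m == n:
                wins += 0.5   # tie counts as a coin flip
    return wins / (len(member_d) * len(nonmember_d))


def toy_records(n, seed):
    """Hypothetical stand-in for normalized numeric patient features."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(n)]
```

Calling `membership_attack_auc(train, holdout, synthetic)` with a held-out set drawn from the same population as the training set gives a single score to track. A value near 0.5 suggests this particular attacker learns little from the synthetic data, while a generator that memorizes training records pushes the score toward 1.0. A good score is evidence against this one attack, not proof of privacy.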

Usage in Healthcare

EHR-Safe targets a critical bottleneck in digital health: the difficulty of sharing data under HIPAA and other patient privacy constraints. A hospital trains EHR-Safe on its internal, sensitive data and then generates a fully synthetic “twin” dataset. That synthetic dataset can be shared far more readily with external academic researchers or commercial partners to accelerate algorithm development and software testing, with a much lower risk of PHI leakage than sharing the real records.
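
Before a synthetic twin is handed to partners, a hospital might check that algorithms developed on it transfer to real patients. The sketch below shows a "train on synthetic, test on real" comparison under stated assumptions: it uses a deliberately simple nearest-centroid classifier, and every name in it (`fit_centroids`, `utility_gap`, `toy_cohort`) is hypothetical rather than part of EHR-Safe. Each dataset is assumed to be a `(features, labels)` pair of numeric vectors and class labels.

```python
import math
import random


def fit_centroids(features, labels):
    """Nearest-centroid classifier: one mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))


def accuracy(centroids, features, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


def utility_gap(real_train, synthetic_train, real_test):
    """Real-test accuracy of a real-trained model minus a synthetic-trained one."""
    real_acc = accuracy(fit_centroids(*real_train), *real_test)
    synth_acc = accuracy(fit_centroids(*synthetic_train), *real_test)
    return real_acc - synth_acc


def toy_cohort(n, seed, flip_labels=False):
    """Hypothetical two-class cohort; flip_labels mimics a poor generator."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        features.append([rng.gauss(2.0 * y, 1.0), rng.gauss(-1.0 * y, 1.0)])
        labels.append(1 - y if flip_labels else y)
    return features, labels
```

A gap close to zero means the synthetic-trained model does about as well on real test data as the real-trained one, which is the property the High Fidelity claim describes. A large positive gap signals that the synthetic data has lost structure the downstream task depends on.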