EHR-Safe

Synthetic Data Generation · General · Apache 2.0 · Privacy Preserving

Technical Summary

EHR-Safe is a generative modeling framework developed by Google Research that synthesizes highly realistic, privacy-preserving electronic health record (EHR) data.

Key Capabilities

Dual-Modality Generation: Capable of generating both static patient demographic information and dynamic, time-series medical events (e.g., lab results over time, sequential diagnoses, vitals).
High Fidelity: Preserves the complex statistical properties, correlations, and longitudinal patterns of the original real-world dataset, so that models trained on EHR-Safe synthetic data perform comparably to models trained on the real data.
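Privacy Protection: Evaluated empirically against membership inference attacks (can an adversary tell that a specific patient was in the training set?) and attribute inference attacks (can an adversary infer a patient's sensitive attributes?). These evaluations indicate a low risk that individual patients from the original training set can be re-identified in the synthetic output, although they are empirical results rather than a formal guarantee.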
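One common way to check a fidelity claim like this is a "train on synthetic, test on real" (TSTR) comparison: fit the same model once on real training data and once on synthetic data, then score both on held-out real data. The sketch below illustrates that comparison in plain Python. The record layout, the nearest-centroid classifier, and every function name are illustrative assumptions, not EHR-Safe's code or its published evaluation protocol.

```python
"""Train-on-synthetic, test-on-real (TSTR) sketch.

Assumptions (not from EHR-Safe): records are flat numeric feature vectors
with an integer label, and a nearest-centroid classifier stands in for
the downstream model.
"""
from typing import Dict, List, Sequence, Tuple

Record = Tuple[Sequence[float], int]  # (features, label)


def fit_centroids(data: List[Record]) -> Dict[int, List[float]]:
    """Average the feature vectors of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids: Dict[int, List[float]], x: Sequence[float]) -> int:
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def accuracy(centroids: Dict[int, List[float]], test: List[Record]) -> float:
    """Fraction of test records whose label is predicted correctly."""
    return sum(predict(centroids, x) == y for x, y in test) / len(test)


def tstr_comparison(real_train: List[Record], synthetic: List[Record],
                    real_test: List[Record]) -> Tuple[float, float]:
    """Return (train-on-real accuracy, train-on-synthetic accuracy), both scored on real_test."""
    trtr = accuracy(fit_centroids(real_train), real_test)
    tstr = accuracy(fit_centroids(synthetic), real_test)
    return trtr, tstr


if __name__ == "__main__":
    real_train = [([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.9, 1.0], 1), ([1.0, 0.8], 1)]
    synthetic = [([0.15, 0.1], 0), ([0.25, 0.2], 0), ([0.85, 0.95], 1), ([1.1, 0.9], 1)]
    real_test = [([0.0, 0.1], 0), ([0.3, 0.2], 0), ([0.8, 0.9], 1), ([0.95, 1.05], 1)]
    print(tstr_comparison(real_train, synthetic, real_test))  # (1.0, 1.0)
```

If the train-on-synthetic accuracy falls well below the train-on-real accuracy, the synthetic data has likely lost signal that the downstream task depends on. Real evaluations would use a stronger model and metrics such as AUROC on much larger datasets.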

Usage in Healthcare

EHR-Safe addresses a critical bottleneck in digital health: restrictions on sharing patient data imposed by HIPAA and broader privacy concerns. By training EHR-Safe on its internal, sensitive data, a hospital can generate a fully synthetic “twin” dataset. This synthetic dataset can then be shared far more readily with external academic researchers or commercial partners to accelerate algorithm development and software testing, with a much lower risk of leaking protected health information (PHI).
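Before a synthetic “twin” is shared, a data holder could run an empirical check in the spirit of the membership inference evaluations mentioned earlier. The sketch below is a generic distance-to-closest-record attack written for illustration: it guesses that a candidate patient was in the training set when some synthetic record lies very close to it. The Euclidean metric, the fixed `threshold`, and the function names are all assumptions; this is not EHR-Safe's published evaluation protocol.

```python
"""Distance-to-closest-record membership-inference sketch (illustrative only).

Assumptions (not from EHR-Safe): records are numeric feature vectors, and the
attacker guesses "member" when a candidate's nearest synthetic record lies
closer than a fixed threshold.
"""
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_distance(x: Vector, synthetic: List[Vector]) -> float:
    """Euclidean distance from x to its closest synthetic record."""
    return min(math.dist(x, s) for s in synthetic)


def membership_attack_accuracy(members: List[Vector], non_members: List[Vector],
                               synthetic: List[Vector], threshold: float) -> float:
    """Share of candidates whose training-set membership the attack guesses correctly.

    On a balanced candidate set, about 0.5 is chance level; values near 1.0
    mean synthetic records sit suspiciously close to real training patients.
    """
    correct = sum(nearest_distance(x, synthetic) < threshold for x in members)
    correct += sum(nearest_distance(x, synthetic) >= threshold for x in non_members)
    return correct / (len(members) + len(non_members))


if __name__ == "__main__":
    members = [[0.0, 0.0], [1.0, 1.0]]      # records used to train the generator
    non_members = [[0.5, 0.5], [2.0, 2.0]]  # real records held out from training
    leaky = [[0.0, 0.01], [1.0, 0.99]]      # near-copies of training records
    spread = [[0.5, 0.0], [0.0, 1.5]]       # no record within the threshold of any candidate
    print(membership_attack_accuracy(members, non_members, leaky, 0.1))   # 1.0
    print(membership_attack_accuracy(members, non_members, spread, 0.1))  # 0.5
```

In practice the threshold would be calibrated (for example, from distances between real records) and the candidate sets would be far larger. A result near 0.5 is evidence against leakage under this particular attack, not proof that no leakage is possible.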

Source verified by OpenPHR Catalog.