Dataset

Cervical Cancer Risk Classification

Clinical / Demographics · Oncology / Women's Health · CC0: Public Domain · De-identified

Technical Summary

This dataset contains demographic information, lifestyle habits, and historical medical records for a set of patients at the Hospital Universitario de Caracas in Caracas, Venezuela.

Key Capabilities

  • Risk Factor Analysis: Contains 858 records with 32 attributes, including patient age, number of sexual partners, age at first sexual intercourse, number of pregnancies, smoking habits, hormonal contraceptive use, IUD use, STDs, and prior diagnoses.
  • Multi-Target Prediction: Includes four distinct target variables related to cervical cancer screening results: Hinselmann, Schiller, Cytology, and Biopsy.
  • Imbalanced Data Benchmark: Commonly used by machine learning researchers as a benchmark for techniques that handle missing values and highly imbalanced classes in healthcare data.
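
The two data-quality issues named above, missing values and class imbalance, can be inspected before any modeling. The sketch below is illustrative only: the rows are made up, the column names and the `?` missing-value token are assumptions about the CSV layout, and `load_rows`, `missing_fraction`, and `class_weights` are hypothetical helpers, not part of any published tooling for this dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: made-up rows in the layout assumed for the CSV.
# Column names and the "?" missing-value token are assumptions.
SAMPLE_CSV = """Age,Number of sexual partners,Smokes,Hormonal Contraceptives,Biopsy
18,4.0,0.0,0.0,0
15,1.0,0.0,?,0
34,?,0.0,1.0,0
52,5.0,1.0,1.0,1
46,3.0,0.0,?,0
"""


def load_rows(text, missing_token="?"):
    """Parse CSV text into dicts of floats, mapping the missing token to None."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (None if value == missing_token else float(value))
         for key, value in raw.items()}
        for raw in reader
    ]


def missing_fraction(rows):
    """Return the fraction of missing entries for each column."""
    columns = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) / len(rows) for c in columns}


def class_weights(rows, target):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(int(r[target]) for r in rows)
    n = len(rows)
    return {label: n / (len(counts) * count) for label, count in counts.items()}


rows = load_rows(SAMPLE_CSV)
missing = missing_fraction(rows)
weights = class_weights(rows, "Biopsy")

print(missing)
print(weights)
```

Weights of this form give the rare positive class more influence during training. Whether reweighting, resampling, or another approach works best for this dataset is left open here.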

Usage in Healthcare

Data scientists use this dataset to train predictive models that identify women at high risk for cervical cancer based on their clinical history and lifestyle factors. Such models could potentially be integrated into EHR systems as clinical decision support tools that recommend preventive screening (such as Pap smears or HPV tests) for high-risk individuals who might otherwise be overlooked.
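
When a model's risk score is used to flag patients for screening, the choice of threshold trades missed cases against unnecessary referrals. The sketch below shows that trade-off on made-up labels and scores. `screening_metrics` is a hypothetical helper, and the numbers are not results from this dataset or from any trained model.

```python
def screening_metrics(y_true, scores, threshold):
    """Compute sensitivity and precision when flagging scores >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(y_true, flagged) if y == 1 and f)
    fn = sum(1 for y, f in zip(y_true, flagged) if y == 1 and not f)
    fp = sum(1 for y, f in zip(y_true, flagged) if y == 0 and f)
    return {
        # Share of true positive cases that the threshold catches.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of flagged patients who are true positive cases.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "flagged": sum(flagged),
    }


# Made-up example: 1 marks a positive label, scores are hypothetical model outputs.
y_true = [0, 0, 1, 0, 1, 0, 0, 0]
scores = [0.05, 0.20, 0.35, 0.10, 0.80, 0.40, 0.02, 0.15]

metrics_high = screening_metrics(y_true, scores, threshold=0.5)
metrics_low = screening_metrics(y_true, scores, threshold=0.3)

print(metrics_high)
print(metrics_low)
```

In this toy example, lowering the threshold catches both positive cases at the cost of one extra referral. On imbalanced data, plain accuracy would hide this difference, which is why sensitivity and precision are reported separately.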