tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1 module

Day intervals cohort module for MIMIC-IV v1.0.

Based on: https://github.com/healthylaife/MIMIC-IV-Data-Pipeline preprocessing/day_intervals_preproc/day_intervals_cohort.py

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.get_visit_pts(mimic4_path: str, group_col: str, visit_col: str, admit_col: str, disch_col: str, adm_visit_col: str | None, use_admn: bool, disease_label: str | None, use_ICU: bool) → DataFrame

Combines the MIMIC-IV core/patients table information with either the icu/icustays or core/admissions data.

Parameters:

mimic4_path : str: Path to the mimic-iv folder containing MIMIC-IV data.
group_col : str: Patient identifier to group patients (normally "subject_id").
visit_col : str: Visit identifier for individual patient visits (normally "hadm_id" or "stay_id").
admit_col : str: Column for visit start date information (normally "admittime" or "intime").
disch_col : str: Column for visit end date information (normally "dischtime" or "outtime").
adm_visit_col : Optional[str]: Column holding the admission-level visit identifier (normally "hadm_id").
use_admn : bool: Flag of whether to use the readmission label.
disease_label : Optional[str]: A disease filter to apply to the label (i.e. “admitted due to”). If None, no filter is applied.
use_ICU : bool: Whether to look specifically at ICU visits in icu/icustays (True) or at general admissions from core/admissions (False).

Returns:

The processed dataframe.

Return type:

pd.DataFrame

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.validate_row(row, ctrl, invalid, max_year, disch_col, valid_col, gap)

Checks whether a visit’s prediction window potentially extends beyond the dataset range (2008-2019). An ‘invalid row’ is NOT guaranteed to be outside the range, only potentially outside, because MIMIC-IV is de-identified using 3-year time ranges.

To be invalid, the year in which the prediction window ends must extend both beyond the maximum year seen for a patient AND beyond the year that corresponds to the 2017-2019 anchor year range for that patient.
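The two-part rule above can be sketched as a small predicate. This is only an illustration of the rule as described, not the module's validate_row (which also receives the row and the ctrl/invalid containers). The function name and all four parameter names are hypothetical, and it assumes the prediction window ends at the discharge time plus the gap.

```python
from datetime import datetime, timedelta


def is_potentially_out_of_range(disch_time, gap, max_year, valid_year):
    """Return True if the prediction window may end outside the dataset range.

    Hypothetical helper: `max_year` stands for the maximum year seen for the
    patient, `valid_year` for the year tied to the 2017-2019 anchor range.
    """
    # Assumption: the prediction window ends at discharge time + gap.
    pred_year = (disch_time + gap).year
    # A row is flagged only if BOTH conditions from the description hold.
    return pred_year > max_year and pred_year > valid_year


# The window ends in January 2019, past both reference years (2018),
# so this visit is flagged as potentially out of range.
flagged = is_potentially_out_of_range(
    datetime(2018, 12, 20), timedelta(days=30), max_year=2018, valid_year=2018
)
```

If either reference year is at least as late as the window's end year, the visit is treated as valid.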

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_los(df: DataFrame, los: int, group_col: str, admit_col: str, disch_col: str) → tuple[DataFrame, DataFrame]

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_readmit(df: DataFrame, gap: timedelta, group_col: str, admit_col: str, disch_col: str) → tuple[DataFrame, DataFrame, DataFrame]

Applies labels to individual visits according to whether or not a readmission has occurred within the specified gap. For a given visit to receive a positive readmission label, another visit by the same patient must occur within the gap window. The gap window starts at the disch_col time, and the admit_col times of subsequent visits are checked against it.
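The labelling rule described for partition_by_readmit can be sketched in plain Python. This is a simplified illustration, not the module's code: it works on a list of dicts instead of a DataFrame, returns labels rather than partitioned frames, and omits the separate set of invalid visits. The function name label_readmissions is hypothetical, and treating the window as (discharge, discharge + gap] is an assumption about the boundaries.

```python
from datetime import datetime, timedelta


def label_readmissions(visits, gap):
    """Map each hadm_id to 1 if a later visit starts within the gap window."""
    # Group visits per patient, mirroring the role of group_col.
    by_patient = {}
    for visit in visits:
        by_patient.setdefault(visit["subject_id"], []).append(visit)

    labels = {}
    for patient_visits in by_patient.values():
        patient_visits.sort(key=lambda v: v["admittime"])
        for i, visit in enumerate(patient_visits):
            # The window opens at discharge and closes `gap` later.
            window_end = visit["dischtime"] + gap
            labels[visit["hadm_id"]] = int(any(
                visit["dischtime"] < later["admittime"] <= window_end
                for later in patient_visits[i + 1:]
            ))
    return labels


visits = [
    {"subject_id": 1, "hadm_id": 10,
     "admittime": datetime(2015, 1, 1), "dischtime": datetime(2015, 1, 5)},
    {"subject_id": 1, "hadm_id": 11,
     "admittime": datetime(2015, 1, 20), "dischtime": datetime(2015, 1, 25)},
    {"subject_id": 1, "hadm_id": 12,
     "admittime": datetime(2015, 6, 1), "dischtime": datetime(2015, 6, 3)},
    {"subject_id": 2, "hadm_id": 20,
     "admittime": datetime(2015, 3, 1), "dischtime": datetime(2015, 3, 4)},
]
labels = label_readmissions(visits, timedelta(days=30))
```

With a 30-day gap, only visit 10 is labelled positive here, because visit 11 begins 15 days after its discharge.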

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_mort(df: DataFrame, group_col: str, admit_col: str, disch_col: str, death_col: str) → tuple[DataFrame, DataFrame]

Applies labels to individual visits according to whether or not a death has occurred between the times given by the specified admit_col and disch_col.
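A minimal sketch of the mortality rule, again in plain Python rather than pandas. The function name label_mortality is hypothetical, the "dod" key is only an assumed example of a death_col, and treating both interval ends as inclusive is an assumption, not something the description states.

```python
from datetime import datetime


def label_mortality(visits, admit_col, disch_col, death_col):
    """Map each hadm_id to 1 if the death time falls inside the visit."""
    labels = {}
    for visit in visits:
        death_time = visit.get(death_col)
        # No recorded death, or a death outside the visit, gives label 0.
        labels[visit["hadm_id"]] = int(
            death_time is not None
            and visit[admit_col] <= death_time <= visit[disch_col]
        )
    return labels


mort_visits = [
    # Death recorded during the stay: positive label.
    {"hadm_id": 1, "admittime": datetime(2015, 1, 1),
     "dischtime": datetime(2015, 1, 9), "dod": datetime(2015, 1, 8)},
    # Death recorded after discharge: negative label for this visit.
    {"hadm_id": 2, "admittime": datetime(2015, 2, 1),
     "dischtime": datetime(2015, 2, 5), "dod": datetime(2015, 4, 1)},
    # No death recorded.
    {"hadm_id": 3, "admittime": datetime(2015, 3, 1),
     "dischtime": datetime(2015, 3, 2), "dod": None},
]
mort_labels = label_mortality(mort_visits, "admittime", "dischtime", "dod")
```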

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.get_case_ctrls(df: DataFrame, gap: int | None, group_col: str, admit_col: str, disch_col: str, death_col: str, use_mort=False, use_admn=False, use_los=False) → tuple[DataFrame, DataFrame]

Handles logic for creating the labelled cohort based on arguments passed to extract_data().

Parameters:

df : pd.DataFrame: Dataframe with patient data.
gap : Optional[int]: Specified time interval gap for readmissions.
group_col : str: Patient identifier to group patients (normally "subject_id").
admit_col : str: Column for visit start date information (normally "admittime" or "intime").
disch_col : str: Column for visit end date information (normally "dischtime" or "outtime").
death_col : str: Column indicating death for the mortality label.
use_mort : bool, optional: Flag of whether to use the mortality label. Defaults to False.
use_admn : bool, optional: Flag of whether to use the readmission label. Defaults to False.
use_los : bool, optional: Flag of whether to use the length of stay label. Defaults to False.

Returns:

Processed dataframes, (cohort, invalid).

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.extract_data(version: str, use_ICU: bool, label: typing_extensions.Literal[Mortality, Readmission, Length of Stay], time: int, icd_code: str | None, root_dir: str, disease_label: str | None, cohort_output: str | None = None, summary_output: str | None = None) → tuple[DataFrame, str]

Prepare and save the cohort and summary files for the given data settings.

Note

Example disease codes for icd_code and disease_label are:

- Heart failure: "I50".
- CAD (Coronary Artery Disease): "I25".
- CKD (Chronic Kidney Disease): "N18".
- COPD (Chronic obstructive pulmonary disease): "J44".

Parameters:

version : str: MIMIC-IV version, e.g. "1.0".
use_ICU : bool: Flag indicating whether to extract ICU (True) or non-ICU (False) data.
label : Label: Label to use for the cohort.
time : int: The time associated with the label. If label is "Readmission", this is the gap between admissions in days. If label is "Length of Stay", this is the minimum length of stay to consider, in days. If label is "Mortality", this is ignored.
icd_code : Optional[str]: The ICD code to use as a disease filter for the cohort. If None, no filter is applied.
root_dir : str: Data root directory. The MIMIC version subdirectory (e.g. "1.0") is expected to be found under this.
disease_label : Optional[str]: A disease filter to apply to the label (i.e. “admitted due to”). If None, no filter is applied.
cohort_output : Optional[str], optional: Custom cohort file descriptor. If None, one is generated automatically based on the inputs. Defaults to None.
summary_output : Optional[str], optional: Custom summary file descriptor. If None, one is generated automatically based on the inputs. Defaults to None.

Returns:

(cohort, cohort_output): the cohort dataframe and the cohort file descriptor.

Return type:

Tuple[pd.DataFrame, str]