tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1 module

Day intervals cohort module for MIMIC-IV v1.0.

Based on: https://github.com/healthylaife/MIMIC-IV-Data-Pipeline preprocessing/day_intervals_preproc/day_intervals_cohort.py

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.get_visit_pts(mimic4_path: str, group_col: str, visit_col: str, admit_col: str, disch_col: str, adm_visit_col: str | None, use_admn: bool, disease_label: str | None, use_ICU: bool) -> DataFrame

Combines the MIMIC-IV core/patients table information with either the icu/icustays or core/admissions data.

Parameters:
mimic4_path : str

Path to mimic-iv folder containing MIMIC-IV data.

group_col : str

Patient identifier to group patients (normally "subject_id").

visit_col : str

Visit identifier for individual patient visits (normally "hadm_id" or "stay_id").

admit_col : str

Column for visit start date information (normally "admittime" or "intime").

disch_col : str

Column for visit end date information (normally "dischtime" or "outtime").

adm_visit_col : Optional[str]

Column for visit identifier for individual patient visits (normally "hadm_id").

use_admn : bool

Whether to use the readmission label.

disease_label : Optional[str]

A disease filter to apply to the label (i.e. “admitted due to”). If None, no filter is applied.

use_ICU : bool

If True, use ICU visits from icu/icustays; if False, use general admissions from core/admissions.

Returns:

The processed dataframe.

Return type:

pd.DataFrame
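
The join described above can be pictured with a minimal pandas sketch. The toy tables, their values, and the choice of an inner join are illustrative assumptions; this is not the module's implementation, which also handles the disease filter and the ICU/non-ICU switch.

```python
import pandas as pd

# Toy stand-ins for core/patients and core/admissions (values are made up).
patients = pd.DataFrame({"subject_id": [1, 2], "gender": ["F", "M"]})
visits = pd.DataFrame(
    {
        "subject_id": [1, 1, 2],
        "hadm_id": [10, 11, 20],
        "admittime": pd.to_datetime(["2150-01-01", "2150-03-01", "2151-06-01"]),
        "dischtime": pd.to_datetime(["2150-01-05", "2150-03-04", "2151-06-09"]),
    }
)

# Attach patient-level information to every visit via the grouping column.
group_col = "subject_id"
visit_pts = visits.merge(patients, how="inner", on=group_col)
```

Each visit row keeps its own identifier and times and gains the patient-level columns of the matching subject_id.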

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.validate_row(row, ctrl, invalid, max_year, disch_col, valid_col, gap)

Checks whether a visit’s prediction window potentially extends beyond the dataset range (2008-2019). An ‘invalid row’ is NOT guaranteed to be outside the range, only potentially outside it, because MIMIC-IV is de-identified using 3-year time ranges.

For a row to be invalid, the year in which the prediction window ends must be later than both the maximum year seen for the patient AND the year that corresponds to the patient’s 2017-2019 anchor year range.
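
The two-part condition can be sketched as a small predicate. The function name, its arguments, and the assumption that the prediction window ends at discharge time plus the gap are hypothetical; validate_row itself operates on a row and on the ctrl/invalid dataframes.

```python
import pandas as pd


def is_potentially_invalid(disch_time, gap, max_year, valid_year):
    """Return True if the prediction window may extend past the dataset range.

    Hypothetical helper: `valid_year` stands for the year corresponding to the
    patient's 2017-2019 anchor year range, `max_year` for the latest year seen
    for that patient.
    """
    # Assumption: the window ends `gap` after the discharge time.
    pred_year = (disch_time + gap).year
    return pred_year > max_year and pred_year > valid_year


disch = pd.Timestamp("2180-12-20")
gap = pd.Timedelta(days=30)

# Window ends in 2181, later than both reference years.
flag_a = is_potentially_invalid(disch, gap, max_year=2180, valid_year=2180)
# Window ends in 2181, but the patient has data up to 2181.
flag_b = is_potentially_invalid(disch, gap, max_year=2181, valid_year=2180)
```

Both comparisons must hold, so a window that ends within the patient's observed years is never flagged.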

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_los(df: DataFrame, los: int, group_col: str, admit_col: str, disch_col: str) -> tuple[DataFrame, DataFrame]

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_readmit(df: DataFrame, gap: timedelta, group_col: str, admit_col: str, disch_col: str) -> tuple[DataFrame, DataFrame, DataFrame]

Applies labels to individual visits according to whether or not a readmission has occurred within the specified gap days. For a given visit, another visit must occur within the gap window for a positive readmission label. The gap window starts from the disch_col time and the admit_col of subsequent visits are considered.
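
A minimal sketch of this labelling rule on toy data follows. It is a simplification under stated assumptions: it only looks at each patient's next visit, uses made-up column values, and omits the handling of invalid rows and the three-way split that the function returns.

```python
import pandas as pd

# Toy visits (values are made up).
visits = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2],
        "admittime": pd.to_datetime(
            ["2150-01-01", "2150-01-20", "2150-06-01", "2151-03-01"]
        ),
        "dischtime": pd.to_datetime(
            ["2150-01-05", "2150-01-25", "2150-06-03", "2151-03-04"]
        ),
    }
)
gap = pd.Timedelta(days=30)

visits = visits.sort_values(["subject_id", "admittime"]).reset_index(drop=True)

# Admission time of the same patient's next visit (NaT for the last visit).
next_admit = visits.groupby("subject_id")["admittime"].shift(-1)

# Positive label when the next admission starts within `gap` of this discharge.
# Comparisons against NaT evaluate to False, so last visits get label 0.
visits["label"] = ((next_admit - visits["dischtime"]) <= gap).astype(int)
```

With visits sorted by admission time, checking only the next visit is enough for this toy case: if the next admission falls outside the window, later ones do as well.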

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_mort(df: DataFrame, group_col: str, admit_col: str, disch_col: str, death_col: str) -> tuple[DataFrame, DataFrame]

Applies labels to individual visits according to whether or not a death has occurred within the times of the specified admit_col and disch_col.
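
The mortality rule can be sketched on toy data as follows. The column name "dod" for death_col and the inclusive treatment of both interval endpoints are assumptions made for illustration, not confirmed details of the function.

```python
import pandas as pd

# Toy visits (values are made up); "dod" plays the role of death_col.
visits = pd.DataFrame(
    {
        "subject_id": [1, 2, 3],
        "admittime": pd.to_datetime(["2150-01-01", "2150-02-01", "2150-03-01"]),
        "dischtime": pd.to_datetime(["2150-01-05", "2150-02-10", "2150-03-04"]),
        "dod": pd.to_datetime(["2150-01-04", "2150-05-01", None]),
    }
)

# Positive label when the death time falls between admission and discharge.
# A missing death time (NaT) compares as False, giving label 0.
died_in_visit = visits["dod"].between(visits["admittime"], visits["dischtime"])
visits["label"] = died_in_visit.astype(int)
```

Only the first patient dies during the visit; the second dies after discharge and the third has no recorded death.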

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.get_case_ctrls(df: DataFrame, gap: int | None, group_col: str, admit_col: str, disch_col: str, death_col: str, use_mort=False, use_admn=False, use_los=False) -> tuple[DataFrame, DataFrame]

Handles logic for creating the labelled cohort based on arguments passed to extract_data().

Parameters:
df : pd.DataFrame

Dataframe with patient data.

gap : Optional[int]

Specified time interval gap for readmissions.

group_col : str

Patient identifier to group patients (normally "subject_id").

admit_col : str

Column for visit start date information (normally "admittime" or "intime").

disch_col : str

Column for visit end date information (normally "dischtime" or "outtime").

death_col : str

Column indicating death for the mortality label.

use_mort : bool, optional

Whether to use the mortality label. Defaults to False.

use_admn : bool, optional

Whether to use the readmission label. Defaults to False.

use_los : bool, optional

Whether to use the length of stay label. Defaults to False.

Returns:

Processed dataframes, (cohort, invalid).

Return type:

Tuple[pd.DataFrame, pd.DataFrame]

tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.extract_data(version: str, use_ICU: bool, label: typing_extensions.Literal["Mortality", "Readmission", "Length of Stay"], time: int, icd_code: str | None, root_dir: str, disease_label: str | None, cohort_output: str | None = None, summary_output: str | None = None) -> tuple[DataFrame, str]

Prepare and save the cohort and summary files for the given data settings.

Note

Example disease codes for icd_code and disease_label are:

- Heart failure: "I50".
- CAD (Coronary Artery Disease): "I25".
- CKD (Chronic Kidney Disease): "N18".
- COPD (Chronic Obstructive Pulmonary Disease): "J44".

Parameters:
version : str

MIMIC-IV version, e.g. "1.0".

use_ICU : bool

Whether to extract ICU (True) or non-ICU (False) data.

label : Label

Label to use for the cohort.

time : int

The time associated with the label. If label is "Readmission", this is the gap between admissions in days. If label is "Length of Stay", this is the minimum length of stay to consider, in days. If label is "Mortality", this is ignored.

icd_code : Optional[str]

The ICD code to use as a disease filter for the cohort. If None, no filter is applied.

root_dir : str

Data root directory. The MIMIC version subdirectory (e.g. "1.0") is expected to be found under this.

disease_label : Optional[str]

A disease filter to apply to the label (i.e. “admitted due to”). If None, no filter is applied.

cohort_output : Optional[str], optional

Custom cohort file descriptor. If None, one is generated automatically based on the inputs. Defaults to None.

summary_output : Optional[str], optional

Custom summary file descriptor. If None, one is generated automatically based on the inputs. Defaults to None.

Returns:

(cohort, cohort_output), cohort dataframe and the cohort file descriptor.

Return type:

Tuple[pd.DataFrame, str]
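
A call to extract_data might look like the sketch below. The settings are hypothetical: the root directory is a placeholder, and the combination of a 30-day readmission label with the heart failure code "I50" is chosen only for illustration. The keyword names follow the signature documented above.

```python
# Hypothetical settings; adjust the path and filters to the local setup.
settings = dict(
    version="1.0",
    use_ICU=True,
    label="Readmission",
    time=30,  # gap between admissions in days, since label is "Readmission"
    icd_code=None,  # no ICD filter on the cohort
    root_dir="/path/to/mimiciv",  # placeholder; must contain the "1.0" subdirectory
    disease_label="I50",  # heart failure, from the example codes
    cohort_output=None,  # let the function generate the file descriptor
    summary_output=None,
)


def build_cohort(kwargs):
    """Run the extraction; requires tempor and a local copy of MIMIC-IV."""
    # Imported lazily so that defining this sketch does not need tempor installed.
    from tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1 import (
        extract_data,
    )

    cohort, cohort_output = extract_data(**kwargs)
    return cohort, cohort_output
```

Calling build_cohort(settings) would return the cohort dataframe together with the cohort file descriptor, as described under Returns.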