tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1 module
Day intervals cohort module for MIMIC-IV v1.0.
Based on:
https://github.com/healthylaife/MIMIC-IV-Data-Pipeline
preprocessing/day_intervals_preproc/day_intervals_cohort.py
- tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.get_visit_pts(mimic4_path: str, group_col: str, visit_col: str, admit_col: str, disch_col: str, adm_visit_col: str | None, use_admn: bool, disease_label: str | None, use_ICU: bool) → DataFrame
Combines the MIMIC-IV core/patients table information with either the icu/icustays or core/admissions data.
- Parameters:
- mimic4_path : str
Path to mimic-iv folder containing MIMIC-IV data.
- group_col : str
Patient identifier to group patients (normally "subject_id").
- visit_col : str
Visit identifier for individual patient visits (normally "hadm_id" or "stay_id").
- admit_col : str
Column for visit start date information (normally "admittime" or "intime").
- disch_col : str
Column for visit end date information (normally "dischtime" or "outtime").
- adm_visit_col : Optional[str]
Column for visit identifier for individual patient visits (normally "hadm_id").
- use_admn : bool
Flag of whether to use the readmission label. Defaults to False.
- disease_label : Optional[str]
A disease filter to apply to the label (i.e. “admitted due to”). If None, no filter is applied.
- use_ICU : bool
Describes whether to specifically look at ICU visits in icu/icustays OR look at general admissions from core/admissions.
- Returns:
The processed dataframe.
- Return type:
pd.DataFrame
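As a rough illustration of the combination step described above, the sketch below joins a toy visits table to a toy patients table on the patient identifier. The helper name combine_visits_with_patients, the inner join, and all values are assumptions made for this example; the real function reads the MIMIC-IV files itself and may apply further processing.

```python
import pandas as pd

# Toy stand-ins for core/patients and icu/icustays (all values are invented).
patients = pd.DataFrame({
    "subject_id": [1, 2],
    "gender": ["F", "M"],
    "anchor_age": [64, 51],
})
visits = pd.DataFrame({
    "subject_id": [1, 1, 2],
    "stay_id": [10, 11, 20],
    "intime": pd.to_datetime(["2150-01-01", "2150-03-01", "2160-05-05"]),
    "outtime": pd.to_datetime(["2150-01-04", "2150-03-02", "2160-05-09"]),
})


def combine_visits_with_patients(visits, patients, group_col):
    """Hypothetical helper: attach patient-level columns to every visit row."""
    # Inner join (an assumption): keep only visits whose patient is present
    # in the patients table.
    return visits.merge(patients, how="inner", on=group_col)


visit_pts = combine_visits_with_patients(visits, patients, "subject_id")
```

Each visit row keeps its own stay_id, intime and outtime, and gains the patient-level columns of the matching subject_id.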
- tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.validate_row(row, ctrl, invalid, max_year, disch_col, valid_col, gap)
Checks whether a visit’s prediction window potentially extends beyond the dataset range (2008-2019). An ‘invalid row’ is NOT guaranteed to be outside the range, only potentially outside it, because MIMIC-IV is de-identified using 3-year time ranges.
To be invalid, the year in which the prediction window ends must extend both beyond the maximum seen year for a patient AND beyond the year that corresponds to the 2017-2019 anchor year range for that patient.
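The two-part condition above can be sketched as a small predicate. This is a minimal illustration rather than the module's code: the helper name is_potentially_invalid and its argument names are hypothetical, and it assumes the prediction window ends at the discharge time plus the gap.

```python
from datetime import timedelta

import pandas as pd


def is_potentially_invalid(disch_time, gap, max_year, valid_year):
    """Hypothetical predicate mirroring the two-part condition in the text."""
    # Year in which the prediction window ends (assumed: discharge + gap).
    pred_year = (disch_time + gap).year
    # Invalid only if that year exceeds BOTH the patient's maximum seen year
    # and the year corresponding to the 2017-2019 anchor year range.
    return pred_year > max_year and pred_year > valid_year


gap = timedelta(days=30)
# Window ends in January of the following year: flagged.
late = is_potentially_invalid(pd.Timestamp("2180-12-20"), gap, 2180, 2180)
# Window ends within the same year: not flagged.
early = is_potentially_invalid(pd.Timestamp("2180-06-01"), gap, 2180, 2180)
```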
- tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_los(df: DataFrame, los: int, group_col: str, admit_col: str, disch_col: str) → tuple[DataFrame, DataFrame]
- tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_readmit(df: DataFrame, gap: timedelta, group_col: str, admit_col: str, disch_col: str) → tuple[DataFrame, DataFrame, DataFrame]
Applies labels to individual visits according to whether or not a readmission has occurred within the specified gap. For a given visit, another visit must occur within the gap window for a positive readmission label. The gap window starts from the disch_col time, and the admit_col of each subsequent visit is considered.
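The labelling rule described above can be sketched with pandas as follows. The helper name label_readmissions, the label column, and the choice of an inclusive upper bound on the gap window are assumptions made for this example; the real function returns three dataframes rather than a single labelled one.

```python
from datetime import timedelta

import pandas as pd


def label_readmissions(df, gap, group_col, admit_col, disch_col):
    """Hypothetical helper: 1 if another visit starts within `gap` of discharge."""
    labels = pd.Series(0, index=df.index)
    for _, group in df.groupby(group_col):
        for idx, row in group.iterrows():
            window_end = row[disch_col] + gap
            admits = group[admit_col]
            # A later admission of the same patient inside (discharge, discharge + gap].
            hit = ((admits > row[disch_col]) & (admits <= window_end)).any()
            labels.loc[idx] = int(hit)
    return df.assign(label=labels)


# Toy visits (invented values): patient 1 returns 15 days after the first discharge.
visits = pd.DataFrame({
    "subject_id": [1, 1, 1, 2],
    "admittime": pd.to_datetime(["2150-01-01", "2150-01-20", "2150-06-01", "2160-05-05"]),
    "dischtime": pd.to_datetime(["2150-01-05", "2150-01-25", "2150-06-03", "2160-05-09"]),
})
labelled = label_readmissions(visits, timedelta(days=30), "subject_id", "admittime", "dischtime")
```

Only the first visit of patient 1 receives a positive label, because the third visit starts well after the 30-day window that follows the second discharge.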
- tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.partition_by_mort(df: DataFrame, group_col: str, admit_col: str, disch_col: str, death_col: str) → tuple[DataFrame, DataFrame]
Applies labels to individual visits according to whether or not a death has occurred within the times of the specified admit_col and disch_col.
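A minimal sketch of this in-visit mortality rule is shown below. The helper name label_in_visit_death, the label column, the column name dod, and the inclusive comparison at both ends are assumptions for illustration, not the module's actual implementation.

```python
import pandas as pd


def label_in_visit_death(df, admit_col, disch_col, death_col):
    """Hypothetical helper: 1 if the death time falls between admission and discharge."""
    died = (
        df[death_col].notna()
        & (df[death_col] >= df[admit_col])
        & (df[death_col] <= df[disch_col])
    )
    return df.assign(label=died.astype(int))


# Toy visits (invented values): one in-visit death, one survivor, one later death.
visits = pd.DataFrame({
    "admittime": pd.to_datetime(["2150-01-01", "2150-02-01", "2150-03-01"]),
    "dischtime": pd.to_datetime(["2150-01-05", "2150-02-03", "2150-03-04"]),
    "dod": pd.to_datetime(["2150-01-03", None, "2151-01-01"]),
})
labelled = label_in_visit_death(visits, "admittime", "dischtime", "dod")
```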
- tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.get_case_ctrls(df: DataFrame, gap: int | None, group_col: str, admit_col: str, disch_col: str, death_col: str, use_mort=False, use_admn=False, use_los=False) → tuple[DataFrame, DataFrame]
Handles logic for creating the labelled cohort based on arguments passed to extract_data().
- Parameters:
- df : pd.DataFrame
Dataframe with patient data.
- gap : Optional[int]
Specified time interval gap for readmissions.
- group_col : str
Patient identifier to group patients (normally "subject_id").
- admit_col : str
Column for visit start date information (normally "admittime" or "intime").
- disch_col : str
Column for visit end date information (normally "dischtime" or "outtime").
- death_col : str
Column indicating death for the mortality label.
- use_mort : bool, optional
Flag of whether to use the mortality label. Defaults to False.
- use_admn : bool, optional
Flag of whether to use the readmission label. Defaults to False.
- use_los : bool, optional
Flag of whether to use the length of stay label. Defaults to False.
- Returns:
Processed dataframes, (cohort, invalid).
- Return type:
Tuple[pd.DataFrame, pd.DataFrame]
- tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1.extract_data(version: str, use_ICU: bool, label: typing_extensions.Literal["Mortality", "Readmission", "Length of Stay"], time: int, icd_code: str | None, root_dir: str, disease_label: str | None, cohort_output: str | None = None, summary_output: str | None = None) → tuple[DataFrame, str]
Prepare and save the cohort and summary files for the given data settings.
Note
Example disease codes for icd_code and disease_label are:
  - Heart failure: "I50".
  - CAD (Coronary Artery Disease): "I25".
  - CKD (Chronic Kidney Disease): "N18".
  - COPD (Chronic Obstructive Pulmonary Disease): "J44".
- Parameters:
- version : str
MIMIC-IV version, e.g. "1.0".
- use_ICU : bool
Flag indicating whether to extract the ICU (True) or non-ICU (False) data.
- label : Label
Label to use for the cohort.
- time : int
The time associated with the label. If label is "Readmission", this is the gap between admissions in days. If label is "Length of Stay", this is the minimum length of stay to consider, in days. If label is "Mortality", this is ignored.
- icd_code : Optional[str]
The ICD code to use as a disease filter for the cohort. If None, no filter is applied.
- root_dir : str
Data root directory. The MIMIC version subdirectory (e.g. "1.0") is expected to be found under this.
- disease_label : Optional[str]
A disease filter to apply to the label (i.e. “admitted due to”). If None, no filter is applied.
- cohort_output : Optional[str], optional
Custom cohort file descriptor. If None, one is generated automatically based on the inputs. Defaults to None.
- summary_output : Optional[str], optional
Custom summary file descriptor. If None, one is generated automatically based on the inputs. Defaults to None.
- Returns:
(cohort, cohort_output), the cohort dataframe and the cohort file descriptor.
- Return type:
Tuple[pd.DataFrame, str]
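A call might look like the sketch below, which uses only the parameters documented above. The wrapper name build_readmission_cohort is hypothetical, the argument values are examples rather than recommendations, and running it assumes the tempor package is installed and that the MIMIC-IV files are available under root_dir.

```python
def build_readmission_cohort(root_dir):
    """Hypothetical wrapper: 30-day ICU readmission cohort filtered to heart failure."""
    # Imported lazily so that defining this helper does not require the
    # tempor package to be installed.
    from tempor.datasources.mivdp.preproc.cohort.day_intervals_cohort_v1 import extract_data

    cohort, cohort_output = extract_data(
        version="1.0",
        use_ICU=True,
        label="Readmission",
        time=30,  # gap between admissions, in days
        icd_code="I50",  # heart failure, per the note above
        root_dir=root_dir,
        disease_label=None,  # no filter on the label
        cohort_output=None,  # let the function generate the descriptor
        summary_output=None,
    )
    return cohort, cohort_output
```

Leaving cohort_output and summary_output as None lets the function derive the file descriptors from the other inputs, as described in the parameter list.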