sessionInfo()
Biostat 203B Homework 3
Due Feb 23 @ 11:59PM
Display machine information for reproducibility:
Load necessary libraries (you can add more as needed).
library(arrow)
library(gtsummary)
library(memuse)
library(pryr)
library(R.utils)
library(tidyverse)
Display your machine memory.
::Sys.meminfo() memuse
In this exercise, we use tidyverse (ggplot2, dplyr, etc) to explore the MIMIC-IV data introduced in homework 1 and to build a cohort of ICU stays.
Q1. Visualizing patient trajectory
Visualizing a patient’s encounters in a health care system is a common task in clinical data analysis. In this question, we will visualize a patient’s ADT (admission-discharge-transfer) history and ICU vitals in the MIMIC-IV data.
Q1.1 ADT history
A patient’s ADT history records the time of admission, discharge, and transfer in the hospital. This figure shows the ADT history of the patient with subject_id
10001217 in the MIMIC-IV data. The x-axis is the calendar time, and the y-axis is the type of event (ADT, lab, procedure). The color of the line segment represents the care unit. The size of the line segment represents whether the care unit is an ICU/CCU. The crosses represent lab events, and the shape of the dots represents the type of procedure. The title of the figure shows the patient’s demographic information and the subtitle shows top 3 diagnoses.
Do a similar visualization for the patient with
subject_id
10013310 using ggplot.
Hint: We need to pull information from data files patients.csv.gz
, admissions.csv.gz
, transfers.csv.gz
, labevents.csv.gz
, procedures_icd.csv.gz
, diagnoses_icd.csv.gz
, d_icd_procedures.csv.gz
, and d_icd_diagnoses.csv.gz
. For the big file labevents.csv.gz
, use the Parquet format you generated in Homework 2. For reproducibility, make the Parquet folder labevents_pq
available at the current working directory hw3
, for example, by a symbolic link. Make your code reproducible.
Q1.2 ICU stays
ICU stays are a subset of ADT history. This figure shows the vitals of the patient 10001217
during ICU stays. The x-axis is the calendar time, and the y-axis is the value of the vital. The color of the line represents the type of vital. The facet grid shows the abbreviation of the vital and the stay ID.
Do a similar visualization for the patient 10013310
.
Q2. ICU stays
icustays.csv.gz
(https://mimic.mit.edu/docs/iv/modules/icu/icustays/) contains data about Intensive Care Units (ICU) stays. The first 10 lines are
zcat < ~/mimic/icu/icustays.csv.gz | head
Q2.1 Ingestion
Import icustays.csv.gz
as a tibble icustays_tble
.
Q2.2 Summary and visualization
How many unique subject_id
? Can a subject_id
have multiple ICU stays? Summarize the number of ICU stays per subject_id
by graphs.
Q3. admissions
data
Information of the patients admitted into hospital is available in admissions.csv.gz
. See https://mimic.mit.edu/docs/iv/modules/hosp/admissions/ for details of each field in this file. The first 10 lines are
zcat < ~/mimic/hosp/admissions.csv.gz | head
Q3.1 Ingestion
Import admissions.csv.gz
as a tibble admissions_tble
.
Q3.2 Summary and visualization
Summarize the following information by graphics and explain any patterns you see.
- number of admissions per patient
- admission hour (anything unusual?)
- admission minute (anything unusual?)
- length of hospital stay (from admission to discharge) (anything unusual?)
According to the MIMIC-IV documentation,
All dates in the database have been shifted to protect patient confidentiality. Dates will be internally consistent for the same patient, but randomly distributed in the future. Dates of birth which occur in the present time are not true dates of birth. Furthermore, dates of birth which occur before the year 1900 occur if the patient is older than 89. In these cases, the patient’s age at their first admission has been fixed to 300.
Q4. patients
data
Patient information is available in patients.csv.gz
. See https://mimic.mit.edu/docs/iv/modules/hosp/patients/ for details of each field in this file. The first 10 lines are
zcat < ~/mimic/hosp/patients.csv.gz | head
Q4.1 Ingestion
Import patients.csv.gz
(https://mimic.mit.edu/docs/iv/modules/hosp/patients/) as a tibble patients_tble
.
Q4.2 Summary and visualization
Summarize variables gender
and anchor_age
by graphics, and explain any patterns you see.
Q5. Lab results
labevents.csv.gz
(https://mimic.mit.edu/docs/iv/modules/hosp/labevents/) contains all laboratory measurements for patients. The first 10 lines are
zcat < ~/mimic/hosp/labevents.csv.gz | head
d_labitems.csv.gz
(https://mimic.mit.edu/docs/iv/modules/hosp/d_labitems/) is the dictionary of lab measurements.
zcat < ~/mimic/hosp/d_labitems.csv.gz | head
We are interested in the lab measurements of creatinine (50912), potassium (50971), sodium (50983), chloride (50902), bicarbonate (50882), hematocrit (51221), white blood cell count (51301), and glucose (50931). Retrieve a subset of labevents.csv.gz
that only containing these items for the patients in icustays_tble
. Further restrict to the last available measurement (by storetime
) before the ICU stay. The final labevents_tble
should have one row per ICU stay and columns for each lab measurement.
Hint: Use the Parquet format you generated in Homework 2. For reproducibility, make labevents_pq
folder available at the current working directory hw3
, for example, by a symbolic link.
Q6. Vitals from charted events
chartevents.csv.gz
(https://mimic.mit.edu/docs/iv/modules/icu/chartevents/) contains all the charted data available for a patient. During their ICU stay, the primary repository of a patient’s information is their electronic chart. The itemid
variable indicates a single measurement type in the database. The value
variable is the value measured for itemid
. The first 10 lines of chartevents.csv.gz
are
zcat < ~/mimic/icu/chartevents.csv.gz | head
d_items.csv.gz
(https://mimic.mit.edu/docs/iv/modules/icu/d_items/) is the dictionary for the itemid
in chartevents.csv.gz
.
zcat < ~/mimic/icu/d_items.csv.gz | head
We are interested in the vitals for ICU patients: heart rate (220045), systolic non-invasive blood pressure (220179), diastolic non-invasive blood pressure (220180), body temperature in Fahrenheit (223761), and respiratory rate (220210). Retrieve a subset of chartevents.csv.gz
only containing these items for the patients in icustays_tble
. Further restrict to the first vital measurement within the ICU stay. The final chartevents_tble
should have one row per ICU stay and columns for each vital measurement.
Hint: Use the Parquet format you generated in Homework 2. For reproducibility, make chartevents_pq
folder available at the current working directory, for example, by a symbolic link.
Q7. Putting things together
Let us create a tibble mimic_icu_cohort
for all ICU stays, where rows are all ICU stays of adults (age at intime
>= 18) and columns contain at least following variables
- all variables in
icustays_tble
- all variables in
admissions_tble
- all variables in
patients_tble
- the last lab measurements before the ICU stay in
labevents_tble
- the first vital measurements during the ICU stay in
chartevents_tble
The final mimic_icu_cohort
should have one row per ICU stay and columns for each variable.
Q8. Exploratory data analysis (EDA)
Summarize the following information about the ICU stay cohort mimic_icu_cohort
using appropriate numerics or graphs:
Length of ICU stay
los
vs demographic variables (race, insurance, marital_status, gender, age at intime)Length of ICU stay
los
vs the last available lab measurements before ICU stayLength of ICU stay
los
vs the first vital measurements within the ICU stayLength of ICU stay
los
vs first ICU unit