Biostat 203B Homework 4

Due Mar 8 @ 11:59PM

Author

Your Name and UID

Display machine information:

sessionInfo()

Display my machine memory.

memuse::Sys.meminfo()

Load database libraries and the tidyverse frontend:

library(bigrquery)
library(dbplyr)
library(DBI)
library(gt)
library(gtsummary)
library(tidyverse)

Q1. Compile the ICU cohort in HW3 from the Google BigQuery database

Below is an outline of steps. In this homework, we exclusively work with the BigQuery database and should not use any MIMIC data files stored on our local computer. Transform data as much as possible in BigQuery database and collect() the tibble only at the end of Q1.7.

Q1.1 Connect to BigQuery

Authenticate with BigQuery using the service account token. Please place the service account token (shared via BruinLearn) in the working directory (same folder as your qmd file). Do not add this token to your git repository.

# path to the service account token 
satoken <- "biostat-203b-2024-winter-313290ce47a6.json"
# BigQuery authentication using service account
bq_auth(path = satoken)

Connect to BigQuery database mimic4_v2_2 in GCP (Google Cloud Platform), using the project billing account biostat-203b-2024-winter.

# connect to the BigQuery database `biostat-203b-2024-winter.mimic4_v2_2`
con_bq <- dbConnect(
    bigrquery::bigquery(),
    project = "biostat-203b-2024-winter",
    dataset = "mimic4_v2_2",
    billing = "biostat-203b-2024-winter"
)
con_bq

List all tables in the mimic4_v2_2 database.

dbListTables(con_bq)

Q1.2 icustays data

Connect to the icustays table.

# full ICU stays table
icustays_tble <- tbl(con_bq, "icustays") |>
  # show_query() |>
  print(width = Inf)

Q1.3 admissions data

Connect to the admissions table.

# # TODO
# admissions_tble <- 

Q1.4 patients data

Connect to the patients table.

# # TODO
# patients_tble <-

Q1.5 labevents data

Connect to the labevents table and retrieve a subset that only contain subjects who appear in icustays_tble and the lab items listed in HW3. Only keep the last lab measurements before the ICU stay and pivot lab items to become variables/columns. Write all steps in one chain of pipes.

# # TODO
# labevents_tble <- 

Q1.6 chartevents data

Connect to chartevents table and retrieve a subset that only contain subjects who appear in icustays_tble and the chart events listed in HW3. Only keep the first chart events during ICU stay and pivot chart events to become variables/columns. Write all steps in one chain of pipes.

# # TODO
# chartevents_tble <- 

Q1.7 Put things together

This step is similar to Q7 of HW3. Using one chain of pipes |> to perform following data wrangling steps: (i) start with the icustays_tble, (ii) merge in admissions and patients tables, (iii) keep adults only (age at ICU intime >= 18), (iv) merge in the labevents and chartevents tables, (v) collect the tibble.

# # TODO
# mimic_icu_cohort <- 

Q1.8 Preprocessing

Perform the following preprocessing steps. (i) Lump infrequent levels into “Other” level for first_careunit, last_careunit, admission_type, admission_location, and discharge_location. (ii) Collapse the levels of race into ASIAN, BLACK, HISPANIC, WHITE, and Other. (iii) Create a new variable los_long that is TRUE when los is greater than or equal to 2 days. (iv) Summarize the data using tbl_summary(), stratified by los_long. Hint: fct_lump and fct_collapse from the forcats package can be useful.

Hint: Below is a numerical summary of my tibble after preprocessing:

Q1.9 Save the final tibble

Save the final tibble to an R data file mimic_icu_cohort.rds in the mimiciv_shiny folder.

# make a directory mimiciv_shiny
if (!dir.exists("mimiciv_shiny")) {
  dir.create("mimiciv_shiny")
}
# save the final tibble
mimic_icu_cohort |>
  write_rds("mimiciv_shiny/mimic_icu_cohort.rds", compress = "gz")

Close database connection and clear workspace.

if (exists("con_bq")) {
  dbDisconnect(con_bq)
}
rm(list = ls())

Although it is not a good practice to add big data files to git, for grading purpose, please add mimic_icu_cohort.rds to your git repository.

Q2. Shiny app

Develop a Shiny app for exploring the ICU cohort data created in Q1. The app should reside in the mimiciv_shiny folder. The app should contain at least two tabs. One tab provides easy access to the graphical and numerical summaries of variables (demographics, lab measurements, vitals) in the ICU cohort. The other allows user to choose a specific patient in the cohort and display the patient’s ADT and ICU stay information as we did in Q1 of HW3.