Today

  • String and regular expression (cont’d).

  • Web scraping.

Announcement

  • HW2 is due Feb 10 @ 11:59pm. FAQs:

    • It’s okay to use bash instead of system command in R.
  • Common HW1 issues:

    - Q3.5: How many unique patients (identified by subject_id) are in this data file? (Answer: 256878). Header doesn't count as a patient.
    
zmore ~/mimic/core/admissions.csv.gz | 
tail -n +2 | 
awk -F, '{ print $1 }' | 
sort |
uniq | 
wc -l
  - Q5: cal 9 1752. Interpret the results?

  - Git usage. Not enough commits (less than 5) in develop branch. 5.0 points off. `.DS_Store` and `.Rhistory` should be gitingored. 

  - Reproducibility. Use local path (e.g., `/Users/xxx/Desktop/`): 5.0 points off

  - Q4.1 (Count four characters name): `#| eval: false`. 1.0 point off

  - When  pg42671.txt is not downloaded in code, additional 1.0 point off

  - Q4.3 (`/middle.sh` + `chmod`): `#| eval: false`. 1.0 point off. When `middle.sh` is not included in Git or not created in code, additional 1.0 point off