Today
-
String and regular expression (cont’d).
-
Web scraping.
Announcement
-
HW2 is due Feb 10 @ 11:59pm. FAQs:
- It’s okay to use
bash
instead ofsystem
command in R.
- It’s okay to use
-
Common HW1 issues:
- Q3.5: How many unique patients (identified by subject_id) are in this data file? (Answer: 256878). Header doesn't count as a patient.
zmore ~/mimic/core/admissions.csv.gz |
tail -n +2 |
awk -F, '{ print $1 }' |
sort |
uniq |
wc -l
- Q5: cal 9 1752. Interpret the results?
- Git usage. Not enough commits (less than 5) in develop branch. 5.0 points off. `.DS_Store` and `.Rhistory` should be gitingored.
- Reproducibility. Use local path (e.g., `/Users/xxx/Desktop/`): 5.0 points off
- Q4.1 (Count four characters name): `#| eval: false`. 1.0 point off
- When pg42671.txt is not downloaded in code, additional 1.0 point off
- Q4.3 (`/middle.sh` + `chmod`): `#| eval: false`. 1.0 point off. When `middle.sh` is not included in Git or not created in code, additional 1.0 point off