Today

  • Data transformation using dplyr (cont’d from Grouped Mutates and Grouped Filters).

  • Data visualization using ggplot2.

  • Data and time with lubridate.

  • Text data with stringr.

  • Web scraping.

  • Lab: HW3.

Announcement

  • HW3 is posted and due Feb 23 @ 11:59pm. Start early. Ask questions on Slack or office hours.

  • Common issues in HW1.

    • Use git regularly. Many students only commit and push after finishing the assignment, which is not enough. You should consider making a commit whenever meaningful changes to your code have been made. If you’re not sure when to commit, start by committing after finishing each question. Do not commit directly to the main branch while working on the assignments.

    • Tag your homework submission in git. And make sure that you’re tagging the version you actually want to submit. You can verify on Github which commits were included with the tag to make sure you’re using most recent updates.

    • Do not set every code chunk to eval = False when rendering your html file. In your submitted html, we want to see what the code actually looks like on your machine, which does not happen when code chunks are not evaluated. This is also not reproducible; we should be able to easily run the chunks that are necessary for your analysis.

    • Answer all parts of the question. Make sure you’re providing a complete answer when asked for any interpretation or description.
    • No plagiarism.
  • UC :heart: Data Week (Feb 12-16, 2024).

Q&A

  • Be careful handling big data. Clear unwanted objects (rm(obj)) or workspace (rm(list = ls())). Use gc() (garbage collection) to free up memory. Clear workspace before rendering.

  • Let me know if your machine has <8GB RAM.

  • Git etiquette.

  • Ask HW questions on Slack (do not DM me) or office hours. Do not email me.

  • If ask for hw extension, please email me and TAs (Jon and Jasen) with your documents (doctor notes, …) Or use Center for Accessible Education (CAE) for any accommodations.