Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.

  • Winston Churchill, November 10, 1942

Where are we now?

What’s next?

First-year core courses: foundation.

Second-year electives: advanced methods.

Tips to be successful in graduate studies

  • Show up.

  • Time management.

Checklist on your resume

A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

  • Linux scripting
  • Git/GitHub (give your GitHub handle)
  • Container technology: Docker (if you use it)
  • Data wrangling (Tidyverse)
  • Data visualization (ggplot2)
  • Frontend development (shiny, web app)
  • Databases: OLAP (DuckDB), Google BigQuery (spend more time self-studying SQL)
  • Cloud computing (GCP, AWS, Azure, OCI) (if you use it)
  • High-performance computing (HPC) on cluster (if you use Hoffman2)
  • Deep learning with Keras+TensorFlow (PyTorch is more popular in research) (if you use it in HW5)

  • Make your GitHub repo biostat-203b-2024-winter public (after Apr 15, 2024) and show off your work to back your resume. Feel free to modify the reports after this course. You can make your GitHub repository into a webpage by using GitHub Pages.

  • Use these tools in your daily work: use Git/GitHub for all your homework and research projects, write weekly research report using Quarto/RMarkdown/Jupyter Notebook, give presentation using ggplot2 and Shiny, write blog/tutorial, …

  • Be open to languages. Python is a more generic programming language and widely adopted in data science. Julia is attractive for high performance scientific computing. JavaScript is dominant in web applications. Scala is popular for implementing distributed programs.

Course evaluation

Please do it NOW.

http://my.ucla.edu

Today

  • Neural network (cont’d, CNN, RNN/LSTM, GAN).

Announcement

Q&A