Lecture 18 03 Mar 2022

Checklist on your resume

A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Machine/statistical learning methods. Be familiar with the methods in The Elements of Statistical Learning and with software that implements them, e.g., scikit-learn.

For non-statisticians/biostatisticians, I recommend An Introduction to Statistical Learning: With Applications in R, which is less technical and more application-oriented.
Computational algorithms. Spring quarter’s Biostat 257 will cover numerical linear algebra and numerical optimization algorithms.
Public health applications.
Be open to languages. Python is a general-purpose programming language that is widely adopted in data science. JavaScript is dominant in web applications. Scala is popular for implementing distributed programs. Julia is attractive for high-performance scientific computing.

Please do it now.