Learning sources

This lecture draws heavily on the following sources.

Software

TensorFlow

When you have a hammer, everything looks like a nail.

R/RStudio

R users can access Keras and TensorFlow via the keras and tensorflow packages.

# install.packages("keras")  # install the R package (once)
library(keras)
install_keras()  # installs TensorFlow and Keras into a Python environment
# install_keras(tensorflow = "gpu") # if NVIDIA GPU is available

On the teaching server, it may be necessary to run

library(reticulate)
virtualenv_create("r-reticulate")

to create the virtual environment ~/.virtualenvs/r-reticulate, into which Keras is then installed locally.
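
Afterwards, you can confirm that reticulate is bound to this environment; a minimal check using standard reticulate functions:

library(reticulate)
use_virtualenv("r-reticulate", required = TRUE)  # bind this R session to the env
py_config()  # confirm which Python installation is active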

Workflow for a deep learning network

Step 1: Data ingestion, preparation, and processing

Source: CrowdFlower

  • The most time-consuming but also the most creative part: it takes \(>80\%\) of the total time and requires experience and domain knowledge.

  • Data quality determines the upper limit on how well DL can perform. Garbage in, garbage out.

  • For structured/tabular data: the usual cleaning, encoding of categorical variables, and feature scaling.

  • Data prep for special DL tasks (a sketch follows this list).

    • Image data: pixel scaling, train-time augmentation, test-time augmentation, convolution and flattening.

    • Data tokenization: break sequences into units, map units to vectors, align and pad sequences.

    • Data embedding: sparse to dense, merges diverse data, preserves relationships, reduces dimension (e.g., Word2Vec); embeddings can be learned as part of model training.
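
For concreteness, a minimal sketch of these preprocessing steps with the keras R package (the texts and parameter values are invented for illustration):

library(keras)

# Image data: pixel scaling and train-time augmentation
datagen <- image_data_generator(
  rescale = 1 / 255,    # scale pixels to [0, 1]
  rotation_range = 20,  # random rotations (augmentation)
  horizontal_flip = TRUE
)

# Tokenization: break text into units and map units to integer ids
texts <- c("the cat sat on the mat", "the dog ate my homework")
tokenizer <- text_tokenizer(num_words = 1000) %>%
  fit_text_tokenizer(texts)
seqs <- texts_to_sequences(tokenizer, texts)

# Align and pad the sequences to a common length
padded <- pad_sequences(seqs, maxlen = 10, padding = "post")

# Embedding: sparse token ids to dense vectors, learned with the model
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 1000, output_dim = 16, input_length = 10)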

Step 2: Select a neural network

  • Architecture.

Source: https://www.asimovinstitute.org/neural-network-zoo/

  • Activation function (illustrated, together with the architecture, in the sketch below).
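
For concreteness, a minimal sketch of specifying the architecture and activation functions of an MLP with the keras R package (the layer sizes are arbitrary):

library(keras)
model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = "relu", input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%                    # regularization
  layer_dense(units = 128, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")  # output layer
summary(model)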

Step 3: Select a loss function

  • Regression losses: MSE (quadratic loss, L2 loss), mean absolute error (L1 loss).

  • Classification losses: cross-entropy loss, …

  • Customized losses (see the sketch below).
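
The loss is supplied at compile time; a customized loss is simply an R function of y_true and y_pred built from backend ops. A minimal sketch, reusing the model from Step 2:

# Built-in classification loss
model %>% compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = c("accuracy")
)

# Customized loss: mean squared error written by hand with backend ops
my_mse <- function(y_true, y_pred) {
  k_mean(k_square(y_true - y_pred))
}
# model %>% compile(loss = my_mse, optimizer = "rmsprop")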

Step 4: Train and evaluate the model

  • Choose an optimization algorithm: there is a trade-off between generalization (SGD) and convergence rate (adaptive methods). A sketch comparing the two follows this list.

    • Stochastic GD.

    • Adding momentum: classical momentum, Nesterov acceleration.

    • Adaptive learning rate: AdaGrad, AdaDelta, RMSprop.

    • Combining acceleration and adaptive learning rates: ADAM (default in many libraries).

    • Beyond ADAM: Lookahead, RAdam, AdaBound/AMSBound, Ranger, AdaBelief.

A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam) by Lili Jiang: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
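
For concreteness, a minimal sketch contrasting SGD with momentum against ADAM in the keras R package (x_train and y_train are assumed to exist; in older versions of the package the learning_rate argument is named lr):

# SGD with Nesterov momentum: slower, but often generalizes well
opt_sgd <- optimizer_sgd(learning_rate = 0.01, momentum = 0.9, nesterov = TRUE)

# ADAM: adaptive learning rates, fast convergence, the default in many libraries
opt_adam <- optimizer_adam(learning_rate = 0.001)

model %>% compile(
  optimizer = opt_adam,
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)
history <- model %>% fit(
  x_train, y_train,
  epochs = 20, batch_size = 128,
  validation_split = 0.2  # hold out 20% to monitor under/overfitting
)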

  • Assess model fit: underfitting vs. overfitting.

Source: https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks

  • Model selection: \(K\)-fold cross-validation (a sketch follows).
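
A minimal sketch of \(K\)-fold cross-validation around keras; build_model() is an assumed helper returning a freshly compiled model, x_train/y_train are as above, and the metric may be named acc in older package versions:

k <- 5
folds <- sample(rep(1:k, length.out = nrow(x_train)))  # random fold labels
cv_acc <- numeric(k)
for (i in 1:k) {
  val <- which(folds == i)
  model <- build_model()  # fresh weights for every fold
  model %>% fit(
    x_train[-val, ], y_train[-val, ],
    epochs = 10, batch_size = 128, verbose = 0
  )
  score <- model %>% evaluate(x_train[val, ], y_train[val, ], verbose = 0)
  cv_acc[i] <- score[["accuracy"]]
}
mean(cv_acc)  # cross-validated accuracy estimate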

Example: MNIST - MLP

Rmd, html.

Example: MNIST - CNN

Rmd, html.

Example: Generate text from Nietzsche’s writings - RNN LSTM

Rmd, html.

Example: IMDB review sentiment analysis - RNN LSTM

Rmd, html.

Example: Generate handwritten digits from MNIST - GAN

Rmd, html.