This lecture draws heavily on the following sources.
High-level software focuses on a user-friendly interface for specifying and training models.
Keras, scikit-learn, …
Lower-level software focuses on developer tools for implementing deep learning models.
TensorFlow, PyTorch, Theano, CNTK, Caffe, Torch, …
Most tools are developed in Python plus a low-level language (C/C++, CUDA).
Developed by the Google Brain team for internal Google use. Formerly DistBelief.
Open sourced in Nov 2015.
OS: Linux, macOS, and Windows (since Nov 2016).
GPU support: NVIDIA CUDA.
TPU (tensor processing unit), built specifically for machine learning and tailored for TensorFlow.
Mobile device deployment: TensorFlow Lite (May 2017) for Android and iOS.
When you have a hammer, everything looks like a nail.
R users can access Keras and TensorFlow via the keras and tensorflow packages.
# install.packages("keras")          # first-time install of the R package
library(keras)
install_keras()                      # installs the Python Keras/TensorFlow backend
# install_keras(tensorflow = "gpu")  # if an NVIDIA GPU is available
On the teaching server, it may be necessary to run
library(reticulate)
virtualenv_create("r-reticulate")
to create a virtual environment ~/.virtualenvs/r-reticulate in which to install Keras locally.
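To verify the installation, a quick sanity check (is_keras_available() ships with the keras R package):
library(keras)
# TRUE if the Python Keras backend is reachable from R
is_keras_available()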
Source: CrowdFlower
The most time-consuming but also the most creative job: it takes \(>80\%\) of the time and requires experience and domain knowledge.
Determines the upper limit for the goodness of DL: garbage in, garbage out.
For structured/tabular data.
Data prep for special DL tasks.
Image data: pixel scaling, train-time augmentation, test-time augmentation, convolution and flattening (see the first sketch after this list).
Data tokenization: break sequences into units, map units to vectors, align and pad sequences.
Data embedding: sparse to dense, merge diverse data, preserve relationships, dimension reduction, Word2Vec; embeddings can be learned as part of model training (see the second sketch after this list).
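A minimal sketch of image prep in the R keras interface, assuming a TensorFlow >= 2.6 backend (the preprocessing layers shown are one of several ways to do this):
library(keras)
# Rescale pixels to [0, 1] and apply train-time augmentation as model layers
preprocess <- keras_model_sequential() %>%
  layer_rescaling(scale = 1 / 255) %>%        # pixel scaling
  layer_random_flip(mode = "horizontal") %>%  # train-time augmentation
  layer_random_rotation(factor = 0.1)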
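And a minimal sketch of tokenization, padding, and embedding (texts is a hypothetical character vector of documents):
library(keras)
texts <- c("deep learning is fun", "garbage in garbage out")
tokenizer <- text_tokenizer(num_words = 10000) %>%  # break text into units, map units to integer ids
  fit_text_tokenizer(texts)
seqs <- texts_to_sequences(tokenizer, texts)
x <- pad_sequences(seqs, maxlen = 8)                # align and pad sequences
# The embedding layer maps sparse ids to dense vectors, learned during training
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 16, input_length = 8)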
Source: https://www.asimovinstitute.org/neural-network-zoo/
Regression loss: MSE/quadratic loss/L2 loss, mean absolute error/L1 loss.
Classification loss: cross-entropy loss, …
Customized losses (see the sketch below).
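In the R keras interface, a custom loss is just a function of y_true and y_pred built from backend ops. A minimal sketch (the pseudo-Huber form and delta = 1 are assumptions; model stands for an already-defined keras model):
library(keras)
# Pseudo-Huber loss: quadratic near zero, linear in the tails
pseudo_huber <- function(y_true, y_pred) {
  delta <- 1.0  # assumed transition point between quadratic and linear regimes
  err <- y_true - y_pred
  k_mean(delta^2 * (k_sqrt(1 + (err / delta)^2) - 1))
}
model %>% compile(optimizer = "adam", loss = pseudo_huber, metrics = "mae")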
Choose an optimization algorithm: generalization (SGD) vs convergence rate (adaptive methods). The sketch after this list shows how an optimizer is selected in keras.
Stochastic GD.
Adding momentum: classical momentum, Nesterov acceleration.
Adaptive learning rate: AdaGrad, AdaDelta, RMSprop.
Combining acceleration and adaptive learning rate: ADAM (the default in many libraries).
Beyond ADAM: Lookahead, RAdam, AdaBound/AMSBound, Ranger, AdaBelief.
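In keras, the optimizer is chosen at compile time. A minimal sketch (model is a hypothetical keras model; the hyperparameters shown are common defaults):
library(keras)
# SGD with Nesterov momentum: often generalizes better
opt_sgd <- optimizer_sgd(learning_rate = 0.01, momentum = 0.9, nesterov = TRUE)
# ADAM: adaptive learning rates, fast convergence, a common default
opt_adam <- optimizer_adam(learning_rate = 0.001)
model %>% compile(optimizer = opt_adam, loss = "mse", metrics = "mae")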
A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam) by Lili Jiang: https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c
Source: https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks