MLP on the MNIST Digit Data

Biostat 203B

Author

Dr. Hua Zhou @ UCLA

Published

February 27, 2024

Display system information for reproducibility.

import IPython
print(IPython.sys_info())
{'commit_hash': '8b1204b6c',
 'commit_source': 'installation',
 'default_encoding': 'utf-8',
 'ipython_path': '/opt/venv/lib/python3.10/site-packages/IPython',
 'ipython_version': '8.21.0',
 'os_name': 'posix',
 'platform': 'Linux-6.6.12-linuxkit-aarch64-with-glibc2.35',
 'sys_executable': '/opt/venv/bin/python',
 'sys_platform': 'linux',
 'sys_version': '3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]'}
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/aarch64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/aarch64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] digest_0.6.34     fastmap_1.1.1     xfun_0.42         Matrix_1.6-1.1   
 [5] lattice_0.21-9    reticulate_1.35.0 knitr_1.45        htmltools_0.5.7  
 [9] png_0.1-8         rmarkdown_2.25    cli_3.6.2         grid_4.3.2       
[13] compiler_4.3.2    rstudioapi_0.15.0 tools_4.3.2       evaluate_0.23    
[17] Rcpp_1.0.12       yaml_2.3.8        rlang_1.1.3       jsonlite_1.8.8   
[21] htmlwidgets_1.6.4

Load some libraries.

# Load the pandas library
import pandas as pd
# Load numpy for array manipulation
import numpy as np
# Load seaborn plotting library
import seaborn as sns
import matplotlib.pyplot as plt

# Set font sizes in plots
sns.set(font_scale = 1.2)
# Display all columns
pd.set_option('display.max_columns', None)

# Load Tensorflow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
library(keras)

In this example, we train an MLP (multi-layer perceptron) on the MNIST data set and achieve a test accuracy of about 98% after 30 epochs.

1 Prepare data

Acquire data:

# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz

11490434/11490434 [==============================] - 0s 0us/step
# Training set
x_train.shape
(60000, 28, 28)
y_train.shape
(60000,)
# Test set
x_test.shape
(10000, 28, 28)
y_test.shape
(10000,)

Display the first training instance and its label:

import matplotlib.pyplot as plt

# Feature: digit
plt.figure()
plt.gray()
plt.imshow(x_train[0]);
plt.show()

# Label
y_train[0]
5
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

Training set:

dim(x_train)
[1] 60000    28    28
dim(y_train)
[1] 60000
image(
  t(x_train[1, 28:1,]), 
  useRaster = TRUE, 
  axes = FALSE, 
  col = grey(seq(0, 1, length = 256))
  )

y_train[1]
[1] 5

Testing set:

dim(x_test)
[1] 10000    28    28
dim(y_test)
[1] 10000

Vectorize \(28 \times 28\) images into \(784\)-vectors and scale entries to [0, 1]:

# Reshape
x_train = np.reshape(x_train, [x_train.shape[0], 784])
x_test = np.reshape(x_test, [x_test.shape[0], 784])
# Rescale
x_train = x_train / 255
x_test = x_test / 255
# Train
x_train.shape
(60000, 784)
# Test
x_test.shape
(10000, 784)
# reshape
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
dim(x_train)
[1] 60000   784
dim(x_test)
[1] 10000   784

Encode \(y\) as a one-hot (binary class) matrix:

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
# Train
y_train.shape
(60000, 10)
# Test
y_test.shape
(10000, 10)
# First train instance
y_train[0]
array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
dim(y_train)
[1] 60000    10
dim(y_test)
[1] 10000    10
head(y_train)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    1    0    0    0     0
[2,]    1    0    0    0    0    0    0    0    0     0
[3,]    0    0    0    0    1    0    0    0    0     0
[4,]    0    1    0    0    0    0    0    0    0     0
[5,]    0    0    0    0    0    0    0    0    0     1
[6,]    0    0    1    0    0    0    0    0    0     0

2 Define the model

Define a sequential model (a linear stack of layers) with 2 fully-connected hidden layers (256 and 128 neurons):

model = keras.Sequential(
  [
    keras.Input(shape = (784,)),
    layers.Dense(units = 256, activation = 'relu'),
    layers.Dropout(rate = 0.4),
    layers.Dense(units = 128, activation = 'relu'),
    layers.Dropout(rate = 0.3),
    layers.Dense(units = 10, activation = 'softmax')
]
)

model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 256)               200960    
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense_1 (Dense)             (None, 128)               32896     
                                                                 
 dropout_1 (Dropout)         (None, 128)               0         
                                                                 
 dense_2 (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 235146 (918.54 KB)
Trainable params: 235146 (918.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Plot the model:

tf.keras.utils.plot_model(
    model,
    to_file = "model.png",
    show_shapes = True,
    show_dtype = False,
    show_layer_names = True,
    rankdir = "TB",
    expand_nested = False,
    dpi = 96,
    layer_range = None,
    show_layer_activations = False,
)
You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model to work.

model <- keras_model_sequential() 
model %>% 
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>% 
  layer_dropout(rate = 0.4) %>% 
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')
summary(model)
Model: "sequential_1"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 dense_5 (Dense)                    (None, 256)                     200960      
 dropout_3 (Dropout)                (None, 256)                     0           
 dense_4 (Dense)                    (None, 128)                     32896       
 dropout_2 (Dropout)                (None, 128)                     0           
 dense_3 (Dense)                    (None, 10)                      1290        
================================================================================
Total params: 235146 (918.54 KB)
Trainable params: 235146 (918.54 KB)
Non-trainable params: 0 (0.00 Byte)
________________________________________________________________________________

Compile the model with appropriate loss function, optimizer, and metrics:

model.compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = ["accuracy"]
)
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)

3 Training and validation

We use an 80%/20% split for the training/validation sets.

batch_size = 128
epochs = 30

history = model.fit(
  x_train,
  y_train,
  batch_size = batch_size,
  epochs = epochs,
  validation_split = 0.2
)

Plot training history:

hist = pd.DataFrame(history.history)
hist['epoch'] = np.arange(1, epochs + 1)
hist = hist.melt(
  id_vars = ['epoch'],
  value_vars = ['loss', 'accuracy', 'val_loss', 'val_accuracy'],
  var_name = 'type',
  value_name = 'value'
)
hist['split'] = np.where(['val' in s for s in hist['type']], 'validation', 'train')
hist['metric'] = np.where(['loss' in s for s in hist['type']], 'loss', 'accuracy')

# Accuracy trace plot
plt.figure()
sns.relplot(
  data = hist[hist['metric'] == 'accuracy'],
  kind = 'scatter',
  x = 'epoch',
  y = 'value',
  hue = 'split'
).set(
  xlabel = 'Epoch',
  ylabel = 'Accuracy'
);
plt.show()

# Loss trace plot
plt.figure()
sns.relplot(
  data = hist[hist['metric'] == 'loss'],
  kind = 'scatter',
  x = 'epoch',
  y = 'value',
  hue = 'split'
).set(
  xlabel = 'Epoch',
  ylabel = 'Loss'
);
plt.show()

system.time({
history <- model %>% fit(
  x_train, y_train, 
  epochs = 30, batch_size = 128, 
  validation_split = 0.2
)
})
Epoch 1/30
375/375 - 1s - loss: 0.4300 - accuracy: 0.8703 - val_loss: 0.1623 - val_accuracy: 0.9523 - 1s/epoch - 3ms/step
Epoch 2/30
375/375 - 1s - loss: 0.1997 - accuracy: 0.9415 - val_loss: 0.1185 - val_accuracy: 0.9631 - 1s/epoch - 3ms/step
Epoch 3/30
375/375 - 1s - loss: 0.1571 - accuracy: 0.9537 - val_loss: 0.1077 - val_accuracy: 0.9669 - 1s/epoch - 3ms/step
Epoch 4/30
375/375 - 1s - loss: 0.1322 - accuracy: 0.9611 - val_loss: 0.0948 - val_accuracy: 0.9740 - 1s/epoch - 3ms/step
Epoch 5/30
375/375 - 1s - loss: 0.1148 - accuracy: 0.9654 - val_loss: 0.0967 - val_accuracy: 0.9729 - 1s/epoch - 3ms/step
Epoch 6/30
375/375 - 1s - loss: 0.1022 - accuracy: 0.9699 - val_loss: 0.0856 - val_accuracy: 0.9768 - 1s/epoch - 3ms/step
Epoch 7/30
375/375 - 1s - loss: 0.0951 - accuracy: 0.9714 - val_loss: 0.0881 - val_accuracy: 0.9758 - 1s/epoch - 3ms/step
Epoch 8/30
375/375 - 1s - loss: 0.0868 - accuracy: 0.9733 - val_loss: 0.0908 - val_accuracy: 0.9759 - 1s/epoch - 3ms/step
Epoch 9/30
375/375 - 1s - loss: 0.0808 - accuracy: 0.9757 - val_loss: 0.0839 - val_accuracy: 0.9777 - 1s/epoch - 3ms/step
Epoch 10/30
375/375 - 1s - loss: 0.0788 - accuracy: 0.9757 - val_loss: 0.0798 - val_accuracy: 0.9795 - 1s/epoch - 3ms/step
Epoch 11/30
375/375 - 1s - loss: 0.0735 - accuracy: 0.9781 - val_loss: 0.0868 - val_accuracy: 0.9798 - 1s/epoch - 3ms/step
Epoch 12/30
375/375 - 1s - loss: 0.0739 - accuracy: 0.9780 - val_loss: 0.0864 - val_accuracy: 0.9783 - 1s/epoch - 3ms/step
Epoch 13/30
375/375 - 1s - loss: 0.0662 - accuracy: 0.9807 - val_loss: 0.0850 - val_accuracy: 0.9792 - 1s/epoch - 3ms/step
Epoch 14/30
375/375 - 1s - loss: 0.0632 - accuracy: 0.9812 - val_loss: 0.0844 - val_accuracy: 0.9799 - 1s/epoch - 3ms/step
Epoch 15/30
375/375 - 1s - loss: 0.0622 - accuracy: 0.9812 - val_loss: 0.0825 - val_accuracy: 0.9802 - 1s/epoch - 3ms/step
Epoch 16/30
375/375 - 1s - loss: 0.0581 - accuracy: 0.9826 - val_loss: 0.0855 - val_accuracy: 0.9802 - 1s/epoch - 3ms/step
Epoch 17/30
375/375 - 1s - loss: 0.0596 - accuracy: 0.9824 - val_loss: 0.0877 - val_accuracy: 0.9804 - 1s/epoch - 3ms/step
Epoch 18/30
375/375 - 1s - loss: 0.0553 - accuracy: 0.9834 - val_loss: 0.0882 - val_accuracy: 0.9803 - 1s/epoch - 3ms/step
Epoch 19/30
375/375 - 1s - loss: 0.0544 - accuracy: 0.9838 - val_loss: 0.0828 - val_accuracy: 0.9811 - 1s/epoch - 3ms/step
Epoch 20/30
375/375 - 1s - loss: 0.0544 - accuracy: 0.9844 - val_loss: 0.0873 - val_accuracy: 0.9805 - 1s/epoch - 3ms/step
Epoch 21/30
375/375 - 1s - loss: 0.0527 - accuracy: 0.9847 - val_loss: 0.0885 - val_accuracy: 0.9808 - 1s/epoch - 3ms/step
Epoch 22/30
375/375 - 1s - loss: 0.0497 - accuracy: 0.9848 - val_loss: 0.0897 - val_accuracy: 0.9811 - 1s/epoch - 3ms/step
Epoch 23/30
375/375 - 1s - loss: 0.0481 - accuracy: 0.9859 - val_loss: 0.0921 - val_accuracy: 0.9802 - 1s/epoch - 3ms/step
Epoch 24/30
375/375 - 1s - loss: 0.0479 - accuracy: 0.9847 - val_loss: 0.0971 - val_accuracy: 0.9814 - 1s/epoch - 3ms/step
Epoch 25/30
375/375 - 1s - loss: 0.0483 - accuracy: 0.9858 - val_loss: 0.0979 - val_accuracy: 0.9803 - 1s/epoch - 3ms/step
Epoch 26/30
375/375 - 1s - loss: 0.0476 - accuracy: 0.9861 - val_loss: 0.0880 - val_accuracy: 0.9809 - 1s/epoch - 3ms/step
Epoch 27/30
375/375 - 1s - loss: 0.0444 - accuracy: 0.9874 - val_loss: 0.0941 - val_accuracy: 0.9816 - 1s/epoch - 3ms/step
Epoch 28/30
375/375 - 1s - loss: 0.0434 - accuracy: 0.9874 - val_loss: 0.1077 - val_accuracy: 0.9808 - 1s/epoch - 3ms/step
Epoch 29/30
375/375 - 1s - loss: 0.0405 - accuracy: 0.9879 - val_loss: 0.0921 - val_accuracy: 0.9820 - 1s/epoch - 3ms/step
Epoch 30/30
375/375 - 1s - loss: 0.0439 - accuracy: 0.9874 - val_loss: 0.0929 - val_accuracy: 0.9827 - 1s/epoch - 3ms/step
   user  system elapsed 
 63.770  20.459  34.053 
plot(history)

4 Testing

Evaluate model performance on the test data:

score = model.evaluate(x_test, y_test, verbose = 0)
print("Test loss:", score[0])
Test loss: 0.08709269762039185
print("Test accuracy:", score[1])
Test accuracy: 0.9825000166893005

model %>% evaluate(x_test, y_test)
313/313 - 0s - loss: 0.0835 - accuracy: 0.9836 - 185ms/epoch - 591us/step
      loss   accuracy 
0.08349238 0.98360002 

Generate predictions on new data:
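For the Python model, a minimal sketch (reusing the `model`, `x_test`, and `np` objects defined above) takes the argmax of the predicted class probabilities:

# Class probabilities for the test set, shape (10000, 10)
probs = model.predict(x_test, verbose = 0)
# Predicted digit = column index of the largest probability in each row
np.argmax(probs, axis = 1)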

model %>% predict(x_test) %>% k_argmax()
313/313 - 0s - 208ms/epoch - 663us/step
tf.Tensor([7 2 1 ... 4 5 6], shape=(10000), dtype=int64)

5 Exercise

Suppose we want to fit a multinomial-logit model and use it as a baseline for the neural network. How can we do that? We could of course use mlogit or another package. Instead, we can fit the same model with keras, since a multinomial-logit model is just an MLP with (1) an input layer with linear activation and (2) an output layer with a softmax link function.
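As a reference point before the R code, here is a minimal Python sketch of the same idea (the name mlogit_py is hypothetical; it reuses the keras and layers imports from above): a single dense softmax layer applied directly to the 784 input pixels.

# Sketch only: multinomial logit as a single softmax layer on 784 inputs
mlogit_py = keras.Sequential(
  [
    keras.Input(shape = (784,)),
    layers.Dense(units = 10, activation = 'softmax')
  ]
)
mlogit_py.compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = ["accuracy"]
)
mlogit_py.summary()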

# set up model
library(keras)
mlogit <- keras_model_sequential() 
mlogit %>% 
#  layer_dense(units = 256, activation = 'linear', input_shape = c(784)) %>% 
#  layer_dropout(rate = 0.4) %>% 
  layer_dense(units = 10, activation = 'softmax', input_shape = c(784))
summary(mlogit)
Model: "sequential_2"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 dense_6 (Dense)                    (None, 10)                      7850        
================================================================================
Total params: 7850 (30.66 KB)
Trainable params: 7850 (30.66 KB)
Non-trainable params: 0 (0.00 Byte)
________________________________________________________________________________
# compile model
mlogit %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
mlogit
Model: "sequential_2"
________________________________________________________________________________
 Layer (type)                       Output Shape                    Param #     
================================================================================
 dense_6 (Dense)                    (None, 10)                      7850        
================================================================================
Total params: 7850 (30.66 KB)
Trainable params: 7850 (30.66 KB)
Non-trainable params: 0 (0.00 Byte)
________________________________________________________________________________
# fit model
mlogit_history <- mlogit %>% 
  fit(
    x_train, 
    y_train, 
    epochs = 20, 
    batch_size = 128, 
    validation_split = 0.2
  )
Epoch 1/20
375/375 - 0s - loss: 0.6682 - accuracy: 0.8367 - val_loss: 0.3600 - val_accuracy: 0.9047 - 468ms/epoch - 1ms/step
Epoch 2/20
375/375 - 0s - loss: 0.3530 - accuracy: 0.9034 - val_loss: 0.3100 - val_accuracy: 0.9141 - 314ms/epoch - 838us/step
Epoch 3/20
375/375 - 0s - loss: 0.3181 - accuracy: 0.9112 - val_loss: 0.2956 - val_accuracy: 0.9172 - 314ms/epoch - 838us/step
Epoch 4/20
375/375 - 0s - loss: 0.3019 - accuracy: 0.9161 - val_loss: 0.2864 - val_accuracy: 0.9205 - 305ms/epoch - 813us/step
Epoch 5/20
375/375 - 0s - loss: 0.2922 - accuracy: 0.9178 - val_loss: 0.2783 - val_accuracy: 0.9232 - 324ms/epoch - 864us/step
Epoch 6/20
375/375 - 0s - loss: 0.2858 - accuracy: 0.9202 - val_loss: 0.2758 - val_accuracy: 0.9235 - 305ms/epoch - 814us/step
Epoch 7/20
375/375 - 0s - loss: 0.2807 - accuracy: 0.9213 - val_loss: 0.2715 - val_accuracy: 0.9251 - 306ms/epoch - 816us/step
Epoch 8/20
375/375 - 0s - loss: 0.2769 - accuracy: 0.9226 - val_loss: 0.2707 - val_accuracy: 0.9268 - 317ms/epoch - 846us/step
Epoch 9/20
375/375 - 0s - loss: 0.2738 - accuracy: 0.9244 - val_loss: 0.2677 - val_accuracy: 0.9261 - 306ms/epoch - 817us/step
Epoch 10/20
375/375 - 0s - loss: 0.2710 - accuracy: 0.9251 - val_loss: 0.2666 - val_accuracy: 0.9279 - 303ms/epoch - 807us/step
Epoch 11/20
375/375 - 0s - loss: 0.2687 - accuracy: 0.9255 - val_loss: 0.2658 - val_accuracy: 0.9273 - 373ms/epoch - 996us/step
Epoch 12/20
375/375 - 0s - loss: 0.2666 - accuracy: 0.9262 - val_loss: 0.2668 - val_accuracy: 0.9277 - 322ms/epoch - 859us/step
Epoch 13/20
375/375 - 0s - loss: 0.2648 - accuracy: 0.9272 - val_loss: 0.2649 - val_accuracy: 0.9282 - 308ms/epoch - 820us/step
Epoch 14/20
375/375 - 0s - loss: 0.2634 - accuracy: 0.9267 - val_loss: 0.2633 - val_accuracy: 0.9302 - 308ms/epoch - 822us/step
Epoch 15/20
375/375 - 0s - loss: 0.2621 - accuracy: 0.9273 - val_loss: 0.2638 - val_accuracy: 0.9300 - 325ms/epoch - 868us/step
Epoch 16/20
375/375 - 0s - loss: 0.2606 - accuracy: 0.9280 - val_loss: 0.2637 - val_accuracy: 0.9286 - 320ms/epoch - 853us/step
Epoch 17/20
375/375 - 0s - loss: 0.2597 - accuracy: 0.9285 - val_loss: 0.2628 - val_accuracy: 0.9298 - 308ms/epoch - 821us/step
Epoch 18/20
375/375 - 0s - loss: 0.2582 - accuracy: 0.9292 - val_loss: 0.2631 - val_accuracy: 0.9308 - 304ms/epoch - 811us/step
Epoch 19/20
375/375 - 0s - loss: 0.2574 - accuracy: 0.9293 - val_loss: 0.2622 - val_accuracy: 0.9301 - 303ms/epoch - 807us/step
Epoch 20/20
375/375 - 0s - loss: 0.2562 - accuracy: 0.9298 - val_loss: 0.2620 - val_accuracy: 0.9298 - 322ms/epoch - 859us/step
# Evaluate model performance on the test data:
mlogit %>% evaluate(x_test, y_test)
313/313 - 0s - loss: 0.2666 - accuracy: 0.9272 - 126ms/epoch - 404us/step
     loss  accuracy 
0.2665513 0.9272000 

Generate predictions on new data:

mlogit %>% predict(x_test) %>% k_argmax()
313/313 - 0s - 122ms/epoch - 388us/step
tf.Tensor([7 2 1 ... 4 5 6], shape=(10000), dtype=int64)

Experiment: Change the linear activation to relu in the multinomial-logit model and see the change in classification accuracy.
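As a starting point, a hedged Python sketch of this experiment (the name mlogit_relu and the 256-unit width are illustrative, mirroring the commented-out R layer above) inserts a relu hidden layer before the softmax output:

# Sketch only: multinomial logit plus one relu hidden layer (256 units)
mlogit_relu = keras.Sequential(
  [
    keras.Input(shape = (784,)),
    layers.Dense(units = 256, activation = 'relu'),
    layers.Dense(units = 10, activation = 'softmax')
  ]
)
mlogit_relu.compile(
  loss = "categorical_crossentropy",
  optimizer = "rmsprop",
  metrics = ["accuracy"]
)
mlogit_relu.fit(x_train, y_train, batch_size = 128, epochs = 20, validation_split = 0.2)
# Compare test accuracy with the plain multinomial-logit baseline
mlogit_relu.evaluate(x_test, y_test, verbose = 0)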