Display system information for reproducibility.
import IPython
print(IPython.sys_info())
{'commit_hash': '8b1204b6c',
'commit_source': 'installation',
'default_encoding': 'utf-8',
'ipython_path': '/opt/venv/lib/python3.10/site-packages/IPython',
'ipython_version': '8.21.0',
'os_name': 'posix',
'platform': 'Linux-6.6.12-linuxkit-aarch64-with-glibc2.35',
'sys_executable': '/opt/venv/bin/python',
'sys_platform': 'linux',
'sys_version': '3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]'}
R version 4.3.2 (2023-10-31)
Platform: aarch64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS
Matrix products: default
BLAS: /usr/lib/aarch64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/aarch64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] digest_0.6.34 fastmap_1.1.1 xfun_0.42 Matrix_1.6-1.1
[5] lattice_0.21-9 reticulate_1.35.0 knitr_1.45 htmltools_0.5.7
[9] png_0.1-8 rmarkdown_2.25 cli_3.6.2 grid_4.3.2
[13] compiler_4.3.2 rstudioapi_0.15.0 tools_4.3.2 evaluate_0.23
[17] Rcpp_1.0.12 yaml_2.3.8 rlang_1.1.3 jsonlite_1.8.8
[21] htmlwidgets_1.6.4
Load some libraries.
# Load the pandas library
import pandas as pd
# Load numpy for array manipulation
import numpy as np
# Load seaborn plotting library
import seaborn as sns
import matplotlib.pyplot as plt
# Set font sizes in plots
sns.set(font_scale = 1.2)
# Display all columns
pd.set_option('display.max_columns', None)
# Load Tensorflow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
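If reproducible training runs are desired, the random seeds can also be fixed up front (a minimal, optional step not required by the workflow below; results may still differ slightly across hardware and TensorFlow versions):
# Fix the Python, NumPy, and TensorFlow random seeds in one call
keras.utils.set_random_seed(42)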
In this example, we train an MLP (multi-layer perceptron) on the MNIST data set and achieve about 98% testing accuracy after 30 epochs.
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits (\(28 \times 28\) grayscale images) that is commonly used for training and testing machine learning algorithms. It contains 60,000 training images and 10,000 testing images.
Prepare data
Acquire data:
# Load the data and split it between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 [==============================] - 0s 0us/step
# Training set
x_train.shape
Display the first training instance and its label:
import matplotlib.pyplot as plt
# Feature: digit
plt.figure()
plt.gray()
plt.imshow(x_train[0]);
plt.show()
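The corresponding label is still an integer between 0 and 9 at this point; for the first training image it is 5, consistent with the one-hot vector shown further below:
# Label of the first training image
print(y_train[0])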
mnist <- dataset_mnist()
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
Training set:
image(
  t(x_train[1, 28:1, ]),
  useRaster = TRUE,
  axes = FALSE,
  col = grey(seq(0, 1, length = 256))
)
Testing set: the first testing instance can be displayed analogously.
Vectorize \(28 \times 28\) images into \(784\) -vectors and scale entries to [0, 1]:
# Reshape
x_train = np.reshape(x_train, [x_train.shape[0], 784])
x_test = np.reshape(x_test, [x_test.shape[0], 784])
# Rescale
x_train = x_train / 255
x_test = x_test / 255
# Train
x_train.shape
# reshape
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
dim(x_train)
Encode \(y\) as binary class matrix:
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
# Train
y_train.shape
# First train instance
y_train[0]
array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)
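As a quick check, np.argmax recovers the original integer label from the one-hot vector:
# Position of the 1 in the one-hot vector, i.e., the digit label (5 here)
np.argmax(y_train[0])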
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
dim(y_train)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    1    0    0    0     0
[2,]    1    0    0    0    0    0    0    0    0     0
[3,]    0    0    0    0    1    0    0    0    0     0
[4,]    0    1    0    0    0    0    0    0    0     0
[5,]    0    0    0    0    0    0    0    0    0     1
[6,]    0    0    1    0    0    0    0    0    0     0
Define the model
Define a sequential model (a linear stack of layers) with 2 fully-connected hidden layers (256 and 128 neurons):
model = keras.Sequential(
    [
        keras.Input(shape = (784,)),
        layers.Dense(units = 256, activation = 'relu'),
        layers.Dropout(rate = 0.4),
        layers.Dense(units = 128, activation = 'relu'),
        layers.Dropout(rate = 0.3),
        layers.Dense(units = 10, activation = 'softmax')
    ]
)
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 256) 200960
dropout (Dropout) (None, 256) 0
dense_1 (Dense) (None, 128) 32896
dropout_1 (Dropout) (None, 128) 0
dense_2 (Dense) (None, 10) 1290
=================================================================
Total params: 235146 (918.54 KB)
Trainable params: 235146 (918.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
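The parameter counts in the summary are weights plus biases of each dense layer: \(784 \times 256 + 256 = 200960\), \(256 \times 128 + 128 = 32896\), and \(128 \times 10 + 10 = 1290\), totaling \(235146\) trainable parameters; the dropout layers add none.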
Plot the model:
tf.keras.utils.plot_model(
    model,
    to_file = "model.png",
    show_shapes = True,
    show_dtype = False,
    show_layer_names = True,
    rankdir = "TB",
    expand_nested = False,
    dpi = 96,
    layer_range = None,
    show_layer_activations = False,
)
You must install pydot (`pip install pydot`) and install graphviz (see instructions at https://graphviz.gitlab.io/download/) for plot_model to work.
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')
summary(model)
Model: "sequential_1"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
dense_5 (Dense) (None, 256) 200960
dropout_3 (Dropout) (None, 256) 0
dense_4 (Dense) (None, 128) 32896
dropout_2 (Dropout) (None, 128) 0
dense_3 (Dense) (None, 10) 1290
================================================================================
Total params: 235146 (918.54 KB)
Trainable params: 235146 (918.54 KB)
Non-trainable params: 0 (0.00 Byte)
________________________________________________________________________________
Compile the model with appropriate loss function, optimizer, and metrics:
model.compile(
    loss = "categorical_crossentropy",
    optimizer = "rmsprop",
    metrics = ["accuracy"]
)
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
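With one-hot labels \(y \in \{0, 1\}^{10}\) and predicted class probabilities \(\hat{p}\) from the softmax output layer, the categorical cross-entropy loss of a single instance is
\[
\ell(y, \hat{p}) = -\sum_{k=1}^{10} y_k \log \hat{p}_k,
\]
i.e., the negative log-probability assigned to the true class. Accuracy is the fraction of instances whose predicted class \(\arg\max_k \hat{p}_k\) matches the label.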
Training and validation
We hold out 20% of the training data for validation (an 80%/20% train/validation split).
batch_size = 128
epochs = 30
history = model.fit(
x_train,
y_train,
batch_size = batch_size,
epochs = epochs,
validation_split = 0.2
)
Plot training history:
hist = pd.DataFrame(history.history)
hist['epoch'] = np.arange(1, epochs + 1)
hist = hist.melt(
    id_vars = ['epoch'],
    value_vars = ['loss', 'accuracy', 'val_loss', 'val_accuracy'],
    var_name = 'type',
    value_name = 'value'
)
hist['split'] = np.where(['val' in s for s in hist['type']], 'validation', 'train')
hist['metric'] = np.where(['loss' in s for s in hist['type']], 'loss', 'accuracy')
# Accuracy trace plot
plt.figure()
sns.relplot(
    data = hist[hist['metric'] == 'accuracy'],
    kind = 'scatter',
    x = 'epoch',
    y = 'value',
    hue = 'split'
).set(
    xlabel = 'Epoch',
    ylabel = 'Accuracy'
);
plt.show()
# Loss trace plot
plt.figure()
sns.relplot(
    data = hist[hist['metric'] == 'loss'],
    kind = 'scatter',
    x = 'epoch',
    y = 'value',
    hue = 'split'
).set(
    xlabel = 'Epoch',
    ylabel = 'Loss'
);
plt.show()
system.time({
  history <- model %>% fit(
    x_train, y_train,
    epochs = 30, batch_size = 128,
    validation_split = 0.2
  )
})
Epoch 1/30
375/375 - 1s - loss: 0.4300 - accuracy: 0.8703 - val_loss: 0.1623 - val_accuracy: 0.9523 - 1s/epoch - 3ms/step
Epoch 2/30
375/375 - 1s - loss: 0.1997 - accuracy: 0.9415 - val_loss: 0.1185 - val_accuracy: 0.9631 - 1s/epoch - 3ms/step
Epoch 3/30
375/375 - 1s - loss: 0.1571 - accuracy: 0.9537 - val_loss: 0.1077 - val_accuracy: 0.9669 - 1s/epoch - 3ms/step
Epoch 4/30
375/375 - 1s - loss: 0.1322 - accuracy: 0.9611 - val_loss: 0.0948 - val_accuracy: 0.9740 - 1s/epoch - 3ms/step
Epoch 5/30
375/375 - 1s - loss: 0.1148 - accuracy: 0.9654 - val_loss: 0.0967 - val_accuracy: 0.9729 - 1s/epoch - 3ms/step
Epoch 6/30
375/375 - 1s - loss: 0.1022 - accuracy: 0.9699 - val_loss: 0.0856 - val_accuracy: 0.9768 - 1s/epoch - 3ms/step
Epoch 7/30
375/375 - 1s - loss: 0.0951 - accuracy: 0.9714 - val_loss: 0.0881 - val_accuracy: 0.9758 - 1s/epoch - 3ms/step
Epoch 8/30
375/375 - 1s - loss: 0.0868 - accuracy: 0.9733 - val_loss: 0.0908 - val_accuracy: 0.9759 - 1s/epoch - 3ms/step
Epoch 9/30
375/375 - 1s - loss: 0.0808 - accuracy: 0.9757 - val_loss: 0.0839 - val_accuracy: 0.9777 - 1s/epoch - 3ms/step
Epoch 10/30
375/375 - 1s - loss: 0.0788 - accuracy: 0.9757 - val_loss: 0.0798 - val_accuracy: 0.9795 - 1s/epoch - 3ms/step
Epoch 11/30
375/375 - 1s - loss: 0.0735 - accuracy: 0.9781 - val_loss: 0.0868 - val_accuracy: 0.9798 - 1s/epoch - 3ms/step
Epoch 12/30
375/375 - 1s - loss: 0.0739 - accuracy: 0.9780 - val_loss: 0.0864 - val_accuracy: 0.9783 - 1s/epoch - 3ms/step
Epoch 13/30
375/375 - 1s - loss: 0.0662 - accuracy: 0.9807 - val_loss: 0.0850 - val_accuracy: 0.9792 - 1s/epoch - 3ms/step
Epoch 14/30
375/375 - 1s - loss: 0.0632 - accuracy: 0.9812 - val_loss: 0.0844 - val_accuracy: 0.9799 - 1s/epoch - 3ms/step
Epoch 15/30
375/375 - 1s - loss: 0.0622 - accuracy: 0.9812 - val_loss: 0.0825 - val_accuracy: 0.9802 - 1s/epoch - 3ms/step
Epoch 16/30
375/375 - 1s - loss: 0.0581 - accuracy: 0.9826 - val_loss: 0.0855 - val_accuracy: 0.9802 - 1s/epoch - 3ms/step
Epoch 17/30
375/375 - 1s - loss: 0.0596 - accuracy: 0.9824 - val_loss: 0.0877 - val_accuracy: 0.9804 - 1s/epoch - 3ms/step
Epoch 18/30
375/375 - 1s - loss: 0.0553 - accuracy: 0.9834 - val_loss: 0.0882 - val_accuracy: 0.9803 - 1s/epoch - 3ms/step
Epoch 19/30
375/375 - 1s - loss: 0.0544 - accuracy: 0.9838 - val_loss: 0.0828 - val_accuracy: 0.9811 - 1s/epoch - 3ms/step
Epoch 20/30
375/375 - 1s - loss: 0.0544 - accuracy: 0.9844 - val_loss: 0.0873 - val_accuracy: 0.9805 - 1s/epoch - 3ms/step
Epoch 21/30
375/375 - 1s - loss: 0.0527 - accuracy: 0.9847 - val_loss: 0.0885 - val_accuracy: 0.9808 - 1s/epoch - 3ms/step
Epoch 22/30
375/375 - 1s - loss: 0.0497 - accuracy: 0.9848 - val_loss: 0.0897 - val_accuracy: 0.9811 - 1s/epoch - 3ms/step
Epoch 23/30
375/375 - 1s - loss: 0.0481 - accuracy: 0.9859 - val_loss: 0.0921 - val_accuracy: 0.9802 - 1s/epoch - 3ms/step
Epoch 24/30
375/375 - 1s - loss: 0.0479 - accuracy: 0.9847 - val_loss: 0.0971 - val_accuracy: 0.9814 - 1s/epoch - 3ms/step
Epoch 25/30
375/375 - 1s - loss: 0.0483 - accuracy: 0.9858 - val_loss: 0.0979 - val_accuracy: 0.9803 - 1s/epoch - 3ms/step
Epoch 26/30
375/375 - 1s - loss: 0.0476 - accuracy: 0.9861 - val_loss: 0.0880 - val_accuracy: 0.9809 - 1s/epoch - 3ms/step
Epoch 27/30
375/375 - 1s - loss: 0.0444 - accuracy: 0.9874 - val_loss: 0.0941 - val_accuracy: 0.9816 - 1s/epoch - 3ms/step
Epoch 28/30
375/375 - 1s - loss: 0.0434 - accuracy: 0.9874 - val_loss: 0.1077 - val_accuracy: 0.9808 - 1s/epoch - 3ms/step
Epoch 29/30
375/375 - 1s - loss: 0.0405 - accuracy: 0.9879 - val_loss: 0.0921 - val_accuracy: 0.9820 - 1s/epoch - 3ms/step
Epoch 30/30
375/375 - 1s - loss: 0.0439 - accuracy: 0.9874 - val_loss: 0.0929 - val_accuracy: 0.9827 - 1s/epoch - 3ms/step
user system elapsed
63.770 20.459 34.053
Testing
Evaluate model performance on the test data:
score = model.evaluate(x_test, y_test, verbose = 0)
print("Test loss:", score[0])
Test loss: 0.08709269762039185
print ("Test accuracy:" , score[1 ])
Test accuracy: 0.9825000166893005
model %>% evaluate(x_test, y_test)
313/313 - 0s - loss: 0.0835 - accuracy: 0.9836 - 185ms/epoch - 591us/step
loss accuracy
0.08349238 0.98360002
Generate predictions on new data:
model %>% predict(x_test) %>% k_argmax()
313/313 - 0s - 208ms/epoch - 663us/step
tf.Tensor([7 2 1 ... 4 5 6], shape=(10000), dtype=int64)
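For reference, a minimal sketch of the analogous prediction step in Python, using the model and x_test defined above; np.argmax converts the softmax probabilities to predicted digits:
# Predicted class probabilities: one row of 10 probabilities per test image
probs = model.predict(x_test, verbose = 0)
# Predicted digit for each test image
pred = np.argmax(probs, axis = 1)
pred[:10]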
Exercise
Suppose we want to fit a multinomial-logit model and use it as a baseline for the neural network above. How can we do that? Of course we could use mlogit or other packages. Instead, we can fit the same model using keras, since multinomial logit is just an MLP with (1) an input layer with linear activation and (2) an output layer with a softmax link function.
# set up model
library(keras)
mlogit <- keras_model_sequential()
mlogit %>%
  # layer_dense(units = 256, activation = 'linear', input_shape = c(784)) %>%
  # layer_dropout(rate = 0.4) %>%
  layer_dense(units = 10, activation = 'softmax', input_shape = c(784))
summary(mlogit)
Model: "sequential_2"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
dense_6 (Dense) (None, 10) 7850
================================================================================
Total params: 7850 (30.66 KB)
Trainable params: 7850 (30.66 KB)
Non-trainable params: 0 (0.00 Byte)
________________________________________________________________________________
# compile model
mlogit %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
mlogit
Model: "sequential_2"
________________________________________________________________________________
Layer (type) Output Shape Param #
================================================================================
dense_6 (Dense) (None, 10) 7850
================================================================================
Total params: 7850 (30.66 KB)
Trainable params: 7850 (30.66 KB)
Non-trainable params: 0 (0.00 Byte)
________________________________________________________________________________
# fit model
mlogit_history <- mlogit %>%
  fit(
    x_train,
    y_train,
    epochs = 20,
    batch_size = 128,
    validation_split = 0.2
  )
Epoch 1/20
375/375 - 0s - loss: 0.6682 - accuracy: 0.8367 - val_loss: 0.3600 - val_accuracy: 0.9047 - 468ms/epoch - 1ms/step
Epoch 2/20
375/375 - 0s - loss: 0.3530 - accuracy: 0.9034 - val_loss: 0.3100 - val_accuracy: 0.9141 - 314ms/epoch - 838us/step
Epoch 3/20
375/375 - 0s - loss: 0.3181 - accuracy: 0.9112 - val_loss: 0.2956 - val_accuracy: 0.9172 - 314ms/epoch - 838us/step
Epoch 4/20
375/375 - 0s - loss: 0.3019 - accuracy: 0.9161 - val_loss: 0.2864 - val_accuracy: 0.9205 - 305ms/epoch - 813us/step
Epoch 5/20
375/375 - 0s - loss: 0.2922 - accuracy: 0.9178 - val_loss: 0.2783 - val_accuracy: 0.9232 - 324ms/epoch - 864us/step
Epoch 6/20
375/375 - 0s - loss: 0.2858 - accuracy: 0.9202 - val_loss: 0.2758 - val_accuracy: 0.9235 - 305ms/epoch - 814us/step
Epoch 7/20
375/375 - 0s - loss: 0.2807 - accuracy: 0.9213 - val_loss: 0.2715 - val_accuracy: 0.9251 - 306ms/epoch - 816us/step
Epoch 8/20
375/375 - 0s - loss: 0.2769 - accuracy: 0.9226 - val_loss: 0.2707 - val_accuracy: 0.9268 - 317ms/epoch - 846us/step
Epoch 9/20
375/375 - 0s - loss: 0.2738 - accuracy: 0.9244 - val_loss: 0.2677 - val_accuracy: 0.9261 - 306ms/epoch - 817us/step
Epoch 10/20
375/375 - 0s - loss: 0.2710 - accuracy: 0.9251 - val_loss: 0.2666 - val_accuracy: 0.9279 - 303ms/epoch - 807us/step
Epoch 11/20
375/375 - 0s - loss: 0.2687 - accuracy: 0.9255 - val_loss: 0.2658 - val_accuracy: 0.9273 - 373ms/epoch - 996us/step
Epoch 12/20
375/375 - 0s - loss: 0.2666 - accuracy: 0.9262 - val_loss: 0.2668 - val_accuracy: 0.9277 - 322ms/epoch - 859us/step
Epoch 13/20
375/375 - 0s - loss: 0.2648 - accuracy: 0.9272 - val_loss: 0.2649 - val_accuracy: 0.9282 - 308ms/epoch - 820us/step
Epoch 14/20
375/375 - 0s - loss: 0.2634 - accuracy: 0.9267 - val_loss: 0.2633 - val_accuracy: 0.9302 - 308ms/epoch - 822us/step
Epoch 15/20
375/375 - 0s - loss: 0.2621 - accuracy: 0.9273 - val_loss: 0.2638 - val_accuracy: 0.9300 - 325ms/epoch - 868us/step
Epoch 16/20
375/375 - 0s - loss: 0.2606 - accuracy: 0.9280 - val_loss: 0.2637 - val_accuracy: 0.9286 - 320ms/epoch - 853us/step
Epoch 17/20
375/375 - 0s - loss: 0.2597 - accuracy: 0.9285 - val_loss: 0.2628 - val_accuracy: 0.9298 - 308ms/epoch - 821us/step
Epoch 18/20
375/375 - 0s - loss: 0.2582 - accuracy: 0.9292 - val_loss: 0.2631 - val_accuracy: 0.9308 - 304ms/epoch - 811us/step
Epoch 19/20
375/375 - 0s - loss: 0.2574 - accuracy: 0.9293 - val_loss: 0.2622 - val_accuracy: 0.9301 - 303ms/epoch - 807us/step
Epoch 20/20
375/375 - 0s - loss: 0.2562 - accuracy: 0.9298 - val_loss: 0.2620 - val_accuracy: 0.9298 - 322ms/epoch - 859us/step
# Evaluate model performance on the test data:
mlogit %>% evaluate(x_test, y_test)
313/313 - 0s - loss: 0.2666 - accuracy: 0.9272 - 126ms/epoch - 404us/step
loss accuracy
0.2665513 0.9272000
Generate predictions on new data:
mlogit %>% predict(x_test) %>% k_argmax()
313/313 - 0s - 122ms/epoch - 388us/step
tf.Tensor([7 2 1 ... 4 5 6], shape=(10000), dtype=int64)
Experiment: Change the `linear` activation to `relu` in the multinomial-logit model and see the change in classification accuracy.
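For completeness, here is a minimal Python sketch of the same multinomial-logit baseline (the name mlogit_py is illustrative); the experiment then amounts to inserting a hidden dense layer with 'linear' vs. 'relu' activation before the softmax layer:
# Multinomial logit = softmax regression: a single dense layer with softmax output
mlogit_py = keras.Sequential(
    [
        keras.Input(shape = (784,)),
        # For the experiment, insert e.g. layers.Dense(units = 256, activation = 'linear') here
        layers.Dense(units = 10, activation = 'softmax')
    ]
)
mlogit_py.compile(
    loss = "categorical_crossentropy",
    optimizer = "rmsprop",
    metrics = ["accuracy"]
)
mlogit_py.fit(x_train, y_train, batch_size = 128, epochs = 20, validation_split = 0.2)
mlogit_py.evaluate(x_test, y_test, verbose = 0)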