Select Topics in R Programming - Part II

Biostat 203B

Author

Dr. Hua Zhou @ UCLA

Published

March 13, 2024

1 Preamble

Display machine information for reproducibility.

sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-unknown-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/aarch64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/aarch64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] htmlwidgets_1.6.4 compiler_4.3.2    fastmap_1.1.1     cli_3.6.2        
 [5] tools_4.3.2       htmltools_0.5.7   rstudioapi_0.15.0 yaml_2.3.8       
 [9] rmarkdown_2.25    knitr_1.45        jsonlite_1.8.8    xfun_0.42        
[13] digest_0.6.34     rlang_1.1.3       evaluate_0.23    

2 Typical development cycle for computational statistics

  1. Scientific planning: What experiments would verify/invalidate our hypotheses? What parameter settings should we consider?

  2. Code planning: What does the code need to do? How will the code fit together? What functions will be used? What are their inputs/outputs etc.

  3. Implementation:

    1. Prototype functions, classes, etc., partial documentation

    2. Write unit tests

    3. Implement code, run unit tests, debug

    4. Broader testing, more debugging

    5. Profile code, identify bottlenecks

    6. Optimize code

  4. Conduct experiments.

  5. Full documentation.

3 Bytecode compilation

  • After profiling, what to do to improve performance?

    1. Ask are there obvious speedups? Are things being computed unnecessarily? Are you using a data.frame where you should be using a matrix etc.

    2. Look up your problem on internet (e.g., search for “lapply slow” or “speeding up lapply” etc.)

    3. Try the just-in-time (JIT) compiler.

    4. Consider re-writing some or all of the code in a compiled language (e.g., C/C++).

    5. Try parallelization.

  • R typical execution:

  • Since R 2.1.4, the compiler package by Luke Tierney is distributed with base R. compiler package compiles an R function into bytecode.

3.1 Example: summing a vector

Brute-force for loop for summing a vector:

sum_r <- function(x) {
  sumx <- 0.0
  for (i in 1:length(x)) {
    sumx <- sumx + x[i]
  }
  return(sumx)
}
sum_r
function(x) {
  sumx <- 0.0
  for (i in 1:length(x)) {
    sumx <- sumx + x[i]
  }
  return(sumx)
}

Run the code on 1e6 elements:

library(microbenchmark)
library(ggplot2)

x = seq(from = 0, to = 100, by = 0.0001)
microbenchmark(sum_r(x))
Unit: milliseconds
     expr      min       lq     mean   median       uq      max neval
 sum_r(x) 16.48617 17.25273 17.49703 17.43515 17.76436 18.66871   100

Let’s compile the function into bytecode sum_rc and benchmark again:

library(compiler)
sum_rc <- cmpfun(sum_r)
sum_rc
function(x) {
  sumx <- 0.0
  for (i in 1:length(x)) {
    sumx <- sumx + x[i]
  }
  return(sumx)
}
<bytecode: 0xaaaaeeb4ff60>

Benchmark again:

microbenchmark(sum_r(x), sum_rc(x))
Unit: milliseconds
      expr      min       lq     mean   median       uq      max neval cld
  sum_r(x) 16.41467 17.15398 17.25924 17.24933 17.34571 17.84212   100   a
 sum_rc(x) 16.64304 17.17708 17.30328 17.27069 17.40225 17.84188   100   a

Surprise! Surprise! Compiling into bytecode does not help at all. Following code shows that the function sum_r is already compiled into bytecode before execution.

sum_r
function(x) {
  sumx <- 0.0
  for (i in 1:length(x)) {
    sumx <- sumx + x[i]
  }
  return(sumx)
}
<bytecode: 0xaaaaf0811e28>

Let’s turn off JIT (just-in-time compilation), re-define the (same) sum_r function, and benchmark again:

enableJIT(0) # set JIT level to 0
[1] 3
sum_r <- function(x) {
  sumx <- 0.0
  for (i in 1:length(x)) {
    sumx <- sumx + x[i]
  }
  return(sumx)
}
microbenchmark(sum_r(x))
Unit: milliseconds
     expr      min       lq     mean   median       uq      max neval
 sum_r(x) 149.4499 152.6917 156.5765 155.2042 157.2557 232.8011   100

Now we witness the slowness of the un-compiled sum_r.

Documentation of enableJIT:

enableJIT enables or disables just-in-time (JIT) compilation. JIT is disabled if the argument is 0. If level is 1 then larger closures are compiled before their first use. If level is 2, then some small closures are also compiled before their second use. If level is 3 then in addition all top level loops are compiled before they are executed. JIT level 3 requires the compiler option optimize to be 2 or 3. The JIT level can also be selected by starting R with the environment variable R_ENABLE_JIT set to one of these values. Calling enableJIT with a negative argument returns the current JIT level. The default JIT level is 3.

Since R 3.4.0 (Apr 2017), the JIT (‘Just In Time’) bytecode compiler is enabled by default at its level 3.

If you create a package, then you automatically compile the package on installation by adding

ByteCompile: true

to the DESCRIPTION file.

Note Matlab has employed JIT technology since 2002 and Julia is designed totally based on JIT. R finally is on the same boat.

4 Rcpp

Learning sources:

compiler package compiles R code into bytecode, which is translated to machine code by interpreter during execution. A low-level language such as C, C++, and Fortran is compiled into machine code directly, yielding the maximum efficiency.

4.1 Use cppFunction

Rcpp package provides a convenient way to embed C++ code in R code.

library(Rcpp)

cppFunction('double sum_c(NumericVector x) {
  int n = x.size();
  double total = 0;
  for(int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total;
}')
sum_c
function (x) 
.Call(<pointer: 0xffff71ba5590>, x)

Benchmark (1) compiled C++ function sum_c together with (2) R function sum_r, (3) compiled R function sum_rc, and (4) the sum function in base R:

mbm <- microbenchmark(sum_r(x), sum_rc(x), sum_c(x), sum(x))
mbm
Unit: microseconds
      expr        min          lq        mean     median         uq        max
  sum_r(x) 148166.418 151448.3130 154195.2837 152401.292 153969.542 221559.709
 sum_rc(x)  16885.084  17199.2505  17399.3656  17297.584  17475.001  18786.417
  sum_c(x)    813.167    846.8135    879.8023    875.292    907.209    977.043
    sum(x)  10795.043  10963.7300  11021.4745  11012.751  11077.667  11328.251
 neval  cld
   100 a   
   100  b  
   100   c 
   100    d
autoplot(mbm)

Remember we turned off JIT by enableGIT(0) earlier.

4.2 Use sourceCpp

In realistic projects, we write standalone C++ files and then source them into R using sourceCpp(). For example

cat sum.cpp
#include <Rcpp.h>
using namespace Rcpp;

// This is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp 
// function (or via the Source button on the editor toolbar). Learn
// more about Rcpp at:
//
//   http://www.rcpp.org/
//   http://adv-r.had.co.nz/Rcpp.html
//   http://gallery.rcpp.org/
//

// [[Rcpp::export]]
double sum_c(NumericVector x) {
  int n = x.size();
  double total = 0;
  for(int i = 0; i < n; ++i) {
    total += x[i];
  }
  return total;
}

// You can include R code blocks in C++ files processed with sourceCpp
// (useful for testing and development). The R code will be automatically 
// run after the compilation.
//

/*** R
sum_c(as.double(1:10))
*/
sourceCpp("sum.cpp")

> sum_c(as.double(1:10))
[1] 55
sum_c
function (x) 
.Call(<pointer: 0xffff71b85590>, x)

5 Parallel computing

  • Fact: base R is single-threaded. Even you request a fancy instance with 96 vCPUs, running R code is just using 1/96th of its power.

  • Embarrassing Parallelism: separate a problem into a small number of large parallel pieces, where different pieces of the problem never really have to communicate with each other.

  • To perform multi-core computation in R using parallel package.

    • Authors: Brian Ripley, Luke Tieney, Simon Urbanek.

    • Included in base R since 2.14.0 (2011).

    • Based on the snow (Luke Tierney) and multicore (Simon Urbanek) packages.

    • To find the number of cores:

library(parallel)
detectCores()
[1] 12

5.1 Simulation example

We have a “new” method that estimates the population mean by averaging the observations indexed by prime numbers.

## check if a given integer is prime
isPrime = function(n) {
  if (n <= 3) {
    return (TRUE)
  }
  if (any((n %% 2:floor(sqrt(n))) == 0)) {
    return (FALSE)
  }
  return (TRUE)
}

## estimate mean only using observation with prime indices
estMeanPrimes = function(x) {
  n <- length(x)
  ind <- sapply(1:n, isPrime)
  return (mean(x[ind]))
}

We want to compare our method to the traditional sample average estimator by simulation studies.

## compare methods: sample avg and prime-indexed avg
compare_methods <- function(dist = "gaussian", n = 100, reps = 100, seed = 123) {
  # set seed according to command argument `seed`
  set.seed(seed)
  
  # preallocate space to store estimators
  msePrimeAvg <- 0.0
  mseSamplAvg <- 0.0
  # loop over simulation replicates
  for (r in 1:reps) {
    # simulate data according to command arguments `n` and `distr`
    if (dist == "gaussian") {
      x = rnorm(n)
    } else if (dist == "t1") {
      x = rcauchy(n)
    } else if (dist == "t5") {
      x = rt(n, 5)
    } else {
      stop(paste("unrecognized dist: ", dist))
    }
    # prime indexed mean estimator and classical sample average estimator
    msePrimeAvg <- msePrimeAvg + estMeanPrimes(x)^2 
    mseSamplAvg <- mseSamplAvg + mean(x)^2
  }
  mseSamplAvg <- mseSamplAvg / reps
  msePrimeAvg <- msePrimeAvg / reps
  return(c(mseSamplAvg, msePrimeAvg))
}

5.2 Serial code

We need to loop over 3 generative models (distTypes) and 20 samples sizes (nVals). That are 60 embarrassingly parallel tasks.

seed = 280
reps = 500
nVals = seq(100, 1000, by = 50)
distTypes = c("gaussian", "t5", "t1")

This is the serial code that double-loop over combinations of distTypes and nVals:

## simulation study with combination of generative model `dist` and
## sample size `n` (serial code)
simres1 = matrix(0.0, nrow = 2 * length(nVals), ncol = length(distTypes))
i = 1 # entry index
system.time(
  for (dist in distTypes) {
    for (n in nVals) {
      #print(paste("n=", n, " dist=", dist, " seed=", seed, " reps=", reps, sep=""))
      simres1[i:(i + 1)] = compare_methods(dist, n, reps, seed)
      i <- i + 2
    }
  }
)
   user  system elapsed 
 16.692   0.005  16.697 
simres1
              [,1]        [,2]         [,3]
 [1,] 0.0103989436 0.017070603     312.4001
 [2,] 0.0410217819 0.066503177     200.3237
 [3,] 0.0065484669 0.011260420     173.9631
 [4,] 0.0297390639 0.047465397     199.5330
 [5,] 0.0056445593 0.007754855   68026.7023
 [6,] 0.0215380206 0.040039145 1230343.1341
 [7,] 0.0040803523 0.006871930   43609.7815
 [8,] 0.0165144049 0.032353077  931755.3182
 [9,] 0.0032566766 0.005417194   30283.1286
[10,] 0.0161191554 0.026330133  684726.8788
[11,] 0.0027565672 0.004444172   22306.1369
[12,] 0.0145039253 0.022820075  539105.3392
[13,] 0.0024915830 0.003798500   17119.0807
[14,] 0.0122801335 0.022788299  435528.4778
[15,] 0.0023706676 0.003360507   13531.4104
[16,] 0.0112703627 0.016674640     111.4673
[17,] 0.0020190283 0.003147367   10973.3489
[18,] 0.0106157492 0.016027485     278.9663
[19,] 0.0017567901 0.002863640    9069.6647
[20,] 0.0096185720 0.016444671  261373.7646
[21,] 0.0016441481 0.002637964    7629.9867
[22,] 0.0081784426 0.013710539     296.6235
[23,] 0.0015075246 0.002498450    6498.2362
[24,] 0.0088018140 0.012909942  191986.6388
[25,] 0.0014372130 0.002308089    5603.9395
[26,] 0.0077292632 0.012789483  171280.5448
[27,] 0.0012924543 0.002216936    4889.8739
[28,] 0.0069012154 0.011052562     170.3975
[29,] 0.0011994654 0.001987311    4299.6838
[30,] 0.0067559611 0.011291788     178.7454
[31,] 0.0011642413 0.001888637    3806.8472
[32,] 0.0070131993 0.010048327     140.2389
[33,] 0.0011566121 0.001873365    3401.9766
[34,] 0.0065066558 0.009020973      34.1644
[35,] 0.0010506067 0.001595430    3049.0432
[36,] 0.0060026682 0.010338424  103578.0598
[37,] 0.0009770234 0.001618095    2768.4517
[38,] 0.0054705674 0.009229294     143.5544

5.3 Using mcmapply

Run the same task using mcmapply function (parallel analog of mapply) in the parallel package:

## simulation study with combination of generative model `dist` and
## sample size `n` (parallel code using mcmapply)
library(parallel)
system.time({
  simres2 <- mcmapply(compare_methods, 
                      rep(distTypes, each = length(nVals), times = 1),
                      rep(nVals, each = 1, times = length(distTypes)),
                      reps, 
                      seed,
                      mc.cores = 4)
})
   user  system elapsed 
 12.798   0.161   4.615 
simres2 <- matrix(unlist(simres2), ncol = length(distTypes))
simres2
              [,1]        [,2]         [,3]
 [1,] 0.0103989436 0.017070603     312.4001
 [2,] 0.0410217819 0.066503177     200.3237
 [3,] 0.0065484669 0.011260420     173.9631
 [4,] 0.0297390639 0.047465397     199.5330
 [5,] 0.0056445593 0.007754855   68026.7023
 [6,] 0.0215380206 0.040039145 1230343.1341
 [7,] 0.0040803523 0.006871930   43609.7815
 [8,] 0.0165144049 0.032353077  931755.3182
 [9,] 0.0032566766 0.005417194   30283.1286
[10,] 0.0161191554 0.026330133  684726.8788
[11,] 0.0027565672 0.004444172   22306.1369
[12,] 0.0145039253 0.022820075  539105.3392
[13,] 0.0024915830 0.003798500   17119.0807
[14,] 0.0122801335 0.022788299  435528.4778
[15,] 0.0023706676 0.003360507   13531.4104
[16,] 0.0112703627 0.016674640     111.4673
[17,] 0.0020190283 0.003147367   10973.3489
[18,] 0.0106157492 0.016027485     278.9663
[19,] 0.0017567901 0.002863640    9069.6647
[20,] 0.0096185720 0.016444671  261373.7646
[21,] 0.0016441481 0.002637964    7629.9867
[22,] 0.0081784426 0.013710539     296.6235
[23,] 0.0015075246 0.002498450    6498.2362
[24,] 0.0088018140 0.012909942  191986.6388
[25,] 0.0014372130 0.002308089    5603.9395
[26,] 0.0077292632 0.012789483  171280.5448
[27,] 0.0012924543 0.002216936    4889.8739
[28,] 0.0069012154 0.011052562     170.3975
[29,] 0.0011994654 0.001987311    4299.6838
[30,] 0.0067559611 0.011291788     178.7454
[31,] 0.0011642413 0.001888637    3806.8472
[32,] 0.0070131993 0.010048327     140.2389
[33,] 0.0011566121 0.001873365    3401.9766
[34,] 0.0065066558 0.009020973      34.1644
[35,] 0.0010506067 0.001595430    3049.0432
[36,] 0.0060026682 0.010338424  103578.0598
[37,] 0.0009770234 0.001618095    2768.4517
[38,] 0.0054705674 0.009229294     143.5544
  • We see roughly 3x-4x speedup with mc.cores=4.

  • mcmapply, mclapply and related functions rely on the forking capability of POSIX operating systems (e.g. Linux, MacOS) and is not available in Windows.

  • parLapply, parApply, parCapply, parRapply, clusterApply, clusterMap, and related functions create a cluster of workers based on either socket (default) or forking. Socket is available on all platforms: Linux, MacOS, and Windows.

5.4 Using clusterMap

The same simulation example using clusterMap function:

cl <- makeCluster(getOption("cl.cores", 4))
clusterExport(cl, c("isPrime", "estMeanPrimes", "compare_methods"))
system.time({
  simres3 <- clusterMap(cl, compare_methods,
                        rep(distTypes, each = length(nVals), times = 1),
                        rep(nVals, each = 1, times = length(distTypes)),
                        reps,
                        seed,
                        .scheduling = "dynamic")
})
   user  system elapsed 
  0.013   0.000   3.559 
simres3 <- matrix(unlist(simres3), ncol = length(distTypes))
stopCluster(cl)
simres3
              [,1]        [,2]         [,3]
 [1,] 0.0103989436 0.017070603     312.4001
 [2,] 0.0410217819 0.066503177     200.3237
 [3,] 0.0065484669 0.011260420     173.9631
 [4,] 0.0297390639 0.047465397     199.5330
 [5,] 0.0056445593 0.007754855   68026.7023
 [6,] 0.0215380206 0.040039145 1230343.1341
 [7,] 0.0040803523 0.006871930   43609.7815
 [8,] 0.0165144049 0.032353077  931755.3182
 [9,] 0.0032566766 0.005417194   30283.1286
[10,] 0.0161191554 0.026330133  684726.8788
[11,] 0.0027565672 0.004444172   22306.1369
[12,] 0.0145039253 0.022820075  539105.3392
[13,] 0.0024915830 0.003798500   17119.0807
[14,] 0.0122801335 0.022788299  435528.4778
[15,] 0.0023706676 0.003360507   13531.4104
[16,] 0.0112703627 0.016674640     111.4673
[17,] 0.0020190283 0.003147367   10973.3489
[18,] 0.0106157492 0.016027485     278.9663
[19,] 0.0017567901 0.002863640    9069.6647
[20,] 0.0096185720 0.016444671  261373.7646
[21,] 0.0016441481 0.002637964    7629.9867
[22,] 0.0081784426 0.013710539     296.6235
[23,] 0.0015075246 0.002498450    6498.2362
[24,] 0.0088018140 0.012909942  191986.6388
[25,] 0.0014372130 0.002308089    5603.9395
[26,] 0.0077292632 0.012789483  171280.5448
[27,] 0.0012924543 0.002216936    4889.8739
[28,] 0.0069012154 0.011052562     170.3975
[29,] 0.0011994654 0.001987311    4299.6838
[30,] 0.0067559611 0.011291788     178.7454
[31,] 0.0011642413 0.001888637    3806.8472
[32,] 0.0070131993 0.010048327     140.2389
[33,] 0.0011566121 0.001873365    3401.9766
[34,] 0.0065066558 0.009020973      34.1644
[35,] 0.0010506067 0.001595430    3049.0432
[36,] 0.0060026682 0.010338424  103578.0598
[37,] 0.0009770234 0.001618095    2768.4517
[38,] 0.0054705674 0.009229294     143.5544
  • Again we see roughly 3x-4x speedup by using 4 cores.

  • clusterExport export the data and other information that the child process need from the parent process to the child prrocess.

6 Package development

An R package is the basic unit of reusable code.

Main components of an R package:

  • A DESCRIPTION file

  • An R directory with the code

  • A man directory with documentation (create this automatically by devtools::document() and roxygen2)

  • A NAMESPACE file listing user-level functions in the package (create this automatically by roxygen2)

  • Optional but highly recommended: a tests directory (create by usethis::use_testthat() and test by devtools::test())

  • Optional: a src directory with compiled code (C++)

Learning resources: