function(x) {
sumx <- 0.0
for (i in 1:length(x)) {
sumx <- sumx + x[i]
}
return(sumx)
}
Run the code on 1e6 elements:
library(microbenchmark)library(ggplot2)x =seq(from =0, to =100, by =0.0001)microbenchmark(sum_r(x))
Unit: milliseconds
expr min lq mean median uq max neval
sum_r(x) 16.48617 17.25273 17.49703 17.43515 17.76436 18.66871 100
Let’s compile the function into bytecode sum_rc and benchmark again:
library(compiler)sum_rc <-cmpfun(sum_r)sum_rc
function(x) {
sumx <- 0.0
for (i in 1:length(x)) {
sumx <- sumx + x[i]
}
return(sumx)
}
<bytecode: 0xaaaaeeb4ff60>
Benchmark again:
microbenchmark(sum_r(x), sum_rc(x))
Unit: milliseconds
expr min lq mean median uq max neval cld
sum_r(x) 16.41467 17.15398 17.25924 17.24933 17.34571 17.84212 100 a
sum_rc(x) 16.64304 17.17708 17.30328 17.27069 17.40225 17.84188 100 a
Surprise! Surprise! Compiling into bytecode does not help at all. Following code shows that the function sum_r is already compiled into bytecode before execution.
sum_r
function(x) {
sumx <- 0.0
for (i in 1:length(x)) {
sumx <- sumx + x[i]
}
return(sumx)
}
<bytecode: 0xaaaaf0811e28>
Let’s turn off JIT (just-in-time compilation), re-define the (same) sum_r function, and benchmark again:
Unit: milliseconds
expr min lq mean median uq max neval
sum_r(x) 149.4499 152.6917 156.5765 155.2042 157.2557 232.8011 100
Now we witness the slowness of the un-compiled sum_r.
Documentation of enableJIT:
enableJIT enables or disables just-in-time (JIT) compilation. JIT is disabled if the argument is 0. If level is 1 then larger closures are compiled before their first use. If level is 2, then some small closures are also compiled before their second use. If level is 3 then in addition all top level loops are compiled before they are executed. JIT level 3 requires the compiler option optimize to be 2 or 3. The JIT level can also be selected by starting R with the environment variable R_ENABLE_JIT set to one of these values. Calling enableJIT with a negative argument returns the current JIT level. The default JIT level is 3.
Since R 3.4.0 (Apr 2017), the JIT (‘Just In Time’) bytecode compiler is enabled by default at its level 3.
If you create a package, then you automatically compile the package on installation by adding
ByteCompile: true
to the DESCRIPTION file.
Note Matlab has employed JIT technology since 2002 and Julia is designed totally based on JIT. R finally is on the same boat.
compiler package compiles R code into bytecode, which is translated to machine code by interpreter during execution. A low-level language such as C, C++, and Fortran is compiled into machine code directly, yielding the maximum efficiency.
4.1 Use cppFunction
Rcpp package provides a convenient way to embed C++ code in R code.
library(Rcpp)cppFunction('double sum_c(NumericVector x) { int n = x.size(); double total = 0; for(int i = 0; i < n; ++i) { total += x[i]; } return total;}')sum_c
function (x)
.Call(<pointer: 0xffff71ba5590>, x)
Benchmark (1) compiled C++ function sum_c together with (2) R function sum_r, (3) compiled R function sum_rc, and (4) the sum function in base R:
Unit: microseconds
expr min lq mean median uq max
sum_r(x) 148166.418 151448.3130 154195.2837 152401.292 153969.542 221559.709
sum_rc(x) 16885.084 17199.2505 17399.3656 17297.584 17475.001 18786.417
sum_c(x) 813.167 846.8135 879.8023 875.292 907.209 977.043
sum(x) 10795.043 10963.7300 11021.4745 11012.751 11077.667 11328.251
neval cld
100 a
100 b
100 c
100 d
autoplot(mbm)
Remember we turned off JIT by enableGIT(0) earlier.
4.2 Use sourceCpp
In realistic projects, we write standalone C++ files and then source them into R using sourceCpp(). For example
cat sum.cpp
#include <Rcpp.h>
using namespace Rcpp;
// This is a simple example of exporting a C++ function to R. You can
// source this function into an R session using the Rcpp::sourceCpp
// function (or via the Source button on the editor toolbar). Learn
// more about Rcpp at:
//
// http://www.rcpp.org/
// http://adv-r.had.co.nz/Rcpp.html
// http://gallery.rcpp.org/
//
// [[Rcpp::export]]
double sum_c(NumericVector x) {
int n = x.size();
double total = 0;
for(int i = 0; i < n; ++i) {
total += x[i];
}
return total;
}
// You can include R code blocks in C++ files processed with sourceCpp
// (useful for testing and development). The R code will be automatically
// run after the compilation.
//
/*** R
sum_c(as.double(1:10))
*/
sourceCpp("sum.cpp")
> sum_c(as.double(1:10))
[1] 55
sum_c
function (x)
.Call(<pointer: 0xffff71b85590>, x)
5 Parallel computing
Fact: base R is single-threaded. Even you request a fancy instance with 96 vCPUs, running R code is just using 1/96th of its power.
Embarrassing Parallelism: separate a problem into a small number of large parallel pieces, where different pieces of the problem never really have to communicate with each other.
To perform multi-core computation in R using parallel package.
Authors: Brian Ripley, Luke Tieney, Simon Urbanek.
Included in base R since 2.14.0 (2011).
Based on the snow (Luke Tierney) and multicore (Simon Urbanek) packages.
To find the number of cores:
library(parallel)detectCores()
[1] 12
5.1 Simulation example
We have a “new” method that estimates the population mean by averaging the observations indexed by prime numbers.
## check if a given integer is primeisPrime =function(n) {if (n <=3) {return (TRUE) }if (any((n %%2:floor(sqrt(n))) ==0)) {return (FALSE) }return (TRUE)}## estimate mean only using observation with prime indicesestMeanPrimes =function(x) { n <-length(x) ind <-sapply(1:n, isPrime)return (mean(x[ind]))}
We want to compare our method to the traditional sample average estimator by simulation studies.
## compare methods: sample avg and prime-indexed avgcompare_methods <-function(dist ="gaussian", n =100, reps =100, seed =123) {# set seed according to command argument `seed`set.seed(seed)# preallocate space to store estimators msePrimeAvg <-0.0 mseSamplAvg <-0.0# loop over simulation replicatesfor (r in1:reps) {# simulate data according to command arguments `n` and `distr`if (dist =="gaussian") { x =rnorm(n) } elseif (dist =="t1") { x =rcauchy(n) } elseif (dist =="t5") { x =rt(n, 5) } else {stop(paste("unrecognized dist: ", dist)) }# prime indexed mean estimator and classical sample average estimator msePrimeAvg <- msePrimeAvg +estMeanPrimes(x)^2 mseSamplAvg <- mseSamplAvg +mean(x)^2 } mseSamplAvg <- mseSamplAvg / reps msePrimeAvg <- msePrimeAvg / repsreturn(c(mseSamplAvg, msePrimeAvg))}
5.2 Serial code
We need to loop over 3 generative models (distTypes) and 20 samples sizes (nVals). That are 60 embarrassingly parallel tasks.
seed =280reps =500nVals =seq(100, 1000, by =50)distTypes =c("gaussian", "t5", "t1")
This is the serial code that double-loop over combinations of distTypes and nVals:
## simulation study with combination of generative model `dist` and## sample size `n` (serial code)simres1 =matrix(0.0, nrow =2*length(nVals), ncol =length(distTypes))i =1# entry indexsystem.time(for (dist in distTypes) {for (n in nVals) {#print(paste("n=", n, " dist=", dist, " seed=", seed, " reps=", reps, sep="")) simres1[i:(i +1)] =compare_methods(dist, n, reps, seed) i <- i +2 } })
Run the same task using mcmapply function (parallel analog of mapply) in the parallel package:
## simulation study with combination of generative model `dist` and## sample size `n` (parallel code using mcmapply)library(parallel)system.time({ simres2 <-mcmapply(compare_methods, rep(distTypes, each =length(nVals), times =1),rep(nVals, each =1, times =length(distTypes)), reps, seed,mc.cores =4)})
mcmapply, mclapply and related functions rely on the forking capability of POSIX operating systems (e.g. Linux, MacOS) and is not available in Windows.
parLapply, parApply, parCapply, parRapply, clusterApply, clusterMap, and related functions create a cluster of workers based on either socket (default) or forking. Socket is available on all platforms: Linux, MacOS, and Windows.
5.4 Using clusterMap
The same simulation example using clusterMap function:
cl <-makeCluster(getOption("cl.cores", 4))clusterExport(cl, c("isPrime", "estMeanPrimes", "compare_methods"))system.time({ simres3 <-clusterMap(cl, compare_methods,rep(distTypes, each =length(nVals), times =1),rep(nVals, each =1, times =length(distTypes)), reps, seed,.scheduling ="dynamic")})