Task view: High-Performance and Parallel Computing with R

Last updated on 2017-04-25 by Dirk Eddelbuettel

This CRAN task view contains a list of packages, grouped by topic, that are useful for high-performance computing (HPC) with R. In this context, we are defining 'high-performance computing' rather loosely as just about anything related to pushing R a little further: using compiled code, parallel computing (in both explicit and implicit modes), working with large objects as well as profiling.

Unless otherwise mentioned, all packages presented with hyperlinks are available from CRAN, the Comprehensive R Archive Network.

Several of the areas discussed in this Task View are undergoing rapid change. Please send suggestions for additions and extensions for this task view to the task view maintainer.

Suggestions and corrections by Achim Zeileis, Markus Schmidberger, Martin Morgan, Max Kuhn, Tomas Radivoyevitch, Jochen Knaus, Tobias Verbeke, Hao Yu, David Rosenberg, Marco Enea, Ivo Welch, Jay Emerson, Wei-Chen Chen, Bill Cleveland, Ross Boylan, Ramon Diaz-Uriarte, Mark Zeligman, and Kevin Ushey (as well as others I may have forgotten to add here) are gratefully acknowledged.

Contributions are always welcome, and encouraged. Since the start of this CRAN task view in October 2008, most contributions have arrived as email suggestions. The source file for this task view now also resides in a GitHub repository (see below), so pull requests are also possible.

The ctv package supports these Task Views. Its functions install.views and update.views allow, respectively, installation or update of packages from a given Task View; the option coreOnly can restrict operations to packages labeled as core below.

Direct support in R started with release 2.14.0 which includes a new package parallel incorporating (slightly revised) copies of packages multicore and snow. Some types of clusters are not handled directly by the base package 'parallel'. However, and as explained in the package vignette, the parts of parallel which provide snow-like functions will accept snow clusters including MPI clusters.
The parallel package also contains support for multiple RNG streams following L'Ecuyer et al. (2002), with support for both mclapply and snow clusters.
The version released for R 2.14.0 contains base functionality: higher-level convenience functions are planned for later R releases.
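
As a minimal sketch of the snow-like interface and the multiple RNG streams described above, the following uses only functions from the parallel package. The worker count (2) and the seed (123) are arbitrary choices made for illustration, not recommendations.

```r
library(parallel)

# Start a small socket cluster (snow-style); two workers is an arbitrary choice.
cl <- makeCluster(2)

# snow-like apply: square each input on the workers.
squares <- parLapply(cl, 1:4, function(i) i^2)

# Give each worker its own L'Ecuyer-CMRG stream, seeded reproducibly,
# then draw one uniform random number per task.
clusterSetRNGStream(cl, iseed = 123)
draws1 <- parSapply(cl, 1:2, function(i) runif(1))

# Resetting the streams with the same seed reproduces the same draws.
clusterSetRNGStream(cl, iseed = 123)
draws2 <- parSapply(cl, 1:2, function(i) runif(1))

# Always release the workers when done.
stopCluster(cl)
```

On Unix-alikes, mclapply offers a fork-based alternative that needs no explicit cluster object.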

Parallel computing: Explicit parallelism

  • Several packages provide the communications layer required for parallel computing. The first package in this area was rpvm by Li and Rossini which uses the PVM (Parallel Virtual Machine) standard and libraries. rpvm is no longer actively maintained, but available from its CRAN archive directory.
  • In recent years, the alternative MPI (Message Passing Interface) standard has become the de facto standard in parallel computing. It is supported in R via the Rmpi package by Yu. Rmpi is mature yet actively maintained and offers access to numerous functions from the MPI API, as well as a number of R-specific extensions. Rmpi can be used with the LAM/MPI, MPICH / MPICH2, Open MPI, and Deino MPI implementations. It should be noted that LAM/MPI is now in maintenance mode, and new development is focussed on Open MPI.
  • The pbdMPI package provides S4 classes to directly interface MPI in order to support the Single Program/Multiple Data (SPMD) parallel programming style which is particularly useful for batch parallel execution. The pbdSLAP package builds on this and uses scalable linear algebra packages (namely BLACS, PBLAS, and ScaLAPACK) in double precision based on ScaLAPACK version 2.0.2. The pbdBASE package builds on these and provides the core classes and methods for distributed data types upon which the pbdDMAT package builds to provide distributed dense matrices for "Programming with Big Data". The pbdNCDF4 package permits multiple processes to write to the same file (without manual synchronization) and supports terabyte-sized files. The pbdDEMO package provides examples for these packages, and a detailed vignette. The pbdPROF package profiles MPI communication in SPMD code via MPI profiling libraries, such as fpmpi, mpiP, or TAU.
  • An alternative is provided by the nws (NetWorkSpaces) packages from REvolution Computing. It is the successor to the earlier LindaSpaces approach to parallel computing, and is implemented on top of the Twisted networking toolkit for Python.
  • The snow (Simple Network of Workstations) package by Tierney et al. can use PVM, MPI, NWS as well as direct networking sockets. It provides an abstraction layer by hiding the communications details. The snowFT package provides fault-tolerance extensions to snow.
  • The snowfall package by Knaus provides a more recent alternative to snow. Functions can be used in sequential or parallel mode.
  • The foreach package allows general iteration over elements in a collection without the use of an explicit loop counter. Using foreach without side effects also facilitates executing the loop in parallel, which is possible via the doMC (using parallel/multicore on single workstations), doSNOW (using snow, see above), doMPI (using Rmpi), doFuture (using future or future.BatchJobs), and doRedis (using rredis) packages.
  • The future package allows for synchronous (sequential) and asynchronous (parallel) evaluations via abstraction of futures, either via function calls or implicitly via promises. Global variables are automatically identified. Iteration over elements in a collection is supported.
  • The Rborist package employs OpenMP pragmas to exploit predictor-level parallelism in the Random Forest algorithm which promotes efficient use of multicore hardware in restaging data and in determining splitting criteria, both of which are performance bottlenecks in the algorithm.
  • The h2o package connects to the h2o open source machine learning environment which has scalable implementations of random forests, GBM, GLM (with elastic net regularization), and deep learning.
  • The randomForestSRC package can use both OpenMP as well as MPI for random forest extensions suitable for survival analysis, competing risks analysis, classification as well as regression.
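
The foreach idiom mentioned in the list above can be sketched as follows, assuming the foreach package is installed. To keep the example runnable without a cluster, it registers the sequential backend that ships with foreach. In real use one would register a parallel backend instead, such as one of the do* packages listed above.

```r
library(foreach)

# Register the sequential backend so %dopar% runs without any cluster.
# Registering a parallel backend instead (for example via doMC or doSNOW)
# is the only change needed to run the same loop in parallel.
registerDoSEQ()

# Iterate over a collection without an explicit loop counter;
# .combine = c collects the per-iteration results into one vector.
roots <- foreach(i = 1:4, .combine = c) %dopar% {
  sqrt(i)
}
```

Because the loop body has no side effects, the result does not depend on which backend executes it.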

Parallel computing: Implicit parallelism

  • The pnmath package by Tierney (link) uses the OpenMP parallel processing directives of recent compilers (such as gcc 4.2 or later) for implicit parallelism by replacing a number of internal R functions with replacements that can make use of multiple cores, without any explicit requests from the user. The alternate pnmath0 package offers the same functionality using Pthreads for environments in which the newer compilers are not available. Similar functionality is expected to become integrated into R 'eventually'.
  • The romp package by Jamitzky was presented at useR! 2008 (slides) and offers another interface to OpenMP using Fortran. The code is still pre-alpha and available from the Google Code project romp. An R-Forge project romp was initiated but there is no package yet.
  • The R/parallel package by Vera, Jansen and Suppi offers a C++-based master-slave dispatch mechanism for parallel execution. (link)
  • The Rdsm package provides a threads-like parallel computing environment, both on multicore machines and across the network, by providing facilities inspired by distributed shared memory programming.
  • The RhpcBLASctl package detects the number of available BLAS cores, and permits explicit selection of the number of cores.
  • The Rhpc package permits *apply() style dispatch via MPI.

Parallel computing: Grid computing

  • The multiR package by Grose was presented at useR! 2008 but has not been released. It may offer a snow-style framework on a grid computing platform.
  • The biocep-distrib project by Chine offers a Java-based framework for local, Grid, or Cloud computing. It is under active development.

Parallel computing: Hadoop

  • The RHIPE package, started by Saptarshi Guha and now developed by a core team via GitHub, provides an interface between R and Hadoop for analysis of large complex data wholly from within R using the Divide and Recombine approach to big data.
  • The rmr package by Revolution Analytics also provides an interface between R and Hadoop for a Map/Reduce programming framework. (link)
  • A related package, segue by Long, permits easy execution of embarrassingly parallel tasks on Elastic Map Reduce (EMR) at Amazon. (link)
  • The RProtoBuf package provides an interface to Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data. This package can be used in R code to read data streams from other systems in a distributed MapReduce setting where data is serialized and passed back and forth between tasks.
  • The HistogramTools package provides a number of routines useful for the construction, aggregation, manipulation, and plotting of large numbers of Histograms such as those created by Mappers in a MapReduce application.
  • The toaster package performs in-database computations utilizing the parallel / distributed Teradata Aster analytical platform.

Parallel computing: Random numbers

  • Random-number generators for parallel computing are available via the rlecuyer package by Sevcikova and Rossini.
  • The doRNG package provides functions to perform reproducible parallel foreach loops, using independent random streams as generated by the package rstream, suitable for the different foreach backends.
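
To illustrate what independent random streams mean in practice, here is a minimal sketch. It does not use rlecuyer or doRNG. Instead it relies on the L'Ecuyer-CMRG generator in base R and on nextRNGStream() from the parallel package. The helper draw_from() and the seed 42 are hypothetical names and values chosen only for this example.

```r
library(parallel)

# Switch to the L'Ecuyer-CMRG generator, which supports multiple streams.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# The current seed defines stream 1; nextRNGStream() derives stream 2 from it.
s1 <- get(".Random.seed", envir = globalenv())
s2 <- nextRNGStream(s1)

# Hypothetical helper: install a given stream seed, then draw n uniforms.
draw_from <- function(seed, n) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}

a <- draw_from(s1, 3)        # draws from stream 1
b <- draw_from(s2, 3)        # draws from stream 2
a_again <- draw_from(s1, 3)  # restarting stream 1 reproduces the same draws
```

Handing each worker its own stream in this way keeps parallel simulations reproducible regardless of how tasks are scheduled.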

Parallel computing: Resource managers and batch schedulers

  • Job-scheduling toolkits permit management of parallel computing resources and tasks. The slurm (Simple Linux Utility for Resource Management) set of programs works well with MPI and slurm jobs can be submitted from R using the rslurm package. (link)
  • The Condor toolkit (link) from the University of Wisconsin-Madison has been used with R as described in this R News article.
  • The sfCluster package by Knaus can be used with snowfall (link) but is currently limited to LAM/MPI.
  • The batch package by Hoffmann can launch parallel computing requests onto a cluster and gather results.
  • The BatchJobs package provides Map, Reduce and Filter variants to manage R jobs and their results on batch computing systems like PBS/Torque, LSF and Sun Grid Engine. Multicore and SSH systems are also supported. The BatchExperiments package extends it with an abstraction layer for running statistical experiments. Package batchtools is a successor / extension to both.
  • The flowr package offers a scatter-gather approach to submit job lists (including dependencies) to the computing cluster via simple data.frames as inputs. It supports LSF, SGE, Torque and SLURM.

Parallel computing: Applications

  • The caret package by Kuhn can use various frameworks (MPI, NWS etc) to parallelize cross-validation and bootstrap characterizations of predictive models.
  • The maanova package on Bioconductor by Wu can use snow and Rmpi for the analysis of micro-array experiments.
  • The pvclust package by Suzuki and Shimodaira can use snow and Rmpi for hierarchical clustering via multiscale bootstraps.
  • The tm package by Feinerer can use snow and Rmpi for parallelized text mining.
  • The varSelRF package by Diaz-Uriarte can use snow and Rmpi for parallelized use of variable selection via random forests.
  • The bcp package by Erdman and Emerson for the Bayesian analysis of change points can use foreach for parallelized operations.
  • The multtest package by Pollard et al. on Bioconductor can use snow, Rmpi or rpvm for resampling-based testing of multiple hypotheses.
  • The GAMBoost package by Binder for glm and gam model fitting via boosting using b-splines, the Matching package by Sekhon for multivariate and propensity score matching, the STAR package by Pouzat for spike train analysis, the bnlearn package by Scutari for Bayesian network structure learning, the latentnet package by Krivitsky and Handcock for latent position and cluster models, the lga package by Harrington for linear grouping analysis, the peperr package by Porzelius and Binder for parallelised estimation of prediction error, the orloca package by Fernandez-Palacin and Munoz-Marquez for operations research locational analysis, the rgenoud package by Mebane and Sekhon for genetic optimization using derivatives, the affyPara package by Schmidberger, Vicedo and Mansmann for parallel normalization of Affymetrix microarrays, and the puma package by Pearson et al., which propagates uncertainty into standard microarray analyses such as differential expression, can all use snow for parallelized operations using any of the MPI, PVM, NWS or socket protocols supported by snow.
  • The bugsparallel package uses Rmpi for distributed computing of multiple MCMC chains using WinBUGS.
  • The partDSA package uses nws for generating a piecewise constant estimation list of increasingly complex predictors based on an intensive and comprehensive search over the entire covariate space.
  • The dclone package provides a global optimization approach and a variant of simulated annealing which exploits Bayesian MCMC tools to get MLE point estimates and standard errors using low level functions for implementing maximum likelihood estimating procedures for complex models using data cloning and Bayesian Markov chain Monte Carlo methods with support for JAGS, WinBUGS and OpenBUGS; parallel computing is supported via the snow package.
  • The pmclust package utilizes unsupervised model-based clustering for high dimensional (ultra) large data. The package uses pbdMPI to perform a parallel version of the EM algorithm for finite mixture Gaussian models.
  • The harvestr package provides helper functions for (reproducible) simulations.
  • Nowadays, many packages can use the facilities offered by the parallel package. One example is pls, another is PGICA which can run ICA analysis in parallel on SGE or multicore platforms.
  • The sprint (an acronym for "Simple Parallel R INTerface") package provides a parallel computing framework for R making High Performance Computing (HPC) accessible to users who are not familiar with parallel programming and the use of HPC architectures. It contains a library of parallelised R functions for correlation, partitioning around medoids, apply, permutation testing, bootstrapping, random forest, rank product and hamming distance.
  • The pbapply package offers a progress bar for vectorized R functions in the `*apply` family, and supports several backends.

Parallel computing: GPUs

  • The gputools package by Buckner and Seligman provides several common data-mining algorithms which are implemented using a mixture of nVidia's CUDA language and cublas library. Given a computer with an nVidia GPU these functions may be substantially more efficient than native R routines.
  • The cudaBayesreg package by da Silva implements the rhierLinearModel from the bayesm package using nVidia's CUDA language and tools to provide high-performance statistical analysis of fMRI voxels.
  • The rgpu package (see below for link) aims to speed up bioinformatics analysis by using the GPU.
  • The gcbd package implements a benchmarking framework for BLAS and GPUs (using gputools).
  • The OpenCL package provides an interface from R to OpenCL permitting hardware- and vendor-neutral interfaces to GPU programming.
  • The HiPLARM package provides High-Performance Linear Algebra for R using multi-core and/or GPU support using the PLASMA / MAGMA libraries from UTK, CUDA, and accelerated BLAS.
  • The permGPU package computes permutation resampling inference in the context of RNA microarray studies on the GPU; it uses CUDA (>= 4.5).
  • The gmatrix package enables the evaluation of matrix and vector operations using GPU coprocessors such that intermediate computations may be kept on the coprocessor and reused, with potentially significant performance enhancements by minimizing data movement.
  • The gpuR package offers GPU-enabled functions: New gpu* and vcl* classes are provided to wrap typical R objects (e.g. vector, matrix) mirroring typical R syntax without the need to know OpenCL.

Large memory and out-of-memory data

  • The biglm package by Lumley uses incremental computations to offer lm() and glm() functionality to data sets stored outside of R's main memory.
  • The ff package by Adler et al. offers file-based access to data sets that are too large to be loaded into memory, along with a number of higher-level functions.
  • The bigmemory package by Kane and Emerson permits storing large objects such as matrices in memory (as well as via files) and uses external pointer objects to refer to them. This permits transparent access from R without bumping against R's internal memory limits. Several R processes on the same computer can also share big memory objects.
  • A large number of database packages, and database-alike packages (such as sqldf by Grothendieck and data.table by Dowle) are also of potential interest but not reviewed here.
  • The HadoopStreaming package provides a framework for writing map/reduce scripts for use in Hadoop Streaming; it also facilitates operating on data in a streaming fashion which does not require Hadoop.
  • The speedglm package fits (generalised) linear models to large data. For in-memory data sets, speedlm() or speedglm() can be used along with update.speedlm() which can update fitted models with new data. For out-of-memory data sets, shglm() is available; it works in the presence of factors and can check for singular matrices.
  • The biglars package by Seligman et al. can use ff to support larger-than-memory datasets for least-angle regression, lasso and stepwise regression.
  • The MonetDB.R package allows R to access the MonetDB column-oriented, open source database system as a backend.
  • The ffbase package by de Jonge et al adds basic statistical functionality to the ff package.
  • The LaF package provides methods for fast access to large ASCII files in csv or fixed-width format.

Easier interfaces for compiled code

  • The inline package by Sklyar et al. eases adding code in C, C++ or Fortran to R. It takes care of the compilation, linking and loading of embedded code segments that are stored as R strings.
  • The Rcpp package by Eddelbuettel and Francois offers a number of C++ classes that make transferring R objects to C++ functions (and back) easier, and the RInside package by the same authors allows easy embedding of R itself into C++ applications for faster and more direct data transfer.
  • The RcppParallel package by Allaire et al. bundles the Intel Threading Building Blocks and TinyThread libraries. Together with Rcpp, RcppParallel makes it easy to write safe, performant, concurrently-executing C++ code, and use that code within R and R packages.
  • The rJava package by Urbanek provides a low-level interface to Java similar to the .Call() interface for C and C++.
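
As a small illustration of the compiled-code workflow listed above, the sketch below compiles a C++ function from an R string with Rcpp's cppFunction(). It assumes that Rcpp and a working C++ toolchain are installed. The function name sumSquares is hypothetical and chosen only for this example.

```r
library(Rcpp)

# Compile, link and load a C++ function defined in an R string.
# NumericVector is the Rcpp class that wraps an R numeric vector.
cppFunction('
double sumSquares(NumericVector x) {
  double total = 0.0;
  for (R_xlen_t i = 0; i < x.size(); ++i) {
    total += x[i] * x[i];
  }
  return total;
}')

# The compiled function is now callable like any other R function.
result <- sumSquares(c(1, 2, 3))
```

The inline package supports a similar pattern of keeping the compiled source in an R string.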

Profiling tools

  • The profr package by Wickham can visualize output from the Rprof interface for profiling.
  • The proftools package by Tierney, and the aprof package by Visser, can also be used to analyse profiling output.
  • The GUIProfiler package visualizes the results of profiling R programs.
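
The packages above all consume output from R's built-in Rprof interface. A minimal sketch of producing and summarising such output with base R alone follows. The workload and the sampling interval are arbitrary choices meant only to generate enough samples.

```r
# Write profiling samples to a temporary file, sampling every 10 ms.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)

# Arbitrary workload: repeatedly sort and sum random vectors.
x <- 0
for (i in 1:200) {
  x <- x + sum(sort(runif(1e5)))
}

# Stop profiling and summarise time spent per function.
Rprof(NULL)
prof <- summaryRprof(prof_file)
head(prof$by.self)
```

The same file can then be passed to the visualisation and analysis tools listed above.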

Packages

aprof — 0.3.2

Amdahl's Profiler, Directed Optimization Made Easy

batch — 1.1-4

Batching Routines in Parallel and Passing Command-Line Arguments to R

BatchExperiments — 1.4.1

Statistical Experiments on Batch Computing Clusters

BatchJobs — 1.6

Batch Computing with R

batchtools — 0.9.3

Tools for Computation on Batch Systems

bayesm — 3.0-2

Bayesian Inference for Marketing/Micro-Econometrics

bcp — 4.0.0

Bayesian Analysis of Change Point Problems

biglars — 1.0.2

Scalable Least-Angle Regression and Lasso

biglm — 0.9-1

bounded memory linear and generalized linear models

bigmemory — 4.5.19

Manage Massive Matrices with Shared Memory and Memory-Mapped Files

bnlearn — 4.1.1

Bayesian Network Structure Learning, Parameter Learning and Inference

caret — 6.0-76

Classification and Regression Training

cudaBayesreg — 0.3-16

CUDA Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

dclone — 2.1-2

Data Cloning and MCMC Tools for Maximum Likelihood Methods

doFuture — 0.5.0

A Universal Foreach Parallel Adaptor using the Future API of the 'future' Package

doMC — 1.3.4

Foreach Parallel Adaptor for 'parallel'

doMPI — 0.2.2

Foreach Parallel Adaptor for the Rmpi Package

doRedis — 1.1.1

Foreach parallel adapter for the rredis package

doRNG — 1.6.6

Generic Reproducible Parallel Backend for 'foreach' Loops

doSNOW — 1.0.14

Foreach Parallel Adaptor for the 'snow' Package

data.table — 1.10.4

Extension of `data.frame`

ff — 2.2-13

memory-efficient storage of large data on disk and fast access functions

ffbase — 0.12.3

Basic Statistical Functions for Package 'ff'

flowr — 0.9.10

Streamlining Design and Deployment of Complex Workflows

foreach — 1.4.3

Provides Foreach Looping Construct for R

future — 1.5.0

Unified Parallel and Distributed Processing in R for Everyone

future.BatchJobs — 0.14.0

A Future API for Parallel and Distributed Processing using BatchJobs

GAMBoost — 1.2-3

Generalized linear and additive models by likelihood based boosting

gcbd — 0.2.6

'GPU'/CPU Benchmarking in Debian-Based Systems

gmatrix — 0.3

GPU Computing in R

gputools — 1.1

A Few GPU Enabled Functions

gpuR — 1.2.1

GPU Functions for R Objects

GUIProfiler — 2.0.1

Graphical User Interface for Rprof()

h2o — 3.10.4.6

R Interface for H2O

HadoopStreaming — 0.2

Utilities for using R scripts in Hadoop streaming

harvestr — 0.7.1

A Parallel Simulation Framework

HiPLARM — 0.1

High Performance Linear Algebra in R

HistogramTools — 0.3.2

Utility Functions for R Histograms

inline — 0.3.14

Functions to Inline C, C++, Fortran Function Calls from R

LaF — 0.6.3

Fast Access to Large ASCII Files

latentnet — 2.7.1

Latent Position and Cluster Models for Statistical Networks

lga — 1.1-1

Tools for linear grouping analysis (LGA)

Matching — 4.9-2

Multivariate and Propensity Score Matching with Balance Optimization

MonetDB.R — 1.0.1

Connect MonetDB to R

nws — 1.7.0.1

R functions for NetWorkSpaces and Sleigh

orloca — 4.2

The package deals with Operations Research LOCational Analysis models

OpenCL — 0.1-3

Interface allowing R to use OpenCL

partDSA — 0.9.14

Partitioning Using Deletion, Substitution, and Addition Moves

pbapply — 1.3-2

Adding Progress Bar to '*apply' Functions

pbdBASE — 0.4-5

Programming with Big Data -- Base Wrappers for Distributed Matrices

pbdDEMO — 0.3-1

Programming with Big Data -- Demonstrations and Examples Using 'pbdR' Packages

pbdDMAT — 0.4-2

'pbdR' Distributed Matrix Methods

pbdPROF — 0.3-1

Programming with Big Data --- MPI Profiling Tools

pbdMPI — 0.3-3

Programming with Big Data -- Interface to MPI

pbdNCDF4 — 0.1-4

Programming with Big Data -- Interface to Parallel Unidata NetCDF4 Format Data Files

pbdSLAP — 0.2-2

Programming with Big Data -- Scalable Linear Algebra Packages

peperr — 1.1-7

Parallelised Estimation of Prediction Error

permGPU — 0.14.9

Using GPUs in Statistical Genomics

PGICA — 1.0

Parallel Group ICA Algorithm

pls — 2.6-0

Partial Least Squares and Principal Component Regression

pmclust — 0.1-9

Parallel Model-Based Clustering using Expectation-Gathering-Maximization Algorithm for Finite Mixture Gaussian Model

profr — 0.3.1

An alternative display for profiling information

proftools — 0.99-2

Profile Output Processing Tools for R

pvclust — 2.0-0

Hierarchical Clustering with P-Values via Multiscale Bootstrap Resampling

randomForestSRC — 2.4.2

Random Forests for Survival, Regression and Classification (RF-SRC)

Rborist — 0.1-6

Extensible, Parallelizable Implementation of the Random Forest Algorithm

Rcpp — 0.12.11

Seamless R and C++ Integration

RcppParallel — 4.3.20

Parallel Programming Tools for 'Rcpp'

Rdsm — 2.1.1

Threads Environment for R

rgenoud — 5.7-12.4

R Version of GENetic Optimization Using Derivatives

Rhpc — 0.15-244

Permits *apply() Style Dispatch for 'HPC'

RhpcBLASctl — 0.15-148

Control the Number of Threads on 'BLAS'

RInside — 0.2.14

C++ Classes to Embed R in C++ Applications

rJava — 0.9-8

Low-Level R to Java Interface

rlecuyer — 0.3-4

R Interface to RNG with Multiple Streams

Rmpi — 0.6-6

Interface (Wrapper) to MPI (Message-Passing Interface)

RProtoBuf — 0.4.9

R Interface to the 'Protocol Buffers' 'API' (Version 2 or 3)

rredis — 1.7.0

"Redis" Key/Value Database Client

rslurm — 0.3.3

Submit R Calculations to a 'SLURM' Cluster

snow — 0.4-2

Simple Network of Workstations

snowfall — 1.84-6.1

Easier cluster computing (based on snow).

snowFT — 1.5-0

Fault Tolerant Simple Network of Workstations

speedglm — 0.3-2

Fitting Linear and Generalized Linear Models to Large Data Sets

sprint — 1.0.7

Simple Parallel R INTerface

sqldf — 0.4-10

Perform SQL Selects on R Data Frames

STAR — 0.3-7

Spike Train Analysis with R

tm — 0.7-1

Text Mining Package

toaster — 0.5.5

Big Data in-Database Analytics that Scales with Teradata Aster Distributed Platform

varSelRF — 0.7-5

Variable Selection using Random Forests

