pbdMPI is an efficient interface to MPI that uses S4 classes and methods, with a focus on the Single Program/Multiple Data ('SPMD') parallel programming style, which is intended for batch parallel execution.
With few exceptions (ff, bigalgebra, etc.), R does computations in memory. When data becomes too large to handle in the memory of a single node, or when more processors than those offered in commodity hardware (~16) are needed for a job, a typical strategy is to add more nodes. MPI, or the "Message Passing Interface", is the standard for managing multi-node communication. pbdMPI is a package that greatly simplifies the use of MPI from R.
In pbdMPI, we make extensive use of R's S4 system to simplify the interface significantly. Instead of needing to specify the type (e.g., integer or double) of the data via function name (as in C implementations) or in an argument (as in Rmpi), you need only call the generic function on your data and we will always "do the right thing".
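As a rough illustration of the type-agnostic interface described above, the sketch below calls the same generic, allreduce(), on an integer vector and on a double vector without naming the type anywhere in the call. The variable names (x_int, x_dbl, total_int, total_dbl) are illustrative only and not part of the package, and the script assumes a working MPI installation.

```r
suppressMessages(library(pbdMPI, quietly = TRUE))
init()

n <- comm.size()         # number of MPI ranks in this run

# Every rank holds the same small vectors (illustrative data only).
x_int <- 1:3             # integer storage
x_dbl <- c(0.5, 1.5)     # double storage

# The same generic is called on both; no type is named in the call.
total_int <- allreduce(x_int, op = "sum")
total_dbl <- allreduce(x_dbl, op = "sum")

# By default only one rank prints, so each result appears once.
comm.print(total_int)
comm.print(total_dbl)

finalize()
```

Because every rank contributes the same vectors, each element of the result is the original value multiplied by the number of ranks.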
In pbdMPI, we write programs in the "Single Program/Multiple Data" or SPMD style. Contrary to the way much of the R world is acquainted with parallelism, there is no "master" or "manager". Each process (MPI rank) runs the same copy of the program as every other process, but operates on its own data. This is arguably one of the simplest extensions of serial to massively parallel programming, and has been the standard way of doing things in the HPC community for over 20 years.
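To make the SPMD idea concrete, here is a minimal sketch in which every rank runs the same script but sums only its own share of the integers 1 to 100, after which the partial sums are combined. The round-robin split and the names N, my_idx, local_sum, and global_sum are assumptions chosen for illustration; they are not prescribed by pbdMPI.

```r
suppressMessages(library(pbdMPI, quietly = TRUE))
init()

rank <- comm.rank()   # this process's id, starting at 0
size <- comm.size()   # total number of processes

# Split the indices 1..N round-robin: each rank keeps only its own share.
N <- 100L
my_idx <- seq_len(N)[(seq_len(N) - 1L) %% size == rank]

# Every rank runs this same line, but on different data.
local_sum <- sum(my_idx)

# Combine the partial results; every rank receives the global total.
global_sum <- allreduce(local_sum, op = "sum")

comm.print(global_sum)

finalize()
```

There is no manager process handing out work here: each rank derives its own slice from its rank number. However many ranks are used, the combined total is the sum of 1 through 100, that is, 5050.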
If you are comfortable with MPI concepts, you should find pbdMPI very agreeable and simple to use. Below is a basic "hello world" program:
# load the package
suppressMessages(library(pbdMPI, quietly = TRUE))

# initialize the MPI communicators
init()

# Hello world
message <- paste("Hello from rank", comm.rank(), "of", comm.size())
comm.print(message, all.rank=TRUE, quiet=TRUE)

# shut down the communicators and exit
finalize()
Save this as, say, mpi_hello_world.r and run it via:
mpirun -np 4 Rscript mpi_hello_world.r
comm.print() is a "sugar" function custom to pbdMPI that makes it
simple to print in a distributed environment. The argument
all.rank=TRUE specifies that all MPI ranks should print, and
quiet=TRUE tells each rank not to "announce" itself when it does its printing.
The package can be installed from CRAN via the usual
install.packages("pbdMPI"), or via the devtools package:
For additional installation information, see:
More information about pbdMPI, including installation troubleshooting, can be found in:
pbdMPI is authored and maintained by the pbdR core team:
With additional contributions from: