Unified Parallel and Distributed Processing in R for Everyone

The purpose of this package is to provide a lightweight and unified Future API for sequential and parallel processing of R expressions via futures. The simplest way to evaluate an expression in parallel is to use `x %<-% { expression }` with `plan(multiprocess)`. This package implements sequential, multicore, multisession, and cluster futures. With these, R expressions can be evaluated on the local machine, in parallel on a set of local machines, or distributed on a mix of local and remote machines. Extensions to this package implement additional backends for processing futures via compute cluster schedulers etc. Because of its unified API, there is no need to modify code in order to switch from sequential processing on the local machine to, say, distributed processing on a remote compute cluster. Another strength of this package is that global variables and functions are automatically identified and exported as needed, making it straightforward to tweak existing code to make use of futures.
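The workflow described above might look like the following minimal sketch. It assumes the future package is installed and that the installed version provides plan(sequential); a parallel plan could be swapped in on that one line without changing the rest of the code. The variable names are illustrative only.

```r
library(future)

# Assumption: plan(sequential) evaluates futures in the current R session.
# Replacing it with a parallel plan should not require any other change.
plan(sequential)

a <- 2  # a global variable; it is identified and exported automatically

# Future assignment: 'x' holds the value once the future is resolved
x %<-% { a * 21 }

# Explicit form: create the future, then collect its value
f <- future({ a + 1 })
y <- value(f)

print(c(x = x, y = y))
```

The only line tied to how the expressions are evaluated is the call to plan(), which is what makes it possible to move between sequential and parallel processing.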


News

Package: future

Version: 1.2.0 [2016-11-12]

o Added makeClusterPSOCK() - a version of parallel::makePSOCKcluster() that allows for more flexible control of how PSOCK cluster workers are set up and how they are launched and communicated with if running on external machines.
o Added generic as.cluster() for coercing objects to cluster objects to be used as in plan(cluster, workers = as.cluster(x)). Also added a c() implementation for cluster objects such that multiple cluster objects can be combined into a single one.
o plan() and plan("list") now print more user-friendly output.
o Added sessionDetails() for gathering details of the current R session.
o ROBUSTNESS: On Unix, internal myInternalIP() tries more alternatives for finding the local IP number.
o DEPRECATED: %<=% is deprecated. Use %<-% instead. Likewise, %=>% is deprecated; use %->% instead.
o BUG FIX: values() for lists and list environments of futures where one or more of the futures resolved to NULL would give an error.
o BUG FIX: value() for ClusterFuture would give the cryptic error message "Error in stop(ex) : bad error message" if the cluster worker had crashed / terminated. Now it will instead give an error message like "Failed to retrieve the value of ClusterFuture from cluster node #1 on 'localhost'. The reason reported was 'error reading from connection'".
o BUG FIX: Argument 'user' to remote() was ignored (since 1.1.0).

Version: 1.1.1 [2016-10-10]

o FIX: For the special case where 'remote' futures use workers = "localhost" they (again) use the exact same R executable as the main / calling R session (in all other cases they use whatever 'Rscript' is found in the PATH). This was indeed already implemented in 1.0.1, but with the added support for reverse SSH tunnels in 1.1.0 this default behavior was lost.

Version: 1.1.0 [2016-10-09]

o REMOTE CLUSTERS: It is now very simple to use cluster() and remote() to connect to remote clusters / machines. As long as you can connect via ssh to those machines, it also works with these futures. The new code completely avoids the incoming-firewall and port-forwarding configuration that was previously needed. This is done by using reverse SSH tunneling. There is also no need to worry about internal or external IP numbers.
o GLOBALS: Global variables part of subassignments in future expressions are recognized and exported (iff found), e.g. x$a <- value, x[["a"]] <- value, and x[1,2,3] <- value.
o GLOBALS: Global variables part of formulae in future expressions are recognized and exported (iff found), e.g. y ~ x | z.
o GLOBALS: As an alternative to the default automatic identification of globals, it is now also possible to explicitly specify them either by their names (as a character vector) or by their names and values (as a named list), e.g. f <- future({ 2*a }, globals=c("a")) or f <- future({ 2*a }, globals=list(a=42)). For future assignments one can use the %globals% operator, e.g. y %<-% { 2*a } %globals% c("a").
o Now all Future classes support run() for launching the future, and value() calls run() if the future has not been launched.
o Added optional argument 'label' to all futures, e.g. f <- future(42, label="answer") and v %<-% { 42 } %label% "answer".
o Added argument 'user' to cluster() and remote().
o ROBUSTNESS: Now the default future strategy is explicitly set when no strategies are set, e.g. when using nested futures. Previously, only mc.cores was set so that only a single core was used, but now plan("default") is also set.
o WORKAROUND: resolved() on cluster futures would block on Linux until the future was resolved. This is due to a bug in R. The workaround is to round the timeout (in seconds) to an integer, which seems to always work / be respected.
o DOCUMENTATION: Added vignette on command-line options and other methods for controlling the default type of futures to use.
o MEMORY: Now plan(cluster, gc=TRUE) causes the background R session to be garbage collected immediately after the value is collected. Since multisession and remote futures are special cases of cluster futures, the same is true for these as well.
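The explicit-globals and label items for 1.1.0 can be sketched as follows. This is an illustration only: it assumes the future package is installed and uses plan(sequential) so that no extra R processes are needed; the variable names are made up for the example.

```r
library(future)
plan(sequential)  # assumption: evaluate in the current R session

a <- 1

# Globals given by name: the value of 'a' is taken from the calling environment
f1 <- future({ 2 * a }, globals = c("a"))

# Globals given by name and value: the future uses a = 42 instead
f2 <- future({ 2 * a }, globals = list(a = 42))

# The same idea for a future assignment, using the %globals% operator
y %<-% { 2 * a } %globals% c("a")

# An optional label, useful for identifying a future when troubleshooting
f3 <- future(42, label = "answer")

v1 <- value(f1)
v2 <- value(f2)
v3 <- value(f3)
```

Specifying globals explicitly can be useful when automatic identification picks up too much or too little, at the cost of having to keep the list in sync with the expression.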

Version: 1.0.1 [2016-07-04]

o DOCUMENTATION: Added a section to the vignette on globals in formulas describing how they are currently not automatically detected and how to explicitly export them.
o ROBUSTNESS: For the special case where 'remote' futures use workers = "localhost" they now use the exact same R executable as the main / calling R session (in all other cases they use whatever 'Rscript' is found in the PATH).
o FutureError now extends simpleError and no longer the error class of captured errors.
o BUG FIX: Since future 0.13.0, a global 'pkg' would be overwritten by the name of the last package attached in the future.
o BUG FIX: Futures that generated R.oo::Exception errors triggered another internal error.

Version: 1.0.0 [2016-06-24]

o GLOBALS: Falsely identified global variables no longer generate an error when the future is created. Instead, we leave it to R and the evaluation of the individual futures to throw an error if a global variable is truly missing. This was done in order to automatically handle future expressions that use non-standard evaluation (NSE), e.g. subset(df, x < 3) where 'x' is falsely identified as a global variable.
o Add support for remote(..., myip=""), which now queries a set of external lookup services in case one of them fails.
o DEMO: Now the Mandelbrot demo tiles a single Mandelbrot region with one future per tile. This better illustrates parallelism.
o Add mandelbrot() function used in demo to the API for convenience.
o DOCUMENTATION: Documented R options used by the future package.
o ROBUSTNESS: If the .future.R script, which is sourced when the future package is attached, gives an error, then the error is ignored with a warning.
o CLEANUP: Dropped support for system environment variable 'R_FUTURE_GLOBALS_MAXSIZE'.
o TROUBLESHOOTING: If the future requires attachment of packages, then each namespace is loaded separately and before attaching the package. This is done in order to see the actual error message in case there is a problem while loading the namespace. With require()/library() this error message is otherwise suppressed and replaced with a generic one.
o BUG FIX: Custom futures based on a constructor function that is defined outside a package gave an error.
o BUG FIX: plan("default") assumed that the 'future.plan' option was a string; gave an error if it was a function.
o BUG FIX: Various future options were not passed on to futures.
o BUG FIX: A startup .future.R script is no longer sourced if the future package is attached by a future expression.

Version: 0.15.0 [2016-06-13]

o Now .future.R (if found in the current directory or otherwise in the user's home directory) is sourced when the future package is attached (but not loaded). This helps separate scripts from configuration of futures.
o Added remote futures, which are cluster futures with convenient default arguments for simple remote access to R, e.g. plan(remote, workers="login.my-server.org").
o Added support for plan(cluster, workers=c("n1", "n2", "n2", "n4")), where 'workers' (also for ClusterFuture()) is a set of host names passed to parallel::makeCluster(workers). It can also be the number of localhost workers.
o Added command line option --parallel=<p>, which is long for -p <p>.
o Now command line option -p <p> also sets the default future strategy to multiprocessing (if p >= 2 and eager otherwise), unless another strategy is already specified via option 'future.plan' or system environment variable R_FUTURE_PLAN.
o Now availableCores() also acknowledges environment variable NSLOTS set by Sun/Oracle Grid Engine (SGE).
o MEMORY: Added argument 'gc=FALSE' to all futures. When TRUE, the garbage collector will run at the very end in the process that evaluated the future (just before returning the value). This may help lower the overall memory footprint when running multiple parallel R processes. The user can enable this by specifying plan(multiprocess, gc=TRUE). The developer can control this using future(expr, gc=TRUE) or v %<-% { expr } %tweak% list(gc=TRUE).
o SPEEDUP: Significantly decreased the overhead of creating a future, particularly multicore futures.
o BUG FIX: Future would give an error with plan(list("eager")), whereas it did work with plan("eager") and plan(list(eager)).
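As a rough illustration of the availableCores() and 'gc' items for 0.15.0, the sketch below queries the number of cores and creates one future with garbage collection requested. It assumes the future package is installed and that the installed version still accepts gc=TRUE in future(); plan(sequential) is used only to keep the example self-contained.

```r
library(future)
plan(sequential)  # assumption: evaluate in the current R session

# Number of cores the package considers available; this takes settings
# such as scheduler environment variables into account
n <- availableCores()

# Developer-side control: ask for garbage collection in the process
# that evaluated the future
f <- future({ sum(1:100) }, gc = TRUE)
v <- value(f)

print(n)
print(v)
```

Garbage collection adds a small cost per future, so it is probably most worthwhile when individual futures allocate large temporary objects.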

Version: 0.14.0 [2016-05-16]

o Renamed arguments 'maxCores' and 'cluster' to 'workers'. If using the old argument names a deprecation warning will be generated, but it will still work until made defunct in a future release.
o Added nbrOfWorkers().
o values() passes arguments '...' to value() of each Future.
o Added FutureError class.
o Added informative print() method for the Future class.
o BUG FIX: resolve() for lists and environments did not work properly when the set of futures was not resolved in order, which could happen with asynchronous futures.

Version: 0.13.0 [2016-04-13]

o Add support to plan() for specifying different future strategies for the different levels of nested futures.
o Add backtrace() for listing the trace of the expressions evaluated (the calls made) before a condition was caught.
o Add transparent futures, which are eager futures with early signaling of conditions enabled and whose expression is evaluated in the calling environment. This makes the evaluation of such futures as similar as possible to how R evaluates expressions, which in turn simplifies troubleshooting errors etc.
o Add support for early signaling of conditions. The default is (as before) to signal conditions when the value is queried. In addition, they may be signaled as soon as possible, e.g. when checking whether a future is resolved or not.
o Signaling of conditions when calling value() is now controlled by argument 'signal' (previously 'onError').
o Now UniprocessFutures capture the call stack for errors occurring while resolving futures.
o ClusterFuture gained argument 'persistent=FALSE'. With persistent=TRUE, any objects in the cluster R session that were created during the evaluation of a previous future are available for succeeding futures that are evaluated in the same session. Moreover, globals are still identified and exported but "missing" globals will not give an error - instead it is assumed such globals are available in the environment where the future is evaluated.
o OVERHEAD: Utility functions exported by ClusterFuture are now much smaller; previously they would export all of the package environment.
o BUG FIX: f <- multicore(NA, maxCores=2) would end up in an endless waiting loop for a free core if availableCores() returned one.
o BUG FIX: ClusterFuture would ignore local=TRUE.
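Specifying one strategy per nesting level, as introduced in 0.13.0, might look like the sketch below. It assumes the future package is installed; both levels use plan(sequential) here purely so the example runs anywhere, whereas in practice the two levels would typically differ.

```r
library(future)

# One strategy per nesting level: the first applies to outer futures,
# the second to futures created inside them
plan(list(sequential, sequential))

x %<-% {
  # Inner future: evaluated using the second strategy in the list
  y %<-% { 20 }
  y + 1
}

print(x)
```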

Version: 0.12.0 [2016-02-23]

o Added multiprocess futures, which are multicore futures if supported, otherwise multisession futures. This makes it possible to use plan(multiprocess) everywhere regardless of operating system.
o Future strategy functions gained class attributes such that it is possible to test what type of future is currently used, e.g. inherits(plan(), "multicore").
o ROBUSTNESS: It is only the R process that created a future that can resolve it. If a non-resolved future is queried by another R process, then an informative error is generated explaining that this is not possible.
o ROBUSTNESS: Now value() for multicore futures detects if the underlying forked R process was terminated before completing and if so generates an informative error message.
o SPEED: Adjusted the parameters for the scheme used to wait for the next available cluster node such that nodes are polled more frequently.
o GLOBALS: resolve() gained argument 'recursive'.
o GLOBALS: Added option 'future.globals.resolve' for controlling whether global variables should be resolved for futures or not. If TRUE, then globals are searched recursively for any futures and if found such "global" futures are resolved. If FALSE, global futures are not located, but if the parent future later tries to resolve them, then an informative error message is generated clarifying that only the R process that created the future can resolve it. The default is currently FALSE.
o FIX: Exports of objects available in packages already attached by the future were still exported.
o FIX: Now availableCores() returns 3L (=2L+1L) instead of 2L if R_CHECK_LIMIT_CORES is set.

Version: 0.11.0 [2016-01-15]

o GLOBALS: All futures now validate globals by default (globals=TRUE).
o Add multisession futures, which analogously to multicore ones, use multiple cores on the local machine with the difference that they are evaluated in separate R sessions running in the background rather than separate forked R processes. A multisession future is a special type of cluster future that does not require explicit setup of cluster nodes.
o Add support for cluster futures, which can make use of a cluster of nodes created by parallel::makeCluster().
o Add futureCall(), which is for futures what do.call() is otherwise.
o Standardized how options are named, i.e. 'future.'. If you used any future options previously, make sure to check they follow the above format.
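The relationship between do.call() and futureCall() mentioned for 0.11.0 can be sketched like this. It assumes the future package is installed and uses plan(sequential); the choice of sum() and its arguments is arbitrary.

```r
library(future)
plan(sequential)  # assumption: evaluate in the current R session

# do.call() evaluates the call immediately and returns the result
direct <- do.call(sum, args = list(1, 2, 3))

# futureCall() wraps the same call in a future; the result is
# retrieved with value()
f <- futureCall(sum, args = list(1, 2, 3))
deferred <- value(f)

print(c(direct = direct, deferred = deferred))
```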

Version: 0.10.0 [2015-12-30]

o Now %<=% can also assign to multi-dimensional list environments.
o Add futures(), values() and resolved().
o Add resolve() to resolve futures in lists and environments.
o Now availableCores() also acknowledges the number of CPUs allotted by Slurm.
o CLEANUP: Now the internal future variable created by %<=% is removed when the future variable is resolved.
o BUG FIX: futureOf(envir=x) did not work properly when 'x' was a list environment.
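Working with a collection of futures via resolve() and resolved(), as added in 0.10.0, might look like the sketch below. It assumes the future package is installed and uses plan(sequential). Values are collected one future at a time with value() rather than with values(), on the assumption that this works across package versions.

```r
library(future)
plan(sequential)  # assumption: evaluate in the current R session

# A plain list of futures, one per input
fs <- lapply(1:3, function(i) future({ i^2 }))

# Block until every future in the list is resolved
fs <- resolve(fs)

# Check the state of each future without collecting values
done <- resolved(fs)

# Collect the values one future at a time
squares <- vapply(fs, value, FUN.VALUE = numeric(1))

print(done)
print(squares)
```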

Version: 0.9.0 [2015-12-11]

o GLOBALS: Now globals ("unknown" variables) are identified using the new findGlobals(..., method="ordered") in globals (> 0.5.0) such that a global variable preceding a local variable with the same name is properly identified and exported/frozen.
o DOCUMENTATION: Updated vignette on common issues with the case where a global variable is not identified because it is hidden by an element assignment in the future expression.
o ROBUSTNESS: Now values of environment variables are trimmed before being parsed.
o ROBUSTNESS: Add reproducibility test for random number generation using Pierre L'Ecuyer's RNG stream regardless of how futures are evaluated, e.g. eager, lazy and multicore.
o BUG FIX: Errors occurring in multicore futures could prevent further multicore futures from being created.

Version: 0.8.2 [2015-10-14]

o BUG FIX: Globals that were copies of package objects were not exported to the future environments.
o BUG FIX: The future package had to be attached or future::future() had to be imported, if %<=% was used internally in another package. Similarly, it also had to be attached if multicore futures were used.

Version: 0.8.1 [2015-10-05]

o eager() and multicore() gained argument 'globals', where globals=TRUE will validate that all global variables identified can be located already before the future is created. This provides the means for providing the same tests on global variables with eager and multicore futures as with lazy futures.
o lazy(sum(x, ...), globals=TRUE) now properly passes ... from the function in which the future is set up. If not called within a function or called within a function without ... arguments, an informative error message is thrown.
o Added vignette 'Futures in R: Common issues with solutions'.

Version: 0.8.0 [2015-09-06]

o plan("default") resets to the default strategy, which is synchronous eager evaluation unless option 'future_plan' or environment variable 'R_FUTURE_PLAN' has been set.
o availableCores("mc.cores") returns getOption("mc.cores") + 1L, because option 'mc.cores' specifies "allowed number of additional R processes" to be used in addition to the main R process.
o BUG FIX: plan(future::lazy) and similar gave errors.

Version: 0.7.0 [2015-07-13]

o ROBUSTNESS: multicore() blocks until one of the CPU cores is available, iff all are currently occupied by other multicore futures.
o multicore() gained argument 'maxCores', which makes it possible to use for instance plan(multicore, maxCores=4L).
o Add availableMulticore() [from (in-house) 'async' package].
o More colorful demo("mandelbrot", package="future").
o BUG FIX: old <- plan(new) now returns the old plan/strategy (was the newly set one).

Version: 0.6.0 [2015-06-18]

o Add multicore futures, which are futures that are resolved asynchronously in a separate process. These are only supported on Unix-like systems, but not on Windows.

Version: 0.5.1 [2015-06-18]

o Eager and lazy futures now record the result internally such that the expression is only evaluated once, even if their errored values are requested multiple times.
o Eager futures are always created regardless of error or not.
o All Future objects are environments themselves that record the expression, the call environment and optional variables.

Version: 0.5.0 [2015-06-16]

o lazy() "freezes" global variables at the time when the future is created. This way the result of a lazy future is more likely to be the same as an eager future. This is also how globals are likely to be handled by asynchronous futures.

Version: 0.4.2 [2015-06-15]

o plan() records the call.
o Added demo("mandelbrot", package="future"), which can be re-used by other future packages.

Version: 0.4.1 [2015-06-14]

o Added plan().
o Added eager future - useful for troubleshooting.

Version: 0.4.0 [2015-06-07]

o Distilled Future API from (in-house) 'async' package.


install.packages("future")

1.3.0 by Henrik Bengtsson, 9 days ago


https://github.com/HenrikBengtsson/future


Report a bug at https://github.com/HenrikBengtsson/future/issues


Browse source code at https://github.com/cran/future


Authors: Henrik Bengtsson [aut, cre, cph]




Task views: High-Performance and Parallel Computing with R


LGPL (>= 2.1) license


Imports digest, globals, listenv, parallel, utils

Suggests R.rsp, markdown


Imported by PSCBS, R.filesets, aroma.affymetrix, aroma.core, fiery, googleComputeEngineR.

Depended on by doFuture, future.BatchJobs, pbmcapply.

