A Parallel Simulation Framework

Functions for easy and reproducible simulation.



The harvestr package is a framework for conducting replicable parallel simulations in R. It builds on the popular plyr package for its split-apply-combine framework, and on the parallel combined multiple-recursive generator from L'Ecuyer (1999).
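As a rough illustration of the generator mentioned above, base R's parallel package exposes L'Ecuyer-CMRG streams directly. The sketch below does not use harvestr and is not its implementation; it only shows how independent streams can be derived from one seed and replayed. The helper name draw_from is hypothetical.

```r
library(parallel)

# Switch to the L'Ecuyer-CMRG generator and fix a starting seed.
RNGkind("L'Ecuyer-CMRG")
set.seed(20240101)

# The current state is the first stream; nextRNGStream() derives the next one.
stream1 <- .Random.seed
stream2 <- nextRNGStream(stream1)

# Hypothetical helper: restore a stream state, then draw n normal variates.
draw_from <- function(stream, n = 3) {
  assign(".Random.seed", stream, envir = globalenv())
  rnorm(n)
}

a1       <- draw_from(stream1)
a2       <- draw_from(stream2)
a1_again <- draw_from(stream1)  # replaying stream1 reproduces a1
```

Restoring a saved stream state before drawing is what makes each replicate repeatable on its own, independent of what ran before it.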

Because the replicable simulations are based on seed values, this package takes a theme of seeds and farming. The principal functions are as follows:

  • gather - Creates a list of parallel rng seeds.
  • farm - Uses seeds from gather to evaluate expressions after each seed has been set. This is useful for generating data.
  • harvest - Takes the results from farm and continues evaluation with the random number generation where farm left off. This is useful for evaluating data generated with farm through stochastic methods such as Markov chain Monte Carlo.
  • reap - The single-element version of harvest, for one element that has appropriately structured seed attributes.
  • plant - takes a list of objects, assumed to be of the same class, and gives each element a parallel seed value to use with harvest for evaluation.
  • graft - splits RNG sub-streams from a main object.
  • sprout - gets the seeds for use in graft.
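The first three functions in the list above chain together roughly as follows. This is a minimal sketch that assumes harvestr is installed; the master seed, the number of seeds, and the simulated expression are illustrative choices, not values from the package documentation.

```r
# Assumes the harvestr package is installed; the block is skipped otherwise.
if (requireNamespace("harvestr", quietly = TRUE)) {
  library(harvestr)

  # 1. Create four parallel RNG seeds from one master seed.
  seeds <- gather(4, seed = 12345)

  # 2. Generate one data set per seed.
  datasets <- farm(seeds, rnorm(10))

  # 3. Analyse each data set, continuing from where farm left off.
  means <- harvest(datasets, mean)

  # Repeating the generation step with the same master seed should
  # reproduce the same data.
  datasets2 <- farm(gather(4, seed = 12345), rnorm(10))
}
```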

Lists

All of the functions work on lists. They expect and return lists, which can be easily converted to data frames. I would do this with ldply(list, I).
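The ldply(list, I) call is the author's suggestion and needs plyr. As a base-R sketch of the same idea, swapping in do.call(rbind, ...) and assuming each list element is a one-row data frame (the results list here is made up for illustration):

```r
# Hypothetical results: one single-row data frame per simulation replicate.
results <- lapply(1:3, function(i) {
  data.frame(replicate = i, estimate = i / 10)
})

# Stack the list elements into one data frame, one row per replicate.
combined <- do.call(rbind, results)
```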

Parallel

The advantage of setting the seeds like this is that parallelization is seamless and transparent. Similar to the plyr framework, each function has a .parallel argument, which defaults to FALSE; when set to TRUE, the function evaluates in parallel. An appropriate parallel backend must be registered first. For example, with a multicore backend you would run the following code.

library(doMC)
registerDoMC()  # register the multicore backend used by foreach and plyr

See the plyr and foreach package documentation for the backends that are currently supported.
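To illustrate the .parallel argument end to end, the sketch below swaps in doParallel (listed under Suggests for this package) for doMC, since doMC is not available on Windows. It assumes both harvestr and doParallel are installed; the worker count and seed are arbitrary, and the serial and parallel results are expected to match because each element carries its own seed.

```r
# Assumes harvestr and doParallel are installed; skipped otherwise.
if (requireNamespace("harvestr", quietly = TRUE) &&
    requireNamespace("doParallel", quietly = TRUE)) {
  library(harvestr)
  library(doParallel)

  # Register a two-worker cluster as the foreach backend.
  cl <- makeCluster(2)
  registerDoParallel(cl)

  seeds <- gather(4, seed = 2024)

  # Same seeds, evaluated serially and then in parallel.
  serial_res   <- farm(seeds, rnorm(5))
  parallel_res <- farm(seeds, rnorm(5), .parallel = TRUE)

  # Shut the workers down when finished.
  stopCluster(cl)
}
```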

Operating Systems

harvestr is limited in its capabilities by the packages that it depends on, mainly foreach and plyr. The parallel backends are platform-limited; read the individual packages' documentation for details.

Notes

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

News

harvestr 0.7.1

  • Updated DESCRIPTION and Vignette sources to build vignettes.

harvestr 0.7.0

  • Added getAttr for retrieving attributes with a default similar to getOption.
  • plow and plant allow for specifying a single seed that will be used to generate other seeds.
  • changed vignette to use knitr.
  • Added Interactive function.

harvestr 0.6.0

  • made cache.dir configurable at call level.
  • added is_seeded function.

harvestr 0.5.2

  • Bug fix for time option not carrying forward.
  • Added plow function for calling with each row of a data frame as parameters.
  • Added Bale to combine back into a data.frame
  • Removed dependency on lme4, due to removed functionality.

harvestr 0.5.1

  • Documentation updates:
    • Stripped last remaining references to rsprng.
    • Expanded documentation for farm, plant, and gather
  • Version bump to 0.5.1 to avoid possible conflict on github.

harvestr 0.5

  • Major bugfix: streams were not independent; in fact, the same stream would be run.
  • New tests to prevent that condition from recurring.

harvestr 0.4

  • Removed dependence on rsprng, now depends on parallel base package.
  • Removed volatile tests for cache timings and parallel evaluation; moved them to examples.
  • Timing is now optional, controlled with option harvestr.time.
  • caching controlled by option harvestr.use.cache
  • Support for RNG sub streams with sprout & graft.
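Sub-streams are a feature of the L'Ecuyer-CMRG generator that base R exposes through the parallel package. The base-R sketch below shows only that underlying mechanism; how sprout and graft wrap it is not shown here and should not be inferred from this example.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(7)

# State of the main stream.
main <- .Random.seed

# Two successive sub-streams split off the same main stream.
sub1 <- nextRNGSubStream(main)
sub2 <- nextRNGSubStream(sub1)

# The next full stream is distinct from both sub-streams.
next_main <- nextRNGStream(main)
```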

harvestr 0.3

Added use_method to help with reference classes as input to harvest.

harvestr 0.2

Added caching. The cache is an explicit parameter for each function. The cache directory is controlled with the option "harvestr.cache.dir" and defaults to "harvestr-cache" if the option is not set.
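The options named in these release notes can be set with base R's options(). A small sketch, where the option names come from the notes and the values are only illustrative:

```r
# Option names are taken from the release notes; values are illustrative.
old <- options(
  harvestr.use.cache = TRUE,
  harvestr.cache.dir = file.path(tempdir(), "harvestr-cache"),
  harvestr.time      = TRUE
)

# Read the cache directory back, with the documented fallback value.
cache_dir <- getOption("harvestr.cache.dir", "harvestr-cache")

# Restore the previous settings when done.
options(old)
```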

Added timing. Each evaluation is timed, and the timing can be extracted from the 'time' attribute of each result.

Speed improvements through optimizing withpseed. No longer double evaluates.

Added a vignette that explains the process flow for using harvestr.

harvestr 0.1

harvestr is a package that facilitates the creation of reproducible parallel simulations.

The primary functions are:

  • gather for generating parallel seeds.
  • farm for generating datasets from the parallel seeds.
  • harvest for applying an analysis function for each generated data frame, including stochastic analysis such as bootstrap or mcmc.


install.packages("harvestr")

0.7.1 by Andrew Redd, 2 years ago


Browse source code at https://github.com/cran/harvestr


Authors: Andrew Redd




Task views: High-Performance and Parallel Computing with R


GPL (>= 2) license


Imports parallel, plyr, digest, foreach, stats

Suggests testthat, dostats, doParallel, MCMCpack, knitr, boot, withr


Imported by pstest.

