Functions for easy and reproducible simulation.
The harvestr package is a framework for conducting replicable parallel
simulations in R. It builds on the popular plyr package for the
split-apply-combine framework, and on the parallel combined
multiple-recursive generator from L'Ecuyer (1999).
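To make the generator concrete, here is a minimal base R sketch of L'Ecuyer-CMRG streams using the parallel package that ships with R. It does not use harvestr itself; the seed value 12345 and the helper draw_from are assumptions made for illustration.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")   # switch to the combined multiple-recursive generator
set.seed(12345)            # arbitrary master seed, chosen for illustration

base_seed <- .Random.seed                   # state of the first stream
stream_2  <- nextRNGStream(base_seed)       # start of the next independent stream
sub_1     <- nextRNGSubStream(base_seed)    # next sub-stream within the first stream

# Hypothetical helper: install a stream seed, then draw from it.
draw_from <- function(seed, n = 3) {
  assign(".Random.seed", seed, envir = globalenv())
  runif(n)
}

a <- draw_from(stream_2)
b <- draw_from(stream_2)
identical(a, b)   # the same stream seed reproduces the same draws
```

Because every stream is identified by its own seed vector, a task can be handed a seed and reproduce its draws no matter where or when it runs.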
Because the replicable simulations are based on seed values, this package takes a theme of seeds and farming. The principal functions are as follows:
gather - Creates a list of parallel RNG seeds.
farm - Uses seeds from gather to evaluate expressions after each seed has been set. This is useful for generating data.
harvest - Takes the results from farm and continues evaluation with the random number generation where farm left off. This is useful for evaluating data generated with farm through stochastic methods such as Markov chain Monte Carlo.
reap - The single-element version of harvest, for one element that has appropriately structured seed attributes.
plant - Takes a list of objects, assumed to be of the same class, and gives each element a parallel seed value to use with
graft - Splits RNG sub-streams from a main object.
sprout - Gets the seeds for use in
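The farm and harvest descriptions above rely on one mechanism: set a seed, evaluate, and remember where the generator stopped so that a later step can resume from that point. The following base R sketch illustrates the idea under stated assumptions. The helper with_seed_state and the attribute name ending_state are hypothetical names for illustration, not harvestr's internals.

```r
# Hypothetical helper (not part of harvestr): evaluate `fun` with the RNG
# state set to `seed`, and record the state the RNG is left in afterwards.
with_seed_state <- function(seed, fun, ...) {
  assign(".Random.seed", seed, envir = globalenv())
  value <- fun(...)
  attr(value, "ending_state") <- get(".Random.seed", envir = globalenv())
  value
}

RNGkind("L'Ecuyer-CMRG")
set.seed(2024)             # arbitrary seed for the sketch
seed <- .Random.seed

# "farm"-like step: generate data starting from the seed.
x <- with_seed_state(seed, function() rnorm(50))

# "harvest"-like step: resume where generation stopped, here with a
# small bootstrap of the mean as the stochastic analysis.
boot_mean <- function(data) {
  mean(replicate(200, mean(sample(data, replace = TRUE))))
}
b1 <- with_seed_state(attr(x, "ending_state"), boot_mean, x)

# Repeating both steps from the same seed reproduces the analysis.
x2 <- with_seed_state(seed, function() rnorm(50))
b2 <- with_seed_state(attr(x2, "ending_state"), boot_mean, x2)
identical(as.numeric(b1), as.numeric(b2))
```

Carrying the ending state along with the data is what lets the analysis step be rerun on its own and still give the same answer.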
All of the functions work with lists: they expect and return lists, which can be easily converted to data frames. I would do this with
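As an illustration of the list-to-data-frame step, here is one possible conversion in base R. It is an assumption made for illustration and not necessarily the call the author has in mind; the values in results are made up.

```r
# A list of equally structured results, standing in for what the
# functions return (illustrative values only).
results <- list(
  c(mean = 0.1, sd = 1.2),
  c(mean = -0.3, sd = 0.9),
  c(mean = 0.2, sd = 1.0)
)

# Bind the elements row-wise and add an identifier column.
df <- data.frame(id = seq_along(results), do.call(rbind, results))
df
```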
The advantage of setting the seeds like this is that parallelization is seamless and transparent. As in the plyr framework, each function has a .parallel argument, which defaults to FALSE; when set to TRUE, the function evaluates in parallel. An appropriate parallel backend must be specified. For example, with a multicore backend you would run the following code.
See the foreach package's documentation for which backends are currently supported.
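Why per-task seeds make parallel evaluation transparent can be shown without any backend at all. The base R sketch below gives each task its own stream seed and then runs the tasks in two different orders. It is an illustration of the principle, not harvestr code, and run_task is a hypothetical name.

```r
library(parallel)

RNGkind("L'Ecuyer-CMRG")
set.seed(42)   # arbitrary master seed for the sketch

# One stream seed per task.
n_tasks <- 4
seeds <- vector("list", n_tasks)
s <- .Random.seed
for (i in seq_len(n_tasks)) {
  seeds[[i]] <- s
  s <- nextRNGStream(s)
}

# Each task installs its own seed before drawing, so its result does not
# depend on which tasks ran before it or on which worker runs it.
run_task <- function(seed) {
  assign(".Random.seed", seed, envir = globalenv())
  mean(rnorm(100))
}

forward  <- vapply(seeds, run_task, numeric(1))
backward <- rev(vapply(rev(seeds), run_task, numeric(1)))
identical(forward, backward)
```

Since the order of evaluation does not matter, handing the tasks to several workers leaves the results unchanged.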
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
rsprng, now depends on
Added use_method to help with reference classes as input to harvest.
Added caching. The cache is an explicit parameter for each function. The cache directory is controlled by the "harvestr.cache.dir" option and defaults to "harvestr-cache" if the option is not set.
Added timing. Each evaluation is timed, and the timing can be extracted from the 'time' attribute of each result.
Speed improvements through optimizing withpseed. No longer double evaluates.
Added a vignette that explains the process flow for using harvestr.
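The cache directory option mentioned in the caching entry can be set and read with base R's options mechanism. A small sketch follows; the helper cache_dir and the directory name my-sim-cache are hypothetical, and only the option name and its default come from the entry above.

```r
# Hypothetical helper: resolve the cache directory from the option,
# falling back to "harvestr-cache" when the option is not set.
cache_dir <- function() {
  getOption("harvestr.cache.dir", default = "harvestr-cache")
}

old <- options(harvestr.cache.dir = NULL)   # make sure the option is unset
default_dir <- cache_dir()

# Point the cache at a custom location (illustrative path).
options(harvestr.cache.dir = file.path(tempdir(), "my-sim-cache"))
custom_dir <- cache_dir()

options(old)   # restore the previous setting
```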
harvestr is a package that facilitates the creation of reproducible parallel simulations.
The primary functions are:
gather for generating parallel seeds.
farm for generating datasets from the parallel seeds.
harvest for applying an analysis function to each generated data frame, including stochastic analyses such as bootstrap or MCMC.
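The three primary functions can be chained as in the sketch below. It assumes the harvestr package is installed and is skipped otherwise; the seed value, the number of seeds, and the simulated model are arbitrary choices for illustration.

```r
# Assumes harvestr is installed; the block is skipped if it is not.
if (requireNamespace("harvestr", quietly = TRUE)) {
  library(harvestr)

  # Ten parallel seeds derived from one master seed (arbitrary value).
  seeds <- gather(10, seed = 12345)

  # One simulated data frame per seed.
  datasets <- farm(seeds, {
    x <- rnorm(100)
    y <- rnorm(100, mean = x)
    data.frame(y, x)
  })

  # Apply an analysis function to each generated data frame.
  fits <- harvest(datasets, function(d) lm(y ~ x, data = d))

  # Collect the coefficients, one row per data set.
  coefs <- do.call(rbind, lapply(fits, coef))
  print(dim(coefs))
}
```

Rerunning the script with the same master seed should regenerate the same data sets and fits, which is the point of gathering the seeds first.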