A general-purpose computational engine for data analysis, `drake` rebuilds intermediate data objects when their dependencies change and skips work when the results are already up to date. Not every execution starts from scratch: there is native support for parallel and distributed computing, and completed projects have tangible evidence that they are reproducible. Extensive documentation, from beginner-friendly tutorials to practical examples and more, is available at the reference website <https://ropensci.github.io/drake/> and the online manual <https://ropenscilabs.github.io/drake-manual/>.
Version 6.2.1 is a hotfix to address the failing automated CRAN checks for 6.2.0. Chiefly, on CRAN's Debian R-devel (2018-12-10) check platform, errors of the form "length > 1 in coercion to logical" occurred when either argument to `&&` or `||` was not of length 1 (e.g. `nzchar(letters) && length(letters)`). In addition to fixing these errors, version 6.2.1 also removes a problematic link from the vignette.
`plan_summaries()`. Allows the user to set the delimiter for generating new target names.
`drake_config()`. Here, the user can set the function that builds targets in "hasty mode" (`make(parallelism = "hasty")`).
New `drake_envir()` function that returns the environment where `drake` builds targets. It can only be accessed from inside the commands in the workflow plan data frame. The primary use case is to allow users to remove individual targets from memory at predetermined build steps.
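The memory-management use case above can be sketched as follows; this is a minimal illustration under assumed target names and sizes (`big_data` and `summary_stat` are invented), not an example from the release itself:

```r
library(drake)

plan <- drake_plan(
  big_data = rnorm(1e7),
  summary_stat = {
    out <- mean(big_data)
    # Free memory as soon as big_data is no longer needed by this command.
    rm(big_data, envir = drake_envir())
    out
  }
)

make(plan)
```

Because `drake_envir()` only works inside plan commands, calling it at the top level of a script is not supported.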
`predict_runtime(targets_only = TRUE)` when some targets are outdated and others are not.
`create_drake_layout()`. (Affects R-3.3.x.)
`config$layout`) just to store the code analysis results. This is an intermediate structure between the workflow plan data frame and the graph. It will help clean up the internals in future development.
`make(parallelism = "future")`. That way, job names are target names by default if `job.name` is used correctly in the
`force` argument in all functions except
`console_log_file` in real time (#588).
`vis_drake_graph()` hover text to display commands in the `drake` plan more elegantly.
`predict_load_balancing()` and remove its reliance on internals that will go away in 2019 via #561.
`predict_load_balancing()`. This functionality will go away in 2019 via #561.
`predict_load_balancing()` up to date.
`drake_session()` and rename to
`timeout` argument in the API of `drake_config()`. A value of `timeout` can still be passed to these functions without error, but only the `cpu` arguments impose actual timeouts now.
New `map_plan()` function to easily create a workflow plan data frame to execute a function call over a grid of arguments.
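A hedged sketch of how `map_plan()` might be used; the function `cylinder_volume` and the argument grid below are invented for illustration:

```r
library(drake)

# A function to apply over a grid of arguments.
cylinder_volume <- function(radius, height) pi * radius^2 * height

# One row per function call; column names match the function's arguments.
args <- expand.grid(radius = 1:2, height = c(10, 20))

# map_plan() creates one target per row of args.
plan <- map_plan(args, cylinder_volume)

make(plan)
```

Each resulting target's command is a call to `cylinder_volume()` with the arguments from its row of the grid.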
`plan_to_code()` function to turn `drake` plans into generic R scripts. New users can use this function to better understand the relationship between plans and code, and unsatisfied customers can use it to disentangle their projects from `drake`.
`plan_to_notebook()` generates an R notebook from a `drake` plan.
`drake_debug()` function to run a target's command in debug mode. Analogous to
`trigger()` to control how the `condition` trigger factors into the decision to build or skip a target. See the
`drake_config()` to help the master process consume fewer resources during parallel processing.
`caching` argument for the `"clustermq"` and `"clustermq_staged"` parallel backends. Now, `make(parallelism = "clustermq", caching = "master")` will do all the caching with the master process, and `make(parallelism = "clustermq", caching = "worker")` will do all the caching with the workers. The same is true for `parallelism = "clustermq_staged"`.
`append` argument controls whether the output includes the original `plan` in addition to the newly generated rows.
`reduce_by()` in order to restrict what we gather even when
`make(parallelism = "hasty")` skips all of `drake`'s expensive caching and checking. All targets run every single time and you are responsible for saving results to custom output files, but almost all the by-target overhead is gone.
`render_sankey_drake_graph()`. That way, tildes in file paths no longer interfere with the rendering of static image files. Compensates for https://github.com/wch/webshot.
`evaluate_plan(trace = TRUE)` followed by `reduce_by()`. The more relaxed behavior also gives users more options about how to construct and maintain their workflow plan data frames.
`"future"` parallelism to make sure files travel over network file systems before proceeding to downstream targets.
`visNetwork` package is not installed.
`make_targets()` if all the targets are already up to date.
`"worker"`. The default option should be the lower-overhead option for small workflows. Users have the option to make a different set of tradeoffs for larger workflows.
`condition` trigger to evaluate to non-logical values as long as those values can be coerced to logicals.
`condition` trigger evaluate to a vector of length 1.
`make(verbose = 4)` now prints to the console when a target is stored.
`reduce_by()` now gather/reduce everything if no columns are specified.
`make(jobs = 4)` was equivalent to `make(jobs = c(imports = 4, targets = 4))`. Now, `make(jobs = 4)` is equivalent to `make(jobs = c(imports = 1, targets = 4))`. See issue 553 for details.
`verbose` is at least 2.
`reduce_by()`, do not exclude targets with all
`digest()` wherever possible. This puts old `drake` projects out of date, but it improves speed.
`stringi` package no longer compiles on R 3.2.0.
In `code_dependencies()`, restrict the possible global variables to the ones mentioned in the new `globals` argument (turned off when `NULL`). In practical workflows, global dependencies are restricted to items in `envir` and proper targets in the plan. The `globals` slot of the output list is now a list of candidate globals, not necessarily actual globals (some may not be targets or variables in
`FALSE`. This should prevent the accidental deletion of whole directories.
`clean()` deleted input-only files if no targets from the plan were cached. A patch and a unit test are included in this release.
`loadd(not_a_target)` no longer loads every target in the cache.
`igraph` vertex attribute (fixes https://github.com/ropensci/drake/issues/503).
`knitr_in()` file code chunks.
`sort(NULL)` that caused warnings in R 3.3.3.
`analyze_loadd()` was sometimes quitting with "Error: attempt to set an attribute on NULL".
`digest::digest(file = TRUE)` on directories. Instead, set hashes of directories to `NA`. Users should still not use directories as file dependencies.
`vis_drake_graph()`. Previously, these files were missing from the visualization, but actual workflows worked just fine. Ref: https://stackoverflow.com/questions/52121537/trigger-notification-from-report-generation-in-r-drake-package
`codetools` failures in R 3.3 (add a
`clustermq`-based parallel backend: `make(parallelism = "clustermq")`.
`evaluate_plan(trace = TRUE)` now adds a `*_from` column to show the origins of the evaluated targets. Try `evaluate_plan(drake_plan(x = rnorm(n__), y = rexp(n__)), wildcard = "n__", values = 1:2, trace = TRUE)`.
`reduce_by()`, which gather on custom columns in the plan (or columns generated by `evaluate_plan(trace = TRUE)`) and append the new targets to the previous plan.
`workers()`) as an argument of
New `code_to_plan()` function to turn R scripts and R Markdown reports into workflow plan data frames.
New `drake_plan_source()` function, which generates lines of code for a `drake_plan()` call that produces the plan passed to `drake_plan_source()`. The main purpose is visual inspection (we even have syntax highlighting via `prettycode`), but users may also save the output to a script file for the sake of reproducibility or simple reference.
`deps_targets()` in favor of a new `deps_target()` function (singular) that behaves more like
`vis_drake_graph()` using the "title" node column.
`vis_drake_graph(collapse = TRUE)`.
`dependency_profile()` show major trigger hashes side-by-side to tell the user if the command, a dependency, an input file, or an output file changed since the last `make()`.
`txtq` package is installed.
`readd()`, giving specific usage guidance in prose.
`build_drake_graph()` and print to the console the ones that execute.
`txtq` is not installed.
`drake`'s code examples to this repository and make `drake_examples()` download examples from there.
`igraph` attributes of the dependency graph to allow for smarter dependency/memory management during `make()`.
`sankey_drake_graph()` to save static image files via
`render_static_drake_graph()` in favor of
`evaluate_plan()` so users can evaluate wildcards in columns other than the
`target()` so users do not have to (explicitly).
`ggraph` static graph visualizations.
`drake_graph_info()` to optionally condense nodes into clusters.
`evaluate_plan()` to optionally add indicator columns to show which targets got expanded/evaluated with which wildcard values.
make(parallelism = "clustermq_staged"), a
clustermq-based staged parallelism backend (see https://github.com/ropensci/drake/pull/452).
make(parallelism = "future_lapply_staged"), a
future-based staged parallelism backend (see https://github.com/ropensci/drake/pull/450).
CodeDependsfor finding global variables.
knitrreports referenced with
knitr_in()inside imported functions. Previously, this feature was only available in explicit
knitr_in()calls in commands.
`drake_batchtools_tmpl_file()` in favor of
`gc()` is called after every new build of a target.
`tracked()` to accept only a `drake_config()` object as an argument. Yes, it is technically a breaking change, but it is only a small break, and it is the correct API choice.
`knitr` reports without warnings.
`drake` uses persistent workers and a master process. In the case of `"future_lapply"` parallelism, the master process is a separate background process called by
`make()`'s. (Previously, there were "check" messages and a call to
`make(parallelism = c(imports = "mclapply_staged", targets = "mclapply"))`.
`make(jobs = 1)`. Now, they are kept in memory until no downstream target needs them (for `make(jobs = 1)`).
`predict_runtime()`. It is a more sensible way to go about predicting runtimes with multiple jobs. Likely to be more accurate.
`make()` no longer leave targets in the user's environment.
`drake_config()` in favor of
`failed()` so users can list failed targets that do not have any failed dependencies. Naturally accompanies `make(keep_going = TRUE)`.
`plyr` as a dependency.
`target()` to help create `drake` plans with custom columns.
`drake_gc()`, clean out disruptive files in `storr`s with mangled keys (re: https://github.com/ropensci/drake/issues/198).
`load_basic_example()` in favor of
`README.md` file on the main example rather than the mtcars example.
`README.Rmd` file to generate `README.md`.
`deps()` in favor of
`drake_config()` so the user can decide how `drake` keeps non-import dependencies in memory when it builds a target.
`drake` plans to help users customize scheduling.
`drake_config()` to avoid potential conflicts between user-side custom `Makefile`s and the one written by `make(parallelism = "Makefile")`.
`drake_config()` so users can redirect console output to a file.
`readd(show_source = TRUE)`, `loadd(show_source = TRUE)`.
`!!` operator from tidyeval and `rlang` is parsed differently than in R <= 3.4.4. This change broke one of the tests in `tests/testthat/tidy-eval.R`. The main purpose of `drake`'s 5.1.2 release is to fix the broken test.
`R CMD check` error from building the pdf manual with LaTeX.
`drake_plan()`, allow users to customize target-level columns using `target()` inside the commands.
New `bind_plans()` function to concatenate the rows of `drake` plans and then sanitize the aggregate plan.
`session` argument to tell `make()` to build targets in a separate, isolated master R session. For example, `make(session = callr::r_vanilla)`.
New `reduce_plan()` function to do pairwise reductions on collections of targets.
`.`) from being a dependency of any target or import. This enforces more consistent behavior in the face of the current static code analysis functionality, which sometimes detects `.` and sometimes does not.
`ignore()` to optionally ignore pieces of workflow plan commands and/or imported functions. Use `ignore(some_code)` to tell `drake` to not track dependencies in `some_code` when it comes to deciding which targets are out of date.
`drake` to only look for imports in environments inheriting from `make()` (plus explicitly namespaced functions).
`loadd()` to ignore foreign imports (imports not explicitly found in `envir` when `make()` last imported them).
`loadd()` so that only targets (not imports) are loaded if the `...` and `list` arguments are empty.
"*"to the default
.drake/cache folder every time
new_cache()is called. This means the cache will not be automatically committed to git. Users need to remove
.gitignorefile to allow unforced commits, and then subsequent
make()s on the same cache will respect the user's wishes and not add another
.gitignore. this only works for the default cache. Not supported for manual
"future"backend with a manual scheduler.
`build_times()`, there is an API change: for `tidyselect` to work, we needed to insert a new `...` argument as the first argument of
`file_in()` for file inputs to commands or imported functions (for imported functions, the input file needs to be an imported file, not a target).
`file_out()` for output file targets (ignored if used in imported functions).
`rmarkdown` reports. This tells `drake` to look inside the source file for target dependencies in code chunks (explicitly referenced with `loadd()` and `readd()`). Treated as a `file_in()` if used in imported functions.
`drake_plan()` so that it automatically fills in any target names that the user does not supply. Also, any `file_out()`s become the target names automatically (double-quoted internally).
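A small sketch of the auto-naming behavior described above; the file names `raw.csv` and `cleaned.csv` are invented for illustration:

```r
library(drake)

plan <- drake_plan(
  dataset = read.csv(file_in("raw.csv")),
  # No explicit target name here: the file_out() path becomes
  # the (quoted) target name automatically.
  write.csv(dataset, file_out("cleaned.csv"))
)

print(plan)
```

Inspecting the printed plan should show a target named after the output file path rather than an auto-generated placeholder.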
`read_drake_plan()` (rather than an empty `drake_plan()`) the default `plan` argument in all functions that accept a `plan` argument.
`loadd(..., lazy = "bind")`. That way, when you have a target loaded in one R session and hit `make()` in another R session, the target in your first session will automatically update.
`diagnose()` will take on the role of returning this metadata.
`read_drake_meta()` function in favor of
`expose_imports()` function to optionally force `drake` to detect deeply nested functions inside specific packages.
`drake_build()` to be an exclusively user-side function.
`loadd()` so that objects already in the user's environment need not be replaced.
`load_basic_example()`. Also hard-code a default seed of `0`. That way, the pseudo-randomness in projects should be reproducible across R sessions.
New `drake_read_seed()` function to read the seed from the cache. Its examples illustrate what `drake` is doing to try to ensure reproducible random numbers.
`drake_plan()`. Suppress this behavior using `tidy_evaluation = FALSE` or by passing in commands passed through `rlang::expr()` before evaluating them. That means you can use the quasiquotation operator `!!` in your commands, and `make()` will evaluate them according to the tidy evaluation paradigm.
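A minimal sketch of the quasiquotation behavior described above; the constant `n_samples` is invented for illustration:

```r
library(drake)

n_samples <- 100

# !!n_samples is spliced into the command at plan-creation time,
# so the stored command is rnorm(100), not rnorm(n_samples).
plan <- drake_plan(x = rnorm(!!n_samples))

print(plan)
```

Without `!!`, the plan would record the symbol `n_samples` and resolve it at build time instead.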
drake_example("packages")to demonstrate how to set up the files for serious
drakeprojects. More guidance was needed in light of this issue.
drake_plan()in the help file (
draketo rOpenSci: https://github.com/ropensci/drake
configargument, which you can get from
make()decides to build targets.
`storr` cache in a way that is not back-compatible with projects from versions 4.4.0 and earlier. The main change is to make more intelligent use of `storr` namespaces, improving efficiency (both time and storage) and opening up possibilities for new features. If you attempt to run drake >= 5.0.0 on a project from drake <= 4.4.0, drake will stop you before any damage to the cache is done, and you will be instructed how to migrate your project to the new drake.
`drake` was having problems with an edge case: as a command, the literal string `"A"` was interpreted as the symbol `A` after tidying. With `tidy_source()`, literal quoted strings stay literal quoted strings in commands. This may put some targets out of date in old projects, yet another loss of back compatibility in version 5.0.0.
`rescue_cache()`, exposed to the user and used in `clean()`. This function removes dangling orphaned files in the cache so that a broken cache can be cleaned and used in the usual ways once more.
`NULL`. This solves an elusive bug in how drake imposes timeouts.
`graph` argument to functions
New `prune_graph()` function for `igraph` objects.
`code` as names in the workflow plan data frame. Use `command` instead. This naming switch has been formally deprecated for several months prior.
`drake_strings()` to remove the silly dependence on the
`drake_config()`. Increases speed.
`sanitize_plan()`, remove rows with blank targets `""`.
`clean()` to optionally remove all target-level information.
`cached()` so users can inspect individual
`verbose` to numeric: `0` = print nothing, `1` = print progress on imports only, `2` = print everything.
New `next_stage()` function to report the targets to be made in the next parallelizable stage.
`sessionInfo()` is a bottleneck for small `make()`s, so there is now an option to suppress it. This is mostly for the sake of speeding up unit tests.
`make()` to suppress progress logging. This increases storage efficiency and speeds some projects up a tiny bit.
`readd()`. You can now load and read from non-default caches.
`make(..., cache_log_file = TRUE)` as options to track changes to targets/imports in the drake cache.
`rmarkdown::render()`, not just
`plot_graph()` to display subcomponents. Check out arguments
`subset`. The graph visualization vignette has demonstrations.
`"future_lapply"` parallelism: parallel backends supported by the `future` and `future.batchtools` packages. See `?backend` for examples and the parallelism vignette for an introductory tutorial. More advanced instruction can be found in the
`make()` to wrap around `build()`. That way, users can more easily control the side effects of distributed jobs. For example, to redirect error messages to a file in `make(..., parallelism = "Makefile", jobs = 2, hook = my_hook)`, `my_hook` should be something like
`drake` was previously using the `outfile` argument for PSOCK clusters to generate output that could not be caught by `capture.output()`. It was a hack that should have been removed before.
`outdated()` print "All targets are already up to date" to the console.
`progress()`. Also see the new `failed()` function, which is similar to
`parLapply` parallelism. The downside to this fix is that `drake` has to be properly installed. It should not be loaded with `devtools::load_all()`. The speedup comes from lightening the first `run_parLapply()`. Previously, we exported every single individual `drake` function to all the workers, which created a bottleneck. Now, we just load `drake` itself in each of the workers, which works because
`get_cache(..., verbose = TRUE)`.
`lightly_parallelize_atomic()`. Now, processing happens faster, and only over the unique values of a vector.
New `make_with_config()` function to do the work of `make()` on an existing internal configuration list from `drake_config()`.
`drake_batchtools_tmpl_file()` to write a `batchtools` template file from one of the examples (`drake_example()`), if one exists.
Version 4.3.0 has:
Version 4.2.0 will be released today. There are several improvements to code style and performance. In addition, there are new features such as cache/hash externalization and runtime prediction. See the new storage and timing vignettes for details. This release has automated checks for back-compatibility with existing projects, and I also did manual back-compatibility checks on serious projects.
Version 3.0.0 is coming out. It manages environments more intelligently so that the behavior of `make()` is more consistent with evaluating your code in an interactive session.
Version 1.0.1 is on CRAN! I'm already working on a massive update, though. 2.0.0 is cleaner and more powerful.