A complete and consistent functional programming toolkit for R.
purrr enhances R's functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. If you've never heard of FP before, the best place to start is the family of map() functions, which allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the map() functions is the iteration chapter in R for Data Science.
install.packages("tidyverse")

# Alternatively, install just purrr:
install.packages("purrr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/purrr")
The following example uses purrr to solve a fairly realistic problem: split a data frame into pieces, fit a model to each piece, compute the summary, then extract the R².
library(purrr)

mtcars %>%
  split(.$cyl) %>% # from base R
  map(~ lm(mpg ~ wt, data = .)) %>%
  map(summary) %>%
  map_dbl("r.squared")
#>         4         6         8
#> 0.5086326 0.4645102 0.4229655
This example illustrates some of the advantages of purrr functions over the equivalents in base R:
The first argument is always the data, so purrr works naturally with the pipe.
All purrr functions are type-stable. They always return the advertised output type (map() returns lists; map_dbl() returns double vectors), or they throw an error.
map() functions accept functions, formulas (used for succinctly generating anonymous functions), character vectors (used to extract components by name), or numeric vectors (used to extract by position).
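As a small illustration of the four kinds of mapper described above, the following sketch extracts fields from a list of records. The `people` data is hypothetical and exists only for this example.

```r
library(purrr)

# Hypothetical sample data: a list of small named lists
people <- list(
  list(name = "Ada", age = 36),
  list(name = "Grace", age = 45)
)

# 1. A regular function
ages_fn <- map_dbl(people, function(p) p$age)

# 2. A formula, shorthand for an anonymous function (.x is the current element)
ages_formula <- map_dbl(people, ~ .x$age)

# 3. A character vector extracts components by name
names_chr <- map_chr(people, "name")

# 4. A numeric vector extracts components by position
names_pos <- map_chr(people, 1)
```

Because the typed variants are type-stable, `map_dbl()` here yields a double vector and `map_chr()` a character vector; a mismatch would raise an error rather than silently change the type.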
This is a maintenance release following the release of dplyr 0.7.5.
We noticed the following issues during reverse dependencies checks:
If reduce() fails with this message: Error: `.x` is empty, and no `.init` supplied, this is because reduce() now returns .init when .x is empty. Fix the problem by supplying an appropriate argument to .init, or by providing special behaviour when .x has length 0.
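The two fixes mentioned above might look like the sketch below. `safe_sum()` is a hypothetical helper written for this example, not part of purrr.

```r
library(purrr)

empty <- integer(0)

# Without `.init`, reducing an empty vector is an error
res <- tryCatch(reduce(empty, `+`), error = function(e) "error")

# Option 1: supply `.init` so empty input has a well-defined result
total <- reduce(empty, `+`, .init = 0)

# Option 2: handle the zero-length case explicitly
# (safe_sum is a hypothetical helper for illustration)
safe_sum <- function(x) {
  if (length(x) == 0) {
    return(0)
  }
  reduce(x, `+`)
}
```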
The type predicates have been migrated to rlang. Consequently the
bare-type-predicates documentation topic is no longer in purrr,
which might cause a warning if you cross-reference it.
purrr no longer depends on lazyeval or Rcpp (or dplyr, as of the previous version). This makes the dependency graph of the tidyverse simpler, and makes purrr more suitable as a dependency of lower-level packages.
There have also been two changes to eliminate name conflicts between purrr and dplyr:
order_by(), sort_by() and split_by() have been removed. order_by() conflicted with dplyr::order_by() and the complete family doesn't feel that useful. Use tibbles instead (#217).
contains() has been renamed to has_element() to avoid conflicts with dplyr.
The plucking mechanism used for indexing into data structures with map() has been extracted into the function pluck(). Plucking is often a more readable way to extract an element buried in a deep data structure. Compare this syntax-heavy extraction, which reads non-linearly:
accessor(x[[1]])$foo
to the equivalent pluck:
x %>% pluck(1, accessor, "foo")
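To make the comparison concrete, here is a minimal runnable sketch. The `accessor` function and the structure of `x` are hypothetical stand-ins chosen for this example; any function that takes the current element and returns something indexable would do.

```r
library(purrr)

# Hypothetical accessor: returns the "meta" attribute of an object
accessor <- function(obj) attr(obj, "meta")

# Hypothetical data: a list whose first element carries a "meta" attribute
x <- list(structure(list(), meta = list(foo = "bar")))

# Nested extraction, read from the inside out
nested <- accessor(x[[1]])$foo

# Plucking, read left to right: position 1, then accessor(), then "foo"
plucked <- x %>% pluck(1, accessor, "foo")
```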
as_function() is now as_mapper() because it is a transformation that makes sense primarily for mapping functions, not in general (#298).
.null has been renamed to
.default to better reflect its intent (#298).
.default is returned whenever an element is absent or empty (#231, #254).
as_mapper() sanitises primitive functions by transforming them to closures with standardised argument names (using rlang::as_closure()). For instance, + is transformed to function(.x, .y) .x + .y. This results in proper argument matching so that map(1:10, partial(`-`, .x = 5)) produces list(5 - 1, 5 - 2, ...).
Recursive indexing can now extract objects out of environments (#213) and S4 objects (#200), as well as lists.
attr_getter() makes it possible to extract from attributes, as in map(list(iris, mtcars), attr_getter("row.names")).
The argument list for formula-functions has been tweaked so that you can refer to arguments by position with ..1, ..2, and so on. This makes it possible to use the formula shorthand for functions with more than two arguments.
safely() and friends no longer capture interrupts: this means that you can now terminate a mapper using one of these with Escape or Ctrl + C (#314).
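A few of the mapper helpers above can be sketched as follows. The input lists are made up for illustration, and the example assumes a purrr version in which `.default` is the argument name (as described in these notes).

```r
library(purrr)

# Positional arguments in a formula: ..1, ..2, ..3
sums <- pmap_dbl(list(1:3, 4:6, 7:9), ~ ..1 + ..2 + ..3)

# .default is returned when the extracted element is absent
x <- list(list(a = 1), list(b = 2))
a_values <- map_dbl(x, "a", .default = NA_real_)

# attr_getter() builds a function that extracts an attribute
row_names <- map(list(mtcars[1:2, ]), attr_getter("row.names"))
```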
All map functions now treat
NULL the same way as an empty vector (#199),
and return an empty vector if any input is an empty vector.
map() functions now force their arguments in the same way that base R does for lapply() (#191). This makes map() etc easier to use when generating functions.
A new family of "indexed" map functions, imap(), imap_lgl() etc, provide a short-hand for map2(x, names(x)) or map2(x, seq_along(x)).
The data frame suffix _df has been (soft) deprecated in favour of _dfr to more clearly indicate that it's a row-bind. All variants now also have a _dfc for column binding (#167). (These will not be terribly useful until dplyr::bind_rows()/dplyr::bind_cols() have better semantics for vectors.)
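The indexed map functions and the handling of NULL input can be illustrated with a short sketch. It uses imap_chr(), one of the typed members of purrr's indexed family; the input vectors are invented for the example.

```r
library(purrr)

x <- c(a = 1, b = 2)

# With a named input, .y is the element's name
labelled <- imap_chr(x, ~ paste0(.y, "=", .x))

# The same result spelled out with map2() and names()
same <- map2_chr(x, names(x), ~ paste0(.y, "=", .x))

# With an unnamed input, .y is the element's position
positions <- imap_chr(c("u", "v"), ~ paste0(.y, ":", .x))

# NULL is treated like an empty vector, so the result is an empty list
empty <- map(NULL, ~ .x + 1)
```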
A new modify() family returns the same type of output as the input .x. This is in contrast to the map() family, which always returns a list, regardless of the input type.
The modify functions are S3 generics. However, their default methods should be sufficient for most classes since they rely on the semantics of [<-. modify.default() is thus a shorthand for x[] <- map(x, f).
at_depth() has been renamed to modify_depth(). modify_depth() gains a new .ragged argument, and negative depths are now computed relative to the deepest component of the list (#236).
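The difference between the two families is easiest to see on a data frame. The sketch below uses a small made-up data frame; the last step mirrors the subassignment idiom that the notes give as the meaning of the default method.

```r
library(purrr)

# Hypothetical input data frame
df <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))

# map() always returns a plain list
mapped <- map(df, ~ .x * 2)

# modify() keeps the type of the input: still a data frame
modified <- modify(df, ~ .x * 2)

# The subassignment idiom that modify() abbreviates
df2 <- df
df2[] <- map(df2, ~ .x * 2)
```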
auto_browse(f) returns a new function that automatically calls browser() if f throws an error (#281).
vec_depth() computes the depth (i.e. the number of levels of indexing) of a vector (#243).
reduce2() and reduce2_right() make it possible to reduce with a 3 argument function where the first argument is the accumulated value, the second argument is .x, and the third argument is .y.
list_modify() extends stats::modifyList() to replace by position if the list is not named (#201). list_merge() operates similarly to list_modify() but combines instead of replacing (#322).
The legacy function update_list() is basically a version of list_modify() that evaluates formulas within the list. It is likely
to be deprecated in the future in favour of a tidyeval interface
such as a list method for
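The following sketch contrasts replacing with combining, and shows a three-argument reduction with reduce2(). The inputs are invented for illustration.

```r
library(purrr)

x <- list(a = 1, b = 2)

# list_modify() replaces the existing value of `a`
replaced <- list_modify(x, a = 10)

# list_merge() combines the new value with the existing one
merged <- list_merge(x, a = 10)

# reduce2(): the function receives the accumulated value, the next
# element of .x, and the matching element of .y (one shorter than .x
# when no .init is given)
pasted <- reduce2(
  c("a", "b", "c"),
  c("-", "+"),
  function(acc, value, sep) paste0(acc, sep, value)
)
```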
Thanks to @dchiu911, the unit test coverage of purrr is now much greater.
All predicate functions are re-exported from rlang (#124).
compact() now works with standard mapper conventions (#282).
cross_n() has been renamed to cross(). The _n suffix was removed for consistency with pmap() (originally called map_n() at the start of the project) and transpose() (originally called zip_n()). Similarly, cross_d() has been renamed to cross_df() for consistency with map_df().
every() and some() now return NA if present in the input (#174).
invoke() uses a more robust approach to generate the argument list (#249). It no longer uses lazyeval to figure out which environment a character f comes from.
is_numeric() and is_scalar_numeric() are deprecated because they don't test for what you might expect at first sight.
reduce() now throws an error if .x is empty and .init is not supplied.
Deprecated functions flatmap(), map3(), map_n(), walk3(), walk_n(), zip2(), zip3(), zip_n() have been removed.
pmap() coerces data frames to lists to avoid the expensive [.data.frame, which provides security that is unneeded here (#220).
rdunif() checks its inputs for validity (#211).
set_names() can now take a function to transform the names programmatically (#276), and you can supply names in ... to reduce typing even more. set_names() is now powered by rlang::set_names().
safely() now actually uses the
quiet argument (#296).
transpose() now matches by name if available (#164). You can override the default choice with the new .names argument.
The function argument of detect() and detect_index() has been renamed from .p to .f. This is because they have mapper semantics rather than predicate semantics.
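Two of the improvements above, sketched with made-up inputs: renaming with set_names(), and detect() with a formula mapper.

```r
library(purrr)

x <- c(a = 1, b = 2)

# A function is applied to the existing names
upper <- set_names(x, toupper)

# Names can also be supplied as separate arguments
renamed <- set_names(x, "first", "second")

# detect() and detect_index() accept the usual mapper shorthands
first_even <- detect(1:10, ~ .x %% 2 == 0)
first_even_index <- detect_index(1:10, ~ .x %% 2 == 0)
```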
This is a compatibility release with dplyr 0.6.0.
unslice() have been moved to purrrlyr. This is a bit of an aggressive change, but it allows us to make the dependencies much lighter.
Fix for dev tibble support.
as_function() now supports list arguments which allow recursive indexing
using either names or positions. They now always stop when encountering
the first NULL (#173).
accumulate and reduce correctly pass extra arguments to the worker function.
as_function() gains a
.null argument that for character and numeric
values allows you to specify what to return for null/absent elements (#110).
This can be used with any map function, e.g.
map_int(x, 1, .null = NA)
as_function() is now generic.
New is_function() that returns TRUE only for regular functions.
Fix crash on GCC triggered by
There are two handy infix functions:
x %||% y is shorthand for if (is.null(x)) y else x (#109).
x %@% "a" is shorthand for attr(x, "a", exact = TRUE) (#69).
accumulate() has been added to handle recursive folding. It is shorthand for Reduce(f, .x, accumulate = TRUE) and follows a similar syntax to reduce() (#145). A right-hand version, accumulate_right(), was also added.
map_df() row-binds output together. It's the equivalent of
flatten() is now type-stable and always returns a list. To return a simpler vector, use flatten_lgl(), flatten_int(), flatten_dbl(), flatten_chr(), or flatten_df().
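invoke() has been overhauled to be more useful: it now works similarly to map_call() when .x is NULL, and hence map_call() has been deprecated. invoke_map() is a vectorised complement to invoke(), and comes with typed variants invoke_map_lgl(), invoke_map_int(), invoke_map_dbl(), invoke_map_chr(), and invoke_map_df().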
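transpose() replaces zip2(), zip3(), and zip_n(). The name more clearly reflects the intent (transposing the first and second levels of a list). It no longer has the fields argument or the .simplify argument; instead use the new simplify_all() function.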
safely(), quietly(), and possibly() are experimental functions for working with functions with side-effects (e.g. printed output, messages, warnings, and errors) (#120). safely() is a version of try() that modifies a function (rather than an expression), and always returns a list with two components, result and error.
list_along() and rep_along() generalise the idea of seq_along().
is_null() is the snake-case version of is.null().
pmap() (parallel map) replaces map_n() (#132), and has typed variants such as pmap_lgl() and pmap_dbl().
set_names() is a snake-case alternative to
setNames() with stricter
equality checking, and more convenient defaults for pipes:
x %>% set_names() is equivalent to
setNames(x, x) (#119).
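Several of the functions introduced in this release can be tried in a few lines. This is a sketch against a current purrr installation rather than the release described here, and all inputs are made up.

```r
library(purrr)

# accumulate() keeps every intermediate result of the fold
running <- accumulate(1:4, `+`)
base_running <- Reduce(`+`, 1:4, accumulate = TRUE)

# safely() wraps a function; the result always has `result` and `error`
safe_log <- safely(log)
ok <- safe_log(10)
bad <- safe_log("a")

# %||% supplies a fallback when the left-hand side is NULL
fallback <- NULL %||% "default"

# pmap() maps over several vectors in parallel
sums <- pmap_dbl(list(1:2, 3:4, 5:6), function(a, b, c) a + b + c)

# set_names() with no names uses the values themselves
named <- c("x", "y") %>% set_names()
```

Wrapping with safely() moves error handling out of the loop: each call returns a list, so a map over many inputs can finish and the failures can be inspected afterwards.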
We are still figuring out what belongs in dplyr and what belongs in purrr. Expect much experimentation and many changes with these functions.
map() now always returns a list. Data frame support has been moved to map_df() and dmap(). The latter supports sliced data frames as a shortcut for the combination of by_slice() and dmap(): x %>% by_slice(dmap, fun, .collate = "rows"). The conditional variants dmap_at() and dmap_if() also support sliced data frames and will recycle scalar results to the slice size.
map_rows() has been renamed to
invoke_rows(). As other
rows-based functionals, it collates results inside lists by default,
but with column collation this function is equivalent to
The rows-based functionals gain a
.to option to name the output
column as well as a
.collate argument. The latter lets you collate the output in lists (by default), on columns, or on rows. This makes these functions more flexible and more predictable.
as_function(), which converts formulas etc to functions, is now exported.
rerun() is correctly scoped (#95)
update_list() can now modify an element called
map*() now use custom C code, rather than relying on lapply(), mapply() etc. The performance characteristics are very similar, but it allows us greater control over the output (#118).
map_lgl() now has second argument
flatmap() -> use map() followed by the appropriate flatten().
map3(x, y, z) -> map_n(list(x, y, z)); walk3(x, y, z) -> pwalk(list(x, y, z))
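Since map_n() was itself later replaced by pmap(), a three-input map or walk might be written today as in the sketch below. The vectors and the `seen` accumulator are hypothetical and only serve the example.

```r
library(purrr)

x <- 1:2
y <- 3:4
z <- 5:6

# Three-input map: wrap the inputs in a list and use pmap()
products <- pmap(list(x, y, z), function(a, b, c) a * b * c)

# Three-input walk: pwalk() is called for its side effects
seen <- character(0)
pwalk(list(x, y, z), function(a, b, c) {
  seen <<- c(seen, paste(a, b, c))
})
```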