A Toolbox for Non-Tabular Data Manipulation

Provides a set of functions for data manipulation with list objects, including mapping, filtering, grouping, sorting, updating, searching, and other useful functions. Most functions are designed to be pipeline friendly so that data processing with lists can be chained.


rlist is a set of tools for working with list objects. Its goal is to make it easier to work with lists by providing a wide range of functions that operate on non-tabular data stored in them.

This package supports list mapping, filtering, grouping, sorting, updating, searching, file input/output, and many other functions. Most functions in the package are designed to be pipeline friendly so that data processing with lists can be chained.

The rlist Tutorial is a complete guide to rlist and is highly recommended.

This document is also available in Japanese (日本語), translated by @teramonagi.

Install the latest version from GitHub:

devtools::install_github("renkun-ken/rlist")

Install from CRAN:

install.packages("rlist")

In R, there are numerous powerful tools for structured data stored in tabular form, such as data frames. However, much data is non-tabular: different records may have different fields, and a field may hold a different number of values from one record to the next.

Such data is hard to store in a data frame, but list objects in R are flexible enough to represent such heterogeneous records. rlist is a toolbox for working with non-tabular data stored in list objects, providing a collection of high-level, pipeline-friendly functions.

Suppose we have a list of developers, each of whom has a name, age, a few interests, a list of programming languages they use and the number of years they have been using them.

library(rlist)
devs <- 
  list(
    p1=list(name="Ken",age=24,
      interest=c("reading","music","movies"),
      lang=list(r=2,csharp=4)),
    p2=list(name="James",age=25,
      interest=c("sports","music"),
      lang=list(r=3,java=2,cpp=5)),
    p3=list(name="Penny",age=24,
      interest=c("movies","reading"),
      lang=list(r=1,cpp=4,python=2)))

This type of data is non-relational, since it does not fit the shape of a data frame well, yet it can easily be stored in JSON or YAML format. In R, list objects are flexible enough to represent a wide range of non-relational datasets like this, and this package provides many functions to query and manipulate them.

The following examples use str() to show the structure of the output.

Filter those who like music and have been using R for at least 3 years.

str( list.filter(devs, "music" %in% interest & lang$r >= 3) )
List of 1
 $ p2:List of 4
  ..$ name    : chr "James"
  ..$ age     : num 25
  ..$ interest: chr [1:2] "sports" "music"
  ..$ lang    :List of 3
  .. ..$ r   : num 3
  .. ..$ java: num 2
  .. ..$ cpp : num 5

Select the name and age of each developer.

str( list.select(devs, name, age) )
List of 3
 $ p1:List of 2
  ..$ name: chr "Ken"
  ..$ age : num 24
 $ p2:List of 2
  ..$ name: chr "James"
  ..$ age : num 25
 $ p3:List of 2
  ..$ name: chr "Penny"
  ..$ age : num 24

Map each developer to the number of their interests.

str( list.map(devs, length(interest)) )
List of 3
 $ p1: int 3
 $ p2: int 2
 $ p3: int 2

In addition to these basic functions, rlist also supports various types of grouping, joining, searching, sorting, updating, and more. For an introduction to further functionality, see the rlist Tutorial.
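The sketch below tries a few of these on the devs list from above: list.sort with negated keys for descending order, list.group, and list.update. It assumes the function signatures of the current CRAN release of rlist, and the field name n_lang is a hypothetical example, not part of the package.

```r
library(rlist)

# The developer list from above, repeated so this block runs on its own
devs <- list(
  p1 = list(name = "Ken", age = 24,
            interest = c("reading", "music", "movies"),
            lang = list(r = 2, csharp = 4)),
  p2 = list(name = "James", age = 25,
            interest = c("sports", "music"),
            lang = list(r = 3, java = 2, cpp = 5)),
  p3 = list(name = "Penny", age = 24,
            interest = c("movies", "reading"),
            lang = list(r = 1, cpp = 4, python = 2)))

# Sort by age (descending), breaking ties by years of R use (descending)
sorted <- list.sort(devs, -age, -lang$r)
list.mapv(sorted, name)
# c(p2 = "James", p1 = "Ken", p3 = "Penny")

# Group members by age: one sublist per distinct value, named by that value
by_age <- list.group(devs, age)
names(by_age)
# "24" "25"

# Add a derived field (hypothetical name n_lang) to every member
updated <- list.update(devs, n_lang = length(lang))
list.mapv(updated, n_lang)
# c(p1 = 2, p2 = 3, p3 = 3)
```

Negating a numeric key is the descending-order idiom noted in the 0.4 changes below; for non-numeric keys the same notes suggest wrapping the key in parentheses.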

In this package, almost all functions that work with expressions accept the following forms of lambda expressions:

  • Implicit lambda expression: expression
  • Univariate lambda expressions:
    • x ~ expression
    • f(x) ~ expression
  • Multivariate lambda expressions:
    • f(x,i) ~ expression
    • f(x,i,name) ~ expression

where x refers to the list member itself, i to its index, and name to its name. If these symbols are not explicitly declared, ., .i and .name are used by default to represent them, respectively.

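nums <- list(a=c(1,2,3),b=c(2,3,4),c=c(3,4,5))
# implicit lambda: `.` is each member; gives a named min/max vector per member
list.map(nums, c(min=min(.),max=max(.)))
# univariate lambda: keeps the members whose mean is at least 3 (b and c)
list.filter(nums, x ~ mean(x)>=3)
# multivariate lambda: adds each member's index to its sum (a = 7, b = 11, c = 15)
list.map(nums, f(x,i) ~ sum(x,i))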
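The three-argument form can also use the member name. This is a minimal sketch, assuming list.mapv (which simplifies the mapped results to a vector) accepts the same lambda forms as list.map. Both calls should build the same labels, once with declared symbols and once with the defaults .i and .name.

```r
library(rlist)

nums <- list(a = c(1, 2, 3), b = c(2, 3, 4), c = c(3, 4, 5))

# Explicit three-argument lambda: x = member, i = index, name = member name
labels <- list.mapv(nums, f(x, i, name) ~ paste0(name, "#", i, " sum=", sum(x)))

# Same result using the default symbols ., .i and .name
implicit <- list.mapv(nums, paste0(.name, "#", .i, " sum=", sum(.)))

labels
# c(a = "a#1 sum=6", b = "b#2 sum=9", c = "c#3 sum=12")
```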

Query the name and age of each developer who likes music and uses R, and put the results in a data frame.

library(pipeR)
devs %>>% 
  list.filter("music" %in% interest & "r" %in% names(lang)) %>>%
  list.select(name,age) %>>%
  list.stack
   name age
1   Ken  24
2 James  25

The example above uses the pipeR package (http://renkun.me/pipeR/) for the pipeline operator %>>%, which chains commands in a fluent style.
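pipeR is optional. As a sketch that assumes only rlist is loaded, the same query can be written as nested calls, read from the inside out:

```r
library(rlist)

# The developer list from above, repeated so this block runs on its own
devs <- list(
  p1 = list(name = "Ken", age = 24,
            interest = c("reading", "music", "movies"),
            lang = list(r = 2, csharp = 4)),
  p2 = list(name = "James", age = 25,
            interest = c("sports", "music"),
            lang = list(r = 3, java = 2, cpp = 5)),
  p3 = list(name = "Penny", age = 24,
            interest = c("movies", "reading"),
            lang = list(r = 1, cpp = 4, python = 2)))

# Innermost call runs first: filter, then select fields, then stack into a data frame
music_r <- list.stack(
  list.select(
    list.filter(devs, "music" %in% interest & "r" %in% names(lang)),
    name, age))
music_r
```

In R 4.1 or later, the base pipe |> should also work here, since it rewrites lhs |> f(...) into f(lhs, ...) before evaluation; it is left out of the block so the code still parses on older R versions.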

The List() function wraps a list in an environment in which almost all list functions are defined. Here is the List-environment version of the previous example.

ldevs <- List(devs)
ldevs$filter("music" %in% interest & "r" %in% names(lang))$
  select(name,age)$
  stack()$
  data
   name age
1   Ken  24
2 James  25

To view the full documentation in R, run

help(package = rlist)

or view the documentation on CRAN.

This package is released under the MIT License.

News

Changes since version 0.4

  • New features
    • list.ungroup now supports a level argument to unlist a nested list recursively. (#102)
    • list.flatten now accepts a classes argument to filter list elements recursively by class name.
    • list.expand implements a list version of expand.grid (#107)
  • Improvements
    • Support loading and parsing XML into a list. (#43)
    • list.search now uses is.null to clean the results.
    • Add list.unzip to List object.
    • list.ungroup now supports group.names to indicate whether to preserve group names.
    • Implement better error handling in list.all, list.takeWhile, list.skipWhile, etc.
    • list.table now calls table directly on the input data if ... is missing.
  • Bug fixes
    • Fix returned value of list.unserialize.
    • list.skip returns the original data when asked to skip 0 elements.
    • list.skip takes the first n elements when asked to skip a negative number of elements.
    • Fix bug in lambda expression handling of list.all (#105)
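A few of the changes above, sketched on toy data. The inputs x, nested and the variable names are made up for illustration, and the expected results follow the descriptions in these notes rather than captured output.

```r
library(rlist)

x <- list(a = 1, b = 2, c = 3)

# Skipping 0 elements returns the original data
skipped <- list.skip(x, 0)

# list.expand: like expand.grid, but each combination becomes a list element
grid <- list.expand(x = 1:2, y = c("a", "b"))
length(grid)  # 4 = 2 values of x times 2 values of y

# list.flatten with classes keeps only leaves of the given class
nested <- list(a = 1, b = list(c = "text", d = 2))
flat <- list.flatten(nested, classes = "numeric")
length(flat)  # 2: only the numeric leaves (1 and 2) are kept
```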

Version 0.4

  • New features
    • Include a dataset nyweather scraped from OpenWeatherMap (#2)
    • list.load now supports a text-based progress bar when progress = TRUE, which is enabled by default when more than 5 files are to be loaded. (#92)
    • New function list.names sets the names of a list or vector by mapping.
    • New functions list.first and list.last find the first or last list element that meets a given condition.
    • New function list.unzip to transform a list of elements with similar structure into a list of decoupled fields.
  • Improvements
    • Add error handling in several edge cases. (#18)
    • list.group now supports grouping by multiple keys, which produces a multi-level list. (#69)
    • list.load now supports loading from multiple filenames given in a character vector. (#74)
    • list.load is now able to guess the file format even if the file type is not specified. (#76)
    • list.maps now allows the usage of ..1, ..2, etc. to refer to unnamed arguments. (#80)
    • list.load now supports merging and ungrouping as means of aggregating loaded results. (#82)
    • list.stack now uses data.table::setDF to convert data.table to data.frame if data.table = FALSE, which is done by reference and thus has higher performance.
  • Bug fixes
    • list.search now takes n as the number of returned vectors rather than the total number of elements in all returned vectors, and now stops early once the result set reaches the given capacity. (#47, #84)
    • Fix how list.table deals with NULL values. (#73)
    • Fix how wrapper functions deal with default arguments. (#75)
    • Fix the dynamic scoping issues in list.table. (#86)
    • list.all and list.any behave the same as all and any respectively when the input is empty. (#87)
    • One-sided formulas no longer result in an error. (#89)
    • list.flatten now preserves names as specified. (#90)
    • Fix incorrect processing for fallback in list.findi. (#91)
    • Fix the implementation in list.group working with multi-key. (#93)
    • Fix incorrect ordering when some entries are multi-valued vectors and others are single-valued. When list.order and list.sort encounter such a situation, they now report an error rather than silently producing unreliable results. (#94)
    • Fix inconsistencies in list.all, list.any, list.first and list.last.
  • Deprecation
    • equal() is removed and related packages are now suggested rather than imported. (#70)
    • summary.list() is deprecated. (#70)
    • x -> f(x) is no longer interpreted as a form of lambda expression. Use x ~ f(x) instead. (#54)
    • desc(x) is no longer supported in list.sort and list.order. Use -x or (x) instead. (#66)
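A small sketch of several 0.4 changes on a hypothetical people list (not part of the package): descending sort with -x in place of the removed desc(x), list.first and list.last, list.all and list.any on empty input, and multi-key list.group.

```r
library(rlist)

# Hypothetical records for illustration
people <- list(
  list(name = "Ann", age = 31, dept = "ops"),
  list(name = "Bob", age = 25, dept = "dev"),
  list(name = "Cat", age = 25, dept = "ops"))

# desc(age) is gone since 0.4; negating a numeric key sorts descending
sorted_names <- list.mapv(list.sort(people, -age), name)
# "Ann" "Bob" "Cat" (tied ages keep their original order)

# First and last members meeting a condition
first_young <- list.first(people, age < 30)  # Bob's record
last_young  <- list.last(people, age < 30)   # Cat's record

# On empty input these behave like all() and any()
empty_all <- list.all(list(), age > 0)  # TRUE
empty_any <- list.any(list(), age > 0)  # FALSE

# Grouping by two keys yields a two-level list: dept first, then age
by_dept_age <- list.group(people, dept, age)
str(by_dept_age, max.level = 2)
```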

Version 0.3

API Break: list.search now evaluates expression recursively in a list and supports lambda expression.

Add equal() function for logical and fuzzy filtering and searching, which supports exact equality, atomic equality, inclusion, pattern matching, and string-distance tolerance.

Add List() to provide an environment in which most list functions are defined for light-weight chaining that does not rely on external operators.

Version 0.2.5

Add list.apply which is a wrapper function of lapply. Add list.search that searches a list recursively. Add exact search functions: equal, unequal, unidentical, include, and exclude. Add fuzzy search functions: like and unlike based on stringdist package. Enhance list.clean which now supports recursive cleaning.
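list.clean with recursive = TRUE and list.apply can be sketched as follows. The data is hypothetical, and this assumes the current signatures, where list.clean tests members with is.null by default.

```r
library(rlist)

# Hypothetical input with NULL members at two levels
raw <- list(a = 1, b = NULL, c = list(d = NULL, e = 2))

# recursive = TRUE also cleans nested lists; the default test is is.null
cleaned <- list.clean(raw, recursive = TRUE)
str(cleaned)  # a = 1 and c = list(e = 2) remain

# list.apply passes each member to a function, as lapply does
sums <- list.apply(list(x = 1:3, y = 4:6), sum)
# list(x = 6L, y = 15L)
```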

Version 0.2.4

Add list.common that returns the common cases of all list members by expression.

Version 0.2.3

Improve performance (#26, #27). Add list.flatten that flattens a nested list to one level.

Version 0.2.2

Add list.stack that binds list members to a data.frame. Add list.zip that combines multiple lists element-wise. Add list.maps that performs mapping over multiple lists. Performance improvements. Minor maintenance updates. list.cases supports list-like cases. Fixed #23. Fixed #25. list.select no longer accepts explicit lambda expressions. Vignettes updated.
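list.zip and list.stack fit together: zipping parallel vectors gives a list of records, which list.stack can bind into a data frame. A sketch on hypothetical vectors, assuming the current CRAN behavior where each zipped record is named after the arguments:

```r
library(rlist)

# Hypothetical parallel vectors
zipped <- list.zip(id = 1:3, label = c("x", "y", "z"))
str(zipped[[1]])  # a list with id = 1 and label = "x"

# Bind the zipped records into a data frame, one row per record
df <- list.stack(zipped)
df
```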

Version 0.2.1

Add new function list.table. Minor maintenance updates. Fixed #6, #11, #20 and #21.

Version 0.2

Add list.join, list.mapv, list.do, list.clean and list.parse. Add vignettes.

Version 0.1

Implement functions
