'rquery' for 'data.table'

Implements the 'rquery' piped Codd-style query algebra using 'data.table'. This allows for a high-speed in-memory implementation of Codd-style data manipulation tools.


rqdatatable is an implementation of the rquery piped Codd-style relational algebra hosted on data.table. rquery allows the expression of complex transformations as a series of relational operators, and rqdatatable implements those operators using data.table.

For example, scoring a logistic regression model (which requires grouping, ordering, and ranking) is organized as follows. For more on this example please see "Let’s Have Some Sympathy For The Part-time R User".

library("rqdatatable")
# data example
dL <- build_frame(
   "subjectID", "surveyCategory"     , "assessmentTotal" |
   1          , "withdrawal behavior", 5                 |
   1          , "positive re-framing", 2                 |
   2          , "withdrawal behavior", 3                 |
   2          , "positive re-framing", 4                 )
scale <- 0.237
 
# example rquery pipeline
rquery_pipeline <- local_td(dL) %.>%
  extend_nse(.,
             probability :=
               exp(assessmentTotal * scale))  %.>% 
  normalize_cols(.,
                 "probability",
                 partitionby = 'subjectID') %.>%
  pick_top_k(.,
             k = 1,
             partitionby = 'subjectID',
             orderby = c('probability', 'surveyCategory'),
             reverse = c('probability', 'surveyCategory')) %.>% 
  rename_columns(., c('diagnosis' = 'surveyCategory')) %.>%
  select_columns(., c('subjectID', 
                      'diagnosis', 
                      'probability')) %.>%
  orderby(., cols = 'subjectID')

We can show the expanded form of the query tree.

cat(format(rquery_pipeline))
table(dL; 
  subjectID,
  surveyCategory,
  assessmentTotal) %.>%
 extend(.,
  probability := exp(assessmentTotal * 0.237)) %.>%
 extend(.,
  probability := probability / sum(probability),
  p= subjectID) %.>%
 extend(.,
  row_number := row_number(),
  p= subjectID,
  o= "probability" DESC, "surveyCategory" DESC) %.>%
 select_rows(.,
   row_number <= 1) %.>%
 rename(.,
  c('diagnosis' = 'surveyCategory')) %.>%
 select_columns(.,
   subjectID, diagnosis, probability) %.>%
 orderby(., subjectID)

And execute it using data.table.

ex_data_table(rquery_pipeline)
##    subjectID           diagnosis probability
## 1:         1 withdrawal behavior   0.6706221
## 2:         2 positive re-framing   0.5589742
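
As a cross-check on what the pipeline computes, the same arithmetic can be sketched in base R. This is only an illustrative re-derivation, not how rqdatatable implements the operators: the data frame `d`, the result `res`, and the use of `ave()`, `order()`, and `duplicated()` are assumptions made for this sketch. The probabilities it produces should agree with the values printed above.

```r
# Base R re-derivation of the pipeline's arithmetic (illustration only,
# not the rqdatatable implementation).
d <- data.frame(
  subjectID = c(1, 1, 2, 2),
  surveyCategory = c("withdrawal behavior", "positive re-framing",
                     "withdrawal behavior", "positive re-framing"),
  assessmentTotal = c(5, 2, 3, 4),
  stringsAsFactors = FALSE
)
scale <- 0.237

# extend: unnormalized score for each row
d$probability <- exp(d$assessmentTotal * scale)

# normalize_cols: divide by the per-subject total so each subject sums to 1
d$probability <- d$probability / ave(d$probability, d$subjectID, FUN = sum)

# pick_top_k (k = 1): within each subject, sort by probability and then
# surveyCategory, both descending, and keep the first row
d <- d[order(d$subjectID, -d$probability, -xtfrm(d$surveyCategory)), ]
res <- d[!duplicated(d$subjectID),
         c("subjectID", "surveyCategory", "probability")]

# rename_columns: surveyCategory becomes diagnosis
names(res)[names(res) == "surveyCategory"] <- "diagnosis"
rownames(res) <- NULL
print(res)
```

The per-subject normalization corresponds to the `probability / sum(probability)` step with `p= subjectID` in the expanded query tree, and the sort-then-keep-first step corresponds to `row_number() <= 1`.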

One can also apply the pipeline to new tables.

build_frame(
   "subjectID", "surveyCategory"     , "assessmentTotal" |
   7          , "withdrawal behavior", 5                 |
   7          , "positive re-framing", 20                ) %.>%
  rquery_pipeline
##    subjectID           diagnosis probability
## 1:         7 positive re-framing   0.9722128

Initial benchmarking of rqdatatable is very favorable (notes here).

rqdatatable is a fairly complete implementation of rquery. The main difference is that the rqdatatable versions of sql_node() and theta_join() are implemented by round-tripping through a database handle specified by the rquery.rquery_db_executor option (so they are not very desirable implementations).
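
A minimal sketch of registering such a database handle follows. It assumes the DBI and RSQLite packages are installed (both appear under Suggests), that an in-memory SQLite database is an acceptable round-trip target, and that the option value is a list holding the connection under the name `db`. The variable names are hypothetical; please check the rquery documentation before relying on this form.

```r
# Sketch: register a database handle for the sql_node() / theta_join()
# round trips. Assumes DBI and RSQLite are installed.
library("rqdatatable")

raw_connection <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
# Load the SQLite extension functions (for example exp()).
RSQLite::initExtension(raw_connection)

# Assumption: the option holds a list with the handle stored as "db".
old_options <- options(
  rquery.rquery_db_executor = list(db = raw_connection))

# ... run pipelines containing sql_node() or theta_join() here ...

# Restore the previous option value and release the connection.
options(old_options)
DBI::dbDisconnect(raw_connection)
```

Because every such step copies data into the database and back, pipelines that rely heavily on these two operators may not see the in-memory speed of the other operators.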

To install rqdatatable please use install.packages("rqdatatable") or try devtools as follows.

# install.packages("devtools")
devtools::install_github("WinVector/rqdatatable")

News

rqdatatable 1.1.1 2018/09/20

  • alternate data.table implementation path.
  • force parent.frame.

rqdatatable 1.0.0 2018/09/10

  • allow no group columns project.
  • work on ordering in extend.

rqdatatable 0.1.4 2018/08/18

  • More tests.
  • Work on result print-visibility.

rqdatatable 0.1.3 2018/07/28

  • Fix full join print glitch.
  • data.table implementation of theta-join.
  • Documentation fixes.

rqdatatable 0.1.2 2018/07/08

  • Adapt to instant execution path.
  • Don't expect %>>%.
  • Documentation improvements.

rqdatatable 0.1.1 2018/06/26

  • Don't use isFALSE() (new to R 3.5.0).
  • Update install instructions.
  • Improve regexps.

rqdatatable 0.1.0 2018/06/18

  • First CRAN release.


1.1.3 by John Mount, 2 days ago


https://github.com/WinVector/rqdatatable/, https://winvector.github.io/rqdatatable/


Report a bug at https://github.com/WinVector/rqdatatable/issues


Browse source code at https://github.com/cran/rqdatatable


Authors: John Mount [aut, cre] , Win-Vector LLC [cph]




GPL-3 license


Imports wrapr, data.table, methods

Depends on rquery

Suggests knitr, rmarkdown, DBI, RSQLite, parallel, RUnit


Suggested by cdata, rquery, vtreat.

