# 'rquery' for 'data.table'

Implements the 'rquery' piped Codd-style query algebra using 'data.table'. This allows for a high-speed in-memory implementation of Codd-style data manipulation tools.

`rqdatatable` is an implementation of the `rquery` piped Codd-style relational algebra hosted on `data.table`. `rquery` allows complex transformations to be expressed as a series of relational operators, and `rqdatatable` implements those operators using `data.table`.

For example, scoring a logistic regression model (which requires grouping, ordering, and ranking) is organized as follows. For more on this example, please see "Let’s Have Some Sympathy For The Part-time R User".
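The following is a minimal sketch of how such a pipeline might be assembled, written to mirror the operators in the query tree printed on this page. The contents of `dL` are an assumption (illustrative values chosen to be consistent with the results printed on this page), as is the pipeline name `ops`; the function names assume a current `rquery` release, where `extend()` and `select_rows()` accept unquoted expressions.

```r
library("rqdatatable")  # attaches rquery (and wrapr, which supplies %.>%)

# Illustrative data: an assumption, not taken from the package documentation.
dL <- data.frame(
  subjectID = c(1, 1, 2, 2),
  surveyCategory = c("withdrawal behavior", "positive re-framing",
                     "withdrawal behavior", "positive re-framing"),
  assessmentTotal = c(5, 2, 3, 4),
  stringsAsFactors = FALSE)

# Describe the operators once; nothing is computed at this point.
ops <- local_td(dL) %.>%
  extend(., probability := exp(assessmentTotal * 0.237)) %.>%
  # normalize within each subject so probabilities sum to 1
  extend(., probability := probability / sum(probability),
         partitionby = "subjectID") %.>%
  # rank rows within each subject, highest probability first
  extend(., row_number := row_number(),
         partitionby = "subjectID",
         orderby = c("probability", "surveyCategory"),
         reverse = c("probability", "surveyCategory")) %.>%
  select_rows(., row_number <= 1) %.>%
  rename_columns(., c("diagnosis" = "surveyCategory")) %.>%
  select_columns(., c("subjectID", "diagnosis", "probability")) %.>%
  orderby(., "subjectID")

# Print the query tree, then run the pipeline with data.table.
cat(format(ops))
result <- ex_data_table(ops)
print(result)
```

Because the pipeline is a description rather than a result, it can be reused: piping another data frame with the same columns into it (`other_data %.>% ops`, where `other_data` is a hypothetical name) should apply the same steps to that data.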

We can show the expanded form of the query tree.

```
table(dL;
  subjectID,
  surveyCategory,
  assessmentTotal) %.>%
 extend(.,
  probability := exp(assessmentTotal * 0.237)) %.>%
 extend(.,
  probability := probability / sum(probability),
  p= subjectID) %.>%
 extend(.,
  row_number := row_number(),
  p= subjectID,
  o= "probability" DESC, "surveyCategory" DESC) %.>%
 select_rows(.,
   row_number <= 1) %.>%
 rename(.,
  c('diagnosis' = 'surveyCategory')) %.>%
 select_columns(.,
   subjectID, diagnosis, probability) %.>%
 orderby(., subjectID)
```

And execute it using `data.table`.

```
##    subjectID           diagnosis probability
## 1:         1 withdrawal behavior   0.6706221
## 2:         2 positive re-framing   0.5589742
```

One can also apply the pipeline to new tables.

```
##    subjectID           diagnosis probability
## 1:         7 positive re-framing   0.9722128
```

Initial benchmarking of `rqdatatable` is very favorable (notes here).

Note that `rqdatatable` has an "immediate mode", which allows direct application of pipeline stages without pre-assembling the pipeline. "Immediate mode" is a convenience for ad-hoc analyses and has some negative performance impact, so we encourage users to build pipelines for most work. Some notes on the issue can be found here.
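As a rough illustration of the difference, the sketch below runs the same two steps both ways. The data frame `d` and the variable names are hypothetical, and the calls assume a current `rquery` release in which `extend()` and `select_rows()` can be applied directly to a data frame.

```r
library("rqdatatable")

# Hypothetical example data.
d <- data.frame(x = c(1, 2, 3), g = c("a", "a", "b"),
                stringsAsFactors = FALSE)

# Immediate mode: each stage is executed on the data as soon as it is called,
# so every step materializes an intermediate result.
res_immediate <- d %.>%
  extend(., y := x * 2) %.>%
  select_rows(., y > 2)

# Pipeline mode: build the operator tree from a table description first,
# then apply the finished pipeline to the data in one go.
ops <- local_td(d) %.>%
  extend(., y := x * 2) %.>%
  select_rows(., y > 2)
res_pipeline <- d %.>% ops

print(res_immediate)
print(res_pipeline)
```

Both forms should give the same rows; the pre-assembled pipeline is also reusable on other tables with the same columns.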

`rqdatatable` is a fairly complete implementation of `rquery`. The main differences are that the `rqdatatable` versions of `sql_node()` and `theta_join()` work by round-tripping through a database handle specified by the `rquery.rquery_db_executor` option (so they are not very desirable implementations).

To install `rqdatatable` please use `install.packages("rqdatatable")`.

# rqdatatable 1.1.4 2019/02/24

• extra copy in ex_data_table.relop_list() (just in case).

# rqdatatable 1.1.3 2019/02/17

• Move to RUnit.
• More tests.
• Add ex_data_table.relop_list().

# rqdatatable 1.1.2 2018/12/17

• Allow more control of ordering in extend.
• Relax column production check.
• Add rq_ufn().
• More parent.frame forcing.
• Add row limit to order.
• Add order_expr.
• Add power test.

# rqdatatable 1.1.1 2018/09/20

• alternate data.table implementation path.
• force parent.frame.

# rqdatatable 1.0.0 2018/09/10

• allow no group columns project.
• work on ordering in extend.

# rqdatatable 0.1.4 2018/08/18

• More tests.
• Work on result print-visibility.

# rqdatatable 0.1.3 2018/07/28

• Fix full join print glitch.
• data.table implementation of theta-join.
• Documentation fixes.

# rqdatatable 0.1.2 2018/07/08

• Adapt to instant execution path.
• Don't expect %>>%.
• Documentation improvements.

# rqdatatable 0.1.1 2018/06/26

• Don't use isFALSE() (new to R 3.5.0).
• Update install instructions.
• Improve regexps.

# rqdatatable 0.1.0 2018/06/18

• First CRAN release.

# Reference manual

install.packages("rqdatatable")

1.2.2 by John Mount, a month ago

https://github.com/WinVector/rqdatatable/, https://winvector.github.io/rqdatatable/

Report a bug at https://github.com/WinVector/rqdatatable/issues

Browse source code at https://github.com/cran/rqdatatable

Authors: John Mount [aut, cre], Win-Vector LLC [cph]

GPL-2 | GPL-3 license

Imports wrapr, data.table, methods

Depends on rquery

Suggests knitr, rmarkdown, DBI, RSQLite, parallel, RUnit

Imported by WVPlots, cdata.

Suggested by rquery, vtreat.