Fluid Data Transformations

Supplies higher-order fluid data transform operators that include pivot and anti-pivot as special cases. The methodology is described in Zumel, 2018, "Fluid data reshaping with 'cdata'", <http://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html>, doi:10.5281/zenodo.1173299. Based on the 'DBI' database interface.


The cdata package is a demonstration of the "coordinatized data" theory and includes an implementation of the "fluid data" methodology. The recommended tutorial is "Fluid data reshaping with cdata" (http://winvector.github.io/FluidData/FluidDataReshapingWithCdata.html). A short, free cdata screencast and a further worked example are also available.

Briefly, cdata supplies data transform operators that:

  • Work on local data or with any DBI data source.
  • Are powerful generalizations of the operators commonly called pivot and un-pivot.

A quick example:

library("cdata")
my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
 
# pivot example
d <- data.frame(meas = c('AUC', 'R2'), val = c(0.6, 0.2))
DBI::dbWriteTable(my_db,
                  'd',
                  d,
                  temporary = TRUE)
qlook(my_db, 'd')
## table `d` SQLiteConnection 
##  nrow: 2 
## 'data.frame':    2 obs. of  2 variables:
##  $ meas: chr  "AUC" "R2"
##  $ val : num  0.6 0.2
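# Build a control table describing the pivot: keys come from `meas`, values from `val`.
cT <- build_pivot_control_q('d',
                            columnToTakeKeysFrom = 'meas',
                            columnToTakeValuesFrom = 'val',
                            my_db = my_db)
# Apply the control table; with keyColumns = NULL the whole table is treated as one record.
tab <- blocks_to_rowrecs_q('d',
                           keyColumns = NULL,
                           controlTable = cT,
                           my_db = my_db)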
qlook(my_db, tab)
## table `mvtcq_2qgmocqh5t2od9os9xi7_0000000001` SQLiteConnection 
##  nrow: 1 
## 'data.frame':    1 obs. of  2 variables:
##  $ AUC: num 0.6
##  $ R2 : num 0.2
DBI::dbDisconnect(my_db)
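
For comparison, the same reshaping and its inverse can be sketched on the local data frame with base R alone. This is not how cdata implements its operators; the helper names `pivot_sketch` and `unpivot_sketch` are hypothetical and only illustrate what moving between the "block" form (one row per measurement) and the "row record" form (one row per record) means for a single record.

```r
# Block form: one row per measurement.
d <- data.frame(meas = c("AUC", "R2"), val = c(0.6, 0.2),
                stringsAsFactors = FALSE)

# Hypothetical helper (not part of cdata): turn key/value rows
# into a single wide row record.
pivot_sketch <- function(blocks, key_col, val_col) {
  wide <- as.data.frame(as.list(blocks[[val_col]]))
  names(wide) <- blocks[[key_col]]
  wide
}

# Hypothetical helper (not part of cdata): turn a single wide row
# record back into key/value rows.
unpivot_sketch <- function(wide, key_col, val_col) {
  blocks <- data.frame(names(wide),
                       unlist(wide[1, ], use.names = FALSE),
                       stringsAsFactors = FALSE)
  names(blocks) <- c(key_col, val_col)
  blocks
}

wide <- pivot_sketch(d, "meas", "val")       # one row, columns AUC and R2
back <- unpivot_sketch(wide, "meas", "val")  # two rows, columns meas and val
```

Here `wide` has one row with columns `AUC` and `R2`, matching the structure reported by `qlook()` above, and `back` reproduces `d`. The sketch handles only a single record with values of one type; it does not cover key columns or database tables.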

Install via CRAN:

install.packages("cdata")

Or from GitHub using devtools:

devtools::install_github("WinVector/cdata")

Note: cdata is targeted at data with "tame column names" (column names that are valid both in databases and as unquoted R variable names) and basic types (column values that are simple R types such as character, numeric, logical, and so on).
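
One way to screen column names before handing data to cdata is sketched below. The helper `is_tame_name` is hypothetical and not part of cdata, and the rule it applies (a leading ASCII letter followed by letters, digits, or underscores, and not an R reserved word) is an assumed conservative reading of "tame"; individual databases may be stricter or more lenient.

```r
# Hypothetical helper (not part of cdata): TRUE where a name looks safe
# both as a database column name and as an unquoted R variable name.
is_tame_name <- function(nms) {
  # Assumed conservative pattern: leading letter, then letters/digits/underscores.
  simple <- grepl("^[A-Za-z][A-Za-z0-9_]*$", nms)
  # make.names() alters anything R would not accept unquoted,
  # including reserved words such as "if".
  valid_r <- nms == make.names(nms)
  simple & valid_r
}

nms <- c("AUC", "R2", "my col", "2nd", "a.b", "if")
tame <- is_tame_name(nms)
```

In this sketch only `AUC` and `R2` pass: the others contain a space, start with a digit, contain a dot, or are a reserved word.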

News

cdata 0.5.2 2018/01/20

  • Remove append-based row binding (seems to have some issues on Spark).
  • Deprecate old methods.

cdata 0.5.1 2018/01/03

  • New naming convention.
  • Doc fixes.
  • Better table lifetime controls.
  • Move to wrapr 1.0.2.
  • Move grepdf out of package.
  • Add row binder.
  • Add map_fields.
  • Add winvector_temp_db_handle support.

cdata 0.5.0 2017/11/13

  • Query-based re-implementation.
  • Fluid data workflow.
  • Remove dplyr and tidyr dependence.

cdata 0.1.7 2017/10/31

  • Better error messages.

cdata 0.1.6 2017/10/12

  • Work around empty keyset issues.
  • Add column control.

cdata 0.1.5 2017/07/04

  • Allow NA in key columns.
  • Add optional class annotation when moving values to rows.

cdata 0.1.1 2017/05/05

  • Ungroup before calculating distinct.

cdata 0.1.0 2017/03/28

  • First release.

0.7.1 by John Mount, 6 days ago


https://github.com/WinVector/cdata/, https://winvector.github.io/cdata/


Report a bug at https://github.com/WinVector/cdata/issues


Browse source code at https://github.com/cran/cdata


Authors: John Mount [aut, cre], Nina Zumel [aut], Win-Vector LLC [cph]



GPL-3 license


Imports wrapr

Suggests DBI, RSQLite, testthat, knitr, rmarkdown


Imported by WVPlots.
