DBI Connector to Presto

Implements a 'DBI' compliant interface to Presto. Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes: <https://prestodb.io/>.


RPresto is a DBI-based adapter for the open source distributed SQL query engine Presto for running interactive analytic queries.

RPresto is available on both CRAN and GitHub. To install the CRAN version, use

install.packages('RPresto')

To install the development version from GitHub, use

devtools::install_github('prestodb/RPresto')

The standard DBI approach works with RPresto:

library('DBI')
 
con <- dbConnect(
  RPresto::Presto(),
  host='http://localhost',
  port=7777,
  user=Sys.getenv('USER'),
  schema='<schema>',
  catalog='<catalog>'
)
 
res <- dbSendQuery(con, 'SELECT 1')
# dbFetch without arguments only returns the current chunk, so we need to
# loop until the query completes.
while (!dbHasCompleted(res)) {
    chunk <- dbFetch(res)
    print(chunk)
}
 
res <- dbSendQuery(con, 'SELECT CAST(NULL AS VARCHAR)')
# Because chunk sizes are unpredictable with Presto, we do not support
# fetching a custom number of rows
# testthat::expect_error(dbFetch(res, 5))
 
# To get all rows using dbFetch, pass in a -1 argument
print(dbFetch(res, -1))
 
# An alternative is to use dbGetQuery directly
 
# `source` for iris.sql()
source(system.file('tests', 'testthat', 'utilities.R', package='RPresto'))
 
iris <- dbGetQuery(con, paste("SELECT * FROM", iris.sql()))
 
dbDisconnect(con)
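
The loop shown earlier prints each chunk and then discards it. To keep the rows, one option is to collect the chunks in a list and bind them once at the end. The sketch below illustrates that pattern in base R only; `make_fake_result()` and its `has_completed()` and `fetch()` members are hypothetical stand-ins for a result set and are not part of DBI or RPresto. With a real connection, `dbHasCompleted(res)` and `dbFetch(res)` would take their place.

```r
# Hypothetical stand-in for a result set that hands back its rows in
# server-sized chunks. Not part of DBI or RPresto.
make_fake_result <- function(chunks) {
  position <- 0
  list(
    has_completed = function() position >= length(chunks),
    fetch = function() {
      position <<- position + 1
      chunks[[position]]
    }
  )
}

res <- make_fake_result(list(
  data.frame(id = 1:2, label = c("a", "b"), stringsAsFactors = FALSE),
  data.frame(id = 3L, label = "c", stringsAsFactors = FALSE)
))

# Collect every chunk in a list, then bind once at the end; this avoids
# growing a data.frame inside the loop.
chunks <- list()
while (!res$has_completed()) {
  chunks[[length(chunks) + 1]] <- res$fetch()
}
all_rows <- do.call(rbind, chunks)
print(all_rows)
```

When the whole result fits in memory, `dbGetQuery` or `dbFetch(res, -1)` as shown earlier is the shorter route; the explicit loop is mainly useful when each chunk needs processing as it arrives.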

We also include dplyr integration.

library(dplyr)
 
db <- src_presto(
  host='http://localhost',
  port=7777,
  user=Sys.getenv('USER'),
  schema='<schema>',
  catalog='<catalog>'
)
 
# Assuming you have a table like iris in the database
iris <- tbl(db, 'iris')
 
iris %>%
  group_by(species) %>%
  summarise(mean_sepal_length = mean(as(sepal_length, 0.0))) %>%
  arrange(species) %>%
  collect()

Presto exposes its interface via a REST-based API [1]. We use the httr package to make the API calls and jsonlite to reshape the data into a data.frame. Note that as of now, only read operations are supported.
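
As a rough illustration of the reshaping step, the sketch below turns a hand-written JSON payload into a data.frame with jsonlite. The payload and its `columns`/`data` layout are assumptions made for this example; this is not RPresto's actual implementation, and no request is sent to a server.

```r
library(jsonlite)

# Hand-written payload imitating one page of a query response.
# The field names and values are assumptions for illustration only.
payload <- '{
  "columns": [
    {"name": "species", "type": "varchar"},
    {"name": "n", "type": "bigint"}
  ],
  "data": [["setosa", 50], ["virginica", 50]]
}'

# Keep the nested list structure so rows with mixed types stay intact.
page <- fromJSON(payload, simplifyVector = FALSE)

column_names <- vapply(page$columns, function(col) col$name, character(1))

# Turn each row (a list of values) into a one-row data.frame, then bind.
rows <- lapply(page$data, function(row) {
  as.data.frame(setNames(row, column_names), stringsAsFactors = FALSE)
})
chunk <- do.call(rbind, rows)
print(chunk)
```

A real client would also have to map the reported column types to R types and handle NULL values, which this sketch leaves out.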

RPresto has been tested on Presto 0.100.

RPresto is BSD-licensed. We also provide an additional patent grant.

[1] See https://gist.github.com/electrum/7710544 for an unofficial description of the API.

News

RPresto 1.2.1

  • Handle responses with no column information (fixes #49)
  • Add retries for GET and POST responses with error status codes
  • Skip test cases that need locale modification if we cannot set the locale for the OS.
  • Adapt to changes in the upcoming dplyr and testthat versions.

RPresto 1.2.0

  • Add a session.timezone parameter to dbConnect and src_presto which defaults to UTC. This affects the timestamps returned for Presto data types "TIMESTAMP". We handle the ambiguity by assigning a time zone to every POSIXct column returned. Note that if you call as.character() directly on these columns, the values you obtain will differ from those returned by earlier versions.
  • Fix the way we handle zero row multiple column query results. This will affect LIMIT 0 queries specifically.
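
The effect of an attached time zone on as.character() can be seen in base R without a Presto connection. In this sketch the same instant is tagged with two different time zones; the timestamp and the "Asia/Tokyo" zone are arbitrary values chosen for illustration.

```r
# The same instant, tagged with two different time zones.
utc <- as.POSIXct("2015-01-01 12:34:56", tz = "UTC")
tokyo <- utc
attr(tokyo, "tzone") <- "Asia/Tokyo"

# The underlying values are equal ...
print(utc == tokyo)

# ... but the character representations follow the attached time zone.
print(as.character(utc))
print(as.character(tokyo))
```

Code that compares or stores the character form of a timestamp column is therefore sensitive to the session.timezone setting, while code that works on the POSIXct values themselves is not.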

RPresto 1.1.1

  • Minor dplyr related fixes
  • Lower the R version requirement from 3.1.1 to 3.1.0
  • Speed-up in binding chunks if dplyr is available.
  • Handle special values like Infinity, NaN.

RPresto 1.1.0

  • Add optional dplyr support. One can initiate a connection via src_presto.
  • Minor documentation fixes.

RPresto 1.0.0

  • Initial release to GitHub


1.3.0 by Onur Ismail Filiz, 2 months ago


https://github.com/prestodb/RPresto


Report a bug at https://github.com/prestodb/RPresto/issues


Browse source code at https://github.com/cran/RPresto


Authors: Onur Ismail Filiz [aut, cre], Sergey Goder [aut], John Myles White [ctb]




BSD_3_clause + file LICENSE license


Imports DBI, httr, openssl, jsonlite, stringi, stats, Rcpp, utils

Depends on methods

Suggests testthat, dplyr, dbplyr

Linking to Rcpp

