Read Large Text Files

Read large text files by splitting them into smaller files.



Reads large text files by splitting them into parts and reading each part with data.table::fread.
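
The idea can be sketched without bigreadr itself. The following is a minimal illustration, not the package's implementation: `split_file()` and `read_by_parts()` are hypothetical helpers that cut a CSV into temporary part files of `every_nlines` rows each (repeating the header), read each part with `data.table::fread`, apply an optional transform, and stack the results. It assumes a one-line header and no quoted fields containing line breaks.

```r
library(data.table)

# Hypothetical helper: write `path` out as several smaller CSV files,
# each holding the header plus at most `every_nlines` data rows.
split_file <- function(path, every_nlines) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  part_files <- character(0)
  repeat {
    lines <- readLines(con, n = every_nlines)
    if (length(lines) == 0L) break
    part <- tempfile(fileext = ".csv")
    writeLines(c(header, lines), part)
    part_files <- c(part_files, part)
  }
  part_files
}

# Hypothetical helper: read each part, transform it, then stack the parts.
read_by_parts <- function(path, every_nlines, transform = identity) {
  part_files <- split_file(path, every_nlines)
  on.exit(unlink(part_files))
  rbindlist(lapply(part_files, function(f) transform(fread(f))))
}

# Small demonstration on the iris data (150 rows, read 40 rows at a time)
tmp <- tempfile(fileext = ".csv")
fwrite(iris, tmp)
all_rows <- read_by_parts(tmp, every_nlines = 40)
setosa <- read_by_parts(tmp, every_nlines = 40,
                        transform = function(df) df[Species == "setosa"])
```

Because each part is transformed before the parts are stacked, a filter like the one above keeps only the matching rows of earlier parts in memory while later parts are read. Column types are guessed per part here, so real data with inconsistent types across parts may need explicit `colClasses`.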

Example

# devtools::install_github("privefl/bigreadr")
library(bigreadr)
 
# Create a temporary file of ~141 MB (just as an example)
csv <- fwrite2(iris[rep(seq_len(nrow(iris)), 1e4), rep(1:5, 4)], tempfile())
format(file.size(csv), big.mark = ",")
 
## Splitting lines (1)
# Read (by parts) all data -> using `fread` would be faster
nlines(csv)  ## ~1.5 million lines -> split every 500,000
big_iris1 <- big_fread1(csv, every_nlines = 5e5)
# Read and subset (by parts)
big_iris1_setosa <- big_fread1(csv, every_nlines = 5e5, .transform = function(df) {
  dplyr::filter(df, Species == "setosa")
})
 
## Splitting columns (2)
big_iris2 <- big_fread2(csv, nb_parts = 3)
# Read and subset (by parts)
species_setosa <- (fread2(csv, select = 5)[[1]] == "setosa")
big_iris2_setosa <- big_fread2(csv, nb_parts = 3, .transform = function(df) {
  dplyr::filter(df, species_setosa)
})
 
## Verification
identical(big_iris1_setosa, dplyr::filter(big_iris1, Species == "setosa"))
identical(big_iris2, big_iris1)
identical(big_iris2_setosa, big_iris1_setosa)
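
Splitting by columns follows the same pattern. As a rough sketch (again with a hypothetical helper, `read_by_columns()`, rather than the package's own code), each group of columns is read separately through the `select` argument of `data.table::fread`, optionally transformed, and the groups are then bound side by side. A row filter has to be identical for every group, which is why the example above precomputes `species_setosa` from a single column. The sketch assumes `nb_parts` is at least 2.

```r
library(data.table)

# Hypothetical helper: read `path` in `nb_parts` groups of columns.
read_by_columns <- function(path, nb_parts, transform = identity) {
  # Read only the header to learn how many columns there are
  n_cols <- ncol(fread(path, nrows = 0))
  idx <- seq_len(n_cols)
  # Contiguous groups of column indices, e.g. 1:2, 3, 4:5 for 5 columns
  groups <- unname(split(idx, cut(idx, nb_parts, labels = FALSE)))
  parts <- lapply(groups, function(cols) {
    transform(fread(path, select = cols))
  })
  # Flatten the list of tables into one list of columns, then rebuild a table
  as.data.table(unlist(parts, recursive = FALSE))
}

# Small demonstration on the iris data
tmp <- tempfile(fileext = ".csv")
fwrite(iris, tmp)
full <- read_by_columns(tmp, nb_parts = 3)

# The row filter is computed once, then applied to every column group
keep <- fread(tmp, select = "Species")[[1]] == "setosa"
setosa_cols <- read_by_columns(tmp, nb_parts = 3,
                               transform = function(df) df[keep])
```

Each call to `fread` here holds only a subset of the columns in memory, at the cost of scanning the file once per group.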


install.packages("bigreadr")

0.1.3 by Florian Privé, 8 months ago


https://github.com/privefl/bigreadr


Report a bug at https://github.com/privefl/bigreadr/issues


Browse source code at https://github.com/cran/bigreadr


Authors: Florian Privé [aut, cre]




GPL-3 license


Imports: data.table, Rcpp, parallel, fpeek, utils

Suggests: spelling, testthat, covr

Linking to: Rcpp


Imported by bigstatsr.

