Tools for Reading, Tokenizing and Parsing R Code

Tools for the reading and tokenization of R code. The 'sourcetools' package provides both an R and C++ interface for the tokenization of R code, and helpers for interacting with the tokenized representation of R code.


Tools for reading, tokenizing, and (eventually) parsing R code.

sourcetools is available on CRAN; the development version can be installed from GitHub with

devtools::install_github("kevinushey/sourcetools")

sourcetools comes with a couple of fast functions for reading files into R.

Use read() and read_lines() to quickly read a file into R as character vectors. read_lines() handles both Windows-style \r\n and Unix-style \n line endings.

library(sourcetools)
text <- replicate(10000, paste(sample(letters, 200, TRUE), collapse = ""))
file <- tempfile()
cat(text, file = file, sep = "\n")
mb <- microbenchmark::microbenchmark(times = 10,
  readChar   = readChar(file, file.info(file)$size, TRUE),
  readLines  = readLines(file),
  read       = read(file),
  read_lines = read_lines(file)
)
print(mb, digits = 3)
## Unit: milliseconds
##        expr   min     lq  mean median     uq    max neval cld
##    readChar   5.2   6.54  10.5   7.02   8.73  36.56    10 ab 
##   readLines 155.9 159.69 162.4 161.95 163.15 171.76    10   c
##        read   5.3   5.48   6.5   5.97   7.52   9.35    10 a  
##  read_lines  13.5  13.95  14.4  14.09  14.50  16.97    10  b
unlink(file)
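
The line-ending behaviour described above can be checked with a small sketch. The helper write_with_eol() is hypothetical (it is not part of sourcetools) and simply writes the same lines with a chosen line terminator. The sourcetools calls are guarded so that the snippet still runs when the package is not installed. If read_lines() behaves as described, both files should yield the same lines.

```r
# Hypothetical helper (not part of sourcetools): write `lines` to `path`
# as raw bytes, terminating every line with `eol`.
write_with_eol <- function(lines, path, eol) {
  writeBin(charToRaw(paste0(lines, eol, collapse = "")), path)
}

crlf_file <- tempfile()
lf_file   <- tempfile()
write_with_eol(c("alpha", "beta"), crlf_file, "\r\n")  # Windows-style
write_with_eol(c("alpha", "beta"), lf_file,   "\n")    # Unix-style

if (requireNamespace("sourcetools", quietly = TRUE)) {
  crlf_lines <- sourcetools::read_lines(crlf_file)
  lf_lines   <- sourcetools::read_lines(lf_file)
  # Compare the lines read from the two files.
  print(identical(crlf_lines, lf_lines))
}

unlink(c(crlf_file, lf_file))
```

Writing raw bytes with writeBin() avoids any platform-specific newline translation, so the files contain exactly the endings under test.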

sourcetools provides the tokenize_string() and tokenize_file() functions for generating a tokenized representation of R code. These produce 'raw' tokenized representations of the code, with each token's value as a string, and a recorded row, column, and type:

tokenize_string("if (x < 10) 20")
##    value row column       type
## 1     if   1      1    keyword
## 2          1      3 whitespace
## 3      (   1      4    bracket
## 4      x   1      5     symbol
## 5          1      6 whitespace
## 6      <   1      7   operator
## 7          1      8 whitespace
## 8     10   1      9     number
## 9      )   1     11    bracket
## 10         1     12 whitespace
## 11    20   1     13     number
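
Because the result is an ordinary data frame, it can be post-processed with base R. The following is a minimal sketch, assuming the column layout shown above (value, row, column, type). The helpers drop_whitespace() and count_types() are hypothetical and not part of sourcetools.

```r
# Hypothetical helper: keep only tokens that are not whitespace.
drop_whitespace <- function(tokens) {
  tokens[tokens$type != "whitespace", , drop = FALSE]
}

# Hypothetical helper: tabulate how often each token type occurs.
count_types <- function(tokens) {
  table(tokens$type)
}

# Guarded so the snippet still runs when sourcetools is not installed.
if (requireNamespace("sourcetools", quietly = TRUE)) {
  tokens <- sourcetools::tokenize_string("if (x < 10) 20")
  print(drop_whitespace(tokens))
  print(count_types(tokens))
}
```

Filtering on the type column keeps the row and column positions of the remaining tokens intact, so they can still be mapped back to the original source text.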

News

sourcetools 0.1.5

  • Ensure that symbols included from e.g. <cstdio>, <cstring> are resolved using a std:: prefix.

sourcetools 0.1.4

  • More work to ensure sourcetools can build on Solaris.

sourcetools 0.1.3

  • Relax C++11 requirement, to ensure that sourcetools can build on machines with older compilers (e.g. gcc 4.4).

sourcetools 0.1.2

  • Disable failing tests on Solaris.

sourcetools 0.1.1

  • Rename token type ERR to INVALID to fix build errors on Solaris.

sourcetools 0.1.0

The first release of sourcetools comes with a small set of features exposed to R:

  • read(file): Read a file (as a string). Similar to readChar(), but faster (and may be optimized to use a memory-mapped file reader in the future).

  • tokenize_file(file): Tokenize an R script.

  • tokenize_string(string): Tokenize a string of R code.

install.packages("sourcetools")

0.1.6 by Kevin Ushey, 8 months ago


Report a bug at https://github.com/kevinushey/sourcetools/issues


Browse source code at https://github.com/cran/sourcetools


Authors: Kevin Ushey



License: MIT + file LICENSE


Suggests testthat


Imported by shiny.

