Text Extraction, Rendering and Converting of PDF Documents

Utilities based on 'libpoppler' for extracting text, fonts, attachments and metadata from a PDF file. Also supports high-quality rendering of PDF documents into PNG, JPEG, or TIFF format, or into raw bitmap vectors for further processing in R.
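As a rough illustration of the utilities described above, the R sketch below extracts text, metadata and fonts from a PDF and renders its first page to a PNG file. It assumes that pdftools and the suggested png package are installed, and that R's documentation directory ships NEWS.pdf (many R installations do, but this is not guaranteed). The temporary output path is chosen only for illustration.

```r
# Sketch only: assumes pdftools + png are installed and that
# file.path(R.home("doc"), "NEWS.pdf") exists on this R installation.
library(pdftools)

pdf_file <- file.path(R.home("doc"), "NEWS.pdf")

# Extract text: a character vector with one string per page
txt <- pdf_text(pdf_file)
cat("Pages:", length(txt), "\n")

# Document metadata as a list (page count, PDF version, dates, ...)
info <- pdf_info(pdf_file)
print(info$pages)

# Fonts used in the document, returned as a data frame
fonts <- pdf_fonts(pdf_file)
print(head(fonts))

# Render page 1 (pages are 1-based) at 72 dpi; numeric = TRUE returns
# real values in [0, 1] laid out as height x width x channels
bitmap <- pdf_render_page(pdf_file, page = 1, dpi = 72, numeric = TRUE)
print(dim(bitmap))

# Save the rendered page as PNG via the suggested 'png' package
out_png <- tempfile(fileext = ".png")
png::writePNG(bitmap, out_png)
```

The numeric form is used here because png::writePNG expects values in the 0 to 1 range; the default raw bitmap is better suited to further processing within R.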


News

1.0

  • Add workaround for poppler landscape truncation bug (fixes #7)

0.5

  • Rebuild poppler on Windows to support PDF rendering

0.4

  • Update Homebrew URL in configure script
  • Fix autobrew (rename libopenjpeg -> libopenjp2)
  • Update libpoppler to 0.46 for Windows

0.3

  • Update libpoppler to 0.42 for Windows
  • Use the COMPILED_BY variable on Windows to support R 3.3

0.2

  • Switch pdf_render_page to 1-based indexing
  • Fix for red/blue channel mixup in pdf_render_page
  • Update example to use local PDF file

0.1

  • Initial release


install.packages("pdftools")

Version 1.5 by Jeroen Ooms


  • https://ropensci.org/blog/2016/03/01/pdftools-and-jeroen (blog)
  • https://github.com/ropensci/pdftools#readme (devel)
  • https://poppler.freedesktop.org (upstream)


Report a bug at https://github.com/ropensci/pdftools/issues


Browse source code at https://github.com/cran/pdftools


Authors: Jeroen Ooms


Documentation: PDF Manual


License: MIT + file LICENSE


Imports: Rcpp

Suggests: jpeg, png, webp, testthat

Linking to: Rcpp

System requirements: Poppler C++ API: libpoppler-cpp-dev (deb) or poppler-cpp-devel (rpm). The unit tests also require the 'poppler-data' package (rpm/deb).


Imported by crminer, rcoreoa, readtext, textreadr.

Depended on by pdfsearch.

Suggested by goldi, hunspell, magick, tesseract, tm.

