Text Extraction, Rendering and Converting of PDF Documents

Utilities based on 'libpoppler' for extracting text, fonts, attachments and metadata from a PDF file. Also supports high-quality rendering of PDF documents into PNG, JPEG, or TIFF format, or into raw bitmap vectors for further processing in R.
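A minimal sketch of the extraction and rendering features described above, assuming the pdftools package is installed. The sample PDF is generated on the fly with base R graphics purely for illustration; the file and variable names are not part of the package.

```r
library(pdftools)

# Create a small one-page PDF so the example does not depend on external files.
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file)
plot(1:10, main = "Hello pdftools")
invisible(grDevices::dev.off())

info  <- pdf_info(pdf_file)         # metadata list, including the page count
text  <- pdf_text(pdf_file)         # character vector, one element per page
fonts <- pdf_fonts(pdf_file)        # data frame describing the fonts used
files <- pdf_attachments(pdf_file)  # list of embedded files (expected to be empty here)

# Render the first page into a raw bitmap array for further processing in R.
bitmap <- pdf_render_page(pdf_file, page = 1, dpi = 72)
dim(bitmap)
```

The bitmap can then be passed to an image package such as png or jpeg (both listed under Suggests) to write it to disk.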



  • Run configure script in bash


  • Change autobrew script to avoid dependency on xQuartz


  • pdf_render_page() and pdf_convert() gain an 'antialias' argument to control anti-aliasing
  • Small tweaks in pdf_text() for dealing with malformed PDF files
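As an illustration of the 'antialias' argument mentioned above, the following sketch renders the same page with and without anti-aliasing. It assumes the argument accepts TRUE or FALSE; the sample PDF and variable names are made up for the example.

```r
library(pdftools)

# Generate a throwaway one-page PDF to render.
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file)
plot(1:10, type = "l", main = "Antialias demo")
invisible(grDevices::dev.off())

# Same page and resolution, with and without anti-aliasing.
smooth <- pdf_render_page(pdf_file, page = 1, dpi = 72, antialias = TRUE)
crisp  <- pdf_render_page(pdf_file, page = 1, dpi = 72, antialias = FALSE)

# Both bitmaps have the same dimensions; only the pixel values may differ.
identical(dim(smooth), dim(crisp))
```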


  • On Windows and macOS we now bundle poppler-data to support non-Latin text
  • Windows: Upgrade libpoppler to 0.61.0 from rwinlib
  • Windows: patch libpoppler bug that would cause pdf_convert() to generate corrupt files
  • PDF rendering errors are relayed via message() instead of warning()


  • Hide symbols on supported platforms
  • Upgrade libpoppler on Windows


  • Improve support for reading password-protected and encrypted PDF files (+ unit tests)
  • Support direct conversion from pdf to png, jpeg, tiff (+ unit tests)
  • Switch to Rcpp automatic symbol registration
  • Tweak autobrew script for legacy Mavericks builds
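A hedged sketch of the direct conversion mentioned above, assuming the installed libpoppler was built with PNG support. The sample PDF is generated only for illustration, and "protected.pdf" and its password are hypothetical placeholders.

```r
library(pdftools)

# Generate a throwaway one-page PDF to convert.
pdf_file <- tempfile(fileext = ".pdf")
grDevices::pdf(pdf_file)
plot(1:10, main = "Convert demo")
invisible(grDevices::dev.off())

# pdf_convert() writes its output files to the working directory by default,
# so switch to a temporary directory for the duration of the call.
old_wd <- setwd(tempdir())
out <- pdf_convert(pdf_file, format = "png", pages = 1, dpi = 72)
setwd(old_wd)

# List the PNG files that were produced.
png_files <- list.files(tempdir(), pattern = "\\.png$")
png_files

# Hypothetical: for a password-protected file, supply the user password via 'upw'.
# text <- pdf_text("protected.pdf", upw = "secret")
```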


  • Fix autobrew for OSX Mavericks


  • Extract autobrew script to separate repo


  • Add workaround for poppler landscape truncation bug (fixes #7)


  • Rebuild poppler on Windows to support PDF rendering


  • Update Homebrew URL in configure script.
  • Fix autobrew (rename libopenjpeg -> libopenjp2)
  • Update libpoppler 0.46 for Windows


  • Update libpoppler 0.42 for Windows
  • Use the COMPILED_BY variable on Windows to support R 3.3


  • Switch pdf_render_page to 1 based indexing
  • Fix for red/blue channel mixup in pdf_render_page
  • Update example to use local PDF file


  • Initial release



1.8 by Jeroen Ooms, 6 months ago

https://ropensci.org/blog/2016/03/01/pdftools-and-jeroen (blog) https://github.com/ropensci/pdftools#readme (devel) https://poppler.freedesktop.org (upstream)

Report a bug at https://github.com/ropensci/pdftools/issues

Browse source code at https://github.com/cran/pdftools

Authors: Jeroen Ooms [aut, cre]

Documentation: PDF Manual

MIT + file LICENSE license

Imports Rcpp

Suggests jpeg, png, webp, testthat

Linking to Rcpp

System requirements: Poppler C++ API: libpoppler-cpp-dev (deb) or poppler-cpp-devel (rpm). The unit tests also require the 'poppler-data' package (rpm/deb)

Imported by crminer, findR, fulltext, pdfsearch, rcoreoa, readtext, tesseract, textreadr.

Suggested by goldi, gridGraphics, hunspell, magick, spelling, staplr, tm.
