Extract Text from Microsoft Word Documents

Wraps the 'AntiWord' utility to extract text from Microsoft Word documents. The utility supports only the old binary 'doc' format, not the newer XML-based 'docx' format. Use the 'xml2' package to read the latter.
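
The split between the two formats can be sketched in R as follows. This is a minimal illustration, not part of the package: the helper name read_word_text() is hypothetical, and the 'docx' branch assumes a standard file whose body text is stored in word/document.xml with the usual 'w' namespace prefix declared on the root element.

```r
# Hypothetical helper (not provided by antiword): choose a reader by extension.
read_word_text <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext == "doc") {
    # Old binary format: hand the file to the antiword wrapper.
    antiword::antiword(path)
  } else if (ext == "docx") {
    # A docx file is a zip archive; extract only the main document part.
    tmp <- tempfile()
    dir.create(tmp)
    on.exit(unlink(tmp, recursive = TRUE))
    utils::unzip(path, files = "word/document.xml", exdir = tmp)
    doc <- xml2::read_xml(file.path(tmp, "word", "document.xml"))
    # Each <w:p> element is one paragraph; xml_text() joins the text nodes inside it.
    paragraphs <- xml2::xml_find_all(doc, "//w:p")
    paste(xml2::xml_text(paragraphs), collapse = "\n")
  } else {
    stop("Unsupported extension: ", ext)
  }
}
```

Calling read_word_text("report.doc") would go through antiword, while read_word_text("report.docx") would go through xml2; both file names are placeholders. The 'docx' branch returns one line per paragraph and ignores headers, footers, and footnotes, which live in separate parts of the archive.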



  • Windows: shQuote() path to file to make it work for paths with spaces
  • Capture error messages sent to stderr() by antiword
  • Simplify build structure a bit
  • Fix UBSAN error

1.1 by Jeroen Ooms, a year ago

https://github.com/ropensci/antiword#readme (devel) http://www.winfield.demon.nl (upstream)

Report a bug at http://github.com/ropensci/antiword/issues

Browse source code at https://github.com/cran/antiword

Authors: Jeroen Ooms [aut, cre], Adri van Os [cph] (Author 'antiword' utility)

GPL-2 license

Imports sys

Imported by readtext, textreadr.

Suggested by tm.

