Extract Text from Microsoft Word Documents

Wraps the 'AntiWord' utility to extract text from Microsoft Word documents. The utility supports only the legacy 'doc' format, not the newer XML-based 'docx' format; use the 'xml2' package to read the latter.
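As a rough illustration of the format restriction described above, the sketch below wraps the package's antiword() function in a small guard that rejects anything other than a 'doc' file. The helper name extract_doc_text and the file name report.doc are hypothetical and not part of the package; the sketch assumes antiword() accepts a file path as its first argument and returns the extracted text.

```r
# Hypothetical helper (not part of the antiword package): refuse files that
# the underlying utility cannot read before handing the path to antiword().
extract_doc_text <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (ext != "doc") {
    stop("antiword only reads the legacy 'doc' format, got: '", ext, "'")
  }
  if (!requireNamespace("antiword", quietly = TRUE)) {
    stop("Package 'antiword' is not installed")
  }
  # Assumption: antiword() takes the file path and returns the document text.
  antiword::antiword(path)
}

# Example call, guarded so the script still runs when the file is absent.
# "report.doc" is a placeholder file name.
if (file.exists("report.doc")) {
  cat(extract_doc_text("report.doc"))
}
```

Checking the extension first gives a clear error for 'docx' input instead of relying on whatever message the utility itself produces.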


News

1.1

  • Windows: shQuote() path to file to make it work for paths with spaces
  • Capture error messages sent to stderr() by antiword
  • Simplify build structure a bit
  • Fix UBSAN error


install.packages("antiword")

1.1 by Jeroen Ooms, a year ago


https://github.com/ropensci/antiword#readme (devel) http://www.winfield.demon.nl (upstream)


Report a bug at http://github.com/ropensci/antiword/issues


Browse source code at https://github.com/cran/antiword


Authors: Jeroen Ooms [aut, cre], Adri van Os [cph] (Author 'antiword' utility)


Documentation: PDF Manual


GPL-2 license


Imports sys.


Imported by readtext, textreadr.

Suggested by tm.

