A Tidy Data Model for Natural Language Processing

Provides a set of fast tools for converting a textual corpus into a set of normalized tables. Users may make use of a Python back end with 'spaCy' <https://spacy.io> or the Java back end 'CoreNLP' <http://stanfordnlp.github.io/CoreNLP/>. A minimal back end with no external dependencies is also provided. Exposed annotation tasks include tokenization, part-of-speech tagging, named entity recognition, entity linking, sentiment analysis, dependency parsing, coreference resolution, and word embeddings. Summary statistics on token unigram, part-of-speech tag, and dependency-type frequencies are also included to assist with analyses.


R package providing annotators and a tidy data model for natural language processing

News

Reference manual


install.packages("cleanNLP")
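After installation, a minimal end-to-end annotation might look like the sketch below. It uses the dependency-free tokenizers back end described above; the function names (`init_tokenizers`, `run_annotators`, `get_token`) follow the package's 1.x API and are assumptions that may differ in other versions.

```r
library(cleanNLP)

# initialize the minimal back end with no external dependencies
init_tokenizers()

# annotate a small in-memory corpus; as_strings = TRUE treats the
# input vector as raw text rather than file paths
anno <- run_annotators("The quick brown fox jumps over the lazy dog.",
                       as_strings = TRUE)

# extract the normalized token table (one row per token)
tokens <- get_token(anno)
head(tokens)
```

The same annotation object can then be passed to the other `get_*` accessors to pull out the remaining normalized tables, which join naturally with dplyr.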

1.10.0 by Taylor B. Arnold, 3 months ago


https://statsmaths.github.io/cleanNLP/


Report a bug at http://github.com/statsmaths/cleanNLP/issues


Browse source code at https://github.com/cran/cleanNLP


Authors: Taylor B. Arnold [aut, cre]


Documentation:   PDF Manual  


LGPL-2 license


Imports dplyr, readr, Matrix, stringi, stats, methods, utils

Suggests reticulate, rJava, tokenizers, RCurl, knitr, rmarkdown, testthat, covr

System requirements: Python (>= 2.7.0); spaCy <https://spacy.io/> (>= 1.8); Java (>= 7.0); Stanford CoreNLP <http://nlp.stanford.edu/software/corenlp.shtml> (>= 3.7.0)
