Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

This natural language processing toolkit provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization' and 'dependency parsing' of raw text. Besides text parsing, the package also allows you to train annotation models based on 'treebanks' in 'CoNLL-U' format, as described at <http://universaldependencies.org/format.html>. The techniques are explained in detail in the paper 'Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe'.
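To give a feel for the 'CoNLL-U' format mentioned above, here is a minimal sketch in base R that reads a small hand-written CoNLL-U snippet into a data.frame with one row per token. The ten column names come from the CoNLL-U format itself. The helper `parse_conllu` and the sample sentence are hypothetical and written only for illustration; this is not the package's own reader, and it ignores details such as multiword tokens and sentence boundaries.

```r
# Column names defined by the CoNLL-U format (10 tab-separated fields per token line).
conllu_cols <- c("ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")

# Hypothetical helper (not part of udpipe): parse CoNLL-U text into a data.frame.
parse_conllu <- function(text) {
  lines <- unlist(strsplit(text, "\n", fixed = TRUE))
  # Drop blank lines (sentence separators) and comment lines starting with '#'.
  lines <- lines[nzchar(trimws(lines)) & !startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  # Every token line is expected to carry exactly 10 fields.
  stopifnot(all(lengths(fields) == length(conllu_cols)))
  out <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
  names(out) <- conllu_cols
  out
}

# Hand-written sample sentence, assumed for illustration only.
sample_text <- paste(
  "# text = Dogs bark.",
  "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
  "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No",
  "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
  sep = "\n"
)

tokens <- parse_conllu(sample_text)
print(tokens[, c("ID", "FORM", "LEMMA", "UPOS", "HEAD", "DEPREL")])
```

Each row shows the outputs the toolkit is described as producing: the token (FORM), its lemma (LEMMA), its part of speech tag (UPOS) and its dependency relation to a head token (HEAD, DEPREL).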


install.packages("udpipe")

Version 0.2 by Jan Wijffels, 10 days ago


https://github.com/bnosac/udpipe


Browse source code at https://github.com/cran/udpipe


Authors: Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [cph], Jana Straková [cph]




MPL-2.0 license


Imports: Rcpp, data.table, Matrix

Suggests: knitr, topicmodels

Linking to: Rcpp

System requirements: C++11

