Wraps the 'CRFsuite' library < https://github.com/chokkan/crfsuite> allowing users to fit a Conditional Random Field model and to apply it on existing data. The focus of the implementation is in the area of Natural Language Processing where this R package allows you to easily build and apply models for named entity recognition, text chunking, part of speech tagging, intent recognition or classification of any category you have in mind. Next to training, a small web application is included in the package to allow you to easily construct training data.
This repository contains an R package which wraps the CRFsuite C/C++ library (https://github.com/chokkan/crfsuite), allowing the following:
For users unfamiliar with Conditional Random Field (CRF) models, you can read this excellent tutorial http://homepages.inf.ed.ac.uk/csutton/publications/crftut-fnt.pdf
devtools::install_github("bnosac/crfsuite", build_vignettes = TRUE)
For detailed documentation on how to build your own CRF tagger for doing NER / Chunking. Look to the vignette.
library(crfsuite)vignette("crfsuite-nlp", package = "crfsuite")
library(crfsuite)## Get example training data + enrich with token and part of speech 2 words before/after each tokenx <- ner_download_modeldata("conll2002-nl")x <- crf_cbind_attributes(x, terms = c("token", "pos"), by = c("doc_id", "sentence_id"),from = -2, to = 2, ngram_max = 3, sep = "-")## Split in train/test setcrf_train <- subset(x, data == "ned.train")crf_test <- subset(x, data == "testa")## Build the crf modelattributes <- grep("token|pos", colnames(x), value=TRUE)model <- crf(y = crf_train$label,x = crf_train[, attributes],group = crf_train$doc_id,method = "lbfgs", options = list(max_iterations = 25, feature.minfreq = 5, c1 = 0, c2 = 1))model## Use the model to score on existing tokenised datascores <- predict(model, newdata = crf_test[, attributes], group = crf_test$doc_id)table(scores$label)B-LOC B-MISC B-ORG B-PER I-LOC I-MISC I-ORG I-PER O261 211 182 693 24 205 209 605 35297
The package itself does not contain any models to do NER or Chunking. It's a package which facilitates creating your own CRF model for doing Named Entity Recognition or Chunking on your own data with your own categories.
In order to facilitate creating training data on your own data, a shiny app is made available in this R package which allows you to easily tag your own chunks of text, with your own categories. More details can be found in the vignette
vignette("crfsuite-nlp", package = "crfsuite").
Need support in text mining? Contact BNOSAC: http://www.bnosac.be