A Stemming Algorithm for the Portuguese Language

Implements the "Stemming Algorithm for the Portuguese Language".


This package implements the algorithm described in the article "A Stemming Algorithm for the Portuguese Language" by Viviane Moreira Orengo and Christian Huyck.

The idea of the stemmer is summarized by a schema of its processing steps (figure not shown).
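As a rough illustration of how one step of a rule-based suffix stripper can work, the sketch below applies a small rule table in base R. Each rule has a suffix, a minimum stem length, a replacement, and a list of exceptions. The function name apply_step, the list layout, and the rule values are assumptions made for this example; they are not the package's internal code or its actual rule tables.

```r
# Toy rule table, ordered longest suffix first.
# Illustrative values only, not the rules shipped with rslp.
plural_rules <- list(
  list(suffix = "ns", min_stem = 1, replacement = "m", exceptions = character(0)),
  list(suffix = "s",  min_stem = 2, replacement = "",  exceptions = c("mais", "mas"))
)

# Apply the first rule that matches; return the word unchanged otherwise.
apply_step <- function(word, rules) {
  for (rule in rules) {
    if (!endsWith(word, rule$suffix)) next      # suffix does not match
    if (word %in% rule$exceptions) next         # word is protected
    stem <- substr(word, 1, nchar(word) - nchar(rule$suffix))
    if (nchar(stem) < rule$min_stem) next       # remaining stem too short
    return(paste0(stem, rule$replacement))
  }
  word
}

vapply(c("bons", "casas", "mais"), apply_step, character(1),
       rules = plural_rules, USE.NAMES = FALSE)
#> [1] "bom"  "casa" "mais"
```

A full stemmer of this kind would chain several such steps (for example plural, feminine, and verb reductions), each with its own rule table.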

To install the package from GitHub, run:

devtools::install_github("dfalbel/rslp")

The main function of the package is rslp. Call it on a character vector like this:

library(rslp)
words <- c("balões", "aviões", "avião", "gostou", "gosto", "gostaram")
rslp(words)

To stem every word in a vector of texts, use the rslp_doc function.

docs <- c(
  "coma frutas pois elas fazem bem para a saúde.",
  "não coma doces, eles fazem mal para os dentes."
  )
rslp_doc(docs)
#> [1] "com frut poi ela faz bem par a saud." 
#> [2] "nao com doc, ele faz mal par os dent."
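One way to picture document-level stemming is to find each word, stem it, and write it back in place so that punctuation and spacing are preserved. The base R sketch below does this for any word-level stemmer. The names stem_doc_sketch and toy_stem are hypothetical, and this is not necessarily how rslp_doc is implemented.

```r
# Replace every alphabetic token in each document with its stemmed form.
# stem_fun must take a character vector and return one of the same length.
stem_doc_sketch <- function(docs, stem_fun) {
  m <- gregexpr("[[:alpha:]]+", docs)
  regmatches(docs, m) <- lapply(regmatches(docs, m), stem_fun)
  docs
}

# Stand-in stemmer for the example: strips a single trailing "s".
toy_stem <- function(words) sub("s$", "", words)

stem_doc_sketch("coma frutas pois elas fazem bem.", toy_stem)
#> [1] "coma fruta poi ela fazem bem."
```

With the package installed, rslp could be passed as stem_fun in place of the toy stemmer, since it takes and returns a character vector.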

News

rslp 0.1.0

  • rslp now returns a character vector without names
  • fixed a bug related to strigr_sub, which wasn't returning the expected result
  • fixed a bug in feminine reduction that removed the final "a" from words
  • added a new rule to the noun reduction step rules ("ático" -> "")
  • corrected the noun reduction rule "atoria" -> "atória"
  • added a rule to remove the suffix "irá" in verb reduction
  • added many tests

rslp 0.0.1

  • first CRAN release


install.packages("rslp")

0.1.0 by Daniel Falbel, a year ago


https://github.com/dfalbel/rslp


Browse source code at https://github.com/cran/rslp


Authors: Daniel Falbel




MIT + file LICENSE license


Imports stringr, stringi, plyr, magrittr

Suggests dplyr, testthat, covr


Imported by ptstem.

