Statistical Analysis of Textual Data

Provides a set of functions devoted to multivariate exploratory statistics on textual data. Classical methods such as correspondence analysis and agglomerative hierarchical clustering are available. Chronologically constrained agglomerative hierarchical clustering, enriched with trees labelled by words, is also offered. Given a division of the corpus into parts, the characteristic words and documents of each part are identified. Functions from 'FactoMineR' are also easy to access, and two of them are particularly relevant in the textual domain. MFA() handles multiple lexical tables, which allows applications such as analyzing multilingual corpora or simultaneously analyzing open-ended and closed questions in surveys. CaGalt() helps to explore the relationships between lexical choices and contextual variables. See <> for examples.
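The listing does not say how characteristic words are selected. A common approach in textual statistics, assumed here, treats a part of n tokens as a random draw without replacement from a corpus of N tokens. It then uses the hypergeometric distribution to ask whether a word's count in that part is surprisingly high or low. The Python sketch below only illustrates that idea. It is not Xplortext code (the package is written in R), and `hypergeom_pmf`, `characteristic_words` and the `alpha` threshold are hypothetical names chosen for this example.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(X = k): probability that a part of n tokens, drawn at random without
    replacement from a corpus of N tokens containing K occurrences of a word,
    contains exactly k occurrences of that word."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def characteristic_words(part_counts, corpus_counts, alpha=0.05):
    """Flag words over- (+) or under-represented (-) in one part of a corpus.

    part_counts:   dict word -> occurrences in the part
    corpus_counts: dict word -> occurrences in the whole corpus (part included)
    Returns (word, observed, expected, p_value, sign) tuples, smallest p first.
    """
    N = sum(corpus_counts.values())  # corpus size in tokens
    n = sum(part_counts.values())    # part size in tokens
    results = []
    for word, K in corpus_counts.items():
        k = part_counts.get(word, 0)
        expected = n * K / N
        # Support of the hypergeometric distribution for this word.
        k_min, k_max = max(0, n - (N - K)), min(n, K)
        if k > expected:
            # One-sided p-value P(X >= k): more occurrences than chance predicts.
            p = sum(hypergeom_pmf(x, N, K, n) for x in range(k, k_max + 1))
            sign = "+"
        elif k < expected:
            # One-sided p-value P(X <= k): fewer occurrences than chance predicts.
            p = sum(hypergeom_pmf(x, N, K, n) for x in range(k_min, k + 1))
            sign = "-"
        else:
            continue  # observed count equals expectation: nothing to report
        if p < alpha:
            results.append((word, k, expected, p, sign))
    return sorted(results, key=lambda r: r[3])


if __name__ == "__main__":
    # Toy corpus: each word occurs 10 times; the part has 10 tokens.
    corpus = {"war": 10, "peace": 10}
    part = {"war": 8, "peace": 2}
    for row in characteristic_words(part, corpus):
        print(row)
```

In the toy example, `war` is flagged `+` and `peace` is flagged `-`. Each has a p-value of 2126/184756, which is about 0.0115. A real analysis over a full vocabulary would also have to take into account that many words are tested at once.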




Version 1.1.0 by Ramón Alvarez-Esteban


Authors: Monica Bécue-Bertaut, Ramón Alvarez-Esteban, Josep-Anton Sánchez-Espigares

Documentation: PDF Manual

GPL (>= 2.0) license

Imports: tm, stringr, slam, stats, graphics, gridExtra, utils

Depends on FactoMineR, ggplot2
