METACRAN search results

Tools for Accessing the Botanical Information and Ecology Network Database

Provides tools for accessing the Botanical Information and Ecology Network (BIEN) database. The BIEN database contains cleaned and standardized botanical data, including occurrence, trait, plot and taxonomic data (see <https://bien.nceas.ucsb.edu/bien/> for more information). This package provides functions that query the BIEN database by constructing and executing optimized SQL queries.

autodb — by Mark Webster, a month ago

Automatic Database Normalisation for Data Frames

Automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Usage to design your actual database for you is not advised.) Originally inspired by the 'AutoNormalize' library for 'Python' by 'Alteryx' (<https://github.com/alteryx/autonormalize>), with various changes and improvements. Automatic discovery of functional or approximate dependencies, normalisation based on those, and plotting of the resulting "database" via 'Graphviz', with options to exclude some attributes at discovery time, or remove discovered dependencies at normalisation time.

https://charnelmouse.github.io/autodb/, https://github.com/CharnelMouse/autodb
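As a rough illustration of what "discovery of functional dependencies" means, the sketch below checks which single columns determine which others in a small table. It is plain Python, not the autodb API; the function names `holds` and `single_column_dependencies` and the sample rows are hypothetical, and only single-column determinants are considered.

```python
from itertools import permutations


def holds(rows, lhs, rhs):
    """Return True if column `lhs` functionally determines column `rhs`."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault stores the first value seen for this key;
        # any later, different value breaks the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


def single_column_dependencies(rows):
    """List every (lhs, rhs) pair of distinct columns where lhs -> rhs holds."""
    columns = list(rows[0])
    return [(a, b) for a, b in permutations(columns, 2) if holds(rows, a, b)]


# Hypothetical sample data: each customer always lives in the same city.
rows = [
    {"order": 1, "customer": "ann", "city": "Leeds"},
    {"order": 2, "customer": "bob", "city": "York"},
    {"order": 3, "customer": "ann", "city": "Leeds"},
]
deps = single_column_dependencies(rows)
```

A dependency such as customer -> city is the kind of fact a normaliser would use to move city into a separate customer table.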

PhoneValidator — by Onur Ozturk, 3 months ago

Client for 'GenderAPI.io' Phone Number Validation and Formatter API

Provides an interface to the 'GenderAPI.io' Phone Number Validation & Formatter API (<https://www.genderapi.io>) for validating international phone numbers, detecting number type (mobile, landline, Voice over Internet Protocol (VoIP)), retrieving region and country metadata, and formatting numbers to E.164 or national format. Designed to simplify integration into R workflows for data validation, Customer Relationship Management (CRM) data cleaning, and analytics tasks. Full documentation is available at <https://www.genderapi.io/docs-phone-validation-formatter-api>.

https://github.com/GenderAPI/PhoneValidator-R

sampleVADIR — by Trevor Swanson, 4 years ago

Draw Stratified Samples from the VADIR Database

Lets researchers draw stratified samples from the U.S. Department of Veterans Affairs/Department of Defense Identity Repository (VADIR) database according to a variety of population characteristics. The VADIR database contains information for all veterans who were separated from the military after 1980. The central utility of the package is to integrate data cleaning and formatting for the VADIR database with the stratification methods described by Mahto (2019) <https://CRAN.R-project.org/package=splitstackshape>. Data from VADIR are not provided as part of this package.

https://github.com/tswanson222/sampleVADIR

starschemar — by Jose Samos, 2 years ago

Obtaining Stars from Flat Tables

Data in multidimensional systems is obtained from operational systems and is transformed to adapt it to the new structure. Frequently, the operations to be performed aim to transform a flat table into a star schema. Transformations can be carried out using professional extract, transform and load tools or tools intended for data transformation for end users. With the tools mentioned, this transformation can be carried out, but it requires a lot of work. The main objective of this package is to define transformations that allow obtaining stars from flat tables easily. In addition, it includes basic data cleaning, dimension enrichment, incremental data refresh and query operations, adapted to this context.

https://josesamos.github.io/starschemar/, https://github.com/josesamos/starschemar
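To make the flat-table-to-star transformation concrete, here is a minimal sketch in plain Python. It is not the starschemar API: the function `split_star`, the column names, and the sample rows are all hypothetical, and only one dimension is extracted.

```python
def split_star(flat_rows, dimension_columns, measure_columns):
    """Split a flat table into one dimension table and one fact table.

    Each distinct combination of dimension attributes gets a surrogate key;
    fact rows keep the measures plus a reference to that key.
    """
    keys = {}   # attribute tuple -> surrogate key
    facts = []
    for row in flat_rows:
        attrs = tuple(row[c] for c in dimension_columns)
        key = keys.setdefault(attrs, len(keys) + 1)
        fact = {"dim_key": key}
        fact.update({c: row[c] for c in measure_columns})
        facts.append(fact)
    dimension = [
        {"dim_key": key, **dict(zip(dimension_columns, attrs))}
        for attrs, key in keys.items()
    ]
    return dimension, facts


# Hypothetical flat table: product attributes repeated on every sales row.
flat = [
    {"product": "tea", "category": "drinks", "units": 3},
    {"product": "tea", "category": "drinks", "units": 5},
    {"product": "rice", "category": "food", "units": 2},
]
dimension, facts = split_star(flat, ["product", "category"], ["units"])
```

The repeated product attributes end up stored once in `dimension`, while `facts` holds one row per original record.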

gen5helper — by Yanxian Lin, 6 years ago

Processing 'Gen5' 2.06 Exported Data

A collection of functions for processing 'Gen5' 2.06 exported data. 'Gen5' is the data analysis software for BioTek plate readers <https://www.biotek.com/products/software-robotics-software/gen5-microplate-reader-and-imager-software/>. This package contains functions for data cleaning, modeling and plotting using exported data from 'Gen5' version 2.06. It exports technically correct data, as defined by Edwin de Jonge and Mark van der Loo (2013) <https://cran.r-project.org/doc/contrib/de_Jonge+van_der_Loo-Introduction_to_data_cleaning_with_R.pdf>, for customized analysis. It contains Boltzmann fitting for general kinetic analysis. See <https://www.github.com/yanxianUCSB/gen5helper> for more information, documentation and examples.

shinymrp — by Toan Tran, 11 days ago

Interface for Multilevel Regression and Poststratification

Dual interfaces, graphical and programmatic, designed for intuitive applications of Multilevel Regression and Poststratification (MRP). Users can apply the method to a variety of datasets, from electronic health records to sample survey data, through an end-to-end Bayesian data analysis workflow. The package provides robust tools for data cleaning, exploratory analysis, flexible model building, and insightful result visualization. For more details, see Si et al. (2020) <https://www150.statcan.gc.ca/n1/en/pub/12-001-x/2020002/article/00003-eng.pdf?st=iF1_Fbrh> and Si (2025).

https://mrp-interface.github.io/shinymrp/

wordpredictor — by Nadir Latif, a year ago

Develop Text Prediction Models Based on N-Grams

A framework for developing n-gram models for text prediction. It provides data cleaning, data sampling, extracting tokens from text, model generation, model evaluation and word prediction. For information on how n-gram models work we referred to "Speech and Language Processing" <https://web.archive.org/web/20240919222934/https%3A%2F%2Fweb.stanford.edu%2F~jurafsky%2Fslp3%2F3.pdf>. For optimizing R code and using R6 classes we referred to "Advanced R" <https://adv-r.hadley.nz/r6.html>. For writing R extensions we referred to "R Packages" <https://r-pkgs.org/index.html>.

https://github.com/pakjiddat/word-predictor, https://pakjiddat.github.io/word-predictor/
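The core of an n-gram predictor can be sketched in a few lines. The example below is a bigram (n = 2) model in plain Python, not the wordpredictor API; `train_bigrams`, `predict_next`, and the training sentence are hypothetical, and tokenisation is a bare whitespace split with no cleaning or smoothing.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often every other word follows it."""
    tokens = text.lower().split()
    model = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        model[prev][nxt] += 1
    return model


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train_bigrams("the cat sat on the mat the cat ran")
suggestion = predict_next(model, "the")   # "cat" follows "the" twice, "mat" once
```

Longer n-grams follow the same pattern, with a tuple of the previous n-1 words as the lookup key.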

matchmaker — by Zhian N. Kamvar, 6 years ago

Flexible Dictionary-Based Cleaning

Provides flexible dictionary-based cleaning that allows users to specify implicit and explicit missing data, regular expressions for both data and columns, and global matches, while respecting ordering of factors. This package is part of the 'RECON' (<https://www.repidemicsconsortium.org/>) toolkit for outbreak analysis.

https://www.repidemicsconsortium.org/matchmaker, https://github.com/reconhub/matchmaker
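Dictionary-based cleaning amounts to looking each raw value up in a table of known spellings. The sketch below shows the idea in plain Python, including an explicit entry for missing values; it is not the matchmaker API, and `clean_values`, its `default` parameter, and the sample dictionary are hypothetical.

```python
def clean_values(values, dictionary, default=None):
    """Replace each value with its dictionary entry.

    Values absent from the dictionary are kept as-is unless `default`
    is given, in which case they are replaced by `default`.
    """
    cleaned = []
    for value in values:
        if value in dictionary:
            cleaned.append(dictionary[value])
        elif default is not None:
            cleaned.append(default)
        else:
            cleaned.append(value)
    return cleaned


# Hypothetical dictionary: variant spellings plus an explicit rule for missing data.
yes_no = {"y": "yes", "Y": "yes", "n": "no", None: "unknown"}

raw = ["y", "Y", "n", None, "maybe"]
kept = clean_values(raw, yes_no)                     # unmatched values pass through
forced = clean_values(raw, yes_no, default="other")  # unmatched values get a default
```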

LLMAgentR — by Kwadwo Daddy Nyame Owusu Boakye, 7 months ago

Language Model Agents in R for AI Workflows and Research

Provides modular, graph-based agents powered by large language models (LLMs) for intelligent task execution in R. Supports structured workflows for tasks such as forecasting, data visualization, feature engineering, data wrangling, data cleaning, 'SQL', code generation, weather reporting, and research-driven question answering. Each agent performs iterative reasoning: recommending steps, generating R code, executing, debugging, and explaining results. Includes built-in support for packages such as 'tidymodels', 'modeltime', 'plotly', 'ggplot2', and 'prophet'. Designed for analysts, developers, and teams building intelligent, reproducible AI workflows in R. Compatible with LLM providers such as 'OpenAI', 'Anthropic', 'Groq', and 'Ollama'. Inspired by the Python package 'langagent'.

https://github.com/knowusuboaky/LLMAgentR, https://knowusuboaky.github.io/LLMAgentR/
