METACRAN search results

Simple Tools for Examining and Cleaning Dirty Data

The main janitor functions can: perfectly format data.frame column names; provide quick counts of variable combinations (i.e., frequency tables and crosstabs); and explore duplicate records. Other janitor functions nicely format the tabulation results. These tabulate-and-report functions approximate popular features of SPSS and Microsoft Excel. This package follows the principles of the "tidyverse" and works well with the pipe function %>%. janitor was built with beginning-to-intermediate R users in mind and is optimized for user-friendliness.

https://github.com/sfirke/janitor, https://sfirke.github.io/janitor/

cleaner — by Matthijs S. Berends, 7 months ago

Fast and Easy Data Cleaning

Data cleaning functions for classes logical, factor, numeric, character, currency and Date to make data cleaning fast and easy. Relying on very few dependencies, it provides smart guessing, but with user options to override anything if needed.

https://msberends.github.io/cleaner/, https://github.com/msberends/cleaner

editrules — by Edwin de Jonge, a year ago

Parsing, Applying, and Manipulating Data Cleaning Rules

Please note: active development has moved to packages 'validate' and 'errorlocate'. Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like) format. Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rule dependencies can be visualized using the 'igraph' package.

https://github.com/data-cleaning/editrules

DataClean — by Xiaorui(Jeremy) Zhu, 9 years ago

Data Cleaning

Includes functions that researchers or practitioners may use to clean raw data, converting html, xlsx, and txt data files into other formats. It can also be used to manipulate text variables, extract numeric variables from text variables, and perform other variable cleaning processes. It originated from an author's project that focuses on creative performance in online education environments. The resulting paper of that study will be published soon.

bdc — by Bruno Ribeiro, 7 months ago

Biodiversity Data Cleaning

It brings together several aspects of biodiversity data-cleaning in one place. 'bdc' is organized in thematic modules related to different biodiversity dimensions, including 1) Merge datasets: standardization and integration of different datasets; 2) Pre-filter: flagging and removal of invalid or non-interpretable information, followed by data amendments; 3) Taxonomy: cleaning, parsing, and harmonization of scientific names from several taxonomic groups against taxonomic databases locally stored through the application of exact and partial matching algorithms; 4) Space: flagging of erroneous, suspect, and low-precision geographic coordinates; and 5) Time: flagging and, whenever possible, correction of inconsistent collection dates. In addition, it contains features to visualize, document, and report data quality, which is essential for making data quality assessment transparent and reproducible. The reference for the methodology is Bruno et al. (2022).

https://brunobrr.github.io/bdc/ (website) https://github.com/brunobrr/bdc

BeeBDC — by James B. Dorey, 8 months ago

Occurrence Data Cleaning

Flags and checks occurrence data that are in Darwin Core format. The package includes generic functions and data as well as some that are specific to bees. This package is meant to build upon and be complementary to other excellent occurrence cleaning packages, including 'bdc' and 'CoordinateCleaner'. This package uses datasets from several sources and particularly from the Discover Life Website, created by Ascher and Pickering (2020). For further information, please see the original publication and package website. Publication: Dorey et al. (2023). Package website: Dorey et al. (2023) <https://github.com/jbdorey/BeeBDC>.

https://jbdorey.github.io/BeeBDC/ https://github.com/jbdorey/BeeBDC

clean — by Matthijs S. Berends, 5 years ago

Fast and Easy Data Cleaning

A wrapper around the new 'cleaner' package, which provides data cleaning functions for classes 'logical', 'factor', 'numeric', 'character', 'currency' and 'Date' to make data cleaning fast and easy. Relying on very few dependencies, it provides smart guessing, but with user options to override anything if needed.

https://github.com/msberends/cleaner

psycCleaning — by Jason Moy, 2 years ago

Data Cleaning for Psychological Analyses

Useful for preparing and cleaning data. It includes functions to center data, reverse code, dummy code, and effect code data, and more.

https://jasonmoy28.github.io/psycCleaning/

datacleanr — by Alexander Hurley, 2 months ago

Interactive and Reproducible Data Cleaning

Flexible and efficient cleaning of data with interactivity. 'datacleanr' facilitates best practices in data analyses and reproducibility with built-in features and by translating interactive/manual operations to code. The package is designed for interoperability, and so seamlessly fits into reproducible analysis pipelines in 'R'.

https://github.com/the-Hull/datacleanr

arkhe — by Nicolas Frerebeau, 2 months ago

Tools for Cleaning Rectangular Data

A dependency-free collection of simple functions for cleaning rectangular data. This package allows users to detect, count, and replace values or discard rows/columns using a predicate function. In addition, it provides tools to check conditions and return informative error messages.

https://codeberg.org/tesselle/arkhe, https://packages.tesselle.org/arkhe/
