METACRAN search results

editrules — by Edwin de Jonge, a year ago

Parsing, Applying, and Manipulating Data Cleaning Rules

Please note: active development has moved to packages 'validate' and 'errorlocate'. Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like) format. Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rule dependencies can be visualized using the 'igraph' package.

https://github.com/data-cleaning/editrules

dendextend — by Tal Galili, 12 hours ago

Extending 'dendrogram' Functionality in R

Offers a set of functions for extending 'dendrogram' objects in R, letting you visualize and compare trees of 'hierarchical clusterings'. You can (1) Adjust a tree's graphical parameters - the color, size, type, etc. of its branches, nodes and labels. (2) Visually and statistically compare different 'dendrograms' to one another.

https://talgalili.github.io/dendextend/, https://github.com/talgalili/dendextend/, https://cran.r-project.org/package=dendextend, https://www.r-statistics.com/tag/dendextend/, https://doi.org/10.1093/bioinformatics/btv428

rspa — by Mark van der Loo, 3 years ago

Adapt Numerical Records to Fit (in)Equality Restrictions

Minimally adjust the values of numerical records in a data.frame, such that each record satisfies a predefined set of equality and/or inequality constraints. The constraints can be defined using the 'validate' package. The core algorithms have recently been moved to the 'lintools' package; refer to 'lintools' for a more basic interface and access to a version of the algorithm that works with sparse matrices.

https://github.com/markvanderloo/rspa

hashr — by Mark van der Loo, 4 years ago

Hash R Objects to Integers Fast

Apply an adaptation of the SuperFastHash algorithm to any R object. Hash whole R objects or, for vectors or lists, hash R objects to obtain a set of hash values that is stored in a structure equivalent to the input. See <http://www.azillionmonkeys.com/qed/hash.html> for a description of the hash algorithm.

https://github.com/markvanderloo/hashr

accumulate — by Mark van der Loo, 4 months ago

Split-Apply-Combine with Dynamic Groups

Estimate group aggregates, where one can set user-defined conditions that each group of records must satisfy to be suitable for aggregation. If a group of records is not suitable, it is expanded using a collapsing scheme defined by the user. A paper on this package was published in the Journal of Statistical Software.

https://github.com/markvanderloo/accumulate
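
The idea described in the accumulate entry (aggregate each group, and fall back to a coarser grouping when the group fails a user-defined condition) can be illustrated with a small language-neutral sketch. This is not the package's R interface: the function name `accumulate_sketch`, the `scheme` and `min_size` parameters, and the sample records are all hypothetical, and the suitability condition is assumed to be a simple minimum group size.

```python
def accumulate_sketch(records, scheme, value, min_size):
    """Mean of `value` per finest-level group, collapsing to coarser levels.

    `scheme` is a hypothetical collapsing scheme: a list of key tuples
    ordered from finest to coarsest; () means "all records".
    Returns {group: (level_used, mean)}; (None, None) if no level is suitable.
    """
    finest = scheme[0]
    targets = sorted({tuple(r[k] for k in finest) for r in records})
    out = {}
    for target in targets:
        label = dict(zip(finest, target))
        for level, keys in enumerate(scheme):
            # Pool every record that matches the target on this level's keys.
            pool = [r[value] for r in records
                    if all(r[k] == label[k] for k in keys)]
            if len(pool) >= min_size:  # assumed suitability condition
                out[target] = (level, sum(pool) / len(pool))
                break
        else:
            out[target] = (None, None)
    return out


records = [
    {"region": "north", "sector": "retail", "turnover": 10.0},
    {"region": "north", "sector": "retail", "turnover": 14.0},
    {"region": "north", "sector": "retail", "turnover": 12.0},
    {"region": "north", "sector": "mining", "turnover": 30.0},
    {"region": "south", "sector": "mining", "turnover": 50.0},
]
scheme = [("region", "sector"), ("region",), ()]
result = accumulate_sketch(records, scheme, "turnover", min_size=3)
```

In this made-up data, the north/retail group is aggregated at its own level, north/mining collapses to the whole north region, and south/mining collapses to all records.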

synthesizer — by Mark van der Loo, 5 days ago

Fast, Robust, and High-Quality Synthetic Data Generation with a Tuneable Privacy-Utility Trade-Off

Synthesize numeric, categorical, mixed and time series data. Data circumstances including mixed (or zero-inflated) distributions and missing data patterns are reproduced in the synthetic data. A single parameter allows balancing between high-quality synthetic data that represents correlations of the original data and lower quality but more privacy safe synthetic data without correlations. Tuning can be done per variable or for the whole dataset.

https://github.com/markvanderloo/synthesizer

deductive — by Mark van der Loo, 5 months ago

Data Correction and Imputation Using Deductive Methods

Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.

https://github.com/data-cleaning/deductive

dcmodify — by Mark van der Loo, a year ago

Modify Data Using Externally Defined Modification Rules

Data cleaning scripts typically contain a lot of 'if this change that' type of statements. Such statements are typically condensed expert knowledge. With this package, such 'data modifying rules' are taken out of the code and instead become parameters to the workflow. This allows one to maintain, document, and reason about data modification rules as separate entities.

https://github.com/data-cleaning/dcmodify
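
The 'if this change that' pattern from the dcmodify entry can be sketched as rules passed to a generic function rather than hard-coded in the cleaning script. This is only an illustration of the concept, not the package's R interface: `apply_rules`, the rule representation as (condition, change) pairs, and the example record are all hypothetical.

```python
def apply_rules(record, rules):
    """Return a modified copy of `record`.

    Each rule is a (condition, change) pair of functions: when
    condition(record) is true, the dict returned by change(record)
    is merged into the record. Rules are applied in order.
    """
    result = dict(record)
    for condition, change in rules:
        if condition(result):
            result.update(change(result))
    return result


# Hypothetical rules, kept apart from the code that applies them.
rules = [
    # A negative age is impossible: blank it out.
    (lambda r: r["age"] is not None and r["age"] < 0,
     lambda r: {"age": None}),
    # Income reported in thousands of euros: convert to euros.
    (lambda r: r["unit"] == "kEUR",
     lambda r: {"income": r["income"] * 1000, "unit": "EUR"}),
]

raw = {"age": -3, "income": 42, "unit": "kEUR"}
cleaned = apply_rules(raw, rules)
```

Because the rules are plain data handed to `apply_rules`, they can be reviewed, documented, or swapped without touching the function itself.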

deducorrect — by Mark van der Loo, 10 years ago

Deductive Correction, Deductive Imputation, and Deterministic Correction

A collection of methods for automated data cleaning where all actions are logged.

https://github.com/data-cleaning/deducorrect

drat — by Dirk Eddelbuettel, 9 months ago

'Drat' R Archive Template

Creation and use of R Repositories via helper functions to insert packages into a repository, and to add repository information to the current R session. Two primary types of repositories are supported: gh-pages at GitHub, as well as local repositories on either the same machine or a local network. Drat is a recursive acronym: Drat R Archive Template.

https://github.com/eddelbuettel/drat, https://dirk.eddelbuettel.com/code/drat.html
