METACRAN search results

tinytest — by Mark van der Loo, 2 years ago

Lightweight and Feature Complete Unit Testing Framework

Provides a lightweight (zero-dependency) and easy-to-use unit testing framework. Main features: install tests with the package. Test results are treated as data that can be stored and manipulated. Test files are R scripts interspersed with test commands that can be programmed over. Fully automated build-install-test sequence for packages. Skip tests when not run locally (e.g. on CRAN). Flexible and configurable output printing. Compare computed output with output stored with the package. Run tests in parallel. Extensible by other packages. Report side effects.

https://github.com/markvanderloo/tinytest

stringdist — by Mark van der Loo, 6 months ago

Approximate String Matching, Fuzzy Text Search, and String Distance Functions

Implements an approximate string matching version of R's native 'match' function. Also offers fuzzy text search based on various string distance measures. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding, or between integer vectors representing generic sequences. This package is built for speed and runs in parallel by using 'openMP'. An API for C or C++ is exposed as well. Reference: MPJ van der Loo (2014).

https://github.com/markvanderloo/stringdist
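
For illustration, the optimal string alignment (OSA) distance listed above counts Levenshtein edits (insertion, deletion, substitution) plus transpositions of adjacent characters, with the restriction that no substring is edited more than once. The following is a minimal Python sketch of the textbook dynamic program, not the package's C implementation or its R interface; the function name osa_distance is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between strings a and b."""
    n, m = len(a), len(b)
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i characters of a
    for j in range(m + 1):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


# Usage
print(osa_distance("abcd", "abdc"))  # one transposition
print(osa_distance("ca", "abc"))
```

The pair "ca" and "abc" shows the restriction at work: full Damerau-Levenshtein distance is 2 (transpose, then insert), but OSA gives 3 because the transposed pair may not be edited again. In the package itself, this metric corresponds to method = "osa" of stringdist().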

gower — by Mark van der Loo, 7 months ago

Gower's Distance

Compute Gower's distance (or similarity) coefficient between records. Compute the top-n matches between records. Core algorithms are executed in parallel on systems supporting OpenMP.

https://github.com/markvanderloo/gower
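
Gower's distance averages per-field distances whose definition depends on the field type: for a numeric field, the absolute difference divided by that field's range over the data; for a categorical field, 0 on a match and 1 otherwise. Fields missing in either record are left out of the average. A minimal Python sketch of that definition follows, assuming equal field weights and ranges supplied by the caller; the function name gower_distance is hypothetical, and the package's own gower_dist() may differ in details such as how ranges are computed.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str, None]


def gower_distance(x: Sequence[Value], y: Sequence[Value],
                   ranges: Sequence[Optional[float]]) -> float:
    """Gower's distance between two records with mixed-type fields.

    ranges[k] is the range (max - min) of numeric field k over the data set,
    or None if field k is categorical. Missing values are given as None.
    """
    total, count = 0.0, 0
    for xk, yk, rk in zip(x, y, ranges):
        if xk is None or yk is None:
            continue  # missing value: field does not contribute
        if rk is None:
            d = 0.0 if xk == yk else 1.0  # categorical: mismatch indicator
        elif rk == 0:
            d = 0.0  # constant numeric field: all values equal
        else:
            d = abs(xk - yk) / rk  # numeric: range-normalized difference
        total += d
        count += 1
    if count == 0:
        raise ValueError("records share no non-missing fields")
    return total / count


# Usage: fields are (age, height in cm, colour)
ranges = [20, 40, None]
a = [30, 180, "red"]
b = [40, 170, "blue"]
print(gower_distance(a, b, ranges))  # (0.5 + 0.25 + 1) / 3
```

Because every per-field distance lies in [0, 1], numeric and categorical fields contribute on the same scale, which is what makes the measure suitable for matching records of mixed type.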

settings — by Mark van der Loo, 4 years ago

Software Option Settings Manager for R

Provides option settings management that goes beyond R's default 'options' function. With this package, users can define their own option settings manager holding option names, default values and (if so desired) ranges or sets of allowed option values that will be automatically checked. Settings can then be retrieved, altered and reset to defaults with ease. For R programmers and package developers it offers cloning and merging functionality which allows for conveniently defining global and local options, possibly in a multilevel options hierarchy. See the package vignette for some examples concerning functions, S4 classes, and reference classes. There are convenience functions to reset par() and options() to their 'factory defaults'.

https://github.com/markvanderloo/settings

validate — by Mark van der Loo, a year ago

Data Validation Infrastructure

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. Checks implied by an SDMX DSD file are supported as well. See also Van der Loo and De Jonge (2018), Chapter 6, and the JSS paper (2021).

https://github.com/data-cleaning/validate

digest — by Dirk Eddelbuettel, a year ago

Create Compact Hash Digests of R Objects

Implementation of a function 'digest()' for the creation of hash digests of arbitrary R objects (using the 'md5', 'sha-1', 'sha-256', 'crc32', 'xxhash', 'murmurhash', 'spookyhash', 'blake3', 'crc32c', 'xxh3_64', and 'xxh3_128' algorithms) permitting easy comparison of R language objects, as well as functions such as 'hmac()' to create hash-based message authentication codes. Please note that this package is not meant to be deployed for cryptographic purposes, for which more comprehensive (and widely tested) libraries such as 'OpenSSL' should be used.

https://github.com/eddelbuettel/digest, https://dirk.eddelbuettel.com/code/digest.html

lintools — by Mark van der Loo, 2 years ago

Manipulation of Linear Systems of (in)Equalities

Variable elimination (Gaussian elimination, Fourier-Motzkin elimination), Moore-Penrose pseudoinverse, reduction to reduced row echelon form, value substitution, projecting a vector on the convex polytope described by a system of (in)equalities, simplifying systems by removing spurious columns and rows and collapsing implied equalities, testing whether a matrix is totally unimodular, and computing variable ranges implied by linear (in)equalities.

https://github.com/data-cleaning/lintools
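
Fourier-Motzkin elimination, one of the operations listed above, removes a variable x_j from a system of inequalities a·x <= b: rows that do not involve x_j are kept, and every row with a positive x_j coefficient is combined with every row with a negative one, each scaled by a positive factor so that x_j cancels. The Python sketch below uses exact rational arithmetic and makes no attempt to prune redundant rows; it illustrates the method only and does not mirror the lintools interface. The names fm_eliminate and row are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

# (coefficients a, bound b), meaning a[0]*x0 + a[1]*x1 + ... <= b
Row = Tuple[List[Fraction], Fraction]


def fm_eliminate(rows: List[Row], j: int) -> List[Row]:
    """Eliminate variable x_j from the system {a·x <= b} (Fourier-Motzkin)."""
    pos = [r for r in rows if r[0][j] > 0]
    neg = [r for r in rows if r[0][j] < 0]
    out = [r for r in rows if r[0][j] == 0]  # rows without x_j are kept as-is
    for ap, bp in pos:
        for an, bn in neg:
            # both multipliers are positive, so the inequality direction holds
            lp, ln = -an[j], ap[j]
            coeffs = [lp * p + ln * q for p, q in zip(ap, an)]
            out.append((coeffs, lp * bp + ln * bn))
    return out


def row(coeffs, bound) -> Row:
    """Build a row with exact rational entries."""
    return ([Fraction(c) for c in coeffs], Fraction(bound))


# Usage: x + y <= 4, x >= 0, y >= 0, x - y <= 1; eliminate x (index 0)
system = [row([1, 1], 4), row([-1, 0], 0), row([0, -1], 0), row([1, -1], 1)]
reduced = fm_eliminate(system, 0)
# reduced encodes -y <= 0, y <= 4 and -y <= 1, i.e. 0 <= y <= 4
```

Eliminating x leaves only bounds on y, which is one way to read off the variable ranges implied by a system. Each step can produce up to (number of positive rows) times (number of negative rows) new inequalities, some redundant (here -y <= 1), so practical implementations also simplify the result.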

simputation — by Mark van der Loo, 7 months ago

Simple Imputation

Easy to use interfaces to a number of imputation methods that fit in the not-a-pipe operator of the 'magrittr' package.

https://github.com/markvanderloo/simputation

extremevalues — by Mark van der Loo, 7 months ago

Univariate Outlier Detection

Detect outliers in one-dimensional data.

https://github.com/markvanderloo/extremevalues

lumberjack — by Mark van der Loo, 2 years ago

Track Changes in Data

A framework that allows for easy logging of changes in data. Main features: start tracking changes by adding a single line of code to an existing script. Track changes in multiple datasets, using multiple loggers. Add custom-built loggers or use loggers offered by other packages.

https://github.com/markvanderloo/lumberjack
