stringdist — by Mark van der Loo, 3 months ago

Approximate String Matching and String Distance Functions

Implements an approximate string matching version of R's native 'match' function. Can calculate various string distances based on edits (Damerau-Levenshtein, Hamming, Levenshtein, optimal string alignment), q-grams (q-gram, cosine, Jaccard distance) or heuristic metrics (Jaro, Jaro-Winkler). An implementation of soundex is provided as well. Distances can be computed between character vectors while taking proper care of encoding, or between integer vectors representing generic sequences.
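To make the edit-based distances concrete, here is a minimal Python sketch of the optimal string alignment (OSA) distance, which counts insertions, deletions, substitutions and transpositions of adjacent characters. It also includes a hypothetical `approx_match` helper that mimics an approximate `match`: it returns the 1-based position of the closest candidate within `max_dist`, or `None` where R would return `NA`. This is an independent illustration, not the package's C code or its R interface.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein edits plus
    transposition of adjacent characters (no substring is edited twice)."""
    n, m = len(a), len(b)
    # d[i][j] holds the distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[n][m]


def approx_match(x, table, max_dist=1):
    """Hypothetical helper: 1-based index of the closest entry of `table`
    within `max_dist` of `x` (first one wins on ties), else None."""
    best_index, best_dist = None, max_dist + 1
    for k, candidate in enumerate(table, start=1):
        dist = osa_distance(x, candidate)
        if dist < best_dist:
            best_index, best_dist = k, dist
    return best_index
```

For example, `approx_match("leia", ["uhura", "leela"], max_dist=2)` returns 2. Unlike the full Damerau-Levenshtein distance, OSA never edits a substring twice, so `osa_distance("ca", "abc")` is 3 where Damerau-Levenshtein gives 2.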

simputation — by Mark van der Loo, 2 months ago

Simple Imputation

Easy-to-use interfaces to a number of imputation methods that work with the not-a-pipe operator ('%>%') of the 'magrittr' package.
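The pipe-friendly style means that each imputation function takes the data as its first argument and returns a completed copy, so calls can be chained. The Python sketch below illustrates that pattern with ordinary least-squares imputation from a single predictor; `impute_lm_simple` and the list-of-dicts data layout are hypothetical and are not the package's API, which fits R model formulas.

```python
def impute_lm_simple(records, target, predictor):
    """Hypothetical sketch: fill missing `target` values (None) using a
    least-squares line fitted on complete (predictor, target) pairs.
    Returns a new list, leaving the input untouched, so calls can be chained."""
    pairs = [(r[predictor], r[target]) for r in records
             if r[predictor] is not None and r[target] is not None]
    n = len(pairs)
    if n < 2:
        return [dict(r) for r in records]  # too few complete cases to fit a line
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    out = []
    for r in records:
        r = dict(r)  # copy each record before modifying it
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]
        out.append(r)
    return out
```

Chaining then looks like `impute_lm_simple(impute_lm_simple(data, "y", "x"), "z", "y")`, which is roughly what a pipe expresses in R.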

rspa — by Mark van der Loo, 2 years ago

Adapt Numerical Records to Fit (in)Equality Restrictions

Based on (optionally sparse) quadratic optimization with the main algorithms implemented in C. Includes features for easy processing of many (smaller) records. The algorithm has been tested on fairly large optimization problems with up to a few million variables and several hundred thousand restrictions.
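The core problem can be sketched as: find the record closest to the original, in the least-squares sense, that satisfies restrictions of the form `a·x = b` and `a·x <= b`. The Python sketch below uses Hildreth's row-action method, cycling over the restrictions, projecting onto one at a time and keeping nonnegative multipliers for the inequalities. It is dense and pure Python, not the package's (optionally sparse) C implementation, and the function name and argument layout are assumptions.

```python
def adjust_record(x0, eq, ineq, tol=1e-12, max_sweeps=10_000):
    """Hypothetical sketch: minimise ||x - x0||^2 subject to a.x == b for
    (a, b) in `eq` and a.x <= b for (a, b) in `ineq` (rows a must be nonzero),
    using Hildreth's dual coordinate (row-action) method."""
    x = [float(v) for v in x0]
    rows = [(a, b, True) for a, b in eq] + [(a, b, False) for a, b in ineq]
    lam = [0.0] * len(rows)  # multipliers; invariant: x = x0 - sum(lam_i * a_i)
    for _ in range(max_sweeps):
        largest_step = 0.0
        for i, (a, b, is_eq) in enumerate(rows):
            norm2 = sum(ai * ai for ai in a)
            residual = (sum(ai * xi for ai, xi in zip(a, x)) - b) / norm2
            new_lam = lam[i] + residual
            if not is_eq:
                new_lam = max(0.0, new_lam)  # inequality multipliers stay >= 0
            step = new_lam - lam[i]
            lam[i] = new_lam
            x = [xi - step * ai for ai, xi in zip(a, x)]
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:  # no restriction moved x noticeably: converged
            break
    return x
```

For instance, adjusting `[1, 2, 4]` to satisfy `x1 + x2 = x3` with all values nonnegative gives approximately `[4/3, 7/3, 11/3]`: the violation is spread evenly because every coefficient has the same magnitude.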

validate — by Mark van der Loo, 9 months ago

Data Validation Infrastructure

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity.
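The idea of declaring rules separately and then confronting data with them can be sketched in Python, assuming each rule is a named predicate that returns `True` (pass), `False` (fail) or `None` (cannot be evaluated), and that confronting means tallying those outcomes per rule. The names `Rule`, `confront` and the three example rules are hypothetical; the package itself expresses rules as R expressions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    name: str
    # check(record, all_records) -> True (pass), False (fail) or None (na)
    check: Callable[[dict, List[dict]], Optional[bool]]


def confront(records, rules):
    """Hypothetical sketch: evaluate every rule on every record and
    tally pass / fail / na outcomes per rule."""
    summary = {}
    for rule in rules:
        tally = Counter()
        for rec in records:
            result = rule.check(rec, records)
            tally["na" if result is None else ("pass" if result else "fail")] += 1
        summary[rule.name] = tally
    return summary


def age_nonnegative(rec, _):    # per-field rule
    return None if rec["age"] is None else rec["age"] >= 0


def balance(rec, _):            # in-record rule: turnover - cost == profit
    t, c, p = rec["turnover"], rec["cost"], rec["profit"]
    return None if None in (t, c, p) else t - c == p


def unique_id(rec, records):    # cross-record rule: each id occurs once
    return sum(r["id"] == rec["id"] for r in records) == 1


rules = [Rule("age_nonnegative", age_nonnegative),
         Rule("balance", balance),
         Rule("unique_id", unique_id)]
```

Keeping the rules as data, separate from the checking loop, is what makes it possible to summarize, store or analyze them independently of any dataset.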

hashr — by Mark van der Loo, 2 years ago

Hash R Objects to Integers Fast

Apply the SuperFastHash algorithm to any R object. Hash whole R objects or, for vectors or lists, hash each element to obtain a set of hash values stored in a structure equivalent to the input.
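For readers unfamiliar with SuperFastHash, below is a Python transcription of Paul Hsieh's C reference over a byte string, with 32-bit unsigned arithmetic emulated by masking. The `hash_elements` helper, which hashes each element of a (nested) list and returns a list of the same shape, is hypothetical, as is encoding each element with `str(...).encode()`. How the package turns R objects into bytes is not described here, so these values are not expected to match its output.

```python
MASK = 0xFFFFFFFF  # keep every intermediate result in 32 unsigned bits


def _get16(data: bytes, i: int) -> int:
    """Little-endian 16-bit value at offset i."""
    return data[i] | (data[i + 1] << 8)


def _signed(byte: int) -> int:
    """Interpret a byte as a C `signed char`, as the reference code does."""
    return byte - 256 if byte > 127 else byte


def super_fast_hash(data: bytes) -> int:
    length = len(data)
    if length == 0:
        return 0
    h = length
    rem = length & 3
    i = 0
    for _ in range(length >> 2):  # main loop consumes 4 bytes per round
        h = (h + _get16(data, i)) & MASK
        tmp = ((_get16(data, i + 2) << 11) ^ h) & MASK
        h = ((h << 16) ^ tmp) & MASK
        i += 4
        h = (h + (h >> 11)) & MASK
    if rem == 3:  # handle the 1 to 3 trailing bytes
        h = (h + _get16(data, i)) & MASK
        h ^= (h << 16) & MASK
        h ^= (_signed(data[i + 2]) << 18) & MASK
        h = (h + (h >> 11)) & MASK
    elif rem == 2:
        h = (h + _get16(data, i)) & MASK
        h ^= (h << 11) & MASK
        h = (h + (h >> 17)) & MASK
    elif rem == 1:
        h = (h + _signed(data[i])) & MASK
        h ^= (h << 10) & MASK
        h = (h + (h >> 1)) & MASK
    # final avalanching of the remaining bits
    h ^= (h << 3) & MASK
    h = (h + (h >> 5)) & MASK
    h ^= (h << 4) & MASK
    h = (h + (h >> 17)) & MASK
    h ^= (h << 25) & MASK
    h = (h + (h >> 6)) & MASK
    return h


def hash_elements(obj):
    """Hypothetical helper: hash each leaf of a nested list, keeping its shape."""
    if isinstance(obj, list):
        return [hash_elements(el) for el in obj]
    return super_fast_hash(str(obj).encode("utf-8"))
```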

deductive — by Mark van der Loo, 6 months ago

Data Correction and Imputation Using Deductive Methods

Attempt to repair inconsistencies and missing values in data records by using information from valid values and validation rules restricting the data.
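One of the simplest deductive steps is this: if an equality restriction involves exactly one missing variable, the value of that variable is fully determined by the others. The Python sketch below applies that step repeatedly to linear equalities `sum(coef * value) == rhs` until nothing more can be deduced. The function name and rule format are assumptions, and the sketch covers only this single case.

```python
def deduce_missing(record, equalities):
    """Hypothetical sketch: fill None values uniquely determined by linear
    equalities. Each equality is (coefs, rhs), coefs a {variable: nonzero coef} dict."""
    rec = dict(record)
    changed = True
    while changed:  # a deduced value may make another equation solvable
        changed = False
        for coefs, rhs in equalities:
            missing = [v for v in coefs if rec[v] is None]
            if len(missing) == 1:
                target = missing[0]
                known = sum(c * rec[v] for v, c in coefs.items() if v != target)
                rec[target] = (rhs - known) / coefs[target]
                changed = True
    return rec
```

With `total = staff + other` and `staff = wages + bonus`, a record missing both `total` and `staff` is completed in two rounds: first `staff` from the second equation, then `total` from the first.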

lintools — by Mark van der Loo, a month ago

Manipulation of Linear Systems of (in)Equalities

Variable elimination (Gaussian elimination, Fourier-Motzkin elimination), the Moore-Penrose pseudoinverse, reduction to reduced row echelon form, value substitution, projection of a vector onto the convex polytope described by a system of (in)equations, simplification of systems by removing spurious columns and rows and collapsing implied equalities, testing whether a matrix is totally unimodular, and computing variable ranges implied by linear (in)equalities.
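As an illustration of one of these operations, Fourier-Motzkin elimination removes a variable from a system `A x <= b` by keeping the rows that do not involve it and combining every row with a positive coefficient with every row with a negative one. A minimal Python sketch, assuming the system is given as a list of `(coefficients, rhs)` pairs (a hypothetical layout, not the package's interface); it does not remove the redundant rows the method tends to produce.

```python
def fourier_motzkin(rows, j):
    """Hypothetical sketch: eliminate variable j from sum(a[k] * x[k]) <= b,
    rows = [(a, b), ...]. Returns a system with a zero column j whose solutions
    are exactly the projections of the original solutions."""
    zero = [(a, b) for a, b in rows if a[j] == 0]
    pos = [(a, b) for a, b in rows if a[j] > 0]
    neg = [(a, b) for a, b in rows if a[j] < 0]
    combined = []
    for ap, bp in pos:
        for an, bn in neg:
            # scale both rows by positive factors so the x_j terms cancel
            sp, sn = -an[j], ap[j]
            a_new = [sp * p + sn * n for p, n in zip(ap, an)]
            combined.append((a_new, sp * bp + sn * bn))
    return zero + combined
```

For example, eliminating `y` (index 1) from `x - y <= 0`, `y <= 3` and `-x <= -1` yields `-x <= -1` and `x <= 3`, that is `1 <= x <= 3`.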

gower — by Mark van der Loo, a month ago

Gower's Distance

Compute Gower's distance (or similarity) coefficient between records. Compute the top-n matches between records. Core algorithms are executed in parallel on systems supporting OpenMP.
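Gower's coefficient averages per-variable dissimilarities: for a numeric variable, the absolute difference divided by that variable's range; for a categorical variable, 0 on a match and 1 otherwise. Variables missing in either record are skipped, and the similarity is one minus the distance. A plain Python sketch under those assumptions, sequential rather than parallel, with hypothetical function names:

```python
def gower_distance(x, y, ranges):
    """Hypothetical sketch: Gower distance between two records (dicts).
    `ranges` maps each numeric variable to its range over the data;
    all other variables are treated as categorical."""
    total, count = 0.0, 0
    for var in x:
        a, b = x[var], y.get(var)
        if a is None or b is None:
            continue  # skip variables missing in either record
        if var in ranges:
            r = ranges[var]
            total += abs(a - b) / r if r > 0 else 0.0
        else:
            total += 0.0 if a == b else 1.0
        count += 1
    return total / count if count else float("nan")


def top_n(x, candidates, ranges, n=1):
    """Indices of the n candidates closest to x, nearest first."""
    dists = [gower_distance(x, c, ranges) for c in candidates]
    return sorted(range(len(candidates)), key=lambda k: dists[k])[:n]
```

With ages ranging over 50 years, `{"age": 30, "sex": "M"}` and `{"age": 40, "sex": "F"}` are at distance `(10/50 + 1) / 2 = 0.6`.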

editrules — by Edwin de Jonge, 2 years ago

Parsing, Applying, and Manipulating Data Cleaning Rules

Facilitates reading and manipulating (multivariate) data restrictions (edit rules) on numerical and categorical data. Rules can be defined with common R syntax and parsed to an internal (matrix-like) format. Rules can be manipulated with variable elimination and value substitution methods, allowing for feasibility checks and more. Data can be tested against the rules, and erroneous fields can be found based on Fellegi and Holt's generalized principle. Rule dependencies can be visualized using the 'igraph' package.
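The Fellegi-Holt principle asks for the smallest (weighted) set of fields that, once changed, lets the record satisfy all rules. For categorical data with small domains this can be sketched by brute force, assuming each edit is a Python function that returns `True` for a valid record. The package uses its own, more efficient search; this exhaustive version and its names are hypothetical and meant only to show what is being minimized.

```python
from itertools import combinations, product


def localize_errors(record, domains, edits, weights):
    """Hypothetical sketch: minimum-weight set of fields whose values can be
    replaced (within their domains) so that every edit is satisfied."""
    fields = list(record)
    best, best_weight = None, float("inf")
    for size in range(len(fields) + 1):
        for subset in combinations(fields, size):
            w = sum(weights[f] for f in subset)
            if w >= best_weight:
                continue  # cannot improve on the best set found so far
            # try every combination of replacement values for the chosen fields
            for values in product(*(domains[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if all(edit(candidate) for edit in edits):
                    best, best_weight = set(subset), w
                    break
    return best


domains = {"age_group": ["child", "adult"],
           "marital": ["single", "married"],
           "income": ["none", "low", "high"]}
edits = [lambda r: not (r["age_group"] == "child" and r["marital"] == "married")]
weights = {"age_group": 2, "marital": 1, "income": 1}
```

For the record `{"age_group": "child", "marital": "married", "income": "none"}`, changing either `age_group` or `marital` would do, but the lower weight makes `{"marital"}` the Fellegi-Holt solution.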

deducorrect — by Mark van der Loo, 2 years ago

Deductive Correction, Deductive Imputation, and Deterministic Correction

A collection of methods for automated data cleaning where all actions are logged.
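As an example of a deterministic correction whose actions are logged, the sketch below repairs a balance rule `turnover - cost == profit` when negating exactly one variable is enough to satisfy it, and appends every change to a log. The rule, the function name and the log format are hypothetical Python illustrations, not the package's own correction routines.

```python
FIELDS = ("turnover", "cost", "profit")


def _balanced(r):
    """The (hypothetical) rule being enforced."""
    return r["turnover"] - r["cost"] == r["profit"]


def correct_sign_flip(records, log):
    """Hypothetical sketch: if the balance rule fails but holds after negating
    exactly one field, apply that flip and record it in `log`."""
    out = []
    for idx, rec in enumerate(records):
        rec = dict(rec)
        if not _balanced(rec):
            fixes = [f for f in FIELDS if _balanced({**rec, f: -rec[f]})]
            if len(fixes) == 1:  # only correct when the fix is unambiguous
                f = fixes[0]
                log.append({"record": idx, "field": f, "old": rec[f], "new": -rec[f]})
                rec[f] = -rec[f]
        out.append(rec)
    return out
```

Requiring a unique fix is what makes the correction deterministic: a record that could be repaired in more than one way is left untouched for other methods.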