Data Validation Infrastructure

Declare data validation rules and data quality indicators; confront data with them and analyze or visualize the results. The package supports rules that are per-field, in-record, cross-record or cross-dataset. Rules can be automatically analyzed for rule type and connectivity. See also Van der Loo and De Jonge (2018) , Chapter 6 and the JSS paper (2021) .


News

version 0.2.6

  • New methods 'all' and 'any' for 'validation' objects
  • New 'plot' method for 'validator' objects.
  • New 'plot' method for 'validation' objects.
  • New 'as.data.frame' methods for 'validatorComparison' and 'cellComparison' objects.
  • New 'plot' method for 'validatorComparison' and 'cellComparison' objects.
  • New 'barplot' method for 'validatorComparison' and 'cellComparison' objects.
  • Improved 'plot'/'barplot' method for objects of class 'validator'.
  • Bugfixes in 'cells' and 'compare'.
  • The row order of output in 'cells' has changed to be more consistent with row order in the output of 'compare'.
  • The row 'new_missing' is renamed 'removed' in 'cellComparison' objects.
  • Improved cross-references in reference manual

version 0.2.5

  • Documentation updates (Thanks to Matthias Gomolka)
  • Removed deprecated functions 'number_missing', 'fraction_missing', 'row_missing', 'number_unique', 'any_dunplicated', 'any_missing'.
  • Bugfix: the 'call' object in a 'confrontation' object was created incorrectly. This only affects printing.

version 0.2.4

  • New set membership operator "%vin%", behaving consistently with "=="
  • Set membership operator %in% is now replaced with %vin% (#80).
  • Added [[<- and [<- methods for validator objects.
  • Validation rules can now be endowed with user-defined metadata, see ?meta (#59).
  • fixed (harmless) compiler warning (thanks to Brian Ripley)
  • bugfix: empty [] selection on expressionset crashed (thanks to Anne Petersen)
  • bugfix: NA in validating expression would crash parser (thanks to Masafumi Okada)

version 0.2.0

  • new object of class 'indicator'
  • validation rules now support if-then-else syntax.
  • validaton rules can now contain embedded if-statements, e.g. A | (if (B) C)
  • expressionset objects (validator, indicator) can be combined using '+'
  • results of getter functions (e.g. 'labels') are now named.
  • less verbose printng of options for expressionset objects (validator, indicator)
  • expressionset objects (validator, indicator) can now be coerced to as.data.frame.
  • confrontation objects (validation, indication) can now be coerced to data.frame.
  • safer expression manipulation under the hood.
  • breaking: 'rule' column now called 'name' in summary of 'confrontation' objects.
  • bugfix: character indexing with multiple elements of expressionset objects would crash.
  • bugfix: proper recycling for replace methods in expressionset.

version 0.1.8

  • Added loggers for the 'lumberjack' package: lbj_cells and lbj_rules
  • Tolerance for checking (non-strict) linear inequalities is now controlled by voptions(lin.ineq.opts).
  • Macro assignments through := are no longer put in brackets.

version 0.1.7

  • The missingess counters are now only internally documented and will be deprecated (the introduction of the '.' made them more or less obsolete).
  • Package now 'depends' on methods to allow dispatch on objects inhereting from data.frame
  • bugfix: The assignment created() <- was broken. #65, thanks to Andrew R Gibson.
  • bugfix: Simple equallity checks on character data behaved unexpectedly. #67, thanks to Kevin Kuo.
  • Native routines now registered as required by updated CRAN policy.

version 0.1.5

  • The '.' is now used to reference the validated data set as whole.
  • Small change in output of 'compare' to match the table in van den Broek et al. (2013)
  • registered native routines as now recommended by CRAN

version 0.1.4

  • 'confront' now emits a warining when variable name conflicts with name of a reference data set
  • Deprecated 'validate_reset', in favour of the shorter 'reset' (use 'validate::reset' in case of ambiguity)
  • Deprecated 'validate_options' in favour of the shorter 'voptions'
  • New option na.value with default value NA, controlling the output when a rule evaluates to NA.
  • Added rules from the ESSnet on validation (deliverable 17) to automated tests.
  • added 'grepl' to allowed validation syntax (suggested by Dusan Sovic)
  • exported a few functions w/ keywords internal for extensibility
  • Bugfix: blocks sometimes reported wrong nr of blocks (in case of a single connected block.)
  • Bugfix: macro expansion failed when macros were reused in other macros.
  • Bugfix: certain nonlinear relations were recognized as linear
  • Bugfix: rules that use (anonymous) function definitions raised error when printed.

version 0.1.3

  • initial release

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("validate")

1.0.4 by Mark van der Loo, 5 months ago


https://github.com/data-cleaning/validate


Report a bug at https://github.com/data-cleaning/validate/issues


Browse source code at https://github.com/cran/validate


Authors: Mark van der Loo [cre, aut] , Edwin de Jonge [aut] , Paul Hsieh [ctb]


Documentation:   PDF Manual  


Task views: Official Statistics & Survey Methodology, Official Statistics & Survey Statistics


GPL-3 license


Imports stats, graphics, grid, settings, yaml

Depends on methods

Suggests tinytest, knitr, bookdown, lumberjack, rmarkdown


Imported by dcmodify, deductive, iNZightTools, rspa.

Depended on by errorlocate, validatedb, validatetools.

Suggested by datamods, mapping.


See at CRAN