This comprehensive toolkit provides a consistent and extensible framework for working with missing values in vectors. The companion package 'tidyimpute' provides similar functionality for list-like and table-like structures. Functions exist for detection, removal, replacement, imputation, recollection, etc. of 'NAs'.
na.tools is a comprehensive library for handling missing (NA) values. It has several goals:
In this package, there are methods for the detection, removal, replacement, imputation, recollection, etc. of missing values (NAs). This library's focus is on vectors (atomics). For tidy/dplyr-compliant methods operating on tables and lists, please use the tidyimpute package, which depends on this package.
na.* functions found in the stats package.
na.* functions return a transformed version of the input vector with missing values imputed.
    x <- 1:3
    x[2] <- NA_real_         # introduce a missing value

    # Detection
    any_na(x)
    all_na(x)
    which_na(x)
    n_na(x)
    pct_na(x)

    # Removal
    na.rm(x)

    # Replacement
    na.replace(x, 2)
    na.replace(x, mean)      # error
    na.replace(x, na.mean)   # works

    # Imputation
    na.zero(x)
    na.mean(x)
    na.cumsum(x)
na.n - Count missing values
na.pct - Calculate percentage of missing values
which.na - Return logical or character indicating which elements are missing
na.all - Test if all elements are missing
na.any - Test if any elements are missing
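The checks listed above correspond to a handful of base R idioms. The following is a minimal sketch in plain base R, not the package's implementation; the variable names are illustrative only, and the return types of the package's own functions may differ.

```r
# Base R counterparts of the missing-value summaries (illustrative only).
x <- c(10, NA, 30, NA)

missing <- is.na(x)            # logical mask of missing elements

n_missing   <- sum(missing)    # number of missing values
pct_missing <- mean(missing)   # fraction of missing values, between 0 and 1
idx_missing <- which(missing)  # integer positions of the missing values
any_missing <- anyNA(x)        # TRUE if at least one element is missing
all_missing <- all(missing)    # TRUE only if every element is missing
```

Computing the mask once and reusing it keeps the summaries consistent with each other.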
NAs (with tables is equivalent to
NAs from beginning or end (non-commutative/order matters)
There are two types of imputation methods for plain vectors. They are distinguished by their replacement values.
In "constant" imputation methods, missing values are replaced by a constant value selected a priori. No calculations are performed to derive replacement values, and all missing values assume the same replacement value.
NAs with 0
na.constant: constant value
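Constant imputation can be sketched in base R as a single indexed assignment. The helper name `impute_constant` below is hypothetical and is not part of the package; it only illustrates the idea that every NA receives the same pre-selected value.

```r
# Hypothetical helper: replace every NA with one pre-selected constant.
impute_constant <- function(x, value) {
  stopifnot(length(value) == 1)  # exactly one constant, so nothing is recycled
  x[is.na(x)] <- value
  x
}

x <- c(4, NA, 6, NA)
zero_filled  <- impute_constant(x, 0)   # both NAs become 0
minus_filled <- impute_constant(x, -1)  # both NAs become -1
```

Because the replacement does not depend on the data, the result is the same however the vector is ordered.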
In functional imputation, the value is calculated from the vector containing the missing value(s), and only that vector. Different missing values may receive different replacement values. Replacement values may (or may not) be affected by the ordering of the vector.
Commutative functions return the same result regardless of the order of the elements in the input vector. The mean, for example, is the same however the elements are arranged.
(When imputing in a table, imputation by function is also called column-based imputation, since replacement values derive from a single column. Table-based imputation is found in the tidyimpute package.)
na.median - median value
na.quantile - quantile value
na.random - randomly sampled value
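To illustrate commutative functional imputation, here is a minimal base R sketch. The helper `impute_with` is hypothetical (not a package function): it computes one summary from the observed values of the vector and uses it for every NA. Reversing the vector does not change the imputed value.

```r
# Hypothetical helper: impute NAs with a summary of the observed values.
impute_with <- function(x, fun) {
  value <- fun(x[!is.na(x)])  # derived from this vector's observed values only
  x[is.na(x)] <- value
  x
}

x <- c(2, NA, 4, 9)

by_mean     <- impute_with(x, mean)       # NA becomes mean(c(2, 4, 9))
by_median   <- impute_with(x, median)     # NA becomes median(c(2, 4, 9))
by_mean_rev <- impute_with(rev(x), mean)  # same imputed value after reordering
```

The imputed element is identical in `by_mean` and `by_mean_rev`, which is what makes the mean a commutative choice.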
**Non-commutative functions**
na.cummax - cumulative max
na.cummin - cumulative min
na.cumsum - cumulative sum
na.cumprod - cumulative product
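The order dependence of cumulative imputation can be shown with a small base R sketch. The helper `impute_running_sum` is hypothetical, and its rule (each NA takes the running sum of the observed values before it) is an assumption made for illustration; it may differ from what na.cumsum actually computes.

```r
# Hypothetical helper: each NA takes the running sum of the observed values
# that precede it. This rule is an assumption, not the package's definition.
impute_running_sum <- function(x) {
  observed <- replace(x, is.na(x), 0)  # count NAs as 0 while accumulating
  running  <- cumsum(observed)
  x[is.na(x)] <- running[is.na(x)]
  x
}

forward  <- impute_running_sum(c(1, NA, 3))  # NA follows 1
backward <- impute_running_sum(c(3, NA, 1))  # NA follows 3
```

The same three values give different imputed results in `forward` and `backward`, which is why such functions are non-commutative.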
na.explicit - atomic vectors only. General replacement function
na.implicit - turn explicit values back into NAs
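The round trip between implicit and explicit missing values can be sketched in base R for character vectors. The helpers `make_explicit` and `make_implicit` and the marker string are hypothetical choices for this sketch, not the package's functions or its exported constant.

```r
# Hypothetical marker used to label missing values explicitly.
marker <- "(NA)"

# Replace NAs with the visible marker.
make_explicit <- function(x, marker) {
  x[is.na(x)] <- marker
  x
}

# Turn the marker back into NA; the !is.na() guard keeps the index NA-free.
make_implicit <- function(x, marker) {
  x[!is.na(x) & x == marker] <- NA
  x
}

colors   <- c("red", NA, "blue")
explicit <- make_explicit(colors, marker)
restored <- make_implicit(explicit, marker)
```

An explicit marker survives operations that silently drop NAs, such as tabulating categories, and `restored` recovers the original vector.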
A number of other packages have methods for working with missing values and/or imputation. Here is a short, incomplete and growing list:
randomForest::na.roughfix() - imputes with
zoo::na.* - collection of non-commutative imputation techniques for time series data.
NA_explicit_ as an exported constant for explicit categorical values.
na_explicit) to add levels for values if they do not already exist.
ifelse because of edge cases
ifelse and prevent recycling
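One well-known edge case of base R's ifelse() is that it drops attributes such as the Date class. Whether this is the specific case behind the notes above is an assumption. The sketch below contrasts it with indexed assignment; `safe_replace` is a hypothetical helper, not a package function.

```r
d    <- as.Date(c("2020-01-01", NA))
fill <- as.Date("2020-01-02")

# ifelse() builds its result from the logical test, so the Date class is lost.
via_ifelse <- ifelse(is.na(d), fill, d)

# Hypothetical helper: indexed assignment keeps the class, and the length
# check refuses replacement vectors that would otherwise be recycled.
safe_replace <- function(x, value) {
  n_missing <- sum(is.na(x))
  stopifnot(length(value) == 1 || length(value) == n_missing)
  x[is.na(x)] <- value
  x
}

via_index <- safe_replace(d, fill)

class(via_ifelse)  # no longer "Date"
class(via_index)   # still "Date"
```

Calling `safe_replace(d, c(fill, fill, fill))` stops with an error instead of silently using only part of the replacement vector.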