Parallel Distance Matrix Computation using Multiple Threads

A fast parallelized alternative to R's native 'dist' function to calculate distance matrices for continuous, binary, and multi-dimensional input matrices, which supports a broad variety of 39 predefined distance functions from the 'stats', 'proxy' and 'dtw' R packages, as well as user-defined functions written in C++. For ease of use, the 'parDist' function extends the signature of the 'dist' function and uses the same parameter naming conventions as distance methods of existing R packages. The package is mainly implemented in C++ and leverages the 'RcppParallel' package to parallelize the distance computations with the help of the 'TinyThread' library. Furthermore, the 'Armadillo' linear algebra library is used for optimized matrix operations during distance calculations. The curiously recurring template pattern (CRTP) technique is applied to avoid virtual functions, which improves the performance of the Dynamic Time Warping calculations while the implementation stays flexible enough to support different DTW step patterns and normalization methods.


Introduction

The parallelDist package provides a fast parallelized alternative to R's native 'dist' function to calculate distance matrices for continuous, binary, and multi-dimensional input matrices and offers a broad variety of predefined distance functions from the 'stats', 'proxy' and 'dtw' R packages, as well as support for user-defined distance functions written in C++. For ease of use, the 'parDist' function extends the signature of the 'dist' function and uses the same parameter naming conventions as distance methods of existing R packages. Currently 39 different distance methods are supported.

The package is mainly implemented in C++ and leverages the 'Rcpp' and 'RcppParallel' packages to parallelize the distance computations with the help of the 'TinyThread' library. Furthermore, the Armadillo linear algebra library is used via 'RcppArmadillo' for optimized matrix operations during distance calculations. The curiously recurring template pattern (CRTP) technique is applied to avoid virtual functions, which improves the performance of the Dynamic Time Warping calculations while keeping the implementation flexible enough to support different step patterns and normalization methods.
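To illustrate the CRTP idea mentioned above, here is a minimal sketch in plain C++. It is not taken from the parallelDist sources: the names DtwBase, Symmetric1, Symmetric2, distance and step are hypothetical, the series are one-dimensional std::vector<double> values rather than Armadillo matrices, and normalization is left out. The base class contains the DTW recursion once, and each derived class supplies its step pattern through a call that is resolved at compile time instead of through a virtual function.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical names throughout; this is an illustration of the CRTP
// technique, not the package's actual implementation.

// CRTP base: the DTW recursion is written once. The step pattern comes
// from the derived class and is bound statically (no virtual call).
template <typename Derived>
class DtwBase {
public:
    double distance(const std::vector<double>& a,
                    const std::vector<double>& b) const {
        const std::size_t n = a.size();
        const std::size_t m = b.size();
        const double inf = std::numeric_limits<double>::infinity();

        // cost[i][j] is the accumulated cost of aligning a[0..i) with b[0..j).
        std::vector<std::vector<double>> cost(
            n + 1, std::vector<double>(m + 1, inf));
        cost[0][0] = 0.0;

        for (std::size_t i = 1; i <= n; ++i) {
            for (std::size_t j = 1; j <= m; ++j) {
                const double local = std::abs(a[i - 1] - b[j - 1]);
                // Static dispatch to the derived step pattern.
                cost[i][j] = static_cast<const Derived*>(this)->step(
                    local, cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]);
            }
        }
        return cost[n][m];
    }
};

// Step pattern in the style of 'symmetric1': all three moves have weight 1.
struct Symmetric1 : DtwBase<Symmetric1> {
    double step(double local, double diag, double up, double left) const {
        return local + std::min({diag, up, left});
    }
};

// Step pattern in the style of 'symmetric2': the diagonal move has weight 2.
struct Symmetric2 : DtwBase<Symmetric2> {
    double step(double local, double diag, double up, double left) const {
        return std::min({diag + 2.0 * local, up + local, left + local});
    }
};
```

Calling Symmetric1().distance(a, b) or Symmetric2().distance(a, b) runs the same recursion with a different step rule. Because the call to step is known at compile time, the compiler can inline it into the inner loop, which is the kind of saving that avoiding virtual functions is meant to provide.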

Documentation and Usage Examples

Usage examples and performance benchmarks can be found in the included vignette.

Details about the 39 supported distance methods and their parameters are described on the help page of the 'parDist' function. The help page can be displayed with the following command:

?parDist

User-defined distance functions

Since version 0.2.0, parallelDist supports fast parallel distance matrix computations for user-defined distance functions written in C++.

A user-defined function needs to have the following signature (also see the Armadillo documentation):

double customDist(const arma::mat &A, const arma::mat &B)

The function can be defined and compiled, and an external pointer to it created, with the cppXPtr function of the 'RcppXPtrUtils' package. The following code shows a full example of defining and using a user-defined Euclidean distance function:

# parallelDist provides the parDist function
library(parallelDist)
# RcppArmadillo is used as dependency
library(RcppArmadillo)
# RcppXPtrUtils is used for simple handling of C++ external pointers
library(RcppXPtrUtils)

# compile user-defined function and return pointer
euclideanFuncPtr <- cppXPtr("double customDist(const arma::mat &A, const arma::mat &B) { return sqrt(arma::accu(arma::square(A - B))); }",
                            depends = c("RcppArmadillo"))

# distance matrix for user-defined Euclidean distance function
# (note that method is set to "custom")
parDist(matrix(1:16, ncol = 2), method = "custom", func = euclideanFuncPtr)

More information can be found in the vignette and the help pages.

Installation

parallelDist is available on CRAN and can be installed with the following command:

install.packages("parallelDist")

The current development version can be installed from GitHub using the 'devtools' package:

library(devtools)
install_github("alexeckert/parallelDist")

Authors

Alexander Eckert

License

GPL (>= 2)


0.2.1 by Alexander Eckert, 3 months ago


https://github.com/alexeckert/parallelDist, https://www.alexandereckert.com/R


Report a bug at https://github.com/alexeckert/parallelDist/issues


Browse source code at https://github.com/cran/parallelDist


Authors: Alexander Eckert [aut, cre]






Imports Rcpp, RcppParallel

Suggests dtw, ggplot2, proxy, testthat, RcppArmadillo, RcppXPtrUtils

Linking to Rcpp, RcppParallel, RcppArmadillo

System requirements: C++11


Suggested by dtwclust.

