Similarity and Distance Quantification Between Probability Functions

Computes 46 optimized distance and similarity measures for comparing probability functions. These comparisons have their foundations in a broad range of scientific disciplines, from mathematics to ecology. The aim of this package is to provide a core framework for clustering, classification, statistical inference, goodness-of-fit, non-parametric statistics, information theory, and machine learning tasks that are based on comparing univariate or multivariate probability functions.



Data collection and data comparison are the foundations of scientific research. Mathematics provides the abstract framework to describe patterns we observe in nature, and statistics provides the framework to quantify the uncertainty of these patterns. In statistics, natural patterns are described in the form of probability distributions, which either follow a fixed pattern (parametric distributions) or more dynamic patterns (non-parametric distributions).

The philentropy package implements fundamental distance and similarity measures to quantify distances between probability density functions as well as traditional information theory measures. In this regard, it aims to provide a framework for comparing natural patterns in a statistical notation.

This project was born out of my passion for statistics, and I hope it will be useful to others who share that passion.

Tutorials

Installation

# install the current CRAN release of philentropy
install.packages("philentropy")

Install Developer Version

# install.packages("devtools")
# install the current version of philentropy on your system
library(devtools)
install_github("HajkD/philentropy", build_vignettes = TRUE, dependencies = TRUE)

NEWS

The current status of the package as well as a detailed history of the functionality of each version of philentropy can be found in the NEWS section.

Important Functions

Distance Measures

  • distance() : Implements 46 fundamental probability distance (or similarity) measures
  • getDistMethods() : Get available method names for 'distance'
  • dist.diversity() : Distance Diversity between Probability Density Functions
  • estimate.probability() : Estimate Probability Vectors From Count Vectors
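The distance functions listed above operate on probability vectors supplied as rows of a matrix. The sketch below illustrates the basic workflow; the vector values are chosen purely for illustration, and the `p` and `unit` arguments shown for `dist.diversity()` reflect the package's documented defaults as I understand them:

```r
library(philentropy)

# two discrete probability vectors, stored as rows of the input matrix
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)
x <- rbind(P, Q)

# list the names of all implemented distance/similarity measures
getDistMethods()

# compute a single distance between P and Q
distance(x, method = "euclidean")

# compute all implemented distances between P and Q at once
dist.diversity(x, p = 2, unit = "log2")
```

Passing a different `method` string from `getDistMethods()` switches the measure without changing the input format.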

Information Theory

  • H() : Shannon's Entropy H(X)
  • JE() : Joint-Entropy H(X,Y)
  • CE() : Conditional-Entropy H(X | Y)
  • MI() : Shannon's Mutual Information I(X,Y)
  • KL() : Kullback–Leibler Divergence
  • JSD() : Jensen-Shannon Divergence
  • gJSD() : Generalized Jensen-Shannon Divergence
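The information-theory functions follow the same convention: entropy takes a single probability vector, while divergences between two distributions take a two-row matrix. A minimal sketch, assuming the default `log2` unit (so results are in bits):

```r
library(philentropy)

# example probability vectors (values chosen for illustration)
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

# Shannon entropy of P, in bits
H(P)

# Kullback-Leibler divergence KL(P || Q); note it is not symmetric
KL(rbind(P, Q), unit = "log2")

# Jensen-Shannon divergence: a symmetrized, smoothed variant of KL
JSD(rbind(P, Q))
```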

Correlation Analyses

  • lin.cor() : Computes linear correlations

Discussions and Bug Reports

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, in case you find some bugs or need additional (more flexible) functionality of parts of this package, please let me know:

https://github.com/HajkD/philentropy/issues

or find me on twitter: HajkDrost

News

Version 0.0.2

Bug fixes

  • Fixed C++ memory leaks in dist.diversity() and distance() that occurred when the check for colSums(x) > 1.001 was performed (the leak was found with rhub::check_with_valgrind())

Version 0.0.1

Initial submission version.

Package Information

  • Version: 0.2.0
  • Author: Hajk-Georg Drost [aut, cre] (https://orcid.org/0000-0002-1567-306X)
  • License: GPL-2
  • Imports: Rcpp, dplyr, KernSmooth
  • Suggests: testthat, knitr
  • LinkingTo: Rcpp
  • Source code: https://github.com/HajkD/philentropy
  • CRAN read-only mirror: https://github.com/cran/philentropy