Sparse and Regularized Discriminant Analysis

A collection of sparse and regularized discriminant analysis methods intended for small-sample, high-dimensional data sets. The package features the High-Dimensional Regularized Discriminant Analysis classifier.


The R package sparsediscrim provides a collection of sparse and regularized discriminant analysis classifiers that are especially useful when applied to small-sample, high-dimensional data sets.

You can install the stable version from CRAN:

install.packages('sparsediscrim', dependencies = TRUE)

If you prefer to install the latest development version from GitHub, instead type:

library(devtools)
install_github('ramhiser/sparsediscrim')

The sparsediscrim package features the following classifier (the R function is included within parentheses):

  • High-Dimensional Regularized Discriminant Analysis (hdrda) from Ramey et al. (2015)
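
As a rough usage sketch, the classifier could be fit and evaluated on the built-in iris data as shown below. This assumes that the installed version of sparsediscrim exports hdrda() with a formula interface and that predict() returns a list with a class element; both are assumptions here, so the code checks for the function and skips the fit if it is unavailable.

```r
# Usage sketch only. Assumes sparsediscrim exports hdrda() with a formula
# interface and that predict() returns a list with a `class` element.
predicted <- NULL
have_hdrda <- requireNamespace("sparsediscrim", quietly = TRUE) &&
  "hdrda" %in% getNamespaceExports("sparsediscrim")

if (have_hdrda) {
  set.seed(42)
  n <- nrow(iris)
  train <- sample(seq_len(n), n / 2)

  # Fit on half of the data, then classify the held-out half
  fit <- sparsediscrim::hdrda(Species ~ ., data = iris[train, ])
  predicted <- predict(fit, iris[-train, -5])$class

  # Proportion of misclassified held-out observations
  print(mean(predicted != iris$Species[-train]))
}
```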

The sparsediscrim package also includes a variety of additional classifiers intended for small-sample, high-dimensional data sets. These include:

| Classifier | Author | R Function |
|---|---|---|
| Diagonal Linear Discriminant Analysis | Dudoit et al. (2002) | dlda |
| Diagonal Quadratic Discriminant Analysis | Dudoit et al. (2002) | dqda |
| Shrinkage-based Diagonal Linear Discriminant Analysis | Pang et al. (2009) | sdlda |
| Shrinkage-based Diagonal Quadratic Discriminant Analysis | Pang et al. (2009) | sdqda |
| Shrinkage-mean-based Diagonal Linear Discriminant Analysis | Tong et al. (2012) | smdlda |
| Shrinkage-mean-based Diagonal Quadratic Discriminant Analysis | Tong et al. (2012) | smdqda |
| Minimum Distance Empirical Bayesian Estimator (MDEB) | Srivastava and Kubokawa (2007) | mdeb |
| Minimum Distance Rule using Modified Empirical Bayes (MDMEB) | Srivastava and Kubokawa (2007) | mdmeb |
| Minimum Distance Rule using Moore-Penrose Inverse (MDMP) | Srivastava and Kubokawa (2007) | mdmp |

We also include modifications to Linear Discriminant Analysis (LDA) with regularized covariance-matrix estimators:

  • Moore-Penrose Pseudo-Inverse (lda_pseudo)
  • Schafer-Strimmer estimator (lda_schafer)
  • Thomaz-Kitani-Gillies estimator (lda_thomaz)
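
To give a feel for why diagonal classifiers suit small-sample, high-dimensional data, here is a minimal base-R sketch of diagonal LDA. It estimates only class means and one pooled variance per feature, so no full covariance matrix has to be inverted. This is an illustration under stated assumptions (pooled variances use divisor n, priors are the class proportions), not the package's dlda() implementation; the function names dlda_sketch and predict_dlda_sketch are hypothetical.

```r
# Minimal diagonal LDA sketch in base R (not the package's dlda()).
dlda_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  counts <- as.vector(table(y))
  means <- rowsum(x, y) / counts                      # K x p class means
  centered <- x - means[as.integer(y), , drop = FALSE]
  pooled_var <- colSums(centered^2) / nrow(x)         # one variance per feature
  list(means = means, var = pooled_var,
       prior = counts / nrow(x), levels = levels(y))
}

predict_dlda_sketch <- function(fit, newdata) {
  newdata <- as.matrix(newdata)
  # Score = variance-scaled squared distance to the class mean - 2 * log(prior)
  scores <- sapply(seq_along(fit$levels), function(k) {
    d <- sweep(newdata, 2, fit$means[k, ])
    rowSums(sweep(d^2, 2, fit$var, "/")) - 2 * log(fit$prior[k])
  })
  scores <- matrix(scores, nrow = nrow(newdata))
  # Assign each row to the class with the smallest score
  best <- max.col(-scores, ties.method = "first")
  factor(fit$levels[best], levels = fit$levels)
}

fit <- dlda_sketch(iris[, 1:4], iris$Species)
pred <- predict_dlda_sketch(fit, iris[, 1:4])
mean(pred == iris$Species)  # training accuracy
```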

News

UPDATES

  • Fixed an issue with the HDRDA classifier's predict function. The posterior probabilities did not sum to 1 because they were unnormalized. #34

  • Fixed another issue with the HDRDA classifier's predict function, where the class names were incorrect when predicting a single observation. #34

  • Improved docs throughout the package to pass R CMD CHECK. #35
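
The normalization issue described above can be illustrated with a short base-R sketch. It assumes each row holds per-class log-scores (log prior plus log density, up to a constant shared across classes); the package's internal score scale may differ, and normalize_posterior is a hypothetical helper, not a package function.

```r
# Convert per-class log-scores into posterior probabilities that sum to 1.
normalize_posterior <- function(log_scores) {
  # Treat a bare vector as a single observation (one row, K classes)
  if (is.null(dim(log_scores))) {
    log_scores <- matrix(log_scores, nrow = 1)
  }
  # Subtract each row's maximum before exponentiating to avoid underflow
  shifted <- log_scores - apply(log_scores, 1, max)
  unnormalized <- exp(shifted)
  unnormalized / rowSums(unnormalized)
}

log_scores <- rbind(c(-1000, -1001, -1005),
                    c(-2, -2, -2))
posterior <- normalize_posterior(log_scores)
rowSums(posterior)

# A single observation is handled as one row
single <- normalize_posterior(c(-3, -1, -2))
```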

UPDATES

  • The predict function now returns posterior-probability estimates for each classifier.

  • The object returned by cv_hdrda() can be plotted. A heatmap is produced using ggplot2 to illustrate the cross-validation error rate for each tuning-parameter pair considered.

  • The predict function for the HDRDA classifier is now substantially faster when classifying a large number of observations. #33

UPDATES

  • The cross-validation helper function cv_hdrda() for the HDRDA classifier now returns a trained classifier rather than the optimal model.

  • cv_hdrda() also has an optional verbose argument to dump summary information while the cross-validation is running.

FIXES

  • Fixed issue with classifiers' documentation not appearing in help index. #26

  • Better handling of HDRDA when its tuning parameters are both 0.

  • Corrected calculation of W_k and Q_k in HDRDA classifier.

MISCELLANEOUS

  • Added unit tests for HDRDA.

  • Can now specify population means in generate_blockdiag().

  • Added unit tests for generate_blockdiag().

  • Updated man docs with roxygen2 4.0.

  • Added log_determinant() helper function to calculate the log-determinant of a matrix.
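
A log-determinant helper matters in high dimensions because the determinant itself easily underflows to zero. The sketch below shows one way to compute it with base R's determinant(); it is an illustration, not necessarily how log_determinant() is implemented, and log_det_sketch is a hypothetical name.

```r
# Log-determinant via base R's determinant(), which returns the
# log-modulus and the sign separately.
log_det_sketch <- function(x) {
  d <- determinant(as.matrix(x), logarithm = TRUE)
  if (d$sign < 0) stop("determinant is negative; its log is undefined")
  as.numeric(d$modulus)
}

sigma <- diag(1e-3, 200)   # 200 x 200 covariance with tiny variances
det(sigma)                 # underflows in double precision
log_det_sketch(sigma)      # equals 200 * log(1e-3)
```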

UPDATES

  • The High-Dimensional Regularized Discriminant Analysis (HDRDA) classifier from Ramey, Stein, and Young (2014) implemented in hdrda() has been revamped to improve its computational performance.

CLASSIFIERS

  • lda_pseudo() is an implementation of Linear Discriminant Analysis (LDA) with the Moore-Penrose Pseudo-Inverse

  • lda_schafer() is an implementation of Linear Discriminant Analysis (LDA) using the covariance matrix estimator from Schafer and Strimmer (2005)

  • lda_thomaz() is an implementation of Linear Discriminant Analysis (LDA) using the covariance matrix estimator from Thomaz, Kitani, and Gillies (2006)

  • mdeb() is an implementation of the Minimum Distance Empirical Bayesian Estimator (MDEB) classifier from Srivastava and Kubokawa (2007)

  • mdmeb() is an implementation of the Minimum Distance Rule using Modified Empirical Bayes (MDMEB) classifier from Srivastava and Kubokawa (2007)

  • mdmp() is an implementation of the Minimum Distance Rule using Moore-Penrose Inverse (MDMP) classifier from Srivastava and Kubokawa (2007)

  • smdlda() is an implementation of the Shrinkage-mean-based Diagonal Linear Discriminant Analysis (SmDLDA) from Tong, Chen, and Zhao (2012)

  • smdqda() is an implementation of the Shrinkage-mean-based Diagonal Quadratic Discriminant Analysis (SmDQDA) from Tong, Chen, and Zhao (2012)

MISCELLANEOUS

  • Added a summary function for hdrda classifiers

NEW FEATURES

  • First version of the sparsediscrim package. With this package, we aim to provide a large collection of regularized and sparse discriminant analysis classifiers intended for high-dimensional classification.

CLASSIFIERS

  • hdrda() is an implementation of the High-Dimensional Regularized Discriminant Analysis classifier from Ramey, Stein, and Young (2014).

  • dlda() is an implementation of the Diagonal Linear Discriminant Analysis classifier from Dudoit, Fridlyand, and Speed (2002).

  • dqda() is an implementation of the Diagonal Quadratic Discriminant Analysis classifier from Dudoit, Fridlyand, and Speed (2002).

  • sdlda() is an implementation of the Shrinkage-based Diagonal Linear Discriminant Analysis classifier from Pang, Tong, and Zhao (2009).

  • sdqda() is an implementation of the Shrinkage-based Diagonal Quadratic Discriminant Analysis classifier from Pang, Tong, and Zhao (2009).

SIMULATED DATA SETS

  • generate_blockdiag() generates random variates from K multivariate normal populations, where each class is generated with a constant mean vector and a covariance matrix consisting of block-diagonal autocorrelation matrices.

  • generate_intraclass() generates random variates from K multivariate normal populations, where each class is generated with a constant mean vector and an intraclass covariance matrix.
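
For readers unfamiliar with the intraclass structure, the following base-R sketch builds an intraclass covariance matrix, sigma2 * ((1 - rho) * I + rho * J), and draws normal samples from it with a Cholesky factor. The helper names and arguments (intraclass_cov, sample_intraclass, rho, sigma2) are hypothetical and do not reflect the signature of generate_intraclass().

```r
# Intraclass covariance: common variance sigma2 on the diagonal and
# common covariance rho * sigma2 everywhere else.
intraclass_cov <- function(p, rho, sigma2 = 1) {
  sigma2 * ((1 - rho) * diag(p) + rho * matrix(1, p, p))
}

# Draw n observations with constant mean mu and intraclass covariance.
sample_intraclass <- function(n, mu, p, rho, sigma2 = 1) {
  r <- chol(intraclass_cov(p, rho, sigma2))  # upper triangular, t(r) %*% r = Sigma
  z <- matrix(rnorm(n * p), nrow = n)        # independent standard normals
  z %*% r + mu                               # rows have covariance Sigma
}

set.seed(1)
means <- c(-1, 0, 1)                         # one constant mean per class
x <- do.call(rbind, lapply(means, function(m) {
  sample_intraclass(10, m, p = 5, rho = 0.5)
}))
y <- factor(rep(seq_along(means), each = 10))
```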

MISCELLANEOUS

  • cv_partition() randomly partitions data for cross-validation.

  • no_intercept() removes the intercept term from a formula if it is included.

  • cov_mle() computes the maximum likelihood estimator of the covariance matrix under the assumption of multivariate normality.

  • cov_pool() computes the pooled maximum likelihood estimator for the common covariance matrix under the assumption of multivariate normality.

  • cov_eigen() computes the eigenvalue decomposition of the maximum likelihood estimators of the covariance matrices for the given data matrix. We provide an option to calculate the eigenvalue decomposition using the Fast Singular Value Decomposition, which can greatly expedite the eigenvalue decomposition for very tall data (large n, small p) or very wide data (small n, large p).
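
The covariance helpers above can be sketched in base R as follows. The maximum likelihood estimators divide by n rather than n - 1, and for wide data the nonzero eigenvalues of the covariance estimate can be read off the singular values of the centered data matrix without forming the p x p matrix. The function names below are hypothetical, and the sketch uses base svd() rather than the package's Fast SVD option.

```r
# MLE of the covariance matrix: divisor n instead of n - 1.
cov_mle_sketch <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  cov(x) * (n - 1) / n
}

# Pooled MLE: sum the within-class scatter matrices, divide by total n.
cov_pool_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  scatter <- lapply(levels(y), function(k) {
    xk <- x[y == k, , drop = FALSE]
    crossprod(scale(xk, scale = FALSE))
  })
  Reduce(`+`, scatter) / nrow(x)
}

# Eigenvalues of the MLE covariance from the SVD of the centered data.
cov_eigenvalues_svd <- function(x) {
  x <- as.matrix(x)
  svd(scale(x, scale = FALSE), nu = 0, nv = 0)$d^2 / nrow(x)
}

set.seed(1)
x_wide <- matrix(rnorm(5 * 20), nrow = 5)    # small n, large p
ev_svd <- cov_eigenvalues_svd(x_wide)
ev_eig <- eigen(cov_mle_sketch(x_wide), symmetric = TRUE,
                only.values = TRUE)$values[1:5]
all.equal(ev_svd, ev_eig, tolerance = 1e-6)
```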

0.2.4 by John A. Ramey, a month ago


https://github.com/ramhiser/sparsediscrim, http://ramhiser.com


Browse source code at https://github.com/cran/sparsediscrim


Authors: John A. Ramey <johnramey@gmail.com>



MIT + file LICENSE license


Imports bdsmatrix, corpcor, dplyr, ggplot2, mvtnorm

Suggests testthat, caret


Suggested by mlr.
