A collection of sparse and regularized discriminant analysis methods intended for small-sample, high-dimensional data sets. The package features the High-Dimensional Regularized Discriminant Analysis classifier.
The R package `sparsediscrim` provides a collection of sparse and regularized discriminant analysis classifiers that are especially useful when applied to small-sample, high-dimensional data sets.
You can install the stable version on CRAN:
install.packages('sparsediscrim', dependencies = TRUE)
If you prefer to download the latest version, instead type:
library(devtools)
install_github('ramhiser/sparsediscrim')
The `sparsediscrim` package features the following classifier (the R function is included within parentheses):

- High-Dimensional Regularized Discriminant Analysis (`hdrda`) from Ramey et al. (2015)

The `sparsediscrim` package also includes a variety of additional classifiers intended for small-sample, high-dimensional data sets. These include:

Classifier | Author | R Function |
---|---|---|
Diagonal Linear Discriminant Analysis | Dudoit et al. (2002) | dlda |
Diagonal Quadratic Discriminant Analysis | Dudoit et al. (2002) | dqda |
Shrinkage-based Diagonal Linear Discriminant Analysis | Pang et al. (2009) | sdlda |
Shrinkage-based Diagonal Quadratic Discriminant Analysis | Pang et al. (2009) | sdqda |
Shrinkage-mean-based Diagonal Linear Discriminant Analysis | Tong et al. (2012) | smdlda |
Shrinkage-mean-based Diagonal Quadratic Discriminant Analysis | Tong et al. (2012) | smdqda |
Minimum Distance Empirical Bayesian Estimator (MDEB) | Srivastava and Kubokawa (2007) | mdeb |
Minimum Distance Rule using Modified Empirical Bayes (MDMEB) | Srivastava and Kubokawa (2007) | mdmeb |
Minimum Distance Rule using Moore-Penrose Inverse (MDMP) | Srivastava and Kubokawa (2007) | mdmp |
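
To make the diagonal classifiers in the table concrete, here is a minimal base-R sketch of a diagonal LDA rule: each class gets its own mean vector, all classes share one per-feature variance, and a new observation is assigned to the class with the smallest variance-scaled squared distance, adjusted by the log prior. The helper names `dlda_sketch` and `predict_dlda_sketch` are hypothetical, and the maximum-likelihood divisor `n` is an assumption; this is not the package's `dlda` implementation.

```r
# Illustrative only: hypothetical helpers, not part of sparsediscrim.
dlda_sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  n <- nrow(x)
  # Class-specific mean vectors, one row per class.
  means <- do.call(rbind, lapply(levels(y), function(k) {
    colMeans(x[y == k, , drop = FALSE])
  }))
  rownames(means) <- levels(y)
  # Pooled per-feature variance (assumption: ML estimate with divisor n).
  centered <- x - means[as.integer(y), , drop = FALSE]
  var_pool <- colSums(centered^2) / n
  list(means = means, var_pool = var_pool,
       prior = as.numeric(table(y)) / n, levels = levels(y))
}

predict_dlda_sketch <- function(fit, newdata) {
  newdata <- as.matrix(newdata)
  # Score for class k: sum_j (x_j - mean_kj)^2 / var_j - 2 * log(prior_k).
  scores <- sapply(seq_along(fit$levels), function(k) {
    diffs <- sweep(newdata, 2, fit$means[k, ])
    colSums(t(diffs^2) / fit$var_pool) - 2 * log(fit$prior[k])
  })
  scores <- matrix(scores, nrow = nrow(newdata))
  # The smallest score wins.
  factor(fit$levels[apply(scores, 1, which.min)], levels = fit$levels)
}

set.seed(42)
train <- sample(seq_len(nrow(iris)), 75)
fit <- dlda_sketch(iris[train, -5], iris[train, 5])
pred <- predict_dlda_sketch(fit, iris[-train, -5])
mean(pred == iris[-train, 5])  # hold-out accuracy
```

Because only `p` variances are estimated instead of a full `p`-by-`p` covariance matrix, the rule stays well defined when there are more features than observations.
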
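We also include modifications to Linear Discriminant Analysis (LDA) with regularized covariance-matrix estimators:

- Moore-Penrose pseudo-inverse (`lda_pseudo`)
- Schafer and Strimmer (2005) estimator (`lda_schafer`)
- Thomaz, Kitani, and Gillies (2006) estimator (`lda_thomaz`)

UPDATES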
- Fixed an issue with the HDRDA classifier's `predict` function. The posterior probabilities did not sum to 1 because they were unnormalized. #34
- Fixed another issue with the HDRDA classifier's `predict` function, where the class names were incorrect when predicting a single observation. #34
- Improved docs throughout the package to pass `R CMD CHECK`. #35

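
The normalization issue described in this list can be illustrated with a small base-R sketch. It assumes the classifier produces one log-scale score per class, where larger means more probable (an assumption for illustration; the package's internal score convention may differ), and converts each row into probabilities that sum to 1. `normalize_posterior` is a hypothetical helper, not a `sparsediscrim` function.

```r
# Illustrative only: hypothetical helper, not part of sparsediscrim.
normalize_posterior <- function(log_scores) {
  log_scores <- as.matrix(log_scores)
  # Subtract each row's maximum before exponentiating to avoid underflow.
  shifted <- log_scores - apply(log_scores, 1, max)
  unnormalized <- exp(shifted)
  # Divide each row by its sum so the probabilities add up to 1.
  unnormalized / rowSums(unnormalized)
}

# Two observations (rows) scored against three classes (columns).
log_scores <- rbind(c(-1200, -1201, -1205), c(-3, -1, -2))
colnames(log_scores) <- c("a", "b", "c")
posterior <- normalize_posterior(log_scores)
rowSums(posterior)
```

Exponentiating the first row directly would underflow to zero for every class; shifting by the row maximum keeps the ratios intact.
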
UPDATES
- The `predict` function now returns posterior-probability estimates for each classifier.
- The object returned by `cv_hdrda()` can be plotted. A heatmap is produced using `ggplot2` to illustrate the cross-validation error rate for each tuning-parameter pair considered.
- The `predict` function for the HDRDA classifier is now substantially faster when classifying a large number of observations. #33

UPDATES
- The cross-validation helper function `cv_hdrda()` for the HDRDA classifier now returns a trained classifier rather than the optimal model.
- `cv_hdrda()` also has an optional `verbose` argument to print summary information while the cross-validation is running.

FIXES
- Fixed issue with classifiers' documentation not appearing in help index. #26
- Better handling of HDRDA when its tuning parameters are both 0.
- Corrected calculation of `W_k` and `Q_k` in HDRDA classifier.

MISCELLANEOUS
- Added unit tests for HDRDA.
- Can now specify population means in `generate_blockdiag()`.
- Added unit tests for `generate_blockdiag()`.
- Updated man docs with roxygen2 4.0.
- Added `log_determinant()` helper function to calculate the log-determinant of a matrix.

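
To show why a log-determinant helper matters in high dimensions, here is a minimal base-R sketch built on `determinant()`, which computes the determinant on the log scale. The name `log_det_sketch` is hypothetical, and nothing here is claimed to match the internals of `log_determinant()`.

```r
# Illustrative only: hypothetical helper, not part of sparsediscrim.
log_det_sketch <- function(x) {
  # determinant() returns the log of the absolute determinant as `modulus`,
  # which equals log(det(x)) for a positive-definite covariance matrix.
  out <- determinant(as.matrix(x), logarithm = TRUE)
  as.numeric(out$modulus)
}

p <- 500
sigma <- diag(0.01, p)   # diagonal covariance matrix with small variances
det(sigma)               # the plain determinant underflows
log_det_sketch(sigma)    # finite: p * log(0.01)
```

Quadratic discriminant scores include a log-determinant term, so staying on the log scale avoids taking the logarithm of an underflowed value.
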
UPDATES
- `hdrda()` has been revamped to improve its computational performance.

CLASSIFIERS
- `lda_pseudo()` is an implementation of Linear Discriminant Analysis (LDA) with the Moore-Penrose Pseudo-Inverse.
- `lda_schafer()` is an implementation of Linear Discriminant Analysis (LDA) using the covariance matrix estimator from Schafer and Strimmer (2005).
- `lda_thomaz()` is an implementation of Linear Discriminant Analysis (LDA) using the covariance matrix estimator from Thomaz, Kitani, and Gillies (2006).
- `mdeb()` is an implementation of the Minimum Distance Empirical Bayesian Estimator (MDEB) classifier from Srivastava and Kubokawa (2007).
- `mdmeb()` is an implementation of the Minimum Distance Rule using Modified Empirical Bayes (MDMEB) classifier from Srivastava and Kubokawa (2007).
- `mdmp()` is an implementation of the Minimum Distance Rule using Moore-Penrose Inverse (MDMP) classifier from Srivastava and Kubokawa (2007).
- `smdlda()` is an implementation of the Shrinkage-mean-based Diagonal Linear Discriminant Analysis (SmDLDA) from Tong, Chen, and Zhao (2012).
- `smdqda()` is an implementation of the Shrinkage-mean-based Diagonal Quadratic Discriminant Analysis (SmDQDA) from Tong, Chen, and Zhao (2012).

MISCELLANEOUS
- `hdrda` classifiers

NEW FEATURES
- `sparsediscrim` package. With this package, we aim to provide a large collection of regularized and sparse discriminant analysis classifiers intended for high-dimensional classification.

CLASSIFIERS
- `hdrda()` is an implementation of the High-Dimensional Regularized Discriminant Analysis classifier from Ramey, Stein, and Young (2014).
- `dlda()` is an implementation of the Diagonal Linear Discriminant Analysis classifier from Dudoit, Fridlyand, and Speed (2002).
- `dqda()` is an implementation of the Diagonal Quadratic Discriminant Analysis classifier from Dudoit, Fridlyand, and Speed (2002).
- `sdlda()` is an implementation of the Shrinkage-based Diagonal Linear Discriminant Analysis classifier from Pang, Tong, and Zhao (2009).
- `sdqda()` is an implementation of the Shrinkage-based Diagonal Quadratic Discriminant Analysis classifier from Pang, Tong, and Zhao (2009).

SIMULATED DATA SETS
- `generate_blockdiag()` generates random variates from K multivariate normal populations, where each class is generated with a constant mean vector and a covariance matrix consisting of block-diagonal autocorrelation matrices.
- `generate_intraclass()` generates random variates from K multivariate normal populations, where each class is generated with a constant mean vector and an intraclass covariance matrix.

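
As a rough illustration of the intraclass setup, the base-R sketch below builds a covariance matrix with a common variance on the diagonal and a common correlation `rho` off the diagonal, then draws multivariate normal samples for each class around a constant mean vector. The function and argument names (`intraclass_cov`, `simulate_intraclass`, `n`, `p`, `rho`, `mu`, `sigma2`) are hypothetical and are not claimed to reproduce the signature of `generate_intraclass()`.

```r
# Illustrative only: hypothetical helpers, not part of sparsediscrim.
intraclass_cov <- function(p, rho, sigma2 = 1) {
  # sigma2 on the diagonal, sigma2 * rho everywhere else.
  sigma2 * ((1 - rho) * diag(p) + rho * matrix(1, p, p))
}

simulate_intraclass <- function(n, p, rho, mu) {
  # n: per-class sample sizes; rho and mu: one value per class.
  draws <- lapply(seq_along(n), function(k) {
    root <- chol(intraclass_cov(p, rho[k]))  # t(root) %*% root = Sigma_k
    z <- matrix(rnorm(n[k] * p), nrow = n[k])
    z %*% root + mu[k]                       # constant mean vector mu[k]
  })
  list(x = do.call(rbind, draws),
       y = factor(rep(seq_along(n), times = n)))
}

set.seed(1)
sim <- simulate_intraclass(n = c(10, 15), p = 20, rho = c(0.9, 0.5), mu = c(0, 3))
dim(sim$x)
table(sim$y)
```

The Cholesky factor requires a positive-definite matrix, which holds for `rho` strictly between `-1 / (p - 1)` and 1.
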
MISCELLANEOUS
- `cv_partition()` randomly partitions data for cross-validation.
- `no_intercept()` removes the intercept term from a formula if it is included.
- `cov_mle()` computes the maximum likelihood estimator for the sample covariance matrix under the assumption of multivariate normality.
- `cov_pool()` computes the pooled maximum likelihood estimator for the common covariance matrix under the assumption of multivariate normality.
- `cov_eigen()` computes the eigenvalue decomposition of the maximum likelihood estimators of the covariance matrices for the given data matrix. We provide an option to calculate the eigenvalue decomposition using the Fast Singular Value Decomposition, which can greatly expedite the eigenvalue decomposition for very tall data (large n, small p) or very wide data (small n, large p).

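
The link between a singular value decomposition and the covariance eigendecomposition can be sketched in base R. If the centered data matrix factors as `U D V'`, then the maximum likelihood covariance estimate is `V (D^2 / n) V'`, so its eigenvalues and eigenvectors come from an SVD of an n-by-p matrix without ever forming the p-by-p covariance matrix. `cov_eigen_svd` is a hypothetical helper that uses base `svd()` as a stand-in; it is not claimed to match how `cov_eigen()` or its fast option is implemented.

```r
# Illustrative only: hypothetical helper, not part of sparsediscrim.
cov_eigen_svd <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  centered <- scale(x, center = TRUE, scale = FALSE)
  # centered = U D V'  implies  cov_mle = V (D^2 / n) V'.
  decomp <- svd(centered, nu = 0)
  list(values = decomp$d^2 / n, vectors = decomp$v)
}

set.seed(1)
x <- matrix(rnorm(10 * 200), nrow = 10)  # wide data: n = 10, p = 200
fast <- cov_eigen_svd(x)
length(fast$values)                       # at most min(n, p) eigenvalues

# Reference: eigendecomposition of the full 200 x 200 covariance MLE.
full <- eigen(crossprod(scale(x, scale = FALSE)) / nrow(x), symmetric = TRUE)
all.equal(fast$values, full$values[seq_along(fast$values)], tolerance = 1e-8)
```

For wide data the SVD route works with at most `min(n, p)` components, which is where the savings come from.
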