Last updated on 2019-07-03 by Martin Maechler
Robust (or "resistant") methods for statistical modelling have been
available in S from the very beginning in the 1980s, and then in R in
package stats.
Examples are median(), mean(*, trim = .), mad(), IQR(), or also
fivenum() (the statistic behind boxplot() in package graphics), or
lowess() (and loess()) for robust nonparametric regression, which were
complemented by runmed() in 2003.
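As a quick illustration of the base R functions listed above, the following sketch applies them to a small made-up sample in which one value is a gross error, and applies runmed() to a made-up series with a single spike. The data are hypothetical; only functions shipped with base R are used.

```r
# Hypothetical sample: nine well-behaved values plus one gross error
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 10.1, 1000)

mean(x)              # dragged far away by the single outlier
median(x)            # essentially unaffected
mean(x, trim = 0.1)  # drops the most extreme 10% on each side before averaging
sd(x)                # inflated by the outlier
mad(x)               # median absolute deviation, scaled for consistency at the normal
IQR(x)               # interquartile range
fivenum(x)           # Tukey's five-number summary, as used by boxplot()
boxplot.stats(x)$out # points beyond the whiskers

# runmed(): a running median removes an isolated spike from a series
y <- c(1, 2, 3, 50, 5, 6, 7)
runmed(y, k = 3)     # the spike at position 4 is replaced by a local median
```

Here the trimmed mean and the median agree, while the ordinary mean is about ten times too large.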
Much further important functionality has been made available in the
recommended (and hence present in all R installations) package
MASS (by Bill Venables and Brian Ripley, see the book Modern Applied
Statistics with S).
Most importantly, they provide rlm() for robust regression and
cov.rob() for robust multivariate scatter and covariance.
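A minimal sketch comparing least squares with rlm() on R's built-in stackloss data. With default settings rlm() computes a Huber M-estimate by iteratively reweighted least squares, and its final weights show which observations were down-weighted.

```r
library(MASS)  # recommended package, ships with R

# Classical least squares and Huber M-estimation (rlm's default psi)
fit.ls  <- lm(stack.loss ~ ., data = stackloss)
fit.rlm <- rlm(stack.loss ~ ., data = stackloss)

# Compare coefficients side by side
cbind(LS = coef(fit.ls), Huber = coef(fit.rlm))

# Final IRLS weights: values below 1 mark down-weighted observations
round(fit.rlm$w, 2)
```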
This task view is about R add-on packages providing newer or faster, more efficient algorithms, notably for (robustifications of) new models.
Please send suggestions for additions and extensions to the task view maintainer.
An international group of scientists working in the field of robust statistics has made efforts (since October 2005) to coordinate several of the scattered developments and make the important ones available through a set of R packages complementing each other. These should build on a basic package with "Essentials", coined robustbase, with (potentially many) other packages building on top and extending the essential functionality to particular models or applications. Further, there is the quite comprehensive package robust, a version of the robust library of S-PLUS, now available as a GPL-licensed R package thanks to Insightful and Kjell Konis. Originally, there was much overlap between 'robustbase' and 'robust'; now robust depends on robustbase, the former (robust) providing convenient routines for the casual user, while the latter (robustbase) contains the underlying functionality and provides the more advanced statistician with a large range of options for robust modeling.
We structure the packages roughly into the following topics, and typically will first mention functionality in packages robustbase and robust.
Linear regression is provided by lmrob() (robustbase) and lmRob()
(robust), where the former uses the latest of the
fast-S algorithms and heteroscedasticity and autocorrelation corrected
(HAC) standard errors, and the latter uses the M-S algorithm of
Maronna and Yohai (2000) automatically when there are factors
among the predictors (where S-estimators, and hence MM-estimators,
based on resampling typically fail badly).
The ltsReg() and lmrob.S() functions are also available in robustbase,
but mainly for comparison purposes.
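A minimal usage sketch of lmrob() with its default settings on the built-in stackloss data. robustbase is a CRAN package rather than part of base R, so the example checks for it first and does nothing beyond a message if it is not installed.

```r
# robustbase is a CRAN add-on package; skip gracefully if it is absent
if (requireNamespace("robustbase", quietly = TRUE)) {
  # Robust linear regression with robustbase's default (MM-type) settings
  fit.lmrob <- robustbase::lmrob(stack.loss ~ ., data = stackloss)
  print(summary(fit.lmrob))  # robust standard errors and robustness weights
} else {
  message("Package 'robustbase' is not installed")
}
```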
rlm() from MASS was the first widely
available implementation for robust linear models, and also one of
the very first MM-estimation implementations.
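MM-estimation in rlm() is selected with method = "MM". According to the MASS documentation, an S-estimator supplies the initial coefficients and the scale, followed by an M-step with Tukey's bisquare psi. The S-step uses random subsampling, hence the seed in this sketch.

```r
library(MASS)

set.seed(1)  # the initial S-estimate is based on random subsamples
fit.rlmMM <- rlm(stack.loss ~ ., data = stackloss, method = "MM")

coef(fit.rlmMM)  # MM-estimates of the regression coefficients
fit.rlmMM$s      # robust residual scale estimate
```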
robustreg provides very simple M-estimates for linear
regression (in pure R).
Note that Koenker's quantile regression package quantreg
contains L1 (aka LAD, least absolute deviations) regression as a
special case, and also provides it for nonparametric regression via
splines.
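To make the L1 criterion concrete, here is a minimal base-R sketch that minimises the sum of absolute residuals directly. This is not quantreg's algorithm (which solves the problem by linear programming). The general-purpose Nelder-Mead optimiser used here may stop short of the exact optimum on this non-smooth objective, so quantreg's rq() remains the proper tool.

```r
# LAD criterion: sum of absolute residuals for a coefficient vector beta
X <- model.matrix(stack.loss ~ ., data = stackloss)
y <- stackloss$stack.loss
lad_loss <- function(beta) sum(abs(y - X %*% beta))

# Start from the least-squares solution, then minimise the L1 loss
beta.ls <- qr.solve(X, y)
fit.lad <- optim(beta.ls, lad_loss, method = "Nelder-Mead",
                 control = list(maxit = 5000, reltol = 1e-12))

fit.lad$par    # approximate LAD coefficients
fit.lad$value  # sum of absolute residuals at the returned point
```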
Quantile regression (and hence L1 or LAD) for mixed-effects models
is available in package lqmm, whereas an
MM-like approach for robust linear mixed-effects modeling
is available from package robustlmm.
Package mblm's function mblm()
fits
median-based (Theil-Sen or Siegel's repeated) simple linear models.
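The Theil-Sen idea can be sketched in a few lines of base R: the slope is the median of all pairwise slopes and the intercept is the median of y - slope * x. The function theil_sen() below is hypothetical and only illustrates the estimator; it is not mblm's implementation, and Siegel's repeated medians variant is not covered.

```r
# Hypothetical helper: Theil-Sen fit of a simple linear model
theil_sen <- function(x, y) {
  ij <- combn(length(x), 2)            # all index pairs i < j
  dx <- x[ij[2, ]] - x[ij[1, ]]
  dy <- y[ij[2, ]] - y[ij[1, ]]
  slopes <- dy[dx != 0] / dx[dx != 0]  # skip pairs with tied x values
  b <- median(slopes)
  c(intercept = median(y - b * x), slope = b)
}

x <- 1:10
y <- 2 * x + 1
y[10] <- 100       # one gross outlier
theil_sen(x, y)    # recovers intercept 1 and slope 2
coef(lm(y ~ x))    # least squares is pulled towards the outlier
```

Only 9 of the 45 pairwise slopes involve the outlier, so their median is still exactly 2.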
Package TEEReg provides trimmed elemental estimators for
linear models.
Generalized linear models (GLMs) are provided both via
glmrob()
(robustbase) and glmRob()
(robust),
while package robustloggamma focuses on generalized log-gamma
models.
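A minimal sketch of glmrob() next to glm() for a Poisson model. The data are simulated and two responses are deliberately contaminated so that the classical and robust coefficients can be compared. robustbase must be installed; glmrob() is used with its default method ("Mqle").

```r
if (requireNamespace("robustbase", quietly = TRUE)) {
  set.seed(1)
  # Hypothetical Poisson data with two grossly inflated counts
  x <- seq(0, 2, length.out = 50)
  y <- rpois(50, lambda = exp(0.5 + x))
  y[c(5, 10)] <- 60L
  d <- data.frame(x = x, y = y)

  fit.glm    <- glm(y ~ x, family = poisson, data = d)
  fit.glmrob <- robustbase::glmrob(y ~ x, family = poisson, data = d)
  print(cbind(classical = coef(fit.glm), robust = coef(fit.glmrob)))
} else {
  message("Package 'robustbase' is not installed")
}
```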
Robust ordinal regression is provided by
rorutadis (UTADIS).
Robust nonlinear model fitting is available through
robustbase's nlrob().
multinomRob fits overdispersed multinomial regression
models for count data.
robustgam fits robust GAMs,
i.e., robust Generalized Additive Models.
drgee fits "Doubly Robust" Generalized Estimating Equations (GEEs).
complmrob does robust linear regression with compositional data as covariates.
Package rrcov (which depends on robustbase) provides nice S4 class based methods,
more methods for robust multivariate variance-covariance estimation,
and adds robust PCA methodology.
It is extended by rrcovNA, providing robust multivariate
methods for incomplete or missing (NA) data, and by
rrcovHD, providing robust multivariate methods for
high-dimensional data. High-dimensional data with an
emphasis on functional data are also treated robustly by roahd.
Specialized robust PCA packages are pcaPP (via
Projection Pursuit), rpca (incl "sparse")
and rospca.
Historically, note that robust PCA can be performed using standard
R's princomp(), e.g.,
X <- stackloss; pc.rob <- princomp(X, covmat = MASS::cov.rob(X))
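Expanding that one-liner into a small runnable comparison: princomp() accepts a covariance list such as the one returned by MASS::cov.rob() (default method "mve"), so classical and robust component standard deviations can be put side by side. cov.rob() uses random subsampling, hence the seed.

```r
library(MASS)

X <- stackloss
set.seed(42)                         # cov.rob() draws random subsamples
rob <- cov.rob(X)                    # robust (MVE) location and scatter

pc.cl  <- princomp(X)                # classical PCA
pc.rob <- princomp(X, covmat = rob)  # robust PCA from the robust covariance list

# Standard deviations of the principal components, classical vs robust
rbind(classical = pc.cl$sdev, robust = pc.rob$sdev)
```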
Here, robustbase contains a slightly more flexible
version, covMcd(), than robust's fastmcd(), and similarly for covOGK().
On the other hand, robust's covRob() chooses methods
automatically, notably pairwiseQC() for large dimensionality p.
Package robustX, for experimental or other not yet
established procedures, contains BACON() and covNNC(),
the latter providing the
nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002),
also available (slightly less optimized) in covRobust.
RobRSVD provides a robust Regularized Singular Value Decomposition.
mvoutlier (building on robustbase) provides
several methods for outlier identification in high dimensions.
GSE estimates multivariate location and scatter in the presence of missing data.
RSKC provides Robust Sparse
K-means Clustering.
robustDA for robust mixture Discriminant Analysis
(RMDA) builds a mixture model classifier with noisy class labels.
robcor computes robust pairwise correlations based on scale estimates,
particularly on FastQn().
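To show how a robust scale estimate can be turned into a correlation, here is a generic sketch. It uses a naive O(n^2) version of Rousseeuw and Croux's Qn estimator (without the small-sample correction factors) together with the Gnanadesikan-Kettenring identity. We have not verified that robcor uses exactly this construction, and Qn_naive() and scale_cor() are hypothetical helpers, not robcor functions.

```r
# Naive Qn: k-th smallest pairwise distance |x_i - x_j|, i < j,
# with h = floor(n/2) + 1 and k = choose(h, 2), times a constant that
# makes it consistent for the standard deviation at the normal.
# No small-sample correction is applied.
Qn_naive <- function(x) {
  n <- length(x)
  h <- n %/% 2 + 1
  k <- choose(h, 2)
  d <- 1 / (sqrt(2) * qnorm(5 / 8))  # about 2.219
  d * sort(as.vector(dist(x)))[k]
}

# Gnanadesikan-Kettenring identity with a robust scale S:
# r = (S(u + v)^2 - S(u - v)^2) / (S(u + v)^2 + S(u - v)^2),
# where u and v are x and y standardised by S
scale_cor <- function(x, y, S = Qn_naive) {
  u <- (x - median(x)) / S(x)
  v <- (y - median(y)) / S(y)
  a <- S(u + v)^2
  b <- S(u - v)^2
  (a - b) / (a + b)
}

set.seed(7)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)
y[1:5] <- -20                        # a few gross outliers
c(pearson = cor(x, y), robust = scale_cor(x, y))
```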
covRobust provides the
nearest neighbor variance estimation (NNVE) method of Wang and
Raftery (2002).
pam() (package cluster), implementing "partitioning around medoids", is partly robust (medoids
instead of the very non-robust means of k-means) but is not good enough,
as, e.g., the k clusters could consist of k-1 outliers and one
cluster for the bulk of the remaining data.
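The failure mode just described is easy to reproduce. In this sketch with simulated data, a single far-away point is added to one Gaussian cloud, and pam() with k = 2 spends one of its two clusters on that point alone.

```r
library(cluster)  # recommended package providing pam()

set.seed(3)
bulk <- matrix(rnorm(100), ncol = 2)  # 50 points from one Gaussian cloud
X <- rbind(bulk, c(100, 100))         # plus a single far-away outlier

fit <- pam(X, k = 2)
table(fit$clustering)  # one "cluster" contains only the outlier
```

Making the outlier a medoid removes its distance of roughly 140 from the objective, far more than splitting the bulk into two groups could save.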
"Truly" robust clustering is provided by packages
genie,
Gmedian,
otrimle (trimmed MLE model-based),
snipEM (snipping EM),
qclust (robust estimation of Gaussian mixtures), and
notably tclust (robust trimmed clustering).
See also the CRAN task views
Multivariate and
Cluster.
BACON()
(in robustX)
should be applicable for larger (n,p) than traditional robust
covariance based outlier detectors.
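For reference, the traditional robust-covariance-based detector can be sketched with MASS alone: compute squared Mahalanobis distances from a robust center and scatter (here cov.rob()'s default MVE fit), then flag points beyond a chi-squared quantile. The data are simulated with ten planted outliers, and the 0.975 quantile is a common but not the only choice of cutoff.

```r
library(MASS)

set.seed(11)
p <- 3
X <- matrix(rnorm(200 * p), ncol = p)  # 200 clean observations
X[1:10, ] <- X[1:10, ] + 8             # plant 10 outliers far from the bulk

rob <- cov.rob(X)                      # robust (MVE) location and scatter
d2  <- mahalanobis(X, center = rob$center, cov = rob$cov)
cutoff  <- qchisq(0.975, df = p)
flagged <- which(d2 > cutoff)
flagged
```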
OutlierDM detects outliers for replicated high-throughput data.
(See also the CRAN task view MachineLearning.)
Robust descriptive statistics are provided by boxplot.stats() and the
base R functions mentioned above.
For time series, runmed() provides highly robust running median filtering.
In econometric models, vcov(lmrob()) also uses a version of HAC
standard errors for its robustly estimated linear models.
See also the CRAN task view Econometrics.