Collection of convenient functions for common statistical computations, which are not directly provided by R's base or stats packages.
This package aims, first, to provide shortcuts for statistical measures that could otherwise only be calculated with additional effort (like standard errors or root mean squared errors).
Second, these shortcut functions are generic (where appropriate) and can be applied not only to vectors but also to other objects (e.g., the Coefficient of Variation can be computed for vectors, linear models, or linear mixed models; the r2()-function returns the r-squared value for lm, glm, merMod or lme objects).
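As a rough illustration of what such shortcuts compute, the base-R sketch below derives the standard error of the mean, the coefficient of variation of a numeric vector, and the root mean squared error of a linear model. The helper names se_mean(), cv_vec() and rmse_lm() are hypothetical and not part of sjstats; the package's own functions support more object types and may treat edge cases differently.

```r
# Hypothetical helpers for illustration only, not the sjstats API.

# Standard error of the mean: sd(x) / sqrt(n), dropping missing values
se_mean <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Coefficient of variation of a numeric vector: sd relative to the mean
cv_vec <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / mean(x)
}

# Root mean squared error of a fitted model, computed from its residuals
rmse_lm <- function(fit) {
  sqrt(mean(residuals(fit)^2))
}

fit <- lm(mpg ~ wt, data = mtcars)
se_mean(mtcars$mpg)
cv_vec(mtcars$mpg)
rmse_lm(fit)
```

Note that the RMSE here divides by n, so it is slightly smaller than the residual standard error reported by summary(fit), which divides by the residual degrees of freedom.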
Most functions of this package are designed as summary functions, i.e. they do not transform the input vector; rather, they return a summary, which is sometimes a vector and sometimes a tidy data frame. The focus of most functions lies on summary statistics or fit measures for regression models, including generalized linear models and mixed effects models. However, some of the functions deal with other statistical measures, like Cronbach's Alpha, Cramer's V, Phi etc.
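For the association measures mentioned above, Cramer's V is commonly written as sqrt(chi^2 / (n * (k - 1))), where k is the smaller of the number of rows and columns, and Phi as sqrt(chi^2 / n) for 2x2 tables. The base-R sketch below assumes the Pearson chi-squared statistic without continuity correction; the helper names are hypothetical, and sjstats may use different defaults.

```r
# Hypothetical helpers for illustration only, not the sjstats API.

# Pearson chi-squared statistic without Yates' continuity correction
chi2_stat <- function(tab) {
  unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)
}

# Cramer's V for an r x c contingency table
cramers_v_sketch <- function(tab) {
  k <- min(dim(tab)) - 1
  sqrt(chi2_stat(tab) / (sum(tab) * k))
}

# (Unsigned) Phi coefficient, defined here for 2x2 tables only
phi_sketch <- function(tab) {
  stopifnot(all(dim(tab) == 2))
  sqrt(chi2_stat(tab) / sum(tab))
}

tab <- table(mtcars$am, mtcars$vs)
cramers_v_sketch(tab)
phi_sketch(tab)
```

For a 2x2 table, k - 1 equals 1, so both measures coincide.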
The comprised tools include:
Furthermore, sjstats has functions to access information from model objects, which either support more model objects than their stats counterparts, or provide easy access to model attributes, like:
model_frame() to get the model frame,
link_inverse() to get the link-inverse function,
resp_var() to get the names of either the dependent or independent variables, or
var_names() to get the "cleaned" variable names from a model object ("cleaned" means that things like log() are removed from the returned character vector of variable names).
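For many (but not all) model classes, base R offers rough counterparts of these accessors: model.frame(), the linkinv component of family(), and all.vars() applied to the model formula. The sketch below shows them on a Poisson GLM; it only illustrates the idea and is not how model_frame(), link_inverse(), resp_var() or var_names() are implemented in sjstats.

```r
# Poisson GLM with a transformed predictor
fit <- glm(carb ~ log(hp), data = mtcars, family = poisson)

# Model frame: the response plus the transformed predictor column "log(hp)"
mf <- model.frame(fit)
names(mf)

# Inverse link function (exp() for the log link)
inv_link <- family(fit)$linkinv
inv_link(0.5)

# Variable names with transformations such as log() stripped off
vars <- all.vars(formula(fit))
vars        # "carb" "hp"
vars[1]     # name of the response
```

Classes without a model.frame() method or a family() method are the cases where the sjstats accessors are meant to help.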
To install the latest development snapshot (see latest changes below), type the following commands into the R console:
Please note the package dependencies when installing from GitHub. The GitHub version of this package may depend on the latest GitHub versions of my other packages, so if you encounter any problems, you may need to install those first. Here's the order for installing packages from GitHub:
To install the latest stable release from CRAN, type the following command into the R console:
If you want or need to cite my package, please use citation('sjstats') for citation information.
dplyr::select_helpers were updated to
var_names() now also cleans variable names from variables modeled with the mi() function (multiple imputation on the fly in brms).
out-argument, to print output to console, or as HTML table in the viewer or web browser.
tidy_stan() with more complex brmsfit-models.
typical_value() to prevent error for R-oldrel-Windows.
model_frame() now returns response values from models, which are in matrix form (bound with cbind()), as is.
grpmean(), where values instead of value labels were printed if some categories were not present in the data.
mcse() to compute the Monte Carlo standard error for
n_eff() to compute the effective sample size for
contrasts() from package emmeans to compute p-values, which correctly indicate whether the sub-group mean is significantly different from the total mean.
out-argument, to print output to console, or as HTML table in the viewer or web browser.
tidy_stan() now includes information on the Monte Carlo standard error.
link_inverse() now support Zelig-relogit-models.
typical_value() gets an explicit
model_frame() did not work properly for variables that were standardized with
weight.by-argument did not work in
scale_weights() to rescale design weights for multilevel models.
pca_rotate() to create tidy summaries of principal component analyses or rotated loadings matrices from PCA.
gmd() to compute Gini's mean difference.
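Gini's mean difference is the average absolute difference between all pairs of distinct observations, sum over i != j of |x_i - x_j| divided by n(n - 1). The sketch below, with hypothetical helper names, computes it once from the pairwise definition and once from the equivalent formula on the sorted values, which avoids building an n x n matrix; it is not necessarily how gmd() is implemented.

```r
# Hypothetical helpers for illustration only, not the sjstats API.

# Direct pairwise definition: O(n^2) memory because of the outer() matrix
gini_md_sketch <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  # outer() builds all pairwise differences; the diagonal (i == j) is 0
  sum(abs(outer(x, x, "-"))) / (n * (n - 1))
}

# Equivalent form on sorted values x_(1) <= ... <= x_(n):
#   GMD = 2 * sum_i (2i - n - 1) * x_(i) / (n * (n - 1))
gini_md_sorted <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  i <- seq_len(n)
  2 * sum((2 * i - n - 1) * x) / (n * (n - 1))
}

gini_md_sketch(c(1, 2, 3))  # 4/3
gini_md_sorted(c(1, 2, 3))  # 4/3
```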
is_prime() to check whether a number is a prime number or not.
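A simple way to test primality is trial division: after ruling out numbers below 2 and even numbers, n is prime if no odd d with d * d <= n divides it. The sketch below uses a hypothetical name and is not taken from the sjstats source.

```r
# Primality check by trial division (hypothetical helper, not the sjstats API)
is_prime_sketch <- function(n) {
  stopifnot(length(n) == 1, n == round(n))  # a single whole number
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)                   # 2 and 3
  if (n %% 2 == 0) return(FALSE)            # other even numbers
  d <- 3
  while (d * d <= n) {                      # only divisors up to sqrt(n) matter
    if (n %% d == 0) return(FALSE)
    d <- d + 2                              # odd candidates only
  }
  TRUE
}

Filter(is_prime_sketch, 1:30)
```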
tolerance-argument, to accept models with a ratio within a certain range of 1.
var_names() now also cleans variable names from variables modelled with the
brmsfit-objects (models fitted with the brms-package).
fun = "weighted.mean",
typical_value() now checks whether the vector of weights has the same length as
grpmean() now also prints the overall p-value from the model.
pred_accuracy() did not work for formulas with transforming functions for response terms, e.g.
model_frame() to get the model frame from model objects, including models that don't have an S3 model.frame()-method.
var_names() to get cleaned variable names from model objects.
link_inverse() to get the inverse link function from model objects.
typical_value() can now also be a named vector, to apply different functions for numeric and categorical variables.
tidy_stan() to return a tidy summary of Stan-models.
rope() now also work for brmsfit-models, from package brms.
rope() now have a type-argument, to return fixed, random or all effects for mixed effects models.
typical_value() gets a "zero"-option for the
icc(), which used stats::sigma() and thus required R version 3.3 or higher. Now it should depend on R 3.2 again.
se() now also supports
hdi() now also supports
ci.lvl-argument, to specify the level of the calculated confidence interval for standardized coefficients.
get_model_pval() is now deprecated. Please use
rope() to calculate the region of practical equivalence for MCMC samples.
grpmean() to compute mean values by groups (one-way ANOVA).
hdi() to compute highest density intervals (HDI) for MCMC samples.
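For a unimodal posterior, a highest density interval can be approximated from MCMC draws as the narrowest interval that contains a given share of the sorted draws. The base-R sketch below follows that common approach; hdi_sketch() is a hypothetical name, and sjstats' hdi() may differ in details such as how the number of draws inside the interval is rounded.

```r
# HDI from MCMC draws (hypothetical helper, not the sjstats API).
# Among all intervals containing ceiling(level * n) sorted draws,
# return the narrowest one; assumes a unimodal posterior.
hdi_sketch <- function(x, level = 0.9) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  k <- ceiling(level * n)        # number of draws inside the interval
  lower <- x[1:(n - k + 1)]      # candidate lower bounds
  upper <- x[k:n]                # matching upper bounds, k - 1 positions later
  i <- which.min(upper - lower)  # narrowest candidate
  c(lower = lower[i], upper = upper[i])
}

set.seed(123)
draws <- rgamma(4000, shape = 2)   # skewed stand-in for posterior draws
hdi_sketch(draws, level = 0.89)
quantile(draws, c(0.055, 0.945))   # equal-tailed interval, for comparison
```

For skewed draws like these, the HDI is shifted towards the mode compared with the equal-tailed quantile interval.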
find_beta2() to find the shape parameters of a Beta distribution.
find_cauchy() to find the parameters of a normal or Cauchy distribution.
typical_value() to return the typical value of a variable.
omega_sq() to compute (partial) eta-squared or omega-squared statistics, or Cohen's F for anova tables.
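For a one-way ANOVA, the two effect sizes can be read off base R's aov() table: eta^2 = SS_effect / SS_total and omega^2 = (SS_effect - df_effect * MS_error) / (SS_total + MS_error). The sketch below handles a single effect only and uses a hypothetical helper name; it is not the sjstats implementation, which also covers partial statistics and Cohen's F.

```r
# One-way ANOVA effect sizes (hypothetical helper, not the sjstats API):
#   eta^2   = SS_effect / SS_total
#   omega^2 = (SS_effect - df_effect * MS_error) / (SS_total + MS_error)
effect_sizes_sketch <- function(fit) {
  tab <- summary(fit)[[1]]                 # data frame: Df, Sum Sq, Mean Sq, ...
  ss <- tab[["Sum Sq"]]
  ms_error <- tab[["Mean Sq"]][nrow(tab)]  # the last row holds the residuals
  ss_total <- sum(ss)
  ss_effect <- ss[1]
  df_effect <- tab[["Df"]][1]
  c(eta_sq   = ss_effect / ss_total,
    omega_sq = (ss_effect - df_effect * ms_error) / (ss_total + ms_error))
}

fit <- aov(mpg ~ factor(cyl), data = mtcars)
effect_sizes_sketch(fit)
```

In a one-way design, eta-squared equals the R-squared of the corresponding linear model, while omega-squared is a less biased, smaller estimate.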
anova_stats() to compute a complete model summary, including (partial) eta-squared, omega-squared and Cohen's F statistics for anova tables, returned as a tidy data frame.
svy_md() as a convenient shortcut to compute the median for variables from survey designs.
is_singular() to check a model fit for singularity in case of post-fitting convergence warnings.
glm-objects is now based on log-likelihood methods and also accounts for count models.
svyglm.nb() now also prints the dispersion parameter Theta.
boot_ci() also displays CI based on sample quantiles.
std_beta() did not work for models with only one predictor.
pred_accuracy() now also reports the standard error of accuracy, and gets a print-method.
pred_accuracy() with cross-validation-method did not correctly account for the generated test data.
boot_est() to return the estimate from bootstrap replicates.
svyglm.nb()-objects now also prints confidence intervals.
se() did not work for icc()-objects when the mixed model had more than one random effect term.
cv_compare() to compute the root mean squared error for test and training data from cross-validation.
props() to calculate proportions in a vector, supporting multiple logical statements.
or_to_rr() to convert odds ratio estimates into risk ratio estimates.
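A widely used conversion (the Zhang-Yu formula) turns an odds ratio into a risk ratio given the risk p0 in the reference group: RR = OR / (1 - p0 + p0 * OR). The sketch below implements that formula under a hypothetical name; whether or_to_rr() uses exactly this formula is an assumption here.

```r
# Odds ratio to risk ratio via RR = OR / (1 - p0 + p0 * OR)
# (hypothetical helper; p0 is the outcome risk in the reference group)
or_to_rr_sketch <- function(or, p0) {
  stopifnot(all(p0 > 0 & p0 < 1))
  or / (1 - p0 + p0 * or)
}

# Cross-check against risks computed directly from the odds
p0 <- 0.5
odds1 <- 2 * p0 / (1 - p0)   # apply OR = 2 to the reference-group odds
p1 <- odds1 / (1 + odds1)
p1 / p0                      # risk ratio from the two risks: 4/3
or_to_rr_sketch(2, p0)       # same value from the formula
```

For rare outcomes (p0 close to 0), the risk ratio is close to the odds ratio; the gap widens as p0 grows.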
sm() to calculate mean, median or sum of a vector, but using na.rm = TRUE as default.
center() were removed and are now in the sjmisc-package.
svyglm.nb() to compute survey-weighted negative binomial regressions.
xtab_statistics() to compute various measures of association for contingency tables.
type-argument, which applies to generalized linear mixed models. You can now choose to compute either standard errors with delta-method approximation for fixed effects only, or standard errors for joint random and fixed effects.
prop() did not work for non-labelled data frames when used with grouped data frames.
svy() to compute robust standard errors for weighted models, adjusting the residual degrees of freedom to simulate sampling weights.
zero_count() to check whether a Poisson model is over- or underfitting zero-counts in the outcome.
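The idea behind such a check can be sketched by comparing the number of observed zeros with the number a fitted Poisson model expects, i.e. the sum of the predicted probabilities P(Y = 0) = exp(-mu_i). The helper below is hypothetical; the exact statistic and output of zero_count() may differ.

```r
# Observed vs. expected zeros for a Poisson GLM
# (hypothetical helper, not the sjstats API)
zero_check_sketch <- function(fit) {
  y <- model.response(model.frame(fit))
  mu <- fitted(fit)
  observed <- sum(y == 0)
  predicted <- sum(dpois(0, lambda = mu))   # expected number of zeros
  c(observed = observed, predicted = predicted, ratio = predicted / observed)
}

fit <- glm(count ~ spray, data = InsectSprays, family = poisson)
zero_check_sketch(fit)
```

A ratio well below 1 means the model predicts fewer zeros than observed (underfitting zeros); a ratio well above 1 means it predicts too many.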
pred_accuracy() to calculate accuracy of predictions from model fit.
outliers() to detect outliers in (generalized) linear models.
heteroskedastic() to check linear models for (non-)constant error variance.
autocorrelation() to check linear models for auto-correlated residuals.
normality() to check whether residuals in linear models are normally distributed or not.
multicollin() to check predictors in a model for multicollinearity.
check_assumptions() to run a set of model assumption checks.
prop() no longer works within dplyr's summarise() function. Instead, when used with grouped data frames, it now directly returns a summary of proportions as a tibble.
se() now computes adjusted standard errors for generalized linear (mixed) models, using the Taylor series-based delta method.
digits-argument to round the return value to a specific number of decimal places.
prop() to calculate proportion of values in a vector.
mse() to calculate the mean square error for models.
robust() to calculate robust standard errors and confidence intervals for regression models, returned as a tidy data frame.
split_half() to compute the split-half-reliability of tests or questionnaires.
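One common way to estimate split-half reliability is to correlate the sum scores of odd- and even-numbered items and then step that correlation up to full test length with the Spearman-Brown formula r_sb = 2r / (1 + r). The sketch below uses this odd/even split and hypothetical helper names; split_half() may split the items differently.

```r
# Hypothetical helpers for illustration only, not the sjstats API.
spearman_brown <- function(r) 2 * r / (1 + r)

split_half_sketch <- function(items) {
  items <- as.matrix(items)
  stopifnot(ncol(items) >= 2)
  odd  <- rowSums(items[, seq(1, ncol(items), by = 2), drop = FALSE])
  even <- rowSums(items[, seq(2, ncol(items), by = 2), drop = FALSE])
  r <- cor(odd, even)                  # correlation between the two halves
  c(r_halves = r, spearman_brown = spearman_brown(r))
}

# Simulated example: six items driven by one common latent trait
set.seed(42)
trait <- rnorm(200)
items <- sapply(1:6, function(i) trait + rnorm(200))
split_half_sketch(items)
```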
var_pop() to compute population variance and population standard deviation.
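Population variance divides the sum of squared deviations by n rather than n - 1, so it can be obtained by rescaling the sample variance returned by var(). The helper names below are hypothetical and only illustrate the formula.

```r
# Population variance and SD (hypothetical helpers, not the sjstats API)
var_pop_sketch <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  var(x) * (n - 1) / n   # rescale the (n - 1)-denominator sample variance
}

sd_pop_sketch <- function(x) sqrt(var_pop_sketch(x))

var_pop_sketch(c(1, 2, 3, 4))  # 1.25 (sample variance would be 5/3)
sd_pop_sketch(c(1, 2, 3, 4))
```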
se() now also computes the standard error from estimates (regression coefficients) and p-values.
get_model_pval() to return a tidy data frame (tibble) of model term names, p-values and standard errors from various regression model types.
se_ybar() to compute standard error of sample mean for mixed models, considering the effect of clustering on the standard error.
center() to standardize and center variables, supporting the pipe-operator.
se() now also computes the standard error for intraclass correlation coefficients, as returned by the
std_beta() now always returns a tidy data frame (tibble) with model term names, standardized estimate, standard error and confidence intervals.
r2() now also computes alternative omega-squared-statistics if a null model is given.