Collection of convenient functions for common statistical computations, which are not directly provided by R's base or stats packages.

This package aims at providing, **first**, shortcuts for statistical measures, which otherwise could only be calculated with additional effort (like standard errors or root mean squared errors).
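
As a rough illustration of the "additional effort" meant here, the following base-R sketch computes a standard error of the mean and a root mean squared error by hand. The helper names `se_mean()` and `rmse_manual()` are hypothetical and are not the package's functions; the package's own implementations may differ in detail.

```r
# Hand-written helpers (hypothetical names, base R only).

# Standard error of the mean: sd / sqrt(n), after dropping missing values
se_mean <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Root mean squared error of a fitted model: sqrt(mean(residuals^2))
rmse_manual <- function(model) {
  sqrt(mean(residuals(model)^2))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
se_mean(mtcars$mpg)
rmse_manual(fit)
```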

**Second**, these shortcut functions are generic (if appropriate), and can be applied not only to vectors, but also to other objects as well (e.g., the Coefficient of Variation can be computed for vectors, linear models, or linear mixed models; the `r2()`-function returns the r-squared value for *lm*, *glm*, *merMod* or *lme* objects).
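
To show what "generic" means in practice, here is a minimal S3 sketch of a coefficient of variation that dispatches on vectors and on linear models. The name `cv_sketch()` is hypothetical, and the model method (RMSE divided by the mean of the response) is one common definition that is assumed here, not taken from the package source.

```r
# Hypothetical S3 generic: one function name, several object types
cv_sketch <- function(x, ...) UseMethod("cv_sketch")

# Vectors: standard deviation divided by the mean
cv_sketch.default <- function(x, ...) {
  x <- x[!is.na(x)]
  sd(x) / mean(x)
}

# Linear models (assumed definition): RMSE divided by the mean of the response
cv_sketch.lm <- function(x, ...) {
  response <- model.response(model.frame(x))
  sqrt(mean(residuals(x)^2)) / mean(response)
}

cv_sketch(mtcars$mpg)
cv_sketch(lm(mpg ~ wt, data = mtcars))
```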

Most functions of this package are designed as *summary functions*, i.e. they do not transform the input vector; rather, they return a summary, which is sometimes a vector and sometimes a tidy data frame. The focus of most functions lies on summary statistics or fit measures for regression models, including generalized linear models and mixed effects models. However, some of the functions deal with other statistical measures, like Cronbach's Alpha, Cramer's V, Phi etc.

The included tools cover:

- For regression and mixed models: Coefficient of Variation, Root Mean Squared Error, Residual Standard Error, Coefficient of Discrimination, R-squared and pseudo-R-squared values, standardized beta values
- Especially for mixed models: Design effect, ICC, sample size calculation and convergence tests
- Fit and accuracy measures for regression models: Overdispersion tests, accuracy of predictions, test/training-error comparisons
- For anova-tables: Eta-squared, Partial Eta-squared and Omega-squared statistics

Furthermore, *sjstats* has functions to access information from model objects, which either support more model objects than their *stats* counterparts, or provide easy access to model attributes, like:

- `model_frame()` to get the model frame,
- `link_inverse()` to get the link-inverse function,
- `pred_vars()` and `resp_var()` to get the names of the independent or the dependent variables, respectively,
- `var_names()` to get the "cleaned" variable names from a model object ("cleaned" means that things like `s()` or `log()` are removed from the returned character vector of variable names).
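
For comparison, the sketch below retrieves the same kind of information from a plain `glm` object using only base R and *stats*. It is meant to show what these accessors return, under the assumption of a simple model; it does not cover the wider range of model classes that the package functions support.

```r
fit <- glm(am ~ log(wt) + hp, data = mtcars, family = binomial())

# Model frame: the data actually used for fitting
mf <- model.frame(fit)

# Inverse link function: maps the linear predictor back to the response scale
linkinv <- family(fit)$linkinv
linkinv(0)  # logit link: a linear predictor of 0 corresponds to probability 0.5

# Response and predictor names; all.vars() drops wrappers such as log()
resp  <- all.vars(formula(fit)[[2]])
preds <- all.vars(formula(fit)[[3]])
resp
preds
```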

Other statistics:

- Cramer's V, Cronbach's Alpha, Mean Inter-Item-Correlation, Mann-Whitney-U-Test, Item-scale reliability tests
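
As an example of one of these measures, Cramer's V can be derived from the chi-squared statistic of a contingency table as sqrt(chi² / (n · (min(rows, cols) − 1))). The helper `cramers_v_sketch()` below is a hypothetical base-R sketch of that textbook formula, not the package's implementation.

```r
# Cramer's V from a contingency table (hypothetical helper, base R only)
cramers_v_sketch <- function(tab) {
  # Chi-squared statistic without continuity correction;
  # warnings about small expected counts are suppressed for this sketch
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)
  k <- min(dim(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

tab <- table(mtcars$cyl, mtcars$am)
cramers_v_sketch(tab)
```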

To install the latest development snapshot (see latest changes below), type the following commands into the R console:

`library(devtools)`

`devtools::install_github("strengejacke/sjstats")`

Please note the package dependencies when installing from GitHub. The GitHub version of this package may depend on the latest GitHub versions of my other packages, so you may need to install those first if you encounter any problems. Here's the order for installing packages from GitHub:

sjlabelled → sjmisc → sjstats → ggeffects → sjPlot

To install the latest stable release from CRAN, type the following command into the R console:

`install.packages("sjstats")`

In case you want / have to cite my package, please use `citation('sjstats')` for citation information.

- Remove unused imports.
- Cross references from `dplyr::select_helpers` were updated to `tidyselect::select_helpers`.

- `var_names()` now also cleans variable names from variables modeled with the `mi()` function (multiple imputation on the fly in *brms*).
- `reliab_test()` gets an `out`-argument, to print output to console, or as HTML table in the viewer or web browser.

- Fix issues with `mcse()`, `n_eff()` and `tidy_stan()` with more complex *brmsfit*-models.
- Fix issue in `typical_value()` to prevent error for R-oldrel-Windows.
- `model_frame()` now returns response values from models, which are in matrix form (bound with `cbind()`), as is.
- Fixed issues in `grpmean()`, where values instead of value labels were printed if some categories were not present in the data.

- Beautiful colored output for `grpmean()` and `mwu()`.

- `mcse()` to compute the Monte Carlo standard error for `stanreg`- and `brmsfit`-models.
- `n_eff()` to compute the effective sample size for `stanreg`- and `brmsfit`-models.

- `grpmean()` now uses `contrasts()` from package *emmeans* to compute p-values, which correctly indicate whether the sub-group mean is significantly different from the total mean.
- `grpmean()` gets an `out`-argument, to print output to console, or as HTML table in the viewer or web browser.
- `tidy_stan()` now includes information on the Monte Carlo standard error.
- `model_frame()`, `p_value()` and `link_inverse()` now support Zelig-relogit-models.
- `typical_value()` gets an explicit `weight.by`-argument.

- `model_frame()` did not work properly for variables that were standardized with `scale()`.
- In certain cases, `weight.by`-argument did not work in `grpmean()`.

- Remove deprecated `get_model_pval()`.
- Revised documentation for `overdisp()`.

- `scale_weights()` to rescale design weights for multilevel models.
- `pca()` and `pca_rotate()` to create tidy summaries of principal component analyses or rotated loadings matrices from PCA.
- `gmd()` to compute Gini's mean difference.
- `is_prime()` to check whether a number is a prime number or not.
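
Gini's mean difference, mentioned above, is the average absolute difference between all pairs of distinct observations. The hypothetical helper `gmd_sketch()` below is a base-R sketch of that definition and makes no claim about how `gmd()` is implemented.

```r
# Gini's mean difference: mean of |x_i - x_j| over all pairs with i != j
gmd_sketch <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- abs(outer(x, x, "-"))  # n x n matrix of absolute pairwise differences
  sum(d) / (n * (n - 1))      # the diagonal is zero, so only i != j contributes
}

gmd_sketch(mtcars$mpg)
```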

- `link_inverse()` now supports `brmsfit`, `multinom` and `clm`-models.
- `p_value()` now supports `polr` and `multinom`-models.
- `zero_count()` gets a `tolerance`-argument, to accept models with a ratio within a certain range of 1.
- `var_names()` now also cleans variable names from variables modelled with the `offset()`, `lag()` or `diff()` function.
- `icc()`, `re_var()` and `get_re_var()` now support `brmsfit`-objects (models fitted with the *brms*-package).
- For `fun = "weighted.mean"`, `typical_value()` now checks if vector of weights is of same length as `x`.
- The print-method for `grpmean()` now also prints the overall p-value from the model.

- `resp_val()`, `cv_error()` and `pred_accuracy()` did not work for formulas with transforming function for response terms, e.g. `log(response)`.

- Fixed examples, to resolve issues with CRAN package checks.
- More model objects supported in `p_value()`.

- `model_frame()` to get the model frame from model objects, also of those models that don't have an S3-generic `model.frame()`-function.
- `var_names()` to get cleaned variable names from model objects.
- `link_inverse()` to get the inverse link function from model objects.

- The `fun`-argument in `typical_value()` can now also be a named vector, to apply different functions for numeric and categorical variables.

- Fixed issue with specific model formulas in `pred_vars()`.
- Fixed issue with specific model objects in `resp_val()`.
- Fixed issue with nested models in `re_var()`.

- `tidy_stan()` to return a tidy summary of Stan-models.

- `hdi()` and `rope()` now also work for `brmsfit`-models, from package *brms*.
- `hdi()` and `rope()` now have a `type`-argument, to return fixed, random or all effects for mixed effects models.

- `typical_value()` gets a "zero"-option for the `fun`-argument.
- Changes to `icc()`, which used `stats::sigma()` and thus required R-version 3.3 or higher. Now should depend on R 3.2 again.
- `se()` now also supports `stanreg` and `stanfit` objects.
- `hdi()` now also supports `stanfit`-objects.
- `std_beta()` gets a `ci.lvl`-argument, to specify the level of the calculated confidence interval for standardized coefficients.
- `get_model_pval()` is now deprecated. Please use `p_value()` instead.

- `rope()` to calculate the region of practical equivalence for MCMC samples.
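
The idea behind a region of practical equivalence can be sketched as the share of posterior draws that fall inside a chosen interval around zero. The helper `rope_share()` and the interval limits below are hypothetical, and simulated normal draws stand in for real MCMC samples; the actual `rope()` function may report its results differently.

```r
# Share of samples inside a region of practical equivalence (hypothetical helper)
rope_share <- function(samples, rope = c(-0.1, 0.1)) {
  mean(samples >= rope[1] & samples <= rope[2])
}

set.seed(123)
draws <- rnorm(4000, mean = 0.3, sd = 0.2)  # simulated stand-in for MCMC draws
rope_share(draws)
```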

- Added vignettes for various functions.
- Fixed issue with latest tidyr-update on CRAN.

- `grpmean()` to compute mean values by groups (One-way Anova).
- `hdi()` to compute high density intervals (HDI) for MCMC samples.
- `find_beta()` and `find_beta2()` to find the shape parameters of a Beta distribution.
- `find_normal()` and `find_cauchy()` to find the parameters of a normal or Cauchy distribution.

- `typical_value()`, to return the typical value of a variable.
- `eta_sq()`, `cohens_f()` and `omega_sq()` to compute (partial) eta-squared or omega-squared statistics, or Cohen's F for anova tables.
- `anova_stats()` to compute a complete model summary, including (partial) eta-squared, omega-squared and Cohen's F statistics for anova tables, returned as tidy data frame.
- `svy_md()` as convenient shortcut to compute the median for variables from survey designs.
- `is_singular()` to check a model fit for singularity in case of post-fitting convergence warnings.
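
For orientation, eta-squared and partial eta-squared can be read off the sums of squares of an anova table: eta² = SS_effect / SS_total and partial eta² = SS_effect / (SS_effect + SS_residual). The following base-R sketch applies these textbook formulas to a sequential anova table; it illustrates the statistics only and is not the code behind `eta_sq()` or `anova_stats()`.

```r
fit <- lm(mpg ~ factor(cyl) + factor(am), data = mtcars)
tab <- anova(fit)  # sequential (type I) anova table

# Sums of squares, named by model term
ss <- setNames(tab[["Sum Sq"]], rownames(tab))
ss_resid   <- unname(ss["Residuals"])
ss_effects <- ss[names(ss) != "Residuals"]

eta_sq_manual         <- ss_effects / sum(ss)
partial_eta_sq_manual <- ss_effects / (ss_effects + ss_resid)

eta_sq_manual
partial_eta_sq_manual
```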

- Computation of `r2()` for `glm`-objects is now based on log-Likelihood methods and also accounts for count models.
- Better `print()`-method for `overdisp()`.
- `print()`-method for `svyglm.nb()` now also prints the dispersion parameter Theta.
- `overdisp()` now supports `glmmTMB`-objects.
- `boot_ci()` also displays CI based on sample quantiles.

- `std_beta()` did not work for models with only one predictor.

- `icc()`, `re_var()` and `get_re_var()` now support `glmmTMB`-objects.
- `pred_accuracy()` now also reports the standard error of accuracy, and gets a print-method.

- `pred_accuracy()` with cross-validation-method did not correctly account for the generated test data.
- Fixed issue with calculation in `smpsize_lmm()` and `se_ybar()`.

- Revised imports: Labelled data functions from package *sjmisc* have been moved to package *sjlabelled*.

- `boot_est()` to return the estimate from bootstrap replicates.

- The `print()`-method for `svyglm.nb()`-objects now also prints confidence intervals.

- `se()` did not work for `icc()`-objects, when the mixed model had more than one random effect term.

- `cv_error()` and `cv_compare()` to compute the root mean squared error for test and training data from cross-validation.
- `props()` to calculate proportions in a vector, supporting multiple logical statements.
- `or_to_rr()` to convert odds ratio estimates into risk ratio estimates.
- `mn()`, `md()` and `sm()` to calculate mean, median or sum of a vector, but using `na.rm = TRUE` as default.
- S3-generics for `svyglm.nb`-models: `family()`, `print()`, `formula()`, `model.frame()` and `predict()`.
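
The odds-ratio-to-risk-ratio conversion listed above is commonly done with the formula RR = OR / (1 − p0 + p0 · OR), where p0 is the risk in the unexposed (baseline) group. The helper `or_to_rr_sketch()` below is a hypothetical base-R sketch of that formula; whether `or_to_rr()` uses exactly this approach and argument naming is an assumption.

```r
# Convert an odds ratio into a risk ratio, given the baseline risk p0
# (hypothetical helper; formula: RR = OR / (1 - p0 + p0 * OR))
or_to_rr_sketch <- function(or, p0) {
  or / (1 - p0 + p0 * or)
}

# With a rare outcome the risk ratio stays close to the odds ratio;
# with a common outcome the two diverge
or_to_rr_sketch(or = 2, p0 = 0.01)
or_to_rr_sketch(or = 2, p0 = 0.5)
```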

- Fixed error in computation of `mse()`.

- Functions `std()` and `center()` were removed and are now in the sjmisc-package.

- `svyglm.nb()` to compute survey-weighted negative binomial regressions.
- `xtab_statistics()` to compute various measures of association for contingency tables.
- Added S3-`model.frame()`-function for `gee`-models.

- `se()` gets a `type`-argument, which applies to generalized linear mixed models. You can now choose to compute either standard errors with delta-method approximation for fixed effects only, or standard errors for joint random and fixed effects.

- `prop()` did not work for non-labelled data frames when used with grouped data frames.

- `svy()` to compute robust standard errors for weighted models, adjusting the residual degrees of freedom to simulate sampling weights.
- `zero_count()` to check whether a poisson-model is over- or underfitting zero-counts in the outcome.
- `pred_accuracy()` to calculate accuracy of predictions from model fit.
- `outliers()` to detect outliers in (generalized) linear models.
- `heteroskedastic()` to check linear models for (non-)constant error variance.
- `autocorrelation()` to check linear models for auto-correlated residuals.
- `normality()` to check whether residuals in linear models are normally distributed or not.
- `multicollin()` to check predictors in a model for multicollinearity.
- `check_assumptions()` to run a set of model assumption checks.

- `prop()` no longer works within dplyr's `summarise()` function. Instead, when now used with grouped data frames, a summary of proportions is directly returned as tibble.
- `se()` now computes adjusted standard errors for generalized linear (mixed) models, using the Taylor series-based delta method.
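
To illustrate the delta method mentioned above: for an exponentiated coefficient exp(b), a first-order Taylor expansion gives SE(exp(b)) ≈ exp(b) · SE(b). The base-R sketch below applies this to a logistic regression; it shows the general idea under that assumption and is not a reproduction of what `se()` does internally.

```r
fit <- glm(am ~ wt, data = mtcars, family = binomial())

est  <- coef(summary(fit))
b    <- est[, "Estimate"]     # coefficients on the log-odds scale
se_b <- est[, "Std. Error"]   # their standard errors

odds_ratio <- exp(b)
# First-order delta method: derivative of exp(b) is exp(b)
se_or <- odds_ratio * se_b

cbind(odds_ratio, se_or)
```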

- Package depends on R-version >= 3.3.

- `prop()` gets a `digits`-argument to round the return value to a specific number of decimal places.

- Largely revised the documentation.

- `prop()` to calculate proportion of values in a vector.
- `mse()` to calculate the mean square error for models.
- `robust()` to calculate robust standard errors and confidence intervals for regression models, returned as tidy data frame.

- `split_half()` to compute the split-half-reliability of tests or questionnaires.
- `sd_pop()` and `var_pop()` to compute population variance and population standard deviation.
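
The difference between the population variance and the sample variance returned by base R's `var()` is only the denominator (n instead of n − 1). The helpers `var_pop_sketch()` and `sd_pop_sketch()` below are hypothetical base-R sketches of that relationship, not the package's code.

```r
# Population variance: rescale the sample variance from n - 1 to n
var_pop_sketch <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  var(x) * (n - 1) / n
}

# Population standard deviation
sd_pop_sketch <- function(x) sqrt(var_pop_sketch(x))

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
var_pop_sketch(x)
sd_pop_sketch(x)
```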

- `se()` now also computes the standard error from estimates (regression coefficients) and p-values.

- Added S3-`print`-method for `mwu()`-function.
- `get_model_pval()` to return a tidy data frame (tibble) of model term names, p-values and standard errors from various regression model types.
- `se_ybar()` to compute standard error of sample mean for mixed models, considering the effect of clustering on the standard error.
- `std()` and `center()` to standardize and center variables, supporting the pipe-operator.

- `se()` now also computes the standard error for intraclass correlation coefficients, as returned by the `icc()`-function.
- `std_beta()` now always returns a tidy data frame (tibble) with model term names, standardized estimate, standard error and confidence intervals.
- `r2()` now also computes alternative omega-squared-statistics, if null model is given.