Create Tidy Data Frames of Marginal Effects for 'ggplot' from Model Outputs

Compute marginal effects at the mean or average marginal effects from statistical models and return the results as tidy data frames. These data frames are ready to use with the 'ggplot2'-package. Marginal effects can be calculated for many different models. Interaction terms, splines and polynomial terms are also supported. The two main functions are ggpredict() and ggaverage(); in addition, there are some convenient wrapper-functions, especially for polynomials or interactions. There is a generic plot()-method to plot the results using 'ggplot2'.


This package computes marginal effects at the mean or average marginal effects from statistical models and returns the result as tidy data frames. These data frames are ready to use with the ggplot2-package. Marginal effects can be calculated for many different models. Currently supported model-objects are: lm, glm, glm.nb, lme, lmer, glmer, glmer.nb, nlmer, glmmTMB, gam (package mgcv), vgam, gamm, gamm4, multinom, betareg, truncreg, coxph, gls, gee, plm, lrm, polr, clm, zeroinfl, hurdle, stanreg, brmsfit, svyglm and svyglm.nb. Other models not listed here are passed to a generic predict-function and might work as well. If they do not, they may still work with ggeffect(), which effectively does the same as ggpredict().

Interaction terms, splines and polynomial terms are also supported. The two main functions are ggpredict() and ggaverage(); in addition, there are some convenient wrapper-functions, especially for polynomials or interactions. There is a generic plot()-method to plot the results using ggplot2.

Examples

The returned data frames always have the same, consistent structure and column names, so it's easy to create ggplot-plots without the need to re-write the function call. x and predicted are the values for the x- and y-axis. conf.low and conf.high could be used as ymin and ymax aesthetics for ribbons to add confidence bands to the plot. group can be used as grouping-aesthetics, or for faceting.

ggpredict() requires at least one, but not more than three, terms specified in the terms-argument. Predicted values of the response are calculated along the values of the first term, optionally grouped by the other terms specified in terms.

data(efc)
fit <- lm(barthtot ~ c12hour + neg_c_7 + c161sex + c172code, data = efc)

ggpredict(fit, terms = "c12hour")
#> # A tibble: 62 × 6
#>        x predicted conf.low conf.high  group
#>    <dbl>     <dbl>    <dbl>     <dbl> <fctr>
#> 1      4  74.43040 72.33073  76.53006      1
#> 2      5  74.17710 72.09831  76.25588      1
#> 3      6  73.92379 71.86555  75.98204      1
#> 4      7  73.67049 71.63242  75.70857      1
#> 5      8  73.41719 71.39892  75.43546      1
#> 6      9  73.16389 71.16504  75.16275      1
#> 7     10  72.91059 70.93076  74.89042      1
#> 8     11  72.65729 70.69608  74.61850      1
#> 9     12  72.40399 70.46098  74.34700      1
#> 10    14  71.89738 69.98948  73.80529      1
#> # ... with 52 more rows
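
To illustrate what these columns hold, the following is a rough base-R sketch of predictions along one term, with the remaining predictors held at their mean. It is an approximation of the idea under stated assumptions, not the package's implementation: it uses the built-in mtcars data instead of efc, tidy_predictions() is a hypothetical helper name, and it only handles unweighted linear models with numeric predictors.

```r
# Hypothetical helper, not part of ggeffects: predictions of a linear model
# along one numeric term, with the other numeric predictors held at their mean.
tidy_predictions <- function(model, term) {
  mf <- stats::model.frame(model)
  predictors <- names(mf)[-1]  # the first column of the model frame is the response

  # one row per unique observed value of the focal term
  newdata <- data.frame(x = sort(unique(mf[[term]])))
  names(newdata) <- term

  # hold all remaining predictors constant at their mean
  for (p in setdiff(predictors, term)) {
    newdata[[p]] <- mean(mf[[p]])
  }

  pr <- stats::predict(model, newdata = newdata, interval = "confidence")

  # same column names as described above (without the group column)
  data.frame(
    x = newdata[[term]],
    predicted = pr[, "fit"],
    conf.low = pr[, "lwr"],
    conf.high = pr[, "upr"]
  )
}

fit_cars <- lm(mpg ~ wt + hp, data = mtcars)
head(tidy_predictions(fit_cars, "wt"))
```

The resulting data frame can be passed to ggplot() in the same way as the output of ggpredict().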

A possible call to ggplot could look like this:

library(ggplot2)
mydf <- ggpredict(fit, terms = "c12hour")
ggplot(mydf, aes(x, predicted)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = .1)

However, there is also a plot()-method. This method uses convenient defaults, to easily create the most suitable plot for the marginal effects.

mydf <- ggpredict(fit, terms = "c12hour")
plot(mydf)

plot() offers only a few, but useful, arguments, so it's easy to use.

With three variables, predictions can be grouped and faceted.

ggpredict(fit, terms = c("c12hour", "c172code", "c161sex"))
#> # A tibble: 372 × 7
#>        x predicted  conf.low conf.high                           group      facet
#>    <dbl>     <dbl>     <dbl>     <dbl>                          <fctr>     <fctr>
#> 1      4  74.70073  72.38031  77.02114 intermediate level of education [2] Female
#> 2      4  73.98237  70.45711  77.50763          low level of education [2] Female
#> 3      4  75.41908  71.91747  78.92070         high level of education [2] Female
#> 4      4  73.65930  70.08827  77.23033 intermediate level of education   [1] Male
#> 5      4  72.94094  68.38540  77.49649          low level of education   [1] Male
#> 6      4  74.37766  70.05658  78.69874         high level of education   [1] Male
#> 7      5  74.44742  72.14644  76.74841 intermediate level of education [2] Female
#> 8      5  73.72907  70.21926  77.23888          low level of education [2] Female
#> 9      5  75.16578  71.67430  78.65726         high level of education [2] Female
#> 10     5  73.40600  69.84575  76.96625 intermediate level of education   [1] Male
#> # ... with 362 more rows

mydf <- ggpredict(fit, terms = c("c12hour", "c172code", "c161sex"))
ggplot(mydf, aes(x = x, y = predicted, colour = group)) +
  stat_smooth(method = "lm", se = FALSE) +
  facet_wrap(~facet)

plot() works for this case, as well.

There are some more features, which are explained in more detail in the package-vignette.

Adding support for more model classes

The package is easy to extend with support for other model objects. The only requirement is that the following methods are available: predict(), model.frame() and family(). If model objects do not support these methods, you may implement workarounds (see below).

The following code needs to be revised to add further model objects:

  • file utils_model_function.R, function get_model_function() needs a line to specify whether the new model can be considered as linear or generalized linear model.
  • file utils_model_function.R, function get_predict_function() needs a line to specify the class.
  • finally, in the file predictions.R, add a line to select_prediction_method() to call the right prediction-method, and add a method get_predictions_<class>(), if one of the existing prediction-methods does not fit the needs of the new model object.

When the model object does not support one of predict(), model.frame() or family(), you may add workarounds:

  • if the model does not have a family()-function, a workaround has to be added to get_glm_family() in the file utils_model_family.R.
  • if the model does not have a model.frame()-function with standard arguments or return values, a workaround has to be added to get_model_frame() in the file utils_model_frame.R.
  • if the model does not have a predict()-function, a workaround has to be added to get_predictions_<class>() in the file predictions.R.
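
Before adding a new model class, it may help to check which of the three required methods already work for it. The following is a minimal sketch in base R; check_required_methods() is a hypothetical helper and not part of ggeffects. It simply tries each generic on the fitted model and reports whether the call succeeded.

```r
# Hypothetical helper, not part of ggeffects: report which of the required
# generics can be called on a fitted model object without an error.
check_required_methods <- function(model) {
  # the argument is evaluated lazily, so errors are caught inside tryCatch()
  works <- function(expr) {
    tryCatch({
      force(expr)
      TRUE
    }, error = function(e) FALSE)
  }

  c(
    predict = works(stats::predict(model)),
    model.frame = works(stats::model.frame(model)),
    family = works(stats::family(model))
  )
}

fit_cars <- lm(mpg ~ wt + hp, data = mtcars)
check_required_methods(fit_cars)
```

Any entry that is FALSE points to the workaround that would be needed from the list above.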

Installation

Latest development build

To install the latest development snapshot (see latest changes below), type the following commands into the R console:

library(devtools)
devtools::install_github("strengejacke/ggeffects")

Please note the package dependencies when installing from GitHub. The GitHub version of this package may depend on latest GitHub versions of my other packages, so you may need to install those first, if you encounter any problems. Here's the order for installing packages from GitHub:

sjlabelled → sjmisc → sjstats → ggeffects → sjPlot
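
The installation order could be scripted as in the following sketch. The "strengejacke/" prefix for the other packages is an assumption based on the ggeffects repository name, and install_in_order() is a hypothetical helper. The installer is passed as a parameter, so the order can be checked with a dry run that does not need network access.

```r
# Installation order as listed above; the "strengejacke/" prefix for the
# other packages is an assumption based on the ggeffects repository name.
pkgs <- c("sjlabelled", "sjmisc", "sjstats", "ggeffects", "sjPlot")
repos <- paste0("strengejacke/", pkgs)

# Hypothetical helper: call the installer for each repository in turn.
# The default installer is only looked up when it is actually used.
install_in_order <- function(repos, installer = devtools::install_github) {
  for (repo in repos) {
    installer(repo)
  }
  invisible(repos)
}

# Dry run: only print what would be installed, in order.
install_in_order(repos, installer = function(repo) message("would install ", repo))
```

Calling install_in_order(repos) with the default installer performs the actual installation and requires the devtools-package.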

Official, stable release


To install the latest stable release from CRAN, type the following command into the R console:

install.packages("ggeffects")

Citation

If you want or need to cite my package, please use citation('ggeffects') for citation information.

News

ggeffects 0.3.1

General

  • Use convert_case() from sjlabelled, in preparation for the latest snakecase-package update.

Bug fixes

  • Model weights are now correctly taken into account.

ggeffects 0.3.0

General

  • Support for brmsfit-models from the brms-package.
  • Support for clm-models from the ordinal-package.
  • Support for multinom-models from the nnet-package.
  • Posterior predictive distributions (see argument ppd) now compute uncertainty intervals also for non-gaussian models.
  • Use functions from package sjstats (link inverse, model frame etc.).
  • If the regression model used weights, ggpredict() now computes the weighted mean as typical value for predictors that are held constant.
  • Use select-helpers from package tidyselect, instead of dplyr.

New functions

  • New summary() function, to provide information on predictions by grouping variables, and on constant values from adjustments.

Changes to functions

  • plot() gets a show.legend-argument to show or hide the legend of plots.

Bug fixes

  • Fixed issues with gam- and vgam-models.

ggeffects 0.2.2

Changes to functions

  • plot() gets a dot.alpha-argument, to specify a different alpha-value for data points when plotting raw data.
  • plot() gets a jitter-argument, to add a small amount of random variation to the location of data points when plotting raw data.
  • plot() and getter-functions (like get_title() or get_x_labels()) get a case-argument, to convert labels into any case, using the snakecase-package.
  • Confidence intervals are now also computed for hurdle, zeroinfl, truncreg and betareg-models. Note, however, that due to some uncertainty, the intervals may not be "smooth".

Bug fixes

  • Confidence intervals for generalized mixed effects models are now computed properly.
  • Different levels for confidence intervals (argument ci.lvl) were not always recognized.
  • Fixed issues with glmmTMB-models.
  • Fixed issues with lme-models.
  • Fixed issue when plotting data returned from ggeffect(), if the term in question was categorical.

ggeffects 0.2.1

General

  • Support for stanreg models (pkg rstanarm).
  • Fixed issue with latest tidyr-update on CRAN.

Bug fixes

  • Plotting raw data with plot() did not work for predictions at specific values (i.e. when certain levels of a predictor were selected in square brackets).
  • Computing predictions for merMod-objects did not work when the model had only one fixed effects term.

ggeffects 0.2.0

General

  • Updated package imports and dependencies.
  • Support for polr models (pkg MASS).
  • Support for hurdle and zeroinfl models (pkg pscl).
  • Support for betareg models (pkg betareg).
  • Support for truncreg models (pkg truncreg).
  • Support for coxph models (pkg survival).

New functions

  • emm() as a convenient shortcut to compute the estimated marginal mean of the model's response value.

Changes to functions

  • plot() gets a use.theme-argument, to use the default ggeffects-theme, or to use the default ggplot-theme.

Bug fixes

  • Fixed issues with columns or column names for models that used special functions in the formula (e.g. s() for gam-models, or bs() for splines).
  • Fixed issue with wrong legend values when the grouping term was a non-labelled factor with non-ordered numeric levels.

ggeffects 0.1.1

Changes to functions

  • ggpredict() computes proper confidence intervals for merMod- and lme-objects.
  • Improved plot()-method, to better plot raw data.

Bug fixes

  • Confidence intervals were not properly calculated for glm's.
  • For plot(), argument rawdata did not work for models with discrete binary response.
  • Fixed issues with models of class lme and glmmTMB.
  • Fixed issues with model-classes that preserved NA-values in model-frame.

ggeffects 0.1.0

General

  • initial release
