Marginal Effects for Model Objects

An R port of Stata's 'margins' command, which can be used to calculate marginal (or partial) effects from model objects.


margins is an effort to port the marginal effect estimation functionality of Stata's (closed source) margins command to R. This is done through a single function, margins(), which is an S3 generic method for calculating the marginal effects (or "partial effects") of covariates included in model objects (like those of classes "lm" and "glm").

Underlying functions marginal_effects() and dydx() provide low-level interfaces to the numerical differentiation, and the package implements several useful additional features for summarizing model objects, including:

  • A plot() method for the new "margins" class that ports Stata's marginsplot command.
  • A persp() method for "lm", "glm", and "loess" objects to provide three-dimensional representations of response surfaces.
  • An image() method for the same that produces flat, two-dimensional heatmap-style representations of response surfaces.
  • A plotting function cplot() to provide the commonly needed visual summaries of predictions or marginal effects conditional on a second variable.

Users interested in generating predicted (fitted) values, such as the "predictive margins" generated by Stata's margins command, should consider using prediction() from the prediction package.

Motivation

With the introduction of Stata's margins command, it has become incredibly simple to estimate average marginal effects (i.e., "average partial effects") and marginal effects at representative cases. Indeed, in just a few lines of Stata code, regression results for almost any kind model can be transformed into meaningful quantities of interest and related plots:

. import delimited mtcars.csv
. quietly reg mpg c.cyl##c.hp wt
. margins, dydx(*)
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         cyl |   .0381376   .5998897     0.06   0.950    -1.192735     1.26901
          hp |  -.0463187    .014516    -3.19   0.004     -.076103   -.0165343
          wt |  -3.119815    .661322    -4.72   0.000    -4.476736   -1.762894
------------------------------------------------------------------------------
. marginsplot

marginsplot

Stata's margins command is incredibly robust. It works with nearly any kind of statistical model and estimation procedure, including OLS, generalized linear models, panel regression models, and so forth. It also represents a significant improvement over Stata's previous marginal effects command - mfx - which was subject to various well-known bugs. While other Stata modules have provided functionality for deriving quantities of interest from regression estimates (e.g., Clarify), none has done so with the simplicity and genearlity of margins.

By comparison, R has no robust functionality in the base tools for drawing out marginal effects from model estimates (though the S3 predict() methods implement some of the functionality for computing fitted/predicted values). Nor do any add-on packages implement appropriate marginal effect estimates. Notably, several packages provide estimates of marginal effects for different types of models. Among these are car, alr3, mfx, erer, among others. Unfortunately, none of these packages implement marginal effects correctly (i.e., correctly account for interrelated variables such as interaction terms (e.g., a:b) or power terms (e.g., I(a^2)) and the packages all implement quite different interfaces for different types of models. interplot and plotMElm provide functionality simply for plotting quantities of interest from multiplicative interaction terms in models but do not appear to support general marginal effects displays (in either tabular or graphical form), while visreg provides a more general plotting function but no tabular output. interactionTest provides some additional useful functionality for controlling the false discovery rate when making such plots and interpretations, but is again not a general tool for marginal effect estimation.

Given the challenges of interpreting the contribution of a given regressor in any model that includes quadratic terms, multiplicative interactions, a non-linear transformation, or other complexities, there is a clear need for a simple, consistent way to estimate marginal effects for popular statistical models. This package aims to correctly calculate marginal effects that include complex terms and provide a uniform interface for doing those calculations. Thus, the package implements a single S3 generic method (margins()) that can be easily generalized for any type of model implemented in R.

Some technical details of the package are worth briefly noting. The estimation of marginal effects relies on numerical approximations of derivatives produced using predict() (actually, a wrapper around predict() called prediction() that is type-safe). Variance estimation, by default is provided using the delta method a numerical approximation of the Jacobian matrix. While symbolic differentiation of some models (e.g., basic linear models) is possible using D() and deriv(), R's modelling language (the "formula" class) is sufficiently general to enable the construction of model formulae that contain terms that fall outside of R's symbolic differentiation rule table (e.g., y ~ factor(x) or y ~ I(FUN(x)) for any arbitrary FUN()). By relying on numeric differentiation, margins() supports any model that can be expressed in R formula syntax. Even Stata's margins command is limited in its ability to handle variable transformations (e.g., including x and log(x) as predictors) and quadratic terms (e.g., x^3); these scenarios are easily expressed in an R formula and easily handled, correctly, by margins().

Simple code examples

Replicating Stata's results is incredibly simple using just the margins() method to obtain average marginal effects:

library("margins")
x <- lm(mpg ~ cyl * hp + wt, data = mtcars)
(m <- margins(x))
## Average marginal effects
## lm(formula = mpg ~ cyl * hp + wt, data = mtcars)
##      cyl       hp    wt
##  0.03814 -0.04632 -3.12
summary(m)
##  factor     AME     SE       z      p   lower   upper
##     cyl  0.0381 0.5999  0.0636 0.9493 -1.1376  1.2139
##      hp -0.0463 0.0145 -3.1909 0.0014 -0.0748 -0.0179
##      wt -3.1198 0.6613 -4.7176 0.0000 -4.4160 -1.8236

With the exception of differences in rounding, the above results match identically what Stata's margins command produces. Using the plot() method also yields an aesthetically similar result to Stata's marginsplot:

plot(m)

plot of chunk marginsplot

If you are only interested in obtaining the marginal effects (without corresponding variances or the overhead of creating a "margins" object), you can call marginal_effects(x) directly. Furthermore, the dydx() function enables the calculation of the marginal effect of a single named variable:

# all marginal effects, as a data.frame
head(marginal_effects(x))
##     dydx_cyl     dydx_hp   dydx_wt
## 1 -0.6572244 -0.04987248 -3.119815
## 2 -0.6572244 -0.04987248 -3.119815
## 3 -0.9794364 -0.08777977 -3.119815
## 4 -0.6572244 -0.04987248 -3.119815
## 5  0.5747624 -0.01196519 -3.119815
## 6 -0.7519926 -0.04987248 -3.119815
# marginal effect of one variable
head(dydx(mtcars, x, "hp"))
##       dydx_hp
## 1 -0.04987248
## 2 -0.04987248
## 3 -0.08777977
## 4 -0.04987248
## 5 -0.01196519
## 6 -0.04987248

These functions may be useful, for example, for plotting, or getting a quick impression of the results.

While there is still work to be done to improve performance, margins is reasonably speedy:

library("microbenchmark")
microbenchmark(marginal_effects(x))
## Unit: milliseconds
##                 expr      min       lq     mean   median       uq      max neval
##  marginal_effects(x) 11.67015 11.98566 13.39177 12.31466 13.59553 63.25713   100
microbenchmark(margins(x))
## Unit: milliseconds
##        expr      min       lq     mean   median       uq      max neval
##  margins(x) 88.40953 91.17482 95.45244 92.99871 96.37431 183.5197   100

In addition to the estimation procedures and plot() generic, margins offers several plotting methods for model objects. First, there is a new generic cplot() that displays predictions or marginal effects (from an "lm" or "glm" model) of a variable conditional across values of third variable (or itself). For example, here is a graph of predicted probabilities from a logit model:

m <- glm(am ~ wt*drat, data = mtcars, family = binomial)
cplot(m, x = "wt", se.type = "shade")

plot of chunk cplot1

And fitted values with a factor independent variable:

cplot(lm(Sepal.Length ~ Species, data = iris))

plot of chunk cplot2

and a graph of the effect of drat across levels of wt:

cplot(m, x = "wt", dx = "drat", what = "effect", se.type = "shade")
## Warning in warn_for_weights(model): 'weights' used in model estimation are currently ignored!

plot of chunk cplot3

Second, the package implements methods for "lm" and "glm" class objects for the persp() generic plotting function. This enables three-dimensional representations of predicted outcomes:

persp(x, xvar = "cyl", yvar = "hp")

plot of chunk persp1

and marginal effects:

persp(x, xvar = "cyl", yvar = "hp", what = "effect", nx = 10)

plot of chunk persp2

And if three-dimensional plots aren't your thing, there are also analogous methods for the image() generic, to produce heatmap-style representations:

image(x, xvar = "cyl", yvar = "hp", main = "Predicted Fuel Efficiency,\nby Cylinders and Horsepower")

plot of chunk image11

The numerous package vignettes and help files contain extensive documentation and examples of all package functionality.

Requirements and Installation

CRAN Build Status Build status codecov.io Project Status: Active - The project has reached a stable, usable state and is being actively developed.

The development version of this package can be installed directly from GitHub using ghit:

if (!require("ghit")) {
    install.packages("ghit")
    library("ghit")
}
install_github("leeper/prediction")
install_github("leeper/margins")
 
# building vignettes takes a moment, so for a quicker install set:
install_github("leeper/margins", build_vignettes = FALSE)

News

margins 0.3.0

  • Significantly modified the data structure returned by margins(). It now returns a data frame with an added at attribute, specifying the names of the variables that have been fixed by build_datalist(). (#58)
  • Renamed marginal effects, variance, and standard error columns returned by margins(). Marginal effects columns are prefixed by dydx_. Variances of the average marginal effect are stored (repeatedly, across observations) in new Var_dydx_ columns. Unit-specific standard errors, if requested, are stored as SE_dydx_ columns. (#58)
  • summary.margins() now returns a single data frame of marginal effect estimates. Column names have also changed to avoid use of special characters (thus making it easier to use column names in plotting with, for example, ggplot2). Row-order can be controlled by the by_factor attribute, which by default sorts the data frame by the factor/term. If set to by_factor = FALSE, the data frame is sorted by the at variables. This behavior cascades into the print.summary.margins() method. (#58)
  • print.margins() now presents (but does not return) effect estimates as a condensed data frame with some auxiliary information. Its behavior when using at is improved and tidied. (#58)
  • build_margins() is no longer exported. Arguments used to control its behavior have been exposed in margins() methods.
  • plot.margins() now displays marginal effects across each level of at. (#58)
  • build_margins() and thus margins() no longer returns the original data twice (a bug introduced by change in behavior of prediction()). (#57)
  • All methods for objects of class "marginslist" have been removed. (#58)
  • The at argument in plot.margins() has been renamed to pos, to avoid ambiguity with at as used elsewhere in the package.
  • persp() and image() methods gain a dx argument (akin to that in cplot()) to allow visualization of marginal effects of a variable across levels of two other variables. The default behavior remains unchanged.
  • Cleaned up documentation and add some examples.

margins 0.2.26

  • Added support for "merMod" models from lme4, though no variance estimation is currently supported.
  • Imported prediction::mean_or_mode() for use in cplot() methods.

margins 0.2.25

  • cplot.polr() now allows the display of "stacked" (cumulative) predicted probabilities. (#49)
  • Added an example of cplot(draw = "add") to display predicted probabilities across a third factor variable. (#46)
  • Moved the build_datalist() and seq_range() functions to the prediction package.
  • A tentative cplot.multinom() method has been added.

margins 0.2.24

  • The internal code of cplot.lm() has been refactored so that the actual plotting code now relies in non-exported utility functions, which can be used in other methods. This should make it easier to maintain existing methods and add new ones. (#49)
  • A new cplot() method for objects of class "polr" has been added (#49).

margins 0.2.23

  • The extract_marginal_effects() function has been removed and replaced by marginal_effects() methods for objects of classes "margins" and "marginslist".
  • Added a dependency on prediction v.0.1.3 and, implicitly, an enhances suggestion of survey v3.31-5 to resolve an underlying prediction() issue for models of class "svyglm". (#47, h/t Carl Ganz)

margins 0.2.20

  • A warning is now issued when a model uses weights, indicating that they are ignored. (#4)
  • Various errors and warnings that occurred when applying margins() to a model with weights have been fixed.
  • cplot() now issues an error when attempting to display the effects of a factor (with > 2 levels).

margins 0.2.20

  • Fixed a bug in get_effect_variances(vce = "bootstrap"), wherein the variance of the marginal effects was always zero.

margins 0.2.20

  • Factored the prediction() generic and methods into a separate package, prediction, to ease maintainence.
  • Added a print.summary.margins() method to separate construction of the summary data frame the printing thereof.
  • The "Technical Details" vignette now describes the package functionality and computational approach in near-complete detail.

margins 0.2.19

  • Plotting functions cplot(), persp(), and image() gain a vcov argumetn to pass to `build_margins(). (#43)
  • cplot() now allows for the display of multiple conditional relationships by setting draw = "add". (#32)
  • The package Introduction vignette has improved examples, including ggplot2 examples using cplot() data. (#31)

margins 0.2.18

  • Added support in dydx.default() to allow the calculation of various discrete changes rather than only numerical derivatives.

margins 0.2.17

  • Fixes to handling of factors and ordered variables converted within formulae. (#38)
  • Reconfigured the data argument in margins() and prediction() to be clearer about what is happening when it is set to missing.

margins 0.2.16

  • Switched to using a more reliable "central difference" numerical differentiation and updated the calculation of the step size to follow marfx (#31, h/t Jeffrey Arnold)
  • Added some functionality prediction() methods to, hopefully, reduce memory footprint of model objects. (#26)
  • Changed the capitalization of the variances field in "margins" objects (to lower case), for consistency.
  • Fixed some small errors in documentation and improved width of examples.

margins 0.2.15

  • Expose previously internal dydx() generic and methods to provide variable-specific marginal effects calculations. (#31)
  • Added example dataset from marfx package. (#31)

margins 0.2.13

  • Added support for calculating marginal effects of logical terms, treating them as factors. (#31)

margins 0.2.12

  • Added an image() method for "lm", "glm", and "loess" objects, as a flat complement to existing persp() methods. (#42)

margins 0.2.11

  • Added a prediction() method for "gls" objects (from MASS::gls()). (#3)

margins 0.2.10

  • Replaced numDeriv::jacobian() with an internal alternative. (#41)

margins 0.2.8

  • Added a prediction() method for "ivreg" objects (from AER::ivreg()). (#3)
  • Added a prediction() method for "survreg" objects (from survival::survreg()). (#3)

margins 0.2.7

  • Added a prediction() method for "polr" objects (from MASS::polr()). (#3)
  • Added a prediction() method for "coxph" objects (from survival::coxph()). (#3)

margins 0.2.7

  • marginal_effects() and prediction() are now S3 generics, with methods for "lm" and "glm" objects, improving extensability. (#39, #40)
  • prediction() returns a new class ("prediction") and gains a print() method.
  • Added preliminary support for "loess" objects, including methods for prediction(), marginal_effects(), cplot(), and persp(). No effect variances are currently calculated. (#3)
  • Added a prediction() method for "nls" objects. (#3)
  • Internal function get_effect_variances() gains a "none" option for the vce argument, to skip calculation of ME variances.

margins 0.2.7

  • marginal_effects() issues a warning (rather than fails) when trying to extract the marginal effect of a factor variable that was coerced to numeric in a model formula via I(). (#38)

margins 0.2.5

  • Added better support for factor x variables in cplot().
  • Added (rudimentary) tests of variance methods. (#21)
  • Removed .build_predict_fun() factory function, as it was no longer needed.
  • Fix vignettes so package can be built with them. (#16)

margins 0.2.4

  • Modified marginal_effects() to use a vectorized approach to simple numerical differentiation. (#36/#37, h/t Vincent Arel-Bundock)
  • Removed margins.plm() method, which didn't actually work because "plm" does not provide a predict() method.
  • Updated Stata/R comparison documents included in inst/doc.
  • Expanded tests of unit-specific variances. (#21)

margins 0.2.3

  • Added a logical argument to enable/disable calculation of unit-specific marginal effect variances and set it to FALSE by default. (#36, h/t Vincent Arel-Bundock)

margins 0.2.2

  • Removed support for "marginal effects at means" (MEMs) and the atmeans argument throughout package. (#35)
  • Renamed the vc argument to vcov for consistency with other packages. (#34)

margins 0.2.1

  • build_margins() now returns columns containing unit-specific standard errors of marginal effects.
  • Added a vc argument to build_margins() to allow the passing of arbitrary variance-covariance matrices. (#16, h/t Alex Coppock & Gijs Schumacher)
  • cplot() now draws confidence intervals for "effect" plots.
  • Fixed a bug in get_marginal_effects() wherein the method argument was ignored. This improves performance significantly when using method = "simple" (the default differentiation method).

margins 0.2.0

  • Added persp() methods for "lm" and "glm" class objects to display 3-dimensional representations of predicted values and marginal effects.
  • Added plot.margins() method for mimicking Stata's marginsplot behavior.
  • Added cplot() generic and methods for "lm" and "glm" class objects to display conditional predictions and conditional marginal effects in the style of the interplot and plotMElm packages.
  • Added various variance estimation procedures for marginal effects: delta method (the default), bootstrap, and simulation (ala Clarify).
  • Fixed estimation of marginal effect variances for generalized linear models, so that they are correct on both "link" and "response" scales.
  • Exposed two internal marginal effect estimation functions. First, build_margins() is called by margins() methods (perhaps repeatedly) and actually assembles a "margins" object from a model and data. It is never necessary to call this directly, but may be useful for very simple marginal effect estimation procedures (i.e., using original data with no at specification). Second, marginal_effects() is the very low level function that differentiates a model with respect to some input data (or calculate discrete changes in the outcome with respect to factor variables). This is the fastest way to obtain marginal effects without the overhead of creating a "margins" object (for which variance estimation is fairly time-consuming).
  • Implemented estimation of "discrete change" representations of marginal effects of factor variables in models, ala Stata's default settings.
  • Re-implemented marginal effects estimation using numeric derivatives provided by numDeriv::grad() rather than symbolic differentiation. This allows margins() to handle almost any model that can be specified in R, including models that cannot be specified in Stata.
  • Used compiler to byte compile prediction and gradient fucntions, thereby improving estimation speed.
  • The internal build_datalist() now checks for specification of illegal factor levels in at and errors when these are encountered.
  • Use the webuse package to handle examples.

margins 0.1.0

  • Initial package released.

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("margins")

0.3.23 by Thomas J. Leeper, a month ago


https://github.com/leeper/margins


Report a bug at https://github.com/leeper/margins/issues


Browse source code at https://github.com/cran/margins


Authors: Thomas J. Leeper [aut, cre] (<https://orcid.org/0000-0003-4097-6326>>), Jeffrey Arnold [ctb], Vincent Arel-Bundock [ctb]


Documentation:   PDF Manual  


Task views: Econometrics


MIT + file LICENSE license


Imports utils, stats, prediction, data.table, graphics, grDevices, MASS

Suggests methods, knitr, rmarkdown, testthat, ggplot2, gapminder, sandwich, stargazer, lme4

Enhances AER, betareg, nnet, ordinal, survey


Imported by konfound.

Suggested by MarginalMediation, estimatr.


See at CRAN