Marginal Effects for Model Objects

An R port of Stata's 'margins' command, which can be used to calculate marginal (or partial) effects from model objects.

Marginal Effects for Model Objects

The margins and prediction packages are a combined effort to port the functionality of Stata's (closed source) margins command to (open source) R. The major functionality of margins - namely the estimation of marginal (or partial) effects - is provided through a single function, margins(). This is an S3 generic method for calculating the marginal effects of covariates included in model objects (like those of classes "lm" and "glm"). Users interested in generating predicted (fitted) values, such as the "predictive margins" generated by Stata's margins command, should consider using prediction() from the sibling project, prediction.

Motivation

With the introduction of Stata's margins command, it has become incredibly simple to estimate average marginal effects (i.e., "average partial effects") and marginal effects at representative cases. Indeed, in just a few lines of Stata code, regression results for almost any kind model can be transformed into meaningful quantities of interest and related plots:

. import delimited mtcars.csv
. quietly reg mpg c.cyl##c.hp wt
. margins, dydx(*)
------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
cyl |   .0381376   .5998897     0.06   0.950    -1.192735     1.26901
hp |  -.0463187    .014516    -3.19   0.004     -.076103   -.0165343
wt |  -3.119815    .661322    -4.72   0.000    -4.476736   -1.762894
------------------------------------------------------------------------------
. marginsplot Stata's margins command is incredibly robust. It works with nearly any kind of statistical model and estimation procedure, including OLS, generalized linear models, panel regression models, and so forth. It also represents a significant improvement over Stata's previous marginal effects command - mfx - which was subject to various well-known bugs. While other Stata modules have provided functionality for deriving quantities of interest from regression estimates (e.g., Clarify), none has done so with the simplicity and genearlity of margins.

By comparison, R has no robust functionality in the base tools for drawing out marginal effects from model estimates (though the S3 predict() methods implement some of the functionality for computing fitted/predicted values). The closest approximation is modmarg, which does one-variable-at-a-time estimation of marginal effects is quite robust. Other than this relatively new package on the scene, no packages implement appropriate marginal effect estimates. Notably, several packages provide estimates of marginal effects for different types of models. Among these are car, alr3, mfx, erer, among others. Unfortunately, none of these packages implement marginal effects correctly (i.e., correctly account for interrelated variables such as interaction terms (e.g., a:b) or power terms (e.g., I(a^2)) and the packages all implement quite different interfaces for different types of models. interflex, interplot, and plotMElm provide functionality simply for plotting quantities of interest from multiplicative interaction terms in models but do not appear to support general marginal effects displays (in either tabular or graphical form), while visreg provides a more general plotting function but no tabular output. interactionTest provides some additional useful functionality for controlling the false discovery rate when making such plots and interpretations, but is again not a general tool for marginal effect estimation.

Given the challenges of interpreting the contribution of a given regressor in any model that includes quadratic terms, multiplicative interactions, a non-linear transformation, or other complexities, there is a clear need for a simple, consistent way to estimate marginal effects for popular statistical models. This package aims to correctly calculate marginal effects that include complex terms and provide a uniform interface for doing those calculations. Thus, the package implements a single S3 generic method (margins()) that can be easily generalized for any type of model implemented in R.

Some technical details of the package are worth briefly noting. The estimation of marginal effects relies on numerical approximations of derivatives produced using predict() (actually, a wrapper around predict() called prediction() that is type-safe). Variance estimation, by default is provided using the delta method a numerical approximation of the Jacobian matrix. While symbolic differentiation of some models (e.g., basic linear models) is possible using D() and deriv(), R's modelling language (the "formula" class) is sufficiently general to enable the construction of model formulae that contain terms that fall outside of R's symbolic differentiation rule table (e.g., y ~ factor(x) or y ~ I(FUN(x)) for any arbitrary FUN()). By relying on numeric differentiation, margins() supports any model that can be expressed in R formula syntax. Even Stata's margins command is limited in its ability to handle variable transformations (e.g., including x and log(x) as predictors) and quadratic terms (e.g., x^3); these scenarios are easily expressed in an R formula and easily handled, correctly, by margins().

Simple code examples

Replicating Stata's results is incredibly simple using just the margins() method to obtain average marginal effects:

## lm(formula = mpg ~ cyl * hp + wt, data = mtcars)

##      cyl       hp    wt
##  0.03814 -0.04632 -3.12
##  factor     AME     SE       z      p   lower   upper
##     cyl  0.0381 0.5999  0.0636 0.9493 -1.1376  1.2139
##      hp -0.0463 0.0145 -3.1909 0.0014 -0.0748 -0.0179
##      wt -3.1198 0.6613 -4.7176 0.0000 -4.4160 -1.8236

With the exception of differences in rounding, the above results match identically what Stata's margins command produces.

If you are only interested in obtaining the marginal effects (without corresponding variances or the overhead of creating a "margins" object), you can call marginal_effects(x) directly. Furthermore, the dydx() function enables the calculation of the marginal effect of a single named variable:

##     dydx_cyl     dydx_hp   dydx_wt
## 1 -0.6572244 -0.04987248 -3.119815
## 2 -0.6572244 -0.04987248 -3.119815
## 3 -0.9794364 -0.08777977 -3.119815
## 4 -0.6572244 -0.04987248 -3.119815
## 5  0.5747624 -0.01196519 -3.119815
## 6 -0.7519926 -0.04987248 -3.119815
##     dydx_cyl     dydx_hp
## 1 -0.6572244 -0.04987248
## 2 -0.6572244 -0.04987248
## 3 -0.9794364 -0.08777977
## 4 -0.6572244 -0.04987248
## 5  0.5747624 -0.01196519
## 6 -0.7519926 -0.04987248
##     dydx_cyl
## 1 -0.6572244
## 2 -0.6572244
## 3 -0.9794364
## 4 -0.6572244
## 5  0.5747624
## 6 -0.7519926

These functions may be useful for plotting, getting a quick impression of the results, or for using unit-specific marginal effects in further analyses.

Counterfactual Datasets (at) and Subgroup Analyses

The package also implement's one of the best features of margins, which is the at specification that allows for the estimation of average marginal effects for counterfactual datasets in which particular variables are held at fixed values:

##     factor     AME     SE       z      p   lower   upper
##        age  0.0096 0.0008 12.3763 0.0000  0.0081  0.0112
##      group -0.0479 0.0129 -3.7044 0.0002 -0.0733 -0.0226
##  treatment  0.0432 0.0147  2.9320 0.0034  0.0143  0.0720
##     factor age     AME     SE       z      p   lower  upper
##  treatment  20 -0.0009 0.0043 -0.2061 0.8367 -0.0093 0.0075
##  treatment  30  0.0034 0.0107  0.3199 0.7490 -0.0176 0.0245
##  treatment  40  0.0301 0.0170  1.7736 0.0761 -0.0032 0.0634
##  treatment  50  0.0990 0.0217  4.5666 0.0000  0.0565 0.1415
##  treatment  60  0.1896 0.0384  4.9341 0.0000  0.1143 0.2649

This functionality removes the need to modify data before performing such calculations, which can be quite unwieldy when many specifications are desired.

If one desires subgroup effects, simply pass a subset of data to the data argument:

##     factor     AME     SE       z      p   lower   upper
##        age  0.0043 0.0007  5.7723 0.0000  0.0028  0.0057
##      group -0.0753 0.0105 -7.1745 0.0000 -0.0959 -0.0547
##  treatment  0.0381 0.0070  5.4618 0.0000  0.0244  0.0517
##     factor     AME     SE       z      p   lower  upper
##        age  0.0150 0.0013 11.5578 0.0000  0.0125 0.0176
##      group -0.0206 0.0236 -0.8742 0.3820 -0.0669 0.0256
##  treatment  0.0482 0.0231  2.0909 0.0365  0.0030 0.0934

Plotting and Visualization

The package implements several useful additional features for summarizing model objects, including:

• A plot() method for the new "margins" class that ports Stata's marginsplot command.
• A plotting function cplot() to provide the commonly needed visual summaries of predictions or average marginal effects conditional on a covariate.
• A persp() method for "lm", "glm", and "loess" objects to provide three-dimensional representations of response surfaces or marginal effects over two covariates.
• An image() method for the same that produces flat, two-dimensional heatmap-style representations of persp()-type plots.

Using the plot() method yields an aesthetically similar result to Stata's marginsplot:

##  factor     AME     SE        z      p   lower   upper
##  agegrp  0.0846 0.0021  39.4392 0.0000  0.0804  0.0888
##     bmi  0.0261 0.0009  28.4995 0.0000  0.0243  0.0279
##     sex -0.0911 0.0085 -10.7063 0.0000 -0.1077 -0.0744 In addition to the estimation procedures and plot() generic, margins offers several plotting methods for model objects. First, there is a new generic cplot() that displays predictions or marginal effects (from an "lm" or "glm" model) of a variable conditional across values of third variable (or itself). For example, here is a graph of predicted probabilities from a logit model: And fitted values with a factor independent variable: and a graph of the effect of drat across levels of wt: cplot() also returns a data frame of values, so that it can be used just for calculating quantities of interest before plotting them with another graphics package, such as ggplot2:

##   xvals  yvals  upper   lower factor
##  1.5130 0.3250 1.3927 -0.7427   drat
##  1.6760 0.3262 1.1318 -0.4795   drat
##  1.8389 0.3384 0.9214 -0.2447   drat
##  2.0019 0.3623 0.7777 -0.0531   drat
##  2.1648 0.3978 0.7110  0.0846   drat
##  2.3278 0.4432 0.7074  0.1789   drat Second, the package implements methods for "lm" and "glm" class objects for the persp() generic plotting function. This enables three-dimensional representations of predicted outcomes: and marginal effects: And if three-dimensional plots aren't your thing, there are also analogous methods for the image() generic, to produce heatmap-style representations: The numerous package vignettes and help files contain extensive documentation and examples of all package functionality.

Performance

While there is still work to be done to improve performance, margins is reasonably speedy:

## Unit: milliseconds
##                    expr      min       lq     mean   median       uq      max neval
##  marginal_effects(mod1) 3.391895 3.516197 3.989595 3.821821 4.294132 6.779228   100
## Unit: milliseconds
##           expr      min       lq    mean   median       uq      max neval
##  margins(mod1) 24.45411 25.36547 31.3643 26.24414 29.35435 161.8154   100

The most computationally expensive part of margins() is variance estimation. If you don't need variances, use marginal_effects() directly or specify margins(..., vce = "none").

Requirements and Installation

The development version of this package can be installed directly from GitHub using remotes:

margins 0.3.23

• Fix a small issue in print() and summary() methods related to the release of prediction 0.3.6.

margins 0.3.22

• Expanded support for objects of class "merMod" from lme4, including support for variance estimation and an expanded test suite. (#56)

margins 0.3.21

• Modified the internals of gradient_factory() to be more robust to an expanded set of model classes through the introduction of an internal function reset_coefs(). A test suite for this function has been added.

margins 0.3.20

• Added support for objects of class "ivreg" from AER.
• margins.default() now attempts to calculate marginal effect variances in order to, by default, support additional model classes.

margins 0.3.19

• Added support for objects of class "betareg" from betareg. (#90)

margins 0.3.18

• margins() now returns attributes "vcov" and "jacobian" (the latter only when vce = "delta"), which contain the full variance-covariance matrix for the average marginal effects and jacobian for the same. This is different behavior from the previous draft (v0.3.17) because the attributes now always contain a single matrix; again use the vcov() method rather than accessing the attribute directly lest it change in the future. This allows calculation combination of marginal effects, such as the difference between two AMEs. Some internal functions have been renamed and code reorganized to make this possible. (#87, h/t Trenton Mize)
• The "at" attribute returned by margins() now contains the input value passed to the at argument to the function. New attribute "at_vars" returns a character vector of variables specified therein.
• The data frame returned by margins() now contains an added column "_at_number", which specifies which at combination a row comes from. This may be changed or removed in the future, but is useful for matching subsets of the data frame to corresponding entries in the "vcov" and "jacobian" matrices.

margins 0.3.17

• margins() now returns an attribute ("vcov") containing the variance-covariance matrix for the average marginal effects and a new vcov.margins() method is provided for extracting it. Behavior when using at specifications is unspecified and may change in the future. (#87, h/t Trenton Mize)
• Updated examples in README.Rmd. (#83)

margins 0.3.16

• Fixed a bug in cplot() when xvar was of class "ordered". (#77, h/t Francisco Llaneras)
• Fixed a bug in plot.margins() when at contained only one variable. (#78, h/t @cyberbryce)

margins 0.3.15

• Tried to improve the handling of edge case model specifications like y ~ I(x^2), y ~ x + I(2*x), and those involving RHS interactions between factors where some cells are not observed in the data. Added a test suite to cover these cases. (#82)
• Continued to update behavior of internal function find_terms_in_model().

margins 0.3.14

• Fixed a bug in survey-weighted objects involving weights and expanded the test suite to cover these cases.

margins 0.3.13

• Fixed a bug in all functions (ultimately in internal utility clean_terms()) that occurred when formulae contained variables with backticked names that contained spaces. (#80)

margins 0.3.12

• dydx() now uses the performance-enhancing prediction::prediction(..., calculate_se = FALSE) setting, where possible (introduced in prediction 0.2.4)
• data.table::rbindlist() is used instead of base::rbind() inside dydx().

margins 0.3.11

• Changed some internal representations from data frames to matrices in an effort to improve performance. marginal_effects() and dydx() gain an as.data.frame argument to regulate the class of their responses.
• Internal calls to prediction::prediction() were halved by stacking data frames used in calculating numerical derivatives (inside dydx() methods) and then splitting the resulting predicted value vectors.

margins 0.3.10

• Added an (internal use only) argument, varslist, to marginal_effects() and several internal functions that significantly improves performance. The performance gain is due to computational cost of identifying terms in model formulae each time marginal_effects() was called, which occurred repeatedly (e.g., during variance estimation). By performing this once at the margins()-level and passing the argument throughout, margins() is perhaps twice as fast as in versions <= 0.3.9. But, importantly, note that this argument should not be specified by end users!
• Some internal edits were made to the formula-processing functions find_terms_in_model() and clean_terms(), removing many regex calls with the goal of improving performance.
• Removed compiler dependency, which appeared to not improve performance.

margins 0.3.9

• Fixed a bug wherein model formulae involving non-standard variables names with spaces in them led to errors. (#80)

margins 0.3.8

• Added method for "svyglm" from survey.
• Improved handling of survey-weighted estimates. Removed weight-related warnings from margins() for unweighted models.
• print() and summary() now handle survey-weighted marginal effects.

margins 0.3.7

• margins() and marginal_effects() gain a variables argument to request marginal effects for a subset of variables included in a model. (#65, h/t Vincent Arul-Bundock)

margins 0.3.6

• Export margins.merMod(). (#56)

margins 0.3.5

• Added a cplot.clm() method. (#63, h/t David Barron)

margins 0.3.4

• Fixed a bug in cplot.polr(). (#62, h/t David Barron)

margins 0.3.3

• Fixed "margins" object structure in margins.merMod().
• Switched print() and summary() methods to using weighted.mean() instead of mean(). (#45)

margins 0.3.2

• Added method for class "polr" from MASS. (#60)

margins 0.3.1

• Added method for class "nnet" from nnet as an initial implementation of multi-category outcome models. (#60)

margins 0.3.0

• Significantly modified the data structure returned by margins(). It now returns a data frame with an added at attribute, specifying the names of the variables that have been fixed by build_datalist(). (#58)
• Renamed marginal effects, variance, and standard error columns returned by margins(). Marginal effects columns are prefixed by dydx_. Variances of the average marginal effect are stored (repeatedly, across observations) in new Var_dydx_ columns. Unit-specific standard errors, if requested, are stored as SE_dydx_ columns. (#58)
• summary.margins() now returns a single data frame of marginal effect estimates. Column names have also changed to avoid use of special characters (thus making it easier to use column names in plotting with, for example, ggplot2). Row-order can be controlled by the by_factor attribute, which by default sorts the data frame by the factor/term. If set to by_factor = FALSE, the data frame is sorted by the at variables. This behavior cascades into the print.summary.margins() method. (#58)
• print.margins() now presents (but does not return) effect estimates as a condensed data frame with some auxiliary information. Its behavior when using at is improved and tidied. (#58)
• build_margins() is no longer exported. Arguments used to control its behavior have been exposed in margins() methods.
• plot.margins() now displays marginal effects across each level of at. (#58)
• build_margins() and thus margins() no longer returns the original data twice (a bug introduced by change in behavior of prediction()). (#57)
• All methods for objects of class "marginslist" have been removed. (#58)
• The at argument in plot.margins() has been renamed to pos, to avoid ambiguity with at as used elsewhere in the package.
• persp() and image() methods gain a dx argument (akin to that in cplot()) to allow visualization of marginal effects of a variable across levels of two other variables. The default behavior remains unchanged.
• Cleaned up documentation and add some examples.

margins 0.2.26

• Added support for "merMod" models from lme4, though no variance estimation is currently supported.
• Imported prediction::mean_or_mode() for use in cplot() methods.

margins 0.2.25

• cplot.polr() now allows the display of "stacked" (cumulative) predicted probabilities. (#49)
• Added an example of cplot(draw = "add") to display predicted probabilities across a third factor variable. (#46)
• Moved the build_datalist() and seq_range() functions to the prediction package.
• A tentative cplot.multinom() method has been added.

margins 0.2.24

• The internal code of cplot.lm() has been refactored so that the actual plotting code now relies in non-exported utility functions, which can be used in other methods. This should make it easier to maintain existing methods and add new ones. (#49)
• A new cplot() method for objects of class "polr" has been added (#49).

margins 0.2.23

• The extract_marginal_effects() function has been removed and replaced by marginal_effects() methods for objects of classes "margins" and "marginslist".
• Added a dependency on prediction v.0.1.3 and, implicitly, an enhances suggestion of survey v3.31-5 to resolve an underlying prediction() issue for models of class "svyglm". (#47, h/t Carl Ganz)

margins 0.2.20

• A warning is now issued when a model uses weights, indicating that they are ignored. (#4)
• Various errors and warnings that occurred when applying margins() to a model with weights have been fixed.
• cplot() now issues an error when attempting to display the effects of a factor (with > 2 levels).

margins 0.2.20

• Fixed a bug in get_effect_variances(vce = "bootstrap"), wherein the variance of the marginal effects was always zero.

margins 0.2.20

• Factored the prediction() generic and methods into a separate package, prediction, to ease maintainence.
• Added a print.summary.margins() method to separate construction of the summary data frame the printing thereof.
• The "Technical Details" vignette now describes the package functionality and computational approach in near-complete detail.

margins 0.2.19

• Plotting functions cplot(), persp(), and image() gain a vcov argumetn to pass to `build_margins(). (#43)
• cplot() now allows for the display of multiple conditional relationships by setting draw = "add". (#32)
• The package Introduction vignette has improved examples, including ggplot2 examples using cplot() data. (#31)

margins 0.2.18

• Added support in dydx.default() to allow the calculation of various discrete changes rather than only numerical derivatives.

margins 0.2.17

• Fixes to handling of factors and ordered variables converted within formulae. (#38)
• Reconfigured the data argument in margins() and prediction() to be clearer about what is happening when it is set to missing.

margins 0.2.16

• Switched to using a more reliable "central difference" numerical differentiation and updated the calculation of the step size to follow marfx (#31, h/t Jeffrey Arnold)
• Added some functionality prediction() methods to, hopefully, reduce memory footprint of model objects. (#26)
• Changed the capitalization of the variances field in "margins" objects (to lower case), for consistency.
• Fixed some small errors in documentation and improved width of examples.

margins 0.2.15

• Expose previously internal dydx() generic and methods to provide variable-specific marginal effects calculations. (#31)
• Added example dataset from marfx package. (#31)

margins 0.2.13

• Added support for calculating marginal effects of logical terms, treating them as factors. (#31)

margins 0.2.12

• Added an image() method for "lm", "glm", and "loess" objects, as a flat complement to existing persp() methods. (#42)

margins 0.2.11

• Added a prediction() method for "gls" objects (from MASS::gls()). (#3)

margins 0.2.10

• Replaced numDeriv::jacobian() with an internal alternative. (#41)

margins 0.2.8

• Added a prediction() method for "ivreg" objects (from AER::ivreg()). (#3)
• Added a prediction() method for "survreg" objects (from survival::survreg()). (#3)

margins 0.2.7

• Added a prediction() method for "polr" objects (from MASS::polr()). (#3)
• Added a prediction() method for "coxph" objects (from survival::coxph()). (#3)

margins 0.2.7

• marginal_effects() and prediction() are now S3 generics, with methods for "lm" and "glm" objects, improving extensability. (#39, #40)
• prediction() returns a new class ("prediction") and gains a print() method.
• Added preliminary support for "loess" objects, including methods for prediction(), marginal_effects(), cplot(), and persp(). No effect variances are currently calculated. (#3)
• Added a prediction() method for "nls" objects. (#3)
• Internal function get_effect_variances() gains a "none" option for the vce argument, to skip calculation of ME variances.

margins 0.2.7

• marginal_effects() issues a warning (rather than fails) when trying to extract the marginal effect of a factor variable that was coerced to numeric in a model formula via I(). (#38)

margins 0.2.5

• Added better support for factor x variables in cplot().
• Added (rudimentary) tests of variance methods. (#21)
• Removed .build_predict_fun() factory function, as it was no longer needed.
• Fix vignettes so package can be built with them. (#16)

margins 0.2.4

• Modified marginal_effects() to use a vectorized approach to simple numerical differentiation. (#36/#37, h/t Vincent Arel-Bundock)
• Removed margins.plm() method, which didn't actually work because "plm" does not provide a predict() method.
• Updated Stata/R comparison documents included in inst/doc.
• Expanded tests of unit-specific variances. (#21)

margins 0.2.3

• Added a logical argument to enable/disable calculation of unit-specific marginal effect variances and set it to FALSE by default. (#36, h/t Vincent Arel-Bundock)

margins 0.2.2

• Removed support for "marginal effects at means" (MEMs) and the atmeans argument throughout package. (#35)
• Renamed the vc argument to vcov for consistency with other packages. (#34)

margins 0.2.1

• build_margins() now returns columns containing unit-specific standard errors of marginal effects.
• Added a vc argument to build_margins() to allow the passing of arbitrary variance-covariance matrices. (#16, h/t Alex Coppock & Gijs Schumacher)
• cplot() now draws confidence intervals for "effect" plots.
• Fixed a bug in get_marginal_effects() wherein the method argument was ignored. This improves performance significantly when using method = "simple" (the default differentiation method).

margins 0.2.0

• Added persp() methods for "lm" and "glm" class objects to display 3-dimensional representations of predicted values and marginal effects.
• Added plot.margins() method for mimicking Stata's marginsplot behavior.
• Added cplot() generic and methods for "lm" and "glm" class objects to display conditional predictions and conditional marginal effects in the style of the interplot and plotMElm packages.
• Added various variance estimation procedures for marginal effects: delta method (the default), bootstrap, and simulation (ala Clarify).
• Fixed estimation of marginal effect variances for generalized linear models, so that they are correct on both "link" and "response" scales.
• Exposed two internal marginal effect estimation functions. First, build_margins() is called by margins() methods (perhaps repeatedly) and actually assembles a "margins" object from a model and data. It is never necessary to call this directly, but may be useful for very simple marginal effect estimation procedures (i.e., using original data with no at specification). Second, marginal_effects() is the very low level function that differentiates a model with respect to some input data (or calculate discrete changes in the outcome with respect to factor variables). This is the fastest way to obtain marginal effects without the overhead of creating a "margins" object (for which variance estimation is fairly time-consuming).
• Implemented estimation of "discrete change" representations of marginal effects of factor variables in models, ala Stata's default settings.
• Re-implemented marginal effects estimation using numeric derivatives provided by numDeriv::grad() rather than symbolic differentiation. This allows margins() to handle almost any model that can be specified in R, including models that cannot be specified in Stata.
• Used compiler to byte compile prediction and gradient fucntions, thereby improving estimation speed.
• The internal build_datalist() now checks for specification of illegal factor levels in at and errors when these are encountered.
• Use the webuse package to handle examples.

margins 0.1.0

• Initial package released.

Reference manual

install.packages("margins")

0.3.23 by Thomas J. Leeper, a year ago

https://github.com/leeper/margins

Report a bug at https://github.com/leeper/margins/issues

Browse source code at https://github.com/cran/margins

Authors: Thomas J. Leeper [aut, cre] >) , Jeffrey Arnold [ctb] , Vincent Arel-Bundock [ctb]

Documentation:   PDF Manual