An R port of Stata's 'margins' command, which can be used to calculate marginal (or partial) effects from model objects.

**margins** is an effort to port the marginal effect estimation functionality of Stata's (closed source) `margins`

command to R. This is done through a single function, `margins()`

, which is an S3 generic method for calculating the marginal effects (or "partial effects") of covariates included in model objects (like those of classes "lm" and "glm").

Underlying functions `marginal_effects()`

and `dydx()`

provide low-level interfaces to the numerical differentiation, and the package implements several useful additional features for summarizing model objects, including:

- A
`plot()`

method for the new "margins" class that ports Stata's`marginsplot`

command. - A
`persp()`

method for "lm", "glm", and "loess" objects to provide three-dimensional representations of response surfaces. - An
`image()`

method for the same that produces flat, two-dimensional heatmap-style representations of response surfaces. - A plotting function
`cplot()`

to provide the commonly needed visual summaries of predictions or marginal effects conditional on a second variable.

Users interested in generating predicted (fitted) values, such as the "predictive margins" generated by Stata's `margins`

command, should consider using `prediction()`

from the **prediction** package.

With the introduction of Stata's `margins`

command, it has become incredibly simple to estimate average marginal effects (i.e., "average partial effects") and marginal effects at representative cases. Indeed, in just a few lines of Stata code, regression results for almost any kind model can be transformed into meaningful quantities of interest and related plots:

```
. import delimited mtcars.csv
. quietly reg mpg c.cyl##c.hp wt
. margins, dydx(*)
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cyl | .0381376 .5998897 0.06 0.950 -1.192735 1.26901
hp | -.0463187 .014516 -3.19 0.004 -.076103 -.0165343
wt | -3.119815 .661322 -4.72 0.000 -4.476736 -1.762894
------------------------------------------------------------------------------
. marginsplot
```

Stata's `margins`

command is incredibly robust. It works with nearly any kind of statistical model and estimation procedure, including OLS, generalized linear models, panel regression models, and so forth. It also represents a significant improvement over Stata's previous marginal effects command - `mfx`

- which was subject to various well-known bugs. While other Stata modules have provided functionality for deriving quantities of interest from regression estimates (e.g., Clarify), none has done so with the simplicity and genearlity of `margins`

.

By comparison, R has no robust functionality in the base tools for drawing out marginal effects from model estimates (though the S3 `predict()`

methods implement some of the functionality for computing fitted/predicted values). Nor do any add-on packages implement appropriate marginal effect estimates. Notably, several packages provide estimates of marginal effects for different types of models. Among these are car, alr3, mfx, erer, among others. Unfortunately, none of these packages implement marginal effects correctly (i.e., correctly account for interrelated variables such as interaction terms (e.g., `a:b`

) or power terms (e.g., `I(a^2)`

) and the packages all implement quite different interfaces for different types of models. interplot and plotMElm provide functionality simply for plotting quantities of interest from multiplicative interaction terms in models but do not appear to support general marginal effects displays (in either tabular or graphical form), while visreg provides a more general plotting function but no tabular output. interactionTest provides some additional useful functionality for controlling the false discovery rate when making such plots and interpretations, but is again not a general tool for marginal effect estimation.

Given the challenges of interpreting the contribution of a given regressor in any model that includes quadratic terms, multiplicative interactions, a non-linear transformation, or other complexities, there is a clear need for a simple, consistent way to estimate marginal effects for popular statistical models. This package aims to correctly calculate marginal effects that include complex terms and provide a uniform interface for doing those calculations. Thus, the package implements a single S3 generic method (`margins()`

) that can be easily generalized for any type of model implemented in R.

Some technical details of the package are worth briefly noting. The estimation of marginal effects relies on numerical approximations of derivatives produced using `predict()`

(actually, a wrapper around `predict()`

called `prediction()`

that is type-safe). Variance estimation, by default is provided using the delta method a numerical approximation of the Jacobian matrix. While symbolic differentiation of some models (e.g., basic linear models) is possible using `D()`

and `deriv()`

, R's modelling language (the "formula" class) is sufficiently general to enable the construction of model formulae that contain terms that fall outside of R's symbolic differentiation rule table (e.g., `y ~ factor(x)`

or `y ~ I(FUN(x))`

for any arbitrary `FUN()`

). By relying on numeric differentiation, `margins()`

supports *any* model that can be expressed in R formula syntax. Even Stata's `margins`

command is limited in its ability to handle variable transformations (e.g., including `x`

and `log(x)`

as predictors) and quadratic terms (e.g., `x^3`

); these scenarios are easily expressed in an R formula and easily handled, correctly, by `margins()`

.

Replicating Stata's results is incredibly simple using just the `margins()`

method to obtain average marginal effects:

`library("margins")x <- lm(mpg ~ cyl * hp + wt, data = mtcars)(m <- margins(x))`

```
## Average marginal effects
```

```
## lm(formula = mpg ~ cyl * hp + wt, data = mtcars)
```

```
## cyl hp wt
## 0.03814 -0.04632 -3.12
```

`summary(m)`

```
## factor AME SE z p lower upper
## cyl 0.0381 0.5999 0.0636 0.9493 -1.1376 1.2139
## hp -0.0463 0.0145 -3.1909 0.0014 -0.0748 -0.0179
## wt -3.1198 0.6613 -4.7176 0.0000 -4.4160 -1.8236
```

With the exception of differences in rounding, the above results match identically what Stata's `margins`

command produces. Using the `plot()`

method also yields an aesthetically similar result to Stata's `marginsplot`

:

`plot(m)`

If you are only interested in obtaining the marginal effects (without corresponding variances or the overhead of creating a "margins" object), you can call `marginal_effects(x)`

directly. Furthermore, the `dydx()`

function enables the calculation of the marginal effect of a single named variable:

`# all marginal effects, as a data.framehead(marginal_effects(x))`

```
## dydx_cyl dydx_hp dydx_wt
## 1 -0.6572244 -0.04987248 -3.119815
## 2 -0.6572244 -0.04987248 -3.119815
## 3 -0.9794364 -0.08777977 -3.119815
## 4 -0.6572244 -0.04987248 -3.119815
## 5 0.5747624 -0.01196519 -3.119815
## 6 -0.7519926 -0.04987248 -3.119815
```

`# marginal effect of one variablehead(dydx(mtcars, x, "hp"))`

```
## dydx_hp
## 1 -0.04987248
## 2 -0.04987248
## 3 -0.08777977
## 4 -0.04987248
## 5 -0.01196519
## 6 -0.04987248
```

These functions may be useful, for example, for plotting, or getting a quick impression of the results.

While there is still work to be done to improve performance, **margins** is reasonably speedy:

`library("microbenchmark")microbenchmark(marginal_effects(x))`

```
## Unit: milliseconds
## expr min lq mean median uq max neval
## marginal_effects(x) 11.67015 11.98566 13.39177 12.31466 13.59553 63.25713 100
```

`microbenchmark(margins(x))`

```
## Unit: milliseconds
## expr min lq mean median uq max neval
## margins(x) 88.40953 91.17482 95.45244 92.99871 96.37431 183.5197 100
```

In addition to the estimation procedures and `plot()`

generic, **margins** offers several plotting methods for model objects. First, there is a new generic `cplot()`

that displays predictions or marginal effects (from an "lm" or "glm" model) of a variable conditional across values of third variable (or itself). For example, here is a graph of predicted probabilities from a logit model:

`m <- glm(am ~ wt*drat, data = mtcars, family = binomial)cplot(m, x = "wt", se.type = "shade")`

And fitted values with a factor independent variable:

`cplot(lm(Sepal.Length ~ Species, data = iris))`

and a graph of the effect of `drat`

across levels of `wt`

:

`cplot(m, x = "wt", dx = "drat", what = "effect", se.type = "shade")`

```
## Warning in warn_for_weights(model): 'weights' used in model estimation are currently ignored!
```

Second, the package implements methods for "lm" and "glm" class objects for the `persp()`

generic plotting function. This enables three-dimensional representations of predicted outcomes:

`persp(x, xvar = "cyl", yvar = "hp")`

and marginal effects:

`persp(x, xvar = "cyl", yvar = "hp", what = "effect", nx = 10)`

And if three-dimensional plots aren't your thing, there are also analogous methods for the `image()`

generic, to produce heatmap-style representations:

`image(x, xvar = "cyl", yvar = "hp", main = "Predicted Fuel Efficiency,\nby Cylinders and Horsepower")`

The numerous package vignettes and help files contain extensive documentation and examples of all package functionality.

The development version of this package can be installed directly from GitHub using `ghit`

:

`if (!require("ghit")) { install.packages("ghit") library("ghit")}install_github("leeper/prediction")install_github("leeper/margins") # building vignettes takes a moment, so for a quicker install set:install_github("leeper/margins", build_vignettes = FALSE)`

- Significantly modified the data structure returned by
`margins()`

. It now returns a data frame with an added`at`

attribute, specifying the names of the variables that have been fixed by`build_datalist()`

. (#58) - Renamed marginal effects, variance, and standard error columns returned by
`margins()`

. Marginal effects columns are prefixed by`dydx_`

. Variances of the*average*marginal effect are stored (repeatedly, across observations) in new`Var_dydx_`

columns. Unit-specific standard errors, if requested, are stored as`SE_dydx_`

columns. (#58) `summary.margins()`

now returns a single data frame of marginal effect estimates. Column names have also changed to avoid use of special characters (thus making it easier to use column names in plotting with, for example, ggplot2). Row-order can be controlled by the`by_factor`

attribute, which by default sorts the data frame by the factor/term. If set to`by_factor = FALSE`

, the data frame is sorted by the`at`

variables. This behavior cascades into the`print.summary.margins()`

method. (#58)`print.margins()`

now presents (but does not return) effect estimates as a condensed data frame with some auxiliary information. Its behavior when using`at`

is improved and tidied. (#58)`build_margins()`

is no longer exported. Arguments used to control its behavior have been exposed in`margins()`

methods.`plot.margins()`

now displays marginal effects across each level of`at`

. (#58)`build_margins()`

and thus`margins()`

no longer returns the original data twice (a bug introduced by change in behavior of`prediction()`

). (#57)- All methods for objects of class
`"marginslist"`

have been removed. (#58) - The
`at`

argument in`plot.margins()`

has been renamed to`pos`

, to avoid ambiguity with`at`

as used elsewhere in the package. `persp()`

and`image()`

methods gain a`dx`

argument (akin to that in`cplot()`

) to allow visualization of marginal effects of a variable across levels of two other variables. The default behavior remains unchanged.- Cleaned up documentation and add some examples.

- Added support for
`"merMod"`

models from**lme4**, though no variance estimation is currently supported. - Imported
`prediction::mean_or_mode()`

for use in`cplot()`

methods.

`cplot.polr()`

now allows the display of "stacked" (cumulative) predicted probabilities. (#49)- Added an example of
`cplot(draw = "add")`

to display predicted probabilities across a third factor variable. (#46) - Moved the
`build_datalist()`

and`seq_range()`

functions to the**prediction**package. - A tentative
`cplot.multinom()`

method has been added.

- The internal code of
`cplot.lm()`

has been refactored so that the actual plotting code now relies in non-exported utility functions, which can be used in other methods. This should make it easier to maintain existing methods and add new ones. (#49) - A new
`cplot()`

method for objects of class`"polr"`

has been added (#49).

- The
`extract_marginal_effects()`

function has been removed and replaced by`marginal_effects()`

methods for objects of classes`"margins"`

and`"marginslist"`

. - Added a dependency on
**prediction**v.0.1.3 and, implicitly, an enhances suggestion of**survey**v3.31-5 to resolve an underlying`prediction()`

issue for models of class`"svyglm"`

. (#47, h/t Carl Ganz)

- A warning is now issued when a model uses weights, indicating that they are ignored. (#4)
- Various errors and warnings that occurred when applying
`margins()`

to a model with weights have been fixed. `cplot()`

now issues an error when attempting to display the effects of a factor (with > 2 levels).

- Fixed a bug in
`get_effect_variances(vce = "bootstrap")`

, wherein the variance of the marginal effects was always zero.

- Factored the
`prediction()`

generic and methods into a separate package,**prediction**, to ease maintainence. - Added a
`print.summary.margins()`

method to separate construction of the summary data frame the printing thereof. - The "Technical Details" vignette now describes the package functionality and computational approach in near-complete detail.

- Plotting functions
`cplot()`

,`persp()`

, and`image()`

gain a`vcov`

argumetn to pass to `build_margins(). (#43) `cplot()`

now allows for the display of multiple conditional relationships by setting`draw = "add"`

. (#32)- The package Introduction vignette has improved examples, including ggplot2 examples using
`cplot()`

data. (#31)

- Added support in
`dydx.default()`

to allow the calculation of various discrete changes rather than only numerical derivatives.

- Fixes to handling of factors and ordered variables converted within formulae. (#38)
- Reconfigured the
`data`

argument in`margins()`

and`prediction()`

to be clearer about what is happening when it is set to missing.

- Switched to using a more reliable "central difference" numerical differentiation and updated the calculation of the step size to follow
`marfx`

(#31, h/t Jeffrey Arnold) - Added some functionality
`prediction()`

methods to, hopefully, reduce memory footprint of model objects. (#26) - Changed the capitalization of the
`variances`

field in "margins" objects (to lower case), for consistency. - Fixed some small errors in documentation and improved width of examples.

- Expose previously internal
`dydx()`

generic and methods to provide variable-specific marginal effects calculations. (#31) - Added example dataset from
**marfx**package. (#31)

- Added support for calculating marginal effects of logical terms, treating them as factors. (#31)

- Added an
`image()`

method for "lm", "glm", and "loess" objects, as a flat complement to existing`persp()`

methods. (#42)

- Added a
`prediction()`

method for "gls" objects (from`MASS::gls()`

). (#3)

- Replaced
`numDeriv::jacobian()`

with an internal alternative. (#41)

- Added a
`prediction()`

method for "ivreg" objects (from`AER::ivreg()`

). (#3) - Added a
`prediction()`

method for "survreg" objects (from`survival::survreg()`

). (#3)

- Added a
`prediction()`

method for "polr" objects (from`MASS::polr()`

). (#3) - Added a
`prediction()`

method for "coxph" objects (from`survival::coxph()`

). (#3)

`marginal_effects()`

and`prediction()`

are now S3 generics, with methods for "lm" and "glm" objects, improving extensability. (#39, #40)`prediction()`

returns a new class ("prediction") and gains a`print()`

method.- Added preliminary support for "loess" objects, including methods for
`prediction()`

,`marginal_effects()`

,`cplot()`

, and`persp()`

. No effect variances are currently calculated. (#3) - Added a
`prediction()`

method for "nls" objects. (#3) - Internal function
`get_effect_variances()`

gains a "none" option for the`vce`

argument, to skip calculation of ME variances.

`marginal_effects()`

issues a warning (rather than fails) when trying to extract the marginal effect of a factor variable that was coerced to numeric in a model formula via`I()`

. (#38)

- Added better support for factor
`x`

variables in`cplot()`

. - Added (rudimentary) tests of variance methods. (#21)
- Removed
`.build_predict_fun()`

factory function, as it was no longer needed. - Fix vignettes so package can be built with them. (#16)

- Modified
`marginal_effects()`

to use a vectorized approach to simple numerical differentiation. (#36/#37, h/t Vincent Arel-Bundock) - Removed
`margins.plm()`

method, which didn't actually work because "plm" does not provide a`predict()`

method. - Updated Stata/R comparison documents included in
`inst/doc`

. - Expanded tests of unit-specific variances. (#21)

- Added a logical argument to enable/disable calculation of unit-specific marginal effect variances and set it to FALSE by default. (#36, h/t Vincent Arel-Bundock)

- Removed support for "marginal effects at means" (MEMs) and the
`atmeans`

argument throughout package. (#35) - Renamed the
`vc`

argument to`vcov`

for consistency with other packages. (#34)

`build_margins()`

now returns columns containing unit-specific standard errors of marginal effects.- Added a
`vc`

argument to`build_margins()`

to allow the passing of arbitrary variance-covariance matrices. (#16, h/t Alex Coppock & Gijs Schumacher) `cplot()`

now draws confidence intervals for "effect" plots.- Fixed a bug in
`get_marginal_effects()`

wherein the`method`

argument was ignored. This improves performance significantly when using`method = "simple"`

(the default differentiation method).

- Added
`persp()`

methods for "lm" and "glm" class objects to display 3-dimensional representations of predicted values and marginal effects. - Added
`plot.margins()`

method for mimicking Stata's`marginsplot`

behavior. - Added
`cplot()`

generic and methods for "lm" and "glm" class objects to display conditional predictions and conditional marginal effects in the style of the**interplot**and**plotMElm**packages. - Added various variance estimation procedures for marginal effects: delta method (the default), bootstrap, and simulation (ala
**Clarify**). - Fixed estimation of marginal effect variances for generalized linear models, so that they are correct on both "link" and "response" scales.
- Exposed two internal marginal effect estimation functions. First,
`build_margins()`

is called by`margins()`

methods (perhaps repeatedly) and actually assembles a "margins" object from a model and data. It is never necessary to call this directly, but may be useful for very simple marginal effect estimation procedures (i.e., using original data with no`at`

specification). Second,`marginal_effects()`

is the very low level function that differentiates a model with respect to some input data (or calculate discrete changes in the outcome with respect to factor variables). This is the fastest way to obtain marginal effects without the overhead of creating a "margins" object (for which variance estimation is fairly time-consuming). - Implemented estimation of "discrete change" representations of marginal effects of factor variables in models, ala Stata's default settings.
- Re-implemented marginal effects estimation using numeric derivatives provided by
`numDeriv::grad()`

rather than symbolic differentiation. This allows`margins()`

to handle almost any model that can be specified in R, including models that cannot be specified in Stata. - Used
**compiler**to byte compile prediction and gradient fucntions, thereby improving estimation speed. - The internal
`build_datalist()`

now checks for specification of illegal factor levels in`at`

and errors when these are encountered. - Use the
**webuse**package to handle examples.

- Initial package released.