A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common, uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical workflow: procedures and algorithms that may be essential to one user's application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences, and predicted and expected values) to interpret and visualize complex models.
All models in Zelig can be estimated, and their results explored and presented, using four simple functions:
zelig to estimate the parameters,
setx to set fitted values for which we want to find quantities of interest,
sim to simulate the quantities of interest,
plot to plot the simulation results.
Zelig 5 introduced reference classes. These enable a different way of working with Zelig that is detailed in a separate vignette. Directly using the reference class architecture is optional. They are not used in the examples below.
Let’s walk through an example. This example uses the swiss dataset. It contains data on fertility and socioeconomic factors in Switzerland’s 47 French-speaking provinces in 1888 (Mosteller and Tukey, 1977, 549-551). We will model the effect of education on fertility, where education is measured as the percent of draftees with education beyond primary school and fertility is measured using the common standardized fertility measure (see Muehlenbein (2010, 80-81) for details).
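The swiss data frame ships with R's built-in datasets package, so the description above can be checked directly. A minimal base R sketch (nothing in it is Zelig-specific):

```r
# swiss ships with base R (datasets package); no extra packages are needed.
data(swiss)

# One row per province, one column per socioeconomic measure.
dim(swiss)  # 47 rows, 6 columns

# The two variables used in the model below.
str(swiss[, c("Fertility", "Education")])
summary(swiss$Education)
```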
If you haven't already done so, open your R console and install Zelig. We recommend installing Zelig with the zeligverse package, which installs core Zelig and ancillary packages at once:
install.packages("zeligverse")
Alternatively, you can install the development version of Zelig from GitHub (this requires the devtools package):
devtools::install_github("IQSS/Zelig")
Once Zelig is installed, load it:
library(Zelig)
Let’s assume we want to estimate the effect of education on fertility.
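Since fertility is a continuous variable, least squares (ls) is an appropriate model choice. To estimate our model, we call the zelig() function with three main arguments: the equation, the model type, and the data (cite = FALSE only suppresses the citation message that zelig() prints by default):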
# load data
data(swiss)

# estimate ls model
z5_1 <- zelig(Fertility ~ Education, model = "ls", data = swiss, cite = FALSE)

# model summary
summary(z5_1)
## Model:
##
## Call:
## z5$zelig(formula = Fertility ~ Education, data = swiss)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -17.036  -6.711  -1.011   9.526  19.689
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  79.6101     2.1041  37.836  < 2e-16
## Education    -0.8624     0.1448  -5.954 3.66e-07
##
## Residual standard error: 9.446 on 45 degrees of freedom
## Multiple R-squared:  0.4406, Adjusted R-squared:  0.4282
## F-statistic: 35.45 on 1 and 45 DF,  p-value: 3.659e-07
##
## Next step: Use 'setx' method
The -0.86 coefficient on education suggests a negative relationship
between the education of a province and its fertility rate. More
precisely, for every one percentage-point increase in draftees educated beyond
primary school, the fertility measure of the province decreases by about 0.86 units.
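Because this is a bivariate least squares model, the coefficient is the slope of a fitted line. The sketch below refits the same specification with base R's lm(), on the assumption (consistent with the summary output above) that it reproduces the ls estimates, and converts the slope into the change implied by moving Education from 5 to 15.

```r
data(swiss)

# Same specification as the Zelig "ls" call, fitted directly with lm().
fit <- lm(Fertility ~ Education, data = swiss)
b_edu <- coef(fit)[["Education"]]
b_edu  # about -0.86, as in the summary above

# Linear model: moving Education from 5 to 15 changes the fitted value
# by exactly (15 - 5) * slope.
10 * b_edu  # about -8.62

# predict() evaluates the fitted line at both values; the difference matches.
pred <- predict(fit, newdata = data.frame(Education = c(5, 15)))
unname(diff(pred))
```

This point estimate is what the simulated first difference later in the walkthrough should be centered on.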
To help us better interpret this finding, we may want other quantities
of interest, such as expected values or first differences. Zelig makes
this simple by automating the translation of model estimates into
interpretable quantities of interest using Monte Carlo simulation
methods (see King, Tomz, and Wittenberg (2000) for more information).
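The core idea of the King, Tomz, and Wittenberg approach can be sketched in a few lines of base R: draw many coefficient vectors from their estimated sampling distribution, push each draw through the quantity of interest, and summarize the resulting distribution. This is an illustrative approximation using MASS::mvrnorm(), not Zelig's internal code; the seed and the number of draws are arbitrary choices, and for an expected value in a linear model only the coefficient draws matter.

```r
library(MASS)  # mvrnorm(); MASS ships with standard R installations

data(swiss)
fit <- lm(Fertility ~ Education, data = swiss)

set.seed(2000)
n_sims <- 1000

# 1. Draw coefficient vectors from a multivariate normal centred on the
#    estimates, with the estimated variance-covariance matrix.
beta_draws <- mvrnorm(n_sims, mu = coef(fit), Sigma = vcov(fit))

# 2. Fix a scenario: intercept term 1, Education = 5.
x <- c(1, 5)

# 3. Map every draw to the quantity of interest (here, the expected value).
ev_draws <- as.vector(beta_draws %*% x)

# 4. Summarize the simulated distribution.
c(mean = mean(ev_draws), sd = sd(ev_draws),
  quantile(ev_draws, probs = c(0.025, 0.975)))
```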
For example, let’s say we want to examine the effect of increasing the
percent of draftees educated from 5 to 15. To do so, we set our
predictor values using the setx() and setx1() functions:
# set education to 5 and 15
z5_1 <- setx(z5_1, Education = 5)
z5_1 <- setx1(z5_1, Education = 15)

# model summary
summary(z5_1)
## setx:
##   (Intercept) Education
## 1           1         5
## setx1:
##   (Intercept) Education
## 1           1        15
##
## Next step: Use 'sim' method
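Conceptually, each setx()/setx1() call defines one row of the model's design matrix, matching the (Intercept) and Education columns printed above. Below is a base R sketch of that construction using make_scenarios(), a hypothetical helper that is not part of Zelig. With a single predictor it sidesteps the question of which values setx() assigns to covariates left unspecified.

```r
data(swiss)
fit <- lm(Fertility ~ Education, data = swiss)

# Hypothetical helper (not a Zelig function): turn user-chosen predictor
# values into design-matrix rows using the fitted model's terms.
make_scenarios <- function(fit, newdata) {
  rhs_terms <- delete.response(terms(fit))
  model.matrix(rhs_terms, data = newdata)
}

X <- make_scenarios(fit, data.frame(Education = c(5, 15)))
X
```

Row 1 plays the role of the setx() scenario and row 2 that of setx1(); multiplying either row by a coefficient vector gives the corresponding fitted value.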
After setting our predictor values, we simulate using the sim() method:
# run simulations and estimate quantities of interest
z5_1 <- sim(z5_1)

# model summary
summary(z5_1)
##
##  sim x :
##  -----
## ev
##       mean       sd      50%     2.5%    97.5%
## 1 75.30616 1.658283 75.28057 72.12486 78.48007
## pv
##          mean       sd      50%     2.5%   97.5%
## [1,] 75.28028 9.707597 75.60282 57.11199 94.3199
##
##  sim x1 :
##  -----
## ev
##       mean       sd      50%     2.5%    97.5%
## 1 66.66467 1.515977 66.63699 63.66668 69.64761
## pv
##          mean       sd      50%     2.5%    97.5%
## [1,] 66.02916 9.441273 66.32583 47.19223 82.98039
## fd
##        mean       sd       50%      2.5%     97.5%
## 1 -8.641488 1.442774 -8.656953 -11.43863 -5.898305
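The rows of this output can be approximated by hand. In the base R sketch below, ev is the linear predictor evaluated at each coefficient draw, pv adds normal noise with the estimated residual standard deviation, and fd is ev at Education = 15 minus ev at Education = 5. Holding the residual standard deviation at its point estimate is a simplifying assumption of this sketch, not necessarily what Zelig does, and the numbers will differ slightly from the output above because the draws are random.

```r
library(MASS)  # mvrnorm()

data(swiss)
fit <- lm(Fertility ~ Education, data = swiss)

set.seed(1888)
n_sims <- 1000
beta_draws <- mvrnorm(n_sims, mu = coef(fit), Sigma = vcov(fit))

x0 <- c(1, 5)   # scenario set with setx():  intercept term, Education = 5
x1 <- c(1, 15)  # scenario set with setx1(): intercept term, Education = 15

# Expected values: linear predictor at each coefficient draw.
ev0 <- as.vector(beta_draws %*% x0)
ev1 <- as.vector(beta_draws %*% x1)

# Predicted values: add residual noise. Sigma is fixed at its point estimate
# here, a simplification relative to a full simulation.
pv0 <- ev0 + rnorm(n_sims, mean = 0, sd = sigma(fit))

# First difference: change in expected value between the two scenarios.
fd <- ev1 - ev0

summarize_draws <- function(draws) {
  c(mean = mean(draws), sd = sd(draws),
    quantile(draws, probs = c(0.5, 0.025, 0.975)))
}
round(rbind(ev_x = summarize_draws(ev0), pv_x = summarize_draws(pv0),
            ev_x1 = summarize_draws(ev1), fd = summarize_draws(fd)), 3)
```

Because x1 - x0 = (0, 10), every fd draw is exactly 10 times the corresponding Education coefficient draw, which is why the fd row is centered near 10 times -0.86. The pv rows are much wider than the ev rows because they add residual variation on top of estimation uncertainty.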
At this point, we’ve estimated a model, set the predictor values, and
estimated easily interpretable quantities of interest. The summary()
method shows us our quantities of interest, namely our expected and
predicted values at each level of education, as well as our first
differences (the difference in expected values at the set levels of education).
The plot() function plots the estimated quantities of interest:
plot(z5_1)
We can also simulate and plot simulations from ranges of fitted values:
z5_2 <- zelig(Fertility ~ Education, model = "ls", data = swiss, cite = FALSE)

# set Education to range from 5 to 15 at single integer increments
z5_2 <- setx(z5_2, Education = 5:15)

# run simulations and estimate quantities of interest
z5_2 <- sim(z5_2)
Then use the plot() function as before:
plot(z5_2)
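A range passed to setx() produces one scenario per value. The base R sketch below mimics that: it evaluates expected values at Education = 5 through 15 from one shared set of coefficient draws and collects the mean and a 95% interval per scenario, the kind of summary a range plot displays. It is an approximation under the same assumptions as the earlier sketches, not the code behind Zelig's plot().

```r
library(MASS)  # mvrnorm()

data(swiss)
fit <- lm(Fertility ~ Education, data = swiss)

set.seed(47)
beta_draws <- mvrnorm(1000, mu = coef(fit), Sigma = vcov(fit))

# One design-matrix row per scenario: intercept term plus Education value.
edu_grid <- 5:15
X <- cbind(1, edu_grid)

# 1000 x 11 matrix: rows are coefficient draws, columns are scenarios.
ev <- beta_draws %*% t(X)

ev_summary <- data.frame(
  Education = edu_grid,
  mean  = colMeans(ev),
  lower = apply(ev, 2, quantile, probs = 0.025),
  upper = apply(ev, 2, quantile, probs = 0.975)
)
ev_summary

# Base graphics version of a range plot: mean line with a 95% band.
plot(mean ~ Education, data = ev_summary, type = "l",
     ylim = range(ev_summary$lower, ev_summary$upper),
     ylab = "Expected Fertility")
lines(lower ~ Education, data = ev_summary, lty = 2)
lines(upper ~ Education, data = ev_summary, lty = 2)
```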
The primary documentation for Zelig is available at: http://docs.zeligproject.org/articles/.
Within R, you can access function help using the normal ? function, for example ?setx.
If you are looking for details on particular estimation model methods,
you can also use the ? function. Simply place a z before the model
name. For example, to access details about the logit model use:
?zlogit
Zelig can be fully checked and built using the code in check_build_zelig.R. Note that this can be time consuming due to the extensive test coverage.
All changes to Zelig are documented here. GitHub issue numbers are given after each change note when relevant. See https://github.com/IQSS/Zelig/issues. External contributors are referenced with their GitHub usernames when applicable.
++++ All Zelig time series models will be deprecated on 1 February 2018 ++++
Resolved an issue where odds_ratios standard errors were not correctly
relogit models. Thanks to @retrography. #302
Zelig 4 compatibility wrappers now work for arima models. Thanks to
Resolved an error when only setx was called with arima models. Thanks to
Resolved an error when summary was called after
Resolved an error when sim is used with differenced first-order autoregressive models. #307
arima models return an informative error when data is not found. #308
Compatibility with testthat 2.0.0
Documentation updated to correctly reflect that
Speed improvements made to relogit. Thanks to @retrography. #88
relogit weighted case control method to that described in King and Zeng (2001, eq. 11) and used in the Stata
relogit models via the odds_ratios = TRUE argument. #302
zquantile with Amelia imputed data now working. #277
vcov now works with rq quantile regression models.
More informative error handling for conflicting
Resolved an issue with relogit that produced a warning when the fitted model object was passed to
!EXPERIMENTAL! interface function to_zelig allows users to convert model objects fitted outside of Zelig to a Zelig object. The function is called within the setx wrapper if a non-Zelig object is supplied. Currently only works for models fitted with lm and many estimated with
get_pvalue function wrappers created for
get_pvalue methods, respectively. #269
If combine_coef_se is given a model estimated without multiply imputed data or bootstraps, an error is no longer returned. Instead, a list of the models' untransformed coefficients, standard errors, and p-values is returned. #268
logit models now accept the argument
TRUE odds ratio estimates are returned rather than coefficient estimates. Thanks to Adam Obeng. PR/#270.
sim fail informatively when passed ZeligEI objects. #271
Resolved a bug where weights were not being passed to
in survey models. #258
Due to limited functionality and instability, zelig survey estimations now return a warning and a link to documentation on how to use setx to bypass
Resolved a bug where from_zelig_model would not extract fitted model objects for models estimated using
get_se now work for models estimated using
mlogit, and getter (#266) documentation.
Average Treatment Effect on the Treated (ATT) vignette added to the online documentation http://docs.zeligproject.org/articles/att.html
Corrected vignette URLs.
Introduce a new model type for instrumental-variable regression:
based on the ivreg from the AER package. #223
Use the Formula package for formulas. This will enable a common syntax for multiple equations, though currently in Core Zelig it is only
zelig calls now support updating formulas (#244) and the . syntax for inserting all variables from data on the right-hand side of the formula.
log transformations are now supported in
ivreg regressors). #225
factor transformations are now supported in
Restored quantile regression (model = "rq"). Currently only supports one tau at a time. #255
get_qi wrapper for
ATT wrapper for
gee models can now be estimated with multiply imputed data. #263
zelig returns an error if weights are specified in a model estimated with multiply imputed data. (not possible before, but uninformative error
Code improvement to factor_coef_combine so it does not return a warning for model types with more than 1 declared class.
Reorganize README files to meet new CRAN requirements.
zquantile as the latter is deprecated.
Depends on the survival package in order to enable setx for exponential models without explicitly loading survival. #254
relogit now only accepts one tau per call (similar to
to address #257.
Additional unit tests.
combine_coef_se takes as input a zelig model estimated using multiply imputed data or bootstrapping and returns a list of coefficients, standard errors, z-values, and p-values combined across the estimations. Thanks to @vincentarelbundock for prompting. #229
The following changes were primarily made to re-establish Zelig integration with WhatIf. #236
zelig_setx_to_df for extracting fitted values created by
Fitted factor level variable values are returned in a single column (not by parameter level) by
setx used with a range of fitted values) now creates scenarios based on matches of equal length set ranges. This enables work with polynomials, splines, etc. (currently only when these are created outside of the zelig call). #238
Resolved a bug where appropriate plots were not created for
Arguments (such as xlab) can now be passed to
qi_slimmer bug with multinomial response models
Resolved a bug where
returned errors. Thanks to @vincentarelbundock for initially reporting. #231
Reduced the number of digits shown by summary for fitted model objects.
!! Breaking change !! the get* functions (e.g. getcoef) now use _ to delimit words in the function names (e.g.
Added a number of new "getter" methods for extracting estimation elements:
get_names method to return Zelig object field names. Also available via
get_residuals to extract fitted model residuals. Also available via
get_df_residuals method to return residual degrees-of-freedom. Also accessible via
get_model_data method to return the data frame used to estimate the
get_se methods to return estimated model p-values and standard errors. Thank you to @vincentarelbundock for contributions. #147
zelig_qi_to_df function for extracting simulated quantities of interest from a Zelig object and returning them as a tidy-formatted data frame. #189
setx returns an error if it is unable to find a supplied variable name.
setx1 wrapper added to facilitate piped workflows for first differences.
zelig can handle independent variables that are transformed using the natural logarithm inside of the call. #225
Corrected an issue where plot would tend to choose a factor level as the x-axis variable when plotting a range of simulations. #226
If a factor level variable's fitted value is not specified in
it is multi-modal, the last factor in the factor list is arbitrarily chosen. This replaces previous behavior where the level was randomly chosen, causing unhelpful quantity of interest range plots. #226
Corrected a bug where summary for ranges of setx would only show the first scenario. Now all scenarios are shown. #226
Corrected a bug where the README.md was not included in the CRAN build.
to_zelig_mi now can accept a list of data frames. Thanks to
Internal code improvements.
Allows users to convert an independent variable to a factor within a
from_zelig_model function to extract original fitted model objects from zelig estimation calls. This is useful for conducting non-Zelig supported post-estimation and easy integration with the texreg and stargazer packages for formatted parameter estimate tables. #189
Additional MC tests for a wide range of models. #160
Resolves a bug from
sim would fail for models that included factor level independent variables. #156
Fixed an issue with
ids was hard coded as
ATT bug introduced in 5.0-14. #194
ci.plot bug with timeseries models introduced in 5.0-15. #204
mode has been deprecated. Please use
The Zelig 4 sim wrapper now intelligently looks for fitted values from the reference class object if not supplied via the x argument.
to_zelig_mi utility function for combining multiply imputed data sets for passing to
mi will also work to enable backwards compatibility. #178
Initial development on a new testing architecture and more tests for model-*, Zelig 4 wrappers, ci.plot, and the Zelig workflow.
graph method now accepts simulations from
setrange. For the former it uses
ci.plot for the latter.
Improved error messages for Zelig 4 wrappers.
Improved error messages if Zelig methods are supplied with too little information.
model-arima now fails if the dependent variable does not vary for one of the
Minor documentation improvements for Zelig 4 wrappers.
Dynamically generated README.md.
Removed plyr package dependency.
rbind_all replaced by bind_rows as the former is deprecated by dplyr.
Other internal code improvements