Everyone's Statistical Software

A framework that brings together an abundance of common statistical models found across packages into a unified interface, and provides a common architecture for estimation and interpretation, as well as bridging functions to absorb increasingly more models into the package. Zelig allows each individual package, for each statistical model, to be accessed by a common uniformly structured call and set of arguments. Moreover, Zelig automates all the surrounding building blocks of a statistical work-flow--procedures and algorithms that may be essential to one user's application but which the original package developer did not use in their own research and might not themselves support. These include bootstrapping, jackknifing, and re-weighting of data. In particular, Zelig automatically generates predicted and simulated quantities of interest (such as relative risk ratios, average treatment effects, first differences and predicted and expected values) to interpret and visualize complex models.


zelig-logo

Release: CRANVersion CRAN MonthlyDownloads CRANTotal Downloads

Development: Project Status: Active - The project has reached astable, usable state and is being activelydeveloped. Travis (LINUX) BuildStatus AppVeyor (Windows) BuildStatus codecov Dev-Blog

Zelig workflow overview

All models in Zelig can be estimated and results explored presented using four simple functions:

  1. zelig to estimate the parameters,

  2. setx to set fitted values for which we want to find quantities of interest,

  3. sim to simulate the quantities of interest,

  4. plot to plot the simulation results.

Zelig 5 introduced reference classes. These enable a different way of working with Zelig that is detailed in a separate vignette. Directly using the reference class architecture is optional. They are not used in the examples below.

Zelig Quickstart Guide

Let’s walk through an example. This example uses the swiss dataset. It contains data on fertility and socioeconomic factors in Switzerland’s 47 French-speaking provinces in 1888 (Mosteller and Tukey, 1977, 549-551). We will model the effect of education on fertility, where education is measured as the percent of draftees with education beyond primary school and fertility is measured using the common standardized fertility measure (see Muehlenbein (2010, 80-81) for details).

Installing and Loading Zelig

If you haven't already done so, open your R console and install Zelig. We recommend installing Zelig with the zeligverse package. This installs core Zelig and ancillary packages at once.

install.packages('zeligverse')

Alternatively you can install the development version of Zelig with:

devtools::install_github('IQSS/Zelig')

Once Zelig is installed, load it:

library(zeligverse)

Building Models

Let’s assume we want to estimate the effect of education on fertility. Since fertility is a continuous variable, least squares (ls) is an appropriate model choice. To estimate our model, we call the zelig() function with three two arguments: equation, model type, and data:

# load data
data(swiss)

# estimate ls model
z5_1 <- zelig(Fertility ~ Education, model = "ls", data = swiss, cite = FALSE)

# model summary
summary(z5_1)

## Model: 
## 
## Call:
## z5$zelig(formula = Fertility ~ Education, data = swiss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.036  -6.711  -1.011   9.526  19.689 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  79.6101     2.1041  37.836  < 2e-16
## Education    -0.8624     0.1448  -5.954 3.66e-07
## 
## Residual standard error: 9.446 on 45 degrees of freedom
## Multiple R-squared:  0.4406, Adjusted R-squared:  0.4282 
## F-statistic: 35.45 on 1 and 45 DF,  p-value: 3.659e-07
## 
## Next step: Use 'setx' method

The -0.86 coefficient on education suggests a negative relationship between the education of a province and its fertility rate. More precisely, for every one percent increase in draftees educated beyond primary school, the fertility rate of the province decreases 0.86 units. To help us better interpret this finding, we may want other quantities of interest, such as expected values or first differences. Zelig makes this simple by automating the translation of model estimates into interpretable quantities of interest using Monte Carlo simulation methods (see King, Tomz, and Wittenberg (2000) for more information). For example, let’s say we want to examine the effect of increasing the percent of draftees educated from 5 to 15. To do so, we set our predictor value using the setx() and setx1() functions:

# set education to 5 and 15
z5_1 <- setx(z5_1, Education = 5)
z5_1 <- setx1(z5_1, Education = 15)

# model summary
summary(z5_1)

## setx:
##   (Intercept) Education
## 1           1         5
## setx1:
##   (Intercept) Education
## 1           1        15
## 
## Next step: Use 'sim' method

After setting our predictor value, we simulate using the sim() method:

# run simulations and estimate quantities of interest
z5_1 <- sim(z5_1)

# model summary
summary(z5_1)

## 
##  sim x :
##  -----
## ev
##       mean       sd      50%     2.5%    97.5%
## 1 75.30616 1.658283 75.28057 72.12486 78.48007
## pv
##          mean       sd      50%     2.5%   97.5%
## [1,] 75.28028 9.707597 75.60282 57.11199 94.3199
## 
##  sim x1 :
##  -----
## ev
##       mean       sd      50%     2.5%    97.5%
## 1 66.66467 1.515977 66.63699 63.66668 69.64761
## pv
##          mean       sd      50%     2.5%    97.5%
## [1,] 66.02916 9.441273 66.32583 47.19223 82.98039
## fd
##        mean       sd       50%      2.5%     97.5%
## 1 -8.641488 1.442774 -8.656953 -11.43863 -5.898305

At this point, we’ve estimated a model, set the predictor value, and estimated easily interpretable quantities of interest. The summary() method shows us our quantities of interest, namely, our expected and predicted values at each level of education, as well as our first differences–the difference in expected values at the set levels of education.

Zelig’s plot() function plots the estimated quantities of interest:

plot(z5_1)

We can also simulate and plot simulations from ranges of simulated values:

z5_2 <- zelig(Fertility ~ Education, model = "ls", data = swiss, cite = FALSE)

# set Education to range from 5 to 15 at single integer increments
z5_2 <- setx(z5_2, Education = 5:15)

# run simulations and estimate quantities of interest
z5_2 <- sim(z5_2)

Then use the plot() function as before:

z5_2 <- plot(z5_2)

Getting help

The primary documentation for Zelig is available at: http://docs.zeligproject.org/articles/.

Within R, you can access function help using the normal ? function, e.g.:

?setx

If you are looking for details on particular estimation model methods, you can also use the ? function. Simply place a z before the model name. For example, to access details about the logit model use:

?zlogit

Building Zelig (for developers)

Zelig can be fully checked and build using the code in check_build_zelig.R. Note that this can be time consuming due to the extensive test coverage.

News

All changes to Zelig are documented here. GitHub issue numbers are given after each change note when relevant. See https://github.com/IQSS/Zelig/issues. External contributors are referenced with their GitHub usernames when applicable.

Zelig version 5.1.5

++++ All Zelig time series models will be deprecated on 1 February 2018 ++++

Bug fixes

  • Resolved an issue where odds_ratios standard errors were not correctly returned for logit and relogit models. Thanks to @retrography. #302

  • Zelig 4 compatability wrappers now work for arima models. Thanks to @mbsabath. #280

  • Resolved an error when only setx was called with arima models Thanks to @mbsabath. #299

  • Resolved an error when summary was called after sim for arima models.

305

  • Resolved an error when sim is used with differenced first-order autoregressive models. #307

  • arima models return informative error when data is not found. #308

Minor

  • Compatibility with testthat 2.0.0

  • Documentation updated to correctly reflect that tobit wraps AER::tobit.

315

  • Package terminology in documentation corrected. #316

Zelig version 5.1-4

Major changes

  • Speed improvements made to relogit. Thanks to @retrography. #88

  • Returns relogit weighted case control method to that described in King and Langche (2001, eq. 11) and used in the Stata relogit implementation.

295

  • Odds ratios now returned from summary with relogit models via the odds_ratios = TRUE argument. #302

Zelig version 5.1-3

Major changes

  • Roxygen documentation improvements.

Minor changes and bug fixes

  • zquantile with Amelia imputed data now working. #277

  • vcov now works with rq quantile regression models.

  • More informative error handling for conflicting timeseries model arguments. #283

  • Resolved and issue with relogit that produced a warning when the fitted model object was passed to predict. #291

Zelig version 5.1-2

Major changes

  • !EXPERIMENTAL! interface function to_zelig allows users to convert fitted model objects fitted outside of Zelig to a Zelig object. The function is called within the setx wrapper if a non-Zelig object is supplied. Currently only works for models fitted with lm and many estimated with glm and svyglm. #189

  • get_se and get_pvalue function wrappers created for get_se and get_pvalue methods, respectively. #269

  • If combine_coef_se is given a model estimated without multiply imputed data or bootstraps, an error is no longer returned. Instead a list of the models' untransformed coefficients, standard errors, and p-values is returned. #268

  • summary for logit models now accepts the argument odds_ratios. When TRUE odds ratio estimates are returned rather than coefficient estimates. Thanks to Adam Obeng. PR/#270.

  • setx and sim fail informatively when passed ZeligEI objects. #271

Minor changes and bug fixes

  • Resolved a bug where weights were not being passed to svydesign in survey models. #258

  • Due to limited functionality and instability, zelig survey estimations no return a warning and a link to documentation on how to use to_survey via setx to bipass zelig. #273

  • Resolved a bug where from_zelig_model would not extract fitted model objects for models estimated using vglm. #265

  • get_pvalue and get_se now work for models estimated using vglm. #267

  • Improved ivreg, mlogit, and getter (#266) documentation.

Zelig version 5.1-1

Minor changes

Zelig version 5.1-0

Major changes

  • Introduce a new model type for instrumental-variable regression: ivreg based on the ivreg from the AER package. #223

  • Use the Formula package for formulas. This will enable a common syntax for multiple equations, though currently in Core Zelig it is only enhances ivreg. #241

  • zelig calls now support updateing formulas (#244) and . syntax for inserting all variables from data on the right-hand side of the formula

87. See also #247.

  • Arbitrary log transformations are now supported in zelig calls (exept for ivreg regressors). #225

  • Arbitrary as.factor and factor transformations are now supported in zelig calls.

  • Restored quantile regression (model = "rq"). Currently only supports one tau at a time. #255

  • Added get_qi wrapper for get_qi method.

  • Added ATT wrapper for ATT method.

  • gee models can now be estimated with multiply imputed data. #263

Minor changes and bug fixes

  • zelig returns an error if weights are specified in a model estimated with multiply imputed data. (not possible before, but uninformative error returned)

  • Code improvement to factor_coef_combine so it does not return a warning for model types with more than 1 declared class.

  • Reorganize README files to meet new CRAN requirements.

  • Switch bind_rows for rbind_all in zquantile as the latter is depricated.

255

  • Depends on the survival package in order to enable setx for exponential models without explicitly loading survival. #254

  • relogit now only accepts one tau per call (similar to quantile). Fixed to address #257.

  • Additional unit tests.

Zelig version 5.0-17

Major changes

  • New function combine_coef_se takes as input a zelig model estimated using multiply imputed data or bootstrapping and returns a list of coefficients, standard errors, z-values, and p-values combined across the estimations. Thanks to @vincentarelbundock for prompting. #229

  • The following changes were primarily made to re-established Zelig integration with WhatIf. #236

    • Added zelig_setx_to_df for extracted fitted values created by setx.

    • Fitted factor level variable values are returned in a single column (not by parameter level) by zelig_qi_to_df.

  • setrange (including setx used with a range of fitted values) now creates scenarios based on matches of equal length set ranges. This enables setx to work with polynomials, splines, etc. (currently only when these are created outside of the zelig call). #238

Minor changes and bug fixes

  • Resolve a bug where appropriate plots were not created for mlogitbayes. #206

  • Arguments (such as xlab) can now be passed to plot. #237

  • zelig_qi_to_df and qi_slimmer bug with multinomial response models resolved. #235

  • Resolved a bug where coef, coefficients, vcov, fitted, and predict returned errors. Thanks to @vincentarelbundock for initially reporting. #231

  • Reduced number of digits show from summary for fitted model objects.

Zelig version 5.0-16

Major changes

  • !! Breaking change !! the get* functions (e.g. getcoef) now use underscores _ to delimit words in the function names (e.g. get_coef). #214

  • Added a number of new "getter" methods for extracting estimation elements:

    • get_names method to return Zelig object field names. Also available via names. #216

    • get_residuals to extract fitted model residuals. Also available via residuals.

    • get_df_residuals method to return residual degrees-of-freedom. Also accessible via df.residuals.

    • get_model_data method to return the data frame used to estimate the original model.

    • get_pvalue and get_se methods to return estimated model p-values and standard errors. Thank you to @vincentarelbundock for contributions. #147

  • zelig_qi_to_df function for extracting simulated quantities of interest from a Zelig object and returning them as a tidy-formatted data frame. #189

  • setx returns an error if it is unable to find a supplied variable name.

  • setx1 wrapper added to facilitate piped workflows for first differences.

  • zelig can handle independent variables that are transformed using the natural logarithm inside of the call. #225

Minor changes and bug fixes

  • Corrected an issue where plot would tend to choose a factor level as the x-axis variable when plotting a range of simulations. #226

  • If a factor level variable's fitted value is not specified in setx and it is multi-modal, the last factor in the factor list is arbitrarily chosen. This replaces previous behavior where the level was randomly chosen, causing unuseful quantity of interest range plots. #226

  • Corrected a bug where summary for ranges of setx would only show the first scenario. Now all scenarios are shown. #226

  • Corrected a bug where the README.md was not included in the CRAN build.

  • to_zelig_mi now can accept a list of data frames. Thanks to @vincentarelbundock.

  • Internal code improvements.

Zelig version 5.0-15

Major changes

  • Allows users to convert an independent variable to a factor within a zelig call using as.factor. #213

  • from_zelig_model function to extract original fitted model objects from zelig estimation calls. This is useful for conducting non-Zelig supported post-estimation and easy integration with the texreg and stargazer packages for formatted parameter estimate tables. #189

  • Additional MC tests for a wide range of models. #160

Minor changes

  • Solved deep assignment issue that returned a series of warnings on build. #172

Bug fixes

  • Resolves a bug from set where sim would fail for models that included factor level independent variables. #156

  • Fixed an issue with model-survey where ids was hard coded as ~1. #144

  • Fixed ATT bug introduced in 5.0-14. #194

  • Fixed ci.plot bug with timeseries models introduced in 5.0-15. #204

Zelig version 5.0-14

Major changes

  • mode has been deprecated. Please use Mode. #152

  • The Zelig 4 sim wrapper now intelligently looks for fitted values from the reference class object if not supplied via the x argument.

  • New to_zelig_mi utility function for combining multiply imputed data sets for passing to zelig. mi will also work to enable backwards compatibility. #178

  • Initial development on a new testing architecture and more tests for model-*, Zelig 4 wrappers, ci.plot, and the Zelig workflow.

  • graph method now accepts simulations from setx and setrange. For the former it uses qi.plot and ci.plot for the latter.

  • Improved error messages for Zelig 4 wrappers.

  • Improved error messages if Zelig methods are supplied with too little information.

  • model-arima now fails if the dependent variable does not vary for one of the cases.

Minor changes

  • Minor documentation improvements for Zelig 4 wrappers.

  • Dynamically generated README.md.

  • Removed plyr package dependency.

  • rbind_all replaced by bind_rows as the former is deprecated by dplyr.

  • Other internal code improvements

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.

install.packages("Zelig")

5.1.6 by Christopher Gandrud, 4 months ago


https://cran.r-project.org/package=Zelig


Report a bug at https://github.com/IQSS/Zelig/issues


Browse source code at https://github.com/cran/Zelig


Authors: Christine Choirat [aut], Christopher Gandrud [aut, cre], James Honaker [aut], Kosuke Imai [aut], Gary King [aut], Olivia Lau [aut], IQSS Harvard University [cph]


Documentation:   PDF Manual  


Task views: Econometrics, Empirical Finance, Statistics for the Social Sciences


GPL (>= 3) license


Imports AER, Amelia, coda, dplyr, Formula, geepack, jsonlite, sandwich, MASS, MatchIt, maxLik, MCMCpack, methods, quantreg, survey, VGAM

Depends on survival

Suggests ei, eiPack, knitr, networkD3, optmatch, rmarkdown, testthat, tidyverse, ZeligChoice, ZeligEI, zeligverse


Imported by WhatIf, ZeligChoice, ZeligEI, zeligverse.

Suggested by Amelia, hot.deck, mice, sjPlot, sjlabelled, sjstats, texreg.

Enhanced by stargazer.


See at CRAN