'dplyr'-Like Syntax for Summary Statistics of Survey Data

Use piping, verbs like 'group_by' and 'summarize', and other 'dplyr' inspired syntactic style when calculating summary statistics on survey data using functions from the 'survey' package.


srvyr brings parts of dplyr's syntax to survey analysis, using the survey package.

srvyr focuses on calculating summary statistics from survey data, such as the mean, total or quantile. It allows for the use of many dplyr verbs, such as summarize, group_by, and mutate, the convenience of pipe-able functions, rlang's style of non-standard evaluation and more consistent return types than the survey package.

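You can try it out by installing the released version from CRAN, or the development version from GitHub:

install.packages("srvyr")
# or, for the development version:
# devtools::install_github("gergness/srvyr")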

Example usage

First, describe the variables that define the survey's structure with the function as_survey(), using the bare column names of the variables that you would pass to functions from the survey package like survey::svydesign(), survey::svrepdesign() or survey::twophase().

library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")
dstrata <- apistrat %>%
   as_survey_design(strata = stype, weights = pw)
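
The same pattern applies to other designs. As a minimal sketch, assuming the apiclus1 data set that ships alongside apistrat in the survey package's api data (it is not used elsewhere in this README), a one-stage cluster sample could be declared with as_survey(), which chooses the design type from the arguments it receives. The object names dclus1 and dclus1_explicit are only illustrative.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

# One-stage cluster sample: school districts (dnum) are the sampling units,
# pw holds the sampling weights and fpc the finite population correction.
dclus1 <- apiclus1 %>%
  as_survey(ids = dnum, weights = pw, fpc = fpc)

# The same design, naming the design type explicitly
dclus1_explicit <- apiclus1 %>%
  as_survey_design(ids = dnum, weights = pw, fpc = fpc)
```

Both calls should produce a tbl_svy object that can be passed to the dplyr-style verbs as well as to functions from the survey package.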

Now many of the dplyr verbs are available.

  • mutate() adds or modifies a variable.
dstrata <- dstrata %>%
  mutate(api_diff = api00 - api99)
  • summarise() calculates summary statistics such as mean, total, quantile or ratio.
dstrata %>% 
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
#> # A tibble: 1 x 3
#>   api_diff api_diff_low api_diff_upp
#>      <dbl>        <dbl>        <dbl>
#> 1     32.9         28.8         37.0
  • group_by() and then summarise() creates summaries by groups.
dstrata %>% 
  group_by(stype) %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
#> # A tibble: 3 x 4
#>   stype api_diff api_diff_low api_diff_upp
#>   <fct>    <dbl>        <dbl>        <dbl>
#> 1 E        38.6         33.1          44.0
#> 2 H         8.46         1.74         15.2
#> 3 M        26.4         20.4          32.4
  • Functions from the survey package are still available:
my_model <- survey::svyglm(api99 ~ stype, dstrata)
summary(my_model)
#> Call:
#> svyglm(formula = api99 ~ stype, dstrata)
#> Survey design:
#> Called via srvyr
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   635.87      13.34  47.669   <2e-16 ***
#> stypeH        -18.51      20.68  -0.895    0.372    
#> stypeM        -25.67      21.42  -1.198    0.232    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Dispersion parameter for gaussian family taken to be 16409.56)
#> Number of Fisher Scoring iterations: 2
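
The examples above only use survey_mean(). As a rough sketch of the other statistics mentioned in the list, the following computes a total, a ratio and a median by school type on the same apistrat design. The output column names (enroll_total, api_ratio, api00_median) and the object name other_stats are chosen here for illustration.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

other_stats <- dstrata %>%
  group_by(stype) %>%
  summarise(
    # Estimated total enrollment, with its standard error
    enroll_total = survey_total(enroll, vartype = "se"),
    # Ratio of the 2000 API score to the 1999 API score
    api_ratio = survey_ratio(api00, api99),
    # Median 2000 API score, with a confidence interval
    api00_median = survey_median(api00, vartype = "ci")
  )

other_stats
```

Each statistic contributes its estimate plus one column per requested uncertainty measure, suffixed with _se, _low or _upp.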

What people are saying about srvyr

-- Kieran Healy, in Data Visualization: A practical introduction

  1. Yay!

-- Thomas Lumley, in the Biased and Inefficient blog


I do appreciate bug reports, suggestions and pull requests! I started this as a way to learn about R package development, and am still learning, so you'll have to bear with me. Please review the Contributor Code of Conduct, as all participants are required to abide by its terms.

If you're unfamiliar with contributing to an R package, I recommend the guides provided by RStudio's tidyverse team, such as Jim Hester's blog post or Hadley Wickham's R packages book.


srvyr 0.3.0

  • srvyr now uses tidy evaluation from rlang. The "underscore" functions have been soft deprecated in favor of quosure splicing. See dplyr's vignette "programming" for more details. In almost all cases, the old syntax will still work, with one exception: the standard evaluation function as_survey_twophase_() had to be changed slightly so that the entire list is inside quotation.

  • Database support has been rewritten. It should be faster now and doesn't require a unique identifier. You can also now convert database-backed surveys from the survey package to srvyr with as_survey.

  • srvyr now has a pkgdown site; check it out at http://gdfe.co/srvyr
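
To illustrate the tidy evaluation style mentioned in the 0.3.0 notes, here is a minimal sketch of a wrapper function that takes bare column names, captures them with rlang::enquo() and unquotes them with !!. The function name mean_by and its arguments are hypothetical and not part of srvyr.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Hypothetical helper: survey-weighted mean of one variable by one grouping variable
mean_by <- function(design, group_var, value_var) {
  # Capture the bare column names supplied by the caller as quosures
  group_var <- rlang::enquo(group_var)
  value_var <- rlang::enquo(value_var)

  design %>%
    group_by(!!group_var) %>%
    summarise(estimate = survey_mean(!!value_var))
}

res <- mean_by(dstrata, stype, api00)
res
```

The result should have one row per level of stype, with the estimate and its standard error in the estimate and estimate_se columns.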

srvyr 0.2.2

  • Remove test blocking survey update

srvyr 0.2.1

  • Added support for dplyr mutate_at/_if/_all and summarize_at/_if/_all for srvyr surveys.

  • Fixed a few bugs introduced with dplyr 0.6. This version of srvyr will work with both old versions of dplyr and 0.6, but may be full of warnings if you update dplyr. Full support for the new dplyr is coming soon.

srvyr 0.2.0

  • Added support for database-backed surveys, using dplyr's handling of DBI. Because of problems interacting with the survey package, twophase designs do not work.

srvyr 0.1.2

  • Fixed a problem with confidence levels not being passed into quantiles

  • Added deff parameter to survey_mean(), survey_total() and survey_median(), and a df parameter to those functions and survey_quantile() / survey_median().

  • summarize and mutate match dplyr's behavior when arguments aren't named (uses dplyr::auto_name())
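
As a sketch of how the deff and df parameters from the 0.1.2 notes might be used, the following requests a design effect alongside the mean, and then compares a confidence interval based on the normal distribution (df = Inf) with one based on the design's degrees of freedom. The object names with_deff, ci_normal and ci_t are illustrative only.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# deff = TRUE adds a design effect column next to the estimate
with_deff <- dstrata %>%
  summarise(api00_mean = survey_mean(api00, vartype = "ci", deff = TRUE))

# df controls the t-distribution used for the confidence interval
ci_normal <- dstrata %>%
  summarise(m = survey_mean(api00, vartype = "ci", df = Inf))

ci_t <- dstrata %>%
  summarise(m = survey_mean(api00, vartype = "ci", df = survey::degf(dstrata)))

with_deff
ci_normal
ci_t
```

With a finite df, the interval is slightly wider than the normal-based one, because the t-distribution has heavier tails.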

srvyr 0.1.1

  • New function cascade summarizes groups, and cascades to create summary statistics of groups of groups.

  • Fixed a bug for confidence intervals for survey_total() on groups.

  • Fixed some issues with the upcoming version of dplyr.
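
A minimal sketch of cascade(), assuming the same apistrat design used in the README examples: grouping by two variables and calling cascade() instead of summarise() should return the most detailed groups plus summaries for the coarser groupings and the overall total. The object names by_group, detail and with_totals are illustrative.

```r
library(srvyr, warn.conflicts = FALSE)
data(api, package = "survey")

dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

by_group <- dstrata %>%
  group_by(stype, awards)

# summarise() returns one row per stype/awards combination
detail <- by_group %>%
  summarise(api99_mean = survey_mean(api99))

# cascade() also adds rows for each stype overall and for the whole sample;
# the grouping columns are NA in those added rows
with_totals <- by_group %>%
  cascade(api99_mean = survey_mean(api99))

with_totals
```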



0.3.1 by Greg Freedman Ellis, 13 days ago

http://gdfe.co/srvyr, https://github.com/gergness/srvyr

Report a bug at https://github.com/gergness/srvyr/issues

Browse source code at https://github.com/cran/srvyr

Authors: Greg Freedman Ellis [aut, cre], Thomas Lumley [ctb]


Task views: Official Statistics & Survey Methodology

GPL-2 | GPL-3 license

Imports dplyr, magrittr, rlang, survey, tibble

Suggests convey, dbplyr, ggplot2, knitr, Matrix, rmarkdown, pander, RSQLite, MonetDBLite, survival, testthat, vardpoor
