Analysis and Presentation of Social Scientific Data

This is a collection of tools that the author (Jacob) has written for the purpose of more efficiently understanding and sharing the results of (primarily) regression analyses. There are a number of functions focused specifically on the interpretation and presentation of interactions in linear models. Just about everything supports models from the survey package.

CRAN_Status_Badge GitHubtag BuildStatus AppVeyor BuildStatus codecov

This package consists of a series of functions created by the author (Jacob) to automate otherwise tedious research tasks. At this juncture, the unifying theme is the more efficient presentation of regression analyses. There are a number of functions for visualizing and doing inference for interaction terms. Support for the survey package’s svyglm objects as well as weighted regressions is a common theme throughout.

Note: This is beta software. Bugs are possible, both in terms of code-breaking errors and more pernicious errors of mistaken computation.


For the most stable version, simply install from CRAN.


If you want the latest features and bug fixes (and perhaps the latest bugs, too) then you can download from Github. To do that you will need to have devtools installed if you don’t already:


Then install the package from Github.



Here’s a brief synopsis of the current functions in the package:

Summarizing regressions (summ, plot_summs, export_summs)

summ is a replacement for summary that provides the user several options for formatting regression summaries. It supports glm, svyglm, and merMod objects as input as well. It supports calculation and reporting of robust standard errors via the sandwich package.

Basic use:

fit <- lm(mpg ~ hp + wt, data = mtcars)
#> Observations: 32
#> Dependent Variable: mpg
#> F(2,29) = 69.21, p = 0
#> R-squared = 0.83
#> Adj. R-squared = 0.81
#> Standard errors: OLS 
#>              Est. S.E. t val. p    
#> (Intercept) 37.23 1.6   23.28 0 ***
#> hp          -0.03 0.01  -3.52 0  **
#> wt          -3.88 0.63  -6.13 0 ***

It has several conveniences, like re-fitting your model with scaled variables (scale = TRUE). You have the option to leave the outcome variable in its original scale (scale.response = TRUE), which is the default for scaled models. I’m a fan of Andrew Gelman’s 2 SD standardization method, so you can specify by how many standard deviations you would like to rescale ( = 2).

You can also get variance inflation factors (VIFs) and partial/semipartial (AKA part) correlations. Partial correlations are only available for OLS models. You may also substitute confidence intervals in place of standard errors and you can choose whether to show p values.

summ(fit, scale = TRUE, vifs = TRUE, part.corr = TRUE, confint = TRUE,
     pvals = FALSE)
#> Observations: 32
#> Dependent Variable: mpg
#> F(2,29) = 69.21, p = 0
#> R-squared = 0.83
#> Adj. R-squared = 0.81
#> Standard errors: OLS 
#>              Est.  2.5% 97.5% t val.  VIF partial.r part.r
#> (Intercept) 20.09 19.19 20.99  43.82                      
#> hp          -2.18 -3.39 -0.97  -3.52 1.77     -0.55  -0.27
#> wt          -3.79 -5.01 -2.58  -6.13 1.77     -0.75  -0.47
#> All continuous predictors are mean-centered and scaled by 1 s.d.

Cluster-robust standard errors:

data("PetersenCL", package = "sandwich")
fit2 <- lm(y ~ x, data = PetersenCL)
summ(fit2, robust = TRUE, cluster = "firm", robust.type = "HC3")
#> Observations: 5000
#> Dependent Variable: y
#> F(1,4998) = 1310.74, p = 0
#> R-squared = 0.21
#> Adj. R-squared = 0.21
#> Standard errors: Cluster-robust, type = HC3
#>             Est. S.E. t val.    p    
#> (Intercept) 0.03 0.07   0.44 0.66    
#> x           1.03 0.05  20.36 0    ***

Of course, summ like summary is best-suited for interactive use. When it comes to share results with others, you want sharper output and probably graphics. jtools has some options for that, too.

First, for tabular output, export_summs is an interface to the huxtable package’s huxreg function that preserves the niceties of summ, particularly its facilities for robust standard errors and standardization. It also concatenates multiple models into a single table.

fit <- lm(mpg ~ hp + wt, data = mtcars)
fit_b <- lm(mpg ~ hp + wt + disp, data = mtcars)
fit_c <- lm(mpg ~ hp + wt + disp + drat, data = mtcars)
export_summs(fit, fit_b, fit_c, scale = TRUE, scale.response = TRUE,
             note = "")
1 2 3
(Intercept) 5.27e-17.00 4.98e-17.00 1.02e-16.00
(0.08) (0.08) (0.08)
hp -0.36 ** -0.35 * -0.40 **
(0.10) (0.13) (0.13)
wt -0.63 *** -0.62 ** -0.56 **
(0.10) (0.17) (0.18)
disp -0.02 0.08
(0.21) (0.22)
drat 0.16
N 32 32 32
R 2 0.83 0.83 0.84

In RMarkdown documents, using export_summs and the chunk option results = 'asis' will give you nice-looking tables in HTML and PDF output. Using the to.word = TRUE argument will create a Microsoft Word document with the table in it.

Another way to get a quick gist of your regression analysis is to plot the values of the coefficients and their corresponding uncertainties with plot_summs (or the closely related plot_coefs). jtools has made some slight changes to ggplot2 geoms to make everything look nice; and like with export_summs, you can still get your scaled models and robust standard errors.

plot_summs(fit, fit_b, fit_c, scale = TRUE, robust = TRUE, 
    coefs = c("Horsepower" = "hp", "Weight (tons)" = "wt", 
        "Displacement" = "disp", "Rear axle ratio" = "drat"))

And since you get a ggplot object in return, you can tweak and theme as you wish.

plot_coefs works much the same way, but without support for summ arguments like robust and scale. This enables a wider range of models that have support from the broom package but not for summ. And you can give summ objects to plot_coefs since this package defines tidy methods for summ objects.

For instance, I could compare the confidence bands with different robust standard error specifications using plot_coefs by giving the summ objects as arguments.

summ_fit_1 <- summ(fit_b, scale = TRUE)
summ_fit_2 <- summ(fit_b, scale = TRUE, robust = TRUE,
                    robust.type = "HC0")
summ_fit_3 <- summ(fit_b, scale = TRUE, robust = TRUE,
                    robust.type = "HC3")
plot_coefs(summ_fit_1, summ_fit_2, summ_fit_3, 
           model.names = c("OLS","HC0","HC3"),
           coefs = c("Horsepower" = "hp", "Weight (tons)" = "wt", 
                    "Displacement" = "disp"))

Exploring interactions

Unless you have a really keen eye and good familiarity with both the underlying mathematics and the scale of your variables, it can be very difficult to look at the ouput of regression model that includes an interaction and actually understand what the model is telling you.

This package contains several means of aiding understanding and doing statistical inference with interactions.

Johnson-Neyman intervals and simple slopes analysis

The “classic” way of probing an interaction effect is to calculate the slope of the focal predictor at different values of the moderator. When the moderator is binary, this is especially informative—e.g., what is the slope for men vs. women? But you can also arbitrarily choose points for continuous moderators.

With that said, the more statistically rigorous way to explore these effects is to find the Johnson-Neyman interval, which tells you the range of values of the moderator in which the slope of the predictor is significant vs. nonsignificant at a specified alpha level.

The sim_slopes function will by default find the Johnson-Neyman interval and tell you the predictor’s slope at specified values of the moderator; by default either both values of binary predictors or the mean and the mean +/- one standard deviation for continuous moderators.

fiti <- lm(mpg ~ hp * wt, data = mtcars)
sim_slopes(fiti, pred = hp, modx = wt, jnplot = TRUE)
#> The slope of hp is p < .05 when wt is OUTSIDE this interval:
#> [3.69, 5.9]
#> Note: The range of observed values of wt is [1.51, 5.42]

#> Slope of hp when wt = 4.2 (+ 1 SD): 
#> Est. S.E.    p 
#> 0.00 0.01 0.76 
#> Slope of hp when wt = 3.22 (Mean): 
#>  Est.  S.E.     p 
#> -0.03  0.01  0.00 
#> Slope of hp when wt = 2.24 (- 1 SD): 
#>  Est.  S.E.     p 
#> -0.06  0.01  0.00

The Johnson-Neyman plot can really help you get a handle on what the interval is telling you, too. Note that you can look at the Johnson-Neyman interval directly with the johnson_neyman function.

The above all generalize to three-way interactions, too.

Visualizing interaction effects

This function plots two- and three-way interactions using ggplot2 with a similar interface to the aforementioned sim_slopes function. Users can customize the appearance with familiar ggplot2 commands. It supports several customizations, like confidence intervals.

interact_plot(fiti, pred = hp, modx = wt, interval = TRUE)

You can also plot the observed data for comparison:

interact_plot(fiti, pred = hp, modx = wt, plot.points = TRUE)

The function also supports categorical moderators—plotting observed data in these cases can reveal striking patterns.

fitiris <- lm(Petal.Length ~ Petal.Width * Species, data = iris)
interact_plot(fitiris, pred = Petal.Width, modx = Species, plot.points = TRUE)

You may also combine the plotting and simple slopes functions by using probe_interaction, which calls both functions simultaneously. Categorical by categorical interactions can be investigated using the cat_plot function.

Other stuff


This will format your ggplot2 graphics to make them (mostly) appropriate for APA style publications. There’s no drop-in, perfect way to get plots into APA format sight unseen, but this gets you close and returns a ggplot object that can be further tweaked to your specification.

The plots produced by other functions in this package use theme_apa, but use its options to position the plots and alter other details to make them more in line with ggplot2 defaults than APA norms.

You might start with something like the above interaction plots and then use theme_apa to tune it to APA specification. Note the legend.pos option:

p <- interact_plot(fitiris, pred = "Petal.Width", modx = "Species", plot.points = TRUE)
p + theme_apa(legend.pos = "topleft")

You may need to make further changes to please your publisher, of course. Since these are regular ggplot theme changes, it shouldn’t be a problem.


This function extends the survey package by calculating correlations with complex survey designs, a feature absent from survey. Users may request significance tests, which are calculated via bootstrap by calling the weights package.

dstrat <- svydesign(id = ~1, strata = ~stype, weights = ~pw, data = apistrat, fpc = ~fpc)
svycor(~ api00 + api99 + dnum, design = dstrat, sig.stats = TRUE)
#>       api00 api99 dnum 
#> api00 1     0.98* 0.25*
#> api99 0.98* 1     0.24*
#> dnum  0.25* 0.24* 1

Tests for survey weight ignorability

In keeping with the package’s attention to users of survey data, I’ve implemented a couple of tests that help to check whether your model is specified correctly without survey weights. It goes without saying that you shouldn’t let statistical tests do your thinking for you, but they can provide useful info.

The first is wgttest, which implements the DuMouchel-Duncan (1983) procedure and is meant in part to duplicate the user-written Stata procedure of the same name. It can both test whether the model fit overall is changed with the addition of weights as well as show you which coefficients are most affected.

The next is pf_sv_test, short for Pfeffermann-Sverchkov (1999) test, which focuses on residual correlation with weights. You’ll need the boot package for this one.

To run both at once, you can use weights_tests.


gscale, center_lm, scale_lm, and svysd each do some of the behind the scenes computation in the above functions, but could do well for end users as well. See the documentation for more.

Details on the arguments can be accessed via the R documentation (?functionname). There are now vignettes documenting just about everything you can do as well.


I’m happy to receive bug reports, suggestions, questions, and (most of all) contributions to fix problems and add features. I prefer you use the Github issues system over trying to reach out to me in other ways. Pull requests for contributions are encouraged.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.


The source code of this package is licensed under the MIT License.


jtools 0.9.3 (CRAN release)


  • johnson_neyman and sim_slopes were both encountering errors with merMod input. Thanks to Seongho Bae for reporting these issues and testing out development versions.
  • An upcoming version of R will change a common warning to an error, causing a need to change the internals of gscale.
  • The default model names in export_summs had an extra space (e.g., ( 1)) due to changes in huxtable. The defaults are now just single numbers.

jtools 0.9.2


  • Johnson-Neyman plots misreported the alpha level if control.fdr was TRUE. It was reporting alpha * 2 in the legend, but now it is accurate again.

Feature update:

  • johnson_neyman now handles multilevel models from lme4.

jtools 0.9.1 (CRAN release)

Bugfix update:

Jonas Kunst helpfully pointed out some odd behavior of interact_plot with factor moderators. No longer should there be occasions in which you have two different legends appear. The linetype and colors also should now be consistent whether there is a second moderator or not. For continuous moderators, the darkest line should also be a solid line and it is by default the highest value of the moderator.

Other fixes:

  • An update to huxtable broke export_summs, but that has been fixed.

Feature updates:

  • You can now manually provide colors to interact_plot and cat_plot by providing a vector of colors (any format that ggplot2 accepts) for the color.class argument.
  • Noah Greifer wrote up a tweak to summ that formats the output in a way that lines up the decimal points. It looks great.

jtools 0.9.0 (CRAN release)

This may be the single biggest update yet. If you downloaded from CRAN, be sure to check the 0.8.1 update as well.

New features are organized by function.


  • A new control.fdr option is added to control the false discovery rate, building on new research. This makes the test more conservative but less likely to be a Type 1 error.
  • A line.thickness argument has been added after Heidi Jacobs pointed out that it cannot be changed after the fact.
  • The construction of the multiple plots when using sim_slopes for 3-way interactions is much-improved.
  • The critical test statistic used by default has been slightly altered. It previously used a normal approximation; i.e., if alpha = .05 the critical test statistic was always 1.96. Now, the residual degrees of freedom are used with the t distribution. You can do it the old way by setting df = "normal" or any arbitrary number.


  • More improvements to plot.points (see 0.8.1 for more). You can now plot observed data with 3-way interactions.
  • Another pre-set modxvals and mod2vals specification has been added: "terciles". This splits the observed data into 3 equally sized groups and chooses as values the mean of each of those groups. This is especially good for skewed data and for second moderators.
  • A new linearity.check option for two-way interactions. This facets by each level of the moderator and lets you compare the fitted line with a loess smoothed line to ensure that the interaction effect is roughly linear at each level of the (continuous) moderator.
  • When the model used weights, like survey sampling weights, the observed data points are resized according to the observation's weight when plot.points = TRUE.
  • New jitter argument added for those using plot.points. If you don't want the points jittered, you can set jitter = 0. If you want more or less, you can play with the value until it looks right. This applies to effect_plot as well.


  • Users are now informed why the function is taking so long if r.squared or pbkrtest are slowing things down. r.squared is now set to FALSE by default.

New functions!

plot_summs: A graphic counterpart to export_summs, which was introduced in the 0.8.0 release. This plots regression coefficients to help in visualizing the uncertainty of each estimate and facilitates the plotting of nested models alongside each other for comparison. This allows you to use summ features like robust standard errors and scaling with this type of plot that you could otherwise create with some other packages.

plot_coefs: Just like plot_summs, but no special summ features. This allows you to use models unsupported by summ, however, and you can provide summ objects to plot the same model with different summ argument alongside each other.

cat_plot: This was a long time coming. It is a complementary function to interact_plot, but is designed to deal with interactions between categorical variables. You can use bar plots, line plots, dot plots, and box and whisker plots to do so. You can also use the function to plot the effect of a single categorical predictor without an interaction.

jtools 0.8.1

Thanks to Kim Henry who reported a bug with johnson_neyman in the case that there is an interval, but the entire interval is outside of the plotted area: When that happened, the legend wrongly stated the plotted line was non-significant.

Besides that bugfix, some new features:

  • When johnson_neyman fails to find the interval (because it doesn't exist), it no longer quits with an error. The output will just state the interval was not found and the plot will still be created.
  • Much better support for plotting observed data in interact_plot has been added. Previously, if the moderator was a factor, you would get very nicely colored plotted points when using plot.points = TRUE. But if the moderator was continuous, the points were just black and it wasn't very informative beyond examining the main effect of the focal predictor. With this update, the plotted points for continous moderators are shaded along a gradient that matches the colors used for the predicted lines and confidence intervals.

jtools 0.8.0 (CRAN release)

Not many user-facing changes since 0.7.4, but major refactoring internally should speed things up and make future development smoother.

jtools 0.7.4


  • interact_plot and effect_plot would trip up when one of the focal predictors had a name that was a subset of a covariate (e.g., pred = "var" but a covariate is called "var_2"). That's fixed.
  • Confidence intervals for merMod objects were not respecting the user-requested confidence level and that has been fixed.
  • Confidence intervals for merMod objects were throwing a spurious warning on R 3.4.2.
  • interact_plot was mis-ordering secondary moderators. That has been fixed.
  • export_summs had a major performance problem when providing extra arguments which may have also caused it to wrongly ignore some arguments. That has been fixed and it is much faster.


  • interact_plot now gives more informative labels for secondary moderators when the user has defined the values but not the labels.
  • confidence intervals are now properly supported with export_summs
  • changes made to export_summs for compatibility with huxtable 1.0.0 changes

jtools 0.7.3 (CRAN release)

Important bugfix:

  • When standardize was set to TRUE using summ, the model was not mean-centered as the output stated. This has been fixed. I truly regret the error---double-check any analyses you may have run with this feature.

New function: export_summs.

This function outputs regression models supported by summ in table formats useful for RMarkdown output as well as specific options for exporting to Microsoft Word files. This is particularly helpful for those wanting an efficient way to export regressions that are standardized and/or use robust standard errors.

jtools 0.7.2

The documentation for j_summ has been reorganized such that each supported model type has its own, separate documentation. ?j_summ will now just give you links to each supported model type.

More importantly, j_summ will from now on be referred to as, simply, summ. Your old code is fine; j_summ will now be an alias for summ and will run the same underlying code. Documentation will refer to the summ function, though. That includes the updated vignette.

One new feature for summ.lm:

  • With the part.corr = TRUE argument for a linear model, partial and semipartial correlations for each variable are reported.

More tweaks to summ.merMod:

  • Default behavior with regard to p values depends on model type (lmer vs. glmer/nlmer) and, in the case of linear models, whether the pbkrtest package is installed. If it is, p values are calculated based on the Kenward-Roger degrees of freedom calculation and printed. Otherwise, p values are not shown by default with lmer models. P values are shown with glmer models, since that is also the default behavior of lme4.
  • There is an r.squared option, which for now is FALSE by default. It adds runtime since it must fit a null model for comparison and sometimes this also causes convergence issues.

jtools 0.7.1

Returning to CRAN!

A very strange bug on CRAN's servers was causing jtools updates to silently fail when I submitted updates; I'd get a confirmation that it passed all tests, but a LaTeX error related to an Indian journal I cited was torpedoing it before it reached CRAN servers.

The only change from 0.7.0 is fixing that problem, but if you're a CRAN user you will want to flip through the past several releases as well to see what you've missed.

jtools 0.7.0

New features:

  • j_summ can now provide cluster-robust standard errors for lm models.
  • j_summ output now gives info about missing observations for supported models.
  • At long last, j_summ/scale_lm/center_lm can standardize/center models with logged terms and other functions applied.
  • interact_plot and effect_plot will now also support predictors that have functions applied to them.
  • j_summ now supports confidence intervals at user-specified widths.
  • j_summ now allows users to not display p-values if requested.
  • I've added a warning to j_summ output with merMod objects, since it provides p-values calculated on the basis of the estimated t-values. These are not to be interpreted in the same way that OLS and GLM p-values are, since with smaller samples mixed model t-values will give inflated Type I error rates.
  • By default, j_summ will not show p-values for merMod objects.

Bug fix:

  • scale_lm did not have its center argument implemented and did not explain the option well in its documentation.
  • johnson_neyman got confused when a factor variable was given as a predictor

jtools 0.6.1

Bug fix release:

  • wgttest acted in a way that might be unexpected when providing a weights variable name but no data argument. Now it should work as expected by getting the data frame from the model call.
  • gscale had a few situations in which it choked on missing data, especially when weights were used. This in turn affected j_summ, scale_lm, and center_lm, which each rely on gscale for standardization and mean-centering. That's fixed now.
  • gscale wasn't playing nicely with binary factors in survey designs, rendering the scaling incorrect. If you saw a warning, re-check your outputs after this update.

jtools 0.6.0

A lot of changes!

New functions:

  • effect_plot: If you like the visualization of moderation effects from interact_plot, then you should enjoy effect_plot. It is a clone of interact_plot, but shows a single regression line rather than several. It supports GLMs and lme4 models and can plot original, observed data points.
  • pf_sv_test: Another tool for survey researchers to test whether it's okay to run unweighted regressions. Named after Pfefferman and Svervchkov, who devised the test.
  • weights_tests: Like probe_interaction does for the interaction functions, weights_tests will run the new pf_sv_test as well as wgttest simultaneously with a common set of arguments.


  • Set a default number of digits to print for all jtools functions with the option "jtools-digits".
  • wgttest now accepts and tests GLMs and may work for other regression models.

Bug fixes:

  • j_summ would print significance stars based on the rounded p value, sometimes resulting in misleading output. Now significance stars are based on the non-rounded p values.
  • probe_interaction did not pass an "alpha" argument to sim_slopes, possibly confusing users of johnson_neyman. The argument sim_slopes is looking for is called "jnalpha". Now probe_interaction will pass "alpha" arguments as "jn_alpha".
  • interact_plot would stop on an error when the model included a two-level factor not involved in the interaction and not centered. Now those factors in that situation are treated like other factors.
  • interact_plot sometimes gave misleading output when users manually defined moderator labels. It is now more consistent with the ordering the labels and values and will not wrongly label them when the values are provided in an odd order.
  • wgttest now functions properly when a vector of weights is provided to the weights argument rather than a column name.
  • gscale now works properly on tibbles, which requires a different style of column indexing than data frames.
  • Related to the prior point, j_summ/standardize_lm/center_lm now work properly on models that were originally fit with tibbles in the data argument.
  • sim_slopes would fail for certain weighted lm objects depending on the way the weights were specified in the function call. It should now work for all weighted lm objects.

jtools 0.5.0

More goodies for users of interact_plot:

  • Added support for models with a weights parameter in interact_plot. It would work previously, but didn't use a weighted mean or SD in calculating values of the moderator(s) and for mean-centering other predictors. Now it does.
  • Added support for two-level factor predictors in interact_plot. Previously, factor variables had to be a moderator.
  • When predictor in interact_plot has only two unique values (e.g., dummy variables that have numeric class), by default only those two values have tick marks on the x-axis. Users may use the pred.labels argument to specify labels for those ticks.
  • Offsets are now supported (especially useful for Poisson GLMs), but only if specified via the offset argument rather than included in the model formula. You can (and should) specify the offset used for the plot using the set.offset argument. By default it is 1 so that the y-axis represents a proportion.

Other feature changes:

  • sim_slopes now supports weights (from the weights argument rather than a svyglm model). Previously it used unweighted mean and standard deviation for non-survey models with weights.
  • Improved printing features of wgttest

Bug fixes:

  • R 3.4 introduced a change that caused warning messages when return objects are created in a certain way. This was first addressed in jtools 0.4.5, but a few instances slipped through the cracks. Thanks to Kim Henry for pointing out one such instance.
  • When sim_slopes called johnson_neyman while the robust argument was set to TRUE, the robust.type argument was not being passed (causing the default of "HC3" to be used). Now it is passing that argument correctly.

jtools 0.4.5

  • Added better support for plotting nonlinear interactions with interact_plot, providing an option to plot on original (nonlinear) scale.
  • interact_plot can now plot fixed effects interactions from merMod objects
  • Fixed warning messages when using j_summ with R 3.4.x
  • Added preliminary merMod support for j_summ. Still needs convergence warnings, some other items.

jtools 0.4.4

  • Under the hood changes to j_summ
  • Cleaned up examples
  • Added wgttest function, which runs a test to assess need for sampling weights in linear regression

jtools 0.4.3

  • No matter what you do, there's nothing like seeing your package on CRAN to open your eyes to all the typos, etc. you've put into your package.

jtools 0.4.2 — Initial CRAN release

  • This is the first CRAN release. Compared to 0.4.1, the prior Github release, dependencies have been removed and several functions optimized for speed.

Reference manual

It appears you don't have a PDF plugin for this browser. You can click here to download the reference manual.


0.9.4 by Jacob A. Long, 4 days ago

Report a bug at

Browse source code at

Authors: Jacob A. Long [aut, cre] (<>)

Documentation:   PDF Manual  

MIT + file LICENSE license

Imports ggplot2

Suggests boot, broom, car, cowplot, flextable, grid, huxtable, lme4, methods, officer, pbkrtest, plyr, RColorBrewer, sandwich, stats4, survey, weights, knitr, rmarkdown, testthat

See at CRAN