Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.
broom summarizes key information about models in tidy tibble()s. broom provides three verbs to make it convenient to interact with model objects:
tidy() summarizes information about model components
glance() reports information about the entire model
augment() adds information about observations to a dataset
For a detailed introduction, please see
broom tidies 100+ models from popular modelling packages and almost all of the model objects in the stats package that comes with base R. vignette("available-methods") lists method availability.
If you aren't familiar with tidy data structures and want to know how they can make your life easier, we highly recommend reading Hadley Wickham's Tidy Data.
    install.packages("tidyverse")

    # alternatively, to install just broom:
    install.packages("broom")

    # to get the development version from GitHub:
    install.packages("devtools")
    devtools::install_github("tidyverse/broom")
If you find a bug, please file a minimal reproducible example in the issues.
tidy() produces a tibble() where each row contains information about an important component of the model. For regression models, this often corresponds to regression coefficients. This can be useful if you want to inspect a model or create custom visualizations.
    library(broom)

    fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)
    tidy(fit)
    #> # A tibble: 3 x 5
    #>   term         estimate std.error statistic  p.value
    #>   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
    #> 1 (Intercept)     3.59     0.0937     38.3  2.51e-78
    #> 2 Petal.Length   -0.257    0.0669     -3.84 1.80e- 4
    #> 3 Petal.Width     0.364    0.155       2.35 2.01e- 2
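Because the result is an ordinary tibble, it can be filtered and subset like any other data frame. The following is a minimal sketch, assuming the installed version of broom supports the conf.int argument of tidy() for lm objects; the object names coefs and slopes are illustrative only.

```r
library(broom)

fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)

# Ask tidy() to append confidence interval columns (conf.low, conf.high)
coefs <- tidy(fit, conf.int = TRUE)

# Drop the intercept and keep only the columns needed for a coefficient plot
slopes <- coefs[coefs$term != "(Intercept)", ]
slopes[, c("term", "estimate", "conf.low", "conf.high")]
```

A table in this shape can be handed directly to a plotting function, with one row per plotted coefficient.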
glance() returns a tibble with exactly one row of goodness of fit measures and related statistics. This is useful to check for model misspecification and to compare many models.
    glance(fit)
    #> # A tibble: 1 x 11
    #>   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
    #> *     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>
    #> 1     0.213         0.202 0.389      19.9 2.24e-8     3  -69.8  148.  160.
    #> # ... with 2 more variables: deviance <dbl>, df.residual <int>
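Since each model yields exactly one row, comparing several models amounts to stacking those rows. The sketch below uses only base R alongside broom; the three candidate formulas and the names formulas, fits and comparison are illustrative assumptions, not part of the package.

```r
library(broom)

# Three candidate models for the same response (illustrative choices)
formulas <- list(
  length_only = Sepal.Width ~ Petal.Length,
  width_only  = Sepal.Width ~ Petal.Width,
  both        = Sepal.Width ~ Petal.Length + Petal.Width
)

fits <- lapply(formulas, lm, data = iris)

# glance() returns one row per model, so the rows can be stacked
comparison <- do.call(rbind, lapply(fits, glance))
comparison$model <- names(fits)

# Sort by AIC to see which candidate is preferred under that criterion
comparison[order(comparison$AIC), c("model", "r.squared", "AIC", "BIC")]
```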
augment() adds columns to a dataset, containing information such as fitted values, residuals or cluster assignments. All columns added to a dataset have a . prefix to prevent existing columns from being overwritten.
    augment(fit, data = iris)
    #> # A tibble: 150 x 12
    #>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species .fitted
    #>  *        <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
    #>  1          5.1         3.5          1.4         0.2 setosa     3.30
    #>  2          4.9         3            1.4         0.2 setosa     3.30
    #>  3          4.7         3.2          1.3         0.2 setosa     3.33
    #>  4          4.6         3.1          1.5         0.2 setosa     3.27
    #>  5          5           3.6          1.4         0.2 setosa     3.30
    #>  6          5.4         3.9          1.7         0.4 setosa     3.30
    #>  7          4.6         3.4          1.4         0.3 setosa     3.34
    #>  8          5           3.4          1.5         0.2 setosa     3.27
    #>  9          4.4         2.9          1.4         0.2 setosa     3.30
    #> 10          4.9         3.1          1.5         0.1 setosa     3.24
    #> # ... with 140 more rows, and 6 more variables: .se.fit <dbl>,
    #> #   .resid <dbl>, .hat <dbl>, .sigma <dbl>, .cooksd <dbl>,
    #> #   .std.resid <dbl>
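The prefix convention can be checked directly by comparing column names before and after augmenting. This is a small sketch; the exact set of added columns may differ between broom versions, so no specific output is shown, and the names augmented and added are illustrative.

```r
library(broom)

fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)
augmented <- augment(fit, data = iris)

# Columns that augment() added on top of the original data
added <- setdiff(names(augmented), names(iris))
added

# The original columns are still present, so per-observation results
# can be summarized against any of them, e.g. mean residual by species
tapply(augmented$.resid, augmented$Species, mean)
```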
We welcome contributions of all types!
If you have never made a pull request to an R package before, broom is an excellent place to start. Find an issue with the Beginner Friendly tag and comment that you'd like to take it on and we'll help you get started.
We encourage typo corrections, bug reports, bug fixes and feature requests. Feedback on the clarity of the documentation is especially valuable.
If you are interested in adding new tidier methods to broom, please read
We have a Contributor Code of Conduct. By participating in broom you agree to abide by its terms.
augment() are now re-exported from the generics package.
Tidiers now return tibble::tibble()s. This release also includes several new tidiers, new vignettes and a large number of bugfixes. We've also begun to more rigorously define tidier specifications: we've laid part of the groundwork for stricter and more consistent tidying, but the new tidier specifications are not yet complete. These will appear in the next release.
Additionally, users should note that we are in the process of migrating tidying methods for mixed models and Bayesian models to
broom.mixed is not on CRAN yet, but all mixed model and Bayesian tidiers will be deprecated once
broom.mixed is on CRAN. No further development of mixed model tidiers will take place in
Almost all tidiers should now return tibbles rather than data.frames. Deprecated tidying methods, Bayesian and mixed model tidiers still return
Users are most likely to experience issues when using augment in situations where tibbles are stricter than data frames. For example, specifying model covariates as a matrix object will now error:
    library(broom)
    library(quantreg)

    fit <- rq(stack.loss ~ stack.x, tau = .5)
    broom::augment(fit)
    #> Error: Column `stack.x` must be a 1d atomic vector or a list
This is because the default
data = model.frame(fit) cannot be coerced to
Another consequence of this is that
augment.coxph from the
survival package now require that the user explicitly passes data to either the
These restrictions will be relaxed in an upcoming release of broom pending support for matrix-columns in tibbles.
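One way to sidestep matrix covariates in the meantime is to fit the model from a data frame with one column per covariate and pass that data frame to augment explicitly. The sketch below is an assumption-laden illustration: it substitutes lm() for rq() so that it runs without the quantreg package, and the name stack_df is hypothetical.

```r
library(broom)

# stack.x is a matrix; data.frame() spreads it into one column per covariate
stack_df <- data.frame(stack.loss = stack.loss, stack.x)

# Fit against the data frame columns instead of the matrix object
# (lm() is used here in place of rq() purely for illustration)
fit <- lm(stack.loss ~ ., data = stack_df)

# Passing data explicitly avoids relying on model.frame(fit)
augmented <- augment(fit, data = stack_df)
head(augmented)
```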
Developers are likely to experience issues:
[, which returns a tibble rather than a vector.
roxygen2template based documentation system.
This version of broom includes several new vignettes:
vignette("available-methods", package = "broom") contains a table detailing which tidying methods are available
vignette("adding-tidiers", package = "broom") is an in-progress guide for contributors on how to add new tidiers to broom
vignette("glossary", package = "broom") contains tables describing acceptable argument names and column names for the in-progress new specification.
Several old vignettes have also been updated:
vignette("bootstrapping", package = "broom") now relies on the rsample package and a tidyr::unnest workflow. This is now the recommended workflow for working with multiple models, as opposed to the old
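The general shape of an unnest-based workflow for multiple models can be sketched as follows. This is not the vignette's code: it fits one model per group rather than per bootstrap resample, does not use rsample, and assumes dplyr, purrr and tidyr (version 1.0 or later, for this nest() and unnest() syntax) are installed. The name per_cyl is illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

per_cyl <- mtcars %>%
  # One row per value of cyl, with the remaining columns in a list-column
  nest(data = -cyl) %>%
  mutate(
    # Fit one regression per group
    fit = map(data, ~ lm(mpg ~ wt, data = .x)),
    # Tidy each fit into its own small tibble
    tidied = map(fit, tidy)
  ) %>%
  select(cyl, tidied) %>%
  # Expand the list-column into regular rows: one row per group and term
  unnest(tidied)

per_cyl
```

The same pattern applies to glance() and augment(): map the verb over the list-column of fits, then unnest the result.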
bootstrap() has been deprecated in favor of the
inflate has been removed from
alpha argument has been removed from
separate.levels argument has been removed from tidy.TukeyHSD. To obtain the effect of separate.levels = TRUE, users may tidyr::separate after tidying. This is consistent with the
fe.error argument was removed from tidy.felm. When fixed effects are tidied, their standard errors are now always included.
tidy.dist has been renamed
arima objects fit with method = "CSS" (#396 by @josue-rodriguez)
family = multinomial (#395 by @erleholgersen)
quantreg intercept only models (#378 by @erleholgersen)
aovlist objects (#377 by @mvevans89)
glmnetUtils objects (#352 by @Hong-Revo)
tidy_emmeans to handle column names with dashes (#351 by @bmannakee)
augment.felm no longer returns
augment.felm (#347 by @ShreyasSingh)
confint_tidy now drops rows of all NA (#345 by @atyre2)
caret::confusionMatrix objects (#344 by @mkuehn10)
Kendall::Kendall objects (#343 by @cimentadaj)
car::durbinWatsonTest objects (#341 by @mkuehn10)
glance throws an informative error for quantreg::rq models fit with multiple tau values (#338 by @bfgray3)
tidy.glmnet gains the ability to retain zero-valued coefficients with a return_zeros argument that defaults to FALSE (#337 by @bfgray3)
tidy.manova now retains a Residuals row (#334 by @jarvisc1)
MASS::polr ordinal model objects (#332 by @larmarange)
car::Anova (#325 by @mariusbarth)
tseries::garch models (#323 by @wilsonfreitas)
psych package (#313 by @nutterb)
loo packages (#298 by @jgabry)
tidy.prcomp when missing labels (#265 by @corybrunson)
pkgdown site at https://broom.tidyverse.org/ (#260 by @jayhesselberth)
AER::ivreg models (#247 by @hughjonesd)
lavaan package (#233 by @puterleat)
tidy.coxph (#220 by @larmarange)
augment method for chi-squared tests (#138 by @larmarange)
Many many thanks to all the following for their thoughtful comments on design, bug reports and PRs! The community of broom contributors has been kind, supportive and insightful and I look forward to working with you all again!
@atyre2, @batpigandme, @bfgray3, @bmannakee, @briatte, @cawoodjm, @cimentadaj, @dan87134, @dgrtwo, @dmenne, @ekatko1, @ellessenne, @erleholgersen, @Hong-Revo, @huftis, @IndrajeetPatil, @jacob-long, @jarvisc1, @jenzopr, @jgabry, @jimhester, @josue-rodriguez, @karldw, @kfeilich, @larmarange, @lboller, @mariusbarth, @michaelweylandt, @mine-cetinkaya-rundel, @mkuehn10, @mvevans89, @nutterb, @ShreyasSingh, @stephlocke, @strengejacke, @topepo, @willbowditch, @WillemSleegers, and @wilsonfreitas
glance on NULLs now return an empty data frame
inflate() function in favor of
quick = TRUE to return terms as character rather than factor (thanks to #191 from Matteo Sostero)
ivreg objects from the AER package (thanks to #245 from David Hugh-Jones)
survdiff objects from the survival package (thanks to #147 from Michał Bojanowski)
emmeans from the emmeans package (thanks to #252 from Matthew Kay)
speedglm from the speedglm package (thanks to #248 from David Hugh-Jones)
muhaz objects from the muhaz package (thanks to #251 from Andreas Bender)
stl objects from stats (thanks to #165 from Aaron Jacobs)
ref.grid objects from the lsmeans package
betareg objects from the betareg package
glmRob objects from the robust package
brms objects from the brms package (thanks to #149 from Paul Buerkner)
tidy.glmnet to filter out rows where estimate == 0.
rstanarm tidiers (thanks to #177 from Jonah Gabry)
term column. Also added
separate.levels argument, with option to separate
tidy.manova to use correct column name for test (previously, always
kde_tidiers to tidy kernel density estimates
orcutt_tidiers to tidy the results of
tidy.dist to tidy the distance matrix output of dist from the stats package
lmodel2 objects from the lmodel2 package
poLCA objects from the poLCA package
Mclust objects from the mclust package
tidy methods for lists, including u, d, v lists from svd, and x, y, z lists used by
tidy.biglm, to create a smaller and faster version of the output.
rowwise_df_tidiers to allow the original data to be saved as a list column, then provided as a column name to augment. This required removing
augment S3 signature. Also added
tidy.coeftest for coeftest objects from the lmtest package.
tidy.lm to work with "mlm" (multiple linear model) objects (those with multiple response columns).
glance for "biglm" and "bigglm" objects from the biglm package.
tidy.coxph when one-row matrices are returned
multinom objects from the nnet package.
tidy.pairwise.htest, which now can handle cases where the grouping variable is numeric.
tidy.aovlist method. This added
stringr package to IMPORTS to trim whitespace from the beginning and end of the
stratum columns. This also required adjusting tidy.aov so that it could handle strata that are missing p-values.
glance.lm to work with aov objects along with
glance for matrix objects, with tidy.matrix converting a matrix to a data frame with rownames included, and glance.matrix returning the same result as
.resid columns were matrices rather than vectors.
rlm (robust linear model) and gam (generalized additive model) objects, including adjustments to "lm" tidiers in order to handle them. See
The behavior of augment, particularly with regard to missing data and the na.exclude argument, has been made consistent across the following models through the use of the augment_columns function:
Unit tests in tests/testthat/test-augment.R were added to ensure consistency across these models.
glance methods were added for rowwise_df objects, and are set up to apply across their rows. This allows for simple patterns such as:

    library(dplyr)
    library(broom)

    # fit one regression per value of cyl, stored in the list-column `mod`
    regressions <- mtcars %>%
      group_by(cyl) %>%
      do(mod = lm(mpg ~ wt, .))

    regressions %>% tidy(mod)
    regressions %>% augment(mod)

See ?rowwise_df_tidiers for more.
glance methods for
Arima objects, and
Fixes for CRAN: changed package description to title case, removed NOTES, mostly by adding globals.R to declare global variables.
This is the original version published on CRAN.
glance methods for data.frames have also been added, and augment.data.frame produces an error (rather than returning the same data.frame).
stderror has been changed to std.error (affects many functions) to be consistent with broom's naming conventions for columns.
bootstrap has been added based on this example, to perform the common use case of bootstrapping models.