Summarizes key information about statistical objects in tidy tibbles. This makes it easy to report results, create plots and consistently work with large numbers of models at once. Broom provides three verbs that each provide different types of information about a model. tidy() summarizes information about model components such as coefficients of a regression. glance() reports information about an entire model, such as goodness of fit measures like AIC and BIC. augment() adds information about individual observations to a dataset, such as fitted values or influence measures.
broom summarizes key information about models in tidy tibble()s. broom provides three verbs to make it convenient to interact with model objects:
tidy() summarizes information about model components
glance() reports information about the entire model
augment() adds information about observations to a dataset
For a detailed introduction, please see vignette("broom").
broom tidies 100+ models from popular modelling packages and almost all of the model objects in the stats package that comes with base R. vignette("available-methods") lists method availability.
If you aren't familiar with tidy data structures and want to know how they can make your life easier, we highly recommend reading Hadley Wickham's Tidy Data.
install.packages("tidyverse")

# alternatively, to install just broom:
install.packages("broom")

# to get the development version from GitHub:
install.packages("devtools")
devtools::install_github("tidyverse/broom")
If you find a bug, please file a minimal reproducible example in the issues.
tidy() produces a tibble() where each row contains information about an important component of the model. For regression models, this often corresponds to regression coefficients. This can be useful if you want to inspect a model or create custom visualizations.
library(broom)

fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)
tidy(fit)
#> # A tibble: 3 x 5
#>   term         estimate std.error statistic  p.value
#>   <chr>           <dbl>     <dbl>     <dbl>    <dbl>
#> 1 (Intercept)     3.59     0.0937     38.3  2.51e-78
#> 2 Petal.Length   -0.257    0.0669     -3.84 1.80e- 4
#> 3 Petal.Width     0.364    0.155       2.35 2.01e- 2
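Because tidy() returns an ordinary tibble, its rows can be filtered like any other data. The following is a minimal sketch of that idea, assuming the conf.int argument of tidy() for lm objects; the object names coefs, slopes and significant are illustrative only.

```r
library(broom)

fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)

# conf.int = TRUE asks tidy() to append conf.low and conf.high columns
coefs <- tidy(fit, conf.int = TRUE)

# Drop the intercept, then keep slopes whose 95% interval excludes zero
slopes <- coefs[coefs$term != "(Intercept)", ]
significant <- slopes[slopes$conf.low > 0 | slopes$conf.high < 0, ]

significant$term
```

The same tibble could equally be passed to a plotting function to draw a coefficient plot.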
glance() returns a tibble with exactly one row of goodness of fit measures and related statistics. This is useful to check for model misspecification and to compare many models.
glance(fit)
#> # A tibble: 1 x 11
#>   r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC
#> *     <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>
#> 1     0.213         0.202 0.389      19.9 2.24e-8     3  -69.8  148.  160.
#> # ... with 2 more variables: deviance <dbl>, df.residual <int>
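Since every glance() result is a one-row tibble, comparing many models amounts to stacking those rows. A minimal sketch, assuming dplyr is available for bind_rows(); the three formulas and their names are made up for illustration.

```r
library(broom)
library(dplyr)

# Candidate models for the same response (illustrative choices)
formulas <- list(
  length_only = Sepal.Width ~ Petal.Length,
  width_only  = Sepal.Width ~ Petal.Width,
  both        = Sepal.Width ~ Petal.Length + Petal.Width
)

fits <- lapply(formulas, lm, data = iris)

# One row per model; .id stores the list names in a "model" column
summaries <- bind_rows(lapply(fits, glance), .id = "model")

# Sort by AIC so the best-scoring model comes first
summaries[order(summaries$AIC), c("model", "adj.r.squared", "AIC", "BIC")]
```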
augment() adds columns to a dataset, containing information such as fitted values, residuals or cluster assignments. All columns added to a dataset have a . prefix to prevent existing columns from being overwritten.
augment(fit, data = iris)
#> # A tibble: 150 x 12
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species .fitted
#>  *        <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa     3.30
#>  2          4.9         3            1.4         0.2 setosa     3.30
#>  3          4.7         3.2          1.3         0.2 setosa     3.33
#>  4          4.6         3.1          1.5         0.2 setosa     3.27
#>  5          5           3.6          1.4         0.2 setosa     3.30
#>  6          5.4         3.9          1.7         0.4 setosa     3.30
#>  7          4.6         3.4          1.4         0.3 setosa     3.34
#>  8          5           3.4          1.5         0.2 setosa     3.27
#>  9          4.4         2.9          1.4         0.2 setosa     3.30
#> 10          4.9         3.1          1.5         0.1 setosa     3.24
#> # ... with 140 more rows, and 6 more variables: .se.fit <dbl>,
#> #   .resid <dbl>, .hat <dbl>, .sigma <dbl>, .cooksd <dbl>,
#> #   .std.resid <dbl>
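The . prefix convention and the meaning of the added columns can be checked directly. This is a small sketch rather than a specification; the exact set of added columns may differ between broom versions, so the code only inspects the names it finds.

```r
library(broom)

fit <- lm(Sepal.Width ~ Petal.Length + Petal.Width, iris)
aug <- augment(fit, data = iris)

# Columns that augment() added on top of the original data
added <- setdiff(names(aug), names(iris))
all(startsWith(added, "."))

# .resid should equal the observed response minus .fitted
all.equal(unname(aug$.resid), unname(iris$Sepal.Width - aug$.fitted))
```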
We welcome contributions of all types!
If you have never made a pull request to an R package before, broom is an excellent place to start. Find an issue with the Beginner Friendly tag and comment that you'd like to take it on and we'll help you get started.
We encourage typo corrections, bug reports, bug fixes and feature requests. Feedback on the clarity of the documentation is especially valuable.
If you are interested in adding new tidier methods to broom, please read vignette("adding-tidiers").
We have a Contributor Code of Conduct. By participating in broom you agree to abide by its terms.
To be released as 0.5.0
Tidiers now return tibble::tibble()s. This release also includes several new tidiers, new vignettes and a large number of bugfixes. We've also begun to more rigorously define tidier specifications: we've laid part of the groundwork for stricter and more consistent tidying, but the new tidier specifications are not yet complete. These will appear in the next release.
Additionally, users should note that we are in the process of migrating tidying methods for mixed models and Bayesian models to broom.mixed. broom.mixed is not on CRAN yet, but all mixed model and Bayesian tidiers will be deprecated once broom.mixed is on CRAN. No further development of mixed model tidiers will take place in broom.
Almost all tidiers should now return tibbles rather than data.frames. Deprecated tidying methods, Bayesian and mixed model tidiers still return data.frames.
Users are most likely to experience issues when using augment in situations where tibbles are stricter than data frames. For example, specifying model covariates as a matrix object will now error:
library(broom)
library(quantreg)

fit <- rq(stack.loss ~ stack.x, tau = .5)
broom::augment(fit)
#> Error: Column `stack.x` must be a 1d atomic vector or a list
This is because the default data argument, data = model.frame(fit), cannot be coerced to a tibble.
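The matrix column is visible in the model frame itself. The sketch below uses lm() from base R in place of rq() so that it runs without quantreg; that substitution is an assumption made for illustration, but the model frame is built the same way from the formula. Fitting against the stackloss data frame instead yields ordinary one-dimensional columns.

```r
# stack.loss (numeric vector) and stack.x (matrix) ship with the datasets package
fit_matrix <- lm(stack.loss ~ stack.x)
mf_matrix <- model.frame(fit_matrix)

ncol(mf_matrix)               # two columns: the response and one matrix column
is.matrix(mf_matrix$stack.x)  # the covariates are stored as a single matrix

# Same model with covariates taken from a data frame
fit_df <- lm(stack.loss ~ ., data = stackloss)
mf_df <- model.frame(fit_df)

# Every column is now a plain vector
any(vapply(mf_df, is.matrix, logical(1)))
```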
Another consequence of this is that augment.survreg and augment.coxph from the survival package now require that the user explicitly passes data to either the data or newdata arguments.
These restrictions will be relaxed in an upcoming release of broom pending support for matrix-columns in tibbles.
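Passing the data explicitly might look like the following sketch. It assumes the survival package and its lung dataset are available; the choice of covariates is illustrative, and the columns returned can vary by broom version.

```r
library(broom)
library(survival)

# Cox model on the lung data that ships with survival
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Supply the original data rather than relying on the default argument
aug <- augment(fit, data = lung)

nrow(aug) == nrow(lung)
```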
Developers are likely to experience issues:
[, which returns a tibble rather than a vector.
tbl_df and tbl beyond the data.frame class
roxygen2 template based documentation system.
This version of broom includes several new vignettes:
vignette("available-methods", package = "broom") contains a table detailing which tidying methods are available
vignette("adding-tidiers", package = "broom") is an in-progress guide for contributors on how to add new tidiers to broom
vignette("glossary", package = "broom") contains tables describing acceptable argument names and column names for the in-progress new specification.
Several old vignettes have also been updated:
vignette("bootstrapping", package = "broom") now relies on the rsample package and a tidyr::nest-purrr::map-tidyr::unnest workflow. This is now the recommended workflow for working with multiple models, as opposed to the old dplyr::rowwise-dplyr::do based workflow.
tibble::as_tibble and tibble::enframe
bootstrap() has been deprecated in favor of the rsample package
inflate has been removed from broom
alpha argument has been removed from quantreg tidy methods
separate.levels argument has been removed from tidy.TukeyHSD. To obtain the effect of separate.levels = TRUE, users may tidyr::separate after tidying. This is consistent with the multcomp tidier behavior.
fe.error argument was removed from tidy.felm. When fixed effects are tidied, their standard errors are now always included.
diag argument in tidy.dist has been renamed diagonal
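The tidyr::separate step mentioned for tidy.TukeyHSD could look roughly like this. To avoid depending on the column names of a particular broom version, the sketch builds the comparison column by hand from the base R TukeyHSD() result; the names comparisons, level1 and level2 are illustrative.

```r
library(tidyr)

tukey <- TukeyHSD(aov(Sepal.Width ~ Species, data = iris))

# Row names such as "versicolor-setosa" identify the two levels compared
comparisons <- data.frame(
  comparison = rownames(tukey$Species),
  tukey$Species,
  row.names = NULL
)

# Split the comparison label into one column per level
separated <- separate(
  comparisons, comparison,
  into = c("level1", "level2"), sep = "-"
)

separated[, c("level1", "level2", "diff")]
```

Splitting on "-" only works when the factor levels themselves contain no dashes, which holds for the iris species names.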
glance support for arima objects fit with method = "CSS" (#396 by @josue-rodriguez)
glmnet objects with family = multinomial (#395 by @erleholgersen)
quantreg intercept only models (#378 by @erleholgersen)
aovlist objects (#377 by @mvevans89)
glmnetUtils objects (#352 by @Hong-Revo)
tidy_emmeans to handle column names with dashes (#351 by @bmannakee)
augment.felm no longer returns .fe_ and .comp columns
augment.felm (#347 by @ShreyasSingh)
confint_tidy now drops rows of all NA (#345 by @atyre2)
caret::confusionMatrix objects (#344 by @mkuehn10)
Kendall::Kendall objects (#343 by @cimentadaj)
car::durbinWatsonTest objects (#341 by @mkuehn10)
glance throws an informative error for quantreg::rq models fit with multiple tau values (#338 by @bfgray3)
tidy.glmnet gains the ability to retain zero-valued coefficients with a return_zeros argument that defaults to FALSE (#337 by @bfgray3)
tidy.manova now retains a Residuals row (#334 by @jarvisc1)
ordinal::clm, ordinal::clmm, survey::svyolr and MASS::polr ordinal model objects (#332 by @larmarange)
anova objects from car::Anova (#325 by @mariusbarth)
tseries::garch models (#323 by @wilsonfreitas)
psych package (#313 by @nutterb)
rstanarm and loo packages (#298 by @jgabry)
irlba::irlba
tidy.prcomp when missing labels (#265 by @corybrunson)
pkgdown site at https://broom.tidyverse.org/ (#260 by @jayhesselberth)
AER::ivreg models (#247 by @hughjonesd)
lavaan package (#233 by @puterleat)
conf.int argument to tidy.coxph (#220 by @larmarange)
augment method for chi-squared tests (#138 by @larmarange)
Many many thanks to all the following for their thoughtful comments on design, bug reports and PRs! The community of broom contributors has been kind, supportive and insightful and I look forward to working with you all again!
@atyre2, @batpigandme, @bfgray3, @bmannakee, @briatte, @cawoodjm, @cimentadaj, @dan87134, @dgrtwo, @dmenne, @ekatko1, @ellessenne, @erleholgersen, @Hong-Revo, @huftis, @IndrajeetPatil, @jacob-long, @jarvisc1, @jenzopr, @jgabry, @jimhester, @josue-rodriguez, @karldw, @kfeilich, @larmarange, @lboller, @mariusbarth, @michaelweylandt, @mine-cetinkaya-rundel, @mkuehn10, @mvevans89, @nutterb, @ShreyasSingh, @stephlocke, @strengejacke, @topepo, @willbowditch, @WillemSleegers, and @wilsonfreitas
dplyr::failwith to purrr::possibly
augment and glance on NULLs now return an empty data frame
inflate() function in favor of tidyr::crossing
quick = TRUE to return terms as character rather than factor (thanks to #191 from Matteo Sostero)
ivreg objects from the AER package (thanks to #245 from David Hugh-Jones)
survdiff objects from the survival package (thanks to #147 from Michał Bojanowski)
emmeans from the emmeans package (thanks to #252 from Matthew Kay)
speedlm and speedglm from the speedglm package (thanks to #248 from David Hugh-Jones)
muhaz objects from the muhaz package (thanks to #251 from Andreas Bender)
decompose and stl objects from stats (thanks to #165 from Aaron Jacobs)
lsmobj and ref.grid objects from the lsmeans package
betareg objects from the betareg package
lmRob and glmRob objects from the robust package
brms objects from the brms package (thanks to #149 from Paul Buerkner)
tidy.glmnet to filter out rows where estimate == 0.
rstanarm tidiers (thanks to #177 from Jonah Gabry)
tidy.TukeyHSD to include term column. Also added separate.levels argument, with option to separate comparison into level1 and level2
tidy.manova to use correct column name for test (previously, always pillai)
kde_tidiers to tidy kernel density estimates
orcutt_tidiers to tidy the results of cochrane.orcutt from the orcutt package
tidy.dist to tidy the distance matrix output of dist from the stats package
tidy and glance for lmodel2 objects from the lmodel2 package
poLCA objects from the poLCA package
prcomp objects
Mclust objects from the Mclust package
acf objects
tidy methods for lists, including u, d, v lists from svd, and x, y, z lists used by image and persp
quick argument to tidy.lm, tidy.nls, and tidy.biglm, to create a smaller and faster version of the output.
rowwise_df_tidiers to allow the original data to be saved as a list column, then provided as a column name to augment. This required removing data from the augment S3 signature. Also added tests-rowwise.R
tidy.coeftest for coeftest objects from the lmtest package.
tidy.lm to work with "mlm" (multiple linear model) objects (those with multiple response columns).
tidy and glance for "biglm" and "bigglm" objects from the biglm package.
tidy.coxph when one-row matrices are returned
tidy.power.htest
tidy and glance for summaryDefault objects
tidy and glance for multinom objects from the nnet package.
tidy.pairwise.htest, which now can handle cases where the grouping variable is numeric.
tidy.aovlist method. This added stringr package to IMPORTS to trim whitespace from the beginning and end of the term and stratum columns. This also required adjusting tidy.aov so that it could handle strata that are missing p-values.
glance.lm to work with aov objects along with lm objects.
tidy and glance for matrix objects, with tidy.matrix converting a matrix to a data frame with rownames included, and glance.matrix returning the same result as glance.data.frame.
felm where the .fitted and .resid columns were matrices rather than vectors.
rlm (robust linear model) and gam (generalized additive model) objects, including adjustments to "lm" tidiers in order to handle them. See ?rlm_tidiers and ?gam_tidiers for more.
tidy.cv.glmnet output
The behavior of augment, particularly with regard to missing data and the na.exclude argument, has through the use of the augment_columns function been made consistent across the following models:
lm
glm
nls
merMod (lme4)
survreg (survival)
coxph (survival)
Unit tests in tests/testthat/test-augment.R were added to ensure consistency across these models.
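The missing-data behavior that augment has to accommodate comes from base R's na.action handling. The sketch below uses only base R and a modified copy of mtcars, with the NA positions chosen arbitrarily for illustration. It shows how na.exclude pads residuals and fitted values back to the original row count, whereas na.omit drops those rows.

```r
# Copy of mtcars with two missing predictor values (rows picked arbitrarily)
d <- mtcars
d$wt[c(3, 7)] <- NA

fit_omit    <- lm(mpg ~ wt, data = d, na.action = na.omit)
fit_exclude <- lm(mpg ~ wt, data = d, na.action = na.exclude)

# na.omit: rows with NA are dropped from residuals and fitted values
length(residuals(fit_omit))

# na.exclude: output is padded with NA so it lines up with the original rows
length(residuals(fit_exclude))
unname(which(is.na(fitted(fit_exclude))))
```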
tidy, augment and glance methods were added for rowwise_df objects, and are set up to apply across their rows. This allows for simple patterns such as:
regressions <- mtcars %>% group_by(cyl) %>% do(mod = lm(mpg ~ wt, .))
regressions %>% tidy(mod)
regressions %>% augment(mod)
See ?rowwise_df_tidiers for more.
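The 0.5.0 notes recommend a tidyr::nest-purrr::map-tidyr::unnest workflow in place of this rowwise pattern. A rough equivalent of the grouped regression above might look like the following sketch; it assumes dplyr, tidyr and purrr are installed, and the column names fit and tidied are illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

models <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%                                      # one row per cyl, data in a list column
  mutate(
    fit    = map(data, ~ lm(mpg ~ wt, data = .x)), # fit one model per group
    tidied = map(fit, tidy)                        # tidy each model
  )

# Expand the tidied list column into regular rows
coefs <- models %>%
  select(cyl, tidied) %>%
  unnest(tidied)

coefs
```

Swapping tidy for glance or augment inside map() gives the per-group model summaries or observation-level output in the same way.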
Added tidy and glance methods for Arima objects, and tidy for pairwise.htest objects.
Fixes for CRAN: change package description to title case, removed NOTES, mostly by adding globals.R to declare global variables.
This is the original version published on CRAN.
lme4
glmnet
survival
zoo
felm
MASS (ridgelm objects)
tidy and glance methods for data.frames have also been added, and augment.data.frame produces an error (rather than returning the same data.frame).
stderror has been changed to std.error (affects many functions) to be consistent with broom's naming conventions for columns.
bootstrap has been added based on this example, to perform the common use case of bootstrapping models.