Functions that simplify generating print-ready data summaries using 'dplyr' syntax.
dplyr provides a grammar for talking about data manipulation, and a related package, tidyr, provides a mindset for thinking about data. Together, these two tools make data manipulation much easier today. This package, ezsummary, packs up some commonly used dplyr and tidyr steps for generating data summaries, to save you some typing. It also comes with some table decoration tools that let you pipe the results directly into a table-generating function such as knitr::kable() for rendering.
For example, if you only use dplyr and tidyr to generate a statistical summary table by group, you need to go through the following steps:
    library(dplyr)
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #>
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #>
    #>     intersect, setdiff, setequal, union
    library(tidyr)
    library(knitr)  # provides kable()

    mtcars %>%
      select(cyl, mpg, wt, hp) %>%
      group_by(cyl) %>%
      # mean and sd of every remaining column, per group
      summarize_each(funs(mean, sd)) %>%
      # reshape to one row per group/variable/analysis
      gather(variable, value, -cyl) %>%
      mutate(value = round(value, 3)) %>%
      separate(variable, into = c("variable", "analysis")) %>%
      # one column per analysis (mean, sd)
      spread(analysis, value) %>%
      # restore the original variable order before sorting
      mutate(variable = factor(variable, levels = c("mpg", "wt", "hp"))) %>%
      arrange(variable, cyl) %>%
      kable()
For people who are familiar with the "tidyverse", I'm sure the code above is very straightforward. However, it's a bit annoying to type it again and again. With ezsummary, you don't need to think too much about it. You can just type:
    library(ezsummary)

    mtcars %>%
      select(cyl, mpg, wt, hp) %>%
      group_by(cyl) %>%
      ezsummary() %>%
      kable()
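For readers on current dplyr and tidyr releases: the manual pipeline shown earlier relies on summarize_each(), funs(), gather() and spread(), which have since been deprecated or superseded. A rough equivalent might look like the sketch below. It assumes dplyr 1.0.0 or later and tidyr 1.0.0 or later, and the name summary_tbl is only illustrative. It is not how ezsummary works internally.

```r
library(dplyr)
library(tidyr)

# Sketch only: assumes dplyr >= 1.0.0 (across) and tidyr >= 1.0.0 (pivot_longer).
summary_tbl <- mtcars %>%
  select(cyl, mpg, wt, hp) %>%
  group_by(cyl) %>%
  # mean and sd of every non-grouping column; yields mpg_mean, mpg_sd, ...
  summarise(across(everything(), list(mean = mean, sd = sd)),
            .groups = "drop") %>%
  # split "mpg_mean" into variable = "mpg" and a column named "mean"
  pivot_longer(-cyl, names_to = c("variable", ".value"), names_sep = "_") %>%
  mutate(across(c(mean, sd), ~ round(.x, 3)),
         variable = factor(variable, levels = c("mpg", "wt", "hp"))) %>%
  arrange(variable, cyl)

print(summary_tbl)
```

The result has one row per variable and cyl group, with columns cyl, variable, mean and sd, and can be piped into kable() in the same way as before.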
Here is another quick demo of how to use this package. For detailed documentation, please see the package vignette.
    library(dplyr)
    library(knitr)  # provides kable()
    library(ezsummary)

    mtcars %>%
      # q: quantitative/continuous variables; c: categorical variables
      var_types("qcqqqqqcccc") %>%
      group_by(am) %>%
      ezsummary(flavor = "wide", unit_markup = "[. (.)]",
                digits = 1, p_type = "percent") %>%
      # in mtcars, am = 0 is automatic and am = 1 is manual
      kable(col.names = c("variable", "Automatic", "Manual"))

| variable | Automatic | Manual |
|---|---|---|
| mpg | 17.1 (3.8) | 24.4 (6.2) |
| cyl_4 | 3 (15.8%) | 8 (61.5%) |
| cyl_6 | 4 (21.1%) | 3 (23.1%) |
| cyl_8 | 12 (63.2%) | 2 (15.4%) |
| disp | 290.4 (110.2) | 143.5 (87.2) |
| hp | 160.3 (53.9) | 126.8 (84.1) |
| drat | 3.3 (0.4) | 4 (0.4) |
| wt | 3.8 (0.8) | 2.4 (0.6) |
| qsec | 18.2 (1.8) | 17.4 (1.8) |
| vs_0 | 12 (63.2%) | 6 (46.2%) |
| vs_1 | 7 (36.8%) | 7 (53.8%) |
| gear_3 | 15 (78.9%) | 0 (0) |
| gear_4 | 4 (21.1%) | 8 (61.5%) |
| gear_5 | 0 (0) | 5 (38.5%) |
| carb_1 | 3 (15.8%) | 4 (30.8%) |
| carb_2 | 6 (31.6%) | 4 (30.8%) |
| carb_3 | 3 (15.8%) | 0 (0) |
| carb_4 | 7 (36.8%) | 3 (23.1%) |
| carb_6 | 0 (0) | 1 (7.7%) |
| carb_8 | 0 (0) | 1 (7.7%) |
If you find any issues, please feel free to report them on the GitHub issue tracker: https://github.com/haozhu233/simple.summary/issues.
Thanks for using this package!
It has been almost 8 months since ezsummary 0.1.9 was released on CRAN. I hope this is a good time, if not too late, for a major update. In this new version, I introduced a few attractive features by completely rewriting some key functions in this package. In most cases, the output of ezsummary() looks the same as in the old version. However, in a few cases the new version behaves slightly differently from the old one (mostly in column naming). I'm sorry for any inconvenience caused by this update.
Here is a list of new features introduced by this update:
- […] ezsummary_c() were introduced for […]
- […] ... as options, and these options will be passed to ezsummary_q() to produce any analyses you want. Please see ******* for details.
- When you use ezsummary() to analyze both quantitative and categorical variables, you now have two output modes: "ez" generates an integrated table like the one from ezsummary 0.1.9, while mode "details" generates a list of two tables, one for quantitative and one for categorical variables. Mode "details" allows a different number of analyses to be run for the two types of data.
- round.N will be deprecated. Use […]
- P (for percentages) will be deprecated. Instead, you can now select the type of proportion output by setting p_type to either "percent" or "decimal". You can no longer display both percent and decimal outputs, because showing both is not very meaningful.
- […] total to display counts of records, including NA.
- […] missing to display counts of missing records.
- fill is available when you set the […] "wide". The value of fill is passed to tidyr::spread() and determines what fills the NA slots generated by the "spread" step.
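To make the effect of fill concrete, here is a minimal sketch that calls tidyr::spread() directly. It does not go through ezsummary, and the counts data frame is built by hand from the gear counts in the demo table above, so treat it as an illustration of the underlying "spread" step rather than of the package internals.

```r
library(tidyr)

# Hand-built long-format counts: the am = 0 group has no cars with 5 gears
# and the am = 1 group has none with 3 gears, so those pairs have no row.
counts <- data.frame(
  am       = c(0, 0, 1, 1),
  variable = c("gear_3", "gear_4", "gear_4", "gear_5"),
  n        = c(15, 4, 8, 5),
  stringsAsFactors = FALSE
)

# Default behaviour: the missing group/level pairs become NA.
wide_na <- spread(counts, am, n)

# With fill = 0, the same slots are filled with 0 instead.
wide_zero <- spread(counts, am, n, fill = 0)

print(wide_na)
print(wide_zero)
```

The two printed tables differ only in the slots that had no matching row in the long data: NA in the first, 0 in the second.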