Alluvial plots use variable-width ribbons and stacked bar plots to
represent multi-dimensional or repeated-measures data with categorical or
ordinal variables; see Riehmann, Hanfler, and Froehlich (2005).
This is a ggplot2 extension for alluvial diagrams.
The alluvial plots implemented here can be used to visualize frequency distributions over time or frequency tables involving several categorical variables. The design is derived mostly from the alluvial package, but the ggplot2 framework induced several conspicuous differences.
The latest stable release can be installed from CRAN:
install.packages("ggalluvial")
The cran branch will contain the version most recently submitted to CRAN.
Development versions can be installed from GitHub:
devtools::install_github("corybrunson/ggalluvial", build_vignettes = TRUE)
The optimization branch contains a development version with experimental functions to reduce the number or area of alluvial overlaps (see issue #6). Install it as follows:
devtools::install_github("corybrunson/ggalluvial", ref = "optimization")
Here is how to generate an alluvial diagram representation of the multi-dimensional categorical dataset of passengers on the Titanic:
library(ggalluvial)  # also attaches ggplot2
titanic_wide <- data.frame(Titanic)
head(titanic_wide)
#>   Class    Sex   Age Survived Freq
#> 1   1st   Male Child       No    0
#> 2   2nd   Male Child       No    0
#> 3   3rd   Male Child       No   35
#> 4  Crew   Male Child       No    0
#> 5   1st Female Child       No    0
#> 6   2nd Female Child       No    0
ggplot(data = titanic_wide,
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")
The data is in "wide" format, but ggalluvial also recognizes data in "long" format and can convert between the two:
titanic_long <- to_lodes_form(data.frame(Titanic),
                              key = "Demographic",
                              axes = 1:3)
head(titanic_long)
#>   Survived Freq alluvium Demographic stratum
#> 1       No    0        1       Class     1st
#> 2       No    0        2       Class     2nd
#> 3       No   35        3       Class     3rd
#> 4       No    0        4       Class    Crew
#> 5       No    0        5       Class     1st
#> 6       No    0        6       Class     2nd
ggplot(data = titanic_long,
       aes(x = Demographic, stratum = stratum, alluvium = alluvium,
           y = Freq, label = stratum)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum") +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")
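As a rough sketch of the conversion in both directions, the snippet below reshapes the Titanic table to long form, checks that the result is recognized, and reshapes it back to wide form. It assumes an installed ggalluvial version that provides the *_form() function names; the object names long_ok and titanic_back are illustrative only.

```r
library(ggalluvial)

# Wide ("alluvia") form: one row per combination of categories
titanic_wide <- data.frame(Titanic)

# Long ("lodes") form: one row per alluvium per axis
titanic_long <- to_lodes_form(titanic_wide,
                              key = "Demographic",
                              axes = 1:3)

# Test whether the long data is recognized as lodes form
long_ok <- is_lodes_form(titanic_long,
                         key = Demographic, value = stratum, id = alluvium,
                         silent = TRUE)

# Reshape back to wide form; the axis variables return as columns
titanic_back <- to_alluvia_form(titanic_long,
                                key = Demographic, value = stratum, id = alluvium)
```

Variables that are constant within each alluvium (here Survived and Freq) are carried along in both directions, so the round trip should preserve the number of rows and the total frequency.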
For detailed discussion of the data formats recognized by ggalluvial and several examples that illustrate its flexibility and limitations, read the vignette:
vignette(topic = "ggalluvial", package = "ggalluvial")
The documentation contains several examples; use help() to call forth examples of any layer (stat_* or geom_*).
If you use ggalluvial-generated figures in a publication, I'd be grateful to hear about it! You can also cite the package according to citation("ggalluvial").
Issues and pull requests are more than welcome! Pretty much every fix and feature of this package derives from a problem or question posed by someone with datasets or design goals I hadn't anticipated.
session_info()
Because the only functional occurrence of devtools (that is, outside README.md) is the call to session_info() at the ends of the vignettes, this suggestion and usage are switched to sessioninfo.
Documentation is slightly reformatted due to switching roxygen syntax to markdown.
The internal z-ordering function z_order_aes failed to recognize contiguous segments of alluvia, thereby assigning later segments missing values of 'group' and preventing them from being rendered. This has been corrected.
geom_alluvium() patch
An occurrence of weight in geom_alluvium() was not updated for v0.8.0 and caused geom_alluvium() to throw an error in some cases. This has been corrected.
geom_flow() patch
An earlier solution to the z-ordering problem sufficed for matched layers (*_alluvium() and *_flow()) but failed for the combination of stat_alluvium() with geom_flow(). This has been corrected in the code for GeomFlow$draw_panel(), though a more elegant and general solution would be preferable.
The deprecated parameters axis_width (all geom layers) and ribbon_bend (geom_alluvium() and geom_flow()) are removed and an explanatory note added to the layers' documentation.
A vignette illustrating two methods for labeling small strata, using other ggplot2 extensions, is included.
self_adjoin() export
The internal function self_adjoin(), invoked by geom_flow(), is revised, exported, documented, and exemplified.
The weight aesthetic for the three stat_*() functions is replaced by the y aesthetic, so that scale_y_continuous() will correctly transform the vertical scales of the layers. An example is provided in the documentation for stat_alluvium(). The y aesthetic must be present in order for scales to be correctly transformed. The weight parameter is still available but deprecated.
Following the change from weight to y, the aggregate.wts parameter to stat_alluvium() is replaced with aggregate.y; aggregate.wts is deprecated.
… stat_*() functions.
These changes make the functions that test for and convert between alluvial formats behave more like popular functions in the tidyverse. Some of the changes introduce backward incompatibilities, but most result in deprecation warnings.
The functions is_alluvial_*() and to_*() are renamed to is_*_form() and to_*_form() for consistency. Their old names are deprecated.
is_alluvial() is deprecated and will be removed in a future version.
The logical parameter is deprecated. In a future version, the functions is_*_form() will only return logical values.
silent = TRUE now silences all messages.
The is_*_form() functions now return FALSE if any weights are negative, with a message to this effect.
… key, value, id, weight, and diffuse parameters, using up-to-date rlang and tidyselect functionality.
The axes parameter in is_alluvia_form() and to_lodes_form() now accepts dplyr::vars() objects, as in dplyr::select_at(). Alternatively, variables can be fed to these functions as in dplyr::select(), to be collected by rlang::quos(...) and used as axis variables. If axes is not NULL, then such additional arguments are ignored.
The to_*_form() functions now merge their internal reshaped data frames with the distilled or diffused variables in a consistent order, placing the distilled or diffused variables to the left.
The package now Depends on R v3.3.0 (patch number zero) instead of v3.3.1. I've been unable to install this version locally, so there is a slight chance of incompatibility that I'll be watchful for going forward.
… Suggests rather than Imports.
… is_alluvial_*() and to_*() functions are combined; see help("alluvial-data").
is_alluvial_alluvia now prints a message rather than a warning when some combinations of strata are not linked by any alluvia.
to_lodes() now has a diffuse parameter to join any original variables to the reformatted data by the id variable (alluvium). This makes it possible to assign original variables to aesthetics after reformatting, as illustrated in a new example.
to_alluvia() now has a distill parameter to control the inclusion of any original variables that vary within values of id into the reformatted data, based on a distilling function that returns a single value from a vector.
to_lodes() now has a logical discern parameter that uses make.unique() to make stratum values that appear at different axes distinct. The stat_*() functions can pass the same parameter internally and print a warning if the data is already in lodes form.
GeomFlow$draw_panel() now begins by restricting to complete.cases(), corresponding to flows with both starting and terminating axes. (This is not done in StatFlow$compute_panel(), which would have the effect of excluding missing aesthetic values from legends.)
GeomAlluvium$setup_data() now throws a warning if some color or differentiation aesthetics vary within alluvia.
… lode.ordering argument by StatAlluvium$compute_panel() has been fixed.
The ggalluvial() shortcut function, which included a formula interface and was deprecated in version 0.4.0, is removed.
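The discern parameter noted above builds on base R's make.unique(). The following minimal base-R sketch shows what that function does to repeated labels. The two axis vectors are made up for illustration, and exactly how ggalluvial applies make.unique() across axes is not shown here.

```r
# Stratum labels at two hypothetical axes that share the values "Yes" and "No"
axis1 <- c("Yes", "No")
axis2 <- c("Yes", "No", "Maybe")

# make.unique() appends numeric suffixes to repeated values, so labels
# reused at the second axis become distinct from those at the first
distinct <- make.unique(c(axis1, axis2))
distinct
#> [1] "Yes"   "No"    "Yes.1" "No.1"  "Maybe"
```

Labels that occur only once, such as "Maybe", are left unchanged.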
I only started maintaining NEWS.md with version 0.5.0.