Alluvial Plots in 'ggplot2'

Alluvial plots use x-splines, sometimes augmented with stacked histograms, to visualize multi-dimensional or repeated-measures data with categorical or ordinal variables. They can be viewed as simplified and standardized Sankey diagrams; see Riehmann, Hanfler, and Froehlich (2005) and Rosvall and Bergstrom (2010). This package provides ggplot2 layers to produce alluvial plots from tidy data.


This is a ggplot2 extension for alluvial diagrams.

Design

The alluvial plots implemented here can be used to visualize frequency distributions over time or frequency tables involving several categorical variables. The design is derived mostly from the alluvial package, but the ggplot2 framework induced several conspicuous differences:

  • alluvial understands a variety of inputs (vectors, lists, data frames), while ggalluvial requires a single data frame;
  • alluvial uses each variable of these inputs as a dimension of the data, whereas ggalluvial requires the user to specify the dimensions, either as separate aesthetics or as key-value pairs;
  • alluvial produces both the alluvia, which link cohorts across multiple dimensions, and (what are here called) the strata, which partition the data along each dimension, in a single function; whereas ggalluvial relies on separate layers (stats and geoms) to produce strata, alluvia, and alluvial segments called lodes and flows.

Installation

The latest stable release can be installed from CRAN:

install.packages("ggalluvial")

The cran branch will contain the version most recently submitted to CRAN.

Development versions can be installed from GitHub:

devtools::install_github("corybrunson/ggalluvial", build_vignettes = TRUE)

The optimization branch contains a development version with experimental functions to reduce the number or area of alluvial overlaps (see issue #6). Install it as follows:

devtools::install_github("corybrunson/ggalluvial", ref = "optimization")

Usage

Here is how to generate an alluvial diagram representation of the multi-dimensional categorical dataset of passengers on the Titanic:

library(ggalluvial)  # also attaches ggplot2, on which ggalluvial depends
titanic_wide <- data.frame(Titanic)
head(titanic_wide)
#>   Class    Sex   Age Survived Freq
#> 1   1st   Male Child       No    0
#> 2   2nd   Male Child       No    0
#> 3   3rd   Male Child       No   35
#> 4  Crew   Male Child       No    0
#> 5   1st Female Child       No    0
#> 6   2nd Female Child       No    0
ggplot(data = titanic_wide,
       aes(axis1 = Class, axis2 = Sex, axis3 = Age,
           y = Freq)) +
  scale_x_discrete(limits = c("Class", "Sex", "Age"), expand = c(.1, .05)) +
  xlab("Demographic") +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")

The data is in "wide" format, but ggalluvial also recognizes data in "long" format and can convert between the two:

titanic_long <- to_lodes_form(data.frame(Titanic),
                              key = "Demographic",
                              axes = 1:3)
head(titanic_long)
#>   Survived Freq alluvium Demographic stratum
#> 1       No    0        1       Class     1st
#> 2       No    0        2       Class     2nd
#> 3       No   35        3       Class     3rd
#> 4       No    0        4       Class    Crew
#> 5       No    0        5       Class     1st
#> 6       No    0        6       Class     2nd
ggplot(data = titanic_long,
       aes(x = Demographic, stratum = stratum, alluvium = alluvium,
           y = Freq, label = stratum)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() + geom_text(stat = "stratum") +
  theme_minimal() +
  ggtitle("passengers on the maiden voyage of the Titanic",
          "stratified by demographics and survival")
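The round trip between the two formats can be sketched as follows. This is a minimal example that assumes the function names introduced in version 0.7.0 (is_*_form() and to_*_form()); the object name titanic_back is hypothetical and used only for illustration.

```r
library(ggalluvial)

# wide ("alluvia") format: one row per cohort, one column per axis
titanic_wide <- data.frame(Titanic)

# wide -> long ("lodes") format: one row per cohort per axis
titanic_long <- to_lodes_form(titanic_wide,
                              key = "Demographic",
                              axes = 1:3)

# check that the result is recognized as lodes format
is_lodes_form(titanic_long,
              key = "Demographic", value = "stratum", id = "alluvium",
              silent = TRUE)

# long -> wide again; `titanic_back` is a hypothetical name
titanic_back <- to_alluvia_form(titanic_long,
                                key = "Demographic", value = "stratum",
                                id = "alluvium")

# compare the shapes of the three data frames
dim(titanic_wide)
dim(titanic_long)
dim(titanic_back)
```

Because the long format holds one row per cohort per axis, converting three axes yields three times as many rows as the wide data.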

Resources

For detailed discussion of the data formats recognized by ggalluvial and several examples that illustrate its flexibility and limitations, read the vignette:

vignette(topic = "ggalluvial", package = "ggalluvial")

The documentation contains several examples; use help() to call forth examples of any layer (stat_* or geom_*).

Feedback

Cite

If you use ggalluvial-generated figures in a publication, I'd be grateful to hear about it! You can also cite the package according to citation("ggalluvial").

Contribute

Issues and pull requests are more than welcome! Pretty much every fix and feature of this package derives from a problem or question posed by someone with datasets or design goals I hadn't anticipated.

News

ggalluvial 0.9.1

Suggest sessioninfo for session_info()

Because the only functional occurrence of devtools (i.e. outside README.md) is the call to session_info() at the ends of the vignettes, this suggestion and usage are switched to sessioninfo.

markdown formatting

Documentation is slightly reformatted due to switching roxygen syntax to markdown.

z-ordering patch

The internal z-ordering function z_order_aes failed to recognize contiguous segments of alluvia, thereby assigning later segments missing values of 'group' and preventing them from being rendered. This has been corrected.

ggalluvial 0.9.0

geom_alluvium() patch

An occurrence of weight in geom_alluvium() was not updated for v0.8.0 and caused geom_alluvium() to throw an error in some cases. This has been corrected.

geom_flow() patch

An earlier solution to the z-ordering problem sufficed for matched layers (*_alluvium() and *_flow()) but failed for the combination of stat_alluvium() with geom_flow(). This has been corrected in the code for GeomFlow$draw_panel(), though a more elegant and general solution would be preferable.

Deprecated parameters removed

The deprecated parameters axis_width (all geom layers) and ribbon_bend (geom_alluvium() and geom_flow()) are removed and an explanatory note added to the layers' documentation.

Vignette on labeling small strata

A vignette illustrating two methods for labeling small strata, using other ggplot2 extensions, is included.

self_adjoin() export

The internal function self_adjoin(), invoked by geom_flow(), is revised, exported, documented, and exemplified.

ggalluvial 0.8.0

Stat layer functionality

  • The weight aesthetic for the three stat_*() functions is replaced by the y aesthetic, so that scale_y_continuous() will correctly transform the vertical scales of the layers. An example is provided in the documentation for stat_alluvium(). The y aesthetic must be present in order for scales to be correctly transformed. The weight parameter is still available but deprecated.
  • For consistency with the switch from weight to y, the aggregate.wts parameter to stat_alluvium() is replaced with aggregate.y; aggregate.wts is deprecated.
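As a rough illustration of the switch from weight to y, the sketch below maps cohort sizes to y and then applies a continuous scale transformation. The choice of scale_y_sqrt() is an assumption made only for illustration; it is not necessarily the example given in the stat_alluvium() documentation.

```r
library(ggalluvial)

titanic_wide <- data.frame(Titanic)

# cohort sizes are mapped to `y` rather than to the deprecated `weight`
p <- ggplot(titanic_wide,
            aes(axis1 = Class, axis2 = Sex, y = Freq)) +
  geom_alluvium(aes(fill = Survived)) +
  geom_stratum() +
  # the scale transforms `y` before the stat layers stack the cohorts,
  # so strata and alluvia are transformed together
  scale_y_sqrt()
p
```

Since both layers read their heights from the same y aesthetic, a single scale call affects the strata and the alluvia consistently.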

Alluvial data functionality

  • Tests for alluvial format are silenced inside the stat_*() functions.

ggalluvial 0.7.0

Alluvial data functionality

These changes make the functions that test for and convert between alluvial formats behave more like popular functions in the tidyverse. Some of the changes introduce backward incompatibilities, but most result in deprecation warnings.

  • The functions is_alluvial_*() and to_*() are renamed to is_*_form() and to_*_form() for consistency. Their old names are deprecated.
  • is_alluvial() is deprecated and will be removed in a future version.
  • The parameter logical is deprecated. In a future version, the functions is_*_form() will only return logical values.
  • The setting silent = TRUE now silences all messages.
  • The functions is_*_form() now return FALSE if any weights are negative, with a message to this effect.
  • These functions now accept unquoted variable names for the key, value, id, weight, and diffuse parameters, using up-to-date rlang and tidyselect functionality.
  • The axes parameter in is_alluvia_form() and to_lodes_form() now accepts dplyr::vars() objects, as in dplyr::select_at(). Alternatively, variables can be fed to these functions as in dplyr::select(), to be collected by rlang::quos(...) and used as axis variables. If axes is not NULL, then such additional arguments are ignored.
  • The functions to_*_form() now merge their internal reshaped data frames with the distilled or diffused variables in a consistent order, placing the distilled or diffused variables to the left.
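The tidyverse-style argument handling described in the list above might look like this in practice. This is a sketch under the assumption that both calling styles are interchangeable for this data; the column names Demographic, Category, and Cohort and the object names are hypothetical.

```r
library(ggalluvial)

titanic_wide <- data.frame(Titanic)

# axes given by position via `axes`, new column names given as strings
lodes_quoted <- to_lodes_form(titanic_wide,
                              axes = 1:3,
                              key = "Demographic", value = "Category",
                              id = "Cohort")

# the same conversion with axes selected as in dplyr::select()
# and the new column names left unquoted
lodes_unquoted <- to_lodes_form(titanic_wide,
                                Class, Sex, Age,
                                key = Demographic, value = Category,
                                id = Cohort)

# compare the columns produced by the two calls
names(lodes_quoted)
names(lodes_unquoted)

# the format test accepts unquoted names as well
is_lodes_form(lodes_unquoted,
              key = Demographic, value = Category, id = Cohort,
              silent = TRUE)
```

Note that the axis variables passed through the dots are ignored whenever axes is supplied, so the two styles should not be mixed in one call.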

ggalluvial 0.6.0

CRAN checks for v0.5.0

  • The package now Depends on R v3.3.0 (patch number zero) instead of v3.3.1. I've been unable to install this version locally, so there is a slight chance of incompatibility that I'll be watchful for going forward.
  • The grid and alluvial packages are now Suggests rather than Imports.

Alluvial data functionality

  • Source files and documentation for is_alluvial_*() and to_*() functions are combined; see help("alluvial-data").
  • is_alluvial_alluvia() now prints a message rather than a warning when some combinations of strata are not linked by any alluvia.
  • to_lodes() now has a diffuse parameter to join any original variables to the reformatted data by the id variable (alluvium). This makes it possible to assign original variables to aesthetics after reformatting, as illustrated in a new example.
  • to_alluvia() now has a distill parameter to control the inclusion of any original variables that vary within values of id into the reformatted data, based on a distilling function that returns a single value from a vector.
  • to_lodes() now has a logical discern parameter that uses make.unique() to make stratum values that appear at different axes distinct. The stat_*() functions can pass the same parameter internally and print a warning if the data is already in lodes form.
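The diffuse parameter can be sketched as follows. The example uses the later function name to_lodes_form() (to_lodes() was renamed in version 0.7.0) and assumes that diffuse accepts a variable name given as a string; the object name titanic_diffused is hypothetical.

```r
library(ggalluvial)

titanic_wide <- data.frame(Titanic)

# reshape three axes to lodes format, but also keep `Class`
# as an ordinary column joined back in by the `alluvium` id
titanic_diffused <- to_lodes_form(titanic_wide,
                                  axes = 1:3,
                                  key = "Demographic",
                                  diffuse = "Class")
names(titanic_diffused)

# the diffused variable can now be mapped to an aesthetic
p <- ggplot(titanic_diffused,
            aes(x = Demographic, stratum = stratum, alluvium = alluvium,
                y = Freq)) +
  geom_alluvium(aes(fill = Class)) +
  geom_stratum()
p
```

Without diffuse, the Class values would survive only as entries of the stratum column, so they could not be assigned to fill across all axes.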

Layer internals

  • GeomFlow$draw_panel() now begins by restricting to complete.cases(), corresponding to flows with both starting and terminating axes. (This is not done in StatFlow$compute_panel(), which would have the effect of excluding missing aesthetic values from legends.)
  • GeomAlluvium$setup_data() now throws a warning if some color or differentiation aesthetics vary within alluvia.
  • A bug in the processing of a custom lode.ordering argument by StatAlluvium$compute_panel() has been fixed.

ggalluvial 0.5.0

Backward incompatibilities

The ggalluvial() shortcut function, which included a formula interface, deprecated in version 0.4.0, is removed.

earlier versions

I only started maintaining NEWS.md with version 0.5.0.


0.11.1 by Jason Cory Brunson, 3 days ago


http://corybrunson.github.io/ggalluvial/


Report a bug at https://github.com/corybrunson/ggalluvial/issues


Browse source code at https://github.com/cran/ggalluvial


Authors: Jason Cory Brunson [aut, cre]




GPL-3 license


Imports stats, dplyr, tidyr, lazyeval, rlang, tidyselect

Depends on ggplot2

Suggests grid, alluvial, testthat, knitr, babynames, sessioninfo, ggrepel, ggfittext, vdiffr


Imported by easyalluvial.

