Capture the Spirit of Your 'ggplot2' Calls

Creates a reproducible 'ggplot2' object by storing the data and calls.


ggplot2::ggplot() stores the information needed to build the graph as a grob, which is what the computer needs in order to draw it. As humans, we are more interested in which commands were issued to build the graph. For good reproducibility, those calls need to be applied to the relevant data. While this information is partly available by deconstructing the grob, that is not the simplest approach.

Here is one option that solves that problem.

ggghost stores the data used in a ggplot() call, and collects ggplot2 commands (usually separated by +) as they are applied, in effect lazily collecting the calls. Once the object is requested, the print method combines the individual calls back into the total plotting command and executes it. This is where the call would usually be discarded. Instead, a "ghost" of the commands lingers in the object for further investigation, subsetting, adding to, or subtracting from.
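The lazy collection described above can be sketched in base R. This is only an illustration of the idea, not ggghost's actual implementation: the list name `calls` and the use of `Reduce()` are assumptions made for the example. The calls are quoted and never evaluated, so ggplot2 does not need to be loaded.

```r
# Hypothetical sketch: store unevaluated calls, then rebuild the full command.
calls <- list(
  quote(ggplot(tmpdata, aes(x, y))),
  quote(geom_point(col = "steelblue")),
  quote(theme_bw())
)

# Join the stored calls pairwise with `+`, as they would appear in a script.
combined <- Reduce(function(lhs, rhs) call("+", lhs, rhs), calls)

# Deparse to text; a large width.cutoff keeps the command on one line.
combined_text <- paste(deparse(combined, width.cutoff = 500L), collapse = " ")
print(combined_text)
```

Evaluating `combined` with `eval()` in a session where ggplot2 is loaded and `tmpdata` exists would then build the plot from the stored pieces.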

You can install ggghost from CRAN with:

install.packages("ggghost")

or the development version from GitHub with:

devtools::install_github("jonocarroll/ggghost")

Use %g<% to initiate storage of the ggplot2 calls, then add to the call with each logical call on a new line (@hrbrmstr style):

tmpdata <- data.frame(x = 1:100, y = rnorm(100))
head(tmpdata)
#>   x          y
#> 1 1  0.8930142
#> 2 2 -2.2215165
#> 3 3 -0.5178338
#> 4 4  0.4729639
#> 5 5 -0.1718715
#> 6 6  0.1927056
library(ggplot2)
library(ggghost)
z %g<% ggplot(tmpdata, aes(x, y))
z <- z + geom_point(col = "steelblue")
z <- z + theme_bw()
z <- z + labs(title = "My cool ggplot")
z <- z + labs(x = "x axis", y = "y axis")
z <- z + geom_smooth()

This invisibly stores the ggplot2 calls in a list, which can be reviewed either as the list of calls

summary(z)
#> [[1]]
#> ggplot(tmpdata, aes(x, y))
#> 
#> [[2]]
#> geom_point(col = "steelblue")
#> 
#> [[3]]
#> theme_bw()
#> 
#> [[4]]
#> labs(title = "My cool ggplot")
#> 
#> [[5]]
#> labs(x = "x axis", y = "y axis")
#> 
#> [[6]]
#> geom_smooth()

or the concatenated call

summary(z, combine = TRUE)
#> [1] "ggplot(tmpdata, aes(x, y)) + geom_point(col = \"steelblue\") + theme_bw() + labs(title = \"My cool ggplot\") + labs(x = \"x axis\", y = \"y axis\") + geom_smooth()"

The plot can be generated using a print method

z

which re-evaluates the list of calls and applies them to the saved data, meaning that the plot remains reproducible even if the data source is changed/destroyed.

The call list can be subset, removing parts of the call

subset(z, c(1,2,6))

Plot features can be removed by name, a task that would otherwise have involved re-generating the entire plot

z2 <- z + geom_line(col = "coral")
z2 - geom_point()

Calls are matched for removal after stripping everything that matches the regex \\(.*$ (from the first bracket to the end of the call), so only the function name matters and the arguments are irrelevant.
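A simplified base-R sketch of this name-based matching follows. The vector `stored` and the helper `fn_name()` are hypothetical names used only for illustration, and the exact comparison here is an assumption; ggghost itself may match more loosely.

```r
# Hypothetical sketch of name-based removal; simplified relative to ggghost.
stored <- c(
  'ggplot(tmpdata, aes(x, y))',
  'geom_point(col = "steelblue")',
  'geom_smooth()'
)
to_remove <- "geom_point()"

# Strip everything from the first bracket onward, leaving the function name.
fn_name <- function(x) sub("\\(.*$", "", x)

# Keep only the calls whose function name differs from the one being removed.
kept <- stored[fn_name(stored) != fn_name(to_remove)]
print(kept)
```

Because only the function name is compared, `geom_point()` removes the stored `geom_point(col = "steelblue")` call even though the arguments differ.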

The object still generates all the grob info; it's just stored as calls rather than as a completed image.

str(print(z))
#> List of 9
#>  $ data       :'data.frame': 100 obs. of  2 variables:
#>   ..$ x: int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
#>   ..$ y: num [1:100] 0.893 -2.222 -0.518 0.473 -0.172 ...
#>  $ layers     :List of 2
#> [... truncated ...]

Since the grob info is still produced, normal ggplot2 operators can be applied after the print statement, such as replacing the data

xvals <- seq(0,2*pi,0.1)
tmpdata_new <- data.frame(x = xvals, y = sin(xvals))
print(z - geom_smooth()) %+% tmpdata_new

ggplot2 calls still work as normal if you want to avoid storing the calls.

ggplot(tmpdata) + geom_point(aes(x,y), col = "red")

Since the object is a list, we can stepwise show the process of building up the plot as a (re-)animation

lazarus(z, "mycoolplot.gif")
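The source does not say how lazarus() assembles its frames, so the following is only a sketch of the general idea: each frame corresponds to a longer prefix of the call list. The names `calls`, `frames`, and `frame_text` are assumptions for the example, and nothing is plotted.

```r
# Hypothetical sketch: one cumulative plotting command per animation frame.
calls <- list(
  quote(ggplot(tmpdata, aes(x, y))),
  quote(geom_point(col = "steelblue")),
  quote(geom_smooth())
)

# Combine a list of calls into a single `+` chain.
combine_calls <- function(x) Reduce(function(lhs, rhs) call("+", lhs, rhs), x)

# Frame i uses the first i calls.
frames <- lapply(seq_along(calls), function(i) combine_calls(calls[seq_len(i)]))

# Show each frame's command as text.
frame_text <- vapply(
  frames,
  function(f) paste(deparse(f, width.cutoff = 500L), collapse = " "),
  character(1)
)
print(frame_text)
```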

A supplementary data object (e.g. for use in a geom_* or scale_* call) can be added to the ggghost object

myColors <- c("alpha" = "red", "beta" = "blue", "gamma" = "green")
supp_data(z) <- myColors

These will be recovered along with the primary data.

For full reproducibility, the entire structure can be saved to a file for re-loading at a later point. This may not have made much sense for a ggplot2 object, but now both the original data and the calls to generate the plot are saved. Should the environment that generated the plot be destroyed, all is not lost.

saveRDS(z, file = "README_supp/mycoolplot.rds")
rm(z)
rm(tmpdata)
rm(myColors)
exists("z")
#> [1] FALSE
exists("tmpdata")
#> [1] FALSE
exists("myColors")
#> [1] FALSE

Once the ggghost object is read back into the session, both the relevant data and the plot-generating calls can be recovered and re-executed.

z <- readRDS("README_supp/mycoolplot.rds")
str(z)
#> List of 6
#>  $ : language ggplot(tmpdata, aes(x, y))
#>  $ : language geom_point(col = "steelblue")
#>  $ : language theme_bw()
#>  $ : language labs(title = "My cool ggplot")
#>  $ : language labs(x = "x axis", y = "y axis")
#>  $ : language geom_smooth()
#>  - attr(*, "class")= chr [1:2] "ggghost" "gg"
#>  - attr(*, "data")=List of 2
#>   ..$ data_name: chr "tmpdata"
#>   ..$ data     :'data.frame':    100 obs. of  2 variables:
#>   .. ..$ x: int [1:100] 1 2 3 4 5 6 7 8 9 10 ...
#>   .. ..$ y: num [1:100] 0.893 -2.222 -0.518 0.473 -0.172 ...
#>  - attr(*, "suppdata")=List of 2
#>   ..$ supp_data_name: chr "myColors"
#>   ..$ supp_data     : Named chr [1:3] "red" "blue" "green"
#>   .. ..- attr(*, "names")= chr [1:3] "alpha" "beta" "gamma"
 
recover_data(z, supp = TRUE)
head(tmpdata)
#>   x          y
#> 1 1  0.8930142
#> 2 2 -2.2215165
#> 3 3 -0.5178338
#> 4 4  0.4729639
#> 5 5 -0.1718715
#> 6 6  0.1927056
 
myColors
#>   alpha    beta   gamma 
#>   "red"  "blue" "green"
 
z

We now have a proper reproducible graphic.
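The round trip above works because the data travels with the calls. A minimal base-R sketch of that idea follows, using a hypothetical stand-in object whose "data" attribute mirrors the structure shown by str(z). It does not call ggghost, and the restore step is an assumption about what recovering the data amounts to, not the package's code.

```r
# Hypothetical stand-in for a ggghost object: a call list with data attached.
tmpdata <- data.frame(x = 1:3, y = c(0.5, -1.2, 0.3))
ghost_like <- structure(
  list(quote(ggplot(tmpdata, aes(x, y)))),
  data = list(data_name = "tmpdata", data = tmpdata)
)

# Save to a temporary file, then remove the originals from the session.
path <- tempfile(fileext = ".rds")
saveRDS(ghost_like, file = path)
original <- tmpdata
rm(tmpdata, ghost_like)

# Reload and restore the data under its original name.
restored <- readRDS(path)
saved <- attr(restored, "data", exact = TRUE)
assign(saved$data_name, saved$data)
print(identical(tmpdata, original))
```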

  • The data must be used as an argument in the ggplot2 call, not piped into it. Pipelines such as z %g<% tmpdata %>% ggplot() won't work... yet.
  • Only one original data set will be stored: the one in the original ggplot(data = x) call. If you require supplementary data for some geom, then you need to manage the storage and consistency of that yourself. (fixed)
  • For removing labs calls, an argument must be present. It doesn't need to be the actual one (all will be removed), but it must evaluate in scope. TRUE will do fine.

News

CHANGES in ggghost 0.2.1

==========

  • Corrected a minor bug which dropped supplementary data when using + or -.

CHANGES in ggghost 0.2.0

==========

  • Allowed inclusion of supplementary data, with recovery (closes #3).

  • Added a warning to recover_data when the data object exists in the calling frame but has changed since being captured, with opt-out where used interactively (closes #4). Note that this is responsible for the less than 100% code coverage (everything except if (interactive()) is covered).

  • Moved README supplementary objects out of package, reducing package filesize (closes #2).

ggghost 0.1.0

==========

  • Initial CRAN submission. No ERRORs, WARNINGs, or NOTEs.

0.2.1 by Jonathan Carroll, a year ago


https://github.com/jonocarroll/ggghost


Report a bug at https://github.com/jonocarroll/ggghost/issues


Browse source code at https://github.com/cran/ggghost


Authors: Jonathan Carroll [aut, cre]


GPL (>= 3) license


Depends on ggplot2, animation

Suggests testthat

