An extension to the 'testthat' package that makes it easy to add graphical unit tests. It provides a Shiny application to manage the test cases.
vdiffr is an extension to the package testthat that makes it easy to test for visual regressions. It provides a Shiny app to manage failed tests and visually compare a graphic to its expected output.
Important: The CRAN version no longer works properly. Please use the development version which removes the dependency on the system FreeType library.
Get the development version from GitHub, or install the last CRAN release with `install.packages("vdiffr")`.
Getting started with vdiffr is a three-step process:

1. Add expectations by including `expect_doppelganger()` in your test files.
2. Run `manage_cases()` to generate the plots which vdiffr will test against in the future. This will launch a Shiny gadget which will ask you to confirm that each plot is correct.
3. Run `devtools::test()` to execute the tests as normal.
When a figure doesn't match the saved version, vdiffr signals a failure when it is run interactively, or when it is run on Travis or Appveyor. Mismatches do not cause R CMD check to fail on CRAN machines. See the testing versus monitoring section below.
vdiffr integrates with testthat through the `expect_doppelganger()` expectation. It takes as arguments:

- A title. This title is used in two ways. First, the title is standardised (it is converted to lowercase and any character that is not alphanumeric or a space is turned into a dash) and used as filename for storing the figure. Secondly, with ggplot2 figures the title is automatically added to the plot with `ggtitle()` (only if no ggtitle has been set).
- A figure. This can be a ggplot object, a recordedplot, a function to be called, or more generally any object with a `print()` method.
- Optionally, a path where to store the figures, relative to `tests/figs/`. They are stored in a subfolder according to the current testthat context by default. Supply `path` to change the subfolder.
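For example, the following tests will create figures in a subfolder of `tests/figs/` named after the context:

```r
library(ggplot2)

context("Histograms")

# Base graphics are wrapped in a function so vdiffr can draw them on demand
disp_hist_base <- function() hist(mtcars$disp)
# ggplot2 objects can be passed directly
disp_hist_ggplot <- ggplot(mtcars, aes(disp)) + geom_histogram()

vdiffr::expect_doppelganger("Base graphics histogram", disp_hist_base)
vdiffr::expect_doppelganger("ggplot2 histogram", disp_hist_ggplot)
```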
Note that in addition to automatic ggtitles, ggplot2 figures are
assigned the minimalistic theme
theme_test() (unless they already
have been assigned a theme).
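As a rough illustration of the title standardisation described above, the sketch below maps a title to a file name in base R. The helpers `standardise_title()` and `figure_file()` are hypothetical and are not vdiffr's own implementation. The sketch assumes that spaces are turned into dashes along with other non-alphanumeric characters, and that figures are saved with an `.svg` extension.

```r
# Hypothetical helper, not part of vdiffr: lowercase the title and replace
# every character that is not a lowercase letter or digit with a dash.
standardise_title <- function(title) {
  gsub("[^a-z0-9]", "-", tolower(title))
}

# Assumed naming scheme: standardised title plus an ".svg" extension.
figure_file <- function(title) {
  paste0(standardise_title(title), ".svg")
}

figure_file("Base graphics histogram")
figure_file("ggplot2 histogram")
```

Under these assumptions, two titles that differ only in case or punctuation would map to the same file name, which is worth keeping in mind when naming test cases.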
When you have added new test cases or detected regressions, you can manage those from the R command line with functions such as `validate_cases()` and `delete_orphaned_cases()`. However, it's easier to run the Shiny application with `manage_cases()`.
With this app you can:
Check how a failed case differs from its expected output using three widgets: Toggle (click to swap the images), Slide and Diff. If you use GitHub, you may be familiar with the last two.
Validate cases. You can do so groupwise (all new cases or all failed cases) or on a case by case basis. When you validate a failed case, the old expected output is replaced by the new one.
Delete orphaned cases. During a refactoring of your unit tests, some
visual expectations may be removed or renamed. This means that some
unused figures will linger in the
tests/figs/ folder. These
figures appear in the Shiny application under the category
"Orphaned" and can be cleaned up from there.
The vdiffr management functions take `package` as their first argument, the path to your package sources. This argument has exactly the same semantics as in devtools. You can use vdiffr tools the same way as you would use `devtools::check()`, for example. The default is `"."`, meaning that the package is expected to be found in the current working directory.
All validated cases are stored in
tests/figs/. This folder may be
handy to showcase the different graphs offered in your package. You
can also keep track of how your plots change as you tweak their layout
and add features by checking the history on GitHub.
You can run the tests the usual way, for example with
devtools::test(). New cases for which you just wrote an expectation
will be skipped. Failed tests will show as an error.
When a figure doesn't match its saved version, it is only reported as a failure under these circumstances:

- When the `NOT_CRAN` environment variable is set. In particular, devtools sets this when running the tests interactively.
- On Travis, Appveyor, or any environment where `Sys.getenv("CI")` is set.
Otherwise, the failure is ignored. The motivation for this is that vdiffr is a monitoring tool and shouldn't cause R CMD check failures on the CRAN machines.
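The reporting rule above can be sketched in base R. The function `mismatch_is_failure()` is hypothetical and only mirrors the two conditions listed in this section; it assumes that "set" means "non-empty", and vdiffr's real check may differ.

```r
# Hypothetical sketch of the rule: a mismatch counts as a failure only when
# NOT_CRAN or CI is set. "Set" is assumed here to mean a non-empty value.
mismatch_is_failure <- function(not_cran = Sys.getenv("NOT_CRAN"),
                                ci = Sys.getenv("CI")) {
  nzchar(not_cran) || nzchar(ci)
}

# Evaluate the rule against the current environment.
mismatch_is_failure()
```

Passing the values as arguments keeps the sketch easy to try out without modifying the environment of the running session.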
Checking the appearance of a figure is inherently fragile. It is a bit like testing for errors by matching exact error messages. These messages are susceptible to change at any time. Similarly, the appearance of plots depends on a lot of upstream code, such as the way margins and spacing are computed. vdiffr uses a special ggplot2 theme that should change very rarely, but there are just too many upstream factors that could cause breakages. For this reason, figure mismatches are not necessarily representative of actual failures.
Visual testing is not an alternative to writing unit tests for the internal data transformations performed during the creation of your figure. It is more of a monitoring tool that allows you to quickly check how the appearance of your figures changes over time, and to manually assess whether changes reflect actual problems in your packages.
If you want vdiffr to fail on CRAN machines as well, set the `NOT_CRAN` environment variable to `"true"` in a `setup-vdiffr.R` file in your testthat folder.
An addin to launch
manage_cases() is provided with vdiffr. Use the
addin menu to launch the Shiny app in an RStudio dialog.
To use the Shiny app as part of ESS devtools integration with
C-c C-w C-v, include something like this in your init file:
```lisp
(defun ess-r-vdiffr-manage-cases ()
  (interactive)
  (ess-r-package-send-process "vdiffr::manage_cases(%s)\n"
                              "Manage vdiffr cases for %s"))

(define-key ess-r-package-dev-map "\C-v" 'ess-r-vdiffr-manage-cases)
```
It is sometimes difficult to understand the cause of a failure. This usually indicates that the plot is not created deterministically. Potential culprits are:

- Some of the plot components depend on random variation. Try setting a seed.
- The plot depends on some system library. For instance, sf plots depend on libraries like GEOS and GDAL. It might not be possible to test these plots with vdiffr. You can still use vdiffr for manual inspection in that case: add a `testthat::skip()` before the `expect_doppelganger()` call.
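To illustrate the first point, here is a minimal sketch of a plotting function made deterministic by fixing the seed. The names `jittered_points()` and `draw_jittered()` and the seed value are made up for this example.

```r
# Hypothetical data generator: fixing the seed makes the random noise, and
# therefore the drawn figure, identical on every call.
jittered_points <- function(seed = 123) {
  set.seed(seed)
  data.frame(x = 1:10, y = 1:10 + rnorm(10))
}

# Plotting function that could be handed to a visual expectation.
draw_jittered <- function() {
  points <- jittered_points()
  plot(points$x, points$y)
}

# In a test file, the function would be passed to the expectation:
# vdiffr::expect_doppelganger("jittered points", draw_jittered)
```

Note that `set.seed()` changes the state of the global random number generator, so tests that run afterwards see the same stream of random numbers.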
To help you understand the causes of a failure, vdiffr automatically logs the SVG diff of all failures when run under R CMD check. The log is located in
tests/vdiffr.Rout.fail and should be displayed on Travis.
You can also set the
VDIFFR_LOG_PATH environment variable with
Sys.setenv() to unconditionally (also interactively) log failures in the file pointed by the variable.
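For instance, the variable could be set as follows before running the tests. The log file location is an arbitrary choice for illustration.

```r
# The file name and location are arbitrary choices for this example.
log_file <- file.path(tempdir(), "vdiffr-failures.log")
Sys.setenv(VDIFFR_LOG_PATH = log_file)

# Confirm that the variable is visible to the current R session.
Sys.getenv("VDIFFR_LOG_PATH")
```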
vdiffr extends testthat through a custom reporter. Reporters are classes (R6 classes in recent versions of testthat) whose instances collect cases and output a summary of the tests. While reporters are usually meant to provide output for the end user, you can also use them in functions to interact with testthat.

vdiffr has a special reporter that does nothing but activate a collector for the visual test cases. vdiffr's tools run `devtools::test()` with this reporter. When `expect_doppelganger()` is called, it first checks whether the case is new or failed. If that's the case, and if it finds that vdiffr's collector is active, it calls the collector, which in turn records the current test case.
This enables the user to run the tests with the usual development tools and get feedback in the form of skipped or failed cases. On the other hand, when vdiffr's tools are called, we collect information about the tests of interest and wrap them in a data structure.
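The collection mechanism can be pictured with a toy model in base R. Everything in this sketch is hypothetical (`new_collector()`, `active_collector`, `toy_expectation()`): it uses a plain closure instead of an R6 reporter and is not how vdiffr is actually implemented.

```r
# Toy collector: a closure-based stand-in, not vdiffr's actual reporter.
new_collector <- function() {
  cases <- list()
  list(
    add = function(name, status) {
      cases[[length(cases) + 1]] <<- list(name = name, status = status)
      invisible(NULL)
    },
    get = function() cases
  )
}

# NULL means that no collection is in progress.
active_collector <- NULL

# Toy expectation: records new or failed cases only if a collector is active.
toy_expectation <- function(name, status) {
  if (status %in% c("new", "failed") && !is.null(active_collector)) {
    active_collector$add(name, status)
  }
  invisible(status)
}

# Simulate a collection run over three cases.
active_collector <- new_collector()
toy_expectation("histogram", "new")
toy_expectation("scatter", "passed")
toy_expectation("boxplot", "failed")
collected <- active_collector$get()
```

Only the new and failed cases end up in `collected`. When `active_collector` is `NULL`, the same expectation does nothing beyond returning its status, which corresponds to running the tests with the usual tools.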
Comparing SVG files is convenient and should work correctly in most situations. However, SVG is not suitable for tracking really subtle changes and regressions. See vdiffr's issue #1 for a discussion on this. vdiffr may gain additional comparison backends in the future to make the tests more stringent.
This release of vdiffr features a major overhaul of the internals to make the package more robust.
vdiffr now works reliably across platforms:
svglite is now embedded in vdiffr to protect against updates of the SVG generation engine.
It also embeds harfbuzz to compute font extents and text box metrics. This makes SVG generation of text boxes consistent across platforms.
While this makes vdiffr much more robust, it also means you will have to regenerate all your testcases with the new version of vdiffr. You can expect very few future releases that will require updating figures, hopefully once every few years.
Now that vdiffr has a stable engine, the next release will focus on improving the Shiny UI.
Another important change is that figure mismatches are no longer reported as failures, except when the tests are run locally, on Travis, Appveyor, or any environment where the `Sys.getenv("CI")` or `Sys.getenv("NOT_CRAN")` variables are set. Because vdiffr is more of a monitoring tool than a unit testing tool, it shouldn't cause R CMD check failures on the CRAN machines.
Despite our efforts to make vdiffr robust and reliable across platforms, checking the appearance of a figure is still inherently fragile. It is similar to testing for errors by matching exact error messages: these messages are susceptible to change at any time. Similarly, the appearance of plots depends on a lot of upstream code, such as the way margins and spacing are computed. vdiffr uses a special ggplot2 theme that should change very rarely, but there are just too many upstream factors that could cause breakages. For this reason, figure mismatches are not necessarily representative of actual failures.
Visual testing is not an alternative to writing unit tests for the internal data transformations performed during the creation of your figure. It is more of a monitoring tool that allows you to quickly check how the appearance of your figures changes over time, and to manually assess whether changes reflect actual problems in your package.
If you need to override the default vdiffr behaviour on CRAN (not
recommended) or Travis (for example to run the tests in particular
builds but not others), set the
variable to "true" or "false".
vdiffr now advises users to run `manage_cases()` when a figure has not been validated yet (#25).
Fixed a bug in the Shiny app that prevented SVGs from being displayed in Firefox (@KZARCA, #29).
manage_cases() gains an
options argument that is passed to
The Shiny app now has a quit button (@ilarischeinin).
New `VDIFFR_LOG_PATH` environment variable. When set, vdiffr pushes diffs of failed SVG comparisons to that file.
`expect_doppelganger()` now takes a `writer` argument. This makes it easy to use vdiffr with a different SVG engine. See `write_svg()` for an example function. Packages implementing a different SVG engine should wrap around `expect_doppelganger()` to pass their custom writer.
write_svg() is now an exported function. It provides a template
(function arguments and return value) for SVG writer functions.
`manage_cases()` no longer checks for orphaned cases when a filter is supplied. (Orphaned cases are figures dangling in the `tests/figs/` folder even though their original `expect_doppelganger()` has been removed from the tests.)
The `verbose` argument of `expect_doppelganger()` is soft-deprecated. Please use the vdiffr failure log instead. It is created automatically when run under R CMD check in `tests/vdiffr.Rout.fail`, and should be displayed on Travis.
You can also set the
VDIFFR_LOG_PATH environment variable with
Sys.setenv() to unconditionally (also interactively) log failures
in the file pointed by the variable.
add_dependency() is soft-deprecated without replacement.
The `user_fonts` argument of `expect_doppelganger()` is defunct because it complicated the UI for no clear benefit. The fonts used to generate the SVGs are now hardcoded to Liberation and Symbola.
Maintenance release to fix CRAN errors. Thanks to Gregory R. Warnes (@gwarnes-mdsol) and Hiroaki Yutani (@yutannihilation) for helping out with this!
I'm working on embedding svglite in vdiffr and compiling statically to FreeType and Harfbuzz to make SVG generation deterministic across platforms. Until then vdiffr will remain a bit unstable (but should silently fail if dependencies have diverged).
New `last_collection_error()` function to print a testthat error that occurred while collecting the test cases.
Skip tests if the system version of Cairo (actually the one gdtools was compiled with) doesn't match the version of Cairo used to generate the testcases. Cairo has an influence on the computation of text metrics which can cause spurious test failures.
We plan to fix these issues once and for all by embedding gdtools, svglite, Cairo and FreeType in the vdiffr package.
This release fixes some CRAN failures.
Test cases of the mock package were updated to FreeType 2.8.0.
The unit test log file from the mock package is now preserved.
This release makes it easier to debug failures on remote systems. It also makes vdiffr more robust to failures caused by incompatible installations: instead of failing, the tests are skipped. This prevents spurious failures on CRAN.
expect_doppelganger() gains a
verbose argument to print the
SVG files for failed cases while testing. This is useful to debug
failures on remotes.
When tests are run by `R CMD check`, failures are now recorded in a log file called `vdiffr.fail`. This file will show up in the Travis log and can be retrieved from artifacts on Appveyor. It includes the SVG files for failed cases, which is useful to debug failures on remote systems.
The tests are now skipped if the FreeType version used to build the comparison SVGs does not match the version installed on the system where the tests are run. This is necessary because changes in new versions of FreeType might affect the computation of text extents, which then causes svglite to produce slightly different SVGs. The minor version is not taken into account so FreeType 2.7.1 is deemed compatible with 2.7.2 but not with 2.8.0.
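The compatibility rule can be sketched in base R as a comparison of the first two version components. The helper `freetype_compatible()` is hypothetical and only reproduces the rule as stated here; it is not vdiffr's actual check.

```r
# Hypothetical helper: two FreeType versions are treated as compatible when
# their first two components match, so the last component is ignored.
freetype_compatible <- function(recorded, installed) {
  first_two <- function(version) {
    parts <- strsplit(version, ".", fixed = TRUE)[[1]]
    as.integer(parts[1:2])
  }
  identical(first_two(recorded), first_two(installed))
}

freetype_compatible("2.7.1", "2.7.2")
freetype_compatible("2.7.1", "2.8.0")
```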
In practice, this means that package contributors should only validate visual cases if their FreeType version matches that of the package maintainer. Also, the maintainer must update the version recorded in the package repository (in the file `./tests/figs/deps.txt`) when FreeType has been updated on their system. Running `vdiffr::validate_cases()` updates the dependency file even if there are no visual cases to update.
In the future, we may provide a version of vdiffr statically compiled with a specific version of FreeType to prevent these issues.
expect_doppelganger() no longer throws an error when FreeType is
too old. Instead, the test is skipped. This ensures that R CMD check
passes on those platforms (e.g., CRAN's Solaris test server).
Depends on gdtools 0.1.2 or later as this version fixes a crash on Linux platforms.
The widget functions such as `widget_diff()` now take plots as arguments. This makes it easy to embed a vdiffr widget in R Markdown documents. The underscored versions take HTML sources as argument (paths to SVG files or inline SVGs).
Generated SVGs are now reproducible across platforms thanks to recent versions of svglite, gdtools, and the new package fontquiver. vdiffr now requires versions of FreeType greater than 2.6.1.
The figures folder is hardcoded to `tests/figs/`.
The figures are now stored in subfolders according to the current testthat context. `expect_doppelganger()` accepts the `path` argument to bypass this behaviour (set it to `""` to store the figures in `tests/figs/`).
The `title` argument of `expect_doppelganger()` now serves as `ggtitle()` in ggplot2 figures (unless a title is already set). It is also standardised and used as filename to store the figure (spaces and non-alphanumeric characters are converted to dashes).
Add support for handling orphaned cases: you can now remove figures
left over from deleted tests with
delete_orphaned_cases() or from
the Shiny app.
New `filter` argument to `manage_cases()`. This lets you filter the test files from which to collect the cases, which is useful to speed up the collection for large codebases with a lot of unit tests.
Fix invalid generation of SVG files (#3)
Give a warning when multiple doppelgangers have the same name (#4).
Remove CR line endings before comparing svg files for compatibility with Windows
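A minimal sketch of this kind of normalisation in base R, assuming the SVG contents have been read into character strings. The helper `strip_cr()` and the sample SVG strings are made up for illustration and are not taken from vdiffr's sources.

```r
# Hypothetical helper: drop carriage returns so that files written with
# Windows (CRLF) and Unix (LF) line endings compare equal.
strip_cr <- function(text) {
  gsub("\r", "", text, fixed = TRUE)
}

svg_unix <- "<svg>\n<rect/>\n</svg>\n"
svg_windows <- "<svg>\r\n<rect/>\r\n</svg>\r\n"

# Compare the two documents after normalising line endings.
identical(strip_cr(svg_windows), strip_cr(svg_unix))
```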