Datasets from the Datasaurus Dozen

The Datasaurus Dozen is a set of datasets that share the same summary statistics despite having radically different distributions. They offer a larger and quirkier version of the object lesson typically taught via Anscombe's Quartet (available in the 'datasets' package). Anscombe's Quartet contains four very different distributions with the same summary statistics, and so highlights the value of visualisation, over and above summary statistics, in understanding data. As well as being an engaging variant on the Quartet, the data is generated in a novel way. The simulated annealing process used to derive the datasets from the original Datasaurus is detailed in "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing" <http://dx.doi.org/10.1145/3025453.3025912>.


datasauRus


This package wraps the awesome Datasaurus Dozen dataset.

The Datasaurus was created by Alberto Cairo in this great blog post.

Datasaurus shows us why visualisation is important, not just summary statistics.

He's been subsequently made even more famous in the paper Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing by Justin Matejka and George Fitzmaurice.

In the paper, Justin and George simulate a variety of datasets that have the same summary statistics as the Datasaurus but very different distributions.

This package looks to make these datasets available for use as an advanced Anscombe's Quartet, available in R as anscombe.
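As a point of comparison, the following is a minimal sketch of the Anscombe lesson in base R. It assumes only the built-in anscombe data frame (columns x1 to x4 and y1 to y4); the name quartet_stats is made up for this illustration.

```r
# Anscombe's Quartet ships with base R as the data frame `anscombe`,
# with columns x1..x4 and y1..y4 (one x/y pair per set).
quartet_stats <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set    = i,
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x   = sd(x),
    sd_y   = sd(y),
    cor_xy = cor(x, y)
  )
}))

# One row per set; the columns should be near-identical down the rows.
print(round(quartet_stats, 2))
```

The four rows agree to about two decimal places even though the four scatterplots look nothing alike, which is the property the Datasaurus Dozen extends to more datasets.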

Install

The released version is available from CRAN via install.packages("datasauRus"). To install the development version from GitHub, use devtools:

devtools::install_github("stephlocke/datasauRus")

Usage

You can use the package to produce Anscombe plots and more.

library(ggplot2)
library(datasauRus)
ggplot(datasaurus_dozen, aes(x=x, y=y, colour=dataset))+
  geom_point()+
  theme_void()+
  theme(legend.position = "none")+
  facet_wrap(~dataset, ncol=3)
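To see the other half of the story, the summary statistics can be computed per dataset. This is a rough sketch in base R, assuming datasaurus_dozen has the columns dataset, x and y used in the plot above; dozen_stats is a name invented for this example.

```r
library(datasauRus)

# Work on a plain data frame so base R subsetting behaves as expected.
dozen <- as.data.frame(datasaurus_dozen)

# Split the long-format data by dataset and compute the same
# statistics for each piece.
dozen_stats <- do.call(rbind, lapply(
  split(dozen, dozen$dataset),
  function(d) {
    data.frame(
      dataset = as.character(d$dataset[1]),
      mean_x  = mean(d$x),
      mean_y  = mean(d$y),
      sd_x    = sd(d$x),
      sd_y    = sd(d$y),
      cor_xy  = cor(d$x, d$y)
    )
  }
))
rownames(dozen_stats) <- NULL

# Round only the numeric columns before printing.
dozen_stats[-1] <- round(dozen_stats[-1], 2)
print(dozen_stats)
```

If the datasets behave as the paper describes, each column should be almost constant down the rows, while the faceted plot above shows how different the underlying point clouds are.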

Tests

library(devtools)
test()
#> Loading required package: testthat
#> Testing datasauRus
#> datasets: ......................
#> Raw files: .
#> 
#> DONE ======================================================================

Contributing to the package

Code of Conduct

Anyone getting involved in this package agrees to our Code of Conduct. If someone is breaking the Will Wheaton rule aka Don't be a dick, or breaking the Code of Conduct, please let me know at [email protected]

Bug reports

When you file a bug report, please spend some time making it easy for us to follow and reproduce. The more time you spend making the bug report coherent, the more time we can dedicate to investigating the bug rather than the bug report.

Ideas

Got an idea for how we can improve the package? Awesome stuff!

Please raise it with some succinct information on expected behaviour of the enhancement and why you think it'll improve the package.

Package development

We really want people to contribute to the package. A great way to start doing this is to look at the help wanted issues and/or contribute an example.

Examples for this package are done in base R or with ggplot2 as an optional example, using the structure:

if(require(ggplot2)){
#ggplot2 code here
}

As this is a data package, most of the documentation is sitting in one file (R/Datasaurus-package.R) so we keep the examples in a separate directory (inst/examples).

  • If there isn't a file for the dataset you want to write an example for, you can make one by just calling it datasetname.R. To reference an example file, add the line @example inst/examples/datasetname.R in the relevant documentation section of R/Datasaurus-package.R.
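As an illustration only, an example file following the structure above might look like the sketch below. The file name inst/examples/datasaurus_dozen.R and the choice of the "dino" dataset are assumptions made for this sketch, not files known to exist in the package.

```r
# Hypothetical contents of inst/examples/datasaurus_dozen.R
# (the library() call is only here so the sketch runs on its own;
# package examples normally run with the package already attached).
library(datasauRus)

# Base R example: plot a single dataset from the dozen.
dino <- datasaurus_dozen[datasaurus_dozen$dataset == "dino", ]
plot(dino$x, dino$y, main = "dino", xlab = "x", ylab = "y")

# Optional ggplot2 example, skipped if ggplot2 is not installed.
if (require(ggplot2)) {
  p <- ggplot(datasaurus_dozen, aes(x = x, y = y)) +
    geom_point() +
    theme_void() +
    facet_wrap(~dataset, ncol = 3)
  print(p)
}
```

Keeping the base R part first means the example still runs for users who have not installed ggplot2, which is only a suggested dependency.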

Conventions

We're relatively loose on coding conventions.

  • Datasets are lower-case with underscores between words
  • R code should be formatted with the "Reformat code" option in RStudio
  • There are no standards for base R plots
  • My preferred ggplot2 themes are theme_minimal where axis labels matter and theme_void where they do not, but I'm OK with the default ggplot2 theming if you want to avoid writing longer ggplot2 code

News

datasauRus 0.1.2

  • First release, contains datasaurus datasets


install.packages("datasauRus")

0.1.2 by Steph Locke, a year ago


https://github.com/stephlocke/datasauRus


Report a bug at https://github.com/stephlocke/datasauRus/issues


Browse source code at https://github.com/cran/datasauRus


Authors: Steph Locke [cre, aut], Alberto Cairo [dtc], Justin Matejka [dtc], George Fitzmaurice [dtc], Lucy D'Agostino McGowan [aut]




MIT + file LICENSE license


Suggests covr, testthat, knitr, rmarkdown, ggplot2


See at CRAN