Import and Export CSV Data with a YAML Metadata Header

Support for import from and export to the CSVY file format. CSVY is a file format that combines the simplicity of CSV (comma-separated values) with the metadata of other plain text and binary formats (JSON, XML, Stata, etc.) by placing a YAML header on top of a regular CSV.


The CSVY file specification is simple: place a YAML header on top of a regular CSV. The YAML header is formatted according to the Table Schema of a Tabular Data Package.

A CSVY file looks like this:

#---
#profile: tabular-data-resource
#name: my-dataset
#path: https://raw.githubusercontent.com/csvy/csvy.github.io/master/examples/example.csvy
#title: Example file of csvy 
#description: Show a csvy sample file.
#format: csvy
#mediatype: text/vnd.yaml
#encoding: utf-8
#schema:
#  fields:
#  - name: var1
#    type: string
#  - name: var2
#    type: integer
#  - name: var3
#    type: number
#dialect:
#  csvddfVersion: 1.0
#  delimiter: ","
#  doubleQuote: false
#  lineTerminator: "\r\n"
#  quoteChar: "\""
#  skipInitialSpace: true
#  header: true
#sources:
#- title: The csvy specifications
#  path: http://csvy.org/
#  email: ''
#licenses:
#- name: CC-BY-4.0
#  title: Creative Commons Attribution 4.0
#  path: https://creativecommons.org/licenses/by/4.0/
#---
var1,var2,var3
A,1,2.0
B,3,4.3

Which we can read into R like this:

library("csvy")
str(read_csvy(system.file("examples", "example1.csvy", package = "csvy")))
## 'data.frame':	2 obs. of  3 variables:
##  $ var1: chr  "A" "B"
##  $ var2: int  1 3
##  $ var3: num  2 4.3
##  - attr(*, "profile")= chr "tabular-data-resource"
##  - attr(*, "title")= chr "Example file of csvy"
##  - attr(*, "description")= chr "Show a csvy sample file."
##  - attr(*, "name")= chr "my-dataset"
##  - attr(*, "format")= chr "csvy"
##  - attr(*, "sources")=List of 1
##   ..$ :List of 3
##   .. ..$ name : chr "CC-BY-4.0"
##   .. ..$ title: chr "Creative Commons Attribution 4.0"
##   .. ..$ path : chr "https://creativecommons.org/licenses/by/4.0/"

Optional comment characters on the YAML lines make the data readable with any standard CSV parser while retaining the ability to import and export variable- and file-level metadata. The CSVY specification does not use these, but the csvy package for R does so that you (and other users) can continue to rely on utils::read.csv() or readr::read_csv() as usual. The import() function in rio supports CSVY natively.
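
As a rough illustration of why the comment characters help, the sketch below uses only base R. It writes a tiny commented CSVY file to a temporary path, reads the data with a plain CSV parser, and then recovers the header text separately. The file contents and the variable names (path, dat, header) are made up for this example and are not part of the csvy package.

```r
# Write a tiny CSVY file whose YAML header lines are prefixed with "#".
path <- tempfile(fileext = ".csvy")
writeLines(c(
  "#---",
  "#name: my-dataset",
  "#schema:",
  "#  fields:",
  "#  - name: var1",
  "#    type: string",
  "#  - name: var2",
  "#    type: integer",
  "#---",
  "var1,var2",
  "A,1",
  "B,3"
), path)

# A standard CSV parser skips the header once "#" is declared a comment
# character. utils::read.csv() defaults to comment.char = "", so it has
# to be set explicitly here.
dat <- utils::read.csv(path, comment.char = "#", stringsAsFactors = FALSE)

# The metadata is still recoverable: keep the "#" lines and strip the prefix.
lines <- readLines(path)
header <- sub("^#", "", lines[startsWith(lines, "#")])

str(dat)
cat(header, sep = "\n")
```

The stripped header lines are ordinary YAML text, so they could be handed to a YAML parser if structured metadata is needed.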

Export

To create a CSVY file from R, just do:

library("csvy")
library("datasets")
write_csvy(iris, "iris.csvy")

It is also possible to export the metadata to a separate YAML or JSON file (and then to import from those separate files) by specifying the metadata argument in write_csvy() and read_csvy().
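
A minimal sketch of the separate-file workflow might look like the following. It assumes that the metadata argument accepts the path of a .yaml (or .json) file in both functions; the file names and the use of a temporary directory are choices made for this example only.

```r
library("csvy")
library("datasets")

# Hypothetical file names in a temporary directory.
data_file <- file.path(tempdir(), "iris.csv")
meta_file <- file.path(tempdir(), "iris.yaml")

# Assumption: passing `metadata` writes the header to that file
# instead of embedding it at the top of the CSV.
write_csvy(iris, file = data_file, metadata = meta_file)

# Read the data back, pointing read_csvy() at the same metadata file.
d3 <- read_csvy(data_file, metadata = meta_file)
str(d3)
```

Keeping the metadata in its own file leaves the data file as an ordinary CSV, which may be convenient for tools that do not understand comment lines.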

Import

To read a CSVY file into R, just do:

d1 <- read_csvy("iris.csvy")
str(d1)
## 'data.frame':	150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
##   ..- attr(*, "levels")= chr  "setosa" "versicolor" "virginica"
##  - attr(*, "profile")= chr "tabular-data-package"
##  - attr(*, "name")= chr "iris"

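Or use any other appropriate data import function to ignore the YAML metadata. For example, utils::read.table() skips the commented header lines because its default comment.char is "#":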

d2 <- utils::read.table("iris.csvy", sep = ",", header = TRUE)
str(d2)
## 'data.frame':	150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Package Installation

The package is available on CRAN and can be installed directly in R using:

install.packages("csvy")

The latest development version on GitHub can be installed using remotes:

if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}
remotes::install_github("leeper/csvy")

News

csvy 0.3.0

  • Updated support to current CSVY specifications. (#13, h/t Michael Chirico)
  • Argument sep2 in write_csvy() has been corrected to dec.
  • Fixed an unclosed connection bug. (#23)

csvy 0.2.2

  • If a file data.csvy has no metadata header and a data.[yaml|yml|json] file is present in the same directory, that file is automatically read in as the metadata. (Completes request #10, h/t @jonocarroll)

csvy 0.2.1

  • Expanded test suite and fixed some small bugs in the process.
  • Parse YAML header file first, then pass column classes to data.table::fread to improve performance (#9, Alexey Shiklomanov)

csvy 0.2.0

  • Removed support for utils::read.csv() and readr::read_csv() for simplicity.
  • Updated support to current CSVY specifications. (#13, h/t Michael Chirico)
  • Substantially changed internal code and added markup.
  • Changed example files.
  • Added option to output metadata to separate YAML or JSON file. (#10, h/t Hadley Wickham)

csvy 0.1.2

  • Address header that is not in the same order as data columns. (#1)
  • Support for readr::read_csv() and utils::read.csv(). (#2)

csvy 0.1.1

  • Initial release

0.3.0 by Thomas J. Leeper, 10 months ago


https://github.com/leeper/csvy


Report a bug at https://github.com/leeper/csvy/issues


Browse source code at https://github.com/cran/csvy


Authors: Thomas J. Leeper [aut, cre] , Alexey N. Shiklomanov [aut] , Jonathan Carroll [aut]



GPL-2 license


Imports tools, data.table, jsonlite, yaml

Suggests testthat, datasets


Suggested by rio.

