Find Free Versions of Scholarly Publications via Unpaywall

This web client interfaces with Unpaywall <https://unpaywall.org/products/api>, formerly oaDOI, a service that finds free full-texts of academic papers by linking DOIs with open access journals and repositories. It provides unified access to various data sources for open access full-text links, including Crossref and the Directory of Open Access Journals (DOAJ). API usage is free and no registration is required.



roadoi interacts with the oaDOI API, a simple web-interface which links DOIs and open access versions of scholarly works. oaDOI powers unpaywall.

This client supports the most recent API Version 2.

API Documentation: http://oadoi.org/api/v2

How do I use it?

Use the oadoi_fetch() function in this package to get open access status information and full-text links from oaDOI.

roadoi::oadoi_fetch(dois = c("10.1038/ng.3260", "10.1093/nar/gkr1047"), 
                    email = "name@example.com")
#> # A tibble: 2 x 13
#>                   doi best_oa_location      oa_locations data_standard
#>                 <chr>           <list>            <list>         <int>
#> 1     10.1038/ng.3260 <tibble [0 x 0]>  <tibble [0 x 0]>             2
#> 2 10.1093/nar/gkr1047 <tibble [1 x 8]> <tibble [3 x 10]>             2
#> # ... with 9 more variables: is_oa <lgl>, journal_is_oa <lgl>,
#> #   journal_issns <chr>, journal_name <chr>, publisher <chr>, title <chr>,
#> #   year <chr>, updated <chr>, non_compliant <list>

There are no API restrictions. However, providing an email address is required and a rate limit of 100k requests per day is suggested. If you need to access more data, ask for the data dump (https://oadoi.org/api) instead.

RStudio Addin

This package also has an RStudio Addin for easily finding free full-texts in RStudio.

How do I get it?

Install and load from CRAN:

install.packages("roadoi")
library(roadoi)

To install the development version, use the devtools package:

devtools::install_github("ropensci/roadoi")
library(roadoi)

Long-Form Documentation including use-case

Open access copies of scholarly publications are sometimes hard to find. Some are published in open access journals. Others are made freely available as preprints before publication, and still others are deposited in institutional repositories, which are digital archives maintained by universities and research institutions. This document introduces roadoi, an R client that makes it easy to search for these open access copies by interfacing with the oaDOI.org service, where DOIs are matched with freely available full-texts from open access journals and archives.

About oaDOI.org

oaDOI.org, developed and maintained by the team of Impactstory, is a non-profit service that finds open access copies of scholarly literature simply by looking up a DOI (Digital Object Identifier). It not only returns open access full-text links, but also helpful metadata about the open access status of a publication such as licensing or provenance information.

oaDOI.org uses different data sources to find open access full-texts including:

  • Crossref: a DOI registration agency serving major scholarly publishers.
  • DataCite: another DOI registration agency with a main focus on research data.
  • Directory of Open Access Journals (DOAJ): a registry of open access journals.
  • Various OAI-PMH metadata sources. OAI-PMH is a protocol often used by open access journals and repositories such as arXiv and PubMed Central.

See Piwowar et al. (2017) for a comprehensive overview of oaDOI.org.[^1]

Basic usage

There is one major function to talk with oaDOI.org, oadoi_fetch(), taking a character vector of DOIs and your email address as required arguments.

library(roadoi)
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
                             "10.1103/physreve.88.012814"), 
                    email = "name@example.com")
#> # A tibble: 2 x 13
#>                          doi best_oa_location      oa_locations
#>                        <chr>           <list>            <list>
#> 1  10.1186/s12864-016-2566-9 <tibble [1 x 8]> <tibble [3 x 10]>
#> 2 10.1103/physreve.88.012814 <tibble [1 x 9]> <tibble [1 x 10]>
#> # ... with 10 more variables: data_standard <int>, is_oa <lgl>,
#> #   journal_is_oa <lgl>, journal_issns <chr>, journal_name <chr>,
#> #   publisher <chr>, title <chr>, year <chr>, updated <chr>,
#> #   non_compliant <list>

What's returned?

The client supports API version 2. According to the oaDOI.org API specification, the following columns are returned:

| Column | Description |
|---|---|
| doi | DOI (always in lowercase) |
| best_oa_location | list-column describing the best OA location. Algorithm prioritizes publisher hosted content (e.g. Hybrid or Gold) |
| oa_locations | list-column of all the OA locations. |
| data_standard | Indicates the data collection approaches used for this resource. 1 mostly uses Crossref for hybrid detection. 2 uses more comprehensive hybrid detection methods. |
| is_oa | Is there an OA copy (logical)? |
| journal_is_oa | Is the article published in a fully OA journal? Uses the Directory of Open Access Journals (DOAJ) as source. |
| journal_issns | ISSNs |
| journal_name | Journal title |
| publisher | Publisher |
| title | Publication title. |
| year | Year published. |
| updated | Time when the data for this resource was last updated. |
| non_compliant | Lists other full-text resources that are not hosted by either publishers or repositories. |

The columns best_oa_location and oa_locations are list-columns that contain useful metadata about the OA sources found by oaDOI. Each of them holds the following fields:

| Column | Description |
|---|---|
| evidence | How the OA location was found and how oaDOI characterizes it. |
| host_type | OA full-text provided by publisher or repository. |
| license | The license under which this copy is published. |
| url | The URL where you can find this OA copy. |
| versions | The content version accessible at this location following the DRIVER 2.0 Guidelines (https://wiki.surfnet.nl/display/DRIVERguidelines/DRIVER-VERSION+Mappings) |

There are at least two ways to simplify these list-columns.

To get the full-text links from the list-column best_oa_location, you may want to use purrr::map() together with purrr::flatten_chr(), replacing empty entries with NA first.

library(dplyr)
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
                             "10.1103/physreve.88.012814"), 
                    email = "name@example.com") %>%
  dplyr::mutate(
    urls = purrr::map(best_oa_location, "url") %>% 
                  purrr::map_if(purrr::is_empty, ~ NA_character_) %>% 
                  purrr::flatten_chr()
                ) %>%
  .$urls
#> [1] "https://bmcgenomics.biomedcentral.com/track/pdf/10.1186/s12864-016-2566-9?site=bmcgenomics.biomedcentral.com"
#> [2] "http://arxiv.org/pdf/1304.0473"
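
The same extraction can be sketched in base R without a network call. This is only an illustration: the list below is a hand-made stand-in for the best_oa_location column, and the URL in it is made up. It shows why empty entries need an explicit NA, so that the result keeps one element per DOI.

```r
# Hand-made stand-in for the best_oa_location list-column (assumption):
# the first DOI has no OA copy, the second has one location.
dois <- c("10.1038/ng.3260", "10.1093/nar/gkr1047")
best_oa_location <- list(
  data.frame(),
  data.frame(url = "https://example.org/paper.pdf", # made-up URL
             stringsAsFactors = FALSE)
)

# vapply() returns exactly one character value per DOI;
# empty entries are mapped to NA_character_.
url <- vapply(
  best_oa_location,
  function(loc) if (nrow(loc) == 0) NA_character_ else loc$url[1],
  character(1)
)

data.frame(doi = dois, url = url, stringsAsFactors = FALSE)
```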

If you want to gather all full-text links and to explore where these links are hosted, simplify the list-column oa_locations with tidyr::unnest():

library(dplyr)
roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
                             "10.1103/physreve.88.012814"), 
                    email = "name@example.com") %>%
  tidyr::unnest(oa_locations) %>% 
  dplyr::mutate(
    # parse each URL and keep its hostname (NA when missing)
    hostname = purrr::map(url, httr::parse_url) %>% 
                  purrr::map_chr("hostname", .default = NA_character_)
                ) %>% 
  # strip a leading "www." only
  dplyr::mutate(hostname = gsub("^www\\.", "", hostname)) %>% 
  dplyr::count(hostname)
#> # A tibble: 4 x 2
#>                        hostname     n
#>                           <chr> <int>
#> 1                     arxiv.org     1
#> 2 bmcgenomics.biomedcentral.com     1
#> 3                       doi.org     1
#> 4              ncbi.nlm.nih.gov     1
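
The hostname step can also be sketched with base R regular expressions alone. The URLs below are examples, and the PubMed Central path is made up. Anchoring the pattern with ^ and escaping the dot ensures that only a leading "www." is removed.

```r
urls <- c("https://www.ncbi.nlm.nih.gov/pmc/articles/PMC0000000", # made-up path
          "http://arxiv.org/pdf/1304.0473",
          "https://doi.org/10.1186/s12864-016-2566-9")

# Drop the scheme, then everything from the first slash onwards
hostname <- sub("/.*$", "", sub("^[a-z]+://", "", urls))

# Remove a leading "www." only
hostname <- sub("^www\\.", "", hostname)

table(hostname)
```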

Note that the fields returned might change according to the oaDOI.org API specs.

Any API restrictions?

There are no API restrictions. However, oaDOI.org requires that you provide your email address when using this client. To avoid typing it on every call, set it once in your .Rprofile file with the option roadoi_email.

options(roadoi_email = "name@example.com")
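
As a minimal sketch of how such an option can be read back in your own scripts: get_roadoi_email() below is a hypothetical helper, not part of roadoi, and the address is a placeholder. It fails early with a clear message when the option is missing.

```r
# Set the option once, e.g. in .Rprofile (placeholder address)
options(roadoi_email = "name@example.com")

# Hypothetical helper, not part of roadoi: return the stored address
# or stop with a clear message when it has not been set.
get_roadoi_email <- function() {
  email <- getOption("roadoi_email")
  if (is.null(email) || !nzchar(email)) {
    stop("Set options(roadoi_email = ...) first", call. = FALSE)
  }
  email
}

get_roadoi_email()
```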

Keeping track of crawling

To follow your API call, and to estimate the time until completion, use the .progress parameter inherited from plyr to display a progress bar.

roadoi::oadoi_fetch(dois = c("10.1186/s12864-016-2566-9",
                             "10.1103/physreve.88.012814"), 
                    email = "name@example.com", 
                    .progress = "text")
#> 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================================| 100%
#> # A tibble: 2 x 13
#>                          doi best_oa_location      oa_locations
#>                        <chr>           <list>            <list>
#> 1  10.1186/s12864-016-2566-9 <tibble [1 x 8]> <tibble [3 x 10]>
#> 2 10.1103/physreve.88.012814 <tibble [1 x 9]> <tibble [1 x 10]>
#> # ... with 10 more variables: data_standard <int>, is_oa <lgl>,
#> #   journal_is_oa <lgl>, journal_issns <chr>, journal_name <chr>,
#> #   publisher <chr>, title <chr>, year <chr>, updated <chr>,
#> #   non_compliant <list>

Catching errors

oaDOI is a reliable API. However, this client follows Hadley Wickham's Best practices for writing an API package and throws an error when the API does not return valid JSON or is not available. To catch these errors, you may want to use plyr's failwith() function:

random_dois <-  c("ldld", "10.1038/ng.3260", "§dldl  ")
purrr::map_df(random_dois, 
              plyr::failwith(f = function(x) roadoi::oadoi_fetch(x, email = "name@example.com")))
#> # A tibble: 1 x 13
#>               doi best_oa_location     oa_locations data_standard is_oa
#>             <chr>           <list>           <list>         <int> <lgl>
#> 1 10.1038/ng.3260 <tibble [0 x 0]> <tibble [0 x 0]>             2 FALSE
#> # ... with 8 more variables: journal_is_oa <lgl>, journal_issns <chr>,
#> #   journal_name <chr>, publisher <chr>, title <chr>, year <chr>,
#> #   updated <chr>, non_compliant <list>
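
A similar pattern can be sketched with base R's tryCatch(). To keep the sketch runnable offline, mock_fetch() is a made-up stand-in for roadoi::oadoi_fetch() that simply errors on anything not starting with "10."; the real function's checks and return value differ.

```r
# Made-up stand-in for roadoi::oadoi_fetch() (assumption, offline only)
mock_fetch <- function(doi) {
  if (!grepl("^10\\.", doi)) stop("invalid DOI: ", doi)
  data.frame(doi = doi, is_oa = FALSE, stringsAsFactors = FALSE)
}

# Return NULL instead of stopping when a single lookup fails
safe_fetch <- function(doi) {
  tryCatch(mock_fetch(doi), error = function(e) {
    message("skipping ", doi, ": ", conditionMessage(e))
    NULL
  })
}

random_dois <- c("ldld", "10.1038/ng.3260", "not a doi")
fetched <- lapply(random_dois, safe_fetch)

# Drop the failed lookups and bind the remaining rows together
results <- do.call(rbind, Filter(Negate(is.null), fetched))
results
```

Compared with failwith(), this variant also reports which inputs were skipped, which can help when cleaning a DOI list.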

Use Case: Studying the compliance with open access policies

An increasing number of universities, research organisations and funders have launched open access policies in recent years. Using roadoi together with other R-packages makes it easy to examine how and to what extent researchers comply with these policies in a reproducible and transparent manner. In particular, the rcrossref package, maintained by rOpenSci, provides many helpful functions for this task.

Gathering DOIs representing scholarly publications

DOIs have become essential for referencing scholarly publications, and thus many digital libraries and institutional databases keep track of these persistent identifiers. For the sake of this vignette, instead of starting with a pre-defined set of publications originating from these sources, we simply generate a random sample of 100 DOIs registered with Crossref by using the rcrossref package.

library(dplyr)
library(rcrossref)
# get a random sample of DOIs and metadata describing these works
random_dois <- rcrossref::cr_r(sample = 100) %>%
  rcrossref::cr_works() %>%
  .$data
random_dois
#> # A tibble: 100 x 35
#>       alternative.id                          container.title    created
#>                <chr>                                    <chr>      <chr>
#>  1                                                            2015-12-21
#>  2 S0090429510019503                                  Urology 2011-05-03
#>  3                                  physica status solidi (c) 2010-02-04
#>  4 S1878875017315589                       World Neurosurgery 2017-09-19
#>  5                           Journal of Differential Geometry 2017-03-16
#>  6                               Chinese Journal of Chemistry 2010-09-09
#>  7  0550321380904678                        Nuclear Physics B 2002-11-12
#>  8                            Journal of Experimental Zoology 2005-06-10
#>  9                                                 ChemInform 2012-04-26
#> 10 S0399832006731293 Gastroentérologie Clinique et Biologique 2008-05-04
#> # ... with 90 more rows, and 32 more variables: deposited <chr>,
#> #   DOI <chr>, funder <list>, indexed <chr>, ISBN <chr>, ISSN <chr>,
#> #   issued <chr>, link <list>, member <chr>, prefix <chr>,
#> #   publisher <chr>, reference.count <chr>, score <chr>, source <chr>,
#> #   subject <chr>, title <chr>, type <chr>, URL <chr>, assertion <list>,
#> #   author <list>, `clinical-trial-number` <list>, issue <chr>,
#> #   license_date <chr>, license_URL <chr>, license_delay.in.days <chr>,
#> #   license_content.version <chr>, page <chr>, volume <chr>,
#> #   abstract <chr>, subtitle <chr>, update.policy <chr>, archive <chr>

Let's see when these random publications were published:

random_dois %>%
  # convert to years
  mutate(issued = lubridate::parse_date_time(issued, c('y', 'ymd', 'ym'))) %>%
  mutate(issued = lubridate::year(issued)) %>%
  group_by(issued) %>%
  summarize(pubs = n()) %>%
  arrange(desc(pubs))
#> # A tibble: 47 x 2
#>    issued  pubs
#>     <dbl> <int>
#>  1     NA     9
#>  2   2015     5
#>  3   2002     4
#>  4   2006     4
#>  5   2008     4
#>  6   2010     4
#>  7   2011     4
#>  8   2012     4
#>  9   2013     4
#> 10   1994     3
#> # ... with 37 more rows
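
If only the publication year is needed, a lighter alternative can be sketched in base R. It assumes that every non-missing issued value starts with a four-digit year, as in the three date shapes parsed above; the values below are made up for illustration.

```r
# Made-up examples of year, year-month and full-date strings
issued <- c("2015", "2011-05", "2017-09-19", NA)

# Keep the leading four digits; everything else becomes NA
year <- ifelse(grepl("^[0-9]{4}", issued),
               as.integer(substr(issued, 1, 4)),
               NA_integer_)

table(year, useNA = "ifany")
```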

and of what type they are:

random_dois %>%
  group_by(type) %>%
  summarize(pubs = n()) %>%
  arrange(desc(pubs))
#> # A tibble: 7 x 2
#>                  type  pubs
#>                 <chr> <int>
#> 1     journal-article    75
#> 2        book-chapter    12
#> 3 proceedings-article     6
#> 4           component     3
#> 5             dataset     2
#> 6        dissertation     1
#> 7              report     1

Calling oaDOI.org

Now let's call oaDOI.org:

oa_df <- roadoi::oadoi_fetch(dois = random_dois$DOI, email = "name@example.com")

and merge the resulting information about open access full-text links with parts of our Crossref metadata-set:

my_df <- random_dois %>%
  select(DOI, type) %>% 
  left_join(oa_df, by = c("DOI" = "doi"))
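
Because oaDOI returns DOIs in lowercase, a join can silently miss rows when the DOIs on the other side are mixed-case, as may happen with lists exported from institutional databases. The sketch below uses made-up data frames (including made-up is_oa values) and base R's merge() to show the effect and one way around it.

```r
# Made-up metadata with one mixed-case DOI
metadata <- data.frame(DOI = c("10.1103/PhysRevE.88.012814",
                               "10.1186/s12864-016-2566-9"),
                       type = "journal-article",
                       stringsAsFactors = FALSE)

# Made-up stand-in for the oaDOI result (DOIs always lowercase)
oa <- data.frame(doi = c("10.1103/physreve.88.012814",
                         "10.1186/s12864-016-2566-9"),
                 is_oa = c(TRUE, TRUE),
                 stringsAsFactors = FALSE)

# Naive join: the mixed-case DOI finds no partner
naive <- merge(metadata, oa, by.x = "DOI", by.y = "doi", all.x = TRUE)
sum(is.na(naive$is_oa))

# Normalise the key first, then join on the lowercase column
metadata$doi <- tolower(metadata$DOI)
merged <- merge(metadata, oa, by = "doi", all.x = TRUE)
sum(is.na(merged$is_oa))
```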

Reporting

After gathering the data, reporting with R is very straightforward. You can even generate dynamic reports using R Markdown and related packages, thus making your study reproducible and transparent for others.

To display how many full-text links were found and which sources were used as nicely formatted markdown tables, use the knitr package:

my_df %>%
  group_by(is_oa) %>%
  summarise(Articles = n()) %>%
  mutate(Proportion = Articles / sum(Articles)) %>%
  arrange(desc(Articles)) %>%
  knitr::kable()
|is_oa | Articles| Proportion|
|:-----|--------:|----------:|
|FALSE |       84|       0.84|
|TRUE  |       16|       0.16|

How did oaDOI find those Open Access full-texts, which were characterized as best matches, and how are these OA types distributed over publication types?

my_df %>%
  filter(is_oa == TRUE) %>%
  tidyr::unnest(best_oa_location) %>% 
  group_by(evidence, type) %>%
  summarise(Articles = n()) %>%
  arrange(desc(Articles)) %>%
  knitr::kable()
|evidence                                                 |type            | Articles|
|:--------------------------------------------------------|:---------------|--------:|
|open (via free pdf)                                      |journal-article |        7|
|oa journal (via issn in doaj)                            |journal-article |        4|
|oa repository (via OAI-PMH title and first author match) |journal-article |        2|
|open (via crossref license)                              |journal-article |        2|
|oa journal (via publisher name)                          |component       |        1|

More examples

For more examples, see Piwowar et al. 2017.[^1] Together with the article, they shared their analysis of oaDOI data as an R Markdown supplement.

[^1]: Piwowar, H., Priem, J., Larivière, V., Alperin, J. P., Matthias, L., Norlander, B., … Haustein, S. (2017). The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles (Version 1). PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.3119v1

Meta

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

License: MIT

Please use the issue tracker for bug reporting and feature requests.


News

roadoi 0.4.1

Minor fixes:

  • remove BASE examples because BASE is no longer a data source of oaDOI
  • bug fix json parser

roadoi 0.4

Implements the oaDOI.org API version 2.

roadoi 0.3

Accepted for rOpenSci: https://github.com/ropensci/onboarding/issues/115

The following suggestions from the reviewers were added:

  • email validation, thanks @sckott
  • bugfix Shiny Addin, thanks @tts
  • add version number as package dependencies for shiny, thanks @rossmounce
  • add unit test using lintr package, thanks @maelle
  • improved documentation

roadoi 0.2

NEW FEATURES

  • Shiny Addin for finding free full-texts in RStudio
  • full support of oadoi API version 1.3.0

Major changes

  • requests must now include email address to reflect new oadoi API version

Minor changes

  • improved output documentation

roadoi 0.1

NEW FEATURES

  • released on CRAN
  • full support of oadoi API version 1.2.0

0.5.1 by Najko Jahn, a month ago


https://github.com/ropensci/roadoi


Report a bug at https://github.com/ropensci/roadoi/issues


Browse source code at https://github.com/cran/roadoi


Authors: Najko Jahn [aut, cre], Tuija Sonkkila [rev] (Tuija Sonkkila reviewed the package for rOpenSci, see https://github.com/ropensci/onboarding/issues/115), Ross Mounce [rev] (Ross Mounce reviewed the package for rOpenSci, see https://github.com/ropensci/onboarding/issues/115)




MIT + file LICENSE license


Imports httr, jsonlite, dplyr, plyr, purrr, tibble, miniUI, shiny, tidyr

Suggests roxygen2, testthat, knitr, covr, rmarkdown, lubridate, rcrossref, lintr

