Interface to the Search 'API' for 'PLoS' Journals

A programmatic interface to the 'SOLR' based search 'API' provided by the Public Library of Science journals to search their articles. Functions are included for searching for articles, retrieving articles, making plots, doing 'faceted' searches, 'highlight' searches, and viewing results of 'highlighted' searches in a browser.


You can get this package from CRAN, or install it within R by running

install.packages("rplos")

Or install the development version from GitHub

install.packages("devtools")
devtools::install_github("ropensci/rplos")

What is this?

rplos is a package for accessing full text articles from the Public Library of Science journals using their API.


You used to need a key to use rplos - you no longer do as of 2015-01-13 (or v0.4.5.999).

An rplos tutorial is available on the rOpenSci website.

PLoS API documentation is available from PLOS.

Crossref API documentation is available from Crossref. Note that we are working on a new package, rcrossref (on CRAN), with a much fuller implementation of R functions for all Crossref endpoints.


Beware: PLOS has recently started throttling requests. That is, they return error messages like "(503) Service Unavailable - The server cannot process the request due to a high load", which means you have made too many requests in a certain time period.
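
Because a throttled request fails with an error, one way to cope is to wrap a call in a retry helper that pauses between attempts. The following is a minimal sketch in base R. The name `with_retry` and its arguments are hypothetical and not part of rplos, and the wait times are arbitrary rather than taken from any PLOS guidance.

```r
# Hypothetical helper (not part of rplos): retry a function call, pausing
# for exponentially growing intervals whenever it signals an error.
with_retry <- function(fun, max_tries = 3, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      # Wait base_wait, 2 * base_wait, 4 * base_wait, ... seconds
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(result))
}
```

A search could then be wrapped as `with_retry(function() searchplos('ecology', 'id', limit = 5))`. If every attempt fails, the helper stops with the last error message.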

Quick start


Search for the term ecology, and return id (DOI) and publication date, limiting to 5 items

searchplos('ecology', 'id,publication_date', limit = 5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1    40530     0
#> $data
#> # A tibble: 5 x 2
#>                             id     publication_date
#>                          <chr>                <chr>
#> 1 10.1371/journal.pone.0001248 2007-11-28T00:00:00Z
#> 2 10.1371/journal.pone.0059813 2013-04-24T00:00:00Z
#> 3 10.1371/journal.pone.0155019 2016-05-11T00:00:00Z
#> 4 10.1371/journal.pone.0080763 2013-12-10T00:00:00Z
#> 5 10.1371/journal.pone.0150648 2016-03-03T00:00:00Z
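
The publication_date column comes back as character strings, as the output above shows. The following is a small sketch of converting such strings to date-times with base R, using two of the values printed above. The variable names are illustrative only.

```r
# Timestamps copied from the example output above
pub <- c("2007-11-28T00:00:00Z", "2013-04-24T00:00:00Z")

# Parse as UTC date-times; the trailing "Z" is matched literally
pub_time <- as.POSIXct(pub, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Extract the publication year as an integer
pub_year <- as.integer(format(pub_time, "%Y"))
```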

Get DOIs for full articles in PLoS ONE

searchplos(q="*:*", fl='id', fq=list('journal_key:PLoSONE',
   'doc_type:full'), limit=5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1   189936     0
#> $data
#> # A tibble: 5 x 1
#>                             id
#>                          <chr>
#> 1 10.1371/journal.pone.0107314
#> 2 10.1371/journal.pone.0037802
#> 3 10.1371/journal.pone.0163113
#> 4 10.1371/journal.pone.0079578
#> 5 10.1371/journal.pone.0023759

Query to get some PLOS article-level metrics; notice the difference between the two outputs

out <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'), fq='doc_type:full')
out_sorted <- searchplos(q="*:*", fl=c('id','counter_total_all','alm_twitterCount'),
   fq='doc_type:full', sort='counter_total_all desc')
# Show the first rows of the unsorted and the sorted results
head(out$data)
head(out_sorted$data)
#> # A tibble: 6 x 3
#>                             id alm_twitterCount counter_total_all
#>                          <chr>            <int>             <int>
#> 1 10.1371/journal.pone.0107314                0               602
#> 2 10.1371/journal.pone.0037802                0              3743
#> 3 10.1371/journal.pone.0163113                2              2137
#> 4 10.1371/journal.ppat.1004790                0              2819
#> 5 10.1371/journal.pone.0079578                2              3187
#> 6 10.1371/journal.pone.0023759                0              5016
#> # A tibble: 6 x 3
#>                                                        id alm_twitterCount
#>                                                     <chr>            <int>
#> 1                            10.1371/journal.pmed.0020124             3207
#> 2 10.1371/annotation/80bd7285-9d2d-403a-8e6f-9c375bf977ca                0
#> 3                            10.1371/journal.pcbi.1003149              182
#> 4                            10.1371/journal.pone.0141854             3401
#> 5                            10.1371/journal.pcbi.0030102               63
#> 6                            10.1371/journal.pone.0088278              932
#> # ... with 1 more variables: counter_total_all <int>

A list of articles about social networks that are popular on a social network

searchplos(q="*:*", fl=c('id','alm_twitterCount'),
   fq=list('doc_type:full','subject:"Social networks"','alm_twitterCount:[100 TO 10000]'),
   sort='counter_total_month desc')
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1       56     0
#> $data
#> # A tibble: 10 x 2
#>                              id alm_twitterCount
#>                           <chr>            <int>
#>  1 10.1371/journal.pone.0073791             1856
#>  2 10.1371/journal.pmed.1000316             1006
#>  3 10.1371/journal.pone.0069841              892
#>  4 10.1371/journal.pone.0148405              516
#>  5 10.1371/journal.pone.0151588              331
#>  6 10.1371/journal.pcbi.1005399              561
#>  7 10.1371/journal.pone.0149885              167
#>  8 10.1371/journal.pone.0150989              241
#>  9 10.1371/journal.pbio.1001535             2123
#> 10 10.1371/journal.pbio.1002373              402

Show all articles that have these two words fewer than about 15 words apart

searchplos(q='everything:"sports alcohol"~15', fl='title', fq='doc_type:full', limit=3)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1      111     0
#> $data
#> # A tibble: 3 x 1
#>                                                                         title
#>                                                                         <chr>
#> 1 Alcohol Advertising in Sport and Non-Sport TV in Australia, during Children
#> 2 Symptoms of Insomnia and Sleep Duration and Their Association with Incident
#> 3 Correction: Alcohol Advertising in Sport and Non-Sport TV in Australia, dur

Narrow results to 7 words apart, changing the ~15 to ~7

searchplos(q='everything:"sports alcohol"~7', fl='title', fq='doc_type:full', limit=3)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1       60     0
#> $data
#> # A tibble: 3 x 1
#>                                                                         title
#>                                                                         <chr>
#> 1 Alcohol Advertising in Sport and Non-Sport TV in Australia, during Children
#> 2 Symptoms of Insomnia and Sleep Duration and Their Association with Incident
#> 3 Correction: Alcohol Advertising in Sport and Non-Sport TV in Australia, dur
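
The proximity syntax in the two calls above is a quoted phrase followed by ~ and a maximum distance. The following is a sketch of a helper for building such query strings. `proximity_query` is a hypothetical name, not an rplos function.

```r
# Hypothetical helper (not part of rplos): build a proximity query string
# such as everything:"sports alcohol"~15
proximity_query <- function(words, distance, field = "everything") {
  stopifnot(is.character(words), length(words) >= 2,
            is.numeric(distance), length(distance) == 1, distance >= 0)
  phrase <- paste(words, collapse = " ")
  sprintf('%s:"%s"~%d', field, phrase, as.integer(distance))
}

q15 <- proximity_query(c("sports", "alcohol"), 15)
q7  <- proximity_query(c("sports", "alcohol"), 7)
```

The resulting strings can be passed as the q argument, for example `searchplos(q = q7, fl = 'title', fq = 'doc_type:full', limit = 3)`.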

Remove DOIs for annotations (i.e., corrections) and Viewpoints articles

searchplos(q='*:*', fl=c('id','article_type'),
   fq=list('-article_type:correction','-article_type:viewpoints'), limit=5)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1  1851430     0
#> $data
#> # A tibble: 5 x 2
#>                                        id     article_type
#>                                     <chr>            <chr>
#> 1            10.1371/journal.pone.0074173 Research Article
#> 2      10.1371/journal.pone.0074173/title Research Article
#> 3   10.1371/journal.pone.0074173/abstract Research Article
#> 4 10.1371/journal.pone.0074173/references Research Article
#> 5       10.1371/journal.pone.0074173/body Research Article
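
Note that the result above still contains one record per article section (title, abstract, references, body) in addition to the article itself. Besides adding 'doc_type:full' to fq as in the earlier examples, the section records can be dropped after the fact. The following is a base R sketch using the ids printed above. The suffix list is an assumption based only on the suffixes visible in this output.

```r
# ids copied from the example output above
ids <- c("10.1371/journal.pone.0074173",
         "10.1371/journal.pone.0074173/title",
         "10.1371/journal.pone.0074173/abstract",
         "10.1371/journal.pone.0074173/references",
         "10.1371/journal.pone.0074173/body")

# Assumption: section records end in one of these suffixes
is_section <- grepl("/(title|abstract|references|body)$", ids)

# Keep only the whole-article DOIs
full_ids <- ids[!is_section]
```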

Faceted search

Facet on multiple fields

facetplos(q='alcohol', facet.field=c('journal','subject'), facet.limit=5)
#> $facet_queries
#> $facet_fields
#> $facet_fields$journal
#> # A tibble: 5 x 2
#>                               term value
#>                              <chr> <chr>
#> 1                         plos one 23167
#> 2                    plos genetics   535
#> 3                    plos medicine   452
#> 4 plos neglected tropical diseases   415
#> 5                   plos pathogens   311
#> $facet_fields$subject
#> # A tibble: 5 x 2
#>                            term value
#>                           <chr> <chr>
#> 1     biology and life sciences 24912
#> 2  medicine and health sciences 22173
#> 3 research and analysis methods 14372
#> 4                  biochemistry 12309
#> 5             physical sciences  9342
#> $facet_pivot
#> $facet_dates
#> $facet_ranges

Range faceting

facetplos(q='*:*', facet.range='counter_total_all',
 facet.range.start=5, facet.range.end=100, facet.range.gap=10)
#> $facet_queries
#> $facet_fields
#> $facet_pivot
#> $facet_dates
#> $facet_ranges
#> $facet_ranges$counter_total_all
#> # A tibble: 10 x 2
#>     term value
#>    <chr> <chr>
#>  1     5    25
#>  2    15   239
#>  3    25   509
#>  4    35   927
#>  5    45  1409
#>  6    55  1709
#>  7    65  1830
#>  8    75  1726
#>  9    85  1595
#> 10    95  1464
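
The term column in the range facet output holds the lower bound of each bucket, stepping by 10 from 5 up to 95. The following sketch shows how a start of 5, an end of 100, and a gap of 10 give those ten lower bounds. It is plain arithmetic in base R, not a call into rplos or Solr, and the variable names are illustrative.

```r
range_start <- 5
range_end   <- 100
range_gap   <- 10

# Candidate lower bounds: start, start + gap, start + 2 * gap, ...
lower_bounds <- seq(from = range_start, to = range_end, by = range_gap)

# Keep only buckets that begin below the end of the range
lower_bounds <- lower_bounds[lower_bounds < range_end]
```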

Highlight searches

Search for and highlight the term alcohol in the abstract field only

(out <- highplos(q='alcohol', hl.fl = 'abstract', rows=3))
#> $`10.1371/journal.pone.0185457`
#> $`10.1371/journal.pone.0185457`$abstract
#> [1] "Objectives: <em>Alcohol</em>-related morbidity and mortality are significant public health issues"
#> $`10.1371/journal.pone.0071284`
#> $`10.1371/journal.pone.0071284`$abstract
#> [1] "\n<em>Alcohol</em> dependence is a heterogeneous disorder where several signalling systems play important"
#> $`10.1371/journal.pone.0027752`
#> $`10.1371/journal.pone.0027752`$abstract
#> [1] "Background: The negative influences of <em>alcohol</em> on TB management with regard to delays in seeking"

And you can browse the results in your default browser

highbrow(out)



Full text urls

Simple function to get full text urls for a DOI

full_text_urls(doi='10.1371/journal.pone.0086169')

Full text xml given a DOI

(out <- plos_fulltext(doi='10.1371/journal.pone.0086169'))
#> 1 full-text articles retrieved
#> Min. Length: 110717 - Max. Length: 110717
#> DOIs: 10.1371/journal.pone.0086169 ...
#> NOTE: extract xml strings like output['<doi>']

Then parse the XML any way you like, here getting the abstract

# xmlParse(), xpathSApply(), and xmlValue() come from the XML package
library(XML)
xpathSApply(xmlParse(out$`10.1371/journal.pone.0086169`), "//abstract", xmlValue)
#> [1] "Mammalian females pay high energetic costs for reproduction, the greatest of which is imposed by lactation. The synthesis of milk requires, in part, the mobilization of bodily reserves to nourish developing young. Numerous hypotheses have been advanced to predict how mothers will differentially invest in sons and daughters, however few studies have addressed sex-biased milk synthesis. Here we leverage the dairy cow model to investigate such phenomena. Using 2.39 million lactation records from 1.49 million dairy cows, we demonstrate that the sex of the fetus influences the capacity of the mammary gland to synthesize milk during lactation. Cows favor daughters, producing significantly more milk for daughters than for sons across lactation. Using a sub-sample of this dataset (N = 113,750 subjects) we further demonstrate that the effects of fetal sex interact dynamically across parities, whereby the sex of the fetus being gestated can enhance or diminish the production of milk during an established lactation. Moreover the sex of the fetus gestated on the first parity has persistent consequences for milk synthesis on the subsequent parity. Specifically, gestation of a daughter on the first parity increases milk production by ∼445 kg over the first two lactations. Our results identify a dramatic and sustained programming of mammary function by offspring in utero. Nutritional and endocrine conditions in utero are known to have pronounced and long-term effects on progeny, but the ways in which the progeny has sustained physiological effects on the dam have received little attention to date."
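
The changelog for 0.8.0 notes that the package itself moved from XML to xml2, so the same kind of extraction can also be sketched with xml2. The XML string below is a tiny made-up stand-in for an article, not real PLOS output. With real data you would pass one element of the plos_fulltext() result instead.

```r
library(xml2)

# Made-up stand-in for one full-text XML string
article_xml <- "<article><front><abstract><p>Cows favor daughters.</p></abstract></front></article>"

# Parse the string and pull the text of every <abstract> node
doc <- read_xml(article_xml)
abstract_text <- xml_text(xml_find_all(doc, "//abstract"))
```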

Search within a field

There is a series of convenience functions for searching within sections of articles.

  • plosauthor()
  • plosabstract()
  • plosfigtabcaps()
  • plostitle()
  • plossubject()

For example:

plossubject(q='marine ecology',  fl = c('id','journal'), limit = 10)
#> $meta
#> # A tibble: 1 x 2
#>   numFound start
#>      <int> <int>
#> 1     3560     0
#> $data
#> # A tibble: 10 x 2
#>                                         id  journal
#>                                      <chr>    <chr>
#>  1            10.1371/journal.pone.0167252 PLOS ONE
#>  2      10.1371/journal.pone.0167252/title PLOS ONE
#>  3   10.1371/journal.pone.0167252/abstract PLOS ONE
#>  4 10.1371/journal.pone.0167252/references PLOS ONE
#>  5       10.1371/journal.pone.0167252/body PLOS ONE
#>  6            10.1371/journal.pone.0021810 PLoS ONE
#>  7      10.1371/journal.pone.0021810/title PLoS ONE
#>  8   10.1371/journal.pone.0021810/abstract PLoS ONE
#>  9 10.1371/journal.pone.0021810/references PLoS ONE
#> 10       10.1371/journal.pone.0021810/body PLoS ONE

However, you can always just do this in searchplos() like searchplos(q = "subject:science"). See also the fq parameter. The above convenience functions are simply wrappers around searchplos(), so they take all the same parameters.
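
To illustrate the equivalence described above, a field-restricted query is the field name, a colon, and the term. The following is a sketch of a helper that builds such strings. `field_query` is a hypothetical name, and quoting multi-word terms is an assumption about Solr query syntax, not documented rplos behavior.

```r
# Hypothetical helper (not part of rplos): build a field-restricted query
field_query <- function(field, term) {
  stopifnot(is.character(field), length(field) == 1,
            is.character(term), length(term) == 1)
  # Quote terms containing whitespace so they are treated as one phrase
  if (grepl("\\s", term)) {
    term <- paste0('"', term, '"')
  }
  paste0(field, ":", term)
}

q1 <- field_query("subject", "science")
q2 <- field_query("subject", "marine ecology")
```

The first string matches the one used in the searchplos() call above and could be passed as its q argument.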

Search by article views

Search with term marine ecology, by field subject, and limit to 5 results

plosviews(search='marine ecology', byfield='subject', limit=5)
#>                             id counter_total_all
#> 1 10.1371/journal.pone.0167252              1379
#> 2 10.1371/journal.pone.0021810              2883
#> 5 10.1371/journal.pone.0053598              4351
#> 4 10.1371/journal.pone.0149852              8319
#> 3 10.1371/journal.pone.0092590              8873


Visualize word use across articles

plosword(list('monkey','Helianthus','sunflower','protein','whale'), vis = 'TRUE')
#> $table
#>   No_Articles       Term
#> 1       11884     monkey
#> 2         502 Helianthus
#> 3        1394  sunflower
#> 4      135029    protein
#> 5        1613      whale
#> $plot



  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for rplos in R doing citation(package = 'rplos')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

This package is part of a richer suite called fulltext, along with several other packages, that provides the ability to search for and retrieve full text of open access scholarly articles. We recommend using fulltext as the primary R interface to rplos unless your needs are limited to this single source.



rplos 0.8.0


  • Now using solrium for under the hood Solr interaction instead of solr package (#106)
  • Along with above change, the following: facetplos, searchplos, and highplos lose parameter verbose, and gain parameters error and proxy for changing how verbose error reporting is, and for setting proxy details, respectively.
  • Now using crul instead of httr for HTTP requests (#110)


  • Fix to placement of images for README requested by CRAN (#114)
  • Replaced XML with xml2 (#112)
  • citations function for PLOS rich citations is defunct as the service is gone (#113)
  • package tm dropped from Enhances (#111)
  • added code of conduct, issue and pull request templates

rplos 0.6.4


  • URLs to full text XML have been changed. The old URLs still worked but went through two 302 redirects to get there. Updated URLs. (#107)


  • Fixed content-type check for plos_fulltext() function. XML can be either application/xml or text/xml (#108)

rplos 0.6.0


  • Added notes to documentation for relevant functions on how to do phrase searching. (#96) (#97) thanks @poldham
  • Removed the random parameter from the citations() function as it's no longer available in the API (#103)
  • Swapped out all uses of dplyr::rbind_all() for dplyr::bind_rows() (#105)
  • full_text_urls() now gives back NA when DOIs for annotations are given, which can be easily removed.


  • Fixed full_text_urls() function to create full text URLs for PLOS Clinical Trials correctly (#104)

rplos 0.5.6


  • move ggplot2 from Depends to Imports, and using @importFrom for ggplot2 functions, now all imports are using @importFrom (#99)
  • Fixes for httr::content() to parse manually, and use explicit encoding of UTF-8 (#102)

rplos 0.5.4


  • Change solr dependency to require version v0.1.6 or less (#94)

rplos 0.5.2


  • More tests added (#94)


  • Fix encoding in parsing of XML data in plos_fulltext() to avoid unicode problems (#93)

rplos 0.5.0


  • Now importing non-Base R functions from utils, stats, and methods packages (#90)


  • Fixes for httr v1 that broke rplos when length 0 list passed to query parameter (#89)

rplos 0.4.7


  • New function citations() for querying the PLOS Rich Citations API (#88)


  • Added vignettes/figure to .Rbuildignore as requested by CRAN admin (#87)

rplos 0.4.6


  • API key no longer required (#86)


  • searchplos() now returns a list of length two, meta and data, and meta is a data.frame of metadata for the search.
  • Switched from CC0 to MIT license.
  • No longer importing libraries RCurl, data.table, googleVis, assertthat, RJSONIO, and stringr (#79) (#82) (#84)
  • Now importing dplyr.
  • Moved jsonlite from Suggests to Imports. Replaces use of RJSONIO. (#80)
  • crossref() now defunct. See package rcrossref (#83)
  • highplos() now uses solr::solr_highlight() to do highlight searches.
  • searchplos(), plosabstract(), and other functions that wrap searchplos() now use ... to pass in curl options to httr::GET(). You'll now get an error on using callopts parameter.
  • Added manual file entry for the dataset isocodes.
  • Reworked both plosword() and plot_throughtime() to have far less code, uses httr now instead of RCurl, but to the user, everything should be the same.
  • Made documentation more clear on discrepancy between PLOS website behavior and rplos behavior, and how to make them match, or match more closely (#76)
  • Added package level man file to allow ?rplos to go to help page.


  • Removed some examples from searchplos() that are now not working for some unknown reason. (#81)
  • Previously, when the user set limit=0, we still gave back data. This is fixed: the meta slot is now given back, and the data slot is NA (#85)

rplos 0.4.1


  • Fixed some broken tests.

rplos 0.4.0


  • Errors from the data provider are reported now. At least we attempt to do so when they are given, for example if you specify asc or desc incorrectly with the sort parameter. See the check_response() function for examples.
  • New functions facetplos() and highplos() using the solr R wrapper to the Solr indexing engine. The PLOS API just exposes the Solr endpoints, so we can use the general Solr wrapper package solr to allow more flexible Solr searching.
  • New function highbrow() to visualize highlighting results in a browser.
  • New function plos_fulltext() to get full text xml of PLOS articles. Helper function full_text_urls() constructs the URL's for full text xml.


  • Fixed bug in tests where we forgot to give a key. No key is required per se, but PLOS encourages it so we prevent a call from happening without at least a dummy key.
  • Added function check_response() to check responses from the PLOS API, deals with capturing server error messages, and checking for correct content type, etc.


  • Removed function crossref_r() as we are working on a package for the CrossRef API.
  • Parameter arguments in searchplos(), plosauthor(), plosfigtabcaps(), plossubject(), and plostitle() were changed to match closer the Solr parameter names. terms to q. fields to fl. toquery to fq.
  • Multiple values passed to fields
  • returndf parameter is gone from searchplos(), plosauthor(), plosfigtabcaps(), plossubject(), and plostitle(). You can easily get raw JSON, etc. data using the solr package.
  • Now using httr instead of RCurl in plosviews() function.

rplos 0.3.6


  • All search functions (searchplos(), plosabstract(), plosauthor(), plosfigtabcaps(), plossubject(), and plostitle()) gain highlighting argument, setting to TRUE (default=FALSE) returns matching sentence fragments that were matched. NOTE that if highlighting=TRUE the output can be a list of data.frame's if returndf=TRUE, with two named elements 'data' and 'highlighting', or a list of lists if returndf=FALSE.
  • All search functions (searchplos(), plosabstract(), plosauthor(), plosfigtabcaps(), plossubject(), and plostitle()) gain sort argument. You can pass a field to sort by even if you don't return that field in the output, e.g., sort='counter_total_month desc'.
  • A tiny function parsehighlight() added to parse out html code from highlighting output.


  • Some examples in docs didn't work - fixed them.
  • Fixed bug in searchplos() that was causing elements of a return field to cause failure because they were longer than 1 (e.g., authors). Now concatenating elements of length > 1.
  • Fixed bug in searchplos() that was causing elements of length 0 to cause failure. Now removing elements of length 0.
  • Fixed parsehighlight function to return NA if highlighting return of length 0.
  • Fixed broken test for plosauthor(), plosabstract(), and plot_throughtime().

rplos 0.3.0


  • Added httr::stop_for_status() calls to a few functions to give informative http status errors when they happen


  • Fixed bug in plot_throughtime() that was throwing errors and preventing fxn from working, thanks to Ben Bolker for the fix.
  • Simplified code in many functions to have cleaner and simpler code.
  • ... parameter in many functions changed to callopts=list(), which passes in curl options to a call to either RCurl::getForm() or httr::GET()
  • Fixed bug in function plosviews() that caused errors in some calls. Now forces full document searches, so that you get views data back for full papers only, not sections of papers. See package alm for more in-depth PLOS article-level metrics.

rplos 0.2.0


  • All functions for interacting with the PLOS ALM (altmetrics) API have been removed, and are now in a separate package called alm.
  • Convenience functions plosabstract, plosauthor, plosfigtabcaps, plossubject, and plostitle, that search specifically within those sections of papers now wrap searchplos, so they should behave the same way.
  • ldfast() fxn added as an attempt to do ldply faster
  • performance improvements in searchplos


  • Dependency on assertthat removed since it's not on CRAN.
  • Fixed namespace conflicts by importing only functions needed from some packages.
  • searchplos() now removes leading, trailing, and internal whitespace from character strings

rplos 0.1.1

  • remove alm*() functions so that this package now only wraps the PLoS search API.

rplos 0.1.0

  • The almdateupdated function has been deprecated - use almupdated instead.

  • The articlelength function has been deprecated - didn't see the usefulness any longer.

  • In general simplified and prettified code.

  • Changed from using RCurl to httr in many functions, but not all.

  • Added more examples for many functions.

  • Added three internal functions: concat_todf, addmissing, and getkey.

  • Added Karthik Ram as a package author.


  • All url arguments in functions put inside functions as they are not likely to change that often.

  • Fixed crossref function, and added more examples.


  • The alm function (previously almplosallviews) gains many new features: now allows up to 50 DOIs per call; you can specify the source you want to get alm data from as an argument; you can specify the year you want to get alm data from as an argument.

  • Added the plosfields data file to get all the possible fields to use in function calls.


  • almplosallviews changed to alm.

  • almplotallviews changed to almplot.

  • almevents added to specifically search and get detailed events data for a specific source or N sources.

  • crossref_r gets 20 random DOIs from

  • Added package startup message.

  • journalnamekey function to get the short name keys for each PLoS Journal.

rplos 0.0-7


  • ALM functions (any functions starting with alm) received updated arguments/parameters according to the ALM API version 3.0 changes.

  • Bug fixes in general across library.

  • Added tests.

  • almplosallviews now outputs different output - two data.frames, one total metrics (summed across time), and history (for metrics for each time period specified in the search)

  • crossref function returns R's native bibtype format. See examples in crossref function documentation

rplos 0.0-5


  • almpub changed to almdatepub

  • changed help file rplos to help - use help('rplos') in R

  • changed URL from to

  • added sleep argument to plosallviews function to allow pauses between API calls when running plosallviews in a loop - this is an attempt to limit hitting the PLoS API too hard

  • various other fixes to functions

  • more examples added to some functions


  • added function journalnamekey to get short keys for journals to use in searching for specific journals

rplos 0.0-1


  • released to CRAN


0.8.0 by Scott Chamberlain, 8 months ago

Authors: Scott Chamberlain [aut, cre], Carl Boettiger [aut], Karthik Ram [aut]

Task views: Web Technologies and Services

MIT + file LICENSE license

Imports ggplot2, crul, jsonlite, dplyr, plyr, lubridate, reshape2, whisker, solrium

Suggests xml2, testthat, knitr

Imported by fulltext.
