A programmatic interface to many species occurrence data sources, including Global Biodiversity Information Facility ('GBIF'), the 'USGS' Biodiversity Information Serving Our Nation ('BISON'), 'iNaturalist', Berkeley 'Ecoinformatics' Engine, 'eBird', 'AntWeb', Integrated Digitized 'Biocollections' ('iDigBio'), 'VertNet', Ocean 'Biogeographic' Information System ('OBIS'), and Atlas of Living Australia ('ALA'). Includes functionality for retrieving species occurrence data, and combining those data.
spocc = SPecies OCCurrence data
At rOpenSci, we have been writing R packages to interact with many sources of species occurrence data, including GBIF, iDigBio, Vertnet, BISON, iNaturalist, the Berkeley ecoengine, and AntWeb.
spocc is an R package to query and collect species occurrence data from many sources. The goal is to wrap functions in other R packages to make a seamless experience across data sources for the user.
The inspiration for this comes from users requesting a more seamless experience across data sources, and from our work on a similar package for taxonomy data (taxize).
BEWARE: In cases where you request data from multiple providers, especially when including GBIF, there could be duplicate records, since many providers' data eventually end up in GBIF. See ?spocc_duplicates, after installation, for more.
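spocc's own guidance on this lives in ?spocc_duplicates; the sketch below is not taken from it. As a rough illustration, assuming a combined table with the columns occ2df() returns (name, longitude, latitude, prov, date, key), one simple heuristic is to flag rows that share a name, a date, and coordinates rounded to about three decimal places. The toy rows and the flag_possible_duplicates() helper are hypothetical, and the rounding precision is an assumption, not a spocc default.

```r
# Toy combined table in the shape occ2df() returns; the values are made up.
dat <- data.frame(
  name = rep("Accipiter striatus", 3),
  longitude = c(-97.94314, -97.94310, -77.05161),
  latitude = c(30.04580, 30.04583, 38.87834),
  prov = c("gbif", "inat", "gbif"),
  date = as.Date(c("2016-05-01", "2016-05-01", "2016-05-02")),
  key = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag every row whose name, date, and rounded
# coordinates match at least one other row (all members of a group are flagged).
flag_possible_duplicates <- function(df, digits = 3) {
  sig <- paste(df$name, df$date,
               round(df$longitude, digits), round(df$latitude, digits))
  duplicated(sig) | duplicated(sig, fromLast = TRUE)
}

dat$possible_dup <- flag_possible_duplicates(dat)
dat[dat$possible_dup, c("prov", "key")]
```

Flagged rows are candidates for review, not confirmed duplicates; a smaller digits value catches more near-matches at the cost of more false matches.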
Stable version from CRAN
```r
install.packages("spocc", dependencies = TRUE)
```
Or the development version from GitHub
Get data from GBIF
```r
library("spocc")
(out <- occ(query = 'Accipiter striatus', from = 'gbif', limit = 100))
#> Searched: gbif
#> Occurrences - Found: 528,936, Returned: 100
#> Search type: Scientific
#> gbif: Accipiter striatus (100)
```
Just gbif data
```r
out$gbif
#> Species [Accipiter striatus (100)]
#> First 10 rows of [Accipiter_striatus]
#>
#> # A tibble: 100 × 108
#>                 name longitude latitude  prov         issues        key
#>                <chr>     <dbl>    <dbl> <chr>          <chr>      <int>
#> 1 Accipiter striatus -97.94314 30.04580  gbif cdround,gass84 1233600470
#> 2 Accipiter striatus -77.05161 38.87834  gbif cdround,gass84 1270044795
#> 3 Accipiter striatus -95.50117 29.76086  gbif cdround,gass84 1229610478
#> 4 Accipiter striatus -96.74874 33.03102  gbif cdround,gass84 1257416040
#> ...
```
Get fine-grained control over each data source by passing parameters on to the underlying package, rebird in this example.
```r
(out <- occ(query = 'Setophaga caerulescens', from = 'ebird', ebirdopts = list(region = 'US')))
#> Searched: ebird
#> Occurrences - Found: 0, Returned: 500
#> Search type: Scientific
#> ebird: Setophaga caerulescens (500)
```
Just ebird data
```r
out$ebird
#> Species [Setophaga caerulescens (500)]
#> First 10 rows of [Setophaga_caerulescens]
#>
#> # A tibble: 500 × 12
#>                     name longitude latitude  prov      obsDt
#>                    <chr>     <dbl>    <dbl> <chr>     <date>
#> 1 Setophaga caerulescens -74.96229 38.94037 ebird 2016-10-07
#> 2 Setophaga caerulescens -81.78414 24.54900 ebird 2016-10-07
#> 3 Setophaga caerulescens -74.04066 40.62207 ebird 2016-10-07
#> 4 Setophaga caerulescens -71.13179 42.29393 ebird 2016-10-07
#> ...
```
Get data from many sources in a single call
```r
ebirdopts <- list(region = 'US')
gbifopts <- list(country = 'US')
out <- occ(query = 'Setophaga caerulescens', from = c('gbif', 'bison', 'inat', 'ebird'),
           gbifopts = gbifopts, ebirdopts = ebirdopts, limit = 50)
dat <- occ2df(out)
head(dat); tail(dat)
#> # A tibble: 6 × 6
#>                     name longitude latitude  prov       date        key
#>                    <chr>     <chr>    <chr> <chr>     <date>      <chr>
#> 1 Setophaga caerulescens -71.80166 44.53323  gbif 2016-05-31 1291120531
#> 2 Setophaga caerulescens -72.83974 44.07966  gbif 2016-05-17 1269567302
#> 3 Setophaga caerulescens -83.44952 44.25382  gbif 2016-05-07 1291149671
#> 4 Setophaga caerulescens -83.59349 41.58065  gbif 2016-05-11 1291674787
#> 5 Setophaga caerulescens -75.19615 39.95469  gbif 2016-05-11 1269552963
#> 6 Setophaga caerulescens -83.44952 44.25382  gbif 2016-05-08 1291149541
#> # A tibble: 6 × 6
#>                     name   longitude   latitude  prov       date      key
#>                    <chr>       <chr>      <chr> <chr>     <date>    <chr>
#> 1 Setophaga caerulescens -76.5798569 39.2896713 ebird 2016-10-07  L449982
#> 2 Setophaga caerulescens -76.2258053 39.0347918 ebird 2016-10-07  L126631
#> 3 Setophaga caerulescens -75.3904152 40.6363295 ebird 2016-10-07  L372101
#> 4 Setophaga caerulescens -73.9869263 40.4392518 ebird 2016-10-07  L197353
#> 5 Setophaga caerulescens   -88.21148   40.11972 ebird 2016-10-07  L251002
#> 6 Setophaga caerulescens -75.1045883 40.0237695 ebird 2016-10-07 L3694793
```
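Conceptually, combining results means stacking each provider's table into one long table with a prov column recording where each row came from. The sketch below does this in base R for two hand-made single-row tables whose values are copied from the output above; combine_sources() is a hypothetical helper and is not how occ2df() is implemented.

```r
# Hypothetical per-source results, already reduced to shared columns.
per_source <- list(
  gbif  = data.frame(name = "Setophaga caerulescens", longitude = -71.80166,
                     latitude = 44.53323, date = as.Date("2016-05-31"),
                     key = "1291120531", stringsAsFactors = FALSE),
  ebird = data.frame(name = "Setophaga caerulescens", longitude = -88.21148,
                     latitude = 40.11972, date = as.Date("2016-10-07"),
                     key = "L251002", stringsAsFactors = FALSE)
)

# Stack the tables, tagging each row with the name of its source.
combine_sources <- function(x) {
  pieces <- Map(function(df, prov) {
    df$prov <- rep(prov, nrow(df))  # an empty source contributes zero rows
    df[, c("name", "longitude", "latitude", "prov", "date", "key")]
  }, x, names(x))
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

combined <- combine_sources(per_source)
table(combined$prov)
```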
For citation information on spocc in R, run:

```r
citation(package = 'spocc')
```
rvertnet, a dependency dealing with data from VertNet, was failing on certain searches. rvertnet was fixed and a new version is now on CRAN. No changes here other than requiring the new version of
inherits(), and namespace all
occ() now allows queries that only pass from and one of the data source opts params (e.g., gbifopts) - allows specifying any options passed down to the internal functions used to do data queries without having to use the other params in
tibble for representing data.frames (#164)
httr::content() calls to parse raw data from web requests (#160)
ridigbio as it's on CRAN - was using internal functions prior to this (#154)
has_coords also fixed. (#161)
data.frame() to set a data.table style table to a
vertnet as an option to
occ_options() to get the options for passing to
print.occdatind() - which in the last version introduced a bug in this print method - wasn't fatal, as it only applied to empty slots in the output of a call to occ(), but nonetheless not good (#159)
data.table for fast list to data.frame
as.vertnet() to coerce various inputs (e.g., result from occ2df(), or a key itself) to occurrence data objects (#142)
occ() gains two parameters
page to facilitate paging through results across data sources, instead of having to page individually for each data source. Some sources use the start parameter, while others use the page parameter. See the Paging section in ?occ for details on paging (#140)
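The two paging conventions can be related with simple arithmetic. The sketch below assumes start is a 0-based record offset, page is 1-based, and every page holds limit records; these are assumptions for illustration, and each provider's actual convention is described in ?occ. page_to_start() and start_to_page() are hypothetical helpers, not spocc functions.

```r
# Hypothetical helpers; assumes a 0-based `start` offset, a 1-based `page`,
# and a fixed page size of `limit` records.
page_to_start <- function(page, limit) (page - 1) * limit
start_to_page <- function(start, limit) start %/% limit + 1

page_to_start(3, 50)    # third page of 50 records starts at offset 100
start_to_page(100, 50)  # offset 100 falls on page 3
```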
wkt_vis() now works with WKT polygons with multiple polygons, e.g., spocc::wkt_vis("POLYGON((-125 38.4, -121.8 38.4, -121.8 40.9, -125 40.9, -125 38.4), (-115 22.4, -111.8 22.4, -111.8 30.9, -115 30.9, -115 22.4))") (#147)
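A WKT polygon is a list of closed coordinate rings, each written as comma-separated "x y" (longitude latitude) pairs. The sketch below builds the string from the entry above out of two coordinate matrices; rings_to_wkt() is a hypothetical helper. Note that under the WKT (OGC Simple Features) grammar, additional rings inside a single POLYGON are read as interior rings (holes), and separate polygons are usually written as MULTIPOLYGON, so other WKT tools may interpret this string differently.

```r
# Hypothetical helper: turn a list of 2-column (longitude, latitude) matrices
# into a WKT POLYGON string, one ring per matrix.
rings_to_wkt <- function(rings) {
  ring_text <- vapply(rings, function(m) {
    # Each ring must be closed: first and last points identical.
    stopifnot(ncol(m) == 2, all(m[1, ] == m[nrow(m), ]))
    paste0("(", paste(m[, 1], m[, 2], collapse = ", "), ")")
  }, character(1))
  paste0("POLYGON(", paste(ring_text, collapse = ", "), ")")
}

ring1 <- matrix(c(-125, 38.4, -121.8, 38.4, -121.8, 40.9, -125, 40.9, -125, 38.4),
                ncol = 2, byrow = TRUE)
ring2 <- matrix(c(-115, 22.4, -111.8, 22.4, -111.8, 30.9, -115, 30.9, -115, 22.4),
                ncol = 2, byrow = TRUE)
wkt <- rings_to_wkt(list(ring1, ring2))
```

The resulting string can then be passed to spocc::wkt_vis().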
print.occdatind() to print more helpful info when a geometry search is used as opposed to a taxonomy-based search (#149)
print.occdatind() to not fail when the first element is not present; proceeds to the next slot with data (#143)
occ() failed when multiple geometry elements were passed in along with taxonomic names (#146)
occ2df() for combining outputs to not fail when AntWeb doesn't give back dates (#144) (#145) - thanks @timcdlucas
occ2df() to not fail when the date field is missing (#141)
occ() function. Each data source is taken care of in a separate package or set of wrapper functions, and the man file now details what API parameters are being queried (#138)
Datetime variable changed to
occurrenceID variable changed to
occ() gains new parameter has_coords - a global parameter (except for ebird and bison) to return only records with lat/long data. (#128)
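For results already in hand, or for sources where has_coords does not apply (ebird and bison, per the entry above), a similar filter can be applied locally. The sketch below uses a hypothetical keep_with_coords() helper and made-up rows; it keeps records whose longitude and latitude both parse as numbers, converting first because combined output may store coordinates as character.

```r
# Hypothetical helper: keep rows whose coordinates are both present and numeric.
keep_with_coords <- function(df) {
  lon <- suppressWarnings(as.numeric(df$longitude))
  lat <- suppressWarnings(as.numeric(df$latitude))
  df[!is.na(lon) & !is.na(lat), , drop = FALSE]
}

# Made-up records: the second lacks a longitude, the third has an empty latitude.
recs <- data.frame(
  name = rep("Setophaga caerulescens", 3),
  longitude = c("-71.80166", NA, "-83.44952"),
  latitude = c("44.53323", "44.07966", ""),
  stringsAsFactors = FALSE
)
keep_with_coords(recs)
```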
rank (#133) parameters dropped from
occ() is printed, we now include a message that the total count of records found (not returned) is not completely known if ebird is included, because eBird does not report the number of records found on their servers in responses to their API (#111)
as.gbif) for most data sources. These functions take in occurrence keys or sets of keys, and retrieve detailed occurrence record data for each key (#112)
occ2df() now returns more fields. This function collapses all essential fields that are easy to get in all data sources:
key field is the occurrence key for each record, which you can use to keep track of individual records, get more data on the record, etc. (#103) (#108)
inspect() - takes output from occ() or individual occurrence keys and gets detailed occurrence data.
methods. No longer importing: leafletR. Packages removed mostly due to splitting off some functionality into spoccutils. Related issues: (#131) (#132)
wkt_vis() now only has an option to view a WKT shape in the browser.
gistr now to post interactive geojson maps on GitHub gists (#100)
rgbif now must be v0.7.7 or greater (the latest version on CRAN).
occ2sp() removed. The function occ_to_sp() is the working version. (#97)
\dontrun in examples as requested by CRAN maintainers (#99)
occ_names() to search only for taxonomic names. The goal here is to use this function if there is some question about what names you want to use to search for occurrences. (#84). Suggested by @jarioksa
occ_names_options() to quickly get parameter options to pass to
summary() method for the S3 object that is output from
occ() documentation file, at package startup), we make it clear that there could be duplicate records returned in certain scenarios. And a new documentation page detailing what to watch out for:
limit to each function's options parameter, and it will work. Each data source can have a different parameter internally from limit, but now internally within spocc, we allow you to use limit so you don't have to know what the data-source-specific parameter is. (#81)
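The idea of a single limit that is renamed per source can be sketched as a lookup table. The source names and parameter names below are hypothetical placeholders, not the names spocc actually uses internally; translate_limit() is likewise a hypothetical helper.

```r
# Hypothetical lookup from data source to the name its API uses for
# "number of records"; placeholders only, not spocc's internals.
limit_param <- c(source_a = "limit", source_b = "max", source_c = "count")

# Rename a user-supplied `limit` entry to the source-specific name,
# leaving all other options untouched.
translate_limit <- function(opts, source) {
  if (!is.null(opts$limit)) {
    names(opts)[names(opts) == "limit"] <- limit_param[[source]]
  }
  opts
}

translate_limit(list(limit = 50, region = "US"), "source_b")
```

Keeping the mapping in one table means adding a source only requires one new entry rather than a change to every call site.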
occ_options() gains new parameter where to print either in the console or to open the man file in the IDE; in command-line R it prints to the console.
occ() gains new parameter callopts to pass on curl debugging options to
wkt_vis() now by default plots a well-known text (WKT) area on an interactive Mapbox map in your default browser. New parameter which allows you to choose the interactive map or a static ggplot2 map. (#70)
occ() gains new class. In the previous version of this package, a data.frame was printed. Now the data is assigned the class occdatind (short for occdat individual).
occ() now uses a print method for the occdatind class, adopted from dplyr, that prints a brief data.frame, with columns wrapped to fit the width of your console and any columns not printed listed at the bottom with their class type. Note that the print behavior for the resulting object of an occ() call remains the same. (#69) (#74)
whisker as a package import to use in the
mapggplot() accepted the output of occ(), of class occdat, while the other two functions for mapping,
data.frame. Now all three functions accept the output of occ(), an object of class
meta slot in each returned object (indexed by object$meta) contains spots for
found, to designate number of records returned, and number of records found. (#64)
rgbif. A number of input and output parameter names changed. A new version of rgbif was pushed to CRAN. (#56)
clean_spocc() started (not finished yet) to attempt to clean data. For example, one use case is removing impossible lat/long values (e.g., longitude values with an absolute value greater than 180). Another, not implemented yet, is to remove points that are not in the country or habitat your points are supposed to be in. (#44)
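clean_spocc() was unfinished at this point, so the sketch below is not its implementation. It shows the first check described above in base R, using a hypothetical drop_impossible_coords() helper: a record is kept only if its longitude lies within [-180, 180] and its latitude within [-90, 90].

```r
# Hypothetical helper: drop records whose coordinates cannot exist on Earth.
drop_impossible_coords <- function(df) {
  ok <- abs(df$longitude) <= 180 & abs(df$latitude) <= 90
  # Rows with missing coordinates give NA in `ok`; treat them as not kept.
  df[!is.na(ok) & ok, , drop = FALSE]
}

# Made-up points: the second has longitude 200, the third latitude -95.
pts <- data.frame(longitude = c(-97.94314, 200, 10),
                  latitude = c(30.04580, 10, -95))
drop_impossible_coords(pts)
```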
fixnames() to trim species names with optional input parameters to make data easier to use for mapping.
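The options of fixnames() are not listed here, so the sketch below does not reproduce them. As one example of trimming, a hypothetical trim_to_binomial() helper keeps only the first two words of each name (genus and specific epithet), dropping authorities and dates so that records from different sources group under the same label on a map.

```r
# Hypothetical helper: reduce each name to its first two words.
trim_to_binomial <- function(x) {
  words <- strsplit(trimws(x), "\\s+")
  vapply(words, function(w) paste(head(w, 2), collapse = " "), character(1))
}

trim_to_binomial(c("Accipiter striatus Vieillot, 1808", "Setophaga caerulescens"))
```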
wkt_vis() to visualize a WKT (well-known text) area on a map. Uses ggmap to pull down a Google map so that the visualization has some geographic and natural earth context. We'll soon introduce an interactive version of this function that will bring up a small Shiny app to draw a WKT area, then return those coordinates to your R session. (#34)