Interacts with a suite of web 'APIs' for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more.
taxize allows users to search over many taxonomic data sources for species names (scientific and common) and download up and downstream taxonomic hierarchical information - among other things.
The taxize tutorial can be found at https://ropensci.org/tutorials/taxize.html
The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format of
service_whatitdoes. For example,
gnr_resolve uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g.,
You need API keys for Encyclopedia of Life (EOL), Tropicos, IUCN, and NatureServe.
Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. These include: Pan-European Species directories Infrastructure and Mycobank. Data sources that use SOAP web services have been moved to
taxizesoap at https://github.com/ropensci/taxizesoap.
|Source||Function prefix||API Docs||API key|
|Encyclopedia of Life||
|Taxonomic Name Resolution Service||
|Integrated Taxonomic Information Service||
|Global Names Resolver||
|Global Names Index||
|IUCN Red List||
|Theplantlist dot org||
|Catalogue of Life||
|National Center for Biotechnology Information||
|CANADENSYS Vascan name search API||
|International Plant Names Index (IPNI)||
|Barcode of Life Data Systems (BOLD)||
|National Biodiversity Network (UK)||
|Index of Names (ION)||
|Open Tree of Life (TOL)||
|World Register of Marine Species (WoRMS)||
|Kew's Plants of the World||
**: There are none! We suggest using
TPLck functions in the taxonstand package. We provide two functions to get bulk data:
***: There are none! The function scrapes the web directly.
See the newdatasource tag in the issue tracker
For more examples see the tutorial
Windows users should install Rtools first.
taxize revolves around taxonomic identifiers. Because names can be a mess (misspellings, synonyms, etc.), it's better to get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.
out <- classification(uids)
lapply(out, head)
#> $`315576`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
#>
#> $`492549`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.
children("Salmo", db = 'ncbi')
#> $Salmo
#>    childtaxa_id                   childtaxa_name childtaxa_rank
#> 1       2304090                  Salmo abanticus        species
#> 2       2126688              Salmo ciscaucasicus        species
#> 3       1509524  Salmo marmoratus x Salmo trutta        species
#> 4       1484545 Salmo cf. cenerinus BOLD:AAB3872        species
#> 5       1483130               Salmo zrmanjaensis        species
#> 6       1483129               Salmo visovacensis        species
#> 7       1483128                Salmo rhodanensis        species
#> 8       1483127                 Salmo pellegrini        species
#> 9       1483126                     Salmo opimus        species
#> 10      1483125                Salmo macedonicus        species
#> 11      1483124                Salmo lourosensis        species
#> 12      1483123                   Salmo labecula        species
#> 13      1483122                  Salmo farioides        species
#> 14      1483121                      Salmo chilo        species
#> 15      1483120                     Salmo cettii        species
#> 16      1483119                  Salmo cenerinus        species
#> 17      1483118                   Salmo aphelios        species
#> 18      1483117                    Salmo akairos        species
#> 19      1201173               Salmo peristericus        species
#> 20      1035833                   Salmo ischchan        species
#> 21       700588                     Salmo labrax        species
#> 22       602068                    Salmo caspius     subspecies
#> 23       237411              Salmo obtusirostris        species
#> 24       235141              Salmo platycephalus        species
#> 25       234793                    Salmo letnica        species
#> 26        62065                  Salmo ohridanus        species
#> 27        33518                 Salmo marmoratus        species
#> 28        33516                    Salmo fibreni        species
#> 29        33515                     Salmo carpio        species
#> 30         8032                     Salmo trutta        species
#> 31         8030                      Salmo salar        species
#>
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"
Get all species in the genus Apis
downstream(as.tsn(154395), db = 'itis', downto = 'species', verbose = FALSE)
#> $`154395`
#>      tsn parentname parenttsn          taxonname rankid rankname
#> 1 154396       Apis    154395     Apis mellifera    220  species
#> 2 763550       Apis    154395 Apis andreniformis    220  species
#> 3 763551       Apis    154395        Apis cerana    220  species
#> 4 763552       Apis    154395       Apis dorsata    220  species
#> 5 763553       Apis    154395        Apis florea    220  species
#> 6 763554       Apis    154395 Apis koschevnikovi    220  species
#> 7 763555       Apis    154395   Apis nigrocincta    220  species
#>
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"
Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).
upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#> $`Pinus contorta`
#>      tsn parentname parenttsn   taxonname rankid rankname
#> 1  18031   Pinaceae     18030       Abies    180    genus
#> 2  18033   Pinaceae     18030       Picea    180    genus
#> 3  18035   Pinaceae     18030       Pinus    180    genus
#> 4 183396   Pinaceae     18030       Tsuga    180    genus
#> 5 183405   Pinaceae     18030      Cedrus    180    genus
#> 6 183409   Pinaceae     18030       Larix    180    genus
#> 7 183418   Pinaceae     18030 Pseudotsuga    180    genus
#> 8 822529   Pinaceae     18030  Keteleeria    180    genus
#> 9 822530   Pinaceae     18030 Pseudolarix    180    genus
#>
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
synonyms("Acer drummondii", db="itis")
#> $`Acer drummondii`
#>   sub_tsn                    acc_name acc_tsn
#> 1  183671 Acer rubrum var. drummondii  526853
#> 2  183671 Acer rubrum var. drummondii  526853
#> 3  183671 Acer rubrum var. drummondii  526853
#>                      acc_author                        syn_author
#> 1 (Hook. & Arn. ex Nutt.) Sarg. (Hook. & Arn. ex Nutt.) E. Murray
#> 2 (Hook. & Arn. ex Nutt.) Sarg.             Hook. & Arn. ex Nutt.
#> 3 (Hook. & Arn. ex Nutt.) Sarg.     (Hook. & Arn. ex Nutt.) Small
#>                      syn_name syn_tsn
#> 1 Acer rubrum ssp. drummondii   28730
#> 2             Acer drummondii  183671
#> 3          Rufacer drummondii  183672
#>
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"
get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), verbose=FALSE)
#> $itis
#> Salvelinus fontinalis
#>              "162003"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] ""
#> attr(,"class")
#> [1] "tsn"
#>
#> $ncbi
#> Salvelinus fontinalis
#>                "8038"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] ""
#>
#> attr(,"class")
#> [1] "ids"
You can limit to certain rows when getting ids in any
get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua
#> "2704179"
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] ""
#>
#> attr(,"class")
#> [1] "ids"
Furthermore, you can just get back all ids, if that's your jam, with the get_*_() functions (all get_*() functions with an additional _ underscore at the end of the function name)
get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#>               guid      scientificName    rank taxonomicStatus
#> 1 NBNSYS0000027573 Chironomus riparius species        accepted
#> 2 NHMSYS0001718585  Hypnoidus riparius species        accepted
#> 3 NBNSYS0000023345   Paederus riparius species        accepted
#>
#> $nbn$`Pinus contorta`
#>               guid                scientificName    rank taxonomicStatus
#> 1 NBNSYS0000004786                Pinus contorta species        accepted
#> 2 NHMSYS0000494858 Pinus contorta var. murrayana variety        accepted
#> 3 NHMSYS0000494848  Pinus contorta var. contorta variety        accepted
#>
#>
#> attr(,"class")
#> [1] "ids"
sci2comm('Helianthus annuus', db = 'itis')
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower"        "wild sunflower"
#> [4] "annual sunflower"

comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Ursus americanus luteolus"   "Ursus americanus"
#> [3] "Ursus americanus"            "Ursus americanus americanus"
#> [5] "Chiropotes satanas"          "Ursus thibetanus"
#> [7] "Ursus thibetanus"

spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#>             name        rank      id
#> 21 Boreoeutheria below-class 1437010
as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] ""

as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339"   "9696"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> [1] FALSE FALSE FALSE
#> attr(,"pattern_match")
#> [1] FALSE FALSE FALSE
#> attr(,"uri")
#> [1] ""
#> [2] ""
#> [3] ""

out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#>      ids class match multiple_matches pattern_match
#> 1 315567   uid found            FALSE         FALSE
#> 2   3339   uid found            FALSE         FALSE
#> 3   9696   uid found            FALSE         FALSE
#>   uri
#> 1
#> 2
#> 3
See our CONTRIBUTING document.
Collected via GitHub Issues - this list honors all contributions, whether code or not.
afkoeppel - ahhurlbert - albnd - Alectoria - andzandz11 - antagomir - arendsee - ArielGreiner - arw36 - ashenkin - ashiklom - benjaminschwetz - benmarwick - bomeara - bw4sz - cboettig - cdeterman - ChrKoenig - chuckrp - clarson2191 - claudenozeres - cmzambranat - cparsania - daattali - DanielGMead - DarrenObbard - davharris - davidvilanova - diogoprov - dlebauer - dlenz1 - dschlaep - EDiLD - edwbaker - emhart - eregenyi - fdschneider - fgabriel1891 - fischhoff - fmichonneau - fozy81 - gedankenstuecke - GISKid - git-og - glaroc - gpli - gustavobio - hlapp - ibartomeus - Ironholds - jangorecki - jarioksa - jebyrnes - jimmyodonnell - johnbaums - jonmcalder - josephwb - jsgosnell - jwilk - kamapu - karthik - katrinleinweber - KevCaz - kgturner - kmeverson - Koalha - ljvillanueva - maelle - Markus2015 - mcsiple - MikkoVihtakari - millerjef - miriamgrace - MK212 - mpnelsen - MUSEZOOLVERT - nate-d-olson - nmatzke - npch - paternogbc - patperu - pederengelstad - philippi - pmarchand1 - PrincessPi314 - pssguy - raredd - rec3141 - Rekyt - RodgerG - rossmounce - sariya - scelmendorf - sckott - SimonGoring - snsheth - snubian - Squiercg - taddallas - tdjames1 - tmkurobe - toczydlowski - tpaulson1 - tpoisot - vijaybarve - wcornwell - willpearse - wpetry - yhg926 - zachary-foster
Check out our milestones to see what we plan to get done for each version.
taxize in R doing
citation(package = 'taxize')
taxize. The string will look something like
r-curl/3.3 crul/0.7.0 rOpenSci(taxize/0.9.6), including the versions of the
curl R pkg, the
crul package, and the
get_colid functionality: we weren't paginating for the user when there were more than 50 results for a query; we now paginate for the user using async HTTP requests. This means that some requests will take longer than they did before if they have more than 50 results, but you now get all the results for your query (#743)
get_* functions: in some of the
get_* functions we tried for a direct match (e.g.,
"Poa" == "Poa") and if one was found, then we were done and returned that record. However, we didn't deploy the same logic across all
get_* functions. Now all
get_* functions check for a direct match. Of course, if there is a direct match with more than 1 result, you still get the prompt asking you which name you want. (#631) (#734)
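The direct-match check described above can be illustrated with a small base R sketch. This is not the package's internal code: `direct_match()`, the `candidates` data.frame, and its ids are hypothetical and exist only to show the idea of comparing the query against candidate names with `==`.

```r
# Hypothetical helper (not part of taxize): keep only the candidates whose
# name is exactly equal to the query string.
direct_match <- function(query, candidates) {
  candidates[candidates$name == query, , drop = FALSE]
}

# Made-up candidate records, for illustration only
candidates <- data.frame(
  name = c("Poa", "Poa annua", "Poaceae"),
  id = c("101", "102", "103"),
  stringsAsFactors = FALSE
)

hits <- direct_match("Poa", candidates)

# A single exact hit can be returned straight away; several exact hits
# would still require asking the user which record they want.
needs_prompt <- nrow(hits) > 1
```

With the made-up data above there is exactly one exact match, so no prompt would be needed.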
taxize-authentication manual file covering authentication information across the package (#681)
gnr_resolve() docs about age of datasets used in the Global Names Resolver, and how to access age of datasets (#737)
get_eolid() fixes: gains new attribute
uri's given are updated to EOL's new URL format;
datasource parameters were not documented, now are; we no longer use short names for data sources within EOL, but instead use their full names (#702) (#742)
col_search() now returns attributes on the output data.frames with number of results found and returned, and other metadata about the search
todf parameter; now always returns a data.frame and the data.frame has all the columns, whereas the default call returned a limited set of columns in previous versions
get_wormsid(), was failing when there was a direct match found with more than 1 result (#740)
get_* functions: linting of the input to the
rows parameter was failing with a vector of values in some cases (#741)
iucn_summary(); we weren't passing on the API key internally correctly (#735) thanks @PrincessPi314 for the report
iucn_summary_id() is defunct, use
extant_only (logical) to optionally keep extant taxa only (#714) thanks @ArielGreiner for the inquiry
db options: Worms. You can now set
db="worms" to use Worms to get taxa downstream from a target taxon. In addition,
taxize gains new function
worms_downstream(), which is used under the hood in
downstream(..., db="worms") (#713) (#715)
db options for tol, itis, ncbi, worms, gbif, col, and bold. The function converts taxonomic IDs to names. It's sort of the inverse of the
get_*() family of functions. (#712) (#716)
tax_rank() gains new parameter
rows so that one can pass
synonyms() warning from an internal
cbind() call now fixed (#704) (#705) thanks @vijaybarve
taxize function calls thrown when notifying users about API keys (e.g.,
taxize::use_tropicos()) to make it very clear where the functions live (to avoid confusion with
usethis) (#724) (#725) thanks @maelle
iucn_summary() to output the same structure when no match is found as when a match is found so that when output is passed to
iucn_status() behavior is the same (#708) thanks @Rekyt
tax_name() tests on CRAN (#728)
vcr, making tests much faster and not prone to errors from remote services being down (#729)
eol_dataobjects() gains new parameter
text parameters, and gains
texts_page. Please do let us know if you find any problems with any EOL functions (#717) (#718)
get_*() functions changed parameter
messages to not conflict with
verbose passed down to
ncbi_ping() reworked to allow use of your api key as a parameter or pulled from your environment;
eol_ping() using https instead of http, and parsing JSON instead of XML.
get_eolid() was erroring when no results found for a query due to not assigning an internal variable (#701) (#709) thanks for the fix @taddallas
get_tolid() was erroring when values were
NULL - now replacing all
data.table::rbindlist() happy (#710) (#711) thanks @gpli for the fix
rank_ref data.frame of taxonomic ranks: species subgroup, forma, varietas, clade, megacohort, supercohort, cohort, subcohort, infracohort. When there's no matched rank, errors can result in many of the downstream functions. The data.frame now has 43 rows. (#720) (#727)
ncbi_get_taxon_summary(): change in
ncbi_get_taxon_summary to break up queries into smaller chunks to avoid HTTP 414 errors ("URI too long") (#727) (#730) thanks for reporting @fischhoff and @benjaminschwetz
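Breaking a long vector of ids into smaller chunks, as described above, can be sketched in base R as follows. This is an illustration only: `chunk_ids()` and the chunk size are assumptions, not the code used inside the package.

```r
# Hypothetical helper (not part of taxize): split a vector of ids into
# consecutive chunks of at most `size` elements, so that each chunk can be
# sent in its own request and the request URL stays short.
chunk_ids <- function(ids, size = 100) {
  stopifnot(size >= 1)
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

ids <- c(315567, 3339, 9696, 8030, 8032)
chunks <- chunk_ids(ids, size = 2)
```

Each element of `chunks` could then be passed to a separate call and the results combined afterwards; the original order of the ids is preserved across chunks.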
use_iucn() (which uses internally
use_tropicos() (#682) (#691) (#693) By @maelle
gbif_downstream(): some of the results don't have a
canonicalName, so now safely try to get that field (#673)
as.uid(), was erroring when passing in a taxon ID (#674) (#675) by @zachary-foster
get_boldid() (and by extension
classification(..., db = "bold")): was failing when no parent taxon found, just fill in with NA now (#680)
synonyms(): was failing for some TSNs for
rows arg wasn't being passed on internally (#686)
gnr_datasources(): problems were caused by http scheme, switched to use https instead of http (#687)
class2tree(): organisms with unique rank lower than non-unique ranks will give extra wrong rows (#689) (#690) thanks @gpli
ncbi_get_taxon_summary(): changes in the NCBI API most likely led to HTTP 414 (URI Too Long) errors. We now loop internally for the user. By extension this helps problems upstream in
class2tree(): was erroring when name strings contained pound signs (e.g.,
#) (#699) (#700) thanks @gpli
Sys.sleep for NCBI requests if the user has an API key (#667)
messages across the package so that suppressing calls to
message() do not conflict with curl options passed in
httr for HTTP requests
get_tolid(): it was missing assignment of the
att attribute internally, causing failures in some cases (#663) (#672)
children() when requesting NCBI data) to not fail when there is an empty result from the internal call to
classification() (#664) thanks @arendsee
class2tree() gets a major overhaul thanks to @gedankenstuecke and @trvinh (!!). The function now takes unnamed ranks into account when clustering, which fixes a problem where trees were unresolved for many splits as the named taxonomy levels were shared between them. Now it makes full use of the NCBI Taxonomy string, including the unnamed ranks, leading to higher resolution trees that have fewer multifurcations (#611) (#634)
?taxize-authentication for help. Importantly, note that API key names (both R options and environment variables) have changed. They are now the same for R options and env vars: TROPICOS_KEY, EOL_KEY, PLANTMINER_KEY, ENTREZ_KEY. You no longer need an API key for Plantminer. (#640) (#646)
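As a quick sanity check, the environment variable names listed above can be inspected from an R session. The helper below is a hypothetical sketch in base R, not a taxize function; it only reports which of the named variables are unset or empty.

```r
# Hypothetical helper (not part of taxize): return the names of the
# documented API key environment variables that are not set, or are empty,
# in the current R session.
missing_keys <- function(keys = c("TROPICOS_KEY", "EOL_KEY", "ENTREZ_KEY")) {
  vals <- Sys.getenv(keys, unset = NA_character_)
  keys[is.na(vals) | !nzchar(vals)]
}

# Example: set one key for this session only (placeholder value, not a real key)
Sys.setenv(ENTREZ_KEY = "my-placeholder-key")
still_missing <- missing_keys()
```

For a persistent setup the variables would normally go in a startup file such as `.Renviron` rather than being set with `Sys.setenv()` in each session.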
downstream() we now pass on
gbif_downstream(); we weren't doing that before; the two parameters control pagination (#638)
genbank2uid() now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail (#642) thanks @zachary-foster
children() outputs made more consistent for certain cases when no results found for searches (#648) (#649) thanks @arendsee
... (additional parameters) down to
ncbi_children() used internally. Allows e.g., use of
ncbi_children() allows you to remove ambiguously named nodes (#653) (#654) thanks @arendsee
crul in EOL and Tropicos functions - note that this won't affect you unless you're passing curl options. See package
crul for help on curl options. Along with this change, the parameter
verbose has changed to
messages (for toggling printing of information messages)
CONTRIBUTING.md file for how to contribute to the test suite (#635)
genbank2uid now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail.
downstream(): passing numeric taxon ids to the function while using
db="ncbi" wasn't working (#641) thanks @arendsee
children(): passing numeric taxon ids to the function while using
db="worms" wasn't working (#650) (#651) thanks @arendsee
synonyms_df() - that attempts to combine many outputs from the
synonyms() function - now removes NA/NULL/empty outputs before attempting the combination (#636)
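The idea of dropping NA, NULL, and empty outputs before combining can be sketched in base R. This is an assumption-laden stand-in, not the package's implementation: `combine_outputs()` and its `query` column are hypothetical, and `do.call(rbind, ...)` is used here only to keep the sketch free of dependencies.

```r
# Hypothetical helper (not part of taxize): combine a named list of
# per-taxon outputs into one data.frame, skipping anything that is not a
# data.frame with at least one row (e.g. NA, NULL, or empty results).
combine_outputs <- function(x) {
  keep <- vapply(x, function(z) is.data.frame(z) && nrow(z) > 0, logical(1))
  x <- x[keep]
  if (length(x) == 0) return(data.frame())
  # Tag each piece with the name of the list element it came from
  pieces <- Map(function(df, nm) {
    data.frame(query = nm, df, stringsAsFactors = FALSE)
  }, x, names(x))
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

# Made-up inputs, for illustration only
res <- list(
  a = data.frame(syn_name = "A1", stringsAsFactors = FALSE),
  b = NA,
  c = NULL,
  d = data.frame(syn_name = c("D1", "D2"), stringsAsFactors = FALSE)
)
combined <- combine_outputs(res)
```

This simple version assumes all retained data.frames share the same columns, which fits the single-data-source use case.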
gnr_resolve(): before if
preferred_data_sources was used, you would get the preferred data but only a few columns of the response. We now return all fields; however, we only return the preferred data part when that parameter is used (#656)
children(). It was returning unexpected results for ambiguous taxonomic names (e.g., there are some insects that are returned when searching within Bacteria). It was also failing when one tried to get the children of a root taxon (e.g., the children of the NCBI id 131567). (#639) (#647) fixed via PR (#659) thanks @arendsee and @zachary-foster
rows parameter value. Those all changed to
rows parameter value given
get_*() functions to behave the same when
ask = FALSE, rows = 1 and
ask = TRUE, rows = 1 as these should result in the same outcome. (#627) thanks @zachary-foster !
NA with no indication that there were multiple matches.
comm2sci() to S3 setup with methods for
iucn_status() now has S3 setup with a single method that only handles output from the
key parameter to fxn
sci2comm(): to indicate how to get non-simplified output (which includes what language the common name is from) vs. getting simplified output (#623) thanks @glaroc !
sci2comm() to not be case sensitive when looking for matches (#625) thanks @glaroc !
eol_search() to describe returned
bold_bing() to use new base URL for their API
downstream() via fix to
rank_ref dataset to include "infraspecies" and make "unspecified" and "no rank" equivalent. Fix to
col_downstream() to properly remove ranks lower than allowed. (#620) thanks @cdeterman !
iucn_summary: changed to using
sciname param changed to
iucn_summary_id() now is deprecated in favor of
iucn_summary() now has an S3 setup, with methods for
rank_ref dataset as that rank sometimes used at NCBI (from bug reported in
tryCatch() to internals to catch failed requests for specific pageid's (#624) thanks @glaroc !
ape::neworder_phylo object, which is not used anymore in
ncbi_downstream() and now NCBI is an option in the function
downstream() (#583) thanks for the push @andzandz11
wikitaxa, with contributions from @ezwelty (#317)
scrapenames() gains a parameter
return_content, a boolean, to optionally return the OCR content as a text string with the results. (#614) thanks @fgabriel1891
get_iucn() - to get IUCN Red List ids for taxa. In addition, new S3 methods
sci2comm.iucn - no other methods could be made to work with IUCN Red List ids as they do not share their taxonomic classification data (#578) thanks @diogoprov
bold now an option in
genbank2uid() can give back more than 1 taxon matched to a given Genbank accession number. Now the function can return more than one match for each query, e.g., try
genbank2uid(id = "AM420293") (#602) thanks @sariya
cbind() usage to include
... for method consistency (#612)
tax_rank() used to be able to do only ncbi and itis. Can now do a lot more data sources: ncbi, itis, eol, col, tropicos, gbif, nbn, worms, natserv, bold (#587)
classification() docs in a section
Lots of results a note about how to deal with results when there are A LOT of them. (#596) thanks @ahhurlbert for raising the issue
tnrs() now returns the resulting data.frame in the order of the names passed in by the user (#613) thanks @wpetry
gnr_resolve() to now strip out taxonomic names submitted by user that are NA, or zero length strings, or are not of class character (#606)
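The kind of input cleaning described above (dropping names that are NA, zero length strings, or not of class character) can be sketched in base R. `clean_names()` is a hypothetical helper written for illustration; it is not the code used inside the package.

```r
# Hypothetical helper (not part of taxize): keep only inputs that are single,
# non-missing, non-empty character strings.
clean_names <- function(x) {
  x <- as.list(x)
  ok <- vapply(
    x,
    function(z) is.character(z) && length(z) == 1 && !is.na(z) && nzchar(z),
    logical(1)
  )
  unlist(x[ok])
}

# Mixed input: a logical NA, an empty string, and a number are dropped
cleaned <- clean_names(list("Poa annua", NA, "", 5, "Quercus robur"))
```

Working on a list rather than an atomic vector matters here, since combining mixed types with `c()` would silently coerce numbers to character before they could be filtered out.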
gnr_resolve() (#610) thanks @kamapu
tnrs() docs that the service doesn't provide any information about homonyms. (#610) thanks @kamapu
rank_ref dataset - used by NCBI - if tax returned with that rank, some functions in
taxize were failing due to that rank missing in our reference dataset
get_colid() via problem in parsing within
gbif_downstream (and thus fix in
downstream()): there were two rows with form in our
rank_ref reference dataset of rank names, causing > 1 result in some cases, then causing
vapply to fail as it's expecting length 1 result (#599) thanks @andzandz11
genbank2uid(): was failing when getting more than 1 result back, works now (#603) and fails better now, giving back warnings/error messages that are more informative (see also #602) thanks @sariya
synonyms.tsn(): in some cases a TSN has > 1 accepted name. We get accepted names first from the TSN, then look for synonyms, and hadn't accounted for > 1 accepted name. Fixed now (#607) thanks @tdjames
sci2comm() - was not dealing internally with passing the
worrms package on CRAN. Adds functions
rankagg() with respect to
vegan package to work with older and new version of
vegan - thanks @jarioksa (#580) (#581)
classification() gains new method for TOL data
lowest_common() gains new method for TOL data
ritis package, an external dependency for ITIS taxonomy data. Note that a large number of ITIS functions were removed, and are now available via the package
ritis. However, there are still many high level functions for working with ITIS data (see functions prefixed with
classification.tsn(), and similar high level functions remain unchanged. (#525)
eubon() fxn is now
eubon_search(), although either still works - though
eubon() will be made defunct in the next version of this package. Additional new functions were added:
lowest_common() function gains two new data source options: COL (Catalogue of Life) and TOL (Tree of Life) (#505)
synonyms_df() as a slim wrapper around
data.table::rbindlist() to make it easy to combine many outputs from
synonyms() for a single data source - there is a lot of heterogeneity among data sources in how they report synonyms data, so we don't attempt to combine data across sources (#533)
tax_name() in which when an invalid taxon was searched for then
classification() returned no data and caused an error. Fixed now. (#560) thanks @ljvillanueva for reporting it!
gnr_resolve() in which order of input names to the function was not retained. Fixed now. (#561) thanks @bomeara for reporting it!
gbif_parse() - data format changed coming back from GBIF - needed to replace
NA (#568) thanks @ChrKoenig for reporting it!
get_*() functions now have new attributes to further help the user:
multiple_matches (logical) indicating whether there were multiple matches or not, and
pattern_match (logical) indicating whether a pattern match was made, or not. (#550) from (#547) discussion, thanks @ahhurlbert ! see also (#551)
gnr_resolve() now retains user supplied taxa that had no matches - this could affect your code, make sure to check your existing code (#558)
gnr_resolve() - stop sorting output data.frame, so order of rows in output data.frame now same as user input vector/list (#559)
sub_rows() inside of most
get_*() functions to not fail when the data.frame rows were less than that requested by the user in
get_gbifid(), as sometimes calls failed because we now return numeric IDs but used to return character IDs (#555)
get_() functions to call the internal
sub_rows() function later in the function flow so as not to interfere with taxonomic based filtering (e.g., user filtering by a taxonomic rank) (#555)
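The row-subsetting behavior described above, where asking for more rows than exist should not fail, can be sketched in base R. This is only an illustration of the idea: `take_rows()` is a hypothetical name and the body is an assumption, not the internal `sub_rows()` implementation.

```r
# Hypothetical helper (not the internal sub_rows()): subset a data.frame of
# candidate matches to the requested rows, ignoring requested row numbers
# that exceed the number of rows available.
take_rows <- function(df, rows = NA) {
  if (all(is.na(rows))) return(df)   # no subsetting requested
  rows <- rows[rows <= nrow(df)]     # drop out-of-range requests
  df[rows, , drop = FALSE]
}

# Made-up candidates: only two rows exist although three are requested
cands <- data.frame(name = c("Poa annua", "Poa annua var. supina"),
                    stringsAsFactors = FALSE)
first_three <- take_rows(cands, rows = 1:3)
```

Applying a subset like this after any rank-based filtering, rather than before, keeps the row numbers meaningful relative to what the user would see.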
gnr_resolve(), to not fail on parsing when no data returned when a preferred data source specified (#557)
iucn_summary() (#543) thanks @mcsiple
ncbi_get_taxon_summary() suggesting to break up the ids into chunks (#541) thanks @daattali
itis_acceptname() to accept multiple names (#534) and now gives back same output regardless of whether match found or not (#531)
tax_name() for some queries that return no classification data via internal call to
classification() (#542) thanks @daattali
tax_name() (#530) thanks @ibartomeus
requireNamespace() in examples to make sure user has
gisd_invasive() to point to new location in the originr package. Also, cleaned out code in those functions as not available anymore (#494)
get_gbifid() to use new internal code to provide two ways to search GBIF taxonomy API, either via
/species/search, instead of
/species/suggest, which we used previously. The suggest route was too coarse.
get_gbifid() also gains a parameter
method to toggle whether you search for names using
col_search() to handle when COL can return a value of
misapplied name, which a
switch() statement didn't handle yet (#511) thanks @JoStaerk !
col_search() (#523) thanks @zachary-foster !
bold, which fixes
taxize::bold_search(), so no actual changes in
taxize for this, but take note (#521)
gnr_resolve() where we indexed to data incorrectly. And added tests to account for this problem. Thanks @raredd ! (#519) (#520)
iucn_summary() introduced in last version.
iucn_summary() now uses the package
rredlist, which requires an API key, and I didn't document how to use the key. Function now allows user to pass the key in as a parameter, and documents how to get a key and save it in either
lowest_common() for obtaining the lowest common taxon and rank for a given taxon name or ID. Methods so far for ITIS, NCBI, and GBIF (#505)
iucn_summary_id() - same as
iucn_summary(), except takes IUCN IDs as input instead of taxonomic names (#493)
iucn_summary() fixes, long story short: a number of bug fixes, and uses the new IUCN API via the newish package
rredlist when IDs are given as input, but uses the old IUCN API when taxonomic names given. Also: gains new parameter
distr_details (#174) (#472) (#487) (#488)
xml2 for XML parsing (#499)
httr::content to explicitly state
gnr_resolve() now outputs a column (
user_supplied_name) for the exact input taxon name - facilitates merging data back to original data inputs (#486) thanks @Alectoria
eol_dataobjects() gains new parameter
taxonomy to toggle whether to return any taxonomy details from different data providers (#497)
classification() was giving back rank values in mixed case from different data providers (e.g.,
Class). All rank values are now all lowercase (#504)
get_gbfid to 50 from 20. Gives back more results, so more likely to get the thing searched for (#513)
gni_search() to make all output columns
tpl_get() all gain a new parameter
... to pass on curl options to
get_eolid(): URI returned now always has the pageid, and goes to the right place; API key if passed in now actually used, woopsy (#484)
get_uid(): when a taxon not found, the "match" attribute was saying found sometimes anyway - that is now fixed; additionally, fixed docs to correctly state that we give back
'NA due to ask=FALSE' when
ask = FALSE (#489) Additionally, made this doc fix in other
get_tpsid(): Tropicos doesn't allow periods (
.) in query strings, so those are URL encoded now; Tropicos doesn't like sub-specific rank names in name query strings, so we warn when those are found, but don't alter user inputs; and improved docs to be more clear about how the function fails (#491) thanks @scelmendorf !
classification(db = "itis") to fail better when no taxa found (#495) thanks @ashenkin !
eol_pages() fixes: the EOL API route for this method gained a new parameter
taxonomy, this function gains that parameter. That change caused this fxn to fail. Now fixed. Also, parameter
col_search() due to when
misapplied name come back as a data slot. There was previously no parser for that type. Now there is, and it works (#512)
R >= 3.2.1. Good idea to update your R installation anyway (#476)
ion() for obtaining data from Index of Organism Names (#345)
eubon() for obtaining data from EU (European Union) BON taxonomy (#466) Note that you may only get partial results for some requests as paging isn't implemented yet in the EU BON API (#481)
fg_*() for obtaining data from Index Fungorum. More work has to be done yet on this data source, but these initial functions allow some Index Fungorum data access (#471)
gbif_downstream() for obtaining downstream names from GBIF's backbone taxonomy. Also available in
downstream(), where you can request downstream names from GBIF, along with other data sources (#414)
db parameters to warn users that if they provide the wrong
db value for the given taxon ID, they can get data back, but it would be wrong. That is, all taxonomic data sources available in
taxize use their own unique IDs, so a single ID value can be in multiple data sources, even though the ID refers to different taxa in each data source. There is no way we can think of to prevent this from happening, so be cautious. (#465)
gnr_resolve() to capitalize, by default, the first name of a name string passed to the function. GNR is case sensitive, so case matters (#469)
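Capitalizing the first letter of a name string, as described above, can be sketched in base R. `cap_first()` is a hypothetical helper for illustration; how the package does this internally is not shown here.

```r
# Hypothetical helper (not part of taxize): upper-case the first character of
# each name string and leave the rest of the string untouched.
cap_first <- function(x) {
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

fixed <- cap_first(c("poa annua", "Quercus robur"))
```

Only the first character is changed, so the specific epithet stays lower case and names that are already capitalized pass through unchanged.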
phylomatic_format()are defunct. They were deprecated in recent versions, but are now gone. See the new package
brranchingfor Phylomatic data (#479)
gnr_resolve()has been renamed to
canonicalto better match what it actually does (#451)
gnr_resolve()now returns a single data.frame in output, or
NULLwhen no data found. The input taxa that have no match at all are returned in an attribute with name
...to pass in curl options to the request.
...to pass in curl options to the request. In addition, better http error handling, and added a test suite for this function. (#458)
stringsAsFactors=FALSEnow used for
get_uid()to make more clear how to use the varoious parameters to get the desired result, and how to avoid certain pitfalls (#436)
asdffrom the function
eol_dataobjects()- now returning data.frame's only.
tryCatch()to fail better when names not found.
opensslas a package dependency. Not needed anymore because uBio dropped.
gnr_resolve()failed when no canonical form was found.
gnr_resolve()when no results found when
itisdf()to give back an empty data.frame when no results found, often with subspecific taxa. Helps solve errors reported in use of
gnr_resolve() gains new parameter with_canonical_ranks (logical) to choose whether infraspecific ranks are returned or not.
iucn_id() to get the IUCN ID for a taxon from its name. (#431)
ubio_ping(). In addition, ubio has been removed as an option in the synonyms() function, and references for uBio have been removed from the taxize_cite() utility function. (#449)
rankagg() doesn't depend on data.table anymore (fixes issue with CRAN checks)
openssl::base64_decode(), needed for
importFrom) used across all imports now (#446). In addition,
importFrom for all non-base R pkgs, including
GET(), but can pass
gni_*() functions, including code tidying, some DRYing out, and ability to pass in curl options (#444)
classification() where numeric IDs as input got converted to itis ids just because they were numeric. Fixed now. (#434)
synonyms function to get name synonyms. (#430)
response to get a terse or full response, and ... to pass in curl options.
... to pass in curl options, and parameter asdf (for "as data.frame").
... to pass in curl options.
children() function gains the rows parameter passed on to get_*() functions, supported for data sources ITIS and Catalogue of Life, but not for NCBI.
upstream() function gains the rows parameter passed on to get_*() functions, supported for both data sources ITIS and Catalogue of Life.
classification() function gains the rows parameter passed on to get_*() functions, for all sources used in the function.
downstream() function gains the rows parameter passed on to get_*() functions, for all sources used in the function.
get_*()) gain new parameters to help filter results (e.g., rank, etc.). These parameters allow direct matching or regex filters (e.g., .a to match any character followed by an a). (#410) (#385)
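As a rough illustration of the difference between direct matching and a regex filter, here is a sketch in base R. The data frame, its column names, and its rows are made up for the example and are not taxize output; the filtering inside get_*() may be implemented differently.

```r
# Hypothetical candidate table; not real taxize output
candidates <- data.frame(
  name = c("Quercus alba", "Quercus rubra", "Quercus"),
  rank = c("species", "species", "genus"),
  stringsAsFactors = FALSE
)

# Direct match: keep rows whose rank is exactly "genus"
direct <- candidates[candidates$rank == "genus", , drop = FALSE]

# Regex filter: ".a" matches any character followed by an "a"
regex <- candidates[grepl(".a", candidates$name), , drop = FALSE]
```

The regex filter keeps the two species names here, since "Quercus" on its own contains no "a".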
get_*()) now give back more information (mostly higher taxonomic data) to help in the interactive decision process. (#327)
synonyms() function: Catalogue of Life. (#430)
vegan package, used in class2tree() function, moved from Imports to Suggests. (#392)
taxize_cite() a lot - get URLs and sometimes citation information for data sources available in taxize. (#270)
FALSE by default. (#425)
tnrs is often quite slow.
gisd_isinvasive(). These functions are available in the
phylomatic_tree() is deprecated, and will be defunct in an upcoming version.
itis_ping() pings ITIS and returns a logical, indicating if the ITIS API is working or not. You can also do a very basic test to see whether content returned matches what's expected. (#394)
status_codes() to get vector of HTTP status codes. (#394)
itis_ping(), and all
genbank2uid() to get an NCBI taxonomic id (i.e., a uid) from either a GenBank accession number or GI number. (#375)
get_nbnid() to get a UK National Biodiversity Network taxonomic id (i.e., a nbnid). (#332)
nbn_classification() to get a taxonomic classification for a UK National Biodiversity Network taxonomic id. Using this new function, generic method classification() gains method for
nbn_synonyms() to get taxonomic synonyms for a UK National Biodiversity Network taxonomic id. Using this new function, generic method synonyms() gains method for
nbn_search() to search for taxa in the UK National Biodiversity Network. (#332)
ncbi_children() to get direct taxonomic children for an NCBI taxonomic id. Using this new function, generic method children() gains method for ncbi. (#348) (#351) (#354)
upstream() to get taxa upstream of a taxon. E.g., getting families upstream from a genus gets all families within the one level higher up taxonomic class than family. (#343)
as.*() to coerce numeric/alphanumeric codes to taxonomic identifiers for various databases. There are methods on this function for each of itis, ncbi, tropicos, gbif, nbn, bold, col, eol, and ubio. By default as.*() functions make a quick check that the identifier is a real one by making a GET request against the identifier URI - this can be toggled off by setting check=FALSE. There are methods for returning itself, character, numeric, list, and data.frame. In addition, if the as.*.data.frame() function is used, a generic method exists to coerce the data.frame back to an identifier object. (#362)
get_tsn_() (the underscore is the only difference from the previous function name). These functions don't do the normal interactive process of prompts that e.g., get_tsn() does, but instead return a list of all ids, or a subset via the
ncbi_get_taxon_summary() to get taxonomic name and rank for 1 or more NCBI uids. (#348)
assertthat removed from package imports, replaced with stopifnot(), to reduce dependency load. (#387)
eol_hierarchy() now defunct (no longer available) (#228) (#381)
tp_classification() now defunct (no longer available) (#228) (#381)
col_classification() now defunct (no longer available) (#228) (#381)
get_*() functions gain a new parameter rows to allow selection of particular rows. For example, rows=1 to select the first row, or rows=1:3 to select rows 1 through 3. (#347)
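The effect of rows can be pictured as plain row subsetting on the table of candidate matches. The sketch below uses base R only; the matches table and the pick_rows() helper are hypothetical stand-ins for illustration and are not part of taxize.

```r
# Hypothetical table of candidate matches; ids and names are placeholders
matches <- data.frame(
  id = c("101", "102", "103", "104"),
  name = c("taxon A", "taxon B", "taxon C", "taxon D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return all rows when rows is NA, else the requested rows
pick_rows <- function(df, rows = NA) {
  if (all(is.na(rows))) {
    return(df)
  }
  df[rows, , drop = FALSE]
}

first <- pick_rows(matches, rows = 1)          # like rows=1
first_three <- pick_rows(matches, rows = 1:3)  # like rows=1:3
```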
classification() now by default returns taxonomic identifiers for each of the names. This can be toggled off by setting return_id=FALSE. (#359) (#360)
db parameter, which helps give a better error message when a db value is not possible or spelled incorrectly. (#379)
children(), which is a single interface to various data sources to get immediate children from a given taxonomic name. (#304)
bold_search() that searches for taxa in the BOLD database of barcode data; get_boldid() to search for a BOLD taxon identifier. (#301)
get_ubioid() to get a uBio taxon identifier. (#318)
get_ids() gains new option to search for a uBio ID, in addition to the others, itis, ncbi, eol, col, tropicos, and gbif.
iplant_resolve() now outputs data.frame structure instead of a list. (#306)
synonyms() gains new data source, can now get synonyms from uBio data source (#319)
vascan_search() giving back more useful results now.
tnrs() function, including more meaningful error messages on failures (#323) (#331)
getpublicationsfromtsn() that caused function to fail on data.frames with no data on name assignment (#297)
sci2comm() that caused function to fail when using
scrapenames(). Sending a text blob via the text parameter now works.
resolve() so that function now works for all 3 data sources. (#337)
iplant_resolve() to do name resolution using the iPlant name resolution service. Note, this is different from http://taxosaurus.org/ that is wrapped in the
ipni_search() to search for names in the International Plant Names Index (IPNI).
resolve() that unifies name resolution services from iPlant's name resolution service (via iplant_resolve()), Taxosaurus' TNRS (via tnrs()), and GNR's name resolution service (via
get_*() functions now returning a new uri attribute that is a link to the taxon on the web. If NA is given back (e.g. nothing found), the uri attribute is blank. You can go directly to the uri in your default browser by doing, for example:
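A minimal sketch of that pattern in base R, assuming the uri is stored as an ordinary attribute on the returned object. The object below is built by hand so the example runs offline; its id and URL are placeholders, not real taxize output.

```r
# Hand-built stand-in for the result of a get_*() call (placeholder id and URL)
x <- structure("12345", uri = "https://example.org/taxon/12345")

# Read the uri attribute from the object
taxon_uri <- attr(x, "uri")

# Open it in the default browser, only in an interactive session
# and only when a non-blank uri is present
if (interactive() && !is.na(taxon_uri) && nzchar(taxon_uri)) {
  utils::browseURL(taxon_uri)
}
```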
get_eolid() now returns an attribute provider because EOL collates taxonomic data from a lot of sources, then gives back IDs that are internal EOL ids, not those matching the id of the source they pull from. This should help with provenance, and should help if there is confusion about why the id given back by this function does not match that from the original source.
get_tsn() function, now using the function itis_terms(), which gives back the accepted status of the taxa. This allows a new parameter in the function (accepted, logical) that lets the user ask for only accepted status names (accepted=TRUE), or for all names (
gnr_resolve() gains two new parameters best_match_only (logical, to return best match only) and preferred_data_sources (to return preferred data sources) and callopts to pass in curl options.
tp_synonyms() gain new parameter callopts to pass in curl options.
class2tree()can now handle NA in classification objects.
classification.colid() now return the submitted name along with the classification.
plyr functions, see #275.
verbose parameter to many more functions to allow suppression of help messages.
httr, now manually parsing JSON to a list then to another data format instead of allowing internal httr parsing - in addition added checks on content type and encoding in many functions.
db parameter so that a) unique short abbreviations of possible values are possible, and b) gives a meaningful warning if unsupported values are given.
getgeographicdivisionsfromtsn) gain parameter curlopts to pass in curl options.
data.frame creations to eliminate factor variables.
classification.gbifid() did not return the correct result when taxon not found.
classification()used to fail when it was passed a subset of a vector of ids, in which case the class information was stripped off. Now works (#284)