Taxonomic Information from Around the Web

Interacts with a suite of web 'APIs' for taxonomic tasks, such as getting database specific taxonomic identifiers, verifying species names, getting taxonomic hierarchies, fetching downstream and upstream taxonomic names, getting taxonomic synonyms, converting scientific to common names and vice versa, and more.

taxize allows users to search over many taxonomic data sources for species names (scientific and common) and download upstream and downstream taxonomic hierarchical information, among other things.

The taxize tutorial can be found at

The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format of service_whatitdoes. For example, gnr_resolve uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., classification.
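The naming convention can be illustrated with a small offline sketch. The helper `split_fn_name()` is hypothetical (it is not part of taxize); it only splits a function name on its first underscore, so names like `get_uid` would also be split even though `get` is not a data source.

```r
# Hypothetical helper (not part of taxize): split a function name on the
# first underscore into a service prefix and "what it does".
split_fn_name <- function(fn) {
  has_prefix <- grepl("_", fn, fixed = TRUE)
  data.frame(
    fn = fn,
    # text before the first underscore, or NA when there is no underscore
    service = ifelse(has_prefix, sub("_.*$", "", fn), NA_character_),
    # text after the first underscore, or the whole name when there is none
    action = ifelse(has_prefix, sub("^[^_]*_", "", fn), fn),
    stringsAsFactors = FALSE
  )
}

parts <- split_fn_name(c("gnr_resolve", "classification"))
print(parts)
```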

You need API keys for Encyclopedia of Life (EOL), and Tropicos.
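One way to keep keys out of scripts is to read them from an environment variable and pass them along explicitly. This is a minimal sketch under assumptions: `get_api_key()` and the variable name `MY_TROPICOS_KEY` are hypothetical, and the commented call assumes the Tropicos function accepts a `key` argument.

```r
# Hypothetical helper: look up an API key in an environment variable and
# fail early with a readable message if it is missing.
get_api_key <- function(var) {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable ", var, call. = FALSE)
  }
  key
}

# Placeholder value for demonstration only; a real key would normally be
# set outside the script, for example in .Renviron.
Sys.setenv(MY_TROPICOS_KEY = "not-a-real-key")
tropicos_key <- get_api_key("MY_TROPICOS_KEY")

# The key could then be handed to a Tropicos function (not run, assumed usage):
# tp_search(name = "Poa annua", key = tropicos_key)
```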

Note that a few data sources require SOAP web services, which are difficult to support in R across all operating systems. So far these include World Register of Marine Species, Pan-European Species directories Infrastructure, and Mycobank. Data sources that use SOAP web services have been moved to a new package called taxizesoap. Find it at

Source                                         Function prefix  API Docs  API key
Encyclopedia of Life                           eol              link      link
Taxonomic Name Resolution Service              tnrs             ""        none
Integrated Taxonomic Information Service       itis             link      none
Global Names Resolver                          gnr              link      none
Global Names Index                             gni              link      none
IUCN Red List                                  iucn             link      none
Tropicos                                       tp               link      link
Theplantlist dot org                           tpl              **        none
Catalogue of Life                              col              link      none
National Center for Biotechnology Information  ncbi             none      none
CANADENSYS Vascan name search API              vascan           link      none
International Plant Names Index (IPNI)         ipni             link      none
Barcode of Life Data Systems (BOLD)            bold             link      none
National Biodiversity Network (UK)             nbn              link      none
Index Fungorum                                 fg               link      none
EU BON                                         eubon            link      none
Index of Names (ION)                           ion              link      none
Open Tree of Life (TOL)                        tol              link      none

**: There are none! We suggest using TPL and TPLck functions in the taxonstand package. We provide two functions to get bulk data: tpl_families and tpl_get.

***: There are none! The function scrapes the web directly.

See the newdatasource tag in the issue tracker

For more examples see the tutorial


Windows users install Rtools first.


A lot of taxize revolves around taxonomic identifiers. Because names can be a mess (misspellings, synonyms, etc.), it's better to first get an identifier that a particular data source knows about; then we can move on to acquiring more fun taxonomic data.

uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))

Classifications - think of a species, then all the taxonomic ranks up from that species, like genus, family, order, class, kingdom.

out <- classification(uids)
lapply(out, head)
#> $`315576`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
#> $`492549`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2          Eukaryota superkingdom   2759
#> 3       Opisthokonta      no rank  33154
#> 4            Metazoa      kingdom  33208
#> 5          Eumetazoa      no rank   6072
#> 6          Bilateria      no rank  33213
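Each element of the result is a plain data.frame, so a particular rank can be pulled out with ordinary subsetting. The sketch below works offline on a data.frame typed in by hand from the output above; `name_at_rank()` is a hypothetical helper, not a taxize function.

```r
# A data.frame shaped like one element of the classification() output above,
# typed in by hand so the sketch runs without network access.
cls <- data.frame(
  name = c("cellular organisms", "Eukaryota", "Opisthokonta",
           "Metazoa", "Eumetazoa", "Bilateria"),
  rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  id = c("131567", "2759", "33154", "33208", "6072", "33213"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first name at the requested rank,
# or NA when that rank is absent from the classification.
name_at_rank <- function(x, rank) {
  hit <- x$name[x$rank == rank]
  if (length(hit) == 0) NA_character_ else hit[1]
}

kingdom <- name_at_rank(cls, "kingdom")
print(kingdom)
```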

Get immediate children of Salmo. In this case, Salmo is a genus, so this gives species within the genus.

children("Salmo", db = 'ncbi')
#> $Salmo
#>    childtaxa_id                   childtaxa_name childtaxa_rank
#> 1       1509524  Salmo marmoratus x Salmo trutta        species
#> 2       1484545 Salmo cf. cenerinus BOLD:AAB3872        species
#> 3       1483130               Salmo zrmanjaensis        species
#> 4       1483129               Salmo visovacensis        species
#> 5       1483128                Salmo rhodanensis        species
#> 6       1483127                 Salmo pellegrini        species
#> 7       1483126                     Salmo opimus        species
#> 8       1483125                Salmo macedonicus        species
#> 9       1483124                Salmo lourosensis        species
#> 10      1483123                   Salmo labecula        species
#> 11      1483122                  Salmo farioides        species
#> 12      1483121                      Salmo chilo        species
#> 13      1483120                     Salmo cettii        species
#> 14      1483119                  Salmo cenerinus        species
#> 15      1483118                   Salmo aphelios        species
#> 16      1483117                    Salmo akairos        species
#> 17      1201173               Salmo peristericus        species
#> 18      1035833                   Salmo ischchan        species
#> 19       700588                     Salmo labrax        species
#> 20       237411              Salmo obtusirostris        species
#> 21       235141              Salmo platycephalus        species
#> 22       234793                    Salmo letnica        species
#> 23        62065                  Salmo ohridanus        species
#> 24        33518                 Salmo marmoratus        species
#> 25        33516                    Salmo fibreni        species
#> 26        33515                     Salmo carpio        species
#> 27         8032                     Salmo trutta        species
#> 28         8030                      Salmo salar        species
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "ncbi"

Get all species in the genus Apis

downstream("Apis", db = 'itis', downto = 'Species', verbose = FALSE)
#> $Apis
#>      tsn parentname parenttsn          taxonname rankid rankname
#> 1 154396       Apis    154395     Apis mellifera    220  species
#> 2 763550       Apis    154395 Apis andreniformis    220  species
#> 3 763551       Apis    154395        Apis cerana    220  species
#> 4 763552       Apis    154395       Apis dorsata    220  species
#> 5 763553       Apis    154395        Apis florea    220  species
#> 6 763554       Apis    154395 Apis koschevnikovi    220  species
#> 7 763555       Apis    154395   Apis nigrocincta    220  species
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "itis"

Get all genera up from the species Pinus contorta (this includes the genus of the species, and its co-genera within the same family).

upstream("Pinus contorta", db = 'itis', upto = 'Genus', verbose=FALSE)
#> $`Pinus contorta`
#>      tsn parentname parenttsn   taxonname rankid rankname
#> 1  18031   Pinaceae     18030       Abies    180    genus
#> 2  18033   Pinaceae     18030       Picea    180    genus
#> 3  18035   Pinaceae     18030       Pinus    180    genus
#> 4 183396   Pinaceae     18030       Tsuga    180    genus
#> 5 183405   Pinaceae     18030      Cedrus    180    genus
#> 6 183409   Pinaceae     18030       Larix    180    genus
#> 7 183418   Pinaceae     18030 Pseudotsuga    180    genus
#> 8 822529   Pinaceae     18030  Keteleeria    180    genus
#> 9 822530   Pinaceae     18030 Pseudolarix    180    genus
#> attr(,"class")
#> [1] "upstream"
#> attr(,"db")
#> [1] "itis"
synonyms("Acer drummondii", db="itis")
#> $`Acer drummondii`
#>   sub_tsn                    acc_name acc_tsn
#> 1  526853 Acer rubrum var. drummondii  526853
#> 2  526853 Acer rubrum var. drummondii  526853
#> 3  526853 Acer rubrum var. drummondii  526853
#>                          author                            author
#> 1 (Hook. & Arn. ex Nutt.) Sarg. (Hook. & Arn. ex Nutt.) E. Murray
#> 2 (Hook. & Arn. ex Nutt.) Sarg.             Hook. & Arn. ex Nutt.
#> 3 (Hook. & Arn. ex Nutt.) Sarg.     (Hook. & Arn. ex Nutt.) Small
#>                      syn_name syn_tsn
#> 1 Acer rubrum ssp. drummondii   28730
#> 2             Acer drummondii  183671
#> 3          Rufacer drummondii  183672
#> attr(,"class")
#> [1] "synonyms"
#> attr(,"db")
#> [1] "itis"
get_ids(names="Salvelinus fontinalis", db = c('itis', 'ncbi'), verbose=FALSE)
#> $itis
#> Salvelinus fontinalis 
#>              "162003" 
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] ""
#> attr(,"class")
#> [1] "tsn"
#> $ncbi
#> Salvelinus fontinalis 
#>                "8038" 
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] ""
#> attr(,"class")
#> [1] "ids"

You can limit to certain rows when getting ids in any get_*() functions

get_ids(names="Poa annua", db = "gbif", rows=1)
#> $gbif
#> Poa annua 
#> "2704179" 
#> attr(,"class")
#> [1] "gbifid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] TRUE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] ""
#> attr(,"class")
#> [1] "ids"

Furthermore, you can just get back all ids if that's your jam with the get_*_() functions (all get_*() functions with an additional _ underscore at the end of the function name)

get_ids_(c("Chironomus riparius", "Pinus contorta"), db = 'nbn', rows=1:3)
#> $nbn
#> $nbn$`Chironomus riparius`
#>   ptaxonversionkey    searchmatchtitle    rank  namestatus
#> 1 NBNSYS0000027573 Chironomus riparius species Recommended
#> 2 NHMSYS0001718042   Elaphrus riparius species Recommended
#> 3 NBNSYS0000023345   Paederus riparius species Recommended
#> $nbn$`Pinus contorta`
#>   ptaxonversionkey               searchmatchtitle       rank  namestatus
#> 1 NHMSYS0000494848   Pinus contorta var. contorta    variety Recommended
#> 2 NBNSYS0000004786                 Pinus contorta    species Recommended
#> 3 NHMSYS0000494848 Pinus contorta subsp. contorta subspecies Recommended
#> attr(,"class")
#> [1] "ids"
sci2comm('Helianthus annuus', db = 'itis')
#> $`Helianthus annuus`
#> [1] "common sunflower" "sunflower"        "wild sunflower"  
#> [4] "annual sunflower"
comm2sci("black bear", db = "itis")
#> $`black bear`
#> [1] "Chiropotes satanas"          "Ursus thibetanus"           
#> [3] "Ursus thibetanus"            "Ursus americanus luteolus"  
#> [5] "Ursus americanus"            "Ursus americanus"           
#> [7] "Ursus americanus americanus"
spp <- c("Sus scrofa", "Homo sapiens", "Nycticebus coucang")
lowest_common(spp, db = "ncbi")
#>             name        rank      id
#> 21 Boreoeutheria below-class 1437010
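The idea behind a lowest common taxon can be sketched offline: take each taxon's lineage from root to tip, keep the names shared by all lineages, and return the deepest one. This is not the package's implementation; `lowest_shared()` is a hypothetical helper, and the lineages are abbreviated and typed in by hand.

```r
# Hypothetical helper: deepest name shared by every lineage.
# Each lineage is a character vector ordered from root to tip.
lowest_shared <- function(lineages) {
  shared <- Reduce(intersect, lineages)  # keeps the order of the first lineage
  if (length(shared) == 0) NA_character_ else shared[length(shared)]
}

# Abbreviated, hand-typed lineages for the three species used above.
lineages <- list(
  pig   = c("Mammalia", "Boreoeutheria", "Laurasiatheria", "Sus scrofa"),
  human = c("Mammalia", "Boreoeutheria", "Euarchontoglires", "Primates",
            "Homo sapiens"),
  loris = c("Mammalia", "Boreoeutheria", "Euarchontoglires", "Primates",
            "Nycticebus coucang")
)

common <- lowest_shared(lineages)
print(common)
```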

numeric to uid

as.uid(315567)
#> [1] "315567"
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found"
#> attr(,"multiple_matches")
#> [1] FALSE
#> attr(,"pattern_match")
#> [1] FALSE
#> attr(,"uri")
#> [1] ""

list to uid

as.uid(list("315567", "3339", "9696"))
#> [1] "315567" "3339"   "9696"  
#> attr(,"class")
#> [1] "uid"
#> attr(,"match")
#> [1] "found" "found" "found"
#> attr(,"multiple_matches")
#> attr(,"pattern_match")
#> attr(,"uri")
#> [1] ""
#> [2] ""  
#> [3] ""
out <- as.uid(c(315567, 3339, 9696))
(res <- data.frame(out))
#>      ids class match multiple_matches pattern_match
#> 1 315567   uid found            FALSE         FALSE
#> 2   3339   uid found            FALSE         FALSE
#> 3   9696   uid found            FALSE         FALSE
#>                                           uri
#> 1
#> 2
#> 3


Check out our milestones to see what we plan to get done for each version.

  • Please report any issues or bugs.
  • License: MIT
  • Get citation information for taxize in R by running citation(package = 'taxize')
  • Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.


taxize 0.8.0

  • New data source added: Open Tree of Life. New functions for the data source added: get_tolid(), get_tolid_(), and as.tolid() (#517)
  • related to above classification() gains new method for TOL data
  • related to above lowest_common() gains new method for TOL data
  • Now using ritis package, an external dependency for ITIS taxonomy data. Note that a large number of ITIS functions were removed, and are now available via the package ritis. However, there are still many high level functions for working with ITIS data (see functions prefixed with itis_), and get_tsn(), classification.tsn(), and similar high level functions remain unchanged. (#525)
  • EUBON has a new API (v1.2). We now interact with that new API version. In addition, the eubon() function is now eubon_search(); both still work, but eubon() will be made defunct in the next version of this package. Additional new functions were added: eubon_capabilities(), eubon_children(), and eubon_hierarchy() (#567)
  • lowest_common() function gains two new data source options: COL (Catalogue of Life) and TOL (Tree of Life) (#505)
  • Added new function synonyms_df() as a slim wrapper around data.table::rbindlist() to make it easy to combine many outputs from synonyms() for a single data source - there is a lot of heterogeneity among data sources in how they report synonyms data, so we don't attempt to combine data across sources (#533)
  • Change NCBI URLs to https from http (#571)
  • Fixed bug in tax_name() in which when an invalid taxon was searched for then classification() returned no data and caused an error. Fixed now. (#560) thanks @ljvillanueva for reporting it!
  • Fixed bug in gnr_resolve() in which order of input names to the function was not retained. fixed now. (#561) thanks @bomeara for reporting it!
  • Fixed bug in gbif_parse() - data format changed coming back from GBIF - needed to replace NULL with NA (#568) thanks @ChrKoenig for reporting it!
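The combining step described for synonyms_df() can be sketched with base R alone. This is an illustration under assumptions, not the package's code: `combine_by_name()` is hypothetical, it uses `rbind` as a stand-in for data.table::rbindlist(), and the second list element is a made-up empty result.

```r
# Hypothetical helper: stack a named list of data.frames into one data.frame,
# recording the list name in an .id column and skipping empty results.
combine_by_name <- function(x) {
  x <- x[vapply(x, nrow, integer(1)) > 0]
  if (length(x) == 0) return(data.frame())
  out <- do.call(rbind, Map(function(df, nm) {
    cbind(.id = nm, df, stringsAsFactors = FALSE)
  }, x, names(x)))
  rownames(out) <- NULL
  out
}

# Hand-typed input: the first element reuses values from the synonyms()
# output shown earlier; the second is a made-up empty result.
syn <- list(
  `Acer drummondii` = data.frame(
    syn_name = c("Acer rubrum ssp. drummondii", "Acer drummondii",
                 "Rufacer drummondii"),
    syn_tsn = c("28730", "183671", "183672"),
    stringsAsFactors = FALSE
  ),
  `Made-up taxon` = data.frame(
    syn_name = character(0), syn_tsn = character(0),
    stringsAsFactors = FALSE
  )
)

combined <- combine_by_name(syn)
print(combined)
```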

taxize 0.7.9

  • New vignette: "Strategies for programmatic name cleaning" (#549)
  • get_*() functions now have new attributes to further help the user: multiple_matches (logical) indicating whether there were multiple matches or not, and pattern_match (logical) indicating whether a pattern match was made, or not. (#550) from (#547) discussion, thanks @ahhurlbert ! see also (#551)
  • Change all xml2::xml_find_one() to xml2::xml_find_first() for new xml2 version (#546)
  • gnr_resolve() now retains user supplied taxa that had no matches - this could affect your code, make sure to check your existing code (#558)
  • gnr_resolve() - stop sorting output data.frame, so order of rows in output data.frame now same as user input vector/list (#559)
  • Fixed internal fxn sub_rows() inside of most get_*() functions to not fail when the data.frame rows were less than that requested by the user in rows parameter (#556)
  • Fixed get_gbifid(), as sometimes calls failed because we now return numeric IDs but used to return character IDs (#555)
  • Fix to all get_() functions to call the internal sub_rows() function later in the function flow so as not to interfere with taxonomic based filtering (e.g., user filtering by a taxonomic rank) (#555)
  • Fix to gnr_resolve(), to not fail on parsing when no data returned when a preferred data source specified (#557)
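The row-selection fix mentioned above (#556) concerns requests for more rows than a result contains. A rough sketch of that behaviour follows; `select_rows()` is a hypothetical stand-in and is not the internal sub_rows() function, and the `hits` data is made up.

```r
# Hypothetical stand-in for tolerant row selection: indices beyond the
# number of available rows are dropped instead of causing an error or
# producing rows of NA.
select_rows <- function(df, rows = NA) {
  if (all(is.na(rows))) return(df)           # no selection requested
  keep <- rows[rows >= 1 & rows <= nrow(df)]  # ignore out-of-range indices
  df[keep, , drop = FALSE]
}

# Made-up candidate table with only two rows.
hits <- data.frame(id = c("a1", "b2"), rank = c("species", "genus"),
                   stringsAsFactors = FALSE)

first_three <- select_rows(hits, 1:3)
print(first_three)
```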

taxize 0.7.8

  • Fix to iucn_summary() (#543) thanks @mcsiple
  • Added message for when too many Ids passed in to ncbi_get_taxon_summary() suggesting to break up the ids into chunks (#541) thanks @daattali
  • Fix to itis_acceptname() to accept multiple names (#534) and now gives back same output regardless of whether match found or not (#531)
  • Fix to tax_name() for some queries that return no classification data via internal call to classification() (#542) thanks @daattali
  • Another fix for tax_name() (#530) thanks @ibartomeus
  • Fixed docs for rankagg() function, use requireNamespace() in examples to make sure user has vegan installed (#529)
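Breaking a long vector of ids into chunks, as the message added in #541 suggests, takes one line of base R. In this sketch `chunk_ids()` is a hypothetical helper, the ids are made up, and the chunk size of 3 is arbitrary.

```r
# Hypothetical helper: split a vector of ids into consecutive chunks of at
# most `size` elements.
chunk_ids <- function(ids, size) {
  split(ids, ceiling(seq_along(ids) / size))
}

ids <- as.character(1:7)  # made-up ids for illustration
chunks <- chunk_ids(ids, size = 3)
print(lengths(chunks))

# Each chunk could then be sent separately (not run, requires network):
# res <- lapply(chunks, ncbi_get_taxon_summary)
```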

taxize 0.7.6

  • Changed defunct messages in eol_invasive() and gisd_invasive() to point to new location in the originr package. Also, cleaned out code in those functions as not avail. anymore (#494)
  • Access to IUCN taxonomy information is now provided through the newish rredlist package. (Two issues dealing with IUCN problems (#475) (#492))
  • Fix to get_gbifid() to use new internal code to provide two ways to search GBIF taxonomy API, either via /species/match or via /species/search, instead of /species/suggest, which we used previously. The suggest route was too coarse. get_gbifid() also gains a parameter method to toggle whether you search for names using /species/match or /species/search. (#528)
  • Fix for col_search() to handle when COL can return a value of misapplied name, which a switch() statement didn't handle yet (#511) thanks @JoStaerk !
  • Fixes for get_colid() and col_search() (#523) thanks @zachary-foster !

taxize 0.7.5

  • Fixed bug in the package dependency bold, which fixes taxize::bold_search(), so no actual changes in taxize for this, but take note (#521)
  • Fixed problem in gnr_resolve() where we indexed to data incorrectly. And added tests to account for this problem. Thanks @raredd ! (#519) (#520)
  • Fixed bug in iucn_summary() introduced in last version. iucn_summary() now uses the package rredlist, which requires an API key, and I didn't document how to use the key. Function now allows user to pass the key in as a parameter, and documents how to get a key and save it in either .Renviron or in .Rprofile (#522)

taxize 0.7.4

  • New function lowest_common() for obtaining the lowest common taxon and rank for a given taxon name or ID. Methods so far for ITIS, NCBI, and GBIF (#505)
  • New contributor James O'Donnell (@jimmyodonnell) (via #505)
  • Now importing rredlist
  • New function iucn_summary_id() - same as iucn_summary(), except takes IUCN IDs as input instead of taxonomic names (#493)
  • All taxonomic rank columns in data.frame's now given back as lower case. This provides consistency, which is important, and many functions use ranks to determine what to do next, so using a consistent case is good.
  • iucn_summary() fixes, long story short: a number of bug fixes, and uses the new IUCN API via the newish package rredlist when IDs are given as input, but uses the old IUCN API when taxonomic names given. Also: gains new parameter distr_details (#174) (#472) (#487) (#488)
  • Replaced XML with xml2 for XML parsing (#499)
  • Fixes to internal use of httr::content to explicitly state encoding="UTF-8" (#498)
  • gnr_resolve() now outputs a column (user_supplied_name) for the exact input taxon name - facilitates merging data back to original data inputs (#486) thanks @Alectoria
  • eol_dataobjects() gains new parameter taxonomy to toggle whether to return any taxonomy details from different data providers (#497)
  • Catalogue of Life URLs changed - updated all appropriate COL functions to use the new URLs (#501)
  • classification() was giving back rank values in mixed case from different data providers (e.g., class vs. Class). All rank values are now all lowercase (#504)
  • Changed number of results returned from internal GBIF search in get_gbifid() to 50 from 20. Gives back more results, so more likely to get the thing searched for (#513)
  • Fix to gni_search() to make all output columns character class
  • iucn_id(), tpl_families(), and tpl_get() all gain a new parameter ... to pass on curl options to httr::GET()
  • Fixes to get_eolid(): URI returned now always has the pageid, and goes to the right place; API key if passed in now actually used, woopsy (#484)
  • Fixes to get_uid(): when a taxon not found, the "match" attribute was saying found sometimes anyway - that is now fixed; additionally, fixed docs to correctly state that we give back 'NA due to ask=FALSE' when ask = FALSE (#489) Additionally, made this doc fix in other get_*() function docs
  • Fix to apgOrders() function (#490)
  • Fixes to tp_search() which fixes get_tpsid(): Tropicos doesn't allow periods (.) in query strings, so those are URL encoded now; Tropicos doesn't like sub-specific rank names in name query strings, so we warn when those are found, but don't alter user inputs; and improved docs to be more clear about how the function fails (#491) thanks @scelmendorf !
  • Fix to classification(db = "itis") to fail better when no taxa found (#495) thanks @ashenkin !
  • eol_pages() fixes: the EOL API route for this method gained a new parameter taxonomy, this function gains that parameter. That change caused this fxn to fail. Now fixed. Also, parameter subject changed to subjects (#500)
  • Fix to col_search() due to when misapplied name come back as a data slot. There was previously no parser for that type. Now there is, and it works (#512)
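The user_supplied_name column added to gnr_resolve() (#486) is meant for joining results back to the original data. The sketch below shows that join offline on hand-made data.frames; the `matched_name` column and all values are assumptions for illustration, not real output.

```r
# Made-up input data keyed by the names the user typed.
mydata <- data.frame(
  species = c("Helianthos annus", "Homo sapiens"),
  n = c(3L, 5L),
  stringsAsFactors = FALSE
)

# Hand-made stand-in for a gnr_resolve() result. Only user_supplied_name is
# confirmed by the changelog entry; matched_name is an assumed column.
resolved <- data.frame(
  user_supplied_name = c("Helianthos annus", "Homo sapiens"),
  matched_name = c("Helianthus annuus", "Homo sapiens"),
  stringsAsFactors = FALSE
)

# Join on the exact input strings, keeping every row of the original data.
merged <- merge(mydata, resolved,
                by.x = "species", by.y = "user_supplied_name", all.x = TRUE)
print(merged)
```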

taxize 0.7.0

  • Now requires R >= 3.2.1. Good idea to update your R installation anyway (#476)
  • New function ion() for obtaining data from Index of Organism Names (#345)
  • New function eubon() for obtaining data from EU (European Union) BON taxonomy (#466) Note that you may only get partial results for some requests as paging isn't implemented yet in the EU BON API (#481)
  • New suite of functions, with prefix fg_*() for obtaining data from Index Fungorum. More work has to be done yet on this data source, but these initial functions allow some Index Fungorum data access (#471)
  • New function gbif_downstream() for obtaining downstream names from GBIF's backbone taxonomy. Also available in downstream(), where you can request downstream names from GBIF, along with other data sources (#414)
  • Note added in docs for all db parameters to warn users that if they provide the wrong db value for the given taxon ID, they can get data back, but it would be wrong. That is, all taxonomic data sources available in taxize use their own unique IDs, so a single ID value can be in multiple data sources, even though the ID refers to different taxa in each data source. There is no way we can think of to prevent this from happening, so be cautious. (#465)
  • A note added to all IUCN functions to warn users that sometimes incorrect data is returned. This is beyond our control, as sometimes IUCN itself gives back incorrect data, and sometimes EOL/Global Names (which we use in some of the IUCN functions) give back incorrect data. (#468) (#473) (#174) (#472) (#475)
  • Fix to gnr_resolve() to by default capitalize first name of a name string passed to the function. GNR is case sensitive, so case matters (#469)
  • phylomatic_tree() and phylomatic_format() are defunct. They were deprecated in recent versions, but are now gone. See the new package brranching for Phylomatic data (#479)
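Because GNR is case sensitive (#469), capitalizing the first letter of a name string is a simple normalization step. A minimal sketch of that step follows; `cap_first()` is a hypothetical helper and may differ from what gnr_resolve() does internally.

```r
# Hypothetical helper: upper-case the first character of each string and
# leave the remainder untouched.
cap_first <- function(x) {
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

fixed <- cap_first(c("helianthus annuus", "Homo sapiens"))
print(fixed)
```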

taxize 0.6.6

  • stripauthority argument in gnr_resolve() has been renamed to canonical to better match what it actually does (#451)
  • gnr_resolve() now returns a single data.frame in output, or NULL when no data found. The input taxa that have no match at all are returned in an attribute with name not_known (#448)
  • updated some functions to work with R > 3.2.x
  • In vascan_search() changed callopts parameter to ... to pass in curl options to the request.
  • In ipni_search() changed callopts parameter to ... to pass in curl options to the request. In addition, better http error handling, and added a test suite for this function. (#458)
  • stringsAsFactors=FALSE now used for gbif_parse() (
  • Made nearly all column headers and list names lowercase to simplify indexing to elements, as well as combining outputs. (#462)
  • Plantminer API updated to use a new API. Option to search ThePlantList or the Brazilian Flora Checklist (#464)
  • Added more details to the documentation for get_uid() to make more clear how to use the various parameters to get the desired result, and how to avoid certain pitfalls (#436)
  • Removed the parameter asdf from the function eol_dataobjects() - now returning data.frame's only.
  • Added some error catching to get_eolid() via tryCatch() to fail better when names not found.
  • Dropped openssl as a package dependency. Not needed anymore because uBio dropped.
  • gnr_resolve() failed when no canonical form was found.
  • Fixed gnr_resolve() when no results found when best_match_only=TRUE (#432)
  • Fixed bug in internal function itisdf() to give back an empty data.frame when no results found, often with subspecific taxa. Helps solve errors reported in use of downstream(), itis_downstream(), and gethierarchydownfromtsn() (#459)
  • gnr_resolve() gains new parameter with_canonical_ranks (logical) to choose whether infraspecific ranks are returned or not.
  • New function iucn_id() to get the IUCN ID for a taxon from its name. (#431)
  • All functions that interacted with the taxonomy service uBio are now defunct. Of course we would deprecate first, then make defunct later, to make transition easier, but that is out of our hands. The functions that are defunct are: ubio_classification(), ubio_classification_search(), ubio_id(), ubio_search(), ubio_synonyms(), get_ubioid(), ubio_ping(). In addition, ubio has been removed as an option in the synonyms() function, and references for uBio have been removed from the taxize_cite() utility function. (#449)

taxize 0.6.2

  • rankagg() doesn't depend on data.table anymore (fixes issue with CRAN checks)
  • Replaced RCurl::base64Decode() with openssl::base64_decode(), needed for ubio_*() functions (#447)
  • Importing only functions (via importFrom) used across all imports now (#446). In addition, importFrom for all non-base R pkgs, including graphics, methods, stats and utils packages (#441)
  • Fixes to prevent problems with httr v1, where you can't pass a zero length list to the query parameter in GET(), but can pass NULL (#445)
  • Fixes to all of the gni_*() functions, including code tidying, some DRYing out, and ability to pass in curl options (#444)
  • Fixed typo in taxize_cite()
  • Fixed a bug in classification() where numeric IDs as input got converted to itis ids just because they were numeric. Fixed now. (#434)
  • Catalogue of Life (COL) changed from using short numeric codes for taxa to long alphanumeric UUID type ids. This required fixing functions using COL web services (#435)

taxize 0.6.0

  • Added a method for Catalogue of Life for the synonyms function to get name synonyms. (#430)
  • Added datasets apgFamilies and apgOrders. (#418)
  • col_search() gains parameters response to get a terse or full response, and ... to pass in curl options.
  • eol_dataobjects() gains parameter ... to pass in curl options, and parameter returntype renamed to asdf (for "as data.frame").
  • ncbi_get_taxon_summary() gains parameter ... to pass in curl options.
  • The children() function gains the rows parameter passed on to get_*() functions, supported for data sources ITIS and Catalogue of Life, but not for NCBI.
  • The upstream() function gains the rows parameter passed on to get_*() functions, supported for both data sources ITIS and Catalogue of Life.
  • The classification() function gains the rows parameter passed on to get_*() functions, for all sources used in the function.
  • The downstream() function gains the rows parameter passed on to get_*() functions, for all sources used in the function.
  • Nearly all taxonomic ID retrieval functions (i.e., get_*()) gain new parameters to help filter results (e.g., division, phylum, class, family, parent, rank, etc.). These parameters allow direct matching or regex filters (e.g., .a to match any character followed by an a). (#410) (#385)
  • Nearly all taxonomic ID retrieval functions (i.e., get_*()) now give back more information (mostly higher taxonomic data) to help in the interactive decision process. (#327)
  • New data source added to synonyms() function: Catalogue of Life. (#430)
  • vegan package, used in class2tree() function, moved from Imports to Suggests. (#392)
  • Improved taxize_cite() a lot - get URLs and sometimes citation information for data sources available in taxize. (#270)
  • Fixed typo in apg_lookup() function. (#422)
  • Fixed documentation in apg_families() function. (#418)
  • Across many functions, fixed support for passing in curl options, and added examples of curl option use.
  • callopts parameter in eol_pages(), eol_search(), gnr_resolve(), tp_accnames(), tp_dist(), tp_search(), tp_summary(), tp_synonyms(), ubio_search() changed to ...
  • accepted parameter in get_tsn() changed to FALSE by default. (#425)
  • Default value of db parameter in resolve() changed to gnr as tnrs is often quite slow.
  • General code tidying across the package to make code easier to read.
  • Fixed encoding issues in tpl_families() and tpl_get(). (#424)
  • The following functions that were deprecated are now defunct (no longer available): ncbi_getbyname(), ncbi_getbyid(), ncbi_search(), eol_invasive(), gisd_isinvasive(). These functions are available in the traits package. (#382)
  • phylomatic_tree() is deprecated, but will be defunct in an upcoming version.
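The difference between direct matching and regex filtering mentioned for the get_*() filter parameters (#410) can be shown offline. `filter_by()` is a hypothetical helper rather than taxize code, and the family names are just sample values.

```r
# Sample values standing in for a "family" column of candidate matches.
families <- c("Poaceae", "Pinaceae", "Salmonidae", "Ursidae")

# Hypothetical helper: keep values equal to `pattern` (exact = TRUE) or
# values matching it as a regular expression (exact = FALSE).
filter_by <- function(values, pattern, exact = FALSE) {
  if (exact) values[values == pattern] else values[grepl(pattern, values)]
}

direct <- filter_by(families, "Pinaceae", exact = TRUE)
anchored <- filter_by(families, "^P.*aceae$")
loose <- filter_by(families, ".a")  # any character followed by an "a"

print(direct)
print(anchored)
print(loose)
```

A loose pattern such as `.a` matches almost anything, so anchored patterns are usually the safer choice when narrowing results.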

taxize 0.5.2

  • New set of functions to ping each of the APIs used in taxize. E.g., itis_ping() pings ITIS and returns a logical, indicating if the ITIS API is working or not. You can also do a very basic test to see whether content returned matches what's expected. (#394)
  • New function status_codes() to get vector of HTTP status codes. (#394)
  • Removed startup message.
  • Now can pass in curl options to itis_ping(), and all *_ping() functions.
  • Moved examples that were in \donttest into \dontrun.

taxize 0.5.0

  • New function genbank2uid() to get a NCBI taxonomic id (i.e., a uid) from either a GenBank accession number or GI number. (#375)
  • New function get_nbnid() to get a UK National Biodiversity Network taxonomic id (i.e., a nbnid). (#332)
  • New function nbn_classification() to get a taxonomic classification for a UK National Biodiversity Network taxonomic id. Using this new function, generic method classification() gains method for nbnid. (#332)
  • New function nbn_synonyms() to get taxonomic synonyms for a UK National Biodiversity Network taxonomic id. Using this new function, generic method synonyms() gains method for nbnid. (#332)
  • New function nbn_search() to search for taxa in the UK National Biodiversity Network. (#332)
  • New function ncbi_children() to get direct taxonomic children for a NCBI taxonomic id. Using this new function, generic method children() gains method for ncbi. (#348) (#351) (#354)
  • New function upstream() to get taxa upstream of a taxon. E.g., getting families upstream from a genus gets all families within the one level higher up taxonomic class than family. (#343)
  • New suite of functions as.*() to coerce numeric/alphanumeric codes to taxonomic identifiers for various databases. There are methods on this function for each of itis, ncbi, tropicos, gbif, nbn, bold, col, eol, and ubio. By default as.*() functions make a quick check that the identifier is a real one by making a GET request against the identifier URI - this can be toggled off by setting check=FALSE. There are methods for returning itself, character, numeric, list, and data.frame. In addition, if the as.*.data.frame() function is used, a generic method exists to coerce the data.frame back to an identifier object. (#362)
  • New suite of functions named, for example, get_tsn_() (the underscore is the only difference from the previous function name). These functions don't do the normal interactive process of prompts that e.g., get_tsn() does, but instead return a list of all ids, or a subset via the rows parameter. (#237)
  • New function ncbi_get_taxon_summary() to get taxonomic name and rank for 1 or more NCBI uid's. (#348)
  • assertthat removed from package imports, replaced with stopifnot(), to reduce dependency load. (#387)
  • eol_hierarchy() now defunct (no longer available) (#228) (#381)
  • tp_classification() now defunct (no longer available) (#228) (#381)
  • col_classification() now defunct (no longer available) (#228) (#381)
  • New manual page listing all the low level ITIS functions for which their manual pages are not shown in the package index, but are available if you go to ?fxn-name.
  • All get_*() functions gain a new parameter rows to allow selection of particular rows. For example, rows=1 to select the first row, or rows=1:3 to select rows 1 through 3. (#347)
  • classification() now by default returns taxonomic identifiers for each of the names. This can be toggled off by setting return_id=FALSE. (#359) (#360)
  • Simplification of many higher level functions to use switch() on the db parameter, which helps give better error messages when a db value is not supported or is spelled incorrectly. (#379)
  • Lots of reduction of redundancy in internal functions. (#378)
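
The switch()-on-db pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not taxize's actual code: the function name lookup_children() and the returned strings are hypothetical stand-ins for real data-source calls.

```r
# Hypothetical dispatcher (not a taxize function): route on `db` with
# switch() and fail with a clear message for unknown values.
lookup_children <- function(id, db) {
  switch(
    db,
    itis = paste("ITIS children of", id),
    ncbi = paste("NCBI children of", id),
    col  = paste("COL children of", id),
    # The unnamed final argument is the default branch
    stop("'db' must be one of: itis, ncbi, col; got '", db, "'", call. = FALSE)
  )
}

lookup_children("180543", db = "ncbi")

# A misspelled db reaches the default branch and raises the custom error
msg <- tryCatch(lookup_children("180543", db = "ncib"),
                error = function(e) conditionMessage(e))
print(msg)
```

Without the default branch, switch() would invisibly return NULL for an unmatched string, which is harder to diagnose than an explicit error.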

taxize 0.4.0

  • New data sources added to taxize: BOLD (Barcode of Life Data Systems). Three more data sources were added (World Register of Marine Species (WoRMS), Pan-European Species directories Infrastructure (PESI), and Mycobank), but are not available on CRAN. Those three data sources provide data via the SOAP web services protocol, which is hard to support in R. Thus, those sources are available on Github. See
  • New function children(), which is a single interface to various data sources to get immediate children from a given taxonomic name. (#304)
  • New functions added to search BOLD data: bold_search() searches for taxa in the BOLD database of barcode data; get_boldid() searches for a BOLD taxon identifier. (#301)
  • New function get_ubioid() to get a uBio taxon identifier. (#318)
  • New function started (not complete yet) to get suggested citations for the various data sources available in taxize: taxize_cite(). (#270)
  • Using jsonlite instead of RJSONIO throughout taxize.
  • get_ids() gains new option to search for a uBio ID, in addition to the others, itis, ncbi, eol, col, tropicos, and gbif.
  • Fixed documentation for the stripauthority parameter in gnr_resolve(). (#325)
  • iplant_resolve() now outputs data.frame structure instead of a list. (#306)
  • Clarified parameter seqrange in ncbi_getbyname() and ncbi_search() (#328)
  • synonyms() gains new data source, can now get synonyms from uBio data source (#319)
  • vascan_search() giving back more useful results now.
  • Added error catching for when URI is too long, i.e., when too many names provided (#329) (#330)
  • Various fixes to tnrs() function, including more meaningful error messages on failures (#323) (#331)
  • Fixed bug in getpublicationsfromtsn() that caused function to fail on data.frame's with no data on name assignment (#297)
  • Fixed bug in sci2comm() that caused fxn to fail when using db=itis sometimes (#293)
  • Fixes to scrapenames(). Sending a text blob via the text parameter now works.
  • Fixes to resolve() so that function now works for all 3 data sources. (#337)

taxize 0.3.0

  • New function iplant_resolve() to do name resolution using the iPlant name resolution service. Note, this is different from the Taxosaurus TNRS service that is wrapped in the tnrs() function.
  • New function ipni_search() to search for names in the International Plant Names Index (IPNI).
  • New function resolve() that unifies name resolution services from iPlant's name resolution service (via iplant_resolve()), Taxosaurus' TNRS (via tnrs()), and GNR's name resolution service (via gnr_resolve()).
  • All get_*() functions now return a new uri attribute that is a link to the taxon on the web. If NA is given back (e.g. nothing found), the uri attribute is blank. You can go directly to the uri in your default browser by doing, for example: browseURL(attr(result, "uri")).
  • get_eolid() now returns an attribute provider because EOL collates taxonomic data from a lot of sources, then gives back IDs that are internal EOL ids, not those matching the id of the source they pull from. This should help with provenance, and should help if there is confusion about why the id given back by this function does not match that from the original source.
  • Within the get_tsn() function, now using the function itis_terms(), which gives back the accepted status of the taxa. This allows a new parameter in the function (accepted, logical) that lets the user request only names with accepted status (accepted=TRUE), or all names (accepted=FALSE).
  • gnr_resolve() gains three new parameters: best_match_only (logical, to return best match only), preferred_data_sources (to return preferred data sources), and callopts (to pass in curl options).
  • tnrs(), tp_accnames(), tp_refs(), tp_summary(), and tp_synonyms() gain new parameter callopts to pass in curl options.
  • class2tree() can now handle NA in classification objects.
  • classification.eolid() and classification.colid() now return the submitted name along with the classification.
  • Changed from CC0 to MIT license.
  • Updated citation to have both the taxize paper in F1000 Research and the package citation.
  • Sped up some functions by removing internal use of plyr functions, see #275.
  • Removed dependency on rgbif - copied into this package a few functions needed internally. This avoids users having to install GDAL binary.
  • Added in verbose parameter to many more functions to allow suppression of help messages.
  • In most functions when using httr, now manually parsing JSON to a list then to another data format instead of allowing internal httr parsing - in addition added checks on content type and encoding in many functions.
  • Added match.arg internally to get_ids() for the db parameter so that a) unique short abbreviations of possible values are possible, and b) gives a meaningful warning if unsupported values are given.
  • Most long-named ITIS functions (e.g., getexpertsfromtsn, getgeographicdivisionsfromtsn) gain parameter curlopts to pass in curl options.
  • Added stringsAsFactors=FALSE to all data.frame creations to eliminate factor variables.
  • classification.gbifid() did not return the correct result when taxon not found.
  • Fixed bugs in many functions, see #245, #248, #254, #277.
  • classification() used to fail when it was passed a subset of a vector of ids, in which case the class information was stripped off. Now works (#284)
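
The match.arg() behaviour described for the db parameter can be illustrated with a small base R sketch. The helper pick_db() is hypothetical and only mirrors the idea; it is not how get_ids() is written. Note that in base R, match.arg() signals an error (not a warning) when the value matches none of the choices.

```r
# Hypothetical helper; `pick_db` is not a taxize function.
pick_db <- function(db = c("itis", "ncbi", "eol", "col", "tropicos", "gbif")) {
  # match.arg() accepts any unique prefix of the allowed choices and
  # returns the first choice when `db` is left at its default
  match.arg(db)
}

pick_db("it")   # unique prefix of "itis"
pick_db("g")    # unique prefix of "gbif"

# An unsupported value signals an error that lists the allowed choices
res <- tryCatch(pick_db("worms"), error = function(e) conditionMessage(e))
print(res)
```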

taxize 0.2.2

  • itis_downstream() and col_downstream() functions accessible now from a single function downstream().
  • Added an extension function classification() for the gbif id class, classification.gbifid().
  • Added some error catching to class2tree function.
  • Fixed problems in cbind.classification() and rbind.classification() where the first column of the output was a useless column name, and all column names now lower case for consistency.
  • classification() was giving back IDs instead of taxon names on the list element names, fixed this so hopefully all are giving back names.
  • Fixed bugs in col_*() functions so they give back data.frame's now with character class columns instead of factors, damned stringsAsFactors!

taxize 0.2.0

  • New dataset: Lookup-table for family, genus, and species names for ThePlantList under dataset name "theplantlist".
  • get_ids() now accepts "gbif" as an option via use of get_gbifid().
  • Changed function itis_phymat_format() to phylomatic_format() - this function gets the typical Phylomatic format name string "family/genus/genus_epithet"
  • Updated gbif_parse() base url to the new one.
  • Fixes to phylomatic_tree().
  • New function class2tree() to convert list of classifications to a tree. For example, go from a list of classifications from the function classification() to this function to get a taxonomy tree in ape phylo format.
  • New function get_gbifid() to get a Global Biodiversity Information Facility identifier. This is the ID GBIF uses in their backbone taxonomy.
  • classification() outputs gain rbind() and cbind() generic methods that act on the various outputs of classification() to bind data row-wise or column-wise, respectively.

taxize 0.1.9

  • Updated ncbi_search() to retrieve more than a max of 500, slightly changed column headers in output data files, and it now accepts a vector/list of taxonomic names instead of just one name.

taxize 0.1.8

  • We attempted to make all output column names lowercase, and to increase consistency across column names in outputs from similar functions.
  • New function scrapenames() uses the Global Names Recognition and Discovery service to extract taxonomic names from a web page, pdf, or other document.
  • New function vascan_search() to search the CANADENSYS Vascan names database.
  • Fixed bugs in get_tpsid(), get_eolid() and eol_pages().
  • phylomatic_tree() bugs fixed.
  • classification() methods were simplified. Now classification() is the workhorse for every data-source. col_classification(), eol_hierarchy(), and tp_classification() are now deprecated and will be removed in the next taxize version.
  • classification() gains four new arguments: start, checklist, key, and callopts.
  • comm2sci() gains argument simplify to optionally simplify output to a vector of names (TRUE by default).
  • get_eolid() and get_tpsid() both gain new arguments key to specify an API key, and ... to pass on arguments to eol_search().
  • Added ncbi as a data source (db="ncbi") in sci2comm().
  • tax_agg() now accepts a matrix in addition to a data.frame. Thanks to @tpoi
  • tnrs() changes: Using httr instead of RCurl; now forcing splitting up name vector when long. Still issues when using POST requests (getpost="POST") wherein a request sent with 100 names only returns 30 for example. Investigating this now.
  • Function name change: tp_acceptednames() now tp_accnames().
  • Function name change: tp_namedistributions() now tp_dist().
  • Function name change: tp_namereferences() now tp_refs().
  • Internal ldfast() function changed name to taxize_ldfast() to avoid namespace conflicts with similar function in another package.
  • Three functions now with ncbi_* prefix: get_seqs() is now ncbi_getbyname(); get_genes() is now ncbi_getbyid(); and get_genes_avail() is now ncbi_search().

taxize 0.1.5

  • classification() gains extension method classification.ids() to accept output from get_ids() - which attempts to get a taxonomic hierarchy from each of the taxon identifiers with the output from get_ids().
  • synonyms() gains extension method synonyms.ids() to accept output from get_ids() - which attempts to get synonyms from each of the taxon identifiers with the output from get_ids().

taxize 0.1.4

  • Reworked functions that interact with the ITIS API so that lower level functions were grouped together into higher level functions. All the approximately 50 lower level functions are still exported but are not included in the index help file (due to @keywords internal for each fxn) - but can still be used normally, and man files are available at ?functionName.
  • New function itis_ping() to check if the ITIS API service is up, similar to eol_ping() for the EOL API.
  • New function itis_getrecord() to get a partial or full record, using a TSN or lsid.
  • New function itis_refs() to get references associated with a TSN.
  • New function itis_kingdomnames() to get all kingdom names, or kingdom name for a TSN.
  • New function itis_lsid() to get a TSN from an lsid, get a partial or full record from an lsid.
  • New function itis_native() to get status as native, exotic, etc. in various geographic regions.
  • New function itis_hierarchy() to get full hierarchy, or immediate up or downstream hierarchy.
  • New function itis_terms() to get tsn's, authors, common names, and scientific names from a given query.
  • New function sci2comm() to get common (vernacular) names from input scientific names from various data sources.
  • New function comm2sci() to get scientific names from input common (vernacular) names from various data sources.
  • New function get_ids() to get taxonomic identifiers across all sources.
  • itis_taxrank() now outputs a character, not a factor; loses parameter verbose, and gains ..., which passes on further arguments to gettaxonomicranknamefromtsn.
  • tp_synonyms(), tp_summary(), plantminer(), itis_downstream(), gisd_isinvasive(), get_genes_avail(), get_genes(), eol_invasive(), eol_dataobjects(), and tnrs() gain parameter verbose to optionally suppress messages.
  • phylomatic_tree() format changed so that names are passed in normally (e.g., Poa annua) instead of the slashpath format (family/genus/genus_species). Also, taxaformat parameter dropped.
  • itis_acceptname() gains ... to pass in further arguments to getacceptednamesfromtsn()
  • tp_namedistributions() loses parameter format.
  • get_tsn() and get_uid() return information about the match as an attribute.
  • clarified iucn-documentation
  • Fixed bug in synonyms() so that further arguments can be passed on to get_tsn() to suppress messages.
  • Removed test for ubio_classification_search(), a function that isn't operational yet.

taxize 0.1.1

  • New functions added just like get_uid()/get_tsn() but for EOL, Catalogue of Life, and Tropicos, see get_eolid(), get_colid(), and get_tpsid(), respectively.
  • classification() methods added for EOL, Catalogue of Life, and Tropicos, see functions classification.eolid(), classification.colid(), and classification.tpsid() respectively.
  • New function col_search() to search for names in the Catalogue of Life.
  • User can turn off interactive mode in get_* functions. All get_* functions gain an ask argument, if TRUE (default) a user prompt is used for user to select which row they want, if FALSE, NA is returned when many results available; and added tests for the new argument. Affects downstream functions too.
  • New function eol_invasive() to search EOL collections of invasive species lists.
  • New function tp_search() to search for taxonomic IDs from Tropicos.
  • New function tp_classification() to get a taxonomic hierarchy from Tropicos.
  • New function gbif_parse() to parse scientific names into their components, using the GBIF name parser API.
  • New function itis_searchcommon() to search for common names across both searchbycommonnamebeginswith, and searchbycommonnameendswith.
  • tax_name() and other functions broke, because get_tsn() and get_uid() returned the wrong value when a taxon was not found. Fixed.
  • Added tests for new classification() methods for EOL, COL, and Tropicos.
  • Added tests for new functions tp_search() and tp_classification().
  • Moved tests from inst/tests to tests/testthat according to new preferred location of tests.
  • Updated CITATION in inst/ with our F1000Research paper info.
  • Package repo name on Github changed from taxize_ to taxize - remember to use "taxize" in install_github() calls now instead of "taxize_"

taxize 0.1.0

  • New function tpl_families() to get data.frame of families from The Plant List site.
  • New function names_list() to get a random vector of species names using the
  • Added two new data sets, plantGenusNames.RData and plantNames.RData, to be used in names_list().
  • New function ldfast(), a replacement function for plyr::ldply that should be faster in all cases.
  • Changed API key names to be more consistent, now tropicosApiKey, eolApiKey, ubioApiKey, and pmApiKey - do change these in your .Rprofile if you store them there.
  • Added a startup message.
  • Across most functions, removed dependencies on plyr, using ldfast() instead, for increased speed.
  • Across most functions, changed from using RCurl to using httr.
  • Across most functions, stop_for_status() now used directly after the Curl call to check the http status code, stopping the function if an error code is returned.
  • Many functions changed parameter ... to callopts, which passes on additional Curl options, with default an empty list (list()), which makes function testing easier.
  • eol_search() gains parameters page, exact, filter_tid, filter_heid, filter_by_string, matching, cache_ttl, and callopts.
  • eol_hierarchy() gains parameter callopts, and loses parameter usekey (always using API key now).
  • eol_pages() gains parameters images, videos, sounds, maps, text, subject, licenses, details, common_names, synonyms, references, vetted, cache_ttl, and callopts.
  • gni_search(): parameter url lost, is defined inside the function now, and .Rd file gains url references.
  • phylomatic_tree() now checks to make sure family names were found for input taxa. If not, the function stops with message informing this.
  • tpl_get() updated with fixes/improvements by John Baumgartner - now gets taxa from all groups, whereas it only retrieved Angiosperms before. In addition, csv files from The Plant List are downloaded directly rather than read into R and written out again.
  • tpl_search() now checks for missing data or errors, and stops function with error message.
  • capwords() fxn changed to taxize_capwords() to avoid namespace conflicts with other packages with a similar function.
  • ubio_namebank() was giving back base64 encoded data, now decoded appropriately.
  • Added John Baumgartner as an author.
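
The idea behind a plyr::ldply replacement such as ldfast() can be sketched in base R: apply a function over a list, then row-bind the pieces in a single step. The function below, ldfast_sketch(), is a hypothetical stand-in written for illustration; it is not the package's implementation and makes no claim about relative speed.

```r
# Hypothetical stand-in named `ldfast_sketch`; not the package's code.
ldfast_sketch <- function(x, f = identity) {
  # Apply f to each element, then row-bind the resulting data.frames once
  pieces <- lapply(x, f)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

records <- list(
  data.frame(name = "Poa annua", rank = "species", stringsAsFactors = FALSE),
  data.frame(name = "Poa", rank = "genus", stringsAsFactors = FALSE)
)
combined <- ldfast_sketch(records)
print(combined)
```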

taxize 0.0.6

  • tax_name() accepts multiple ranks to query.
  • tax_name() accepts vectors as input.
  • tax_name() has an option to query both, NCBI and ITIS, in one call and return the union of both.
  • new extractor function for iucn_summary(): iucn_status(), to extract status from iucn-objects.
  • tax_agg(): A function to aggregate species data to given taxonomic rank.
  • tax_rank(): Get taxonomic rank for a given taxon name.
  • classification() accepts taxon names as input and returns a named list.
  • new function apg_lookup() looks up APGIII taxonomy and replaces family names
  • new function gni_parse() parses scientific names using EOL's name parser API
  • new function iucn_getname() is a utility to find IUCN names using the EOL API
  • new function rank_agg() aggregates data by a given taxonomic rank
  • new data table apg_families
  • new data table apg_orders
  • gnr_resolve() gains new arguments resolve_once, with_context, stripauthority, highestscore, and http, and loses returndf (that is, a data.frame is returned by default)
  • gni_search() gains parameter parse_names
  • tnrs() parameter getpost changed from default of 'GET' to 'POST'
  • Across all functions, the url parameter specifying an API endpoint was moved inside of functions (i.e., not available as a parameter in the function call)
  • gnr_datasources() parameter todf=TRUE by default now, returning a data.frame
  • col_classification() minor formatting improvements
  • iucn_summary() returns no information about population estimates.
  • get_tsn() raised a warning in specific situations.
  • tax_name() did not work for multiple ranks with ITIS.
  • fixed errors in getfullhierarchyfromtsn()
  • fixed errors in gethierarchydownfromtsn()
  • fixed errors in getsynonymnamesfromtsn()
  • fixed errors in searchforanymatch()
  • fixed errors in searchforanymatchedpage()
  • Removed dependency to NCBI2R
  • Improvements of documentation
  • Citation added

taxize 0.0.5

  • removed tests for now until longer term fix is made so that web APIs that are temporarily down don't cause tests to fail.

taxize 0.0.4

  • added R (>= 2.15.0) so that package tests don't fail on some systems due to paste0()
  • remove test for ubio_namebank() function as it sometimes fails
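
For context on the paste0() dependency: paste0() was added in R 2.15.0 and behaves like paste() with sep = "". A minimal sketch of the equivalence, using made-up variable names:

```r
genus <- "Poa"
epithet <- "annua"

# paste0() concatenates without a separator ...
a <- paste0(genus, "_", epithet)
# ... which is the same as paste() with sep = "" on older R versions
b <- paste(genus, "_", epithet, sep = "")

identical(a, b)
```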

taxize 0.0.3

  • iucn_summary() does not break when API returns no information.
  • tax_name() returns NA when taxon is not found on API.
  • get_uid() asks for user input when more than one UID is found for a taxon.
  • changed base URL for phylomatic_tree(), and associated parameter changes
  • added check for invasive species status for a set of species from GISD database via gisd_isinvasive().
  • Further development with the EOL-API: eol_dataobjects().
  • added Catalogue of Life: col_classification(), col_children(), and col_downstream().
  • new fxn get_genes(), retrieve gene sequences from NCBI by accession number.
  • new functions to interact with the Phylotastic name resolution service: tnrs_sources() and tnrs()
  • Added unit tests
  • itis_name() fxn deprecated - use tax_name() instead

taxize 0.0.2

  • changed paste0 to paste to avoid problems on certain platforms.
  • removed all tests until the next version so that tests will not fail on any platforms.
  • plyr was missing as import for iucn_summary fxn.
  • added NEWS file.

taxize 0.0.1

  • released to CRAN

0.9.0 by Scott Chamberlain, 4 months ago

Authors: Scott Chamberlain [aut, cre], Eduard Szoecs [aut], Zachary Foster [aut], Carl Boettiger [ctb], Karthik Ram [ctb], Ignasi Bartomeus [ctb], John Baumgartner [ctb], James O'Donnell [ctb], Jari Oksanen [ctb]

Task views: Phylogenetics, Especially Comparative Methods

MIT + file LICENSE license

Imports graphics, methods, stats, utils, httr, xml2, jsonlite, reshape2, stringr, plyr, foreach, ape, bold, data.table, rredlist, rotl, ritis, tibble, worrms, natserv, wikitaxa

Suggests testthat, roxygen2, knitr, vegan

Imported by RNeXML, TR8, bdvis, brranching, camtrapR, metacoder, myTAI, originr, rusda, taxlist, traits.

Depended on by MonoPhy, aptg.

Suggested by binomen, mapr, rbison, rnoaa, spocc, taxa.

Enhanced by rerddap.