An interface to the 'Open Tree of Life' API to retrieve phylogenetic trees, information about studies used to assemble the synthetic tree, and utilities to match taxonomic names to 'Open Tree identifiers'. The 'Open Tree of Life' aims at assembling a comprehensive phylogenetic tree for all named species.
rotl is an R package to interact with the Open Tree of Life data APIs. It was
initially developed as part of the
NESCENT/OpenTree/Arbor hackathon.
Client libraries to interact with the Open Tree of Life API also exist for Python and Ruby.
The current stable version is available from CRAN, and can be installed by typing the following at the prompt in R:
install.packages("rotl")
If you want to test the development version, you first need to install
the remotes package.
install.packages("remotes")
Then you can install rotl using:
remotes::install_github("ropensci/rotl")
There are three vignettes:
Start by checking out the "How to use rotl?" vignette by typing:
vignette("how-to-use-rotl", package="rotl") after installing the
package.
Then explore how you can use rotl with other packages to combine your data
with trees from the Open Tree of Life project by typing:
vignette("data_mashups", package="rotl").
The vignette "Using the Open Tree synthesis in a comparative analysis"
demonstrates how you can reproduce an analysis from a published paper by
downloading the tree the authors used, along with data from the supplementary material:
vignette("meta-analysis", package="rotl").
The vignettes are also available from CRAN:
How to use rotl?,
Data mashups,
and
Using the Open Tree synthesis in a comparative analysis.
Taxonomic names are represented in the Open Tree by numeric identifiers, the
ott_ids (Open Tree Taxonomy identifiers). To extract a portion of a tree from
the Open Tree, you first need to find ott_ids for a set of names using the
tnrs_match_names function:
library(rotl)
apes <- c("Pongo", "Pan", "Gorilla", "Hoolock", "Homo")
(resolved_names <- tnrs_match_names(apes))
## search_string unique_name approximate_match ott_id is_synonym flags
## 1 pongo Pongo FALSE 417949 FALSE
## 2 pan Pan FALSE 417957 FALSE
## 3 gorilla Gorilla FALSE 417969 FALSE
## 4 hoolock Hoolock FALSE 712902 FALSE
## 5 homo Homo FALSE 770309 FALSE
## number_matches
## 1 2
## 2 2
## 3 1
## 4 1
## 5 1
Now we can get the tree with just those tips:
tr <- tol_induced_subtree(ott_ids=ott_id(resolved_names))
plot(tr)
The code above can be summarized in a single pipe:
library(magrittr)
## or expressed as a pipe:
c("Pongo", "Pan", "Gorilla", "Hoolock", "Homo") %>% tnrs_match_names %>% ott_id %>% tol_induced_subtree %>% plot
To cite rotl in publications, please use:
Michonneau, F., Brown, J. W. and Winter, D. J. (2016). rotl: an R package to interact with the Open Tree of Life data. Methods in Ecology and Evolution. 7(12):1476-1481. doi: 10.1111/2041-210X.12593
You may also want to cite the paper for the Open Tree of Life:
Hinchliff, C. E., et al. (2015). Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proceedings of the National Academy of Sciences. 112(41):12764-12769. doi: 10.1073/pnas.1423041112
The manuscript in Methods in Ecology and Evolution includes additional examples on how to use the package. The manuscript and the code it contains are also hosted on GitHub at: https://github.com/fmichonneau/rotl-ms
Starting with v3.0.0 of the package, the major and minor version numbers (the first 2 digits of the version number) will be matched to those of the API. The patch number (the 3rd digit of the version number) will be used to reflect bug fixes and other changes that are independent from changes to the API.
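As a rough illustration of this numbering scheme, the base R sketch below splits a version string into its components. The version string "3.0.4" is a made-up example, and building an API label by pasting the major and minor numbers together is only an assumption about how one might apply the convention; it is not something rotl does itself.

```r
# Hypothetical package version string, used only for illustration
pkg_version <- package_version("3.0.4")

# Major and minor components are meant to track the API version
api_version <- paste(pkg_version$major, pkg_version$minor, sep = ".")

# The patch component reflects changes that are independent of the API
patch_level <- pkg_version$patchlevel

cat("API version tracked:", api_version, "\n")
cat("Package patch level:", patch_level, "\n")
```

With rotl installed, packageVersion("rotl") could replace the hard-coded string.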
rotl can be used to access other versions of the API (if they are available),
but the high-level functions will most likely not work. Instead, you will need
to parse the output yourself using the "raw" returns from the unexported
low-level functions (all prefixed with a .). For instance, to use the
tnrs/match_names endpoint for v2 of the API:
rotl:::.tnrs_match_names(c("pan", "pango", "gorilla", "hoolock", "homo"), otl_v="v2")
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
tnrs_match_names are
consistent, and remain the same even after using update().
get_study_subtree gains the argument tip_label to control the
formatting of the tip labels, #90, reported by @bomeara
is_in_tree takes a list of OTT ids (i.e., the output of
ott_id()), and returns a vector of logicals indicating whether they are
included in the synthetic tree (workaround #31).
get_study_subtree ignored the argument subtree_id, #89
reported by @bomeara
citation("rotl") now includes the reference to the Open Tree of Life
publication.
Fix tests and vignette to reflect changes accompanying release 6.1 of the synthetic tree
Add section in vignette "How to use rotl?" about how to get the higher taxonomy from a given taxon.
Add CITATION file with MEE manuscript information (#82)
rotl now interacts with v3.0 of the Open Tree of Life APIs. The
documentation has been updated to reflect the associated changes. More
information about the v3.0 of the Open Tree of Life APIs can be found
on their wiki.
New methods: tax_sources, is_suppressed, tax_rank, unique_name,
name, ott_id, for objects returned by tnrs_match_names(),
taxonomy_taxon_info(), taxonomy_taxon_mrca(), tol_node_info(),
tol_about(), and tol_mrca(). Each of these methods has its own class.
New method tax_lineage() to extract the higher taxonomy from an object
returned by taxonomy_taxon_info() (initially suggested by Matt Pennell, #57).
New method tol_lineage() to extract the nodes towards the root of the tree.
New print methods for tol_node_info() and tol_mrca().
New functions study_external_IDs() and taxon_external_IDs() that return
the external identifiers for a study and associated trees (e.g., DOI, TreeBase
ID); and the identifiers of taxon names in taxonomic databases. The vignette
"Data mashups" includes an example of how to use them.
The function strip_ott_ids() gains the argument remove_underscores to remove
underscores from tips in trees returned by OTL.
Rename method ott_taxon_name() to tax_name() for consistency.
Rename methods synth_sources() and study_list() to source_list().
Refactor how result of query is checked and parsed (invisible to the user).
Fix bug in studies_find_studies(), the arguments verbose and exact were
ignored.
The argument only_current has been dropped for the methods associated with
objects returned by tnrs_match_names()
The print method for tnrs_context() duplicated some names.
inspect(), update() and synonyms() methods for tnrs_match_names() did
not work if the query included unmatched taxa.
New vignette: meta-analysis
Added arguments include_lineage and list_terminal_descendants to
taxonomy_taxon()
Improve warning and format of the result if one of the taxa requested doesn't
match anything in tnrs_match_names.
In the data frame returned by tnrs_match_names, the columns
approximate_match, is_synonym and is_deprecated are now logical
(instead of character) [issue #54]
New utility function strip_ott_ids removes OTT id information from
a character vector, making it easier to match tip labels in trees returned by
tol_induced_subtree to taxonomic names in other data sources. This function
can also remove underscores from the taxon names.
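To make the label cleanup concrete, here is a minimal base R sketch of the kind of transformation described above. The tip labels are made up for illustration, and the regular expression assumes the id is appended as a trailing _ott followed by digits. It approximates the idea and is not the package's own implementation.

```r
# Illustrative tip labels (assumed "<name>_ott<id>" format)
tips <- c("Pan_troglodytes_ott417950", "Homo_sapiens_ott770315")

# Drop the trailing OTT id
no_ids <- sub("_ott[0-9]+$", "", tips)

# Optionally replace the remaining underscores with spaces
clean <- gsub("_", " ", no_ids, fixed = TRUE)

print(clean)
```

With rotl loaded, strip_ott_ids(tips, remove_underscores = TRUE) is the function intended for this job.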
New method list_trees returns a list of tree ids associated with
studies. The function takes the output of studies_find_studies or
studies_find_trees.
studies_find_studies and studies_find_trees gain argument detailed
(default set to TRUE), that produces a data frame summarizing information
(title of the study, year of publication, DOI, ids of associated trees, ...)
about the studies matching the search criteria.
get_study_tree gains argument deduplicate. When TRUE, if the tree
returned for a given study contains duplicated tip labels, they will be made
unique before being parsed by NCL by appending a suffix (_1, _2, _3,
etc.). (#46, reported by @bomeara)
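The suffixing idea can be sketched in base R as follows. The exact rules rotl applies are not spelled out here, so this sketch assumes that every copy of a duplicated label is numbered in order of appearance while unique labels are left untouched; the labels themselves are made up.

```r
# Made-up tip labels, with "Pan" appearing three times
labels <- c("Pan", "Homo", "Pan", "Gorilla", "Pan")

# Flag every label that occurs more than once
is_dup <- labels %in% labels[duplicated(labels)]

# Running count of each label, in order of appearance
counter <- ave(seq_along(labels), labels, FUN = seq_along)

# Append _1, _2, ... to the duplicated labels only (assumed scheme)
labels_unique <- labels
labels_unique[is_dup] <- paste(labels[is_dup], counter[is_dup], sep = "_")

print(labels_unique)
```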
New method get_study_year for objects of class study_meta that returns the
year of publication of the study.
A more robust approach is used by get_tree_ids to identify the tree ids in
the metadata returned by the API.