Evolutionary Transcriptomics Analyses

Investigate the evolution of biological processes by capturing evolutionary signatures in transcriptomes. The aim of this tool is to provide a transcriptome analysis environment for answering questions regarding the evolution of biological processes.

I will confess that I frequently have the feeling in my experimental work of holding a dialogue with someone who is considerably brighter than me.

- Hans Spemann

Development is the major process establishing complex life on earth. Hence, studying the evolution of developmental processes allows us to understand the key mechanisms that control and constrain the evolution and diversification of complex organisms on this planet. To study the evolution of developmental processes, an evolutionary transcriptomics approach (= phylotranscriptomics) has been proposed that aims to quantify the evolutionary conservation of developmental transcriptomes (Drost et al., 2015 Mol. Biol. Evol. ; Drost et al., 2016 Mol. Biol. Evol.).

The myTAI package allows users to capture evolutionary information that is hidden in transcriptomes using an evolutionary transcriptomics approach.

This evolutionary transcriptomics approach (= phylotranscriptomics) defines the concept of combining genetic sequence conservation information with gene expression levels to quantify transcriptome conservation throughout biological processes (Domazet-Loso and Tautz, 2010 Nature ; Quint, Drost et al., 2012 Nature ; Drost et al., 2015 Mol. Biol. Evol. ; Drost et al., 2016 Mol. Biol. Evol.).
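To make the idea of combining gene age with expression levels concrete, the following is a minimal base R sketch of the Transcriptome Age Index as an expression-weighted mean of phylostratum values, TAI_s = sum_i(ps_i * e_is) / sum_i(e_is). The toy data and the helper name tai_sketch() are hypothetical and used only for illustration; in practice the TAI() function of myTAI should be used.

```r
# Hypothetical toy data: 4 genes measured at 3 developmental stages.
# phylostratum[i] is the age category of gene i (1 = oldest).
phylostratum <- c(1, 1, 2, 3)
expression <- matrix(
  c(10, 10, 10,
    20, 10,  5,
     5, 10, 20,
     5, 10, 25),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("stage1", "stage2", "stage3"))
)

# Illustrative helper (not part of myTAI): for each stage, compute the
# mean phylostratum weighted by the expression level of each gene.
tai_sketch <- function(ps, expr) {
  stopifnot(length(ps) == nrow(expr))
  # ps is recycled down each column, so row i is multiplied by ps[i]
  colSums(ps * expr) / colSums(expr)
}

tai_profile <- tai_sketch(phylostratum, expression)
print(tai_profile)
```

In this toy example the index rises from stage1 to stage3 because the younger genes (phylostrata 2 and 3) gain expression over time, which is the kind of signal a TAI profile is meant to expose.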

This subfield of Evolutionary Developmental Biology aims to determine and investigate stages or periods of evolutionary conservation in biological processes of extant species. However, although motivated by and applied to developmental processes, the myTAI package is implemented to quantify transcriptome conservation in any transcriptome experiment of interest and therefore aims to provide a standard approach to investigate the evolution of biological processes in the context of transcriptome conservation.

In particular, myTAI provides an easy-to-use and optimized software framework to perform phylotranscriptomic analyses for any annotated organism and developmental process of interest. Additionally, customized visualization functions implemented in myTAI allow users to generate publication-quality plots for their own phylotranscriptomics research.

The following tutorials provide use cases and detailed explanations of how to quantify transcriptome conservation with myTAI and how to interpret the results generated with this software tool.

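These tutorials introduce users to myTAI:

  • Introduction to Phylotranscriptomics and myTAI
  • Intermediate Concepts of Phylotranscriptomics
  • Advanced Concepts of Phylotranscriptomics
  • Age Enrichment Analyses
  • Gene Expression Analysis with myTAI
  • Taxonomic Information Retrieval with myTAI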

The current status of the package as well as a detailed history of the functionality of each version of myTAI can be found in the NEWS section.

Users can download myTAI from CRAN:

# install myTAI 0.4.0 from CRAN
install.packages("myTAI", dependencies = TRUE)
# to perform differential gene expression analyses with myTAI
# please install the edgeR package from Bioconductor
install.packages("BiocManager")
BiocManager::install("edgeR")

Users can also read the tutorials within R (RStudio):

# load the myTAI package
library(myTAI)
# look for all tutorials (vignettes) available in the myTAI package
# this will open your web browser
browseVignettes("myTAI")
# or as single tutorials
# open tutorial: Introduction to Phylotranscriptomics and myTAI
vignette("Introduction", package = "myTAI")
# open tutorial: Intermediate Concepts of Phylotranscriptomics
vignette("Intermediate", package = "myTAI")
# open tutorial: Advanced Concepts of Phylotranscriptomics
vignette("Advanced", package = "myTAI")
# open tutorial: Age Enrichment Analyses
vignette("Enrichment", package = "myTAI")
# open tutorial: Gene Expression Analysis with myTAI
vignette("Expression", package = "myTAI")
# open tutorial: Taxonomic Information Retrieval with myTAI
vignette("Taxonomy", package = "myTAI")

In the myTAI framework users can find:

  • TAI() : Function to compute the Transcriptome Age Index (TAI)
  • TDI() : Function to compute the Transcriptome Divergence Index (TDI)
  • REMatrix() : Function to compute the relative expression profiles of all phylostrata or divergence-strata
  • RE() : Function to transform mean expression levels to relative expression levels
  • pTAI() : Compute the Phylostratum Contribution to the global TAI
  • pTDI() : Compute the Divergence Stratum Contribution to the global TDI
  • pMatrix() : Compute Partial TAI or TDI Values
  • pStrata() : Compute Partial Strata Values
  • PlotPattern() : Function to plot the TAI or TDI profiles and perform statistical tests
  • PlotCorrelation() : Function to plot the correlation between phylostratum values and divergence-stratum values
  • PlotRE() : Function to plot the relative expression profiles
  • PlotBarRE() : Function to plot the mean relative expression levels of phylostratum or divergence-stratum classes as barplot
  • PlotMeans() : Function to plot the mean expression profiles of phylostrata or divergence-strata
  • PlotDistribution() : Function to plot the frequency distribution of genes within the corresponding phylostratigraphic map or divergence map
  • PlotContribution() : Plot the Phylostratum or Divergence Stratum Contribution to the Global TAI/TDI Pattern
  • PlotEnrichment() : Plot the Phylostratum or Divergence Stratum Enrichment of a given Gene Set
  • PlotGeneSet() : Plot the Expression Profiles of a Gene Set
  • PlotCategoryExpr() : Plot the Expression Levels of each Age or Divergence Category as Barplot or Violinplot
  • PlotGroupDiffs() : Plot the significant differences between gene expression distributions of PS or DS groups
  • PlotSelectedAgeDistr() : Plot the PS or DS distribution of a selected set of genes
  • FlatLineTest() : Function to perform the Flat Line Test that quantifies the statistical significance of an observed phylotranscriptomics pattern (significant deviation from a flat line = evolutionary signal)
  • ReductiveHourglassTest() : Function to perform the Reductive Hourglass Test that statistically evaluates the existence of a phylotranscriptomic hourglass pattern (hourglass model)
  • EarlyConservationTest() : Function to perform the Reductive Early Conservation Test that statistically evaluates the existence of a monotonically increasing phylotranscriptomic pattern (early conservation model)
  • EnrichmentTest() : Phylostratum or Divergence Stratum Enrichment of a given Gene Set based on Fisher's Test
  • bootMatrix() : Compute a Permutation Matrix for Test Statistics

All functions also include visual analytics tools to quantify the goodness of test statistics.
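The permutation idea behind a flat line style test can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the helper name flat_line_sketch() is hypothetical, and the sketch reports a simple empirical permutation p-value for the variance of the TAI profile, which may differ from how FlatLineTest() derives its p-value internally.

```r
set.seed(42)

# Simulated (hypothetical) data: 200 genes, 7 stages, 12 phylostrata.
n_genes  <- 200
n_stages <- 7
ps   <- sample(1:12, n_genes, replace = TRUE)
expr <- matrix(rexp(n_genes * n_stages), nrow = n_genes)

# Illustrative helper (not part of myTAI).
flat_line_sketch <- function(ps, expr, permutations = 1000) {
  # expression-weighted mean phylostratum per stage
  tai <- function(p) colSums(p * expr) / colSums(expr)

  # test statistic: variance of the profile across stages
  # (a perfectly flat profile has variance 0)
  observed_var <- var(tai(ps))

  # null distribution: shuffle the gene-to-phylostratum assignment
  permuted_var <- replicate(permutations, var(tai(sample(ps))))

  # empirical p-value with the usual +1 correction
  p_value <- (sum(permuted_var >= observed_var) + 1) / (permutations + 1)

  list(observed_var = observed_var, p_value = p_value)
}

result <- flat_line_sketch(ps, expr)
print(result)
```

Because the simulated expression values carry no relationship to gene age, the observed variance is drawn from the same distribution as the permuted ones; with real data, a small p-value would indicate that the profile deviates from a flat line more than random age assignments would.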

  • DiffGenes() : Implements Popular Methods for Differential Gene Expression Analysis
  • CollapseReplicates() : Combine Replicates in an ExpressionSet
  • CombinatorialSignificance() : Compute the Statistical Significance of Each Replicate Combination
  • Expressed() : Filter Expression Levels in Gene Expression Matrices (define expressed genes)
  • SelectGeneSet() : Select a Subset of Genes in an ExpressionSet
  • PlotReplicateQuality() : Plot the Quality of Biological Replicates
  • GroupDiffs() : Quantify the significant differences between gene expression distributions of PS or DS groups
  • taxonomy() : Retrieve Taxonomic Information for any Organism of Interest
  • MatchMap() : Match a Phylostratigraphic Map or Divergence Map with an ExpressionMatrix
  • tf() : Transform Gene Expression Levels
  • age.apply() : Age Category Specific apply Function
  • ecScore() : Compute the Hourglass Score for the EarlyConservationTest
  • geom.mean() : Geometric Mean
  • harm.mean() : Harmonic Mean
  • omitMatrix() : Compute TAI or TDI Profiles Omitting a Given Gene
  • rhScore() : Compute the Hourglass Score for the Reductive Hourglass Test

The developer version of myTAI might include more functionality than the stable version on CRAN. Hence users can download the current developer version of myTAI by typing:

# The developer version can be installed directly from github:
# install.packages("devtools")
# install developer version of myTAI
devtools::install_github("HajkD/myTAI", build_vignettes = TRUE, dependencies = TRUE)
# On Windows, this won't work - see ?build_github_devtools
# install_github("HajkD/myTAI", build_vignettes = TRUE, dependencies = TRUE)
# When working with Windows, first you need to install the
# R package: rtools -> http://cran.r-project.org/bin/windows/Rtools/
# or consult: http://www.rstudio.com/products/rpackages/devtools/
# Afterwards you can install devtools -> install.packages("devtools")
# and then you can run:
devtools::install_github("HajkD/myTAI", build_vignettes = TRUE, dependencies = TRUE)
# and then call it from the library
library("myTAI", lib.loc = "C:/Program Files/R/R-3.1.1/library")

Domazet-Lošo T. and Tautz D. A phylogenetically based transcriptome age index mirrors ontogenetic divergence patterns. Nature (2010) 468: 815-8.

Quint M, Drost HG, et al. A transcriptomic hourglass in plant embryogenesis. Nature (2012) 490: 98-101.

Drost HG, Gabel A, Grosse I, Quint M. Evidence for Active Maintenance of Phylotranscriptomic Hourglass Patterns in Animal and Plant Embryogenesis. Mol. Biol. Evol. (2015) 32 (5): 1221-1231.

Drost HG, Bellstädt J, Ó'Maoiléidigh DS, Silva AT, Gabel A, Weinholdt C, Ryan PT, Dekkers BJW, Bentsink L, Hilhorst H, Ligterink W, Wellmer F, Grosse I, and Quint M. Post-embryonic hourglass patterns mark ontogenetic transitions in plant development. Mol. Biol. Evol. (2016) doi:10.1093/molbev/msw039

I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.

Furthermore, in case you find some bugs or need additional (more flexible) functionality of parts of this package, please let me know:


I would like to thank several individuals for making this project possible.

First I would like to thank Ivo Grosse and Marcel Quint for providing me a place and the environment to be able to work on fascinating topics of Evo-Devo research and for the fruitful discussions that led to projects like this one.

Furthermore, I would like to thank Alexander Gabel and Jan Grau for valuable discussions on how to improve some methodological concepts of some analyses present in this package.

I would also like to thank the Master's students Sarah Scharfenberg, Anne Hoffmann, and Sebastian Wussow, who worked intensively with this package and helped me to improve the usability and logic of the package environment.


myTAI 0.4.0

  • a new function PlotSelectedAgeDistr() allowing users to visualize the PS or DS gene distribution of a subset of genes stored in the input ExpressionSet object
  • a new function PlotGroupDiffs() allowing users to plot the significant differences between gene expression distributions of PS or DS groups
  • a new function GroupDiffs() allowing users to perform statistical tests to quantify the gene expression level differences between all genes of defined PS or DS groups
  • PlotDistribution() now uses ggplot2 to visualize the PS or DS distribution and is also based on the new function PlotSelectedAgeDistr(); furthermore it loses arguments plotText and ... and gains a new argument legendName

  • remove arguments 'main.text' and '...' from PlotCorrelation()

  • PlotCorrelation() is now based on ggplot2

  • PlotGroupDiffs() receives a new argument gene.set allowing users to statistically quantify the group specific PS/DS differences of a selected set of genes

  • analogously to PlotGroupDiffs() the function GroupDiffs() also receives a new argument gene.set allowing users to statistically quantify the group specific PS/DS differences of a selected set of genes

  • Fixing wrong x-axis labeling in PlotCategoryExpr() when type = "stage-centered" is specified

  • PlotCategoryExpr() now also prints out the PS/DS absolute frequency distribution of the selected gene.set

myTAI 0.3.0

  • adding examples for PlotCategoryExpr() to Advanced Vignette
  • adding examples for PlotReplicateQuality() to Expression vignette
  • a new function PlotCategoryExpr() allowing users to plot the expression levels of each age or divergence category as boxplot, dot plot or violin plot
  • a new function PlotReplicateQuality() allowing users to visualize the quality of biological replicates

myTAI 0.2.1

  • fixed a wrong example in the Enrichment vignette (https://github.com/HajkD/myTAI/commit/8d52fd60c274361dc9028dec3409abf60a738d8a)
  • PlotGeneSet() and SelectGeneSet() now have a new argument use.only.map specifying whether a Phylostratigraphic Map or Divergence Map is passed to the function instead of a standard ExpressionSet
  • a wrong version of the edgeR Bioconductor package was imported, causing version 0.2.0 to fail R CMD Check on Unix-based systems

myTAI 0.2.0

  • adding new vignette Taxonomy providing step-by-step instructions on retrieving taxonomic information for any organism of interest

  • adding new vignette Expression Analysis providing use cases to perform gene expression data analysis with myTAI

  • adding new vignette Enrichment providing step-by-step instructions on how to perform PS and DS enrichment analyses with PlotEnrichment()

  • adding examples for pStrata(), pMatrix(), pTAI(), pTDI(), and PlotContribution() to the Introduction Vignette

  • a new function taxonomy() allows users to retrieve taxonomic information for any organism of interest; this function has been taken from the biomartr package and was removed from biomartr afterwards. Please note that in myTAI version 0.1.0 the Introduction vignette referred to the taxonomy() function in biomartr. This is no longer the case (since myTAI version 0.2.0), because taxonomy() is now implemented in myTAI.

  • the new taxonomy() function is based on the powerful R package taxize.

  • a new function SelectGeneSet() allows users to quickly select a subset of genes in an ExpressionSet

  • a new function DiffGenes() allows users to perform differential gene expression analysis with ExpressionSet objects

  • a new function EnrichmentTest() allows users to perform a Fisher's exact test based enrichment analysis of over or underrepresented Phylostrata or Divergence Strata within a given gene set without having to plot the result

  • a new function PlotGeneSet() allows users to visualize the expression profiles of a given gene set

  • a new function PlotEnrichment() allows users to visualize the Phylostratum or Divergence Stratum enrichment of a given Gene Set as well as computing Fisher's exact test to quantify the statistical significance of enrichment

  • a new function PlotContribution() allows users to visualize the Phylostratum or Divergence Stratum contribution to the global TAI/TDI pattern

  • a new function pTAI() allows users to compute the phylostratum contribution to the global TAI pattern

  • a new function pTDI() allows users to compute the divergence stratum contribution to the global TDI pattern

  • FilterRNASeqCT() has been renamed to Expressed() allowing users to apply this filter function to RNA-Seq data as well as to microarray data
  • PlotRE() and PlotMeans() are now based on colors from the RColorBrewer package (default)
  • PlotRE() and PlotMeans() now have a new argument colors allowing users to choose custom colors for the visualized relative or mean expression profiles
  • geom.mean() and harm.mean() now are external functions accessible to the myTAI user

myTAI 0.1.0

  • now all functions have unit tests
  • a new function pStrata() allows users to compute partial TAI/TDI values for all Phylostrata or Divergence Strata

  • a new function CollapseReplicates() allows users to combine replicate expression levels in ExpressionSet objects

  • a new function FilterRNASeqCT() allows users to filter expression levels of ExpressionSet objects deriving from RNA-Seq count tables

  • function MatchMap() now receives a new argument remove.duplicates allowing users to delete duplicate gene ids (that might be stored in the input PhyloMap or DivergenceMap) during the process of matching a Map with an ExpressionSet

  • FlatLineTest(), ReductiveHourglassTest(), EarlyConservationTest(), and PlotPattern() implement a new argument custom.perm.matrix allowing users to pass their own (custom) permutation matrix to the corresponding function. All subsequent test statistics and p-value/std.dev computations are then based on this custom permutation matrix

  • EarlyConservationTest() and ReductiveHourglassTest() now have a new parameter gof.warning allowing users to choose whether or not non significant goodness of fit results should be printed as warning

  • now when specifying TestStatistic = NULL in PlotPattern(), only the TAI/TDI profile is drawn (without performing any statistical test); this is equivalent to performing: plot(TAI(PhyloExpressionSetExample))

  • function combinatorialSignificance() is now named CombinatorialSignificance()

  • changing the title and description of the myTAI package

  • some minor changes in vignettes and within the documentation of functions

myTAI 0.0.2

  • combinatorialSignificance(), FlatLineTest(), ReductiveHourglassTest(), and EarlyConservationTest() now support multicore processing

  • MatchMap() has been entirely rewritten and is now based on dplyr; additionally it now has a new argument accumulate that allows you to accumulate multiple expression levels to a unique expression level for a unique gene id

All three Vignettes: Introduction, Intermediate, and Advanced have been updated and extended.

  • two small bugs in ReductiveHourglassTest() and EarlyConservationTest() have been fixed; they caused only 1 plot to be generated instead of the expected 3 or 4 plots (par(mfrow=c(1,3)) or par(mfrow=c(2,2)))

  • a small bug in PlotMeans() has been fixed that caused a wrong y-axis label to be shown when plotting only one group of Phylostrata or Divergence Strata

myTAI 0.0.1

Introducing myTAI 0.0.1:

A framework to perform phylotranscriptomics analyses for Evolutionary Developmental Biology research.
