Implements routines for metagenome sample taxonomy assignments collection, aggregation, and visualization. Accepts the EDGE-formatted output from GOTTCHA/GOTTCHA2, BWA, Kraken, MetaPhlAn, DIAMOND, and Pangia. Produces SVG and PDF heatmap-like plots comparing taxa abundances across projects.
Metagenome taxonomy assignment comparison toolkit. The toolkit is being developed for EDGE platform and reflects its backend specificity. The routines, however, can be used as a stand-alone library for multi-project comparative visualization of taxonomy assignments obtained for metagenomic samples processed with GOTTCHA/GOTTCHA2, BWA, KRAKEN, or METAPHLAN.
install.packages("devtools") library(devtools) install_github(repo = 'seninp-bioinfo/MetaComp', ref = "v1.3")
to use the library, simply load it into R environment:
the_gottcha_assignment <- load_gottcha_assignment(data_file_g) the_kraken_assignment <- load_kraken_assignment(data_file_k) the_metaphlan_assignment <- load_metaphlan_assignment(data_file_m)
The package functions
xxx stands for gottcha, kraken, or metaphlan) are designed to read a tool-specific assignment files. The configuration file for these functions must be tab-delimeted two columns file where the first column is the project id (used as the project's name in plotting), and the second column is an actual assignment file path:
the_assignments_list_g <- load_gottcha_assignments(config_file_g) the_assignments_list_k <- load_kraken_assignments(config_file_k) the_assignments_list_m <- load_metaphlan_assignments(config_file_m)
merge_xxx_assignments function is capable to merge a named list of GOTTCHA, Kraken, or MetaPhlAn assignments into a single table using
TAXA columns as ids.
plot_xxx_assignment accepts a single assignment table and outputs a ggplot object or produces a PDF plot using ggplot2's
plot_merged_assignment accepts a single merged assignment table as an input and outputs a ggplot object or produces a PDF plot using ggplot2's
The following script can be used to run the merge procedure in a batch mode:
# load library require(MetaComp) # # configure runtime options(echo = TRUE) args <- commandArgs(trailingOnly = TRUE) # # print provided args print(paste("provided args: ", args)) # # acquire values srcFile <- args destFile <- args taxonomyLevelArg <- args plotTitleArg <- args plotFileArg <- args rowLimitArg <- args sortingOrderArg <- args # # read the data and produce the merged table merged <- merge_gottcha_assignments(load_gottcha_assignments(srcFile)) # # write the merge table as a TAB-delimeted file write.table(merged, file = destFile, col.names = T, row.names = F, quote = T, sep = "\t") # # produce a PDF of the merged assignment plot_merged_assignment(assignment = merged, taxonomy_level = taxonomyLevelArg, sorting_order = sortingOrderArg, row_limit = base::strtoi(rowLimitArg), plot_title = plotTitleArg, filename = plotFileArg)
To execute the scrip, use Rscript as shown below:
$> Rscript merge_and_plot_gottcha_assignments.R assignments_table_gottcha.txt merged_assignments.txt \ family "Merge test plot" merge_test 20 alphabetical
this command line arguments are (some of these are clickable -- so you can see examples):
Rscript- a way to execute the R script
merge_and_plot_gottcha_assignments.R- the above script filename
assignments_table_gottcha.txt- the tab delimeted table of assignments (two columns:
merged_assignments_gottcha.txt- the tab-delimeted output file name
family- a LEVEL at which the plot should be produced
"Merge test plot"- the output plot's title
merge_test- the output plot filename mask,
".svg"files will be produced...
20the max number of rows to plot (in the specified sorting order)
alphabeticalthe merged plot sorting order