Population Assignment using Genetic, Non-Genetic or Integrated Data in a Machine Learning Framework

Uses Monte-Carlo and K-fold cross-validation coupled with machine-learning classification algorithms to perform population assignment. It can also evaluate the discriminatory power of independent training samples, identify informative loci, reduce the dimensionality of genomic data, integrate genetic and non-genetic data, and visualize results.


Description

This R package helps perform population assignment and infer population structure using a machine-learning framework. It employs supervised machine-learning methods to evaluate the discriminatory power of data collected from source populations, and it can analyze large genetic, non-genetic, or integrated (genetic plus non-genetic) data sets. The framework is designed to address the upward-bias issue discussed in previous studies. The main features are listed below.

  • Use principal component analysis (PCA) for dimensionality reduction (or data transformation)
  • Use Monte-Carlo cross-validation to estimate mean and variance of assignment accuracy
  • Use K-fold cross-validation to estimate membership probability
  • Resample training datasets of various sizes (proportions or fixed numbers of individuals, and proportions of loci)
  • Choose among various proportions of training loci, sampled either randomly or based on locus Fst values
  • Provide several machine-learning classification algorithms, including LDA, SVM, naive Bayes, decision tree, and random forest, to build tunable predictive models
  • Output results in publication-quality plots that can be modified using ggplot2 functions
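
The two cross-validation schemes listed above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not assignPOP's own code: it uses the built-in iris data as a stand-in for individuals from three source populations, PCA plus LDA (from MASS) as the classifier, and a hypothetical helper named fit_predict. The PCA is fitted on the training individuals only, so the held-out individuals never influence the transformation.

```r
library(MASS)  # provides lda(); MASS ships with standard R installations

set.seed(42)
x   <- iris[, 1:4]    # stand-in for per-individual features (assumption)
pop <- iris$Species   # stand-in for known source populations (assumption)

# Hypothetical helper: fit PCA + LDA on the training rows only, then return
# posterior membership probabilities for the held-out rows.
fit_predict <- function(train_idx, test_idx, n_pcs = 2) {
  pca       <- prcomp(x[train_idx, ], center = TRUE, scale. = TRUE)
  train_pcs <- pca$x[, seq_len(n_pcs), drop = FALSE]
  test_pcs  <- predict(pca, x[test_idx, ])[, seq_len(n_pcs), drop = FALSE]
  model     <- lda(train_pcs, grouping = pop[train_idx])
  predict(model, test_pcs)$posterior
}

# Monte-Carlo cross-validation: repeatedly draw 70% of each population as
# training data, test on the rest, and record the assignment accuracy.
mc_accuracy <- replicate(20, {
  train_idx <- unlist(lapply(split(seq_along(pop), pop),
                             function(i) sample(i, round(0.7 * length(i)))))
  test_idx  <- setdiff(seq_along(pop), train_idx)
  post      <- fit_predict(train_idx, test_idx)
  assigned  <- colnames(post)[max.col(post, ties.method = "first")]
  mean(assigned == pop[test_idx])
})
print(c(mean = mean(mc_accuracy), variance = var(mc_accuracy)))

# K-fold cross-validation: every individual is held out exactly once, which
# yields one membership-probability vector per individual.
k          <- 5
fold       <- sample(rep(seq_len(k), length.out = length(pop)))
membership <- matrix(NA_real_, nrow = length(pop), ncol = nlevels(pop),
                     dimnames = list(NULL, levels(pop)))
for (f in seq_len(k)) {
  test_idx <- which(fold == f)
  membership[test_idx, ] <- fit_predict(which(fold != f), test_idx)
}
print(head(round(membership, 3)))
```

Monte-Carlo resampling gives a distribution of accuracies (hence a mean and variance), while K-fold testing gives exactly one out-of-sample prediction per individual, which is why it suits membership-probability estimates.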

Install assignPOP

You can install the released version from CRAN or the up-to-date version from this GitHub repository.

  • To install from CRAN

    • Simply enter install.packages("assignPOP") in your R console
  • To install from GitHub

    • Step 1. Install the devtools package by entering install.packages("devtools")
    • Step 2. Load the package with library(devtools)
    • Step 3. Enter install_github("alexkychen/assignPOP")

Note: When you install the package from GitHub, you may need to install additional packages before assignPOP can be installed successfully. Follow the hints that R provides, then re-run install_github("alexkychen/assignPOP").
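
The installation steps can also be wrapped in a small script. The sketch below is one possible way to do it. The function name install_assignpop is hypothetical and not part of the package; it uses only install.packages(), requireNamespace(), and devtools::install_github(). Nothing is installed until the function is called.

```r
# Hypothetical helper (not part of assignPOP): install the package from the
# chosen source. Defining the function has no side effects; call it to install.
install_assignpop <- function(source = c("cran", "github")) {
  source <- match.arg(source)
  if (source == "cran") {
    # Released version from CRAN
    install.packages("assignPOP")
  } else {
    # Up-to-date version from GitHub; devtools is installed first if missing
    if (!requireNamespace("devtools", quietly = TRUE)) {
      install.packages("devtools")
    }
    devtools::install_github("alexkychen/assignPOP")
  }
  # Report (invisibly) whether the package can now be loaded
  invisible(requireNamespace("assignPOP", quietly = TRUE))
}
```

For example, install_assignpop("github") installs the development version, and install_assignpop() defaults to the CRAN release.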

Package tutorial

Please visit our tutorial website for more information.

What's new

Changes in ver. 1.1.2

  • 2017.5.13 Rename function read.genpop to read.Genepop; add function read.Structure.
  • 2017.5.2 Update function read.genpop, which can now read haploid data.

Citation

Chen, K-Y., Marschall, E.A., Sovic, M.G., Fries, A.C., Gibbs, H.L., Ludsin, S.A. (2017) assignPOP: Population Assignment using Genetic, Non-Genetic or Integrated Data in a Machine-learning Framework. R package version 1.1.2. https://CRAN.R-project.org/package=assignPOP

Previous versions

Previous versions of the package can be found and downloaded from the archive branch.

1.1.4 by Kuan-Yu (Alex) Chen, 3 months ago


https://github.com/alexkychen/assignPOP, http://alexkychen.github.io/assignPOP/


Browse source code at https://github.com/cran/assignPOP


Authors: Kuan-Yu (Alex) Chen [aut, cre], Elizabeth A. Marschall [aut], Michael G. Sovic [aut], Anthony C. Fries [aut], H. Lisle Gibbs [aut], Stuart A. Ludsin [aut]


GPL (>= 2) license


Imports caret, doParallel, e1071, foreach, ggplot2, MASS, parallel, randomForest, reshape2, stringr, tree

Suggests gtable, iterators, klaR, stringi, knitr, rmarkdown, testthat

