Simulation is a critical part of method development and assessment in quantitative genetics. 'PhenotypeSimulator' allows for the flexible simulation of phenotypes under different models, including genetic variant and infinitesimal genetic effects (reflecting population structure) as well as non-genetic covariate effects, observational noise and additional correlation effects. The different phenotype components are combined into a final phenotype while controlling for the proportion of variance explained by each of the components. For each effect component, the number of variables, their distribution and the design of their effect across traits can be customised. For the simulation of the genetic effects, external genotype data from a number of standard software packages ('plink', 'hapgen2'/'impute2', 'genome', 'bimbam', simple text files) can be imported. The final simulated phenotypes and their components can be automatically saved into .rds or .csv files. In addition, they can be saved in formats compatible with commonly used genetic association software ('gemma', 'bimbam', 'plink', 'snptest', 'LiMMBo').
PhenotypeSimulator allows for the flexible simulation of phenotypes from different genetic and non-genetic (noise) components.
In quantitative genetics, genotype to phenotype mapping is commonly realised by fitting a linear model with the genotype as the explanatory variable and the phenotype as the response variable. Other explanatory variables, such as additional sample measures (e.g. age, height, weight) or batch effects, can also be included. Linear mixed models add random effect components to the fixed effects of the genotype and the covariates; these random effects account for population structure in the study cohort or for environmental effects. The application of linear and linear mixed models in quantitative genetics ranges from genetic studies in model organisms such as yeast and Arabidopsis thaliana to human molecular, morphological or imaging-derived traits. Developing new methods for larger sample cohorts, more phenotypic measurements or more complex phenotypes often requires the simulation of datasets with a specific underlying phenotype structure.
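The simplest case described above, a single genotype as the explanatory variable and a single phenotype as the response, can be sketched in a few lines. This is a conceptual illustration in plain Python, not part of PhenotypeSimulator (which is an R package); the function name `fit_simple_linear_model` and the values of `allele_freq` and `true_effect` are assumptions chosen for the example.

```python
import random


def fit_simple_linear_model(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares around the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


rng = random.Random(1)
n_samples = 500
allele_freq = 0.3   # hypothetical minor allele frequency
true_effect = 0.5   # hypothetical effect size of the variant

# A biallelic genotype coded as 0, 1 or 2 copies of the minor allele.
genotypes = [
    sum(rng.random() < allele_freq for _ in range(2)) for _ in range(n_samples)
]
# Phenotype = genetic effect + standard normal observational noise.
phenotypes = [true_effect * g + rng.gauss(0.0, 1.0) for g in genotypes]

intercept, slope = fit_simple_linear_model(genotypes, phenotypes)
print(f"estimated effect: {slope:.3f} (simulated with {true_effect})")
```

Covariates and random effects extend this model with further terms; the sketch only shows why simulated data with a known effect size is useful for checking that a method recovers it.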
PhenotypeSimulator allows for the simulation of complex phenotypes under different models, including genetic variant effects and infinitesimal genetic effects (reflecting population structure) as well as correlated, non-genetic covariates and observational noise effects. Different phenotypic effects can be combined into a final phenotype while controlling for the proportion of variance explained by each of the components. For each component, the number of variables, their distribution and the design of their effect across traits can be customised.
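One way to combine components while controlling the proportion of variance each explains is to rescale every component to its target variance before summing. The sketch below illustrates that idea in plain Python under stated assumptions: the helper names `rescale_to_variance` and `combine_components`, the component names and the proportions are all hypothetical, and the code is not taken from the package.

```python
import random
import statistics


def rescale_to_variance(values, target_variance):
    """Centre a component and scale it so its sample variance equals the target."""
    mean = statistics.fmean(values)
    variance = statistics.variance(values)
    factor = (target_variance / variance) ** 0.5
    return [(v - mean) * factor for v in values]


def combine_components(components, proportions):
    """Sum the components after rescaling each to its share of the variance."""
    if abs(sum(proportions.values()) - 1.0) > 1e-9:
        raise ValueError("variance proportions must sum to 1")
    n = len(next(iter(components.values())))
    total = [0.0] * n
    for name, values in components.items():
        scaled = rescale_to_variance(values, proportions[name])
        total = [t + s for t, s in zip(total, scaled)]
    return total


rng = random.Random(42)
n_samples = 2000
# Three independent toy components on deliberately different scales.
components = {
    "genetic": [rng.gauss(0.0, 3.0) for _ in range(n_samples)],
    "covariate": [rng.uniform(0.0, 10.0) for _ in range(n_samples)],
    "noise": [rng.gauss(0.0, 1.0) for _ in range(n_samples)],
}
proportions = {"genetic": 0.4, "covariate": 0.1, "noise": 0.5}

phenotype = combine_components(components, proportions)
print(f"total phenotype variance: {statistics.variance(phenotype):.3f}")
```

Because the toy components are drawn independently, their variances add up and the total variance is close to 1; correlated components would contribute covariance terms that this simple rescaling does not account for.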
Full documentation of PhenotypeSimulator is available at http://HannahVMeyer.github.io/PhenotypeSimulator/.
The current GitHub version of PhenotypeSimulator is 0.3.1; it can be installed directly from the GitHub repository.
The current CRAN version of PhenotypeSimulator is 0.2.2 (soon to be updated).
A log of version changes can be found here.
Meyer HV & Birney E (2018) PhenotypeSimulator: A comprehensive framework for simulating multi-trait, multi-locus genotype to phenotype relationships. Bioinformatics, 34(17):2951–2956.
Genotype simulation and kinship estimation: the functions for genotype simulation and kinship estimation have been rewritten, giving significant speed-ups in computation time (see benchmarking).
geneticFixedEffects and noiseFixedEffects:
The effect size distributions of the shared effects are now modelled either as the product of two exponential distributions (to yield an approximately uniform distribution) or as the product of a normal distribution with user-specified parameters and a standard normal distribution.
The independent effects can now be specified to affect the same subset or different subsets of traits (via keepSameIndependent).
The overall number of traits affected by the effects can now be specified via pTraitsAffected.
correlatedBgEffects: the additional correlation between the traits can be specified by the user by providing an external correlation matrix.
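The two shared effect size distributions described for geneticFixedEffects and noiseFixedEffects can be sketched as follows. The sketch is plain Python, not the package's R code. It assumes, purely for illustration, that one factor is drawn per variable and one per trait and that a shared effect is their product; the function name `shared_effect_matrix` and its parameters are hypothetical.

```python
import random


def shared_effect_matrix(n_variables, n_traits, distribution, rng,
                         mean=0.0, sd=1.0, rate=1.0):
    """Return an n_variables x n_traits matrix of shared effect sizes.

    Assumption for this sketch: each entry is the product of a
    per-variable factor and a per-trait factor.
    """
    if distribution == "exponential":
        # Product of two exponential draws.
        variable_factors = [rng.expovariate(rate) for _ in range(n_variables)]
        trait_factors = [rng.expovariate(rate) for _ in range(n_traits)]
    elif distribution == "normal":
        # Normal draw with user-specified parameters times a standard normal draw.
        variable_factors = [rng.gauss(mean, sd) for _ in range(n_variables)]
        trait_factors = [rng.gauss(0.0, 1.0) for _ in range(n_traits)]
    else:
        raise ValueError("distribution must be 'exponential' or 'normal'")
    return [[v * t for t in trait_factors] for v in variable_factors]


rng = random.Random(7)
exp_effects = shared_effect_matrix(3, 4, "exponential", rng)
norm_effects = shared_effect_matrix(3, 4, "normal", rng, mean=0.0, sd=0.5)

for row in exp_effects:
    print(" ".join(f"{value:7.3f}" for value in row))
```

Under this assumption every variable affects all traits with the same relative pattern, which is one way to read "shared"; the exponential variant yields only non-negative effects, while the normal variant allows effects of either sign.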