Description: Tools to investigate how demographic parameters, population genetics and abiotic conditions affect the rate of adaptation.
Description: AlphaDrop is a very simple software package for simulating genomic selection or GWAS data. It simply drops simulated haplotypes through a pedigree. The haplotypes are simulated using MaCS (Chen et al., 2009). AlphaDrop can simulate sequence and SNP data, pedigrees, QTL effects, and breeding values. Pre-specified pedigrees can also be supplied.
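AlphaDrop's core idea, dropping haplotypes through a pedigree, is the classic gene-dropping technique. The Python sketch below illustrates it under stated assumptions and is not AlphaDrop's code. Founder haplotypes come from a hard-coded pool, whereas AlphaDrop would use MaCS output. Crossovers follow a Poisson process along a chromosome measured in Morgans, with no interference. The function names and the pedigree layout (`id, sire, dam` rows, parents listed first) are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Haplotype = List[int]                 # alleles (0/1) at ordered marker positions
Diploid = Tuple[Haplotype, Haplotype]


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson(lam) draw: count unit-rate exponential arrivals before lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def make_gamete(parent: Diploid, positions: List[float],
                length_morgans: float, rng: random.Random) -> Haplotype:
    """One recombinant gamete: Poisson crossovers at uniform positions."""
    breakpoints = sorted(rng.uniform(0.0, length_morgans)
                         for _ in range(poisson(rng, length_morgans)))
    strand = rng.randrange(2)         # parental haplotype the gamete starts on
    gamete: Haplotype = []
    b = 0
    for i, pos in enumerate(positions):   # positions must be sorted ascending
        while b < len(breakpoints) and breakpoints[b] < pos:
            strand ^= 1               # each crossover switches the template haplotype
            b += 1
        gamete.append(parent[strand][i])
    return gamete


def drop_through_pedigree(
    pedigree: List[Tuple[str, Optional[str], Optional[str]]],
    founder_haplotypes: List[Haplotype],
    positions: List[float],
    length_morgans: float,
    seed: int = 0,
) -> Dict[str, Diploid]:
    """Rows are (id, sire, dam) with parents before offspring. Anyone with an
    unknown parent is treated as a founder and takes the next two pool haplotypes."""
    rng = random.Random(seed)
    pool = iter(founder_haplotypes)
    genomes: Dict[str, Diploid] = {}
    for ind, sire, dam in pedigree:
        if sire is None or dam is None:
            genomes[ind] = (next(pool), next(pool))
        else:
            genomes[ind] = (make_gamete(genomes[sire], positions, length_morgans, rng),
                            make_gamete(genomes[dam], positions, length_morgans, rng))
    return genomes


founders = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
positions = [0.1, 0.3, 0.6, 0.9]      # marker positions in Morgans
pedigree = [("A", None, None), ("B", None, None), ("C", "A", "B")]
genomes = drop_through_pedigree(pedigree, founders, positions, 1.0, seed=1)
print(genomes["C"])
```

Every allele in C's first haplotype is copied from one of A's two haplotypes at that marker, and every allele in the second from B's. Founder A carries an all-0 and an all-1 haplotype, so any crossover pattern shows up directly as switches between 0s and 1s.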
Description: AnA-FiTS is an efficient tool for simulating polymorphism data forward in time at the chromosome and genome level. Its most striking feature is high runtime efficiency, especially when part of the simulated sequence is neutral. Furthermore, for the neutral part of the sequence, AnA-FiTS stores (and outputs) a graph structure from which the ancestral part of each haplotype that survived into the present can be reconstructed at any point in time.
Description: ART is a set of simulation tools that generate synthetic next-generation sequencing data by mimicking the real sequencing process with empirical error models or quality profiles. ART supports simulation of single-end, paired-end and mate-pair reads from three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can perform regular genome sequencing simulation as well as amplicon sequencing simulation. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in FASTQ format and alignments in ALN/MAP and/or SAM format. ART can also generate alignments in UCSC BED format.
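The following is a deliberately minimal Python sketch that shows the shape of what a read simulator emits. It is not ART's algorithm. ART uses empirical, platform-specific error models and quality profiles. This sketch instead assumes a single uniform substitution-error rate with no indels, a constant Phred quality derived from that rate with a hypothetical cap of 41, and uniformly placed single-end reads. Only the FASTQ record layout (`@name`, sequence, `+`, qualities in Phred+33) is standard, and the function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

BASES = "ACGT"
Read = Tuple[str, str, str]  # (name, sequence, quality string)


def phred(error_rate: float) -> int:
    """Phred score Q = -10*log10(p) for 0 < p <= 1; the cap of 41 is an assumption."""
    return min(41, int(round(-10 * math.log10(error_rate))))


def simulate_reads(reference: str, n_reads: int, read_len: int,
                   error_rate: float, seed: int = 0) -> List[Read]:
    """Uniformly placed single-end reads with independent substitution errors."""
    rng = random.Random(seed)
    qual = chr(phred(error_rate) + 33) * read_len  # Phred+33 encoding
    reads: List[Read] = []
    for k in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        seq = []
        for base in reference[start:start + read_len]:
            if rng.random() < error_rate:
                # substitute with one of the other three bases
                base = rng.choice([b for b in BASES if b != base])
            seq.append(base)
        reads.append((f"read{k}_pos{start}", "".join(seq), qual))
    return reads


def to_fastq(reads: List[Read]) -> str:
    """Four lines per record: @name, sequence, '+', qualities."""
    return "".join(f"@{name}\n{seq}\n+\n{qual}\n" for name, seq, qual in reads)


reference = "ACGTTGCAAGGCTTAACCGT" * 10
print(to_fastq(simulate_reads(reference, 2, 30, 0.01, seed=7)))
```

Encoding the true start position in the read name mirrors, in a very reduced way, why simulators also emit alignments (SAM, ALN): they record the ground truth that a mapper is later scored against.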
Description: BAMSurgeon can add SNVs, INDELs and several forms of structural variant (SV) to existing BAM files, using multiple alignment methods, which makes it useful for testing mutation detection software in a variety of contexts.
Description: BayeSSC is powerful because it allows flexible coalescent modelling with a variety of different priors. This enables parameter estimation, likelihood calculations and Bayesian inference. Typically, BayeSSC generates thousands of hypothetical trees using slightly different population parameters. The simulated genetic data from these trees can then be compared with the actual genetic data from the user's samples to investigate which of the many simulated histories is most likely to have generated the samples.
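BayeSSC simulates many histories under priors and keeps those that resemble the observed data. In its simplest form this workflow is rejection-sampling Approximate Bayesian Computation. The sketch below is a toy version and not BayeSSC's implementation. It uses a single summary statistic (the number of segregating sites, S) and a uniform prior on the population mutation parameter θ = 4Nμ. It also assumes the standard neutral coalescent with no recombination and an absolute tolerance on S. All names are hypothetical.

```python
import random
from statistics import mean
from typing import List


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson(lam) draw by counting unit-rate exponential arrivals."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segsites(n: int, theta: float, rng: random.Random) -> int:
    """Segregating sites in a sample of n under the standard neutral coalescent.
    Time is in units of 2N generations, so mutations fall at rate theta/2 per lineage."""
    total_length = 0.0
    for i in range(n, 1, -1):
        # while i lineages remain, each of them accrues branch length T_i
        total_length += i * rng.expovariate(i * (i - 1) / 2)
    return poisson(rng, theta / 2 * total_length)


def abc_rejection(observed_s: int, n: int, prior_low: float, prior_high: float,
                  n_sims: int, tolerance: int, seed: int = 0) -> List[float]:
    """Keep prior draws of theta whose simulated S is within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted: List[float] = []
    for _ in range(n_sims):
        theta = rng.uniform(prior_low, prior_high)   # draw from the prior
        if abs(simulate_segsites(n, theta, rng) - observed_s) <= tolerance:
            accepted.append(theta)
    return accepted


posterior = abc_rejection(observed_s=15, n=10, prior_low=0.1, prior_high=20.0,
                          n_sims=5000, tolerance=1, seed=42)
print(len(posterior), round(mean(posterior), 2))
```

The accepted θ values approximate the posterior. Shrinking the tolerance or adding summary statistics sharpens the approximation at the cost of fewer acceptances per simulation.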
Description: BaySICS consists of five programs accessible from the same graphical interface. The first program performs coalescent simulations and creates reference tables containing summary statistics from simulated DNA alignments. The second and third programs perform post-simulation analyses on the reference tables to obtain parameter estimates and to perform model choice (hypothesis contrasts), respectively. The fourth and fifth programs perform validation procedures that assess the statistical power and the robustness of the inference by means of pseudo-observed datasets. BaySICS was designed to be user-friendly and to optimize studies of ancient DNA.
Description: By default, BEERS simulates either mouse or human paired-end RNA-Seq data modeled on the Illumina platform. It starts with a large number of gene models (approx. 500K) taken from about ten different published annotation efforts, and then chooses a fixed number of these genes at random (30,000 by default). This avoids biasing for or against any particular set of annotations. BEERS then introduces substitutions, indels, alternative splice forms, sequencing errors and intron signal. BEERS can also simulate strand-specific reads. BEERS does not simulate quality scores. Four configuration files are required.
Description: Population bottlenecks reduce genetic diversity and thus cause great concern in conservation biology. Previous theoretical studies often assume discrete generations in projecting declines in genetic diversity caused by bottlenecks. This assumption creates complexities when applying the models to long-lived species with overlapping generations. BottleSim is a program for simulating bottlenecks to estimate the impact on genetic diversity; the novelties include an overlapping-generation model, a wide range of reproductive systems, and flexible population size settings. With these features, BottleSim will be a useful tool for estimating the genetic consequences of bottlenecks, evaluating conservation plans, and performing power analysis.
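BottleSim's overlapping-generation model is best seen against the classic discrete-generation expectation. In that simpler model, each generation retains a fraction 1 - 1/(2N_i) of the expected heterozygosity, so H_t = H_0 times the product over generations i of (1 - 1/(2N_i)), where N_i is the effective size in generation i. The short Python function below just evaluates that product. It is the simplified baseline the description criticises, not BottleSim's method, and the population history in the example is made up.

```python
from typing import Sequence


def expected_heterozygosity(h0: float, sizes: Sequence[float]) -> float:
    """H_t = H_0 * prod(1 - 1/(2 N_i)) over discrete generations with effective sizes N_i."""
    h = h0
    for n in sizes:
        h *= 1 - 1 / (2 * n)   # drift removes 1/(2N) of the heterozygosity per generation
    return h


# Hypothetical history: 3 generations at N_e = 1000, a 5-generation crash to 10, recovery.
history = [1000] * 3 + [10] * 5 + [1000] * 3
h = expected_heterozygosity(0.8, history)
print(round(h, 4), f"{100 * (1 - h / 0.8):.1f}% of initial heterozygosity lost")
```

Nearly all of the loss comes from the five small generations. In long-lived species those generations are spread over fewer years, and parents and offspring coexist, which is precisely what a discrete-generation formula cannot represent.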
Description: CASS provides simulated protein (codon) sequences from a population genetic context with a protein structure-dependent explicit genotype-phenotype map.
Description: CDPOP (Cost Distance POPulations) is an individual-based simulator of gene flow in complex landscapes, intended to explain observed population responses and provide a foundation for landscape genetics. It models genetic exchange among spatially located individuals as a function of individual-based movement through mating and dispersal, incorporating population dynamics and all the factors that affect the frequency of an allele in a population (mutation, gene flow, genetic drift and selection). Users initially specify individual locations, environmental conditions governing gene flow, spatially explicit fitness landscapes governing selection, and various genic configurations; CDPOP then models divergence through time as a function of individual-based movement, breeding and dispersal, each governed by the given landscape surfaces.
Description: Allows biology students to apply lessons in Mendelian genetics to real-world situations.
Description: CoalFace has been developed predominantly as a teaching tool, yet some basic coalescent simulation analyses can be performed. Both Windows and Linux (Intel) executables are available.
Description: CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models. It effectively constructs the ancestral recombination graph for a given number of individuals and uses this to simulate samples of SNP, microsatellite and other haplotypes/genotypes. The generated sample can afterwards be split into cases and controls, depending on the states of selected individual markers. The tool can accordingly also be used to construct case and control data sets for association studies. CoaSim is written in C++, Guile Scheme and Python, and is available as source code (under the GNU General Public License, GPL) and as binary Linux RPM files. The source code has been successfully compiled on various Linux and UNIX systems, under OS X, and under Windows with Cygwin. As I have only limited access to architectures other than Linux, it is not possible for me to make binary distributions for other platforms, but if anyone is willing to build the distributions I will be more than happy to put them on this site.
Description: Population genetic models play an important role in human genetic research, connecting empirical observations about sequence variation with hypotheses about underlying historical and biological causes. More specifically, models are used to compare empirical measures of sequence variation, linkage disequilibrium (LD), and selection to expectations under a "null" distribution. In the absence of detailed information about human demographic history, and about variation in mutation and recombination rates, simulations have of necessity used arbitrary models, usually simple ones. With the advent of large empirical data sets, it is now possible to calibrate population genetic models with genome-wide data, permitting for the first time the generation of data that are consistent with empirical data across a wide range of characteristics. We present here the first such calibrated model and show that, while still arbitrary, it successfully generates simulated data (for three populations) that closely resemble empirical data in allele frequency, linkage disequilibrium, and population differentiation. No assertion is made about the accuracy of the proposed historical and recombination model, but its ability to generate realistic data meets a long-standing need among geneticists. We anticipate that this model, for which software is publicly available, and others like it will have numerous applications in empirical studies of human genetics.
Description: cosi2 is an efficient coalescent simulator with support for selection, population structure, variable recombination rates, and gene conversion. It supports exact and approximate simulation modes.
Description: CS-PSeq-Gen is a program derived from PSeq-Gen, a program developed by Nick C. Grassly and Andrew Rambaut to simulate the evolution of protein sequences along evolutionary trees. The CS-PSeq-Gen modifications aim at simulating the evolution of protein sequences under the constraints of the information from a particular reconstructed phylogeny: the "root sequence" that initiates the simulation, or the rate heterogeneity among sites, is specific to each particular protein family, and CS-PSeq-Gen allows simulations to take such information into account. In addition, exploring the evolution of one protein family and testing hypotheses often makes it necessary to have some control over the variability of the parameters; CS-PSeq-Gen allows some control over the simulated tree/branch lengths around an average value. Finally, one particular category of applications for such simulations is the search for significant co-evolution of sites. CS-PSeq-Gen offers some facilities to generate sequences under such hypotheses, and proposes a basic scheme for their detection that can be easily adapted by programmers.
Description: DHOEM (densification of haplotypes by loess regression and maximum likelihood) is a simulation tool that is free from population assumptions and simulates new markers in real SNP marker data. The main objective of DHOEM is to generate a new population that incorporates real and simulated SNPs, obtained by statistical learning from an initial population, and that matches the realized features of that initial population.
Description: EggLib is a C++/Python library and program package for evolutionary genetics and genomics. Its main features are sequence data management, sequence polymorphism analysis, coalescent simulations and Approximate Bayesian Computation. EggLib is a flexible Python module with a performant underlying C++ library (which can be used independently), and allows fast and intuitive development of Python programs and scripts. A number of pre-programmed EggLib applications are available interactively.
Description: Epistasis is a ubiquitous phenomenon in genetics, and is considered to be one of the main factors in current efforts to detect missing heritability for complex diseases. Simulation is a critical tool in developing methodologies that can more effectively detect and study epistasis. Here we present a simulator, epiSIM (epistasis SIMulator), that can simulate some of the statistical properties of genetic data. epiSIM is capable of expanding the range of epistasis models that current simulators offer, including epistasis models that display marginal effects and those that display no marginal effects. One or more of these epistasis models can be embedded simultaneously into a single simulation data set, jointly determining the phenotype. In addition, epiSIM is independent of any outside data source in generating linkage disequilibrium patterns and haplotype blocks. We demonstrate the wide applicability of epiSIM by performing several data simulations, and examine its properties by comparing it with current representative simulators and by comparing the data that it generates with real data. Our experiments demonstrate that epiSIM is a valuable addition and a nice complement to the existing epistasis simulators. The software package is available online at https://sourceforge.net/projects/episimsimulator/files/.
Description: EvolveAGene 3 is a realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions, including variable regions of selection intensity within the sequence and variation in intensity of selection over branches. Variation includes base substitutions, insertions, and deletions.
Description: Fastsimcoal is a program to generate the neutral genomic molecular diversity in current or ancient samples drawn from a population with a complex demographic history. Fastsimcoal is a completely rewritten version of simcoal2 (Laval and Excoffier, 2004), a coalescent simulation program implementing a generation-by-generation approach, whereas fastsimcoal is based on a much faster continuous-time approximation. Despite its completely new coalescent engine, fastsimcoal uses exactly the same input files as simcoal2 and produces very similar output files. Fastsimcoal typically generates many replicates of random outcomes of molecular diversity under a user-defined evolutionary scenario. The evolutionary scenario is defined in an input parameter file (extension .par) and the output diversity is written to Arlequin project files (extension .arp) that can then be processed with Arlequin or arlsumstat (Excoffier and Lischer, 2010) to get distributions of various summary statistics. Additional options of fastsimcoal can be specified on the command line (type "fastsimcoal -h" for help on command-line options). Fastsimcoal can handle very complex evolutionary scenarios, including an arbitrary migration matrix between samples and historical events allowing for population resizing, population fusion and fission, admixture events, changes in the migration matrix, or changes in population growth rates. The time of sampling can be specified independently for each sample, allowing for serial sampling in the same or in different populations. Different markers, such as DNA sequences, SNPs, STRs (microsatellites) or multi-locus allelic data, can be generated under a variety of mutation models (e.g. finite- and infinite-site models for DNA sequences, stepwise or generalized stepwise mutation models for STR data, and the infinite-allele model for standard multi-allelic data).
fastsimcoal can simulate data in genomic regions with arbitrary recombination rates, thus allowing for recombination hotspots of different intensities at any position. fastsimcoal implements a new approximation to the ancestral recombination graph in the form of a sequential Markov coalescent, allowing it to very quickly generate genetic diversity for >100 Mb genomic segments.
Description: FFPopSim is a C++ and Python library to simulate large populations that are polymorphic at many loci. It allows for complex fitness functions, including pairwise and higher-order epistasis. It is designed to study the effects of linked selection and of rare processes in large populations, and can be used to address a large variety of population genetics problems.
Description: The FluxSimulator is the part of the FLUX project that aims to provide an in silico reproduction of the experimental pipelines for RNA-Seq, using a minimal set of parameters. The corresponding models were established after analyzing RNA-Seq experiments from different cell types, sample preparation protocols and sequencing platforms. The first step of the FLUX project is, in fact, a transcriptome simulator. Subsequently, common sources of systematic bias in the abundance and distribution of the produced reads are mimicked, whether they arise during library construction or in the sequencing process. The FluxSimulator provides a flexible base for designing benchmark experiments based on the new sequencing technologies, for instance to benchmark the abundance predictions of the FluxCapacitor.
Description: forqs is a forward-in-time population genetics simulation that tracks individual haplotype chunks as they recombine each generation. forqs also models quantitative traits and selection on those traits. forqs is implemented as a command-line C++ program, using a modular design that gives the user great flexibility in creating custom simulations. It is freely available under a permissive BSD license.
Description: ForSim is a forward evolutionary simulation system designed to be highly flexible for application to a wide variety of applied health and life science questions as well as issues in theoretical evolutionary biology. It attempts to simulate in the most natural way the evolutionary process that generates the genetic architecture that underlies present-day traits, and related phenomena such as mate choice, migration bias, population substructure, and interactions with the environment. These phenomena are related to the way natural selection affects underlying genetic variation, molding the trait’s genetic architecture. Variation over the short evolutionary scale, within species or among closely related species, is generally built upon a phylogenetically stable underlying causal genetic architecture upon which mutation, selection, and demographic effects are laid to generate subsequent variation within and among populations.
Description: Diploid organisms are represented as pairs of chromosome arrays that store the locations of mutations. The user specifies a probability of selfing. Populations of constant size reproduce with non-overlapping generations. The number of new mutations is Poisson-distributed, and each new mutation inserts a new integer into a chromosome array; the arrays may undergo recombination. The number of sites is finite, but mutations occur only at non-polymorphic sites. Locations that are no longer polymorphic are removed. If natural selection is used, the evolution of selected and neutral sites is carried out separately, with selected sites considered first.
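The representation described in the entry above can be sketched in Python as follows. This is a neutral-only illustration built on assumptions. The entry names neither its program nor its recombination model, so several choices are made here: Poisson-distributed breakpoints between integer sites, random mating among all individuals with a user-set selfing probability, and a Poisson number of new mutations per gamete. Selection is omitted, and all function names are hypothetical.

```python
import bisect
import random
from collections import Counter
from typing import List, Set, Tuple

Chromosome = List[int]                 # sorted sites carrying a derived allele
Individual = Tuple[Chromosome, Chromosome]


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson(lam) draw by counting unit-rate exponential arrivals."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def make_gamete(parent: Individual, n_sites: int, rec_rate: float,
                rng: random.Random) -> Chromosome:
    """Recombine the two arrays at Poisson-many breakpoints (requires n_sites >= 2)."""
    cuts = sorted(rng.randrange(1, n_sites) for _ in range(poisson(rng, rec_rate)))
    bounds = [0] + cuts + [n_sites]
    strand = rng.randrange(2)
    gamete: Chromosome = []
    for lo, hi in zip(bounds, bounds[1:]):
        gamete.extend(s for s in parent[strand] if lo <= s < hi)
        strand ^= 1                    # switch parental array at each breakpoint
    return gamete


def mutate(chrom: Chromosome, segregating: Set[int], n_sites: int,
           mu: float, rng: random.Random) -> None:
    """Insert Poisson(mu) new mutations, only at currently non-polymorphic sites."""
    for _ in range(poisson(rng, mu)):
        if len(segregating) >= n_sites:
            return                     # every site is already polymorphic
        site = rng.randrange(n_sites)
        while site in segregating:
            site = rng.randrange(n_sites)
        segregating.add(site)
        bisect.insort(chrom, site)     # keep the array sorted


def next_generation(pop: List[Individual], n_sites: int, mu: float,
                    rec_rate: float, selfing: float,
                    rng: random.Random) -> List[Individual]:
    segregating = {s for ind in pop for chrom in ind for s in chrom}
    offspring: List[Individual] = []
    for _ in range(len(pop)):          # constant size, non-overlapping generations
        mother = rng.choice(pop)
        father = mother if rng.random() < selfing else rng.choice(pop)
        gametes = []
        for parent in (mother, father):
            g = make_gamete(parent, n_sites, rec_rate, rng)
            mutate(g, segregating, n_sites, mu, rng)
            gametes.append(g)
        offspring.append((gametes[0], gametes[1]))
    # Sites on every chromosome are fixed, so drop them; lost sites vanish by themselves.
    counts = Counter(s for ind in offspring for chrom in ind for s in chrom)
    fixed = {s for s, c in counts.items() if c == 2 * len(offspring)}
    if fixed:
        offspring = [([s for s in a if s not in fixed], [s for s in b if s not in fixed])
                     for a, b in offspring]
    return offspring


def simulate(n_ind: int, generations: int, n_sites: int, mu: float,
             rec_rate: float, selfing: float, seed: int = 0) -> List[Individual]:
    rng = random.Random(seed)
    pop: List[Individual] = [([], []) for _ in range(n_ind)]
    for _ in range(generations):
        pop = next_generation(pop, n_sites, mu, rec_rate, selfing, rng)
    return pop


pop = simulate(n_ind=50, generations=200, n_sites=10_000, mu=0.05,
               rec_rate=0.5, selfing=0.1, seed=1)
print(len({s for ind in pop for chrom in ind for s in chrom}), "polymorphic sites")
```

Storing only derived-allele positions keeps memory proportional to the number of segregating mutations rather than to the number of sites. That is the usual motivation for this array representation in forward simulators.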
Description: FREGENE works forwards-in-time, which allows a wide range of demographic and selection scenarios to be implemented. Many such models are already incorporated into FREGENE, and since it is open source, users can modify or extend these. Coalescent methods have difficulty incorporating large amounts of gene conversion or crossover (Hoggart et al. 2007), whereas these pose no particular problem for FREGENE. FREGENE offers a flexible model for recombination hotspots, and can readily simulate regions up to tens of Mb on a standard desktop computer. The principal limitation of forward-in-time algorithms is computational, since the entire population must be tracked through time, not only the chromosomes that are ancestral to the observed sample. FREGENE implements many features to enhance computational efficiency, and includes a rescaling option that greatly reduces computation time at the cost of some approximation.
Description: Fwdpp is a C++11 library intended to help implement forward-time population genetic simulations.
Description: A rapid, user-friendly software package able to generate whole populations of “worst-case-scenario” complex genetic models with random architectures, subject to a user-specified set of constraints (i.e. number of loci, heritability, allele frequencies, prevalence). It is intended for testing and evaluating algorithms or software for their ability to detect and model epistatic interactions in the absence of any main effects. The next version will add the ability to generate heterogeneous datasets (specifically, datasets that concurrently contain both epistatic and heterogeneous effects).
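A penetrance table with no main effects can be checked directly. Under Hardy-Weinberg proportions, each single-locus genotype's marginal penetrance (averaged over the other locus) must then be the same. The Python sketch below verifies this for a textbook XOR-style two-locus table at allele frequency 0.5, using exact fractions. It only illustrates the "no main effects" property. It is not how this package generates its random architectures, and the table and function names are illustrative assumptions.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple, Union

Number = Union[int, Fraction]
Table = Sequence[Sequence[Number]]  # penetrance[i][j]: genotype i at locus A, j at locus B


def hwe(p: Fraction) -> List[Fraction]:
    """Genotype frequencies (AA, Aa, aa) under Hardy-Weinberg equilibrium."""
    q = 1 - p
    return [p * p, 2 * p * q, q * q]


def marginals(table: Table, p_a: Fraction,
              p_b: Fraction) -> Tuple[List[Fraction], List[Fraction]]:
    """Marginal penetrance of each single-locus genotype, averaged over the other locus."""
    fa, fb = hwe(p_a), hwe(p_b)
    at_a = [sum(fb[j] * table[i][j] for j in range(3)) for i in range(3)]
    at_b = [sum(fa[i] * table[i][j] for i in range(3)) for j in range(3)]
    return at_a, at_b


def purely_epistatic(table: Table, p_a: Fraction, p_b: Fraction) -> bool:
    """True if neither locus shows a main effect (all marginal penetrances equal)."""
    at_a, at_b = marginals(table, p_a, p_b)
    return len(set(at_a)) == 1 and len(set(at_b)) == 1


half = Fraction(1, 2)
xor_like = [[0, 1, 0],
            [1, 0, 1],
            [0, 1, 0]]
print(marginals(xor_like, half, half))
print(purely_epistatic(xor_like, half, half))  # True
```

The same table does show marginal effects at other allele frequencies (for example 0.3). Purely epistatic models therefore have to be constructed jointly with the allele frequencies they are meant for.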
Description: GCTA (Genome-wide Complex Trait Analysis) was originally designed to estimate the proportion of phenotypic variance explained by genome- or chromosome-wide SNPs for complex traits (the GREML method), and has subsequently been extended to many other analyses to better understand the genetic architecture of complex traits. GCTA currently supports the following functionalities: 1) Estimate the genetic relationship from genome-wide SNPs; 2) Estimate the inbreeding coefficient from genome-wide SNPs; 3) Estimate the variance explained by all the autosomal SNPs; 4) Partition the genetic variance onto individual chromosomes; 5) Estimate the genetic variance associated with the X-chromosome; 6) Test the effect of dosage compensation on genetic variance on the X-chromosome; 7) Predict the genome-wide additive genetic effects for individual subjects and for individual SNPs; 8) Estimate the LD structure encompassing a list of target SNPs; 9) Simulate GWAS data based upon the observed genotype data; 10) Convert Illumina raw genotype data into PLINK format; 11) Conditional and joint analysis of GWAS summary statistics without individual-level genotype data; 12) Estimate the genetic correlation between two traits (diseases) using SNP data; 13) Mixed linear model association analysis.
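GCTA's first listed function, estimating genetic relationships from genome-wide SNPs, rests on the genetic relationship matrix (GRM). The pure-Python sketch below follows the GRM estimator described in the GCTA publication (Yang et al., 2011). Off-diagonal entries average standardized genotype cross-products, and diagonal entries are 1 plus an inbreeding-type term. Two simplifying assumptions are made here: allele frequencies are estimated from the sample itself, and monomorphic SNPs are dropped. GCTA itself reads PLINK files and applies its own filtering, and the function name is hypothetical.

```python
from typing import List


def grm(genotypes: List[List[int]]) -> List[List[float]]:
    """genotypes[j][i]: copies (0/1/2) of the counted allele in individual j at SNP i."""
    n_ind, n_snp = len(genotypes), len(genotypes[0])
    freqs = [sum(row[i] for row in genotypes) / (2 * n_ind) for i in range(n_snp)]
    snps = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]  # drop monomorphic SNPs
    if not snps:
        raise ValueError("no polymorphic SNPs")
    n = len(snps)
    a = [[0.0] * n_ind for _ in range(n_ind)]
    for j in range(n_ind):
        for k in range(j, n_ind):
            total = 0.0
            for i in snps:
                p = freqs[i]
                x, y = genotypes[j][i], genotypes[k][i]
                if j == k:   # diagonal: inbreeding-type term
                    total += (x * x - (1 + 2 * p) * x + 2 * p * p) / (2 * p * (1 - p))
                else:        # off-diagonal: standardized genotype cross-product
                    total += (x - 2 * p) * (y - 2 * p) / (2 * p * (1 - p))
            a[j][k] = a[k][j] = (1 + total / n) if j == k else total / n
    return a


genotypes = [[0, 1, 2],
             [2, 1, 0],
             [1, 1, 1]]
for row in grm(genotypes):
    print([round(v, 3) for v in row])
```

Individuals 0 and 1 are opposite homozygotes at two SNPs and so come out negatively related (-4/3) relative to this tiny sample. Individual 2 is heterozygous everywhere, so its diagonal falls to 0, which illustrates how strongly a handful of SNPs can distort the estimate.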
Description: GemSIM is a software package for generating realistic simulated next generation sequencing reads with quality score values. Both Illumina and Roche/454 reads (single or paired end) can be simulated using empirically derived error models.
Description: GENOME is a program to simulate sequences drawn from a population under the Wright-Fisher neutral model (Ewens 1979). It is based on a standard coalescent model (Hudson 1983, 1990; Donnelly & Tavaré 1995). Starting with the sampled sequences and moving backward in time, coalescent, recombination and migration events are simulated at each generation. These events can occur multiple times and can happen in the same generation. Each coalescent event is recorded and the resulting genealogy tree is constructed. Demographic events such as population bottlenecks and expansions, or population merges and splits, can also be simulated. In addition to uniform recombination rates, recombination rates can be allowed to vary so as to mimic the pattern of hotspots along the genome. After simulating a coalescent tree, mutations are placed along each branch. The number of mutations on each branch follows a Poisson distribution with mean equal to the product of the mutation rate and the branch length. The infinite-site mutation model is assumed, so no recurrent mutation can occur. The genealogy tree can also be output in Newick format, the same format used by programs such as PHYLIP (Felsenstein 2005) and seq-gen (Rambaut & Grassly 1997). The program is written in C++ and is portable to multiple operating systems.
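GENOME works in two stages: it builds a coalescent genealogy and then places Poisson-distributed mutations on its branches under infinite sites. Both stages can be sketched in Python. GENOME steps generation by generation and supports recombination, migration and demography, so this sketch differs from it in several ways. It assumes the continuous-time Kingman approximation for a single panmictic population without recombination. Time is measured in units of 2N generations, so mutations fall at rate θ/2 per lineage (θ = 4Nμ). All names are hypothetical.

```python
import random
from typing import List, Optional


class Node:
    def __init__(self, name: str, time: float,
                 children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.time = time                  # age in units of 2N generations
        self.children = children or []
        self.mutations = 0                # mutations on the branch above this node


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson(lam) draw by counting unit-rate exponential arrivals."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def coalescent_tree(n: int, rng: random.Random) -> Node:
    """Kingman coalescent for n sampled sequences, built backward in time."""
    lineages = [Node(f"s{i}", 0.0) for i in range(n)]
    t, k = 0.0, 0
    while len(lineages) > 1:
        m = len(lineages)
        t += rng.expovariate(m * (m - 1) / 2)     # waiting time to next coalescence
        i, j = sorted(rng.sample(range(m), 2))
        right = lineages.pop(j)                   # pop the larger index first
        left = lineages.pop(i)
        lineages.append(Node(f"n{k}", t, [left, right]))
        k += 1
    return lineages[0]


def place_mutations(node: Node, theta: float, rng: random.Random) -> int:
    """Infinite sites: each branch gets Poisson(theta/2 * length) new segregating sites."""
    total = 0
    for child in node.children:
        child.mutations = poisson(rng, theta / 2 * (node.time - child.time))
        total += child.mutations + place_mutations(child, theta, rng)
    return total


def to_newick(root: Node) -> str:
    def walk(v: Node) -> str:
        if not v.children:
            return v.name
        return "(" + ",".join(f"{walk(c)}:{v.time - c.time:.4f}" for c in v.children) + ")"
    return walk(root) + ";"


rng = random.Random(3)
root = coalescent_tree(5, rng)
s = place_mutations(root, theta=5.0, rng=rng)
print(to_newick(root), s)
```

Averaged over many replicates, the number of segregating sites should approach θ(1 + 1/2 + ... + 1/(n-1)), Watterson's classical expectation. That makes it a quick sanity check on the time scaling.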
Description: This new version allows the forward simulation of sequences of biallelic positions. As in the previous version, a number of evolutionary and demographic settings are allowed: several populations under any migration model, contraction-expansion scenarios, and directional or divergent selection can be implemented. A theoretical or simulated initial equilibrium population can be computed, and speciation processes can be modeled by simulating user-defined population splits. Each population consists of N individuals. Each individual is represented by one or more chromosomes with constant or variable (hotspot) recombination between binary sites.
Description: GenomeSimla uses Hardy-Weinberg mating to advance simulated genetic data forward through time from generation to generation. In addition, we included two distinct algorithms to aid the user in developing various types of disease models: SIMLA for diseases with interactions and main effects, and simPEN for embedding purely epistatic models.
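What Hardy-Weinberg (random-union-of-gametes) mating does at a single biallelic locus can be shown in a few lines of Python. This illustrates only the mating scheme named above, not GenomeSimla's implementation. In the example, a parent population with no heterozygotes at allele frequency 0.5 returns to the Hardy-Weinberg proportions 0.25 / 0.5 / 0.25 after one round of random mating, up to sampling noise. Function names are hypothetical.

```python
import random
from typing import List, Tuple


def hwe_expected(p: float) -> Tuple[float, float, float]:
    """Expected frequencies of genotypes carrying 0, 1, 2 copies of an allele at frequency p."""
    q = 1 - p
    return (q * q, 2 * p * q, p * p)


def random_mating(parents: List[int], n_offspring: int,
                  rng: random.Random) -> List[int]:
    """Each offspring unites one random gamete from each of two randomly drawn parents."""
    def gamete(g: int) -> int:
        # a parent with g copies (0, 1, 2) transmits the allele with probability g/2
        return 1 if rng.random() < g / 2 else 0
    return [gamete(rng.choice(parents)) + gamete(rng.choice(parents))
            for _ in range(n_offspring)]


def genotype_freqs(genotypes: List[int]) -> Tuple[float, float, float]:
    n = len(genotypes)
    return (genotypes.count(0) / n, genotypes.count(1) / n, genotypes.count(2) / n)


rng = random.Random(5)
parents = [0] * 50 + [2] * 50          # allele frequency 0.5 but zero heterozygotes
offspring = random_mating(parents, 20_000, rng)
print(genotype_freqs(offspring), hwe_expected(0.5))
```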
Description: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and, perhaps more interestingly, also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock.
Description: The Gene-Environment iNteraction Simulator 2 (GENS2) simulates interactions among two genetic factors and one environmental factor, and also allows for epistatic interactions. GENS2 is based on data with realistic patterns of linkage disequilibrium, and imposes no limitations either on the number of individuals to be simulated or on the number of non-predisposing genetic/environmental factors to be considered. The GENS2 tool is able to simulate gene-environment and gene-gene interactions. To make the simulator more intuitive, the input parameters are expressed as standard epidemiological quantities. GENS2 is written in Python and takes advantage of operators and modules provided by the simuPOP simulation environment. GENS2 is not intended to simulate the evolution of a population, but to simulate complex gene-environment interactions in case-control samples. It should be used together with simuPOP, a software package for realistic evolutionary simulation (or an equivalent simulator), to simulate the dataset to which the disease model is then applied.
Description: Generating samples for association studies based on HapMap data
Description: Simulating gene trees under the multispecies coalescent and time-dependent migration
Description: GWAsimulator is a C++ program that can simulate genotype data for SNP chips used in genome-wide association (GWA) studies. It implements a rapid moving-window algorithm (Durrant et al. 2004. AJHG 75:35-43) to simulate whole-genome case-control or population samples. It can also simulate specific regions if desired. For case-control data, the program retrospectively samples cases and controls according to a user-specified multi-locus disease model. The program requires phased data as input, and the simulated data will have LD patterns similar to those of the input data. The program can use HapMap phased data as input and has the flexibility of simulating genotypes for different populations and different SNP chips. Because many large-scale GWA data sets are becoming available, they can be used as input instead of the HapMap data, as long as phase information is generated for them. These data may provide a better representation of the population under study, and more accurate LD information than HapMap owing to their much larger sample sizes. See the manual for instructions and a detailed description of the program.
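Retrospective case-control sampling, as mentioned for GWAsimulator, works in three steps. Individuals are drawn from the population, each is assigned disease status from a penetrance model, and cases and controls are kept until each quota is met. The Python sketch below is not GWAsimulator's method, which resamples from phased input data to preserve LD. Instead it assumes unlinked disease loci in Hardy-Weinberg equilibrium and a multiplicative model with a per-allele relative risk. The function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Genotype = List[int]   # copies (0/1/2) of the risk allele at each disease locus


def simulate_case_control(n_cases: int, n_controls: int, freqs: Sequence[float],
                          relative_risks: Sequence[float], baseline_risk: float,
                          seed: int = 0, max_draws: int = 10**7
                          ) -> Tuple[List[Genotype], List[Genotype]]:
    rng = random.Random(seed)
    cases: List[Genotype] = []
    controls: List[Genotype] = []
    draws = 0
    while len(cases) < n_cases or len(controls) < n_controls:
        if draws == max_draws:
            raise RuntimeError("quotas not met; disease may be too rare for max_draws")
        draws += 1
        # random individual: two independent allele draws per locus (HWE, no LD)
        geno = [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        risk = baseline_risk
        for g, rr in zip(geno, relative_risks):
            risk *= rr ** g            # multiplicative model: rr per risk-allele copy
        affected = rng.random() < min(risk, 1.0)
        if affected and len(cases) < n_cases:
            cases.append(geno)
        elif not affected and len(controls) < n_controls:
            controls.append(geno)
    return cases, controls


def risk_allele_freq(group: List[Genotype], locus: int = 0) -> float:
    return sum(g[locus] for g in group) / (2 * len(group))


cases, controls = simulate_case_control(500, 500, freqs=[0.3], relative_risks=[2.0],
                                        baseline_risk=0.05, seed=1)
print(round(risk_allele_freq(cases), 3), round(risk_allele_freq(controls), 3))
```

With a per-allele relative risk of 2, the risk-allele frequency should come out clearly higher in cases than in controls. Sampling on disease status, rather than on genotype, is what makes the design retrospective.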
Description: HAPGEN2 is an updated version of the program HAPGEN, which simulates case-control datasets at SNP markers. The new version can now simulate multiple disease SNPs on a single chromosome, on the assumption that each disease SNP acts independently and is in Hardy-Weinberg equilibrium. We also supply an R package that can simulate interaction between the disease SNPs. We hope to add further facilities to simulate quantitative traits and admixture soon.
Description: Simulate haplotypes through meioses. Allows specification of population parameters.
Description: Package for haplotype data simulation. Haplotypes are generated such that their allele frequencies and linkage disequilibrium coefficients match those estimated from an input data set.
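For two biallelic loci, the allele frequencies p_A and p_B together with the linkage disequilibrium coefficient D = p_AB - p_A p_B determine all four haplotype frequencies. A package like this one must respect that relationship. The Python sketch below builds the haplotype frequencies, samples haplotypes, and re-estimates D. It covers only two loci and is not the package's own multi-locus algorithm; the names are hypothetical.

```python
import random
from typing import Dict, List


def haplotype_freqs(p_a: float, p_b: float, d: float) -> Dict[str, float]:
    """Two-locus haplotype frequencies implied by allele freqs p_A, p_B and D."""
    freqs = {
        "AB": p_a * p_b + d,
        "Ab": p_a * (1 - p_b) - d,
        "aB": (1 - p_a) * p_b - d,
        "ab": (1 - p_a) * (1 - p_b) + d,
    }
    if min(freqs.values()) < 0:
        raise ValueError("D is outside the range allowed by these allele frequencies")
    return freqs


def sample_haplotypes(freqs: Dict[str, float], n: int,
                      rng: random.Random) -> List[str]:
    return rng.choices(list(freqs), weights=list(freqs.values()), k=n)


def estimate_d(haps: List[str]) -> float:
    """Sample estimate D = p_AB - p_A * p_B."""
    n = len(haps)
    p_a = sum(h[0] == "A" for h in haps) / n
    p_b = sum(h[1] == "B" for h in haps) / n
    return haps.count("AB") / n - p_a * p_b


freqs = haplotype_freqs(0.6, 0.3, 0.05)
haps = sample_haplotypes(freqs, 20_000, random.Random(9))
print(freqs, round(estimate_d(haps), 3))
```

The range check matters in practice: for given allele frequencies, D is bounded, and asking for a value outside those bounds has no valid haplotype distribution.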
Description: HAPSIMU, a program based on real haplotype data from the HapMap ENCODE project, can simulate heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model. Moreover, both qualitative and quantitative traits can be simulated using an additive genetic model with various genetic parameters designated by users.
Description: IBDSim can consider a large panel of subdivided population models representing discrete subpopulations as well as a large continuous population. Many dispersal distributions, with different tails, can be considered, as well as various heterogeneities in space and time of the demographic parameters. For examples of various applications, see Leblois et al. (2003), Leblois et al. (2004), Leblois et al. (2006) and Rousset & Leblois (2007). The program runs on PCs under Windows, Mac or Linux, and we provide the source code, which can easily be compiled on any system using an ISO C++ compiler.
Description: IgSimulator is a tool for simulating antibody repertoires and Ig-seq libraries. IgSimulator is designed for testing and benchmarking tools for reconstruction of Ig repertoires.
Description: INDELible is a new, portable, and flexible application for biological sequence simulation that combines many features in the same place for the first time. Using a length-dependent model of indel formation it can simulate evolution of multi-partitioned nucleotide, amino-acid, or codon data sets through the processes of insertion, deletion, and substitution in continuous time.
Description: indel-Seq-Gen (iSG) is a biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies. This is accomplished through the addition of subsequence length constraints and lineage- and site-specific evolution. iSG tracks insertion and deletion processes that occur during the simulation run. iSG records all evolutionary events and outputs the "true" multiple alignment of the sequences, and can generate a larger simulated sequence space by allowing the use of multiple related root sequences. iSG can be used to test the accuracy of multiple alignment methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein superfamily classification methods.
Description: invertFREGENE is the forward-in-time simulator of inversions in population genetic data, while SAMPLE samples genotype and haplotype data from the output of invertFREGENE simulations based on specified individual and marker ascertainment criteria, including a continuous and case-control disease model. invertFREGENE has been developed from a beta version of the population genetic simulator FREGENE, and as a result there are a small number of features not included in invertFREGENE (e.g. it does not model natural selection); therefore we provide self-contained documentation for invertFREGENE. O'Reilly PF, Coin LJ, Hoggart CJ. invertFREGENE: software for simulating inversions in population genetic data. Bioinformatics. 2010 Mar 15;26(6):838-40.
Description: Individual-based, spatially explicit models provide a mechanism to understand distributions of individuals on the landscape; however, few models have been coupled with population genetics. The primary benefit of such a combination is the ability to assess the performance of population-genetic estimators in realistic situations. KERNELPOP represents a flexible framework to implement almost any arbitrary population-genetic and demographic model in a spatially explicit context using a variety of dispersal kernels. Estimates of type I error associated with genome scans in metapopulations are provided as an illustration of this software’s utility.
Description: LDSO (Linkage Disequilibrium with Several Options) is a completely self-contained program written in Fortran90. It simulates whole diploid population histories under various historical scenarios based on the gene-dropping method (MacCluer et al. 1986), using the random number generator of L'Ecuyer (1996). The genetic history of one or two populations can be simulated; the output files can deliver various statistics (inbreeding rates, allele frequencies, linkage disequilibrium) on these populations for generations chosen by the user. Evolutionary forces that are classically found in livestock populations, such as mutation, selection, changes in population size or random drift, can be taken into account. This allows the simulation of a wide range of natural or experimental livestock populations, possibly making use of a real pedigree. A set of parameters has to be provided by the user in simple text files. The possibility of genotyping errors and missing data is also envisaged.
Description: MaCS is a simulator of the coalescent process that simulates genealogies spatially across chromosomes as a Markovian process. The algorithm is similar to the SMC algorithm (McVean and Cardin, Phil Trans R Soc B 2005) in that it scales linearly in time with respect to sample size and sequence length. However, it models the true coalescent more accurately, while supporting all demographic scenarios found in the popular program MS (Hudson, Bioinformatics 2002), making this program appropriate for simulating data for structured populations in genome-wide association studies.
DescriptionMarlin is a program for running spatially explicit forward-in-time population genetic simulations. It provides an intuitive user interface with which realistic geographic scenarios can easily be created and simulated. Marlin also goes further: it directly analyses and plots the results. This combination of creation, simulation, and analysis makes Marlin ideal for teaching and for scientists who want to run simulations without having to learn command-line operations.
DescriptionMason is a package for the simulation of nucleotide data. Starting with a genome, you can simulate variants and, optionally, methylation levels. From these, reads of different sequencing technologies can be simulated, optionally including bisulphite treatment. The variants can also be specified as a VCF file. The results are FASTQ files with the reads and, optionally, a SAM file with the alignment to the reference sequence. Substeps of the process are available as standalone tools, e.g. for simulating reads from preselected or presimulated fragments, or for computing genomic sequences with variants. The time-intensive part of read simulation has been parallelized.
DescriptionA software application to generate samples of DNA sequences when a biallelic site is targeted by selection. mbs was developed by modifying Hudson's ms. It is flexible enough to incorporate arbitrary histories of population size change and any mode of selection, as long as selection operates on a biallelic site.
DescriptionMENDEL is a genetic accounting program that allows realistic numerical simulation of the mutation/selection process over time. MENDEL is applicable to either haploid or diploid organisms with either sexual or clonal reproduction. Each mutation that enters the simulated population is tracked from generation to generation until the end of the experiment, or until that mutation is lost as a result of either selection or random drift. Using a standard personal computer, the MENDEL program can generate and track millions of mutations within a single population. MENDEL's input variables include such things as mutation rate, distribution specifications for mutation effects, extent of dominance, mating characteristics, selection method, average fertility, heritability, non-scaling noise, linkage block properties, chromosome number, genome size, population size, population sub-structure, and number of generations. The MENDEL program outputs, in both tabular and graphic form, several types of data, including deleterious and beneficial mutation counts per individual, mean individual fitness as a function of generation count, the distribution of mutation effects, and allele frequencies. MENDEL provides biologists with a new tool for research and teaching, and allows the modeling of complex biological scenarios that would previously have been impossible. MENDEL runs in Linux, Windows, and Macintosh environments. MENDEL is described in more detail in the following publication: John C. Sanford and Chase W. Nelson (2012). The Next Step in Understanding Population Dynamics: Comprehensive Numerical Simulation. In: Studies in Population Genetics, M. Carmen Fusté (Ed.), ISBN: 978-953-51-0588-6, InTech. Available from: http://www.intechopen.com/books/studies-in-population-genetics/the-next-step-in-understanding-population-dynamics-comprehensive-numerical-simulation
DescriptionMetaPopGen is a population genetics simulator. Features included in the model are age structure, monoecious and dioecious (separate-sex) life cycles, mutation, dispersal and selection. All demographic parameters can be genotype-, sex-, age-, deme- and time-dependent. MetaPopGen is therefore suited to studying large populations and very complex demographic scenarios.
DescriptionThe aim of MetaSim is to provide a tool for simulating reads from given genome sequences, reflecting (adaptable) error models of current sequencing technologies. Additionally, the user can determine the abundance of the chosen taxa. To this end, MetaSim integrates an induced tree view of the NCBI taxonomy that can be used to interactively select taxa and inner nodes of the taxonomy and to configure their relative abundances. Another feature of MetaSim allows the user to simulate an evolved population of a single genome sequence using a population simulator. This feature is aimed at simulating the common real-world situation in which many different but closely related strains of a lineage coexist in the same habitat. The resulting data sets can be used to plan and design metagenome studies and to evaluate and improve metagenomic software tools and assembly algorithms.
DescriptionThe application program mlcoalsim (multilocus coalescent simulations) is designed to: (i) Generate samples and calculate neutrality tests and other statistics under a stationary model, several demographic models, or strong positive selection by means of coalescent theory. (ii) Perform coalescent simulations with the mutational phase given: 1. the population mutation rate θ (θ = 4Nμ, where N is the effective population size and μ is the mutation rate); 2. a fixed number of mutations; 3. a distribution of θ values (uniform (bounded) and gamma priors are supported); 4. a fixed number of biallelic segregating sites, taking into account the uncertainty of the population mutation rate (conditioning on biallelic segregating sites); uniform (bounded) and gamma priors are supported. (iii) Perform coalescent simulations with recombination given: 1. the population recombination rate R (R = 4Nr, where r is the recombination rate); 2. a distribution of r values (uniform (bounded) and gamma priors are supported); 3. a fixed number of minimum recombination events (Rm), taking into account the uncertainty of the population recombination rate (fixing Rm); uniform (bounded) and gamma priors are supported; 4. a fixed number of minimum recombination events (Rm) and a fixed number of haplotypes, considering the uncertainty of the population recombination rate. (iv) Perform multilocus analyses. Both linked and unlinked loci are supported. Multilocus statistics for unlinked loci are the average and the variance of each statistic. (v) Include recurrent mutations (multiple hits) or not. (vi) Include heterogeneity in mutation rate across the length of the sequence, using a gamma distribution; a number of invariant positions can also be defined. (vii) Include heterogeneity in recombination rate across the length of the sequence, using a gamma distribution; hotspots or a constant value for all positions are also possible. This program is based on a previous version of Hudson’s coalescent program ms (Hudson, 2002), modified for the above purposes. The function to calculate minimum recombination (Rm) values is a modification of Wall’s code (Wall, 2000). The gamma function was partially obtained from the code of Grassly, Adachi and Rambaut (Grassly et al., 1997). This program is distributed under the GNU GPL License. Version 2 includes parallel computation for multiple loci and the possibility to include priors for each of the parameters (useful for ABC analyses). The input file format has been modified.
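The scaled parameters θ = 4Nμ and R = 4Nr, and the option of drawing θ from a prior for each replicate, can be illustrated with a short Python sketch. The numerical values and the helpers `scaled_rates` and `draw_theta` are assumptions for illustration and do not reflect mlcoalsim's input format.

```python
import random

def scaled_rates(n_e, mu, r):
    """Population-scaled rates for one locus: theta = 4*N*mu and R = 4*N*r."""
    return 4 * n_e * mu, 4 * n_e * r

def draw_theta(prior, rng):
    """Draw a theta value for one replicate from a bounded uniform or a gamma prior.

    prior is a hypothetical tuple: ("uniform", low, high) or ("gamma", shape, scale).
    """
    kind, a, b = prior
    if kind == "uniform":
        return rng.uniform(a, b)
    if kind == "gamma":
        return rng.gammavariate(a, b)  # shape a, scale b, mean a * b
    raise ValueError(f"unknown prior: {kind}")

# Illustrative (assumed) values for a 1-kb locus: N = 10,000,
# mu = 2.5e-8 and r = 1e-8 per bp per generation.
theta, rho = scaled_rates(n_e=10_000, mu=2.5e-8 * 1_000, r=1e-8 * 1_000)
print(f"theta = {theta:.2f}, R = {rho:.2f}")

rng = random.Random(7)
thetas = [draw_theta(("gamma", 2.0, 0.5), rng) for _ in range(10_000)]
print(f"mean theta under the gamma prior: {sum(thetas) / len(thetas):.2f}")  # about 1.0
```

Drawing a fresh θ per replicate in this way is what makes the simulated samples usable as a prior predictive set for ABC-style analyses.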
DescriptionMODELER4SIMCOAL2 (M4S2) is an extensible graphical tool for modeling linked loci and population demographies. M4S2 is easy to use yet allows complicated scenarios to be modeled, making coalescent simulation accessible to biologists with limited computer skills. The software includes an extension system that allows new models to be created, published and downloaded from the Internet.
DescriptionThe program ms can be used to generate many independent replicate samples under a variety of assumptions about migration, recombination rate and population size to aid in the interpretation of such polymorphism studies. The samples are generated using the now-standard coalescent approach, in which the random genealogy of the sample is first generated and mutations are then randomly placed on the genealogy (Kingman, 1982; Hudson, 1990; Nordborg, 2001). The usual small-sample approximations of the coalescent are used. An infinite-sites model of mutation is assumed, so multiple hits and back mutations do not occur. However, when ms is used in conjunction with other programs, finite-site mutation models or microsatellite models can be studied. For example, the gene trees themselves can be output, and these gene trees can be used as input to other programs that evolve the sequences under a variety of finite-site models. These are described later. The program is intended to run on Unix or Unix-like operating systems, such as Linux or Mac OS X. The next section describes how to download and compile the program. The subsequent sections describe how to run the program and, in particular, how to specify the parameter values for the simulations.
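The two-step coalescent approach described above (generate the genealogy, then place mutations on it) can be sketched for the simplest case: a single non-recombining locus in a constant-size population. In this hypothetical Python sketch, time is measured in units of 2N generations, whereas ms uses units of 4N generations; the expected number of segregating sites, θ(1 + 1/2 + ... + 1/(n-1)), is the same in either convention. The function names are illustrative, not part of ms.

```python
import math
import random

def poisson(lam, rng):
    """Knuth's multiplication method; fine for the moderate means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def segregating_sites(n, theta, rng):
    """One replicate: build a Kingman genealogy for n sequences, then place
    infinite-sites mutations on it.

    Time is measured in units of 2N generations, so mutations fall at rate
    theta / 2 per unit of branch length (theta = 4*N*mu).
    """
    total_length = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2)  # k lineages coalesce at rate C(k, 2)
        total_length += k * t_k                 # all k branches extend by t_k
    # Infinite sites: each mutation hits a new site, so #mutations = #segregating sites
    return poisson(theta / 2 * total_length, rng)

n, theta, reps = 10, 5.0, 20_000
rng = random.Random(2024)
mean_s = sum(segregating_sites(n, theta, rng) for _ in range(reps)) / reps
expected = theta * sum(1 / i for i in range(1, n))   # Watterson: theta * sum_{i<n} 1/i
print(f"simulated mean S = {mean_s:.2f}, expected = {expected:.2f}")
```

ms itself also outputs the haplotypes and supports recombination, migration and size changes, all of which this sketch omits.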
DescriptionThis addition to Hudson’s (2002) ms, called msHOT, allows for implementation of multiple crossover hotspots and/or multiple gene conversion hotspots in the simulated genetic region. Crossover hotspots may overlap with gene conversion hotspots, but crossover hotspots may not overlap with each other and gene conversion hotspots may not overlap with each other.
DescriptionThis document describes how to use msms, a tool to generate sequence samples under both neutral models and a single-locus selection model. msms permits the full range of demographic models provided by ms (Hudson, 2002). In particular, it allows for multiple demes with arbitrary migration patterns, population growth and decay in each deme, and population splits and mergers. Selection (including dominance) can depend on the deme and can also change with time. The program is designed to be command-line compatible with ms; however, no prior knowledge of ms is assumed in this document. Applications of this program include power studies, analytical comparisons and approximate Bayesian computation, among many others. Because most applications require the generation of a large number of independent replicates, the code is designed to be efficient and fast. For the neutral case, it is comparable to ms and even faster for large recombination rates. With selection, the performance is only slightly slower, making this one of the fastest tools for simulation with selection. The program has been developed with a wide range of operating systems and hardware in mind. For this reason, the code is written in Java and can run on any hardware that supports Java 1.6. This includes Mac OS X, all current versions of MS Windows, and most Unix flavors (Linux, Sun, BSD). Java is also popular and widely known, which should facilitate the writing of extensions for the program.
DescriptionMsprime is a reimplementation of Hudson’s classical ms program for modern datasets. The Python API and storage format are currently under development and are not fully documented, but the command line interface mspms is reliable and ready for use. This program provides a fully ms compatible interface, and can be used as a drop-in replacement in existing workflows.
DescriptionMySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, its flexibility in allowing for non-stationary patterns, and its output of ancestral sequences. Some of these features can be found individually in existing programs, but they have not previously all been available in a single package.
DescriptionNEAT-genReads is a fine-grained read simulator. GenReads simulates real-looking data using models learned from specific datasets. There are several supporting utilities for generating models used for simulation.
DescriptionNemo implements many different life cycles and evolvable traits with a large variety of genetic architectures. Species interactions between a parasite and its host can also be modeled (e.g., the cytoplasmic-incompatibility-inducing endosymbiont Wolbachia). All this is framed within a flexible metapopulation model that allows for patch-specific carrying capacities, dispersal rates (dispersal matrices), stochastic extinction/harvesting rates, and demographic stochasticity. Populations can be dynamically modified during a simulation, allowing for population bottlenecks, patch fusion/fission, population expansion, etc. Spatially heterogeneous selection on quantitative traits can also be modeled. Nemo's interface is a simple text file containing the simulation parameters. Large batches of simulations can be run from a single parameter file with multiple parameter values. Many complex evolutionary and demographic scenarios can be modeled easily by providing temporally varying parameter values.
DescriptionNetRecodon is a population genetic simulator that generates samples of nucleotide and codon sequences from haploid/diploid populations with inter- and intracodon recombination, migration, growth and dated tips. It can also run on several processors using MPI. Operating systems: source code and a makefile are provided for compilation on any OS with a C compiler, along with some compiled executables.
DescriptionAn R/BioConductor package that provides functions for forward population genetic simulation in asexual populations, with special focus on cancer progression. Fitness can be an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, order restrictions in mutation accumulation, and order effects. Mutation rates can differ between genes, and we can include mutator/antimutator genes (to model mutator phenotypes). Simulations use continuous-time models and can include driver and passenger genes and modules. Also included are functions for simulating random DAGs of the type found in Oncogenetic Trees, Conjunctive Bayesian Networks, and other cancer progression models; plotting and sampling from single or multiple realizations of the simulations, including single-cell sampling; plotting the parent-child relationships of the clones; generating random fitness landscapes (Rough Mount Fuji, House of Cards, and additive models) and plotting them.
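The three landscape families mentioned above can be sketched as follows. This is not the package's R API: the Python function names, the use of normal distributions and the parameters `effect_sd` and `noise_sd` are assumptions, and the House of Cards and Rough Mount Fuji models are written in their generic textbook form. Counting local fitness peaks shows how ruggedness differs between them.

```python
import itertools
import random

def fitness_landscape(n_loci, model, rng, effect_sd=1.0, noise_sd=1.0):
    """Return {genotype: fitness} for all 2**n_loci binary genotypes.

    model: "additive" -> sum of the effects of the mutated loci
           "hoc"      -> House of Cards: an independent random value per genotype
           "rmf"      -> Rough Mount Fuji: additive part plus independent noise
    The normal distributions and parameter names are illustrative assumptions.
    """
    effects = [rng.gauss(0.0, effect_sd) for _ in range(n_loci)]
    landscape = {}
    for genotype in itertools.product((0, 1), repeat=n_loci):
        additive = sum(e for e, mutated in zip(effects, genotype) if mutated)
        if model == "additive":
            landscape[genotype] = additive
        elif model == "hoc":
            landscape[genotype] = rng.gauss(0.0, noise_sd)
        elif model == "rmf":
            landscape[genotype] = additive + rng.gauss(0.0, noise_sd)
        else:
            raise ValueError(f"unknown model: {model}")
    return landscape

def local_peaks(landscape):
    """Genotypes fitter than every single-mutation neighbour."""
    peaks = []
    for g, w in landscape.items():
        neighbours = (g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g)))
        if all(w > landscape[nb] for nb in neighbours):
            peaks.append(g)
    return peaks

rng = random.Random(3)
for model in ("additive", "rmf", "hoc"):
    print(model, len(local_peaks(fitness_landscape(6, model, rng))))
```

An additive landscape with nonzero effects always has exactly one peak, while adding independent noise (Rough Mount Fuji) or using purely random fitness values (House of Cards) typically creates several.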
DescriptionPEDAGOG is a Windows program that simulates population dynamics at the individual level, allows for heritability and selection of traits, records individual genotype and pedigree information, and allows for several types of errors to manifest in the output which can be formatted for 57 existing software programs. In all, parameters can be specified for genetics, demographics, mating strategy, mutations and genetic/demographic errors, growth models, heritability and selection, and output. Demographic parameters can be either age or size based, and all parameters can be drawn from twelve statistical distributions where appropriate.
DescriptionPhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. Key features of PhyloSim include 1) Simulation of the evolution of a set of discrete characters with arbitrary states evolving by a continuous-time Markov process with an arbitrary rate matrix. 2) Explicit implementations of the most popular substitution models (nucleotide, amino acid and codon substitution models). 3) Simulation under the popular models of among-sites rate variation, like the gamma (+G) and invariant sites plus gamma (+I+G) models. 4) The possibility to simulate under arbitrarily complex patterns of among-sites rate variation by setting the site specific rates according to any R expression. 5) Simulation of one or more separate insertion and/or deletion processes acting on the sequences and which sample the insertion/deletion length from an arbitrary discrete distribution or an R expression (so all the probability distributions implemented in R are readily available for this purpose). 6) Simulation of the effects of variable functional constraints over the sites by site-process specific insertion and deletion tolerance parameters which determine the rejection probability of a proposed insertion/deletion. 7) The possibility of having a different set of processes and site-process specific parameters for every site, which allows for an arbitrary number of partitions in the simulated data. 
8) The possibility to evolve sites by a combination of substitution processes along a single branch. 9) Simulation of heterotachy and other cases of non-homogeneous evolution by allowing the user to set "node hook" functions altering the site properties at internal nodes. 10) The possibility to export the counts of various events ("branch statistics") as phylo objects (see the man page of exportStatTree.PhyloSim).
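The Gillespie algorithm mentioned above can be sketched for the simplest case, substitutions only, on a single branch. In this hypothetical Python sketch (not PhyloSim's R interface), each site has its own rate, the waiting time to the next event is exponential with the total rate, and the site that changes is chosen in proportion to its rate; indel processes and rate matrices other than a Jukes-Cantor-style uniform replacement are omitted.

```python
import random

BASES = "ACGT"

def gillespie_branch(seq, site_rates, branch_length, rng):
    """Evolve seq along one branch with Gillespie's direct method.

    Site i undergoes substitutions at rate site_rates[i]; each substitution
    replaces the base with one of the other three, chosen uniformly. Because
    these rates do not depend on the current bases, the total rate stays
    constant; a general implementation would recompute it after every event.
    """
    seq = list(seq)
    total_rate = sum(site_rates)
    n_events = 0
    if total_rate <= 0:
        return "".join(seq), n_events
    t = rng.expovariate(total_rate)           # waiting time to the first event
    while t <= branch_length:
        # Pick the site that changes with probability proportional to its rate
        i = rng.choices(range(len(seq)), weights=site_rates)[0]
        seq[i] = rng.choice([b for b in BASES if b != seq[i]])
        n_events += 1
        t += rng.expovariate(total_rate)      # waiting time to the next event
    return "".join(seq), n_events

rng = random.Random(5)
start = "A" * 100
rates = [1.0] * 50 + [0.0] * 50              # second half is invariant
runs = [gillespie_branch(start, rates, 0.5, rng) for _ in range(2_000)]
mean_events = sum(n for _, n in runs) / len(runs)
print(f"mean events: {mean_events:.1f} (expected {sum(rates) * 0.5})")
```

Sites with rate zero never change, which is one simple way to mimic the site-specific rate patterns described above.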
DescriptionπBUSS is a BEAST/BEAGLE utility for sequence simulation, which provides an easy to use interface that allows flexible and extensible phylogenetic data fabrication, delegating computationally intensive tasks to the BEAGLE library and thus making full use of multi-core architectures.
DescriptionpIRS simulates Illumina reads with empirical base-calling and GC%-depth profiles trained from real re-sequencing data. It considers error and quality distributions as well as coverage bias patterns. In addition, pIRS comes with a tool to simulate heterozygous diploid genomes.
DescriptionThe homepage is no longer available, and no alternative page could be located.
DescriptionProteinEvolver generates samples of protein-coding genes and protein sequences evolved along phylogenies under structure-based substitution models. These models consider the protein structure to evaluate candidate mutations, which can be accepted (substitutions) or rejected depending on the energy of the protein structure of the mutated sequence. The simulation of molecular evolution occurs along phylogenetic histories, which can be either user-specified or simulated by the coalescent modified with recombination (including recombination hotspots), migration, demographics and longitudinal sampling.
DescriptionLinkage disequilibrium (LD) and linkage analyses have been used extensively to identify quantitative trait loci (QTL) in humans and livestock. Owing to recent developments in genotyping technologies, dense marker maps are now available for several livestock species. Even though genotyping costs have substantially declined, large-scale genome-wide association studies are still costly. For this reason, many studies in livestock suffer from small sample sizes or low marker density. Simulation, however, is a highly valuable tool for assessing and validating newly proposed methods for association studies at very low cost. During the last few decades, simulation has played a major role in answering a wide variety of questions in genomics. Several software tools have been developed for simulating genomes, especially in human research. However, most of them do not provide the functionality required for many applications in livestock. QMSim was developed to simulate large-scale genomic data in livestock populations. QMSim is a family-based simulator, which can also take into account predefined evolutionary features such as LD, mutation, bottlenecks and expansions. The simulation is carried out in two steps: in the first step, a historical population is simulated to establish mutation-drift equilibrium; in the second step, recent population structures, which can be complex, are generated. QMSim allows a wide range of parameters to be incorporated in the simulation models in order to produce appropriate simulated data.
DescriptionquantiNEMO is an individual-based, genetically explicit stochastic simulation program. It was developed to investigate the effects of selection, mutation, recombination, and drift on quantitative traits with varying architectures in structured populations connected by migration and located in a heterogeneous habitat. quantiNEMO is highly flexible at various levels: population, selection, trait(s) architecture, genetic map for QTL and/or markers, environment, demography, mating system, etc. quantiNEMO is a console program, and is coded in standard C++ using an object oriented approach, runs on any computer platform, and is distributed under an open source license.
DescriptionRecodon can simulate samples of coding DNA sequences under complex scenarios in which several evolutionary forces can interact simultaneously (namely, recombination, migration and demography). The basic codon model implemented is an extension to the general time-reversible model of nucleotide substitution with a proportion of invariable sites and among-site rate variation. In addition, the program implements non-reversible processes and mixtures of different codon models.
DescriptionREvolver is a program to simulate protein sequence evolution. REvolver automatically integrates domain information described by a profile Hidden Markov Model (pHMM) into the simulation. In simulations of protein evolution, it has often been assumed that sites evolve identically and independently of each other. This simplification is necessary because information on site-specific evolution is frequently unavailable. However, homologous sequences and domains have been collected and aligned, and pHMMs have been built from them. A pHMM describes the variability and shared characteristics of sequences that share a common ancestor. Here, we know which sites are conserved, at which positions in the sequences insertions are more likely, and which sites can be deleted. Pfam (Finn et al., 2010) and SMART (Letunic, Doerks and Bork, 2009) are examples of databases providing such data. REvolver is the first method for simulating protein sequence evolution that integrates this pre-existing information about evolution in an automatic fashion.
DescriptionThe rlsim package is a collection of tools for simulating RNA-seq library construction, aiming to reproduce the most important factors known to introduce significant biases in currently used protocols: hexamer priming, PCR amplification and size selection. It allows a systematic exploration of the effects of the individual biasing factors and their interactions on downstream applications by simulating data under a variety of parameter sets. The implicit simulation model implemented in the main tool (rlsim) is inspired by actual library preparation protocols and is more general than the models used by bias correction methods; hence, it allows a fair assessment of their performance. Although the simulation model was kept as simple as possible to aid usability, it still has too many parameters to be inferred from data produced by standard RNA-seq experiments. However, simulating datasets with properties similar to specific datasets is often useful. To address this, the package provides a tool (effest) implementing simple approaches for estimating the parameters that can be recovered from standard RNA-seq data (GC-dependent amplification efficiencies, fragment size distribution, relative expression levels).
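A simple way to picture GC-dependent PCR amplification is a branching process in which every molecule is copied in each cycle with a probability given by its efficiency. The Python sketch below is an illustration under assumptions rather than rlsim's model: the `efficiency` curve, the fragments and all function names are hypothetical.

```python
import random

def gc_fraction(seq):
    return sum(base in "GC" for base in seq) / len(seq)

def efficiency(gc):
    """Hypothetical GC-dependent efficiency: highest at 50% GC, lower at the extremes."""
    return max(0.0, 0.95 - 2.0 * (gc - 0.5) ** 2)

def amplify(fragments, cycles, eff_fn, rng):
    """Branching-process PCR: in every cycle each existing molecule is copied
    with probability eff_fn(GC of its fragment). Returns final copy numbers."""
    counts = []
    for fragment in fragments:
        e = eff_fn(gc_fraction(fragment))
        n = 1
        for _ in range(cycles):
            n += sum(rng.random() < e for _ in range(n))
        counts.append(n)
    return counts

rng = random.Random(8)
fragments = ["ACGTACGTAC", "GCGCGCGCGC", "ATATATATAT"]  # GC = 0.5, 1.0, 0.0
print(amplify(fragments, cycles=10, eff_fn=efficiency, rng=rng))
```

Even a modest difference in per-cycle efficiency compounds over cycles (roughly (1 + e) to the power of the cycle number), which is why amplification bias can distort relative abundances so strongly.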
DescriptionRmetasim provides a flexible environment in which to perform individual-based population genetic simulations. A wide range of landscape-level dynamics, population structures, and within-population demographies can be represented using the framework implemented in this software. In addition, temporal variation in all demographic characteristics can be simulated, both deterministically and stochastically. Such simulations can be used to produce null distributions of genotypes under realistic conditions. These genotypic data can then be used by a variety of analytical programs to develop null expectations of any population genetic statistic estimated from genotypic data.
DescriptionRSS takes SAM alignment files from RNA-Seq data and simulates overdispersed, multiple-replicate, differential, non-stranded RNA-Seq datasets.
DescriptionRose implements a new probabilistic model of the evolution of RNA-, DNA-, or protein-like sequences. Guided by an evolutionary tree, a family of related sequences is created from a common ancestor sequence by insertion, deletion and substitution of characters. During this artificial evolutionary process, the 'true' history is logged and the 'correct' multiple sequence alignment is created simultaneously. The model also allows for varying rates of mutation within the sequences, making it possible to establish so-called sequence motifs. The data created by Rose are suitable for the evaluation of methods in multiple sequence alignment computation and the prediction of phylogenetic relationships. It can also be useful when teaching courses in, or developing models of, sequence evolution and in the study of evolutionary processes.
DescriptionThe Sequential Coalescent with Recombination Model (SCRM) is a new method that efficiently and accurately approximates the coalescent with recombination. It closes the gap between current approximations and the exact model and can be used to simulate genomic-scale data sets with an essentially correct linkage structure. The efficient C++ implementation scrm is available for all major platforms and as an R package on CRAN.
DescriptionSeq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. State frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution.
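To illustrate simulating sequences along a phylogeny, here is a minimal Python sketch under the Jukes-Cantor (JC69) model, where a base remains unchanged after branch length t with probability 1/4 + 3/4 e^(-4t/3). It is not Seq-Gen: the nested-tuple tree representation stands in for the Newick input that Seq-Gen reads, and only JC69 without rate heterogeneity is implemented.

```python
import math
import random

BASES = "ACGT"

def jc69_evolve(seq, t, rng):
    """Evolve seq along a branch of length t (expected substitutions per site) under JC69."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)   # P(base unchanged after time t)
    return "".join(
        b if rng.random() < p_same else rng.choice([x for x in BASES if x != b])
        for b in seq
    )

def simulate_tree(node, seq, rng, leaves=None):
    """Recursively evolve seq down a tree given as (name, [(child, branch_length), ...]).
    Leaves (nodes without children) collect their simulated sequences."""
    if leaves is None:
        leaves = {}
    name, children = node
    if not children:
        leaves[name] = seq
    for child, t in children:
        simulate_tree(child, jc69_evolve(seq, t, rng), rng, leaves)
    return leaves

# Hypothetical tree ((A:0.1,B:0.1):0.05,C:0.15) written as nested tuples
tree = ("root", [(("AB", [(("A", []), 0.1), (("B", []), 0.1)]), 0.05),
                 (("C", []), 0.15)])
rng = random.Random(11)
root_seq = "".join(rng.choice(BASES) for _ in range(60))
for name, s in sorted(simulate_tree(tree, root_seq, rng).items()):
    print(name, s)
```

Calling `simulate_tree` repeatedly with fresh root sequences gives the kind of replicate data sets described above.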
DescriptionSEQPower is a software package for simulating rare-variant data associated with complex traits and for performing power and sample size estimation for sequence-based association studies. It features analytic sample size estimates, power comparison of rare-variant association methods, and validation and evaluation of novel association tests under various study designs.
DescriptionSeqSIMLA can simulate sequence data in families with multiple affected and unaffected siblings or unrelated case-control data under different disease models. SeqSIMLA accepts a population of sequences generated by other sequence generators. We implemented two disease models, in which the user can flexibly specify the number of disease loci, effect sizes or population attributable risk, disease prevalence, and risk or protective loci. We also implemented a quantitative trait model, in which the user can specify the number of quantitative trait loci (QTL), proportions of variance explained by the QTL, and genetic models. In 2014, we extended SeqSIMLA to create SeqSIMLA2, which can simulate correlated traits and considers the shared environmental effects. SeqSIMLA2 can also simulate prespecified large pedigree structures. There are no restrictions on the number of individuals that can be simulated in a pedigree. In 2015, we implemented SeqSIMLA2_exact, which can simulate sequences with multiple disease sites in large pedigrees with given disease status for each pedigree member, assuming that the disease prevalence is low.
DescriptionSerial NetEvolve is a modification of the Treevolve program in which serially sampled sequences are evolved along a randomly generated coalescent tree or network (Grassly et al. 1999; Hudson 1983; Kingman 1982). Treevolve offers a variety of evolutionary models and population parameters, including a rate of recombination, and for this reason it was chosen over other programs to be adapted for the simulation of serially sampled data. The new features include the choice of either a clock-like model of evolution or a variable rate of evolution, the simulation of serial samples, and the output of the randomly generated tree or network in Newick format or in our newly formulated NeTwick format.
DescriptionSFS_CODE (Selection on Finite Sites under COmplex Demographic Events) performs forward population genetic simulations under a general Wright-Fisher model with arbitrary demographic, selective, and mutational effects.
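A forward Wright-Fisher simulation alternates selection and random sampling of gene copies each generation. The Python sketch below tracks a single biallelic site in a diploid population of constant size N; the fitness parametrization (AA = 1 + s, Aa = 1 + hs, aa = 1) and all names are assumptions, whereas SFS_CODE itself handles many sites, mutation and complex demography.

```python
import random

def wright_fisher(p0, n_diploid, s, h, max_generations, rng):
    """Forward Wright-Fisher trajectory for allele A at one biallelic site.

    Genotype fitnesses: AA = 1 + s, Aa = 1 + h*s, aa = 1. Each generation applies
    deterministic selection and then binomial drift among 2N gene copies.
    """
    p, two_n = p0, 2 * n_diploid
    trajectory = [p]
    for _ in range(max_generations):
        if p in (0.0, 1.0):   # allele lost or fixed: absorbing state
            break
        q = 1.0 - p
        w_bar = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
        p_sel = (p * p * (1 + s) + p * q * (1 + h * s)) / w_bar  # frequency after selection
        count = sum(rng.random() < p_sel for _ in range(two_n))  # Binomial(2N, p_sel)
        p = count / two_n
        trajectory.append(p)
    return trajectory

rng = random.Random(9)
for s in (0.0, 0.1):
    runs = [wright_fisher(0.1, 20, s, 0.5, 10_000, rng) for _ in range(1_000)]
    fixed = sum(traj[-1] == 1.0 for traj in runs) / len(runs)
    print(f"s = {s}: fixation probability ~ {fixed:.2f}")
```

Under neutrality (s = 0) the fixation probability should be close to the starting frequency (0.1 here), and positive selection should raise it.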
DescriptionSIBSIM is a modern and powerful computer program to simulate genotype and quantitative trait data in extended pedigrees. In the current release (2.1.2), we put emphasis on the simulation of a quantitative trait in pedigrees of arbitrary size without monozygotic twins. Well-known software such as the SIMULATE package is not as scalable as SIBSIM. As an advantage over both G.A.S.P. and SIMLA, SIBSIM is not restricted by predefined limits on either genome or family size. Instead, SIBSIM is designed to be as scalable as possible to meet any needs. SIBSIM may be used not only in simulation studies but also in the validation, verification and testing of other applications that implement statistical analyses of genomic data. We successfully used SIBSIM in the latter respect and detected a bug in a widely used genetic epidemiological software package.
DescriptionSimAdapt is a spatially explicit, individual-based, forward-time, landscape-genetic simulation model combined with a landscape cellular automaton to represent evolutionary processes of adaptation and population dynamics in changing landscapes, using the NetLogo environment.
DescriptionWe present here SIMCOAL2, an extended version of the SIMCOAL program (Excoffier et al. 2000), to simulate the neutral genetic diversity at partially linked loci under different histories and a wide range of migration and demographic models. SIMCOAL2 includes a number of new features compared to the previous version: the possibility of arbitrary recombination rates between adjacent loci; multiple coalescent events per generation, allowing the correct simulation of very large samples and very large recombining genomic regions; the simulation of SNP data with an arbitrary minimum frequency, for instance to simulate ascertainment bias; the output of diploid genotypic data generated under the assumption of Hardy-Weinberg equilibrium; and the simulation of a mixture of different data types (DNA sequence, RFLP, STR, or SNP) along a single chromosome.
DescriptionSimCopy is an R package simulating the evolution of copy number profiles along a tree. It relies on the PhyloSim package for performing the simulations by encoding the genomic regions as sites in sequences and using modified processes acting on them. Please note that SimCopy simulations are restricted to a single chromosome. The genomes are encoded as a sequence of sites containing integers identifying genomic regions. Negative integers represent inverted genomic regions. SimCopy supports 1) deletion: deletes genomic regions; 2) duplication: duplicates genomic regions; 3) inversion: changes the orientation of the genomic regions by taking the opposite of the corresponding integer; 4) inverted duplication: duplicates genomic regions and flips their orientation; and 5) translocation: translocates a stretch of genomic regions.
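The integer encoding described above can be illustrated with plain Python lists. This is a sketch under stated assumptions rather than SimCopy's PhyloSim-based implementation: duplicates are placed in tandem, an inversion reverses the order of the segment as well as negating its integers, and the function names are hypothetical.

```python
from collections import Counter

# Genome = list of integers naming genomic regions; a negative value marks an
# inverted region. Segments are given as genome[start:start + length].

def delete(genome, start, length):
    return genome[:start] + genome[start + length:]

def duplicate(genome, start, length):
    """Tandem duplication: the copy is placed right after the original segment."""
    end = start + length
    return genome[:end] + genome[start:end] + genome[end:]

def invert(genome, start, length):
    """Flip orientation (negate) and, as in a physical inversion, reverse the order."""
    end = start + length
    return genome[:start] + [-x for x in reversed(genome[start:end])] + genome[end:]

def inverted_duplicate(genome, start, length):
    end = start + length
    return genome[:end] + [-x for x in reversed(genome[start:end])] + genome[end:]

def translocate(genome, start, length, target):
    """Cut the segment out and reinsert it at index `target` of what remains."""
    end = start + length
    segment, rest = genome[start:end], genome[:start] + genome[end:]
    return rest[:target] + segment + rest[target:]

def copy_numbers(genome):
    """Copy number per region, ignoring orientation."""
    return Counter(abs(x) for x in genome)

g = [1, 2, 3, 4, 5]
print(invert(g, 1, 3))                          # [1, -4, -3, -2, 5]
print(copy_numbers(duplicate(g, 1, 2))[2])      # 2
```

Because orientation is carried in the sign, copy number profiles can be read off with `abs`, while inversions remain detectable.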
DescriptionA Whole-Genome Simulator Capable of Modeling High-Order Epistasis for Complex Disease
DescriptionSIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies. It allows the user flexibility in specifying marker and disease placement, locus heterogeneity, and disequilibrium both between markers and between markers and disease loci. Output is in the form of a LINKAGE (Lathrop et al., Proc Natl Acad Sci USA 81, 1984) pedigree file and is easily utilized, either directly or with minimal reformatting, as input for various genetic analysis packages.
DescriptionExisting simulation methods usually simulate linkage disequilibrium (LD) structures starting with an initial population that is randomly generated according to specified allele frequencies. Such random-initialization methods may be unstable because the LD level of the initial population is generally extremely low. This study presents a new algorithm, SIMLD, to simulate genome populations with real LD structures. SIMLD begins from an initial population with the highest possible LD level, and the LD then decays to the desired level through mating and recombination over generations. SIMLD can produce case–control samples according to various disease models. Using empirical SNP marker information from three HapMap populations, we implement the proposed algorithm and present a set of experimental results.
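The decay step relies on a classical result. In an idealized infinite population with random mating, linkage disequilibrium between two loci shrinks by a factor of (1 - r) each generation, so D_t = D_0(1 - r)^t for recombination fraction r. The sketch below applies that textbook formula to count the generations needed to reach a target level. The function names are hypothetical, and SIMLD itself works with finite simulated populations, in which drift also affects LD.

```python
# Hypothetical sketch of deterministic LD decay, D_t = D_0 * (1 - r)**t;
# it ignores drift and is not SIMLD's algorithm.

def ld_after(d0, r, t):
    """Expected disequilibrium D after t generations of random mating."""
    return d0 * (1.0 - r) ** t

def generations_to_target(d0, r, d_target):
    """Generations until |D| first falls to |d_target| or below."""
    if not 0.0 < r <= 0.5:
        raise ValueError("recombination fraction must be in (0, 0.5]")
    if d_target == 0.0:
        raise ValueError("D only approaches 0 asymptotically; use a nonzero target")
    t, d = 0, d0
    while abs(d) > abs(d_target):
        d *= 1.0 - r  # one generation of recombination
        t += 1
    return t
```

For unlinked loci (r = 0.5), D halves every generation, so a starting value of 0.25 reaches 0.03125 after three generations.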
DescriptionsimNGS is software for simulating observations from Illumina sequencing machines using the statistical models behind the AYB base-calling software. By default, observations only incorporate noise due to sequencing and do not incorporate effects from more esoteric sources of noise that may be present in real data ("dust", bubbles, merged clusters, sequence-heterogeneous clusters, etc). Many of these additional sources may optionally be applied. simNGS takes FASTA-format sequences and a file describing the covariance of noise between bases and cycles observed in an actual run of the machine, randomly generates noisy intensities representing the signal for the sequence at each cycle, and calculates likelihoods for all possible base calls.
DescriptionSimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data in either linkage disequilibrium or linkage equilibrium can be generated for more than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed is useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium and estimating empirical p values for linkage and family-based association studies.
DescriptionSimPEL is short for Simulation-based Power Estimation for sequencing studies of Low-prevalence conditions. SimPEL addresses the need for power estimation in low-prevalence condition studies, taking into account their inherently small sample sizes and analytical procedures. SimPEL integrates customizable parameters to realistically model study design outcomes and provides information useful for further refining experimental procedures. SimPEL is implemented as a function of the established JAWAMix5 tool (Long et al., 2013), an HDF5-based Java implementation for association mapping.
DescriptionProtein evolution has largely been modelled by considering the amino acid substitution process; however, there have been few studies of the process of insertion and deletion. Simprot allows for several models of amino acid substitution (PAM, JTT and PMB), allows for gamma-distributed site rates according to Yang's model, and implements a parameterised Qian and Goldstein distribution model for insertion and deletion.
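Gamma-distributed among-site rate variation is commonly parameterized with shape α and scale 1/α. Under this convention the mean relative rate is 1, and a smaller α means stronger heterogeneity. The sketch below draws continuous per-site rates under that convention. It is an illustrative assumption rather than Simprot's code, which may instead use Yang's discrete-category approximation. The function name `site_rates` is hypothetical.

```python
import random

# Hypothetical sketch of continuous gamma rate heterogeneity; not Simprot's code.

def site_rates(n_sites, alpha, seed=None):
    """Draw per-site relative rates from Gamma(shape=alpha, scale=1/alpha).

    random.gammavariate(alpha, beta) has mean alpha * beta, so beta = 1/alpha
    gives a mean rate of 1 across sites.
    """
    if alpha <= 0:
        raise ValueError("shape parameter alpha must be positive")
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]
```

Each site's substitution rate would then be the base rate multiplied by its drawn factor.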
DescriptionA program to generate and analyze sequence-based data for rare variant association studies of quantitative and qualitative traits
DescriptionThis project attempts to model as many of the quirks that exist in Illumina data as possible. These quirks include the potential for chimeric reads and the pull-down of non-biotinylated fragments in mate-pair libraries. Additionally, the program provides the ability to model both site a…
DescriptionsimuGWAS evolves a population forward in time, subject to rapid population expansion, mutation, recombination and natural selection. A trajectory simulation method is used to control the allele frequency of optional disease predisposing loci. A scaling approach can be used to improve efficiency when weak, additive genetic factors are used.
DescriptionSIMULATE is a computer program to simulate genotypes in family members for a map of linked markers unlinked to a given affection status locus. The markers are assumed to be in linkage equilibrium when genotypes are assigned to founders in pedigrees (except in SIMULATE3). Output from this program is in SLINK format and is ready for analysis with UNKNOWN, ISIM, LSIM, or MSIM of the SLINK package.
DescriptionsimuPOP is a general-purpose individual-based forward-time population genetics simulation environment. The core of simuPOP is a scripting language (Python) that provides a large number of objects and functions to manipulate populations, and a mechanism to evolve populations forward in time. Using this environment, users can create, manipulate and evolve populations interactively, or write a script and run it as a batch file. Owing to its flexible and extensible design, simuPOP can simulate large and complex evolutionary processes with ease.
DescriptionSimulating linkage disequilibrium structures in a human population for SNP association studies.
DescriptionAn open-source variant simulator and read generator capable of simulating all three common types of biological variants, taking into account the base-quality-score distribution of one of the most commonly used Illumina next-generation sequencing instruments. SInC can generate single- and paired-end reads with a user-defined insert size, with high efficiency compared to other existing tools. Because read generation is multi-threaded, SInC has a low time footprint. SInC is currently optimised for limited-infrastructure setups and can efficiently exploit the commonly used quad-core desktop architecture to simulate short sequence reads with deep coverage for large genomes.
DescriptionSISSI (Simulating Site-Specific Interactions) simulates the evolution of a nucleotide sequence along a phylogenetic tree, incorporating user-defined site-specific interactions. Furthermore, our method can simulate more complex interactions among nucleotide and other character-based sequences.
DescriptionSLiM is an evolutionary simulation framework that combines a powerful engine for population genetic simulations with the capability of modeling arbitrarily complex evolutionary scenarios. Simulations are configured via the integrated Eidos scripting language that allows interactive control over practically every aspect of the simulated evolutionary scenarios. The underlying individual-based simulation engine is highly optimized to enable modeling of entire chromosomes in large populations. For Mac OS X users (on OS X 10.11 or later), we also provide a graphical user interface for easy simulation set-up, interactive runtime control, and dynamical visualization of simulation output.
DescriptionSMARTPOP is a fast and flexible forward-in-time simulator for population genetics. Designed specifically for speed, it is available in serial and parallel versions. Developed for anthropological inference on human populations and eco-anthropological questions, SMARTPOP simulates individuals carrying sex-linked DNA sequences (mitochondrial DNA, X and Y chromosomes) and autosomes. SMARTPOP's flexible demographic model and social mating rules enable studies of social dynamics.
DescriptionSNPsim is a population genetic simulator that generates samples of SNP (Single Nucleotide Polymorphisms) haplotypes and diploid biallelic genotypes. It is based on the coalescent with recombination (Hudson 1983) modified by Wiuf and Posada (2003) to include recombination hotspots. SNPsim also allows for the specification of demographic periods and different mutation models.
DescriptionSPLATCHE (SPatiaL And Temporal Coalescences in Heterogeneous Environment) is a program that incorporates the influence of the environment when simulating the migration of a given species from one or several origins. In a second phase, the molecular genetic diversity of one or several samples drawn from the simulated species can be generated. The geographic area and environmental information must be specified by the user in a series of input files. The virtual world in which migrations take place consists of a matrix of demes, each with its own environmental characteristics according to the input files. A coalescent-based approach is used to generate the molecular diversity of any population sample. The molecular data obtained can then be analyzed to study the signature of the simulated demographic scenario. The goal of this online manual is to describe the technical aspects of the software SPLATCHE (version 1.1). This manual complements the article by Currat, Ray and Excoffier, published in 2004. Further details on the methodology can be found in Ray (2003) and Currat (2004). A PDF version of the user manual is also available for download.
DescriptionAn extension to SLINK/FastSLINK to allow a larger number of marker loci to be simulated in pedigrees conditional on trait values and in linkage equilibrium or disequilibrium with a trait locus.
DescriptionTreesimJ is a forward-time simulator of an evolving population that tracks the evolutionary tree of the entire population. The application offers an intuitive GUI, a variety of pre-configured models of fitness, mutation, and demography, and a suite of data collectors that analyze the population and emit data to one or more sources. To the user, TreesimJ offers a simple, easy-to-use interface, a variety of interchangeable 'models' describing many aspects of the evolving population, and many ways to quantify and summarize the state of the population. Since the entire tree of the population is tracked, TreesimJ can easily be used to assess the average time to the most recent common ancestor, the level of tree imbalance, or the mean pairwise coalescent time. It can also compute a number of familiar population genetic statistics, such as the nucleotide diversity and the number of segregating sites (if a model of fitness that includes DNA is used). The list of potential data collecting items is long, and getting
DescriptionVariant Simulation Tools is a module of Variant Tools for the simulation of genetic variants for sequencing-based genetic epidemiological studies. Although multiple simulation engines are provided, the core of VST is a novel forward-time simulation engine that simulates real nucleotide sequences of the human genome using DNA mutation models, fine-scale recombination maps, and a selection model based on amino acid changes of translated protein sequences. The design of VST allows users to easily create and distribute simulation methods and simulated datasets for a variety of applications and encourages fair comparison between statistical methods through the use of existing or reproduced simulated datasets.
DescriptionVORTEX is an individual-based simulation model for population viability analysis (PVA). This program will help you understand the effects of deterministic forces as well as demographic, environmental, and genetic stochastic (or random) events on the dynamics of wildlife populations. VORTEX models population dynamics as discrete, sequential events (e.g., births, deaths, catastrophes, etc.) that occur according to defined probabilities. The probabilities of events are modeled as constants or as random variables that follow specified distributions. Since the growth or decline of a simulated population is strongly influenced by these random events, separate model iterations or “runs” using the exact same input parameters will produce different results. Consequently, the model is repeated many times to reveal the distribution of fates that the population might experience under a given set of input conditions.
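The logic of repeating stochastic runs can be sketched with a deliberately simplified model. Each year every individual survives with a fixed probability, each survivor produces one offspring with a fixed probability, and the population is truncated at a carrying capacity. The parameters and function names are hypothetical and far simpler than VORTEX's age, sex and catastrophe structure. The point is only that identical inputs give different outcomes, so the fraction of runs ending in extinction estimates an extinction probability.

```python
import random

# Hypothetical toy population-viability model; not VORTEX's model or API.

def run_once(n0, years, survival, birth_prob, capacity, rng):
    """One stochastic run; returns the final population size (0 = extinct)."""
    n = n0
    for _ in range(years):
        # each individual survives independently with probability `survival`
        survivors = sum(rng.random() < survival for _ in range(n))
        # each survivor produces one offspring with probability `birth_prob`
        births = sum(rng.random() < birth_prob for _ in range(survivors))
        n = min(survivors + births, capacity)
        if n == 0:
            break
    return n

def extinction_probability(n_runs, seed=1, **params):
    """Repeat the same scenario n_runs times; return (P(extinct), final sizes)."""
    rng = random.Random(seed)
    finals = [run_once(rng=rng, **params) for _ in range(n_runs)]
    return sum(f == 0 for f in finals) / n_runs, finals
```

For example, `extinction_probability(1000, n0=10, years=50, survival=0.6, birth_prob=0.5, capacity=100)` returns both the estimated extinction probability and the full distribution of final sizes.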
DescriptionWessim is a simulator for targeted resequencing, such as exome sequencing. Wessim generates a set of artificial DNA fragments for next-generation sequencing (NGS) read simulation. In targeted resequencing, the genomic regions used to generate DNA fragments are constrained to only part of the genome; these are usually exons and/or a few introns and untranslated regions (UTRs).
DescriptionWgsim is a small tool for simulating sequence reads from a reference genome. It is able to simulate diploid genomes with SNPs and insertion/deletion (INDEL) polymorphisms, and simulate reads with uniform substitution sequencing errors. It does not generate INDEL sequencing errors, but this can be partly compensated by simulating INDEL polymorphisms. Wgsim outputs the simulated polymorphisms, and writes the true read coordinates as well as the number of polymorphisms and sequencing errors in read names. One can evaluate the accuracy of a mapper or a SNP caller with wgsim_eval.pl that comes with the package.
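A simplified Python sketch of the read-sampling idea follows. Reads are drawn from uniformly random positions on the forward strand of a haploid reference, each base is substituted with a fixed probability, and the read name records the true 1-based start and the error count. The naming scheme and function are hypothetical, not wgsim's output format. The sketch omits diploid polymorphisms, paired ends, and the reverse strand.

```python
import random

# Hypothetical sketch of uniform-error read simulation; not wgsim's code or format.

BASES = "ACGT"

def simulate_reads(reference, n_reads, read_len, error_rate, seed=0):
    """Return (name, sequence) pairs sampled from an uppercase ACGT reference."""
    if read_len > len(reference):
        raise ValueError("read length exceeds reference length")
    rng = random.Random(seed)
    reads = []
    for i in range(n_reads):
        start = rng.randrange(len(reference) - read_len + 1)
        bases = list(reference[start:start + read_len])
        n_err = 0
        for j, b in enumerate(bases):
            if rng.random() < error_rate:
                # substitute with one of the other bases, chosen uniformly
                bases[j] = rng.choice([x for x in BASES if x != b])
                n_err += 1
        # made-up name: true 1-based start and number of substitution errors
        reads.append((f"read{i}_pos{start + 1}_err{n_err}", "".join(bases)))
    return reads
```

Because the true position is in each name, a mapper's output can be scored by comparing reported and true coordinates, in the spirit of `wgsim_eval.pl`.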
DescriptionXS is a FASTQ read simulation tool that is flexible, portable (it does not need a reference sequence) and tunable in terms of sequence complexity. XS handles the Ion Torrent, Roche-454, Illumina and ABI-SOLiD sequencing types. It has several running modes, depending on the available time and memory, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and at testing FASTQ compression algorithms. Moreover, XS can simulate the three main FASTQ components individually (headers, DNA sequences and quality scores).
DescriptionSimulate 454-data using configurable statistical models at high speed
DescriptionArtificial Life Framework (ALF) aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), or speciation. ALF is available as a stand-alone application and a user-friendly yet powerful web interface.
DescriptionOutputs artificial FASTQ files derived from a reference genome.
DescriptionThe program BOTTLENECK computes, for each population sample and each locus, the distribution of the heterozygosity expected from the observed number of alleles (k), given the sample size (n), under the assumption of mutation-drift equilibrium. This distribution is obtained by simulating the coalescent process of n genes under two possible mutation models, the IAM and the SMM. This enables the computation of the average (Hexp), which is compared to the observed heterozygosity (Hobs, in the sense of Nei's gene diversity) to establish whether there is a heterozygosity excess or deficit at this locus. In addition, the standard deviation (SD) of the mutation-drift equilibrium distribution of the heterozygosity is used to compute the standardized difference for each locus, (Hobs - Hexp)/SD. The distribution obtained through simulation also enables the computation of a P-value for the observed heterozygosity.
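The quantities compared above can be computed as in the sketch below. It assumes Nei's unbiased gene diversity, H = n/(n - 1) × (1 - Σ p_i²), where n is the number of sampled gene copies and p_i the allele frequencies. Hexp and SD are taken as inputs because in BOTTLENECK they come from the coalescent simulations under IAM or SMM, which this sketch does not reproduce. The function names are hypothetical.

```python
from collections import Counter

# Hypothetical sketch of the observed-vs-expected comparison; not BOTTLENECK's code.

def gene_diversity(alleles):
    """Nei's unbiased gene diversity from a list of sampled gene copies."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two gene copies")
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))

def standardized_difference(h_obs, h_exp, sd):
    """(Hobs - Hexp) / SD, with Hexp and SD taken from simulation."""
    return (h_obs - h_exp) / sd
```

A positive standardized difference indicates a heterozygosity excess relative to mutation-drift equilibrium, and a negative one a deficit.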
DescriptionClotho is a C++ library of efficient data structures, algorithms, and tools for use in forward-time population genetic simulation. The name refers to the youngest of the Three Fates, or Moirai, who was responsible for spinning the thread of human life.
DescriptionCuReSim (Customized Read Simulator) is a customized tool that generates synthetic next-generation sequencing reads, supporting read simulation for the major letter-based sequencing platforms. CuReSim is developed in Java and is distributed as an executable jar file. Wrappers to integrate CuReSim in Galaxy are also available.
DescriptionDNA Assembly with Gaps (Dawg) is an application designed to simulate the evolution of recombinant DNA sequences in continuous time based on the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of gap formation. The application accepts phylogenies in Newick format and can return the sequence of any node, allowing for the exact evolutionary history to be recorded at the discretion of users. Dawg records the gap history of every lineage to produce the true alignment in the output. Many options are available to allow users to customize their simulations and results.
DescriptionWhole genome simulation can be performed with dwgsim. dwgsim is based on wgsim, found in SAMtools and written by Heng Li, and was forked from DNAA. It was modified to handle ABI SOLiD and Ion Torrent data, as well as various assumptions about aligners and positions of indels. Many new features have since been added.
DescriptionEnhanced Artificial Genome Engine: next generation sequencing reads simulator
DescriptionEASYPOP can simulate haploid, diploid or haplodiploid data. For diploids, there is a choice between hermaphrodites and sexuals. For hermaphrodites, the proportion of clonal reproduction and selfing can be chosen, whereas for sexuals, complex breeding structures can be simulated (e.g., monogamy with a given proportion of extra-pair matings). The number of individuals can be set for each population, and dispersal is sex-specific. Various migration models are available, such as the two-dimensional stepping-stone and hierarchical island models. In addition, an isolation-by-distance option works with the coordinates of the populations in any number of dimensions. Several mutation models are also implemented, particularly oriented toward the simulation of microsatellite loci. Genotypes are truly multilocus (i.e., the loci are not independent replicates). All mutation parameters can be set individually for each locus. The code has been optimized for speed, so EASYPOP can run very large simulations on standard personal computers in a reasonable amount of time, limited only by the machine's memory. To match analytical expectations, in particular for variances, the functions implemented in EASYPOP are probabilistic rather than deterministic; in other words, the simulations rely on the generation of random numbers.
DescriptionEvolSimulator is a program that allows the simulation of evolution at the level of genes, gene families, and whole genomes. It was designed with the goal of investigating evolutionary phenomena like biased mutation regimes in different lineages, complicated patterns of selective pressure across sequences, and the confounding effects of paralogy and lateral genetic transfer.
DescriptionNGS data characterization and in silico read generation
DescriptionFIGG is a genome simulation tool that uses known or theorized variation frequencies, per given fragment size and grouped by GC content across a genome, to model new genomes in FASTA format while tracking the applied mutations for use in analysis tools or population simulations. FIGG uses Apache MapReduce and HBase to rapidly generate individual genomes and allows users to scale up generation to fit specific project needs.
DescriptionFPG (for Forward Population Genetic simulation) simulates a population of constant size that is undergoing various evolutionary processes, including: mutation, recombination, natural selection, and migration. The meaning of "forward" in this context is simply that time, within the simulation, moves forward just as it does in the real world. This is in contrast to coalescent population genetic simulation in which time, as represented within the simulation, proceeds back into the past. Coalescent simulations have many advantages, but they are unwieldy if they incorporate natural selection on multiple sites.
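A single forward-time generation loop can be sketched with a haploid Wright-Fisher model at one biallelic site. In each generation, selection reweights allele frequencies, then binomial sampling of a constant number of offspring adds drift. This is an illustrative assumption, not FPG's implementation. Mutation, recombination and migration are omitted, and the function name is hypothetical.

```python
import random

# Hypothetical forward-time Wright-Fisher sketch; not FPG's implementation.

def wright_fisher(n, p0, s, generations, seed=0):
    """Track the frequency of allele A (relative fitness 1 + s) in a haploid
    population of constant size n; returns frequencies for generations 0..T."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # selection: reweight A by its relative fitness
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # drift: draw n offspring, each carrying A with probability p_sel
        count = sum(rng.random() < p_sel for _ in range(n))
        p = count / n
        trajectory.append(p)
    return trajectory
```

Time runs forward through the loop, so any quantity of interest (fixation, loss, time to fixation) is read directly off the trajectory. A coalescent simulator would instead reconstruct the sample's genealogy backward in time.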
DescriptionThe Genometric Analysis Simulation Program (G.A.S.P.) is a software tool that can generate samples of family data based on user specified genetic models. Data generated can be as simple as a single sample of random individuals with a single normally distributed trait or as complex as thousands of samples of extended families with multiple traits based on a linear combination of major locus, polygenic, common sibship environment and covariate components. Traits can be generated based on a number of user specified components, and components can be unique to a single trait or shared by multiple traits. The user first specifies a list of all desired components and then creates each trait by specifying the desired components, each weighted by its contribution to the phenotypic variance. G.A.S.P. can be used in two ways. First, data can be generated in a standalone fashion. The resulting family data can be saved and then used as sample data for demonstrating applications and methods of genetic analysis or for testing and verifying newly developed algorithms in statistical genetics. A simple driver ("dataonly") is provided for this application. Second, data can be generated and analyzed immediately using an existing statistical package. A driver can be designed to call subroutine versions of widely available genetic analysis programs.
DescriptionGeneArtisan: Simulation of Markers in Case-Control Study Designs, Version 1.1, Release Date 22 May 2005. Note: This release implements an improved algorithm for simulating samples that allows larger intervals to be used and dramatically improves execution time. Reference: Y. Wang and B. Rannala. 2005. In Silico Analysis of Disease-Association Mapping Strategies Using the Coalescent Process and Incorporating Ascertainment and Selection. American Journal of Human Genetics 76:1066-1073.
DescriptionGppFst is a posterior predictive simulation (PPS) framework that generates theoretical distributions of FST and dXY under the neutral coalescent model for two populations, accounting for demographic parameters in a probabilistic framework. Importantly, our method allows users to explicitly test the null hypothesis of genetic drift when conducting genomic scans. PPS is a popular method for evaluating model fit within a Bayesian framework and has been used to test a variety of evolutionary models (Gelman et al., 2004; Reid et al., 2014). GppFst explicitly accounts for the demographic history of two genetically isolated species, including multiple demographic and experimental parameters (and the uncertainty in those parameters), such as sample sizes, unequal rates of genetic drift within populations, and divergence time. Our method allows users to simulate theoretical distributions conditioned on sampling multiple linked SNPs per locus, so that users can take full advantage of large genomic datasets. We provide our PPS model in the package GppFst (Genomic Posterior Predictive distributions of FST), a user-friendly, open-source framework.
DescriptionGrinder is a versatile open-source bioinformatic tool to create simulated omic shotgun and amplicon sequence libraries for all main sequencing platforms.
DescriptionHAP-SAMPLE is a web application for simulating SNP genotypes for case-control and affected-child trio studies by resampling from Phase I/II HapMap SNP data. The user provides a list of SNPs to be "genotyped," along with a disease model file that describes causal SNPs and their effect sizes. The simulation tool is appropriate for candidate regions or whole-genome scans. The stand-alone software is also available.
DescriptionPacBio sequencers produce two types of characteristic reads: CCS (short, with a low error rate) and CLR (long, with a high error rate), both of which can be useful for de novo genome assembly. PBSIM simulates these PacBio reads using either model-based or sampling-based simulation.
Descriptionphenosim reads the output of commonly used coalescent simulators and simulates a phenotype based on a user-defined trait model for each individual. The simulated data can be used to assess the influence of various factors such as demography, genetic architecture or selection on the statistical power of association methods to detect causal genetic variants under a wide variety of population genetic scenarios.
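As an example of a user-defined trait model, the sketch below assumes a purely additive model. Each individual's genetic value is the sum of effect sizes times genotype codes (0, 1 or 2 copies of the effect allele). Gaussian environmental noise is scaled so that the genetic share of the variance equals a target heritability h2. The model, genotype coding and function name are assumptions for illustration, not phenosim's interface.

```python
import random
import statistics

# Hypothetical additive trait model; not phenosim's interface.

def simulate_phenotypes(genotypes, effects, h2, seed=0):
    """genotypes: one list of 0/1/2 codes per individual; effects: one beta per variant.

    Environmental variance is set to var_g * (1 - h2) / h2, so that
    var_g / (var_g + var_e) equals h2 for the simulated sample.
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)
    g_values = [sum(b * g for b, g in zip(effects, row)) for row in genotypes]
    var_g = statistics.pvariance(g_values)
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    return [gv + rng.gauss(0.0, sd_e) for gv in g_values]
```

Running association tests on phenotypes simulated this way, over many replicates, gives an empirical estimate of power for the chosen genetic architecture.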
DescriptionThis package can be used to simulate RNA-seq reads from differential expression experiments with replicates. The reads can then be aligned and used to perform comparisons of methods for differential expression.
DescriptionSimple read simulator for PacBio and Nanopore data
DescriptionRECOAL simulates new haplotype data from a reference population of haplotypes. A coalescent genealogy for the reference haplotype data is sampled from the appropriate posterior probability distribution, then a coalescent genealogy is simulated which extends the sampled genealogy to include new haplotype data. The new haplotype data will therefore contain both some of the existing polymorphic sites and new polymorphisms added based on the structure of the simulated coalescent genealogy.
DescriptionSelSim is a program for Monte Carlo simulation of DNA polymorphism data for a recombining region within which a single bi-allelic site has experienced natural selection. SelSim allows simulation from either a fully stochastic model of, or deterministic approximations to, natural selection within a coalescent framework. A number of different mutation models are available for simulating surrounding neutral variation. The package enables a detailed exploration of the effects of different models and strengths of selection on patterns of diversity. This provides a tool for the statistical analysis of both empirical data and methods designed to detect natural selection.
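Deterministic approximations of this kind typically replace the stochastic frequency path of the selected allele with a smooth curve. One standard example is the logistic solution for genic selection with coefficient s: p(t) = p0·exp(s·t) / (1 - p0 + p0·exp(s·t)). This curve is assumed here for illustration and is not necessarily SelSim's exact approximation. The function name is hypothetical.

```python
import math

# Hypothetical sketch of a deterministic selection trajectory; not SelSim's code.

def logistic_trajectory(p0, s, t):
    """Frequency of the selected allele at time t under genic selection with
    coefficient s (continuous-time logistic approximation, no drift)."""
    growth = p0 * math.exp(s * t)
    return growth / (1.0 - p0 + growth)
```

In a coalescent framework, such a trajectory would serve as the fixed background of frequencies against which lineages at the selected site and linked neutral sites are traced back in time.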
DescriptionWe develop a new user-friendly and integrated R package, sim1000G, which simulates genomic regions for unrelated individuals or for families. The only input needed is a single raw, phased Variant Call Format (VCF) file. Haplotypes are extracted to compute linkage disequilibrium (LD) in the simulated region and are then used to generate new genotype data for unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Pedigrees of arbitrary size are generated by modeling recombination events within sim1000G. Various simulation scenarios are presented, assuming unrelated individuals from a single population or from two distinct populations, or alternatively three-generation family data. sim1000G can capture allele frequency diversity, short- and long-range LD patterns, and subtle population differences in LD structure without the need for any tuning parameters.
DescriptionGiven a reference sequence, simhtsd will create a large set of short nucleotide reads, simulating the output from today's high-throughput DNA sequencers, such as the Illumina Genome Analyzer II.
DescriptionSPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user. Numerous variables controlling the age structure of the population, the number of offspring produced, the variance in male and female reproductive success, survival rates of different age classes, mate fidelity, duration of simulation, etc. can be specified by the user. The program stores the pedigree of all individuals in the simulated population. This pedigree is used to simulate genetic data on sampled individuals by tracing lineages back through paternal or maternal genes within each sampled individual. Data may be simulated for an arbitrary number of loci that are assumed to be independently segregating and to not be subject to natural selection, nor linked to any selected genes. Genotypes are reported in terms of both "founder alleles" (i.e., each distinct allele amongst the founders of the pedigree is given a distinct label) and also in terms of alleles whose frequencies amongst the founding members of the pedigree may be specified by the user.
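The founder-allele labelling described above can be illustrated with a simple gene-drop sketch for one neutral, unlinked locus. Each founder receives two unique labels, and each offspring inherits one randomly chosen allele from each parent. Unlike SPIP, which traces lineages back from sampled individuals, this sketch moves forward through a pedigree listed parents-first. The tuple format and function name are hypothetical.

```python
import random

# Hypothetical gene-drop sketch with founder-allele labels; not SPIP's code.

def gene_drop(pedigree, seed=0):
    """pedigree: list of (individual, father, mother), parents listed before
    offspring; founders have father = mother = None.
    Returns {individual: (paternal_label, maternal_label)}."""
    rng = random.Random(seed)
    genotypes = {}
    next_label = 0
    for ind, father, mother in pedigree:
        if father is None and mother is None:
            # each founder carries two distinct, uniquely labelled alleles
            genotypes[ind] = (next_label, next_label + 1)
            next_label += 2
        elif father is None or mother is None:
            raise ValueError(f"{ind}: give both parents or neither")
        else:
            # Mendelian transmission: one random allele from each parent
            genotypes[ind] = (rng.choice(genotypes[father]),
                              rng.choice(genotypes[mother]))
    return genotypes
```

Mapping each founder label to an allele drawn from user-specified frequencies would give the second kind of genotype report described above.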
Descriptionsrv simulates the introduction and evolution of genetic variants in one or more chromosomal regions. These regions span roughly 10 kb to 100 kb and can be regarded as genes. During evolution, mutants are introduced into the population and change the fitness of the individuals who carry them. The most distinguishing feature of this script is that it allows multi-locus fitness schemes that apply random or locus-specific diploid single-locus selection models to newly arising mutants. A multi-locus selection model is then used to assign a fitness value to each individual according to the mutants it carries.
DescriptionThe R package ThetaMater provides a Bayesian framework to simulate posterior probability distributions of θ. At the core of ThetaMater is the infinite-sites likelihood function described in Watterson (1975) and Tavaré (1984), which gives the probability of observing k mutations in a sample of n sequences obtained from a locus of length l (see the manual for a model description). Integral to ThetaMater is a suite of functions for simulating population genetic data under the infinite-sites model, given θ. These functions are used to simulate realistic datasets under the neutral coalescent model, which can be used to identify potential paralogous loci using posterior predictive simulation. With these simulated data, users can identify loci with unexpected patterns of genetic variation (i.e., unlikely mutation counts). Furthermore, ThetaMater includes several functions for simulating datasets that have evolved under models of among-site variation across the genome.
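Following Tavaré (1984), the infinite-sites likelihood can be written as P(S = k) = ((n - 1)/θ) Σ_{i=1}^{n-1} (-1)^{i-1} C(n-2, i-1) (θ/(θ + i))^{k+1}. Here θ is the population mutation rate for the whole locus, assumed to be the per-site θ multiplied by l. The sketch below evaluates that formula directly, together with Watterson's (1975) estimator θ_W = k / Σ_{i=1}^{n-1} 1/i. It is not ThetaMater's R code, and the function names are hypothetical. The alternating sum can lose precision for large n.

```python
from math import comb

# Hypothetical sketch of the infinite-sites formulas; not ThetaMater's R code.

def prob_segregating_sites(k, n, theta):
    """P(S_n = k): probability of k segregating sites in a sample of n
    sequences under the neutral coalescent with infinite sites (Tavare 1984).
    `theta` is the locus-wide mutation rate (per-site theta times length l)."""
    if n < 2:
        raise ValueError("need at least two sequences")
    total = 0.0
    for i in range(1, n):
        total += (-1) ** (i - 1) * comb(n - 2, i - 1) * (theta / (theta + i)) ** (k + 1)
    return (n - 1) / theta * total

def watterson_theta(k, n):
    """Watterson's estimator: k divided by the harmonic number a_n."""
    a_n = sum(1.0 / i for i in range(1, n))
    return k / a_n
```

For n = 2 the formula reduces to θ^k / (θ + 1)^(k + 1), the geometric distribution expected when mutation (rate θ) competes with a single coalescence (rate 1).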