GSR: Compare Simulators by Attribute
454sim
https://sourceforge.net/projects/bioinfo-454sim/
Simulate 454-data using configurable statistical models at high speed
Description
Simulate 454-data using configurable statistical models at high speed
ArtificialFastqGenerator
https://sourceforge.net/projects/artfastqgen/
Outputs artificial FASTQ files derived from a reference genome.
Description
Outputs artificial FASTQ files derived from a reference genome.
HaploSim
http://cran.r-project.org/web/packages/HaploSim/index.html
Functions to simulate haplotypes
Description
Simulate haplotypes through meioses. Allows specification of population parameters.
AdmixSim2
https://github.com/Shuhua-Group/AdmixSim2
AdmixSim2 is a forward-time simulator of population genetic data.
Description
AdmixSim2 is an individual-based forward-time simulation tool that can flexibly and efficiently simulate population genomics data under complex evolutionary scenarios. It is based on the extended Wright-Fisher model, and it implements many common evolutionary parameters, covering gene flow, natural selection, recombination, and mutation. AdmixSim2 can be used to simulate data of dioecious or monoecious populations, autosomes, or sex chromosomes.
Aladyn
http://www.katja-schiffers.eu/research.html
Tools to investigate how demographic parameters, population genetics and abiotic conditions affect the rate of adaptation
Description
Tools to investigate how demographic parameters, population genetics and abiotic conditions affect the rate of adaptation
ALF
A Simulation Framework for Genome Evolution
Description
Artificial Life Framework (ALF) aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), or speciation. ALF is available as a stand-alone application and a user-friendly yet powerful web interface.
AliSim
http://www.iqtree.org/doc/AliSim
A fast and versatile phylogenetic sequence simulator
Description
AliSim is a new tool that can efficiently simulate biologically realistic alignments under a large range of complex evolutionary models. To achieve high performance across a wide range of simulation conditions, AliSim implements an adaptive approach that combines the commonly used rate matrix and probability matrix approaches. AliSim takes 1.4 hours and 1.3 GB RAM to simulate alignments with one million sequences or sites, whereas popular software like Seq-Gen, Dawg, and INDELible require 2-5 hours and 50-500 GB of RAM for the same task.
AnA-FiTS
http://sco.h-its.org/exelixis/web/software/anafits/index.html
an efficient tool for simulating polymorphism data forward-in-time on the chromosome and genome level
Description
AnA-FiTS is an efficient tool for simulating polymorphism data forward-in-time on the chromosome and genome level. Its most striking feature is high runtime efficiency, particularly when part of the sequence to be simulated is neutral. Furthermore, for the neutral part of the sequence, AnA-FiTS stores (and outputs) a graph structure that allows reconstruction of the ancestral part of each haplotype that survived into the present at any point in time.
ARGON
https://github.com/pierpal/ARGON
Fast, whole-genome simulation of the discrete time Wright-Fisher process
Description
ARGON simulates the discrete time Wright-Fisher process (DTWF) backwards in time. The coalescent is equivalent to the DTWF process if the sample size is small compared to the effective population size, but will deviate from it as the sample size increases (Wakeley and Takahashi, MBE 2003; Bhaskar, Clark and Song, PNAS 2014). ARGON supports arbitrary demographic history, migration, variable mutation/recombination rates and gene conversion, and efficiently outputs pairwise identical-by-descent (IBD) sharing data.
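To make the backward-in-time DTWF idea concrete, here is a minimal sketch. It is not ARGON's algorithm: it assumes a single haploid population of constant size with no recombination, migration, or mutation, and the function name `dtwf_time_to_mrca` is hypothetical. Each generation, every surviving lineage picks a parent uniformly at random, and lineages that pick the same parent merge.

```python
import random


def dtwf_time_to_mrca(sample_size, pop_size, rng):
    """Generations back until the sample shares one common ancestor.

    Assumes a haploid, constant-size population (illustrative only).
    """
    lineages = sample_size
    generations = 0
    while lineages > 1:
        # Each lineage chooses a parent in the previous generation;
        # lineages choosing the same parent coalesce.
        parents = {rng.randrange(pop_size) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


if __name__ == "__main__":
    rng = random.Random(1)
    print(dtwf_time_to_mrca(sample_size=50, pop_size=1000, rng=rng))
```

Unlike the standard coalescent, which merges exactly two lineages at a time, this process can merge several lineages in one generation, which is why the two models diverge as the sample size approaches the population size.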
ART
http://www.niehs.nih.gov/research/resources/software/biostatistics/art/
ART is a set of simulation tools to generate synthetic next-generation sequencing data by mimicking the real sequencing process with empirical error models or quality profiles.
Description
ART is a set of simulation tools to generate synthetic next-generation sequencing data by mimicking the real sequencing process with empirical error models or quality profiles. ART supports simulation of single-end, paired-end and mate-pair reads of three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can perform regular genome sequencing simulation as well as amplicon sequencing simulation. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in the FASTQ format, and alignments in the ALN/MAP and/or SAM format. ART can also generate alignments in UCSC BED file format.
BAMSurgeon
https://github.com/adamewing/bamsurgeon
Methods for realistic simulation of mutations in real data.
Description
BAMSurgeon can add SNVs, INDELs, and several forms of structural variant (SV) to existing BAM files using multiple alignment methods, which is useful for testing mutation detection software in a variety of contexts.
Bayesian Serial SimCoal
http://www.stanford.edu/group/hadlylab/ssc/index.html
Bayesian Serial SimCoal (BayeSSC) is a modification of SIMCOAL 1.0, a program written by Laurent Excoffier, John Novembre, and Stefan Schneider.
Description
BayeSSC is powerful because it allows flexible coalescent modelling from a variety of different priors. This enables parameter estimation, likelihood calculations, and Bayesian inference. Typically, BayeSSC generates thousands of hypothetical trees using slightly different population parameters. The simulated genetics of these trees can then be compared to the actual genetics of the user's samples to investigate which of these many simulated histories is the most likely to have generated the samples.
BaySICS
https://sites.google.com/site/baysicsabc/
An integral platform with a graphical interface for statistical inference based on approximate Bayesian computation.
Description
BaySICS is made of five programs accessible from the same graphical interface. The first program performs coalescent simulations and creates reference tables containing summary statistics from simulated DNA alignments. The second and third programs perform post-simulation analysis employing the reference tables and obtain parameter estimates or model choice (hypothesis contrasts) respectively. The fourth and fifth programs perform validation procedures for assessing the statistical power as well as the robustness of the inference by means of pseudo-observed datasets. BaySICS was designed to be user-friendly and to optimize studies of ancient DNA.
BEERS
BEERS was designed to benchmark RNA-Seq alignment algorithms and also algorithms that aim to reconstruct different isoforms and alternate splicing from RNA-Seq data
Description
By default BEERS simulates either mouse or human paired-end RNA-Seq data modeled on the Illumina platform. It starts with a large number of gene models (approx 500K) taken from about ten different published annotation efforts, and then chooses a fixed number of these genes at random (30,000 by default). This avoids biasing for or against any particular set of annotations. BEERS then introduces substitutions, indels, alternate splice forms, sequencing errors, and intron signal. BEERS can also simulate strand specific reads. BEERS does not simulate quality scores. There are four configuration files required (available below).
bmsim
https://github.com/pingchen09990102/BMSIM
BioNano Molecule SIMulator
Description
BioNano Molecule SIMulator (BMSIM) explicitly incorporated BioNano data models (BioNano molecule length distribution, FN and FP signals, DNA molecule stretching variations, variation in optical resolution, and fragile sites) and the methods to generate chimeric molecules and assign SNR scores for simulated BioNano molecules. We simulated noisy maps from ‘perturbed’ versions of the reference map. Using genomic sequences (.fasta file) as input, BMSIM simulated noisy maps with five main steps: I) generate BioNano molecules with random fragmentation and fragile site bias model; II) label nicking sites for BioNano molecules by in silico restriction digestion. Our program supported all available nicking enzymes currently used in the BioNano system (i.e., Nt.BspQI, Nb.BbvCI, Nb.BsmI and Nb.BsrDI), as well as any artificial nicking sequences that users chose to define; III) incorporate data models for FN sites, FP sites, stretching variations, optical resolution, and chimerism for BioNano molecules; IV) assign SNR and intensity scores for labelling sites; V) iterate for targeted coverage depth. The output of BMSIM is a BNX format text file (.BNX, see example BNX file) which contains molecule map length, label positions, and label signal score, etc.
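Step II above, labelling nicking sites by in silico digestion, can be illustrated with a small sketch. This is not BMSIM's code: the function names are hypothetical, the motif is passed in by the caller, and the example motif `GCTCTTC` is used here on the assumption that it is the Nt.BspQI recognition sequence. The sketch only reports where the motif occurs on either strand; it does not model false negatives, false positives, or stretching.

```python
# Translation table for complementing uppercase DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def label_sites(sequence, motif):
    """Return 0-based start positions where the motif occurs on either strand."""
    sequence = sequence.upper()
    motif = motif.upper()
    targets = {motif, reverse_complement(motif)}
    width = len(motif)
    return [
        i
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] in targets
    ]


if __name__ == "__main__":
    # Toy molecule containing the motif once on each strand.
    print(label_sites("AAGCTCTTCAAGAAGAGCTT", "GCTCTTC"))
```

Taking the motif as a parameter mirrors the description's point that users may define artificial nicking sequences.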
BOTTLENECK
http://www1.montpellier.inra.fr/CBGP/software/Bottleneck/bottleneck.html
Bottleneck is a program for detecting recent effective population size reductions from allele frequency data
Description
The program BOTTLENECK computes for each population sample and for each locus the distribution of the heterozygosity expected from the observed number of alleles (k), given the sample size (n) under the assumption of mutation-drift equilibrium. This distribution is obtained through simulating the coalescent process of n genes under two possible mutation models, the IAM and the SMM. This enables the computation of the average (Hexp) which is compared to the observed heterozygosity (Hobs, in the sense of Nei's gene diversity) to establish whether there is a heterozygosity excess or deficit at this locus. In addition, the standard deviation (SD) of the mutation-drift equilibrium distribution of the heterozygosity is used to compute the standardized difference for each locus ((Hobs-Hexp)/SD). The distribution obtained through simulation also enables the computation of a P-value for the observed heterozygosity.
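The per-locus comparison described above can be sketched in a few lines. This is an illustration rather than BOTTLENECK's implementation: the simulated heterozygosities are assumed to come from some coalescent simulator conditioned on k and n, the function name is hypothetical, and the use of the sample standard deviation and of a one-sided empirical P-value for excess are assumptions.

```python
import statistics


def heterozygosity_test(h_obs, simulated_h):
    """Compare observed heterozygosity with a simulated equilibrium distribution.

    simulated_h: heterozygosity values simulated under mutation-drift
    equilibrium (hypothetical input, produced elsewhere).
    """
    h_exp = statistics.mean(simulated_h)
    sd = statistics.stdev(simulated_h)  # sample SD, an assumption
    standardized = (h_obs - h_exp) / sd
    # Empirical one-sided P-value: share of simulations at least as
    # heterozygous as the observation (tests for excess).
    p_excess = sum(h >= h_obs for h in simulated_h) / len(simulated_h)
    return h_exp, sd, standardized, p_excess


if __name__ == "__main__":
    h_exp, sd, z, p = heterozygosity_test(0.6, [0.4, 0.5, 0.6])
    print(h_exp, sd, z, p)
```

A large positive standardized difference together with a small P-value would point to a heterozygosity excess at that locus.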
BottleSim
https://github.com/chihhorngkuo/BottleSim
a simulation program for changes in genetic diversity during the process of population bottlenecks
Description
Population bottlenecks reduce genetic diversity and thus cause great concern in conservation biology. Previous theoretical studies often assume discrete generations in projecting declines in genetic diversity caused by bottlenecks. This assumption creates complexities when applying the models to long-lived species with overlapping generations. BottleSim is a program for simulating bottlenecks to estimate the impact on genetic diversity; the novelties include an overlapping-generation model, a wide range of reproductive systems, and flexible population size settings. With these features, BottleSim will be a useful tool for estimating the genetic consequences of bottlenecks, evaluating conservation plans, and performing power analysis.
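For context, the discrete-generation baseline that the description contrasts BottleSim with can be written down directly. The sketch below uses the classical expectation that heterozygosity shrinks by a factor of 1 - 1/(2N) per generation in a diploid population of size N. It is not BottleSim's overlapping-generation model, and the function name is hypothetical.

```python
def expected_heterozygosity(h0, sizes):
    """Expected heterozygosity after each generation of drift.

    h0: starting heterozygosity.
    sizes: diploid population size in each successive generation.
    Assumes discrete, non-overlapping generations (classical result).
    """
    h = h0
    trajectory = [h]
    for n in sizes:
        # Drift removes a fraction 1/(2N) of heterozygosity per generation.
        h *= 1.0 - 1.0 / (2.0 * n)
        trajectory.append(h)
    return trajectory


if __name__ == "__main__":
    # Five generations at N = 10, then recovery to N = 500.
    for generation, h in enumerate(expected_heterozygosity(0.8, [10] * 5 + [500] * 5)):
        print(generation, round(h, 4))
```

Passing a list of sizes rather than a single N makes it easy to express a bottleneck followed by recovery.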
CAMISIM
https://github.com/CAMI-challenge/CAMISIM
Simulating metagenomes and microbial communities
Description
CAMISIM is software for modelling abundance distributions of microbial communities and simulating corresponding shotgun metagenome datasets. It was mainly developed for the Critical Assessment of Metagenome Annotation (CAMI) challenge, but should be suitable for general use. Please don't hesitate to open a new issue if you run into problems or need help.
CAMPAREE
https://github.com/itmat/CAMPAREE
a robust and configurable RNA expression simulator
Description
CAMPAREE is an RNA expression simulator that is primed using real data to give realistic output. CAMPAREE needs as input a reference genome with transcript annotations as well as fastq files of samples of the species to base the output on. For each sample, CAMPAREE outputs a simulated set of RNA transcripts mimicking expression levels within the fastq files and accounting for isoform-level expression and allele-specific expression. It also outputs simulated diploid genomes and their corresponding annotations with phased SNP and indel calls in the transcriptome from fastq reads. Additionally, the simulation outputs the underlying distributions used for expressing the transcripts.
CancerInSilico
https://github.com/FertigLab/CancerInSilico
The CancerInSilico package provides an R interface for running mathematical models of tumor progression. This package has the underlying models implemented in C++ and the output and analysis features implemented in R.
Description
The CancerInSilico package provides an R interface for running mathematical models of tumor progression. This package has the underlying models implemented in C++ and the output and analysis features implemented in R.
CASS
https://liberles.cst.temple.edu/Software/CASS/index.html
Protein Sequence Simulation
Description
CASS provides simulated protein (codon) sequences from a population genetic context with a protein structure-dependent explicit genotype-phenotype map.
CDPOP
https://github.com/ComputationalEcologyLab/CDPOP
CDPOP is a landscape genetics tool for simulating the emergence of spatial genetic structure in populations resulting from specified landscape processes governing organism movement behavior.
Description
CDPOP (Cost Distance POPulations) is an individual-based simulator of gene flow in complex landscapes to explain observed population responses and provide a foundation for landscape genetics. It models genetic exchange among spatially located individuals as a function of individual-based movement through mating and dispersal, incorporating population dynamics and all the factors that affect the frequency of an allele in a population (mutation, gene flow, genetic drift, and selection). Users initially specify individual locations, environmental conditions governing gene flow, spatially-explicit fitness landscapes governing selection, and various genic configurations, and CDPOP models divergence through time as a function of individual-based movement, breeding and dispersal as functions of the given landscape surfaces.
CellCoal
https://github.com/dapogon/cellcoal
CellCoal: Coalescent Simulation of Single-Cell Sequencing Samples
Description
CellCoal simulates the somatic evolution of single cells. CellCoal generates a coalescent genealogy for a sample of somatic cells –no recombination– obtained from a growing population, together with another cell as outgroup, introduces mutations along this genealogy, and produces single-cell diploid genotypes (single-nucleotide variants or SNVs). CellCoal implements multiple mutation models (0/1, DNA, infinite and finite site models, deletion, copy-neutral LOH, 30 cancer signatures) and is able to generate read counts and genotype likelihoods considering allelic dropout, sequencing and amplification error, plus doublet cells.
Clotho
https://github.com/putnampp/clotho
a C++ library of efficient data structures, algorithms, and tools for use in Forward Time Population Genetic Simulation
Description
Clotho is a C++ library of efficient data structures, algorithms, and tools for use in Forward Time Population Genetic Simulation. The name is in reference to the youngest sister of the Three Fates or Moirai. She was responsible for spinning the thread of human life.
Coala
https://github.com/statgenlmu/coala
Coala is an R package that simulates biological sequences according to a given model of evolution.
Description
Coala is an R package that simulates biological sequences according to a given model of evolution. The package calls simulators based on coalescent theory. All the simulators can simulate finite site mutation models when combined with Seq-gen. Coala then imports the output of the simulators into R and is capable of calculating their summary statistics.
CoaSim
https://github.com/mailund/CoaSim
CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models.
Description
CoaSim is a tool for simulating the coalescent process with recombination and gene conversion under various demographic models. It effectively constructs the ancestral recombination graph for a given number of individuals and uses this to simulate samples of SNP, micro-satellite, and other haplotypes/genotypes. The generated sample can afterwards be separated into cases and controls, depending on states of selected individual markers. The tool can accordingly also be used to construct case and control data sets for association studies. CoaSim is written in C++, Guile Scheme and Python, and is available as source code (under the GNU General Public License, GPL) and as binary versions as Linux RPM files. The source code has been successfully compiled on various Linux and UNIX systems, under OS X and under Windows with Cygwin. As I have only limited access to architectures other than Linux, it is not possible for me to make binary distributions for other platforms, but if anyone is willing to build the distributions I will be more than happy to put them on this site.
cophesim
https://sites.duke.edu/barusoftware/othersoft/cophesim/
A Comprehensive Simulator of Phenotype-Genotype Connections for Testing Methods of Genetic Analysis
Description
A tool that simulates phenotypes (continuous, dichotomous, and survival) for common variants from existing genotype data simulated with another tool.
CoreSimul
https://github.com/lbobay/CoreSimul
a forward-in-time simulator of genome evolution for prokaryotes modeling homologous recombination
Description
CoreSimul is a forward-in-time simulator of core genome evolution for prokaryotes modeling homologous recombination. Simulations are guided by a phylogenetic tree and incorporate different substitution models, including models of codon selection.
cosi
http://www.broadinstitute.org/~sfs/cosi/
A coalescent-based simulator with a demographic model calibrated from empirical data.
Description
Population genetic models play an important role in human genetic research, connecting empirical observations about sequence variation with hypotheses about underlying historical and biological causes. More specifically, models are used to compare empirical measures of sequence variation, linkage disequilibrium (LD), and selection to expectations under a "null" distribution. In the absence of detailed information about human demographic history, and about variation in mutation and recombination rates, simulations have of necessity used arbitrary models, usually simple ones. With the advent of large empirical data sets, it is now possible to calibrate population genetic models with genome-wide data, permitting for the first time the generation of data that are consistent with empirical data across a wide range of characteristics. We present here the first such calibrated model and show that, while still arbitrary, it successfully generates simulated data (for three populations) that closely resemble empirical data in allele frequency, linkage disequilibrium, and population differentiation. No assertion is made about the accuracy of the proposed historical and recombination model, but its ability to generate realistic data meets a long-standing need among geneticists. We anticipate that this model, for which software is publicly available, and others like it will have numerous applications in empirical studies of human genetics.
cosi2
https://software.broadinstitute.org/mpg/cosi2/
an efficient coalescent simulator with support for simulating selection
Description
cosi2 is an efficient coalescent simulator with support for selection, population structure, variable recombination rates, and gene conversion. It supports exact and approximate simulation modes.
CuReSim
http://www.pegase-biosciences.com/curesim-a-customized-read-simulator/
A customized read simulator
Description
CuReSim (Customized Read Simulator) is a customized tool which generates synthetic Next-Generation Sequencing reads, supporting read simulation for major letter-based sequencing platforms. CuReSim is developed in Java and is distributed as an executable jar file. Wrappers to integrate CuReSim in Galaxy are also available.
DAWG
https://github.com/reedacartwright/dawg
An application designed to simulate the evolution of recombinant DNA sequences in continuous time
Description
DNA Assembly with Gaps (Dawg) is an application designed to simulate the evolution of recombinant DNA sequences in continuous time based on the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of gap formation. The application accepts phylogenies in Newick format and can return the sequence of any node, allowing for the exact evolutionary history to be recorded at the discretion of users. Dawg records the gap history of every lineage to produce the true alignment in the output. Many options are available to allow users to customize their simulations and results.
DeepSimulator
https://github.com/liyu95/DeepSimulator
The first deep learning based Nanopore simulator which can simulate the process of Nanopore sequencing
Description
DeepSimulator mimics the entire pipeline of Nanopore sequencing. Starting from a given reference genome or assembled contigs, we simulate the electrical current signals by a context-dependent deep learning model, followed by a base-calling procedure to yield simulated reads. This workflow mimics the sequencing procedure more naturally. The thorough experiments performed across four species show that the signals generated by our context-dependent model are more similar to the experimentally obtained signals than the ones generated by the official context-independent pore model. In terms of the simulated reads, we provide a parameter interface to users so that they can obtain reads with different accuracies ranging from 83 to 97%. The reads generated by the default parameters have almost the same properties as the real data. Two case studies demonstrate the application of DeepSimulator to benefit the development of tools in de novo assembly and in low coverage SNP detection.
DHOEM
a statistical simulation software for simulating new markers in real SNP marker data
Description
DHOEM (densification of haplotypes by loess regression and maximum likelihood) is a simulation tool that is free from population assumptions and simulates new markers in real SNP marker data. The main objective of DHOEM is to generate a new population that incorporates real and simulated SNPs, by statistical learning from an initial population, and that matches the realized features of the latter.
discoal
https://github.com/kr-colab/discoal
flexible coalescent simulations with selection
Description
discoal is a coalescent simulation program capable of simulating models with recombination, selective sweeps, and demographic changes including population splits and admixture events.
DWGSIM
https://github.com/nh13/DWGSIM
Whole Genome Simulator for Next-Generation Sequencing
Description
Whole genome simulation can be performed with dwgsim. dwgsim is based off of wgsim found in SAMtools written by Heng Li, and forked from DNAA. It was modified to handle ABI SOLiD and Ion Torrent data, as well as various assumptions about aligners and positions of indels. Many new features have been subsequently added.
dyngen
https://github.com/dynverse/dyngen
Spearheading future omics analyses using dyngen, a multi-modal simulator of single cells
Description
We present dyngen, a multi-modal simulation engine for studying dynamic cellular processes at single-cell resolution. dyngen is more flexible than current single-cell simulation engines, and allows better method development and benchmarking, thereby stimulating development and testing of computational methods. We demonstrate its potential for spearheading computational methods on three applications: aligning cell developmental trajectories, cell-specific regulatory network inference and estimation of RNA velocity.
EAGLE
https://github.com/sequencing/EAGLE
Enhanced Artificial Genome Engine: next generation sequencing reads simulator
Description
The Enhanced Artificial Genome Engine (EAGLE) software is designed to simulate the behaviour of Illumina's Next Generation Sequencing instruments, in order to facilitate the development and testing of downstream applications.
Easypop
EASYPOP is an individual based model intended to simulate datasets under a very broad range of conditions
Description
EASYPOP can simulate haploid, diploid or haplodiploid data. For diploids there is the choice between hermaphrodites or sexuals. For hermaphrodites, the proportion of clonal reproduction and selfing can be chosen, whereas for sexuals, complex breeding structures can be simulated (e.g. monogamy with a given proportion of extra-pair matings). The number of individuals can be selected for each population and dispersal is sex-specific. There are various migration models such as two-dimensional stepping stone or hierarchical island model. In addition there is an isolation-by-distance option which works with the coordinates of the populations on any number of dimensions. There are also several mutation models implemented, which are particularly oriented towards the simulation of microsatellite loci. Genotypes are real multilocus (i.e. there are not independent replicates for each locus). All mutation parameters can be set individually for each locus. EASYPOP is able to handle very large simulations on standard personal computers and is limited only by the memory of the machine. The computer code has been optimized for maximum speed. This allows running very large simulations on personal computers in a reasonable amount of time. In order to fit analytical expectations, in particular for variances, the functions implemented in EASYPOP are probabilistic and not deterministic. In other words, the simulations rely on the generation of random numbers.
EggLib
http://egglib.sourceforge.net/
EggLib is a C++/Python library and program package for evolutionary genetics and genomics.
Description
EggLib is a C++/Python library and program package for evolutionary genetics and genomics. Main features are sequence data management, sequence polymorphism analysis, coalescent simulations and Approximate Bayesian Computation. EggLib is a flexible Python module with a performant underlying C++ library (which can be used independently), and allows fast and intuitive development of Python programs and scripts. A number of pre-programmed applications of EggLib possibilities are available interactively.
EpiSIM
https://sourceforge.net/projects/episimsimulator/files/
EpiSIM: simulation of multiple epistasis, linkage disequilibrium patterns and haplotype blocks for genome-wide interaction analysis
Description
Epistasis is a ubiquitous phenomenon in genetics, and is considered to be one of the main factors in current efforts to detect missing heritability for complex diseases. Simulation is a critical tool in developing methodologies that can more effectively detect and study epistasis. Here we present a simulator, epiSIM (epistasis SIMulator), that can simulate some of the statistical properties of genetic data. EpiSIM is capable of expanding the range of the epistasis models that current simulators offer, including epistasis models that display marginal effects and those that display no marginal effects. One or more of these epistasis models can be embedded simultaneously into a single simulation data set, jointly determining the phenotype. In addition, epiSIM is independent of any outside data source in generating linkage disequilibrium patterns and haplotype blocks. We demonstrate the wide applicability of epiSIM by performing several data simulations, and examine its properties by comparing it with current representative simulators and by comparing the data that it generates with real data. Our experiments demonstrate that epiSIM is a valuable addition and a nice complement to the existing epistasis simulators. The software package is available online at https://sourceforge.net/projects/episimsimulator/files/.
ESCO
https://github.com/JINJINT/ESCO
ESCO: single cell expression simulation incorporating gene co-expression
Description
Ensemble Single-cell expression simulator incorporating gene CO-expression, ESCO, is constructed as an ensemble of the best features among current simulators to preserve the marginal performance, while making it easy to incorporate co-expression structure among genes using a copula. In particular, ESCO allows realistic simulation of a homogeneous cell group, heterogeneous cell groups, as well as complex cell group relationships such as tree and trajectory structure, together with a flexible input of co-expression. As for technical noise, ESCO integrates the parametric and non-parametric approaches in current literature and gives the user flexibility to choose. In order to mimic a specific real dataset, ESCO can estimate all the hyperparameters in a feasible way for either a homogeneous cell group or heterogeneous cell groups. ESCO is implemented in the R package ESCO, which is built upon the R package Splatter (Zappia et al., 2017), in order to provide a unified software framework.
EvolSimulator
http://bioinformatics.org.au/tools/evolsim/
A simulation test bed for hypotheses of genome evolution
Description
EvolSimulator is a program that allows the simulation of evolution at the level of genes, gene families, and whole genomes. It was designed with the goal of investigating evolutionary phenomena like biased mutation regimes in different lineages, complicated patterns of selective pressure across sequences, and the confounding effects of paralogy and lateral genetic transfer.
EvolveAGene
https://sourceforge.net/projects/evolveagene/?source=navbar
A realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions
Description
EvolveAGene 3 is a realistic coding sequence simulation program that separates mutation from selection and allows the user to set selection conditions, including variable regions of selection intensity within the sequence and variation in intensity of selection over branches. Variation includes base substitutions, insertions, and deletions.
FASTQSim
https://sourceforge.net/projects/fastqsim
platform-independent data characterization and in silico read generation for NGS datasets
Description
FASTQSim is a tool that provides the dual functionality of Next-Gen Sequencing dataset characterization and metagenomic data generation. FASTQSim is sequencing platform-independent, and computes distributions of read length, quality scores, indel rates, single point mutation rates, indel size, and similar statistics for any sequencing platform. To create training or testing datasets, FASTQSim has the ability to convert target sequences into in silico reads with matching error profiles.
fastsimcoal2
http://cmpg.unibe.ch/software/fastsimcoal2/
A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios
Description
While preserving all the simulation flexibility of simcoal2, fastsimcoal is now implemented under a faster continuous-time sequential Markovian coalescent approximation, allowing it to efficiently generate genetic diversity for different types of markers along large genomic regions, for both present and ancient samples. It includes a parameter sampler allowing its integration into Bayesian or likelihood parameter estimation procedures. fastsimcoal can handle very complex evolutionary scenarios including an arbitrary migration matrix between samples, historical events allowing for population resize, population fusion and fission, admixture events, changes in migration matrix, or changes in population growth rates. The time of sampling can be specified independently for each sample, allowing for serial sampling in the same or in different populations. Different markers, such as DNA sequences, SNPs, STRs (microsatellites) or multi-locus allelic data can be generated under a variety of mutation models (e.g. finite- and infinite-site models for DNA sequences, stepwise or generalized stepwise mutation model for STR data, infinite-allele model for standard multi-allelic data). fastsimcoal can simulate data in genomic regions with arbitrary recombination rates, thus allowing for recombination hotspots of different intensities at any position. fastsimcoal implements a new approximation to the ancestral recombination graph in the form of a sequential Markov coalescent, allowing it to very quickly generate genetic diversity for >100 Mb genomic segments. fastsimcoal2 now allows one to estimate demographic parameters from the (joint) site frequency spectrum (SFS) using simulations to compute the expected SFS and a robust method for the maximization of the composite likelihood.
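As a small illustration of the site frequency spectrum mentioned above, the following sketch tallies an unfolded SFS from a matrix of 0/1 haplotypes. It is not fastsimcoal2 code or its file format; the function name `site_frequency_spectrum` and the 0 = ancestral, 1 = derived encoding are assumptions made for this example.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS: entry i counts sites carrying i derived alleles.

    `haplotypes` is a list of equal-length 0/1 sequences
    (assumed encoding: 0 = ancestral, 1 = derived).
    """
    n = len(haplotypes)
    if n == 0:
        return []
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must have the same number of sites")
    # Number of derived alleles at each site (column sums).
    derived_counts = [sum(h[j] for h in haplotypes) for j in range(n_sites)]
    tally = Counter(derived_counts)
    # Classes 0 and n are the monomorphic sites; 1..n-1 are polymorphic.
    return [tally.get(i, 0) for i in range(n + 1)]


sample = [
    [0, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
]
sfs = site_frequency_spectrum(sample)
print(sfs)
```

An observed spectrum of this kind is what would be compared with the expected SFS obtained from simulations.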
Full fastsimcoal2 Profile
FastSLINK
https://watson.hgen.pitt.edu/register/soft_doc.html
Simulation of Marker and Phenotype Data in Pedigrees
Description
FastSLINK permits simulation of marker and phenotype data in large pedigrees. Both power and significance can be evaluated. FastSLINK also supports locus heterogeneity.
Full FastSLINK Profile
FAVITES
https://github.com/niemasd/FAVITES
FrAmework for VIral Transmission and Evolution Simulation
Description
FAVITES (FrAmework for VIral Transmission and Evolution Simulation) is a robust modular framework for the simultaneous simulation of a transmission network and viral evolution, as well as simulation of sampling imperfections of the transmission network and of the sequencing process (Moshiri et al., 2018). The framework is robust in that the simulation process has been broken down into a series of interactions between abstract module classes, and the user can simply plug in each desired module implementation (or implement one from scratch) to customize any stage of the simulation process.
Full FAVITES Profile
FFPopSim
http://webdav.tuebingen.mpg.de/ffpopsim/
C++/Python library for population genetics.
Description
FFPopSim is a C++ and Python library to simulate large populations that are polymorphic at many loci. It allows for complex fitness functions, including pairwise and higher order epistasis. It is designed to study the effects of linked selection, the rare processes in large populations, and can be used to address a large variety of population genetics problems.
Full FFPopSim Profile
FIGG
http://insilicogenome.sourceforge.net/
FIGG is a genome simulation tool that uses known or theorized variation frequency, per a given fragment size and grouped by GC content across a genome to model new genomes in FASTA format while tracking applied mutations for use in analysis
Description
FIGG is a genome simulation tool that uses known or theorized variation frequency, per a given fragment size and grouped by GC content across a genome to model new genomes in FASTA format while tracking applied mutations for use in analysis tools or population simulations. FIGG uses Apache MapReduce and HBase to rapidly generate individual genomes and allow users to scale up generation to fit specific project needs.
Full FIGG Profile
FLUX SIMULATOR
http://confluence.sammeth.net/display/SIM/Home
The Flux Simulator aims at providing a deterministic in silico reproduction of the experimental pipelines for RNA-Seq, employing a minimal set of parameters.
Description
The FluxSimulator is the part of the FLUX project that aims at providing an in silico reproduction of the experimental pipelines for RNA-Seq, adopting a minimal set of parameters. Corresponding models were established after analyzing RNA-Seq experiments from different cell types, sample preparation protocols and sequencing platforms. The first step of the FLUX project is, in fact, a transcriptome simulator. Subsequently, common sources of systematic bias in the abundance and distribution of produced reads are mimicked, whether they occur during library construction or in the sequencing process. The FluxSimulator provides a flexible base to design benchmark experiments based on the new sequencing technologies, for instance for the abundance predictions of the FluxCapacitor.
Full FLUX SIMULATOR Profile
forqs
https://bitbucket.org/dkessner/forqs
Forward-in-time simulation of Recombination, Quantitative Traits, and Selection
Description
forqs is a forward-in-time population genetics simulation that tracks individual haplotype chunks as they recombine each generation. forqs also models quantitative traits and selection on those traits. forqs is implemented as a command-line C++ program, using a modular design that gives the user great flexibility in creating custom simulations. It is freely available with a permissive BSD license.
Full forqs Profile
FPG
https://bio.cst.temple.edu/~hey/software
Forward Population Genetic simulation
Description
FPG (for Forward Population Genetic simulation) simulates a population of constant size that is undergoing various evolutionary processes, including: mutation, recombination, natural selection, and migration. The meaning of "forward" in this context is simply that time, within the simulation, moves forward just as it does in the real world. This is in contrast to coalescent population genetic simulation in which time, as represented within the simulation, proceeds back into the past. Coalescent simulations have many advantages, but they are unwieldy if they incorporate natural selection on multiple sites.
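To make the "forward" idea concrete, here is a minimal sketch of a forward-in-time simulation at a single biallelic locus in a haploid population of constant size. It is not FPG's implementation: it covers only selection, symmetric mutation, and drift (no recombination or migration), and the names `wright_fisher_step` and `simulate` and the parameters `s` and `mu` are assumptions chosen for this example.

```python
import random


def wright_fisher_step(count_a, pop_size, s=0.0, mu=0.0, rng=random):
    """Advance one generation and return the new count of allele A."""
    p = count_a / pop_size
    # Selection: allele A has relative fitness 1 + s, allele a has fitness 1.
    p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
    # Symmetric mutation between A and a at rate mu per generation.
    p_mut = p_sel * (1 - mu) + (1 - p_sel) * mu
    # Drift: each of the pop_size offspring is A with probability p_mut.
    return sum(1 for _ in range(pop_size) if rng.random() < p_mut)


def simulate(count_a, pop_size, generations, s=0.0, mu=0.0, seed=None):
    """Return the allele-count trajectory, starting state included."""
    rng = random.Random(seed)
    trajectory = [count_a]
    for _ in range(generations):
        count_a = wright_fisher_step(count_a, pop_size, s, mu, rng)
        trajectory.append(count_a)
    return trajectory


trajectory = simulate(count_a=50, pop_size=100, generations=20, s=0.05, mu=0.001, seed=1)
print(trajectory)
```

Time runs from the first generation to the last, and the whole population is updated at every step, which is the cost that coalescent simulators avoid by tracking only the ancestors of the sample.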
Full FPG Profile
FreeHi-C
https://github.com/yezhengSTAT/FreeHiC
FreeHi-C simulates high fidelity Hi-C data for benchmarking and data augmentation
Description
FreeHi-C (v2.0) is short for Fragment interactions empirical estimation for fast simulation of Hi-C data. It is a data-driven Hi-C data simulator for simulating and augmenting Hi-C datasets. FreeHi-C employs a non-parametric strategy for estimating an interaction distribution of genome fragments and simulates Hi-C reads from interacting fragments. Data from FreeHi-C exhibit higher fidelity to the biological Hi-C data. FreeHi-C not only can be used to study and benchmark a wide range of Hi-C analysis methods but also boosts power and enables false discovery rate control for differential interaction detection algorithms through data augmentation. Unlike FreeHi-C (v1.0), v2.0 adds a spike-in module that enables the simulation of true differential chromatin interactions. FreeHi-C is designed for studies that aim to simulate Hi-C interactions from real data and add deviations from the true ones. Therefore, FreeHi-C requires real Hi-C sequencing data (FASTQ format) as input along with user-defined simulation parameters. FreeHi-C outputs the simulated genomic contact counts in a sparse matrix format (BED format), which is compatible with the standard input of downstream Hi-C analysis.
Full FreeHi-C Profile
FREGENE
http://www.ebi.ac.uk/projects/BARGEN
FREGENE is a C++ program that simulates sequence-like data over large genomic regions in large diploid populations.
Description
FREGENE works forwards-in-time which allows a wide range of demographic and selection scenarios to be implemented. Many such models are already incorporated into FREGENE, and since it is open source users can modify or extend these. Coalescent methods have difficulty incorporating large amounts of gene conversion or crossover (Hoggart et al. 2007), whereas these pose no particular problem for FREGENE. FREGENE offers a flexible model for recombination hotspots, and can readily simulate regions up to tens of Mb on a standard desktop computer. The principal limitation of forward-in-time algorithms is computational, since the entire population must be tracked through time, not only the chromosomes that are ancestral to the observed sample. FREGENE implements many features to enhance computational efficiency, and includes a rescaling option that greatly reduces computation time at the cost of some approximation.
Full FREGENE Profile
fwdpp
https://github.com/molpopgen/fwdpp
A C++ template library for implementing efficient forward simulations.
Description
Fwdpp is a C++11 library intended to help implement forward-time population genetic simulations.
Full fwdpp Profile
G2P
https://github.com/XiaoleiLiuBio/G2P
A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation
Description
A Genome-Wide-Association-Study Simulation Tool for Genotype Simulation, Phenotype Simulation, and Power Evaluation
Full G2P Profile
GAMETES
https://sourceforge.net/projects/gametes/?source=navbar
Genetic Architecture Model Emulator for Testing and Evaluating Software: Simulates complex SNP models with pure, strict epistatic interactions with n-loci.
Description
Rapid, user-friendly software package, able to generate whole populations of “worst-case-scenario” complex genetic models with random architectures, but a user-specified set of constraints (i.e. number of loci, heritability, allele frequencies, prevalence). Intended for testing and evaluating algorithms or software for their ability to detect and model epistatic interactions in the absence of any main effects. The next version will add the ability to generate heterogeneous datasets (specifically datasets which concurrently contain both epistatic and heterogeneous effects).
Full GAMETES Profile
GARLIC
https://github.com/caballero/Garlic
Artificial DNA sequence generator
Description
A common practice in computational genomic analysis is to use a set of 'background' sequences as negative controls for evaluating the false-positive rates of prediction tools, such as gene identification programs and algorithms for detection of cis-regulatory elements. Such 'background' sequences are generally taken from regions of the genome presumed to be intergenic, or generated synthetically by 'shuffling' real sequences. This last method can lead to underestimation of false-positive rates. We developed a new method for generating artificial sequences that are modeled after real intergenic sequences in terms of composition, complexity and interspersed repeat content. These artificial sequences can serve as an inexhaustible source of high-quality negative controls. We used artificial sequences to evaluate the false-positive rates of a set of programs for detecting interspersed repeats, ab initio prediction of coding genes, transcribed regions and non-coding genes. We found that RepeatMasker is more accurate than PClouds, Augustus has the lowest false-positive rate of the coding gene prediction programs tested, and Infernal has a low false-positive rate for non-coding gene detection. A web service, source code and the models for human and many other species are freely available at http://repeatmasker.org/garlic/.
Full GARLIC Profile
GASP
http://research.nhgri.nih.gov/gasp/
Genometric Analysis Simulation Program. A software tool for testing and investigating methods in statistical genetics by generating samples of family data based on user specified models.
Description
The Genometric Analysis Simulation Program (G.A.S.P.) is a software tool that can generate samples of family data based on user specified genetic models. Data generated can be as simple as a single sample of random individuals with a single normally distributed trait or as complex as thousands of samples of extended families with multiple traits based on a linear combination of major locus, polygenic, common sibship environment and covariate components. Traits can be generated based on a number of user specified components, and components can be unique to a single trait or shared by multiple traits. The user first specifies a list of all desired components and then creates each trait by specifying the desired component weighted by its contribution to the phenotypic variance. G.A.S.P. can be used in two ways. First, data can be generated in a standalone fashion. The resulting family data can be saved and then used as sample data for demonstrating applications and methods of genetic analysis or for testing and verifying newly developed algorithms in statistical genetics. A simple driver ("dataonly") is provided for this application. Second, data can be generated and analyzed immediately using an existing statistical package. A driver can be designed to call subroutine versions of widely available genetic analysis programs.
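The idea of building a trait from components weighted by their contribution to the phenotypic variance can be sketched as follows. This is not G.A.S.P. code: every component is simplified to an independent standard normal draw per individual (so family sharing of polygenic or sibship effects is ignored), and the function name `simulate_trait` and the component names are hypothetical.

```python
import math
import random
import statistics


def simulate_trait(weights, n_individuals, seed=None):
    """Simulate a trait as a weighted sum of unit-variance components.

    `weights` maps a component name to the proportion of phenotypic variance
    it should explain; the proportions must sum to 1. Scaling each component
    by sqrt(weight) makes its variance equal to that proportion.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("variance proportions must sum to 1")
    rng = random.Random(seed)
    traits = []
    for _ in range(n_individuals):
        # Assumption: components are independent N(0, 1) draws per individual.
        value = sum(math.sqrt(w) * rng.gauss(0.0, 1.0) for w in weights.values())
        traits.append(value)
    return traits


components = {"major_locus": 0.3, "polygenic": 0.5, "environment": 0.2}
trait_values = simulate_trait(components, n_individuals=5000, seed=7)
variance = statistics.pvariance(trait_values)
print(round(variance, 2))
```

Because the components are independent with unit variance, the total phenotypic variance is the sum of the weights, so the printed value should be close to 1.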
Full GASP Profile
GCTA
http://cnsgenomics.com/software/gcta/
Genome-wide Complex Trait Analysis
Description
GCTA (Genome-wide Complex Trait Analysis) was originally designed to estimate the proportion of phenotypic variance explained by genome- or chromosome-wide SNPs for complex traits (the GREML method), and has subsequently been extended for many other analyses to better understand the genetic architecture of complex traits. GCTA currently supports the following functionalities: 1) Estimate the genetic relationship from genome-wide SNPs; 2) Estimate the inbreeding coefficient from genome-wide SNPs; 3) Estimate the variance explained by all the autosomal SNPs; 4) Partition the genetic variance onto individual chromosomes; 5) Estimate the genetic variance associated with the X-chromosome; 6) Test the effect of dosage compensation on genetic variance on the X-chromosome; 7) Predict the genome-wide additive genetic effects for individual subjects and for individual SNPs; 8) Estimate the LD structure encompassing a list of target SNPs; 9) Simulate GWAS data based upon the observed genotype data; 10) Convert Illumina raw genotype data into PLINK format; 11) Conditional & joint analysis of GWAS summary statistics without individual level genotype data; 12) Estimate the genetic correlation between two traits (diseases) using SNP data; 13) Mixed linear model association analysis.
Full GCTA Profile
GemSIM
http://sourceforge.net/projects/gemsim/
Next generation sequencing read simulator
Description
GemSIM is a software package for generating realistic simulated next generation sequencing reads with quality score values. Both Illumina and Roche/454 reads (single or paired end) can be simulated using empirically derived error models.
Full GemSIM Profile
GeneEvolve
https://github.com/rtahmasbi/GeneEvolve
A fast and memory efficient forward-time simulator of realistic whole-genome sequence and SNP data
Description
GeneEvolve is a user-friendly and efficient population genetics simulator that handles complex evolutionary and life history scenarios and generates individual-level phenotypes and realistic whole-genome sequence or SNP data. GeneEvolve runs forward-in-time, which allows it to provide a wide range of scenarios for mating systems, selection, population size and structure, migration, recombination and environmental effects. The software is designed to use as input data from real or previously simulated phased haplotypes, allowing it to mimic very closely the properties of real genomic data.
Full GeneEvolve Profile
GeneSPIDER
https://bitbucket.org/sonnhammergrni/genespider/src/master/
Gene regulatory network inference benchmarking with controlled network and data properties
Description
Inference of gene regulatory networks (GRNs) is a central goal in systems biology. It is therefore important to evaluate the accuracy of GRN inference methods in the light of network and data properties. Although several packages are available for modelling, simulating, and analysing GRN inference, they offer limited control of network topology together with system dynamics, experimental design, data properties, and noise characteristics. Independent control of these properties in simulations is key to drawing conclusions about which inference method to use in a given condition and what performance to expect from it, as well as to obtaining properties representative of real biological systems.
Full GeneSPIDER Profile
GENLIB
https://github.com/R-GENLIB/GENLIB
An R package for the analysis of genealogical data
Description
GENLIB is an R package specifically designed to analyze large genealogical datasets. Genealogical data from human founder populations can contribute to research in diverse fields from genetic epidemiology to historical geography, along with population genetics, evolutionary biology, demography and social history. Animal and plant geneticists also need to analyze large pedigrees. GENLIB has several functionalities ranging from descriptive statistics specifically developed for genealogical data to simulations of genomic segments passed down the genealogies from the founders. GENLIB functions can be grouped into 4 categories: i) genealogical data management, ii) data description and visualisation, iii) computation of relevant statistics (e.g., kinship coefficients for pairs of individuals) and iv) simulations.
Full GENLIB Profile
GENOME
http://csg.sph.umich.edu/liang/genome/
A rapid coalescent-based whole genome simulator
Description
GENOME is a program to simulate sequences drawn from a population under the Wright-Fisher neutral model (Ewens 1979). It is based on a standard coalescent model (Hudson 1983, 1990; Donnelly & Tavaré 1995). Starting with the sampled sequences and moving backward in time, coalescent, recombination and migration events are simulated at each generation. These events could occur multiple times and could happen in the same generation. Each coalescent event is recorded and the resulting genealogy tree is constructed. Demographic events such as population bottlenecks and expansions or population merges and splits can also be simulated. In addition to uniform recombination rates, it is possible to allow recombination rates to vary so as to mimic the pattern of hotspots along the genome. After simulating a coalescent tree, mutations are placed along each branch. The number of mutations on each branch follows a Poisson distribution with mean equal to the product of the mutation rate and the branch length. The infinite-site mutation model is assumed, so no recurrent mutation can occur. The genealogy tree can also be output in Newick format, which is identical to that used by programs such as PHYLIP (Felsenstein 2005) and seq-gen (Rambaut & Grassly 1997). The program is written in C++ and is portable to multiple operating systems. The following sections will describe how to download and compile the program and how to specify the parameters for the program.
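The mutation step described above (a Poisson number of mutations per branch, with mean equal to mutation rate times branch length) can be sketched in a few lines. This is an illustration rather than GENOME's C++ code; the names `poisson` and `place_mutations`, the branch labels, and the numeric values are assumptions made for the example.

```python
import math
import random


def poisson(mean, rng):
    """Draw from Poisson(mean) with Knuth's multiplication method.

    Suitable for the small means typical of single branches; it is slow and
    numerically unreliable for very large means.
    """
    if mean < 0:
        raise ValueError("mean must be non-negative")
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def place_mutations(branch_lengths, mutation_rate, seed=None):
    """Return the number of mutations placed on each branch of a genealogy."""
    rng = random.Random(seed)
    return {
        branch: poisson(mutation_rate * length, rng)
        for branch, length in branch_lengths.items()
    }


# Hypothetical branch lengths for a three-sample tree ((A,B),C).
branches = {"A": 0.5, "B": 0.5, "AB": 1.2, "C": 1.7}
counts = place_mutations(branches, mutation_rate=2.0, seed=42)
print(counts)
```

Under the infinite-site model each of these mutations would then be assigned to a new, previously unmutated site.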
Full GENOME Profile
GenomePop2
http://acraaj.webs.uvigo.es/GenomePop2.htm
GenomePop2 is a specialization of the program GenomePop just to manage SNPs under more flexible and useful settings. If you need models with more than 2 alleles please use the GenomePop program version.
Description
This new version allows the forward simulation of sequences of biallelic positions. As in the previous version, a number of evolutionary and demographic settings are allowed. Several populations under any migration model can be implemented, as well as contraction-expansion scenarios and directional or divergent selection. A theoretical or simulated initial equilibrium population can be computed, as can speciation processes via the simulation of user-defined population splits. Each population consists of a number N of individuals. Each individual is represented by one or more chromosomes with constant or variable (hotspots) recombination between binary sites.
Full GenomePop2 Profile
GenomeSimla
https://ritchielab.org/research/research-areas/statistical-genetics-and-gen-epi/methods/genomesimla
GenomeSIMLA is currently under development; however, we have a beta release that we are asking users to test
Description
GenomeSimla uses Hardy-Weinberg mating to advance simulated genetic data forward through time from generation to generation. Next, we included two distinct algorithms to aid the user in developing various types of disease models: SIMLA for diseases with interactions and main effects and simPEN for embedding purely epistatic models.
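As a rough sketch of what advancing a population under Hardy-Weinberg (random) mating involves, the example below computes expected genotype frequencies for a biallelic locus and draws one offspring generation. It is not GenomeSimla code; the function names, the tuple representation of genotypes, and the allowance for an individual to be drawn as both parents are assumptions made for brevity.

```python
import random


def hardy_weinberg_frequencies(p):
    """Expected genotype frequencies for allele A at frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def next_generation(genotypes, n_offspring, seed=None):
    """Random mating: each offspring receives one randomly chosen allele
    from each of two randomly chosen parents."""
    rng = random.Random(seed)
    offspring = []
    for _ in range(n_offspring):
        mother = rng.choice(genotypes)
        father = rng.choice(genotypes)
        offspring.append((rng.choice(mother), rng.choice(father)))
    return offspring


parents = [("A", "A"), ("A", "a"), ("a", "a"), ("A", "a")]
children = next_generation(parents, n_offspring=1000, seed=11)
freq_a = sum(g.count("A") for g in children) / (2 * len(children))
print(hardy_weinberg_frequencies(0.5), round(freq_a, 3))
```

Repeating `next_generation` on its own output moves the simulated data forward one generation at a time.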
Full GenomeSimla Profile
Genomic Variant Simulator
https://cadd.gs.washington.edu/simulator
generating simulated single nucleotide and indel variants
Description
The script for generating simulated single nucleotide and indel variants as well as the parameter files used to simulate the variants for the above manuscript are available for download here. This software is released under an MIT license (license text available from the ZIP-archive). Please see the README file contained in the ZIP-archive for further information about the software.
Full Genomic Variant Simulator Profile
GenPhyloData
https://code.google.com/p/jprime/
realistic simulation of gene family evolution
Description
PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and—perhaps more interestingly—also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock.
Full GenPhyloData Profile
GENS2
https://sourceforge.net/projects/gensim/
Simulates interactions among two genetic and one environmental factor and also allows for epistatic interactions.
Description
The Gene-Environment iNteraction Simulator 2 (GENS2) simulates interactions among two genetic and one environmental factor and also allows for epistatic interactions. GENS2 is based on data with realistic patterns of linkage disequilibrium, and imposes no limitations either on the number of individuals to be simulated or on the number of non-predisposing genetic/environmental factors to be considered. The GENS2 tool is able to simulate gene-environment and gene-gene interactions. To make the simulator more intuitive, the input parameters are expressed as standard epidemiological quantities. GENS2 is written in Python and takes advantage of operators and modules provided by the simuPOP simulation environment. GENS2 is not intended to simulate the evolution of a population, but to simulate complex gene-environment interactions in case-control samples. It should be used along with simuPOP, a software package that allows realistic evolutionary simulation (or an equivalent simulator), to simulate the dataset to which the disease model is applied.
Full GENS2 Profile
Geonomics
https://github.com/erthward/geonomics
A Python package for simulation of genomic evolution on complex and dynamic landscapes
Description
Geonomics is a Python package for forward-time, individual-based, continuous-space, population genomic simulations on complex and dynamic landscapes. Geonomics models are parameterized by way of an informatively annotated parameters file that provides the user a straightforward means of building models of arbitrary complexity while offering reasonable default settings and “off switches” for parameters and components unrelated to the user’s interests. Models consist of 1) a landscape with one or more environmental layers, each of which can undergo arbitrarily complex environmental change events and 2) one or more species having genomes with realistic architecture and any number of associated phenotypes. Species undergo non-Wright-Fisher evolution in continuous space, with localized mating and mortality, such that species-level phenomena and simulation dynamics are emergent properties of a model’s parameterization. Evolution is comprehensively tracked by way of tskit data structures that record the complete spatial pedigree, providing for the customizable output of rich, 3D data sets in a variety of common formats, including VCF and FASTA for genomic data, GeoTiff for landscape data, and CSV, Shapefile, and GeoJSON for individuals’ nongenomic data (location, environmental values, phenotypes, age, and sex). All of this allows Geonomics to produce realistic landscape genomic results useful for a wide variety of theoretical and empirical purposes.
Full Geonomics Profile
GPOPSIM
https://github.com/SCAU-AnimalGenetics/GPOPSIMv2
GPOPSIM is a simulation tool for pedigree, phenotypes, and genome data.
Description
GPOPSIM is a simulation tool for pedigree, phenotypes, and genome data. The software uses a variety of population and genome structures as well as trait genetic architectures. GPOPSIM also provides parameter settings for a wide variety of disciplines. The package is capable of simulating multiple genetically correlated traits with given genetic parameters along with underlying genetic architectures.
Full GPOPSIM Profile
GppFst
https://github.com/radamsRHA/GppFst
GppFst is an open source R package that generates posterior predictive distributions of FST and dXY under a neutral coalescent model to identify putative targets of selection from genomic data.
Description
GppFst is a posterior predictive simulation (PPS) framework to generate theoretical distributions of FST and dXY under the neutral coalescent model for two populations that accounts for demographic parameters in a probabilistic framework. Importantly, our method allows users to explicitly test the null hypothesis of genetic drift when conducting genomic scans. PPS is a popular method for evaluating model fit within a Bayesian framework that has been used to test a variety of evolutionary models (Gelman et al., 2004; Reid et al., 2014). GppFst explicitly accounts for the demographic history of two genetically-isolated species, including multiple demographic and experimental parameters (and uncertainty in those parameters), such as sample sizes, demographic parameters, unequal rates of genetic drift within populations (unequal s), and divergence time. Our method allows users to simulate theoretical distributions that are conditioned on sampling multiple linked SNPs per locus – allowing users to take full advantage of large genomic datasets. We provide our PPS model in the package GppFst (Genomic Posterior Predictive distributions of FST), which offers a user-friendly, open-source framework to generate theoretical distributions of FST and dXY under the neutral coalescent model.
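For readers unfamiliar with dXY, the statistic is the average number of pairwise differences per site between sequences sampled from two populations. The sketch below computes it for aligned sequences; it is written in Python for illustration and is not part of the GppFst R package. The function name `dxy` and the toy sequences are assumptions.

```python
def dxy(pop_x, pop_y):
    """Average per-site pairwise difference between two populations.

    `pop_x` and `pop_y` are non-empty lists of aligned, equal-length sequences.
    """
    n_sites = len(pop_x[0])
    if any(len(seq) != n_sites for seq in pop_x + pop_y):
        raise ValueError("all sequences must be aligned to the same length")
    total = 0
    for x in pop_x:
        for y in pop_y:
            # Count mismatching positions for this between-population pair.
            total += sum(a != b for a, b in zip(x, y))
    return total / (len(pop_x) * len(pop_y) * n_sites)


population_x = ["AAAA", "AAAT"]
population_y = ["AATT", "AATT"]
print(dxy(population_x, population_y))
```

In a posterior predictive setting, a value like this computed from observed data would be compared with its distribution across datasets simulated under the neutral model.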
Full GppFst Profile
Grinder
https://sourceforge.net/projects/biogrinder/
Grinder is a versatile open-source bioinformatic tool to create simulated omic shotgun and amplicon sequence libraries for all main sequencing platforms.
Description
Grinder is a versatile open-source bioinformatic tool to create simulated omic shotgun and amplicon sequence libraries for all main sequencing platforms.
Full Grinder Profile
GS
http://engr.case.edu/li_jing/gs.html
Generating samples for association studies based on HapMap data
Description
A new version of gs is available. In addition to the functionalities implemented earlier, gs2.0 has implemented a comprehensive yet flexible model to simulate genetic and environmental interactions. The program can be used to generate samples in testing algorithms for tag SNP selection, haplotype inference, as well as epistatic detection.
Full GS Profile
GWAsimulator
https://biostat.app.vumc.org/wiki/Main/GWAsimulator
A rapid whole genome simulation program
Description
GWAsimulator is a C++ program that can simulate genotype data for SNP chips that are used in genome-wide association (GWA) studies. It implements a rapid moving-window algorithm (Durrant et al. 2004. AJHG 75:35-43) to simulate whole genome case-control or population samples. It also can simulate specific regions if desired. For case-control data, the program retrospectively samples cases and controls according to a user-specified multi-locus disease model. The program requires phased data as input, and the simulated data will have LD patterns similar to those of the input data. The program can use HapMap phased data as input and has the flexibility of simulating genotypes for different populations and different SNP chips. Because many large-scale GWA data are becoming available, they can be used instead of the HapMap data as the input, as long as the phase information is generated. These data may provide a better representation of the population under study and more accurate LD information than the HapMap due to much larger sample sizes. See the manual for instructions and a detailed description of the program.
Full GWAsimulator Profile
HAP-SAMPLE
https://sites.google.com/a/umich.edu/leeshawn/software
An association simulator for candidate regions or genome scans
Description
HAP-SAMPLE is a web application for simulating SNP genotypes for case-control and affected-child trio studies by resampling from Phase I/II HapMap SNP data. The user provides a list of SNPs to be "genotyped," along with a disease model file that describes causal SNPs and their effect sizes. The simulation tool is appropriate for candidate regions or whole-genome scans. The stand-alone software is also available.
Full HAP-SAMPLE Profile
HAPGEN
https://mathgen.stats.ox.ac.uk/genetics_software/hapgen/hapgen2.html
A simulator for the simulation of case control datasets at SNP markers
Description
HAPGEN2 is an updated version of the program HAPGEN, which simulates case control datasets at SNP markers. The new version can now simulate multiple disease SNPs on a single chromosome, on the assumption that the disease SNPs act independently and are in Hardy-Weinberg equilibrium. We also supply an R package that can simulate interaction between the disease SNPs. We hope to add further facilities to simulate quantitative traits and admixture soon.
Full HAPGEN Profile
HaploDX
https://github.com/remytuyeras/HaploDynamics
A Python library to develop genomic data simulators
Description
The HaploDX library provides a collection of functions to generate simulated population-specific genomic data in VCF format. The library includes parameters and functions to control mutation rates, linkage disequilibrium strength and block lengths, and number of individuals. To generate genomic data, the HaploDX framework offers a pipeline of functions that can be used to simulate: (1) the allele frequency spectra of different populations; (2) the Hardy-Weinberg principle for genotypes and haplotypes; (3) linkage disequilibrium across different populations.
Full HaploDX Profile
HapSim
http://cran.r-project.org/web/packages/hapsim/index.html
A simulation tool for generating haplotype data with pre-specified allele frequencies and LD coefficients
Description
Package for haplotype data simulation. Haplotypes are generated such that their allele frequencies and linkage disequilibrium coefficients match those estimated from an input data set.
Full HapSim Profile
HAPSIMU
http://l.web.umkc.edu/liujian/
A program that simulates heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model
Description
HAPSIMU, a program based on real haplotype data from the HapMap ENCODE project, can simulate heterogeneous populations with various known and controllable structures under the continuous migration model or the discrete model. Moreover, both qualitative and quantitative traits can be simulated using an additive genetic model with various genetic parameters designated by users.
Full HAPSIMU ProfileIBDsim
http://raphael.leblois.free.fr/
IBDSim is a computer package for the simulation of genotypic data under general isolation by distance models.
Description
IBDSim can consider a large panel of subdivided population models representing discrete subpopulations as well as a large continuous population. Many dispersal distributions, with different tails, can be considered, as well as various heterogeneities in space and time of the demographic parameters. For examples of various applications see Leblois et al. (2003), Leblois et al. (2004), Leblois et al. (2006), Rousset & Leblois (2007). The program runs on PC under Windows, Mac or Linux systems, and we provide the source code, which can be easily compiled under any system using an ISO C++ compiler.
IgSimulator
http://yana-safonova.github.io/ig_simulator/
A versatile immunosequencing simulator
Description
IgSimulator is a tool for simulating antibody repertoires and Ig-seq libraries. IgSimulator is designed for testing and benchmarking tools for reconstruction of Ig repertoires.
indel-Seq-Gen
http://bioinfolab.unl.edu/~cstrope/iSG/
A biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies
Description
indel-Seq-Gen (iSG) is a biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies. This is accomplished through the addition of subsequence length constraints and lineage- and site-specific evolution. iSG tracks insertion and deletion processes that occur during the simulation run. iSG records all evolutionary events and outputs the "true" multiple alignment of the sequences, and can generate a larger simulated sequence space by allowing the use of multiple related root sequences. iSG can be used to test the accuracy of multiple alignment methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein superfamily classification methods.
Indelible
http://abacus.gene.ucl.ac.uk/software/indelible/
A powerful and flexible simulator of biological evolution
Description
INDELible is a new, portable, and flexible application for biological sequence simulation that combines many features in the same place for the first time. Using a length-dependent model of indel formation, it can simulate evolution of multi-partitioned nucleotide, amino-acid, or codon data sets through the processes of insertion, deletion, and substitution in continuous time.
InSilicoSeq
https://github.com/HadrienG/InSilicoSeq
A sequencing simulator
Description
InSilicoSeq is a sequencing simulator producing realistic Illumina reads. Primarily intended for simulating metagenomic samples, it can also be used to produce sequencing data from a single genome. InSilicoSeq is written in Python and uses kernel density estimators to model the read quality of real sequencing data. InSilicoSeq supports substitution, insertion, and deletion errors. If you do not need insertion and deletion errors, a basic error model is provided.
interSIM
https://cran.r-project.org/web/packages/InterSIM/index.html
InterSIM: simulation tool for multiple integrative omic datasets. Comput. Methods Prog. Biomed
Description
Generates three inter-related genomic datasets: methylation, gene expression, and protein expression. Input: number of samples, proportion of samples in the cluster groups, cluster mean shift parameter delta, and a few other options. Output: three datasets (methylation, gene expression, and protein expression) with inter- and intra-correlations and cluster group information. The true cluster assignment of each subject is also generated.
invertFREGENE
http://www.ebi.ac.uk/projects/BARGEN/
InvertFREGENE is a forward-in-time simulator of inversions in population genetic data
Description
invertFREGENE is a forward-in-time simulator of inversions in population genetic data, while SAMPLE samples genotype and haplotype data from the output of invertFREGENE simulations based on specified individual and marker ascertainment criteria, including a continuous and a case-control disease model. invertFREGENE has been developed from a beta version of the population genetic simulator FREGENE, and as a result a small number of features are not included in invertFREGENE (e.g., it does not model natural selection); we therefore provide self-contained documentation for invertFREGENE. O'Reilly PF, Coin LJ, Hoggart CJ. invertFREGENE: software for simulating inversions in population genetic data. Bioinformatics. 2010 Mar 15;26(6):838-40.
J-SPACE
https://github.com/BIMIB-DISCo/J-Space.jl
J-SPACE: a Julia package for the simulation of spatial models of cancer evolution and of sequencing experiments
Description
J-SPACE is a Julia package to simulate the spatial growth and the genomic evolution of a cell population and the experiment of sequencing the genome of the sampled cells. First, the software simulates the spatial dynamics of the cells as a continuous-time multi-type birth-death stochastic process on a graph, employing different rules of interaction and an optimised Gillespie algorithm. After mimicking a spatial sampling of the tumour cells, J-SPACE returns the phylogenetic tree of the sample and simulates molecular evolution of the genome under the infinite-sites model or a set of different substitution models. There is also the possibility of including indels. Finally, employing ART, J-SPACE generates synthetic single-end, paired-end, or mate-pair reads of next-generation sequencing platforms.
kernelPop
http://cran.r-project.org/src/contrib/Archive/kernelPop/
A spatially explicit population genetic simulation engine
Description
Individual-based, spatially explicit models provide a mechanism to understand distributions of individuals on the landscape; however, few models have been coupled with population genetics. The primary benefit of such a combination is to assess performance of population-genetic estimators in realistic situations. KERNELPOP represents a flexible framework to implement almost any arbitrary population-genetic and demographic model in a spatially explicit context using a variety of dispersal kernels. Estimates of type I error associated with genome scans in metapopulations are provided as an illustration of this software’s utility.
LongISLND
https://github.com/bioinform/longislnd
LongISLND is a read simulator which profiles the characteristics of third generation, single-molecule sequencing technologies and simulates accordingly
Description
LongISLND is a read simulator which profiles the characteristics of third generation, single-molecule sequencing technologies and simulates accordingly. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. Please read on to see application examples to PacBio and Oxford Nanopore (ONT) data.
LSH-GAN
https://github.com/Snehalikalall/LSH-GAN
LSH-GAN enables in-silico generation of cells for small sample high dimensional scRNA-seq data
Description
A fundamental problem of downstream analysis of scRNA-seq data is the unavailability of enough cell samples compared to the feature size. This is mostly due to the budgetary constraints of single-cell experiments or simply because of the small number of available patient samples. Here, we present an improved version of the generative adversarial network (GAN), called LSH-GAN, to address this issue by producing new realistic cell samples. We update the training procedure of the GAN generator using locality sensitive hashing, which speeds up sample generation and thus maintains the feasibility of applying the standard procedures of downstream analysis. LSH-GAN outperforms the benchmarks for realistic generation of quality cell samples. Experimental results show that samples generated by LSH-GAN improve the performance of downstream analyses such as feature (gene) selection and cell clustering. Overall, LSH-GAN therefore addresses the key challenges of small-sample scRNA-seq data analysis.
MaCS
https://github.com/gchen98/macs
Markovian Coalescent Simulator
Description
MaCS is a simulator of the coalescent process that simulates genealogies spatially across chromosomes as a Markovian process. The algorithm is similar to the SMC algorithm (McVean and Cardin, Phil Trans R Soc B 2005) in that it scales linearly in time with respect to sample size and sequence length. However, it more accurately models the true coalescent, while supporting all demographic scenarios found in the popular program MS (Hudson, Bioinformatics 2002), making this program appropriate for simulating data for structured populations in genome-wide association studies.
Marlin
http://www.patrickmeirmans.com/software/Marlin.html
Marlin provides a user-friendly interface for performing forward-in-time population genetic simulations.
Description
Marlin is a program for running spatially explicit forward-in-time population genetic simulations. It provides an intuitive user interface in which realistic geographic scenarios can be easily created and simulated. But Marlin goes further than that and directly analyses and plots the results. This combination of creation, simulation, and analysis makes Marlin ideal for teaching and for scientists who are interested in doing simulations without having to learn command-line operations.
Mason
http://www.seqan.de/projects/mason/
A package for the simulation of nucleotide data.
Description
Mason is a package for the simulation of nucleotide data. Starting with a genome, you can simulate variants and optionally also methylation levels. From this, reads of different technologies can be simulated, optionally simulating bisulphite treatment. The variants can also be specified as a VCF file. The results are FASTQ files with the reads and optionally a SAM file with the alignment to the reference sequence. Substeps of the process are available as standalone tools, e.g. for the simulation of reads from preselected or presimulated fragments, or the computation of genomic sequences with variants. The time-intensive part of read simulation has been parallelized.
MaSS-Simulator
https://github.com/pcdslab/MaSS-Simulator
MaSS-Simulator: A Highly Configurable Simulator for Generating MS/MS Datasets for Benchmarking of Proteomics Algorithms
Description
MaSS-Simulator offers many configuration options to allow the user a great degree of control over the test datasets, which can enable rigorous and large-scale testing of any proteomics algorithm. MaSS-Simulator is assessed by comparing its performance against experimentally generated spectra and spectra obtained from NIST spectral library collections. The results show that spectra generated by MaSS-Simulator match closely with real spectra and have a relative-error distribution centered around 25%. In contrast, the theoretical spectra for the same peptides have a relative-error distribution centered around 150%. MaSS-Simulator will enable developers to specifically highlight the capabilities of their algorithms and provide strong evidence of any pitfalls they might face. Source code, executables, and a user manual for MaSS-Simulator can be downloaded from https://github.com/pcdslab/MaSS-Simulator.
mbs
http://www.sendou.soken.ac.jp/esb/innan/InnanLab/software.html
A modification of Hudson's ms software that generates samples of DNA sequences with a biallelic site under selection
Description
A software application to generate samples of DNA sequences when there is a biallelic site targeted by selection. mbs is developed by modifying Hudson's ms. The mbs software is so flexible that it can incorporate any arbitrary histories of population size changes and any mode of selection as long as selection is operating on a biallelic site.
Mendel's Accountant
http://mendelsaccount.sourceforge.net/
Mendel's Accountant (MENDEL) is an advanced numerical simulation program for modeling genetic change over time and was developed collaboratively by Sanford, Baumgardner, Brewer, Gibson and ReMine
Description
MENDEL is a genetic accounting program that allows realistic numerical simulation of the mutation/selection process over time. MENDEL is applicable to either haploid or diploid organisms, having either sexual or clonal reproduction. Each mutation that enters the simulated population is tracked from generation to generation to the end of the experiment - or until that mutation is lost either as a result of selection or random drift. Using a standard personal computer, the MENDEL program can be used to generate and track millions of mutations within a single population. MENDEL's input variables include such things as mutation rate, distribution specifications for mutation effects, extent of dominance, mating characteristics, selection method, average fertility, heritability, non-scaling noise, linkage block properties, chromosome number, genome size, population size, population sub-structure, and number of generations. The MENDEL program outputs, both in tabular and graphic form, provide several types of data including: deleterious and beneficial mutation counts per individual, mean individual fitness as a function of generation count, distribution of mutation effects, and allele frequencies. MENDEL provides biologists with a new tool for research and teaching, and allows for the modeling of complex biological scenarios that would have previously been impossible.
MetaPopGen
https://github.com/MarcoAndrello/MetaPopGen
Simulates genetics in large size metapopulations
Description
MetaPopGen is a population genetics simulator. Features included in the model are age structure, monoecious and dioecious (separate sexes) life cycles, mutation, dispersal, and selection. All demographic parameters can be genotype-, sex-, age-, deme- and time-dependent. MetaPopGen is therefore suited to studying large populations and very complex demographic scenarios.
MetaSim
https://software-ab.informatik.uni-tuebingen.de/download/metasim/welcome.html
A tool to generate collections of synthetic reads that reflect the diverse taxonomical composition of typical metagenome data sets
Description
The aim of MetaSim is to provide a tool for the simulation of reads based on given genome sequences reflecting (adaptable) error models of current sequencing technologies. Additionally, the user is able to determine the abundance of the chosen taxa. Therefore, MetaSim integrates an induced tree view of the NCBI taxonomy that can be used to interactively select taxa and inner nodes of the taxonomy to configure their relative abundances. Another feature of MetaSim allows the user to simulate an evolved population of a single genome sequence, using a population simulator. This feature is aimed at simulating the common real-world situation that many different, but closely related strains of a lineage coexist in the same habitat. The resulting data sets can be used to plan and design metagenome studies and for evaluation and improvement of metagenomic software tools and assembly algorithms.
metaSPARSim
https://gitlab.com/sysbiobig/metasparsim
metaSPARSim is a sparse count matrix simulator intended for usage in the development of pipelines for 16S rRNA metagenomic data processing.
Description
metaSPARSim is a sparse count matrix simulator intended for use in the development of pipelines for 16S rRNA metagenomic data processing. metaSPARSim implements a new generative process that models the sequencing process with a multivariate hypergeometric distribution in order to realistically reproduce these data, considering their characteristic aspects such as compositionality and sparsity. It provides ready-to-use count matrices and comes with the possibility to reproduce different pre-coded scenarios or to tune internal parameters in order to create a tailored count matrix that better fits some prior information or a specific characteristic an expert user may want to consider.
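The multivariate hypergeometric step can be pictured as drawing a fixed number of reads, without replacement, from an urn holding the molecules of every taxon. The following is a minimal standard-library sketch of that idea only; the function name, taxon labels, and abundances are hypothetical, and it does not reproduce metaSPARSim's own implementation or parameters.

```python
import random
from collections import Counter

def multivariate_hypergeometric(abundances, depth, seed=None):
    """Sample `depth` molecules without replacement from an urn described by
    `abundances`, a dict mapping taxon name -> number of molecules."""
    total = sum(abundances.values())
    if depth > total:
        raise ValueError("depth cannot exceed the number of molecules in the urn")
    rng = random.Random(seed)
    # Expand the urn explicitly; fine for a small illustration, costly for large totals.
    urn = [taxon for taxon, n in abundances.items() for _ in range(n)]
    drawn = Counter(rng.sample(urn, depth))
    # Report zeros too, since sparsity is a characteristic of such count data.
    return {taxon: drawn.get(taxon, 0) for taxon in abundances}

counts = multivariate_hypergeometric(
    {"taxonA": 500, "taxonB": 40, "taxonC": 2}, depth=100, seed=7
)
```

Because the draw is without replacement, the counts always sum to the sequencing depth (compositionality), and rare taxa frequently receive zero reads (sparsity).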
MichiGAN
https://github.com/welch-lab/MichiGAN
MichiGAN: sampling from disentangled representations of single-cell data using generative adversarial networks
Description
MichiGAN is a novel neural network that combines the strengths of VAEs and GANs to sample from disentangled representations without sacrificing data generation quality. MichiGAN allows us to manipulate semantically distinct aspects of cellular identity and predict single-cell gene expression response to drug treatment.
MimicrEE2
https://sourceforge.net/projects/mimicree2/
MimicrEE2: Genome-wide forward simulations of Evolve and Resequencing studies
Description
MimicrEE2 is a multi-threaded Java program for genome-wide forward simulations of evolving populations. MimicrEE2 enables the convenient usage of available genomic resources, supports biological particulars of model organisms frequently used in E&R studies, and offers a wide range of different adaptive models (selective sweeps, polygenic adaptation, epistasis). Due to its user-friendly and efficient design, MimicrEE2 will facilitate simulations of E&R studies even for small labs with limited bioinformatics expertise or computational resources. Additionally, the scripts provided for executing MimicrEE2 on a computer cluster permit the coverage even of a large parameter space. MimicrEE2 runs on any computer with Java installed.
Minnow
https://github.com/COMBINE-lab/minnow
Minnow is a read level simulator for droplet based single cell RNA-seq data.
Description
Analysis pipelines usually validate their results by using marker genes and simulated data from gene-count-level simulators. The impact of using different read-alignment or UMI deduplication methods has not been investigated. Assessments usually start by assuming a count matrix where the effects of resolving UMI counts from raw read data are ignored. Minnow differs in this respect by modeling Unique Molecular Identifier (UMI) selection. Minnow is a read-level simulator for droplet-based single-cell RNA-seq data. Minnow simulates reads by sampling sequences either from the de Bruijn graph of the reference transcriptome or from the reference transcriptome itself.
mlcoalsim
https://github.com/CRAGENOMICA/mlcoalsim-v2
Multilocus Coalescent Simulations
Description
The application program mlcoalsim (multilocus coalescent simulations) is designed to:
(i) Generate samples and calculate neutrality tests, and other statistics, under a stationary model, several demographic models, or strong positive selection by means of coalescent theory.
(ii) Perform coalescent simulations with the mutational phase given:
    1. the population mutation rate θ (θ = 4Nμ, where N is the effective population size and μ is the mutation rate).
    2. a fixed number of mutations.
    3. a distribution of θ values. Uniform (bounded) and gamma prior distributions are enabled.
    4. a fixed number of biallelic segregating sites, taking into account the uncertainty of the population mutation rate (conditioning on biallelic segregating sites). Uniform (bounded) and gamma prior distributions are enabled.
(iii) Perform coalescent simulations with recombination given:
    1. the population recombination rate R (R = 4Nr, where r is the recombination rate).
    2. a distribution of r values. Uniform (bounded) and gamma prior distributions are enabled.
    3. a fixed number of minimum recombination events (Rm), taking into account the uncertainty of the population recombination rate (fixing Rm). Uniform (bounded) and gamma prior distributions are enabled.
    4. a fixed number of minimum recombination events (Rm) and a fixed number of haplotypes, considering the uncertainty of the population recombination rate.
(iv) Perform multilocus analyses. Linked loci and unlinked loci are enabled. Multilocus statistics for unlinked loci are the average and the variance for each statistic.
(v) Include recurrent mutations (multiple hits) or not.
(vi) Include heterogeneity in mutation rate across the length of the sequence. A gamma distribution is used. A number of invariant positions can also be defined.
(vii) Include heterogeneity in recombination rate across the length of the sequence. A gamma distribution is used. Hotspots or a constant value for all positions are possible.
This program is based on a previous version of Hudson’s coalescent program ms (Hudson, 2002) and modified for the above purposes. The function to calculate minimum recombinant values is a modification of Wall’s code (Wall, 2000). The gamma function was partially obtained from the code of Grassly, Adachi and Rambaut (Grassly et al., 1997). This program is distributed under the GNU GPL License. Version 2 includes parallel computation for multiple loci and the possibility to include priors for each of the parameters (useful for ABC computation analysis). The input file has been modified.
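The scaled parameters above are simple products, θ = 4Nμ and R = 4Nr. As a small worked example, the sketch below computes both; the function name and the numerical values are hypothetical, and it assumes μ and r are expressed per locus per generation.

```python
def scaled_rates(effective_size, mutation_rate, recombination_rate):
    """Return (theta, R), with theta = 4*N*mu and R = 4*N*r."""
    theta = 4 * effective_size * mutation_rate
    big_r = 4 * effective_size * recombination_rate
    return theta, big_r

# Hypothetical values: N = 10,000, mu = 2.5e-5 and r = 1e-5 per locus per generation.
theta, big_r = scaled_rates(10_000, 2.5e-5, 1e-5)
```

With these inputs θ is approximately 1.0 and R approximately 0.4, which are the kinds of values such a program would take directly or draw from a prior.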
MOSim
https://www.bioconductor.org/packages/release/bioc/html/MOSim.html
An R package for the simulation of multi-omic experiments that mimic regulatory mechanisms within the cell.
Description
MOSim is an R package for the simulation of multi-omic experiments that mimic regulatory mechanisms within the cell. Gene expression (RNA-seq count data) is the central data type simulated by MOSim, while the rest of available omic data types provide gene regulation information and include ATAC-seq (DNase-seq), ChIP-seq, small RNA-seq and Methyl-seq. In addition to these omics, regulation by transcription factors (TFs) can also be modeled.
ms
http://home.uchicago.edu/~rhudson1/source/mksamples.html
The purpose of this program is to allow one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets.
Description
The program ms can be used to generate many independent replicate samples under a variety of assumptions about migration, recombination rate and population size to aid in the interpretation of such polymorphism studies. The samples are generated using the now standard coalescent approach in which the random genealogy of the sample is first generated and then mutations are randomly placed on the genealogy (Kingman, 1982; Hudson, 1990; Nordborg, 2001). The usual small sample approximations of the coalescent are used. An infinite-sites model of mutation is assumed, and thus multiple hits and back mutations do not occur. However, when used in conjunction with other programs, finite-site mutation models or microsatellite models can be studied. For example, the gene trees themselves can be output, and these gene trees can be used as input to other programs which will evolve the sequences under a variety of finite-site models. These are described later. The program is intended to run on Unix, or Unix-like operating systems, such as Linux or Mac OS X. The next section describes how to download and compile the program. The subsequent sections describe how to run the program and in particular how to specify the parameter values for the simulations.
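The two-step coalescent approach (genealogy first, mutations second) can be sketched in a few lines. The example below is a simplified illustration, not Hudson's code: it assumes a single panmictic population of constant size with no recombination or migration, measures time in units of 2N generations, and tracks only the total branch length and the number of segregating sites. All function names are hypothetical.

```python
import random

def poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate exponential arrivals in [0, mean]."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_segregating_sites(n, theta, rng):
    """Simulate one genealogy of n sequences and return its number of segregating sites."""
    total_length = 0.0
    for k in range(n, 1, -1):
        # While k lineages remain, the time to the next coalescence is
        # exponential with rate k(k-1)/2 (time in units of 2N generations).
        wait = rng.expovariate(k * (k - 1) / 2.0)
        total_length += k * wait
    # Infinite-sites model: each mutation falls on a new site, so the number of
    # segregating sites is Poisson with mean (theta / 2) * total branch length.
    return poisson(theta / 2.0 * total_length, rng)

rng = random.Random(42)
replicates = [simulate_segregating_sites(10, 5.0, rng) for _ in range(2000)]
mean_sites = sum(replicates) / len(replicates)
# Theoretical expectation: theta * sum_{i=1}^{n-1} 1/i.
expected = 5.0 * sum(1.0 / i for i in range(1, 10))
```

Averaged over many replicates, `mean_sites` should be close to `expected`, which is a common sanity check for a coalescent simulator.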
msHOT
http://home.uchicago.edu/~rhudson1/
The purpose of this program is to allow one to investigate the statistical properties of such samples, to evaluate estimators or statistical tests, and generally to aid in the interpretation of polymorphism data sets.
Description
This addition to Hudson’s (2002) ms, called msHOT, allows for implementation of multiple crossover hotspots and/or multiple gene conversion hotspots in the simulated genetic region. Crossover hotspots may overlap with gene conversion hotspots, but crossover hotspots may not overlap with each other and gene conversion hotspots may not overlap with each other.
msms
http://www.mabs.at/ewing/msms/index.shtml
A coalescent simulation tool with selection.
Description
This document describes how to use msms, a tool to generate sequence samples under both neutral models and a single-locus selection model. msms permits the full range of demographic models provided by ms (Hudson, 2002). In particular, it allows for multiple demes with arbitrary migration patterns, population growth and decay in each deme, and for population splits and mergers. Selection (including dominance) can depend on the deme and also change with time. The program is designed to be command-line compatible with ms; however, no prior knowledge of ms is assumed for this document. Applications of this program include power studies, analytical comparisons, and approximate Bayesian computation, among many others. Because most applications require the generation of a large number of independent replicates, the code is designed to be efficient and fast. For the neutral case, it is comparable to ms and even faster for large recombination rates. For selection, the performance is only slightly slower, making this one of the fastest tools for simulation with selection. The program has been developed with a wide number of possible operating systems and hardware in mind. For this reason, the code has been developed in Java and can run on any hardware that supports Java 1.6. This includes Mac OS X, all current versions of MS Windows, and most Unix flavors (Linux, Sun, BSD). The Java programming language is also popular and widely known, which should facilitate the writing of extensions for the program.
msnsam
https://github.com/rossibarra/msnsam
Hudson's ms with variable sample size across loci
Description
This version is the October 2007 version of the ms code with the added ability to include the number of samples (nsam) as a tbs argument. Please see Hudson's website for details on ms as well as installation instructions, but please email me for questions or bug reports, as bugs are likely mine and NOT part of the original code. The primary motivation for this is to allow efficient simulation of datasets with unequal sampling across loci. Running this version of ms using sample size as a tbs argument appears to be much faster than running an independent ms run for each of many loci.
msprime
https://pypi.python.org/pypi/msprime
A fast and accurate coalescent simulator.
Description
Msprime is a reimplementation of Hudson’s classical ms program for modern datasets.
Mutation-Simulator
https://github.com/mkpython3/Mutation-Simulator
Mutation-Simulator: fine-grained simulation of random mutations in any genome
Description
Mutation-Simulator allows the introduction of various types of sequence alterations in reference sequences, with reasonable compute-time even for large eukaryotic genomes. Its intuitive system for fine-grained control over mutation rates along the sequence enables the mimicking of natural mutation patterns. Using standard file formats for input and output data, it can easily be integrated into any development and benchmarking workflow for high-throughput sequencing applications.
MySSP
http://www.rosenberglab.net/software.html
A program for the simulation of DNA sequence evolution across a phylogenetic tree
Description
MySSP is a new program for the simulation of DNA sequence evolution across a phylogenetic tree. Although many programs are available for sequence simulation, MySSP is unique in its inclusion of indels, flexibility in allowing for non-stationary patterns, and output of ancestral sequences. Some of these features can individually be found in existing programs, but not all have previously been available in a single package.
nanosim
https://github.com/bcgsc/NanoSim
Nanopore sequence read simulator
Description
NanoSim is a fast and scalable read simulator that captures the technology-specific features of ONT data, and allows for adjustments upon improvement of nanopore sequencing technology.
NEAT
https://github.com/zstephens/neat-genreads
NEAT read simulation tools
Description
NEAT-genReads is a fine-grained read simulator. GenReads simulates real-looking data using models learned from specific datasets. There are several supporting utilities for generating models used for simulation.
Nemo
A forward-time, individual-based, genetically explicit, and stochastic simulation program designed to study the evolution of genetic markers, life history traits, and phenotypic traits in a flexible (meta-)population framework.
Description
Nemo implements many different life cycles and evolvable traits with a large variety of genetic architectures. Species interaction between a parasite and its host can also be modeled (i.e., Cytoplasmic-Incompatibility inducing endosymbiont: Wolbachia). All this is framed within a flexible metapopulation model that allows for patch-specific carrying capacities, dispersal rates (dispersal matrices), stochastic extinction/harvesting rates, and demographic stochasticity. Populations can be dynamically modified during a simulation, allowing for population bottlenecks, patch fusion/fission, population expansion, etc. Spatially heterogeneous selection on quantitative traits can also be modeled. Nemo's interface is a simple text file containing the simulation parameters. Large batches of simulations can be run from a single parameter file with multiple parameter values. Many complex evolutionary and demographic scenarios can be modeled easily by providing temporally varying parameter values.
NeSSM
http://cbb.sjtu.edu.cn/~ccwei/pub/software/NeSSM.php
A Next-Generation Sequencing Simulator for Metagenomics
Description
NeSSM is a tool to generate Next-Generation Sequencing (NGS) reads with parameters set by users. The goal of NeSSM is to generate metagenome sequencing reads close to reality. Currently, the 454 and Illumina sequencing platforms are supported. It can help develop methods or systems for metagenomics analysis.
NetRecodon
http://code.google.com/p/netrecodon/
Coalescent simulation of coding DNA sequences with recombination (inter and intracodon), migration and demography
Description
NetRecodon is a population genetic simulator that generates samples of nucleotide and codon sequences from haploid/diploid populations with inter- and intracodon recombination, migration, growth, and dated tips. It can also run on several processors using MPI. Operating systems: source code and a makefile are provided for compilation on any OS with a C compiler, along with some compiled executables.
NPBSS
https://github.com/NWPU-903PR/NPBSS_Octave
PacBio sequencing simulator
Description
By analyzing the characteristic features of CLR data from PacBio SMRT (single molecule real time) sequencing, we developed a new PacBio sequencing simulator (called NPBSS) for producing CLR reads. The NPBSS simulator first samples the read sequences according to a log-normal read-length distribution, and chooses different base quality values with different proportions. Then, NPBSS computes the overall error probability of each base in the read sequence with an empirical model, and calculates the deletion, substitution and insertion probabilities from the overall error probability to generate the PacBio CLR reads. Alignment results demonstrate that NPBSS fits the error rate of the PacBio CLR reads better than PBSIM and FASTQSim. In addition, the assembly results also show that simulated sequences of NPBSS are more like real PacBio CLR data.
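Two of the steps described above, drawing read lengths from a log-normal distribution and splitting an overall per-base error probability into deletion, substitution, and insertion components, can be sketched as follows. This is an illustration only: the function names, the mean length, the shape parameter, and the split proportions are hypothetical placeholders, not the empirical values or code used by NPBSS.

```python
import math
import random

def sample_read_length(mean_length, sigma, rng):
    """Draw a read length from a log-normal distribution with the given mean."""
    # For a log-normal variable, mean = exp(mu + sigma^2 / 2), so solve for mu.
    mu = math.log(mean_length) - sigma ** 2 / 2.0
    return max(1, round(rng.lognormvariate(mu, sigma)))

def split_error_probability(p_error, proportions=(0.25, 0.15, 0.60)):
    """Split an overall per-base error probability into
    (deletion, substitution, insertion) using placeholder proportions."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return tuple(p_error * w for w in proportions)

rng = random.Random(3)
lengths = [sample_read_length(8000, 0.5, rng) for _ in range(5000)]
p_del, p_sub, p_ins = split_error_probability(0.15)
```

A full simulator would then walk along each sampled read and apply one of the three error types at each base according to these probabilities.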
OmicsSIMLA
A simulation tool for generating multi-omics data with disease status
Description
OmicsSIMLA is a simulation tool for generating multi-omics data with disease status. Currently, OmicsSIMLA has four main modules: SeqSIMLA, pWGBSSimla, RNA-Seq, and RPPA. SeqSIMLA can simulate sequence data in families with multiple affected and unaffected siblings or unrelated case-control samples under different disease models. pWGBSSimla is a profile-based whole-genome bisulphite sequencing data simulator, which can simulate whole-genome DNA methylation (WGBS), reduced representation bisulfite sequencing (RRBS), and oxidative bisulfite sequencing (oxBS-seq) data while modeling methylation quantitative trait loci, allele-specific methylations, and differentially methylated regions. RNA-Seq uses a negative binomial distribution to simulate NGS read counts for gene expression. Finally, RPPA uses a mass-action kinetic action model to simulate protein expression data.
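To make the negative binomial step for read counts concrete, here is a minimal Python sketch that draws counts as a gamma-Poisson mixture, which is one standard way to obtain negative binomial variates. It does not use OmicsSIMLA's interface: the gene names, mean expression levels and dispersion value are hypothetical, and the parameterization (variance = mean + dispersion * mean^2) is an assumption chosen for illustration.

```python
import random

def poisson(rng, lam):
    """Poisson draw: count exponential inter-arrival times within one time unit."""
    if lam <= 0:
        return 0
    count, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        count += 1
        t += rng.expovariate(lam)
    return count

def negative_binomial(rng, mean, dispersion):
    """Gamma-Poisson mixture with variance = mean + dispersion * mean**2."""
    shape = 1.0 / dispersion
    rate = rng.gammavariate(shape, mean / shape)   # gamma draw with expectation `mean`
    return poisson(rng, rate)

rng = random.Random(42)
gene_means = {"geneA": 20.0, "geneB": 200.0}       # hypothetical expression levels
counts = {
    gene: [negative_binomial(rng, mean, dispersion=0.2) for _ in range(6)]
    for gene, mean in gene_means.items()
}
```

Larger dispersion values give more replicate-to-replicate variability than a plain Poisson model with the same mean.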
Full OmicsSIMLA Profile
OncoSimulR
https://github.com/rdiaz02/OncoSimul
BioConductor package for Forward Genetic Simulation of Cancer Progression with Epistasis
Description
An R/BioConductor package that provides functions for forward population genetic simulation in asexual populations, with special focus on cancer progression. Fitness can be an arbitrary function of genetic interactions between multiple genes or modules of genes, including epistasis, order restrictions in mutation accumulation, and order effects. Mutation rates can differ between genes, and we can include mutator/antimutator genes (to model mutator phenotypes). Simulations use continuous-time models and can include driver and passenger genes and modules. Also included are functions for simulating random DAGs of the type found in Oncogenetic Trees, Conjunctive Bayesian Networks, and other cancer progression models; plotting and sampling from single or multiple realizations of the simulations, including single-cell sampling; plotting the parent-child relationships of the clones; generating random fitness landscapes (Rough Mount Fuji, House of Cards, and additive models) and plotting them.
Full OncoSimulR Profile
PARA-suite
https://github.com/akloetgen/PARA-suite
PAR-CLIP specific sequence read simulation and processing
Description
PAR-CLIP Analyzing suite. Useful tools for handling short and error-prone sequence reads. Note that the PARA-suite add-on of the Burrows-Wheeler Aligner (BWA) is necessary for the mapping tool of the PARA-suite.
Full PARA-suite Profile
PaSS
PaSS is an effective sequence simulator for PacBio sequencing
Description
PacBio Sequencing Simulator (PaSS) can learn sequence patterns from PacBio sequencing data currently available. In addition to the distribution of read lengths and error rates, we included a context-specific sequencing error model. Compared to existing PacBio sequencing simulators such as PBSIM, LongISLND and NPBSS, PaSS performed better in many aspects. Assembly tests also suggest that reads simulated by PaSS are the most similar to experimental sequencing data.
Full PaSS Profile
PBSIM2
https://github.com/yukiteruono/pbsim2
a simulator for long-read sequencers with a novel generative model of quality scores
Description
PacBio sequencers produced two types of characteristic reads: CCS (short and low error rate) and CLR (long and high error rate), both of which could be useful for de novo assembly of genomes. PBSIM simulates those PacBio reads by using either a model-based or sampling-based simulation.
Full PBSIM2 Profile
PEDAGOG
https://bcrc.bio.umass.edu/pedigreesoftware/node/5
Software for simulating eco-evolutionary population dynamics
Description
PEDAGOG is a Windows program that simulates population dynamics at the individual level, allows for heritability and selection of traits, records individual genotype and pedigree information, and allows for several types of errors to manifest in the output which can be formatted for 57 existing software programs. In all, parameters can be specified for genetics, demographics, mating strategy, mutations and genetic/demographic errors, growth models, heritability and selection, and output. Demographic parameters can be either age or size based, and all parameters can be drawn from twelve statistical distributions where appropriate.
Full PEDAGOG Profile
pg-gan
https://github.com/mathiesonlab/pg-gan
create realistic simulated data that matches real population genetic data.
Description
This software can be used to create realistic simulated data that matches real population genetic data. It implements a GAN-based algorithm (Generative Adversarial Network).
Full pg-gan Profile
PGsim
https://github.com/lrjuan/PGsim
A Comprehensive and Highly Customizable Personal Genome Simulator
Description
We designed and developed PGsim, a comprehensive and highly customizable individual genome simulator that fully uses existing knowledge, such as variant allele frequencies in global or major world populations, mutation probability differences between protein-coding regions and non-coding regions, transition/transversion (Ti/Tv) ratios, Indel incidence, Indel length distribution, structural variation sites, and pathogenic mutation sites. Users can flexibly control the proportion and quantity of known variants, common variants, novel variants in both coding and non-coding regions, and special variants through detailed parameter settings. To ensure that the simulated personal genome has sufficient randomness, PGsim makes the generated variants more real and reliable in terms of variant distribution, proportion, and population characteristics. PGsim is able to employ a huge volume database as background data to simulate personal genomes and does not require SQL database support. Users can easily change the variant databases used as needed.
Full PGsim Profile
phastSim
https://github.com/NicolaDM/phastSim
phastSim: Efficient simulation of sequence evolution for pandemic-scale datasets
Description
We present phastSim, a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
Full phastSim Profile
phenosim
http://evoplant.uni-hohenheim.de/downloads/
A tool to add phenotypes to simulated genotypes
Description
phenosim reads the output of commonly used coalescent simulators and simulates a phenotype based on a user-defined trait model for each individual. The simulated data can be used to assess the influence of various factors such as demography, genetic architecture or selection on the statistical power of association methods to detect causal genetic variants under a wide variety of population genetic scenarios.
Full phenosim Profile
PhenotypeSimulator
https://github.com/HannahVMeyer/PhenotypeSimulator
flexible simulation of phenotypes from different genetic and non-genetic (noise) components.
Description
PhenotypeSimulator allows for the simulation of complex phenotypes under different models, including genetic variant effects and infinitesimal genetic effects (reflecting population structure) as well as correlated, non-genetic covariates and observational noise effects. Different phenotypic effects can be combined into a final phenotype while controlling for the proportion of variance explained by each of the components. For each component, the number of variables, their distribution and the design of their effect across traits can be customised.
Full PhenotypeSimulator Profile
phylodyn
https://github.com/mdkarcher/phylodyn
Phylodyn facilitates phylodynamic inference and analysis in an approachable R package.
Description
phylodyn is an R package for phylodynamic analysis based on gene genealogies. The package applies Bayesian nonparametric estimation for population size fluctuations over time. The software includes Markov chain Monte Carlo-based methods and an integrated nested Laplace approximation-based approach for phylodynamic inference. The genealogical data describes the timed ancestral relationships of individuals sampled from a population of interest. The individuals within the software are simulated according to isochronous sampling or heterochronous sampling. The purpose of phylodyn is to facilitate phylodynamic inference and analysis in an approachable R package.
Full phylodyn Profile
PhyloSim
http://www.ebi.ac.uk/goldman-srv/phylosim/
An R package for the Monte Carlo simulation of sequence evolution
Description
PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. Key features of PhyloSim include 1) Simulation of the evolution of a set of discrete characters with arbitrary states evolving by a continuous-time Markov process with an arbitrary rate matrix. 2) Explicit implementations of the most popular substitution models (nucleotide, amino acid and codon substitution models). 3) Simulation under the popular models of among-sites rate variation, like the gamma (+G) and invariant sites plus gamma (+I+G) models. 4) The possibility to simulate under arbitrarily complex patterns of among-sites rate variation by setting the site specific rates according to any R expression. 5) ... please refer to our documentation for details.
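For readers unfamiliar with the Gillespie algorithm mentioned above, the following Python sketch applies it to the simplest case: substitutions along a single branch, with optional site-specific rates. It is not PhyloSim's R interface. The function name, the Jukes-Cantor-like choice of replacement base and the example values are assumptions made for illustration, and indels and selective constraints are left out.

```python
import random

BASES = "ACGT"

def gillespie_branch(seq, branch_length, rng, site_rates=None):
    """Evolve `seq` along one branch with the Gillespie algorithm.

    Each site substitutes at its own rate (default 1.0). A substitution replaces
    the base with one of the three others, chosen uniformly (an assumption).
    Returns the evolved sequence and the number of substitution events.
    """
    seq = list(seq)
    rates = list(site_rates) if site_rates is not None else [1.0] * len(seq)
    total = sum(rates)
    events = 0
    if total <= 0:
        return "".join(seq), events
    t = rng.expovariate(total)                 # waiting time to the first event
    while t <= branch_length:
        # Pick the site that fires, with probability proportional to its rate.
        site = rng.choices(range(len(seq)), weights=rates)[0]
        seq[site] = rng.choice([b for b in BASES if b != seq[site]])
        events += 1
        t += rng.expovariate(total)            # waiting time to the next event
    return "".join(seq), events

rng = random.Random(7)
ancestor = "ACGT" * 25
descendant, n_events = gillespie_branch(ancestor, branch_length=0.1, rng=rng)
```

Passing a list of per-site rates as `site_rates` gives among-site rate variation; in this sketch the rates do not depend on the current base, so the total rate stays constant along the branch.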
Full PhyloSim Profile
piBUSS
https://rega.kuleuven.be/cev/ecv/software/pibuss
a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios
Description
πBUSS is a BEAST/BEAGLE utility for sequence simulation, which provides an easy to use interface that allows flexible and extensible phylogenetic data fabrication, delegating computationally intensive tasks to the BEAGLE library and thus making full use of multi-core architectures.
Full piBUSS Profile
pIRS
https://code.google.com/p/pirs/
Profile-based Illumina pair-end reads simulator
Description
It simulates Illumina reads with empirical Base-Calling and GC%-depth profiles trained from real re-sequencing data. It considers error and quality distributions, as well as coverage bias patterns. In addition, pIRS also comes with a tool to simulate heterozygous diploid genomes.
Full pIRS Profile
Polyester
http://bioconductor.org/packages/release/bioc/html/polyester.html
simulating RNA-seq datasets with differential transcript expression
Description
Simulates RNA-seq reads from differential expression experiments with replicates. The reads can then be aligned and used to perform comparisons of methods for differential expression.
Full Polyester Profile
POWSC
https://github.com/suke18/POWSC
POWSC is a computational tool that is used for power evaluation and sample size estimation in scRNA-seq.
Description
POWSC is an R package designed for sc-RNA-seq. The software plays three roles: parameter estimator, data simulator, and power assessor. As a parameter estimator, POWSC accurately captures the characterized parameters for any specific cell type from expression data. The simulator generates synthetic data based on a rigorous simulation mechanism that includes zero expression values. POWSC also performs comprehensive power analysis and reports stratified target powers for DE genes.
Full POWSC Profile
powsimR
https://github.com/bvieth/powsimR
powsimR assesses power and sample size requirements for differential expression (DE) analysis of single cell and bulk RNA-seq experiments.
Description
powsimR assesses power and sample size requirements for differential expression (DE) analysis of single cell and bulk RNA-seq experiments. The number of replicates required to achieve the desired statistical power is determined by technical noise and biological variability. Both of these variables are considerably larger if the biological replicates are single cells. powsimR not only estimates the sample sizes necessary to achieve a certain power, but also reports the power to detect DE in a data set at hand.
Full powsimR Profile
PReFerSim
https://github.com/LohmuellerLab/PReFerSim
PReFerSim is an ANSI C program that performs forward simulations under the PRF model.
Description
PReFerSim is an ANSI C program that performs forward simulations under the PRF model. PReFerSim models changes in population size, inbreeding, dominance, and distributions of selective effects. PReFerSim allows the tracking of summaries for genetic variations over time along with the output trajectories of selected alleles.
Full PReFerSim Profile
PROSSTT
https://github.com/soedinglab/prosstt
PROSSTT: probabilistic simulation of single-cell RNA-seq data for complex differentiation processes
Description
PROSSTT (PRObabilistic Simulations of ScRNA-seq Tree-like Topologies) is a package with code for the simulation of scRNAseq data for dynamic processes such as cell differentiation. PROSSTT is open source GPL-licensed software implemented in Python. Single-cell RNAseq is revolutionizing cellular biology, and many algorithms are developed for the analysis of scRNAseq data. PROSSTT provides an easy way to test the performance of trajectory inference methods on realistic data with a known "gold standard". The algorithm can produce datasets with user-defined topologies while simulating any number of sampled cells and genes.
Full PROSSTT Profile
ProteinEvolver
http://code.google.com/p/proteinevolver/
Simulation of protein evolution along phylogenies under structure-based substitution models
Description
ProteinEvolver generates samples of protein-coding genes and protein sequences evolved along phylogenies under structure-based substitution models. These models consider the protein structure to evaluate candidate mutations, which can be accepted (substitutions) or rejected depending on the energy of the protein structure of the mutated sequence. The simulation of molecular evolution occurs along phylogenetic histories, which can be either user-specified or simulated by the coalescent modified with recombination (including recombination hotspots), migration, demographics and longitudinal sampling.
Full ProteinEvolver Profile
pSBVB
https://github.com/lauzingaretti/pSBVB
Polyploid sequence based virtual breeding (pSBVB) is a modification of SBVB software that allows simulating traits of an arbitrary genetic complexity in polyploids.
Description
pSBVB is a modification of SBVB software that simulates traits of an arbitrary genetic complexity in polyploids. pSBVB simulates complex traits and genotype data starting with a vcf file that contains the genotypes of founder individuals and follows a given pedigree. The main output is the genotypes of all individuals in the pedigree and/or molecular relationship matrices (GRM) using all sequence or a series of SNP lists, together with phenotype data. The program implements very efficient algorithms where only the recombination breakpoints for each individual are stored, therefore allowing the simulation of thousands of individuals very quickly.
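The idea of storing only recombination breakpoints can be sketched as follows. In this hypothetical Python representation (not pSBVB's actual code or file format), a haplotype is a sorted list of `(start_position, founder_id)` segments, so a gamete is built by switching between the two parental haplotypes at each crossover, and the founder carried at any position is found by binary search. All names and positions are illustrative assumptions, and crossover positions are assumed to be distinct.

```python
import bisect

def recombine(hap_a, hap_b, crossovers):
    """Build a gamete from two parental haplotypes, switching parent at each
    crossover. Haplotypes are sorted lists of (start_position, founder_id)."""
    def slice_segments(hap, lo, hi):
        starts = [s for s, _ in hap]
        i = bisect.bisect_right(starts, lo) - 1        # segment covering `lo`
        out = [(lo, hap[i][1])]
        out += [(s, f) for s, f in hap[i + 1:] if s < hi]
        return out

    bounds = [0] + sorted(crossovers) + [float("inf")]
    parents = (hap_a, hap_b)
    gamete = []
    for k in range(len(bounds) - 1):
        for seg in slice_segments(parents[k % 2], bounds[k], bounds[k + 1]):
            if not gamete or gamete[-1][1] != seg[1]:  # merge equal neighbours
                gamete.append(seg)
    return gamete

def founder_at(hap, position):
    """Return which founder haplotype is carried at `position`."""
    starts = [s for s, _ in hap]
    return hap[bisect.bisect_right(starts, position) - 1][1]

founder_1 = [(0, "F1")]
founder_2 = [(0, "F2")]
child = recombine(founder_1, founder_2, crossovers=[1000, 5000])
# child == [(0, "F1"), (1000, "F2"), (5000, "F1")]
```

Genotypes are then recovered on demand by looking up the founder's allele at each variant position, so memory per individual grows with the number of breakpoints rather than with the number of variants.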
Full pSBVB Profile
pWGBSSimla
https://omicssimla.sourceforge.io/index.html
a profile-based whole-genome bisulphite sequencing data simulator
Description
a profile-based whole-genome bisulphite sequencing data simulator
Full pWGBSSimla Profile
Pysim-sv
https://github.com/xyc0813/pysim/
Pysim-sv: a package for simulating structural variation data with GC-biases
Description
Pysim-sv is a package for simulating HTS data to evaluate performance of SV detection algorithms. Pysim-sv can introduce a wide spectrum of germline and somatic genomic variations. The package contains functionalities to simulate tumor data with aneuploidy and heterogeneous subclones, which is very useful in assessing algorithm performance in tumor studies. Furthermore, Pysim-sv can introduce GC-bias, the most important and prevalent bias in HTS data, in the simulated HTS data.
Full Pysim-sv Profile
Pyvolve
https://github.com/sjspielman/pyvolve
A Flexible Python Module for Simulating Sequences along Phylogenies
Description
Pyvolve is an open-source Python module for simulating sequences along a phylogenetic tree according to continuous-time Markov models of sequence evolution.
Full Pyvolve Profile
QMSim
http://www.aps.uoguelph.ca/~msargol/qmsim/
QTL and Marker Simulator
Description
Linkage disequilibrium (LD) and linkage analyses have been used extensively to identify quantitative trait loci (QTL) in humans and livestock. Owing to the recent developments in genotyping technologies, dense marker maps are now available for several livestock species. Even though genotyping costs have substantially declined, large scale genome-wide association studies are still costly. For this reason many studies in livestock suffer from small sample size or from low density of markers. However, simulation is a highly valuable tool for assessing and validating new proposed methods for association studies at very low cost. During the last few decades, simulation has played a major role in answering a wide variety of questions in genomics. Several software tools have been developed for simulating genomes, especially in human research. However, most of the developed software tools do not provide functionality required for many of the applications in livestock. QMSim was developed to simulate large scale genomic data in livestock populations. QMSim is a family based simulator, which can also take into account predefined evolutionary features, such as LD, mutation, bottlenecks and expansions. The simulation is carried out in two steps: in the first step, a historical population is simulated to establish mutation-drift equilibrium and, in the second step, recent population structures are generated, which can be complex.
Full QMSim Profile
quantiNEMO
http://www2.unil.ch/popgen/softwares/quantinemo/
An individual-based program for the analysis of quantitative traits with explicit genetic architecture potentially under selection in a structured population
Description
quantiNEMO is an individual-based, genetically explicit stochastic simulation program. It was developed to investigate the effects of selection, mutation, recombination, and drift on quantitative traits with varying architectures in structured populations connected by migration and located in a heterogeneous habitat. quantiNEMO is highly flexible at various levels: population, selection, trait(s) architecture, genetic map for QTL and/or markers, environment, demography, mating system, etc. quantiNEMO is a console program, and is coded in standard C++ using an object oriented approach, runs on any computer platform, and is distributed under an open source license.
Full quantiNEMO Profile
quantinemo 2
https://www2.unil.ch/popgen/softwares/quantinemo/
A Swiss Army knife to simulate complex demographic and genetic scenarios, forward and backward in time.
Description
QuantiNemo 2 is a stochastic simulation program for quantitative population genetics. It was developed to investigate the effects of selection, mutation, recombination and drift on quantitative traits and neutral markers in structured populations connected by migration and located in heterogeneous habitats. A specific feature is that it allows switching between an individual-based full-featured mode and a population-based, faster mode. Several demographic, genetic and selective parameters can be fine-tuned in QuantiNemo 2: population, selection, trait(s) architecture, genetic map for QTL and/or markers, environment, demography, and mating system are the main features.
Full quantinemo 2 Profile
readsim
https://sourceforge.net/projects/readsim/
Simple reads simulator for pacbio & nanopore
Description
Simple reads simulator for pacbio & nanopore
Full readsim Profile
RECOAL
https://github.com/cjkang/RECOAL
Simulates new haplotype data from a reference population of haplotypes.
Description
RECOAL simulates new haplotype data from a reference population of haplotypes. A coalescent genealogy for the reference haplotype data is sampled from the appropriate posterior probability distribution, then a coalescent genealogy is simulated which extends the sampled genealogy to include new haplotype data. The new haplotype data will therefore contain both some of the existing polymorphic sites and new polymorphisms added based on the structure of the simulated coalescent genealogy.
Full RECOAL Profile
Recodon
https://github.com/MiguelArenas/recodon
Coalescent simulation of coding DNA sequences with recombination, migration and demography
Description
Recodon can simulate samples of coding DNA sequences under complex scenarios in which several evolutionary forces can interact simultaneously (namely, recombination, migration and demography). The basic codon model implemented is an extension to the general time-reversible model of nucleotide substitution with a proportion of invariable sites and among-site rate variation. In addition, the program implements non-reversible processes and mixtures of different codon models.
Full Recodon Profile
REGENS
https://github.com/EpistasisLab/regens/tags
Simulates whole autosomes from real genomic segments in a way that preserves the input autosomes' linkage disequilibrium (LD) pattern.
Description
REGENS (REcombinatory Genome ENumeration of Subpopulations) is an open-source Python package that simulates whole genomes from real genomic segments. REGENS recombines these segments in a way that simulates completely new individuals while simultaneously preserving the input genomes' linkage disequilibrium (LD) pattern with extremely high fidelity. It takes plink (bed, bim, fam) file sets of existing genotype data as input and produces new (bed, bim, fam) file sets as output. REGENS can also simulate mono-allelic and epistatic single nucleotide variant (SNV) effects on a continuous or binary phenotype without perturbing the simulated LD pattern. REGENS was measured to be 88.5 times faster and require 6.2 times lower peak RAM on average than a similar algorithm called Triadsim. Our publication (https://doi.org/10.21105/joss.02743) and supplementary repository (https://github.com/EpistasisLab/regens-analysis) both contain more technical details. See our REGENS repository (https://github.com/EpistasisLab/regens) for the source code, as well as detailed instructions and examples.
Full REGENS Profile
ReSeq
https://github.com/schmeing/ReSeq
ReSeq simulates realistic Illumina high-throughput sequencing data
Description
Real Sequence Reproducer shortens the gap between simulated and real data evaluations by adequately reproducing key statistics of real data, like the coverage profile, systematic errors and the k-mer spectrum. When these characteristics are translated into new synthetic computational experiments (i.e. simulated data), the performance can be more accurately estimated. Combining our simulator and real data gives two valuable perspectives on the performance of tools to minimize biases.
Full ReSeq Profile
REvolver
http://www.cibiv.at/software/revolver/
Modeling sequence evolution under domain constraints
Description
REvolver is a program to simulate protein sequence evolution. REvolver automatically integrates domain information described by a profile Hidden Markov Model (pHMM) into the simulation. In the simulation of protein evolution it has often been assumed that sites evolve identically and independently from each other. This simplification is necessary since information concerning site specific evolution is frequently unavailable. However, homologous sequences and domains have been collected, aligned, and pHMMs built. The pHMM describes the variability and shared characteristics of sequences that share a common ancestor. Here we do have knowledge about what sites are conserved, at what positions in the sequences insertions are more likely, or what sites can be deleted. Pfam (Finn et al., 2010) and SMART (Letunic, Doerks and Bork, 2009) are examples of databases providing such data. REvolver is the first method for simulating protein sequence evolution that integrates this pre-existing information about evolution in an automatic fashion.
Full REvolver Profile
rlsim
A package for simulating RNA-seq library preparation with parameter estimation
Description
The rlsim package is a collection of tools for simulating RNA-seq library construction, aiming to reproduce the most important factors which are known to introduce significant biases in the currently used protocols: hexamer priming, PCR amplification and size selection. It allows for a systematic exploration of the effects of the individual biasing factors and their interactions on downstream applications by simulating data under a variety of parameter sets. The implicit simulation model implemented in the main tool (rlsim) is inspired by the actual library preparation protocols and it is more general than the models used by the bias correction methods hence it allows for a fair assessment of their performance. Although the simulation model was kept as simple as possible in order to aid usability, it still has too many parameters to be inferred from data produced by standard RNA-seq experiments. However, simulating datasets with properties similar to specific datasets is often useful. To address this, the package provides a tool (effest) implementing simple approaches for estimating the parameters which can be recovered from standard RNA-seq data (GC-dependent amplification efficiencies, fragment size distribution, relative expression levels).
Full rlsim Profile
Rmetasim
http://cran.r-project.org/web/packages/rmetasim/index.html
Rmetasim is a front-end for the metasim engine that is implemented as a package that runs in the statistical computing environment R
Description
Rmetasim provides a flexible environment in which to perform individual-based population genetic simulations. A wide range of landscape-level dynamics, population structures, and within-population demographies can be represented using the framework implemented in this software. In addition, temporal variation in all demographic characteristics can be simulated, both deterministically and stochastically. Such simulations can be used to produce null distributions of genotypes under realistic conditions. These genotypic data can then be used by a variety of analytical programs to develop null expectations of any population genetic statistic estimated from genotypic data.
Full Rmetasim Profile
RNA Seq Simulator
https://github.com/HuntsmanCancerInstitute/USeq
RSS takes SAM alignment files from RNA-Seq data and simulates over dispersed, multiple replica, differential, non-stranded RNA-Seq datasets.
Description
RSS takes SAM alignment files from RNA-Seq data and simulates over dispersed, multiple replica, differential, non-stranded RNA-Seq datasets.
Full RNA Seq Simulator Profile
Rose
http://bibiserv.techfak.uni-bielefeld.de/rose/
Random model of sequence evolution
Description
Rose implements a new probabilistic model of the evolution of RNA-, DNA-, or protein-like sequences. Guided by an evolutionary tree, a family of related sequences is created from a common ancestor sequence by insertion, deletion and substitution of characters. During this artificial evolutionary process, the `true' history is logged and the `correct' multiple sequence alignment is created simultaneously. The model also allows for varying rates of mutation within the sequences, making it possible to establish so-called sequence motifs. The data created by Rose are suitable for the evaluation of methods in multiple sequence alignment computation and the prediction of phylogenetic relationships. It can also be useful when teaching courses in or developing models of sequence evolution and in the study of evolutionary processes.
Full Rose Profile
RSVSim
https://bioconductor.org/packages/release/bioc/html/RSVSim.html
an R/Bioconductor package for the simulation of structural variations
Description
RSVSim is a tool for the simulation of deletions, insertions, inversions, tandem duplications and translocations of various sizes in any genome available as FASTA-file or data package in R. The structural variations can be generated randomly, based on user-supplied genomic coordinates or associated to various kinds of repeats. The package further comprises functions to estimate the distribution of structural variation sizes from real datasets.
Full RSVSim Profile
santa-sim
https://github.com/santa-dev/santa-sim
SANTA simulates the evolution of gene sequences.
Description
SANTA is a Java software application that simulates the evolution of a population of gene sequences forwards through time. It models the underlying biological processes as discrete components: replication (including recombination), mutation (including indels), fitness and selection. SANTA is easy to use and is well-suited to simulate pathogen evolution according to different scenarios.
Full santa-sim Profile
scDesign
https://github.com/Vivianstats/scDesign
scDesign assesses scRNA-seq experimental design in the context of differential gene expression analysis.
Description
scDesign quantitatively assesses scRNA-seq experimental design. The software also assists in computational method development by generating high-quality synthetic scRNA-seq datasets under customized experimental settings. scDesign is reproducible across biological replicates and independent studies.
Full scDesign Profile
cscGAN
https://github.com/imsb-uke/scGAN
cscGAN learns non-linear gene-gene dependencies from cell type samples in order to generate realistic cells of defined types.
Description
cscGAN learns non-linear gene-gene dependencies from cell type samples in order to generate realistic cells of defined types. Augmenting sparse cell populations improves the detection of marker genes, the robustness and reliability of classifiers, as well as the assessment of novel analysis algorithms.
Full cscGAN Profile
scrm
A coalescent simulator optimized for long sequences and large samples.
Description
The Sequential Coalescent with Recombination Model (SCRM) is a new method that efficiently and accurately approximates the coalescent with recombination. It closes the gap between current approximations and the exact model and can be used to simulate genomic-scale data sets with an essentially correct linkage structure. The efficient C++ implementation scrm is available for all major platforms and as an R package on CRAN.
Full scrm Profile
SCSilicon
https://github.com/xikanfeng2/SCSilicon
SCSilicon: a tool for synthetic single-cell DNA sequencing data generation
Description
SCSilicon efficiently generates single-cell in silico DNA reads with minimum manual intervention. SCSilicon first creates the genome sequence (FASTA file) for each single cell by automatically simulating a collection of genomic aberrations, including SNP, SNV, Indel, and CNV. Likewise, SCSilicon yields the ground truth of CNV segmentation breakpoints and subclone cell labels. Then, SCSilicon amplifies the genome and generates FASTQ reads. We have manually inspected a series of synthetic variations (SNP, SNV, Indel, and CNV breakpoint) generated by SCSilicon, and evaluated three state-of-the-art single-cell CNV callers.
Full SCSilicon ProfileSCSIM
https://github.com/flahertylab/scsim
SCSIM: Jointly simulating correlated single-cell and bulk next-generation DNA sequencing data
Description
SCSIM simulates DNA sequencing data from hierarchically grouped (correlated) samples where each sample is designated bulk or single-cell. Our tool uses a simple configuration file to define the experimental arrangement and can be integrated into software pipelines for testing of variant callers or other genomic tools.
Full SCSIM Profile
SECNVs
https://github.com/YJulyXing/SECNVs
SECNVs: A Simulator of Copy Number Variants and Whole-Exome Sequences From Reference Genomes
Description
SECNVs (Simulator of Exome Copy Number Variants) is a fast, robust and customizable software application for simulating copy number variants and whole-exome sequences from a reference genome. SECNVs is easy to install, implements a wide range of commands to customize simulations, can output multiple samples at once, and incorporates a pipeline to output rearranged genomes, short reads and BAM files in a single command. Variants generated by SECNVs are detected with high sensitivity and precision by tools commonly used to detect copy number variants.
Full SECNVs Profile
SELECTOR
https://ua.unige.ch/en/agp/outils/selector/
SELECTOR is a program to simulate lineages under selection in a spatially explicit population framework, written in C++ and running under MS Windows and Linux.
Description
SELECTOR investigates the evolution of multi-allelic genes under balancing or positive selection while also simulating the complex evolutionary scenarios that integrate demographic growth and migration in a spatially explicit population framework. The parameters can be varied in both space and time in order to account for geographical, environmental and cultural heterogeneity. The software can be used to investigate genetic differentiation of loci under balancing selection in interconnected demes with spatially heterogeneous gene flow. SELECTOR is intended to be used for building insight into human settlement history and evolution.
Full SELECTOR Profile
SelSim
https://github.com/trvrb/selsim
population genetic simulation (Not SelSim from Spencer & Coop 2004, which is currently unavailable)
Description
With selsim, an evolving population of sequences is simulated according to a haploid Wright-Fisher model with discrete generations. This uses a Jukes-Cantor mutation model with a specified mutation rate. In each subsequent generation, the population is reconstituted by sampling sequences with replacement proportional to their frequency multiplied by their fitness. Mutations can be advantageous or deleterious and affect fitness in a multiplicative fashion (additive on a log-scale). Sequences are sampled at random time points after a period of burn-in.
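The generation loop described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not selsim's actual code: the function names (`mutate`, `fitness`, `next_generation`, `simulate`) are hypothetical, every mutation carries the same selection coefficient `s`, and fitness is computed from the number of sites that differ from the ancestral sequence (so a back-mutation restores fitness).

```python
import random

BASES = "ACGT"

def mutate(seq, mu, rng):
    """Jukes-Cantor step: each site mutates with probability mu to one of
    the three other bases, chosen uniformly."""
    out = []
    for base in seq:
        if rng.random() < mu:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def fitness(seq, ancestor, s):
    """Multiplicative fitness (additive on a log scale): every site that
    differs from the ancestor multiplies fitness by (1 + s).
    Assumption: a single coefficient s > -1 shared by all mutations."""
    diffs = sum(1 for a, b in zip(seq, ancestor) if a != b)
    return (1.0 + s) ** diffs

def next_generation(pop, ancestor, mu, s, rng):
    """Resample the population with replacement, weighting each individual
    by its fitness, then apply mutation to the offspring."""
    weights = [fitness(seq, ancestor, s) for seq in pop]
    parents = rng.choices(pop, weights=weights, k=len(pop))
    return [mutate(parent, mu, rng) for parent in parents]

def simulate(ancestor, pop_size, generations, mu, s, seed=0):
    """Evolve a haploid Wright-Fisher population with discrete generations."""
    rng = random.Random(seed)
    pop = [ancestor] * pop_size
    for _ in range(generations):
        pop = next_generation(pop, ancestor, mu, s, rng)
    return pop
```

Weighting each individual by its fitness is equivalent to sampling sequence types in proportion to frequency times fitness, because identical sequences appear in the list once per copy. A call such as `simulate("ACGTACGTAC", 50, 20, 0.01, -0.1)` returns the 50 sequences present after 20 generations; burn-in and sampling at random time points are left out of the sketch.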
Full SelSim Profile
SELVa
https://github.com/bazykinlab/SELVa
Simulator of evolution with landscape variation
Description
SELVa is a simulator of sequence evolution that allows the fitness landscape to vary according to user-specified rules. It is geared towards exploring the effects of landscape change on molecular sequence evolution. SELVa has a variety of options for specifying the rules of landscape change, allowing the user to tailor the simulation to his or her needs and to explore various evolutionary scenarios.
Full SELVa Profile
Seq-Gen
http://tree.bio.ed.ac.uk/software/seqgen/
An application for the Monte Carlo simulation of molecular sequence evolution along phylogenetic trees.
Description
Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. State frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution.
Full Seq-Gen Profile
SeqNet
https://github.com/tgrimes/SeqNet
An R package for simulating RNA-seq counts from gene-gene association networks.
Description
Methods to generate random gene-gene association networks and simulate RNA-seq data from them, as described in Grimes and Datta (2021). Includes functions to generate random networks of any size and perturb them to obtain differential networks. Network objects are built from individual, overlapping modules that represent pathways. The resulting network has various topological properties that are characteristic of gene regulatory networks. RNA-seq data can be generated such that the association among gene expression profiles reflects the underlying network. A reference RNA-seq dataset can be provided to model realistic marginal distributions. Plotting functions are available to visualize a network, compare two networks, and compare the expression of two genes across multiple networks.
Full SeqNet Profile
SEQPower
http://bioinformatics.org/spower/
Statistical power analysis for sequence-based association studies
Description
SEQPower is software for simulating rare-variant data associated with complex traits and for performing power and sample size estimation for sequence-based association studies. It features analytic sample size estimates, power comparison of rare variant association methods, and validation and evaluation of novel association tests under various study designs.
Full SEQPower Profile
SeqSIMLA
http://seqsimla.sourceforge.net/
SeqSIMLA can simulate sequence data with user-specified disease and quantitative trait models. Family or unrelated case-control data can be simulated.
Description
SeqSIMLA can simulate sequence data in families with multiple affected and unaffected siblings or unrelated case-control data under different disease models. SeqSIMLA accepts a population of sequences generated by other sequence generators. We implemented two disease models, in which the user can flexibly specify the number of disease loci, effect sizes or population attributable risk, disease prevalence, and risk or protective loci. We also implemented a quantitative trait model, in which the user can specify the number of quantitative trait loci (QTL), proportions of variance explained by the QTL, and genetic models. In 2014, we extended SeqSIMLA to create SeqSIMLA2, which can simulate correlated traits and considers the shared environmental effects. SeqSIMLA2 can also simulate prespecified large pedigree structures. There are no restrictions on the number of individuals that can be simulated in a pedigree. In 2015, we implemented SeqSIMLA2_exact, which can simulate sequences with multiple disease sites in large pedigrees with given disease status for each pedigree member, assuming that the disease prevalence is low.
Full SeqSIMLA Profile
SERGIO
https://github.com/PayamDiba/SERGIO
Sergio is a simulator for single-cell gene expression data that models the stochastic nature of the transcription and regulation of genes via transcription factors according to a user-provided gene regulatory network.
Description
Sergio is a simulator for single-cell gene expression data that models the stochastic nature of the transcription and regulation of genes via transcription factors according to a user-provided gene regulatory network. The package can simulate cell types in steady states or cells differentiating to multiple fates. The datasets generated by SERGIO are statistically comparable to experimental data generated by Illumina HiSeq2000, Drop-Seq, Illumina 10x Chromium, and Smart-seq.
Full SERGIO Profile
Serial NetEvolve
https://biorg.cis.fiu.edu/SNE/index.htm
A flexible utility for generating serially-sampled sequences along a tree or recombinant network
Description
Serial NetEvolve is a modification of the Treevolve program in which serially sampled sequences are evolved along a randomly generated coalescent tree or network (Grassly et al. 1999; Hudson 1983; Kingman 1982). Treevolve offers a variety of evolutionary models and population parameters, including a rate of recombination, and as such it was chosen over other programs to be adapted for the simulation of serially sampled data. The new features include the choice of either a clock-like model of evolution or a variable rate of evolution, simulation of serial samples and the output of the randomly generated tree or network in Newick format or in our newly formulated NeTwick format.
Full Serial NetEvolve Profile
SFS_CODE
http://sfscode.sourceforge.net/SFS_CODE/index/index.html
SFS_CODE can perform forward population genetic simulations under a general Wright-Fisher model with arbitrary migration, demographic, selective, and mutational effects.
Description
SFS_CODE (Selection on Finite Sites under COmplex Demographic Events) performs forward population genetic simulations under a general Wright-Fisher model with arbitrary demographic, selective, and mutational effects.
Full SFS_CODE Profile
SIApopr
https://github.com/olliemcdonald/siapopr#readme
Siapopr is an R package that wraps the C++ functions SIApop. These functions simulate birth-death-mutation processes with mutations having random fitnesses to simulate clonal evolution.
Description
Siapopr is an R package that wraps the C++ functions SIApop. These functions simulate birth-death-mutation processes with mutations having random fitnesses to simulate clonal evolution.
Full SIApopr Profile
SIBSIM
http://sourceforge.net/projects/sibsim/
Quantitative phenotype simulation in extended pedigrees
Description
SIBSIM is a modern and powerful computer program to simulate genotype and quantitative trait data in extended pedigrees. In the current release (2.1.2), we put emphasis on the simulation of a quantitative trait in pedigrees of arbitrary size without monozygotic twins. Well-known software such as the SIMULATE package is not as scalable as SIBSIM. As an advantage over both G.A.S.P. and SIMLA, no predefined boundaries restrict SIBSIM in its potential, neither in genome nor in family size. Instead, SIBSIM is as highly scalable as possible to meet any needs. SIBSIM may be used not only in simulation studies, but also in the validation, verification and testing of other applications that implement statistical analysis of genomic data. We successfully used SIBSIM in the latter respect and detected a bug in a widely used genetic epidemiological software package.
Full SIBSIM Profile
sim1000G
https://github.com/adimitromanolakis/sim1000G
sim1000G integrates fully with R and can simulate existing variation from a single VCF file. In addition it can also simulate arbitrary pedigrees.
Description
We develop a new user-friendly and integrated R package, sim1000G, which simulates genomic regions for unrelated individuals or for families. Only a single raw phased Variant Call Format (VCF) file is needed as input. Haplotypes are extracted to compute linkage disequilibrium in the simulated region and then to generate new genotype data for unrelated individuals. The covariance across variants is used to preserve the LD structure of the original population. Arbitrary pedigree sizes are generated by modeling recombination events within sim1000G. Various simulation scenarios are presented assuming unrelated individuals from a single population or two distinct populations, or alternatively for three-generation family data. Sim1000G can capture allele frequency diversity, short and long-range linkage disequilibrium (LD) patterns and subtle population differences in LD structure without the need for any tuning parameters.
Full sim1000G Profile
sim3C
https://github.com/cerebis/sim3C
Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies
Description
We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error.
Full sim3C Profile
SimAdapt
https://www.openabm.org/model/3137
A spatially explicit, individual-based, forward-time, landscape-genetic simulation model combined with a landscape cellular automaton.
Description
SimAdapt is a spatially explicit, individual-based, forward-time, landscape-genetic simulation model combined with a landscape cellular automaton to represent evolutionary processes of adaptation and population dynamics in changing landscapes, using the NetLogo environment.
Full SimAdapt Profile
SimBA
https://github.com/ComputationalGenomics/SimBA
SimBA is a non-generative approach to population simulations based on a combination of stochastic techniques and discrete methods.
Description
SimBA is a non-generative approach to population simulations, based on a combination of stochastic techniques and discrete methods. The package contains a hill climbing algorithm and multiple subpopulation structures. SimBA is very sensitive to the input specifications, i.e., very similar but distinct input characteristics result in distinct outputs with high fidelity to the specified distributions. This property of the simulation is not explicitly modeled or studied by previous methods.
Full SimBA Profile
SimBit
https://github.com/RemiMattheyDoret/SimBit
Simbit is an all-purpose, high-performance forward-in-time population genetics simulator.
Description
Simbit is an all-purpose, high-performance forward-in-time population genetics simulator. The software is capable of simulating selection scenarios, demographic scenarios, and mating systems. Simbit can also simulate multiple species along with their ecological relationships. The package comes with an R wrapper that simplifies the management of research projects from the creation of a grid of parameters, and can run simulations and gather inputs for analysis.
Full SimBit Profile
SIMCOAL2
http://cmpg.unibe.ch/software/simcoal2/
A coalescent program for the simulation of complex recombination patterns over large genomic regions under various demographic models
Description
We present here SIMCOAL2, an extended version of the SIMCOAL program (Excoffier et al. 2000), to simulate the neutral genetic diversity at partially linked loci under different histories and a wide range of migration and demographic models. SIMCOAL2 includes a number of new features compared to the previous version: 1) the possibility of arbitrary recombination rates between adjacent loci; 2) multiple coalescent events per generation, allowing the correct simulation of very large samples and very large recombining genomic regions; 3) the simulation of SNP data with arbitrary minimum frequency, for instance to simulate ascertainment bias; 4) the output of diploid genotypic data generated under the assumption of Hardy-Weinberg equilibrium; and 5) the simulation of a mixture of different data types (DNA sequence, RFLP, STR, or SNP) along a single chromosome.
Full SIMCOAL2 Profile
SimCopy
An R package simulating the evolution of copy number profiles along a tree.
Description
SimCopy is an R package simulating the evolution of copy number profiles along a tree. It relies on the PhyloSim package for performing the simulations by encoding the genomic regions as sites in sequences and using modified processes acting on them. Please note that SimCopy simulations are restricted to a single chromosome. The genomes are encoded as a sequence of sites containing integers identifying genomic regions. Negative integers represent inverted genomic regions. SimCopy supports 1) deletion - deletes genomic regions, 2) duplication - duplicates genomic regions, 3) inversion - changes the orientation of the genomic regions by taking the opposite of the corresponding integer, 4) inverted duplication - duplicates genomic regions and flips their orientation and 5) translocation - translocates a stretch of genomic regions.
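To make the signed-integer encoding concrete, here is a minimal Python sketch of the five event types on a genome stored as a list of region identifiers. It is not SimCopy's R code: the function names and the half-open `[start, end)` index convention are hypothetical, duplicated copies are placed directly after the original (tandem placement is an assumption), and the inversion also reverses the order of the segment, which the description does not state explicitly.

```python
def delete(genome, start, end):
    """Remove the regions in [start, end)."""
    return genome[:start] + genome[end:]

def flip(segment):
    """Reverse a segment and negate each identifier to mark the
    opposite orientation (order reversal is an assumption)."""
    return [-region for region in reversed(segment)]

def duplicate(genome, start, end):
    """Insert a copy of [start, end) right after the original."""
    return genome[:end] + genome[start:end] + genome[end:]

def invert(genome, start, end):
    """Flip the orientation of the regions in [start, end) in place."""
    return genome[:start] + flip(genome[start:end]) + genome[end:]

def inverted_duplicate(genome, start, end):
    """Insert a flipped copy of [start, end) right after the original."""
    return genome[:end] + flip(genome[start:end]) + genome[end:]

def translocate(genome, start, end, dest):
    """Cut out [start, end) and reinsert it at index dest of the
    remaining genome."""
    segment = genome[start:end]
    rest = genome[:start] + genome[end:]
    return rest[:dest] + segment + rest[dest:]

genome = [1, 2, 3, 4, 5]
print(delete(genome, 1, 3))              # [1, 4, 5]
print(duplicate(genome, 1, 3))           # [1, 2, 3, 2, 3, 4, 5]
print(invert(genome, 1, 3))              # [1, -3, -2, 4, 5]
print(inverted_duplicate(genome, 1, 3))  # [1, 2, 3, -3, -2, 4, 5]
print(translocate(genome, 1, 3, 2))      # [1, 4, 2, 3, 5]
```

Each function returns a new list, so a copy number profile can be read off at any point by counting how often the absolute value of each identifier occurs.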
Full SimCopy Profile
SIMLA
http://dmpi.duke.edu/simla-simulation-software-version-32
SIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies.
Description
SIMLA is a SIMuLAtion program that generates data sets of families for use in Linkage and Association studies. It allows the user flexibility in specifying marker and disease placement, locus heterogeneity, and disequilibrium between markers and between markers and disease loci. Output is in the form of a LINKAGE (Lathrop et al., Proc Natl Acad Sci USA 81, 1984) pedigree file and is easily utilized, either directly or with minimal reformatting, as input for various genetic analysis packages.
Full SIMLA Profile
SimLoRD
https://bitbucket.org/genomeinformatics/simlord/src/master/
SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.
Description
SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model. Reads are simulated from both strands of a provided or randomly generated reference sequence.
Full SimLoRD Profile
simNGS
http://www.ebi.ac.uk/goldman-srv/simNGS/
software for simulating observations from Illumina sequencing machines using the statistical models behind the AYB base-calling software.
Description
simNGS is software for simulating observations from Illumina sequencing machines using the statistical models behind the AYB base-calling software. By default, observations only incorporate noise due to sequencing and do not incorporate effects from more esoteric sources of noise that may be present in real data ("dust", bubbles, merged clusters, sequence-heterogeneous clusters, etc). Many of these additional sources may optionally be applied. simNGS takes FASTA format sequences and a file describing the covariance of noise between bases and cycles observed in an actual run of the machine, randomly generates noisy intensities representing the signals for the sequence at each cycle and calculates likelihoods for all possible base calls.
Full simNGS Profile
SimPed
http://bioinformatics.org/simped/
A Simulation Program to Generate Haplotype and Genotype Data for Pedigree Structures
Description
SimPed is a program that quickly generates haplotype and/or genotype data for pedigrees of virtually any size and complexity. Marker data either in linkage disequilibrium or equilibrium can be generated for greater than 20,000 diallelic or multiallelic marker loci. Haplotypes and/or genotypes are generated for pedigree structures using specified genetic map distances and haplotype and/or allele frequencies. The simulated data generated by SimPed is useful for a variety of purposes, including evaluating methods that estimate haplotype frequencies for pedigree data, evaluating type I error due to intermarker linkage disequilibrium and estimating empirical p values for linkage and family-based association studies.
Full SimPed Profile
SimPEL
https://github.com/precisionomics/SimPEL
SimPEL is short for Simulation-based Power Estimation for sequencing studies of Low-prevalence conditions.
Description
SimPEL is short for Simulation-based Power Estimation for sequencing studies of Low-prevalence conditions. SimPEL addresses the need for power estimation in low-prevalence condition studies, taking into account their inherently small sample sizes and analytical procedures. SimPEL integrates customizable parameters to realistically model study design outcomes and provide applicable information towards further refinement of experimental procedure. SimPEL is implemented as a function of the established JAWAMix5 tool (Long et al., 2013), an HDF5-based Java implementation for association mapping.
Full SimPEL Profile
SimPhy
https://github.com/adamallo/SimPhy
A comprehensive simulator of gene family evolution
Description
SimPhy simulates the evolution of multiple gene families under incomplete lineage sorting, gene duplication and loss, horizontal gene transfer—all three potentially leading to species tree/gene tree discordance—and gene conversion. SimPhy implements a hierarchical phylogenetic model in which the evolution of species, locus, and gene trees is governed by global and local parameters (e.g., genome-wide, species-specific, locus-specific), that can be fixed or be sampled from a priori statistical distributions. SimPhy also incorporates comprehensive models of substitution rate variation among lineages (uncorrelated relaxed clocks) and the capability of simulating partitioned nucleotide, codon, and protein multilocus sequence alignments under a plethora of substitution models using the program INDELible.
Full SimPhy Profile
Simprot
http://www.uhnresearch.ca/labs/tillier/software.htm#3
A program to simulate protein evolution by substitution, insertion and deletion
Description
Protein evolution has been largely modelled by considering the amino acid substitution process; however, there have been few studies of the process of insertion and deletion. Simprot allows for several models of amino acid substitution (PAM, JTT and PMB), allows for gamma-distributed site rates according to Yang's model, and implements a parameterised Qian and Goldstein distribution model for insertion and deletion.
Full Simprot Profile
SimRare
http://code.google.com/p/simrare/
Rare variant simulation and analysis tool
Description
A program to generate and analyze sequence-based data for rare variant association studies of quantitative and qualitative traits
Full SimRare Profile
SimSeq
https://github.com/jstjohn/SimSeq
An Illumina paired-end and mate-pair short read simulator.
Description
This project attempts to model as many of the quirks that exist in Illumina data as possible. Some of these quirks include the potential for chimeric reads, and non-biotinylated fragment pull down in mate-pair libraries. Additionally, the program provides the ability to model both site a…
Full SimSeq Profile
simuG
https://github.com/yjx1217/simuG
simuG: a general-purpose genome simulator
Description
Simulated genomes with pre-defined or random genomic variants can be very useful for benchmarking genomic and bioinformatics analyses. Here we introduce simuG as a lightweight tool for simulating the full spectrum of genomic variants (SNPs, INDELs, CNVs, inversions, and translocations). In addition, simuG enables a rich array of fine-tuned controls, such as simulating SNPs in different coding partitions (e.g. coding sites, noncoding sites, 4-fold degenerate sites, or 2-fold degenerate sites); simulating CNVs with different formation mechanisms (e.g. segmental deletions, dispersed duplications, and tandem duplications); and simulating inversions and translocations with specific types of breakpoints. The simplicity and versatility of simuG make it a unique general-purpose genome simulator for a wide range of simulation-based applications.
Full simuG Profile
simuGWAS
https://github.com/BoPeng/simuPOP-examples/tree/master/published/simuGWAS
A forward-time simulator that simulates realistic samples for genome-wide association studies.
Description
simuGWAS evolves a population forward in time, subject to rapid population expansion, mutation, recombination and natural selection. A trajectory simulation method is used to control the allele frequency of optional disease predisposing loci. A scaling approach can be used to improve efficiency when weak, additive genetic factors are used.
Full simuGWAS Profile
simuPOP
http://simupop.sourceforge.net/
simuPOP is a general-purpose individual-based forward-time population genetics simulation environment.
Description
simuPOP is a general-purpose individual-based forward-time population genetics simulation environment. The core of simuPOP is a scripting language (Python) that provides a large number of objects and functions to manipulate populations, and a mechanism to evolve populations forward in time. Using this environment, users can create, manipulate and evolve populations interactively, or write a script and run it as a batch file. Owing to its flexible and extensible design, simuPOP can simulate large and complex evolutionary processes with ease.
Full simuPOP Profile
simuRare
https://ysph.yale.edu/c2s2/software/simurare/
Simulating realistic genomic data with rare variants
Description
simuRare is a regression-based resampling method to use real data and simulate rare variants obtained from the 1000 Genomes Project
Full simuRare Profile
SimuSCoP
https://github.com/qasimyu/simuscop
reliably simulate Illumina sequencing data based on position and context dependent profiles
Description
a novel tool to reliably Simulate Illumina Sequencing data based on position and Context dependent Profiles
Full SimuSCoP Profile
SInC
https://sourceforge.net/projects/sincsimulator/
An accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data
Description
An open-source variant simulator and read generator capable of simulating all three common types of biological variants, taking into account the distribution of base quality scores from one of the most commonly used next-generation sequencing instruments from Illumina. SInC is capable of generating single- and paired-end reads with user-defined insert size and with high efficiency compared to other existing tools. SInC, due to its multi-threaded capability during read generation, has a low time footprint. SInC is currently optimised to work in a limited infrastructure setup and can efficiently exploit the commonly used quad-core desktop architecture to simulate short sequence reads with deep coverage for large genomes.
Full SInC Profile
SISSI
http://www.cibiv.at/software/sissi/
A software tool to generate data of related sequences along a given phylogeny, taking into account user defined system of neighbourhoods and instantaneous rate matrices.
Description
Simulating Site-Specific Interactions (SISSI) simulates the evolution of a nucleotide sequence along a phylogenetic tree, incorporating user-defined site-specific interactions. Furthermore, the method can simulate more complex interactions among nucleotide and other character-based sequences.
Full SISSI Profile
skeleSim
https://github.com/christianparobek/skeleSim
an extensible, general framework for population genetic simulation in R
Description
skeleSim is a tool to guide users in choosing appropriate simulations, setting parameters, calculating summary genetic statistics, and organizing data output, all within the R environment. skeleSim is designed to be an extensible environment that can 'wrap' around any simulation software to increase its accessibility and use.
Full skeleSim Profile
SLiM
A framework for implementing forward genetic simulations, including an interactive development environment and a highly flexible scripting language.
Description
SLiM is an evolutionary simulation framework that combines a powerful engine for population genetic simulations with the capability of modeling arbitrarily complex eco-evolutionary scenarios. Simulations are configured via the integrated Eidos scripting language that allows interactive control over practically every aspect of the simulated scenarios. The underlying individual-based simulation engine is highly optimized to enable modeling of entire chromosomes in large populations. For macOS, Linux, and Windows (native and WSL) users, we also provide a graphical user interface for easy simulation set-up, interactive runtime control, and dynamical visualization of simulation output.
Full SLiM Profile
SMARTPOP
http://smartpop.sourceforge.net/
Simulating Mating Alliance as a Reproductive Tactic for Populations
Description
SMARTPOP is a fast and flexible forward-in-time simulator for population genetics. Specially developed for speed, it is available in serial and parallel versions. Developed for anthropological inference on human populations and eco-anthropological questions, SMARTPOP simulates individuals with sequences of sex-linked DNA (mitochondria, X and Y chromosomes) and autosomes. Studies of social dynamics are enabled by SMARTPOP's flexible demographic model and social rules of mating.
Full SMARTPOP Profile
SNPsim
http://code.google.com/p/phylosoftware/
Coalescent simulation of hotspot recombination
Description
SNPsim is a population genetic simulator that generates samples of single nucleotide polymorphism (SNP) haplotypes and diploid biallelic genotypes. It is based on the coalescent with recombination (Hudson 1983) modified by Wiuf and Posada (2003) to include recombination hotspots. SNPsim also allows for the specification of demographic periods and different mutation models.
Full SNPsim Profile
SomatoSim
https://github.com/BieseckerLab/SomatoSim
SomatoSim: precision simulation of somatic single nucleotide variants
Description
SomatoSim is a tool that lets users simulate somatic single nucleotide variants in sequence alignment map (SAM/BAM) files with full control of the specific variant positions, number of variants, variant allele fractions, depth of coverage, read quality, and base quality, among other parameters. SomatoSim accomplishes this through a three-stage process: variant selection, where candidate positions are selected for simulation, variant simulation, where reads are selected and mutated, and variant evaluation, where SomatoSim summarizes the simulation results.
Full SomatoSim Profile
SPARSIM
https://gitlab.com/sysbiobig/sparsim
SPARSim is an R tool used for simulating single cell RNA-seq (scRNA-seq) count tables.
Description
SPARSim is an scRNA-seq count data simulator based on a Gamma-Multivariate Hypergeometric model. The package generates count data that resembles real data in terms of count intensity, variability and sparsity. The simulator is capable of simulating count matrices that resemble the distribution of different expression intensities observed in real count data. The package can also simulate single cell RNA-seq count tables.
Full SPARSIM Profile
SpartaABC
http://spartaabc.tau.ac.il/webserver
a web server to simulate sequences with indel parameters inferred using an approximate Bayesian computation algorithm
Description
SpartaABC is an Approximate Bayesian Computation (ABC) rejection algorithm to infer indel parameters from sequence data. It focuses on the inference of three indel parameters: IR (the indel-to-substitution rate ratio), A (the shape parameter for the power law distribution controlling the indel length), and RL (the root length parameter). SpartaABC extracts a vector of summary statistics from its input; it then performs repeated simulations using an integrated sequence simulator (Fletcher and Yang 2009, Cartwright 2016) under various indel parameters. From each such simulated dataset it extracts a vector of summary statistics and computes its distance from the vector extracted for the input using a weighted Euclidean distance.
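The simulate-and-compare loop described above follows the general shape of ABC rejection sampling, which can be sketched as follows. This is a generic illustration, not SpartaABC's implementation: the function names, the form of the weighting (weights multiply the squared differences), the keep-the-closest-fraction acceptance rule, and the toy one-parameter prior and simulator are all assumptions made for the example.

```python
import math
import random

def weighted_euclidean(observed, simulated, weights):
    """Distance between two summary-statistic vectors; each squared
    difference is scaled by its weight (assumed form of the weighting)."""
    return math.sqrt(sum(w * (o - s) ** 2
                         for o, s, w in zip(observed, simulated, weights)))

def abc_rejection(observed, simulate_stats, sample_prior, weights,
                  n_draws, keep_fraction, rng):
    """Draw parameters from the prior, simulate summary statistics for
    each draw, and keep the draws whose statistics lie closest to the
    observed ones."""
    scored = []
    for _ in range(n_draws):
        params = sample_prior(rng)
        stats = simulate_stats(params, rng)
        scored.append((weighted_euclidean(observed, stats, weights), params))
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, round(n_draws * keep_fraction))
    return [params for _, params in scored[:n_keep]]

# Hypothetical toy problem: infer one event rate from the fraction of
# 200 sites at which an event was observed.
def toy_prior(rng):
    return rng.uniform(0.0, 1.0)

def toy_simulator(rate, rng):
    n_sites = 200
    events = sum(1 for _ in range(n_sites) if rng.random() < rate)
    return [events / n_sites]
```

Calling `abc_rejection([0.3], toy_simulator, toy_prior, [1.0], 2000, 0.05, random.Random(7))` returns the 100 prior draws whose simulated statistic is nearest to 0.3; their spread approximates the posterior for the toy rate. A real application would replace the toy functions with an indel-aware sequence simulator and a multi-element summary vector.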
Full SpartaABC Profile
SPIP
https://swfsc.noaa.gov/textblock.aspx?Division=FED&id=3434
SPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user
Description
SPIP simulates the transmission of genes from parents to offspring in a population having demographic structure defined by the user. Numerous variables controlling the age structure of the population, the number of offspring produced, the variance in male and female reproductive success, survival rates of different age classes, mate fidelity, duration of simulation, etc. can be specified by the user. The program stores the pedigree of all individuals in the simulated population. This pedigree is used to simulate genetic data on sampled individuals by tracing lineages back through paternal or maternal genes within each sampled individual. Data may be simulated for an arbitrary number of loci that are assumed to be independently segregating and to not be subject to natural selection, nor linked to any selected genes. Genotypes are reported in terms of both "founder alleles" (i.e., each distinct allele amongst the founders of the pedigree is given a distinct label) and also in terms of alleles whose frequencies amongst the founding members of the pedigree may be specified by the user.
Full SPIP Profile
Splatche
Spatial and Temporal Coalescences in Heterogeneous Environment
Description
SPLATCHE (for SPatiaL And Temporal Coalescences in Heterogeneous Environment) is a program that incorporates the influence of the environment into the simulation of the migration of a given species from one or several origins. In a second phase, the molecular genetic diversity of one or several samples drawn from the simulated species can be generated. The geographic area and environmental information have to be specified by the user in a series of input files. The virtual world where migrations take place is a matrix of demes, each with its own environmental characteristics according to the input files. A coalescent-based approach generates the molecular diversity of any population sample. The molecular data obtained can then be analyzed to study the signature of the simulated demographic scenario. The goal of this online manual is to describe the technical aspects of the software SPLATCHE (version 1.1). This manual complements the article by Currat, Ray and Excoffier, published in 2004. Further details on the methodology can also be found in Ray (2003) and Currat (2004). A PDF version of the user manual is also available for download.
Full Splatche Profile
Splatter
https://bioconductor.org/packages/devel/bioc/vignettes/splatter/inst/doc/splatter.html
An R package for the simple simulation of single-cell RNA sequencing data. This vignette gives an overview and introduction to Splatter’s functionality.
Description
As single-cell RNA sequencing (scRNA-seq) technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed, and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here, we present the Splatter Bioconductor package for simple, reproducible, and well-documented simulation of scRNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types, or differentiation paths.
Full Splatter Profile
SPsimSeq
https://github.com/CenterForStatistics-UGent/SPsimSeq
SPsimSeq simulates datasets from estimated marginal distributions with Gaussian-copulas.
Description
SPsimSeq uses an exponential family for density estimation to construct distributions of gene expression levels from RNA sequencing data, thereby simulating a new dataset from marginal distributions with Gaussian-copulas in order to retain the dependence between genes. It also allows for the simulation of multiple groups and batches with any required sample size and library size.
Full SPsimSeq Profile
srv
https://github.com/BoPeng/simuPOP-examples/tree/master/published/simuRareVariants
Simulator of Rare Variants (srv) simulates the introduction and evolution of (rare) genetic variants.
Description
srv simulates the introduction and evolution of genetic variants in one or more regions of chromosomes. These regions span roughly 10k to 100k base pairs and can each be considered a gene. During evolution, mutants are introduced to the population and change the fitness of individuals who carry these mutants. The most distinguishing feature of this script is that it allows multi-locus fitness schemes in which random or locus-specific diploid single-locus selection models are applied to newly arising mutants. A multi-locus selection model is used to assign a fitness value to individuals according to the mutants they carry.
Full srv Profile
SVEngine
https://bitbucket.org/charade/svengine/src/master/
Allele Specific and Haplotype Aware Structural Variants Simulator
Description
SVEngine is a multi-purpose and self-contained simulator for whole genome scale spike-in of thousands of SV events of various types in both single-sample and matched sample scenarios.
Full SVEngine Profile
SymSim
https://github.com/YosefLab/SymSim
SymSim simulates single cell RNA sequencing data thereby allowing users to tune the variation of gene-expression on different levels.
Description
SymSim models the processes that give rise to data observed in single cell RNA-Seq experiments. The components of the SymSim pipeline pertain to three sources of variation in single cell RNA-Seq data: noise intrinsic to the process of transcription, extrinsic variation indicative of different cell states, and technical variation due to low sensitivity and measurement noise and bias.
Full SymSim Profile
Synggen
https://bcglab.cibio.unitn.it/synggen
Fast and data-driven generation of synthetic heterogeneous NGS cancer data
Description
Synggen is a tool written in the C programming language to generate synthetic NGS files, in the form of whole-exome or targeted sequencing experiments, representing heterogeneous cancer genomes and matched controls. The tool provides two execution modes, which (i) exploit a set of control (non-cancer) NGS sequencing files (BAM format) to generate reference models capturing a collection of data summary statistics; and (ii) combine these reference models and a set of user-specified germline and somatic genomic profiles to create synthetic sequencing files (FASTQ format). Synggen accepts specific lists of germline variants and somatic genomic events, including phased germline SNPs and somatic allele-specific CNAs and SNVs, together with local and global parameters including the clonality of somatic events and the overall sample tumor content. This allows the emulation of varied and realistic cancer- and patient-specific data across different subclonal compositions, tumor purity, aneuploidy and tumor evolution scenarios.
Full Synggen Profile
TreesimJ
http://code.google.com/p/treesimj/
A flexible, forward-time population genetic simulator
Description
TreesimJ is a forward-time simulator of an evolving population that tracks the evolutionary tree of the entire population. The application offers an intuitive GUI, a variety of pre-configured models of fitness, mutation, and demography, and a suite of data collectors that analyze the population and emit data to one or more sources. To the user, TreesimJ offers a simple, easy-to-use interface, a variety of interchangeable 'models' describing many aspects of the evolving population, and many ways to quantify and summarize the state of the population. Since the entire tree of the population is tracked, TreesimJ can easily be used to assess the average time to most recent common ancestor, the level of tree imbalance, or the mean pairwise coalescent time. It can also compute a number of familiar population genetic statistics, such as the nucleotide diversity and the number of segregating sites (if a model of fitness that includes DNA is used). The list of potential data collecting items is long, and getting
Full TreesimJ Profile
Variant Simulation Tools
http://varianttools.sourceforge.net/Simulation/HomePage
A simulation tool for post-GWAS genetic epidemiological studies using whole-genome or whole-exome next-gen sequencing data, with an emphasis on user-friendliness and reproducibility.
Description
Variant Simulation Tools is a module of Variant Tools for the simulation of genetic variants for sequencing-based genetic epidemiological studies. Although multiple simulation engines are provided, the core of VST is a novel forward-time simulation engine that simulates real nucleotide sequences of the human genome using DNA mutation models, fine-scale recombination maps, and a selection model based on amino acid changes of translated protein sequences. The design of VST allows users to easily create and distribute simulation methods and simulated datasets for a variety of applications and encourages fair comparison between statistical methods through the use of existing or reproduced simulated datasets.
Full Variant Simulation Tools Profile
VarSim
https://github.com/bioinform/varsim
A high-fidelity simulation validation framework for high-throughput genome sequencing with cancer applications
Description
VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing.
Full VarSim Profile
VirusTreeSimulator
https://github.com/PangeaHIV/VirusTreeSimulator
Simulates virus trees within a given transmission tree
Description
Simulates virus trees within a given transmission tree
Full VirusTreeSimulator Profile
VISOR
https://github.com/davidebolo1993/VISOR
VISOR is a haplotype-aware structural variants simulator for short and long read sequencing
Description
VISOR is an efficient and versatile command-line application capable of simulating structural variants and small/single-nucleotide variants in a haplotype-resolved manner. VISOR currently supports simulations of bulk short (Illumina) and long (PacBio-ONT) reads sequencing data. VISOR also supports simulations of single cell, strand-seq data and includes a module, actively under development, capable of simulating 10X linked reads data. VISOR is readily applicable to cancer genomics, enabling the simulation of tumour purity (normal in tumour contamination), heterogeneity (mix of several subclones) and aneuploidy. VISOR also incorporates capture biases, a crucial feature for whole-exome data sets and panel sequencing applications.
Full VISOR Profile
Vortex
VORTEX is an individual-based simulation model for population viability analysis (PVA).
Description
VORTEX is an individual-based simulation model for population viability analysis (PVA). This program will help you understand the effects of deterministic forces as well as demographic, environmental, and genetic stochastic (or random) events on the dynamics of wildlife populations. VORTEX models population dynamics as discrete, sequential events (e.g., births, deaths, catastrophes, etc.) that occur according to defined probabilities. The probabilities of events are modeled as constants or as random variables that follow specified distributions. Since the growth or decline of a simulated population is strongly influenced by these random events, separate model iterations or “runs” using the exact same input parameters will produce different results. Consequently, the model is repeated many times to reveal the distribution of fates that the population might experience under a given set of input conditions.
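The idea of repeating a stochastic model many times to see the distribution of outcomes can be sketched in a few lines of Python. This is not VORTEX's model: the yearly death and birth probabilities, the one-offspring-per-survivor rule, and the function names are assumptions chosen only to show why identical inputs give different results across runs.

```python
import random

def run_once(n0, years, birth_p, death_p, rng):
    """One iteration: each year individuals may die, then survivors may breed."""
    n = n0
    for _ in range(years):
        # Each individual survives the year with probability 1 - death_p.
        survivors = sum(1 for _ in range(n) if rng.random() >= death_p)
        # Each survivor produces one offspring with probability birth_p.
        births = sum(1 for _ in range(survivors) if rng.random() < birth_p)
        n = survivors + births
        if n == 0:
            break  # population is extinct
    return n

def extinction_fraction(n_runs, seed=0, **params):
    """Repeat the model n_runs times; return (fraction extinct, final sizes)."""
    rng = random.Random(seed)
    finals = [run_once(rng=rng, **params) for _ in range(n_runs)]
    return sum(1 for n in finals if n == 0) / n_runs, finals

frac, finals = extinction_fraction(200, n0=10, years=50, birth_p=0.2, death_p=0.2)
print(f"extinct in {frac:.0%} of runs; final sizes range {min(finals)}..{max(finals)}")
```

Every run uses the same input parameters, yet the final population sizes differ, which is why the summary of interest is the distribution over runs rather than any single trajectory.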
Full Vortex Profile
Wessim
http://sak042.github.io/Wessim/
Whole Exome Sequencing SIMulator
Description
Wessim is a simulator for targeted resequencing, generally known as exome sequencing. Wessim generates a set of artificial DNA fragments for next generation sequencing (NGS) read simulation. In targeted resequencing, we constrain the genomic regions that are used to generate DNA fragments to be only a part of the entire genome; they are usually exons and/or a few introns and untranslated regions (UTRs).
Full Wessim Profile
WgSim
a small tool for simulating sequence reads from a reference genome.
Description
Wgsim is a small tool for simulating sequence reads from a reference genome. It is able to simulate diploid genomes with SNPs and insertion/deletion (INDEL) polymorphisms, and simulate reads with uniform substitution sequencing errors. It does not generate INDEL sequencing errors, but this can be partly compensated by simulating INDEL polymorphisms. Wgsim outputs the simulated polymorphisms, and writes the true read coordinates as well as the number of polymorphisms and sequencing errors in read names. One can evaluate the accuracy of a mapper or a SNP caller with wgsim_eval.pl that comes with the package.
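To make "uniform substitution sequencing errors" concrete, here is a minimal Python sketch of the idea. It is not Wgsim's implementation: the function name, the single-end reads, and the read-name layout are assumptions for illustration, and the name format is not Wgsim's actual one. The sketch only shows each base being replaced with the same probability and the true coordinates and error count being recorded in the read name.

```python
import random

BASES = "ACGT"

def simulate_read(reference, read_len, error_rate, rng):
    """Sample one read and apply uniform substitution errors.

    Returns (name, sequence). The name stores the 1-based start, the end,
    and the number of errors (illustrative layout, not Wgsim's format).
    """
    start = rng.randrange(len(reference) - read_len + 1)
    true_seq = reference[start:start + read_len]
    read = []
    n_errors = 0
    for base in true_seq:
        if rng.random() < error_rate:
            # Substitute with one of the three other bases, chosen uniformly.
            base = rng.choice([b for b in BASES if b != base])
            n_errors += 1
        read.append(base)
    name = f"ref_{start + 1}_{start + read_len}_{n_errors}"
    return name, "".join(read)

rng = random.Random(42)
reference = "ACGTACGTTGCAACGTAGCTAGCTAGGCTA"
for _ in range(3):
    name, seq = simulate_read(reference, read_len=10, error_rate=0.1, rng=rng)
    print(f">{name}\n{seq}")
```

Because the true position is kept in the name, a mapper's reported coordinates can be checked against it, which is the kind of evaluation the description attributes to wgsim_eval.pl.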
Full WgSim Profile
XomeBlender
https://github.com/rsemeraro/XomeBlender
Generates synthetic cancer genomes with different contamination levels and intra-tumor heterogeneity, devoid of any synthetic element
Description
Xome-Blender is a collection of Python and R scripts based on SAMtools functions that generates synthetic cancer genomes with user-defined features such as the number of subclones, the number of somatic variants and the presence of CNV, without the addition of any synthetic element. It is composed of two modules: InXalizer and Xome-Blender. The first module is devoted to initializing the blending process. It takes as input a single BAM file and a set of user-defined parameters, and returns the coverage of the sample and the input files for the second module (Xome-Blender). Optionally, it creates a file containing the coordinates at which to insert CNV in the final product. The second module generates the synthetic heterogeneous sample.
Full XomeBlender Profile
XS
http://bioinformatics.ua.pt/software/xs/
a FASTQ read simulator
Description
XS is a skilled FASTQ read simulation tool, flexible, portable (does not need a reference sequence) and tunable in terms of sequence complexity. XS handles Ion Torrent, Roche-454, Illumina and ABI-SOLiD simulation sequencing types. It has several running modes, depending on the time and memory available, and is aimed at testing computing infrastructures, namely cloud computing of large-scale projects, and testing FASTQ compression algorithms. Moreover, XS offers the possibility of simulating the three main FASTQ components individually (headers, DNA sequences and quality-scores).
Full XS Profile
Zombi
https://github.com/AADavin/Zombi
Zombi generates species trees, gene trees and sequences.
Description
Zombi is a flexible platform for genome evolution that can be of great interest to those who want to test different evolutionary hypotheses under simulations and need a fast and easy-to-use tool to generate species trees, gene trees or sequences. Zombi's output is especially simple and easy to read, understand and parse.
Full Zombi Profile
Step 1: Select attributes to compare
-
Target
-
Type of Simulated Data
- Genotype at Genetic Markers
- Diploid DNA Sequence
- Haploid DNA Sequence
- RNA
- Gene Expression
- Sex Chromosomes
- Mitochondrial DNA
- Protein Sequence
- Sequencing Reads
- Phenotype
- Single-Cell Sequencing
- Bulk Sequencing
- Proteomics
- Chromatin Conformation
-
Variations
- Biallelic Marker
- Multiallelic Marker
- Single Nucleotide Variation
- Amino acid variation
- Microsatellite
- Insertion and Deletion
- CNV
- Inversion and Rearrangement
- Alternative Splicing
- Missing Genotypes
- Genotype or Sequencing Error
- Ionization
- Other
-
Simulation Method
- Standard Coalescent
- Exact Coalescent
- Machine Learning
- Forward-time
- Resample Existing Data
- Phylogenetic
- Gene dropping
- Neural network
- Other
-
Input
-
Data Type
- Allele Frequencies
- Empirical
- Ancestral Sequence
- Saved simulation
- Reference genome
- Other
-
File format
- Arlequin
- CREATE
- Fstat
- GDA
- Genepop
- MIGRATE
- MS
- SAM or BAM
- NEXUS
- Phylip
- STRUCTURE
- XML
- Tree Sequence
- Program Specific
- Other
-
Output
-
Data Type
- Genotype or Sequence
- Phenotypic Trait
- Individual Relationship
- Phylogenetic Tree
- Demographic
- Mutation
- Methylation
- Gene Expression
- Protein Expression
- Linkage Disequilibrium
- Diversity Measures
- Fitness
-
Sequencing Reads
- Illumina
- Roche 454
- SOLiD
- IonTorrent
- PacBio
- Nanopore
- Other
- Other
-
File Format
- Arlequin
- Fasta or Fastq
- Fstat
- Genepop
- Linkage
- MIGRATE
- MS
- PED
- Phylip
- NEXUS
- STRUCTURE
- VCF
- SAM or BAM
- Tree Sequence
- Program Specific
- Other
-
Sample Type
- Random or Independent
- Sibpairs, Trios and Nuclear Families
- Extended or Complete Pedigrees
- Case-control
- Longitudinal
- Other
-
Phenotype
-
Trait Type
- Binary or Qualitative
- Quantitative
- Multiple
-
Determinants
- Single Genetic Marker
- Multiple Genetic Markers
- Sex-linked
- Gene-Gene Interaction
- Environmental Factors
- Gene-Environment Interaction
-
Evolutionary Features
-
Demographic
-
Population Size Changes
- Constant Size
- Exponential Growth or Decline
- Logistic Growth
- Bottleneck
- Carrying Capacity
- User Defined
-
Gene Flow
- Stepping Stone Models
- Island Models
- Continent-Island Models
- Sex or Age-Specific Migration Rates
- Influenced by Environmental Factors
- Admixed Population
- User-defined Matrix
- Other
-
Spatiality
- Discrete Models
- Continuous Models
- Landscape Factors
-
Life Cycle
- Discrete Generation Model
- Age structured
- Overlapping Generation
- User-Defined transition matrices
-
Mating System
- Random Mating
- Monogamous
- Polygamous
- Haplodiploid
- Selfing
- Age- or Stage-Specific
- Assortative or Disassortative
- Other
-
Fecundity
- Constant Number
- Randomly Distributed
- Individually Determined
- Influenced by Environment
- Other
-
Natural Selection
-
Determinant
- Single-locus
- Multi-locus
- Codon-based
- Fitness of Offspring
- Phenotypic Trait
- Environmental Factors
-
Models
- Directional Selection
- Balancing Selection
- Multi-locus models
- Epistasis
- Random Fitness Effects
- Disruptive
- Phenotype Threshold
- Frequency-Dependent
- Other
-
Recombination
- Uniform
- Varying Recombination Rates
- Gene Conversion Allowed
-
Mutation Models
- Two-allele Mutation Model
- Markov DNA Evolution Models
- k-Allele Model
- Infinite-allele Model
- Infinite-sites Model
- Stepwise Mutation Model
- Codon and Amino Acid Models
- Indels and Others
- Heterogeneity among Sites
- Others
-
Events Allowed
- Population Merge and Split
- Varying Demographic Features
- Population Events
- Varying Genetic Features
- Change of Mating Systems
- Other
-
Other
- Phenogenetic
- Polygenic background
-
Interface
- Command-line
- Graphical User Interface
- Integrated Development Environment
- Script-based
- Web-based
-
Development
-
Tested Platforms
- Windows
- Mac OS X
- Linux and Unix
- Solaris
- Others
-
Language
- C or C++
- Java
- R
- Python
- Perl
- Visual Basic
- Other
-
License
- GNU Public License
- BSD
- Creative Commons
- MIT
- Other
-
GSR Certification
- Accessibility
- Documentation
- Application
- Support
Step 2: Select matching simulators
(ordered by match quality)
- Compare Simulators by Attribute
- Browse and Search Simulators
-
How to Use this Tool
Steps
- Select your desired simulator attributes in the "Select attributes to compare" pane in one of two ways:
  - Navigate the attribute tree
  - Use the text box and its typeahead features to populate the attribute tree
- Observe the simulators ranked by their match quality in the "Matching simulators" pane to the right
- Select at least one and at most six simulators by checking the checkbox to the right of each simulator, then click the Compare button to view the comparison table
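The steps above amount to ranking simulators by how well they match the chosen attributes and then comparing a small selection. The Python sketch below illustrates that workflow under stated assumptions: the page does not say how match quality is computed, so counting shared attributes is a guess, and the catalogue entries `SimA`, `SimB`, and `SimC` are hypothetical placeholders rather than real simulator profiles.

```python
def rank_by_match(simulators, wanted):
    """Rank simulators by the number of wanted attributes they have.

    Assumed scoring: one point per matching attribute; ties sort by name.
    Returns a list of (name, score) pairs, best match first.
    """
    wanted = set(wanted)
    scored = [(len(wanted & set(attrs)), name) for name, attrs in simulators.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored]

def validate_selection(selected, minimum=1, maximum=6):
    """Enforce the 'at least one and at most six' rule before comparing."""
    if not minimum <= len(selected) <= maximum:
        raise ValueError(f"select between {minimum} and {maximum} simulators")
    return list(selected)

# Hypothetical catalogue; the attribute names are taken from the tree above.
catalogue = {
    "SimA": {"Forward-time", "Command-line", "VCF"},
    "SimB": {"Standard Coalescent", "Command-line"},
    "SimC": {"Forward-time", "VCF", "Illumina"},
}

ranked = rank_by_match(catalogue, ["Forward-time", "VCF"])
to_compare = validate_selection([name for name, score in ranked if score > 0])
print(ranked)
print(to_compare)
```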
Important
This tool is best viewed in one of the following browsers: