Simulations of Genome Evolution
Roman R. Stocsits
Interdisciplinary Centre for Bioinformatics
University of Leipzig
Pawel Mackiewicz, Stanislaw Cebrat
Institute of Genetics and Microbiology
University of Wroclaw
University of Zielona Gora
Genomes comprise not only protein-coding genes but also large numbers of genes encoding structurally functional RNA transcripts, as well as various types of regulatory elements at the DNA level. Expression patterns and complex regulation give rise to phenotypes, which are built up from proteins, various RNAs, complexes of both, and other factors, through cells and tissues to the individual organism.
Under selective pressure, genomes evolve through mutation and recombination of variants that evolution has already produced, so that the fitness of the phenotype becomes better adapted to the environment.
Regulation depends on rapidly changing interactions among proteins, RNA-protein complexes, various RNAs, and DNA-protein complexes, up to sub-organellar structures such as the cytoskeleton, and on their interplay with enhancers, promoters, and transcription factor binding sites that carry various types of sequence motifs. The whole system consists of encoded data and functional structures that cooperate through highly variable cross-reactions, (antisense) inhibition, co-activation mechanisms, reaction networks, and many further, possibly still largely unknown, dependencies within the three dimensions of the cell.
The regulation of the template's expression is at least as decisive as the content of the template itself. However, many of the principles and mechanisms are poorly understood, and little is known about the effective dependencies among the various regulation schemes, either qualitatively or quantitatively. This is one of the most important intrinsic problems of genome simulations.
Furthermore, processing such large amounts of data is expensive in both computation time and memory.
In nature, selective pressure acts on function. In the simplest case, model genomes are defined as arrays of protein-coding genes, and selective pressure acts on predefined fitness parameters. Randomly placed mutations (about one per generation cycle) influence the fitness of the encoded phenotype, which is defined as a set of (bio)chemical features (isoelectric points, amino acid sequence motifs, secondary structure motifs, etc.). Artificial selective pressure acts on this phenotype, and repeated cycles of mutation and selection (with a fitness check in every cycle) produce simulated evolution.
Under selective pressure, some mutations are lethal. A genome carrying a lethal mutation dies and is replaced by another genome chosen at random from the pool.
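The mutation/selection cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: genomes are plain DNA strings, each genome receives exactly one point substitution per cycle, and the viability test `is_viable` (a GC-content window) is a purely hypothetical stand-in for the predefined fitness parameters. Dead genomes are replaced by a randomly chosen survivor, which is one possible reading of "another genome in the pool".

```python
import random

ALPHABET = "ACGT"


def mutate(genome, rng):
    """Return a copy of the genome with one random point substitution."""
    pos = rng.randrange(len(genome))
    new_base = rng.choice([b for b in ALPHABET if b != genome[pos]])
    return genome[:pos] + new_base + genome[pos + 1:]


def is_viable(genome):
    """Hypothetical YES/NO fitness test: GC content must stay in a window."""
    gc = sum(base in "GC" for base in genome) / len(genome)
    return 0.3 <= gc <= 0.7


def evolve(pool, generations, rng):
    """Run mutation/selection cycles on a pool of genomes of constant size."""
    pool = list(pool)
    for _ in range(generations):
        # Mutation step: one randomly placed mutation per genome and cycle.
        pool = [mutate(genome, rng) for genome in pool]
        # Selection step: genomes failing the fitness test are lethal hits.
        survivors = [genome for genome in pool if is_viable(genome)]
        if not survivors:
            raise RuntimeError("the whole population went extinct")
        # Each dead genome is replaced by a randomly chosen survivor.
        pool = [g if is_viable(g) else rng.choice(survivors) for g in pool]
    return pool


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    final_pool = evolve(["ACGT" * 10] * 20, generations=100, rng=rng)
    print(len(final_pool), "genomes alive after 100 cycles")
```

Passing the random number generator explicitly keeps runs reproducible, which matters when the behaviour of long simulations is to be compared.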
The shortcomings of this simple approach are obvious: a good genome simulation needs a model that is extended to functional RNA genes, various regulatory sites, and much more.
The decision whether a genome survives selection with respect to a specific phenotypic marker may be a simple YES or NO. However, there may also be intermediate states, and fitness may even decrease continuously from a perfect fit to lethality. A disadvantageous mutation is therefore not necessarily lethal, either in vivo or in silico.
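One way to express such intermediate states is to map the deviation of a phenotypic marker from its optimum to a fitness value between 0 and 1 and to treat that value as a survival probability. The sketch below assumes a linear decline; the function names, the linear form, and the `lethal_at` parameter are illustrative assumptions, not part of the existing models.

```python
import random


def fitness(deviation, lethal_at=1.0):
    """Map a non-negative deviation from the optimum to a fitness in [0, 1].

    Fitness is 1.0 for a perfect fit and falls linearly to 0.0 (lethal)
    once the deviation reaches `lethal_at`.
    """
    if deviation < 0 or lethal_at <= 0:
        raise ValueError("deviation must be >= 0 and lethal_at must be > 0")
    return max(0.0, 1.0 - deviation / lethal_at)


def survives(deviation, rng, lethal_at=1.0):
    """Stochastic selection: survive with probability equal to the fitness."""
    return rng.random() < fitness(deviation, lethal_at)
```

With this rule a perfectly fitting genome always survives, a genome at or beyond `lethal_at` never does, and a mildly disadvantageous mutation only lowers the chance of survival. The strict YES/NO selection is the special case in which fitness is either 1 or 0.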
In our approach to more complex (more realistic) simulations of genome evolution, the first step is to include RNA in the system: the set of protein genes is extended by some tRNA genes. For these genes, artificial selective pressure acts only on secondary-structure features of the transcript, because biological function is mediated by the RNA structure.
Some of the required algorithms already exist, while others are under construction. The Vienna RNA package already provides essential routines and can easily be adapted to further needs; it includes folding software for single RNA molecules as well as for consensus folding of alignments. The iterative mutation/selection algorithms (for generation cycles) and the protein selection models are also available. All necessary parallelization software has already been implemented in another context.
We start from biologically relevant tRNA sequences.
We allow certain variation from our default:
The RNA selection models (initially for tRNA only) are under construction. We plan to derive biologically relevant consensus structures for distinct tRNA genes. These yield more or less stringent structural constraints that a gene must still fulfil after mutation for the complete genome to survive. Adapting the consensus manually (and, of course, arbitrarily) to our needs makes it possible to simulate even highly diverse fitness rules. The first implementations for tRNA feature only a YES/NO selection. To extend the existing algorithms to biologically more relevant genome simulations, it will be necessary to introduce intermediate states: for instance, if one constraint is not fulfilled, survival is still possible provided that another constraint is fulfilled especially well.
Furthermore, routines that capture, store, and evaluate all important data about the behaviour of the system in long-term simulations are still missing.
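The difference between a YES/NO selection and a rule with compensation can be illustrated on secondary structures in dot-bracket notation. The sketch below is only an assumption about how such a check might look: the structure of the mutated gene is taken as already predicted (for example by a folding routine, which is not called here), each constraint is a hypothetical set of base pairs required by the consensus, and compensation is modelled simply as the mean score over all constraints.

```python
def base_pairs(structure):
    """Return the set of (i, j) base pairs of a dot-bracket structure."""
    stack, pairs = [], set()
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def constraint_score(structure, required_pairs):
    """Fraction of the required base pairs that the structure contains."""
    found = base_pairs(structure)
    return len(required_pairs & found) / len(required_pairs)


def survives_strict(structure, constraints):
    """YES/NO selection: every constraint must be fulfilled completely."""
    return all(constraint_score(structure, req) == 1.0
               for req in constraints.values())


def survives_graded(structure, constraints, threshold):
    """Graded selection: a well-fulfilled constraint can offset a weak one."""
    scores = [constraint_score(structure, req) for req in constraints.values()]
    return sum(scores) / len(scores) >= threshold


# Toy example with two stems (not a real tRNA consensus).
constraints = {
    "stem_1": {(0, 5), (1, 4)},
    "stem_2": {(6, 11), (7, 10)},
}
wild_type = "((..))((..))"
mutant = "(....)((..))"  # one base pair of stem_1 is lost

print(survives_strict(wild_type, constraints))     # True
print(survives_strict(mutant, constraints))        # False
print(survives_graded(mutant, constraints, 0.75))  # True
```

In the toy example the mutant fails the strict test because one pair of the first stem is lost, but it passes the graded test because the second stem is fully intact.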
We expect to gain insight into the behaviour of various genomic combinations of genes for proteins, tRNA, rRNA, and ncRNA, regulatory DNA elements, and junk DNA, and we hope to formulate answers step by step. What are the effects of junk DNA on selection? Is it advantageous as a 'mutation absorber', or is it a disadvantage for fitness (e.g. if a short generation time is important)? How can we simulate improvements that follow newly invented features? Can we see recurrences in regulation schemes, sequence motifs, expression regulation networks, rearrangement events, and more? Is it possible to apply these reproducible events to real (mitochondrial) genomes in order to deduce correlations between genome structure and evolutionary success? A further important step is to extend the long-term simulations to studies of mitochondrial genome rearrangement.
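The 'mutation absorber' question can be made concrete under the simple model's assumption of randomly (here: uniformly) placed mutations. The chance that a mutation lands in functional sequence is then just the functional fraction of the genome. The function name and the example lengths below are illustrative assumptions only.

```python
def functional_hit_probability(functional_len, junk_len):
    """Probability that one uniformly placed mutation hits functional DNA."""
    if functional_len < 0 or junk_len < 0 or functional_len + junk_len == 0:
        raise ValueError("lengths must be non-negative and not both zero")
    return functional_len / (functional_len + junk_len)


# Without junk DNA every mutation hits functional sequence ...
print(functional_hit_probability(1000, 0))     # 1.0
# ... while three times as much junk as functional DNA absorbs 75 % of hits.
print(functional_hit_probability(1000, 3000))  # 0.25
```

The sketch covers only the absorbing effect. The possible cost of a longer genome, such as a longer generation time, would have to enter the fitness rule separately.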