RNA Morphology Using Constraint-Directed Folding
Interdisciplinary Centre for Bioinformatics
University of Leipzig
Peter F. Stadler
Department of Computer Science
University of Leipzig
Functional non-coding RNA molecules can serve as a major source of data in molecular phylogenetics.
However, the peculiarities of RNA evolution, in particular the long-term conservation of secondary structures, have been taken into account only recently. Examples of functional non-coding RNAs include tRNAs, rRNAs, miRNAs, viral genomes, and transport signals. They perform vital functions in the cell, and a vast amount of sequence data already exists in databases, although it has not yet been exploited exhaustively. Correct structural annotation at large scale remains a serious problem.
Because the functions of non-coding RNAs are mediated by their structures, selective pressure acts on these structures. But structural conservation does not necessarily require sequence conservation: many slightly different sequences can fold into the same (functional) structure. Conversely, even small changes in the nucleic acid sequence may cause large changes in secondary structure.
Secondary structure elements that are consistently present in a group of slightly variable sequences are therefore mostly the result of stabilizing selection rather than a consequence of a high degree of sequence conservation.
Models: If selection acts to preserve a structural element in spite of sequence variation, then that element must carry some function. Furthermore, nucleotides in stem regions evolve in strong correlation with their pairing counterparts: under the principle of compensatory mutations, selective pressure acts to re-establish the functional structure after a mutation at a pairing position.
This selective pressure to re-establish highly conserved secondary structures of RNA molecules introduces strong correlations between the two strands of a helix. Selection for a stable RNA structure contradicts the assumption that each sequence position evolves independently, and it leads to overestimated reliability of sequence-based trees and to artefacts in phylogeny reconstruction. It has been shown that phylogeny reconstruction methods can be extended to partially structure-based approaches that exploit the special features of RNA secondary structures, in particular the slower evolution of structural elements compared to the underlying sequence, provided that a good model of the RNA secondary structure is available.
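The correlation between the two strands of a helix can be made concrete with a small check on two alignment columns. The following is a minimal Python sketch, not part of the workflow described here: the function names and the toy alignment are invented for illustration, and "compensatory" is simplified to mean that every sequence can form a canonical or G-U pair at the two positions while both columns vary.

```python
# Canonical and wobble pairs that can form in an RNA helix.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def column_pair_summary(alignment, i, j):
    """Summarise how alignment columns i and j (0-based) co-vary.

    Returns (pair_types, all_pair): the set of distinct base pairs observed
    and whether every sequence can form a pair at these two positions.
    """
    observed = [(seq[i].upper(), seq[j].upper()) for seq in alignment]
    all_pair = all(p in PAIRS for p in observed)
    pair_types = {p for p in observed if p in PAIRS}
    return pair_types, all_pair


def is_compensatory(alignment, i, j):
    """True if all sequences pair at (i, j) and both columns vary."""
    pair_types, all_pair = column_pair_summary(alignment, i, j)
    left = {p[0] for p in pair_types}
    right = {p[1] for p in pair_types}
    return all_pair and len(left) > 1 and len(right) > 1


# Toy alignment (invented for illustration): columns 0 and 8 close a helix.
toy = [
    "GCAUUUUGC",
    "ACAUUUUGU",
    "CCAUUUUGG",
]
print(is_compensatory(toy, 0, 8))  # outer pair: G-C, A-U, C-G
print(is_compensatory(toy, 1, 7))  # inner pair: C-G in every sequence
```

In the toy alignment the outer pair changes on both strands while staying pairable, which is the pattern a site-independent substitution model cannot account for; the inner pair is simply conserved.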
Our workflow aims at automated large-scale structure annotation of sequence databases, so that the evolution of secondary structures can be used when reconstructing phylogenies.
Thermodynamic folding algorithms are based on an energy model that considers additive contributions from stacked base pairs and various types of loops. But the calculated minimum free energy structures are not necessarily the biologically active ones. First results of our pilot studies show that folding can be directed towards a biologically relevant solution when using common structural features in a set of related RNA molecules as a constraint for the folding of each individual sequence. A consensus structure can support the folding process to direct it towards a biologically 'stabilized' prediction.
The workflow starts from a standard sequence alignment. RNAalifold extracts a consensus structure from this alignment, that is, a set of basic structural features that are represented in all sequences. The consensus structure is then adapted to each individual sequence and finally serves as a constraint for the folding algorithm of RNAfold, which yields a best estimate of the secondary structure of each individual sequence.
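The adaptation step can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the workflow: it assumes the consensus is given in dot-bracket notation over the alignment columns, drops every consensus pair that hits a gap or a non-pairing combination in the individual sequence, and then removes the gap columns. The function names are hypothetical, and the sketch does not check the minimum hairpin loop length.

```python
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_table(structure):
    """Map each '(' position to its matching ')' in a dot-bracket string."""
    stack, pairs = [], {}
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs[stack.pop()] = pos
    return pairs


def adapt_consensus(aligned_seq, consensus):
    """Project an alignment-wide consensus structure onto one sequence.

    Pairs that touch a gap or cannot pair in this sequence are dropped;
    gap columns are then removed so the result matches the ungapped sequence.
    """
    if len(aligned_seq) != len(consensus):
        raise ValueError("sequence and structure must have equal length")
    seq = aligned_seq.upper().replace("T", "U")
    keep = ["."] * len(seq)
    for i, j in pair_table(consensus).items():
        if seq[i] + seq[j] in PAIRS:
            keep[i], keep[j] = "(", ")"
    ungapped = [(b, s) for b, s in zip(seq, keep) if b != "-"]
    return "".join(b for b, _ in ungapped), "".join(s for _, s in ungapped)


# Toy input (invented): one aligned sequence with a gap and a consensus hairpin.
sequence, constraint = adapt_consensus("GGC-AAAAGCU", "(((.....)))")
print(sequence)
print(constraint)
```

The resulting string uses the dot-bracket notation that RNAfold accepts as a structure constraint when called with its `-C` option, so the sequence and constraint could be passed on to the folding step.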
Results: Clearly improved folding. The following results illustrate the capabilities of automated constraint folding applied to 38 mitochondrial trp-tRNAs. Processing every possible pairwise alignment gave 703 pairwise consensus structures, only some of which are usable. Most nonsensical consensus structures appear only once, whereas the best-fitting (= correct) consensus prediction appeared in 94 cases. Thus, a consensus/constraint for mitochondrial trp-tRNA was detected unambiguously.
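The count of 703 follows directly from the number of unordered pairs among 38 sequences, 38 · 37 / 2. The selection of the consensus can be sketched as a simple tally, assuming that the most frequent pairwise consensus is the one chosen; the source does not spell out the selection rule. In this Python sketch, `pairwise_consensus` is a hypothetical caller-supplied function standing in for the alignment and RNAalifold step, and the toy data are invented.

```python
from collections import Counter
from itertools import combinations
from math import comb


def most_frequent_consensus(sequences, pairwise_consensus):
    """Tally the consensus predicted for every unordered pair of sequences.

    `pairwise_consensus` is a caller-supplied function (hypothetical here)
    that returns a consensus structure string for two sequences.
    """
    votes = Counter(pairwise_consensus(a, b) for a, b in combinations(sequences, 2))
    structure, count = votes.most_common(1)[0]
    return structure, count, votes


# 38 sequences give 38 * 37 / 2 unordered pairs, matching the 703 in the text.
print(comb(38, 2))


# Toy stand-in (invented): pairs of "G..." sequences agree on one structure,
# every other pair yields its own one-off result.
def toy_consensus(a, b):
    return "(((...)))" if a[0] == b[0] == "G" else f"{a}|{b}"


best, count, votes = most_frequent_consensus(["GA", "GC", "GG", "AU"], toy_consensus)
print(best, count, len(votes))
```

A consensus that recurs across many independent pairs stands out against predictions that occur only once, which mirrors the 94 occurrences reported above.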
All individual molecules can then be folded with the support of the automatically derived constraint. The figure shows two examples of successfully corrected foldings:
Outlook: In conclusion, the evolution of secondary structure over long timescales provides a useful source of phylogenetic information.
Large-scale RNA databases with good structure annotation appear feasible, since a vast amount of sequence data already exists in databases.
RNA structure alignments are likely to be a more robust and reliable source of character information than the underlying sequences.
The combination of sequence and structure alignments with precise nucleic acid substitution rates that depend on conserved structures will help to describe RNA evolution over long timescales.