RNA-Based Gene Phylogeny
Roman R. Stocsits
Interdisciplinary Centre for Bioinformatics
University of Leipzig
Sonja Prohaska, Claudia Fried, Peter F. Stadler
Department of Computer Science
University of Leipzig
Institute for Theoretical Chemistry
University of Vienna
Functional non-coding RNA molecules are a major source of data in molecular phylogenetics. Only recently have the peculiarities of RNA evolution, in particular the long-term conservation of secondary structures, been taken into account in this context.
Functional RNAs that do not code for proteins include tRNA, rRNA, non-coding regulatory RNAs (miRNAs and various other known types), viral genomes, and transport signals. The functions of these molecules are mediated by their secondary structures. Selective pressure therefore acts on these structures, and features of the primary structure (sequence motifs) can sometimes be exchanged without consequence.
Because structural conservation does not require sequence conservation, and many slightly different sequences can fold into the same structure, highly variable sequences may obscure the phylogenetic signal and lead to artefacts in phylogeny reconstruction.
Figure 1: One set of predicted secondary structure elements of the viral genome of human Hepatitis B virus. In principle, various slightly different sequences can fold into the same (functional) structure, and compensatory mutations often occur, in which a second mutation re-establishes a base pair that was lost through an earlier one. On the other hand, even small changes in the nucleic acid sequence may cause large changes in secondary structure if no selective pressure acts to conserve the structure. (Ref. 3)
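The idea that different sequences can share one structure, and that a compensatory mutation restores a lost base pair, can be illustrated with a minimal Python sketch. It assumes structures are written in dot-bracket notation and that only Watson-Crick and G-U wobble pairs count as valid; the function names and the toy sequences are hypothetical and are not taken from the Hepatitis B data shown in the figure.

```python
# Illustrative sketch only: function names and sequences are hypothetical.
# Assumption: canonical Watson-Crick pairs plus the G-U wobble pair are allowed.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Return the sorted list of paired positions (i, j) in a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def is_compatible(seq, structure):
    """True if every base pair required by the structure can form in seq."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    return all((seq[i], seq[j]) in PAIRS for i, j in parse_pairs(structure))


def compensatory_pairs(seq_a, seq_b, structure):
    """Pairs where both positions differ between the sequences, yet both still pair."""
    found = []
    for i, j in parse_pairs(structure):
        both_changed = seq_a[i] != seq_b[i] and seq_a[j] != seq_b[j]
        if both_changed and (seq_a[i], seq_a[j]) in PAIRS and (seq_b[i], seq_b[j]) in PAIRS:
            found.append((i, j))
    return found


structure = "((...))"
ancestor = "GCAAAGC"     # folds into the hairpin: G-C and C-G pairs
compensated = "GGAAACC"  # both partners of the inner pair changed, pair restored
broken = "GAAAAGC"       # a single mutation destroys the inner pair

print(is_compatible(ancestor, structure), is_compatible(compensated, structure))
print(is_compatible(broken, structure))
print(compensatory_pairs(ancestor, compensated, structure))
```

The two compatible sequences differ at two of seven positions, so a purely sequence-based comparison would count two independent changes where the structure records a single correlated event.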
To overcome this problem, the pool of information used for phylogeny reconstruction can be extended to include RNA secondary structure.
Because of their high phylogenetic information content, ribosomal RNAs in particular should be studied with methods that account for functionally conserved secondary structure elements. Other non-coding RNA classes, such as tRNA and miRNA, also need to be re-evaluated by explicitly taking conserved functional secondary structure elements into account. The development of algorithms, and of a convenient software implementation, for deriving reliable secondary structure models of novel and existing RNA sequences has already been an aim of our investigation. (Ref. 1)
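One commonly used signal when deriving conserved structure models from aligned sequences is covariation between alignment columns, often measured as mutual information. The sketch below is a generic illustration of that measure, not a description of the software mentioned above; the function name and the toy alignment are hypothetical, and gaps are ignored for simplicity.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    Assumption: alignment is a list of equal-length, gap-free strings.
    """
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    joint = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


# Toy alignment: the first and last columns always form a complementary pair,
# although the nucleotides themselves vary; the middle columns are invariant.
alignment = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]

print(mutual_information(alignment, 0, 4))  # covarying columns
print(mutual_information(alignment, 1, 2))  # fully conserved columns
```

Columns that vary together score highly, while fully conserved columns score zero, which is why highly variable but structurally constrained positions can still carry usable signal.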
We are now working on methods for including the evolution of secondary structures when reconstructing phylogenies. Our goal is a systematic investigation of the relationships between sequence and structure evolution.
Furthermore, we are currently working on a pilot study that aims to provide secondary structure annotations for the tRNAs contained in the OGRe database of Paul G. Higgs and collaborators. We plan to automate database entry, with structural annotation of new (functional) RNA sequences, and to assign each sequence/structure automatically to the appropriate location in the phylogenetic tree.
With regard to gene phylogeny, gene and genome duplications have played a major role in the origin of new clades. It is difficult to determine whether homologous genes arose through genome duplications or through sequential local duplications. Determining when gene duplications occurred after lineage divergence is also a subject of our research. (Ref. 2)
Duplicate genes need not have independent origins, as regional duplications are known to be common. The subsequent loss of duplicates in the various lineages through deleterious substitutions must also be taken into account. In this context, we develop algorithms to reconstruct the history of gene duplications and gene losses from a given species tree and a corresponding gene tree obtained by phylogenetic analysis.
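To make the reconstruction task concrete, the following Python sketch counts duplications and losses with the standard textbook approach of mapping each gene-tree node to the last common ancestor (LCA) of its species in the species tree. It is an illustration under simplifying assumptions, not the algorithm developed in this project: trees are rooted, binary, nested tuples; every gene-tree leaf is labelled with the species it was sampled from; and all function names are hypothetical.

```python
def leaves(tree):
    """Set of leaf labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clade_depths(tree, depth=0, out=None):
    """Map every clade of the species tree to its depth (root = 0)."""
    out = {} if out is None else out
    out[leaves(tree)] = depth
    if not isinstance(tree, str):
        for child in tree:
            clade_depths(child, depth + 1, out)
    return out


def lca(species, depths):
    """Smallest (deepest) species-tree clade that contains all given species."""
    return max((clade for clade in depths if species <= clade), key=depths.get)


def reconcile(gene_tree, species_tree):
    """Count duplications and losses implied by the LCA mapping."""
    depths = clade_depths(species_tree)
    counts = {"duplications": 0, "losses": 0}

    def visit(node):
        mapped = lca(leaves(node), depths)
        if isinstance(node, str):
            return mapped
        child_maps = [visit(child) for child in node]
        # A node is a duplication if it maps to the same clade as one of its children.
        is_dup = mapped in child_maps
        if is_dup:
            counts["duplications"] += 1
        for child_map in child_maps:
            gap = depths[child_map] - depths[mapped]
            # Species-tree branches skipped along this edge are inferred losses.
            counts["losses"] += gap if is_dup else gap - 1
        return mapped

    visit(gene_tree)
    return counts


species_tree = (("A", "B"), "C")
congruent = (("A", "B"), "C")           # gene tree agrees with the species tree
conflicting = (("A", "C"), ("B", "C"))  # explained by one duplication, two losses

print(reconcile(congruent, species_tree))
print(reconcile(conflicting, species_tree))
```

In the conflicting example, the duplication is placed at the root of the species tree, after which one copy is lost in species B and the other in species A.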
The importance of considering the evolution of non-coding RNAs, especially miRNAs and snoRNAs, has become very evident during the last few years. It has been shown that large regions of the human genome, the so-called non-coding RNA genes, code not for proteins but for short RNA molecules that are of major importance for the regulation of various cellular processes.
For a complete understanding of these mechanisms, it will be necessary to reconstruct the origin of the underlying mechanistic principles by extensive phylogenetic analyses. For the microRNA 17 clusters, we showed how the genomic organization changed during evolution to give the arrangement found in recent species. (Ref. 4)
Figure 2: The evolutionary history of the non-coding miRNA 17 cluster. Starting from a cluster of three sequence motifs (bottom left), duplications and deletions of single genes, as well as genome duplications, have led to the genomic organization in recent species.