The search for functional gene expression signatures – some efficient computer programs

Jürgen Läuter
Interdisciplinary Centre for Bioinformatics, University of Leipzig

Friedemann Horn
Institute for Clinical Immunology and Transfusion Medicine, University Hospital Leipzig

Cooperation:
Maciej Rosolowski
Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig
Martin Beck
Interdisciplinary Centre for Bioinformatics, University of Leipzig

The statistical analysis of single-gene expression data is not sufficient to understand the biological function of genes. Genes must also be characterized by their mutual relationships. Gene sets should therefore be found that indicate intra-individual or between-group tissue differences. Such gene sets allow more stable conclusions than single genes do.
Figure 1. Representation of 9 gene sets, comprising 33 genes in total, that were found by the search algorithm for gene expression signatures. These genes correlate significantly and positively with 12 target genes of the tumour suppressor p53 that were predetermined for the computation. For each set, a circle was drawn which contains the genes of this set. The membership of a gene in the set of a centre gene is represented by a straight line.

Our procedure is particularly effective if some target genes of a cascade are already known from prior microarray investigations using cell lines. The high computational performance of our programs rests on a special technique which first considers each gene as a potential centre of a gene set and then evaluates all sets for statistical significance in a very simple and lucid way. The evaluation is based on the well-known permutation method of Westfall and Young (1993). Thus, the principles of multiple testing are applied with strict control of the familywise error of the first kind.

The programs can be used in many different modifications:

- three resampling strategies: systematic permutations, random permutations and random data rotations;
- different test statistics, for example with truncation;
- application to the given data or to their ranks;
- choice of the gene set size by a corresponding correlation limit;
- dominant centre genes required or not;
- homogeneous directions of all genes in a set required or not;
- two-sided or one-sided tests;
- limitation of the sum of squares for each gene or not;
- limitation of the variation coefficient for each gene or not;
- output of the results in different formats and levels of detail.

An application is demonstrated in Fig. 1. We used a dataset from the “Global Cancer Map” comprising the expression values of 7129 genes from 180 patients of different tumour classes. Starting from 12 target genes known to be up-regulated by the tumour suppressor p53, among them the gene encoding the cyclin-dependent kinase inhibitor p21, we set out to identify additional genes that correlate positively with these genes, using the procedures described above. As a result, we obtained 9 gene sets consisting of 33 genes at the multiple significance level α = 0.05. The figure represents these genes by two principal components. The representation shows the centre gene and a circle for each set, which contains all genes that are members of the set.
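A two-dimensional representation of this kind can be obtained, for instance, from a principal component analysis in which the genes are the observations and the patients are the variables. The following Python sketch is only an illustration under assumptions: the function name `gene_coordinates_2d` is hypothetical, the standardization of each gene is a choice that the text does not specify, and the code is not the program that was used to draw the figure.

```python
import numpy as np

def gene_coordinates_2d(expr):
    """Return plotting coordinates of genes on two principal components.

    expr: array of shape (n_genes, n_samples), one row per gene.
    Assumption (not from the text): every gene is centred and scaled to
    unit length first, so that closeness in the plot reflects correlation.
    Genes with zero variance must be removed beforehand.
    """
    x = np.asarray(expr, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)             # centre each gene
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit length per gene
    # PCA with genes as observations: centre every sample column, then SVD.
    x = x - x.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    # Scores on the first two components, one (x, y) pair per gene.
    return u[:, :2] * s[:2]
```

Applied to the rows of the expression matrix that belong to the selected genes, the two returned columns could serve as the x and y positions of the genes in a scatter plot.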
Figure 2. A large protein synthesis gene signature is negatively correlated with the expression of p53 target genes in the “Global Cancer Map” microarray dataset. The majority of ribosomal protein genes belong to that signature, as demonstrated in the figure for the large ribosomal subunit (proteins encoded by signature genes are indicated by red circles).

In Fig. 1, the members of each set are connected with their centre gene by a straight line. We observe that the 33 genes split up into four distinct clusters. Further analyses and inspection of the genes revealed functional gene sets that are either positively or negatively correlated with the expression of p53 target genes. As an example, a group of genes tightly connected to DNA damage signalling and DNA repair correlated positively with p53 target gene activity in a lymphoma array dataset. In contrast, the analysis of the Global Cancer Map revealed a negative correlation of p53 activity with the expression of a large set of genes representing the cellular protein synthesis machinery, including many ribosomal protein genes as well as eukaryotic translation factor genes (Fig. 2). These results demonstrate that our algorithm allows the identification of functionally relevant gene sets potentially involved in tumour pathogenesis. We are currently extending this analysis to other signalling cascades and transcription factors.
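The search procedure described earlier (each gene as a potential centre of a gene set, followed by an evaluation of all sets with the permutation method of Westfall and Young) can be illustrated by a minimal Python sketch. It is not the authors' program. Several details are assumptions made only for illustration: all function names are hypothetical, the target profile is taken to be one value per patient (for example the mean of the standardized target genes), the statistic of a set is the correlation of its mean profile with that target, the test is one-sided, and only random permutations and the single-step variant of the method are shown.

```python
import numpy as np

def standardize_rows(x):
    """Centre each row and scale it to unit length.

    Dot products of such rows are Pearson correlations.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=1, keepdims=True)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def centre_gene_sets(z, corr_limit):
    """One candidate set per centre gene: all genes whose correlation
    with the centre reaches corr_limit (assumed > 0). The centre itself
    is always a member. Needs the full gene-by-gene correlation matrix."""
    corr = z @ z.T
    return [np.flatnonzero(corr[c] >= corr_limit) for c in range(z.shape[0])]

def set_profiles(z, sets):
    """Standardized mean expression profile of every set (assumed statistic)."""
    profiles = np.array([z[members].mean(axis=0) for members in sets])
    return profiles / np.linalg.norm(profiles, axis=1, keepdims=True)

def max_t_adjusted_p(profiles, target, n_perm=1000, seed=0):
    """Single-step max-statistic adjustment with random permutations.

    The target values are permuted over the patients; each observed set
    statistic is compared with the permutation distribution of the
    maximum over all sets, which controls the familywise error.
    """
    rng = np.random.default_rng(seed)
    t = standardize_rows(np.asarray(target, dtype=float)[None, :])[0]
    observed = profiles @ t              # correlation of each set with target
    exceed = np.zeros(len(observed))
    for _ in range(n_perm):
        perm_max = (profiles @ rng.permutation(t)).max()
        exceed += perm_max >= observed
    # The observed arrangement counts as one permutation, so p is never 0.
    return observed, (exceed + 1) / (n_perm + 1)
```

With an expression matrix `expr` (genes in rows, patients in columns) and a target vector `target`, one would call `z = standardize_rows(expr)`, `sets = centre_gene_sets(z, 0.7)` and `max_t_adjusted_p(set_profiles(z, sets), target)`, and keep the sets whose adjusted p-value does not exceed 0.05. The correlation limit 0.7 is an arbitrary example value. The further options listed above, such as ranks, truncation or dominant centre genes, are not covered by this sketch.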