Interdisciplinary Centre
for Bioinformatics

Multivariate procedures to increase power of microarray-based gene expression analysis

Ingo Röder, Markus Löffler, Ernst Schuster
Institute for Medical Informatics, Statistics and Epidemiology
University of Leipzig


Hans Binder, Toralf Kirsten
University of Leipzig

Knut Krohn
University of Leipzig

Markus Eszlinger
Med. Klinik u. Poliklinik II
Medical Faculty of the University of Leipzig

Background and problem: Microarray technology allows the simultaneous analysis of tens of thousands of genes. Most often, however, the analysis is based on only a few replicates. This causes problems for classical multivariate tests, which require sample sizes exceeding the number of observed variables. Moreover, the simultaneous testing of tens of thousands of genes for differential expression raises the "multiple testing problem", i.e. the probability of obtaining at least one false positive result increases with the number of tests performed.
To overcome these problems, a class of stable multivariate procedures based on the theory of spherical distributions has been proposed by Läuter, Glimm, and Kropf, 1996, 1998 [5,6]. These methods allow the multivariate information of many genes to be used for testing differential gene expression. Furthermore, multiple testing procedures based on these principles have been constructed (e.g., Kropf, Läuter, 2002 [4]), which strictly control the family-wise type I error rate (FWE).
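
The two ideas in this paragraph can be illustrated with a minimal sketch. The first function shows how quickly the FWE grows when many hypotheses are tested at an unadjusted level, under the simplifying assumption of independent tests. The second shows the general shape of a multiple procedure with data-driven ordering of hypotheses: genes are sorted by a statistic that is independent of the test statistic under the null hypothesis (for example a total sum of squares), then tested one after another at the unadjusted level until the first non-significant result. The function names and the example numbers are hypothetical and are not taken from the cited papers.

```python
def fwe_independent(alpha, m):
    """Probability of at least one false positive among m independent
    true null hypotheses, each tested at level alpha (assumes independence)."""
    return 1.0 - (1.0 - alpha) ** m


def ordered_rejections(order_stat, p_values, alpha=0.05):
    """Test hypotheses in order of decreasing order_stat, each at the
    unadjusted level alpha, and stop at the first non-significant one.
    Returns the indices of the rejected hypotheses in testing order."""
    order = sorted(range(len(p_values)), key=lambda i: order_stat[i], reverse=True)
    rejected = []
    for i in order:
        if p_values[i] > alpha:
            break  # every hypothesis later in the order is retained
        rejected.append(i)
    return rejected


if __name__ == "__main__":
    for m in (1, 10, 100, 10000):
        print(m, round(fwe_independent(0.05, m), 4))
    # Hypothetical ordering statistics and p-values for four genes.
    print(ordered_rejections([9.1, 2.3, 7.4, 5.0], [0.001, 0.0004, 0.03, 0.20]))
```

In the example call, the gene with the smallest p-value is not rejected because it comes last in the data-driven order. This is the price of testing every hypothesis at the full level without adjustment.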

Results: In this project, the above-mentioned methods have been generalized to use the full multivariate information on the expression intensities of individual genes measured with the Affymetrix GeneChip technology. The usual strategy constructs an expression score for each gene by averaging over the different oligonucleotide probes (perfect match and mismatch) and then performs a test on these summarized expression values. In contrast, we developed a test procedure based on the complete multivariate perfect match information. It is shown that a multiple FWE-controlling procedure for normally distributed data proposed by Westfall, Kropf, and Finos, 2004 [8], can be generalized to a more powerful procedure (WKF-procedure) based on left-spherically distributed scores derived from the perfect match information, without losing the FWE-controlling property. Different variants of the WKF-procedure are considered: the non-standardized principal component test (NPC), the standardized principal component test (SPC), the covariance sum test (CS), and the standardized sum test (SS).
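
To make the idea of a score-based test on probe-level data concrete, the following minimal sketch computes a standardized-sum-type score for a single gene in a paired design. It assumes that each row holds the paired differences (e.g. tumour minus healthy log-intensities) of the perfect match probes of one patient. The weights depend on the data only through the uncentred sums of squares, which is the kind of construction the theory of Läuter, Glimm, and Kropf builds on; under the null hypothesis of zero mean and multivariate normality, the resulting t statistic can be referred to a t distribution with n - 1 degrees of freedom. This is a simplified illustration with hypothetical function names, not the WKF-procedure or the authors' R implementation.

```python
import math


def standardized_sum_scores(x):
    """x: list of n rows, each with the p probe-level paired differences
    of one gene. Returns one score per row, z_j = sum_i d_i * x_ji, with
    weights d_i = 1 / sqrt(sum_j x_ji ** 2) (uncentred sums of squares).
    Assumes no probe column is identically zero."""
    p = len(x[0])
    w_diag = [sum(row[i] ** 2 for row in x) for i in range(p)]
    d = [1.0 / math.sqrt(w) for w in w_diag]
    return [sum(d[i] * row[i] for i in range(p)) for row in x]


def one_sample_t(z):
    """Ordinary one-sample t statistic for the null hypothesis mean(z) = 0."""
    n = len(z)
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / (n - 1)
    return mean / math.sqrt(var / n)


if __name__ == "__main__":
    # Hypothetical paired differences: 3 patients, 2 perfect match probes.
    diffs = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    z = standardized_sum_scores(diffs)
    print(one_sample_t(z))
```

Because each probe is divided by the root of its own sum of squares, rescaling a single probe column leaves the scores unchanged, so probes with very different intensity ranges contribute comparably.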

To illustrate the proposed test procedures, which have been implemented in the statistical programming environment R, we analyse two previously published data sets. The first compares gene expression of tumour and healthy tissue within the same patients, the second between two groups of different patients. Using these examples, we demonstrate that multivariate scores identify differentially expressed genes more efficiently than the widely used MAS5 approach provided by the Affymetrix software tools (Affymetrix Microarray Suite 5 or GeneChip Operating Software), or even than the robust version of a two-way analysis of variance (MDP) for estimating the expression value of each individual gene, as suggested by Irizarry et al., 2003. Incorporating the multivariate perfect match information is superior to classical methods based on expression scores with respect to the number of identifiable differentially expressed genes (Fig. 1). For a detailed description of the results we refer to Schuster et al., 2004 and Krohn et al., 2005.

Krohn, K., Eszlinger, M., Paschke, R., Roeder, I., Schuster, E. (2005)
Increased power of microarray analysis by use of an algorithm based on a multivariate procedure.
Bioinformatics 21, 3530–3534.
Kropf, S. and Läuter, J. (2002)
Multiple Tests for Different Sets of Variables Using a Data-Driven Ordering of Hypotheses, with an Application to Gene Expression Data.
Biometrical Journal 44, 789–800.
Läuter, J., Glimm, E., and Kropf, S. (1996)
New multivariate tests for data with an inherent structure.
Biometrical Journal 38, 5–23. Erratum: Biometrical Journal 40, 1015.
Läuter, J., Glimm, E., and Kropf, S. (1998)
Multivariate Tests Based on Left-Spherically Distributed Linear Scores.
Annals of Statistics 26, 1972–1988. Correction: Annals of Statistics 27, 1441.
Schuster, E. et al. (2004)
Microarray based gene expression analysis using parametric multivariate tests per gene—a generalized application of multiple procedures with data-driven order of hypotheses.
Biometrical Journal 46, 687–696.