Interdisziplinäres Zentrum
für Bioinformatik


Analysis of large-scale molecular biological data using self-organizing maps

Molecular biology is presently flooded with terabytes of high-throughput data generated by latest-generation sequencing and microarrays as well as by protein shotgun experiments. These data pose challenges in dimension reduction, data compression, visual perception, data integration and extraction of biological information. Information technology and, in particular, machine learning make it possible to recognize complex patterns and to make intelligent decisions based on such large-scale data sets. Applying these techniques to molecular-biological data requires adapting statistical methods, e.g. for class discovery and feature selection, and taking prior knowledge into account to interpret results in the context of biological function. It also requires easy-to-handle tools that biologists and non-informaticians can use directly to generate hypotheses about functional relationships from the data.

We developed a comprehensive analysis and visualization pipeline based on self-organizing maps (SOMs). The unsupervised SOM mapping projects the initially large number of features onto meta-feature clusters. These meta-data are visualized as intuitive mosaic portraits. We have shown that these SOM portraits transform large and heterogeneous sets of molecular biological data into an atlas of sample-specific texture maps that can be compared directly in terms of similarities and dissimilarities.
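The mapping from features to meta-features and portraits can be illustrated with a minimal sketch. It is not the pipeline's implementation: the function names (train_som, best_unit, portrait), the 4 x 4 grid, the learning schedule and the toy data are all assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def best_unit(units, x):
    """Index of the map unit whose profile is closest to x (squared Euclidean)."""
    return min(range(len(units)),
               key=lambda k: sum((w - v) ** 2 for w, v in zip(units[k], x)))


def train_som(features, grid=4, epochs=30, seed=0):
    """Train a grid x grid SOM; returns one meta-feature profile per unit."""
    rng = random.Random(seed)
    # Initialise every unit with a copy of a randomly chosen feature profile.
    units = [list(rng.choice(features)) for _ in range(grid * grid)]
    coords = [(k // grid, k % grid) for k in range(grid * grid)]
    for epoch in range(epochs):
        frac = epoch / epochs
        rate = 0.5 * (1.0 - frac)                        # learning rate decays
        radius = max(0.5, 0.5 * grid * (1.0 - frac))     # neighbourhood shrinks
        order = list(range(len(features)))
        rng.shuffle(order)
        for i in order:
            x = features[i]
            by, bx = coords[best_unit(units, x)]
            for unit, (uy, ux) in zip(units, coords):
                d2 = (uy - by) ** 2 + (ux - bx) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
                for j, v in enumerate(x):
                    unit[j] += rate * h * (v - unit[j])
    return units


def portrait(units, sample, grid):
    """Mosaic for one sample: the value of each meta-feature in that sample."""
    return [[units[r * grid + c][sample] for c in range(grid)]
            for r in range(grid)]


# Hypothetical toy data: 80 features in 4 samples, two anti-correlated groups.
rng = random.Random(1)
pattern_a = [1.0, 1.0, -1.0, -1.0]
pattern_b = [-1.0, -1.0, 1.0, 1.0]
features = [[v + rng.gauss(0.0, 0.1) for v in pattern]
            for pattern in [pattern_a] * 40 + [pattern_b] * 40]

units = train_som(features, grid=4)
assignment = [best_unit(units, f) for f in features]   # feature -> meta-feature
first_portrait = portrait(units, sample=0, grid=4)     # 4 x 4 mosaic of sample 0
```

Each unit profile plays the role of a meta-feature, and first_portrait is the kind of mosaic that would be colour-coded for one sample. Real data sets would need a much larger grid and proper normalization of the input.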
Spot clusters of correlated meta-features can be extracted from the SOM portraits in a subsequent aggregation step. This spot clustering reduces the dimensionality of the data to a handful of signature modules in an unsupervised fashion.
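One simple way to picture such an aggregation step is to threshold a portrait and collect adjacent tiles above the threshold into spots. The sketch below assumes this thresholding rule and 4-neighbour adjacency purely for illustration; the function name find_spots, the threshold value and the demo mosaic are hypothetical and not taken from the pipeline.

```python
def find_spots(portrait, threshold):
    """Group adjacent tiles with value >= threshold into spots (4-neighbourhood)."""
    rows, cols = len(portrait), len(portrait[0])
    seen = set()
    spots = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or portrait[r][c] < threshold:
                continue
            # Flood fill starting from an unvisited tile above the threshold.
            stack, spot = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                spot.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and portrait[ny][nx] >= threshold):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            spots.append(sorted(spot))
    return spots


# Hypothetical 5 x 5 mosaic with two bright regions in opposite corners.
demo = [
    [0.9, 0.8, 0.1, 0.0, 0.0],
    [0.7, 0.2, 0.0, 0.0, 0.1],
    [0.1, 0.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.8, 0.9],
    [0.0, 0.1, 0.0, 0.9, 1.0],
]
spots = find_spots(demo, threshold=0.5)
print(len(spots))  # 2
```

Each spot is a list of (row, column) tiles. Averaging the meta-feature profiles of the tiles in a spot would give one signature module per spot.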
Furthermore, we demonstrated that analysis techniques that are normally applied at the feature level provide enhanced resolution when applied to the meta-features, for essentially two reasons: first, the set of meta-features better represents the diversity of patterns and modes inherent in the data; second, it has better signal-to-noise characteristics than a comparable collection of single features.
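The signal-to-noise argument can be made concrete with a small simulation. It rests on two simplifying assumptions that are not claims about real data: all features in a cluster share the same underlying profile, and their noise is independent and Gaussian. Under these assumptions, averaging n features reduces the noise by roughly a factor of sqrt(n). All names and parameter values are illustrative.

```python
import random
import statistics

rng = random.Random(0)
n_samples, n_features, sigma = 200, 25, 1.0

# Shared "true" profile across samples: alternating up and down.
signal = [1.0 if j % 2 == 0 else -1.0 for j in range(n_samples)]

# Each single feature = shared profile + independent Gaussian noise.
cluster = [[s + rng.gauss(0.0, sigma) for s in signal]
           for _ in range(n_features)]

# Meta-feature = mean profile of all features in the cluster.
meta = [statistics.fmean([f[j] for f in cluster]) for j in range(n_samples)]


def noise_level(profile):
    """Standard deviation of the residual around the true profile."""
    return statistics.pstdev([p - s for p, s in zip(profile, signal)])


single_noise = noise_level(cluster[0])   # close to sigma
meta_noise = noise_level(meta)           # close to sigma / sqrt(n_features)
print(round(single_noise, 2), round(meta_noise, 2))
```

With 25 features per cluster the residual noise of the meta-feature should be about one fifth of that of a single feature.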
In addition to the pattern-driven feature selection in the SOM portraits, we apply statistical measures to detect features that differ significantly between sample classes. Furthermore, two variants of functional enrichment analysis are introduced that link sample-specific patterns of the meta-feature landscape with biological knowledge and support functional interpretation of the data based on the ‘guilt by association’ principle.
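The text does not specify the two enrichment variants, so the following is only a generic over-representation sketch based on the hypergeometric distribution, a common way to ask whether a group of features contains more members of a functional gene set than expected by chance. The function names and the gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_tail(population, annotated, drawn, hits):
    """P(X >= hits) when drawing `drawn` items from `population`,
    of which `annotated` belong to the gene set."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
                     for i in range(hits, upper + 1))
    return favourable / total


def enrichment_p(selected, gene_set, background):
    """Over-representation p-value of gene_set among the selected genes."""
    selected = set(selected) & set(background)
    gene_set = set(gene_set) & set(background)
    overlap = len(selected & gene_set)
    return hypergeom_tail(len(background), len(gene_set), len(selected), overlap)


# Hypothetical example: 20 background genes, a gene set of 5, a spot of 4 genes.
background = [f"g{i}" for i in range(20)]
gene_set = {"g0", "g1", "g2", "g3", "g4"}
spot_genes = {"g0", "g1", "g2", "g10"}

p_value = enrichment_p(spot_genes, gene_set, background)
print(round(p_value, 4))  # 0.032
```

Three of the four spot genes belong to the gene set, which gives p = 155/4845, about 0.032. In practice many gene sets are tested at once, so the p-values would need correction for multiple testing.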

Selected projects:
Human body tissue atlas
(browse results)
In a pilot study we applied our analysis pipeline to the whole genome expression profiles of 67 healthy human tissues selected from ten tissue categories.
(browse results)
The method is applied to disentangle the different subtypes of glioblastoma multiforme. Our analysis used 153 tumor and 11 normal samples.
Prostate cancer development
(browse results)
We investigated the progression of prostate cancer from benign hyperplasia to metastatic cancer. The study comprises 84 samples from 44 individuals.
Malignant lymphomas
A large-scale study comprising more than 900 patients is used to build an atlas of Burkitt's lymphomas and to differentiate subtypes based on individual expression patterns.
Cancer methylome
In several studies we analysed the DNA methylation patterns in diverse cancers, e.g. glioblastoma multiforme, colorectal cancer, prostate cancer and a large cohort of hematological neoplasms. In combined methylome and transcriptome analyses we investigate concerted epigenetic and transcriptional effects.
miRNA expression landscapes
Dedicated miRNA studies of healthy, diseased and developing tissues are used to deduce regulatory mechanisms of miRNAs.
Human SNP atlas
The SOM framework was applied to a SNP study (660,000 SNPs measured on microarrays) of more than 1,000 healthy human individuals from all over the world. The SOM reflects regional differences as the primary patterns of the SNP landscapes. However, individual characteristics are also resolved in the SOM.

Download pipeline:
  • The pipeline is available as an R package from the CRAN repository: link

Our group:

Hans Binder 

Henry Löffler-Wirth 

Lydia Hopp 

Volkan Cakir 

Edith Willscher 

Kathrin Lembcke 

Selected References:
Wirth H, Löffler M, von Bergen M, Binder H.:
Expression cartography of human tissues using self organizing maps. (download)
BMC Bioinformatics 2011
Wirth H., Bergen M.v., Murugaiyan J., Rösler U., Stokowy T., Binder, H.:
MALDI-typing of infectious algae of the genus Prototheca using SOM portraits. (download)
Journal of Microbiological Methods 2012
Wirth H., Bergen M.v., Binder H.:
Mining SOM expression portraits: Feature selection and integrating concepts of molecular function. (download)
BioData Mining 2012
Steiner, L., Hopp, L., Wirth, H., Galle, J., Binder, H., Prohaska, S., Rohlf, T.:
A global genome segmentation method for exploration of epigenetic patterns.
PLoS ONE 2012
Hopp, L., Lembcke, K., Binder, H. & Wirth, H.:
Portraying the Expression Landscapes of B-Cell Lymphoma - Intuitive Detection of Outlier Samples and of Molecular Subtypes.
Biology 2013
Hopp, L.*, Wirth, H.*, Fasold, M. & Binder, H.:
Portraying the expression landscapes of cancer subtypes: a glioblastoma multiforme and prostate cancer case study.
Systems Biomedicine 2013
Wirth, H.*, Cakir, V.*, Hopp, L. & Binder, H.:
Analysis of miRNA expression using machine learning.
Methods in Molecular Biology 2014
Cakir, V.*, Wirth, H.*, Hopp, L. & Binder, H.:
miRNA expression landscapes in stem cells, tissues and cancer.
Methods in Molecular Biology 2014
Binder, H. & Wirth, H.:
Analysis of large-scale OMIC data using Self Organizing Maps.
Encyclopedia of Information Science and Technology 2014
Binder, H., Wirth, H., Lembcke, K. et al.:
Time-course human urine proteomics in space-flight simulation experiments - A high resolution and personalized machine learning analysis.
BMC Genomics 2014

Last update: 23.01.2015