BioFuice: A decentralized approach for data integration in bioinformatics
Interdisciplinary Centre for Bioinformatics
University of Leipzig
Dept. of Computer Science
University of Leipzig
Many bioinformatics applications require data from different sources to answer complex research questions. Integrating such highly diverse data is a major challenge in bioinformatics and often much too laborious and error-prone for scientists. Traditional database approaches are mostly too rigid and not scalable.
We developed the BioFuice approach for interconnecting and integrating data from different autonomous sources. It is based on a decentralized, peer-to-peer-like infrastructure. We directly utilize instance-level (object-level) correspondences between different sources, which are often already available in the sources in the form of web links, e.g. based on accession ids. A set of such correspondences represents a mapping between sources. The sources describe objects of different types, such as genes, proteins, and their functions, and a single data source can comprise several such object types. For instance, the Ensembl source contains, among others, the object types "Gene" and "Transcript", NetAffx the type "Gene", and SwissProt/UniProt the type "Protein". These object types and their corresponding intra- and inter-source mappings form the so-called source mapping model. Each mapping is also assigned a semantic mapping type. Together with the object types, the mapping types mirror the semantics of the domain within a so-called domain model.
Generating the semantic domain model
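The source mapping model described above can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions and not the BioFuice implementation: the class names `ObjectType` and `Mapping`, the helper `is_intra_source`, the mapping type labels and all object ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectType:
    """An object type as offered by one source, e.g. Ensembl "Gene"."""
    source: str
    name: str


@dataclass
class Mapping:
    """A set of instance-level correspondences between two object types."""
    from_type: ObjectType
    to_type: ObjectType
    mapping_type: str  # semantic mapping type (label is hypothetical)
    correspondences: set = field(default_factory=set)

    def add(self, from_id: str, to_id: str) -> None:
        # One correspondence links one object to one other object,
        # e.g. via an accession id found in a web link.
        self.correspondences.add((from_id, to_id))


def is_intra_source(mapping: Mapping) -> bool:
    """True if both object types belong to the same source."""
    return mapping.from_type.source == mapping.to_type.source


# Object types named in the text above.
ensembl_gene = ObjectType("Ensembl", "Gene")
ensembl_transcript = ObjectType("Ensembl", "Transcript")
swissprot_protein = ObjectType("SwissProt", "Protein")

# Intra-source mapping (ids are invented placeholders).
gene_to_transcript = Mapping(ensembl_gene, ensembl_transcript, "has_transcript")
gene_to_transcript.add("gene-1", "transcript-1")
gene_to_transcript.add("gene-1", "transcript-2")

# Inter-source mapping (ids are invented placeholders).
gene_to_protein = Mapping(ensembl_gene, swissprot_protein, "encodes")
gene_to_protein.add("gene-1", "protein-1")

# In this sketch the source mapping model is simply the list of mappings.
source_mapping_model = [gene_to_transcript, gene_to_protein]
```

Keeping the semantic mapping type on each mapping is what would allow several source-level mappings to be grouped under one domain-level relationship.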
To process objects and mappings we have devised a set of high-level operators. They can be used within script programs (workflows) to combine and analyze data from different sources. For instance, a script can identify and retrieve all chemokine-related genes of the NetAffx source; this gene group can then be used to focus the analysis of microarray-based experiments on relevant genes. To form the gene group, the sources HUGO, SwissProt and GeneOntology are included in the query process in order to compensate for incomplete sources and mappings.
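The idea of compensating for incomplete mappings by routing a query through additional sources can be sketched as follows. This is a minimal illustration in Python, not the BioFuice script language: the operator names `traverse` and `compose`, the mapping paths between the sources, and all ids are assumptions made for the example.

```python
def traverse(ids, mapping):
    """Follow a mapping: return every target id linked to one of the given ids."""
    return {target for (source, target) in mapping if source in ids}


def compose(first, second):
    """Chain two mappings A->B and B->C into a derived mapping A->C."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}


# Mappings as sets of (source id, target id) correspondences.
# All ids and mapping paths below are invented placeholders.
go_to_netaffx = {("go:chemokine", "netaffx-gene-1")}  # direct but incomplete
go_to_swissprot = {("go:chemokine", "prot-A"), ("go:chemokine", "prot-B")}
swissprot_to_hugo = {("prot-A", "hugo-A"), ("prot-B", "hugo-B")}
hugo_to_netaffx = {("hugo-A", "netaffx-gene-1"), ("hugo-B", "netaffx-gene-2")}

start = {"go:chemokine"}

# Route 1: use the direct mapping only.
direct = traverse(start, go_to_netaffx)

# Route 2: go through SwissProt and HUGO to reach NetAffx.
go_to_netaffx_derived = compose(compose(go_to_swissprot, swissprot_to_hugo),
                                hugo_to_netaffx)
indirect = traverse(start, go_to_netaffx_derived)

# Merging both routes fills the gap left by the incomplete direct mapping.
gene_group = direct | indirect
```

In this toy data the direct mapping yields one gene, while the union of both routes yields two, which is the effect the paragraph above describes.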
The key aspects of the BioFuice integration approach are:
Currently, BioFuice integrates data from more than 20 public molecular-biological annotation sources, such as Ensembl, Bind, NetAffx, HUGO and HomoloGene, as well as private sources that result from different analyses. The integration approach is applied in various collaborative research projects, ranging from the analysis of microarray data (IZBI) and the analysis of protein interaction networks (MPI MIS) to the detection of non-coding RNAs and gene homologues (BioInf).