Written by Toralf Kirsten, Frank Meineke, Henry Löffler-Wirth, Alexandr Uciteli, Christoph Beger, Sebastian Stäubert, Matthias Löbe, Rene Hänsel, Franziska G Rauscher, Judith Christina Schuster, Thomas Peschel, Heinrich Herre, Jonas Wagner, Silke Zachariae, Christoph Engel, Markus Scholz, Erhard Rahm, Hans Binder, Markus Löffler



Clinical trials, epidemiological studies, clinical registries, and other prospective research projects, together with patient care services, are the main sources of data in the medical research domain. They often serve as the basis for secondary research in evidence-based medicine and for prediction models of disease and its progression. However, this data is often neither sufficiently described nor accessible, and the related models are often not available as functional program tools for interested users from the healthcare and biomedical domains.


The interdisciplinary project Leipzig Health Atlas (LHA) has been developed to close this gap. LHA is an online platform that serves as a sustainable archive providing medical data, metadata, models, and novel phenotypes from clinical trials, epidemiological studies and other medical research projects.


Data, models, and phenotypes are described by semantically rich metadata. The platform prefers to share data and models presented in original publications but is also open to unpublished data. LHA assigns a unique permanent identifier to each data set and model. Hence, the platform can be used to share prepared, quality-assured data sets and models that are referenced in publications. All data, models, and phenotypes managed in LHA follow the FAIR principles and are either publicly available or restricted to specific user groups.


The LHA platform is in productive use. It is already used by a variety of clinical trial and research groups and is becoming increasingly popular in the biomedical community as well. LHA is an integral part of the forthcoming initiative to build a national research data infrastructure for health in Germany.


Scientific results are typically reported in (peer-reviewed) publications. In the medical
research domain, clinical trials and epidemiological and molecular biological studies are the main
sources of data for evidence-based medicine, which ultimately aims at improving healthcare.
While publications describing the essence of a scientific finding are accessible, the
publication data and derived models are often not available to interested readers, which
hampers findability, accessibility, and further interpretability of the data and limits their reuse.

In recent years, the publication requirements of many medical journals, among them those of the Nature publishing group, Cell Science, and the PLOS journals, have changed. Typically, these journals require that the fundamental data be available, not just to check the validity of the scientific results but also to guarantee their sustainability for future research. Multiple platforms for managing scientific data have become available. Dryad [1] and Gene Expression Omnibus (GEO) [2,3] are prominent examples that have been widely used in the last decade. Most of these platforms offer a storage service, which is, in some cases, not free of charge. Data can be uploaded in nearly every format. Comma-separated values (CSV) files, for example, accommodate nearly every tabular layout, with or without column headers. However, some platforms lack additional metadata, which often makes the data difficult to use. Moreover, data is often not interoperable because common data formats are not used. Shared medical data should include metadata describing the clinical, technical, and semantic context. Because the terminologies used are semantically heterogeneous, data from different publications (uploads) usually cannot be combined without additional effort. Moreover, clinical projects often use different consent forms to capture the permission of participants, giving rise to varying consent limitations. Likewise, privacy regulations need to be considered in order to comply with laws such as the General Data Protection Regulation (GDPR) [4] and country-specific laws. All these technical and organizational requirements have been bundled in the FAIR principles [5] for making research data Findable, Accessible, Interoperable, and Reusable, which emerged in parallel with the start of the LHA project in 2016. Numerous articles have discussed FAIR practices and principles ([6] and references cited therein).
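To illustrate why a bare CSV upload is hard to reuse, the following minimal sketch pairs a headerless table with a small metadata record. It is an illustration only: the class names, the field names (`identifier`, `access`, `terminology`), the codes, and the values are all hypothetical and are not taken from the LHA data model.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ColumnInfo:
    name: str         # human-readable variable name
    unit: str         # unit of measurement, "" if none
    terminology: str  # hypothetical code from a shared terminology


@dataclass
class DatasetMetadata:
    identifier: str   # placeholder for a permanent identifier
    title: str
    access: str       # "public" or "restricted"
    columns: list = field(default_factory=list)


def read_with_metadata(raw_csv, meta):
    """Attach the column names from the metadata to a headerless CSV."""
    names = [c.name for c in meta.columns]
    rows = []
    for row in csv.reader(io.StringIO(raw_csv)):
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} columns, got {len(row)}")
        rows.append(dict(zip(names, row)))
    return rows


# Hypothetical example values, not real study data.
meta = DatasetMetadata(
    identifier="example-id-0001",
    title="Example cohort",
    access="restricted",
    columns=[
        ColumnInfo("age", "years", "EX:0001"),
        ColumnInfo("systolic_bp", "mmHg", "EX:0002"),
    ],
)

raw = "54,130\n61,142\n"  # headerless CSV: meaningless without the metadata
records = read_with_metadata(raw, meta)
print(records[0])
```

Without the metadata record, the two numeric columns cannot be interpreted. With it, each value carries a name, a unit, and a terminology code that could in principle be matched across uploads.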

To address these challenges, we have set up the Leipzig Health Atlas (LHA) as a web-based platform that makes data, analyses, and models accessible to interested research communities in a FAIR-compliant way. The goal is to provide data of different kinds together with their metadata at different levels of abstraction (cf. a classification of data from an ontological viewpoint [7]). Moreover, in contrast to many other data platforms, the LHA also aims to provide different kinds of biomathematical models and data analysis tools as interactive web applications.

As of February 2022, the LHA contains 327 data sets and 34 models/tools associated
with 891 publications and 34 scientific projects, mainly from the domain of medical
research. The paper is organized as follows: we introduce the main concepts and methods
(Section 2), provide an overview of available models and applications (Section 3), and finally
discuss special aspects in comparison with similar platforms (Section 4).


The Leipzig Health Atlas has been designed and implemented as a web-based platform that has been running in production mode from early on. The platform has a modular structure and consists of different components that are linked together (see Fig. 1). Initially, LHA was created only for sharing research data. Today, its functionality has been extended to sharing models and their applications, which enables users to quickly apply a model without any further implementation effort. The platform is currently used mainly by medical and clinical scientists as well as by epidemiologists, medical computer scientists, biometricians, and bioinformaticians. According to standard SEEK statistics, the mean download frequency of LHA content is 28.60 downloads (sd = 59.52, max = 1,019) as of autumn 2021. The mean number of content accesses (i.e., requests to pages containing metadata of content) is 77.89 (sd = 114.53, max = 1,229), where requests sent by web crawlers were not considered…
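The kind of summary reported above (mean, standard deviation, and maximum per content item, with crawler requests excluded) can be sketched as follows. This is not how SEEK computes its statistics. The log format, the user-agent heuristic in `is_crawler`, and all values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics
from collections import Counter

# Naive, hypothetical heuristic for spotting crawlers by user agent.
CRAWLER_MARKERS = ("bot", "crawler", "spider")


def is_crawler(user_agent):
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def access_summary(log, content_ids):
    """Summarize per-item access counts, ignoring crawler requests."""
    counts = Counter(cid for cid, ua in log if not is_crawler(ua))
    # Counter returns 0 for items that were never requested by a human.
    per_item = [counts[cid] for cid in content_ids]
    return {
        "mean": statistics.mean(per_item),
        "sd": statistics.stdev(per_item),  # sample standard deviation
        "max": max(per_item),
    }


# Hypothetical access log of (content id, user agent) pairs.
browser = "Mozilla/5.0"
bot = "ExampleBot/1.0"
log = (
    [("dataset-1", browser)] * 1
    + [("dataset-2", browser)] * 2
    + [("dataset-3", browser)] * 3
    + [("model-1", browser)] * 10
    + [("model-1", bot)] * 50  # dropped by the crawler filter
)

summary = access_summary(log, ["dataset-1", "dataset-2", "dataset-3", "model-1"])
print(summary)
```

In the toy log, as in the reported figures, the standard deviation exceeds the mean and the maximum lies far above it. For non-negative counts this suggests a right-skewed distribution in which a few items attract most of the accesses.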
