Database

Development of an integrated, deep learning-based system to support the curation of biomedical databases

Müller Group, Heidelberg Institute for Theoretical Studies, Service Center: de.NBI Systems Biology Service Center - de.NBI-SysBio

The BMBF-funded DeepCurate project (Computational Life Sciences (CompLS) - Deep Learning in Biomedicine) will support the previously manual curation of scientific publications for biomedical databases, using the SABIO-RK database as an example. SABIO-RK is a database for biochemical reactions and their kinetic properties. Curation in SABIO-RK mainly comprises the manual extraction, standardization, and annotation of data from the scientific literature, so that the data can be provided in a structured, easily accessible, and machine-readable form. Scientific publications, however, are often unstructured, and existing automatic natural language processing (NLP) methods lack the coverage, robustness, and effectiveness required for curating high-quality databases. Recent advances in deep learning-based NLP now make it possible to support curation with automatic information extraction, making the process more effective and efficient. Deep learning, however, needs training data. DeepCurate therefore explores innovative ways to use training data of various modalities (texts, images, eye-tracking data). Combined with current deep learning approaches, which can particularly benefit from multi-modal input, DeepCurate is intended to become a powerful tool that can also be adapted to other manually curated biomedical databases, because it does not depend on specific database models, ontologies, or scientific domains.

A first publication uses data from the SABIO-RK curation process to generate useful training data for deep learning approaches. Without the curation knowledge generated and maintained in de.NBI for more than a decade, the generation of such training data would be very expensive and time-consuming. The project exemplifies the interaction between service and research activities.
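The method of that publication is not described here. As a hedged illustration of the general idea of reusing curated records as training labels, the following Python sketch marks where values already stored in a curated entry occur in a paper's text, yielding labeled character spans that a sequence-labeling model could learn from. The `CuratedEntry` fields, the `label_spans` helper, and the exact-string matching (a simple distant-supervision scheme) are all hypothetical; they are not SABIO-RK's data model or DeepCurate's pipeline.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CuratedEntry:
    # Hypothetical fields for one curated kinetic parameter
    parameter: str  # e.g. "Km"
    value: str      # value as written in the paper, e.g. "0.5"
    unit: str       # e.g. "mM"


def label_spans(text: str, entries: List[CuratedEntry]) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) character spans where curated values occur in text."""
    spans = []
    for entry in entries:
        mention = f"{entry.value} {entry.unit}"
        start = text.find(mention)
        while start != -1:  # record every exact occurrence of the mention
            spans.append((start, start + len(mention), entry.parameter))
            start = text.find(mention, start + 1)
    return sorted(spans)  # order spans by their position in the text


if __name__ == "__main__":
    text = "The Km for glucose was 0.5 mM and kcat was 12 s-1."
    entries = [CuratedEntry("Km", "0.5", "mM"), CuratedEntry("kcat", "12", "s-1")]
    print(label_spans(text, entries))  # [(23, 29, 'Km'), (43, 49, 'kcat')]
```

Exact matching keeps the sketch short; real literature writes values and units in many variants, so a practical labeler would need normalization and fuzzier matching.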

For further information, please visit SABIO-RK.

Funded by: BMBF, FKZ 031I0204

Open Medical data

Eils Group, Hub for Innovations in Digital Health, Service center: Heidelberg Center for Human Bioinformatics – HD-HuB

Machine learning methods hold the promise of great benefits for patients, physicians and researchers, but they require vast amounts of data. While such data typically exists in large research hospitals, it is generally inaccessible due to legal, ethical and privacy concerns. By building a secure computing framework that follows the model-to-data approach, we aim to open medical data for research without compromising security.

The core idea is that the sensitive data stays within the hospital’s servers, pseudonymized and protected by existing safeguards. Researchers will work with the data by sending in their code for machine learning models, which will then be executed on the data using on-site high-performance computing resources. While performance metrics are generally sent back to the researchers to allow for code changes and model improvements, models will only be sent back after thorough privacy checks.
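The workflow described above can be sketched as follows. This is a minimal, hypothetical Python illustration of the model-to-data pattern, not the framework's actual interface: `run_on_site`, `JobResult`, the toy `fit_slope` training code, and the placeholder `no_raw_records` check are all assumptions. The point it shows is that researcher code runs next to the data, metrics always come back, and the trained model is released only if a privacy check approves it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Tuple[float, float]  # hypothetical pseudonymized (feature, target) pair


@dataclass
class JobResult:
    metrics: Dict[str, float]  # performance metrics: always sent back
    model: Optional[Any]       # trained model: sent back only if approved


def run_on_site(
    train_fn: Callable[[List[Record]], Tuple[Any, Dict[str, float]]],
    private_data: List[Record],
    privacy_check: Callable[[Any], bool],
) -> JobResult:
    """Run researcher code on hospital-side data; only results leave this function."""
    model, metrics = train_fn(private_data)
    released = model if privacy_check(model) else None
    return JobResult(metrics=metrics, model=released)


# Researcher side: code submitted to the hospital (hypothetical example)
def fit_slope(data: List[Record]) -> Tuple[Dict[str, float], Dict[str, float]]:
    # Least-squares slope through the origin: y ~ w * x
    w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
    mse = sum((y - w * x) ** 2 for x, y in data) / len(data)
    return {"w": w}, {"mse": mse}


# Hospital side: a placeholder privacy check (hypothetical)
def no_raw_records(model: Any) -> bool:
    # Reject any model object that embeds training records verbatim
    return isinstance(model, dict) and "records" not in model


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    result = run_on_site(fit_slope, data, no_raw_records)
    print(result.metrics, result.model)  # {'mse': 0.0} {'w': 2.0}
```

Separating the release decision (`privacy_check`) from training keeps the hospital in control: researchers can iterate on metrics freely, while a real deployment would replace the placeholder check with thorough, hospital-defined reviews.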

In this framework, the hosting hospital retains full control over its (patient) data and does not disclose personal or otherwise sensitive data to researchers, while still enabling research by a wide scientific community.

For further information, please visit Eils lab.
