Müller Group, Heidelberg Institute for Theoretical Studies, Systems Biology Service Center - SysBio

The BMBF-funded DeepCurate project (Computational Life Sciences (CompLS) - Deep Learning in Biomedicine) aims to support the curation of scientific publications for biomedical databases, a process that has so far been manual, using the SABIO-RK database as an example. SABIO-RK is a database of biochemical reactions and their kinetic properties. Curation in SABIO-RK mainly comprises manually extracting, standardizing, and annotating data from the scientific literature so that they are available in a structured, easily accessible, and machine-readable form. Scientific publications are often unstructured, and existing automatic natural language processing (NLP) methods lack the coverage, robustness, and effectiveness required for curating high-quality databases. Current advances in deep learning-based NLP, however, make it possible to support curation with automatic information extraction, making the process more effective and efficient. Deep learning, in turn, needs training data. DeepCurate explores innovative ways to use training data of various modalities (texts, images, eye-tracking data). Combined with current deep learning approaches, which can particularly benefit from multi-modal input, DeepCurate is intended to be a powerful tool that can also be adapted to other manually curated biomedical databases, because it does not depend on specific database models, ontologies, or scientific domains.

A first publication uses data from the SABIO-RK curation process to generate useful training data for deep learning approaches. Without the curation knowledge generated and maintained in de.NBI for more than a decade, the generation of such training data would be very expensive and time-consuming. The project exemplifies the interaction between service and research activities.

For further information, please visit SABIO-RK

Funded by: BMBF, FKZ 031I0204
