Project leads:
- Jonas Grieb, Daniel Bauer, Claus Weiland (Senckenberg)
- Rohitha Ravinder, Nelson Quiñones, Leyla Jael Castro (ZB MED)
- Harry Caufield (Lawrence Berkeley National Laboratory)

Questions about the project? Please contact the project leads.

Project idea
Combining Large Language Models (LLMs) with ontological knowledge opens up opportunities that, in the long run, will make AI stronger and more useful to researchers. Such a combination supports, for instance, applications in neuro-symbolic learning, AI explainability, and the reduction of hallucinations. These topics also align with ELIXIR-DE/de.NBI objectives around accessibility and interoperability: while LLMs facilitate access to information, combining them with ontologies facilitates the adoption of controlled vocabularies, standardized formats, and communication across data sources. This combination also aligns with NFDI activities (e.g., the NFDI4Earth Knowledge Hub and the NFDI4DataScience MLentory FDO registry).

This year, our project will continue and extend the 2024 German BioHackathon project “Building on top of OntoGPT”. OntoGPT combines LLMs, prompt engineering, and ontologies to extract structured information from text, grounded in ontology background knowledge. Since last year, other approaches combining ontologies and LLMs have emerged. For instance, GPT-NER assesses the gaps of LLMs (which focus on text generation) with respect to named entity recognition (NER) and proposes a solution. Building on the outcomes of the BioHackathon Germany 2024, Senckenberg successfully demonstrated the integration of OntoGPT into the Machine Annotation Services (MAS) of the DiSSCo RI. For its part, ZB MED is combining LLMs with structured metadata to improve information retrieval and recommendation systems across various data sources hosting machine-learning models.

Our project “tale of two cities” will focus on (i) the use of LLMs and NER for topic classification, (ii) ontology-guided relation extraction to generate triples from biomedical scholarly text, (iii) the additional involvement of Knowledge Graphs (KGs) to enhance LLMs by providing external knowledge for inference and interpretability, and (iv) testing KG-enhanced LLMs in various downstream tasks such as the enrichment and completion of datasets about material samples and human- or machine-based observations. In this respect, we will also investigate (v) agentic LLMs involving the Model Context Protocol (MCP), which connects LLMs in a standardized way to different data sources, services, and tools. These topics can be narrowed down or extended depending on the participants who join during the BioHackathon.
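
To make focus area (ii) more concrete, the following is a minimal Python sketch of one possible way an ontology can guide triple generation: candidate triples (as an LLM might propose them) are kept only if their predicate is known and its subject and object match the expected classes. The entity classes, relation schema, and all names below (`ENTITY_CLASS`, `RELATION_SCHEMA`, `Triple`, `filter_triples`) are hypothetical illustrations and are not taken from OntoGPT or any specific ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


# Hypothetical toy "ontology": maps an entity label to its class.
ENTITY_CLASS = {
    "aspirin": "Drug",
    "headache": "Disease",
    "COX-1": "Protein",
}

# Hypothetical relation schema: predicate -> (expected subject class, expected object class).
RELATION_SCHEMA = {
    "treats": ("Drug", "Disease"),
    "inhibits": ("Drug", "Protein"),
}


def is_valid(triple: Triple) -> bool:
    """Return True if the predicate is known and both ends have the expected class."""
    schema = RELATION_SCHEMA.get(triple.predicate)
    if schema is None:
        return False
    subject_class, object_class = schema
    return (
        ENTITY_CLASS.get(triple.subject) == subject_class
        and ENTITY_CLASS.get(triple.obj) == object_class
    )


def filter_triples(candidates: list[Triple]) -> list[Triple]:
    """Keep only the candidate triples that conform to the toy schema."""
    return [t for t in candidates if is_valid(t)]


# Candidate triples as they might come back from an LLM (made up for illustration).
candidates = [
    Triple("aspirin", "treats", "headache"),
    Triple("aspirin", "inhibits", "COX-1"),
    Triple("headache", "treats", "aspirin"),  # subject and object swapped
    Triple("aspirin", "cures", "headache"),   # predicate not in the schema
]

kept = filter_triples(candidates)
for t in kept:
    print(t.subject, t.predicate, t.obj)
```

In this sketch the ontology acts purely as a filter after extraction. In practice it could equally be used earlier, for example to constrain the prompt or to ground entity mentions to ontology identifiers.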

On day one, we will begin with a basic introduction to LLMs, KGs, and OntoGPT to make it easier for beginners to get started. Access to a locally deployed LLM (e.g., a Mistral AI model) will also be provided.