Building on top of OntoGPT
Project Leads:
- Claus Weiland (Senckenberg)
- Leyla Jael Castro (ZB MED)
- Stian Soiland-Reyes (University of Manchester)
Abstract:
OntoGPT combines large language models (LLMs) with ontologies as a source of factual, grounded knowledge. In this project we want to (i) assess the effectiveness of OntoGPT for selected biological knowledge bases (e.g., UniProt, KEGG, Phenoscape KB); (ii) explore the application of OntoGPT with locally deployed open LLMs such as OpenLLaMA, among other things for combining ontologies (e.g., mapping them and creating an upper-level hierarchy), using MeSH and AGROVOC as a use case; and (iii) combine OntoGPT with Bioschemas/RO-Crates/schema.org as ground truth to annotate content with Bioschemas/RO-Crates and explore its potential to suggest Bioschemas/RO-Crates profiles. We will also explore generating SPARQL linked data queries with LLMs from plain-text questions, informed by such profiles. Using LLMs to query scientific knowledge aims to lower the barrier for those not familiar with native query languages (e.g., SQL or SPARQL) or with ontology rules and languages, while also benefiting from the general knowledge already covered by the LLM. In addition, we will integrate the aforementioned knowledge bases, among others, as external knowledge resources to constrain the generated output with factual information, employing retrieval-augmented generation (RAG) and thereby reducing “hallucination” in the LLM output. de.NBI resources such as PlantsDB will be included here. The use of schema.org poses a challenge because it defines its own way of dealing with domains and ranges for properties/relations between types. This approach is also used by RO-Crate profiles, and it makes our project a cross-domain one, so that it could be further adopted by, e.g., NFDI consortia in Germany.
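To make the retrieval-augmented idea mentioned in the abstract more concrete, the following is a minimal sketch, not the project's implementation and not OntoGPT's API. It assumes a tiny hand-written list of facts standing in for a real knowledge base, a naive word-overlap retriever, and hypothetical helper names (`tokens`, `retrieve`, `build_prompt`). It only builds the prompt; the call to an LLM is left out.

```python
import re

# Hypothetical, hand-written stand-ins for facts that would normally be
# retrieved from a knowledge base (e.g., via its API or a SPARQL endpoint).
FACTS = [
    "P69905 is the UniProt accession for human hemoglobin subunit alpha.",
    "AGROVOC is a multilingual thesaurus maintained by FAO.",
    "MeSH is a controlled vocabulary maintained by the US National Library of Medicine.",
]


def tokens(text):
    """Return the set of lowercase word tokens used for a naive overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, facts, k=2):
    """Return up to k facts sharing the most tokens with the question."""
    q = tokens(question)
    scored = [(len(q & tokens(fact)), fact) for fact in facts]
    # Drop facts with no overlap, then rank by overlap (stable for ties).
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [fact for _, fact in scored[:k]]


def build_prompt(question, facts):
    """Assemble a prompt that asks the model to stay within the retrieved facts."""
    context = "\n".join(f"- {fact}" for fact in retrieve(question, facts))
    return (
        "Answer using only the facts below. If they are insufficient, say so.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
    )


prompt = build_prompt("Who maintains AGROVOC?", FACTS)
print(prompt)
```

In a real setting the word-overlap retriever would likely be replaced by a query against the knowledge base itself or by embedding-based search; the point of the sketch is only that the generated answer is constrained by retrieved facts placed in the prompt.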