The objective of this proposal is to develop an automated suggestion chatbot that assists researchers in various domains with data management by providing recommendations for best practices, policies, and tools. The chatbot aims to address the challenge faced by researchers who may be overwhelmed and unaware of the optimal data management approaches in their specific domain.

The proposed approach draws on a combination of reliable sources: FAIRsharing, RDMkit, FAIR Cookbook, Helmholtz Metadata Collaboration, RDMO, and resources made available through NFDI consortia. These sources will serve as the knowledge base for the chatbot, enabling it to recommend a selection of tools, best practices, and policies that researchers should follow for effective data management.

The chatbot will be designed to interactively assist users by providing tailored recommendations based on their specific data management needs. Users can ask questions or seek guidance on various aspects of data management, such as data organization, metadata standards, data sharing, and data preservation. 

Approach:

  1. Data Collection and Integration:
    • Gather relevant information from reliable sources such as FAIRsharing, RDMkit, FAIR Cookbook, Helmholtz Metadata Collaboration, RDMO, and NFDI sources.
    • Integrate the collected data to create a comprehensive knowledge base for the chatbot.
  2. Natural Language Processing (NLP) Implementation:
    • Develop NLP algorithms to effectively understand and interpret user queries.
    • Extract key concepts and user intents from the queries.
    • Match user queries with appropriate responses from the knowledge base.
  3. Interactive Chatbot Development:
    • Design and implement an intuitive and user-friendly chatbot interface.
    • Integrate the NLP algorithms and the knowledge base to provide tailored recommendations for research data management (RDM) best practices, policies, and available tools.
    • Enable interactive communication between the chatbot and the user to address specific RDM queries.
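
The matching step in the outline above could be prototyped, in its simplest form, as keyword overlap between a user query and knowledge-base entries. The following is a minimal sketch under stated assumptions, not the project's implementation: the `Entry` structure, the `recommend` function, the stopword list, and the sample entries are all hypothetical, and the entry texts are placeholders rather than content taken from the named sources. A real prototype would likely replace the overlap score with an LLM or embedding-based retrieval.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    """One knowledge-base record (hypothetical structure)."""
    source: str  # name of the originating resource, e.g. "RDMkit"
    title: str
    text: str


# Small illustrative stopword list; an assumption, not a standard list.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "and", "in", "my",
             "how", "do", "i", "what", "which", "should", "use"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS}


def recommend(query, knowledge_base, top_k=3):
    """Return up to top_k entries sharing the most keywords with the query."""
    query_tokens = tokenize(query)
    scored = []
    for entry in knowledge_base:
        overlap = len(query_tokens & tokenize(entry.title + " " + entry.text))
        if overlap:  # skip entries with no keyword in common
            scored.append((overlap, entry))
    # Highest overlap first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


# Placeholder entries for illustration only (not real source content).
SAMPLE_KB = [
    Entry("RDMkit", "Metadata standards",
          "choosing metadata standards and schemas"),
    Entry("FAIRsharing", "Data sharing",
          "repositories and licences for publishing datasets"),
    Entry("FAIR Cookbook", "Data preservation",
          "long term storage and archiving"),
]

if __name__ == "__main__":
    for hit in recommend("Which metadata standards should I use?", SAMPLE_KB):
        print(f"{hit.source}: {hit.title}")
```

Keeping the scoring function separate from the knowledge base means the same interface could later be backed by a more capable model without changing the chatbot front end.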

To keep the chatbot's information up to date, APIs will be used at runtime to fetch the current state of the sources mentioned above. By retrieving information dynamically, the chatbot can offer researchers the most recent guidelines, policies, and tools available from these sources. If dynamic retrieval through APIs turns out not to be technically feasible, we will fall back to a data dump and use it as the knowledge base for the chatbot.
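
The runtime-retrieval strategy with a data-dump fallback can be illustrated with a short sketch. It assumes the live retrieval is wrapped in a caller-supplied function (here the hypothetical `fetch_live`) and that the dump is a JSON file; no actual endpoint or schema of the named sources is implied, and `load_knowledge_base` is an illustrative name.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple


def load_knowledge_base(fetch_live: Callable[[], List[dict]],
                        dump_path: Path) -> Tuple[List[dict], str]:
    """Try the live source first; fall back to a local JSON dump.

    fetch_live is a hypothetical callable that wraps whatever API client
    is eventually used. The second return value reports which path was
    taken, so the chatbot can tell users how fresh the data is.
    """
    try:
        return fetch_live(), "live"
    except Exception:
        # Broad catch is deliberate in this sketch: any failure of the
        # live source (network error, timeout, bad response) triggers
        # the fallback to the data dump.
        with dump_path.open(encoding="utf-8") as handle:
            return json.load(handle), "dump"
```

Passing the fetcher in as a parameter keeps the fallback logic independent of any particular source, so each resource can be switched between live access and a dump individually.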

While the chatbot may not reach its final version within the short timeframe of the hackathon, our goal is to develop a functional prototype that demonstrates the capabilities of the chatbot. This prototype will serve as a foundation for future development, refinement, and expansion of the chatbot.

This project aligns with the broader topic of using Artificial Intelligence (AI) for data management practices, an area actively explored by de.NBI and ELIXIR. By leveraging Large Language Models (LLMs), the chatbot aims to enhance the efficiency and accuracy of data transformation, promoting interdisciplinary collaboration and scientific advancement in research.

The expertise of team members from DataPLANT and ELIXIR Plant Sciences Community ensures a solid foundation for success. Collaborative efforts during the BioHackathon will focus on developing the chatbot, highlighting the innovative use of chatbots in data management practices.

Project leads: Xiaoran Zhou, FZJ / DataPLANT; Sebastian Beier, FZJ / ELIXIR-DE