📢 scverse x BioCypher: Integrating Single-Cell Omics and Large Language Models in Biomedical Research

Project Leads:
- Sebastian Lobentanzer: Institute for Computational Biomedicine, Heidelberg University Hospital, Heidelberg, Germany, and EBI/Open Targets, EMBL-EBI, Hinxton, Cambridge, United Kingdom, ORCID.
- Daniele Lucarelli: Institute for Experimental Cancer Therapy, Technical University of Munich, Munich, Germany, and Institute for Computational Biology, Helmholtz Munich, Munich, Germany.

Abstract:
The scverse ecosystem offers essential tools for aggregating, processing, and analysing single-cell data, while the BioCypher ecosystem focuses on streamlining knowledge management in the biomedical sciences and on integrating current Large Language Model (LLM)-related technologies. This hackathon project aims to bring these two open-source Python ecosystems closer together to enhance scientific productivity in biomedical research.

🔬 What's in store:

  • Build custom pipelines: Create dedicated workflows that encapsulate knowledge about data processing and analysis, making scverse tools more accessible through the BioCypher ecosystem.
  • Enhance LLM integration: Implement retrieval-augmented generation from knowledge graphs or vector embeddings to simplify access to scverse documentation, along with API parameterization and function calling for seamless use of scverse analysis packages.
  • Integrate visualisations: Combine visual representations such as UMAP embeddings and spatial transcriptomics in a unified app with a chat-based user interface.
  • Support multimodal interactions: Develop models that integrate text, image, and transcriptome data, facilitating richer interaction with experimental data.
  • Develop multi-LLM-agent workflows: Support complex, multi-stage workflows in which several LLMs work together.
  • Establish benchmarks: Monitor and ensure correct LLM behaviours through dedicated evaluation metrics.
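To make the "API parameterization and function calling" goal a little more concrete, here is a minimal Python sketch of one possible approach, not a BioCypher or scverse API. It assumes the LLM returns a tool call as JSON of the form {"name": ..., "arguments": {...}}. The `Tool` registry, the tool name `compute_neighbors`, and its stub body are hypothetical; a real integration would forward the validated parameters to a function such as `scanpy.pp.neighbors`.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Tool:
    """A function exposed to the LLM, plus the parameter names and types it accepts."""
    func: Callable[..., Any]
    params: Dict[str, type]


def compute_neighbors(n_neighbors: int = 15, n_pcs: Optional[int] = None) -> Dict[str, Any]:
    # Hypothetical stub: a real pipeline would call scanpy.pp.neighbors(adata, ...)
    # here. This version only echoes the parameters it would pass on.
    return {"step": "neighbors", "n_neighbors": n_neighbors, "n_pcs": n_pcs}


# Hypothetical registry of callable tools, keyed by the name the LLM uses.
TOOLS: Dict[str, Tool] = {
    "compute_neighbors": Tool(compute_neighbors, {"n_neighbors": int, "n_pcs": int}),
}


def dispatch(tool_call_json: str) -> Any:
    """Parse an LLM tool call, validate its arguments, and run the matching tool."""
    call = json.loads(tool_call_json)
    tool = TOOLS.get(call.get("name"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('name')!r}")
    args = call.get("arguments", {})
    for key, value in args.items():
        expected = tool.params.get(key)
        if expected is None:
            raise ValueError(f"unexpected parameter: {key!r}")
        # bool is a subclass of int in Python, so reject it explicitly for int params.
        if not isinstance(value, expected) or (isinstance(value, bool) and expected is not bool):
            raise TypeError(f"{key!r} should be {expected.__name__}, got {type(value).__name__}")
    return tool.func(**args)


if __name__ == "__main__":
    print(dispatch('{"name": "compute_neighbors", "arguments": {"n_neighbors": 30}}'))
```

Checking names and types before anything runs is one simple way to catch malformed LLM output early. Parsed calls like these could also serve as the unit of comparison in the kind of behaviour benchmarks listed above.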

🧑‍🎓 What you should bring:

Both scverse and BioCypher are Python-based ecosystems, so proficiency in Python programming and packaging is essential. Additional beneficial skills include experience with generative AI models (such as prompt engineering and working with model APIs), familiarity with knowledge graph technologies or other databases, and familiarity with TypeScript.