📢 scverse x BioCypher: Integrating Single-Cell Omics and Large Language Models in Biomedical Research
Project Leads:
- Sebastian Lobentanzer: Institute for Computational Biomedicine, Heidelberg University Hospital, Heidelberg, Germany, and EBI/Open Targets, EMBL-EBI, Hinxton, Cambridge, United Kingdom.
- Daniele Lucarelli: Institute for Experimental Cancer Therapy, Technical University of Munich, Munich, Germany, and Institute for Computational Biology, Helmholtz Munich, Munich, Germany.
Abstract:
The scverse ecosystem offers essential tools for the aggregation, processing, and analysis of single-cell data, while the BioCypher ecosystem focuses on streamlining knowledge management in biomedical sciences and integrating current Large Language Model (LLM)-related technologies. This hackathon project proposes to bring these two open-source Python ecosystems closer together, enhancing scientific productivity in biomedical research.
🔬 What's in store:
- Build custom pipelines: Create dedicated workflows that encapsulate knowledge about data processing and analysis, making scverse tools more accessible through the BioCypher ecosystem.
- Enhance LLM integration: Implement retrieval-augmented generation from knowledge graphs or vector embeddings to simplify access to scverse documentation, along with API parameterization and function calling for seamless use of scverse analysis packages.
- Integrate visualisations: Combine visual representations such as UMAP embeddings and spatial transcriptomics in a unified app with a chat-based user interface.
- Support multimodal interactions: Develop models that integrate text, image, and transcriptome data, facilitating richer interaction with experimental data.
- Develop multi-LLM-agent workflows: Enable several LLMs to work together on complex, multi-stage workflows.
- Establish benchmarks: Monitor and ensure correct LLM behaviours through dedicated evaluation metrics.
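
To make the "function calling" idea from the list above more concrete, here is a minimal sketch of how an app might turn an LLM's structured output into a call to an analysis step. Everything in it is an assumption for illustration: the tool names (`compute_neighbors`, `compute_umap`), the JSON shape, and the `dispatch` helper are hypothetical and are not part of scverse or BioCypher. The stand-in functions only return strings; a real integration would wrap actual scverse functions instead.

```python
import inspect
import json


# Hypothetical stand-ins for analysis steps. A real app would wrap
# scverse functions here; these only report what they were asked to do.
def compute_neighbors(n_neighbors: int = 15) -> str:
    return f"neighbors graph built with n_neighbors={n_neighbors}"


def compute_umap(min_dist: float = 0.5) -> str:
    return f"UMAP computed with min_dist={min_dist}"


# Registry of tools the LLM is allowed to call (names are assumptions).
TOOLS = {
    "compute_neighbors": compute_neighbors,
    "compute_umap": compute_umap,
}


def dispatch(llm_output: str) -> str:
    """Parse a JSON tool call, validate it, and run the matching function.

    Assumed input shape: {"name": "<tool>", "arguments": {...}}.
    """
    call = json.loads(llm_output)
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    func = TOOLS[name]
    args = call.get("arguments", {})

    # Reject parameters the function does not accept, so a hallucinated
    # argument fails loudly instead of being passed through.
    allowed = set(inspect.signature(func).parameters)
    unexpected = set(args) - allowed
    if unexpected:
        raise ValueError(f"unexpected arguments for {name}: {sorted(unexpected)}")

    return func(**args)


if __name__ == "__main__":
    # Simulated LLM output; in practice this string would come from the model.
    print(dispatch('{"name": "compute_umap", "arguments": {"min_dist": 0.3}}'))
```

Validating the tool name and arguments before the call is one simple way to keep model output from reaching analysis code unchecked, which also ties in with the benchmarking goal listed above.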