Increasing interoperability of bioimage dataset resources
Project Leads:
- Josh Moore, German BioImaging e.V.
- Susanne Kunis, University Osnabrueck, CellNanOS
Abstract:
The volume of published bioimage data is constantly increasing. In addition to centralized public repositories such as the Image Data Resource (IDR) and the BioImage Archive, there are also distributed, federated bioimage data resources. Such repositories have been created, for example, by projects like NFDI4BIOIMAGE and I3D:bio (https://omero-tim.gerbi-gmb.de/webclient/) as well as national initiatives like FBI.data and NL-Bioimaging. Together, these services host collections of millions of images and their associated biological metadata annotations.
The IDR and many federated repositories are based on the Open Microscopy Environment (OME) Remote Objects (OMERO) data management system. To improve accessibility and comply with the FAIR principles, first steps have been taken to export information from OMERO using the Resource Description Framework (RDF). During this hackathon, our goal is to improve the RDF export and consumption processes so that information from different bioimage resources can be integrated and queried across data sources. The work builds on the omero-rdf Python package and continues the "Towards OMERO and ARC interoperability for RDM-compliant bio-image data" project that took place in the de.NBI Hackathons of 2023.
Our goals include:
1. Reviewing RDF Structure and URIs: Analyze and refine the current RDF structure and URIs to ensure they are optimized for production environments.
2. Subsetting RDF for Various Use Cases: Identify effective strategies to create subsets of the RDF data that cater to different research and application needs.
3. Packaging subsets: Package subsets of RDF data in an RO-Crate format to foster sharing and data integration.
4. Endpoint Drafting and Testing: Develop endpoints for SPARQL and Bioschemas formats, aiming to facilitate the consumption of RDF subsets. Curate (meta)datasets and queries for testing and performance profiling. Conduct performance and scalability testing of these endpoints.
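To make goals 2 and 3 more concrete, the following is a minimal sketch of one possible subsetting and packaging strategy, not the approach omero-rdf takes. It assumes the export is available as plain (subject, predicate, object) triples; the example.org URIs, the choice of Dublin Core predicates, the file name subset.nt, and the helper names subset, to_ntriples, and ro_crate_metadata are all hypothetical. The subset is every triple reachable from a chosen root resource, and the RO-Crate metadata shown is deliberately reduced: it omits properties such as description, datePublished, and license that a complete crate would carry.

```python
import json
from collections import deque

# Hypothetical triples standing in for an OMERO RDF export.
# These URIs are placeholders, not the URIs omero-rdf emits.
BASE = "https://example.org/omero/"
DCTERMS = "http://purl.org/dc/terms/"

TRIPLES = [
    (BASE + "Project/1", DCTERMS + "hasPart", BASE + "Dataset/10"),
    (BASE + "Dataset/10", DCTERMS + "hasPart", BASE + "Image/100"),
    (BASE + "Image/100", DCTERMS + "title", '"cell_01.tiff"'),
    (BASE + "Project/2", DCTERMS + "hasPart", BASE + "Dataset/20"),
]


def subset(triples, root):
    """Return all triples reachable from root via subject -> object links."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))
    seen, queue, result = {root}, deque([root]), []
    while queue:
        node = queue.popleft()
        for s, p, o in by_subject.get(node, []):
            result.append((s, p, o))
            # Objects starting with a quote are literals and are not followed.
            if not o.startswith('"') and o not in seen:
                seen.add(o)
                queue.append(o)
    return result


def to_ntriples(triples):
    """Serialize triples as N-Triples; literals are assumed pre-quoted."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines) + "\n"


def ro_crate_metadata(name, data_file):
    """Build a reduced RO-Crate 1.1 metadata document describing one data file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": data_file}],
            },
            {
                "@id": data_file,
                "@type": "File",
                "encodingFormat": "application/n-triples",
            },
        ],
    }


project_1 = subset(TRIPLES, BASE + "Project/1")
ntriples = to_ntriples(project_1)
crate = ro_crate_metadata("Project 1 subset", "subset.nt")

print(ntriples)
print(json.dumps(crate, indent=2))
```

Writing ntriples to subset.nt and the JSON to ro-crate-metadata.json in the same directory would give a shareable crate skeleton. In practice an RDF library and a SPARQL CONSTRUCT query would likely replace the hand-written traversal; the standard-library version is used here only to keep the sketch self-contained.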
Expertise needed:
Familiarity with bioimaging data in general, SPARQL, RDF, and ingestion/query optimization is required.