The benefits of well-maintained research data management according to FAIR principles are apparent. A Nature survey conducted in 2016 revealed that more than 70% of researchers had tried and failed to reproduce another scientist's experiments, and more than 50% could not even reproduce their own experiments (Baker 2016). In this context, the annotation of metadata is particularly essential. What counts as complete metadata depends on the respective endpoint repository, each of which prescribes specific formats, checklists and terminologies. For the individual researcher, the number of different standards can easily become overwhelming. Therefore, the DataPLANT project aims at summarizing the key parameters relevant for data publication in so-called Annotated Research Contexts (ARCs). These ARCs are based on various generic frameworks such as ISA (Investigation-Study-Assay, Sansone et al. 2012), CWL (Common Workflow Language, Amstutz et al. 2016) and RO-Crate (Research Object Crate, Soiland-Reyes et al. 2022) and are designed to be a central structure for processing and storing data and metadata on any experimental setup. Aligning the activities at the national level, represented by DataPLANT, with those at the international level, represented by ELIXIR, is of utmost importance, both to activate existing synergies and to avoid parallel and interfering developments. In practice, the ARC is a directory scaffold under Git-based version control that allows researchers to think of research data as immutable but evolving: (meta)data can be continuously layered onto and improved upon the existing entities. However, (meta)data requirements and validity need to be ensured by a template-based process, followed by a constraint validation whose results are fed back to the user as a pull request. To align the activities of ELIXIR and DataPLANT, we want to enable easy creation of MIAPPE-compliant ISA-ARCs for Plant Phenotyping Experiments.
Here, we will provide MIAPPE-compliant ISA-ARC templates, including a user tutorial, to improve dissemination and community acceptance. To ensure the quality of the resulting ARCs, we will implement the required continuous integration testing based on MIAPPE.
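To make the idea of a checklist-based continuous integration test more concrete, the following is a minimal sketch in Python, not the planned implementation. The field names in `REQUIRED_FIELDS` are a hypothetical, illustrative subset and are not taken from the MIAPPE checklist itself; the function names `missing_fields` and `feedback` are likewise assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical, illustrative subset of required fields. The actual MIAPPE
# checklist is larger and must be taken from the standard itself.
REQUIRED_FIELDS: Sequence[str] = (
    "Investigation Title",
    "Study Title",
    "Study Start Date",
    "Organism",
)


def missing_fields(
    metadata: Dict[str, Optional[str]],
    required: Sequence[str] = REQUIRED_FIELDS,
) -> List[str]:
    """Return the required fields that are absent or blank in `metadata`."""
    return [
        name
        for name in required
        if not str(metadata.get(name) or "").strip()
    ]


def feedback(metadata: Dict[str, Optional[str]]) -> str:
    """Build a human-readable report, e.g. for a pull request comment."""
    missing = missing_fields(metadata)
    if not missing:
        return "All required fields are present."
    lines = [f"- {name}" for name in missing]
    return "Missing required fields:\n" + "\n".join(lines)
```

A check of this kind could run on every commit to an ARC, with the text returned by `feedback` posted back to the user; how the metadata dictionary is extracted from the ISA files is deliberately left out of the sketch.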
With the ArcCommander and the MS Excel plugin SWATE, among others, DataPLANT provides tools that allow the community to create the ARC structure for their experimental data and to generate FAIR metadata using well-defined templates. At the moment, these SWATE templates are still relatively generic and not tailored to the plant use case. It is therefore important to build on the groundwork laid by the MIAPPE (Minimum Information About a Plant Phenotyping Experiment, Papoutsoglou et al. 2020) data standard in order to a) make these templates MIAPPE-compliant and b) create sample datasets (ARCs) based on real experimental data. Given the efforts in ELIXIR and the availability of different ISA tools, we plan to assess the ARC compatibility with other tools during development, namely the MIAPPE Wizard and other advancements made during the BioHackathon Europe.
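For readers unfamiliar with the notion of a directory scaffold under version control, the following is a minimal Python sketch. It is not how the ArcCommander is implemented. The subdirectory names in `ARC_SUBDIRS` reflect an assumed ARC layout and should be checked against the ARC specification; the function `create_arc_scaffold` and the `.gitkeep` placeholder files are assumptions made for this example.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed top-level layout of an ARC; verify against the ARC specification.
ARC_SUBDIRS = ("studies", "assays", "workflows", "runs")


def create_arc_scaffold(root: Path, init_git: bool = False) -> List[Path]:
    """Create the assumed ARC subdirectories under `root`.

    If `init_git` is True, the folder is also initialised as a Git
    repository (requires the `git` executable to be installed).
    """
    root.mkdir(parents=True, exist_ok=True)
    created = []
    for name in ARC_SUBDIRS:
        subdir = root / name
        subdir.mkdir(exist_ok=True)
        # Git does not track empty directories, so add a placeholder file.
        (subdir / ".gitkeep").touch()
        created.append(subdir)
    if init_git:
        subprocess.run(["git", "init", str(root)], check=True)
    return created
```

Calling `create_arc_scaffold(Path("my-arc"), init_git=True)` would yield an empty, versioned folder structure that metadata templates and data files can then be layered onto.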
Another aspect that is still underrepresented is information dissemination and outreach. One reason for this is the lack of training materials, best-practice documents and sample data and, above all, of the competencies needed to pass this information on adequately to third parties. To improve this situation, work will also be done during the hackathon to design possible training materials and knowledge dissemination strategies.
We plan to leverage existing resources such as ontologies for standardized and controlled terminologies, DMP generators that can be adapted or integrated into our solution, and file converters, e.g. from databases to ISA, and to improve the API of ISA-Tools to support the solutions developed here.
Project Leads: Elisa Senger <