Computational workflows have gained significant momentum in the past two years. They formalize how tools and data analyses are executed, and they are digital objects in their own right. Workflows can also automate code execution, speeding up complex analyses such as the identification of chemical compounds performed by metabolomics workflows. However, making workflows FAIR remains a challenge, particularly with respect to reproducibility. These workflows generally consist of many computational steps that take complex inputs and generate outputs, without collecting the intermediary data objects in a standardized, traceable way. The variety of tools integrated within these workflows makes it even harder to ensure that results can be reproduced.
This session aims to improve support for the FAIR principles in workflows by implementing methods that gather provenance metadata at runtime, while the workflow modules are being executed. This helps ensure the quality, traceability, and reproducibility of results. As a use case, we will take the Metabolome Annotation Workflow (MAW), which performs compound identification and whose modules are executed in Docker environments. We will integrate R and Python libraries that enable automated, ontology-based provenance collection. This will improve reusability by capturing informative metadata, inputs, outputs, intermediary data objects, and the computational tasks performed, together with their dependencies and the libraries used. The methods used in this session are workflow-agnostic and not domain-specific. They can be transferred to other workflows and tools and can thus engage a wider research community, such as the metabolomics and genomics communities. We will align with these goals and work towards improving packages so that provenance gathering becomes an easy-to-use, integrated process.
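The idea of collecting provenance while a step runs can be illustrated with a minimal Python sketch. It is not MAW's implementation and does not use the R/Python provenance libraries the session plans to integrate. The function names (`run_step`, `file_entity`, `package_versions`) and the record layout are hypothetical; the keys loosely borrow term names (`used`, `wasGeneratedBy`, `startedAtTime`, `endedAtTime`) from the W3C PROV-O ontology but the output is plain JSON, not a valid PROV serialization.

```python
import hashlib
import json
import platform
import sys
import tempfile
import uuid
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def file_entity(path: Path) -> dict:
    """Describe a file as a PROV-style entity, identified by its SHA-256 checksum."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return {"id": f"entity:{digest[:12]}", "path": str(path), "sha256": digest}


def package_versions(names) -> dict:
    """Return installed versions of the given Python distributions (None if absent)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def run_step(name, func, inputs, output, packages=()) -> dict:
    """Run one (hypothetical) workflow step and record what it used and generated."""
    used = [file_entity(p) for p in inputs]          # capture inputs before the step runs
    started = datetime.now(timezone.utc).isoformat()
    func(inputs, output)                             # the actual computational task
    ended = datetime.now(timezone.utc).isoformat()
    activity_id = f"activity:{uuid.uuid4()}"
    return {
        "activity": {
            "id": activity_id,
            "label": name,
            "startedAtTime": started,
            "endedAtTime": ended,
            "python": sys.version.split()[0],        # interpreter version
            "platform": platform.platform(),          # host/container OS description
            "packages": package_versions(packages),  # library dependencies
        },
        "used": used,
        "wasGeneratedBy": {"entity": file_entity(output), "activity": activity_id},
    }


if __name__ == "__main__":
    # Toy step standing in for a real workflow module: upper-case a text file.
    def to_upper(inputs, output):
        output.write_text("".join(p.read_text() for p in inputs).upper())

    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.txt"
        src.write_text("example data")
        record = run_step("to_upper", to_upper, [src], Path(tmp) / "output.txt")
        print(json.dumps(record, indent=2))
```

Identifying files by checksum rather than by path lets the same intermediary data object be recognized across runs and containers. A fuller version would also record the Docker image used for each module and map these records onto ontology terms (for example as RDF) instead of ad hoc JSON.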
Project Lead: Mahnoor Zulfiqar <