Tools for reproducible research

Basel

Educators:
Björn Grüning (RBC), Johannes Köster, Devon Ryan

Date:
21.07.2019

Location:
ISMB/ECCB Basel

Content:
The typical data analyst must simultaneously juggle multiple projects, each with its own duration and software requirements. As few analysts have any formal training in structuring or even writing the code necessary to perform an analysis, it is unsurprising that the iterative analytic process can produce a wide assortment of almost identically named files (e.g., “final_results.txt”, “final_results.version2.txt”, “final_results.really_final.txt”), all with unclear origins and produced by a hodge-podge of similarly poorly named scripts. The near impossibility of tracing a results file to the exact process that produced it creates untold difficulties, both when it comes time to publish results and when planning subsequent experiments months or years later (after all, which of the “final_results” files was really the “right one”?). These issues are further compounded by software paths and similar assumptions being hard-coded into scripts, which prevents the analysis from being easily replicated elsewhere. Performing analyses in a reproducible and traceable manner is clearly needed to combat such problems.

Schedule Overview

2:00 - 2:10 pm     Installing Conda and Snakemake
2:10 - 2:30 pm     Intro to Conda and Bioconda (slides)
2:30 - 3:30 pm     Hands-on Session: creating Conda environments and installing packages from the Bioconda repository

    This practical requires installing hisat, samtools and deeptools via Bioconda

3:30 - 4:00 pm     Hands-on Session: writing conda recipes


4:00 - 4:15 pm     Coffee Break
4:15 - 4:35 pm     Intro to Snakemake


4:35 - 6:00 pm     Hands-on Session: writing a Snakemake workflow for mapping, indexing and creating coverage files
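
As a rough preview of the final hands-on session, the following plain Python sketch (not Snakemake itself) illustrates the idea behind a rule-based workflow: each step declares an input file, an output file and a command, and the steps needed for a requested target are found by walking backwards from that target. The file layout, the sample name and the "genome_index" prefix are hypothetical, and the use of hisat2, samtools and bamCoverage for the three stages is an assumption based on the tools named in the schedule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One workflow step: an input file, an output file and a shell command."""
    name: str
    input: str
    output: str
    command: str


def rules_for(sample: str) -> list[Rule]:
    """Mapping, sorting, indexing and coverage steps for one sample.

    All paths and the 'genome_index' prefix are hypothetical placeholders.
    """
    fastq = f"reads/{sample}.fastq.gz"
    sam = f"mapped/{sample}.sam"
    bam = f"mapped/{sample}.sorted.bam"
    bai = f"{bam}.bai"
    bigwig = f"coverage/{sample}.bw"
    return [
        Rule("map", fastq, sam, f"hisat2 -x genome_index -U {fastq} -S {sam}"),
        Rule("sort", sam, bam, f"samtools sort -o {bam} {sam}"),
        Rule("index", bam, bai, f"samtools index {bam}"),
        # Coverage depends on the index file so that indexing runs first.
        Rule("coverage", bai, bigwig, f"bamCoverage -b {bam} -o {bigwig}"),
    ]


def plan(target: str, rules: list[Rule]) -> list[Rule]:
    """Walk backwards from the target file to the raw input and
    return the rules in the order they have to run."""
    by_output = {rule.output: rule for rule in rules}
    steps = []
    while target in by_output:
        rule = by_output[target]
        steps.append(rule)
        target = rule.input
    return list(reversed(steps))


if __name__ == "__main__":
    for step in plan("coverage/sampleA.bw", rules_for("sampleA")):
        print(f"{step.name}: {step.command}")
```

Because every output is tied to exactly one rule, a results file can always be traced back to the command that produced it, which is the traceability problem described under Content.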

Learning goals:
In this hands-on tutorial, we demonstrate how Conda can be used to deploy specific software versions easily, reproducibly, and without administrator credentials. Moreover, we demonstrate how Conda’s ability to create isolated software environments helps to avoid side effects between different analyses or between different steps of the same analysis. Attendees will also learn how to write Conda recipes themselves, so that they can contribute new packages to projects such as Bioconda. We further demonstrate how Snakemake can be used in combination with Conda and containers to create reproducible analysis workflows and execute them on any platform, from workstations to clusters and the cloud. Finally, using snakePipes as an example, we demonstrate how Conda and Snakemake can be used to define reproducible and flexible workflows for complex genomics analyses.
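
To make the idea of an isolated, version-pinned environment concrete, here is a minimal Python sketch that assembles a "conda create" command without running it. The environment name "mapping" is hypothetical, the package name "hisat2" is an assumption (the schedule only says "hisat"), and the samtools version shown is an illustrative pin rather than a workshop requirement.

```python
from __future__ import annotations

import shlex

# Channel order matters: the first channel listed gets the highest priority.
CHANNELS = ["conda-forge", "bioconda"]


def conda_create_command(env_name: str, packages: dict[str, str | None]) -> list[str]:
    """Build the argument list for creating an isolated Conda environment.

    `packages` maps a package name to a version string, or to None when
    no version should be pinned.
    """
    command = ["conda", "create", "--yes", "--name", env_name]
    for channel in CHANNELS:
        command += ["--channel", channel]
    for name, version in packages.items():
        # Conda pins a version with the 'name=version' syntax.
        command.append(f"{name}={version}" if version else name)
    return command


if __name__ == "__main__":
    # Hypothetical environment; the version pin is only an illustration.
    packages = {"hisat2": None, "samtools": "1.9", "deeptools": None}
    print(shlex.join(conda_create_command("mapping", packages)))
```

Keeping the package list in one place, rather than installing tools ad hoc, means the same environment can be recreated later or on another machine.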

Prerequisites:
- Laptop running Linux or macOS
- Miniconda pre-installed (installer: https://conda.io/miniconda.html)
- Attendees should have basic familiarity with Python, Git and the command line.

Keywords:
Conda, Bioconda, snakemake, Bioconductor, reproducible research

Tools:
Conda, Bioconda, Snakemake

Contact:
Björn Grüning
