B. Grüning, Albert Ludwigs University Freiburg
Service Center: RNA Bioinformatics Center – RBC
In recent years, many areas of research have suffered from poor reproducibility. The problem is particularly acute in computationally intensive domains, where results depend on a series of complex methodological decisions that traditional publication formats do not capture well. For many years, the RBC Freiburg and de.NBI-epi Freiburg team has been leading community initiatives such as Bioconda and BioContainers, which build highly portable packages of bioinformatics software using packaging, containerization (e.g. Docker) and virtualization technologies.

Our recent COVID-19 case study underlines the strong need for reproducible analytics: it is no longer acceptable to publish results whose analytical procedures are not fully reproducible and transparent. Beyond the importance of access to raw data, we showed that an existing community effort in curating and deploying biomedical software can reliably support rapid, reproducible research during a global crisis such as the SARS-CoV-2 pandemic. We used Galaxy to access the raw read data, assemble the SARS-CoV-2 genome, estimate the timing of the most recent common ancestor (MRCA), and analyse variation within individual isolates, spike protein substitutions, and recombination and selection.

Microbiome samples and samples of body fluids collected for pathogen detection typically contain considerable amounts of host DNA, which can cause problems for certain types of analysis, in particular genome assembly. To address this issue, we developed a Galaxy tutorial that guides users through preprocessing sequencing data from bronchoalveolar lavage fluid (BALF) samples obtained from early COVID-19 patients in China. Because such samples are expected to contain a substantial proportion of human reads, the goal is to enrich the data for SARS-CoV-2 reads by identifying and discarding reads of human origin before attempting to assemble the viral genome sequence.
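The host-read removal step can be illustrated with a minimal Python sketch. It assumes that human reads have already been identified by a separate step (for example, mapping against a human reference genome) and that their read IDs are available as a set; the function names `read_fastq`, `read_id` and `discard_host_reads` are hypothetical and are not part of the Galaxy tutorial, which performs this step with Galaxy tools. The sketch only shows the filtering logic: keep every FASTQ record whose ID is not in the host set.

```python
import io
from typing import Iterator, Set, TextIO, Tuple

# A FASTQ record as (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a plain 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or trailing blank line)
            return
        seq = handle.readline().rstrip("\n")
        sep = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not sep.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, sep, qual


def read_id(header: str) -> str:
    """Return the read ID: the first whitespace-delimited token after '@'."""
    parts = header[1:].split()
    return parts[0] if parts else ""


def discard_host_reads(handle: TextIO, host_ids: Set[str]) -> Iterator[Record]:
    """Keep only records whose ID was NOT flagged as host (human) origin."""
    for record in read_fastq(handle):
        if read_id(record[0]) not in host_ids:
            yield record


# Toy example: three reads, one of which (read2) is assumed to be human.
fastq = io.StringIO(
    "@read1 sample\nACGT\n+\nIIII\n"
    "@read2\nTTGA\n+\nIIII\n"
    "@read3\nGGCC\n+\nIIII\n"
)
human_ids = {"read2"}  # hypothetical output of a prior mapping step
kept = [read_id(rec[0]) for rec in discard_host_reads(fastq, human_ids)]
print(kept)  # ['read1', 'read3']
```

For paired-end data, the same ID set would be applied to both mate files so that read pairs stay synchronized; in practice, established mapping and filtering tools are preferable to a hand-written parser like this one.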
To allow researchers to reproduce and extend the analysis, we provide all Galaxy histories, workflows and results in full detail.
To access the Galaxy tutorial: https://training.galaxyproject.org/training-material/topics/assembly/tutorials/assembly-with-preprocessing/tutorial.html