Alexander Sczyrba, Christian Henke (BiGi), Clovis Galiez, Milot Mirdita, Johannes Soeding


Athens, ECCB 2018

The amount of data generated by metagenomics is growing rapidly, making data analysis the main bottleneck on the way to novel biological insights. The goal of this tutorial is to introduce modern bioinformatic tools and pipeline construction methods that will enable you to cope efficiently with the enormous amount of metagenomic data through modular, reproducible, workflow-based analysis.

We will first give a comprehensive summary of metagenomic tools for assembly, binning and taxonomic profiling by reviewing the results from the CAMI challenge. This should give you a sense of which tools best fit your own projects. We will then introduce the Common Workflow Language (CWL), which allows you to build reproducible and flexible metagenomic workflows.

In the afternoon session, we will train you in efficient metagenomic data analysis on the protein level using the MMseqs2 software suite. Exercises will cover topics including efficient protein-level assembly, ultra-fast ORF clustering, sensitive homology search, and building goal-specific custom pipelines. Through hands-on exercises, you will learn how to build your own efficient workflows in MMseqs2 by combining its various modules.

Learning goals:
Efficient analysis of large metagenome datasets

This tutorial is aimed at bioinformatics practitioners with experience in command line usage and scripting who are interested in learning about powerful tools for the efficient analysis of even very large metagenomic datasets. More than 50% of this workshop will involve hands-on exercises.

CWL, Metagenome, MMseqs2

CWL, MMseqs2

Alexander Sczyrba

For more information see: