René Rahn, Marcel Ehrhardt, Enrico Seiler (CIBI)
Online - GCB 2021 Conference
In this half-day tutorial, we teach how to use modern C++ and modern C++ libraries to rapidly develop tools and scripts that operate on and manipulate large-scale sequencing data.
The high variability and heterogeneity often observed in genomic data is challenging for many standard tools, for example those for read alignment and variant calling. These tools are often wrapped in complicated pre- and postprocessing data curation steps in order to obtain higher-quality results. However, these additional steps impose a high maintenance and performance burden on the established workflow and often do not scale to larger data sets. C++ is seldom considered the language of choice for these small processing steps, although it is the main language used in high-performance computing. We are going to show that programming in modern C++ can be as easy as using other modern high-level languages.
Students will develop
- skills in developing an application using the C++ programming language
- skills in using modern C++ libraries (e.g. SeqAn, SDSL) to query large sequence databases
- knowledge and understanding of modern C++ features, such as ranges and concepts
- knowledge and understanding of modern, efficient data structures and algorithms that are crucial for large-scale genomic sequence analysis
- knowledge and understanding of how to develop and sustain high-quality software
This tutorial is best suited to computational biologists and bioinformaticians with a research focus on sequence analysis (e.g. genomics, metagenomics, proteomics, read alignment, variant detection). A fundamental knowledge of sequencing experiments and the data involved is required. We expect attendees to have an intermediate knowledge of programming in a high-level language, e.g. Python, Java or C++. Some basic C++ knowledge is helpful but not mandatory to complete the course successfully.
This tutorial targets beginner and intermediate C++ developers who want to learn more about modern C++ features such as ranges and concepts.
BioC++, modern C++, bioinformatics, SeqAn, FileIO
- A simple text editor
- g++ >= 7
- cmake >= 3.12