Applied Machine Learning for Biological Data

Online

Organizer & Educators:
Silvia Di Giorgio (ZB MED – Information Centre for Life Sciences - Associated Partner), Sabry Razick & Pubudu Samarakoon (University of Oslo)

Date:
Module 1: 27-28.05.2025
Module 2: 02-06.06.2025

Location:
Online

Contents:
This intensive workshop focuses on applying machine learning techniques to biological and genomic data, combining theoretical foundations with hands-on coding experience. Participants will work through real-world scenarios using Python-based tools and frameworks that are critical for modern bioinformatics.

Module 1 (optional) provides a solid foundation in scientific computing with Python. Across two half-day sessions, participants will explore essential data handling techniques using NumPy and Pandas—tools widely adopted for manipulating and analyzing biological data.

Module 2 (mandatory) spans five full days and begins with core machine learning concepts. On the first day, participants are introduced to unsupervised learning and implement clustering algorithms and dimensionality reduction techniques on real-world genomics data. The workshop then moves to supervised learning, focusing on classification and regression, including logistic regression and tree-based methods. In hands-on sessions tailored to cancer genomics datasets, participants will construct and evaluate ML models, perform cross-validation, and tune hyperparameters. Later sessions introduce deep learning concepts and the PyTorch framework: participants will learn to build and train simple neural networks and explore a deep learning-based bioinformatics tool used in genomic variant calling. The final day covers accelerated genomics through GPU-powered workflows. Participants will learn about GPU technology, use containerized bioinformatics tools, and implement high-performance, GPU-accelerated pipelines using Parabricks.

Please note: Registration is required only for Module 2. Module 1 is optional and does not require registration.

This workshop offers a comprehensive, practical journey through the machine learning landscape in bioinformatics, from data wrangling to deep learning and scalable genomic workflows.

Learning goals:
By the end of this workshop, you will be able to:

Apply data manipulation techniques using NumPy and Pandas.
Define essential machine learning terminology and differentiate between supervised and unsupervised learning approaches.
Implement and evaluate regression and classification models on biological datasets through hands-on coding exercises.
Apply regularization techniques and hyperparameter tuning to optimize model performance while preventing overfitting.
Analyze biological questions to determine the most appropriate machine learning approach (regression, classification, clustering).
Interpret and evaluate machine learning models using appropriate metrics and cross-validation techniques to ensure reliability.
Develop scripts using PyTorch to build and train simple neural networks and implement deep learning-based bioinformatics tools using genomics datasets.
Design end-to-end machine learning workflows for biological applications, from data preprocessing to model deployment.
Implement containerization using Docker to enhance reproducibility and scalability in bioinformatics workflows.
Compare CPU-native versus GPU-accelerated approaches for genomic data processing and identify computational bottlenecks.

Prerequisites:

This workshop is aimed at you if you are:

A life scientist, bioinformatician, or data analyst working with biological or genomic data
Curious about how machine learning can be applied to biological research questions
Looking to strengthen your Python skills for data handling and analysis
Interested in implementing classification, regression, or clustering models on real-world datasets
Exploring the use of deep learning techniques in bioinformatics
Involved in next-generation sequencing (NGS) workflows and wanting to optimize them with GPU acceleration
Committed to building reproducible and scalable analysis pipelines using container technology
Eager to understand and apply best practices in model evaluation, tuning, and validation
New to machine learning and seeking a hands-on, structured introduction

Keywords:
Machine Learning

Tools:
Python

Registration:
https://www.cecam.org/workshop-details/applied-machine-learning-for-biological-data-1459
