Zeller and Bork Group, EMBL Heidelberg, Service center: Heidelberg Center for Human Bioinformatics – HD-HuB
A primary goal of microbiome data analysis is to determine changes in community composition that are associated with environmental factors. In particular, linking human microbiome composition to host phenotypes such as diseases has become an area of intense research: using machine learning, the human microbiome is increasingly mined for diagnostic and therapeutic biomarkers, but tools suited to this task are scarce. The Zeller Group has therefore developed SIAMCAT, a pipeline for Statistical Inference of Associations between Microbial Communities And host phenoTypes, which is one of EMBL’s de.NBI/ELIXIR-DE offerings.

SIAMCAT is a versatile R toolbox for comparative metagenome analysis that combines machine learning, statistical modeling, and advanced visualization approaches. It also includes functionality to identify and visually explore confounding factors. SIAMCAT is part of a larger framework of microbiome-related tools (https://microbiome-tools.embl.de/), all of which are offered via de.NBI/ELIXIR-DE.

A publication under review (see preprint) describes the SIAMCAT computational workflows and presents a large-scale meta-analysis of 50 case-control microbiome disease association studies comprising more than 10,000 samples in total. It also discusses solutions to the technical issues that hamper naive cross-study application of machine learning approaches.