Johanna Nelkner, PhD student at Bielefeld University, Germany, discusses her experiences with the handling of biological Big Data, the analysis of microbial community data, resulting challenges and how they can be solved with de.NBI services.


Fig.: A-D: experimental setup of the study. E-F: the direct single-read-based metagenomic approach with MGX. G-J: the assembly-binning-based approach yielding metagenomically assembled genomes and their estimated abundance. Created with BioRender.

I’m a member of the Genome Research of Industrial Microorganisms (GRIM) research group led by Prof. Dr. Alfred Pühler at the Center for Biotechnology (CeBiTec) at Bielefeld University. Our group has a lot of experience in the analysis of genomic and metagenomic data, with a focus on biogas-producing microbial communities originating from biogas plants. Our shared interest in microbiome analysis has led to a strong cooperation with the Computational Metagenomics group led by Prof. Dr. Alexander Sczyrba at Bielefeld University.

For my PhD, I was excited to study microbial communities from agricultural soils with a view to sustainable crop production. The importance of agriculturally used soil for our environment is enormous. In Germany, more than 50% of the land is used for agriculture. Soil is the basis of our food chain and also plays an important role for our climate. We aimed to analyze the effect of long-term farming practices (conventional vs. preserving) on the microbial community.

Little did I know that extracting results from soil metagenome data would be far more complicated than expected. First, soil harbors a huge diversity of microorganisms. While we can typically classify between 500 and 1000 different microbial genera in biogas plants, we found more than 3500 genera in agricultural soil. As a result, deep sequencing is required to capture the entire metagenome and to be able to assemble the sequencing reads later. Assembly provides substantial insights into the metabolic potential of the microorganisms and allows conclusions on their functional role in the microbial ecosystem. At the same time, this generates larger volumes of biological Big Data. Processing this amount of data requires considerable computational power. Without the resources of the de.NBI Cloud, this would have been impossible to achieve.

Metagenomic sequencing data can be analyzed directly by classifying short reads taxonomically and functionally. For this task, I used the de.NBI tool MGX, which provides a graphical user interface, many workflows and comparisons against different databases, e.g., for taxonomic classification.

When we compare microbial communities from biogas plants by direct classification of single reads, we usually see differences in taxonomic composition already at the phylum level. For the agricultural soil data from a long-term field experiment, this was not the case. Even at the genus level (high resolution), we couldn’t detect pronounced differences with the single-read classification approach. So we had to dig deeper (not in the soil, but in the data).
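To make the comparison at different taxonomic levels more concrete, the following is a minimal sketch of how per-read classifications could be summarized into relative abundances at a chosen rank. It is not how MGX works internally; the function name `relative_abundance`, the dictionary layout of a classified read and the toy taxa are all assumptions made for illustration.

```python
from collections import Counter


def relative_abundance(read_lineages, rank):
    """Return the fraction of reads assigned to each taxon at `rank`.

    `read_lineages` is a hypothetical input format: one dict per
    classified read, mapping rank names to taxon names. Reads that
    carry no assignment at `rank` are ignored.
    """
    counts = Counter(
        lineage[rank] for lineage in read_lineages if rank in lineage
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Toy data (made up): four classified reads from one sample.
sample_reads = [
    {"phylum": "Firmicutes", "genus": "Clostridium"},
    {"phylum": "Firmicutes", "genus": "Bacillus"},
    {"phylum": "Proteobacteria", "genus": "Pseudomonas"},
    {"phylum": "Proteobacteria"},  # classified only down to phylum
]

phylum_profile = relative_abundance(sample_reads, "phylum")
genus_profile = relative_abundance(sample_reads, "genus")
```

Profiles computed this way for two samples can then be compared rank by rank, which is the kind of comparison that showed clear differences for biogas plants but not for the soil samples.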

To uncover information hidden in a microbial consortium and to follow our hypothesis that the soil management practices we studied have an impact on the soil microbial community, we followed the assembly-binning-based approach for the analysis of metagenomic sequencing data. With this approach, the original genomes in the metagenome can potentially be separated, so that each species of a microbial community can be analyzed individually. For this purpose, Alexander Sczyrba’s group developed an automated software pipeline, the Elastic Metagenome Browser, or EMGB for short. Within the EMGB workflow, metagenomic sequencing data is automatically pre-processed and assembled, and the assembled contigs are then binned into metagenomically assembled genomes (MAGs). Using this assembly and binning approach, we were able to reconstruct genomes from the metagenome. We then analyzed the differential abundance of our MAGs in the soil treatments by read mapping (also generated within EMGB). Following this approach, we found differences between our soil treatments at the level of MAGs, which represent single species of the microbial community.
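The idea of estimating MAG abundance from read mapping and comparing it between treatments can be sketched as follows. This is only an illustration under stated assumptions, not the EMGB implementation: the length normalization, the log2 fold change with a pseudocount, the function names and all numbers are hypothetical, and a real analysis would use replicates and a statistical test.

```python
import math


def mag_relative_abundance(mapped_reads, mag_lengths):
    """Length-normalized share of mapped reads per MAG in one sample.

    Both arguments are dicts keyed by MAG name (assumed layout):
    `mapped_reads` holds read counts, `mag_lengths` holds MAG sizes in bp.
    """
    # Reads per base pair, so that large genomes are not over-counted.
    per_bp = {
        mag: mapped_reads.get(mag, 0) / length
        for mag, length in mag_lengths.items()
    }
    total = sum(per_bp.values())
    return {mag: (v / total if total else 0.0) for mag, v in per_bp.items()}


def log2_fold_change(abund_a, abund_b, pseudocount=1e-6):
    """log2 ratio of abundance in sample B over sample A for each MAG."""
    return {
        mag: math.log2((abund_b[mag] + pseudocount) / (abund_a[mag] + pseudocount))
        for mag in abund_a
    }


# Toy data (made up): two MAGs, one sample per soil treatment.
lengths = {"MAG_1": 2_000_000, "MAG_2": 4_000_000}
conventional = mag_relative_abundance({"MAG_1": 1000, "MAG_2": 2000}, lengths)
preserving = mag_relative_abundance({"MAG_1": 3000, "MAG_2": 2000}, lengths)

fold_changes = log2_fold_change(conventional, preserving)
```

In this toy example, a positive value for a MAG means it is relatively more abundant under the second treatment, and a negative value means the opposite.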

In conclusion: for soil metagenomic data, the devil is in the details, and de.NBI tools helped me tackle the devil.

The bioinformatics tools mentioned here are de.NBI services and are freely available to all life science researchers. de.NBI also offers training courses on how to use the tools.

For any questions about the research project, please read the open-access publication or contact Johanna Nelkner.