Simon Maria Zumkeller

Research Focus

DeepCRE: deep learning applications for the identification and functional annotation of cis-regulatory elements in crops

Transcription factors (TFs) are proteins that bind specific DNA sites, known as cis-regulatory elements (CREs), at their target genes. These TF-DNA interactions determine the transcription rates of target genes and differ across cell types and physiological conditions. Inferring TF binding sites is therefore key to understanding plant gene regulatory networks. However, inferring such binding sites and their cognate TFs is a complex task. In the comparatively small eukaryotic genome of Arabidopsis thaliana alone, thousands of TFs from different protein families have been characterised by the DNA-binding domains they encode. With 27,655 potential target genes in A. thaliana, there are approximately 5 to 20 million potential TF-DNA binding site interactions. While experimental assays such as ChIP-seq and DNase-seq can identify some of these interactions, annotating regulatory sites on a genome scale remains a major challenge.
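A common first step in locating candidate binding sites is to scan sequences for a known motif on both DNA strands. The sketch below is a minimal illustration under stated assumptions: it does an exact-match search for the W-box core `TTGAC` (recognised by WRKY TFs) in a toy sequence. The function names and the sequence are hypothetical and do not represent the DeepCRE method. Short motifs like this one match frequently by chance, which is one reason why motif scanning alone does not solve genome-scale annotation.

```python
# Minimal sketch: exact-match scan for a short motif on both DNA strands.
# Assumption: the motif (W-box core "TTGAC", bound by WRKY TFs) and the toy
# sequence are illustrative only; this is not DeepCRE code or data.
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_motif(seq: str, motif: str) -> List[Tuple[int, str]]:
    """Return (start, strand) for every exact motif hit on either strand.

    Start positions are 0-based coordinates on the forward strand.
    """
    seq, motif = seq.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        # A palindromic motif equals its reverse complement; report it only once.
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(scan_motif("ATTGACCGGTCAAGTTGACT", "TTGAC"))
    # → [(1, '+'), (8, '-'), (14, '+')]
```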
In my project, I address key aspects of this challenge using deep learning. Specifically, I use convolutional neural networks to combine gene sequence data with information on DNA-protein interactions and gene expression patterns. We are currently focusing on genome-scale annotation of regulatory sequences in four species: Zea mays, Arabidopsis thaliana, Solanum lycopersicum and Sorghum bicolor. Working across species allows us to target not only species-specific but also evolutionarily conserved regulatory sequence features.
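Convolutional networks for DNA typically take one-hot encoded sequence as input (A, C, G and T as four channels) and slide filters along it, so that each filter acts much like a position weight matrix and pooling reports its strongest match. The following is a minimal, framework-free sketch of that first layer only, assuming a single hand-set filter in place of learned weights. It covers just the sequence input and not the integration of protein-DNA interaction or expression data. The helper `motif_filter` and the bias value are hypothetical choices for illustration, not part of the DeepCRE architecture.

```python
# Minimal sketch of the first layer of a DNA CNN: one-hot encoding,
# one 1D convolution filter, ReLU and global max pooling.
# Assumption: filter weights are hand-set for illustration (a real CNN learns
# them); this does not reproduce the DeepCRE architecture.
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[float]]:
    """Encode a DNA sequence as a (length x 4) matrix; N maps to all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(x: List[List[float]], kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """'Valid' (unpadded) 1D convolution of a one-hot matrix with one filter."""
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        s = bias
        for j in range(k):
            for c in range(len(BASES)):
                s += x[i + j][c] * kernel[j][c]
        scores.append(s)
    return scores


def relu(values: List[float]) -> List[float]:
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


def motif_filter(motif: str, match: float = 1.0, mismatch: float = -1.0) -> List[List[float]]:
    """Hypothetical helper: a filter rewarding the motif base at each position."""
    return [[match if b == base else mismatch for b in BASES] for base in motif.upper()]


if __name__ == "__main__":
    x = one_hot("GGTTGACTTA")
    activations = relu(conv1d(x, motif_filter("TTGAC"), bias=-3.0))
    print(activations)       # → [0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
    print(max(activations))  # global max pooling → 2.0
```

The negative bias acts as a detection threshold: only windows with at most one mismatch to the motif produce a positive activation, which mirrors how a trained filter responds selectively to sequence patterns.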

Our data resources include public repositories as well as custom datasets provided by our partner labs: the Usadel lab at HHU and the group of Thomas Hartwig at MPIPZ.

If you are interested in collaboration or have questions, please contact me.

Simon Maria Zumkeller


Institute of Bio- and Geosciences - Bioinformatics (IBG-4)
Forschungszentrum Jülich

Wilhelm-Johnen-Straße

Building 14.6y, Room 4044

52428 Jülich

Omics and Data-based Bioinformatics - Prof. Dr. Björn Usadel 

Group of Network Analyses and Modelling - Dr. Jedrzej Jakub Szymanski

Code and implementation on GitHub

Heinrich Heine University
University of Cologne
Max Planck Institute for Plant Breeding Research
Forschungszentrum Jülich