DeepCRE - deep learning applications for identification and functional annotation of cis-regulatory elements in crops
Transcription factors (TFs) are proteins that bind specific DNA sites that act as cis-regulatory elements (CREs) at their target genes. These TF-DNA interactions determine the transcription rates of target genes and vary across cell types and changing physiological conditions. Inferring TF binding sites is therefore key to understanding plant gene regulatory networks. However, inferring such binding sites and their cognate TFs is a complex task. In the small eukaryotic genome of Arabidopsis thaliana alone, thousands of TFs from different protein families have been characterised by their DNA-binding domains. With 27,655 target genes in A. thaliana, there are approximately 5 to 20 million potential TF-DNA binding site interactions. Experimental assays such as ChIP-seq and DNase-seq can identify some of these interactions, but annotating regulatory sites on a genome scale remains a major challenge.
In my project, I address key aspects of this challenge using deep learning. Specifically, I use convolutional neural networks to combine gene sequence data with information on DNA-protein interactions and gene expression patterns. We are currently focusing on genome-scale annotation of regulatory sequences in four species: Zea mays, Arabidopsis thaliana, Solanum lycopersicum and Sorghum bicolor. Working across these species lets us target not only species-specific but also evolutionarily conserved regulatory sequence features.
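To make the convolutional approach more concrete, here is a minimal sketch of the forward pass of a one-filter CNN on DNA. The sequence is one-hot encoded, a filter scans it much like a motif detector, ReLU and global max pooling keep the strongest match, and a logistic unit turns that into a score. This is an illustration under stated assumptions, not the DeepCRE architecture: the helper names (`one_hot`, `motif_filter`, `conv1d`, `predict`) are hypothetical, the filter is hand-built from an example consensus (`TATA`), all weights are made up rather than trained, and the model uses sequence alone. The output could only be read as, say, the probability of a high-expression class if the network were actually trained on expression labels.

```python
import math

# Fixed nucleotide order for one-hot encoding (an illustrative choice).
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as a list of [A, C, G, T] vectors; ambiguous bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def motif_filter(motif):
    """Hypothetical hand-built filter: +1 for the consensus base, -1 for any other base.
    In a real CNN these weights would be learned from data."""
    return [[1.0 if b == base else -1.0 for b in BASES] for base in motif.upper()]


def conv1d(x, kernel, bias=0.0):
    """'Valid' 1D cross-correlation of an (L x 4) input with a (k x 4) filter,
    the operation a CNN's first convolutional layer applies at every position."""
    k = len(kernel)
    scores = []
    for i in range(len(x) - k + 1):
        s = bias
        for j in range(k):
            for c in range(len(BASES)):
                s += x[i + j][c] * kernel[j][c]
        scores.append(s)
    return scores


def relu(values):
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


def predict(seq, kernel, bias, weight, out_bias):
    """Toy one-filter CNN: convolution -> ReLU -> global max pooling -> logistic output."""
    activations = relu(conv1d(one_hot(seq), kernel, bias))
    pooled = max(activations) if activations else 0.0
    return 1.0 / (1.0 + math.exp(-(weight * pooled + out_bias)))


if __name__ == "__main__":
    kernel = motif_filter("TATA")
    # With bias -3.0, only an exact TATA match (raw score 4) yields a positive activation.
    p_hit = predict("GGCTATAAAGG", kernel, bias=-3.0, weight=4.0, out_bias=-2.0)
    p_miss = predict("GGCGCGCGGCC", kernel, bias=-3.0, weight=4.0, out_bias=-2.0)
    print(f"with motif: {p_hit:.3f}, without motif: {p_miss:.3f}")  # with motif: 0.881, without motif: 0.119
```

In a trained network, many filters run in parallel, their weights are learned rather than hand-set, and further layers combine their outputs. The hand-set filter here only shows why first-layer convolutional filters are often interpreted as learned sequence motifs.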
Our data resources include public data repositories as well as custom datasets provided by our partner labs: the Usadel lab at HHU and the group of Thomas Hartwig at MPIPZ.
If you are interested in collaboration or have questions, please contact me.