Z1: Bioinformatics core - semantic knowledge fusion and non-standard optimization - Single-cell sequencing core
The Z1 project supports all CRC 1074 scientists with the bioinformatic analysis of NGS data. Besides standardized analyses, this support includes integrating heterogeneous data from public databases and semantic domain knowledge, as well as generating interpretable, sparse diagnostic models for in-depth analysis of the results. These tasks require efficiently parallelizable algorithms that run on multi-core hardware. Accordingly, we will continue to adapt our algorithms for clustering, classification and feature selection to the growing computational demands. We will do this by implementing parallelization strategies, by using non-standard probabilistic approaches such as genetic algorithms for feature selection, or both.
In addition to solutions tailored to individual types of NGS data, automated and standardized workflows are needed to process the growing amounts of data efficiently and in a well-defined manner. We will address this need by incorporating recent developments in virtualization technologies and by rapidly prototyping specialized container pipelines for data analysis. Specifically, we will extend and optimize both community-driven and self-developed pipeline frameworks via two strategies. First, we will use software stack containers (Docker/Singularity). Second, we will extend the existing Nextflow-based nf-core pipelines to process different sequencing readouts. Feedback from CRC participants will be part of this agile development process.
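Genetic algorithms are named above as one non-standard probabilistic approach to feature selection. The following is a minimal, generic sketch of that technique, not the Z1 implementation. It assumes that a feature subset is encoded as a bit mask. It also uses tournament selection, one-point crossover, bit-flip mutation and elitism, which are common textbook choices and not taken from the project. `toy_fitness` and `INFORMATIVE` are hypothetical stand-ins. In a real analysis, the fitness would more plausibly be something like the cross-validated accuracy of a sparse classifier trained on the selected features, minus a penalty for subset size.

```python
import random
from typing import Callable, List

# A chromosome is a bit mask over the feature columns: 1 = keep, 0 = drop.
Chromosome = List[int]


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Chromosome], float],
    pop_size: int = 30,
    generations: int = 50,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Chromosome:
    """Evolve feature subsets and return the fittest mask found (needs n_features >= 2)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament() -> Chromosome:
        # Tournament selection: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)  # never worse than before, thanks to elitism
    return best


# Hypothetical toy problem: 10 features, of which only 0, 3 and 7 carry signal.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask: Chromosome) -> float:
    hits = sum(1 for i, g in enumerate(mask) if g and i in INFORMATIVE)
    noise = sum(mask) - hits
    return hits - 0.5 * noise  # reward informative features, penalize the rest (sparsity)


best = genetic_feature_selection(10, toy_fitness)
print("selected features:", [i for i, g in enumerate(best) if g])
```

Within each generation, the fitness evaluations are independent of one another. They are therefore the natural place to apply the multi-core parallelization mentioned above, for example by mapping `fitness` over the population with `concurrent.futures`.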
The same pipeline-based approach will also be applied to a new task of the Ulm Core Unit Bioinformatics: single-cell sequencing analysis of three different readouts, namely RNA (scRNA-seq), mapping of open chromatin loci with active regulatory elements (scATAC-seq) and targeted DNA sequencing (scDNA-seq). Furthermore, Z1 will contribute to the integration of multi-omics data sets, including different types of both bulk and single-cell sequencing data.