Finding new overlapping genes and their theory (FOG-Theory)
The general goal of the project is to find and verify new overlapping protein-coding DNA-sequences in prokaryotes and to understand the underlying mechanisms with the help of models from information and communication theory. To reach these goals, a cooperation of three groups is necessary, namely a group performing in vivo and in vitro molecular biology experiments, an informatic group which can handle the huge amount of widely dis- tributed data on gene sequences, and a group working in information and communication theory. With methods from information theory, especially from error correcting codes, the process of coding proteins via embedded genes will be studied, using new distance measures. Further, the powerful concept of random coding will be used to obtain bounds. Embedded genes will be analysed using a coding theoretic approach. Communication theory provides models and mechanisms in order to transmit information reliably over channels which intro- duce errors. Evolution, as well as the process of coding proteins by overlapping genes, can be viewed as such a communication system. Both will be described and analysed with the theory from communication systems, including synchronization mechanisms. The parameters of the models need to be verified and/or determined. Therefore, aspects of bioinformatics and molecular biology are essential. Algorithms will be developed which efciently search data bases at a large scale for new protein-coding DNA-sequences in prokaryotes, embedded in annotated genes in overlapping alternative reading frames. Based on these results, experimental evaluation of embedded genes using molecular biology tools to determine function of selected candidate genes will be performed.