Identification of functionally important protein residues by means of entropy-based methods, and experimental validation by mutational analysis
In order to understand the structural basis of protein function, all residues (i.e., amino acid side chains) contributing to this function have to be identified. However, neither the amino acid sequence nor the three-dimensional structure of a protein allows one to judge which residues are important and which are irrelevant. With the recent advent of hundreds of completely sequenced genomes, one can generate multiple sequence alignments (MSAs) of proteins sharing the same function; for each position, these provide the set of accepted residues. These sets reflect the mutations that have accumulated since the last common ancestor of cellular life and that either increased fitness or were at least neutral. Thus, a high degree of residue conservation at a given position indicates that the residue is important for the function of the protein. Quantifying conservation requires information-theoretic approaches. One group of important residues are the strictly conserved ones, which can be identified easily. A second group of informative residues are those that mutate in a correlated manner. Such correlations, which can be identified by methods from communication theory, will unravel functionally important residues and the intra-molecular networks connecting them. Moreover, inter-molecular correlations can be exploited in an analogous manner to identify functional interactions across protein-protein interfaces.

It is the aim of our project to develop sensitive and statistically validated methods based on information theory for the identification of intra- and inter-molecular residue correlations and functional networks. We will design a comprehensive, entropy-based model for co-evolving residue positions. In particular, we will take similarities between elements of the sample space into account when calculating entropies.
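The entropy-based quantities named above can be illustrated with a short sketch. The following Python code is a minimal, hypothetical example, not the project's method. It computes the Shannon entropy of an MSA column (conservation), the mutual information between two columns (correlated mutation), and, as one possible way of accounting for similarities between residues, a similarity-sensitive entropy of order 1 in the style of Leinster and Cobbold. The toy alignment, the `toy_similarity` function with its hydrophobic grouping and similarity value of 0.5, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residue distribution in one MSA column."""
    n = len(column)
    counts = Counter(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def joint_entropy(col_i, col_j):
    """Shannon entropy (bits) of the residue-pair distribution of two columns."""
    return column_entropy(list(zip(col_i, col_j)))


def mutual_information(col_i, col_j):
    """MI(i;j) = H(i) + H(j) - H(i,j); high values hint at co-variation."""
    return column_entropy(col_i) + column_entropy(col_j) - joint_entropy(col_i, col_j)


def similarity_entropy(column, similarity):
    """Similarity-sensitive entropy of order 1 (Leinster-Cobbold style):
    H_Z(p) = -sum_a p_a * log2((Z p)_a), where Z[a][b] = similarity(a, b)
    lies in [0, 1] and Z[a][a] = 1. With identity similarity this is Shannon."""
    n = len(column)
    p = {a: c / n for a, c in Counter(column).items()}
    h = 0.0
    for a, pa in p.items():
        # (Z p)_a: probability mass of residues similar to a; always >= p_a > 0
        zp = sum(similarity(a, b) * pb for b, pb in p.items())
        h -= pa * math.log2(zp)
    return h


# Hypothetical residue grouping, chosen only to illustrate the idea
HYDROPHOBIC = set("ILVM")


def toy_similarity(a, b):
    """Hypothetical similarity: 1 for identity, 0.5 within HYDROPHOBIC, else 0."""
    if a == b:
        return 1.0
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return 0.5
    return 0.0


# Toy alignment: 4 sequences x 4 positions (invented data)
msa = ["ACDK",
       "ACEK",
       "AIDR",
       "AIER"]
# Transpose rows into alignment columns
columns = ["".join(col) for col in zip(*msa)]

if __name__ == "__main__":
    for k, col in enumerate(columns):
        print(k, col, round(column_entropy(col), 3))
    print("MI(1,3) =", round(mutual_information(columns[1], columns[3]), 3))  # → 1.0
    print("MI(1,2) =", round(mutual_information(columns[1], columns[2]), 3))  # → 0.0
    print("H_Z(ILIL) =", round(similarity_entropy("ILIL", toy_similarity), 3))
```

In this toy alignment, column 0 is strictly conserved (entropy 0), columns 1 and 3 co-vary perfectly (MI of 1 bit), and columns 1 and 2 vary independently (MI of 0). The similarity-sensitive entropy rates the I/L column as less variable than the Shannon entropy does, because I and L are treated as partially interchangeable. Raw MI from real alignments is confounded by shared phylogeny and finite sample size, so corrections (for example, the average product correction) and significance estimates would be needed for a statistically validated method.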
The importance of residues predicted to be involved in such networks will be tested experimentally by mutational analysis of well-characterized enzymes. In this project, model building, program development, and validation depend critically on feedback from biochemical and biophysical experiments.