PIER: Protein IntErface Recognition


Recent advances in structural proteomics call for the development of fast and reliable automatic methods for predicting functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress, the problem is still far from solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments (MSA) projected onto the protein surface. The common drawback of such methods is their limited applicability to proteins with a sparse set of sequence homologs, as well as their inability to detect interfaces in evolutionarily variable regions.

In this study, we developed an improved method for predicting interfaces from a single protein structure, based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition method (PIER) achieved an overall precision of 60% at a recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric and 196 transient interfaces (compared to the 25% precision at 50% recall expected from random residue function assignment). For 70% of the proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall.
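The residue-level metric quoted above, precision at a fixed recall threshold, can be illustrated with a minimal Python sketch. It is not PIER's scoring code: the function names `precision_at_recall` and `random_baseline`, the toy scores, and the rule of taking the shortest ranked list that reaches the target recall are all assumptions made for illustration. The sketch also shows why random assignment is expected to give a precision equal to the fraction of residues that belong to the interface.

```python
def precision_at_recall(scores, labels, target_recall=0.5):
    """Precision of the shortest top-ranked residue list whose recall
    reaches target_recall.

    scores: hypothetical per-residue interface scores (higher = more likely).
    labels: 1 for a true interface residue, 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("labels contain no interface residues")
    # Rank residues from highest to lowest score; ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for n_predicted, (_, is_interface) in enumerate(ranked, start=1):
        true_pos += is_interface
        if true_pos / total_pos >= target_recall:
            return true_pos / n_predicted
    # Only reached if target_recall > 1; fall back to precision over all residues.
    return total_pos / len(labels)


def random_baseline(labels):
    """Expected precision of random assignment: the interface fraction."""
    return sum(labels) / len(labels)


# Toy example (made-up numbers, not benchmark data).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 1, 0, 1]
print(precision_at_recall(scores, labels))  # 2 true positives in the top 3
print(random_baseline(labels))
```

Under this reading, the 25% random baseline quoted for the benchmark corresponds to roughly one in four of the considered residues belonging to an interface.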

We demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually degraded the predictions. Thorough benchmarking on other datasets from the literature showed that PIER performed better than several alignment-free and alignment-dependent prediction methods.

The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects. This work is supported by NIH grant 5-R01-GM071872-02.