“School of Biological Sciences”
Paper IPM / Biological Sciences / 16455
Abstract:
Background: Predicting physical interaction between proteins is one of the greatest
challenges in computational biology. There is a considerable variety of protein interactions,
and a huge number of protein sequences and synthetic peptides have unknown
interacting counterparts. Most co-evolutionary methods detect a combination
of physical interactions and functional associations, and only a handful
of approaches specifically infer physical interactions. Hybrid co-evolutionary
methods exploit inter-protein residue coevolution to identify specific physically interacting
proteins. In this study, we introduce a hybrid co-evolutionary approach
to predict physical interactions between pairs of protein families, starting from protein
sequences only.
Results: In the present analysis, pairs of multiple sequence alignments are constructed
for each dimer, and the covariation between residues in those pairs is
calculated by CCMpred (Contacts from Correlated Mutations predicted) and three
mutual information based approaches for ten accessible surface area threshold groups.
Then, all residue couplings between the proteins of each dimer are combined into a single
Frobenius norm value. The norms of the residue contact matrices of all dimers at different
accessible surface area thresholds are fed into a support vector machine as single-feature or
multiple-feature models. Training the classifiers on single features shows
no apparent difference in accuracy among the methods across accessible surface
area thresholds. Nevertheless, the mutual information product and context likelihood of
relatedness procedures tend to show, respectively, somewhat higher and somewhat lower overall
performance than the other two methods across accessible surface area cut-offs. The
results also demonstrate that training the support vector machine with multiple norm
features for several accessible surface area thresholds leads to a considerable improvement
in prediction performance. In this setting, CCMpred achieves a roughly better overall
performance than the mutual information based approaches. The best accuracy,
sensitivity, specificity, precision and negative predictive value for that method are 0.98,
1, 0.962, 0.96, and 0.962, respectively.
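The norm-based feature construction described in the Results can be illustrated with a minimal sketch. It assumes that, for each accessible surface area threshold, the inter-protein residue couplings of a dimer are already available as a rectangular score matrix (for example, from CCMpred or a mutual information method). The function names, the threshold values, and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def frobenius_norm(matrix):
    """Return the Frobenius norm: sqrt of the sum of squared entries."""
    return math.sqrt(sum(value * value for row in matrix for value in row))


def norm_features(couplings_by_threshold):
    """Build one feature per threshold, ordered by increasing threshold.

    couplings_by_threshold maps a (hypothetical) accessible surface area
    threshold to the inter-protein coupling matrix of a single dimer.
    """
    return [
        frobenius_norm(couplings_by_threshold[threshold])
        for threshold in sorted(couplings_by_threshold)
    ]


# Toy coupling scores for one dimer at two made-up thresholds.
toy_dimer = {
    0.1: [[3.0, 0.0], [0.0, 4.0]],
    0.2: [[1.0, 2.0], [2.0, 0.0]],
}
features = norm_features(toy_dimer)
print(features)
```

A single-feature model would use one entry of such a vector per dimer, while a multiple-feature model would pass the whole vector to the classifier.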
Conclusions: In this paper, by feeding norm values of protein dimers at different
accessible surface area thresholds into support vector machines, we demonstrate that
even a small number of proteins in pairs of multiple alignments can allow one to accurately
discriminate between positive and negative dimers.
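For reference, the five performance measures quoted in the Results follow the standard confusion-matrix definitions, sketched below. The counts in the example are invented for illustration only and are not the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification measures from counts.

    tp/fn count positive dimers predicted correctly/incorrectly;
    tn/fp count negative dimers predicted correctly/incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```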