TY - JOUR
T1 - Deep learning integration of molecular and interactome data for protein–compound interaction prediction
AU - Watanabe, Narumi
AU - Ohnuki, Yuuto
AU - Sakakibara, Yasubumi
N1 - Funding Information:
This work was supported by a Grant-in-Aid for Scientific Research on Innovative Areas “Frontier Research on Chemical Communications”[No. 17H06410] from the Ministry of Education, Culture, Sports, Science and Technology of Japan and a Grant-in-Aid for Scientific Research (A) (KAKENHI) [No. 18H04127] from the JSPS.
Publisher Copyright:
© 2021, The Author(s).
PY - 2021/12
Y1 - 2021/12
N2 - Motivation: Virtual screening, which can computationally predict the presence or absence of protein–compound interactions, has attracted attention as a large-scale, low-cost, and short-term search method for seed compounds. Existing machine learning methods for predicting protein–compound interactions are largely divided into those based on molecular structure data and those based on network data. The former utilize information on proteins and compounds, such as amino acid sequences and chemical structures; the latter rely on interaction network data, such as protein–protein interactions and compound–compound interactions. However, there have been few attempts to combine both types of data in molecular information and interaction networks. Results: We developed a deep learning-based method that integrates protein features, compound features, and multiple types of interactome data to predict protein–compound interactions. We designed three benchmark datasets with different difficulties and applied them to evaluate the prediction method. The performance evaluations show that our deep learning framework for integrating molecular structure data and interactome data outperforms state-of-the-art machine learning methods for protein–compound interaction prediction tasks. The performance improvement is statistically significant according to the Wilcoxon signed-rank test. This finding reveals that the multi-interactome data captures perspectives other than amino acid sequence homology and chemical structure similarity and that both types of data synergistically improve the prediction accuracy. Furthermore, experiments on the three benchmark datasets show that our method is more robust than existing methods in accurately predicting interactions between proteins and compounds that are unseen in training samples.
AB - Motivation: Virtual screening, which can computationally predict the presence or absence of protein–compound interactions, has attracted attention as a large-scale, low-cost, and short-term search method for seed compounds. Existing machine learning methods for predicting protein–compound interactions are largely divided into those based on molecular structure data and those based on network data. The former utilize information on proteins and compounds, such as amino acid sequences and chemical structures; the latter rely on interaction network data, such as protein–protein interactions and compound–compound interactions. However, there have been few attempts to combine both types of data in molecular information and interaction networks. Results: We developed a deep learning-based method that integrates protein features, compound features, and multiple types of interactome data to predict protein–compound interactions. We designed three benchmark datasets with different difficulties and applied them to evaluate the prediction method. The performance evaluations show that our deep learning framework for integrating molecular structure data and interactome data outperforms state-of-the-art machine learning methods for protein–compound interaction prediction tasks. The performance improvement is statistically significant according to the Wilcoxon signed-rank test. This finding reveals that the multi-interactome data captures perspectives other than amino acid sequence homology and chemical structure similarity and that both types of data synergistically improve the prediction accuracy. Furthermore, experiments on the three benchmark datasets show that our method is more robust than existing methods in accurately predicting interactions between proteins and compounds that are unseen in training samples.
KW - Deep learning
KW - Heterogeneous interaction network
KW - Integration
KW - Protein–compound interaction
UR - http://www.scopus.com/inward/record.url?scp=85105131356&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105131356&partnerID=8YFLogxK
U2 - 10.1186/s13321-021-00513-3
DO - 10.1186/s13321-021-00513-3
M3 - Article
AN - SCOPUS:85105131356
SN - 1758-2946
VL - 13
JO - Journal of Cheminformatics
JF - Journal of Cheminformatics
IS - 1
M1 - 36
ER -