Integrating statistical predictions and experimental verifications for enhancing protein-chemical interaction predictions in virtual screening

Nobuyoshi Nagamine, Takayuki Shirakawa, Yusuke Minato, Kentaro Torii, Hiroki Kobayashi, Masaya Imoto, Yasubumi Sakakibara

Research output: Contribution to journal › Article

36 Citations (Scopus)

Abstract

Predicting interactions between target proteins and potential leads is of great benefit in the drug discovery process. We present a broadly applicable statistical method for predicting interactions between any protein and chemical compound. The method requires only protein sequence data and chemical structure data, and uses support vector machines (SVMs) as the statistical learning method. Because such comprehensive predictions can involve many false positives, we propose two approaches for reducing them: (i) efficient use of multiple statistical prediction models within a two-layer SVM framework, and (ii) careful design of the negative data used to construct the prediction models. In the two-layer SVM, the outputs of the first-layer SVM models, which are built from different negative samples and reflect different aspects of the classification, serve as inputs to the second-layer SVM. To design negative data that produce fewer false positive predictions, we iteratively construct SVM models (classification boundaries) from positive and tentative negative samples and select additional negative sample candidates according to predetermined rules. Moreover, to take full advantage of statistical learning methods, we propose a strategy for feeding experimental results back into the computational predictions while taking the biological effects of interest into account. We demonstrate the usefulness of our approach by predicting potential ligands that bind to human androgen receptors from more than 19 million chemical compounds and verifying these predictions by in vitro binding. We then use this experimental validation as feedback to enhance subsequent computational predictions, and experimentally validate those predictions again. This efficient procedure of iterating in silico prediction and in vitro or in vivo experimental verification with sufficient feedback enabled us to identify novel ligand candidates that are distant from known ligands in chemical space.
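The two-layer idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 2-D toy features, the two negative pools (`pool_b`, `pool_c`), the helper names, and the use of a plain linear SVM trained by batch subgradient descent on the regularized hinge loss in place of whatever SVM implementation and kernel the authors used. For brevity, the second layer is also trained on the same samples as the first layer.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=5000):
    """Batch subgradient descent on the L2-regularized hinge loss.

    X: list of feature tuples, y: labels in {+1, -1}.
    Returns a weight list whose last entry is the bias term.
    """
    rows = [list(x) + [1.0] for x in X]       # constant feature acts as bias
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for r, label in zip(rows, y):
            margin = label * sum(wj * rj for wj, rj in zip(w, r))
            if margin < 1.0:                   # sample violates the margin
                for j, rj in enumerate(r):
                    grad[j] += label * rj
        w = [(1.0 - lr * lam) * wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def score(w, x):
    """Signed decision value of a linear model for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


# Hypothetical toy data: each point stands for one protein-compound pair.
positives = [(2, 2), (3, 2), (2, 3), (3, 3)]
negative_pools = {
    "pool_b": [(2, -2), (3, -2), (2, -3), (3, -3)],
    "pool_c": [(-2, 2), (-3, 2), (-2, 3), (-3, 3)],
}

# First layer: one SVM per negative pool, all sharing the same positives.
first_layer = {
    name: train_linear_svm(positives + pool,
                           [1] * len(positives) + [-1] * len(pool))
    for name, pool in negative_pools.items()
}


def first_layer_scores(x):
    """Decision values of every first-layer model for one sample."""
    return [score(w, x) for w in first_layer.values()]


# Second layer: an SVM whose inputs are the first-layer decision values.
all_negatives = [x for pool in negative_pools.values() for x in pool]
train_X = positives + all_negatives
train_y = [1] * len(positives) + [-1] * len(all_negatives)
second_layer = train_linear_svm([first_layer_scores(x) for x in train_X],
                                train_y)


def two_layer_score(x):
    """Final decision value: second-layer SVM over first-layer outputs."""
    return score(second_layer, first_layer_scores(x))


decoy = (-2.5, 2.5)  # resembles pool_c, which the pool_b model never saw
print("pool_b model alone:", score(first_layer["pool_b"], decoy))
print("two-layer model:   ", two_layer_score(decoy))
print("positive-like pair:", two_layer_score((2.5, 2.5)))
```

In this toy setting each first-layer model only learns the boundary against its own negative pool, so a single model can score a sample from the other pool as positive. Combining the decision values in the second layer is what allows such a sample to be rejected, which is the false-positive reduction the abstract describes.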

Original language: English
Article number: e1000397
Journal: PLoS Computational Biology
Volume: 5
Issue number: 6
DOI: 10.1371/journal.pcbi.1000397
Publication status: Published - 2009 Jun

Fingerprint

Virtual Screening
Chemical Interactions
Screening
Proteins
Prediction
Statistical Models
Ligands
Interaction
Learning
False Positive
Statistical Learning
Drug Discovery
Computer Simulation
Chemical Compounds

ASJC Scopus subject areas

  • Cellular and Molecular Neuroscience
  • Ecology
  • Molecular Biology
  • Genetics
  • Ecology, Evolution, Behavior and Systematics
  • Modelling and Simulation
  • Computational Theory and Mathematics

Cite this

Integrating statistical predictions and experimental verifications for enhancing protein-chemical interaction predictions in virtual screening. / Nagamine, Nobuyoshi; Shirakawa, Takayuki; Minato, Yusuke; Torii, Kentaro; Kobayashi, Hiroki; Imoto, Masaya; Sakakibara, Yasubumi.

In: PLoS Computational Biology, Vol. 5, No. 6, e1000397, 06.2009.

@article{e8c16e40e2544c54b49bcc433498d322,
title = "Integrating statistical predictions and experimental verifications for enhancing protein-chemical interaction predictions in virtual screening",
author = "Nobuyoshi Nagamine and Takayuki Shirakawa and Yusuke Minato and Kentaro Torii and Hiroki Kobayashi and Masaya Imoto and Yasubumi Sakakibara",
year = "2009",
month = "6",
doi = "10.1371/journal.pcbi.1000397",
language = "English",
volume = "5",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "6",

}

TY - JOUR

T1 - Integrating statistical predictions and experimental verifications for enhancing protein-chemical interaction predictions in virtual screening

AU - Nagamine, Nobuyoshi

AU - Shirakawa, Takayuki

AU - Minato, Yusuke

AU - Torii, Kentaro

AU - Kobayashi, Hiroki

AU - Imoto, Masaya

AU - Sakakibara, Yasubumi

PY - 2009/6

Y1 - 2009/6

UR - http://www.scopus.com/inward/record.url?scp=67650895854&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650895854&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1000397

DO - 10.1371/journal.pcbi.1000397

M3 - Article

VL - 5

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 6

M1 - e1000397

ER -