Support vector machine prediction of N- and O-glycosylation sites using whole sequence information and subcellular localization

Kenta Sasaki, Nobuyoshi Nagamine, Yasubumi Sakakibara

Research output: Contribution to journal (Article)

16 Citations (Scopus)

Abstract

Background: Glycans, or sugar chains, are one of the three types of chain (DNA, protein and glycan) that constitute living organisms; they are often called "the third chain of the living organism". About half of all proteins are estimated to be glycosylated based on the SWISS-PROT database. Glycosylation is one of the most important post-translational modifications, affecting many critical functions of proteins, including cellular communication, and their tertiary structure. In order to computationally predict N-glycosylation and O-glycosylation sites, we developed three kinds of support vector machine (SVM) model, which utilize local information, general protein information and/or subcellular localization in consideration of the binding specificity of glycosyltransferases and the characteristic subcellular localization of glycoproteins.

Results: In our computational experiment, the model integrating three kinds of information achieved about 90% accuracy in predictions of both N-glycosylation and O-glycosylation sites. Moreover, our model was applied to a protein whose glycosylation sites had not been previously identified and we succeeded in showing that the glycosylation sites predicted by our model were structurally reasonable.

Conclusions: In the present study, we developed a comprehensive and effective computational method that detects glycosylation sites. We conclude that our method is a comprehensive and effective computational prediction method that is applicable at a genome-wide level.
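As a rough illustration of how the three kinds of information named in the abstract could be combined into one feature vector for an SVM, here is a minimal Python sketch. It is not the authors' implementation. The window size, the one-hot and composition encodings, and the localization categories are all assumptions made for illustration. The N-X-S/T sequon filter (X not proline) is standard background knowledge about N-glycosylation, not something the abstract states.

```python
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical localization categories; the abstract does not list the ones used.
LOCALIZATIONS = ["extracellular", "membrane", "cytoplasm", "nucleus"]


def candidate_n_sites(seq: str) -> List[int]:
    """Return 0-based indices of Asn residues in an N-X-S/T sequon (X is not Pro)."""
    return [
        i
        for i in range(len(seq) - 2)
        if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST"
    ]


def local_features(seq: str, pos: int, half_window: int = 2) -> List[float]:
    """One-hot encode the residues in a window centred on pos (assumed encoding)."""
    feats: List[float] = []
    for j in range(pos - half_window, pos + half_window + 1):
        # Positions beyond either end of the sequence become all-zero blocks.
        residue = seq[j] if 0 <= j < len(seq) else None
        feats.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return feats


def composition_features(seq: str) -> List[float]:
    """Whole-sequence amino acid composition, a stand-in for general protein information."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def localization_features(localization: str) -> List[float]:
    """One-hot encode a subcellular localization label."""
    return [1.0 if localization == name else 0.0 for name in LOCALIZATIONS]


def encode_site(seq: str, pos: int, localization: str, half_window: int = 2) -> List[float]:
    """Concatenate local, whole-sequence and localization features for one site."""
    return (
        local_features(seq, pos, half_window)
        + composition_features(seq)
        + localization_features(localization)
    )


if __name__ == "__main__":
    sequence = "MNGSANPTK"  # toy sequence, not real data
    for site in candidate_n_sites(sequence):
        vector = encode_site(sequence, site, "membrane")
        print(site, len(vector))
```

Vectors built this way, paired with known glycosylated and non-glycosylated sites as labels, could then be passed to any SVM implementation for training. Keeping the three feature groups in separate functions makes it easy to train the local-only, local-plus-general and fully integrated variants by dropping terms from `encode_site`.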

Original language: English
Pages (from-to): 25-35
Number of pages: 11
Journal: IPSJ Transactions on Bioinformatics
Volume: 2
DOI: 10.2197/ipsjtbio.2.25
Publication status: Published - 2009

Fingerprint

Glycosylation
Support vector machines
Proteins
Polysaccharides
Cellular radio systems
Glycosyltransferases
Protein Databases
Glycoproteins
Support Vector Machine
Post Translational Protein Processing
Computational methods
Sugars
DNA
Genes
Genome
Databases

ASJC Scopus subject areas

  • Computer Science Applications
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)

Cite this

@article{e8ab1711e85f45ea9597b8a75bcfa61d,
title = "Support vector machine prediction of N- and O-glycosylation sites using whole sequence information and subcellular localization",
abstract = "Background: Glycans, or sugar chains, are one of the three types of chain (DNA, protein and glycan) that constitute living organisms; they are often called {"}the third chain of the living organism{"}. About half of all proteins are estimated to be glycosylated based on the SWISS-PROT database. Glycosylation is one of the most important post-translational modifications, affecting many critical functions of proteins, including cellular communication, and their tertiary structure. In order to computationally predict N-glycosylation and O-glycosylation sites, we developed three kinds of support vector machine (SVM) model, which utilize local information, general protein information and/or subcellular localization in consideration of the binding specificity of glycosyltransferases and the characteristic subcellular localization of glycoproteins. Results: In our computational experiment, the model integrating three kinds of information achieved about 90{\%} accuracy in predictions of both N-glycosylation and O-glycosylation sites. Moreover, our model was applied to a protein whose glycosylation sites had not been previously identified and we succeeded in showing that the glycosylation sites predicted by our model were structurally reasonable. Conclusions: In the present study, we developed a comprehensive and effective computational method that detects glycosylation sites. We conclude that our method is a comprehensive and effective computational prediction method that is applicable at a genome-wide level.",
author = "Kenta Sasaki and Nobuyoshi Nagamine and Yasubumi Sakakibara",
year = "2009",
doi = "10.2197/ipsjtbio.2.25",
language = "English",
volume = "2",
pages = "25--35",
journal = "IPSJ Transactions on Bioinformatics",
issn = "1882-6679",
publisher = "Information Processing Society of Japan",

}

TY - JOUR

T1 - Support vector machine prediction of N- and O-glycosylation sites using whole sequence information and subcellular localization

AU - Sasaki, Kenta

AU - Nagamine, Nobuyoshi

AU - Sakakibara, Yasubumi

PY - 2009

Y1 - 2009

AB - Background: Glycans, or sugar chains, are one of the three types of chain (DNA, protein and glycan) that constitute living organisms; they are often called "the third chain of the living organism". About half of all proteins are estimated to be glycosylated based on the SWISS-PROT database. Glycosylation is one of the most important post-translational modifications, affecting many critical functions of proteins, including cellular communication, and their tertiary structure. In order to computationally predict N-glycosylation and O-glycosylation sites, we developed three kinds of support vector machine (SVM) model, which utilize local information, general protein information and/or subcellular localization in consideration of the binding specificity of glycosyltransferases and the characteristic subcellular localization of glycoproteins. Results: In our computational experiment, the model integrating three kinds of information achieved about 90% accuracy in predictions of both N-glycosylation and O-glycosylation sites. Moreover, our model was applied to a protein whose glycosylation sites had not been previously identified and we succeeded in showing that the glycosylation sites predicted by our model were structurally reasonable. Conclusions: In the present study, we developed a comprehensive and effective computational method that detects glycosylation sites. We conclude that our method is a comprehensive and effective computational prediction method that is applicable at a genome-wide level.

UR - http://www.scopus.com/inward/record.url?scp=74049142700&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=74049142700&partnerID=8YFLogxK

U2 - 10.2197/ipsjtbio.2.25

DO - 10.2197/ipsjtbio.2.25

M3 - Article

VL - 2

SP - 25

EP - 35

JO - IPSJ Transactions on Bioinformatics

JF - IPSJ Transactions on Bioinformatics

SN - 1882-6679

ER -