TY - JOUR
T1 - Proteome-wide prediction of novel DNA/RNA-binding proteins using amino acid composition and periodicity in the hyperthermophilic archaeon Pyrococcus furiosus
AU - Fujishima, Kosuke
AU - Komasa, Mizuki
AU - Kitamura, Sayaka
AU - Suzuki, Haruo
AU - Tomita, Masaru
AU - Kanai, Akio
PY - 2007/6
Y1 - 2007/6
N2 - Proteins play a critical role in complex biological systems, yet about half of the proteins in publicly available databases are annotated as functionally unknown. Proteome-wide functional classification using bioinformatics approaches is thus becoming an important method for revealing unknown protein functions. Using the hyperthermophilic archaeon Pyrococcus furiosus as a model species, we applied the support vector machine (SVM) method to discriminate DNA/RNA-binding proteins from proteins with other functions, with amino acid composition and periodicities as feature vectors. We defined the resulting values as the composition score (CO) and the periodicity score (PD). The P. furiosus proteins were classified into three classes (I-III) on the basis of a two-dimensional correlation analysis of the CO and PD scores. As a result, approximately 87% of the functionally known proteins categorized as class I (CO score + PD score > 0.6) were found to be DNA/RNA-binding proteins. Applying the two-dimensional correlation analysis to the 994 hypothetical proteins in P. furiosus, we predicted a total of 151 proteins to be novel DNA/RNA-binding protein candidates. The DNA/RNA-binding activities of randomly chosen hypothetical proteins were verified experimentally. Six out of seven candidate proteins in class I possessed DNA/RNA-binding activity, supporting the efficacy of our method.
AB - Proteins play a critical role in complex biological systems, yet about half of the proteins in publicly available databases are annotated as functionally unknown. Proteome-wide functional classification using bioinformatics approaches is thus becoming an important method for revealing unknown protein functions. Using the hyperthermophilic archaeon Pyrococcus furiosus as a model species, we applied the support vector machine (SVM) method to discriminate DNA/RNA-binding proteins from proteins with other functions, with amino acid composition and periodicities as feature vectors. We defined the resulting values as the composition score (CO) and the periodicity score (PD). The P. furiosus proteins were classified into three classes (I-III) on the basis of a two-dimensional correlation analysis of the CO and PD scores. As a result, approximately 87% of the functionally known proteins categorized as class I (CO score + PD score > 0.6) were found to be DNA/RNA-binding proteins. Applying the two-dimensional correlation analysis to the 994 hypothetical proteins in P. furiosus, we predicted a total of 151 proteins to be novel DNA/RNA-binding protein candidates. The DNA/RNA-binding activities of randomly chosen hypothetical proteins were verified experimentally. Six out of seven candidate proteins in class I possessed DNA/RNA-binding activity, supporting the efficacy of our method.
KW - Amino acid periodicity
KW - Archaea
KW - DNA/RNA-binding protein
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=34748889938&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34748889938&partnerID=8YFLogxK
U2 - 10.1093/dnares/dsm011
DO - 10.1093/dnares/dsm011
M3 - Article
C2 - 17573465
AN - SCOPUS:34748889938
SN - 1340-2838
VL - 14
SP - 91
EP - 102
JO - DNA Research
JF - DNA Research
IS - 3
ER -