Comparison and characterization of proteomes in the three domains of life using 2D correlation analysis

Kosuke Fujishima, Mizuki Komasa, Sayaka Kitamura, Masaru Tomita, Akio Kanai

Research output: Contribution to journal › Article

Abstract

Proteins are a major regulatory component in complex biological systems. Among them, DNA/RNA-binding proteins, the key components of the central dogma of molecular biology, and membrane proteins, which are necessary for both signal transduction and metabolite transport, are suggested to be the most important protein families that arose in the early stage of life. In this study, we computationally analyzed the whole proteome data of six model species to overview the protein diversity in the three domains of life (Bacteria, Archaea and Eukaryota), especially focusing on the above two protein families. To compare the protein distribution among the six model species, we calculated various protein profiles: hydropathy, molecular weight, amino acid composition and periodicity for each protein. We found a domain-specific distribution of the proteome based on 2D correlation analysis of hydropathy and molecular weight. Further, the merged protein distribution of Archaea and other domains revealed many membrane proteins localized in Bacteria-specific regions with a high ratio of hydropathy and many DNA/RNA-binding proteins localized in Eukaryota-specific regions with a low ratio of hydropathy. Since about half of the proteins encoded in the genome are still functionally unknown, we further conducted Support Vector Machine (SVM)-based functional prediction using amino acid composition (CO score) and periodicity (PD score) as feature vectors to predict the overall number of DNA/RNA-binding proteins and membrane proteins in the proteome. Our estimation indicated that two functional categories occupy approximately 60% to 80% of the proteome, and further, the proportion of the two categories varied among the three domains of life, suggesting that the proteome has gone through different selective pressure during evolution.
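The per-protein hydropathy and molecular-weight profiles described above, and their joint two-dimensional distribution, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method. It assumes that hydropathy is the Kyte-Doolittle scale averaged over the sequence (GRAVY) and that molecular weight is computed from standard average residue masses. It also uses a plain 2D histogram with arbitrary bin widths as a stand-in for the paper's 2D correlation analysis. The helpers `gravy`, `molecular_weight` and `profile_grid` are hypothetical names introduced only for this example.

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy values (assumed scale; the record does not name one).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in daltons (free amino acid minus one H2O).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "Q": 128.1307, "E": 129.1155, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one H2O added back for the chain termini


def _standard_residues(seq: str) -> list[str]:
    """Upper-case the sequence and drop non-standard codes such as X or U."""
    residues = [aa for aa in seq.upper() if aa in KD]
    if not residues:
        raise ValueError("sequence has no standard amino acids")
    return residues


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = _standard_residues(seq)
    return sum(KD[aa] for aa in residues) / len(residues)


def molecular_weight(seq: str) -> float:
    """Average mass of an unmodified chain in Da (non-standard residues ignored)."""
    return sum(RESIDUE_MASS[aa] for aa in _standard_residues(seq)) + WATER


def profile_grid(proteins, hyd_step=0.5, mw_step=10_000.0) -> Counter:
    """Count proteins per (hydropathy bin, molecular-weight bin) cell.

    Bin widths are arbitrary illustration values, not taken from the paper.
    """
    grid = Counter()
    for seq in proteins:
        h_bin = math.floor(gravy(seq) / hyd_step)
        m_bin = math.floor(molecular_weight(seq) / mw_step)
        grid[(h_bin, m_bin)] += 1
    return grid


if __name__ == "__main__":
    # Toy sequences, not real proteome data.
    toy = ["MKVLAAGIIL", "MSEEKRKDDE", "MLLFIVAVLG"]
    for cell, count in sorted(profile_grid(toy).items()):
        print(cell, count)
```

Comparing such grids built from each species' full protein set is one simple way to look for domain-specific regions; the bin widths and the choice of scale would both affect where those regions appear.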

Original language: English
Pages (from-to): 206-218
Number of pages: 13
Journal: Progress of Theoretical Physics Supplement
Issue number: 173
Publication status: Published - 2008

Fingerprint

proteome
proteins
deoxyribonucleic acid
membranes
bacteria
amino acids
periodic variations
molecular weight
molecular biology
genome
metabolites

ASJC Scopus subject areas

  • Physics and Astronomy (miscellaneous)

Cite this

Comparison and characterization of proteomes in the three domains of life using 2D correlation analysis. / Fujishima, Kosuke; Komasa, Mizuki; Kitamura, Sayaka; Tomita, Masaru; Kanai, Akio.

In: Progress of Theoretical Physics Supplement, No. 173, 2008, p. 206-218.

@article{7c3279896e45423a87599a89e94d9d1e,
title = "Comparison and characterization of proteomes in the three domains of life using 2D correlation analysis",
abstract = "Proteins are a major regulatory component in complex biological systems. Among them, DNA/RNA-binding proteins, the key components of the central dogma of molecular biology, and membrane proteins, which are necessary for both signal transduction and metabolite transport, are suggested to be the most important protein families that arose in the early stage of life. In this study, we computationally analyzed the whole proteome data of six model species to overview the protein diversity in the three domains of life (Bacteria, Archaea and Eukaryota), especially focusing on the above two protein families. To compare the protein distribution among the six model species, we calculated various protein profiles: hydropathy, molecular weight, amino acid composition and periodicity for each protein. We found a domain-specific distribution of the proteome based on 2D correlation analysis of hydropathy and molecular weight. Further, the merged protein distribution of Archaea and other domains revealed many membrane proteins localized in Bacteria-specific regions with a high ratio of hydropathy and many DNA/RNA-binding proteins localized in Eukaryota-specific regions with a low ratio of hydropathy. Since about half of the proteins encoded in the genome are still functionally unknown, we further conducted Support Vector Machine (SVM)-based functional prediction using amino acid composition (CO score) and periodicity (PD score) as feature vectors to predict the overall number of DNA/RNA-binding proteins and membrane proteins in the proteome. Our estimation indicated that two functional categories occupy approximately 60{\%} to 80{\%} of the proteome, and further, the proportion of the two categories varied among the three domains of life, suggesting that the proteome has gone through different selective pressure during evolution.",
author = "Kosuke Fujishima and Mizuki Komasa and Sayaka Kitamura and Masaru Tomita and Akio Kanai",
year = "2008",
language = "English",
pages = "206--218",
journal = "Progress of Theoretical Physics Supplement",
issn = "0375-9687",
publisher = "Yukawa Institute for Theoretical Physics",
number = "173",

}

TY  - JOUR
T1  - Comparison and characterization of proteomes in the three domains of life using 2D correlation analysis
AU  - Fujishima, Kosuke
AU  - Komasa, Mizuki
AU  - Kitamura, Sayaka
AU  - Tomita, Masaru
AU  - Kanai, Akio
PY  - 2008
Y1  - 2008
AB  - Proteins are a major regulatory component in complex biological systems. Among them, DNA/RNA-binding proteins, the key components of the central dogma of molecular biology, and membrane proteins, which are necessary for both signal transduction and metabolite transport, are suggested to be the most important protein families that arose in the early stage of life. In this study, we computationally analyzed the whole proteome data of six model species to overview the protein diversity in the three domains of life (Bacteria, Archaea and Eukaryota), especially focusing on the above two protein families. To compare the protein distribution among the six model species, we calculated various protein profiles: hydropathy, molecular weight, amino acid composition and periodicity for each protein. We found a domain-specific distribution of the proteome based on 2D correlation analysis of hydropathy and molecular weight. Further, the merged protein distribution of Archaea and other domains revealed many membrane proteins localized in Bacteria-specific regions with a high ratio of hydropathy and many DNA/RNA-binding proteins localized in Eukaryota-specific regions with a low ratio of hydropathy. Since about half of the proteins encoded in the genome are still functionally unknown, we further conducted Support Vector Machine (SVM)-based functional prediction using amino acid composition (CO score) and periodicity (PD score) as feature vectors to predict the overall number of DNA/RNA-binding proteins and membrane proteins in the proteome. Our estimation indicated that two functional categories occupy approximately 60% to 80% of the proteome, and further, the proportion of the two categories varied among the three domains of life, suggesting that the proteome has gone through different selective pressure during evolution.
UR  - http://www.scopus.com/inward/record.url?scp=54049149033&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=54049149033&partnerID=8YFLogxK
M3  - Article
SP  - 206
EP  - 218
JO  - Progress of Theoretical Physics Supplement
JF  - Progress of Theoretical Physics Supplement
SN  - 0375-9687
IS  - 173
ER  - 