Proteins are major regulatory components of complex biological systems. Among them, DNA/RNA-binding proteins, the key components of the central dogma of molecular biology, and membrane proteins, which are necessary for both signal transduction and metabolite transport, are thought to be the most important protein families that arose in the early stages of life. In this study, we computationally analyzed whole-proteome data from six model species to survey protein diversity across the three domains of life (Bacteria, Archaea and Eukaryota), focusing especially on these two protein families. To compare the protein distributions among the six model species, we calculated several profiles for each protein: hydropathy, molecular weight, amino acid composition and periodicity. A 2D correlation analysis of hydropathy and molecular weight revealed a domain-specific distribution of the proteome. Further, the merged protein distribution of Archaea and the other domains revealed many membrane proteins localized in Bacteria-specific regions with a high ratio of hydropathy and many DNA/RNA-binding proteins localized in Eukaryota-specific regions with a low ratio of hydropathy. Because the functions of about half of the proteins encoded in a genome remain unknown, we also performed Support Vector Machine (SVM)-based functional prediction, using amino acid composition (CO score) and periodicity (PD score) as feature vectors, to estimate the overall numbers of DNA/RNA-binding proteins and membrane proteins in each proteome. Our estimates indicate that these two functional categories account for approximately 60% to 80% of the proteome and that their proportions vary among the three domains of life, suggesting that the proteomes have undergone different selective pressures during evolution.
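As a rough illustration of three of the per-protein profiles named above (hydropathy, molecular weight and amino acid composition), the following Python sketch computes them from a one-letter amino acid sequence. It is not the authors' implementation. The abstract does not say which hydropathy scale was used, so the Kyte-Doolittle scale is an assumption here, the residue masses are approximate average values, and the function names (`mean_hydropathy`, `molecular_weight`, `composition`) are hypothetical. Periodicity and the CO/PD scores are not defined in the abstract, so they are left out.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index per residue (assumed scale; the
# abstract does not name the one actually used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate average residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def _standard_residues(sequence):
    """Return the standard residues of a sequence, upper-cased.

    Non-standard symbols (for example X or *) are ignored.
    """
    residues = [aa for aa in sequence.upper() if aa in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return residues


def mean_hydropathy(sequence):
    """Average Kyte-Doolittle hydropathy over the standard residues."""
    residues = _standard_residues(sequence)
    return sum(KD[aa] for aa in residues) / len(residues)


def molecular_weight(sequence):
    """Approximate molecular weight of the linear chain in daltons."""
    residues = _standard_residues(sequence)
    # One water is added back for the free N- and C-termini.
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER_MASS


def composition(sequence):
    """Fraction of each of the 20 standard amino acids in the sequence."""
    residues = _standard_residues(sequence)
    counts = Counter(residues)
    return {aa: counts[aa] / len(residues) for aa in KD}


if __name__ == "__main__":
    seq = "MKTLLVAGG"  # made-up example sequence, not from the study
    print(mean_hydropathy(seq), molecular_weight(seq))
    print(composition(seq)["G"])
```

Plotting `mean_hydropathy` against `molecular_weight` for every protein in a proteome would give a 2D distribution of the kind the abstract describes, and the 20-element `composition` vector is one plausible form for a composition-based SVM feature, although the abstract does not specify how the CO score is derived.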