TY - JOUR
T1 - Evolutionary analysis of HIV-1 pol proteins reveals representative residues for viral subtype differentiation
AU - Nagata, Shohei
AU - Imai, Junnosuke
AU - Makino, Gakuto
AU - Tomita, Masaru
AU - Kanai, Akio
N1 - Funding Information:
We thank Dr. Motomu Matsui, Dr. Haruo Suzuki, Dr. Yasuhiro Naito and Mr. Satoshi Tamaki for their constructive suggestions and discussions. We also thank all the members of the RNA Group at the Institute for Advanced Biosciences, Keio University, Japan, for their insightful discussions. This work was supported, in part, by research funds from the Yamagata Prefectural Government and Tsuruoka City, Japan. The funding bodies played no role in the study design, data collection or analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2017 Nagata, Imai, Makino, Tomita and Kanai.
PY - 2017/11/2
Y1 - 2017/11/2
N2 - RNA viruses have been used as model systems to understand the patterns and processes of molecular evolution because they have high mutation rates and are genetically diverse. Human immunodeficiency virus 1 (HIV-1), the etiological agent of acquired immune deficiency syndrome, is highly genetically diverse, and is classified into several groups and subtypes. However, it has been difficult to use its diverse sequences to establish the overall phylogenetic relationships of different strains or the trends in sequence conservation with the construction of phylogenetic trees. Our aims were to systematically characterize HIV-1 subtype evolution and to identify the regions responsible for HIV-1 subtype differentiation at the amino acid level in the Pol protein, which is often used to classify the HIV-1 subtypes. In this study, we systematically characterized the mutation sites in 2,052 Pol proteins from HIV-1 group M (144 subtype A; 1,528 subtype B; 380 subtype C), using sequence similarity networks. We also used spectral clustering to group the sequences based on the network graph structures. A stepwise analysis of the cluster hierarchies allowed us to estimate a possible evolutionary pathway for the Pol proteins. The subtype A sequences also clustered according to when and where the viruses were isolated, whereas both the subtype B and C sequences remained as single clusters. Because the Pol protein has several functional domains, we identified the regions that are discriminative by comparing the structures of the domain-based networks. Our results suggest that sequence changes in the RNase H domain and the reverse transcriptase (RT) connection domain are responsible for the subtype classification. By analyzing the different amino acid compositions at each site in both domain sequences, we found that a few specific amino acid residues (i.e., M357 in the RT connection domain and Q480, Y483, and L491 in the RNase H domain) represent the differences among the subtypes. These residues were located on the surface of the RT structure and in the vicinity of the amino acid sites responsible for RT enzymatic activity or function.
AB - RNA viruses have been used as model systems to understand the patterns and processes of molecular evolution because they have high mutation rates and are genetically diverse. Human immunodeficiency virus 1 (HIV-1), the etiological agent of acquired immune deficiency syndrome, is highly genetically diverse, and is classified into several groups and subtypes. However, it has been difficult to use its diverse sequences to establish the overall phylogenetic relationships of different strains or the trends in sequence conservation with the construction of phylogenetic trees. Our aims were to systematically characterize HIV-1 subtype evolution and to identify the regions responsible for HIV-1 subtype differentiation at the amino acid level in the Pol protein, which is often used to classify the HIV-1 subtypes. In this study, we systematically characterized the mutation sites in 2,052 Pol proteins from HIV-1 group M (144 subtype A; 1,528 subtype B; 380 subtype C), using sequence similarity networks. We also used spectral clustering to group the sequences based on the network graph structures. A stepwise analysis of the cluster hierarchies allowed us to estimate a possible evolutionary pathway for the Pol proteins. The subtype A sequences also clustered according to when and where the viruses were isolated, whereas both the subtype B and C sequences remained as single clusters. Because the Pol protein has several functional domains, we identified the regions that are discriminative by comparing the structures of the domain-based networks. Our results suggest that sequence changes in the RNase H domain and the reverse transcriptase (RT) connection domain are responsible for the subtype classification. By analyzing the different amino acid compositions at each site in both domain sequences, we found that a few specific amino acid residues (i.e., M357 in the RT connection domain and Q480, Y483, and L491 in the RNase H domain) represent the differences among the subtypes. These residues were located on the surface of the RT structure and in the vicinity of the amino acid sites responsible for RT enzymatic activity or function.
KW - Bioinformatics
KW - HIV-1
KW - Molecular evolution
KW - Network analysis
KW - Pol protein
KW - Protein domain
UR - http://www.scopus.com/inward/record.url?scp=85032725774&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032725774&partnerID=8YFLogxK
U2 - 10.3389/fmicb.2017.02151
DO - 10.3389/fmicb.2017.02151
M3 - Article
AN - SCOPUS:85032725774
SN - 1664-302X
VL - 8
JO - Frontiers in Microbiology
JF - Frontiers in Microbiology
IS - NOV
M1 - 2151
ER -