TY - JOUR
T1 - Validating the significance of genomic properties of Chi sites from the distribution of all octamers in Escherichia coli
AU - Arakawa, Kazuharu
AU - Uno, Reina
AU - Nakayama, Yoichi
AU - Tomita, Masaru
N1 - Funding Information:
We would like to thank Nozomu Yachie and Yukino Ogawa for technical advice on statistical validation. This research was supported by the Japan Society for the Promotion of Science (JSPS).
PY - 2007/5/1
Y1 - 2007/5/1
N2 - Chi sites (5′-GCTGGTGG-3′) are homologous recombinational hotspot octamer sequences, which attenuate the exonuclease activity of RecBCD in Escherichia coli. They are overrepresented in the genome (1008 occurrences), preferentially located within coding regions (98%), oriented in the direction of replication (75%), and occur most commonly on the mRNA-synonymous sense strand of the double helix (79%). Previous statistical studies of the genome sequence suggested that these genomic properties of Chi sites appear to be related to their role in recombinational repair and therefore to replication and transcription. In this study, we employ three mathematical models to predict the properties of Chi sites from single nucleotide and multi-nucleotide compositions, and validate them statistically using the distribution of all octamer sequences in the entire genome, or exclusively within ORFs. The model based on the overall distribution of all octamers provided better predictions than the single nucleotide composition model, and the ORF and sense strand preference of Chi sites were shown to be within the standard deviation of all octamers. In contrast, the orientation bias of the Chi sites in the direction of replication was significant, although the bias was not as pronounced as with the single nucleotide composition model, suggesting a selective pressure related to the role of RecBCD in replication.
AB - Chi sites (5′-GCTGGTGG-3′) are homologous recombinational hotspot octamer sequences, which attenuate the exonuclease activity of RecBCD in Escherichia coli. They are overrepresented in the genome (1008 occurrences), preferentially located within coding regions (98%), oriented in the direction of replication (75%), and occur most commonly on the mRNA-synonymous sense strand of the double helix (79%). Previous statistical studies of the genome sequence suggested that these genomic properties of Chi sites appear to be related to their role in recombinational repair and therefore to replication and transcription. In this study, we employ three mathematical models to predict the properties of Chi sites from single nucleotide and multi-nucleotide compositions, and validate them statistically using the distribution of all octamer sequences in the entire genome, or exclusively within ORFs. The model based on the overall distribution of all octamers provided better predictions than the single nucleotide composition model, and the ORF and sense strand preference of Chi sites were shown to be within the standard deviation of all octamers. In contrast, the orientation bias of the Chi sites in the direction of replication was significant, although the bias was not as pronounced as with the single nucleotide composition model, suggesting a selective pressure related to the role of RecBCD in replication.
KW - Bioinformatics
KW - Homologous recombination
KW - Orientation bias
KW - RecBCD
KW - Strand bias
UR - http://www.scopus.com/inward/record.url?scp=33947587178&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947587178&partnerID=8YFLogxK
U2 - 10.1016/j.gene.2006.12.022
DO - 10.1016/j.gene.2006.12.022
M3 - Article
C2 - 17270364
AN - SCOPUS:33947587178
VL - 392
SP - 239
EP - 246
JO - Gene
JF - Gene
SN - 0378-1119
IS - 1-2
ER -