Translational initiation signals, such as Shine-Dalgarno (SD) sequences in bacteria and Kozak consensus sequences in vertebrates, direct ribosomes to initiate protein synthesis from mRNAs. Investigating the sequence characteristics of these signals is important, particularly for inferring translational initiation mechanisms. Although various statistical analyses of translational initiation signals have been performed, few have focused on base correlations, that is, dependencies between bases within the signal sequences. We used relative entropy and mutual information to analyze base conservation and base correlation, respectively, in the 5′ UTRs of various species. In eukaryotes, we found peaks of relative entropy at position −3 relative to the translational start site but no peak of mutual information at that position. This indicates that the base at −3 (known as the core base of the Kozak sequence) is well conserved but not correlated with neighboring bases, and thus functions as a single base. We also observed unexpected peaks of mutual information between positions −2 and −1 in most eukaryotes. Surprisingly, these base correlations also occurred in some bacteria and archaea, even though neither position showed a base preference. Various dinucleotide patterns existed at these positions, and the correlation between the bases at −2 and −1 may be relevant to the context of translational initiation. Because the dinucleotide patterns of correlated nucleotide pairs at −2 and −1 were not unique within each organism, the correlation could not be detected by analyzing single-nucleotide conservation. Mutual information therefore allowed us to discover signals that simple analysis of base conservation did not reveal.
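The sketch below illustrates the two measures named above: per-position relative entropy against a background distribution (conservation), and mutual information between two positions (correlation). It rests on several assumptions that are not from the study. Sequences are equal-length 5′ UTR windows ending immediately before the start codon, written in the DNA alphabet. The background is uniform. Both quantities are plain plug-in estimates with no small-sample correction. The helper names (`column`, `relative_entropy`, `mutual_information`) and the toy sequences are hypothetical.

```python
import math
from collections import Counter

# Hypothetical uniform background; a real analysis might instead use
# genome- or UTR-wide base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def column(seqs, pos):
    """Bases at upstream position pos (e.g. -3).

    Assumes every window ends just before the start codon, so Python's
    negative indexing maps position -k directly to seq[-k].
    """
    return [s[pos] for s in seqs]


def relative_entropy(seqs, pos, background=BACKGROUND):
    """Relative entropy (KL divergence, bits) of the base distribution at pos from the background."""
    counts = Counter(column(seqs, pos))
    n = sum(counts.values())
    return sum((c / n) * math.log2((c / n) / background[b]) for b, c in counts.items())


def mutual_information(seqs, pos_i, pos_j):
    """Plug-in mutual information (bits) between the bases at two upstream positions."""
    xs, ys = column(seqs, pos_i), column(seqs, pos_j)
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


# Toy windows covering positions -3, -2, -1 (hypothetical data):
# -3 is always A, while -2/-1 form four different dinucleotides, so each
# single position at -2 or -1 is uniform but the pair is fully dependent.
toy = ["AAC", "ACA", "AGT", "ATG"]
print(relative_entropy(toy, -3))        # 2.0  (conserved)
print(relative_entropy(toy, -2))        # 0.0  (no base preference)
print(mutual_information(toy, -3, -2))  # 0.0  (conserved but uncorrelated)
print(mutual_information(toy, -2, -1))  # 2.0  (correlated pair)
```

The toy data mirrors the pattern described above. A fully conserved position has high relative entropy but zero mutual information with its neighbors. Meanwhile, several distinct dinucleotides at −2/−1 leave each single position looking unbiased while the pair carries two bits of shared information. This signal would be invisible to a per-position conservation analysis.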