Comparative analysis of base correlations in 5′ untranslated regions of various species

Yuko Osada, Rintaro Saito, Masaru Tomita

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Translational initiation signals, such as Shine-Dalgarno (SD) sequences in bacteria and Kozak consensus sequences in vertebrates, direct ribosomes to initiate protein synthesis from mRNAs. Investigating sequence characteristics of these signals is important, particularly to infer translational initiation mechanisms. Although various statistical analyses of translational initiation signals have been done, few have focused on base correlations that assess base dependencies in the signal sequences. We used relative entropy and mutual information to analyze base conservation and correlation, respectively, in the 5′ UTRs of various species. In eukaryotes, we found peaks of relative entropy at -3 from the translational start site but no peak of mutual information at that position, indicating that the base at that position (known as the core base of the Kozak sequence) is well conserved but not correlated with neighboring bases and thus functions as a single base. We observed unexpected peaks of mutual information between positions -2 and -1 in most eukaryotes. Surprisingly, this base correlation also occurred in some bacteria and archaea, although there were no base preferences at either position. Various dinucleotide patterns existed at these positions, and the correlation between bases at -2 and -1 may be relevant to the context of translational initiation. Because dinucleotide patterns of correlated pairs of nucleotides at -2 and -1 were not unique within respective organisms, the correlation could not be found when analyzing single-nucleotide conservation. Therefore, mutual information allowed us to discover signals that were not found by simply analyzing base conservation.
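
The contrast drawn in the abstract, a conserved base with no correlation at -3 versus correlated but unbiased bases at -2 and -1, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the background distribution, any small-sample correction, or the exact estimator used, so the uniform background, the plain frequency estimates, the function names, and the four toy sequences below are all assumptions made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def column(seqs, pos):
    """Return the bases found at one aligned position across all sequences."""
    return [s[pos] for s in seqs]

def relative_entropy(seqs, pos, background):
    """Relative entropy (bits) of the observed base frequencies at `pos`
    against an assumed background distribution."""
    counts = Counter(column(seqs, pos))
    n = len(seqs)
    total = 0.0
    for base in BASES:
        p = counts[base] / n
        if p > 0:  # terms with p == 0 contribute nothing
            total += p * math.log2(p / background[base])
    return total

def mutual_information(seqs, pos_i, pos_j):
    """Mutual information (bits) between the bases at two positions,
    estimated from raw joint and marginal frequencies."""
    n = len(seqs)
    col_i, col_j = column(seqs, pos_i), column(seqs, pos_j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    total = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        total += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return total

# Hypothetical toy alignment: string index 0, 1, 2 stands for -3, -2, -1.
# Index 0 is always A; indices 1 and 2 are each uniform but always identical.
toy = ["AAA", "ACC", "AGG", "ATT"]
uniform = {b: 0.25 for b in BASES}  # assumed background, not from the paper

re_m3 = relative_entropy(toy, 0, uniform)   # conserved position
re_m2 = relative_entropy(toy, 1, uniform)   # no single-base preference
mi_m3_m2 = mutual_information(toy, 0, 1)    # conserved base, no dependency
mi_m2_m1 = mutual_information(toy, 1, 2)    # dependency without conservation

print(re_m3, re_m2, mi_m3_m2, mi_m2_m1)
```

In this toy case the position standing in for -3 has high relative entropy but zero mutual information with its neighbor, while the pair standing in for -2 and -1 has zero relative entropy at each position but high mutual information, which is the kind of signal that single-nucleotide conservation alone would miss.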

Original language: English
Pages (from-to): 80-86
Number of pages: 7
Journal: Gene
Volume: 375
Issue number: 1-2
DOI: 10.1016/j.gene.2006.02.018
Publication status: Published - 2006 Jun 21

Fingerprint

5' Untranslated Regions
Entropy
Protein Sorting Signals
Eukaryota
Nucleotides
Bacteria
Archaea
Consensus Sequence
Ribosomes
Vertebrates
Messenger RNA
Proteins

Keywords

  • Kozak consensus sequence
  • Mutual information
  • Shine-Dalgarno sequence
  • Start codon
  • Translation initiation sites

ASJC Scopus subject areas

  • Genetics

Cite this

Comparative analysis of base correlations in 5′ untranslated regions of various species. / Osada, Yuko; Saito, Rintaro; Tomita, Masaru.

In: Gene, Vol. 375, No. 1-2, 21.06.2006, p. 80-86.

Osada, Yuko ; Saito, Rintaro ; Tomita, Masaru. / Comparative analysis of base correlations in 5′ untranslated regions of various species. In: Gene. 2006 ; Vol. 375, No. 1-2. pp. 80-86.
@article{19f41241de1d477aa83e2c607c4dba1e,
title = "Comparative analysis of base correlations in 5′ untranslated regions of various species",
abstract = "Translational initiation signals, such as Shine-Dalgarno (SD) sequences in bacteria and Kozak consensus sequences in vertebrates, direct ribosomes to initiate protein synthesis from mRNAs. Investigating sequence characteristics of these signals is important, particularly to infer translational initiation mechanisms. Although various statistical analyses of translational initiation signals have been done, few have focused on base correlations that assess base dependencies in the signal sequences. We used relative entropy and mutual information to analyze base conservation and correlation, respectively, in the 5′ UTRs of various species. In eukaryotes, we found peaks of relative entropy at -3 from the translational start site but no peak of mutual information at that position, indicating that the base at that position (known as the core base of the Kozak sequence) is well conserved but not correlated with neighboring bases and thus functions as a single base. We observed unexpected peaks of mutual information between positions -2 and -1 in most eukaryotes. Surprisingly, this base correlation also occurred in some bacteria and archaea, although there were no base preferences at either position. Various dinucleotide patterns existed at these positions, and the correlation between bases at -2 and -1 may be relevant to the context of translational initiation. Because dinucleotide patterns of correlated pairs of nucleotides at -2 and -1 were not unique within respective organisms, the correlation could not be found when analyzing single-nucleotide conservation. Therefore, mutual information allowed us to discover signals that were not found by simply analyzing base conservation.",
keywords = "Kozak consensus sequence, Mutual information, Shine-Dalgarno sequence, Start codon, Translation initiation sites",
author = "Yuko Osada and Rintaro Saito and Masaru Tomita",
year = "2006",
month = "6",
day = "21",
doi = "10.1016/j.gene.2006.02.018",
language = "English",
volume = "375",
pages = "80--86",
journal = "Gene",
issn = "0378-1119",
publisher = "Elsevier",
number = "1-2",

}

TY - JOUR

T1 - Comparative analysis of base correlations in 5′ untranslated regions of various species

AU - Osada, Yuko

AU - Saito, Rintaro

AU - Tomita, Masaru

PY - 2006/6/21

Y1 - 2006/6/21

N2 - Translational initiation signals, such as Shine-Dalgarno (SD) sequences in bacteria and Kozak consensus sequences in vertebrates, direct ribosomes to initiate protein synthesis from mRNAs. Investigating sequence characteristics of these signals is important, particularly to infer translational initiation mechanisms. Although various statistical analyses of translational initiation signals have been done, few have focused on base correlations that assess base dependencies in the signal sequences. We used relative entropy and mutual information to analyze base conservation and correlation, respectively, in the 5′ UTRs of various species. In eukaryotes, we found peaks of relative entropy at -3 from the translational start site but no peak of mutual information at that position, indicating that the base at that position (known as the core base of the Kozak sequence) is well conserved but not correlated with neighboring bases and thus functions as a single base. We observed unexpected peaks of mutual information between positions -2 and -1 in most eukaryotes. Surprisingly, this base correlation also occurred in some bacteria and archaea, although there were no base preferences at either position. Various dinucleotide patterns existed at these positions, and the correlation between bases at -2 and -1 may be relevant to the context of translational initiation. Because dinucleotide patterns of correlated pairs of nucleotides at -2 and -1 were not unique within respective organisms, the correlation could not be found when analyzing single-nucleotide conservation. Therefore, mutual information allowed us to discover signals that were not found by simply analyzing base conservation.

KW - Kozak consensus sequence

KW - Mutual information

KW - Shine-Dalgarno sequence

KW - Start codon

KW - Translation initiation sites

UR - http://www.scopus.com/inward/record.url?scp=33744543456&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33744543456&partnerID=8YFLogxK

U2 - 10.1016/j.gene.2006.02.018

DO - 10.1016/j.gene.2006.02.018

M3 - Article

VL - 375

SP - 80

EP - 86

JO - Gene

JF - Gene

SN - 0378-1119

IS - 1-2

ER -