Genome-wide assembly and analysis of alternative transcripts in mouse

Alexei A. Sharov, Dawood B. Dudekula, Minoru Ko

Research output: Contribution to journal (Article)

50 Citations (Scopus)

Abstract

To build a mouse gene index with the most comprehensive coverage of alternative transcription/splicing (ATS), we developed an algorithm and a fully automated computational pipeline for transcript assembly from expressed sequences aligned to the genome. We identified 191,946 genomic loci, which included 27,497 protein-coding genes and 11,906 additional gene candidates (e.g., nonprotein-coding, but multiexon). Comparison of the resulting gene index with TIGR, UniGene, DoTS, and ESTGenes databases revealed that it had a greater number of transcripts, a greater average number of exons and introns with proper splicing sites per gene, and longer ORFs. The 27,497 protein-coding genes had 77,138 transcripts, i.e., 2.8 transcripts per gene on average. Close examination of transcripts led to a combinatorial table of 23 types of ATS units, only nine of which were previously described, i.e., 14 types of alternative splicing, seven types of alternative starts, and two types of alternative termination. The 47%, 18%, and 14% of 20,323 multiexon protein-coding genes with proper splice sites had alternative splicings, alternative starts, and alternative terminations, respectively. The gene index with the comprehensive ATS will provide a useful platform for analyzing the nature and mechanism of ATS, as well as for designing the accurate exon-based DNA microarrays.
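The summary figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, the input numbers are copied from the abstract, and the per-category gene counts are approximations recovered from rounded percentages, not values reported by the authors.

```python
"""Sanity-check the summary statistics quoted in the abstract."""

# Figures copied from the abstract.
PROTEIN_CODING_GENES = 27_497
TRANSCRIPTS = 77_138
MULTIEXON_GENES = 20_323  # multiexon protein-coding genes with proper splice sites

# Number of ATS unit types per category, as listed in the abstract.
ATS_TYPE_COUNTS = {
    "alternative splicing": 14,
    "alternative starts": 7,
    "alternative termination": 2,
}

# Percentage of multiexon protein-coding genes showing each kind of ATS.
ATS_GENE_PERCENT = {
    "alternative splicing": 47,
    "alternative starts": 18,
    "alternative termination": 14,
}


def transcripts_per_gene(transcripts: int, genes: int) -> float:
    """Mean number of transcripts per gene."""
    return transcripts / genes


def approximate_gene_counts(total_genes: int, percent_by_category: dict[str, int]) -> dict[str, int]:
    """Convert rounded percentages back into approximate gene counts."""
    return {
        name: round(total_genes * pct / 100)
        for name, pct in percent_by_category.items()
    }


if __name__ == "__main__":
    ratio = transcripts_per_gene(TRANSCRIPTS, PROTEIN_CODING_GENES)
    print(f"transcripts per gene: {ratio:.2f}")
    print(f"ATS unit types: {sum(ATS_TYPE_COUNTS.values())}")
    for name, count in approximate_gene_counts(MULTIEXON_GENES, ATS_GENE_PERCENT).items():
        print(f"{name}: ~{count} genes")
```

The ratio rounds to the 2.8 transcripts per gene stated in the abstract, and the three category counts sum to the 23 ATS unit types. The abstract does not say the three gene categories are mutually exclusive, so the approximate gene counts should not be added together.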

Original language: English
Pages (from-to): 748-754
Number of pages: 7
Journal: Genome Research
Volume: 15
Issue number: 5
DOIs: 10.1101/gr.3269805
Publication status: Published - May 2005
Externally published: Yes

Fingerprint

Alternative Splicing
Genome
Genes
Exons
Proteins
Oligonucleotide Array Sequence Analysis
Introns
Open Reading Frames
Databases

ASJC Scopus subject areas

  • Genetics

Cite this

Genome-wide assembly and analysis of alternative transcripts in mouse. / Sharov, Alexei A.; Dudekula, Dawood B.; Ko, Minoru.

In: Genome Research, Vol. 15, No. 5, 05.2005, p. 748-754.

Sharov, Alexei A. ; Dudekula, Dawood B. ; Ko, Minoru. / Genome-wide assembly and analysis of alternative transcripts in mouse. In: Genome Research. 2005 ; Vol. 15, No. 5. pp. 748-754.
@article{f927e41573eb4f89b81606f3d1f0e40c,
title = "Genome-wide assembly and analysis of alternative transcripts in mouse",
abstract = "To build a mouse gene index with the most comprehensive coverage of alternative transcription/splicing (ATS), we developed an algorithm and a fully automated computational pipeline for transcript assembly from expressed sequences aligned to the genome. We identified 191,946 genomic loci, which included 27,497 protein-coding genes and 11,906 additional gene candidates (e.g., nonprotein-coding, but multiexon). Comparison of the resulting gene index with TIGR, UniGene, DoTS, and ESTGenes databases revealed that it had a greater number of transcripts, a greater average number of exons and introns with proper splicing sites per gene, and longer ORFs. The 27,497 protein-coding genes had 77,138 transcripts, i.e., 2.8 transcripts per gene on average. Close examination of transcripts led to a combinatorial table of 23 types of ATS units, only nine of which were previously described, i.e., 14 types of alternative splicing, seven types of alternative starts, and two types of alternative termination. The 47{\%}, 18{\%}, and 14{\%} of 20,323 multiexon protein-coding genes with proper splice sites had alternative splicings, alternative starts, and alternative terminations, respectively. The gene index with the comprehensive ATS will provide a useful platform for analyzing the nature and mechanism of ATS, as well as for designing the accurate exon-based DNA microarrays.",
author = "Sharov, {Alexei A.} and Dudekula, {Dawood B.} and Ko, Minoru",
year = "2005",
month = "5",
doi = "10.1101/gr.3269805",
language = "English",
volume = "15",
pages = "748--754",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "5",
}

TY - JOUR
T1 - Genome-wide assembly and analysis of alternative transcripts in mouse
AU - Sharov, Alexei A.
AU - Dudekula, Dawood B.
AU - Ko, Minoru
PY - 2005/5
Y1 - 2005/5
AB - To build a mouse gene index with the most comprehensive coverage of alternative transcription/splicing (ATS), we developed an algorithm and a fully automated computational pipeline for transcript assembly from expressed sequences aligned to the genome. We identified 191,946 genomic loci, which included 27,497 protein-coding genes and 11,906 additional gene candidates (e.g., nonprotein-coding, but multiexon). Comparison of the resulting gene index with TIGR, UniGene, DoTS, and ESTGenes databases revealed that it had a greater number of transcripts, a greater average number of exons and introns with proper splicing sites per gene, and longer ORFs. The 27,497 protein-coding genes had 77,138 transcripts, i.e., 2.8 transcripts per gene on average. Close examination of transcripts led to a combinatorial table of 23 types of ATS units, only nine of which were previously described, i.e., 14 types of alternative splicing, seven types of alternative starts, and two types of alternative termination. The 47%, 18%, and 14% of 20,323 multiexon protein-coding genes with proper splice sites had alternative splicings, alternative starts, and alternative terminations, respectively. The gene index with the comprehensive ATS will provide a useful platform for analyzing the nature and mechanism of ATS, as well as for designing the accurate exon-based DNA microarrays.

UR - http://www.scopus.com/inward/record.url?scp=19844379515&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=19844379515&partnerID=8YFLogxK
U2 - 10.1101/gr.3269805
DO - 10.1101/gr.3269805
M3 - Article
VL - 15
SP - 748
EP - 754
JO - Genome Research
JF - Genome Research
SN - 1088-9051
IS - 5
ER -