TY - JOUR
T1 - Criteria for gene identification and features of genome organization
T2 - Analysis of 6.5 Mb of DNA sequence from human chromosome 21
AU - Slavov, Dobromir
AU - Hattori, Masahira
AU - Sakaki, Yoshiyuki
AU - Rosenthal, André
AU - Shimizu, Nobuyoshi
AU - Minoshima, Shinsei
AU - Kudoh, Jun
AU - Yaspo, Marie Laure
AU - Ramser, Juliane
AU - Reinhardt, Richard
AU - Reimer, Candy
AU - Clancy, Kevin
AU - Rynditch, Alla
AU - Gardiner, Katheleen
N1 - Funding Information:
This is a contribution (#1754) of the Thomas G. and Mary W. Vessels Laboratory for Molecular Biology and the John C. Mitchell Laboratory for the Study of Genetic Diseases and Human Development of the Eleanor Roosevelt Institute. The authors thank Andrew Fortna and Roger Lucas for excellent technical assistance and Richard Mural for helpful discussions. This work was supported by NIH grant HD17449 and by the Boettcher Foundation.
PY - 2000/4/18
Y1 - 2000/4/18
AB - To establish criteria for and the limitations of novel gene identification, to identify novel genes of potential relevance to Down syndrome and to investigate features of genome organization, 6.5 Mb of DNA sequence, dispersed throughout the long arm of human chromosome 21, have been annotated computationally and experimentally. Exon prediction with four programs, protein and EST database searches, two-sequence BLAST searches and CpG island characterization identified 41 genes with known or new protein homologies. Features of these genes suggested criteria for prediction of novel genes (those lacking any protein homology) with the following characteristics: (1) exon+EST genes: genes with excellent patterns of predicted exons and one or more matches in dbEST; (2) exon-EST genes: genes with good patterns of predicted exons and no matches in dbEST; (3) EST-exon genes: genes without any patterns of reliable exon prediction but with matches in dbEST; and (4) isolated CpG island genes: genes consisting of strong CpG islands that are apparently unique sequences and found in regions lacking any consistent exon predictions within >50 kb. In total, 41 novel gene models were predicted, and for a subset of these, RT-PCR experiments helped to verify and refine the models, and were used to assess expression in early development and in adult brain regions of potential relevance to Down syndrome. Results suggest generally low and/or restricted patterns of expression, and also reveal examples of complex alternative processing, especially in brain, that may have important implications for regulation of protein function. Analysis of complete gene structures of the known genes identified a number of very large introns, a number of very short intergenic distances, and at least one potentially bi-directional promoter. At least 3/4 of known genes and 1/2 of predicted genes are associated with CpG islands. For novel genes, three cases of overlapping genes are predicted. Results of these analyses illustrate some of the complexities inherent in mammalian genome organization and some of the limitations of current sequence analysis technologies. They also doubled the number of potential genes within the region. (C) 2000 Elsevier Science B.V. All rights reserved.
KW - Down syndrome
KW - Gene identification
KW - Genome organization
KW - Human chromosome 21
KW - Sequence analysis
UR - http://www.scopus.com/inward/record.url?scp=0034681998&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0034681998&partnerID=8YFLogxK
U2 - 10.1016/S0378-1119(00)00089-5
DO - 10.1016/S0378-1119(00)00089-5
M3 - Article
C2 - 10773462
AN - SCOPUS:0034681998
SN - 0378-1119
VL - 247
SP - 215
EP - 232
JO - Gene
JF - Gene
IS - 1-2
ER -