GPAC

Benchmarking the sensitivity of genome informatics analysis to genome annotation completeness

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

With the recent explosion in genome sequence data, and 200 or more complete genome sequences now available, genome-scale bioinformatics analysis is rapidly growing in importance. However, computational genome informatics analyses often lack a statistical assessment of their sensitivity to the completeness of the functional annotation. A pre-analysis method that automatically tests how sensitive a computational genome analysis is to genome annotation completeness is therefore useful. In this report we describe the Gene Prediction Accuracy Classification (GPAC) test, which provides statistical evidence of sensitivity by repeating the same analysis on five gene groups (classified by annotation accuracy level) and on randomly sampled gene groups containing the same number of genes as each of the five classified groups. Variability among these results is then assessed; if the results vary significantly between data subsets, the analysis is considered "sensitive" to annotation completeness, and careful selection of data is advised before the actual in silico analysis. We applied the GPAC test to the analyses of Sakai et al. (2001) and Ohno et al. (2001), and found that the analysis of Ohno et al. was the more sensitive to annotation completeness. This shows that GPAC can be used to ascertain the sensitivity of an analysis. The GPAC benchmarking software is freely available in the latest G-language Genome Analysis Environment package at http://www.g-language.org/.
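The resampling idea in the abstract can be illustrated with a minimal Python sketch. This is not the GPAC implementation shipped with the G-language Genome Analysis Environment. The function names, the representation of each gene as a single number, the z-score statistic, and the threshold of 3.0 are all assumptions made for illustration, since the abstract does not state which variability statistic GPAC uses. The sketch runs a user-supplied analysis on each annotation-accuracy class and on random gene samples of the same size, then reports how far each class result lies from its random baseline.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence


def gpac_style_z_scores(
    genes_by_class: Dict[str, Sequence[float]],
    analysis: Callable[[Sequence[float]], float],
    n_random: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Compare an analysis on each gene class with size-matched random samples.

    Hypothetical helper: genes are toy numeric values, and the z-score is an
    assumed stand-in for the variability assessment described in the abstract.
    """
    rng = random.Random(seed)
    # Pool every gene so random groups are drawn from the whole genome.
    all_genes: List[float] = [g for grp in genes_by_class.values() for g in grp]
    z_scores: Dict[str, float] = {}
    for label, group in genes_by_class.items():
        observed = analysis(group)
        # Baseline: the same analysis on random groups of the same size.
        baseline = [
            analysis(rng.sample(all_genes, len(group))) for _ in range(n_random)
        ]
        mu = statistics.mean(baseline)
        sd = statistics.stdev(baseline)
        # A zero spread means the analysis ignores which genes are chosen.
        z_scores[label] = (observed - mu) / sd if sd > 0 else 0.0
    return z_scores


def is_sensitive(z_scores: Dict[str, float], threshold: float = 3.0) -> bool:
    """Flag the analysis if any class deviates strongly (assumed threshold)."""
    return any(abs(z) > threshold for z in z_scores.values())


if __name__ == "__main__":
    # Toy data: class "1" genes differ markedly from the other four classes.
    toy = {"1": [10.0] * 20}
    toy.update({str(k): [0.0] * 20 for k in range(2, 6)})
    scores = gpac_style_z_scores(toy, statistics.mean)
    print(scores)
    print("sensitive:", is_sensitive(scores))
```

In a real setting, `analysis` would be the genome informatics computation under test and the five groups would come from the annotation accuracy classification. A flagged result would suggest selecting the input genes carefully before running the full analysis.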

Original language: English
Pages (from-to): 49-60
Number of pages: 12
Journal: In Silico Biology
Volume: 6
Issue number: 1-2
Publication status: Published - 2006 Jan 4

Keywords

  • Annotation
  • Bioinformatics
  • Genome analysis
  • Genome informatics
  • Software

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

GPAC: Benchmarking the sensitivity of genome informatics analysis to genome annotation completeness. / Arakawa, Kazuharu; Nakayama, Yoichi; Tomita, Masaru.

In: In Silico Biology, Vol. 6, No. 1-2, 04.01.2006, p. 49-60.

@article{dc4b3987867043f5a26df90f651e9e59,
title = "GPAC: Benchmarking the sensitivity of genome informatics analysis to genome annotation completeness",
keywords = "Annotation, Bioinformatics, Genome analysis, Genome informatics, Software",
author = "Kazuharu Arakawa and Yoichi Nakayama and Masaru Tomita",
year = "2006",
month = "1",
day = "4",
language = "English",
volume = "6",
pages = "49--60",
journal = "In Silico Biology",
issn = "1386-6338",
publisher = "IOS Press",
number = "1-2",

}

TY - JOUR

T1 - GPAC

T2 - Benchmarking the sensitivity of genome informatics analysis to genome annotation completeness

AU - Arakawa, Kazuharu

AU - Nakayama, Yoichi

AU - Tomita, Masaru

PY - 2006/1/4

Y1 - 2006/1/4

KW - Annotation

KW - Bioinformatics

KW - Genome analysis

KW - Genome informatics

KW - Software

UR - http://www.scopus.com/inward/record.url?scp=33745209677&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745209677&partnerID=8YFLogxK

M3 - Article

VL - 6

SP - 49

EP - 60

JO - In Silico Biology

JF - In Silico Biology

SN - 1386-6338

IS - 1-2

ER -