Improvement of structure conservation index with centroid estimators

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

RNAz, a support vector machine (SVM) approach for identifying functional non-coding RNAs (ncRNAs), has proven to be one of the most accurate tools for this task. Among the measurements used in RNAz, the Structure Conservation Index (SCI), which evaluates the evolutionary conservation of RNA secondary structures in terms of folding energies, has been reported to have an extremely high discrimination capability. In genome-wide searches with RNAz, however, a relatively high false discovery rate has been estimated. A likely cause is that multiple alignments produced by a standard aligner, which does not consider secondary structure, are in some cases unsuitable for identifying ncRNAs and therefore incur a high false discovery rate. In this study, we propose C-SCI, an improved measurement based on the SCI that applies γ-centroid estimators to gain robustness against low-quality multiple alignments. Our experiments show that the C-SCI achieves higher accuracy than the original SCI not only on human-curated structural alignments but also on low-quality alignments produced by CLUSTAL W. Furthermore, the accuracy of the C-SCI on CLUSTAL W alignments is comparable with that of the original SCI on structural alignments generated with RAF, which require 4.7-fold more computation time on average.
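
For readers unfamiliar with the original SCI, a minimal sketch follows. It uses the standard RNAz definition, the consensus folding energy of the alignment divided by the mean minimum free energy of the individual sequences, and is not the C-SCI proposed in this paper. The function name, the zero-energy convention, and the energy values are illustrative assumptions; the energies are assumed to have been computed beforehand by a folding tool.

```python
from statistics import mean


def structure_conservation_index(consensus_mfe, single_mfes):
    """Original SCI: consensus energy / mean single-sequence energy.

    consensus_mfe: folding energy (kcal/mol) of the consensus structure
                   predicted for the whole alignment.
    single_mfes:   minimum free energies (kcal/mol) of each sequence
                   folded on its own.
    """
    if not single_mfes:
        raise ValueError("need at least one single-sequence energy")
    avg_single = mean(single_mfes)
    if avg_single == 0:
        # Assumed convention for this sketch: no stable single
        # structures means no measurable structure conservation.
        return 0.0
    return consensus_mfe / avg_single


# Hypothetical energies, for illustration only.
well_conserved = structure_conservation_index(-20.0, [-25.0, -20.0, -15.0])
poorly_conserved = structure_conservation_index(-5.0, [-25.0, -20.0, -15.0])
print(well_conserved, poorly_conserved)
```

Values near 1 indicate that the consensus structure is about as stable as the individual folds, while values near 0 indicate little shared structure. A misaligned input lowers the consensus energy in magnitude and thus the SCI, which is the sensitivity to alignment quality that the abstract describes.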

Original language: English
Title of host publication: Pacific Symposium on Biocomputing 2010, PSB 2010
Pages: 88-97
Number of pages: 10
Publication status: Published - 2010
Event: 15th Pacific Symposium on Biocomputing, PSB 2010 - Kamuela, HI, United States
Duration: 2010 Jan 4 to 2010 Jan 8

Keywords

  • centroid estimators
  • non-coding RNAs
  • structure conservation index

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Biomedical Engineering
  • Medicine(all)

Cite this

Okada, Y., Sato, K., & Sakakibara, Y. (2010). Improvement of structure conservation index with centroid estimators. In Pacific Symposium on Biocomputing 2010, PSB 2010 (pp. 88-97)

@inproceedings{b1e9e7b8c8434a668a5b78cd37f3c95e,
title = "Improvement of structure conservation index with centroid estimators",
keywords = "centroid estimators, non-coding RNAs, structure conservation index",
author = "Yohei Okada and Kengo Sato and Yasubumi Sakakibara",
year = "2010",
language = "English",
isbn = "9814295299",
pages = "88--97",
booktitle = "Pacific Symposium on Biocomputing 2010, PSB 2010",

}

TY - GEN

T1 - Improvement of structure conservation index with centroid estimators

AU - Okada, Yohei

AU - Sato, Kengo

AU - Sakakibara, Yasubumi

PY - 2010

Y1 - 2010

KW - centroid estimators

KW - non-coding RNAs

KW - structure conservation index

UR - http://www.scopus.com/inward/record.url?scp=79551474846&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79551474846&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9814295299

SN - 9789814295291

SP - 88

EP - 97

BT - Pacific Symposium on Biocomputing 2010, PSB 2010

ER -