Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures

Research output: Contribution to journal › Article


Abstract

Background: Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level.

Results: We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes all possible sequence alignments and all possible secondary structures, whereas the existing methods only use one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60%.

Conclusions: Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.
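The Results paragraph rests on one idea: replace the single optimal solution of a dynamic program with a sum over all solutions. That idea can be illustrated on plain global sequence alignment. The sketch below computes two quantities. The first is the best single alignment score, a max over paths in the Needleman-Wunsch style. The second is the log of the sum of exp(score / T) over every alignment, computed with a log-sum-exp forward recursion. The scoring scheme is a hypothetical toy choice, not taken from the paper: match +1, mismatch -1, linear gap -2. The function names `alignment_scores` and `pair_score` and the `temperature` parameter are likewise hypothetical. The sketch covers sequence alignments only. It is not the authors' similarity measure, which also accounts for all possible secondary structures.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def pair_score(a, b, match=1.0, mismatch=-1.0):
    # Hypothetical toy substitution score (assumption, not from the paper).
    return match if a == b else mismatch


def alignment_scores(x, y, gap=-2.0, temperature=1.0):
    """Return (best, log_z) for the global alignment of x and y.

    best  : score of the single optimal alignment (max over all paths).
    log_z : log of the sum of exp(score / temperature) over ALL alignments.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    n, m = len(x), len(y)
    best = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    log_z = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    log_z[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            # Each step is (predecessor row, predecessor column, step score).
            steps = []
            if i > 0 and j > 0:  # align x[i-1] with y[j-1]
                steps.append((i - 1, j - 1, pair_score(x[i - 1], y[j - 1])))
            if i > 0:  # x[i-1] aligned to a gap
                steps.append((i - 1, j, gap))
            if j > 0:  # y[j-1] aligned to a gap
                steps.append((i, j - 1, gap))
            # Optimal-path recursion: keep only the best predecessor.
            best[i][j] = max(best[pi][pj] + s for pi, pj, s in steps)
            # Ensemble recursion: sum (in log space) over every predecessor.
            log_z[i][j] = logsumexp(
                [log_z[pi][pj] + s / temperature for pi, pj, s in steps]
            )
    return best[n][m], log_z[n][m]


if __name__ == "__main__":
    b, z = alignment_scores("GGACUUCC", "GGAUUCC")
    print(f"best single alignment: {b:.3f}  log-sum over all alignments: {z:.3f}")
```

Every alignment contributes to `log_z`, so `temperature * log_z` approaches `best` as the temperature goes to 0. Larger temperatures give more weight to suboptimal alignments. Extending the same summation over all secondary structures would typically need a partition-function recursion over base pairs. That step is not attempted here, and the paper's exact formulation is not reproduced.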

Original language: English
Article number: S48
Journal: BMC Bioinformatics
Volume: 12
Issue number: SUPPL. 1
DOI: 10.1186/1471-2105-12-S1-S48
Publication status: Published - 2011 Feb 15

Fingerprint

Untranslated RNA
Sequence Alignment
Secondary Structure
RNA
Cluster Analysis
Ensemble
Clustering
Alignment
Approximate Algorithm
Hierarchical Clustering
Similarity Measure
Computational Cost
Dynamic programming
Costs and Cost Analysis
Costs
Clustering Methods
High Performance
Heuristics
kernel

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures. / Saito, Yutaka; Sato, Kengo; Sakakibara, Yasubumi.

In: BMC Bioinformatics, Vol. 12, No. SUPPL. 1, S48, 15.02.2011.


@article{c557d94220a5491e9673ce7bea07a63c,
title = "Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures",
abstract = "Background: Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level. Results: We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes all possible sequence alignments and all possible secondary structures, whereas the existing methods only use one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60{\%}. Conclusions: Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.",
author = "Yutaka Saito and Kengo Sato and Yasubumi Sakakibara",
year = "2011",
month = "2",
day = "15",
doi = "10.1186/1471-2105-12-S1-S48",
language = "English",
volume = "12",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL. 1",

}

TY  - JOUR
T1  - Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures
AU  - Saito, Yutaka
AU  - Sato, Kengo
AU  - Sakakibara, Yasubumi
PY  - 2011/2/15
Y1  - 2011/2/15
N2  - Background: Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level. Results: We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes all possible sequence alignments and all possible secondary structures, whereas the existing methods only use one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60%. Conclusions: Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.
AB  - Background: Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level. Results: We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes all possible sequence alignments and all possible secondary structures, whereas the existing methods only use one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60%. Conclusions: Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.
UR  - http://www.scopus.com/inward/record.url?scp=79951519599&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79951519599&partnerID=8YFLogxK
U2  - 10.1186/1471-2105-12-S1-S48
DO  - 10.1186/1471-2105-12-S1-S48
M3  - Article
VL  - 12
JO  - BMC Bioinformatics
JF  - BMC Bioinformatics
SN  - 1471-2105
IS  - SUPPL. 1
M1  - S48
ER  - 