TY - JOUR
T1 - Fast and accurate clustering of noncoding RNAs using ensembles of sequence alignments and secondary structures
AU - Saito, Yutaka
AU - Sato, Kengo
AU - Sakakibara, Yasubumi
N1 - Funding Information:
This work was supported by KAKENHI (Grant-in-Aid for Scientific Research) on Innovative Areas (No.221S0002) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan. This work was also supported by KAKENHI on Priority Area “Comparative Genomics” (No.17018029), and by a grant from “Functional RNA Project” funded by the New Energy and Industrial Technology Development Organization (NEDO), Japan. KS was supported in part by Global COE program “Deciphering Biosphere from Genome Big Bang”, and by KAKENHI for Young Scientists (B) (No.22700305) from MEXT, Japan. This article has been published as part of BMC Bioinformatics Volume 12 Supplement 1, 2011: Selected articles from the Ninth Asia Pacific Bioinformatics Conference (APBC 2011). The full contents of the supplement are available online at http://www.biomedcentral.com/1471-2105/12?issue=S1.
PY - 2011/2/15
Y1 - 2011/2/15
N2 - Background: Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level. Results: We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes all possible sequence alignments and all possible secondary structures, whereas the existing methods only use one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60%. Conclusions: Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.
AB - Background: Clustering of unannotated transcripts is an important task to identify novel families of noncoding RNAs (ncRNAs). Several hierarchical clustering methods have been developed using similarity measures based on the scores of structural alignment. However, the high computational cost of exact structural alignment requires these methods to employ approximate algorithms. Such heuristics degrade the quality of clustering results, especially when the similarity among family members is not detectable at the primary sequence level. Results: We describe a new similarity measure for the hierarchical clustering of ncRNAs. The idea is that the reliability of approximate algorithms can be improved by utilizing the information of suboptimal solutions in their dynamic programming frameworks. We approximate structural alignment in a more simplified manner than the existing methods. Instead, our method utilizes all possible sequence alignments and all possible secondary structures, whereas the existing methods only use one optimal sequence alignment and one optimal secondary structure. We demonstrate that this strategy can achieve the best balance between the computational cost and the quality of the clustering. In particular, our method can keep its high performance even when the sequence identity of family members is less than 60%. Conclusions: Our method enables fast and accurate clustering of ncRNAs. The software is available for download at http://bpla-kernel.dna.bio.keio.ac.jp/clustering/.
UR - http://www.scopus.com/inward/record.url?scp=79951519599&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79951519599&partnerID=8YFLogxK
U2 - 10.1186/1471-2105-12-S1-S48
DO - 10.1186/1471-2105-12-S1-S48
M3 - Article
C2 - 21342580
AN - SCOPUS:79951519599
VL - 12
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
IS - SUPPL. 1
M1 - S48
ER -