RNA secondary structural alignment with conditional random fields

Research output: Contribution to journal › Article

52 Citations (Scopus)

Abstract

Motivation: The computational identification of non-coding RNA regions on the genome is currently receiving much attention. However, it is essentially harder than gene-finding problems for protein-coding regions because non-coding RNA sequences do not have strong statistical signals. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignment of RNA sequences. Several methods have been proposed to accomplish the structural alignment tasks for RNA sequences, and we found that one of the most important points is to estimate an accurate score matrix for calculating structural alignments.

Results: We propose a novel approach for RNA structural alignment based on conditional random fields (CRFs). Our approach has some specific features compared with previous methods in the sense that the parameters for structural alignment are estimated such that the model can most probably discriminate between correct alignments and incorrect alignments, and has the generalization ability so that a satisfiable score matrix can be obtained even with a small number of sample data without overfitting. Experimental results clearly show that the parameter estimation with CRFs can outperform all the other existing methods for structural alignments of RNA sequences. Furthermore, structural alignment search based on CRFs is more accurate for predicting non-coding RNA regions than the other scoring methods. These experimental results strongly support our discriminative method employing CRFs to estimate the score matrix parameters.

Original language: English
Journal: Bioinformatics
Volume: 21
Issue number: SUPPL. 2
DOI: 10.1093/bioinformatics/bti1139
Publication status: Published - 2005 Sep

Fingerprint

Conditional Random Fields
RNA
Untranslated RNA
Alignment
Open Reading Frames
Sequence Analysis
Genes
Research Design
Genome
Overfitting
Experimental Results
Computational methods
Scoring
Comparative Analysis
Estimate
Parameter estimation
Coding
Gene

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Biochemistry
  • Molecular Biology
  • Computational Mathematics
  • Statistics and Probability

Cite this

RNA secondary structural alignment with conditional random fields. / Sato, Kengo; Sakakibara, Yasubumi.

In: Bioinformatics, Vol. 21, No. SUPPL. 2, 09.2005.

@article{cfd7a6573d0c468d89ddb56f10ad3bb8,
title = "RNA secondary structural alignment with conditional random fields",
abstract = "Motivation: The computational identification of non-coding RNA regions on the genome is currently receiving much attention. However, it is essentially harder than gene-finding problems for protein-coding regions because non-coding RNA sequences do not have strong statistical signals. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignment of RNA sequences. Several methods have been proposed to accomplish the structural alignment tasks for RNA sequences, and we found that one of the most important points is to estimate an accurate score matrix for calculating structural alignments. Results: We propose a novel approach for RNA structural alignment based on conditional random fields (CRFs). Our approach has some specific features compared with previous methods in the sense that the parameters for structural alignment are estimated such that the model can most probably discriminate between correct alignments and incorrect alignments, and has the generalization ability so that a satisfiable score matrix can be obtained even with a small number of sample data without overfitting. Experimental results clearly show that the parameter estimation with CRFs can outperform all the other existing methods for structural alignments of RNA sequences. Furthermore, structural alignment search based on CRFs is more accurate for predicting non-coding RNA regions than the other scoring methods. These experimental results strongly support our discriminative method employing CRFs to estimate the score matrix parameters.",
author = "Kengo Sato and Yasubumi Sakakibara",
year = "2005",
month = "9",
doi = "10.1093/bioinformatics/bti1139",
language = "English",
volume = "21",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "SUPPL. 2",
}

TY - JOUR
T1 - RNA secondary structural alignment with conditional random fields
AU - Sato, Kengo
AU - Sakakibara, Yasubumi
PY - 2005/9
Y1 - 2005/9

AB - Motivation: The computational identification of non-coding RNA regions on the genome is currently receiving much attention. However, it is essentially harder than gene-finding problems for protein-coding regions because non-coding RNA sequences do not have strong statistical signals. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for structural alignment of RNA sequences. Several methods have been proposed to accomplish the structural alignment tasks for RNA sequences, and we found that one of the most important points is to estimate an accurate score matrix for calculating structural alignments. Results: We propose a novel approach for RNA structural alignment based on conditional random fields (CRFs). Our approach has some specific features compared with previous methods in the sense that the parameters for structural alignment are estimated such that the model can most probably discriminate between correct alignments and incorrect alignments, and has the generalization ability so that a satisfiable score matrix can be obtained even with a small number of sample data without overfitting. Experimental results clearly show that the parameter estimation with CRFs can outperform all the other existing methods for structural alignments of RNA sequences. Furthermore, structural alignment search based on CRFs is more accurate for predicting non-coding RNA regions than the other scoring methods. These experimental results strongly support our discriminative method employing CRFs to estimate the score matrix parameters.

UR - http://www.scopus.com/inward/record.url?scp=27544451563&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=27544451563&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bti1139
DO - 10.1093/bioinformatics/bti1139
M3 - Article
C2 - 16204111
AN - SCOPUS:27544451563
VL - 21
JO - Bioinformatics
JF - Bioinformatics
SN - 1367-4803
IS - SUPPL. 2
ER -