Stem kernels for RNA sequence analyses

Yasubumi Sakakibara, Kris Popendorf, Nana Ogawa, Kiyoshi Asai, Kengo Sato

Research output: Contribution to journal (Article)

10 Citations (Scopus)

Abstract

Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA, and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from nonmembers and hence detect noncoding RNA regions from genome sequences. A novel kernel function, stem kernel, for the discrimination and detection of functional RNA sequences using support vector machines (SVMs) is proposed. The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient algorithm is developed to calculate the stem kernels based on dynamic programming. The stem kernels are then applied to discriminate members of an RNA family from nonmembers using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Furthermore, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures. This is because the string kernel is proven to work for the remote homology detection of protein sequences. These experimental results have convinced us to apply the stem kernel in order to find novel RNA families from genome sequences.
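The abstract describes the stem kernel as an extension of the all-subsequences string kernel, computed by dynamic programming. As a point of reference only, the following is a minimal sketch of the textbook all-subsequences kernel on plain strings. It is not the stem kernel from the paper: it ignores base pairing entirely, and the function name `all_subsequences_kernel` is an assumption made for this illustration.

```python
def all_subsequences_kernel(s: str, t: str) -> int:
    """Count matching pairs of (possibly non-contiguous) subsequence occurrences.

    This equals the inner product of the two subsequence-count vectors,
    with the empty subsequence included. Runs in O(len(s) * len(t)) time.
    """
    m = len(t)
    # prev[j] holds K(s[:i-1], t[:j]). For an empty prefix of s only the
    # empty subsequence is shared, so every entry starts at 1.
    prev = [1] * (m + 1)
    for ch in s:
        cur = [1] * (m + 1)  # K(s[:i], "") = 1
        running = 0          # sum of prev[k-1] over positions k <= j with t[k-1] == ch
        for j in range(1, m + 1):
            if t[j - 1] == ch:
                running += prev[j - 1]
            # Shared subsequences either skip the last symbol of s[:i] (prev[j])
            # or end with it, matched to some equal symbol in t[:j] (running).
            cur[j] = prev[j] + running
        prev = cur
    return prev[m]


if __name__ == "__main__":
    print(all_subsequences_kernel("GCAU", "GACU"))
```

A kernel of this kind can be evaluated on all pairs of training sequences to build a Gram matrix for an SVM. The stem kernel summarized above instead counts common base pairs and stem structures, which this sketch does not attempt.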

Original language: English
Pages (from-to): 1103-1122
Number of pages: 20
Journal: Journal of Bioinformatics and Computational Biology
Volume: 5
Issue number: 5
DOI: 10.1142/S0219720007003028
Publication status: Published - 2007 Oct

Keywords

  • RNA
  • Secondary structure
  • Stem kernel
  • String kernel
  • SVM

ASJC Scopus subject areas

  • Medicine (all)
  • Cell Biology

Cite this

Stem kernels for RNA sequence analyses. / Sakakibara, Yasubumi; Popendorf, Kris; Ogawa, Nana; Asai, Kiyoshi; Sato, Kengo.

In: Journal of Bioinformatics and Computational Biology, Vol. 5, No. 5, 10.2007, p. 1103-1122.

@article{daa7871c44f742f38962a3aab8d0c4d7,
title = "Stem kernels for {RNA} sequence analyses",
keywords = "RNA, Secondary structure, Stem kernel, String kernel, SVM",
author = "Yasubumi Sakakibara and Kris Popendorf and Nana Ogawa and Kiyoshi Asai and Kengo Sato",
year = "2007",
month = "10",
doi = "10.1142/S0219720007003028",
language = "English",
volume = "5",
pages = "1103--1122",
journal = "Journal of Bioinformatics and Computational Biology",
issn = "0219-7200",
publisher = "World Scientific Publishing Co. Pte Ltd",
number = "5",

}

TY  - JOUR
T1  - Stem kernels for RNA sequence analyses
AU  - Sakakibara, Yasubumi
AU  - Popendorf, Kris
AU  - Ogawa, Nana
AU  - Asai, Kiyoshi
AU  - Sato, Kengo
PY  - 2007/10
Y1  - 2007/10
AB  - Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA, and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from nonmembers and hence detect noncoding RNA regions from genome sequences. A novel kernel function, stem kernel, for the discrimination and detection of functional RNA sequences using support vector machines (SVMs) is proposed. The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient algorithm is developed to calculate the stem kernels based on dynamic programming. The stem kernels are then applied to discriminate members of an RNA family from nonmembers using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Furthermore, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures. This is because the string kernel is proven to work for the remote homology detection of protein sequences. These experimental results have convinced us to apply the stem kernel in order to find novel RNA families from genome sequences.
KW  - RNA
KW  - Secondary structure
KW  - Stem kernel
KW  - String kernel
KW  - SVM
UR  - http://www.scopus.com/inward/record.url?scp=35348881458&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=35348881458&partnerID=8YFLogxK
U2  - 10.1142/S0219720007003028
DO  - 10.1142/S0219720007003028
M3  - Article
C2  - 17933013
AN  - SCOPUS:35348881458
VL  - 5
SP  - 1103
EP  - 1122
JO  - Journal of Bioinformatics and Computational Biology
JF  - Journal of Bioinformatics and Computational Biology
SN  - 0219-7200
IS  - 5
ER  -