Stem kernels for RNA sequence analyses

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from non-members, and hence cannot reliably detect non-coding RNA regions in genome sequences. This paper proposes a novel kernel function, the stem kernel, for the discrimination and detection of functional RNA sequences using support vector machines (SVMs). The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. It examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots, between two RNA sequences and calculates the inner product of the common stem structure counts. An efficient dynamic-programming algorithm was developed to calculate the stem kernel. The stem kernel is then applied to discriminate members of an RNA family from non-members using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Further, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures. This application is motivated by the fact that string kernels have been shown to work for remote homology detection of protein sequences. These experimental results encourage applying the stem kernel to find novel RNA families in genome sequences.
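
As a rough illustration of the all-subsequences kernel that the abstract names as the starting point for the stem kernel, the following Python sketch computes it by dynamic programming. It is the textbook string kernel only: it is not the stem kernel and not the authors' implementation. The function name `all_subsequences_kernel` and the example sequences are hypothetical.

```python
def all_subsequences_kernel(s: str, t: str) -> int:
    """Inner product of the subsequence-count vectors of s and t.

    Equivalently, the number of pairs (index set in s, index set in t)
    that spell the same, possibly non-contiguous, subsequence.
    The empty subsequence is counted once.
    """
    n, m = len(s), len(t)
    # dp[i][j] holds the kernel value for the prefixes s[:i] and t[:j].
    # Row 0 and column 0 stay at 1: only the empty subsequence matches.
    dp = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Inclusion-exclusion over matches that skip s[i-1] or t[j-1].
            dp[i][j] = dp[i - 1][j] + dp[i][j - 1] - dp[i - 1][j - 1]
            # Matches that end on both last characters extend any
            # match between the two shorter prefixes.
            if s[i - 1] == t[j - 1]:
                dp[i][j] += dp[i - 1][j - 1]
    return dp[n][m]


if __name__ == "__main__":
    # Hypothetical toy RNA fragments, chosen only for illustration.
    print(all_subsequences_kernel("GCAU", "GAU"))
```

The table takes O(|s||t|) time and space. According to the abstract, the stem kernel counts common base pairs and stem structures rather than plain subsequences, which this sketch does not attempt.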

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 278-291
Number of pages: 14
Volume: 4414 LNBI
Publication status: Published - 2007
Event: 1st International Conference on Bioinformatics Research and Development, BIRD 2007 - Berlin, Germany
Duration: 2007 Mar 12 to 2007 Mar 14

Fingerprint

RNA Sequence Analysis
RNA
kernel
Secondary Structure
Genome
Amino Acid Sequence Homology
Untranslated RNA
Aptitude
Support vector machines
Discrimination
Base Pairing
Genes
Strings
Context free grammars
Calculate
Stochastic models
Computational methods
Dynamic programming

Keywords

  • RNA
  • Secondary structure
  • Stem kernel
  • String kernel
  • SVM

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

Sakakibara, Y., Asai, K., & Sato, K. (2007). Stem kernels for RNA sequence analyses. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4414 LNBI, pp. 278-291).

@inproceedings{acbbb2bcf07948efac4a2c34453a9a8f,
title = "Stem kernels for RNA sequence analyses",
keywords = "RNA, Secondary structure, Stem kernel, String kernel, SVM",
author = "Yasubumi Sakakibara and Kiyoshi Asai and Kengo Sato",
year = "2007",
language = "English",
isbn = "3540712321",
volume = "4414 LNBI",
pages = "278--291",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Stem kernels for RNA sequence analyses

AU - Sakakibara, Yasubumi

AU - Asai, Kiyoshi

AU - Sato, Kengo

PY - 2007

Y1 - 2007

KW - RNA

KW - Secondary structure

KW - Stem kernel

KW - String kernel

KW - SVM

UR - http://www.scopus.com/inward/record.url?scp=34548081526&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34548081526&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540712321

SN - 9783540712329

VL - 4414 LNBI

SP - 278

EP - 291

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -