TY - GEN
T1 - Stem kernels for RNA sequence analyses
AU - Sakakibara, Yasubumi
AU - Asai, Kiyoshi
AU - Sato, Kengo
PY - 2007
Y1 - 2007
N2 - Several computational methods based on stochastic context-free grammars have been developed for modeling and analyzing functional RNA sequences. These grammatical methods have succeeded in modeling typical secondary structures of RNA and are used for structural alignment of RNA sequences. However, such stochastic models cannot sufficiently discriminate member sequences of an RNA family from non-members, and hence cannot adequately detect non-coding RNA regions in genome sequences. A novel kernel function, the stem kernel, is proposed for the discrimination and detection of functional RNA sequences using support vector machines (SVMs). The stem kernel is a natural extension of the string kernel, specifically the all-subsequences kernel, and is tailored to measure the similarity of two RNA sequences from the viewpoint of secondary structures. The stem kernel examines all possible common base pairs and stem structures of arbitrary lengths, including pseudoknots, between two RNA sequences, and calculates the inner product of common stem structure counts. An efficient algorithm based on dynamic programming was developed to calculate the stem kernels. The stem kernels are then applied to discriminate members of an RNA family from non-members using SVMs. The study indicates that the discrimination ability of the stem kernel is strong compared with conventional methods. Further, the potential application of the stem kernel is demonstrated by the detection of remotely homologous RNA families in terms of secondary structures; this application is motivated by the fact that the string kernel has been shown to work for remote homology detection of protein sequences. These experimental results have convinced us to apply the stem kernel to find novel RNA families from genome sequences.
KW - RNA
KW - SVM
KW - Secondary structure
KW - Stem kernel
KW - String kernel
UR - http://www.scopus.com/inward/record.url?scp=34548081526&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34548081526&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-71233-6_22
DO - 10.1007/978-3-540-71233-6_22
M3 - Conference contribution
AN - SCOPUS:34548081526
SN - 3540712321
SN - 9783540712329
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 278
EP - 291
BT - Bioinformatics Research and Development - First International Conference, BIRD 2007 Proceedings
PB - Springer Verlag
T2 - 1st International Conference on Bioinformatics Research and Development, BIRD 2007
Y2 - 12 March 2007 through 14 March 2007
ER -