Stochastic context-free grammars for modeling RNA

Yasubumi Sakakibara, Michael Brown, Rebecca C. Underwood, I. Saira Mian, David Haussler

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

Stochastic context-free grammars (SCFGs) are used to fold, align and model a family of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. The novel aspect of this work is that SCFG parameters are learned automatically from unaligned, unfolded training sequences. A generalization of the HMM forward-backward algorithm is introduced. The new algorithm, based on tree grammars and faster than the previously proposed SCFG inside-outside algorithm, is tested on the transfer RNA (tRNA) family. Results show the model can discern tRNA from similar-length RNA sequences, can find secondary structure of new tRNA sequences, and can give multiple alignments of large sets of tRNA sequences. The model is extended to handle introns in tRNA.
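
The way an SCFG generalizes the HMM forward pass can be illustrated with a minimal sketch of the inside computation on a toy grammar. This is not the tree-grammar algorithm introduced in the paper, and the grammar, the rule probabilities, and the function name `inside_probability` are all assumptions made for illustration. The toy grammar has one nonterminal S with three rule types: S -> x S y for a Watson-Crick pair (x, y), S -> x S for an unpaired base, and S -> x to terminate.

```python
# Toy SCFG over the RNA alphabet {a, c, g, u}; all probabilities are
# illustrative assumptions, chosen so that the rules for S sum to 1:
#   4 pair rules  S -> x S y  at 0.10 each  (0.40)
#   4 left rules  S -> x S    at 0.10 each  (0.40)
#   4 end rules   S -> x      at 0.05 each  (0.20)
PAIRS = {("a", "u"), ("u", "a"), ("c", "g"), ("g", "c")}
P_PAIR = 0.10
P_LEFT = 0.10
P_END = 0.05


def inside_probability(seq):
    """Return the probability that S derives seq, summed over all parses."""
    n = len(seq)
    if n == 0:
        return 0.0  # this toy grammar has no empty production
    # inside[i][j] holds the probability that S derives seq[i:j].
    inside = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        inside[i][i + 1] = P_END  # S -> x
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # S -> x S : first base is unpaired.
            total = P_LEFT * inside[i + 1][j]
            # S -> x S y : first and last base pair around a non-empty core.
            if length >= 3 and (seq[i], seq[j - 1]) in PAIRS:
                total += P_PAIR * inside[i + 1][j - 1]
            inside[i][j] = total
    return inside[0][n]
```

Unlike an HMM forward pass, which fills a table indexed by a single position, this table is indexed by subsequence spans, which is what lets the pair rule tie together bases that are far apart in the primary sequence.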

Original language: English
Title of host publication: Proceedings of the Hawaii International Conference on System Sciences
Editors: Jay F. Nunamaker, Ralph H. Sprague Jr.
Publisher: IEEE
Pages: 284-293
Number of pages: 10
Volume: 5
ISBN (Print): 0818650907
Publication status: Published - 1995
Externally published: Yes
Event: Proceedings of the 27th Hawaii International Conference on System Sciences (HICSS-27). Part 4 (of 5) - Wailea, HI, USA
Duration: 1994 Jan 4 to 1994 Jan 7

Fingerprint

Context free grammars
RNA
Hidden Markov models
DNA
Transfer RNA
Proteins

ASJC Scopus subject areas

  • Software
  • Industrial and Manufacturing Engineering

Cite this

Sakakibara, Y., Brown, M., Underwood, R. C., Mian, I. S., & Haussler, D. (1995). Stochastic context-free grammars for modeling RNA. In J. F. Nunamaker & R. H. Sprague Jr. (Eds.), Proceedings of the Hawaii International Conference on System Sciences (Vol. 5, pp. 284-293). IEEE.

@inproceedings{d71fa1d6283c4c4a9ed2f26d69ae1f25,
title = "Stochastic context-free grammars for modeling RNA",
abstract = "Stochastic context-free grammars (SCFGs) are used to fold, align and model a family of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. The novel aspect of this work is that SCFG parameters are learned automatically from unaligned, unfolded training sequences. A generalization of the HMM forward-backward algorithm is introduced. The new algorithm, based on tree grammars and faster than the previously proposed SCFG inside-outside algorithm, is tested on the transfer RNA (tRNA) family. Results show the model can discern tRNA from similar-length RNA sequences, can find secondary structure of new tRNA sequences, and can give multiple alignments of large sets of tRNA sequences. The model is extended to handle introns in tRNA.",
author = "Yasubumi Sakakibara and Michael Brown and Underwood, {Rebecca C.} and Mian, {I. Saira} and David Haussler",
year = "1995",
language = "English",
isbn = "0818650907",
volume = "5",
pages = "284--293",
editor = "Nunamaker, {Jay F.} and Sprague, Jr., {Ralph H.}",
booktitle = "Proceedings of the Hawaii International Conference on System Sciences",
publisher = "IEEE",

}

TY - GEN

T1 - Stochastic context-free grammars for modeling RNA

AU - Sakakibara, Yasubumi

AU - Brown, Michael

AU - Underwood, Rebecca C.

AU - Mian, I. Saira

AU - Haussler, David

PY - 1995

Y1 - 1995

AB - Stochastic context-free grammars (SCFGs) are used to fold, align and model a family of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. The novel aspect of this work is that SCFG parameters are learned automatically from unaligned, unfolded training sequences. A generalization of the HMM forward-backward algorithm is introduced. The new algorithm, based on tree grammars and faster than the previously proposed SCFG inside-outside algorithm, is tested on the transfer RNA (tRNA) family. Results show the model can discern tRNA from similar-length RNA sequences, can find secondary structure of new tRNA sequences, and can give multiple alignments of large sets of tRNA sequences. The model is extended to handle introns in tRNA.

UR - http://www.scopus.com/inward/record.url?scp=0028907514&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0028907514&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0028907514

SN - 0818650907

VL - 5

SP - 284

EP - 293

BT - Proceedings of the Hawaii International Conference on System Sciences

A2 - Nunamaker, Jay F.

A2 - Sprague, Ralph H., Jr.

PB - IEEE

ER -