Improved prediction of transcription binding sites from chromatin modification data

Kengo Sato, Tom Whitington, Timothy L. Bailey, Paul Horton

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we apply machine learning to the task of predicting transcription factor binding sites by combining information on multiple forms of chromatin modification with the binding strength of the DNA site as predicted by a position weight matrix. We additionally explore the effect of incorporating auxiliary features such as the distance of the site to the nearest gene's transcription start site and the degree to which the site is conserved among related species. We approach the task as a classification problem, and show that both Naïve Bayes and Random Forests can provide substantial increases in the accuracy of predicted binding sites. Our results extend previous work, which simply filtered candidate sites based on H3K4Me3 chromatin modification scores. In addition, we apply feature selection to explore which forms of chromatin modification and which auxiliary features have predictive value for which transcription factors.
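The combination described in the abstract can be illustrated with a minimal sketch: a position weight matrix (PWM) log-odds score is computed for each candidate site and passed, together with a chromatin modification signal, to a Naïve Bayes classifier. Everything specific here is an assumption made for illustration and is not taken from the paper: the PWM values, the uniform background, the toy candidate sites, the single chromatin signal feature, and the choice of a Gaussian likelihood for Naïve Bayes. Random Forests, auxiliary features, and feature selection are not shown.

```python
import math

# Hypothetical 4-position PWM: per-position base probabilities (illustrative values only).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # uniform background base frequency (assumption)


def pwm_score(site):
    """Log-odds score of a DNA site under the PWM versus the background."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, site))


def fit_gaussian_nb(rows, labels):
    """Estimate the prior, feature means, and feature variances for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6  # small floor avoids zero variance
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature vector."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        total = math.log(prior)
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total
    return max(model, key=log_posterior)


# Toy candidate sites: (sequence, hypothetical chromatin modification signal, bound?)
CANDIDATES = [
    ("ACGT", 5.1, 1), ("ACGA", 4.8, 1), ("ACGT", 4.4, 1),
    ("ACGT", 0.3, 0), ("TTTT", 0.5, 0), ("GCGT", 0.9, 0),
]
features = [[pwm_score(seq), signal] for seq, signal, _ in CANDIDATES]
labels = [bound for _, _, bound in CANDIDATES]
model = fit_gaussian_nb(features, labels)
```

In this toy data the same sequence, ACGT, appears both as a bound and as an unbound site, so the PWM score alone cannot separate the two classes; calling `predict(model, [pwm_score("ACGT"), 5.0])` and `predict(model, [pwm_score("ACGT"), 0.4])` shows how the added chromatin feature changes the decision.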

Original language: English
Title of host publication: 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010
Pages: 220-226
Number of pages: 7
DOIs: https://doi.org/10.1109/CIBCB.2010.5510323
Publication status: Published - 2010
Externally published: Yes
Event: 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010 - Montreal, QC, Canada
Duration: 2010 May 2 to 2010 May 5

Other

Other: 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010
Country: Canada
City: Montreal, QC
Period: 10/5/2 to 10/5/5

Fingerprint

Transcription factors
Binding sites
Transcription
Learning systems
Feature extraction
DNA
Genes

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Biomedical Engineering

Cite this

Sato, K., Whitington, T., Bailey, T. L., & Horton, P. (2010). Improved prediction of transcription binding sites from chromatin modification data. In 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010 (pp. 220-226). [5510323] https://doi.org/10.1109/CIBCB.2010.5510323

@inproceedings{33c8bb5a39bb481d9c2f267e2af85e29,
title = "Improved prediction of transcription binding sites from chromatin modification data",
author = "Kengo Sato and Tom Whitington and Bailey, {Timothy L.} and Paul Horton",
year = "2010",
doi = "10.1109/CIBCB.2010.5510323",
language = "English",
isbn = "9781424467662",
pages = "220--226",
booktitle = "2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010",

}

TY - GEN

T1 - Improved prediction of transcription binding sites from chromatin modification data

AU - Sato, Kengo

AU - Whitington, Tom

AU - Bailey, Timothy L.

AU - Horton, Paul

PY - 2010

Y1 - 2010

UR - http://www.scopus.com/inward/record.url?scp=77955604271&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955604271&partnerID=8YFLogxK

U2 - 10.1109/CIBCB.2010.5510323

DO - 10.1109/CIBCB.2010.5510323

M3 - Conference contribution

SN - 9781424467662

SP - 220

EP - 226

BT - 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010

ER -