TY - GEN
T1 - Improved prediction of transcription binding sites from chromatin modification data
AU - Sato, Kengo
AU - Whitington, Tom
AU - Bailey, Timothy L.
AU - Horton, Paul
PY - 2010/8/20
Y1 - 2010/8/20
N2 - In this paper we apply machine learning to the task of predicting transcription factor binding sites by combining information on multiple forms of chromatin modification with the binding strength of the DNA site as predicted by a position weight matrix. We additionally explore the effect of incorporating auxiliary features such as the distance of the site to the nearest gene's transcription start site and the degree to which the site is conserved among related species. We approach the task as a classification problem, and show that both Naïve Bayes and Random Forests can provide substantial increases in the accuracy of predicted binding sites. Our results extend previous work, which simply filtered candidate sites based on H3K4Me3 chromatin modification scores. In addition, we apply feature selection to explore which forms of chromatin modification and which auxiliary features have predictive value for which transcription factors.
AB - In this paper we apply machine learning to the task of predicting transcription factor binding sites by combining information on multiple forms of chromatin modification with the binding strength of the DNA site as predicted by a position weight matrix. We additionally explore the effect of incorporating auxiliary features such as the distance of the site to the nearest gene's transcription start site and the degree to which the site is conserved among related species. We approach the task as a classification problem, and show that both Naïve Bayes and Random Forests can provide substantial increases in the accuracy of predicted binding sites. Our results extend previous work, which simply filtered candidate sites based on H3K4Me3 chromatin modification scores. In addition, we apply feature selection to explore which forms of chromatin modification and which auxiliary features have predictive value for which transcription factors.
UR - http://www.scopus.com/inward/record.url?scp=77955604271&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955604271&partnerID=8YFLogxK
U2 - 10.1109/CIBCB.2010.5510323
DO - 10.1109/CIBCB.2010.5510323
M3 - Conference contribution
AN - SCOPUS:77955604271
SN - 9781424467662
T3 - 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010
SP - 220
EP - 226
BT - 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010
T2 - 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2010
Y2 - 2 May 2010 through 5 May 2010
ER -