Discriminative discovery of transcription factor binding sites from location data

Yuji Kawada, Yasubumi Sakakibara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Motivation: Genome-wide location analyses based on chromatin immunoprecipitation (ChIP) data offer new insight for in silico analysis of transcriptional regulation. Results: We propose a novel discriminative discovery framework for precisely identifying transcriptional regulatory motifs from both positive and negative samples (the sets of upstream sequences of genes bound and not bound by a transcription factor (TF), respectively), based on genome-wide location data. In this framework, our goal is to find the discriminative motifs that best explain the location data, in the sense that the motifs precisely discriminate the positive samples from the negative ones. First, to discover an initial set of substrings that discriminate between positive and negative samples, we apply a decision tree learning method that produces a text-classification tree. We extract several clusters of similar substrings from the internal nodes of the learned tree. Second, we construct an initial profile-HMM from each cluster to represent a putative motif and iteratively refine the profile-HMMs to improve discrimination accuracy. Our genome-wide experimental results on yeast show that our method successfully identifies the consensus sequences reported in the literature for known TFs and achieves significant performance in discriminating between positive and negative samples for all the TFs, whereas most other motif detection methods perform very poorly on this discrimination problem. Our learned profile-HMMs also improve on false negative predictions in the ChIP data.
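The first step of the abstract (finding substrings that separate bound from unbound upstream sequences) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the split criterion, so the code below assumes information gain on the presence or absence of a fixed-length substring, which is one common way a decision tree would choose its root test. All function names and the toy sequences are hypothetical.

```python
import math


def entropy(pos, neg):
    """Entropy in bits of a set holding `pos` positive and `neg` negative items."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def kmers(seq, k):
    """Set of distinct length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def information_gain(kmer, positives, negatives):
    """Entropy reduction from splitting the sequences on presence of `kmer`."""
    pos_with = sum(kmer in s for s in positives)
    neg_with = sum(kmer in s for s in negatives)
    pos_without = len(positives) - pos_with
    neg_without = len(negatives) - neg_with
    total = len(positives) + len(negatives)
    before = entropy(len(positives), len(negatives))
    after = (
        (pos_with + neg_with) / total * entropy(pos_with, neg_with)
        + (pos_without + neg_without) / total * entropy(pos_without, neg_without)
    )
    return before - after


def best_discriminative_kmers(positives, negatives, k, top=3):
    """Rank every observed k-mer by information gain (assumed criterion)."""
    candidates = set()
    for s in positives + negatives:
        candidates |= kmers(s, k)
    scored = [(information_gain(c, positives, negatives), c) for c in candidates]
    # Highest gain first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:top]


# Toy data (hypothetical): every positive sequence contains CGTG.
positives = ["AACGTGAA", "TTCGTGTT", "GCGTGC"]
negatives = ["AAAATTTT", "GGGGCCCC", "ATATATAT"]
ranking = best_discriminative_kmers(positives, negatives, k=4)
```

On this toy input the substring CGTG ranks first because it occurs in all positive and no negative sequences. A full decision tree would apply the same test recursively to each branch, and the clustering and profile-HMM refinement steps described in the abstract are not covered by this sketch.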

Original language: English
Title of host publication: Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
Pages: 86-92
Number of pages: 7
Volume: 2005
DOIs: https://doi.org/10.1109/CSB.2005.30
Publication status: Published - 2005
Event: 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005 - Stanford, CA, United States
Duration: 2005 Aug 8 → 2005 Aug 11

Other

Other: 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005
Country: United States
City: Stanford, CA
Period: 05/8/8 → 05/8/11

Fingerprint

Transcription factors
Binding sites
Genes
Chromatin Immunoprecipitation
Genome
Decision Trees
Consensus Sequence
Computer Simulation
Yeasts
Learning
Yeast
Availability
Discrimination (Psychology)

ASJC Scopus subject areas

  • Engineering(all)
  • Medicine(all)

Cite this

Kawada, Y., & Sakakibara, Y. (2005). Discriminative discovery of transcription factor binding sites from location data. In Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005 (Vol. 2005, pp. 86-92). [1498010] https://doi.org/10.1109/CSB.2005.30

Discriminative discovery of transcription factor binding sites from location data. / Kawada, Yuji; Sakakibara, Yasubumi.

Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005. Vol. 2005 2005. p. 86-92 1498010.

Kawada, Y & Sakakibara, Y 2005, Discriminative discovery of transcription factor binding sites from location data. in Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005. vol. 2005, 1498010, pp. 86-92, 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005, Stanford, CA, United States, 05/8/8. https://doi.org/10.1109/CSB.2005.30
Kawada Y, Sakakibara Y. Discriminative discovery of transcription factor binding sites from location data. In Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005. Vol. 2005. 2005. p. 86-92. 1498010 https://doi.org/10.1109/CSB.2005.30
Kawada, Yuji ; Sakakibara, Yasubumi. / Discriminative discovery of transcription factor binding sites from location data. Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005. Vol. 2005 2005. pp. 86-92
@inproceedings{36293009c6bf4501a84dbe4d5c237284,
title = "Discriminative discovery of transcription factor binding sites from location data",
author = "Yuji Kawada and Yasubumi Sakakibara",
year = "2005",
doi = "10.1109/CSB.2005.30",
language = "English",
isbn = "0769523447",
volume = "2005",
pages = "86--92",
booktitle = "Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005",

}

TY - GEN

T1 - Discriminative discovery of transcription factor binding sites from location data

AU - Kawada, Yuji

AU - Sakakibara, Yasubumi

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=33745489184&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745489184&partnerID=8YFLogxK

U2 - 10.1109/CSB.2005.30

DO - 10.1109/CSB.2005.30

M3 - Conference contribution

C2 - 16447966

AN - SCOPUS:33745489184

SN - 0769523447

SN - 9780769523446

VL - 2005

SP - 86

EP - 92

BT - Proceedings - 2005 IEEE Computational Systems Bioinformatics Conference, CSB 2005

ER -