TY - GEN
T1 - Discriminative detection of cis-acting regulatory variation from location data
AU - Kawada, Yuji
AU - Sakakibara, Yasubumi
PY - 2006/12/1
Y1 - 2006/12/1
N2 - The interaction between transcription factors and their DNA binding sites plays a key role in understanding gene regulation mechanisms. Recent studies have revealed the presence of "functional polymorphism" in genes, defined as regulatory variation in transcription levels caused by cis-acting sequence differences. These regulatory variants are assumed to contribute to modulating gene functions. However, computationally identifying such functional cis-regulatory variants is much more challenging than identifying consensus sequences, because the variants differ from the main consensus sequence by only a few bases, yet they can have important consequences for organismal phenotype. No previous study has directly addressed this problem. We propose a novel discriminative detection method for precisely identifying transcription factor binding sites and their functional variants from both positive and negative samples (the upstream sequences of genes that are bound and not bound by a transcription factor, respectively) in genome-wide location data. Our goal is to find the discriminative substrings that best explain the location data, in the sense that they precisely separate the positive samples from the negative ones, rather than substrings that are merely over-represented among the positive samples. Our method consists of two steps. First, we apply decision tree learning to discover discriminative substrings and a hierarchical relationship among them. Second, we use functional annotations to extract a main motif and, in addition, a second motif that represents a cis-regulatory variant. Genome-wide experiments on the yeast Saccharomyces cerevisiae show that our method detects experimentally verified consensus sequences significantly better than current motif detection methods.
In addition, our method successfully discovered second motifs, putative functional cis-regulatory variants associated with genes that have different functional annotations, and these variants were verified by expression profile analyses.
AB - The interaction between transcription factors and their DNA binding sites plays a key role in understanding gene regulation mechanisms. Recent studies have revealed the presence of "functional polymorphism" in genes, defined as regulatory variation in transcription levels caused by cis-acting sequence differences. These regulatory variants are assumed to contribute to modulating gene functions. However, computationally identifying such functional cis-regulatory variants is much more challenging than identifying consensus sequences, because the variants differ from the main consensus sequence by only a few bases, yet they can have important consequences for organismal phenotype. No previous study has directly addressed this problem. We propose a novel discriminative detection method for precisely identifying transcription factor binding sites and their functional variants from both positive and negative samples (the upstream sequences of genes that are bound and not bound by a transcription factor, respectively) in genome-wide location data. Our goal is to find the discriminative substrings that best explain the location data, in the sense that they precisely separate the positive samples from the negative ones, rather than substrings that are merely over-represented among the positive samples. Our method consists of two steps. First, we apply decision tree learning to discover discriminative substrings and a hierarchical relationship among them. Second, we use functional annotations to extract a main motif and, in addition, a second motif that represents a cis-regulatory variant. Genome-wide experiments on the yeast Saccharomyces cerevisiae show that our method detects experimentally verified consensus sequences significantly better than current motif detection methods.
In addition, our method successfully discovered second motifs, putative functional cis-regulatory variants associated with genes that have different functional annotations, and these variants were verified by expression profile analyses.
UR - http://www.scopus.com/inward/record.url?scp=84856993101&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84856993101&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84856993101
SN - 1860946232
SN - 9781860946233
T3 - Series on Advances in Bioinformatics and Computational Biology
SP - 89
EP - 98
BT - Proceedings of the 4th Asia-Pacific Bioinformatics Conference, APBC 2006
T2 - 4th Asia-Pacific Bioinformatics Conference, APBC 2006
Y2 - 13 February 2006 through 16 February 2006
ER -