TY - JOUR
T1 - Estimating the false discovery rate using mixed normal distribution for identifying differentially expressed genes in microarray data analysis
AU - Hirakawa, Akihiro
AU - Sato, Yasunori
AU - Sozu, Takashi
AU - Hamada, Chikuma
AU - Yoshimura, Isao
PY - 2007
Y1 - 2007
N2 - The recent development of DNA microarray technology allows us to measure the expression levels of thousands of genes simultaneously and to identify, from many candidate genes, those truly correlated with anticancer drug response (differentially expressed genes). Significance Analysis of Microarray (SAM) is often used to estimate the false discovery rate (FDR), an index for optimizing the identifiability of differentially expressed genes, but the accuracy of the FDR estimated by SAM has not necessarily been confirmed. We propose a new method for estimating the FDR that assumes a mixed normal distribution for the test statistic, and we examine the performance of the proposed method and SAM using simulated data. The simulation results indicate that the accuracy of the FDR estimated by the proposed method and by SAM varied depending on the experimental conditions. We applied both methods to actual data comprising the expression levels of 12,625 genes in 10 responders and 14 non-responders to docetaxel for breast cancer. Using a cut-off value that achieves FDR <0.01 to prevent false-positive genes, the proposed method identified 280 differentially expressed genes correlated with docetaxel response, whereas 92 genes were previously thought to be correlated with docetaxel response.
AB - The recent development of DNA microarray technology allows us to measure the expression levels of thousands of genes simultaneously and to identify, from many candidate genes, those truly correlated with anticancer drug response (differentially expressed genes). Significance Analysis of Microarray (SAM) is often used to estimate the false discovery rate (FDR), an index for optimizing the identifiability of differentially expressed genes, but the accuracy of the FDR estimated by SAM has not necessarily been confirmed. We propose a new method for estimating the FDR that assumes a mixed normal distribution for the test statistic, and we examine the performance of the proposed method and SAM using simulated data. The simulation results indicate that the accuracy of the FDR estimated by the proposed method and by SAM varied depending on the experimental conditions. We applied both methods to actual data comprising the expression levels of 12,625 genes in 10 responders and 14 non-responders to docetaxel for breast cancer. Using a cut-off value that achieves FDR <0.01 to prevent false-positive genes, the proposed method identified 280 differentially expressed genes correlated with docetaxel response, whereas 92 genes were previously thought to be correlated with docetaxel response.
KW - Differentially expressed genes
KW - False discovery rate
KW - Microarray
KW - Mixed normal distribution
KW - Significance analysis of microarray
UR - http://www.scopus.com/inward/record.url?scp=49649093310&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49649093310&partnerID=8YFLogxK
U2 - 10.1177/117693510700300009
DO - 10.1177/117693510700300009
M3 - Article
C2 - 19455258
AN - SCOPUS:49649093310
SN - 1176-9351
VL - 3
SP - 140
EP - 148
JO - Cancer Informatics
JF - Cancer Informatics
ER -