Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data

Gengxin Chen, Saied A. Jaradat, Nila Banerjee, Tetsuya S. Tanaka, Minoru Ko, Michael Q. Zhang

Research output: Contribution to journal, Article

129 Citations (Scopus)

Abstract

Many clustering algorithms have been used to analyze microarray gene expression data. Given embryonic stem cell gene expression data, we applied several indices to evaluate the performance of clustering algorithms, including hierarchical clustering, k-means, PAM and SOM. The indices were homogeneity and separation scores, silhouette width, redundant score (based on redundant genes), and WADP (testing the robustness of clustering results after small perturbation). The results showed that the ES cell dataset posed a challenge for cluster analysis in that the clusters generated by different methods were only partially consistent. Using this data set, we were able to evaluate the advantages and weaknesses of algorithms with respect to both internal and external quality measures. This study may provide a guideline on how to select suitable clustering algorithms and it may help raise issues in the extraction of meaningful biological information from microarray expression data.
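
As an illustration of the internal quality measures named in the abstract, the sketch below computes homogeneity, separation and silhouette width for a toy clustering. It is a minimal sketch under stated assumptions, not the authors' implementation: Euclidean distance, centroid-based homogeneity and size-weighted centroid separation are common textbook choices assumed here, and all function names are hypothetical. The redundant score and WADP are not covered.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return dict(clusters)


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(members)
    return [sum(column) / n for column in zip(*members)]


def homogeneity(points, labels):
    """Average distance from each point to its own cluster centroid (lower is tighter)."""
    clusters = group_by_label(points, labels)
    centers = {label: centroid(m) for label, m in clusters.items()}
    total = sum(euclidean(p, centers[label]) for p, label in zip(points, labels))
    return total / len(points)


def separation(points, labels):
    """Size-weighted average distance between cluster centroids (higher is better)."""
    clusters = group_by_label(points, labels)
    centers = {label: centroid(m) for label, m in clusters.items()}
    names = sorted(clusters)
    weighted_sum = weight_total = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            weight = len(clusters[a]) * len(clusters[b])
            weighted_sum += weight * euclidean(centers[a], centers[b])
            weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


def silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); requires at least two clusters."""
    clusters = group_by_label(points, labels)
    total = 0.0
    for i, (p, label) in enumerate(zip(points, labels)):
        own = [q for j, (q, other) in enumerate(zip(points, labels))
               if other == label and j != i]
        if not own:
            continue  # singleton cluster contributes s(i) = 0 by convention
        a = sum(euclidean(p, q) for q in own) / len(own)
        b = min(sum(euclidean(p, q) for q in m) / len(m)
                for other, m in clusters.items() if other != label)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


# Toy data: two well-separated pairs of 2-D "expression profiles".
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]   # matches the visible structure
bad_labels = [0, 1, 0, 1]    # deliberately mixes the two groups

print(homogeneity(points, good_labels), separation(points, good_labels))
print(silhouette_width(points, good_labels), silhouette_width(points, bad_labels))
```

On this toy input the labeling that matches the visible structure yields a silhouette width close to 1, while the deliberately mixed labeling yields a negative value, which is the kind of contrast such indices are meant to expose.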

Original language: English
Pages (from-to): 241-262
Number of pages: 22
Journal: Statistica Sinica
Volume: 12
Issue number: 1
Publication status: Published - 2002 Jan
Externally published: Yes

Keywords

  • Cluster analysis
  • Gene expression
  • Microarray
  • Mouse embryonic stem cell

ASJC Scopus subject areas

  • Mathematics(all)
  • Statistics and Probability

Cite this

Chen, G., Jaradat, S. A., Banerjee, N., Tanaka, T. S., Ko, M., & Zhang, M. Q. (2002). Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data. Statistica Sinica, 12(1), 241-262.

@article{05e3a3695278438094c887d5419fd86b,
title = "Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data",
keywords = "Cluster analysis, Gene expression, Microarray, Mouse embryonic stem cell",
author = "Gengxin Chen and Jaradat, {Saied A.} and Nila Banerjee and Tanaka, {Tetsuya S.} and Minoru Ko and Zhang, {Michael Q.}",
year = "2002",
month = "1",
language = "English",
volume = "12",
pages = "241--262",
journal = "Statistica Sinica",
issn = "1017-0405",
publisher = "Institute of Statistical Science",
number = "1",

}

TY - JOUR

T1 - Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data

AU - Chen, Gengxin

AU - Jaradat, Saied A.

AU - Banerjee, Nila

AU - Tanaka, Tetsuya S.

AU - Ko, Minoru

AU - Zhang, Michael Q.

PY - 2002/1

Y1 - 2002/1

KW - Cluster analysis

KW - Gene expression

KW - Microarray

KW - Mouse embryonic stem cell

UR - http://www.scopus.com/inward/record.url?scp=0036012375&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036012375&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0036012375

VL - 12

SP - 241

EP - 262

JO - Statistica Sinica

JF - Statistica Sinica

SN - 1017-0405

IS - 1

ER -