Evaluation and comparison of clustering algorithms in analyzing ES cell gene expression data

Gengxin Chen, Saied A. Jaradat, Nila Banerjee, Tetsuya S. Tanaka, Minoru S.H. Ko, Michael Q. Zhang

Research output: Article, peer-reviewed

152 Citations (Scopus)

Abstract

Many clustering algorithms have been used to analyze microarray gene expression data. Given embryonic stem cell gene expression data, we applied several indices to evaluate the performance of clustering algorithms, including hierarchical clustering, k-means, PAM and SOM. The indices were homogeneity and separation scores, silhouette width, redundant score (based on redundant genes), and WADP (testing the robustness of clustering results after small perturbation). The results showed that the ES cell dataset posed a challenge for cluster analysis in that the clusters generated by different methods were only partially consistent. Using this data set, we were able to evaluate the advantages and weaknesses of algorithms with respect to both internal and external quality measures. This study may provide a guideline on how to select suitable clustering algorithms and it may help raise issues in the extraction of meaningful biological information from microarray expression data.
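Three of the internal indices named in the abstract can be illustrated with a minimal sketch. The silhouette width follows the standard Kaufman and Rousseeuw definition. The homogeneity and separation scores use one common centre-based formulation (average distance of each gene to its cluster centre, and size-weighted average distance between cluster centres), which is an assumption here and may differ in detail from the paper's definitions. Euclidean distance, the function names, and the toy data are likewise assumptions made for illustration.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the full matrix of Euclidean distances between rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def average_silhouette_width(X, labels):
    """Mean silhouette width over all points (requires at least 2 clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = pairwise_euclidean(X)
    clusters = np.unique(labels)
    widths = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: silhouette is taken as 0
        # a: mean distance to the other members of the point's own cluster
        a = D[i, same].sum() / (n_same - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        widths[i] = (b - a) / max(a, b)
    return widths.mean()


def homogeneity_and_separation(X, labels):
    """Centre-based homogeneity and separation (assumed formulation)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centres = np.array([X[labels == c].mean(axis=0) for c in clusters])
    sizes = np.array([(labels == c).sum() for c in clusters])
    # Map each point to the row index of its cluster centre
    idx = np.searchsorted(clusters, labels)
    # Homogeneity: average distance from each point to its own centre
    homogeneity = np.linalg.norm(X - centres[idx], axis=1).mean()
    # Separation: distance between centres, weighted by cluster sizes
    weights = np.outer(sizes, sizes).astype(float)
    np.fill_diagonal(weights, 0.0)
    separation = (weights * pairwise_euclidean(centres)).sum() / weights.sum()
    return homogeneity, separation


# Toy "expression profiles": four genes measured under two conditions
profiles = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good_labels = np.array([0, 0, 1, 1])  # groups nearby profiles together
poor_labels = np.array([0, 1, 0, 1])  # groups distant profiles together

for name, labels in [("good", good_labels), ("poor", poor_labels)]:
    asw = average_silhouette_width(profiles, labels)
    h, s = homogeneity_and_separation(profiles, labels)
    print(f"{name}: silhouette={asw:.3f} homogeneity={h:.3f} separation={s:.3f}")
```

Under this distance-based formulation, a tighter clustering gives a smaller homogeneity score and a larger separation score, while a silhouette width near 1 indicates well-separated clusters and a negative value suggests misassigned points.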

Original language: English
Pages (from-to): 241-262
Number of pages: 22
Journal: Statistica Sinica
Volume: 12
Issue number: 1
Publication status: Published - Jan 1 2002
Externally published: Yes

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
