A pillar algorithm for k-means optimization by distance maximization for initial centroid designation

Ali Ridho Barakbah, Yasushi Kiyoki

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

26 Citations (Scopus)

Abstract

The clustering performance of K-means relies heavily on the choice of initial centroids. The initial centroids for K-means clustering are usually chosen at random, so the algorithm may converge to a nearby local minimum rather than the global optimum. This paper proposes a new approach to optimizing the designation of initial centroids for K-means clustering. The approach is inspired by the process of choosing pillar locations to make a house or building stable: pillars should be placed as far from each other as possible to withstand the pressure distribution of the roof, and we treat the placement of centroids within the data distribution analogously. Our approach therefore designates the initial centroids so that the accumulated distance between them is maximized. First, the distance between every data point and the grand mean of the data is computed as the accumulated distance metric, and the data point with the maximum value is selected as the first initial centroid. Each subsequent initial centroid is designated by updating the accumulated distance metric between each data point and all previously selected initial centroids, then selecting the data point with the maximum distance. This process is repeated until all initial centroids have been designated. The approach also includes a mechanism that prevents outliers from being chosen as initial centroids. The experimental results show that the proposed algorithm is effective at improving the clustering results of K-means.

Original language: English
Title of host publication: 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Proceedings
Pages: 61-68
Number of pages: 8
DOI: 10.1109/CIDM.2009.4938630
Publication status: Published - 2009
Event: 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Nashville, TN, United States
Duration: 2009 Mar 30 to 2009 Apr 2

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics
  • Software

Cite this

Barakbah, A. R., & Kiyoki, Y. (2009). A pillar algorithm for k-means optimization by distance maximization for initial centroid designation. In 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Proceedings (pp. 61-68). [4938630] https://doi.org/10.1109/CIDM.2009.4938630

@inproceedings{235ffc5e17224f16b068eede09bca364,
title = "A pillar algorithm for k-means optimization by distance maximization for initial centroid designation",
author = "Barakbah, {Ali Ridho} and Yasushi Kiyoki",
year = "2009",
doi = "10.1109/CIDM.2009.4938630",
language = "English",
isbn = "9781424427659",
pages = "61--68",
booktitle = "2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Proceedings",

}

TY - GEN

T1 - A pillar algorithm for k-means optimization by distance maximization for initial centroid designation

AU - Barakbah, Ali Ridho

AU - Kiyoki, Yasushi

PY - 2009

Y1 - 2009

UR - http://www.scopus.com/inward/record.url?scp=67650501922&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650501922&partnerID=8YFLogxK

U2 - 10.1109/CIDM.2009.4938630

DO - 10.1109/CIDM.2009.4938630

M3 - Conference contribution

AN - SCOPUS:67650501922

SN - 9781424427659

SP - 61

EP - 68

BT - 2009 IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2009 - Proceedings

ER -