Extending the K-Means Clustering Algorithm to Improve the Compactness of the Clusters
Antonia Nasiakou, Miltiadis Alamaniotis, Lefteri H. Tsoukalas
Abstract
Clustering is a popular method applied mainly to data analysis, data mining, vector quantization and data compression. The most widely used clustering algorithm, which belongs to the family of partitioning algorithms, is k-means. In this paper, we propose an extended version of k-means in which the initial cluster centers are selected by a heuristic, data-based formula instead of the random selection adopted by the traditional k-means algorithm. In particular, we introduce a new formula for selecting the initial cluster centers before k-means is applied to cluster a data set. The extended k-means algorithm is tested on clustering a set of 2-D data points. The results show that the proposed algorithm outperforms traditional k-means with respect to cluster compactness. The validity of the extended algorithm is assessed through a set of clustering measures (Silhouette, Davies-Bouldin) that quantify how compact and well-separated the clusters are, with the Davies-Bouldin measure being the most prominent.
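The abstract does not state the proposed seeding formula, so the following is only a minimal sketch, assuming 2-D points stored as tuples, of the baseline the extension is compared against. It shows traditional k-means (Lloyd's iterations) with random selection of the initial centers, together with the Davies-Bouldin index used to assess compactness and separation (lower values are better). The `init` parameter is a hypothetical hook where a data-based seeding rule could be substituted. The names `kmeans`, `random_init` and `davies_bouldin` are illustrative and are not the authors' code.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical signature for a seeding rule: (points, k, rng) -> k initial centers.
InitFn = Callable[[Sequence[Point], int, random.Random], List[Point]]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def random_init(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Traditional k-means seeding: pick k data points uniformly at random."""
    return list(rng.sample(list(points), k))


def kmeans(points: Sequence[Point], k: int, init: InitFn = random_init,
           max_iter: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means; `init` is where an alternative seeding rule would plug in."""
    rng = random.Random(seed)
    centers = init(points, k, rng)
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(centers[j])  # keep the old center of an empty cluster
        centers = updated
    return centers, labels


def davies_bouldin(points: Sequence[Point], labels: Sequence[int],
                   centers: Sequence[Point]) -> float:
    """Davies-Bouldin index (requires k >= 2 distinct centers); lower is better."""
    k = len(centers)
    # Scatter s_i: mean distance of the members of cluster i to its center.
    scatter = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(dist(p, centers[i]) for p in members) / len(members)
                       if members else 0.0)
    # For each cluster, take the worst ratio (s_i + s_j) / d(c_i, c_j), then average.
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = kmeans(pts, 2)
    print(labels, davies_bouldin(pts, labels, centers))
```

Keeping seeding behind a single function argument means a baseline run and a run with a different seeding rule differ only in that argument, so their Davies-Bouldin scores can be compared on the same data.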