Training Set Compression by Incremental Clustering
Dalong Li, Steven Simske
Abstract
Training set compression reduces the size of a training set without degrading classification accuracy; a smaller training set makes training more efficient and saves storage space. In this paper, an incremental clustering algorithm, the Leader algorithm, is used to reduce the size of a training set by effectively subsampling it. Experiments on several standard data sets, using SVM and KNN as classifiers, indicate that the proposed method is more efficient than CONDENSE at reducing training set size without degrading classification accuracy. Whereas the compression ratio of the CONDENSE method is fixed, the proposed method offers a variable compression ratio through the cluster threshold value.
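To make the idea concrete, the Leader algorithm scans the training set once: a point within the cluster threshold of an existing leader joins that leader's cluster, otherwise it becomes a new leader, and only the leaders are retained as the compressed training set. The sketch below is illustrative rather than the authors' implementation; it assumes Euclidean distance, and the function name `leader_compress` is hypothetical (for labeled data one could, for example, run it separately per class).

```python
import math

def leader_compress(points, threshold):
    """One-pass Leader clustering used as training set compression.

    A point within `threshold` of an existing leader is absorbed
    into that cluster; otherwise it becomes a new leader. The
    returned leaders form the compressed training set: a larger
    threshold yields fewer leaders, i.e. a higher compression ratio.
    """
    leaders = []
    for x in points:
        # Absorb the point if any current leader is close enough.
        if not any(math.dist(x, leader) <= threshold for leader in leaders):
            leaders.append(x)  # otherwise it starts a new cluster
    return leaders
```

Because the compressed set shrinks monotonically as the threshold grows, the threshold acts as the knob that gives the variable compression ratio mentioned above.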