Validation Index for Transactional Clustering
Piyang Wang, W. M. Ma, Tommy W. S. Chow
Abstract
A new transactional clustering validation (TCV) index for transactional data is proposed. Because transactional data are poorly structured, conventional validation indices are hardly applicable to this kind of data directly. Most conventional validation indices assume that the dataset is in attribute form. Transactional data, however, are in item form. Hence, metric measurements such as variance and distance can hardly be applied to transactional data unless the data are restructured into attribute form. If a dataset contains many missing values, treating it as a transactional dataset is more suitable than replacing the missing values with the mean or mode of the corresponding attribute, or discarding the samples with missing values. The proposed index measures the scatter of items within the clusters in order to quantify the evaluation of clustering results. In this paper, a user input parameter is introduced to select the range of the clustering results. Instead of using complex statistical analysis to describe the data structure, the composition of the proposed index provides a brief structure description, which helps users understand datasets with high dimensionality.
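To make the distinction between attribute form and item form concrete, the following minimal sketch shows one possible way to treat records with missing values as transactions. The `attribute=value` item encoding, the function name `to_transaction`, and the sample records are illustrative assumptions and are not taken from the paper; the sketch does not implement the TCV index itself.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical attribute-form records; None marks a missing value.
Record = Dict[str, Optional[str]]


def to_transaction(record: Record) -> FrozenSet[str]:
    """Convert one attribute-form record into an item-form transaction.

    Assumption: each observed attribute/value pair becomes one item,
    written as "attribute=value". Missing values produce no item, so
    nothing is imputed and no sample is discarded.
    """
    return frozenset(
        f"{attr}={value}" for attr, value in record.items() if value is not None
    )


records: List[Record] = [
    {"color": "red", "size": "L", "shape": None},
    {"color": None, "size": "L", "shape": "round"},
    {"color": "red", "size": None, "shape": None},
]

# Each transaction is simply the set of items that were actually observed.
transactions = [to_transaction(r) for r in records]

for t in transactions:
    print(sorted(t))
```

Under this encoding, transactions may have different lengths, which is why variance- or distance-based measures defined on fixed attributes do not apply to them directly.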