Bandwidth Selection for Mean-Shift Based Unsupervised Learning Techniques: A Unified Approach via Self-Coverage
Jochen Einbeck
Abstract
The mean shift is a simple but powerful tool emerging from the computer science literature which shifts a point to the local center of mass around this point. It has been used as a building block for several nonparametric unsupervised learning techniques, such as density mode estimation, clustering, and the estimation of principal curves. Due to the localized way of averaging, it requires the specification of a window size in the form of a bandwidth (matrix). This paper proposes to use a so-called self-coverage measure as a general device for bandwidth selection in this context. In short, a bandwidth h will be favorable if a high proportion of data points falls within circles or "hypertubes" of radius h centered at the fitted object. The method is illustrated through real data examples in the light of several unsupervised estimation problems.
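To make the two ingredients named in the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes a Gaussian kernel and a scalar bandwidth h (the abstract also allows a bandwidth matrix), and the function names `mean_shift_point` and `self_coverage` are hypothetical. The first function repeatedly shifts a point to the kernel-weighted local center of mass; the second computes the proportion of data points lying within radius h of the nearest point of a fitted object, here represented simply as a finite set of points such as estimated modes.

```python
import numpy as np


def mean_shift_point(x, data, h, n_iter=200, tol=1e-8):
    """Shift x repeatedly to the local (kernel-weighted) center of mass.

    Assumption: Gaussian kernel with scalar bandwidth h.
    """
    x = np.asarray(x, dtype=float)
    data = np.asarray(data, dtype=float)
    for _ in range(n_iter):
        # Gaussian weights of all data points relative to the current position.
        sq_dist = np.sum((data - x) ** 2, axis=1)
        w = np.exp(-0.5 * sq_dist / h ** 2)
        # The local center of mass is the weighted mean of the data.
        x_new = w @ data / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def self_coverage(data, fitted, h):
    """Proportion of data points within distance h of the fitted object.

    Assumption: the fitted object is given as an array of points
    (e.g. estimated modes, or points sampled along a fitted curve).
    """
    data = np.asarray(data, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    # Distance from every data point to every fitted point.
    dists = np.linalg.norm(data[:, None, :] - fitted[None, :, :], axis=2)
    # A data point is covered if its nearest fitted point is within h.
    return float(np.mean(dists.min(axis=1) <= h))


# Small illustration on two well-separated groups of points.
data = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                 [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
h = 0.5
modes = np.array([mean_shift_point(x, data, h) for x in data])
coverage = self_coverage(data, modes, h)
```

Evaluating `self_coverage` over a grid of candidate values of h gives a coverage curve as a function of the bandwidth. How a bandwidth is then chosen from that curve is specified in the body of the paper rather than in the abstract, so this sketch deliberately stops short of a selection rule.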