Distance Between Distributions With Special Topologies of Cost Matrices

Sung-Hyuk Cha

Abstract

Comparing two distributions plays important role in many problems. The traditional minimum cost flow problem has been utilized as a distance measure between two distributions (transportation problem) such as the earth mover’ distance ( EMD ). If the distributions have b number of bins, the cost matrix is b×b square matrix. While generic algorithms such as Simplex method to compute the EMD take too long for users to wait for the output, efficient algorithms are known for several special histogram types such as nominal Ө( b ), ordinal Ө( b ), modulo O ( b 2 ). Here a variety of special classes of cost matrices such as star , linear , tree and ring cost matrices are formally defined and generalized and their respective efficient algorithms to compute the EMD are given. A pendant arc elimination algorithm is proposed to compute the EMD with a phylogenetic network in Ө( b ). Algorithms to test whether a given cost matrix belongs to one of special classes of cost matrices are also described. Main contribution pertains to reducing the computational complexity of EMD by analyzing topological patterns in cost matrices. Keywords: Cost matrix, Distribution, Earth Mover’s distance, Topology