Recognizing Thai Broken Characters Based on Set-Partitions and N-Grams Graphs
Chaivatna Sumetphong, Supachai Tangwongsan
Abstract
Automatic recognition of broken Thai characters is a major challenge in applications such as the computerized restoration of Thai text documents. We propose a novel solution based on set-partitioning of the broken character pieces. The basic idea is to find the optimal set-partition of the broken pieces, that is, the one that maximizes the likelihood that each block of the partition is well grouped and resembles a character in both pattern and size. Our approach tolerates the presence of noise pieces in the image. In addition, we devise a mechanism based on A* search to align a sequence of Thai characters into Thai words using a dictionary modeled as an N-gram graph structure. Experiments based on this framework have been performed, and the results are promising.
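The set-partition idea can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the piece representation (horizontal extents only), the function names, the `char_width` parameter, and the toy width-based score are all assumptions made for illustration, and exhaustive enumeration is practical only for a handful of pieces. The sketch enumerates every partition of the pieces and keeps the one whose blocks jointly score highest.

```python
from math import prod


def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing block in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a block of its own.
        yield [[first]] + partition


def toy_block_score(block, char_width):
    """Hypothetical stand-in for a character likelihood.

    Each piece is an (x_min, x_max) extent. The score is 1.0 when the
    merged extent of the block is exactly `char_width` wide and decays
    as the width deviates from it.
    """
    width = max(x_max for _, x_max in block) - min(x_min for x_min, _ in block)
    return 1.0 / (1.0 + abs(width - char_width))


def best_partition(pieces, char_width):
    """Return the partition that maximizes the product of block scores."""
    return max(
        set_partitions(pieces),
        key=lambda partition: prod(toy_block_score(b, char_width) for b in partition),
    )


# Three pieces: the first two are fragments of one character.
pieces = [(0, 4), (5, 10), (12, 22)]
best = best_partition(pieces, char_width=10)
print(best)  # [[(0, 4), (5, 10)], [(12, 22)]]
```

With these made-up extents, the two narrow fragments are grouped into one block and the third piece stands alone, because that grouping gives every block the assumed character width. A real scorer would compare each block against character patterns and would need some way to set noise pieces aside.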