Recognizing Thai Broken Characters Based on Set-Partitions and N-Grams Graphs
The Journal of Pattern Recognition Research (JPRR) provides an international forum for the electronic publication of high-quality research and industrial experience articles in all areas of pattern recognition, machine learning, and artificial intelligence. JPRR is committed to rigorous yet rapid reviewing. Final versions are published electronically (ISSN 1558-884X) immediately upon acceptance.
Chaivatna Sumetphong, Supachai Tangwongsan
JPRR Vol 7, No 1 (2012); doi:10.13176/11.363 
Abstract
Automatic recognition of broken Thai characters is a major challenge in applications such as computerized restoration of Thai text documents. We propose a novel solution based on set-partitioning of the broken character pieces. The basic idea is to find the optimal set-partition of the broken pieces, that is, the one that maximizes the likelihood that each block of the partition is well-grouped and resembles a character in terms of pattern and sizing. Our approach is tolerant of noise pieces in the image. In addition, we devise a mechanism based on A* search to align a sequence of Thai characters into Thai words using a dictionary modeled as an N-gram graph structure. Experiments based on this framework have been performed, and the results are very promising.
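The set-partition idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the exhaustive enumeration, the product-of-block-scores objective, and the names set_partitions, best_partition, and block_score are assumptions made purely for illustration. In a real system block_score would be a character-likelihood model covering pattern and sizing, and handling of noise pieces (which the abstract mentions) is not modeled here.

```python
from math import prod
from typing import Callable, Iterator, List, Sequence


def set_partitions(items: List) -> Iterator[List[List]]:
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Option 1: add `first` to one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # Option 2: give `first` a block of its own.
        yield [[first]] + partition


def best_partition(pieces: Sequence,
                   block_score: Callable[[List], float]) -> List[List]:
    """Return the partition whose blocks have the highest joint score.

    Assumption: blocks are scored independently and combined by a product,
    as one would combine independent likelihoods.
    """
    return max(set_partitions(list(pieces)),
               key=lambda p: prod(block_score(b) for b in p))


# Toy usage: each "piece" is just a width, and a block is considered
# character-like when its pieces add up to a width of 10 (hypothetical rule).
def toy_score(block: List[int]) -> float:
    return 1.0 if sum(block) == 10 else 0.1


result = best_partition([4, 6, 3, 7], toy_score)
```

The number of set-partitions grows as the Bell numbers (15 for four pieces, 52 for five), so exhaustive enumeration of this kind is only practical for a small number of pieces per candidate region.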