Find Paper, Faster
Example:10.1021/acsami.1c06204 or Chem. Rev., 2007, 107, 2411-2502
Word Segmentation for Burmese Based on Dual-Layer CRFs
ACM Transactions on Asian and Low-Resource Language Information Processing  (IF1.471),  Pub Date : 2018-11-13, DOI: 10.1145/3232537
Shaoning Zhang, Cunli Mao, Zhengtao Yu, Hongbin Wang, Zhongwei Li, Jiafu Zhang

Burmese is an isolated language, in which the syllable is the smallest unit. Syllable segmentation methods based on matching lead to performance subject to the syllable segmentation effect. This article proposes a word segmentation method with fusion conditions of double syllable features. It combines word segmentation and segmentation of syllables into one process, thus reducing the impact of errors on the syllable segmentation of Burmese. In the first layer of the conditional random fields (CRF) model, Burmese characters as atomic features are integrated into the Burma section of the Barkis Speech Paradigm (Backus normal form) features to realize the Burma syllable sequence tags. In the second layer of the CRFs model, with the syllable marked as input, it realizes the sequence markers through building a feature template with syllables as atomic features. The experimental results show that the proposed method has a better effect compared with the method based on the matching of syllables.