A conditional random fields model for overlapping ambiguity resolution in Chinese word segmentation

Bibliographic Details
Main Authors: Yan Liang, Yaoting Zhu
Format: Conference Proceeding
Language: English
Description
Summary: Overlapping ambiguity is one type of ambiguity in Chinese word segmentation. Previous research on overlapping ambiguity has focused mainly on 3-character overlapping ambiguity strings. This paper empirically examines the distribution and forms of overlapping ambiguity strings. To handle overlapping ambiguity strings of different forms simultaneously, a conditional random fields model is used. Several features for overlapping ambiguity resolution are explored, including component independency probability, component co-occurrence probability, the in-word probability of a component, and string structure. The experimental results show that precision reaches 93.81% in the open test.
DOI: 10.1109/GRC.2009.5255092
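
Illustration: the summary refers to overlapping ambiguity strings of different forms, not only 3-character ones. As a rough sketch of what such a string looks like, the snippet below finds pairs of dictionary words that overlap inside a text: the second word starts inside the first and ends after it. This is not the paper's conditional random fields model or its feature set. The function name `find_overlaps`, the `max_len` parameter, and the toy lexicons are assumptions made purely for illustration.

```python
def find_overlaps(text, lexicon, max_len=4):
    """Return pairs of lexicon words that overlap inside `text`.

    Hypothetical helper for illustration only; it detects candidate
    overlapping ambiguities and does not resolve them.
    """
    # Collect every span (start, end) whose substring is a lexicon word
    # of at least two characters and at most max_len characters.
    spans = [
        (i, j)
        for i in range(len(text))
        for j in range(i + 2, min(len(text), i + max_len) + 1)
        if text[i:j] in lexicon
    ]
    # Two words overlap when the second starts strictly inside the first
    # and ends strictly after it.
    return [
        (text[a:b], text[c:d])
        for (a, b) in spans
        for (c, d) in spans
        if a < c < b < d
    ]


# Toy lexicons (assumed, not taken from the paper).
chain = find_overlaps("结合成分子", {"结合", "合成", "成分", "分子"})
single = find_overlaps("研究生命", {"研究", "研究生", "生命"})
print(chain)
print(single)
```

In the first call the overlaps form a chain across a 5-character string, which is the kind of longer form that a method restricted to 3-character strings would not cover.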