Automatic Extraction of Collocations from Korean Text

Bibliographic Details
Published in:	Computers and the Humanities, 2001-08, Vol. 35 (3), pp. 273-297
Main Authors:	Kim, Seonho; Yoon, Juntae; Song, Mansuk
Format:	Article
Language:	English
Subjects:	Alcohol drinking; Collocations; Computer science; Construction materials industry; Industrial products; Korean language; Language; Linguistics; Mathematics and linguistics; Statistical methods; Textile industry; Textual collocation; Timber industry; Transportation industries; Trucking industry; Wine industry; Words

Description
Summary:	In this paper, we propose a statistical method to automatically extract collocations from a Korean POS-tagged corpus. Since a large portion of language is represented by collocation patterns, collocational knowledge is a valuable resource for NLP applications. One difficulty of collocation extraction is that Korean has a partially free word order, which carries over to collocations. In this work, we exploit four statistics, 'frequency', 'randomness', 'convergence', and 'correlation', to take into account the flexible word order of Korean collocations. We identify meaningful bigrams using an evaluation function based on these four statistics and extend the bigrams to n-gram collocations using a fuzzy relation. Experiments show that this method works well for Korean collocations.
ISSN:	0010-4817, 1574-020X, 1572-8412, 1574-0218
DOI:	10.1023/A:1017507019909
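The summary does not define the four statistics or the fuzzy relation, so what follows is only an illustrative sketch of the window-based bigram statistics that collocation extractors typically compute. It is not the authors' method. It swaps in two measures from Smadja's Xtract: "strength", the z-score of a collocate's co-occurrence frequency among all collocates of a word, and "spread", the variance of its counts over relative positions. The function names, the five-word window and the plain-token input are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

WINDOW = 5  # assumed window: relative offsets -5..-1 and +1..+5


def position_profiles(sentences, window=WINDOW):
    """Map (word, collocate) -> {relative offset: count} within the window.

    Hypothetical helper: counts co-occurrences on both sides of each word,
    so a pair is recorded whichever member comes first.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    profiles[(w, tokens[j])][j - i] += 1
    return profiles


def xtract_scores(profiles, word, window=WINDOW):
    """Return {collocate: (strength, spread)} for every collocate of `word`.

    strength: z-score of the pair frequency among all collocates of `word`.
    spread:   variance of the pair's counts over the 2 * window offsets.
    """
    offsets = [d for d in range(-window, window + 1) if d != 0]
    freqs = {c: sum(h.values()) for (w, c), h in profiles.items() if w == word}
    if not freqs:
        return {}
    counts = list(freqs.values())
    mu, sigma = mean(counts), pstdev(counts)
    scores = {}
    for c, f in freqs.items():
        hist = [profiles[(word, c)].get(d, 0) for d in offsets]
        p_bar = f / len(offsets)  # mean count per offset
        spread = sum((p - p_bar) ** 2 for p in hist) / len(offsets)
        strength = (f - mu) / sigma if sigma > 0 else 0.0
        scores[c] = (strength, spread)
    return scores
```

For example, with the toy corpus `[["a", "b", "c"], ["a", "b", "d"], ["b", "a"]]`, `xtract_scores(position_profiles(corpus), "a")` gives "b" a strength of about 1.41 and a spread of 0.41. Xtract keeps pairs with high strength and high spread, that is, pairs concentrated at a fixed offset. For a language with partially free word order, a filter of this kind may discard genuine collocations whose members change position, which is the difficulty the summary describes.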