
Some Effective Techniques for Naive Bayes Text Classification

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, November 2006, Vol. 18 (11), pp. 1457-1466
Main Authors: Kim, Sang-Bum; Han, Kyoung-Soo; Rim, Hae-Chang; Myaeng, Sung Hyon
Format: Article
Language:English
Description
Summary: While naive Bayes is quite effective in various data mining tasks, it gives disappointing results in automatic text classification. Observing how naive Bayes behaves on natural language text, we found a serious problem in the parameter estimation process that causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. Although these methods are somewhat ad hoc, our proposed naive Bayes text classifier performs very well on standard benchmark collections, competing with state-of-the-art text classifiers based on highly complex learning methods such as SVM.
ISSN: 1041-4347 (print), 1558-2191 (online)
DOI: 10.1109/TKDE.2006.180
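The summary names two heuristics (per-document text normalization and feature weighting) but does not give their formulas. As a rough illustration of the general idea only, the Python sketch below trains a multinomial naive Bayes classifier on per-document length-normalized term counts and multiplies each term's log-likelihood by an optional weight at scoring time. The specific normalization (rescaling every document to the same total mass), the smoothing constant `alpha`, and the hypothetical `term_weights` mapping are assumptions made for this example, not the methods proposed in the paper.

```python
import math
from collections import Counter, defaultdict


def normalize_counts(tokens, target_length=1.0):
    """Hypothetical per-document normalization: rescale raw term counts so
    that every document carries the same total mass (target_length),
    regardless of its original length. Illustrative, not the paper's formula."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: target_length * c / total for term, c in counts.items()}


class NormalizedMultinomialNB:
    """Multinomial naive Bayes trained on length-normalized term counts.

    alpha: additive smoothing constant (illustrative default).
    term_weights: hypothetical per-term weights applied only at scoring
    time; terms absent from the dict get weight 1.0 (plain naive Bayes).
    """

    def __init__(self, alpha=0.1, target_length=1.0, term_weights=None):
        self.alpha = alpha
        self.target_length = target_length
        self.term_weights = term_weights or {}

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.class_doc_count = Counter(labels)
        self.vocab = set()
        self.term_mass = {c: defaultdict(float) for c in self.classes}
        self.total_mass = {c: 0.0 for c in self.classes}
        for tokens, label in zip(docs, labels):
            # Each document adds the same total mass to its class,
            # so long documents cannot dominate the term estimates.
            for term, value in normalize_counts(tokens, self.target_length).items():
                self.vocab.add(term)
                self.term_mass[label][term] += value
                self.total_mass[label] += value
        return self

    def _log_score(self, tokens, c):
        n_docs = sum(self.class_doc_count.values())
        score = math.log(self.class_doc_count[c] / n_docs)  # log prior
        denom = self.total_mass[c] + self.alpha * len(self.vocab)
        for term, value in normalize_counts(tokens, self.target_length).items():
            if term not in self.vocab:
                continue  # skip terms never seen in training
            numer = self.term_mass[c].get(term, 0.0) + self.alpha
            log_p = math.log(numer / denom)  # smoothed log P(term | class)
            # Weighted contribution: weight * normalized count * log-likelihood.
            score += self.term_weights.get(term, 1.0) * value * log_p
        return score

    def predict(self, tokens):
        # Return the class with the highest (weighted) log posterior score.
        return max(self.classes, key=lambda c: self._log_score(tokens, c))
```

For example, `NormalizedMultinomialNB().fit(docs, labels).predict(tokens)` returns the highest-scoring class label for a tokenized document. Rescaling each training document to a fixed mass is one simple way to stop long documents from dominating the per-class term estimates; setting a term's weight to 0.0 removes that term's influence on the score entirely.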