Some Effective Techniques for Naive Bayes Text Classification

Bibliographic Details
Published in:	IEEE transactions on knowledge and data engineering 2006-11, Vol.18 (11), p.1457-1466
Main Authors:	Kim, Sang-Bum; Han, Kyoung-Soo; Rim, Hae-Chang; Myaeng, Sung Hyon
Format:	Article
Language:	English
Subjects:	Applied sciences, Artificial intelligence, Bayesian analysis, Classification, Classifiers, Collection, Computer science control theory systems, Data mining, Exact sciences and technology, feature weighting, Frequency, Information systems. Data bases, Learning, Learning systems, Memory organisation. Data processing, naive Bayes classifier, Natural languages, Parameter estimation, Poisson model, Probability, Software, Speech and sound recognition and synthesis. Linguistics, Statistical learning, Support vector machine classification, Support vector machines, Tasks, Text categorization, Text classification, Texts

Description
Summary:	While naive Bayes is quite effective in various data mining tasks, it shows disappointing results in automatic text classification. Based on our observation of naive Bayes applied to natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well on the standard benchmark collections, competing with state-of-the-art text classifiers based on highly complex learning methods such as SVM.
ISSN:	1041-4347, 1558-2191
DOI:	10.1109/TKDE.2006.180