Investigation of Luhn's claim on information retrieval

Bibliographic Details
Published in: Elektrik : Turkish journal of electrical engineering & computer sciences, 2011-01
Main Authors: KOCABAŞ, İlker; DİNÇER, Bekir Taner; KARAOĞLAN, Bahar
Format: Article
Language: English
Description
Summary: In this study, we show how Luhn's claim about the degree of importance of a word in a document can be related to information retrieval. His basic idea is transformed into z-scores that serve as term weights for modeling term frequency (tf) within documents. The Luhn-based models presented in this paper are used as the TF component of the proposed TF × IDF weighting schemes. The final term weighting functions appropriate for the TF × IDF weighting scheme are applied to the TREC-6, -7, and -8 databases. The experimental results support Luhn's claim: mean average precision (MAP) is high for terms with frequencies around the mean frequency of terms within a document. On the other hand, a weighting that sharply discriminates between the importance of low/high frequencies and that of medium frequencies degrades retrieval performance. Therefore, any weighting scheme (TF) that is directly proportional to tf is likely to achieve high retrieval performance, provided that it optimally reflects the differences in importance across tf values and optimally eliminates terms that have high frequencies.
ISSN: 1300-0632, 1303-6203
DOI: 10.3906/elk-1003-448
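
The Summary describes turning within-document term frequencies into z-scores and using them as the TF component of a TF × IDF weight. The record does not give the paper's actual weighting functions, so the following is only a minimal sketch under stated assumptions: the function names (`tf_zscores`, `zscore_tfidf`), the population standard deviation, and the plain `log(N / df)` IDF are all illustrative choices, not the authors' method.

```python
import math
from collections import Counter


def tf_zscores(tokens):
    """Map each distinct term to the z-score of its frequency in one document.

    Assumption: mean and (population) standard deviation are taken over
    the frequencies of the distinct terms of this document.
    """
    counts = Counter(tokens)
    freqs = list(counts.values())
    mean = sum(freqs) / len(freqs)
    std = math.sqrt(sum((f - mean) ** 2 for f in freqs) / len(freqs))
    if std == 0:
        # All terms occur equally often: no term deviates from the mean.
        return {term: 0.0 for term in counts}
    return {term: (f - mean) / std for term, f in counts.items()}


def zscore_tfidf(tokens, doc_freq, n_docs):
    """Illustrative TF x IDF weight: z-score of tf times log(N / df).

    doc_freq maps a term to the number of documents containing it
    (hypothetical input; every term in tokens must be present).
    """
    z = tf_zscores(tokens)
    return {t: z[t] * math.log(n_docs / doc_freq[t]) for t in z}


# Usage with a toy document and made-up collection statistics.
doc = "a a a b b c".split()
weights = zscore_tfidf(doc, doc_freq={"a": 5, "b": 2, "c": 1}, n_docs=10)
```

In this sketch a term whose frequency equals the document mean gets a z-score of zero, and rarer terms get negative scores, so a practical scheme would still need a mapping from z-scores to non-negative weights; the Summary leaves that mapping unspecified here.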