
A multi-centrality index for graph-based keyword extraction

Bibliographic Details
Published in: Information Processing & Management, 2019-11, Vol. 56 (6), p. 102063, Article 102063
Main Authors: Vega-Oliveros, Didier A.; Gomes, Pedro Spoljaric; Milios, Evangelos E.; Berton, Lilian
Format: Article
Language: English
Description
Summary:
• Analyses of nine centrality measures, with Structural Holes used for the first time in keyword extraction.
• Centrality measures are correlated and show statistically similar performance when finding keywords.
• Proposal of the multi-centrality index (MCI) to combine the most representative measures.
• MCI achieves high precision, recall, and F1-score with statistical significance.
• Clustering algorithms could not identify the keyword group as well as the MCI approach.

Keyword extraction aims to capture the main topics of a document and is an important step in natural language processing (NLP) applications. Different graph centrality measures have been proposed for extracting keywords automatically. However, there is no consensus yet on how these measures compare in this task. Here, we present the multi-centrality index (MCI) approach, which aims to find the optimal combination of word rankings according to the selection of centrality measures. We analyze nine centrality measures (Betweenness, Clustering Coefficient, Closeness, Degree, Eccentricity, Eigenvector, K-Core, PageRank, Structural Holes) for identifying keywords in co-occurrence word-graph representations of documents. We perform experiments on three datasets of documents and demonstrate that all individual centrality methods achieve statistically similar results, while the proposed MCI approach significantly outperforms the individual centralities, three clustering algorithms, and previously reported results in the literature.
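Illustrative sketch: the summary describes ranking words by centrality in a co-occurrence word graph and then combining several rankings. The Python below is a minimal, hedged illustration of that general idea only. It uses just two of the nine measures (Degree and Closeness), and it combines them by averaging rank positions, which is an assumption made here for illustration; the summary does not state how MCI actually combines rankings. All function names (`cooccurrence_graph`, `rank_keywords`, and so on) and the `window` parameter are hypothetical.

```python
from collections import defaultdict, deque


def cooccurrence_graph(tokens, window=2):
    """Undirected word graph: an edge links words fewer than `window` positions apart."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        graph[word]  # touch the key so isolated words still become nodes
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {w: (len(nb) / (n - 1) if n > 1 else 0.0) for w, nb in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path lengths (via BFS)."""
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def to_ranks(scores):
    """Competition ranking: 1 + number of words with a strictly higher score."""
    values = list(scores.values())
    return {w: 1 + sum(v > s for v in values) for w, s in scores.items()}


def rank_keywords(tokens, top_k=3, window=2):
    """Order words by mean rank over the measures (ASSUMED aggregation, not the paper's MCI)."""
    graph = cooccurrence_graph(tokens, window)
    rankings = [
        to_ranks(degree_centrality(graph)),
        to_ranks(closeness_centrality(graph)),
    ]
    mean_rank = {w: sum(r[w] for r in rankings) / len(rankings) for w in graph}
    # Lower mean rank is better; ties are broken alphabetically for determinism.
    return sorted(graph, key=lambda w: (mean_rank[w], w))[:top_k]
```

For example, `rank_keywords("a hub b hub c hub d".split(), top_k=1)` returns `["hub"]`, since that word is adjacent to every other word in the toy sequence and so ranks first on both measures.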
ISSN: 0306-4573, 1873-5371
DOI: 10.1016/j.ipm.2019.102063