Unsupervised learning of textual pattern based on Propagation in Bipartite Graph

Bibliographic Details
Published in: Intelligent Data Analysis, 2020-01, Vol. 24 (3), p. 543-565
Main Authors: de Paulo Faleiros, Thiago; Valejo, Alan; de Andrade Lopes, Alneu
Format: Article
Language: English
Description
Summary: Graph-based algorithms have attracted considerable interest in recent years because they facilitate pattern recognition and learning by propagating information through a graph. Here, we propose an unsupervised learning algorithm based on propagation in a bipartite graph, referred to as the Propagation in Bipartite Graph (PBG) algorithm. The contributions of this approach are threefold: 1) we present an iterative graph-based algorithm and a straightforward bipartite representation for textual data, in which vertices represent documents and words, and edges between documents and words represent the occurrences of the words in the documents; 2) we show that PBG is more flexible and easier to adapt to different applications than the mathematical formalism of generative models; and 3) we present a comprehensive evaluation and comparison of PBG with other topic extraction techniques. We describe the strategy employed in the PBG algorithm as a problem of maximizing the similarity between latent vectors assigned to vertices and edges, and demonstrate that the proposed strategy can be improved by assigning good initial values to the vectors. We note that PBG can be parallelized with a simple adjustment to the algorithm. We also show that the proposed algorithm is competitive with LDA and NMF in the task of modelling textual collections, where it returns coherent topics, and in the dimensionality reduction task.
ISSN: 1088-467X, 1571-4128
DOI: 10.3233/IDA-194528
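
The bipartite representation described in the summary (document vertices, word vertices, and edges recording word occurrences) can be illustrated with a minimal sketch. This is not the PBG algorithm itself and does not perform any propagation; it only builds the graph. The function name `build_bipartite_graph`, the whitespace tokenization, and the use of raw occurrence counts as edge weights are assumptions made for illustration, not details taken from the article.

```python
from collections import Counter


def build_bipartite_graph(documents):
    """Build a document-word bipartite graph from raw text documents.

    Hypothetical helper for illustration only. Returns:
      doc_ids    -- list of document vertex indices
      vocabulary -- sorted list of word vertices
      edges      -- dict mapping (doc_index, word) to the number of times
                    the word occurs in that document (assumed edge weight)
    """
    edges = {}
    vocabulary = set()
    for doc_index, text in enumerate(documents):
        # Assumption: lowercase the text and split on whitespace.
        counts = Counter(text.lower().split())
        for word, freq in counts.items():
            edges[(doc_index, word)] = freq
            vocabulary.add(word)
    return list(range(len(documents))), sorted(vocabulary), edges


docs = ["graph propagation on a graph", "topic models for text"]
doc_ids, vocab, edges = build_bipartite_graph(docs)
```

An edge exists only where a word actually occurs in a document, so the edge dictionary stays sparse even for large vocabularies.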