NMTF-LTM: Towards an Alignment of Semantics for Lifelong Topic Modeling

Bibliographic Details
Published in:	IEEE transactions on knowledge and data engineering 2023-10, Vol.35 (10), p.1-16
Main Authors:	Lei, Zhiqi, Liu, Hai, Yan, Jiaxing, Rao, Yanghui, Li, Qing
Format:	Article
Language:	English
Subjects:	Algorithms; Alignment; Computational modeling; Data mining; Data models; Documents; Lifelong topic modeling; Matrix decomposition; Misalignment; Modelling; non-negative matrix tri-factorization; Parallel algorithms; parallel computing; Permutations; semantic alignment; Semantics; Task analysis

Description
Summary:	Aiming at mining high-quality topics by accumulating and utilizing semantic knowledge for a stream of documents, lifelong topic modeling (LTM) has attracted increasing attention recently. However, the permutation of topics may change over time, resulting in a semantic misalignment between the topic representations of document chunks across the stream. Such a misalignment degrades model performance on various downstream tasks, yet it has been overlooked by existing lifelong topic models. To address this misalignment of semantics, we formulate LTM as a problem of non-negative matrix tri-factorization (NMTF) and propose a consolidation framework (i.e., NMTF-LTM) to enforce an alignment in a mapped topic space. In addition, a distributed parallel algorithm, namely PNMTF-LTM, is developed to meet the real-time requirement for large-scale stream processing. Empirical results show that our method not only obtains a superior alignment of semantics without loss of topic quality, but also achieves effective speedup when deployed to a high-performance computing cluster.
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2023.3267496
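
Note:	The summary refers to non-negative matrix tri-factorization without stating it. As a rough illustration only, the sketch below factorizes a non-negative matrix X into three non-negative factors, X ≈ F S Gᵀ, using textbook multiplicative updates. It is a generic NMTF, not the NMTF-LTM or PNMTF-LTM algorithm of the article, whose objective and constraints are not given in this record. The function name `nmtf`, its parameters, and the random toy matrix are assumptions made for the example.

```python
import numpy as np


def nmtf(X, k, l, n_iter=200, eps=1e-9, seed=0):
    """Generic NMTF sketch: X (n x m) ~ F (n x k) @ S (k x l) @ G.T (l x m).

    Hypothetical helper for illustration; not the method from the article.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Random non-negative initialization of the three factors.
    F = rng.random((n, k))
    S = rng.random((k, l))
    G = rng.random((m, l))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        # Track the Frobenius reconstruction error after each sweep.
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Toy input: a random non-negative 30 x 20 matrix (an assumption, not real data).
X = np.random.default_rng(1).random((30, 20))
F, S, G, errors = nmtf(X, k=4, l=4)
print(F.shape, S.shape, G.shape)
```

In a topic-modeling reading, X could be a document-term matrix, with F and G relating documents and terms to latent factors and S linking the two factor spaces. That interpretation is likewise an assumption here rather than the article's stated formulation.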