Loading…
Approximation of the Meaning for Thematic Subject Headings by Simple Interpretable Representations
The paper studies methods for approximating a user labeled topics by simple representations in a text classification problem. It is assumed that in real information systems the meaning of thematic categories can be approximated by a fairly simple interpreted expression. An algorithm for constructing...
Saved in:
Published in: | Lobachevskii journal of mathematics 2024-03, Vol.45 (3), p.1261-1274 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The paper studies methods for approximating a user labeled topics by simple representations in a text classification problem. It is assumed that in real information systems the meaning of thematic categories can be approximated by a fairly simple interpreted expression. An algorithm for constructing formulas is considered, which constructs a representation of a text topic in the form of a Boolean formula—in fact, a request to a full-text information system. The algorithm is based on an optimized selection of various logical predicates with words and terms from the thesaurus. The presented algorithm has been compared with modern machine learning techniques on real collections with noisy expert markup. The described method can be used for text classification, expert evaluation of the content of the heading, assessment of the complexity of the description of the topic, and correcting the markup. |
---|---|
ISSN: | 1995-0802 1818-9962 |
DOI: | 10.1134/S1995080224600778 |