
Online Learning for Data Streams With Incomplete Features and Labels



Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, 2024-09, Vol. 36 (9), p. 4820-4834
Main Authors: You, Dianlong; Yan, Huigui; Xiao, Jiawei; Chen, Zhen; Wu, Di; Shen, Limin; Wu, Xindong
Format: Article
Language: English
Description
Summary: Online learning is critical for handling complex data streams in Big Data-related applications. This study explores a new online learning problem where both the features and labels are incomplete. Such incompleteness poses a critical challenge in determining the latent relationship between incomplete features and labels. Unfortunately, existing online learning methods only consider a few cases of incomplete feature spaces, such as trapezoidal, evolvable, and capricious data streams, limiting their applicability to this problem. To bridge this gap, this study proposes a novel algorithm, Online Learning for Data Streams with Incomplete Features and Labels (OLIFL). OLIFL imposes no constraints on how the feature space changes and does not require all instances to be labeled. It rests on two ideas. First, OLIFL explores the informativeness of individual features to update the classifier by dynamically maintaining a global feature space and updating an informativeness matrix. Second, it estimates the label confidence of unlabeled instances to control their negative effects by limiting the error upper bound. Extensive experiments on benchmark datasets are conducted in five scenarios: three incomplete feature spaces (trapezoidal, evolvable, and capricious) and two incomplete-label settings (missing labels only, and missing both features and labels). In addition, we explore the sensitivity of the model to parameters, and its usability and response efficiency in handling concept drifts. The results show that OLIFL significantly outperforms its rivals. Moreover, we apply OLIFL to a movie review classification task as a real-world validation.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2024.3374357
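
Illustrative sketch: The setting described in the summary (instances that arrive with arbitrary subsets of features, and sometimes without a label) can be made concrete with a small toy learner. The code below is not OLIFL and does not reproduce its informativeness matrix or its error bound; it is a minimal sketch that assumes a linear model with hinge-loss updates, a weight dictionary that grows as new features appear, and a tanh-of-margin heuristic as a stand-in for label confidence. The class name, the parameters `lr` and `unlabeled_weight`, and the example stream are all hypothetical.

```python
import math


class SparseOnlineLearner:
    """Toy online linear classifier over a growing feature space (not OLIFL)."""

    def __init__(self, lr=0.1, unlabeled_weight=0.5):
        self.w = {}                      # feature name -> weight; grows over time
        self.lr = lr                     # step size for labeled instances
        self.unlabeled_weight = unlabeled_weight  # damping for pseudo-labeled steps

    def score(self, x):
        # Only features present in this instance contribute; unseen ones count as 0.
        return sum(self.w.get(f, 0.0) * v for f, v in x.items())

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y=None):
        s = self.score(x)
        if y is None:
            # Assumed heuristic: pseudo-label from the current prediction and
            # shrink the step by a confidence in [0, 1) derived from the margin.
            y = 1 if s >= 0 else -1
            step = self.lr * self.unlabeled_weight * math.tanh(abs(s))
        else:
            step = self.lr
        if step == 0.0 or y * s >= 1:
            return  # nothing to learn: zero confidence or margin already satisfied
        for f, v in x.items():
            # Hinge-loss subgradient step; new features enter the model here.
            self.w[f] = self.w.get(f, 0.0) + step * y * v


# Hypothetical stream: feature sets vary per instance, and some labels are missing.
stream = [
    ({"a": 1.0}, 1),
    ({"b": 1.0}, -1),
    ({"a": 1.0, "c": 0.5}, None),
    ({"b": 1.0}, None),
]
model = SparseOnlineLearner()
for features, label in stream:
    model.update(features, label)
```

Keying weights by feature name lets the model accept instances with any subset of features without fixing the dimensionality in advance, and damping the pseudo-labeled steps keeps low-confidence guesses from dominating the labeled updates.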