Genetic Algorithm Classifier System for Semi-Supervised Learning

Bibliographic Details
Published in: Computational intelligence 2015-05, Vol.31 (2), p.201-232
Main Authors: Dee Miller, L.; Soh, Leen-Kiat; Scott, Stephen
Format: Article
Language: English
Description
Summary: Real‐world datasets often contain large numbers of unlabeled data points because obtaining labels incurs additional cost. Semi‐supervised learning (SSL) algorithms train on both labeled and unlabeled data points, which can yield higher classification accuracy on these datasets. Traditional SSL algorithms generally assign tentative labels to the unlabeled data points on the basis of the smoothness assumption that neighboring points should have the same label. When this assumption is violated, unlabeled points are mislabeled, injecting noise into the final classifier. An alternative SSL approach is cluster‐then‐label (CTL), which partitions all the data points (labeled and unlabeled) into clusters and creates a classifier from those clusters. CTL is based on the less restrictive cluster assumption that data points in the same cluster should have the same label. As shown, this allows CTLs to achieve higher classification accuracy on many datasets where the cluster assumption holds but the smoothness assumption does not. However, cluster configuration problems (e.g., irrelevant features, insufficient clusters, and incorrectly shaped clusters) can violate the cluster assumption. We propose a new framework for CTLs that uses a genetic algorithm (GA) to evolve classifiers free of these cluster configuration problems (e.g., the GA removes irrelevant attributes, updates the number of clusters, and changes the shape of the clusters). We demonstrate that a CTL based on this framework achieves accuracy comparable to or higher than that of both traditional SSLs and CTLs on 12 University of California, Irvine (UCI) machine learning datasets.
ISSN: 0824-7935, 1467-8640
DOI: 10.1111/coin.12018
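
The basic cluster‐then‐label idea described in the summary can be illustrated with a minimal Python sketch. This is not the authors' GA‐based framework: it assumes plain k‐means with a deterministic farthest‐point initialization, followed by a majority vote of the labeled points in each cluster. The function names (`kmeans`, `cluster_then_label`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain k-means (assumed here; the paper's clustering may differ)."""
    # Deterministic farthest-point initialization, so the demo is reproducible.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def cluster_then_label(labeled, unlabeled, k):
    """Cluster labeled + unlabeled points, then label clusters by majority vote."""
    points = [x for x, _ in labeled] + list(unlabeled)
    centroids, assign = kmeans(points, k)
    votes = [Counter() for _ in range(k)]
    # The first len(labeled) assignments belong to the labeled points.
    for (_, y), c in zip(labeled, assign):
        votes[c][y] += 1
    # Clusters with no labeled point fall back to the overall majority label.
    fallback = Counter(y for _, y in labeled).most_common(1)[0][0]
    cluster_label = [v.most_common(1)[0][0] if v else fallback for v in votes]

    def predict(x):
        nearest = min(range(k), key=lambda c: sq_dist(x, centroids[c]))
        return cluster_label[nearest]

    return predict


# Toy data: two well-separated groups, one labeled point in each.
labeled = [((0.0, 0.0), "a"), ((5.0, 5.0), "b")]
unlabeled = [(0.2, 0.1), (0.1, 0.3), (5.1, 4.9), (4.8, 5.2)]
predict = cluster_then_label(labeled, unlabeled, k=2)
```

In this sketch the unlabeled points influence only where the cluster boundaries fall, and the labeled points decide which label each cluster receives. If `k` is too small or irrelevant features distort the distances, a cluster can mix classes, which is the kind of cluster configuration problem the summary says the GA is meant to address.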