A novel two‐phase near‐infrared and midinfrared wavelength selection framework for sample classification

Bibliographic Details
Published in:	Journal of chemometrics 2024-03, Vol.38 (3), p.n/a
Main Authors:	Fontes, Juliana, Anzanello, Michel J., Brito, João B. G., Bucco, Guilherme B.
Format:	Article
Language:	English
Subjects:	Classification; Clustering; clustering of wavelengths; Computing time; Fourier transforms; Infrared spectroscopy; machine learning; Near infrared radiation; Performance prediction; random forest; spectral analysis FTIR/NIR; spectral clustering; Spectroscopy; Spectrum analysis; Support vector machines; wavelength selection; Wavelengths

Description
Summary:	Spectral data describing product samples are typically composed of a large number of noisy and irrelevant wavelengths that tend to undermine the performance of multivariate predictive techniques. This paper proposes a two‐phase framework that integrates a wavelength preselection step guided by wavelength clustering with a wrapper‐based strategy. The first phase prunes the data, removing the less informative wavelengths by means of spectral clustering, a technique deemed suitable for the Fourier transform infrared (FTIR) spectroscopy and near‐infrared (NIR) spectroscopy data at hand. The preselected wavelengths then undergo a second phase of selection based on combinations of different wavelength importance indices (i.e., Bhattacharyya distance, Chi‐square, ReliefF, and Gini) and classification techniques (i.e., support vector machine, k‐nearest neighbors, and random forest). When applied to 11 FTIR datasets from different domains, the recommended combination of importance index and classifier increased the average accuracy by 6.37% (from 0.863 to 0.918), while retaining on average 3.84% of the original spectra. The framework also reduced the computational time of the selection process. This paper introduces a two‐phase framework that merges wavelength preselection through clustering with a wrapper strategy. Initial spectral clustering eliminates less informative wavelengths. Subsequently, diverse wavelength importance indices and classification methods are integrated. Applied to 11 spectral datasets, the proposed combination (spectral clustering [SC]‐random forest [RF]+Gini [GI]) enhances average accuracy by 6.37%, retains 3.84% of the original spectra, and reduces computational time.
ISSN:	0886-9383 1099-128X
DOI:	10.1002/cem.3536
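
The two phases described in the summary can be illustrated with a minimal sketch of the SC-RF+GI combination. This is not the authors' implementation; it assumes scikit-learn and NumPy are available and rests on several assumptions not stated in the record: the toy data are synthetic, the affinity between wavelengths is taken as their absolute correlation, clusters are ranked by mean Gini importance, and the wrapper is a simple forward pass over Gini-ranked wavelengths. The function names and parameters (`make_toy_spectra`, `phase1_preselect`, `phase2_wrapper`, `keep_clusters`, `max_size`) are hypothetical.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def make_toy_spectra(n_samples=120, n_bands=6, band_width=10, seed=0):
    """Synthetic two-class 'spectra' (assumption): wavelengths come in
    correlated bands, and only band 2 (columns 20-29) separates the classes."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1], n_samples // 2)
    latent = rng.normal(size=(n_samples, n_bands))  # one shared factor per band
    latent[:, 2] += 2.0 * y                         # class signal in band 2 only
    X = np.repeat(latent, band_width, axis=1)       # neighbours share the factor
    X += 0.5 * rng.normal(size=X.shape)             # wavelength-level noise
    return X, y


def phase1_preselect(X, y, n_clusters=6, keep_clusters=2, seed=0):
    """Phase 1 (assumed form): cluster the wavelengths with spectral clustering
    and keep the clusters with the highest mean Gini importance."""
    # Affinity between wavelengths: absolute Pearson correlation (assumption).
    affinity = np.abs(np.corrcoef(X.T))
    labels = SpectralClustering(
        n_clusters=n_clusters, affinity="precomputed", random_state=seed
    ).fit_predict(affinity)
    # Gini importance of every wavelength from a random forest on the full data.
    gini = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y).feature_importances_
    found = np.unique(labels)                       # skip any empty cluster id
    scores = np.array([gini[labels == c].mean() for c in found])
    best = found[np.argsort(scores)[::-1][:keep_clusters]]
    return np.flatnonzero(np.isin(labels, best))    # indices of kept wavelengths


def phase2_wrapper(X, y, candidates, max_size=8, seed=0):
    """Phase 2 (assumed form): rank the preselected wavelengths by Gini
    importance, then grow the subset and keep the size with the best CV accuracy."""
    rf = RandomForestClassifier(n_estimators=50, random_state=seed)
    importance = rf.fit(X[:, candidates], y).feature_importances_
    ranked = candidates[np.argsort(importance)[::-1]]
    best_acc, best_subset = -1.0, ranked[:1]
    for k in range(1, min(max_size, len(ranked)) + 1):
        subset = ranked[:k]
        acc = cross_val_score(rf, X[:, subset], y, cv=5).mean()
        if acc > best_acc:
            best_acc, best_subset = acc, subset
    return np.sort(best_subset), best_acc


X, y = make_toy_spectra()
candidates = phase1_preselect(X, y)
selected, accuracy = phase2_wrapper(X, y, candidates)
print(f"kept {len(candidates)} of {X.shape[1]} wavelengths after phase 1")
print(f"final subset: {selected.tolist()} (5-fold CV accuracy {accuracy:.3f})")
```

Running the wrapper only on the phase 1 survivors is what keeps the classifier-in-the-loop search cheap, which mirrors the computational-time benefit the summary reports. Swapping the random forest for a support vector machine or k-nearest neighbors, or the Gini importance for another index, would change only `phase2_wrapper`.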