A novel two‐phase near‐infrared and midinfrared wavelength selection framework for sample classification
Published in: | Journal of Chemometrics, 2024-03, Vol. 38 (3) |
Main Authors: | , , , |
Format: | Article |
Language: | English |
Summary: | Spectral data describing product samples typically contain a large number of noisy and irrelevant wavelengths that tend to undermine the performance of multivariate predictive techniques. This paper proposes a two‐phase framework that couples a wavelength preselection step, guided by wavelength clustering, with a wrapper‐based selection strategy. The first phase prunes the less informative wavelengths using spectral clustering, a technique deemed suitable for the Fourier transform infrared (FTIR) and near‐infrared (NIR) spectroscopy data at hand. The preselected wavelengths then undergo a second selection phase that combines different wavelength importance indices (i.e., Bhattacharyya distance, Chi‐square, ReliefF, and Gini) with different classification techniques (i.e., support vector machine, k‐nearest neighbors, and random forest). When applied to 11 FTIR datasets from different domains, the recommended combination of importance index and classifier increased the average accuracy by 6.37% in relative terms (from 0.863 to 0.918) while retaining, on average, 3.84% of the original wavelengths. The framework also reduced the computational time of the selection process.
This paper introduces a two‐phase framework that combines clustering‐based wavelength preselection with a wrapper strategy. Spectral clustering first eliminates the less informative wavelengths; various wavelength importance indices and classification methods are then combined to select among the remaining ones. Applied to 11 spectral datasets, the proposed combination (spectral clustering [SC]‐random forest [RF]+Gini [GI]) increases average accuracy by 6.37%, retains 3.84% of the original wavelengths, and reduces computational time. |
ISSN: | 0886-9383, 1099-128X |
DOI: | 10.1002/cem.3536 |
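The two‐phase procedure summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the wavelength affinity is the absolute correlation between wavelength columns; phase 1 keeps the single highest‐scoring wavelength from each spectral cluster; and phase 2 ranks the survivors by an importance index and grows the subset in rank order, keeping the size with the best leave‐one‐out accuracy. To stay dependency‐light, the sketch pairs the Bhattacharyya index with k‐nearest neighbors (both among the options listed in the summary) instead of the recommended random forest + Gini combination. All function names are hypothetical.

```python
import numpy as np


def spectral_embedding(X, n_clusters):
    """Embed wavelengths (columns of X) via the normalized graph Laplacian.

    Assumption: affinity = |Pearson correlation| between wavelength columns.
    """
    W = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))  # (p, p) affinity
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    # Top eigenvectors of D^-1/2 W D^-1/2 = smallest of the normalized Laplacian.
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)


def kmeans(U, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of U; returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def bhattacharyya_index(X, y):
    """Per-wavelength Bhattacharyya distance (univariate Gaussian), averaged over class pairs."""
    classes = np.unique(y)
    scores, pairs = np.zeros(X.shape[1]), 0
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            a, b = X[y == classes[i]], X[y == classes[j]]
            va, vb = a.var(axis=0) + 1e-12, b.var(axis=0) + 1e-12
            scores += (0.25 * (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (va + vb)
                       + 0.5 * np.log((va + vb) / (2.0 * np.sqrt(va * vb))))
            pairs += 1
    return scores / pairs


def knn_loo_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbors majority-vote classifier."""
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)  # a sample never votes for itself
    nn = np.argsort(D, axis=1)[:, :k]
    correct = 0
    for i in range(len(y)):
        vals, counts = np.unique(y[nn[i]], return_counts=True)
        correct += int(vals[np.argmax(counts)] == y[i])
    return correct / len(y)


def two_phase_select(X, y, n_clusters=10, k=3):
    """Hypothetical two-phase selector: cluster-based preselection, then a ranked wrapper."""
    # Phase 1 (assumed rule): keep the top-scoring wavelength of each spectral cluster.
    labels = kmeans(spectral_embedding(X, n_clusters), n_clusters)
    score = bhattacharyya_index(X, y)
    pre = np.array([np.flatnonzero(labels == c)[np.argmax(score[labels == c])]
                    for c in np.unique(labels)])
    # Phase 2: rank survivors by the index and grow the subset in rank order,
    # keeping the size with the best leave-one-out kNN accuracy.
    ranked = pre[np.argsort(-score[pre])]
    best_acc, best_m = -1.0, 1
    for m in range(1, len(ranked) + 1):
        acc = knn_loo_accuracy(X[:, ranked[:m]], y, k)
        if acc > best_acc:
            best_acc, best_m = acc, m
    return ranked[:best_m], best_acc
```

For example, `two_phase_select(X, y, n_clusters=8)` on a samples × wavelengths matrix `X` with class labels `y` returns the indices of the retained wavelengths and their cross‐validated accuracy. Swapping in another importance index or classifier only changes `bhattacharyya_index` or `knn_loo_accuracy`, which mirrors how the summary describes pairing indices with classifiers in the second phase.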