Machine learning combined with non-targeted LC-HRMS analysis for a risk warning system of chemical hazards in drinking water: A proof of concept

Bibliographic Details
Published in:	Talanta (Oxford) 2019-04, Vol.195, p.426-432
Main Authors:	Samanipour, Saer, Kaserzon, Sarit, Vijayasarathy, Soumini, Jiang, Hui, Choi, Phil, Reid, Malcolm J., Mueller, Jochen F., Thomas, Kevin V.
Format:	Article
Language:	eng
Subjects:	Chromatography, Liquid; Discriminant Analysis; Drinking water; Drinking Water - analysis; LC-HRMS; Least-Squares Analysis; Machine Learning; Mass Spectrometry - methods; Models, Statistical; Non-target; Reproducibility of Results; Statistical modeling; Water Pollutants, Chemical - analysis

Description
Summary:	Guaranteeing clean drinking water for the global population is becoming more challenging because of water scarcity across the globe, a growing population, and the increasing chemical footprint of that population. Existing targeted strategies for hazard monitoring in drinking water are not adequate to handle such diverse and multidimensional stressors. In the current study, we developed, validated, and tested a machine learning algorithm, based on data produced via non-targeted liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS), for the identification of potential chemical hazards in drinking water. The machine learning algorithm consisted of a composite statistical model including an unsupervised component (principal component analysis, PCA) and a supervised one (partial least squares discriminant analysis, PLS-DA). This model was trained using a training set of 20 drinking water samples previously tested via conventional suspect screening. The model was validated using a validation set of 20 drinking water samples, 4 of which were spiked with 15 labeled standards at four different concentration levels. The model successfully detected all of the added analytes in the four spiked samples without producing any false detections. The same validation set was processed via conventional trend analysis in order to cross-validate the composite model. The cross-validation showed that, although the conventional trend analysis approach produced a false positive detection rate of ≤5%, the composite model outperformed it by producing zero false detections. Additionally, the validated model was tested on 42 additional drinking water samples from the same source for an unbiased examination of the model. Finally, the potential and limitations of this approach are discussed.
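The summary does not spell out how the unsupervised (PCA) component flags a potential hazard. As a minimal sketch, assuming the LC-HRMS features have already been detected and aligned into a samples x features intensity matrix, a PCA model can be fitted to the training (baseline) samples and a new sample screened by its squared prediction error (SPE), i.e. the part of the sample that the baseline components cannot reconstruct. The per-feature residuals then point to the feature most responsible for the deviation. This is a generic PCA residual screen in plain Python, not the authors' implementation: the helper names, the power-iteration solver, and the threshold rule (largest SPE seen in training) are hypothetical choices, and the supervised PLS-DA step is not shown.

```python
import math
import random


def mean_center(rows):
    """Return column means and a mean-centred copy of a samples x features matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return means, centred


def covariance(centred):
    """Sample covariance (features x features) of mean-centred rows."""
    n, p = len(centred), len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def top_components(cov, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return comps  # remaining variance is (numerically) zero
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append(v)
        # Deflate so the next power iteration converges to the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps


def residual_contributions(x, means, comps):
    """Per-feature squared residuals of one sample after projection onto the PCA subspace."""
    xc = [xi - m for xi, m in zip(x, means)]
    recon = [0.0] * len(xc)
    for v in comps:
        score = sum(a * b for a, b in zip(xc, v))
        recon = [r + score * vi for r, vi in zip(recon, v)]
    return [(a - r) ** 2 for a, r in zip(xc, recon)]


def fit_screen(training, k):
    """Fit the baseline PCA model and an SPE limit on the training samples."""
    means, centred = mean_center(training)
    comps = top_components(covariance(centred), k)
    spe = [sum(residual_contributions(x, means, comps)) for x in training]
    limit = max(spe)  # hypothetical rule: largest SPE observed in training
    return means, comps, limit


def flag_sample(x, model):
    """Return (is_flagged, index of the feature with the largest residual)."""
    means, comps, limit = model
    contrib = residual_contributions(x, means, comps)
    top = max(range(len(contrib)), key=contrib.__getitem__)
    return sum(contrib) > limit, top
```

For example, fitting `fit_screen` on baseline rows that vary mainly along one direction and passing a sample with a large excess in one feature returns `True` together with that feature's index, while a sample that follows the baseline pattern is not flagged. A production version would more likely use an SVD from a numerical library and a statistically derived SPE limit rather than the training maximum.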
Highlights:
• A composite statistical model was developed using non-targeted LC-HRMS data.
• The validated model (i.e. the machine learning algorithm) was used for monitoring drinking water.
• The algorithm was cross-validated via conventional approaches.
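The record does not define the conventional trend analysis used for cross-validation, nor the denominator of the ≤5% false positive rate. A minimal sketch, assuming aligned feature intensities, a hypothetical per-feature rule of "flag when the intensity exceeds the training mean by more than z sample standard deviations" (z = 3 is an assumed default), and a false positive rate computed over the features that were not spiked, could look like this. The function names are illustrative only.

```python
import statistics


def trend_flags(training, sample, z=3.0):
    """Indices of features whose intensity in `sample` exceeds mean + z*sd of the training samples.

    Hypothetical trend rule; statistics.stdev needs at least two training samples.
    """
    flags = set()
    for j, value in enumerate(sample):
        column = [row[j] for row in training]
        mu = statistics.mean(column)
        sd = statistics.stdev(column)
        if value > mu + z * sd:
            flags.add(j)
    return flags


def detection_rates(flagged, spiked, n_features):
    """True-positive rate over spiked features; false-positive rate over the remaining features."""
    tp = len(flagged & spiked)
    fp = len(flagged - spiked)
    negatives = n_features - len(spiked)
    tpr = tp / len(spiked) if spiked else 0.0
    fpr = fp / negatives if negatives else 0.0
    return tpr, fpr
```

With the spiked features known, as in the spiked validation samples, `detection_rates` gives the fraction of spiked features recovered and the fraction of unspiked features flagged by mistake; under this definition, the zero false detections reported for the composite model correspond to a false positive rate of 0.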
ISSN:	0039-9140; 1873-3573