Effect of environmental covariable selection in the hydrological modeling using machine learning models to predict daily streamflow

Bibliographic Details
Published in: Journal of environmental management, 2021-07, Vol. 290, Article 112625
Main Authors: Reis, Guilherme Barbosa; da Silva, Demetrius David; Fernandes Filho, Elpídio Inácio; Moreira, Michel Castro; Veloso, Gustavo Vieira; Fraga, Micael de Souza; Pinheiro, Sávio Augusto Rocha
Format: Article
Language: English
Description
Summary: There are different methods for predicting streamflow, and machine learning has recently been widely used for this purpose. This technique draws on a wide set of covariables, which must undergo selection to increase the precision and stability of the models. This work therefore aimed to analyze the effect of covariable selection with Recursive Feature Elimination (RFE) and Forward Feature Selection (FFS) on the performance of machine learning models that predict daily streamflow. The study was carried out in the Piranga river basin, located in the State of Minas Gerais, Brazil. The database consisted of an 18-year historical series (2000–2017) of streamflow data at the outlet of the basin and covariables derived from the streamflow of affluent rivers, precipitation, land use and land cover, products from the MODIS sensors, and time. Highly correlated covariables were eliminated, and the remaining covariables were selected by level of importance with the RFE and FFS methods for the Multivariate Adaptive Regression Splines (EARTH), Multiple Linear Regression (MLR), and Random Forest (RF) models. The data were partitioned into two groups: 75% for training and 25% for validation. The models were run 50 times, and their performance was evaluated by the Nash-Sutcliffe efficiency coefficient (NSE), the coefficient of determination (R²), and the root mean square error (RMSE). The three models showed satisfactory performance with both covariable selection methods; however, all of them proved inaccurate when predicting values associated with maximum streamflow events. In most cases, FFS improved model performance and reduced the number of selected covariables. Machine learning proved efficient for predicting daily streamflow, and selecting covariables with FFS enhanced this efficiency.
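The three evaluation metrics named in the summary can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names and the sample flows are made up, and R2 is assumed here to be the squared Pearson correlation between observed and simulated values, since the summary does not state which definition was used.

```python
import math

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) /
    (sum of squared deviations of the observations from their mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def rmse(obs, sim):
    """Root mean square error, in the same units as the streamflow."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def r2(obs, sim):
    """Coefficient of determination, assumed here to be the squared
    Pearson correlation between observed and simulated values."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)

# Hypothetical daily flows (not data from the study); the third day
# mimics a peak event that the simulation underestimates.
observed = [10.0, 12.0, 30.0, 18.0]
simulated = [11.0, 12.5, 24.0, 17.0]

print(nse(observed, simulated), r2(observed, simulated), rmse(observed, simulated))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the model does no better than always predicting the mean of the observations.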
• The Forward Feature Selection (FFS) was more advantageous for selecting covariables.
• In most cases, the FFS selected fewer covariables to predict daily streamflow.
• The machine learning models with covariable selection were efficient in predicting streamflow.
• The most important covariable was the streamflow of affluent rivers lagged in time.
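For readers unfamiliar with forward feature selection, the general idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the article: it uses NumPy, a plain least-squares linear model, synthetic data, a chronological 75/25 split, and a hypothetical `min_gain` stopping threshold. Starting from an empty set, it repeatedly adds the covariable that most reduces the validation RMSE and stops when no candidate improves it.

```python
import numpy as np

def holdout_rmse(X_tr, y_tr, X_va, y_va):
    """Fit ordinary least squares with an intercept on the training rows
    and return the RMSE on the validation rows."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    A_va = np.column_stack([np.ones(len(X_va)), X_va])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return float(np.sqrt(np.mean((y_va - A_va @ coef) ** 2)))

def forward_feature_selection(X_tr, y_tr, X_va, y_va, min_gain=1e-6):
    """Greedy forward selection. `min_gain` is a hypothetical threshold:
    stop when the best candidate lowers validation RMSE by less than this."""
    selected, remaining = [], list(range(X_tr.shape[1]))
    best = float("inf")
    while remaining:
        # Score every remaining covariable when added to the current set.
        scores = {
            j: holdout_rmse(X_tr[:, selected + [j]], y_tr,
                            X_va[:, selected + [j]], y_va)
            for j in remaining
        }
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < min_gain:
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best

# Synthetic example (not data from the study): six candidate covariables,
# of which only columns 0 and 3 actually drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=400)

n_train = int(0.75 * len(y))  # 75% training, 25% validation
selected, best = forward_feature_selection(
    X[:n_train], y[:n_train], X[n_train:], y[n_train:]
)
print(selected, best)
```

Recursive Feature Elimination works in the opposite direction: it starts from the full set of covariables and repeatedly drops the least important one.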
ISSN: 0301-4797, 1095-8630
DOI: 10.1016/j.jenvman.2021.112625