
Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil

Large-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions in Brazil; and evaluat...

Bibliographic Details
Published in: International journal of plant production 2022-12, Vol.16 (4), p.691-703
Main Authors: Monteiro, Leonardo A., Ramos, Rafael M., Battisti, Rafael, Soares, Johnny R., Oliveira, Julianne C., Figueiredo, Gleyce K. D. A., Lamparelli, Rubens A. C., Nendel, Claas, Lana, Marcos Alberto
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63
cites cdi_FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63
container_end_page 703
container_issue 4
container_start_page 691
container_title International journal of plant production
container_volume 16
creator Monteiro, Leonardo A.
Ramos, Rafael M.
Battisti, Rafael
Soares, Johnny R.
Oliveira, Julianne C.
Figueiredo, Gleyce K. D. A.
Lamparelli, Rubens A. C.
Nendel, Claas
Lana, Marcos Alberto
description Large-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions of Brazil, and evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields in an independent year. Our methodology explicitly describes a general approach for compiling publicly available databases and building data-driven models (multiple linear regression, MLR; random forests, RF; and support vector machines, SVM) to predict yields at large scales using gridded weather and soil data. We filtered out counties with missing or suspicious yield records, resulting in a crop yield database containing 3450 records (23 years × 150 “high-quality” counties). RF and SVM had similar results in the calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed the potential of data-driven models for predicting soybean yields at large scales in Brazil around one month before harvest (i.e. 90 DAS). Using a well-trained RF model to predict crop yield for a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha⁻¹, representing a relative error (rRMSE) between 9.2 and 41.5%. Although we demonstrated robust data-driven models for yield prediction at large scales in Brazil, there is still room for improving their accuracy. The inclusion of explanatory variables related to the crop (e.g. growing degree-days, flowering dates) and the environment (e.g. remotely sensed vegetation indices, number of dry and heat days during the cycle), as well as outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology), are potential strategies to improve model accuracy.
doi_str_mv 10.1007/s42106-022-00209-0
format article
fullrecord <record><control><sourceid>swepub_cross</sourceid><recordid>TN_cdi_swepub_primary_oai_slubar_slu_se_119035</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>oai_research_chalmers_se_4456f599_82bd_4e82_8610_453a2d63b91c</sourcerecordid><originalsourceid>FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63</originalsourceid><addsrcrecordid>eNp9kctuFDEQRVsIJELgB1j5BzqUH-12LyEJEClApCELVlbZLhNHne7I9gSFr4-HGbGDVdWiztFV3a57y-GEA4zvihIcdA9C9AACph6edUd8lENvQMnnh10brl52r0q5BdBac3PUxau10lITzuy6EFsjO8OK_VlOD7SwL2ugubC6svNS0x1WYrgEdpUpJF_ZZn10hAv7kWgOhWFlX7GmdWmyjceZWFrYh4y_0_y6exFxLvTmMI-764_n308_95ffPl2cvr_svQJd-wgYveFci0HrKER0gxnJ0cBN4GbkwjiJU5BGjZ4ieC-8dCMnIwWFyWt53J3sveUX3W-dvc8tdX60KyZb5q3DvBu2kOV8Ajk0YPNPIFMhzP7G-huc7yiXHafUoOMwTdYIF6wiI6zRHKwaJIqgpZu4b1axt_q8lpIp_vW2y11hdl-YbYXZP4VZaJA8RGnHy0_K9nbd5vbM8j_qCdoTmRc</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil</title><source>Springer Link</source><creator>Monteiro, Leonardo A. ; Ramos, Rafael M. ; Battisti, Rafael ; Soares, Johnny R. ; Oliveira, Julianne C. ; Figueiredo, Gleyce K. D. A. ; Lamparelli, Rubens A. C. ; Nendel, Claas ; Lana, Marcos Alberto</creator><creatorcontrib>Monteiro, Leonardo A. ; Ramos, Rafael M. ; Battisti, Rafael ; Soares, Johnny R. ; Oliveira, Julianne C. ; Figueiredo, Gleyce K. D. A. ; Lamparelli, Rubens A. C. ; Nendel, Claas ; Lana, Marcos Alberto ; Sveriges lantbruksuniversitet</creatorcontrib><description>Large-scale assessment of crop yields plays a fundamental role for agricultural planning and to achieve food security goals. 
In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sow (DAS) in the main producing regions in Brazil; and evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields for an independent year. Our methodology explicitly describes a general approach for wrapping up publicly available databases and build data-driven models (multiple linear regression—MLR; random forests—RF; and support vector machines—SVM) to predict yields at large scales using gridded data of weather and soil information. We filtered out counties with missing or suspicious yield records, resulting on a crop yield database containing 3450 records (23 years × 150 “high-quality” counties). RF and SVM had similar results for calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed a potential use of data-driven models for predict soybean yields at large scales in Brazil with around one month before harvest (i.e. 90 DAS). Using a well-trained RF model for predicting crop yield during a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha –1 representing a relative error (rRMSE) between 9.2 and 41.5%. Although we showed up robust data-driven models for yield prediction at large scales in Brazil, there are still a room for improving its accuracy. The inclusion of explanatory variables related to crop (e.g. growing degree-days, flowering dates), environment (e.g. remotely-sensed vegetation indices, number of dry and heat days during the cycle) and outputs from process-based crop simulation models (e.g. 
biomass, leaf area index and plant phenology), are potential strategies to improve model accuracy.</description><identifier>ISSN: 1735-6814</identifier><identifier>ISSN: 1735-8043</identifier><identifier>EISSN: 1735-8043</identifier><identifier>DOI: 10.1007/s42106-022-00209-0</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Agricultural Science ; Agriculture ; Biomedical and Life Sciences ; Climatic and soil variables ; Geospatial and temporal variability ; Jordbruksvetenskap ; Large-scale analysis ; Life Sciences ; Machine learning approaches ; Plant Ecology ; Plant Physiology ; Public databases</subject><ispartof>International journal of plant production, 2022-12, Vol.16 (4), p.691-703</ispartof><rights>Springer Nature Switzerland AG 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63</citedby><cites>FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63</cites><orcidid>0000-0003-3889-6095</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,315,786,790,891,27957,27958</link.rule.ids><backlink>$$Uhttps://research.chalmers.se/publication/532111$$DView record from Swedish Publication Index$$Hfree_for_read</backlink><backlink>$$Uhttps://res.slu.se/id/publ/119035$$DView record from Swedish Publication Index$$Hfree_for_read</backlink></links><search><creatorcontrib>Monteiro, Leonardo 
A.</creatorcontrib><creatorcontrib>Ramos, Rafael M.</creatorcontrib><creatorcontrib>Battisti, Rafael</creatorcontrib><creatorcontrib>Soares, Johnny R.</creatorcontrib><creatorcontrib>Oliveira, Julianne C.</creatorcontrib><creatorcontrib>Figueiredo, Gleyce K. D. A.</creatorcontrib><creatorcontrib>Lamparelli, Rubens A. C.</creatorcontrib><creatorcontrib>Nendel, Claas</creatorcontrib><creatorcontrib>Lana, Marcos Alberto</creatorcontrib><creatorcontrib>Sveriges lantbruksuniversitet</creatorcontrib><title>Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil</title><title>International journal of plant production</title><addtitle>Int. J. Plant Prod</addtitle><description>Large-scale assessment of crop yields plays a fundamental role for agricultural planning and to achieve food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sow (DAS) in the main producing regions in Brazil; and evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields for an independent year. Our methodology explicitly describes a general approach for wrapping up publicly available databases and build data-driven models (multiple linear regression—MLR; random forests—RF; and support vector machines—SVM) to predict yields at large scales using gridded data of weather and soil information. We filtered out counties with missing or suspicious yield records, resulting on a crop yield database containing 3450 records (23 years × 150 “high-quality” counties). RF and SVM had similar results for calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed a potential use of data-driven models for predict soybean yields at large scales in Brazil with around one month before harvest (i.e. 90 DAS). 
Using a well-trained RF model for predicting crop yield during a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha –1 representing a relative error (rRMSE) between 9.2 and 41.5%. Although we showed up robust data-driven models for yield prediction at large scales in Brazil, there are still a room for improving its accuracy. The inclusion of explanatory variables related to crop (e.g. growing degree-days, flowering dates), environment (e.g. remotely-sensed vegetation indices, number of dry and heat days during the cycle) and outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology), are potential strategies to improve model accuracy.</description><subject>Agricultural Science</subject><subject>Agriculture</subject><subject>Biomedical and Life Sciences</subject><subject>Climatic and soil variables</subject><subject>Geospatial and temporal variability</subject><subject>Jordbruksvetenskap</subject><subject>Large-scale analysis</subject><subject>Life Sciences</subject><subject>Machine learning approaches</subject><subject>Plant Ecology</subject><subject>Plant Physiology</subject><subject>Public databases</subject><issn>1735-6814</issn><issn>1735-8043</issn><issn>1735-8043</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kctuFDEQRVsIJELgB1j5BzqUH-12LyEJEClApCELVlbZLhNHne7I9gSFr4-HGbGDVdWiztFV3a57y-GEA4zvihIcdA9C9AACph6edUd8lENvQMnnh10brl52r0q5BdBac3PUxau10lITzuy6EFsjO8OK_VlOD7SwL2ugubC6svNS0x1WYrgEdpUpJF_ZZn10hAv7kWgOhWFlX7GmdWmyjceZWFrYh4y_0_y6exFxLvTmMI-764_n308_95ffPl2cvr_svQJd-wgYveFci0HrKER0gxnJ0cBN4GbkwjiJU5BGjZ4ieC-8dCMnIwWFyWt53J3sveUX3W-dvc8tdX60KyZb5q3DvBu2kOV8Ajk0YPNPIFMhzP7G-huc7yiXHafUoOMwTdYIF6wiI6zRHKwaJIqgpZu4b1axt_q8lpIp_vW2y11hdl-YbYXZP4VZaJA8RGnHy0_K9nbd5vbM8j_qCdoTmRc</recordid><startdate>20221201</startdate><enddate>20221201</enddate><creator>Monteiro, Leonardo A.</creator><creator>Ramos, Rafael 
M.</creator><creator>Battisti, Rafael</creator><creator>Soares, Johnny R.</creator><creator>Oliveira, Julianne C.</creator><creator>Figueiredo, Gleyce K. D. A.</creator><creator>Lamparelli, Rubens A. C.</creator><creator>Nendel, Claas</creator><creator>Lana, Marcos Alberto</creator><general>Springer International Publishing</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ADTPV</scope><scope>AOWAS</scope><scope>F1S</scope><orcidid>https://orcid.org/0000-0003-3889-6095</orcidid></search><sort><creationdate>20221201</creationdate><title>Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil</title><author>Monteiro, Leonardo A. ; Ramos, Rafael M. ; Battisti, Rafael ; Soares, Johnny R. ; Oliveira, Julianne C. ; Figueiredo, Gleyce K. D. A. ; Lamparelli, Rubens A. C. ; Nendel, Claas ; Lana, Marcos Alberto</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Agricultural Science</topic><topic>Agriculture</topic><topic>Biomedical and Life Sciences</topic><topic>Climatic and soil variables</topic><topic>Geospatial and temporal variability</topic><topic>Jordbruksvetenskap</topic><topic>Large-scale analysis</topic><topic>Life Sciences</topic><topic>Machine learning approaches</topic><topic>Plant Ecology</topic><topic>Plant Physiology</topic><topic>Public databases</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Monteiro, Leonardo A.</creatorcontrib><creatorcontrib>Ramos, Rafael M.</creatorcontrib><creatorcontrib>Battisti, Rafael</creatorcontrib><creatorcontrib>Soares, Johnny R.</creatorcontrib><creatorcontrib>Oliveira, Julianne C.</creatorcontrib><creatorcontrib>Figueiredo, Gleyce K. D. A.</creatorcontrib><creatorcontrib>Lamparelli, Rubens A. 
C.</creatorcontrib><creatorcontrib>Nendel, Claas</creatorcontrib><creatorcontrib>Lana, Marcos Alberto</creatorcontrib><creatorcontrib>Sveriges lantbruksuniversitet</creatorcontrib><collection>CrossRef</collection><collection>SwePub</collection><collection>SwePub Articles</collection><collection>SWEPUB Chalmers tekniska högskola</collection><jtitle>International journal of plant production</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Monteiro, Leonardo A.</au><au>Ramos, Rafael M.</au><au>Battisti, Rafael</au><au>Soares, Johnny R.</au><au>Oliveira, Julianne C.</au><au>Figueiredo, Gleyce K. D. A.</au><au>Lamparelli, Rubens A. C.</au><au>Nendel, Claas</au><au>Lana, Marcos Alberto</au><aucorp>Sveriges lantbruksuniversitet</aucorp><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil</atitle><jtitle>International journal of plant production</jtitle><stitle>Int. J. Plant Prod</stitle><date>2022-12-01</date><risdate>2022</risdate><volume>16</volume><issue>4</issue><spage>691</spage><epage>703</epage><pages>691-703</pages><issn>1735-6814</issn><issn>1735-8043</issn><eissn>1735-8043</eissn><abstract>Large-scale assessment of crop yields plays a fundamental role for agricultural planning and to achieve food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sow (DAS) in the main producing regions in Brazil; and evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields for an independent year. 
Our methodology explicitly describes a general approach for wrapping up publicly available databases and build data-driven models (multiple linear regression—MLR; random forests—RF; and support vector machines—SVM) to predict yields at large scales using gridded data of weather and soil information. We filtered out counties with missing or suspicious yield records, resulting on a crop yield database containing 3450 records (23 years × 150 “high-quality” counties). RF and SVM had similar results for calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed a potential use of data-driven models for predict soybean yields at large scales in Brazil with around one month before harvest (i.e. 90 DAS). Using a well-trained RF model for predicting crop yield during a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha –1 representing a relative error (rRMSE) between 9.2 and 41.5%. Although we showed up robust data-driven models for yield prediction at large scales in Brazil, there are still a room for improving its accuracy. The inclusion of explanatory variables related to crop (e.g. growing degree-days, flowering dates), environment (e.g. remotely-sensed vegetation indices, number of dry and heat days during the cycle) and outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology), are potential strategies to improve model accuracy.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><doi>10.1007/s42106-022-00209-0</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0003-3889-6095</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1735-6814
ispartof International journal of plant production, 2022-12, Vol.16 (4), p.691-703
issn 1735-6814
1735-8043
1735-8043
language eng
recordid cdi_swepub_primary_oai_slubar_slu_se_119035
source Springer Link
subjects Agricultural Science
Agriculture
Biomedical and Life Sciences
Climatic and soil variables
Geospatial and temporal variability
Jordbruksvetenskap
Large-scale analysis
Life Sciences
Machine learning approaches
Plant Ecology
Plant Physiology
Public databases
title Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-21T22%3A59%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-swepub_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Potential%20Use%20of%20Data-Driven%20Models%20to%20Estimate%20and%20Predict%20Soybean%20Yields%20at%20National%20Scale%20in%20Brazil&rft.jtitle=International%20journal%20of%20plant%20production&rft.au=Monteiro,%20Leonardo%20A.&rft.aucorp=Sveriges%20lantbruksuniversitet&rft.date=2022-12-01&rft.volume=16&rft.issue=4&rft.spage=691&rft.epage=703&rft.pages=691-703&rft.issn=1735-6814&rft.eissn=1735-8043&rft_id=info:doi/10.1007/s42106-022-00209-0&rft_dat=%3Cswepub_cross%3Eoai_research_chalmers_se_4456f599_82bd_4e82_8610_453a2d63b91c%3C/swepub_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true