Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil
Published in: | International journal of plant production 2022-12, Vol.16 (4), p.691-703 |
---|---|
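Main Authors: | Monteiro, Leonardo A.; Ramos, Rafael M.; Battisti, Rafael; Soares, Johnny R.; Oliveira, Julianne C.; Figueiredo, Gleyce K. D. A.; Lamparelli, Rubens A. C.; Nendel, Claas; Lana, Marcos Alberto |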
Format: | Article |
Language: | English |
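Subjects: | Agricultural Science; Agriculture; Biomedical and Life Sciences; Climatic and soil variables; Geospatial and temporal variability; Jordbruksvetenskap; Large-scale analysis; Life Sciences; Machine learning approaches; Plant Ecology; Plant Physiology; Public databases |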
cited_by | cdi_FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63 |
---|---|
cites | cdi_FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63 |
container_end_page | 703 |
container_issue | 4 |
container_start_page | 691 |
container_title | International journal of plant production |
container_volume | 16 |
creator | Monteiro, Leonardo A.; Ramos, Rafael M.; Battisti, Rafael; Soares, Johnny R.; Oliveira, Julianne C.; Figueiredo, Gleyce K. D. A.; Lamparelli, Rubens A. C.; Nendel, Claas; Lana, Marcos Alberto |
description | Large-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions of Brazil, and evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields for an independent year. Our methodology explicitly describes a general approach for compiling publicly available databases and building data-driven models (multiple linear regression, MLR; random forests, RF; and support vector machines, SVM) to predict yields at large scales using gridded weather and soil data. We filtered out counties with missing or suspicious yield records, resulting in a crop yield database containing 3450 records (23 years × 150 “high-quality” counties). RF and SVM performed similarly in the calibration and validation steps, whereas MLR performed worst. Our analysis revealed the potential of data-driven models to predict soybean yields at large scales in Brazil about one month before harvest (i.e. 90 DAS). Using a well-trained RF model to predict crop yield for a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha⁻¹, representing a relative error (rRMSE) between 9.2 and 41.5%. Although we demonstrated robust data-driven models for yield prediction at large scales in Brazil, there is still room to improve their accuracy. Potential strategies to improve model accuracy include adding explanatory variables related to the crop (e.g. growing degree-days, flowering dates) and the environment (e.g. remotely sensed vegetation indices, number of dry and heat days during the cycle), as well as outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology). |
doi_str_mv | 10.1007/s42106-022-00209-0 |
format | article |
publisher | Cham: Springer International Publishing |
aucorp | Sveriges lantbruksuniversitet |
stitle | Int. J. Plant Prod |
rights | Springer Nature Switzerland AG 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
toplevel | peer_reviewed online_resources |
orcidid | https://orcid.org/0000-0003-3889-6095 |
collection | CrossRef; SwePub; SwePub Articles; SWEPUB Chalmers tekniska högskola |
backlink | https://research.chalmers.se/publication/532111 (View record from Swedish Publication Index) |
backlink | https://res.slu.se/id/publ/119035 (View record from Swedish Publication Index) |
fulltext | fulltext |
identifier | ISSN: 1735-6814 |
ispartof | International journal of plant production, 2022-12, Vol.16 (4), p.691-703 |
issn | 1735-6814 1735-8043 1735-8043 |
language | eng |
recordid | cdi_swepub_primary_oai_slubar_slu_se_119035 |
source | Springer Link |
subjects | Agricultural Science Agriculture Biomedical and Life Sciences Climatic and soil variables Geospatial and temporal variability Jordbruksvetenskap Large-scale analysis Life Sciences Machine learning approaches Plant Ecology Plant Physiology Public databases |
title | Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-21T22%3A59%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-swepub_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Potential%20Use%20of%20Data-Driven%20Models%20to%20Estimate%20and%20Predict%20Soybean%20Yields%20at%20National%20Scale%20in%20Brazil&rft.jtitle=International%20journal%20of%20plant%20production&rft.au=Monteiro,%20Leonardo%20A.&rft.aucorp=Sveriges%20lantbruksuniversitet&rft.date=2022-12-01&rft.volume=16&rft.issue=4&rft.spage=691&rft.epage=703&rft.pages=691-703&rft.issn=1735-6814&rft.eissn=1735-8043&rft_id=info:doi/10.1007/s42106-022-00209-0&rft_dat=%3Cswepub_cross%3Eoai_research_chalmers_se_4456f599_82bd_4e82_8610_453a2d63b91c%3C/swepub_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c406t-f0afc81162566f22fb587ebe518d187128b3a9d3847cef0cc2c3b71e832ed9c63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
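The description reports accuracy for a held-out year as RMSE (kg ha⁻¹) and as a relative error (rRMSE, %), and gives a range rather than a single value. The following is a minimal Python sketch of how such metrics are commonly computed and scored separately for each year. It assumes rRMSE is RMSE divided by the mean observed yield (the record does not state the normalization), takes predictions as already produced by some model (no RF, SVM or MLR is trained here), and uses hypothetical names (`rmse`, `rrmse`, `per_year_metrics`) and made-up yields, not data from the study.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error, in the units of the inputs (here kg ha^-1)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rrmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Relative RMSE in percent.

    ASSUMPTION: normalized by the mean observed yield; the record does not say.
    """
    error = rmse(observed, predicted)  # validates the inputs first
    return 100.0 * error / (sum(observed) / len(observed))


def per_year_metrics(
    records: Iterable[Tuple[int, float, float]],
) -> Dict[int, Tuple[float, float]]:
    """Group hypothetical (year, observed, predicted) county records and
    return {year: (RMSE, rRMSE)}, one score per year, in ascending year order."""
    by_year: Dict[int, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for year, obs, pred in records:
        by_year[year][0].append(obs)
        by_year[year][1].append(pred)
    return {y: (rmse(o, p), rrmse(o, p)) for y, (o, p) in sorted(by_year.items())}


if __name__ == "__main__":
    # Hypothetical county records in kg ha^-1; NOT data from the study.
    records = [
        (2001, 3000.0, 3100.0), (2001, 3200.0, 3000.0),
        (2001, 2800.0, 2900.0), (2001, 3400.0, 3300.0),
        (2002, 2500.0, 2000.0), (2002, 2700.0, 3300.0),
    ]
    for year, (err, rel) in per_year_metrics(records).items():
        print(f"{year}: RMSE = {err:.1f} kg ha^-1, rRMSE = {rel:.1f} %")
```

Scoring each year separately produces one error value per held-out year, so a spread of values, like the RMSE range quoted in the description, is to be expected; whether the study's range arises exactly this way is not stated in the record.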