
Deep learning of electrochemical CO2 conversion literature reveals research trends and directions

Bibliographic Details
Published in: Journal of materials chemistry. A, Materials for energy and sustainability, 2023-08, Vol. 11 (33), p. 17628-17643
Main Authors: Choi, Jiwoo; Bang, Kihoon; Jang, Suji; Choi, Jaewoong; Ordonez, Juanita; Buttler, David; Hiszpanski, Anna; Han, T. Yong-Jin; Sohn, Seok Su; Lee, Byungju; Lee, Kwang-Ryeol; Han, Sang Soo; Kim, Donghun
Format: Article
Language:
Description
Summary: Large-scale and openly available material science databases are mainly composed of computer simulation results rather than experimental data. Some examples include the Materials Project, Open Quantum Materials Database, and Open Catalyst 2022. Unfortunately, building large-scale experimental databases remains challenging due to the difficulties in consolidating locally distributed datasets. In this work, focusing on the catalysis literature of CO2 reduction reactions (CO2RRs), we present a machine learning (ML)-based protocol for selecting highly relevant papers and extracting important experimental data. First, we report a document embedding method (Doc2Vec) for collecting papers of greatest relevance to the specific target domain, which yielded 3154 CO2RR-related papers from six publishers. Next, we developed named entity recognition (NER) models to extract twelve entities related to material names (catalyst, electrolyte, etc.) and catalytic performance (Faradaic efficiency, current density, etc.). Among several tested models, the MatBERT-based approach achieved the highest accuracy, with an average F1-score of 90.4% and an F1-score of 95.2% in a boundary relaxation evaluation scheme. The accurate and accelerated NER-based data extraction from a large volume of catalysis literature enables temporal trend analyses of the CO2RR catalysts, products, and performances, revealing the potentially effective material space in CO2RRs. While this work demonstrates the effectiveness of our ML-based text mining methods specifically for the CO2RR literature, the methods and approach are applicable to other catalytic chemical reactions and may be used to accelerate their development. Machine learning (ML)-based protocol for selecting highly relevant papers, extracting important experimental data, and analyzing research trends and directions, focusing on the field of CO2 reduction reactions (CO2RRs).
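The two F1-scores quoted in the summary differ only in how a predicted entity is matched against the annotated one. The sketch below illustrates that difference on a toy sentence. It is not the authors' evaluation code: the record does not define the boundary relaxation scheme, so the sketch assumes that a relaxed match means "same label and overlapping token span", while a strict match requires identical boundaries. The BIO tag names (`catalyst`, `FE`) and the helper names `extract_spans` and `f1` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def f1(gold: Set[Span], pred: Set[Span], relaxed: bool = False) -> float:
    """Entity-level F1. Assumption: relaxed = same label + overlapping span."""

    def match(a: Span, b: Span) -> bool:
        if a[2] != b[2]:
            return False
        if relaxed:
            return a[0] < b[1] and b[0] < a[1]  # token ranges overlap
        return a[:2] == b[:2]  # exact boundaries

    hit_pred = sum(any(match(p, g) for g in gold) for p in pred)
    hit_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = hit_pred / len(pred) if pred else 0.0
    recall = hit_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the prediction clips the first token of the catalyst name.
gold_tags = ["B-catalyst", "I-catalyst", "O", "B-FE", "O"]
pred_tags = ["O", "B-catalyst", "O", "B-FE", "O"]
gold, pred = extract_spans(gold_tags), extract_spans(pred_tags)

print("strict F1: ", f1(gold, pred))
print("relaxed F1:", f1(gold, pred, relaxed=True))
```

Under these assumptions the clipped catalyst span counts as an error in the strict scheme but as a hit in the relaxed one, which is the kind of gap a relaxed score is meant to expose.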
ISSN: 2050-7488, 2050-7496
DOI: 10.1039/d3ta02780e