Automatic Information Extraction in the Third-Generation Semiconductor Materials Domain Based on DKNet and MANet

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, p. 29367-29376
Main Authors: Jiang, Xiaobo; He, Kun; Yang, Borui
Format: Article
Language: English
Description
Summary: Third-generation semiconductor materials (TGSMs) are a frontier scientific domain in which researchers must consult extensive literature for entity information on materials, devices, preparation methods, and experimental performance, and sort out the complex relations among them. However, the explosion of relevant papers has far exceeded researchers' reading capacity. In this article, automatic information extraction for the TGSM field is carried out using entity recognition (ER) and relation extraction (RE) techniques. First, corpora for ER and RE in this field are created. Second, to address the complexity of the entities, a neural network using domain knowledge (DKNet) is proposed to improve ER performance. It uses the keyword sequence of each entity type as prior knowledge, adds a dedicated embedding to encode entity categories, and then combines the prior knowledge and encoded vectors with the context through a gated information fusion module to assist recognition. To address the dependence of entity relations on indicative words, a multi-aspect attention-based network model (MANet) is proposed to strengthen attention to relation-indicative words, thereby improving RE performance. Finally, F1 scores of 74.5 and 85.9 were achieved on the created ER and RE test sets, respectively, outperforming other advanced models by 3.4 to 10.1, which is the best performance for automatic information extraction in the TGSM field.
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3159338
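
The summary mentions a gated information fusion module that combines prior knowledge with the context, but this record does not give its equations. As a rough illustration only, the sketch below assumes a common form of gating: a sigmoid gate computed from both vectors, followed by an element-wise blend. The function and parameter names (`gated_fusion`, `w_context`, `w_prior`, `bias`) are hypothetical and are not taken from the paper, and the actual DKNet module may differ.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gated_fusion(context, prior, w_context, w_prior, bias):
    """Blend a context vector with a prior-knowledge vector (assumed form).

    gate  = sigmoid(w_context @ context + w_prior @ prior + bias)
    fused = gate * context + (1 - gate) * prior   (element-wise)

    All three vectors must share one dimension d; both weight
    matrices are d x d. Returns the fused vector and the gate.
    """
    from_context = matvec(w_context, context)
    from_prior = matvec(w_prior, prior)
    gate = [
        sigmoid(c + p + b)
        for c, p, b in zip(from_context, from_prior, bias)
    ]
    fused = [
        g * c + (1.0 - g) * p
        for g, c, p in zip(gate, context, prior)
    ]
    return fused, gate


if __name__ == "__main__":
    # Toy 2-dimensional example with hand-picked (not learned) weights.
    context_vec = [1.0, 3.0]
    prior_vec = [3.0, 1.0]
    zero_weights = [[0.0, 0.0], [0.0, 0.0]]
    fused_vec, gate_vec = gated_fusion(
        context_vec, prior_vec, zero_weights, zero_weights, [0.0, 0.0]
    )
    print("gate:", gate_vec)
    print("fused:", fused_vec)
```

In this assumed form, a gate value near 1 keeps the contextual feature and a value near 0 keeps the prior-knowledge feature, so in a trained model the weights would decide, per dimension, how much each source contributes.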