A multiple stream architecture for the recognition of signs in Brazilian sign language in the context of health
Published in: | Multimedia tools and applications 2024-02, Vol.83 (7), p.19767-19785 |
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Summary: | Deaf people communicate naturally through sign languages and often face barriers to communicating with hearing people and accessing information in written languages. These communication difficulties are aggravated in the health domain, especially in a hospital emergency, when human sign language interpreters are unavailable. This paper proposes a solution for automatically recognizing signs in Brazilian Sign Language (Libras) in the health context to reduce this problem. The idea is that the system could in the future assist communication between a Deaf patient and their doctor. Our solution involves a multiple-stream architecture that combines convolutional and recurrent neural networks, dealing with sign languages’ visual phonemes in individual and specialized ways. The first stream uses the optical flow as input to capture information about the “movement” of the sign; the second stream extracts kinematic and postural features, including “handshapes” and “facial expressions”; and the third stream processes the raw RGB images to capture additional attributes of the sign not covered by the previous streams. Thus, we can process more spatiotemporal features that discriminate the classes during the training stage. The computational results show that the solution can recognize signs in Libras in the health context, with an average accuracy, precision, recall, and F1-score of 99.80%, 99.81%, 99.80%, and 99.80%, respectively. Our system also performed better than other works in the literature, obtaining an average accuracy of 100% on an Argentine Sign Language (LSA) dataset, which is usually used for comparison purposes. |
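The summary describes a late-fusion design: three specialized streams (optical flow, kinematic/postural features, raw RGB) are each encoded separately, and their embeddings are combined for classification. The sketch below illustrates only that fusion pattern with NumPy; the stream encoders are stand-in random projections, and all dimensions, names, and the linear classifier are hypothetical, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding sizes per stream and class count (not from the paper).
FLOW_DIM, POSE_DIM, RGB_DIM, N_CLASSES = 64, 32, 64, 10

def encode_stream(frames, dim):
    """Stand-in for one stream's CNN+RNN encoder: a fixed random
    projection of the temporal mean of the frames, for illustration only."""
    w = rng.standard_normal((frames.shape[-1], dim))
    return frames.mean(axis=0) @ w  # shape: (dim,)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def classify_sign(flow, pose, rgb, w_out):
    # Late fusion: concatenate the three stream embeddings, then apply
    # a linear classifier followed by softmax over sign classes.
    fused = np.concatenate([
        encode_stream(flow, FLOW_DIM),
        encode_stream(pose, POSE_DIM),
        encode_stream(rgb, RGB_DIM),
    ])
    return softmax(fused @ w_out)

# Toy inputs: T frames of flattened per-frame descriptors per stream.
T = 16
flow = rng.standard_normal((T, 128))   # e.g. pooled optical-flow features
pose = rng.standard_normal((T, 50))    # e.g. hand/face keypoint features
rgb = rng.standard_normal((T, 128))    # e.g. pooled RGB appearance features
w_out = rng.standard_normal((FLOW_DIM + POSE_DIM + RGB_DIM, N_CLASSES))

probs = classify_sign(flow, pose, rgb, w_out)
print(probs.shape)  # one probability per sign class
```

In a real system each `encode_stream` would be a trained convolutional/recurrent network, but the fusion step, concatenating per-stream embeddings before a shared classifier, is the same.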
ISSN: | 1380-7501 1573-7721 |
DOI: | 10.1007/s11042-023-16332-7 |