
Attention-Based Modality-Gated Networks for Image-Text Sentiment Analysis

Bibliographic Details
Published in: ACM Transactions on Multimedia Computing, Communications, and Applications, 2020-09, Vol. 16 (3), p. 1-19
Main Authors: Huang, Feiran; Wei, Kaimin; Weng, Jian; Li, Zhoujun
Format: Article
Language: English
Description: Sentiment analysis of social multimedia data has attracted extensive research interest and has been applied to many tasks, such as election prediction and product evaluation. Sentiment analysis of a single modality (e.g., text or image) has been broadly studied; however, far less attention has been paid to the sentiment analysis of multimodal data. Different modalities usually carry complementary information, so it is necessary to learn the overall sentiment by combining the visual content with the text description. In this article, we propose a novel method, Attention-Based Modality-Gated Networks (AMGN), to exploit the correlation between the image and text modalities and extract discriminative features for multimodal sentiment analysis. Specifically, a visual-semantic attention model is proposed to learn attended visual features for each word. To effectively combine the sentiment information from the two modalities, a modality-gated LSTM is proposed to learn multimodal features by adaptively selecting the modality that presents stronger sentiment information. A semantic self-attention model is then proposed to automatically focus on the discriminative features for sentiment classification. Extensive experiments have been conducted on both manually annotated and machine weakly labeled datasets, and the results demonstrate the superiority of our approach in comparison with state-of-the-art models.
DOI: 10.1145/3388861
ISSN: 1551-6857
EISSN: 1551-6865
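As a rough illustration of the modality-gated fusion idea summarized in the description above, the sketch below shows how a per-word gate could blend a word embedding with its attended visual feature before feeding an LSTM. This is not the authors' released implementation; the layer sizes, the sigmoid-gate formulation, and the class name ModalityGatedLSTM are illustrative assumptions, written in PyTorch.

```python
# Minimal sketch (assumptions, not the AMGN code) of a modality-gated LSTM step:
# a learned scalar gate per word decides how much of the textual versus the
# attended visual feature enters the recurrent layer, so the modality carrying
# the stronger sentiment signal can dominate that time step.
import torch
import torch.nn as nn


class ModalityGatedLSTM(nn.Module):
    def __init__(self, text_dim=300, visual_dim=2048, hidden_dim=512):
        super().__init__()
        # Project image features into the word-embedding space so they can be mixed.
        self.visual_proj = nn.Linear(visual_dim, text_dim)
        # One gate value per time step, computed from both modalities.
        self.gate = nn.Linear(2 * text_dim, 1)
        self.lstm = nn.LSTM(text_dim, hidden_dim, batch_first=True)

    def forward(self, word_emb, attended_visual):
        # word_emb:        (batch, seq_len, text_dim)   word embeddings
        # attended_visual: (batch, seq_len, visual_dim) per-word attended visual features
        v = self.visual_proj(attended_visual)                           # (batch, seq_len, text_dim)
        g = torch.sigmoid(self.gate(torch.cat([word_emb, v], dim=-1)))  # (batch, seq_len, 1)
        fused = g * word_emb + (1.0 - g) * v                            # gate picks the dominant modality
        outputs, _ = self.lstm(fused)
        return outputs                                                  # (batch, seq_len, hidden_dim)


# Example usage with random tensors standing in for real features.
if __name__ == "__main__":
    model = ModalityGatedLSTM()
    words = torch.randn(4, 20, 300)    # 4 posts, 20 words each
    visual = torch.randn(4, 20, 2048)  # one attended visual feature per word
    print(model(words, visual).shape)  # torch.Size([4, 20, 512])
```

A gate value near 1 lets the word embedding drive the step, while a value near 0 hands it to the visual feature, mirroring the description's notion of adaptively selecting the modality that presents stronger sentiment information.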