
Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection

Bibliographic Details
Published in:	IEEE transactions on image processing 2024-01, Vol.33, p.1-1
Main Authors:	Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu
Format:	Article
Language:	English
Subjects:	Annotations ; Benchmark testing ; Detectors ; Educational films ; Frames (data processing) ; Holistic-View Refinement ; Object detection ; Object recognition ; Proposals ; Redundancy ; Sensors ; Task analysis ; Training ; Video Object Detection ; Weakly Supervised Learning

cited_by
cites	cdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823
container_end_page	1
container_issue
container_start_page	1
container_title	IEEE transactions on image processing
container_volume	33
creator	Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu
description	In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.
doi_str_mv	10.1109/TIP.2024.3364536
format	article
fullrecord	<record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10438399</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10438399</ieee_id><sourcerecordid>2928242838</sourcerecordid><originalsourceid>FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</originalsourceid><addsrcrecordid>eNpdkM9LwzAUx4Mobk7vHkQKXrx05ufaHGU6FaYbOPVY0vR1ZHbNTNrB_nszNkU8vcfL5315-SB0TnCfECxvZk_TPsWU9xkbcMEGB6hLJCcxxpwehh6LJE4Ilx104v0CY8IFGRyjDkuZSNOEd9HL1Nm5A-_NGqKRU0uIw2RlvaqiZ1Obeh6V1kUfoD6rTfTarsCtjYciejcF2GiSL0A30R00oRhbn6KjUlUezva1h95G97PhYzyePDwNb8exZpg0seSMcJ2kJSm0zssEWCG4UEqBVIBZOLnI9UAXOsFaQEoE4UzyUoS3PGcpZT10vctdOfvVgm-ypfEaqkrVYFufUUlTymkaPtpDV__QhW1dHa4LFKMiETjBgcI7SjvrvYMyWzmzVG6TEZxtXWfBdbZ1ne1dh5XLfXCbL6H4XfiRG4CLHWAA4E8eD4SU7BuCAII3</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2932575070</pqid></control><display><type>article</type><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</creator><creatorcontrib>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</creatorcontrib><description>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. 
By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</description><identifier>ISSN: 1057-7149</identifier><identifier>EISSN: 1941-0042</identifier><identifier>DOI: 10.1109/TIP.2024.3364536</identifier><identifier>PMID: 38358874</identifier><identifier>CODEN: IIPRE4</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Annotations ; Benchmark testing ; Detectors ; Educational films ; Frames (data processing) ; Holistic-View Refinement ; Object detection ; Object recognition ; Proposals ; Redundancy ; Sensors ; Task analysis ; Training ; Video Object Detection ; Weakly Supervised Learning</subject><ispartof>IEEE transactions on image processing, 2024-01, Vol.33, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</cites><orcidid>0000-0002-0512-880X ; 0000-0001-6096-9858 ; 0000-0002-1889-2567 ; 0000-0002-2999-7428 ; 0000-0002-7778-8807</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10438399$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,786,790,27957,27958,55147</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38358874$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Han, Mingfei</creatorcontrib><creatorcontrib>Wang, Yali</creatorcontrib><creatorcontrib>Li, Mingjie</creatorcontrib><creatorcontrib>Chang, Xiaojun</creatorcontrib><creatorcontrib>Yang, Yi</creatorcontrib><creatorcontrib>Qiao, Yu</creatorcontrib><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><title>IEEE transactions on image processing</title><addtitle>TIP</addtitle><addtitle>IEEE Trans Image Process</addtitle><description>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. 
Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</description><subject>Annotations</subject><subject>Benchmark testing</subject><subject>Detectors</subject><subject>Educational films</subject><subject>Frames (data processing)</subject><subject>Holistic-View Refinement</subject><subject>Object detection</subject><subject>Object recognition</subject><subject>Proposals</subject><subject>Redundancy</subject><subject>Sensors</subject><subject>Task analysis</subject><subject>Training</subject><subject>Video Object Detection</subject><subject>Weakly Supervised Learning</subject><issn>1057-7149</issn><issn>1941-0042</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpdkM9LwzAUx4Mobk7vHkQKXrx05ufaHGU6FaYbOPVY0vR1ZHbNTNrB_nszNkU8vcfL5315-SB0TnCfECxvZk_TPsWU9xkbcMEGB6hLJCcxxpwehh6LJE4Ilx104v0CY8IFGRyjDkuZSNOEd9HL1Nm5A-_NGqKRU0uIw2RlvaqiZ1Obeh6V1kUfoD6rTfTarsCtjYciejcF2GiSL0A30R00oRhbn6KjUlUezva1h95G97PhYzyePDwNb8exZpg0seSMcJ2kJSm0zssEWCG4UEqBVIBZOLnI9UAXOsFaQEoE4UzyUoS3PGcpZT10vctdOfvVgm-ypfEaqkrVYFufUUlTymkaPtpDV__QhW1dHa4LFKMiETjBgcI7SjvrvYMyWzmzVG6TEZxtXWfBdbZ1ne1dh5XLfXCbL6H4XfiRG4CLHWAA4E8eD4SU7BuCAII3</recordid><startdate>20240101</startdate><enddate>20240101</enddate><creator>Han, Mingfei</creator><creator>Wang, Yali</creator><creator>Li, Mingjie</creator><creator>Chang, Xiaojun</creator><creator>Yang, Yi</creator><creator>Qiao, Yu</creator><general>IEEE</general><general>The Institute of Electrical and Electronics 
Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-0512-880X</orcidid><orcidid>https://orcid.org/0000-0001-6096-9858</orcidid><orcidid>https://orcid.org/0000-0002-1889-2567</orcidid><orcidid>https://orcid.org/0000-0002-2999-7428</orcidid><orcidid>https://orcid.org/0000-0002-7778-8807</orcidid></search><sort><creationdate>20240101</creationdate><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><author>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Annotations</topic><topic>Benchmark testing</topic><topic>Detectors</topic><topic>Educational films</topic><topic>Frames (data processing)</topic><topic>Holistic-View Refinement</topic><topic>Object detection</topic><topic>Object recognition</topic><topic>Proposals</topic><topic>Redundancy</topic><topic>Sensors</topic><topic>Task analysis</topic><topic>Training</topic><topic>Video Object Detection</topic><topic>Weakly Supervised Learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Mingfei</creatorcontrib><creatorcontrib>Wang, Yali</creatorcontrib><creatorcontrib>Li, Mingjie</creatorcontrib><creatorcontrib>Chang, Xiaojun</creatorcontrib><creatorcontrib>Yang, Yi</creatorcontrib><creatorcontrib>Qiao, Yu</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package 
(ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on image processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Mingfei</au><au>Wang, Yali</au><au>Li, Mingjie</au><au>Chang, Xiaojun</au><au>Yang, Yi</au><au>Qiao, Yu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</atitle><jtitle>IEEE transactions on image processing</jtitle><stitle>TIP</stitle><addtitle>IEEE Trans Image Process</addtitle><date>2024-01-01</date><risdate>2024</risdate><volume>33</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>1057-7149</issn><eissn>1941-0042</eissn><coden>IIPRE4</coden><notes>ObjectType-Article-1</notes><notes>SourceType-Scholarly Journals-1</notes><notes>ObjectType-Feature-2</notes><notes>content type line 23</notes><abstract>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. 
First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>38358874</pmid><doi>10.1109/TIP.2024.3364536</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-0512-880X</orcidid><orcidid>https://orcid.org/0000-0001-6096-9858</orcidid><orcidid>https://orcid.org/0000-0002-1889-2567</orcidid><orcidid>https://orcid.org/0000-0002-2999-7428</orcidid><orcidid>https://orcid.org/0000-0002-7778-8807</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 1057-7149
ispartof	IEEE transactions on image processing, 2024-01, Vol.33, p.1-1
issn	1057-7149 1941-0042
language	eng
recordid	cdi_ieee_primary_10438399
source	IEEE Electronic Library (IEL) Journals
subjects	Annotations ; Benchmark testing ; Detectors ; Educational films ; Frames (data processing) ; Holistic-View Refinement ; Object detection ; Object recognition ; Proposals ; Redundancy ; Sensors ; Task analysis ; Training ; Video Object Detection ; Weakly Supervised Learning
title	Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-30T00%3A27%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Progressive%20Frame-Proposal%20Mining%20for%20Weakly%20Supervised%20Video%20Object%20Detection&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Han,%20Mingfei&rft.date=2024-01-01&rft.volume=33&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2024.3364536&rft_dat=%3Cproquest_ieee_%3E2928242838%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2932575070&rft_id=info:pmid/38358874&rft_ieee_id=10438399&rfr_iscdi=true