
Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection

Bibliographic Details
Published in: IEEE transactions on image processing, 2024-01, Vol. 33, p. 1-1
Main Authors: Han, Mingfei, Wang, Yali, Li, Mingjie, Chang, Xiaojun, Yang, Yi, Qiao, Yu
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823
container_end_page 1
container_issue
container_start_page 1
container_title IEEE transactions on image processing
container_volume 33
creator Han, Mingfei
Wang, Yali
Li, Mingjie
Chang, Xiaojun
Yang, Yi
Qiao, Yu
description In this paper, we focus on the weakly supervised video object detection problem, where each training video is tagged only with object labels, without any bounding-box annotations. To effectively train object detectors from such weakly annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework that exploits discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme that is explicitly guided by video tags. By selecting object-relevant frames and mining important proposals from these frames, MLS reduces frame redundancy and improves proposal effectiveness, which boosts weakly supervised detectors. Second, we develop a Holistic-View Refinement (HVR) scheme that evaluates the importance of proposals globally across frames and can thus refine the pseudo ground-truth boxes used to train video detectors in a self-supervised manner more accurately. Finally, we evaluate PFPM on ImageNet VID, a large-scale video object detection benchmark, under the weak-annotation setting. The experimental results show that PFPM significantly outperforms state-of-the-art weakly supervised detectors.
doi_str_mv 10.1109/TIP.2024.3364536
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10438399</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10438399</ieee_id><sourcerecordid>2928242838</sourcerecordid><originalsourceid>FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</originalsourceid><addsrcrecordid>eNpdkM9LwzAUx4Mobk7vHkQKXrx05ufaHGU6FaYbOPVY0vR1ZHbNTNrB_nszNkU8vcfL5315-SB0TnCfECxvZk_TPsWU9xkbcMEGB6hLJCcxxpwehh6LJE4Ilx104v0CY8IFGRyjDkuZSNOEd9HL1Nm5A-_NGqKRU0uIw2RlvaqiZ1Obeh6V1kUfoD6rTfTarsCtjYciejcF2GiSL0A30R00oRhbn6KjUlUezva1h95G97PhYzyePDwNb8exZpg0seSMcJ2kJSm0zssEWCG4UEqBVIBZOLnI9UAXOsFaQEoE4UzyUoS3PGcpZT10vctdOfvVgm-ypfEaqkrVYFufUUlTymkaPtpDV__QhW1dHa4LFKMiETjBgcI7SjvrvYMyWzmzVG6TEZxtXWfBdbZ1ne1dh5XLfXCbL6H4XfiRG4CLHWAA4E8eD4SU7BuCAII3</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2932575070</pqid></control><display><type>article</type><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</creator><creatorcontrib>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</creatorcontrib><description>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. 
By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</description><identifier>ISSN: 1057-7149</identifier><identifier>EISSN: 1941-0042</identifier><identifier>DOI: 10.1109/TIP.2024.3364536</identifier><identifier>PMID: 38358874</identifier><identifier>CODEN: IIPRE4</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Annotations ; Benchmark testing ; Detectors ; Educational films ; Frames (data processing) ; Holistic-View Refinement ; Object detection ; Object recognition ; Proposals ; Redundancy ; Sensors ; Task analysis ; Training ; Video Object Detection ; Weakly Supervised Learning</subject><ispartof>IEEE transactions on image processing, 2024-01, Vol.33, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</cites><orcidid>0000-0002-0512-880X ; 0000-0001-6096-9858 ; 0000-0002-1889-2567 ; 0000-0002-2999-7428 ; 0000-0002-7778-8807</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10438399$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,786,790,27957,27958,55147</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38358874$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Han, Mingfei</creatorcontrib><creatorcontrib>Wang, Yali</creatorcontrib><creatorcontrib>Li, Mingjie</creatorcontrib><creatorcontrib>Chang, Xiaojun</creatorcontrib><creatorcontrib>Yang, Yi</creatorcontrib><creatorcontrib>Qiao, Yu</creatorcontrib><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><title>IEEE transactions on image processing</title><addtitle>TIP</addtitle><addtitle>IEEE Trans Image Process</addtitle><description>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. 
Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</description><subject>Annotations</subject><subject>Benchmark testing</subject><subject>Detectors</subject><subject>Educational films</subject><subject>Frames (data processing)</subject><subject>Holistic-View Refinement</subject><subject>Object detection</subject><subject>Object recognition</subject><subject>Proposals</subject><subject>Redundancy</subject><subject>Sensors</subject><subject>Task analysis</subject><subject>Training</subject><subject>Video Object Detection</subject><subject>Weakly Supervised Learning</subject><issn>1057-7149</issn><issn>1941-0042</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpdkM9LwzAUx4Mobk7vHkQKXrx05ufaHGU6FaYbOPVY0vR1ZHbNTNrB_nszNkU8vcfL5315-SB0TnCfECxvZk_TPsWU9xkbcMEGB6hLJCcxxpwehh6LJE4Ilx104v0CY8IFGRyjDkuZSNOEd9HL1Nm5A-_NGqKRU0uIw2RlvaqiZ1Obeh6V1kUfoD6rTfTarsCtjYciejcF2GiSL0A30R00oRhbn6KjUlUezva1h95G97PhYzyePDwNb8exZpg0seSMcJ2kJSm0zssEWCG4UEqBVIBZOLnI9UAXOsFaQEoE4UzyUoS3PGcpZT10vctdOfvVgm-ypfEaqkrVYFufUUlTymkaPtpDV__QhW1dHa4LFKMiETjBgcI7SjvrvYMyWzmzVG6TEZxtXWfBdbZ1ne1dh5XLfXCbL6H4XfiRG4CLHWAA4E8eD4SU7BuCAII3</recordid><startdate>20240101</startdate><enddate>20240101</enddate><creator>Han, Mingfei</creator><creator>Wang, Yali</creator><creator>Li, Mingjie</creator><creator>Chang, Xiaojun</creator><creator>Yang, Yi</creator><creator>Qiao, Yu</creator><general>IEEE</general><general>The Institute of Electrical and Electronics 
Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-0512-880X</orcidid><orcidid>https://orcid.org/0000-0001-6096-9858</orcidid><orcidid>https://orcid.org/0000-0002-1889-2567</orcidid><orcidid>https://orcid.org/0000-0002-2999-7428</orcidid><orcidid>https://orcid.org/0000-0002-7778-8807</orcidid></search><sort><creationdate>20240101</creationdate><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><author>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Annotations</topic><topic>Benchmark testing</topic><topic>Detectors</topic><topic>Educational films</topic><topic>Frames (data processing)</topic><topic>Holistic-View Refinement</topic><topic>Object detection</topic><topic>Object recognition</topic><topic>Proposals</topic><topic>Redundancy</topic><topic>Sensors</topic><topic>Task analysis</topic><topic>Training</topic><topic>Video Object Detection</topic><topic>Weakly Supervised Learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Mingfei</creatorcontrib><creatorcontrib>Wang, Yali</creatorcontrib><creatorcontrib>Li, Mingjie</creatorcontrib><creatorcontrib>Chang, Xiaojun</creatorcontrib><creatorcontrib>Yang, Yi</creatorcontrib><creatorcontrib>Qiao, Yu</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package 
(ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on image processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Mingfei</au><au>Wang, Yali</au><au>Li, Mingjie</au><au>Chang, Xiaojun</au><au>Yang, Yi</au><au>Qiao, Yu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</atitle><jtitle>IEEE transactions on image processing</jtitle><stitle>TIP</stitle><addtitle>IEEE Trans Image Process</addtitle><date>2024-01-01</date><risdate>2024</risdate><volume>33</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>1057-7149</issn><eissn>1941-0042</eissn><coden>IIPRE4</coden><notes>ObjectType-Article-1</notes><notes>SourceType-Scholarly Journals-1</notes><notes>ObjectType-Feature-2</notes><notes>content type line 23</notes><abstract>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. 
First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>38358874</pmid><doi>10.1109/TIP.2024.3364536</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-0512-880X</orcidid><orcidid>https://orcid.org/0000-0001-6096-9858</orcidid><orcidid>https://orcid.org/0000-0002-1889-2567</orcidid><orcidid>https://orcid.org/0000-0002-2999-7428</orcidid><orcidid>https://orcid.org/0000-0002-7778-8807</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1057-7149
ispartof IEEE transactions on image processing, 2024-01, Vol.33, p.1-1
issn 1057-7149
1941-0042
language eng
recordid cdi_ieee_primary_10438399
source IEEE Electronic Library (IEL) Journals
subjects Annotations
Benchmark testing
Detectors
Educational films
Frames (data processing)
Holistic-View Refinement
Object detection
Object recognition
Proposals
Redundancy
Sensors
Task analysis
Training
Video Object Detection
Weakly Supervised Learning
title Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-30T00%3A27%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Progressive%20Frame-Proposal%20Mining%20for%20Weakly%20Supervised%20Video%20Object%20Detection&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Han,%20Mingfei&rft.date=2024-01-01&rft.volume=33&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2024.3364536&rft_dat=%3Cproquest_ieee_%3E2928242838%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2932575070&rft_id=info:pmid/38358874&rft_ieee_id=10438399&rfr_iscdi=true
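The abstract above describes PFPM only at a high level: tag-guided selection of object-relevant frames, mining of important proposals within those frames, and a holistic, cross-frame evaluation before pseudo ground-truth boxes are accepted. The following is a minimal, hypothetical Python sketch of that general coarse-to-fine idea, not the authors' MLS or HVR implementation. The data structures, the max-score frame ranking, the relative-score acceptance rule, and every name (`Proposal`, `select_frames`, `mine_proposals`, `holistic_pseudo_labels`, `pseudo_ground_truth`, `top_k`, `top_m`, `rel_thresh`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Proposal:
    box: Box
    scores: Dict[str, float]  # hypothetical per-class scores from a weak detector


def select_frames(frames: List[List[Proposal]], tag: str, top_k: int) -> List[int]:
    """Coarse step (assumed): rank frames by their strongest proposal for `tag`; keep top_k."""
    ranked = sorted(
        ((max((p.scores.get(tag, 0.0) for p in props), default=0.0), idx)
         for idx, props in enumerate(frames)),
        reverse=True,
    )
    return sorted(idx for _, idx in ranked[:top_k])


def mine_proposals(frames: List[List[Proposal]], frame_ids: List[int],
                   tag: str, top_m: int) -> Dict[int, List[Proposal]]:
    """Fine step (assumed): inside each kept frame, keep the top_m proposals for `tag`."""
    return {
        idx: sorted(frames[idx], key=lambda p: p.scores.get(tag, 0.0), reverse=True)[:top_m]
        for idx in frame_ids
    }


def holistic_pseudo_labels(mined: Dict[int, List[Proposal]], tag: str,
                           rel_thresh: float) -> Dict[int, Box]:
    """Video-wide step (assumed): accept a frame's best proposal as a pseudo box only if
    its score is at least rel_thresh times the best score across all kept frames."""
    best = max((p.scores.get(tag, 0.0) for props in mined.values() for p in props),
               default=0.0)
    if best <= 0.0:
        return {}
    labels: Dict[int, Box] = {}
    for idx, props in mined.items():
        # props is sorted in descending score order, so props[0] is the frame's best.
        if props and props[0].scores.get(tag, 0.0) / best >= rel_thresh:
            labels[idx] = props[0].box
    return labels


def pseudo_ground_truth(frames: List[List[Proposal]], video_tags: Sequence[str],
                        top_k: int = 2, top_m: int = 3,
                        rel_thresh: float = 0.5) -> Dict[str, Dict[int, Box]]:
    """Run the three hypothetical steps for every tag attached to the video."""
    result: Dict[str, Dict[int, Box]] = {}
    for tag in video_tags:
        frame_ids = select_frames(frames, tag, top_k)
        mined = mine_proposals(frames, frame_ids, tag, top_m)
        result[tag] = holistic_pseudo_labels(mined, tag, rel_thresh)
    return result


if __name__ == "__main__":
    toy_video = [
        [Proposal((0.0, 0.0, 10.0, 10.0), {"cat": 0.9}),
         Proposal((5.0, 5.0, 20.0, 20.0), {"cat": 0.2})],
        [Proposal((1.0, 1.0, 11.0, 11.0), {"cat": 0.3})],
        [Proposal((2.0, 2.0, 12.0, 12.0), {"cat": 0.8})],
    ]
    print(pseudo_ground_truth(toy_video, ["cat"]))
    # {'cat': {0: (0.0, 0.0, 10.0, 10.0), 2: (2.0, 2.0, 12.0, 12.0)}}
```

In this sketch the acceptance of a frame's box depends on all kept frames of the video rather than on that frame alone, which is one simple way to express a "holistic view"; with `top_k=3` the weak middle frame would be selected but still rejected, because 0.3 is below half of the video-wide best score of 0.9. The actual HVR scheme in the paper may compute proposal importance quite differently.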