Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection
In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Propo...
Published in: | IEEE transactions on image processing, 2024-01, Vol.33, p.1-1 |
---|---|
Main Authors: | Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu |
Format: | Article |
Language: | English |
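Subjects: | Annotations ; Benchmark testing ; Detectors ; Educational films ; Frames (data processing) ; Holistic-View Refinement ; Object detection ; Object recognition ; Proposals ; Redundancy ; Sensors ; Task analysis ; Training ; Video Object Detection ; Weakly Supervised Learning |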
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823 |
container_end_page | 1 |
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on image processing |
container_volume | 33 |
creator | Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu |
description | In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors. |
doi_str_mv | 10.1109/TIP.2024.3364536 |
format | article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_10438399</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10438399</ieee_id><sourcerecordid>2928242838</sourcerecordid><originalsourceid>FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</originalsourceid><addsrcrecordid>eNpdkM9LwzAUx4Mobk7vHkQKXrx05ufaHGU6FaYbOPVY0vR1ZHbNTNrB_nszNkU8vcfL5315-SB0TnCfECxvZk_TPsWU9xkbcMEGB6hLJCcxxpwehh6LJE4Ilx104v0CY8IFGRyjDkuZSNOEd9HL1Nm5A-_NGqKRU0uIw2RlvaqiZ1Obeh6V1kUfoD6rTfTarsCtjYciejcF2GiSL0A30R00oRhbn6KjUlUezva1h95G97PhYzyePDwNb8exZpg0seSMcJ2kJSm0zssEWCG4UEqBVIBZOLnI9UAXOsFaQEoE4UzyUoS3PGcpZT10vctdOfvVgm-ypfEaqkrVYFufUUlTymkaPtpDV__QhW1dHa4LFKMiETjBgcI7SjvrvYMyWzmzVG6TEZxtXWfBdbZ1ne1dh5XLfXCbL6H4XfiRG4CLHWAA4E8eD4SU7BuCAII3</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2932575070</pqid></control><display><type>article</type><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</creator><creatorcontrib>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</creatorcontrib><description>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. 
By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</description><identifier>ISSN: 1057-7149</identifier><identifier>EISSN: 1941-0042</identifier><identifier>DOI: 10.1109/TIP.2024.3364536</identifier><identifier>PMID: 38358874</identifier><identifier>CODEN: IIPRE4</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Annotations ; Benchmark testing ; Detectors ; Educational films ; Frames (data processing) ; Holistic-View Refinement ; Object detection ; Object recognition ; Proposals ; Redundancy ; Sensors ; Task analysis ; Training ; Video Object Detection ; Weakly Supervised Learning</subject><ispartof>IEEE transactions on image processing, 2024-01, Vol.33, p.1-1</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</cites><orcidid>0000-0002-0512-880X ; 0000-0001-6096-9858 ; 0000-0002-1889-2567 ; 0000-0002-2999-7428 ; 0000-0002-7778-8807</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10438399$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>315,786,790,27957,27958,55147</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/38358874$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Han, Mingfei</creatorcontrib><creatorcontrib>Wang, Yali</creatorcontrib><creatorcontrib>Li, Mingjie</creatorcontrib><creatorcontrib>Chang, Xiaojun</creatorcontrib><creatorcontrib>Yang, Yi</creatorcontrib><creatorcontrib>Qiao, Yu</creatorcontrib><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><title>IEEE transactions on image processing</title><addtitle>TIP</addtitle><addtitle>IEEE Trans Image Process</addtitle><description>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. 
Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</description><subject>Annotations</subject><subject>Benchmark testing</subject><subject>Detectors</subject><subject>Educational films</subject><subject>Frames (data processing)</subject><subject>Holistic-View Refinement</subject><subject>Object detection</subject><subject>Object recognition</subject><subject>Proposals</subject><subject>Redundancy</subject><subject>Sensors</subject><subject>Task analysis</subject><subject>Training</subject><subject>Video Object Detection</subject><subject>Weakly Supervised Learning</subject><issn>1057-7149</issn><issn>1941-0042</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpdkM9LwzAUx4Mobk7vHkQKXrx05ufaHGU6FaYbOPVY0vR1ZHbNTNrB_nszNkU8vcfL5315-SB0TnCfECxvZk_TPsWU9xkbcMEGB6hLJCcxxpwehh6LJE4Ilx104v0CY8IFGRyjDkuZSNOEd9HL1Nm5A-_NGqKRU0uIw2RlvaqiZ1Obeh6V1kUfoD6rTfTarsCtjYciejcF2GiSL0A30R00oRhbn6KjUlUezva1h95G97PhYzyePDwNb8exZpg0seSMcJ2kJSm0zssEWCG4UEqBVIBZOLnI9UAXOsFaQEoE4UzyUoS3PGcpZT10vctdOfvVgm-ypfEaqkrVYFufUUlTymkaPtpDV__QhW1dHa4LFKMiETjBgcI7SjvrvYMyWzmzVG6TEZxtXWfBdbZ1ne1dh5XLfXCbL6H4XfiRG4CLHWAA4E8eD4SU7BuCAII3</recordid><startdate>20240101</startdate><enddate>20240101</enddate><creator>Han, Mingfei</creator><creator>Wang, Yali</creator><creator>Li, Mingjie</creator><creator>Chang, Xiaojun</creator><creator>Yang, Yi</creator><creator>Qiao, Yu</creator><general>IEEE</general><general>The Institute of Electrical and Electronics 
Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-0512-880X</orcidid><orcidid>https://orcid.org/0000-0001-6096-9858</orcidid><orcidid>https://orcid.org/0000-0002-1889-2567</orcidid><orcidid>https://orcid.org/0000-0002-2999-7428</orcidid><orcidid>https://orcid.org/0000-0002-7778-8807</orcidid></search><sort><creationdate>20240101</creationdate><title>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</title><author>Han, Mingfei ; Wang, Yali ; Li, Mingjie ; Chang, Xiaojun ; Yang, Yi ; Qiao, Yu</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Annotations</topic><topic>Benchmark testing</topic><topic>Detectors</topic><topic>Educational films</topic><topic>Frames (data processing)</topic><topic>Holistic-View Refinement</topic><topic>Object detection</topic><topic>Object recognition</topic><topic>Proposals</topic><topic>Redundancy</topic><topic>Sensors</topic><topic>Task analysis</topic><topic>Training</topic><topic>Video Object Detection</topic><topic>Weakly Supervised Learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Han, Mingfei</creatorcontrib><creatorcontrib>Wang, Yali</creatorcontrib><creatorcontrib>Li, Mingjie</creatorcontrib><creatorcontrib>Chang, Xiaojun</creatorcontrib><creatorcontrib>Yang, Yi</creatorcontrib><creatorcontrib>Qiao, Yu</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package 
(ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on image processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Han, Mingfei</au><au>Wang, Yali</au><au>Li, Mingjie</au><au>Chang, Xiaojun</au><au>Yang, Yi</au><au>Qiao, Yu</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection</atitle><jtitle>IEEE transactions on image processing</jtitle><stitle>TIP</stitle><addtitle>IEEE Trans Image Process</addtitle><date>2024-01-01</date><risdate>2024</risdate><volume>33</volume><spage>1</spage><epage>1</epage><pages>1-1</pages><issn>1057-7149</issn><eissn>1941-0042</eissn><coden>IIPRE4</coden><notes>ObjectType-Article-1</notes><notes>SourceType-Scholarly Journals-1</notes><notes>ObjectType-Feature-2</notes><notes>content type line 23</notes><abstract>In this paper, we focus on the weakly supervised video object detection problem, where each training video is only tagged with object labels, without any bounding box annotations of objects. To effectively train object detectors from such weakly-annotated videos, we propose a Progressive Frame-Proposal Mining (PFPM) framework by exploiting discriminative proposals in a coarse-to-fine manner. 
First, we design a flexible Multi-Level Selection (MLS) scheme, with explicit guidance of video tags. By selecting object-relevant frames and mining important proposals from these frames, the proposed MLS can effectively reduce frame redundancy as well as improve proposal effectiveness to boost weakly-supervised detectors. Moreover, we develop a novel Holistic-View Refinement (HVR) scheme, which can globally evaluate importance of proposals among frames, and thus correctly refine pseudo ground truth boxes for training video detectors in a self-supervised manner. Finally, we evaluate the proposed PFPM on a large-scale benchmark for video object detection, on ImageNet VID, under the setting of weak annotations. The experimental results demonstrate that our PFPM significantly outperforms the state-of-the-art weakly-supervised detectors.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>38358874</pmid><doi>10.1109/TIP.2024.3364536</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-0512-880X</orcidid><orcidid>https://orcid.org/0000-0001-6096-9858</orcidid><orcidid>https://orcid.org/0000-0002-1889-2567</orcidid><orcidid>https://orcid.org/0000-0002-2999-7428</orcidid><orcidid>https://orcid.org/0000-0002-7778-8807</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1057-7149 |
ispartof | IEEE transactions on image processing, 2024-01, Vol.33, p.1-1 |
issn | 1057-7149 ; 1941-0042 |
language | eng |
recordid | cdi_ieee_primary_10438399 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Annotations ; Benchmark testing ; Detectors ; Educational films ; Frames (data processing) ; Holistic-View Refinement ; Object detection ; Object recognition ; Proposals ; Redundancy ; Sensors ; Task analysis ; Training ; Video Object Detection ; Weakly Supervised Learning |
title | Progressive Frame-Proposal Mining for Weakly Supervised Video Object Detection |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-09-30T00%3A27%3A20IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Progressive%20Frame-Proposal%20Mining%20for%20Weakly%20Supervised%20Video%20Object%20Detection&rft.jtitle=IEEE%20transactions%20on%20image%20processing&rft.au=Han,%20Mingfei&rft.date=2024-01-01&rft.volume=33&rft.spage=1&rft.epage=1&rft.pages=1-1&rft.issn=1057-7149&rft.eissn=1941-0042&rft.coden=IIPRE4&rft_id=info:doi/10.1109/TIP.2024.3364536&rft_dat=%3Cproquest_ieee_%3E2928242838%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c301t-94314c78f1dccbf7e3d545aaae9ae03714dbc6cdc70c5e81514394f5e03bb3823%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2932575070&rft_id=info:pmid/38358874&rft_ieee_id=10438399&rfr_iscdi=true |