
Learning object class detectors from weakly annotated video

Object detectors are typically trained on a large set of still images annotated by bounding-boxes. This paper introduces an approach for learning object detectors from real-world web videos known only to contain objects of a target class. We propose a fully automatic pipeline that localizes objects in a set of videos of the class and learns a detector for it. The approach extracts candidate spatio-temporal tubes based on motion segmentation and then selects one tube per video jointly over all videos. To compare to the state of the art, we test our detector on still images, i.e., Pascal VOC 2007. We observe that frames extracted from web videos can differ significantly in quality from still images taken by a good camera. Thus, we formulate learning from videos as a domain adaptation task. We show that training from a combination of weakly annotated videos and fully annotated still images using domain adaptation improves the performance of a detector trained from still images alone.
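The "selects one tube per video jointly over all videos" step can be illustrated with a toy sketch. This is not the paper's implementation: the feature vectors, the cosine similarity, and the coordinate-ascent scheme below are all hypothetical stand-ins for the paper's joint selection, chosen only to show the idea of picking mutually consistent candidates across videos.

```python
# Toy sketch (assumption-laden, not the authors' method): jointly select
# one candidate "tube" per video so that the chosen set is mutually similar.
# Each tube is a hypothetical fixed-length feature vector; selection runs
# coordinate ascent on the sum of pairwise cosine similarities.
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def select_tubes(videos, sweeps=5):
    """videos: list of per-video candidate lists (each candidate a feature
    vector). Returns one chosen tube index per video."""
    choice = [0] * len(videos)  # start with the first candidate everywhere
    for _ in range(sweeps):
        for i, tubes in enumerate(videos):
            # Score each candidate of video i against the current choices
            # in all other videos, then keep the best-scoring one.
            def score(t):
                return sum(cosine(t, videos[j][choice[j]])
                           for j in range(len(videos)) if j != i)
            choice[i] = max(range(len(tubes)), key=lambda k: score(tubes[k]))
    return choice
```

In the paper the candidates come from motion segmentation and the joint selection is formulated over appearance and shape cues; here the coordinate ascent merely conveys that each video's pick is revised in the context of every other video's current pick.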

Bibliographic Details
Main Authors: Prest, Alessandro, Leistner, C., Civera, J., Schmid, C., Ferrari, V.
Format: Conference Proceeding
Language: English
DOI: 10.1109/CVPR.2012.6248065
Identifiers: ISSN 1063-6919; ISBN 9781467312264; e-ISBN 9781467312288
Published in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, pp. 3282-3289
Source: IEEE Electronic Library (IEL) Conference Proceedings
Subjects: Detectors; Electron tubes; Hidden Markov models; Image segmentation; Motion segmentation; Tracking; Training