
Transductive Video Segmentation on Tree-Structured Model

This paper presents a transductive multicomponent video segmentation algorithm, which is capable of segmenting the predefined object of interest in the frames of a video sequence. To ensure temporal consistency, a temporally coherent parametric min-cut algorithm is developed to generate segmentation hypotheses based on visual cues and motion cues. Furthermore, each hypothesis is evaluated by an energy function comprising foreground resemblance, foreground/background divergence, boundary strength, and visual saliency. In particular, the state-of-the-art R-convolutional neural network descriptor is leveraged to encode the visual appearance of the foreground object. Finally, the optimal segmentation of the frame can be attained by assembling the segmentation hypotheses through the Monte Carlo approximation. Moreover, multiple foreground components are built to capture the variances of the foreground object in shapes and poses. To group the frames into different components, a tree-structured graphical model named temporal tree is designed, where visually similar and temporally coherent frames are arranged in branches. The temporal tree can be constructed by iteratively adding frames to the active nodes by probabilistic clustering. In addition, each component, consisting of frames in the same branch, is characterized by a support vector machine classifier, which is learned in a transductive fashion by jointly maximizing the margin over the labeled frames and the unlabeled frames. As the frames from the same video sequence follow the same distribution, the transductive classifiers achieve stronger generalization capability than inductive ones. Experimental results on the public benchmarks demonstrate the effectiveness of the proposed method in comparison with other state-of-the-art supervised and unsupervised video segmentation methods.
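The hypothesis-generation step described above rests on graph min-cut. The sketch below illustrates only the plain min-cut primitive, not the authors' temporally coherent, parametric, motion-aware variant: a minimal Edmonds-Karp max-flow/min-cut on a toy pixel graph, where the node names (`s`, `t`, `a`, `b`) and all capacities are invented for the example.

```python
from collections import deque

def min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow; returns (max_flow, source_side_nodes).
    `capacity` is a dict of dicts: capacity[u][v] = edge capacity."""
    # Build the residual graph, including reverse edges with capacity 0.
    nodes = set(capacity)
    for u in capacity:
        nodes.update(capacity[u])
    residual = {u: {} for u in nodes}
    for u in capacity:
        for v, c in capacity[u].items():
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    flow = 0
    while True:
        # BFS for the shortest augmenting path from source to sink.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Trace the path back and push the bottleneck capacity along it.
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= aug
            residual[v][u] += aug
        flow += aug
    # Nodes still reachable from the source form the source side of the cut,
    # i.e. the "foreground" label in a segmentation graph.
    side = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, c in residual[u].items():
            if c > 0 and v not in side:
                side.add(v)
                queue.append(v)
    return flow, side
```

In a segmentation graph, `s`/`t` would be foreground/background terminals and the remaining nodes pixels or superpixels; the parametric version re-solves this cut while sweeping a parameter on the terminal edge weights to enumerate multiple hypotheses.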

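The temporal tree described in the abstract is grown by iteratively attaching frames to active nodes via probabilistic clustering. As a caricature of that growth rule only (the paper's actual probabilistic model is not reproduced here; the cosine similarity, the `threshold` parameter, and the toy feature vectors are illustrative assumptions), a greedy version might look like this:

```python
import math

def build_temporal_tree(frames, threshold=0.8):
    """Greedily grow a tree of frame clusters: each incoming frame joins
    the most similar active node if similarity exceeds `threshold`,
    otherwise it starts a new branch under that node.
    `frames` is a list of feature vectors (lists of floats)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb) if na and nb else 0.0

    # Each node holds a centroid, its member frame indices, and children.
    root = {'centroid': list(frames[0]), 'members': [0], 'children': []}
    active = [root]
    for i, f in enumerate(frames[1:], start=1):
        best = max(active, key=lambda n: cosine(n['centroid'], f))
        if cosine(best['centroid'], f) >= threshold:
            # Visually similar, temporally coherent: extend this branch.
            best['members'].append(i)
            m = len(best['members'])
            best['centroid'] = [(c * (m - 1) + x) / m
                                for c, x in zip(best['centroid'], f)]
        else:
            # Appearance changed: open a new branch under the closest node.
            child = {'centroid': list(f), 'members': [i], 'children': []}
            best['children'].append(child)
            active.append(child)
    return root
```

Each resulting branch would then supply the frames for one foreground component's transductive SVM classifier.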

Bibliographic Details
Published in: IEEE transactions on circuits and systems for video technology, 2017-05, Vol.27 (5), p.992-1005
Main Authors: Botao Wang, Zhihui Fu, Hongkai Xiong, Zheng, Yuan F.
Format: Article
Language:English
Subjects:
DOI: 10.1109/TCSVT.2016.2527378
ISSN: 1051-8215
EISSN: 1558-2205
Source: IEEE Electronic Library (IEL) Journals
Subjects:
Algorithms
Artificial neural networks
Classifiers
Clustering
Computer simulation
Divergence
Frames
Hypotheses
Image segmentation
Monte Carlo approximation
Motion segmentation
Object segmentation
Optimization
parametric min-cut
Proposals
Robustness
Segmentation
State of the art
Support vector machines
temporal tree
transductive learning
Video data
video segmentation
Video sequences
Visualization