Loading…

A spatio-temporal network for video semantic segmentation in surgical videos

Purpose Semantic segmentation in surgical videos has applications in intra-operative guidance, post-operative analytics and surgical education. Models need to provide accurate predictions since temporally inconsistent identification of anatomy can hinder patient safety. We propose a novel architectu...

Full description

Saved in:
Bibliographic Details
Published in:International journal for computer assisted radiology and surgery 2024-02, Vol.19 (2), p.375-382
Main Authors: Grammatikopoulou, Maria, Sanchez-Matilla, Ricardo, Bragman, Felix, Owen, David, Culshaw, Lucy, Kerr, Karen, Stoyanov, Danail, Luengo, Imanol
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Purpose Semantic segmentation in surgical videos has applications in intra-operative guidance, post-operative analytics and surgical education. Models need to provide accurate predictions since temporally inconsistent identification of anatomy can hinder patient safety. We propose a novel architecture for modelling temporal relationships in videos to address these issues. Methods We developed a temporal segmentation model that includes a static encoder and a spatio-temporal decoder. The encoder processes individual frames whilst the decoder learns spatio-temporal relationships from frame sequences. The decoder can be used with any suitable encoder to improve temporal consistency. Results Model performance was evaluated on the CholecSeg8k dataset and a private dataset of robotic Partial Nephrectomy procedures. Mean Intersection over Union improved by 1.30% and 4.27% respectively for each dataset when the temporal decoder was applied. Our model also displayed improvements in temporal consistency up to 7.23%. Conclusions This work demonstrates an advance in video segmentation of surgical scenes with potential applications in surgery with a view to improve patient outcomes. The proposed decoder can extend state-of-the-art static models, and it is shown that it can improve per-frame segmentation output and video temporal consistency.
ISSN:1861-6429
1861-6410
1861-6429
DOI:10.1007/s11548-023-02971-6