
Context-aware 6D pose estimation of known objects using RGB-D data

Bibliographic Details
Published in: Multimedia Tools and Applications, 2024-05, Vol. 83 (17), p. 52973-52987
Main Authors: Kumar, Ankit, Shukla, Priya, Kushwaha, Vandana, Nandi, Gora Chand
Format: Article
Language:English
Description
Summary: In the realm of computer vision and robotics, the pursuit of intelligent robotic grasping and accurate 6D object pose estimation has been a focal point of research. Many modern-world applications, such as robot grasping, manipulation, and palletizing, require the correct pose of objects present in a scene to perform their specific tasks. The estimation of a 6D object pose becomes even more challenging due to inherent complexities, especially when dealing with objects positioned within cluttered scenes and subjected to high levels of occlusion. While prior endeavors have made strides in addressing this issue, their accuracy falls short of the reliability demanded by real-world applications. In this research, we present an architecture that, unlike prior works, incorporates contextual awareness. This novel approach capitalizes on the contextual information attainable about the objects in question. The framework we propose takes a dissection approach, discerning objects by their intrinsic characteristics, namely whether they are symmetric or non-symmetric. Notably, our methodology employs a more profound estimator and refiner network tandem for non-symmetric objects, in contrast to symmetric ones. This distinction acknowledges the inherent dissimilarities between the two object types, thereby enhancing performance. Through experiments conducted on the LineMOD dataset, widely regarded as a benchmark for pose estimation in occluded and cluttered scenes, we demonstrate a notable improvement in accuracy of approximately 3.2% compared to the previous state-of-the-art method, DenseFusion. Moreover, our results indicate that the achieved inference time is sufficient for real-time usage. Overall, our proposed architecture leverages contextual information and tailors the pose estimation process based on object types, leading to enhanced accuracy and real-time performance in challenging scenarios. Code is available at GitHub link
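The core idea the abstract describes — splitting objects by symmetry and routing non-symmetric objects to a deeper estimator/refiner tandem than symmetric ones — can be sketched as a simple dispatch. This is a minimal illustrative sketch, not the authors' actual architecture: the class names, layer counts, and the `symmetry_lookup` table are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of context-aware routing by object symmetry.
# Layer counts and refiner iterations are illustrative assumptions,
# not values from the paper.

from dataclasses import dataclass


@dataclass
class PoseNetwork:
    """Stand-in for an estimator + iterative-refiner tandem."""
    estimator_layers: int
    refiner_iterations: int

    def estimate(self, rgbd_features):
        # A real network would regress rotation + translation from fused
        # RGB-D features, then refine the pose iteratively; here we just
        # report which tandem handled the object.
        return {"layers": self.estimator_layers,
                "refine_steps": self.refiner_iterations}


# Shallower tandem for symmetric objects, deeper one for non-symmetric,
# mirroring the paper's context-aware split.
ROUTER = {
    "symmetric": PoseNetwork(estimator_layers=4, refiner_iterations=2),
    "non_symmetric": PoseNetwork(estimator_layers=8, refiner_iterations=4),
}


def estimate_pose(object_class, rgbd_features, symmetry_lookup):
    """Dispatch to the tandem matching the object's symmetry context."""
    key = "symmetric" if symmetry_lookup[object_class] else "non_symmetric"
    return ROUTER[key].estimate(rgbd_features)


# Example: in LineMOD, 'eggbox' is commonly treated as symmetric and
# 'driller' as non-symmetric.
symmetry = {"eggbox": True, "driller": False}
print(estimate_pose("driller", None, symmetry))
```

The point of the split is that symmetric objects admit many equally valid rotations, so a lighter network with a symmetry-aware loss suffices, while non-symmetric objects benefit from the extra capacity of a deeper estimator and more refinement steps.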
ISSN: 1380-7501
EISSN: 1573-7721
DOI: 10.1007/s11042-023-17524-x