LaMMOn: language model combined graph neural network for multi-target multi-camera tracking in online scenarios

Bibliographic Details
Published in:	Machine Learning, 2024-09, Vol. 113 (9), pp. 6811-6837
Main Authors:	Nguyen, Tuan T.; Nguyen, Hoang H.; Sartipi, Mina; Fisichella, Marco
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Cameras; Clustering; Computer Science; Control; Datasets; Embedding; Graph neural networks; Intelligent transportation systems; Machine Learning; Mechatronics; Modules; Multiple target tracking; Natural Language Processing (NLP); Neural networks; Object linking & embedding; Object recognition; Real time; Robotics; Simulation and Modeling; Tracking; Transportation networks

Description
Summary:	Multi-target multi-camera tracking is crucial to intelligent transportation systems, and numerous recent studies have addressed it. Nevertheless, applying these approaches in real-world situations is challenging because publicly available data are scarce, manually annotating a new dataset is laborious, and a tailored rule-based matching system must be built for each camera scenario. To address this issue, we present LaMMOn, a novel end-to-end multi-camera tracking model based on transformers and graph neural networks. LaMMOn consists of three main modules: (1) Language Model Detection (LMD) for object detection; (2) the Language and Graph Model Association module (LGMA) for object tracking and trajectory clustering; and (3) the Text-to-embedding module (T2E), which overcomes the data limitation by synthesizing object embeddings from defined texts. LaMMOn runs online in real-time scenarios and achieves competitive results on many datasets, e.g., CityFlow (HOTA 76.46%), I24 (HOTA 25.7%), and TrackCUIP (HOTA 80.94%), at frame rates (12.20 to 13.37 FPS) acceptable for online applications.
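The results above are reported as HOTA (Higher Order Tracking Accuracy). As a reading aid, here is a minimal Python sketch of the standard HOTA definition. It is not the authors' evaluation code. It assumes that predictions have already been matched to ground truth at each localization threshold alpha, so that the true-positive, false-negative, and false-positive detection counts and the per-match association scores A(c) are known. The function names `hota_alpha` and `hota` and the toy numbers in the usage note are hypothetical illustrations, not values from the paper.

```python
import math
from typing import Sequence

# The 19 localization thresholds used by HOTA: 0.05, 0.10, ..., 0.95.
ALPHAS = [round(0.05 * k, 2) for k in range(1, 20)]


def hota_alpha(tp_assoc_scores: Sequence[float], num_fn: int, num_fp: int) -> float:
    """HOTA at one threshold alpha (hypothetical helper).

    tp_assoc_scores holds A(c) for each true-positive match c, where
    A(c) = |TPA| / (|TPA| + |FNA| + |FPA|) for that match's trajectory pair.
    Matching at this alpha is assumed to have been done already.
    """
    num_tp = len(tp_assoc_scores)
    denom = num_tp + num_fn + num_fp
    if denom == 0 or num_tp == 0:
        return 0.0  # no true positives means zero detection accuracy
    det_a = num_tp / denom                  # DetA: detection accuracy
    ass_a = sum(tp_assoc_scores) / num_tp   # AssA: mean association accuracy
    return math.sqrt(det_a * ass_a)         # geometric mean of DetA and AssA


def hota(per_alpha: Sequence[float]) -> float:
    """Final HOTA: the mean of HOTA_alpha over all thresholds."""
    return sum(per_alpha) / len(per_alpha)
```

For example, with four toy true positives whose association scores are 1.0, 1.0, 0.5, and 0.5, plus one false negative and one false positive, DetA is 4/6 and AssA is 0.75. Their product is 0.5, so `hota_alpha([1.0, 1.0, 0.5, 0.5], 1, 1)` returns about 0.707. Using a geometric mean means a tracker cannot score well on detection alone or on association alone.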
ISSN:	0885-6125 (print); 1573-0565 (electronic)
DOI:	10.1007/s10994-024-06592-1