Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception
IEEE Robotics and Automation Letters (2024)
Megvii Technol
Abstract
Long-term temporal fusion is a crucial but often overlooked technique in camera-based Bird's-Eye-View (BEV) 3D perception. Existing methods mostly fuse frames in a parallel manner. While parallel fusion can benefit from long-term information, it suffers from increasing computational and memory overheads as the fusion window size grows. Alternatively, BEVFormer adopts a recurrent fusion pipeline so that history information can be efficiently integrated, yet it fails to benefit from longer temporal frames. In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits of both sides, i.e., rich long-term information and an efficient fusion pipeline. A temporal embedding module is further proposed to improve the model's robustness against occasionally missed frames in practical scenarios. We name this simple but effective fusing pipeline VideoBEV. Experimental results on the nuScenes benchmark show that VideoBEV obtains strong performance on various camera-based 3D perception tasks, including object detection (55.4% mAP and 62.9% NDS), segmentation (48.6% vehicle mIoU), tracking (54.8% AMOTA), and motion prediction (0.80m minADE and 0.463 EPA).
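The contrast between parallel and recurrent fusion can be illustrated with a toy sketch. This is not the VideoBEV implementation: the function names, the sinusoidal form of the temporal embedding, the fixed mixing weights, and the representation of a BEV feature as a short list of channel values for a single cell are all assumptions made for illustration, and ego-motion alignment of the history feature is omitted. The sketch only shows that a recurrent scheme keeps one fused state regardless of sequence length, and that the time gap between frames can be fed into the fusion step so that a dropped frame is visible to the model.

```python
import math

def temporal_embedding(dt, channels):
    # Hypothetical sinusoidal encoding of the time gap dt (in seconds).
    return [math.sin(dt / (10.0 ** (c / channels))) for c in range(channels)]

def recurrent_fuse(prev_fused, current, dt, w_prev=0.5, w_cur=0.5):
    # Fuse the current feature with the single stored history feature.
    # The fixed weights stand in for learned layers (an assumption).
    if prev_fused is None:
        return list(current)
    emb = temporal_embedding(dt, len(current))
    return [
        math.tanh(w_cur * x + w_prev * (h + e))
        for x, h, e in zip(current, prev_fused, emb)
    ]

def run_sequence(frames, timestamps):
    # Only one fused state is kept, so memory does not grow with the
    # number of frames, unlike a parallel window of past features.
    fused, last_t = None, None
    for feat, t in zip(frames, timestamps):
        dt = 0.0 if last_t is None else t - last_t
        fused = recurrent_fuse(fused, feat, dt)
        last_t = t
    return fused
```

Calling `run_sequence` with timestamps such as `[0.0, 0.5, 1.0, 2.0]` passes a larger `dt` at the step after the missing frame, which changes the embedding added to the history feature.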
Key words
Three-dimensional displays, History, Task analysis, Feature extraction, Fuses, Pipelines, Detectors, Multi-view 3D object detection, recurrent network and long-term temporal fusion