Paper title: Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net
Venue: CVPR 2018
Paper link: http://openaccess.thecvf.com/content_cvpr_2018/papers/Luo_Fast_and_Furious_CVPR_2018_paper.pdf
The paper's experiments show that single-frame processing takes 9 ms, while early fusion over 5 frames takes only 11 ms; the drawback of early fusion, however, is that it cannot accurately capture complex motion. Late fusion (figure (b) below) instead fuses step by step, gradually merging multi-frame temporal information through 3D space-time convolutions.
On Uber's internal dataset, the paper reports that late fusion performs best, but inference time rises accordingly, to 30 ms.
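To make the contrast concrete, below is a minimal sketch (not the authors' code; the channel counts, layer widths, and input size are illustrative assumptions) of the two schemes: early fusion collapses the temporal axis at the very first layer, while late fusion keeps time as an explicit axis and shrinks it gradually with 3D space-time convolutions.

```python
# Sketch of early vs. late temporal fusion over BEV feature maps,
# assuming 5 frames of shape (C, H, W). All sizes are placeholders.
import torch
import torch.nn as nn

T, C, H, W = 5, 32, 256, 256
frames = torch.randn(1, T, C, H, W)  # (batch, time, channels, height, width)

# Early fusion: concatenate all frames along the channel axis and
# process the result with plain 2D convolutions (time is gone after layer 1).
early = nn.Conv2d(T * C, 64, kernel_size=3, padding=1)
x_early = early(frames.flatten(1, 2))            # (1, 64, H, W)

# Late fusion: 3D space-time convolutions with no temporal padding,
# so the temporal dimension shrinks step by step: T = 5 -> 3 -> 1.
late = nn.Sequential(
    nn.Conv3d(C, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1)),   # T: 5 -> 3
    nn.ReLU(inplace=True),
    nn.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(0, 1, 1)),  # T: 3 -> 1
)
x_late = late(frames.permute(0, 2, 1, 3, 4)).squeeze(2)  # (1, 64, H, W)

print(x_early.shape, x_late.shape)
```

The extra 3D convolutions in the late-fusion branch are what let the network model motion across frames, and also what account for its higher latency.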
More recent work on LiDAR temporal fusion includes the following papers from CVPR 2020 and ICRA 2020.
Paper title: LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention
Venue: CVPR 2020
Paper link: https://arxiv.org/abs/2004.01389
Code link: https://github.com/yinjunbo/3DVID
Paper title: MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird's Eye View Maps
Venue: CVPR 2020
Paper link: https://arxiv.org/abs/2003.06754
Code link: https://github.com/pxiangwu/MotionNet
Paper title: Any Motion Detector: Learning Class-Agnostic Scene Dynamics from a Sequence of LiDAR Point Clouds
Venue: ICRA 2020
Paper link: https://arxiv.org/pdf/2004.11647