
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation Through Embedding Predictive Pre-Training

CVPR 2025

Abstract
Vision-based pose estimation of articulated robots with unknown joint angles has applications in collaborative robotics and human-robot interaction tasks. Current frameworks use neural network encoders to extract image features and downstream layers to predict joint angles and robot pose. While images of robots inherently contain rich information about the robot's physical structure, existing methods often fail to leverage it fully, thereby limiting performance under occlusions and truncations. To address this, we introduce RoboPEPP, a method that fuses information about the robot's physical model into the encoder using a masking-based self-supervised embedding-predictive architecture. Specifically, we mask the robot's joints and pre-train an encoder-predictor model to infer the joints' embeddings from surrounding unmasked regions, enhancing the encoder's understanding of the robot's physical model. The pre-trained encoder-predictor pair, along with joint angle and keypoint prediction networks, is then fine-tuned for pose and joint angle estimation. Random masking of the input during fine-tuning and keypoint filtering during evaluation further improve robustness. Our method, evaluated on several datasets, achieves the best results in robot pose and joint angle estimation while being the least sensitive to occlusions and requiring the lowest execution time.
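The pre-training stage described in the abstract follows a JEPA-style embedding-predictive recipe: patches covering the robot's joints are masked, a context encoder processes the remaining image, and a predictor infers the embeddings of the masked joint regions, supervised by a target encoder that sees the full image. The following is a minimal PyTorch sketch of one such step; the module interfaces (`context_encoder`, `target_encoder`, `predictor`, the `mask_idx`/`query_idx` arguments) and the smooth-L1 loss are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of a masked embedding-predictive pre-training step.
# Module names, argument names, and the loss choice are assumptions made
# for illustration; the paper's exact implementation may differ.
import torch
import torch.nn.functional as F

def pretrain_step(context_encoder, target_encoder, predictor, optimizer,
                  images, joint_patch_idx):
    """One pre-training step.

    images:          (B, 3, H, W) robot images
    joint_patch_idx: (B, K) long indices of patch tokens covering the joints
    """
    # Targets come from an unmasked pass through a frozen (e.g. EMA) target encoder.
    with torch.no_grad():
        target_tokens = target_encoder(images)                     # (B, N, D)
        dim = target_tokens.size(-1)
        targets = torch.gather(
            target_tokens, 1,
            joint_patch_idx.unsqueeze(-1).expand(-1, -1, dim))     # (B, K, D)

    # The context encoder sees the image with the joint patches masked out;
    # the predictor fills in embeddings for the masked joint positions.
    context_tokens = context_encoder(images, mask_idx=joint_patch_idx)   # (B, N, D)
    predicted = predictor(context_tokens, query_idx=joint_patch_idx)     # (B, K, D)

    loss = F.smooth_l1_loss(predicted, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```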

[Key Points]: The paper proposes RoboPEPP, which fuses information about the robot's physical model to improve the accuracy of vision-based robot pose and joint angle estimation, strengthen performance under occlusion, and reduce execution time.

[Method]: RoboPEPP adopts a masking-based self-supervised embedding-predictive architecture that fuses the robot's physical-model information into the encoder; by predicting the masked joints' embeddings from the surrounding unmasked regions, it strengthens the encoder's understanding of the robot's structure.

[Experiments]: The authors evaluated RoboPEPP on several datasets. With random masking of the input during fine-tuning and keypoint filtering during evaluation, the method achieves the best results in robot pose and joint angle estimation, is the least sensitive to occlusions, and has the shortest execution time.
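As an illustration of the evaluation-time keypoint filtering mentioned above, a common way to make keypoint-based pose estimation robust is to discard low-confidence (e.g. occluded or truncated) keypoints before solving a PnP problem between predicted 2D keypoints and their 3D counterparts on the robot model. The sketch below assumes this PnP-based formulation, a simple confidence threshold, and OpenCV's solver; the paper's exact filtering rule and pose solver may differ.

```python
# Hedged sketch: confidence-based keypoint filtering followed by PnP.
# The threshold value and solver flags are illustrative assumptions.
import numpy as np
import cv2

def estimate_pose(keypoints_2d, confidences, keypoints_3d, K, conf_thresh=0.5):
    """keypoints_2d: (N, 2) predicted pixel coordinates
    confidences:  (N,)   per-keypoint confidence scores
    keypoints_3d: (N, 3) corresponding points on the robot model
                  (e.g. from forward kinematics with predicted joint angles)
    K:            (3, 3) camera intrinsics
    """
    keep = confidences > conf_thresh
    if keep.sum() < 4:
        # PnP needs at least 4 correspondences; fall back to the best 4.
        keep = np.argsort(confidences)[-4:]

    ok, rvec, tvec = cv2.solvePnP(
        keypoints_3d[keep].astype(np.float64),
        keypoints_2d[keep].astype(np.float64),
        K.astype(np.float64),
        None,                      # no lens distortion assumed
        flags=cv2.SOLVEPNP_EPNP)
    return ok, rvec, tvec
```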