Self-Supervised 3D Keypoint Learning for Ego-Motion Estimation

J. Tang, R. Ambrus, V. Guizilini, S. Pillai, H. Kim, P. Jensfelt, A. Gaidon

Published in CoRL 2020 (oral) - November 2020

Links: arxiv, CoRL page, video, code, bibtex

[Video: KP3D demo]

Detecting and matching robust viewpoint-invariant keypoints is critical for visual SLAM and Structure-from-Motion. State-of-the-art learning-based methods generate training samples via homography adaptation to create 2D synthetic views with known keypoint matches from a single image. This approach, however, does not generalize to non-planar 3D scenes with illumination variations commonly seen in real-world videos. In this work, we propose self-supervised learning of depth-aware keypoints directly from unlabeled videos. We jointly learn keypoint and depth estimation networks by combining appearance and geometric matching via a differentiable structure-from-motion module based on Procrustean residual pose correction. We describe how our self-supervised keypoints can be integrated into state-of-the-art visual odometry frameworks for robust and accurate ego-motion estimation of autonomous vehicles in real-world conditions.
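The differentiable pose estimation at the heart of the method builds on classical Procrustes (Kabsch) alignment: given 3D keypoints lifted with the predicted depth and matched across frames, the rigid transform between them has a closed-form SVD solution. The sketch below is an illustration of that building block only, not the paper's code; all names are ours, and the residual correction network described in the paper is omitted.

```python
# Illustrative sketch (not the paper's implementation): rigid ego-motion
# from matched 3D keypoints via Procrustes/Kabsch alignment.
import numpy as np

def procrustes_pose(X, Y):
    """Estimate rotation R and translation t such that Y ~= X @ R.T + t,
    given matched 3D point sets X, Y of shape (N, 3)."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - cx, Y - cy                    # center both point clouds
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)        # 3x3 cross-covariance SVD
    # Keep R a proper rotation (det = +1), avoiding reflections.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cy - R @ cx
    return R, t

# Example: recover a known yaw rotation and translation from exact matches.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))               # "keypoints" in frame A
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
Y = X @ R_true.T + t_true                      # same keypoints in frame B
R_est, t_est = procrustes_pose(X, Y)
```

Because every step (centering, SVD, matrix products) is differentiable almost everywhere, gradients can flow from a pose-based photometric or geometric loss back into the keypoint and depth networks, which is what makes end-to-end self-supervision possible.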



@inproceedings{tang2020selfsupervised,
    title={Self-Supervised 3D Keypoint Learning for Ego-Motion Estimation},
    author={Jiexiong Tang and Rares Ambrus and Vitor Guizilini and Sudeep Pillai and Hanme Kim and Patric Jensfelt and Adrien Gaidon},
    booktitle={Conference on Robot Learning (CoRL)},
    year={2020}
}