PureACL: View Consistent Purification for Accurate Cross-View Localization

Australian National University, CSIRO, Ford Motor Company
ICCV 2023

(Figure: PureACL confidence and pose-optimization visualization.)

Method Overview

SAFCE produces feature maps (F), view-consistent confidence maps (V), and on-ground confidence maps (O) separately for the satellite and ground-view images. VOKD fuses the confidence maps, identifies the top-k most confident features in the ground-view images, and retrieves their corresponding features on the satellite feature maps. Sub-pixel interpolation is used to look up point features (F[p] from F) and their weights (w[p] from V×O). The residual between the two views and the point weights are fed to the RPRB for subsequent pose optimization. The olive outline indicates that O^s has gradient backpropagation disabled, while the red, green, blue, and magenta outlines and points represent the front, left, right, and rear views, respectively.
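The keypoint-selection and lookup step above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the function names `topk_keypoints` and `bilinear_lookup`, the map shapes, and the plain element-wise fusion V×O are illustrative assumptions.

```python
import numpy as np

def topk_keypoints(V, O, k):
    """Fuse view-consistent (V) and on-ground (O) confidence maps and
    pick the k most confident pixels as sparse keypoints (illustrative)."""
    W_conf = V * O                       # fused weight map, shape (H, W)
    flat = W_conf.ravel()
    idx = np.argpartition(flat, -k)[-k:] # indices of the k largest weights
    ys, xs = np.unravel_index(idx, W_conf.shape)
    pts = np.stack([xs, ys], axis=1).astype(float)
    return pts, flat[idx]                # keypoints p and their weights w[p]

def bilinear_lookup(fmap, pts):
    """Sub-pixel lookup of C-channel features at continuous (x, y) points.
    fmap: (C, H, W); pts: (N, 2) in (x, y) pixel coordinates."""
    C, H, W = fmap.shape
    x, y = pts[:, 0], pts[:, 1]
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    dx, dy = x - x0, y - y0
    f00 = fmap[:, y0, x0]
    f01 = fmap[:, y0, x0 + 1]
    f10 = fmap[:, y0 + 1, x0]
    f11 = fmap[:, y0 + 1, x0 + 1]
    # Standard bilinear blend of the four neighbouring feature vectors.
    return (f00 * (1 - dx) * (1 - dy) + f01 * dx * (1 - dy)
            + f10 * (1 - dx) * dy + f11 * dx * dy)    # shape (C, N)
```

In the actual pipeline the looked-up ground and satellite features would be differenced to form the weighted residual passed to the pose-optimization block; here the two helpers only show the selection and interpolation mechanics.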

(Figure: method overview.)

Abstract

This paper proposes a fine-grained self-localization method for outdoor robotics that utilizes a flexible number of onboard cameras and readily accessible satellite images. The proposed method addresses limitations in existing cross-view localization methods that struggle to handle noise sources such as moving objects and seasonal variations. It is the first sparse visual-only method that enhances perception in dynamic environments by detecting view-consistent key points and their corresponding deep features from ground and satellite views, while removing off-the-ground objects and establishing a homography transformation between the two views. Moreover, the proposed method incorporates a spatial embedding approach that leverages camera intrinsic and extrinsic information to reduce the ambiguity of purely visual matching, leading to improved feature matching and overall pose estimation accuracy. The method exhibits strong generalization and is robust to environmental changes, requiring only geo-poses as ground truth. Extensive experiments on the KITTI and Ford Multi-AV Seasonal datasets demonstrate that our proposed method outperforms existing state-of-the-art methods, achieving median lateral and longitudinal localization errors below 0.5 meters and a median orientation error below 2 degrees.
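The ground-plane homography mentioned above rests on back-projecting on-ground pixels through the camera model. A minimal sketch of that back-projection follows; the intrinsic matrix, camera height, and the axis convention (camera looking along +z, y pointing down, ground plane at y = cam_height) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def pixel_to_ground(uv, K, cam_height):
    """Back-project an on-ground pixel (u, v) to a 3D point on the
    ground plane, in the camera frame (illustrative sketch).
    Assumes the camera looks along +z with y pointing down, so the
    ground plane is y = cam_height."""
    d = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))  # viewing ray
    if d[1] <= 0:
        return None            # ray points at or above the horizon
    s = cam_height / d[1]      # scale at which the ray hits the ground
    return s * d               # (x, cam_height, z) in the camera frame
```

Chaining this with the vehicle's extrinsics and the satellite image's meters-per-pixel resolution yields the ground-to-satellite correspondence that the spatial embedding exploits.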

Comparison on the KITTI-CVL dataset

(Figure: comparison results on the KITTI-CVL dataset.)

Comparison on the FordAV-CVL dataset

(Figure: comparison results on the FordAV-CVL dataset.)

BibTeX

@inproceedings{wang2023view,
  title={View Consistent Purification for Accurate Cross-View Localization},
  author={Wang, Shan and Zhang, Yanhao and Perincherry, Akhil and Vora, Ankit and Li, Hongdong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={8197--8206},
  year={2023}
}