MscaVPR: Multi-Scale Coordinate Attention Network for Robust Visual Place Recognition

Xiaohan Gao, Zhinong Zhong, Yongjian Tan, Ning Jing, Anran Yang, Qingren Jia

Visual place recognition (VPR) aims to localize a query image by matching its visual representation against a geotagged database. A major challenge in VPR is learning place representations that remain robust to appearance changes, viewpoint variations, and perceptual aliasing. However, existing VPR methods remain limited in adaptive multi-scale feature fusion and viewpoint-aware training supervision, which may reduce robustness under severe viewpoint changes. In this paper, we propose MscaVPR, a VPR framework that combines multi-scale feature enhancement with azimuth-aware training. Specifically, a Multi-Scale Spatial Pyramid Attention (MSPA) module aggregates regional features across different spatial scales, and Coordinate Attention (CA) encodes positional cues for spatially refined feature learning. To further improve viewpoint robustness, we design an azimuth-guided training strategy that selects hard positive samples with significant viewpoint discrepancies and optimizes them with an azimuth-aware auxiliary loss function. Experimental results on multiple benchmark datasets indicate that MscaVPR generally outperforms a strong baseline and performs better under challenging conditions. In particular, Recall@1 improves by 2.1%, 1.9%, and 1.9% on the AmsterTime, SVOX-Night, and SVOX-Sun datasets, respectively. These results suggest that explicitly incorporating azimuth cues is an effective complement to existing multi-scale and attention-based VPR methods.
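The abstract does not state the exact criterion used to pick hard positives with large viewpoint discrepancies. The sketch below illustrates one plausible reading of the azimuth-guided selection step, under stated assumptions: each sample carries a planar position in metres and a camera azimuth in degrees, a "positive" is any database image within a hypothetical `pos_radius` of the query, and a "hard" positive is one whose heading differs from the query's by at least a hypothetical `min_azimuth_gap`. The names `Sample`, `azimuth_difference`, and `select_hard_positives`, and both threshold defaults, are illustrative and not taken from MscaVPR.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record: planar position in metres (e.g. UTM) and camera azimuth in degrees.
    x: float
    y: float
    azimuth: float


def azimuth_difference(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in degrees, in [0, 180]."""
    d = abs(a - b) % 360.0
    return 360.0 - d if d > 180.0 else d


def select_hard_positives(query: Sample, database: list[Sample],
                          pos_radius: float = 25.0,
                          min_azimuth_gap: float = 45.0) -> list[int]:
    """Return indices of database samples that are geographic positives for the query
    (within pos_radius metres) AND whose heading differs by at least min_azimuth_gap
    degrees. Assumed thresholds, not the paper's values. Results are sorted by
    decreasing azimuth gap, so the largest viewpoint changes come first."""
    candidates = []
    for i, s in enumerate(database):
        # Geographic check: skip samples outside the positive radius.
        if math.hypot(s.x - query.x, s.y - query.y) > pos_radius:
            continue
        # Viewpoint check: keep only positives with a large heading discrepancy.
        gap = azimuth_difference(query.azimuth, s.azimuth)
        if gap >= min_azimuth_gap:
            candidates.append((gap, i))
    candidates.sort(key=lambda t: -t[0])
    return [i for _, i in candidates]


if __name__ == "__main__":
    q = Sample(0.0, 0.0, 10.0)
    db = [
        Sample(5.0, 5.0, 20.0),     # nearby, similar heading: easy positive
        Sample(3.0, 0.0, 200.0),    # nearby, 170 degree gap: hard positive
        Sample(100.0, 0.0, 190.0),  # too far away: not a positive
        Sample(10.0, 10.0, 300.0),  # nearby, 70 degree gap: hard positive
    ]
    print(select_hard_positives(q, db))
```

Wrapping the heading difference into [0, 180] matters because azimuths of 350 and 10 degrees describe nearly the same viewpoint; a plain subtraction would report a 340 degree gap. How the selected hard positives are then weighted by the azimuth-aware auxiliary loss is not specified in the abstract, so the sketch stops at selection.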
