SAM2-RFANet: a multi-receptive-field aware attention network for enhanced polyp segmentation in colonoscopy imagery.
Guangcheng Li, Ge Jiao, Yang Liu, Zhenpeng Zhong
Accurate polyp segmentation in colonoscopy images is crucial for early colorectal cancer diagnosis but remains challenging because polyps have irregular shapes, weak boundaries, and low contrast against surrounding tissue. Although foundation segmentation models such as SAM2 offer powerful representations, their generic decoders often lack the multi-scale modeling and cross-layer semantic constraints needed for precise medical image analysis, particularly under weak-boundary conditions. To address this, we propose SAM2-RFANet, a network that couples multi-receptive-field feature extraction with collaborative attention mechanisms. Our framework introduces three core modules: an adaptive receptive field modulation module that models context dynamically via learnable hybrid pooling and multi-scale dilated convolutions; a multi-scale feature decoding module that employs direction-aware large-kernel decomposition to better capture anisotropic structures such as polyp boundaries; and a semantic feature alignment module that performs gated, semantically consistent fusion of multi-level features. Extensive experiments on five public datasets show that SAM2-RFANet achieves competitive performance on both in-distribution and unseen test sets. Although it does not obtain the best score on every metric, it demonstrates a favorable balance between overlap accuracy, structural consistency, and cross-dataset generalization. Notably, it attains mean Dice scores of 0.923 on Kvasir-SEG and 0.942 on CVC-ClinicDB, while generalizing well to more challenging datasets (e.g., 0.813 on CVC-ColonDB). These results support the effectiveness of integrating multi-receptive-field modeling with cross-layer semantic alignment for robust polyp segmentation in complex endoscopic scenarios. Our work illustrates how task-specific architectural innovations can adapt powerful foundation models to the nuanced demands of medical image analysis.
The code will be available at https://github.com/253458lgc/SAM2-RFANet.
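For readers unfamiliar with the metric, the mean Dice scores quoted in the abstract measure the overlap between a predicted mask and the ground-truth mask, Dice = 2|P ∩ G| / (|P| + |G|), averaged over test images. The following is a minimal sketch of that computation, not the authors' evaluation code: the function names `dice_score` and `mean_dice` are hypothetical, masks are assumed to be already binarized, averaging is assumed to be per image, and treating two empty masks as a perfect match is a convention chosen here rather than one stated in the abstract.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks (nested lists of 0/1).

    Hypothetical helper for illustration; assumes masks are already thresholded.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must have the same number of pixels")

    # |P ∩ G|: pixels labeled as polyp in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    # |P| + |G|: total foreground pixels across both masks.
    total = sum(flat_pred) + sum(flat_target)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pairs):
    """Average the per-image Dice over (prediction, ground truth) pairs."""
    scores = [dice_score(pred, target) for pred, target in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    truth = [[1, 0], [0, 0]]
    # One shared foreground pixel, three foreground pixels in total.
    print(dice_score(pred, truth))
```

Under these assumptions, a dataset-level figure such as 0.923 would correspond to `mean_dice` applied to all prediction and ground-truth pairs in that test set; other protocols (for example, pooling pixels across the whole dataset before computing a single Dice) can give different numbers.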