Weakly Supervised Temporal Action Localization With Proposal-Level Action Consistency Learning.
Maodong Li, Zhihao Wang, Tian Wang, Jingxiong Wang, Jian Wang, Bing Li
Existing weakly supervised temporal action localization (WTAL) methods typically follow a decoupled classification-localization pipeline: segment-level classifiers are trained first, and their predictions are aggregated into proposal scores only at inference. Because of this training-inference discrepancy, proposal scoring depends on an extra aggregation step that can accumulate errors from noisy segment responses and undermine score reliability. Moreover, proposal scores are often used directly as confidence, without explicit score-quality modeling or quality-aware evaluation, which aggravates score-quality misalignment and widens the classification-localization gap. To address this misalignment, we propose ACL-Net, a framework for proposal score calibration. At its core is a dual-axis Proposal-level Action Consistency Learning (PACL) paradigm, implemented through two complementary modules. (i) A Semantic Consistency Module (SCM) refines proposal representations by maintaining fused class centers that enforce compact and robust same-class features; within SCM, a cross-modal consistency-driven Classification Enhancement Module (CEM) denoises the fused class centers to mitigate error accumulation under weak supervision. (ii) A Process Consistency Module (PCM) derives geometry-aware reference scores from the relative temporal relations among overlapping proposals, guiding the model to assess proposal quality in terms of relative process completeness and thereby improving score-quality alignment. By jointly modeling semantic and process consistency to calibrate proposal scores, ACL-Net substantially improves localization accuracy. On THUMOS14 and ActivityNet1.3, it achieves state-of-the-art performance with consistent and substantial gains across multiple established baselines, while markedly lowering the expected calibration error (ECE).
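To make the notion of score-quality misalignment concrete, the following is a minimal sketch of how a binned expected calibration error could be computed over scored proposals. It uses the standard equal-width-bin definition of ECE and is not ACL-Net's evaluation protocol, which the abstract does not specify. The function names, the 10-bin setting, and the rule that a proposal counts as correct when its temporal IoU with a ground-truth segment reaches 0.5 are all assumptions made for illustration.

```python
def temporal_iou(a, b):
    """Temporal IoU between two (start, end) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def expected_calibration_error(scores, correct, n_bins=10):
    """Binned ECE: sum over bins of (bin weight) * |accuracy - mean score|.

    `scores` are assumed to lie in [0, 1]; `correct` holds 0/1 labels.
    """
    bins = [[] for _ in range(n_bins)]
    for s, c in zip(scores, correct):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((s, c))
    n = len(scores)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_score = sum(s for s, _ in b) / len(b)
        accuracy = sum(c for _, c in b) / len(b)
        ece += len(b) / n * abs(accuracy - mean_score)
    return ece


# Hypothetical proposals (start, end, score) and one ground-truth segment.
ground_truth = (10.0, 20.0)
proposals = [(10.0, 20.0, 0.9), (12.0, 30.0, 0.8), (40.0, 50.0, 0.1)]

# Assumed correctness rule: tIoU >= 0.5 with the ground-truth segment.
correct = [int(temporal_iou((s, e), ground_truth) >= 0.5) for s, e, _ in proposals]
scores = [p for _, _, p in proposals]
ece = expected_calibration_error(scores, correct)
```

In this toy case the second proposal receives a high score despite overlapping the ground truth poorly, which is the kind of misalignment that raises ECE; a well-calibrated scorer would assign it a lower score.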