FedMIR: Multimodal Federated Learning with Missing Modality Imputation and Distribution-Aware Routing.
Hongyu Xiong, Ming Dai
Existing multimodal federated learning methods typically assume that every modality is available and struggle when training and test data follow different distributions, which makes them ill-suited to the missing modalities and distribution drift found in distributed learning scenarios such as the Internet of Things (IoT). To address these challenges, we present FedMIR, a novel framework for multimodal federated learning. Our key observation is that heterogeneous modalities can be mapped into a shared semantic space, where cross-modal dependencies can be effectively modeled. Based on this insight, FedMIR uses contrastive learning to align image and text modalities in a shared latent space and employs conditional generation to reconstruct the representations of missing modalities. The completed representations are then routed through a mixture-of-experts backbone conditioned on the estimated distribution state. FedMIR shares only model parameters and distribution statistics with the server. This design lets the model operate when modalities are missing while adaptively allocating expert knowledge to cope with distribution drift. We validate FedMIR on federated image-text retrieval benchmarks under heterogeneity and missing data conditions, demonstrating its effectiveness compared to representative federated learning baselines.
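The three stages named in the abstract (contrastive alignment, conditional imputation, distribution-conditioned routing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the CLIP-style symmetric InfoNCE loss, the linear imputation map `W`, the gate matrix `gate_W`, the linear experts, and the use of a plain statistics vector `dist_stats` as the "distribution state" are all assumptions chosen for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def log_softmax(z, axis):
    # Numerically stable log-softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))

def symmetric_info_nce(img, txt, temperature=0.07):
    # Assumed alignment objective: matching image-text pairs sit on the diagonal
    # of the similarity matrix and are treated as the positive class.
    logits = l2_normalize(img) @ l2_normalize(txt).T / temperature
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_i2t + loss_t2i)

def impute_missing(available, W):
    # Hypothetical conditional generator: a linear map from the available
    # modality's embedding to an estimate of the missing one.
    return l2_normalize(available @ W)

def route(fused, dist_stats, gate_W, experts):
    # Soft mixture of experts: gate weights depend only on the distribution
    # statistics, not on the individual sample (an assumption of this sketch).
    gate = np.exp(log_softmax(dist_stats @ gate_W, axis=-1))  # (n_experts,)
    outputs = np.stack([fused @ E for E in experts])          # (n_experts, batch, d)
    return np.tensordot(gate, outputs, axes=1), gate

rng = np.random.default_rng(0)
d, batch, n_experts = 16, 4, 3
img = rng.normal(size=(batch, d))

# A client that lacks text reconstructs it from the image embedding.
W = rng.normal(size=(d, d))
txt_hat = impute_missing(img, W)

# Completed representation: here simply the concatenation of both modalities.
fused = np.concatenate([l2_normalize(img), txt_hat], axis=-1)

# Distribution state approximated by per-batch mean and variance summaries.
dist_stats = np.array([fused.mean(), fused.var()])
gate_W = rng.normal(size=(2, n_experts))
experts = [rng.normal(size=(2 * d, d)) for _ in range(n_experts)]
routed, gate = route(fused, dist_stats, gate_W, experts)
```

In a federated setting, only quantities such as `W`, `gate_W`, the expert weights, and `dist_stats` would leave the client, which matches the abstract's statement that only model parameters and distribution statistics are shared.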