Authors - Harita Venkatesan

Abstract - Fusion-based multimodal models typically assume full modality availability at inference, an assumption that often fails in real-world settings. When a modality is missing, common strategies such as zero-vector masking or unimodal fallback can lead to unstable predictions. We propose CORE, an embedding-level framework that completes multimodal representations by integrating original and cross-modally reconstructed embeddings in a fusion-consistent manner prior to fusion. CORE employs lightweight bidirectional cross-modal imagination networks with a cycle-consistency constraint to preserve shared semantic structure across modalities. The model is trained with stochastic modality dropout, enabling unified inference under both complete and incomplete modality configurations. Experiments on a multimodal MRI–text classification task for lumbar spine analysis demonstrate that CORE yields more stable predictions than zero-vector masking under severe modality absence, while maintaining comparable performance when all modalities are present.
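To make the mechanics described above concrete, the following is a minimal sketch, not the authors' released implementation, of cycle-consistent cross-modal imagination with stochastic modality dropout. It assumes PyTorch, image/text embedding sizes of 512/768, and hypothetical names such as ImaginationNet, core_complete, and modality_dropout_masks; the exact architecture, losses, and integration weights used by CORE may differ.

```python
# Hedged sketch of the CORE idea: bidirectional "imagination" networks that
# reconstruct one modality's embedding from the other, a cycle-consistency
# objective, embedding completion before fusion, and modality dropout for training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImaginationNet(nn.Module):
    """Lightweight network reconstructing one modality's embedding from the other."""
    def __init__(self, src_dim: int, tgt_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, tgt_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Bidirectional imagination: image -> text and text -> image (dims are assumptions).
img_dim, txt_dim = 512, 768
img2txt = ImaginationNet(img_dim, txt_dim)
txt2img = ImaginationNet(txt_dim, img_dim)

def cycle_consistency_loss(e_img: torch.Tensor, e_txt: torch.Tensor) -> torch.Tensor:
    """Reconstruction plus cycle terms: each embedding should map across modalities
    and back to itself, preserving shared semantic structure."""
    txt_hat = img2txt(e_img)          # imagined text embedding
    img_hat = txt2img(e_txt)          # imagined image embedding
    cyc_img = txt2img(txt_hat)        # image -> text -> image
    cyc_txt = img2txt(img_hat)        # text -> image -> text
    recon = F.mse_loss(txt_hat, e_txt) + F.mse_loss(img_hat, e_img)
    cycle = F.mse_loss(cyc_img, e_img) + F.mse_loss(cyc_txt, e_txt)
    return recon + cycle

def core_complete(e_img, e_txt, img_present, txt_present):
    """Replace a missing modality's placeholder embedding with its imagined
    counterpart before fusion (a simplification of the paper's fusion-consistent
    integration of original and reconstructed embeddings).
    img_present / txt_present are boolean masks of shape (batch,)."""
    img_in = torch.where(img_present.unsqueeze(1), e_img, txt2img(e_txt))
    txt_in = torch.where(txt_present.unsqueeze(1), e_txt, img2txt(e_img))
    return img_in, txt_in

def modality_dropout_masks(batch_size: int, p_drop: float = 0.3):
    """Stochastic modality dropout for training (p_drop is a hypothetical rate);
    at most one modality is dropped per sample."""
    drop_img = torch.rand(batch_size) < p_drop
    drop_txt = (torch.rand(batch_size) < p_drop) & ~drop_img
    return ~drop_img, ~drop_txt
```

Under this reading, the same completion path is used at training time (with simulated absence via dropout) and at inference time (with genuinely missing modalities), which is what allows a single model to serve both complete and incomplete inputs.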