Now accepted to NeurIPS 2025: multi-modal contrastive learning adapts to intrinsic dimensions. We present a theoretical analysis of CLIP, showing how temperature optimization enables adaptation to the intrinsic dimension of shared features in multi-modal data.
A more recent work proposes IndiSeek, which learns modality-specific features that are independent of the shared features.