Revisiting Integration of Image and Metadata for DICOM Series Classification: Cross-Attention and Dictionary Learning
Tuan Truong, Melanie Dohmen, Sara Lorio, Matthias Lenga
The paper presents an end-to-end multimodal framework for DICOM series classification that jointly models image content and acquisition metadata. It combines bi-directional cross-modal attention with a sparse, missingness-aware metadata encoder based on learnable feature dictionaries.